Hi,
I compiled ParaView in headless mode with EGL support. However, I can use it only if the PBS scheduler assigns me the very first GPU in the node (index 0). If it assigns me any other card, the EGL device cannot be initialized, even though that card is visible within the job as the device with index 0.
Do you have any idea what might be wrong?
Additionally, I have found no way to specify the EGL device index manually. I tried both the value of $CUDA_VISIBLE_DEVICES and the index according to nvidia-smi:
Now it can see the script, but the resulting error with EGL is still the same:
( 1.100s) [pvbatch ] vtkEGLRenderWindow.cxx:353 WARN| vtkEGLRenderWindow (0x556bb0976040): EGL device index: 0 could not be initialized. Trying other devices…
( 1.105s) [pvbatch ] vtkEGLRenderWindow.cxx:386 WARN| vtkEGLRenderWindow (0x555f5634a070): Setting an EGL display to device index: -1 require EGL_EXT_device_base EGL_EXT_platform_device EGL_EXT_platform_base extensions
( 1.105s) [pvbatch ] vtkEGLRenderWindow.cxx:388 WARN| vtkEGLRenderWindow (0x555f5634a070): Attempting to use EGL_DEFAULT_DISPLAY…
( 1.105s) [pvbatch ] vtkEGLRenderWindow.cxx:393 ERR| vtkEGLRenderWindow (0x555f5634a070): Could not initialize a device. Exiting…
( 1.105s) [pvbatch ]vtkOpenGLRenderWindow.c:511 ERR| vtkEGLRenderWindow (0x555f5634a070): GLEW could not be initialized: Missing GL version
Regardless of what I supply to the display, it attempts to use device index 0. This should be correct, since PBS gives me resources in a separate namespace: in the nvidia-smi output inside the job, only one GPU is visible, with index 0. But when I ssh directly to the cluster and run nvidia-smi, there are 8 different GPUs, and what I see from the PBS job as index 0 may in fact be, for example, the GPU with index 7 within the whole machine.
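To compare the two views described above (inside the PBS job versus a direct ssh session), it may help to list index, PCI Bus-ID, and UUID side by side. This is only a minimal sketch, assuming nvidia-smi is on the PATH; the function names `parse_gpu_list` and `query_gpus` are made up for illustration and are not part of ParaView or any NVIDIA library.

```python
import subprocess

# Fields requested from nvidia-smi; all three are standard --query-gpu properties.
QUERY_FIELDS = "index,pci.bus_id,uuid"


def parse_gpu_list(csv_text):
    """Parse the output of
    `nvidia-smi --query-gpu=index,pci.bus_id,uuid --format=csv,noheader`
    into a list of dicts (hypothetical helper)."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        index, bus_id, uuid = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "bus_id": bus_id, "uuid": uuid})
    return gpus


def query_gpus():
    """Run nvidia-smi in the current environment and return the parsed GPU list."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_list(result.stdout)
```

Running `query_gpus()` once inside the job and once over ssh should show whether the single GPU that the job reports as index 0 carries the same Bus-ID and UUID as a higher-numbered GPU in the full list.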
I assume the problem is that ParaView selects the EGL device by index and not by a persistent address, e.g. the Bus-ID, or by the ID of the GPU given in the $CUDA_VISIBLE_DEVICES variable.
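To illustrate the difference between an index and a persistent identifier, here is a rough sketch that resolves the first entry of a $CUDA_VISIBLE_DEVICES-style string to a GPU record with its Bus-ID. It assumes the variable holds either integer indices or GPU UUIDs (possibly abbreviated), and that the GPU list was gathered in the same view in which the indices are meaningful. The helper name `resolve_visible_device` and the sample records are hypothetical; this is not how ParaView or VTK selects a device.

```python
def resolve_visible_device(visible, gpus):
    """Return the GPU record matching the first entry of a
    CUDA_VISIBLE_DEVICES-style string, or None if nothing matches.

    `gpus` is a list of dicts with "index", "bus_id" and "uuid" keys
    (hypothetical structure, e.g. built from nvidia-smi output)."""
    tokens = [t.strip() for t in visible.split(",") if t.strip()]
    if not tokens:
        return None
    token = tokens[0]
    for gpu in gpus:
        if token.isdigit():
            # Numeric entry: compare against the enumeration index.
            if gpu["index"] == int(token):
                return gpu
        elif gpu["uuid"].startswith(token):
            # UUID entry: prefix match also accepts an abbreviated UUID.
            return gpu
    return None


# Made-up sample records, for illustration only.
host_gpus = [
    {"index": 0, "bus_id": "00000000:3B:00.0", "uuid": "GPU-aaaa"},
    {"index": 7, "bus_id": "00000000:E1:00.0", "uuid": "GPU-bbbb"},
]
match = resolve_visible_device("GPU-bbbb", host_gpus)
print(match["bus_id"] if match else "no matching GPU")
```

In a real job one would pass `os.environ.get("CUDA_VISIBLE_DEVICES", "")` as the first argument. A UUID or Bus-ID stays the same inside and outside the job, whereas a bare index depends on which devices the enumerating process can see.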
Well, I really don’t know what is happening; this is just my guess. From what I tested, on the same machine EGL could be initialized when I was assigned the first GPU card, and it could not when I was assigned any other one. I have verified this on three different machines, all with the same behavior.