Unable to use multi-GPU rendering with pvbatch using EGL ParaView

When using pvbatch with multiple GPUs, nvidia-smi reports that only GPU 0 is being used. To make this easier to reproduce on Kitware’s side, here are the steps and scripts we used:

  • I handcrafted a simple Python script called test.py:
from paraview import simple

renderViewSettings = simple.GetSettingsProxy('RenderViewSettings')

# Force remote (server-side) rendering regardless of geometry size
renderViewSettings.RemoteRenderThreshold = 0.0
a = simple.Cone()
simple.Show(a)
simple.Render()
simple.SaveScreenshot("/tmp/hello.png")

import time
# keep the process alive for a while so GPU usage can be inspected in nvidia-smi
time.sleep(20)
  • Next, I started a headless EGL pvserver with:
mpiexec \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvserver --mpi --force-offscreen-rendering --displays 0 : \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvserver --mpi --force-offscreen-rendering --displays 1 : \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvserver --mpi --force-offscreen-rendering --displays 2
  • I then connected to this pvserver with the Qt ParaView client, ran test.py, and saw 3 GPUs being used in nvidia-smi. Hence, pvserver seems to be working fine.
  • I then tried the same thing with pvbatch using
mpiexec \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvbatch --mpi --force-offscreen-rendering --displays 0 -- ./test.py : \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvbatch --mpi --force-offscreen-rendering --displays 1 -- ./test.py  : \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvbatch --mpi --force-offscreen-rendering --displays 2 -- ./test.py 

and I only see GPU 0 being used.

  • Does anyone know a solution for this issue?
  • If not, is there an easy way to replace a pvbatch-like workflow with a pvserver-based one?
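
For reference, a pvserver-based variant of test.py would presumably look something like the sketch below, run through pvpython. It assumes the pvserver launched above is still running and listening on the default port 11111 on localhost:

# sketch of a pvserver-based replacement for the pvbatch workflow
# (assumes the mpiexec/pvserver command above is running and listening
#  on localhost:11111, the default port)
from paraview import simple

simple.Connect('localhost', 11111)  # attach this pvpython client to the pvserver

# same pipeline as test.py; rendering now happens on the pvserver ranks
settings = simple.GetSettingsProxy('RenderViewSettings')
settings.RemoteRenderThreshold = 0.0

cone = simple.Cone()
simple.Show(cone)
simple.Render()
simple.SaveScreenshot("/tmp/hello.png")

simple.Disconnect()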

Thank you!

Have you tried the -bynode option of Open MPI or --map-by node of MPICH?
This is the usual way to map MPI processes to compute nodes with multiple GPUs.

@Francois_Mazen the same command works with pvserver but not with pvbatch, and these tests were done on a single node, so I don’t think it’s an MPI issue. It seems to me that it’s more related to how pvbatch works with MPI.
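
To double-check that all pvbatch ranks actually start, each rank can print its partition id with something like the sketch below (the vtkProcessModule import path is the one used in ParaView 5.7+ and may differ in older versions):

# sketch: have every pvbatch rank report its partition id
# (import path assumes ParaView 5.7+)
from paraview.modules.vtkRemotingCore import vtkProcessModule

pm = vtkProcessModule.GetProcessModule()
print("pvbatch rank %d of %d is alive" % (pm.GetPartitionId(), pm.GetNumberOfLocalPartitions()))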