Unable to use multi-GPU rendering with pvbatch using EGL ParaView

When using pvbatch with multiple GPUs, nvidia-smi reports that only GPU 0 is being used. To make this easier to reproduce on Kitware’s side, here are the steps and scripts we used:

  • I handcrafted a simple Python script called test.py:
from paraview import simple

renderViewSettings = simple.GetSettingsProxy('RenderViewSettings')

# Force remote (server-side) rendering regardless of geometry size
renderViewSettings.RemoteRenderThreshold = 0.0
a = simple.Cone()
simple.Show(a)
simple.Render()
simple.SaveScreenshot("/tmp/hello.png")

import time
# keep the process alive for a while so GPU usage can be inspected in nvidia-smi
time.sleep(20)
  • Next, I started a headless EGL pvserver with:
mpiexec \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvserver --mpi --force-offscreen-rendering --displays 0 : \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvserver --mpi --force-offscreen-rendering --displays 1 : \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvserver --mpi --force-offscreen-rendering --displays 2
  • I then connected to this pvserver with the Qt ParaView client, ran test.py, and saw 3 GPUs being used in nvidia-smi. Hence, pvserver seems to be working fine.
  • I then tried the same thing with pvbatch using
mpiexec \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvbatch --mpi --force-offscreen-rendering --displays 0 -- ./test.py : \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvbatch --mpi --force-offscreen-rendering --displays 1 -- ./test.py  : \
-np 2 -x NVIDIA_VISIBLE_DEVICES=0,1,2,3 ./pvbatch --mpi --force-offscreen-rendering --displays 2 -- ./test.py 

and I only see GPU 0 being used.

  • Does anyone know a solution for this issue?
  • If not, is there an easy way to replace a pvbatch-like workflow with a pvserver-based one?
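
For reference, a pvserver-based variant of test.py would presumably look something like the sketch below, run through pvpython. It assumes the pvserver launched above is still running and listening on the default port 11111 on localhost:

# sketch of a pvserver-based replacement for the pvbatch workflow
# (assumes the mpiexec/pvserver command above is running and listening
#  on localhost:11111, the default port)
from paraview import simple

simple.Connect('localhost', 11111)  # attach this pvpython client to the pvserver

# same pipeline as test.py; rendering now happens on the pvserver ranks
settings = simple.GetSettingsProxy('RenderViewSettings')
settings.RemoteRenderThreshold = 0.0

cone = simple.Cone()
simple.Show(cone)
simple.Render()
simple.SaveScreenshot("/tmp/hello.png")

simple.Disconnect()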

Thank you!

Have you tried the -bynode option of Open MPI or --map-by node of MPICH?
This is the usual way to map MPI processes to compute nodes with multiple GPUs.

@Francois_Mazen the same command works with pvserver but not with pvbatch, and these tests were done on a single node, so I don’t think it’s an MPI issue. It seems to me that it’s more related to how pvbatch works with MPI.
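
To double-check that all pvbatch ranks actually start, each rank can print its partition id with something like the sketch below (the vtkProcessModule import path is the one used in ParaView 5.7+ and may differ in older versions):

# sketch: have every pvbatch rank report its partition id
# (import path assumes ParaView 5.7+)
from paraview.modules.vtkRemotingCore import vtkProcessModule

pm = vtkProcessModule.GetProcessModule()
print("pvbatch rank %d of %d is alive" % (pm.GetPartitionId(), pm.GetNumberOfLocalPartitions()))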