OSPRay backend with MPI offload

Hello, how do I launch pvbatch using the OSPRay backend in parallel with MPI?
I am trying something like this:

    mpirun -ppn 1 -host localhost VTKOSPRAY_ARGS="--osp:mpi" \
     ./pvbatch script.py : -host n1,n2 ./ospray_mpi_worker --osp:mpi

but launched that way, ParaView doesn’t connect to the ospray_mpi_worker.


With OSPRay 1.8 / ParaView 5.7, the command-line arguments have been replaced by OSPRay-specific environment variables. I’ll post an example in a bit.


Something like this should work:

    mpirun -n 1 -ppn 1 pvbatch : -n 7 ospray_mpi_worker
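Since the command-line arguments are gone, the offload device has to be selected through the environment before that launch. The variable names and the `mpi_offload` device name below are my recollection of the OSPRay 1.8 interface, not something confirmed in this thread, so please verify them against your ospray headers:

```shell
# Select OSPRay's MPI offload device via environment variables
# (names assumed from OSPRay 1.8; check your build's headers):
export OSPRAY_LOAD_MODULES=mpi           # load module_mpi at startup
export OSPRAY_DEFAULT_DEVICE=mpi_offload # use the offload device

# Rank 0 runs pvbatch, the remaining ranks run the workers:
mpirun -n 1 -ppn 1 pvbatch script.py : -n 7 ospray_mpi_worker
```

Depending on your MPI implementation you may need to pass the variables explicitly (e.g. with `-genv`) so the worker ranks see them too.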


That should work, though if you run into MPI errors you may want to try a pvbatch build without MPI, or use pvpython. Can we get this added to the ParaView docs, or at least updated in the headers?


I am getting a segmentation fault in both ospray_mpi_worker and pvbatch. I suspect it is because I am using an OSPRay compiled against the system MPI, while my ParaView is the downloaded osmesa binary. I am now trying to compile ParaView against the system MPI.
Error output attached.
pvM0.e69290 (9.3 KB)

I have compiled pv-5.7.0 and ospray-1.8.5 against the same MPI version and use the following simple Python program to test:

    from paraview.simple import *

    model = Sphere()
    render_view = GetActiveViewOrCreate('RenderView')
    Show(model, render_view)
    render_view.EnableRayTracing = 1
    render_view.BackEnd = 'OSPRay scivis'
    render_view.Shadows = 1
    render_view.SamplesPerPixel = 32
    render_view.LightScale = 0.75
    Render()
    SaveScreenshot('sphere.png', render_view)

I set:


and run:

    mpiexec -np 1 pvbatch ~/ukaea/prog/viscli/src/Sphere.py : -n 1 strace ospray_mpi_worker --osp:mpi

and it never finishes. If I strace any of the tasks, they are polling forever:

    poll([{fd=5, events=POLLIN}, {fd=17, events=POLLIN}], 2, 0) = 0 (Timeout)
    poll([{fd=5, events=POLLIN}, {fd=17, events=POLLIN}], 2, 0) = 0 (Timeout)

I did a similar experiment with plain VTK and OSPRay, and it worked.

I was able to run your script locally on my Mac using 5.7 and OSPRay 1.8.5. I did notice that ParaView does not seem to shut down OSPRay correctly and reports some errors on exit, but it still ran through the script correctly. Is the stall you are seeing after it exports sphere.png? Another thought: the stall may be hiding some other error, possibly a linking issue finding module_mpi. Are you able to run that same script with the MPI module loaded, but without running the MPI device?

Hi, what happens for me is simply that ParaView and ospray_mpi_worker keep polling forever; the script never executes.

It’s going to be tricky to debug this from my end, unfortunately, since I can’t reproduce it locally. My hunch is that your error output is actually caused by a linking issue.
To help narrow this down, can you please try running that same script without MPI? Keep the env var that loads the MPI module, but take out the env var that selects the MPI device. If that fails, try running it without the MPI module at all.
If it succeeds, could you try running ospExampleViewer with the same MPI split launch you used for ParaView?
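As a sketch of that sequence (the `OSPRAY_LOAD_MODULES` / `OSPRAY_DEFAULT_DEVICE` names are my assumption for the 1.8 env interface; substitute whatever variables you set above):

```shell
# Step 1: MPI module loaded, but no MPI device selected
export OSPRAY_LOAD_MODULES=mpi
unset OSPRAY_DEFAULT_DEVICE
pvbatch Sphere.py

# Step 2: if step 1 fails, try without the MPI module at all
unset OSPRAY_LOAD_MODULES
pvbatch Sphere.py

# Step 3: if step 1 succeeds, test the split launch outside ParaView
export OSPRAY_LOAD_MODULES=mpi
export OSPRAY_DEFAULT_DEVICE=mpi_offload
mpiexec -np 1 ospExampleViewer : -np 4 ospray_mpi_worker --osp:mpi
```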

Also, did you try my earlier suggestion of using pvpython instead of pvbatch? That avoids linking and running any MPI code on ParaView’s side.

Hi, sorry I didn’t try that before. Yes, with pvpython it runs without problems.
ospExampleViewer also works perfectly.

    mpiexec -np 1 pvpython ~/ukaea/prog/viscli/src/SphereOspray.py : -n 4 ospray_mpi_worker --osp:mpi

    #o: initMPI::OSPonRanks: 1/5
    #o: initMPI::OSPonRanks: 4/5
    #o: initMPI::OSPonRanks: 0/5
    #o: initMPI::OSPonRanks: 2/5
    #o: initMPI::OSPonRanks: 3/5
    master: Made 'worker' intercomm (through split): 0x55f638663690
    #w: app process -1/-1 (global 4/5
    #w: app process 0/1 (global 0/5
    master: Made 'worker' intercomm (through split): 0x55ffe68f5690
    #w: app process -1/-1 (global 2/5
    master: Made 'worker' intercomm (through split): 0x55bcb14196c0
    #w: app process -1/-1 (global 1/5
    master: Made 'worker' intercomm (through split): 0x55ae68a54690
    #w: app process -1/-1 (global 3/5
    master: Made 'worker' intercomm (through intercomm_create): 0x563ecc7da500
    #osp.mpi.master: processing/sending work item 0
    #osp.mpi.master: done work item, tag 0: N6ospray3mpi4work15SetLoadBalancerE
    #w: running MPI worker process 0/4 on pid 8468@dellrg
    #w: running MPI worker process 3/4 on pid 8471@dellrg
    #w: running MPI worker process 2/4 on pid 8470@dellrg
    #w: running MPI worker process 1/4 on pid 8469@dellrg
    Running on: dellrg /home/reynaldo/tmp/test 2019-12-06 09:30:49.403919
    #osp.mpi.master: processing/sending work item 1
    #ospray: trying to look up renderer type 'scivis' for the first time
    #osp.mpi.master: done work item, tag 1: N6ospray3mpi4work10NewObjectTINS_8RendererEEE
    #osp.mpi.master: processing/sending work item 2

So, what do you think is the problem?

I think the problem is that pvbatch is meant to run ParaView distributed, so it is likely sending conflicting MPI messages to the other ranks in that mpirun command. pvpython works because it does not send any MPI messages. I’m not sure there is an easy way around this, short of perhaps connecting to a separate ring of pvservers to run the data analysis distributed, or using OSPRay in a different MPI mode than offload.
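For anyone who wants to try the separate-ring workaround, it might look roughly like this. The port number is a placeholder, the env vars are my assumption for selecting the offload device in 1.8, and the script would need a `paraview.simple.Connect('localhost', 11111)` call added to reach the pvserver ring:

```shell
# Ring 1: pvpython driver plus OSPRay offload workers
# (env var names assumed from OSPRay 1.8; verify against your build)
export OSPRAY_LOAD_MODULES=mpi
export OSPRAY_DEFAULT_DEVICE=mpi_offload
mpiexec -np 1 pvpython SphereOspray.py : -np 4 ospray_mpi_worker --osp:mpi &

# Ring 2: a separate set of pvservers for distributed analysis,
# which the script connects to via Connect('localhost', 11111)
mpiexec -np 8 pvserver --server-port=11111
```

This keeps ParaView’s own MPI traffic inside the pvserver ring, away from the ranks that OSPRay’s offload device claims.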