ospray backend with mpi offload

gilcu2 · November 25, 2019, 11:51am

Hello, how do I launch pvbatch using the ospray backend in parallel with mpi?
I am trying something like this:

    mpirun -ppn 1 -host localhost VTKOSPRAY_ARGS="--osp:mpi" \
     ./pvbatch script.py : -host n1, n2 ./ospray_mpi_worker --osp:mpi

but that way paraview doesn’t connect to the ospray_mpi_worker.
Thanks
Reynaldo

mwestphal · November 25, 2019, 12:10pm

@Dave_DeMarle

Dave_DeMarle · November 25, 2019, 2:39pm

With ospray 1.8 / paraview 5.7 the command line arguments have been replaced by ospray specific environment variables. I’ll post an example in a bit.

Dave_DeMarle · November 25, 2019, 11:33pm

Something like this should work:

    export OSPRAY_LOAD_MODULES=mpi
    export OSPRAY_DEFAULT_DEVICE=mpi
    mpirun -n 1 -ppn 1 pvbatch : -n 7 ospray_mpi_worker
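
For scripted launches, the same setup can be sketched in Python. This is only a minimal sketch: it assumes `mpirun`, the application and `ospray_mpi_worker` are on `PATH`, the helper name `build_offload_launch` and the `script.py` argument are hypothetical, and the launcher-specific `-ppn` flag is left out.

```python
import os


def build_offload_launch(app, script, n_workers, launcher="mpirun"):
    """Return (argv, env) for one application rank plus OSPRay worker ranks."""
    env = dict(os.environ)
    # Environment variables named in the post above; they take the place of
    # the older --osp: command line arguments.
    env["OSPRAY_LOAD_MODULES"] = "mpi"
    env["OSPRAY_DEFAULT_DEVICE"] = "mpi"
    # MPMD syntax: "<launcher> -n 1 app script : -n N ospray_mpi_worker"
    argv = [launcher, "-n", "1", app, script,
            ":", "-n", str(n_workers), "ospray_mpi_worker"]
    return argv, env


if __name__ == "__main__":
    argv, env = build_offload_launch("pvbatch", "script.py", 7)
    print(" ".join(argv))
```

The returned pair can be handed to `subprocess.run(argv, env=env)` to perform the actual launch.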

carsonsbrownlee · November 26, 2019, 9:49am

That should work, though if you run into mpi errors you may want to try a pvbatch build without mpi or use pvpython. Can we get this added to the paraview docs, or at least updated in the headers?

gilcu2 · November 26, 2019, 10:09am

I am getting a segmentation fault in both ospray_mpi_worker and pvbatch. I suppose it is because I am using an ospray compiled with the system mpi, while paraview is the downloaded osmesa binary. I am now trying to compile paraview with the system mpi.
Error output attached.
Thanks
Reynaldo
pvM0.e69290 (9.3 KB)

gilcu2 · November 29, 2019, 8:19am

I have compiled pv-5.7.0 and ospray-1.8.5 against the same mpi version and used the following simple python program to test:

    from paraview.simple import *

    # Create a simple test source and a render view
    model = Sphere()
    render_view = GetActiveViewOrCreate('RenderView')

    # Switch the view to the OSPRay ray tracing backend
    render_view.EnableRayTracing = 1
    render_view.BackEnd = 'OSPRay scivis'
    render_view.Shadows = 1
    render_view.SamplesPerPixel = 32
    render_view.LightScale = 0.75

    Show(model, render_view)
    render_view.Update()
    SaveScreenshot("sphere.png", render_view)

I set:

    export OSPRAY_LOG_OUTPUT=cout
    export OSPRAY_LOG_LEVEL=9
    export OSPRAY_DEFAULT_DEVICE="mpi"
    export OSPRAY_LOAD_MODULES="mpi"

and ran:

    mpiexec -np 1 pvbatch ~/ukaea/prog/viscli/src/Sphere.py : -n 1 strace ospray_mpi_worker --osp:mpi

and it never finishes. If I trace any of the processes, they poll forever:

    poll([{fd=5, events=POLLIN}, {fd=17, events=POLLIN}], 2, 0) = 0 (Timeout)
    poll([{fd=5, events=POLLIN}, {fd=17, events=POLLIN}], 2, 0) = 0 (Timeout)

I did a similar experiment with vtk and ospray and it worked.
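
A stall like this can be surfaced automatically by running the launch under a timeout instead of waiting on it by hand. A minimal sketch using only the Python standard library; the helper name `run_with_timeout` and the time limits are illustrative assumptions, and the sleeping child merely stands in for the real `mpiexec` command line.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s=60.0, env=None):
    """Run argv; return its exit code, or None if it did not finish in time."""
    try:
        done = subprocess.run(argv, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child on timeout; ranks spawned by
        # an MPI launcher may still need separate cleanup.
        return None
    return done.returncode


if __name__ == "__main__":
    # Stand-in for the hanging launch: a child process that sleeps too long.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(hung, timeout_s=0.5))
```

A `None` result then separates "polling forever" from a run that exits with an error code.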

carsonsbrownlee · December 5, 2019, 11:39am

I was able to run your script locally on my mac using 5.7 and ospray 1.8.5. I did notice that paraview does not seem to exit ospray correctly and reports some errors, but it still ran through the script correctly. Is the stall you are seeing after it exports sphere.png? Another thought: the stall could be hiding some other error, possibly a linking issue finding module_mpi. Are you able to run that same script loading the mpi module, but without running the mpi device?

gilcu2 · December 5, 2019, 2:42pm

Hi, what happens for me is just that paraview and ospray_mpi_worker poll forever. The script is never executed.
Thanks

carsonsbrownlee · December 5, 2019, 8:06pm

It’s going to be tricky to debug this issue from my end unfortunately since I can’t reproduce it locally. My hunch is that your error output is actually caused by a linking issue.
To help narrow this down, can you please try running that same script without mpi? Keep the env var that loads the mpi module, but take out the env flag that loads the mpi device. If that fails, try just running it without the mpi module at all.
If it succeeds, could you try running ospExampleViewer with the same mpi split launch you used for paraview?
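
The configurations in these steps differ only in which ospray environment variables are set, so they can be spelled out side by side. A minimal sketch; the function name and the labels are hypothetical, and only the two variables already used in this thread are assumed.

```python
import os

OSPRAY_VARS = ("OSPRAY_LOAD_MODULES", "OSPRAY_DEFAULT_DEVICE")


def debug_environments(base=None):
    """Yield (label, env) pairs, from full MPI offload down to no MPI module."""
    base = dict(os.environ if base is None else base)
    for name in OSPRAY_VARS:
        base.pop(name, None)  # start every variant from a clean slate
    # Original setup: mpi module loaded and mpi device selected.
    yield "module+device", {**base, "OSPRAY_LOAD_MODULES": "mpi",
                            "OSPRAY_DEFAULT_DEVICE": "mpi"}
    # First test: keep the module, drop the device.
    yield "module only", {**base, "OSPRAY_LOAD_MODULES": "mpi"}
    # Fallback test: no mpi module at all.
    yield "no mpi", dict(base)


if __name__ == "__main__":
    for label, env in debug_environments({}):
        print(label, {k: env[k] for k in OSPRAY_VARS if k in env})
```

Each environment would be passed to a plain, non-mpirun launch of the script, for example through the `env` argument of `subprocess.run`.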

carsonsbrownlee · December 5, 2019, 8:11pm

Also, did you try my earlier suggestion of trying pvpython instead of pvbatch? This avoids linking and running any mpi code from paraview’s side.

gilcu2 · December 6, 2019, 9:34am

Hi, sorry I didn’t try that before. Yes, with pvpython it runs without problems.
ospExampleViewer also works perfectly.

    mpiexec -np 1 pvpython ~/ukaea/prog/viscli/src/SphereOspray.py : -n 4 ospray_mpi_worker --osp:mpi

    #o: initMPI::OSPonRanks: 1/5
    #o: initMPI::OSPonRanks: 4/5
    #o: initMPI::OSPonRanks: 0/5
    #o: initMPI::OSPonRanks: 2/5
    #o: initMPI::OSPonRanks: 3/5
    master: Made 'worker' intercomm (through split): 0x55f638663690
    #w: app process -1/-1 (global 4/5
    #w: app process 0/1 (global 0/5
    master: Made 'worker' intercomm (through split): 0x55ffe68f5690
    #w: app process -1/-1 (global 2/5
    master: Made 'worker' intercomm (through split): 0x55bcb14196c0
    #w: app process -1/-1 (global 1/5
    master: Made 'worker' intercomm (through split): 0x55ae68a54690
    #w: app process -1/-1 (global 3/5
    master: Made 'worker' intercomm (through intercomm_create): 0x563ecc7da500
    #osp.mpi.master: processing/sending work item 0
    #osp.mpi.master: done work item, tag 0: N6ospray3mpi4work15SetLoadBalancerE
    #w: running MPI worker process 0/4 on pid 8468@dellrg
    #w: running MPI worker process 3/4 on pid 8471@dellrg
    #w: running MPI worker process 2/4 on pid 8470@dellrg
    #w: running MPI worker process 1/4 on pid 8469@dellrg
    Running on: dellrg /home/reynaldo/tmp/test 2019-12-06 09:30:49.403919
    #osp.mpi.master: processing/sending work item 1
    #ospray: trying to look up renderer type 'scivis' for the first time
    #osp.mpi.master: done work item, tag 1: N6ospray3mpi4work10NewObjectTINS_8RendererEEE
    #osp.mpi.master: processing/sending work item 2
    …

So, what do you think is the problem?
Thanks

carsonsbrownlee · December 6, 2019, 11:28pm

I think the problem is that pvbatch is meant to run paraview distributed and is likely trying to send conflicting mpi messages to the other ranks in that mpirun command. pvpython works because it does not send any mpi messages. I’m not sure there is an easy way around this, short of perhaps connecting to a separate ring of pvservers to run some data analysis distributed, or using ospray in a different mpi mode than offload.