vtkDebugLeaks mitigation with Catalyst

I have a very straightforward implementation of Catalyst rendering off-screen images on ORNL’s Summit.

Our code uses very simple multiblock structured grids with one block per processor. It is working beautifully for our 2D cases with ~8000 blocks. However, when we scale up to our production 3D case comprising ~40,000 blocks, Catalyst appears to hang during the image generation process (we are just saving two render views; one render view is written out, and then the code hangs before writing out the second). This also appears to happen with fewer 3D blocks. (For reference, the 2D blocks are on the order of 20x20 cells, whereas the 3D blocks are on the order of 30x30x30 cells.)
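For context, the adapter layout is essentially the following (a minimal sketch only, not the exact code from the linked repository; the function name, variable names, and the "input" channel name are assumptions):

#include <vtkCPDataDescription.h>
#include <vtkCPInputDataDescription.h>
#include <vtkMultiBlockDataSet.h>
#include <vtkSmartPointer.h>
#include <vtkStructuredGrid.h>

// Sketch: each MPI rank owns exactly one structured-grid block of a multiblock dataset.
void BuildGrid(vtkCPDataDescription* dataDescription, int rank, int nRanks,
               vtkStructuredGrid* localGrid) // the rank-local 20x20 or 30x30x30 block
{
  vtkSmartPointer<vtkMultiBlockDataSet> multiBlock =
    vtkSmartPointer<vtkMultiBlockDataSet>::New();
  multiBlock->SetNumberOfBlocks(nRanks);
  multiBlock->SetBlock(rank, localGrid); // only the local block is populated on this rank

  // "input" is the default Catalyst channel name; adjust if the adapter uses another.
  dataDescription->GetInputDescriptionByName("input")->SetGrid(multiBlock);
}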

The smaller 2D cases seem to run indefinitely with no issues, and memory usage is consistent throughout the simulation; however, VTK reports leaks detected after the simulation completes:

vtkDebugLeaks has detected LEAKS!
Class "vtkCellData" has 1 instance still around.
Class "vtkStructuredGrid" has 1 instance still around.
Class "vtkInformationVector" has 14 instances still around.
Class "vtkQuad" has 2 instances still around.
Class "vtkPointData" has 1 instance still around.
Class "vtkPoints" has 14 instances still around.
Class "vtkInformation" has 26 instances still around.
Class "vtkLine" has 6 instances still around.
Class "vtkInformationIntegerPointerValue" has 1 instance still around.
Class "vtkHexahedron" has 1 instance still around.
Class "vtkIdList" has 13 instances still around.
Class "vtkEmptyCell" has 1 instance still around.
Class "vtkDoubleArray" has 19 instances still around.
Class "vtkInformationIntegerValue" has 1 instance still around.
Class "vtkVertex" has 1 instance still around.
Class "vtkFloatArray" has 1 instance still around.
Class "vtkTriangle" has 2 instances still around.
Class "vtkFieldData" has 1 instance still around.
Class "vtkInformationDoubleVectorValue" has 20 instances still around.
Class "9vtkBufferIfE" has 1 instance still around.
Class "9vtkBufferIdE" has 19 instances still around.
Class "vtkCommand or subclass" has 1 instance still around.

I don’t know if this is the cause of the hanging runs in the large-scale case; however, I figured it should be addressed one way or another.

I am a bit confused about how to address this issue, as whenever I call ->Delete() on any of the constructed objects I get invalid memory references.

My adapter can be found here https://github.com/kaschau/catalyst_test

I am using ParaView 5.6.1 built with the XL compilers, with EGL off-screen rendering.

Thanks so much for the help.

Kyle

Hi Kyle,

After a real quick look, you’ll want to change

vtkSmartPointer<vtkPoints> points = vtkPoints::New();

to

vtkSmartPointer<vtkPoints> points = vtkSmartPointer<vtkPoints>::New();

Your vtkStructuredGrid is leaking as well. I recommend creating it as a smart pointer too; you can still set it as a block just as you are doing now.
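To spell out why the first form leaks: vtkPoints::New() returns an object whose reference count is already 1, and assigning that raw pointer to a vtkSmartPointer registers a second reference, so the count never drops back to zero. With the smart pointer owning the object there is no need for a manual ->Delete(), and adding one on top of a smart pointer (or after handing the grid to the multiblock dataset) can remove one reference too many, which typically shows up as the invalid memory references you mentioned. Here is a minimal sketch of the leak-free pattern (placeholder names, not the exact adapter code):

#include <vtkMultiBlockDataSet.h>
#include <vtkPoints.h>
#include <vtkSmartPointer.h>
#include <vtkStructuredGrid.h>

// Sketch of the leak-free pattern (placeholder names, not the exact adapter code).
void AddLocalBlock(vtkMultiBlockDataSet* multiBlock, int rank)
{
  // vtkSmartPointer<T>::New() creates the object and holds its only reference,
  // so no manual Delete() is needed (and none should be added).
  vtkSmartPointer<vtkPoints> points = vtkSmartPointer<vtkPoints>::New();
  vtkSmartPointer<vtkStructuredGrid> grid = vtkSmartPointer<vtkStructuredGrid>::New();
  grid->SetPoints(points);

  // SetBlock() registers its own reference, so the grid stays alive inside the
  // multiblock dataset after the local smart pointers go out of scope.
  multiBlock->SetBlock(rank, grid);
}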

Those changes should take care of the memory leaks, but it’s not clear that will help with the 3D blocks. Is there any chance you can attach a debugger to one of the processes and pause execution when it appears to hang to see where the hang might be occurring?

Hi Cory,

Thank you for the reply. The changes to smart pointers did the trick! I will do my best with the debugger, or at least try to catch a core dump, as Summit is pretty good about that.

You’ll have to bear with me, as this only comes up with many blocks (it doesn’t show up in a case with similarly sized blocks on the order of 100 blocks), and it takes a long time to get a simulation that size through the queue. I will be in touch!

Hi Cory,

I was finally able to attach to a process when the hang happened. Here is the back trace:

#0 0x0000200020de8464 in PAMI_Context_advancev ()
from /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/xl-16.1.1-3/spectrum-mpi-10.3.0.1-20190611-aqjt3jo53mogrrhcrd2iufr435azcaha/lib/pami_port/libpami.so.3

#1 0x0000200021407624 in LIBCOLL_Advance_pami (context=) at api.cc:87

#2 0x00002000214018d8 in LIBCOLL_Advance (ctxt=) at libcoll.cc:133

#3 0x000020002130cd58 in start_libcoll_blocking_collective ()
from /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/xl-16.1.1-3/spectrum-mpi-10.3.0.1-20190611-aqjt3jo53mogrrhcrd2iufr435azcaha/lib/spectrum_mpi/mca_coll_ibm.so

#4 0x00002000213116d4 in mca_coll_ibm_barrier ()
from /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/xl-16.1.1-3/spectrum-mpi-10.3.0.1-20190611-aqjt3jo53mogrrhcrd2iufr435azcaha/lib/spectrum_mpi/mca_coll_ibm.so

#5 0x000020000053352c in PMPI_Barrier () from /opt/ibm/spectrum_mpi/jsm_pmix/…/lib/libmpi_ibm.so.3

#6 0x0000200003c42334 in vtkMPICommunicatorDebugBarrier (handle=0x3d1d3b50)
at /ccs/home/kschau/software/ParaView/VTK/Parallel/MPI/vtkMPICommunicator.cxx:39

#7 0x0000200003c44774 in vtkMPICommunicator::BroadcastVoidArray (this=0x3d1d3aa0, data=0x7ffff72073a8, length=1, type=11, root=0)
at /ccs/home/kschau/software/ParaView/VTK/Parallel/MPI/vtkMPICommunicator.cxx:1157

#8 0x0000200001ba555c in vtkCommunicator::Broadcast (this=0x3d1d3aa0, data=0x7ffff72073a8, length=1, srcProcessId=0)
at /ccs/home/kschau/software/ParaView/VTK/Parallel/Core/vtkCommunicator.h:339

#9 0x0000200001bc9d68 in vtkMultiProcessController::Broadcast (this=0x3d1d3800, data=0x7ffff72073a8, length=1, srcProcessId=0)
at /ccs/home/kschau/software/ParaView/VTK/Parallel/Core/vtkMultiProcessController.h:507

#10 0x0000200043760460 in PyvtkMultiProcessController_Broadcast_s4 (self=0x2000e1feb130, args=0x2000e1fe7910)
at /ccs/home/kschau/software/ParaView-v5.6.1-XL/build/VTK/Wrapping/Python/vtkMultiProcessControllerPython.cxx:2381

#11 0x0000200008602b1c in vtkPythonOverload::CallMethod (methods=0x2000437d64a0 <_$STATIC+8096>, self=0x2000e1feb130, args=0x2000e1fe7910)
at /ccs/home/kschau/software/ParaView/VTK/Wrapping/PythonCore/vtkPythonOverload.cxx:884

#12 0x0000200043767498 in PyvtkMultiProcessController_Broadcast (self=0x2000e1feb130, args=0x2000e1fe7910)
at /ccs/home/kschau/software/ParaView-v5.6.1-XL/build/VTK/Wrapping/Python/vtkMultiProcessControllerPython.cxx:2554

#13 0x00002000081749b8 in _PyMethodDef_RawFastCallKeywords (method=, self=0x2000e1feb130, args=,
nargs=, kwnames=0x0) at Objects/call.c:698

#14 0x0000200008174b58 in _PyCFunction_FastCallKeywords (func=0x2000e1fe0820, args=, nargs=,
kwnames=) at Objects/call.c:734

#15 0x000020000813da50 in call_function (kwnames=0x0, oparg=, pp_stack=) at Python/ceval.c:4568

#16 _PyEval_EvalFrameDefault (f=0x51830680, throwflag=) at Python/ceval.c:3093

#17 0x00002000082b5514 in PyEval_EvalFrameEx (f=, throwflag=) at Python/ceval.c:547

#18 0x00002000082b622c in _PyEval_EvalCodeWithName (_co=0x20004d1de6f0, globals=, locals=,
args=, argcount=2, kwnames=, kwargs=, kwcount=3, kwstep=1, defs=0x20004d1e5108, defcount=3,
kwdefs=0x0, closure=0x0, name=0x20004d208670, qualname=0x20004d1d6350) at Python/ceval.c:3930

#19 0x00002000081741cc in _PyFunction_FastCallKeywords (func=, stack=, nargs=,
kwnames=) at Objects/call.c:433

#20 0x000020000813e0dc in call_function (kwnames=, oparg=, pp_stack=) at Python/ceval.c:4616

#21 _PyEval_EvalFrameDefault (f=0x20004d1c3d00, throwflag=) at Python/ceval.c:3139

#22 0x00002000082b5514 in PyEval_EvalFrameEx (f=, throwflag=) at Python/ceval.c:547

#23 0x0000200008132184 in function_code_fastcall (co=, args=, nargs=, globals=)
at Objects/call.c:283

#24 0x000020000813d870 in call_function (kwnames=0x0, oparg=, pp_stack=) at Python/ceval.c:4616

#25 _PyEval_EvalFrameDefault (f=0x20004d1e7050, throwflag=) at Python/ceval.c:3093

#26 0x00002000082b5514 in PyEval_EvalFrameEx (f=, throwflag=) at Python/ceval.c:547

#27 0x00002000082b622c in _PyEval_EvalCodeWithName (_co=0x200041a88420, globals=, locals=,
args=, argcount=0, kwnames=, kwargs=, kwcount=0, kwstep=2, defs=0x0, defcount=0,
kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:3930

#28 0x00002000082b63bc in PyEval_EvalCodeEx (_co=, globals=, locals=, args=,
argcount=, kws=, kwcount=, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0)
at Python/ceval.c:3959

#29 0x00002000082b6428 in PyEval_EvalCode (co=, globals=, locals=) at Python/ceval.c:524

#30 0x0000200008300ca0 in run_mod (arena=0x20004cddefd0, flags=, locals=0x200041a427d0, globals=0x200041a427d0,
filename=0x200041a8e970, mod=) at Python/pythonrun.c:1035

#31 PyRun_StringFlags (
str=0x57ac3418 "dataDescription = vtkPVCatalyst.vtkCPDataDescription('40289c90')\ncoproc.DoCoProcessing(dataDescription)\n",
start=, globals=0x200041a427d0, locals=0x200041a427d0, flags=) at Python/pythonrun.c:959

#32 0x0000200008300dac in PyRun_SimpleStringFlags (
command=0x57ac3418 "dataDescription = vtkPVCatalyst.vtkCPDataDescription('40289c90')\ncoproc.DoCoProcessing(dataDescription)\n",
flags=0x0) at Python/pythonrun.c:455

#33 0x000020000808cb30 in vtkPythonInterpreter::RunSimpleString (
script=0x57d93498 "dataDescription = vtkPVCatalyst.vtkCPDataDescription('40289c90')\ncoproc.DoCoProcessing(dataDescription)\n")
at /ccs/home/kschau/software/ParaView/VTK/Utilities/PythonInterpreter/vtkPythonInterpreter.cxx:477

#34 0x0000200000e5769c in vtkCPPythonScriptPipeline::CoProcess (this=0x40289de0, dataDescription=0x40289c90)
at /ccs/home/kschau/software/ParaView/CoProcessing/PythonCatalyst/vtkCPPythonScriptPipeline.cxx:264

#35 0x0000200000ebe44c in vtkCPProcessor::CoProcess (this=0x3c751780, dataDescription=0x40289c90)
at /ccs/home/kschau/software/ParaView/CoProcessing/Catalyst/vtkCPProcessor.cxx:310

#36 0x0000200000e9ce44 in vtkCPAdaptorAPI::CoProcess () at /ccs/home/kschau/software/ParaView/CoProcessing/Catalyst/vtkCPAdaptorAPI.cxx:162

#37 0x0000200000ec4fac in coprocess_ () at /ccs/home/kschau/software/ParaView/CoProcessing/Catalyst/CAdaptorAPI.cxx:50

#38 0x000000001000bc5c in arch_out_coproc () at /gpfs/alpine/scratch/kschau/chm139/mxl/DTMS-3D/Usr/arch_out_coproc.f90:133

#39 0x0000000010045370 in gusr (strio='postdt') at /gpfs/alpine/scratch/kschau/chm139/mxl/DTMS-3D/Usr/gusr.f90:174

#40 0x00000000100ca328 in dtms () at /gpfs/alpine/scratch/kschau/chm139/mxl/DTMS-3D/Src/DTMS.f90:548

Since this looks like an MPI issue and I am using my system install (through pip) of mpi4py, I am going to try having ParaView build mpi4py on its own and see if that helps as a first, easy step.

For some more context: after paying more attention to the hanging process, there are a few updates:

1.) It is inconsistent as to exactly when it hangs, i.e., sometimes it happens after 20 iterations, sometimes after 200, but it always happens.

2.) It actually DOES occur with the smaller 2D cases; it just takes much longer to occur with the smaller cases.

I am using a self-installed Python 3.6 as the Python library and allowing vtkmpi4py to be built with the ParaView build. I have built with PARAVIEW_USE_MPISSEND both on and off with no effect. Thanks again for your time.

Hi Kyle,

Happy New Year!

Thanks for attaching the trace. It looks like the hang is occurring at an MPI broadcast operation. There is likely an MPI collective communication call mismatch somewhere, either within ParaView itself or in your script that uses mpi4py. We have encountered this kind of hang several times when some section of code is skipped because there is no data on an MPI process.

You may want to examine your Python script and ensure that all MPI collective calls are executed on all processes. If that looks good, feel free to attach your Python script so we can take a look. That will also help us see if there is an internal ParaView bug.
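As an illustration of the kind of mismatch meant here (a contrived sketch, not code from ParaView or from your adapter): if a collective such as a broadcast is only reached on ranks that happen to have data, the remaining ranks never enter the call and everyone blocks, which matches a hang inside vtkMPICommunicator::BroadcastVoidArray.

#include <mpi.h>

// Contrived example of a collective-call mismatch (not ParaView code): ranks
// that skip the branch never reach MPI_Bcast, so the ranks that do reach it
// wait forever -- the same kind of hang seen in the backtrace.
void BroadcastExtent(double* extent, bool rankHasData)
{
  if (rankHasData) // BUG: MPI collectives must be called on every rank in the communicator
  {
    MPI_Bcast(extent, 6, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  }
  // FIX: call MPI_Bcast unconditionally, outside the data-dependent branch.
}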

Hello Cory!

Happy New Year to you as well.

I actually do not use mpi4py in anything I create; my Python code is generated purely from the ParaView GUI through the Catalyst exporter, from render views, polydata, etc., on a representative data set.

The script that I am using can be found here https://github.com/kaschau/catalyst_test/blob/master/coproc.py

It just saves out two render views as PNG files.

I am not sure where else to look for mpi4py usage on my end… it’s all a pretty simple implementation.

Thanks again for your help,

Kyle

Okay, thanks for clarifying that you don’t use mpi4py. That points the finger at ParaView itself.

Looking at the coprocessing Python module, as best I can tell your backtrace involves the call here. Now the question is, where are the other MPI processes stuck? If it is possible to get a stack trace on one of those processes, then we can see where things go awry. I know that isn’t the easiest or quickest thing to do, but without it, finding the communications mismatch is a bit like looking for a needle in a haystack.

Thanks Cory, I will give it a go and see if I get lucky!

Hey Cory,

I was able to get a backtrace on ~65% of the 8000 processes running on the case (unfortunately, my wall time ran out before I could gather them all), but hopefully this is at least helpful.

So here is the breakdown (just as a refresher: this hanging is very consistent and ALWAYS occurs after successfully saving out one image, but before saving out the second).

A majority of the processes (4600 out of 5200) are hung here -> majority, which is in the IceT library.

A minority of the processes (600 out of 5200) are hung here -> minority, which is in vtkMPICommunicator.

I have all the backtraces, but these seem pretty representative.

Thanks again for your help, and let me know what else I can do to help.

Cory,

As a workaround for this, I am trying to play with MergeBlocks, hoping that will move my slice data to a head node so there is no need for MPI communication when saving the images.

However, in my tests saving out the data (a .vtm or .pvtu) from a MergeBlocks filter, I still get many separate files instead of the single file I would expect.

Do you know what the expected behavior of MergeBlocks is in this regard?

Thanks