Catalyst2 severe performance degradation

I find that using Catalyst2/MeshBlueprint incurs significant overhead each timestep. This is a big change w.r.t. the first Catalyst, where the VTK-based (mapped) grids could be re-used, with only the field data changing.

To give an idea: when I run a small test case (only 4000 cells on 8 processes) for 100 time steps without running the actual simulation, only the in-situ analysis, I get the wallclock runtimes below. The analysis pipeline consists of a pressure contour and writing data to file. The grid consists purely of polyhedra.

Catalyst with mapped grids: 18 seconds
Catalyst2 with mesh blueprint: 1112 seconds

About 60-fold difference in execution time!

Just to be absolutely clear, 99+% of the runtime is spent in the call to catalyst_execute, AFTER my code has set up the complete mesh blueprint node structure!
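
For context, the per-timestep pattern is roughly the sketch below. This is a simplified illustration rather than the actual adaptor code: the channel name ("grid"), the field name, and the array variables are placeholders, and the polyhedral blueprint paths and the exact set_external overloads should be checked against the Conduit Mesh Blueprint documentation and the catalyst_conduit headers.

```cpp
#include <catalyst.hpp>  // conduit_cpp::Node plus the catalyst_* C entry points
#include <cstdint>
#include <iostream>
#include <vector>

// One in-situ step: describe the polyhedral mesh with the Mesh Blueprint and
// hand it to Catalyst.  All arrays are owned by the simulation; set_external
// only wraps them, so building this node is cheap compared to what happens
// inside catalyst_execute.
void coprocess(int timestep, double time,
               std::vector<double>& x, std::vector<double>& y, std::vector<double>& z,
               std::vector<int64_t>& cell_faces,   // face ids per cell (flat)
               std::vector<int64_t>& cell_sizes,   // number of faces per cell
               std::vector<int64_t>& cell_offsets, // start of each cell in cell_faces
               std::vector<int64_t>& face_points,  // point ids per face (flat)
               std::vector<int64_t>& face_sizes,   // number of points per face
               std::vector<int64_t>& face_offsets, // start of each face in face_points
               std::vector<double>& pressure)      // one value per cell
{
  conduit_cpp::Node node;
  node["catalyst/state/timestep"].set(timestep);
  node["catalyst/state/time"].set(time);
  node["catalyst/channels/grid/type"].set("mesh");

  auto mesh = node["catalyst/channels/grid/data"];

  // Explicit coordinates.
  mesh["coordsets/coords/type"].set("explicit");
  mesh["coordsets/coords/values/x"].set_external(x.data(), x.size());
  mesh["coordsets/coords/values/y"].set_external(y.data(), y.size());
  mesh["coordsets/coords/values/z"].set_external(z.data(), z.size());

  // Polyhedral topology: cells reference faces, faces reference points.
  mesh["topologies/mesh/type"].set("unstructured");
  mesh["topologies/mesh/coordset"].set("coords");
  mesh["topologies/mesh/elements/shape"].set("polyhedral");
  mesh["topologies/mesh/elements/connectivity"].set_external(cell_faces.data(), cell_faces.size());
  mesh["topologies/mesh/elements/sizes"].set_external(cell_sizes.data(), cell_sizes.size());
  mesh["topologies/mesh/elements/offsets"].set_external(cell_offsets.data(), cell_offsets.size());
  mesh["topologies/mesh/subelements/shape"].set("polygonal");
  mesh["topologies/mesh/subelements/connectivity"].set_external(face_points.data(), face_points.size());
  mesh["topologies/mesh/subelements/sizes"].set_external(face_sizes.data(), face_sizes.size());
  mesh["topologies/mesh/subelements/offsets"].set_external(face_offsets.data(), face_offsets.size());

  // Cell-centered pressure field used by the contour pipeline.
  mesh["fields/pressure/association"].set("element");
  mesh["fields/pressure/topology"].set("mesh");
  mesh["fields/pressure/values"].set_external(pressure.data(), pressure.size());

  if (catalyst_execute(conduit_cpp::c_node(&node)) != catalyst_status_ok)
  {
    std::cerr << "catalyst_execute failed at timestep " << timestep << "\n";
  }
}
```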

Am I doing something wrong? Are other users of Catalyst2/MeshBlueprint seeing this too? Do you have any tips on how to further analyse what is going on, and/or how to improve the performance?

My suspicion is that vtkConduitSource::RequestData is where most of the time is being spent. More timing information from that would help. Alternatively, use Catalyst's data-dump capability to generate a data dump and send that over. Reproducing the issue with the smallest number of ranks and timesteps would be best, as that makes debugging easiest.
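
In the meantime, a quick way to get per-timestep numbers on the simulation side is to wrap the Catalyst call in a timer. A minimal sketch (the helper name is made up):

```cpp
#include <chrono>
#include <cstdio>
#include <functional>

// Hypothetical helper: time one in-situ step on the simulation side and print
// the result, so the per-timestep trend becomes visible.
double timed_catalyst_step(int timestep, const std::function<void()>& do_catalyst_execute)
{
  const auto start = std::chrono::steady_clock::now();
  do_catalyst_execute();  // wraps the call to catalyst_execute(...)
  const auto stop = std::chrono::steady_clock::now();
  const double seconds = std::chrono::duration<double>(stop - start).count();
  std::printf("timestep %d: catalyst_execute took %.3f s\n", timestep, seconds);
  return seconds;
}
```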

This also points to a likely reason for some performance degradation. Currently, the mesh is re-created each timestep, since there is no way for ParaView-Catalyst to know that the mesh is static and only the fields are changing. I don’t see any reason we couldn’t add that. Can you describe your schema a little more, so we can determine a good way to add support for static meshes?

The mesh may be completely static, or have a static topology with (some) moving vertices, or it may change completely, for example in the case of automatic mesh refinement or a change in load balancing. I think those three cases would need to be supported; there may be more that I’m not aware of, of course.

I think that in addition to the current protocol (initialize/execute/finalize) more fine-grained control is needed.

Just to get the discussion going, maybe something along the lines of the following (a rough signature sketch follows the list):

  • initialize_catalyst - initialize the scripts, pipelines etc. Runs once at the start of the in-situ analysis.

  • initialize_meshes - initialize the meshes; call this to build the meshes from scratch at the beginning of the simulation, and again whenever the mesh changes topologically (e.g. due to AMR). This sets up the channel(s) according to the conduit/MeshBlueprint protocol.

  • move_meshes - mesh is topologically identical, but (some) vertices have moved. This is needed if the vertices are copied, rather than externally referenced.

  • execute - set timestep, time, and parameters, and run the scripts/pipelines based on the existing meshes and pipeline(s). As much existing data as possible is preserved between calls.

  • finalize_meshes - finalize the meshes. Can be followed by a call to initialize_meshes when the meshes have changed topology.

  • finalize_catalyst - finalize prior to exiting the program. Is called only once.
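
Purely to make the shape of that proposal concrete, here is a hypothetical sketch of those entry points, mirroring the style of the existing catalyst_initialize/catalyst_execute/catalyst_finalize API. None of these functions exist today; the names come from the list above and the signatures are only an illustration.

```cpp
#include <catalyst.h>  /* for conduit_node and enum catalyst_status */

/* Hypothetical, for discussion only -- not an existing API. */
enum catalyst_status initialize_catalyst(conduit_node* params); /* scripts, pipelines; once at startup */
enum catalyst_status initialize_meshes(conduit_node* meshes);   /* full blueprint; re-run on topology change */
enum catalyst_status move_meshes(conduit_node* coords);         /* same topology, updated vertex positions */
enum catalyst_status execute(conduit_node* state);              /* time, timestep, parameters */
enum catalyst_status finalize_meshes(conduit_node* params);     /* may be followed by initialize_meshes */
enum catalyst_status finalize_catalyst(conduit_node* params);   /* once, before program exit */
```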

While that’s definitely a workable approach, I am not sure it’s necessary. catalyst_execute should always be given the full node structure each timestep. I think this is perhaps the only way to easily validate the incoming conduit_node using the existing blueprint::verify logic. Plus, if a node didn’t change between timesteps, the sim can always just re-use the same conduit_node it constructed previously; there really shouldn’t be any extra burden on the sim side for this.

We just need to extend the scheme to add some indication of which nodes have not changed between timesteps, unless there’s already an existing way of figuring that out.
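
For example, a minimal sketch of that re-use pattern (assuming the blueprint tree was built once with set_external wrapping simulation-owned arrays, as in the earlier sketch; helper and variable names are placeholders):

```cpp
#include <catalyst.hpp>
#include <vector>

void advance_simulation(std::vector<double>& pressure);  // placeholder for the real solver step

// 'exec_params' was populated once up front: coordsets, topologies, and fields
// all point at simulation-owned arrays via set_external.  Each timestep only
// the state is updated and the very same node is handed to catalyst_execute;
// nothing is rebuilt on the simulation side.
void run_in_situ(conduit_cpp::Node& exec_params, std::vector<double>& pressure,
                 int num_timesteps, double dt)
{
  for (int ts = 0; ts < num_timesteps; ++ts)
  {
    advance_simulation(pressure);  // writes into the same buffer the node already wraps
    exec_params["catalyst/state/timestep"].set(ts);
    exec_params["catalyst/state/time"].set(ts * dt);
    catalyst_execute(conduit_cpp::c_node(&exec_params));
  }
}
```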

TL;DR: found my mistake. 100 timesteps with Catalyst2 is now 25 seconds wallclock time, very acceptable.

More details:
I was looking at the timings of the pipeline execution, and I noticed this:

[figure: plot of pipeline execution time per timestep]

x-axis: timestep, y-axis: execution time.

From analysis with valgrind’s callgrind tool, I also noticed that the number of writes to file was way off.

Finally I found my mistake: I created 3 extractors at every pipeline execution, and it turns out these extractors do not get destroyed afterwards. So in timestep 1 there are 3 extractors, then 6, then 9, then 12, and so on; after 100 timesteps, there are 300 extractors. :person_facepalming:

Good to hear! Yes, extractors are pipeline objects that stick around, just like other pipeline objects such as sources and filters.

I suppose there’s still something to be said for adding support for static meshes, but that can be delayed. For polyhedral meshes especially, something like this could further speed up the conduit-to-VTK conversion and thus obviate the need for any static-mesh indicators.

Thanks for your help, and yes, I read about that proposal to update the element storage in vtkCellArray for polyhedra. Is that something I can help with?


Of course, code contributions are always welcome. As far as making any progress on that on my end, it’s really on the back burner for now. None of the active projects we have is keen on prioritizing that task at this time. Personally, I’d love to get to it since I always enjoy doing such cleanups.
