Various questions, issues, and feedback migrating from legacy Catalyst to Catalyst2.

Hello, I’ve mostly finalized the transition from legacy Catalyst to Catalyst 2 in the EDF’s code_saturne (code-saturne.org) CFD code.

Last tests I ran were based on ParaView 6.0.0-603-gad0a153f2a (In Git, commit ad0a153f2, master branch as of 2025-09-08).

There are quite a few nice things with this, especially the possibility of switching from one Catalyst implementation to another without rebuilding the whole client code, and debugging features such as Catalyst replay. But there are also areas where things do not seem as robust or feature-complete as with the legacy Catalyst implementation, at least for our workflows using unstructured meshes with mixed element types. So here are my questions and issues.

Minor issues regarding the Conduit mesh blueprint.

  • According to the conduit Mesh blueprint documentation, in the “shape_map” section, it would seem that for mixed element meshes, element type ids do not need to match those of VTK, which seems to be mentioned more as an example than a compulsory choice. I would assume the provided typemap only needs to be consistent with the Conduit shape names. In practice, the ParaView catalyst implementation generates an error if I use type ids different from those of VTK, so it would seem the shape map is not really honored, and VTK shapes are always assumed.

    • If I am wrong and Conduit really expects VTK shape ids, then that would be an issue with the Conduit documentation, as it should then list available types, and could remove the comments on element windings not being defined.
  • For mixed element type meshes, Conduit allows either using a “mixed” element type, which requires defining offsets and element type sizes for each element type, or simply using multiple topologies. But the documentation does not mention that with multiple topologies, a field defined over the mesh needs to have a different name on each topology. This is mentioned briefly in Documentation: a mixed topology example would be useful · Issue #927 · LLNL/conduit · GitHub, but should be more explicit in the documentation. Adding a field key such as “name” or “display_name” (mentioned as being used by VisIt in some examples) could allow handling fields with multiple topologies without requiring the full mixed element type info. Whatever the choice, I believe Conduit should either handle fields with a single name defined over multiple topologies, or avoid mentioning multiple topologies as a solution for mixed element types and accept only the “mixed” shape definitions.

  • The Conduit documentation provides examples for vector fields, but none for tensors. How should components be named?

  • In the Conduit mesh blueprint, polyhedra faces may be shared between polyhedral cells (as in the example), but no orientation information is provided, so it is not easy to determine whether a face’s normal points inwards or outwards for a given polyhedron. With most other models I know of, either the face is duplicated (for a “nodal” definition), or this information is specified. In code_saturne, we have a similar intermediate representation when exporting polyhedra, but use 1-based indexing, and a sign to specify orientation. I do not know if this lack of info (in the Conduit Blueprint model itself) can be an issue for some VTK algorithms…
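To make the shape_map question above concrete, here is a minimal sketch of a mixed-element Blueprint topology, written with plain Python dicts standing in for the Conduit node tree (node names follow my reading of the Blueprint docs; the data is hypothetical, and the ids in shape_map are the VTK cell type ids that ParaView currently appears to require):

```python
# Plain Python dicts standing in for the Conduit node tree of a "mixed"
# unstructured topology (names per the Blueprint docs; hypothetical data).
# shape_map in principle maps shape names to arbitrary ids, but in practice
# ParaView appears to require the VTK cell type ids used here.
mesh = {
    "topologies": {
        "topo": {
            "type": "unstructured",
            "coordset": "coords",
            "elements": {
                "shape": "mixed",
                "shape_map": {"tet": 10, "hex": 12, "wedge": 13, "pyramid": 14},
                "shapes": [10, 14],               # per-element shape id
                "sizes": [4, 5],                  # vertices per element
                "offsets": [0, 4],                # element start in connectivity
                "connectivity": [0, 1, 2, 3,      # tet
                                 1, 2, 3, 4, 5],  # pyramid
            },
        }
    }
}

elems = mesh["topologies"]["topo"]["elements"]
# Internal consistency: offsets are the running sum of sizes, and every
# per-element shape id must appear in the shape_map.
assert elems["offsets"] == [sum(elems["sizes"][:i]) for i in range(len(elems["sizes"]))]
assert all(s in elems["shape_map"].values() for s in elems["shapes"])
print("consistent")  # prints "consistent"
```

In real code each of these dict entries would be a Conduit child node; the sketch only shows which keys need to stay mutually consistent.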

Minor Issues with the ParaView Conduit implementation

  • When using vector or tensor field types, in vtkConduitToDataObject.cxx, it seems that ParaView simply determines the scalar, vector, or tensor type based on the number of value keys encountered, so using “values/x”, “values/y”, … or “values/0”, “values/whatever” would work the same. This is practical for handling tensors, which are not described in the Conduit documentation, but could be dangerous, as the ordering of tensor components is then implicit. Also, it does not guarantee consistency with the Conduit blueprint documentation.

  • When using vector and tensor fields built from field data copied to Conduit, we get messages such as this one:

    (   0.861s) [pvbatch.0       ]vtkSOADataArrayTemplate:410   WARN| 23vtkSOADataArrayTemplateIfE (0x564bd72ed230): GetVoidPointer called. This is very expensive for non-array-of-structs subclasses, as the scalar array must be generated for each call. Using the vtkGenericDataArray API with vtkArrayDispatch are preferred. Define the environment variable VTK_SILENCE_GET_VOID_POINTER_WARNINGS to silence this warning.

Running under a debugger, it seems this is done in ParaView itself for vector and tensor fields, and is not related to our way of handling field components (which is dictated by the Conduit Blueprint). We do not have this sort of message for point coordinates. Note that in the case of point coordinates, we can use shared arrays with Conduit’s set_external methods instead of copies with set. I do not know if this is related, but in the cases I tested, it seems we get these warnings only for fields, not for mesh coordinates.
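As an illustration of the component-count behavior described above, here is a plain-dict sketch of a Blueprint vector field (hypothetical data; in real Conduit code each child of values would be filled with set or set_external):

```python
# Plain-dict sketch of a Blueprint vector field node: each child of "values"
# is one component, and (per the observation above) ParaView appears to infer
# scalar / vector / tensor from the *number* of children, with component
# order taken from child creation order -- the names are not interpreted.
field = {
    "association": "element",
    "topology": "topo",
    "values": {          # insertion order == assumed component order
        "x": [1.0, 2.0],
        "y": [0.0, 0.5],
        "z": [0.0, 0.0],
    },
}

n_components = len(field["values"])
n_elements = len(next(iter(field["values"].values())))
print(n_components, n_elements)  # 3 2 -> treated as a 2-element vector field
```

Since only the child count matters, a 9-child values node would presumably be taken as a tensor, with the component order silently fixed by creation order, which is the danger mentioned above.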

Missing ghost cells generation?

Using the legacy ParaView Catalyst, we could simply provide our mesh’s global vertex ids, and call the ghost cells generation filter to build ghost cells.

In Catalyst 2, we can provide the same info using fields, and specify using “state/metadata/vtk_fields/GlobalNodeIds” that these are global ids, but ghost cells do not seem to be generated (tested by simply visualizing cell process ids → ghost cells generator → cell data to point data, which should give a smooth transition at process interfaces with ghost cells, and a jump otherwise).

Grepping the ParaView code, I find no instance of adjset, so I assume adjacency sets (Conduit’s recommended way of handling domain connectivity info) are not handled? Or am I looking in the wrong place? I can add this info to my output relatively easily (as pairwise adjacency sets map pretty well to an optional internal structure in our code), but need to check whether there is a point in doing so.
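For reference, a pairwise adjacency set would look something like the following sketch (plain dicts standing in for the Conduit node; the group and key names follow my reading of the Blueprint adjset docs and should be checked against the conduit::blueprint::mesh::examples output):

```python
# Plain-dict sketch of a pairwise Blueprint adjacency set for domain 0,
# sharing vertices 4, 5 and 6 with neighbor domain 1 (hypothetical data).
adjsets = {
    "mesh_adj": {
        "association": "vertex",      # the shared entities are vertices
        "topology": "topo",
        "groups": {
            "group_0_1": {            # one group per (domain, neighbor) pair
                "neighbors": [1],     # neighbor domain id
                "values": [4, 5, 6],  # local vertex ids shared with domain 1
            },
        },
    }
}

group = adjsets["mesh_adj"]["groups"]["group_0_1"]
print(group["neighbors"], group["values"])  # [1] [4, 5, 6]
```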

Incorrect behavior on mixed element mesh with polyhedra

Though things seem to work pretty well (with the caveats mentioned above) on a simple hexahedral mesh, I have very strange behavior with ParaView Catalyst on a mesh containing a mix of tetrahedra, pyramids, wedges, and polyhedra:

  • On a single MPI rank:

    • using Conduit’s print() method on nodes for debugging seems to change the behavior, and leads to an error in the validation of the “execute” method, where Conduit complains that “execute” is not a valid protocol element.
    • I cannot visualize some fields, and using a simple pipeline which merely exports data to a VTM output, fields provided to Catalyst do not appear in the exported data. They do appear in the Catalyst dump (using CATALYST_DEBUG and the “stub” implementation), but are missing in the output generated by catalyst_replay with the “paraview” implementation.
  • On multiple MPI ranks:
    I get crashes in MPI reductions even when simply trying to visualize a field on the mesh surface, with no additional filter.

One nice thing with Catalyst 2 is the possibility of dumping Conduit output, so I can easily provide reproduction data.

Other issues

With legacy Catalyst, I could specify the log file using

    vtkFileOutputWindow *log_output = vtkFileOutputWindow::New();
    if (mpi_rank < 1)
      log_output->SetFileName("./catalyst.log");
    else
      log_output->SetFileName("/dev/null");
    vtkFileOutputWindow::SetInstance(log_output);

Is there a way to do this using Catalyst2 (without requiring direct calls to the VTK API, which would defeat the purpose of the Conduit mesh blueprint)? An environment variable would do, but it seems that in the Catalyst examples, such variables can be used to specify the log level, while the log file is determined based on a command-line option.

Attached files

Attached is a Catalyst dump (for use with catalyst_replay on 1 or 2 ranks)
catalyst_dump_1_rank.tar.gz (7.7 MB)
catalyst_dump_2_ranks.tar.gz (7.7 MB)
catalyst_export.py (2.1 KB)

On a single rank, the exported dataset does not contain all the fields visible in the input. On 2 ranks, catalyst_replay hangs…

Follow-up: after checking with the help desk, 2 of the above issues were mainly on the user side.

  • The issue with polyhedra in parallel was due to an invalid Mesh definition (missing subelements node on a rank where the matching list was empty). On the ParaView side, the Conduit verification error was not propagated correctly.
  • The issue visualizing some fields was due to an incorrect use of VTK attributes metadata (which were not needed, and acted as a filter).

So those 2 issues (which were the most critical ones) are solved.


Following up on this post publicly so that anyone finding it also has answers to some of the items mentioned:

According to the conduit Mesh blueprint documentation, in the “shape_map” section, it would seem that for mixed element meshes, element type ids do not need to match those of VTK, which seems to be mentioned more as an example than a compulsory choice. I would assume the provided typemap only needs to be consistent with the Conduit shape names. In practice, the ParaView catalyst implementation generates an error if I use type ids different from those of VTK, so it would seem the shape map is not really honored, and VTK shapes are always assumed.

That’s a known VTK/ParaView bug.

The Conduit documentation provides examples for vector fields, but none for tensors. How should components be named?

Component naming should not matter

When using vector or tensor field types, in vtkConduitToDataObject.cxx, it seems that ParaView simply determines the scalar, vector, or tensor type based on the number of value keys encountered, so using “values/x”, “values/y”, … or “values/0”, “values/whatever” would work the same. This is practical for handling tensors, which are not described in the Conduit documentation, but could be dangerous, as the ordering of tensor components is then implicit. Also, it does not guarantee consistency with the Conduit blueprint documentation.

I assume component order is based on creation order, but I have not found whether Conduit guarantees to maintain child node order. We should figure this out, but it has never been an issue before; x/y/z point coordinates are always kept in creation order.

Using the legacy ParaView Catalyst, we could simply provide our mesh’s global vertex ids, and call the ghost cells generation filter to build ghost cells.

The ParaView ghost cells generator uses the global node ids field if specified. From Catalyst2, you can specify “special” fields like this one using the metadata node. Corrections were made recently to ParaView to better support input global node ids.

I get crashes in MPI reductions even when simply trying to visualize a field on the mesh surface, with no additional filter.

We recently improved error logging in ParaView, and made it so that node validation errors in non-0 MPI ranks are properly handled.

Is there a way to do this using Catalyst2 (without requiring direct calls to the VTK API, which would defeat the purpose of the Conduit mesh blueprint)? An environment variable would do, but it seems that in the Catalyst examples, such variables can be used to specify the log level, while the log file is determined based on a command-line option.

ParaView now supports the environment variable PARAVIEW_LOG_FILE for that purpose.
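For example (the exact behavior should be checked against the ParaView documentation for your version), setting the variable before launching the coupled solver should redirect the log, with no command-line option needed:

```shell
# Redirect ParaView/Catalyst log output to a file via the environment
# (variable name per the answer above; the solver command is illustrative).
export PARAVIEW_LOG_FILE=./catalyst.log
echo "logging to ${PARAVIEW_LOG_FILE}"
# then run the coupled solver as usual, e.g.:
# mpiexec -n 4 ./my_solver ...
```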


Hello, and thanks for the improvements to ParaView.

With further testing, I have encountered new issues:

  • Legacy Catalyst scripts do not seem to be recognized anymore. With Catalyst1, I needed to use a vtkCPPythonScriptPipeline-type pipeline for legacy scripts, and vtkCPPythonScriptV2Pipeline for the current structure. With Catalyst2, I have no way of specifying the type of script used; when using a legacy script, I get a warning about a missing “Options” object telling me a default object will be created, and no output.

    • This is an issue because, although we do not produce such scripts anymore, we still use quite a few of them in our automated tests, and these scripts seemed better suited to using loops to render multiple variables. Also, with these scripts we do not run into the color scale definition issues described below for v2 pipelines (i.e. automatic rescaling seems to be the default).
  • Using Catalyst v2 pipelines, it does not seem possible to rescale color maps in any automatic manner:

    • The script generated by ParaView always contains hard-coded RGBPoints ranges. When generating a script on a coarser mesh, or with a different number of iterations than the computation using Catalyst, the actual ranges may be different.
      • Using <input>.CellData["<variable_id>"].GetRange(0), the ranges obtained are local to each MPI rank, so computing the actual range would require additional MPI operations.
      • Directly adding <input>Display.RescaleTransferFunctionToDataRange after the colorbar definitions is not workable either, as the rescaling also seems to be based on the local process ranges, so the output is incorrect on multiple ranks.
      • Simply removing <variable>LUT.RGBPoints() calls in the script leaves the scale in the [0, 1] range.
        • Running the same script with Catalyst1, the behavior is different, and we seem to get the desired/default automatic rescaling.

I do not remember us removing this support, so I feel like this is an issue that should be reported. It should be easy to test with the different Catalyst tests from ParaView.

It should be possible to rescale inside the catalyst_execute method (see https://gitlab.kitware.com/paraview/paraview/-/blob/master/Examples/Catalyst2/CxxFullExample/catalyst_pipeline_with_rendering.py?ref_type=heads#L70)

This method is called at each update and allows for this kind of automation.

Dear all,

Thank you for this discussion.

Like Yvan, I have not found a way to automatically reset color maps.

I have been recommended the following code, but it only rescales for rank 0, not other ranks:

    from paraview import servermanager as sm

    pxm = renderView1.GetSessionProxyManager()
    tfmgr = sm.vtkSMTransferFunctionManager()
    tfmgr.ResetAllTransferFunctionRangesUsingCurrentData(pxm, True)

I have found no way outside of using

    LUT.RescaleTransferFunction(color_map_min, color_map_max)

The min and max need to be fetched manually, carefully handling the fact that the data array might not be present at this time step and that the mesh might have no cells on this rank, and then reduced across ranks (about a page of code). An added pitfall is that LUTs for a given array name are shared by all scripts, so when running two scripts, one that sets the LUT manually and one that does not, the LUT gets set for both. This means we need to manually create different LUTs for each script.
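A stdlib-only sketch of the range computation described above (mpi4py's allreduce is stood in by a plain Python reduction so the sketch runs without MPI; the function names are illustrative):

```python
# Each rank computes a local (min, max) for the array; ranks where the
# array is absent or the mesh is empty contribute the neutral element.
# The global range is then an allreduce (MPI_MIN on lows, MPI_MAX on highs)
# before calling LUT.RescaleTransferFunction(lo, hi) in the Catalyst script.
import math

def local_range(values):
    if not values:                    # empty mesh / missing array on this rank
        return (math.inf, -math.inf)  # neutral element for min/max
    return (min(values), max(values))

def global_range(per_rank_values):
    # Stand-in for comm.allreduce over all ranks.
    lows, highs = zip(*(local_range(v) for v in per_rank_values))
    return (min(lows), max(highs))

# Example: rank 1 has no cells carrying the array.
ranks = [[0.2, 3.5, 1.0], [], [-1.0, 2.0]]
print(global_range(ranks))  # (-1.0, 3.5)
# ... then, per LUT: pressureLUT.RescaleTransferFunction(*global_range(...))
```

With mpi4py, the reduction would be two allreduce calls (MPI.MIN on the low, MPI.MAX on the high); the neutral-element trick keeps ranks with no data from corrupting the result.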

I wouldn’t mind more details on the best way to do this. I’ve already spent hours trying to get it right, and have yet to set up the bookkeeping needed to keep track of one LUT per array and per script.

Thanks,
Alexandre

Thanks for the reply.

I reported the legacy Catalyst scripts issue in the EDF section of the Kitware helpdesk this morning (I also reported the colormap rescaling issue separately).

I just tried rescaling inside the catalyst_execute method, but have exactly the same issue (and the printed bounds and pressure range are still different for each MPI rank). Given that the same script seems to work in Catalyst1, I have the impression Catalyst2 is simply missing a global MPI metadata sync somewhere. In any case, this is also reproducible under catalyst_replay. I posted the replay files in the issue report. I could also post them here if needed.

I should apologize here; it seems that Catalyst2 has always supported only v2 scripts. I think I was confused by Catalyst1, which is able to read both v1 and v2 scripts.

Indeed, I also reproduce this issue. The different automatic rescale methods do not work in situ.

From my early investigations, it comes from the fact that Catalyst2 is run in “symmetric mode” (i.e. as pvbatch --sym: each node behaves as a client, instead of having one client and N servers).
In that case, meta-information (like vtkPVDataInformation) is not gathered correctly.

This was reported on ParaView some time ago (last point of https://gitlab.kitware.com/paraview/paraview/-/issues/19590)

And a new dedicated issue for the Rescaling bug: https://gitlab.kitware.com/paraview/paraview/-/issues/23145