Expanding ParaView Catalyst Source Mechanism

In the current implementation, it is difficult to extend how channels in a Catalyst Conduit tree get mapped to sources in ParaView pipelines. What I propose here is to make this process more flexible and to allow ParaView users to plug in their own source classes.

How Things Currently Work

When Catalyst Execute is called, ParaView Catalyst checks whether a source proxy has been created for each channel and creates one if necessary. These proxies are maintained by the vtkInSituInitializationHelper class. While a Python script is being processed, whenever a proxy needs to be created, its registration name is checked to see if it is a channel name. For names that correspond to a channel, instead of creating a new proxy, the one created by ParaView Catalyst is used. Note that later on, the code will attempt to set properties on the proxy based on the other input parameters associated with the original constructor call. Properties that are not applicable to the returned proxy are quietly ignored (due to a setting on the proxy object).
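A minimal sketch of that lookup, assuming the vtkInSituInitializationHelper API (GetProducer is existing ParaView API; the creation branch is paraphrased, and GetOrCreateProducer itself is a name I made up for illustration):

```cpp
#include <string>

#include "vtkInSituInitializationHelper.h"
#include "vtkSMSourceProxy.h"

// Hypothetical helper illustrating the channel-to-proxy bookkeeping.
vtkSMSourceProxy* GetOrCreateProducer(const std::string& channelName)
{
  // Reuse the proxy ParaView Catalyst already created for this channel.
  if (vtkSMSourceProxy* existing = vtkInSituInitializationHelper::GetProducer(channelName))
  {
    return existing;
  }
  // Otherwise a new source proxy (e.g. one backed by vtkConduitSource for
  // mesh channels) would be created and registered under the channel name,
  // so that Python scripts can look it up by that name.
  vtkSMSourceProxy* created = nullptr; // creation/registration elided
  return created;
}
```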

A Closer Look at ParaView’s catalyst_execute_paraview function

The function is passed a Conduit node called params.

This function initially does the following:

  • Makes sure params contains a catalyst node and that node is a non-empty list
  • Extracts
    • Timestep
    • Time
    • Output multiblock

It then loops over all of the channels, extracting the following information (see the sketch after this list):

  • Name
  • Type
  • Conduit node for the channel’s data
  • Channel timestep if it exists or uses the timestep extracted earlier
  • Channel time if it exists or uses the time extracted earlier
  • Channel output multiblock if it exists or uses the output multiblock extracted earlier
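A minimal sketch of this extraction loop using the conduit_cpp API that ships with Catalyst. The "catalyst/state" and "catalyst/channels" paths follow the documented Catalyst blueprint; the per-channel override paths (e.g. "state/timestep" under a channel) are my assumption here:

```cpp
#include <string>

#include <catalyst_conduit.hpp>

void ExtractChannels(conduit_cpp::Node& params)
{
  auto state = params["catalyst/state"];
  const auto globalTimestep =
    state.has_path("timestep") ? state["timestep"].as_int64() : 0;
  const auto globalTime =
    state.has_path("time") ? state["time"].as_float64() : 0.0;

  auto channels = params["catalyst/channels"];
  for (conduit_index_t i = 0; i < channels.number_of_children(); ++i)
  {
    auto channel = channels.child(i);
    const std::string name = channel.name();                 // channel name
    const std::string type = channel["type"].as_char8_str(); // e.g. "mesh"
    auto data = channel["data"];                             // the channel's data tree

    // Per-channel values override the globals extracted above.
    const auto timestep = channel.has_path("state/timestep")
      ? channel["state/timestep"].as_int64()
      : globalTimestep;
    (void)name; (void)type; (void)data; (void)timestep; (void)globalTime;
  }
}
```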

Next, if the channel’s type is one supported by Blueprint, it will attempt to validate the channel’s data.
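A minimal sketch of that validation, assuming Catalyst's conduit_cpp::Blueprint::verify wrapper (shipped in catalyst_conduit_blueprint.hpp) with a conduit-style (protocol, node, info) signature:

```cpp
#include <string>

#include <catalyst_conduit.hpp>
#include <catalyst_conduit_blueprint.hpp>

// Returns true when the channel's data conforms to the given Blueprint
// protocol; on failure, info holds diagnostics describing what failed.
bool ValidateChannel(const conduit_cpp::Node& data, const std::string& protocol)
{
  conduit_cpp::Node info;
  return conduit_cpp::Blueprint::verify(protocol, data, info);
}
```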

It then constructs a new Conduit node, called fields, that contains all of the information extracted above. Note: this is part of another node it created, called globalFields, which doesn’t seem to be used anywhere else.

Finally, it calls the appropriate update_producer function (based on the channel type), fetches the producer, grabs its client-side object as a vtkAlgorithm, and calls SetNoPriorTemporalAccessInformationKey.
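A minimal sketch of this final step for one channel (the dispatch on channel type is elided; GetProducer and GetClientSideObject are existing API, while the surrounding function is illustrative):

```cpp
#include <string>

#include "vtkAlgorithm.h"
#include "vtkInSituInitializationHelper.h"
#include "vtkSMSourceProxy.h"

void FinishChannel(const std::string& channelName)
{
  // ... the appropriate update_producer_* function has already run ...
  vtkSMSourceProxy* producer = vtkInSituInitializationHelper::GetProducer(channelName);
  if (auto* algo = vtkAlgorithm::SafeDownCast(producer->GetClientSideObject()))
  {
    // Tells the pipeline that this in situ source has no previously
    // generated timesteps available.
    algo->SetNoPriorTemporalAccessInformationKey();
  }
}
```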

The Update Producer functions

There are currently three:

bool update_producer_mesh_blueprint(const std::string& channel_name, const conduit_node* node, const conduit_node* global_fields, bool multimesh, const conduit_node* assemblyNode, bool multiblock, bool amr)

bool update_producer_ioss(const std::string& channel_name, const conduit_cpp::Node* node, const conduit_cpp::Node* vtkNotUsed(global_fields))

bool update_producer_fides(const std::string& channel_name, const conduit_cpp::Node& node, double& time)

Note: In the case of fides, it writes to the time parameter, though the value is never used by the calling function.

These functions check whether there is a source proxy for the channel and create one if there isn’t. Note that all three methods grab the client-side object and cast it to the following types, respectively (see the sketch after this list):

  • vtkConduitSource
  • vtkIOSSReader
  • vtkFidesReader
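A minimal sketch of that grab-and-cast pattern for the mesh-blueprint case (the creation path is elided, and the SetNode call reflects vtkConduitSource's Conduit-node setter, whose exact signature may differ). Because of the concrete SafeDownCast, a user-supplied source class that does not derive from vtkConduitSource would fail here:

```cpp
#include <string>

#include <catalyst_conduit.hpp>

#include "vtkConduitSource.h"
#include "vtkInSituInitializationHelper.h"
#include "vtkSMSourceProxy.h"

bool PushDataToMeshProducer(const std::string& channelName, conduit_cpp::Node& data)
{
  vtkSMSourceProxy* proxy = vtkInSituInitializationHelper::GetProducer(channelName);
  if (!proxy)
  {
    return false; // creation of a new producer elided
  }
  auto* source = vtkConduitSource::SafeDownCast(proxy->GetClientSideObject());
  if (!source)
  {
    return false; // the client-side object is not a vtkConduitSource
  }
  source->SetNode(conduit_cpp::c_node(&data)); // hand the Conduit tree to the source
  return true;
}
```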

How to make this more flexible

Let’s assume we have created a new source that we would like to use when processing a specific type of channel or a channel with a specific name. There are two main approaches:

  1. We extend the update_producer pattern to allow new functions to be registered based on channel name and channel type.
  2. We assume that the sources we want to use share the same core API as the sources they are replacing, and instead encode the information in the Conduit tree itself.

In either case, we assume that the new sources are properly registered with ParaView via a plugin.

Approach 1: Registering New Producer Functions

In this approach, new functions would be added to vtkInSituInitializationHelper that allow producer factory functions to be registered by channel name and/or by channel type. Then, in the ParaView Catalyst script’s initialization section, the user would call these functions to register the appropriate producer factory function. Note that we would have the existing functions set as the defaults for the current channel types.

In order to make this work, we would require these factory functions to have the same signature, which would require the existing ones to be changed as well. One possible signature would be:

bool myProducerFactory(const conduit_node* channel_node, const conduit_node* catalyst_node)

Since all of the information being passed to the original functions came from either the channel node or the catalyst node, it seems reasonable that those two nodes should be sufficient.
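A sketch of what such a registry could look like; the function names, the lookup order, and the registry itself are hypothetical, not existing vtkInSituInitializationHelper API:

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

#include <catalyst_conduit.h> // for conduit_node

using ProducerFactory =
  std::function<bool(const conduit_node* channel_node, const conduit_node* catalyst_node)>;

static std::map<std::string, ProducerFactory> TypeFactories;    // keyed by channel type
static std::map<std::string, ProducerFactory> ChannelFactories; // keyed by channel name

void RegisterProducerForType(const std::string& type, ProducerFactory factory)
{
  TypeFactories[type] = std::move(factory);
}

void RegisterProducerForChannel(const std::string& name, ProducerFactory factory)
{
  ChannelFactories[name] = std::move(factory);
}

// A channel-name registration wins over a type registration, which wins
// over the built-in defaults ("mesh", "ioss", "fides") installed at
// initialization.
ProducerFactory FindFactory(const std::string& name, const std::string& type)
{
  if (auto it = ChannelFactories.find(name); it != ChannelFactories.end())
  {
    return it->second;
  }
  if (auto it = TypeFactories.find(type); it != TypeFactories.end())
  {
    return it->second;
  }
  return {};
}
```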

What about passing in additional information?

Since the Python session will attempt to set additional information via the proxy layer, as long as the parameters used in the script’s constructor call are supported by the new source, this would come for free.

Approach 2: Using the Conduit Tree itself

In this approach, all we would be doing is swapping out the sources that the current update_producer functions create. We could do that by adding this information to the Conduit tree itself:

{
 "catalyst": {
   "scripts": {
     "script": {
       "filename": "foo.py"
     }
   }
 },
 "catalyst_load": {
   "implementation": "paraview",
   "search_paths": {
     "paraview": "@ParaView_CATALYST_DIR@"
   },
   "type_sources": [
     {
       "type" : "mesh",
       "source": "MyConduitSource",
       "properties" : {}
     }
   ],
   "channel_sources": [
     {
       "channel" : "grid",
       "source" : "MyGridSource",
       "properties" : {}
     }
   ]
 }
}

I’ve enhanced the tree by adding a list of type sources, which links channel types to specific sources, and a list of channel sources, mapping channel names to their corresponding sources. Additionally, a ‘properties’ node has been introduced to accommodate the setting of additional properties.
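A minimal sketch of how the backend could resolve a source class name from these lists (the "channel_sources"/"type_sources" layout comes from the example tree above, which is my proposal, not an existing Catalyst convention):

```cpp
#include <string>

#include <catalyst_conduit.hpp>

// Returns the registered source class name for a channel, or an empty
// string to fall back to the built-in source for that channel type.
std::string ResolveSourceClass(conduit_cpp::Node& load,
  const std::string& channelName, const std::string& channelType)
{
  // A channel-specific mapping takes precedence over a type mapping.
  if (load.has_path("channel_sources"))
  {
    auto list = load["channel_sources"];
    for (conduit_index_t i = 0; i < list.number_of_children(); ++i)
    {
      auto entry = list.child(i);
      if (std::string(entry["channel"].as_char8_str()) == channelName)
      {
        return entry["source"].as_char8_str();
      }
    }
  }
  if (load.has_path("type_sources"))
  {
    auto list = load["type_sources"];
    for (conduit_index_t i = 0; i < list.number_of_children(); ++i)
    {
      auto entry = list.child(i);
      if (std::string(entry["type"].as_char8_str()) == channelType)
      {
        return entry["source"].as_char8_str();
      }
    }
  }
  return {};
}
```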

A key challenge arises in the update_producer functions. Currently, these functions retrieve the client-side object and cast it to a specific type. This necessitates that new sources either inherit from existing source classes or that we establish an abstract source class providing the core API, from which all derived sources would then inherit. For instance, MyConduitSource would either derive from vtkConduitSource, or both would derive from a new vtkAbstractConduitSource class.
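A sketch of what that abstract base could look like (vtkAbstractConduitSource does not exist today; the parent class and the SetNode signature are assumptions modeled on vtkConduitSource):

```cpp
#include <catalyst_conduit.h> // for conduit_node

#include "vtkDataObjectAlgorithm.h"

class vtkAbstractConduitSource : public vtkDataObjectAlgorithm
{
public:
  vtkAbstractTypeMacro(vtkAbstractConduitSource, vtkDataObjectAlgorithm);

  // The core API every Catalyst source would have to implement:
  // accept the Conduit tree for the channel being updated.
  virtual void SetNode(const conduit_node* node) = 0;

protected:
  vtkAbstractConduitSource() = default;
  ~vtkAbstractConduitSource() override = default;
};
```

The update_producer functions could then cast to vtkAbstractConduitSource instead of to a concrete class.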

@berkgeveci @coreylee @dcthomp FYI

Hi @Bob_Obara, thanks for this overview of how producers work in the ParaView Catalyst implementation!

What is your use case for adding more producers? Is mesh blueprint not suitable in your case?
I would guess that if something is missing in mesh blueprint, we should most likely push it upstream instead of adding more tweaks downstream in our ParaView-specific variant of Catalyst. This would prevent simulation codes from becoming too ParaView-specific when they use Catalyst 2.

Hi Francois - the focus here is on how the data modeled in blueprint/conduit is consumed by the backend, not on dealing with any deficiencies on the blueprint/conduit side. In approach 2, the only conduit change I proposed was to provide additional information to the backend so that it can create appropriate sources based on the channel type/name.

The one main deficiency on the blueprint side seems to be that data is always represented as SOA (structure of arrays), which can cause unnecessary data copying in VTK when the original data is AOS (array of structures). I don’t think blueprint currently supports AOS structures, correct?

Related Topic - Lack of Reader/Source Parity

One reason a developer might feel the need to create a new Catalyst source is to better match the functionality of existing VTK readers. For example, readers let you set the active data array; the Catalyst source does not. So another way to expand functionality would be to add relevant API to the Catalyst source based on the types of readers it would be replacing when exporting a Catalyst script.
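A sketch of what such parity could look like: exposing a vtkDataArraySelection the way many VTK readers do. Neither the class nor the accessor exists on the Catalyst source today; this is purely illustrative:

```cpp
#include "vtkDataArraySelection.h"
#include "vtkNew.h"

// Hypothetical source subclass carrying reader-style array selection.
class MyConduitSourceWithSelection /* : public vtkConduitSource */
{
public:
  vtkDataArraySelection* GetPointDataArraySelection()
  {
    return this->PointSelection.GetPointer();
  }

private:
  vtkNew<vtkDataArraySelection> PointSelection;
};

// Usage would mirror typical reader code:
//   source->GetPointDataArraySelection()->DisableAllArrays();
//   source->GetPointDataArraySelection()->EnableArray("velocity");
```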

It should be supported via the set_external method with stride/offset. Here is a VTK issue which should be fixed soon: https://gitlab.kitware.com/vtk/vtk/-/issues/18718
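For reference, a minimal sketch of the pattern: an AOS buffer (x0, y0, z0, x1, y1, z1, ...) exposed to mesh blueprint zero-copy via set_external with offset and stride, following the conduit_cpp usage in the Catalyst examples:

```cpp
#include <vector>

#include <catalyst_conduit.hpp>

void DescribeAOSPoints(conduit_cpp::Node& mesh, std::vector<double>& pts)
{
  const auto nPts = static_cast<conduit_index_t>(pts.size() / 3);
  mesh["coordsets/coords/type"].set("explicit");
  // Each component is a strided view into the same interleaved buffer.
  mesh["coordsets/coords/values/x"].set_external(
    pts.data(), nPts, /*offset=*/0, /*stride=*/3 * sizeof(double));
  mesh["coordsets/coords/values/y"].set_external(
    pts.data(), nPts, /*offset=*/sizeof(double), /*stride=*/3 * sizeof(double));
  mesh["coordsets/coords/values/z"].set_external(
    pts.data(), nPts, /*offset=*/2 * sizeof(double), /*stride=*/3 * sizeof(double));
}
```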

This is an interesting use case, similar to OpenFOAM output control via its controlDict files.
It sounds to me that the choice of whether to process some arrays depends on the given Python pipeline and may change at any time. Why not add some parameters to pvTrivialProducer, which is called in the Python script? Or some extra key in the catalyst sub-node passed to the catalyst_execute function?
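A minimal sketch of that last suggestion, following the conduit_cpp usage in the Catalyst examples; the "active_arrays" key is illustrative, not an existing convention:

```cpp
#include <catalyst.h>
#include <catalyst_conduit.hpp>

void Execute(double time, int cycle)
{
  conduit_cpp::Node exec;
  exec["catalyst/state/time"].set(time);
  exec["catalyst/state/timestep"].set(cycle);

  auto channel = exec["catalyst/channels/grid"];
  channel["type"].set("mesh");
  // ... mesh blueprint data under channel["data"] elided ...

  // Hypothetical per-array switch the backend could honor:
  channel["active_arrays/velocity"].set(1);

  catalyst_execute(conduit_cpp::c_node(&exec));
}
```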