Expanding ParaView Catalyst Source Mechanism

In the current implementation, it is difficult to extend how channels in a Catalyst Conduit tree get mapped to sources in ParaView pipelines. What I propose here is to make this process more flexible and to allow ParaView users to plug in their own source classes.

How Things Currently Work

When Catalyst Execute is called, ParaView Catalyst checks whether a source proxy has been created for each channel and creates one if necessary. These proxies are maintained by the vtkInSituInitializationHelper class. While a Python script is being processed, whenever a proxy needs to be created, its registration name is checked to see if it is a channel name. For names that correspond to a channel, instead of creating a new proxy, the one created by ParaView Catalyst is used. Note that later on, the code will attempt to set properties on the proxy based on the other input parameters associated with the original constructor call. Properties that are not applicable to the returned proxy are quietly ignored (due to a setting on the proxy object).
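A minimal sketch of that lookup, assuming the vtkInSituInitializationHelper API (GetProducer is existing ParaView API; the creation branch is paraphrased, and GetOrCreateProducer itself is a name I made up for illustration):

```cpp
#include <string>

#include "vtkInSituInitializationHelper.h"
#include "vtkSMSourceProxy.h"

// Hypothetical helper illustrating the channel-to-proxy bookkeeping.
vtkSMSourceProxy* GetOrCreateProducer(const std::string& channelName)
{
  // Reuse the proxy ParaView Catalyst already created for this channel.
  if (vtkSMSourceProxy* existing = vtkInSituInitializationHelper::GetProducer(channelName))
  {
    return existing;
  }
  // Otherwise a new source proxy (e.g. one backed by vtkConduitSource for
  // mesh channels) would be created and registered under the channel name,
  // so that Python scripts can look it up by that name.
  vtkSMSourceProxy* created = nullptr; // creation/registration elided
  return created;
}
```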

A Closer Look at ParaView’s catalyst_execute_paraview function

The function is passed a Conduit node called params.

This function initially does the following:

  • Makes sure params contains a catalyst node and that node is a non-empty list
  • Extracts
    • Timestep
    • Time
    • Output multiblock

It then loops over all of the channels, extracting the following information (see the sketch after this list):

  • Name
  • Type
  • Conduit node for the channel’s data
  • Channel timestep if it exists or uses the timestep extracted earlier
  • Channel time if it exists or uses the time extracted earlier
  • Channel output multiblock if it exists or uses the output multiblock extracted earlier
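A minimal sketch of this extraction loop using the conduit_cpp API that ships with Catalyst. The "catalyst/state" and "catalyst/channels" paths follow the documented Catalyst blueprint; the per-channel override paths (e.g. "state/timestep" under a channel) are my assumption here:

```cpp
#include <string>

#include <catalyst_conduit.hpp>

void ExtractChannels(conduit_cpp::Node& params)
{
  auto state = params["catalyst/state"];
  const auto globalTimestep =
    state.has_path("timestep") ? state["timestep"].as_int64() : 0;
  const auto globalTime =
    state.has_path("time") ? state["time"].as_float64() : 0.0;

  auto channels = params["catalyst/channels"];
  for (conduit_index_t i = 0; i < channels.number_of_children(); ++i)
  {
    auto channel = channels.child(i);
    const std::string name = channel.name();                 // channel name
    const std::string type = channel["type"].as_char8_str(); // e.g. "mesh"
    auto data = channel["data"];                             // the channel's data tree

    // Per-channel values override the globals extracted above.
    const auto timestep = channel.has_path("state/timestep")
      ? channel["state/timestep"].as_int64()
      : globalTimestep;
    (void)name; (void)type; (void)data; (void)timestep; (void)globalTime;
  }
}
```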

Next, if the channel’s type is one supported by Blueprint, it will attempt to validate the channel’s data.
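A minimal sketch of that validation, assuming Catalyst's conduit_cpp::Blueprint::verify wrapper (shipped in catalyst_conduit_blueprint.hpp) with a conduit-style (protocol, node, info) signature:

```cpp
#include <string>

#include <catalyst_conduit.hpp>
#include <catalyst_conduit_blueprint.hpp>

// Returns true when the channel's data conforms to the given Blueprint
// protocol; on failure, info holds diagnostics describing what failed.
bool ValidateChannel(const conduit_cpp::Node& data, const std::string& protocol)
{
  conduit_cpp::Node info;
  return conduit_cpp::Blueprint::verify(protocol, data, info);
}
```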

It then constructs a new Conduit node, called fields, that contains all of the information extracted above. Note: this is part of another node it created, called globalFields, which doesn’t seem to be used anywhere else.

Finally, it calls the appropriate update_producer function (based on the channel type), fetches the producer, grabs its client-side object as a vtkAlgorithm, and calls SetNoPriorTemporalAccessInformationKey.
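A minimal sketch of this final step for one channel (the dispatch on channel type is elided; GetProducer and GetClientSideObject are existing API, while the surrounding function is illustrative):

```cpp
#include <string>

#include "vtkAlgorithm.h"
#include "vtkInSituInitializationHelper.h"
#include "vtkSMSourceProxy.h"

void FinishChannel(const std::string& channelName)
{
  // ... the appropriate update_producer_* function has already run ...
  vtkSMSourceProxy* producer = vtkInSituInitializationHelper::GetProducer(channelName);
  if (auto* algo = vtkAlgorithm::SafeDownCast(producer->GetClientSideObject()))
  {
    // Tells the pipeline that this in situ source has no previously
    // generated timesteps available.
    algo->SetNoPriorTemporalAccessInformationKey();
  }
}
```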

The Update Producer functions

There are currently three:

bool update_producer_mesh_blueprint(const std::string& channel_name, const conduit_node* node, const conduit_node* global_fields, bool multimesh, const conduit_node* assemblyNode, bool multiblock, bool amr)

bool update_producer_ioss(const std::string& channel_name, const conduit_cpp::Node* node, const conduit_cpp::Node* vtkNotUsed(global_fields))

bool update_producer_fides(const std::string& channel_name, const conduit_cpp::Node& node, double& time)

Note: In the case of fides, it writes to the time parameter, though the value is never used by the calling function.

These functions check whether there is a source proxy for the channel and create one if there isn’t. Note that all three methods grab the client-side object and cast it to the following types, respectively (see the sketch after this list):

  • vtkConduitSource
  • vtkIOSSReader
  • vtkFidesReader
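A minimal sketch of that grab-and-cast pattern for the mesh-blueprint case (the creation path is elided, and the SetNode call reflects vtkConduitSource's Conduit-node setter, whose exact signature may differ). Because of the concrete SafeDownCast, a user-supplied source class that does not derive from vtkConduitSource would fail here:

```cpp
#include <string>

#include <catalyst_conduit.hpp>

#include "vtkConduitSource.h"
#include "vtkInSituInitializationHelper.h"
#include "vtkSMSourceProxy.h"

bool PushDataToMeshProducer(const std::string& channelName, conduit_cpp::Node& data)
{
  vtkSMSourceProxy* proxy = vtkInSituInitializationHelper::GetProducer(channelName);
  if (!proxy)
  {
    return false; // creation of a new producer elided
  }
  auto* source = vtkConduitSource::SafeDownCast(proxy->GetClientSideObject());
  if (!source)
  {
    return false; // the client-side object is not a vtkConduitSource
  }
  source->SetNode(conduit_cpp::c_node(&data)); // hand the Conduit tree to the source
  return true;
}
```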

How to make this more flexible

Let’s assume we have created a new source that we would like to use when processing a specific type of channel or a channel with a specific name. There are two main approaches:

  1. We extend the update_producer pattern to allow new functions to be registered based on channel name and channel type.
  2. We assume that the sources we want to use share the same core API as the sources they are replacing, and instead encode the information in the Conduit tree itself.

In either case, we assume that the new sources are properly registered with ParaView via a plugin.

Approach 1: Registering New Producer Functions

In this approach, new functions would be added to vtkInSituInitializationHelper that allow producer factory functions to be registered by channel name and/or by channel type. Then, in the ParaView Catalyst script’s initialization section, the user would call these functions to register the appropriate producer factory function. Note that we would have the existing functions set as the defaults for the current channel types.

In order to make this work, we would require these factory functions to have the same signature, which would require the existing ones to be changed as well. One possible signature would be:

bool myProducerFactory(const conduit_node* channel_node, const conduit_node* catalyst_node)

Since all of the information being passed to the original functions came from either the channel node or the catalyst node, it seems reasonable that those two nodes should be sufficient.
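A sketch of what such a registry could look like; the function names, the lookup order, and the registry itself are hypothetical, not existing vtkInSituInitializationHelper API:

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

#include <catalyst_conduit.h> // for conduit_node

using ProducerFactory =
  std::function<bool(const conduit_node* channel_node, const conduit_node* catalyst_node)>;

static std::map<std::string, ProducerFactory> TypeFactories;    // keyed by channel type
static std::map<std::string, ProducerFactory> ChannelFactories; // keyed by channel name

void RegisterProducerForType(const std::string& type, ProducerFactory factory)
{
  TypeFactories[type] = std::move(factory);
}

void RegisterProducerForChannel(const std::string& name, ProducerFactory factory)
{
  ChannelFactories[name] = std::move(factory);
}

// A channel-name registration wins over a type registration, which wins
// over the built-in defaults ("mesh", "ioss", "fides") installed at
// initialization.
ProducerFactory FindFactory(const std::string& name, const std::string& type)
{
  if (auto it = ChannelFactories.find(name); it != ChannelFactories.end())
  {
    return it->second;
  }
  if (auto it = TypeFactories.find(type); it != TypeFactories.end())
  {
    return it->second;
  }
  return {};
}
```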

What about passing in additional information?

Since the Python session will attempt to set additional information via the proxy layer, as long as the parameters used in the script’s constructor call are supported by the new source, this would come for free.

Approach 2: Using the Conduit Tree itself

In this approach, all we would be doing is swapping out the sources that the current update_producer functions create. We could do that by adding this information to the Conduit tree itself:

{
 "catalyst": {
   "scripts": {
     "script": {
       "filename": "foo.py"
     }
   }
 },
 "catalyst_load": {
   "implementation": "paraview",
   "search_paths": {
     "paraview": "@ParaView_CATALYST_DIR@"
   },
   "type_sources": [
     {
       "type" : "mesh",
       "source": "MyConduitSource",
       "properties" : {}
     }
   ],
   "channel_sources": [
     {
       "channel" : "grid",
       "source" : "MyGridSource",
       "properties" : {}
     }
   ]
 }
}

I’ve enhanced the tree by adding a list of type sources, which links channel types to specific sources, and a list of channel sources, mapping channel names to their corresponding sources. Additionally, a ‘properties’ node has been introduced to accommodate the setting of additional properties.
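A minimal sketch of how the backend could resolve a source class name from these lists (the "channel_sources"/"type_sources" layout comes from the example tree above, which is my proposal, not an existing Catalyst convention):

```cpp
#include <string>

#include <catalyst_conduit.hpp>

// Returns the registered source class name for a channel, or an empty
// string to fall back to the built-in source for that channel type.
std::string ResolveSourceClass(conduit_cpp::Node& load,
  const std::string& channelName, const std::string& channelType)
{
  // A channel-specific mapping takes precedence over a type mapping.
  if (load.has_path("channel_sources"))
  {
    auto list = load["channel_sources"];
    for (conduit_index_t i = 0; i < list.number_of_children(); ++i)
    {
      auto entry = list.child(i);
      if (std::string(entry["channel"].as_char8_str()) == channelName)
      {
        return entry["source"].as_char8_str();
      }
    }
  }
  if (load.has_path("type_sources"))
  {
    auto list = load["type_sources"];
    for (conduit_index_t i = 0; i < list.number_of_children(); ++i)
    {
      auto entry = list.child(i);
      if (std::string(entry["type"].as_char8_str()) == channelType)
      {
        return entry["source"].as_char8_str();
      }
    }
  }
  return {};
}
```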

A key challenge arises in the update_producer functions. Currently, these functions retrieve the client-side object and cast it to a specific type. This necessitates that new sources either inherit from existing source classes or that we establish an abstract source class providing the core API, from which all derived sources would then inherit. For instance, MyConduitSource would either derive from vtkConduitSource, or both would derive from a new vtkAbstractConduitSource class.
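A sketch of what that abstract base could look like (vtkAbstractConduitSource does not exist today; the parent class and the SetNode signature are assumptions modeled on vtkConduitSource):

```cpp
#include <catalyst_conduit.h> // for conduit_node

#include "vtkDataObjectAlgorithm.h"

class vtkAbstractConduitSource : public vtkDataObjectAlgorithm
{
public:
  vtkAbstractTypeMacro(vtkAbstractConduitSource, vtkDataObjectAlgorithm);

  // The core API every Catalyst source would have to implement:
  // accept the Conduit tree for the channel being updated.
  virtual void SetNode(const conduit_node* node) = 0;

protected:
  vtkAbstractConduitSource() = default;
  ~vtkAbstractConduitSource() override = default;
};
```

The update_producer functions could then cast to vtkAbstractConduitSource instead of to a concrete class.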

@berkgeveci @coreylee @dcthomp FYI

Hi @Bob_Obara, thanks for this overview of how producers work in the ParaView Catalyst implementation!

What is your use case for adding more producers? Is mesh blueprint not suitable in your case?
I would guess that if something is missing in mesh blueprint, we should most likely push it upstream instead of adding more tweaks downstream in our ParaView-specific variant of Catalyst. This would prevent simulation codes from becoming too ParaView-specific when they use Catalyst 2.

Hi Francois - the focus here is on how the data modeled in blueprint/conduit is consumed by the backend, not on dealing with any deficiencies on the blueprint/conduit side. In approach 2, the only conduit change I proposed was to provide additional information to the backend so that it can create appropriate sources based on the channel type/name.

The one main deficiency on the blueprint side seems to be that data is always represented as SOA (structure of arrays), which can cause unnecessary data copying in VTK when the original data is AOS (array of structures). I don’t think blueprint currently supports AOS structures, correct?

Related Topic - Lack of Reader/Source Parity

One reason a developer might feel the need to create a new Catalyst source is to better match the functionality of existing VTK readers. For example, readers let you set the active data array; the Catalyst source does not. So another way to expand functionality would be to add relevant API to the Catalyst source based on the types of readers it would be replacing when exporting a Catalyst script.
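A sketch of what such parity could look like: exposing a vtkDataArraySelection the way many VTK readers do. Neither the class nor the accessor exists on the Catalyst source today; this is purely illustrative:

```cpp
#include "vtkDataArraySelection.h"
#include "vtkNew.h"

// Hypothetical source subclass carrying reader-style array selection.
class MyConduitSourceWithSelection /* : public vtkConduitSource */
{
public:
  vtkDataArraySelection* GetPointDataArraySelection()
  {
    return this->PointSelection.GetPointer();
  }

private:
  vtkNew<vtkDataArraySelection> PointSelection;
};

// Usage would mirror typical reader code:
//   source->GetPointDataArraySelection()->DisableAllArrays();
//   source->GetPointDataArraySelection()->EnableArray("velocity");
```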

It should be supported via the set_external method with stride/offset. Here is a VTK issue which should be fixed soon: https://gitlab.kitware.com/vtk/vtk/-/issues/18718
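For reference, a minimal sketch of the pattern: an AOS buffer (x0, y0, z0, x1, y1, z1, ...) exposed to mesh blueprint zero-copy via set_external with offset and stride, following the conduit_cpp usage in the Catalyst examples:

```cpp
#include <vector>

#include <catalyst_conduit.hpp>

void DescribeAOSPoints(conduit_cpp::Node& mesh, std::vector<double>& pts)
{
  const auto nPts = static_cast<conduit_index_t>(pts.size() / 3);
  mesh["coordsets/coords/type"].set("explicit");
  // Each component is a strided view into the same interleaved buffer.
  mesh["coordsets/coords/values/x"].set_external(
    pts.data(), nPts, /*offset=*/0, /*stride=*/3 * sizeof(double));
  mesh["coordsets/coords/values/y"].set_external(
    pts.data(), nPts, /*offset=*/sizeof(double), /*stride=*/3 * sizeof(double));
  mesh["coordsets/coords/values/z"].set_external(
    pts.data(), nPts, /*offset=*/2 * sizeof(double), /*stride=*/3 * sizeof(double));
}
```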

This is an interesting use case, similar to OpenFOAM output control via its controlDict files.
It sounds to me that the choice of whether to process some arrays depends on the given Python pipeline and may change at any time. Why not add some parameters to pvTrivialProducer, which is called in the Python script? Or some extra key in the catalyst sub-node passed to the catalyst_execute function?
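A minimal sketch of that last suggestion, following the conduit_cpp usage in the Catalyst examples; the "active_arrays" key is illustrative, not an existing convention:

```cpp
#include <catalyst.h>
#include <catalyst_conduit.hpp>

void Execute(double time, int cycle)
{
  conduit_cpp::Node exec;
  exec["catalyst/state/time"].set(time);
  exec["catalyst/state/timestep"].set(cycle);

  auto channel = exec["catalyst/channels/grid"];
  channel["type"].set("mesh");
  // ... mesh blueprint data under channel["data"] elided ...

  // Hypothetical per-array switch the backend could honor:
  channel["active_arrays/velocity"].set(1);

  catalyst_execute(conduit_cpp::c_node(&exec));
}
```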