Catalyst: Finer-grained control over execution

As it stands right now, if a simulation calls catalyst_execute, data will be converted even if the extractors run at a frequency greater than one. Even though conduit blueprints are lightweight, this could be costly depending on the conversion a simulation may be required to do. We could ask the simulation side to control how often catalyst is called, but then the integration becomes more convoluted.
See also related issue paraview/paraview#22358.

Moreover, users of catalyst have expressed the need to alter the behavior of a catalyst script based on information coming from the simulation. For example, near convergence they might need more frequent runs of the extractors or even different kinds of extractors/pipelines. Another use case could be to modify the properties of the filters composing a pipeline dynamically, based on values coming from each iteration.

With the current design the above use cases are either too convoluted or impossible.

To allow for finer control over the execution of catalyst, we could provide an optional field under catalyst/state of the execute protocol in the ParaView Blueprint.

Here is a possible usage as documented by @berkgeveci in paraview/paraview#22358, which I copy here for completeness:

from paraview.simple import *

source = TrivialProducer(registrationName='input')

def catalyst_execute(info):
    global source
    # the catalyst/state node the simulation passed to catalyst_execute
    n = info.catalyst_state_conduit_node()
    if n['pass'] == 'meta-data':
        # advertise which arrays the pipeline needs; no data is available yet
        n['request/arrays/a'] = 1
        n['request/arrays/b'] = 1
    elif n['pass'] == 'execute':
        # data is available now; run the pipeline and report what arrived
        source.UpdatePipeline()
        print(source.GetDataInformation().DataInformation)

catalyst/state/pass can be populated by the simulation at runtime.
The example is in Python, but any of the supported languages could be used.

Some things to note here:

  1. catalyst_execute gets the state conduit node using the catalyst_state_conduit_node() call and reads a pass entry from it.
  2. If the pass is set to meta-data, it returns meta-data (the TrivialProducer has no data at this point). This is done by directly manipulating the state node.
  3. If the pass is set to execute, it executes the TrivialProducer.

On the simulation side, the corresponding driver populates catalyst/state/pass and runs the two passes each timestep:

def coprocess(time, timeStep, grid, attributes):
    # do the actual in situ analysis and visualization.
    node = catalyst_conduit.Node()

    node['catalyst/state/timestep'] = timeStep
    node['catalyst/state/time'] = time

    # meta-data pass
    node['catalyst/state/pass'] = 'meta-data'
    catalyst.execute(node)

    # execution pass: read back what the script requested during the meta-data pass
    arrays_requested = node['catalyst/state/request/arrays']
    node['catalyst/state/pass'] = 'execute'

    node['catalyst/channels/input/type'] = 'mesh'

    mesh = node['catalyst/channels/input/data']
    # Populate the mesh. Use the arrays_requested from above
    # ...
    catalyst.execute(node)

A working merge request of this functionality can be found in paraview/paraview!6614.

What do you think of this feature? Any concerns or ideas for improving it?

cc: @Francois_Mazen, @Lucas_Givord, @nicolas.vuaille, @Andy_Bauer, @coreylee, @utkarsh.ayachit

2 Likes

As far as I understand, the convention is to consider the conduit nodes as immutable. The only mutable conduit nodes are the ones passed to the catalyst_results and catalyst_about methods. With this in mind, have you considered using the steering mechanism to get information from the Catalyst implementation?

What kind of meta-data are you expecting from the Catalyst implementation? Again, if your goal is to get some information from the ParaView side, I guess the steering mechanism is the way to go, especially with the recent Steering Extractor feature addition.
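
For reference, the steering round-trip looks roughly like this on the simulation side (a sketch; module names follow the other code in this thread):

import catalyst
import catalyst_conduit

# after a catalyst_execute call, ask the implementation for steering values
results = catalyst_conduit.Node()
catalyst.results(results)
# the simulation reads whatever the steering extractors published into `results`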

Otherwise, the state/pass key to allow or prevent pipeline execution looks great to me. Do you plan to specify the name of the pipeline which will (or will not) be executed?

3 Likes

I am wondering if overloading catalyst_execute is really the best way. I see how this is intended to avoid ABI issues, but I am wondering if we still have compatibility challenges. Let’s consider the following cases:

  • If a simulation instrumented with 2-pass support is using a Catalyst analysis implementation without knowledge of the two passes, the analysis is going to be executed twice! Note, analysis codes need not be ParaView Catalyst alone. A simple C++/VTK analysis code may not use triggers etc. at all and could just be designed to dump out all input. So now it's going to behave very oddly with a 2-pass simulation.
  • What happens with catalyst replay? That too now needs to handle passes and support use cases where the data dump is not 2-pass but the analysis is, and vice versa.

I wonder if instead adding a new API call to get metadata is better, despite the version change. If we add a catalyst_metadata API call, then simulations that don't use it can still continue to work with implementations with or without this function. Simulations that use catalyst_metadata will correctly require a newer implementation that handles this call. Taking it further, catalyst_api.c can perhaps even be updated to include a default implementation of the catalyst_metadata call if the loaded implementation doesn't implement it – not entirely sure. But if possible, that'd be a way to avoid the Catalyst version change entirely, making it forward and backward compatible.
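
To make that concrete, a hypothetical simulation-side sketch (catalyst_metadata does not exist today; both the name and the fallback behavior are assumptions):

import catalyst
import catalyst_conduit

meta = catalyst_conduit.Node()
# hypothetical new entry point; guard so older implementations keep working
if hasattr(catalyst, 'metadata'):
    catalyst.metadata(meta)  # newer implementation fills in its requests
# an empty `meta` node falls back to today's behavior: provide everything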

1 Like

I also have worries about the behavioral change here. Such a call is trivially implemented (do nothing), where the lack of any metadata requests means “provide everything”, as it always has.
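
That convention stays cheap to honor on the simulation side too; a sketch (the input/request path borrows the naming used later in this thread):

import catalyst
import catalyst_conduit

results = catalyst_conduit.Node()
catalyst.results(results)  # a do-nothing implementation leaves the node empty
# no recorded requests means "provide everything", exactly as before
provide_all = not results.has_path('input/request')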

1 Like

Everybody is right :smile:

  1. Our top priority is to expose conduit information to the Python scripts in a user-friendly way. There are several clunky ways of passing information (not data objects) back and forth. This would clean things up and open new possibilities. No one objected to this, so I am assuming that we agree.
  2. François is right: the conduit node passed to catalyst_execute is supposed to be read-only. So our example is flawed.
  3. Our example would not work with other catalyst implementations, as pointed out by Utkarsh and Ben. I am not worried about this since it is for a specific use case that is not likely to use other implementations. Nevertheless, it is bad practice to have an example or an implementation that can blow up in unexpected ways.

@Christos_Tsolakis and I talked and we believe the right way forward is to expose more capability to the ParaView/Catalyst scripts. So we proposed the following:

  1. We expose the (by convention read-only) conduit node in the Python scripts (catalyst_execute).
  2. We add the capability of providing a Python function to handle catalyst_results. This would be in addition to the steering extractors. A bi-directional conduit node is exposed to this call and it would be available in Python. The Python method would be called before the steering extractors are executed.

I am not excited about adding a new method (catalyst_metadata) for a single use case at this point. I'd rather (ab)use catalyst_results. Other implementations do not use it, so we can do some creative things with it.

4 Likes

Could the catalyst_about method be changed for ParaView Catalyst to return information to the adaptor on whether or not there’s a metadata check to be done before calling catalyst_execute? I’m not sure what this would look like yet and am just throwing ideas out there.
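
Roughly, the idea might look like this on the adaptor side (a sketch; it assumes the Python bindings expose catalyst_about like the other calls in this thread, and the capability key is entirely made up):

import catalyst
import catalyst_conduit

about = catalyst_conduit.Node()
catalyst.about(about)  # the implementation describes itself in `about`
# entirely hypothetical key an implementation could advertise:
needs_metadata_pass = about.has_path('catalyst/implementation/metadata_pass')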

That could work, but still conflicts with the const conduit_node* parameter to catalyst_execute.

Oh, do you mean specifying whether it is a no-op because the implementation is too old, versus an explicit no-op? I suppose that the catalyst_about wrapper could add the information libcatalyst has about the implementation to it…

I believe that I have a solution that does not violate any of Catalyst’s assumptions. Here is a summary:

catalyst_initialize()
n = conduit.Node()
catalyst_execute(n) # dummy execute
results = conduit.Node()
catalyst_results(results) # the Results() method in Python fills the results node
for t in times:
    data = conduit.Node()
    # populate data with simulation data. take results into account in selecting arrays
    catalyst_execute(data)

You can even do this and potentially change which fields are populated at any time:

catalyst_initialize()
n = conduit.Node()
catalyst_execute(n) # dummy execute
results = conduit.Node()
catalyst_results(results) # the Results() method in Python fills the results node
for t in times:
    data = conduit.Node()
    # populate data with simulation data. take results into account in selecting arrays
    catalyst_execute(data)
    catalyst_results(results) # the Results() method in Python fills the results node

This approach lines up very nicely with the steering concept. We are steering catalyst by telling it which data structures to adapt. It can be mixed and matched with any other steering approach, including Live.

1 Like

Two merge requests that implement the above functionality:

  1. paraview/paraview!6614 exposes the params conduit node inside the catalyst_execute() call of a catalyst script. This is the node that the simulation passed to the catalyst_execute(params) call. The node is assumed to be read-only. We do nothing besides giving access to it.

  2. paraview/paraview!6626 realizes the second example from above. catalyst_results can now be specified inside a catalyst script. The node passed as input from the simulation side is available to this call via its info parameter.
    The merge request contains a new example, which I add here simplified for completeness.

catalyst script:

from paraview.simple import *
producer = TrivialProducer(registrationName="input")

def catalyst_execute(info):
    global producer

    print("available point arrays:", producer.PointData.keys())
    print("available cell arrays:", producer.CellData.keys())

def catalyst_results(info):
    # request only some of the available arrays for the next `catalyst_execute` call
    info.catalyst_params['input/request/velocity'] = True
    if info.timestep % 2 == 0:
        info.catalyst_params['input/request/pressure'] = True
    else:
        info.catalyst_params['input/request/pressure'] = False

simulation driver:

results = None
for i in range(100):
    attributes.Update(i)
    catalyst_adaptor.execute(i, i, grid, attributes, results)
    results = catalyst_adaptor.results()

catalyst_adaptor:

def results():
    # get results from catalyst script
    node = catalyst_conduit.Node()
    catalyst.results(node)
    return node

def execute(time, timeStep, grid, attributes, results=None):
    node = catalyst_conduit.Node()

    node['catalyst/state/timestep'] = timeStep
    node['catalyst/state/time'] = time
    ...
    # based on the previous results, pass only the required arrays
    use_velocity = results is None or results['input/request/velocity'] == True
    use_pressure = results is None or results['input/request/pressure'] == True

    # fields sub-tree of the input channel (cf. the blueprint example above)
    fields = node['catalyst/channels/input/data/fields']

    if use_velocity:
        fields['velocity/association'] = 'vertex'
        ...

    if use_pressure:
        fields['pressure/association'] = 'vertex'
        ...

Of course, we could have more complicated logic, like applying different filters and updating different producers if multiple channels exist.
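
For instance, a sketch of a multi-channel catalyst_results (the boundary channel, its heat_flux array, and the timestep threshold are made up for illustration):

def catalyst_results(info):
    # per-channel requests for the next catalyst_execute call
    info.catalyst_params['input/request/velocity'] = True
    info.catalyst_params['boundary/request/heat_flux'] = True  # hypothetical second channel
    # drop an expensive array once past an (assumed) transient phase
    if info.timestep > 50:
        info.catalyst_params['input/request/pressure'] = False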

1 Like

This works very nicely. The example that @Christos_Tsolakis gave updates the list of variables every time step. If one wants to update only at the beginning, something like this works:

node = catalyst_conduit.Node()
node['catalyst/state/timestep'] = -1
node['catalyst/state/time'] = -1.0
catalyst.execute(node)  # dummy execute: no channel data populated
n = catalyst_conduit.Node()
catalyst.results(n)

This requires that the in situ pipeline ignores empty data. Alternatively, one can do:

checked_request = False
for i, t in enumerate(times):
    node = catalyst_conduit.Node()
    node['catalyst/state/timestep'] = i
    node['catalyst/state/time'] = t
    # Fill node with data
    catalyst.execute(node)
    if not checked_request:
        n = catalyst_conduit.Node()
        catalyst.results(n)
        # Do something with n
        checked_request = True

In this case, the pipeline generates the whole data in the first timestep and only what is requested afterward.

This should handle all of the use cases we talked about and open a whole lot of possibilities in communicating between the simulation and in situ scripts.

1 Like