We have been mulling updates to the Catalyst API – the API and patterns developers use when instrumenting a simulation to work with Catalyst – for a while now, and we have had discussions with several folks. This is a proposal that puts together some of the ideas discussed.
Motivation
Looking at various existing adaptor implementations, it becomes clear that most of them are an arbitrary collection of C functions that are passed initialization parameters, simulation data-structures, and meta-data. This tends to produce functions with large argument lists that get very confusing very quickly. For example, here's a function from a real-life CTH Catalyst adaptor:
```c
void pvspy_sta(int block_id, int allocated, int active, int level, int max_level, int bxbot,
int bxtop, int bybot, int bytop, int bzbot, int bztop, int npxma11, int npxma21, int npxma12,
int npxma22, int npyma11, int npyma21, int npyma12, int npyma22, int npzma11, int npzma21,
int npzma12, int npzma22, int npxpa11, int npxpa21, int npxpa12, int npxpa22, int npypa11,
int npypa21, int npypa12, int npypa22, int npzpa11, int npzpa21, int npzpa12, int npzpa22,
int nbxma11, int nbxma21, int nbxma12, int nbxma22, int nbyma11, int nbyma21, int nbyma12,
int nbyma22, int nbzma11, int nbzma21, int nbzma12, int nbzma22, int nbxpa11, int nbxpa21,
int nbxpa12, int nbxpa22, int nbypa11, int nbypa21, int nbypa12, int nbypa22, int nbzpa11,
int nbzpa21, int nbzpa12, int nbzpa22)
{
...
}
```
So the first question becomes: can we give the API some structure so that we avoid this kind of adaptor code, which is tedious to maintain and debug?
It’s fair to say that ParaView/Catalyst changes more frequently than the simulation code. That being the case, for each release of ParaView, the simulation needs to be rebuilt with an updated version of ParaView. This is burdensome. Can we support a use-case where the simulation doesn’t need to rebuild / re-link whenever there’s a new version of ParaView? A corollary of this: can we support run-time selection of which version of ParaView/Catalyst to use? That way it’s easy to try multiple versions of ParaView. This enables simulations to easily update to the latest version of ParaView and go back to an earlier stable version in case of regressions.
Another common challenge encountered when using Catalyst is debugging. Sometimes a filter (or some other component in the in situ analysis and viz. pipeline) fails when running in situ with the simulation, but the problem is hard to reproduce using just ParaView or pvbatch. Can we simplify this debugging use-case, i.e., make it possible to recreate the in situ environment without having to run the simulation?
Design
With these questions in mind, let’s enumerate the key aspects of a design that can address them:
- Use a data-structure to pass data / meta-data from the simulation to the adaptor: something that lets the simulation pass named parameters by value or reference, e.g. a dictionary. In that case, the `pvspy_sta(...)` function in the example above could be rewritten as `pvspy_sta(params)`. Instead of every adaptor instrumentation defining its own API, this also helps us standardize the adaptor interface. The API, for example, can comprise just 3 calls: `catalyst_initialize(params)`, `catalyst_execute(params)`, and `catalyst_finalize(params)`, where parameters for each of the calls are passed through that dictionary-like data-structure.
- With the aforementioned change, the API that the simulation uses to set up and execute Catalyst is fixed and limited. It consists only of the API related to creation/assignment/cleanup of the dictionary and the 3 `catalyst_..` calls. If we keep the dictionary data-structure opaque, we can provide an ABI-stable adaptor API. Make it a `C` API, instead of `C++`, and we make it even more stable and easier to use from a multitude of languages. We can then provide a trivial stub implementation of this API, with no external dependencies, that simulations can link against. This stub will do nothing by default – thus introducing no overhead for simulations. At runtime, one can swap this stub with a custom adaptor implementation that is specific to the simulation and uses a chosen version of ParaView. Since the adaptor will be ABI compatible, this should be easy to do using standard environment modules or by updating `LD_LIBRARY_PATH` (or `DYLD_LIBRARY_PATH`).
- This standardized adaptor API will be provided in its own separate source repository / package with no external dependencies. Linking and building against it will be kept simple: one should not require a CMake-based build system or anything fancy at all. Something as simple as `-I $root/include -L $root/bin -l catalyst_adaptor` added to the compiler/linker flags should suffice.
- Finally, since all exchange between the simulation and the adaptor happens via the `params` dictionary, if we serialize the dictionary, it should be possible to recreate the state for debugging purposes. To support this, we can provide an implementation of the stub adaptor that dumps out all data passed to the 3 `catalyst_...` calls. Now, all we need is a small miniapp / driver that can load these dumps and play them back to recreate the state for debugging later on.
Discussion
- The dictionary can represent a hierarchical structure where the key is a path rather than just a string. This makes it possible for the adaptor developer to devise a schema to conveniently pass simulation data-structures and meta-data to the adaptor. One possible choice for this dictionary data-structure is Conduit.
- ParaView can define a standard schema for all supported VTK data types. Simple simulations can use this standard schema directly, so one doesn’t need to write a custom adaptor at all.
- A typical adaptor implementation of `catalyst_execute(..)` will take the `params` dictionary passed to it and hand it to a data-producer vtkAlgorithm subclass that can be connected in the Catalyst pipeline. This vtkAlgorithm subclass will implement `RequestData`, where the creation of VTK data objects using the simulation-provided parameters will happen. Since the VTK data object creation happens in `RequestData`, there will be no conversion of simulation data-structures to VTK until requested by the analysis pipeline. Thus, for timesteps where the analysis pipeline is not executed, we won’t be wasting any cycles converting simulation data to VTK.
Thoughts? Comments?