Catalyst V2 AddPipeline() doesn't use PYTHONPATH

Hi,

Looking at vtkInSituInitializationHelper::AddPipeline(const std::string& path), it expects the passed-in Catalyst script to be in a “good” location (I believe either the current working directory or an absolute path). For people using the Python-wrapped API to Catalyst V2, trying to specify the script location through PYTHONPATH doesn’t work. Should we change vtkInSituInitializationHelper::AddPipeline() to also search for the script in PYTHONPATH?

Thanks,
Andy

Adding to this, PYTHONPATH is getting scrubbed in both pvbatch and also just in ParaView Catalyst.

Hi Andy,

If adding PYTHONPATH to the search locations for Catalyst scripts would make for a more comfortable or familiar workflow for the target audience here, I don’t see any immediate problem with it. My question would be: is this a preferred workflow we should advertise, or a supplemental path for deeply enfranchised Python simulation devs?

What do you mean by “scrubbed”? Python 3.10 changed some initialization logic that forced us to update some things (and we adapted for older Python at the same time to avoid a giant conditional code switch). We used to fill in sys.path and then initialize Python, but now initialization unconditionally sets up sys.path ignoring anything we did before. I suppose something could have messed up somewhere, but without tests for the behavior you expect it’s not surprising that something may have gotten lost in the conversion.

As for the core request, I don’t know that searching in PYTHONPATH makes sense. I don’t think I’d expect to see “scripts” in an importable location. Should we just support PYTHON_CATALYST_PATH and search there for any kind of script?

I got confused about something with the environment variables so never mind about the scrubbing part.

Is PYTHON_CATALYST_PATH any better than just using PYTHONPATH? With PYTHON_CATALYST_PATH, people have to find out that the environment variable exists (yet another ParaView Catalyst environment variable). People may simply assume or guess that PYTHONPATH is the one to use.

If people are using the Python API to Catalyst v2 I do think they would expect to see these ParaView Catalyst scripts in an importable location. That’s certainly what I’m expecting.

I suppose. Is this a ParaView thing? If so, I’m a lot less concerned as ParaView has an easier time of deprecating behavior if we decide on something else. libcatalyst has a much higher bar with its ABI compatibility guarantee.

Yes, this would be specifically for ParaView Catalyst Python scripts. No need to touch Catalyst itself.

I am wondering if we’re overloading things here. To me, vtkInSituInitializationHelper::AddPipeline(const std::string& path) using an absolute path makes perfect sense. I’d even say we drop support for relative paths. For Catalyst, relative paths become convoluted since we’re running in an externally initialized runtime environment.

Perhaps we want a vtkInSituInitializationHelper::AddPipelinePythonModule(const std::string& modulename). Now, instead of a path to a file, we’re passing a Python module name and then Python can do the lookup for us. So PYTHONPATH, sys.path, and whatever other mechanisms Python chooses to support will work to locate the module.
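To illustrate what "Python can do the lookup for us" means in practice, here is a minimal sketch using the standard library's importlib. It is not ParaView code: the function name `locate_pipeline_module` and the module name `my_catalyst_pipeline` are made up for the example, and the temporary directory only stands in for a directory a user might list in PYTHONPATH.

```python
import importlib.util
import os
import sys
import tempfile


def locate_pipeline_module(module_name):
    """Return the file Python would import for module_name, or None.

    Hypothetical helper: it relies on Python's own import machinery, so
    PYTHONPATH, sys.path edits, and .pth files are all honored.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return spec.origin


# Demonstration: a scratch directory stands in for a PYTHONPATH entry.
scratch_dir = tempfile.mkdtemp()
script_path = os.path.join(scratch_dir, "my_catalyst_pipeline.py")
with open(script_path, "w") as f:
    f.write("# placeholder for a ParaView Catalyst pipeline script\n")

sys.path.insert(0, scratch_dir)
importlib.invalidate_caches()  # make sure the new file is noticed

found = locate_pipeline_module("my_catalyst_pipeline")
missing = locate_pipeline_module("no_such_pipeline_module")
```

Here `found` points at the file inside the scratch directory and `missing` is None, without the caller ever handling a file path directly.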

The ParaView-Catalyst blueprint can then be extended to support modules as follows:

protocol: 'initialize'
Currently, the 'initialize' protocol defines how to pass scripts to load for analysis.

catalyst/scripts: (optional) if present, must be either a 'list' or 'object' node whose child nodes provide paths to the Python scripts to load for in situ analysis.
catalyst/scripts/[name]: (optional) if present, can be a 'string' or 'object'. If 'string', it is interpreted as the path to the Python script. If 'object', it can have the following attributes.
catalyst/scripts/[name]/filename**: path to the Python script, OR
catalyst/scripts/[name]/module**: Python module name # <============ NEW
catalyst/scripts/[name]/args: (optional) if present, must be of type 'list' with each child node of type 'string'.

** Exactly one of `filename` or `module` must be specified.
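As a rough sketch of how the "exactly one of `filename` or `module`" rule could be checked, the snippet below validates a single catalyst/scripts/[name] entry. It uses plain Python strings and dicts as a stand-in for Conduit nodes, and `classify_script_entry` is a hypothetical name, not part of ParaView or Catalyst.

```python
def classify_script_entry(entry):
    """Validate one catalyst/scripts/[name] entry.

    Returns a (kind, value, args) tuple where kind is "filename" or
    "module". Plain str/dict values stand in for Conduit nodes here.
    """
    # A bare string is interpreted as the path to the Python script.
    if isinstance(entry, str):
        return "filename", entry, []
    if not isinstance(entry, dict):
        raise TypeError("entry must be a 'string' or an 'object'")

    has_filename = "filename" in entry
    has_module = "module" in entry
    # Reject both "neither given" and "both given".
    if has_filename == has_module:
        raise ValueError("exactly one of 'filename' or 'module' must be specified")

    args = entry.get("args", [])
    if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
        raise TypeError("'args' must be a list of strings")

    kind = "filename" if has_filename else "module"
    return kind, entry[kind], args


by_path = classify_script_entry("/abs/path/analysis.py")
by_module = classify_script_entry({"module": "my_pipeline", "args": ["--verbose"]})
```

An entry that specifies both `filename` and `module`, or neither, raises ValueError in this sketch.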

This sounds a lot better to me too.

I’m more concerned about functionality than the underlying implementation details. The use case I have in mind is specifying the script name either through command-line arguments (as the Catalyst examples are set up) or possibly through a config file for the simulation. The functionality that I think should be supported is:

  • Python script with a relative path
  • Python script with an absolute path
  • Python script located in a directory listed in either $PYTHONPATH or sys.path

Will a user know that Catalyst links at run time to the ParaView Catalyst implementation through the generic Catalyst API, or know the details of that implementation? Will the user know that they’re running the in situ pipelines in an externally initialized runtime environment? It doesn’t really matter. To users, only the functionality matters, and all three of those mechanisms seem like reasonable ways to get access to a Python script.
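The three lookup mechanisms listed above can be sketched in a few lines of Python. This is only an illustration of the requested behavior, not how vtkInSituInitializationHelper works today; `resolve_script_path` and the file name `analysis_pipeline.py` are hypothetical. Since Python places the PYTHONPATH entries into sys.path at startup, searching sys.path covers both.

```python
import os
import sys
import tempfile


def resolve_script_path(name, search_dirs=None):
    """Return an absolute path for a script name, or None if not found.

    Order tried: (1) and (2) the name as given, which covers both
    relative and absolute paths; (3) each directory in search_dirs,
    which defaults to sys.path (and so includes PYTHONPATH entries).
    """
    if os.path.isfile(name):
        return os.path.abspath(name)

    if search_dirs is None:
        search_dirs = sys.path
    for directory in search_dirs:
        # An empty sys.path entry means the current working directory.
        candidate = os.path.join(directory or os.getcwd(), name)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    return None


# Demonstration: a scratch directory stands in for a PYTHONPATH entry.
scratch_dir = tempfile.mkdtemp()
script_file = os.path.join(scratch_dir, "analysis_pipeline.py")
with open(script_file, "w") as f:
    f.write("# placeholder for a ParaView Catalyst pipeline script\n")

via_absolute = resolve_script_path(script_file)
via_search = resolve_script_path("analysis_pipeline.py", [scratch_dir])
not_found = resolve_script_path("analysis_pipeline.py", [])
```

One design question this sketch leaves open is precedence: here a match in the current working directory wins over one found through sys.path, which may or may not be what users expect.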