General design questions for plugins programming

raffienficiaud · July 1, 2018, 2:17pm

Dear forum,

My name is Raffi Enficiaud, and I am developing a plugin in ParaView for brain data visualization. I am relatively new to ParaView/VTK and plugin development, so please accept my apologies if my questions seem naive.

My question concerns the general design of a python programmable filter, and how to improve the user “experience” by making things faster and easier to interact with. Python is obviously not the fastest programming language, but that aside, I would like better insight into how to shape the overall architecture of the plugin so that it scales better with respect to computation and interaction.

Right now the pipeline looks like this:

some mesh
python programmable filter

The python filter depends on:

1. the mesh upstream
2. an external file (numpy array) that is transformed depending on the mesh
3. some parameters that affect the processing performed by the plugin
4. some parameters that affect the rendering of the plugin

Step 2 is the most time-consuming, and ideally I want to avoid any unnecessary recomputation. In particular, I want to avoid reprocessing everything if the user changes one of the rendering parameters (for instance, he/she selects a specific trajectory or primitive that should be shown or hidden).

The processing is performed in numpy, so I would like to keep the vtk<->numpy conversions to a minimum.

Here are some ideas, but I do not know how to achieve those. Any hint would be more than welcome:

Divide the plugin into 2 or 3 parts: a processing plugin and a graphic primitive plugin. The questions are then:
- is it possible for a plugin to output a numpy array, and for this array to be passed to downstream plugins?
- if so, how do I name the array that is being passed downstream? How do I distinguish between the different sources of information?
- is the user required to manually add those 2 plugins to the pipeline? Is there a way to hide this, in the sense that the end user does not need to know there are 2 plugins? For instance, the user selects my plugin, and two elements are automatically added to the pipeline without the user doing any manual configuration. I am asking this because the targeted users are not techy at all.
- if the processing plugin does not need any input (just a file that is outside of the pipeline), then it would make sense to make it a programmable source. Yet, I would like to be able to change some of the parameters of the source. I am currently generating the XML out of a python description, using the examples from here: https://blog.kitware.com/easy-customization-of-the-paraview-python-programmable-filter-property-panel/ which I adapted to my needs. Does a programmable source work the same way?
The other option would be to cache the information that does not need to be recomputed.
- is there a way to store information in some global memory, such that it can be reused the next time the pipeline executes the plugin? Is there a recommended location for caching information on disk?
- is there a way to get information about the upstream pipeline? If the pipeline changes somehow, some computations need to be executed again with the new incoming data. Is there a flag or a hash that can be used to tell that the incoming data has changed since the last processing?
Finally, to accelerate the processing, some python parts might be compiled as python extensions. But there I am facing some problems.
ParaView on OSX and on Linux is in both cases compiled against the system python, which is good news for ABI compatibility. The version of numpy might however not match the one that comes from the system or from a virtualenv. So it would be best to compile the python extension in the same environment as the one provided by ParaView. My python extension uses cython. Has anyone successfully compiled a cython-based python extension for use inside ParaView?

Thank you for your time, and thanks in advance for any hint,
Best,
Raffi Enficiaud

Kenneth_Moreland · July 2, 2018, 3:50pm

Raffi,

Your description and the questions you ask are very broad, and I don’t think I can give a definitive answer to everything that you ask. I will try to give some feedback on some of the points you make.

First, I am a bit confused by what you mean by a “plugin.” In ParaView, plugins are a specific thing that can encapsulate a number of objects such as filters, rendering components, GUI elements, and code to load at startup. They are typically encased in a shared object library or sometimes in an XML file (see the plugin documentation for more). It makes little sense to me to break up a plugin into pieces since you can wrap up lots of different filters into a single plugin.

I think what you actually mean is a filter, which is what we call those units that are displayed in the Pipeline Browser. Whether or not you split one filter into several is a style decision. I don’t think I understand your problem well enough to give any advice one way or another.

Second, you say that you are trying to minimize the vtk <-> numpy transformations. I’m not sure this is necessary. The passing of data between VTK and numpy should be very efficient. I believe the linking takes C arrays directly from numpy and places them in VTK without data copying (and vice versa). Unless you are noticing a slowdown specifically with getting numpy arrays in and out of VTK data structures, I wouldn’t start by focusing on this.
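A minimal sketch of how the no-copy claim could be checked, assuming the VTK Python package and numpy are importable (for example from pvpython). The function name `shares_memory_with_vtk` is hypothetical; `numpy_to_vtk` and `vtk_to_numpy` come from `vtk.util.numpy_support`. Note that sharing only applies when the numpy array is contiguous and its dtype maps directly to a VTK array type; otherwise a copy is made.

```python
def shares_memory_with_vtk(n=5):
    """Return True if a VTK array built from a numpy array reuses its buffer."""
    # Local imports: VTK and numpy are only needed when the function runs.
    import numpy as np
    from vtk.util import numpy_support

    source = np.arange(n, dtype=np.float64)

    # deep=False asks VTK to reference the numpy buffer instead of copying it.
    wrapped = numpy_support.numpy_to_vtk(source, deep=False)

    # An in-place edit on the numpy side should be visible through VTK.
    source[0] = 42.0
    seen_by_vtk = wrapped.GetValue(0) == 42.0

    # Going back to numpy gives an array backed by the same memory.
    back = numpy_support.vtk_to_numpy(wrapped)
    return bool(seen_by_vtk and np.shares_memory(back, source))
```

If this returns True in your environment, the conversion itself is unlikely to be the bottleneck, and profiling the numpy processing is probably a better first step.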

Third, you asked about passing arrays downstream. There are a couple of ways to do that. The first way is to build a data object that holds the array as a field or something like that. You could create a 1D vtkImageData or you could create a vtkTable. However, if you want to embed your array in a VTK data structure (perhaps a copy of the input mesh), you can do that using the field data of VTK data objects. That would work something like this:

# wrap the numpy array as a named VTK array and attach it to the output
vtkarray = paraview.vtk.dataset_adapter.numpyTovtkDataArray(numpy_array, 'mydata')
output_data.GetFieldData().AddArray(vtkarray)

Note that it is important to assign a name to the array so that you can later get it out of the field. In the downstream filter, that would look something like this.

vtkarray = input_data.GetFieldData().GetArray('mydata')
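For experimenting outside ParaView, the same name-based hand-off can be sketched with plain VTK objects. This is only an illustration under assumptions: `field_data_round_trip` is a hypothetical helper, an empty `vtkPolyData` stands in for the filter output, and the conversion uses `vtk.util.numpy_support` rather than the `dataset_adapter` call shown above.

```python
def field_data_round_trip():
    """Attach a numpy array to a data object by name and read it back."""
    # Local imports: VTK and numpy are only needed when the function runs.
    import numpy as np
    import vtk
    from vtk.util import numpy_support

    # "Upstream" side: convert the result and give it a name.
    result = np.linspace(0.0, 1.0, 4)
    vtk_array = numpy_support.numpy_to_vtk(result, deep=True)
    vtk_array.SetName("mydata")
    mesh = vtk.vtkPolyData()  # stand-in for the upstream filter output
    mesh.GetFieldData().AddArray(vtk_array)

    # "Downstream" side: look the array up by the same name.
    fetched = mesh.GetFieldData().GetArray("mydata")
    if fetched is None:
        raise KeyError("no field array named 'mydata'")
    return result, numpy_support.vtk_to_numpy(fetched)
```

Using a distinct name per array is what lets a downstream filter tell several pieces of information apart when they travel in the same field data.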

raffienficiaud · July 2, 2018, 5:49pm

Hi Kenneth,

Thank you for your reply!

Sorry for the lack of precision, I meant a ParaView filter plugin (a vtkPythonProgrammableFilter). The one I have is written in Python and exported to XML with the tool I pointed to in the link.

Concerning the computation performed, let’s say it is composed of 2 parts, one heavy and one less heavy. I would like to cache the heavy computation, and to have a way to know when I can consider this cache dirty.
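One possible shape for such a cache, sketched in plain Python under stated assumptions: the result of the heavy part is keyed on everything it depends on, and is recomputed only when that key changes. VTK objects expose `GetMTime()`, so the upstream stamp could be taken from the input data object, assuming its modification time increases whenever the incoming data changes. `HeavyResultCache`, its arguments, and the question of where the cache instance lives between executions are all hypothetical here.

```python
import os


class HeavyResultCache:
    """Recompute the heavy result only when one of its dependencies changes."""

    def __init__(self, compute):
        self._compute = compute  # the expensive function (hypothetical)
        self._key = None         # dependencies seen at the last computation
        self._value = None
        self.misses = 0          # number of real computations, for inspection

    def get(self, input_mtime, file_path, processing_params):
        # The key covers the upstream stamp, the external file and the
        # processing parameters. Rendering parameters are deliberately left out.
        key = (
            input_mtime,
            file_path,
            os.path.getmtime(file_path),
            tuple(sorted(processing_params.items())),
        )
        if key != self._key:
            # Cache is dirty: run the heavy part again and remember the key.
            self._value = self._compute(file_path, processing_params)
            self._key = key
            self.misses += 1
        return self._value
```

With this layout, a call such as `cache.get(input_data.GetMTime(), path, params)` would return the stored result as long as the mesh, the file and the processing parameters are unchanged, so toggling a rendering option would not trigger the heavy part.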

The fact that I am able to push an array from upstream to downstream in the way you mentioned is indeed what I was looking for. It involves a copy though, even though this copy is fast. I guess there is no way to avoid that: ideally I would like to pass a numpy array instead of having to transform this back and forth to a vtkArray.

I guess having to add two elements to the pipeline for the user is not very difficult, so I’ll try this splitting then. Do you know if there is a way to pass parameters from the GUI to a programmable source?

Thanks again,
Best,
Raffi

utkarsh.ayachit · July 2, 2018, 6:27pm

Check this post. ParaView master (which will become ParaView 5.6 this fall) has an even better approach to generating the GUI for Python filters, which uses Python Algorithm. More docs for that are under development, but you can look at the examples here. You simply load the example py file using the Tools | Manage Plugins dialog.

raffienficiaud · July 2, 2018, 7:22pm

Thank you for the reply. I am aware of this post; it is the one on which I based my filter installation procedure. My question was more about the programmable source: does the same approach work with programmable sources?

Thanks

utkarsh.ayachit · July 3, 2018, 12:19pm

Indeed. The blog does talk about a Programmable Source: Double Helix Source. There’s no difference between a Source and a Filter except that the latter takes an Input.