I’m using numpy methods in the
Python Calculator but I’m getting different behavior depending on whether the input is a vtkDataSet or a vtkCompositeDataSet. The expression I’m trying out is:
pythonCalculator2.Expression = "numpy.swapaxes(Gradients, 1, 2)"
Note that Gradients is a 9 component array.
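For context, here is a plain-numpy sketch of what that expression computes, assuming the 9-component gradient array is exposed with shape (N, 3, 3), one 3x3 Jacobian per point (the array name and shape here are illustrative):

```python
import numpy as np

# Toy stand-in for a 9-component gradient array on N points,
# viewed as one 3x3 Jacobian per point.
n = 4
gradients = np.arange(n * 9, dtype=float).reshape(n, 3, 3)

# Swapping axes 1 and 2 transposes each per-point 3x3 matrix.
transposed = np.swapaxes(gradients, 1, 2)
assert np.array_equal(transposed[0], gradients[0].T)
```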
This works fine if the input is a vtkDataSet but has problems if the input is a vtkCompositeDataSet. So what I’m wondering is: should users be allowed to use numpy like this in the Python Calculator? I don’t see anything in the documentation that says whether this is allowed or not. I noticed that VTKArray derives from numpy.ndarray but VTKCompositeDataArray derives from object.
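To illustrate why the base class matters, here are toy stand-ins (not the real VTK classes): numpy functions accept an ndarray subclass directly, but an object that merely holds a list of per-block arrays is opaque to them.

```python
import numpy as np

# Toy stand-ins, not the real VTK classes.
class ToyVTKArray(np.ndarray):
    """Behaves like a numpy array because it subclasses ndarray."""

class ToyCompositeArray:
    """Just holds per-block arrays; numpy knows nothing about it."""
    def __init__(self, arrays):
        self.Arrays = arrays

a = np.arange(18, dtype=float).reshape(2, 3, 3).view(ToyVTKArray)
print(type(np.swapaxes(a, 1, 2)).__name__)  # ndarray subclass: works fine

comp = ToyCompositeArray([a])
try:
    np.swapaxes(comp, 1, 2)
except Exception as exc:
    print("composite fails:", type(exc).__name__)
```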
Any thoughts on this?
Note that if I don’t allow composite datasets in the vtkPythonCalculator class, by commenting out vtkPythonCalculator::FillInputPortInformation, I’d get the same behavior for vtkCompositeDataSets, since the pipeline would then iterate through each block in a composite dataset and have vtkPythonCalculator operate only on vtkDataSets.
Trying this out in an MR – https://gitlab.kitware.com/paraview/paraview/-/merge_requests/6452
Unfortunately, this won’t work. For composite dataset arrays, we actually use a Python class that is not even a numpy array. We have custom functions that iterate over each per-block array and call the underlying numpy function, so there has to be a swapaxes() version of that.
VTKCompositeDataArray derives from object instead of numpy.ndarray, so it doesn’t work in the Python Calculator when passed to numpy functions. With the change in my MR, though, the Python Calculator would never operate on composite datasets directly, so we could use numpy methods as-is. Wouldn’t that be an improvement?
If vtkPythonCalculator does not have a specific code path for composite datasets (I didn’t check), then your change makes sense, I think.
Well, yes and no. If we do that, the global functions would stop working. There are many of those such as min, max, mean etc. So something like the following would stop working for partitioned datasets:
avar / mean(avar)
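A small plain-numpy illustration of why this matters, using hypothetical two-block data (this is just the concept, not the real distributed-aware implementation): the composite-aware mean reduces across all blocks, which a naive per-block numpy.mean cannot do.

```python
import numpy as np

# Two hypothetical blocks of a partitioned dataset.
blocks = [np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])]

# A composite-aware mean() reduces over every block...
global_mean = np.concatenate(blocks).mean()   # 3.0

# ...whereas naive per-block means differ from it.
per_block_means = [b.mean() for b in blocks]  # [1.5, 4.0]

# avar / mean(avar), applied block by block using the global mean:
normalized = [b / global_mean for b in blocks]
```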
which is more important than supporting arbitrary numpy functions. The code path for adding support for a numpy function is pretty simple. It’s as simple as adding something like the following to algorithms.py:
mod = _make_dfunc(numpy.mod)
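Roughly, such a wrapper just maps the numpy function over the per-block arrays. A simplified sketch of the idea (the class and function names here are illustrative, not the actual algorithms.py code):

```python
import numpy as np

class ToyCompositeArray:
    """Illustrative stand-in for VTKCompositeDataArray."""
    def __init__(self, arrays):
        self.Arrays = arrays

def make_dfunc(dfunc):
    """Wrap a two-argument numpy function so it maps over blocks."""
    def wrapped(x, y):
        if isinstance(x, ToyCompositeArray):
            return ToyCompositeArray([dfunc(a, y) for a in x.Arrays])
        return dfunc(x, y)
    return wrapped

mod = make_dfunc(np.mod)
comp = ToyCompositeArray([np.array([3.0, 4.0]), np.array([5.0])])
result = mod(comp, 3.0)  # numpy.mod applied block by block
```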
Well, indeed … my bad.
So there is no better way than manually adding functions as needed?
We could probably add a wrapper that takes a numpy function. Something like:
apply_numpy(numpy.swapaxes, arrays, args...)
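A minimal sketch of what such a generic wrapper could look like, just to make the proposal concrete (hypothetical names and structure, not real ParaView code):

```python
import numpy as np

class ToyCompositeArray:
    """Illustrative stand-in for VTKCompositeDataArray."""
    def __init__(self, arrays):
        self.Arrays = arrays

def apply_numpy(func, array, *args):
    """Apply an arbitrary numpy function per block of a composite array."""
    if isinstance(array, ToyCompositeArray):
        return ToyCompositeArray([func(a, *args) for a in array.Arrays])
    return func(array, *args)

comp = ToyCompositeArray([np.zeros((2, 3, 3)), np.ones((1, 3, 3))])
swapped = apply_numpy(np.swapaxes, comp, 1, 2)
```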
What do you think @Andy_Bauer ?
Yeah, I like that. I couldn’t quite figure out how to make it look, but adding a numpy wrapper would be nice to get all of the numpy functionality without having to explicitly add each function.
It turns out that this functionality already exists; I wrote it to implement the underlying looping. All you have to do is:
apply_ufunc(numpy.swapaxes, Gradients, (1, 2))
Give it a try.
You can also use apply_dfunc when you have two arguments.