C extensions for Python plugins

elynch · February 22, 2021, 8:46pm

Is there any way to build a C or C++ extension module that could be imported by
a Python ParaView plugin? Alternatively, is there any way to use Cython in a
ParaView plugin?

Background: I’m building a Python reader plugin for a binary file format that
stores unstructured cell connectivity in a way that is slow to parse in Python.
I essentially have to do:

for i in range(numCells):
    header = np.fromfile(fin, np.int32, 1)
    cellType = header[0] >> 18
    if cellType == MY_FORMAT_TET:
        nodes = np.fromfile(fin, np.int32, 4)
    elif cellType == MY_FORMAT_HEX:
        nodes = np.fromfile(fin, np.int32, 8)
    ...

That loop is quite slow in Python for more than a million or so cells. In
standalone tools, I’ve been able to parse these files in Python quickly if I
can use a C extension module to handle the volume connectivity portion of the
file.

I’d like to avoid building the entirety of this plugin in C++. I can’t expect
my users to compile the superbuild and then compile a plugin. A Python plugin
is a much lower bar.

elynch · March 1, 2021, 2:26pm

I ended up solving this problem by having my Python plugin launch an external process (via subprocess.Popen) that reads the unstructured connectivity in chunks of 100,000 cells and writes them to its stdout as plain arrays of offsets, node indices, and cell types, which are convenient for constructing a vtkCellArray for each chunk. The external process is written in C, so all that branching based on what type of cell is coming up next isn’t so expensive. The plugin still reads all the other parts of the file (node coordinates, variables, etc.). The extra executable is easy to build (and instruct my users on building) since it has no dependencies.

With this technique, I can read 46 million cells in about 5 seconds. Doing it all in Python takes at least 100 times longer (I lost patience and killed it after that long).
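The chunked streaming described above can be sketched roughly as follows. This is a minimal sketch, not the actual plugin code: the real reader is a C executable that is not shown in this thread, so a small Python child process stands in for it here, and the wire format (two int32 counts per chunk, followed by the offsets, node indices, and cell types as raw int32 arrays) is an assumption made for illustration. The names CHILD, read_array, and read_chunks are hypothetical.

```python
import subprocess
import sys

import numpy as np

# Stand-in for the external C reader (hypothetical protocol): each chunk is
# two int32 counts (cells, node indices), then offsets, node indices, and
# cell types as raw native-endian int32 arrays.
CHILD = r"""
import sys, numpy as np
out = sys.stdout.buffer
# One chunk holding a tet (VTK type 10) and a hex (VTK type 12).
offsets = np.array([0, 4, 12], np.int32)
conn = np.arange(12, dtype=np.int32)
types = np.array([10, 12], np.int32)
out.write(np.array([2, 12], np.int32).tobytes())
for a in (offsets, conn, types):
    out.write(a.tobytes())
"""


def read_array(pipe, count):
    """Read exactly `count` int32 values from a binary pipe."""
    data = pipe.read(count * 4)
    if len(data) != count * 4:
        raise EOFError("reader process ended mid-chunk")
    return np.frombuffer(data, dtype=np.int32)


def read_chunks(cmd):
    """Yield (offsets, conn, types) for each chunk the child writes."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            header = proc.stdout.read(8)
            if not header:
                break  # clean end of stream
            n_cells, n_conn = (int(v) for v in np.frombuffer(header, np.int32))
            offsets = read_array(proc.stdout, n_cells + 1)
            conn = read_array(proc.stdout, n_conn)
            types = read_array(proc.stdout, n_cells)
            yield offsets, conn, types


chunks = list(read_chunks([sys.executable, "-c", CHILD]))
```

In a real plugin the command would point at the compiled reader instead of sys.executable, and each chunk's arrays would be handed to a vtkCellArray rather than collected in a list.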

utkarsh.ayachit · March 1, 2021, 2:38pm

Interesting approach. You should also be able to call into a plain C library from Python using standard mechanisms such as ctypes.
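For readers unfamiliar with ctypes, here is a minimal sketch of the general pattern: load a shared library, declare the C signature, and pass the addresses of preallocated NumPy buffers. The connectivity reader from this thread is not shown, so libc's memcpy is used purely as a stand-in for "a C function that fills an output array". The sketch assumes a POSIX system where the C runtime can be loaded this way.

```python
import ctypes
import ctypes.util

import numpy as np

# Stand-in library: the C runtime (POSIX assumed). A real plugin would load
# its own shared library by path instead.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature: void *memcpy(void *dest, const void *src, size_t n)
libc.memcpy.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
libc.memcpy.restype = ctypes.c_void_p

src = np.arange(8, dtype=np.int32)
dst = np.zeros_like(src)  # preallocated output buffer owned by NumPy

# Pass raw buffer addresses; the C function fills dst in place.
libc.memcpy(dst.ctypes.data, src.ctypes.data, src.nbytes)
```

Because the library only needs a C compiler to build and Python only needs the standard ctypes module to load it, this keeps the plugin itself in Python.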

elynch · March 1, 2021, 9:41pm

I tried the ctypes approach that Utkarsh mentioned, and it works quite nicely. It’s certainly more convenient than streaming all the connectivity from a second process. However, one unexpected thing is happening.

Before using ctypes to call my fast connectivity reader function, I allocate my cell array by calling vtkCellArray.AllocateExact. It seems there’s something not quite right about that function, at least when called from Python (or I’m misusing it). For example, if I want to allocate space for one tet, I do:

import vtk

cells = vtk.vtkCellArray()
cells.AllocateExact(1, 4)

But then, if I check the contents of the offset and connectivity arrays, I get:

from vtk.util.numpy_support import vtk_to_numpy

offsets = cells.GetOffsetsArray()
print(vtk_to_numpy(offsets))   # prints [0]
conn = cells.GetConnectivityArray()
print(vtk_to_numpy(conn))      # prints []

If I check the sizes of those arrays, they look exactly as I would expect:

print(offsets.GetSize())   # prints 2
print(conn.GetSize())      # prints 4

If I print the number of tuples (which I would expect to match the size), I get:

print(offsets.GetNumberOfTuples())   # prints 1
print(conn.GetNumberOfTuples())      # prints 0

If I call SetNumberOfTuples on those two arrays, everything starts to work as expected. If I don’t call it on both arrays before passing my cell array to vtkUnstructuredGrid.SetCells, ParaView says I have zero cells. Am I supposed to call SetNumberOfTuples myself, or am I doing something wrong?

utkarsh.ayachit · March 5, 2021, 1:35pm

Allocate only reserves the internal buffers; it does not change the reported size. It’s similar to std::vector::reserve in that regard.