Usage of vtk.numpy_interface in parallel

Hello,

I’m trying to use ParaView to analyze large numerical datasets in parallel on our cluster. What I want to do is extract data at several slice locations and convert it to NumPy arrays for further post-processing. Currently I’m using something like this:

from paraview.simple import *
import vtk
from vtk.numpy_interface import dataset_adapter as dsa
from mpi4py import MPI
import numpy as np

data = XMLMultiBlockDataReader(FileName=[<InputFile>])

xPos = np.linspace(0.32, 0.8, 10)
sliceData = dict()

for pos in xPos:
    slice1 = Slice(Input=data)
    slice1.SliceType = 'Plane'
    slice1.SliceOffsetValues = [0.0]
    slice1.SliceType.Origin = [pos, 0, 0.0]
    slice1.SliceType.Normal = [1, 0, 0.0]
    sliceDataPV = servermanager.Fetch(slice1)
    sliceDataNP = dsa.WrapDataObject(sliceDataPV)

    # extract point coordinates
    sliceData["x"] = sliceDataNP.Points.Arrays[0][:,0]
    sliceData["y"] = sliceDataNP.Points.Arrays[0][:,1]
    sliceData["z"] = sliceDataNP.Points.Arrays[0][:,2]
    sliceData["pressure"] = sliceDataNP.PointData["pressure"].Arrays[0]

This works well as long as I run my script in serial mode. However, because of the large file sizes I also want to be able to run it in parallel, and since the data is then distributed between the different processes, every process ends up with only part of the slice data, which breaks my post-processing.
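To illustrate, printing the length of the local pressure array on every rank (using mpi4py’s COMM_WORLD, since mpi4py is already imported above) shows that each process only holds a fraction of the slice points. This is just a diagnostic sketch:

rank = MPI.COMM_WORLD.Get_rank()
# each rank reports a different number of local slice points
print("rank %d holds %d local pressure values" % (rank, len(sliceData["pressure"])))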

My first attempt to resolve this was to gather all the data on one (or all) processes using mpi4py, similar to what has been described here: mpi4py and vtk

So I just added the following lines to the beginning of my script:

gc = vtk.vtkMultiProcessController.GetGlobalController()
comm = vtk.vtkMPI4PyCommunicator.ConvertToPython(gc.GetCommunicator())

And inside the for-loop I tried gathering the data with the following code:

# total number of values across all ranks
sendcounts = comm.allreduce(len(sliceData["pressure"]), op=MPI.SUM)
pressureGathered = np.empty(sendcounts, dtype=np.float64)
# use MPI.DOUBLE to match the float64 buffers
comm.Allgatherv([sliceData["pressure"], MPI.DOUBLE],
                [pressureGathered, MPI.DOUBLE])

However, pvbatch always hangs at the allreduce call and stops doing anything. There is no error, and the CPU usage of the processes drops to almost zero. If I run the same commands with plain Python, using only mpi4py and NumPy arrays, everything works as expected.
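For reference, a minimal sketch of the kind of standalone test I mean (the ranks, values, and array sizes here are made up; run with e.g. mpirun -n 4 python test_gather.py):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# every rank contributes a different number of values
local = np.full(rank + 1, float(rank), dtype=np.float64)

# total number of values across all ranks
total = comm.allreduce(len(local), op=MPI.SUM)

# per-rank counts so Allgatherv knows how large each contribution is
counts = comm.allgather(len(local))

gathered = np.empty(total, dtype=np.float64)
comm.Allgatherv([local, MPI.DOUBLE], [gathered, counts, MPI.DOUBLE])

print("rank %d sees %s" % (rank, gathered))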

So I’d like to know: Is there anything special I need to take into account when gathering VTK arrays? Or is there maybe a different way to get all the slice data onto one process?

Okay, it turned out to be a “user error” after all. The servermanager.Fetch() command does indeed collect all the data on the client process; the difference in array sizes in parallel was caused by the fact that I’m using a multiblock dataset. After I merged the blocks, the problem disappeared.
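In case it helps others, a rough sketch of the merged version of the loop body, using the MergeBlocks filter from paraview.simple (assuming the merged slice comes back as a single unstructured grid, so the per-block .Arrays[0] indexing is no longer needed):

merged = MergeBlocks(Input=slice1)
sliceDataPV = servermanager.Fetch(merged)
sliceDataNP = dsa.WrapDataObject(sliceDataPV)

# the wrapped object is now a single dataset, so the arrays are plain
points = sliceDataNP.Points
sliceData["x"] = points[:, 0]
sliceData["y"] = points[:, 1]
sliceData["z"] = points[:, 2]
sliceData["pressure"] = sliceDataNP.PointData["pressure"]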