Hello,
I’m trying to use ParaView to analyze large numerical datasets in parallel on our cluster. What I want to do is extract data at several slices and convert it to NumPy arrays for further post-processing. Currently I’m using something like this:
from paraview.simple import *
import vtk
from vtk.numpy_interface import dataset_adapter as dsa
from mpi4py import MPI
import numpy as np
data = XMLMultiBlockDataReader(FileName=[<InputFile>])
xPos = np.linspace(0.32, 0.8, 10)
sliceData = dict()
for pos in xPos:
    slice1 = Slice(Input=data)
    slice1.SliceType = 'Plane'
    slice1.SliceOffsetValues = [0.0]
    slice1.SliceType.Origin = [pos, 0, 0.0]
    slice1.SliceType.Normal = [1, 0, 0.0]
    sliceDataPV = servermanager.Fetch(slice1)
    sliceDataNP = dsa.WrapDataObject(sliceDataPV)
    # extract point coordinates
    sliceData["x"] = sliceDataNP.Points.Arrays[0][:,0]
    sliceData["y"] = sliceDataNP.Points.Arrays[0][:,1]
    sliceData["z"] = sliceDataNP.Points.Arrays[0][:,2]
    sliceData["pressure"] = sliceDataNP.PointData["pressure"].Arrays[0]
This works well as long as I run my script in serial mode. Due to the large file sizes, however, I also want to be able to run it in parallel. Since the data is then distributed across the processes, every process holds only part of sliceData, which breaks my post-processing.
My first attempt to resolve this was to gather all the data on one (or all) processes using mpi4py, similar to what’s been described here: mpi4py and vtk
So I just added the following lines to the beginning of my script:
gc = vtk.vtkMultiProcessController.GetGlobalController()
comm = vtk.vtkMPI4PyCommunicator.ConvertToPython(gc.GetCommunicator())
And inside the for-loop I tried gathering the data with the following code:
sendcounts = comm.allreduce(len(sliceData["pressure"]), op=MPI.SUM)
pressureGathered = np.empty(sendcounts,dtype=np.float64)
comm.Allgatherv([sliceData["pressure"], MPI.FLOAT], [pressureGathered, MPI.FLOAT])
However, pvbatch always hangs at the allreduce call and stops doing anything. There’s no error, and the CPU usage of the processes drops to almost 0. If I run the same commands with plain Python, using only mpi4py and NumPy arrays, everything works as expected.
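For reference, here is a minimal sketch of the kind of variable-length gather described above, as it could be written with only mpi4py and NumPy. The helper names gatherv_layout and allgather_float64 are hypothetical, and the sketch assumes the local array can be treated as float64, so it pairs the buffers with MPI.DOUBLE (MPI.FLOAT describes 4-byte values). It only illustrates the gather itself and makes no claim about why pvbatch hangs.

```python
import numpy as np


def gatherv_layout(counts):
    """Return (counts, displacements, total) for a variable-length gather.

    counts[i] is the number of elements contributed by rank i;
    displacements[i] is the offset of rank i's chunk in the receive buffer.
    """
    counts = [int(c) for c in counts]
    displs = [0] * len(counts)
    for i in range(1, len(counts)):
        displs[i] = displs[i - 1] + counts[i - 1]
    return counts, displs, sum(counts)


def allgather_float64(comm, local):
    """Gather a 1-D float64 array of differing length from every rank.

    Assumption: comm is an mpi4py communicator and every rank in it
    calls this function, since both calls below are collective.
    """
    from mpi4py import MPI  # imported here so the layout helper needs no MPI

    local = np.ascontiguousarray(local, dtype=np.float64)
    # Each rank learns how many elements every other rank holds.
    counts, displs, total = gatherv_layout(comm.allgather(local.size))
    gathered = np.empty(total, dtype=np.float64)
    comm.Allgatherv([local, MPI.DOUBLE],
                    [gathered, counts, displs, MPI.DOUBLE])
    return gathered
```

Inside the loop this would be called as pressureGathered = allgather_float64(comm, sliceData["pressure"]). Because allgather and Allgatherv are collective, the call only returns once every rank in comm has reached it.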
So I’d like to know: Is there anything special I need to take into account when gathering VTK arrays? Or is there maybe a different way to get all the slice data on one process?