Slow Programmable Filter

I’ve created a programmable filter to do some analysis, but it’s running more slowly than I expected. Does anyone have ideas for how to improve the filter? It’s running over ~65k cells. I’d also love it to be parallel, but I’m not sure if that’s possible.

Thanks,

Nathan Woods

def flatten(input, output):
    output.ShallowCopy(input)
    numCells = output.GetNumberOfCells()
    cellData = input.GetCellData()
    r = cellData.GetArray('radius')
    v = cellData.GetArray('vol')
    d = cellData.GetArray('den')

    def base_bump(t, tau):
        x = (t-tau)**2
        if x > 1:
            out = 0
        else: 
            out = exp(1/(x-1))/0.443993816237631
        return out

    def bump(r, rp, h):
        return base_bump(r/h, rp/h)**(1/h)

    h = 1
    dtype = r.GetDataType()
    out = r.NewInstance()
    out.SetName("denbar")
    print(numCells) 
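    # O(numCells**2) pair of Python loops -- this is the hot spot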
    for indi in range(0, numCells):
        temp = 0
        for indj in range(0, numCells):
            temp += d.GetValue(indj)*v.GetValue(indj)*bump(
                r.GetValue(indi),r.GetValue(indj),h) 
        out.InsertValue(indi,temp) 

#    output.AddArray(out)


input = self.GetInputDataObject(0, 0)
output = self.GetOutputDataObject(0)

if input.IsA("vtkMultiBlockDataSet"):
    output.CopyStructure(input)
    iter = input.NewIterator()
    iter.UnRegister(None)
    iter.InitTraversal()
    while not iter.IsDoneWithTraversal():
        curInput = iter.GetCurrentDataObject()
        curOutput = curInput.NewInstance()
        curOutput.UnRegister(None)
        output.SetDataSet(iter, curOutput)
        flatten(curInput, curOutput)
        iter.GoToNextItem()
else:
    flatten(input, output)

I’d not recommend using the GetValue/SetValue APIs. Instead, you want to use the numpy-wrapped API for modifying data arrays.

Refer to the examples in the ParaView Guide, Chapter 12.

The VTK/ParaView numpy API is also described in this series of blog posts starting here.
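
For what it’s worth, here is a minimal sketch of what the wrapped API looks like inside a Programmable Filter (the arithmetic is a placeholder, not your kernel): inputs[0] and output are already numpy-wrapped, so cell arrays behave like numpy arrays.

# Minimal sketch, placeholder math only: inside a Programmable Filter,
# inputs[0] and output are pre-wrapped dataset objects.
r = inputs[0].CellData['radius']    # behaves like a numpy array
d = inputs[0].CellData['den']
rd = r * d                          # elementwise, no Python loop
output.CellData.append(rd, 'rd')    # 'rd' is an illustrative array name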

I tried the NumPy version for quite a while, but I eventually gave up. It certainly didn’t seem any faster; should I expect it to be?

It’d be faster if you rephrase your code to not use for loops over each element in Python, and instead use numpy functions and expressions. Essentially, avoid for .. in range(0, numCells); you never want to do that if you care about performance.
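
As a toy illustration (placeholder arithmetic, not your kernel), the same reduction written both ways:

import numpy as np

a = np.arange(1000000, dtype=float)

# slow: one interpreted Python iteration per element
total = 0.0
for i in range(len(a)):
    total += a[i]

# fast: a single call; the loop runs in compiled code
total = a.sum()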

I have a vectorized version ready to time. I’m putting it here in case ParaView hangs. Comments welcome.

def flatten(input, output):
    r = inputs[0].CellData['radius']
    v = inputs[0].CellData['vol']
    d = inputs[0].CellData['den']

    def base_bump(t, tau):
        x = (t-tau)**2
        if x > 1:
            out = 0
        else:
            out = exp(1/(x-1))/0.443993816237631
        return out

    def bump(r, rp, h):
        return base_bump(r/h, rp/h)**(1/h)

    vbump = frompyfunc(bump, 3, 1)  # bump takes three inputs
    h = vbump(r, r, 1)
    out = (v*d)@h
    output.CellData.append(out, "denbar")

input = self.GetInputDataObject(0, 0)
output = self.GetOutputDataObject(0)

if input.IsA("vtkMultiBlockDataSet"):
    output.CopyStructure(input)
    iter = input.NewIterator()
    iter.UnRegister(None)
    iter.InitTraversal()
    while not iter.IsDoneWithTraversal():
        curInput = iter.GetCurrentDataObject()
        curOutput = curInput.NewInstance()
        curOutput.UnRegister(None)
        output.SetDataSet(iter, curOutput)
        flatten(curInput, curOutput)
        iter.GoToNextItem()
else:
    flatten(input, output)

Another NumPy attempt. This one is far slower than the VTK version, probably due to the looping construct used to form the h array. The commented-out portion runs instantaneously, but returns all zeros.

r = inputs[0]['radius']
v = inputs[0]['vol']
d = inputs[0]['den']

def base_bump(t, tau):
    x = (t-tau)**2
    if x > 1:
        out = 0
    else: 
        out = exp(1/(x-1))/0.443993816237631
    return out

def bump(r, rp, h):
    return base_bump(r/h, rp/h)**(1/h)

# This doesn't work for some reason -- possibly because frompyfunc builds an
# elementwise ufunc, so vbump(r, r, 1+0*r) evaluates bump per element rather
# than over all pairs:
# vbump = frompyfunc(bump,3,1)
# h = vbump(r,r,1+0*r)
h = array([[bump(ri,rj,1) for ri in r] for rj in r])
out = inner(v*d,h)
output.CellData.append(out, "denbar")

The GetValue version I posted above seems to be returning a single value per block, rather than a value per cell. Does anyone have any idea why that might be?

def integrate(input, output):
    output.ShallowCopy(input)
    numCells = output.GetNumberOfCells()
    cellData = input.GetCellData()
    r = cellData.GetArray('radius')
    v = cellData.GetArray('vol')
    d = cellData.GetArray('den')

    def base_bump(t, tau):
        x = (t-tau)**2
        if x > 1:
            out = 0
        else: 
            out = exp(1/(x-1))/0.443993816237631
        return out

    def bump(r, rp, h):
        return base_bump(r/h, rp/h)**(1/h)

    h = 1
    dtype = r.GetDataType()
    out = r.NewInstance()
    out.SetName("denbar")
    for indi in range(0, numCells):
        temp = 0
        for indj in range(0, numCells):
            temp += d.GetValue(indj)*v.GetValue(indj)*bump(
                r.GetValue(indi),r.GetValue(indj),h) 
        out.InsertValue(indi,temp) 
    output.GetCellData().AddArray(out)


input = self.GetInputDataObject(0, 0)
output = self.GetOutputDataObject(0)

if input.IsA("vtkMultiBlockDataSet"):
    output.CopyStructure(input)
    iter = input.NewIterator()
    iter.UnRegister(None)
    iter.InitTraversal()
    while not iter.IsDoneWithTraversal():
        curInput = iter.GetCurrentDataObject()
        curOutput = curInput.NewInstance()
        curOutput.UnRegister(None)
        output.SetDataSet(iter, curOutput)
        integrate(curInput, curOutput)
        iter.GoToNextItem()
else:
    integrate(input, output)

This, unfortunately, suffers from the same “looping in Python” problem, except now you’re using a list comprehension to do the loop for you, e.g. [.. for rj in r].

As an example, let’s say we want to raise all items in an array “radius” to power 2. Two ways to do this:

r = inputs[0].PointData["radius"]

# 1. slower way: a Python-level loop
r2 = array([x**2 for x in r])

# 2. faster way: a vectorized numpy call
r2 = numpy.power(r, 2)

You will have to express your logic in bump/base_bump as expressions over numpy arrays, and use numpy functions wherever possible. HTH.
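
To make that concrete, here is one possible sketch (mine, not your exact kernel) of base_bump over plain numpy arrays; a boolean mask avoids calling exp where the bump is zero, since numpy.where evaluates both branches:

import numpy as np

def base_bump_vec(t, tau):
    # t, tau: float arrays of the same shape
    x = (t - tau)**2
    out = np.zeros_like(x)
    inside = x < 1    # support of the bump function
    out[inside] = np.exp(1/(x[inside] - 1))/0.443993816237631
    return out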

I’m working in parallel on both approaches. A purely vectorized approach looks like this:

from vtk.numpy_interface.algorithms import shape
r = inputs[0].CellData['radius']
v = inputs[0].CellData['vol']
d = inputs[0].CellData['den']

def bump(r, rp, h):
    import numpy as np
    x = np.subtract.outer(r/h,rp/h)**2
    return where(x<1, np.exp(1/(x-1))/0.443993816237631,0*x)**(1/h)

h = bump(r,r,1)
out = inner(h,v*d)
print(max(h))
output.CellData.append(out, "denbar") 

Unfortunately, this fails with the following error message:
AttributeError: 'VTKCompositeDataArray' object has no attribute 'exp'
Presumably, this is because np.exp doesn’t know how to treat VTK objects, but I’m not sure what else to try.

Use exp from the vtk.numpy_interface.algorithms module instead.
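
For reference, a minimal sketch of that suggestion on a simple cell array; the algorithms module mirrors many numpy functions and is intended to handle VTK’s composite arrays as well:

from vtk.numpy_interface import algorithms as algs

r = inputs[0].CellData['radius']
y = algs.exp(-r)    # composite-aware elementwise exponential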

Unfortunately, using the function from algs yields the same error:
AttributeError: 'VTKCompositeDataArray' object has no attribute 'exp'

Full code:

import numpy as np
import vtk.numpy_interface.algorithms as algs
r = inputs[0].CellData['radius']
v = inputs[0].CellData['vol']
d = inputs[0].CellData['den']

def bump(r, rp, h):
    x = np.subtract.outer(r/h,rp/h)**2
    return where(x<1, algs.exp(1/(x-1))/0.443993816237631, 0*x)**(1/h)

h = bump(r,r,1)
out = inner(h,v*d)
print(max(h))
output.CellData.append(out, "denbar")

Do you have a sample dataset to share, so I can try your script out locally and see what’s going on? Thanks.

I put together an example using Wavelet.

Is the where function not implemented fully? Testing on the Wavelet source, I got errors saying it only accepted 1 positional argument.

I’m also having trouble accessing the outer methods of these functions. Those should be present for numpy ufuncs; they make the function compute over all pairs of values from the input arrays, rather than element-wise. The various work-arounds I’ve tried are slow.
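
For concreteness, this is what outer does on a plain numpy ufunc (toy values):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
np.subtract.outer(a, a)    # 3x3 matrix of all pairwise differences a[i] - a[j]
# array([[ 0., -1., -2.],
#        [ 1.,  0., -1.],
#        [ 2.,  1.,  0.]])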

ForKitware.pvsm (606.6 KB)

Any luck with this? Once I fixed the bug in my VTK implementation, the whole thing became too slow to use reliably.

Your script had a small typo: change d = inputs[0].CellData['RTdata'] to d = inputs[0].CellData['RTData'] (note the case of RTData). With that, it produces a result for the state file you attached.

I wanted to put a note here on the final result. It turns out that I am essentially unable to use the NumPy interface for this problem, because I run into memory problems for large data.

The problematic statement is this one:

x = np.subtract.outer(r/h,rp/h)**2

r is an array of size (ncells), which makes x an array of size (ncells**2). For a million cells, that’s a trillion-element array (roughly 7 TiB at 8 bytes per element), and my workstation couldn’t even construct it, failing with this error:

Traceback (most recent call last):
  File "<string>", line 22, in <module>
  File "<string>", line 6, in RequestData
MemoryError

I believe that the conclusion I should draw from this is that I need to derive a new algorithm and/or use the older VTK interface.
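
One sketch of such an algorithm (untested in ParaView, with illustrative names): evaluate the pairwise kernel in row blocks, so that only a (chunk x ncells) slice is in memory at any time.

import numpy as np

def denbar_chunked(r, w, h=1.0, chunk=1024):
    # w = den*vol per cell; returns out[i] = sum_j w[j] * bump(r[i], r[j], h)
    n = r.size
    out = np.empty(n)
    for start in range(0, n, chunk):
        x = np.subtract.outer(r[start:start+chunk]/h, r/h)**2
        K = np.zeros_like(x)
        inside = x < 1
        K[inside] = np.exp(1/(x[inside] - 1))/0.443993816237631
        out[start:start+chunk] = (K**(1/h)) @ w
    return out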

For completeness, the precise script I was using was this:

import numpy
r = inputs[0].CellData['radius']
m = inputs[0].CellData['mass']
h = 1
x = numpy.subtract.outer(r/h,r/h)**2
#bump = where(x<1, exp((x-1)**-1)*2.252283620690761, 0)**(1/h)

#out = inner(bump, m)
#output.CellData.append(out, "denbar")