I’ve created a programmable filter to do some analysis, but it’s running more slowly than I expected. Does anyone have ideas for how to improve the filter? It’s running over ~65k cells. I’d also love it to be parallel, but I’m not sure if that’s possible.

Thanks,

Nathan Woods

from math import exp

def flatten(input, output):
    output.ShallowCopy(input)
    numCells = output.GetNumberOfCells()
    cellData = input.GetCellData()
    r = cellData.GetArray('radius')
    v = cellData.GetArray('vol')
    d = cellData.GetArray('den')

    def base_bump(t, tau):
        x = (t - tau)**2
        if x > 1:
            out = 0
        else:
            out = exp(1/(x - 1))/0.443993816237631
        return out

    def bump(r, rp, h):
        return base_bump(r/h, rp/h)**(1/h)

    h = 1
    dtype = r.GetDataType()
    out = r.NewInstance()
    out.SetName("denbar")
    print(numCells)
    for indi in range(0, numCells):
        temp = 0
        for indj in range(0, numCells):
            temp += d.GetValue(indj)*v.GetValue(indj)*bump(
                r.GetValue(indi), r.GetValue(indj), h)
        out.InsertValue(indi, temp)
    output.GetCellData().AddArray(out)

input = self.GetInputDataObject(0, 0)
output = self.GetOutputDataObject(0)

if input.IsA("vtkMultiBlockDataSet"):
    output.CopyStructure(input)
    iter = input.NewIterator()
    iter.UnRegister(None)
    iter.InitTraversal()
    while not iter.IsDoneWithTraversal():
        curInput = iter.GetCurrentDataObject()
        curOutput = curInput.NewInstance()
        curOutput.UnRegister(None)
        output.SetDataSet(iter, curOutput)
        flatten(curInput, curOutput)
        iter.GoToNextItem()
else:
    flatten(input, output)

It'd be faster if you rephrase your code to not use for loops over each element in Python, and instead use numpy functions and expressions. Essentially, avoid for .. in range(0, numCells); you never want to do that if you care about performance.
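To make that concrete for this particular filter: the double loop in the original script computes, for each cell i, sum_j d_j*v_j*bump(r_i, r_j, h), which becomes a single matrix-vector product once the kernel is evaluated on all pairs at once. A minimal pure-numpy sketch on small made-up arrays outside ParaView (the names r, v, d mirror the original script; the 0.443993816237631 normalization is taken from it):

```python
import numpy as np

def bump_matrix(r, h):
    """Evaluate the bump kernel on all pairs (r_i, r_j) at once."""
    x = np.subtract.outer(r / h, r / h) ** 2          # shape (n, n)
    inside = x < 1
    k = np.zeros_like(x)
    # the kernel is zero where x >= 1, so exp(1/(x-1)) is only evaluated inside
    k[inside] = np.exp(1.0 / (x[inside] - 1.0)) / 0.443993816237631
    return k ** (1.0 / h)

r = np.array([0.1, 0.2, 0.4])   # made-up cell radii
v = np.array([1.0, 1.0, 1.0])   # made-up cell volumes
d = np.array([2.0, 3.0, 4.0])   # made-up cell densities
h = 1.0

denbar = bump_matrix(r, h) @ (d * v)   # one value per cell, no Python loop
print(denbar.shape)  # (3,)
```

The two Python loops collapse into one kernel evaluation and one `@`; the trade-off is the (n, n) intermediate array, which matters for large meshes.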

Another NumPy attempt. This one is far slower than the VTK version, probably due to the looping construct used to form the h array. The commented-out portion runs instantaneously, but returns all zeros.

r = inputs[0]['radius']
v = inputs[0]['vol']
d = inputs[0]['den']

def base_bump(t, tau):
    x = (t - tau)**2
    if x > 1:
        out = 0
    else:
        out = exp(1/(x - 1))/0.443993816237631
    return out

def bump(r, rp, h):
    return base_bump(r/h, rp/h)**(1/h)

# This doesn't work for some reason
# vbump = frompyfunc(bump, 3, 1)
# h = vbump(r, r, 1 + 0*r)
h = array([[bump(ri, rj, 1) for ri in r] for rj in r])
out = inner(v*d, h)
output.CellData.append(out, "denbar")

The GetValue version I posted above seems to be returning a single value per block, rather than a value per cell. Does anyone have any idea why that might be?

This, unfortunately, suffers from the same "looping in Python" problem, except now you're using a generator expression to do the loop for you, e.g. [.. for rj in r].

As an example, let’s say we want to raise all items in an array “radius” to power 2. Two ways to do this:

r = inputs[0].PointData["radius"]
# 1. slower way:
r2 = array([x**2 for x in r])
# 2. faster way:
r2 = numpy.power(r, 2)

You will have to express the logic in bump/base_bump as expressions over numpy arrays, and use numpy functions wherever possible. HTH.
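For a rough sense of the difference, here is a hypothetical timing harness (array size and repeat count are arbitrary) comparing the two forms on a plain numpy array:

```python
import numpy as np
import timeit

r = np.random.default_rng(0).random(100_000)  # made-up stand-in for "radius"

# 1. Python-level loop over every element
t_loop = timeit.timeit(lambda: np.array([x**2 for x in r]), number=3)
# 2. one vectorized numpy call
t_vec = timeit.timeit(lambda: np.power(r, 2), number=3)

print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```

On typical hardware the vectorized call wins by two to three orders of magnitude, because the loop pays Python interpreter and boxing overhead per element.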

I’m working in parallel on both approaches. A purely vectorized approach looks like this:

from vtk.numpy_interface.algorithms import shape
r = inputs[0].CellData['radius']
v = inputs[0].CellData['vol']
d = inputs[0].CellData['den']

def bump(r, rp, h):
    import numpy as np
    x = np.subtract.outer(r/h, rp/h)**2
    return where(x < 1, np.exp(1/(x - 1))/0.443993816237631, 0*x)**(1/h)

h = bump(r, r, 1)
out = inner(h, v*d)
print(max(h))
output.CellData.append(out, "denbar")

Unfortunately, this fails with the following error message: AttributeError: 'VTKCompositeDataArray' object has no attribute 'exp'
Presumably, this is because np.exp doesn’t know how to treat VTK objects, but I’m not sure what else to try.
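That diagnosis matches how numpy ufuncs behave in general: given an object they cannot convert to a numeric array, they fall back to looking for a method of the ufunc's name on the object, and the failure names that missing attribute. A tiny stand-alone illustration (NotAnArray is a made-up stand-in for VTKCompositeDataArray; depending on the numpy version the failure surfaces as AttributeError or TypeError):

```python
import numpy as np

class NotAnArray:
    """Made-up stand-in for an object numpy can't convert to a numeric array."""
    pass

try:
    np.exp(NotAnArray())   # numpy falls back to looking for NotAnArray.exp() and fails
except (AttributeError, TypeError) as err:
    print(type(err).__name__, "-", err)
```

This is why the error message complains specifically about a missing 'exp' attribute rather than about the array type as a whole.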

Unfortunately, using the function from algs yields the same error: AttributeError: 'VTKCompositeDataArray' object has no attribute 'exp'

Full code:

import numpy as np
import vtk.numpy_interface.algorithms as algs
r = inputs[0].CellData['radius']
v = inputs[0].CellData['vol']
d = inputs[0].CellData['den']

def bump(r, rp, h):
    x = np.subtract.outer(r/h, rp/h)**2
    return where(x < 1, algs.exp(1/(x - 1))/0.443993816237631, 0*x)**(1/h)

h = bump(r, r, 1)
out = inner(h, v*d)
print(max(h))
output.CellData.append(out, "denbar")

Is the where function not fully implemented? Testing on the Wavelet source, I got errors saying it accepted only one positional argument.

I'm also having trouble accessing the outer methods of these functions. Those should be present for numpy ufuncs; they make the function compute over all pairs of values from the input arrays rather than element-wise. The various work-arounds I've tried are slow.
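As a possible work-around when a wrapped array type doesn't expose ufunc methods like np.subtract.outer: the same all-pairs result can usually be obtained with broadcasting, which needs only the ordinary `-` operator on the arrays (whether ParaView's composite array type accepts this indexing is something I have not verified). With plain numpy:

```python
import numpy as np

r = np.array([1.0, 2.0, 4.0])   # made-up values

# np.subtract.outer(r, r) via broadcasting: column vector minus row vector
pairwise = r[:, np.newaxis] - r[np.newaxis, :]   # shape (3, 3)

print(np.array_equal(pairwise, np.subtract.outer(r, r)))  # True
```

The `[:, np.newaxis]` reshapes r to (3, 1), so the subtraction broadcasts against the (1, 3) row to produce every pairwise difference.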

Your script had a small typo: change d = inputs[0].CellData['RTdata'] to d = inputs[0].CellData['RTData'] (note the capitalization of RTData). With that, it produces a result for the state file you attached.

I wanted to put a note here on the final result. It turns out that I am essentially unable to use the NumPy interface for this problem, because I run into memory problems for large data.

The problem statement is this one:

x = np.subtract.outer(r/h,rp/h)**2

r is an array of size (ncells), which makes x an array of size (ncells**2). For a million cells, that's a trillion-element array (~7 TB), and my workstation couldn't even construct it, failing with this error:

Traceback (most recent call last):
File "<string>", line 22, in <module>
File "<string>", line 6, in RequestData
MemoryError

I believe that the conclusion I should draw from this is that I need to derive a new algorithm and/or use the older VTK interface.

For completeness, the precise script I was using was this:

import numpy
r = inputs[0].CellData['radius']
m = inputs[0].CellData['mass']
h = 1
x = numpy.subtract.outer(r/h, r/h)**2
#bump = where(x < 1, exp((x - 1)**-1)*2.252283620690761, 0)**(1/h)
#out = inner(bump, m)
#output.PointData.append(out, "out")
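For what it's worth, one way to dodge the (ncells**2) allocation without going back to per-element GetValue calls is to chunk the j index: evaluate the kernel on an (ncells, chunk) slab at a time and accumulate the matrix-vector product. A pure-numpy sketch of the idea on made-up data (the kernel and the 0.443993816237631 normalization are taken from the scripts above; chunk size is arbitrary):

```python
import numpy as np

def denbar_chunked(r, m, h=1.0, chunk=1024):
    """sum_j m_j * bump(r_i, r_j, h) for every i, without an (n, n) array."""
    out = np.zeros_like(r)
    for start in range(0, len(r), chunk):
        rj = r[start:start + chunk]
        mj = m[start:start + chunk]
        x = ((r[:, None] - rj[None, :]) / h) ** 2   # (n, len(rj)) slab only
        k = np.zeros_like(x)
        inside = x < 1
        # kernel is zero where x >= 1, so exp is only evaluated inside
        k[inside] = np.exp(1.0 / (x[inside] - 1.0)) / 0.443993816237631
        out += (k ** (1.0 / h)) @ mj
    return out

rng = np.random.default_rng(1)
r = rng.random(2000)   # made-up radii
m = rng.random(2000)   # made-up masses
print(denbar_chunked(r, m).shape)  # (2000,)
```

Peak memory drops from n*n to n*chunk elements, while each slab is still processed by vectorized numpy calls rather than a per-element Python loop.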