Hello,
I have two sources that are overlapping that have common global ids (they are coming from a common source but divided), to better exemplify, here is a small example to achieve this:
this creates two filters (threshold1 and threshold2) that have inherently the same global ids.
I would like to create a programmable filter that:
takes the two thresholds as inputs
gets the shared points, using global Points IDs
gets the cells that have this points on threshold1 and stores all other points of this cells
gets another ‘layer’ of cells that have this specific points
I got a prototype working of it:
input1 = self.GetInputDataObject(0, 0) # target mesh (removedLayer)
input2 = self.GetInputDataObject(0, 1) # reference mesh (finalChannel)
output = self.GetOutputDataObject(0)
output.ShallowCopy(input2)
arr1 = input1.GetPointData().GetArray("GlobalPointIds")
gpids1 = {arr1.GetValue(i) for i in range(input1.GetNumberOfPoints())}
nCells = input2.GetNumberOfCells()
import vtk
shared = vtk.vtkIntArray()
shared.SetName("flag")
shared.SetNumberOfTuples(nCells)
IDsToRecheck=set()
cellIDsToNotRecheck=[]
for cid in range(nCells):
cell = input2.GetCell(cid)
ids = cell.GetPointIds()
count = 0
for i in range(ids.GetNumberOfIds()):
pid = ids.GetId(i)
gpid = input2.GetPointData().GetArray("GlobalPointIds").GetValue(pid)
if gpid in gpids1:
count += 1
if count!=0:
shared.SetValue(cid, count)
for i in range(ids.GetNumberOfIds()):
IDsToRecheck.add(ids.GetId(i))
else:
shared.SetValue(cid, 0)
#for i in dir(shared):
# print(i)
#second layer
for cid in range(nCells):
cell = input2.GetCell(cid)
val=shared.GetValue(cid)
if val==0:
ids = cell.GetPointIds()
pid = [ids.GetId(i) for i in range(ids.GetNumberOfIds())]
if any([i in IDsToRecheck for i in pid]):
shared.SetValue(cid, -11)
output.GetCellData().AddArray(shared)
my question is if there is a faster way to do this, without the need to go thought the hole nCells each time. as I am looking to identify the interface and i have a bunch of cells on the bulk that i know that will not be there.
I am thinking of two different approaches (this is where i am looking for help) that not sure how to achieve it:
instead of going thought the hole range of the nCells in a for loop which can be quite slow in python, i would like to select specific ids inside a bounding box that would be a slight larger (~3 voxels in each direction, of the input1). this is far from optimal as in case that input1 is not a box (as it is in the example) the number of cells in the loop can be unnecessary checked.
i was thinking on:
a. find the shared GlobalPointIds between the two sources.
b. select using get the internal point ids corresponding to these values of GlobalPointIds (found in a.), ideally not needing this step.
c. select the cells using the information in a.(ideally) or b., something like ‘hey, return me the cell that has this specific value, or that it has this point id inside of itself’
would like some input on how to improve this script. lastly, if someone comes to mind a good learning material for this kind of vtk manipulation in python would appreciate.
The key to avoiding the full-cell iteration is GetPointCells() - it directly returns which cells use a given point, so you never need to loop over all cells.
Here is an optimized Programmable Filter script that replaces both full-cell passes with targeted lookups:
import numpy as np
from vtkmodules.vtkCommonCore import vtkIdList
from vtk.numpy_interface import dataset_adapter as dsa
from vtk.util.numpy_support import numpy_to_vtk, vtk_to_numpy
# --- Get inputs ---
def get_vtk_dataset(wrapped_input):
vtk_obj = wrapped_input.VTKObject
if vtk_obj.IsA("vtkMultiBlockDataSet"):
return vtk_obj.GetBlock(0)
return vtk_obj
vtk_ref = get_vtk_dataset(inputs[0]) # removedLayer
vtk_tgt = get_vtk_dataset(inputs[1]) # finalChannel
output.ShallowCopy(vtk_tgt)
nCells = vtk_tgt.GetNumberOfCells()
vtk_tgt.BuildLinks()
# --- Step 1: Find shared GlobalPointIds and map to local indices ---
wrap_ref = dsa.WrapDataObject(vtk_ref)
wrap_tgt = dsa.WrapDataObject(vtk_tgt)
gid_ref = np.asarray(wrap_ref.PointData["GlobalPointIds"]).ravel()
gid_tgt = np.asarray(wrap_tgt.PointData["GlobalPointIds"]).ravel()
# Sort ref, then searchsorted to find which target points exist in ref
gid_ref_sorted = np.sort(gid_ref)
pos = np.searchsorted(gid_ref_sorted, gid_tgt)
pos_clipped = np.clip(pos, 0, len(gid_ref_sorted) - 1)
mask = gid_ref_sorted[pos_clipped] == gid_tgt
shared_local_ids = np.nonzero(mask)[0]
# --- Step 2: Get interface cells via GetPointCells() ---
# Instead of iterating ALL cells, ask VTK "which cells use this point?"
interface_cell_ids = set()
cell_id_list = vtkIdList()
for pt_id in shared_local_ids:
vtk_tgt.GetPointCells(int(pt_id), cell_id_list)
for i in range(cell_id_list.GetNumberOfIds()):
interface_cell_ids.add(cell_id_list.GetId(i))
# --- Step 3: Find adjacent layer cells ---
# Get connectivity arrays as numpy (zero-copy)
cells = vtk_tgt.GetCells()
conn = vtk_to_numpy(cells.GetConnectivityArray())
offsets = vtk_to_numpy(cells.GetOffsetsArray())
# Extract all unique point IDs from interface cells
interface_arr = np.fromiter(interface_cell_ids, dtype=np.intp, count=len(interface_cell_ids))
starts = offsets[interface_arr]
ends = offsets[interface_arr + 1]
all_pts = np.concatenate([conn[s:e] for s, e in zip(starts, ends)])
unique_pts = np.unique(all_pts)
# Shared points only connect to interface cells, so skip them
boundary_pts = unique_pts[~np.isin(unique_pts, shared_local_ids)]
adjacent_cell_ids = set()
for pt_id in boundary_pts:
vtk_tgt.GetPointCells(int(pt_id), cell_id_list)
for i in range(cell_id_list.GetNumberOfIds()):
cid = cell_id_list.GetId(i)
if cid not in interface_cell_ids:
adjacent_cell_ids.add(cid)
# --- Step 4: Create flag array ---
layer_flag = np.zeros(nCells, dtype=np.int32)
layer_flag[interface_arr] = 1
layer_flag[np.fromiter(adjacent_cell_ids, dtype=np.intp, count=len(adjacent_cell_ids))] = 2
arr = numpy_to_vtk(layer_flag)
arr.SetName("LayerFlag")
output.GetCellData().AddArray(arr)
Your approach iterates all cells twice (range(nCells)), checking every cell’s points against the shared set. This is O(nCells) regardless of how small the interface region is.
This approach works from the shared points outward:
np.searchsorted finds which target points exist in the reference - pure numpy C code, no Python loop over points
GetPointCells(pointId) directly returns the cells that use a given point. Instead of checking every cell in the mesh, we only query the shared points — so the work scales with the size of the interface region, not the total number of cells
For the adjacent layer, we extract the connectivity array as numpy and collect unique boundary points (excluding shared points, which by definition only connect to interface cells). Then GetPointCells on those boundary points gives us the adjacent cells directly
In my test with a mesh of 4M cells and 40k shared points, the original approach took 343 sec while the optimized version completed in 0.44 sec (~780x faster). The larger the mesh and the smaller the interface region, the bigger the speedup.
Hello @Kenichiro-Yoshimi ,
first of all, awesome explanation, secondly, thanks, for taking the time to explain it and for the indeed improvement over my code. I highly appreciate,and furthermore it helps me improve my coding.
again, thanks a lot.
If I may, would you have some documentation recommendation on this kind of tools? just direct numpy? something else? a specific documentation in numpy? today I get to make ‘everything’ im looking for, but as you can see in my original code, even thought quite understandable and clear, it is not optimized at all.
again, thanks a lot for your awsome answer.