Node data in unstructured, multiblock data (Catalyst)

gmarks · June 13, 2018, 1:30pm

I’m quite new to ParaView/Catalyst, and I was wondering if somebody could clarify something in the Catalyst user guide.

With respect to multiblock data, it says (page 33) that “If the leaf is a vtkDataSet then it should be non-empty on exactly one process”. I take this to mean that, for instance, if a node is shared between two blocks in your data structure (i.e. at a block boundary), node data for that node should be specified on only one of the relevant blocks.

However, attempting this in the toy code I’ve been playing with (adapted from the FullExample.cxx example), by only setting 4 tuples for the second block in a 2-block mesh, produces segmentation faults as the code attempts to access data for all 8 points specified for that block.

Doing it the other way, setting the nodal data at all points included in a block, duplicates included (which worked for the similar structured code I was testing earlier), runs, but produces errors when the resulting files are loaded into ParaView, stating that “Cannot read point data array “velocity” from PointData in piece 0. The data array in the element may be too short”.

Could someone clarify what the correct approach to building the VTK data structures is in this instance?

edit: I should add that running this code with a single block on one processor produces no problems whatsoever, so I’m sure it has something to do with how block interfaces are handled in unstructured grids.

Andy_Bauer · June 13, 2018, 2:20pm

VTK does cell partitioning for parallel data, which means that, ignoring ghost cells, a cell will exist on only a single parallel partition. This means that the points at the partition boundaries will be duplicated on multiple processes so that the cells have full information about all of the points they use.
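
To make the duplication concrete, here is a minimal sketch in plain Python. It does not use VTK, and `Piece` and `build_piece` are hypothetical names made up for illustration. It splits two hexahedra that share a face into two pieces and shows that each piece carries all 8 of its points, so a point-data array such as velocity needs 8 tuples per piece, not 4. The same rule holds in VTK: a point-data array attached to a grid is expected to have one tuple per point of that grid.

```python
from dataclasses import dataclass, field


@dataclass
class Piece:
    """Hypothetical stand-in for one partition's grid; not a VTK class."""
    global_point_ids: list = field(default_factory=list)  # local index -> global id
    cells: list = field(default_factory=list)             # hexahedra as local indices
    velocity: list = field(default_factory=list)          # one 3-tuple per local point


def build_piece(global_cells, owned_cells):
    """Copy every point used by the owned cells into a new piece."""
    piece = Piece()
    global_to_local = {}
    for c in owned_cells:
        local_cell = []
        for g in global_cells[c]:
            if g not in global_to_local:
                global_to_local[g] = len(piece.global_point_ids)
                piece.global_point_ids.append(g)
            local_cell.append(global_to_local[g])
        piece.cells.append(local_cell)
    # Point data needs one tuple per local point, shared-face points included.
    piece.velocity = [(0.0, 0.0, 0.0) for _ in piece.global_point_ids]
    return piece


# Two hexahedra sharing a face: 12 global points, points 4..7 lie on the shared face.
GLOBAL_CELLS = [
    [0, 1, 2, 3, 4, 5, 6, 7],
    [4, 5, 6, 7, 8, 9, 10, 11],
]

piece_a = build_piece(GLOBAL_CELLS, [0])
piece_b = build_piece(GLOBAL_CELLS, [1])
shared = set(piece_a.global_point_ids) & set(piece_b.global_point_ids)
print(len(piece_a.velocity), len(piece_b.velocity), sorted(shared))
```

Each cell appears in exactly one piece, while the four points on the shared face appear in both, each with its own data tuple.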

The multiblock data structure is a generalization of datasets that allows multiple datasets of different types (e.g. unstructured grids, Cartesian grids, etc.) to be grouped together. A multiblock dataset can be thought of as a tree structure. In VTK, that tree structure is fully described on each MPI process, with each leaf node of the tree being non-empty on a single MPI process.
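
As a rough illustration of that rule, the sketch below (plain Python, no VTK; `build_multiblock` and the block-to-rank assignment are hypothetical) builds the block list that each process would hold. Every process has the same number of blocks, but a given block holds a grid only on the process that owns it and is left empty (`None`) elsewhere. In VTK this would presumably correspond to calling `SetNumberOfBlocks` with the same count on every process and `SetBlock` only for the locally owned leaves.

```python
def build_multiblock(num_blocks, rank, owner_of_block, make_grid):
    """Return one process's block list: the same length on every rank,
    with a grid only in the blocks this rank owns and None elsewhere."""
    return [
        make_grid(b) if owner_of_block[b] == rank else None
        for b in range(num_blocks)
    ]


# Hypothetical assignment of 4 blocks to 3 MPI ranks.
OWNER_OF_BLOCK = [0, 0, 1, 2]
NUM_RANKS = 3

# Strings stand in for real grids to keep the sketch self-contained.
trees = [
    build_multiblock(len(OWNER_OF_BLOCK), rank, OWNER_OF_BLOCK, lambda b: f"grid-{b}")
    for rank in range(NUM_RANKS)
]

for rank, tree in enumerate(trees):
    print(rank, tree)
```

The point of the design is that the tree shape is identical everywhere, so a reader or filter can match up blocks across processes by index even though the data lives on only one of them.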

If you don’t need the multiblock dataset to represent your data, I would recommend using the vtkDataSet most appropriate for your data.

gmarks · June 13, 2018, 2:28pm

Thanks for the reply, Andy. I definitely need to use multiblock: we deal with large, unstructured, multiblock grids, and this seems like the only effective way to handle our data.

I’m still not clear on what you mean by the “full tree” being described on each MPI process. Do you mean that each block in the multiblock structure must contain all nodes, but then only define certain cells?

Andy_Bauer · June 13, 2018, 2:54pm

What does your data look like? VTK’s unstructured grid works very well in parallel and the multiblock data in general doesn’t help with large datasets or partitioning of those datasets.

Also, to be precise I’ll be using the VTK nomenclature for things. This means that grids have points and cells (instead of nodes and elements). Grids are derived from vtkDataSet. The multiblock dataset has blocks and when a block is a vtkDataSet then it is a leaf block.

gmarks · June 13, 2018, 3:05pm

Andy_Bauer wrote: “What does your data look like? VTK’s unstructured grid works very well in parallel and the multiblock data in general doesn’t help with large datasets or partitioning of those datasets.”

This is interesting information. Based on my reading of the Catalyst user guide it seemed like, for an unstructured grid, the multiblock structure was the preferred (and perhaps only) way to handle partitioning. Our data primarily consists of large, irregular, unstructured, multiblock grids for flow solutions. So it’s possible to have each process simply run through the code, define their locally relevant points and cells as an unstructured grid type, without any use of the multiblock or multipart structures?

So if I’m reading this correctly, to use multiblock properly the cells and points themselves must be defined on all blocks, but then when defining point data, one only defines it on the points that are relevant to that block?

Andy_Bauer · June 14, 2018, 2:16pm

I’d recommend working from the CxxFullExample example as that creates a partitioned unstructured grid.
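
For readers who want the idea before opening that example, here is a hedged sketch of one simple way to partition a grid across processes. It is written in plain Python and is not copied from CxxFullExample; `local_x_range` and the grid size are assumptions for illustration. It slices the points along x so that neighboring processes share one plane of points while every cell lands on exactly one process.

```python
def local_x_range(num_points_x, rank, size):
    """Half-open range [start, end) of x point planes held by this rank."""
    start = rank * num_points_x // size
    end = (rank + 1) * num_points_x // size
    if rank != size - 1:
        end += 1  # also hold the neighbor's first plane so boundary cells are complete
    return start, end


def local_cells(num_points_x, rank, size):
    """x indices of the cells owned by this rank (cell i spans planes i and i+1)."""
    start, end = local_x_range(num_points_x, rank, size)
    return list(range(start, end - 1))


NUM_POINTS_X = 10  # hypothetical grid size
SIZE = 3           # hypothetical number of MPI ranks

ranges = [local_x_range(NUM_POINTS_X, r, SIZE) for r in range(SIZE)]
cells = [local_cells(NUM_POINTS_X, r, SIZE) for r in range(SIZE)]
print(ranges)
print(cells)
```

With these numbers, planes 3 and 6 are held by two ranks each, and the nine cells are split three per rank with no overlap. Each rank would then build an ordinary unstructured grid from its own points and cells, with point data sized to its own point count.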