Data Hierarchy: How to name blocks

utkarsh.ayachit · February 24, 2021, 2:27pm

As I continue to work on adding support for data assembly in VTK/ParaView, I've run into a design choice, and I figured I'd see what folks prefer.

So here’s the background: to make it possible for the application to support both vtkPartitionedDataSetCollection (PDC) and vtkMultiBlockDataSet (MB) at the same time, we introduced the concept of a hierarchy. A hierarchy is simply a vtkDataAssembly (DA) representing the structure of the PDC or MB. Remember, a PDC can have additional DAs associated with it that define arbitrary logical grouping and organization of the individual blocks (or rather, partitioned datasets). The hierarchy is just another DA, auto-generated from a PDC or MB, that represents the parent/child relationships present in the dataset itself. This means filters can always work with a DA plus selectors for nodes in that DA, irrespective of whether the data is a PDC or an MB. Previously, such filters used composite indices (CIDs). CIDs are no longer a workable solution with PDC: the PDC structure does not need to match across ranks, and hence CIDs are not valid across ranks. Thus, the ExtractBlock filter, for example, no longer accepts CIDs; instead, it takes selectors identifying nodes in a chosen DA.

vtkDataAssemblyUtilities is a utility class that can generate these hierarchies for a PDC or MB. Here’s the hierarchy generated for the MB produced by the Exodus reader for the familiar can.ex2 dataset.

<Root type="vtkDataAssembly" version="1.0" id="0" vtk_category="hierarchy" cid="0" vtk_type="13">
    <Block0 id="1" label="Element Blocks" cid="1" vtk_type="13">
        <Block0 id="2" label="Unnamed block ID: 1" cid="2" />
        <Block1 id="3" label="Unnamed block ID: 2" cid="3" />
    </Block0>
    <Block1 id="4" label="Face Blocks" cid="4">
    </Block1>
    <Block2 id="5" label="Side Sets" cid="5" vtk_type="13">
        <Block0 id="6" label="Unnamed set ID: 4" cid="6" />
    </Block2>
    <Block3 id="7" label="Node Sets" cid="7" vtk_type="13">
        <Block0 id="8" label="Unnamed set ID: 1" cid="8" />
        <Block1 id="9" label="Unnamed set ID: 100" cid="9" />
    </Block3>
</Root>
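In this sample, the cid attributes appear to follow a depth-first, pre-order count of every node, starting with Root at 0. A minimal sketch using only Python's standard-library xml.etree.ElementTree (no VTK API) checks that observation on the XML above, copied verbatim:

```python
import xml.etree.ElementTree as ET

# Hierarchy XML from the post (can.ex2 via the Exodus reader), copied verbatim.
HIERARCHY = """
<Root type="vtkDataAssembly" version="1.0" id="0" vtk_category="hierarchy" cid="0" vtk_type="13">
    <Block0 id="1" label="Element Blocks" cid="1" vtk_type="13">
        <Block0 id="2" label="Unnamed block ID: 1" cid="2" />
        <Block1 id="3" label="Unnamed block ID: 2" cid="3" />
    </Block0>
    <Block1 id="4" label="Face Blocks" cid="4">
    </Block1>
    <Block2 id="5" label="Side Sets" cid="5" vtk_type="13">
        <Block0 id="6" label="Unnamed set ID: 4" cid="6" />
    </Block2>
    <Block3 id="7" label="Node Sets" cid="7" vtk_type="13">
        <Block0 id="8" label="Unnamed set ID: 1" cid="8" />
        <Block1 id="9" label="Unnamed set ID: 100" cid="9" />
    </Block3>
</Root>
"""

root = ET.fromstring(HIERARCHY.strip())

# Element.iter() walks the tree depth-first in document (pre-)order,
# including the root element itself.
cids = [int(node.get("cid")) for node in root.iter()]
print(cids)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

These numbers only make sense for one specific tree layout, which is why they stop being portable once the PDC structure can differ between ranks.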

Note how the node names (the names used for the XML elements) are not the same as the block names in the MB. This is because block names are often ill-formed as XML node names; they can contain spaces, for example. So I decided to adopt the generic form Block# and use the label attribute to preserve the block name. However, this has a problem: selectors based on block names are ugly. I can’t simply write //BlockName to select a node named BlockName; instead, I have to write //*[@label="BlockName"].
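To make the problem concrete, here is a small sketch on an excerpt of the hierarchy above. It uses Python's xml.etree.ElementTree, whose limited XPath subset stands in for the selectors discussed here; it is not how VTK evaluates them. One difference to keep in mind: ElementTree's `.//` searches only below the context element, so unlike XPath's `//` it never matches Root itself.

```python
import xml.etree.ElementTree as ET

# Excerpt of the can.ex2 hierarchy, keeping the generic Block# node names.
HIERARCHY = """
<Root id="0" cid="0">
    <Block0 id="1" label="Element Blocks" cid="1">
        <Block0 id="2" label="Unnamed block ID: 1" cid="2" />
        <Block1 id="3" label="Unnamed block ID: 2" cid="3" />
    </Block0>
    <Block3 id="7" label="Node Sets" cid="7">
        <Block0 id="8" label="Unnamed set ID: 1" cid="8" />
        <Block1 id="9" label="Unnamed set ID: 100" cid="9" />
    </Block3>
</Root>
"""

root = ET.fromstring(HIERARCHY.strip())

# Selecting by node name tells us little: "Block0" is reused at every level.
by_name = root.findall(".//Block0")
print([n.get("label") for n in by_name])
# ['Element Blocks', 'Unnamed block ID: 1', 'Unnamed set ID: 1']

# Selecting by block name needs a predicate on the label attribute,
# the ElementTree counterpart of //*[@label="Element Blocks"].
by_label = root.findall(".//*[@label='Element Blocks']")
print([n.get("id") for n in by_label])  # ['1']
```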

Another option is to use a sanitized version of the original block name as the node name. For example, the hierarchy could look like this:

<Root ... >
  <ElementBlocks label="Element Blocks" ... >
     <UnnamedblockID1 ... >
...
</Root>

Now the XPath selectors are nicer: //ElementBlocks selects the element blocks directly. However, there is no guarantee that block names in the original multiblock dataset are unique. If they aren’t, //ElementBlocks will select multiple nodes. When we were dealing with composite ids, this ambiguity didn’t exist.
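A sketch of the sanitized-name approach follows. The function `sanitize_node_name` is hypothetical (it is not VTK's own sanitizer): it keeps only ASCII letters, digits, `_`, `.` and `-`, and prefixes `_` when the result would not start with a letter or underscore, as XML names require. It reproduces the names from the example above, and the last step shows how a duplicated block name makes the selector match more than one node.

```python
import re
import xml.etree.ElementTree as ET


def sanitize_node_name(block_name: str) -> str:
    """Hypothetical sanitizer: drop characters outside [A-Za-z0-9_.-] and
    ensure the result starts with a letter or underscore."""
    name = re.sub(r"[^A-Za-z0-9_.-]", "", block_name)
    if not name or not (name[0].isalpha() or name[0] == "_"):
        name = "_" + name
    return name


# Block names as they might appear in the MB; the original is kept as a label.
blocks = {
    "Element Blocks": ["Unnamed block ID: 1", "Unnamed block ID: 2"],
    "Node Sets": ["Unnamed set ID: 1", "Unnamed set ID: 100"],
}

root = ET.Element("Root")
for parent_name, child_names in blocks.items():
    parent = ET.SubElement(root, sanitize_node_name(parent_name), label=parent_name)
    for child_name in child_names:
        ET.SubElement(parent, sanitize_node_name(child_name), label=child_name)

print([e.tag for e in root.iter()])
# ['Root', 'ElementBlocks', 'UnnamedblockID1', 'UnnamedblockID2',
#  'NodeSets', 'UnnamedsetID1', 'UnnamedsetID100']
print(len(root.findall(".//ElementBlocks")))  # 1

# Block names are not guaranteed unique: a second "Element Blocks" group
# gets the same node name, so the same selector now matches both.
ET.SubElement(root, sanitize_node_name("Element Blocks"), label="Element Blocks")
print(len(root.findall(".//ElementBlocks")))  # 2
```

Note that sanitizing can also make two distinct block names collide (for example "Element Blocks" and "ElementBlocks"), so the label attribute remains the only exact record of the original name.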

My question is: is this a reasonable sacrifice to make the selectors easier and more user-friendly? I think so. Any sensible composite-dataset producer will assign meaningful block names, and duplicated block names just don’t seem sensible. I wonder what folks think.

dcthomp · February 24, 2021, 2:46pm

I agree. Really, the names assigned by sources can be thought of as CSS classes… it just happens that most leaf nodes have unique “class” names.

theodorebaltis · February 24, 2021, 4:04pm

I agree.

And the ambiguity actually helps my user base: we typically structure our composite datasets so that every second-from-bottom branch is further subdivided into groups by element type. Something like:

Root
  Group1
    Solid
    Shell
    SPH
  Group2
    SubgroupA
      Solid
      Shell
    SubgroupB
      Solid
      SPH

So it would be great to be able to select all the Solid parts in the model using //Solid.
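Assuming the node names in the hierarchy match the group and element-type names shown above, a descendant selector picks up every Solid regardless of depth. This sketch again uses Python's xml.etree.ElementTree as a stand-in for the selector engine; the `path` helper is hypothetical and only prints where each match lives.

```python
import xml.etree.ElementTree as ET

# The grouping from the post: element-type leaves repeat under several groups.
TREE = """
<Root>
  <Group1><Solid/><Shell/><SPH/></Group1>
  <Group2>
    <SubgroupA><Solid/><Shell/></SubgroupA>
    <SubgroupB><Solid/><SPH/></SubgroupB>
  </Group2>
</Root>
"""

root = ET.fromstring(TREE.strip())

# ElementTree elements do not track their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def path(node):
    """Hypothetical helper: absolute path of node, e.g. /Root/Group1/Solid."""
    parts = []
    while node is not None:
        parts.append(node.tag)
        node = parent_of.get(node)
    return "/" + "/".join(reversed(parts))


# Counterpart of //Solid: every Solid node, at any depth.
solids = root.findall(".//Solid")
for node in solids:
    print(path(node))
# /Root/Group1/Solid
# /Root/Group2/SubgroupA/Solid
# /Root/Group2/SubgroupB/Solid
```

Here the repeated name is the feature rather than the problem, which matches the "CSS class" reading of block names.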

utkarsh.ayachit · February 24, 2021, 5:51pm

Great! I think I’ll make that change then. Thanks for the feedback, @theodorebaltis , @dcthomp !

boonth · February 24, 2021, 6:35pm

Assuming block names are unique is a sensible assumption, I agree.

wascott · February 24, 2021, 6:45pm

I also agree…