Deprecating MultiBlock datasets

Andy_Bauer · July 27, 2020, 5:00pm

The parts of ParaView where backwards compatibility could potentially be affected are:

Catalyst adaptors
State files
Python scripts
Programmable filters (and later editions)

These are things users work with regularly, and I’m concerned that they may no longer be backwards compatible, especially since people have invested time in making them work.

I do like the idea of improving on the multiblock dataset, as there are quite a few issues with using it. Starting over may be what’s needed.

utkarsh.ayachit · July 27, 2020, 10:37pm

Indeed. I suspect it will be a mix of automatic conversion and migration docs to handle backwards compatibility. By making sure this effort pays dividends in added features and simplicity, we can hopefully ameliorate the pain.

penkod · July 30, 2020, 7:48am

I worry about backward compatibility for custom plugins as well, as I make extensive use of vtkMultiBlockDataSetAlgorithm and the MultiBlock inspector in my own custom ParaView plugins (C++). After MultiBlocks get deprecated, it would be very convenient to have clear instructions on how to switch to the new classes (the mentioned vtkPartitionedDataSet, vtkPartitionedDataSetCollection or vtkDataAssembly). If these are provided, the transition might not be so painful. Otherwise, those who heavily use MultiBlockDataSets, including myself, might remain “stuck” on older ParaView versions for some time.

utkarsh.ayachit · July 30, 2020, 10:46am

Note, there are filters that ParaView will use internally to convert MultiBlockDataSet to the new classes for at least another version or so. So, in the first pass, your plugin should not be affected at all.

Indeed, an extensive guide for transitioning is a requirement for this work.

amuhsin · September 25, 2020, 3:32pm

This looks very good, great job.

Is there a reason why a vtkPartitionedDataSetCollection cannot have more than one vtkDataAssembly attached to it?

I’m thinking it would be very useful if users were allowed to have multiple representations of the same data without the need to make multiple vtkPartitionedDataSetCollection instances. It would also give users a quick way to know how many assemblies a particular vtkPartitionedDataSetCollection has and a quick way to access them.

To complement this, we can allow users to add annotations on the assemblies in order to provide some kind of description or purpose for an assembly, so that searching for a particular assembly is easy. Also, since we will have multiple assemblies, we should be able to set one of them to be the default assembly. Doing that will allow the vtkPartitionedDataSetCollection to behave as if it has only one assembly in use cases where users don’t care about the other assemblies that the data has. Essentially, it will behave as it does currently.

In summary the changes would look something like this:

vtkDataAssembly

// provide a way to add annotations to assemblies
void SetAnnotation(const char* key, const char* value);

vtkPartitionedDataSetCollection

// allow users to add more than one data assembly
void AddDataAssembly(vtkDataAssembly* assembly);

// SetDataAssembly would become SetDefaultDataAssembly
void SetDefaultDataAssembly(vtkDataAssembly* assembly);

// GetDataAssembly would become GetDefaultDataAssembly
vtkDataAssembly* GetDefaultDataAssembly();

// provide a way to grab certain assemblies using a map of annotation key/value pairs;
// passing in an empty map can indicate that the user wants all the assemblies
std::set<vtkDataAssembly*> GetDataAssemblies(const std::map<std::string, std::string>& annotations, bool match_all);
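
To make the intended behavior of the annotation lookup concrete, here is a minimal sketch in plain Python. It does not use VTK at all: the `Assembly` and `Collection` classes are hypothetical stand-ins for vtkDataAssembly and vtkPartitionedDataSetCollection, the rule that the first assembly added becomes the default is an assumption, and a list is returned instead of a set to keep the order predictable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Assembly:
    """Hypothetical stand-in for vtkDataAssembly with key/value annotations."""
    name: str
    annotations: Dict[str, str] = field(default_factory=dict)

    def set_annotation(self, key: str, value: str) -> None:
        self.annotations[key] = value


@dataclass
class Collection:
    """Hypothetical stand-in for vtkPartitionedDataSetCollection."""
    assemblies: List[Assembly] = field(default_factory=list)
    default: Optional[Assembly] = None

    def add_data_assembly(self, assembly: Assembly) -> None:
        self.assemblies.append(assembly)
        # Assumption: the first assembly added doubles as the default.
        if self.default is None:
            self.default = assembly

    def set_default_data_assembly(self, assembly: Assembly) -> None:
        if assembly not in self.assemblies:
            self.assemblies.append(assembly)
        self.default = assembly

    def get_data_assemblies(self, annotations: Dict[str, str],
                            match_all: bool = True) -> List[Assembly]:
        # An empty query returns every assembly, as the proposal suggests.
        if not annotations:
            return list(self.assemblies)
        combine = all if match_all else any
        return [
            a for a in self.assemblies
            if combine(a.annotations.get(k) == v for k, v in annotations.items())
        ]


pdc = Collection()
by_material = Assembly("by_material", {"purpose": "materials", "phase": "1"})
by_region = Assembly("by_region", {"purpose": "regions", "phase": "1"})
pdc.add_data_assembly(by_material)
pdc.add_data_assembly(by_region)

query = {"purpose": "regions", "phase": "1"}
strict = pdc.get_data_assemblies(query, match_all=True)   # every pair must match
loose = pdc.get_data_assemblies(query, match_all=False)   # any pair may match
```

With `match_all=True` only `by_region` satisfies both pairs, while `match_all=False` also returns `by_material` because its `phase` annotation matches.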

utkarsh.ayachit · September 25, 2020, 5:47pm

Is there a reason why a vtkPartitionedDataSetCollection cannot have more than one vtkDataAssembly attached to it?

That had indeed crossed my mind. No particular reason, honestly. Figured we could add support for it when we have a compelling use-case. It does add a bit of complexity to the UI.

Another question is whether the ‘default-assembly’ is a parameter on the vtkPartitionedDataSetCollection (as you have done in the API) or on a filter. Similar to how vtkAlgorithm::SetInputArrayToProcess lets the user choose which array to operate on, we could have SetInputDataAssemblyToProcess to choose which assembly to operate on. Or it could be a combination of both, i.e. choose a default on the PDC and then let filters override which one they operate on.
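
The "combination of both" option can be sketched as a small lookup rule. This is plain Python, not a VTK API: `resolve_assembly_name` and its parameters are hypothetical, and falling back to the collection default when the filter's choice is missing is just one possible policy (raising an error would be another).

```python
from typing import Iterable, Optional


def resolve_assembly_name(available: Iterable[str],
                          collection_default: Optional[str],
                          filter_override: Optional[str] = None) -> str:
    """Pick the assembly a filter operates on.

    The filter-level choice wins when it names an existing assembly;
    otherwise the default stored on the collection is used.
    """
    names = set(available)
    for candidate in (filter_override, collection_default):
        if candidate is not None and candidate in names:
            return candidate
    raise KeyError("neither the filter override nor the collection default exists")


assemblies = ["by_material", "by_region"]
no_override = resolve_assembly_name(assemblies, "by_material")
with_override = resolve_assembly_name(assemblies, "by_material", "by_region")
```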

amuhsin · September 25, 2020, 10:25pm

I don’t know how compelling my use-case is but I do see this addition as a way for the VTK Session in SMTK to take greater advantage of the Graph Resource. Conceptually the Graph Resource is very similar to vtkDataAssembly, the only difference is that it has the option to hold data local to it as well as reference data from an external dataset.

Model Resource currently acts like the SMTK wrapper for vtkMultiBlockDataSet (in the VTK session at least), so the data and the assembly of the data are coupled. However, when vtkMultiBlockDataSet gets deprecated, we can have Model Resource (or some new type of Resource) act as the SMTK wrapper for vtkPartitionedDataSetCollection while Graph Resource acts as the SMTK wrapper for vtkDataAssembly.

Allowing multiple assemblies will then become very useful because in SMTK we are dealing with CAE workflows. A lot of workflows require users to take a single set of input data and move it through workflow phases that change its assembly in different ways. For example, we build Assembly 1 and present it to the user as their new representation, then collect some more settings and build Assembly 2. Sometimes the user just wants to view the input data using a different Assembly structure without necessarily moving it through any workflow phase. An example of that would be building Assembly 1 and Assembly 2 at the same time.

In any case, I think allowing all the different assemblies to be connected to each other (through the vtkPartitionedDataSetCollection that they assemble) would simplify a lot of complexity on the SMTK side of things. It would reduce the need to copy input data and make getting from one assembly to another very easy.

Maybe in the future we can even reconcile the differences between Graph Resource and vtkDataAssembly by allowing vtkDataAssemblies to ‘override’ or provide a local copy of certain vtkPartitionedDataSets in the vtkPartitionedDataSetCollection that they are set on. In addition, the vtkDataAssembly can provide vtkPartitionedDataSets that are meant to add on to or augment the linked vtkPartitionedDataSetCollection when the particular vtkDataAssembly is used.

I like this approach a lot. It introduces a lot of flexibility.

utkarsh.ayachit · January 25, 2021, 1:55pm

Just to update everyone, we are continuing to fill in capability holes to support vtkPartitionedDataSetCollection (PDC) in ParaView/VTK. While doing that, a possibly cleaner solution to bridge the gap to a multiblock-free world has come to light. Here are the details:

Note, we do want to ultimately get rid of vtkMultiBlockDataSet (MB); however, for now we don’t have to explicitly flag it as deprecated. We can let it continue to exist alongside PDC.
vtkDataAssembly will support attributes on nodes in the assembly. This is not much different from attributes on XML nodes. This also enables a richer selection / querying capability above and beyond the basic path-based queries originally proposed. Let’s call these selectors.
We define a new concept of a “hierarchy”. A hierarchy is simply the internal organization of a vtkCompositeDataSet (CD) represented in a vtkDataAssembly. For any CD subclass, VTK will provide utility functions (my current prototype implementation puts these in a vtkDataAssemblyUtilities class) to create a hierarchy. This hierarchy will have attributes on nodes to help map concepts such as composite index. By adding a simple cid attribute to all nodes, we can now support selectors that directly use the composite index to select nodes.
PDC will support multiple named assemblies. Thus hierarchy becomes another named assembly associated with a PDC. Other CDs won’t support assemblies except hierarchy, which is automatically generated when needed.
ParaView exposes a small subset of filters/representations that allow the user to choose a composite index to subset an input CD, e.g. extract block, spreadsheet representation, volume representation, etc. All these cases will now use a selector instead. Thus, any proxy with an int-property for a CompositeDataSetIndex is replaced with a string-property for a selector instead. These filters will also expose UI to choose which named assembly to use. The default will be hierarchy. For all CDs, hierarchy is available. PDC can have other assemblies too. This allows us to develop filters like ExtractBlock that use selectors, but also work with MBs and other CDs without any issues.
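
As a rough illustration of the hierarchy idea, the sketch below models a hierarchy as an XML tree whose nodes carry a cid attribute, then selects a node by that attribute. It uses only Python's xml.etree.ElementTree, not vtkDataAssembly or vtkDataAssemblyUtilities; the block names are invented, `build_hierarchy` and `select_by_cid` are hypothetical helpers, and the XPath-like query is ElementTree syntax rather than the selector syntax ParaView ends up using. The cid values follow a pre-order traversal with the root at 0, which is how the flat composite index is assigned.

```python
import xml.etree.ElementTree as ET


def build_hierarchy(tree, root_name="Root"):
    """Turn a nested dict into an XML hierarchy with cid attributes.

    `tree` maps block names to either a nested dict (a composite block)
    or None (a leaf dataset). cid is assigned in pre-order, root first.
    """
    counter = 0

    def visit(name, children, parent):
        nonlocal counter
        node = ET.Element(name) if parent is None else ET.SubElement(parent, name)
        node.set("cid", str(counter))
        counter += 1
        for child_name, grandchildren in (children or {}).items():
            visit(child_name, grandchildren, node)
        return node

    return visit(root_name, tree, None)


def select_by_cid(hierarchy, cid):
    """Return the nodes whose cid attribute equals `cid`."""
    if hierarchy.get("cid") == str(cid):
        return [hierarchy]
    return hierarchy.findall(f".//*[@cid='{cid}']")


# Invented example structure: two composite blocks holding three leaves.
blocks = {"Block0": {"Leaf0": None, "Leaf1": None}, "Block1": {"Leaf0": None}}
hierarchy = build_hierarchy(blocks)
selected = select_by_cid(hierarchy, 3)
```

Here the pre-order numbering gives Root 0, Block0 1, its leaves 2 and 3, Block1 4 and its leaf 5, so cid 3 resolves to the second leaf of Block0.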

Now, suddenly, things like backwards compatibility of Python scripts, pvsm state files, and external plugins that use multiblocks become a non-issue. When loading old states or scripts, ParaView can automatically convert a composite id to a selector expression that produces a similar result. Development-wise, too, this becomes less daunting: now it’s just a grep for CompositeIndex in the XMLs and an update of those to use selectors instead, a more manageable task.
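
The automatic conversion described here could work along the following lines. This is a sketch with Python's standard library only, not ParaView's actual migration code: the hierarchy XML is an invented example, `composite_ids_to_selectors` is a hypothetical helper, and the slash-separated path form of the resulting selectors is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented hierarchy for a small multiblock dataset; cid is the composite index.
HIERARCHY_XML = """
<Root cid="0">
  <Block0 cid="1">
    <Leaf0 cid="2"/>
    <Leaf1 cid="3"/>
  </Block0>
  <Block1 cid="4">
    <Leaf0 cid="5"/>
  </Block1>
</Root>
"""


def composite_ids_to_selectors(hierarchy, composite_ids):
    """Map legacy composite indices to path-style selector strings."""
    paths = {}

    def visit(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths[int(node.get("cid"))] = path
        for child in node:
            visit(child, path)

    visit(hierarchy, "")
    return [paths[cid] for cid in composite_ids]


hierarchy = ET.fromstring(HIERARCHY_XML.strip())
# An old state file might have stored the composite indices [3, 4].
selectors = composite_ids_to_selectors(hierarchy, [3, 4])
```

An index that does not exist in the hierarchy raises a KeyError in this sketch; a real loader would more likely warn and skip it.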

danlipsa · April 15, 2021, 6:47pm

The CityGML reader produces a multiblock dataset, where the root has a list of children that represent buildings (and some other objects such as bridges, tunnels and so on), and each building has a list of children that represent the walls and the roof of the building. The reason for this structure is that each node can have its own color or texture. We store the color or a reference to the texture as field data.

In the past I had a ParaView script that traverses all leaf nodes, extracts them, and applies the texture or color. With the new PDC model, ExtractBlock needs a selector, so my iteration does not work anymore. Is there a way to iterate through all blocks of a MultiBlock dataset and extract them? Here is the blog post (CityGML Reader - Kitware Blog), which has a reference to the script: https://blog.kitware.com/wp-content/uploads/2018/10/citygml-paraview-apply-texture.zip
Thanks!

Andrew_Maclean · July 15, 2021, 6:31am

@utkarsh.ayachit I have 31 C++ examples using vtkMultiBlockDataSet, and many of these also use vtkMultiBlockPLOT3DReader. Would I be correct in using a vtkPartitionedDataSetCollection instead? Also, what replaces vtkMultiBlockPLOT3DReader?
If you have time to look at:

giving me some pointers, I’ll have a go at updating these examples!

utkarsh.ayachit · July 15, 2021, 12:25pm

Ideally, yes. But reality is always a little more clunky :).

So I think what’s going to happen is we’re going to slowly deprecate MB. There’s a lot of code that relies on MB and changing all of that in one go is simply not possible.

vtkMultiBlockPLOT3DReader indeed needs to be converted first. Then the examples can be ported over.

Andrew_Maclean · July 15, 2021, 8:46pm

Ok

jfavre · September 1, 2021, 12:45pm

I am ready to convert a few private reader plugins which use vtkMultiBlockDataSet. My first idea was to re-use my prototypes based on ProgrammableSources. However, the current version on the master branch does not have an option to set its Output Data Set Type to “vtkPartitionedDataSet”. Could this be added before the pv5.10 branch? TIA

utkarsh.ayachit · September 1, 2021, 1:03pm

definitely, I’ve reported an issue here: https://gitlab.kitware.com/paraview/paraview/-/issues/20923

jfavre · September 1, 2021, 1:06pm

sweet. Thanks.

Juan_Sanchez · October 30, 2022, 9:11pm

Are MultiBlock datasets officially deprecated? How do I convert my vtm file to vtpc? I wasn’t able to find documentation on the file format.

<?xml version="1.0"?>
<VTKFile type="vtkMultiBlockDataSet" version="0.1" byte_order="LittleEndian" compressor="vtkZLibDataCompressor">
  <vtkMultiBlockDataSet>
    <DataSet group="0" dataset="0" file="gmsh_mos2d_potentialonly_0.vtu"/>
    <DataSet group="1" dataset="0" file="gmsh_mos2d_potentialonly_1.vtu"/>
    <DataSet group="2" dataset="0" file="gmsh_mos2d_potentialonly_2.vtu"/>
  </vtkMultiBlockDataSet>
</VTKFile>
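
One possible way to rewrite such an index file by hand is sketched below with Python's standard library. It is not an official converter and the output has not been checked against the VTK reader: the element and attribute names of the result are copied from the vtkPartitionedDataSetCollection example posted later in this thread, each `group` is assumed to become one `Partitions` element, the `Block<N>` names are invented, and no DataAssembly is written. Saving the data from ParaView remains the safer route.

```python
import xml.etree.ElementTree as ET

VTM_TEXT = """<?xml version="1.0"?>
<VTKFile type="vtkMultiBlockDataSet" version="0.1" byte_order="LittleEndian" compressor="vtkZLibDataCompressor">
  <vtkMultiBlockDataSet>
    <DataSet group="0" dataset="0" file="gmsh_mos2d_potentialonly_0.vtu"/>
    <DataSet group="1" dataset="0" file="gmsh_mos2d_potentialonly_1.vtu"/>
    <DataSet group="2" dataset="0" file="gmsh_mos2d_potentialonly_2.vtu"/>
  </vtkMultiBlockDataSet>
</VTKFile>
"""


def vtm_to_vtpc(vtm_text):
    """Rewrite a legacy group/dataset .vtm index as a .vtpc-style index."""
    src = ET.fromstring(vtm_text)

    # Collect the DataSet entries of each group (one group per block).
    groups = {}
    for ds in src.iter("DataSet"):
        groups.setdefault(int(ds.get("group")), []).append(ds)

    out = ET.Element("VTKFile", {
        "type": "vtkPartitionedDataSetCollection",
        "version": "1.0",
        "byte_order": src.get("byte_order", "LittleEndian"),
        "header_type": "UInt64",
    })
    pdc = ET.SubElement(out, "vtkPartitionedDataSetCollection")
    for index, group in enumerate(sorted(groups)):
        # Assumption: name each partitioned dataset after its source group.
        parts = ET.SubElement(pdc, "Partitions",
                              {"index": str(index), "name": f"Block{group}"})
        ordered = sorted(groups[group], key=lambda d: int(d.get("dataset")))
        for part_index, ds in enumerate(ordered):
            ET.SubElement(parts, "DataSet",
                          {"index": str(part_index), "file": ds.get("file")})

    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(out)
    return ET.tostring(out, encoding="unicode")


vtpc_text = vtm_to_vtpc(VTM_TEXT)
print(vtpc_text)
```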

mwestphal · October 31, 2022, 1:14am

Hi @Juan_Sanchez

Multiblock datasets are still supported and not yet deprecated, if that ever happens.
However, vtkPartitionedDataSet is much better than multiblock in many ways.

I’m afraid multiblock files are indeed missing from the doc:
https://kitware.github.io/vtk-examples/site/VTKFileFormats/

But good info can be found here (from @berkgeveci ):
https://public.kitware.com/pipermail/paraview/2005-November/002188.html

IvanPi · November 7, 2022, 8:56am

Is there any place I could find an XML example of a vtkPartitionedDataSet? Would it be similar to the MultiBlock dataset example shown by @Juan_Sanchez?

mwestphal · November 7, 2022, 9:02am

Here is one, but you can generate one easily with ParaView using SaveData:
collection.tgz (106.6 KB)

<VTKFile type="vtkPartitionedDataSetCollection" version="1.0" byte_order="LittleEndian" header_type="UInt64">
  <vtkPartitionedDataSetCollection>
    <Partitions index="0" name="Boy">
      <DataSet index="0" file="collection/collection_0_0.vtp"/>
    </Partitions>
    <DataAssembly encoding="base64">
      PD94bWwgdmVyc2lvbj0iMS4wIj8+CjxBc3NlbWJseSB0eXBlPSJ2dGtEYXRhQXNzZW1ibHkiIHZlcnNpb249IjEuMCIgaWQ9IjAiPgogIDxOb25PcmllbnRhYmxlU3VyZmFjZXMgaWQ9IjEiPgogICAgPGRhdGFzZXQgaWQ9IjAiIC8+CiAgPC9Ob25PcmllbnRhYmxlU3VyZmFjZXM+CiAgPE9yaWVudGFibGVTdXJmYWNlcyBpZD0iMiIgLz4KPC9Bc3NlbWJseT4K
    </DataAssembly>
  </vtkPartitionedDataSetCollection>
</VTKFile>

Juan_Sanchez · November 11, 2022, 4:49am

Mathieu Westphal (Kitware):

      PD94bWwgdmVyc2lvbj0iMS4wIj8+CjxBc3NlbWJseSB0eXBlPSJ2dGtEYXRhQXNzZW1ibHkiIHZlcnNpb249IjEuMCIgaWQ9IjAiPgogIDxOb25PcmllbnRhYmxlU3VyZmFjZXMgaWQ9IjEiPgogICAgPGRhdGFzZXQgaWQ9IjAiIC8+CiAgPC9Ob25PcmllbnRhYmxlU3VyZmFjZXM+CiAgPE9yaWVudGFibGVTdXJmYWNlcyBpZD0iMiIgLz4KPC9Bc3NlbWJseT4K

I put the DataAssembly into a base64 decoder and got:

<?xml version="1.0"?>
<Assembly type="vtkDataAssembly" version="1.0" id="0">
  <NonOrientableSurfaces id="1">
    <dataset id="0" />
  </NonOrientableSurfaces>
  <OrientableSurfaces id="2" />
</Assembly>
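
The same decoding step can be done with Python's standard library, which also makes it easy to walk the resulting tree. This is a small sketch: the encoded string is the one from the file above, while `decode_assembly` and `outline` are helper names made up for the example.

```python
import base64
import xml.etree.ElementTree as ET

# Contents of the <DataAssembly encoding="base64"> element shown above.
ENCODED = "PD94bWwgdmVyc2lvbj0iMS4wIj8+CjxBc3NlbWJseSB0eXBlPSJ2dGtEYXRhQXNzZW1ibHkiIHZlcnNpb249IjEuMCIgaWQ9IjAiPgogIDxOb25PcmllbnRhYmxlU3VyZmFjZXMgaWQ9IjEiPgogICAgPGRhdGFzZXQgaWQ9IjAiIC8+CiAgPC9Ob25PcmllbnRhYmxlU3VyZmFjZXM+CiAgPE9yaWVudGFibGVTdXJmYWNlcyBpZD0iMiIgLz4KPC9Bc3NlbWJseT4K"


def decode_assembly(encoded):
    """Decode the base64 payload and parse it as XML."""
    xml_bytes = base64.b64decode(encoded.strip())
    return ET.fromstring(xml_bytes)


def outline(node, depth=0):
    """Return one indented line per node, showing its tag and id attribute."""
    lines = [f"{'  ' * depth}{node.tag} (id={node.get('id')})"]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


assembly = decode_assembly(ENCODED)
print("\n".join(outline(assembly)))
```

The `dataset` node with id 0 appears to point at the `Partitions` element with index 0 in the collection file, although that reading is an inference from this one example.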

Is there documentation for these types of XML records? I assume I can substitute .vtu files in the partitions section.