Deprecating MultiBlock datasets

Will this have any effect on VTK? Will vtkMultiBlockDataSet be deprecated upstream?

Yes. However, I suspect the dataset type itself can stick around longer, especially if we have two-way conversion filters.


A small demo of the conversion filters in action:

Here’s the current output from a CGNS file. The multiblock looks as follows:

image

On converting this to vtkPartitionedDataSetCollection using the vtkDataObjectToPartitionedDataSetCollection filter, it looks like this:

image

The data-assembly captures the relationships as follows:
image

Thus, filters that care about the relationships have all the necessary information via the DataAssembly.

Now, by applying the vtkPartitionedDataSetCollectionToMultiBlockDataSet filter to convert this partitioned-dataset-collection back to multiblock, we get the following:

image
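To make the round trip above concrete, here is a minimal sketch in plain Python. It is a toy model only: nested dicts stand in for the multiblock, a flat list stands in for the collection, and a dict of indices stands in for the data assembly. The function names and block names are made up for illustration and are not VTK API.

```python
# Toy model only: plain Python containers stand in for the VTK classes.
# A "multiblock" is a nested dict: name -> child dict (block) or str (leaf dataset).

def to_pdc(multiblock):
    """Flatten a nested multiblock into (datasets, assembly).

    datasets: flat list of the leaf datasets (the "collection").
    assembly: nested dict mirroring the hierarchy; its leaves are
              indices into `datasets`.
    """
    datasets = []

    def visit(node):
        if isinstance(node, dict):
            return {name: visit(child) for name, child in node.items()}
        datasets.append(node)
        return len(datasets) - 1

    return datasets, visit(multiblock)


def to_multiblock(datasets, assembly):
    """Rebuild the nested multiblock from the flat list plus the assembly."""
    if isinstance(assembly, dict):
        return {name: to_multiblock(datasets, child)
                for name, child in assembly.items()}
    return datasets[assembly]


# Hypothetical block names, loosely modelled on a CGNS layout.
multiblock = {
    "Base": {"blk1": {"Internal": "grid-A", "Patches": {"wall": "surf-A"}}},
    "Base2": {"blk1": {"Internal": "grid-B"}},
}
datasets, assembly = to_pdc(multiblock)
roundtrip = to_multiblock(datasets, assembly)
```

The point of the sketch is that the flat list carries the data while the assembly alone carries the relationships, so converting back loses nothing.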


In the case of CGNS, we can have /Base/blk1/Internal and /Base2/blk1/Internal in the same file. Thus block names are not unique without the full path. I suspect the same can happen with vtm files.
Is that an issue?

Very nice.

Not at all. Names in the Structure are just helpful hints and need not be unique. The assembly does have the full hierarchy. Users will use the assembly to set color and other parameters in the Multiblock Inspector or in filters like Extract Block, so there won’t be any ambiguity.

The parts of ParaView that work with backwards compatibility that could potentially be affected are:

  1. Catalyst adaptors
  2. State files
  3. Python scripts
  4. Programmable filters (and later editions)

I’m concerned because users work with these regularly, and they may no longer be backwards compatible, especially since people have invested time into making them work.

I do like the idea of improving on the multiblock dataset, as there are quite a few issues with using it. Starting over may be what’s needed.

Indeed. I suspect it will be a mix of automatic conversion and migration docs to handle backwards compatibility. By making sure this effort pays dividends in added features and simplicity, we can hopefully ameliorate the pain.

I worry about backward compatibility for custom plugins as well, as I make extensive use of vtkMultiBlockDataSetAlgorithm and the MultiBlock inspector in my own custom ParaView plugins (C++). It would be very convenient, after MultiBlocks get deprecated, to have clear instructions on how to switch to the new classes (the mentioned vtkPartitionedDataSet, vtkPartitionedDataSetCollection or vtkDataAssembly). If this could be provided, the transition might not be so painful. Otherwise, those who heavily use MultiBlockDataSets, including myself, might remain “stuck” on older ParaView versions for some time.


Note, there are filters that ParaView will use internally to convert MultiBlockDataSet to the new classes for at least another version or so. So, in a first pass, your plugin should not be affected at all.

Indeed, an extensive guide for transitioning is a requirement for this work.


This looks very good, great job.

Is there a reason why a vtkPartitionedDataSetCollection cannot have more than one vtkDataAssembly attached to it?

I’m thinking it would be very useful if users are allowed to have multiple representations of the same data without the need to make multiple vtkPartitionedDataSetCollection instances. It would also give users a quick way to know how many assemblies a particular vtkPartitionedDataSetCollection has and quick way to access them.

To complement this, we can allow users to add annotations on the assemblies in order to provide some kind of description or purpose for an assembly, so that searching for a particular assembly is easy. Also, since we will have multiple assemblies, we should be able to set one of them as the default assembly. Doing that will allow the vtkPartitionedDataSetCollection to behave as if it has only one assembly in use cases where users don’t care about the other assemblies that the data has. Essentially, it will behave as it currently does.

In summary the changes would look something like this:

vtkDataAssembly

// provide a way to add annotations to assemblies
void SetAnnotation(const char* key, const char* value)

vtkPartitionedDataSetCollection

// allow users to add more than one data assembly
void AddDataAssembly(vtkDataAssembly *assembly) 

// SetDataAssembly would become SetDefaultDataAssembly
void SetDefaultDataAssembly(vtkDataAssembly *assembly)

// GetDataAssembly would become GetDefaultDataAssembly
vtkDataAssembly* GetDefaultDataAssembly ()

// Provide a way to grab certain assemblies using a map of annotation key value pairs
// passing in an empty map can indicate that the user wants all the assemblies
std::set<vtkDataAssembly*> GetDataAssemblies(const std::map<std::string, std::string>& annotations, bool match_all)
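For readers who want to see how the proposed calls might behave together, here is a minimal sketch in plain Python. Everything in it is hypothetical: the classes only mimic the names proposed above, the rule that the first assembly added becomes the default is an assumption, and a list is returned instead of a set to keep insertion order.

```python
# Hypothetical model of the proposed API; none of these classes are VTK's.

class DataAssembly:
    def __init__(self, name):
        self.name = name
        self.annotations = {}

    def set_annotation(self, key, value):
        self.annotations[key] = value


class PartitionedDataSetCollection:
    def __init__(self):
        self._assemblies = []
        self._default = None

    def add_data_assembly(self, assembly):
        self._assemblies.append(assembly)
        if self._default is None:
            # Assumption: the first assembly added becomes the default.
            self._default = assembly

    def set_default_data_assembly(self, assembly):
        if assembly not in self._assemblies:
            self._assemblies.append(assembly)
        self._default = assembly

    def get_default_data_assembly(self):
        return self._default

    def get_data_assemblies(self, annotations, match_all=True):
        """Return assemblies matching the annotation query.

        An empty query returns every assembly. With match_all=True all
        key/value pairs must match; otherwise one match is enough.
        """
        if not annotations:
            return list(self._assemblies)
        combine = all if match_all else any
        return [
            a for a in self._assemblies
            if combine(a.annotations.get(k) == v for k, v in annotations.items())
        ]


pdc = PartitionedDataSetCollection()
by_material = DataAssembly("by_material")
by_material.set_annotation("phase", "setup")
by_material.set_annotation("author", "alice")
by_region = DataAssembly("by_region")
by_region.set_annotation("phase", "review")
by_region.set_annotation("author", "alice")

pdc.add_data_assembly(by_material)
pdc.add_data_assembly(by_region)
pdc.set_default_data_assembly(by_region)

both = pdc.get_data_assemblies({"author": "alice"})
either = pdc.get_data_assemblies({"phase": "setup", "author": "bob"}, match_all=False)
```

Code that does not care about multiple assemblies would only ever call the default getter, which is how the single-assembly behaviour is preserved.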

Is there a reason why a vtkPartitionedDataSetCollection cannot have more than one vtkDataAssembly attached to it?

That had indeed crossed my mind. No particular reason, honestly. Figured we could add support for it when we have a compelling use-case. It does add a bit of complexity to the UI.

Another question is whether the ‘default-assembly’ is a parameter on the vtkPartitionedDataSetCollection (as you have done in the API) or on a filter. Similar to how vtkAlgorithm::SetInputArrayToProcess lets the user choose which array to operate on, we could have SetInputDataAssemblyToProcess to choose which assembly to operate on. Or it could be a combination of both: i.e. choose a default on the PDC and then let filters override which one they operate on.

I don’t know how compelling my use-case is but I do see this addition as a way for the VTK Session in SMTK to take greater advantage of the Graph Resource. Conceptually the Graph Resource is very similar to vtkDataAssembly, the only difference is that it has the option to hold data local to it as well as reference data from an external dataset.

Model Resource currently acts like the SMTK wrapper for vtkMultiBlockDataSet (in the VTK session at least), therefore the data and the assembly of the data are coupled. However, when vtkMultiBlockDataSet gets deprecated, we can have Model Resource (or some new type of Resource) act as the SMTK wrapper for vtkPartitionedDataSetCollection while Graph Resource acts as the SMTK wrapper for vtkDataAssembly.

Allowing multiple assemblies will then become very useful because in SMTK we are dealing with CAE workflows. A lot of workflows require users to take a single set of input data and move it through workflow phases that change its assembly in different ways. For example, we build Assembly 1 and present it to the user as their new representation, then collect some more settings and build Assembly 2. Sometimes the user just wants to view the input data using a different Assembly structure without necessarily moving it through any workflow phase. An example of that would be building Assembly 1 and Assembly 2 at the same time.

In any case, I think allowing all the different assemblies to be connected to each other (through the vtkPartitionedDataSetCollection that they assemble) would simplify a lot of complexity on the SMTK side of things. It would reduce the need to copy input data and make getting from one assembly to another very easy.

Maybe in the future we can even reconcile the differences between Graph Resource and vtkDataAssembly by allowing vtkDataAssemblies to ‘override’ or provide a local copy of certain vtkPartitionedDataSets in the vtkPartitionedDataSetCollection that they are set on. In addition, the vtkDataAssembly can provide vtkPartitionedDataSets that are meant to add on to or augment the linked vtkPartitionedDataSetCollection when the particular vtkDataAssembly is used.

I like this approach a lot. It introduces a lot of flexibility.


Just to update everyone, we are continuing to fill in capability gaps to support vtkPartitionedDataSetCollection (PDC) in ParaView/VTK. While doing that, a possibly cleaner solution to bridge the gap to a multiblock-free world has come to light. Here are the details:

  1. Note, we do want to ultimately get rid of vtkMultiBlockDataSet (MB); however, for now we don’t have to explicitly flag it as deprecated. We can let it continue to exist alongside PDC.
  2. vtkDataAssembly will support attributes on nodes in the assembly. This is not much different from attributes on XML nodes. This also enables a richer selection / querying capability above and beyond the basic path-based queries originally proposed. Let’s call these selectors.
  3. We define a new concept of a “hierarchy”. A hierarchy is simply the internal organization of a vtkCompositeDataSet (CD) represented in a vtkDataAssembly. For any CD subclass, VTK will provide utility functions (my current prototype implementation puts these in a vtkDataAssemblyUtilities class) to create a hierarchy. This hierarchy will have attributes on nodes to help map concepts such as composite index. By adding a simple cid attribute to all nodes, we can support selectors that directly use the composite-index to select nodes.
  4. PDC will support multiple named assemblies. Thus hierarchy becomes another named assembly associated with a PDC. Other CDs won’t support assemblies except hierarchy which is automatically generated when needed.
  5. ParaView exposes a small subset of filters/representations that allow the user to choose a composite index to subset an input CD, e.g. extract block, spread sheet representation, volume representation etc. All these cases will now use a selector instead. Thus, any proxy with an int-property for a CompositeDataSetIndex is replaced with a string-property for a selector instead. These filters will also expose UI to choose which named-assembly to use. The default will be hierarchy. For all CDs, hierarchy is available. PDC can have other assemblies too. This allows us to develop filters like ExtractBlock that use selectors, but also work with MBs and other CDs without any issues.
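The hierarchy idea in the list above can be sketched in plain Python. This is a toy model, not the vtkDataAssemblyUtilities implementation: nodes are dicts, and the cid attribute is assumed to be assigned in pre-order with the root at 0. The function names are made up for illustration.

```python
import itertools


def build_hierarchy(name, block, counter=None):
    """Return a node {'name', 'cid', 'children'} for a nested multiblock.

    cid is assigned in pre-order, starting at 0 for the root. That
    numbering is an assumption about how a composite index is counted.
    """
    counter = counter if counter is not None else itertools.count()
    node = {"name": name, "cid": next(counter), "children": []}
    if isinstance(block, dict):
        for child_name, child in block.items():
            node["children"].append(build_hierarchy(child_name, child, counter))
    return node


def select_by_cid(node, cid):
    """A selector on the cid attribute: return the matching node or None."""
    if node["cid"] == cid:
        return node
    for child in node["children"]:
        found = select_by_cid(child, cid)
        if found is not None:
            return found
    return None


# Hypothetical structure; None marks a leaf dataset.
multiblock = {"Base": {"blk1": None, "blk2": None}, "Base2": {"blk1": None}}
hierarchy = build_hierarchy("Root", multiblock)
```

Because every node carries its own cid, a filter written against selectors can still honour a request phrased as a composite index.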

Now, suddenly, things like backwards compatibility of python scripts and pvsm state files or external plugins that use multiblocks become a non-issue. When loading old states or scripts, ParaView can automatically convert a composite id to a selector expression that would produce a similar result. Development-wise, too, this becomes less daunting: now, it’s just a grep for CompositeIndex in the XMLs and an update of those to use selectors instead, which is a more manageable task.
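As a rough illustration of that automatic conversion, the sketch below maps legacy composite indices to path-style selector strings. It is plain Python on a toy nested dict. The property names, the path syntax, and the pre-order numbering with the root at 0 are all assumptions made for the example, not ParaView's actual state format.

```python
def composite_ids_to_selectors(multiblock, ids, root="Root"):
    """Map legacy composite indices to path-style selectors.

    Indices are counted in pre-order with the root at 0 (an assumption).
    Indices that do not exist in the structure are skipped.
    """
    paths = {}

    def walk(path, block):
        paths[len(paths)] = path  # next free index gets this path
        if isinstance(block, dict):
            for name, child in block.items():
                walk(f"{path}/{name}", child)

    walk(f"/{root}", multiblock)
    return [paths[i] for i in ids if i in paths]


# Hypothetical structure and hypothetical property names.
multiblock = {"Base": {"blk1": None, "blk2": None}, "Base2": {"blk1": None}}
old_state = {"CompositeDataSetIndex": [2, 5]}
new_state = {
    "Selectors": composite_ids_to_selectors(
        multiblock, old_state["CompositeDataSetIndex"]
    )
}
```

Note that both selected blocks are named blk1; the full path is what keeps them distinct.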


The CityGML reader produces a multiblock dataset, where the root has a list of children that represent buildings (and some other objects such as bridges, tunnels and so on), and each building has a list of children that represent the walls and the roof of the building. The reason for this structure is that each node can have its own color or texture. We store the color or a reference to the texture as field data.

In the past I had a ParaView script that traverses all leaf nodes, extracts them and applies the texture or color. With the new PDC model, ExtractBlock needs a selector, so my iteration does not work anymore. Is there a way to iterate through all blocks and extract them for a MultiBlock dataset? The blog post “CityGML Reader - Kitware Blog” has a reference to the script: https://blog.kitware.com/wp-content/uploads/2018/10/citygml-paraview-apply-texture.zip
Thanks!
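One possible shape for such an iteration, sketched on a toy model in plain Python: walk the nested structure and emit one path-style selector per leaf, together with that leaf's field data. The node layout, the names, and the selector syntax are assumptions for illustration; how the selectors are then passed to Extract Block in a real script is left out.

```python
def leaf_selectors(block, path="/Root"):
    """Yield (selector, field_data) for every leaf of a nested structure.

    A node with a 'children' key is treated as a group; anything else
    is treated as a leaf carrying 'field_data'.
    """
    for name, child in block.items():
        child_path = f"{path}/{name}"
        if "children" in child:
            yield from leaf_selectors(child["children"], child_path)
        else:
            yield child_path, child["field_data"]


# Hypothetical city model: buildings and bridges with per-leaf color or texture.
city = {
    "building_0": {"children": {
        "wall_0": {"field_data": {"color": (0.8, 0.8, 0.7)}},
        "roof": {"field_data": {"texture": "roof.png"}},
    }},
    "bridge_0": {"children": {
        "deck": {"field_data": {"color": (0.5, 0.5, 0.5)}},
    }},
}
jobs = list(leaf_selectors(city))
```

Each entry in `jobs` pairs a selector with the color or texture to apply, which matches the one-extract-per-leaf pattern of the old script.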

@utkarsh.ayachit I have 31 C++ examples using vtkMultiBlockDataSet, many of these also use vtkMultiBlockPLOT3DReader, would I be correct in using a vtkPartitionedDataSetCollection instead? Also what replaces vtkMultiBlockPLOT3DReader?
If you have time to look at:

giving me some pointers, I’ll have a go at updating these examples!

Ideally, yes. But reality is always a little more clunky :).

So I think what’s going to happen is we’re going to slowly deprecate MB. There’s a lot of code that relies on MB and changing all of that in one go is simply not possible.

vtkMultiBlockPLOT3DReader indeed needs to be converted first. Then the examples can be ported over.

Ok

I am ready to convert a few private reader plugins which use vtkMultiBlockDataSet. My first idea was to re-use my prototypes based on ProgrammableSources. However, the current version on the master branch does not have an option to set its Output Data Set Type to “vtkPartitionedDataSet”. Could this be added before the pv5.10 branch? TIA

Definitely. I’ve reported an issue here: https://gitlab.kitware.com/paraview/paraview/-/issues/20923