Deprecating MultiBlock datasets

This sounds frightening, but sounds like the only way, and a very logical way, to move forward. I’m convinced we have got to move forward from the old Exodus reader. That thing is going to sink some day soon from all of the barnacles surrounding it.

Is this a big enough change in the code to justify calling this ParaView 6.0? Maybe include the update to the toolbars at the top of ParaView?

They don’t need to be updated in the first pass, but ideally should be. I can definitely help with that. Just looking at the two examples, the changes should be fairly minimal once we have the reader converted over to producing vtkPartitionedDataSetCollection, which itself is quite easy here too.

I am not sure. Let’s wait and see.


Thanks for the offer. It is much appreciated. In my VTK build, VTK_LEGACY_REMOVE is ON, so if I miss the MR, I should see it when it happens!

This is a bold move, but it seems fair. Is vtkDataAssembly designed to replace the Subset Inclusion Lattice? Can multiple vtkDataAssembly instances be used over a partitioned dataset collection?


Presumably this is not a large problem for our generators within OpenFOAM: they generate multiblock for topologically different items and multipiece for handling ranks, which likely map OK.
But what becomes of the .vtm format?

/mark

To some extent.

Based on our earlier discussion about the selection mechanisms offered by readers, the SubsetInclusionLattice used for selecting which blocks to read should simply be removed. Readers can offer simpler, format-specific selection, all of which can use vtkDataArraySelection instances (which should probably be renamed, since they can be used for things other than array selection).
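To make “format-specific simpler selection” concrete, here is a minimal C++ sketch of a flat name-to-enabled table that a reader could consult before loading each block. `NameSelection` and `blocksToRead` are hypothetical names and the sketch does not use VTK; VTK’s vtkDataArraySelection provides the analogous EnableArray/DisableArray/ArrayIsEnabled calls for arrays.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical flat selection table, similar in spirit to vtkDataArraySelection
// but keyed by block name instead of array name.
class NameSelection {
 public:
  void Enable(const std::string& name) { state_[name] = true; }
  void Disable(const std::string& name) { state_[name] = false; }

  // Names that were never toggled fall back to `defaultEnabled`.
  bool IsEnabled(const std::string& name, bool defaultEnabled = true) const {
    const auto it = state_.find(name);
    return it == state_.end() ? defaultEnabled : it->second;
  }

 private:
  std::map<std::string, bool> state_;
};

// A reader would consult the table before loading each block the file offers.
std::vector<std::string> blocksToRead(const std::vector<std::string>& available,
                                      const NameSelection& selection) {
  std::vector<std::string> result;
  for (const std::string& name : available) {
    if (selection.IsEnabled(name)) result.push_back(name);
  }
  return result;
}
```

Because the table is flat, the reader alone decides what a “block name” means for its format; no lattice of subsets has to be kept in sync with the file.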

The other conceptualized use-case, not implemented yet, was to allow setting block parameters or selections (in the case of filters like Extract Block) using the SubsetInclusionLattice. That would indeed be replaced by vtkDataAssembly.

That was indeed a use-case we were considering. Currently, it doesn’t; only one vtkDataAssembly is supported. However, it should be possible to support that in the future; it just needs a little more thinking about the ramifications.

Indeed. Should be a trivial change.

There’s .vtpc, which replaces .vtm. However, once MultiBlock Dataset is totally removed, we can make the .vtm reader produce a PartitionedDataSetCollection + DataAssembly instead (similar to the vtkDataObjectToPartitionedDataSetCollection filter under development).

Will this have any effect on VTK? Will vtkMultiBlockDataSet be deprecated upstream?

Yes. However, I suspect the dataset type itself can stick around longer, especially if we have two-way conversion filters.


A small demo of the conversion filters in action:

Here’s the current output from a CGNS file. The multiblock looks as follows:


On converting this to vtkPartitionedDataSetCollection using the vtkDataObjectToPartitionedDataSetCollection filter, it looks like this:


The data-assembly captures the relationships as follows:

Thus, filters that care about these relationships have all the information they need via the DataAssembly.
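The conversion can be pictured as flattening a tree: every leaf dataset goes into one flat list, and the tree structure survives only as an assembly whose nodes hold indices into that list. The sketch below models this with plain C++ structs. `Block`, `AssemblyNode`, `Collection` and `toCollection` are hypothetical stand-ins, not the VTK classes, and the real vtkDataObjectToPartitionedDataSetCollection filter may handle cases this sketch omits (for example multipiece datasets).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical multiblock stand-in: interior nodes only group children,
// leaves carry a dataset (represented here by a string label).
struct Block {
  std::string name;
  std::vector<Block> children;  // empty => leaf
  std::string data;             // used only by leaves
};

// Hypothetical assembly-node stand-in: it stores a name and indices into the
// flat dataset list, never the data itself.
struct AssemblyNode {
  std::string name;
  std::vector<unsigned int> datasetIndices;
  std::vector<AssemblyNode> children;
};

// Hypothetical collection stand-in: a flat list of datasets plus one assembly.
struct Collection {
  std::vector<std::string> datasets;
  AssemblyNode assembly;
};

// Leaves are appended to the flat list; the tree shape is mirrored in the
// assembly, and each leaf's node records where its data ended up.
AssemblyNode flatten(const Block& block, std::vector<std::string>& datasets) {
  AssemblyNode node{block.name, {}, {}};
  if (block.children.empty()) {
    node.datasetIndices.push_back(static_cast<unsigned int>(datasets.size()));
    datasets.push_back(block.data);
    return node;
  }
  for (const Block& child : block.children) {
    node.children.push_back(flatten(child, datasets));
  }
  return node;
}

Collection toCollection(const Block& root) {
  Collection result;
  result.assembly = flatten(root, result.datasets);
  return result;
}
```

The point of the split is that the datasets no longer depend on the tree: a filter that only needs data iterates the flat list, while one that cares about relationships walks the assembly.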

Now, by applying vtkPartitionedDataSetCollectionToMultiBlockDataSet to convert this partitioned-dataset collection back to a multiblock, we get the following:

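Going the other way, one plausible mapping is sketched below: an assembly node that refers to exactly one dataset and has no child nodes becomes a leaf, and any other node becomes a group. This rule is an assumption made for illustration; the actual vtkPartitionedDataSetCollectionToMultiBlockDataSet filter may choose differently, and all names here are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for a multiblock tree and an assembly node.
struct Block {
  std::string name;
  std::vector<Block> children;  // empty => leaf
  std::string data;             // used only by leaves
};

struct AssemblyNode {
  std::string name;
  std::vector<unsigned int> datasetIndices;
  std::vector<AssemblyNode> children;
};

// Rebuilds a tree from an assembly plus the flat dataset list it indexes.
// Assumed rule: a childless node with exactly one dataset becomes a leaf;
// every other node becomes a group holding its child nodes followed by one
// leaf per dataset it references directly.
Block toBlocks(const AssemblyNode& node, const std::vector<std::string>& datasets) {
  Block block{node.name, {}, ""};
  if (node.children.empty() && node.datasetIndices.size() == 1) {
    block.data = datasets.at(node.datasetIndices[0]);
    return block;
  }
  for (const AssemblyNode& child : node.children) {
    block.children.push_back(toBlocks(child, datasets));
  }
  for (unsigned int index : node.datasetIndices) {
    block.children.push_back(Block{node.name, {}, datasets.at(index)});
  }
  return block;
}
```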


In the case of CGNS, we can have /Base/blk1/Internal and /Base2/blk1/Internal in the same file, so block names are not unique without the full path. I suspect the same can happen with vtm files.
Is that an issue?

Very nice.

Not at all. Names in the Structure are just helpful hints and need not be unique. The assembly would indeed have the full hierarchy. Users will be using the assembly to set color and other parameters in the Multiblock Inspector or in filters like Extract Block, so there won’t be any ambiguity.
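A small self-contained illustration of why duplicate names are harmless: looking up `Internal` by name alone is ambiguous, while the full path through the assembly singles out one node. The struct and helper names are hypothetical; a real vtkDataAssembly identifies nodes by integer id, so even the path is only a convenience here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical assembly node: only a name and children matter here.
struct AssemblyNode {
  std::string name;
  std::vector<AssemblyNode> children;
};

// Appends the slash-separated path of every node below `node`.
void collectPaths(const AssemblyNode& node, const std::string& prefix,
                  std::vector<std::string>& out) {
  for (const AssemblyNode& child : node.children) {
    const std::string path = prefix + "/" + child.name;
    out.push_back(path);
    collectPaths(child, path, out);
  }
}

// Number of nodes below `node` whose own name equals `name`.
std::size_t countByName(const AssemblyNode& node, const std::string& name) {
  std::size_t count = 0;
  for (const AssemblyNode& child : node.children) {
    if (child.name == name) ++count;
    count += countByName(child, name);
  }
  return count;
}

// Number of nodes below `root` whose full path equals `path`.
std::size_t countByPath(const AssemblyNode& root, const std::string& path) {
  std::vector<std::string> paths;
  collectPaths(root, "", paths);
  return static_cast<std::size_t>(std::count(paths.begin(), paths.end(), path));
}
```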

The parts of ParaView that rely on backwards compatibility and could potentially be affected are:

  1. Catalyst adaptors
  2. State files
  3. Python scripts
  4. Programmable filters (and later editions)

Users work with these regularly and have invested time into making them work, so I’m concerned that they may no longer be backwards compatible.

I do like the idea of improving on the multiblock dataset, as there are quite a few issues with using it. Starting over may be what’s needed.

Indeed. I suspect it will be a mix of automatic conversion and migration docs to handle backwards compatibility. By making sure this effort pays dividends in added features and simplicity, we can hopefully ease the pain.

I worry about backward compatibility for custom plugins as well, as I extensively use vtkMultiBlockDataSetAlgorithm and the MultiBlock Inspector in my own custom ParaView plugins (C++). Once MultiBlocks are deprecated, it would be very convenient to have clear instructions on how to switch to the new classes (the mentioned vtkPartitionedDataSet, vtkPartitionedDataSetCollection or vtkDataAssembly). If this could be provided, the transition might not be so painful; otherwise, those who heavily use MultiBlockDataSets, including myself, might remain “stuck” at older ParaView versions for some time.

Note that there are filters ParaView will use internally to convert MultiblockDataSet to the new classes for at least another version or so. So in the first pass, your plugin should not be affected at all.

Indeed, an extensive guide for transitioning is a requirement for this work.


This looks very good, great job.

Is there a reason why a vtkPartitionedDataSetCollection cannot have more than one vtkDataAssembly attached to it?

I’m thinking it would be very useful if users are allowed to have multiple representations of the same data without the need to make multiple vtkPartitionedDataSetCollection instances. It would also give users a quick way to know how many assemblies a particular vtkPartitionedDataSetCollection has and quick way to access them.

To complement this, we can allow users to add annotations to the assemblies to provide a description or purpose for each one, so that searching for a particular assembly is easy. Also, since we will have multiple assemblies, we should be able to set one of them as the default assembly. That would let the vtkPartitionedDataSetCollection behave as if it has only one assembly in use cases where users don’t care about the other assemblies the data has. Essentially, it would behave as it does currently.

In summary the changes would look something like this:

vtkDataAssembly

// Provide a way to add annotations to assemblies
void SetAnnotation(const char* key, const char* value);

vtkPartitionedDataSetCollection

// Allow users to add more than one data assembly
void AddDataAssembly(vtkDataAssembly* assembly);

// SetDataAssembly would become SetDefaultDataAssembly
void SetDefaultDataAssembly(vtkDataAssembly* assembly);

// GetDataAssembly would become GetDefaultDataAssembly
vtkDataAssembly* GetDefaultDataAssembly();

// Provide a way to grab certain assemblies using a map of annotation key/value pairs;
// passing in an empty map can indicate that the user wants all the assemblies
std::set<vtkDataAssembly*> GetDataAssemblies(const std::map<std::string, std::string>& annotations, bool match_all);
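A minimal, VTK-free sketch of how the proposed multi-assembly API could behave, including the annotation query with `match_all`. The classes are hypothetical stand-ins for vtkDataAssembly and vtkPartitionedDataSetCollection. Two further assumptions not in the proposal: the first added assembly becomes the default, and `SetDefaultDataAssembly` only points at an assembly that was already added.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Annotations = std::map<std::string, std::string>;

// Hypothetical stand-in for vtkDataAssembly carrying the proposed annotations.
struct Assembly {
  std::string name;
  Annotations annotations;
  void SetAnnotation(const std::string& key, const std::string& value) {
    annotations[key] = value;
  }
};

// Hypothetical stand-in for a vtkPartitionedDataSetCollection that owns
// several assemblies and remembers which one is the default.
class Collection {
 public:
  Assembly* AddDataAssembly(const std::string& name) {
    auto assembly = std::make_unique<Assembly>();
    assembly->name = name;
    assemblies_.push_back(std::move(assembly));
    Assembly* added = assemblies_.back().get();
    if (default_ == nullptr) default_ = added;  // assumption: first one is the default
    return added;
  }

  // Assumption: `assembly` was previously returned by AddDataAssembly.
  void SetDefaultDataAssembly(Assembly* assembly) {
    assert(assembly != nullptr);
    default_ = assembly;
  }
  Assembly* GetDefaultDataAssembly() const { return default_; }

  // Empty query: every assembly. Otherwise match_all=true requires every
  // key/value pair to be present, match_all=false requires at least one.
  std::set<Assembly*> GetDataAssemblies(const Annotations& query, bool match_all) const {
    std::set<Assembly*> result;
    for (const auto& assembly : assemblies_) {
      std::size_t hits = 0;
      for (const auto& [key, value] : query) {
        const auto it = assembly->annotations.find(key);
        if (it != assembly->annotations.end() && it->second == value) ++hits;
      }
      if (query.empty() || (match_all ? hits == query.size() : hits > 0)) {
        result.insert(assembly.get());
      }
    }
    return result;
  }

 private:
  std::vector<std::unique_ptr<Assembly>> assemblies_;
  Assembly* default_ = nullptr;
};
```

Code that never asks for anything but the default keeps working exactly as with a single assembly, which is the backward-compatible behavior described above.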


That had indeed crossed my mind. No particular reason, honestly. Figured we could add support for it when we have a compelling use-case. It does add a bit of complexity to the UI.

Another question is whether the ‘default assembly’ is a parameter on the vtkPartitionedDataSetCollection (as you have done in the API) or on a filter. Similar to how vtkAlgorithm::SetInputArrayToProcess lets the user choose which array to operate on, we could have SetInputDataAssemblyToProcess to choose which assembly to operate on. Or it could be a combination of both, i.e. choose a default on the PDC and then let filters override which one they operate on.
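The “combination of both” option can be expressed as a small resolution rule: a filter-level choice wins when present, otherwise the collection’s default is used, and an unknown name is reported rather than silently ignored. `resolveAssembly` is hypothetical, and SetInputDataAssemblyToProcess is only a name proposed in this thread, not an existing VTK method.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Picks the assembly a filter should operate on (hypothetical rule):
// a filter-level choice overrides the collection's default; a name the
// collection does not have yields std::nullopt so the filter can report it.
std::optional<std::string> resolveAssembly(const std::set<std::string>& available,
                                           const std::string& collectionDefault,
                                           const std::optional<std::string>& filterChoice) {
  const std::string& wanted = filterChoice ? *filterChoice : collectionDefault;
  if (available.count(wanted) == 0) return std::nullopt;
  return wanted;
}
```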

I don’t know how compelling my use-case is but I do see this addition as a way for the VTK Session in SMTK to take greater advantage of the Graph Resource. Conceptually the Graph Resource is very similar to vtkDataAssembly, the only difference is that it has the option to hold data local to it as well as reference data from an external dataset.

Model Resource currently acts as the SMTK wrapper for vtkMultiBlockDataSet (in the VTK session at least), so the data and the assembly of the data are coupled. However, when vtkMultiBlockDataSet gets deprecated, we can have Model Resource (or some new type of Resource) act as the SMTK wrapper for vtkPartitionedDataSetCollection while Graph Resource acts as the SMTK wrapper for vtkDataAssembly.

Allowing multiple assemblies will then become very useful, because in SMTK we are dealing with CAE workflows. Many workflows require users to take a single set of input data and move it through workflow phases that change its assembly in different ways. For example, we build Assembly 1 and present it to the user as their new representation, then collect some more settings and build Assembly 2. Sometimes the user just wants to view the input data using a different assembly structure without necessarily moving it through any workflow phase; an example would be building Assembly 1 and Assembly 2 at the same time.

In any case, I think allowing all the different assemblies to be connected to each other (through the vtkPartitionedDataSetCollection that they assemble) would remove a lot of complexity on the SMTK side of things. It would reduce the need to copy input data and make getting from one assembly to another very easy.

Maybe in the future we can even reconcile the differences between Graph Resource and vtkDataAssembly by allowing vtkDataAssemblies to ‘override’ or provide a local copy of certain vtkPartitionedDataSets in the vtkPartitionedDataSetCollection they are set on. In addition, a vtkDataAssembly could provide vtkPartitionedDataSets that add to or augment the linked vtkPartitionedDataSetCollection when that particular vtkDataAssembly is used.

I like this approach a lot. It introduces a lot of flexibility.


Just to update everyone, we are continuing to fill in capability gaps to support vtkPartitionedDataSetCollection (PDC) in ParaView/VTK. While doing that, a possibly cleaner solution to bridge the gap to a multiblock-free world has come to light. Here are the details:

  1. Note, we do want to ultimately get rid of vtkMultiBlockDataSet (MB); however, for now we don’t have to explicitly flag it as deprecated. We can let it continue to exist together with PDC.
  2. vtkDataAssembly will support attributes on nodes in the assembly. This is not much different from attributes on XML nodes. It also enables a richer selection/querying capability beyond the basic path-based queries originally proposed. Let’s call these selectors.
  3. We define a new concept of a “hierarchy”. A hierarchy is simply the internal organization of a vtkCompositeDataSet (CD) represented as a vtkDataAssembly. For any CD subclass, VTK will provide utility functions to create a hierarchy (my current prototype implementation puts these in a vtkDataAssemblyUtilities class). This hierarchy will have attributes on nodes to help map concepts such as composite index. By adding a simple cid attribute to all nodes, we can support selectors that directly use the composite index to select nodes.
  4. PDC will support multiple named assemblies, so the hierarchy becomes just another named assembly associated with a PDC. Other CDs won’t support assemblies except the hierarchy, which is automatically generated when needed.
  5. ParaView exposes a small subset of filters/representations that let the user choose a composite index to subset an input CD, e.g. Extract Block, the spreadsheet representation, the volume representation, etc. All these cases will now use a selector instead. Thus, any proxy with an int-property for a CompositeDataSetIndex is replaced with a string-property for a selector. These filters will also expose UI to choose which named assembly to use; the default will be the hierarchy. For all CDs, the hierarchy is available, and a PDC can have other assemblies too. This allows us to develop filters like ExtractBlock that use selectors but also work with MBs and other CDs without any issues.
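The hierarchy idea in points 3 and 5 can be sketched as follows: number the nodes of a composite tree in pre-order so that each carries a `cid` attribute, then resolve selectors against that tree. This assumes composite indices follow the familiar pre-order flat-index scheme (root is 0, interior nodes also consume an index) and uses an XPath-like `//*[@cid='N']` form plus plain name paths as the selector syntax. The exact syntax ParaView/VTK adopt may differ, and all names here are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Input shape: names only, standing in for any composite dataset tree.
struct Shape {
  std::string name;
  std::vector<Shape> children;
};

// Hypothetical hierarchy node carrying a `cid` (composite index) attribute.
struct HierarchyNode {
  std::string name;
  unsigned int cid = 0;
  std::vector<HierarchyNode> children;
};

// Assumed numbering: pre-order, root is 0, interior nodes also get an index.
HierarchyNode buildHierarchy(const Shape& shape, unsigned int& next) {
  HierarchyNode node;
  node.name = shape.name;
  node.cid = next++;
  for (const Shape& child : shape.children) {
    node.children.push_back(buildHierarchy(child, next));
  }
  return node;
}

namespace {
void collect(const HierarchyNode& node, const std::string& path, const std::string& selector,
             bool byCid, unsigned long cid, std::vector<unsigned int>& hits) {
  if (byCid ? node.cid == cid : path == selector) hits.push_back(node.cid);
  for (const HierarchyNode& child : node.children) {
    collect(child, path + "/" + child.name, selector, byCid, cid, hits);
  }
}
}  // namespace

// Returns the cids of matching nodes. Two assumed selector forms:
//   "//*[@cid='N']"  selects the node whose cid attribute is N
//   "/A/B"           selects the node at that name path below the root
std::vector<unsigned int> selectNodes(const HierarchyNode& root, const std::string& selector) {
  const std::string prefix = "//*[@cid='";
  const std::string suffix = "']";
  const bool byCid = selector.size() > prefix.size() + suffix.size() &&
                     selector.compare(0, prefix.size(), prefix) == 0 &&
                     selector.compare(selector.size() - suffix.size(), suffix.size(), suffix) == 0;
  unsigned long cid = 0;
  if (byCid) {
    cid = std::stoul(selector.substr(prefix.size(), selector.size() - prefix.size() - suffix.size()));
  }
  std::vector<unsigned int> hits;
  collect(root, "", selector, byCid, cid, hits);
  return hits;
}
```

Because every composite dataset can produce such a hierarchy on demand, the same selector-based code path serves MBs and PDCs alike.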

Now, suddenly, things like backwards compatibility of Python scripts and pvsm state files, or external plugins that use multiblocks, become a non-issue. When loading old states or scripts, ParaView can automatically convert a composite id to a selector expression that produces a similar result. Development-wise, too, this becomes less daunting: it’s now just a grep for CompositeIndex in the XMLs and updating those to use selectors instead, a more manageable task.
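Under this scheme, the state-file migration described here amounts to rewriting each stored composite index as a selector string. A minimal sketch, assuming a hypothetical XPath-like `//*[@cid='N']` selector that matches a hierarchy node by its cid attribute; the expression ParaView actually generates may differ, and both function names are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical selector form that matches a hierarchy node by its cid attribute.
std::string selectorForCompositeIndex(unsigned int compositeIndex) {
  return "//*[@cid='" + std::to_string(compositeIndex) + "']";
}

// Rewrites the values of an old int-valued CompositeDataSetIndex property into
// the string values a selector property would hold, preserving order.
std::vector<std::string> migrateCompositeIndices(const std::vector<unsigned int>& indices) {
  std::vector<std::string> selectors;
  selectors.reserve(indices.size());
  for (unsigned int index : indices) {
    selectors.push_back(selectorForCompositeIndex(index));
  }
  return selectors;
}
```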
