Join Tables Filter

Hello to the ParaView discourse community!

In the context of improving Table data processing and manipulation in ParaView, a few features which I have been contributing to have recently been implemented. These contributions include :

Following the direction of these changes, I would like to improve the handling of data coming from different sources. Let’s say the user has two different Tables containing different data attributes :

We would like to “merge” these two tables into one bigger table with more columns and just as many rows, like an SQL JOIN operation :

There exist very few reliable ways to do this in ParaView, especially if the rows don’t follow the same order in the two input tables. ParaView offers the generic Append Attributes Filter which almost does the job, but only works in the very particular case where both input tables have the same number of rows, and fails to realize that the rows have been ordered differently.

VTK also has a vtkMergeTables class but it has nothing to do with the JOIN operation that we are trying to achieve here, since it just concatenates both input tables “on top” of each other.

Filter idea :

The idea is to create a filter that allows the user to combine data coming from two different tables based on a “key” column, just like a JOIN operation in SQL. This filter would take exactly two input Tables and output a single Table.

Note that the python library pandas already has an SQL style JOIN operation implemented, the list of arguments exposed by this function is quite extensive. For a ParaView implementation, one can assume that not all of them would make sense.

The parameters of this filter would include :

  • Column name to base the merge operation on. For the merge to make sense, the two tables must have at least a column in common. In ParaView, this parameter can be implemented in the form of a combo box.
  • Join method : Combo box {“inner”, “full”, “left”, “right”} specifying the join type. See this doc for reference.
  • Dealing with undefined values : let the user decide whether to impute an arbitrary value, or leave the cell empty.

What are your thoughts on this proposal?

Would it make more sense to modify and improve an already existing filter (ie. Append Attributes), or to start from scratch and create a completely new filter?

Do you think of other options or parameters that would make sense to be implemented?

Are there edge cases that could expose issues?

Thanks in advance for your help!

3 Likes

Very good idea !

This merge or join operation only make sense in a vtkTable environnement, you can’t append attributes from one dataset into another if they do not have the same geometry as you would also have to merge/join the geometries, which is completelly out of scope.

Then this is a new filter, specific to tables. Since MergeTables already exists, I suppose JoinTables makes sense ?

Column name to base the merge operation on

ParaView basically would rely on the ID here. So the user would be able to select one column in the first input and one column in the second input to do the merge ?

Join method : Combo box {“inner”, “full”, “left”, “right”}

I’m not a SQL person, could you give some details ?

Dealing with undefined values : let the user decide whether to impute an arbitrary value, or leave the cell empty.

Empty is not a thing, NaN or arbitrary value could be fine

Do you think of other options or parameters that would make sense to be implemented?
Are there edge cases that could expose issues?

When you compare the content of the two columns for the merge, many edge cases can appears:

  • Column of different types( eg: 1.0f == 1.0d or 1.0f == 1i ?)
  • Column of different way to represent same values (eg: 1.0 == “1.0” ?)
  • Column with numerical imprecisions (1.0000000000000001 == 1.0 ?)

A solution would be to specify the multi type comparison is not supported, add a tolerance parameter for floating point comparison, or even restrict to String/vtkIdType. In any case, this should be considered.

@mwesphal: For the join methods from SQL:

We should probably pick other names to match Scientific Visualization world : “difference”, “union”, etc.

This sounds the most reasonable solution.

1 Like