Hello to the ParaView discourse community!
In the context of improving Table data processing and manipulation in ParaView, a few features which I have been contributing to have recently been implemented. These contributions include :
- The implementation of the Threshold Table Filter
- Table support for the Python Calculator.
Following the direction of these changes, I would like to improve the handling of data coming from different sources. Let’s say the user has two different Tables containing different data attributes :
We would like to “merge” these two tables into one bigger table with more columns and just as many rows, like an SQL JOIN operation :
There exist very few reliable ways to do this in ParaView, especially if the rows don’t follow the same order in the two input tables. ParaView offers the generic Append Attributes Filter which almost does the job, but only works in the very particular case where both input tables have the same number of rows, and fails to realize that the rows have been ordered differently.
VTK also has a vtkMergeTables class but it has nothing to do with the JOIN operation that we are trying to achieve here, since it just concatenates both input tables “on top” of each other.
Filter idea :
The idea is to create a filter that allows the user to combine data coming from two different tables based on a “key” column, just like a JOIN operation in SQL. This filter would take exactly two input Tables and output a single Table.
Note that the python library pandas already has an SQL style JOIN operation implemented, the list of arguments exposed by this function is quite extensive. For a ParaView implementation, one can assume that not all of them would make sense.
The parameters of this filter would include :
- Column name to base the merge operation on. For the merge to make sense, the two tables must have at least a column in common. In ParaView, this parameter can be implemented in the form of a combo box.
- Join method : Combo box {“inner”, “full”, “left”, “right”} specifying the join type. See this doc for reference.
- Dealing with undefined values : let the user decide whether to impute an arbitrary value, or leave the cell empty.
What are your thoughts on this proposal?
Would it make more sense to modify and improve an already existing filter (ie. Append Attributes), or to start from scratch and create a completely new filter?
Do you think of other options or parameters that would make sense to be implemented?
Are there edge cases that could expose issues?
Thanks in advance for your help!