And a few instances in the source code of ParaView. I think we should define this term precisely.
Here is a suggestion:
"A parallel aware filter or reader is a filter or reader containing dedicated code that makes it work correctly in parallel, and that would not work correctly without it. Regarding filters, the stream tracer is a parallel aware filter, as it would not work without dedicated code to transfer particles from one domain to the next. Regarding readers, every unstructured data reader needs to be parallel aware to output distributed data, e.g. the Exodus reader."
If we consider that we have an agreement here, then we need to list which readers are parallel aware and which aren’t.
It should be standard in the documentation of each unstructured data reader to specify whether it is parallel aware or not. Ideally, this information should also be available as a nice list in the doc.
That leads us to my final point: regarding readers, being parallel aware is not enough.
Some readers are indeed parallel aware, but still require the whole dataset to be read from disk in order to perform the distribution.
eg:
PVTP readers on each node do not need to read the whole dataset, only the part that each node needs
EnSight readers on each node need to read the whole dataset before distributing
This is an important distinction, and users need to know about it in order to choose a file format to work with.
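To make the difference concrete, here is a rough sketch of how much each node reads from disk under the two strategies. It is an idealised model only: the function name and the equal-share split are my own assumptions, and it ignores metadata, caching and any reader-specific behavior.

```python
def bytes_read_per_rank(dataset_bytes, num_ranks, reads_whole_dataset):
    """Return the bytes each rank reads from disk (hypothetical, idealised model)."""
    if reads_whole_dataset:
        # Every rank loads the full dataset, then keeps only its share.
        return [dataset_bytes] * num_ranks
    # Each rank loads only its own, roughly equal, share of the dataset.
    base, extra = divmod(dataset_bytes, num_ranks)
    return [base + (1 if rank < extra else 0) for rank in range(num_ranks)]


if __name__ == "__main__":
    whole = bytes_read_per_rank(1000, 4, reads_whole_dataset=True)
    partial = bytes_read_per_rank(1000, 4, reads_whole_dataset=False)
    print("read whole dataset on every rank:", whole, "total", sum(whole))
    print("read only the needed part:", partial, "total", sum(partial))
```

In this model the total I/O grows with the number of nodes in the first case and stays constant in the second, which is why the distinction matters when picking a format.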
The PVTP reader (as well as the PVTU reader) is only as parallel/distributed as the number of files the dataset was partitioned into when it was written out. In general, more files are written out than pvservers will be used when reading the data back in, but if there’s only one VTP or VTU file pointed to by the meta file, then the data will only be read in by a single pvserver process, no matter how many MPI pvserver processes are created.
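A minimal sketch of that limit, assuming a simple round-robin assignment of piece files to processes. The round-robin rule and the function names are hypothetical and only for illustration; I am not claiming this is how the actual readers assign pieces.

```python
def assign_pieces(num_pieces, num_ranks):
    """Assign piece indices to ranks round-robin (illustrative assumption)."""
    assignment = [[] for _ in range(num_ranks)]
    for piece in range(num_pieces):
        assignment[piece % num_ranks].append(piece)
    return assignment


def active_ranks(num_pieces, num_ranks):
    """Count the ranks that end up with at least one piece to read."""
    return sum(1 for pieces in assign_pieces(num_pieces, num_ranks) if pieces)


if __name__ == "__main__":
    # One piece file, four pvserver processes: only one process has work.
    print(assign_pieces(1, 4), active_ranks(1, 4))
    # Eight piece files, four processes: every process reads two pieces.
    print(assign_pieces(8, 4), active_ranks(8, 4))
```

Whatever the real assignment rule is, the number of processes that actually read data cannot exceed the number of piece files, which is the point above.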
Should ghost information be included in this discussion? Some readers produce it automatically while others do not. Without knowing which ones do and which ones don’t, knowing how to make a pipeline work correctly in parallel requires a significant amount of knowledge. For example, PVTU reader and then Point Data to Cell Data filter won’t work correctly in parallel while PVTI reader and then Point Data to Cell Data filter will work correctly in parallel.
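To illustrate why ghost information matters, here is a toy sketch in plain Python, not ParaView code. As an assumption for the example, I use a filter that averages the cells touching each point of a 1D line of cells, since such a filter depends on neighbouring cells; the function name and the data are made up.

```python
def cells_to_points(cell_values):
    """Average the cells touching each point of a 1D line of cells."""
    n = len(cell_values)
    points = []
    for p in range(n + 1):
        # Point p touches cell p-1 and cell p, when they exist.
        touching = cell_values[max(p - 1, 0):min(p + 1, n)]
        points.append(sum(touching) / len(touching))
    return points


if __name__ == "__main__":
    cells = [1.0, 2.0, 3.0, 4.0]
    print("serial:", cells_to_points(cells))

    # Two partitions without ghost cells: the shared point gets two different values.
    print("no ghosts:", cells_to_points(cells[:2]), cells_to_points(cells[2:]))

    # Two partitions with one ghost cell each; the point owned only by the
    # ghost cell is dropped afterwards.
    left = cells_to_points(cells[:3])[:-1]
    right = cells_to_points(cells[1:])[1:]
    print("with ghosts:", left, right)
```

Without ghost cells the two partitions disagree on the value at the shared point, and neither matches the serial result. With one layer of ghost cells, both partitions reproduce the serial value there.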
I believe this is true for all unstructured grid readers / file formats.
I am not sure this is reader-implementation specific, and hence it is probably not related here. Any reader will provide ghost information if present. We’re also working on streamlining ghost-cell generation, so eventually the ParaView pipeline should be able to provide/compute ghost cells when needed, with ease, independently. cc: @Yohann_Bearzi
I thought that the discussion here was about informing PV users about the behavior of parallel/distributed aware readers. I think most advanced users realize that readers that output structured dataset formats will get different behavior with regard to cell distribution and ghost information than non-structured dataset formats. Informing newer users of this subtlety would be a nice thing. If that’s not the goal of this discussion, then never mind.