I ran the case in our solver on 4 MPI ranks, so there are 4 partitions in this dataset. When I read the pvtu file in ParaView, I get a single unstructured grid. However, when I use the Extract Surface filter, the partition boundaries are included in the boundary surfaces. How can I make those disappear? I tried using the Ghost Cells filter, but as far as I can tell, the output is the same as the input. The output has a vtkGhostType array, but it’s all zeros (i.e. not ghosts). This is true whether I run ParaView in parallel (with 4 processes) or not. The image is what I see with a pipeline like Load pvtu file → Extract Surface → Clip (Z=0).
All those blocky surfaces are the partition boundaries. Is there a different filter I should be using? Should we be writing our data out from the solver in a different way?
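For concreteness, the pipeline above in pvpython terms is roughly the following (a sketch; the file name and the Z=0 clip plane are just what I'm using here):

```python
# Sketch of the pipeline described above, using paraview.simple (pvpython).
from paraview.simple import *

# Read the partitioned unstructured grid written by the solver.
reader = OpenDataFile('demo.pvtu')

# Extract the outer surface; the partition boundaries show up here too.
surface = ExtractSurface(Input=reader)

# Clip at Z = 0 to look inside.
clip = Clip(Input=surface)
clip.ClipType = 'Plane'
clip.ClipType.Origin = [0.0, 0.0, 0.0]
clip.ClipType.Normal = [0.0, 0.0, 1.0]

Show(clip)
Render()
```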
One option is to load the underlying vtu files individually, then apply Group Datasets followed by Merge Blocks. This successfully removes the partition boundaries, but it's pretty slow. This is what I see with Load individual vtu files → Group Datasets → Merge Blocks → Extract Surface → Clip.
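A rough pvpython version of that workaround, assuming the per-rank pieces are named demo_0.vtu through demo_3.vtu (adjust to your actual file names):

```python
# Sketch of the slow workaround: load the per-rank vtu files individually,
# group them, and merge into a single unstructured grid.
from paraview.simple import *

# Assumed file names for the four partitions.
pieces = [OpenDataFile('demo_%d.vtu' % rank) for rank in range(4)]

grouped = GroupDatasets(Input=pieces)   # one multiblock with 4 blocks
merged = MergeBlocks(Input=grouped)     # single grid; I believe coincident
                                        # points are merged by default, which
                                        # is what removes the internal faces

surface = ExtractSurface(Input=merged)
Show(surface)
Render()
```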
The pipeline in the screenshot above is another one that should work as a workaround.
I think the issue we aren't handling gracefully is unstructured grids that are defined by multiple pieces. Typically, each piece would live on a different rank of a distributed pvserver, and in that scenario you can use the Ghost Cells filter to generate ghost cells; with those, surface extraction is skipped at rank boundaries, so the internal surfaces aren't extracted. But in built-in server mode, or when running a distributed pvserver on fewer ranks than there are pieces, there will be piece boundaries that we don't process nicely without the workarounds discussed here.
Let me make sure I understand your comment. You’re saying that if the data is held on a distributed pvserver, the Ghost Cells filter should work correctly? I have tried starting pvserver (on my workstation) with mpiexec -n 4 pvserver. In the terminal, I see a message saying “Accepting connection(s): my.host.name:11111”. In the GUI (running on the same workstation), I connect to cs://localhost:11111, which has Startup Type: Manual. Is that what you mean by a distributed pvserver?
After connecting, I do: demo.pvtu → Ghost Cells, but I don’t get any ghost cells, so subsequent pipeline steps still have the partition boundaries.
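In script form, what I'm trying is roughly this (a sketch; I'm assuming the filter is exposed to Python as GhostCells, though it may be GhostCellsGenerator in older ParaView versions):

```python
from paraview.simple import *

# Connect to the MPI-parallel pvserver started with: mpiexec -n 4 pvserver
Connect('localhost', 11111)

reader = OpenDataFile('demo.pvtu')

# Ask for ghost cells; in my case the vtkGhostType array stays all zeros.
ghosts = GhostCells(Input=reader)

surface = ExtractSurface(Input=ghosts)
Show(surface)
Render()
```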
After a bit more experimentation, I can say that the Ghost Cells filter works if I select the Generate Global Ids option and run a pvserver with as many processes as I have partitions. Then I get ghost cells and Extract Surface ignores the partition boundaries. Why do I have to generate global IDs? The VTK documentation says "If there are no point global ids in the input, the 3D position of the points is used to determine if grids match, up to floating point precision in the coordinates." I'm pretty sure the point coordinates in my data match exactly across partitions.
Needing to match the number of pvserver processes to the number of partitions is an unfortunate limitation, but it looks like I can work around that by inserting a Clean to Grid filter:
pvtu file → Clean to Grid → Ghost Cells (Generate Global Ids) → Extract Surface
Clean to Grid merges the partitions owned by each process, removing the boundaries between them; the Ghost Cells filter can then hide the boundaries between processes.
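In script form, the workaround looks roughly like this (a sketch; I'm assuming the Generate Global Ids checkbox is exposed to Python as GenerateGlobalIds and the filter as GhostCells):

```python
from paraview.simple import *

reader = OpenDataFile('demo.pvtu')

# Merge the partitions owned by each pvserver process into one grid.
clean = CleantoGrid(Input=reader)

# Generate ghost cells across process boundaries; turning on global ID
# generation is what made the point matching work for my data.
ghosts = GhostCells(Input=clean)
ghosts.GenerateGlobalIds = 1  # assumed Python name of the GUI option

surface = ExtractSurface(Input=ghosts)
Show(surface)
Render()
```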
Here is some benchmarking for a case with 3.3 million cells (the demo I shared is only 154,000 cells) and 4 partitions. I’m measuring the time from when I click the Load State button to when I see my first image.
- Serial, with a pipeline of pvtu → Clean to Grid → Extract Surface → Clip: 42 seconds
- Parallel with 2 processes, with a pipeline of pvtu → Clean to Grid → Ghost Cells (Generate Global Ids) → Extract Surface → Clip: 25 seconds
- Parallel with 4 processes and the pipeline above: 14 seconds
- Parallel with 4 processes and the pipeline above without Clean to Grid: 8.5 seconds
So that’s a reasonable workflow as long as the number of partitions isn’t a lot more than the number of pvserver processes.
After even more experimentation, I found that the experience is even more pleasant if I load the data through a vtpd file rather than a pvtu file. The vtpd file references the same vtu files as the pvtu version. demo.vtpd (356 Bytes)
Then I can do vtpd file → Ghost Cells (Generate Global Ids) → Extract Surface. This works nicely whether I run the server in parallel or not, and it doesn't leave any partition boundaries behind when I have fewer server processes than partitions. For my larger 3.4 million cell dataset, I get to my first image in 18 seconds running in serial and 8 seconds running in parallel with 4 processes.
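For the record, the script version of that is roughly the following (same assumptions about the Python names of the filter and option as before):

```python
from paraview.simple import *

# Load the vtpd file that references the same per-rank vtu pieces.
reader = OpenDataFile('demo.vtpd')

ghosts = GhostCells(Input=reader)
ghosts.GenerateGlobalIds = 1  # assumed Python name of the GUI option

surface = ExtractSurface(Input=ghosts)
Show(surface)
Render()
```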
Is there any reason we shouldn’t always use a vtpd file to load the data? What are the implications of writing the data as vtpd vs pvtu?
Yeah, that sounds like a bug. I’ll file an issue about it referencing the dataset you shared here. That option shouldn’t be necessary to get ghost cells.
Yes, that should work well in your case. When you use a .vtpd file, ParaView loads the data as a Partitioned DataSet Collection, and the Ghost Cells filter was designed to add ghost cells across adjacent blocks of a collection even on a single rank. With a remote server, the blocks of the single partitioned dataset in the collection you defined will automatically be read by different ranks.
Handling individual pieces of a parallel unstructured grid loaded onto the same rank was not a use case considered when the new Ghost Cells filter was written; that was an oversight.