I am trying to visualize data from my CFD solver using the VTKHDF file format and ParaView. The solver is MPI parallel. Until now I have been assembling the mesh in memory and then writing a single partition to the output file. The resulting file works as expected: filters such as CellDataToPointData or Contour produce the output I expect.
For very large datasets (more than 64 GB), however, this strategy does not work: ParaView reports an allocation failure when opening the dataset. I checked the VTKHDF documentation again and am now trying to write partitioned datasets instead. That way I can run ParaView on the distributed dataset, which is probably the intended use.
The issue I have now is that when I visualize my data, I can see imprints of the individual partitions, which I would like to get rid of. Per rank, I am writing the points (including duplicates from other ranks) and all inner cells (no duplicates). When I use the Contour filter, for example, the output is not “closed”. I tried marking the duplicated points with a vtkGhostType array and adding the GlobalNodeId for all points. Then I tried the AddGhostCells filter to have ParaView add ghost cells so that the contour output would be closed, but this crashed ParaView. After some searching, I found a thread mentioning that you have to run RedistributeMesh before generating ghost cells. This, however, does not produce the desired result either.
I have attached sample data that illustrates my problem. Could someone please tell me what I am doing wrong in this case.
Steps to reproduce:
Unzip data
Load navier_stokes_00011.vtkhdf in ParaView
Contour filter for density (Value=2)
Output is not smooth but has imprints of partitions
I modified my code to also include the halo cells in the output, to test whether this would fix the issue. For a different test case (too large to share, but basically a forward-facing step), this is what I get if I don’t write the halo region:
I created a contour plot and then colored the iso-surfaces by the partition index. As you can see, there is a gap between partitions. This is the gap I would like to get rid of.
I am not sure whether marking the ghost points with vtkGhostType and including the global node IDs is necessary. For this final image, both were included in the output.
For future reference, this is what you need to do to make ParaView read the data correctly:
The output needs to include the halo cells
The halo cells have to be full/valid 3D cells (initially this was not the case for me)
The halo cells have to be marked in the vtkGhostType dataset (write this yourself; don’t let ParaView try to determine it on its own). If you want to include boundary condition cells in your output (useful for computing gradients in ParaView), these also have to be marked in the vtkGhostType array, but with a different value (see the enum in the VTK vtkDataSetAttributes class reference). In my case, halo cells are marked as DUPLICATECELL (=1) and ghost/periodic cells need HIDDENCELL (=32). There is also EXTERIORCELL (=16), which, judging from its comment, I would assume to be the correct type, but if you set only that, ParaView still shows the ghost/periodic cells. I ended up marking them as EXTERIORCELL | HIDDENCELL (=48), which produced the desired result.
Marking the halo/ghost points with vtkGhostType or including GlobalNodeId in the output file does not appear to be necessary (I assume this is only true for cell-centered simulation data). When I included them, the output did not improve in any way that I could determine, so I left them out.
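As a rough sketch of the marking scheme described in the list above, this is how the per-cell vtkGhostType values could be assembled. The enum values are the ones quoted above; the labels `interior`, `halo`, and `boundary` and the function `ghost_type_array` are hypothetical names for illustration, not part of VTK or of my solver. The resulting bytes would then be written as an unsigned 8-bit vtkGhostType dataset next to the other cell data of the partition.

```python
from enum import IntFlag


class CellGhost(IntFlag):
    """Subset of vtkDataSetAttributes cell ghost types quoted in this thread."""
    DUPLICATECELL = 1
    EXTERIORCELL = 16
    HIDDENCELL = 32


# Hypothetical solver-side labels mapped to the ghost value that worked for me.
GHOST_VALUE = {
    "interior": 0,                                                   # owned cell
    "halo": int(CellGhost.DUPLICATECELL),                            # copy of a neighbor rank's cell
    "boundary": int(CellGhost.EXTERIORCELL | CellGhost.HIDDENCELL),  # BC/periodic cell
}


def ghost_type_array(cell_kinds):
    """Return one unsigned byte per cell, in the same order as the cells."""
    return bytes(GHOST_VALUE[kind] for kind in cell_kinds)


kinds = ["interior", "interior", "halo", "boundary"]
print(list(ghost_type_array(kinds)))  # [0, 0, 1, 48]
```

The array has to follow exactly the cell ordering used for the connectivity of that partition, otherwise the wrong cells get hidden.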
If you do this, all the usual filters should work as expected; no need to do anything special before using them. I’ll attach the correct version of my initially submitted data for reference, since it is hard to come by such data.
Plotting old and new data next to each other, you can see that there are no more gaps in the output. It is still not perfect, but I suspect that this is now a resolution issue and not an issue in the output data. The gradient filter also works much better with the new dataset, still not perfect, but again, I think this is due to the resolution.
It seems like my “fix” is not enough for big datasets. I still get the gaps that you see in the screenshot above. Maybe someone could clarify, on a conceptual level, what needs to be done in order for ParaView to be able to stitch together the individual iso-surfaces into one. Is this even something that ParaView can do for partitioned datasets?
Here is an example from a Taylor-Green vortex; as you can see, the gaps are quite distracting:
This is data from a 1024-core simulation; the grid is 256^3. The resolution should be fine to get high-quality output. I would also be fine with running some filters first to get rid of the gaps. So far I have tried “Redistribute DataSet” and “Merge Blocks”, unsuccessfully.
At partition boundaries, there is not only a gap, but the surface itself is computed incorrectly. This also happens for data that is not based on gradients (density), although there the error in the surface is less severe:
Maybe this is actually a problem with ParaView itself and not with the input format. The CGNS documentation has some example mesh files, which are also partitioned; for example, this one. If I load this into ParaView, select Density, and then add a bunch of iso-surfaces, this is what I get:
This is also what I figured. I did try writing my halo region as ghost cells, which almost worked, but not completely (see posts above). My suspicion is that filters like Contour work on point data, so internally ParaView first runs something like CellDataToPointData and then computes the iso-surfaces with an algorithm such as marching cubes.
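To illustrate the suspicion, here is a toy calculation, assuming the cell-to-point conversion is a plain unweighted average over the cells touching a point (an assumption about the internals, not something I have verified). Four cells meet at one point, one cell per rank; each rank only sees its own cell plus its two face neighbors.

```python
def point_average(cell_values, cells_at_point, visible_cells):
    """Average the values of those cells at a point that this rank can see."""
    vals = [cell_values[c] for c in cells_at_point if c in visible_cells]
    return sum(vals) / len(vals)


# 2x2 cell layout, one cell per rank; all four cells share the center point:
#   2 | 3
#   --+--
#   0 | 1
density = {0: 1.0, 1: 2.0, 2: 3.0, 3: 6.0}
center_point_cells = [0, 1, 2, 3]

full = point_average(density, center_point_cells, {0, 1, 2, 3})   # all cells visible
rank0 = point_average(density, center_point_cells, {0, 1, 2})     # rank 0: diagonal cell 3 missing
rank3 = point_average(density, center_point_cells, {1, 2, 3})     # rank 3: diagonal cell 0 missing

print(full, rank0)  # 3.0 2.0
```

Rank 0 and rank 3 end up with different values at the same physical point, so their iso-surfaces would not meet there.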
My halo region is meant for CFD, where you only need the face neighbors. For the point values to match up across partition boundaries, the ghost cells need to include all the cells that are connected to the points on the outer surface of the domain partition. Increasing the halo region in my code would mean significant code changes. However, I am pretty sure that I could write a program that reads a one-partition VTKHDF file and produces an n-partition VTKHDF file with the appropriate ghost cells.
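A minimal sketch of what such a splitting tool would have to compute per rank, assuming the mesh is given as a list of cells (each a tuple of point IDs) and a list with the owning rank of every cell. The function `ghost_cells` and the parameter `min_shared_points` are made up for this illustration. With `min_shared_points=1` it returns the point-connected layer described above; with a higher threshold (2 for quads in 2D, 4 for hexahedra in 3D) it reproduces a face-neighbor halo like the one my solver uses.

```python
from collections import Counter, defaultdict


def ghost_cells(cells, owner, rank, min_shared_points=1):
    """Cells of other ranks sharing >= min_shared_points points with one owned cell."""
    # Invert the connectivity: which cells use each point?
    point_to_cells = defaultdict(set)
    for cid, pts in enumerate(cells):
        for p in pts:
            point_to_cells[p].add(cid)

    ghosts = set()
    for cid, pts in enumerate(cells):
        if owner[cid] != rank:
            continue
        # Count how many points each foreign cell shares with this owned cell.
        shared = Counter()
        for p in pts:
            for nb in point_to_cells[p]:
                if owner[nb] != rank:
                    shared[nb] += 1
        ghosts.update(nb for nb, n in shared.items() if n >= min_shared_points)
    return sorted(ghosts)


# 3x3 points, 2x2 quads, one cell per rank; point 4 is the shared center.
cells = [(0, 1, 4, 3), (1, 2, 5, 4), (3, 4, 7, 6), (4, 5, 8, 7)]
owner = [0, 1, 2, 3]

print(ghost_cells(cells, owner, rank=0))                       # [1, 2, 3]
print(ghost_cells(cells, owner, rank=0, min_shared_points=2))  # [1, 2]
```

The difference between the two results (cell 3, the diagonal neighbor) is exactly the kind of cell that a face-neighbor halo misses.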
Could you maybe confirm if my suspicions about the ParaView internals are correct? If yes, I will implement a splitting function and test again.
Side note: I continued reading through the documentation that you mentioned, and chapter 4, “Visualizing Large Models”, of the ParaView 6.0.0 documentation says that ParaView should be able to generate ghost cells on its own. If I apply that filter to the CGNS example data, ParaView crashes with this backtrace:
OK, I am sorry, you are right: it does work with the binary release. I was using the package from my package manager, so the problem seems to lie with that package.