ParaView crash for large datasets (309GB) using XDMF2

FabianDe · July 23, 2020, 8:04pm

Hello ParaView community,

First off, I am fairly new to the ParaView cosmos, but I am very excited about the capabilities it offers, especially for post-processing the CFD simulation data I have acquired.

I am trying to run ParaView 5.6 on a TACC cluster, primarily in serial (though I have also run it with 2 MPI processes), since the visualization node I'm using offers plenty of memory (2.1 TB, to be exact). However, I am running into issues when loading large datasets with the XDMF2 reader. Specifically, I'm talking about an HDF5 file containing the XYZ coordinates plus 5 conservative flow variables on ~5 billion grid points, which amounts to a 309 GB solution file for a single time instance.
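As a back-of-the-envelope check, the file size and point count can be related directly. This is a sketch under assumptions not stated in the post: every field is stored as a 64-bit float, "GB" means 10^9 bytes, and HDF5 metadata/chunking overhead is ignored. Under those assumptions, 309 GB corresponds to roughly 4.8 billion points, consistent with "~5 billion".

```python
# Rough size estimate for one time instance of the solution file.
# ASSUMPTIONS (not from the post): float64 storage, GB = 10**9 bytes,
# HDF5 metadata and chunking overhead ignored.
FIELDS = 3 + 5          # X, Y, Z + 5 conservative variables
BYTES_PER_VALUE = 8     # float64


def file_size_bytes(n_points: int) -> int:
    """Raw payload size for n_points grid points."""
    return n_points * FIELDS * BYTES_PER_VALUE


def points_from_size(size_bytes: float) -> float:
    """Invert the estimate: how many points does a given file size imply?"""
    return size_bytes / (FIELDS * BYTES_PER_VALUE)


if __name__ == "__main__":
    print(f"5e9 points -> {file_size_bytes(5_000_000_000) / 1e9:.0f} GB")
    print(f"309 GB     -> {points_from_size(309e9) / 1e9:.2f} billion points")
```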

The problem is that when I try to load the dataset with ParaView in GUI mode, it spends a few minutes loading the data into memory and then crashes without printing any error to the command line (it runs through the software rasterizer, swr). When I run “free -m” during this process, I can see that ParaView a) starts loading the data and b) crashes before the data is fully loaded (see attached plot). I also tried reading in only the grid plus one flow variable (density), as well as only the grid, by editing the XDMF file accordingly, but the crash happens almost identically.
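To record memory usage over time (the kind of data behind the attached plots) without running “free -m” by hand, a small logger can be run in a second shell while ParaView loads. This is a minimal sketch assuming a Linux node with /proc/meminfo (the MemAvailable field needs kernel 3.14 or newer); the function names are hypothetical, and MemTotal minus MemAvailable only approximates the "used" column that free prints.

```python
import time
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # values are in kB (or counts)
    return info


def log_used_memory(interval_s: float = 5.0, samples: int = 10,
                    path: str = "/proc/meminfo") -> None:
    """Print timestamped 'used' memory samples, roughly like repeated free -m."""
    for _ in range(samples):
        with open(path) as f:
            info = parse_meminfo(f.read())
        # Approximation: memory not available to new allocations.
        used_mib = (info["MemTotal"] - info["MemAvailable"]) // 1024
        print(f"{time.time():.0f} used_MiB={used_mib}", flush=True)
        time.sleep(interval_s)
```

Redirecting the output to a file gives a two-column log (timestamp, MiB) that can be plotted afterwards.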

I have tried swapping out the solution file for a lower-resolution case that is 1/8th the size, and it works just fine (see plot below). So the problem seems to arise solely from the solution file size, yet it cannot be due to a lack of memory, as plenty is still available when it crashes.

There are obviously different ways to work around this, e.g. splitting the solution files into multiple slabs and loading them one after another, or using multi-grid options. Subsampling the solution files (every other point) has helped for visualization, but that means a) I throw away picture quality and b) it adds a computationally expensive and storage-intensive extra step to the workflow.
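The slab approach can be planned by splitting the grid along one index direction so that every slab stays under a chosen point budget. This is a minimal sketch assuming a structured nx × ny × nz grid split along k; the function name, the default budget of 2^31 − 1 points, and the one-plane overlap (so neighbouring slabs share a plane and contours join without gaps) are illustrative choices, not something from the thread.

```python
from typing import List, Tuple


def slab_ranges(nx: int, ny: int, nz: int, max_points: int = 2**31 - 1,
                overlap: int = 1) -> List[Tuple[int, int]]:
    """Split the k-direction into half-open [k0, k1) slabs of <= max_points."""
    plane = nx * ny
    max_planes = max_points // plane
    if max_planes <= overlap:
        raise ValueError("a single k-plane already exceeds the point budget")
    ranges = []
    k0 = 0
    while True:
        k1 = min(k0 + max_planes, nz)
        ranges.append((k0, k1))
        if k1 == nz:
            break
        k0 = k1 - overlap  # repeat the shared plane(s) in the next slab
    return ranges


if __name__ == "__main__":
    # Hypothetical 2048 x 2048 x 1200 grid (~5.03 billion points).
    print(slab_ranges(2048, 2048, 1200))
```

Each (k0, k1) pair could then be written as its own hyperslab in the XDMF description and loaded separately.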

Has anyone experienced similar issues with XDMF2 files for large datasets, and/or does anyone have an idea what could be happening and how to avoid it?

Thanks in advance for your help,
Fabian

[Plot: memory usage while loading the full solution vs. XYZ + density vs. XYZ only]

[Plot: memory usage while loading the full solution for subsampled solution files (every second / third grid point)]

berkgeveci · July 23, 2020, 8:16pm

Hi Fabian,

It doesn’t sound like a memory issue. I wonder if this has something to do with 32-bit vs. 64-bit integers (meaning that somewhere in the code a 32-bit integer is used to address the point indices). A simple way to find out would be to try a dataset with just under 2^31 points and another with just over.
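To illustrate the hypothesis, the sketch below computes the flat index of the last grid point, i + nx·(j + ny·k), and forces it into a signed 32-bit integer. It uses Python's ctypes.c_int32, which silently keeps only the low 32 bits; this mimics truncating a 64-bit value into an int32, but it is not ParaView's or the Xdmf reader's actual code, and the cube dimensions are hypothetical values chosen to straddle 2^31 points.

```python
import ctypes


def as_int32(value: int) -> int:
    """Keep only the low 32 bits and reinterpret them as a signed int32.

    ctypes performs no overflow check, so this mirrors what happens when a
    large count or index is stored in a 32-bit integer.
    """
    return ctypes.c_int32(value).value


def last_point_index_int32(nx: int, ny: int, nz: int) -> int:
    """Flat index i + nx*(j + ny*k) of the last point, forced into int32."""
    i, j, k = nx - 1, ny - 1, nz - 1
    return as_int32(i + nx * (j + ny * k))


if __name__ == "__main__":
    # Hypothetical grids just below and just above 2**31 points.
    for dims in [(1290, 1290, 1290), (1291, 1291, 1291)]:
        n = dims[0] * dims[1] * dims[2]
        print(dims, n, last_point_index_int32(*dims))
```

For the larger grid the index turns negative, so any array access based on it would point far outside the allocated data, which would be consistent with a crash rather than a clean out-of-memory error.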

FabianDe · July 24, 2020, 3:41pm

That is an excellent point. I tested it with the following grid sizes (2^31 ≈ 2.147 billion):
- 2.00 billion --> loads properly
- 2.10 billion --> loads properly
- 2.14 billion --> loads properly
- 2.15 billion --> crashes
- 2.20 billion --> crashes
That would certainly confirm your hypothesis. I’m looking into the software environment provided by the HPC cluster to see how to address this. Once I know more about whether it was indeed the 32-bit integer limit, I’ll update this thread for future reference.
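The reported thresholds can be checked against the signed 32-bit limit of 2^31 − 1 = 2,147,483,647. This sketch simply encodes the five grid sizes and outcomes listed above and compares them with the prediction "loads if the point count fits in an int32"; it is a consistency check, not a diagnosis of where in the code the limit sits.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647

# Grid sizes (billions of points) and outcomes as reported in the thread.
REPORTED = {2.00: "loads", 2.10: "loads", 2.14: "loads",
            2.15: "crashes", 2.20: "crashes"}


def predicted(n_points: int) -> str:
    """Predicted outcome if point counts/indices are held in a signed int32."""
    return "loads" if n_points <= INT32_MAX else "crashes"


def check_reported() -> bool:
    """Return True if every reported outcome matches the int32 prediction."""
    ok = True
    for billions, observed in REPORTED.items():
        n = round(billions * 1e9)
        guess = predicted(n)
        print(f"{n:>13,} points: predicted {guess:7s} observed {observed}")
        ok = ok and guess == observed
    return ok


if __name__ == "__main__":
    print("all consistent:", check_reported())
```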

Thank you for the quick response and the great help!

berkgeveci · July 24, 2020, 7:12pm

Does the simulation support another output format? If you could write the data out as a VTK XML file (it needs to use appended mode), that would be a good way to test whether the problem is in the Xdmf reader/format or somewhere else in ParaView.
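For a small test case, a VTK XML StructuredGrid (.vts) file with raw appended data can be written without any VTK dependency. The sketch below is a hand-rolled writer (the function name and signature are hypothetical) assuming float64 fields and interleaved XYZ coordinates. It sets header_type="UInt64" so that each appended block's byte count is stored in 64 bits, which matters once a single array exceeds 4 GiB.

```python
import struct
import sys
from array import array
from typing import Dict, Sequence, Tuple


def write_vts_appended(path: str, dims: Tuple[int, int, int],
                       xyz: Sequence[float],
                       fields: Dict[str, Sequence[float]]) -> None:
    """Write a curvilinear grid as a .vts file with raw appended data.

    Each appended block is prefixed by its byte count as a UInt64.
    """
    nx, ny, nz = dims
    n = nx * ny * nz
    if len(xyz) != 3 * n:
        raise ValueError("xyz must hold 3 interleaved coordinates per point")
    # Point-data arrays first, then the coordinates; offsets follow this order.
    blocks = []
    for name, values in fields.items():
        if len(values) != n:
            raise ValueError(f"field {name!r} needs {n} values")
        blocks.append((name, 1, array("d", values).tobytes()))
    blocks.append(("Points", 3, array("d", xyz).tobytes()))

    tags, offset = [], 0
    for name, ncomp, raw in blocks:
        tags.append(f'<DataArray type="Float64" Name="{name}" '
                    f'NumberOfComponents="{ncomp}" format="appended" offset="{offset}"/>')
        offset += 8 + len(raw)  # 8-byte UInt64 size header + payload

    extent = f"0 {nx - 1} 0 {ny - 1} 0 {nz - 1}"
    order = "LittleEndian" if sys.byteorder == "little" else "BigEndian"
    xml = "\n".join([
        '<?xml version="1.0"?>',
        f'<VTKFile type="StructuredGrid" version="1.0" byte_order="{order}" header_type="UInt64">',
        f'  <StructuredGrid WholeExtent="{extent}">',
        f'    <Piece Extent="{extent}">',
        '      <PointData>',
        *("        " + t for t in tags[:-1]),
        '      </PointData>',
        '      <Points>',
        "        " + tags[-1],
        '      </Points>',
        '    </Piece>',
        '  </StructuredGrid>',
        '  <AppendedData encoding="raw">',
        '   _',  # binary data starts immediately after the underscore
    ])
    with open(path, "wb") as f:
        f.write(xml.encode("ascii"))
        for _, _, raw in blocks:
            f.write(struct.pack("=Q", len(raw)))  # native order matches byte_order
            f.write(raw)
        f.write(b"\n  </AppendedData>\n</VTKFile>\n")
```

This builds every array in memory, so it only suits small tests. For the full 309 GB case the data would more realistically be written from the solver or with VTK's own vtkXMLStructuredGridWriter, using SetDataModeToAppended() and SetHeaderTypeToUInt64().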