Which file format is right for me?

Could someone recommend a file format for transient result files from a CFD solver to read and analyze in ParaView?

So far, I have been using the EnSight Gold file format to get the data from my fluid simulations into ParaView. Part of the reason I used it is that it was what the group used back when I joined. And to be honest, I stuck with it because I actually understood the documentation. But I feel like I am beginning to push its limits. Using it with an MPI-parallel solver is not exactly straightforward. And unless I messed something up with my file exports, I might have surpassed an element count limit, at least when importing the files into ParaView.

Here is what I need from the file format:

  • easy to understand for a part-time, self-taught programmer like myself
  • decent support in ParaView (obviously)
  • at least scalar and 3D vector quantities, symmetric tensors would be nice to have. Results are usually defined at cell centers, but an option for vertex-based values seems to come in handy on occasion.
  • mesh elements/cells are exclusively Cartesian cubes, or axis-aligned cubes if that makes it any clearer. However, the mesh itself is not Cartesian in the strictest sense of the word. The geometry is usually sparsely filled, and there are different cell sizes (I call it “levels”) in each mesh, with a 2:1 cell size ratio between different levels. So if there is a simple-to-use element type for this application, I would be over the moon.
  • The solvers writing the result files are MPI-parallel with domain decomposition. So parallel I/O seems to be a must-have.
  • The geometry is mostly stationary, while the values vary between time steps. This is one constraint that kept me from looking further into VTK. From what I understood, VTK does require writing the geometry for each time step in transient cases, even if it does not change.
  • Part of the mesh must be able to move. We haven’t implemented moving meshes yet, but will start with it quite soon. There is no deformation, the cubes retain their shape. But they translate and rotate between time steps. Total cell count remains constant.
  • Speaking of cell count: I am currently having trouble with ~400 million cells. Sooner or later, we will have cases with more than 1 billion cells.

Really looking forward to comments and suggestions. If I failed to provide crucial bits of information, or should clarify some points, please let me know.

Added sample image of what meshes typically look like:

Look into Exodus. The Exodus file format is part of the open-source “Sandia Engineering Analysis Code Access System” (SEACAS) and is natively (and rigorously) supported by ParaView. See the SEACAS GitHub page.

The only question I have is whether in your vocabulary a “cell” is an element or the face of an element. I don’t believe Exodus supports face-values, but it does support elements and nodes (Exodus is FEM-centric).

Addressing each of your requests:

  • Easy to understand for a part-time, self-taught programmer like myself

Exodus is essentially a schema built upon either NetCDF or HDF5 (your choice). Also, see my discussion at end of this post regarding the multitude of SEACAS conversion utilities, APIs, etc.
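
Since an Exodus file is literally a NetCDF (or HDF5) file with a documented schema, one low-effort way to get oriented is to open one with a generic NetCDF reader before committing to the exodus.py or C APIs. Here is a minimal sketch using the netCDF4 Python package; the file name results.e is a placeholder, and the dimension names follow the standard Exodus schema (compare with ncdump output if your file differs):

```python
# Minimal sketch (not the official Exodus API): peek inside an Exodus file
# with a plain NetCDF reader to get a feel for the schema.
from netCDF4 import Dataset

with Dataset("results.e", "r") as exo:          # "results.e" is a placeholder
    print(exo.dimensions["num_nodes"].size)     # node count
    print(exo.dimensions["num_elem"].size)      # element count
    print(list(exo.variables.keys()))           # coords, connectivity, result variables, ...
```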

  • decent support in ParaView (obviously)

Natively and rigorously supported in ParaView.

  • at least scalar and 3D vector quantities, symmetric tensors would be nice to have. Results are usually defined at cell centers, but an option for vertex-based values seems to come in handy on occasion.

Exodus is FEM-centric and can store scalars, vectors, or matrices (i.e. Voigt notation for symmetric tensors) at nodes or cell centers.

  • mesh elements/cells are exclusively Cartesian cubes, or axis-aligned cubes if that makes it any clearer. However, the mesh itself is not Cartesian in the strictest sense of the word. The geometry is usually sparsely filled, and there are different cell sizes (I call it “levels”) in each mesh, with a 2:1 cell size ratio between different levels. So if there is a simple-to-use element type for this application, I would be over the moon.

Exodus supports 1D (bars & beams), 2D (quads, tris), and 3D (hex, wedge, pyramid, tet) elements. See the full element list in the Exodus documentation.

  • The solvers writing the result files are MPI-parallel with domain decomposition. So parallel I/O seems to be a must-have.

Exodus is used in DOE codes that run on supercomputers. SEACAS includes a domain-decomposition utility, decomp, which wraps the nem_slice and nem_spread executables; each processor writes a separate file. The epu utility can then “rejoin” the separate files into a single file (if you want). Fun note: epu stands for e pluribus unum, Latin for “out of many, one.”

  • The geometry is mostly stationary, while the values vary between time steps. This is one constraint that kept me from looking further into VTK. From what I understood, VTK does require writing the geometry for each time step in transient cases, even if it does not change.

Exodus writes the geometry description once. Logical connections “link” variables to their respective mesh entities.

  • Part of the mesh must be able to move. We haven’t implemented moving meshes yet, but will start with it quite soon. There is no deformation, the cubes retain their shape. But they translate and rotate between time steps. Total cell count remains constant.

Save nodal displacements to variables disp_x, disp_y, and disp_z to allow ParaView to automatically apply displacements (no need to apply a Warp filter).
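
For reference, here is a rough sketch of what writing those displacement variables might look like with the SEACAS exodus.py wrapper. I am writing this from memory, so treat the method names (set_node_variable_number, put_node_variable_name, put_time, put_node_variable_values) as assumptions to verify against the exodus.py docstrings; the file name and values are placeholders:

```python
# Hedged sketch: write per-time-step nodal displacements so ParaView can
# animate the motion. Method names assumed from the SEACAS exodus.py wrapper;
# verify against your installed version.
import exodus

e = exodus.exodus("results.e", mode="a")        # placeholder file name
nn = e.num_nodes()

# Declare the three displacement variables once.
e.set_node_variable_number(3)
for i, name in enumerate(("disp_x", "disp_y", "disp_z"), start=1):
    e.put_node_variable_name(name, i)

# Per time step: write the time value, then each displacement component.
step, time = 1, 0.0
dx = dy = dz = [0.0] * nn                       # replace with real displacements
e.put_time(step, time)
e.put_node_variable_values("disp_x", step, dx)
e.put_node_variable_values("disp_y", step, dy)
e.put_node_variable_values("disp_z", step, dz)
e.close()
```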

  • Speaking of cell count: I am currently having trouble with ~400 million cells. Sooner or later, we will have cases with more than 1 billion cells.

Did I mention that the DOE heavily uses Exodus? Sandia wrote an open-source CFD code, Nalu, which uses Exodus. They claim to have run with 10 billion elements… is that enough for you?

(Disclosure: I am the director of product management at Coreform.)
Also, you can build an Exodus mesh either with a dedicated meshing tool such as Sandia’s Cubit meshing software (if you are an authorized US Govt entity) or with the commercial version, Coreform Cubit.

We (Coreform) offer a free non-commercial license of Coreform Cubit (limited to 50k element export) if you’re interested. We also provide academic licensing if that’s of interest to you. Even if you want to go fully open-source, the free version should at least help you orient yourself with Exodus… there are SEACAS utilities to convert Exodus to & from ASCII text (exotxt / txtexo), and to/from Matlab *.mat files (exo2mat / mat2exo). There are also extensive APIs for C, Fortran, and Python (exodus.py) - not to mention that since Exodus is either a NetCDF or HDF5 file, you can use native APIs for these formats in any language (e.g. Python, Matlab, Julia, R, etc.).

Once you’ve oriented yourself with Exodus (and again, I would recommend using Coreform Cubit plus the SEACAS utilities/APIs to help with that), you could write your own meshing tool using any language with SEACAS utility support, an Exodus API, or a NetCDF/HDF5 library, if you understandably don’t want a commercial, closed-source meshing tool in your workflow.


Thanks for the thorough analysis. That was more than I expected.
In my vocabulary, a “cell” refers to a hexa8 element, i.e. a 3D cube.

Two follow-ups if I may:
From your description, it sounds like parallel I/O is implemented in a way where each MPI process writes its own data to a separate file. And if I want to view the data, I have to run a collector routine first that merges all these files into a single database.
That’s pretty much the workaround I came up with in order to keep EnSight Gold alive for a bit longer. Problem is, it is a bit heavy on ye olde hard drives. I made sure to read and write data in fairly large chunks, but depending on the underlying hardware and file system, this step can still take quite a lot of time.
I was hoping to avoid that entirely, i.e. have the MPI processes from the solver write into a single database, which can be imported directly into PV.

Speaking of parallel I/O: how would you compare your solution to CGNS? Because that’s the alternative that usually comes up. Yet so far, I could not fully wrap my head around it. But as far as I understood, it does “direct” parallel I/O into a single database that can be imported into PV without extra steps.

You do not need to epu the files back together if you don’t wish. ParaView can load the split files, and if you’re running parallel ParaView, it can even read/process the files in parallel. My group would often epu the files back together only at the archival stage.

  • Easy to understand for a part-time, self-taught programmer like myself

CGNS is essentially a data model built upon HDF5.

  • decent support in ParaView (obviously)

Transient (unsteady) and mesh-deforming CGNS is supported in ParaView.

  • at least scalar and 3D vector quantities, symmetric tensors would be nice to have. Results are usually defined at cell centers, but an option for vertex-based values seems to come in handy on occasion.

CGNS can store scalars, vectors, or matrices at nodes or cell-centers.

  • mesh elements/cells are exclusively Cartesian cubes, or axis-aligned cubes if that makes it any clearer. However, the mesh itself is not Cartesian in the strictest sense of the word. The geometry is usually sparsely filled, and there are different cell sizes (I call it “levels”) in each mesh, with a 2:1 cell size ratio between different levels. So if there is a simple-to-use element type for this application, I would be over the moon.

CGNS has HEXA_8 elements.

  • The solvers writing the result files are MPI-parallel with domain decomposition. So parallel I/O seems to be a must-have.

CGNS has a parallel API for writing. Parallel reading is very efficient, but parallel writing is a bit slow compared to pure MPI-IO (it is a layer over HDF5). The best thing is to test on a simple configuration to check whether it fits your needs.
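
To make the single-shared-file idea concrete: the parallel CGNS API sits on top of parallel HDF5, where every rank opens the same file and writes its own slab collectively. The actual CGNS calls are different, but the underlying pattern looks roughly like this h5py/mpi4py sketch (dataset name and sizes are made up for illustration, and h5py must be built with MPI support):

```python
# Illustration only: the collective single-file write pattern that parallel
# CGNS builds on, shown with h5py compiled against parallel HDF5.
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.rank, comm.size
n_local = 1000                                   # cells owned by this rank (made up)

# Every rank opens the SAME file; no per-rank files, no merge step afterwards.
with h5py.File("solution.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("pressure", shape=(size * n_local,), dtype="f8")
    # Each rank writes its own contiguous slice of the global array.
    dset[rank * n_local:(rank + 1) * n_local] = np.full(n_local, float(rank))
```

Run it with something like mpiexec -n 4 python write_demo.py; every rank contributes to one pressure dataset in one file, which is the behavior you were hoping for.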

  • The geometry is mostly stationary, while the values vary between time steps. This is one constraint that kept me from looking further into VTK. From what I understood, VTK does require writing the geometry for each time step in transient cases, even if it does not change.

In CGNS, the geometry description is written once. “CGNS links” (HDF5 soft links) then connect each time step to the reference geometry when separate files are used per time step.
To be efficient, ParaView implements a geometry cache for CGNS, so post-processing only reads the geometry once.
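
For what it's worth, that linking mechanism is visible at the raw HDF5 level with h5py. Here is a tiny illustration with made-up file and group names: a per-time-step file references the grid stored in a shared file instead of duplicating it (h5py exposes the cross-file form as an external link):

```python
# Minimal illustration of HDF5 linking: a per-time-step file points back at
# the grid written once in a reference file instead of duplicating it.
import h5py

with h5py.File("grid.h5", "w") as f:              # made-up file/group names
    f.create_group("/Base/Zone/GridCoordinates")  # geometry written once

with h5py.File("step_0001.h5", "w") as f:
    f.create_group("/Base/Zone")
    # The link stores only a target path; readers that follow it see the shared grid.
    f["/Base/Zone/GridCoordinates"] = h5py.ExternalLink("grid.h5", "/Base/Zone/GridCoordinates")
```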

  • Part of the mesh must be able to move. We haven’t implemented moving meshes yet, but will start with it quite soon. There is no deformation, the cubes retain their shape. But they translate and rotate between time steps. Total cell count remains constant.

In CGNS, you can assign arbitrary motion (rotation, translation) to a part of the mesh. ParaView does not handle this information yet (so a Python script is needed to do the animation).
Mesh deformation is also possible and is supported by ParaView.

  • Speaking of cell count: I am currently having trouble with ~400 million cells. Sooner or later, we will have cases with more than 1 billion cells.

CGNS compiled with 64-bit integers can easily handle billion-cell meshes. CGNS is used at NASA for big simulation challenges.

CGNS is an open data model with an open source API.

Speaking of open source, XDMF can also be interesting (it also relies on HDF5).
