Recommended file format for ‘light’ description of existing ‘heavy’ HDF-5 files
Apologies for asking a common question. I've looked through several of the previous posts and haven't quite found the right solution. Of note are "Which file format is right for me?" and "xdmf HyperSlab with timesteps" (#4, by patchett2002).
Background
I am looking to import time-series data from pre-existing custom HDF-5 files containing ~TB of simulation data, defined on a reasonably complex unstructured grid. We decompose our simulation spatially in cylindrical coordinates and save using serial IO, so the fields are stored over N files, each with a distinct grid. The grid files are currently in VTK format (although with meshio we can easily change this), defining a cell (3D hexahedron) around each simulation point. The data files store the values from the simulation at each simulation point, at several time points.
Currently, I can plot the data by using meshio to copy the fields into a format understood by Paraview – this was used to make the above figure. However, since the files are large, I would like to interface with them directly rather than copying them into a file format like VTK, and since we have a pre-existing library of analysis tools I can’t change the format.
The issue
The natural solution would be something like XDMF, where a 'light' description file describes the interface to the 'heavy' HDF-5 files. However, when trying to do this:
- The time dimension causes a headache. I have tried to use hyperslab indexing and a temporal collection to point to each time point in each file, but then I'd need to make N × T entries in the XDMF file (i.e. per plane and per time point). Additionally, each time a new time point is added, I'd need to modify the XML. Apparently improved handling of time data was under development in XDMF, but it doesn't appear to be there yet (if ever).
- Additionally, the XDMF reader (all 3 versions) only loads the grid if I add a hyperslab: seems possibly related to https://gitlab.kitware.com/paraview/paraview/-/issues/19608, see below
- Finally, from other posts, it sounds like the XDMF project is no longer developed. As such, formats like EXODUS II or CGNS are recommended over XDMF. However, can these formats be used as a 'light' data descriptor? I don't see a way to point to existing data files with them. I think I'd need to reformat my data files, or at least copy an EXODUS II/CGNS header into my HDF5 files (the latter option is possible, but I don't see documentation on how to do this for time series).
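To make the N × T bookkeeping concrete, here is a minimal sketch of how the temporal-collection entries could be generated rather than written by hand. It is not tested in ParaView. It assumes the data sit in a 2D dataset of shape (T, number of cells), that the mesh is already available in an HDF5 file with points in /data0 and connectivity in /data1 (as in the meshio output further down), and that the hyperslab selection is laid out as start, stride, count rows. The function names `hdf_item` and `build_xdmf` are hypothetical.

```python
import xml.etree.ElementTree as ET


def hdf_item(parent, dims, path, dtype="Float", precision="8"):
    """Append a DataItem that points at a dataset in an existing HDF5 file."""
    item = ET.SubElement(parent, "DataItem", DataType=dtype, Dimensions=dims,
                         Format="HDF", Precision=precision)
    item.text = path
    return item


def build_xdmf(times, n_points, n_cells, mesh_file, data_path):
    """Return XDMF text with one temporal-collection entry per time point.

    Assumption: data_path names a 2D dataset of shape (len(times), n_cells).
    """
    root = ET.Element("Xdmf", Version="3.0")
    domain = ET.SubElement(root, "Domain")
    series = ET.SubElement(domain, "Grid", Name="TimeSeries",
                           GridType="Collection", CollectionType="Temporal")
    for i, t in enumerate(times):
        grid = ET.SubElement(series, "Grid", Name=f"step{i}", GridType="Uniform")
        # Geometry and topology are only references, so repeating them is cheap.
        geom = ET.SubElement(grid, "Geometry", GeometryType="XYZ")
        hdf_item(geom, f"{n_points} 3", f"{mesh_file}:/data0")
        topo = ET.SubElement(grid, "Topology", TopologyType="Hexahedron",
                             NumberOfElements=str(n_cells))
        hdf_item(topo, f"{n_cells} 8", f"{mesh_file}:/data1",
                 dtype="Int", precision="4")
        ET.SubElement(grid, "Time", Value=str(float(t)))
        attr = ET.SubElement(grid, "Attribute", Name="density",
                             AttributeType="Scalar", Center="Cell")
        slab = ET.SubElement(attr, "DataItem", ItemType="HyperSlab",
                             Dimensions=f"1 {n_cells}", Type="HyperSlab")
        sel = ET.SubElement(slab, "DataItem", Dimensions="3 2", Format="XML")
        # Values in row order: start, stride, count for the (time, cell) axes.
        sel.text = f"{i} 0 1 1 1 {n_cells}"
        hdf_item(slab, f"{len(times)} {n_cells}", data_path)
    ET.indent(root)  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Example call with three made-up time values for the first file.
xml_text = build_xdmf(times=[0.0, 0.5, 1.0], n_points=213554, n_cells=105767,
                      mesh_file="data00000.h5",
                      data_path="snaps/snaps00000.nc:/ne/vals")
```

This still has to be re-run for each of the N files and whenever a time point is added, so it only automates the problem rather than removing it.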
As such, I’m not sure how I should go about reading my data files. Is there a better way to handle the time dimension (and to work around the hyperslab issue) in XDMF (preferably using Xdmf3ReaderS to spatially parallelise)? Or is there another “light+heavy” format which is easier to get a time series into?
Any help/pointers would be greatly appreciated!
Sample data
Sample data is available at MPCDF DataShare (password: viz)
Currently, I convert the VTK mesh files in mesh and the data in the snaps files into XDMF+HDF5 files via the following simple script.
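import meshio
from netCDF4 import Dataset

# One mesh file and one data file per spatial subdomain (16 in the sample data)
for p in range(16):
    mesh = meshio.read(f'mesh/mesh{p:05d}.vtu')
    ds = Dataset(f'snaps/snaps{p:05d}.nc')

    with meshio.xdmf.TimeSeriesWriter(f'data{p:05d}.xdmf') as writer:
        # The mesh is written once and shared by all time steps
        writer.write_points_cells(mesh.points, mesh.cells)
        # ne/tau holds the time values; ne/vals is indexed as [time, cell]
        for i, t in enumerate(ds['ne/tau']):
            writer.write_data(t, cell_data = dict(density = [ds['ne/vals'][i, :],]))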
Working XDMF3
<Xdmf xmlns:ns0="http://www.w3.org/2003/XInclude" Version="3.0">
  <Domain>
    <Grid Name="mesh" GridType="Uniform">
      <Geometry GeometryType="XYZ">
        <DataItem DataType="Float" Dimensions="213554 3" Format="HDF" Precision="8">data00000.h5:/data0</DataItem>
      </Geometry>
      <Topology TopologyType="Hexahedron" NumberOfElements="105767">
        <DataItem DataType="Int" Dimensions="105767 8" Format="HDF" Precision="4">data00000.h5:/data1</DataItem>
      </Topology>
    </Grid>
    <Grid Name="TimeSeries_meshio" GridType="Collection" CollectionType="Temporal">
      <Grid>
        <ns0:include xpointer="xpointer(//Grid[@Name=&quot;mesh&quot;]/*[self::Topology or self::Geometry])" />
        <Time Value="0.0" />
        <Attribute Name="density" AttributeType="Scalar" Center="Cell">
          <DataItem DataType="Float" Dimensions="105767" Format="HDF" Precision="8">data00000.h5:/data2</DataItem>
        </Attribute>
      </Grid>
    </Grid>
  </Domain>
</Xdmf>
Non-working XDMF-3
This file loads, but I can't view the data in the variable: only the "SolidColor" option is available.
<Xdmf xmlns:ns0="http://www.w3.org/2003/XInclude" Version="3.0">
  <Domain>
    <Grid Name="mesh" GridType="Uniform">
      <Geometry GeometryType="XYZ">
        <DataItem DataType="Float" Dimensions="213554 3" Format="HDF" Precision="8">data00000.h5:/data0</DataItem>
      </Geometry>
      <Topology TopologyType="Hexahedron" NumberOfElements="105767">
        <DataItem DataType="Int" Dimensions="105767 8" Format="HDF" Precision="4">data00000.h5:/data1</DataItem>
      </Topology>
    </Grid>
    <Grid Name="TimeSeries_meshio" GridType="Collection" CollectionType="Temporal">
      <Grid>
        <ns0:include xpointer="xpointer(//Grid[@Name=&quot;mesh&quot;]/*[self::Topology or self::Geometry])" />
        <Time Value="0.0" />
        <Attribute Name="density" AttributeType="Scalar" Center="Cell">
          <DataItem ItemType="HyperSlab" Dimensions="105767" Type="HyperSlab">
            <DataItem Dimensions="3" Format="XML">
              0
              1
              105767
            </DataItem>
            <DataItem
              Name="density"
              Dimensions="105767"
              Format="HDF">
              data00000.h5:/data2
            </DataItem>
          </DataItem>
        </Attribute>
      </Grid>
    </Grid>
  </Domain>
</Xdmf>
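A possible sanity check for hyperslab entries like the one above is sketched here. It assumes the convention used in the XDMF documentation examples, where the selection DataItem is declared with Dimensions "3 rank" (start, stride, count for each dimension of the source dataset). Whether a mismatch here explains the missing array is untested, and the helper name `check_hyperslabs` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_hyperslabs(xdmf_text):
    """Return a list of (selection_dims, source_dims, ok) per HyperSlab DataItem.

    ok is True when the selection is declared as "3 rank" and carries
    3 * rank values, where rank is the rank of the source dataset.
    """
    root = ET.fromstring(xdmf_text)
    results = []
    for slab in root.iter("DataItem"):
        if slab.get("ItemType") != "HyperSlab":
            continue
        children = slab.findall("DataItem")
        if len(children) != 2:
            continue  # expected exactly: selection, then source
        selection, source = children
        sel_dims = selection.get("Dimensions", "").split()
        rank = len(source.get("Dimensions", "").split())
        n_values = len((selection.text or "").split())
        ok = sel_dims == ["3", str(rank)] and n_values == 3 * rank
        results.append((" ".join(sel_dims), source.get("Dimensions"), ok))
    return results


# The Attribute block from the non-working file, reduced to a fragment.
fragment = """
<Attribute Name="density" AttributeType="Scalar" Center="Cell">
  <DataItem ItemType="HyperSlab" Dimensions="105767" Type="HyperSlab">
    <DataItem Dimensions="3" Format="XML">0 1 105767</DataItem>
    <DataItem Name="density" Dimensions="105767" Format="HDF">data00000.h5:/data2</DataItem>
  </DataItem>
</Attribute>
"""
report = check_hyperslabs(fragment)
```

For the fragment above the check flags the selection, since it is declared as "3" rather than "3 1"; this only shows a difference from the documented examples, not that the reader actually rejects it.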