Multi-file HDF5 in .xdmf?

Erik_Keever · February 23, 2019, 2:19am

Hello,

I am trying to generate .xdmf files so that ParaView can read the .h5 output of a parallel CFD simulation directly. The simulation writes one .h5 file per rank per saved step, and each file includes the overlap (ghost) cells.

Writing a postprocessing program that merges everything into a single gigantic output file per step is simple, but the problems with this approach are obvious: it needs large-memory nodes capable of holding an entire step in memory, and even with 4 ranks the merged output files are approaching 100 GB apiece.

After days of searching and experimentation I eventually found an example of how to do this with one file per step (hence the fallback approach of merging), but I have had no luck finding any explanation or example of how to partition a grid across multiple files in XDMF.

This is where I am at: an attempt to make it read a spatial collection of grids per step. It is marvellously effective at causing segfaults.


<Xdmf xmlns:xi="http://www.w3.org/2003/XInclude" Version="2.2">
<Domain>
    <Grid Name="TimeSeries" GridType="Collection" CollectionType="Temporal">
        <Grid Name="foo" GridType="Collection" CollectionType="Spatial"> 
            <Time Value="0.000000" />
            <Grid Name="thedomain" GridType="Uniform">
                <Topology name="thetopo" TopologyType="3DSMesh" Dimensions="10 512 260"/>
                <Geometry name="thegeo" GeometryType="XYZ">
                <DataItem Dimensions="2621440 3" NumberType="Float" Precision="4" Format="HDF">sim_geometry1.h5:/geometry_mesh</DataItem>
                </Geometry>
                <Time Value="0.000000" />
                <Attribute Name="massA" Active="1" AttributeType="Scalar" Center="Node">
                <DataItem Dimensions="10 512 260" NumberType="Float" Precision="4" Format="HDF">3D_XYZ_rank0_00.h5:/fluid1/mass</DataItem>
                </Attribute>
            </Grid>
            <Grid Name="thedomainB" GridType="Uniform">
                <Topology name="thetopoB" TopologyType="3DSMesh" Dimensions="10 512 260"/>
                <Geometry name="thegeoB" GeometryType="XYZ">
                <DataItem Dimensions="2621440 3" NumberType="Float" Precision="4" Format="HDF">sim_geometry2.h5:/geometry_mesh</DataItem>
                </Geometry>
                <Time Value="0.000000" />
                <Attribute Name="massB" Active="1" AttributeType="Scalar" Center="Node">
                <DataItem Dimensions="10 512 260" NumberType="Float" Precision="4" Format="HDF">3D_XYZ_rank1_00.h5:/fluid1/mass</DataItem>
                </Attribute>
            </Grid>
        </Grid>
    </Grid>
</Domain>
</Xdmf>
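Checking the sizes in the snippet above, the Topology and the Geometry disagree about the number of points. That may be contributing to the segfaults, though this is a guess rather than a confirmed diagnosis:

$$
10 \times 512 \times 260 = 1\,331\,200 \quad \text{points implied by the Topology, } \texttt{Dimensions="10 512 260"}
$$

$$
2\,621\,440 = 10 \times 512 \times 512 \quad \text{points declared by the Geometry, } \texttt{Dimensions="2621440 3"}
$$

So unless the geometry files really hold a 10×512×512 mesh (in which case each rank's 10×512×260 portion would have to be selected out of it), the Geometry DataItems would presumably need `Dimensions="1331200 3"` to match the Topology and the `mass` Attribute.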

So I give up. To pose a simpler scenario: suppose I have a 32x1x1 grid, where the geometryA/massA .h5 files contain elements [0 1 2 … 13 14 15 16 17 18 19] and geometryB/massB contain elements [12 13 14 15 16 17 18 … 30 31], i.e. a partitioned rectilinear grid with ghost cells. What do I write?
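As an untested sketch of the spatial-collection route for that 32x1x1 case: each rank's file pair becomes its own Uniform grid inside a Spatial collection, which ParaView should load as one multiblock dataset. The file names `geometryA.h5`, `massA.h5`, `geometryB.h5`, `massB.h5` are hypothetical, the dataset paths are borrowed from the snippet above, and the datasets are assumed to be stored as 20×3 (geometry) and 20 (mass) arrays. Which overlapping nodes count as "ghost" (16–19 for A, 12–15 for B) is also an assumption, namely a symmetric split of the 12–19 overlap. Nothing here removes the overlap, so nodes 12–19 appear in both blocks. For several saved steps, one such Spatial collection per step would sit inside a Temporal collection, as in the snippet above.

```xml
<?xml version="1.0" ?>
<Xdmf Version="2.2">
<Domain>
    <!-- One Spatial collection per saved step; each rank's files become one block -->
    <Grid Name="step0" GridType="Collection" CollectionType="Spatial">
        <Time Value="0.000000" />
        <!-- Block A: global nodes 0..19 (assumed ghost nodes: 16..19) -->
        <Grid Name="blockA" GridType="Uniform">
            <Topology TopologyType="3DSMesh" Dimensions="1 1 20"/>
            <Geometry GeometryType="XYZ">
                <DataItem Dimensions="20 3" NumberType="Float" Precision="4" Format="HDF">geometryA.h5:/geometry_mesh</DataItem>
            </Geometry>
            <!-- Same attribute name in every block so it is one field across blocks -->
            <Attribute Name="mass" AttributeType="Scalar" Center="Node">
                <DataItem Dimensions="20" NumberType="Float" Precision="4" Format="HDF">massA.h5:/fluid1/mass</DataItem>
            </Attribute>
        </Grid>
        <!-- Block B: global nodes 12..31 (assumed ghost nodes: 12..15) -->
        <Grid Name="blockB" GridType="Uniform">
            <Topology TopologyType="3DSMesh" Dimensions="1 1 20"/>
            <Geometry GeometryType="XYZ">
                <DataItem Dimensions="20 3" NumberType="Float" Precision="4" Format="HDF">geometryB.h5:/geometry_mesh</DataItem>
            </Geometry>
            <Attribute Name="mass" AttributeType="Scalar" Center="Node">
                <DataItem Dimensions="20" NumberType="Float" Precision="4" Format="HDF">massB.h5:/fluid1/mass</DataItem>
            </Attribute>
        </Grid>
    </Grid>
</Domain>
</Xdmf>
```

The main difference from my attempt above is that both blocks use the same Attribute name (`mass` rather than `massA`/`massB`). With distinct names, ParaView would presumably see two separate arrays, each defined on only one block, instead of one field spanning the whole grid.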

It seems like it has to be either a spatial collection or a subset. I’ve seen examples of collections that group different objects, but this is a single object whose data is stored across multiple files.
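For the "subset" option, XDMF has a HyperSlab DataItem: its first child holds start, stride, and count rows (one column per dimension of the source), and its second child is the source DataItem. Below is an untested sketch of block B alone, trimmed to its owned nodes. It reuses the hypothetical file names from the 32x1x1 case (`geometryB.h5`, `massB.h5`) and assumes node-centered data. File B holds global nodes 12–31, so local index 4 is global node 16, and keeping 16 values covers nodes 16–31. Block A would use start 0, stride 1, count 17 (nodes 0–16), so the two blocks share node 16 and no line segment goes missing at the seam. The HDF5 files themselves are left untouched; only the XDMF description changes.

```xml
<!-- Block B with its (assumed) ghost nodes 12..15 trimmed off.
     File B holds global nodes 12..31, so local index 4 is global node 16. -->
<Grid Name="blockB" GridType="Uniform">
    <Topology TopologyType="3DSMesh" Dimensions="1 1 16"/>
    <Geometry GeometryType="XYZ">
        <DataItem ItemType="HyperSlab" Dimensions="16 3" Type="HyperSlab">
            <!-- rows: start, stride, count; columns: the two dimensions of the 20x3 source -->
            <DataItem Dimensions="3 2" Format="XML">
                4 0
                1 1
                16 3
            </DataItem>
            <DataItem Dimensions="20 3" NumberType="Float" Precision="4" Format="HDF">geometryB.h5:/geometry_mesh</DataItem>
        </DataItem>
    </Geometry>
    <Attribute Name="mass" AttributeType="Scalar" Center="Node">
        <DataItem ItemType="HyperSlab" Dimensions="16" Type="HyperSlab">
            <!-- rows: start, stride, count for the 1D source of 20 values -->
            <DataItem Dimensions="3 1" Format="XML">
                4
                1
                16
            </DataItem>
            <DataItem Dimensions="20" NumberType="Float" Precision="4" Format="HDF">massB.h5:/fluid1/mass</DataItem>
        </DataItem>
    </Attribute>
</Grid>
```

This Grid would replace a plain per-rank Grid inside the Spatial collection, so the answer would be "both": a spatial collection of blocks, each of which is a subset of its rank's file.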