partitioning large datasets and parallel I/O

ChriWiChris · August 7, 2018, 9:11am

Hi,

To visualize a very large dataset in a reasonable amount of time, I partitioned the original XDMF file so that each MPI process loads only its own part of the data. This is the structure of the file:

<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []>
<Xdmf Version="2.0">
 <Domain>
  <DataItem NumberType="UInt" Precision="4" Format="HDF" Dimensions="48542797">sumatra_compressed.h5:/partition</DataItem>
  <DataItem NumberType="Float" Precision="4" Format="HDF" Dimensions="101 48542797">sumatra_compressed.h5:/u</DataItem>
  <DataItem NumberType="Float" Precision="4" Format="HDF" Dimensions="101 48542797">sumatra_compressed.h5:/v</DataItem>
  <DataItem NumberType="Float" Precision="4" Format="HDF" Dimensions="101 48542797">sumatra_compressed.h5:/w</DataItem>
  <Grid Name="TimeSeries" GridType="Collection" CollectionType="Temporal">
  <Grid Name="SpaceSeries" GridType="Collection" CollectionType="Spatial">
  <Grid Name="step_000000000050_p0" GridType="Uniform">
    <Topology TopologyType="Tetrahedron" NumberOfElements="108355">
      <DataItem ItemType="HyperSlab" Dimensions="108355 4">
        <DataItem NumberType="UInt" Precision="8" Format="XML" Dimensions="3 2">0 0 1 1 108355 4</DataItem>
        <DataItem NumberType="Int" Precision="8" Format="HDF" Dimensions="48542797 4">sumatra_compressed.h5:/connect</DataItem>
      </DataItem>
    </Topology>
    <Geometry GeometryType="XYZ" NumberOfElements="8400027">
      <DataItem NumberType="Float" Precision="4" Format="HDF" Dimensions="8400027 3">sumatra_compressed.h5:/geometry</DataItem>
    </Geometry>
    <Time Value="250"/>
    <Attribute Name="u" Center="Cell">
     <DataItem ItemType="HyperSlab" Dimensions="108355">
      <DataItem NumberType="UInt" Precision="8" Format="XML" Dimensions="3 2">50 0 1 1 1 108355</DataItem>
      <DataItem Reference="/Xdmf/Domain/DataItem[2]"/>
     </DataItem>
    </Attribute>
    <Attribute Name="v" Center="Cell">
     <DataItem ItemType="HyperSlab" Dimensions="108355">
      <DataItem NumberType="UInt" Precision="8" Format="XML" Dimensions="3 2">50 0 1 1 1 108355</DataItem>
      <DataItem Reference="/Xdmf/Domain/DataItem[3]"/>
     </DataItem>
    </Attribute>
    <Attribute Name="w" Center="Cell">
     <DataItem ItemType="HyperSlab" Dimensions="108355">
      <DataItem NumberType="UInt" Precision="8" Format="XML" Dimensions="3 2">50 0 1 1 1 108355</DataItem>
      <DataItem Reference="/Xdmf/Domain/DataItem[4]"/>
     </DataItem>
    </Attribute>
   </Grid>
  <Grid Name="step_000000000050_p1" GridType="Uniform">
  ...
  <Grid Name="step_000000000050_p447" GridType="Uniform">
  ...

where the suffix _p0 stands for “process 0”. This works quite well: I can also select just a subset of the partitions, and ParaView shows them correctly. The only problem is that, although the data is partitioned, each MPI process reads the complete timestep instead of only its own part of it. Because a single timestep is already large, this causes memory problems. It is also the reason why I can’t use D3 here: its memory footprint is much too high.
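The HyperSlab selections in the file follow XDMF's start/stride/count layout. For _p0, `0 0 1 1 108355 4` reads 108355 rows of 4 node ids starting at row 0 of `/connect`, and `50 0 1 1 1 108355` reads row 50 (the time step) of u, v and w for the same 108355 cells. Below is a minimal sketch of how the selections for every `_pN` could be derived from the `/partition` array. It assumes the cells in `/connect` are already sorted by partition id. The contiguous slab for `_p0` suggests this, but it is still an assumption. `partition_slabs`, `topology_selection` and `attribute_selection` are hypothetical helper names, and reading `/partition` from HDF5 (e.g. with h5py) is left out:

```python
from itertools import groupby


def partition_slabs(partition_ids):
    """Return one (start, count) pair per partition id 0, 1, 2, ...

    Assumes the cells are already sorted by partition id, so that every
    partition occupies one contiguous block of rows in /connect.
    """
    slabs = []
    start = 0
    for pid, run in groupby(partition_ids):
        count = sum(1 for _ in run)
        # A repeated id (unsorted cells) or a skipped id (empty partition)
        # breaks the one-slab-per-partition layout.
        if pid != len(slabs):
            raise ValueError(
                f"expected partition {len(slabs)}, found {pid}: "
                "ids must be sorted with no empty partitions")
        slabs.append((start, count))
        start += count
    return slabs


def topology_selection(start, count, nodes_per_cell=4):
    # "start_row start_col  stride_row stride_col  count_rows count_cols"
    return f"{start} 0 1 1 {count} {nodes_per_cell}"


def attribute_selection(step, start, count):
    # Row = time step index, columns = the cells of this partition.
    return f"{step} {start} 1 1 1 {count}"


# Toy partition array with 3 partitions instead of the real 448.
for p, (start, count) in enumerate(partition_slabs([0, 0, 0, 1, 1, 2])):
    print(f"step_000000000050_p{p}:", topology_selection(start, count),
          "|", attribute_selection(50, start, count))
```

With the real data, `topology_selection(0, 108355)` and `attribute_selection(50, 0, 108355)` reproduce the two selection strings used for `_p0` above.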

Is this a bug, or am I doing something wrong?
I am using ParaView 5.2.0, which is the latest version running on our cluster, as 5.5.0 is not supported there yet.

Thanks in advance and regards

Chris

ChriWiChris · August 14, 2018, 12:39pm

Hi,

I just figured out what the real problem is: the simulation data is split correctly between the processes, but the geometry, which is the largest part of my data, is read in full by every process.
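Unlike its Topology and Attributes, the Geometry of each `_pN` grid is not a HyperSlab. It points at the whole `/geometry` dataset (8400027 points), so every process reads all of it. Because the connectivity stores global point ids, slicing the points alone would break it. One possible way around this is sketched below, under the assumption that the HDF5 file can be rewritten: give each partition its own point list and renumber its connectivity to local ids. `localize_partition` is a hypothetical helper, and the sketch uses plain Python lists instead of HDF5 datasets:

```python
def localize_partition(cells, points):
    """Renumber one partition to local point ids.

    cells:  list of tuples of global point ids (this partition's rows of /connect)
    points: sequence of coordinates indexed by global point id (/geometry)
    Returns (local_cells, local_points, global_ids), where global_ids[i] is the
    original id of local point i (useful for mapping point data back).
    """
    # Sorted unique global ids referenced by this partition.
    global_ids = sorted({gid for cell in cells for gid in cell})
    to_local = {gid: i for i, gid in enumerate(global_ids)}
    # Same cell order as before, so cell attributes (u, v, w) still line up.
    local_cells = [tuple(to_local[gid] for gid in cell) for cell in cells]
    # Only the points this partition touches; points on partition borders
    # end up duplicated in every partition that uses them.
    local_points = [points[gid] for gid in global_ids]
    return local_cells, local_points, global_ids


# Two tetrahedra sharing three nodes, in a toy global mesh of 13 points.
points = [(float(i), 0.0, 0.0) for i in range(13)]
cells, pts, ids = localize_partition([(5, 7, 9, 11), (7, 9, 11, 12)], points)
print(cells)  # [(0, 1, 2, 3), (1, 2, 3, 4)]
print(ids)    # [5, 7, 9, 11, 12]
```

Each `_pN` grid could then point its Geometry at its own, much smaller, point array and its Topology at the renumbered connectivity. The cell attribute slabs stay unchanged because the cells keep their order.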

So my question is:
Is there a good way to convert prepartitioned XDMF into VTK?
Using D3 is not an option because of its high memory footprint. My idea was to run the D3 algorithm sequentially and have it split the data into p pieces. Is this possible, or should I look for another option?
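For the VTK route, a common layout is one `.vtu` file per partition plus a `.pvtu` index that lists them as pieces. ParaView's parallel XML reader can then distribute the pieces among the ranks, so no single process has to load the full mesh. Below is a minimal sketch of building such an index with the standard library. It assumes the per-partition `.vtu` files are written separately (for example from the localized partitions) under the hypothetical names `sumatra_p0.vtu`, `sumatra_p1.vtu`, and so on. It also assumes they store the points and the cell arrays u, v and w as Float32, like the HDF5 data:

```python
import xml.etree.ElementTree as ET


def pvtu_index(piece_files, cell_arrays=("u", "v", "w")):
    """Build the text of a .pvtu file that references one .vtu per partition."""
    root = ET.Element("VTKFile", type="PUnstructuredGrid", version="0.1",
                      byte_order="LittleEndian")
    grid = ET.SubElement(root, "PUnstructuredGrid", GhostLevel="0")
    # Point coordinates: 3 components, Float32 to match Precision="4" in the XDMF.
    ppoints = ET.SubElement(grid, "PPoints")
    ET.SubElement(ppoints, "PDataArray", type="Float32", NumberOfComponents="3")
    # Cell-centred arrays, declared once for all pieces.
    pcells = ET.SubElement(grid, "PCellData")
    for name in cell_arrays:
        ET.SubElement(pcells, "PDataArray", type="Float32", Name=name)
    # One Piece entry per partition file.
    for path in piece_files:
        ET.SubElement(grid, "Piece", Source=path)
    return ET.tostring(root, encoding="unicode")


p = 448  # partitions in the XDMF file (_p0 ... _p447)
text = pvtu_index([f"sumatra_p{k}.vtu" for k in range(p)])
print(text[:120])
```

Prefixing `text` with `<?xml version="1.0"?>` and saving it next to the pieces gives one index per time step. This splits the work the way the XDMF partitions already do, instead of asking D3 to repartition the mesh.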

Thanks in advance

Chris