Xdmf and parallelized file reading and processing

Dear all,

I couldn’t quite find this information on the web, so I decided to just go ahead and ask:

  • When opening an Xdmf file I am asked to choose between the following three readers. Are the first two equivalent?
    Someone else claimed to see different behaviour for all three of them:
    https://public.kitware.com/pipermail/paraview/2016-April/036756.html
    The docs (https://www.paraview.org/ParaView/Doc/Nightly/www/py-doc/paraview.simple.XDMFReader.html) are not helpful either.

    • Xdmf3 Reader,
    • Xdmf3 Reader (Top Level Partition),
    • Xdmf Reader
  • My .xmf file typically looks like the one below. I have to load it using the Xdmf Reader, because the other two cannot handle HyperSlabs. Can I expect the referenced file to be read with MPI-parallelized file access?
    I did a test on a 16 GB file and noticed that all my pvserver processes had a CPU load of 100%, but only one process’s memory consumption was increasing during the reading phase. The reading felt slow, but I’m not sure here.

  • Some Xdmf2-reader was claimed to do automatic parallel decomposition of structured data under some circumstances (“The xdmf3 reader does not do automatic parallel decomposition of structured data like xdmf2 reader did.”, https://gitlab.kitware.com/paraview/paraview/issues/17295). Where is that Xdmf2 reader hidden?
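In case it helps others hitting the same choice: a small stdlib-Python check for whether a given .xmf file contains HyperSlab DataItems (which, per the observation above, only the legacy Xdmf Reader handles). This is just a sketch; note that older Xdmf files spell the attribute `Type` while newer ones use `ItemType` (the sample below in this mail actually carries both), so the check looks at either.

```python
import xml.etree.ElementTree as ET

def uses_hyperslab(xmf_text):
    """Return True if any DataItem in the Xdmf document is a HyperSlab.

    Such files need the legacy "Xdmf Reader"; the two Xdmf3 readers
    reportedly cannot handle HyperSlab items.
    """
    root = ET.fromstring(xmf_text)
    for item in root.iter("DataItem"):
        # Older files use Type=, newer ones ItemType= -- check both.
        if "HyperSlab" in (item.get("ItemType", ""), item.get("Type", "")):
            return True
    return False

# Minimal stand-in document (real files would be read from disk first):
sample = """<Xdmf Version="2.0"><Domain><Grid>
  <Attribute Name="u">
    <DataItem ItemType="HyperSlab" Type="HyperSlab" Dimensions="4 4 4"/>
  </Attribute>
</Grid></Domain></Xdmf>"""

print(uses_hyperslab(sample))  # -> True
```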

If I cannot get ParaView to read my (PETSc) binary files in parallel, then post-processing is not feasible this way.

My backup plan is to write a parallelized Fortran code that would load the binary files and hand them over to ParaView using the in-situ/Catalyst library: https://www.paraview.org/in-situ/
The same code could then of course be integrated into the main simulation code to avoid the storage of intermediate files if desired. For now though, I’m concerned with the processing of already existing data.

I would really appreciate it if someone could shed some light on this issue.

Sample .xmf file

<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" [
  <!ENTITY xst   "6.250000000000710E-004 1.250000000000000E-003 1.875000000000071E-003 ...">
  <!ENTITY yst   "0.000000000000000E+000 6.249999999999977E-004 1.249999999999995E-003 ...">
  <!ENTITY zst   "6.250000000000000E-004 1.250000000000000E-003 1.875000000000000E-003 ...">
  <!ENTITY xce   "3.125000000000355E-004 9.375000000000355E-004 1.562500000000036E-003 ...">
  <!ENTITY yce   "-3.124999999999989E-004 3.124999999999989E-004 9.374999999999966E-004 ...">
  <!ENTITY zce   "3.125000000000000E-004 9.375000000000000E-004 1.562500000000000E-003 ...">
  <!ENTITY gridx "0 6.250000000000710E-004 1.250000000000000E-003 1.875000000000071E-003 ...">
  <!ENTITY gridy "-0.000625 0.000000000000000E+000 6.249999999999977E-004 1.249999999999995E-003 ...">
  <!ENTITY gridz "0 6.250000000000000E-004 1.250000000000000E-003 1.875000000000000E-003 ...">
  <!ENTITY gridNx "2049" >
  <!ENTITY gridNy "339" >
  <!ENTITY gridNz "1025" >
  <!ENTITY Nx     "2048" >
  <!ENTITY Ny     "338" >
  <!ENTITY Nz     "1024" >
]>

<Xdmf Version="2.0" xmlns:xi="http://www.w3.org/2001/XInclude" >

  <Domain>

    <Grid Name="staggered_east" GridType="Uniform">
        
        <Topology TopologyType="3DRectMesh" NumberOfElements="&Nz; &Ny; &Nx;"/>
        
        <Geometry GeometryType="VXVYVZ">
          <DataItem Dimensions="&Nx;" NumberType="Float" Precision="4" Format="XML"> &xst; </DataItem>
          <DataItem Dimensions="&Ny;" NumberType="Float" Precision="4" Format="XML"> &yce; </DataItem>
          <DataItem Dimensions="&Nz;" NumberType="Float" Precision="4" Format="XML"> &zce; </DataItem>
        </Geometry>
      
        <Attribute Name="u" AttributeType="Scalar" Center="Node">
          
          <DataItem ItemType="HyperSlab" Dimensions="&Nz; &Ny; &Nx;" Type="HyperSlab">
            
            <DataItem Dimensions="3 4" Format="XML">
              0 0 0 0
              1 1 1 1
              &Nz; &Ny; &Nx; 1
            </DataItem>
            
            <DataItem 
              ItemType="Uniform"
              Name="u_raw"
              Format="Binary"
              Dimensions="&Nz; &Ny; &Nx; 3"
              DataType="Float" Precision="8" Endian="Big" Seek="8"
              >
              uvw_000642.bin
            </DataItem>
            
          </DataItem>
          
        </Attribute>

    </Grid>

  </Domain>

</Xdmf>
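For reference, a minimal pure-Python sketch of the layout this HyperSlab assumes and what it selects: Seek="8" skips an 8-byte header (for a PETSc Vec binary that would presumably be the two big-endian int32s for classid and length), the payload is big-endian float64 interleaved as (Nz, Ny, Nx, 3), and the start/stride/count triple (0 0 0 0 / 1 1 1 1 / Nz Ny Nx 1) picks component 0, i.e. u, of every grid point. The file contents and sizes here are toy stand-ins, not my real data.

```python
import struct

# Tiny stand-in for uvw_000642.bin: an 8-byte header (what Seek="8"
# skips), followed by big-endian float64 values laid out as
# (Nz, Ny, Nx, 3), i.e. u,v,w interleaved per grid point.
Nz, Ny, Nx = 2, 2, 2
npts = Nz * Ny * Nx
values = [float(i) for i in range(npts * 3)]  # u,v,w,u,v,w,...
blob = b"\x00" * 8 + struct.pack(">%dd" % (npts * 3), *values)

def read_u(raw, seek=8):
    """Emulate the HyperSlab: skip the header, take component 0 of
    every point (every third big-endian double)."""
    n = (len(raw) - seek) // 8
    data = struct.unpack_from(">%dd" % n, raw, seek)
    return list(data[0::3])

u = read_u(blob)
print(u)  # -> [0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
```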

Hi Bastian,

I had/have a similar problem regarding parallel xdmf.

If you can’t find the Xdmf2 reader, you may need to recompile ParaView with the appropriate options enabled. All that I can find is for VTK:

//Request building vtkIOXdmf2
Module_vtkIOXdmf2:BOOL=ON

//Request building vtkIOXdmf3
Module_vtkIOXdmf3:BOOL=ON

//Request building vtkIOParallelXdmf3
Module_vtkIOParallelXdmf3:BOOL=ON

//Request building vtkxdmf2
Module_vtkxdmf2:BOOL=ON

//Request building vtkxdmf3
Module_vtkxdmf3:BOOL=ON
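For completeness, passing those cache entries at configure time would look roughly like this. This is a hypothetical invocation: the variable names above are from a VTK build tree, so check them against your own CMakeCache.txt before relying on them for a ParaView build.

```
# Hypothetical configure line; module names taken from the VTK cache
# entries above -- verify against your own CMakeCache.txt.
cmake -DModule_vtkIOXdmf2:BOOL=ON \
      -DModule_vtkxdmf2:BOOL=ON \
      -DModule_vtkIOParallelXdmf3:BOOL=ON \
      path/to/paraview/source
```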

But beware: I’m not an expert and am just figuring these things out myself, as those options are not very well documented (or at least I can’t find the documentation!).

Since I have a problem partitioning my large unstructured grid, I want to use the D3 filter, which does a great job. The only problem is that every process reads in the complete mesh, so the memory footprint is very high and not feasible in my case. That’s why I am trying to develop a sequential partitioning program that saves its output in the VTK format, as that format is very well supported in ParaView, and tests with a smaller dataset gave me very good parallel I/O results.
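The kind of balanced split I’m after can be sketched in a few lines. This is a toy 1-D slab split along the slowest axis, the sort of thing that works for structured data like Bastian’s; real unstructured partitioning (what D3 does) is of course much harder. The helper below is hypothetical, not anything ParaView ships:

```python
def slab_extents(nz, nranks):
    """Split nz layers into contiguous slabs, one per rank.

    Returns (start, stop) index pairs; the remainder layers go to the
    first ranks, so slab sizes differ by at most one layer.
    """
    base, rem = divmod(nz, nranks)
    extents, start = [], 0
    for r in range(nranks):
        stop = start + base + (1 if r < rem else 0)
        extents.append((start, stop))
        start = stop
    return extents

# e.g. 1024 z-layers spread over 6 pvserver ranks:
print(slab_extents(1024, 6))
```

Each rank would then seek to its own extent in the file and read only that slab, instead of every rank reading the whole mesh.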

Greetings,

Chris

Hi Chris,
We are running into the same issue. We have a single domain with 1000^3 data points in a structured grid defined in an HDF5 file. Based on memory usage patterns - and a brief glance at the source code - it appears that every rank reads in the entire domain prior to repartitioning. Did you end up switching to VTK?