Importing 2D structured cartesian mesh from multiple xdmf/hdf5 data files

I am trying to construct a 2D surface from a series of data files that each contain a portion of the structure rectangular mesh. I’d like paraview to read in each of the files and merge them into a single surface mesh. My data sets are large (TB-PB), so I’m not able to generate a new single data file containing the entire mesh/field. I’ve made an xdmf file which describes each of the data files. When I load one file, everything works. When I have ParaView load the group of xdmf files, it only displays the last file. I’m not sure what is wrong with what I’m doing. I made a Matlab script which represents what I’m trying to do (shown below).

In this sample problem, the total mesh is split between three regions with separate mesh/data files. For each region, the script creates an hdf5 domain file which stores the grid points along each axis of my mesh and an hdf5 solution file which holds the 2D field data. I’m trying to set this up in xdmf by defining a uniform grid type with 3DRectMesh topology. Since I only have the mesh nodes along each axis stored in the file(as opposed to full 2D arrays for each coordinate), I set GeometryType=“VXVYVZ” and read in the grid vectors in each direction (but have the z-direction only contain one value set to 0). Each of the three regions have a separate xdmf file associated with it and I load them as a group in paraview.

The cluster that I’m running on has ParaView 5.2 (it is the only version that the systems have installed and all I have to work with for now). When I load the three xdmf files, it only shows the data/properties corresponding to region 3(domain_3.h5 and field_data_3.h5). It is as though it overwrites the mesh each time it loads a new file rather than adding to the field. Does anyone have any suggestions on how I can get ParaView to build a single mesh from the separate data files?


for ind=1:3
    % Define dimension of the 2D-array
    Lx = 1.0;   nx = 2000; 
    Lt = 10.0;  nt = 5000;

    % Build axis of the mesh
    dx = Lx/(nx-1);  x = 0:dx:Lx;
    dt = Lt/(nt-1);  t = (ind-1)*Lt:dt:ind*Lt;

    % Build mesh rectangular grid
    [X,T] = meshgrid(t,x);

    % sample solution field: (changes direction for each file to make it easy to confirm that it loads correctly)
    if(rem(ind, 2) == 0)
        solution = sin(2*pi*X - 4*pi*T);
        solution = sin(2*pi*X + 4*pi*T);        

    % Create geometry file
    % x-axis

    % y-axis

    % z-axis

    % Create data file
    % write associated XDMF file:
    fileID = fopen("openWithParaView_"+ind+".xmf",'w');
    fprintf(fileID,'<?xml version="1.0" ?>\n');
    fprintf(fileID,'<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []>\n');
    fprintf(fileID,'<Xdmf Version="2.0">\n');
    fprintf(fileID,' <Domain>\n');
    fprintf(fileID,'  <Grid Name="mesh" GridType="Uniform">\n');
    fprintf(fileID,'   <Topology TopologyType="3DRectMesh" NumberOfElements="%d %d %d"/>\n',1,nt,nx);
    fprintf(fileID,'   <Geometry GeometryType="VXVYVZ">\n');
    fprintf(fileID,'    <DataItem Name="coordx" Dimensions="%g" NumberType="Float" Precision="%g" Format="HDF">\n',nx,8);
    fprintf(fileID,"     "+char("domain_"+ind+".h5:")+"/x_nodes\n");
    fprintf(fileID,'    </DataItem>\n');
    fprintf(fileID,'    <DataItem Name="coordy" Dimensions="%g" NumberType="Float" Precision="%g" Format="HDF">\n',nt,8);
    fprintf(fileID,"     "+char("domain_"+ind+".h5:")+"/t_nodes\n");
    fprintf(fileID,'    </DataItem>\n');
    fprintf(fileID,'    <DataItem Name="coordz" Dimensions="%g" NumberType="Float" Precision="%g" Format="HDF">\n',1,8);
    fprintf(fileID,"     "+char("domain_"+ind+".h5:")+"/z_nodes\n");
    fprintf(fileID,'    </DataItem>\n');
    fprintf(fileID,'   </Geometry>\n');

    % I seem to need this TimeType line or it loads each file as a different time step in an animation
    fprintf(fileID,'   <Time TimeType="Single" Value="%g"/>\n',0.0);

    fprintf(fileID,'   <Attribute Name="solution" AttributeType="Scalar" Center="Node">\n');
    fprintf(fileID,'    <DataItem Dimensions="%d %d %d" NumberType="Float" Precision="%d" Format="HDF">\n',1,nt,nx,8);
    fprintf(fileID,"     "+char("field_data_"+ind+".h5")+":/solution\n");
    fprintf(fileID,'    </DataItem>\n');
    fprintf(fileID,'   </Attribute>\n');
    fprintf(fileID,'  </Grid>\n');
    fprintf(fileID,' </Domain>\n');

Hi Bryce,
I am not sure I have fully understood your problem, but I believe the issue you are experiencing (only last file “visible” after importing N domains) is due to the way you structured your Xdmf files.

If I got it right, you have an XDMF file for each sub-domain; this make each file a kind of “closed” entity, represented by this XML like document (you have more or less the same header for each file).

One solution would be to write different entries for each sub-domain, append them in only one XDMF file, put only once the header with document description and main key attributes and try to load it.

A couple of remarks if you work in this way:

  • you may need to re-structure the data instead of appending them. Have you checked if xml.etree may be an option to write your XML?
  • I do not know if, at the end working like described, you will not reach the same performance bottle neck (in term of size of data) that you had at the beginning and drove you to choose this solution. Maybe you still gain something because of a different way paraview handles data if provided in different formats…

If you still experience performance issues due to data size I am not sure that this approach would help you solving the problem… I also deal with big data problem and one strategy I find helpful is to divide the output data to postprocess: so, even if the total topological domain is big, maybe the data to visualize can be split in different blocks (let’s say for structural analysis stresses, strains, displacements or for CFD eulerian speed, pressure,…). Could such a tip be applicable for your case?

Another tip that could boost performances would be to work (at least for output data to postprocess) with binary data instead of ASCII files. This could make the things more complex at the beginning, but could offer you a big jump…

Or, last strategy, combine the two things: split the data and, for each partition, convert it to binary and load the binary. I believe this would be, at least from performance point of view, the best solution.

Hope it helps!

Thanks for responding so quickly. The problem seems to be the way ParaVeiw reads the xdmf files. It seems to change depending on the ParaView version that I use. I tested it for Paraview 5.2.0, 5.4.1, 5.7.0-RC1, and 5.9.1. If I use the Matlab script (shown above) to make sample data for the 3 regions (3 domain, field and xmf files) I find that it only works for ParaView 5.4.1. For other versions, I find

Paraview 5.2.0:
It only loads the third region.

ParaView 5.4.1:
It has three xdmf reader options (“xdmf3 Reader(Top Level Partition)”, “xdmf3 reader”, and “xdmf reader”). If I choose the “xdmf3 Reader(Top Level Partition)”, it works fine. If I choose “xdmf3 reader”, the GUI remains interactive but it doesn’t display any of the field data/colorbar(the information tab says N/A in the extents area). If I choose “xdmf reader”, it only displays the third region.

ParaView 5.7.0-RC1
It has three xdmf reader options (“xdmf reader”, “xdmf3readerS”, and “xdmf3readerT”). If I choose “xdmf reader”, it displays one regions and then then it crashes (GUI menus stop working). If I choose “xdmf3readerS”, it loads correctly but the GUI menu stops working. If I choose “xdmf3readerT”, the GUI stays interactive but doesn’t display any data (the information tab says N/A in the extents area).

ParView 5.9.1
It segfaults and crashes immediately without displaying anything.

I’m not sure why the behavior changes so much depending on the version. I’m also not sure why each ParaView release changes the xdmf reader options.

As for your other comments:
1.) For the moment, I’m required to have the data formatted in this way. It’s too big to change and there are other software packages that rely on it being organized this way.
2.) You said I should use binary data. I am. The xdmf (in the xmf files) just tell paraview how to read my hdf5 files.
3.) I’m not sure what the xml.etree is. It could be worth exploring if I can’t get this to work.
4.) Eventually I’ll be concerned with performance. But this sample problem is small enough to run quickly on a laptop. ParaView should be able to handle this problem without issue. It just doesn’t seem to like the way the xdmf files are written and I’m not sure why.

Any ideas on how to re-write the xdmf in the *.xmf files to make this work?