Substantial Memory-use Differences between VTU and Multi-block XDMF2+HDF5

I am observing substantial differences in self-reported ParaView memory usage between a single-block VTU file and a multi-block XDMF2+HDF5 file of otherwise identical mesh and data. The VTU file consists of a single block of 100 slabs, each with 6,400 linear hexahedral elements, leading to a single mesh with 640,000 elements. The multi-block mesh consists of 100 separate blocks with 6,400 linear hexahedral elements each. All node and connectivity data are identical, and the same four cell-wise data sets are present in both cases.

These files are sized to be large enough to clearly display behavior within the uncertainty of memory accounting arithmetic. All data files (VTU+XDMF2+HDF5) are ~15 MiB total (zipped), so I don’t provide them here.

The stages reported are:

  1. Fresh load of ParaView 5.8.0 on macOS 10.14.6,
  2. Load & Apply the data file (VTU or XDMF2 via XDMF Reader) to make it appear in the Render view,
  3. Apply Mesh Quality filter,
  4. Delete Mesh Quality filter, and
  5. Delete data file.

ParaView memory utilization via the Memory inspector at the conclusion of each phase is:

  Stage   VTU       XDMF2+HDF5
  1       287 MiB   295 MiB
  2       599 MiB   3.45 GiB
  3       701 MiB   3.52 GiB
  4       566 MiB   3.44 GiB
  5       461 MiB   369 MiB


  1. Why is substantially more memory required for XDMF2+HDF5 versus VTU in this (as direct as possible) comparison?
  2. When the VTU filter and data file are deleted, why is 50% more memory consumed relative to prior to their loading? What is not getting cleared?

CC: @patchett2002

@jkulesza, mind emailing me the data offline? I’ll take a look to see what’s going on. I vaguely remember that Xdmf2 ended up taking a lot of space for the DOM that it built. The XML file could represent the DOM more compactly, but it ended up being “unrolled” in memory, and that had the potential to be a memory hog. That was years ago, so I won’t trust my recollection too much. That could, however, explain the jump in stage 2. Stage 5 is perplexing; worth looking to see if that’s really a leak or just misreporting.

@jkulesza, mind emailing me the data offline?

Sent to what I believe your email address is. Please let me know if you don’t see my message.


Thanks, I got the data. I can definitely see the memory difference between the two. I have an inkling of what’s going on. I’ll dig in a bit further and get back to you once I’ve confirmed the issue.

I have an inkling of what’s going on. I’ll dig in a bit further and get back to you once I’ve confirmed the issue.

Excellent, thanks! It will be good to understand the reporting/deallocation weirdness.

More importantly to me as a user-developer: anything that can be done to reduce the memory needs of XDMF will be a direct benefit. I’ve run production models using XDMF that exhaust memory when applying filters when a VTU equivalent does not.

From what I can see, the issue seems to be a limitation in Xdmf2 itself. The XDMF file has the following structure:

<Xdmf Version="2.0">
    <DataItem Name="Point Data" Format="HDF" DataType="Double" Dimensions="840500 3" Precision="8">

    <Grid Name="1_plate-1P1">
      <Geometry GeometryType="XYZ">
         <DataItem Reference="XML">/Xdmf/Domain/DataItem[@Name="Point Data"]</DataItem>
    <Grid Name="2_plate-1-lin-1-2P1">
      <Geometry GeometryType="XYZ">
         <DataItem Reference="XML">/Xdmf/Domain/DataItem[@Name="Point Data"]</DataItem>

You’re using DataItem Reference=... XML nodes to share the globally defined point coordinates. The issue is that this shared “reference” doesn’t imply that the point coordinates are read only once within the Xdmf library itself. By stepping into XdmfGeometry::Update, which gets called when reading each Grid, I noticed that it ends up reading the point coordinates each time and providing new arrays to the VTK reader. The VTK reader happily treats each one as a new point coordinates array and adds it to the block that it created for that Grid element. Thus you end up with the complete points array repeated for each of the 100 blocks, although each of the blocks is only using a few of the points.
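As a rough back-of-the-envelope check on how much this duplication could cost, here is a minimal Python sketch. It assumes the coordinates stay as 8-byte doubles in memory (per Precision="8" in the file) and that each of the 100 blocks holds one full copy of the 840500 x 3 array; it ignores connectivity, cell data, and any other overhead.

```python
# Estimate of memory spent on duplicated point coordinates.
# Assumptions: values taken from the XDMF header shown above,
# one full copy of the points array per block, no other overhead.
N_POINTS = 840_500       # Dimensions="840500 3"
N_COMPONENTS = 3         # x, y, z
BYTES_PER_VALUE = 8      # Precision="8" (double)
N_BLOCKS = 100           # one Grid element per block

MIB = 1024 ** 2
GIB = 1024 ** 3


def points_bytes(n_points, n_components=N_COMPONENTS,
                 bytes_per_value=BYTES_PER_VALUE):
    """Size in bytes of one dense coordinate array."""
    return n_points * n_components * bytes_per_value


per_copy = points_bytes(N_POINTS)
duplicated = per_copy * N_BLOCKS

print(f"one copy:   {per_copy / MIB:.1f} MiB")     # one copy:   19.2 MiB
print(f"{N_BLOCKS} copies: {duplicated / GIB:.2f} GiB")  # 100 copies: 1.88 GiB
```

Under those assumptions the repeated coordinates alone come to about 1.9 GiB, which is the same order as the roughly 2.9 GiB gap between the two formats at stage 2. This estimate does not say where the remainder goes.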

Thus you end up with the complete points array repeated for each of the 100 blocks

OK…that makes sense. The approach of using a global node array was to ensure that all points are uniquely defined and that no shared points disagree in position by floating-point roundoff. Do you know of another way to address this concern in a more-efficient manner?

Splitting the points by blocks may be the easiest. You shouldn’t really have any floating-point rounding issues since this is simply a subsetting of the points, so it should not require creation of new points.
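A minimal sketch of that subsetting, in plain Python, in case it helps. The function name `split_block_points` and the tiny two-cell mesh are made up for illustration; the idea is just to keep the points a block actually references and renumber its connectivity to match. Coordinates are copied, not recomputed, so shared nodes stay bit-identical across blocks.

```python
def split_block_points(global_points, block_connectivity):
    """Return (local_points, local_connectivity) for one block.

    global_points: list of (x, y, z) tuples shared by all blocks.
    block_connectivity: list of cells, each a list of global node ids.
    """
    # Global node ids referenced by this block, in ascending order.
    used = sorted({node for cell in block_connectivity for node in cell})
    # Map each global id to its position in the block-local array.
    global_to_local = {g: l for l, g in enumerate(used)}
    # Copy (not recompute) the coordinates, so values are unchanged.
    local_points = [global_points[g] for g in used]
    local_connectivity = [[global_to_local[g] for g in cell]
                          for cell in block_connectivity]
    return local_points, local_connectivity


# Hypothetical toy mesh: 6 shared points, two quads sharing an edge.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
          (0.0, 0.1, 0.0), (0.1, 0.1, 0.0), (0.2, 0.1, 0.0)]
block_a = [[0, 1, 4, 3]]
block_b = [[1, 2, 5, 4]]

pts_a, conn_a = split_block_points(points, block_a)
pts_b, conn_b = split_block_points(points, block_b)

print(conn_b)                    # [[0, 1, 3, 2]]
print(pts_a[1] == pts_b[0])      # True: shared node is identical in both
```

With NumPy, `np.unique(connectivity.ravel(), return_inverse=True)` gives the same sorted id list and renumbering in one call. Each block's points and connectivity would then be written as that block's own datasets instead of referencing the global array.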

Splitting the points by blocks may be the easiest.

That makes sense. I’d argued against that, but the idea that the points are being written from a common source does suggest the same values will be written…

oops … yes :), I’ll edit my original post to avoid confusion.