Apply spatial filter to structured grid before loading the data

pgrete · October 11, 2023, 10:10am

I’m trying to analyze a 2.7TB output containing a structured grid (a non-overlapping multilevel mesh where each block contains 128^3 cells with 10 variables).

I’m mostly interested in the central region of the domain (say the innermost 2048^3 cells).
I was able to load a single variable and then “Resample to image” the innermost region to a coarser-than-original resolution. However, trying to load more variables always results in crashes, even without applying any filter and even though I should be using enough nodes (aggregate memory of 8TB).

So I’m now trying to reduce the total amount of data loaded in the first place, but I’m struggling to figure out how to select blocks based on their spatial location.
How can I load the dataset (but not the data itself yet), use the spatial location to determine block ids, and then load only (selected) variables from those blocks?
As far as I understand the documentation, I should probably use the SpreadSheetView (which unfortunately crashes when I open it) or FindData to determine the blocks, but I couldn’t figure out how to select a spatial region.
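As a rough illustration of the block selection asked about here, the following sketch picks block ids from per-block face coordinates outside of ParaView. It assumes one row of monotonically increasing face coordinates per block and per axis (the layout of /Locations/x, /Locations/y and /Locations/z in the XDMF posted further down in this thread); the function names and the box arguments are hypothetical.

```python
def block_extent(faces):
    """Return (min, max) of one block's face coordinates along one axis."""
    return float(faces[0]), float(faces[-1])


def blocks_in_box(x_faces, y_faces, z_faces, lo, hi):
    """Return the indices of all blocks whose extent overlaps the box [lo, hi].

    x_faces, y_faces, z_faces: one row of face coordinates per block
    (assumed sorted in increasing order within each row).
    lo, hi: (x, y, z) corners of the region of interest.
    """
    selected = []
    for block_id, rows in enumerate(zip(x_faces, y_faces, z_faces)):
        overlaps = True
        for axis, faces in enumerate(rows):
            bmin, bmax = block_extent(faces)
            # Blocks that merely touch the box on a face do not count.
            if bmax <= lo[axis] or bmin >= hi[axis]:
                overlaps = False
                break
        if overlaps:
            selected.append(block_id)
    return selected


# Tiny synthetic example: two unit blocks side by side along x.
x = [[0.0, 0.5, 1.0], [1.0, 1.5, 2.0]]
y = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
z = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
print(blocks_in_box(x, y, z, lo=(1.2, 0.0, 0.0), hi=(1.8, 1.0, 1.0)))  # [1]
```

For the real file, the three coordinate arrays could presumably be read with h5py (an assumption, not something tried in this thread), e.g. `h5py.File("parthenon.restart.00089.rhdf", "r")["/Locations/x"][:]`. Each of them holds only 17088 x 129 doubles, so this step stays cheap compared to the cell data.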

I’m also open to other suggestions, if I’m doing something fundamentally wrong.

Thanks,

Philipp

PS: I tried ParaView 5.9.1, 5.10.0, 5.11.0, and 5.11.2

nicolas.vuaille · October 11, 2023, 12:55pm

Hi!

Which file format are you loading in ParaView?
ParaView has support for Adaptive Mesh Refinement (AMR) data, which looks similar to your case.

In any case, a crash should not occur, so feel free to report an issue on our bug tracker: https://gitlab.kitware.com/paraview/paraview/-/issues.

pgrete · October 11, 2023, 1:20pm

The data on disk is a single HDF5 file with the variables stored in the following dataset:

   DATASET "cons" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 17088, 10, 128, 128, 128 ) / ( 17088, 10, 128, 128, 128 ) }
      STORAGE_LAYOUT {
         CONTIGUOUS
         SIZE 2866890670080
         OFFSET 107208194
      }
      FILTERS {
         NONE
      }
      FILLVALUE {
         FILL_TIME H5D_FILL_TIME_NEVER
         VALUE  H5D_FILL_VALUE_DEFAULT
      }
      ALLOCATION_TIME {
         H5D_ALLOC_TIME_EARLY
      }
   }
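
As a quick sanity check on the numbers in this dump, the dataset size can be recomputed from the shape and the 8-byte float type. This is plain arithmetic on the values shown above; the variable names are only illustrative.

```python
# Dataset shape as reported by h5dump: blocks, variables, then three cell axes.
shape = (17088, 10, 128, 128, 128)
bytes_per_value = 8  # H5T_IEEE_F64LE is a 64-bit float

n_values = 1
for extent in shape:
    n_values *= extent
total_bytes = n_values * bytes_per_value

# One variable in one block: 128^3 values of 8 bytes each.
bytes_per_block_variable = 128**3 * bytes_per_value

print(total_bytes)               # 2866890670080, matches SIZE in the dump
print(bytes_per_block_variable)  # 16777216 bytes, i.e. 16 MiB
```

So a single variable across all 17088 blocks is one tenth of the file, roughly 287 GB, and every block that can be skipped saves 16 MiB per variable.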

I’m reading this with the “XDMF Reader” using a separate XDMF file:

<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd">
<Xdmf Version="3.0">
<Information Name="TimeVaryingMetaData" Value="True"/>
  <Domain>
  <Grid Name="Mesh" GridType="Collection">
    <Information Name="Cycle" Value="748126"/>
    <Time Value="1.78"/>
    <Grid GridType="Uniform" Name="0">
      <Topology TopologyType="3DRectMesh" Dimensions="129 129 129"/>
      <Geometry GeometryType="VXVYVZ">
        <DataItem ItemType="HyperSlab" Dimensions="129"><DataItem Dimensions="3 2" NumberType="Int" Format="XML">0 0 1 1 1 129</DataItem>
          <DataItem Format="HDF" Dimensions=" 17088 129" Name="x" NumberType="Float" Precision="8">
            parthenon.restart.00089.rhdf:/Locations/x
          </DataItem>
        </DataItem>
        <DataItem ItemType="HyperSlab" Dimensions="129"><DataItem Dimensions="3 2" NumberType="Int" Format="XML">0 0 1 1 1 129</DataItem>
          <DataItem Format="HDF" Dimensions=" 17088 129" Name="y" NumberType="Float" Precision="8">
            parthenon.restart.00089.rhdf:/Locations/y
          </DataItem>
        </DataItem>
        <DataItem ItemType="HyperSlab" Dimensions="129"><DataItem Dimensions="3 2" NumberType="Int" Format="XML">0 0 1 1 1 129</DataItem>
          <DataItem Format="HDF" Dimensions=" 17088 129" Name="z" NumberType="Float" Precision="8">
            parthenon.restart.00089.rhdf:/Locations/z
          </DataItem>
        </DataItem>
      </Geometry>
      <Attribute Name="cons_density" Center="Cell">
        <DataItem ItemType="HyperSlab" Dimensions="128 128 128 ">
          <DataItem Dimensions="3 5" NumberType="Int" Format="XML">0 0 0 0 0  1 1 1 1 1 1 1 128 128 128</DataItem>
          <DataItem Format="HDF" Dimensions=" 17088 10 128 128 128" Name="cons" NumberType="Float" Precision="8">
            parthenon.restart.00089.rhdf:/cons
          </DataItem>
        </DataItem>
      </Attribute>
      <Attribute Name="cons_momentum_density_1" Center="Cell">
        <DataItem ItemType="HyperSlab" Dimensions="128 128 128 ">
          <DataItem Dimensions="3 5" NumberType="Int" Format="XML">0 1 0 0 0  1 1 1 1 1 1 1 128 128 128</DataItem>
          <DataItem Format="HDF" Dimensions=" 17088 10 128 128 128" Name="cons" NumberType="Float" Precision="8">
            parthenon.restart.00089.rhdf:/cons
          </DataItem>
        </DataItem>
      </Attribute>
...

[edit] Regarding the crashes, as far as I can tell they currently boil down to running out of memory, which is why I’m trying to reduce the memory footprint. If it turns out to be something else, I’ll file an issue. [/edit]
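
One possible way to reduce the footprint, sketched here and not tested against this dataset, is to generate a smaller XDMF file that references only the blocks and variables of interest. The sketch reuses the hyperslab layout from the XDMF above (start, stride and count rows for the block and variable indices). The helper names and the selected block ids are hypothetical, the Cycle and Time entries are left out for brevity, and whether the reader then really touches only the referenced hyperslabs is an assumption.

```python
H5FILE = "parthenon.restart.00089.rhdf"  # file name taken from the XDMF above
N_BLOCKS, N_VARS, N_CELLS = 17088, 10, 128


def axis_item(block_id, axis):
    """Hyperslab selecting one block's row of face coordinates along `axis`."""
    n_faces = N_CELLS + 1
    return (
        f'<DataItem ItemType="HyperSlab" Dimensions="{n_faces}">'
        f'<DataItem Dimensions="3 2" NumberType="Int" Format="XML">'
        f'{block_id} 0 1 1 1 {n_faces}</DataItem>'
        f'<DataItem Format="HDF" Dimensions="{N_BLOCKS} {n_faces}" Name="{axis}" '
        f'NumberType="Float" Precision="8">{H5FILE}:/Locations/{axis}</DataItem>'
        f'</DataItem>'
    )


def attribute_item(block_id, name, component):
    """Hyperslab selecting one component of /cons for one block."""
    n = N_CELLS
    return (
        f'<Attribute Name="{name}" Center="Cell">'
        f'<DataItem ItemType="HyperSlab" Dimensions="{n} {n} {n}">'
        f'<DataItem Dimensions="3 5" NumberType="Int" Format="XML">'
        f'{block_id} {component} 0 0 0 1 1 1 1 1 1 1 {n} {n} {n}</DataItem>'
        f'<DataItem Format="HDF" Dimensions="{N_BLOCKS} {N_VARS} {n} {n} {n}" '
        f'Name="cons" NumberType="Float" Precision="8">{H5FILE}:/cons</DataItem>'
        f'</DataItem></Attribute>'
    )


def build_xdmf(block_ids, variables):
    """Return XDMF text with one Uniform grid per selected block.

    variables maps an attribute name to its component index in /cons.
    """
    n_faces = N_CELLS + 1
    parts = [
        '<?xml version="1.0" ?>',
        '<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd">',
        '<Xdmf Version="3.0">',
        '<Domain>',
        '<Grid Name="Mesh" GridType="Collection">',
    ]
    for block_id in block_ids:
        parts.append(f'<Grid GridType="Uniform" Name="{block_id}">')
        parts.append(
            f'<Topology TopologyType="3DRectMesh" '
            f'Dimensions="{n_faces} {n_faces} {n_faces}"/>'
        )
        parts.append('<Geometry GeometryType="VXVYVZ">')
        parts.extend(axis_item(block_id, axis) for axis in "xyz")
        parts.append('</Geometry>')
        parts.extend(
            attribute_item(block_id, name, comp) for name, comp in variables.items()
        )
        parts.append('</Grid>')
    parts += ['</Grid>', '</Domain>', '</Xdmf>']
    return "\n".join(parts) + "\n"


# Hypothetical selection: two block ids and a single variable.
xdmf_text = build_xdmf([5, 42], {"cons_density": 0})
```

Writing `xdmf_text` to a file next to the HDF5 file and opening that with the XDMF reader should then expose only the listed blocks, each with 16 MiB per selected variable.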

nicolas.vuaille · October 12, 2023, 7:48am

I fear this is not easily doable in ParaView as of today.

The AMR data format should be the right way to go, but it would need some development to support this option in the reader…

pgrete · October 12, 2023, 12:36pm

Would the answer be different for a different data format, e.g., OpenPMD/ADIOS2?

nicolas.vuaille · October 12, 2023, 3:13pm

I do not really know OpenPMD.

Other hierarchical data formats, such as Exodus and CGNS, may be OK. Note that they will not let you choose an arbitrary spatial location. Instead, they offer a predefined, checkable set of blocks that can be loaded.