Issue opening large VTKHDF file

Hi all!

First of all, thank you all for the work you are doing on this software, and for the help we will surely receive :slight_smile:
We are developing a CFD code (SOD2D BSC_SOD2D / sod2d_gitlab · GitLab) whose output results use the VTKHDF format.
Until now we have had no problems with it: we generate the output files and read them properly in ParaView. Since we use high-order Lagrangian elements, we have two options for saving the meshes/results:

  • Using high-order Lagrange hexahedra (we interpolate the results using an equidistant node distribution)
  • Linearising the mesh (we “transform” each p-order element by subdividing it into several first-order hexahedra)

Up to this point, no problems: everything was working well!

The problem arose last week, when we pushed the software further and computed a case on a mesh with more than 1 billion nodes. Trying to open the mesh or the results in ParaView produces the following error:

[…]
HDF5-DIAG: Error detected in HDF5 (1.12.1) thread 0:
  #000: /builds/gitlab-kitware-sciviz-ci/build/superbuild/hdf5/src/src/H5Dio.c line 179 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: /builds/gitlab-kitware-sciviz-ci/build/superbuild/hdf5/src/src/H5VLcallback.c line 2011 in H5VL_dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #002: /builds/gitlab-kitware-sciviz-ci/build/superbuild/hdf5/src/src/H5VLcallback.c line 1978 in H5VL__dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #003: /builds/gitlab-kitware-sciviz-ci/build/superbuild/hdf5/src/src/H5VLnative_dataset.c line 159 in H5VL__native_dataset_read(): could not get a validated dataspace from file_space_id
    major: Invalid arguments to routine
    minor: Bad value
  #004: /builds/gitlab-kitware-sciviz-ci/build/superbuild/hdf5/src/src/H5S.c line 266 in H5S_get_validated_dataspace(): selection + offset not within extent
    major: Dataspace
    minor: Out of range
( 202.168s) [pvserver.46 ]vtkHDFReaderImplementat:864 ERR| vtkHDFReader (0x1514c150): Error H5Dread start: 18446744071577530368, 140159467271832, 0 count: 1555968, 354777680, 354776672
( 202.168s) [pvserver.46 ] vtkHDFReader.cxx:440 ERR| vtkHDFReader (0x1514c150): Cannot read the Connectivity array
( 202.168s) [pvserver.46 ] vtkExecutive.cxx:753 ERR| vtkPVCompositeDataPipeline (0x1514fe70): Algorithm vtkFileSeriesReader(0x1514e570) returned failure for request: vtkInformation (0x15227980)
  Debug: Off
  Modified Time: 163221
  Reference Count: 1
  Registered Events: (none)
  Request: REQUEST_DATA
  FROM_OUTPUT_PORT: 0
  ALGORITHM_AFTER_FORWARD: 1
  FORWARD_DIRECTION: 0
[…]

I have checked the mesh/results files from our code and the values look OK (I think). I wonder whether the issue is related to int32 vs int64 for vtkIdType… This mesh goes above the int32 limit, and in fact we had to refactor our code for these cases so we could store larger global ids.
Of course, the mesh is partitioned across several ranks (5520 ranks for this particular case), so the local ids do not reach the int32 limit, but maybe reading the HDF5 file overflows the vtkIdType offset variable in vtkHDFReader.cxx. No idea, just a guess…
We have tried two different versions of ParaView (5.10.1 & 5.11) and get the same error. We asked our cluster’s support team, and they told us that the ParaView versions on the cluster are the precompiled ones, so I expect the VTK_USE_64BIT_IDS compilation flag to be set, but I cannot be sure…
In summary, our code can read and use the mesh file (stored in HDF5), but ParaView cannot open it and gives the error posted above. None of the meshes we generated before had any problem; the error only appeared with this very large mesh. Here is the h5dump output:

HDF5 "cube-5520.hdf" {
GROUP "VTKHDF" {
   ATTRIBUTE "Type" {
      DATATYPE H5T_STRING {
         STRSIZE 16;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
   }
   ATTRIBUTE "Version" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
   }
   GROUP "CellData" {
      DATASET "mpi_rank" {
         DATATYPE H5T_STD_U8LE
         DATASPACE SIMPLE { ( 1073741824 ) / ( 1073741824 ) }
      }
   }
   DATASET "Connectivity" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 8589934592 ) / ( 8589934592 ) }
   }
   GROUP "FieldData" {
   }
   DATASET "NumberOfCells" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 5520 ) / ( 5520 ) }
   }
   DATASET "NumberOfConnectivityIds" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 5520 ) / ( 5520 ) }
   }
   DATASET "NumberOfPoints" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 5520 ) / ( 5520 ) }
   }
   DATASET "Offsets" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 1073747344 ) / ( 1073747344 ) }
   }
   GROUP "PointData" {
   }
   DATASET "Points" {
      DATATYPE H5T_IEEE_F32LE
      DATASPACE SIMPLE { ( 1147407183, 3 ) / ( 1147407183, 3 ) }
   }
   DATASET "Types" {
      DATATYPE H5T_STD_U8LE
      DATASPACE SIMPLE { ( 1073741824 ) / ( 1073741824 ) }
   }
}
}

Has anyone faced similar issues with large VTKHDF files?

Any hints on how to solve/debug the problem? As a first attempt I’m thinking of ‘splitting’ the mesh into several smaller files, but of course that is not the desired solution.

It would be difficult to share the mesh files, since the mesh is 125 GB… but if you need them…

Thanks a lot!

FYI @Lucas_Givord @danlipsa @jfausty

Hi @jordimuela,

Thanks for posting and welcome to the ParaView discourse!

This definitely looks like some sort of overflow issue, where the size of the array gets passed through a 32-bit integer at some point. Would it be simple for you to switch your NumberOf* datasets to unsigned integers, to check whether it behaves better?
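For what it’s worth, here is a minimal sketch of what that change could look like on the writer side with the HDF5 C API (the dataset name follows the VTKHDF layout from your dump; the helper function is hypothetical and error checking is omitted; the only point is writing H5T_STD_U64LE instead of H5T_STD_I64LE):

#include <hdf5.h>
#include <vector>

// Hypothetical helper: write the per-rank cell counts as unsigned 64-bit
// (H5T_STD_U64LE) instead of the signed H5T_STD_I64LE shown in the dump.
void writeNumberOfCells(hid_t vtkhdfGroup, const std::vector<unsigned long long>& counts)
{
  hsize_t dims[1] = { counts.size() }; // one entry per rank/piece
  hid_t space = H5Screate_simple(1, dims, nullptr);
  hid_t dset = H5Dcreate2(vtkhdfGroup, "NumberOfCells", H5T_STD_U64LE, space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  H5Dwrite(dset, H5T_NATIVE_ULLONG, H5S_ALL, H5S_ALL, H5P_DEFAULT, counts.data());
  H5Dclose(dset);
  H5Sclose(space);
}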

Best,
Julien

Mmmm, I think it is possible; it is more a matter of recomputing the mesh and generating a new mesh file with the unsigned integers.

But I cannot see how that would help. Will the VTK reader inherit the unsigned integer type from the HDF5 file? Because if I check my current NumberOf* datasets, the values look OK, see:

https://drive.google.com/drive/folders/1sj3SAcNpb7hb-vpqvNT7pgx4YYGJ_l70?usp=sharing

BR

Will the VTK reader inherit the unsigned integer type from the HDF5 file?

Yes, the reader looks at the on-disk type and determines what kind of buffer to allocate accordingly.
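(Roughly speaking, and just as an illustration with the HDF5 C API rather than the literal VTK code, the on-disk type can be queried like this before allocating the buffer:)

hid_t type = H5Dget_type(datasetId);    // on-disk datatype of the dataset
if (H5Tget_class(type) == H5T_INTEGER)
{
  size_t bytes = H5Tget_size(type);     // e.g. 4 vs 8 bytes
  H5T_sign_t sign = H5Tget_sign(type);  // signed vs unsigned
  // ... allocate e.g. a vtkTypeInt64Array or vtkTypeUInt64Array accordingly
}
H5Tclose(type);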

However, I just realized that in your case an unsigned integer type might still overflow.

From these lines in the error message:

( 202.168s) [pvserver.46 ]vtkHDFReaderImplementat:864 ERR| vtkHDFReader (0x1514c150): Error H5Dread start: 18446744071577530368, 140159467271832, 0 count: 1555968, 354777680, 354776672
( 202.168s) [pvserver.46 ] vtkHDFReader.cxx:440 ERR| vtkHDFReader (0x1514c150): Cannot read the Connectivity array

The offsets the reader is using to read the connectivity are way off, and they could not even be represented by 32-bit integers. There must be some implicit type casting at some point that leads to this result.

Yes, I think numbers like 18446744071577530368 and 140159467271832 are basically garbage… Moreover, I think uint32 will not help: an unsigned int32 can store values from 0 to 4294967295, but for this mesh the Connectivity dataset is

DATASET "Connectivity" {
   DATATYPE H5T_STD_I64LE
   DATASPACE SIMPLE { ( 8589934592 ) / ( 8589934592 ) }
}

so my guess is that in this part of vtkHDFReader.cxx (https://gitlab.kitware.com/paul.lafoix/vtk/-/blob/master/IO/HDF/vtkHDFReader.cxx) a uint32 would also fail…

[…]
vtkIdType offset = std::accumulate(&numberOfCells[0], &numberOfCells[filePiece], filePiece);
if ((offsetsArray = vtk::TakeSmartPointer(
       this->Impl->NewMetadataArray("Offsets", offset, numberOfCells[filePiece] + 1))) == nullptr)
{
  vtkErrorMacro("Cannot read the Offsets array");
  return 0;
}
offset = std::accumulate(&numberOfConnectivityIds[0], &numberOfConnectivityIds[filePiece], 0);
if ((connectivityArray = vtk::TakeSmartPointer(this->Impl->NewMetadataArray(
       "Connectivity", offset, numberOfConnectivityIds[filePiece]))) == nullptr)
{
  vtkErrorMacro("Cannot read the Connectivity array");
  return 0;
}
cellArray->SetData(offsetsArray, connectivityArray);
[…]
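If filePiece and the literal 0 in those calls are 32-bit ints, then std::accumulate deduces int as its accumulator type, and the running sum is truncated long before it is stored in the 64-bit offset variable. A minimal standalone sketch of that pitfall (just an illustration, not the VTK code; the per-piece count is borrowed from the error log above):

#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
  // 5520 pieces with ~1.5M connectivity ids each (as in the error log),
  // summing to well above the int32 limit of 2147483647.
  std::vector<std::int64_t> ids(5520, 1555968);

  // The initial value is an int literal, so the accumulation happens in int:
  // the running sum is truncated to 32 bits even though 'bad' is 64-bit.
  std::int64_t bad = std::accumulate(ids.begin(), ids.end(), 0);

  // A 64-bit initial value keeps the whole accumulation 64-bit.
  std::int64_t good = std::accumulate(ids.begin(), ids.end(), std::int64_t{0});

  std::cout << "bad:  " << bad << "\n"    // garbage after truncation
            << "good: " << good << "\n";  // 5520 * 1555968 = 8588943360
  return 0;
}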

Hi @jfausty (and everyone who is interested in this issue :slight_smile: )!

I can confirm 100% that this is a 32-bit integer issue. I have written a tool that lets me ‘externally’ link the original HDF5 file that was failing while including only some of the ranks (the original mesh was partitioned into 5520 ranks).
If I build a ‘new’ mesh including only the first 1300 ranks, so that the “Connectivity” array has a size of 2022899200, I can open the mesh in ParaView. On the other hand, if I include the first 1400 ranks, with a “Connectivity” size of 2178507264, ParaView gives the same error described in the first post (the int32 limit is 2147483647, so these sizes fall just under/above it).
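In case anyone wants to reproduce the check on their own files: summing NumberOfConnectivityIds over the ranks and comparing against 2147483647 predicts whether a file will trip the error. A quick standalone sketch with the HDF5 C API (file and dataset paths as in the dump from the first post):

#include <hdf5.h>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
  hid_t file = H5Fopen("cube-5520.hdf", H5F_ACC_RDONLY, H5P_DEFAULT);
  hid_t dset = H5Dopen2(file, "/VTKHDF/NumberOfConnectivityIds", H5P_DEFAULT);
  hid_t space = H5Dget_space(dset);
  hsize_t n = 0; // number of ranks/pieces
  H5Sget_simple_extent_dims(space, &n, nullptr);

  std::vector<std::int64_t> counts(n);
  H5Dread(dset, H5T_NATIVE_INT64, H5S_ALL, H5S_ALL, H5P_DEFAULT, counts.data());

  std::int64_t total = 0; // 64-bit accumulator on purpose
  for (std::int64_t c : counts) total += c;
  std::cout << "total connectivity ids: " << total
            << (total > 2147483647LL ? " -> above the int32 limit\n"
                                     : " -> fits in int32\n");

  H5Sclose(space);
  H5Dclose(dset);
  H5Fclose(file);
  return 0;
}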

As mentioned in the first post, we are using the precompiled 5.10 & 5.11 versions on our cluster, so in theory they are built with 64-bit id support.

What I’m trying now is to compile ParaView 5.11 from source, and I have checked that the VTK_USE_64BIT_IDS flag is ON. I’m just wondering which is the ‘best’ option for the VTK_SMP_IMPLEMENTATION_TYPE flag; does it matter?
I’ll install and use this build to see whether it behaves differently from the precompiled versions that were failing.

BR,

Jordi

Hi @jfausty (and again, everyone who is interested or can be involved in this issue),

I was able to compile ParaView 5.11.1 from source on our cluster, so I am 100% sure that VTK_USE_64BIT_IDS = ON. Unfortunately, the error with this build is exactly the same as with the precompiled versions.
I have been checking the functions involved in the VTK library when opening VTKHDF files and reading the connectivity array, and I cannot find the source of the error. But as @jfausty mentioned, it must be that the size of the array gets passed through a 32-bit integer at some point… So, can we consider this a bug in the VTK library? What do you think/suggest?

BR

Hi @jordimuela,

I do believe this to be a bug in VTK. Thank you very much for running your tests.

The best thing to do now would be to open an issue on the VTK GitLab (https://gitlab.kitware.com/vtk/vtk) documenting it with a minimal reproducer, for future reference. We can track its progress there and continue the discussion.

(Don’t hesitate to tag me on the issue; my GitLab handle is julien.fausty.)

Best regards,
Julien

Ok! Thanks a lot @jfausty :wink:

I have opened the issue on the VTK GitLab, so let’s see. I mentioned you there, but I’m also adding the link here in case anyone else is interested:

https://gitlab.kitware.com/vtk/vtk/-/issues/19087

BR

Perfect! Thanks @jordimuela