OpenFOAM + XDMF + VTKHDF parallel file loading development

osarusan · October 20, 2024, 12:11pm

Hi There,

I’m looking to resolve performance issues when loading OpenFOAM, XDMF3 and VTKHDF data in ParaView. It has been very exciting to follow the development of VTKHDF, but I don’t think the issues I observed with XDMF3 have been fixed in VTKHDF, and OpenFOAM case loading has much the same problems.

The basic issue in all of them is the same whether ParaView runs as a single process or as many: files and metadata need to be loaded in parallel. The main reason is that CFD cases are processed with a high degree of parallelism and are best left in their distributed file layout. OpenFOAM writes nprocessor * ntimestep * nfield files; HDF5-based solutions write at least ntimestep * nprocessor files, and may split finer still. This means 10k+ files, and getting the metadata may require several file seeks and intermediate reads per file. Distributed filesystems often have latency on the order of 1-10 milliseconds, so when that many files are opened one after another, the ParaView UI is slow to load data or seek to a new timestep, even when there is not much data.
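To put rough numbers on this, here is a back-of-the-envelope estimate. It assumes every file access waits for the previous one to finish; the function name and the `round_trips_per_file` parameter are illustrative assumptions, not measurements from ParaView.

```python
def sequential_load_seconds(n_files, latency_s, round_trips_per_file=1):
    """Estimate wall time when every file access waits on the previous one."""
    return n_files * round_trips_per_file * latency_s

# 10k files at the 1 ms and 10 ms latencies mentioned above,
# assuming a single round trip per file (an optimistic assumption).
for latency_ms in (1, 10):
    seconds = sequential_load_seconds(10_000, latency_ms / 1000)
    print(f"{latency_ms} ms latency: about {seconds:.0f} s")
```

With several seeks per file, the estimate scales up by the same factor.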

However, it is possible to write code that does all file opens and reads in parallel. This can be done with threads or processes, on a single weak machine or distributed across a cluster, and such code gets perfectly reasonable performance despite the filesystem latency.
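A minimal sketch of the idea in Python, not ParaView code: `read_block` is a hypothetical stand-in that simulates one filesystem round trip with a sleep, and the latency value and file paths are made up for illustration. Because the threads spend their time waiting rather than computing, their waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.005  # assumed per-file latency, for illustration only

def read_block(path):
    """Hypothetical stand-in for opening one file and reading its metadata."""
    time.sleep(LATENCY_S)  # simulates a filesystem round trip
    return path, len(path)

def load_sequential(paths):
    # Each read waits for the previous one to finish.
    return [read_block(p) for p in paths]

def load_threaded(paths, max_workers=32):
    # Threads overlap the waits; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_block, paths))

# Made-up paths in a processor/time/field style layout.
paths = [f"processor{r}/{t}/U" for r in range(8) for t in range(8)]

t0 = time.perf_counter()
seq = load_sequential(paths)
t1 = time.perf_counter()
thr = load_threaded(paths)
t2 = time.perf_counter()

print(f"sequential: {t1 - t0:.2f} s, threaded: {t2 - t1:.2f} s")
```

The same pattern works with processes or across ranks; threads are used here only because they are the simplest option on a single desktop machine.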

So one change I think needs to be made is that each individual ParaView process, particularly for desktop users, should use threads to load the blocks that its rank is responsible for, and all metadata should be loaded in parallel.

I think a large number of users would benefit from this change in loading practices.

I’ve studied the problem more than a little bit and I want to get it addressed. cc @Lucas_Givord @MatthiasLang

Francois_Mazen · October 21, 2024, 6:30am

Hi @osarusan and welcome to the discourse!

If you convert your data to VTKHDF, you should have only one file, even with temporal data, and can expect a performance gain when reading it.

Agreed, it would be a great addition for the OpenFOAM community to output VTKHDF files directly from the solver.

Great! Please reach me by email to check how Kitware may help in your case: francois.mazen@kitware.com

Best,
François

Lucas_Givord · October 21, 2024, 6:45am

Hi @osarusan,

As François said, the VTKHDF file format allows you to write your data in a single file. For the temporal aspect, the link from François explains how to proceed.

For the distributed part, most HDF datasets will hold an array of size n, where n is the number of processes.

You can check the unstructured grid section of the documentation (VTK File Formats - VTK documentation), where partitions are discussed.
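As a rough sketch of how per-partition metadata can be used: assuming the file stores one count per partition (as I understand it, the VTKHDF unstructured grid layout does this with datasets such as NumberOfPoints), a reader can work out which slice of the concatenated array belongs to each partition with a running sum. The helper name and the counts below are made up for illustration.

```python
from itertools import accumulate

def partition_slices(counts):
    """Return (start, stop) into the concatenated array for each partition."""
    stops = list(accumulate(counts))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))

# Made-up per-partition point counts, one entry per writing process.
number_of_points = [120, 95, 130, 101]

for rank, (start, stop) in enumerate(partition_slices(number_of_points)):
    print(f"partition {rank}: points[{start}:{stop}]")
```

Only the small array of counts has to be read before each rank knows which range to fetch.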

Alternatively, if you prefer to keep your 10k files but reduce the amount of metadata, that may be possible using HDF5 virtual datasets.

In any case, do not hesitate to reach out to us.