Save large datasets (on unstructured grids) in PVD (.vtu) format

(Corso) #1

Hi everyone,

I’m using pvbatch to post-process DNS data, and although the calculations scripted in Python are quite fast, saving the data takes a prohibitively long time.
Here is the function used to save the data:
SaveData('/path/file.pvd', proxy=activeSource)
Saving is also slow when pvbatch is launched in parallel.
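
In case it helps, here is what the save step could look like with the XML writer options spelled out (the keyword names below are my best guess at the writer properties, not something I have verified):

from paraview.simple import SaveData

# Sketch only: the keyword names are assumed XML-writer properties.
# Appended binary mode plus light compression may reduce file size and I/O time.
SaveData('/path/file.pvd', proxy=activeSource,
         DataMode='Appended',      # raw binary, appended data section
         CompressorType='LZ4',     # or 'ZLib'; trades CPU for I/O
         CompressionLevel=1)       # light compression

# In parallel, a .pvtu target lets each rank write its own piece:
# SaveData('/path/file.pvtu', proxy=activeSource)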

Is there a more time-efficient way to save such large datasets?

Thanks in advance for your help.

Best,
-Pascal

(Berk Geveci) #2

Can you elaborate on your workflow? Are you loading simulation output, filtering it and then writing out filtered data?

(Corso) #3

Here’s the workflow (a minimal sketch of it in code follows the list):

  • Data loading from NEK5000
  • Calculation of various quantities (on an unstructured grid with around 50e6 points and around 400 time instants)
    • (optional) Gaussian filtering
  • Temporal statistics
    • (optional) Slicing
  • Saving the data (on an unstructured grid, structured grid, or slice) --> sticking point: a very large amount of time (days)
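
A minimal sketch of this pipeline, assuming the paraview.simple API (the reader call, paths, and slice parameters are placeholders, not my exact script):

from paraview.simple import OpenDataFile, TemporalStatistics, Slice, SaveData

nek = OpenDataFile('/path/case.nek5000')   # NEK5000 data (placeholder path)

# ... calculation of derived quantities (Calculator / Python filters) ...
# ... optional Gaussian filtering ...

stats = TemporalStatistics(Input=nek)      # temporal statistics over all steps

slc = Slice(Input=stats)                   # optional slicing (Plane by default)
slc.SliceType.Origin = [0.0, 0.0, 0.0]
slc.SliceType.Normal = [0.0, 0.0, 1.0]

SaveData('/path/out.pvd', proxy=stats)     # <-- the slow step (days)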

I also noticed that the NEK5000 reader isn’t stable: for several time instants, the indexing of the points seems to be done incorrectly (see the attached figure).

Many thanks.

Best,
-Pascal

(Berk Geveci) #4

As a sanity check, can you try the legacy VTK format and the Exodus format for your unstructured-grid use case? I’d like to know how they perform.
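
Something along these lines should exercise both writers, since SaveData selects the writer from the file extension (paths are placeholders, and the Exodus extension is my guess at what the writer is registered for):

from paraview.simple import SaveData

SaveData('/path/test.vtk', proxy=activeSource)   # legacy VTK
SaveData('/path/test.ex2', proxy=activeSource)   # Exodus II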

(Corso) #5

Using the legacy VTK and Exodus formats does not speed the process up.
Furthermore, when I apply the Temporal Statistics filter to the unfiltered DNS fields, I obtain an averaged field with misplaced points like those in the figure I sent previously.

Attached you will find two examples of Python scripts used to run the post-processing steps with pvbatch on our cluster.
macroPostProcDNSF_5000_2_tot.py (19.6 KB)
macroPostProcDNSMeanHighRes2.py (9.7 KB)

Many thanks.