Applying ParaView pipelines to different datasets (originating from the same code)

Good afternoon,

I have been using ParaView for over a year now, but I still struggle with one significant issue: I run MHD simulations on a daily basis with one code and generate VTU output that I analyze with ParaView. I have built a complex pipeline that does coordinate transformations for vector fields, computes derived quantities from the output, etc. Building the pipeline takes me more than 3 hours!

The problem is that when I run my code again with new parameters, I have to build the pipeline again because the output data files are not identical. I usually run my code with a different number of cores (each of which writes out its part of the VTU output in a separate file), depending on whether I run the code locally or on our cluster. I also usually have a different number of iterations/timesteps before convergence, so I have a different number of data files even if I use the same computational mesh.

So far I have been only partially successful in reusing my pipelines on different runs of the same code:

  • I linked (and renamed) one temporal snapshot to a separate directory and was then able to read in that single time step by using the "only import data from this directory" feature. So this works for a fixed number of cores (= a fixed number of output files) and one single time step. I had to generate the same pipeline for, say, 8 and 60 cores, and I cannot study the temporal evolution with this pipeline.
  • I tried to merge all the VTU output (from different I/O cores and different time steps) into one h5 file and build the whole pipeline based on that h5 file. However, this approach also fails when trying to load the pipeline on data from a new run, since a different number of timesteps per run is unavoidable for my implicit code.

At this point I don't know whether I should keep using ParaView. The data analysis and rendering work very well, but I cannot afford the time to re-build complex pipelines (pvsm state files) every time I want to analyze data from a new run of the same code. I simply don't have the time for that and don't know whether this issue can be fixed somehow.

Thanks a lot for your time and feedback!

I'm not sure what the problem is here. Can't you use state files and change the inputs?

On a side note, you may be interested in Catalyst.
https://www.paraview.org/in-situ/

Thank you Mathieu for your reply!

Can't you use state files and change the inputs?

Well, that was exactly my question. I use state files to save my pipelines. The problem is that in ParaView, pipelines and the data they act upon are not separate entities: loading the data is part of the pipeline. What I can do is merge the data into one h5 file, so that step 1 of my pipeline is to read in that data file. But this approach does not allow me to load my state file based on a different h5 file (with a different number of timesteps, say).

So my question was exactly what you asked me in return: is it possible to modify the state files for this specific purpose, and how? Let's say that instead of the 500 timesteps in my original state file, my new h5 file has 1300 timesteps. Do I then have to open the state file in an editor and copy the lines associated with reading in a single time step 800 times? Apart from being cumbersome, I don't even know whether that is possible: when I naively tried to load my state file on the output of a different run, I got an error message that my timesteps are not the same, which of course they aren't, since my CFL changes adaptively and so on. In the end I also have to run the same simulations on different tetrahedral unstructured meshes with different resolutions.

So my question really was: can this be fixed in the state file so that I do not have to recreate a new state file for every new run? If yes, I can devote more time to studying ParaView beyond using the GUI; if not, this would be a good time to know, so that I can switch to some other 3-D data analysis platform, since some colleagues of mine, for instance, don't have these problems using VisIt.

I would like to stick with ParaView, but at the moment I don't know how to proceed with this problem. If you say I can solve it with Catalyst, I will devote my time to learning it. In the ParaView manual, where pipelines and state files are discussed, I didn't find this problem addressed at all. But again, so far I have mostly been using the GUI and am not an expert. Maybe I should save the state as a Python script and modify that somehow; I regularly program in Python, so if you think that would be a doable approach, I'll go for it. At the moment I just don't know which approach would be best and, most importantly, whether this can be done at all.

Thanks for your feedback!

Considering that you want to apply the exact same pipeline with the exact same properties, state files (.pvsm) should work fine. If they don't, maybe you are encountering some bugs.

If you want to make modifications, I would suggest using a Python state file instead, which is much easier to modify, or even using a macro to recreate your pipeline.

To test this, just change the file type when saving your state.
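For instance, a macro can be as small as a paraview.simple snippet applied to whatever source is active, so it does not depend on any file names. This is just an untested sketch; the Calculator step, the array name 'B', and the result name are placeholders for one of your own pipeline steps:

# Minimal macro sketch: recreate one pipeline step on the active source.
from paraview.simple import GetActiveSource, Calculator, Show

src = GetActiveSource()        # the dataset currently selected in the Pipeline Browser
calc = Calculator(Input=src)   # a derived quantity, independent of the input files
calc.ResultArrayName = 'Bmag'  # placeholder name for the result array
calc.Function = 'mag(B)'       # placeholder: magnitude of a vector field called 'B'
Show(calc)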

Catalyst is not a solution for your specific problem; it is just a way to do in-situ analysis with your simulation.

So if I understand you correctly: no matter whether the data I am analyzing sit on a different unstructured mesh (with a different resolution, say), have a different number of time steps, and were written by a different number of I/O cores before being merged into one .h5 file, the pipeline in my pvsm state file should load that data correctly (ideally without my intervention), and if it doesn't, then either it is a bug or the code I am using writes the VTU files in a non-ideal way?

I will give it a try with Python state files; maybe from there I can see more easily what has to be modified from one run to the next.

Thank you very much for your input!

either it is a bug or the code I am using writes the VTU files in a non-ideal way?

Yes

Thanks again for the hint that, in principle, the pipeline/state files should work without my intervention (as long as the h5 file for a particular run lies in the same directory as the unchanged state file and keeps the same name, of course).

I found that I had written the state file with ParaView 5.5.2 (Qt5) at my institute and was trying to load that same state with ParaView 5.6 on my private laptop at home. After installing 5.5.2 there, loading the state on data from different runs suddenly worked!

This is great indeed, because I was really afraid I couldn't do this sort of job with ParaView. And without the really handy possibility of converting the many VTU output files into one h5 file in ParaView, this probably would not even have worked without a lot of manual modification of the state file to adjust it to each particular run.

State files are supposed to be backward compatible, but that can indeed cause issues. I'm glad you found a solution.

Actually, I am still having a similar problem: I am running simulations that generate a couple of hundred VTK files. When analyzing the data, I build a complex pipeline with lots of coordinate transformations and computations of derived quantities, which takes me more than an hour.

I save the state file as pvsm and, because I of course want to reuse it with my next simulation run, I edit the pvsm so that absolute paths are replaced by the relative path "./". After that I copy the pvsm into the new data directory and open ParaView from there.

The problem arises when I do not have the same number of VTK files in my new run. If the number is lower, ParaView will give an error when I reach a missing timestep, e.g. while making an animation. When I select the topmost data-loading item in the pipeline and choose "Reload Files" and then "Find new files", new files are not added, nor are timesteps removed if the new run has fewer snapshots.

Now I can of course open the new set of VTK files in the running ParaView session, which adds another item at the bottom of the pipeline. But I can no longer analyze these new data with the pipeline I created above.

This is my biggest issue with ParaView: that I cannot simply insert a step into an existing pipeline when I later notice I have forgotten to compute a derived quantity, or replace the loaded dataset with another dataset. Either I simply don't understand how pipelines can be manipulated later on, or I really do have to start from scratch again and again.

I tried, as you suggested in a post above, to use a Python state file rather than a pvsm, as it is indeed easier to edit. With hundreds of VTK files (one per temporal snapshot), it is still a lot of work if I have to add them all to the list
data = LegacyVTKReader(FileNames=['./data.0000.vtk', './data.0001.vtk', './data.0002.vtk', ...])
But even if I went for that far-from-ideal solution, it still does not work, because when I start ParaView from the data directory with

paraview ./my-python-statefile.py

I am asked which reader I want to use to read in those data (although they are all VTK files and I would assume ParaView knows automatically how to treat them), and the only VTK reader I can select is a VTK particle reader, which is not appropriate for my fluid + magnetic vector field data...

Maybe there is a simple solution to this and I just can't see it.

Thanks a lot in advance.

I would suggest using a Python state file. You would be able to implement your own logic in Python.
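For example, instead of hard-coding the list of files, you could collect them with a glob so the same script works for runs with any number of snapshots. A minimal sketch, assuming the data.NNNN.vtk naming pattern from your post:

# Build the FileNames list from whatever snapshots exist in the run directory.
import glob
from paraview.simple import LegacyVTKReader

files = sorted(glob.glob('./data.*.vtk'))  # e.g. a few hundred or over a thousand files
data0 = LegacyVTKReader(FileNames=files)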

The problem I have so far with Python state files is that when I start ParaView with the state file in the terminal (> paraview5 ./my-statefile.py), I receive a prompt asking me to choose a reader for the data. And there is no option to choose LEGACY VTK, only a bunch of readers that are inappropriate for my *.vtk data.

This is especially weird, as the data files are specified in the Python state file as follows:
data0 = LegacyVTKReader(FileNames=['./data.0000.vtk', './data.0001.vtk', ...])

So at the moment, Python state files don't work for me because of this weird issue that I have to choose a reader from a list that does not include LEGACY VTK; probably a bug.

The other issue with that approach is that my datasets sometimes have a couple of hundred snapshots, sometimes over a thousand. I would have to edit the Python state file (the line I listed above) every time I analyze a new run, because the "Reload Files" -> "Find new files" menu does not work when starting from a Python state file and only sometimes works when starting from a pvsm state file. In particular, when I analyze a shorter time series, the number of snapshots does not get reduced when updating the data. Probably also a bug.

My only workaround at the moment is to load the data from the new run I want to analyze and then select "Change Input" on the topmost items of my pipelines. At least with this method I don't have to recreate my pipelines, which I was really getting weary of in the past.
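As far as I can tell, the Python equivalent of this "Change Input" workaround would be to re-point the Input property of the topmost filter at the new reader. A rough sketch; the two source names are placeholders for whatever appears in my Pipeline Browser:

# Re-point the topmost filter of the old pipeline at a newly opened dataset.
from paraview.simple import FindSource

old_first_filter = FindSource('Calculator1')  # placeholder: first filter of the pipeline
new_data = FindSource('data.0000.vtk*')       # placeholder: the newly loaded reader
old_first_filter.Input = new_data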

Python state files and scripts are tested and used all the time on our buildbots without any issue. You may want to figure out what is going on there.
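One thing to check (just a guess on my part): a .py file passed to paraview as a plain argument may be treated as a data file to open, which would explain the reader prompt. Try paraview --script=./my-statefile.py instead, or run the script from the built-in Python shell (View → Python Shell, then "Run Script").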