Hardware recommendation for medium size data sets

nopech · May 6, 2020, 1:48pm

Hello everyone

I frequently post-process data sets with around 50 million unstructured cells (OpenFOAM decomposed data). However, I just can’t get my head around what the most efficient hardware would be.

When I use a CPU cluster and run pvserver with MPI, I can use a large number of cores. This speeds up loading and manipulating the data a lot. However, that many cores (around 180) cost money, and software rendering is not the most efficient way to render.
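To put the two core counts in perspective, here is a rough sketch of how the per-process workload changes between a ~180-core cluster and a ~12-core GPU node. It assumes the cells are spread evenly over the MPI ranks, which a real OpenFOAM decomposition will only approximate. The function names and the port number are illustrative assumptions.

```python
import math

def cells_per_rank(total_cells, n_ranks):
    """Cells per MPI rank, rounded up, assuming an even split (an assumption)."""
    return math.ceil(total_cells / n_ranks)

def pvserver_command(n_ranks, port=11111):
    """Build an mpirun command line that starts pvserver on n_ranks processes."""
    return ["mpirun", "-np", str(n_ranks), "pvserver", f"--server-port={port}"]

total = 50_000_000  # cell count from the post
for ranks in (180, 12):
    print(ranks, "ranks:", cells_per_rank(total, ranks), "cells per rank")
    print(" ".join(pvserver_command(ranks)))
```

With 12 ranks each process handles about 15 times as many cells as with 180, which is consistent with the slow loading seen on the GPU node.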

When I use hardware with a GPU, I have a smaller number of CPU cores, since GPU nodes in a cluster usually have fewer cores. The rendering might be more efficient, but loading the data takes forever.

Also, if I want to use NVIDIA IndeX for volume rendering on one GPU, which is said to be enough for such a data set, I only have around 12 CPU cores and loading the data is slow.

How do large visualization clusters work? Do they use many CPU cores and GPUs to overcome this issue, or is there something I have missed?

Thank you for any hints.

wascott · May 6, 2020, 4:45pm

There are numerous approaches to skinning this cat. It seems that many of the supercomputing sites have different solutions. Further, it depends on the software you are running. For instance, EnSight runs in parallel differently than ParaView (more of its processing occurs on the client side).

In my humble opinion, it depends on your goals, how much you have actually analyzed your needs, and how deep your pockets are.

About 15 years ago, we at Sandia thought through this question. For production vis, we found that rendering speed is generally a small issue. Graphics cards also make builds and system maintenance harder. The biggest performance issue is having enough memory to run (and knowing that memory is an issue); second is load time off disk (use spread datasets and a fast parallel disk); and third is MPI communication among processes (generally solved by the developers creating better algorithms. See Volume Rendering in PV 5.9.0. Utkarsh rocks.).
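As a back-of-the-envelope illustration of why memory comes first, the sketch below estimates the raw size of an unstructured grid in RAM. Every parameter is an assumption, not ParaView's actual accounting: all-hexahedral cells, 64-bit point IDs, roughly one point per cell, 32-bit coordinates and field values, and five field components (for example pressure, a velocity vector and one more scalar).

```python
def unstructured_grid_gib(n_cells,
                          points_per_cell=8,     # hexahedra (assumption)
                          point_ratio=1.0,       # points per cell in the mesh (assumption)
                          field_components=5,    # e.g. p + U(3) + one scalar (assumption)
                          id_bytes=8,            # 64-bit IDs (assumption)
                          coord_bytes=4,         # float32 coordinates (assumption)
                          value_bytes=4):        # float32 field values (assumption)
    """Rough in-memory size of one copy of the grid, in GiB."""
    n_points = int(n_cells * point_ratio)
    connectivity = n_cells * points_per_cell * id_bytes
    offsets = n_cells * id_bytes
    cell_types = n_cells  # one byte per cell
    coordinates = n_points * 3 * coord_bytes
    fields = n_cells * field_components * value_bytes
    total = connectivity + offsets + cell_types + coordinates + fields
    return total / 2**30

print(f"{unstructured_grid_gib(50_000_000):.1f} GiB for one copy of the data")
```

Under these assumptions a single copy of a 50 million cell data set is on the order of 5 GiB. Filters in a pipeline generally produce additional copies or derived data sets, so the working set can be several times larger than this figure.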

For us, we just have a few percent of the normal cluster nodes dedicated to viz (so they are always available), and I encouraged the hardware guys not to buy graphics cards, GPUs, etc., but rather to buy more memory.

As you state, as time goes on we will have to rethink this decision as functionality like NVIDIA’s magic hardware support matures. But, at the end of the day, I don’t want fancy hardware, but rather a good experience for my users. Whatever that means.

Quefrency · May 12, 2020, 3:12pm

Hi,
From my limited experience, my suggestion is to look at how much RAM you have; I suggest getting 64 GB or more. The speed at which you can read the data will also make a difference, so the faster the drive the better, i.e. NVMe > SSD > HD.

The format your data is in will also make a difference.
This post might help with understanding the background, but in short: get the OpenFOAM data into vtm format, as it will save processing time.

and https://www.youtube.com/watch?v=AoUygiAk4fc

Be careful with IndeX; it is not a silver bullet that will solve your volume rendering problems. I have not been able to get one of my OpenFOAM simulations to render with IndeX. I have tried with a snappyHexMesh mesh and a cfMesh mesh, and I keep getting a topology error. So if I need to do a volume render, I still resample to image and then volume render; however, doing this with a large data set needs lots of RAM. My suggestion is to clip your data down to a subset, then sort out your colouring and opacity, and then apply those settings to your full data set to get the image that you want. Using a Python script to do this will be better.
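To get a feel for why the resample-to-image route is memory hungry, and why working on a clipped subset first helps, here is a small sketch that estimates the size of the resampled regular grid. The function name, the 32-bit value size and the example grid sizes are assumptions for illustration; the real footprint in ParaView will be higher because the source data and intermediate results are also held in memory.

```python
def image_memory_gib(dims, n_components=1, bytes_per_value=4):
    """Approximate size in GiB of a regular grid with the given sampling dimensions.

    dims: (nx, ny, nz) sample counts along each axis.
    n_components: total number of field components stored per sample.
    bytes_per_value: 4 for float32 values (an assumption).
    """
    nx, ny, nz = dims
    return nx * ny * nz * n_components * bytes_per_value / 2**30

# Coarse grid on a clipped subset while tuning colour and opacity (illustrative sizes)
print(image_memory_gib((256, 256, 256)), "GiB")
# Finer grid on the full data set for the final image, carrying four components
print(image_memory_gib((1024, 1024, 1024), n_components=4), "GiB")
```

Doubling the resolution along each axis multiplies the memory by eight, so tuning the transfer function on a small subset and only then switching to the full resolution keeps the interactive part of the work cheap.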