Fast volume rendering of time series on GPU cluster

I am playing with remote visualisation of relatively large data sets (~300GB per case, about 30 cases). My general objective is to accelerate post-processing tasks for my colleagues and me, but I appreciate that I am still quite new to using PV in client-server mode and in parallel, so there may be swathes of background I am missing. I feel I have five million questions on all sorts of subjects, but to focus the discussion I want to start with volume rendering, whose performance I am currently studying.

I am converting my files from serial EnSight to VTMs. The VTMs are arranged as separate time steps and decomposed with ParaView's D3 filter into 64 subdomains. I am using a Release build of v5.6 on P8+4xP100 nodes with the EGL backend. @utkarsh.ayachit has already helped me a lot by pointing me to the "Use Data Partitions" option in the "Miscellaneous" section. This appears to suppress the re-decomposition of the data every time I play through the volume. The animation becomes noticeably quicker, but when I watch the GPUs they are still busy only for a fraction of the time. Are there any other tricks I can use to accelerate this process?
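
For reference, this is roughly how I drive the render from pvbatch. The paths and the array name `p` are placeholders, and I am assuming the "Use Data Partitions" toggle corresponds to a representation property called `UseDataPartitions` (I took the name from the GUI label, so it may differ):

```python
from glob import glob
from paraview.simple import *

# one pre-partitioned VTM per time step (placeholder path)
files = sorted(glob('case01/step_*.vtm'))
reader = OpenDataFile(files)              # a list of files should be treated as a file series

view = GetActiveViewOrCreate('RenderView')
disp = Show(reader, view)
disp.Representation = 'Volume'
ColorBy(disp, ('POINTS', 'p'))            # placeholder field name

# the "Use Data Partitions" option from the Miscellaneous section;
# property name assumed from the GUI label
disp.UseDataPartitions = 1

scene = GetAnimationScene()
scene.UpdateAnimationUsingDataTimeSteps()
Render()
scene.Play()                              # timed while watching nvidia-smi on the render nodes
```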

I am investigating:

  1. alternative data formats for temporal, mesh and field data (preferably with parallel I/O),
  2. the Temporal Cache filter (see the sketch after this list),
  3. different decomposition settings: partitioning mode, ghost regions, etc.,
  4. different CPU/GPU ratios,
  5. keeping the whole case in RAM (the cluster nodes have hundreds of GB of RAM).
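
On item 2, what I have in mind is something like the snippet below. I am assuming the filter is exposed to pvpython as `TemporalCache` with a `CacheSize` property (the number of time steps kept resident); I have not verified this yet:

```python
from glob import glob
from paraview.simple import *

reader = OpenDataFile(sorted(glob('case01/step_*.vtm')))   # placeholder paths, as above

# hypothetical: keep recent time steps in memory so replaying the animation skips re-reading
cache = TemporalCache(Input=reader)
cache.CacheSize = 10

disp = Show(cache, GetActiveViewOrCreate('RenderView'))
disp.Representation = 'Volume'
```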

If you have any advice, do let me know. I speak C++ and basic VTK, so I am happy to dig into the code, but any non-code references on the subject of parallel rendering would also be much appreciated.