Run ParaView on multiple nodes on an HPC cluster

I used the tunneling method to connect ParaView in client-server mode, which connected me to a single node on the HPC cluster. What method should I use to run ParaView on multiple nodes? Does running on multiple nodes give faster performance? How much do GPUs help? When combining GPUs and multiple nodes, what is the optimal number of nodes, cores, and GPUs for large-scale visualization?

Saving an animation is very slow. I would appreciate any advice on making this faster.

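Use your MPI implementation: start pvserver under its launcher (for example mpirun) so that it runs across the nodes of your allocation, then connect your client to it as before.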
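As a rough sketch of what such a launch might look like, the helper below assembles an MPI command line for pvserver. The function name `pvserver_launch_command` is hypothetical, and it assumes an `mpirun`-style launcher that accepts `-np`; a scheduler such as Slurm will likely need a different launcher and flags, so check your cluster's documentation.

```python
import shlex
from typing import List


def pvserver_launch_command(n_ranks: int, port: int = 11111) -> List[str]:
    """Build an mpirun command that starts pvserver on n_ranks MPI ranks.

    Hypothetical helper for illustration only. Assumes an mpirun-style
    launcher that takes -np; 11111 is ParaView's default server port.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    return ["mpirun", "-np", str(n_ranks), "pvserver", f"--server-port={port}"]


if __name__ == "__main__":
    # Print the command as a single shell-ready string.
    print(shlex.join(pvserver_launch_command(8)))
    # mpirun -np 8 pvserver --server-port=11111
```

The client then connects to the node hosting rank 0 on the chosen port, for instance through the same SSH tunnel used for a single-node session.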

Yes, it can be faster, but how much depends on your workflow.

GPUs are very important if rendering is a significant part of your workload.

https://www.kitware.com/paraview-hpc-scalability-guide/

When saving the animation, you are probably limited by the time it takes to read each time step.
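One way to check this is to time the read and the render of each time step separately. The sketch below is a minimal illustration, not ParaView code: `profile_animation` and `dominant_stage` are hypothetical names, and the two callables are stand-ins you would supply. In a pvpython script they might wrap something like `reader.UpdatePipeline(t)` and `Render()` from `paraview.simple`, but that wiring is an assumption about your pipeline.

```python
import time
from typing import Callable, Dict, Iterable


def profile_animation(
    time_steps: Iterable[float],
    read_step: Callable[[float], None],
    render_step: Callable[[float], None],
) -> Dict[str, float]:
    """Accumulate wall-clock seconds spent reading and rendering."""
    totals = {"read": 0.0, "render": 0.0}
    for t in time_steps:
        start = time.perf_counter()
        read_step(t)      # stand-in for loading the data of time step t
        middle = time.perf_counter()
        render_step(t)    # stand-in for drawing the frame of time step t
        end = time.perf_counter()
        totals["read"] += middle - start
        totals["render"] += end - middle
    return totals


def dominant_stage(totals: Dict[str, float]) -> str:
    """Return the name of the stage with the largest accumulated time."""
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Fake stages: a slow read and a fast render, for demonstration only.
    totals = profile_animation(
        range(3),
        read_step=lambda t: time.sleep(0.02),
        render_step=lambda t: None,
    )
    print(dominant_stage(totals), totals)
```

If reading dominates, adding GPUs is unlikely to help much, and the storage system or file format is the more promising place to look.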