Run ParaView on multiple nodes of an HPC cluster

I used the tunneling method to connect ParaView in client-server mode. With this method, I connected to a single node on the HPC cluster. What method should I adopt to run ParaView on multiple nodes? Does running on multiple nodes deliver faster performance? How much impact do GPUs have? When combining GPUs and multiple nodes, what is the optimal number of nodes, cores, and GPUs for a large-scale visualization?

Saving an animation is very slow. I would appreciate any advice on making this faster.

To run on multiple nodes, launch pvserver through your MPI implementation, then connect the client to it as before.
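As a rough illustration of what launching through MPI can look like, here is a minimal Python sketch that only assembles the command line. The helper name `pvserver_command`, the default port 11111, and the use of `mpirun -np` are assumptions for the example; your cluster may require a different launcher (such as the scheduler's own) with different flags, so check the site documentation.

```python
import shlex


def pvserver_command(num_ranks, port=11111, launcher="mpirun"):
    """Build the argument list that starts pvserver on num_ranks MPI ranks.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    # "-np" is the rank-count flag understood by mpirun/mpiexec;
    # other launchers use different flags (assumption: mpirun-style launcher).
    return [launcher, "-np", str(num_ranks), "pvserver", f"--server-port={port}"]


if __name__ == "__main__":
    # Print the command so it can be pasted into a job script.
    print(shlex.join(pvserver_command(num_ranks=8)))
```

The printed command would normally go inside a batch job that requests the nodes. The tunnel you already use should then point at the node where the first rank listens.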

Yes, multiple nodes can be faster, but it depends on your workflow.

GPUs are very important if you render.

As for the slow animation export, you are probably limited by the time it takes to read each time step.
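One way to check whether reading really dominates is to time the read and render phases separately for a few time steps. The sketch below is generic Python and does not call ParaView. `time_phases`, `read_step`, and `render_step` are hypothetical names: in a pvpython or pvbatch script, the two callables would wrap whatever loads a time step and whatever renders it.

```python
import time


def time_phases(steps, read_step, render_step):
    """Return (total_read_seconds, total_render_seconds) over all steps.

    read_step(step) loads one time step and returns the data;
    render_step(data) draws it. Both are supplied by the caller.
    """
    read_total = 0.0
    render_total = 0.0
    for step in steps:
        t0 = time.perf_counter()
        data = read_step(step)
        t1 = time.perf_counter()
        render_step(data)
        t2 = time.perf_counter()
        read_total += t1 - t0
        render_total += t2 - t1
    return read_total, render_total


if __name__ == "__main__":
    # Stand-in callables that only sleep, to show how the helper is used.
    def fake_read(step):
        time.sleep(0.02)  # pretend file I/O
        return step

    def fake_render(data):
        time.sleep(0.005)  # pretend rendering

    read_s, render_s = time_phases(range(5), fake_read, fake_render)
    print(f"read: {read_s:.3f} s, render: {render_s:.3f} s")
```

If the read total is much larger than the render total, adding GPUs is unlikely to help, and faster storage or a file format that can be read in parallel is the more promising direction.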