I am highly interested in using parallel processing in ParaView, but I have no idea how to implement it. I read that the variable PARAVIEW_USE_MPI should be set to ON. The problem is that I installed ParaView from the Ubuntu repositories:
sudo apt-get install paraview
So, my first question is: is the packaged version of ParaView able to handle parallel processing?
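So far, the only check I could think of was to see whether the packaged binary links against an MPI library at all, roughly like this (I am not sure this is conclusive):
# Rough check: does the packaged pvserver link against any MPI library?
ldd "$(which pvserver)" | grep -i mpi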
If the packaged version cannot handle parallel processing, I will have to build ParaView from source. Some time ago I ran into a lot of problems trying to install that kind of ParaView build, so, if you do not mind, I would ask you to assist me with the installation, since the wiki (https://www.paraview.org/Wiki/ParaView:Build_And_Install) is quite brief for me. That is, please point out the steps required for a successful installation: how to install Qt5, the required libraries, and so on.
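For reference, what I plan to try looks roughly like the steps below; the package names and CMake options are my best guess from the wiki, so please correct anything that is wrong:
# Best-guess build sketch for Ubuntu 16.04; package names and CMake
# option names may differ between ParaView versions.
sudo apt-get install build-essential cmake git \
    qt5-default qttools5-dev libqt5x11extras5-dev \
    libopenmpi-dev openmpi-bin \
    libgl1-mesa-dev libxt-dev python-dev
git clone https://gitlab.kitware.com/paraview/paraview.git
cd paraview
git submodule update --init --recursive
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DPARAVIEW_USE_MPI=ON ..
make -j4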
Once I have ParaView properly configured for parallel processing, could you tell me the steps required to use it? I have read that I must connect to a server (I have access to a cluster), but I have no idea how to do that. I have searched the Internet but have not found a solution to my problem.
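What I understood so far (possibly wrong) is that on a cluster the server side would be started with something along these lines, where the host names, user name, and rank count are placeholders:
# On the cluster: start the MPI-enabled server.
mpirun -np 16 pvserver --server-port=11111
# On my workstation: forward the port through ssh so the client can reach it.
ssh -L 11111:clusternode:11111 me@cluster.example.org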
Also, I tried to replicate the steps in Section 15.7, Parallel processing in paraview and pvpython, of the ParaView Guide 5.0.0. When I execute mpirun -np 4 pvserver, the startup message appears:
Waiting for client...
Connection URL: cs://myhost:11111
Accepting connection(s): myhost:11111
but paraview doesn’t start up.
I am using Ubuntu 16.04 and paraview version 5.0.1.
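If I understand the guide correctly, pvserver only waits for a client, so I suspect I still have to launch the GUI myself and point it at the URL printed above (or use File > Connect). Is the following the right way to do that (the option name is taken from paraview --help, so it may be off)?
# Connect the GUI client to the waiting server from a second terminal.
paraview --server-url=cs://myhost:11111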
I also compiled ParaView from source with PARAVIEW_USE_MPI=ON. The reason for compiling from source was to enable GDAL support. The version I compiled is 5.8.0.
Although I enable AutoMPI under “Multicore Support”, ParaView doesn’t start with multiple processes. It also doesn’t give any warning or error, which is very strange…
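The only check I could come up with was to look for the helper processes AutoMPI is supposed to spawn; with it enabled I expected to see several pvserver ranks, roughly like this (I may be checking the wrong thing):
# After enabling AutoMPI and opening a data set, list the pvserver
# ranks that mpiexec should have started in the background.
pgrep -af pvserver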
Is MPI the same as using multiple cores on a single machine? For instance, on Windows, if I have a machine with 2 physical CPUs, each with 48 cores, would I set the number of cores to 2 or to 96?
In the old days, you could use 96 (well, 95 to give the client a core) to split serial algorithms across your computing cores. Nowadays, quite a few core operations in ParaView/VTK are multithreaded to take advantage of multiple cores that share memory, provided you compiled with a symmetric multiprocessing backend (such as TBB) enabled. Hence, most of the time you are probably better off not using MPI at all unless you need to, e.g., when you are running on a cluster of machines, each with its own memory space.
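For example, if you build ParaView yourself, the shared-memory backend is chosen at configure time and the thread count can be capped at run time, roughly like this (exact variable names may vary across VTK/ParaView versions):
# Configure time: select a shared-memory backend such as TBB.
cmake -DVTK_SMP_IMPLEMENTATION_TYPE=TBB ..
# Run time: optionally cap the number of threads the SMP backend uses.
export VTK_SMP_MAX_THREADS=48
paraview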
@cory.quammen I’ve noticed ParaView is EXTREMELY slow to open with MPI on, and it never loads my files (micro-CT DICOM stacks in my case) with MPI on, but it does load them with MPI turned off.
When I turn on AutoMPI and enable up to 47 cores, mpiexec does run. After about a minute (as opposed to maybe 10 seconds with it turned off), ParaView opens. I then load my data (which I cannot share), and when I hit visualize (i.e. click the eyeball), it takes maybe 20-30 seconds to load without MPI. With MPI, it just hangs and never loads.
Do you mean distributing across the cores of a single CPU, across CPUs in separate sockets of one machine, or across the CPUs of a cluster of workstations?
Are there instructions for distributing data within the ParaView GUI? For example, it is easy for me to run the same OpenCV Gaussian blurring filter in Python across thousands of images in parallel by distributing the images across multiple cores, and MPI is not a prerequisite for that (i.e. I do not have to install MS-MPI to use the Python multiprocessing and threading libraries). How can I achieve the same results via the ParaView GUI, and why is MPI necessary?
This depends on your workflow and process. If you are using multithreaded filters, then on a single machine or on a single CPU, using MPI may not be needed at all.
On a cluster, using MPI is needed.
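For instance, on a SLURM cluster the server side would typically be launched with a job script along these lines (the scheduler flags and counts are illustrative only):
#!/bin/bash
#SBATCH --job-name=pvserver
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
# Launch one pvserver rank per task; the client then connects to the
# first node on port 11111.
srun pvserver --server-port=11111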
Are there instructions for distributing data within the ParaView GUI? For example, it is easy for me to run the same OpenCV Gaussian blurring filter in Python across thousands of images in parallel by distributing the images across multiple cores, and MPI is not a prerequisite for that (i.e. I do not have to install MS-MPI to use the Python multiprocessing and threading libraries).
You need to make sure your data is distributed; show the “processId” field to check.
How can I achieve the same results via the ParaView GUI, and why is MPI necessary?
You still need to run pvserver with MPI and connect to it, and then make sure your data is distributed. After that, everything works as usual.
You may want to follow a ParaView course to learn all this.
MPI is necessary because it is how ParaView handles distributed computing.
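Putting it together, a minimal local test of the distributed workflow could look like this (the filter names are the ones I would reach for; your version may label them differently):
# 1. Start a 4-rank server on the local machine.
mpirun -np 4 pvserver --server-port=11111 &
# 2. Connect the GUI client to it.
paraview --server-url=cs://localhost:11111
# 3. In the GUI: open the data, apply "D3" (or "Redistribute DataSet" in
#    newer versions) so every rank owns a piece, then apply
#    "Process Id Scalars" and color by the resulting array to verify.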
YES! I would love this! I have been scouring YouTube for information and it all feels kind of jumbled. Do you have good resources you would recommend? Where are the best places I can learn?
I have the same request for CMake. I would love to learn, but between the Kitware books and YouTube, I still feel a little lost.