I just compiled paraview-osmesa-mpi, but when I use it on a SLURM cluster it shows:
$ srun -pdebug -n1 -c4 pvserver --mpi --force-offscreen-rendering
srun: job 1692049 queued and waiting for resources
srun: job 1692049 has been allocated resources
Waiting for client...
Connection URL: cs://cn67:11111
Accepting connection(s): cn67:11111
Client connected.
SWR detected AVX # this line and the ones below appear when I load headsq.vti and turn on volume rendering
pthread_setaffinity_np failure for tid 0: Invalid argument
pthread_setaffinity_np failure for tid 6: Invalid argument
pthread_setaffinity_np failure for tid 7: Invalid argument
pthread_setaffinity_np failure for tid 8: Invalid argument
pthread_setaffinity_np failure for tid 9: Invalid argument
pthread_setaffinity_np failure for tid 10: Invalid argument
pthread_setaffinity_np failure for tid 11: Invalid argument
pthread_setaffinity_np failure for tid 12: Invalid argument
pthread_setaffinity_np failure for tid 13: Invalid argument
pthread_setaffinity_np failure for tid 14: Invalid argument
pthread_setaffinity_np failure for tid 15: Invalid argument
pthread_setaffinity_np failure for tid 1: Invalid argument
From htop, I can see the CPU binding is
2,3,4,5,2,2,2,2, … # 1 process with 15 lightweight processes (threads?)
Node cn67 has 16 processors in total.
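A side note on the "Invalid argument": my understanding (an assumption on my part, not something the logs state) is that with srun -c4 SLURM confines the task to 4 of the node's 16 CPUs, while SWR spawns one worker per installed core and tries to pin each to its own CPU; pinning to a CPU outside the granted set is what makes pthread_setaffinity_np return EINVAL. A quick way to compare the granted set with the node's hardware from inside a job step:

```shell
# Compare what the node has with what this process may actually use.
# Under `srun -c4` the allowed list should contain only 4 CPUs, even
# though the node itself has 16.
grep Cpus_allowed_list /proc/self/status   # CPUs this process may run on
nproc --all                                # processors installed on the node
nproc                                      # processors usable here (honors affinity)
```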
My question is: how can I control the affinity more automatically, and what is the best setting for performance?
Configuration:
paraview: v5.5.2
osmesa: 17.3.9-swr
llvm: 3.9.1
gcc: 5.4.0
mpi: mvapich2
slurm: 15.08.11
I tested the following strategies.
$ KNOB_MAX_WORKER_THREADS=4 srun -pdebug -n1 -c4 pvserver --mpi --force-offscreen-rendering
srun: job 1692056 queued and waiting for resources
srun: job 1692056 has been allocated resources
Waiting for client...
Connection URL: cs://cn94:11111
Accepting connection(s): cn94:11111
Client connected.
SWR detected AVX
No errors during volume rendering of headsq.vti.
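Since hard-coding KNOB_MAX_WORKER_THREADS=4 fixed it, one way to make this more automatic is to derive the knob from SLURM's own allocation in a wrapper script, so SWR's thread count always matches what srun -c granted. A sketch (SLURM_CPUS_PER_TASK is SLURM's standard variable; the wrapper itself is my own and untested with this exact stack):

```shell
#!/bin/sh
# Match SWR's worker-thread count to the CPUs SLURM granted this task.
# SLURM_CPUS_PER_TASK is set by `srun -c N`; default to 1 when run
# outside a job step.
KNOB_MAX_WORKER_THREADS=${SLURM_CPUS_PER_TASK:-1}
export KNOB_MAX_WORKER_THREADS
echo "SWR worker threads: $KNOB_MAX_WORKER_THREADS"
# exec pvserver --mpi --force-offscreen-rendering   # launched via srun
```

Then something like srun -pdebug -n1 -c4 ./pvserver-wrapper.sh should keep the two in sync without hard-coding the 4.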
$ KNOB_SINGLE_THREADED=1 srun -pdebug -n4 pvserver --mpi --force-offscreen-rendering
srun: job 1692060 queued and waiting for resources
srun: job 1692060 has been allocated resources
Waiting for client...
Connection URL: cs://cn94:11111
Accepting connection(s): cn94:11111
Waiting for client...
Connection URL: cs://cn94:11111
ERROR: In /home/dic17007/downloads/paraview/VTK/Common/System/vtkSocket.cxx, line 206
vtkServerSocket (0x2492470): Socket error in call to bind. Address already in use.
ERROR: In /home/dic17007/downloads/paraview/ParaViewCore/ClientServerCore/Core/vtkTCPNetworkAccessManager.cxx, line 437
vtkTCPNetworkAccessManager (0x17b22a0): Failed to set up server socket.
Exiting...
Waiting for client...
Connection URL: cs://cn94:11111
ERROR: In /home/dic17007/downloads/paraview/VTK/Common/System/vtkSocket.cxx, line 206
vtkServerSocket (0x278ace0): Socket error in call to bind. Address already in use.
ERROR: In /home/dic17007/downloads/paraview/ParaViewCore/ClientServerCore/Core/vtkTCPNetworkAccessManager.cxx, line 437
vtkTCPNetworkAccessManager (0x1aab2a0): Failed to set up server socket.
Exiting...
Waiting for client...
Connection URL: cs://cn94:11111
ERROR: In /home/dic17007/downloads/paraview/VTK/Common/System/vtkSocket.cxx, line 206
vtkServerSocket (0x2fa7d50): Socket error in call to bind. Address already in use.
ERROR: In /home/dic17007/downloads/paraview/ParaViewCore/ClientServerCore/Core/vtkTCPNetworkAccessManager.cxx, line 437
vtkTCPNetworkAccessManager (0x22c82a0): Failed to set up server socket.
Exiting...
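My reading of these bind failures (an interpretation, not something the logs state outright) is that the four srun tasks never joined a single MPI job, so each pvserver acted as rank 0 and tried to listen on port 11111. If this SLURM/MVAPICH2 combination was built with PMI2 support, explicitly requesting it from srun is another variant that might get the tasks into one job (hypothetical, untested here):

```shell
# Hypothetical variant: ask srun for PMI2 so MVAPICH2's four tasks form
# one MPI job instead of four independent rank-0 servers (requires both
# SLURM and MVAPICH2 to have been built with PMI2 support).
KNOB_SINGLE_THREADED=1 srun --mpi=pmi2 -pdebug -n4 pvserver --mpi --force-offscreen-rendering
```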
$ KNOB_SINGLE_THREADED=1 salloc -pdebug -n4 mpiexec pvserver --force-offscreen-rendering
salloc: Pending job allocation 1692061
salloc: job 1692061 queued and waiting for resources
salloc: job 1692061 has been allocated resources
salloc: Granted job allocation 1692061
Waiting for client...
Connection URL: cs://cn94:11111
Accepting connection(s): cn94:11111
Client connected.
SWR detected SWR detected SWR detected SWR detected AVX AVX AVX
AVX # the four ranks' startup messages, interleaved