Affinity control of paraview-osmesa-mpi on a cluster with SLURM

I just compiled paraview-osmesa-mpi, but when I use it on a SLURM cluster, it shows:
$ srun -pdebug -n1 -c4 pvserver --mpi --force-offscreen-rendering
srun: job 1692049 queued and waiting for resources
srun: job 1692049 has been allocated resources
Waiting for client…
Connection URL: cs://cn67:11111
Accepting connection(s): cn67:11111
Client connected.
SWR detected AVX # this line and the ones below appear when I load headsq.vti and turn on volume rendering
pthread_setaffinity_np failure for tid 0: Invalid argument
pthread_setaffinity_np failure for tid 6: Invalid argument
pthread_setaffinity_np failure for tid 7: Invalid argument
pthread_setaffinity_np failure for tid 8: Invalid argument
pthread_setaffinity_np failure for tid 9: Invalid argument
pthread_setaffinity_np failure for tid 10: Invalid argument
pthread_setaffinity_np failure for tid 11: Invalid argument
pthread_setaffinity_np failure for tid 12: Invalid argument
pthread_setaffinity_np failure for tid 13: Invalid argument
pthread_setaffinity_np failure for tid 14: Invalid argument
pthread_setaffinity_np failure for tid 15: Invalid argument
pthread_setaffinity_np failure for tid 1: Invalid argument

and from htop I can see that the CPU binding is
2,3,4,5,2,2,2,2, … # 1 process with 15 lightweight processes (or threads?)

On node cn67, the total number of processors is 16.

My question is: how can I control the affinity more automatically, and what is the best setting for performance?
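
One thing I am also wondering is whether the binding can be made explicit on the SLURM side instead. A sketch only (not verified; I do not know yet how --cpu_bind interacts with SWR's own pinning in this SLURM version):

$ srun -pdebug -n1 -c4 --cpu_bind=verbose,cores pvserver --mpi --force-offscreen-rendering # verbose makes srun print the CPU mask it applies to the task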

configuration:
paraview: v5.5.2
osmesa: 17.3.9-swr
llvm: 3.9.1
gcc: 5.4.0
mpi: mvapich2
slurm: 15.08.11

I tested the following strategies.

$ KNOB_MAX_WORKER_THREADS=4 srun -pdebug -n1 -c4 pvserver --mpi --force-offscreen-rendering
srun: job 1692056 queued and waiting for resources
srun: job 1692056 has been allocated resources
Waiting for client...
Connection URL: cs://cn94:11111
Accepting connection(s): cn94:11111
Client connected.
SWR detected AVX

No errors for volume rendering of headsq.vti.
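
For completeness, a batch-script version of that run would look roughly like the following (only a sketch; the partition name and thread count are simply the values from the test above):

#!/bin/bash
#SBATCH -p debug
#SBATCH -n 1
#SBATCH -c 4
# limit OpenSWR to the 4 cores this task was given
export KNOB_MAX_WORKER_THREADS=4
srun pvserver --mpi --force-offscreen-rendering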

$ KNOB_SINGLE_THREADED=1 srun -pdebug -n4 pvserver --mpi --force-offscreen-rendering
srun: job 1692060 queued and waiting for resources
srun: job 1692060 has been allocated resources
Waiting for client...
Connection URL: cs://cn94:11111
Accepting connection(s): cn94:11111
Waiting for client...
Connection URL: cs://cn94:11111
ERROR: In /home/dic17007/downloads/paraview/VTK/Common/System/vtkSocket.cxx, line 206
vtkServerSocket (0x2492470): Socket error in call to bind. Address already in use.

ERROR: In /home/dic17007/downloads/paraview/ParaViewCore/ClientServerCore/Core/vtkTCPNetworkAccessManager.cxx, line 437
vtkTCPNetworkAccessManager (0x17b22a0): Failed to set up server socket.

Exiting...
Waiting for client...
Connection URL: cs://cn94:11111
ERROR: In /home/dic17007/downloads/paraview/VTK/Common/System/vtkSocket.cxx, line 206
vtkServerSocket (0x278ace0): Socket error in call to bind. Address already in use.

ERROR: In /home/dic17007/downloads/paraview/ParaViewCore/ClientServerCore/Core/vtkTCPNetworkAccessManager.cxx, line 437
vtkTCPNetworkAccessManager (0x1aab2a0): Failed to set up server socket.

Exiting...
Waiting for client...
Connection URL: cs://cn94:11111
ERROR: In /home/dic17007/downloads/paraview/VTK/Common/System/vtkSocket.cxx, line 206
vtkServerSocket (0x2fa7d50): Socket error in call to bind. Address already in use.

ERROR: In /home/dic17007/downloads/paraview/ParaViewCore/ClientServerCore/Core/vtkTCPNetworkAccessManager.cxx, line 437
vtkTCPNetworkAccessManager (0x22c82a0): Failed to set up server socket.

Exiting...
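
My guess (not verified) is that each of the four srun tasks came up as an independent single-rank pvserver rather than as one 4-rank MPI job, so every task tried to bind port 11111. A way to check which MPI/PMI launch plugins this srun build offers is:

$ srun --mpi=list # lists the MPI plugin types srun supports; mvapich2 usually needs one of the pmi variants
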
$ KNOB_SINGLE_THREADED=1 salloc -pdebug -n4 mpiexec pvserver --force-offscreen-rendering
salloc: Pending job allocation 1692061
salloc: job 1692061 queued and waiting for resources
salloc: job 1692061 has been allocated resources
salloc: Granted job allocation 1692061
Waiting for client...
Connection URL: cs://cn94:11111
Accepting connection(s): cn94:11111
Client connected.
SWR detected AVX # printed by each of the four ranks; the output is interleaved
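
If both of these hold up, combining them (a few ranks, each with a few SWR threads) might look like the line below. This is only a sketch; I have not tested it, and whether the KNOB variable reaches all remote ranks depends on the MPI launcher.

$ KNOB_MAX_WORKER_THREADS=4 salloc -pdebug -n4 -c4 mpiexec pvserver --force-offscreen-rendering # 4 ranks x 4 SWR threads on a 16-core node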

I do not have any insight into your questions, but you might want to reach out to the OpenSWR folks for advice.

  • About the affinity setting

I found a TACC script:

The setting is:
KNOB_MAX_WORKER_THREADS=$(($NUMBER_CORES_IN_NODE / $TASKS_PER_NODE))

It shows that, for the same CPU with 32 cores and a certain dataset, the performance ordering is:
4 processes × 8 threads > 8 processes × 4 threads > 16 processes × 2 threads > 16 processes × 1 thread
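
Adapted to a SLURM batch script, that formula might look roughly like this (a sketch: SLURM_CPUS_ON_NODE is set inside the job, but SLURM_NTASKS_PER_NODE is only set when --ntasks-per-node is requested explicitly):

#!/bin/bash
#SBATCH -p debug
#SBATCH -N 1
#SBATCH --ntasks-per-node=4
# give each MPI rank an equal share of the node's cores, TACC-style
export KNOB_MAX_WORKER_THREADS=$(( SLURM_CPUS_ON_NODE / SLURM_NTASKS_PER_NODE ))
srun pvserver --mpi --force-offscreen-rendering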
