Affinity control of paraview-osmesa-mpi on a cluster with SLURM

I just compiled paraview-osmesa-mpi, but when I use it on a SLURM cluster, it shows:
$ srun -pdebug -n1 -c4 pvserver --mpi --force-offscreen-rendering
srun: job 1692049 queued and waiting for resources
srun: job 1692049 has been allocated resources
Waiting for client…
Connection URL: cs://cn67:11111
Accepting connection(s): cn67:11111
Client connected.
SWR detected AVX # this line and the ones below appear when I load headsq.vti and turn on volume rendering
pthread_setaffinity_np failure for tid 0: Invalid argument
pthread_setaffinity_np failure for tid 6: Invalid argument
pthread_setaffinity_np failure for tid 7: Invalid argument
pthread_setaffinity_np failure for tid 8: Invalid argument
pthread_setaffinity_np failure for tid 9: Invalid argument
pthread_setaffinity_np failure for tid 10: Invalid argument
pthread_setaffinity_np failure for tid 11: Invalid argument
pthread_setaffinity_np failure for tid 12: Invalid argument
pthread_setaffinity_np failure for tid 13: Invalid argument
pthread_setaffinity_np failure for tid 14: Invalid argument
pthread_setaffinity_np failure for tid 15: Invalid argument
pthread_setaffinity_np failure for tid 1: Invalid argument

and from htop I can see that the CPU binding is
2,3,4,5,2,2,2,2, … # 1 process with 15 lightweight processes (or threads?)

On node cn67, the total number of processors is 16.

My question is: how can I control the affinity more automatically, and what is the best setting for performance?
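
One thing I am also wondering is whether the binding can be made explicit on the SLURM side instead. A sketch only (not verified; I do not know yet how --cpu_bind interacts with SWR's own pinning in this SLURM version):

$ srun -pdebug -n1 -c4 --cpu_bind=verbose,cores pvserver --mpi --force-offscreen-rendering # verbose makes srun print the CPU mask it applies to the task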

configuration:
paraview: v5.5.2
osmesa: 17.3.9-swr
llvm: 3.9.1
gcc: 5.4.0
mpi: mvapich2
slurm: 15.08.11

I tested the following strategies.

$ KNOB_MAX_WORKER_THREADS=4 srun -pdebug -n1 -c4 pvserver --mpi --force-offscreen-rendering
srun: job 1692056 queued and waiting for resources
srun: job 1692056 has been allocated resources
Waiting for client...
Connection URL: cs://cn94:11111
Accepting connection(s): cn94:11111
Client connected.
SWR detected AVX

No errors for volume rendering of headsq.vti.
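
For completeness, a batch-script version of that run would look roughly like the following (only a sketch; the partition name and thread count are simply the values from the test above):

#!/bin/bash
#SBATCH -p debug
#SBATCH -n 1
#SBATCH -c 4
# limit OpenSWR to the 4 cores this task was given
export KNOB_MAX_WORKER_THREADS=4
srun pvserver --mpi --force-offscreen-rendering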

$ KNOB_SINGLE_THREADED=1 srun -pdebug -n4 pvserver --mpi --force-offscreen-rendering
srun: job 1692060 queued and waiting for resources
srun: job 1692060 has been allocated resources
Waiting for client...
Connection URL: cs://cn94:11111
Accepting connection(s): cn94:11111
Waiting for client...
Connection URL: cs://cn94:11111
ERROR: In /home/dic17007/downloads/paraview/VTK/Common/System/vtkSocket.cxx, line 206
vtkServerSocket (0x2492470): Socket error in call to bind. Address already in use.

ERROR: In /home/dic17007/downloads/paraview/ParaViewCore/ClientServerCore/Core/vtkTCPNetworkAccessManager.cxx, line 437
vtkTCPNetworkAccessManager (0x17b22a0): Failed to set up server socket.

Exiting...
Waiting for client...
Connection URL: cs://cn94:11111
ERROR: In /home/dic17007/downloads/paraview/VTK/Common/System/vtkSocket.cxx, line 206
vtkServerSocket (0x278ace0): Socket error in call to bind. Address already in use.

ERROR: In /home/dic17007/downloads/paraview/ParaViewCore/ClientServerCore/Core/vtkTCPNetworkAccessManager.cxx, line 437
vtkTCPNetworkAccessManager (0x1aab2a0): Failed to set up server socket.

Exiting...
Waiting for client...
Connection URL: cs://cn94:11111
ERROR: In /home/dic17007/downloads/paraview/VTK/Common/System/vtkSocket.cxx, line 206
vtkServerSocket (0x2fa7d50): Socket error in call to bind. Address already in use.

ERROR: In /home/dic17007/downloads/paraview/ParaViewCore/ClientServerCore/Core/vtkTCPNetworkAccessManager.cxx, line 437
vtkTCPNetworkAccessManager (0x22c82a0): Failed to set up server socket.

Exiting...
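
My guess (not verified) is that each of the four srun tasks came up as an independent single-rank pvserver rather than as one 4-rank MPI job, so every task tried to bind port 11111. A way to check which MPI/PMI launch plugins this srun build offers is:

$ srun --mpi=list # lists the MPI plugin types srun supports; mvapich2 usually needs one of the pmi variants
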
$ KNOB_SINGLE_THREADED=1 salloc -pdebug -n4 mpiexec pvserver --force-offscreen-rendering
salloc: Pending job allocation 1692061
salloc: job 1692061 queued and waiting for resources
salloc: job 1692061 has been allocated resources
salloc: Granted job allocation 1692061
Waiting for client...
Connection URL: cs://cn94:11111
Accepting connection(s): cn94:11111
Client connected.
SWR detected AVX # printed by each of the four ranks; the output is interleaved
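
If both of these hold up, combining them (a few ranks, each with a few SWR threads) might look like the line below. This is only a sketch; I have not tested it, and whether the KNOB variable reaches all remote ranks depends on the MPI launcher.

$ KNOB_MAX_WORKER_THREADS=4 salloc -pdebug -n4 -c4 mpiexec pvserver --force-offscreen-rendering # 4 ranks x 4 SWR threads on a 16-core node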

I do not have any insight into your questions, but you might want to reach out to the OpenSWR folks for advice.

  • About the affinity setting

I found a TACC script:

The setting is:
KNOB_MAX_WORKER_THREADS=$(($NUMBER_CORES_IN_NODE / $TASKS_PER_NODE))

It shows that, for the same CPU with 32 cores and a certain dataset, the performance ordering is:
4 processes × 8 threads > 8 processes × 4 threads > 16 processes × 2 threads > 16 processes × 1 thread
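
Adapted to a SLURM batch script, that formula might look roughly like this (a sketch: SLURM_CPUS_ON_NODE is set inside the job, but SLURM_NTASKS_PER_NODE is only set when --ntasks-per-node is requested explicitly):

#!/bin/bash
#SBATCH -p debug
#SBATCH -N 1
#SBATCH --ntasks-per-node=4
# give each MPI rank an equal share of the node's cores, TACC-style
export KNOB_MAX_WORKER_THREADS=$(( SLURM_CPUS_ON_NODE / SLURM_NTASKS_PER_NODE ))
srun pvserver --mpi --force-offscreen-rendering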
