Dear all,
I intend to batch-render on the GPU nodes of my HPC by submitting non-interactive SLURM jobs,
but I am a little stuck getting it running:
Here is what I tried:
1. Use the ParaView module provided by the HPC administrators → fails
like so:
#SBATCH --cpus-per-task=2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
module load ParaView/5.9.0-RC1-egl-mpi-Python-3.8
pvbatch --mpi --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering session.py
# OR
mpiexec -n $SLURM_CPUS_PER_TASK -bind-to core pvbatch --mpi --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering session.py
… refuses to render:
( 1.223s) [pvbatch.0 ] vtkEGLRenderWindow.cxx:382 ERR| vtkEGLRenderWindow (0x632d440): Only EGL 1.4 and greater allows OpenGL as client API. See eglBindAPI for more information.
( 1.223s) [pvbatch.0 ]vtkOpenGLRenderWindow.c:458 ERR| vtkEGLRenderWindow (0x632d440): GLEW could not be initialized: Missing GL version
This ParaView module is an EasyBuild installation, and I am fairly sure it is nothing but the precompiled server binary provided at https://www.paraview.org/download/
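In case it is relevant: the EGL error makes me suspect that pvbatch picks up the wrong libEGL on the compute node. These are the checks I would run inside a job allocation to verify that (the glvnd path is a guess for our system):
nvidia-smi                                      # confirm the GPU and driver are visible inside the job
ldconfig -p | grep -i libEGL                    # see whether libEGL resolves to the NVIDIA driver or to Mesa
ls /usr/share/glvnd/egl_vendor.d/ 2>/dev/null   # on glvnd-based systems, 10_nvidia.json should be listed for NVIDIA EGL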
2. Use the precompiled server binary from https://www.paraview.org/download/ → fails
It produces the same error as above.
3. Build my own ParaView via the superbuild → fails
#!/usr/bin/env bash
#SBATCH -J install
#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=2541
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
module load fosscuda/2020b # GCCcore-10.2.0
module load Python/3.8.6-GCCcore-10.2.0
module load SciPy-bundle/2020.11-fosscuda-2020b
module load matplotlib/3.3.3-fosscuda-2020b
module load CMake/3.18.4-GCCcore-10.2.0
module load libpng/1.6.37-GCCcore-10.2.0
module load Ninja/1.10.1-GCCcore-10.2.0
module load gperf/3.1-GCCcore-10.2.0
module load libffi/3.3-GCCcore-10.2.0
module load libxml2/2.9.10-GCCcore-10.2.0
module load pybind11/2.6.0-GCCcore-10.2.0
module load util-linux/2.36-GCCcore-10.2.0
module load zlib/1.2.11-GCCcore-10.2.0
module load freetype/2.10.3-GCCcore-10.2.0
module load SQLite/3.33.0-GCCcore-10.2.0
git clone --recursive https://gitlab.kitware.com/paraview/paraview-superbuild.git
cd paraview-superbuild
git checkout v5.9.1
git submodule update
cd ..
mkdir ParaView-5.9.1
cd ParaView-5.9.1
cmake -GNinja -Wno-dev \
-DCMAKE_INSTALL_PREFIX=$(pwd) \
-DPARAVIEW_BUILD_EDITION=CANONICAL \
-DENABLE_cuda=ON \
-DENABLE_egl=ON \
-DENABLE_fontconfig=ON \
-DENABLE_freetype=ON \
-DENABLE_mpi=ON \
-DENABLE_numpy=ON \
-DENABLE_ospray=ON \
-DENABLE_ospraymaterials=ON \
-DENABLE_png=ON \
-DENABLE_python3=ON \
-DENABLE_scipy=ON \
-DUSE_SYSTEM_bzip2=ON \
-DUSE_SYSTEM_freetype=ON \
-DUSE_SYSTEM_lapack=ON \
-DUSE_SYSTEM_libxml2=ON \
-DUSE_SYSTEM_mpi=ON \
-DUSE_SYSTEM_numpy=ON \
-DUSE_SYSTEM_png=ON \
-DUSE_SYSTEM_python3=ON \
-DUSE_SYSTEM_sqlite=ON \
../paraview-superbuild/
ninja -j $SLURM_CPUS_PER_TASK
ninja install
The build process seems successful: there is no error message, and I end up with a bin folder holding pvbatch and pvpython, among a few other executables.
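For completeness, this is the kind of check I would use to confirm that EGL and the system MPI were actually linked in (paths are relative to my CMAKE_INSTALL_PREFIX and are guesses, since the layout may differ between superbuild versions):
ldd ./bin/pvbatch | grep -iE 'egl|mpi|cuda'    # which EGL/MPI/CUDA libraries the binary would load
./bin/pvbatch --version                        # should report ParaView version 5.9.1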
However, it won’t run.
I have tried to run pvbatch in interactive SLURM sessions as well as through batch jobs, executing the lines
- mpiexec -n $SLURM_NTASKS -bind-to core pvbatch --mpi --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering ../session.py (slurm_19326116.sh)
- pvbatch --mpi --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering ../session.py (slurm_19326127.sh)
- pvbatch --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering ../session.py (slurm_19326144.sh)
- and also a plain pvbatch ../session.py
all producing this error:
--------------------------------------------------------------------------
An error occurred while trying to map in the address of a function.
Function Name: cuIpcOpenMemHandle_v2
Error string: /lib64/libcuda.so.1: undefined symbol: cuIpcOpenMemHandle_v2
CUDA-aware support is disabled.
--------------------------------------------------------------------------
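Regarding that last message: I read it as the CUDA-aware support in Open MPI expecting a newer libcuda than our 450.36.06 driver provides, since the _v2 symbol is simply missing from /lib64/libcuda.so.1. To narrow it down I would try the following (opal_cuda_support is a standard Open MPI MCA parameter; I have not verified that disabling it changes the rendering behaviour):
nm -D /lib64/libcuda.so.1 | grep -i cuIpcOpenMemHandle     # list which variants of the symbol the driver library exports
mpiexec -n $SLURM_NTASKS --mca opal_cuda_support 0 pvbatch --mpi --force-offscreen-rendering ../session.py   # run with CUDA-aware MPI explicitly disabled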
- Is there any additional information I can provide to help you help me?
- Is our CUDA too old? We have NVIDIA Tesla K80 GPUs with driver version 450.36.06 and CUDA 11.0. The most recent driver seems to be 470.57.02 with CUDA 11.4.
Best,
Bastian