Dear all,
I intend to batch-render on the GPU nodes of my HPC by submitting non-interactive SLURM jobs,
but I am a little stuck getting it running:
Here is what I tried:
1. Use the ParaView module provided by the HPC administrators → fails
like so:
#SBATCH --cpus-per-task=2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
module load ParaView/5.9.0-RC1-egl-mpi-Python-3.8
pvbatch --mpi --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering session.py
# OR
mpiexec -n $SLURM_CPUS_PER_TASK -bind-to core pvbatch --mpi --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering session.py
… refuses to render:
( 1.223s) [pvbatch.0 ] vtkEGLRenderWindow.cxx:382 ERR| vtkEGLRenderWindow (0x632d440): Only EGL 1.4 and greater allows OpenGL as client API. See eglBindAPI for more information.
( 1.223s) [pvbatch.0 ]vtkOpenGLRenderWindow.c:458 ERR| vtkEGLRenderWindow (0x632d440): GLEW could not be initialized: Missing GL version
This ParaView module is an EasyBuild installation, and I am fairly sure it is nothing but the precompiled server binary provided at https://www.paraview.org/download/
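In case it is relevant: the EGL error makes me suspect that pvbatch picks up the wrong libEGL on the compute node. These are the checks I would run inside a job allocation to verify that (the glvnd path is a guess for our system):
nvidia-smi                                      # confirm the GPU and driver are visible inside the job
ldconfig -p | grep -i libEGL                    # see whether libEGL resolves to the NVIDIA driver or to Mesa
ls /usr/share/glvnd/egl_vendor.d/ 2>/dev/null   # on glvnd-based systems, 10_nvidia.json should be listed for NVIDIA EGL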
2. Use the precompiled server binary from https://www.paraview.org/download/ → fails
It produces the same error as above.
3. Build my own ParaView via the superbuild → fails
#!/usr/bin/env bash
#SBATCH -J install
#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=2541
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
module load fosscuda/2020b # GCCcore-10.2.0
module load Python/3.8.6-GCCcore-10.2.0
module load SciPy-bundle/2020.11-fosscuda-2020b
module load matplotlib/3.3.3-fosscuda-2020b
module load CMake/3.18.4-GCCcore-10.2.0
module load libpng/1.6.37-GCCcore-10.2.0
module load Ninja/1.10.1-GCCcore-10.2.0
module load gperf/3.1-GCCcore-10.2.0
module load libffi/3.3-GCCcore-10.2.0
module load libxml2/2.9.10-GCCcore-10.2.0
module load pybind11/2.6.0-GCCcore-10.2.0
module load util-linux/2.36-GCCcore-10.2.0
module load zlib/1.2.11-GCCcore-10.2.0
module load freetype/2.10.3-GCCcore-10.2.0
module load SQLite/3.33.0-GCCcore-10.2.0
git clone --recursive https://gitlab.kitware.com/paraview/paraview-superbuild.git
cd paraview-superbuild
git checkout v5.9.1
git submodule update
cd ..
mkdir ParaView-5.9.1
cd ParaView-5.9.1
cmake -GNinja -Wno-dev \
-DCMAKE_INSTALL_PREFIX=$(pwd) \
-DPARAVIEW_BUILD_EDITION=CANONICAL \
-DENABLE_cuda=ON \
-DENABLE_egl=ON \
-DENABLE_fontconfig=ON \
-DENABLE_freetype=ON \
-DENABLE_mpi=ON \
-DENABLE_numpy=ON \
-DENABLE_ospray=ON \
-DENABLE_ospraymaterials=ON \
-DENABLE_png=ON \
-DENABLE_python3=ON \
-DENABLE_scipy=ON \
-DUSE_SYSTEM_bzip2=ON \
-DUSE_SYSTEM_freetype=ON \
-DUSE_SYSTEM_lapack=ON \
-DUSE_SYSTEM_libxml2=ON \
-DUSE_SYSTEM_mpi=ON \
-DUSE_SYSTEM_numpy=ON \
-DUSE_SYSTEM_png=ON \
-DUSE_SYSTEM_python3=ON \
-DUSE_SYSTEM_sqlite=ON \
../paraview-superbuild/
ninja -j $SLURM_CPUS_PER_TASK
ninja install
The build process seems successful: there is no error message, and I end up with a bin folder holding pvbatch and pvpython, among a few other executables.
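For completeness, this is the kind of check I would use to confirm that EGL and the system MPI were actually linked in (paths are relative to my CMAKE_INSTALL_PREFIX and are guesses, since the layout may differ between superbuild versions):
ldd ./bin/pvbatch | grep -iE 'egl|mpi|cuda'    # which EGL/MPI/CUDA libraries the binary would load
./bin/pvbatch --version                        # should report ParaView version 5.9.1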
However, it won’t run.
I have tried to run pvbatch in interactive SLURM sessions as well as through batch jobs, executing the lines
- mpiexec -n $SLURM_NTASKS -bind-to core pvbatch --mpi --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering ../session.py (slurm_19326116.sh)
- pvbatch --mpi --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering ../session.py (slurm_19326127.sh)
- pvbatch --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering ../session.py (slurm_19326144.sh)
- and also a plain pvbatch ../session.py
all producing this error:
--------------------------------------------------------------------------
An error occurred while trying to map in the address of a function.
Function Name: cuIpcOpenMemHandle_v2
Error string: /lib64/libcuda.so.1: undefined symbol: cuIpcOpenMemHandle_v2
CUDA-aware support is disabled.
--------------------------------------------------------------------------
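Regarding that last message: I read it as the CUDA-aware support in Open MPI expecting a newer libcuda than our 450.36.06 driver provides, since the _v2 symbol is simply missing from /lib64/libcuda.so.1. To narrow it down I would try the following (opal_cuda_support is a standard Open MPI MCA parameter; I have not verified that disabling it changes the rendering behaviour):
nm -D /lib64/libcuda.so.1 | grep -i cuIpcOpenMemHandle     # list which variants of the symbol the driver library exports
mpiexec -n $SLURM_NTASKS --mca opal_cuda_support 0 pvbatch --mpi --force-offscreen-rendering ../session.py   # run with CUDA-aware MPI explicitly disabled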
- Is there any additional information I can provide to help you help me?
- Is our CUDA too old? We have NVIDIA Tesla K80 GPUs with driver version 450.36.06 and CUDA 11.0. The most recent driver seems to be 470.57.02 with CUDA 11.4.
Best,
Bastian