NVIDIA IndeX issues with CUDA 11.4 and ParaView 5.10

darrengarvey · February 3, 2022, 3:26pm

I’ve been trying to test out NVIDIA IndeX but have not been able to get it working. I suspect it’s a CUDA version mismatch.

I’ve got two RTX 3090s and have tried the official Paraview 5.10 download as well as building Paraview from source (with all the necessary options enabled).

In both cases I get a nondescript error when the plugin tries to query my GPUs. When I’ve seen this error before (outside of ParaView), it has usually been caused by a CUDA driver version mismatch.

nvindex:   0.0   PVPLN  main info : NVIDIA IndeX ParaView plugin 5.10 using NVIDIA IndeX library 2.3 (build 348900.100.964, 26 Aug 2021, linux-x86-64-gcc7).
nvindex:   0.0   INDEX  main info : NVIDIA IndeX 2.3 (build 348900.100.964, 26 Aug 2021, linux-x86-64-gcc7) is starting up ...
nvindex:   0.0   INDEX  main info : Using default NVIDIA IndeX license.
nvindex:   0.0   INDEX  main info : Authenticating DiCE library with vendor key 'NVIDIA IndeX License for Paraview IndeX:PV:Free:v1 - 20210823 (oem:retail_cloud.20230831)'
nvindex:   0.0   INDEX  main info : This NVIDIA IndeX license key will expire on 2023-08-31.
nvindex:   0.0   INDEX  main info : This free version of NVIDIA IndeX enables the compute power of a single GPU for scientific visualization.
nvindex:   0.0   INDEX  main info : Starting the DiCE library (DiCE 2021, build 348900.100.964, 26 Aug 2021, linux-x86-64) ...
(  38.794s) [        19EB5000]    vtkOutputWindow.cxx:85    WARN| nvindex:   0.1   CUDA   rend warn : CUDA module initialization failed.

(  38.794s) [        19EB5000]    vtkOutputWindow.cxx:85    WARN| nvindex:   0.1   CUDA   rend warn : cudaRuntimeGetVersion returned with error 'unknown error'

nvindex:   0.1   CLUSTR net  info : Networking is switched off.
(  38.846s) [paraview        ]    vtkOutputWindow.cxx:75     ERR| nvindex:   1.0   INDEX  main error: Failed to query the number of CUDA devices (cudaGetDeviceCount): unknown error.

(  38.846s) [paraview        ]    vtkOutputWindow.cxx:75     ERR| nvindex:   1.0   INDEX  main error: Could not find any valid CUDA devices, aborting.

(  38.846s) [paraview        ]    vtkOutputWindow.cxx:75     ERR| nvindex:   1.0   PVPLN  main error: Fatal: Could not start NVIDIA IndeX library (error code 6), see log messages above for details.

(  39.031s) [paraview        ]    vtkOutputWindow.cxx:75     ERR| nvindex:   1.0   PVPLN  main error: The NVIDIA IndeX plugin was not initialized! See the log output for details.

ParaView is able to see my GPU for OpenGL rendering, just not for IndeX (the screenshot is from the About info).

Screenshot from 2022-02-03 15-18-52

I can’t find which CUDA versions IndeX supports or what libnvindex.so (from: https://www.paraview.org/files/dependencies/) is built against. Is this information available anywhere? Or is there a build of libnvindex.so that is compatible with CUDA 11.4?

jmensmann · February 3, 2022, 5:13pm

The NVIDIA IndeX plugin requires a GPU driver that supports CUDA 10.2 or newer, meaning version >=440.33. So your driver should be more than sufficient.

Since the CUDA initialization fails, it is likely that you are missing some dependency libraries. The official NVIDIA driver package contains all of them, but Linux distributions tend to split them into multiple packages.

Which Linux distribution are you using? On Ubuntu, for example, the CUDA libraries are provided by the package “libnvidia-compute-*”.
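One quick way to test the missing-library theory is to try loading the driver libraries directly. This is only a rough sketch: the two library names below are an assumption (they are the sonames the NVIDIA driver normally installs on Linux), not a list taken from the IndeX documentation.

```python
import ctypes

# Assumed sonames: what the NVIDIA driver usually ships on Linux.
# This is not an official list of IndeX dependencies.
DRIVER_LIBS = ["libcuda.so.1", "libnvidia-ml.so.1"]


def check_libs(names):
    """Return {name: bool}, True when the dynamic loader can open the library."""
    result = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            result[name] = True
        except OSError:
            # Raised when the library (or one of its own dependencies) is missing.
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in check_libs(DRIVER_LIBS).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If either one comes back as MISSING, that would point towards a package that was not installed rather than a version problem.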

Also please check the output of “nvidia-smi”. Which CUDA version is reported there?

darrengarvey · February 3, 2022, 9:25pm

TL;DR: Suspending and resuming appears to have caused issues with nvidia-uvm. A reboot fixed it.

Thanks for the reply. The CUDA versions are:

NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4
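For anyone who wants to script this check, here is a small sketch that parses the nvidia-smi header line and compares it against the minimums mentioned above (driver 440.33, CUDA 10.2). The function name and the regular expression are my own and assume the header keeps the "Driver Version: ... CUDA Version: ..." layout shown here.

```python
import re

MIN_DRIVER = (440, 33)  # minimum driver version quoted earlier in the thread
MIN_CUDA = (10, 2)      # minimum CUDA version quoted earlier in the thread


def parse_smi_header(line):
    """Return (driver_version, cuda_version) as tuples of ints."""
    m = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line)
    if m is None:
        raise ValueError("not an nvidia-smi header line")

    def to_tuple(text):
        return tuple(int(part) for part in text.split("."))

    return to_tuple(m.group(1)), to_tuple(m.group(2))


header = "NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4"
driver, cuda = parse_smi_header(header)
print("driver ok:", driver >= MIN_DRIVER)
print("cuda ok:", cuda >= MIN_CUDA)
```

Comparing tuples of integers avoids the usual trap of comparing version strings, where "470.9" would sort after "470.86".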

I did some stracing, and the error comes up when the process tries to open /dev/nvidia-uvm.

[pid 765341] fstat(37, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid 765341] close(37)                  = 0
[pid 765341] stat("/dev/nvidiactl", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xc3, 0xff), ...}) = 0
[pid 765341] openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 37
[pid 765341] fcntl(37, F_SETFD, FD_CLOEXEC) = 0
[pid 765341] ioctl(37, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd2, 0x48), 0x7fdc651714e0) = 0
------8<------
[pid 765341] stat("/dev/nvidia-uvm", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1f8, 0), ...}) = 0
[pid 765341] stat("/dev/nvidia-uvm-tools", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1f8, 0x1), ...}) = 0
[pid 765341] openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR|O_CLOEXEC) = -1 EIO (Input/output error)
[pid 765341] openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR) = -1 EIO (Input/output error)
[pid 765341] ioctl(-5, _IOC(_IOC_NONE, 0, 0x2, 0x3000), 0) = -1 EBADF (Bad file descriptor)
[pid 765341] ioctl(37, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x29, 0x10), 0x7fdc65171520) = 0
[pid 765341] close(37)                  = 0
[pid 765341] futex(0x7fdc69799318, FUTEX_WAKE_PRIVATE, 2147483647) = 0
( 101.187s) [        65172700]    vtkOutputWindow.cxx:85    WARN| nvindex:   0.1   CUDA   rend warn : CUDA module initialization failed.

( 101.187s) [        65172700]    vtkOutputWindow.cxx:85    WARN| nvindex:   0.1   CUDA   rend warn : cudaRuntimeGetVersion returned with error 'unknown error'

This led me to find reports elsewhere that, after resuming from standby, there can be problems with NVIDIA’s unified virtual memory manager (i.e. /dev/nvidia-uvm).
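If you want to check for this state without running strace, a minimal sketch like the one below repeats the same open calls from Python. The device paths are the ones from the trace above; the helper name is just for illustration.

```python
import errno
import os

# Device nodes taken from the strace output above.
DEVICES = ["/dev/nvidiactl", "/dev/nvidia-uvm"]


def probe(path):
    """Open a device node read/write; return 'ok' or the symbolic errno name."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"


if __name__ == "__main__":
    for path in DEVICES:
        print(f"{path}: {probe(path)}")
```

Going by the trace, I would expect the broken session to report EIO for /dev/nvidia-uvm while /dev/nvidiactl opens fine, but I have not run this script in that state.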

So it’s working now!

EDIT: All of the relevant libraries were found and all other system calls appeared to work as they should.