Floating point exception since Paraview 5.7.0 using NoMachine

I used to execute ParaView remotely on (headless) EC2 using NoMachine, which worked perfectly with the command

$ MESA_GL_VERSION_OVERRIDE=3.2 /data/ParaView-5.6.1-MPI-Linux-64bit/bin/paraview

Since version 5.7.0, I get a floating point error with no additional output:

$ MESA_GL_VERSION_OVERRIDE=3.2 /data/ParaView-5.7.0-MPI-Linux-Python2.7-64bit/bin/paraview
Floating point exception (core dumped)

This happens regardless of whether I choose the Python 2.7 or the Python 3.7 download of ParaView 5.7.0 from https://www.paraview.org/download/.

What changed between 5.6 and 5.7? Perhaps an OpenGL version update? Is there a setting that would make it run on EC2 through NoMachine again?


Is it possible to get a backtrace of where the exception occurred? I don’t think OpenGL changed substantially between the two. Mesa did get updated, but an FP exception seems odd for that. @chuckatkins

Does this help? I am not very familiar with debuggers, but I opened the core dump in gdb. If you could provide a binary with debug information, I might be able to send you a more detailed backtrace. I couldn’t compile ParaView successfully.


$ gdb /tmp/ParaView-5.7.0-MPI-Linux-Python2.7-64bit/bin/paraview core
GNU gdb (Ubuntu 8.1-0ubuntu3.1)
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
Find the GDB manual and other documentation resources online at:
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /tmp/ParaView-5.7.0-MPI-Linux-Python2.7-64bit/bin/paraview...(no debugging symbols found)...done.
[New LWP 5267]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/tmp/ParaView-5.7.0-MPI-Linux-Python2.7-64bit/bin/paraview'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0  0x00007fc80bbf3543 in mkldnn::impl::cpu::(anonymous namespace)::get_cache_size(int, bool) [clone .constprop.245] () from /tmp/ParaView-5.7.0-MPI-Linux-Python2.7-64bit/bin/../lib/libOpenImageDenoise.so.0
(gdb) bt
#0  0x00007fc80bbf3543 in mkldnn::impl::cpu::(anonymous namespace)::get_cache_size(int, bool) [clone .constprop.245] () from /tmp/ParaView-5.7.0-MPI-Linux-Python2.7-64bit/bin/../lib/libOpenImageDenoise.so.0
#1  0x00007fc80bb87c52 in _GLOBAL__sub_I_jit_avx512_common_conv_kernel.cpp () from /tmp/ParaView-5.7.0-MPI-Linux-Python2.7-64bit/bin/../lib/libOpenImageDenoise.so.0
#2  0x00007fc83361d733 in call_init (env=0x7fffab844688, argv=0x7fffab844678, argc=1, l=<optimized out>) at dl-init.c:72
#3  _dl_init (main_map=0x7fc833836170, argc=1, argv=0x7fffab844678, env=0x7fffab844688) at dl-init.c:119
#4  0x00007fc83360e0ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#5  0x0000000000000001 in ?? ()
#6  0x00007fffab8453ae in ?? ()
#7  0x0000000000000000 in ?? ()

This points to OIDN being the problem. @Dave_DeMarle

Is a workaround possible? Maybe install a different version of libOpenImageDenoise.so.0 and force it to be used?
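
One possible shape for that swap, assuming a newer `libOpenImageDenoise.so.0` can be obtained separately (the `/opt/oidn` path here is illustrative, not from this thread, and this is untested):

```shell
# Force the dynamic loader to map a replacement OIDN library ahead of the
# copy bundled in ParaView's lib/ directory. LD_PRELOAD takes precedence
# even over an RPATH baked into the binary, so the bundled .so (whose
# static initializer raises the SIGFPE) would never be loaded.
LD_PRELOAD=/opt/oidn/lib/libOpenImageDenoise.so.0 \
  MESA_GL_VERSION_OVERRIDE=3.2 \
  /data/ParaView-5.7.0-MPI-Linux-Python2.7-64bit/bin/paraview
```

Alternatively, the bundled `lib/libOpenImageDenoise.so.0` could simply be moved aside and a replacement copied into its place. Either way this only helps if the newer OIDN actually fixes the crash.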

Can the “virtual hardware” of EC2 on the AWS cloud be the problem? Stack frame #1 mentions avx512, which might require specific CPU capabilities.

Found on https://github.com/OpenImageDenoise/oidn:

Intel Open Image Denoise internally builds on top of [Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN)](https://github.com/intel/mkl-dnn), and automatically exploits modern instruction sets like Intel SSE4, AVX2, and AVX-512 to achieve high denoising performance. A CPU with support for at least SSE4.1 is required to run Intel Open Image Denoise.
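
A quick way to check which of these instruction sets the instance's CPU actually advertises (flag names as they appear in `/proc/cpuinfo`; this is just a grep sketch):

```shell
# Print the SIMD-related feature flags the kernel reports for this CPU.
# sse4_1 is OIDN's stated minimum; avx2/avx512f only enable faster paths.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' \
  | grep -E -x 'sse4_1|sse4_2|avx|avx2|avx512f' | sort -u
```

If `sse4_1` is missing from the output, OIDN cannot run at all on that instance type.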

Is OIDN always required in ParaView, or only for certain ray-tracing optimizations? Can I disable OIDN (without recompiling ParaView)?


I think there’s work underway to make OSPRay, VisRTX, and friends load only when they’re needed, but I don’t know its status. I’m not aware of a workaround right now once it is compiled in, sorry.

@tbiedert @Dave_DeMarle

More of a Carson/Jefferson thing than a Tim Biedert thing. I’ll submit a bug upstream and work with them on it. (Perhaps this has already been noticed and fixed since OIDN 0.7, the version we packaged with the ParaView 5.7.0 binaries.)

@nathan it is a ray tracing thing. OIDN should only become active when you first turn on ray tracing and subsequently enable denoising (both at the bottom of the view panel), but it seems to go wrong before that in your situation. You might get away with swapping in a newer library, but to the best of my knowledge you need to recompile ParaView to get around it. The most relevant CMake option to try turning off is VTKOSPRAY_ENABLE_DENOISER.
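
For anyone who can rebuild, that option would be switched off at configure time, roughly like this (a sketch: `VTKOSPRAY_ENABLE_DENOISER` is the variable named above, the rest of the configure line is illustrative):

```shell
# Configure a ParaView build with the OSPRay denoiser module, and hence
# the libOpenImageDenoise dependency, disabled.
cmake -DVTKOSPRAY_ENABLE_DENOISER=OFF /path/to/paraview-source
```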

I’d appreciate an update if it becomes possible to run ParaView without the ray tracers, or some other workaround emerges. Otherwise I’m stuck with ParaView <=5.6.

Missing AVX/SSE4.1 support in EC2 (here, for example, on a c3.large instance) does not seem to be the problem. Under “flags”, avx, sse4_1 and sse4_2 are all listed:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x42e
cpu MHz         : 2793.344
cache size      : 25600 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti fsgsbase smep erms xsaveopt
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5586.53
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

@Dave_DeMarle yes, I cannot even see ParaView’s UI, let alone use any ray tracing. It crashes before anything is visible except the FPE. I’ll try to find a newer OIDN library version and make ParaView use it. I am not able to compile ParaView in this environment because the many dependencies are too complicated for me, so I am limited to the binaries from Kitware.


Also check to see why your EC2 image doesn’t have SSE4.1-compatible instruction sets on the CPUs. I’ve heard that most AMD and Intel CPUs made in the last 7 years do (it was introduced in 2007).

The ParaView Superbuild project should make the build (and especially the dependencies) go a lot easier; see: https://gitlab.kitware.com/paraview/paraview-superbuild

So something is awry with oidn’s chipset detection. hmmm…

Tracking this here: https://gitlab.kitware.com/paraview/paraview/issues/19424


Thank you, Dave. You mentioned at https://github.com/OpenImageDenoise/oidn/issues/43 that you’ll dig into the problem. Looking forward to updates.

Here are the steps to make it easier to reproduce the problem (assuming you have access to EC2 on AWS):

  1. Launch a c3.large instance from ami-0ac019f4fcb7cb7e6
  2. Install requirements:
    sudo apt update && sudo apt install -y libxt6 libgl1-mesa-dev libglu1-mesa-dev
  3. Download paraview from https://www.paraview.org/paraview-downloads/download.php?submit=Download&version=v5.7&type=binary&os=Linux&downloadFile=ParaView-5.7.0-MPI-Linux-Python2.7-64bit.tar.gz
  4. Extract and run ParaView. You will get the stack trace shown above.
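
Steps 2–4 as a script, to run on the freshly launched instance (the URL and package list are the ones above; the `wget` output filename is arbitrary):

```shell
#!/bin/sh
set -e
# Step 2: runtime dependencies for the ParaView binary release.
sudo apt update && sudo apt install -y libxt6 libgl1-mesa-dev libglu1-mesa-dev
# Step 3: fetch the 5.7.0 MPI / Python 2.7 Linux binary from paraview.org.
wget -O ParaView-5.7.0.tar.gz \
  'https://www.paraview.org/paraview-downloads/download.php?submit=Download&version=v5.7&type=binary&os=Linux&downloadFile=ParaView-5.7.0-MPI-Linux-Python2.7-64bit.tar.gz'
# Step 4: extract and run; this is where the SIGFPE occurs.
tar xzf ParaView-5.7.0.tar.gz
./ParaView-5.7.0-MPI-Linux-Python2.7-64bit/bin/paraview
```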

@Dave_DeMarle any progress on this issue?