Possible race condition in QVTKOpenGLNativeWidget::initializeGL

This bug shows up when packaging ParaView and VTK with nix/nixpkgs on the darwin platform.
You can reproduce it on aarch64-darwin by running:

nix build github:qbisi/nur-fem/7d748bdf7e3b132a90c56d524af016d5facbcd82#paraview
./result/bin/paraview

and a segfault occurs:

(   2.379s) [paraview        ]vtkOpenGLRenderWindow.c:921   WARN| .   .   .   .   vtkGenericOpenGLRenderWindow (0x1578fc880): Failed to initialize OpenGL functions!
(   2.380s) [paraview        ]vtkOpenGLRenderWindow.c:947   WARN| .   .   .   .   vtkGenericOpenGLRenderWindow (0x1578fc880): Unable to find a valid OpenGL 3.2 or later implementation. Please update your video card driver to the latest version. If you are using Mesa please make sure you have version 11.2 or later and make sure your driver in Mesa supports OpenGL 3.2 such as llvmpipe or openswr. If you are on windows and using Microsoft remote desktop note that it only supports OpenGL 3.2 with nvidia quadro cards. You can use other remoting software such as nomachine to avoid this issue.
[Resolvent:66468] *** Process received signal ***
[Resolvent:66468] Signal: Segmentation fault: 11 (11)
[Resolvent:66468] Signal code: Invalid permissions (2)
[Resolvent:66468] Failing at address: 0x0
[Resolvent:66468] [ 0] 0   libsystem_platform.dylib            0x0000000191ef7624 _sigtramp + 56
[Resolvent:66468] [ 1] 0   libvtkRenderingOpenGL2.1.dylib      0x000000010a9b56bc _ZN14vtkOpenGLState32InitializeTextureInternalFormatsEv + 180
[Resolvent:66468] [ 2] 0   libvtkRenderingOpenGL2.1.dylib      0x000000010a9b5004 _ZN14vtkOpenGLState10InitializeEP21vtkOpenGLRenderWindow + 72
[Resolvent:66468] [ 3] 0   libvtkRenderingOpenGL2.1.dylib      0x000000010a99c47c _ZN21vtkOpenGLRenderWindow5StartEv + 88
[Resolvent:66468] [ 4] 0   libvtkRenderingCore.1.dylib         0x00000001124bd414 _ZN15vtkRenderWindow6RenderEv + 176
[Resolvent:66468] [ 5] 0   libvtkRenderingOpenGL2.1.dylib      0x000000010a99ef50 _ZN21vtkOpenGLRenderWindow6RenderEv + 124
[Resolvent:66468] [ 6] 0   libvtkRenderingOpenGL2.1.dylib      0x000000010a8e1e00 _ZN28vtkGenericOpenGLRenderWindow6RenderEv + 128
[Resolvent:66468] [ 7] 0   libvtkRemotingViews.6.0.dylib       0x00000001088e45c0 _ZN15vtkPVRenderView6RenderEbb + 2760
[Resolvent:66468] [ 8] 0   libvtkRemotingViews.6.0.dylib       0x00000001088e3a84 _ZN15vtkPVRenderView17InteractiveRenderEv + 192
[Resolvent:66468] [ 9] 0   libvtkRemotingApplication.6.0.dylib 0x000000010533cdec _ZL22vtkPVRenderViewCommandP26vtkClientServerInterpreterP13vtkObjectBasePKcRK21vtkClientServerStreamRS5_Pv + 2980
[Resolvent:66468] [10] 0   libvtkRemotingClientServerStream.6. 0x00000001096353b0 _ZN26vtkClientServerInterpreter20ProcessCommandInvokeERK21vtkClientServerStreami + 276
[Resolvent:66468] [11] 0   libvtkRemotingClientServerStream.6. 0x0000000109634898 _ZN26vtkClientServerInterpreter17ProcessOneMessageERK21vtkClientServerStreami + 192
[Resolvent:66468] [12] 0   libvtkRemotingClientServerStream.6. 0x0000000109634788 _ZN26vtkClientServerInterpreter13ProcessStreamERK21vtkClientServerStream + 60
[Resolvent:66468] [13] 0   libvtkRemotingServerManager.6.0.dyl 0x0000000108bf28e8 _ZN16vtkPVSessionCore21ExecuteStreamInternalERK21vtkClientServerStreamb + 284
[Resolvent:66468] [14] 0   libvtkRemotingServerManager.6.0.dyl 0x0000000108bf264c _ZN16vtkPVSessionCore13ExecuteStreamEjRK21vtkClientServerStreamb + 268
[Resolvent:66468] [15] 0   libvtkRemotingServerManager.6.0.dyl 0x0000000108bef3c0 _ZN16vtkPVSessionBase13ExecuteStreamEjRK21vtkClientServerStreamb + 72
[Resolvent:66468] [16] 0   libvtkRemotingViews.6.0.dylib       0x00000001089a2d10 _ZN14vtkSMViewProxy17InteractiveRenderEv + 236
[Resolvent:66468] [17] 0   libvtkRemotingViews.6.0.dylib       0x00000001089a6ae8 _ZN30vtkSMViewProxyInteractorHelper6RenderEv + 520
[Resolvent:66468] [18] 0   libvtkCommonCore.1.dylib            0x0000000124ab9a00 _ZN16vtkSubjectHelper11InvokeEventEmPvP9vtkObject + 1240
[Resolvent:66468] [19] 0   libvtkGUISupportQt.1.dylib          0x0000000103e30b60 _ZN23QVTKRenderWindowAdapter13QVTKInternals5paintEv + 864
[Resolvent:66468] [20] 0   libvtkGUISupportQt.1.dylib          0x0000000103e2d944 _ZN22QVTKOpenGLNativeWidget7paintGLEv + 104
[Resolvent:66468] [21] 0   QtOpenGLWidgets                     0x0000000102809368 _ZN20QOpenGLWidgetPrivate6renderEv + 888
[Resolvent:66468] [22] 0   QtWidgets                           0x000000011954cd7c _ZN7QWidget5eventEP6QEvent + 112
[Resolvent:66468] [23] 0   QtOpenGLWidgets                     0x000000010280aa4c _ZN13QOpenGLWidget5eventEP6QEvent + 456
[Resolvent:66468] [24] 0   QtWidgets                           0x00000001194f21d0 _ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent + 332
[Resolvent:66468] [25] 0   QtWidgets                           0x00000001194f323c _ZN12QApplication6notifyEP7QObjectP6QEvent + 460
[Resolvent:66468] [26] 0   QtCore                              0x0000000119ce682c _ZN16QCoreApplication20sendSpontaneousEventEP7QObjectP6QEvent + 168
[Resolvent:66468] [27] 0   QtWidgets                           0x000000011954427c _ZN14QWidgetPrivate14sendPaintEventERK7QRegion + 72
[Resolvent:66468] [28] 0   QtOpenGLWidgets                     0x000000010280a344 _ZN13QOpenGLWidget11resizeEventEP12QResizeEvent + 260
[Resolvent:66468] [29] 0   QtWidgets                           0x000000011954d4b4 _ZN7QWidget5eventEP6QEvent + 1960

The error indicates that we are calling glad_glGetString(GL_VERSION) before a valid OpenGL context is current on the calling thread.

I can further confirm that this is a race condition, because the following sleep patch makes the crash go away:

diff --git a/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx b/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx
index a0f00debde..1e74544e03 100644
--- a/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx
+++ b/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx
@@ -10,6 +10,8 @@
 #include <QPointer>
 #include <QScopedValueRollback>
 #include <QtDebug>
+#include <thread>
+#include <chrono>
 
 #include "QVTKInteractor.h"
 #include "QVTKInteractorAdapter.h"
@@ -225,6 +227,7 @@ void QVTKOpenGLNativeWidget::initializeGL()
         {
           if (auto* symbol = context->getProcAddress(name))
           {
+            std::this_thread::sleep_for(std::chrono::milliseconds(500));
             return symbol;
           }
         }

This patch does not fix the root cause of the race condition, though.

@jaswantp @cory.quammen

I remember that printing the symbol to console fixed this issue. The null check seemed to have added enough of a delay to fix this segfault. The context should already be current because vtkOpenGLRenderWindow calls MakeCurrent before the state initialization. It is a mystery because this happens only on macOS.

Yes, printing the symbol did fix this race condition on the macOS platform for me.
What’s your current approach to fixing this problem? Printing something to stdout will contaminate the log output, which in my opinion is not good.
For now, I would prefer this patch on the darwin platform:

diff --git a/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx b/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx
index a0f00debde..2f5ce52ddb 100644
--- a/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx
+++ b/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx
@@ -230,7 +230,7 @@ void QVTKOpenGLNativeWidget::initializeGL()
         }
         return nullptr;
       };
-      this->RenderWindow->SetOpenGLSymbolLoader(loadFunc, this->context());
+      // this->RenderWindow->SetOpenGLSymbolLoader(loadFunc, this->context());
       this->RenderWindow->vtkOpenGLRenderWindow::OpenGLInit();
     }
     auto ostate = this->RenderWindow->GetState();

and it does work for me.

Hi, how did you solve this bug when packaging ParaView on macOS? Is there a recommended way to do the null check without polluting stdout?

Hi,

Apologies for the slow response. I think this is not an issue in our macos paraview distribution (haven’t seen anyone report a bug like this).

If you have the time, my suggestion is to build qt using a RelWithDebInfo config and try to understand what is going on here.

I’m not entirely sure what nullptr check is being done, but perhaps you just loop on that check until it is valid and then proceed?
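The "loop on that check" idea could be sketched as a bounded retry, shown here in isolation. This is only an illustration under assumptions: resolveWithRetry and SymbolPtr are hypothetical names, not VTK or Qt API, and the thread does not establish that the pointer is ever actually null in this bug.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Generic function-pointer type, standing in for what a symbol loader returns.
using SymbolPtr = void (*)();

// Hypothetical helper (not part of VTK or Qt): call `resolve` until it returns
// a non-null pointer or `maxAttempts` tries have been made, sleeping `delay`
// between consecutive tries.
inline SymbolPtr resolveWithRetry(const std::function<SymbolPtr()>& resolve,
                                  int maxAttempts,
                                  std::chrono::milliseconds delay)
{
  assert(maxAttempts > 0);
  for (int attempt = 0; attempt < maxAttempts; ++attempt)
  {
    if (SymbolPtr symbol = resolve())
    {
      return symbol;
    }
    if (attempt + 1 < maxAttempts)
    {
      std::this_thread::sleep_for(delay);
    }
  }
  // Still unresolved: the caller has to handle this instead of calling through it.
  return nullptr;
}
```

Bounding the number of attempts keeps the GUI thread from hanging forever if the symbol never resolves. Like the sleep patch, this would only paper over a timing problem rather than explain it.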

Hi, thanks for your suggestion. What if I comment out the SetOpenGLSymbolLoader line and use vtkglad to find the appropriate OpenGL library on the darwin platform? It works for me, but I am not sure whether it has side effects.

The nullptr check refers to https://gitlab.kitware.com/vtk/vtk/-/blob/master/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx#L226

if (auto* symbol = context->getProcAddress(name))
{
  return symbol;
}

We kind of want to use the OpenGL functions provided by Qt. But as it is troublesome on macOS, I think it is reasonable to conditionally skip this->RenderWindow->SetOpenGLSymbolLoader(loadFunc, this->context()); on apple.

diff --git a/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx b/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx
index a0f00debde..6dd37d3ac2 100644
--- a/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx
+++ b/GUISupport/Qt/QVTKOpenGLNativeWidget.cxx
@@ -225,6 +225,7 @@ void QVTKOpenGLNativeWidget::initializeGL()
         {
           if (auto* symbol = context->getProcAddress(name))
           {
+            qDebug() << "symbol not null";
             return symbol;
           }
         }

I had to print something to the console to get past the race condition,
which I think might pollute stdout.

By the way, I also built ParaView/VTK with the serial SMP backend and without MPI.

Yeah, I remember having to do that too. But I noticed that the if statement on symbol added enough of a delay to fix the race condition and I did not need the print statement.

Apparently, it is still not enough of a delay and needs a print. Can you contribute the fix to VTK which skips setting the opengl symbol loader for macOS?

There is another place where the OpenGL symbol loader is set:
https://gitlab.kitware.com/vtk/vtk/-/blob/master/GUISupport/Qt/QVTKOpenGLWindow.cxx?ref_type=heads#L219.
Should I conditionally skip this line too?

Yes

Something like

#if !defined(__APPLE__)
this->RenderWindow->SetOpenGLSymbolLoader(loadFunc, this->context());
#endif
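Since the same call appears in both QVTKOpenGLNativeWidget and QVTKOpenGLWindow, the guard could live in one place. Below is a minimal, self-contained sketch of that idea. FakeRenderWindow, SetSymbolLoader, installQtSymbolLoader and kUseQtSymbolLoader are hypothetical stand-ins for illustration only; the real call is the SetOpenGLSymbolLoader(loadFunc, this->context()) line shown above.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Signature of a symbol loader: OpenGL function name in, function address out.
using SymbolLoader = std::function<void*(const char*)>;

// Hypothetical stand-in for the render window; only the loader slot matters here.
struct FakeRenderWindow
{
  SymbolLoader loader; // left empty when no custom loader is installed
  void SetSymbolLoader(SymbolLoader f) { loader = std::move(f); }
};

// True where the Qt-provided loader is installed, false on Apple platforms.
#if defined(__APPLE__)
constexpr bool kUseQtSymbolLoader = false;
#else
constexpr bool kUseQtSymbolLoader = true;
#endif

// Hypothetical helper so that both widgets share a single platform guard.
inline void installQtSymbolLoader(FakeRenderWindow& window, SymbolLoader qtLoader)
{
  assert(static_cast<bool>(qtLoader)); // the caller must pass a callable loader
  if (kUseQtSymbolLoader)
  {
    window.SetSymbolLoader(std::move(qtLoader));
  }
}
```

On Apple platforms the loader slot stays empty, which corresponds to the behavior discussed in this thread where the default lookup is used instead of Qt's getProcAddress.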

Skip setting the opengl symbol loader for macOS (!12262) · Merge requests · VTK / VTK · GitLab

I have opened my first merge request to VTK. Please correct me if I did something wrong.

Thank you for the contribution! I have started the tests and will merge after they come back clean.