Issue and situation
This post is a second branch of "Reverse client-server connection fails with more than one MPI process". Parallel jobs fall back to serial in a reverse connection between client and server within the same network; see the post above for the mechanics of the error.
- On both server and client I use the same architecture, operating system and compiler: x86_64, Debian GNU/Linux 10 (buster), gcc (Debian 8.3.0-6) 8.3.0.
- ParaView is a release build of 5.7.0 from the Superbuild at https://gitlab.kitware.com/paraview/paraview-superbuild/blob/master/README.md. The repository has been cloned separately on client and server.
Earlier I was trying to pinpoint the cause of the error linked to the disconnection of an MPI process: "vtkSocketCommunicator::GetVersion() returns different values on the two connecting processes". @mwestphal has pointed out that this message is not informative and is a known issue in itself, see https://gitlab.kitware.com/paraview/paraview/issues/18171. Therefore it makes sense to look into the error messages received just before that one:
( 26.150s) [pvserver ]vtkSocketCommunicator.c:808 ERR| vtkSocketCommunicator (0x56537cb0bff0): Could not receive tag. 1010580540
( 26.150s) [pvserver ]vtkSocketCommunicator.c:557 ERR| vtkSocketCommunicator (0x56537cb0bff0): Endian handshake failed.
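As a side check, the tag value printed in the first message can be decoded into raw bytes. The snippet below is only a diagnostic sketch, assuming the tag is a 4-byte integer as printed in the log; it is not part of ParaView or VTK, and the variable names are made up for illustration.

```python
import struct

# Value copied from the log line "Could not receive tag. 1010580540".
tag = 1010580540

# Assumption: the tag is a 4-byte signed integer. Pack it in both byte
# orders to see which bytes would have arrived on the socket.
raw_le = struct.pack("<i", tag)  # little-endian
raw_be = struct.pack(">i", tag)  # big-endian

print(hex(tag))        # 0x3c3c3c3c
print(raw_le, raw_be)  # b'<<<<' b'<<<<'
```

Both byte orders give the same four bytes, the ASCII character `<` repeated four times. This may suggest that pvserver read text rather than a binary handshake value from the socket, but that reading is an assumption and does not by itself identify the cause.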
I found reports of one or both of these errors in
- 2016, issue tracker of VTK: https://vtk.org/Bug/view.php?id=16094
- 2013 https://www.cfd-online.com/Forums/paraview/124418-volume-rendering-only-one-core.html
- 2013 http://vtk.1045678.n5.nabble.com/Using-paraview-with-pvserver-td5719016.html
- 2013, paraview narkive: https://paraview.paraview.narkive.com/ogEJvINP/connecting-to-pvserver
- 2009, https://www.paraview.org/pipermail/paraview/2009-February/011237.html
All of them relate to similar attempts to get pvserver working as expected, but none seems to have received an adequate answer, or I could not work out how they relate to my setup.
Previous activity on this forum leads me to think that
- the error “Could not receive tag” also happens after connection, see https://discourse.paraview.org/search?q=“could%20not%20receive%20tag”. In my case, however, it occurs at connect time (the parallel launch degrades to serial).
- there has been no discussion of endian handshakes either, see https://discourse.paraview.org/search?q=“endian%20handshake”.
The same goes for Stack Exchange, https://stackexchange.com/search?q="Could+not+find+tag", where there are few results if any at all: https://stackexchange.com/search?q="Endian+handshake+failed"
Any suggestions for fixes or workarounds to get the reverse connection working in parallel?