1. MPI compatibility
Following the suggestion of @Dave_DeMarle, I made sure that both the client and the server are built with the flag USE_SYSTEM_mpi=ON,
so that when I launch the command
mpirun -np 4 $PV570DIR/bin/pvserver --hostname=$(hostname -a) -rc --client-host=[client-url] --disable-xdisplay-test --timeout=1
the mpirun version at run time is the same as at compile time. Client and server also draw on the same pool of programs and shared libraries.
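One way to double-check the statement above is to look at which MPI library pvserver is dynamically linked against and compare it with the mpirun found on the PATH. The following is only a sketch under assumptions: the helper name mpi_libs is hypothetical, a Linux system with ldd is assumed, and the helper reads ldd output from standard input.

```shell
# Hypothetical helper: print the resolved paths of MPI libraries found in
# ldd-style output read from standard input.
mpi_libs() {
    # ldd lines look like: "libmpi.so.40 => /path/libmpi.so.40 (0x...)"
    awk '$1 ~ /^libmpi/ && $2 == "=>" { print $3 }' | sort -u
}

# Intended use on the server and on the client (not run here):
#   ldd "$PV570DIR/bin/pvserver" | mpi_libs
#   command -v mpirun
#   mpirun --version
```

If the library path printed by mpi_libs and the location of mpirun belong to different MPI installations, the run-time and compile-time MPI do not match.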
2. cmake settings
In one setting the external cache variables in CMakeCache.txt are the same on client and server. In the other setting they differ because the client is built with qt5 and the server with egl. Problems occur in both settings.
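A comparison of the two cache files could be scripted roughly as follows. This is a sketch, not the procedure actually used for this post: the helper names cache_entries and cache_diff and the file paths in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the "NAME:TYPE=VALUE" entries of a CMakeCache.txt,
# sorted, skipping comment lines (which start with "#" or "//").
cache_entries() {
    grep -E '^[^#/[:space:]][^=]*=' "$1" | sort
}

# Hypothetical helper: show the entries that differ between two cache files.
# The return status follows diff: 0 when identical, 1 when different.
cache_diff() {
    _a=$(mktemp) && _b=$(mktemp) || return 2
    cache_entries "$1" > "$_a"
    cache_entries "$2" > "$_b"
    _rc=0
    diff "$_a" "$_b" || _rc=$?
    rm -f "$_a" "$_b"
    return "$_rc"
}

# Intended use (paths are placeholders, not taken from the post):
#   cache_diff client/CMakeCache.txt server/CMakeCache.txt | grep -iE 'mpi|qt5|egl'
```

Sorting both files first keeps the diff limited to real differences in values rather than differences in ordering.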
3. setting qt5+egl with --disable-xdisplay-test
3.1 server starts
I receive the following messages
Connecting to client (reverse connection requested)…
Connection URL: csrc://[client-url]:11111
Connecting to client (reverse connection requested)…
Connection URL: csrc://[client-url]:11111
Connecting to client (reverse connection requested)…
Connection URL: csrc://[client-url]:11111
Connecting to client (reverse connection requested)…
Connection URL: csrc://[client-url]:11111
3.2 client connects
At the client side I see the notifications
Accepting connection(s): [client-hostname]:11111
Accepting connection(s): [client-hostname]:11111
At the server side
( 26.150s) [pvserver ]vtkSocketCommunicator.c:808 ERR| vtkSocketCommunicator (0x56537cb0bff0): Could not receive tag. 1010580540
( 26.150s) [pvserver ]vtkSocketCommunicator.c:557 ERR| vtkSocketCommunicator (0x56537cb0bff0): Endian handshake failed.
Client connected.
( 26.150s) [pvserver ]vtkTCPNetworkAccessMana:333 ERR| vtkTCPNetworkAccessManager (0x56537c975b60):
Connection failed during handshake. vtkSocketCommunicator::GetVersion()
returns different values on the two connecting processes
(Current value: 100).
Exiting…
So one connection works and another exits. The remaining two processes keep cycling through the messages from source lines 472, 51 and 396:
( 25.176s) [pvserver ] vtkSocket.cxx:472 ERR| vtkClientSocket (0x5622bf734610): Socket error in call to connect. Connection refused.
( 25.176s) [pvserver ] vtkClientSocket.cxx:51 ERR| vtkClientSocket (0x5622bf734610): Failed to connect to server [client-url]:11111
( 25.176s) [pvserver ]vtkTCPNetworkAccessMana:396 WARN| vtkTCPNetworkAccessManager (0x5622becadb60): Connect failed. Retrying for 34.9704 more seconds.
until they time out, while the client remains operational.
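To keep track of how many processes end up in which state, the messages quoted in Sec. 3.1 and 3.2 can be tallied from a saved server log. The sketch below assumes the server output was redirected to a file; the helper name summarize_pvserver_log and the file name server.log are hypothetical.

```shell
# Hypothetical helper: count the message types quoted above in a saved
# pvserver log. The log file is an assumption; the post only shows the
# messages as they appeared on the terminal.
summarize_pvserver_log() {
    for pattern in \
        'Connecting to client' \
        'Connection failed during handshake' \
        'Connection refused' \
        'Client connected'
    do
        # grep -c prints 0 (and returns non-zero) when nothing matches
        count=$(grep -c -- "$pattern" "$1" || true)
        printf '%s: %s\n' "$pattern" "$count"
    done
}

# Intended use (redirection and file name are placeholders):
#   append "> server.log 2>&1" to the mpirun command of Sec. 1, then
#   summarize_pvserver_log server.log
```

Comparing the number of "Connecting to client" banners with the value passed to -np may help to see whether every process tries to connect on its own.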
3.3 client works
I can work at the client side and load the data from the server side. At the server only one pvserver process is at work, so this is effectively a sequential job.
4. setting qt5+egl without --disable-xdisplay-test
At the point of connection I have 1 working connection (hence one pvserver process), 3 rapid exits and 0 timeouts. Each of the three exits again reports the message
Connection failed during handshake. vtkSocketCommunicator::GetVersion() returns different values on the two connecting processes (Current value: 100).
and pvserver works sequentially as in Sec. 3.3.
5. setting qt5+qt5 with --disable-xdisplay-test
In this case I also set the cache variable Qt5_DIR to the location where I found the file sb-qt5-configure.cmake.
5.1 server starts
At the server side I see that pgrep pvserver lists four process IDs.
5.2 client connects
However, of the four processes only one connects to the client and the other three keep on trying.
5.3 client does not work
When I try to load data from the server the connection at the server side crashes with the message
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 154 (GLX)
Minor opcode of failed request: 3 (X_GLXCreateContext)
Value in failed request: 0x0
Serial number of failed request: 52
Current serial number in output stream: 53
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
followed by three SIGTERMs and finally by
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[64131,1],3]
Exit code: 1
These errors are why I moved to the qt5+egl configuration (see the top post).
6. setting qt5+qt5 without --disable-xdisplay-test
One connection works. Another exits with the same connection-failed-during-handshake message as in Sec. 4, and then the job crashes with the X error, 3 SIGTERMs and the MPI job termination as in Sec. 5.3.
7. similar issues reported?
- Client / Server configuration
- Cannot connect to Catalyst Live visualization on Mac OS X
- ParaView version
- [Paraview] Connecting to pvserver
8. questions
Where could the problem be?
How can I get pvserver to run fully in parallel?
Any tips or directions?