SSH tests have been acting up lately.
I’m keeping an eye on them to see if that can be improved.
Here is what’s happening. One of the SSH tests occasionally fails for a yet-to-be-identified reason, killing the paraview executable. However, the server started by the test is not killed, since it is not managed by smTestDriver but is run from a .sh script, like a real-life pvserver.
Once a leftover pvserver is running, no SSH test can pass, as they all use the same port.
I do hope to fix the yet-to-be-identified cause, but this shows a (small) design problem with these tests, for which I’m solely responsible.
Indeed, a failing test should just be rerun and pass; it should not fail and break all the following builds.
I see a few possible mitigations:
- Use a random port in the SSH tests: there is no way to do that currently, and in any case, each failing test would leave a pvserver process behind on the buildbot.
- Kill all pvserver processes after each build: overkill, but may be a temporary solution.
- Use smTestDriver to run the pvserver: I’m not sure that it is possible, but if smTestDriver could be carefully told how to configure its pvserver, that may work.
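To make the port collision and the random-port idea concrete, here is a minimal Python sketch. It is not part of the ParaView test suite; the helper names `pick_free_port` and `port_in_use` are hypothetical, and the sketch assumes the server listens on a TCP port on the local machine.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0
```

A test harness could call `port_in_use` on the shared port before starting, to detect a leftover server early, or use `pick_free_port` to avoid the collision. Note that the port returned by `pick_free_port` can be taken by another process before the server binds to it, and that neither helper cleans up the stale process itself.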
Let me know what you think @utkarsh.ayachit @cory.quammen
Nice detective work! For a quick stopgap, option 2 seems reasonable. In the long run, making smTestDriver work is probably the way to go.
@ben.boeckel: Is option 2 doable? If yes, could you do it?
Are these tests run serially? If so, can the server launch script that you have do a killall -9 pvserver before starting the pvserver process? That’ll avoid requiring any changes on the buildbot/test runner side.
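The kill-before-launch idea could look roughly like the Python sketch below. The actual launch script is a .sh script that is not shown in this thread, so this is only an illustration; `launch_clean` and its `run`/`spawn` parameters are hypothetical names, the latter two being injectable so the logic can be exercised without killing anything.

```python
import subprocess


def launch_clean(server_cmd, process_name="pvserver",
                 run=subprocess.run, spawn=subprocess.Popen):
    """Kill any leftover server by name, then start a fresh one."""
    # killall exits non-zero when no process matched, which is the normal
    # case on a clean machine, so the result is deliberately not checked.
    run(["killall", "-9", process_name], check=False,
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # Start the new server and hand the process handle back to the caller.
    return spawn(list(server_cmd))
```

This is only safe if the tests run serially, as asked above: with parallel tests, one test would kill the server of another.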
smTestDriver can certainly run pvserver. See the argument slinging done in CMake/ParaViewTesting.cmake to get arguments to the server and such to that program.
Thanks for your suggestions!
Turns out it is much more complex than just killing pvserver.
One of the tests uses the “Terminal” mechanism, which spawns a terminal to run the server in with reverse tunnelling, and it ends up leaving the SSH tunnel alive if pvserver is killed with a SIGINT or a SIGTERM.
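One general way to avoid leaving helper processes behind is to start the launcher in its own process group and signal the whole group at cleanup time. The Python sketch below illustrates that technique on POSIX systems; it is not what the merge request below does (that is not described here), and `spawn_in_own_group` and `kill_group` are hypothetical names.

```python
import os
import signal
import subprocess


def spawn_in_own_group(cmd):
    """Start cmd as the leader of a new session and process group."""
    return subprocess.Popen(cmd, start_new_session=True)


def kill_group(proc, sig=signal.SIGKILL):
    """Send sig to every process in proc's group, then reap proc."""
    try:
        # With start_new_session=True the group id equals the child's pid.
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # The group is already gone.
    proc.wait()
```

This only reaches processes that stay in the launcher's group. A process that starts its own session, as terminal emulators commonly do for the shell they host, would not be signalled, so whether this helps with the “Terminal” mechanism is an open question.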
I’ve implemented a quick and dirty fix to make sure the dashboard stays clean. This should enable me to see if the test still fails occasionally and investigate more deeply.
https://gitlab.kitware.com/paraview/paraview/merge_requests/3626