Error running ParaView server binary (pvserver) compiled on Cray XC40

I have compiled ParaView on a Cray XC40 (SahasraT at SERC in India) for Haswell compute nodes.
The following are the compile options used:
config.sh (2.0 KB)
paraview-cray-serc.cmake (3.0 KB)
After a successful compilation I submitted an interactive job script as follows:
qsub -I -N PARAVIEW_VIS -l select=2:ncpus=24:mpiprocs=24,walltime=24:00:00,place=scatter,accelerator_type="None"
Once inside the assigned node, I executed the following:

phyalan@nid00988:~> cd /mnt/lustre/phy3/phyalan/paraview-cray/install/bin/ 
phyalan@nid00988:/mnt/lustre/phy3/phyalan/paraview-cray/install/bin> export LD_LIBRARY_PATH="/mnt/lustre/phy3/phyalan/paraview-cray/install/lib:$LD_LIBRARY_PATH"
phyalan@nid00988:/mnt/lustre/phy3/phyalan/paraview-cray/install/bin> module switch PrgEnv-cray PrgEnv-gnu
phyalan@nid00988:/mnt/lustre/phy3/phyalan/paraview-cray/install/bin> module load gcc/7.3.0
phyalan@nid00988:/mnt/lustre/phy3/phyalan/paraview-cray/install/bin> module load cray-mpich/7.7.2
phyalan@nid00988:/mnt/lustre/phy3/phyalan/paraview-cray/install/bin> module load cce
phyalan@nid00988:/mnt/lustre/phy3/phyalan/paraview-cray/install/bin> module load craype-haswell
phyalan@nid00988:/mnt/lustre/phy3/phyalan/paraview-cray/install/bin> module load cray-hdf5
phyalan@nid00988:/mnt/lustre/phy3/phyalan/paraview-cray/install/bin> module load cray-python/2.7.15.3
phyalan@nid00988:/mnt/lustre/phy3/phyalan/paraview-cray/install/bin> aprun -j 1 -n 48 -N 24 ./pvserver --force-offscreen-rendering
Application 1268893 resources: utime ~0s, stime ~4s, Rss ~45220, inblocks ~808, outblocks ~0

I didn’t get a successful connection to the ParaView server.
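For comparison, a working pvserver should print a connection URL and then block, waiting for a client, along these lines (host name is an example; 11111 is the default port):

```shell
# Expected behaviour from a healthy pvserver startup (example host/port):
aprun -j 1 -n 48 -N 24 ./pvserver --force-offscreen-rendering
# Waiting for client...
# Connection URL: cs://nid00988:11111
# Accepting connection(s): nid00988:11111
```

Instead, the application exits almost immediately with only the resource-usage line.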

@cory.quammen You have provided me with a lot of help compiling ParaView, but now the compiled binary cannot set up a ParaView server. Can you help me with this issue?

Sure. Take a look at this presentation on how to connect to a remote server. It covers the basics as well as more advanced topics such as reverse connections and SSH tunneling.
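As a quick sketch of the forward-connection case, you can tunnel the pvserver port through the login node to your workstation (host names below are placeholders for your site; 11111 is pvserver's default port):

```shell
# On your workstation: forward local port 11111 to the compute node that
# is running pvserver, hopping through the login node.
# "login.example.org" and "nid00988" are placeholders for your site.
ssh -L 11111:nid00988:11111 phyalan@login.example.org

# Then, in the ParaView client, add a server with
#   Host: localhost, Port: 11111
# and connect while pvserver is printing "Accepting connection(s)".
```

If the compute nodes cannot accept inbound connections, a reverse connection (`pvserver --reverse-connection --client-host=<your-workstation>`) makes the server dial out to the client instead.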

@cory.quammen
On submitting a job script to my Cray system I don’t get the expected output (the IP address of pvserver waiting for a client). Instead it stops almost immediately, with the following output in the connection_info.out file:

Application 1272281 resources: utime ~0s, stime ~4s, Rss ~48208, inblocks ~1012, outblocks ~0

Running the same commands in PBS interactive mode produces the same error.

Job script:

#!/bin/sh 
#This job should be redirected to small queue
#PBS -N Paraview
#PBS -l select=2:ncpus=24
#PBS -l walltime=00:59:00
#PBS -l place=scatter
#PBS -l accelerator_type="None"
#PBS -m b 
#PBS -M alankardutta@iisc.ac.in
#PBS -e err_job
#PBS -o out_job
#PBS -S /bin/sh@sdb -V 
. /opt/modules/default/init/sh
cd $PBS_O_WORKDIR
module switch PrgEnv-cray PrgEnv-gnu
module load gcc/7.3.0
module load cray-mpich/7.7.2
module load cce
module load craype-haswell
module load cray-hdf5
module load cray-python/3.6.5.3
export LD_LIBRARY_PATH="$PBS_O_WORKDIR/../lib:$LD_LIBRARY_PATH"
aprun -j 1 -n 48 -N 24 ./pvserver --force-offscreen-rendering > connection_info.out

I also tried running pvserver on fewer nodes than I had requested in interactive mode, and ran into some more error messages.

phyalan@nid01372:/mnt/lustre/phy3/phyalan/paraview-cray/compiled/bin> ./pvserver --force-offscreen-rendering
[Thu Jan 16 11:27:31 2020] [c7-0c0s7n0] Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(537): 
MPID_Init(246).......: channel initialization failed
MPID_Init(638).......:  PMI2 init failed: 1 
Aborted (core dumped)

phyalan@nid01372:/mnt/lustre/phy3/phyalan/paraview-cray/compiled/bin> aprun -j 1 -n 2 -N 1 ./pvserver --force-offscreen-rendering
Thu Jan 16 11:28:13 2020: [PE_0]:inet_listen_socket_setup:inet_setup_listen_socket: bind failed port 1371 listen_sock = 14 Address already in use
Thu Jan 16 11:28:13 2020: [PE_0]:_pmi_inet_listen_socket_setup:socket setup failed
Thu Jan 16 11:28:13 2020: [PE_0]:_pmi_init:_pmi_inet_listen_socket_setup (full) returned -1
Thu Jan 16 11:28:13 2020: [PE_1]:inet_listen_socket_setup:inet_setup_listen_socket: bind failed port 1371 listen_sock = 14 Address already in use
Thu Jan 16 11:28:13 2020: [PE_1]:_pmi_inet_listen_socket_setup:socket setup failed
Thu Jan 16 11:28:13 2020: [PE_1]:_pmi_init:_pmi_inet_listen_socket_setup (full) returned -1
[Thu Jan 16 11:28:13 2020] [c0-0c0s2n0] Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(537): 
MPID_Init(246).......: channel initialization failed
MPID_Init(638).......:  PMI2 init failed: 1 
Thu Jan 16 11:28:13 2020: [unset]:_pmi_daemon_barrier:daemon pipe read failed from PE 0 errno = Success
Thu Jan 16 11:28:13 2020: [unset]:_pmiu_daemon:_pmi_daemon_barrier failed before _pmi_alps_sync()
[Thu Jan 16 11:28:13 2020] [c1-0c0s3n3] Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(537): 
MPID_Init(246).......: channel initialization failed
MPID_Init(638).......:  PMI2 init failed: 1 
Thu Jan 16 11:28:13 2020: [unset]:_pmi_daemon_barrier:daemon pipe read failed from PE 0 errno = Success
Thu Jan 16 11:28:13 2020: [unset]:_pmiu_daemon:_pmi_daemon_barrier failed before _pmi_alps_sync()
_pmiu_daemon(SIGCHLD): [NID 00008] [c0-0c0s2n0] [Thu Jan 16 11:28:13 2020] PE RANK 0 exit signal Aborted
[NID 00008] 2020-01-16 11:28:13 Apid 1272307: initiated application termination
_pmiu_daemon(SIGCHLD): [NID 00207] [c1-0c0s3n3] [Thu Jan 16 11:28:13 2020] PE RANK 1 exit signal Aborted
Application 1272307 exit codes: 134
Application 1272307 resources: utime ~0s, stime ~2s, Rss ~48308, inblocks ~4282, outblocks ~0
phyalan@nid01372:/mnt/lustre/phy3/phyalan/paraview-cray/compiled/bin>

Since MPI is failing to initialize, I would first check that the MPI you compiled ParaView with is the same one you are running with.
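One quick way to check this (a sketch, run from the install's `bin` directory) is to inspect which MPI shared library the binary actually resolves at run time:

```shell
# Show which MPI shared library pvserver will load at run time.
# On a Cray XC with cray-mpich loaded, this should resolve to a libmpich
# under the Cray MPT tree, not to an mvapich2 (or other) installation.
ldd ./pvserver | grep -i mpi

# Also confirm the currently loaded MPI module matches what the build used
# (module output typically goes to stderr, hence the redirect):
module list 2>&1 | grep -i mpich
```

If `ldd` points at an MPI other than cray-mpich, the binary was linked against the wrong library at build time and relinking is needed, not just a different module environment.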

@cory.quammen
If you look at my config.sh script, you’ll see that I used it to compile ParaView after loading the modules. I am now loading exactly the same versions of the modules as in the config.sh script before trying to run pvserver. I’m absolutely stuck, and any help would be a real lifesaver.

@cory.quammen
Thanks for helping me out. Your suspicion was right: it turned out to be a problem with how the Cray system was configured by the admins at my institute. I have no idea why they installed MVAPICH2 and even added it to the PATH. As a result, the ParaView compile scripts were detecting it, which made the generated binaries incompatible with aprun. Adding -DMPIEXEC_EXECUTABLE=$(which aprun) to the CMake arguments, and also to the genericio.cmake file in paraview-superbuild, solved the issue. Thanks a lot once again.
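For anyone hitting the same problem, the fix amounts to pointing CMake at aprun explicitly so the build does not pick up a stray MPI launcher from the PATH (the source path below is illustrative):

```shell
# Tell CMake/FindMPI to use Cray's aprun as the MPI launcher instead of
# whatever mpiexec it finds first on the PATH (e.g. mvapich2's).
# "../paraview-superbuild" is a placeholder for your source directory.
cmake \
  -DMPIEXEC_EXECUTABLE="$(which aprun)" \
  ../paraview-superbuild
```

The same `MPIEXEC_EXECUTABLE` setting also had to be added to genericio.cmake inside the superbuild source, since that subproject runs its own MPI detection.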

Awesome! I’m so glad it’s working. Every machine is a new adventure :slight_smile: