MPI failure on HPC cluster

Thanks for the link!

So I was able to install FreeFEM and PETSc following the link you sent me. With OpenMPI I still get the same error when launching a job over several cores, so I'd like to try Intel MPI, but I have a question. In the pdf you mentioned, they say "if PETSc is not detected, overwrite MPIRUN" for the FreeFEM configure step, which is the case for me with Intel MPI (I didn't have this problem with OpenMPI). So which command should I use instead of ./configure --without-hdf5 --with-petsc=${PETSC_VAR}/lib? What should I do to overwrite mpirun?
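Would it be something like this? (Just my guess at the syntax: passing MPIRUN as a configure variable, and assuming Intel MPI's mpirun sits next to the mpiexec in .../intel64/bin.)

./configure --without-hdf5 --with-petsc=${PETSC_VAR}/lib \
    MPIRUN=/softs/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpirun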
Thanks in advance.

Does PETSc properly detect Intel MPI? If that is the case, then you should not need to worry about FreeFEM, because it will use the MPI implementation detected by PETSc. By the way, do "standard" PETSc examples run with your installation, e.g., when you do make check in ${PETSC_DIR}?

make check in the PETSc directory fails to detect my MPI:

Running check examples to verify correct installation
Using PETSC_DIR=/softs/freefem-petsc-intel/petsc and PETSC_ARCH=arch-FreeFem
gmake[3]: [ex19.PETSc] Error 2 (ignored)
*******************Error detected during compile or link!*******************
See http://www.mcs.anl.gov/petsc/documentation/faq.html
/softs/freefem-petsc-intel/petsc/src/snes/tutorials ex19
*********************************************************************************
mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -D_POSIX_C_SOURCE=199309L -O3  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -D_POSIX_C_SOURCE=199309L -O3  -std=c99  -I/softs/freefem-petsc-intel/petsc/include -I/softs/freefem-petsc-intel/petsc/arch-FreeFem/include  -std=c99   ex19.c  -Wl,-rpath,/softs/freefem-petsc-intel/petsc/arch-FreeFem/lib -L/softs/freefem-petsc-intel/petsc/arch-FreeFem/lib -Wl,-rpath,/softs/freefem-petsc-intel/petsc/arch-FreeFem/lib -L/softs/freefem-petsc-intel/petsc/arch-FreeFem/lib -Wl,-rpath,/softs/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/lib/debug_mt -L/softs/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/softs/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/lib -L/softs/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -Wl,-rpath,/softs/intel/clck/2019.10/lib/intel64 -L/softs/intel/clck/2019.10/lib/intel64 -Wl,-rpath,/softs/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib -L/softs/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib -Wl,-rpath,/softs/intel/compilers_and_libraries_2020.4.304/linux/ipp/lib/intel64 -L/softs/intel/compilers_and_libraries_2020.4.304/linux/ipp/lib/intel64 -Wl,-rpath,/softs/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin -L/softs/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin -Wl,-rpath,/softs/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin -L/softs/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin -Wl,-rpath,/softs/intel/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8 -L/softs/intel/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8 -Wl,-rpath,/softs/intel/compilers_and_libraries_2020.4.304/linux/daal/lib/intel64_lin -L/softs/intel/compilers_and_libraries_2020.4.304/linux/daal/lib/intel64_lin -Wl,-rpath,/softs/intel/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64_lin/gcc4.8 -L/softs/intel/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64_lin/gcc4.8 -lpetsc -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu -lflapack -lfblas -lparmmg -lmmg -lmmg3d -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lparmetis -lmetis -ltet -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lrt -lquadmath -lstdc++ -ldl -o ex19
/usr/bin/bash: mpicc: command not found
gmake[4]: *** [ex19] Error 127
1,5c1
< lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
<   0 SNES Function norm 0.0406612
<   1 SNES Function norm 4.12227e-06
<   2 SNES Function norm 6.098e-11
< Number of SNES iterations = 2
---
> /usr/bin/bash: mpiexec: command not found
/softs/freefem-petsc-intel/petsc/src/snes/tutorials
Possible problem with ex19 running with hypre, diffs above
=========================================
1,9c1
< lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
<   0 SNES Function norm 0.239155
<     0 KSP Residual norm 0.235858
<     1 KSP Residual norm < 1.e-11
<   1 SNES Function norm 6.81968e-05
<     0 KSP Residual norm 2.30906e-05
<     1 KSP Residual norm < 1.e-11
<   2 SNES Function norm < 1.e-11
< Number of SNES iterations = 2
---
> /usr/bin/bash: mpiexec: command not found
/softs/freefem-petsc-intel/petsc/src/snes/tutorials
Possible problem with ex19 running with mumps, diffs above
=========================================
1,5c1
< lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
<   0 SNES Function norm 0.0406612
<   1 SNES Function norm 3.32845e-06
<   2 SNES Function norm < 1.e-11
< Number of SNES iterations = 2
---
> /usr/bin/bash: mpiexec: command not found
/softs/freefem-petsc-intel/petsc/src/snes/tutorials
Possible problem with ex19 running with suitesparse, diffs above
=========================================
Completed test examples

The MPI it should be picking up is:

which mpiexec
/softs/intel/intelpython3/bin/mpiexec

And running make in the FreeFEM folder fails because FreeFEM doesn't seem to look for the PETSc libs in the right directory ('-L/softs/freefem-petsc-intel/FreeFem-sources/3rdparty/lib' whereas it should be ${PETSC_VAR}/lib):

/softs/intel/compilers_and_libraries_2020.4.304/linux/bin/intel64/icpc -shared -fPIC -g -DNDEBUG -O3 -mmmx -mavx -std=c++14 -DBAMG_LONG_LONG -DNCHECKPTR -fPIC 'Schur-Complement.o' -o Schur-Complement.so '-L/softs/freefem-petsc-intel/FreeFem-sources/3rdparty/lib' '-llapack'
ld: cannot find -llapack
make[4]: *** [Schur-Complement.so] Error 1

Well, you need to fix the PETSc installation first. There is no hope going forward if PETSc alone is failing. You probably need to configure (PETSc) with the additional flag --with-mpi-dir.
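Something along these lines, for example (just a sketch: the Intel MPI directory is the one that appears in your compile log above, and you should keep whatever other configure options you already use):

# Re-run the PETSc configure with Intel MPI made explicit.
cd ${PETSC_DIR}
./configure PETSC_ARCH=arch-FreeFem \
    --with-mpi-dir=/softs/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64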

Thanks. So I was finally able to install everything with the Intel compilers. Now when trying to run a job, I get this error:

'/softs/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiexec' -n 4 /softs/freefem-petsc-intel/FreeFem-sources/src/mpi/FreeFem++-mpi -nw 'diffusion-2d-PETSc.edp' -log_view -v 0
[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 9  Build 20200923 (id: abd58e492)
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.10.1-impi
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       116427   compute03  30
[0] MPI startup(): 1       116428   compute03  32
[0] MPI startup(): 2       116429   compute03  34
[0] MPI startup(): 3       116430   compute03  36
  current line = 4 mpirank 3 / 4
  current line = 4 mpirank 2 / 4

Load error: PETSc
         fail:
 dlerror : ./PETSc.so: cannot open shared object file: No such file or directory
list prefix: '' './' list suffix: '' , '.so'
  current line = 4 mpirank 0 / 4
Load error : PETSc
        line number :4, PETSc
error Load error : PETSc
        line number :4, PETSc
 code = 2 mpirank: 0
  current line = 4 mpirank 1 / 4

Do you know what this means?

I'm guessing you didn't compile the plugins? Or you are mixing the OpenMPI and Intel MPI versions of PETSc.so. What do you get with ldd /softs/freefem-petsc-intel/FreeFem-sources/plugin/mpi/PETSc.so?

Thanks a lot for your help. I had to add /softs/freefem-petsc-intel/FreeFem-sources/plugin/seq and /softs/freefem-petsc-intel/FreeFem-sources/plugin/mpi to LIBRARY_PATH in my module file. Then I was able to run a simple job. Now I'm going to ask my user to check whether a longer job works correctly.
I have one last question about how to create a suitable FreeFEM module for my install. Here is my current module file:

set     FF_DIR  /softs/freefem-petsc-intel/FreeFem-sources
setenv FF_DIR  /softs/freefem-petsc-intel/FreeFem-sources
setenv FF_INCLUDEPATH /softs/freefem-petsc-intel/FreeFem-sources/idp

prepend-path    PATH            $FF_DIR/bin:$FF_DIR/src/mpi:$FF_DIR/src/nw:/softs/freefem-petsc-intel/petsc/arch-FreeFem/bin
prepend-path    INCLUDE  /softs/freefem-petsc-intel/FreeFem-sources/idp:/softs/freefem-petsc-intel/petsc/include:/softs/freefem-petsc-intel/petsc/arch-FreeFem/include
prepend-path    LD_LIBRARY_PATH /softs/freefem-petsc-intel/petsc/arch-FreeFem/lib:/softs/freefem-petsc-intel/FreeFem-sources/plugin/seq:/softs/freefem-petsc-intel/FreeFem-sources/plugin/mpi
prepend-path    LIBRARY_PATH /softs/freefem-petsc-intel/petsc/arch-FreeFem/lib:/softs/freefem-petsc-intel/FreeFem-sources/plugin/seq:/softs/freefem-petsc-intel/FreeFem-sources/plugin/mpi

Does it look correct to you?

You are not setting FF_LOADPATH, cf. the tutorial link I sent you. /softs/freefem-petsc-intel/FreeFem-sources/plugin/seq:/softs/freefem-petsc-intel/FreeFem-sources/plugin/mpi should not be set in LD_LIBRARY_PATH nor in LIBRARY_PATH, IMHO.
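In the module file, something like this instead (a sketch only; check the tutorial for the exact list separator FreeFEM expects if you also want to add the plugin/seq directory):

# Let FreeFEM find its plugins through FF_LOADPATH rather than the linker paths.
setenv  FF_LOADPATH    /softs/freefem-petsc-intel/FreeFem-sources/plugin/mpi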

Hello,
We are doing some tests with the newly installed FreeFEM. There’s an error with one job launched like this:

ff-mpirun Helmholtz-3d-pml.edp -ns -wg -ffddm_medit -frequency 0.5 -nppwl 4 -npml 8

'/softs/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiexec' /softs/freefem-petsc-intel/FreeFem-sources/src/mpi/FreeFem++-mpi -nw 'Helmholtz-3d-pml.edp' -ns -wg -ffddm_medit -frequency 0.5 -nppwl 4 -npml 8
[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 9  Build 20200923 (id: abd58e492)
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.10.1-impi
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       181633   compute01  17
[0] MPI startup(): 1       181634   compute01  35
[0] MPI startup(): 2       181635   compute01  37
[0] MPI startup(): 3       181636   compute01  39
-- FreeFem++ v4.1 (Tue Nov 16 16:49:40 CET 2021 - git v4.10-1-g5525c9c)
   file : Helmholtz-3d-pml.edp
 Load: lg_fem lg_mesh lg_mesh3 parallelempi

 *** Warning  The identifier y0 hide a Global identifier

 *** Warning  The identifier y1 hide a Global identifier
 (already loaded: medit)ffglut: error while loading shared libraries: libglut.so.3: cannot open shared object file: No such file or directory
 load: init metis (v  5 )

 *** Warning  The identifier N hide a Global identifier

 *** Warning  The identifier x hide a Global identifier

 *** Warning  The identifier y hide a Global identifier
 sizestack + 1024 =32432  ( 31408 )

lambda = 3, h = 0.75
29 29 43
[H] Building decomposition from mesh of 216978 elements

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 181633 RUNNING AT compute01
=   KILLED BY SIGNAL: 13 (Broken pipe)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 181634 RUNNING AT compute01
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================                                                                                        

We don’t have the libglut library installed on our compute nodes as they’re not meant for visualization. Do you know if there is a way to run this job without getting this error? Or should I install libglut on all the nodes?
Thanks in advance once again.

On a cluster, use the option -ng (no graphics) instead of -wg (with graphics).
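For example, the earlier run becomes (same options, only -wg swapped for -ng):

ff-mpirun Helmholtz-3d-pml.edp -ns -ng -ffddm_medit -frequency 0.5 -nppwl 4 -npml 8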
