MPI failure on HPC cluster

Hello,
I installed FreeFEM last month on an HPC cluster (I’m the admin). But it seems there’s a problem when a user submits a job:

MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.

[compute06:116371] *** An error occurred in MPI_Init_thread
[compute06:116371] *** reported by process [46912364806145,7]
[compute06:116371] *** on a NULL communicator
[compute06:116371] *** Unknown error
[compute06:116371] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[compute06:116371] *** and potentially your MPI job)
[compute06:116359] 39 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
[compute06:116359] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[compute06:116359] 39 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle

The job consists of the following command:
mpirun FreeFem++-mpi cube.edp -ns -nw -ffddm_schwarz_method oras

I installed FreeFEM following the procedure on the official website:
autoreconf -i
./configure --enable-download --enable-optim --prefix=/softs/freefem-ompi412
./3rdparty/getall -a
cd 3rdparty/ff-petsc
make petsc-slepc
cd ../..
./reconfigure
make -j16
make install

Before the install, I loaded some of our modules (OpenMPI 4.1.2, GCC 8, HDF5 1.12.1), as I usually do when installing new software.
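Concretely, that corresponds to something like (a sketch; the module names are the ones listed in my module file further below):

# load the build toolchain (this cluster's module names)
module load gnu8
module load openmpi/4.1.2
module load hdf5/1.12.1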

Could someone help me understand this MPI error please?
Thanks in advance.

Please share FreeFem-sources/config.log and FreeFem-sources/3rdparty/ff-petsc/petsc-3.XY.Z/fr/lib/petsc/conf/configure.log.

You should probably use ff-mpirun instead of mpirun FreeFem++-mpi.
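For example, something along these lines (a sketch; the rank count here is only a guess based on the log above):

# same job, launched through the FreeFEM wrapper instead of raw mpirun
ff-mpirun -np 40 cube.edp -ns -nw -ffddm_schwarz_method oras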

Hello,

Here are the files:
https://1fichier.com/?n38okvqzwri01dpibx61
https://1fichier.com/?cahd41w6bx0o0519ybjm

I tried using ff-mpirun but got the same error.

Thanks for your help.

If you use a dummy file like

load "PETSc"

Could you please run ff-mpirun -n 1 dummy.edp -log_view and send me the exact output, surrounded by triple backquotes (```)?

-- FreeFem++ v4.9 (Fri Oct 29 14:20:29 CEST 2021 - git v4.9)
 Load: lg_fem lg_mesh lg_mesh3 eigenvalue parallelempi 
    1 : load "PETSc"
    2 : "" sizestack + 1024 =1072  ( 48 )

times: compile 0.08s, execution 0s,  mpirank:0
 CodeAlloc : nb ptr  4013,  size :547824 mpirank: 0
Ok: Normal End
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/softs/freefem-ompi412/bin/FreeFem++-mpi on a  named compute01 with 1 processor, by user9 Fri Nov  5 15:01:20 2021
Using Petsc Release Version 3.15.0, Mar 30, 2021 

                         Max       Max/Min     Avg       Total
Time (sec):           1.123e-03     1.000   1.123e-03
Objects:              1.000e+00     1.000   1.000e+00
Flop:                 0.000e+00     0.000   0.000e+00  0.000e+00
Flop/sec:             0.000e+00     0.000   0.000e+00  0.000e+00
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 1.1194e-03  99.7%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 3.19e-08
#PETSc Option Table entries:
-log_view
-nw dummy.edp
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/softs/freefem-ompi412/ff-petsc/r MAKEFLAGS= --with-debugging=0 COPTFLAGS="-O3 -mtune=native" CXXOPTFLAGS="-O3 -mtune=native" FOPTFLAGS="-O3 -mtune=native" --with-cxx-dialect=C++11 --with-ssl=0 --with-x=0 --with-fortran-bindings=0 --with-cudac=0 --with-cc=/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpicc --with-cxx=/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpic++ --with-fc=/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpif90 --with-scalar-type=real --download-f2cblaslapack --download-metis --download-ptscotch --download-hypre --download-parmetis --download-mmg --download-parmmg --download-superlu --download-suitesparse --download-tetgen --download-slepc --download-hpddm --download-scalapack --download-mumps --download-slepc-configure-arguments=--download-arpack=https://github.com/prj-/arpack-ng/archive/b64dccb.tar.gz PETSC_ARCH=fr
-----------------------------------------
Libraries compiled on 2021-10-29 11:40:15 on admin-hpc.univ-cotedazur.fr 
Machine characteristics: Linux-3.10.0-1160.42.2.el7.x86_64-x86_64-with-centos-7.9.2009-Core
Using PETSc directory: /softs/freefem-ompi412/ff-petsc/r
Using PETSc arch: 
-----------------------------------------

Using C compiler: /softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -O3 -mtune=native   
Using Fortran compiler: /softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -mtune=native     
-----------------------------------------

Using include paths: -I/softs/freefem-ompi412/ff-petsc/r/include
-----------------------------------------

Using C linker: /softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpicc
Using Fortran linker: /softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpif90
Using libraries: -Wl,-rpath,/softs/freefem-ompi412/ff-petsc/r/lib -L/softs/freefem-ompi412/ff-petsc/r/lib -lpetsc -Wl,-rpath,/softs/freefem-ompi412/ff-petsc/r/lib -L/softs/freefem-ompi412/ff-petsc/r/lib -Wl,-rpath,/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/lib -L/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -Wl,-rpath,/softs/hdf5/lib -L/softs/hdf5/lib -Wl,-rpath,/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/nccl_rdma_sharp_plugin/lib -L/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/nccl_rdma_sharp_plugin/lib -Wl,-rpath,/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/sharp/lib -L/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/sharp/lib -Wl,-rpath,/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/hcoll/lib -L/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/hcoll/lib -Wl,-rpath,/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ucx/lib -L/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ucx/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu -lf2clapack -lf2cblas -lparmmg -lmmg -lmmg3d -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lparmetis -lmetis -ltet -lm -lstdc++ -ldl -lmpi_usempi -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lrt -lquadmath -lstdc++ -ldl
-----------------------------------------

OK, now switch to examples/hpddm/diffusion-2d-PETSc.edp, please (first still with -np 1).
Then, switch to -np 4. Share both outputs, please. You may want to add the flag -v 0 for both runs, otherwise the output will be huge.
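Concretely, the two runs would look something like this (a sketch, assuming you run from the FreeFem-sources tree):

# single-process run first, then the 4-process run
ff-mpirun -np 1 examples/hpddm/diffusion-2d-PETSc.edp -log_view -v 0
ff-mpirun -np 4 examples/hpddm/diffusion-2d-PETSc.edp -log_view -v 0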

Here's the output for -np 1. With -np 4 it failed (see below).

'/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpiexec' -n 1 /softs/freefem-ompi412/bin/FreeFem++-mpi -nw 'diffusion-2d-PETSc.edp' -log_view -v 0
KSP Object: 1 MPI processes
  type: gmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: bjacobi
    number of blocks = 1
    Local solver information for first block is in the following KSP and PC objects on rank 0:
    Use -ksp_view ::ascii_info_detail to display information for all blocks
  KSP Object: (sub_) 1 MPI processes
    type: preonly
    maximum iterations=10000, initial guess is zero
    tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
    left preconditioning
    using NONE norm type for convergence test
  PC Object: (sub_) 1 MPI processes
    type: ilu
      out-of-place factorization
      0 levels of fill
      tolerance for zero pivot 2.22045e-14
      matrix ordering: natural
      factor fill ratio given 1., needed 1.
        Factored matrix follows:
          Mat Object: 1 MPI processes
            type: seqaij
            rows=1681, cols=1681
            package used to perform factorization: petsc
            total: nonzeros=11441, allocated nonzeros=11441
              not using I-node routines
    linear system matrix = precond matrix:
    Mat Object: (sub_) 1 MPI processes
      type: seqaij
      rows=1681, cols=1681
      total: nonzeros=11441, allocated nonzeros=11441
      total number of mallocs used during MatSetValues calls=0
        not using I-node routines
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: mpiaij
    rows=1681, cols=1681
    total: nonzeros=11441, allocated nonzeros=11441
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
1.03629e+06 bytes of memory in usage
KSP Object: 1 MPI processes
  type: gmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: gamg
    type is MULTIPLICATIVE, levels=3 cycles=v
      Cycles per PCApply=1
      Using externally compute Galerkin coarse grid matrices
      GAMG specific options
        Threshold for dropping small values in graph on each level =   0.   0.   0.  
        Threshold scaling factor for each level not specified = 1.
        AGG specific options
          Symmetric graph false
          Number of levels to square graph 1
          Number smoothing steps 1
        Complexity:    grid = 1.36693
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 1 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 1 MPI processes
      type: bjacobi
        number of blocks = 1
        Local solver information for first block is in the following KSP and PC objects on rank 0:
        Use -mg_coarse_ksp_view ::ascii_info_detail to display information for all blocks
      KSP Object: (mg_coarse_sub_) 1 MPI processes
        type: preonly
        maximum iterations=1, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_sub_) 1 MPI processes
        type: lu
          out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5., needed 1.47107
            Factored matrix follows:
              Mat Object: 1 MPI processes
                type: seqaij
                rows=48, cols=48
                package used to perform factorization: petsc
                total: nonzeros=1424, allocated nonzeros=1424
                  using I-node routines: found 33 nodes, limit used is 5
        linear system matrix = precond matrix:
        Mat Object: (mg_coarse_sub_) 1 MPI processes
          type: seqaij
          rows=48, cols=48
          total: nonzeros=968, allocated nonzeros=968
          total number of mallocs used during MatSetValues calls=0
            not using I-node routines
      linear system matrix = precond matrix:
      Mat Object: 1 MPI processes
        type: mpiaij
        rows=48, cols=48
        total: nonzeros=968, allocated nonzeros=968
        total number of mallocs used during MatSetValues calls=0
          using nonscalable MatPtAP() implementation
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 1 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.0993536, max = 1.09289
        eigenvalues estimate via gmres min 0.0215197, max 0.993536
        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_1_esteig_) 1 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=2, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 1 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 1 MPI processes
        type: mpiaij
        rows=252, cols=252
        total: nonzeros=3230, allocated nonzeros=3230
        total number of mallocs used during MatSetValues calls=0
          using nonscalable MatPtAP() implementation
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object: (mg_levels_2_) 1 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.0994616, max = 1.09408
        eigenvalues estimate via gmres min 0.0098394, max 0.994616
        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_2_esteig_) 1 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=2, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_2_) 1 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 1 MPI processes
        type: mpiaij
        rows=1681, cols=1681
        total: nonzeros=11441, allocated nonzeros=11441
        total number of mallocs used during MatSetValues calls=0
          has attached near null space
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: mpiaij
    rows=1681, cols=1681
    total: nonzeros=11441, allocated nonzeros=11441
    total number of mallocs used during MatSetValues calls=0
      has attached near null space
      not using I-node (on process 0) routines
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/softs/freefem-ompi412/bin/FreeFem++-mpi on a  named compute01 with 1 processor, by user9 Fri Nov  5 15:49:32 2021
Using Petsc Release Version 3.15.0, Mar 30, 2021 

                         Max       Max/Min     Avg       Total
Time (sec):           6.243e-02     1.000   6.243e-02
Objects:              3.040e+02     1.000   3.040e+02
Flop:                 1.016e+07     1.000   1.016e+07  1.016e+07
Flop/sec:             1.627e+08     1.000   1.627e+08  1.627e+08
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 6.2431e-02 100.0%  1.0160e+07 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided         17 1.0 1.5486e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BuildTwoSidedF        11 1.0 1.8911e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult              137 1.0 1.2177e-03 1.0 2.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2 22  0  0  0   2 22  0  0  0  1844
MatMultAdd            12 1.0 9.6975e-05 1.0 7.16e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   738
MatMultTranspose      13 1.0 1.4345e-04 1.0 9.45e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   659
MatSolve              50 1.0 8.4689e-04 1.0 9.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00  1  9  0  0  0   1  9  0  0  0  1121
MatSOR                70 1.0 1.0255e-03 1.0 1.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2 10  0  0  0   2 10  0  0  0  1003
MatLUFactorSym         1 1.0 5.9837e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         2 1.0 8.7487e-05 1.0 4.70e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   538
MatILUFactorSym        1 1.0 4.5152e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatConvert             2 1.0 1.6149e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatScale               6 1.0 5.9438e-05 1.0 4.13e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   694
MatResidual           12 1.0 9.4276e-05 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1867
MatAssemblyBegin      25 1.0 1.1670e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        25 1.0 7.8539e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatGetRowIJ            2 1.0 9.1350e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         2 1.0 4.4952e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCoarsen             2 1.0 6.9257e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         2 1.0 1.7760e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                9 1.0 2.8031e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
MatAXPY                2 1.0 1.7306e-04 1.0 1.93e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    11
MatTranspose           4 1.0 8.5592e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMatMultSym          6 1.0 6.7558e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatMatMultNum          2 1.0 7.9409e-05 1.0 2.93e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   370
MatPtAPSymbolic        2 1.0 1.1173e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatPtAPNumeric         2 1.0 3.5130e-04 1.0 1.78e+05 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0   506
MatTrnMatMultSym       1 1.0 4.2111e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
KSPSetUp               8 1.0 1.1534e-03 1.0 1.13e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2 11  0  0  0   2 11  0  0  0   980
KSPSolve               2 1.0 1.1377e-02 1.0 1.01e+07 1.0 0.0e+00 0.0e+00 0.0e+00 18 99  0  0  0  18 99  0  0  0   887
KSPGMRESOrthog        87 1.0 8.2486e-04 1.0 4.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1 45  0  0  0   1 45  0  0  0  5579
PCGAMGGraph_AGG        2 1.0 9.8906e-04 1.0 2.93e+04 1.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0    30
PCGAMGCoarse_AGG       2 1.0 7.2509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
PCGAMGProl_AGG         2 1.0 7.4639e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
PCGAMGPOpt_AGG         2 1.0 1.6074e-03 1.0 8.70e+05 1.0 0.0e+00 0.0e+00 0.0e+00  3  9  0  0  0   3  9  0  0  0   541
GAMG: createProl       2 1.0 4.1161e-03 1.0 8.99e+05 1.0 0.0e+00 0.0e+00 0.0e+00  7  9  0  0  0   7  9  0  0  0   218
  Graph                4 1.0 9.3922e-04 1.0 2.93e+04 1.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0    31
  MIS/Agg              2 1.0 8.3806e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
  SA: col data         2 1.0 5.6530e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
  SA: frmProl0         2 1.0 6.6154e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
  SA: smooth           2 1.0 7.8817e-04 1.0 4.32e+04 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    55
GAMG: partLevel        2 1.0 1.4928e-03 1.0 1.78e+05 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   119
PCGAMG Squ l00         1 1.0 4.2155e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
PCGAMG Gal l00         1 1.0 1.0574e-03 1.0 1.22e+05 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   116
PCGAMG Opt l00         1 1.0 3.7154e-04 1.0 2.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    62
PCGAMG Gal l01         1 1.0 4.2844e-04 1.0 5.55e+04 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   129
PCGAMG Opt l01         1 1.0 1.4855e-04 1.0 6.46e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    43
PCSetUp                4 1.0 7.3636e-03 1.0 2.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 12 22  0  0  0  12 22  0  0  0   306
PCSetUpOnBlocks        7 1.0 2.9771e-04 1.0 4.70e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   158
PCApply               50 1.0 2.5069e-03 1.0 2.69e+06 1.0 0.0e+00 0.0e+00 0.0e+00  4 26  0  0  0   4 26  0  0  0  1072
VecMDot               87 1.0 5.5492e-04 1.0 2.30e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1 23  0  0  0   1 23  0  0  0  4146
VecNorm               95 1.0 1.8127e-04 1.0 2.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  1415
VecScale              95 1.0 6.5938e-05 1.0 1.28e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1945
VecCopy               43 1.0 2.0628e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               253 1.0 2.9753e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                8 1.0 1.6919e-05 1.0 2.12e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1252
VecAYPX               72 1.0 3.8402e-05 1.0 9.28e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  2416
VecAXPBYCZ            24 1.0 1.4638e-05 1.0 1.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  7923
VecMAXPY              94 1.0 2.5752e-04 1.0 2.54e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0 25  0  0  0   0 25  0  0  0  9850
VecPointwiseMult      22 1.0 1.0403e-05 1.0 2.13e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2044
VecScatterBegin      164 1.0 1.2301e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterEnd        164 1.0 5.4436e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSetRandom           2 1.0 5.8197e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize          95 1.0 2.6538e-04 1.0 3.85e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  1450
SFSetGraph             9 1.0 2.4650e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                6 1.0 5.5386e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFPack               164 1.0 8.3180e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack             164 1.0 8.3400e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container     4              4         2336     0.
              Matrix    44             44      1488800     0.
      Matrix Coarsen     2              2         1264     0.
   Matrix Null Space     1              1          624     0.
       Krylov Solver    10             10       149832     0.
      Preconditioner    10             10        10492     0.
              Viewer     4              3         2544     0.
              Vector   158            158      1489304     0.
           Index Set    23             23        48208     0.
   Star Forest Graph    23             23        26136     0.
    Distributed Mesh     7              7        35336     0.
     Discrete System     7              7         6328     0.
           Weak Form     7              7         5768     0.
         PetscRandom     4              4         2680     0.
========================================================================================================================
Average time to get PetscTime(): 2.56e-08
#PETSc Option Table entries:
-ksp_max_it 200
-ksp_type gmres
-ksp_view
-log_view
-nw diffusion-2d-PETSc.edp
-pc_type gamg
-v 0
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/softs/freefem-ompi412/ff-petsc/r MAKEFLAGS= --with-debugging=0 COPTFLAGS="-O3 -mtune=native" CXXOPTFLAGS="-O3 -mtune=native" FOPTFLAGS="-O3 -mtune=native" --with-cxx-dialect=C++11 --with-ssl=0 --with-x=0 --with-fortran-bindings=0 --with-cudac=0 --with-cc=/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpicc --with-cxx=/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpic++ --with-fc=/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpif90 --with-scalar-type=real --download-f2cblaslapack --download-metis --download-ptscotch --download-hypre --download-parmetis --download-mmg --download-parmmg --download-superlu --download-suitesparse --download-tetgen --download-slepc --download-hpddm --download-scalapack --download-mumps --download-slepc-configure-arguments=--download-arpack=https://github.com/prj-/arpack-ng/archive/b64dccb.tar.gz PETSC_ARCH=fr
-----------------------------------------
Libraries compiled on 2021-10-29 11:40:15 on admin-hpc.univ-cotedazur.fr 
Machine characteristics: Linux-3.10.0-1160.42.2.el7.x86_64-x86_64-with-centos-7.9.2009-Core
Using PETSc directory: /softs/freefem-ompi412/ff-petsc/r
Using PETSc arch: 
-----------------------------------------

Using C compiler: /softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -O3 -mtune=native   
Using Fortran compiler: /softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -mtune=native     
-----------------------------------------

Using include paths: -I/softs/freefem-ompi412/ff-petsc/r/include
-----------------------------------------

Using C linker: /softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpicc
Using Fortran linker: /softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpif90
Using libraries: -Wl,-rpath,/softs/freefem-ompi412/ff-petsc/r/lib -L/softs/freefem-ompi412/ff-petsc/r/lib -lpetsc -Wl,-rpath,/softs/freefem-ompi412/ff-petsc/r/lib -L/softs/freefem-ompi412/ff-petsc/r/lib -Wl,-rpath,/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/lib -L/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -Wl,-rpath,/softs/hdf5/lib -L/softs/hdf5/lib -Wl,-rpath,/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/nccl_rdma_sharp_plugin/lib -L/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/nccl_rdma_sharp_plugin/lib -Wl,-rpath,/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/sharp/lib -L/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/sharp/lib -Wl,-rpath,/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/hcoll/lib -L/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/hcoll/lib -Wl,-rpath,/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ucx/lib -L/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ucx/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu -lf2clapack -lf2cblas -lparmmg -lmmg -lmmg3d -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lparmetis -lmetis -ltet -lm -lstdc++ -ldl -lmpi_usempi -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lrt -lquadmath -lstdc++ -ldl
-----------------------------------------


'/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpiexec' -n 4 /softs/freefem-ompi412/bin/FreeFem++-mpi -nw 'diffusion-2d-PETSc.edp' -log_view -v 0

MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.

[compute02:15752] *** An error occurred in MPI_Init_thread
[compute02:15752] *** reported by process [46911522209793,3]
[compute02:15752] *** on a NULL communicator
[compute02:15752] *** Unknown error
[compute02:15752] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[compute02:15752] *** and potentially your MPI job)
[compute02:15745] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
[compute02:15745] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[compute02:15745] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle

Does a simple Hello World! run in parallel on this compute node?
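For instance, something like this on a compute node (a minimal sketch; file and binary names are arbitrary):

# write, build, and run a minimal MPI program with the same Open MPI module
module load openmpi/4.1.2
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpicc hello_mpi.c -o hello_mpi
mpirun -np 4 ./hello_mpi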

Of course. We have many users running MPI on our cluster. But FreeFEM doesn't work when it is installed in the shared directory where I install software. Because of this, our FreeFEM user installed it manually in her home directory using this command:
./configure --enable-download --disable-ipopt --with-hdf5=no --prefix=/home/username/FreeFem-build
She said it works like this, but she didn't install PETSc/SLEPc; she will need it later.

This is why I'd like to install it with everything, but I don't understand why it doesn't work. My module contains the following:

depends-on gnu8
depends-on openmpi/4.1.2
depends-on hdf5/1.12.1

prepend-path    PATH            /softs/freefem-ompi412/bin
prepend-path    MANPATH         /softs/freefem-ompi412/share
prepend-path    LIBRARY_PATH    /softs/freefem-ompi412/lib
prepend-path    LD_LIBRARY_PATH /softs/freefem-ompi412/lib

What if you try /softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpirun -n 4 /softs/freefem-ompi412/bin/FreeFem++-mpi -nw 'diffusion-2d-PETSc.edp' -log_view -v 0? How do you usually launch MPI jobs?

This command gives a similar error:

--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
[login-hpc:15954] *** An error occurred in MPI_Init_thread
[login-hpc:15954] *** reported by process [46911848775681,2]
[login-hpc:15954] *** on a NULL communicator
[login-hpc:15954] *** Unknown error
[login-hpc:15954] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[login-hpc:15954] ***    and potentially your MPI job)
[login-hpc:15948] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2198
[login-hpc:15948] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2198
[login-hpc:15948] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
[login-hpc:15948] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[login-hpc:15948] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle

In order to launch MPI jobs, our users load the OpenMPI module (module load openmpi/4.1.2), which is the Open MPI that comes with the HPC-X Mellanox archive. Then they can use mpirun, mpiexec, etc. This MPI is Slurm-aware, so they don't need the -np option; instead they just give the number of cores with Slurm's --ntasks option. It works perfectly with the existing software that we have.
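For reference, a typical job script on our cluster looks something like this (a sketch; the FreeFEM module name here is hypothetical):

#!/bin/bash
#SBATCH --job-name=freefem-test
#SBATCH --ntasks=4                 # number of MPI ranks; no -np needed with our Slurm-aware Open MPI
#SBATCH --time=00:10:00

module load openmpi/4.1.2          # Open MPI from the HPC-X archive
module load freefem                # hypothetical module exposing /softs/freefem-ompi412/bin

mpirun FreeFem++-mpi diffusion-2d-PETSc.edp -nw -v 0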

I also tried to install using the Intel compilers and Intel MPI instead of the GNU compilers and OpenMPI. I followed the tutorial here: Building FreeFEM++ with Intel® Software Tools for Developers
But I get an error with make petsc-slepc:

make[1]: Leaving directory `/softs/SOURCES/ff/FreeFem-sources-4.9/3rdparty/ff-petsc/petsc-3.15.0/fr/externalpackages/git.ptscotch/src'
In file included from /usr/include/sys/wait.h(30),
                 from common.h(130),
                 from common_string.c(57):
/usr/include/signal.h(156): error: identifier "siginfo_t" is undefined
  extern void psiginfo (const siginfo_t *__pinfo, const char *__s);
                              ^

compilation aborted for common_string.c (code 2)
make[3]: *** [common_string.o] Error 2
make[2]: *** [scotch] Error 2
make[1]: *** [libscotch] Error 2
*******************************************************************************

make: *** [petsc-3.15.0/tag-conf-real] Error 1

I don't know what is up with OpenMPI; you'll have to follow the leads it is giving you to understand what is going on (using MCA parameters to get more information), for instance as in the sketch below.
The error with PT-SCOTCH is rather common; see Compiltion errors on Centos 7 - #3 by prj.
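For example, the failing run could be relaunched with the verbosity that the Open MPI message itself suggests (a sketch):

# turn on the MCA verbosity mentioned in the error message and disable help aggregation
/softs/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-redhat7.9-x86_64/ompi/bin/mpirun \
    --mca btl_base_verbose 100 --mca mtl_base_verbose 100 \
    --mca orte_base_help_aggregate 0 \
    -n 4 /softs/freefem-ompi412/bin/FreeFem++-mpi -nw diffusion-2d-PETSc.edp -v 0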

Thanks for the link. This solved the PT-SCOTCH error; however, I now have another issue with FFTW during the make of FreeFEM:

libtool: compile:  icc -std=gnu99 -no-gcc -DHAVE_CONFIG_H -I. -I.. -I .. -xCOMMON-AVX512 -DNDEBUG -O3 -mmmx -mavx -fPIC -c tensor.c -o tensor.o
tensor.c(31): error: identifier "__INT_MAX__" is undefined
       if (FINITE_RNK(rnk) && rnk > 1)
           ^

tensor.c(67): error: identifier "__INT_MAX__" is undefined
       if (!FINITE_RNK(sz->rnk))
            ^

tensor.c(79): error: identifier "__INT_MAX__" is undefined
       if (FINITE_RNK(t->rnk)) {
           ^

tensor.c(108): error: identifier "__INT_MAX__" is undefined
       if (FINITE_RNK(x->rnk)) {
           ^

compilation aborted for tensor.c (code 2)
make[8]: *** [tensor.lo] Error 1
make[8]: Leaving directory `/softs/SOURCES/freefem2/FreeFem-sources/3rdparty/fftw/fftw-3.3.8/kernel'
make[7]: *** [all-recursive] Error 1
make[7]: Leaving directory `/softs/SOURCES/freefem2/FreeFem-sources/3rdparty/fftw/fftw-3.3.8'
make[6]: *** [all] Error 2
make[6]: Leaving directory `/softs/SOURCES/freefem2/FreeFem-sources/3rdparty/fftw/fftw-3.3.8'
make[5]: *** [fftw-3.3.8/FAIT] Error 2
make[5]: Leaving directory `/softs/SOURCES/freefem2/FreeFem-sources/3rdparty/fftw'
make[4]: *** [compile-dir] Error 2
make[4]: Leaving directory `/softs/SOURCES/freefem2/FreeFem-sources/3rdparty'
make[3]: *** [tag-compile-pkg] Error 1
make[3]: Leaving directory `/softs/SOURCES/freefem2/FreeFem-sources/3rdparty'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/softs/SOURCES/freefem2/FreeFem-sources/3rdparty'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/softs/SOURCES/freefem2/FreeFem-sources'
make: *** [all] Error 2

Let me know if I should start another thread for this issue. I’d like to try FreeFEM with the Intel compilers since I don’t understand the problem with OpenMPI yet.

Do you really need FFTW? I usually configure with the option --enable-download-fftw=no to avoid such problems.
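For instance, applied to the original GNU/Open MPI configure line from the top of this thread, that would look like (a sketch):

./configure --enable-download --enable-download-fftw=no --enable-optim --prefix=/softs/freefem-ompi412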

I don’t know if the user needs fftw. He asked for freefem “with petsc and everything”. But we do already have fftw modules on the cluster so maybe this is not a problem. However, I still have errors when trying to compile. Here’s exactly what I did:

git clone https://github.com/FreeFem/FreeFem-sources.git
cd FreeFem-sources/
module purge
module load intel
module load autotools
module load cmake/3.20.1-intel
export OPTF=-xCOMMON-AVX512 
export CC=icc 
export CFLAGS=$OPTF 
export CXX=icpc 
export CXXFLAGS=$OPTF 
export FC=ifort 
export FCFLAGS=$OPTF 
export F77=ifort 
export FFLAGS=$OPTF 
autoreconf -i
./configure --prefix=/softs/freefem-intel --enable-download --enable-download-fftw=no --with-mpiinc=-I${I_MPI_ROOT}/intel64/include --with-mpilibs="-L${I_MPI_ROOT}/intel64/lib/release_mt -L${I_MPI_ROOT}/intel64/lib -lmpicxx -lmpifort -lmpi -lmpigi -ldl -lrt -lpthread" --with-mpilibsc="-L${I_MPI_ROOT}/intel64/lib/release_mt -L${I_MPI_ROOT}/intel64/lib -lmpicxx -lmpifort -lmpi -lmpigi -ldl -lrt -lpthread"
./3rdparty/getall -a
cd 3rdparty/ff-petsc/ 
I edited the Makefile as suggested to avoid the PT-SCOTCH error, then:
make petsc-slepc

Which gave an error:

          CC fc/obj/mat/coarsen/impls/mis/mis.o
/usr/include/c++/4.8.5/bits/stl_algo.h(2263): error: function "lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool" cannot be called with the given argument list
            argument types are: (std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, const std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>)
            object type is: lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool
          while (__comp(*__first, __pivot))
                 ^
/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp(451): note: this candidate was rejected because arguments do not match
      std::sort(MyComputedBlocks.begin(), MyComputedBlocks.end(), [](std::unique_ptr<IMatrix<T>> &a, std::unique_ptr<IMatrix<T>> &b) {
                                                                    ^
          detected during:
            instantiation of "_RandomAccessIterator std::__unguarded_partition(_RandomAccessIterator, _RandomAccessIterator, const _Tp &, _Compare) [with _RandomAccessIterator=__gnu_cxx::__normal_iterator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> *, std::vector<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, std::allocator<std::unique_ptr<htool::IMatrix<PetscScalar>,
                      std::default_delete<htool::IMatrix<PetscScalar>>>>>>, _Tp=std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, _Compare=lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool]" at line 2296
            instantiation of "_RandomAccessIterator std::__unguarded_partition_pivot(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator=__gnu_cxx::__normal_iterator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> *, std::vector<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, std::allocator<std::unique_ptr<htool::IMatrix<PetscScalar>,
                      std::default_delete<htool::IMatrix<PetscScalar>>>>>>, _Compare=lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool]" at line 2337
            instantiation of "void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, _Compare) [with _RandomAccessIterator=__gnu_cxx::__normal_iterator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> *, std::vector<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, std::allocator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>>>>,
                      _Size=long, _Compare=lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool]" at line 5499
            instantiation of "void std::sort(_RAIter, _RAIter, _Compare) [with _RAIter=__gnu_cxx::__normal_iterator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> *, std::vector<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, std::allocator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>>>>, _Compare=lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>,
                      std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool]" at line 457 of "/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp"
            instantiation of "void htool::HMatrix<T>::ComputeBlocks(htool::VirtualGenerator<T> &, const double *, const double *) [with T=PetscScalar]" at line 317 of "/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp"
            instantiation of "void htool::HMatrix<T>::build(htool::VirtualGenerator<T> &, const double *, const double *) [with T=PetscScalar]" at line 118 of "/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp"
            implicit generation of "htool::HMatrix<T>::~HMatrix() [with T=PetscScalar]" at line 118 of "/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp"
            instantiation of class "htool::HMatrix<T> [with T=PetscScalar]" at line 118 of "/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp"
            instantiation of "htool::HMatrix<T>::HMatrix(const std::shared_ptr<htool::VirtualCluster> &, const std::shared_ptr<htool::VirtualCluster> &, double, double, char, char, const int &, MPI_Comm={int}) [with T=PetscScalar]" at line 466 of "/softs/SOURCES/freefemintel/FreeFem-sources/3rdparty/ff-petsc/petsc-3.16.1/src/mat/impls/htool/htool.cxx"

/usr/include/c++/4.8.5/bits/stl_algo.h(2266): error: function "lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool" cannot be called with the given argument list
            argument types are: (const std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>)
            object type is: lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool
          while (__comp(__pivot, *__last))
                 ^
/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp(451): note: this candidate was rejected because arguments do not match
      std::sort(MyComputedBlocks.begin(), MyComputedBlocks.end(), [](std::unique_ptr<IMatrix<T>> &a, std::unique_ptr<IMatrix<T>> &b) {
                                                                    ^
          detected during:
            instantiation of "_RandomAccessIterator std::__unguarded_partition(_RandomAccessIterator, _RandomAccessIterator, const _Tp &, _Compare) [with _RandomAccessIterator=__gnu_cxx::__normal_iterator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> *, std::vector<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, std::allocator<std::unique_ptr<htool::IMatrix<PetscScalar>,
                      std::default_delete<htool::IMatrix<PetscScalar>>>>>>, _Tp=std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, _Compare=lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool]" at line 2296
            instantiation of "_RandomAccessIterator std::__unguarded_partition_pivot(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator=__gnu_cxx::__normal_iterator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> *, std::vector<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, std::allocator<std::unique_ptr<htool::IMatrix<PetscScalar>,
                      std::default_delete<htool::IMatrix<PetscScalar>>>>>>, _Compare=lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool]" at line 2337
            instantiation of "void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, _Compare) [with _RandomAccessIterator=__gnu_cxx::__normal_iterator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> *, std::vector<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, std::allocator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>>>>,
                      _Size=long, _Compare=lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool]" at line 5499
            instantiation of "void std::sort(_RAIter, _RAIter, _Compare) [with _RAIter=__gnu_cxx::__normal_iterator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> *, std::vector<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>, std::allocator<std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>>>>>, _Compare=lambda [](std::unique_ptr<htool::IMatrix<PetscScalar>,
                      std::default_delete<htool::IMatrix<PetscScalar>>> &, std::unique_ptr<htool::IMatrix<PetscScalar>, std::default_delete<htool::IMatrix<PetscScalar>>> &)->bool]" at line 457 of "/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp"
            instantiation of "void htool::HMatrix<T>::ComputeBlocks(htool::VirtualGenerator<T> &, const double *, const double *) [with T=PetscScalar]" at line 317 of "/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp"
            instantiation of "void htool::HMatrix<T>::build(htool::VirtualGenerator<T> &, const double *, const double *) [with T=PetscScalar]" at line 118 of "/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp"
            implicit generation of "htool::HMatrix<T>::~HMatrix() [with T=PetscScalar]" at line 118 of "/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp"
            instantiation of class "htool::HMatrix<T> [with T=PetscScalar]" at line 118 of "/softs/freefem-intel/ff-petsc/c/include/htool/misc/../types/hmatrix.hpp"
            instantiation of "htool::HMatrix<T>::HMatrix(const std::shared_ptr<htool::VirtualCluster> &, const std::shared_ptr<htool::VirtualCluster> &, double, double, char, char, const int &, MPI_Comm={int}) [with T=PetscScalar]" at line 466 of "/softs/SOURCES/freefemintel/FreeFem-sources/3rdparty/ff-petsc/petsc-3.16.1/src/mat/impls/htool/htool.cxx"

          CC fc/obj/mat/order/sp1wd.o
          CC fc/obj/mat/order/spnd.o
          CC fc/obj/mat/order/spqmd.o
          CC fc/obj/mat/order/sprcm.o
compilation aborted for /softs/SOURCES/freefemintel/FreeFem-sources/3rdparty/ff-petsc/petsc-3.16.1/src/mat/impls/htool/htool.cxx (code 2)
gmake[5]: *** [fc/obj/mat/impls/htool/htool.o] Error 2
gmake[5]: *** Waiting for unfinished jobs....
gmake[4]: *** [libs] Error 2
**************************ERROR*************************************
  Error during compile, check fc/lib/petsc/conf/make.log
  Send it and fc/lib/petsc/conf/configure.log to petsc-maint@mcs.anl.gov
********************************************************************
make[3]: *** [all] Error 1
make[2]: *** [all] Error 2
make[2]: Leaving directory `/softs/SOURCES/freefemintel/FreeFem-sources/3rdparty/ff-petsc/petsc-3.16.1'
make[1]: *** [petsc-3.16.1/tag-make-complex] Error 2
make[1]: Leaving directory `/softs/SOURCES/freefemintel/FreeFem-sources/3rdparty/ff-petsc'
make: *** [WHERE-all] Error 2

It looks like it's picking up the wrong C++ standard library: the headers come from /usr/include/c++/4.8.5 (the system GCC 4.8.5) instead of the Intel toolchain.

Also, during make petsc-slepc, it looks like the wrong MPI is being used. It says "using mpiexec: /softs/freefem-intel/ff-petsc/r/bin/mpiexec" (which is the MPICH installed by make petsc-slepc), whereas it should use my Intel MPI, shouldn't it?

Maybe it's possible to install FreeFEM only, and PETSc separately?

Here’s the make.log related to the error: 1fichier.com: Cloud Storage

Don’t you have access to a newer g++? It looks like you are using icpc with g++ 4.8.5. Please try to load a newer g++ module. In case you don’t know, icpc relies on an underlying g++.
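For example (a sketch; the module name is this cluster's, and icpc normally picks up the g++ it finds first in PATH for its C++ headers and runtime):

# make a newer g++ visible to icpc before rebuilding
module load gnu8            # provides g++ 8.3.0 on this cluster
which g++ && g++ --version  # confirm the newer g++ is now first in PATH
# then restart the PETSc/SLEPc build from a clean directory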

Of course it is possible to install FreeFEM and PETSc separately; that's what I always advise people to do (especially on clusters).

Do you have a procedure for how to install FreeFEM and PETSc separately on a cluster? I guess I should start by installing PETSc?

http://jolivet.perso.enseeiht.fr/FreeFem-tutorial/main.pdf#page=197
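In outline, the procedure boils down to building PETSc first and then pointing FreeFEM's configure at it. A rough sketch only; the flags, versions, and paths below are assumptions, the slides have the exact commands:

# 1. build and install PETSc on its own, with the MPI wrappers your users will load
./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-debugging=0 \
    --download-metis --download-mumps --download-scalapack --prefix=/softs/petsc-real
# then run the 'make ... all' and 'make ... install' commands that configure prints
# 2. configure FreeFEM against that installation; check the exact option syntax for your version
cd /path/to/FreeFem-sources
./configure --help | grep -i petsc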

The link doesn't work. 🙁

Try again (I’ve removed the s from https), please.