Sure! I think I did it. I attached to a process and printed out the full backtrace. I attach three of them: two from sleeping processes related to the srun command, and one from a running process related to the FreeFem++-mpi command. I am not sure which one you were referring to. All the parallel FreeFEM processes seem to be running.
gdb_srun_1059945.log (1.9 KB)
gdb_srun_1059944.log (4.5 KB)
gdb-backtrace.log (61.3 KB)
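In case it matters, backtraces like these can be captured non-interactively with something along these lines, where <pid> is a placeholder for the process being inspected:
gdb -p <pid> -batch -ex "thread apply all bt full" > backtrace.log 2>&1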
Are those the back traces when processes are hanging? They look like the back traces when processes are starting (before the KSPSolve()).
Maybe I did not explain myself well enough. I mean the process is still running. It just stops doing anything useful, I assume right before KSPSolve().
Edit: as GDB starts, it prints the warning Could not load shared library symbols for 2 libraries, e.g. ../../plugin/mpi/PETSc.so.
I am not sure if that is meaningful.
Sorry, I didn’t read carefully enough. I now see that one process is stuck in MUMPS. Could you try to run with something trivial, like PCJACOBI, for just 10 iterations? It will of course not converge, but it would be nice to check whether the deadlock is specific to MUMPS.
I tried out -pc_type jacobi -ksp_max_it 10 as you suggested, and the program seems to run fine. The same applies to -pc_type bjacobi. However, -pc_type lu -pc_factor_mat_solver_type superlu_dist gets stuck just like MUMPS. Maybe the issue is somehow related to the conversion between matrix types (which is required by PCLU) behind the scenes?
I attach the backtrace of the superlu run.
gdb_265145_superlu.log (27.1 KB)
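To make the matrix-conversion hypothesis a bit more concrete: a trivial standalone check would be to build a distributed AIJ matrix and perform a single collective MatConvert() with the same srun layout as the failing job. The sketch below is only an illustration of a collective type conversion (PCLU does not literally call MatConvert() like this), and the matrix size and target type are arbitrary:
#include <petsc.h>

int main(int argc, char **argv)
{
  Mat      A, B;
  PetscInt i, rstart, rend, n = 1000;

  PetscInitialize(&argc, &argv, NULL, NULL);
  /* trivial diagonal MPIAIJ matrix, distributed over all ranks */
  MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 1, NULL, 0, NULL, &A);
  MatGetOwnershipRange(A, &rstart, &rend);
  for (i = rstart; i < rend; i++) MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  /* collective conversion: every rank must reach and complete this call */
  MatConvert(A, MATDENSE, MAT_INITIAL_MATRIX, &B);
  PetscPrintf(PETSC_COMM_WORLD, "MatConvert finished\n");
  MatDestroy(&B);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}
If this also hangs at 384 processes, that would point below the solver packages rather than at MUMPS/SuperLU_DIST specifically.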
Can you reproduce this on a standalone PETSc example?
I do not think so. If I run the previously mentioned PETSc example, ex15.c, with srun -n 384 ex15 -m 200 -n 200 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps -log_view -options_left, it seems like all the options are detected, and the program completes successfully.
Maybe try to edit src/fflib/ffapi.cpp: in ffapi_mpi_init(), remove everything and just keep the call to PetscInitialize() (you may have to add #include <petsc.h> at the top of the file), then try to recompile in src.
Thanks for the tip. Unfortunately, that did not work.
The corresponding function looks like the following:
static void ffapi_mpi_init(int &argc, char** &argv){
#ifdef WITH_PETSCxxxxx
  PetscInitialize(&argc, &argv, 0, "");
#endif
}
The compilation is successful, but the tests do not even start properly - I get the message Attempting to use an MPI routine before initializing MPICH; then the program exits.
You need to remove the #ifdef (the WITH_PETSCxxxxx macro is never defined, so the PetscInitialize() call is compiled out and MPI never gets initialized).
Unfortunately, that does not work either. The following function
static void ffapi_mpi_init(int &argc, char** &argv){
  PetscInitialize(&argc, &argv, 0, "");
}
fails to compile with the error
../fflib/ffapi.cpp: In function ‘void ffapi::ffapi_mpi_init(int&, char**&)’:
../fflib/ffapi.cpp:258:5: error: ‘PetscInitialize’ was not declared in this scope
PetscInitialize(&argc, &argv, 0, "");
^~~~~~~~~~~~~~~
At the top of the file, petsc.h is included like this:
#ifdef WITH_PETSC
#include <petsc.h>
#endif
If I remove the macro here, I get a different error.
Well, you need to remove the macro and fix the error.
Unfortunately, I just cannot figure it out. I think removing the macro inside the function mentioned causes problems in the non-PETSc part of FreeFEM. I understand that you do not have much time to dedicate to this issue (I already appreciate very much the help you have given me), but without further guidance the only solution I have in mind is to export the FreeFEM matrices in PETSc format and do the linear algebra in a separate PETSc code, which, given the convenience the FreeFEM-PETSc interface offers, would be rather underwhelming.
Right, you just want to compile the files needed for FreeFem++-mpi, and you probably need to fix the compile line/link line to include the proper PETSc flags. Anyway, if you can’t get it to compile (my idea was to let PETSc initialize MPI, since the code seems to be working when it’s PETSc-only), you should ask your sysadmin what may cause the deadlock. I’m not of much help without access to the machine.
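For concreteness, a minimal sketch of what I mean (the rest of ffapi.cpp, including the ffapi namespace, is omitted here):
#include <petsc.h>   // included unconditionally, so PetscInitialize() is in scope

static void ffapi_mpi_init(int &argc, char** &argv) {
  // let PETSc, rather than FreeFEM, initialize MPI
  PetscInitialize(&argc, &argv, 0, "");
}
The compile line for this file then has to carry the PETSc include directories, typically something like -I$PETSC_DIR/include -I$PETSC_DIR/$PETSC_ARCH/include, plus the matching PETSc libraries on the link line.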
Thanks for the reply.
The sysadmin suggested a different route: a spack + MPICH (with libfabric) installation. Unfortunately, the FreeFEM compilation gets stuck, and I am not sure why. I attach the config.log and the stdout/stderr of the make command. Could you please take a look at them? Maybe you can spot what goes wrong.
makeout.log (283.7 KB)
makeerr.log (41.3 KB)
config.log (381.1 KB)
From the output of ps aux, I think this is what gets stuck:
/sw/pkgs/arc/intel/2022.1.2/compiler/2022.0.2/linux/bin-llvm/clang++ -cc1 -triple x86_64-unknown-linux-gnu -emit-obj --mrelax-relocations -disable-free -disable-llvm-verifier -discard-value-names -main-file-name cmaes.cpp -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -fveclib=SVML -mframe-pointer=none -menable-no-infs -menable-no-nans -menable-unsafe-fp-math -fno-signed-zeros -mreassociate -freciprocal-math -fdenormal-fp-math=preserve-sign,preserve-sign -ffp-contract=fast -fno-rounding-math -ffast-math -ffinite-math-only -mconstructor-aliases -munwind-tables -target-cpu x86-64 -target-feature +mmx -target-feature +avx2 -target-feature +avx -target-feature +sse4.2 -target-feature +sse2 -target-feature +sse -mllvm -x86-enable-unaligned-vector-move=true -tune-cpu generic -debug-info-kind=limited -dwarf-version=4 -debugger-tuning=gdb -fcoverage-compilation-dir=/home/andrasz/FFinstall_mpich/FreeFem-sources/plugin/mpi -resource-dir /sw/pkgs/arc/intel/2022.1.2/compiler/2022.0.2/linux/lib/clang/14.0.0 -I ../seq/include -I ../seq -I /home/andrasz/.spack/opt/spack/intel-2022.0.2/mpich/4.2.3-akkh/include -I /home/andrasz/FFinstall_mpich/petsc/arch-FreeFem/include/suitesparse -I /home/andrasz/FFinstall_mpich/petsc/arch-FreeFem/include -I /home/andrasz/.spack/opt/spack/intel-2022.0.2/mpich/4.2.3-akkh/include -D NDEBUG -D BAMG_LONG_LONG -D NCHECKPTR -I/sw/pkgs/arc/intel/2022.1.2/tbb/2021.5.1/include -internal-isystem /sw/pkgs/arc/intel/2022.1.2/compiler/2022.0.2/linux/bin-llvm/../compiler/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /sw/pkgs/arc/intel/2022.1.2/compiler/2022.0.2/linux/lib/clang/14.0.0/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -std=gnu++14 -fdeprecated-macro -fdebug-compilation-dir=/home/andrasz/FFinstall_mpich/FreeFem-sources/plugin/mpi -ferror-limit 19 -fheinous-gnu-extensions -fgnuc-version=4.2.1 -fcxx-exceptions -fexceptions -mllvm -enable-gvn-hoist -vectorize-loops -vectorize-slp -D__GCC_HAVE_DWARF2_CFI_ASM=1 -fintel-compatibility -mllvm -disable-hir-generate-mkl-call -mllvm -intel-libirc-allowed -mllvm -loopopt=0 -floopopt-pipeline=none -mllvm -enable-lv -o cmaes.o -x c++ ../seq/cmaes.cpp
This will fix things with some compilers:
diff --git a/plugin/mpi/mpi-cmaes.cpp b/plugin/mpi/mpi-cmaes.cpp
index c80ee72e..2d03f0e2 100644
--- a/plugin/mpi/mpi-cmaes.cpp
+++ b/plugin/mpi/mpi-cmaes.cpp
@@ -28,3 +28,3 @@
-//ff-c++-LIBRARY-dep: mpi
+//ff-c++-LIBRARY-dep: mpi broken
//ff-c++-cpp-dep: ../seq/cmaes.cpp -I../seq
diff --git a/plugin/seq/cmaes.cpp b/plugin/seq/cmaes.cpp
index b8196b79..92847e45 100644
--- a/plugin/seq/cmaes.cpp
+++ b/plugin/seq/cmaes.cpp
@@ -23,3 +23,3 @@
/* clang-format off */
-//ff-c++-LIBRARY-dep:
+//ff-c++-LIBRARY-dep: broken
//ff-c++-cpp-dep:
Thanks for the quick help! This indeed fixed the problem. I will test whether this version can work on multiple nodes.
With MPICH, I experience similar unreliability at a high number of processes as with OpenMPI. I managed to reproduce the issue with a PETSc-only script. I wrote to the PETSc developers; hopefully, they can offer some help with the investigation.
As recommended by the PETSc developers, I am trying out the newest Intel compilers/MPI with spack. Unfortunately, the FreeFEM configure command returns an error. Could you please take a look at the config.log to see what goes wrong?
config.log (53.9 KB)
You need an up-to-date autoconf library.
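For example (just a sketch, assuming spack is already available on this machine and that the build system is regenerated from the FreeFem-sources directory before configure is rerun):
spack install autoconf automake libtool
spack load autoconf automake libtool
autoreconf -i     # regenerate the configure script with the newer autotools
./configure ...   # rerun with the same options as before
The exact spack spec (compiler, versions) may need adjusting to match the rest of the toolchain.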