FreeFEM compilation: issues on HPC

Hello everyone,

I am having trouble installing FreeFEM + PETSc on an HPC cluster. More precisely, the installation seems fine at first glance, and everything runs smoothly when I run my codes on one node; however, when I try to run on multiple nodes, the code just stops at random points, similar to a previous problem report.

First, I tried the installation with the following modules:

module load git/2.39.1
module load intel/2022.1.2
module load openmpi/5.0.3

With this, the installation is successful, but the program stops without a message on multiple nodes.

Then, as suggested in a previous post, I tried switching to the only Intel MPI version available: module load impi/2021.5.1.
With that, the PETSc compilation was successful, but the FreeFEM compilation failed with the following message:

mv -f $depbase.Tpo $depbase.Po
depbase=`echo ../lglib/lg.tab.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
mpicxx -DHAVE_CONFIG_H -I. -I../..  -DPARALLELE -I./../fflib -I./../Graphics -I./../femlib -I./../bamglib/ -I"/sw/pkgs/arc/intel/2022.1.2/mpi/2021.5.1/include" -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker -Xlinker -rpath -Xlinker -Xlinker --enable-new-dtags   -I/home/andrasz/FFinstall/petsc/arch-FreeFem/include/suitesparse -I/home/andrasz/FFinstall/petsc/arch-FreeFem/include -I./../../3rdparty/include/BemTool/ -I./../../3rdparty/boost/include -I../../plugin/mpi  -DPARALLELE -DHAVE_ZLIB -g  -DNDEBUG -O3 -mmmx -mavx2 -mavx -msse4.2 -msse2 -msse -std=gnu++14 -DBAMG_LONG_LONG  -DNCHECKPTR -fPIC -MT ../lglib/lg.tab.o -MD -MP -MF $depbase.Tpo -c -o ../lglib/lg.tab.o ../lglib/lg.tab.cpp &&\
mv -f $depbase.Tpo $depbase.Po
depbase=`echo compositeFESpace.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
mpicxx -DHAVE_CONFIG_H -I. -I../..  -DPARALLELE -I./../fflib -I./../Graphics -I./../femlib -I./../bamglib/ -I"/sw/pkgs/arc/intel/2022.1.2/mpi/2021.5.1/include" -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker -Xlinker -rpath -Xlinker -Xlinker --enable-new-dtags   -I/home/andrasz/FFinstall/petsc/arch-FreeFem/include/suitesparse -I/home/andrasz/FFinstall/petsc/arch-FreeFem/include -I./../../3rdparty/include/BemTool/ -I./../../3rdparty/boost/include -I../../plugin/mpi  -DPARALLELE -DHAVE_ZLIB -g  -DNDEBUG -O3 -mmmx -mavx2 -mavx -msse4.2 -msse2 -msse -std=gnu++14 -DBAMG_LONG_LONG  -DNCHECKPTR -fPIC -MT compositeFESpace.o -MD -MP -MF $depbase.Tpo -c -o compositeFESpace.o compositeFESpace.cpp &&\
mv -f $depbase.Tpo $depbase.Po
depbase=`echo parallelempi.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
mpicxx -DHAVE_CONFIG_H -I. -I../..  -DPARALLELE -I./../fflib -I./../Graphics -I./../femlib -I./../bamglib/ -I"/sw/pkgs/arc/intel/2022.1.2/mpi/2021.5.1/include" -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker -Xlinker -rpath -Xlinker -Xlinker --enable-new-dtags   -I/home/andrasz/FFinstall/petsc/arch-FreeFem/include/suitesparse -I/home/andrasz/FFinstall/petsc/arch-FreeFem/include -I./../../3rdparty/include/BemTool/ -I./../../3rdparty/boost/include -I../../plugin/mpi  -DPARALLELE -DHAVE_ZLIB -g  -DNDEBUG -O3 -mmmx -mavx2 -mavx -msse4.2 -msse2 -msse -std=gnu++14 -DBAMG_LONG_LONG  -DNCHECKPTR -fPIC -MT parallelempi.o -MD -MP -MF $depbase.Tpo -c -o parallelempi.o parallelempi.cpp &&\
mv -f $depbase.Tpo $depbase.Po
g++: error: unrecognized command line option ‘-rpath’
make[3]: *** [Makefile:710: ffapi.o] Error 1
make[3]: *** Waiting for unfinished jobs....
g++: error: unrecognized command line option ‘-rpath’
g++: error: unrecognized command line option ‘-rpath’
g++: error: unrecognized command line option ‘-rpath’
g++: error: unrecognized command line option ‘-rpath’
make[3]: *** [Makefile:710: ../lglib/mymain.o] Error 1
make[3]: *** [Makefile:710: ../lglib/lg.tab.o] Error 1
make[3]: *** [Makefile:710: parallelempi.o] Error 1
make[3]: *** [Makefile:710: compositeFESpace.o] Error 1
make[3]: Leaving directory '/home/andrasz/FFinstall/FreeFem-sources/src/mpi'
make[2]: *** [Makefile:554: all-recursive] Error 1
make[2]: Leaving directory '/home/andrasz/FFinstall/FreeFem-sources/src'
make[1]: *** [Makefile:896: all-recursive] Error 1
make[1]: Leaving directory '/home/andrasz/FFinstall/FreeFem-sources'

I attach the config.logs of both cases. Any help with finding the issue is much appreciated.

config_intel_openmpi.zip (643.0 KB)

config_intel_intelmpi.zip (631.5 KB)

I don’t know of a better fix than to manually remove -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker -Xlinker -rpath -Xlinker -Xlinker --enable-new-dtags from the various Makefiles in the source tree.
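For example, something along these lines might work (an untested sketch; the flag string is copied from the error above, and the path is the source directory from your log):

# Untested sketch: strip the offending linker flags from every generated Makefile.
# Re-run after each ./configure, since these files are regenerated.
cd ~/FFinstall/FreeFem-sources
find . -name Makefile -exec sed -i \
  's/-Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker -Xlinker -rpath -Xlinker -Xlinker --enable-new-dtags//g' {} +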

Thanks for the tip. Removing the corresponding string from the Makefiles alone is not enough: there is also an occurrence of it in the file plugin/seq/WHERE_LIBRARY-config, which is created by the ./configure command. If I remove it there completely as well, the compilation goes through. However, the original issue remains: the code runs fine on a single node but stalls when I run it on two nodes. Do you have any idea how to track down the cause?

Are you sure you are not mixing MPI implementations at compile- and run-time?

I do not think so. I load either the impi/2021.5.1 module or the openmpi/5.0.3 module, and set the compiler flags/environment variables myself. Still, the issue persists in both cases.
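A quick way to double-check (a sketch; mpicxx -show is the Intel MPI/MPICH wrapper query, Open MPI spells it mpicxx --showme):

# Which wrapper and launcher the environment picks up:
which mpicxx mpirun ff-mpirun
# What the compiler wrapper actually wraps:
mpicxx -show
# Which MPI library the installed binary is linked against:
ldd $(which FreeFem++-mpi) | grep -i mpi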

I will try to contact the HPC admins about this. In the meantime, if you have any ideas where I should investigate further, I would appreciate it.

How are you launching the FreeFEM code?

I tried the following:
ff-mpirun -n 384 Nonlinear-solver.edp -Re 50 -v 0
srun --mpi=pmi2 -n 384 FreeFem++-mpi Nonlinear-solver.edp -Re ${Re} -v 0
Neither worked.
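For context, both are launched from inside an sbatch job, roughly like this (a sketch; the node count, time limit, and resource values are placeholders, not the exact script):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=384
#SBATCH --time=01:00:00
module load intel/2022.1.2 openmpi/5.0.3
srun --mpi=pmi2 -n 384 FreeFem++-mpi Nonlinear-solver.edp -Re 50 -v 0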

Can you do ldd FreeFem++-mpi and send the output, please? Does a simple cout << mpirank << endl; run?

Sure, here is the log.
Note that in the meantime, I switched back to openmpi as its compilation was easier.
ldd.log (4.6 KB)
Yes, the program displays all the mpiranks.
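For reference, the test was essentially the following (a minimal sketch; hello-mpi.edp is just a throwaway file name):

# One-line FreeFEM script that prints the rank of every process.
cat > hello-mpi.edp <<'EOF'
cout << mpirank << " / " << mpisize << endl;
EOF
srun -n 384 FreeFem++-mpi hello-mpi.edp -v 0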

What is really odd (in the multi-node case, of course) is that with the current OpenMPI installation, when I use ff-mpirun -n NPROC, the code stalls, but when I launch with srun -n NPROC FreeFem++-mpi (without --mpi=pmi2), the code seems to work. However, it displays the following error message: [lh1659:3771795] Error: coll_hcoll_module.c:241 - mca_coll_hcoll_module_enable() coll_hcol: mca_coll_hcoll_save_coll_handlers failed
I suppose this happens every time communication occurs between the nodes.
I am not sure how serious an issue this is, if it is one at all. From what I found online, it may cause suboptimal performance.
Anyway, I would appreciate any insight into this behavior; it would help me (and the community) sort out similar issues.

This error comes from your machine, not the installation.

Ok, thank you so much for your help!!

In the meantime, I had the idea of compiling PETSc and FreeFEM with MPICH. The installation is successful, and when I run an example on a login node using ff-mpirun, it seems to work fine. However, when I run the code with Slurm + srun, it does not work as intended: there is only one process (rank 0, mpisize 1). It seems like the connection between MPICH and Slurm needs to be specified somehow. Do you have any idea how that issue may be resolved?
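The only check I could think of so far is listing which process-management interfaces this Slurm installation supports (sketch below); I assume the problem is in that layer, since a self-built MPICH has to speak a PMI flavour that srun also offers:

# List the PMI plugins srun knows about on this cluster.
srun --mpi=list
# If every rank still reports mpisize == 1, srun and the self-built MPICH are not
# sharing a common PMI, so each process starts as an independent singleton.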

srun probably uses the system MPI, whereas you want to use your own MPICH. But you should avoid compiling MPICH yourself and instead rely on a system module, precisely to avoid such issues. If the system module is buggy (cf. the previous error message [lh1659:3771795] Error: coll_hcoll_module.c:241 - mca_coll_hcoll_module_enable() coll_hcol: mca_coll_hcoll_save_coll_handlers failed), you should ask how to get that fixed.

Thanks for the information. I thought about MPICH because it has worked for me in the past, and I have trouble with both OpenMPI and Intel MPI, which are the MPI implementations available on this HPC system.

I have managed to build FreeFEM with OpenMPI and to run it with export OMPI_MCA_coll="^hcoll" plus srun --mpi=pmix, but there still seem to be some issues. When I tried to examine the scaling of the program on 60 nodes/11k processes, the real PETSc version seemed to work, but the complex version stopped with the following message:
Internal error 2 with small buffers. Furthermore, I have now recompiled the code once again, and a code that runs fine on 5 nodes crashes on 10 nodes (more precisely, one code using real PETSc works, while the complex PETSc one crashes with Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range). Note that previously srun --mpi=pmix worked, but after recompiling FreeFEM + PETSc only srun --mpi=pmix_v4 works (and only on 5 nodes).
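For reference, the launch recipe that currently gets furthest for me is roughly the following (a sketch; which pmix flavour works seems to depend on the exact build):

# Disable the hcoll collective component that produced the mca_coll_hcoll errors.
export OMPI_MCA_coll="^hcoll"
# Launch through Slurm's PMIx interface; with the current build only pmix_v4 works.
srun --mpi=pmix_v4 -n 384 FreeFem++-mpi Nonlinear-solver.edp -Re 50 -v 0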

Update: now the code seems to be running on 10 nodes, with everything exactly the same … something is definitely not right.

Regarding Intel MPI, I think I will have to clean the Makefiles out manually, but I still need to try that. I have been in contact with the HPC support about this. Anyway, if the above behavior suggests any way to make progress, I would appreciate your tips.

OK, I managed to make FreeFEM work with the Intel compiler + Intel MPI. The catch is that, once again, it does not work on multiple nodes.

Interestingly, I observed the following: when I run a test on one node with an sbatch script that calls srun, and I redirect stdout/stderr to separate files, the text is written out as binary. If I do not redirect it, it appears correctly.

I think there is a mismatch between the software (FreeFEM + PETSc) and the hardware; both have been tested extensively, but separately. Can you suggest anything that might help me resolve the issue?

I continued to investigate Intel MPI.

In the meantime, I tried out the https://petsc.org/release/src/ksp/ksp/tutorials/ex15.c.html example. I ran it with the following command: srun -n 384 ex15 -m 200 -n 200 -ksp_type tsirm -pc_type ksp -ksp_monitor_short -ksp_ksp_type fgmres -ksp_ksp_rtol 1e-10 -ksp_pc_type mg -ksp_ksp_max_it 30, and it seems to work on two nodes.

Another simple test I ran is examples/hpddm/stokes-block-2d-PETSc.edp from the FreeFem/FreeFem-sources repository on GitHub. It runs on 1 node/192 processes, but stalls on two nodes/382 processes.

Then, I used the matrices exported by the block-Stokes example in examples/hpddm/MatLoad-PETSc.edp from the same repository. Once again, the example works on 1 node but not on two nodes.

These experiments suggest to me that something goes wrong in the FreeFEM/PETSc interface, or at least in its compatibility with the HPC machine I am using. I attach the full output of the MatLoad runs with export I_MPI_DEBUG=30 and export I_MPI_HYDRA_DEBUG=1 set. I also attach an output with two nodes plus the -start_in_debugger noxterm option. I do not find these outputs very informative: the KSP solve stage just stops.
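For reference, the runs behind the attached logs were launched roughly like this (a sketch; any arguments MatLoad-PETSc.edp itself needs are omitted):

# Intel MPI debug output, as in the attached logs.
export I_MPI_DEBUG=30
export I_MPI_HYDRA_DEBUG=1
# Two-node run of the MatLoad example, with and without the PETSc debugger option.
srun -n 384 FreeFem++-mpi MatLoad-PETSc.edp -v 0 > MatLoad-384.log 2>&1
srun -n 384 FreeFem++-mpi MatLoad-PETSc.edp -v 0 -start_in_debugger noxterm > MatLoad-384-debugger.log 2>&1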

If you have any tips on how to proceed with the investigation, I would appreciate it.

MatLoad-384.log (64.6 KB)
MatLoad-384-debugger.log (1.0 MB)
MatLoad-192.log (40.8 KB)

In the debugger.log file, I read:

Mat Object: 384 MPI processes
  type: mpiaij
  rows=543003, cols=543003
  total: nonzeros=15631208, allocated nonzeros=15631208
  total number of mallocs used during MatSetValues calls=0
    using I-node (on process 0) routines: found 708 nodes, limit used is 5

So it looks to me like the code is working?

Ah, my bad, then there should be a KSPSolve() after that? Could you just stop one of the faulty processes (no need to use GDB everywhere, you can just run with -debugger_ranks 0) and get the backtrace of where the process is hanging?

Thanks for taking a look. Yes, the KSPSolve is where the process should continue. Currently, the machine is mostly occupied, so I just queued the run without specifying -N 2. The weird thing is that now the MatLoad() example works 9 times out of 10. However, when I run the examples/hpddm/navier-stokes-2d-PETSc.edp example from the repository, the code consistently does not work on two nodes. Once again, it works on one node.

I attach outputs of runs with the -debugger_ranks 0 option: the MatLoad example when it was working and when it was not, and the Navier-Stokes output (the OK lines are printed by me; I think it stops just after the Jacobian is set, before KSPSolve is called internally). Any help is appreciated. I am quite confused by this behavior, especially the inconsistency.

NavierStokes_debugger_notworking.log (67.1 KB)
MatLoad-384-debugger-working.log (72.6 KB)
MatLoad-384-debugger-notworking.log (67.2 KB)

When a process hangs, I need you to show the back trace in GDB.
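Something along these lines (a sketch; <compute-node> is a placeholder, and the pgrep pattern assumes the binary is called FreeFem++-mpi):

# Find the node where the hanging rank lives (squeue/scontrol show job), then attach GDB there.
ssh <compute-node>
# Interactively, on that node:
gdb -p "$(pgrep -u "$USER" -f 'FreeFem\+\+-mpi' | head -n 1)"
# At the (gdb) prompt:
#   bt                     # back trace of the main thread
#   thread apply all bt    # back traces of all threads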