Ah, maybe I have to run make clean first, then make petsc-slepc for ff-petsc…
I will come back with a new configure.log…
Indeed, you need to nuke the /home/yim/FreeFem-install-dev/ff-petsc
folder.
Dear prj,
After running make clean in 3rd-party/ff-petsc, I managed to compile the program without errors.
config.log (438.3 KB)
cofigure2.log (1.0 MB)
Then I was happy to retry an example file, diffusion-3d.edp.
Now I get this in the output file…
srun: error: PMK_KVS_Barrier duplicate request from task 0
srun: error: PMK_KVS_Barrier duplicate request from task 1
srun: error: PMK_KVS_Barrier duplicate request from task 2
srun: error: PMK_KVS_Barrier duplicate request from task 3
For your information, I run the code with a shell script along these lines…
module add intel/18.0.5 intel-mpi/2018.4.274 intel-mkl/2018.5.274
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${INTEL_ROOT}/compiler/lib/intel64
dirff=/home/yim/FreeFem-sources-develop/src/mpi/ff-mpirun
srun $dirff diffusion-3d.edp > output.txt
What did I do wrong this time?
Thank you and sorry…
Eunok
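A side note on the export LD_LIBRARY_PATH line in the script above, probably unrelated to the srun error: if LD_LIBRARY_PATH is empty before that line runs, joining with a bare colon leaves an empty leading entry, which the dynamic linker treats as the current directory. Below is a minimal sketch that avoids this. The helper name append_path is hypothetical, and INTEL_ROOT is assumed to be set by the Intel module.

```shell
# Join a colon-separated path list ($1, possibly empty) with one more
# directory ($2) without producing an empty entry.
append_path() {
    printf '%s\n' "${1:+$1:}$2"
}

# Only extend the search path when INTEL_ROOT is actually defined
# (assumed to come from the intel module loaded earlier).
if [ -n "${INTEL_ROOT:-}" ]; then
    LD_LIBRARY_PATH="$(append_path "${LD_LIBRARY_PATH:-}" "${INTEL_ROOT}/compiler/lib/intel64")"
    export LD_LIBRARY_PATH
fi
```

The ${1:+$1:} expansion emits the existing list followed by a colon only when that list is non-empty, so the result never starts with a stray colon.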
You have to run srun -n 4 /home/yim/FreeFem-sources-develop/src/mpi/FreeFem++-mpi diffusion-3d.edp -v 0, not /home/yim/FreeFem-sources-develop/src/mpi/ff-mpirun. Could you try that instead, please?
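For reference, the module line from the original script and the launch line from this reply could be combined into a job script roughly like the sketch below. The #SBATCH value is an assumption chosen to match -n 4, and the dry-run branch is an addition for illustration so the script can be checked on a machine without Slurm. Any environment exports from the original script would go after the module line.

```shell
#!/bin/bash
# Assumed to match the -n 4 in the reply above; adjust to your allocation.
#SBATCH --ntasks=4

# Load the toolchain from the original post when the module command is available.
if command -v module >/dev/null 2>&1; then
    module add intel/18.0.5 intel-mpi/2018.4.274 intel-mkl/2018.5.274
fi

ff_bin=/home/yim/FreeFem-sources-develop/src/mpi/FreeFem++-mpi
edp=diffusion-3d.edp
ntasks=4

# Build the launch line once: srun starts the MPI binary directly,
# with no ff-mpirun wrapper in between.
set -- srun -n "$ntasks" "$ff_bin" "$edp" -v 0
launch_line="$*"

if command -v srun >/dev/null 2>&1; then
    "$@" > output.txt
else
    # Not on a Slurm machine: only print what would be executed.
    echo "would run: $launch_line"
fi
```

Keeping the launch command in one place makes it easy to confirm that the binary being started is FreeFem++-mpi and not ff-mpirun before submitting the job.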
Ah yes. I was using it previously… Sorry…
diffusion-3d.edp works as before! And I made random tests for some examples: diffusion-3d-PETSc.edp works! and also laplace-adapt-3d-PETSc.edp!
But not newton-vi-2d-PETSc.edp
The error messages are:
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[1]PETSC ERROR: to get more information on the crash.
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Signal received
[1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.13.4, Aug 01, 2020
[1]PETSC ERROR: /home/yim/FreeFem-sources-develop/src/mpi/FreeFem++-mpi on a named g130 by yim Tue Aug 25 18:11:38 2020
[1]PETSC ERROR: Configure options --prefix=/home/yim/FreeFem-install-dev/ff-petsc/r MAKEFLAGS= --with-debugging=0 COPTFLAGS="-O3 -mtune=native" CXXOPTFLAGS="-O3 -mtune=native" FOPTFLAGS="-O3 -mtune=native" --with-cxx-dialect=C++11 --with-ssl=0 --with-x=0 --with-fortran-bindings=0 --with-cudac=0 --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-scalar-type=real --with-blaslapack-include=/ssoft/spack/external/intel/2018.4/compilers_and_libraries_2018.5.274/linux/mkl/include --with-blaslapack-lib="-Wl,-rpath,/ssoft/spack/external/intel/2018.4/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64 -L/ssoft/spack/external/intel/2018.4/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64 -lmkl_rt -lmkl_sequential -lmkl_core -liomp5 -lpthread" --download-metis --download-ptscotch --download-hypre --download-parmetis --download-superlu --download-suitesparse --download-tetgen --download-slepc --download-hpddm --download-hpddm-commit=ce6ce80 --download-scalapack --download-mumps PETSC_ARCH=fr
[1]PETSC ERROR: #1 User provided function() line 0 in unknown file
application called MPI_Abort(MPI_COMM_WORLD, 50162059) - process 1
In: PMI_Abort(50162059, application called MPI_Abort(MPI_COMM_WORLD, 50162059) - process 1)
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 5080250.0 ON g130 CANCELLED AT 2020-08-25T18:11:38 ***
slurmstepd: error: *** STEP 5080250.0 ON g130 CANCELLED AT 2020-08-25T18:11:38 ***
srun: error: g130: task 0: Killed
srun: Terminating job step 5080250.0
srun: error: g130: task 1: Exited with exit code 15
srun: error: g137: tasks 2-3: Killed
Dear prj,
I found something very strange…
For instance, I told you that diffusion-3d-PETSc.edp works.
But, diffusion-2d-PETSc.edp does not.
When I compare the two scripts, there are only a few differences between them: dimension 3 vs. 2 and cube vs. square… I have no idea why one works but not the other…
I attach the output file if you like to have a look.
slurm-5083554_diffusion-2d-PETSc.log (2.5 KB) diffusion-2d-PETSc.log (174.8 KB) diffusion-3d-PETSc.log (186.9 KB)
Best,
Eunok
It’s good news that some examples are working. I’m guessing there is something wrong in your software stack: all these examples are basically the same, so it’s not normal that some work while others don’t. Consider updating your Intel modules and GNU compilers.
Dear prj,
Thanks a lot for your help.
For the Intel and GNU updates, I will contact the system manager.
In the meantime, strangely, in examples/hpddm all of the 3D examples work but none of the 2D examples do…
I will keep you updated…
Best,
Eunok
Hi Eunok,
Depending on which computing architecture I am working on, I may get the same error 50162059 for a script that works fine on another architecture:
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 50162059.
Did you manage to find the reason for this error?
Best,
Lucas
Could you share an MWE that reproduces the error, and say a little bit more about the faulty architecture?
Unfortunately, I did not resolve this problem…
Hi Prj,
I did not have any special code. slurm-6954404.log (4.7 KB) stokes-2d-PETSc.log (173.0 KB)
I used the FreeFEM example in hpddm.
I attach the log files of the failed 2D example, ‘stokes-2d-PETSc.log’.
Did you update your compilers as advised 4 months ago? Also, you may want to update your FreeFEM installation.