Exec error : exec assert

I am running a piece of parallel code on 12 processors. This “Exec error” occurs randomly… what does it mean?

You need to tell us what line 1484 is, otherwise we can’t help you.

Line 1484 is not in my code as attached: fsi_3d_1.edp (7.0 KB)
By the way, this error occurs when using P1b/P1 elements; there is no such error for P2/P1 elements.

Run without -v 0 or -ns so that the FreeFEM script is parsed and printed to the screen.
That line (1484) probably comes from a .idp file you are including, so you’ll be able to see it then.

Thank you @prj . I removed -v 0, and it printed the following messages:

You are still not telling me what line 1484 of the script executed by FreeFEM is.
Here is an example of what I want you to do.

$ cat test.edp
include "test.idp"
$ cat test.idp

$ FreeFem++ test.edp -v 0
  current line = 10
Exec error : exec assert
   -- number :1
Exec error : exec assert
   -- number :1
 err code 8 ,  mpirank 0
-- weird, there is no line 10 in test.edp!
$ FreeFem++ test.edp -v 1 | grep 10
   10 : assert(0);
    2 :  sizestack + 1024 =1072  ( 48 )
  current line = 10
-- oh no, there is one indeed!

The following are three screenshots which all include line 1484. Which one is relevant to the issue?

The last one since it’s the only one with an assert. There seems to be something wrong within the transfer macro. If you can provide a minimal working example that reproduces this, I can have a look.

Please try this working example test.edp (988 Bytes) together with these two meshes: aorta.mesh (3.5 MB)
leaflets.mesh (1.5 MB).

Very interesting: I tried running it with different numbers of processors, and the assert error only occurs when using 8 processors:
on Windows:

on ARC:

Thanks! This probably comes from an ill-shaped decomposition. I’ll try to fix this, but this may not be trivial. In the meantime, you can launch your script with the additional parameter -Dpartitioner=scotch. I’ve tried with 8 processes, and this finishes successfully.

Great, thank you very much.
BTW, for this test script the error occurs only with 8 processors. However, for my real, long script, the error happens randomly: it runs OK for a few time steps, then the error comes out…

Even with -Dpartitioner=scotch?

It works fine with -Dpartitioner=scotch for this test example. I will try my real script…

Good morning @prj ,
The same error still happens for my long script fsi_3d_2.edp (7.0 KB), and this is the command I used to run it on an ARC system:

#$ -cwd -V
#$ -l np=12
#$ -l h_rt=48:00:00
#$ -m be
#$ -M scsywan@leeds.ac.uk
module load singularity/3.6.4
singularity run -e --env OMP_NUM_THREADS=1 --bind /nobackup:/nobackup …/ff-latest.sif ff-mpirun -n $NSLOTS $PWD/fsi_3d_2.edp -v 0 -Dpartitioner=scotch

I see… Then, I really need to try to fix that :smiley:

Do you think this problem is specific to P1b/P1 elements, or is it independent of the element type, so that other pairs, such as P2/P1, may have the same problem (although I have not seen it)?

I have a fix, but it ain’t pretty (so I won’t commit it yet and will try to figure out something better — not guaranteed). Please apply the following patch and let me know how things work out.

diff --git a/idp/macro_ddm.idp b/idp/macro_ddm.idp
index ea8c0861..3649bb3e 100644
--- a/idp/macro_ddm.idp
+++ b/idp/macro_ddm.idp
@@ -1603,19 +1603,19 @@ ENDIFMACRO
         bb(0:2 * dimension - 1) = tmp;
         boundingbox(ThNew, tmp);
         bb(2 * dimension:4 * dimension - 1) = tmp;
-        bb(0) -= max(ThName.hmax, ThNew.hmax);
-        bb(1) += max(ThName.hmax, ThNew.hmax);
-        bb(2) -= max(ThName.hmax, ThNew.hmax);
-        bb(3) += max(ThName.hmax, ThNew.hmax);
-        bb(4) -= max(ThName.hmax, ThNew.hmax);
-        bb(5) += max(ThName.hmax, ThNew.hmax);
-        bb(6) -= max(ThName.hmax, ThNew.hmax);
-        bb(7) += max(ThName.hmax, ThNew.hmax);
-        bb(8) -= max(ThName.hmax, ThNew.hmax);
-        bb(9) += max(ThName.hmax, ThNew.hmax);
-        bb(10) -= max(ThName.hmax, ThNew.hmax);
-        bb(11) += max(ThName.hmax, ThNew.hmax);
+        bb(0) -= 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(1) += 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(2) -= 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(3) += 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(4) -= 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(5) += 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(6) -= 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(7) += 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(8) -= 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(9) += 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(10) -= 2.0 * max(ThName.hmax, ThNew.hmax);
+        bb(11) += 2.0 * max(ThName.hmax, ThNew.hmax);
     int size = mpiSize(ThName#Comm);
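If applying the diff with patch is inconvenient inside the container, the same edit (doubling the bounding-box safety margin) can be scripted with sed. A minimal self-contained sketch, demonstrated here on one sample line; on a real FreeFEM tree you would point the same sed expression at idp/macro_ddm.idp after backing it up:

```shell
# Demonstrate the margin-doubling edit from the diff above on one sample line.
# On a real install, run the sed command against idp/macro_ddm.idp instead.
line='        bb(0) -= max(ThName.hmax, ThNew.hmax);'
echo "$line" | sed 's/= max(ThName.hmax, ThNew.hmax);/= 2.0 * max(ThName.hmax, ThNew.hmax);/'
# prints:         bb(0) -= 2.0 * max(ThName.hmax, ThNew.hmax);
```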

Thank you @prj .

I will try this later…

If I switch to other types of elements, such as P2/P1, do you think the problem is still there? I am just curious whether this problem depends on the element type…

I think with my patch it will work in all cases. But you can try not using the patch and simply changing the polynomial order to see whether the problem persists.


Hi @prj ,
I use the Docker image of FreeFem from https://github.com/FreeFem/FreeFem-docker/releases/download/v4.9/freefem.tar.gz
and then use it to create a Singularity container on the university’s ARC system.

Do you have any idea how to find and modify the file “macro_ddm.idp”?

I did
tar xfv freefem.tar.gz
and found the file in the extracted directory.
However, when I repacked it with
tar czf freefem.tar.gz ...

there were errors building the Singularity container from this new freefem.tar.gz.
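One way to avoid repacking the image at all is to bind-mount a locally patched copy of macro_ddm.idp over the file inside the container. This is only a sketch: the in-container path shown below is a placeholder (an assumption, not taken from the image), so first use find to discover where the file actually lives:

```shell
# Locate macro_ddm.idp inside the container image (the path varies by install):
singularity exec ff-latest.sif find / -name macro_ddm.idp 2>/dev/null

# Then overlay a locally patched copy over that path at run time.
# /usr/local/share/FreeFEM/idp/macro_ddm.idp is a hypothetical placeholder;
# substitute the path reported by the find command above.
singularity run --bind $PWD/macro_ddm.idp:/usr/local/share/FreeFEM/idp/macro_ddm.idp \
    ff-latest.sif ff-mpirun -n 12 fsi_3d_2.edp -v 0 -Dpartitioner=scotch
```

Since the bind mount happens at launch, the original freefem.tar.gz never needs to be re-tarred.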