Parallel computing with PETSc using a large number of cores on a supercomputer

Hi FreeFem++ developers!

Thank you very much for making FF.

I am currently attempting to run FF with PETSc on a supercomputer.
Are there any considerations to keep in mind for accuracy when performing large-scale parallel computations?

First, I compared the calculation results between my PC and the supercomputer, and found that the results were different.
I am trying to address this loss of accuracy by tightening the relative residual tolerance (-ksp_rtol) and raising the maximum number of iterations (-ksp_max_it).

set(A, sparams = "-pc_type gamg -ksp_type gmres -ksp_rtol 1e-6 -ksp_max_it 10000 -pc_gamg_threshold 0.01");

You could try to add -ksp_pc_side right as well, but other than that, it should work just fine. Are you facing a particular issue?
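
For reference, here is a sketch of the option string from the first post with -ksp_pc_side right appended. Everything else is unchanged, and A is assumed to be the same Mat as above. With right preconditioning, PETSc's GMRES tests convergence on the unpreconditioned residual norm rather than the preconditioned one, which can make -ksp_rtol easier to interpret.

```freefem
// Same options as in the first post, with right preconditioning added.
// A is assumed to be the distributed Mat from the original script.
set(A, sparams = "-pc_type gamg -ksp_type gmres -ksp_pc_side right -ksp_rtol 1e-6 -ksp_max_it 10000 -pc_gamg_threshold 0.01");
```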

Dear Jolivet,

Thank you very much for your constant guidance.

I will try the method you suggested as soon as possible.
I suspect that the accuracy gets worse as the number of cores increases, perhaps…

I am currently working on a 3D structural optimization problem using FF++ and PETSc.
There, I have encountered a problem where the final design differs depending on the number of cores (10 cores on my workstation, 76 cores on the supercomputer).
It seems that when the number of cores is large, the iteration limit (-ksp_max_it) is reached before the required tolerance (-ksp_rtol) is met, and the resulting error accumulates over the course of the optimization.
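
One way to check whether the iteration limit is really being hit is to have PETSc report why each solve stopped. The following is only a sketch: it reuses the option string from the first post and adds two standard PETSc flags, -ksp_converged_reason and -ksp_monitor_true_residual, which were not part of the original script.

```freefem
// -ksp_converged_reason prints why each linear solve stopped,
//   e.g. CONVERGED_RTOL (tolerance met) or DIVERGED_ITS (-ksp_max_it reached).
// -ksp_monitor_true_residual prints the residual norms at every iteration.
// A is assumed to be the distributed Mat from the original script.
set(A, sparams = "-pc_type gamg -ksp_type gmres -ksp_rtol 1e-6 -ksp_max_it 10000 -pc_gamg_threshold 0.01 -ksp_converged_reason -ksp_monitor_true_residual");
```

Comparing this output between the 10-core and 76-core runs should show whether the solves on the larger run end with DIVERGED_ITS, and at what residual.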

I have very little experience with parallel computing (and in fact find it to be an
unfortunate last resort lol) but I am curious if you can elaborate on the issue or think
of a simple test case that may help narrow down the problem. Is the solution likely to be
sensitive to matrix element values? Someone was here earlier complaining FF was different
from MATLAB but posted files demonstrating rounding of matrix elements in the
one case. If your problem is only marginally well posed, that could be a factor.
Any idea if the errors have any relationship to the way the task is partitioned?
There may be simpler problems that generate simpler matrices that make any issues
more obvious to spot.
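
As a possible starting point for such a test case, here is a minimal sketch modelled on the 2D Poisson example from the FreeFEM PETSc tutorials. It is not the optimization problem from this thread. It assumes a recent FreeFEM build with the PETSc plugin; the mesh size and the file name in the comment are arbitrary choices for illustration.

```freefem
// Run with different process counts and compare the solver output, e.g.
//   ff-mpirun -np 2 poisson-test.edp
//   ff-mpirun -np 8 poisson-test.edp
// (poisson-test.edp is a hypothetical file name)
load "PETSc"
macro dimension()2// EOM
include "macro_ddm.idp"

macro def(i)i// EOM
macro init(i)i// EOM
macro grad(u)[dx(u), dy(u)]// EOM
func Pk = P1;

mesh Th = square(100, 100);   // global mesh, built identically on every process
Mat A;
buildDmesh(Th)                // partition and distribute the mesh
createMat(Th, A, Pk)          // set up the distributed matrix

fespace Wh(Th, Pk);           // local finite element space
varf vPb(u, v) = int2d(Th)(grad(u)' * grad(v)) + int2d(Th)(v) + on(1, 2, 3, 4, u = 0.0);
A = vPb(Wh, Wh, tgv = -1);
real[int] rhs = vPb(0, Wh, tgv = -1);

// Same solver options as in the first post, plus convergence diagnostics.
set(A, sparams = "-pc_type gamg -ksp_type gmres -ksp_rtol 1e-6 -ksp_max_it 10000 -pc_gamg_threshold 0.01 -ksp_converged_reason");
Wh u;
u[] = A^-1 * rhs;
```

If the iteration counts or the reported reasons change noticeably with the number of processes even on a problem this simple, the partitioning is a likely factor; if not, the conditioning of the actual 3D problem is the more probable cause.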