Need help assembling parallelization results

Hello,
I am new to parallel computing. I wonder how I can assemble the results from the different processes into the global field, in order to continue the calculation afterwards.
For example, I ran the file stokes-fieldsplit-3d-PETSc.edp with the following modification at the end:

macro def1(u)u//
cout<<u(1,1,1)<<"   "<<uB(1,1,1)<<"   "<<uC(1,1,1)<<"   "<<uD(1,1,1)<<endl;
//plotMPI(Th, u, P2, def1, real, cmm = "Global solution")

And here is what I got:

.......
Mat Object: 4 MPI processes
    type: mpiaij
    rows=35635, cols=35635
    total: nonzeros=3176161, allocated nonzeros=3176161
    total number of mallocs used during MatSetValues calls=0
      using I-node (on process 0) routines: found 2570 nodes, limit used is 5
 --- system solved with PETSc (in 1.395692e+01)
0   0   0   -173.356
0   0   0   17.1095
0.000000e+00   0.000000e+00   0.000000e+00   -1.140499e+01
0   0   0   -12.7859
times: compile 0.041434s, execution 14.5749s,  mpirank:3
 ######## We forget of deleting   6 Nb pointer,   0Bytes  ,  mpirank 3, memory leak =484128
times: compile 0.04276s, execution 14.559s,  mpirank:2
 ######## We forget of deleting   6 Nb pointer,   0Bytes  ,  mpirank 2, memory leak =472480
 CodeAlloc : nb ptr  7009,  size :669544 mpirank: 3
 CodeAlloc : nb ptr  7009,  size :669544 mpirank: 2
times: compile 0.041002s, execution 14.5757s,  mpirank:1
 ######## We forget of deleting   8 Nb pointer,   0Bytes  ,  mpirank 1, memory leak =488672
 CodeAlloc : nb ptr  7009,  size :669544 mpirank: 1
times: compile 8.474800e-02s, execution 1.450528e+01s,  mpirank:0
 ######## We forget of deleting   8 Nb pointer,   0Bytes  ,  mpirank 0, memory leak =472848
 CodeAlloc : nb ptr  7009,  size :669544 mpirank: 0

Thanks in advance,
Daoming

You can look at "Pierre Jolivet introduction to parallel FreeFEM part 1" on YouTube. The first comment lists all parts of the tutorial; what you are looking for is "1:27:08 macro ThN2O() n2o//, restrict", but it may be worth watching the tutorial in full.
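
For reference, the pattern covered in that part of the tutorial looks roughly like the lines below. This is only a sketch, assuming the buildDmesh/createMat workflow of macro_ddm.idp and the Pk/def macros of the example file; the names n2o, subIdx, uGlobal, and uSum are just illustrative:

mesh3 ThGlobal = Th;                 // keep a copy of the global mesh before it is distributed
int[int] n2o;
macro ThN2O() n2o//                  // must be defined before the decomposition so that n2o is filled
buildDmesh(Th)                       // distribute Th among the processes
Mat A;
createMat(Th, A, Pk)
fespace Wh(Th, Pk);
fespace WhGlobal(ThGlobal, Pk);
Wh<real> def(u);
// ... assemble the varf and solve u[] = A^-1 * rhs as usual ...
int[int] subIdx = restrict(Wh, WhGlobal, n2o);   // local DoF -> global DoF numbering
WhGlobal<real> def(uGlobal), def(uSum);
u[] .*= A.D;                         // scale by the partition of unity so overlaps are not counted twice
uGlobal[] = 0;
uGlobal[](subIdx) = u[];             // scatter the local values into the global vector
mpiAllReduce(uGlobal[], uSum[], mpiCommWorld, mpiSUM);   // uSum now holds the assembled global field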

Thank you very much for your answer. Please allow me to ask another naive question while I am at it. I want to do 3D multiphysics simulations (including fluid flow). Considering the computation time, the memory consumption, and the programming effort, which parallelization approach (ffddm/hpddm, plain PETSc, or PETSc with fieldsplit) is, in your experience, the most suitable?

PETSc with fieldsplit.

Thanks, I modified stokes-fieldsplit-3d-PETSc.edp by adding the macro ThN2O() n2o// and restrict. However, I don't know where I went wrong: the program reports an error at the restrict call.

Here is the program and the report:

//  run with MPI:  ff-mpirun -np 4 script.edp
// NBPROC 4

load "PETSc"                        // PETSc plugin
macro dimension()3// EOM            // 2D or 3D
include "macro_ddm.idp"             // additional DDM functions

macro def(i)[i, i#B, i#C, i#D]//    // vector field definition
macro init(i)[i, i, i, i]// EOM     // vector field initialization
macro grad(u)[dx(u), dy(u), dz(u)]//// three-dimensional gradient
real Sqrt = sqrt(2.);
macro div(u)(dx(u) + dy(u#B) + dz(u#C))// EOM
func Pk = [P2, P2, P2, P1];         // finite element space

string solver;
if(!HasType("MATSOLVER", "mumps") && !HasType("MATSOLVER", "superlu"))
    exit(0);
else
    solver = (HasType("MATSOLVER", "mumps") ? "mumps" : "superlu");

mesh3 Th;
{
    mesh ThGlobal2d = square(getARGV("-global", 12), getARGV("-global", 12), [x, y]);    // global mesh
    ThGlobal2d = trunc(ThGlobal2d, (x <= 0.5) || (y <= 0.5), label = 5);
    ThGlobal2d = trunc(ThGlobal2d, (y >= 0.25) || (x >= 0.25), label = 5);
    mesh Th2d = movemesh(ThGlobal2d, [-x, y]);
    ThGlobal2d = ThGlobal2d + Th2d;
    Th = buildlayers(ThGlobal2d, getARGV("-global", 12) / 2, zbound = [0, 0.4]);
}

mesh3 ThGlobal = Th;
int[int] myN2O;
macro ThN2O() myN2O//

Mat A;
buildMat(Th, getARGV("-split",1),A, Pk, mpiCommWorld)

fespace Wh(Th, Pk);                 // local finite element space
varf vPb([u, uB, uC, p], [v, vB, vC, q]) =
    int3d(Th)(grad(u)' * grad(v) + grad(uB)' * grad(vB) + grad(uC)' * grad(vC)
              - div(u) * q - div(v) * p + 1e-10 * p * q)
    + on(0, 1, 3, 5, u = 0, uB = 0, uC = 0)
    + on(2, u = 1000*y*(0.5-y)*z*(0.4-z), uB = 0, uC = 0);
real[int] rhs = vPb(0, Wh, tgv = -1);
Wh<real> def(u)=[1,1,1,2];
string[int] names(2);
names[0] = "velocity";
names[1] = "pressure";

A = vPb(Wh, Wh, tgv = -1);
//set(A, sparams = "-ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type lower -pc_fieldsplit_detect_saddle_point -fieldsplit_velocity_sub_pc_type lu " + " -fieldsplit_pressure_sub_pc_type lu -fieldsplit_velocity_sub_pc_factor_mat_solver_type " + solver + " -fieldsplit_pressure_sub_pc_factor_mat_solver_type " + solver + " -ksp_monitor -ksp_view " + " -fieldsplit_velocity_ksp_type gmres -fieldsplit_velocity_ksp_max_it 5 -fieldsplit_pressure_ksp_type gmres -fieldsplit_pressure_ksp_max_it 5 -ksp_rtol 1e-6", fields = u[], names = names);
u[] = 0.0;
u[] = A^-1 * rhs;

fespace WhGlobal(ThGlobal, Pk);
int[int] subIdx = restrict(Wh, WhGlobal, myN2O);

//WhGlobal<real> def(uGlobal),def(uSum);

//real[int] locPETSc;
//ChangeNumbering(A,u[],locPETSc);
//ChangeNumbering(A,u[],locPETSc, inverse=true);
//uGlobal[](subIdx)=u[];
//mpiAllReduce(uGlobal[],uSum[],mpiCommWorld,mpiSUM);

//uGlobal[]=0;
//u[].*=A.D;
//uGlobal[](subIdx)=u[];
//mpiAllReduce(uGlobal[],uSum[],mpiCommWorld,mpiSUM);


//macro def3(u)[u, u#B, u#C]// EOM
//macro def1(u)[u#D]// EOM
//savevtk("stokes-fieldsplit-3d-petcs",def3(uSum),def1(uSum),order=[1,1]);

.........
  Mat Object: 8 MPI processes
    type: mpiaij
    rows=35635, cols=35635
    total: nonzeros=3176161, allocated nonzeros=3176161
    total number of mallocs used during MatSetValues calls=0
      using I-node (on process 0) routines: found 1175 nodes, limit used is 5
 --- system solved with PETSc (in 1.203322e+01)
  -- Build Nodes/DF on mesh :   n.v. 1666, n. elmt. 7128, n b. elmt. 1836
     nb of Nodes 11323    nb of DoF   35635  DFon=4300
  -- FESpace: Nb of Nodes 11323 Nb of DoF 35635
  -- Build Nodes/DF on mesh :   n.v. 1666, n. elmt. 7128, n b. elmt. 1836
     nb of Nodes 11323    nb of DoF   35635  DFon=4300
  -- FESpace: Nb of Nodes 11323 Nb of DoF 35635
[3]PETSC ERROR: ------------------------------------------------------------------------
[3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[3]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple MacOS to find memory corruption errors
[3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[3]PETSC ERROR: to get more information on the crash.
[3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[3]PETSC ERROR: Signal received
[3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[3]PETSC ERROR: Petsc Release Version 3.17.0, Mar 30, 2022 
[3]PETSC ERROR: /usr/local/bin/FreeFem++-mpi on a  named debian by dyu Thu Sep  8 17:54:22 2022
[3]PETSC ERROR: [5]PETSC ERROR: ------------------------------------------------------------------------
[5]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[5]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[5]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[5]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple MacOS to find memory corruption errors
[5]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[5]PETSC ERROR: to get more information on the crash.
[5]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[5]PETSC ERROR: Signal received
[5]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[5]PETSC ERROR: Petsc Release Version 3.17.0, Mar 30, 2022 
[5]PETSC ERROR: /usr/local/bin/FreeFem++-mpi on a  named debian by dyu Thu Sep  8 17:54:22 2022
[5]PETSC ERROR: Configure options --prefix=/usr/local/ff-petsc/r MAKEFLAGS= --with-debugging=0 COPTFLAGS="-O3 -mtune=native" CXXOPTFLAGS="-O3 -mtune=native" FOPTFLAGS="-O3 -mtune=native" --with-cxx-dialect=11 --with-ssl=0 --with-x=0 --with-fortran-bindings=0 --with-cudac=0 --with-cc=/usr/bin/mpicc --with-cxx=/usr/bin/mpic++ --with-fc=/usr/bin/mpif90 --with-scalar-type=real --with-blaslapack-include= --with-blaslapack-lib="-llapack -lblas" --download-metis --download-ptscotch --download-hypre --download-parmetis --download-mmg --download-parmmg --download-superlu --download-suitesparse --download-tetgen --download-slepc --download-hpddm --download-slepc-configure-arguments=--download-arpack=https://github.com/prj-/arpack-ng/archive/9fc0c71.tar.gz --download-scalapack --download-mumps PETSC_ARCH=fr
[5]PETSC ERROR: #1 User provided function() at unknown file:0
[5]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 5 in communicator MPI_COMM_WORLD
with errorcode 59.
.........

You need to use createMat, not buildMat; please go through the tutorial screencast I shared earlier.
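
In other words, keep the global mesh and the ThN2O macro before the decomposition, then build the distributed mesh and the Mat with buildDmesh/createMat instead of buildMat. A sketch of that part of the script (not the exact example file):

mesh3 ThGlobal = Th;                 // copy of the global mesh, kept for the gather step
int[int] myN2O;
macro ThN2O() myN2O//                // must be visible before the decomposition so that myN2O is filled
buildDmesh(Th)                       // distribute the mesh
Mat A;
createMat(Th, A, Pk)                 // replaces the buildMat call
fespace Wh(Th, Pk);
// the varf, the set of the fieldsplit options, the solve, restrict, and mpiAllReduce stay as above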

Thank you, Pr. Jolivet, I finished the video you shared; it is really useful for what I need. But I still have a little trouble when adding the viscosity of water to the program: when I do this, the solver does not seem very happy with such small values. What do I need to change in the solver settings?

real mu = 1.0016e-9; // dynamic viscosity of water, 1.0016e-9 MPa*s
varf vPb([u, uB, uC, p], [v, vB, vC, q]) = 
int3d(Th)(mu*(grad(u)' * grad(v) + grad(uB)' * grad(vB) + grad(uC)' * grad(vC)) 
- div(u) * q - div(v) * p + 1e-10 * p * q) 
+ on(0, 1, 3, 5, u = 0, uB = 0, uC = 0) 
+ on(2, u = 1000*y*(0.5-y)*z*(0.4-z), uB = 0, uC = 0);

Linear solve did not converge due to DIVERGED_PC_FAILED iterations 0
               PC failed due to FACTOR_OUTMEMORY 
KSP final norm of residual -nan.
KSP Object: 8 MPI processes
  type: gmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: lu
    out-of-place factorization
    tolerance for zero pivot 2.22045e-14
    matrix ordering: external
    factor fill ratio given 0., needed 0.
      Factored matrix follows:
        Mat Object: 8 MPI processes
          type: mumps
........

The error is fully spelled out: PC failed due to FACTOR_OUTMEMORY. Not much more I can add: either switch to a larger machine or switch to an iterative method.
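
As an illustration only (not a recipe, and the option values are merely indicative), "switching to an iterative method" could mean replacing the exact inner LU solves of the fieldsplit setup used earlier in the thread with iterative/multigrid ones, keeping the same fields/names split:

set(A, sparams = "-ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur "
       + "-pc_fieldsplit_schur_fact_type lower -pc_fieldsplit_detect_saddle_point "
       + "-fieldsplit_velocity_pc_type gamg -fieldsplit_velocity_ksp_type gmres -fieldsplit_velocity_ksp_max_it 5 "
       + "-fieldsplit_pressure_pc_type jacobi -fieldsplit_pressure_ksp_type gmres -fieldsplit_pressure_ksp_max_it 5 "
       + "-ksp_monitor -ksp_rtol 1e-6", fields = u[], names = names);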

That's a little strange, because I'm using a machine with 32 GB of RAM to solve a problem with 1666 vertices in 3D. Without the viscosity, I can increase the number of vertices to 11323 without any problem.

Probably MUMPS is pivoting too much or is not set up properly, then.
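
If you stay with MUMPS, its behaviour can be inspected and tuned through PETSc's -mat_mumps_* options. A sketch (these are standard PETSc/MUMPS options, but the values are only examples; see the MUMPS user guide for their exact meaning):

set(A, sparams = "-ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps "
       + "-mat_mumps_icntl_4 2 "     // ICNTL(4): MUMPS output level, to see its own statistics and errors
       + "-mat_mumps_icntl_14 200"); // ICNTL(14): percentage increase of the estimated working space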

I'm not sure if I'm doing this right, but when I use set(A, sparams = "-pc_type jacobi -ksp_type fgmres"); in stokes-3d-PETSc.edp, I get a clean result.

This is one of the least effective iterative methods, but if you are satisfied with the result, then I guess that's OK.

Do you have a tutorial link? Of course more effective is better :slight_smile: Thank you for your help.

There are many different preconditioners for Stokes; I don't know of a comprehensive tutorial listing all of them. If you are trying to learn, you should stick to a direct solver, e.g., MUMPS, and figure out why it is failing for you. You could use the command-line parameter -info dump and look in each dump.XYZ (XYZ ranging from 0 to 7 if you have 8 MPI processes) to see if there is any MUMPS error. Then, look at http://mumps.enseeiht.fr/doc/userguide_5.5.1.pdf to understand what that error means and how to fix it.
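
Concretely, that would look something like the lines below (a sketch; the -info file naming may vary a little across PETSc versions):

//  run with:  ff-mpirun -np 8 stokes-fieldsplit-3d-PETSc.edp -info dump
//  then search dump.0, dump.1, ..., dump.7 for MUMPS messages and look up any
//  reported INFOG(1)/INFO(1) error code in the MUMPS user guide linked above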

Thank you again, Pr. Jolivet. I will go for it. :blush: