# Questions about PETSc and parallelization in FreeFEM++

Hello everyone,

I have a few questions about PETSc and its parallelization in FreeFEM++.

First of all, I am not sure whether PETSc works properly on my computer. My laptop runs Windows 11. I installed FreeFEM++ (version 12.4) from GitHub with the default configuration, without installing PETSc separately from its official website. PETSc nevertheless seems to be included in the FreeFEM++ installation, because I can load PETSc in an .edp file without any trouble. So, if someone installs FreeFEM with the default configuration, can they expect PETSc to be loadable in FreeFEM++?

Secondly, if that is the case, I have another question about the speed of parallel computations in FreeFEM. I know that it depends on the user’s computer and the algorithm, but I would like to know the general trends, and other people’s results, as a function of the number of cores.

As an example, I ran one of the example codes in the FreeFEM hpddm directory, diffusion-2d-PETSc.edp. I slightly modified the code to measure the computation time.

``````
//  run with MPI:  ff-mpirun -np 4 script.edp
// NBPROC 4

load "PETSc"                        // PETSc plugin, provides the Mat type
macro dimension()2// EOM            // 2D or 3D
include "macro_ddm.idp"             // additional DDM functions

func Pk = P1;                       // finite element space

mesh Th = square(getARGV("-global", 200), getARGV("-global", 200)); // global mesh
Mat A;
buildMat(Th, getARGV("-split", 1), A, Pk, mpiCommWorld)

fespace Wh(Th, Pk);                 // local finite element space
varf vPb(u, v) = int2d(Th)(grad(u)' * grad(v)) + int2d(Th)(v) + on(1, u = 0.0);
real[int] rhs = vPb(0, Wh);

set(A, sparams = "-ksp_view");
Wh<real> u;                         // local solution

real begin = mpiWtime();            // start the timer on every rank so that begin is always declared

A = vPb(Wh, Wh);
//real memory = PetscMemoryGetCurrentUsage();
//u[] = A^-1 * rhs;
//memory = PetscMemoryGetCurrentUsage() - memory;
//if(mpirank == 0)
//    cout << memory << " bytes of memory in usage" << endl;

//real[int] err = A * u[];            // global matrix-vector product
//real[int] transpose = A' * u[];
//exchange(A, rhs, scaled = true);
//err -= rhs;

//macro def(u)u//
//plotMPI(Th, u, Pk, def, real, cmm = "Global solution")
//u[] = err;
//plotMPI(Th, u, Pk, def, real, cmm = "Global residual")

Wh<real> Rb;
Rb = 1;
set(A, sparams = "-pc_type gamg -ksp_type gmres -ksp_max_it 200", nearnullspace = Rb);
u[] = 0.0;
u[] = A^-1 * rhs;
//plotMPI(Th, u, Pk, def, real, cmm = "Global solution")
if (mpirank == 0) {
    real finish = mpiWtime();
    cout << "Time: " << finish - begin << endl;
}
``````

The table below shows the computation time, averaged over 10 runs, as a function of the number of cores:

| Number of cores | Time (s) |
|---|---|
| 1 | 0.22804 |
| 2 | 0.153596 |
| 3 | 0.122373 |
| 4 | 0.100976 |
| 5 | 0.088156 |
| 6 | 0.087026 |

So, do you think that the result is reasonable?

Sincerely

Your results are reasonable, but not great. Here are my results on openSUSE 15.4 with up to 6 i7 cores at 4.8 GHz. You will get better scaling on larger problem sizes.

| Number of cores | Time (s) |
|---|---|
| 1 | 0.151537 |
| 2 | 0.086251 |
| 3 | 0.063060 |
| 4 | 0.051341 |
| 5 | 0.040208 |
| 6 | 0.038132 |
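
To put a number on "reasonable, but not great", one option is to convert both sets of timings into speedup (1-core time divided by p-core time) and parallel efficiency (speedup divided by p). The sketch below is only an illustration in Python, not part of FreeFEM; the names `windows_laptop`, `opensuse_i7`, and `scaling` are made up for this example, and the numbers are copied from the two sets of timings in this thread.

```python
# Timings in seconds, copied from the two sets of results in this thread.
# Keys are core counts; the dictionary names are hypothetical labels.
windows_laptop = {1: 0.22804, 2: 0.153596, 3: 0.122373,
                  4: 0.100976, 5: 0.088156, 6: 0.087026}
opensuse_i7 = {1: 0.151537, 2: 0.086251, 3: 0.063060,
               4: 0.051341, 5: 0.040208, 6: 0.038132}


def scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core time."""
    t1 = times[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in sorted(times.items())}


for label, times in (("Windows laptop", windows_laptop),
                     ("openSUSE i7", opensuse_i7)):
    print(label)
    for p, (speedup, efficiency) in scaling(times).items():
        print(f"  {p} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

On 6 cores this gives a speedup of about 2.6 (roughly 44% efficiency) for the first set of timings and about 4.0 (roughly 66% efficiency) for the second, which matches the impression that both scale, but far from ideally, on such a small problem.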

1. The problem is tiny.
2. You are using a far-from-optimal solver to deal with this problem.

So it is difficult to judge, but as Chris said, things look reasonable.

Thank you, Chris. Your results help me understand it a lot. If you don’t mind, could you let me know what your CPU is? My CPU is i7-11600H. Thanks.

Thank you, prj. If you don’t mind, may I ask another question? I am a newbie in numerical computation, so this may be a silly question, but how can I find the optimal solver for a specific problem? Do I compare several solvers by trial and error?

Thanks

Mine is an Intel i7-12800H.

> Do I compare several solvers by trial and error?

That, and by reading lots of literature. And then, after 10+ years of doing that full time, you’ll be able to help others out. Kidding aside, the art of preconditioning and deriving efficient solvers is difficult to master, so it is usually best to ask for some pointers. E.g., if you are simply interested in the Poisson equation as in your original post, just stick to AMG methods such as `-pc_type hypre` or `-pc_type gamg`.

Thanks a lot! When I use `-pc_type hypre`, the computation is twice as fast as before.