Hello everyone
I have a few questions about PETSc and its use for parallelization in FreeFEM++.
First of all, I am not sure whether PETSc works correctly on my computer. My laptop runs Windows 11, and I installed FreeFEM++ (version 12.4) from GitHub with the default configuration, without installing PETSc separately from its official website. It seems that PETSc is bundled with the FreeFEM++ installer, because I can load PETSc in an .edp file without any trouble. So, if someone installs FreeFEM++ with the default configuration, can they expect PETSc to be loadable?
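For reference, the check I did was nothing more sophisticated than running a tiny script through ff-mpirun; I would expect something like the following (check.edp is just a name I made up here) to run without errors if PETSc is properly bundled:
// check.edp -- run with: ff-mpirun -np 2 check.edp
load "PETSc" // fails at load time if the plugin is missing
cout << "rank " << mpirank << " of " << mpisize << " loaded PETSc" << endl;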
Secondly, if that is the case, I have another question about computation speed when parallelizing with FreeFEM. I know it depends on the user's hardware and algorithm, but I would like to know the general trend and other people's results with respect to the number of cores used in FreeFEM computations.
As an example, I ran diffusion-2d-PETSc.edp from the FreeFEM examples/hpddm directory, slightly modified to measure the computation time:
// run with MPI: ff-mpirun -np 4 script.edp
// NBPROC 4
load "PETSc" // PETSc plugin
macro dimension()2// EOM // 2D or 3D
include "macro_ddm.idp" // additional DDM functions
macro grad(u)[dx(u), dy(u)]// EOM // two-dimensional gradient
func Pk = P1; // finite element space
mesh Th = square(getARGV("-global", 200), getARGV("-global", 200)); // global mesh
Mat A;
buildMat(Th, getARGV("-split", 1), A, Pk, mpiCommWorld)
fespace Wh(Th, Pk); // local finite element space
varf vPb(u, v) = int2d(Th)(grad(u)' * grad(v)) + int2d(Th)(v) + on(1, u = 0.0);
real[int] rhs = vPb(0, Wh);
set(A, sparams = "-ksp_view");
Wh<real> u; // local solution
real begin = mpiWtime(); // start timer (declared on all ranks so it stays in scope at the end)
A = vPb(Wh, Wh); // assemble the distributed matrix
//real memory = PetscMemoryGetCurrentUsage();
//u[] = A^-1 * rhs;
//memory = PetscMemoryGetCurrentUsage() - memory;
//if(mpirank == 0)
// cout << memory << " bytes of memory in usage" << endl;
//real[int] err = A * u[]; // global matrix-vector product
//real[int] transpose = A' * u[];
//exchange(A, rhs, scaled = true);
//err -= rhs;
//macro def(u)u//
//plotMPI(Th, u, Pk, def, real, cmm = "Global solution")
//u[] = err;
//plotMPI(Th, u, Pk, def, real, cmm = "Global residual")
Wh<real> Rb[1]; // near-nullspace basis for GAMG (constants)
Rb[0] = 1;
set(A, sparams = "-pc_type gamg -ksp_type gmres -ksp_max_it 200", nearnullspace = Rb);
u[] = 0.0;
u[] = A^-1 * rhs;
//plotMPI(Th, u, Pk, def, real, cmm = "Global solution")
if (mpirank == 0) {
real finish = mpiWtime();
cout << "Time: " << finish - begin << endl;
}
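A remark on the measurement itself: the timing above includes the assembly and is taken on rank 0 only. A variant I considered (just a sketch, assuming mpiBarrier behaves like the standard MPI barrier) synchronizes the ranks and times only the solve:
mpiBarrier(mpiCommWorld); // make sure all ranks start together
real t0 = mpiWtime();
u[] = A^-1 * rhs; // KSP solve
mpiBarrier(mpiCommWorld); // wait for the slowest rank
if (mpirank == 0)
    cout << "Solve time: " << mpiWtime() - t0 << endl;
Adding -log_view to sparams should also give PETSc's own detailed breakdown of where the time goes.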
The table below shows the average computation time over 10 trials as a function of the number of cores:
number of cores | time (sec)
1 | 0.22804
2 | 0.153596
3 | 0.122373
4 | 0.100976
5 | 0.088156
6 | 0.087026
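If my arithmetic is right, going from 1 to 6 cores gives a speedup of about 2.6x (0.228 s / 0.087 s), i.e. roughly 44% parallel efficiency.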
So, does this scaling behavior look reasonable to you?
Sincerely