Question about parallelization in FreeFEM

Dear all

I have a question related to an example of parallelization in FreeFEM++. The example is VG.edp, found in the FreeFEM example directory. When I ran the program, it seemed that each single core performed calculations individually rather than using parallelization.

Below is my command:

mpirun -np 4 FreeFem++-mpi.exe ./VG.edp

These are the results:

lambda,mu,gravity =115079 83333.3 -0.05   
                                                                                                                         
===============================================                                                                                                                    
====            CPU time                  =====                                                                                                                    
===============================================                                                                                                                     
ALL solving steps :::: 6.827                                                                                                                                       
Matrix            :::: 2.383                                                                                                                                       
Fact              :::: 0                                                                                                                                          
Second member     :::: 0.062                                                                                                                                       
Solve             :::: 4.382                                                                                                                                      
===============================================                                                                                                                    
0 max deplacement = 1.43944e-06                                                                                                                                    
lambda,mu,gravity =115079 83333.3 -0.05                                                                                                                            
===============================================                                                                                                                    
====            CPU time                  =====                                                                                                                    
===============================================                                                                                                                     
ALL solving steps :::: 8.343                                                                                                                                       
Matrix            :::: 3.036                                                                                                                                      
 Fact              :::: 0                                                                                                                                           
Second member     :::: 0.125                                                                                                                                       
Solve             :::: 5.182                                                                                                                                      
===============================================                                                                                                                    
0 max deplacement = 1.43944e-06                                                                                                                                    
lambda,mu,gravity =115079 83333.3 -0.05                                                                                                                            
===============================================                                                                                                                    
====            CPU time                  =====                                                                                                                    
===============================================                                                                                                                     
ALL solving steps :::: 9.366                                                                                                                                       
Matrix            :::: 3.939                                                                                                                                       
Fact              :::: 0                                                                                                                                           
Second member     :::: 0.125                                                                                                                                       
Solve             :::: 5.302                                                                                                                                      
===============================================                                                                                                                    
0 max deplacement = 1.43944e-06                                                                                                                                    
lambda,mu,gravity =115079 83333.3 -0.05                                                                                                                            
===============================================                                                                                                                    
====            CPU time                  =====                                                                                                                    
===============================================                                                                                                                     
ALL solving steps :::: 9.625                                                                                                                                       
Matrix            :::: 4.423                                                                                                                                       
Fact              :::: 0                                                                                                                                           
Second member     :::: 0.125                                                                                                                                      
Solve             :::: 5.077                                                                                                                                      
===============================================                                                                                                                    
0 max deplacement = 1.43944e-06 

Below is the result when I use only one core:

$ mpirun -np 1 FreeFem++-mpi.exe ./VG.edp -wp -v 0                                                                                                                 
lambda,mu,gravity =115079 83333.3 -0.05                                                                                                                            
===============================================                                                                                                                    
====            CPU time                  =====                                                                                                                    
===============================================                                                                                                                     
ALL solving steps :::: 6.387                                                                                                                                       
Matrix            :::: 2.321                                                                                                                                       
Fact              :::: 0                                                                                                                                           
Second member     :::: 0.062                                                                                                                                       
Solve             :::: 4.004                                                                                                                                      
===============================================                                                                                                                    
0 max deplacement = 1.43944e-06  

So I am slightly confused by this example in FreeFEM.

Thanks.

What makes you say that each core is working individually? Is it just the clock time? It is common for parallel algorithms to be slower than serial algorithms for small problem sizes. In such cases, the communication overhead required for parallelization exceeds the speedup.
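The trade-off can be written as a deliberately simplified cost model. It assumes that the serial work T_s divides perfectly over p processes and that all communication is lumped into a single term T_comm(p); both are assumptions for illustration, not measurements from VG.edp:

```latex
T(p) = \frac{T_s}{p} + T_{\mathrm{comm}}(p),
\qquad
T(p) > T_s \iff T_{\mathrm{comm}}(p) > T_s\left(1 - \frac{1}{p}\right).
```

Under this model, a run on p = 4 processes is slower than the serial run as soon as the communication cost exceeds three quarters of the serial time. For a small problem T_s is small, so that threshold is easy to cross.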

Thank you for leaving a comment, Chris.

Because I thought that the four cores would work together through MPI to solve the matrix equation (e.g. x = A^-1 * y). If all cores worked together, I would expect a single result rather than four. I would also expect the CPU time to change with the number of MPI processes, increasing or decreasing depending on the problem and the algorithm.

Getting four results with different CPU times suggests that the four cores each solve the same matrix equation individually rather than working together. Also, as you can see, when I use only one core, the computation time is close to the first of the four results. So I suspect that MPI is not working correctly.

Here is the example code:

// NBPROC 4

// other
//load "medit"
load "MUMPS_mpi"
include "cube.idp"

real ttgv=1e10;
string ssparams="nprow=1, npcol="+mpisize;

int[int]  Nxyz=[40,8,8];
real [int,int]  Bxyz=[[0.,5.],[0.,1.],[0.,1.]];
int [int,int]  Lxyz=[[1,1],[2,2],[2,2]];
mesh3 Th=Cube(Nxyz,Bxyz,Lxyz);

real E = 21.5e4;
real sigma = 0.29;
real mu = E/(2*(1+sigma));
real lambda = E*sigma/((1+sigma)*(1-2*sigma));
real gravity = -0.05;

fespace Vh(Th,[P2,P2,P2]);
//fespace Vh(Th,[P1,P1,P1]);
Vh [u1,u2,u3], [v1,v2,v3];
cout << "lambda,mu,gravity ="<<lambda<< " " << mu << " " << gravity << endl;

real sqrt2=sqrt(2.);
macro epsilon(u1,u2,u3)  [dx(u1),dy(u2),dz(u3),(dz(u2)+dy(u3))/sqrt2,(dz(u1)+dx(u3))/sqrt2,(dy(u1)+dx(u2))/sqrt2] // EOM
macro div(u1,u2,u3) ( dx(u1)+dy(u2)+dz(u3) ) // EOM
real time=clock();  

real tMatrix = clock();
varf vLame([u1,u2,u3],[v1,v2,v3]) = 
int3d(Th)(
		 lambda*div(u1,u2,u3)*div(v1,v2,v3)	
	    + 2.*mu*( epsilon(u1,u2,u3)'*epsilon(v1,v2,v3) ) //') for emacs
	      )
	      + on(1,u1=0,u2=0,u3=0);

matrix MLame=vLame(Vh,Vh,tgv=ttgv);
tMatrix = clock()-tMatrix;

real tFact = clock();
set(MLame,solver="MUMPSMPI",tgv=ttgv);
tFact = clock()-tFact;

real tsdc = clock();
varf vsdc([u1,u2,u3],[v1,v2,v3])=int3d(Th) (gravity*v3)+ on(1,u1=0,u2=0,u3=0);
real[int] sdc= vsdc(0,Vh); 
tsdc = clock()-tsdc;

real tsolve=clock();
u1[] = MLame^-1*sdc;
tsolve= clock()-tsolve;
cout << "===============================================" << endl;
cout << "====            CPU time                  =====" << endl;
cout << "===============================================" << endl;
cout << " ALL solving steps :::: "  << clock()-time << endl;
cout << " Matrix            :::: "  << tMatrix << endl;
cout << " Fact              :::: "  << tFact   << endl;
cout << " Second member     :::: "  << tsdc    << endl;
cout << " Solve             :::: "  << tsolve  << endl;
cout << "===============================================" << endl;

if(mpirank==0)
{
	real dmax= u1[].max;
	cout << mpirank << " max deplacement = " << dmax << endl;
	real coef= 0.01/max(dmax,1e-10);
	int[int] ref2=[1,0,2,0];

	searchMethod=0;
	mesh3 Thm=movemesh3(Th,transfo=[x+u1*coef,y+u2*coef,z+u3*coef],label=ref2);
	savemesh(Thm,"beam-deformed-mumps.mesh");
}

Just run the following program:

cout << mpirank << "/" << mpisize << endl;

If it prints mpisize equal to 1 and mpirank equal to 0 several times, then you have a problem with your installation. Otherwise, all is good. The way you are doing parallelism is quite bad; you should consider switching to something more efficient, e.g., PETSc. You could look at the examples examples/hpddm/*PETSc*.edp if you want to know more (or go to FreeFem-tutorial).
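Two points may help when reading the output of such a check. First, every MPI process executes the whole script, so any cout that is not guarded by a test on mpirank is printed once per process even when MPI is working. Second, in the four-process output quoted earlier, the "max deplacement" line, which VG.edp prints only inside if(mpirank==0), appears four times, each time with rank 0. That would be consistent with that particular run having started four independent single-process jobs. The sketch below is not part of VG.edp. It shows one way to print the rank of each process and to report a wall-clock time once, from rank 0 only; the loop is a hypothetical placeholder workload.

```freefem
// Sketch only (not part of VG.edp). Run with FreeFem++-mpi.
// Every MPI process executes this script, so this line is printed once per process.
cout << "process " << mpirank << " of " << mpisize << endl;

mpiBarrier(mpiCommWorld);      // line all processes up before starting the timer
real t0 = mpiWtime();          // wall-clock time in seconds

// Placeholder workload (hypothetical); in VG.edp this would be the solve step.
real s = 0.;
for(int i = 0; i < 1000000; i++)
	s += 1./(1. + i);

mpiBarrier(mpiCommWorld);      // wait until the slowest process has finished
real elapsed = mpiWtime() - t0;

// Guarded output: a single line, whatever the value of mpisize.
if(mpirank == 0)
	cout << "wall-clock time with " << mpisize << " process(es) :::: " << elapsed << endl;
```

With a working installation and -np 4, the first line should appear four times with ranks 0 to 3, and the timing line once.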

Thank you for the comment, prj.

Below is another simple script I used to check MPI:

load "MUMPS_mpi"
cout << mpirank << "/" << mpisize << endl;

Below is my command:

mpiexec.exe -np 4 FreeFem++-mpi.exe .\mpi.example.2.edp -v 0

Below is the result:

0/4
2/4
1/4
3/4

So it seems that my MPI works correctly, but I am still confused about why I could not reduce the computation time in the FreeFEM MPI example files.

Because your problem is probably too small and you are not using the proper tool.