Composite FE spaces + Parallel assembly with PETSc

Hello,
I am using version 4.14, and I have run the test cases presented here:
https://doc.freefem.org/documentation/composite.html#with-petsc
I have a problem with parallel assembly + PETSc: compared with parallel assembly with MUMPS, the matrix assembly time is about 60 times larger with PETSc than with MUMPS.
I attach my scripts below.
I run the scripts with "time ~/path_to_FreeFem/bin/ff-mpirun -np 4 ./test_mumps.edp -wg".
The computations are done on a laptop running Ubuntu 20.04.
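(For the PETSc script, a breakdown of where the time is spent inside PETSc can be obtained by appending the standard PETSc option -log_view to the same command, e.g. "time ~/path_to_FreeFem/bin/ff-mpirun -np 4 ./test_petsc.edp -wg -log_view"; the script name here is only a placeholder.)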
Thank you for your help.
Pierre


load "MUMPS"
load "bem"

int nn = 200; // number of edges in each direction

mesh ThP = square(nn,nn,[2*pi*x,2*pi*y],flags=3); // Pressure mesh
mesh ThU = trunc(ThP,1,split=2);  // Velocity mesh

fespace Uh(ThU,[P1,P1]); // Velocity space
fespace Ph(ThP,P1);      // Pressure space

macro grad(u) [dx(u),dy(u)] //
macro Grad(u1,u2) [grad(u1), grad(u2)] //
macro div(u1,u2) (dx(u1)+dy(u2)) //

// definition of the boundary condition
func g1 = sin(x)*cos(y);
func g2 = -cos(x)*sin(y);

// definition of the right-hand side
func f1 = 0;
func f2 = -4*cos(x)*sin(y);

Uh [u1,u2],[v1,v2];
Ph p,q;

fespace Xh=Uh*Ph;

varf Stokes (<[u1,u2],[p]>, <[v1,v2],[q]>) = int2d(ThU)((Grad(u1,u2):Grad(v1,v2)))
+ int2d(ThU)(-div(u1,u2)*q -div(v1,v2)*p)
+ int2d(ThP)(-1e-10*p*q)
+ int2d(ThU)([f1,f2]'*[v1,v2])
+ on(1,2,3,4, u1=g1, u2=g2);

real t1 = clock();
matrix M = Stokes(Xh,Xh,solver=sparsesolver, master=-1);
real t2 = clock();
cout << "init matrix time=" << t2-t1 << endl;

real[int] b = Stokes(0,Xh);
real t3 = clock();
cout << "init rhs time=" << t3-t2 << endl;

real[int] sol = M^-1*b;
real t4 = clock();
cout << "solver time=" << t4-t3 << endl;


load "PETSc"
load "bem"

int nn = 200; // number of edges in each direction

mesh ThP = square(nn,nn,[2*pi*x,2*pi*y],flags=3); // Pressure mesh
mesh ThU = trunc(ThP,1,split=2);  // Velocity mesh

fespace Uh(ThU,[P1,P1]); // Velocity space
fespace Ph(ThP,P1);      // Pressure space

macro grad(u) [dx(u),dy(u)] //
macro Grad(u1,u2) [grad(u1), grad(u2)] //
macro div(u1,u2) (dx(u1)+dy(u2)) //

// definition of the boundary condition
func g1 = sin(x)*cos(y);
func g2 = -cos(x)*sin(y);

// definition of the right-hand side
func f1 = 0;
func f2 = -4*cos(x)*sin(y);

Uh [u1,u2],[v1,v2];
Ph p,q;

fespace Xh=Uh*Ph;

varf Stokes (<[u1,u2],[p]>, <[v1,v2],[q]>) = int2d(ThU)((Grad(u1,u2):Grad(v1,v2)))
+ int2d(ThU)(-div(u1,u2)*q -div(v1,v2)*p)
+ int2d(ThP)(-1e-10*p*q)
+ int2d(ThU)([f1,f2]'*[v1,v2])
+ on(1,2,3,4, u1=g1, u2=g2);

real t1 = clock();
Mat M = Stokes(Xh,Xh);
set(M, sparams = "-pc_type lu");
real t2 = clock();
cout << "init matrix time=" << t2-t1 << endl;

real[int] b = Stokes(0,Xh);
real t3 = clock();
cout << "init rhs time=" << t3-t2 << endl;

real[int] sol = M^-1*b;
real t4 = clock();
cout << "solver time=" << t4-t3 << endl;

Your code is not runnable, because it’s not in a code block, so we can’t help you. Furthermore, you don’t need composite fespaces for this particular problem.

Sorry, here are the code blocks. With MUMPS:

load "MUMPS"
load "bem"

int nn = 200; // number of edges in each direction

mesh ThP = square(nn,nn,[2*pi*x,2*pi*y],flags=3); // Pressure mesh
mesh ThU = trunc(ThP,1,split=2);  // Velocity mesh

fespace Uh(ThU,[P1,P1]); // Velocity space
fespace Ph(ThP,P1);      // Pressure space

macro grad(u) [dx(u),dy(u)] //
macro Grad(u1,u2) [grad(u1), grad(u2)] //
macro div(u1,u2) (dx(u1)+dy(u2)) //

// definition of the boundary condition
func g1 = sin(x)*cos(y);
func g2 = -cos(x)*sin(y);

// definition of the right-hand side
func f1 = 0;
func f2 = -4*cos(x)*sin(y);

Uh [u1,u2],[v1,v2];
Ph p,q;

fespace Xh=Uh*Ph;

varf Stokes (<[u1,u2],[p]>, <[v1,v2],[q]>) = int2d(ThU)((Grad(u1,u2):Grad(v1,v2)))
+ int2d(ThU)(-div(u1,u2)*q -div(v1,v2)*p)
+ int2d(ThP)(-1e-10*p*q)
+ int2d(ThU)([f1,f2]'*[v1,v2])
+ on(1,2,3,4, u1=g1, u2=g2);

real t1 = clock();
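// master=-1: the matrix is passed to MUMPS in distributed form, each MPI rank
// contributing its locally assembled part (as in the composite-fespace documentation example)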
matrix M = Stokes(Xh,Xh,solver=sparsesolver, master=-1);
real t2 = clock();
cout << "init matrix time=" << t2-t1 << endl;

real[int] b = Stokes(0,Xh);
real t3 = clock();
cout << "init rhs time=" << t3-t2 << endl;

real[int] sol = M^-1*b;
real t4 = clock();
cout << "solver time=" << t4-t3 << endl;

and with PETSc:

load "PETSc"
load "bem"

int nn = 200; // number of edges in each direction

mesh ThP = square(nn,nn,[2*pi*x,2*pi*y],flags=3); // Pressure mesh
mesh ThU = trunc(ThP,1,split=2);  // Velocity mesh

fespace Uh(ThU,[P1,P1]); // Velocity space
fespace Ph(ThP,P1);      // Pressure space

macro grad(u) [dx(u),dy(u)] //
macro Grad(u1,u2) [grad(u1), grad(u2)] //
macro div(u1,u2) (dx(u1)+dy(u2)) //

// definition of the boundary condition
func g1 = sin(x)*cos(y);
func g2 = -cos(x)*sin(y);

// definition of the right-hand side
func f1 = 0;
func f2 = -4*cos(x)*sin(y);

Uh [u1,u2],[v1,v2];
Ph p,q;

fespace Xh=Uh*Ph;

varf Stokes (<[u1,u2],[p]>, <[v1,v2],[q]>) = int2d(ThU)((Grad(u1,u2):Grad(v1,v2)))
+ int2d(ThU)(-div(u1,u2)*q -div(v1,v2)*p)
+ int2d(ThP)(-1e-10*p*q)
+ int2d(ThU)([f1,f2]'*[v1,v2])
+ on(1,2,3,4, u1=g1, u2=g2);

real t1 = clock();
Mat M = Stokes(Xh,Xh);
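// -pc_type lu requests an exact LU factorization; in parallel, PETSc delegates it to an
// external direct solver (typically MUMPS when PETSc was built with it, as in a standard FreeFEM installation)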
set(M, sparams = "-pc_type lu");
real t2 = clock();
cout << "init matrix time=" << t2-t1 << endl;

real[int] b = Stokes(0,Xh);
real t3 = clock();
cout << "init rhs time=" << t3-t2 << endl;

real[int] sol = M^-1*b;
real t4 = clock();
cout << "solver time=" << t4-t3 << endl;

Thanks

The problem is in the line Mat M = Stokes(Xh,Xh). I didn't write that part of the code, so I can't tell you why, but it is highly inefficient; it is not related to MUMPS. Again, you don't really need composite fespaces for this: the same assembly done with non-composite fespaces is just as fast as a standard assembly.
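
For reference, a minimal non-composite sketch of a parallel Stokes assembly with PETSc, roughly following the distributed Stokes examples shipped with FreeFEM. It assumes the standard macro_ddm.idp workflow and uses a Taylor-Hood [P2,P2,P1] space on a single distributed mesh instead of the two-mesh P1/P1 pair above; the names Pk, Wh, A and sol are illustrative only:

// hedged sketch: non-composite Stokes assembly with PETSc (Taylor-Hood on one distributed mesh)
load "PETSc"
macro dimension()2// EOM
include "macro_ddm.idp"

macro def(i)[i, i#B, i#C]// EOM    // vector field definition, needed by createMat
macro init(i)[i, i, i]// EOM       // vector field initialization
macro grad(u) [dx(u), dy(u)] //
macro div(u1,u2) (dx(u1) + dy(u2)) //
func Pk = [P2, P2, P1];            // velocity + pressure in one (non-composite) space

int nn = 200;
mesh Th = square(nn, nn, [2*pi*x, 2*pi*y], flags=3);
Mat A;
buildDmesh(Th)                     // distribute the mesh over the MPI ranks
createMat(Th, A, Pk)               // create the distributed PETSc Mat

fespace Wh(Th, Pk);                // local (per-process) finite element space

func g1 = sin(x)*cos(y);
func g2 = -cos(x)*sin(y);
func f1 = 0;
func f2 = -4*cos(x)*sin(y);

varf Stokes([u1, u2, p], [v1, v2, q])
    = int2d(Th)(grad(u1)'*grad(v1) + grad(u2)'*grad(v2)
              - div(u1,u2)*q - div(v1,v2)*p - 1e-10*p*q)
    + int2d(Th)([f1, f2]'*[v1, v2])
    + on(1, 2, 3, 4, u1=g1, u2=g2);

A = Stokes(Wh, Wh);                // standard (non-composite) parallel assembly
set(A, sparams = "-pc_type lu");
real[int] rhs = Stokes(0, Wh);
Wh<real> def(sol);
sol[] = A^-1 * rhs;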

Indeed, this is not intended behavior. Thank you for reporting the problem; I will look into it.

This is fixed by FreeFem/FreeFem-sources@4871f25 ("fix computation of mapping for PETSc Mats with composite FEspaces") on GitHub.
It will make it into the next release.

@prj and @phtournier,
Thank you very much for your feedback. I'll test the patch soon.
Pierre