Efficient solver and preconditionner for 3D Stokes Problem

Dear all,

I am solving a 3D Stokes problem with PETSc in parallel, with this variationnal formulation grad(u)' * grad(v) + grad(uB)' * grad(vB) + grad(uC)' * grad(vC) - div(u) * q - div(v) * p + epsq * p * q

To solve this I am currently using the solver gmres with the LU preconditionning -ksp_type gmres -pc_type lu -pc_factor_mat_solver_type mumps.

However this method, is memory and time consuming due to the LU factorization (so I can only run small case even on HPC).

I have also try the solver block jacobi with a ILU preconditionning, but the the linear solver does not converge.

Could someone know a solver and a preconditionner to solve efficiently such a problem ?

Thank you in advance for your help,

Best regards,


Use fieldsplit and look at the literature.

Thanks ! I will have a look on this.


I have tried the solver proposed on the FreeFEM git: https://github.com/FreeFem/FreeFem-sources/blob/develop/examples/hpddm/stokes-fieldsplit-3d-PETSc.edp

set(A, sparams = " -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type lower -pc_fieldsplit_detect_saddle_point -fieldsplit_velocity_sub_pc_type lu "
               +  " -fieldsplit_pressure_sub_pc_type lu -fieldsplit_velocity_sub_pc_factor_mat_solver_type mumps -fieldsplit_pressure_sub_pc_factor_mat_solver_type mumps -ksp_monitor -ksp_view "
               + " -fieldsplit_velocity_ksp_type gmres -fieldsplit_velocity_ksp_max_it 5 -fieldsplit_pressure_ksp_type gmres -fieldsplit_pressure_ksp_max_it 5 -ksp_rtol 1e-6");

However, this solver is slower that my previous solver and gives me an incorrect result.

Do you know what might be the cause ?
or how could I adapt it ?

Thank you in advance,

Best regards,


What does -ksp_converged_reason return?

It returns

 0 KSP Residual norm 4.595362033965e+31 
 1 KSP Residual norm 3.698028393291e+16 
Linear solve converged due to CONVERGED_RTOL iterations 1


Are you using the default tgv value of 10^{30}?

Yes I am using the default one.

Then switch to something else first.

I have tried different value of tgv but the result remain not correct and the residuals remain big.

What do you get with tgv = -1 + -ksp_monitor_true_residual -pc_type lu?

By doing this, I get the right result:

-->Solving with PETSc began....
  0 KSP preconditioned resid norm 2.720997719081e+02 true resid norm 4.595728221659e+01 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.879238360449e-04 true resid norm 1.609760664389e-10 ||r(i)||/||b|| 3.502732508859e-12

and it is much faster.


For the largest problem you tried with this, what was the performance of the method? Also, if the problem is symmetric with homogeneous Dirichlet boundary conditions (tgv = -2), you should consider switching to -pc_type cholesky.

With -ksp_type gmres -pc_type lu -pc_factor_mat_solver_type mumps I have been able to run a case with about 384 000 tetras (mesh3 cube(40, 40, 40)) on 144 CPUs and it takes 53 min.

I have not succeeded to run larger case with this method,


I have try tgv=-2 with -pc_type cholesky. The code converges, but the results are not correct.

Is your problem symmetric with homogeneous Dirichlet boundary conditions?

Sorry for my misunderstanding, I don’t have homogeneous Dirichlet boundary conditions. And my problem I think is not symmetric due to the extra term -eps*p*q.

The term -eps*p*q is symmetric, but if you don’t have homogeneous Dirichlet BC, tgv=-2 is useless indeed. 53 minutes for the above problem suggest that something is wrong on your machine.

Considering a case with 750 000 tetras (cube(50, 50, 50)) I have not been able to run it even using more that 200 CPUs.

The PETSc iterations don’t even start. This suggests that there is indeed a problem on the machine. But in the other hand it is known that LU factorization is time and memory consuming.

So I don’t know what is exactly the problem


Share the test case, I will let you know how much time it takes on a properly configured machine.