March 7, 2023, 7:51am
I am solving a 3D Stokes problem with PETSc in parallel, with this variationnal formulation
grad(u)' * grad(v) + grad(uB)' * grad(vB) + grad(uC)' * grad(vC) - div(u) * q - div(v) * p + epsq * p * q
To solve this I am currently using the solver
gmres with the
-ksp_type gmres -pc_type lu -pc_factor_mat_solver_type mumps.
However this method, is memory and time consuming due to the
LU factorization (so I can only run small case even on HPC).
I have also try the solver
block jacobi with a
ILU preconditionning, but the the linear solver does not converge.
Could someone know a solver and a preconditionner to solve efficiently such a problem ?
Thank you in advance for your help,
March 7, 2023, 8:06am
Use fieldsplit and look at the literature.
March 7, 2023, 8:29am
Thanks ! I will have a look on this.
March 7, 2023, 10:24am
I have tried the solver proposed on the FreeFEM git:
set(A, sparams = " -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type lower -pc_fieldsplit_detect_saddle_point -fieldsplit_velocity_sub_pc_type lu "
+ " -fieldsplit_pressure_sub_pc_type lu -fieldsplit_velocity_sub_pc_factor_mat_solver_type mumps -fieldsplit_pressure_sub_pc_factor_mat_solver_type mumps -ksp_monitor -ksp_view "
+ " -fieldsplit_velocity_ksp_type gmres -fieldsplit_velocity_ksp_max_it 5 -fieldsplit_pressure_ksp_type gmres -fieldsplit_pressure_ksp_max_it 5 -ksp_rtol 1e-6");
However, this solver is slower that my previous solver and gives me an incorrect result.
Do you know what might be the cause ?
or how could I adapt it ?
Thank you in advance,
March 7, 2023, 12:37pm
What does -ksp_converged_reason return?
March 7, 2023, 12:50pm
0 KSP Residual norm 4.595362033965e+31
1 KSP Residual norm 3.698028393291e+16
Linear solve converged due to CONVERGED_RTOL iterations 1
March 7, 2023, 12:52pm
Are you using the default
tgv value of
March 7, 2023, 12:54pm
Yes I am using the default one.
March 7, 2023, 1:01pm
Then switch to something else first.
March 7, 2023, 1:10pm
I have tried different value of
tgv but the result remain not correct and the residuals remain big.
March 7, 2023, 1:12pm
What do you get with
tgv = -1 +
-ksp_monitor_true_residual -pc_type lu?
March 7, 2023, 1:19pm
By doing this, I get the right result:
-->Solving with PETSc began....
0 KSP preconditioned resid norm 2.720997719081e+02 true resid norm 4.595728221659e+01 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 1.879238360449e-04 true resid norm 1.609760664389e-10 ||r(i)||/||b|| 3.502732508859e-12
and it is much faster.
March 7, 2023, 1:22pm
For the largest problem you tried with this, what was the performance of the method? Also, if the problem is symmetric with homogeneous Dirichlet boundary conditions (
tgv = -2), you should consider switching to
March 7, 2023, 1:33pm
-ksp_type gmres -pc_type lu -pc_factor_mat_solver_type mumps I have been able to run a case with about 384 000 tetras (
mesh3 cube(40, 40, 40)) on 144 CPUs and it takes 53 min.
I have not succeeded to run larger case with this method,
March 7, 2023, 1:41pm
I have try
-pc_type cholesky. The code converges, but the results are not correct.
March 7, 2023, 1:42pm
Is your problem symmetric with homogeneous Dirichlet boundary conditions?
March 7, 2023, 1:45pm
Sorry for my misunderstanding, I don’t have homogeneous Dirichlet boundary conditions. And my problem I think is not symmetric due to the extra term
March 7, 2023, 1:47pm
-eps*p*q is symmetric, but if you don’t have homogeneous Dirichlet BC,
tgv=-2 is useless indeed. 53 minutes for the above problem suggest that something is wrong on your machine.
March 7, 2023, 1:55pm
Considering a case with 750 000 tetras (
cube(50, 50, 50)) I have not been able to run it even using more that 200 CPUs.
The PETSc iterations don’t even start. This suggests that there is indeed a problem on the machine. But in the other hand it is known that LU factorization is time and memory consuming.
So I don’t know what is exactly the problem
March 7, 2023, 2:00pm
Share the test case, I will let you know how much time it takes on a properly configured machine.