Adding MatDiagonalSet, MatAXPY and MatAYPX

aszaboa · November 22, 2021, 3:40pm

Dear Developers,

I am solving an evolution problem, which works nicely, but I am trying to optimize it. As not every part of my system matrix changes in each step, I would like to make my code more efficient by assembling a system from block matrices, evaluating only the terms which are necessary to be updated in each step. I am attempting this using the PETSc interface. Therefore, it would be more efficient to do as much of this as I can using PETSc numbering. To be able to do this, I need the PETSc functions MatDiagonalSet, MatAXPY and MatAYPX. I kindly ask you to consider making these functions available in FreeFem.

prj · November 22, 2021, 3:44pm

Just FYI, MatAXPY is already available (A += alpha * B; is a valid FreeFEM instruction which is dispatched to PETSc MatAXPY). MatAYPX simply calls MatAXPY under the hood, see https://petsc.org/release/src/mat/utils/axpy.c.html#MatAYPX.

aszaboa · November 22, 2021, 3:45pm

Thanks for the info, I will try this out right away!

prj · November 22, 2021, 3:47pm

Also, early optimization is the root of all evil. Could you run your problem with -log_view, maybe add a couple of PetscLogStage in your script — with PetscLogStagePush()/PetscLogStagePop() — and send the output? I would be surprised that these trivial operations are the costliest of your script.

aszaboa · November 22, 2021, 4:12pm

Thanks for these tips helpful tips, I will look into these (although -log_view does not seem to give much info regarding the system as far as I see).

Also some remarks regarding the problems I am solving: these are boundary layer-type problem, so I am solving [P2,P2,P1] complex systems on a 2D mesh with roughly 40k elements/20k vertices. For this, I use LU factorization, and the system assembly takes about 1.0-1.8 seconds, why the solution of the system takes about 3.6 second, and my system has several terms. This is the reason I thought that even saving like 10%-20% on the system assembly can be worth it.

prj · November 22, 2021, 4:17pm

How are you measuring those times? Through FreeFEM timers, or via -log_view? Please send the output of -log_view, this is the most reliable source of information (and consider adding PetscLogStage otherwise assemblies are not accounted for).

aszaboa · November 22, 2021, 4:49pm

Thanks, I really appreciate the help. Here is a log file from a test run using a few steps only. damping and no stab are the events which time the system assembly
(if I did no screw it uphopefully)

PSE_test.log (729.4 KB)

prj · November 22, 2021, 6:44pm

OK, I see

 0:      Main Stage: 1.1108e+02  68.8%  5.4755e+12 100.0%  5.200e+03 100.0%  2.736e+04      100.0%  5.550e+02  91.4%

Out of which

MatLUFactorNum        30 1.0 1.0529e+02 1.0 2.29e+09 1.8 0.0e+00 0.0e+00 0.0e+00 65  0  0  0  0  95  0  0  0  0    88

So indeed, there is very little need to optimize MatAXPY and such, but instead, we need to find a better solver than plain LU. Or maybe, you could try to reuse the factorization for multiple steps?

aszaboa · November 22, 2021, 9:53pm

I see, thanks for the tip. I think will so as you suggest and try to find a more efficient solver. Unfortunately, I do not think that reusing the LU factorization works since in each marching step, I have to do an iteration which requires updating my equations.
However, I have very little idea where to start for choosing a more efficient solver. I assume as my equations have a very similar structure to the Navier-Stokes equations, I should start off by using a fieldsplit preconditioner, but I have no idea what solvers/preconditioners I should use to solve my system. Do you have any suggestion? I am actually solving the parabolized stability equations, which is discussed here for example. Could the mAL preconditioner be used for this with tuning the parameters?

prj · November 23, 2021, 6:53am

Your problem is so tiny that an augmented Lagrangian approach is probably not necessary. Again, first, I would use as a preconditioner a previous LU factorization. Then, try other simple approach (that could work for tiny problems as yours) such as block Jacobi or additive Schwarz.

aszaboa · November 23, 2021, 9:54am

I see I misunderstood what you meant. I will definitely try reusing the LU factorization. Unfortunately I have trouble figuring out how to do it. KSPSetReusePreconditioner seems to me to be the method, but I do not know whether I can access it (also the how is ambiguous to me). Another example suggest me that just something like set(MyMat, sparams = "-ksp_type fgmres -reuse_preconditioner 1" could work?

prj · November 23, 2021, 11:58am

The option is -ksp_reuse_preconditioner.

aszaboa · November 23, 2021, 2:57pm

Thanks, I tried to make this work by modifying the Navier-Stokes PETSc example. However, I seem to screw something up: the GMRES solver seems to be working but when I run the script with -log_view, I get both using only LU only and trying to reuse LU factorization for GMRES the following
MatLUFactorNum 5
which suggests me that the LU factorization is recomputed in each step when the jacobian is computed. Could you please take a look at the example what am I doing wrong?
navier-stokes-2d-PETSc-reuselu.edp (2.8 KB)

prj · November 23, 2021, 3:02pm

If you are using a SNES, then the option is -snes_lag_preconditioner, e.g., -snes_lag_preconditioner 2, cf. SNESSetFromOptions.

aszaboa · November 23, 2021, 10:15pm

Thank you once again, I really appreciate the help! Now that I am a bit more familiar with these options, I will try to apply it to my problems, and see how it affects the performance!

aszaboa · November 25, 2021, 5:11pm

Dear @prj,

I managed to implement the reused-LU preconditioned FGMRES for the first part of my problem (real-values evolution problem, and in each marching step, a nonlinear system is solved with SNES). This technique resulted in an ~30% lower runtime compared to using only LU factorization. This is a huge improvement; I am thankful for your suggestion.
Motivated by this promising result, I would like to try out the block-Jacobi and additive Schwarz method too you suggested. However, before jumping into it, I would like to ask you whether my approach is correct or not. Since I have an incompressible fluid problem, the pressure-block of my system is singular; therefore, simply writing -ksp_type fgmres -pc_type asm will fail. Because of this, I think the following example is a good way to start things off: I need to define the block for the velocity/pressure (although I am not sure whether I should treat the three velocity fields separately) and define the block preconditioners/solvers one by one, and also supply the Schur complement. Or should I provide a Schur complement like in this example, and then I can use -ksp_type fgmres -pc_type asm ?

prj · November 25, 2021, 5:35pm

Your problem is tiny, so I would not recommend using PCFIELDSPLIT, as this is more for targeting large-scale systems. -pc_type asm will fail because by default, PETSc uses an ILU(0) subdomain solver, which will complain that there are zero coefficients on the diagonal. You can try either -pc_type asm -sub_pc_type lu or -pc_type asm -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps.

aszaboa · November 25, 2021, 11:30pm

Thanks for the tip, -pc_type asm -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps works indeed, but both asm and ``bjacobi``` underperforms reusing the LU factorization as PC.

I have another question: reusing LU factorization works fine with SNESSolve + `-snes_lag_preconditioner, plus changing the propeties of Mat inside the function that calculates the Jacobian. However, KSPSolve + ksp_reuse_preconditioner seems to completely turn off the calculation of the LU preconditioner. How can I force the recalculation of the of the PC?

prj · November 26, 2021, 8:27am

Please send a reproducer.

aszaboa · November 26, 2021, 9:20am

I created a modified version of the 2D Navier-Stokes example which I think correctly reproduces the issue. Thanks for taking a look at it.
navier-stokes-2d-PETSc-KPS-updateMat.edp (3.1 KB)

Topic		Replies	Views
Supply PETSc numbering for shared-memory block matrix General Discussion	30	1526	September 19, 2022
Poisson equation with lagrange multiplier using PETSc General Discussion	5	924	February 27, 2023
SNES for bordered block matrix General Discussion	25	1015	February 7, 2023
Call PETSc matrix operator MatDiagonalSet (…) General Discussion	38	1639	June 3, 2024
Resolvent operator with FF/PETSc (MatMatSolve?) General Discussion	20	824	August 9, 2024

Adding MatDiagonalSet, MatAXPY and MatAYPX

Related topics