# Adding MatDiagonalSet, MatAXPY and MatAYPX

Dear Developers,

I am solving an evolution problem, which works nicely, but I am trying to optimize it. As not every part of my system matrix changes in each step, I would like to make my code more efficient by assembling a system from block matrices, evaluating only the terms which are necessary to be updated in each step. I am attempting this using the PETSc interface. Therefore, it would be more efficient to do as much of this as I can using PETSc numbering. To be able to do this, I need the PETSc functions MatDiagonalSet, MatAXPY and MatAYPX. I kindly ask you to consider making these functions available in FreeFem.

Just FYI, `MatAXPY` is already available (`A += alpha * B;` is a valid FreeFEM instruction which is dispatched to PETSc `MatAXPY`). `MatAYPX` simply calls `MatAXPY` under the hood, see https://petsc.org/release/src/mat/utils/axpy.c.html#MatAYPX.

Thanks for the info, I will try this out right away!

Also, early optimization is the root of all evil. Could you run your problem with `-log_view`, maybe add a couple of `PetscLogStage` in your script — with `PetscLogStagePush()/PetscLogStagePop()` — and send the output? I would be surprised that these trivial operations are the costliest of your script.

Thanks for these tips helpful tips, I will look into these (although `-log_view` does not seem to give much info regarding the system as far as I see).

Also some remarks regarding the problems I am solving: these are boundary layer-type problem, so I am solving `[P2,P2,P1]` complex systems on a 2D mesh with roughly 40k elements/20k vertices. For this, I use LU factorization, and the system assembly takes about 1.0-1.8 seconds, why the solution of the system takes about 3.6 second, and my system has several terms. This is the reason I thought that even saving like 10%-20% on the system assembly can be worth it.

How are you measuring those times? Through FreeFEM timers, or via `-log_view`? Please send the output of `-log_view`, this is the most reliable source of information (and consider adding `PetscLogStage` otherwise assemblies are not accounted for).

Thanks, I really appreciate the help. Here is a log file from a test run using a few steps only. `damping` and `no stab` are the events which time the system assembly
(if I did no screw it uphopefully)

PSE_test.log (729.4 KB)

OK, I see

`````` 0:      Main Stage: 1.1108e+02  68.8%  5.4755e+12 100.0%  5.200e+03 100.0%  2.736e+04      100.0%  5.550e+02  91.4%
``````

Out of which

``````MatLUFactorNum        30 1.0 1.0529e+02 1.0 2.29e+09 1.8 0.0e+00 0.0e+00 0.0e+00 65  0  0  0  0  95  0  0  0  0    88
``````

So indeed, there is very little need to optimize `MatAXPY` and such, but instead, we need to find a better solver than plain LU. Or maybe, you could try to reuse the factorization for multiple steps?

I see, thanks for the tip. I think will so as you suggest and try to find a more efficient solver. Unfortunately, I do not think that reusing the LU factorization works since in each marching step, I have to do an iteration which requires updating my equations.
However, I have very little idea where to start for choosing a more efficient solver. I assume as my equations have a very similar structure to the Navier-Stokes equations, I should start off by using a fieldsplit preconditioner, but I have no idea what solvers/preconditioners I should use to solve my system. Do you have any suggestion? I am actually solving the parabolized stability equations, which is discussed here for example. Could the mAL preconditioner be used for this with tuning the parameters?

Your problem is so tiny that an augmented Lagrangian approach is probably not necessary. Again, first, I would use as a preconditioner a previous LU factorization. Then, try other simple approach (that could work for tiny problems as yours) such as block Jacobi or additive Schwarz.

I see I misunderstood what you meant. I will definitely try reusing the LU factorization. Unfortunately I have trouble figuring out how to do it. KSPSetReusePreconditioner seems to me to be the method, but I do not know whether I can access it (also the how is ambiguous to me). Another example suggest me that just something like `set(MyMat, sparams = "-ksp_type fgmres -reuse_preconditioner 1"` could work?

The option is `-ksp_reuse_preconditioner`.

Thanks, I tried to make this work by modifying the Navier-Stokes PETSc example. However, I seem to screw something up: the GMRES solver seems to be working but when I run the script with `-log_view`, I get both using only LU only and trying to reuse LU factorization for GMRES the following
`MatLUFactorNum 5 `
which suggests me that the LU factorization is recomputed in each step when the jacobian is computed. Could you please take a look at the example what am I doing wrong?
navier-stokes-2d-PETSc-reuselu.edp (2.8 KB)

If you are using a `SNES`, then the option is `-snes_lag_preconditioner`, e.g., ` -snes_lag_preconditioner 2`, cf. SNESSetFromOptions.

Thank you once again, I really appreciate the help! Now that I am a bit more familiar with these options, I will try to apply it to my problems, and see how it affects the performance!

Dear @prj,

I managed to implement the reused-LU preconditioned FGMRES for the first part of my problem (real-values evolution problem, and in each marching step, a nonlinear system is solved with SNES). This technique resulted in an ~30% lower runtime compared to using only LU factorization. This is a huge improvement; I am thankful for your suggestion.
Motivated by this promising result, I would like to try out the block-Jacobi and additive Schwarz method too you suggested. However, before jumping into it, I would like to ask you whether my approach is correct or not. Since I have an incompressible fluid problem, the pressure-block of my system is singular; therefore, simply writing `-ksp_type fgmres -pc_type asm ` will fail. Because of this, I think the following example is a good way to start things off: I need to define the block for the velocity/pressure (although I am not sure whether I should treat the three velocity fields separately) and define the block preconditioners/solvers one by one, and also supply the Schur complement. Or should I provide a Schur complement like in this example, and then I can use `-ksp_type fgmres -pc_type asm `?

Your problem is tiny, so I would not recommend using PCFIELDSPLIT, as this is more for targeting large-scale systems. `-pc_type asm` will fail because by default, PETSc uses an ILU(0) subdomain solver, which will complain that there are zero coefficients on the diagonal. You can try either `-pc_type asm -sub_pc_type lu` or `-pc_type asm -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps`.

Thanks for the tip, `-pc_type asm -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps` works indeed, but both `asm` and ``bjacobi``` underperforms reusing the LU factorization as PC.

I have another question: reusing LU factorization works fine with `SNESSolve` + ` `-snes_lag_preconditioner`, plus changing the propeties of `Mat` inside the function that calculates the Jacobian. However, `KSPSolve` + `ksp_reuse_preconditioner` seems to completely turn off the calculation of the LU preconditioner. How can I force the recalculation of the of the PC?