Supply PETSc numbering for shared-memory block matrix

For a reason I’m not sure I understand, the new options are not being processed by the inner solver. There is, however, a way to force this, both in PETSc and in FreeFEM. In your .edp, I believe that if you do:

set(A11, parent = Sblock, sparams= " -fieldsplit_1_ksp_type preonly -fieldsplit_1_pc_type lu -fieldsplit_1_ksp_reuse_preconditioner false "); // do not call "set" on Sblock

Then, it will work. This forces the inner solver to use the proper options.


Thank you so much! This indeed seems to do the job! I attach a working example made from the codes I sent previously - someone might find it useful.
ChangeOperator-MWE-sol.edp (1.9 KB)

Just one more question: the following code produces a PETSc error at the last line:

load "PETSc"      
macro dimension()2// EOM      
include "macro_ddm.idp" 

macro grad(u)[dx(u), dy(u)]// EOM 
func Pk = P1;    

mesh Th = square(getARGV("-global", 40), getARGV("-global", 40)); 
Mat A00;
createMat(Th, A00, Pk)

fespace Wh(Th, Pk);  
varf vPb(u, v) = int2d(Th)(grad(u)' * grad(v)) + int2d(Th)(v) + on(1, u = 0.0);
varf vPb2(u, v) = int2d(Th)(grad(u)' * grad(v) + 1e-8*dx(u)*dx(v)) + int2d(Th)(v) + on(1, u = 0.0);
real[int] rhs = vPb(0, Wh);

A00 = vPb(Wh, Wh);
Mat<real> A11(A00);
matrix I = eye(Wh.ndof);
A11 = I;

Mat<real> Sblock = [[A00, 0], [0, A11]];

real[int] rhsPETScTotal, usolPETScTotal;
changeNumbering([A00,A11],[rhs,rhs],rhsPETScTotal);
Wh<real> u1=1;
Wh<real> u2=2;
real[int] fieldIndices;
changeNumbering([A00,A11],[u1[], u2[]],fieldIndices); 
string PETScPar = " -ksp_type preonly -ksp_monitor_true_residual -ksp_initial_guess_nonzero false  -pc_type fieldsplit -pc_fieldsplit_type additive -options_left ";
set(Sblock, sparams=PETScPar, fields=fieldIndices);
set(A00, parent = Sblock, sparams=" -fieldsplit_0_ksp_type preonly -fieldsplit_0_pc_type lu -fieldsplit_0_ksp_reuse_preconditioner true ");
set(A11, parent = Sblock, sparams=" -fieldsplit_1_ksp_type preonly -fieldsplit_1_pc_type lu -fieldsplit_1_ksp_reuse_preconditioner true ");
KSPSolve(Sblock, rhsPETScTotal, usolPETScTotal); 

matrix<real> A11new = vPb2(Wh, Wh);
ChangeOperator(A11,A11new); 

The key difference between the previous code I sent and this one is the following. Previously, a block of the matrix was changed from the Laplacian to the identity, which seemed to be fine. However, doing it in reverse does not seem to be OK (I did not check the results precisely, but the mile-long PETSc error tells me something is seriously wrong). My question is: is this a bug, or does the sparsity pattern of the matrix cause this? (I have no other idea what would cause the error.) I think I can work around this error in my application; nevertheless, I would be happy to know whether this is a bug or whether there is some reason for this behavior.


ChangeOperator should be used with matrices that have the same sparsity pattern, or in such a way that it does not trigger a reallocation. You can enforce the same pattern by augmenting your varf with a penalized term, like 1e-16*(the terms you have in your other varf but not in this one). Or you can assemble an initial dummy Mat with all your terms and then reassemble the proper varf.
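For instance, here is a minimal sketch of the padding idea applied to the example above (the varf name vPbPad, the u*v term standing in for the identity block, and the 1e-16 weight are only illustrative): assemble A11 from a varf that already contains tiny multiples of every term of vPb2, so that the later call to ChangeOperator never has to grow the sparsity pattern.

varf vPbPad(u, v) = int2d(Th)(u*v + 1e-16*(grad(u)' * grad(v) + dx(u)*dx(v))) + on(1, u = 0.0); // pattern is a superset of vPb2
A11 = vPbPad(Wh, Wh); // use this instead of A11 = I
// ... later, switching to vPb2 keeps the same pattern, so no reallocation is triggered:
matrix<real> A11new = vPb2(Wh, Wh);
ChangeOperator(A11, A11new);

The values of this padded block are of course no longer exactly the identity; the point here is only the sparsity pattern.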

Thanks for the clarification and the tips!

Dear @prj ,

I want to benchmark the iterative solver I use for my block system against LU factorization. Since the structure of the system I am solving over and over does not change, I could use PCFactorSetReuseOrdering for LU.
The thing is, this would require me to add an interface for MatCopy, since I currently need to use MatConvert, which does not preserve any information from the preconditioner. Before attempting to implement MatCopy, I took a look at the equation solution with -log_view + PetscLogStagePush/PetscLogStagePop, and the results look like this:

 ------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
...
 --- Event Stage 3: Eq. Solution

BuildTwoSided       2902 1.0 6.1489e-0110.4 0.00e+00 0.0 4.2e+04 4.0e+00 2.9e+03  0  0  5  0  1   0  0 21  0 12     0
MatMult             1451 1.0 3.9460e+01 1.2 6.90e+10 1.2 7.0e+04 8.7e+03 1.5e+03  0  0  8  5  1   0  0 35  8  6  7809
MatSolve            1451 1.0 2.6974e+02 1.0 6.59e+14 1.5 1.3e+05 5.3e+04 8.7e+03  0100 15 60  3   1100 65 92 35 10014955
MatLUFactorSym      1451 1.0 1.1202e+03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.8e+03  2  0  0  0  2   2  0  0  0 24     0
MatLUFactorNum      1451 1.0 4.7517e+04 1.0 5.03e+11 1.8 0.0e+00 0.0e+00 0.0e+00 87  0  0  0  0  97  0  0  0  0    39
VecNorm             2902 1.0 7.0892e+0013.8 8.98e+08 1.2 0.0e+00 0.0e+00 2.9e+03  0  0  0  0  1   0  0  0  0 12   561
VecSet              1451 1.0 2.1920e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX             1451 1.0 1.3461e-01 1.4 2.25e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  7383
VecScatterBegin     4353 1.0 1.7169e+00 1.5 0.00e+00 0.0 1.2e+05 4.5e+04 4.4e+03  0  0 15 48  2   0  0 62 74 18     0
VecScatterEnd       4353 1.0 4.7873e+00 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph          1451 1.0 1.8840e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp             4353 1.0 1.3270e+00 1.5 0.00e+00 0.0 8.3e+04 8.7e+03 2.9e+03  0  0 10  6  1   0  0 41 10 12     0
SFPack              4353 1.0 1.6344e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack            4353 1.0 1.2561e-01 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp            1451 1.0 1.4229e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve            1451 1.0 3.1021e+02 1.0 6.59e+14 1.5 2.0e+05 3.8e+04 1.3e+04  1100 24 66  5   1100100100 53 8709433
PCSetUp             1451 1.0 4.8637e+04 1.0 5.03e+11 1.8 0.0e+00 0.0e+00 5.8e+03 89  0  0  0  2  99  0  0  0 24    38
PCApply             1451 1.0 2.6974e+02 1.0 6.59e+14 1.5 1.3e+05 5.3e+04 8.7e+03  0100 15 60  3   1100 65 92 35 10014814
------------------------------------------------------------------------------------------------------------------------

The ratio of MatLUFactorNum to MatLUFactorSym (roughly 4.75e+04 s vs. 1.12e+03 s in the stage above) tells me that only a little time can be saved by using PCFactorSetReuseOrdering, and as a conservative estimate I can simply subtract the MatLUFactorSym time. Am I correct, or am I missing something?

That sounds about right! I’m curious about your iterative approach, so if it’s polished enough for you to share in private, please do let me know :slight_smile: (and if you need help to polish it some more, you can count me in!).

Thanks for the helpful feedback!

I would like to share my code, but unfortunately, for the sake of generality and because of its dependence on other programs, it is really long, and sharing it would be pointless IMO.

I am solving a spatial evolution (boundary-layer-like) problem, which is also a harmonic balance problem. In each marching step, I need to solve successive linear systems during the iterative search for the solution. I am using fGMRES with a block-Jacobi preconditioner (block Gauss-Seidel in the case of strong coupling), computing the LU factorization only once and then reusing it in the successive iterations (as you suggested :slight_smile: ). This seems to be a very efficient approach because (i) the system has strong block-diagonal dominance, and (ii) during the iterations of each marching step the system changes only slightly. Each block (harmonic) has 10^4-10^5 unknowns in PETSc numbering, so I guess LU factorization is a feasible preconditioner.
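Schematically, the option set I have in mind looks like the sketch below (this is not my actual script: the names are illustrative, it is simplified to two blocks, and it reuses the Sblock and fieldIndices names from the example earlier in this thread).

string marchPar = "-ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type additive " // additive = block Jacobi; multiplicative would give block Gauss-Seidel
    + "-fieldsplit_0_ksp_type preonly -fieldsplit_0_pc_type lu -fieldsplit_0_ksp_reuse_preconditioner true "
    + "-fieldsplit_1_ksp_type preonly -fieldsplit_1_pc_type lu -fieldsplit_1_ksp_reuse_preconditioner true";
set(Sblock, sparams = marchPar, fields = fieldIndices);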

I am not sure whether efficiency can be gained by modifying the preconditioner. Do you think using a different solver than fGMRES might improve the performance? Or maybe using KSPSetInitialGuessNonzero might improve the efficiency? Any help is appreciated.

OK, I think this is somehow connected to what we did here: Adjoint-based sensitivity analysis of periodic orbits by the Fourier–Galerkin method - ScienceDirect. Maybe you could try FGCRODR (-ksp_type hpddm -ksp_hpddm_type gcrodr -ksp_hpddm_variant flexible -ksp_hpddm_recycle 5) instead of FGMRES to further decrease the number of iterations? If you don’t plan on increasing the block size, then I guess what you are doing is close to the ideal preconditioner.
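In the .edp, assuming PETSc was configured with HPDDM and reusing the same Sblock and fieldIndices as in the example above, this could look something like the following (the recycle depth of 5 is just a starting point to tune):

set(Sblock, sparams = "-ksp_type hpddm -ksp_hpddm_type gcrodr -ksp_hpddm_variant flexible -ksp_hpddm_recycle 5 "
    + "-pc_type fieldsplit -pc_fieldsplit_type additive", fields = fieldIndices);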

Thanks for the suggestions.

The equations in your paper are indeed very similar to mine; I plan to read it thoroughly. I think a significant difference is that in my case the harmonics are small relative to the mean (base) flow, hence the strong diagonal dominance of my system and the good convergence properties.

I tried the recycled GMRES, but it seems to decrease the performance. I think this is because the solver already converges quite fast: at most 7-8 iterations are required to solve the system. This suggests to me that there is not much room for improvement on the solver side either.

I think that by using higher-order FE (e.g., P4-P3 instead of P2-P1 for velocity-pressure) I might be able to reduce the size of the system and improve the speed, but I do not know whether the iterative solver works the same way with less sparse matrices.

Ah OK, indeed, for so few iterates, there is little point in trying to recycle information.