Thanks for the helpful feedback!
I would like to share my code, but unfortunately because of the sake of generality and due to the dependence of other programs, it is really long and pointless to share IMO.
I am solving spatial evolution (boundary layer like) problem, which is also a harmonic balance problem. In each marching step, I need to solve successive linear systems during the iterative search for the solution. I am using fGMRES + block-Jacobi (in the case of strong coupling, block GS) preconditioner with calculating the LU factorization only once and then reusing it in the successive iterations (as you suggested ). This seems to be a very efficient approach as (i) the system has a strong block-diagonal dominance, and (ii) during the iteration of each marching step the system changes only slightly. Each block (harmonic) has a size of 10^4-10^5 unknowns in PETSc numbering, so I guess LU factorization is a feasible preconditioner.
I am not sure whether efficiency can be gained from modifying the preconditioner. Do you think using a different solver than fGMRES might improves the performance? Or maybe using KSPSetInitialGuessNonzero
might improve the efficiency? Any help is appreciated.