Parallel implementation of a problem with multiple right hand sides

Dear FreeFEM developers,

I am attempting to resolve a linear system (for a matrix K) with multiple right hand sides using HPC.
My approach is to solve the problem for each right hand side on each processor using an iterative method.
Hence I need to create a global stiffness matrix and then distribute it to each processor.

For an unpartitioned mesh Thglob, If I were to do
fespace Vhglob(Thglob, P1)
varf lap(u, v) = ...
Mat Kglob = lap(Vhglob, Vhblob);
then FreeFEM would make multiple copies of the same stiffness matrix for each processor right?
For a problem with large DoF, this would be a waste of memory.

So my question is: is there a way to create a single copy of the stiffness matrix in a parallel script?
And even better, could I make this matrix locally (using createMat), assemble the global matrix and then keep a single copy?

I apologize if this question has already been answered in a previous thread.
Thank you in advance,

This example demonstrated how to use PETSc to solve a linear system with multiple rsh.

My approach is to solve the problem for each right hand side on each processor using an iterative method.

Why would you want to do that? It is usually much more efficient to solve all RHS on all processes, so that you can parallelize their assembly and use efficient solvers such as block Krylov methods (cf. the example provided by @aszaboa).

I want to do that so that communication between processors is minimized.
If CG for each RHS is implemented on all processors, there has to be exchange of the gradient vector after each and every iteration of CG; whereas if it performed locally, there is zero communication in CG.
So I suppose that latter can be much faster.

I am unable to open the link shared by @aszaboa, can you share the file or point at another example on FF?

Could you please formulate exactly what you want to achieve? Because what you are saying doesn’t make much sense to me. Communicating part of a vector is very little work compared to having to assemble the same coefficient matrix on every processes. Setting up the same preconditioner on every processes is also a waste, while you could keep it distributed. The file is here:
example14.edp (1.8 KB).

Thank you for sharing the file and for your quick response.

I am basically trying to solve the Eigenvalue problem for each index on each processor.
The goal is to develop a solver that is faster (or at least more memory efficient) than ARPACK.
This is of course ambitious.

I have the approximate eigenvalues and eigenvectors and I am attempting to correct them further using GMRES on each eigenvalue equation.
This is very different from the algorithm IRAM (implicitly restarted Arnoldi method, used in ARPACK).

A priori, I have not thought of applying a preconditioner here, since the eigenvalue problem is indefinite and I am not sure if AMG or Jacobi is good idea.

If communication is not as time consuming a you point out, I would discard the idea of solving each RHS on each processor and simply partition the matrix as usual.

I will use the example you shared on my problem, and see if the ASM (with LU) preconditioner helps.

Why not use SLEPc? You can provide an initial space if you have a good first guess. Still, your problem is not very well-posed, (e.g., what does “solve the Eigenvalue problem for each index on each processor” mean?) so it’s difficult for me to give you the most appropriate answer. If you want to share more information in private, please feel free to do so.

I would certainly try SLEPc. I had heard (from someone who uses it), that when we give SLEPc the initial approximate eigenvectors, it takes a linear combination of those eigenvectors.
This is a well-known strategy in explicitly or implicitly restarted Arnoldi method, but to my experience (I have myself implemented these algorithms), it does not result in a significant speed-up.
But maybe SLEPc has some better tricks.

More details on the Eigenvalue problem, I shall write to you in private.