I need to solve multiple instances of the Helmholtz equation with a fixed matrix but different right-hand sides, so I found this example. I have some experience with FEM in FreeFem and MATLAB, but zero experience with parallel computing. I have two questions regarding this example.
Is this code ready for parallel computing? Running the code with time ff-mpirun -np 4 takes even longer than time ff-mpirun -np 1 (almost 4 times longer). Using clock() I see that the system is solved 4 times, and each solve takes about the same time as with -np 1. Should I run the code with a different command?
I suppose the code uses some domain decomposition and a parallel solver to speed up the computation. But in my case the bottleneck is the evaluation of the right-hand-side function and the assembly of the FEM vector (lines 31–35 in the example code). How do we speed up that part using multiple cores? I went through this post and the two YouTube videos but still have no clue (it seems the videos are about domain decomposition and parallel solvers).
I also searched other posts but didn't find complete code. I hope somebody can provide a minimal but complete example for speeding up problems with multiple rhs (or multiple matrices).
int nRhs = 1000; // in my case I have a lot more iterations
complex[int, int] rhs(Vh.ndof, nRhs), sol(Vh.ndof, nRhs);
for(int i = 0; i < nRhs; ++i) {
    func f = 100 * exp(-20 * ((x-i/3.0)^2 + (y-i/3.0)^2)); // my function takes long to evaluate
    varf vRhs(u, v) = int2d(Th)(f * v);
    rhs(:, i) = vRhs(0, Vh);
}
When I run ff-mpirun -np 8 example14.edp, will the program distribute the for loop over 8 cores or decompose the mesh Th into 8 parts?
This problem is tiny and for educational purposes only. Increase the mesh size, do not time it with clock() but instead run the program with the command-line option -log_view, and at some point you'll see that it gets faster.
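For example (assuming the script is example14.edp, as above, and that -log_view is simply appended to the run command so PETSc prints a performance summary at the end), one could compare

ff-mpirun -np 1 example14.edp -log_view
ff-mpirun -np 4 example14.edp -log_view

on the refined mesh and look at the times reported for the solve stage.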
In my case each problem is tiny (a 40x40 2D mesh), but it has to be solved with millions of different rhs. In this case I suppose domain decomposition will not help a lot, so parallelizing the for loop is necessary? What is the API to do so in FreeFem?
int ntask = 200;
for(int i = (mpirank * ntask) / mpisize; i < ((mpirank + 1) * ntask) / mpisize; ++i) {
cout << "process #" << mpirank << " dealing with task #" << i << endl;
}
mpiAllReduce(...); // for synchronization
Sorry, I am really new to this MPI stuff. What should we put inside mpiAllReduce? I looked through the FreeFem documentation but can't find the relevant parts (most of it is about domain decomposition and parallel solvers). Should I just read the MPI documentation itself?
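My current guess, based on the MPI documentation, is a pattern like the sketch below: each process computes only its own tasks into a full-size array, and the per-process arrays are then summed so that every rank ends up with all the results. The names resLocal and resGlobal are made up here, and I am not sure this is the correct FreeFem signature for mpiAllReduce:

int ntask = 200;
real[int] resLocal(ntask), resGlobal(ntask);
resLocal = 0; // each process fills only its own entries, the rest stay zero
for(int i = (mpirank * ntask) / mpisize; i < ((mpirank + 1) * ntask) / mpisize; ++i) {
    resLocal[i] = 2.0 * i; // placeholder for the actual per-task computation
}
// sum the partial arrays over all processes (assuming mpiAllReduce accepts real[int] like this);
// afterwards every rank holds the results of all tasks
mpiAllReduce(resLocal, resGlobal, mpiCommWorld, mpiSUM);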
Now I can get my code to run, but the output data has more points than expected. Here is the last part of my code:
// solve the system
complex[int, int] B(A.n, rhs.m), X(A.n, rhs.m);
ChangeNumbering(A, rhs, B);
set(A, sparams = "-ksp_type preonly -pc_type lu");
KSPSolve(A, B, X);
ChangeNumbering(A, sol, X, inverse = true, exchange = true);
// write to disk the solution values on a circle
Vh ur;
real xt;
real yt;
for(int i = 0; i < rhs.m; ++i) {
    ur[] = sol(:, i).re;
    for(int n = 0; n < nOut; ++n) {
        xt = rOut * cos(2*pi*n/nOut);
        yt = rOut * sin(2*pi*n/nOut);
        urfile << ur(xt, yt) << endl;
    }
}
My rhs.m = 1000 and nOut = 64, so the total number of values in urfile should be 64000. However, I get 64015 values when I use multiple processes with ff-mpirun!
I feel like domain decomposition will not do much good for my small problems, but I do want to use MPI for the loops in my code (which I now know how to do). Is there a way to disable the domain decomposition but still use MPI when I run the code with ff-mpirun?
Before using PETSc, I tried the simple way: sol = A^-1 * rhs, where sol and rhs are 2D arrays, and then extracted each solution as sol(:, i). However, the solutions obtained this way are completely wrong unless rhs has only one column (see the attached plot of one such solution). Roughly, what I did looks like the sketch below.
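This is only a minimal sketch of what I mean; vA stands for the varf of the Helmholtz matrix from the example, ui is a placeholder name, and Vh/nRhs are as in my first snippet:

matrix<complex> A = vA(Vh, Vh, solver = sparsesolver); // fixed Helmholtz matrix
complex[int, int] rhs(Vh.ndof, nRhs), sol(Vh.ndof, nRhs);
// ... fill the columns of rhs as in my first snippet ...
sol = A^-1 * rhs; // this is the step that gives wrong solutions when nRhs > 1
Vh<complex> ui;
ui[] = sol(:, 0); // extract the first solution for plotting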
This is exactly the part of the original example code that I don't understand: I don't see which line performs the domain decomposition explicitly, so I don't know how to avoid the decomposition other than using -np 1 on the command line. But then everything runs sequentially.