Memory allocation problem in MPI computation using PETSc

Dear FreeFEM developers and users,

I’m writing here because I’m having a memory-allocation issue during an MPI computation using PETSc.
The problem is the following: with the same script and the same amount of resources, once I increase the mesh size (I load the mesh with the command readmesh3) above a certain threshold (~1 GB mesh files), I get the following error (repeated for each MPI rank) right at the line with the command buildDmesh(Th):

***Failed to allocate memory for adjncy.

  current line = 122 mpirank 131 / 350
Assertion fail : (nbv_surf)
        line :1981, in file ../femlib/Mesh3dn.cpp
 err code 6 ,  mpirank 131

I’m working on a very large cluster, with 175 GB of memory per node (currently running on 15 nodes), so it seems strange to me that I could have exceeded the memory available on the nodes my simulation is running on.

Can someone tell me where the problem might be and how it can be solved?

Thanks in advance for your cooperation.

Yours,

Marco

Could you try to apply the following patch and then recompile FreeFEM and the msh3 plugin, please?

diff --git a/src/femlib/Mesh3dn.cpp b/src/femlib/Mesh3dn.cpp
index 50489d93..cfadf8b4 100644
--- a/src/femlib/Mesh3dn.cpp
+++ b/src/femlib/Mesh3dn.cpp
@@ -1956,3 +1956,3 @@ namespace Fem2D
     {
-        
+#if 0
         if (meshS) {
@@ -2034,2 +2034,3 @@ namespace Fem2D
         
+#endif
     }

Thanks @prj for your prompt answer!
Unfortunately the modification you suggested does not solve the issue.

Just to be sure I have understood properly, these are the modifications I have introduced:
(screenshots of the patched Mesh3dn.cpp)

If you have any other suggestions, they would be extremely helpful!

Thanks again for your valuable help!

Marco

So, if I understand correctly, you are loading a mesh file of size ~1 GB? What is the mesh format?

Exactly! Everything works smoothly as long as the mesh size is below 1 GB; once that threshold is exceeded, the error message above appears. If it helps, I have put a cout (guarded by if(mpirank == 0)) right after the buildDmesh command, and the execution does print that line. Indeed, the execution is never killed: it simply shows the error message and freezes without computing anything (I waited several hours, whereas for meshes slightly below 1 GB the first results were already computed within a few minutes).

The mesh is saved in the .mesh format and exported from Gmsh.

Could you, by any chance, load a coarse mesh and then perform uniform mesh refinement? Or do you need to load an initial fine mesh to properly resolve, e.g., nonlinear boundaries?

Unfortunately I can’t perform uniform mesh refinement, since it is fundamental for me to have different regions meshed with different mesh sizes.

Right, you can generate an initial coarse mesh with different mesh sizes, then perform mesh refinement. The mesh sizes will still be different.

Oh, I wasn’t aware of this possibility. Can you suggest how this can be done?

Instead of using buildDmesh, please try buildMat, cf. FreeFem-sources/diffusion-2d-PETSc.edp at master · FreeFem/FreeFem-sources · GitHub, with a -split parameter greater than one. This is an older API; if it solves your issue with large meshes, I’ll add this option to the buildDmesh + (newer) createMat API.
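Roughly, the structure would be as follows (just a sketch, not your script: the coarse-mesh file name, the Pk finite element, and the scalar def/init macros are placeholders, and I am quoting the old buildMat signature from memory, so double-check it against the linked example):

    load "PETSc"
    load "msh3"
    macro dimension()3// EOM
    include "macro_ddm.idp"

    macro def(i)i// EOM                   // scalar field definition (adapt for a vector unknown)
    macro init(i)i// EOM                  // scalar field initialization
    func Pk = P1;                         // placeholder finite element

    mesh3 Th = readmesh3("coarse.mesh");  // coarse mesh, small enough for every process to read
    Mat A;
    // distribute the coarse mesh, then refine each local piece according to -split
    buildMat(Th, getARGV("-split", 1), A, Pk, mpiCommWorld)
    // from here on, Th is the distributed (and refined) mesh: assemble and solve on it as usual

Run it with, e.g., -split 2 so that each process refines its own part of the coarse mesh after the partitioning, which keeps the per-process memory footprint low.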

I’m trying it and I’ll let you know as soon as possible!

Thank you very much for your support!

Sorry for disturbing you again @prj. Can you tell me how, in this case, I can move my results to the ThPlt mesh, since after the buildMat command Th is now refined while ThPlt is not?

I hope this is clear.

Why would you want to do that on a 1 GB mesh? You should rather keep your solution distributed.

I was doing it because I need to compute integral values from the computed (distributed) variable. Is it possible to do that while keeping it distributed?

Yes, there are some examples in the folder examples/hpddm, using, e.g., the partition of unity. It depends on what you compute exactly, but you should not use the initial global mesh for something as trivial as computing integrals.

In my case I need to compute the power deposition arising from an electrical current distribution, which is evaluated starting from the computed magnetic vector potential (basically solving the transient Ampère law).
What I have been doing until now was:

  1. Computing the distributed solution (the magnetic vector potential) by solving the algebraic system (sol = A^-1 * b) defined on the distributed mesh
  2. “Moving” the computed magnetic vector potential from the distributed mesh to the global one (ThPlt) with the following commands:
  // PETSc <-> FreeFEM numbering round trip for the two potentials
  real[int] AxPETSc;
  real[int] A0xPETSc;
  ChangeNumbering(AA, Ax[], AxPETSc);
  ChangeNumbering(AA, Ax[], AxPETSc, inverse = true);
  ChangeNumbering(AA, A0x[], A0xPETSc);
  ChangeNumbering(AA, A0x[], A0xPETSc, inverse = true);

  // scatter the local values into the global (ThPlt) arrays through the restriction SubIdx
  Agx[](SubIdx) = Ax[];
  Ag0x[](SubIdx) = A0x[];

  // sum the contributions of all processes into the global arrays
  mpiAllReduce(Agx[], AgxS[], mpiCommWorld, mpiSUM);
  mpiAllReduce(Ag0x[], Ag0xS[], mpiCommWorld, mpiSUM);
  3. Computing, from the obtained [AgxS, AgyS, AgzS] and [Ag0xS, Ag0yS, Ag0zS], the information I need (i.e. the current distribution and the deposited power) on the global mesh (ThPlt)

So what you recommend is to skip step 2 and perform step 3 in a “distributed way”. Is that correct?

If this is the case, can you point me to a more precise reference within the folder you mentioned that I could take inspiration from?

Thank you very much again!

Moreover, with the approach you are suggesting, how can I store (e.g. in a .vtu file) the current/power distribution over the entire domain?

How do you compute the current distribution and the deposited power?
Look up savevtk in examples/hpddm; that will show you examples.
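For instance, something along these lines (a sketch from memory, with u standing for one of your distributed fields; the order array needs one entry per saved field):

    load "iovtk"
    int[int] fforder = [1];   // output order of each saved field (1 for a P1 field)
    // in parallel, each process writes its own piece of the distributed mesh
    savevtk("solution.vtu", Th, u, dataname = "u", order = fforder);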

Here are the lines computing the current and the power (in two different regions):

    [Jex, Jey, Jez] = SigmaPlt*[(Ag0xS-AgxS)/dt,(Ag0yS-AgyS)/dt,(Ag0zS-AgzS)/dt];

    PTFPP = int3d(ThPlt,cassaPlt)(1./SigmaPlt*(Jex^2+Jey^2+Jez^2));
    PVVPP = int3d(ThPlt,VVPlt)(1./SigmaPlt*(Jex^2+Jey^2+Jez^2));

Then you can do as in FreeFem-sources/minimal-surface-Tao-2d-PETSc.edp at develop · FreeFem/FreeFem-sources · GitHub, except that you cannot use createPartition (because it is not available with buildMat) and you need to initialize part (of the same fespace as your Mat, not P0 like in the example) with part[] = A.D.
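Concretely, the pattern looks like this (only a sketch with placeholder names: Pk must be the finite element used to create your Mat AA, and Sigma, Jx, Jy, Jz stand for the distributed-mesh counterparts of your SigmaPlt, Jex, Jey, Jez; for a vector unknown, part must live in the same vector fespace as AA):

    fespace Vh(Th, Pk);            // same fespace as the one used to build the Mat AA
    Vh part;
    part[] = AA.D;                 // partition of unity attached to the distributed matrix

    // local weighted contribution, then a global sum over all processes
    real locPower = int3d(Th)(part * 1./Sigma * (Jx^2 + Jy^2 + Jz^2));
    real totPower;
    mpiAllReduce(locPower, totPower, mpiCommWorld, mpiSUM);
    if(mpirank == 0)
        cout << "deposited power = " << totPower << endl;

A restriction to a given region (like your cassaPlt or VVPlt labels) goes in int3d exactly as in your lines above.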