I’m writing here because I’m running into a memory allocation issue during an MPI computation using PETSc.
The problem is the following: I’m using the same script and the same amount of resources, but when I increase the mesh size (which I load with the command readmesh3) above a certain threshold (meshes of roughly 1 GB), I get the following error (repeated for each MPI rank) right at the line with the command buildDmesh(Th):
***Failed to allocate memory for adjncy.
current line = 122 mpirank 131 / 350
Assertion fail : (nbv_surf)
line :1981, in file ../femlib/Mesh3dn.cpp
err code 6 , mpirank 131
I’m working on a very large cluster, with 175 GB of memory per node (currently running on 15 nodes), so it seems strange to me that I could have exceeded the maximum memory available on the nodes my simulation is running on.
Can someone tell me where the problem might be and how it can be solved?
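For reference, the relevant lines of my script boil down to the standard read-then-distribute pattern below (the file name is just a placeholder, and I have omitted everything not needed to reproduce the error):

load "msh3"
load "PETSc"                         // PETSc interface and mesh partitioner
macro dimension()3// EOM             // 3D problem
include "macro_ddm.idp"              // provides buildDmesh, buildMat, ...

mesh3 Th = readmesh3("mymesh.mesh"); // placeholder name; every rank reads the full global mesh
buildDmesh(Th)                       // <-- the error above appears at this line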
Exactly! Everything works smoothly as long as the mesh size is below 1 GB; once that threshold is exceeded, the error message above appears. If it helps, I have put a “cout” right after the buildDmesh command (if mpirank == 0) and the execution does print that line. Indeed, the execution is never killed: it simply shows the error message and freezes without computing anything (even after waiting several hours, whereas for meshes slightly below 1 GB the first results were already computed within a few minutes).
The mesh is saved in the .mesh format and exported from Gmsh.
Could you, by any chance, load a coarse mesh and then perform uniform mesh refinement? Or do you need to load an initial fine mesh to properly resolve, e.g., nonlinear boundaries?
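Something along these lines, for instance (just a sketch, with the same header as above; split=2 gives one level of uniform refinement of each local mesh):

mesh3 Th = readmesh3("coarse.mesh"); // a much smaller file than the fine mesh
buildDmesh(Th)                       // distribute the coarse mesh first
Th = trunc(Th, 1, split=2);          // then refine locally, in parallel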
Sorry for disturbing you again @prj, but can you tell me how, in this case, I can move my results to the ThPlt mesh? After the buildMat command, Th is refined but ThPlt is not.
I was doing it because I need to compute integral values from the computed (distributed) variable. Is it possible to do that while keeping it distributed?
Yes, there are some examples in the folder examples/hpddm, using, e.g., the partition of unity. It depends on what you compute exactly, but you should not use the initial global mesh for something as trivial as computing integrals.
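For a plain integral, the usual pattern looks roughly like the following sketch (names are placeholders; it assumes a distributed Mat A created with createMat on Th, whose partition of unity is available as A.D):

fespace Vh(Th, P1);       // same space that was used to create A
Vh u;                     // distributed solution, e.g. u[] = A^-1 * rhs
Vh pou;
pou[] = A.D;              // partition of unity, so the overlap is not counted twice
real locInt = int3d(Th)(pou * u);                     // contribution of this rank
real globInt;
mpiAllReduce(locInt, globInt, mpiCommWorld, mpiSUM);  // global value on every rank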
In my case I need to compute the power deposition arising from an electrical current distribution, which is evaluated starting from the computed magnetic vector potential (basically solving the transient Ampère law).
What I have been doing up to now is:
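In formulas, what I call deposited power is roughly the following (assuming a standard eddy-current A-formulation, up to source-current terms, so this is only the idea and not exactly my script):

\mathbf{J} \approx -\sigma\,\partial_t \mathbf{A}, \qquad P = \int_\Omega \frac{|\mathbf{J}|^2}{\sigma}\,\mathrm{d}\Omega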
1. Computing the distributed solution (the magnetic vector potential) by solving the algebraic system (sol = A^-1 * b) defined on the distributed mesh
2. “Moving” the computed magnetic vector potential from the distributed mesh to the global one (ThPlt) with a short list of commands (roughly along the lines of the sketch after this list)
3. Computing, from the obtained [AgxS, AgyS, AgzS] and [Ag0xS, Ag0yS, Ag0zS], the information I need (i.e. the current distribution and the deposited power) on the global mesh (ThPlt)
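For step 2, the transfer I mean is roughly the usual restrict/mpiAllReduce pattern sketched below (placeholder names and element; it assumes the new-to-old array n2o was saved with the ThN2O macro before buildDmesh, that A.D gives the partition of unity, and that Th and ThPlt are at the same refinement level, which is exactly what breaks once buildMat refines Th):

// earlier in the script: int[int] n2o;  macro ThN2O()n2o// EOM  (before buildDmesh)
fespace Vh(Th, P1);           // local space on the distributed mesh
fespace VhPlt(ThPlt, P1);     // space on the global mesh used for post-processing
Vh Ax;                        // one component of the computed potential (placeholder)
int[int] rest = restrict(Vh, VhPlt, n2o);             // local-to-global dof map
Ax[] .*= A.D;                                         // weight by the partition of unity
real[int] tmp(VhPlt.ndof), glob(VhPlt.ndof);
tmp = 0;
tmp(rest) = Ax[];                                     // scatter local values into a global-sized vector
mpiAllReduce(tmp, glob, mpiCommWorld, mpiSUM);        // sum the contributions of all ranks
VhPlt AgxS;
AgxS[] = glob;                                        // global field, identical on every rank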
So what you recommend is to skip step 2 and perform step 3 in a “distributed way”. Is that correct?
If this is the case, could you point me to a more precise reference within the folder you mentioned, where I can take inspiration?