I am currently trying to reproduce the large-scale electromagnetic scattering simulation from the Cobra Cavity example shown on the FreeFEM gallery (https://freefem.org/gallery/cobra), where the problem reportedly reaches 198 million degrees of freedom (DoFs) using 6144 cores.
In my attempt to reproduce this result, I used the following command:
To increase the number of DoFs, I adjusted the distx parameter in Maxwell_Cobracavity.edp to 0.12, which enlarges the computational box and hence the mesh. However, once the number of DoFs reaches several tens of millions, the computation runs into out-of-memory errors.
I also found that the -noGlob option in ffddm_parameters.idp avoids building the global mesh, which reduces memory usage. But even with this option, the memory issue persists as the number of DoFs grows further.
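For concreteness, the only change I made in Maxwell_Cobracavity.edp to grow the problem is essentially this line (just a sketch, the rest of the script is unchanged):
real distx = 0.12;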
Here is my current cluster setup:
100 compute nodes
32 CPU cores per node (total 3200 cores)
124 GB memory per node
Given this, I would like to ask:
Would it be possible to share the exact .edp script and command-line options used to achieve the large-scale 198 million DoF computation on 6144 cores?
Are there any additional parameters or best practices needed to successfully scale to this problem size?
Thank you very much for your help and suggestions!
The exact settings of the test case are detailed in Section 6.5 of the paper. It says that the problems are discretized with order 2 edge elements using 10 points per wavelength, so you need to set
func Pk = Edge13d;
func PkP0 = Edge13ds0;
for order 2 (degree 1) edge elements, and
int nloc = 10./mysplit*sec3/lambda;
to use 10 points per wavelength instead of 20.
The mesh for the coarse problem corresponds to a discretization with 3.33 points per wavelength, so there is a refinement factor of 3 between the coarse and the fine mesh. Thus, you need to set the splitting factor to 3:
int mysplit = 3;
Finally, it is said that the sides of the box are placed 4 wavelengths away from the cavity in each direction, so you need to set
real distx = 4*lambda;
real disty = distx;
real distz = distx;
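Putting these changes together, the parameter section of Maxwell_Cobracavity.edp should look roughly like this (only a sketch: sec3 and lambda are defined elsewhere in the original script, and the exact order of the declarations there may differ):
func Pk = Edge13d; // order 2 (degree 1) edge elements
func PkP0 = Edge13ds0;
int mysplit = 3; // refinement factor between the coarse and the fine mesh
int nloc = 10./mysplit*sec3/lambda; // 10 points per wavelength on the fine mesh
real distx = 4*lambda; // box sides 4 wavelengths away from the cavity
real disty = distx;
real distz = distx;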
As you pointed out, you can reduce memory usage by using the -noGlob flag. You can also reduce the maximum size of the Krylov subspace in GMRES by setting a lower maximum number of iterations
u[] = MfGMRES(x0, rhs, 1.e-6, 50, "right");
or by using restarted GMRES with a smaller restart, e.g. with -ffddm_gmres_restart 50.
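For reference, a launch command combining these memory-saving options could look like the line below. This is only a sketch: -np 3200 assumes you use all 100 nodes with 32 cores each, -nw simply disables the graphics windows, and -noGlob 1 assumes the flag takes a 0/1 value as it is read in ffddm_parameters.idp, so adjust the exact form to your version.
ff-mpirun -np 3200 Maxwell_Cobracavity.edp -nw -noGlob 1 -ffddm_gmres_restart 50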
I just reproduced the results of the paper on the Irene supercomputer, using
With your cluster setup, I am afraid the 16 GHz case may be difficult to run, but you should be able to at least reproduce the 10 GHz case without problems.