RAM usage during parallel simulation

Dear FF users and developers,
I’m writing because I don’t fully understand what happens to the RAM while solving a transient problem with PETSc.
I have developed a code that works correctly, and I’m now trying to optimize it to reduce the computational time of each iteration (keeping the number of cores fixed).
Looking at what happens during the simulation, it seems that the software allocates the RAM it needs for an iteration and then deallocates it once the iteration is completed, so the entire allocation has to be repeated at every time step. This seems extremely inefficient to me. Does anyone know the reason for this behavior?

To investigate the reason for this behavior on my own, I tried to use the command “storageused()”, but I’m not sure how to use it: whatever I do, it always returns zero. Can someone suggest how to use storageused() properly, or another practical way to produce a “memory usage report” at each iteration?
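For example, this is roughly what I am doing inside the time loop (just a sketch; nSteps and the solve stand for what is in my actual script):

for (int it = 0; it < nSteps; it++) {
    // ... assemble and solve the current time step as in my script ...
    cout << "step " << it << ": storageused() = " << storageused() << endl; // always prints 0
}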

Thanks a lot for your precious help,
Yours sincerely,

Marco

This is the expected behavior. Why do you say that “this is extremely inefficient”?

Thanks a lot @prj for your prompt reply!
Well, this was my own supposition: I was thinking that if it were possible to allocate the required RAM once (at the first iteration) and then only perform the computations afterwards (so that only the CPU usage would fluctuate…), the iteration time would decrease.
Of course, this supposition is based on my limited knowledge, as I am quite new to this field, so I would be glad if you could explain why I’m wrong and why this behaviour is actually the better solution.

Thanks again for the precious help that you always kindly provide to me and to all the users.

Marco

Allocating/deallocating memory is usually the last thing you should worry about. Please run your code with the additional command line parameter -log_view and send the output. We will be able to see the costliest operations of your script and figure out what really needs to be optimized.
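For example, if you launch the script with ff-mpirun, just append the option to your usual command line (the script name and number of processes below are placeholders):

ff-mpirun -np 4 yourscript.edp -log_view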

Good morning @prj. I have started the run, but it has not finished yet. In the meantime, I can attach here the output produced up to the point the run has reached:

FFMPI_relevant.log (67.0 KB)

I do not know whether this is enough or whether we have to wait for the end of the run.
Thanks a lot for your precious help

Marco

This is not what I’m asking for. Just run a couple of steps and post the result of -log_view, not your code.

Well, this is actually the last part of the log file that was produced, not my code. Since I’m working on a remote cluster, this is the only way I have to see the output of the code.
If it is not enough, I will try to run it on a local machine so that I can see the output printed directly on screen.

You can redirect the standard output (stdout) to a log file:

YourFreeFemCommandAndOptions &> TheFileName
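For instance, combined with the -log_view option mentioned above (with placeholder names):

ff-mpirun -np 4 yourscript.edp -log_view &> run.log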

The option -log_view should print text to your screen/terminal/log file at the end of the job, no matter whether you run the job interactively or using a scheduler. If not, you are not using the option correctly.

OK, sorry, I had not understood that the job needed to finish first!
Now it should be fine:

FFMPI_log_view.log (12.6 KB)

Thanks again for your help!

Marco

OK, we can now see that a huge amount of time is spent factorizing the coefficient matrices (the line MatLUFactorNum 3 1.0 7.0932e+03, i.e. roughly 7,000 seconds over three numerical LU factorizations). What kind of problem are you solving? Maybe there are better preconditioners than a plain exact LU factorization.

So, I’m solving an electrodynamic problem (the eddy-current equation) using Edge03d elements.
Do you have any suggestion on which preconditioner to use, or a reference that describes how to choose the preconditioner as a function of the problem/matrix?

Maybe I’m wrong, but I think you could use AMS, which is much more efficient than plain LU. See FreeFem-sources/maxwell-3d-PETSc.edp at develop · FreeFem/FreeFem-sources · GitHub and, if you are interested, this reference: Parallel Auxiliary Space AMG for H(Curl) Problems (Journal Article) | DOE PAGES.
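With the FreeFEM PETSc interface, switching the preconditioner essentially amounts to changing the option string passed to set() on your Mat. A rough sketch (A stands for the matrix you already assemble; note that AMS also needs the discrete gradient and the mesh coordinates, which are built in the maxwell-3d-PETSc.edp example linked above):

// A is the distributed Mat assembled for the Edge03d eddy-current problem
set(A, sparams = "-ksp_type gmres -ksp_monitor -pc_type hypre -pc_hypre_type ams");
// AMS requires extra information (discrete gradient, nodal coordinates);
// see how it is constructed and supplied in maxwell-3d-PETSc.edp before reusing this.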

Thanks a lot!!! I will try it and let you know whether I manage to reduce the CPU time thanks to your suggestion.

Again, a warm thank you

Marco