RAM usage during parallel simulation

Dear FF users and developers,
I’m writing because I don’t fully understand what happens to the RAM while solving a transient problem with PETSc.
I have developed a code that works correctly, and I’m now trying to optimize it to reduce the computational time of each iteration (keeping the number of cores fixed).
Looking at what happens during the simulation, it seems that the software allocates the RAM it needs for an iteration and then deallocates it once the iteration is completed, so the entire allocation has to be repeated at every time step. This seems extremely inefficient to me. Does anyone know the reason for this behavior?

To investigate the reason for this behavior on my own, I tried to use the command “storageused()”, but I’m not sure how to use it: whatever I do, it always returns zero. Can someone suggest how to use storageused() properly, or another practical way to produce a “memory usage report” at each iteration?
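For example, this is roughly what I am doing inside the time loop (just a sketch; nSteps and the solve stand for what is in my actual script):

for (int it = 0; it < nSteps; it++) {
    // ... assemble and solve the current time step as in my script ...
    cout << "step " << it << ": storageused() = " << storageused() << endl; // always prints 0
}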

Thanks a lot for your precious help,
Yours sincerely,

Marco

This is the expected behavior. Why do you say that “this is extremely inefficient”?

Thanks a lot @prj for your prompt reply!
Well, this was my own supposition: I was thinking that if it were possible to allocate the required RAM once (at the first iteration) and then only perform the computations afterwards (so that only the CPU usage would fluctuate…), the iteration time would decrease.
Of course, this supposition is based on my limited knowledge, as I am quite new to this field, so I would be glad if you could explain why I’m wrong and why this behaviour is actually the better solution.

Thanks again for the precious help that you always kindly provide to me and to all the users.

Marco

Allocating/deallocating memory is usually the last thing you should worry about. Please run your code with the additional command line parameter -log_view and send the output. We will be able to see the costliest operations of your script and figure out what really needs to be optimized.
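For example, if you launch the script with ff-mpirun, just append the option to your usual command line (the script name and number of processes below are placeholders):

ff-mpirun -np 4 yourscript.edp -log_view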

Good morning @prj. I have started the run, but it has not finished yet. In the meantime, I can attach here the output produced up to the point the run has reached:

FFMPI_relevant.log (67.0 KB)

I do not know whether this is enough or whether we have to wait for the end of the run.
Thanks a lot for your precious help

Marco

This is not what I’m asking for. Just run a couple of steps and post the result of -log_view, not your code.

Well, this is actually the last part of the log file that was produced, not my code. Since I’m working on a remote cluster, this is the only way I have to see the output of the code.
If it is not enough, I will try to run it on a local machine so that I can see the output printed directly on screen.

You can redirect the standard output (stdout) to a log file:

YourFreeFemCommandAndOptions &> TheFileName
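For instance, combined with the -log_view option mentioned above (with placeholder names):

ff-mpirun -np 4 yourscript.edp -log_view &> run.log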

The option -log_view should print text to your screen/terminal/log file at the end of the job, no matter whether you run the job interactively or using a scheduler. If not, you are not using the option correctly.

OK, sorry, I had not understood that the job needed to finish first!
Now it should be fine:

FFMPI_log_view.log (12.6 KB)

Thanks again for your help!

Marco

OK, we can now see that a huge amount of time is spent factorizing the coefficient matrices (the line MatLUFactorNum 3 1.0 7.0932e+03, i.e. roughly 7,000 seconds over three numerical LU factorizations). What kind of problem are you solving? Maybe there are better preconditioners than a plain exact LU factorization.

So, I’m solving an electrodynamic problem (the eddy-current equation) using Edge03d elements.
Do you have any suggestion on which preconditioner to use, or a reference that describes how to choose the preconditioner as a function of the problem/matrix?

Maybe I’m wrong, but I think you could use AMS, which is much more efficient than plain LU. See FreeFem-sources/maxwell-3d-PETSc.edp at develop · FreeFem/FreeFem-sources · GitHub and, if you are interested, this reference: Parallel Auxiliary Space AMG for H(Curl) Problems (Journal Article) | DOE PAGES.
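With the FreeFEM PETSc interface, switching the preconditioner essentially amounts to changing the option string passed to set() on your Mat. A rough sketch (A stands for the matrix you already assemble; note that AMS also needs the discrete gradient and the mesh coordinates, which are built in the maxwell-3d-PETSc.edp example linked above):

// A is the distributed Mat assembled for the Edge03d eddy-current problem
set(A, sparams = "-ksp_type gmres -ksp_monitor -pc_type hypre -pc_hypre_type ams");
// AMS requires extra information (discrete gradient, nodal coordinates);
// see how it is constructed and supplied in maxwell-3d-PETSc.edp before reusing this.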

Thanks a lot!!! I will try it and let you know whether I manage to reduce the CPU time thanks to your suggestion.

Again, a warm thank you

Marco