Dear FF users and developers,
I’m writing because I don’t fully understand what happens to the RAM during the solution of a transient problem with PETSc.
I have developed a code that works correctly, and I’m now trying to optimize it to reduce the computational time of each iteration (with a fixed number of cores).
Looking at what happens during the simulation, it seems that the software allocates the RAM it needs for an iteration and then deallocates it once the iteration is completed, so the entire amount has to be reallocated at each time step. This seems extremely inefficient to me. Does anyone know the reason for this behavior?
To try to understand this on my own, I tried the command “storageused()”, but I’m not sure how to use it, since whatever I do it always returns zero. Can someone suggest how to use storageused properly, or another way to produce a “memory usage report” at each iteration?
Thanks a lot for your precious help,
This is the expected behavior. Why are you saying that “this is extremely inefficient”?
Thanks a lot @prj for your prompt reply!
Well, this was just my assumption. I was wondering whether it would be possible to allocate the required RAM once (at the first iteration) and then only perform the computations (so that only the CPU usage would oscillate). I thought this would decrease the iteration time.
Of course, this assumption was based on my limited knowledge. I am pretty new to this field, so I would be glad if you could explain why I’m wrong and why this behavior is actually the best solution.
Thanks again for the precious help that you always kindly provide to me and to all the users.
Allocating/deallocating memory is usually the last thing you should worry about. Please run your code with the additional command line parameter -log_view and send the output. We will be able to see the costliest operations of your script and figure out what really needs to be optimized.
Good morning @prj. I’ve started running the code, but it has not finished yet. In the meantime, I can attach the output up to the point the run has reached:
FFMPI_relevant.log (67.0 KB)
I do not know if it is enough or if we have to wait for the end of the run.
Thanks a lot for your precious help
This is not what I’m asking for. Just run a couple of steps and post the result of -log_view, not your code.
Well, this is actually the last part of the log file produced, not my code. Since I’m working on a remote cluster, this is the only way I have to see the output of the code.
If it is not enough, I will try to run it on a local machine so that I can see the output printed directly on screen.
You can redirect the standard output (stdout) to a logfile:
YourFreeFemCommandAndOptions &> TheFileName
-log_view should print text to your screen/terminal/log file at the end of the job, no matter whether you run the job interactively or using a scheduler. If not, you are not using the option correctly.
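One caveat on the redirection: `&>` is a bash extension, so in a job script run by a plain POSIX `sh` the portable spelling `> file 2>&1` is safer. Below is a minimal sketch of that form. The function `fake_job` is a hypothetical stand-in for the real FreeFEM command, used only so the snippet runs anywhere, and the script name and process count in the comment are placeholders.

```shell
# Hypothetical stand-in for the real FreeFEM job (illustration only):
# it writes one line to stdout and one line to stderr.
fake_job() {
    echo "solver output"
    echo "solver warning" >&2
}

# With the real job, the equivalent call would look something like
#   ff-mpirun -np 4 yourscript.edp -log_view > run.log 2>&1
# where "yourscript.edp" and "4" are placeholders.

logfile=$(mktemp)

# Portable equivalent of `fake_job &> "$logfile"`:
# send stdout to the file, then point stderr at the same destination.
fake_job > "$logfile" 2>&1

# Both streams should now be in the log file.
line_count=$(wc -l < "$logfile" | tr -d ' ')
echo "lines captured: $line_count"
```

The order matters: `2>&1` has to come after `> "$logfile"`, otherwise stderr is duplicated onto the terminal before stdout is redirected.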
OK, sorry, I hadn’t understood that the job needed to finish!
Now it should be fine:
FFMPI_log_view.log (12.6 KB)
Thanks again for your help!
OK, we can now see that a huge amount of time is spent factorizing the coefficient matrices:
MatLUFactorNum 3 1.0 7.0932e+03
What kind of problem are you solving? Maybe there are better preconditioners than plain exact LU.
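To put that number in perspective: in the event table printed by -log_view, the columns right after the event name are the call count, its max/min ratio across processes, and the maximum time in seconds. Here is a small sketch that works out the time per factorization from the line quoted above. The variable names are illustrative, and only the leading columns of the line are used.

```shell
# The event line quoted from the -log_view output in this thread.
line='MatLUFactorNum 3 1.0 7.0932e+03'

# Column 2 is the call count, column 4 is the maximum time in seconds.
count=$(echo "$line" | awk '{print $2}')
total=$(echo "$line" | awk '{print $4}')

# Average time per numerical factorization.
per_call=$(echo "$line" | awk '{printf "%.1f", $4 / $2}')

echo "$count factorizations, $total s in total, $per_call s each"
```

That is roughly 39 minutes for each of the three numerical factorizations, which is why the factorization, rather than memory allocation, is the place to look first.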
So, I’m solving an electrodynamic problem (eddy current equation) using Edge03d elements.
Do you have any suggestions on the preconditioner to use, or any references that describe how to choose a preconditioner depending on the problem/matrix?
Thanks a lot!!! I will try it and let you know if I manage to reduce the CPU time thanks to your suggestion.
Again, a warm thank you