Dear FF users and developers,
I’m writing here because I don’t fully understand what happens to the RAM while solving a transient problem with PETSc.
I have developed a code that works correctly, and I’m now trying to optimize it to reduce the computational time of each iteration (keeping the number of cores fixed).
Watching what happens during the simulation, it seems that the software allocates the RAM it needs for an iteration and then deallocates it once the iteration is completed, so the entire allocation has to be redone at every time step. This looks extremely inefficient to me. Does anyone know the reason for this behavior?
To try to understand this behavior on my own, I have used the command “storageused()”, but I’m not sure how to use it, since whatever I do it always returns zero. Could someone suggest how to use storageused() properly, or another practical way to produce a “memory usage report” at each iteration?
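For reference, here is roughly the calling pattern I have tried (a minimal sketch on a toy problem, not my actual code; I am also assuming storagetotal() is the companion counter to storageused()):

```
// minimal sketch: print FreeFEM's memory counters once per "time step"
// (the Laplace toy problem below stands in for my actual transient solve)
mesh Th = square(50, 50);
fespace Vh(Th, P1);
Vh u, v;
problem laplace(u, v)
  = int2d(Th)(dx(u)*dx(v) + dy(u)*dy(v))
  - int2d(Th)(1.*v);
for (int it = 0; it < 10; it++) {
  laplace; // one pseudo time step
  cout << "iter " << it
       << ": storageused = " << storageused()
       << ", storagetotal = " << storagetotal() << endl;
}
```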
Thanks a lot for your valuable help,
Yours sincerely,
Thanks a lot @prj for your prompt reply!
Well, this was my personal supposition: I was wondering whether it would be possible to allocate the required RAM once (at the first iteration) and then only perform the computations at later iterations (so that one would observe oscillations in CPU usage only…), which I assumed would decrease the iteration time.
Of course, this supposition is based on my limited knowledge; I am quite new to this field, so I would be glad if you could explain why I’m wrong and why this behaviour is actually the best approach.
Thanks again for the valuable help that you always kindly provide to me and to all the other users.
Allocating/deallocating memory is usually the last thing you should worry about. Please run your code with the additional command line parameter -log_view and send the output. We will be able to see the costliest operations of your script and figure out what really needs to be optimized.
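For instance, assuming the script is launched with ff-mpirun (the script name and core count below are placeholders), the option is simply appended to the command line and is forwarded to PETSc:

```
# hypothetical launch line; substitute your own script and core count
ff-mpirun -np 8 mysolver.edp -log_view
```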
Good morning @prj. I’ve started the run, but it has not finished yet. In the meantime, I can attach here the output produced up to the point the run has reached:
Well, this is actually the last part of the log file produced, not my code. Since I’m working on a remote cluster, this is the only way I have to see the output of the code.
If that is not enough, I am trying to run it on a local machine so I can see the output printed directly on the screen.
The option -log_view should print text to your screen/terminal/log file at the end of the job, no matter whether you run the job interactively or using a scheduler. If not, you are not using the option correctly.
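If capturing stdout on the cluster is inconvenient, PETSc can also write the report to a file; something like the following should work (the file name is just an example):

```
# ask PETSc to write the performance summary to a file instead of stdout
ff-mpirun -np 8 mysolver.edp -log_view :petsc_log.txt
```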
OK, we can now see that a huge amount of time is spent factorizing the coefficient matrices (`MatLUFactorNum 3 1.0 7.0932e+03`). What kind of problem are you solving? Maybe there are better preconditioners than plain exact LU.
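For reference, changing the solver is typically just a matter of changing the PETSc options attached to the matrix; a minimal sketch, assuming A is a PETSc Mat as in the FreeFEM PETSc examples (GMRES + GAMG here is purely illustrative, not a recommendation for your specific problem):

```
// sketch: replace the default exact LU with an iterative solver + preconditioner;
// the concrete -ksp_type/-pc_type values are illustrative only
set(A, sparams = "-ksp_type gmres -pc_type gamg -ksp_monitor -ksp_converged_reason");
```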
So, I’m solving an electrodynamic problem (the eddy-current equation) using Edge03d elements.
Do you have any suggestions for which preconditioner to use, or some references describing how the choice of preconditioner depends on the problem/matrix?