Number of processes and memory usage

aszaboa · February 23, 2022, 12:21pm

Dear FreeFem users,

When I run FreeFem in parallel on, e.g., 5 processes, it also starts 10 sleeping processes. These processes seem to consume memory. Should I be worried about this? What is the reason for this? See also the attached image.

I experience this behaviour on Ubuntu 18.04, both in an Oracle VirtualBox virtual machine and on a machine running Ubuntu natively.
I use a recently compiled version of FreeFem (develop branch). I run FreeFem with mpirun --allow-run-as-root -np 5 FreeFem++-mpi .....

prj · February 23, 2022, 1:18pm

What BLAS are you using?

aszaboa · February 23, 2022, 2:45pm

I must admit, I have no idea what the FreeFem compilation process does, so it is likely that I am not looking in the right place. I am sending you the config.log file from the FreeFem-sources directory. Could you please tell me how I can check the BLAS version?
config.log (347.1 KB)

prj · February 23, 2022, 3:20pm

It’s in the file you sent, or in PETSc’s configure.log. It looks alright, though. Are you getting bad performance?

aszaboa · February 23, 2022, 3:28pm

The speed of the calculation is fine, I suppose. I am trying out the steady-state Navier-Stokes (NS) solver with the mAL preconditioner, and it seems to run out of memory quite fast (at least compared to commercial CFD software with about the same mesh size).

I found that changing the end of
string paramsV = "-ksp_type gmres -ksp_pc_side right " + "-ksp_rtol " + tolV + " -ksp_gmres_restart 50 -pc_type asm " + "-pc_asm_overlap 1 -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps";
to -pc_asm_overlap 1 -sub_pc_type ilu helps with the memory usage.
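For reference, the two variants above differ only in the subdomain solver of the additive Schwarz (ASM) preconditioner: an exact LU factorization (here via MUMPS) stores the full fill-in of each subdomain matrix, while ILU keeps a sparser approximate factor, which uses less memory at the cost of a weaker preconditioner (GMRES may need more iterations). A minimal Python sketch that assembles the same option string as paramsV with a switchable subdomain solver; the helper name `asm_velocity_options` and its arguments are hypothetical and not part of FreeFem or PETSc:

```python
def asm_velocity_options(tol_v: float, sub_solver: str = "lu") -> str:
    """Build GMRES + ASM options for PETSc (hypothetical helper).

    sub_solver="lu"  -> exact LU on each subdomain, factorized by MUMPS
    sub_solver="ilu" -> incomplete LU on each subdomain (less memory)
    """
    opts = [
        "-ksp_type gmres",
        "-ksp_pc_side right",
        f"-ksp_rtol {tol_v}",       # tol_v plays the role of tolV in the script
        "-ksp_gmres_restart 50",
        "-pc_type asm",
        "-pc_asm_overlap 1",
    ]
    # Only the subdomain block changes between the two variants.
    if sub_solver == "lu":
        opts += ["-sub_pc_type lu", "-sub_pc_factor_mat_solver_type mumps"]
    elif sub_solver == "ilu":
        opts += ["-sub_pc_type ilu"]
    else:
        raise ValueError(f"unsupported subdomain solver: {sub_solver}")
    return " ".join(opts)


if __name__ == "__main__":
    print(asm_velocity_options(1e-1))         # original paramsV
    print(asm_velocity_options(1e-1, "ilu"))  # lower-memory variant
```

Keeping everything except the subdomain block fixed means that a memory comparison between two runs isolates the effect of the subdomain solver.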

Do you have any suggestions for decreasing the memory requirement (even at the cost of computational speed)?

prj · February 23, 2022, 3:40pm

What’s the size of the problem?

aszaboa · February 23, 2022, 4:56pm

This run only just avoids running out of RAM on a virtual machine with 12 GB of RAM.
Number of triangles: 244925, number of vertices: 47183
Total ndof: 1183721 (in FreeFem numbering)
I am also sending you the output of the -log_view option.
Nonlinear-solver_Re350.log (14.3 KB)

prj · February 23, 2022, 8:21pm

This is a tiny problem, why not use plain LU?

aszaboa · February 24, 2022, 8:24am

For this mesh size, the mAL-preconditioned GMRES method seems to outperform plain LU factorization in terms of memory usage, although LU is slightly faster. The thing is, we want to run this on our desktop machine, and even if we use a cluster, we would like to use only a few processes (4-6 per case) and run several cases in parallel (the number of processes is determined by the scaling of other parts of the workflow, which take up most of the computation time). This is why we are concerned with the memory usage.

Do you have other suggestions for altering the numerics? E.g., using -pc_type gamg instead of -pc_type asm? Or changing the inner iteration tolerance?

And referring back to the original question regarding the additional sleeping processes: do you think I can ignore these, or should I investigate the cause? If the latter, where should I start?

prj · February 24, 2022, 4:47pm

If you stick to low process counts, you could try PCASM + PCLU for the subdomain solvers (maybe with MUMPS instead of the default PETSc LU factorisation). As for the sleeping processes, I don’t know, sorry.

aszaboa · February 24, 2022, 8:46pm

Thanks for the helpful comments. Just to be sure: by PCASM + PCLU for the subdomain solver, you mean the following: -pc_type asm -pc_asm_overlap 1 -sub_pc_type asm -sub_sub_pc_type lu -sub_sub_pc_factor_mat_solver_type mumps? (This seems to run and converge.)

prj · February 24, 2022, 8:51pm

No, it should be -pc_type asm -pc_asm_overlap 1 -sub_pc_type lu -sub_pc_factor_mat_solver_type [mumps,petsc]. Always check on a small problem with -ksp_view to make sure that what you think you are feeding to PETSc is what it is actually using.
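The mix-up comes from PETSc's option prefixes: options starting with sub_ configure the subdomain solver of the outer ASM, so -sub_pc_type asm makes that subdomain preconditioner another ASM, and the sub_sub_ options then configure the subdomain solver one level deeper. A minimal Python sketch that splits an option string into name/value pairs to make this visible; the helper `parse_petsc_options` is hypothetical (not part of FreeFem or petsc4py) and only inspects the string, so -ksp_view remains the authoritative check of what PETSc actually uses:

```python
from typing import Dict, Optional


def parse_petsc_options(opts: str) -> Dict[str, Optional[str]]:
    """Split a PETSc option string into {name: value} (hypothetical helper).

    A token starting with '-' is an option name; the next token is its value
    unless it also starts with '-'. Caveat: negative numeric values such as
    '-1' would be mistaken for option names.
    """
    parsed: Dict[str, Optional[str]] = {}
    tokens = opts.split()
    i = 0
    while i < len(tokens):
        name = tokens[i].lstrip("-")
        value = None
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            value = tokens[i + 1]
            i += 1  # consume the value token as well
        parsed[name] = value
        i += 1
    return parsed


if __name__ == "__main__":
    corrected = ("-pc_type asm -pc_asm_overlap 1 -sub_pc_type lu "
                 "-sub_pc_factor_mat_solver_type mumps")
    nested = ("-pc_type asm -pc_asm_overlap 1 -sub_pc_type asm "
              "-sub_sub_pc_type lu -sub_sub_pc_factor_mat_solver_type mumps")
    for label, s in (("corrected", corrected), ("nested", nested)):
        p = parse_petsc_options(s)
        # The subdomain preconditioner of the outer ASM is set by sub_pc_type.
        print(label, "-> subdomain PC:", p.get("sub_pc_type"))
```

With the corrected string the subdomain preconditioner is LU; with the nested string it is another ASM, and LU only appears at the sub_sub_ level.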

aszaboa · February 25, 2022, 9:32am

I see what I mixed up. Thanks for the tip.

Just two more questions: as I said previously, for PCASM, -sub_pc_type ilu seems to outperform -sub_pc_type lu. Do you think it is worth experimenting with the -sub_ksp_type option? And do you have suggestions for other methods that could be used for the subdomain solver?

prj · February 25, 2022, 1:30pm

Finding the most efficient solver is a job in itself. There are too many options to just list them here; it takes time to adjust and fine-tune such a method.

aszaboa · February 28, 2022, 7:16am

Ok, thanks for the tips!

aszaboa · May 20, 2022, 1:16pm

Regarding the additional sleeping processes, switching from Open MPI to MPICH helped; see this post for more details.
