Unused parallel `createMat` leads to better performance but wrong results

Hi all,

I’m using FF to solve an eigenproblem but I have come across some problems with SLEPc.

Problem

I’d like to compute the hydrogen atom electron wave functions, and I’ve done it on a coarse mesh (40x40x40). But on a finer grid, for example 100x100x100, EigenValue (which uses ARPACK) fails, and even when it can handle the problem, it takes much, much longer on a single CPU core. So I turned to SLEPc.
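
For concreteness, the problem being discretized is the time-independent Schrödinger equation for the electron, written here in the eV/Å units that the constants ka and kc in the MWE below correspond to:

$$
-\frac{\hbar^2}{2 m_e}\,\Delta\psi \;-\; \frac{e^2}{4\pi\varepsilon_0}\,\frac{\psi}{\sqrt{x^2+y^2+z^2}} \;=\; E\,\psi
$$

In the variational form I also subtract a shift sigma from the left-hand side, so the eigenvalues printed in the logs below are E - sigma rather than E itself.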

Following the example from here, I copied it and plugged in the hydrogen atom Schrödinger equation. The results then seemed strange:

Unused createMat(Th, A, Pk)

I happened to write two redundant lines of

Mat A;
createMat(Th, A, Pk);

but never used A again. Running with ff-mpirun -np $NUM_PROCS then gave a remarkable performance improvement, but the results varied as NUM_PROCS changed. Here are the results:

!!! ff-mpirun -np 1
!!! Eigenvalues:  
 ---- 0 4.395276e+00 == 
 ---- 1 1.096251e+01 == 
 ---- 2 1.096251e+01 == 
 ---- 3 1.119920e+01 == 
 ---- 4 1.139153e+01 == 
 ---- 5 1.255011e+01 == 
 ---- 6 1.255011e+01 == 
 ---- 7 1.259630e+01 == 
 ---- 8 1.259631e+01 == 
 ---- 9 1.261844e+01 == 
times: compile 2.000000e-01s, execution 4.920000e+01s,  mpirank:0

!!! ff-mpirun -np 20
!!! Eigenvalues:
 ---- 0 2.048881e+00 == 
 ---- 1 1.012404e+01 == 
 ---- 2 1.046671e+01 == 
 ---- 3 1.079496e+01 == 
 ---- 4 1.225954e+01 == 
 ---- 5 1.235036e+01 == 
 ---- 6 1.240903e+01 == 
 ---- 7 1.251413e+01 == 
 ---- 8 1.271387e+01 == 
 ---- 9 1.299447e+01 == 
times: compile 1.800000e-01s, execution 4.420000e+00s,  mpirank:0

The -np 20 run is about 10x faster than the -np 1 run, but the eigenvalues differ.

Comment out the two lines

If I comment out the two unused lines, the results are shown below:

!!! ff-mpirun -np 1
!!! Eigenvalues:
 ---- 0 4.39528 == 
 ---- 1 10.9625 == 
 ---- 2 10.9625 == 
 ---- 3 11.1992 == 
 ---- 4 11.3915 == 
 ---- 5 12.5501 == 
 ---- 6 12.5501 == 
 ---- 7 12.5963 == 
 ---- 8 12.5963 == 
 ---- 9 12.6184 == 
times: compile 0.11s, execution 49.24s,  mpirank:0

!!! ff-mpirun -np 20
!!! Eigenvalues:
 ---- 0 4.39528 == 
 ---- 1 10.9625 == 
 ---- 2 10.9625 == 
 ---- 3 11.1992 == 
 ---- 4 11.3915 == 
 ---- 5 12.5501 == 
 ---- 6 12.5501 == 
 ---- 7 12.5963 == 
 ---- 8 12.5963 == 
 ---- 9 12.6184 == 
times: compile 0.11s, execution 55.49s,  mpirank:0

The eigenvalues now agree, but the runtime barely changes, which means this script does not actually run in parallel.

I’m sure there are issues in my code, but I’m still a newbie to FreeFEM.
How can I fix it? Thanks in advance!

MWE

load "SLEPc"
include "macro_ddm.idp"
include "cube.idp"

macro dimension() 3 // EOM
macro Grad(u) [dx(u), dy(u), dz(u)] // EOM

int nn = 40;

real ka = 3.8099821161548593; // hbar^2 / 2m_e
real kc = 14.39964547842567; // e^2 / (4pi epsilon_0)

mesh3 Th = cube(nn, nn, nn, [40*x-20, 40*y-20, 40*z-20]);
// 1 => y=0
// 2 => x=1
// 3 => y=1
// 4 => x=0
// 5 => z=0
// 6 => z=1

func Pk = P1;

//Mat A;                // !!! 
//createMat(Th, A, Pk); // !!! Uncomment these two lines to see the difference
fespace Vh(Th, P1);

cout << "Th :  nv = " << Th.nv << " nt = " << Th.nt << endl;

real sigma = -14;

varf a(u1, u2) = int3d(Th) (
    Grad(u1)' * Grad(u2) * ka
    - 1.0 / sqrt(x^2 + y^2 + z^2) * u1 * u2 * kc
    - sigma * u1 * u2
    )
    + on(5, 6, u1=0.0)
    ;

varf b([u1], [u2]) = int3d(Th) ( u1 * u2 ) ;

matrix LocA = a(Vh, Vh);
matrix LocB = b(Vh, Vh);

int nev = 10;
real[int] ev(nev);
Vh[int] eV(nev);

string ssparams = "-eps_nev " + nev 
				+ " -eps_type krylovschur"
				+ " -eps_target " + sigma
				//+ " -eps_view "
				+ " -st_type sinvert "
				+ " -eps_gen_hermitian";

if (!mpirank) {
		cout << "----------  " << ssparams << "  ----------" << endl;
}

Mat DistA(LocA);
Mat DistB(LocB);

int k = EPSSolve(
		DistA, DistB, 
		vectors = eV,
		values = ev,
		sparams = ssparams
);


k = min(k, nev);
int[int] Order= [1];

if (!mpirank) {
	for (int i=0; i<k; ++i) {
			//macro params()cmm = "Eigenvalue #" + i + " = " + ev[i], wait = 1, fill = 1//
			cout << " ---- " << i << " " << ev[i] << " == " << endl;
			//plotD(Th, eV[i], params)
			//int[int] fforder = [1];
			//savevtk("EV_" + i + ".vtu", Th, eV[i], order=fforder);
	}
}

Logs

logs.zip (200.2 KB)

Mat DistA(LocA);
Mat DistB(LocB);

can only be used to define “sequential” matrices, each living on a single MPI process.
If you call createMat, you decompose your mesh into -np 20 subdomains. So what you are doing is basically solving -np 20 “small” eigenvalue problems, one on each process. You are not solving the original “big” problem in a distributed fashion. Please look at the many *-SLEPc.edp examples at https://github.com/FreeFem/FreeFem-sources/tree/develop/examples/hpddm for the proper calls to solve the original “big” problem correctly.

Also, when you say:

Following the example from here, I copied it

That is not quite what the example does: the lines there read Mat B(A, LocB), not Mat B(LocB). Passing A is what enables correct parallelism.
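
Schematically, keeping your own variable names, the structure should look something like this (an untested sketch; see the linked examples for complete, working scripts):

Mat A;
createMat(Th, A, Pk); // decomposes Th: from here on, Th (and hence Vh) is local to each process
fespace Vh(Th, Pk);

// same varf a(u1, u2) and varf b([u1], [u2]) as in your MWE

matrix LocA = a(Vh, Vh); // local matrices, assembled on the local (sub)mesh
matrix LocB = b(Vh, Vh);

A = LocA;       // fill the distributed operator with the local stiffness matrix
Mat B(A, LocB); // reuse the parallel layout of A for the mass matrix

int k = EPSSolve(A, B, vectors = eV, values = ev, sparams = ssparams);

With that layout, EPSSolve solves a single global problem distributed across all processes instead of one independent problem per process.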


Thanks for the instructions! I’ll try it now!

The lines there read Mat B(A, LocB), not Mat B(LocB). Passing A is what enables correct parallelism.

I’ve fixed it with Mat B(A, LocB) and it now runs as I expected. FreeFEM is great software that really saved me!

I’m grateful for your help.