jp@jp:~/Descargas$ mpiexec -n 1 FreeFem++ poisson_gpu.edp -log_view
-- FreeFem++ v4.14 (Wed 20 Nov 2024 17:13:44 -05 - git v4.14-95-g6fcdacf31)
file : poisson_gpu.edp
Load: lg_fem lg_mesh lg_mesh3 init_mesh3_array
eigenvalue
1 : load "PETSc"
2 :
3 : macro defKSP_OPTIONS() "-ksp_type cg -pc_type jacobi -vec_type cuda -mat_type aijcusparse -ksp_monitor -log_view -ksp_view" ////
4 :
5 : int nx = 1000, ny = 500;
6 : real lx = 1.0, ly = 1.0;
7 : mesh Th = square(nx, ny, [x * lx, y * ly]);
8 :
9 : fespace Vh(Th, P1);
10 :
11 : Vh u, v;
12 : func f = 1.0;
13 :
14 : solve Poisson(u, v, solver = "petsc") =
15 : int2d(Th)(dx(u) * dx(v) + dy(u) * dy(v)) -
16 : int2d(Th)(f * v);
17 :
18 : cout << "System solved using GPU with PETSc." << endl;
19 :
20 : sizestack + 1024 =1768 ( 744 )
-- Square mesh : nb vertices =501501 , nb triangles = 1000000 , nb boundary edges 3000 rmdup= 0
-- Solve :
min -1.55651e+10 max -1.55651e+10
System solved using GPU with PETSc.
times: compile 0.074077s, execution 7.02759s, mpirank:0
######## unfreed pointers 23 Nb pointer, 0Bytes , mpirank 0, memory leak =17228288
CodeAlloc : nb ptr 4165, size :560176 mpirank: 0
Ok: Normal End
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
FreeFem++ on a named jp with 1 process, by jp on Fri Nov 22 11:43:32 2024
Using Petsc Development GIT revision: v3.22.1-208-g2ad7182b109 GIT Date: 2024-11-20 12:46:14 -0600
                          Max       Max/Min     Avg       Total
Time (sec):            7.083e+00     1.000   7.083e+00
Objects:               0.000e+00     0.000   0.000e+00
Flops:                 0.000e+00     0.000   0.000e+00  0.000e+00
Flops/sec:             0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Count:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):   0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:        0.000e+00     0.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 7.0831e+00 100.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
CpuToGpu Count: total number of CPU to GPU copies per processor
CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
GpuToCpu Count: total number of GPU to CPU copies per processor
GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
GPU %F: percent flops on GPU in this event
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
--- Event Stage 0: Main Stage
Object Type Creations Destructions. Reports information only for process 0.
--- Event Stage 0: Main Stage
========================================================================================================================
Average time to get PetscTime(): 2.37e-08
#PETSc Option Table entries:
-log_view # (source: command line)
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-mpi=1 --with-cuda=1 --with-cudac=nvcc --download-fblaslapack --with-debugging=0 --prefix=/usr/local/petsc --with-shared-libraries=1 --download-hpddm --download-metis --download-ptscotch --download-parmetis --download-superlu --download-mmg --download-parmmg --download-scalapack --download-mumps
Libraries compiled on 2024-11-20 21:56:11 on jp
Machine characteristics: Linux-6.8.0-49-generic-x86_64-with-glibc2.35
Using PETSc directory: /usr/local/petsc
Using PETSc arch:
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g -O
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O
Using include paths: -I/usr/local/petsc/include -I/usr/local/cuda-12.2/include
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/usr/local/petsc/lib -L/usr/local/petsc/lib -lpetsc -Wl,-rpath,/usr/local/petsc/lib -L/usr/local/petsc/lib -Wl,-rpath,/usr/local/cuda-12.2/lib64 -L/usr/local/cuda-12.2/lib64 -L/usr/local/cuda-12.2/lib64/stubs -Wl,-rpath,/usr/local/openmpi/lib -L/usr/local/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -ldmumps -lmumps_common -lpord -lpthread -lscalapack -lsuperlu -lflapack -lfblas -lparmmg -lmmg -lmmg3d -lptesmumps -lptscotchparmetisv3 -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lparmetis -lmetis -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lrt -lquadmath
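
Note on the summary above: PETSc reports zero Flops and no CpuToGpu/GpuToCpu copies, and the options collected in the defKSP_OPTIONS macro (script line 3) are never referenced again, which suggests the linear system was not actually solved through PETSc's KSP, let alone on the GPU. For comparison, below is a minimal sketch of the varf/Mat pattern used by the PETSc examples shipped with FreeFem, assuming the buildDmesh/createMat helpers from macro_ddm.idp and the CUDA-enabled PETSc build shown in the configure options. The Dirichlet condition on(1, 2, 3, 4, u = 0) is an addition (the script above has none), and the option string and file name are illustrative, not taken from the log.

// poisson_petsc_sketch.edp -- hypothetical variant, not the script that produced the log above
load "PETSc"                        // FreeFem PETSc plugin
macro dimension()2// EOM            // needed by macro_ddm.idp
include "macro_ddm.idp"             // provides the buildDmesh/createMat helpers

int nx = 1000, ny = 500;
real lx = 1.0, ly = 1.0;
mesh Th = square(nx, ny, [x * lx, y * ly]);   // same global mesh as in the log
buildDmesh(Th)                      // distribute the mesh (trivial with -n 1)

fespace Vh(Th, P1);
func f = 1.0;
varf vPoisson(u, v) = int2d(Th)(dx(u) * dx(v) + dy(u) * dy(v))
                    + int2d(Th)(f * v)
                    + on(1, 2, 3, 4, u = 0);  // Dirichlet BC added so CG has a nonsingular system

Mat A;
createMat(Th, A, P1)                // PETSc Mat built on the P1 numbering
A = vPoisson(Vh, Vh, tgv = -1);     // assemble the matrix (tgv = -1: exact treatment of Dirichlet rows)
real[int] b = vPoisson(0, Vh, tgv = -1);      // assemble the right-hand side

// hand the KSP options to PETSc; -vec_type cuda / -mat_type aijcusparse request the CUDA back end
set(A, sparams = "-ksp_type cg -pc_type jacobi -vec_type cuda -mat_type aijcusparse -ksp_monitor");

Vh u;
u[] = A^-1 * b;                     // this solve goes through PETSc's KSP and is recorded by -log_view
cout << "KSP solve finished." << endl;

Run with the same mpiexec invocation as above; once the solve goes through A^-1, the Main Stage of -log_view should report nonzero flop counts and CpuToGpu/GpuToCpu entries for the KSPSolve/MatMult-type events.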