mpirun and ff-mpirun work on Mac but not on Ubuntu

I have the following Python code, solveMPI.py, to run an embarrassingly parallel job:

import numpy as np
from mpi4py import MPI
# import os
import subprocess

comm = MPI.COMM_WORLD
size = comm.Get_size()


def loop(rank):
    # ffcmd = "ff-mpirun random_source.edp -nProcess {} -process {}".format(size, rank)
    # print(ffcmd)
    # os.system(ffcmd)
    subprocess.run(["ff-mpirun", "random_source.edp", "-nProcess", str(size), "-process", str(rank)], shell=False)


def parallel():
    rank = comm.Get_rank()
    loop(rank)


if __name__ == '__main__':
    parallel()

Then I run the Python code with mpirun -np 10 python3 solveMPI.py.

It works perfectly well on macOS. But on Ubuntu, the command does nothing except print the following text:

'/usr/bin/mpiexec' --oversubscribe /usr/local/bin/FreeFem++-mpi -nw 'random_source.edp' -nProcess 10 -process 1
'/usr/bin/mpiexec' --oversubscribe /usr/local/bin/FreeFem++-mpi -nw 'random_source.edp' -nProcess 10 -process 4
'/usr/bin/mpiexec' --oversubscribe /usr/local/bin/FreeFem++-mpi -nw 'random_source.edp' -nProcess 10 -process 6
'/usr/bin/mpiexec' --oversubscribe /usr/local/bin/FreeFem++-mpi -nw 'random_source.edp' -nProcess 10 -process 3
'/usr/bin/mpiexec' --oversubscribe /usr/local/bin/FreeFem++-mpi -nw 'random_source.edp' -nProcess 10 -process 8
'/usr/bin/mpiexec' --oversubscribe /usr/local/bin/FreeFem++-mpi -nw 'random_source.edp' -nProcess 10 -process 9
'/usr/bin/mpiexec' --oversubscribe /usr/local/bin/FreeFem++-mpi -nw 'random_source.edp' -nProcess 10 -process 5
'/usr/bin/mpiexec' --oversubscribe /usr/local/bin/FreeFem++-mpi -nw 'random_source.edp' -nProcess 10 -process 0
'/usr/bin/mpiexec' --oversubscribe /usr/local/bin/FreeFem++-mpi -nw 'random_source.edp' -nProcess 10 -process 2
'/usr/bin/mpiexec' --oversubscribe /usr/local/bin/FreeFem++-mpi -nw 'random_source.edp' -nProcess 10 -process 7

If I replace the command in subprocess.run with a simple one such as date, then it produces the expected output. I searched the internet for answers but couldn’t solve the problem. I hope the experts here can help me!

BTW, if I copy one of the 10 commands and paste it into the terminal, it runs OK.

Then it probably means that those binaries are not in your path.

I checked and they are in my path.

If I change the subprocess line to

subprocess.run(["ff-mpirun", "--version"], shell=False)

Then I get the expected output

'/usr/bin/mpiexec' --oversubscribe --version
'/usr/bin/mpiexec' --oversubscribe --version
'/usr/bin/mpiexec' --oversubscribe --version
'/usr/bin/mpiexec' --oversubscribe --version
'/usr/bin/mpiexec' --oversubscribe --version
'/usr/bin/mpiexec' --oversubscribe --version
'/usr/bin/mpiexec' --oversubscribe --version
'/usr/bin/mpiexec' --oversubscribe --version
'/usr/bin/mpiexec' --oversubscribe --version
'/usr/bin/mpiexec' --oversubscribe --version
mpiexec (OpenRTE) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/
mpiexec (OpenRTE) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/
mpiexec (OpenRTE) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/
mpiexec (OpenRTE) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/
mpiexec (OpenRTE) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/
mpiexec (OpenRTE) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/
mpiexec (OpenRTE) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/
mpiexec (OpenRTE) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/
mpiexec (OpenRTE) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/
mpiexec (OpenRTE) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/

Once --version is replaced by an edp file, the command is no longer executed.
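To narrow down why the launcher echoes its command line and then appears to do nothing, it may help to look at the child's return code and stderr from inside the Python process rather than relying on what reaches the terminal. The following is only a minimal sketch: the helper name run_and_report is made up for illustration, and the command in the demo is a harmless stand-in, not the real ff-mpirun call.

```python
import shutil
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (resolved_path, returncode, stdout, stderr).

    run_and_report is a hypothetical helper, not part of FreeFEM or mpi4py.
    """
    # Resolve the executable the same way the child process lookup would.
    resolved = shutil.which(cmd[0])
    if resolved is None:
        # Executable is not visible on PATH from inside this process.
        return None, None, "", ""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return resolved, proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Stand-in command; swap in the ff-mpirun argument list from solveMPI.py.
    path, code, out, err = run_and_report([sys.executable, "-c", "print('ok')"])
    print("executable:", path)
    print("return code:", code)
    print("stdout:", out.strip())
    print("stderr:", err.strip())
```

A non-zero return code or a non-empty stderr would show that the command was started and then failed, which is different from not being found on the path.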

You are missing the parameter -np 1.

Adding -np 1 makes no difference. Pasting the command into the terminal works, but running it from the Python script does not. Exactly the same script works on the Mac. I’ll check again to see whether it’s a Python problem.

I replaced the Python script with C++ code:

#include <cstdlib>   // system
#include <iostream>  // cout, endl
#include <string>    // string, to_string
#include <mpi.h>

using namespace std;

int main(int argc, char** argv) {
  int rank, n_ranks;

  // First call MPI_Init
  MPI_Init(&argc, &argv);
  // Get my rank and the number of ranks
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &n_ranks);

  string cmd = "ff-mpirun random_source.edp -nProcess " + to_string(n_ranks) + " -process " + to_string(rank) + " -ne 0";
  cout << cmd << endl;
  system(cmd.c_str());

  // Call finalize at the end
  return MPI_Finalize();
}

Again the command is printed but not executed. I have no idea what to try now.

No problem whatsoever on my end. I just tried with a dummy random_source.edp that only prints to the screen.

What are your Linux distribution and FreeFEM version?

Debian Sid with FreeFEM develop branch.

I also tried the following dummy test.edp file:

cout << "This is a test!" << endl << endl;

Then

os.system("FreeFem++ test.edp")

gives the correct output. But

os.system("ff-mpirun -np 1 test.edp")

just prints the command itself. Since ff-mpirun calls FreeFem++-mpi, I tested the command

os.system("FreeFem++-mpi -nw test.edp")

This time it gives a long error message:

It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  getting local rank failed
  --> Returned value No permission (-17) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value No permission (-17) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "No permission" (-17) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)

Finally, all three commands give the correct output when they are executed directly in the terminal.
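Nothing in this thread confirms the cause. One difference between the terminal and the script is that a child started from a process under mpirun inherits that process's environment, including the OMPI_* and PMIX_* variables that Open MPI sets for each rank. The sketch below is an experiment under the unverified assumption that those inherited variables are what upsets the nested launch: it starts the child with them removed. The helper names scrubbed_env and run_clean are hypothetical, and the demo uses a stand-in child instead of ff-mpirun.

```python
import os
import subprocess
import sys


def scrubbed_env(env=None, prefixes=("OMPI_", "PMIX_")):
    """Return a copy of env without variables that start with any prefix.

    The prefix list is an assumption about which variables matter.
    """
    source = os.environ if env is None else env
    return {k: v for k, v in source.items() if not k.startswith(prefixes)}


def run_clean(cmd):
    """Run cmd with the launcher's MPI variables removed from its environment."""
    return subprocess.run(cmd, env=scrubbed_env(), capture_output=True, text=True)


if __name__ == "__main__":
    # Simulate being one rank of an mpirun job.
    os.environ["OMPI_COMM_WORLD_RANK"] = "3"
    # Stand-in child that reports whether it still sees the rank variable.
    child = "import os; print(os.environ.get('OMPI_COMM_WORLD_RANK'))"
    result = run_clean([sys.executable, "-c", child])
    print("child sees rank variable:", result.stdout.strip())
```

In solveMPI.py the rank is read through mpi4py before the child is started, so the call could still pass -process and -nProcess explicitly while handing the scrubbed environment to subprocess.run via env=. If the behaviour does not change, the assumption is wrong and the environment is probably not the culprit.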