=== MPI-python ===

Referring to the [[mpi]] page, we will use the ''openmpi/4.1.4'' and ''mvapich2/2.3.7'' MPI modules, and the same form of ''mpirun/mpiexec''. Referring to the [[python]] page, we will create a conda environment to match each MPI variant.

Mixing ''modules'' and ''conda'' can be a little tricky, as both try to take over the current environment, most importantly ''$PATH'', which sets the search path for executables such as ''python'' or ''mpiexec''. Both ''conda'' and ''module load'' put their new search path at the front of ''$PATH'', so in general the last conda environment activated or module loaded controls ''$PATH''.

Here we will create two ''conda'' environments for ''mpi4py'', one for the ''openmpi'' variant and one for ''mvapich2''. If you create your own environment, you can go ahead and add the other ''python'' modules that you may need (package collections installed with conda or pip, not to be confused with ''lmod/environment modules'' in HPC and its ''module'' commands). HPCC supplies a base miniconda installation for each python release (at this writing, 3.10), each to be followed by a corresponding ''source'' command that replaces the changes ''conda'' would otherwise try to make to your ''~/.bashrc'', since changing your ''~/.bashrc'' is a bad idea in HPC.

We will create a new conda environment and add the conda package ''mpich'' (the predecessor of mvapich and Intel MPI) to it.

<pre>
module purge
module load gcc/11.2.1 mkl/19.0.5 python/3.10-anaconda
source /share/apps/bin/conda-3.10.sh
conda create -n mpi4py-mvapich-3.10    #use a different name if you make a new one
conda activate mpi4py-mvapich-3.10
conda install mpich
</pre>

Oddly, the conda ''mpich'' package does not install a working MPI implementation, while the conda ''openmpi'' package (shown below) does. In either case we want to override it with the locally installed version optimized for InfiniBand. So we use ''module'' to push the local installation of either ''mvapich2'' or ''openmpi'' to the top of ''$PATH'', and also load the ''cuda'' module if we happen to be doing a CUDA build on a GPU node. If you intend to use your python programs with a GPU, use ''srun'' to build on a GPU node with the ''cuda'' module; on non-GPU nodes the ''cuda'' module will simply be ignored. Then we download and install the latest ''mpi4py'' from git to make sure it is compiled against the optimized MPI of either variant.

<pre>
module load mvapich2/2.3.7 cuda/11.7
git clone https://github.com/mpi4py/mpi4py
cd mpi4py
python setup.py install
cd ..
</pre>
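Optionally, before submitting a job, you can run a quick sanity check on the build node to confirm that ''mpi4py'' was linked against the module-provided MPI rather than the conda copy. This is a minimal sketch, assuming the environment and modules above are still loaded; the exact version string depends on the MPI library.

<pre>
# sanity check (sketch): report which MPI library mpi4py was compiled against
from mpi4py import MPI

# for the mvapich2 build this string should identify MVAPICH2,
# for the openmpi build it should identify Open MPI,
# not the conda-supplied MPICH
print(MPI.Get_library_version())

# version of the MPI standard supported, e.g. (3, 1)
print(MPI.Get_version())
</pre>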
The installation of ''mpi4py'' is complete and we can now test under slurm. We repeat the commands we need at runtime, substituting the name of your environment if you created one; you can also run this as-is, without any installation, using the name shown. To test under slurm we'll try 2 nodes and 4 MPI tasks per node as in [[mpi]]. We'll also need other ''srun/#SBATCH'' statements such as partition and time, but they don't affect MPI directly.

<pre>
#SBATCH --nodes=2
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=1

hostfile=/scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID}

module purge
module load gcc/11.2.1 mkl/19.0.5 python/3.10-anaconda
source /share/apps/bin/conda-3.10.sh
conda activate mpi4py-mvapich-3.10
module load mvapich2/2.3.7 cuda/11.7

mpiexec -ppn 4 -hostfile $hostfile python test_mpi.py
</pre>

The program ''test_mpi.py'' is

<pre>
from mpi4py import MPI
import sys

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()
sys.stdout.write("Hello, World! I am process %d of %d on %s.\n" % (rank, size, name))
</pre>

and the output, with node names from your ''$hostfile'', should be:

<pre>
Hello, World! I am process 5 of 8 on c1716.
Hello, World! I am process 6 of 8 on c1716.
Hello, World! I am process 4 of 8 on c1716.
Hello, World! I am process 7 of 8 on c1716.
Hello, World! I am process 1 of 8 on c1715.
Hello, World! I am process 0 of 8 on c1715.
Hello, World! I am process 3 of 8 on c1715.
Hello, World! I am process 2 of 8 on c1715.
</pre>

Using ''openmpi'', you can follow the same process, substituting ''openmpi'' for ''mvapich2'' or ''mpich'' and its current version 4.1.4 for 2.3.7.

<pre>
module purge
module load gcc/11.2.1 mkl/19.0.5 python/3.10-anaconda
source /share/apps/bin/conda-3.10.sh
conda create -n mpi4py-openmpi-3.10    #use a different name if you make a new one
conda activate mpi4py-openmpi-3.10
conda install openmpi
module load openmpi/4.1.4 cuda/11.7
git clone https://github.com/mpi4py/mpi4py
cd mpi4py
python setup.py install
cd ..
</pre>

At runtime we'll also need the openmpi form of ''mpiexec'', replacing the compiled program ''my_MPI_executable'' from [[mpi]] with ''python test_mpi.py'':

<pre>
#SBATCH --nodes=2
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=1

hostfile=/scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID}

module load gcc/11.2.1 mkl/19.0.5 python/3.10-anaconda
source /share/apps/bin/conda-3.10.sh
conda activate mpi4py-openmpi-3.10
module load openmpi/4.1.4 cuda/11.7

mpiexec -np 8 --map-by node -hostfile $hostfile -x PATH -x LD_LIBRARY_PATH python test_mpi.py
</pre>

The output should be similar to the mvapich2 case.
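Beyond hello-world, the same setup runs any ''mpi4py'' program. Below is a minimal sketch of a collective operation (a sum of all ranks with ''allreduce''); the file name ''test_allreduce.py'' is just an example name, and it is launched exactly like ''test_mpi.py'' above with either MPI variant.

<pre>
# test_allreduce.py -- minimal mpi4py collective example (sketch)
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# every rank contributes its rank number; the sum is returned on all ranks
total = comm.allreduce(rank, op=MPI.SUM)

if rank == 0:
    print("Sum of ranks 0..%d is %d" % (size - 1, total))
</pre>

With the 8 ranks used above, rank 0 should print a sum of 28.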