Here are some MPI examples for different flavors. Each also illustrates (a) setting modules and environment variables in the batch file, and (b) hybrid MPI/MKL threads. Hybrid MPI/OpenMP is run in the same way as MPI/MKL, except the relevant environment variable is ''OMP_NUM_THREADS''. Performance is very similar for each MPI flavor in this small example, but there may be a big difference for larger jobs.
  
The test program is HPL (High Performance Linpack) on (a) two E5-2650 v2/Mellanox 16-core nodes or (b) two E5-2670/QLogic 16-core nodes, older but higher-clocked. This job uses a relatively small matrix, so performance is well short of the maximum. The best HPL layout we have tested is 4 MPI processes per two-socket node, with 3 or 4 MKL threads per MPI process for 12- or 16-core nodes respectively. HPL is usually compiled with the ''gcc'' compiler; better compilers don't help, as the great majority of the execution time is spent in the BLAS library.
  
We have found that MPI software has mostly improved over time, so the best choices are the recent versions shown in the examples. Newer versions also have better support for module selection in the job file, as shown in these examples.
    
== Intel MPI ==
  
If you set the ''impi'' module at the top of your batch file, paths (program ''$PATH'' and shared-library ''$LD_LIBRARY_PATH'') will be passed to the slave nodes and a multiple-node job will run. If you need other environment variables in the batch file, they won't be passed to the slave nodes and must be set either (a) in .bashrc or (b) on the ''mpirun'' command line, as with the number of MKL threads below. Note the csh-style ''MKL_NUM_THREADS'' assignment: the variable and its value are separated by a space, with no equals sign.
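As a sketch only: the scheduler directives, node/rank counts, and HPL binary name ''xhpl'' below are illustrative assumptions (a Slurm-style layout of 4 MPI ranks and 4 MKL threads per 16-core node), but the ''impi'' module and the space-separated ''-genv'' form of the environment assignment are the points being made above.

<code bash>
#!/bin/bash
#SBATCH --nodes=2                 # hypothetical Slurm layout for two 16-core nodes
#SBATCH --ntasks-per-node=4       # 4 MPI ranks per two-socket node
#SBATCH --cpus-per-task=4         # 4 MKL threads per rank

module load impi                  # sets $PATH and $LD_LIBRARY_PATH, which ARE passed to slave nodes

# Other environment variables are NOT passed to slave nodes, so set
# MKL_NUM_THREADS on the mpirun line; -genv takes "VAR value" (no equals sign).
mpirun -np 8 -genv MKL_NUM_THREADS 4 ./xhpl
</code>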
  
<code>
mpi.txt · Last modified: 2016/10/31 16:05 by root