Environment Modules, .bashrc

Modules

When multiple versions of software are installed, you need a way to select the version you want. The Modules package is the overwhelming choice of HPC centers for this. Modules is supplied on the system to manipulate the user's environment variables so that the chosen programs and versions run. The most important of these variables are $PATH, which tells the system where to find executable files such as matlab or mpirun, and $LD_LIBRARY_PATH, which tells the system where to find the shared libraries that an executable requires. You can manipulate the environment variables yourself instead of calling modules, but generally there is no advantage in doing so.
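To make concrete what a module load does behind the scenes, here is a minimal sketch that sets the two variables by hand. The install prefix /share/apps/example/1.0 is a hypothetical path used only for illustration, not a package on the system.

```shell
# Hypothetical install prefix; substitute the real location on your system.
prefix=/share/apps/example/1.0

# Prepend so this version is found before any other on the search paths.
export PATH="$prefix/bin:$PATH"
# Add a ":" separator only if LD_LIBRARY_PATH already has a value.
export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show the first entry of each variable.
echo "${PATH%%:*}"
echo "${LD_LIBRARY_PATH%%:*}"
```

A modulefile typically makes the same kind of change and, unlike the manual approach, can undo it cleanly on module purge.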

The four critical module commands are list (currently loaded), avail (all available), load, and purge. The combination purge/load clears all loaded modules and sets new ones:

$ module list
Currently Loaded Modulefiles:
  1) intel/14.0.3   2) impi/5.0.0
$ module avail
-------- /share/apps/modules/Modules/3.2.10/modulefiles ------------------------------------------------
dot           gcc/4.9.1     impi/5.1.1    mkl/14.0.3    module-info   mvapich2/2.1  openmpi/1.8.6 perl/5
gcc/4.7.2     impi/5.0.0    intel/14.0.3  module-git    modules       null          openmpi/1.8.8 use.own
$ module purge
$ module load intel/14.0.3 impi/5.1.1
$ module list
Currently Loaded Modulefiles:
  1) intel/14.0.3   2) impi/5.1.1
$ module purge
$ module load intel/14.0.3 mkl/14.0.3 mvapich2/2.1
$ module list
Currently Loaded Modulefiles:
  1) intel/14.0.3   2) mkl/14.0.3   3) mvapich2/2.1

module avail can be slow to load. A quicker way to see only the top-level names is to list the modulefiles directory (this works on any cluster; the example is from razor):

$ ls /share/apps/modulefiles
abinit    cdhit      flash        idb        mkl          netcdf    pindel        reapr        tophat
abyss     changa     gadget       ifort      module-cvs   null      platform_mpi  samtools     transdecoder
allpaths  cmake      gamess       impi       module-info  nwchem    pqs           scipy        trilinos
augustus  cplex      gcc          infernal   modules      open64    proj.4        sickle       trinity
bbmap     crb-blast  gdal         intel      mono         opencv    python        siesta       usearch
bib       cuda       GenomeVISTA  java       moose        openfoam  qiime         soapdenovo2  use.own
blast     cufflinks  glpk         last       mothur       openmpi   qlogicmpi     spades       velvet
blat      dDocent    gotoblas2    LIGGGHTS   mpiblast     orca      qmcpack       sra-tools    ViennaRNA
boost     deepTools  grass        maker      mpt          os        quast         sunstudio    visit
bowtie    dot        gromacs      matlab     muscle       parallel  R             swat         wise2
bowtie2   emboss     gurobi       mcl        mvapich      pear      randfold      swig
busco     fastqc     hdf5         migrate-n  mvapich2     perl      raxml         symphony
bwa       fftw       hmmer        miRDeep    ncl          PGI       rDock         tcltk

ls -R /share/apps/modulefiles | more gives a full list of every file.
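The directory listing can also be filtered when you are looking for one kind of module. The sketch below wraps ls and grep in a small function; the name find_modules is hypothetical and not part of the Modules package.

```shell
# find_modules DIR PATTERN
# List the top-level module names in DIR whose name contains PATTERN
# (case-insensitive). find_modules is an illustrative helper name.
find_modules() {
    ls "$1" | grep -i -- "$2"
}
```

For example, find_modules /share/apps/modulefiles mpi would match names such as impi, openmpi, and mpiblast from the listing above.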

.bashrc

A script file in each user's home directory called .bashrc also sets environment variables such as $PATH and $LD_LIBRARY_PATH at each login. Since the file name begins with a period, the file is “hidden”: it is not shown by ls but is shown by ls -a.

If you use a shell other than bash, such as tcsh, there will be an analogous file such as .cshrc that performs the same functions in csh syntax.

The default .bashrc supplied with a new account includes three module loads. You can and usually should edit this file to get the effect you want.

$ cat ~/.bashrc
. /etc/profile.d/env-modules.sh
ulimit -s unlimited 2>/dev/null
ulimit -l unlimited 2>/dev/null 
#if using goto/mkl blas with mpi these should be set to 1 unless you want hybrid mpi/openmp
export GOTO_NUM_THREADS=1
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
#enter your modules here
module load intel
module load openmpi
module load mkl
[ -z "$PS1" ] && return
  PS1='`/bin/hostname -s`:`whoami`:`echo $PWD | sed "s=$HOME=="`$ '
  alias ls='ls --color=auto'
  module list

.bashrc is sourced at the beginning of each interactive job. A similar file, .bash_profile, is sourced at the beginning of each non-interactive job. In our setup, we source .bashrc from .bash_profile so that the files are effectively the same, which reduces the maintenance effort, but you can separate them if necessary. Whether a session is interactive or batch is determined in .bashrc by [ -z "$PS1" ] && return, which exits .bashrc on batch runs, where the prompt variable $PS1 is empty. Commands following that line are therefore for interactive sessions only, like setting the value of the prompt $PS1. Commands towards the top of .bashrc are for both interactive and batch. You can add commands to .bash_profile for batch jobs only.
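The effect of the [ -z "$PS1" ] && return test can be seen in isolation with the following sketch. The function rc_body is a hypothetical stand-in for the body of .bashrc, and the two calls simulate a batch shell and an interactive shell by setting $PS1 by hand.

```shell
# Stand-in for the body of a .bashrc (rc_body is a hypothetical name).
rc_body() {
    echo "runs for interactive and batch"
    # $PS1 is empty in a batch shell, so the function returns here.
    [ -z "$PS1" ] && return
    echo "runs for interactive only"
}

# Simulate a batch shell: no prompt is set.
batch_out=$(PS1=''; rc_body)

# Simulate an interactive shell: a prompt is set.
interactive_out=$(PS1='$ '; rc_body)

echo "batch: $batch_out"
echo "interactive: $interactive_out"
```

Only the first message appears in the batch case, which is why module loads meant for batch jobs must sit above the test.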

For csh users, module commands should operate identically under tcsh, but they are generally untested.

On multiple-node parallel jobs, .bashrc can do some unexpected things as the system spawns a shell on each node. Here are some recommended module/.bashrc setups for different cases:

You run only one program, or all the programs you run use the same modules, or each uses different modules that don't conflict

Put the module load … in .bashrc, above [ -z…. The same environment will then be loaded from .bashrc for every interactive session, batch job, and MPI program, if any. Many modules, for instance R, matlab, and python, can be assumed not to conflict, though most of the very many combinations have not been tested. Modules that definitely do conflict are MPI modules (only one may safely be used at a time) and multiple versions of the same program, like gcc/4.7.2 and gcc/4.9.1. Multiple compilers, such as gcc and intel, usually don't conflict. For MVAPICH2 and OpenMPI, however, the compiler module is used to load the MPI module, so multiple compiler modules should be loaded in this order: (1) the compiler module you want to use with MPI, (2) the MPI module, (3) the additional compiler module. In some cases, GNU MPI programs using MKL libraries may need a library only available from the Intel compiler, so a combination like module load gcc/4.7.2 mkl/14.0.3 openmpi/1.8.8; module load intel/14.0.3 may be necessary. The final module load intel/14.0.3 may show a harmless warning message.
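As a rough check on the "only one MPI module at a time" rule, the sketch below counts the MPI modules in a colon-separated module list. The function name count_mpi_modules is hypothetical, and the pattern covers only a few MPI module names that appear in the listings on this page; extend it for your system.

```shell
# count_mpi_modules LIST
# LIST is a colon-separated list of modules, e.g. "intel/14.0.3:impi/5.1.1".
# Prints how many entries are MPI modules. Only impi, openmpi, mvapich and
# mvapich2 are matched here; this is an illustrative, incomplete pattern.
count_mpi_modules() {
    # "|| true" keeps the exit status at 0 when grep counts no matches.
    printf '%s\n' "$1" | tr ':' '\n' | grep -c -E '^(impi|openmpi|mvapich2?)(/|$)' || true
}

count_mpi_modules "intel/14.0.3:impi/5.1.1"                # one MPI module
count_mpi_modules "gcc/4.7.2:openmpi/1.8.8:mvapich2/2.1"   # two: a conflict
```

In a live session, count_mpi_modules "$LOADEDMODULES" should check the current environment, assuming the Modules package maintains that variable as its colon-separated list of loaded modules. A result greater than 1 suggests conflicting MPI modules.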

You use different (conflicting) modules for different programs, but only run single-node batch jobs

If you put module load … in your .bashrc, every interactive session, batch job, and MPI process will load it, which causes problems if a “slave” compute node loads the wrong MPI version. Delete or comment out (add a leading “#” to) the module load … lines in .bashrc. Put the relevant module load … in your batch scripts after the #PBS statements, for example:

#PBS ...
#PBS -l nodes=1:ppn=12
module purge
module load R openmpi
cd $PBS_O_WORKDIR
R --no-save --vanilla < myprog.R

In this case, for interactive sessions such as compiling, type the relevant module load … in your session before starting work.
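To save typing in interactive sessions, one option is to define small shell functions that each load one consistent module set. The function names below are hypothetical; the module combinations are the ones from the examples earlier on this page. Defining a function loads nothing until you call it, so these could go in .bashrc after the [ -z "$PS1" ] && return line without affecting batch jobs.

```shell
# Hypothetical helper functions for ~/.bashrc; the names are illustrative.
# Each one clears the environment and loads one consistent module set.
use_impi() {
    module purge
    module load intel/14.0.3 impi/5.1.1
}

use_mvapich2() {
    module purge
    module load intel/14.0.3 mkl/14.0.3 mvapich2/2.1
}
```

Typing use_impi at the prompt then replaces whatever was loaded with the Intel MPI set, and module list confirms the result.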

You use different (conflicting) modules for different programs, and also run multi-node MPI jobs

This is a more difficult case. The first two solutions won't always work. If a module is set in a batch script that uses multiple nodes, the module definitely applies to the MPI processes running on the first or “master” compute node (usually the first and lowest-numbered assigned node in our batch configuration), but it does not necessarily apply to the “slave” compute nodes, depending on how each MPI type launches remote processes. Multiple nodes imply that MPI is being used, and the solution varies by MPI type: each MPI type requires a certain form of the mpirun statement. See the MPI article for more details.

environment_modules.txt · Last modified: 2020/02/04 19:01 by root