Application Software

Locating and using software is made a little more complicated by some decisions, reasonable at the time, made for Unix roughly 50 years ago: /usr/local for applications, $PATH to find executables, $LD_LIBRARY_PATH to find dynamic link libraries, and the file ~/.cshrc to set up these variables. These environment variables continue today in Linux and macOS, while Windows combines the two PATH variables into one. Also important were software packages intended as infrastructure for complete applications, which did not need to be copied into the code of every project. These “shared libraries”, such as MPI and FFTW, were specified by source-code interfaces. At the time, nearly everyone used one computer and one compiler, so a source interface corresponded directly to one binary interface. Today there are many types of computers and compilers, and modern applications often have a defined binary interface, or “ABI”, to avoid compatibility issues.
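For concreteness, a hand-maintained setup of these variables in ~/.bashrc looks roughly like this (the install path below is only an illustration, not one of our supported applications):

$ tail -3 ~/.bashrc
# hand-maintained setup for one application installed under /usr/local
export PATH=/usr/local/myapp/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/myapp/lib:$LD_LIBRARY_PATH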

Today on HPC systems, the MPI implementation must be heavily customized for each site and its network fabric. Almost all multi-node programs depend on MPI, and there are three popular implementations (Open MPI, MVAPICH2, and Intel MPI). There are about six reasonably popular compilers (GNU gcc, Intel proprietary, Intel open LLVM, NVIDIA/PGI, AMD LLVM, stock LLVM). MVAPICH2 and Intel MPI are binary compatible (a de facto ABI). The LLVM-based compilers are probably all binary compatible with each other (though that does not bridge the Open MPI versus MVAPICH2 split), so there are roughly 8 to 12 binary flavors of MPI, not counting the updates to each that appear two or three times a year. As will be seen in the demonstrations below, many software packaging systems (OpenHPC, Spack, conda) ship their own version of MPI. Usually we can expect this to work only within one node using shared memory. Sometimes we can intercept the MPI library calls with our site-compiled MPI, which can run across multiple nodes.

With thousands of applications, most of which have multiple versions, it is impractical (if only because of name collisions) to put every executable in /usr/local/bin. It is also impractical, unless you use only one application, to set these variables semi-permanently in ~/.bashrc or ~/.cshrc. There are several ways to handle this.

Modules

Almost all HPC centers use “modules” software to help manage versioning. This was originally Environment Modules; at most centers, including this one, it has been replaced by an upward-compatible rewrite, Lmod. The primary use is to manage $PATH, $LD_LIBRARY_PATH, and other environment variables across a large number of applications. Unfortunately the name is easily confused with the unrelated packaged add-ons of modular languages such as Python and R (for example, Python modules).

Module command syntax for most uses is relatively simple: load/unload to add/remove a module, purge to unload all modules, list to show loaded modules, help for usage information, and spider to search. Examples follow for our three sources of software and their module definitions, or “modulefiles”. A complete list of modulefiles is kept in the text file /share/apps/modulelist, which can be grepped; see the short session sketch below.
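A minimal session putting these commands together (the modulefile name fftw/3.3.10 here is only illustrative; substitute whatever your grep or spider search turns up):

$ grep fftw /share/apps/modulelist      # search the flat list of modulefiles
$ module spider fftw                    # or search with Lmod itself
$ module load fftw/3.3.10               # sets PATH, LD_LIBRARY_PATH, etc.
$ module list                           # show what is currently loaded
$ module unload fftw/3.3.10             # remove just this module
$ module purge                          # or remove everything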

Locally written modulefiles

There are currently about 660 locally written modulefiles, some of which have a “smart” capability to select among multiple software builds, choosing the one compiled for the computer that loads the module. This shows the first five:

$ grep "/share/apps/modulefiles" /share/apps/modulelist | head -5
/share/apps/modulefiles/ abinit/8.0.8b
/share/apps/modulefiles/ abinit/8.4.4
/share/apps/modulefiles/ abinit/8.6.1
/share/apps/modulefiles/ abinit/8.6.1-QFDcc
/share/apps/modulefiles/ abinit/8.6.1-qFD-trestles
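Loading one of these is a single command; for instance (the abinit executable name, and which build the module selects for your node, are assumptions here rather than something verified on this page):

$ module load abinit/8.6.1
$ which abinit               # confirm the executable is now on PATH
$ module unload abinit/8.6.1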

OpenHPC modulefiles

At this writing there are 548 modulefiles from the OpenHPC distribution, which concentrates on mathematical software. Many of the packages are a bit dated, but when we update to Rocky 8 Linux we will be able to install newer ones.

You can run a particular package as follows. As an example, we'll find the newest version of petsc and the MPI and compiler modules that it needs. Loading those three also auto-loads some prerequisites. There is quite a bit of unnecessary duplication: gnu7 and gnu8 binaries are compatible, for instance, so it wasn't really necessary to duplicate every modulefile.

$ grep petsc /share/apps/modulelist
/opt/ohpc/pub/moduledeps/ gnu7-impi/petsc/3.9.1
/opt/ohpc/pub/moduledeps/ gnu7-mpich/petsc/3.9.1
/opt/ohpc/pub/moduledeps/ gnu7-mvapich2/petsc/3.9.1
/opt/ohpc/pub/moduledeps/ gnu7-openmpi3/petsc/3.9.1
/opt/ohpc/pub/moduledeps/ gnu7-openmpi/petsc/3.8.3
/opt/ohpc/pub/moduledeps/ gnu8-impi/petsc/3.12.0
/opt/ohpc/pub/moduledeps/ gnu8-mpich/petsc/3.12.0
/opt/ohpc/pub/moduledeps/ gnu8-mvapich2/petsc/3.12.0
/opt/ohpc/pub/moduledeps/ gnu8-openmpi3/petsc/3.12.0
/opt/ohpc/pub/moduledeps/ gnu-impi/petsc/3.8.3
/opt/ohpc/pub/moduledeps/ gnu-mpich/petsc/3.8.3
/opt/ohpc/pub/moduledeps/ gnu-mvapich2/petsc/3.8.3
/opt/ohpc/pub/moduledeps/ gnu-openmpi/petsc/3.8.3
/opt/ohpc/pub/moduledeps/ intel-impi/petsc/3.12.0
/opt/ohpc/pub/moduledeps/ intel-mpich/petsc/3.12.0
/opt/ohpc/pub/moduledeps/ intel-mvapich2/petsc/3.12.0
/opt/ohpc/pub/moduledeps/ intel-openmpi3/petsc/3.12.0
/opt/ohpc/pub/moduledeps/ intel-openmpi/petsc/3.8.3
/share/apps/modulefiles/ petsc/3.10.5
/share/apps/modulefiles/ petsc/3.11.3
/share/apps/modulefiles/ petsc/3.14.2
/share/apps/modulefiles/ petsc/3.16.4
/share/apps/modulefiles/ petsc/3.8.58

$ grep "gnu8/openmpi3/" /share/apps/modulelist
/opt/ohpc/pub/moduledeps/ gnu8/openmpi3/3.1.4

$ grep "gnu8/8" /share/apps/modulelist
/opt/ohpc/pub/modulefiles/ gnu8/8.3.0

$ module load gnu8/8.3.0 gnu8/openmpi3/3.1.4 gnu8-openmpi3/petsc/3.12.0
$ module list
Currently Loaded Modules:
  1) gnu8/8.3.0   2) gnu8/openmpi3/3.1.4   3) gnu8-openmpi3/phdf5/1.10.5   4) openblas/3.20-noomp   5) gnu8-openmpi3/scalapack/2.0.2   6) gnu8-openmpi3/petsc/3.12.0

This is likely to work only on one node, since it links back to OpenHPC's own MPI, which is not customized for our network. However, it should suffice for a test- or classroom-sized petsc run, whereas building petsc and its prerequisites manually takes some hours.
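A quick single-node smoke test of the loaded toolchain might look like this (hello.c is any small MPI test program of your own; mpicc and mpirun come from the openmpi3 module loaded above):

$ mpicc hello.c -o hello
$ mpirun -np 4 ./hello     # stays within one node, using shared memory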

Spack modulefiles

There are currently 114 programs auto-installed using Spack. This will need to be duplicated for every node type in the cluster, which will take some time. We have begun mostly with bioinformatics programs, which have fewer MPI complications.

$ grep spack /share/apps/modulelist | head -5
/share/apps/spackmodulefiles/ gcc-11.2.1/SKYLAKEX/abinit/9.6.1
/share/apps/spackmodulefiles/ gcc-11.2.1/SKYLAKEX/abyss/2.3.1
/share/apps/spackmodulefiles/ gcc-11.2.1/SKYLAKEX/atompaw/4.2.0.1
/share/apps/spackmodulefiles/ gcc-11.2.1/SKYLAKEX/bamtools/2.5.2
/share/apps/spackmodulefiles/ gcc-11.2.1/SKYLAKEX/bcftools/1.14

As an example, we'll try to run abyss-pe in parallel as in Biowulf-abyss. The module turns out to call mpirun but not to put it on the path, so we add some local modules. At this time, all the local Spack software is compiled with gcc/11.2.1 and openmpi/4.1.4 (Spack's own builds, not our site versions), so we load those modules. This does work, though we have not tested it across multiple nodes, since it is not clear which MPI library is actually being linked (see the ldd check after the run below).

$ module load gcc-11.2.1/SKYLAKEX/abyss/2.3.1
$ abyss-pe np=32 j=8 k=25 n=10 in='*fq' name=OutputPrefix

mpirun -np 32 ABYSS-P -k25 -q3    --coverage-hist=coverage.hist -s OutputPrefix-bubbles.fa  -o OutputPrefix-1.fa *fq 
bash: mpirun: command not found
make: *** [OutputPrefix-1.fa] Error 127

$ module load gcc/11.2.1 openmpi/4.1.4
$ time abyss-pe np=32 j=8 k=25 n=10 in='*fq' name=OutputPrefix

/share/apps/mpi/openmpi-4.1.4/cuda/gcc/bin/mpirun -np 32 ABYSS-P -k25 -q3    --coverage-hist=coverage.hist -s OutputPrefix-bubbles.fa  -o OutputPrefix-1.fa *fq 
ABySS 2.3.1
ABYSS-P -k25 -q3 --coverage-hist=coverage.hist -s OutputPrefi-bubbles.fa -o OutputPrefi-1.fa Thalassiosira-weissflogii_AJA159-02_0ppt_r8_BS-440_trimmed_filtered_1.fq Thalassiosira-weissflogii_AJA159-02_0ppt_r8_BS-440_trimmed_filtered_2.fq
Running on 32 processors
4: Running on host c1411
.etc
...
abyss-fac   OutputPrefi-unitigs.fa OutputPrefi-contigs.fa OutputPrefi-scaffolds.fa |tee OutputPrefi-stats.tab
n	n:500	L50	min	N75	N50	N25	E-size	max	sum	name
211760	17866	4775	500	936	1470	2362	1912	17668	22.72e6	OutputPrefi-unitigs.fa
204672	15838	3757	500	1051	1927	3072	2431	17668	23.38e6	OutputPrefi-contigs.fa
204045	15400	3447	500	1057	2025	3400	2642	17668	23.38e6	OutputPrefi-scaffolds.fa
ln -sf OutputPrefi-stats.tab OutputPrefi-stats
tr '\t' , <OutputPrefi-stats.tab >OutputPrefi-stats.csv
abyss-tabtomd OutputPrefi-stats.tab >OutputPrefi-stats.md

real	36m33.072s
user	759m31.204s
sys	3m11.375s
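
One way to check which MPI library ABYSS-P actually resolves at run time is ldd (a quick diagnostic, assuming ABYSS-P is on $PATH after the module loads; the library paths in the output will vary):

$ ldd $(which ABYSS-P) | grep -i mpi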

As with OpenHPC, this can be an easy way to try out a program without spending a lot of installation time.

Compiler-MPI Recommendations

Most parallel programs require selecting a compiler and an MPI version. We usually recommend the following compiler versions (generally select only one; exceptions are noted below):

$ module load gcc/11.2.1    
#synonym gnu also works, latest gnu compiler from "Centos 7 Development Tools", enables gcc/g++/gfortran

$ module load intel/21.2.0
#synonym intelcompiler also works, both intel proprietary icc/icpc/ifort and intel llvm icx/icpx/ifx

$ module load nvhpc/22.7
#synonym PGI also works, Nvidia/PGI compiler equally nvc/nvc++/nvfortran and pgcc/pgc++/pgf77/pgf90/pgf95/pgfortran

$ module load aocc/3.0
#AMD llvm compiler clang/clang++/flang

If you don't load any modules, you get some very old compilers built into CentOS:

$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)

$ clang -v
clang version 3.4.2 (tags/RELEASE_34/dot2-final)
Target: x86_64-redhat-linux-gnu
Thread model: posix
Found candidate GCC installation: /bin/../lib/gcc/x86_64-redhat-linux/4.8.2
Found candidate GCC installation: /bin/../lib/gcc/x86_64-redhat-linux/4.8.5
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.2
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Selected GCC installation: /bin/../lib/gcc/x86_64-redhat-linux/4.8.5

These compilers aren't recommended for compiling, but their runtime libraries probably suffice for running non-parallel applications compiled with a newer gcc, since gcc releases are largely binary compatible.
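If you are unsure whether the system libstdc++ is new enough for a binary built elsewhere, you can list the symbol versions it provides (a quick diagnostic; the library path is the usual CentOS 7 location):

$ strings /usr/lib64/libstdc++.so.6 | grep ^GLIBCXX_ | sort -V | tail -3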

We recommend the following MPI versions. Select only one (at run time mvapich2 and impi should be interchangeable, but they are not at compile time):

openmpi/4.1.4
#with gcc, intel, nvhpc

mvapich2/2.3.7
#with gcc, intel

impi/17.0.7
#with gcc, intel

In combination we recommend the following, loading the compiler first and then the MPI module so that the correct libraries are selected:

$ module load { gcc/11.2.1 | intel/21.2.0 | nvhpc/22.7 } openmpi/4.1.4

$ module load { gcc/11.2.1 | intel/21.2.0 } mvapich2/2.3.7

$ module load { gcc/11.2.1 | intel/21.2.0 } impi/17.0.7

There are a couple of situations where you would want multiple compilers loaded (but the first compiler-MPI combination loaded determines which MPI library is used).

(1) Most C++ compilers use the GNU C++ headers and standard library. For a program that uses a lot of relatively recent C++ (LAMMPS is one), you will want a recent gcc loaded to provide those libraries.

This works with the Intel proprietary icpc compiler:

$ module load intel/17.0.7 openmpi/4.1.4 gcc/11.2.1

If you don't add the third module, icpc will use the libraries from the default CentOS g++ 4.8.5, which is quite old and probably can't compile LAMMPS at all.

(2) LLVM-based compilers (aocc/3.0.0 and the icx in intel/21.2.0) try to auto-detect the g++ libraries, but don't always do it quite correctly:

$ module load aocc/3.0.0
$ clang++ -v
AMD clang version 12.0.0 (CLANG: AOCC_3.0.0-Build#78 2020_12_10) (based on LLVM Mirror.Version.12.0.0)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/AMD/aocc-compiler-3.0.0/bin
Found candidate GCC installation: /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7
Found candidate GCC installation: /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8
Found candidate GCC installation: /opt/rh/devtoolset-9/root/usr/lib/gcc/x86_64-redhat-linux/9
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.2
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Selected GCC installation: /opt/rh/devtoolset-9/root/usr/lib/gcc/x86_64-redhat-linux/9

So it picks the devtoolset-9 libraries even though devtoolset-10 and devtoolset-11 are available:

$ ls /opt/rh
devtoolset-10  devtoolset-11  devtoolset-3  devtoolset-7  devtoolset-8  devtoolset-9  

If devtoolset-9 (gcc 9.3.1) is new enough for your code, that's fine.
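If it is not, clang accepts an explicit GCC toolchain prefix; a sketch, assuming the devtoolset-11 layout shown in the listing above (the flag is standard for clang 12, but check clang++ --help on your version, and hello.cpp is any small test file):

$ clang++ --gcc-toolchain=/opt/rh/devtoolset-11/root/usr -v hello.cpp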

(3) Sometimes Intel MKL links back to the Intel compiler's runtime when it uses Intel OpenMP instead of GNU OpenMP. This should work:

$ module load gcc/11.2.1 mkl/20.0.4 openmpi/4.1.4 intel/17.0.7