===== Quantum Espresso =====

Versions 6.8/7.1

**Compilation**

Built with the Intel compilers, Intel MPI, and MKL:

<code>
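# Target "skylake" (Pinnacle-I): -xHOST targets the build host, so compile on a Pinnacle-I node
#COMPUTER=skylake
#OPT="-xHOST"
# Target "bulldozer" (all other platforms): SSE3 baseline plus extra code paths up to AVX-512
COMPUTER=bulldozer
OPT="-msse3 -axsse3,sse4.2,AVX,core-AVX2,CORE-AVX512"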
VERSION=7.1
HDF5=1.12.0
module purge
module load intel/19.0.5 mkl/20.0.4 impi/17.0.4
OMP="--enable-openmp"    # set OMP="" for an MPI-only build
make clean
./install/configure MPIF90=mpiifort F90=ifort F77=ifort FC=ifort CC=icc \
SCALAPACK_LIBS="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" \
LAPACK_LIBS="-L$MKLROOT/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64" \
BLAS_LIBS="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl" \
FFT_LIBS="-L$MKLROOT/interfaces/fftw3xf -lfftw3xf_intel" \
FFLAGS="-O3 $OPT -D__INTEL -D__GNUC__ -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK -assume byterecl -I$MKLROOT/include/fftw" \
CFLAGS="-O3 $OPT -D__INTEL -D__GNUC__ -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK" \
--with-hdf5=/share/apps/hdf5/$HDF5/intel/impi --with-scalapack=intel --enable-parallel \
$OMP --prefix=/share/apps/espresso/espresso-$VERSION-intel-impi-mkl-$COMPUTER
make depend
make all
make install
</code>
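The two COMPUTER/OPT pairs above are switched by commenting lines in and out. As a sketch, the same choice can be made from a single COMPUTER variable, so one script builds either executable set. The function name qe_opt_flags is hypothetical and not part of QE or the cluster tooling; the flags are copied from the script above.

<code bash>
#!/bin/bash
# Hypothetical helper: map a build target to the Intel compiler flags above.
qe_opt_flags() {
  case "$1" in
    # -xHOST optimizes for the CPU of the build host,
    # so run the skylake build on a Pinnacle-I node.
    skylake)   printf '%s\n' "-xHOST" ;;
    # SSE3 baseline plus extra code paths (-ax) chosen at run time
    bulldozer) printf '%s\n' "-msse3 -axsse3,sse4.2,AVX,core-AVX2,CORE-AVX512" ;;
    *)         printf 'unknown target: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Default to the bulldozer build unless COMPUTER is set in the environment
COMPUTER=${COMPUTER:-bulldozer}
OPT=$(qe_opt_flags "$COMPUTER") || exit 1
echo "Building for $COMPUTER with OPT=$OPT"
</code>

Running the build with COMPUTER=skylake in the environment would then select the Pinnacle-I flags without editing the script.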

**Runtime**
<code>
# All platforms except Trestles
module load intel/18.0.2 impi/17.0.4 mkl/20.0.4 qe/7.1    # or qe/6.8
# Trestles
module load intel/18.0.2 impi/17.0.4 mkl/20.0.1 qe/7.1    # or qe/6.8
</code>
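Because performance depends on the MKL version, it can be worth checking which libraries pw.x actually resolves after the module load. A minimal sketch, assuming pw.x was linked dynamically against MKL and Intel MPI; qe_check_libs is a hypothetical name.

<code bash>
#!/bin/bash
# Sketch: after "module load ... qe/7.1", list the MKL, OpenMP and MPI
# shared libraries that pw.x resolves. Assumes a dynamically linked pw.x.
qe_check_libs() {
  # Use the path given as $1, or look pw.x up on PATH
  local exe=${1:-$(command -v pw.x)}
  if [ -z "$exe" ] || [ ! -x "$exe" ]; then
    echo "pw.x not found: load a qe module first" >&2
    return 1
  fi
  # ldd shows the resolved path of each library; the libmkl paths
  # reveal which MKL installation is actually in effect.
  ldd "$exe" | grep -E 'libmkl|libiomp5|libmpi'
}

qe_check_libs
</code>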

Performance is not sensitive to the QE version (6.8 vs. 7.1) but is quite sensitive to the MKL version: the newest MKL (20.0.4) performs best on all platforms except Trestles, where 20.0.1 is best. The module selects one of two executable sets at runtime, "skylake" for Pinnacle-I and "bulldozer" for all other platforms. Using OpenMP is somewhat slower at the same total core count: in the AUSURF112 runs below, 2 threads per MPI rank took about 3-4% longer on Pinnacle I and II and about 20% longer on Trestles.
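This page does not say how the module decides which executable set to use. One way to reproduce the mapping, sketched here purely as an assumption, is to test for AVX-512: among the systems benchmarked on this page, only the Xeon Gold 6130 in Pinnacle-I supports it, while the EPYC 7543 and Opteron 6136 do not. The function qe_arch and the PATH setup are hypothetical; the prefix mirrors the --prefix used in the build script.

<code bash>
#!/bin/bash
# Hypothetical sketch: pick the QE executable set from the CPU flags.
# Assumption: AVX-512 (avx512f) present -> "skylake", otherwise "bulldozer".
qe_arch() {
  # Use the flags string passed as $1, or read it from /proc/cpuinfo
  local flags=${1:-$(grep -m1 '^flags' /proc/cpuinfo)}
  case " $flags " in
    *" avx512f "*) echo skylake ;;
    *)             echo bulldozer ;;
  esac
}

VERSION=7.1
COMPUTER=$(qe_arch)
# Same prefix layout as --prefix in the build script
QE_HOME=/share/apps/espresso/espresso-$VERSION-intel-impi-mkl-$COMPUTER
export PATH=$QE_HOME/bin:$PATH
</code>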

The AUSURF112 benchmark is used for comparison, run with "-nk 2" (two k-point pools) on both CPUs of one node. The total number of cores used is MPI ranks × OpenMP threads per rank.

<code>
System                QE version  MPI ranks  OMP threads  time
Pinnacle II-AMD7543   7.1         64         1              86
Pinnacle II-AMD7543   7.1         32         2              89
Pinnacle I-Intel6130  7.1         32         1             133
Pinnacle I-Intel6130  7.1         16         2             137
Trestles-AMD6136      7.1         32         1             718
Trestles-AMD6136      7.1         16         2             858
</code>
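A sketch of how the rows above could be reproduced, assuming the AUSURF112 input is saved as ausurf.in in the working directory and that Intel MPI's mpirun is on PATH; run_ausurf and the output file naming are hypothetical. Each hybrid run keeps the total core count of its pure-MPI counterpart and splits it between MPI ranks and OpenMP threads.

<code bash>
#!/bin/bash
# Hypothetical helper: run AUSURF112 with a given number of MPI ranks
# and OpenMP threads per rank, using two k-point pools (-nk 2).
run_ausurf() {
  local ranks=$1 threads=$2 input=${3:-ausurf.in}
  local cmd=(mpirun -np "$ranks" pw.x -nk 2 -inp "$input")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Only print the command that would be run
    echo "OMP_NUM_THREADS=$threads ${cmd[*]}"
    return 0
  fi
  # I_MPI_PIN_DOMAIN=omp (Intel MPI) gives each rank a pinning domain
  # of OMP_NUM_THREADS cores
  OMP_NUM_THREADS=$threads I_MPI_PIN_DOMAIN=omp "${cmd[@]}" > "ausurf_${ranks}x${threads}.out"
}
</code>

For example, run_ausurf 32 2 corresponds to the second Pinnacle II row, and DRY_RUN=1 run_ausurf 32 2 only prints the command. pw.x reports the total CPU and WALL time on the PWSCF line near the end of its output.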