Versions 6.8/7.1
Compilation with the Intel compiler, Intel MPI, and MKL
#COMPUTER=skylake
#OPT="-xHOST"
COMPUTER=bulldozer
OPT="-msse3 -axsse3,sse4.2,AVX,core-AVX2,CORE-AVX512"
VERSION=7.1
HDF5=1.12.0
module purge
module load intel/19.0.5 mkl/20.0.4 impi/17.0.4
OMP="--enable-openmp"
make clean
./install/configure MPIF90=mpiifort F90=ifort F77=ifort FC=ifort CC=icc \
  SCALAPACK_LIBS="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" \
  LAPACK_LIBS="-L$MKLROOT/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64" \
  BLAS_LIBS="-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -thread" \
  FFT_LIBS="-L$MKLROOT/interfaces/fftw3xf -lfftw3xf_intel" \
  FFLAGS="-O3 $OPT -D__INTEL -D__GNUC__ -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK -assume byterecl \
  -I$MKLROOT/include/fftw" \
  CFLAGS="-O3 $OPT -D__INTEL -D__GNUC__ -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK" \
  --with-hdf5=/share/apps/hdf5/$HDF5/intel/impi -with-scalapack=intel --enable-parallel \
  $OMP --prefix=/share/apps/espresso/espresso-$VERSION-intel-impi-mkl-$COMPUTER
make depends
make all
make install
Runtime:
module load intel/18.0.2 impi/17.0.4 mkl/20.0.4 {qe/7.1 or qe/6.8}
trestles: module load intel/18.0.2 impi/17.0.4 mkl/20.0.1 {qe/7.1 or qe/6.8}
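Once the modules are loaded, a run could be launched roughly as sketched below. This is only an illustration, not the site's job script: the helper name qe_cmd and the input file name ausurf.in are hypothetical, and the launcher is assumed to be Intel MPI's mpirun. The -nk (pools) and -inp (input file) options are standard pw.x flags. The helper only prints the command line so it can be inspected or pasted into a batch script.

```shell
# Hypothetical helper: print (do not execute) a pw.x launch line.
# $1 = number of MPI ranks, $2 = OpenMP threads per rank, $3 = input file
qe_cmd() {
    ranks=$1
    threads=$2
    input=$3
    # -nk 2 splits the k-points into two pools, as in the benchmark runs on this page
    printf 'OMP_NUM_THREADS=%s mpirun -np %s pw.x -nk 2 -inp %s\n' \
        "$threads" "$ranks" "$input"
}

# Pure MPI on a 32-core node
qe_cmd 32 1 ausurf.in
# Hybrid MPI/OpenMP: half the ranks, two threads each
qe_cmd 16 2 ausurf.in
```

Halving the rank count when doubling the thread count keeps the total number of busy cores constant, which is one reading of the "cores" and "OMP" columns in the benchmark table; that interpretation is an assumption.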
Performance is not sensitive to the QE version between 6.8 and 7.1, but it is quite sensitive to the MKL version. The newest MKL (20.0.4) is best on all platforms except trestles, where 20.0.1 is best. There are two executable sets, selected by the module at runtime: “skylake” for Pinnacle I and “bulldozer” for all other platforms. Runs with OpenMP are slower: slightly on Pinnacle, more noticeably on Trestles.
The AUSURF112 benchmark is used for comparison, run with “-nk 2” on both CPUs of one node.
System                  QE version  cores  OMP  time
Pinnacle II-AMD7543     7.1         64     1    86
Pinnacle II-AMD7543     7.1         32     2    89
Pinnacle I-Intel6130    7.1         32     1    133
Pinnacle I-Intel6130    7.1         16     2    137
Trestles-AMD6136        7.1         32     1    718
Trestles-AMD6136        7.1         16     2    858
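To put a number on the OpenMP penalty, the relative slowdown can be computed from the times above. The sketch below hard-codes the six timings from the table (system names are written without spaces so awk sees three fields); the function name slowdown is just an illustrative choice.

```shell
# Relative slowdown of the OMP=2 run versus the OMP=1 run on each system.
# Columns in the here-document: system, time with OMP=1, time with OMP=2.
slowdown() {
    awk '{ printf "%s %.1f%%\n", $1, ($3 / $2 - 1) * 100 }' <<'EOF'
PinnacleII-AMD7543 86 89
PinnacleI-Intel6130 133 137
Trestles-AMD6136 718 858
EOF
}

slowdown
```

This gives roughly 3.5% and 3.0% on the two Pinnacle systems and about 19.5% on Trestles.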