===== Quantum Espresso =====

Versions 6.8/7.1

** Compilation **

With Intel compiler, Intel MPI, and MKL:
  
<code>
#COMPUTER=skylake
#OPT="-xHOST"
COMPUTER=bulldozer
OPT="-msse3 -axsse3,sse4.2,AVX,core-AVX2,CORE-AVX512"
VERSION=7.1
HDF5=1.12.0
module purge
module load intel/19.0.5 mkl/20.0.4 impi/17.0.4
OMP="--enable-openmp"
make clean
./install/configure MPIF90=mpiifort F90=ifort F77=ifort FC=ifort CC=icc \
SCALAPACK_LIBS="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" \
LAPACK_LIBS="-L$MKLROOT/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64" \
BLAS_LIBS="-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" \
FFT_LIBS="-L$MKLROOT/interfaces/fftw3xf -lfftw3xf_intel" \
FFLAGS="-O3 $OPT -D__INTEL -D__GNUC__ -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK -assume byterecl -I$MKLROOT/include/fftw" \
CFLAGS="-O3 $OPT -D__INTEL -D__GNUC__ -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK" \
--with-hdf5=/share/apps/hdf5/$HDF5/intel/impi --with-scalapack=intel --enable-parallel \
$OMP --prefix=/share/apps/espresso/espresso-$VERSION-intel-impi-mkl-$COMPUTER
make depend
make all
make install
</code>
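
After installation, a quick check that the binary actually resolved MKL and Intel MPI (a minimal sketch; it just reuses the $VERSION and $COMPUTER variables from the build block above):

<code>
# Confirm that pw.x links against MKL and Intel MPI shared libraries.
PREFIX=/share/apps/espresso/espresso-$VERSION-intel-impi-mkl-$COMPUTER
ldd $PREFIX/bin/pw.x | egrep 'mkl|mpi'
</code>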
  
** Runtime **

<code>
module load intel/18.0.2 impi/17.0.4 mkl/20.0.4 {qe/7.1 or qe/6.8}
trestles: module load intel/18.0.2 impi/17.0.4 mkl/20.0.1 {qe/7.1 or qe/6.8}
</code>
  
The performance is not sensitive to the QE version between 6.8 and 7.1, but it is quite sensitive to the MKL version. The newest MKL (20.0.4) is best on all platforms except Trestles, where 20.0.1 is best. There are two executable sets, selected by the module at runtime ("skylake" for Pinnacle-I and "bulldozer" for all other platforms). Performance with OpenMP enabled is slightly slower.
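
The selection done by the module is roughly equivalent to the following sketch (the modulefile itself is not reproduced here; the AVX-512 test and the install paths are assumptions for illustration only):

<code>
# Hypothetical sketch: pick the executable set that matches the host CPU.
# Pinnacle-I (Skylake) reports avx512f; everything else gets the "bulldozer" build.
ROOT=/share/apps/espresso/espresso-7.1-intel-impi-mkl
if grep -q avx512f /proc/cpuinfo; then
    export PATH=$ROOT-skylake/bin:$PATH
else
    export PATH=$ROOT-bulldozer/bin:$PATH
fi
</code>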
  
The AUSURF112 benchmark is used for comparison, run with "-nk 2" and both CPUs on one node; a sample launch line is sketched below.
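
A launch of that shape would look roughly like this (a sketch, not the exact command used; the 64-core count matches a Pinnacle-II node and the input file name is assumed):

<code>
# Example: 64 MPI ranks on one node, two k-point pools (-nk 2).
module load intel/18.0.2 impi/17.0.4 mkl/20.0.4 qe/7.1
mpirun -np 64 pw.x -nk 2 -input ausurf.in > ausurf.out
</code>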
  
<code>
System                QE version  cores  OMP  time
Pinnacle II-AMD7543   7.1         64     1    86
Pinnacle II-AMD7543   7.1         32          89
Pinnacle I-Intel6130  7.1         32          133
Pinnacle I-Intel6130  7.1         16          137
Trestles-AMD6136      7.1         32          718
Trestles-AMD6136      7.1         16          858
</code>
  