This shows you the differences between two versions of the page.
quantum_espresso [2022/06/20 18:40] root |
quantum_espresso [2022/07/01 20:57] (current) root |
||
---|---|---|---|
Line 1: | Line 1: | ||
===== Quantum Espresso ===== | ===== Quantum Espresso ===== | ||
- | Version 5.1 | ||
- | ** Compilation ** | ||
- | With Intel compiler and either OpenMPI or MVAPICH2: | + | Versions 6.8/7.1 |
- | < | + | |
- | OpenMPI: | + | |
- | DFLAGS | + | |
- | IFLAGS | + | |
- | MPIF90 | + | |
- | CFLAGS | + | |
- | F90FLAGS | + | |
- | FFLAGS | + | |
- | FFLAGS_NOOPT | + | |
- | FFLAGS_NOMAIN | + | |
- | LD = mpif90 | + | |
- | LDFLAGS | + | |
- | SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 | + | |
- | FFT_LIBS | + | |
- | MVAPICH2: same except | + | ** Compilation ** |
- | SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 | + | With Intel compiler, Intel MPI, and MKL |
- | trestles: same except | ||
- | no -axavx (though an " | ||
- | </ | ||
- | |||
- | ** Benchmarks ** | ||
- | |||
- | We run AUSURF112 from [[http:// | ||
- | '' | ||
- | but it does so fairly repeatably so may be timed. | ||
- | < | ||
- | OpenMPI: | ||
- | module load intel/ | ||
- | mpirun -np 64 -machinefile $PBS_NODEFILE -x LD_LIBRARY_PATH \ | ||
- | / | ||
- | MVAPICH2: | ||
- | module load intel/ | ||
- | mpirun -np 64 -machinefile $PBS_NODEFILE \ | ||
- | / | ||
- | </ | ||
- | The table shows Lockwood' | ||
- | <csv> | ||
- | Walltime, | ||
- | Lockwood Gordon E5-2670, | ||
- | Lockwood Trestles AMD6136, | ||
- | Our E5-2650V2, | ||
- | Our E5-2670, | ||
- | Our Trestles AMD6136, | ||
- | Our Trestles AMD6136, | ||
- | </ | ||
- | (1) Fails with error [[http:// | ||
- | |||
- | ** Notes ** | ||
- | |||
- | Each run exits with error messages (which depend on the MPI type) and return code 1, even though the log reports normal termination. This appears harmless: | ||
- | |||
- | < | ||
- | This run was terminated on: 13: 2:44 11Nov2015 | ||
- | =------------------------------------------------------------------------------= | ||
- | JOB DONE. | ||
- | =------------------------------------------------------------------------------= | ||
- | ------------------------------------------------------- | ||
- | Primary job terminated normally, but 1 process returned | ||
- | a non-zero exit code.. Per user-direction, | ||
- | ------------------------------------------------------- | ||
- | ------------------------------------------------------------ | ||
- | A process or daemon was unable to complete a TCP connection | ||
- | to another process: | ||
- | etc. | ||
- | </ | ||
- | |||
- | ** Continuing Work ** | ||
- | |||
- | ELPA in newer versions of Espresso is reportedly faster than Scalapack. | ||
- | |||
- | OpenMPI threading. | ||
- | |||
- | MKL threading. | ||
- | |||
- | FFTW fft vs. Intel fft on AMD. | ||
- | |||
- | === 2020 Update q-e 6.6 === | ||
- | On Trestles with Intel tools. | ||
- | |||
- | |||
- | < | ||
- | module load intel/ | ||
- | MKL_NUM_THREADS=# | ||
- | </ | ||
- | |||
- | q-e appears to perform poorly when MPI ranks x OMP threads exceeds the number of physical cores. | ||
- | Performance on two trestles nodes is better than the previous 5.1 benchmarks, but it doesn' | ||
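The ranks x threads limit noted above can be enforced before launching a job. The following is a minimal sketch, not the site's actual job script: the function name `threads_per_rank` and its two arguments (physical cores, MPI ranks) are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Sketch: choose OMP_NUM_THREADS so that MPI ranks x OpenMP threads does not
# exceed the number of physical cores.
# threads_per_rank is a hypothetical helper, not part of Quantum Espresso.
threads_per_rank() {
    local cores=$1   # physical cores available to the job
    local ranks=$2   # MPI ranks to be started
    local t=$(( cores / ranks ))   # integer division rounds down
    # With more ranks than cores the job is already oversubscribed;
    # fall back to one thread per rank rather than zero.
    if [ "$t" -lt 1 ]; then
        t=1
    fi
    echo "$t"
}

# Example: 32 physical cores shared by 16 MPI ranks
OMP_NUM_THREADS=$(threads_per_rank 32 16)
export OMP_NUM_THREADS
echo "ranks=16 OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Rounding down keeps the product at or below the core count even when the rank count does not divide the cores evenly.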
- | |||
- | < | ||
- | Cores Node type #mpi # | ||
- | 32 Trestles AMD 32 | ||
- | 32 Trestles AMD 32 | ||
- | 32 Trestles AMD 32 | ||
- | 32 Trestles AMD 16 | ||
- | 32 Trestles AMD 64 | ||
- | 32 Trestles AMD | ||
- | 32 6130 | ||
- | 32 6130 | ||
- | 32 6130 | ||
- | 48 7402 | ||
- | 48 7402 | ||
- | 48 7402 | ||
- | </ | ||
- | |||
- | * using a better optimized version for more modern machines, which doesn' | ||
- | |||
- | ** tested with the number of hardware threads, which performs worse than using the number of physical cores | ||
- | |||
- | *** 6.5 version. | ||
- | |||
- | Install script | ||
< | < | ||
+ | # | ||
+ | # | ||
+ | COMPUTER=bulldozer | ||
+ | OPT=" | ||
+ | VERSION=7.1 | ||
+ | HDF5=1.12.0 | ||
+ | module purge | ||
+ | module load intel/ | ||
OMP=" | OMP=" | ||
- | VERSION=6.6 | + | make clean |
./ | ./ | ||
SCALAPACK_LIBS=" | SCALAPACK_LIBS=" | ||
Line 125: | Line 22: | ||
BLAS_LIBS=" | BLAS_LIBS=" | ||
FFT_LIBS=" | FFT_LIBS=" | ||
- | FFLAGS=" | + | FFLAGS=" |
- | -assume byterecl -I$MKLROOT/ | + | CFLAGS=" |
- | CFLAGS=" | + | --with-hdf5=/ |
- | --with-hdf5=/ | + | $OMP --prefix=/ |
- | --enable-parallel $OMP --prefix=/ | + | make depends |
+ | make all | ||
+ | make install | ||
</ | </ | ||
- | ==Update 2022== | + | Runtime: |
- | + | ||
- | QE 6.8 and 7.1 are installed with two versions compiled with the Intel compiler (" | + |
- | " | + | |
- | " | + | |
- | Both use '' | + | |
- | The '' | + | |
- | The " | + | |
- | '' | + | |
- | For AMD, explicitly set after the module load: | + | |
< | < | ||
- | export MKL_DEBUG_CPU_TYPE=0 | + | module load intel/18.0.2 impi/17.0.4 mkl/20.0.4 {qe/7.1 or qe/6.8} |
+ | trestles: | ||
</ | </ | ||
- | A small pw.x input was used to allow some parameter sweeps. | + | The performance is not sensitive to qe version between 6.8 and 7.1, but is quite sensitive |
- | < | + | |
- | Single Node Results | + | |
- | {OMP_NUM_THREADS=1|2} time mpirun -np {16|32|64} \ | + | The AUSURF112 benchmark is used for comparison with "-nk 2" and both CPUs on one node |
- | / | + | |
- | -nk {1|4|8|16} <scf.in >log | + | |
- | System | + | < |
- | + | System | |
- | Pinnacle | + | Pinnacle |
- | Pinnacle | + | Pinnacle |
- | Pinnacle I | + | Pinnacle I-Intel6130 7.1 32 |
- | Pinnacle I 32 Intel 6130 6.8 skylake | + | Pinnacle I-Intel6130 7.1 16 |
- | + | Trestles-AMD6136 | |
- | Pinnacle I | + | Trestles-AMD6136 |
- | Pinnacle I 32 Intel 6130 6.8 bulldozer | + | |
- | + | ||
- | Trestles | + | |
- | Trestles | + | |
- | Trestles | + | |
- | + | ||
- | Pinnacle II 64 AMD 7543 6.8 bulldozer | + | |
- | Pinnacle II 64 AMD 7543 6.8 bulldozer | + | |
- | Pinnacle II 64 AMD 7543 6.8 bulldozer | + | |
- | Pinnacle II 64 | + | |
- | Pinnacle II 64 AMD 7543 7.1 bulldozer | + | |
</ | </ | ||
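A sweep over the thread, rank, and pool settings listed in the single-node command template could be scripted roughly as follows. This is a dry-run sketch under stated assumptions: `PW` is a placeholder because the real `pw.x` path is not given here, the per-run log names are an illustrative choice, and not every combination was necessarily run for the table above.

```shell
#!/bin/bash
# Dry run: print one command line per (OMP threads, MPI ranks, k-point pools)
# combination instead of executing it.
# PW is a placeholder for the pw.x executable path (assumption).
PW=${PW:-pw.x}

sweep_commands() {
    local omp np nk
    for omp in 1 2; do
        for np in 16 32 64; do
            for nk in 1 4 8 16; do
                # Log file name encodes the parameters (illustrative choice).
                echo "OMP_NUM_THREADS=$omp mpirun -np $np $PW -nk $nk <scf.in >log.omp$omp.np$np.nk$nk"
            done
        done
    done
}

sweep_commands
```

Printing the commands first makes it easy to review or trim the list before submitting anything to the scheduler.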
- | Conclusions for this sample program: | ||
- | |||
- | '' | ||
- | |||
- | QE 7.1 is slightly slower than 6.8 on Pinnacle I & II and slightly faster on Trestles. | ||
- | |||
- | '' | ||
- | |||
- | The bulldozer version runs on Intel but is significantly slower than the skylake version (other older platforms such as E5 condo nodes won't run AVX512 codes and would probably benefit from their own version). | ||
- | |||
- | The usually zero-wait Trestles system performs relatively well on QE if its 64 GB of shared memory is sufficient: about 1/2 of Pinnacle I performance, whereas for some programs it is about 1/5.