This shows you the differences between two versions of the page.
quantum_espresso [2020/11/20 20:49] root |
quantum_espresso [2022/06/20 18:40] root |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== Quantum Espresso | + | ===== Quantum Espresso ===== |
+ | Version 5.1 | ||
** Compilation ** | ** Compilation ** | ||
Line 82: | Line 83: | ||
=== 2020 Update q-e 6.6=== | === 2020 Update q-e 6.6=== | ||
- | On Trestles with Intel tools. | + | On Trestles with Intel tools. |
< | < | ||
- | module load intel/17.0.4 impi/19.0.5 impi/19.0.5 | + | module load intel/18.0.2 mkl/19.0.5 impi/17.0.4 |
- | OMP_NUM_THREADS=# | + | MKL_NUM_THREADS=# |
</ | </ | ||
- | q-e appears to be a code that does not like mpi threads x OMP threads > cores. | + | q-e appears to be a code that does not like mpi threads x OMP threads > physical |
- | Performance on two trestles nodes is a lot better than the previous 5.1 benchmarks. | + | Performance on two trestles nodes is better than the previous 5.1 benchmarks, but it doesn' |
< | < | ||
- | cores Node type | + | Cores Node type #mpi # |
- | 32 Trestles AMD | + | 32 Trestles AMD |
- | 32 Trestles AMD | + | 32 Trestles AMD |
- | 32 Trestles AMD | + | 32 Trestles AMD |
+ | 32 Trestles AMD 16 | ||
+ | 32 Trestles AMD 64 | ||
+ | 32 Trestles AMD | ||
+ | 32 6130 | ||
+ | 32 6130 | ||
+ | 32 6130 | ||
+ | 48 7402 | ||
+ | 48 7402 | ||
+ | 48 7402 | ||
</ | </ | ||
+ | |||
+ | * using a better optimized version for more modern machines, which doesn' | ||
+ | |||
+ | ** tested with the number of hardware threads, which hurts performance compared with using the number of physical cores
+ | |||
+ | *** 6.5 version. | ||
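The note that q-e does not like MPI ranks x OMP threads exceeding the physical core count can be turned into a small pre-flight calculation. This is a minimal sketch, not part of the original notes: the helper name `omp_threads_for` is hypothetical, and the core count is passed in by hand rather than detected.

```shell
#!/bin/bash
# Hypothetical helper (not from the original page): given the physical cores
# on a node and the MPI ranks placed on it, print the largest OMP thread
# count such that ranks * threads <= cores. Prints 1 when the ranks alone
# already meet or exceed the core count.
omp_threads_for() {
    local cores=$1 ranks=$2
    local threads=$(( cores / ranks ))
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

# Example: 32 physical cores and 16 MPI ranks leave room for 2 threads per rank.
export OMP_NUM_THREADS=$(omp_threads_for 32 16)
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With 64 ranks on 32 cores the helper still returns 1, so it does not prevent oversubscription by ranks alone; it only keeps the thread count from making it worse.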
Install script | Install script | ||
< | < | ||
OMP=" | OMP=" | ||
- | ./ | + | VERSION=6.6 |
+ | ./ | ||
+ | SCALAPACK_LIBS=" | ||
+ | LAPACK_LIBS=" | ||
+ | BLAS_LIBS=" | ||
+ | FFT_LIBS=" | ||
+ | FFLAGS=" | ||
+ | -assume byterecl -I$MKLROOT/ | ||
+ | CFLAGS=" | ||
+ | --with-hdf5=/ | ||
+ | --enable-parallel $OMP --prefix=/ | ||
</ | </ | ||
+ | ==Update 2022== | ||
+ | |||
+ | QE 6.8 and 7.1 are installed, with two versions compiled with the Intel compiler ("
+ | " | ||
+ | " | ||
+ | Both use '' | ||
+ | The '' | ||
+ | The " | ||
+ | '' | ||
+ | For AMD, explicitly set after the module load: | ||
+ | < | ||
+ | export MKL_DEBUG_CPU_TYPE=0 | ||
+ | </ | ||
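If the same job script is used on both Intel and AMD nodes, the export above could be applied only where it is needed. The following is a rough sketch under assumptions: it relies on the Linux `/proc/cpuinfo` `vendor_id` field, the function names are hypothetical, and the value of the variable is simply copied from the note above.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the given vendor string names an AMD CPU.
is_amd_vendor() {
    [ "$1" = "AuthenticAMD" ]
}

# Read the CPU vendor from /proc/cpuinfo (Linux only); prints nothing if the
# file is unavailable.
cpu_vendor() {
    awk -F': *' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo 2>/dev/null
}

if is_amd_vendor "$(cpu_vendor)"; then
    # Value taken as given in the note above; set after the module load.
    export MKL_DEBUG_CPU_TYPE=0
fi
```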
+ | |||
+ | A small pw.x input was used to allow some parameter sweeps. | ||
+ | < | ||
+ | Single Node Results | ||
+ | |||
+ | {OMP_NUM_THREADS=1|2} time mpirun -np {16|32|64} \ | ||
+ | / | ||
+ | -nk {1|4|8|16} <scf.in >log | ||
+ | |||
+ | System | ||
+ | |||
+ | Pinnacle I 32 Intel 6130 6.8 skylake | ||
+ | Pinnacle I 32 Intel 6130 6.8 skylake | ||
+ | Pinnacle I 32 Intel 6130 6.8 skylake | ||
+ | Pinnacle I 32 Intel 6130 6.8 skylake | ||
+ | |||
+ | Pinnacle I 32 Intel 6130 6.8 skylake | ||
+ | Pinnacle I 32 Intel 6130 6.8 bulldozer | ||
+ | |||
+ | Trestles | ||
+ | Trestles | ||
+ | Trestles | ||
+ | |||
+ | Pinnacle II 64 AMD 7543 6.8 bulldozer | ||
+ | Pinnacle II 64 AMD 7543 6.8 bulldozer | ||
+ | Pinnacle II 64 AMD 7543 6.8 bulldozer | ||
+ | Pinnacle II 64 AMD 7543 6.8 bulldozer | ||
+ | Pinnacle II 64 AMD 7543 7.1 bulldozer | ||
+ | </ | ||
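The brace notation in the command template above stands for a sweep over thread count, MPI rank count, and `-nk` pool count. One way to enumerate the combinations is sketched below as a dry run that only prints the commands. The `PW_X` variable and the per-run log names are placeholders introduced here, since the real `pw.x` path is elided in the original.

```shell
#!/bin/bash
# Dry-run sweep generator: prints one command line per parameter combination
# instead of executing it. PW_X is a placeholder for the real pw.x path.
PW_X=${PW_X:-pw.x}

sweep_commands() {
    local threads np nk
    for threads in 1 2; do
        for np in 16 32 64; do
            for nk in 1 4 8 16; do
                # Each run gets its own log so timings are not overwritten.
                echo "OMP_NUM_THREADS=$threads time mpirun -np $np $PW_X -nk $nk <scf.in >log.t${threads}.np${np}.nk${nk}"
            done
        done
    done
}

sweep_commands
```

The printed lines can be reviewed and then piped to `bash` on a compute node. Combinations that oversubscribe a given node type would still need to be filtered out by hand.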
+ | |||
+ | Conclusions for this sample program: | ||
+ | |||
+ | '' | ||
+ | |||
+ | QE 7.1 is slightly slower than 6.8 on Pinnacle I & II and slightly faster on Trestles. | ||
+ | |||
+ | '' | ||
+ | |||
+ | The bulldozer version runs on Intel but is significantly slower than the skylake version (other older platforms such as E5 condo nodes won't run AVX512 codes and would probably benefit from their own version). | ||
+ | The usually zero-wait Trestles system performs relatively well on QE if the job fits in its shared memory (64 GB); "relatively" here means ~1/2 of Pinnacle I performance, whereas for some programs it is ~1/5.