This shows you the differences between two versions of the page.
quantum_espresso [2022/06/20 18:32] root |
quantum_espresso [2022/07/01 20:57] root |
||
---|---|---|---|
Line 1: | Line 1: | ||
===== Quantum Espresso ===== | ===== Quantum Espresso ===== | ||
- | Version 5.1 | ||
- | ** Compilation ** | ||
- | With Intel compiler and either OpenMPI or MVAPICH2: | + | Versions 6.8/7.1 |
- | < | + | |
- | OpenMPI: | + | |
- | DFLAGS | + | |
- | IFLAGS | + | |
- | MPIF90 | + | |
- | CFLAGS | + | |
- | F90FLAGS | + | |
- | FFLAGS | + | |
- | FFLAGS_NOOPT | + | |
- | FFLAGS_NOMAIN | + | |
- | LD = mpif90 | + | |
- | LDFLAGS | + | |
- | SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 | + | |
- | FFT_LIBS | + | |
- | MVAPICH2: same except | + | ** Compilation ** |
- | SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 | + | With Intel compiler, Intel MPI, and MKL |
- | trestles: same except | ||
- | no -axavx (though an " | ||
- | </ | ||
- | |||
- | ** Benchmarks ** | ||
- | |||
- | We run AUSURF112 from [[http:// | ||
- | '' | ||
- | but it does so fairly repeatably so may be timed. | ||
- | < | ||
- | OpenMPI: | ||
- | module load intel/ | ||
- | mpirun -np 64 -machinefile $PBS_NODEFILE -x LD_LIBRARY_PATH \ | ||
- | / | ||
- | MVAPICH2: | ||
- | module load intel/ | ||
- | mpirun -np 64 -machinefile $PBS_NODEFILE \ | ||
- | / | ||
- | </ | ||
- | The table shows Lockwood' | ||
- | <csv> | ||
- | Walltime, | ||
- | Lockwood Gordon E5-2670, | ||
- | Lockwood Trestles AMD6136, | ||
- | Our E5-2650V2, | ||
- | Our E5-2670, | ||
- | Our Trestles AMD6136, | ||
- | Our Trestles AMD6136, | ||
- | </ | ||
- | (1) Fails with error [[http:// | ||
- | |||
- | ** Notes ** | ||
- | |||
- | Each run exits with error messages (which depend on the MPI type) and return code 1, even though the log shows normal termination. This appears harmless: | ||
- | |||
- | < | ||
- | This run was terminated on: 13: 2:44 11Nov2015 | ||
- | =------------------------------------------------------------------------------= | ||
- | JOB DONE. | ||
- | =------------------------------------------------------------------------------= | ||
- | ------------------------------------------------------- | ||
- | Primary job terminated normally, but 1 process returned | ||
- | a non-zero exit code.. Per user-direction, | ||
- | ------------------------------------------------------- | ||
- | ------------------------------------------------------------ | ||
- | A process or daemon was unable to complete a TCP connection | ||
- | to another process: | ||
- | etc. | ||
- | </ | ||
- | |||
- | ** Continuing Work ** | ||
- | |||
- | ELPA in newer versions of Espresso is reportedly faster than ScaLAPACK. | ||
- | |||
- | OpenMPI threading. | ||
- | |||
- | MKL threading. | ||
- | |||
- | FFTW fft vs. Intel fft on AMD. | ||
- | |||
- | === 2020 Update q-e 6.6=== | ||
- | On Trestles with Intel tools. | ||
- | |||
- | |||
- | < | ||
- | module load intel/ | ||
- | MKL_NUM_THREADS=# | ||
- | </ | ||
- | |||
- | q-e appears to perform poorly when MPI ranks x OMP threads exceeds the number of physical cores. | ||
- | Performance on two trestles nodes is better than the previous 5.1 benchmarks, but it doesn' | ||
- | |||
- | < | ||
- | Cores Node type #mpi # | ||
- | 32 Trestles AMD 32 | ||
- | 32 Trestles AMD 32 | ||
- | 32 Trestles AMD 32 | ||
- | 32 Trestles AMD 16 | ||
- | 32 Trestles AMD 64 | ||
- | 32 Trestles AMD | ||
- | 32 6130 | ||
- | 32 6130 | ||
- | 32 6130 | ||
- | 48 7402 | ||
- | 48 7402 | ||
- | 48 7402 | ||
- | </ | ||
- | |||
- | * using a better optimized version for more modern machines, which doesn' | ||
- | |||
- | ** tested with the number of hardware threads, which performs worse than using the number of physical cores | ||
- | |||
- | *** 6.5 version. | ||
- | |||
- | Install script | ||
< | < | ||
+ | # | ||
+ | # | ||
+ | COMPUTER=bulldozer | ||
+ | OPT=" | ||
+ | VERSION=7.1 | ||
+ | HDF5=1.12.0 | ||
+ | module purge | ||
+ | module load intel/ | ||
OMP=" | OMP=" | ||
- | VERSION=6.6 | + | make clean |
./ | ./ | ||
SCALAPACK_LIBS=" | SCALAPACK_LIBS=" | ||
Line 125: | Line 22: | ||
BLAS_LIBS=" | BLAS_LIBS=" | ||
FFT_LIBS=" | FFT_LIBS=" | ||
- | FFLAGS=" | + | FFLAGS=" |
- | -assume byterecl -I$MKLROOT/ | + | CFLAGS=" |
- | CFLAGS=" | + | --with-hdf5=/ |
- | --with-hdf5=/ | + | $OMP --prefix=/ |
- | --enable-parallel $OMP --prefix=/ | + | make depends |
+ | make all | ||
+ | make install | ||
</ | </ | ||
- | ==Update 2022== | + | Runtime: |
- | + | ||
- | QE 6.8 and 7.1 are installed with two versions compiled with the Intel compiler (" | + |
- | " | + | |
- | " | + | |
- | Both use '' | + | |
- | The '' | + | |
- | The " | + | |
- | '' | + | |
- | For AMD, explicitly set after the module load: | + | |
< | < | ||
- | export MKL_DEBUG_CPU_TYPE=0 | + | module load intel/18.0.2 impi/17.0.4 mkl/20.0.4 {qe/7.1 or qe/6.8} |
+ | trestles: | ||
</ | </ | ||
- | A small pw.x input was used to allow some parameter sweeps. | + | The performance is not sensitive to qe version between 6.8 and 7.1, but is quite sensitive |
- | < | + | |
- | Single Node Results | + | |
- | {OMP_NUM_THREADS=1|2} time mpirun -np {16|32|64} \ | + | The AUSURF112 benchmark is used for comparison with "-nk 2" and both CPUs on one node |
- | / | + | |
- | -nk {1|4|8|16} <scf.in >log | + | |
- | System | + | < |
- | + | System | |
- | Pinnacle | + | Pinnacle |
- | Pinnacle | + | Pinnacle |
- | Pinnacle I | + | Pinnacle I-Intel6130 7.1 32 |
- | Pinnacle I 32 Intel 6130 6.8 skylake | + | Pinnacle I-Intel6130 7.1 16 |
- | + | Trestles-AMD6136 | |
- | Pinnacle I | + | Trestles-AMD6136 |
- | Pinnacle I 32 Intel 6130 6.8 bulldozer | + | |
- | + | ||
- | Trestles | + | |
- | Trestles | + | |
- | Trestles | + | |
- | + | ||
- | Pinnacle II 64 AMD 7543 6.8 bulldozer | + | |
- | Pinnacle II 64 AMD 7543 6.8 bulldozer | + | |
- | Pinnacle II 64 AMD 7543 6.8 bulldozer | + | |
- | Pinnacle II 64 | + | |
- | Pinnacle II 64 AMD 7543 7.1 bulldozer | + | |
</ | </ | ||
- | Conclusions for this sample program: | ||
- | |||
- | '' | ||
- | |||
- | QE 7.1 is slightly slower than 6.8 on Pinnacle I & II and slightly faster on Trestles. | ||
- | |||
- | '' | ||
- | |||
- | The bulldozer version runs on Intel but is significantly slower than the skylake version. | ||
- | |||
- | The uncrowded Trestles system has relatively good performance on QE if memory (64 GB) allows. |
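
The notes on this page suggest keeping MPI ranks x OMP threads at or below the number of physical cores. A minimal sketch of that rule follows. The helper name `omp_threads_for`, the core and rank counts, and the assumption that `pw.x` is on the PATH after the module load are all illustrative and not taken from the page; the script only prints the launch line rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not from the original page): largest OMP thread count
# such that MPI ranks x OMP threads stays within the physical core count.
omp_threads_for() {
    cores=$1
    ranks=$2
    threads=$((cores / ranks))
    # Never go below one thread, even when ranks exceed cores.
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

CORES=32      # assumed: physical cores on one node
RANKS=16      # assumed: number of MPI ranks
POOLS=2       # "-nk 2", as used for the AUSURF112 comparison
INPUT=scf.in  # assumed input file name

OMP_NUM_THREADS=$(omp_threads_for "$CORES" "$RANKS")

# Print the command instead of running it, so the sketch is safe to try anywhere.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS mpirun -np $RANKS pw.x -nk $POOLS <$INPUT >log"
```

Changing `CORES` and `RANKS` to match a given node type shows which thread count stays within the physical cores before any job is submitted.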