quantum_espresso
Differences
This shows you the differences between two versions of the page.
| quantum_espresso [2022/06/20 18:32] – root | quantum_espresso [2025/10/15 19:51] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ===== Quantum Espresso ===== | ===== Quantum Espresso ===== | ||
| - | Version 5.1 | ||
| - | ** Compilation ** | ||
| - | With Intel compiler and either OpenMPI or MVAPICH2: | + | Versions 6.8/7.1 |
| - | < | + | |
| - | OpenMPI: | + | |
| - | DFLAGS | + | |
| - | IFLAGS | + | |
| - | MPIF90 | + | |
| - | CFLAGS | + | |
| - | F90FLAGS | + | |
| - | FFLAGS | + | |
| - | FFLAGS_NOOPT | + | |
| - | FFLAGS_NOMAIN | + | |
| - | LD = mpif90 | + | |
| - | LDFLAGS | + | |
| - | SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 | + | |
| - | FFT_LIBS | + | |
| - | MVAPICH2: same except | + | ** Compilation ** |
| - | SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 | + | With Intel compiler, Intel MPI, and MKL |
| - | trestles: same except | ||
| - | no -axavx (though an " | ||
| - | </ | ||
| - | |||
| - | ** Benchmarks ** | ||
| - | |||
| - | We run AUSURF112 from [[http:// | ||
| - | '' | ||
| - | but it does so fairly repeatably, so it can still be timed. | ||
| - | < | ||
| - | OpenMPI: | ||
| - | module load intel/ | ||
| - | mpirun -np 64 -machinefile $PBS_NODEFILE -x LD_LIBRARY_PATH \ | ||
| - | / | ||
| - | MVAPICH2: | ||
| - | module load intel/ | ||
| - | mpirun -np 64 -machinefile $PBS_NODEFILE \ | ||
| - | / | ||
| - | </ | ||
| - | The table shows Lockwood' | ||
| - | <csv> | ||
| - | Walltime, | ||
| - | Lockwood Gordon E5-2670, | ||
| - | Lockwood Trestles AMD6136, | ||
| - | Our E5-2650V2, | ||
| - | Our E5-2670, | ||
| - | Our Trestles AMD6136, | ||
| - | Our Trestles AMD6136, | ||
| - | </ | ||
| - | (1) Fails with error [[http:// | ||
| - | |||
| - | ** Notes ** | ||
| - | |||
| - | Each run exits with return code 1 and prints error messages (which depend on the MPI type), even though the log reports normal termination. This appears harmless: | ||
| - | |||
| - | < | ||
| - | This run was terminated on: 13: 2:44 11Nov2015 | ||
| - | =------------------------------------------------------------------------------= | ||
| - | JOB DONE. | ||
| - | =------------------------------------------------------------------------------= | ||
| - | ------------------------------------------------------- | ||
| - | Primary job terminated normally, but 1 process returned | ||
| - | a non-zero exit code.. Per user-direction, | ||
| - | ------------------------------------------------------- | ||
| - | ------------------------------------------------------------ | ||
| - | A process or daemon was unable to complete a TCP connection | ||
| - | to another process: | ||
| - | etc. | ||
| - | </ | ||
| - | |||
| - | ** Continuing Work ** | ||
| - | |||
| - | ELPA in newer versions of Espresso is reportedly faster than ScaLAPACK. | ||
| - | |||
| - | OpenMPI threading. | ||
| - | |||
| - | MKL threading. | ||
| - | |||
| - | FFTW fft vs. Intel fft on AMD. | ||
| - | |||
| - | === 2020 Update: q-e 6.6 === | ||
| - | On Trestles with Intel tools. | ||
| - | |||
| - | |||
| - | < | ||
| - | module load intel/ | ||
| - | MKL_NUM_THREADS=# | ||
| - | </ | ||
| - | |||
| - | q-e appears to perform poorly when (MPI processes × OpenMP threads) exceeds the number of physical cores. | ||
| - | Performance on two Trestles nodes is better than the previous 5.1 benchmarks, but it doesn' | ||
| - | |||
| - | < | ||
| - | Cores Node type #mpi # | ||
| - | 32 Trestles AMD 32 | ||
| - | 32 Trestles AMD 32 | ||
| - | 32 Trestles AMD 32 | ||
| - | 32 Trestles AMD 16 | ||
| - | 32 Trestles AMD 64 | ||
| - | 32 Trestles AMD | ||
| - | 32 6130 | ||
| - | 32 6130 | ||
| - | 32 6130 | ||
| - | 48 7402 | ||
| - | 48 7402 | ||
| - | 48 7402 | ||
| - | </ | ||
| - | |||
| - | * using a better optimized version for more modern machines, which doesn' | ||
| - | |||
| - | ** tested with (MPI processes × OpenMP threads) equal to the number of hardware threads, which performs worse than matching the number of physical cores | ||
| - | |||
| - | *** 6.5 version. | ||
| - | |||
| - | Install script | ||
| < | < | ||
| + | # | ||
| + | # | ||
| + | COMPUTER=bulldozer | ||
| + | OPT=" | ||
| + | VERSION=7.1 | ||
| + | HDF5=1.12.0 | ||
| + | module purge | ||
| + | module load intel/ | ||
| OMP=" | OMP=" | ||
| - | VERSION=6.6 | + | make clean |
| ./ | ./ | ||
| SCALAPACK_LIBS=" | SCALAPACK_LIBS=" | ||
| Line 125: | Line 22: | ||
| BLAS_LIBS=" | BLAS_LIBS=" | ||
| FFT_LIBS=" | FFT_LIBS=" | ||
| - | FFLAGS=" | + | FFLAGS=" |
| - | -assume byterecl -I$MKLROOT/ | + | CFLAGS=" |
| - | CFLAGS=" | + | --with-hdf5=/ |
| - | --with-hdf5=/ | + | $OMP --prefix=/ |
| - | --enable-parallel $OMP --prefix=/ | + | make depends |
| + | make all | ||
| + | make install | ||
| </ | </ | ||
| - | ==Update 2022== | + | Runtime: |
| - | + | ||
| - | QE 6.8 and 7.1 are installed in two versions compiled with the Intel compiler (" | + | ||
| - | " | + | |
| - | " | + | |
| - | Both use '' | + | |
| - | The '' | + | |
| - | The " | + | |
| - | '' | + | |
| - | For AMD, explicitly set after the module load: | + | |
| < | < | ||
| - | export MKL_DEBUG_CPU_TYPE=0 | + | module load intel/18.0.2 impi/17.0.4 mkl/20.0.4 {qe/7.1 or qe/6.8} |
| + | trestles: | ||
| </ | </ | ||
| - | A small pw.x input was used to allow some parameter sweeps. | + | The performance is not sensitive to the QE version (6.8 vs. 7.1), but is quite sensitive |
| - | < | + | |
| - | Single Node Results | + | |
| - | {OMP_NUM_THREADS=1|2} time mpirun -np {16|32|64} \ | + | The AUSURF112 benchmark is used for comparison with "-nk 2" and both CPUs on one node |
| - | / | + | |
| - | -nk {1|4|8|16} <scf.in >log | + | |
| - | System | + | < |
| - | + | System | |
| - | Pinnacle | + | Pinnacle |
| - | Pinnacle | + | Pinnacle |
| - | Pinnacle I | + | Pinnacle I-Intel6130 7.1 32 |
| - | Pinnacle I 32 Intel 6130 6.8 skylake | + | Pinnacle I-Intel6130 7.1 16 |
| - | + | Trestles-AMD6136 | |
| - | Pinnacle I | + | Trestles-AMD6136 |
| - | Pinnacle I 32 Intel 6130 6.8 bulldozer | + | |
| - | + | ||
| - | Trestles | + | |
| - | Trestles | + | |
| - | Trestles | + | |
| - | + | ||
| - | Pinnacle II 64 AMD 7543 6.8 bulldozer | + | |
| - | Pinnacle II 64 AMD 7543 6.8 bulldozer | + | |
| - | Pinnacle II 64 AMD 7543 6.8 bulldozer | + | |
| - | Pinnacle II 64 | + | |
| - | Pinnacle II 64 AMD 7543 7.1 bulldozer | + | |
| </ | </ | ||
| - | Conclusions for this sample program: | ||
| - | |||
| - | '' | ||
| - | |||
| - | QE 7.1 is slightly slower than 6.8 on Pinnacle I & II and slightly faster on Trestles. | ||
| - | |||
| - | '' | ||
| - | |||
| - | The bulldozer version runs on Intel but is significantly slower than the skylake version. | ||
| - | |||
| - | The uncrowded Trestles system has relatively good performance on QE if memory (64 GB) allows. | ||
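The notes above observe that q-e slows down once MPI processes × OpenMP threads exceed the physical core count. A minimal POSIX sh sketch of a pre-launch check follows. `physical_cores` and `layout_fits` are hypothetical helper names (not part of QE or the site modules), and the core count assumes the util-linux `lscpu` command is available on the compute node.

```shell
#!/bin/sh
# Hypothetical helpers (not part of QE or the site modules): check that an
# MPI x OpenMP layout does not exceed the node's physical cores.

# Count physical cores as unique (socket, core) pairs reported by lscpu.
physical_cores() {
  lscpu --parse=SOCKET,CORE | grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Usage: layout_fits NRANKS NTHREADS [NCORES]
# Succeeds (status 0) if NRANKS * NTHREADS <= NCORES; NCORES defaults to
# the physical core count of the current node.
layout_fits() {
  ncores=${3:-$(physical_cores)}
  [ $(( $1 * $2 )) -le "$ncores" ]
}

# Example: warn before launching 32 ranks x OMP_NUM_THREADS threads.
# layout_fits 32 "${OMP_NUM_THREADS:-1}" || echo "oversubscribed" >&2
```

For example, `layout_fits 16 2 32` succeeds while `layout_fits 32 2 32` fails, so a job script can refuse to start an oversubscribed run before calling `mpirun`.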

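The single-node sweep in the 2022 results loops over `OMP_NUM_THREADS` (1|2), `mpirun -np` (16|32|64) and `pw.x -nk` (1|4|8|16). Because the pw.x path in that command is elided, the sketch below is only a dry run that prints the command lines, with a bare `pw.x` as a placeholder. `qe_sweep` is a hypothetical name, and the filter that skips oversubscribed layouts reflects the physical-core observation above, not a constraint QE itself enforces.

```shell
#!/bin/sh
# Hypothetical dry run of the single-node sweep: prints one command line per
# (OMP_NUM_THREADS, -np, -nk) combination instead of running it.
# Usage: qe_sweep NCORES [PW_X]   (PW_X defaults to a bare "pw.x" placeholder)
qe_sweep() {
  ncores=$1
  pw=${2:-pw.x}
  for omp in 1 2; do
    for np in 16 32 64; do
      # Skip layouts where MPI ranks x OpenMP threads exceed physical cores.
      [ $((np * omp)) -le "$ncores" ] || continue
      for nk in 1 4 8 16; do
        printf 'OMP_NUM_THREADS=%s time mpirun -np %s %s -nk %s <scf.in >%s\n' \
          "$omp" "$np" "$pw" "$nk" "log.omp$omp.np$np.nk$nk"
      done
    done
  done
}

# Example: qe_sweep 32 /path/to/pw.x
```

On a 32-core node, `qe_sweep 32` prints 12 of the 24 combinations; the 64-rank runs and the 32-rank × 2-thread runs are dropped.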