quantum_espresso [2020/12/30 21:07] root |
quantum_espresso [2022/06/20 18:40] root |
<code> | <code> |
OMP="--enable-openmp" | OMP="--enable-openmp" |
./install/configure MPIF90=mpiifort F90=ifort F77=ifort FC=ifort CC=icc SCALAPACK_LIBS="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" LAPACK_LIBS="-L$MKLROOT/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64" BLAS_LIBS="-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -thread" FFT_LIBS="-L$MKLROOT/interfaces/fftw3xf -lfftw3xf_intel" FFLAGS="-O3 -xHOST -D__INTEL -D__GNUC__ -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK -assume byterecl -I$MKLROOT/include/fftw" CFLAGS="-O3 -xHOST -D__INTEL -D__GNUC__ -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK" --with-hdf5=/share/apps/hdf5/1.10.5/intel/impi -with-scalapack=intel --enable-parallel $OMP --prefix=/share/apps/espresso/espresso-6.6-intel-impi-mkl-trestles | VERSION=6.6 |
| ./install/configure MPIF90=mpiifort F90=ifort F77=ifort FC=ifort CC=icc \ |
| SCALAPACK_LIBS="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" \ |
| LAPACK_LIBS="-L$MKLROOT/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64" \ |
| BLAS_LIBS="-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -thread" \ |
| FFT_LIBS="-L$MKLROOT/interfaces/fftw3xf -lfftw3xf_intel" \ |
| FFLAGS="-O3 -xHOST -D__INTEL -D__GNUC__ -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK \ |
| -assume byterecl -I$MKLROOT/include/fftw" \ |
| CFLAGS="-O3 -xHOST -D__INTEL -D__GNUC__ -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK" \ |
| --with-hdf5=/share/apps/hdf5/1.10.5/intel/impi --with-scalapack=intel \ |
| --enable-parallel $OMP --prefix=/share/apps/espresso/espresso-$VERSION-intel-impi-mkl-trestles |
</code> | </code> |
| |
| ==Update 2022== |
| |
| QE 6.8 and 7.1 are installed, each in two versions compiled with the Intel compiler ("skylake" for Intel and "bulldozer" for AMD). |
| "skylake" uses ''-xHOST'' and is compiled on Pinnacle I. |
| "bulldozer" uses ''-msse3 -axsse3,sse4.2,AVX,core-AVX2,CORE-AVX512'' and is compiled on Trestles, so it should run, at some speed, on all systems. |
| Both use ''module load intel/19.0.5 mkl/19.0.5 impi/17.0.4''. |
| The ''impi/19.0.5'' module causes a fault on the AMD platforms for unknown reasons. |
| The "skylake" binary causes a fault on the AMD platforms because of a single AVX512 code path. |
| ''export MKL_DEBUG_CPU_TYPE=5'', which the ''mkl<20'' modules set on AMD, causes a fault on the AMD platforms (failure on Trestles, wrong answer on Pinnacle II). |
| For AMD, explicitly set after the module load: |
| <code> |
| export MKL_DEBUG_CPU_TYPE=0 |
| </code> |
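The override above can be made conditional, so that one job script works on both Intel and AMD nodes. The following is a minimal sketch rather than a tested site script: it assumes a Linux node where ''/proc/cpuinfo'' reports ''vendor_id'', and the helper names ''mkl_cpu_type_for_vendor'' and ''detect_vendor'' are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a CPU vendor string to the MKL_DEBUG_CPU_TYPE
# value recommended above (0 on AMD). Prints nothing for other vendors.
mkl_cpu_type_for_vendor() {
    case "$1" in
        AuthenticAMD) echo 0 ;;
        *) ;;
    esac
}

# Assumption: Linux exposes the vendor in /proc/cpuinfo as "vendor_id : ...".
detect_vendor() {
    awk -F': *' '/^vendor_id/ { print $2; exit }' /proc/cpuinfo 2>/dev/null
}

# Run this after: module load intel/19.0.5 mkl/19.0.5 impi/17.0.4
# so that it overrides whatever the mkl module exported.
cpu_type=$(mkl_cpu_type_for_vendor "$(detect_vendor)")
if [ -n "$cpu_type" ]; then
    export MKL_DEBUG_CPU_TYPE="$cpu_type"
fi
```

On Intel nodes the variable is left untouched, so the same script can be submitted to any of the three systems.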
| |
| A small pw.x input was used to allow some parameter sweeps. |
| <code> |
| Single Node Results |
| |
| {OMP_NUM_THREADS=1|2} time mpirun -np {16|32|64} \ |
| /share/apps/espresso/espresso-{6.8|7.1}-intel-impi-mkl-{skylake|bulldozer}/bin/pw.x \ |
| -nk {1|4|8|16} <scf.in >log |
| |
| System Cores CPU QE version compile #mpi #OMP #MKL -nk Wall Time(s) |
| |
| Pinnacle I 32 Intel 6130 6.8 skylake 32 1 1 1 20.5 |
| Pinnacle I 32 Intel 6130 6.8 skylake 32 1 1 4 13.6 |
| Pinnacle I 32 Intel 6130 6.8 skylake 32 1 1 8 9.5 |
| Pinnacle I 32 Intel 6130 6.8 skylake 32 1 1 16 14.3 |
| |
| Pinnacle I 32 Intel 6130 6.8 skylake 16 2 1 4 >360 |
| Pinnacle I 32 Intel 6130 6.8 bulldozer 32 1 1 8 22.0 |
| |
| Trestles 32 AMD 6136 6.8 bulldozer 32 1 1 8 17.5 |
| Trestles 32 AMD 6136 6.8 bulldozer 32 1 1 4 16.2 |
| Trestles 32 AMD 6136 7.1 bulldozer 32 1 1 4 15.3 |
| |
| Pinnacle II 64 AMD 7543 6.8 bulldozer 64 1 1 4 4.9 |
| Pinnacle II 64 AMD 7543 6.8 bulldozer 64 1 1 8 4.0 |
| Pinnacle II 64 AMD 7543 6.8 bulldozer 64 1 1 16 4.5 |
| Pinnacle II 64 AMD 7543 6.8 bulldozer 32 1 1 8 6.0 |
| Pinnacle II 64 AMD 7543 7.1 bulldozer 64 1 1 8 4.2 |
| </code> |
| |
| Conclusions for this sample program: |
| |
| ''-nk 8'' is best on Pinnacle I & II platforms, ''-nk 4'' on Trestles. |
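As an illustration of applying this result, the sketch below builds the ''pw.x'' command line from the run template above with the fastest measured ''-nk'' for each system. The system labels and the helper names ''best_nk'' and ''pw_command'' are hypothetical, and the values apply only to this sample input. The command is printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: return the -nk value that was fastest in the
# single-node table for the given system label (this sample input only).
best_nk() {
    case "$1" in
        pinnacle1|pinnacle2) echo 8 ;;
        trestles)            echo 4 ;;
        *) echo "unknown system: $1" >&2; return 1 ;;
    esac
}

# Build (but do not run) the pw.x command line.
# Arguments: system label, MPI ranks, QE version, build name.
pw_command() {
    nk=$(best_nk "$1") || return 1
    echo "OMP_NUM_THREADS=1 mpirun -np $2" \
         "/share/apps/espresso/espresso-$3-intel-impi-mkl-$4/bin/pw.x" \
         "-nk $nk"
}

pw_command pinnacle2 64 6.8 bulldozer
```

To run the job, append the input and output redirection (''<scf.in >log'') to the printed command.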
| |
| QE 7.1 is slightly slower than 6.8 on Pinnacle I & II and slightly faster on Trestles. |
| |
| ''OMP_NUM_THREADS>1'' causes a very substantial slowdown versus 1 or unset. |
| |
| The bulldozer version runs on Intel but is significantly slower than the skylake version. Other older platforms, such as the E5 condo nodes, cannot run AVX512 code and would probably benefit from their own version. |
| |
| The usually zero-wait Trestles system performs relatively well on QE if its shared memory (64 GB) is sufficient: about 1/2 of Pinnacle I performance, whereas for some programs it is about 1/5.
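The ratios behind these conclusions can be checked directly from the wall times in the table. This is a small sketch using ''awk'' for the division. The helper name ''ratio'' is hypothetical, and the times are the QE 6.8 values for Pinnacle I (skylake, ''-nk 8'': 9.5 s, bulldozer, ''-nk 8'': 22.0 s) and Trestles (bulldozer, ''-nk 4'': 16.2 s).

```shell
#!/bin/sh
# Hypothetical helper: print a / b to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Trestles performance relative to Pinnacle I (inverse ratio of wall times).
ratio 9.5 16.2    # prints 0.59

# Slowdown of the bulldozer build versus skylake on Pinnacle I.
ratio 22.0 9.5    # prints 2.32
```

The first value corresponds to the "about 1/2" figure. The second shows the bulldozer build taking roughly 2.3 times as long as the skylake build on the same Intel node.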