For a reference Intel compilation, see
''/scrfs/apps/lammps/lammps-16Mar18/src/lmp_pinnacle-threaded''. The installed LAMMPS packages are:
<code>
/scrfs/apps/lammps/lammps-16Mar18/src$ make ps | grep "YES:"
Installed YES: package CLASS2
Installed YES: package KSPACE
Installed YES: package MANYBODY
Installed YES: package MOLECULE
Installed YES: package USER-FEP
Installed YES: package USER-INTEL
Installed YES: package USER-MESO
Installed YES: package USER-MISC
Installed YES: package USER-MOLFILE
Installed YES: package USER-REAXC
Installed YES: package USER-SMD
</code>
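To check from a script whether a build includes the packages an input file needs, the ''make ps'' listing can be filtered. This is a minimal sketch: the function names ''installed_packages'' and ''has_package'' are hypothetical helpers, not part of LAMMPS, and the sketch assumes the ''Installed YES: package NAME'' line format shown above.

```shell
#!/bin/sh
# Print the names of installed packages, one per line, reading `make ps` output on stdin.
# Assumes lines of the form "Installed YES: package NAME".
installed_packages() {
    awk '$1 == "Installed" && $2 == "YES:" && $3 == "package" { print $4 }'
}

# Succeed if the package named in $1 is installed, reading `make ps` output on stdin.
has_package() {
    installed_packages | grep -qxF -- "$1"
}
```

Run from the ''src'' directory, a check might look like ''make ps | has_package USER-INTEL && echo "USER-INTEL is installed"''.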
The makefile is a slight modification of the ''Makefile.knl'' included with LAMMPS. It was compiled on Trestles so that the binary is upward compatible with all systems. For the Xeon Phi nodes you would want to restore the ''-xMIC-AVX512'' flag from the original.
<code>
/scrfs/apps/lammps/lammps-16Mar18/src/MAKE/OPTIONS$ diff Makefile.knl Makefile.pinnacle-threaded
10c10
< OPTFLAGS = -xMIC-AVX512 -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
---
> OPTFLAGS = -xHOST -axsse4.2,AVX,CORE-AVX512 -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
12c12,13
< -DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG $(OPTFLAGS)
---
> -DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG $(OPTFLAGS) \
> -I/scrfs/apps/intel/parallel_studio_xe_2019_update4/compilers_and_libraries_2019.5.281/linux/tbb/include
18c19
< LIB = -ltbbmalloc
---
> LIB = -L$(MKLROOT)/lib/intel64/ -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -ltbbmalloc
33c34
< LMP_INC = -DLAMMPS_GZIP -DLAMMPS_JPEG
---
> LMP_INC = -DLAMMPS_GZIP -DLAMMPS_MEMALIGN=64 -DLAMMPS_JPEG
119c120
< cc -O -o $@ $<
---
> icc -O -o $@ $<
</code>
The best known run configuration on all (non-Phi) systems uses a number of MPI ranks equal to physical cores/2, with 2 OpenMP threads per rank. For example, on a 32-core Pinnacle node:
<code>
module load intel/19.0.5 impi/19.0.5 mkl/19.0.5
mpirun -np 16 -genv OMP_NUM_THREADS=2 /scrfs/apps/lammps/lammps-16Mar18/src/lmp_pinnacle-threaded -pk intel 0 omp 2 -sf intel <in.dpd >out
</code>
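The rank count can be derived from the core count rather than hard-coded. This sketch only prints the launch line, it does not run it. The function names ''ranks_per_node'' and ''launch_line'' are hypothetical, and the core count is passed in by hand rather than detected, since logical CPU counts may include hyperthreads.

```shell
#!/bin/sh
# MPI ranks per node: physical cores divided by OpenMP threads per rank.
ranks_per_node() {
    cores=$1; threads=$2
    echo $(( cores / threads ))
}

# Print (do not execute) the launch line for a node.
# Arguments: physical cores, OpenMP threads per rank, input file.
launch_line() {
    cores=$1; threads=$2; input=$3
    np=$(ranks_per_node "$cores" "$threads")
    echo "mpirun -np $np -genv OMP_NUM_THREADS=$threads /scrfs/apps/lammps/lammps-16Mar18/src/lmp_pinnacle-threaded -pk intel 0 omp $threads -sf intel <$input >out"
}

# 32-core Pinnacle node, 2 threads per rank
launch_line 32 2 in.dpd
```

With 16 cores and 2 threads per rank the same arithmetic gives 8 ranks, which matches the ''np=8'' used for Razor.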
Single-node benchmark times across systems are:
<code>
Razor 16 core, np=8 1:56
Trestles 32 core, np=16 2:02
Pinnacle 32 core, np=16 0:46
</code>