=== Gromacs ===

**Updated by** [[ gromacs2023 ]].

Several versions of the [[ http://www.gromacs.org | Gromacs ]] molecular dynamics program are installed. The most complete and latest is version 2016.3.

Gromacs makes heavy use of the [[ http://www.fftw.org | FFTW ]] FFT package. Both Gromacs and FFTW make heavy use of Intel SSE/AVX vector instructions. Our modules for Gromacs 2016.3 and FFTW 3.3.6 automatically select the proper vector type for our compute nodes at runtime (SSE2, AVX, or AVX2 for FFTW; the same plus SSE4.1 for Gromacs). The scheduler requires that all nodes in a multi-node job be of the same type, so the vector type selected on the head compute node should be correct for every node in the job.
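To check by hand which of these vector types a node supports, the CPU flags can be inspected directly. The sketch below is only an illustration and is not how the modules make their choice: ''pick_simd'' is a hypothetical helper, and it assumes a Linux node whose ''/proc/cpuinfo'' reports the ''sse2'', ''sse4_1'', ''avx'' and ''avx2'' flags.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Gromacs or FFTW modules):
# map a space-separated CPU flags string to the widest vector type
# among those named above (AVX2 > AVX > SSE4.1 > SSE2).
pick_simd() {
    local flags=" $1 "          # pad so every flag is surrounded by spaces
    case "$flags" in
        *" avx2 "*)   echo AVX2 ;;
        *" avx "*)    echo AVX ;;
        *" sse4_1 "*) echo SSE4.1 ;;
        *" sse2 "*)   echo SSE2 ;;
        *)            echo none ;;
    esac
}

# Assumption: Linux x86 node, where /proc/cpuinfo has a "flags" line.
if [ -r /proc/cpuinfo ]; then
    node_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2) || node_flags=""
    echo "This node: $(pick_simd "$node_flags")"
fi
```

Running it on one node of each type is a quick way to confirm that the nodes in a job agree.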

== Non-GPU usage ==
Choose single or double precision by running ''mdrun_mpi'' or ''mdrun_mpi_d'' respectively (use one of the two names shown in braces below):
<code>
module purge
module load intel/16.0.1 impi/5.1.2 mkl/16.0.1 fftw/3.3.6  gromacs/2016.3
NP=$(wc -l <$PBS_NODEFILE)
mpirun -np $NP -machinefile $PBS_NODEFILE {mdrun_mpi|mdrun_mpi_d} -c conf.gro -s topol.tpr
</code>
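The precision choice above can also be wrapped in a small function so that a job script names the precision instead of the binary. This is a sketch under stated assumptions: ''build_mdrun_cmd'' is a hypothetical helper, it only prints the ''mpirun'' line rather than running it, and the input file names are the ones used in the example above.

```shell
#!/bin/bash
# Hypothetical helper: print the mpirun command for a given precision.
# Usage: build_mdrun_cmd single|double NODEFILE
build_mdrun_cmd() {
    local precision=$1 nodefile=$2 binary np
    case "$precision" in
        single) binary=mdrun_mpi ;;
        double) binary=mdrun_mpi_d ;;
        *) echo "precision must be 'single' or 'double'" >&2; return 1 ;;
    esac
    # One line per MPI rank in the node file; the arithmetic expansion
    # strips any padding that wc adds on some systems.
    np=$(( $(wc -l < "$nodefile") ))
    echo "mpirun -np $np -machinefile $nodefile $binary -c conf.gro -s topol.tpr"
}

# Inside a PBS job, show the command that would be run in single precision.
if [ -n "${PBS_NODEFILE:-}" ]; then
    build_mdrun_cmd single "$PBS_NODEFILE"
fi
```

Printing the command first makes it easy to check the rank count before committing a long run; replace the ''echo'' with the command itself once it looks right.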

Single-node benchmark results for the ADH Cubic PME case from [[ http://www.gromacs.org/GPU_acceleration ]]:
<code>
Node Type                   Vector  Compiler  SP ns/day  DP ns/day  GPU
32-core AMD 4-Opteron 6136  SSE2    Intel          4.93       8.55  none
12-core Intel 2-X5670       SSE4.1  Intel          3.16       6.47  none
16-core Intel 2-E5-2670     AVX     Intel          7.47      12.45  none
24-core Intel 2-E5-2650v4   AVX2    Intel         10.95      16.78  none
24-core Intel 2-E5-2650v4   AVX2    gcc           27.85      xx.xx  1 K80
24-core Intel 2-E5-2650v4   AVX2    gcc           37.75      xx.xx  2 K80
24-core Intel 2-E5-2650v4   AVX2    gcc           38.45      xx.xx  4 K80
</code>
In many of our benchmarks, a single 32-core Opteron 6136 Trestles node performs about as well as a 16-core E5-2670 node. For Gromacs, however, the AMD node does noticeably worse (4.93 versus 7.47 ns/day in the first results column above) because of its limited SSE capability.
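The GPU rows of the table can be compared with the CPU-only row for the same 24-core E5-2650v4 node by taking ratios of the first results column. The sketch below does only that arithmetic: ''speedup'' is a hypothetical helper, the numbers are copied from the table, and the comparison is rough because the CPU-only row was built with the Intel compiler and the GPU rows with gcc.

```shell
#!/bin/bash
# Hypothetical helper: print VALUE / BASE to two decimal places.
# Usage: speedup BASE VALUE
speedup() {
    awk -v base="$1" -v value="$2" 'BEGIN { printf "%.2f\n", value / base }'
}

# First-column ns/day for the 24-core E5-2650v4 node, copied from the table.
cpu_only=10.95
for entry in "1:27.85" "2:37.75" "4:38.45"; do
    gpus=${entry%%:*}       # number of K80 GPUs
    rate=${entry##*:}       # ns/day with that many GPUs
    echo "$gpus K80: $(speedup "$cpu_only" "$rate")x the CPU-only rate"
done
```

By these figures, going from 2 to 4 K80s moves the ratio only from about 3.45 to about 3.51, so for this benchmark the extra GPUs add little.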

== GPU usage ==  
in progress