
The NAMD verbs-smp binaries, versions 2.11 and 2.12, are installed in /share/apps/NAMD on razor and trestles. They do not use MPI.

For multiple-node runs, use charmrun as the distributed launcher, with namd2 running on each compute node. We have found that most runs are faster with the +setcpuaffinity and +isomalloc_sync options. The charmrun ++ppn value should match the PBS ppn= request (16 on the Intel nodes, 32 on the AMD nodes).

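module load namd/2.12   # or namd/2.11
cd $PBS_O_WORKDIR
NP=$(wc -l <$PBS_NODEFILE)
# Build the Charm++ nodelist: one "host" line per allocated node
rm -f nodelist
for node in $(sort -u $PBS_NODEFILE); do
  echo "host ${node}" >> nodelist
done
# ++ppn must match the PBS ppn= request (16 here)
charmrun ++remote-shell ssh ++nodelist nodelist ++ppn 16 $(which namd2) \
  +p $NP +setcpuaffinity +isomalloc_sync apoa1.namd >apoa1.logfile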
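The ++ppn value in the script is fixed at 16, which only matches a 16-core ppn= request. A minimal sketch, assuming the usual PBS nodefile layout of one line per allocated core, derives both the per-node count and the Charm++ nodelist from the nodefile itself. The function names are hypothetical helpers, not part of NAMD or Charm++.

```bash
#!/bin/bash
# Hypothetical helpers (not part of NAMD or Charm++). They assume a PBS
# nodefile that repeats each host name once per allocated core.

# Print the number of distinct hosts in the nodefile.
count_hosts() {
    sort -u "$1" | wc -l
}

# Print cores per host: total nodefile lines divided by distinct hosts.
ppn_from_nodefile() {
    local total hosts
    total=$(wc -l < "$1")
    hosts=$(count_hosts "$1")
    echo $(( total / hosts ))
}

# Write one "host NAME" line per distinct host, the format the job script uses.
write_nodelist() {
    local nodefile=$1 out=$2
    sort -u "$nodefile" | while read -r node; do
        echo "host ${node}"
    done > "$out"
}
```

In a job script, `PPN=$(ppn_from_nodefile "$PBS_NODEFILE")` followed by `write_nodelist "$PBS_NODEFILE" nodelist` would replace the loop, and charmrun would then get `++ppn $PPN`. The integer division assumes every node was allocated the same number of cores.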

For a single-node run, use the shared-memory namd2 binary on its own, without charmrun.

module load namd/2.12   # or namd/2.11
cd $PBS_O_WORKDIR
# Count the cores PBS allocated (the nodefile has one line per core)
NP=$(wc -l <$PBS_NODEFILE)
namd2 +p $NP +setcpuaffinity +isomalloc_sync apoa1.namd >apoa1.logfile
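Because namd2 without charmrun runs on a single host, a job that accidentally requests several nodes would leave all but one of them idle. A small guard, sketched here as a hypothetical helper that assumes the same one-line-per-core nodefile, can stop such a job before namd2 starts.

```bash
#!/bin/bash
# Hypothetical guard for the single-node recipe (not part of NAMD).
# namd2 without charmrun uses one host only, so refuse multi-host allocations.
# Returns 0 for a single-host nodefile, 1 otherwise.
require_single_node() {
    local hosts
    hosts=$(sort -u "$1" | wc -l)
    if [ "$hosts" -ne 1 ]; then
        echo "nodefile lists $hosts hosts; use the charmrun recipe instead" >&2
        return 1
    fi
    return 0
}
```

Calling `require_single_node "$PBS_NODEFILE" || exit 1` just before the namd2 line makes the job exit with a message instead of running on one node out of several.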


The NAMD website has benchmarks run on Trestles while it was at UCSD, but it gives no information on how they were obtained (namd2, charmrun, or MPI). The benchmarks below are shown as benchmark time * cores. The best charmrun results came from multiple-node runs with ppn equal to the cores per node and p equal to the total number of cores (ppn * nodes). The single-node namd2 runs on both node types used p equal to the core count and are comparable with the published benchmarks.

Version 2.12 is substantially faster than 2.11 (wall clock 256 versus 383 on a 16-core Intel node). The module selects the downloaded verbs-smp build because it is faster than the ibverbs-smp build. On this problem the Intel nodes showed no useful scaling beyond 2 nodes, and the AMD nodes showed little useful scaling beyond 3 nodes.

Node Type      ppn  version   p  Nodes  Bench  WallClock  UCSD Bench
16-core Intel   16  2.11     16      1   1.21        383         n/a
16-core Intel   16  2.12     16      1   0.76        256         n/a
16-core Intel   16  2.12     16      2   0.90        146         n/a
16-core Intel   16  2.12     16      3   1.32        146         n/a
32-core AMD     32  2.12     32      1   1.95        317         1.9
32-core AMD     32  2.12     32      2   2.22        185         2.0
32-core AMD     32  2.12     32      3   2.29        127         n/a
32-core AMD     32  2.12     32      4   2.56        104         n/a
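The two derived quantities behind this table can be reproduced with a short awk sketch. It assumes that "benchmark time" is the s/step figure NAMD prints on its "Info: Benchmark time:" lines, taken here to look like `Info: Benchmark time: N CPUs T s/step ...`, and it measures scaling as speedup and per-node efficiency relative to the 1-node wall clock. The function names bench_from_log and scaling are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers for the benchmark table (not part of NAMD).

# Bench = (seconds per step) * (cores), computed for each log line assumed
# to look like: "Info: Benchmark time: <N> CPUs <T> s/step ...".
bench_from_log() {
    awk '/Benchmark time:/ {
        cores = ""; sps = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "time:") cores = $(i + 1)
            if ($i == "s/step") sps = $(i - 1)
        }
        if (cores != "" && sps != "") printf "%.2f\n", cores * sps
    }' "$1"
}

# Read "<nodes> <wallclock>" lines on stdin, 1-node run first, and print
# nodes, speedup over the 1-node run, and efficiency (speedup / nodes).
scaling() {
    awk 'NR == 1 { base = $2 }
         { s = base / $2; printf "%d %.2f %.0f%%\n", $1, s, 100 * s / $1 }'
}
```

Feeding in the AMD rows, `printf '1 317\n2 185\n3 127\n4 104\n' | scaling` ends with `4 3.05 76%`, so four nodes deliver about three times the 1-node speed; the Intel rows end with `3 1.75 58%`. Efficiency is counted per node because the 1-node run already uses every core. NAMD may print more than one "Benchmark time" line per run, and bench_from_log reports each one.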

2020 Update

We replicated a couple of the old benchmarks and added newer NAMD versions and machines.

Cores  Node Type       ppn  GPU   version        Nodes  WallClock
   16  Intel Razor      16  -     2.12               1        242
   32  AMD Trestles     32  -     2.12               1        315
   32  Intel G6130      32  -     2.12               1        127
   32  Intel G6130      32  -     2.13               1        127
   48  AMD Epyc 7402    48  -     2.13               1         89
   32  Intel G6130      32  -     2.15a1-AVX512      1         76
   32  Intel G6130      32  V100  3.0a7-cuda         1         39
   48  AMD Epyc 7402    48  -     2.15a1-AVX2    (needs recompilation)