The namd-verbs-smp binary (https://web.archive.org/web/20181127065652/http://www.ks.uiuc.edu/Research/namd/benchmarks/), version 2.11 or 2.12, is installed in /share/apps/NAMD on Razor and Trestles. It does not use MPI.
This is for multiple-node runs with charmrun as the distributed component and namd2 on each compute node. We have found most runs are faster with the +setcpuaffinity +isomalloc_sync options. The charmrun ++ppn value should match the PBS ppn= value.
    module load namd/2.12    # or namd/2.11
    cd $PBS_O_WORKDIR
    NP=$(wc -l <$PBS_NODEFILE)
    rm -f nodelist
    for node in `cat $PBS_NODEFILE | sort | uniq`
    do
        echo "host ${node}" >> nodelist
    done
    charmrun ++remote-shell ssh ++ppn 16 `which namd2` +p $NP +setcpuaffinity +isomalloc_sync apoa1.namd >apoa1.logfile
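The rule that charmrun ++ppn should match PBS ppn= can be followed without hard-coding 16 by reading the job layout from $PBS_NODEFILE. The following is a minimal sketch, assuming every node in the job was allocated the same number of cores. layout_from_nodefile is a hypothetical helper name, not part of PBS or NAMD.

```shell
# Sketch only: derive node count, cores per node and total cores from a
# PBS-style node file (one line per allocated core).
# layout_from_nodefile is a hypothetical helper, not part of PBS or NAMD.
layout_from_nodefile() {
    np=$(wc -l < "$1")                  # total cores = lines in the node file
    nodes=$(sort -u "$1" | wc -l)       # number of distinct hosts
    echo $nodes $((np / nodes)) $np     # prints: nodes ppn p
}

# Inside a PBS job the scheduler sets PBS_NODEFILE; do nothing otherwise.
if [ -n "${PBS_NODEFILE:-}" ]; then
    set -- $(layout_from_nodefile "$PBS_NODEFILE")
    NODES=$1 PPN=$2 NP=$3
    echo "nodes=$NODES ppn=$PPN p=$NP"
    # then, as in the script above:
    # charmrun ++remote-shell ssh ++ppn $PPN `which namd2` +p $NP ...
fi
```

For a job requesting 2 nodes with ppn=16, the helper would print "2 16 32", so ++ppn follows whatever was requested from PBS.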
This is for a single-node run using only the shared-memory program namd2.
    module load namd/2.12    # or namd/2.11
    cd $PBS_O_WORKDIR
    NP=$(wc -l <$PBS_NODEFILE)
    namd2 +p $NP apoa1.namd +setcpuaffinity +isomalloc_sync >apoa1.logfile
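Both scripts write apoa1.logfile. The sketch below shows one way to turn that log into a time*cores figure. It assumes that NAMD prints lines beginning with "Info: Benchmark time:" followed by the CPU count and the s/step value, and that the benchmark figure of interest is s/step multiplied by the CPU count. Check both assumptions against an actual log before relying on it. bench_time_cores is a hypothetical helper name.

```shell
# Sketch only. Assumed log line layout (numbers are illustrative, not measured):
#   Info: Benchmark time: 16 CPUs 0.0475 s/step 0.55 days/ns 500.0 MB memory
# bench_time_cores is a hypothetical helper, not part of NAMD.
bench_time_cores() {
    # Keep the last benchmark line in the log and print CPUs * s/step.
    awk '/^Info: Benchmark time:/ { cpus = $4; sstep = $6 }
         END { if (cpus) printf "%.2f\n", cpus * sstep }' "$1"
}

# Report the figure if a log from one of the runs above is present.
if [ -f apoa1.logfile ]; then
    bench_time_cores apoa1.logfile
fi
```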
The NAMD website has benchmarks run on Trestles while it was at UCSD (http://www.ks.uiuc.edu/Research/namd/performance.html), but it gives no information on how the scores were obtained (namd2, charmrun, or MPI). These are shown as benchmark time*cores. The best charmrun results here were obtained on multiple nodes with ppn=cores per node and p=total cores (that is, ppn*nodes). Single-node namd2 runs on both node types used p=cores and are comparable with the published benchmarks. Version 2.12 is substantially faster than 2.11. The module selects the downloaded verbs-smp build because it is faster than the ibverbs-smp build. On this problem, the Intel nodes showed no useful scaling beyond 2 nodes, and the AMD nodes showed little useful scaling beyond 3 nodes.
Node Type      ppn  version  p   Nodes  Bench  WallClock  UCSD Bench
16-core Intel  16   2.11     16  1      1.21   383        n/a
16-core Intel  16   2.12     16  1      0.76   256        n/a
16-core Intel  16   2.12     16  2      0.90   146        n/a
16-core Intel  16   2.12     16  3      1.32   146        n/a
32-core AMD    32   2.12     32  1      1.95   317        1.9
32-core AMD    32   2.12     32  2      2.22   185        2.0
32-core AMD    32   2.12     32  3      2.29   127        n/a
32-core AMD    32   2.12     32  4      2.56   104        n/a
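The scaling remark for the Intel nodes can be checked from the WallClock column of the table above. This is a small sketch that computes speedup relative to the single-node run and the resulting parallel efficiency (speedup divided by node count). scaling is a hypothetical helper name, and the inputs are the 16-core Intel, version 2.12 rows.

```shell
# Sketch only: scaling BASE_WALLCLOCK NODES WALLCLOCK
# prints speedup over the single-node run and parallel efficiency.
# scaling is a hypothetical helper name.
scaling() {
    awk -v base="$1" -v n="$2" -v t="$3" \
        'BEGIN { s = base / t; printf "%.2f %.0f%%\n", s, 100 * s / n }'
}

# 16-core Intel, version 2.12: WallClock 256 on 1 node, 146 on 2 and on 3 nodes
scaling 256 2 146    # prints: 1.75 88%
scaling 256 3 146    # prints: 1.75 58%
```

The third node adds no speedup, so efficiency drops from about 88% to about 58%.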
We replicated a couple of the old benchmarks and added some newer versions and machines:
Cores  Node Type      ppn  GPU   version        nodes  WallClock
16     Intel Razor    16   -     2.12           1      242
32     AMD Trestles   32   -     2.12           1      315
32     Intel G6130    32   -     2.12           1      127
32     Intel G6130    32   -     2.13           1      127
48     AMD Epyc 7402  48   -     2.13           1      89
32     Intel G6130    32   -     2.15a1-AVX512  1      76
32     Intel G6130    32   V100  3.0a7-cuda     1      39
48     AMD Epyc 7402  48   -     2.15a1-AVX2    needs recompilation