The namd-verbs-smp binary (see the archived NAMD benchmarks page: https://web.archive.org/web/20181127065652/http://www.ks.uiuc.edu/Research/namd/benchmarks/), version 2.11 or 2.12, is installed in /share/apps/NAMD on razor and trestles. It does not use MPI.
This is for multiple-node runs with charmrun as the distributed component and namd2 on each compute node. We have found most runs are faster with the +setcpuaffinity +isomallocsync options. The charmrun ++ppn value should match the PBS ppn= value.
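As a concrete illustration of that correspondence (the node and core counts here are hypothetical), a two-node request pairs with charmrun like this; the full script follows below:
<code>
#PBS -l nodes=2:ppn=16
# ++ppn matches the ppn= value above; +p is the total core count (ppn × nodes = 32)
charmrun ++remote-shell ssh ++ppn 16 `which namd2` +p 32 +setcpuaffinity +isomallocsync apoa1.namd >apoa1.logfile
</code>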
<code>
module load namd/2.12 [or 2.11]
cd $PBS_O_WORKDIR
NP=$(wc -l <$PBS_NODEFILE)
# build the charmrun nodelist (one "host" line per unique node);
# charmrun reads ./nodelist by default
rm -f nodelist
for node in $(cat $PBS_NODEFILE | sort | uniq)
do
  echo "host ${node}" >> nodelist
done
charmrun ++remote-shell ssh ++ppn 16 `which namd2` +p $NP +setcpuaffinity +isomallocsync apoa1.namd >apoa1.logfile
</code>
This is for a single-node run using only the shared-memory program namd2.
<code>
module load namd/2.12 [or 2.11]
cd $PBS_O_WORKDIR
NP=$(wc -l <$PBS_NODEFILE)
namd2 +p $NP apoa1.namd +setcpuaffinity +isomallocsync >apoa1.logfile
</code>
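Put together, a complete single-node job script might look like the following sketch; the job name, walltime, and ppn= value are placeholders to adapt to your job:
<code>
#!/bin/bash
#PBS -N apoa1
#PBS -l nodes=1:ppn=32
#PBS -l walltime=6:00:00

module load namd/2.12
cd $PBS_O_WORKDIR
NP=$(wc -l <$PBS_NODEFILE)
namd2 +p $NP apoa1.namd +setcpuaffinity +isomallocsync >apoa1.logfile
</code>
Submit it with qsub and check apoa1.logfile for the timing lines used in the benchmarks below.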
== Benchmarks ==
The NAMD website has benchmarks run on Trestles while at UCSD (http://www.ks.uiuc.edu/Research/namd/performance.html), but they don't have any info on how the scores were obtained (namd2, charmrun, or MPI). These are shown as benchmark time × cores. Best results for charmrun here were obtained with multiple nodes using ppn=cores per node and p=total cores (ppn × nodes). Single-node runs with namd2 used p=cores and are comparable with the published benchmarks. Version 2.12 is substantially faster than 2.11. The downloaded verbs-smp version is set by the module, as it is faster than the ibverbs-smp version. On this problem, the Intel version didn't show any useful scaling past 2 nodes, and the AMD version didn't scale usefully past 3 nodes.
<code>
Node Type      ppn  version  p   Nodes  Bench  WallClock  UCSD Bench
16-core Intel  16   2.11     16  1      1.21   383        n/a
16-core Intel  16   2.12     16  1      0.76   256        n/a
16-core Intel  16   2.12     16  2      0.90   146        n/a
16-core Intel  16   2.12     16  3      1.32   146        n/a
32-core AMD    32   2.12     32  1      1.95   317        1.9
32-core AMD    32   2.12     32  2      2.22   185        2.0
32-core AMD    32   2.12     32  3      2.29   127        n/a
32-core AMD    32   2.12     32  4      2.56   104        n/a
</code>
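To check numbers like these yourself, NAMD writes "Benchmark time:" lines (s/step and days/ns) during the run and a final "WallClock:" line at the end of the log, so with the logfile name used above:
<code>
grep "Benchmark time" apoa1.logfile
grep "WallClock" apoa1.logfile
</code>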
== 2020 Update ==
We replicated a couple of the old benchmarks and added some new versions and machines.
<code>
Cores  Node Type      ppn  GPU   version        nodes  WallClock
16     Intel Razor    16   -     2.12           1      242
32     AMD Trestles   32   -     2.12           1      315
32     Intel G6130    32   -     2.12           1      127
32     Intel G6130    32   -     2.13           1      127
48     AMD Epyc 7402  48   -     2.13           1      89
32     Intel G6130    32   -     2.15a1-AVX512  1      76
32     Intel G6130    32   V100  3.0a7-cuda     1      39
48     AMD Epyc 7402  48   -     2.15a1-AVX2    (needs recompilation)
</code>
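The 3.0a7-cuda entry was run with the namd3 binary from the NAMD 3.0 alpha series. A minimal single-GPU invocation is sketched below; the +devices index and the +p worker-thread count are assumptions to adjust for your machine:
<code>
# hypothetical single-V100 run; device 0 and 8 worker threads assumed
namd3 +p 8 +devices 0 apoa1.namd >apoa1-gpu.logfile
</code>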