==== NAMD ====

The ''namd-verbs-smp'' binary [[https://web.archive.org/web/20181127065652/http://www.ks.uiuc.edu/Research/namd/benchmarks/]], version 2.11 or 2.12, is installed in ''/share/apps/NAMD'' on [[razor]] and [[trestles]]. It does not use MPI.

The following script is for multiple-node runs, with ''charmrun'' as the distributed component and ''namd2'' on each compute node. We have found that most runs are faster with the ''+setcpuaffinity +isomalloc_sync'' options. The charmrun ''++ppn'' value should match the PBS ''ppn='' value.

  module load namd/2.12     # or namd/2.11
  cd $PBS_O_WORKDIR
  NP=$(wc -l <$PBS_NODEFILE)
  rm -f nodelist
  for node in `cat $PBS_NODEFILE | sort | uniq`
  do
  echo "host ${node}" >> nodelist
  done
  charmrun ++remote-shell ssh ++ppn 16 `which namd2` +p $NP +setcpuaffinity +isomalloc_sync apoa1.namd >apoa1.logfile

The following script is for a single-node run using only the shared-memory program ''namd2''.

  module load namd/2.12     # or namd/2.11
  cd $PBS_O_WORKDIR
  NP=$(wc -l <$PBS_NODEFILE)
  namd2 +p $NP apoa1.namd +setcpuaffinity +isomalloc_sync >apoa1.logfile

== Benchmarks ==

The NAMD website has benchmarks run on Trestles while it was at UCSD [[http://www.ks.uiuc.edu/Research/namd/performance.html]], but it gives no information on how the scores were obtained (namd2, charmrun, or MPI). Scores are shown as benchmark time*cores. The best charmrun results here were obtained on multiple nodes using ppn=cores/node and p=total cores (or ppn*nodes). Single-node namd2 runs on both node types used p=cores and are comparable with the published benchmarks. Version 2.12 is substantially faster than 2.11. The module selects the downloaded verbs-smp build because it is faster than the ibverbs-smp build. On this problem, the Intel nodes showed no useful scaling beyond 2 nodes, and the AMD nodes showed little useful scaling beyond 3 nodes.
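The rule above (ppn=cores/node, p=ppn*nodes) can be computed from the PBS node file instead of being hard-coded. The following is a minimal sketch, not the site's official script. The function name ''charm_layout'' is hypothetical, and it assumes that ''$PBS_NODEFILE'' lists each host once per allocated core and that all nodes in the job have the same core count.

```bash
#!/bin/bash
# Hypothetical helper: print node count, cores per node, and total cores
# for a PBS-style node file (assumed: one line per allocated core).
charm_layout() {
    local nodefile=$1
    local p nodes
    p=$(( $(wc -l < "$nodefile") ))                # total cores = number of lines
    nodes=$(( $(sort -u "$nodefile" | wc -l) ))    # number of distinct hosts
    # Integer division; assumes every node contributes the same core count.
    echo "nodes=$nodes ppn=$(( p / nodes )) p=$p"
}

# Inside a PBS job, report the layout of the current allocation.
if [ -n "${PBS_NODEFILE:-}" ]; then
    charm_layout "$PBS_NODEFILE"
fi
```

The printed ''ppn'' and ''p'' values correspond to the ''++ppn'' and ''+p'' arguments of the charmrun command line shown earlier on this page.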
^ Node Type ^ ppn ^ version ^ p ^ Nodes ^ Bench ^ WallClock ^ UCSD Bench ^
| 16-core Intel | 16 | 2.11 | 16 | 1 | 1.21 | 383 | n/a |
| 16-core Intel | 16 | 2.12 | 16 | 1 | 0.76 | 256 | n/a |
| 16-core Intel | 16 | 2.12 | 16 | 2 | 0.90 | 146 | n/a |
| 16-core Intel | 16 | 2.12 | 16 | 3 | 1.32 | 146 | n/a |
| 32-core AMD | 32 | 2.12 | 32 | 1 | 1.95 | 317 | 1.9 |
| 32-core AMD | 32 | 2.12 | 32 | 2 | 2.22 | 185 | 2.0 |
| 32-core AMD | 32 | 2.12 | 32 | 3 | 2.29 | 127 | n/a |
| 32-core AMD | 32 | 2.12 | 32 | 4 | 2.56 | 104 | n/a |

== 2020 Update ==

Replicated a couple of the old benchmarks and added some new versions and machines.

^ Cores ^ Node Type ^ ppn ^ GPU ^ version ^ nodes ^ WallClock ^
| 16 | Intel Razor | 16 | | 2.12 | 1 | 242 |
| 32 | AMD Trestles | 32 | | 2.12 | 1 | 315 |
| 32 | Intel G6130 | 32 | | 2.12 | 1 | 127 |
| 32 | Intel G6130 | 32 | | 2.13 | 1 | 127 |
| 48 | AMD Epyc 7402 | 48 | | 2.13 | 1 | 89 |
| 32 | Intel G6130 | 32 | | 2.15a1-AVX512 | 1 | 76 |
| 32 | Intel G6130 | 32 | V100 | 3.0a7-cuda | 1 | 39 |
| 48 | AMD Epyc 7402 | 48 | | 2.15a1-AVX2 | needs recompilation ||
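The scaling and version comparisons in the two tables above reduce to ratios of WallClock values. The sketch below shows the arithmetic. The function names ''speedup'' and ''efficiency'' are hypothetical helpers, not part of NAMD or the module. The numbers are copied from the tables, and parallel efficiency is taken here as (single-node time / n-node time) / n.

```bash
#!/bin/bash
# speedup BASE NEW: how many times faster NEW is than BASE (BASE/NEW).
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# efficiency T1 TN N: fraction of ideal N-node scaling actually achieved.
efficiency() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN { printf "%.2f\n", (t1 / tn) / n }'
}

# Multi-node scaling for version 2.12 (WallClock values from the first table).
echo "Intel, 2 nodes: efficiency $(efficiency 256 146 2)"
echo "Intel, 3 nodes: efficiency $(efficiency 256 146 3)"
echo "AMD,   2 nodes: efficiency $(efficiency 317 185 2)"
echo "AMD,   4 nodes: efficiency $(efficiency 317 104 4)"

# Single-node comparison on the Intel G6130 (WallClock values from the 2020 table).
echo "2.15a1-AVX512 vs 2.12:     speedup $(speedup 127 76)"
echo "3.0a7-cuda (V100) vs 2.12: speedup $(speedup 127 39)"
```

An efficiency near 1.0 means the extra nodes are well used. The drop for Intel between 2 and 3 nodes reflects the unchanged WallClock of 146 in the table.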