The namd-verbs-smp binary (see the archived NAMD benchmarks page: https://web.archive.org/web/20181127065652/http://www.ks.uiuc.edu/Research/namd/benchmarks/), version 2.11 or 2.12, is installed in /share/apps/NAMD on razor and trestles. It does not use MPI.
This is for multiple-node runs with charmrun as the distributed component and namd2 on each compute node. We have found most runs are faster with the +setcpuaffinity +isomallocsync options. The charmrun ++ppn value should match the PBS ppn= value.
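As a concrete illustration of that correspondence (the node and core counts here are hypothetical), a two-node request pairs with charmrun like this; the full script follows below:
<code>
#PBS -l nodes=2:ppn=16
# ++ppn matches the ppn= value above; +p is the total core count (ppn × nodes = 32)
charmrun ++remote-shell ssh ++ppn 16 `which namd2` +p 32 +setcpuaffinity +isomallocsync apoa1.namd >apoa1.logfile
</code>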
<code>
module load namd/2.12 [or 2.11]
cd $PBS_O_WORKDIR
NP=$(wc -l <$PBS_NODEFILE)
# build the charmrun nodelist (one "host" line per unique node);
# charmrun reads ./nodelist by default
rm -f nodelist
for node in $(cat $PBS_NODEFILE | sort | uniq)
do
  echo "host ${node}" >> nodelist
done
charmrun ++remote-shell ssh ++ppn 16 `which namd2` +p $NP +setcpuaffinity +isomallocsync apoa1.namd >apoa1.logfile
</code>
This is for a single-node run using only the shared-memory program namd2.
<code>
module load namd/2.12 [or 2.11]
cd $PBS_O_WORKDIR
NP=$(wc -l <$PBS_NODEFILE)
namd2 +p $NP apoa1.namd +setcpuaffinity +isomallocsync >apoa1.logfile
</code>
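Put together, a complete single-node job script might look like the following sketch; the job name, walltime, and ppn= value are placeholders to adapt to your job:
<code>
#!/bin/bash
#PBS -N apoa1
#PBS -l nodes=1:ppn=32
#PBS -l walltime=6:00:00

module load namd/2.12
cd $PBS_O_WORKDIR
NP=$(wc -l <$PBS_NODEFILE)
namd2 +p $NP apoa1.namd +setcpuaffinity +isomallocsync >apoa1.logfile
</code>
Submit it with qsub and check apoa1.logfile for the timing lines used in the benchmarks below.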
== Benchmarks ==
The NAMD website has benchmarks run on Trestles while at UCSD (http://www.ks.uiuc.edu/Research/namd/performance.html), but they don't have any info on how the scores were obtained (namd2, charmrun, or MPI). These are shown as benchmark time × cores. Best results for charmrun here were obtained with multiple nodes using ppn=cores per node and p=total cores (ppn × nodes). Single-node runs with namd2 used p=cores and are comparable with the published benchmarks. Version 2.12 is substantially faster than 2.11. The downloaded verbs-smp version is set by the module, as it is faster than the ibverbs-smp version. On this problem, the Intel version didn't show any useful scaling past 2 nodes, and the AMD version didn't scale usefully past 3 nodes.
<code>
Node Type      ppn  version  p   Nodes  Bench  WallClock  UCSD Bench
16-core Intel  16   2.11     16  1      1.21   383        n/a
16-core Intel  16   2.12     16  1      0.76   256        n/a
16-core Intel  16   2.12     16  2      0.90   146        n/a
16-core Intel  16   2.12     16  3      1.32   146        n/a
32-core AMD    32   2.12     32  1      1.95   317        1.9
32-core AMD    32   2.12     32  2      2.22   185        2.0
32-core AMD    32   2.12     32  3      2.29   127        n/a
32-core AMD    32   2.12     32  4      2.56   104        n/a
</code>
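To check numbers like these yourself, NAMD writes "Benchmark time:" lines (s/step and days/ns) during the run and a final "WallClock:" line at the end of the log, so with the logfile name used above:
<code>
grep "Benchmark time" apoa1.logfile
grep "WallClock" apoa1.logfile
</code>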
== 2020 Update ==
We replicated a couple of the old benchmarks and added some new versions and machines.
<code>
Cores  Node Type      ppn  GPU   version        nodes  WallClock
16     Intel Razor    16   -     2.12           1      242
32     AMD Trestles   32   -     2.12           1      315
32     Intel G6130    32   -     2.12           1      127
32     Intel G6130    32   -     2.13           1      127
48     AMD Epyc 7402  48   -     2.13           1      89
32     Intel G6130    32   -     2.15a1-AVX512  1      76
32     Intel G6130    32   V100  3.0a7-cuda     1      39
48     AMD Epyc 7402  48   -     2.15a1-AVX2    (needs recompilation)
</code>
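The 3.0a7-cuda entry was run with the namd3 binary from the NAMD 3.0 alpha series. A minimal single-GPU invocation is sketched below; the +devices index and the +p worker-thread count are assumptions to adjust for your machine:
<code>
# hypothetical single-V100 run; device 0 and 8 worker threads assumed
namd3 +p 8 +devices 0 apoa1.namd >apoa1-gpu.logfile
</code>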