=====namd 2023=====
Here is an update on [[namd]] for the shared memory one-node versions ``namd2``/``namd3``.
====Versions====
Most NAMD 3 builds and a few of the newer NAMD 2 builds are not usable on the CentOS 7 OS because they were compiled against too new a glibc.

===CPU===
We are using the number of cores available on the node, either 24, 32, or 64 depending on the node type.

==namd2 shared memory==

The 2.14 ``verbs-smp`` version can be used with both ``namd2`` and ``charmrun++``.
<code>
module load namd/2.14
#Pinnacle I Intel 6130   2.09 days/ns
namd2 +p31 +setcpuaffinity +isomalloc_sync step7.2_production_colvar.inp
#Pinnacle II AMD 7543    0.81 days/ns
namd2 +p64 +setcpuaffinity +isomalloc_sync step7.2_production_colvar.inp
#Trestles AMD            4.51 days/ns
namd2 +p32 +setcpuaffinity +isomalloc_sync step7.2_production_colvar.inp
</code>

The 2.15a1 AVX512 version of ``namd2`` runs only on Pinnacle I here, but is very much faster there.
| < | < | ||
| - | export PATH=$PATH:/share/ | + | module load namd/2.15a1 |
| - | namd3 +p32 +setcpuaffinity +isomalloc_sync | + | #Pinnacle I Intel 6130 1.24 ns/day |
| + | namd2 +p31 +setcpuaffinity +isomalloc_sync step7.2_production_colvar.inp | ||
| </ | </ | ||

==charmrun++ running namd2==

Single-node 2.14 ``charmrun++ ++np 1``, with the core count moved to the left side of the command as ``++ppn ##`` rather than ``+p##``, should run equivalently to plain ``namd2`` with the same core count.
| + | |||
| + | With two nodes, in a few cases ``charmrun++`` scales fairly well, but because of better alternatives, | ||
| + | |||
| + | On Pinnacle I, 2.14 ``charmrun++ ++np 2`` scaled well but was still hardly faster than single-node 2.15a1 ``namd2``. | ||
| < | < | ||
| - | Partition Cores Proc GPU Used Walltime | + | module load namd/2.14 |
| - | _____________________________________ | + | #Pinnacle I Intel 6130 1 node 2.09 ns/ |
| - | comp72 | + | charmrun ++remote-shell ssh ++np 1 ++ppn 31 `which namd2`+setcpuaffinity +isomalloc_sync step7.2_production_colvar.inp |
| - | acomp06 | + | #Pinnacle I Intel 6130 2 node 1.17 ns/day |
| - | tres72 | + | charmrun ++remote-shell ssh ++np 1 ++ppn 31 `which namd2`+setcpuaffinity +isomalloc_sync |
| - | gpu72 32c i6130 1xV100 | + | #Pinnacle I 3 node 0.88 ns/day |
| - | pcon06 | + | charmrun ++remote-shell ssh ++np 1 ++ppn 31 `which namd2`+setcpuaffinity +isomalloc_sync |
| - | agpu72 | + | |
| - | tgpu72 | + | |
| - | tgpu72 | + | |
| - | tgpu72 | + | |
| - | tgpu72 | + | |
| </ | </ | ||

On Pinnacle II, two-node ``charmrun++`` didn't scale well, so again there is little use case for ``charmrun++``.
| + | < | ||
| + | module load namd/2.14 | ||
| + | #Pinnacle II AMD 7543 2 node 0.69 ns/day | ||
| + | charmrun ++remote-shell ssh ++np 2 ++ppn 64 `which namd2` +setcpuaffinity +isomalloc_sync | ||
| + | </ | ||
| + | |||
| + | On Trestles, two-node 2.14 scaled well to about the same speed as one-node 2.14 ``namd2`` on Pinnacle I. Three nodes did not scale well. So here there may be a use case for using an uncrowded cluster. | ||
| + | |||
| + | < | ||
| + | module load namd/2.14 | ||
| + | #Trestles AMD 2 node 2.81 ns/day | ||
| + | charmrun ++remote-shell ssh ++np 2 ++ppn 64 `which namd2` +setcpuaffinity +isomalloc_sync | ||
| + | #Trestles AMD 2 node 1.99 ns/day | ||
| + | charmrun ++remote-shell ssh ++np 2 ++ppn 64 `which namd2` +setcpuaffinity +isomalloc_sync | ||
| + | </ | ||
| + | |||
| + | =nodelist= | ||
| + | |||
| + | ``charmrun++`` is expecting, instead of an ``mpirun`` hostfile/ | ||
| + | < | ||
| + | host tres0931 | ||
| + | host tres0929 | ||
| + | host tres0928 | ||
| + | </ | ||
| + | |||
| + | To modify the machinefile (generated by the system for each job) to be a nodelist in the PWD, try | ||
| + | < | ||
| + | cat / | ||
| + | </ | ||
| + | |||
| + | Overall there is not a good use case for ``charmrun++`` because there are better alternatives, | ||
| + | |||
| + | ==GPU== | ||
| + | |||
| + | Here we are using the number of CPU cores available on the node (24/32/64) and one GPU (two or more GPUs ``devices 0,1,2,3`` scale poorly, not recommended or approved for AHPCC public use partitions). This benchmark simulation scaled significantly with the CPU cores used up to the number of cores present. | ||
| + | |||
| + | On the ``gpu72`` nodes with Intel 6130 and single NVidia V100, it's about 5 times faster than the best CPU version, so are a good use case. On ``agpu72`` nodes with AMD7543 and single A100, it's only about 10% faster than 6130/V100, so that's not a good use case for the more expensive AMD/A100 nodes, unless gpu memory requires the newer GPU. The even more expensive multi-gpu ``qgpu72`` nodes also don't scale well over single-gpu and are not a good use case. | ||
| + | |||
| + | < | ||
| + | # | ||
| + | module load namd/ | ||
| + | namd3 +p32 +setcpuaffinity +isomalloc_sync +devices 0 step7.2_production_colvar.inp | ||
| + | Info: Benchmark time: 32 CPUs 0.0393942 s/step 0.227976 days/ns 0 MB memory | ||
| + | # | ||
| + | namd3 +p64 +setcpuaffinity +isomalloc_sync _devices 0 step7.24_production.inp | ||
| + | Info: Benchmark time: 64 CPUs 0.0344332 s/step 0.199266 days/ns 0 MB memory | ||
| + | </ | ||