====Torque Queues Trestles/Razor====
See [[ equipment | Selecting Resources ]] for help on choosing the best node/queue for your work.

These ''torque'' scheduler queues (a different set for each of the two ''torque'' instances) are deprecated, as the ''torque'' scheduler will be replaced by ''slurm'' and most Razor 12-core nodes will be retired.

**Razor** general use queues are
<csv>
queue,CPU,memory/node,max PBS spec,max PBS time,notes,Maui partitions
debug12core,2x Intel X5670 2.93 GHz,24GB,nodes=2:ppn=12,walltime=0:0:30:00,dedicated,rz
tiny12core,2x Intel X5670 2.93 GHz,24GB,nodes=24:ppn=12,walltime=0:06:00:00,node pool shared,rt/rm
med12core,2x Intel X5670 2.93 GHz,24GB,nodes=24:ppn=12,walltime=3:00:00:00,node pool shared,rm
debug16core,2x Intel E5-2670 2.6 GHz,32GB,nodes=2:ppn=16,walltime=0:0:30:00,dedicated,yd
tiny16core,2x Intel E5-2670 2.6 GHz,32GB,nodes=18:ppn=16,walltime=0:06:00:00,node pool shared,yt/ym
med16core,2x Intel E5-2670 2.6 GHz,32GB,nodes=18:ppn=16,walltime=3:00:00:00,node pool shared,ym
onenode16core,2x Intel E5-2670 2.6 GHz,32GB,nodes=1:ppn=16,walltime=72:00:00,dedicated/one node max per job,yl
gpu16core,2x Intel E5-2630V3 2.4 GHz/2xK40c,64GB,nodes=1:ppn=16,walltime=3:00:00:00,gpu jobs only,gu
mem512GB64core,4x AMD 6276 2.3 GHz,512GB,nodes=2:ppn=64,walltime=3:00:00:00,>64GB shared memory only,yc
mem768GB32core,4x Intel E5-4640 2.4 GHz,768GB,nodes=2:ppn=32,walltime=3:00:00:00,>512GB shared memory only,yb
nebula,nebula cloud,,,,,
</csv>
Maui partitions in the table can be used with the ''shownodes'' command to estimate which nodes are immediately available, as shown below. For example, if you want to use the ''gpu16core'' queue, this shows that resources are available now:
<code>
$ shownodes -l -n | grep Idle | grep gu
compute0804 Idle    24:24 gu 0.00
</code>
This doesn't guarantee that a job will start immediately, as the scheduler may be assembling idle nodes for a large job, but it is a good indication. Similarly, a queue such as ''tiny16core'' can draw from multiple Maui partitions; at this time it has 22 idle nodes.
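A rough way to get that count is a sketch like the following, assuming (as in the sample output above) that the Maui partition name appears as a whitespace-delimited field, and using the ''yt''/''ym'' partitions listed in the table for ''tiny16core'':
<code>
$ shownodes -l -n | grep Idle | grep -cE ' (yt|ym) '
22
</code>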
The following script counts Idle nodes in the major node categories:
<code>
$ /share/apps/bin/idle_razor_nodes.sh
12-core n30m=1 n06h=35 n72h=34
16-core n30m=3 n06h=22 n72h=10 onenode=16 graphics=1
bigmem 512m=2 768m=2
condo xqian=4 sbarr=2 aja=0 laur=3 itza=2
</code>

Production queues in quantity for most usage are the tiny/med 12core/16core queues. About 16 Razor 16-core nodes that have difficulty with multi-node MPI jobs are in the ''onenode16core'' queue for single-node, 16-core jobs.
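As a sketch of how the table's ''max PBS spec'' and ''max PBS time'' columns translate into a batch script (the queue name and limits come from the table; the job name, resource counts, and program are placeholders chosen within those limits):
<code>
#!/bin/bash
#PBS -N example_job
#PBS -q med16core
#PBS -l nodes=2:ppn=16
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
# ./my_program stands in for your own executable; adjust -np to nodes*ppn
mpiexec -np 32 ./my_program
</code>
Submit with ''qsub script.pbs'', keeping the requested nodes, ppn, and walltime within the table's maxima for the chosen queue.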
      
**Trestles** general use queues are (the first three queues are production queues in quantity)
<csv>
queue,CPU,memory/node,max PBS spec,max PBS time,notes,Maui partitions
q30m32c,4x AMD 6136 2.4 GHz,64GB,nodes=128:ppn=32,walltime=0:0:30:00,node pool shared,tu/ts/tl
q06h32c,4x AMD 6136 2.4 GHz,64GB,nodes=128:ppn=32,walltime=0:06:00:00,node pool shared,ts/tl
q72h32c,4x AMD 6136 2.4 GHz,64GB,nodes=64:ppn=32,walltime=3:00:00:00,node pool shared,tl
nebula,nebula cloud,,,,,
</csv>
A similar script counts idle Trestles nodes:
<code>
$ /share/apps/bin/idle_trestles_nodes.sh
n30m=2 n06h=0 n72h=0
condo laur=8 agri=2 doug=0 itza=1
condo mill=23 millgpu=2 nair=27 nairphi=2
</code>

Production queues in quantity for most usage are q06h32c/q72h32c.
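For a quick interactive test on Trestles, an illustrative ''qsub'' line for the ''q06h32c'' queue (the node count and walltime are example values within the table's limits, and this assumes the queue accepts interactive jobs):
<code>
$ qsub -I -q q06h32c -l nodes=1:ppn=32,walltime=2:00:00
</code>
''-I'' requests an interactive session; batch scripts use the same ''-q'' and ''-l'' specifications as ''#PBS'' directives.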