See Selecting Resources for help on choosing the best node/queue for your work.
These torque scheduler queues (a different set for each of the two torque instances) are deprecated: the torque scheduler will be replaced by slurm, and most Razor 12-core nodes will be retired.
Razor general-use queues are:
queue | CPU | memory/node | max PBS spec | max PBS time | notes | Maui partitions |
---|---|---|---|---|---|---|
debug12core | 2x Intel X5670 2.93 GHz | 24GB | nodes=2:ppn=12 | walltime=0:0:30:00 | dedicated | rz |
tiny12core | 2x Intel X5670 2.93 GHz | 24GB | nodes=24:ppn=12 | walltime=0:06:00:00 | node pool shared | rt/rm |
med12core | 2x Intel X5670 2.93 GHz | 24GB | nodes=24:ppn=12 | walltime=3:00:00:00 | node pool shared | rm |
debug16core | 2x Intel E5-2670 2.6 GHz | 32GB | nodes=2:ppn=16 | walltime=0:0:30:00 | dedicated | yd |
tiny16core | 2x Intel E5-2670 2.6 GHz | 32GB | nodes=18:ppn=16 | walltime=0:06:00:00 | node pool shared | yt/ym |
med16core | 2x Intel E5-2670 2.6 GHz | 32GB | nodes=18:ppn=16 | walltime=3:00:00:00 | node pool shared | ym |
onenode16core | 2x Intel E5-2670 2.6 GHz | 32GB | nodes=1:ppn=16 | walltime=72:00:00 | dedicated/one node max per job | yl |
gpu16core | 2x Intel E5-2630V3 2.4 GHz/2xK40c | 64GB | nodes=1:ppn=16 | walltime=3:00:00:00 | gpu jobs only | gu |
mem512GB64core | 4x AMD 6276 2.3 GHz | 512GB | nodes=2:ppn=64 | walltime=3:00:00:00 | >64GB shared memory only | yc |
mem768GB32core | 4x Intel E5-4640 2.4 GHz | 768GB | nodes=2:ppn=32 | walltime=3:00:00:00 | >512 GB shared memory only | yb |
nebula | nebula cloud | | | | | |
The Maui partitions in the table can be used with the shownodes command to estimate which nodes are immediately available, as below. For example, if you want to use the gpu16core queue, this shows that resources are available now:
$ shownodes -l -n | grep Idle | grep gu
compute0804 Idle 24:24 gu 0.00
This doesn't guarantee that a job will start immediately, as the scheduler may be assembling idle nodes for a large job, but it is a good indication. Similarly, for a queue such as tiny16core that can draw from multiple Maui partitions, at this time there are 22 idle nodes.
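As a sketch of the same idea, a small shell function (hypothetical, not an installed command; the field layout is assumed from the shownodes sample above: node name, state, procs, partition, load) can tally idle nodes per Maui partition:

```shell
# count_idle: tally Idle nodes per Maui partition from shownodes-style
# output read on stdin. Assumed field layout (from the sample above):
#   nodename  state  procs  partition  load
count_idle() {
  awk '$2 == "Idle" { idle[$4]++ }
       END { for (p in idle) print p, idle[p] }'
}
```

Used as `shownodes -l -n | count_idle`, it would print one line per partition, e.g. `gu 1`.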
The following script will count the major node categories for Idle nodes:
$ /share/apps/bin/idle_razor_nodes.sh
12-core n30m=1 n06h=35 n72h=34
16-core n30m=3 n06h=22 n72h=10 onenode=16 graphics=1
bigmem 512m=2 768m=2
condo xqian=4 sbarr=2 aja=0 laur=3 itza=2
Production queues in quantity for most usage are {tiny/med}{12core/16core}. About 16 Razor 16-core nodes that have difficulty with multi-node MPI jobs are placed in the “onenode16core” queue for single-node, 16-core jobs.
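For reference, a minimal torque/PBS job script for one of these queues might look like the following sketch. The job name and `my_program` are placeholders; the resource request must stay within the queue limits in the table above (for tiny16core, at most nodes=18:ppn=16 and walltime=0:06:00:00):

```shell
#!/bin/bash
# Minimal sketch of a torque/PBS job script for the tiny16core queue.
# my_program and the job name are placeholders.
#PBS -N example_job
#PBS -q tiny16core
#PBS -l nodes=1:ppn=16
#PBS -l walltime=6:00:00

cd "$PBS_O_WORKDIR"   # start in the directory qsub was run from
./my_program
```

Submit with `qsub script.sh` and monitor with `qstat -u $USER`.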
Trestles general-use queues are listed below (the first three are production queues in quantity):
queue | CPU | memory/node | max PBS spec | max PBS time | notes | Maui partitions |
---|---|---|---|---|---|---|
q30m32c | 4x AMD 6136 2.4 GHz | 64GB | nodes=128:ppn=32 | walltime=0:0:30:00 | node pool shared | tu/ts/tl |
q06h32c | 4x AMD 6136 2.4 GHz | 64GB | nodes=128:ppn=32 | walltime=0:06:00:00 | node pool shared | ts/tl |
q72h32c | 4x AMD 6136 2.4 GHz | 64GB | nodes=64:ppn=32 | walltime=3:00:00:00 | node pool shared | tl |
nebula | nebula cloud | | | | | |
$ /share/apps/bin/idle_trestles_nodes.sh
n30m=2 n06h=0 n72h=0
condo laur=8 agri=2 doug=0 itza=1
condo mill=23 millgpu=2 nair=27 nairphi=2
Production queues in quantity for most usage are q06h32c/q72h32c.
“node pool shared” on tiny/med or q30m/q06h/q72h means that the queues allocate jobs from a common pool of identical nodes, with some dedicated for the shorter queues.
For a complete listing of all defined queues and their properties on each cluster, please use the qstat -q command.