queues

Razor general-use queues are listed below (the first four are the main production queues by node count):

queue           CPU                           memory/node  max PBS spec     max PBS time         notes
tiny12core      2x Intel X5670                24GB         nodes=24:ppn=12  walltime=0:06:00:00  node pool shared
med12core       2x Intel X5670                24GB         nodes=24:ppn=12  walltime=3:00:00:00  node pool shared
tiny16core      2x Intel E5-2670              32GB         nodes=18:ppn=16  walltime=0:06:00:00  node pool shared
med16core       2x Intel E5-2670              32GB         nodes=18:ppn=16  walltime=3:00:00:00  node pool shared
debug12core     2x Intel X5670                24GB         nodes=2:ppn=12   walltime=0:0:30:00   dedicated
debug16core     2x Intel E5-2670              32GB         nodes=2:ppn=16   walltime=0:0:30:00   dedicated
gpu8core        2x Intel E5520 / 2x GTX480    12GB         nodes=5:ppn=8    walltime=3:00:00:00  gpu jobs only
gpu16core       2x Intel E5-2630V3 / 2x K40c  64GB         nodes=1:ppn=16   walltime=3:00:00:00  gpu jobs only
mem96GB12core   2x Intel X5670                96GB         nodes=4:ppn=12   walltime=3:00:00:00  >24GB shared memory only
student48core   4x AMD 6174                   256GB        nodes=1:ppn=48   walltime=0:06:00:00  student/debug only
mem512GB64core  4x AMD 6276                   512GB        nodes=2:ppn=64   walltime=3:00:00:00  >96GB shared memory only
mem768GB32core  4x Intel E5-4640              768GB        nodes=2:ppn=32   walltime=3:00:00:00  >512GB shared memory only

Production queues for most usage are tiny/med 12core/16core.
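
As a minimal sketch (not a site-specific template), a Torque/PBS batch script for one of these production queues could look like the following; the job name and program are placeholders, and the requested walltime must stay within the queue maximum listed above.

#!/bin/bash
# Request one full 12-core Razor node in the med12core queue for 24 hours.
#PBS -N myjob
#PBS -q med12core
#PBS -l nodes=1:ppn=12
#PBS -l walltime=24:00:00
#PBS -j oe

# Run from the directory the job was submitted from.
cd $PBS_O_WORKDIR
./my_program

Submit the script with qsub (e.g. qsub myjob.pbs, where the filename is a placeholder) and check its status with qstat -u $USER.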

Trestles general-use queues are listed below (the first three are the main production queues by node count):

queue    CPU          memory/node  max PBS spec      max PBS time         notes
q30m32c  4x AMD 6136  64GB         nodes=128:ppn=32  walltime=0:0:30:00   node pool shared
q06h32c  4x AMD 6136  64GB         nodes=128:ppn=32  walltime=0:06:00:00  node pool shared
q72h32c  4x AMD 6136  64GB         nodes=64:ppn=32   walltime=3:00:00:00  node pool shared
q10m32c  4x AMD 6136  64GB         nodes=4:ppn=32    walltime=0:0:10:00   or "qtraining", dedicated

“node pool shared” on tiny/med or q30m/q06h/q72h means that those queues allocate jobs from a common pool of identical nodes, with some nodes dedicated to the shorter queues.
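
The same batch-script pattern shown above applies on Trestles; only the queue name, cores per node, and walltime change. For example, a sketch of the directive lines for one full node in the 72-hour queue (walltime format matching the qstat output below):

#PBS -q q72h32c
#PBS -l nodes=1:ppn=32
#PBS -l walltime=72:00:00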

For a complete listing of all defined queues and their properties on each cluster, use the qstat -q command. Output on Trestles is shown below.

tres-l1:pwolinsk:$ qstat -q
server: torque
Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
qcABI              --      --    2400:00:     1   1   2 --   E R
q06h16c256gb       --      --    06:00:00     3   0   0 --   E R
q06h32c768gb       --      --    06:00:00     1   0   0 --   E R
q06h               --      --    06:00:00     1   0   0 --   E R
qcondo             --      --    2400:00:     3   3   3 --   E R
q10m32c            --      --    00:10:00     1   0   0 --   E R
q30m32c            --      --    00:30:00    64   0   0 --   E R
qcDouglas          --      --    2400:00:     3   0   0 --   E R
qtraining          --      --    00:10:00    64   0   0 --   E R
q06h32c            --      --    06:00:00    64   7   0 --   E R
q72h32c            --      --    72:00:00    32  48  35 --   E R
                                               ----- -----
                                                  59    40

Here is a table to help you select a Razor queue for a particular job:

Program | Yes | No
User has permission of the condo queue owner | yes: use the condo queues | no: don't use the condo queues
Shared-memory program uses a GPU | yes: use a GPU queue | no: don't use a GPU queue
Shared-memory GPU program uses CUDA double precision or requires a Tesla | yes: use the 16core GPU queue (Tesla GPU) | no: use the 8core GPU queue (GTX GPU)
Shared-memory GPU program uses multiple GPUs or more than half of the GPU node's main memory | yes: use a full GPU node (see the gpu article) | no: use half a GPU node (see the gpu article)
MPI GPU program | contact hpc-support for assistance | --
Shared-memory program uses a single core and <2GB main memory (note 1) | yes: use the serial12core queue with ppn=1 | no: use a full node with ppn=number of cores
Multiple single-core programs run simultaneously | yes: contact hpc-support for assistance | no: use serial12core
Shared-memory program uses 2GB < main memory (note 1) < 24GB | yes: use a tiny/med 12core queue with ppn=12 (note 2) | --
Shared-memory program uses 24GB < main memory (note 1) < 32GB | yes: use a tiny/med 16core queue with ppn=16 (note 2) | --
Shared-memory program uses 32GB < main memory (note 1) < 96GB | yes: use the mem96GB12core queue with ppn=12 (note 2) | --
Shared-memory program uses 96GB < main memory (note 1) < 512GB | yes: use the mem512GB64core queue with ppn=64 (note 2) | --
Shared-memory program uses 512GB < main memory (note 1) < 768GB | yes: use the mem768GB32core queue with ppn=32 (note 2) | --
Shared-memory program uses 768GB < main memory (note 1) < 3072GB | yes: use an ABI Trestles node if in the ABI group, or ask for special permission (note 2) | --
Distributed-memory/MPI program uses more than one node | yes: use nodes=n:ppn=(number of cores in queue) on the tiny/med 12core/16core queues (note 3) | single-node MPI: treat like shared-memory

note 1: To determine memory usage after the run, look in your scheduler log file for the larger of “mem” or “vmem” in “Resources Used”, which is memory used on the first compute node:

Resources Used: cput=05:58:00,mem=116588kb,vmem=2197528kb,walltime=00:01:01

This indicates about 116 MB of mem and about 2.2 GB of vmem, so this job fits comfortably on a 12-core 24-GB node. Memory also has to hold the operating system, so usable memory is 1-2 GB less than the nominal node total.
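
For a job that is still running, the same counters can usually be checked with qstat -f (the job id below is a placeholder):

qstat -f 12345 | grep resources_used

which prints the resources_used.mem and resources_used.vmem values reported so far for job 12345.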

note 2: One-time test jobs with overspecified nodes are permissible (e.g. 26 GB of memory used in the 512 GB queue). Regular production use of overspecified nodes is subject to cancellation (high-memory nodes are much more expensive).

note 3: When using multiple nodes, always specify the complete number of cores in the queue/hardware (e.g. nodes=6:ppn=16 in the 16core queues) even if memory requirements or fixed grids force you to not use all of the cores; in that case, reduce the number of MPI processes (the count passed to mpirun/mpiexec) and restructure the nodefile (see MPI), as sketched below.
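
As an illustration of note 3 (a sketch only: the program name is a placeholder and the exact mpirun options depend on the MPI implementation), a job that reserves six full 16-core nodes but runs only 8 MPI ranks per node could rebuild its machine file from $PBS_NODEFILE like this:

#!/bin/bash
#PBS -q med16core
#PBS -l nodes=6:ppn=16
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR

# $PBS_NODEFILE lists each node once per core; keep only 8 entries per node.
sort -u $PBS_NODEFILE > nodes.unique
awk '{for (i = 0; i < 8; i++) print $0}' nodes.unique > nodes.reduced

# 6 nodes x 8 ranks = 48 MPI processes on the reduced machine file.
mpirun -np 48 -machinefile nodes.reduced ./my_mpi_program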

Update

After scheduler issues were found, the maximum number of queued jobs per user has been set to 1000.
