=====Slurm Queues Pinnacle/Karpinski=====
See [[ equipment | Selecting Resources ]] for help on choosing the best node/queue for your work.

Updates:

<code>
tres288 queue added with 288 hour/12 day maximum
tres72 time limit changed to 288 hours, same as tres288, retained for existing scripts
csce-k2-72 queue added for new csce Pinnacle-2 nodes
</code>

Pinnacle queues or ''slurm'' "partitions" are:
  
<csv>
pinnacle partition,description,time limit,cores per node,number of nodes,other
comp01,192 GB nodes,1 hr,32,48,full node usage required
comp06,192 GB nodes,6 hr,32,44,full node usage required
comp72,192 GB nodes,72 hr,32,40,full node usage required
gpu06,gpu nodes,6 hr,32,19,gpu usage required/full node usage required
gpu72,gpu nodes,72 hr,32,19,gpu usage required/full node usage required
himem06,768 GB nodes,6 hr,24,6,>192 GB memory usage required/full node usage required
himem72,768 GB nodes,72 hr,24,6,>192 GB memory usage required/full node usage required
cloud72,virtual machines/containers/single processor jobs,72 hr,32,3,for non-intensive computing up to 4 cores
tres72,64 GB nodes,72 hr,32,23,Trestles nodes with Pinnacle operating system
tres288,64 GB nodes,288 hr,32,23,Trestles nodes with Pinnacle operating system
</csv>

<csv>
karpinski partition,description,time limit,cores per node,number of nodes
csce72,32 GB nodes,72 hr,8,18
csce-k2-72,256 GB nodes,72 hr,64,6
cscloud72,virtual machines/containers/single processor jobs,72 hr,8,18
</csv>
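
As a starting point, a minimal batch script for one of the standard queues might look like the sketch below. The job name, executable, and the choice of ''comp72'' are only placeholders; pick the partition, core count, and time limit from the tables above.

<code>
#!/bin/bash
#SBATCH --job-name=myjob          # placeholder job name
#SBATCH --partition=comp72        # 192 GB standard nodes, 72 hour limit (see table)
#SBATCH --nodes=1                 # comp queues require full node usage
#SBATCH --ntasks-per-node=32      # all 32 cores of a standard Pinnacle node
#SBATCH --time=72:00:00           # must not exceed the partition time limit

cd $SLURM_SUBMIT_DIR              # run from the directory the job was submitted from
./myprogram                       # placeholder for your executable
</code>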
  
<csv>
pinnacle partition,description,time limit,number of nodes,other
condo,condo nodes,none,25,authorization and appropriate properties required
pcon06,public use of condo nodes,6 hr,25,appropriate properties required
</csv>
Condo nodes require specification of a sufficient set of slurm properties. Property choices available are:
  
**condo/pcon06 jobs running on the wrong nodes through lack of specified properties will be canceled without notice**\\
**non-gpu jobs running on gpu nodes may be canceled without notice**\\

gpu or not: ''0gpu''/''1v100''/''2v100''/''1a100''/''4a100''\\
processor: ''i6130''/''a7351''/''i6128''\\
equivalently: ''192gb''/''256gb''/''768gb''\\
equivalently: ''32c''/''32c''/''24c''\\
local drive: ''nvme''/no specification\\
research group: ''fwang'' equivalent to ''0gpu''/''i6130|i6230''/''768gb''/''32c|40c''/''nvme''\\
research group: ''tkaman'' equivalent to ''2v100''/''i6130''/''192gb''/''32c''\\
research group: ''aja'' equivalent to ''0gpu''/''i6128''/''192gb|768gb''/''24c''
  
examples:\\
''#SBATCH --constraint=256gb''
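
Multiple features can be combined with ''&'' in a single constraint, as in the ''pcon06'' examples further down. The sketch below is only illustrative; the partition and feature combination are placeholders, so substitute properties that match your group's condo nodes.

<code>
#SBATCH --partition=condo
#SBATCH --constraint='0gpu&192gb&i6130'   # illustrative feature combination
</code>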
  
A script is available to show idle nodes like this (in this case 2 nodes idle in the 1-hour comp queue, none in the 6-hour or 72-hour comp queues, but nodes available in gpu, himem, csce, and csce cloud). Sufficient idle nodes in your queue of interest do not guarantee that your job will start immediately, but that is usually the case.
<code>
$ idle_pinnacle_nodes.sh
$
</code>
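
If the script is not in your path, standard Slurm commands give a similar (if less condensed) view. A sketch that counts idle nodes per partition (the partition list is only an example):

<code>
$ sinfo -p comp01,comp06,comp72,gpu06,gpu72,himem06,himem72 -t idle -o "%P %D"
</code>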

=== Public Condo Queue - pcon06 ===

The condo nodes, which are reserved for priority access by the condo node owners, are also available for public use via the **''pcon06''** queue.  There is a 6 hour walltime limit for **''pcon06''**, but it may be extended upon request if there are no condo owner jobs waiting in the queue.  The **''pcon06''** queue contains a collection of multiple types of nodes purchased by different departments at various times, so the hardware configuration of those nodes varies.  Each node in the queue has a set of features assigned to it which describe its hardware.  To select the appropriate node, slurm uses the **constraint** (''-C'') parameter of the **''sbatch''** and **''srun''** commands.

The **''pcon06-info.sh''** script lists the idle nodes in the **''pcon06''** queue along with the constraints assigned to each node.

<code>
pinnacle-l5:pwolinsk:~$ pcon06-info.sh
  Idle pcon06 nodes

  NodeName Constraint list
============================
    c1302: fwang,0gpu,nvme,384gb,i6230,avx512,40c,intel
    c1305: fwang,0gpu,nvme,384gb,i6230,avx512,40c,intel
    c1306: fwang,0gpu,nvme,384gb,i6230,avx512,40c,intel
    c1307: fwang,0gpu,nvme,384gb,i6230,avx512,40c,intel
    c1308: fwang,0gpu,nvme,384gb,i6230,avx512,40c,intel
    c1309: fwang,0gpu,nvme,384gb,i6230,avx512,40c,intel
    c1310: fwang,0gpu,nvme,384gb,i6230,avx512,40c,intel
    c1311: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1312: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1313: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1314: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1315: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1316: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1317: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1318: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1319: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1320: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1321: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1322: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1323: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1324: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1325: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1326: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1328: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1329: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1330: fwang,0gpu,nvme,192gb,i6130,avx512,32c,intel
    c1432: aja,0gpu,256gb,a7543,avx2,64c,amd
    c1618: jzhao77,0gpu,256gb,a7402,avx2,48c,amd
    c1716: yongwang,1v100,192gb,i6230,avx512,40c,intel
    c1719: mlbernha,0gpu,256gb,a7351,avx2,32c,amd
    c1720: mlbernha,0gpu,256gb,a7351,avx2,32c,amd
    c1913: laurent,0gpu,256gb,a7543,avx2,64c,amd
    c1915: laurent,0gpu,256gb,a7543,avx2,64c,amd
    c1916: laurent,0gpu,256gb,a7543,avx2,64c,amd
    c1917: laurent,0gpu,256gb,a7543,avx2,64c,amd
    c1918: laurent,0gpu,256gb,a7543,avx2,64c,amd
    c1919: laurent,0gpu,256gb,a7543,avx2,64c,amd
    c1920: laurent,0gpu,256gb,a7543,avx2,64c,amd
    c2001: aimrc,4a100,1024gb,a7543,avx2,64c,amd
    c2002: aimrc,4a100,1024gb,a7543,avx2,64c,amd
    c2003: aimrc,4a100,1024gb,a7543,avx2,64c,amd
    c2004: aimrc,4a100,1024gb,a7543,avx2,64c,amd
    c2010: zhang,2a100,512gb,a7543,avx2,64c,amd
    c2011: harris,0gpu,1024gb,a7543,avx2,64c,amd
    c2101: csce,4a100,1024gb,a7543,avx2,64c,amd
    c2102: csce,4a100,1024gb,a7543,avx2,64c,amd
    c2103: csce,4a100,1024gb,a7543,avx2,64c,amd
    c2104: csce,4a100,1024gb,a7543,avx2,64c,amd
    c2105: harris,4a100,1024gb,a7543,avx2,64c,amd
    c2112: kmbefus,0gpu,1024gb,a7543,avx2,64c,amd
    c2113: fwang,0gpu,512gb,a7543,avx2,64c,amd
    c2114: fwang,0gpu,512gb,a7543,avx2,64c,amd
    c2115: fwang,0gpu,512gb,a7543,avx2,64c,amd
    c2116: fwang,0gpu,512gb,a7543,avx2,64c,amd
    c2118: jm217,1a100,1024gb,a7543,avx2,64c,amd
    c2402: kwalters,1a40,1024gb,a7543,avx2,64c,amd
    c2403: kwalters,1a40,1024gb,a7543,avx2,64c,amd
    c2404: kwalters,0gpu,256gb,a7543,avx2,64c,amd
    c2405: kwalters,0gpu,256gb,a7543,avx2,64c,amd
    c2406: kwalters,0gpu,256gb,a7543,avx2,64c,amd
    c2407: kwalters,0gpu,256gb,a7543,avx2,64c,amd
    c2408: kwalters,0gpu,256gb,a7543,avx2,64c,amd
    c2409: kmbefus,0gpu,1024gb,a7543,avx2,64c,amd
    c2416: kwalters,0gpu,256gb,a7543,avx2,64c,amd
    c2417: kwalters,0gpu,256gb,a7543,avx2,64c,amd
    c2418: kwalters,0gpu,256gb,a7543,avx2,64c,amd
    c2421: laurent,0gpu,256gb,a7543,avx2,64c,amd
    c2422: laurent,0gpu,256gb,a7543,avx2,64c,amd
    c2423: laurent,0gpu,256gb,a7543,avx2,64c,amd
    c3101: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3103: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3104: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3107: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3108: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3109: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3110: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3111: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3114: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3115: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3116: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3118: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3119: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3120: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3121: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3122: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3123: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3124: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3125: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3126: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3127: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3128: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3129: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3130: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3131: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3132: pmillett,0gpu,64gb,i2650v2,avx,16c,intel
    c3133: pmillett,4k80,128gb,i2650v2,avx,16c,intel
    c3201: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3202: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3203: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3204: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3205: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3206: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3207: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3208: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3209: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3210: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3211: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3212: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3213: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3214: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3216: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3217: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3219: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3220: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3221: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3222: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3224: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3226: nair,0gpu,64gb,i2650v2,avx,16c,intel
    c3227: nair,0gpu,64gb,i2650v2,avx,16c,intel

example submit commands:

     srun   -p pcon06 -t 6:00:00 -n 16 -q comp -C 'nair&0gpu&64gb&i2650v2&avx&16c&intel' --pty /bin/bash
     sbatch -p pcon06 -t 6:00:00 -n 16 -q comp -C 'nair&0gpu&64gb&i2650v2&avx&16c&intel' <slurm_script>.slurm

pinnacle-l5:pwolinsk:~$ srun   -p pcon06 -t 6:00:00 -N 2 -n 16 -q comp -C 'nair&0gpu&64gb&i2650v2&avx&16c&intel' --pty /bin/bash
srun: job 338300 queued and waiting for resources
srun: job 338300 has been allocated resources
c3201:pwolinsk:~$
</code>
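
The same constraint selection also works from a batch script. The sketch below reuses the ''nair'' feature set from the example above; the job parameters and script body are placeholders.

<code>
#!/bin/bash
#SBATCH --partition=pcon06
#SBATCH --qos=comp                 # qos used in the srun/sbatch examples above
#SBATCH --time=6:00:00             # pcon06 walltime limit
#SBATCH --ntasks=16
#SBATCH --constraint='nair&0gpu&64gb&i2650v2&avx&16c&intel'   # features reported by pcon06-info.sh

cd $SLURM_SUBMIT_DIR
./myprogram                        # placeholder for your executable
</code>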