<code>
comp72/06/01:     standard compute nodes, 72/6/1 hour limit, 42/46/48 nodes
gpu72/06:         gpu nodes: 72/6 hour limit, 19 nodes
agpu72/06:        a100 gpu nodes: 72/6 hour limit
himem72/06:       768 GB nodes, 72/6 hour limit, 6 nodes
pubcondo06:       condo nodes all-user use, 6 hour limit, various constraints required, 25 nodes
pcon06:           same as pubcondo06, shortened name for easier printout, use this going forward
cloud72:          virtual machines and containers, usually single processor, 72 hour limit, 3 nodes
condo:            condo nodes, no time limit, authorization required, various constraints required, 25 nodes
tres72/06:        reimaged trestles nodes, 72/06 hour limit, 126 nodes
razr72/06:        reimaged razor nodes, 72 hour limit, in progress
</code>
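For reference, a batch job is directed to one of these partitions with ''--partition''. The following is a minimal sketch only; the partition, core count, time limit, and ''./my_program'' are placeholders to adjust for the actual job:

<code>
#!/bin/bash
#SBATCH --partition=comp06          # standard compute node, 6 hour limit (see table above)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32        # whole-node allocation expected on comp partitions
#SBATCH --time=6:00:00
#SBATCH --job-name=example

./my_program                        # placeholder executable
</code>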
  
===Transition from Torque/PBS===
A leading hash-bang (/bin/sh, /bin/bash, or /bin/tcsh) is optional in torque but required in slurm; pbs2slurm.sh inserts it if not present.
  
Slurm does not autogenerate an MPI hostfile/machinefile like torque. We have the prologue automatically generate this as:
  
<code>/scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID}</code>
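An MPI launcher can then be pointed at this generated file. The following is a hedged sketch (the flag spelling varies between MPI implementations, and ''./my_mpi_program'' is a placeholder):

<code>
# launch one rank per allocated task using the prologue-generated machinefile
mpirun -machinefile /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID} -np ${SLURM_NTASKS} ./my_mpi_program
</code>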
  
=== Interactive Jobs in SLURM ===
  
<code>
rm -f sf_*TMP* fort*
rsync -av m* $SLURM_SUBMIT_DIR/
</code>
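As a further illustration of interactive use, a basic single-node interactive shell can be requested directly with srun. This is a sketch only, assuming the comp01 partition and the same qos pattern as the pcon06 example later on this page; adjust cores and time to the job:

<code>
srun --nodes=1 --ntasks-per-node=1 --cpus-per-task=32 --partition comp01 --qos comp --time=1:00:00 --pty /bin/bash
</code>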
  
  
Generally the nodes are reserved for the most efficient use, especially for expensive features such as GPU and extra memory.
**Pinnacle compute nodes** (the comp.. and himem.. partitions) are very busy and are reserved for scalable **programs that can use all 32/24 cores** (except for the cloud partition, and condo usage by the owner).
Cores are allocated by the product of ntasks-per-node x cpus-per-task (illustrated in the sketch after the exceptions below).
Exceptions:
(3) two jobs on one high-memory node (2 x 12 cores) that each use more than 192 GB (and less than 384 GB so that they can run on the himem node)
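To illustrate the core-allocation product, both of the following sketches allocate all 32 cores of a standard compute node; the split between tasks and threads should match how the program parallelizes:

<code>
# 32 MPI ranks x 1 thread each = 32 cores
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1

# 8 MPI ranks x 4 threads each = 32 cores
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=4
</code>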
  
**Single core serial jobs should be run on the cloud.., tres.., or razr.. partitions** (unless they require 64 to 192 GB, in which case run them in the comp.. partitions with 32 cores allocated).
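A single-core serial job on the cloud partition might look like the following sketch (''./serial_program'' and the time limit are placeholders):

<code>
#!/bin/bash
#SBATCH --partition=cloud72         # single-processor work, 72 hour limit
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=72:00:00

./serial_program                    # placeholder executable
</code>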
  
**GPU nodes are reserved for programs that use the GPU** (usually through the CUDA libraries).
  
**Large memory nodes are reserved for programs that use more shared memory than the 192 GB** available on standard nodes.
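A whole-node job on a high-memory (768 GB) node could be requested roughly as in the sketch below; the 24-core figure follows from the 2 x 12 core layout noted in the exceptions above, and ''./large_memory_program'' is a placeholder:

<code>
#!/bin/bash
#SBATCH --partition=himem72         # 768 GB nodes, 72 hour limit
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24          # himem nodes have 2 x 12 cores
#SBATCH --time=72:00:00

./large_memory_program              # placeholder executable
</code>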
  
**Condo jobs must have the id of the project PI/node owner as a constraint, plus unique node-identifying information where the PI owns more than one node type.**
  
**Pubcondo non-gpu jobs must have** 0gpu as a constraint and the number of cores and memory as a constraint, with the memory reasonably related to the job. Options are 16c & 64gb (64 Intel nodes), 32c & 192gb (20 Intel nodes), 32c & 256gb (2 AMD nodes), 40c & 384gb (10 Intel nodes), 48c & 256gb (1 AMD node), 64c & 112gb (2 Intel Phi nodes), 64c & 256gb (5 AMD nodes), 64c & 512gb (5 AMD nodes), 64c & 1024gb (1 AMD node), 64c & 2048gb (1 AMD node). A slurm string would look like ''--partition pcon06 --constraint "0gpu & 16c & 64gb"''. Examples (with the same options available in sbatch scripts):
<code>
pinnacle-l1:rfeynman:$ srun --nodes=1 --ntasks-per-node=1 --cpus-per-task=16 --partition pcon06 --qos comp --time=6:00:00 --constraint="0gpu&16c&64gb" --pty /bin/bash
</code>
  
**Pubcondo gpu jobs must have** the gpu type as a constraint and use that many gpus. Options are 4titanv & 24c (1 node), 1v100 & 40c (1 node), 2v100 & 32c (1 node), 1a100 & 64c (2 nodes), 4a100 & 64c (9 nodes).
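By analogy with the non-gpu pcon06 example above, a pubcondo gpu request might look like the sketch below (this assumes the same partition/qos pattern applies and simply swaps in one of the listed gpu-type constraints):

<code>
srun --nodes=1 --ntasks-per-node=1 --cpus-per-task=64 --partition pcon06 --qos comp --time=6:00:00 --constraint="4a100&64c" --pty /bin/bash
</code>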
  
===Selecting cores per node===