Pinnacle has 101 compute nodes. 30 GPU and GPU-ready nodes are Dell R740, 69 nodes are Dell R640, two nodes are Dell R7425. There is no user-side difference between R740 (GPU-ready) and R640 nodes.
  
All-user nodes number 76, of which 6 nodes have 768 GB of memory and no GPU (himem.. partition), 19 nodes have 192 GB and one V100 GPU (gpu.. partition), and 51 are standard compute nodes with 192 GB and no GPU (comp.. partition).
  
Standard nodes have two Gold 6130 CPUs with total 32 cores at 2.1 GHz. 768 GB nodes have two Gold 6126 CPUs with total 24 cores at 2.6 GHz, fewer and faster cores for better performance on often poorly-threaded bioinformatics applications.
  
===Login===
  
===Scheduler===
All systems now use the slurm scheduler. Queues (slurm "partitions") are:
  
<code>
comp72/06/01:     standard compute nodes, 72/6/1 hour limit, 42/46/48 nodes
gpu72/06:         gpu nodes, 72/6 hour limit, 19 nodes
agpu72/06:        a100 gpu nodes, 72/6 hour limit
himem72/06:       768 GB nodes, 72/6 hour limit, 6 nodes
pubcondo06:       condo nodes all-user use, 6 hour limit, various constraints required, 25 nodes
pcon06:           same as pubcondo06, shortened name for easier printout, use this going forward
cloud72:          virtual machines and containers, usually single processor, 72 hour limit, 3 nodes
condo:            condo nodes, no time limit, authorization required, various constraints required, 25 nodes
tres72/06:        reimaged trestles nodes, 72/6 hour limit, 126 nodes
razr72/06:        reimaged razor nodes, 72 hour limit, in progress
</code>
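
For example, a minimal batch submission to the 72-hour standard compute partition could look like the following sketch (the script name and time are placeholders; a ''--qos'' option may also apply, as in the condo examples further down):

<code>
pinnacle-l1:$ sbatch --partition comp72 --nodes=1 --ntasks-per-node=32 --time=72:00:00 myjob.sh
</code>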
  
===Transition from Torque/PBS===
Basic slurm commands are shown below with their Torque/PBS/Maui equivalents. Compatibility commands are installed in slurm for qsub/qstat/qstat -u/qdel/qstat -q, so those commands may still be used.
  
<code>
sbatch                      qsub                        submit <job file>
srun                        qsub -I                     submit interactive job
squeue                      qstat                       list all queued jobs
squeue -u rfeynman          qstat -u rfeynman           list queued jobs for user rfeynman
scancel                     qdel                        cancel <job#>
sinfo                       shownodes -l -n;qstat -q    node status;list of queues
</code>
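
As a quick illustration of the equivalents (the user name and job number are the examples used elsewhere on this page):

<code>
pinnacle-l1:$ squeue -u rfeynman        # slurm: list rfeynman's queued jobs
pinnacle-l1:$ qstat -u rfeynman         # torque-compatibility form of the same query
pinnacle-l1:$ scancel 706884            # slurm: cancel job 706884
pinnacle-l1:$ qdel 706884               # torque-compatibility form of the same cancel
</code>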
  
  
<code>
pinnacle-l1:$ cat pbsscript.sh
#PBS -N espresso
#PBS -j oe
  
  
pinnacle-l1:$ pbs2slurm.sh pbsscript.sh > slurmscript.sh
  
pinnacle-l1:$ cat slurmscript.sh
#!/bin/bash
#SBATCH --job-name=espresso
  
A leading hash-bang line (/bin/sh, /bin/bash, or /bin/tcsh) is optional in torque but required in slurm; pbs2slurm.sh inserts it if not present.

Slurm does not autogenerate an MPI hostfile/machinefile like torque. We have the prologue automatically generate this as:
  
<code>/scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID}</code>
  
The generated machinefile differs from the torque machinefile in that it has one entry per host instead of ''ncores'' entries per host.
Slurm defines the variables ''$SLURM_NTASKS'' and ''$SLURM_CPUS_PER_TASK''. Usually these are set by the job request: request ''$SLURM_NTASKS'' MPI tasks and ''$SLURM_CPUS_PER_TASK'' OpenMP threads per task, with ntasks-per-node x cpus-per-task usually equal to the number of cores in a node.
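
A minimal sketch of how these pieces fit together in a hybrid MPI/OpenMP batch script (the executable name is a placeholder, and the mpirun option spellings are assumptions; check your MPI's documentation for the exact machinefile flag):

<code>
#!/bin/bash
#SBATCH --partition comp72
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=4        # 8 tasks x 4 threads = 32 cores per node
#SBATCH --time=24:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# prologue-generated machinefile: one entry per host
mpirun -np $SLURM_NTASKS -machinefile /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID} ./my_mpi_program
</code>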
  
=== Interactive Jobs in SLURM ===
  
 <code> <code>
</code>
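
As a minimal sketch (the partition, core count, and time shown are assumptions, not requirements), an interactive shell on a standard compute node can be requested with srun:

<code>
pinnacle-l1:rfeynman:$ srun --nodes=1 --ntasks-per-node=1 --cpus-per-task=32 --partition comp01 --time=1:00:00 --pty /bin/bash
</code>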
  
===Software===
Modules are the same as on the Trestles cluster. We recommend the more recent versions of compiler and math libraries so that they will recognize the AVX512 floating-point instructions. Examples:
  
 <code> <code>
module load intel/20.0.1 mkl/20.0.1 impi/20.0.1
module load gcc/9.3.1
 </code> </code>
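
For example, a compile line that lets the Intel compiler generate AVX512 code might look like the following sketch (the source and binary names are placeholders; ''-xCORE-AVX512'' is the Intel compiler flag for targeting these instructions):

<code>
pinnacle-l1:$ module load intel/20.0.1 mkl/20.0.1
pinnacle-l1:$ icc -O3 -xCORE-AVX512 -o mycode mycode.c
</code>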
  
===Selecting the right Queue/Partition among multiple clusters===
  
Generally the nodes are reserved for the most efficient use, especially for expensive features such as GPU and extra memory.
**Pinnacle compute nodes** (comp.. and himem.. partitions) are very busy and are reserved for scalable **programs that can use all 32/24 cores** (except for the cloud partition, and condo usage by the owner).
Cores are allocated by the product of ntasks-per-node x cpus-per-task.
Exceptions (see the sketch after this list):
(1) serial/single core jobs that use more memory than available on Razor/Trestles (64 to 192 GB)
(2) multiple jobs submitted together that use a whole node, such as 4 x 8 cores
(3) two jobs on one high-memory node (2 x 12 cores) that each use more than 192 GB (and less than 384 GB so that they can run on the himem node)
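
A sketch of exception (2), submitting four 8-core jobs together so that they fill one 32-core comp node (the script names are placeholders, and the scheduler, not this loop, decides the actual placement):

<code>
for i in 1 2 3 4; do
  sbatch --partition comp72 --nodes=1 --ntasks-per-node=1 --cpus-per-task=8 part${i}.sh
done
</code>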

**Single core serial jobs should be run on the cloud.. partitions or the tres.. or razr.. partitions** (unless they require 64 to 192 GB, in which case run them in the comp.. partitions with 32 cores allocated).

**GPU nodes are reserved for programs that use the GPU** (usually through the cuda libraries).

**Large memory nodes are reserved for programs that use more shared memory than the 192 GB** available on standard nodes.

**Condo jobs must have the id of the project PI/node owner as a constraint, plus unique node-identifying information where the PI has more than one node type.**

**Pubcondo non-gpu jobs must have** 0gpu as a constraint and the number of cores and memory as constraints, with the memory reasonably related to the job. Options are 16c & 64gb (64 Intel nodes), 32c & 192gb (20 Intel nodes), 32c & 256gb (2 AMD nodes), 40c & 384gb (10 Intel nodes), 48c & 256gb (1 AMD node), 64c & 112gb (2 Intel Phi nodes), 64c & 256gb (5 AMD nodes), 64c & 512gb (5 AMD nodes), 64c & 1024gb (1 AMD node), 64c & 2048gb (1 AMD node). A slurm string would look like ''--partition pcon06 --constraint "0gpu & 16c & 64gb"''. Examples (with the same options available in sbatch scripts):
<code>
pinnacle-l1:rfeynman:$ srun --nodes=1 --ntasks-per-node=1 --cpus-per-task=16 --partition pcon06 --qos comp --time=6:00:00 --constraint="0gpu&16c&64gb" --pty /bin/bash
srun: job 706884 queued and waiting for resources
srun: job 706884 has been allocated resources
c3204:rfeynman:$

pinnacle-l3:rfeynman:$ srun --nodes=1 --ntasks-per-node=1 --cpus-per-task=24 --partition pcon06 --qos comp --time=6:00:00 --constraint="4titanv&24c" --pty /bin/bash
srun: job 706892 queued and waiting for resources
srun: job 706892 has been allocated resources
c1522:rfeynman:$
 </code> </code>

**Pubcondo gpu jobs must have** the gpu type as a constraint and must use that many gpus. Options are 4titanv & 24c (1 node), 1v100 & 40c (1 node), 2v100 & 32c (1 node), 1a100 & 64c (2 nodes), 4a100 & 64c (9 nodes).
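
By analogy with the non-gpu examples above (only the constraint and core count change; treat this as a sketch rather than an exact recipe), an interactive request for the 1v100 & 40c pubcondo node might look like:

<code>
pinnacle-l1:rfeynman:$ srun --nodes=1 --ntasks-per-node=1 --cpus-per-task=40 --partition pcon06 --qos comp --time=6:00:00 --constraint="1v100&40c" --pty /bin/bash
</code>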

===Selecting cores per node===

To maintain throughput and avoid wasting capacity with partly-filled nodes, there are standards for selecting part-node/full-node jobs on the comp.., himem.., and gpu.. partitions. These three partitions are restricted to whole nodes, with a few exceptions. "Whole node" means the product ntasks-per-node x cpus-per-task equals the number of cores per node: 32 for comp.. and gpu.., 24 for himem...

Permitted exceptions: jobs submitted at once with (number of jobs) x (cores per job) = 32, such as 2 jobs x 16 cores, 4 x 8, 8 x 4, or 16 x 2 on comp..; 2 jobs x 12 cores on himem.. with 192 GB < memory per job < 384 GB; and multiple gpu.. jobs meant to share the gpu (set cores per job to set the number of jobs per node, as for comp..).

Jobs that don't meet these standards may be canceled without warning.