
How to use the Pinnacle Cluster

This is a brief “how to” summary of usage for users of the Pinnacle cluster.

Pinnacle has 101 compute nodes: 30 GPU and GPU-ready Dell R740 nodes, 69 Dell R640 nodes, and two Dell R7425 nodes. There is no user-side difference between the R740 (GPU-ready) and R640 nodes.

There are 76 all-user nodes: 6 with 768 GB of memory and no GPU, 19 with 192 GB and one V100 GPU, and 51 standard compute nodes with 192 GB and no GPU.

There are 25 condo nodes: 20 Wang nodes (standard compute nodes with NVMe drives), two Alverson nodes (one 192 GB standard and one 768 GB), one Kaman node (768 GB with two V100 GPUs), and two Barry nodes (64-core AMD with 256 GB).

Standard nodes have two Intel Xeon Gold 6130 CPUs (32 cores total at 2.1 GHz). The 768 GB nodes have two Gold 6126 CPUs (24 cores total at 2.6 GHz): fewer but faster cores for better performance on often poorly threaded bioinformatics applications. The two R7425 nodes have dual AMD Epyc 7351 processors (32 cores total at 2.4 GHz).
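To see what the scheduler reports for each node (core count, memory, and the feature tags used for constraints), SLURM's sinfo command, described in the Scheduler section below, can be used; a minimal sketch:

sinfo -N -o "%n %c %m %f %P"    # node name, cores, memory (MB), features, partitions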

Login

ssh to pinnacle.uark.edu redirects to one of two servers, each running seven virtual login machines, named pinnacle-l1 through pinnacle-l14. If there is a login problem, try another ssh session; you will be assigned a different virtual machine, which may solve the problem.
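For example (replace username with your cluster account name):

ssh username@pinnacle.uark.edu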

Scheduler

We are transitioning to CentOS 7 and the SLURM scheduler for Pinnacle; all nodes and clusters will eventually be transitioned.

Queues (SLURM “partitions”) are listed below; an example of selecting one in a job script follows the list:

comp72:     standard compute nodes, 72 hour limit, 40 nodes
comp06:     standard compute nodes, 6 hour limit, 44 nodes
comp01:     standard compute nodes, 1 hour limit, 48 nodes
gpu72:      gpu nodes, 72 hour limit, 19 nodes
gpu06:      gpu nodes, 6 hour limit, 19 nodes
himem72:    768 GB nodes, 72 hour limit, 6 nodes
himem06:    768 GB nodes, 6 hour limit, 6 nodes
pubcondo06: condo nodes, all-user use, 6 hour limit, various constraints required, 25 nodes
pcon06:     same as pubcondo06, shortened name for easier printout, use this going forward
cloud72:    virtual machines and containers, usually single processor, 72 hour limit, 3 nodes
condo:      condo nodes, no time limit, authorization required, various constraints required, 25 nodes
tres72:     reimaged Trestles nodes, 72 hour limit, 23 nodes
tres06:     reimaged Trestles nodes, 6 hour limit, 23 nodes
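A job chooses a partition in its batch script or on the sbatch command line; a minimal sketch using the table above (the requested time must be within the partition's limit):

#SBATCH --partition comp72
#SBATCH --time=72:00:00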

Selecting the right Queue/Partition among multiple clusters

Generally the nodes are reserved for the most efficient use, especially for expensive features such as GPUs and extra memory. Pinnacle compute nodes are reserved for scalable programs that can use all 32/24 cores (except in the cloud partition, and for condo usage by the owner). Non-scalable programs should be run on Razor/Trestles (unless the 192 GB shared memory size of a Pinnacle node is needed). GPU nodes are reserved for programs that use the GPU. Large-memory nodes are reserved for programs that use more shared memory than is available on standard nodes (that is, 192 to 768 GB). The condo and pubcondo partitions require constraints so that their nodes are not selected randomly by the scheduler; an example follows the list. Possible constraints are:

0gpu/1v100/2v100, i6128/i6130/a7351, intel/amd, 24c/32c, avx512/avx2, 192gb/256gb/768gb, aja/fwang/mlbernha, nvme
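For example, to request an all-user run on a non-GPU, NVMe-equipped condo node through pcon06 (constraint tags from the list above), a job script could include:

#SBATCH --partition pcon06
#SBATCH --constraint=0gpu&192gb&nvme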

Jobs that don't meet these standards may be canceled without warning.

Basic commands, shown with their Torque/PBS/Maui equivalents for the transition, are listed below, with usage examples after the table:

Torque/PBS/Maui             SLURM                 function
qsub                        sbatch                submit <job file>
qsub -I                     srun                  submit interactive job
qstat                       squeue                list all queued jobs
qstat -u rfeynman           squeue -u rfeynman    list queued jobs for user rfeynman
qdel                        scancel               cancel <job#>
shownodes -l -n; qstat -q   sinfo                 node status; list of queues
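For example (the script name and job number are placeholders):

sbatch espresso.slurm       # submit a job script
squeue -u rfeynman          # list queued and running jobs for user rfeynman
scancel 123456              # cancel job 123456
sinfo                       # show node status and list of partitions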

See also [ https://hprc.tamu.edu/wiki/TAMU_Supercomputing_Facility:HPRC:Batch_Translation ] [ https://slurm.schedmd.com/rosetta.pdf ] [ https://www.sdsc.edu/~hocks/FG/PBS.slurm.html ]

We have a conversion script, /share/apps/bin/pbs2slurm.sh, which should do 95% of the script conversion. Please report errors made by the script so we can improve it. Here is an example conversion from PBS to SLURM:

tres-l1:$ cat script.sh
#PBS -N espresso
#PBS -j oe
#PBS -o zzz.$PBS_JOBID
#PBS -l nodes=4:ppn=32,walltime=00:00:10
#PBS -q q06h32c
module purge
module load intel/14.0.3 mkl/14.0.3 fftw/3.3.6 impi/5.1.2
cd $PBS_O_WORKDIR
cp *.in *UPF /scratch/$PBS_JOBID
cd /scratch/$PBS_JOBID
sort -u $PBS_NODEFILE >hostfile
mpirun -ppn 16 -hostfile hostfile -genv OMP_NUM_THREADS 4 -genv MKL_NUM_THREADS 4 /share/apps/espresso/qe-6.1-intel-mkl-impi/bin/pw.x -npools 1 <ausurf.in
mv ausurf.log *mix* *wfc* *igk* $PBS_O_WORKDIR/


tres-l1:$ pbs2slurm.sh script.sh  >script.slurm

tres-l1:$ cat script.slurm
#!/bin/bash
#SBATCH --job-name=espresso
#SBATCH --output=zzz.slurm
#SBATCH --nodes=4
#SBATCH --tasks-per-node=32
#SBATCH --time=00:00:10
#SBATCH --partition comp06
module purge
module load intel/14.0.3 mkl/14.0.3 fftw/3.3.6 impi/5.1.2
cd $SLURM_SUBMIT_DIR
cp *.in *UPF /scratch/$SLURM_JOB_ID
cd /scratch/$SLURM_JOB_ID
sort -u /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID} >hostfile
mpirun -ppn 16 -hostfile hostfile -genv OMP_NUM_THREADS 4 -genv MKL_NUM_THREADS 4 /share/apps/espresso/qe-6.1-intel-mkl-impi/bin/pw.x -npools 1 <ausurf.in
mv ausurf.log *mix* *wfc* *igk* $SLURM_SUBMIT_DIR/
tres-l1:$
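The converted script can then be submitted and monitored with the commands from the table above:

tres-l1:$ sbatch script.slurm
tres-l1:$ squeue -u rfeynman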

Here is another sample SLURM script:

#!/bin/bash
#SBATCH --partition comp06
#SBATCH --nodes=2
#SBATCH --tasks-per-node=32
#SBATCH --time=6:00:00
cd $SLURM_SUBMIT_DIR
module load intel/18.0.1 impi/18.0.1 mkl/18.0.1
mpirun -np $SLURM_NTASKS -machinefile /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID} ./mympiexe -inputfile MA4um.mph -outputfile MA4um-output.mph
Notes:

A leading hash-bang (#!/bin/bash, #!/bin/sh, or #!/bin/tcsh) is optional in Torque but required in SLURM; pbs2slurm.sh inserts one if not present.

Use full nodes only on all-user Pinnacle (tasks-per-node=32 standard, 24 himem), except in the cloud partition, where single cores are available. All non-cloud jobs should use either all the cores or more than 64 GB of memory; otherwise use Razor/Trestles. If a node is being used for its memory, allocate all the cores anyway so that the node will not be split by the scheduler. Valid condo (not pcon06) jobs may subdivide the nodes (tasks-per-node = an integer divisor of 32/24).

SLURM does not autogenerate a machinefile like Torque. We have the prologue automatically generate

/scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID}

The generated machinefile differs from the Torque machinefile in that it has one entry per host instead of ncores entries per host. SLURM does define a variable with the total number of tasks, $SLURM_NTASKS (equal to the total core count when full nodes are requested as above), which is good for most MPI jobs.
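If an application requires a Torque-style machinefile (one entry per core rather than per host), the per-host file can be expanded inside the job script; a minimal sketch assuming a 32-core standard node:

# expand the one-entry-per-host machinefile to 32 entries per host (Torque style)
awk '{for (i = 0; i < 32; i++) print}' /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID} > hostfile_percore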

Interactive Jobs in SLURM

Multiple nodes or multiple tasks are not currently supported under srun; multiple cores, up to the number in one node, are. For example:

srun --nodes=1 --ntasks-per-node=1  --cpus-per-task=32 --partition gpu06 --time=6:00:00 --pty /bin/bash
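The same form works for the other partitions, for example a one-hour interactive session on a standard compute node:

srun --nodes=1 --ntasks-per-node=1 --cpus-per-task=32 --partition comp01 --time=1:00:00 --pty /bin/bash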

Another script:

#!/bin/bash
#SBATCH --partition condo
#SBATCH --constraint=nvme
#SBATCH --nodes=1
#SBATCH --tasks-per-node=32
#SBATCH --time=144:00:00
#SBATCH --job-name=MOLPRO_lscr
cd $SLURM_SUBMIT_DIR
cp $SLURM_SUBMIT_DIR/mpr*inp /local_scratch/$SLURM_JOB_ID/
cd /local_scratch/$SLURM_JOB_ID
module load mkl/18.0.2 intel/18.0.2 impi/18.0.2
/home/trr007/molpro/molprop_2015_1_linux_x86_64_i8/bin/molpro -n 4/4:8 mpr_qm_region.inp -d /local_scratch/$SLURM_JOB_ID -W /local_scratch/$SLURM_JOB_ID
rm -f sf_*TMP* fort*
rsync -av m* $SLURM_SUBMIT_DIR/
Notes:

Condo/pubcondo06 jobs require a constraint sufficient to specify the node (see the table below).

Similarly to Razor/Trestles, the scratch directories /scratch/$SLURM_JOB_ID and /local_scratch/$SLURM_JOB_ID are auto-created by the job prolog.

Node Constraints in Condo Queues

Wang            (20 nodes)     fwang,0gpu,nvme                ''--constraint 0gpu&192gb&nvme''
Alverson        ( 2 nodes)     aja,0gpu,192gb(1) or 768gb(2)
Kaman           ( 1 node )     tkaman,2v100,768gb
Bernhardt Barry ( 2 nodes)     mlbernha,0gpu,a7351,256gb,amd

requesting a non-gpu condo node: 
                              ''--constraint 0gpu&192gb'' , 
                              ''--constraint 0gpu&256gb'' , 
                              ''--constraint 0gpu&768gb''  (high memory use required for use as pubcondo)
requesting the gpu condo node: 
                              ''--constraint 2v100&768gb'' (dual gpu use required for use as pubcondo)
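Putting this together, a job header for an all-user run on the dual-GPU condo node might look like the following sketch (it assumes the 24-core layout of the 768 GB nodes and that the job actually uses both GPUs):

#!/bin/bash
#SBATCH --partition pcon06
#SBATCH --constraint=2v100&768gb
#SBATCH --nodes=1
#SBATCH --tasks-per-node=24
#SBATCH --time=6:00:00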

Software

Modules are the same as on the Trestles cluster. We recommend the latest versions of compilers and math libraries so that they will recognize the AVX-512 floating-point instructions. Examples:

module load intel/18.0.2 mkl/18.0.2 impi/18.0.2
module load gcc/7.3.1
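As a sketch of putting these modules to use (the source file and program names are placeholders), the Intel 18 compiler can be told to target AVX-512 explicitly:

module load intel/18.0.2 mkl/18.0.2
icc -O3 -xCORE-AVX512 mycode.c -o mycode    # -xCORE-AVX512 targets the AVX-512 units on the Gold 6126/6130 CPUs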