Basic Slurm commands are:
slurm | use |
---|---|
sbatch | submit <job file> |
srun | submit interactive job |
squeue | list all queued jobs |
squeue -u rfeynman | list queued jobs for user rfeynman |
scancel | cancel <job#> |
sinfo | node status; list of queues |
A Torque compatibility layer also offers some Torque commands, such as qstat and qsub. A basic script in Slurm looks like:
    #!/bin/bash
    #SBATCH --job-name=mpi
    #SBATCH --output=zzz.slurm
    #SBATCH --partition comp06
    #SBATCH --nodes=2
    #SBATCH --tasks-per-node=32
    #SBATCH --time=6:00:00
    cd $SLURM_SUBMIT_DIR
    module purge
    module load intel/18.0.1 impi/18.0.1 mkl/18.0.1
    mpirun -np $SLURM_NTASKS -machinefile /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID} ./mympiexe -inputfile MA4um.mph -outputfile MA4um-output.mph
and a more complex script with file moving looks like:
    #!/bin/bash
    #SBATCH --job-name=espresso
    #SBATCH --output=zzz.slurm
    #SBATCH --nodes=4
    #SBATCH --tasks-per-node=32
    #SBATCH --time=00:00:10
    #SBATCH --partition comp06
    module purge
    module load intel/14.0.3 mkl/14.0.3 fftw/3.3.6 impi/5.1.2
    cd $SLURM_SUBMIT_DIR
    # stage the input files to the job's scratch directory and run there
    cp *.in *UPF /scratch/$SLURM_JOB_ID
    cd /scratch/$SLURM_JOB_ID
    # 16 MPI ranks per node with 2 OpenMP threads each
    mpirun -ppn 16 -hostfile /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID} -genv OMP_NUM_THREADS 2 \
        /share/apps/espresso/qe-6.1-intel-mkl-impi/bin/pw.x -npools 1 <ausurf.in
    # move the results back to the submit directory
    mv ausurf.log *mix* *wfc* *igk* $SLURM_SUBMIT_DIR/
See also https://www.marquette.edu/high-performance-computing/pbs-to-slurm.php and https://hpc.nih.gov/docs/pbs2slurm.html
We have a conversion script, /share/apps/bin/pbs2slurm.sh, which should do about 95% of the conversion from old PBS scripts to Slurm scripts. Please report any errors the script makes so we can improve it. Normally it should be in your path, and
    pbs2slurm.sh <pbs-script-name>
writes the converted script to stdout, so save it with
    pbs2slurm.sh demoscriptpbs.sh > demoscriptslurm.sh
A leading hash-bang line (#!/bin/sh, #!/bin/bash, or #!/bin/tcsh) is optional in Torque but required in Slurm. pbs2slurm.sh inserts one if it is not present.
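The hash-bang rule can be illustrated with a minimal sketch. This is not the code of pbs2slurm.sh; ensure_shebang is a hypothetical helper, and the choice of #!/bin/bash as the inserted line is an assumption.

```shell
#!/bin/bash
# Hypothetical helper (not taken from pbs2slurm.sh): print a script to
# stdout, adding a hash-bang line only when the first line lacks one.
ensure_shebang() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        '#!'*) cat "$1" ;;                           # already has one
        *)     printf '#!/bin/bash\n'; cat "$1" ;;   # assumed default shell
    esac
}

# Demo on a throwaway file that starts without a hash-bang line.
demo=$(mktemp)
printf '#PBS -N demo\necho hello\n' > "$demo"
ensure_shebang "$demo"
rm -f "$demo"
```

A script that already starts with #!/bin/tcsh or similar passes through unchanged.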
Slurm time formats with days are “2-00:00:00”, not “2:00:00:00” as in Torque. If the time is invalid, sbatch will use the partition default and srun will reject the job.
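As a rough illustration of the difference, the sketch below rewrites a Torque-style walltime with a days field into the Slurm form. torque_to_slurm_time is a hypothetical helper, not part of Slurm or of pbs2slurm.sh, and it assumes the input is otherwise well formed.

```shell
#!/bin/bash
# Hypothetical helper: convert a Torque-style walltime to the Slurm form.
torque_to_slurm_time() {
    case "$1" in
        *:*:*:*)
            # Four fields (D:HH:MM:SS): replace the first colon with a dash.
            printf '%s-%s\n' "${1%%:*}" "${1#*:}" ;;
        *)
            # HH:MM:SS and already-converted values pass through unchanged.
            printf '%s\n' "$1" ;;
    esac
}

torque_to_slurm_time 2:00:00:00
torque_to_slurm_time 6:00:00
```

The helper does not validate the fields, so a malformed time would still reach sbatch or srun as-is.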
Unlike Torque, Slurm does not autogenerate an MPI machinefile/hostfile, so the job creates
/scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID}
The generated machinefile differs from the Torque machinefile in that it has 1 entry per host instead of ncores entries per host.
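If a launch line still expects the Torque layout, the per-host file can be expanded. The following is a minimal sketch, not a site-provided tool: expand_machinefile is a hypothetical helper, the host names in the demo are made up, and the cores-per-host count is something you supply yourself.

```shell
#!/bin/bash
# Hypothetical helper: expand_machinefile FILE N
# Prints every host listed in FILE N times (Torque-style, one line per core).
expand_machinefile() {
    awk -v n="$2" 'NF { for (i = 0; i < n; i++) print $1 }' "$1"
}

# Demo on a throwaway file with two made-up host names, 2 cores each.
demo=$(mktemp)
printf 'c1401\nc1402\n' > "$demo"
expand_machinefile "$demo" 2
rm -f "$demo"
```

Inside a job, the first argument would be /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID} and the second the number of cores per node.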
Slurm does define a variable with the total number of tasks, $SLURM_NTASKS, which equals the total number of cores when the job requests one task per core. It is good for most MPI jobs that use every core.
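For hybrid MPI/OpenMP jobs, where each rank runs several threads, the rank count is smaller than $SLURM_NTASKS. The sketch below shows the arithmetic. mpi_ranks is a hypothetical helper, and the fallback of 128 tasks (4 nodes with 32 tasks each) is only an assumption so the example also runs outside a job.

```shell
#!/bin/bash
# Hypothetical helper: number of MPI ranks when each rank runs several
# OpenMP threads, so that ranks x threads equals the allocated task count.
mpi_ranks() {
    # $1 = total tasks (e.g. $SLURM_NTASKS), $2 = threads per rank
    echo $(( $1 / $2 ))
}

# Inside a job Slurm sets SLURM_NTASKS; 128 is only an assumed fallback.
ntasks=${SLURM_NTASKS:-128}
threads=2
echo "would run $(mpi_ranks "$ntasks" "$threads") ranks with $threads threads each"
```

The division is integer division, so the thread count should divide the task count evenly.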