Slurm jobs may be submitted by:

* sbatch (or qsub, via the PBS compatibility layer described below) for batch scripts
* srun for interactive jobs
Essential slurm options and their available values are described in Selecting Resources. The same constraints apply regardless of how the job is submitted.
Basic slurm commands are:

| slurm | use |
|---|---|
| sbatch | submit <job script> |
| srun | submit interactive job |
| squeue | list all queued jobs |
| squeue -u rfeynman | list queued jobs for user rfeynman |
| scancel | cancel <job#> |
| sinfo | node status; list of queues |
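For example, a typical sequence of these commands (the script name mympi.slurm and the job number are illustrative) looks like:

```bash
# submit a batch script and note the job number reported
$ sbatch mympi.slurm
Submitted batch job 1430970

# check your own queued and running jobs
$ squeue -u rfeynman

# cancel a job by number if needed
$ scancel 1430970

# show partition and node status
$ sinfo
```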
A basic slurm batch script for MPI (2 full nodes) follows. It should begin with "#!/bin/sh" or another shell such as "#!/bin/tcsh".
For MPI jobs spanning more than one node, a hostfile or machinefile is required; it is optional for single-node MPI. The machinefile below is auto-generated at slurm startup and is named with the unique job number ${SLURM_JOB_ID}. The slurm variable ${SLURM_NTASKS} is defined at runtime as nodes × tasks-per-node, here 2 × 32 = 64. Unless the job runs out of memory, tasks-per-node × cpus-per-task should usually equal the number of cores in a partition node (here 32) so that a full node is allocated, and the number of MPI processes is usually that value times the number of nodes.
```bash
#!/bin/sh
#SBATCH --partition comp06
#SBATCH --qos comp
#SBATCH --nodes=2
#SBATCH --tasks-per-node=32
#SBATCH --cpus-per-task=1
#SBATCH --time=6:00:00

module purge
module load intel/18.0.1 impi/18.0.1 mkl/18.0.1

mpirun -np $SLURM_NTASKS \
    -machinefile /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID} \
    ./mympiexe <inputfile >logfile
```
A similar interactive job with one node in the comp01 partition would be:
```bash
srun --nodes 1 --ntasks-per-node=1 --cpus-per-task=32 --partition comp01 --qos comp \
    --time=1:00:00 --pty /bin/bash
```
All the slurm options between srun and --pty are the same for srun or sbatch.
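For example, the same interactive request expressed as batch directives (a sketch of the header only, not a complete script) would be:

```bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --partition comp01
#SBATCH --qos comp
#SBATCH --time=1:00:00
```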
Then the module and mpirun commands would be entered interactively.
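As a sketch (assuming the same Intel modules and MPI executable as the batch example above), the interactive session might look like:

```bash
# inside the shell started by srun on the allocated comp01 node
module purge
module load intel/18.0.1 impi/18.0.1 mkl/18.0.1

# single node, so no machinefile is required; 32 ranks fill the node
mpirun -np 32 ./mympiexe <inputfile >logfile
```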
A PBS compatibility layer will run simple PBS scripts under slurm. Basic PBS commands that can be interpreted as slurm commands will be translated. The commands qsub and qstat -u are available.
```bash
$ cat qcp2.sh
#!/bin/bash
#PBS -q cloud72
#PBS -l walltime=00:10:00
#PBS -l nodes=1:ppn=2
sleep 5
echo $HOSTNAME
$ qsub qcp2.sh
1430970
$ cat qcp2.sh.o1430970
c1331
$
```
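The qstat -u form mentioned above lists a user's jobs in PBS style, for example (user name illustrative):

```bash
$ qstat -u rfeynman
```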
A bioinformatics and large-data example follows. When a job produces more than about 500 MB or 10,000 files of output, use the /scratch/ or /local_scratch/ directory to redirect program output and temporary files, to avoid excess load on the main storage /scrfs/storage/. This job produces about 30 GB of output in 5 files.
In this script you have:

1. slurm commands
2. job setup
3. go to the scratch directory and create a tmprun directory
4. run trimmomatic in the tmprun directory
5. copy back the output files, and remove them after a successful copy
If you don't understand this pattern, do the copy-back and delete manually, since a misapplied rm -rf can be very dangerous (a manual version of that step is sketched after the script below). Any possible damage is limited by using a specific name such as "tmprun" that is unlikely to contain important data. Don't run rm -rf * unless you are very sure of what you are doing.
```bash
#!/bin/bash
#SBATCH --partition tres72
#SBATCH --qos tres
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --time=72:00:00
#
module load python/anaconda-3.7.3
source /share/apps/bin/conda-3.7.3.sh
conda activate tbprofiler
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
#
cd /scratch/$SLURM_JOB_ID
mkdir -p tmprun
cd tmprun
#
FILE=/storage/jpummil/C.horridus/SnakeNanopore/Kausttrim
trimmomatic PE -threads $SLURM_CPUS_PER_TASK ${FILE}F.fq ${FILE}R.fq \
    Kausttrim-Unpaired.fq ILLUMINACLIP:TruSeq3-SE:2:30:10 HEADCROP:5 LEADING:3 \
    TRAILING:3 SLIDINGWINDOW:4:28 MINLEN:65
#
cd ..
rsync -av tmprun $SLURM_SUBMIT_DIR/
if [ $? -eq 0 ]; then
    rm -rf tmprun
fi
```
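If you prefer to do the copy-back and cleanup by hand after the job finishes, a manual equivalent of the last step (the destination path is illustrative) is:

```bash
# run from the job's scratch area, e.g. /scratch/<jobid>
rsync -av tmprun /scrfs/storage/rfeynman/myproject/
# verify the files arrived intact, then and only then remove the scratch copy
rm -rf tmprun
```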