====Slurm sbatch/srun scripts====

Slurm jobs may be submitted by:

1. Slurm batch scripts submitted by ''sbatch''
2. PBS batch scripts submitted by ''sbatch'' or ''qsub''
3. Slurm interactive jobs submitted by ''srun''
4. Slurm interactive and graphical jobs submitted by [[ portal_login_new | OpenOnDemand ]]

Essential slurm subcommands and their available values are described in [[ selecting_resources | Selecting Resources ]].  The same constraints apply regardless of the source of the commands.
  
Basic slurm commands are:
<csv>
slurm, use
sbatch, submit <job script>
srun, submit interactive job
squeue, list all queued jobs
</csv>
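
For example, to submit a batch script and then list only your own queued jobs (the script name ''myjob.sh'' is just a placeholder):
<code>
sbatch myjob.sh
squeue -u $USER
</code>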
  
A basic slurm batch script for MPI (2 full nodes) follows.  It should begin with ''#!/bin/sh'' or another shell such as ''#!/bin/tcsh''.

For MPI jobs of more than one node, a ''hostfile'' or ''machinefile'' is required; it is optional for single-node MPI.  The machinefile below is auto-generated by slurm startup and uses the unique job number ${SLURM_JOB_ID}.  The slurm variable ${SLURM_NTASKS} is defined at runtime as nodes * tasks-per-node, here 64.  Usually, unless the job runs out of memory, we want tasks-per-node x cpus-per-task to equal the number of cores in a partition node (here 32) so that a full node is allocated, and the number of MPI processes to be that value times the number of nodes.
  
<code>
#!/bin/sh
#SBATCH --partition comp06
#SBATCH --qos comp
#SBATCH --nodes=2
#SBATCH --tasks-per-node=32
#SBATCH --cpus-per-task=1
#SBATCH --time=6:00:00
module purge
module load intel/18.0.1 impi/18.0.1 mkl/18.0.1
mpirun -np $SLURM_NTASKS -machinefile /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID} ./mympiexe <inputfile >logfile
</code>
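
To confirm that the allocation matches expectations, a few optional diagnostic lines can be added to the script before the ''mpirun'' command (a sketch; the machinefile path is the auto-generated one described above):
<code>
# print the allocation seen by the job
echo "nodes allocated: $SLURM_JOB_NUM_NODES"
echo "total MPI tasks: $SLURM_NTASKS"
# show the auto-generated machinefile for this job
cat /scratch/${SLURM_JOB_ID}/machinefile_${SLURM_JOB_ID}
</code>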
  
A similar interactive job with one node in the ''comp01'' partition would be:
<code>
srun --nodes 1 --ntasks-per-node=1 --cpus-per-task=32 --partition comp01 --qos comp \
--time=1:00:00 --pty /bin/bash
</code>
All the slurm options between ''srun'' and ''--pty'' are the same for ''srun'' or ''sbatch''.
Then the ''module'' and ''mpirun'' commands would be entered interactively.
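
A sketch of that interactive session, reusing the modules and the placeholder executable from the batch example above (the machinefile is optional here since only one node is allocated):
<code>
module purge
module load intel/18.0.1 impi/18.0.1 mkl/18.0.1
mpirun -np $SLURM_NTASKS ./mympiexe <inputfile >logfile
</code>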
  
A PBS compatibility layer will run simple PBS scripts under slurm.  Basic PBS commands that can be interpreted as slurm commands will be translated.  The commands ''qsub'' and ''qstat -u'' are available.
<code>
$ cat qcp2.sh
#!/bin/bash
#PBS -q cloud72
#PBS -l walltime=00:10:00
#PBS -l nodes=1:ppn=
sleep 5
echo $HOSTNAME

$ qsub qcp2.sh
1430970

$ cat qcp2.sh.o1430970
c1331
</code>
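
For comparison, a minimal native slurm version of the same job might look like the sketch below (assuming the PBS queue maps to a slurm partition of the same name; the ''--qos cloud'' value and one task per node are assumptions, following the partition/qos pattern used elsewhere on this page):
<code>
#!/bin/bash
#SBATCH --partition cloud72
#SBATCH --qos cloud
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00
sleep 5
echo $HOSTNAME
</code>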
  
A bioinformatics and large data example follows.  When a job produces more than about 500 MB or 10,000 files of output, use the /scratch/ or /local_scratch/ directory to redirect program output and temporary files, to avoid excess load on the main storage /scrfs/storage/.  This job produces about 30 GB of output in 5 files.

In this script you have: 1) slurm commands, 2) job setup, 3) change to the scratch directory and create a ''tmprun'' directory, 4) run ''trimmomatic'' in the tmprun directory, and 5) copy back the output files and remove them after a successful copy.

If you don't understand it, do the copy back and delete manually, as a misapplied ''rm -rf'' can be very dangerous.  Any possible damage can be limited by using a specific name such as "tmprun" that is unlikely to contain important data.  Don't ''rm -rf *'' unless you are very sure of what you are doing.
  
<code>
#!/bin/bash
# 1) slurm resource requests
#SBATCH --partition tres72
#SBATCH --qos tres
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --time=72:00:00

# 2) job setup
module load python/anaconda-3.7.3
source /share/apps/bin/conda-3.7.3.sh
conda activate tbprofiler
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# 3) go to the job scratch directory and create tmprun
cd /scratch/$SLURM_JOB_ID
mkdir -p tmprun
cd tmprun

# 4) run trimmomatic in tmprun
FILE=/storage/jpummil/C.horridus/SnakeNanopore/Kausttrim
trimmomatic PE -threads $SLURM_CPUS_PER_TASK ${FILE}F.fq ${FILE}R.fq \
Kausttrim-Unpaired.fq ILLUMINACLIP:TruSeq3-SE:2:30:10 HEADCROP:5 LEADING:3 \
TRAILING:3 SLIDINGWINDOW:4:28 MINLEN:65

# 5) copy results back to the submit directory, then remove tmprun only if the copy succeeded
cd ..
rsync -av tmprun $SLURM_SUBMIT_DIR/
if [ $? -eq 0 ]; then
  rm -rf tmprun
fi
</code>
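
The job would be submitted from the directory where the results should end up; the script name below is only a placeholder:
<code>
sbatch trim_job.sh
# after the job finishes, the results copied back by rsync are in
# ./tmprun under the directory where sbatch was run
ls tmprun/
</code>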