ahpcc_slurmjob_watcher

AHPCC provides a custom tool, ahpcc_slurmjob_watcher, to help you interface with the SLURM scheduler. It retrieves the most commonly sought-after metrics of your SLURM jobs with very little effort on your part.

The script can be found here:

/path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle

If you find the tool useful, consider setting up an alias for it in your .bashrc file. For example:

alias sjob='/path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle'

which would allow you to access this script by simply typing

sjob

in your AHPCC Linux terminal.
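
If you would rather add the alias from the command line than edit .bashrc by hand, the following is a minimal sketch of one way to do it. The helper name add_sjob_alias is hypothetical (it is not part of the AHPCC tool), and the script path is the same placeholder used above. The helper appends the alias line only if that exact line is not already in the file, so running it more than once does not create duplicates.

```shell
# add_sjob_alias RC_FILE
# Hypothetical helper: append the sjob alias to RC_FILE unless the exact
# line is already present, so repeated runs do not create duplicates.
add_sjob_alias() {
    local rc_file="$1"
    # The path below is the placeholder used on this page, not a real path.
    local alias_line="alias sjob='/path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle'"
    # -x: match the whole line, -F: treat the pattern as a fixed string.
    if ! grep -qxF -- "$alias_line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$alias_line" >> "$rc_file"
    fi
}
```

A typical call would be add_sjob_alias ~/.bashrc, followed by source ~/.bashrc (or a new login) so that the current shell picks up the alias.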

Here are the usage notes from the current version of the script. The same usage notes can be obtained on any AHPCC cluster by executing this command:

/path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle --help 
===================================================================================
 ahpcc_slurmjob_watcher.sh      (Arkansas High Performance Computing Center's tool
                                 to help you be a good WATCHER of your SLURM JOBs)

 Written by:           T. Ryan Rogers (trr007@email.uark.edu)
 Last modified:        10/02/2019
-----------------------------------------------------------------------------------
__USAGE__
     ahpcc_slurmjob_watcher.sh   [OPTIONS]

__SYNOPSIS__
     This script prints custom-formatted information about the user's SLURM jobs.
 Currently, this program can not be used (unlike SLURM\'s squeue command) to see
 information about another user's jobs.
     None of the options are required. Without any additional arguments, the script
 will display default information for all the user\'s running jobs. Options can be
 used to display pending jobs, extra information, & selective output.

__OPTIONS__
    -h, --help
        Print this usage/help/manual info, then exit.

    NUM, all, oo, 00
        NUM is Any integer. If provided, information for all running jobs belonging
        to $USER, along with NUM pending ("PD" status) jobs, is displayed.
        Other symbols, including "all" or either of the "infinity" symbols
        cause information to be printed for all running and all pending jobs.

    -v, --verbose
        Any string containing "-v" can be used. If used, extra information about
        all running jobs belonging to $USER are displayed.

    -j SLURM_JOB_ID
        Any string containing "-j" can be used. When used, also enter the
        SLURM_JOB_ID associated with a particular job. Only information about job
        SLURM_JOB_ID will be printed.

-----------------------------------------------------------------------------------
__NOTE_1__
     To make full use of this script, the user should use a special setting with
 the SLURM "--job-name=" designation. Specifically, the job name argument
 should be set to PROGRAM~~OUTFILE, where "~~" is the required separator. The
 PROGRAM and OUTFILE keywords are described below.
     Currently, this script only recognizes the following options for PROGRAM:

         COMSOL     : COMSOL Multiphysics simulation software
         GAMESS     : General Atomic and Molecular Electronic Structure System
         GAUSSIAN   : (computational chemistry software)
         GROMACS    : GROningen MAchine for Chemical Simulations
         MOLPRO     : Molpro Quantum Chemistry Software
         PQS        : Parallel Quantum Solutions
         SAPT       : Symmetry Adapted Perturbation Theory, specifically SAPT2016.1

 If you do not see your job type listed there, you can email the author to update
 the script to include new options for PROGRAM. New PROGRAM strings must be 10
 characters or less.
     For the OUTFILE keyword, any string can be used, as long at the string matches
 the name of a file present in your running job's scratch directory.

__NOTE_2__
     The -v flag's "Comment" feature may not work properly if a wildcard is
 used in the "--job-name=" designation. Whenever possible, the exact output
 filename should be used without wildcards. E.g. instead of,
     --job-name=SAPT~~*.out
 use something like,
     --job-name=SAPT~~ch3oh-h2o_277_.out

__NOTE_3__
     This script assumes that your output file(s) are located at one of the two
 default locations prepared by the AHPCC, viz. either at
 /scratch/$SLURM_JOB_ID/   or   /local_scratch/$SLURM_JOB_ID/.
 Running jobs in other locations may not be recognized by the script and is
 therefore discouraged.

__NOTE_4__
     When using the -j flag, -v is still functional, but printing of PD jobs is
 disabled. If you wish to see PD jobs, consider executing the script again without
 the -j flag.
===================================================================================
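
The job-name convention from NOTE_1 and NOTE_2 can be sketched in a few lines of shell. This is only an illustration of the PROGRAM~~OUTFILE rule under stated assumptions: the helper name make_job_name is hypothetical and is not part of ahpcc_slurmjob_watcher.sh, and the list of programs is copied from the help text above, so it may lag behind the installed version of the script.

```shell
# Programs listed in NOTE_1 of the help text (may differ in newer versions).
KNOWN_PROGRAMS="COMSOL GAMESS GAUSSIAN GROMACS MOLPRO PQS SAPT"

# make_job_name PROGRAM OUTFILE
# Hypothetical helper: print PROGRAM~~OUTFILE if PROGRAM is in the list above.
make_job_name() {
    local program="$1" outfile="$2" known
    # NOTE_2: wildcards in OUTFILE may break the -v "Comment" feature.
    case "$outfile" in
        *[*?]*) echo "warning: wildcard in OUTFILE '$outfile'" >&2 ;;
    esac
    for known in $KNOWN_PROGRAMS; do
        if [ "$program" = "$known" ]; then
            printf '%s~~%s\n' "$program" "$outfile"
            return 0
        fi
    done
    echo "unrecognized PROGRAM: $program" >&2
    return 1
}

# Example: rebuild the job name shown in NOTE_2 and print the matching
# directive for the top of a batch script.
JOB_NAME="$(make_job_name SAPT ch3oh-h2o_277_.out)"
echo "#SBATCH --job-name=${JOB_NAME}"
```

Run as written, the last line prints #SBATCH --job-name=SAPT~~ch3oh-h2o_277_.out, which is the form NOTE_2 recommends.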

Here are some real-world examples of how to use this script (assuming the above-mentioned alias is set), including some example output:

Display the above help page only:

sjob -h

Display standard information for all your jobs, both running and queued (the three commands are equivalent; the first two stand in for the infinity symbol):

sjob oo
sjob 00
sjob all

=============================================================================================================
 #       JOBID   PARTITION  NODES  CPUS    HEAD_NODE    TIME_LIMIT            ELAP_TIME    JOB_NAME  WORK_DIR
-------------------------------------------------------------------------------------------------------------
 1        6310       condo      1    16        c1310      12:00:00                 0:06     GROMACS  /my/example/job/directory1/
 2        6311       condo      1    16        c1310      12:00:00                 0:06     GROMACS  /my/example/job/directory2/
 3        6312       condo      1    16        c1311      12:00:00                 0:06     GROMACS  /my/example/job/directory3/
 4        6313       condo      1    16        c1311      12:00:00                 0:06     GROMACS  /my/example/job/directory4/
 5        6314       condo      1    16        c1312      12:00:00                 0:06     GROMACS  /my/example/job/directory5/
 6        6315       condo      1    16        c1312      12:00:00                 0:06     GROMACS  /my/example/job/directory6/
=============================================================================================================
 #       JOBID   PARTITION  NODES  CPUS   SCHEDNODES    TIME_LIMIT       EST_START_TIME    JOB_NAME  WORK_DIR
-------------------------------------------------------------------------------------------------------------
 1        6323       condo      1    16       (null)      12:00:00                  N/A     GROMACS  /my/example/job/directory7/
 2        6324       condo      1    16       (null)      12:00:00                  N/A     GROMACS  /my/example/job/directory8/
 3        6325       condo      1    16       (null)      12:00:00                  N/A     GROMACS  /my/example/job/directory9/
=============================================================================================================

Display standard information for all running jobs and only the first 2 queued jobs:

sjob 2
=============================================================================================================
 #       JOBID   PARTITION  NODES  CPUS    HEAD_NODE    TIME_LIMIT            ELAP_TIME    JOB_NAME  WORK_DIR
-------------------------------------------------------------------------------------------------------------
 1        6310       condo      1    16        c1310      12:00:00                 0:06     GROMACS  /my/example/job/directory1/
 2        6311       condo      1    16        c1310      12:00:00                 0:06     GROMACS  /my/example/job/directory2/
 3        6312       condo      1    16        c1311      12:00:00                 0:06     GROMACS  /my/example/job/directory3/
 4        6313       condo      1    16        c1311      12:00:00                 0:06     GROMACS  /my/example/job/directory4/
 5        6314       condo      1    16        c1312      12:00:00                 0:06     GROMACS  /my/example/job/directory5/
 6        6315       condo      1    16        c1312      12:00:00                 0:06     GROMACS  /my/example/job/directory6/
=============================================================================================================
 #       JOBID   PARTITION  NODES  CPUS   SCHEDNODES    TIME_LIMIT       EST_START_TIME    JOB_NAME  WORK_DIR
-------------------------------------------------------------------------------------------------------------
 1        6323       condo      1    16       (null)      12:00:00                  N/A     GROMACS  /my/example/job/directory7/
 2        6324       condo      1    16       (null)      12:00:00                  N/A     GROMACS  /my/example/job/directory8/
=============================================================================================================

Display extra, “verbose” information only for the job whose SLURM_JOB_ID is 6314:

sjob -v -j 6314
========================================================================================================================================
 #       JOBID   PARTITION  NODES  CPUS    NODE_LIST  CPU_1m  CPU_15m    TIME_LIMIT            ELAP_TIME    JOB_NAME  WORK_DIR / COMMENT
----------------------------------------------------------------------------------------------------------------------------------------
 5        6314       condo      1    16        c1312   14.06     1.22      12:00:00                 0:36     GROMACS  /my/example/job/directory5/
                                                                                                                                20400       10.20000

========================================================================================================================================

The extra output produced by the “-v” flag depends on the “JOB_NAME” that the script finds. Admittedly, understanding the extra information often requires some familiarity with the program in use. For example, in these GROMACS jobs, the verbose information includes the simulation step number and the simulation time, respectively. Each program has its own verbose output.
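
To make the dependence on JOB_NAME more concrete, here is a rough sketch of how an output file could be located from a job name and job ID, following the PROGRAM~~OUTFILE convention in NOTE_1 and the two scratch locations in NOTE_3. It is not taken from ahpcc_slurmjob_watcher.sh, and the script may work differently. The helper name find_outfile and the SCRATCH_ROOTS variable are hypothetical.

```shell
# Scratch roots named in NOTE_3. SCRATCH_ROOTS is a hypothetical override
# hook so the helper can be tried outside the cluster.
SCRATCH_ROOTS="${SCRATCH_ROOTS:-/scratch /local_scratch}"

# find_outfile JOB_NAME JOB_ID
# Hypothetical helper: print the path of OUTFILE for the given job, or
# return 1 if the job name has no "~~" separator or no file is found.
find_outfile() {
    local job_name="$1" job_id="$2" root outfile
    case "$job_name" in
        *~~*) outfile="${job_name#*~~}" ;;  # text after the first "~~"
        *)    return 1 ;;                   # plain job name, nothing to find
    esac
    for root in $SCRATCH_ROOTS; do
        if [ -f "$root/$job_id/$outfile" ]; then
            printf '%s\n' "$root/$job_id/$outfile"
            return 0
        fi
    done
    return 1
}
```

For example, find_outfile "GROMACS~~md.log" 6314 would print /scratch/6314/md.log or /local_scratch/6314/md.log if either file exists. Here md.log is an assumed file name, since the example jobs above use the plain job name GROMACS.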

Display only the job whose SLURM_JOB_ID is 6314:

sjob -j 6314

Display extra, “verbose” information for all running jobs:

sjob -v
ahpcc_slurmjob_watcher.txt · Last modified: 2019/11/04 23:24 by trr007