ahpcc_slurmjob_watcher

AHPCC has a custom tool to help you interface with the SLURM scheduler. It retrieves the most commonly sought-after metrics for your SLURM jobs with much less effort on the user's part.

The script can be found here:

/path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle

Should you find the tool useful, you might consider setting up an alias for it in your .bashrc file. An example of an alias you could set would be:

alias sjob='/path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle'

which would allow you to access this script by simply typing

sjob

in your AHPCC Linux terminal.
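
As a minimal sketch of the alias setup described above, the helper below appends an alias line to a startup file only if the identical line is not already present, so running it twice does not create duplicates. The function name `add_alias_once` is hypothetical, and the script path is the same placeholder used on this page.

```shell
#!/usr/bin/env bash
# Append "alias NAME='TARGET'" to a startup file unless that exact line exists.
# Usage: add_alias_once NAME TARGET [RCFILE]   (RCFILE defaults to ~/.bashrc)
add_alias_once() {
    local name="$1" target="$2" rcfile="${3:-$HOME/.bashrc}"
    local line="alias ${name}='${target}'"
    # -F fixed string, -x whole-line match, -q quiet
    if ! grep -Fxq -- "$line" "$rcfile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Example call (placeholder path, as in the text above):
#   add_alias_once sjob '/path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle'
```

After adding the alias, open a new terminal or run `source ~/.bashrc` for it to take effect.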

Here are the usage notes from the current version of the script. The same usage notes can be obtained on any AHPCC cluster by executing this command:

/path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle --help 

<code>

ahpcc_slurmjob_watcher.sh (Arkansas High Performance Computing Center's tool
                               to help you be a good WATCHER of your SLURM JOBs)

Written by: T. Ryan Rogers (trr007@email.uark.edu)

Last modified: 10/02/2019

USAGE

   ahpcc_slurmjob_watcher.sh   [OPTIONS]

SYNOPSIS

   This script prints custom-formatted information about the user's SLURM jobs.
   Currently, this program can not be used (unlike SLURM's squeue command) to see
   information about another user's jobs.

   None of the options are required. Without any additional arguments, the script
   will display default information for all the user's running jobs. Options can be
   used to display pending jobs, extra information, & selective output.

OPTIONS

-h, --help
    Print this usage/help/manual info, then exit.

NUM, all, oo, 00
    NUM is any integer. If provided, information for all running jobs belonging
    to $USER, along with NUM pending ("PD" status) jobs, is displayed.
    Other symbols, including "all" or either of the "infinity" symbols
    cause information to be printed for all running and all pending jobs.

-v, --verbose
    Any string containing "-v" can be used. If used, extra information about
    all running jobs belonging to $USER is displayed.

-j SLURM_JOB_ID
    Any string containing "-j" can be used. When used, also enter the
    SLURM_JOB_ID associated with a particular job. Only information about job
    SLURM_JOB_ID will be printed.

NOTE_1

   To make full use of this script, the user should use a special setting with
   the SLURM "--job-name=" designation. Specifically, the job name argument should
   be set to PROGRAM~~OUTFILE, where "~~" is the required separator. The PROGRAM
   and OUTFILE keywords are described below.

   Currently, this script only recognizes the following options for PROGRAM:
       COMSOL     : COMSOL Multiphysics simulation software
       GAMESS     : General Atomic and Molecular Electronic Structure System
       GAUSSIAN   : (computational chemistry software)
       GROMACS    : GROningen MAchine for Chemical Simulations
       MOLPRO     : Molpro Quantum Chemistry Software
       PQS        : Parallel Quantum Solutions
       SAPT       : Symmetry Adapted Perturbation Theory, specifically SAPT2016.1

   If you do not see your job type listed there, you can email the author to
   update the script to include new options for PROGRAM. New PROGRAM strings must
   be 10 characters or less.

   For the OUTFILE keyword, any string can be used, as long as the string matches
   the name of a file present in your running job's scratch directory.

NOTE_2

   The -v flag's "Comment" feature may not work properly if a wildcard is used in
   the "--job-name=" designation. Whenever possible, the exact output filename
   should be used without wildcards. E.g. instead of,
       --job-name=SAPT~~*.out
   use something like,
       --job-name=SAPT~~ch3oh-h2o277_.out

NOTE_3

   This script assumes that your output file(s) are located at one of the two
   default locations prepared by the AHPCC, viz. either at /scratch/$SLURM_JOB_ID/
   or /localscratch/$SLURM_JOB_ID/. Running jobs in other locations may not be
   recognized by the script and is therefore discouraged.

NOTE_4

   When using the -j flag, -v is still functional, but printing of PD jobs is
   disabled. If you wish to see PD jobs, consider executing the script again without
   the -j flag.

</code>
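
The job-name convention and scratch locations described in the usage notes can be illustrated with a few small shell helpers. This is only a sketch under stated assumptions: the function names are hypothetical, how the watcher itself parses the job name is not shown on this page, and enforcing the 10-character PROGRAM limit at submission time is an assumption made here for illustration.

```shell
#!/usr/bin/env bash
SEP='~~'   # separator required between PROGRAM and OUTFILE

# Compose PROGRAM~~OUTFILE. The length check mirrors the 10-character limit
# mentioned for PROGRAM strings (applying it here is an assumption).
make_job_name() {
    local program="$1" outfile="$2"
    if [ "${#program}" -gt 10 ]; then
        echo "PROGRAM longer than 10 characters: $program" >&2
        return 1
    fi
    printf '%s%s%s\n' "$program" "$SEP" "$outfile"
}

# Split a job name back into its two parts.
job_program() { printf '%s\n' "${1%%"$SEP"*}"; }
job_outfile() { printf '%s\n' "${1#*"$SEP"}"; }

# Print the path of OUTFILE under the first scratch root that contains it.
# Usage: find_outfile JOB_ID OUTFILE [ROOT...]
find_outfile() {
    local job_id="$1" outfile="$2" root
    shift 2
    # Default roots are the two locations named in the usage notes.
    [ "$#" -gt 0 ] || set -- /scratch /localscratch
    for root in "$@"; do
        if [ -f "$root/$job_id/$outfile" ]; then
            printf '%s\n' "$root/$job_id/$outfile"
            return 0
        fi
    done
    return 1
}

# Example submission (md.log and run_gromacs.sh are hypothetical names):
#   sbatch --job-name="$(make_job_name GROMACS md.log)" run_gromacs.sh
```

Quoting the separator inside the parameter expansions keeps the shell from treating the leading "~" as anything other than a literal character.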

Here are some real-world examples of how to use this script (assuming the above-mentioned alias is set), including some example output:

Display the above help page only:

sjob -h

Display standard information for all your jobs, both running and queued (the three forms are equivalent; "oo" and "00" stand in for the infinity symbol):

sjob oo
sjob 00
sjob all

=============================================================================================================
 #       JOBID   PARTITION  NODES  CPUS    HEAD_NODE    TIME_LIMIT            ELAP_TIME    JOB_NAME  WORK_DIR
-------------------------------------------------------------------------------------------------------------
 1        6310       condo      1    16        c1310      12:00:00                 0:06     GROMACS  /my/example/job/directory1/
 2        6311       condo      1    16        c1310      12:00:00                 0:06     GROMACS  /my/example/job/directory2/
 3        6312       condo      1    16        c1311      12:00:00                 0:06     GROMACS  /my/example/job/directory3/
 4        6313       condo      1    16        c1311      12:00:00                 0:06     GROMACS  /my/example/job/directory4/
 5        6314       condo      1    16        c1312      12:00:00                 0:06     GROMACS  /my/example/job/directory5/
 6        6315       condo      1    16        c1312      12:00:00                 0:06     GROMACS  /my/example/job/directory6/
=============================================================================================================
 #       JOBID   PARTITION  NODES  CPUS   SCHEDNODES    TIME_LIMIT       EST_START_TIME    JOB_NAME  WORK_DIR
-------------------------------------------------------------------------------------------------------------
 1        6323       condo      1    16       (null)      12:00:00                  N/A     GROMACS  /my/example/job/directory7/
 2        6324       condo      1    16       (null)      12:00:00                  N/A     GROMACS  /my/example/job/directory8/
 3        6325       condo      1    16       (null)      12:00:00                  N/A     GROMACS  /my/example/job/directory9/
=============================================================================================================

Display standard information for all running jobs and only the first 2 queued jobs:

sjob 2
=============================================================================================================
 #       JOBID   PARTITION  NODES  CPUS    HEAD_NODE    TIME_LIMIT            ELAP_TIME    JOB_NAME  WORK_DIR
-------------------------------------------------------------------------------------------------------------
 1        6310       condo      1    16        c1310      12:00:00                 0:06     GROMACS  /my/example/job/directory1/
 2        6311       condo      1    16        c1310      12:00:00                 0:06     GROMACS  /my/example/job/directory2/
 3        6312       condo      1    16        c1311      12:00:00                 0:06     GROMACS  /my/example/job/directory3/
 4        6313       condo      1    16        c1311      12:00:00                 0:06     GROMACS  /my/example/job/directory4/
 5        6314       condo      1    16        c1312      12:00:00                 0:06     GROMACS  /my/example/job/directory5/
 6        6315       condo      1    16        c1312      12:00:00                 0:06     GROMACS  /my/example/job/directory6/
=============================================================================================================
 #       JOBID   PARTITION  NODES  CPUS   SCHEDNODES    TIME_LIMIT       EST_START_TIME    JOB_NAME  WORK_DIR
-------------------------------------------------------------------------------------------------------------
 1        6323       condo      1    16       (null)      12:00:00                  N/A     GROMACS  /my/example/job/directory7/
 2        6324       condo      1    16       (null)      12:00:00                  N/A     GROMACS  /my/example/job/directory8/
=============================================================================================================

Display extra, “verbose” information only for the job with $SLURM_JOBID=6314:

sjob -v -j 6314
========================================================================================================================================
 #       JOBID   PARTITION  NODES  CPUS     NODELIST   CPU1m   CPU15m    TIME_LIMIT            ELAP_TIME    JOB_NAME  WORK_DIR / COMMENT
----------------------------------------------------------------------------------------------------------------------------------------
 5        6314       condo      1    16        c1312   14.06     1.22      12:00:00                 0:36     GROMACS  /my/example/job/directory5/
                                                                                                                                20400       10.20000

========================================================================================================================================

The extra output produced by the “-v” flag depends on the “JOB_NAME” that the script finds. Admittedly, understanding the extra information often requires some familiarity with the program in use. For example, in these GROMACS jobs, the verbose information includes the simulation step number and simulation time, respectively (20400 and 10.20000 in the output above). Each program has its own verbose output.

Display only the job with $SLURM_JOBID=6314:

sjob -j 6314

Display extra, “verbose” information for all running jobs:

sjob -v
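
The watcher's internals are not shown on this page. For comparison, or for jobs whose PROGRAM is not yet supported, a broadly similar table can be requested from squeue directly. The sketch below is an assumption about which standard squeue format fields correspond to the columns in the examples above, not the script's actual implementation; the function names and column widths are hypothetical.

```shell
#!/usr/bin/env bash
# Standard squeue --format fields used below:
#   %i job ID, %P partition, %D nodes, %C CPUs, %N node list,
#   %l time limit, %M elapsed time, %j job name, %Z working directory,
#   %Y scheduled nodes (pending jobs), %S expected start time
RUNNING_FMT='%.10i %.10P %.6D %.5C %.12N %.12l %.12M %.12j %Z'
PENDING_FMT='%.10i %.10P %.6D %.5C %.12Y %.12l %.20S %.12j %Z'

# List the current user's running jobs.
list_running() {
    squeue -u "$USER" -t RUNNING -o "$RUNNING_FMT"
}

# List the current user's pending jobs; an optional first argument limits
# the number of job rows shown.
list_pending() {
    local limit="${1:-}"
    if [ -n "$limit" ]; then
        # +1 keeps the header line in addition to LIMIT job rows
        squeue -u "$USER" -t PENDING -o "$PENDING_FMT" | head -n "$((limit + 1))"
    else
        squeue -u "$USER" -t PENDING -o "$PENDING_FMT"
    fi
}
```

Calling `list_running` followed by `list_pending 2` gives output loosely comparable to `sjob 2`, without the program-specific verbose information.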
ahpcc_slurmjob_watcher.txt · Last modified: 2020/09/21 21:06 by root