AHPCC has a custom tool to help you interface with the SLURM scheduler. It retrieves the most commonly sought-after metrics of your SLURM jobs with much less effort from the user. The script can be found here:

  /path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle

Should you find the tool useful, consider setting up an alias for it in your .bashrc file. An example alias would be:

  alias sjob='/path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle'

which would allow you to run the script by simply typing sjob in your AHPCC Linux terminal.

Here are the usage notes from the current version of the script. The same usage notes can be obtained on any AHPCC cluster by executing this command:

  /path/to/ahpcc_slurmjob_watcher.sh/on/pinnacle --help

  ===================================================================================
  ahpcc_slurmjob_watcher.sh
  (Arkansas High Performance Computing Center's tool to help you be a good
  WATCHER of your SLURM JOBs)
  Written by: T. Ryan Rogers (trr007@email.uark.edu)
  Last modified: 10/02/2019
  -----------------------------------------------------------------------------------
  __USAGE__
      ahpcc_slurmjob_watcher.sh [OPTIONS]

  __SYNOPSIS__
      This script prints custom-formatted information about the user's SLURM
      jobs. Currently, this program cannot be used (unlike SLURM's squeue
      command) to see information about another user's jobs. None of the
      options are required. Without any additional arguments, the script will
      display default information for all the user's running jobs. Options can
      be used to display pending jobs, extra information, & selective output.

  __OPTIONS__
      -h, --help
          Print this usage/help/manual info, then exit.

      NUM, all, oo, 00
          NUM is any integer. If provided, information for all running jobs
          belonging to $USER, along with NUM pending ("PD" status) jobs, is
          displayed.
          Other symbols, including "all" or either of the "infinity" symbols,
          cause information to be printed for all running and all pending jobs.

      -v, --verbose
          Any string containing "-v" can be used. If used, extra information
          about all running jobs belonging to $USER is displayed.

      -j SLURM_JOB_ID
          Any string containing "-j" can be used. When used, also enter the
          SLURM_JOB_ID associated with a particular job. Only information
          about job SLURM_JOB_ID will be printed.
  -----------------------------------------------------------------------------------
  __NOTE_1__
      To make full use of this script, the user should use a special setting
      with the SLURM "--job-name=" designation. Specifically, the job name
      argument should be set to PROGRAM~~OUTFILE, where "~~" is the required
      separator. The PROGRAM and OUTFILE keywords are described below.

      Currently, this script only recognizes the following options for PROGRAM:

          COMSOL   : COMSOL Multiphysics simulation software
          GAMESS   : General Atomic and Molecular Electronic Structure System
          GAUSSIAN : (computational chemistry software)
          GROMACS  : GROningen MAchine for Chemical Simulations
          MOLPRO   : Molpro Quantum Chemistry Software
          PQS      : Parallel Quantum Solutions
          SAPT     : Symmetry Adapted Perturbation Theory, specifically SAPT2016.1

      If you do not see your job type listed there, you can email the author
      to update the script to include new options for PROGRAM. New PROGRAM
      strings must be 10 characters or fewer.

      For the OUTFILE keyword, any string can be used, as long as the string
      matches the name of a file present in your running job's scratch
      directory.

  __NOTE_2__
      The -v flag's "Comment" feature may not work properly if a wildcard is
      used in the "--job-name=" designation. Whenever possible, the exact
      output filename should be used without wildcards. E.g.
      instead of,
          --job-name=SAPT~~*.out
      use something like,
          --job-name=SAPT~~ch3oh-h2o_277_.out

  __NOTE_3__
      This script assumes that your output file(s) are located at one of the
      two default locations prepared by the AHPCC, viz. either at
      /scratch/$SLURM_JOB_ID/ or /local_scratch/$SLURM_JOB_ID/. Jobs running
      in other locations may not be recognized by the script, so running jobs
      elsewhere is discouraged.

  __NOTE_4__
      When using the -j flag, -v is still functional, but printing of PD jobs
      is disabled. If you wish to see PD jobs, consider executing the script
      again without the -j flag.
  ===================================================================================

Here are some real-world examples of how to use this script (assuming the above-mentioned alias is set), including some example output:

Display the above help page only:

  sjob -h

Display standard information for all your jobs, both running and queued (the three options are equivalent; the first two stand for the infinity symbol):

  sjob oo
  sjob 00
  sjob all
  =============================================================================================================
  #  JOBID  PARTITION  NODES  CPUS  HEAD_NODE  TIME_LIMIT  ELAP_TIME  JOB_NAME  WORK_DIR
  -------------------------------------------------------------------------------------------------------------
  1  6310   condo      1      16    c1310      12:00:00    0:06       GROMACS   /my/example/job/directory1/
  2  6311   condo      1      16    c1310      12:00:00    0:06       GROMACS   /my/example/job/directory2/
  3  6312   condo      1      16    c1311      12:00:00    0:06       GROMACS   /my/example/job/directory3/
  4  6313   condo      1      16    c1311      12:00:00    0:06       GROMACS   /my/example/job/directory4/
  5  6314   condo      1      16    c1312      12:00:00    0:06       GROMACS   /my/example/job/directory5/
  6  6315   condo      1      16    c1312      12:00:00    0:06       GROMACS   /my/example/job/directory6/
  =============================================================================================================
  #  JOBID  PARTITION  NODES  CPUS  SCHEDNODES  TIME_LIMIT  EST_START_TIME  JOB_NAME  WORK_DIR
  -------------------------------------------------------------------------------------------------------------
  1  6323   condo      1      16    (null)      12:00:00    N/A             GROMACS   /my/example/job/directory7/
  2  6324   condo      1      16    (null)      12:00:00    N/A             GROMACS   /my/example/job/directory8/
  3  6325   condo      1      16    (null)      12:00:00    N/A             GROMACS   /my/example/job/directory9/
  =============================================================================================================

Display standard information for all running jobs and only the first 2 queued jobs:

  sjob 2
  =============================================================================================================
  #  JOBID  PARTITION  NODES  CPUS  HEAD_NODE  TIME_LIMIT  ELAP_TIME  JOB_NAME  WORK_DIR
  -------------------------------------------------------------------------------------------------------------
  1  6310   condo      1      16    c1310      12:00:00    0:06       GROMACS   /my/example/job/directory1/
  2  6311   condo      1      16    c1310      12:00:00    0:06       GROMACS   /my/example/job/directory2/
  3  6312   condo      1      16    c1311      12:00:00    0:06       GROMACS   /my/example/job/directory3/
  4  6313   condo      1      16    c1311      12:00:00    0:06       GROMACS   /my/example/job/directory4/
  5  6314   condo      1      16    c1312      12:00:00    0:06       GROMACS   /my/example/job/directory5/
  6  6315   condo      1      16    c1312      12:00:00    0:06       GROMACS   /my/example/job/directory6/
  =============================================================================================================
  #  JOBID  PARTITION  NODES  CPUS  SCHEDNODES  TIME_LIMIT  EST_START_TIME  JOB_NAME  WORK_DIR
  -------------------------------------------------------------------------------------------------------------
  1  6323   condo      1      16    (null)      12:00:00    N/A             GROMACS   /my/example/job/directory7/
  2  6324   condo      1      16    (null)      12:00:00    N/A             GROMACS   /my/example/job/directory8/
  =============================================================================================================

Display extra, "verbose" information only for job with $SLURM_JOBID=6314:

  sjob -v -j 6314
  ========================================================================================================================================
  #  JOBID  PARTITION  NODES  CPUS  NODE_LIST  CPU_1m  CPU_15m  TIME_LIMIT  ELAP_TIME  JOB_NAME  WORK_DIR / COMMENT
  ----------------------------------------------------------------------------------------------------------------------------------------
  5  6314   condo      1      16    c1312      14.06   1.22     12:00:00    0:36       GROMACS   /my/example/job/directory5/ 20400 10.20000
  ========================================================================================================================================

The extra output obtained with the "-v" flag depends on the "JOB_NAME" that the script finds. Admittedly, understanding the extra information often requires some familiarity with the program in use. For example, in these GROMACS jobs, the verbose information (20400 and 10.20000 in the output above) is the simulation step number and the simulation time, respectively. Each program has its own verbose output.

Display only the job with $SLURM_JOBID=6314:

  sjob -j 6314

Display extra, "verbose" information for all running jobs:

  sjob -v
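
NOTE_1 and NOTE_2 in the usage notes describe the PROGRAM~~OUTFILE job-name convention that the verbose output relies on. The following is a minimal sketch of how such a name might be assembled and checked before submitting a job. The function name make_job_name, the output file name md.log, and the script name job.sh are hypothetical and are not part of ahpcc_slurmjob_watcher.sh; the list of PROGRAM strings is copied from the usage notes above and may change in later versions of the script.

```shell
# Hypothetical helper (not part of ahpcc_slurmjob_watcher.sh): builds a job
# name in the PROGRAM~~OUTFILE form described in NOTE_1 and NOTE_2.
make_job_name() {
    local program="$1" outfile="$2"

    # PROGRAM must be one of the strings the usage notes list as recognized.
    case "$program" in
        COMSOL|GAMESS|GAUSSIAN|GROMACS|MOLPRO|PQS|SAPT) ;;
        *) echo "unrecognized PROGRAM: $program" >&2; return 1 ;;
    esac

    # NOTE_2 advises against wildcards in OUTFILE, so reject an empty name
    # and any name containing '*' or '?'.
    case "$outfile" in
        ""|*"*"*|*"?"*) echo "OUTFILE must be an exact filename: $outfile" >&2; return 1 ;;
    esac

    printf '%s~~%s\n' "$program" "$outfile"
}

make_job_name SAPT ch3oh-h2o_277_.out   # prints SAPT~~ch3oh-h2o_277_.out

# Possible use at submission time (md.log and job.sh are placeholder names):
#   sbatch --job-name="$(make_job_name GROMACS md.log)" job.sh
```

The check only validates the form of the name. It cannot confirm that OUTFILE will actually exist in the job's scratch directory, which is what the script needs at run time.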