In this section, we present several Slurm commands and other utilities that are available to help you plan and track your job submissions as well as check the status of the Slurm queues.
Job Management
When interpreting queue and job status, remember that Frontera doesn't operate on a first-come-first-served basis. Instead, the sophisticated, tunable algorithms built into Slurm attempt to keep the system busy, while scheduling jobs in a way that is as fair as possible to everyone. At times this means leaving nodes idle ("draining the queue") to make room for a large job that would otherwise never run. It also means considering each user's "fair share", scheduling jobs so that those who haven't run jobs recently may have a slightly higher priority than those who have.
Monitoring Queue Status with sinfo and qlimits
TACC's qlimits command
To display resource limits for the Frontera queues, execute qlimits. The result is real-time data; the corresponding information in this document's table of Frontera queues may lag behind the actual configuration that the qlimits utility displays.
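For example, from any login node:
login1$ qlimits    # display current resource limits for the Frontera queues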
Slurm's sinfo command
Slurm's sinfo command allows you to monitor the status of the queues. If you execute sinfo without arguments, you'll see a list of every node in the system together with its status. To skip the node list and produce a tight, alphabetized summary of the available queues and their status, execute:
login1$ sinfo -S+P -o "%18P %8a %20F" # compact summary of queue status
An excerpt from this command's output might look like this:
login1$ sinfo -S+P -o "%18P %8a %20F"
PARTITION          AVAIL    NODES(A/I/O/T)
debug              up       1757/4419/776/6952
development*       up       85/153/114/352
large              up       1691/112/485/2288
normal             up       1691/112/485/2288
The AVAIL column displays the overall status of each queue (up or down), while the column labeled NODES(A/I/O/T) shows the number of nodes in each of several states ("Allocated", "Idle", "Offline", and "Total"). Execute man sinfo for more information. Use caution when reading the generic documentation, however: some available fields are not meaningful or are misleading on Frontera (e.g. TIMELIMIT, displayed using the %l option).
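To check a single queue rather than all of them, you can add Slurm's standard partition filter to the same format string (a sketch; the -p option is generic Slurm, not specific to Frontera):
login1$ sinfo -p normal -S+P -o "%18P %8a %20F"   # summary for the normal queue only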
Monitoring Job Status
Slurm's squeue command
Slurm's squeue command allows you to monitor jobs in the queues, whether pending (waiting) or currently running:
login1$ squeue             # show all jobs in all queues
login1$ squeue -u bjones   # show all jobs owned by bjones
login1$ man squeue         # more info
An excerpt from the default output might look like this:
 JOBID   PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 25781       debug idv72397   bjones CG       9:36      2 c190-131,c191-092
 25918       debug ppm_4828   bjones PD       0:00   4828 (Resources)
 25915       debug MV2-test    siliu PD       0:00   4200 (Priority)
 25940      normal     SWMF   xtwang PD       0:00     18 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
 25589      normal   aatest slindsey PD       0:00      8 (Dependency)
 25949       debug psdns_la sniffjck PD       0:00    256 (Priority)
 25942      normal     WRF2 sniffjck PD       0:00    128 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
 25618      normal   SP256U   connor PD       0:00      1 (Dependency)
 25944      normal  MoTi_hi   wchung  R      35:13      1 c112-203
 25945      normal WTi_hi_e   wchung  R      27:11      1 c113-131
 25606      normal   trainA   jackhu  R   23:28:28      1 c119-152
The column labeled ST displays each job's status:
PD means "Pending" (waiting);
R means "Running";
CG means "Completing" (cleaning up after exiting the job script).
Pending jobs appear in order of decreasing priority. The last column includes a nodelist for running/completing jobs, or a reason for pending jobs. If you submit a job before a scheduled system maintenance period, and the job cannot complete before the maintenance begins, your job will run when the maintenance/reservation concludes. The squeue command will report ReqNodeNotAvailable ("Required Node Not Available"). The job will remain in the PD state until Frontera returns to production.
The default format for squeue now reports total nodes associated with a job rather than cores, tasks, or hardware threads. One reason for this change is clarity: the operating system sees each compute node's 56 hardware threads as "processors", and output based on that information can be ambiguous or otherwise difficult to interpret.
The default format lists all nodes assigned to displayed jobs; this can make the output difficult to read. A handy variation that suppresses the nodelist is:
login1$ squeue -o "%.10i %.12P %.12j %.9u %.2t %.9M %.6D" # suppress nodelist
The --start option displays job start times, including very rough estimates for the expected start times of some pending jobs that are relatively high in the queue:
login1$ squeue --start -j 167635 # display estimated start time for job 167635
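Standard squeue filters can also narrow the display to the jobs you care about; for example (a sketch using generic Slurm options, not Frontera-specific features):
login1$ squeue -u bjones -t PD    # show only bjones' pending jobs
login1$ squeue -p normal -t R     # show running jobs in the normal queue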
TACC's showq utility
TACC's showq utility mimics a tool that originated in the PBS project, and serves as a popular alternative to the Slurm squeue command:
login1$ showq              # show all jobs; default format
login1$ showq -u           # show your own jobs
login1$ showq -U bjones    # show jobs associated with user bjones
login1$ showq -h           # more info
The output groups jobs in four categories: ACTIVE, WAITING, BLOCKED, and COMPLETING/ERRORED. A BLOCKED job is one that cannot yet run due to temporary circumstances (e.g. a pending maintenance or other large reservation).
If your waiting job cannot complete before a maintenance/reservation begins, showq will display its state as WaitNod ("Waiting for Nodes"). The job will remain in this state until Frontera returns to production.
The default format for showq now reports total nodes associated with a job rather than cores, tasks, or hardware threads. One reason for this change is clarity: the operating system sees each compute node's 56 hardware threads as "processors", and output based on that information can be ambiguous or otherwise difficult to interpret.
Other Job Management Commands
scancel, scontrol, and sacct
It's not possible to add resources to a job (e.g. allow more time) once you've submitted the job to the queue.
To cancel a pending or running job, first determine its jobid, then use scancel:
login1$ squeue -u bjones    # one way to determine jobid
 JOBID   PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
170361      normal   spec12   bjones PD       0:00     32 (Resources)
login1$ scancel 170361      # cancel job
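To cancel several jobs at once, scancel also accepts Slurm's standard user filter (a sketch; use with care, since it cancels every job you own):
login1$ scancel -u bjones    # cancel all of bjones' pending and running jobs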
For detailed information about the configuration of a specific job, use scontrol:
login1$ scontrol show job=170361
To view some accounting data associated with your own jobs, use sacct:
login1$ sacct --starttime 2019-06-01 # show jobs that started on or after this date
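sacct also supports job-level queries and a configurable field list; for example (a sketch using standard Slurm accounting fields):
login1$ sacct -j 170361 --format=JobID,JobName,Partition,NNodes,State,Elapsed   # summary for one job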
Dependent Jobs using sbatch
You can use sbatch to help manage workflows that involve multiple steps: the --dependency option allows you to launch jobs that depend on the completion (or successful completion) of another job. For example, you could use this technique to split a workflow into three jobs: (1) compile on a single node; then (2) compute on 40 nodes; then (3) post-process your results using 4 nodes.
login1$ sbatch --dependency=afterok:173210 myjobscript
For more information see the Slurm online documentation. Note that you can use $SLURM_JOBID from one job to find the jobid you'll need to construct the sbatch launch line for a subsequent one. But also remember that you can't use sbatch to submit a job from a compute node.
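A minimal sketch of the three-step workflow described above, run from a login node. The script names are hypothetical; --parsable is a standard sbatch option that prints only the new jobid, so it can be captured in a shell variable:
login1$ jobid1=$(sbatch --parsable compile.slurm)                               # step 1: compile on 1 node
login1$ jobid2=$(sbatch --parsable --dependency=afterok:$jobid1 compute.slurm)  # step 2: compute on 40 nodes
login1$ sbatch --dependency=afterok:$jobid2 postprocess.slurm                   # step 3: post-process on 4 nodes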