Job State Codes

Each job in Slurm has a state assigned to it. How that state is displayed depends on the method used to query it.

Overview

In the Slurm code, there are base states and state flags. Each job has a base state and may have additional state flags set. When using the REST API, both the base state and current flag(s) will be returned.

When the squeue and sacct commands report a job state, they represent it as a single state. Both recognize all base states but not all state flags. If a recognized flag is present, it is reported instead of the base state. Refer to the relevant command documentation for details.
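
For example, both commands can report the state of a single job (the job ID below is a placeholder):

    # Current state of job 12345 as reported by squeue
    squeue -j 12345 -o "%T"

    # State recorded in accounting, along with the exit code
    sacct -j 12345 --format=JobID,State,ExitCode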

This page lists all job states and flags that are defined in the code. The names given here are the string representations used in user-facing output. For most of them, the name used in the code is identical apart from a JOB_ prefix. For more visibility into the job states and flags, set DebugFlags=TraceJobs and SlurmctldDebug=verbose (or higher) in slurm.conf.
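
For example, the corresponding slurm.conf lines might look like this:

    # slurm.conf excerpt: increase visibility of job state changes in the slurmctld log
    DebugFlags=TraceJobs
    SlurmctldDebug=verbose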

Job states

Each job known to the system will have one of the following states:

Name           Description
PENDING        queued and waiting for initiation; will typically have a reason code specifying why it has not yet started
RUNNING        allocated resources and executing
SUSPENDED      allocated resources, execution suspended; commonly caused by preemption or a direct request from an authorized user
COMPLETED      completed execution successfully; finished with an exit code of zero on all nodes
CANCELLED      cancelled by user or administrator
FAILED         completed execution unsuccessfully; non-zero exit code or other failure condition
TIMEOUT        terminated on reaching time limit; time limit may have been configured in slurm.conf or at job submission
NODE_FAIL      terminated on node failure
PREEMPTED      terminated due to preemption; may transition to another state based on the configured PreemptMode and job characteristics
BOOT_FAIL      terminated due to node boot failure
DEADLINE       terminated due to reaching deadline specified at job submission
OUT_OF_MEMORY  experienced out of memory error

Job flags

Jobs may have additional flags set:

Name           Description
LAUNCH_FAILED  failed to launch on the chosen node(s); includes prolog failure and other failure conditions
UPDATE_DB      sending an update about the job to the database
REQUEUED       job is being requeued, whether due to preemption or a direct request from an authorized user
REQUEUE_HOLD   same as REQUEUED but will not be considered for scheduling until it is released
SPECIAL_EXIT   same as REQUEUE_HOLD but used to identify a special situation that applies to this job
RESIZING       the size of the job is changing; prevents conflicting job changes from taking place
CONFIGURING    job has been allocated nodes and is waiting for them to boot or reboot
COMPLETING     the job has finished or been cancelled and is performing cleanup tasks, including the epilog script if present
STOPPED        received SIGSTOP to suspend the job without releasing resources
RECONFIG_FAIL  node configuration for job failed
POWER_UP_NODE  job has been allocated powered down nodes and is waiting for them to boot
REVOKED        revoked due to conditions of its sibling job in a federated setup
REQUEUE_FED    requeued due to conditions of its sibling job in a federated setup
RESV_DEL_HOLD  held due to deleted reservation
SIGNALING      outgoing signal to job is pending
STAGE_OUT      staging out data (burst buffer)
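
As noted in the overview, the REST API returns both the base state and any flags that are set. The following Python sketch, which is illustrative rather than part of Slurm, splits such a value into its base state and flags using the names from the two tables above; it assumes the job_state field is returned as a list of strings, as in recent versions of the API.

    # Illustrative helper (not part of Slurm): split a job_state value from the
    # REST API into its base state and any flags, using the names listed above.
    BASE_STATES = {
        "PENDING", "RUNNING", "SUSPENDED", "COMPLETED", "CANCELLED", "FAILED",
        "TIMEOUT", "NODE_FAIL", "PREEMPTED", "BOOT_FAIL", "DEADLINE",
        "OUT_OF_MEMORY",
    }

    FLAGS = {
        "LAUNCH_FAILED", "UPDATE_DB", "REQUEUED", "REQUEUE_HOLD", "SPECIAL_EXIT",
        "RESIZING", "CONFIGURING", "COMPLETING", "STOPPED", "RECONFIG_FAIL",
        "POWER_UP_NODE", "REVOKED", "REQUEUE_FED", "RESV_DEL_HOLD", "SIGNALING",
        "STAGE_OUT",
    }

    def parse_job_state(job_state):
        """Split a job_state list such as ["RUNNING", "CONFIGURING"] into
        (base_state, flags)."""
        base = [s for s in job_state if s in BASE_STATES]
        flags = [s for s in job_state if s in FLAGS]
        if len(base) != 1:
            raise ValueError(f"expected exactly one base state, got {job_state}")
        return base[0], flags

    # Hypothetical value as it might appear in a REST API response
    print(parse_job_state(["RUNNING", "CONFIGURING"]))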

Last modified 22 August 2024