Job State Codes
Each job in the Slurm system has a state assigned to it. How the job state is displayed depends on the method used to identify the state.
Overview
In the Slurm code, there are base states and state flags. Each job has a base state and may have additional state flags set. When using the REST API, both the base state and current flag(s) will be returned.
When the squeue and sacct commands report a job state, they represent it as a single state. Both recognize all base states but not all state flags. If a recognized flag is present, it is reported instead of the base state. Refer to the relevant command documentation for details.
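The precedence rule above can be sketched in Python. This is a minimal illustration, not Slurm's implementation: `display_state` and its `recognized_flags` parameter are hypothetical names, and the set of flags a given command recognizes has to be taken from that command's documentation. When more than one recognized flag is set, the sketch reports the first one in the order given, which is an assumption.

```python
from typing import Iterable


def display_state(base_state: str, flags: Iterable[str],
                  recognized_flags: Iterable[str]) -> str:
    """Collapse a base state plus flags into one displayed state.

    Hypothetical helper: a recognized flag is shown instead of the
    base state; unrecognized flags are ignored.
    """
    recognized = set(recognized_flags)
    for flag in flags:
        if flag in recognized:
            return flag
    return base_state


# Assumed for illustration: the tool recognizes only COMPLETING.
print(display_state("RUNNING", ["COMPLETING"], {"COMPLETING"}))
print(display_state("RUNNING", ["UPDATE_DB"], {"COMPLETING"}))
```

With these inputs, the first call reports the flag and the second falls back to the base state, because UPDATE_DB is not in the assumed recognized set.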
This page lists all job states and flags that are represented in the code. The names provided are the string representations that are used in user-facing output. For most, the names used in the code are identical, with JOB_ at the start.
For more visibility into the job states and flags, set DebugFlags=TraceJobs and SlurmctldDebug=verbose (or higher) in slurm.conf.
Job states
Each job known to the system will have one of the following states:
Name | Description |
PENDING | queued and waiting for initiation; will typically have a reason code specifying why it has not yet started |
RUNNING | allocated resources and executing |
SUSPENDED | allocated resources, execution suspended; commonly caused by preemption or a direct request from an authorized user |
COMPLETED | completed execution successfully; finished with an exit code of zero on all nodes |
CANCELLED | cancelled by user or administrator |
FAILED | completed execution unsuccessfully; non-zero exit code or other failure condition |
TIMEOUT | terminated on reaching time limit; time limit may have been configured in slurm.conf or at job submission |
NODE_FAIL | terminated on node failure |
PREEMPTED | terminated due to preemption; may transition to another state based on the configured PreemptMode and job characteristics |
BOOT_FAIL | terminated due to node boot failure |
DEADLINE | terminated due to reaching deadline specified at job submission |
OUT_OF_MEMORY | experienced an out-of-memory error |
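A script that polls for job completion often needs to know whether a reported state is final. The sketch below groups the base states from the table by their descriptions. The grouping, the `is_finished` helper, and the way the input string is normalized are assumptions for illustration, not part of Slurm. PREEMPTED is treated as finished here even though, per the table, such a job may transition to another state.

```python
# Base states in which the job is still queued or holds resources.
ACTIVE_STATES = {"PENDING", "RUNNING", "SUSPENDED"}

# Base states whose descriptions indicate the job has ended.
FINISHED_STATES = {
    "COMPLETED", "CANCELLED", "FAILED", "TIMEOUT", "NODE_FAIL",
    "PREEMPTED", "BOOT_FAIL", "DEADLINE", "OUT_OF_MEMORY",
}


def is_finished(reported: str) -> bool:
    """Return True if the reported base state is in FINISHED_STATES."""
    # Keep only the first word, in case extra text follows the state name.
    words = reported.split()
    if not words:
        raise ValueError("empty state string")
    name = words[0].upper()
    if name in FINISHED_STATES:
        return True
    if name in ACTIVE_STATES:
        return False
    raise ValueError(f"not a base state from the table: {reported!r}")
```

A flag name such as COMPLETING is not a base state, so the helper raises ValueError rather than guessing; a caller that reads single-state output would need to handle flags separately.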
Job flags
Jobs may have additional flags set:
Name | Description |
LAUNCH_FAILED | failed to launch on the chosen node(s); includes prolog failure and other failure conditions |
UPDATE_DB | sending an update about the job to the database |
REQUEUED | job is being requeued, whether due to preemption or a direct request from an authorized user |
REQUEUE_HOLD | same as REQUEUED but will not be considered for scheduling until it is released |
SPECIAL_EXIT | same as REQUEUE_HOLD but used to identify a special situation that applies to this job |
RESIZING | the size of the job is changing; prevents conflicting job changes from taking place |
CONFIGURING | job has been allocated nodes and is waiting for them to boot or reboot |
COMPLETING | the job has finished or been cancelled and is performing cleanup tasks, including the epilog script if present |
STOPPED | received SIGSTOP to suspend the job without releasing resources |
RECONFIG_FAIL | node configuration for job failed |
POWER_UP_NODE | job has been allocated powered down nodes and is waiting for them to boot |
REVOKED | revoked due to conditions of its sibling job in a federated setup |
REQUEUE_FED | requeued due to conditions of its sibling job in a federated setup |
RESV_DEL_HOLD | held due to deleted reservation |
SIGNALING | outgoing signal to job is pending |
STAGE_OUT | staging out data (burst buffer) |
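Because a job carries one base state plus zero or more flags, a list of state names can be separated using the two tables on this page. The following is a sketch only: the `split_state` helper and the assumption that the input is a flat list of name strings are hypothetical, and the actual layout of REST API responses should be checked against the REST API documentation.

```python
from typing import List, Tuple

# Names copied from the "Job states" table.
BASE_STATES = {
    "PENDING", "RUNNING", "SUSPENDED", "COMPLETED", "CANCELLED", "FAILED",
    "TIMEOUT", "NODE_FAIL", "PREEMPTED", "BOOT_FAIL", "DEADLINE",
    "OUT_OF_MEMORY",
}

# Names copied from the "Job flags" table.
STATE_FLAGS = {
    "LAUNCH_FAILED", "UPDATE_DB", "REQUEUED", "REQUEUE_HOLD", "SPECIAL_EXIT",
    "RESIZING", "CONFIGURING", "COMPLETING", "STOPPED", "RECONFIG_FAIL",
    "POWER_UP_NODE", "REVOKED", "REQUEUE_FED", "RESV_DEL_HOLD", "SIGNALING",
    "STAGE_OUT",
}


def split_state(names: List[str]) -> Tuple[str, List[str]]:
    """Split a list of state names into (base state, list of flags)."""
    unknown = [n for n in names
               if n not in BASE_STATES and n not in STATE_FLAGS]
    if unknown:
        raise ValueError(f"unknown state names: {unknown}")
    base = [n for n in names if n in BASE_STATES]
    if len(base) != 1:
        raise ValueError(f"expected exactly one base state, got {base}")
    flags = [n for n in names if n in STATE_FLAGS]
    return base[0], flags


base, flags = split_state(["RUNNING", "COMPLETING"])
print(base, flags)
```

Rejecting unknown names and requiring exactly one base state keeps the helper strict, so a change in the set of names surfaces as an error instead of being silently ignored.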
Last modified 22 August 2024