sacctmgr

Section: Slurm Commands (1)
Updated: Slurm Commands

 

NAME

sacctmgr - Used to view and modify Slurm account information.

 

SYNOPSIS

sacctmgr [OPTIONS...] [COMMAND...]

 

DESCRIPTION

sacctmgr is used to view or modify Slurm account information. The account information is maintained within a database with the interface being provided by slurmdbd (Slurm Database daemon). This database can serve as a central storehouse of user and computer information for multiple computers at a single site. Slurm account information is recorded based upon four parameters that form what is referred to as an association. These parameters are user, cluster, partition, and account. user is the login name. cluster is the name of a Slurm managed cluster as specified by the ClusterName parameter in the slurm.conf configuration file. partition is the name of a Slurm partition on that cluster. account is the bank account for a job. The intended mode of operation is to initiate the sacctmgr command, add, delete, modify, and/or list association records then commit the changes and exit.

NOTE: The contents of Slurm's database are maintained in lower case. This may result in some sacctmgr output differing from that of other Slurm commands.

 

OPTIONS

-s, --associations
Use with show or list to display associations with the entity. This is equivalent to the associations command.

-h, --help
Print a help message describing the usage of sacctmgr. This is equivalent to the help command.

-i, --immediate
Commit changes immediately without asking for confirmation.

--json, --json=list, --json=<data_parser>
Dump information as JSON using the default data_parser plugin or explicit data_parser with parameters. Sorting and formatting arguments will be ignored.

-n, --noheader
No header will be added to the beginning of the output.

-p, --parsable
Output will be '|' delimited with a '|' at the end.

-P, --parsable2
Output will be '|' delimited without a '|' at the end.

-Q, --quiet
Print no messages other than error messages. This is equivalent to the quiet command.

-r, --readonly
Makes it so the running sacctmgr cannot modify accounting information. The readonly option is for use within interactive mode.

--yaml, --yaml=list, --yaml=<data_parser>
Dump information as YAML using the default data_parser plugin or explicit data_parser with parameters. Sorting and formatting arguments will be ignored.

-v, --verbose
Enable detailed logging. This is equivalent to the verbose command.

-V, --version
Display version number. This is equivalent to the version command.
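
The two parsable formats described above (-p with a trailing '|', -P without) can be consumed by a script. The following is a minimal sketch, assuming the header line is present (no -n) and that field values contain no '|' characters. The function name parse_parsable and the sample text are hypothetical and were not captured from a real cluster.

```python
def parse_parsable(text, trailing_delimiter=False):
    """Split '|'-delimited output into a list of dicts keyed by header name.

    trailing_delimiter=True models -p (every line ends with '|');
    trailing_delimiter=False models -P (no trailing '|').
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if trailing_delimiter and line.endswith("|"):
            line = line[:-1]  # drop the final '|' added by -p
        rows.append(line.split("|"))
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


# Hypothetical sample output for illustration only.
sample_p = "Account|Descr|Org|\nscience|science dept|univ|\n"
sample_P = "Account|Descr|Org\nscience|science dept|univ\n"

rows_p = parse_parsable(sample_p, trailing_delimiter=True)
rows_P = parse_parsable(sample_P)
```

Both calls yield the same records, which is why -P is usually the more convenient choice for simple field splitting.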

 

COMMANDS

add <ENTITY> <SPECS>
Add an entity. Identical to the create command.

archive {dump|load} <SPECS>
Write database information to a flat file or load information that has previously been written to a file.

clear stats
Clear the server statistics.

create <ENTITY> <SPECS>
Add an entity. Identical to the add command.

delete <ENTITY> where <SPECS>
Delete the specified entities. Identical to the remove command.

dump <cluster>
Dump cluster data to the specified file. If no filename is specified, clustername.cfg is used by default.

help
Display a description of sacctmgr options and commands.

list <ENTITY> [<SPECS>]
Display information about the specified entity. By default, all entries are displayed; you can narrow the results by specifying SPECS in your query. Identical to the show command.

load <FILENAME>
Load cluster data from the specified file. This is a configuration file generated by running the sacctmgr dump command. This command does not load archive data; see the sacctmgr archive load option instead.

modify <ENTITY> where <SPECS> set <SPECS>
Modify an entity.

reconfigure
Reconfigures the SlurmDBD if running with one.

remove <ENTITY> where <SPECS>
Delete the specified entities. Identical to the delete command.

show <ENTITY> [<SPECS>]
Display information about the specified entity. By default, all entries are displayed; you can narrow the results by specifying SPECS in your query. Identical to the list command.

shutdown
Shut down the server.

version
Display the version number of sacctmgr.

 

INTERACTIVE COMMANDS

NOTE: All commands listed below can be used in the interactive mode, but NOT on the initial command line.

exit
Terminate sacctmgr interactive mode. Identical to the quit command.

quiet
Print no messages other than error messages.

quit
Terminate the execution of sacctmgr interactive mode. Identical to the exit command.

verbose
Enable detailed logging. This includes time-stamps on data structures, record counts, etc. This is an independent command with no options meant for use in interactive mode.

!!
Repeat the last command.

 

ENTITIES

account
A bank account, typically specified at job submit time using the --account= option. These may be arranged in a hierarchical fashion, for example accounts 'chemistry' and 'physics' may be children of the account 'science'. The hierarchy may have an arbitrary depth.

association
The entity used to group information consisting of four parameters: account, cluster, partition (optional), and user. Used only with the list or show command. Add, modify, and delete should be done to a user, account or cluster entity, which will in turn update the underlying associations. Modification of attributes like limits is allowed for an association but not a modification of the four core attributes of an association. You cannot change the partition setting (or set one if it has not been set) for an existing association. Instead, you will need to create a new association with the partition included. You can either keep the previous association with no partition defined, or delete it. Note that these newly added associations are unique entities and any existing usage information will not be carried over to the new association.

cluster
The ClusterName parameter in the slurm.conf configuration file, used to differentiate accounts on different machines.

configuration
Used only with the list or show command to report current system configuration.

coordinator
A special privileged user, usually an account manager, that can add users or sub-accounts to the account they are coordinator over. This should be a trusted person since they can change limits on account and user associations, as well as cancel, requeue or reassign accounts of jobs inside their realm.

event
Events like downed or draining nodes on clusters.

federation
A group of clusters that work together to schedule jobs.

job
Used to modify specific fields of a job: Derived Exit Code, Comment, AdminComment, Extra, SystemComment, or WCKey.

problem
Use with show or list to display entity problems.

qos
Quality of Service.

reservation
A collection of resources set apart for use by a particular account, user or group of users for a given period of time.

resource
Software resources for the system. Those are software licenses shared among clusters.

RunawayJobs
Used only with the list or show command to report current jobs that have been orphaned on the local cluster and are now runaway. If there are jobs in this state it will also give you an option to "fix" them. NOTE: You must have an AdminLevel of at least Operator to perform this.

stats
Used with list or show command to view server statistics. Accepts optional argument of ave_time or total_time to sort on those fields. By default, sorts on increasing RPC count field.

transaction
List of transactions that have occurred during a given time period.

tres
Used with list or show command to view a list of Trackable RESources configured on the system.

user
The login name. Usernames are case-insensitive (forced to lowercase) unless the PreserveCaseUser option has been set in the SlurmDBD configuration file.

wckeys
Workload Characterization Key. An arbitrary string for grouping orthogonal accounts.

 

GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES

NOTE: The group limits (GrpJobs, GrpTRES, etc.) are tested when a job is being considered for being allocated resources. If starting a job would cause any of its group limits to be exceeded, that job will not be considered for scheduling even if that job might preempt other jobs which would release sufficient group resources for the pending job to be initiated.

DefaultQOS=<default_qos>
The default QOS this association and its children should have. This is overridden if set directly on a user. To clear a previously set value use the modify command with a new value of -1.

Fairshare={<fairshare_number>|parent}
Share={<fairshare_number>|parent}
Number used in conjunction with other accounts to determine job priority. Can also be the string parent. When used on a user, this means that the parent association is used for fairshare. If Fairshare=parent is set on an account, that account's children will be effectively re-parented for fairshare calculations to the first parent of their parent that is not Fairshare=parent. Limits remain the same; only the fairshare value is affected. To clear a previously set value use the modify command with a new value of -1.
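
The re-parenting rule for Fairshare=parent can be pictured as a walk up the account hierarchy. The following is a minimal sketch of that walk, not Slurm's implementation. The dictionaries, the account names, and the function effective_fairshare_parent are hypothetical.

```python
def effective_fairshare_parent(name, parents, fairshare):
    """Return the ancestor used for fairshare calculations.

    Walks upward from the direct parent of `name` and skips every
    ancestor whose Fairshare value is the string 'parent'.
    """
    ancestor = parents.get(name)
    while ancestor is not None and fairshare.get(ancestor) == "parent":
        ancestor = parents.get(ancestor)
    return ancestor


# Hypothetical hierarchy: root -> science -> chemistry -> alice
parents = {"root": None, "science": "root",
           "chemistry": "science", "alice": "chemistry"}
# chemistry has Fairshare=parent, so its children are re-parented.
fairshare = {"root": 1, "science": 10, "chemistry": "parent", "alice": 1}

alice_parent = effective_fairshare_parent("alice", parents, fairshare)
chemistry_parent = effective_fairshare_parent("chemistry", parents, fairshare)
```

In this sketch alice is treated as a child of science for fairshare purposes, while any limits set on chemistry would still apply to her.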

GrpJobs=<max_jobs>
Maximum number of running jobs in aggregate for this association and all associations which are children of this association. To clear a previously set value use the modify command with a new value of -1.

GrpJobsAccrue=<max_jobs>
Maximum number of pending jobs in aggregate able to accrue age priority for this association and all associations which are children of this association. To clear a previously set value use the modify command with a new value of -1.

GrpSubmit=<max_jobs>
GrpSubmitJobs=<max_jobs>
Maximum number of jobs which can be in a pending or running state at any time in aggregate for this association and all associations which are children of this association. To clear a previously set value use the modify command with a new value of -1.

GrpTRES=TRES=<max_TRES>[,TRES=<max_TRES>,...]
Maximum number of TRES running jobs are able to be allocated in aggregate for this association and all associations which are children of this association. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

TRES can be one of the Slurm defaults (e.g. cpu, mem, node), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres.

NOTE: This limit only applies fully when using the Select Consumable Resource plugin.
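
The TRES=<value>[,TRES=<value>,...] syntax used by GrpTRES and the other TRES limits is a comma-separated list of name/value pairs. The following is a minimal sketch of how a script might split such a string, assuming plain integer values and treating -1 as "clear this limit" as described above. The function name parse_tres_limits is hypothetical, and any unit suffixes that a real installation may accept are not handled.

```python
def parse_tres_limits(spec):
    """Parse 'cpu=100,mem=-1' into {'cpu': 100, 'mem': None}.

    A value of -1 is mapped to None to represent a cleared limit.
    """
    limits = {}
    for item in spec.split(","):
        name, sep, value = item.partition("=")
        if not sep or not name.strip() or not value.strip():
            raise ValueError(f"malformed TRES entry: {item!r}")
        number = int(value)  # assumption: integer values only
        limits[name.strip().lower()] = None if number == -1 else number
    return limits


limits = parse_tres_limits("cpu=100,mem=-1,gres/gpu=4")
```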

GrpTRESMins=TRES=<minutes>[,TRES=<minutes>,...]
The total number of TRES minutes that can possibly be used by past, present and future jobs running from this association and its children. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

TRES can be one of the Slurm defaults (e.g. cpu, mem, node), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres.

NOTE: This limit is not enforced if set on the root association of a cluster. So even though it may appear in sacctmgr output, it will not be enforced.

NOTE: This limit only applies when using the Priority Multifactor plugin. The time is decayed using the value of PriorityDecayHalfLife or PriorityUsageResetPeriod as set in the slurm.conf. When this limit is reached all associated jobs running will be killed and all future jobs submitted with associations in the group will be delayed until they are able to run inside the limit.

GrpTRESRunMins=TRES=<minutes>[,TRES=<minutes>,...]
Used to limit the combined total number of TRES minutes used by all jobs running with this association and its children. This takes into consideration the time limit of running jobs and consumes it. If the limit is reached, no new jobs are started until other jobs finish and free up time. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

TRES can be one of the Slurm defaults (e.g. cpu, mem, node), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres.

GrpWall=<max_wall>
Maximum wall clock time running jobs are able to be allocated in aggregate for this association and all associations which are children of this association. To clear a previously set value use the modify command with a new value of -1.

NOTE: This limit is not enforced if set on the root association of a cluster. So even though it may appear in sacctmgr output, it will not be enforced.

NOTE: This limit only applies when using the Priority Multifactor plugin. The time is decayed using the value of PriorityDecayHalfLife or PriorityUsageResetPeriod as set in the slurm.conf. When this limit is reached all associated jobs running will be killed and all future jobs submitted with associations in the group will be delayed until they are able to run inside the limit.

MaxJobs=<max_jobs>
Maximum number of jobs each user is allowed to run at one time in this association. This is overridden if set directly on a user. Default is the cluster's limit. To clear a previously set value use the modify command with a new value of -1.

MaxJobsAccrue=<max_jobs>
Maximum number of pending jobs able to accrue age priority at any given time for the given association. This is overridden if set directly on a user. Default is the cluster's limit. To clear a previously set value use the modify command with a new value of -1.

MaxSubmit=<max_jobs>
MaxSubmitJobs=<max_jobs>
Maximum number of jobs which this association can have in a pending or running state at any time. Default is the cluster's limit. To clear a previously set value use the modify command with a new value of -1.

MaxTRESMins=TRES=<minutes>[,TRES=<minutes>,...]
MaxTRESMinsPerJob=TRES=<minutes>[,TRES=<minutes>,...]
Maximum number of TRES minutes each job is able to use in this association. This is overridden if set directly on a user. Default is the cluster's limit. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

TRES can be one of the Slurm defaults (e.g. cpu, mem, node), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres.

MaxTRES=TRES=<max_TRES>[,TRES=<max_TRES>,...]
MaxTRESPerJob=TRES=<max_TRES>[,TRES=<max_TRES>,...]
Maximum number of TRES each job is able to use in this association. This is overridden if set directly on a user. Default is the cluster's limit. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

TRES can be one of the Slurm defaults (e.g. cpu, mem, node), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres.

NOTE: This limit only applies fully when using the cons_tres select type plugin.

MaxWall=<max_wall>
MaxWallDurationPerJob=<max_wall>
Maximum wall clock time each job is able to use in this association. This is overridden if set directly on a user. Default is the cluster's limit. <max_wall> format is <min> or <min>:<sec> or <hr>:<min>:<sec> or <days>-<hr>:<min>:<sec> or <days>-<hr>. The value is recorded in minutes with rounding as needed. To clear a previously set value use the modify command with a new value of -1.

NOTE: Changing this value will have no effect on any running or pending job.
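
The <max_wall> formats listed above all reduce to a number of minutes. The following is a minimal sketch of that conversion for the five documented forms. The text only says the value is rounded "as needed", so rounding leftover seconds up to the next minute is an assumption here, as is the function name max_wall_to_minutes. The special value -1 (clear the limit) is returned as None.

```python
def max_wall_to_minutes(value):
    """Convert a <max_wall> string to whole minutes (None for -1)."""
    if value == "-1":
        return None  # -1 clears a previously set limit
    days = 0
    if "-" in value:
        # <days>-<hr> or <days>-<hr>:<min>:<sec>
        day_part, rest = value.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if len(parts) == 1:
            hours, minutes, seconds = parts[0], 0, 0
        elif len(parts) == 3:
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unsupported format: {value!r}")
    else:
        # <min>, <min>:<sec> or <hr>:<min>:<sec>
        parts = [int(p) for p in value.split(":")]
        if len(parts) == 1:
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:
            hours, (minutes, seconds) = 0, parts
        elif len(parts) == 3:
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unsupported format: {value!r}")
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    # Assumption: partial minutes are rounded up.
    return -(-total_seconds // 60)
```

For example, under these assumptions "1:30:00" and "90" describe the same limit, and "2-12" is two and a half days.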

Priority
What priority will be added to a job's priority when using this association. This is overridden if set directly on a user. Default is the cluster's limit. To clear a previously set value use the modify command with a new value of -1.

QosLevel<operator><comma_separated_list_of_qos_names>
Specify the default Qualities of Service (QOS) that jobs are able to run at for this association. To get a list of valid QOSs use 'sacctmgr list qos'. This value will override its parent's value and push down to its children as the new default. Setting a QosLevel to '' (two single quotes with nothing between them) restores its default setting. You can also use the operators += and -= to add or remove certain QOSs from a QOS list.

Valid <operator> values include:

=
Set QosLevel to the specified value. Note: the QOS that can be used at a given account in the hierarchy are inherited by the children of that account. By assigning QOS with the = sign only the assigned QOS can be used by the account and its children.
+=
Add the specified <qos> value to the current QosLevel. The account will have access to this QOS and the other previously assigned to it.
-=
Remove the specified <qos> value from the current QosLevel.
See the EXAMPLES section below.
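
The effect of the three QosLevel operators can be modeled as set operations on the list of QOS names. The following is a minimal sketch of that model only. It does not reproduce inheritance through the account hierarchy or the '' reset to the default setting, and the function name apply_qos_level and the QOS names are hypothetical.

```python
def apply_qos_level(current, operator, names):
    """Return the QOS set after applying '=', '+=' or '-='.

    `current` is the set of QOS names before the change and `names`
    is a comma-separated list of QOS names.
    """
    requested = {n.strip() for n in names.split(",") if n.strip()}
    if not requested:
        raise ValueError("the empty-string reset is not modeled here")
    if operator == "=":
        return set(requested)        # replace the whole list
    if operator == "+=":
        return current | requested   # add to the existing list
    if operator == "-=":
        return current - requested   # remove from the existing list
    raise ValueError(f"unknown operator: {operator!r}")


qos = {"normal"}
qos = apply_qos_level(qos, "+=", "high,debug")
after_add = set(qos)
qos = apply_qos_level(qos, "-=", "debug")
after_remove = set(qos)
qos = apply_qos_level(qos, "=", "standby")
```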

 

SPECIFICATIONS FOR ACCOUNTS

Accounts can be created, modified, and deleted with sacctmgr. These options allow you to set the corresponding attributes or filter on them when querying for Accounts.

Cluster=<cluster>
Specific cluster to add account to. Default is all in system.

Description=<description>
An arbitrary string describing an account.

Name=<name>
The name of a bank account. Note that the name must be unique and cannot represent different bank accounts at different points in the account hierarchy.

Organization=<org>
Organization to which the account belongs.

Parent=<parent>
Parent account of this account. Default is the root account, a top level account.

RawUsage=<value>
This allows an administrator to reset the raw usage accrued to an account. The only value currently supported is 0 (zero). This is a settable specification only - it cannot be used as a filter to list accounts.

WithAssoc
Display all associations for this account.

WithCoord
Display all coordinators for this account.

WithDeleted
Display information with previously deleted data. Accounts that are deleted within 24 hours of being created and did not have a job run in the account during that time will be removed from the database. Otherwise, the account will be marked as deleted and will be viewable with the WithDeleted flag.

NOTE: If using the WithAssoc option you can also query against association specific information to view only certain associations this account may have. These extra options can be found in the SPECIFICATIONS FOR ASSOCIATIONS section. You can also use the general specifications list above in the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES section.

 

LIST/SHOW ACCOUNT FORMAT OPTIONS

Fields you can display when viewing Account records by using the format= option. The default format is:
Account,Description,Organization

Account
The name of a bank account.

Description
An arbitrary string describing an account.

Organization
Organization to which the account belongs.

Coordinators
List of users that are a coordinator of the account. (Only filled in when using the WithCoordinator option.)

NOTE: If using the WithAssoc option you can also view the information about the various associations the account may have on all the clusters in the system. The association information can be filtered. Note that all the accounts in the database will always be shown, as the filter only takes effect over the association data. The Association format fields are described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.

 

SPECIFICATIONS FOR ASSOCIATIONS

Associations can be created, modified, and deleted with sacctmgr. These options allow you to set the corresponding attributes or filter on them when querying for Associations.

Clusters=<cluster_name>[,<cluster_name>,...]
List the associations of the cluster(s).

Accounts=<account_name>[,<account_name>,...]
List the associations of the account(s).

Users=<user_name>[,<user_name>,...]
List the associations of the user(s).

Partition=<partition_name>[,<partition_name>,...]
List the associations of the partition(s).

NOTE: You can also use the general specifications list above in the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES section.

Other options unique for listing associations:

OnlyDefaults
Display only associations that are default associations.

Tree
Display account names in a hierarchical fashion.

WithDeleted
Display information with previously deleted data. Associations that are deleted within 24 hours of being created and did not have a job run in the association during that time will be removed from the database. Otherwise, the association will be marked as deleted and will be viewable with the WithDeleted flag.

WithSubAccounts
Display information with subaccounts. Only really valuable when used with the account= option. This will display all the subaccount associations along with the accounts listed in the option.

WOLimits
Display information without limit information. This is for a smaller default format of "Cluster,Account,User,Partition".

WOPInfo
Display information without parent information (i.e. parent id, and parent account name). This option also implicitly sets the WOPLimits option.

WOPLimits
Display information without hierarchical parent limits (i.e. will only display limits where they are set instead of propagating them from the parent).

 

LIST/SHOW ASSOCIATION FORMAT OPTIONS

Fields you can display when viewing Association records by using the format= option.

Account
The name of a bank account in the association.

Cluster
The name of a cluster in the association.

DefaultQOS
The QOS the association will use by default if it has access to it in the QOS list mentioned below.

Fairshare
Share
Number used in conjunction with other accounts to determine job priority. Can also be the string parent. When used on a user, this means that the parent association is used for fairshare. If Fairshare=parent is set on an account, that account's children will be effectively re-parented for fairshare calculations to the first parent of their parent that is not Fairshare=parent. Limits remain the same; only the fairshare value is affected.

GrpJobs
Maximum number of running jobs in aggregate for this association and all associations which are children of this association.

GrpJobsAccrue
Maximum number of pending jobs in aggregate able to accrue age priority for this association and all associations which are children of this association.

GrpSubmit
GrpSubmitJobs
Maximum number of jobs which can be in a pending or running state at any time in aggregate for this association and all associations which are children of this association.

GrpTRES
Maximum number of TRES running jobs are able to be allocated in aggregate for this association and all associations which are children of this association.

GrpTRESMins
The total number of TRES minutes that can possibly be used by past, present and future jobs running from this association and its children.

GrpTRESRunMins
Used to limit the combined total number of TRES minutes used by all jobs running with this association and its children. This takes into consideration the time limit of running jobs and consumes it. If the limit is reached, no new jobs are started until other jobs finish and free up time.

GrpWall
Maximum wall clock time running jobs are able to be allocated in aggregate for this association and all associations which are children of this association.

ID
The id of the association.

LFT
Associations are kept in a hierarchy: this is the leftmost spot in the hierarchy. When used with the RGT variable, all associations with a LFT after this LFT and before the RGT are children of this association.

MaxJobs
Maximum number of jobs each user is allowed to run at one time.

MaxJobsAccrue
Maximum number of pending jobs able to accrue age priority at any given time. This limit only applies to the job's QOS and not the partition's QOS.

MaxSubmit
MaxSubmitJobs
Maximum number of jobs in the pending or running state at any time.

MaxTRES
MaxTRESPerJob
Maximum number of TRES each job is able to use.

MaxTRESMins
MaxTRESMinsPerJob
Maximum number of TRES minutes each job is able to use.

MaxTRESPerNode
Maximum number of TRES each node in a job allocation can use.

MaxWall
MaxWallDurationPerJob
Maximum wall clock time each job is able to use.

Qos
Valid QOSs for this association.

QosRaw
Numeric IDs of valid QOSs for this association.

ParentID
The association id of the parent of this association.

ParentName
The account name of the parent of this association.

Partition
The name of a partition in the association.

Priority
What priority will be added to a job's priority when using this association.

RGT
Associations are kept in a hierarchy: this is the rightmost spot in the hierarchy. When used with the LFT variable, all associations with a LFT before this RGT and after the LFT are children of this association.

User
The name of a user in the association.

WithRawQOSLevel
Display QosLevel in an unevaluated raw format, consisting of a comma separated list of QOS names prepended with '' (nothing), '+' or '-' for the association. QOS names without +/- prepended were assigned (ie, sacctmgr modify ... set QosLevel=qos_name) for the entity listed or on one of its parents in the hierarchy. QOS names with +/- prepended indicate the QOS was added/filtered (ie, sacctmgr modify ... set QosLevel=[+-]qos_name) for the entity listed or on one of its parents in the hierarchy. Including WOPLimits will show exactly where each QOS was assigned, added or filtered in the hierarchy.
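
The LFT and RGT fields described above mark an interval for each association: an association is a child of another when its LFT falls strictly between the other's LFT and RGT. The following is a minimal sketch of that test. The records and the function names are hypothetical and only illustrate the interval check.

```python
def is_child_of(candidate, parent):
    """True if candidate's LFT lies between parent's LFT and RGT."""
    return parent["lft"] < candidate["lft"] < parent["rgt"]


def children_of(parent, associations):
    """Return the account names of all associations nested under `parent`."""
    return [a["account"] for a in associations if is_child_of(a, parent)]


# Hypothetical LFT/RGT values for a small hierarchy.
associations = [
    {"account": "root",      "lft": 1, "rgt": 10},
    {"account": "science",   "lft": 2, "rgt": 7},
    {"account": "chemistry", "lft": 3, "rgt": 4},
    {"account": "physics",   "lft": 5, "rgt": 6},
    {"account": "ops",       "lft": 8, "rgt": 9},
]

science = associations[1]
science_children = children_of(science, associations)
```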

 

SPECIFICATIONS FOR CLUSTERS

Clusters can be created, modified, and deleted with sacctmgr. These options allow you to set the corresponding attributes or filter on them when querying for Clusters.

Classification=<classification>
Type of machine, current classifications are capability, capacity and capapacity.

Features=<comma_separated_list_of_feature_names>
Features that are specific to the cluster. Federated jobs can be directed to clusters that contain the job requested features. To clear a previously set value, use the modify command with a new value of '' (two single quotes with nothing between them).

Federation=<federation>
The federation that this cluster should be a member of. A cluster can only be a member of one federation at a time.

FedState=<state>
The state of the cluster in the federation.
Valid states are:
ACTIVE
Cluster will actively accept and schedule federated jobs.

INACTIVE
Cluster will not schedule or accept any jobs.

DRAIN
Cluster will not accept any new jobs and will let existing federated jobs complete.

DRAIN+REMOVE
Cluster will not accept any new jobs and will remove itself from the federation once all federated jobs have completed. When removed from the federation, the cluster will accept jobs as a non-federated cluster.

Name=<name>
The name of a cluster. This should be equal to the ClusterName parameter in the slurm.conf configuration file for some Slurm-managed cluster.

RPC=<rpc_list>
Comma separated list of numeric RPC values.

WithDeleted
Display information with previously deleted data. Clusters that are deleted within 24 hours of being created and did not have a job run in the cluster during that time will be removed from the database. Otherwise, the cluster will be marked as deleted and will be viewable with the WithDeleted flag.

WithFed
Appends federation related columns to default format options (e.g. Federation,ID,Features,FedState).

WOLimits
Display information without limit information. This is for a smaller default format of Cluster,ControlHost,ControlPort,RPC

NOTE: You can also use the general specifications list above in the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES section.

 

LIST/SHOW CLUSTER FORMAT OPTIONS

Fields you can display when viewing Cluster records by using the format= option.

Classification
Type of machine, i.e. capability, capacity or capapacity.

Cluster
The name of the cluster.

ControlHost
When a slurmctld registers with the database, the IP address of the controller is placed here.

ControlPort
When a slurmctld registers with the database the port the controller is listening on is placed here.

Features
The list of features on the cluster (if any).

Federation
The name of the federation this cluster is a member of (if any).

FedState
The state of the cluster in the federation (if a member of one).

FedStateRaw
Numeric value of the name of the FedState.

Flags
Attributes possessed by the cluster. Current flags include Cray, External and MultipleSlurmd.

External clusters are registration only clusters. A slurmctld can designate an external slurmdbd with the AccountingStorageExternalHost slurm.conf option. This allows a slurmctld to register to an external slurmdbd so that clusters attached to the external slurmdbd can communicate with the external cluster with Slurm commands.

ID
The ID assigned to the cluster when a member of a federation. This ID uniquely identifies the cluster and its jobs in the federation.

NodeCount
The current count of nodes associated with the cluster.

NodeNames
The current Nodes associated with the cluster.

RPC
When a slurmctld registers with the database, the RPC version the controller is running is placed here.

TRES
Trackable RESources (Billing, BB (Burst buffer), CPU, Energy, GRES, License, Memory, and Node) this cluster is accounting for.

NOTE: You can also view the information about the root association for the cluster. The Association format fields are described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.

 

SPECIFICATIONS FOR COORDINATOR

Coordinators can be created, modified, and deleted with sacctmgr. These options allow you to set the corresponding attributes or filter on them when querying for Coordinators.

Account=<account_name>[,<account_name>,...]
Account name to add this user as a coordinator to.

Names=<user_name>[,<user_name>,...]
Names of coordinators.

NOTE: To list coordinators use the WithCoordinator option with list account or list user.

 

SPECIFICATIONS FOR EVENTS

Events are automatically generated and sent to slurmdbd to be stored. These are options you can specify to filter for specific types of events.

All_Clusters
Shortcut to get information on all clusters.

All_Time
Shortcut to query events over all time.

Clusters=<cluster_name>[,<cluster_name>,...]
List the events of the cluster(s). Default is the cluster where the command was run.

CondFlags=<flag>[,<flag>,...]
Optional list of flags to filter events by.
Valid options are:
Open
If set, only open node events (currently down) will be returned.

End=<OPT>
Period ending of events. Default is now.
Valid time formats are:

HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]

Event=<OPT>
Specific types of events to look for. Valid options are Cluster or Node. The default is both.

MaxCPUs=<OPT>
Max number of CPUs affected by an event.

MinCPUs=<OPT>
Min number of CPUs affected by an event.

Nodes=<node_name>[,<node_name>,...]
Node names affected by an event.

Reason=<reason>[,<reason>,...]
Reason associated with a node going down. A reason that contains a space should be surrounded by quotes.

Start=<OPT>
Period start of events. Default is 00:00:00 of the previous day, unless states are given with the States=<spec> option. In that case the default behavior is to return events currently in the states specified.
Valid time formats are:
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]
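
The relative form now[{+|-}count[seconds(default)|minutes|hours|days|weeks]] accepted by Start and End can be resolved against a reference time. The following is a minimal sketch, assuming only the full unit names listed above are accepted and that the count is a non-negative integer. The function name resolve_relative_time is hypothetical, and the reference time is passed in explicitly so the result is reproducible.

```python
import re
from datetime import datetime, timedelta

# now, optionally followed by +/-count and an optional unit name.
_RELATIVE = re.compile(
    r"^now(?:([+-])(\d+)(seconds|minutes|hours|days|weeks)?)?$"
)


def resolve_relative_time(spec, now):
    """Resolve a 'now[+|-count[unit]]' string relative to `now`."""
    match = _RELATIVE.match(spec.strip().lower())
    if match is None:
        raise ValueError(f"not a relative time: {spec!r}")
    sign, count, unit = match.groups()
    if sign is None:
        return now
    # seconds is the default unit when none is given
    delta = timedelta(**{unit or "seconds": int(count)})
    return now + delta if sign == "+" else now - delta


reference = datetime(2024, 1, 10, 12, 0, 0)
two_days_ago = resolve_relative_time("now-2days", reference)
ninety_seconds_later = resolve_relative_time("now+90", reference)
```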

States=<state>[,<state>,...]
State of a node in a node event. If this is set, the event type is set automatically to Node.

User=<user_name>[,<user_name>,...]
Query against users who set the event. If this is set, the event type is set automatically to Node since only the slurm user can perform a cluster event.

 

LIST/SHOW EVENT FORMAT OPTIONS

Fields you can display when viewing Event records by using the format= option. The default format is:
Cluster,NodeName,TimeStart,TimeEnd,State,Reason,User

Cluster
The name of the cluster the event happened on.

ClusterNodes
The hostlist of nodes on a cluster in a cluster event.

Duration
Length of time the event lasted.

End
Period when event ended.

Event
Name of the event.

EventRaw
Numeric value of the name of the event.

NodeName
The node affected by the event. In a cluster event, this is blank.

Reason
The reason an event happened.

Start
Period when event started.

State
On a node event this is the formatted state of the node during the event.

StateRaw
On a node event this is the numeric value of the state of the node during the event.

TRES
Number of TRES involved with the event.

User
On a node event this is the user who caused the event to happen.

 

SPECIFICATIONS FOR FEDERATION

Federations can be created, modified, and deleted with sacctmgr. These options allow you to set the corresponding attributes or filter on them when querying for Federations.

Clusters[+|-]=<cluster_name>[,<cluster_name>,...]
List of clusters to add to or remove from a federation. A blank value (e.g. clusters=) will remove all clusters from the federation. NOTE: A cluster can only be a member of one federation.

Name=<name>
The name of the federation.

Tree
Display federations in a hierarchical fashion.

WithDeleted
Display information with previously deleted data. Federations that are deleted within 24 hours of being created will be removed from the database. Federations that were created more than 24 hours prior to the deletion request are just marked as deleted and will be viewable with the WithDeleted flag.
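The effect of the three Clusters forms on a federation's member list can be modeled as set operations. This is an illustrative Python sketch, not Slurm code: the function name apply_clusters and the cluster names are hypothetical, and it assumes that a plain Clusters= replaces the whole list.

```python
def apply_clusters(members, spec):
    """Apply a 'Clusters=', 'Clusters+=' or 'Clusters-=' specification to a member set."""
    key, _, value = spec.partition("=")
    names = {name for name in value.split(",") if name}
    if key.endswith("+"):
        return members | names      # Clusters+= adds clusters
    if key.endswith("-"):
        return members - names      # Clusters-= removes clusters
    return names                    # plain '=' replaces the list; a blank value empties it

fed = {"alpha", "beta"}             # made-up cluster names
fed = apply_clusters(fed, "Clusters+=gamma")
fed = apply_clusters(fed, "Clusters-=alpha")
print(sorted(fed))                                   # ['beta', 'gamma']
print(sorted(apply_clusters(fed, "Clusters=")))      # []
```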

 

LIST/SHOW FEDERATION FORMAT OPTIONS

Fields you can display when viewing Federation records by using the format= option. The default format is:
Federation,Cluster,Features,FedState

Cluster
Name of the cluster that is a member of the federation.

Features
The list of features on the cluster.

Federation
The name of the federation.

FedState
The state of the cluster in the federation.

FedStateRaw
Numeric value of the FedState.

Index
The index of the cluster in the federation.

 

SPECIFICATIONS FOR INSTANCES

Information about cloud node instances is sent to slurmdbd to be stored. These are options you can specify to filter for specific instances.

Clusters=<cluster_name>[,<cluster_name>,...]
Name of the cluster that the instance ran on. Default is the cluster where the command was run.

End=<OPT>
Period ending of instances. Default is now.

Valid time formats are:
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]

Extra=<OPT>
Arbitrary string associated with node during life of the instance.

InstanceId=<OPT>
Cloud instance ID.

InstanceType=<OPT>
Cloud instance type.

Nodes=<node_name>[,<node_name>,...]
The node on which the instance ran.

Start=<OPT>
Period start of instances. Default is 00:00:00 of previous day.

Valid time formats are:
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]

 

LIST/SHOW INSTANCE FORMAT OPTIONS

Fields you can display when viewing Instance records by using the format= option. The default format is:
Cluster,NodeName,Start,End,InstanceID,InstanceType,Extra

Cluster
Name of the cluster that the instance ran on.

End
Time when instance ended.

Extra
Arbitrary string associated with node during life of the instance.

InstanceId
Cloud instance ID.

InstanceType
Cloud instance type.

NodeName
The node on which the instance ran.

Start
Time when instance started.

 

SPECIFICATIONS FOR JOB

Job information is automatically sent to slurmdbd to be stored. These are options you can specify to filter for specific jobs. There are also some attributes you can modify for a job record.

AdminComment=<admin_comment>
Arbitrary descriptive string. Can only be modified by a Slurm administrator. To clear a previously set value, use the modify command with a new value of '' (two single quotes with nothing between them).

Comment=<comment>
The job's comment string when the AccountingStoreFlags parameter in the slurm.conf file contains 'job_comment'. The user can only modify the comment string of their own job. To clear a previously set value, use the modify command with a new value of '' (two single quotes with nothing between them).

Cluster=<cluster_list>
List of clusters to alter jobs on, defaults to local cluster.

DerivedExitCode=<derived_exit_code>
The derived exit code can be modified after a job completes based on the user's judgment of whether the job succeeded or failed. The user can only modify the derived exit code of their own job.

EndTime
Jobs must end before this time to be modified. The format is YYYY-MM-DDTHH:MM:SS, unless changed through the SLURM_TIME_FORMAT environment variable.

Extra=<extra>
The job's extra string when the AccountingStoreFlags parameter in the slurm.conf file contains 'job_extra'. The user can only modify the extra string of their own job. To clear a previously set value, use the modify command with a new value of '' (two single quotes with nothing between them).

JobID=<jobid_list>
The id of the job to change. Not needed if altering multiple jobs using wckey specification.

NewWCKey=<new_wckey>
Use to rename a wckey on job(s) in the accounting database.

StartTime
Jobs must start at or after this time to be modified. The format is the same as for EndTime.

SystemComment=<system_comment>
Arbitrary descriptive string, usually managed by the BurstBufferPlugin. Can only be modified by a Slurm administrator. To clear a previously set value, use the modify command with a new value of '' (two single quotes with nothing between them).

User=<user_list>
Used to specify the users whose jobs are to be altered.

WCKey=<wckey_list>
Used to specify the wckeys to alter.

The AdminComment, Comment, DerivedExitCode, Extra, SystemComment, and WCKey fields are the only fields of a job record in the database that can be modified after job completion.
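As an illustration of modifying a completed job's record, the Python sketch below assembles a modify command from the fields listed above. It only builds the argument list and does not run it. The helper name modify_job_argv and the job id 1234 are hypothetical, and the where/set layout follows the general modify syntax of sacctmgr.

```python
MODIFIABLE = {"AdminComment", "Comment", "DerivedExitCode", "Extra",
              "SystemComment", "NewWCKey"}

def modify_job_argv(job_id, **fields):
    """Build 'sacctmgr modify job where JobID=<id> set <field>=<value> ...' (not executed)."""
    unknown = set(fields) - MODIFIABLE
    if unknown:
        raise ValueError(f"not modifiable after completion: {sorted(unknown)}")
    argv = ["sacctmgr", "--immediate", "modify", "job", "where", f"JobID={job_id}", "set"]
    argv += [f"{name}={value}" for name, value in fields.items()]
    return argv

# An empty value corresponds to typing Extra='' in a shell: the shell strips
# the two quotes, so the command receives the bare argument "Extra=".
argv = modify_job_argv(1234, Comment="rerun after node failure", Extra="")
print(argv)
```

Rejecting unknown field names up front mirrors the rule that only these fields of a job record can be changed.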

 

LIST/SHOW JOB FORMAT OPTIONS

The sacct command is the exclusive command to display job records from the Slurm database.

 

SPECIFICATIONS FOR QOS

A QOS can be created, modified, and deleted with sacctmgr. These options allow you to set the corresponding attributes or filter on them when querying for a QOS.

NOTE: The group limits (GrpJobs, GrpTRES, etc.) are tested when a job is being considered for being allocated resources. If starting a job would cause any of its group limits to be exceeded, that job will not be considered for scheduling even if that job might preempt other jobs which would release sufficient group resources for the pending job to be initiated.

Description
An arbitrary string describing a QOS. Can only be modified by a Slurm administrator.

Flags
Used by the slurmctld to override or enforce certain characteristics. To clear a previously set value use the modify command with a new value of -1.
Valid options are
DenyOnLimit
If set, jobs using this QOS will be rejected at submission time if they do not conform to the QOS 'Max' or 'Min' limits as stand-alone jobs. Jobs that exceed these limits when other jobs are considered, but conform to the limits when considered individually will not be rejected. Instead they will pend until resources are available. Group limits (e.g. GrpTRES) will also be treated like 'Max' limits (e.g. MaxTRESPerNode) and jobs will be denied if they would violate the limit as stand-alone jobs. This currently only applies to QOS and Association limits.

EnforceUsageThreshold
If set, and the QOS also has a UsageThreshold, any jobs submitted with this QOS that fall below the UsageThreshold will be held until their Fairshare Usage goes above the Threshold.

NoDecay
If set, this QOS will not have its GrpTRESMins, GrpWall and UsageRaw decayed by the slurm.conf PriorityDecayHalfLife or PriorityUsageResetPeriod settings. This allows a QOS to provide aggregate limits that, once consumed, will not be replenished automatically. Such a QOS will act as a time-limited quota of resources for an association that has access to it. Account/user usage will still be decayed for associations using the QOS. The QOS GrpTRESMins and GrpWall limits can be increased or the QOS RawUsage value reset to 0 (zero) to again allow jobs submitted with this QOS to be queued (if DenyOnLimit is set) or run (pending with QOSGrp{TRES}MinutesLimit or QOSGrpWallLimit reasons, where {TRES} is some type of trackable resource).

NoReserve
If this flag is set and backfill scheduling is used, jobs using this QOS will not reserve resources in the backfill schedule's map of resources allocated through time. This flag is intended for use with a QOS that may be preempted by jobs associated with all other QOS (e.g. use with a "standby" QOS). If this flag is used with a QOS which cannot be preempted by all other QOS, it could result in starvation of larger jobs.

OverPartQOS
If set, jobs using this QOS will be able to override any limits set by the requested partition's QOS.

PartitionMaxNodes
If set, jobs using this QOS will be able to override the requested partition's MaxNodes limit.

PartitionMinNodes
If set, jobs using this QOS will be able to override the requested partition's MinNodes limit.

PartitionTimeLimit
If set, jobs using this QOS will be able to override the requested partition's TimeLimit.

Relative
If set, the QOS limits will be treated as percentages of a cluster/partition instead of absolutes. Since the QOS limits will be based on the size of a cluster or partition, some limitations are inherent. Namely, if this flag is set, only one partition can have this QOS as its partition QOS, and if this is used as a partition QOS, jobs will not be allowed to use it as a normal QOS.

RequiresReservation
If set, jobs using this QOS must designate a reservation when submitting a job. This option can be useful in restricting usage of a QOS that may have greater preemptive capability or additional resources to be allowed only within a reservation.

UsageFactorSafe
If set, and AccountingStorageEnforce includes Safe, jobs will only be able to run if the job can run to completion with the UsageFactor applied.

GraceTime
Preemption grace time, in seconds, to be extended to a job which has been selected for preemption. The default value is zero, meaning no preemption grace time is allowed on this QOS. This value is only meaningful for QOS PreemptMode=CANCEL and PreemptMode=REQUEUE.

GrpJobs
Maximum number of running jobs in aggregate for this QOS. To clear a previously set value use the modify command with a new value of -1.

GrpJobsAccrue
Maximum number of pending jobs in aggregate able to accrue age priority for this QOS. This limit only applies to the job's QOS and not the partition's QOS. To clear a previously set value use the modify command with a new value of -1.

GrpSubmit
GrpSubmitJobs
Maximum number of jobs which can be in a pending or running state at any time in aggregate for this QOS. To clear a previously set value use the modify command with a new value of -1.

GrpTRES
Maximum number of TRES running jobs are able to be allocated in aggregate for this QOS. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

TRES can be one of the Slurm defaults (i.e. cpu, mem, node, etc...), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres.
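A TRES limit is written as a comma-separated list of name=value pairs, with -1 clearing a single TRES. The Python sketch below formats such an option and places it in a modify command without running it. The helper names and the QOS name "normal" are hypothetical.

```python
def tres_option(name, limits):
    """Format a TRES limit option such as 'GrpTRES=cpu=64,node=-1'."""
    return f"{name}=" + ",".join(f"{tres}={value}" for tres, value in limits.items())

def modify_qos_argv(qos, *options):
    """Build 'sacctmgr modify qos where Name=<qos> set <options>' (not executed)."""
    return ["sacctmgr", "modify", "qos", "where", f"Name={qos}", "set", *options]

# Set a CPU limit and clear a previously set node limit in one command.
option = tres_option("GrpTRES", {"cpu": 64, "node": -1})
print(modify_qos_argv("normal", option))
```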

GrpTRESMins
The total number of TRES minutes that can possibly be used by past, present and future jobs running from this QOS. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

TRES can be one of the Slurm defaults (i.e. cpu, mem, node, etc...), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres.

NOTE: This limit only applies when using the Priority Multifactor plugin. The time is decayed using the value of PriorityDecayHalfLife or PriorityUsageResetPeriod as set in the slurm.conf. When this limit is reached all associated jobs running will be killed and all future jobs submitted with this QOS will be delayed until they are able to run inside the limit.

GrpTRESRunMins
Used to limit the combined total number of TRES minutes used by all jobs running with this QOS. This takes into consideration the time limit of running jobs and consumes it. If the limit is reached, no new jobs are started until other jobs finish and free up time. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

TRES can be one of the Slurm defaults (i.e. cpu, mem, node, etc...), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres.
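One plausible reading of the GrpTRESRunMins check, for the cpu TRES only, is sketched below in Python. It is an assumption and not Slurm's implementation: each running job is taken to hold its CPU count multiplied by the minutes remaining on its time limit, and a new job is charged its full time limit. All names and numbers are hypothetical.

```python
def run_minutes(jobs):
    """Sum the CPU-minutes still committed to running jobs.

    jobs: iterable of (cpus, time_limit_minutes, elapsed_minutes).
    """
    return sum(cpus * (limit - elapsed) for cpus, limit, elapsed in jobs)

def can_start(jobs, new_cpus, new_limit, grp_cpu_run_mins):
    """True if starting the new job keeps the committed total within the limit."""
    return run_minutes(jobs) + new_cpus * new_limit <= grp_cpu_run_mins

running = [(4, 60, 30), (8, 120, 20)]
print(run_minutes(running))              # 4*30 + 8*100 = 920
print(can_start(running, 2, 30, 1000))   # 920 + 60 = 980, within 1000
print(can_start(running, 2, 60, 1000))   # 920 + 120 = 1040, over 1000
```

Under this reading the committed total shrinks as jobs run or finish, which is what eventually lets a waiting job start.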

GrpWall
Maximum wall clock time running jobs are able to be allocated in aggregate for this QOS. If this limit is reached submission requests will be denied and the running jobs will be killed. To clear a previously set value use the modify command with a new value of -1.

NOTE: This limit only applies when using the Priority Multifactor plugin. The time is decayed using the value of PriorityDecayHalfLife or PriorityUsageResetPeriod as set in the slurm.conf. When this limit is reached all associated jobs running will be killed and all future jobs submitted with this QOS will be delayed until they are able to run inside the limit.

LimitFactor
A float that is factored into an association's [Grp|Max]TRES limits. For example, if the LimitFactor is 2, then an association with a GrpTRES of 30 CPUs would be allowed to allocate 60 CPUs when running under this QOS. To clear a previously set value use the modify command with a new value of -1. NOTE: This factor is only applied to associations running in this QOS and is not applied to any limits in the QOS itself.
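The LimitFactor arithmetic from the example above, as a minimal Python sketch. The function name effective_limit is hypothetical, and how Slurm rounds a fractional result is not stated here, so the sketch does no rounding.

```python
def effective_limit(association_limit, limit_factor=None):
    """Scale an association's TRES limit by the QOS LimitFactor, if one is set."""
    if limit_factor is None:          # no LimitFactor set on the QOS
        return association_limit
    return association_limit * limit_factor

print(effective_limit(30, 2))   # 60, matching the GrpTRES example above
print(effective_limit(30))      # 30, limit unchanged without a LimitFactor
```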

MaxJobsAccruePA
MaxJobsAccruePerAccount
Maximum number of pending jobs an account (or subacct) can have accruing age priority at any given time. This limit only applies to the job's QOS and not the partition's QOS. To clear a previously set value use the modify command with a new value of -1.

MaxJobsAccruePU
MaxJobsAccruePerUser
Maximum number of pending jobs a user can have accruing age priority at any given time. This limit only applies to the job's QOS and not the partition's QOS. To clear a previously set value use the modify command with a new value of -1.

MaxJobsPA
MaxJobsPerAccount
Maximum number of jobs each account is allowed to run at one time. To clear a previously set value use the modify command with a new value of -1.

MaxJobsPU
MaxJobsPerUser
Maximum number of jobs each user is allowed to run at one time. To clear a previously set value use the modify command with a new value of -1.

MaxSubmitJobsPA
MaxSubmitJobsPerAccount
Maximum number of jobs in a pending or running state at any time per account. To clear a previously set value use the modify command with a new value of -1.

MaxSubmitJobsPU
MaxSubmitJobsPerUser
Maximum number of jobs in a pending or running state at any time per user. To clear a previously set value use the modify command with a new value of -1.

MaxTRES
MaxTRESPerJob
Maximum number of TRES each job is able to use.

TRES can be one of the Slurm defaults (i.e. cpu, mem, node, etc...), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

MaxTRESMins
MaxTRESMinsPerJob
Maximum number of TRES minutes each job is able to use.

TRES can be one of the Slurm defaults (i.e. cpu, mem, node, etc...), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

MaxTRESPA
MaxTRESPerAccount
Maximum number of TRES each account is able to use.

TRES can be one of the Slurm defaults (i.e. cpu, mem, node, etc...), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

MaxTRESPerNode
Maximum number of TRES each node in a job allocation can use.

TRES can be one of the Slurm defaults (i.e. cpu, mem, node, etc...), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

MaxTRESPU
MaxTRESPerUser
Maximum number of TRES each user is able to use.

TRES can be one of the Slurm defaults (i.e. cpu, mem, node, etc...), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

MaxWall
MaxWallDurationPerJob
Maximum wall clock time each job is able to use. MaxWall format is <min> or <min>:<sec> or <hr>:<min>:<sec> or <days>-<hr>:<min>:<sec> or <days>-<hr>. The value is recorded in minutes with rounding as needed. To clear a previously set value use the modify command with a new value of -1.
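The MaxWall formats listed above can be told apart by the presence of a dash and the number of colon-separated parts. The Python sketch below converts each listed form to seconds. It is an illustration rather than Slurm's parser: the function name wall_seconds is hypothetical, it does not handle the special value -1, and it leaves out the rounding to minutes because the rounding direction is not stated.

```python
def wall_seconds(spec):
    """Convert a MaxWall string in one of the listed formats to seconds."""
    days = 0
    if "-" in spec:
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if len(parts) == 1:                 # <days>-<hr>
            hours, minutes, seconds = parts[0], 0, 0
        elif len(parts) == 3:               # <days>-<hr>:<min>:<sec>
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unsupported format: {spec!r}")
    else:
        parts = [int(p) for p in spec.split(":")]
        if len(parts) == 1:                 # <min>
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:               # <min>:<sec>
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:               # <hr>:<min>:<sec>
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unsupported format: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

for spec in ("90", "90:30", "1:30:00", "2-12", "1-00:30:00"):
    print(spec, wall_seconds(spec))
```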

MinPrioThreshold
Minimum priority required to reserve resources when scheduling. To clear a previously set value use the modify command with a new value of -1.

MinTRES
MinTRESPerJob
Minimum number of TRES each job running under this QOS must request. Otherwise the job will pend until modified.

TRES can be one of the Slurm defaults (i.e. cpu, mem, node, etc...), or any defined generic resource. You can see the list of available resources by running sacctmgr show tres. To clear a previously set value use the modify command with a new value of -1 for each TRES id.

Name
Name of the QOS. Needed for creation.

Preempt
Other QOSs this QOS can preempt. To clear a previously set value, use the modify command with a new value of '' (two single quotes with nothing between them).

NOTE: The Priority of a QOS is NOT related to QOS preemption, only Preempt is used to define which QOS can preempt others.

PreemptExemptTime
Specifies a minimum run time for jobs of this QOS before they are considered for preemption. This QOS option takes precedence over the global PreemptExemptTime. This is only honored for PreemptMode=REQUEUE and PreemptMode=CANCEL.
Setting to -1 disables the option, allowing another QOS or the global option to take effect. Setting to 0 indicates no minimum run time and supersedes the lower priority QOS (see OverPartQOS) and/or the global option in slurm.conf.

PreemptMode
Mechanism used to preempt jobs or enable gang scheduling for this QOS when the cluster PreemptType is set to preempt/qos. This QOS-specific PreemptMode will override the cluster-wide PreemptMode for this QOS. Unsetting the QOS specific PreemptMode, by specifying "OFF", "" or "Cluster", makes it use the default cluster-wide PreemptMode.
The GANG option is used to enable gang scheduling independent of whether preemption is enabled (i.e. independent of the PreemptType setting). It can be specified in addition to a PreemptMode setting with the two options comma separated (e.g. PreemptMode=SUSPEND,GANG).
See <preempt> and <gang_scheduling> for more details.

NOTE: For performance reasons, the backfill scheduler reserves whole nodes for jobs, not partial nodes. If during backfill scheduling a job preempts one or more other jobs, the whole nodes for those preempted jobs are reserved for the preemptor job, even if the preemptor job requested fewer resources than that. These reserved nodes aren't available to other jobs during that backfill cycle, even if the other jobs could fit on the nodes. Therefore, jobs may preempt more resources during a single backfill iteration than they requested.
NOTE: For a heterogeneous job to be considered for preemption, all components must be eligible for preemption. When a heterogeneous job is to be preempted, the first identified component of the job with the highest order PreemptMode (SUSPEND (highest), REQUEUE, CANCEL (lowest)) will be used to set the PreemptMode for all components. The GraceTime and user warning signal for each component of the heterogeneous job remain unique. Heterogeneous jobs are excluded from GANG scheduling operations.

OFF
Is the default value and disables job preemption and gang scheduling. It is only compatible with PreemptType=preempt/none at a global level.

CANCEL
The preempted job will be cancelled.

GANG
Enables gang scheduling (time slicing) of jobs in the same partition, and allows the resuming of suspended jobs. Configure the OverSubscribe setting to FORCE for all partitions in which time-slicing is to take place. Gang scheduling is performed independently for each partition, so if you only want time-slicing by OverSubscribe, without any preemption, then configuring partitions with overlapping nodes is not recommended. Time-slicing won't happen between jobs on different partitions.

NOTE: Heterogeneous jobs are excluded from GANG scheduling operations.

REQUEUE
Preempts jobs by requeuing them (if possible) or canceling them. For jobs to be requeued they must have the --requeue sbatch option set or the cluster wide JobRequeue parameter in slurm.conf must be set to 1.

SUSPEND
The preempted jobs will be suspended, and later the Gang scheduler will resume them. Therefore the SUSPEND preemption mode always needs the GANG option to be specified at the cluster level. Also, because the suspended jobs will still use memory on the allocated nodes, Slurm needs to be able to track memory resources to be able to suspend jobs.
If PreemptType=preempt/qos is configured and if the preempted job(s) and the preemptor job are on the same partition, then they will share resources with the Gang scheduler (time-slicing). If not (i.e. if the preemptees and preemptor are on different partitions) then the preempted jobs will remain suspended until the preemptor ends.

NOTE: Suspended jobs will not release GRES. Higher priority jobs will not be able to preempt to gain access to GRES.

WITHIN
Allows for preemption between jobs sharing the same qos. By default, PreemptType=preempt/qos will only consider jobs to be eligible for preemption if they do not share the same qos value.
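The rule for heterogeneous jobs given in the PreemptMode notes (the first component with the highest order mode sets the mode for all components) can be sketched as follows. This is an illustrative Python model, not Slurm code, and the function name hetjob_preempt_mode is hypothetical.

```python
# Ordering stated in the note: SUSPEND (highest), REQUEUE, CANCEL (lowest).
ORDER = {"CANCEL": 0, "REQUEUE": 1, "SUSPEND": 2}

def hetjob_preempt_mode(component_modes):
    """Return the PreemptMode applied to every component of a heterogeneous job."""
    # max() returns the first element that reaches the highest order.
    return max(component_modes, key=ORDER.__getitem__)

print(hetjob_preempt_mode(["CANCEL", "SUSPEND", "REQUEUE"]))  # SUSPEND
print(hetjob_preempt_mode(["CANCEL", "REQUEUE"]))             # REQUEUE
```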

Priority
What priority will be added to a job's priority when using this QOS.

NOTE: The Priority of a QOS is NOT related to QOS preemption, see Preempt instead.

RawUsage=<value>
This allows an administrator to set the raw usage accrued to a QOS. Specifying a value of 0 (zero) will reset the raw usage. This is a settable specification only - it cannot be used as a filter to list accounts.

UsageFactor
A float that is factored into a job's TRES usage (e.g. RawUsage, TRESMins, TRESRunMins). For example, if the usagefactor was 2, for every TRESBillingUnit second a job ran it would count for 2. If the usagefactor was .5, every second would only count for half of the time. A setting of 0 would add no timed usage from the job.

The usage factor only applies to the job's QOS and not the partition QOS.

If the UsageFactorSafe flag is set and AccountingStorageEnforce includes Safe, jobs will only be started if they can run to completion with the UsageFactor applied, and won't be killed due to limits.

If the UsageFactorSafe flag is not set and AccountingStorageEnforce includes Safe, jobs will be started if they can run to completion without the UsageFactor applied, and won't be killed due to limits.

If the UsageFactorSafe flag is not set and AccountingStorageEnforce does not include Safe, jobs will be scheduled as long as the limits are not reached, but could be killed due to limits.

See AccountingStorageEnforce in slurm.conf man page.

Default is 1. To clear a previously set value use the modify command with a new value of -1.
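The UsageFactor arithmetic described above amounts to a single multiplication. The Python sketch below is illustrative only; the function name charged_usage and the job sizes are hypothetical.

```python
def charged_usage(billing_units, run_seconds, usage_factor=1.0):
    """Usage recorded for a job: TRES billing units x seconds x UsageFactor."""
    return billing_units * run_seconds * usage_factor

# A job holding 4 billing units for 600 seconds:
print(charged_usage(4, 600))        # 2400.0 with the default factor of 1
print(charged_usage(4, 600, 2))     # 4800, every second counts double
print(charged_usage(4, 600, 0.5))   # 1200.0, every second counts for half
print(charged_usage(4, 600, 0))     # 0, no timed usage is added
```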

UsageThreshold
A float representing the lowest fairshare of an association allowable to run a job. If an association falls below this threshold and has pending jobs or submits new jobs those jobs will be held until the usage goes back above the threshold. Use sshare to see current shares on the system. To clear a previously set value use the modify command with a new value of -1.

 

LIST/SHOW QOS FORMAT OPTIONS

Fields you can display when viewing QOS records by using the format= option.

Description
An arbitrary string describing a QOS.

Flags
Used by the slurmctld to override or enforce certain characteristics.

GraceTime
Preemption grace time to be extended to a job which has been selected for preemption in the format of hh:mm:ss.

GrpJobs
Maximum number of running jobs in aggregate for this QOS.

GrpJobsAccrue
Maximum number of pending jobs in aggregate able to accrue age priority for this QOS. This limit only applies to the job's QOS and not the partition's QOS.

GrpSubmit
GrpSubmitJobs
Maximum number of jobs which can be in a pending or running state at any time in aggregate for this QOS.

GrpTRES
Maximum number of TRES running jobs are able to be allocated in aggregate for this QOS.

GrpTRESMins
The total number of TRES minutes that can possibly be used by past, present and future jobs running from this QOS.

GrpTRESRunMins
Used to limit the combined total number of TRES minutes used by all jobs currently running with this QOS.

GrpWall
Maximum wall clock time running jobs are able to be allocated in aggregate for this QOS.

ID
The id of the QOS.

LimitFactor
A float that is factored into an association's [Grp|Max]TRES limits.

MaxJobsAccruePA
MaxJobsAccruePerAccount
Maximum number of jobs an account (or subacct) can have accruing age priority at any given time. This limit only applies to the job's QOS and not the partition's QOS.

MaxJobsAccruePU
MaxJobsAccruePerUser
Maximum number of jobs a user can have accruing age priority at any given time. This limit only applies to the job's QOS and not the partition's QOS.

MaxJobsPA
MaxJobsPerAccount
Maximum number of jobs each account is allowed to run at one time.

MaxJobsPU
MaxJobsPerUser
Maximum number of jobs each user is allowed to run at one time.

MaxTRESMins
MaxTRESMinsPerJob
Maximum number of TRES minutes each job is able to use.

MaxTRESPA
MaxTRESPerAccount
Maximum number of TRES each account is able to use.

MaxTRES
MaxTRESPerJob
Maximum number of TRES each job is able to use.

MaxTRESPerNode
Maximum number of TRES each node in a job allocation can use.

MaxTRESPU
MaxTRESPerUser
Maximum number of TRES each user is able to use.

MaxSubmitJobsPA
MaxSubmitJobsPerAccount
Maximum number of jobs in a pending or running state at any time per account.

MaxSubmitJobsPU
MaxSubmitJobsPerUser
Maximum number of jobs in a pending or running state at any time per user.

MaxWall
MaxWallDurationPerJob
Maximum wall clock time each job is able to use. <max wall> format is <min> or <min>:<sec> or <hr>:<min>:<sec> or <days>-<hr>:<min>:<sec> or <days>-<hr>.

MinPrioThreshold
Minimum priority required to reserve resources when scheduling.

MinTRES
Minimum number of TRES each job running under this QOS must request. Otherwise the job will pend until modified.

Name
Name of the QOS.

Preempt
Other QOSs this QOS can preempt.

PreemptExemptTime
Specifies a minimum run time for jobs of this QOS before they are considered for preemption.

PreemptMode
Mechanism used to preempt jobs of this QOS if the cluster's PreemptType is configured to preempt/qos. The default preemption mechanism is specified by the cluster-wide PreemptMode configuration parameter.

Priority
What priority will be added to a job's priority when using this QOS.

UsageFactor
A float that is factored into a job's TRES usage (e.g. RawUsage, TRESMins, TRESRunMins).

UsageThreshold
A float representing the lowest fairshare of an association allowable to run a job.

WithDeleted
Display information with previously deleted data. A QOS that is deleted within 24 hours of being created and did not have a job run in the QOS during that time will be removed from the database. Otherwise, the QOS will be marked as deleted and will be viewable with the WithDeleted flag.
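The deletion rule described for WithDeleted can be written as a small decision function. This Python sketch is an illustration, not Slurm code: the function name qos_delete_outcome is hypothetical, and treating exactly 24 hours as outside the window is an assumption.

```python
from datetime import datetime, timedelta

def qos_delete_outcome(created, deleted, had_jobs):
    """Return what happens to a QOS record when it is deleted."""
    if deleted - created < timedelta(hours=24) and not had_jobs:
        return "removed"            # dropped from the database entirely
    return "marked deleted"         # still viewable with WithDeleted

created = datetime(2024, 3, 1, 9, 0)
print(qos_delete_outcome(created, datetime(2024, 3, 1, 15, 0), had_jobs=False))  # removed
print(qos_delete_outcome(created, datetime(2024, 3, 1, 15, 0), had_jobs=True))   # marked deleted
print(qos_delete_outcome(created, datetime(2024, 3, 3, 9, 0), had_jobs=False))   # marked deleted
```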

 

SPECIFICATIONS FOR RESERVATIONS

Reservations are created with the scontrol command and information about the reservations is sent to slurmdbd to be stored. These are options you can specify to filter for specific reservations.

Clusters=<cluster_name>[,<cluster_name>,...]
List the reservations of the cluster(s). Default is the cluster where the command was run.

End=<OPT>
Period ending of reservations. Default is now.

Valid time formats are:
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]

ID=<OPT>
Comma separated list of reservation ids.

Names=<OPT>
Comma separated list of reservation names.

Nodes=<node_name>[,<node_name>,...]
Node names where reservation ran.

Start=<OPT>
Period start of reservations. Default is 00:00:00 of previous day.

Valid time formats are:
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]

 

LIST/SHOW RESERVATION FORMAT OPTIONS

Fields you can display when viewing Reservation records by using the format= option. The default format is:
Cluster,Name,TRES,Start,End,UnusedWall

Associations
The IDs of the associations able to run in the reservation.

Cluster
Name of cluster reservation was on.

End
End time of reservation.

Flags
Flags on the reservation.

ID
Reservation ID.

Name
Name of this reservation.

NodeNames
List of nodes in the reservation.

Start
Start time of reservation.

TRES
List of TRES in the reservation.

UnusedWall
Wall clock time in seconds unused by any job. A job's allocated usage is its run time multiplied by the ratio of its CPUs to the total number of CPUs in the reservation. For example, a job using all the CPUs in the reservation running for 1 minute would reduce unused_wall by 1 minute.
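The UnusedWall calculation described above can be sketched in Python as follows. The function name unused_wall and the reservation sizes are hypothetical; only the formula (run time multiplied by the job's share of the reservation's CPUs) comes from the description.

```python
def unused_wall(reservation_seconds, total_cpus, jobs):
    """Wall clock seconds of a reservation not used by any job.

    jobs: iterable of (cpus, run_seconds) pairs.
    """
    used = sum(run_seconds * cpus / total_cpus for cpus, run_seconds in jobs)
    return reservation_seconds - used

# A one-hour reservation of 16 CPUs:
print(unused_wall(3600, 16, [(16, 60)]))  # 3540.0, a full-size job for 1 minute uses 1 minute
print(unused_wall(3600, 16, [(8, 60)]))   # 3570.0, a half-size job for 1 minute uses 30 seconds
```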

 

SPECIFICATIONS FOR RESOURCE

Resources can be created, modified, and deleted with sacctmgr. These options allow you to set the corresponding attributes or filter on them when querying for Resources.

LastConsumed=<OPT>
Number of software resources of a specific name consumed out of Count on the system being controlled by a resource manager.

Clusters=<name_list>
Comma separated list of cluster names on which specified resources are to be available. If no names are designated then the clusters already allowed to use this resource will be altered.

Count=<OPT>
Number of software resources of a specific name configured on the system being controlled by a resource manager.

Descriptions=
A brief description of the resource.

Flags[-|+]=<OPT>
Flags that identify specific attributes of the system resource.
Valid options are
Absolute
If set the resource will treat the counts for Allowed and Allocated as absolute counts instead of percentages.

NOTE: If removing this with flags-=absolute, there is no effort to convert the numbers in the database back to percentages. The user must do this manually.

Names=<OPT>
Comma separated list of the name of a resource configured on the system being controlled by a resource manager. If this resource is seen on the slurmctld its name will be name@server to distinguish it from local resources defined in a slurm.conf.

Allowed=<allowed>
Percentage/Count of a specific resource that can be used on specified cluster.

Server=<OPT>
The name of the server serving up the resource. Default is 'slurmdb' indicating the licenses are being served by the database.

ServerType=<OPT>
The type of the software resource manager providing the licenses. For example, the FlexNet Publisher (FlexLM) license server or Reprise License Manager (RLM).

Type=<OPT>
The type of the resource represented by this record. Currently the only valid type is License.

WithClusters
Display each cluster's percentage/count of resources. If a resource hasn't been given to a cluster, the resource will not be displayed with this flag.

WithDeleted
Display information with previously deleted data. Resources that are deleted within 24 hours of being created will be removed from the database. Resources that were created more than 24 hours prior to the deletion request are just marked as deleted and will be viewable with the WithDeleted flag.

NOTE: Resource is used to define each resource configured on a system available for usage by Slurm clusters.
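
As a rough illustration of how the Absolute flag changes the meaning of Allowed, the following Python sketch converts an Allowed value into a license count for one cluster. The function name and the rounding down of fractional licenses are assumptions made for this example; they are not behavior documented here.

```python
def allowed_license_count(count: int, allowed: int, absolute: bool) -> int:
    """Return how many licenses a cluster may use.

    count    -- total licenses configured for the resource (Count=)
    allowed  -- the Allowed= value given to the cluster
    absolute -- True if the resource has the Absolute flag set
    """
    if absolute:
        # With the Absolute flag, Allowed is already a license count.
        return allowed
    # Without the flag, Allowed is a percentage of Count.
    # Rounding down is an assumption of this sketch.
    return count * allowed // 100


print(allowed_license_count(100, 25, absolute=False))
print(allowed_license_count(100, 30, absolute=True))
```

The same stored number therefore means different things before and after the flag is toggled, which is why the values must be converted by hand when the flag is removed.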

 

LIST/SHOW RESOURCE FORMAT OPTIONS

Fields you can display when viewing Resource records by using the format= option. The default format is:
Name,Server,Type,Count,LastConsumed,Allocated,ServerType,Flags

Allocated
The percent/count of licenses allocated to a cluster.

LastConsumed
The count of a specific resource consumed out of Count on the system globally.

Cluster
Name of the cluster the resource is given to.

Count
The count of a specific resource configured on the system globally.

Description
Description of the resource.

Name
Name of this resource.

Server
Server serving up the resource.

ServerType
The type of the server controlling the licenses.

Type
Type of resource this record represents.

 

LIST/SHOW RUNAWAYJOB FORMAT OPTIONS

Under certain circumstances, jobs can complete without having that completion recorded by slurmdbd. This results in a "runaway job", where slurmdbd is not going to record a completion time for that job without intervention. This command allows you to identify jobs that are in this state and have slurmdbd clean up the job record.

Cluster
Name of cluster job ran on.

ID
ID of the job.

Name
Name of the job.

Partition
Partition job ran on.

State
Current State of the job in the database.

TimeEnd
Current recorded time of the end of the job.

TimeStart
Time job started running.

 

SPECIFICATIONS FOR TRANSACTIONS

Information about changes to clusters, resources, accounts, associations, etc., is recorded as transactions by slurmdbd. The following options filter for specific transactions.

Accounts=<account_name>[,<account_name>,...]
Only print out the transactions affecting specified accounts.

Action=<Specific_action_the_list_will_display>
Only display transactions of the specified action type.

Actor=<Specific_name_the_list_will_display>
Only display transactions done by a certain person.

Clusters=<cluster_name>[,<cluster_name>,...]
Only print out the transactions affecting specified clusters.

End=<Date_and_time_of_last_transaction_to_return>
Return all transactions before this date and time. Default is now.

Start=<Date_and_time_of_first_transaction_to_return>
Return all transactions after this date and time. Default is the epoch.

Valid time formats for End and Start are:
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]
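
To make the relative form in the last line of that list concrete, here is a minimal Python sketch that resolves a now[{+|-}count[unit]] string against a reference time. The function name is hypothetical, only the spelled-out unit names from the list are accepted, and this is not the parser Slurm itself uses.

```python
import re
from datetime import datetime, timedelta

# Units named in the list above; seconds is the default when none is given.
_NOW_RE = re.compile(r"^now(?:([+-])(\d+)(seconds|minutes|hours|days|weeks)?)?$")


def resolve_relative_time(spec: str, now: datetime) -> datetime:
    """Resolve 'now', 'now-2hours', 'now+30', ... relative to `now`."""
    match = _NOW_RE.match(spec.strip().lower())
    if match is None:
        raise ValueError(f"not a relative time specification: {spec!r}")
    sign, count, unit = match.groups()
    if sign is None:
        return now  # plain "now"
    delta = timedelta(**{unit or "seconds": int(count)})
    return now - delta if sign == "-" else now + delta


reference = datetime(2024, 3, 26, 12, 0, 0)
print(resolve_relative_time("now-2hours", reference))
print(resolve_relative_time("now+30", reference))
```

Passing the reference time explicitly keeps the sketch deterministic; a real caller would pass datetime.now().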

Users=<user_name>[,<user_name>,...]
Only print out the transactions affecting specified users.

WithAssoc
Get information about which associations were affected by the transactions.

 

LIST/SHOW TRANSACTIONS FORMAT OPTIONS

Fields you can display when viewing Transaction records by using the format= option. The default format is:
Time,Action,Actor,Where,Info

Action
Displays the type of Action that took place.

Actor
Displays the actor who generated the transaction.

Info
Displays details of the transaction.

TimeStamp
Displays when the transaction occurred.

Where
Displays details of the constraints for the transaction.

NOTE: If using the WithAssoc option you can also view the information about the various associations the transaction affected. The Association format fields are described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.

 

SPECIFICATIONS FOR USERS

Users can be created, modified, and deleted with sacctmgr. These options allow you to set the corresponding attributes or filter on them when querying for Users.

It is important to recognize the difference between a User and an Association. There is a User entity that exists for each unique username. However, there can be multiple User Associations for the same User. The combination of a Cluster, Account, User, and optionally a Partition constitutes a User Association. When adding an existing User to another Account, you are creating an additional User Association rather than modifying an existing User.
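
The distinction can be pictured with a small data model. The Python sketch below is purely illustrative: the class name and the use of a set are assumptions of this example and say nothing about how slurmdbd stores its records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserAssociation:
    """One (cluster, account, user[, partition]) combination."""
    cluster: str
    account: str
    user: str
    partition: Optional[str] = None


associations = set()
# Adding adam to a second account creates a second association ...
associations.add(UserAssociation("tux", "physics", "adam"))
associations.add(UserAssociation("tux", "chemistry", "adam"))

# ... but there is still only one user entity named adam.
users = {assoc.user for assoc in associations}
print(len(users), len(associations))
```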

Account=<account>
Account name to add this user to.

AdminLevel=<level>
Admin level of user. Valid levels are None, Operator, and Admin.

Cluster=<cluster>
Specific cluster on which to add the user to the account. Default is all clusters in the system.

DefaultAccount=<account>
Identify the default bank account name to be used for a job if none is specified at submission time.

DefaultWCKey=<defaultwckey>
Identify the default Workload Characterization Key.

Name=<name>
Name of user.

NewName=<newname>
Use to rename a user in the accounting database.

Partition=<name>
Partition name.

RawUsage=<value>
This allows an administrator to reset the raw usage accrued to a user. The only value currently supported is 0 (zero). This is a settable specification only - it cannot be used as a filter to list users.

WCKeys=<wckeys>
Workload Characterization Key values.

WithAssoc
Display all associations for this user.

WithCoord
Display all accounts a user is coordinator for.

WithDeleted
Display information with previously deleted data. Users that are deleted within 24 hours of being created and did not have a job run by the user during that time will be removed from the database. Otherwise, the user will be marked as deleted and will be viewable with the WithDeleted flag.

NOTE: If using the WithAssoc option you can also query against association specific information to view only certain associations this user may have. These extra options can be found in the SPECIFICATIONS FOR ASSOCIATIONS section. You can also use the general specifications list above in the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES section.

 

LIST/SHOW USER FORMAT OPTIONS

Fields you can display when viewing User records by using the format= option. The default format is:
User,DefaultAccount,DefaultWCKey,AdminLevel

AdminLevel
Admin level of user.

Coordinators
List of accounts the user is a coordinator of. (Only filled in when using the WithCoord option.)

DefaultAccount
The user's default account.

DefaultWCKey
The user's default wckey.

User
The name of a user.

NOTE: If using the WithAssoc option you can also view the information about the various associations the user may have on all the clusters in the system. The association information can be filtered. Note that all the users in the database will always be shown, because the filter only applies to the association data. The Association format fields are described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.

 

LIST/SHOW WCKey

Fields you can display when viewing WCKey records by using the format= option. The default format is:
WCKey,Cluster,User

Cluster
Specific cluster for the WCKey.

ID
The ID of the WCKey.

User
The name of a user for the WCKey.

WCKey
Workload Characterization Key.

WithDeleted
Display information with previously deleted data. WCKeys that are deleted within 24 hours of being created and did not have a job run with the WCKey during that time will be removed from the database. Otherwise, the WCKey will be marked as deleted and will be viewable with the WithDeleted flag.

 

LIST/SHOW TRES

Fields you can display when viewing TRES records by using the format= option. The default format is:
Type,Name,ID

ID
The identification number of the trackable resource as it appears in the database.

Name
The name of the trackable resource. This option is required for TRES types BB (Burst buffer), GRES, and License. Types CPU, Energy, Memory, and Node do not have Names. For example, if GRES is the type, then the name is the denomination of the GRES itself, e.g. GPU.

Type
The type of the trackable resource. Current types are BB (Burst buffer), CPU, Energy, GRES, License, Memory, and Node.

 

TRES information

Trackable RESources (TRES) are used in many QOS or Association limits. When setting the limits, give them as a comma separated list. Each TRES has its own limit, e.g. GrpTRESMins=cpu=10,mem=20 would create two different limits: one for 10 CPU minutes and one for 20 MB memory minutes. This is the case for each limit that deals with TRES. To remove a limit, use -1, e.g. GrpTRESMins=cpu=-1 would remove only the cpu TRES limit.

NOTE: When dealing with Memory as a TRES all limits are in MB.

NOTE: The Billing TRES is calculated from a partition's TRESBillingWeights. It is temporarily calculated during scheduling for each partition to enforce billing TRES limits. The final Billing TRES is calculated after the job has been allocated resources. The final number can be seen in scontrol show jobs and sacct output.
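
The structure of a TRES limit string can be sketched in a few lines of Python. The function name is hypothetical, and representing a removed limit as None is a choice made for this example only.

```python
def parse_tres_limits(spec: str) -> dict:
    """Split a string such as 'cpu=10,mem=20' into per-TRES limits.

    A value of -1 means "remove this limit" and is returned as None.
    """
    limits = {}
    for item in spec.split(","):
        name, _, value = item.partition("=")
        if not name or not value:
            raise ValueError(f"expected name=value, got {item!r}")
        number = int(value)
        limits[name.strip().lower()] = None if number == -1 else number
    return limits


print(parse_tres_limits("cpu=10,mem=20"))
print(parse_tres_limits("cpu=-1"))
```

Each key in the result is an independent limit, which mirrors the point that cpu=-1 clears only the cpu entry and leaves any mem limit untouched.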

 

GLOBAL FORMAT OPTION

When using the format option for listing various fields you can put a %NUMBER afterwards to specify how many characters should be printed.

e.g. format=name%30 will print 30 characters of field name right justified. A negative number, as in format=name%-30, will print 30 characters left justified.
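
The effect of the width suffix can be imitated with ordinary string padding. In the Python sketch below the function name is hypothetical, and cutting longer values off at the width without any marker is an assumption of the example rather than a description of sacctmgr's exact output.

```python
def render_field(value: str, spec: str) -> str:
    """Pad or cut `value` according to a spec such as 'name%30' or 'name%-30'."""
    _name, sep, width_text = spec.partition("%")
    if not sep:
        return value  # no %NUMBER given, leave the value alone
    width = int(width_text)
    size = abs(width)
    clipped = value[:size]
    # A negative width left justifies, a positive width right justifies.
    return clipped.ljust(size) if width < 0 else clipped.rjust(size)


print("[" + render_field("adam", "name%10") + "]")
print("[" + render_field("adam", "name%-10") + "]")
```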

 

FLAT FILE DUMP AND LOAD

sacctmgr has the capability to dump Slurm association data to a file and to load it from a file. This method can easily add a new cluster or copy an existing cluster's associations into a new cluster with similar accounts. Each file contains Slurm association data for a single cluster. Be aware that QOS information is not currently included in the information that can be dumped to a file. QOS information can be retrieved and loaded using the REST API or it must be transferred to a new cluster manually.

Comments can be put into the file with the # character. Each line of information must begin with one of the four titles: Cluster, Parent, Account or User. Following the title is a space, dash, space, entity value, then specifications. Specifications are colon separated. If any variable, such as an Organization name, has a space in it, surround the name with single or double quotes.

sacctmgr dump/load must be run as a Slurm administrator or root. If using sacctmgr load on a database without any associations, it must be run as root (because there aren't any users in the database yet).
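
The line layout described above (title, space, dash, space, entity value, then colon separated specifications) can be sketched as a small Python parser. All function names are hypothetical. The sketch only recognizes comments that occupy a whole line and does not handle unquoted values that contain a colon; both are simplifications of this example.

```python
TITLES = ("Cluster", "Parent", "Account", "User")


def split_outside_quotes(text: str, sep: str = ":") -> list:
    """Split on `sep`, ignoring separators inside single or double quotes."""
    parts, current, quote = [], [], None
    for char in text:
        if quote:
            if char == quote:
                quote = None  # closing quote, drop it
            else:
                current.append(char)
        elif char in "'\"":
            quote = char  # opening quote, drop it
        elif char == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts


def parse_flat_file_line(line: str):
    """Return (title, entity, specs) or None for blank and comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    title, sep, rest = line.partition(" - ")
    if not sep or title not in TITLES:
        raise ValueError(f"line must start with one of {TITLES}: {line!r}")
    fields = split_outside_quotes(rest)
    specs = {}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        specs[key] = value
    return title, fields[0], specs


print(parse_flat_file_line(
    "Account - cs:MaxJobs=4:Description='Computer Science':Organization='LC'"
))
```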

 

dump

Dump cluster associations from the database into a file. If no file is given then one will be generated, using the cluster name for the file name. That file will be created in the current working directory.

To create a file with the association information you can run:

sacctmgr dump tux file=tux.cfg

Cluster=
Specify the cluster to dump the information for.

File=
Specify a file to save flat file data to. If no filename is specified, clustername.cfg is used by default.

 

load

Load cluster associations into the database. The imported associations will be reconciled with existing ones.

To load a previously created file you can run:

sacctmgr load file=tux.cfg

clean
Delete what was already there and start from scratch with this information.

Cluster=
Specify a different name for the cluster than that which is in the file.

File=
Specify a flat file to load from.

 

SPECIFICATIONS FOR FLAT FILE

Since the associations in the system follow a hierarchy, so does the file. Anything that is a parent needs to be defined before any children. The only exception is the understood 'root' account. This is always a default for any cluster and does not need to be defined.

To edit/create a file start with a cluster line for the new cluster:

Cluster - cluster_name:MaxTRESPerJob=node=15

Anything included on this line will be the default for all associations on this cluster. The options for the cluster are:

FairShare=
Number used in conjunction with other associations to determine job priority.

GrpJobs=
Maximum number of running jobs in aggregate for this association and all associations which are children of this association.

GrpJobsAccrue=
Maximum number of pending jobs in aggregate able to accrue age priority for this association and all associations which are children of this association.

GrpNodes=
This option has been deprecated in favor of the more versatile TRES. Equivalent limit definition is now GrpTRES=node=#.

GrpSubmitJobs=
Maximum number of jobs which can be in a pending or running state at any time in aggregate for this association and all associations which are children of this association.

GrpTRES=
Maximum number of TRES running jobs are able to be allocated in aggregate for this association and all associations which are children of this association.

GrpTRESMins=
The total number of TRES minutes that can possibly be used by past, present and future jobs running from this association and its children.

GrpTRESRunMins=
Used to limit the combined total number of TRES minutes used by all jobs running with this association and its children. This takes into consideration the time limit of running jobs and consumes it. If the limit is reached, no new jobs are started until other jobs finish to allow time to free up.

GrpWall=
Maximum wall clock time running jobs are able to be allocated in aggregate for this association and all associations which are children of this association.

MaxJobs=
Maximum number of jobs the children of this association can run.

MaxTRESPerJob=
Maximum number of trackable resources per job the children of this association can run.

MaxWallDurationPerJob=
Maximum time (not related to job size) that jobs belonging to children of this account can run.

QOS=
Comma separated list of Quality of Service names (Defined in sacctmgr).

After the entry for the root account you will have entries for the other accounts on the system. The entries will look similar to this example:

Parent - root
Account - cs:MaxTRESPerJob=node=5:MaxJobs=4:FairShare=399:MaxWallDurationPerJob=40:Description='Computer Science':Organization='LC'
Parent - cs
Account - test:MaxTRESPerJob=node=1:MaxJobs=1:FairShare=1:MaxWallDurationPerJob=1:Description='Test Account':Organization='Test'

Any of the options after a ':' can be left out and they can be in any order. If you want to add any sub accounts just list the Parent THAT HAS ALREADY BEEN CREATED before the account you are adding.
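
The ordering rule (a Parent must already exist, with root always understood) lends itself to a quick pre-flight check before running sacctmgr load. The Python sketch below is a hypothetical helper, not part of Slurm, and it assumes entity names are not quoted.

```python
def find_undefined_parents(lines):
    """Return (line_number, name) for every Parent used before it was defined."""
    defined = {"root"}  # the root account never needs to be defined
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        title, _, rest = line.partition(" - ")
        name = rest.split(":", 1)[0].strip()
        if title == "Parent" and name not in defined:
            problems.append((number, name))
        elif title == "Account":
            defined.add(name)
    return problems


good = [
    "Parent - root",
    "Account - cs:MaxJobs=4",
    "Parent - cs",
    "Account - test:MaxJobs=1",
]
bad = ["Parent - cs", "Account - test:MaxJobs=1"]
print(find_undefined_parents(good))
print(find_undefined_parents(bad))
```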

Account options are:

Description=
A brief description of the account.

FairShare=
Number used in conjunction with other associations to determine job priority.

GrpTRESMins=
The total number of TRES minutes that can possibly be used by past, present and future jobs running from this association and its children.

GrpTRESRunMins=
Used to limit the combined total number of TRES minutes used by all jobs running with this association and its children. This takes into consideration the time limit of running jobs and consumes it. If the limit is reached, no new jobs are started until other jobs finish to allow time to free up.

GrpTRES=
Maximum number of TRES running jobs are able to be allocated in aggregate for this association and all associations which are children of this association.

GrpJobs=
Maximum number of running jobs in aggregate for this association and all associations which are children of this association.

GrpJobsAccrue=
Maximum number of pending jobs in aggregate able to accrue age priority for this association and all associations which are children of this association.

GrpNodes=
This option has been deprecated in favor of the more versatile TRES. Equivalent limit definition is now GrpTRES=node=#.

GrpSubmitJobs=
Maximum number of jobs which can be in a pending or running state at any time in aggregate for this association and all associations which are children of this association.

GrpWall=
Maximum wall clock time running jobs are able to be allocated in aggregate for this association and all associations which are children of this association.

MaxJobs=
Maximum number of jobs the children of this association can run.

MaxNodesPerJob=
Maximum number of nodes per job the children of this association can run.

MaxWallDurationPerJob=
Maximum time (not related to job size) that jobs belonging to children of this account can run.

Organization=
Name of organization that owns this account.

QOS(=,+=,-=)
Comma separated list of Quality of Service names (Defined in sacctmgr).

To add users to an account add a line after the Parent line, similar to this:

Parent - test
User - adam:MaxTRESPerJob=node=2:MaxJobs=3:FairShare=1:MaxWallDurationPerJob=1:AdminLevel=Operator:Coordinator='test'

User options are:

AdminLevel=
Type of admin this user is (Administrator, Operator).
Must be defined on the first occurrence of the user.

Coordinator=
Comma separated list of accounts this user is coordinator over.
Must be defined on the first occurrence of the user.

DefaultAccount=
System wide default account name.
Must be defined on the first occurrence of the user.

FairShare=
Number used in conjunction with other associations to determine job priority.

MaxJobs=
Maximum number of jobs this user can run.

MaxTRESPerJob=
Maximum number of trackable resources per job this user can run.

MaxWallDurationPerJob=
Maximum time (not related to job size) this user can run.

QOS(=,+=,-=)
Comma separated list of Quality of Service names (Defined in sacctmgr).

 

ARCHIVE FUNCTIONALITY

sacctmgr has the capability to archive accounting data to a flat file and/or load that data later if needed. Archiving is usually done by slurmdbd, and it is highly recommended that you only do it through sacctmgr if you completely understand what you are doing. For slurmdbd options see "man slurmdbd". Data can be loaded into the database from these files to either view old data or regenerate rolled up data. For information about configuring an archive server see <https://slurm.schedmd.com/accounting.html#archive>.

 

archive dump

Dump accounting data to file. Data will not be archived unless the corresponding purge option is included in this command or in slurmdbd.conf. This operation cannot be rolled back once executed. If one of the following options is not specified when sacctmgr is called, the value configured in slurmdbd.conf is used.

Directory=
Directory to store the archive data.

Events
Archive Events. If not specified and PurgeEventAfter is set all event data removed will be lost permanently.

Jobs
Archive Jobs. If not specified and PurgeJobAfter is set all job data removed will be lost permanently.

PurgeEventAfter=
Purge cluster event records older than the stated time, which is given in months by default. To purge on a shorter time period, append hours or days to the numeric value (e.g. a value of '12hours' would purge everything older than 12 hours).

PurgeJobAfter=
Purge job records older than the stated time, which is given in months by default. To purge on a shorter time period, append hours or days to the numeric value (e.g. a value of '12hours' would purge everything older than 12 hours).

PurgeStepAfter=
Purge step records older than the stated time, which is given in months by default. To purge on a shorter time period, append hours or days to the numeric value (e.g. a value of '12hours' would purge everything older than 12 hours).

PurgeSuspendAfter=
Purge job suspend records older than the stated time, which is given in months by default. To purge on a shorter time period, append hours or days to the numeric value (e.g. a value of '12hours' would purge everything older than 12 hours).

Script=
Run this script instead of the generic form of archive to flat files.

Steps
Archive Steps. If not specified and PurgeStepAfter is set all step data removed will be lost permanently.

Suspend
Archive Suspend Data. If not specified and PurgeSuspendAfter is set all suspend data removed will be lost permanently.
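
The value syntax shared by the Purge*After options (a number in months, optionally followed by hours or days) can be sketched as follows. The function name is hypothetical, and accepting singular unit names or an explicit months suffix is an assumption of this example.

```python
import re

_PURGE_RE = re.compile(r"^(\d+)\s*(hours?|days?|months?)?$", re.IGNORECASE)


def parse_purge_after(value: str):
    """Return (number, unit) for values such as '12hours', '30days' or '6'."""
    match = _PURGE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized purge period: {value!r}")
    number = int(match.group(1))
    # A bare number is interpreted as months.
    unit = (match.group(2) or "months").lower()
    if not unit.endswith("s"):
        unit += "s"
    return number, unit


print(parse_purge_after("12hours"))
print(parse_purge_after("6"))
```

Normalizing to an explicit unit before building a command line makes it harder to purge twelve months of data when twelve hours was intended, or the reverse.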

 

archive load

Load previously archived data into the database. The archive file will not be loaded if the records already exist in the database. Therefore, trying to load an archive file more than once will result in an error. When this data is again archived and purged from the database, if the old archive file is still in the directory ArchiveDir, a new archive file will be created (see ArchiveDir in the slurmdbd.conf man page), so the old file will not be overwritten and these files will have duplicate records.

Archive files from the current or any prior Slurm release may be loaded through archive load.

File=
File to load into database. The specified file must exist on the slurmdbd host, which is not necessarily the machine running the command.

Insert=
SQL to insert directly into the database. This should be used very cautiously since it writes your SQL directly into the database.

 

PERFORMANCE

Executing sacctmgr sends a remote procedure call to slurmdbd. If enough calls from sacctmgr or other Slurm client commands that send remote procedure calls to the slurmdbd daemon come in at once, it can result in a degradation of performance of the slurmdbd daemon, possibly resulting in a denial of service.

Do not run sacctmgr or other Slurm client commands that send remote procedure calls to slurmdbd from loops in shell scripts or other programs. Ensure that programs limit calls to sacctmgr to the minimum necessary for the information you are trying to gather.
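
One way to follow this advice is to fetch everything in a single call and do the filtering locally. The Python sketch below shows the idea under stated assumptions: the helper names are hypothetical, the field list is only an example, and the sample text in the last lines is made up for illustration. It relies on the -n and -P options so that the output has no header and is '|' delimited.

```python
import subprocess


def parse_parsable2(output: str, fields):
    """Turn '|' delimited lines (as printed with -P) into dictionaries."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        rows.append(dict(zip(fields, line.split("|"))))
    return rows


def list_all_users():
    """Query every user with ONE sacctmgr call instead of one call per user."""
    fields = ["User", "DefaultAccount", "AdminLevel"]
    command = ["sacctmgr", "-n", "-P", "list", "user",
               "format=" + ",".join(fields)]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_parsable2(result.stdout, fields)


# Parsing works on any captured text, so it can be tried without a cluster.
sample = "adam|physics|None\nbrian|chemistry|Operator\n"
for row in parse_parsable2(sample, ["User", "DefaultAccount", "AdminLevel"]):
    print(row["User"], row["DefaultAccount"])
```

Looking up individual users in the returned list costs nothing on the slurmdbd side, whereas a shell loop over user names would issue one remote procedure call per iteration.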

 

ENVIRONMENT VARIABLES

Some sacctmgr options may be set via environment variables. These environment variables, along with their corresponding options, are listed below. (Note: Command line options will always override these settings.)

SLURM_CONF
The location of the Slurm configuration file.

SLURM_DEBUG_FLAGS
Specify debug flags for sacctmgr to use. See DebugFlags in the slurm.conf(5) man page for a full list of flags. The environment variable takes precedence over the setting in the slurm.conf.

 

EXAMPLES

NOTE: There is an order to set up accounting associations. You must define clusters before you add accounts and you must add accounts before you can add users.

$ sacctmgr create cluster tux
$ sacctmgr create account name=science fairshare=50
$ sacctmgr create account name=chemistry parent=science fairshare=30
$ sacctmgr create account name=physics parent=science fairshare=20
$ sacctmgr create user name=adam cluster=tux account=physics fairshare=10
$ sacctmgr delete user name=adam cluster=tux account=physics
$ sacctmgr delete account name=physics cluster=tux
$ sacctmgr modify user where name=adam cluster=tux account=physics set maxjobs=2 maxwall=30:00
$ sacctmgr add user brian account=chemistry
$ sacctmgr list associations cluster=tux format=Account,Cluster,User,Fairshare tree withd
$ sacctmgr list transactions Action="Add Users" Start=11/03-10:30:00 format=Where,Time
$ sacctmgr dump cluster=tux file=tux_data_file
$ sacctmgr load tux_data_file

A user's account cannot be changed directly. A new association needs to be created for the user with the new account. Then the association with the old account can be deleted.

When modifying an object, the placement of the key word 'set' and the optional key word 'where' is critical. The examples below show how to produce correct results. As a rule of thumb, anything you put in front of 'set' will be used as a qualifier. If you want to put a qualifier after the key word 'set', you should use the key word 'where'. The following is wrong:

$ sacctmgr modify user name=adam set fairshare=10 cluster=tux

This will produce an error, as the above line reads: modify user adam, set fairshare=10 and cluster=tux. Either of the following is correct:

$ sacctmgr modify user name=adam cluster=tux set fairshare=10
$ sacctmgr modify user name=adam set fairshare=10 where cluster=tux
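
When commands are generated by a script, keeping qualifiers and new values in separate containers avoids the mistake shown above. The following Python sketch uses a hypothetical helper name and always emits the explicit 'where ... set ...' form; it only builds the argument list and does not run anything.

```python
def build_modify_command(entity: str, where: dict, changes: dict) -> list:
    """Build 'sacctmgr modify <entity> where <qualifiers> set <changes>'."""
    if not changes:
        raise ValueError("at least one value to set is required")
    command = ["sacctmgr", "modify", entity]
    if where:
        # Qualifiers select WHICH records are modified.
        command.append("where")
        command.extend(f"{key}={value}" for key, value in where.items())
    # Everything after 'set' is a new value, never a qualifier.
    command.append("set")
    command.extend(f"{key}={value}" for key, value in changes.items())
    return command


print(" ".join(build_modify_command(
    "user", {"name": "adam", "cluster": "tux"}, {"fairshare": 10}
)))
```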

When changing the qos of an entity, only use the '=' operator when you want to explicitly set the qos to exactly the given value. In most cases you will want to use the '+=' or '-=' operator to either add to or remove from the qos already in place.

If a user already has the qos normal,standby, whether inherited from a parent or explicitly set, you should use qos+=expedite to add expedite to that list.

If you want to add the qos expedite only for a certain account and/or cluster, you can do that by specifying them on the sacctmgr line.

$ sacctmgr modify user name=adam set qos+=expedite

or

$ sacctmgr modify user name=adam acct=this cluster=tux set qos+=expedite

The following example shows how to add a QOS to accounts. First, list all available QOSs in the cluster.

$ sacctmgr show qos format=name
       Name
  ---------
     normal
   expedite

List all the associations in the cluster.

$ sacctmgr show assoc format=cluster,account,qos
   Cluster     Account                  QOS
  --------  ---------- --------------------
     zebra        root               normal
     zebra        root               normal
     zebra           g               normal
     zebra          g1               normal

Add the QOS expedite to account G1 and display the result. With the += operator, the QOS is added to the QOS already set on this account.

$ sacctmgr modify account name=g1 set qos+=expedite
$ sacctmgr show assoc format=cluster,account,qos
   Cluster     Account                  QOS
  --------  ---------- --------------------
     zebra        root               normal
     zebra        root               normal
     zebra           g               normal
     zebra          g1      expedite,normal

Now set the QOS expedite as the only QOS for the account G and display the result. With the = operator, expedite becomes the only QOS usable by account G.

$ sacctmgr modify account name=G set qos=expedite
$ sacctmgr show assoc format=cluster,account,qos
   Cluster     Account                  QOS
  --------  ---------- --------------------
     zebra        root               normal
     zebra        root               normal
     zebra           g             expedite
     zebra          g1      expedite,normal

If a new account is added under the account G it will inherit the QOS expedite and it will not have access to QOS normal.

$ sacctmgr add account banana parent=G
$ sacctmgr show assoc format=cluster,account,qos
   Cluster     Account                  QOS
  --------  ---------- --------------------
     zebra        root               normal
     zebra        root               normal
     zebra           g             expedite
     zebra      banana             expedite
     zebra          g1      expedite,normal
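
The three operators behave like ordinary set operations, which the following Python sketch models. It is an illustration of the semantics shown in the listings above, with a hypothetical function name, and does not talk to slurmdbd.

```python
def apply_qos(current, operator: str, names):
    """Model qos=, qos+= and qos-= as operations on a set of QOS names."""
    current, names = set(current), set(names)
    if operator == "=":
        return names            # replace the list outright
    if operator == "+=":
        return current | names  # add to what is already there
    if operator == "-=":
        return current - names  # remove from what is already there
    raise ValueError(f"unknown operator: {operator!r}")


g1 = apply_qos({"normal"}, "+=", {"expedite"})
g = apply_qos({"normal"}, "=", {"expedite"})
print(sorted(g1), sorted(g))
```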

An example of listing trackable resources:

$ sacctmgr show tres
      Type              Name      ID
---------- ----------------- --------
       cpu                          1
       mem                          2
    energy                          3
      node                          4
   billing                          5
      gres         gpu:tesla     1001
   license               vcs     1002
        bb              cray     1003

 

COPYING

Copyright (C) 2008-2010 Lawrence Livermore National Security. Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
Copyright (C) 2010-2022 SchedMD LLC.

This file is part of Slurm, a resource management program. For details, see <https://slurm.schedmd.com/>.

Slurm is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

 

SEE ALSO

slurm.conf(5), slurmdbd(8)


 

This document was created by man2html using the manual pages.
Time: 00:27:51 GMT, March 26, 2024