NAME
sacctmgr - Used to view and modify Slurm account information.
SYNOPSIS
sacctmgr [OPTIONS...] [COMMAND...]
DESCRIPTION
sacctmgr is used to view or modify Slurm account information. The account information is
maintained within a database with the interface being provided by slurmdbd (Slurm Database
daemon). This database can serve as a central storehouse of user and computer information
for multiple computers at a single site. Slurm account information is recorded based upon
four parameters that form what is referred to as an association. These parameters are
user, cluster, partition, and account. user is the login name. cluster is the name of a
Slurm managed cluster as specified by the ClusterName parameter in the slurm.conf
configuration file. partition is the name of a Slurm partition on that cluster. account
is the bank account for a job. The intended mode of operation is to initiate the sacctmgr
command, add, delete, modify, and/or list association records then commit the changes and
exit.
Note: The contents of Slurm's database are maintained in lower case. This may result in
some sacctmgr output differing from that of other Slurm commands.
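As a rough illustration of the association key described above, the following Python sketch models the four parameters as a tuple and lower-cases each name, mirroring the note that the database keeps its contents in lower case. The `Association` class and the `make_association` helper are hypothetical names used only for this example; they are not part of Slurm, and the sample names are made up.

```python
from typing import NamedTuple, Optional


class Association(NamedTuple):
    """Hypothetical model of an association key (not a Slurm API)."""
    user: str
    cluster: str
    account: str
    partition: Optional[str] = None  # may be omitted


def make_association(user, cluster, account, partition=None):
    """Build a key with every name lower-cased, as the database stores it."""
    return Association(
        user=user.lower(),
        cluster=cluster.lower(),
        account=account.lower(),
        partition=partition.lower() if partition is not None else None,
    )


if __name__ == "__main__":
    # Mixed-case input collapses to the lower-case form.
    print(make_association("Alice", "Tux", "Physics"))
```

Normalizing names this way before comparing them with sacctmgr output may avoid mismatches caused purely by letter case.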
OPTIONS
-h, --help
Print a help message describing the usage of sacctmgr. This is equivalent to the
help command.
-i, --immediate
Commit changes immediately.
-n, --noheader
No header will be added to the beginning of the output.
-p, --parsable
Output will be '|' delimited with a '|' at the end.
-P, --parsable2
Output will be '|' delimited without a '|' at the end.
-Q, --quiet
Print no messages other than error messages. This is equivalent to the quiet
command.
-r, --readonly
Makes it so the running sacctmgr cannot modify accounting information. The
readonly option is for use within interactive mode.
-s, --associations
Use with show or list to display associations with the entity. This is equivalent
to the associations command.
-v, --verbose
Enable detailed logging. This is equivalent to the verbose command.
-V, --version
Display version number. This is equivalent to the version command.
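The '|' delimited output of --parsable2 (and of --parsable, which adds a trailing '|') is convenient for scripts. The sketch below shows one way such output might be turned into dictionaries in Python. The function names and the sample text are assumptions made for illustration; the sketch does not handle field values that themselves contain a '|'.

```python
def split_row(line, trailing_delimiter=False):
    """Split one output line on '|'.

    Set trailing_delimiter=True for --parsable output, where each line
    ends with an extra '|'.
    """
    if trailing_delimiter and line.endswith("|"):
        line = line[:-1]
    return line.split("|")


def parse_rows(text, has_header=True, fields=None, trailing_delimiter=False):
    """Turn '|' delimited text into a list of dicts keyed by column name.

    With has_header=False (output produced with --noheader) the column
    names must be supplied through `fields`.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if has_header:
        if not lines:
            return []
        fields = split_row(lines[0], trailing_delimiter)
        lines = lines[1:]
    elif fields is None:
        raise ValueError("fields must be given when there is no header line")
    return [dict(zip(fields, split_row(line, trailing_delimiter)))
            for line in lines]


if __name__ == "__main__":
    # Made-up sample; real column names depend on the query.
    sample = "Cluster|Account|User\ntux|physics|alice\ntux|physics|\n"
    for row in parse_rows(sample):
        print(row)
```

Empty columns come back as empty strings, so a row that describes an account rather than a user keeps the same set of keys.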
COMMANDS
add <ENTITY> <SPECS>
Add an entity. Identical to the create command.
associations
Use with show or list to display associations with the entity.
create <ENTITY> <SPECS>
Add an entity. Identical to the add command.
delete <ENTITY> where <SPECS>
Delete the specified entities.
dump <ENTITY> <File=FILENAME>
Dump cluster data to the specified file. If the filename is not specified it uses
clustername.cfg filename by default.
exit Terminate sacctmgr interactive mode. Identical to the quit command.
help Display a description of sacctmgr options and commands.
list <ENTITY> [<SPECS>]
Display information about the specified entity. By default, all entries are
displayed; you can narrow the results by specifying SPECS in your query. Identical to
the show command.
load <FILENAME>
Load cluster data from the specified file. This is a configuration file generated
by running the sacctmgr dump command. This command does not load archive data; see
the sacctmgr archive load option instead.
modify <ENTITY> where <SPECS> set <SPECS>
Modify an entity.
problem
Use with show or list to display entity problems.
quiet Print no messages other than error messages.
quit Terminate the execution of sacctmgr interactive mode. Identical to the exit
command.
reconfigure
Reconfigures the SlurmDBD if running with one.
show <ENTITY> [<SPECS>]
Display information about the specified entity. By default, all entries are
displayed; you can narrow the results by specifying SPECS in your query. Identical to
the list command.
verbose
Enable detailed logging. This includes time-stamps on data structures, record
counts, etc. This is an independent command with no options meant for use in
interactive mode.
version
Display the version number of sacctmgr.
!! Repeat the last command.
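When sacctmgr is driven from a script, the "modify <ENTITY> where <SPECS> set <SPECS>" grammar above can be assembled as an argument list. The following Python sketch only builds the list and does not run anything. The helper name `build_modify` and the cluster name "tux" are assumptions for illustration; the spec names come from the sections of this page.

```python
def build_modify(entity, where, new_values, immediate=False):
    """Return an argv list for: sacctmgr modify <entity> where ... set ...

    `where` and `new_values` map spec names to values. This helper is a
    hypothetical convenience and is not part of Slurm.
    """
    argv = ["sacctmgr"]
    if immediate:
        argv.append("--immediate")  # commit changes immediately
    argv += ["modify", entity, "where"]
    argv += [f"{name}={value}" for name, value in where.items()]
    argv.append("set")
    argv += [f"{name}={value}" for name, value in new_values.items()]
    return argv


if __name__ == "__main__":
    argv = build_modify(
        "account",
        {"Name": "physics", "Cluster": "tux"},
        {"MaxJobs": 10},
        immediate=True,
    )
    print(" ".join(argv))
```

Passing such a list to `subprocess.run(argv, check=True)` instead of a single shell string avoids quoting problems with values that contain spaces.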
ENTITIES
account
A bank account, typically specified at job submit time using the --account= option.
These may be arranged in a hierarchical fashion, for example accounts chemistry and
physics may be children of the account science. The hierarchy may have an
arbitrary depth.
association
The entity used to group information consisting of four parameters: account,
cluster, partition (optional), and user. Used only with the list or show command.
Add, modify, and delete should be done to a user, account or cluster entity. This
will in-turn update the underlying associations.
cluster
The ClusterName parameter in the slurm.conf configuration file, used to
differentiate accounts on different machines.
configuration
Used only with the list or show command to report current system configuration.
coordinator
A specially privileged user, usually an account manager, who can add users or
sub accounts to the account they are coordinator over. This should be a trusted
person since they can change limits on account and user associations inside their
realm.
event Events like downed or draining nodes on clusters.
job Job - but only two specific fields of the job: Derived Exit Code and the Comment
String
qos Quality of Service.
resource
Software resources for the system. Those are software licenses shared among
clusters.
transaction
List of transactions that have occurred during a given time period.
user The login name. Only lowercase usernames are supported.
wckeys Workload Characterization Key. An arbitrary string for grouping orthogonal
accounts.
GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES
NOTE: The group limits (GrpJobs, GrpTRES, etc.) are tested when a job is being considered
for being allocated resources. If starting a job would cause any of its group limits to be
exceeded, that job will not be considered for scheduling even if that job might preempt
other jobs which would release sufficient group resources for the pending job to be
initiated.
DefaultQOS=<default qos>
The default QOS this association and its children should have. This is overridden
if set directly on a user. To clear a previously set value use the modify command
with a new value of -1.
Fairshare=<fairshare number | parent>
Number used in conjunction with other accounts to determine job priority. Can also
be the string parent; when used on a user, this means that the parent association is
used for fairshare. If Fairshare=parent is set on an account, that account's
children will be effectively reparented for fairshare calculations to the first
parent of their parent that is not Fairshare=parent. Limits remain the same; only
its fairshare value is affected. To clear a previously set value use the modify
command with a new value of -1.
GraceTime=<preemption grace time in seconds>
Specifies, in units of seconds, the preemption grace time to be extended to a job
which has been selected for preemption. The default value is zero, meaning no
preemption grace time is allowed on this QOS.
NOTE: This value is only meaningful for QOS PreemptMode=CANCEL.
GrpTRESMins=<TRES=max TRES minutes,...>
The total number of TRES minutes that can possibly be used by past, present and
future jobs running from this association and its children. To clear a previously
set value use the modify command with a new value of -1.
NOTE: This limit is not enforced if set on the root association of a cluster. So even
though it may appear in sacctmgr output, it will not be enforced.
ALSO NOTE: This limit only applies when using the Priority Multifactor plugin. The time
is decayed using the value of PriorityDecayHalfLife or PriorityUsageResetPeriod as set in
the slurm.conf. When this limit is reached all associated jobs running will be killed and
all future jobs submitted with associations in the group will be delayed until they are
able to run inside the limit.
GrpTRESRunMins=<TRES=max TRES run minutes,...>
Used to limit the combined total number of TRES minutes used by all jobs running
with this association and its children. This takes into consideration the time limit
of running jobs and consumes it. If the limit is reached, no new jobs are started
until other jobs finish to allow time to free up.
GrpTRES=<TRES=max TRES,...>
Maximum number of TRES running jobs are able to be allocated in aggregate for this
association and all associations which are children of this association. To clear
a previously set value use the modify command with a new value of -1.
NOTE: This limit only applies fully when using the Select Consumable Resource plugin.
GrpJobs=<max jobs>
Maximum number of running jobs in aggregate for this association and all
associations which are children of this association. To clear a previously set
value use the modify command with a new value of -1.
GrpSubmitJobs=<max jobs>
Maximum number of jobs which can be in a pending or running state at any time in
aggregate for this association and all associations which are children of this
association. To clear a previously set value use the modify command with a new
value of -1.
GrpWall=<max wall>
Maximum wall clock time running jobs are able to be allocated in aggregate for this
association and all associations which are children of this association. To clear
a previously set value use the modify command with a new value of -1.
NOTE: This limit is not enforced if set on the root association of a cluster. So even
though it may appear in sacctmgr output, it will not be enforced.
ALSO NOTE: This limit only applies when using the Priority Multifactor plugin. The time
is decayed using the value of PriorityDecayHalfLife or PriorityUsageResetPeriod as set in
the slurm.conf. When this limit is reached all associated jobs running will be killed and
all future jobs submitted with associations in the group will be delayed until they are
able to run inside the limit.
MaxTRESMins=<max TRES minutes>
Maximum number of TRES minutes each job is able to use in this association. This
is overridden if set directly on a user. Default is the cluster's limit. To clear
a previously set value use the modify command with a new value of -1.
MaxTRES=<max TRES>
Maximum number of TRES each job is able to use in this association. This is
overridden if set directly on a user. Default is the cluster's limit. To clear a
previously set value use the modify command with a new value of -1.
NOTE: This limit only applies fully when using the Select Consumable Resource plugin.
MaxJobs=<max jobs>
Maximum number of jobs each user is allowed to run at one time in this association.
This is overridden if set directly on a user. Default is the cluster's limit. To
clear a previously set value use the modify command with a new value of -1.
MaxSubmitJobs=<max jobs>
Maximum number of jobs which this association can have in a pending or running
state at any time. Default is the cluster's limit. To clear a previously set
value use the modify command with a new value of -1.
MaxWall=<max wall>
Maximum wall clock time each job is able to use in this association. This is
overridden if set directly on a user. Default is the cluster's limit. <max wall>
format is <min> or <min>:<sec> or <hr>:<min>:<sec> or <days>-<hr>:<min>:<sec> or
<days>-<hr>. The value is recorded in minutes with rounding as needed. To clear a
previously set value use the modify command with a new value of -1.
NOTE: Changing this value will have no effect on any running or pending job.
QosLevel<operator><comma separated list of qos names>
Specify the default QOS's that jobs are able to run at for this
association. To get a list of valid QOS's use 'sacctmgr list qos'. This value
will override its parent's value and push down to its children as the new default.
Setting a QosLevel to '' (two single quotes with nothing between them) restores its
default setting. You can also use the operators += and -= to add or remove certain
QOS's from a QOS list.
Valid <operator> values include:
= Set QosLevel to the specified value. Note: the QOS that can be used at a given
account in the hierarchy are inherited by the children of that account. By
assigning QOS with the = sign only the assigned QOS can be used by the account
and its children.
+= Add the specified <qos> value to the current QosLevel. The account will have
access to this QOS and the other previously assigned to it.
-= Remove the specified <qos> value from the current QosLevel.
See the EXAMPLES section below.
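The effect of the three QosLevel operators can be sketched on a plain list. The Python below is only a model of the behavior described above, not Slurm code: `apply_qos_level` is a hypothetical helper, the QOS names are made up, and neither inheritance from parent accounts nor the '' reset is modeled.

```python
def apply_qos_level(current, operator, names):
    """Return the QOS list after applying QosLevel<operator><names>.

    current  -- list of QOS names currently assigned
    operator -- one of "=", "+=", "-="
    names    -- comma separated list of QOS names
    """
    requested = [name.strip() for name in names.split(",") if name.strip()]
    if operator == "=":
        # Only the assigned QOS remain usable.
        return list(dict.fromkeys(requested))
    if operator == "+=":
        # Keep what was already assigned and add the new names once.
        return list(dict.fromkeys(current + requested))
    if operator == "-=":
        # Drop the named QOS and keep the rest.
        return [qos for qos in current if qos not in requested]
    raise ValueError(f"unknown operator: {operator!r}")


if __name__ == "__main__":
    qos = ["normal"]
    qos = apply_qos_level(qos, "+=", "high,normal")
    print(qos)
    qos = apply_qos_level(qos, "-=", "normal")
    print(qos)
    print(apply_qos_level(qos, "=", "standby"))
```

The `dict.fromkeys` call removes duplicates while keeping the original order, so adding a QOS that is already present leaves the list unchanged.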
SPECIFICATIONS FOR ACCOUNTS
Cluster=<cluster>
Specific cluster to add account to. Default is all in system.
Description=<description>
An arbitrary string describing an account.
Name=<name>
The name of a bank account. Note the name must be unique and cannot represent
different bank accounts at different points in the account hierarchy.
Organization=<org>
Organization to which the account belongs.
Parent=<parent>
Parent account of this account. Default is the root account, a top level account.
RawUsage=<value>
This allows an administrator to reset the raw usage accrued to an account. The
only value currently supported is 0 (zero). This is a settable specification only
- it cannot be used as a filter to list accounts.
WithAssoc
Display all associations for this account.
WithCoord
Display all coordinators for this account.
WithDeleted
Display information with previously deleted data.
NOTE: If using the WithAssoc option you can also query against association specific
information to view only certain associations this account may have. These extra options
can be found in the SPECIFICATIONS FOR ASSOCIATIONS section. You can also use the general
specifications list above in the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES
section.
LIST/SHOW ACCOUNT FORMAT OPTIONS
Account
The name of a bank account.
Description
An arbitrary string describing an account.
Organization
Organization to which the account belongs.
Coordinators
List of users that are a coordinator of the account. (Only filled in when using the
WithCoordinator option.)
NOTE: If using the WithAssoc option you can also view the information about the various
associations the account may have on all the clusters in the system. The Association
format fields are described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
SPECIFICATIONS FOR ASSOCIATIONS
Clusters=<comma separated list of cluster names>
List the associations of the cluster(s).
Accounts=<comma separated list of account names>
List the associations of the account(s).
Users=<comma separated list of user names>
List the associations of the user(s).
Partition=<comma separated list of partition names>
List the associations of the partition(s).
NOTE: You can also use the general specifications list above in the GENERAL SPECIFICATIONS
FOR ASSOCIATION BASED ENTITIES section.
Other options unique for listing associations:
OnlyDefaults
Display only associations that are default associations.
Tree Display account names in a hierarchical fashion.
WithDeleted
Display information with previously deleted data.
WithSubAccounts
Display information with subaccounts. Only really valuable when used with the
account= option. This will display all the subaccount associations along with the
accounts listed in the option.
WOLimits
Display information without limit information. This is for a smaller default format
of Cluster,Account,User,Partition
WOPInfo
Display information without parent information. (i.e. parent id, and parent account
name.) This option also invokes WOPLIMITS.
WOPLimits
Display information without hierarchical parent limits. (i.e. will only display
limits where they are set instead of propagating them from the parent.)
LIST/SHOW ASSOCIATION FORMAT OPTIONS
Account
The name of a bank account in the association.
Cluster
The name of a cluster in the association.
DefaultQOS
The QOS the association will use by default if it has access to it in the QOS list
mentioned below.
Fairshare
Number used in conjunction with other accounts to determine job priority. Can also
be the string parent; when used on a user, this means that the parent association is
used for fairshare. If Fairshare=parent is set on an account, that account's
children will be effectively reparented for fairshare calculations to the first
parent of their parent that is not Fairshare=parent. Limits remain the same; only
its fairshare value is affected.
GrpTRESMins
The total number of TRES minutes that can possibly be used by past, present and
future jobs running from this association and its children.
GrpTRESRunMins
Used to limit the combined total number of TRES minutes used by all jobs running
with this association and its children. This takes into consideration the time limit
of running jobs and consumes it. If the limit is reached, no new jobs are started
until other jobs finish to allow time to free up.
GrpTRES
Maximum number of TRES running jobs are able to be allocated in aggregate for this
association and all associations which are children of this association.
GrpJobs
Maximum number of running jobs in aggregate for this association and all
associations which are children of this association.
GrpSubmitJobs
Maximum number of jobs which can be in a pending or running state at any time in
aggregate for this association and all associations which are children of this
association.
GrpWall
Maximum wall clock time running jobs are able to be allocated in aggregate for this
association and all associations which are children of this association.
ID The id of the association.
LFT Associations are kept in a hierarchy: this is the left most spot in the hierarchy.
When used with the RGT variable, all associations with a LFT inside this LFT and
before the RGT are children of this association.
MaxTRESMins
Maximum number of TRES minutes each job is able to use.
MaxTRES
Maximum number of TRES each job is able to use.
MaxJobs
Maximum number of jobs each user is allowed to run at one time.
MaxSubmitJobs
Maximum number of jobs in a pending or running state at any time.
MaxWall
Maximum wall clock time each job is able to use.
Qos Valid QOS's for this association.
ParentID
The association id of the parent of this association.
ParentName
The account name of the parent of this association.
Partition
The name of a partition in the association.
RawQOS The numeric values of valid QOS's for this association.
RGT Associations are kept in a hierarchy: this is the right most spot in the hierarchy.
When used with the LFT variable, all associations with a LFT inside this RGT and
after the LFT are children of this association.
User The name of a user in the association.
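The LFT and RGT fields describe the hierarchy as nested intervals: an association lies below another when its LFT falls between the other's LFT and RGT. A minimal Python sketch of that test follows. The `is_descendant` helper and the numeric values are made up for illustration; real values come from the LFT and RGT columns of a list association query.

```python
def is_descendant(candidate, ancestor):
    """True if candidate's LFT lies inside ancestor's (LFT, RGT) interval."""
    return ancestor["lft"] < candidate["lft"] < ancestor["rgt"]


if __name__ == "__main__":
    # Hypothetical numbering for root > science > {chemistry, physics}.
    root = {"name": "root", "lft": 1, "rgt": 8}
    science = {"name": "science", "lft": 2, "rgt": 7}
    chemistry = {"name": "chemistry", "lft": 3, "rgt": 4}
    physics = {"name": "physics", "lft": 5, "rgt": 6}

    print(is_descendant(chemistry, science))
    print(is_descendant(physics, root))
    print(is_descendant(science, physics))
```

Because the comparison is strict, an association is not reported as a descendant of itself.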
SPECIFICATIONS FOR CLUSTERS
Classification=<classification>
Type of machine, current classifications are capability and capacity.
Flags=<flag list>
Comma separated list of Attributes for a particular cluster. Current Flags include
AIX, BGL, BGP, BGQ, Bluegene, CrayXT, FrontEnd, MultipleSlurmd, and
SunConstellation
Name=<name>
The name of a cluster. This should be equal to the ClusterName parameter in the
slurm.conf configuration file for some Slurm-managed cluster.
RPC=<rpc list>
Comma separated list of numeric RPC values.
WOLimits
Display information without limit information. This is for a smaller default format
of Cluster,ControlHost,ControlPort,RPC
NOTE: You can also use the general specifications list above in the GENERAL SPECIFICATIONS
FOR ASSOCIATION BASED ENTITIES section.
LIST/SHOW CLUSTER FORMAT OPTIONS
Classification
Type of machine, i.e. capability or capacity.
Cluster
The name of the cluster.
ControlHost
When a slurmctld registers with the database the ip address of the controller is
placed here.
ControlPort
When a slurmctld registers with the database the port the controller is listening
on is placed here.
TRES Trackable RESources (BB (Burst buffer), CPU, Energy, GRES, License, Memory, and
Node) this cluster is accounting for.
Flags Attributes possessed by the cluster.
NodeCount
The current count of nodes associated with the cluster.
NodeNames
The current Nodes associated with the cluster.
PluginIDSelect
The numeric value of the select plugin the cluster is using.
RPC When a slurmctld registers with the database the rpc version the controller is
running is placed here.
NOTE: You can also view the information about the root association for the cluster. The
Association format fields are described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS
section.
SPECIFICATIONS FOR COORDINATOR
Account=<comma separated list of account names>
Account name to add this user as a coordinator to.
Names=<comma separated list of user names>
Names of coordinators.
NOTE: To list coordinators use the WithCoordinator options with list account or list user.
SPECIFICATIONS FOR EVENTS
All_Clusters
Shortcut to get information on all clusters.
All_Time
Shortcut to set the time period to all time.
Clusters=<comma separated list of cluster names>
List the events of the cluster(s). Default is the cluster where the command was
run.
End=<OPT>
Period ending of events. Default is now.
Valid time formats are...
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
Event=<OPT>
Specific events to look for, valid options are Cluster or Node, default is both.
MaxTRES=<OPT>
Max number of TRES affected by an event.
MinTRES=<OPT>
Min number of TRES affected by an event.
Nodes=<comma separated list of node names>
Node names affected by an event.
Reason=<comma separated list of reasons>
Reason an event happened.
Start=<OPT>
Period start of events. Default is 00:00:00 of previous day, unless states are
given with the States= spec events. If this is the case the default behavior is to
return events currently in the states specified.
Valid time formats are...
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
States=<comma separated list of states>
State of a node in a node event. If this is set, the event type is set
automatically to Node.
User=<comma separated list of users>
Query against users who set the event. If this is set, the event type is set
automatically to Node since only user slurm can perform a cluster event.
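Scripts that prepare Start= or End= values may want to check them before calling sacctmgr. The Python sketch below accepts a subset of the formats listed above, namely the ones that carry an explicit year. It is an assumption-laden helper, not a reimplementation of sacctmgr's own parser: `parse_event_time` is a hypothetical name, and time-only or year-less forms are deliberately rejected because how a missing date or year is filled in is not modeled here.

```python
from datetime import datetime

# strptime patterns for the listed forms that include a year:
#   YYYY-MM-DD[THH:MM[:SS]] and MM/DD/YY[-HH:MM[:SS]]
_PATTERNS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M",
    "%Y-%m-%d",
    "%m/%d/%y-%H:%M:%S",
    "%m/%d/%y-%H:%M",
    "%m/%d/%y",
)


def parse_event_time(text):
    """Return a datetime for `text`, or raise ValueError if unsupported."""
    for pattern in _PATTERNS:
        try:
            return datetime.strptime(text, pattern)
        except ValueError:
            continue
    raise ValueError(f"unsupported time format: {text!r}")


if __name__ == "__main__":
    print(parse_event_time("2024-01-15T08:30"))
    print(parse_event_time("01/15/24-08:30:00"))
```

A value that parses here is well formed for one of the listed patterns; whether it selects the intended events still depends on the query it is used in.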
LIST/SHOW EVENT FORMAT OPTIONS
Cluster
The name of the cluster the event happened on.
ClusterNodes
The hostlist of nodes on a cluster in a cluster event.
TRES Number of TRES involved with the event.
Duration
Time period the event was around for.
End Period when event ended.
Event Name of the event.
EventRaw
Numeric value of the name of the event.
NodeName
The node affected by the event. In a cluster event, this is blank.
Reason The reason an event happened.
Start Period when event started.
State On a node event this is the formatted state of the node during the event.
StateRaw
On a node event this is the numeric value of the state of the node during the
event.
User On a node event this is the user who caused the event to happen.
SPECIFICATIONS FOR JOB
DerivedExitCode
The derived exit code can be modified after a job completes based on the user's
judgement of whether the job succeeded or failed. The user can only modify the
derived exit code of their own job.
Comment
The job's comment string when the AccountingStoreJobComment parameter in the
slurm.conf file is set (or defaults) to YES. The user can only modify the comment
string of their own job.
The DerivedExitCode and Comment fields are the only fields
of a job record in the database that can be modified after job completion.
LIST/SHOW JOB FORMAT OPTIONS
The sacct command is the exclusive command to display job records from the Slurm database.
SPECIFICATIONS FOR QOS
NOTE: The group limits (GrpJobs, GrpNodes, etc.) are tested when a job is being considered
for being allocated resources. If starting a job would cause any of its group limits to be
exceeded, that job will not be considered for scheduling even if that job might preempt
other jobs which would release sufficient group resources for the pending job to be
initiated.
Flags Used by the slurmctld to override or enforce certain characteristics.
Valid options are
DenyOnLimit
If set jobs using this QOS will be rejected at submission time if they do
not conform to the QOS 'Max' limits. By default jobs that go over these
limits will pend until they conform.
EnforceUsageThreshold
If set, and the QOS also has a UsageThreshold, any jobs submitted with this
QOS that fall below the UsageThreshold will be held until their Fairshare
Usage goes above the Threshold.
NoReserve
If this flag is set and backfill scheduling is used, jobs using this QOS
will not reserve resources in the backfill schedule's map of resources
allocated through time. This flag is intended for use with a QOS that may be
preempted by jobs associated with all other QOS (e.g. use with a "standby"
QOS). If this flag is used with a QOS which cannot be preempted by all
other QOS, it could result in starvation of larger jobs.
PartitionMaxNodes
If set jobs using this QOS will be able to override the requested
partition's MaxNodes limit.
PartitionMinNodes
If set jobs using this QOS will be able to override the requested
partition's MinNodes limit.
OverPartQOS
If set jobs using this QOS will be able to override any limits used by the
requested partition's QOS limits.
PartitionTimeLimit
If set jobs using this QOS will be able to override the requested
partition's TimeLimit.
RequiresReservation
If set jobs using this QOS must designate a reservation when submitting a
job. This option can be useful in restricting usage of a QOS that may have
greater preemptive capability or additional resources to be allowed only
within a reservation.
GraceTime
Preemption grace time to be extended to a job which has been selected for
preemption.
GrpTRESMins
The total number of TRES minutes that can possibly be used by past, present and
future jobs running from this QOS.
GrpTRESRunMins
Used to limit the combined total number of TRES minutes used by all jobs running
with this QOS. This takes into consideration the time limit of running jobs and
consumes it. If the limit is reached, no new jobs are started until other jobs
finish to allow time to free up.
GrpTRES
Maximum number of TRES running jobs are able to be allocated in aggregate for this
QOS.
GrpJobs
Maximum number of running jobs in aggregate for this QOS.
GrpSubmitJobs
Maximum number of jobs which can be in a pending or running state at any time in
aggregate for this QOS.
GrpWall
Maximum wall clock time running jobs are able to be allocated in aggregate for this
QOS. If this limit is reached submission requests will be denied and the running
jobs will be killed.
ID The id of the QOS.
MaxTRESMins
Maximum number of TRES minutes each job is able to use.
MaxTRESPerJob
Maximum number of TRES each job is able to use.
MaxTRESPerNode
Maximum number of TRES each node in a job allocation can use.
MaxTRESPerUser
Maximum number of TRES each user is able to use.
MaxJobs
Maximum number of jobs each user is allowed to run at one time.
MinTRESPerJob
Minimum number of TRES each job running under this QOS must request. Otherwise the
job will pend until modified.
MaxSubmitJobs
Maximum number of jobs in a pending or running state at any time per user.
MaxWall
Maximum wall clock time each job is able to use.
Name Name of the QOS.
Preempt
Other QOS's this QOS can preempt.
PreemptMode
Mechanism used to preempt jobs of this QOS if the cluster's PreemptType is
configured to preempt/qos. The default preemption mechanism is specified by the
cluster-wide PreemptMode configuration parameter. Possible values are "Cluster"
(meaning use cluster default), "Cancel", "Checkpoint" and "Requeue". This option
is not compatible with PreemptMode=OFF or PreemptMode=SUSPEND (i.e. preempted jobs
must be removed from the resources).
Priority
What priority will be added to a job's priority when using this QOS.
RawUsage=<value>
This allows an administrator to reset the raw usage accrued to a QOS. The only
value currently supported is 0 (zero). This is a settable specification only - it
cannot be used as a filter to list accounts.
UsageFactor
Usage factor when running with this QOS.
UsageThreshold
A float representing the lowest fairshare of an association allowable to run a job.
If an association falls below this threshold and has pending jobs or submits new
jobs those jobs will be held until the usage goes back above the threshold. Use
sshare to see current shares on the system.
WithDeleted
Display information with previously deleted data.
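The interaction between the EnforceUsageThreshold flag and UsageThreshold described above can be written as a small predicate. The Python below is one reading of that text, not Slurm's implementation: `should_hold` is a hypothetical helper, and treating the flag as a precondition for holding jobs is an assumption drawn from the flag's description.

```python
def should_hold(fairshare, usage_threshold=None, enforce=False):
    """Return True if a job would be held under this reading of the rules.

    fairshare       -- the association's current fairshare value (float)
    usage_threshold -- the QOS UsageThreshold, or None if not set
    enforce         -- whether the QOS has the EnforceUsageThreshold flag
    """
    if not enforce or usage_threshold is None:
        # Assumption: without the flag or a threshold, nothing is held.
        return False
    return fairshare < usage_threshold


if __name__ == "__main__":
    print(should_hold(0.05, usage_threshold=0.1, enforce=True))
    print(should_hold(0.20, usage_threshold=0.1, enforce=True))
    print(should_hold(0.05, usage_threshold=0.1, enforce=False))
```

The fairshare values a site actually sees can be inspected with sshare, as noted in the UsageThreshold entry.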
LIST/SHOW QOS FORMAT OPTIONS
Description
An arbitrary string describing a QOS.
GraceTime
Preemption grace time to be extended to a job which has been selected for
preemption in the format of hh:mm:ss. The default value is zero, meaning no
preemption grace time is allowed on this partition. NOTE: This value is only meaningful for
QOS PreemptMode=CANCEL.
GrpTRESMins
The total number of TRES minutes that can possibly be used by past, present and
future jobs running from this QOS. To clear a previously set value use the modify
command with a new value of -1. NOTE: This limit only applies when using the
Priority Multifactor plugin. The time is decayed using the value of
PriorityDecayHalfLife or PriorityUsageResetPeriod as set in the slurm.conf. When
this limit is reached all associated jobs running will be killed and all future
jobs submitted with this QOS will be delayed until they are able to run inside the
limit.
GrpTRES
Maximum number of TRES running jobs are able to be allocated in aggregate for this
QOS. To clear a previously set value use the modify command with a new value of
-1.
GrpJobs
Maximum number of running jobs in aggregate for this QOS. To clear a previously
set value use the modify command with a new value of -1.
GrpSubmitJobs
Maximum number of jobs which can be in a pending or running state at any time in
aggregate for this QOS. To clear a previously set value use the modify command
with a new value of -1.
GrpWall
Maximum wall clock time running jobs are able to be allocated in aggregate for this
QOS. To clear a previously set value use the modify command with a new value of
-1. NOTE: This limit only applies when using the Priority Multifactor plugin. The
time is decayed using the value of PriorityDecayHalfLife or
PriorityUsageResetPeriod as set in the slurm.conf. When this limit is reached all
associated jobs running will be killed and all future jobs submitted with this QOS
will be delayed until they are able to run inside the limit.
MaxTRESMins
Maximum number of TRES minutes each job is able to use. To clear a previously set
value use the modify command with a new value of -1.
MaxTRESPerJob
Maximum number of TRES each job is able to use. To clear a previously set value
use the modify command with a new value of -1.
MaxTRESPerNode
Maximum number of TRES each node in a job allocation can use. To clear a
previously set value use the modify command with a new value of -1.
MaxTRESPerUser
Maximum number of TRES each user is able to use. To clear a previously set value
use the modify command with a new value of -1.
MaxJobs
Maximum number of jobs each user is allowed to run at one time. To clear a
previously set value use the modify command with a new value of -1.
MaxSubmitJobs
Maximum number of jobs in a pending or running state at any time per user. To clear a
previously set value use the modify command with a new value of -1.
MaxWall
Maximum wall clock time each job is able to use. <max wall> format is <min> or
<min>:<sec> or <hr>:<min>:<sec> or <days>-<hr>:<min>:<sec> or <days>-<hr>. The
value is recorded in minutes with rounding as needed. To clear a previously set
value use the modify command with a new value of -1.
MinTRES
Minimum number of TRES each job running under this QOS must request. Otherwise the
job will pend until modified. To clear a previously set value use the modify
command with a new value of -1.
Name Name of the QOS. Needed for creation.
Preempt
Other QOS's this QOS can preempt. Setting a Preempt to '' (two single quotes with
nothing between them) restores its default setting. You can also use the operators
+= and -= to add or remove certain QOS's from a QOS list.
PreemptMode
Mechanism used to preempt jobs of this QOS if the cluster's PreemptType is
configured to preempt/qos. The default preemption mechanism is specified by the
cluster-wide PreemptMode configuration parameter. Possible values are "Cluster"
(meaning use cluster default), "Cancel", "Checkpoint" and "Requeue". This option
is not compatible with PreemptMode=OFF or PreemptMode=SUSPEND (i.e. preempted jobs
must be removed from the resources).
Priority
What priority will be added to a job's priority when using this QOS. To clear a
previously set value use the modify command with a new value of -1.
UsageFactor
Usage factor when running with this QOS. This is a float that is factored into the
priority time calculations of running jobs. For example, if the UsageFactor of a QOS
was 2, every TRESBillingUnit second a job ran would count for 2. Likewise, if the
UsageFactor was .5, every second would count for only half of the time. Setting
this value to 0 will make it so that running jobs will not add time to fairshare or
association/qos limits. To clear a previously set value use the modify command
with a new value of -1.
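As an illustration of the MaxWall formats listed above, the following Python sketch converts a MaxWall string to whole minutes. It is not part of sacctmgr. The function name max_wall_to_minutes is hypothetical, and rounding partial minutes up is an assumption, since the text above only says the value is rounded as needed.

```python
def max_wall_to_minutes(text):
    """Convert a MaxWall string to whole minutes (hypothetical helper).

    Accepted forms, as listed for MaxWall:
      <min>, <min>:<sec>, <hr>:<min>:<sec>,
      <days>-<hr>:<min>:<sec>, <days>-<hr>
    """
    days = hours = minutes = seconds = 0
    if "-" in text:
        day_part, rest = text.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if len(parts) == 1:        # <days>-<hr>
            hours = parts[0]
        elif len(parts) == 3:      # <days>-<hr>:<min>:<sec>
            hours, minutes, seconds = parts
        else:
            raise ValueError("unsupported MaxWall value: %r" % text)
    else:
        parts = [int(p) for p in text.split(":")]
        if len(parts) == 1:        # <min>
            minutes = parts[0]
        elif len(parts) == 2:      # <min>:<sec>
            minutes, seconds = parts
        elif len(parts) == 3:      # <hr>:<min>:<sec>
            hours, minutes, seconds = parts
        else:
            raise ValueError("unsupported MaxWall value: %r" % text)
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    # Assumption: any partial minute is rounded up to a full minute.
    return (total_seconds + 59) // 60
```

Under these assumptions a value such as 30:00 is read as 30 minutes and 0 seconds, while 2-12 is read as 2 days and 12 hours.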
SPECIFICATIONS FOR RESOURCE
Clusters=<name list>
Comma separated list of cluster names on which specified resources are to be
available. If no names are designated then the clusters already allowed to use this
resource will be altered.
Count=<OPT>
Number of software resources of a specific name configured on the system being
controlled by a resource manager.
Descriptions=
A brief description of the resource.
Flags=<OPT>
Flags that identify specific attributes of the system resource. At this time no
flags have been defined.
ServerType=<OPT>
The type of a software resource manager providing the licenses. For example
FlexNet Publisher Flexlm license server or Reprise License Manager RLM.
Names=<OPT>
Comma separated list of the name of a resource configured on the system being
controlled by a resource manager. If this resource is seen on the slurmctld its
name will be name@server to distinguish it from local resources defined in a
slurm.conf.
PercentAllowed=<percent allowed>
Percentage of a specific resource that can be used on the specified cluster.
Server=<OPT>
The name of the server serving up the resource. Default is 'slurmdb' indicating
the licenses are being served by the database.
Type=<OPT>
The type of the resource represented by this record. Currently the only valid type
is License.
WithClusters
Display each cluster's percentage of resources. If a resource hasn't been given to a
cluster the resource will not be displayed with this flag.
NOTE: Resource is used to define each resource configured on a system available for usage
by Slurm clusters.
LIST/SHOW RESOURCE FORMAT OPTIONS
Cluster
Name of cluster resource is given to.
Count The count of a specific resource configured on the system globally.
Allocated
The percent of licenses allocated to a cluster.
Description
Description of the resource.
ServerType
The type of the server controlling the licenses.
Name Name of this resource.
Server Server serving up the resource.
Type Type of resource this record represents.
SPECIFICATIONS FOR TRANSACTIONS
Accounts=<comma separated list of account names>
Only print out the transactions affecting specified accounts.
Action=<Specific action the list will display>
Actor=<Specific name the list will display>
Only display transactions done by a certain person.
Clusters=<comma separated list of cluster names>
Only print out the transactions affecting specified clusters.
End=<Date and time of last transaction to return>
Return all transactions before this Date and time. Default is now.
Start=<Date and time of first transaction to return>
Return all transactions after this Date and time. Default is epoch.
Valid time formats for End and Start are...
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
Users=<comma separated list of user names>
Only print out the transactions affecting specified users.
WithAssoc
Get information about which associations were affected by the transactions.
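To make the Start and End filters above more concrete, here is a Python sketch that parses only the YYYY-MM-DD[THH:MM[:SS]] form and checks whether a timestamp falls inside the requested window. It is an illustration, not sacctmgr's own parser. The names parse_iso_like and in_window are hypothetical, and treating both bounds as inclusive is an assumption.

```python
from datetime import datetime

# Formats covering YYYY-MM-DD[THH:MM[:SS]], tried from most to least specific.
ISO_LIKE_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M", "%Y-%m-%d")


def parse_iso_like(text):
    """Parse a YYYY-MM-DD[THH:MM[:SS]] string into a datetime."""
    for fmt in ISO_LIKE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("not in YYYY-MM-DD[THH:MM[:SS]] form: %r" % text)


def in_window(stamp, start=None, end=None):
    """Return True if stamp lies between start and end.

    Defaults mirror the text above: Start defaults to the epoch and
    End defaults to now. Inclusive bounds are an assumption.
    """
    if start is None:
        start = datetime(1970, 1, 1)
    if end is None:
        end = datetime.now()
    return start <= stamp <= end
```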
LIST/SHOW TRANSACTIONS FORMAT OPTIONS
Action
Actor
Info
TimeStamp
Where
NOTE: If using the WithAssoc option you can also view the information about the various
associations the transaction affected. The Association format fields are described in the
LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
SPECIFICATIONS FOR USERS
Account=<account>
Account name to add this user to.
AdminLevel=<level>
Admin level of user. Valid levels are None, Operator, and Admin.
Cluster=<cluster>
Specific cluster to add user to the account on. Default is all in system.
DefaultAccount=<account>
Identify the default bank account name to be used for a job if none is specified at
submission time.
DefaultWCKey=<defaultwckey>
Identify the default Workload Characterization Key.
Name=<name>
Name of user.
Partition=<name>
Partition name.
RawUsage=<value>
This allows an administrator to reset the raw usage accrued to a user. The only
value currently supported is 0 (zero). This is a settable specification only - it
cannot be used as a filter to list users.
WCKeys=<wckeys>
Workload Characterization Key values.
WithAssoc
Display all associations for this user.
WithCoord
Display all accounts a user is coordinator for.
WithDeleted
Display information with previously deleted data.
NOTE: If using the WithAssoc option you can also query against association specific
information to view only certain associations this user may have. These extra options
can be found in the SPECIFICATIONS FOR ASSOCIATIONS section. You can also use the general
specifications list above in the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES
section.
LIST/SHOW USER FORMAT OPTIONS
AdminLevel
Admin level of user.
DefaultAccount
The user's default account.
Coordinators
List of users that are a coordinator of the account. (Only filled in when using the
WithCoordinator option.)
User The name of a user.
NOTE: If using the WithAssoc option you can also view the information about the various
associations the user may have on all the clusters in the system. The Association format
fields are described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
LIST/SHOW WCKey
WCKey Workload Characterization Key.
Cluster
Specific cluster for the WCKey.
User The name of a user for the WCKey.
NOTE: If using the WithAssoc option you can also view the information about the various
associations the user may have on all the clusters in the system. The Association format
fields are described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
LIST/SHOW TRES
Name The name of the trackable resource. This option is required for TRES types BB
(Burst buffer), GRES, and License. Types CPU, Energy, Memory, and Node do not have
Names. For example if GRES is the type then name is the denomination of the GRES
itself e.g. GPU.
ID The identification number of the trackable resource as it appears in the database.
Type The type of the trackable resource. Current types are BB (Burst buffer), CPU,
Energy, GRES, License, Memory, and Node.
TRES information
Trackable RESources (TRES) are used in many QOS or Association limits. When setting the
limits they are given as a comma separated list. Each TRES has a different limit, e.g.
GrpTRESMins=cpu=10,mem=20 would make 2 different limits: 1 for 10 cpu minutes and 1 for 20
MB memory minutes. This is the case for each limit that deals with TRES. To remove a
limit, -1 is used, e.g. GrpTRESMins=cpu=-1 would remove only the cpu TRES limit.
NOTE: For GrpTRES limits dealing with nodes as a TRES, each job's node allocation is
counted separately (i.e. if a single node has resources allocated to two jobs, this is
counted as two allocated nodes).
NOTE: When dealing with Memory as a TRES all limits are in MB.
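The comma separated TRES syntax described above can be modelled with a short Python sketch. It only illustrates the semantics (one limit per TRES, with a value of -1 removing that TRES limit) and is not how sacctmgr is implemented. The function names parse_tres_limits and apply_tres_limits are hypothetical.

```python
def parse_tres_limits(spec):
    """Split a TRES limit string such as 'cpu=10,mem=20' into a dict."""
    limits = {}
    for item in spec.split(","):
        name, sep, value = item.partition("=")
        if not sep or not name.strip():
            raise ValueError("expected <tres>=<value>, got %r" % item)
        limits[name.strip()] = int(value)
    return limits


def apply_tres_limits(current, spec):
    """Return a copy of current with the limits in spec applied.

    A value of -1 removes only that TRES limit; other limits are kept.
    """
    updated = dict(current)
    for name, value in parse_tres_limits(spec).items():
        if value == -1:
            updated.pop(name, None)
        else:
            updated[name] = value
    return updated
```

For example, applying cpu=-1 to limits of cpu=10,mem=20 leaves only the mem limit in place.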
GLOBAL FORMAT OPTION
When using the format option for listing various fields you can put a %NUMBER afterwards
to specify how many characters should be printed.
e.g. format=name%30 will print 30 characters of field name right justified. A -30 will
print 30 characters left justified.
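The justification rule above can be sketched in Python as follows. This only mirrors the described behaviour (a positive number right justifies, a negative number left justifies). The function name format_field is hypothetical, and plain clipping of long values is an assumption; sacctmgr itself may mark truncated values differently.

```python
def format_field(value, width):
    """Pad or clip value to abs(width) characters.

    Positive width: right justified. Negative width: left justified.
    """
    size = abs(width)
    clipped = value[:size]  # assumption: long values are simply clipped
    if width < 0:
        return clipped.ljust(size)
    return clipped.rjust(size)
```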
FLAT FILE DUMP AND LOAD
sacctmgr has the capability to load and dump Slurm association data to and from a file.
This method can easily add a new cluster or copy an existing cluster's associations into a
new cluster with similar accounts. Each file contains Slurm association data for a single
cluster. Comments can be put into the file with the # character. Each line of
information must begin with one of the four titles: Cluster, Parent, Account or User.
Following the title is a space, dash, space, entity value, then specifications.
Specifications are colon separated. If any variable such as Organization has a space in
it, surround the name with single or double quotes.
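The line layout described above (title, space, dash, space, entity value, then colon separated specifications) can be illustrated with a small Python parser. This is a sketch for reading such files outside of Slurm, not the parser sacctmgr uses. The names split_outside_quotes and parse_line are hypothetical, and treating only lines that begin with # as comments is an assumption.

```python
TITLES = ("Cluster", "Parent", "Account", "User")


def split_outside_quotes(text, sep=":"):
    """Split text on sep, ignoring separators inside single or double quotes."""
    parts, current, quote = [], [], None
    for ch in text:
        if quote is not None:
            if ch == quote:
                quote = None          # closing quote, drop it
            else:
                current.append(ch)
        elif ch in ("'", '"'):
            quote = ch                # opening quote, drop it
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_line(line):
    """Return (title, entity, specs) for one line, or None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    title, dash, rest = line.partition(" - ")
    if not dash or title not in TITLES:
        raise ValueError("line must start with one of %s" % (TITLES,))
    fields = split_outside_quotes(rest)
    entity = fields[0].strip()
    specs = {}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        specs[key.strip()] = value
    return title, entity, specs
```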
To create a file of associations one can run
> sacctmgr dump tux file=tux.cfg
(file=tux.cfg is optional)
To load a previously created file you can run
> sacctmgr load file=tux.cfg
Other options for load are -
clean - delete what was already there and start from scratch with this information.
Cluster= - specify a different name for the cluster than that which is in the file.
Quick explanation of how the file works.
Since the associations in the system follow a hierarchy, so does the file. Anything that
is a parent needs to be defined before any children. The only exception is the understood
'root' account. This is always a default for any cluster and does not need to be defined.
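The parent-before-child rule above lends itself to a simple pre-load check. The Python sketch below scans the lines of a file and reports any Parent line that names an account not yet defined, with 'root' treated as always defined. The function name check_parent_order is hypothetical, and the check is an illustration rather than something sacctmgr provides.

```python
def check_parent_order(lines):
    """Return a list of (line_number, parent) for parents used before definition."""
    defined = {"root"}  # the understood 'root' account needs no definition
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        title, _, rest = line.partition(" - ")
        # The entity value is everything before the first ':' specification.
        entity = rest.split(":", 1)[0].strip()
        if title == "Parent" and entity not in defined:
            problems.append((number, entity))
        elif title == "Account":
            defined.add(entity)
    return problems
```

An empty result means every Parent line refers to 'root' or to an account defined earlier in the file.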
To edit/create a file start with a cluster line for the new cluster
Cluster - cluster_name:MaxNodesPerJob=15
Anything included on this line will be the defaults for all associations on this cluster.
These options are as follows...
GrpTRESMins=
The total number of TRES minutes that can possibly be used by past, present and
future jobs running from this association and its children.
GrpTRESRunMins=
Used to limit the combined total number of TRES minutes used by all jobs running
with this association and its children. This takes into consideration the time limit
of running jobs and consumes it. If the limit is reached, no new jobs are started
until other jobs finish to allow time to free up.
GrpTRES=
Maximum number of TRES running jobs are able to be allocated in aggregate for this
association and all associations which are children of this association.
GrpJobs=
Maximum number of running jobs in aggregate for this association and all
associations which are children of this association.
GrpNodes=
Maximum number of nodes running jobs are able to be allocated in aggregate for this
association and all associations which are children of this association.
NOTE: Each job's node allocation is counted separately (i.e. if a single node has
resources allocated to two jobs, this is counted as two allocated nodes).
GrpSubmitJobs=
Maximum number of jobs which can be in a pending or running state at any time in
aggregate for this association and all associations which are children of this
association.
GrpWall=
Maximum wall clock time running jobs are able to be allocated in aggregate for this
association and all associations which are children of this association.
FairShare=
Number used in conjunction with other associations to determine job priority.
MaxJobs=
Maximum number of jobs the children of this association can run.
MaxNodesPerJob=
Maximum number of nodes per job the children of this association can run.
MaxWallDurationPerJob=
Maximum time (not related to job size) children of this account's jobs can run.
QOS= Comma separated list of Quality of Service names (Defined in sacctmgr).
Followed by Accounts you want in this fashion...
Parent - root (Defined by default)
Account - cs:MaxNodesPerJob=5:MaxJobs=4:FairShare=399:MaxWallDurationPerJob=40:Description='Computer Science':Organization='LC'
Any of the options after a ':' can be left out and they can be in any order.
If you want to add any sub accounts just list the Parent THAT HAS ALREADY BEEN
CREATED before the account line in this fashion...
Parent - cs
Account - test:MaxNodesPerJob=1:MaxJobs=1:FairShare=1:MaxWallDurationPerJob=1:Description='Test Account':Organization='Test'
All account options are
Description=
A brief description of the account.
GrpTRESMins=
Maximum number of TRES minutes running jobs are able to be allocated in aggregate for
this association and all associations which are children of this association.
GrpTRESRunMins=
Used to limit the combined total number of TRES minutes used by all jobs running
with this association and its children. This takes into consideration the time limit
of running jobs and consumes it. If the limit is reached, no new jobs are started
until other jobs finish to allow time to free up.
GrpTRES=
Maximum number of TRES running jobs are able to be allocated in aggregate for this
association and all associations which are children of this association.
GrpJobs=
Maximum number of running jobs in aggregate for this association and all
associations which are children of this association.
GrpNodes=
Maximum number of nodes running jobs are able to be allocated in aggregate for this
association and all associations which are children of this association.
NOTE: Each job's node allocation is counted separately (i.e. if a single node has
resources allocated to two jobs, this is counted as two allocated nodes).
GrpSubmitJobs=
Maximum number of jobs which can be in a pending or running state at any time in
aggregate for this association and all associations which are children of this
association.
GrpWall=
Maximum wall clock time running jobs are able to be allocated in aggregate for this
association and all associations which are children of this association.
FairShare=
Number used in conjunction with other associations to determine job priority.
MaxJobs=
Maximum number of jobs the children of this association can run.
MaxNodesPerJob=
Maximum number of nodes per job the children of this association can run.
MaxWallDurationPerJob=
Maximum time (not related to job size) children of this account's jobs can run.
Organization=
Name of organization that owns this account.
QOS(=,+=,-=)
Comma separated list of Quality of Service names (Defined in sacctmgr).
To add users to an account add a line like this after a Parent - line
Parent - test
User - adam:MaxNodesPerJob=2:MaxJobs=3:FairShare=1:MaxWallDurationPerJob=1:AdminLevel=Operator:Coordinator='test'
All user options are
AdminLevel=
Type of admin this user is (Administrator, Operator).
Must be defined on the first occurrence of the user.
Coordinator=
Comma separated list of accounts this user is coordinator over.
Must be defined on the first occurrence of the user.
DefaultAccount=
System wide default account name.
Must be defined on the first occurrence of the user.
FairShare=
Number used in conjunction with other associations to determine job priority.
MaxJobs=
Maximum number of jobs this user can run.
MaxNodesPerJob=
Maximum number of nodes per job this user can run.
MaxWallDurationPerJob=
Maximum time (not related to job size) this user can run.
QOS(=,+=,-=)
Comma separated list of Quality of Service names (Defined in sacctmgr).
ARCHIVE FUNCTIONALITY
sacctmgr has the capability to archive to a flat file and/or load that data if needed
later. The archiving is usually done by the slurmdbd and it is highly recommended you
only do it through sacctmgr if you completely understand what you are doing. See
"man slurmdbd" for more information on slurmdbd options. Loading data into the database
can be done from these files to either view old data or regenerate rolled up data.
These are the options for both dump and load of archive information.
Archive Dump
Directory=
Directory to store the archive data.
Events Archive Events. If not specified and PurgeEventAfter is set all event data removed
will be lost permanently.
Jobs Archive Jobs. If not specified and PurgeJobAfter is set all job data removed will
be lost permanently.
PurgeEventAfter=
Purge cluster event records older than time stated in months. If you want to purge
on a shorter time period you can append hours or days to the numeric value to get
those more frequent purges (e.g. a value of '12hours' would purge everything older
than 12 hours).
PurgeJobAfter=
Purge job records older than time stated in months. If you want to purge on a
shorter time period you can append hours or days to the numeric value to get those
more frequent purges (e.g. a value of '12hours' would purge everything older than
12 hours).
PurgeStepAfter=
Purge step records older than time stated in months. If you want to purge on a
shorter time period you can append hours or days to the numeric value to get those
more frequent purges (e.g. a value of '12hours' would purge everything older than
12 hours).
PurgeSuspendAfter=
Purge job suspend records older than time stated in months. If you want to purge
on a shorter time period you can append hours or days to the numeric value to get
those more frequent purges (e.g. a value of '12hours' would purge everything older
than 12 hours).
Script=
Run this script instead of the generic form of archive to flat files.
Steps Archive Steps. If not specified and PurgeStepAfter is set all step data removed
will be lost permanently.
Suspend
Archive Suspend Data. If not specified and PurgeSuspendAfter is set all suspend
data removed will be lost permanently.
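The Purge*After values described above combine a number with an optional unit. The Python sketch below splits such a value into an amount and a unit, defaulting to months when no unit is given. It is an illustration only. The function name parse_purge_after is hypothetical, and accepting exactly the suffixes hours, days and months is an assumption based on the wording above.

```python
import re

# Assumption: the only unit suffixes are 'hours', 'days' and 'months'.
_PURGE_RE = re.compile(r"^(\d+)\s*(hours|days|months)?$", re.IGNORECASE)


def parse_purge_after(value):
    """Return (amount, unit) for a value such as '12hours' or '6'."""
    match = _PURGE_RE.match(value.strip())
    if match is None:
        raise ValueError("unrecognised purge value: %r" % value)
    amount = int(match.group(1))
    unit = (match.group(2) or "months").lower()  # bare numbers mean months
    return amount, unit
```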
Archive Load
Load previously archived data into the database.
File= File to load into database.
Insert=
SQL to insert directly into the database. This should be used very cautiously
since this writes your SQL directly into the database.
ENVIRONMENT VARIABLES
Some sacctmgr options may be set via environment variables. These environment variables,
along with their corresponding options, are listed below. (Note: command line options will
always override these settings.)
SLURM_CONF The location of the Slurm configuration file.
EXAMPLES
NOTE: There is an order to set up accounting associations. You must define clusters
before you add accounts and you must add accounts before you can add users.
-> sacctmgr create cluster tux
-> sacctmgr create account name=science fairshare=50
-> sacctmgr create account name=chemistry parent=science fairshare=30
-> sacctmgr create account name=physics parent=science fairshare=20
-> sacctmgr create user name=adam cluster=tux account=physics fairshare=10
-> sacctmgr delete user name=adam cluster=tux account=physics
-> sacctmgr delete account name=physics cluster=tux
-> sacctmgr modify user where name=adam cluster=tux account=physics set maxjobs=2 maxwall=30:00
-> sacctmgr list associations cluster=tux format=Account,Cluster,User,Fairshare tree withd
-> sacctmgr list transactions StartTime=11/03-10:30:00 format=Timestamp,Action,Actor
-> sacctmgr dump cluster=tux file=tux_data_file
-> sacctmgr load tux_data_file
A user's account can not be changed directly. A new association needs to be created for
the user with the new account. Then the association with the old account can be deleted.
When modifying an object, the placement of the key word 'set' and the optional key word
'where' is critical. The examples below produce correct results. As a rule of thumb,
anything you put in front of 'set' will be used as a qualifier. If you want to put a
qualifier after the key word 'set' you should use the key word 'where'.
wrong-> sacctmgr modify user name=adam set fairshare=10 cluster=tux
This will produce an error as the above line reads modify user adam set fairshare=10 and
cluster=tux.
right-> sacctmgr modify user name=adam cluster=tux set fairshare=10
right-> sacctmgr modify user name=adam set fairshare=10 where cluster=tux
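The rule of thumb above can be modelled with a short Python sketch that sorts the arguments of a modify command into conditions and assignments. It only illustrates how the position of 'set' and 'where' changes the meaning of an argument and is not sacctmgr's actual parser. The function name split_modify_args is hypothetical.

```python
def split_modify_args(tokens):
    """Split modify arguments into (conditions, assignments).

    Arguments before 'set' or after 'where' select what to modify;
    arguments after 'set' are treated as new values.
    """
    conditions, assignments = [], []
    target = conditions
    for token in tokens:
        keyword = token.lower()
        if keyword == "set":
            target = assignments
        elif keyword == "where":
            target = conditions
        else:
            target.append(token)
    return conditions, assignments
```

In this model the 'wrong' command ends up with cluster=tux among the assignments, while both 'right' commands keep it among the conditions.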
When changing qos for something only use the '=' operator when wanting to explicitly set
the qos to something. In most cases you will want to use the '+=' or '-=' operator to
either add to or remove from the existing qos already in place.
If a user already has qos of normal,standby for a parent or it was explicitly set you
should use qos+=expedite to add this to the list in this fashion.
-> sacctmgr modify user name=adam set qos+=expedite
If you are looking to only add the qos expedite to only a certain account and/or cluster
you can do that by specifying them in the sacctmgr line.
-> sacctmgr modify user name=adam acct=this cluster=tux set qos+=expedite
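The difference between the '=', '+=' and '-=' operators can be summarised with a small Python model that treats a QOS list as a set. This is a sketch of the semantics described above, not Slurm code. The function name apply_qos is hypothetical.

```python
def apply_qos(current, operator, names):
    """Return the QOS set after applying operator with a comma separated list.

    '='  replaces the existing list,
    '+=' adds to the existing list,
    '-=' removes from the existing list.
    """
    requested = {name for name in names.split(",") if name}
    if operator == "=":
        return set(requested)
    if operator == "+=":
        return set(current) | requested
    if operator == "-=":
        return set(current) - requested
    raise ValueError("unknown operator: %r" % operator)
```

Starting from normal,standby, applying += with expedite keeps the two existing entries, whereas applying = with expedite leaves expedite as the only entry.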
Let's give an example of how to add QOS to user accounts. List all available QOSs in the
cluster.
-> sacctmgr show qos format=name
Name
---------
normal
expedite
List all the associations in the cluster.
-> sacctmgr show assoc format=cluster,account,qos
Cluster  Account    QOS
-------- ---------- -----
zebra    root       normal
zebra    root       normal
zebra    g          normal
zebra    g1         normal
Add the QOS expedite to account G1 and display the result. Using the operator += the QOS
will be added together with the existing QOS to this account.
-> sacctmgr modify account name=g1 set qos+=expedite
-> sacctmgr show assoc format=cluster,account,qos
Cluster  Account  QOS
-------- -------- -------
zebra    root     normal
zebra    root     normal
zebra    g        normal
zebra    g1       expedite,normal
Now set the QOS expedite as the only QOS for the account G and display the result. Using
the operator = makes expedite the only QOS usable by account G.
-> sacctmgr modify account name=G set qos=expedite
-> sacctmgr show assoc format=cluster,account,user,qos
Cluster   Account  QOS
--------- -------- -----
zebra     root     normal
zebra     root     normal
zebra     g        expedite
zebra     g1       expedite,normal
If a new account is added under the account G it will inherit the QOS expedite and it will
not have access to QOS normal.
-> sacctmgr add account banana parent=G
-> sacctmgr show assoc format=cluster,account,qos
Cluster   Account  QOS
--------- -------- -----
zebra     root     normal
zebra     root     normal
zebra     g        expedite
zebra     banana   expedite
zebra     g1       expedite,normal
An example of listing trackable resources
-> sacctmgr show tres
Type       Name              ID
---------- ----------------- --------
cpu                          1
mem                          2
energy                       3
node                         4
gres       gpu:tesla         1001
license    vcs               1002
bb         cray              1003
COPYING
Copyright (C) 2008-2010 Lawrence Livermore National Security. Produced at Lawrence
Livermore National Laboratory (cf, DISCLAIMER).
Copyright (C) 2010-2015 SchedMD LLC.
This file is part of Slurm, a resource management program. For details, see
<http://slurm.schedmd.com/>.
Slurm is free software; you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.