
NAME


scontrol - Used to view and modify Slurm configuration and state.

SYNOPSIS


scontrol [OPTIONS...] [COMMAND...]

DESCRIPTION


scontrol is used to view or modify Slurm configuration including: job, job step, node,
partition, reservation, and overall system configuration. Most of the commands can only be
executed by user root. If an attempt to view or modify configuration information is made
by an unauthorized user, an error message will be printed and the requested action will
not occur. If no command is entered on the execute line, scontrol will operate in an
interactive mode and prompt for input. It will continue prompting for input and executing
commands until explicitly terminated. If a command is entered on the execute line,
scontrol will execute that command and terminate. All commands and options are
case-insensitive, although node names, partition names, and reservation names are
case-sensitive (node names "LX" and "lx" are distinct). All commands and options can be
abbreviated to the extent that the specification is unique. A modified Slurm
configuration can be written to a file using the scontrol write config command. The
resulting file will be named using the convention "slurm.conf.<datetime>" and located in
the same directory as the original "slurm.conf" file. The directory containing the
original slurm.conf must be writable for this to occur.

OPTIONS


-a, --all
When the show command is used, display all partitions, their jobs and job
steps. This causes information to be displayed about partitions that are configured
as hidden and partitions that are unavailable to the user's group.

-d, --details
Causes the show command to provide additional details where available. Repeating
the option more than once (e.g., "-dd") will cause the show job command to also
list the batch script, if the job was a batch job.

-h, --help
Print a help message describing the usage of scontrol.

--hide Do not display information about hidden partitions, their jobs and job steps. This
is the default behavior: neither partitions that are configured as hidden nor
partitions unavailable to the user's group are displayed.

-M, --clusters=<string>
The cluster to issue commands to. Only one cluster name may be specified.

-o, --oneliner
Print information one line per record.

-Q, --quiet
Print no warning or informational messages, only fatal error messages.

-v, --verbose
Print detailed event logging. Multiple -v's will further increase the verbosity of
logging. By default only errors will be displayed.

-V, --version
Print version information and exit.

COMMANDS

all Show all partitions, their jobs and job steps. This causes information to be
displayed about partitions that are configured as hidden and partitions that are
unavailable to the user's group.

abort Instruct the Slurm controller to terminate immediately and generate a core file.
See "man slurmctld" for information about where the core file will be written.

checkpoint CKPT_OP ID
Perform a checkpoint activity on the job step(s) with the specified identification.
ID can be used to identify a specific job (e.g. "<job_id>", which applies to all of
its existing steps) or a specific job step (e.g. "<job_id>.<step_id>"). Acceptable
values for CKPT_OP include:

able Test whether checkpointing is presently enabled (not disabled); report
the start time if a checkpoint is in progress

create Create a checkpoint and continue the job or job step

disable Disable future checkpoints

enable Enable future checkpoints

error Report the result for the last checkpoint request, error code and
message

restart Restart execution of the previously checkpointed job or job step

requeue Create a checkpoint and requeue the batch job, combines vacate and
restart operations

vacate Create a checkpoint and terminate the job or job step
The following options may also be specified:

MaxWait=<seconds> Maximum time for checkpoint to be written. Default value is 10
seconds. Valid with create and vacate options only.

ImageDir=<directory_name>
Location of checkpoint file. Valid with create, vacate and
restart options only. This value takes precedence over any
--checkpoint-dir value specified at job submission time.

StickToNodes If set, resume the job on the same nodes as previously used.
Valid with the restart option only.

cluster CLUSTER_NAME
The cluster to issue commands to. Only one cluster name may be specified.

create SPECIFICATION
Create a new partition or reservation. See the full list of parameters below.
Include the tag "res" to create a reservation without specifying a reservation
name.

completing
Display all jobs in a COMPLETING state along with associated nodes in either a
COMPLETING or DOWN state.

delete SPECIFICATION
Delete the entry with the specified SPECIFICATION. The two SPECIFICATION choices
are PartitionName=<name> and Reservation=<name>. On dynamically laid out BlueGene
systems, BlockName=<name> also works. Reservations and partitions should have no
associated jobs at the time of their deletion (modify the jobs first). If the
specified partition is in use, the request is denied.

details
Causes the show command to provide additional details where available. Job
information will include CPUs and NUMA memory allocated on each node. Note that on
computers with hyperthreading enabled and Slurm configured to allocate cores, each
listed CPU represents one physical core. Each hyperthread on that core can be
allocated a separate task, so a job's CPU count and task count may differ. See the
--cpu_bind and --mem_bind option descriptions in the srun man page for more
information. The details option is currently only supported for the show job
command. To also list the batch script for batch jobs, in addition to the details,
use the script option described below instead of this option.

errnumstr ERRNO
Given a Slurm error number, return a descriptive string.

exit Terminate the execution of scontrol. This is an independent command with no
options meant for use in interactive mode.

help Display a description of scontrol options and commands.

hide Do not display partition, job or job step information for partitions that are
configured as hidden or partitions that are unavailable to the user's group. This
is the default behavior.

hold job_list
Prevent a pending job from being started (sets its priority to 0). Use the
release command to permit the job to be scheduled. The job_list argument is a
comma-separated list of job IDs OR "jobname=" with the job's name, which will
attempt to hold all jobs having that name. Note that when a job is held by a
system administrator using the hold command, only a system administrator may
release the job for execution (also see the uhold command). When the job is held by
its owner, it may also be released by the job's owner.
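The job_list argument accepted by hold (and by release) takes one of two shapes. The Python sketch below shows how such an argument could be classified; `parse_job_list` is a hypothetical helper, not part of Slurm, and job IDs are kept as strings so the sketch makes no assumption about their internal format.

```python
def parse_job_list(arg: str):
    """Hypothetical sketch: classify a hold/release job_list argument.

    Returns ("name", <job name>) for a "jobname=<name>" argument, otherwise
    ("ids", [<job id>, ...]) for a comma-separated list of job IDs.
    """
    key, sep, value = arg.partition("=")
    # Keywords are case-insensitive, so accept JobName=, jobname=, etc.
    if sep and key.lower() == "jobname":
        if not value:
            raise ValueError("jobname= requires a job name")
        return ("name", value)
    ids = [job.strip() for job in arg.split(",")]
    if not all(ids):
        raise ValueError(f"empty job ID in {arg!r}")
    return ("ids", ids)


print(parse_job_list("1234,1235"))    # ('ids', ['1234', '1235'])
print(parse_job_list("jobname=sim"))  # ('name', 'sim')
```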

notify job_id message
Send a message to standard error of the salloc or srun command or batch job
associated with the specified job_id.

oneliner
Print information one line per record.

pidinfo proc_id
Print the Slurm job id and scheduled termination time corresponding to the supplied
process id, proc_id, on the current node. This will work only with processes on the
node on which scontrol is run, and only for those processes spawned by Slurm and
their descendants.

listpids [job_id[.step_id]] [NodeName]
Print a listing of the process IDs in a job step (if job_id.step_id is provided), or
all of the job steps in a job (if job_id is provided), or all of the job steps in
all of the jobs on the local node (if job_id is not provided or job_id is "*").
This will work only with processes on the node on which scontrol is run, and only
for those processes spawned by Slurm and their descendants. Note that some Slurm
configurations (ProctrackType value of pgid or aix) are unable to identify all
processes associated with a job or job step.

Note that the NodeName option is only really useful when you have multiple slurmd
daemons running on the same host machine. Multiple slurmd daemons on one host are,
in general, only used by Slurm developers.
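The three ways listpids selects processes can be summarized with a small parser. `parse_listpids_target` is a hypothetical illustration only; it does not reflect how scontrol parses its arguments internally.

```python
def parse_listpids_target(arg=None):
    """Hypothetical sketch of how a listpids argument selects processes.

    Returns (job_id, step_id):
      (None, None)   -> every step of every job on the local node
      (job_id, None) -> every step of one job
      (job_id, step) -> one job step
    """
    if arg is None or arg == "*":
        return (None, None)
    job, sep, step = arg.partition(".")
    return (int(job), int(step) if sep else None)


for arg in (None, "*", "1234", "1234.1"):
    print(arg, "->", parse_listpids_target(arg))
```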

ping Ping the primary and secondary slurmctld daemon and report if they are responding.

quiet Print no warning or informational messages, only fatal error messages.

quit Terminate the execution of scontrol.

reboot_nodes [NodeList]
Reboot all nodes in the system when they become idle using the RebootProgram as
configured in Slurm's slurm.conf file. Accepts an optional list of nodes to reboot.
By default all nodes are rebooted. NOTE: This command does not prevent additional
jobs from being scheduled on these nodes, so many jobs can be executed on the nodes
prior to them being rebooted. You can explicitly drain the nodes in order to reboot
nodes as soon as possible, but the nodes must also explicitly be returned to
service after being rebooted. You can alternately create an advanced reservation to
prevent additional jobs from being initiated on nodes to be rebooted. NOTE: Nodes
will be placed in a state of "MAINT" until rebooted and returned to service with a
normal state. Alternately the node's state "MAINT" may be cleared by using the
scontrol command to set the node state to "RESUME", which clears the "MAINT" flag.

reconfigure
Instruct all Slurm daemons to re-read the configuration file. This command does
not restart the daemons. This mechanism would be used to modify configuration
parameters (Epilog, Prolog, SlurmctldLogFile, SlurmdLogFile, etc.). The Slurm
controller (slurmctld) forwards the request to all other daemons (slurmd daemon on
each compute node). Running jobs continue execution. Most configuration parameters
can be changed by just running this command, however, Slurm daemons should be
shutdown and restarted if any of these parameters are to be changed: AuthType,
BackupAddr, BackupController, ControlAddr, ControlMach, PluginDir,
StateSaveLocation, SlurmctldPort or SlurmdPort. The slurmctld daemon must be
restarted if nodes are added to or removed from the cluster.

release job_list
Release a previously held job to begin execution. The job_list argument is a
comma-separated list of job IDs OR "jobname=" with the job's name, which will
attempt to release all jobs having that name. Also see hold.

requeue job_list
Requeue a running, suspended or finished Slurm batch job into pending state. The
job_list argument is a comma separated list of job IDs.

requeuehold job_list
Requeue a running, suspended or finished Slurm batch job into pending state,
moreover the job is put in held state (priority zero). The job_list argument is a
comma separated list of job IDs. A held job can be released using scontrol to
reset its priority (e.g. "scontrol release <job_id>"). The command accepts the
following option:

State=SpecialExit
The "SpecialExit" keyword specifies that the job has to be put in a special
state JOB_SPECIAL_EXIT. The "scontrol show job" command will display the
JobState as SPECIAL_EXIT, while the "squeue" command displays it as SE.

resume job_list
Resume a previously suspended job. The job_list argument is a comma separated list
of job IDs. Also see suspend.

NOTE: A suspended job releases its CPUs for allocation to other jobs. Resuming a
previously suspended job may result in multiple jobs being allocated the same CPUs,
which could trigger gang scheduling with some configurations or severe degradation
in performance with other configurations. Using the scancel command to send
SIGSTOP and SIGCONT signals stops a job without releasing its CPUs for
allocation to other jobs and would be the preferable mechanism in many cases. Use
with caution.

schedloglevel LEVEL
Enable or disable scheduler logging. LEVEL may be "0", "1", "disable" or "enable".
"0" has the same effect as "disable". "1" has the same effect as "enable". This
value is temporary and will be overwritten when the slurmctld daemon reads the
slurm.conf configuration file (e.g. when the daemon is restarted or scontrol
reconfigure is executed) if the SlurmSchedLogLevel parameter is present.

script Causes the show job command to list the batch script for batch jobs in addition to
the detail information described under the details option above.

setdebug LEVEL
Change the debug level of the slurmctld daemon. LEVEL may be an integer value
between zero and nine (using the same values as SlurmctldDebug in the slurm.conf
file) or the name of the most detailed message type to be printed: "quiet",
"fatal", "error", "info", "verbose", "debug", "debug2", "debug3", "debug4", or
"debug5". This value is temporary and will be overwritten whenever the slurmctld
daemon reads the slurm.conf configuration file (e.g. when the daemon is restarted
or scontrol reconfigure is executed).
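A minimal sketch of normalizing a setdebug LEVEL, assuming the ten names listed above map, in order, to the integers 0 through 9. `normalize_debug_level` is a hypothetical helper.

```python
# Names in the order the text lists them; this sketch assumes that order
# corresponds to the integer values 0 through 9.
DEBUG_LEVELS = ["quiet", "fatal", "error", "info", "verbose",
                "debug", "debug2", "debug3", "debug4", "debug5"]


def normalize_debug_level(level: str) -> int:
    """Hypothetical: map a setdebug LEVEL (integer or name) to 0-9."""
    if level.isdigit():
        value = int(level)
        if value > 9:
            raise ValueError(f"level {value} is outside 0-9")
        return value
    try:
        return DEBUG_LEVELS.index(level.lower())
    except ValueError:
        raise ValueError(f"unknown debug level {level!r}") from None


print(normalize_debug_level("debug2"))  # 6
print(normalize_debug_level("3"))       # 3
```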

setdebugflags [+|-]FLAG
Add or remove DebugFlags of the slurmctld daemon. See "man slurm.conf" for a list
of supported DebugFlags. NOTE: Changing the value of some DebugFlags will have no
effect without restarting the slurmctld daemon, which would set DebugFlags based
upon the contents of the slurm.conf configuration file.
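The [+|-]FLAG form adds or removes a single flag from the daemon's current DebugFlags set. The hypothetical sketch below models only the signed forms; the flag names in the usage lines are examples, see "man slurm.conf" for the supported list.

```python
def apply_debug_flag(current: frozenset, spec: str) -> frozenset:
    """Hypothetical: return a new flag set after "+FLAG" (add) or "-FLAG" (remove)."""
    if len(spec) < 2 or spec[0] not in "+-":
        # Assumption: this sketch only models the signed forms.
        raise ValueError("expected +FLAG or -FLAG")
    sign, flag = spec[0], spec[1:]
    return current | {flag} if sign == "+" else current - {flag}


flags = frozenset({"Backfill"})
flags = apply_debug_flag(flags, "+Gang")
flags = apply_debug_flag(flags, "-Backfill")
print(sorted(flags))  # ['Gang']
```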

show ENTITY ID
Display the state of the specified entity with the specified identification.
ENTITY may be aliases, cache, config, daemons, frontend, job, node, partition,
powercap, reservation, slurmd, step, topology, hostlist, hostlistsorted or
hostnames (also block or submp on BlueGene systems). ID can be used to identify a
specific element of the identified entity: job ID, node name, partition name,
reservation name, or job step ID for job, node, partition, or step respectively.

For an ENTITY of topology, the ID may be a node or switch name. If one node name
is specified, all switches connected to that node (and their parent switches) will
be shown. If more than one node name is specified, only switches that connect to
all named nodes will be shown.

aliases returns all NodeName values associated with a given NodeHostname (useful
to get the list of virtual nodes associated with a real node in a configuration
where multiple slurmd daemons execute on a single compute node). cache displays
the current contents of the slurmctld's internal cache for users and
associations. config displays parameter names from the configuration files in
mixed case (e.g. SlurmdPort=7003), while derived parameter names are in upper
case only (e.g. SLURM_VERSION).

hostnames takes an optional hostlist expression as input and writes a list of
individual host names to standard output (one per line). If no hostlist
expression is supplied, the contents of the SLURM_NODELIST environment variable
are used. For example "tux[1-3]" is mapped to "tux1", "tux2" and "tux3" (one
hostname per line). hostlist takes a list of host names and prints the hostlist
expression for them (the inverse of hostnames). hostlist can also take the
absolute pathname of a file (beginning with the character '/') containing a list
of hostnames.

Multiple node names may be specified using simple node range expressions (e.g.
"lx[10-20]"). All other ID values must identify a single element. The job step ID
is of the form "job_id.step_id" (e.g. "1234.1").

slurmd reports the current status of the slurmd daemon executing on the same node
from which the scontrol command is executed (the local host). It can be useful to
diagnose problems.

By default hostlist does not sort the node list or make it unique (e.g.
tux2,tux1,tux2 = tux[2,1-2]). If you want a sorted list, use hostlistsorted (e.g.
tux2,tux1,tux2 = tux[1-2,2]).

By default, all elements of the entity type specified are printed. For an ENTITY
of job, if the job does not specify sockets-per-node, cores-per-socket or
threads-per-core, then it will display '*' in the ReqS:C:T=*:*:* field.
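The hostnames and hostlist conversions can be sketched in Python. This is a simplified, hypothetical reimplementation, not Slurm's hostlist code: expansion handles one bracketed range list per name, and compression only handles names of the form <prefix><number> sharing a single prefix (zero padding is not preserved when compressing). Building runs in input order reproduces the unsorted tux[2,1-2] example above, and sorting first reproduces the hostlistsorted tux[1-2,2] example.

```python
import re

_NUMBERED = re.compile(r"(.*?)(\d+)")


def _split_top_level(expr: str):
    """Split on commas that are not inside [...] brackets."""
    items, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return [item for item in items if item]


def expand_hostlist(expr: str):
    """Hypothetical `show hostnames`: "tux[1-3]" -> ["tux1", "tux2", "tux3"]."""
    hosts = []
    for item in _split_top_level(expr):
        m = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", item)
        if not m:
            hosts.append(item)  # plain host name, no range
            continue
        prefix, ranges, suffix = m.groups()
        for part in ranges.split(","):
            lo, _, hi = part.partition("-")
            width = len(lo)  # keep zero padding such as "01"
            for n in range(int(lo), int(hi or lo) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


def compress_hostlist(names, sort=False):
    """Hypothetical `show hostlist` (sort=True mimics hostlistsorted)."""
    parsed = [_NUMBERED.fullmatch(name) for name in names]
    if not parsed or any(m is None for m in parsed) or len({m.group(1) for m in parsed}) != 1:
        raise ValueError("sketch only handles names of the form <prefix><number>")
    prefix = parsed[0].group(1)
    numbers = [int(m.group(2)) for m in parsed]
    if sort:
        numbers.sort()
    runs = []  # [start, end] pairs, extended only by the next consecutive number
    for n in numbers:
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n
        else:
            runs.append([n, n])
    parts = [str(a) if a == b else f"{a}-{b}" for a, b in runs]
    if len(runs) == 1 and runs[0][0] == runs[0][1]:
        return f"{prefix}{parts[0]}"
    return f"{prefix}[{','.join(parts)}]"


print(expand_hostlist("tux[1-3]"))                          # ['tux1', 'tux2', 'tux3']
print(compress_hostlist(["tux2", "tux1", "tux2"]))          # tux[2,1-2]
print(compress_hostlist(["tux2", "tux1", "tux2"], True))    # tux[1-2,2]
```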

shutdown OPTION
Instruct Slurm daemons to save current state and terminate. By default, the Slurm
controller (slurmctld) forwards the request to all other daemons (slurmd daemon on
each compute node). An OPTION of slurmctld or controller results in only the
slurmctld daemon being shut down and the slurmd daemons remaining active.

suspend job_list
Suspend a running job. The job_list argument is a comma separated list of job IDs.
Use the resume command to resume its execution. User processes must stop on
receipt of SIGSTOP signal and resume upon receipt of SIGCONT for this operation to
be effective. Not all architectures and configurations support job suspension. If
a suspended job is requeued, it will be placed in a held state.

takeover
Instruct Slurm's backup controller (slurmctld) to take over system control.
Slurm's backup controller requests control from the primary and waits for its
termination. After that, it switches from backup mode to controller mode. If the
primary controller cannot be contacted, it directly switches to controller mode.
This can be used to speed up the Slurm controller fail-over mechanism when the
primary node is down. This can be used to minimize disruption if the computer
executing the primary Slurm controller is scheduled down. (Note: Slurm's primary
controller will take the control back at startup.)

uhold job_list
Prevent a pending job from being started (sets its priority to 0). The job_list
argument is a space separated list of job IDs or job names. Use the release
command to permit the job to be scheduled. This command is designed for a system
administrator to hold a job so that the job owner may release it rather than
requiring the intervention of a system administrator (also see the hold command).

update SPECIFICATION
Update job, step, node, partition, powercapping or reservation configuration per
the supplied specification. SPECIFICATION is in the same format as the Slurm
configuration file and the output of the show command described above. It may be
desirable to execute the show command (described above) on the specific entity you
wish to update, then use cut-and-paste tools to enter updated configuration values
to the update. Note that while most configuration values can be changed using this
command, not all can be changed using this mechanism. In particular, the hardware
configuration of a node or the physical addition or removal of nodes from the
cluster may only be accomplished through editing the Slurm configuration file and
executing the reconfigure command (described above).
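The show-then-update workflow can be sketched as follows. `parse_show_output` and `build_update_line` are hypothetical helpers; they assume whitespace-separated Key=Value fields and do not handle values that contain spaces. JobId is emitted first because a relative TimeLimit adjustment requires JobId to precede it.

```python
def parse_show_output(text: str) -> dict:
    """Hypothetical: split one record of `show` output into Key=Value fields.

    Limitation of this sketch: values containing spaces are not handled.
    """
    fields = {}
    for token in text.split():
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def build_update_line(fields: dict, changes: dict) -> str:
    """Hypothetical: build an update command naming the job first, then the changes."""
    parts = [f"JobId={fields['JobId']}"]
    parts += [f"{key}={value}" for key, value in changes.items()]
    return "update " + " ".join(parts)


record = parse_show_output("JobId=42 JobName=sim Partition=debug TimeLimit=01:00:00")
print(build_update_line(record, {"TimeLimit": "+30"}))  # update JobId=42 TimeLimit=+30
```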

verbose
Print detailed event logging. This includes time-stamps on data structures, record
counts, etc.

version
Display the version number of scontrol being executed.

wait_job job_id
Wait until a job and all of its nodes are ready for use or the job has entered some
termination state. This option is particularly useful in the Slurm Prolog or in the
batch script itself if nodes are powered down and restarted automatically as
needed.

write config
Write the current configuration to a file with the naming convention of
"slurm.conf.<datetime>" in the same directory as the original slurm.conf file.

!! Repeat the last command executed.

SPECIFICATIONS FOR UPDATE COMMAND, JOBS

Account=<account>
Account name to be changed for this job's resource use. Value may be cleared with
blank data value, "Account=".

ArrayTaskThrottle=<count>
Specify the maximum number of tasks in a job array that can execute at the same
time. Set the count to zero in order to eliminate any limit. The task throttle
count for a job array is reported as part of its ArrayTaskId field, preceded with a
percent sign. For example "ArrayTaskId=1-10%2" indicates the maximum number of
running tasks is limited to 2.
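A hypothetical parser for the reported ArrayTaskId field, assuming a simple comma-separated list of task IDs and ranges (other array syntaxes are not handled). A throttle of zero is treated as "no limit", matching the description above.

```python
def parse_array_task_id(field: str):
    """Hypothetical: split an ArrayTaskId value such as "1-10%2".

    Returns (task_ids, throttle); throttle is None when no limit is shown
    or when it is zero (zero removes the limit).
    """
    spec, _, throttle = field.partition("%")
    tasks = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        tasks.extend(range(int(lo), int(hi or lo) + 1))
    limit = int(throttle) if throttle else 0
    return tasks, (limit or None)


tasks, limit = parse_array_task_id("1-10%2")
print(len(tasks), limit)  # 10 2
```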

BurstBuffer=<spec>
Burst buffer specification to be changed for this job's resource use. Value may be
cleared with blank data value, "BurstBuffer=". Format is burst buffer plugin
specific.

Conn-Type=<type>
Reset the node connection type. Supported only on IBM BlueGene systems. Possible
values are "MESH", "TORUS" and "NAV" (mesh else torus).

Contiguous=<yes|no>
Set the job's requirement for contiguous (consecutive) nodes to be allocated.
Possible values are "YES" and "NO". Only the Slurm administrator or root can
change this parameter.

Dependency=<dependency_list>
Defer job's initiation until specified job dependency specification is satisfied.
Cancel dependency with an empty dependency_list (e.g. "Dependency=").
<dependency_list> is of the form <type:job_id[:job_id][,type:job_id[:job_id]]>.
Many jobs can share the same dependency and these jobs may even belong to different
users.

after:job_id[:jobid...]
This job can begin execution after the specified jobs have begun execution.

afterany:job_id[:jobid...]
This job can begin execution after the specified jobs have terminated.

afternotok:job_id[:jobid...]
This job can begin execution after the specified jobs have terminated in
some failed state (non-zero exit code, node failure, timed out, etc).

afterok:job_id[:jobid...]
This job can begin execution after the specified jobs have successfully
executed (ran to completion with an exit code of zero).

singleton
This job can begin execution after any previously launched jobs sharing the
same job name and user have terminated.
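A minimal sketch of parsing a dependency_list into its type and job ID components. `parse_dependency` is hypothetical and covers only the dependency types listed above; an empty string models "Dependency=", which cancels the dependency.

```python
DEPENDENCY_TYPES = {"after", "afterany", "afternotok", "afterok"}


def parse_dependency(text: str):
    """Hypothetical: parse a dependency_list into (type, [job_id, ...]) pairs."""
    if not text:
        return []  # "Dependency=" clears all dependencies
    result = []
    for item in text.split(","):
        if item == "singleton":
            result.append(("singleton", []))
            continue
        kind, *job_ids = item.split(":")
        if kind not in DEPENDENCY_TYPES or not job_ids:
            raise ValueError(f"unsupported dependency {item!r}")
        result.append((kind, [int(j) for j in job_ids]))
    return result


print(parse_dependency("afterok:101:102,afterany:103"))
# [('afterok', [101, 102]), ('afterany', [103])]
```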

EligibleTime=<time_spec>
See StartTime.

ExcNodeList=<nodes>
Set the job's list of excluded nodes. Multiple node names may be specified using
simple node range expressions (e.g. "lx[10-20]"). Value may be cleared with blank
data value, "ExcNodeList=".

Features=<features>
Set the job's required node features. The list of features may include multiple
feature names separated by ampersand (AND) and/or vertical bar (OR) operators. For
example: Features="opteron&video" or Features="fast|faster". In the first example,
only nodes having both the feature "opteron" AND the feature "video" will be used.
There is no mechanism to specify that you want one node with feature "opteron" and
another node with feature "video" in case no node has both features. If only one
of a set of possible options should be used for all allocated nodes, then use the
OR operator and enclose the options within square brackets. For example:
"Features=[rack1|rack2|rack3|rack4]" might be used to specify that all nodes must
be allocated on a single rack of the cluster, but any of those four racks can be
used. A request can also specify the number of nodes needed with some feature by
appending an asterisk and count after the feature name. For example
"Features=graphics*4" indicates that at least four allocated nodes must have the
feature "graphics." Constraints with node counts may only be combined with AND
operators. Value may be cleared with blank data value, for example "Features=".
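The per-node meaning of the & and | operators can be sketched as a simple check against one node's feature set. This hypothetical helper deliberately rejects mixed operators, square brackets and node counts, because those describe constraints across the whole allocation rather than a single node.

```python
def node_satisfies(expr: str, node_features: set) -> bool:
    """Hypothetical: check one node against a Features expression using only & or only |."""
    if "&" in expr and "|" in expr:
        raise ValueError("mixed & and | are outside this sketch")
    if any(ch in expr for ch in "[]*"):
        raise ValueError("brackets and node counts are outside this sketch")
    if "|" in expr:
        return any(f in node_features for f in expr.split("|"))
    return all(f in node_features for f in expr.split("&"))


print(node_satisfies("opteron&video", {"opteron"}))  # False: AND needs both
print(node_satisfies("fast|faster", {"faster"}))     # True: OR needs either
```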

Geometry=<geo>
Reset the required job geometry. On Blue Gene the value should be three digits
separated by "x" or ",". The digits represent the allocation size in X, Y and Z
dimensions (e.g. "2x3x4").

Gres=<list>
Specifies a comma delimited list of generic consumable resources. The format of
each entry on the list is "name[:count[*cpu]]". The name is that of the consumable
resource. The count is the number of those resources with a default value of 1.
The specified resources will be allocated to the job on each node allocated unless
"*cpu" is appended, in which case the resources will be allocated on a per cpu
basis. The available generic consumable resources are configurable by the system
administrator. A list of available generic consumable resources will be printed
and the command will exit if the option argument is "help". Examples of use
include "Gres=gpus:2*cpu,disk=40G" and "Gres=help".
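The per-node versus per-CPU arithmetic can be illustrated with a small hypothetical sketch that parses one "name[:count[*cpu]]" entry. With 4 nodes and 32 CPUs allocated, "gpus:2" yields 2 per node on 4 nodes, or 8 in total, while "gpus:2*cpu" yields 2 per CPU on 32 CPUs, or 64.

```python
def parse_gres(entry: str):
    """Hypothetical: parse one "name[:count[*cpu]]" entry into (name, count, per_cpu)."""
    name, _, rest = entry.partition(":")
    per_cpu = rest.endswith("*cpu")
    count_text = rest[: -len("*cpu")] if per_cpu else rest
    return name, int(count_text) if count_text else 1, per_cpu  # count defaults to 1


def total_gres(entry: str, nodes: int, cpus: int):
    """Hypothetical: total units a job receives, per node or per CPU with "*cpu"."""
    name, count, per_cpu = parse_gres(entry)
    return name, count * (cpus if per_cpu else nodes)


print(total_gres("gpus:2", nodes=4, cpus=32))      # ('gpus', 8)
print(total_gres("gpus:2*cpu", nodes=4, cpus=32))  # ('gpus', 64)
```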

JobId=<job_list>
Identify the job(s) to be updated. The job_list may be a comma separated list of
job IDs. Either JobId or JobName is required.

Licenses=<name>
Specification of licenses (or other resources available on all nodes of the
cluster) as described in salloc/sbatch/srun man pages.

MinCPUsNode=<count>
Set the job's minimum number of CPUs per node to the specified value.

MinMemoryCPU=<megabytes>
Set the job's minimum real memory required per allocated CPU to the specified
value. Either MinMemoryCPU or MinMemoryNode may be set, but not both.

MinMemoryNode=<megabytes>
Set the job's minimum real memory required per node to the specified value. Either
MinMemoryCPU or MinMemoryNode may be set, but not both.

MinTmpDiskNode=<megabytes>
Set the job's minimum temporary disk space required per node to the specified
value. Only the Slurm administrator or root can change this parameter.

JobName=<name>
Identify the name of jobs to be modified or set the job's name to the specified
value. When used to identify jobs to be modified, all jobs belonging to all users
are modified unless the UserID option is used to identify a specific user. Either
JobId or JobName is required.

Nice[=delta]
Adjust job's priority by the specified value. Default value is 100. The adjustment
range is from -10000 (highest priority) to 10000 (lowest priority). Nice value
changes are not additive, but overwrite any prior nice value and are applied to the
job's base priority. Only privileged users, Slurm administrator or root, can
specify a negative adjustment.
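The non-additive behavior can be illustrated with a sketch that always recomputes from the base priority. It assumes, as an illustration rather than a statement of Slurm internals, that the nice value is subtracted from the base priority; `apply_nice` is hypothetical.

```python
def apply_nice(base_priority: int, delta: int = 100, privileged: bool = False) -> int:
    """Hypothetical: effective priority after setting a nice value.

    Assumption: the nice value is subtracted from the base priority, so a
    positive delta lowers priority and a negative one raises it.
    """
    if not -10000 <= delta <= 10000:
        raise ValueError("nice must be between -10000 and 10000")
    if delta < 0 and not privileged:
        raise PermissionError("only privileged users may set a negative nice")
    # Not additive: always computed from the base priority, never from a
    # previously adjusted value.
    return base_priority - delta


print(apply_nice(5000))       # 4900 (default delta of 100)
print(apply_nice(5000, 200))  # 4800, not 4700: replaces the earlier value
```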

NodeList=<nodes>
Change the nodes allocated to a running job to shrink its size. The specified
list of nodes must be a subset of the nodes currently allocated to the job.
Multiple node names may be specified using simple node range expressions (e.g.
"lx[10-20]"). After a job's allocation is reduced, subsequent srun commands must
explicitly specify node and task counts which are valid for the new allocation.

NumCPUs=<min_count>[-<max_count>]
Set the job's minimum and optionally maximum count of CPUs to be allocated.

NumNodes=<min_count>[-<max_count>]
Set the job's minimum and optionally maximum count of nodes to be allocated. If
the job is already running, use this to specify a node count less than currently
allocated and resources previously allocated to the job will be relinquished. After
a job's allocation is reduced, subsequent srun commands must explicitly specify
node and task counts which are valid for the new allocation. Also see the NodeList
parameter above.

NumTasks=<count>
Set the job's count of required tasks to the specified value.

Partition=<name>
Set the job's partition to the specified value.

Priority=<number>
Set the job's priority to the specified value. Note that a job priority of zero
prevents the job from ever being scheduled. By setting a job's priority to zero it
is held. Set the priority to a non-zero value to permit it to run. Explicitly
setting a job's priority clears any previously set nice value and removes the
priority/multifactor plugin's ability to manage a job's priority. In order to
restore the priority/multifactor plugin's ability to manage a job's priority, hold
and then release the job. Only the Slurm administrator or root can increase a job's
priority.

QOS=<name>
Set the job's QOS (Quality Of Service) to the specified value. Value may be
cleared with blank data value, "QOS=".

ReqNodeList=<nodes>
Set the job's list of required nodes. Multiple node names may be specified using
simple node range expressions (e.g. "lx[10-20]"). Value may be cleared with blank
data value, "ReqNodeList=".

Requeue=<0|1>
Stipulates whether a job should be requeued after a node failure: 0 for no, 1 for
yes.

ReservationName=<name>
Set the job's reservation to the specified value. Value may be cleared with blank
data value, "ReservationName=".

Rotate=<yes|no>
Permit the job's geometry to be rotated. Possible values are "YES" and "NO".

Shared=<yes|no>
Set the job's ability to share nodes with other jobs. Possible values are "YES"
and "NO". This option can only be changed for pending jobs.

StartTime=<time_spec>
Set the job's earliest initiation time. It accepts times of the form HH:MM:SS to
run a job at a specific time of day (seconds are optional). (If that time is
already past, the next day is assumed.) You may also specify midnight, noon, fika
(3 PM) or teatime (4 PM) and you can have a time-of-day suffixed with AM or PM for
running in the morning or the evening. You can also say what day the job will be
run, by specifying a date of the form MMDDYY or MM/DD/YY or MM.DD.YY, or a date and
time as YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now + count
time-units, where the time-units can be minutes, hours, days, or weeks and you can
tell Slurm to run the job today with the keyword today and to run the job tomorrow
with the keyword tomorrow.

Notes on date/time specifications:
- although the 'seconds' field of the HH:MM:SS time specification is allowed by
the code, note that the poll time of the Slurm scheduler is not precise enough to
guarantee dispatch of the job on the exact second. The job will be eligible to
start on the next poll following the specified time. The exact poll interval
depends on the Slurm scheduler (e.g., 60 seconds with the default sched/builtin).
- if no time (HH:MM:SS) is specified, the default is (00:00:00).
- if a date is specified without a year (e.g., MM/DD) then the current year is
assumed, unless the combination of MM/DD and HH:MM:SS has already passed for that
year, in which case the next year is used.
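A small subset of the StartTime forms can be resolved with Python's datetime module. `resolve_start_time` is a hypothetical sketch: it handles HH:MM[:SS], the named times midnight, noon, fika and teatime, and "now + count time-units", applying the next-day rule when a time of day has already passed. Dates, AM/PM suffixes, today and tomorrow are not handled.

```python
import re
from datetime import datetime, timedelta

NAMED_TIMES = {"midnight": "00:00", "noon": "12:00", "fika": "15:00", "teatime": "16:00"}


def resolve_start_time(spec: str, now: datetime) -> datetime:
    """Hypothetical: resolve a small subset of StartTime forms relative to `now`."""
    spec = spec.strip().lower()
    relative = re.fullmatch(r"now\s*\+\s*(\d+)\s*(minutes|hours|days|weeks)", spec)
    if relative:
        count, unit = relative.groups()
        return now + timedelta(**{unit: int(count)})
    fields = [int(f) for f in NAMED_TIMES.get(spec, spec).split(":")]
    if len(fields) not in (2, 3):
        raise ValueError(f"unsupported time {spec!r}")
    hour, minute, second = (fields + [0])[:3]  # seconds are optional
    candidate = now.replace(hour=hour, minute=minute, second=second, microsecond=0)
    if candidate < now:  # time of day already past: assume the next day
        candidate += timedelta(days=1)
    return candidate


now = datetime(2024, 5, 1, 14, 0)
print(resolve_start_time("teatime", now))        # 2024-05-01 16:00:00
print(resolve_start_time("09:30", now))          # 2024-05-02 09:30:00
print(resolve_start_time("now + 2 hours", now))  # 2024-05-01 16:00:00
```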

Switches=<count>[@<max-time-to-wait>]
When a tree topology is used, this defines the maximum count of switches desired
for the job allocation. If Slurm finds an allocation containing more switches than
the count specified, the job remains pending until it either finds an allocation
with the desired switch count or the time limit expires. By default there is no switch
count limit and no time limit delay. Set the count to zero in order to clear any
previously set count (disabling the limit). The job's maximum time delay may be
limited by the system administrator using the SchedulerParameters configuration
parameter with the max_switch_wait parameter option. Also see wait-for-switch.

TimeLimit=<time>
The job's time limit. Output format is [days-]hours:minutes:seconds or
"UNLIMITED". Acceptable input formats (for the update command) are minutes,
minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes or
days-hours:minutes:seconds. Time resolution is one minute and second values are
rounded up to the next minute. If changing the time limit of a job, either specify
a new time limit value or precede the time with a "+" or "-" to increment or
decrement the current time limit (e.g. "TimeLimit=+30"). In order to increment or
decrement the current time limit, the JobId specification must precede the
TimeLimit specification. Only the Slurm administrator or root can increase a job's
TimeLimit.
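The input formats and rounding rule can be made concrete with a hypothetical converter to whole minutes. It assumes that in the days-hours forms the field after the dash is hours, as the format names state; rejecting a negative result from a "-" adjustment is a choice of this sketch and may differ from what Slurm does.

```python
def time_limit_minutes(spec: str):
    """Hypothetical: convert a TimeLimit input to whole minutes (None = UNLIMITED)."""
    if spec.upper() == "UNLIMITED":
        return None
    days = hours = minutes = seconds = 0
    if "-" in spec:
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if not 1 <= len(fields) <= 3:
            raise ValueError(f"bad time limit {spec!r}")
        hours, minutes, seconds = (fields + [0, 0])[:3]
    else:
        fields = [int(f) for f in spec.split(":")]
        if len(fields) == 1:
            (minutes,) = fields
        elif len(fields) == 2:
            minutes, seconds = fields
        elif len(fields) == 3:
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"bad time limit {spec!r}")
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return -(-total_seconds // 60)  # ceiling division: seconds round up


def apply_time_limit(current_minutes: int, spec: str):
    """Hypothetical: apply an absolute value or a "+N"/"-N" adjustment."""
    if spec[:1] in ("+", "-"):
        delta = time_limit_minutes(spec[1:])
        if delta is None:
            raise ValueError("cannot add or subtract UNLIMITED")
        new = current_minutes + delta if spec[0] == "+" else current_minutes - delta
        if new < 0:
            raise ValueError("time limit would become negative")
        return new
    return time_limit_minutes(spec)


print(time_limit_minutes("1:30"))    # 2 (90 seconds rounds up)
print(apply_time_limit(60, "+30"))   # 90
```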

UserID=<UID or name>
Used with the JobName option to identify jobs to be modified. Either a user name
or numeric ID (UID) may be specified.

WCKey=<key>
Set the job's workload characterization key to the specified value.

NOTE: The "show" command, when used with the "job" or "job <jobid>"
entity displays detailed information about a job or jobs. Much of this information
may be modified using the "update job" command as described above. However, the
following fields displayed by the show job command are read-only and cannot be
modified:

AllocNode:Sid
Local node and system id making the resource allocation.

BatchFlag
Jobs submitted using the sbatch command have BatchFlag set to 1. Jobs submitted
using other commands have BatchFlag set to 0.

CoreSpec=<count>
Number of cores to reserve per node for system use. The job will be charged for
these cores, but be unable to use them. Will be reported as "*" if not
constrained.

EndTime
The time the job is expected to terminate based on the job's time limit. When the
job ends sooner, this field will be updated with the actual end time.

ExitCode=<exit>:<sig>
Exit status reported for the job by the wait() function. The first number is the
exit code, typically as set by the exit() function. The second number is the
signal that caused the process to terminate, if it was terminated by a signal.
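A sketch of how a raw wait() status maps onto the <exit>:<sig> pair, using the POSIX status helpers in Python's os module (Unix only). It assumes the exit field is shown as 0 when a signal terminated the process; `exit_code_field` is hypothetical.

```python
import os


def exit_code_field(status: int) -> str:
    """Hypothetical: format a raw wait() status as "<exit>:<sig>" (POSIX only)."""
    if os.WIFSIGNALED(status):
        # Assumption: the exit field is reported as 0 when a signal ended the process.
        return f"0:{os.WTERMSIG(status)}"
    if os.WIFEXITED(status):
        return f"{os.WEXITSTATUS(status)}:0"
    raise ValueError("process has not terminated")


# On Unix, os.system() returns the raw wait status of the shell it ran.
print(exit_code_field(os.system("exit 3")))  # 3:0
```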

GroupId
The group under which the job was submitted.

JobState
The current state of the job.

NodeList
The list of nodes allocated to the job.

NodeListIndices
Exposes the internal indices into the node table associated with the node(s)
allocated to the job.

NtasksPerN:B:S:C=
<tasks_per_node>:<tasks_per_baseboard>:<tasks_per_socket>:<tasks_per_core>
Specifies the number of tasks to be started per hardware component (node,
baseboard, socket and core). Unconstrained values may be shown as "0" or "*".

PreemptTime
Time at which the job was signaled that it was selected for preemption. (Meaningful
only for PreemptMode=CANCEL when the partition or QOS with which the job is
associated has a GraceTime value designated.)

PreSusTime
Time the job ran prior to the last suspend.

Reason
The reason the job is not running, e.g. waiting for "Resources".

ReqB:S:C:T=
<baseboard_count>:<socket_per_baseboard_count>:<core_per_socket_count>:<thread_per_core_count>
Specifies the count of various hardware components requested by the job.
Unconstrained values may be shown as "0" or "*".

SecsPreSuspend=<seconds>
If the job is suspended, this is the run time accumulated by the job (in seconds)
prior to being suspended.

Socks/Node=<count>
Count of desired sockets per node.

SubmitTime
The time and date stamp (in Coordinated Universal Time, UTC) at which the job was
submitted. The format of the output is identical to that of the EndTime field.

NOTE: If a job is requeued, the submit time is reset. To obtain the original
submit time, use the "sacct -j <job_id>[.<step_id>]" command together with the -D
(--duplicates) option to display all duplicate entries for a job.

SuspendTime
Time the job was last suspended or resumed.

UserId
The user under which the job was submitted.

NOTE on information displayed for various job states:
When you request the "show job" function, the scontrol process sends an RPC request
to slurmctld with a REQUEST_JOB_INFO message type. If the job is PENDING, the reply
includes some detailed information such as min_nodes, min_procs, cpus_per_task, etc.
If the job is in any other state (for example RUNNING or COMPLETE), the code
explicitly returns zero for these values, since they are meaningless once the job
resources have been allocated and the job has started.
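"show job" prints whitespace-separated Key=Value pairs, so a script can split the output into a dictionary. This is a simplified Python sketch, and parse_show_output is a hypothetical name. It assumes no value contains embedded spaces; free-text fields may break this. It splits each token only at its first "=", so keys like "AllocNode:Sid" survive intact.

```python
from typing import Dict


def parse_show_output(text: str) -> Dict[str, str]:
    """Collect Key=Value tokens from "scontrol show job" style output."""
    fields: Dict[str, str] = {}
    for token in text.split():
        key, sep, value = token.partition("=")  # split at the first "=" only
        if sep:
            fields[key] = value
    return fields
```

Applied to the "show job 71701" output in EXAMPLES, this yields "snowflake:4702" for the "AllocNode:Sid" key.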

SPECIFICATIONS FOR UPDATE COMMAND, STEPS

StepId=<job_id>[.<step_id>]
Identify the step to be updated. If the job_id is given, but no step_id is
specified then all steps of the identified job will be modified. This
specification is required.

CompFile=<completion file>
Update a step with information about the step's completion. This can be useful if step
statistics aren't directly available through a jobacct_gather plugin. The file is
space-delimited; the format for Version 1 is as follows:

1 34461 0 2 0 3 1361906011 1361906015 1 1 3368 13357 /bin/sleep
A B C D E F G H I J K L M

Field Descriptions:

A file version
B ALPS apid
C inblocks
D outblocks
E exit status
F number of allocated CPUs
G start time
H end time
I utime
J stime
K maxrss
L uid
M command name
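The Version 1 CompFile layout above is a single space-delimited record, so it maps naturally onto a fixed list of field names. In this Python sketch, the field labels and parse_compfile_v1 are hypothetical. The command name is taken as the remainder of the line in case it contains spaces. That is an assumption, not documented behaviour.

```python
from typing import Dict, Union

# Labels for fields A..M of a Version 1 completion file (names are illustrative).
COMPFILE_V1_FIELDS = [
    "file_version", "alps_apid", "inblocks", "outblocks", "exit_status",
    "ncpus", "start_time", "end_time", "utime", "stime", "maxrss", "uid",
    "command",
]


def parse_compfile_v1(line: str) -> Dict[str, Union[int, str]]:
    """Parse one Version 1 completion record into a dict (hypothetical helper)."""
    # Split into at most 13 pieces; the last one (command name) keeps any spaces.
    parts = line.split(maxsplit=len(COMPFILE_V1_FIELDS) - 1)
    if len(parts) != len(COMPFILE_V1_FIELDS):
        raise ValueError(f"expected {len(COMPFILE_V1_FIELDS)} fields, got {len(parts)}")
    record: Dict[str, Union[int, str]] = {
        name: (value if name == "command" else int(value))
        for name, value in zip(COMPFILE_V1_FIELDS, parts)
    }
    if record["file_version"] != 1:
        raise ValueError("only file version 1 is described")
    return record
```

For the sample line above, end_time minus start_time gives a 4-second run.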

TimeLimit=<time>
The job's time limit. Output format is [days-]hours:minutes:seconds or
"UNLIMITED". Input format (for update command) is minutes, minutes:seconds,
hours:minutes:seconds, days-hours, days-hours:minutes or
days-hours:minutes:seconds. Time resolution is one minute and second values are
rounded up to the next minute. If changing the time limit of a step, either
specify a new time limit value or precede the time with a "+" or "-" to increment
or decrement the current time limit (e.g. "TimeLimit=+30"). In order to increment
or decrement the current time limit, the StepId specification must precede the
TimeLimit specification.

SPECIFICATIONS FOR UPDATE COMMAND, NODES

NodeName=<name>
Identify the node(s) to be updated. Multiple node names may be specified using
simple node range expressions (e.g. "lx[10-20]"). This specification is required.
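Node range expressions such as "lx[10-20]" expand to one name per number. The Python sketch below is a simplified, hypothetical expander. It handles comma-separated names with one bracket group per name and comma-separated ranges inside the brackets. It also keeps the zero padding of each range start. On a live system, the "scontrol show hostnames" command shown in EXAMPLES is the authoritative way to expand such an expression.

```python
import re
from typing import List

_NAME_WITH_RANGE = re.compile(r"([^\[\]]*)\[([^\]]+)\](.*)")


def _split_top_level(expr: str) -> List[str]:
    """Split on commas that are not inside [...]."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(expr[start:i])
            start = i + 1
    parts.append(expr[start:])
    return [p for p in parts if p]


def expand_hostlist(expr: str) -> List[str]:
    """Expand e.g. "lx[10-12]" to ["lx10", "lx11", "lx12"] (one bracket group per name)."""
    hosts: List[str] = []
    for part in _split_top_level(expr):
        match = _NAME_WITH_RANGE.fullmatch(part)
        if match is None:
            hosts.append(part)  # plain node name, no range
            continue
        prefix, body, suffix = match.groups()
        for item in body.split(","):
            lo, _, hi = item.partition("-")
            width = len(lo)  # keep leading zeros, e.g. "01-03" -> 01, 02, 03
            for n in range(int(lo), int(hi or lo) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts
```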

Features=<features>
Identify feature(s) to be associated with the specified node. Any previously
defined feature(s) will be overwritten with the new value. Features assigned via
scontrol will only persist across a restart of the slurmctld daemon if it is started
with the -R option and its state files are preserved, or across slurmctld's receipt
of a SIGHUP. Update slurm.conf with any changes meant to be persistent across normal
restarts of slurmctld or the execution of scontrol reconfig.

Gres=<gres>
Identify generic resources to be associated with the specified node. Any
previously defined generic resources will be overwritten with the new value.
Specifications for multiple generic resources should be comma separated. Each
resource specification consists of a name followed by an optional colon with a
numeric value (default value is one) (e.g. "Gres=bandwidth:10000,gpus"). Generic
resources assigned via scontrol will only persist across a restart of the slurmctld
daemon if it is started with the -R option and its state files are preserved, or
across slurmctld's receipt of a SIGHUP. Update slurm.conf with any changes meant to
be persistent across normal restarts of slurmctld or the execution of scontrol reconfig.
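Gres uses a name[:count] syntax, and the Licenses reservation option described later uses the same shape. Both can be parsed as below. parse_name_counts is a hypothetical helper. It covers only the name-plus-optional-count form documented here, not any richer Gres syntax a given Slurm version may accept.

```python
from typing import Dict


def parse_name_counts(spec: str) -> Dict[str, int]:
    """Parse "name[:count],..." into {name: count}; a missing count means one."""
    result: Dict[str, int] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, count = item.partition(":")
        result[name] = int(count) if sep else 1
    return result
```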

Reason=<reason>
Identify the reason the node is in a "DOWN", "DRAINED", "DRAINING", "FAILING" or
"FAIL" state. Use quotes to enclose a reason having more than one word.

State=<state>
Identify the state to be assigned to the node. Possible node states are "NoResp",
"ALLOC", "ALLOCATED", "COMPLETING", "DOWN", "DRAIN", "ERROR", "FAIL", "FAILING",
"FUTURE", "IDLE", "MAINT", "MIXED", "PERFCTRS/NPC", "RESERVED", "POWER_DOWN",
"POWER_UP", "RESUME" or "UNDRAIN". Not all of those states can be set using the
scontrol command; only the following can: "NoResp", "DOWN", "DRAIN", "FAIL", "FUTURE",
"RESUME", "POWER_DOWN", "POWER_UP" and "UNDRAIN". If a node is in a "MIXED" state,
it usually means the node is in multiple states. For instance, if only part of the
node is "ALLOCATED" and the rest of the node is "IDLE", the state will be "MIXED".
If you want to remove a node from service, you typically want to set its state to
"DRAIN". "FAILING" is similar to "DRAIN" except that some applications will seek
to relinquish those nodes before the job completes. "PERFCTRS/NPC" indicates that
Network Performance Counters associated with this node are in use, rendering this
node unusable for any other jobs. "RESERVED" indicates the node is in an
advance reservation and not generally available. "RESUME" is not an actual node
state, but will change a node state from "DRAINED", "DRAINING", "DOWN" or "MAINT"
to either "IDLE" or "ALLOCATED" state as appropriate. "UNDRAIN" clears the node
from being drained (like "RESUME"), but will not change the node's base state (e.g.
"DOWN"). Setting a node "DOWN" will cause all running and suspended jobs on that
node to be terminated. "POWER_DOWN" and "POWER_UP" will use the configured
SuspendProg and ResumeProg programs to explicitly place a node in or out of a power
saving mode. If a node is already in the process of being powered up or down, the
command will have no effect until the configured ResumeTimeout or SuspendTimeout is
reached. The "NoResp" state will only set the "NoResp" flag for a node without
changing its underlying state. While all of the above states are valid, some of
them are not valid new node states given their prior state. If the node state code
printed is followed by "~", this indicates the node is presently in a power saving
mode (typically running at reduced frequency). If the node state code is followed
by "#", this indicates the node is presently being powered up or configured. If
the node state code is followed by "$", this indicates the node is currently in a
reservation with a flag value of "maintenance" or is scheduled to be rebooted.
Generally only "DRAIN", "FAIL" and "RESUME" should be used. NOTE: The scontrol
command should not be used to change node state on Cray systems. Use Cray tools
such as xtprocadmin instead.
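When post-processing scontrol output, the state suffixes described above ("~", "#", "$") can be separated from the base state code. This is a hypothetical Python sketch. It covers only the three suffixes documented in this entry.

```python
from typing import List, Tuple

# Suffixes documented for printed node state codes.
STATE_SUFFIXES = {
    "~": "power saving mode",
    "#": "being powered up or configured",
    "$": "maintenance reservation or scheduled reboot",
}


def decode_node_state(code: str) -> Tuple[str, List[str]]:
    """Strip trailing state suffixes, returning (base_state, meanings)."""
    meanings: List[str] = []
    while code and code[-1] in STATE_SUFFIXES:
        meanings.append(STATE_SUFFIXES[code[-1]])
        code = code[:-1]
    return code, meanings
```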

Weight=<weight>
Identify the weight to be associated with the specified nodes. This allows dynamic
changes to node weights, which will be used for subsequent node allocation
decisions. Weight assigned via scontrol will only persist across a restart of the
slurmctld daemon if it is started with the -R option and its state files are
preserved, or across slurmctld's receipt of a SIGHUP. Update slurm.conf with any
changes meant to be persistent across normal restarts of slurmctld or the execution
of scontrol reconfig.

SPECIFICATIONS FOR UPDATE COMMAND, FRONTEND

FrontendName=<name>
Identify the front end node to be updated. This specification is required.

Reason=<reason>
Identify the reason the node is in a "DOWN" or "DRAIN" state. Use quotes to
enclose a reason having more than one word.

State=<state>
Identify the state to be assigned to the front end node. Possible values are
"DOWN", "DRAIN" or "RESUME". If you want to remove a front end node from service,
you typically want to set its state to "DRAIN". "RESUME" is not an actual node
state, but will return a "DRAINED", "DRAINING", or "DOWN" front end node to
service, either "IDLE" or "ALLOCATED" state as appropriate. Setting a front end
node "DOWN" will cause all running and suspended jobs on that node to be
terminated.

SPECIFICATIONS FOR CREATE, UPDATE, AND DELETE COMMANDS, PARTITIONS

AllowGroups=<name>
Identify the user groups which may use this partition. Multiple groups may be
specified in a comma separated list. To permit all groups to use the partition
specify "AllowGroups=ALL".

AllocNodes=<name>
Comma separated list of nodes from which users can execute jobs in the partition.
Node names may be specified using the node range expression syntax described above.
The default value is "ALL".

Alternate=<partition name>
Alternate partition to be used if the state of this partition is "DRAIN" or
"INACTIVE". The value "NONE" will clear a previously set alternate partition.

Default=<yes|no>
Specify if this partition is to be used by jobs which do not explicitly identify a
partition to use. Possible output values are "YES" and "NO". In order to change
the default partition of a running system, use the scontrol update command and set
Default=yes for the partition that you want to become the new default.

DefaultTime=<time>
Run time limit used for jobs that don't specify a value. If not set then MaxTime
will be used. Format is the same as for MaxTime.

DefMemPerCPU=<MB>
Set the default memory to be allocated per CPU for jobs in this partition. The
memory size is specified in megabytes.

DefMemPerNode=<MB>
Set the default memory to be allocated per node for jobs in this partition. The
memory size is specified in megabytes.

DisableRootJobs=<yes|no>
Specify if jobs can be executed as user root. Possible values are "YES" and "NO".

GraceTime=<seconds>
Specifies, in units of seconds, the preemption grace time to be extended to a job
which has been selected for preemption. The default value is zero, meaning no
preemption grace time is allowed on this partition or QOS. (Meaningful only for
PreemptMode=CANCEL)

Hidden=<yes|no>
Specify if the partition and its jobs should be hidden from view. Hidden
partitions will by default not be reported by Slurm APIs or commands. Possible
values are "YES" and "NO".

MaxMemPerCPU=<MB>
Set the maximum memory to be allocated per CPU for jobs in this partition. The
memory size is specified in megabytes.

MaxMemPerNode=<MB>
Set the maximum memory to be allocated per node for jobs in this partition. The
memory size is specified in megabytes.

MaxNodes=<count>
Set the maximum number of nodes which will be allocated to any single job in the
partition. Specify a number, "INFINITE" or "UNLIMITED". (On a Bluegene type system
this represents a c-node count.) Changing the MaxNodes of a partition has no
effect upon jobs that have already begun execution.

MaxTime=<time>
The maximum run time for jobs. Output format is [days-]hours:minutes:seconds or
"UNLIMITED". Input format (for update command) is minutes, minutes:seconds,
hours:minutes:seconds, days-hours, days-hours:minutes or
days-hours:minutes:seconds. Time resolution is one minute and second values are
rounded up to the next minute. Changing the MaxTime of a partition has no effect
upon jobs that have already begun execution.
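MaxTime uses the output form [days-]hours:minutes:seconds, as do the job and step TimeLimit fields. That form can be reproduced as follows. format_time_limit is a hypothetical helper that takes whole minutes, or None for "UNLIMITED". It assumes zero-padded two-digit hours, minutes and seconds.

```python
from typing import Optional


def format_time_limit(minutes: Optional[int]) -> str:
    """Render whole minutes as [days-]hours:minutes:seconds, or "UNLIMITED"."""
    if minutes is None:
        return "UNLIMITED"
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    hms = f"{hours:02d}:{mins:02d}:00"  # one-minute resolution, so seconds are 0
    return f"{days}-{hms}" if days else hms
```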

MinNodes=<count>
Set the minimum number of nodes which will be allocated to any single job in the
partition. (On a Bluegene type system this represents a c-node count.) Changing
the MinNodes of a partition has no effect upon jobs that have already begun
execution.

Nodes=<name>
Identify the node(s) to be associated with this partition. Multiple node names may
be specified using simple node range expressions (e.g. "lx[10-20]"). Note that
jobs may only be associated with one partition at any time. Specify a blank data
value to remove all nodes from a partition: "Nodes=". Changing the Nodes in a
partition has no effect upon jobs that have already begun execution.

PartitionName=<name>
Identify the partition to be updated. This specification is required.

PreemptMode=<mode>
Reset the mechanism used to preempt jobs in this partition if PreemptType is
configured to preempt/partition_prio. The default preemption mechanism is specified
by the cluster-wide PreemptMode configuration parameter. Possible values are
"OFF", "CANCEL", "CHECKPOINT", "REQUEUE" and "SUSPEND".

Priority=<count>
Jobs submitted to a higher priority partition will be dispatched before pending
jobs in lower priority partitions and if possible they will preempt running jobs
from lower priority partitions. Note that a partition's priority takes precedence
over a job's priority. The value may not exceed 65533.

RootOnly=<yes|no>
Specify if only allocation requests initiated by user root will be satisfied. This
can be used to restrict control of the partition to some meta-scheduler. Possible
values are "YES" and "NO".

ReqResv=<yes|no>
Specify if only allocation requests designating a reservation will be satisfied.
This is used to restrict partition usage to be allowed only within a reservation.
Possible values are "YES" and "NO".

Shared=<yes|no|exclusive|force>[:<job_count>]
Specify if nodes in this partition can be shared by multiple jobs. Possible values
are "YES", "NO", "EXCLUSIVE" and "FORCE". An optional job count specifies how many
jobs can be allocated to use each resource.

State=<up|down|drain|inactive>
Specify if jobs can be allocated nodes or queued in this partition. Possible
values are "UP", "DOWN", "DRAIN" and "INACTIVE".

UP Designates that new jobs may be queued on the partition, and that jobs may
be allocated nodes and run from the partition.

DOWN Designates that new jobs may be queued on the partition, but queued jobs
may not be allocated nodes and run from the partition. Jobs already
running on the partition continue to run. The jobs must be explicitly
canceled to force their termination.

DRAIN Designates that no new jobs may be queued on the partition (job
submission requests will be denied with an error message), but jobs
already queued on the partition may be allocated nodes and run. See also
the "Alternate" partition specification.

INACTIVE Designates that no new jobs may be queued on the partition, and jobs
already queued may not be allocated nodes and run. See also the
"Alternate" partition specification.
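The four partition states differ in two ways: whether new jobs may be queued, and whether queued jobs may be allocated nodes and run. The Python table below restates that matrix. PARTITION_STATES, can_queue and can_start are hypothetical names. The table does not cover jobs already running; as described above, those keep running under DOWN.

```python
from typing import Dict, Tuple

# state -> (new jobs may be queued, queued jobs may be allocated nodes and run)
PARTITION_STATES: Dict[str, Tuple[bool, bool]] = {
    "UP": (True, True),
    "DOWN": (True, False),
    "DRAIN": (False, True),
    "INACTIVE": (False, False),
}


def can_queue(state: str) -> bool:
    return PARTITION_STATES[state.upper()][0]


def can_start(state: str) -> bool:
    return PARTITION_STATES[state.upper()][1]
```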

SPECIFICATIONS FOR UPDATE COMMAND, POWERCAP

PowerCap=<count>
Set the number of watts the cluster is limited to. Specify a number, "INFINITE" to
enable the power capping logic without power restriction or "0" to disable the
power capping logic. Update slurm.conf with any changes meant to be persistent
across normal restarts of slurmctld or the execution of scontrol reconfig.

SPECIFICATIONS FOR CREATE, UPDATE, AND DELETE COMMANDS, RESERVATIONS

Reservation=<name>
Identify the name of the reservation to be created, updated, or deleted. This
parameter is required for update and is the only parameter for delete. For create,
if you do not want to give a reservation name, use "scontrol create res ..." and a
name will be created automatically.

Accounts=<account list>
List of accounts permitted to use the reserved nodes, for example
"Accounts=physcode1,physcode2". A user in any of the accounts may use the reserved
nodes. A new reservation must specify Users and/or Accounts. If both Users and
Accounts are specified, a job must match both in order to use the reservation.
Accounts can also be denied access to reservations by preceding all of the account
names with '-'. Alternatively, precede the equal sign with '-'. For example,
"Accounts=-physcode1,-physcode2" or "Accounts-=physcode1,physcode2" will permit any
account except physcode1 and physcode2 to use the reservation. You can add or
remove individual accounts from an existing reservation by using the update command
and adding a '+' or '-' sign before the '=' sign. If accounts are denied access to
a reservation (account name preceded by a '-'), then all other accounts are
implicitly allowed to use the reservation and it is not possible to also explicitly
specify allowed accounts.
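The allow/deny rules for Accounts can be modelled as a list plus a deny flag, and the same rules apply to Users below. This Python sketch interprets the value as it would be given when creating a reservation. On update, "+=" and "-=" instead add or remove individual entries, which the sketch does not model. parse_access_spec and is_permitted are hypothetical names.

```python
from typing import List, Tuple


def parse_access_spec(spec: str) -> Tuple[str, bool, List[str]]:
    """Split e.g. "Accounts-=a,b" into (key, deny, names) for a new reservation."""
    key, _, value = spec.partition("=")
    deny = key.endswith("-")          # "Accounts-=a,b" form
    key = key.rstrip("-")
    names = [n.strip() for n in value.split(",") if n.strip()]
    prefixed = [n.startswith("-") for n in names]
    if any(prefixed):
        if not all(prefixed):
            # Denying some names while allowing others is not possible.
            raise ValueError("either every name is prefixed with '-' or none is")
        deny = True                   # "Accounts=-a,-b" form
        names = [n[1:] for n in names]
    return key, deny, names


def is_permitted(name: str, deny: bool, names: List[str]) -> bool:
    """Deny lists implicitly allow everyone else; allow lists admit only their members."""
    return (name not in names) if deny else (name in names)
```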

BurstBuffer=<buffer_spec>[,<buffer_spec>,...]
Specification of burst buffer resources which are to be reserved. "buffer_spec"
consists of four elements: [plugin:][type:]#[units] "plugin" is the burst buffer
plugin name, currently either "cray" or "generic". If no plugin is specified, the
reservation applies to all configured burst buffer plugins. "type" specifies a
Cray generic burst buffer resource, for example "nodes". If "type" is not
specified, the number is a measure of storage space. The "units" may be "N"
(nodes), "GB" (gigabytes), "TB" (terabytes), "PB" (petabytes), etc. with the
default units being gigabytes for reservations of storage space. For example
"BurstBuffer=cray:2TB" (reserve 2TB of storage from the Cray plugin)
or "BurstBuffer=100GB" (reserve 100 GB of storage from all configured burst buffer
plugins). Jobs using this reservation are not restricted to these burst buffer
resources, but may use these reserved resources plus any which are generally
available.
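A buffer_spec has the shape [plugin:][type:]#[units]. The Python sketch below splits one or more of them. It assumes the only plugin names are "cray" and "generic", as listed above, so that a leading "cray:" is not mistaken for a type. The gigabyte default is applied only to storage amounts. BufferSpec and parse_burst_buffer are hypothetical names. No unit conversion is attempted.

```python
import re
from typing import List, NamedTuple, Optional

KNOWN_PLUGINS = {"cray", "generic"}
_AMOUNT = re.compile(r"(\d+)([A-Za-z]*)")


class BufferSpec(NamedTuple):
    plugin: Optional[str]   # None: applies to all configured plugins
    bb_type: Optional[str]  # None: the amount is storage space
    count: int
    units: Optional[str]


def parse_buffer_spec(spec: str) -> BufferSpec:
    parts = spec.strip().split(":")
    plugin = parts.pop(0) if len(parts) > 1 and parts[0] in KNOWN_PLUGINS else None
    bb_type = parts.pop(0) if len(parts) > 1 else None
    if len(parts) != 1:
        raise ValueError(f"malformed buffer_spec: {spec!r}")
    match = _AMOUNT.fullmatch(parts[0])
    if match is None:
        raise ValueError(f"malformed amount: {parts[0]!r}")
    count, units = int(match.group(1)), match.group(2) or None
    if bb_type is None and units is None:
        units = "GB"  # storage reservations default to gigabytes
    return BufferSpec(plugin, bb_type, count, units)


def parse_burst_buffer(value: str) -> List[BufferSpec]:
    return [parse_buffer_spec(s) for s in value.split(",") if s.strip()]
```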

CoreCnt=<num>
This option is only supported when SelectType=select/cons_res. Identify the number of
cores to be reserved. If NodeCnt is used, this is the total number of cores to
reserve, where the number of cores per node is CoreCnt/NodeCnt. If a nodelist is used,
this should be a list of core counts, one per node, e.g.
Nodes=node[1-5] CoreCnt=2,2,3,3,4
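The two CoreCnt forms above can be illustrated with a short Python sketch. cores_by_node pairs an already expanded node list with the per-node counts; here the list is written out by hand instead of parsing "node[1-5]". cores_per_node applies the CoreCnt/NodeCnt rule. Rejecting a count that does not divide evenly is an assumption of this sketch, not documented Slurm behaviour. Both names are hypothetical.

```python
from typing import Dict, List


def cores_by_node(nodes: List[str], core_cnt: str) -> Dict[str, int]:
    """Pair each node in an expanded nodelist with its entry in CoreCnt=a,b,c,..."""
    counts = [int(c) for c in core_cnt.split(",")]
    if len(counts) != len(nodes):
        raise ValueError("CoreCnt needs one entry per node")
    return dict(zip(nodes, counts))


def cores_per_node(core_cnt: int, node_cnt: int) -> int:
    """CoreCnt/NodeCnt when NodeCnt is used (uneven splits rejected by assumption)."""
    if core_cnt % node_cnt:
        raise ValueError("CoreCnt is not a multiple of NodeCnt")
    return core_cnt // node_cnt


nodes = ["node1", "node2", "node3", "node4", "node5"]  # Nodes=node[1-5], expanded by hand
layout = cores_by_node(nodes, "2,2,3,3,4")
print(layout, sum(layout.values()))  # per-node counts and total cores reserved
```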

Licenses=<license>
Specification of licenses (or other resources available on all nodes of the
cluster) which are to be reserved. License names can be followed by a colon and
count (the default count is one). Multiple license names should be comma separated
(e.g. "Licenses=foo:4,bar"). A new reservation must specify one or more resources
to be included: NodeCnt, Nodes and/or Licenses. If a reservation includes
Licenses, but no NodeCnt or Nodes, then the option Flags=LICENSE_ONLY must also be
specified. Jobs using this reservation are not restricted to these licenses, but
may use these reserved licenses plus any which are generally available.

NodeCnt=<num>[,num,...]
Identify number of nodes to be reserved. The number can include a suffix of "k" or
"K", in which case the number specified is multiplied by 1024. On BlueGene
systems, this number represents a c-node (compute node) count and will be rounded
up as needed to reserve whole nodes (midplanes). In order to optimize the topology
of the resource allocation on a new reservation (not on an updated reservation),
specific sizes required for the reservation may be specified. For example, if you
want to reserve 4096 c-nodes on a BlueGene system that can be used to allocate two
jobs each with 2048 c-nodes, specify "NodeCnt=2k,2k". A new reservation must
specify one or more resources to be included: NodeCnt, Nodes and/or Licenses.
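The "k"/"K" suffix on NodeCnt multiplies by 1024, so "NodeCnt=2k,2k" asks for two 2048-node pieces totalling 4096. Below is a hypothetical Python helper that applies this rule. It does not model the BlueGene rounding up to whole midplanes.

```python
from typing import List


def parse_node_cnt(spec: str) -> List[int]:
    """Parse NodeCnt=<num>[,num,...], where a "k"/"K" suffix multiplies by 1024."""
    sizes: List[int] = []
    for item in spec.split(","):
        item = item.strip()
        if item[-1:] in ("k", "K"):
            sizes.append(int(item[:-1]) * 1024)
        else:
            sizes.append(int(item))
    return sizes
```

sum(parse_node_cnt("2k,2k")) is 4096, matching the BlueGene example above.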

Nodes=<name>
Identify the node(s) to be reserved. Multiple node names may be specified using
simple node range expressions (e.g. "Nodes=lx[10-20]"). Specify a blank data value
to remove all nodes from a reservation: "Nodes=". A new reservation must specify
one or more resources to be included: NodeCnt, Nodes and/or Licenses. A
specification of "ALL" will reserve all nodes. Set Flags=PART_NODES and
PartitionName= in order for changes in the nodes associated with a partition to
also be reflected in the nodes associated with a reservation.

StartTime=<time_spec>
The start time for the reservation. A new reservation must specify a start time.
It accepts times of the form HH:MM:SS for a specific time of day (seconds are
optional). (If that time is already past, the next day is assumed.) You may also
specify midnight, noon, fika (3 PM) or teatime (4 PM) and you can have a
time-of-day suffixed with AM or PM for running in the morning or the evening. You
can also say on what day the reservation will start by specifying a date of the form
MMDDYY or MM/DD/YY or MM.DD.YY, or a date and time as YYYY-MM-DD[THH:MM[:SS]]. You can
also give times like now + count time-units, where the time-units can be minutes,
hours, days, or weeks, and you can tell Slurm to start the reservation today with the
keyword today or tomorrow with the keyword tomorrow. You cannot update the
StartTime of a reservation in ACTIVE state.
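A few of the StartTime forms above can be resolved to concrete datetimes with the Python standard library. This hypothetical sketch handles only three forms: "now" with an optional "+ count unit" offset, HH:MM[:SS], and YYYY-MM-DD[THH:MM[:SS]]. An HH:MM[:SS] time that has already passed rolls to the next day. Midnight, noon, fika, teatime, AM/PM, the MMDDYY-style dates, today and tomorrow are left out. The current time is passed in explicitly so results are reproducible.

```python
import re
from datetime import datetime, timedelta

_RELATIVE = re.compile(r"now\s*(?:\+\s*(\d+)\s*(minute|hour|day|week)s?)?", re.IGNORECASE)
_TIME_OF_DAY = re.compile(r"(\d{1,2}):(\d{2})(?::(\d{2}))?")


def resolve_start_time(spec: str, now: datetime) -> datetime:
    """Resolve a subset of StartTime forms relative to the given current time."""
    spec = spec.strip()
    match = _RELATIVE.fullmatch(spec)
    if match:
        if match.group(1) is None:
            return now
        unit = match.group(2).lower() + "s"  # minutes, hours, days or weeks
        return now + timedelta(**{unit: int(match.group(1))})
    match = _TIME_OF_DAY.fullmatch(spec)
    if match:
        candidate = now.replace(hour=int(match.group(1)), minute=int(match.group(2)),
                                second=int(match.group(3) or 0), microsecond=0)
        if candidate < now:
            candidate += timedelta(days=1)  # time already past: assume the next day
        return candidate
    return datetime.fromisoformat(spec)  # YYYY-MM-DD[THH:MM[:SS]]
```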

EndTime=<time_spec>
The end time for the reservation. A new reservation must specify an end time or a
duration. Valid formats are the same as for StartTime.

Duration=<time>
The length of a reservation. A new reservation must specify an end time or a
duration. Valid formats are minutes, minutes:seconds, hours:minutes:seconds,
days-hours, days-hours:minutes, days-hours:minutes:seconds, or UNLIMITED. Time
resolution is one minute and second values are rounded up to the next minute.
Output format is always [days-]hours:minutes:seconds.

PartitionName=<name>
Identify the partition to be reserved.

Flags=<flags>
Flags associated with the reservation. You can add or remove individual flags from
an existing reservation by adding a '+' or '-' sign before the '=' sign. For
example: Flags-=DAILY (NOTE: this shortcut is not supported for all flags).
Currently supported flags include:

ANY_NODES This is a reservation for burst buffers and/or licenses only and not
compute nodes. If this flag is set, a job using this reservation may
use the associated burst buffers and/or licenses plus any compute
nodes. If this flag is not set, a job using this reservation may use
only the nodes and licenses associated with the reservation.

DAILY Repeat the reservation at the same time every day.

FIRST_CORES Use the lowest numbered cores on a node only.

IGNORE_JOBS Ignore currently running jobs when creating the reservation. This
can be especially useful when reserving all nodes in the system for
maintenance.

LICENSE_ONLY See ANY_NODES.

MAINT Maintenance mode, which receives special accounting treatment. This
reservation is permitted to use resources that are already in another
reservation.

OVERLAP This reservation can be allocated resources that are already in
another reservation.

PART_NODES This flag can be used to reserve all nodes within the specified
partition. PartitionName and Nodes=ALL must be specified or this
option is ignored.

REPLACE Resources allocated to jobs are automatically replenished using idle
resources. This option can be used to maintain a constant number of
idle resources available for pending jobs (subject to availability of
idle resources). This should be used with the NodeCnt reservation
option; do not identify specific nodes to be included in the
reservation. This option is not supported on IBM Bluegene systems.

SPEC_NODES Reservation is for specific nodes (output only).

STATIC_ALLOC Once the nodes are selected for a reservation, they do not
change. Without this option, when one of the selected nodes goes
down, the reservation will select a new node to fill the spot.

TIME_FLOAT The reservation start time is relative to the current time and moves
forward through time (e.g. a StartTime=now+10minutes will always be
10 minutes in the future).

WEEKLY Repeat the reservation at the same time every week.
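The "+=" and "-=" shortcut for Flags behaves like set union and set difference. In this hypothetical Python sketch, a plain "Flags=" replaces the whole set; that is an assumption. The sketch also ignores the caveat above that the shortcut is not supported for all flags.

```python
from typing import Set


def update_flags(current: Set[str], assignment: str) -> Set[str]:
    """Apply "Flags=...", "Flags+=..." or "Flags-=..." to a set of flag names."""
    key, _, value = assignment.partition("=")
    names = {f.strip().upper() for f in value.split(",") if f.strip()}
    if key.endswith("+"):
        return current | names   # add the listed flags
    if key.endswith("-"):
        return current - names   # remove the listed flags
    return names                 # assumption: plain "=" replaces the set
```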

Features=<features>
Set the reservation's required node features. Multiple values may be "&" separated
if all features are required (AND operation) or separated by "|" if any of the
specified features are required (OR operation). Value may be cleared with blank
data value, "Features=".
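A node's feature set can be checked against a reservation Features value as sketched below. features_satisfied is hypothetical. The text above defines "&" (all required) and "|" (any required) separately, so the sketch rejects expressions that mix them rather than guessing a precedence.

```python
from typing import Set


def features_satisfied(expr: str, node_features: Set[str]) -> bool:
    """True if a node with node_features meets a Features= expression."""
    if not expr:
        return True  # "Features=" clears the requirement
    if "&" in expr and "|" in expr:
        raise ValueError("mixed '&' and '|' is outside this sketch")
    if "|" in expr:
        return any(f in node_features for f in expr.split("|"))  # OR
    return all(f in node_features for f in expr.split("&"))      # AND
```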

Users=<user list>
List of users permitted to use the reserved nodes, for example
"Users=jones1,smith2". A new reservation must specify Users and/or Accounts. If
both Users and Accounts are specified, a job must match both in order to use the
reservation. Users can also be denied access to reservations by preceding all of
the user names with '-'. Alternatively, precede the equal sign with '-'. For example,
"Users=-jones1,-smith2" or "Users-=jones1,smith2" will permit any user except jones1
and smith2 to use the reservation. You can add or remove individual users from an
existing reservation by using the update command and adding a '+' or '-' sign
before the '=' sign. If users are denied access to a reservation (user name
preceded by a '-'), then all other users are implicitly allowed to use the
reservation and it is not possible to also explicitly specify allowed users.

TRES=<tres_spec>
Comma-separated list of TRES required for the reservation. Currently supported TRES
types with reservations are: CPU, Node, License and BB. CPU and Node follow the
same format as the CoreCnt and NodeCnt parameters respectively. License names can be
followed by an equals sign '=' and a count:

License/<name1>=<count1>[,License/<name2>=<count2>,...]

BurstBuffer can be specified in a similar way as the BurstBuffer parameter. The only
difference is that the colon ':' should be replaced by an equals sign '=' in order to
follow the TRES format.

Some examples of TRES valid specifications:

TRES=cpu=5,bb/cray=4,license/iop1=1,license/iop2=3

TRES=node=5k,license/iop1=2

As specified in CoreCnt, if a nodelist is specified, cpu can be an array of core
numbers by node: nodes=compute[1-3] TRES=cpu=2,2,1,bb/cray=4,license/iop1=2

Please note that CPU, Node, License and BB can override the CoreCnt, NodeCnt, Licenses
and BurstBuffer parameters respectively. Also, CPU represents CoreCnt in a
reservation and will be adjusted if your nodes have multiple threads per core.
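A TRES value uses commas both between TRES entries and inside per-node cpu lists. A parser must therefore attach bare numbers to the preceding key. parse_tres is a hypothetical Python sketch of that rule. It takes the text after "TRES=" and returns each key with its list of raw values, without interpreting "k" suffixes or units.

```python
from typing import Dict, List, Optional


def parse_tres(spec: str) -> Dict[str, List[str]]:
    """Parse the text after "TRES=" into {tres_name: [raw values]}."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for token in spec.split(","):
        token = token.strip()
        if "=" in token:
            current, _, value = token.partition("=")
            result[current] = [value]
        elif current is not None:
            result[current].append(token)  # extra per-node entry, e.g. cpu=2,2,1
        else:
            raise ValueError(f"value without a TRES name: {token!r}")
    return result
```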

SPECIFICATIONS FOR UPDATE BLOCK/SUBMP

Bluegene systems only!

BlockName=<name>
Identify the bluegene block to be updated. This specification is required.

State=<free|error|recreate|remove|resume>
This will update the state of a bluegene block (e.g. update BlockName=RMP0
STATE=ERROR). WARNING!!!! With the exception of the RESUME state, all other state
values will cancel any running job on the block!

FREE Return the block to a free state.

ERROR Prevent jobs from running on the block.

RECREATE Destroy the current block and create a new one to take its place.

REMOVE Free and remove the block from the system. If the block is smaller than
a midplane, every block on that midplane will be removed. (Only available
on dynamically laid out systems.)

RESUME If a block is in ERROR state RESUME will return the block to its previous
usable state (FREE or READY).

SubMPName=<name>
Identify the bluegene ionodes to be updated (e.g. bg000[0-3]). This specification
is required. NOTE: Even on BGQ, where node names are given in bg0000[00000] format,
this option takes an ionode name such as bg0000[0].

SPECIFICATIONS FOR UPDATE COMMAND, LAYOUTS

Layout=<name>
Identify the layout to be updated. This specification is required.

Entity=<entity list>
Identify the entities to be updated. This specification is required.

Key=<value>
Keys/Values to update for the entities. The format must respect the layout.d
configuration files. Key=Type cannot be updated. At least one Key/Value is
required, several can be set.

SPECIFICATIONS FOR SHOW COMMAND, LAYOUTS

Without options, lists all configured layouts. With a layout specified, shows entities
with following options:

Key=<value>
Keys/Values to update for the entities. The format must respect the layout.d
configuration files. Key=Type cannot be updated. One Key/Value is required, several
can be set.

Entity=<value>
Entities to show, default is not used. Can be set to "*".

Type=<value>
Type of entities to show, default is not used.

nolayout
If not used, only entities defining the tree are shown. With the option, only
leaves are shown.

DESCRIPTION FOR SHOW COMMAND, NODES

The meaning of the energy information is as follows:

CurrentWatts
The instantaneous power consumption of the node at the time of the last node energy
accounting sample, in watts.

LowestJoules
The energy consumed by the node between the last time it was powered on and the
last time it was registered by slurmd, in joules.

ConsumedJoules
The energy consumed by the node between the last time it was registered by the
slurmd daemon and the last node energy accounting sample, in joules.

If the reported value is "n/s" (not supported), the node does not support the configured
AcctGatherEnergyType plugin. If the reported value is zero, energy accounting for nodes is
disabled.

The meaning of the external sensors information is as follows:

ExtSensorsJoules
The energy consumed by the node between the last time it was powered on and the
last external sensors plugin node sample, in joules.

ExtSensorsWatts
The instantaneous power consumption of the node at the time of the last external
sensors plugin node sample, in watts.

ExtSensorsTemp
The temperature of the node at the time of the last external sensors plugin node
sample, in degrees Celsius.

If the reported value is "n/s" (not supported), the node does not support the configured
ExtSensorsType plugin.

The meaning of the resource specialization information is as follows:

CPUSpecList
The list of Slurm abstract CPU IDs on this node reserved for exclusive use by the
Slurm compute node daemons (slurmd, slurmstepd).

MemSpecLimit
The combined memory limit, in megabytes, on this node for the Slurm compute node
daemons (slurmd, slurmstepd).

The meaning of the memory information is as follows:

RealMemory
The total memory, in MB, on the node.

AllocMem
The total memory, in MB, currently allocated by jobs on the node.

FreeMem
The total memory, in MB, currently free on the node as reported by the OS.

ENVIRONMENT VARIABLES


Some scontrol options may be set via environment variables. These environment variables,
along with their corresponding options, are listed below. (Note: Command-line options will
always override these settings.)

SCONTROL_ALL -a, --all

SLURM_BITSTR_LEN Specifies the string length to be used for holding a job array's task
ID expression. The default value is 64 bytes. A value of 0 will
print the full expression with any length required. Larger values may
adversely impact the application performance.

SLURM_CLUSTERS Same as --clusters.

SLURM_CONF The location of the Slurm configuration file.

SLURM_TIME_FORMAT Specify the format used to report time stamps. A value of standard,
the default value, generates output in the form
"year-month-dateThour:minute:second". A value of relative returns
only "hour:minute:second" if it is the current day. For other dates in the
current year it prints the "hour:minute" preceded by "Tomorr"
(tomorrow), "Ystday" (yesterday), the name of the day for the coming
week (e.g. "Mon", "Tue", etc.), otherwise the date (e.g. "25 Apr").
For other years it returns a date month and year without a time (e.g.
"6 Jun 2012"). All of the time stamps use a 24 hour format.

A valid strftime() format can also be specified. For example, a value
of "%a %T" will report the day of the week and a time stamp (e.g. "Mon
12:34:56").

SLURM_TOPO_LEN Specify the maximum size of the line when printing Topology. If not
set, the default value "512" will be used.
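For the SLURM_TIME_FORMAT values described above, the "standard" layout and a custom strftime() value can be mimicked with Python's datetime.strftime, which delegates to the platform's C strftime(). format_timestamp is a hypothetical helper. "relative" output is not modelled. "%T" assumes a C library that supports it (glibc and the BSDs do).

```python
from datetime import datetime


def format_timestamp(ts: datetime, slurm_time_format: str = "standard") -> str:
    """Mimic the "standard" layout or a custom strftime() SLURM_TIME_FORMAT value."""
    if slurm_time_format == "standard":
        return ts.strftime("%Y-%m-%dT%H:%M:%S")  # year-month-dateThour:minute:second
    if slurm_time_format == "relative":
        raise NotImplementedError("relative output is not modelled in this sketch")
    return ts.strftime(slurm_time_format)  # e.g. "%a %T"
```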

AUTHORIZATION


When using the Slurm db, users who have an AdminLevel defined (Operator or Admin) and users
who are account coordinators are given the authority to view and modify jobs,
reservations, nodes, etc., as defined in the following table - regardless of whether a
PrivateData restriction has been defined in the slurm.conf file.

scontrol show job(s): Admin, Operator, Coordinator
scontrol update job: Admin, Operator, Coordinator
scontrol requeue: Admin, Operator, Coordinator
scontrol show step(s): Admin, Operator, Coordinator
scontrol update step: Admin, Operator, Coordinator

scontrol show block: Admin, Operator
scontrol update block: Admin

scontrol show node: Admin, Operator
scontrol update node: Admin

scontrol create partition: Admin
scontrol show partition: Admin, Operator
scontrol update partition: Admin
scontrol delete partition: Admin

scontrol create reservation: Admin, Operator
scontrol show reservation: Admin, Operator
scontrol update reservation: Admin, Operator
scontrol delete reservation: Admin, Operator

scontrol reconfig: Admin
scontrol shutdown: Admin
scontrol takeover: Admin
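The table above can also be read as a lookup from command to the roles it lists. The Python sketch below encodes it that way. The ALLOWED_ROLES dictionary and the may_run helper are hypothetical illustrations, not part of Slurm. The check models only the roles shown in the table and says nothing about other users or any further scoping Slurm applies.

```python
# Hypothetical encoding of the AUTHORIZATION table; not a Slurm API.
ADMIN, OPERATOR, COORD = "Admin", "Operator", "Coordinator"
JOB_ROLES = {ADMIN, OPERATOR, COORD}  # job and step commands

ALLOWED_ROLES = {
    "show job": JOB_ROLES,
    "update job": JOB_ROLES,
    "requeue": JOB_ROLES,
    "show step": JOB_ROLES,
    "update step": JOB_ROLES,
    "show block": {ADMIN, OPERATOR},
    "update block": {ADMIN},
    "show node": {ADMIN, OPERATOR},
    "update node": {ADMIN},
    "create partition": {ADMIN},
    "show partition": {ADMIN, OPERATOR},
    "update partition": {ADMIN},
    "delete partition": {ADMIN},
    "create reservation": {ADMIN, OPERATOR},
    "show reservation": {ADMIN, OPERATOR},
    "update reservation": {ADMIN, OPERATOR},
    "delete reservation": {ADMIN, OPERATOR},
    "reconfig": {ADMIN},
    "shutdown": {ADMIN},
    "takeover": {ADMIN},
}


def may_run(role: str, command: str) -> bool:
    """True if the table grants `role` permission for `command`."""
    return role in ALLOWED_ROLES.get(command, set())


print(may_run("Operator", "update reservation"))  # True
print(may_run("Operator", "update node"))         # False
```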

EXAMPLES


# scontrol
scontrol: show part debug
PartitionName=debug
   AllocNodes=ALL AllowGroups=ALL Default=YES
   DefaultTime=NONE DisableRootJobs=NO Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1
   Nodes=snowflake[0-48]
   Priority=1 RootOnly=NO Shared=YES:4
   State=UP TotalCPUs=694 TotalNodes=49
scontrol: update PartitionName=debug MaxTime=60:00 MaxNodes=4
scontrol: show job 71701
JobId=71701 Name=hostname
   UserId=da(1000) GroupId=da(1000)
   Priority=66264 Account=none QOS=normal WCKey=*123
   JobState=COMPLETED Reason=None Dependency=(null)
   TimeLimit=UNLIMITED Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
   SubmitTime=2010-01-05T10:58:40 EligibleTime=2010-01-05T10:58:40
   StartTime=2010-01-05T10:58:40 EndTime=2010-01-05T10:58:40
   SuspendTime=None SecsPreSuspend=0
   Partition=debug AllocNode:Sid=snowflake:4702
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=snowflake0
   NumNodes=1 NumCPUs=10 CPUs/Task=2 ReqS:C:T=1:1:1
   MinCPUsNode=2 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
scontrol: update JobId=71701 TimeLimit=30:00 Priority=500
scontrol: show hostnames tux[1-3]
tux1
tux2
tux3
scontrol: create res StartTime=2009-04-01T08:00:00 Duration=5:00:00 Users=dbremer NodeCnt=10
Reservation created: dbremer_1
scontrol: update Reservation=dbremer_1 Flags=Maint NodeCnt=20
scontrol: delete Reservation=dbremer_1
scontrol: quit
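The show hostnames command above expands a bracketed host range into one name per line. The Python sketch below performs a simplified version of that expansion. expand_hostlist is a hypothetical helper. It handles only a single bracket group containing comma-separated numbers or ranges, with zero padding preserved, not Slurm's full hostlist syntax.

```python
import re


def expand_hostlist(expr: str) -> list:
    """Expand a single-bracket host range such as "tux[1-3]".

    Simplified sketch: one bracket group, comma-separated numbers and
    ranges (e.g. "n[01-03,7]"); leading zeros set the field width.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if m is None:
        return [expr]  # plain host name, nothing to expand
    prefix, body, suffix = m.groups()
    hosts = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        width = len(lo)  # "01" keeps two digits, "1" keeps one
        for n in range(int(lo), int(hi or lo) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


# Prints tux1, tux2 and tux3 on separate lines.
print("\n".join(expand_hostlist("tux[1-3]")))
```

Expanding snowflake[0-48] the same way yields 49 names, matching TotalNodes=49 in the show part debug output.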

COPYING


Copyright (C) 2002-2007 The Regents of the University of California. Produced at Lawrence
Livermore National Laboratory (cf, DISCLAIMER).
Copyright (C) 2008-2010 Lawrence Livermore National Security.
Copyright (C) 2010-2015 SchedMD LLC.

This file is part of Slurm, a resource management program. For details, see
<http://slurm.schedmd.com/>.

Slurm is free software; you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
