How to use the command "sinfo" (with examples)

The “sinfo” command is used to view information about Slurm nodes and partitions in a cluster. It provides detailed status information about the partitions, as well as the ability to filter the output based on node states and other criteria.

Use case 1: Show a quick summary overview of the cluster

Code:

sinfo --summarize

Motivation: This use case is helpful when you want to quickly get an overview of the cluster’s status without having to go through the detailed information.

Explanation: The “--summarize” option condenses the output to one line per partition instead of one line per node state. Its NODES(A/I/O/T) column gives the number of nodes in that partition that are allocated, idle, in some other state (for example down or drained), and the total.

Example output:

PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
debug*       up   infinite          0/2/0/2  node[1-2]
normal       up   infinite          0/4/0/4  node[3-6]
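Because the summary packs four counts into one field, it is handy to split them in a script. The sketch below assumes the live input would come from `sinfo --noheader --format="%P %F"`, where `%F` prints a partition's node counts as allocated/idle/other/total. The `sample_summary` and `idle_per_partition` functions are hypothetical names, and the hard-coded lines stand in for real output so the script runs without a cluster.

```shell
#!/usr/bin/env bash
# Sketch: report idle and total node counts per partition from sinfo's
# allocated/idle/other/total (%F) field. On a real cluster, pipe in:
#   sinfo --noheader --format="%P %F"
# sample_summary is a hypothetical stand-in so the script runs anywhere.

sample_summary() {
  printf '%s\n' 'debug* 0/2/0/2' 'normal 0/4/0/4'
}

# Reads "PARTITION A/I/O/T" lines on stdin.
idle_per_partition() {
  awk '{
    split($2, n, "/")        # n[1]=allocated n[2]=idle n[3]=other n[4]=total
    part = $1
    sub(/\*$/, "", part)     # drop the "*" that marks the default partition
    printf "%s idle=%d total=%d\n", part, n[2], n[4]
  }'
}

sample_summary | idle_per_partition
```

Piping the real command into `idle_per_partition` should print one `name idle=N total=M` line per partition.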

Use case 2: View the detailed status of all partitions across the entire cluster

Code:

sinfo

Motivation: This use case allows you to see the detailed status of all the partitions in the cluster, including information about the nodes and their states.

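Explanation: Without any options, the “sinfo” command lists every partition in the cluster. Nodes in a partition that share a state are grouped onto one line, which shows the partition name, availability, time limit, number of nodes, their state, and the node list. An asterisk after a partition name marks the default partition, and a “~” after a state (as in “idle~”) means those nodes are powered down.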

Example output:

PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
debug*        up   infinite      2  idle~  node[1-2]
normal        up   infinite      4   idle  node[3-6]
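The default columns correspond to the format specifiers `%P` (partition), `%a` (availability), `%l` (time limit), `%D` (node count), `%t` (compact state) and `%N` (node list), so a custom `--format` can keep just the fields a script needs. As a minimal sketch, the following picks out node groups whose state carries the `~` suffix that sinfo uses for powered-down nodes. `sample_status` and `powered_down` are hypothetical helpers, and the sample lines are assumed rather than captured from a real cluster.

```shell
#!/usr/bin/env bash
# Sketch: list node groups that sinfo marks as powered down ("~" suffix on
# the compact state). On a real cluster, pipe in:
#   sinfo --noheader --format="%P %t %N"
# sample_status is a hypothetical stand-in for that output.

sample_status() {
  printf '%s\n' 'debug* idle~ node[1-2]' 'normal idle node[3-6]'
}

# Reads "PARTITION STATE NODELIST" lines; prints NODELIST when STATE ends in "~".
powered_down() {
  awk '$2 ~ /~$/ { print $3 }'
}

sample_status | powered_down
```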

Use case 3: View the detailed status of a specific partition

Code:

sinfo --partition partition_name

Motivation: This use case is useful when you want to focus on the status of a particular partition in the cluster.

Explanation: The “--partition” option restricts the output to the named partition (several partitions can be given as a comma-separated list). Only information about that partition is displayed.

Example output:

PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
normal        up   infinite      4   idle  node[3-6]
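In a job-submission wrapper you might want to check that a partition is available before submitting to it. This sketch assumes input in the form produced by `sinfo --noheader --partition=normal --format="%P %a"`; the helper names and the `maint` partition in the sample are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if a named partition is reported as "up".
# On a real cluster the input could come from:
#   sinfo --noheader --partition=normal --format="%P %a"
# sample_avail is a hypothetical stand-in ("maint" is an invented partition).

sample_avail() {
  printf '%s\n' 'debug* up' 'normal up' 'maint down'
}

# Usage: partition_up NAME < lines of "PARTITION AVAIL"
partition_up() {
  awk -v want="$1" '
    { part = $1; sub(/\*$/, "", part) }   # ignore the default-partition marker
    part == want && $2 == "up" { found = 1 }
    END { exit !found }
  '
}

if sample_avail | partition_up normal; then
  echo "normal is up"
else
  echo "normal is not up"
fi
```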

Use case 4: View information about idle nodes

Code:

sinfo --states idle

Motivation: This use case lets you view detailed information about all the idle nodes in the cluster.

Explanation: The “--states” option filters the output by node state. Specifying “idle” shows only the nodes that are currently idle; several states can be combined as a comma-separated list, for example “idle,down”.

Example output:

PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
debug*        up   infinite      2  idle~  node[1-2]
normal        up   infinite      4   idle  node[3-6]
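To turn the idle listing into a single number, the per-line node counts can be summed. This is a sketch assuming input from `sinfo --states=idle --noheader --format="%D"`; because sinfo prints one line per partition and state, a node that belongs to more than one partition would be counted more than once. `sample_idle_counts` and `sum_nodes` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: total the idle-node counts that sinfo reports.
# On a real cluster, pipe in:
#   sinfo --states=idle --noheader --format="%D"
# Caveat: one line per partition/state group, so a node in several
# partitions is counted once per partition.
# sample_idle_counts is a hypothetical stand-in for that output.

sample_idle_counts() {
  printf '%s\n' 2 4
}

# Sums the first field of every line; prints 0 for empty input.
sum_nodes() {
  awk '{ total += $1 } END { print total + 0 }'
}

sample_idle_counts | sum_nodes
```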

Use case 5: Show only dead nodes

Code:

sinfo --dead

Motivation: This use case limits the output to dead (non-responding) nodes, making it easier to spot potential problems.

Explanation: The “--dead” option makes “sinfo” report state information only for nodes that are not responding, so you can see at a glance which partitions contain dead nodes and which nodes they are.

Example output:

PARTITION  AVAIL   NODELIST
debug*     up      <none>
normal     up      <none>
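The dead-node report lends itself to a simple monitoring check. The sketch below assumes the node names come from `sinfo --dead --Node --noheader --format="%N"` (node-oriented output, one node per line) and treats blank lines or `n/a` as "no dead nodes", which you should confirm against your Slurm version. Duplicates are removed with `sort -u` in case a node is listed once per partition. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a health check that warns when any node is non-responding.
# On a real cluster the input could come from:
#   sinfo --dead --Node --noheader --format="%N"
# Assumption: blank lines or "n/a" mean "no dead nodes".
# sample_dead is a hypothetical stand-in for that output.

sample_dead() {
  printf '%s\n' node1 node2 node2
}

# Prints the number of distinct dead nodes read from stdin.
count_dead() {
  grep -v -e '^[[:space:]]*$' -e '^n/a$' | sort -u | wc -l | tr -d ' '
}

dead="$(sample_dead | count_dead)"
if [ "$dead" -gt 0 ]; then
  echo "WARNING: $dead dead node(s)"
else
  echo "all nodes responding"
fi
```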

Use case 6: List unavailable nodes and the reasons why

Code:

sinfo --list-reasons

Motivation: This use case is helpful when you want to find out why nodes in the cluster are down, drained, or failing.

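Explanation: The “--list-reasons” option lists nodes in the down, drained, fail, or failing state together with the reason recorded for that state, which is valuable information for troubleshooting. Recent Slurm releases print REASON, USER, TIMESTAMP and NODELIST columns by default, so the exact layout may differ from the example below.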

Example output:

PARTITION  NODELIST         REASON
debug*     node1            Node is in DOWN state
debug*     node2            Node is in DOWN state
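Reason strings often contain spaces, so when processing them in a script it helps to choose a delimiter explicitly. This sketch assumes input from `sinfo --list-reasons --noheader --format="%N|%E"`, where `%N` is the node list and `%E` the full reason string. It splits each line at the first `|` only, so a `|` inside a reason survives. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn sinfo's reason listing into "nodes: reason" lines, e.g. for
# an alert message. On a real cluster, pipe in:
#   sinfo --list-reasons --noheader --format="%N|%E"
# sample_reasons is a hypothetical stand-in for that output.

sample_reasons() {
  printf '%s\n' 'node[1-2]|Node is in DOWN state'
}

# Splits at the first "|" only; lines without a "|" are skipped.
format_reasons() {
  awk '{
    i = index($0, "|")
    if (i > 0) printf "%s: %s\n", substr($0, 1, i - 1), substr($0, i + 1)
  }'
}

sample_reasons | format_reasons
```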

Conclusion:

The “sinfo” command is a powerful tool for obtaining information about Slurm nodes and partitions in a cluster. With its options you can view summary overviews and detailed status, filter by partition or node state, and identify dead or unavailable nodes along with the reasons behind them. This insight into the state of the cluster makes management and troubleshooting more effective.
