How to use the command "sinfo" (with examples)
- Linux
- December 25, 2023
The “sinfo” command is used to view information about Slurm nodes and partitions in a cluster. It provides detailed status information about the partitions, as well as the ability to filter the output based on node states and other criteria.
Use case 1: Show a quick summary overview of the cluster
Code:
sinfo --summarize
Motivation: This use case is helpful when you want to quickly get an overview of the cluster’s status without having to go through the detailed information.
Explanation: The “--summarize” option condenses the report to one line per partition. Instead of printing a separate line for each node state, it shows the node counts as allocated/idle/other/total in a single NODES(A/I/O/T) column.
Example output:
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
debug* up infinite 0/2/0/2 node[1-2]
normal up infinite 0/4/0/4 node[3-6]
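In scripts, the combined allocated/idle/other/total field that “sinfo --summarize” reports usually has to be split apart. The following is a minimal sketch, assuming that field is the fourth column of the summary. The function name summarize_idle and the sample text are illustrative and were not captured from a real cluster.

```shell
#!/bin/sh
# Print "<partition> <idle>/<total>" for each partition line read on stdin.
# Assumption: column 4 is the A/I/O/T field (allocated/idle/other/total).
summarize_idle() {
    awk 'NR > 1 {
        split($4, c, "/")      # c[1]=allocated c[2]=idle c[3]=other c[4]=total
        sub(/\*$/, "", $1)     # drop the default-partition marker
        print $1, c[2] "/" c[4]
    }'
}

# Illustrative sample standing in for: sinfo --summarize | summarize_idle
sample='PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
debug* up infinite 0/2/0/2 node[1-2]
normal up infinite 0/4/0/4 node[3-6]'

printf '%s\n' "$sample" | summarize_idle
```

On a cluster, the function would be fed by the real command rather than the sample text.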
Use case 2: View the detailed status of all partitions across the entire cluster
Code:
sinfo
Motivation: This use case allows you to see the detailed status of all the partitions in the cluster, including information about the nodes and their states.
Explanation: Without any options, the “sinfo” command reports every partition in the cluster, printing one line for each group of nodes that share a partition and a state. Each line shows the partition name (an asterisk marks the default partition), its availability, the time limit, the number of nodes, their state, and the node list. A “~” suffix on a state indicates nodes that are powered down to save energy.
Example output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 2 idle~ node[1-2]
normal up infinite 4 idle node[3-6]
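Because the default report has one line per partition and state, a short filter can total the node counts for each state. This is a rough sketch, assuming the default six-column layout shown above. The function name nodes_by_state and the sample text are illustrative. Note that a node belonging to several partitions would be counted once per partition.

```shell
#!/bin/sh
# Total the NODES column (4) for each STATE (5) in sinfo output read on stdin.
nodes_by_state() {
    awk 'NR > 1 { total[$5] += $4 }
         END { for (s in total) print s, total[s] }' | LC_ALL=C sort
}

# Illustrative sample standing in for: sinfo | nodes_by_state
sample='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 2 idle~ node[1-2]
normal up infinite 4 idle node[3-6]'

printf '%s\n' "$sample" | nodes_by_state
```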
Use case 3: View the detailed status of a specific partition
Code:
sinfo --partition partition_name
Motivation: This use case is useful when you want to focus on the status of a particular partition in the cluster.
Explanation: The “--partition” option takes the name of the partition you want to inspect. Only lines for that partition are displayed.
Example output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal up infinite 4 idle node[3-6]
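The “--partition” option also accepts several partition names separated by commas. A small sketch of building that argument in a script follows. The helper name join_partitions and the partition names debug and normal are placeholders.

```shell
#!/bin/sh
# Join the given partition names with commas for use with --partition.
join_partitions() {
    # "$*" joins the arguments with the first character of IFS;
    # the subshell keeps the IFS change local to this function.
    ( IFS=,; printf '%s\n' "$*" )
}

join_partitions debug normal

# Usage on a cluster:
#   sinfo --partition "$(join_partitions debug normal)"
```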
Use case 4: View information about idle nodes
Code:
sinfo --states idle
Motivation: This use case lets you view detailed information about all the idle nodes in the cluster.
Explanation: The “--states” option filters the output by node state. Specifying “idle” restricts the output to nodes that are currently idle, including idle nodes that are powered down (shown as “idle~”).
Example output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 2 idle~ node[1-2]
normal up infinite 4 idle node[3-6]
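For scripting, the state filter can be combined with “--noheader” and a “--format” string such as “%D” (node count) so that only numbers are printed, one per output line. The sketch below adds those numbers up. The function name sum_counts is illustrative, and the sample input stands in for real sinfo output. The total may count a node more than once if it belongs to several partitions.

```shell
#!/bin/sh
# Sum the node counts, one per line, read on stdin.
# Intended input on a cluster:
#   sinfo --noheader --states idle --format "%D"
sum_counts() {
    awk '{ n += $1 } END { print n + 0 }'
}

# Illustrative sample: two partitions with 2 and 4 idle nodes.
printf '2\n4\n' | sum_counts
```

Several states can be requested at once by separating them with commas, for example “--states idle,mixed”.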
Use case 5: Summarise dead nodes
Code:
sinfo --dead
Motivation: This use case provides a summary of the dead nodes in the cluster, making it easier to identify any potential issues.
Explanation: The “--dead” option restricts the report to non-responding (dead) nodes. The output uses the same columns as the default report. When no nodes are dead, each partition is listed with zero nodes and a state of “n/a”.
Example output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 0 n/a
normal up infinite 0 n/a
Use case 6: List unavailable nodes and the reasons why
Code:
sinfo --list-reasons
Motivation: This use case is helpful when you want to find out why nodes in the cluster are unavailable.
Explanation: The “--list-reasons” option lists the reasons recorded for nodes that are in the down, drained, fail, or failing state, together with the user who set the reason, the time it was set, and the affected nodes. This can provide valuable information for troubleshooting.
Example output:
REASON USER TIMESTAMP NODELIST
Not responding slurm 2023-12-20T10:15:32 node[1-2]
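A reason listing is easy to turn into a simple health check, since any line of output means at least one node has been flagged. The sketch below assumes the listing is produced with “sinfo --list-reasons --noheader”. The function name check_reasons and its messages are hypothetical.

```shell
#!/bin/sh
# Read a reason listing on stdin; succeed only if it contains no entries.
# Intended input on a cluster:
#   sinfo --list-reasons --noheader
check_reasons() {
    flagged=$(awk 'NF { n++ } END { print n + 0 }')   # count non-empty lines
    if [ "$flagged" -gt 0 ]; then
        echo "WARNING: $flagged reason entries reported"
        return 1
    fi
    echo "OK: no nodes flagged"
}

# Illustrative run with an empty listing, as on a healthy cluster.
printf '' | check_reasons
```

The non-zero return status makes the function usable in monitoring scripts or cron jobs that should alert only when something is wrong.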
Conclusion:
The “sinfo” command is a powerful tool for obtaining information about Slurm nodes and partitions in a cluster. With its options you can view a condensed summary, inspect the full status, filter by partition or node state, and find non-responding or unavailable nodes along with the recorded reasons. This makes it easier to manage the cluster and troubleshoot problems.