How to use the command 'scontrol' (with examples)
- December 25, 2023
The ‘scontrol’ command is the Slurm workload manager's tool for viewing and modifying Slurm state, including jobs. For jobs, it can show detailed information, suspend and resume running jobs, and hold and release jobs waiting in the queue. This article walks through each of these use cases with examples.
Use case 1: View information for a job
Code:
scontrol show job job_id
Motivation: When working with Slurm, it is often necessary to retrieve detailed information about a specific job. By using the ‘scontrol show job’ command with the job ID, you can obtain a detailed summary of the job’s status, resource allocation, and other relevant information.
Explanation:
- scontrol: The command itself, used to interact with Slurm.
- show job: This argument specifies that we want to view information for a specific job.
- job_id: The ID of the job you want to view information for.
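Example output (illustrative and abridged; real output spans many lines, and the exact fields depend on the Slurm version and configuration):
JobId=12345 Cluster=your_cluster Partition=normal AllocNode:Sid=compute-node-1 NumNodes=1 NumCPUs=4 CPUTime=00:10:00 ElapsedTime=00:05:00 ...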
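For scripting, it is often enough to pull a single field out of this output. The sketch below is one way to do that in a POSIX shell, assuming the usual `Key=Value` layout. `job_field` is a hypothetical helper name, and the sample text is made up for illustration rather than captured from a real cluster. Values that contain spaces (some job names, for instance) are not handled.

```shell
# job_field KEY: read scontrol-style "Key=Value" text on stdin and
# print the value of the first field named KEY.
# (job_field is a hypothetical helper, not part of Slurm.)
job_field() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            n = index($i, "=")
            if (n > 0 && substr($i, 1, n - 1) == key) {
                print substr($i, n + 1)
                exit
            }
        }
    }'
}

# Demo on made-up sample text (not real cluster output).
printf 'JobId=12345 JobName=demo\n   JobState=RUNNING Reason=None\n' |
    job_field JobState
# prints: RUNNING

# On a cluster, the same helper would read from scontrol itself:
#   scontrol show job 12345 | job_field JobState
```

Splitting on the first `=` of each whitespace-separated token keeps keys such as `AllocNode:Sid` and values that themselves contain `:` intact.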
Use case 2: Suspend a comma-separated list of running jobs
Code:
scontrol suspend job_id1,job_id2,...
Motivation: Sometimes you need to temporarily pause one or more running jobs. The ‘scontrol suspend’ command does this for the job ID(s) you specify. This can be useful during system maintenance or when other critical tasks must take priority.
Explanation:
- scontrol: The command itself, used to interact with Slurm.
- suspend: This argument tells Slurm to suspend the specified job(s).
- job_id1,job_id2,...: The IDs of the jobs you want to suspend, separated by commas with no spaces. A single ID also works.
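Example output:
None on success: ‘scontrol suspend’ normally prints nothing when the request is accepted. To confirm, check that the job state reads SUSPENDED in ‘scontrol show job’ or ‘squeue’.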
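When the job IDs come from a script as separate values, they have to be joined with commas before being passed to ‘scontrol suspend’. A minimal sketch, assuming a POSIX shell; `join_ids` is a hypothetical helper name, and the command is only echoed here rather than executed:

```shell
# join_ids ID...: print the given job IDs joined by commas, the list
# format that scontrol suspend accepts. (Hypothetical helper name.)
# The body runs in a subshell so the IFS change does not leak out.
join_ids() (
    IFS=,
    printf '%s\n' "$*"
)

# Dry run: print the command instead of running it.
echo scontrol suspend "$(join_ids 12345 12346 12347)"
# prints: scontrol suspend 12345,12346,12347
```

Dropping the leading `echo` would run the command for real, which assumes your account is permitted to suspend those jobs.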
Use case 3: Resume a comma-separated list of suspended jobs
Code:
scontrol resume job_id1,job_id2,...
Motivation: After suspending a job, you will usually want to continue its execution later. The ‘scontrol resume’ command resumes one or more suspended jobs by their job ID(s). This is helpful once the reason for the pause has passed and the jobs can pick up where they left off.
Explanation:
- scontrol: The command itself, used to interact with Slurm.
- resume: This argument instructs Slurm to resume the specified job(s).
- job_id1,job_id2,...: The IDs of the jobs you want to resume, separated by commas with no spaces. A single ID also works.
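Example output:
None on success: ‘scontrol resume’ normally prints nothing. To confirm, check that the job state has returned to RUNNING in ‘scontrol show job’ or ‘squeue’.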
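Suspend and resume are usually used as a pair around some other task. The sketch below wraps that pattern in a shell function so the jobs are resumed even when the task in the middle fails. `pause_while` and the `SCONTROL` override variable are hypothetical names introduced for this example, and it assumes your account is allowed to suspend and resume the jobs in question.

```shell
# SCONTROL can be overridden, e.g. SCONTROL="echo scontrol" for a dry run.
SCONTROL=${SCONTROL:-scontrol}

# pause_while JOB_LIST COMMAND [ARG...]
# Suspend the jobs, run COMMAND, then resume the jobs even if COMMAND
# failed. Returns COMMAND's exit status. (Hypothetical helper.)
pause_while() {
    job_list=$1
    shift
    # Give up early if the jobs could not be suspended.
    $SCONTROL suspend "$job_list" || return 1
    rc=0
    "$@" || rc=$?
    $SCONTROL resume "$job_list"
    return "$rc"
}

# Dry-run usage: prints the two scontrol commands around the task.
#   SCONTROL="echo scontrol"
#   pause_while 12345,12346 echo "maintenance task"
```

Routing every call through `$SCONTROL` makes it possible to rehearse the sequence on a machine without Slurm before running it against real jobs.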
Use case 4: Hold a comma-separated list of queued jobs
Code:
scontrol hold job_id1,job_id2,...
Motivation: At times, you may need to prevent a group of jobs from being scheduled for execution. The ‘scontrol hold’ command places one or more queued jobs on hold, preventing them from starting until they are explicitly released. This can be useful when you want to prioritize specific jobs, perform additional troubleshooting, or wait for additional resources.
Explanation:
- scontrol: The command itself, used to interact with Slurm.
- hold: This argument tells Slurm to put the specified job(s) on hold.
- job_id1,job_id2,...: The IDs of the jobs you want to put on hold, separated by commas with no spaces. A single ID also works.
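Example output:
None on success: ‘scontrol hold’ normally prints nothing. To confirm, check in ‘scontrol show job’ that the job is still pending and that its priority has dropped to 0.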
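A script can check whether a hold took effect by inspecting the job's `Reason` field. The sketch below assumes the held-job reason codes `JobHeldUser` and `JobHeldAdmin`; verify them against the documentation for your Slurm version. `is_held` is a hypothetical helper name, and the sample text is made up for illustration.

```shell
# is_held: read `scontrol show job` text on stdin; succeed (exit 0) if
# the Reason field is JobHeldUser or JobHeldAdmin.
# (Hypothetical helper; the reason codes are an assumption to verify.)
is_held() {
    grep -Eq '(^|[[:space:]])Reason=JobHeld(User|Admin)([[:space:]]|$)'
}

# Demo on made-up sample text (not real cluster output).
if printf 'JobId=12345 JobName=demo\n   JobState=PENDING Reason=JobHeldUser Dependency=(null)\n' | is_held
then
    echo "held"
else
    echo "not held"
fi
# prints: held

# On a cluster:
#   scontrol show job 12345 | is_held && echo "held"
```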
Use case 5: Release a comma-separated list of held jobs
Code:
scontrol release job_id1,job_id2,...
Motivation: Once you have placed a job on hold, you may later decide to release it so that it becomes eligible for scheduling again. The ‘scontrol release’ command releases one or more held jobs by their job ID(s). Note that this is the counterpart of ‘scontrol hold’; a suspended job is restarted with ‘scontrol resume’ instead.
Explanation:
- scontrol: The command itself, used to interact with Slurm.
- release: This argument instructs Slurm to release the specified job(s) from hold.
- job_id1,job_id2,...: The IDs of the jobs you want to release, separated by commas with no spaces. A single ID also works.
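Example output:
None on success: ‘scontrol release’ normally prints nothing. The released jobs stay pending until the scheduler starts them, which you can follow in ‘squeue’.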
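Because hold/release and suspend/resume are easy to mix up, a script can choose the matching command from the job's current state. The following is only a sketch: `reactivate_cmd` is a hypothetical helper, the state and reason values are passed in by the caller, and the reason codes `JobHeldUser` and `JobHeldAdmin` are an assumption to check against your Slurm version. It prints the command rather than running it.

```shell
# reactivate_cmd JOB_ID STATE REASON
# Print the scontrol command that would get the job going again:
#   suspended jobs -> resume, held pending jobs -> release.
# Returns 1 (and prints nothing) for any other state.
reactivate_cmd() {
    job_id=$1
    state=$2
    reason=$3
    case "$state:$reason" in
        SUSPENDED:*)
            echo "scontrol resume $job_id" ;;
        PENDING:JobHeldUser|PENDING:JobHeldAdmin)
            echo "scontrol release $job_id" ;;
        *)
            return 1 ;;
    esac
}

reactivate_cmd 12345 SUSPENDED None
# prints: scontrol resume 12345
reactivate_cmd 12346 PENDING JobHeldUser
# prints: scontrol release 12346
```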
Conclusion:
The ‘scontrol’ command is an essential tool in managing Slurm jobs. With its ability to view job information, suspend and resume jobs, and hold and release jobs, you gain flexibility and control over your workload. By understanding and utilizing the different use cases, you can efficiently manage and monitor your jobs in the Slurm workload manager.