How to use the command 'srun' (with examples)
- Linux
- December 17, 2024
The srun command is a core utility of the Slurm Workload Manager, used to allocate resources and launch parallel tasks on a computing cluster. It lets users start interactive sessions or run command-line programs with a specified set of resources. It is particularly useful for developers, researchers, and engineers working in high-performance computing environments, where work is distributed across multiple nodes to speed up computation.
Use case 1: Submit a basic interactive job
Code:
srun --pty /bin/bash
Motivation:
This command is ideal for users who want to quickly start an interactive session on a Slurm-managed cluster without specifying resources, so the cluster's default limits apply. It gives the user a shell directly on a compute node, which suits data exploration, testing, or development work that does not need a large allocation.
Explanation:
srun: The command used to initiate the Slurm job, requesting resources and launching the task on an available compute node.
--pty: Allocates a pseudo-terminal for the session, which is what makes the session interactive.
/bin/bash: Specifies that a Bash shell should be started on the node, providing a familiar environment for command-line operations.
Example output:
Upon successful execution, the user gets a standard Bash prompt on a compute node and can run shell commands there as in any other terminal. If no resources are free, srun waits until the job can be allocated. Typing exit ends the shell and releases the allocation.
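A quick way to confirm that the shell is really running inside a job is to look at the environment Slurm sets up. The sketch below relies on the SLURM_JOB_ID variable, which Slurm exports to job tasks; the function name where_am_i is made up for this example.

```shell
# Report whether the current shell is running inside a Slurm job.
# SLURM_JOB_ID is exported by Slurm to the tasks it launches;
# where_am_i is a hypothetical helper name, not part of Slurm.
where_am_i() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "inside job ${SLURM_JOB_ID} on $(uname -n)"
    else
        echo "not inside a Slurm job"
    fi
}

where_am_i
```

Run on a login node, this should report that no job is active; run inside the shell started by srun, it should print the job ID and the compute node's name.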
Use case 2: Submit an interactive job with different attributes
Code:
srun --ntasks-per-node=num_cores --mem-per-cpu=memory_MB --pty /bin/bash
Motivation:
This approach suits situations where specific computational resources are needed. By specifying the number of tasks per node and the memory required per CPU, a user can tailor the Slurm job to the demands of the application. This is useful for data-intensive or parallel applications, because it requests only what the job needs and avoids overloading nodes.
Explanation:
--ntasks-per-node=num_cores: Specifies the number of tasks to launch per node. Replace "num_cores" with the number you need. A task is not the same as a core, but by default each task is allocated one CPU, so the value usually matches the number of cores requested. With --pty, only task zero is attached to the terminal.
--mem-per-cpu=memory_MB: Sets the minimum memory to allocate per CPU. Replace "memory_MB" with a value that covers your application's needs so tasks do not fail from running out of memory. The default unit is megabytes; a suffix such as G selects another unit.
--pty: As before, provides a terminal interface.
/bin/bash: Starts a Bash shell for the user, as in the first use case.
Example output:
Once executed, the command opens an interactive Bash session on a compute node with the requested resources reserved. Commands run in this shell execute within that allocation. A program still has to be parallel itself, or be launched as several tasks, to make use of more than one core.
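To make the placeholders concrete, here is a minimal sketch that fills them in and prints the resulting command instead of running it, so it can be tried without a cluster. The helper name interactive_cmd and the values 4 and 2048 are illustrative assumptions, not recommendations.

```shell
# Hypothetical helper: print (not run) an srun command for an interactive shell.
#   $1 = tasks per node, $2 = memory per CPU in megabytes
interactive_cmd() {
    # Reject missing or non-numeric arguments before building the command.
    for value in "$1" "$2"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "usage: interactive_cmd TASKS MEM_MB" >&2
                return 1
                ;;
        esac
    done
    echo "srun --ntasks-per-node=$1 --mem-per-cpu=$2 --pty /bin/bash"
}

# Example values only: 4 tasks per node, 2048 MB per CPU.
interactive_cmd 4 2048
```

Inside the resulting session, variables such as SLURM_NTASKS and SLURM_MEM_PER_CPU can be echoed to check what was actually granted.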
Use case 3: Connect to a worker node with a job running
Code:
srun --jobid=job_id --pty /bin/bash
Motivation:
This command is for users who need to get onto the node of a job that is already running. It is particularly useful for monitoring an active job or diagnosing problems while its tasks execute. The shell is started as an additional step inside the existing job's allocation, on a node where the job is running, so the user can inspect the job directly. It does not change the job's own parameters.
Explanation:
--jobid=job_id: Tells srun to start a step under an existing job, identified by its job ID, instead of creating a new allocation. On recent Slurm versions the new step may wait for free resources within that allocation unless --overlap is also given.
--pty: Provides the terminal interface needed to run commands interactively within the job context.
/bin/bash: Launches Bash for user interaction, as in the previous use cases.
Example output:
The result is a shell in the environment where the job specified by job_id is running. From there the user can review output logs, inspect resource usage, or run supplementary commands related to the ongoing task.
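The job ID has to come from somewhere, and squeue is the usual source. The sketch below assumes input lines of the form "JOBID NAME NODELIST", which is what squeue -h -u "$USER" -t RUNNING -o "%i %j %N" should produce. The helper name attach_cmd and the sample line are made up so the example runs without a cluster.

```shell
# Hypothetical helper: read "JOBID NAME NODELIST" lines on stdin and
# print (not run) an attach command for the first job listed.
attach_cmd() {
    awk 'NR == 1 { print "srun --jobid=" $1 " --pty /bin/bash" }'
}

# Made-up sample line standing in for real squeue output.
printf '%s\n' '12345 train node07' | attach_cmd
```

On a real cluster the sample line would be replaced by the squeue command above, piped into attach_cmd.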
Conclusion:
The srun command is a flexible part of the Slurm workload management system for managing jobs and allocating resources. As the examples show, users can use srun to start basic interactive sessions, request tailored resources, or connect to existing jobs, all within a high-performance computing environment.