How to Use the Command 'sacct' (with Examples)
- Linux
- December 17, 2024
The sacct command is a versatile tool provided by the Slurm Workload Manager for accessing detailed job accounting information. It allows administrators, developers, and analysts to monitor and review job performance, resource usage, and state transitions efficiently. By providing insights into job execution and performance metrics, sacct helps in optimizing resource allocation and troubleshooting workflows within high-performance computing (HPC) environments.
Use Case 1: Display Job ID, Job Name, Partition, Account, Number of Allocated CPUs, Job State, and Job Exit Codes for Recent Jobs
Code:
sacct
Motivation:
Knowing the details of recently executed jobs is crucial for monitoring system performance and user activity. By displaying fundamental information such as job ID, job name, partition, account, allocated CPUs, job state, and exit code, sacct lets users quickly see which jobs completed successfully, which are still running, and which failed.
Explanation:
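sacct: Without options, sacct queries the Slurm accounting database for jobs that have run since midnight of the current day. By default it shows only the invoking user's jobs; -a/--allusers requests all users' jobs, subject to the cluster's privacy settings. The output uses the default columns JobID, JobName, Partition, Account, AllocCPUS, State, and ExitCode. ExitCode has the form exit:signal, so 1:0 means the job's script exited with status 1 and was not terminated by a signal.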
Example Output:
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
123456 myjob compute myacct 4 COMPLETED 0:0
123457 testjob compute myacct 2 FAILED 1:0
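When the listing is long, it can help to filter it in a script. The sketch below assumes the output was produced with sacct --noheader --parsable2, which prints the default columns separated by | without padding or a header line. The function name nonzero_exits is hypothetical, and the sample rows simply mirror the example output above.

```sh
#!/bin/sh
# Hypothetical helper: print jobs whose ExitCode is not 0:0.
# Assumes input from `sacct --noheader --parsable2` with the default columns:
# JobID|JobName|Partition|Account|AllocCPUS|State|ExitCode
nonzero_exits() {
  # Field 7 is ExitCode in "exit:signal" form; print JobID, State, ExitCode.
  awk -F'|' '$7 != "0:0" { print $1, $6, $7 }'
}

# Hypothetical sample data matching the example output above.
sample='123456|myjob|compute|myacct|4|COMPLETED|0:0
123457|testjob|compute|myacct|2|FAILED|1:0'

printf '%s\n' "$sample" | nonzero_exits   # prints: 123457 FAILED 1:0
```

On a cluster, the same function can be fed live data with sacct --noheader --parsable2 | nonzero_exits.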
Use Case 2: Display Job ID, Job State, and Job Exit Code for Recent Jobs
Code:
sacct --brief
Motivation:
When users need a succinct overview of job completion, the job ID, state, and exit code are enough. The --brief option trims the output to these columns, making it faster to spot jobs that failed or are still running.
Explanation:
sacct: Retrieves job information from the Slurm accounting database.
--brief: Limits the output to three columns (JobID, State, and ExitCode); the short form is -b. It is ideal for quickly scanning job statuses.
Example Output:
JobID State ExitCode
-------- ---------- --------
123456 COMPLETED 0:0
123457 FAILED 1:0
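The three --brief columns are easy to summarize. A minimal sketch, assuming the output comes from sacct --brief --noheader --parsable2 (fields JobID|State|ExitCode); count_states and the third sample row are hypothetical. Cancelled jobs may appear as CANCELLED by followed by a user ID, so only the first word of State is counted.

```sh
#!/bin/sh
# Hypothetical helper: tally jobs per state from brief, parsable sacct output.
# Expected input fields: JobID|State|ExitCode (no header line).
count_states() {
  # Keep only the first word of State ("CANCELLED by 1000" -> "CANCELLED").
  awk -F'|' '{ split($2, w, " "); n[w[1]]++ } END { for (s in n) print s, n[s] }' |
    LC_ALL=C sort
}

# Hypothetical sample: two rows from the example above plus a cancelled job.
sample='123456|COMPLETED|0:0
123457|FAILED|1:0
123458|CANCELLED by 1000|0:15'

printf '%s\n' "$sample" | count_states
# prints:
# CANCELLED 1
# COMPLETED 1
# FAILED 1
```

Job steps (such as 123456.batch) are counted as separate rows unless --allocations is added to the sacct command.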
Use Case 3: Display the Allocations of a Job
Code:
sacct --jobs job_id --allocations
Motivation:
Understanding resource allocations for specific jobs is essential for performance analysis and cost management. Knowing which resources were assigned lets users optimize resource requests and improve utilization in future runs.
Explanation:
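sacct: Executes the job accounting command.
--jobs job_id: Restricts the output to the specified job (short form -j; a comma-separated list of job IDs is also accepted).
--allocations: Prints one line per job allocation and hides the job steps (such as job_id.batch or job_id.extern) that sacct otherwise lists; the short form is -X. It does not change the columns, so to see nodes or memory it must be combined with --format, for example --format=JobID,NodeList,AllocCPUS,ReqMem.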
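Example Output (with --format=JobID,NodeList,AllocCPUS added to the command; --allocations alone prints the default columns shown in Use Case 1):
JobID NodeList AllocCPUS
------------ --------------- ----------
123456 node[01-02] 4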
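--allocations filters inside sacct, but the same idea can be applied to output that was already captured without it. In the sketch below (hypothetical function allocation_rows, hypothetical sample data), step rows are recognized by the dot in their JobID, such as 123456.batch; it assumes |-separated output with JobID in the first field.

```sh
#!/bin/sh
# Hypothetical helper: keep only allocation rows, dropping job steps.
# Assumes '|'-separated sacct output (--parsable2) with JobID as field 1.
allocation_rows() {
  # Step IDs contain a dot (123456.batch, 123456.0); allocation IDs do not.
  awk -F'|' 'index($1, ".") == 0'
}

# Hypothetical sample: one job with its batch and extern steps
# (fields: JobID|NodeList|AllocCPUS).
sample='123456|node[01-02]|4
123456.batch|node01|2
123456.extern|node[01-02]|4'

printf '%s\n' "$sample" | allocation_rows   # prints: 123456|node[01-02]|4
```

On live data, sacct --jobs job_id --allocations --parsable2 --format=JobID,NodeList,AllocCPUS gives the allocation row directly; the filter is only useful for output saved earlier.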
Use Case 4: Display Elapsed Time, Job Name, Number of Requested CPUs, and Memory Requested of a Job
Code:
sacct --jobs job_id --format=Elapsed,JobName,ReqCPUS,ReqMem
Motivation:
Detailed information about a job’s runtime and resources can reveal inefficiencies or scalability issues in the computational workload. By analyzing these parameters, users can fine-tune application performance and resource distribution.
Explanation:
sacct: Initiates the job accounting data retrieval.
--jobs job_id: Targets a specific job.
--format=Elapsed,JobName,ReqCPUS,ReqMem: Replaces the default columns with the elapsed wall-clock time, job name, requested CPUs, and requested memory. Only Elapsed is measured; ReqCPUS and ReqMem record what was requested, so comparing demand with actual consumption needs usage fields such as TotalCPU or MaxRSS (the latter is reported per job step).
Example Output:
Elapsed JobName ReqCPUS ReqMem
--------- ---------- ------- --------
00:30:00 myjob 4 4000M
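Elapsed is printed as [D-]HH:MM:SS, which is awkward for arithmetic such as multiplying run time by CPU count. The sketch below converts it to seconds; elapsed_to_seconds is a hypothetical name, and it assumes the days prefix appears only for jobs longer than 24 hours. sacct also offers an ElapsedRaw field that reports seconds directly, which avoids parsing altogether.

```sh
#!/bin/sh
# Hypothetical helper: convert an Elapsed value ([D-]HH:MM:SS) to seconds.
elapsed_to_seconds() {
  printf '%s\n' "$1" | awk -F'[-:]' '{
    # Four fields means a days prefix (D-HH:MM:SS); otherwise HH:MM:SS.
    if (NF == 4) print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
    else         print $1 * 3600 + $2 * 60 + $3
  }'
}

secs=$(elapsed_to_seconds 00:30:00)   # 1800
# Core-hours for the example job, assuming its 4 requested CPUs
# were allocated for the full run.
echo "$(( 4 * secs / 3600 )) CPU-hours"   # prints: 2 CPU-hours
```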
Use Case 5: Display Recent Jobs That Occurred from One Week Ago up to the Present Day
Code:
sacct --starttime=$(date -d "1 week ago" +'%F')
Motivation:
Reviewing job activity within a specific timeframe is essential for post-mortem analyses, reporting, and historical insights. By filtering jobs executed in the past week, users can focus on relevant data for regular assessments and audits.
Explanation:
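sacct: Calls the accounting data retrieval for jobs in the specified period.
--starttime=$(date -d "1 week ago" +'%F'): Sets the start of the reporting window (short form -S). The command substitution $(date -d "1 week ago" +'%F') uses GNU date to print the calendar date seven days ago in YYYY-MM-DD form, so the window begins at midnight of that day. With no --endtime given, the window extends to the present. The -d option is specific to GNU date and is not available in the BSD date shipped with macOS.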
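Example Output (with --format=JobID,JobName,State,Start added to include start times, which the default columns omit):
JobID JobName State Start
------------ ---------- ---------- -------------------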
123450 analysis COMPLETED 2023-09-23T12:00:00
123455 run_test FAILED 2023-09-22T16:30:00
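Because -d belongs to GNU date and BSD or macOS date uses -v instead, a small wrapper can try both so the same command works on either kind of system. A sketch, assuming one of those two date implementations is installed; week_ago is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: print the date seven days ago as YYYY-MM-DD.
week_ago() {
  # GNU coreutils date first; fall back to BSD/macOS date (-v adjusts the date).
  date -d '1 week ago' +%F 2>/dev/null || date -v-7d +%F
}

start=$(week_ago)
# Show the command that would be run on a cluster.
echo "sacct --starttime=$start"
```

On a cluster, the wrapper can be used inline: sacct --starttime="$(week_ago)".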
Use Case 6: Output a Larger Number of Characters for an Attribute
Code:
sacct --format=JobID,JobName%100
Motivation:
Long job names are truncated in the default output: sacct cuts each field at its column width (10 characters for JobName) and marks the cut with a trailing +. Allocating more width to the job name column shows names in full and avoids confusion between similarly named jobs.
Explanation:
sacct: Retrieves job information.
--format=JobID,JobName%100: Selects the JobID and JobName columns and, through the %100 suffix, sets the JobName column width to 100 characters, so longer job names are displayed in full.
Example Output:
JobID JobName
---------- ----------------------------------------------------------
123456 full_analysis_run_with_detailed_parameters_and_logging
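Rather than guessing a width such as 100, a script can measure the longest name first. A sketch, assuming job names arrive one per line (for example from sacct --noheader --parsable2 --format=JobName); name_width is hypothetical, and the floor of 10 matches the dashes under JobName in the default output of Use Case 1.

```sh
#!/bin/sh
# Hypothetical helper: print the length of the longest line, with a floor of 10.
name_width() {
  awk '{ if (length($0) > max) max = length($0) }
       END { if (max < 10) max = 10; print max }'
}

# Hypothetical sample: job names taken from the example outputs above.
names='myjob
full_analysis_run_with_detailed_parameters_and_logging'

w=$(printf '%s\n' "$names" | name_width)
echo "sacct --format=JobID,JobName%$w"   # prints: sacct --format=JobID,JobName%54
```

On a cluster, the two steps combine as w=$(sacct --noheader --parsable2 --format=JobName | name_width) followed by sacct --format=JobID,JobName%$w.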
Conclusion
The sacct command is a powerful utility within the Slurm Workload Manager for accessing and reporting job accounting details. Its options let users tailor the output to monitoring, performance evaluation, and capacity planning, and the resulting insights help keep HPC resource use efficient and reliable.