How to Use the Command 'sacct' (with Examples)

The sacct command is a versatile tool provided by the Slurm Workload Manager for accessing detailed job accounting information. It allows administrators, developers, and analysts to monitor and review job performance, resource usage, and state transitions efficiently. By providing insights into job execution and performance metrics, sacct helps in optimizing resource allocation and troubleshooting workflows within high-performance computing (HPC) environments.

Use Case 1: Display Job ID, Job Name, Partition, Account, Number of Allocated CPUs, Job State, and Job Exit Codes for Recent Jobs

Code:

sacct

Motivation:
Understanding the details of jobs recently executed is crucial for monitoring system performance and user activity. By displaying fundamental information such as job ID, job name, partition, account, allocated CPUs, job state, and exit codes, users can quickly assess which jobs have completed successfully, are currently running, or failed.

Explanation:

  • sacct: With no options, sacct reads the Slurm accounting records and lists the jobs that were active since midnight of the current day. A non-root user sees only their own jobs by default; root sees the jobs of all users. The default columns are job ID, job name, partition, account, allocated CPUs, state, and exit code. Job steps such as 123456.batch appear as additional rows; the example output below omits them for brevity.
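
The default listing can also be filtered in a script. The following is a minimal sketch rather than a definitive recipe: the helper name failed_jobs is hypothetical, and it assumes the pipe-delimited "JobID|State|ExitCode" layout that sacct prints when --parsable2, --noheader, and --format=JobID,State,ExitCode are combined.

```shell
#!/bin/sh
# failed_jobs (hypothetical helper): read "JobID|State|ExitCode" lines on
# stdin and print the IDs of the entries whose state is exactly FAILED.
failed_jobs() {
    awk -F'|' '$2 == "FAILED" { print $1 }'
}

# Only query Slurm when sacct is actually installed on this machine.
if command -v sacct >/dev/null 2>&1; then
    sacct --allocations --parsable2 --noheader \
          --format=JobID,State,ExitCode | failed_jobs
fi
```

Matching on the exact string FAILED is a deliberate simplification; other unsuccessful states, such as TIMEOUT, would need their own conditions.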

Example Output:

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
123456            myjob    compute     myacct          4  COMPLETED      0:0
123457          testjob    compute     myacct          2     FAILED      1:0

Use Case 2: Display Job ID, Job State, and Job Exit Code for Recent Jobs

Code:

sacct --brief

Motivation:
When only the outcome of each job matters, the job ID, state, and exit code are enough. The --brief option trims the output to those three columns, which makes failed or still-running jobs easier to spot.

Explanation:

  • sacct: Retrieves job information from the Slurm database.
  • --brief: This option limits the output to a concise set of columns, specifically displaying only the job ID, state, and exit code. It is ideal for quickly scanning job statuses.
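
For a quick tally rather than a row-by-row scan, the same three columns can be summarized per state. This is a sketch under stated assumptions: count_by_state is a hypothetical helper name, and the input is assumed to be the pipe-delimited output of --parsable2 --noheader --format=JobID,State,ExitCode, which mirrors the columns that --brief prints.

```shell
#!/bin/sh
# count_by_state (hypothetical helper): read "JobID|State|ExitCode" lines on
# stdin and print one "STATE COUNT" line per distinct state, sorted by name.
count_by_state() {
    awk -F'|' '{ n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Only query Slurm when sacct is actually installed on this machine.
if command -v sacct >/dev/null 2>&1; then
    sacct --allocations --parsable2 --noheader \
          --format=JobID,State,ExitCode | count_by_state
fi
```

States are compared as whole strings, so a value such as "CANCELLED by 1001" is counted separately from a plain CANCELLED.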

Example Output:

       JobID      State ExitCode
------------ ---------- --------
123456        COMPLETED      0:0
123457           FAILED      1:0

Use Case 3: Display the Allocations of a Job

Code:

sacct --jobs job_id --allocations

Motivation:
By default, sacct prints one row for the job itself and one for each of its job steps (for example, the batch script and each srun launch). When only the job-level summary matters, restricting the output to the allocation keeps the listing to a single row per job, which is easier to read and to process in scripts.

Explanation:

  • sacct: Executes the job accounting command.
  • --jobs job_id: Specifies the job of interest for which to display information.
  • --allocations: Shows only the job allocation record and omits the individual job steps (short form: -X). It does not change which columns are printed; add --format to choose fields such as NodeList or AllocCPUS.
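
To see what --allocations leaves out, the same filtering can be imitated on text output. The sketch below is illustrative only: allocations_only is a hypothetical helper, and it assumes pipe-delimited --parsable2 output in which step rows carry a dot in the JobID (for example 123456.batch) while the allocation row does not.

```shell
#!/bin/sh
# allocations_only (hypothetical helper): read pipe-delimited sacct rows on
# stdin and keep only those whose JobID (first field) contains no dot,
# i.e. drop step rows such as 123456.batch or 123456.0.
allocations_only() {
    awk -F'|' 'index($1, ".") == 0'
}
```

A pipeline such as sacct --jobs job_id --parsable2 --noheader --format=JobID,State | allocations_only should then list the same rows as adding --allocations to the sacct call, which remains the simpler choice in practice.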

Example Output:

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
123456            myjob    compute     myacct          4  COMPLETED      0:0

Use Case 4: Display Elapsed Time, Job Name, Number of Requested CPUs, and Memory Requested of a Job

Code:

sacct --jobs job_id --format=Elapsed,JobName,ReqCPUS,ReqMem

Motivation:
Detailed information about a job’s runtime and resources can reveal inefficiencies or scalability issues in the computational workload. By analyzing these parameters, users can fine-tune application performance and resource distribution.

Explanation:

  • sacct: Initiates the job accounting data retrieval.
  • --jobs job_id: Targets a specific job for detailed information.
  • --format=Elapsed,JobName,ReqCPUS,ReqMem: Selects the columns to print: elapsed wall-clock time, job name, requested CPUs, and requested memory. These fields describe what the job asked for and how long it ran, not what it consumed; fields such as MaxRSS or TotalCPU report actual usage.
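
Elapsed values are easier to compare or sum once they are converted to seconds. The following is a small sketch, assuming the value looks like HH:MM:SS, optionally prefixed with a day count as in 1-02:03:04; elapsed_to_seconds is a hypothetical helper name, not part of Slurm.

```shell
#!/bin/sh
# elapsed_to_seconds (hypothetical helper): convert an Elapsed value of the
# form [DD-]HH:MM:SS (or MM:SS) into a number of seconds.
elapsed_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; next }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }
        NF == 2 { print $1 * 60 + $2 }'
}

elapsed_to_seconds 00:30:00    # prints 1800
```

The conversion is done in awk rather than in shell arithmetic so that fields with leading zeros, such as 08, are read as decimal numbers.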

Example Output:

   Elapsed    JobName  ReqCPUS     ReqMem
---------- ---------- -------- ----------
  00:30:00      myjob        4      4000M

Use Case 5: Display Recent Jobs That Occurred from One Week Ago up to the Present Day

Code:

sacct --starttime=$(date -d "1 week ago" +'%F')

Motivation:
Reviewing job activity within a specific timeframe is essential for post-mortem analyses, reporting, and historical insights. By filtering jobs executed in the past week, users can focus on relevant data for regular assessments and audits.

Explanation:

  • sacct: Calls the accounting data retrieval for jobs in the specified period.
  • --starttime=$(date -d "1 week ago" +'%F'): Sets the start of the reporting window. The command substitution runs date, which prints the date one week ago in YYYY-MM-DD format (%F), so the window opens at 00:00 on that day and, with no --endtime given, extends to the present. The -d "1 week ago" syntax is specific to GNU date.
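
Because -d "1 week ago" is a GNU date feature, the command can fail on systems with a different date implementation. The sketch below shows one possible fallback; week_ago is a hypothetical helper, and the -v-7d branch assumes a BSD-style date such as the one shipped with macOS.

```shell
#!/bin/sh
# week_ago (hypothetical helper): print the date seven days ago as YYYY-MM-DD.
# Tries the GNU date syntax first and falls back to the BSD/macOS syntax.
week_ago() {
    date -d "1 week ago" +%F 2>/dev/null || date -v-7d +%F
}

week_ago
```

The helper can then replace the inline substitution, as in sacct --starttime="$(week_ago)".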

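Example Output (shown with --format=JobID,JobName,State,Start added so that the start times are visible; without it, the default columns from Use Case 1 are printed):

       JobID    JobName      State               Start
------------ ---------- ---------- -------------------
123450         analysis  COMPLETED 2023-09-23T12:00:00
123455         run_test     FAILED 2023-09-22T16:30:00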

Use Case 6: Output a Larger Number of Characters for an Attribute

Code:

sacct --format=JobID,JobName%100

Motivation:
Long job names can be truncated in default outputs. Allocating additional character space to the job name attribute ensures complete visibility of the job names, thus preserving clarity and preventing confusion over similarly named jobs.

Explanation:

  • sacct: Retrieves job information.
  • --format=JobID,JobName%100: Selects the job ID and job name columns and sets the width of the job name column to 100 characters, so that longer names are displayed in full. Values are right-justified by default; a negative width such as JobName%-100 left-justifies them.
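
When the output is meant for a script rather than a terminal, an alternative to widening columns is the pipe-delimited --parsable2 format, which prints each field without padding or truncation. This is a sketch only: list_job_names is a hypothetical helper, and it assumes input lines of the form "JobID|JobName".

```shell
#!/bin/sh
# list_job_names (hypothetical helper): read "JobID|JobName" lines on stdin
# and print one "JobID: JobName" line per entry, whatever the name length.
list_job_names() {
    while IFS='|' read -r job_id job_name; do
        printf '%s: %s\n' "$job_id" "$job_name"
    done
}

# Only query Slurm when sacct is actually installed on this machine.
if command -v sacct >/dev/null 2>&1; then
    sacct --parsable2 --noheader --format=JobID,JobName | list_job_names
fi
```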

Example Output:

       JobID                                                                                              JobName
------------ ----------------------------------------------------------------------------------------------------
123456                                                     full_analysis_run_with_detailed_parameters_and_logging

Conclusion

The sacct command is the standard way to query Slurm's job accounting records. With options for selecting jobs, time windows, and output fields, its reports can be tailored to monitoring, performance evaluation, and capacity planning, which in turn helps users size their resource requests and diagnose failed jobs.
