How to use the command `sh5util` (with examples)
- Linux
- December 25, 2023
The `sh5util` command is a Slurm utility that merges the HDF5 files produced by the `acct_gather_profile` plugin. It can also extract data series and individual data items from the merged files.
Use case 1: Merge HDF5 files produced on each allocated node for a specified job or step
Code:
sh5util --jobs=job_id|job_id.step_id
Motivation: During the execution of a job or step, an individual HDF5 file is generated on each allocated node. Merging these files into a single HDF5 file gives a consolidated view of job performance and resource utilization across all the nodes.
Explanation:
- `--jobs`: Specifies the job or job step whose HDF5 files should be merged, written as either `job_id` or `job_id.step_id`. The `|` in the command separates these two alternatives and is not typed.
- `job_id`: The ID of the job whose HDF5 files should be merged.
- `step_id`: (Optional) The ID of a specific job step. If omitted, the command merges the HDF5 files for the entire job.
Example output:
Merging HDF5 files for job ID: 12345
Merged file: path/to/merged_file.h5
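The two forms of the `--jobs` value can be sketched in shell as follows. This is a minimal illustration, not part of Slurm: the helper name `jobs_arg` and the IDs `12345` and `0` are hypothetical, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Build the value passed to --jobs from a job ID and an optional step ID.
# Usage: jobs_arg JOB_ID [STEP_ID]   (jobs_arg is a hypothetical helper)
jobs_arg() {
    if [ -n "${2:-}" ]; then
        printf '%s.%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# 12345 and 0 are made-up IDs chosen for illustration.
echo "sh5util --jobs=$(jobs_arg 12345)"      # whole job
echo "sh5util --jobs=$(jobs_arg 12345 0)"    # step 0 only
```

Printing the command first makes it easy to check the value before running it on a cluster where `sh5util` is installed.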
Use case 2: Extract one or more data series from a merged job file
Code:
sh5util --jobs=job_id|job_id.step_id --extract -i path/to/file.h5 --series=Energy|Filesystem|Network|Task
Motivation: Use this when you need only specific data series from a merged job file rather than the entire dataset, so that you can focus on the metrics of interest.
Explanation:
- `--extract`: Selects the data extraction operation.
- `-i`: Specifies the input merged job file from which the data series are extracted.
- `path/to/file.h5`: The path to the input merged job file.
- `--series`: Specifies the data series to extract. Pass one of the listed names (Energy, Filesystem, Network, Task). The pipe (`|`) in the command separates these alternatives; it is not part of the value, and an unquoted `|` would be interpreted by the shell as a pipeline.
Example output:
Extracting data series from file: path/to/merged_file.h5
Extracted data series: Energy, Filesystem, Task
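To extract several series, one approach is to run the extract command once per series name. The sketch below assumes that `--series` takes a single name per invocation; the helper `extract_cmds` and the job ID `12345` are hypothetical, and the script only prints the commands it would run.

```shell
#!/bin/sh
# Print one extract command per requested series name.
# Usage: extract_cmds JOB_ID INPUT_FILE SERIES...   (hypothetical helper)
extract_cmds() {
    job=$1
    input=$2
    shift 2
    for series in "$@"; do
        case "$series" in
            # Only the four names listed in the command above are accepted here.
            Energy|Filesystem|Network|Task)
                printf 'sh5util --jobs=%s --extract -i %s --series=%s\n' \
                    "$job" "$input" "$series" ;;
            *)
                printf 'skipping unknown series: %s\n' "$series" >&2 ;;
        esac
    done
}

# 12345 is a made-up job ID; path/to/file.h5 is the placeholder path used above.
extract_cmds 12345 path/to/file.h5 Energy Task
```

Inside the `case` pattern the `|` is shell syntax for "any of these names", which mirrors how the pipe is used in the command synopsis.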
Use case 3: Extract one data item from all nodes in a merged job file
Code:
sh5util --jobs=job_id|job_id.step_id --item-extract --series=Energy|Filesystem|Network|Task --data=data_item
Motivation: Use this when you need one specific data item (such as the maximum energy consumption or the average CPU utilization) from every node in a merged job file. It extracts that single item across all nodes, so you can compare one metric over the whole job execution.
Explanation:
- `--item-extract`: Selects the data item extraction operation.
- `--series`: Specifies the data series that contains the data item. Pass one of the listed names (Energy, Filesystem, Network, Task). The pipe (`|`) in the command separates these alternatives and is not part of the value.
- `--data`: Specifies the data item to extract from all nodes.
Example output:
Extracting data item 'max_energy' from series 'Energy' in merged file: path/to/merged_file.h5
Data item values from all nodes: 12, 15, 18, 10, 16
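An item extraction needs a job, a series, and a data item together, so a small wrapper can refuse to build the command when one is missing. This is a sketch under that assumption: `item_extract_cmd` is a hypothetical helper, `12345` is a made-up job ID, and `data_item` is left as the placeholder from the command above because valid item names depend on the series.

```shell
#!/bin/sh
# Print an item-extract command, refusing to build one with a missing argument.
# Usage: item_extract_cmd JOB_ID SERIES DATA_ITEM   (hypothetical helper)
item_extract_cmd() {
    if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
        echo "usage: item_extract_cmd JOB_ID SERIES DATA_ITEM" >&2
        return 1
    fi
    printf 'sh5util --jobs=%s --item-extract --series=%s --data=%s\n' \
        "$1" "$2" "$3"
}

# data_item is a placeholder, not a real item name.
item_extract_cmd 12345 Energy data_item
```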
Conclusion:
The `sh5util` command provides options for merging the HDF5 files generated by the `acct_gather_profile` plugin and for extracting data from them. With these options, users can examine job performance and resource utilization, or focus on specific metrics and data items.