How to use the command 'pv' (with examples)
The pv command, or “Pipe Viewer,” is a versatile tool in the Unix/Linux ecosystem, designed to monitor the progress of data through pipelines. It’s particularly valuable when working with data streams, enabling users to visualize and measure the flow of data. The various uses of pv include tracking data transfer speeds, verifying data processing in real time, and managing large file operations with ease. This article will delve into several practical uses of pv, demonstrating its power and flexibility in different contexts.
Use case 1: Print the contents of the file and display a progress bar
Code:
pv path/to/file
Motivation: In scenarios where you are dealing with large files, it can be useful to have a visual indication of how much of the file has been processed or transferred. Whether you are monitoring a file being sent over a network or simply assessing the size of a local file, the progress bar provides valuable feedback on the operation’s advancement.
Explanation:
pv: The command invokes the Pipe Viewer tool.
path/to/file: Specifies the file whose contents you want to read and display with a progress monitoring interface.
Example Output:
3.5MiB 0:00:02 [1.47MiB/s] [===============> ] 35% ETA 0:00:05
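For instance, a minimal sketch that verifies a download while showing progress (the ISO filename here is hypothetical):
pv ubuntu-24.04.iso | sha256sum
pv writes the file’s contents to standard output for sha256sum to consume, while the progress display goes to standard error, so the two do not interfere.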
Use case 2: Measure the speed and amount of data flow between pipes
Code:
command1 | pv --size expected_amount_of_data_for_eta | command2
Motivation: When processing data through multiple commands in a pipeline, it’s often necessary to know both the speed and the volume of data being processed. This information can help in diagnosing bottlenecks or optimizing processes to improve performance.
Explanation:
command1: Represents the first command in the pipeline whose output is being processed.
pv: Utilizes Pipe Viewer to monitor the data between command1 and command2.
--size expected_amount_of_data_for_eta: Provides an estimated total size, allowing pv to calculate an estimated time of arrival (ETA) for the completion of data transfer.
command2: Represents the second command in the pipeline which processes the data coming from command1.
Example Output:
5.6MiB 0:00:04 [1.35MiB/s] [======================> ] 65% ETA 0:00:02
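As a concrete sketch, assuming a directory named project/ that you expect to produce roughly 100 MiB of tar output (both the name and the size estimate are assumptions):
tar -cf - project/ | pv --size 100M | gzip > project.tar.gz
Because gzip’s output size is not known in advance, the --size hint describes the uncompressed tar stream flowing through pv, which is what makes the ETA meaningful.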
Use case 3: Filter a file, see both progress and amount of output data
Code:
pv -cN in big_text_file | grep pattern | pv -cN out > filtered_file
Motivation: In data analysis and log processing, it’s commonplace to filter files to extract relevant information. By using pv, you can not only filter the data but also track the progress of both input and output, offering a comprehensive overview of how much data is processed and generated.
Explanation:
pv -cN in: The -c switch enables cursor positioning, so multiple pv instances in one pipeline can each update their own line, and -N in labels this data stream “in.”
big_text_file: Indicates the file being filtered.
grep pattern: Filters lines that contain the specified pattern.
pv -cN out: Uses the same switches to label the second data stream “out,” while also showing the amount of data flowing into filtered_file.
> filtered_file: Redirects the output to a filtered file.
Example Output:
in: 20.0MiB 0:00:15 [1.30MiB/s] [=========> ] 50% ETA 0:00:15
out: 4.0MiB 0:00:02 [1.50MiB/s] [============> ] 80% ETA 0:00:01
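A concrete invocation, assuming a log file named access.log and a search for the string “ERROR” (both names are hypothetical):
pv -cN in access.log | grep ERROR | pv -cN out > errors.log
The “in” line reports how quickly the source file is being read, while the “out” line reports how much filtered data has been written; a large gap between the two simply reflects how selective the pattern is.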
Use case 4: Attach to an already running process and see its file reading progress
Code:
pv -d PID
Motivation: Sometimes, you need mid-process information about a file being read by an already-running process. This is useful for long-running operations where you want to gauge how far along the process is without interrupting it.
Explanation:
pv: Starts Pipe Viewer.
-d PID: The -d option attaches to a specific process identified by its Process ID (PID), allowing you to monitor its file reading operations in real time.
Example Output:
4.0MiB 0:00:10 [390KiB/s] [============> ] 60% ETA 0:00:05
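For example, to watch an already-running gzip job without knowing its PID in advance, you could look it up with pgrep (a sketch; pgrep -n picks the most recently started matching process):
pv -d "$(pgrep -n gzip)"
Note that attaching to another user’s process may require elevated privileges.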
Use case 5: Read an erroneous file, skip errors as dd conv=sync,noerror would
Code:
pv -EE path/to/faulty_media > image.img
Motivation: In data recovery situations, dealing with a faulty media file can lead to read errors that interrupt processes. Skipping errors during reading ensures that the process can continue even if some data is irretrievable. This technique is crucial for scenarios where every accessible piece of data counts.
Explanation:
pv: Uses Pipe Viewer.
-EE: Allows pv to ignore read errors, akin to dd with the conv=sync,noerror options, enabling continued processing despite errors.
path/to/faulty_media: The path specifies the faulty media file to be read.
> image.img: Redirects output to an image file.
Example Output:
2.5GiB 0:01:52 [390KiB/s] [========> ] 35% ETA 0:03:28
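A concrete sketch, assuming the failing medium is exposed as the block device /dev/sdb (a hypothetical name; double-check the device before running, and note that reading a raw device usually requires root):
sudo pv -EE /dev/sdb > image.img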
Use case 6: Stop reading after a specified amount of data, rate limited to 1K/s
Code:
pv -L 1K --stop-at --size maximum_file_size_to_be_read
Motivation: Control over data flow is exceptionally desirable when bandwidth is a constraint, or when you aim to process or sample only a specific part of the data. Limiting the read rate and setting a stop condition allows you to manage data input efficiently.
Explanation:
pv: Invokes Pipe Viewer.
-L 1K: Limits the read rate to 1KB per second, thereby controlling the data flow into the pipeline.
--stop-at: Tells pv to stop transferring once the amount given with --size has been read, rather than continuing to the end of input; useful when only a partial read of a large dataset is needed.
--size maximum_file_size_to_be_read: Sets the expected total amount of data, which also serves as the limit at which --stop-at ends the read.
Example Output:
100K 0:01:40 [1.02KiB/s] [===================================] 100%
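For a self-contained test that needs no existing input file, you could sample data from /dev/urandom (an assumption; any readable source works). Reading 100K at roughly 1K/s takes about 100 seconds, consistent with the output shown above:
pv -L 1K --stop-at --size 100K /dev/urandom > sample.bin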
Conclusion:
The pv command is an essential tool for anyone working with pipes and data streams in Unix/Linux environments. By visually tracking data movement, limiting throughput, and handling errors gracefully, pv provides users with the power and control necessary to efficiently manage and monitor data processing tasks. The examples above highlight just a few of its many applications, showcasing its capabilities in practical and varied scenarios.