How to use the command 'pv' (with examples)

The pv command, short for “Pipe Viewer,” is a Unix/Linux tool for monitoring the progress of data through a pipeline. Placed between two commands, or used in place of cat, it shows how much data has passed, how fast it is moving, and, when the total size is known, how long the rest will take. This article walks through several practical uses of pv, from watching a simple file read to rate limiting and error-tolerant disk imaging.

Use case 1: Print the contents of the file and display a progress bar

Code:

pv path/to/file

Motivation: When working with large files, a visual indication of how much of the file has been read tells you whether a long operation is progressing and roughly when it will finish. Whether you are sending a file over a network or feeding it to another command, the progress bar gives continuous feedback on the operation’s advancement.

Explanation:

  • pv: The command invokes the Pipe Viewer tool.
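  • path/to/file: The file to read. pv writes its contents to standard output and draws the progress bar on standard error, so in practice you redirect or pipe the output (for example, pv path/to/file > /dev/null to simply watch the read).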

Example Output:

3.5MiB 0:00:02 [1.47MiB/s] [===============>                     ] 35% ETA 0:00:05

Use case 2: Measure the speed and amount of data flow between pipes

Code:

command1 | pv --size expected_amount_of_data_for_eta | command2

Motivation: When processing data through multiple commands in a pipeline, it’s often necessary to know both the speed and the volume of data being processed. This information can help in diagnosing bottlenecks or optimizing processes to improve performance.

Explanation:

  • command1: Represents the first command in the pipeline whose output is being processed.
  • pv: Utilizes Pipe Viewer to monitor the data between command1 and command2.
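  • --size expected_amount_of_data_for_eta: The expected total amount of data; suffixes such as K, M, and G are accepted (for example, 500M). Because pv cannot determine the size of data arriving through a pipe on its own, this value lets it display a percentage and an estimated time of arrival (ETA) for the completion of the transfer.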
  • command2: Represents the second command in the pipeline which processes the data coming from command1.

Example Output:

5.6MiB 0:00:04 [1.35MiB/s] [======================>              ] 65% ETA 0:00:02

Use case 3: Filter a file, see both progress and amount of output data

Code:

pv -cN in big_text_file | grep pattern | pv -cN out > filtered_file

Motivation: In data analysis and log processing, filtering files to extract relevant lines is routine. Placing pv on both sides of the filter shows how far through the input you are and how much output the filter has produced so far, which also indicates how selective the pattern is.

Explanation:

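  • pv -cN in: Combines -c (--cursor), which uses cursor positioning escape sequences so that several pv instances can share one terminal without overwriting each other, and -N in (--name), which labels this progress line “in.”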
  • big_text_file: Indicates the file being filtered.
  • grep pattern: Filters lines that contain the specified pattern.
  • pv -cN out: The same options, labelling the second progress line “out.” Because this pv reads from a pipe, it cannot know the total size, so it reports the amount and rate of output data rather than a percentage.
  • > filtered_file: Writes the matching lines to filtered_file.

Example Output:

in: 20.0MiB 0:00:15 [1.30MiB/s] [=========>             ] 50% ETA 0:00:15
out: 4.0MiB 0:00:15 [ 273KiB/s] [          <=>          ]

Use case 4: Attach to an already running process and see its file reading progress

Code:

pv -d PID

Motivation: Sometimes you want to know how far an already-running process has got through the files it is reading, for example a long copy, compression, or backup that was started without pv. Attaching pv lets you gauge its progress without interrupting or restarting it.

Explanation:

  • pv: Starts Pipe Viewer.
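  • -d PID: Short for --watchfd. Instead of transferring data, pv watches the process with the given Process ID (PID) and shows a progress bar for each regular file or block device that process has open, exiting when the process exits. Use -d PID:FD to watch a single file descriptor instead.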

Example Output:

4.0MiB 0:00:10 [390KiB/s] [============>               ] 60% ETA 0:00:05

Use case 5: Read from failing media, skipping errors as dd conv=sync,noerror would

Code:

pv -EE path/to/faulty_media > image.img

Motivation: When recovering data from a failing disk or other damaged media, read errors normally abort the copy. Skipping unreadable sections lets the copy continue so that everything still readable is saved, which matters in scenarios where every accessible piece of data counts.

Explanation:

  • pv: Uses Pipe Viewer.
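  • -EE: -E (--skip-errors) skips past read errors instead of aborting, writing null bytes in place of the unreadable sections, similar to dd with conv=sync,noerror. Giving it twice reports a read error only once per file rather than once for every byte range skipped.
  • path/to/faulty_media: The damaged source to read, such as a failing disk’s device file or a file stored on bad media.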
  • > image.img: Redirects output to an image file.

Example Output:

2.5GiB 0:01:52 [390KiB/s] [========>                    ] 35% ETA 0:03:28

Use case 6: Stop after reading a specified amount of data, rate limit to 1K/s

Code:

pv -L 1K --stop-at-size --size maximum_file_size_to_be_read

Motivation: Controlling data flow is useful when bandwidth is limited, or when you want to process or sample only the first part of a stream. The rate limit keeps pv from saturating a link or disk, and the stop condition ends the transfer after a fixed amount of data.

Explanation:

  • pv: Invokes Pipe Viewer. With no file argument, it reads from standard input.
  • -L 1K: Limits the transfer rate to 1 KiB (1024 bytes) per second, thereby controlling the data flow into the pipeline.
  • --stop-at-size: Stops the transfer once the amount given by --size has been passed through, instead of continuing to the end of the input (short form: -S).
  • --size maximum_file_size_to_be_read: Sets the amount of data to transfer before stopping; suffixes such as K and M are accepted.

Example Output:

100KiB 0:01:40 [1.02KiB/s] [===================================] 100%

Conclusion:

The pv command is a practical tool for anyone working with pipes and data streams in Unix/Linux environments. By making data movement visible, capping throughput, and skipping read errors, it helps you monitor and control long-running transfers and processing jobs. The examples above cover only some of its options; the pv manual page documents the rest.
