How to Use the Command `parallel` (with Examples)
The `parallel` command is a powerful GNU tool that executes shell commands concurrently across multiple CPU cores. By automating concurrency, it significantly improves throughput, making it invaluable for large data-processing tasks. Whether compressing files, converting images, or downloading data, `parallel` ensures tasks make full use of the available processing power.
Use Case 1: Gzip Several Files at Once, Using All Cores
Code:
parallel gzip ::: path/to/file1 path/to/file2 ...
Motivation:
Imagine having a directory filled with hundreds of large log files that need compressing. Compressing them one by one is time-consuming and inefficient, especially with multi-core processors idling away. This method leverages all available CPU cores, drastically reducing the time required for compression.
Explanation:
- `parallel`: Invokes GNU parallel to execute commands concurrently.
- `gzip`: The command applied to compress each file.
- `:::`: Introduces the argument list; each item becomes a separate command-line argument.
- `path/to/file1 path/to/file2 ...`: The files to be compressed. `parallel` turns each file into its own job and distributes the jobs over the available cores.
Example Output:
`gzip` itself prints nothing on success; when the command finishes, each file has simply been replaced by its compressed counterpart:
path/to/file1.gz
path/to/file2.gz
...
Use Case 2: Read Arguments from stdin, Run 4 Jobs at Once
Code:
ls *.txt | parallel -j4 gzip
Motivation:
When compressing a collection of .txt files, you might want to limit CPU usage by controlling the number of concurrent tasks. Capping concurrency at four jobs keeps system resources well utilized without overloading the machine.
Explanation:
- `ls *.txt`: Lists all .txt files in the current directory. (Fine for a quick job, though filenames containing newlines would break this pipeline.)
- `|`: Pipes the file list to `parallel`.
- `parallel`: Reads arguments from stdin and executes commands in parallel.
- `-j4`: Limits execution to four concurrent jobs.
- `gzip`: The command applied to each incoming file argument.
Example Output:
Up to four .txt files are compressed at a time. `gzip` runs silently; on completion each file has been replaced by its .gz counterpart:
file1.txt.gz
file2.txt.gz
...
Use Case 3: Convert JPEG Images to PNG Using Replacement Strings
Code:
parallel convert {} {.}.png ::: *.jpg
Motivation:
Batch image format conversion tasks like converting JPEG to PNG are common in image processing. This command simplifies the conversion task by running the conversions simultaneously across all images, saving precious time on manual execution.
Explanation:
- `parallel`: Initiates concurrent command execution.
- `convert`: ImageMagick's command-line tool for converting image formats.
- `{}`: Placeholder replaced by the current argument (the JPEG file being processed).
- `{.}`: The current argument with its file extension removed.
- `.png`: The extension appended to form the output file name.
- `:::`: Introduces the argument list.
- `*.jpg`: All JPEG files in the current directory.
Example Output:
`convert` produces no terminal output on success; afterwards the directory contains a .png next to every .jpg:
image1.jpg  image1.png
image2.jpg  image2.png
...
Use Case 4: Parallel Xargs, Cram as Many Args as Possible onto One Command
Code:
args | parallel -X command
Motivation:
In scenarios where you have numerous arguments to pass to a single command line but want to maximize efficiency by using fewer processes, packing as many arguments onto one command minimizes process creation overhead.
Explanation:
- `args`: Placeholder for demonstration; in practice this is any command producing a list of arguments.
- `|`: Pipes the argument list to `parallel`.
- `parallel`: Executes commands in parallel.
- `-X`: Instructs `parallel` to cram as many arguments as possible into each invocation of `command`, xargs-style.
- `command`: The operation or application executed on the provided arguments.
Example Output:
The output depends entirely on command. Because arguments are batched, you will see far fewer invocations of command than there are arguments.
Use Case 5: Break stdin into ~1M Blocks, Feed Each Block to stdin of a New Command
Code:
cat big_file.txt | parallel --pipe --block 1M command
Motivation:
When dealing with massive text files, breaking the file into manageable chunks makes processing efficient and reduces memory bottleneck risks. This method allows you to efficiently handle large files by feeding chunks to a command, which processes data in blocks rather than line-by-line.
Explanation:
- `cat big_file.txt`: Outputs the contents of the large text file.
- `|`: Pipes the output to `parallel`.
- `parallel`: Coordinates the block-oriented processing.
- `--pipe`: Switches `parallel` into pipe mode: stdin is split into blocks, and each block is fed to the stdin of a new instance of the command.
- `--block 1M`: Sets each chunk's size to approximately 1 megabyte (splits happen on line boundaries near that size).
- `command`: The process applied to each block, defined per your needs.
Example Output:
Each block is handed to its own instance of command, and parallel prints each job's output as it completes:
<output of command for block 1>
<output of command for block 2>
...
Use Case 6: Run on Multiple Machines via SSH
Code:
parallel -S machine1,machine2 command ::: arg1 arg2
Motivation:
In distributed computing environments, executing tasks across multiple machines can harness the combined power of several computers, significantly speeding up operation completion. This is ideal for distributed workloads like simulations or data analytics.
Explanation:
- `parallel`: Executes commands in parallel across multiple machines.
- `-S machine1,machine2`: The target machines, separated by commas. Passwordless (key-based) SSH access to each machine must already be configured.
- `command`: The operation or script executed on the machines.
- `:::`: Introduces the argument list.
- `arg1 arg2`: The arguments; each one becomes a job dispatched to one of the machines.
Example Output:
The jobs' output appears just as if command had run locally. parallel does not say which machine handled which argument unless you add an option such as --tag, which prefixes every output line with the argument that produced it:
arg1  <output of command arg1>
arg2  <output of command arg2>
...
Use Case 7: Download 4 Files Simultaneously from a Text File Containing Links Showing Progress
Code:
parallel -j4 --bar --eta wget -q {} :::: path/to/links.txt
Motivation:
When handling multiple file downloads, especially from a list of URLs, doing so sequentially can be a bottleneck. This command executes downloads in parallel, displaying progress indicators for enhanced monitoring.
Explanation:
- `parallel`: Handles parallel execution of the download tasks.
- `-j4`: Restricts concurrent downloads to four, keeping network and system resource usage in check.
- `--bar`: Shows a progress bar for the overall job list.
- `--eta`: Displays the estimated time remaining until all jobs finish.
- `wget -q`: The download command; `-q` silences wget's own output so the progress bar stays readable.
- `{}`: Placeholder substituted with each URL read from the file.
- `::::`: Reads the argument list from a file instead of the command line.
- `path/to/links.txt`: Path to the file containing the links to download, one per line.
Example Output:
With wget silenced by -q, the display is dominated by parallel's progress bar and ETA estimate:
[=>        ] 50% ETA 00:02:00
Use Case 8: Print the Jobs Which parallel Is Running on stderr
Code:
parallel -t command ::: args
Motivation:
When debugging or verifying what commands are being executed, real-time transparency can be instrumental. Printing these jobs during execution aids debugging by offering insights into task management.
Explanation:
- `parallel`: Executes tasks in parallel.
- `-t`: Tells `parallel` to print each command line to stderr just before running it.
- `command`: The operation performed on the listed arguments.
- `:::`: Introduces the argument list that `parallel` will process.
- `args`: The command-line arguments on which `command` operates.
Example Output:
Each command line is echoed to stderr immediately before it runs:
command arg1
command arg2
...
Conclusion:
The `parallel` command proves indispensable for optimizing tasks across CPU cores and networked machines. By executing multiple processes simultaneously, users gain remarkable efficiency in scenarios ranging from file compression and image conversion to network downloads and distributed computation. It not only leverages multi-core processors effectively but also extends processing seamlessly across networked systems, making it a go-to tool for anyone working with substantial workloads.