How to use the command 'parallel' (with examples)
The command ‘parallel’ (GNU Parallel) runs commands concurrently on multiple CPU cores. It is particularly useful when you have many independent tasks and want to use every core to finish sooner. ‘parallel’ builds one job per input argument (or per block of input) and, by default, runs as many jobs at a time as there are CPU cores.
Use case 1: Gzip several files at once, using all cores
Code:
parallel gzip ::: file1 file2 file3
Motivation: Compressing files one at a time leaves most cores idle. Here ‘parallel’ starts one ‘gzip’ process per file and runs them concurrently, which can cut total compression time considerably when there are many files.
Explanation:
parallel
: The command name.
gzip
: The command to be executed in parallel.
:::
: Delimits the list of arguments to be passed to ‘gzip’.
file1 file2 file3
: The list of files to be gzipped.
Example Output:
‘gzip’ prints nothing on success. Each input file is replaced by its compressed version:
file1.gz
file2.gz
file3.gz
Use case 2: Read arguments from ‘stdin’, run 4 jobs at once
Code:
ls *.txt | parallel -j4 gzip
Motivation: When no ‘:::’ list is given, ‘parallel’ reads its arguments from ‘stdin’, one per line. The ‘-j4’ option caps the number of simultaneous jobs at four, which is useful when you want to process many files without overwhelming system resources.
Explanation:
ls *.txt
: Lists all the files with the ‘.txt’ extension in the current directory.
|
: Pipes the output of ‘ls’ to ‘parallel’.
parallel
: The command name.
-j4
: Runs at most 4 jobs concurrently.
gzip
: The command to be executed in parallel, once per input line.
Example Output:
‘gzip’ prints nothing on success. The ‘.txt’ files are replaced by:
file1.txt.gz
file2.txt.gz
file3.txt.gz
Use case 3: Convert JPG images to PNG using replacement strings
Code:
parallel convert {} {.}.png ::: *.jpg
Motivation: Converting images one by one is slow. With ‘parallel’, a single command converts every JPG image to PNG (here with ImageMagick’s ‘convert’), running the conversions concurrently and deriving each output name from its input name with replacement strings.
Explanation:
convert {} {.}.png
: The command to be executed in parallel, with ‘{}’ being replaced by the input file name and ‘{.}’ being replaced by the input file name without the file extension.
:::
: Delimits the list of input files to be converted.
*.jpg
: Matches all files with the ‘.jpg’ extension in the current directory.
Example Output:
‘convert’ prints nothing on success. Because ‘{.}’ strips the ‘.jpg’ extension, the new files are:
image1.png
image2.png
image3.png
Use case 4: Parallel xargs, cram as many args as possible onto one command
Code:
args | parallel -X command
Motivation: By default ‘parallel’ starts one job per argument, which is wasteful when the command is cheap and the argument list is long. With ‘-X’, ‘parallel’ behaves like a parallel ‘xargs’: it packs as many arguments as fit onto each command line and spreads those command lines across the available job slots.
Explanation:
args
: Represents a command that generates a list of arguments, one per line.
|
: Pipes the output of ‘args’ to ‘parallel’.
parallel
: The command name.
-X
: Packs as many arguments as fit onto each command line, repeating the context around ‘{}’ for every argument.
command
: The command to be executed in parallel.
Example Output:
Output of command executed with multiple arguments
Use case 5: Break ‘stdin’ into ~1M blocks, feed each block to ‘stdin’ of new command
Code:
cat big_file.txt | parallel --pipe --block 1M command
Motivation: Some workloads are one large stream rather than many separate arguments. With ‘--pipe’, ‘parallel’ cuts the input into blocks of roughly 1 MB and feeds each block to the ‘stdin’ of a separate instance of the command, so a large file can be processed on several CPU cores at once.
Explanation:
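cat big_file.txt
: Reads the contents of ‘big_file.txt’ and writes them to ‘stdout’.
|
: Pipes the output of ‘cat’ to ‘parallel’.
parallel
: The command name.
--pipe
: Tells ‘parallel’ to split its ‘stdin’ into blocks and send each block to the ‘stdin’ of a separate job, instead of treating each input line as an argument.
--block 1M
: Sets the target block size to about 1 megabyte; blocks end on line boundaries, so the exact size varies.
command
: The command to be executed in parallel, once per block of input.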
Example Output:
Output of command executed with each block of input
Use case 6: Run on multiple machines via SSH
Code:
parallel -S machine1,machine2 command ::: arg1 arg2
Motivation: ‘parallel’ can distribute jobs across several remote machines over SSH, combining their computing power. Each argument becomes one job, and the jobs are spread over the listed machines. This assumes you can log in to each machine with SSH without being prompted for a password.
Explanation:
parallel
: The command name.
-S machine1,machine2
: Specifies the comma-separated list of remote machines to run jobs on.
command
: The command to be executed on remote machines.
:::
: Delimits the list of arguments to be passed to ‘command’.
arg1 arg2
: The list of arguments to be passed to ‘command’, one job per argument.
Example Output:
Output of command executed on remote machines for each argument
Conclusion:
‘parallel’ is a versatile tool for running commands concurrently on multiple CPU cores. The use cases above cover its main modes: passing arguments on the command line or through ‘stdin’, building commands with replacement strings, packing many arguments per command like ‘xargs’, splitting a large input stream into blocks, and distributing jobs to remote machines over SSH. Choosing the mode that matches your workload can shorten long-running command-line tasks considerably.