How to use the command 'comm' (with examples)

The comm command is a standard Unix/Linux utility that compares two sorted files line by line. It reports the differences and similarities in up to three columns: lines found only in the first file, lines found only in the second file, and lines common to both. It is useful in data analysis, software development, and system administration for tasks such as comparing configurations or reconciling lists.

Use case 1: Produce three tab-separated columns: lines only in first file, lines only in second file and common lines

Code:

comm file1 file2

Motivation:

You may need to have a comprehensive view of what is unique to each file and what they share, such as when analyzing differences and similarities between two datasets or configuration lists. This can be particularly useful in scenarios like selecting unique product codes or syncing differences between two versions of a text document.

Explanation:

  • comm: The base command used to compare two files.
  • file1: The first input file to be compared.
  • file2: The second input file to be compared.

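In this form, comm generates three columns of output: the first displays lines unique to file1, the second shows lines unique to file2, and the third contains lines common to both files. The columns are distinguished by leading tabs: column 1 has none, column 2 has one, and column 3 has two. Both files must already be sorted in the same collation order; otherwise comm may misreport lines as unique, and GNU comm typically warns that the input is not in sorted order.

Example Output:

		Alice
Bob
		Charlie
David
	Edward
		Fiona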

In this example, “Bob” and “David” are unique to file1, “Edward” is unique to file2, and “Alice”, “Charlie”, and “Fiona” are common to both.
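To reproduce this output, the following sketch builds two small sample files in a scratch directory and runs comm on them. The file contents are an assumption chosen to match the names above; the article does not list them explicitly.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed sample data; both files are already in sorted order.
printf '%s\n' Alice Bob Charlie David Fiona > file1
printf '%s\n' Alice Charlie Edward Fiona > file2

# Column 1 has no indent, column 2 has one leading tab, column 3 has two.
three_columns=$(comm file1 file2)
printf '%s\n' "$three_columns"
```

The leading tabs are what separate the columns, so the output is easiest to read in a terminal or editor that shows tabs at a fixed width.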

Use case 2: Print only lines common to both files

Code:

comm -12 file1 file2

Motivation:

Use this form when you need only the information shared between two files, such as finding common entries in two databases or lists, or identifying mutual contacts in two address books. It reduces redundancy and keeps the focus on shared data.

Explanation:

  • -12: Combines the -1 and -2 options, which suppress the first and second columns (lines unique to each file), leaving only the third column (common lines).
  • file1: The first sorted file.
  • file2: The second sorted file.

With the first two columns suppressed, comm prints only the lines common to file1 and file2, and it prints them without leading tabs.

Example Output:

Alice
Charlie
Fiona

The output lists names (e.g., Alice, Charlie, Fiona) that appear in both file1 and file2.
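As a minimal sketch, assuming the same sample data as in the first use case, the snippet below shows that -12 behaves like -1 -2 given separately, and counts the shared lines with wc. The variable names are illustrative only.

```shell
workdir=$(mktemp -d)
printf '%s\n' Alice Bob Charlie David Fiona > "$workdir/file1"
printf '%s\n' Alice Charlie Edward Fiona > "$workdir/file2"

# -12 is shorthand for passing -1 and -2 separately.
combined=$(comm -12 "$workdir/file1" "$workdir/file2")
separate=$(comm -1 -2 "$workdir/file1" "$workdir/file2")

# Count the shared lines; tr strips the padding some wc implementations add.
shared_count=$(comm -12 "$workdir/file1" "$workdir/file2" | wc -l | tr -d ' ')
echo "$shared_count lines in common"

rm -r "$workdir"
```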

Use case 3: Print only lines common to both files, reading one file from stdin

Code:

cat file1 | comm -12 - file2

Motivation:

Reading one input from stdin lets you pipe data directly into comm. This is useful when the first file is generated or transformed on the fly, or when comm is one step in a longer pipeline, because it avoids creating a temporary file. The piped data must still be sorted.

Explanation:

  • cat file1: Reads the content of file1 and outputs it to stdout.
  • |: Pipes the output from cat into the comm command.
  • -12: Suppresses unique lines from both files, only printing common lines.
  • -: Tells comm to read the first file from standard input instead of a named file.
  • file2: The second sorted file.

Only one of the two file arguments should be -, since there is a single standard input to read from.

Example Output:

Alice
Charlie
Fiona

The result remains consistent, showing lines common to both files.
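In practice the piped data usually comes from another command rather than from cat. The sketch below assumes a hypothetical upstream list that is unsorted and contains a duplicate, cleans it with sort -u, and feeds the result to comm through stdin.

```shell
workdir=$(mktemp -d)
printf '%s\n' Alice Charlie Edward Fiona > "$workdir/file2"

# Hypothetical upstream data: unsorted, with "Alice" listed twice.
# sort -u sorts and de-duplicates it before comm reads it from stdin (-).
common=$(printf '%s\n' Fiona Bob Alice David Charlie Alice |
    sort -u |
    comm -12 - "$workdir/file2")
printf '%s\n' "$common"

rm -r "$workdir"
```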

Use case 4: Get lines only found in first file, saving the result to a third file

Code:

comm -23 file1 file2 > file1_only

Motivation:

This form isolates the records unique to the first data set and saves them for later use, such as generating a report or an update file. It is handy when those unique entries require follow-up action.

Explanation:

  • -23: Suppresses the second and third columns, displaying lines unique to file1.
  • file1: The first input file, to find unique lines in.
  • file2: The second file, whose content file1 is compared against.
  • > file1_only: Redirects the unique lines from file1 into the file named file1_only.

Output redirection (>) is used to save the result into a specific file instead of displaying it on the screen.

Example Output:

Assuming the content was redirected, file1_only will contain:

Bob
David

This isolates the lines exclusive to file1.
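A saved result can then drive a follow-up step. The sketch below assumes the same sample data as earlier and a hypothetical follow-up action (printing a message per line) to show one way the file might be consumed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' Alice Bob Charlie David Fiona > file1
printf '%s\n' Alice Charlie Edward Fiona > file2

# Write to a new file name: redirecting to file1 or file2 would truncate
# that input before comm reads it.
comm -23 file1 file2 > file1_only

# Hypothetical follow-up: act on each line that exists only in file1.
while IFS= read -r name; do
    printf 'follow up: %s\n' "$name"
done < file1_only
```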

Use case 5: Print lines only found in second file, when the files aren’t sorted

Code:

comm -13 <(sort file1) <(sort file2)

Motivation:

Files are often unsorted in their raw state, and comm needs sorted input. This example combines comm with sort through process substitution, so both files are sorted on the fly without creating intermediate files. Process substitution is a feature of shells such as bash, zsh, and ksh; it is not available in plain POSIX sh.

Explanation:

  • -13: Prints lines unique to the second file by suppressing the first and third columns.
  • <(sort file1): This process substitution sorts and feeds a sorted version of file1 into comm.
  • <(sort file2): Similarly, sorts file2 before feeding it into comm.

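Process substitution, <(...), runs each sort command and hands comm a file name (typically a named pipe or a /dev/fd entry) from which the sorted output can be read, so comm can work on unsorted input without temporary files.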

Example Output:

Edward

The output demonstrates that “Edward” is found solely in file2.
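Where process substitution is unavailable, the same result can be obtained with explicit sorted copies. This is a minimal sketch, assuming hypothetical unsorted inputs and the widely available (but not POSIX-mandated) mktemp command; setting LC_ALL=C for both sort and comm keeps the two tools agreeing on collation order.

```shell
workdir=$(mktemp -d)

# Hypothetical unsorted inputs.
printf '%s\n' David Alice Fiona Bob Charlie > "$workdir/file1"
printf '%s\n' Fiona Edward Alice Charlie > "$workdir/file2"

# Sort both inputs under the same locale that comm will use.
LC_ALL=C sort "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort "$workdir/file2" > "$workdir/file2.sorted"

# -13 leaves only column 2: lines found only in the second file.
only_in_file2=$(LC_ALL=C comm -13 "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$only_in_file2"

rm -r "$workdir"
```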

Conclusion

The comm command is a compact tool for comparing sorted text files on Unix/Linux systems. By suppressing columns with -1, -2, and -3, reading one input from stdin, and combining comm with sort, you can extract the shared or unique lines of two lists in a single command.
