How to use the command mlr (with examples)

Miller (mlr) is a command-line utility that combines features of awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON. Because it addresses fields by name rather than by position, commands keep working when columns are reordered. This article walks through several practical use cases of the mlr command.

Use case 1: Pretty-print a CSV file in a tabular format

Code:

mlr --icsv --opprint cat example.csv

Motivation: Raw CSV is hard to scan because the columns do not line up. Miller can pretty-print the file as an aligned table that is easier to read.

Explanation:

  • --icsv : Specifies that the input file is in CSV format.
  • --opprint : Writes the output in PPRINT format, with columns aligned by spaces.
  • cat : The cat verb passes every record through unchanged.

Example output:

field1 field2 field3
A      10     20
B      30     40
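To try the commands in this article, a small input file is needed. The following is a minimal sketch that creates one; the file contents are made-up data chosen to match the example output above, and the mlr call is skipped when Miller is not installed.

```shell
# Create a small sample file; the rows are made-up illustration data.
cat > example.csv <<'EOF'
field1,field2,field3
A,10,20
B,30,40
EOF

# Pretty-print it as an aligned table, but only if Miller is installed.
if command -v mlr >/dev/null 2>&1; then
  mlr --icsv --opprint cat example.csv
else
  echo "mlr not found; install Miller to run this example" >&2
fi
```

The same example.csv can be reused with the sort and put examples later in the article.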

Use case 2: Receive JSON data and pretty print the output

Code:

echo '{"hello":"world"}' | mlr --ijson --opprint cat

Motivation: Miller reads JSON records from standard input, so the output of another command can be piped straight in and displayed as an aligned table, which is easier to scan than raw JSON.

Explanation:

  • --ijson : Specifies that the input data is in JSON format.
  • --opprint : Writes the output in PPRINT format, with columns aligned by spaces.
  • cat : The cat verb passes every record through unchanged.

Example output:

hello
world

Use case 3: Sort alphabetically on a field

Code:

mlr --icsv --opprint sort -f field example.csv

Motivation: Sometimes it is necessary to sort the data based on a specific field for better analysis or organization. The mlr command allows us to easily sort the data alphabetically based on a chosen field.

Explanation:

  • --icsv : Specifies that the input file is in CSV format.
  • --opprint : Writes the output in PPRINT format, with columns aligned by spaces.
  • sort : Sorts the records.
  • -f field : Sorts lexically, in ascending order, by the named field.

Example output:

field1 field2 field3
A      10     20
B      30     40

Use case 4: Sort in descending numerical order on a field

Code:

mlr --icsv --opprint sort -nr field example.csv

Motivation: Sorting numerically in descending order puts the largest values first, which is useful for finding the top entries in a data set. Miller compares the values as numbers, so 100 sorts above 20.

Explanation:

  • --icsv : Specifies that the input file is in CSV format.
  • --opprint : Writes the output in PPRINT format, with columns aligned by spaces.
  • sort : Sorts the records.
  • -nr field : Sorts numerically, in descending order, by the named field.

Example output:

field1 field2 field3
B      30     40
A      10     20
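The two sort flags above can also be combined in a single sort call, so that one field is the primary key and another breaks ties. The sketch below assumes a hypothetical file sort_demo.csv with a repeated value in field1; rows with the same field1 should come out with the larger field2 first.

```shell
# Made-up data with a tie in field1 (two "A" rows).
cat > sort_demo.csv <<'EOF'
field1,field2,field3
B,30,40
A,10,20
A,50,60
EOF

if command -v mlr >/dev/null 2>&1; then
  # -f field1  : lexical ascending on field1 (primary key)
  # -nr field2 : numeric descending on field2 (tie-breaker)
  mlr --icsv --opprint sort -f field1 -nr field2 sort_demo.csv
else
  echo "mlr not found; install Miller to run this example" >&2
fi
```

The order of the flags sets the priority of the keys: the first one given is compared first.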

Use case 5: Convert CSV to JSON, perform calculations, and display those calculations

Code:

mlr --icsv --ojson put '$newField1 = $oldFieldA/$oldFieldB' example.csv

Motivation: Converting CSV data to JSON and performing calculations on specific fields can be a useful task in data analysis or processing workflows. The mlr command allows us to perform such calculations and display the results.

Explanation:

  • --icsv : Specifies that the input file is in CSV format.
  • --ojson : Specifies the output format as JSON.
  • put '$newField1 = $oldFieldA/$oldFieldB' : Divides the value of oldFieldA by the value of oldFieldB in each record and stores the result in a new field named newField1.

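Example output (Miller 6 layout, for an input with columns field1, field2, field3 and the expression $newField1 = $field2 / $field3; older versions print each record differently):

[
{
  "field1": "A",
  "field2": 10,
  "field3": 20,
  "newField1": 0.5
},
{
  "field1": "B",
  "field2": 30,
  "field3": 40,
  "newField1": 0.75
}
]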

Use case 6: Receive JSON and format the output as vertical JSON

Code:

echo '{"hello":"world", "foo":"bar"}' | mlr --ijson --ojson --jvstack cat

Motivation: Formatting JSON data in a vertical layout can provide a better visualization of the structure, especially when working with complex JSON objects. The mlr command allows us to convert the JSON data into a vertical JSON format.

Explanation:

  • --ijson : Specifies that the input data is in JSON format.
  • --ojson : Specifies the output format as JSON.
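  • --jvstack : Prints each key-value pair on its own line instead of putting the whole record on one line. In Miller 6 this is already the default for JSON output.
  • cat : The cat verb passes every record through unchanged.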

Example output (Miller 6; older versions omit the outer brackets):

[
{
  "hello": "world",
  "foo": "bar"
}
]

Use case 7: Filter lines of a compressed CSV file treating numbers as strings

Code:

mlr --prepipe 'gunzip' --csv filter -S '$fieldName =~ "regular_expression"' example.csv.gz

Motivation: Large CSV files are often stored compressed. With --prepipe, Miller decompresses the file on the fly, so records can be filtered by a regular expression without unpacking the file to disk first.

Explanation:

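  • --prepipe 'gunzip' : Runs the given command on each input file and reads its output, here decompressing the file with gunzip.
  • --csv : Specifies CSV as both the input and the output format.
  • filter : Keeps only the records for which the expression is true.
  • -S : Treats field values as strings instead of inferring numeric types. Miller 6 accepts this flag for backward compatibility but ignores it.
  • '$fieldName =~ "regular_expression"' : Keeps the records whose field matches the given regular expression.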

Example output:

field1,field2,field3
A,10,20
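Since Miller also reads from standard input, a similar result can be obtained with an ordinary shell pipeline instead of --prepipe. The sketch below is one such variant, not the article's original command; the file gz_demo.csv.gz and the pattern "^A" are assumptions made for illustration.

```shell
# Build a small compressed CSV file from made-up data.
cat > gz_demo.csv <<'EOF'
field1,field2,field3
A,10,20
B,30,40
EOF
gzip -f gz_demo.csv   # writes gz_demo.csv.gz and removes gz_demo.csv

if command -v mlr >/dev/null 2>&1; then
  # Decompress to stdout and let Miller read the CSV from stdin,
  # keeping only records whose field1 starts with "A".
  gunzip -c gz_demo.csv.gz | mlr --csv filter '$field1 =~ "^A"'
else
  echo "mlr not found; install Miller to run this example" >&2
fi
```

The pipeline form handles a single input stream; --prepipe is more convenient when several compressed files are passed to one mlr invocation.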

Conclusion:

The mlr command provides a powerful set of functionalities for working with name-indexed data such as CSV, TSV, and tabular JSON. Its ability to combine multiple data processing tools into a single command simplifies and streamlines data manipulation tasks. By exploring the provided use cases, you now have a better understanding of how to utilize mlr in your data processing workflows.
