How to use the command 'datamash' (with examples)

The ‘datamash’ command is a command-line tool that performs basic numeric, textual, and statistical operations on tabular text input. It is especially useful for quick data summaries and manipulation tasks in shell pipelines.

Use case 1: Get max, min, mean and median of a single column of numbers

Code:

seq 3 | datamash max 1 min 1 mean 1 median 1

Motivation: This use case is useful when you want to quickly summarize a column of numerical data. A single ‘datamash’ invocation calculates the maximum, minimum, mean, and median of the values in that column.

Explanation:

  • seq 3 generates a sequence of numbers from 1 to 3.
  • max 1 calculates the maximum value in column number 1.
  • min 1 calculates the minimum value in column number 1.
  • mean 1 calculates the mean value (average) of column number 1.
  • median 1 calculates the median value of column number 1 (the middle value when the numbers are sorted in ascending order).

Example output:

3       1       2       2
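
The column number refers to a field of the input, so the same operations can target any field. A minimal sketch, assuming a made-up two-column table (a label in column 1, a value in column 2) and datamash's default TAB field separator:

```shell
# Hypothetical sample data: a label in column 1, a number in column 2 (TAB-separated).
summary=$(printf 'a\t10\nb\t20\nc\t30\n' | datamash max 2 min 2 mean 2 median 2)

# The four results come back on one line, separated by TABs.
printf '%s\n' "$summary"
```

With this sample data the line should contain 30, 10, 20 and 20, in that order.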

Use case 2: Get the mean of a single column of float numbers (in locales where floats must use “,” and not “.”)

Code:

echo -e '1.0\n2.5\n3.1\n4.3\n5.6\n5.7' | tr '.' ',' | datamash mean 1

Motivation: ‘datamash’ reads numbers according to the current locale, so in locales that use a comma as the decimal separator, values written with a period are rejected as invalid. This use case helps when your data uses periods but your locale expects commas: converting the separators first lets you calculate the mean.

Explanation:

  • echo -e '1.0\n2.5\n3.1\n4.3\n5.6\n5.7' prints a series of float numbers, each on a new line.
  • tr '.' ',' replaces all occurrences of ‘.’ with ‘,’ in the input to convert decimal separators.
  • mean 1 calculates the mean value of column number 1.

Example output (in a locale that uses “,” as the decimal separator):

3,7
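
Which separator datamash accepts follows the active locale. As a sketch, forcing the C locale for a single invocation (an assumption about how you might prefer to run it, not part of the recipe above) lets the dotted values be read directly, with no tr step:

```shell
# LC_ALL=C selects the C locale, where "." is the decimal separator,
# for this datamash call only.
mean=$(printf '1.0\n2.5\n3.1\n4.3\n5.6\n5.7\n' | LC_ALL=C datamash mean 1)
printf '%s\n' "$mean"
```

The six values sum to 22.2, so the printed mean should be 3.7.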

Use case 3: Get the mean of a single column of numbers with a given decimal precision

Code:

echo -e '1\n2\n3\n4\n5\n5' | datamash -R number_of_decimals_wanted mean 1

Motivation: When calculating the mean value of a column, you may want to specify the decimal precision of the output. This use case is useful when you have a single column of numbers and you want to round the mean value to a specific number of decimal places.

Explanation:

  • echo -e '1\n2\n3\n4\n5\n5' prints a series of numbers, each on a new line.
  • -R number_of_decimals_wanted rounds numeric output to the given number of decimal places; replace the placeholder with an integer such as 2.
  • mean 1 calculates the mean value of column number 1.

Example output (with number_of_decimals_wanted set to 2):

3.33
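
For a concrete run, the placeholder can be swapped for an actual number. A minimal sketch, assuming a datamash version that supports -R (--round); the value 2 and the LC_ALL=C setting (used here only so the output uses “.”) are illustrative choices:

```shell
# -R 2 rounds the result to two decimal places; 2 is just an example value.
rounded=$(printf '1\n2\n3\n4\n5\n5\n' | LC_ALL=C datamash -R 2 mean 1)
printf '%s\n' "$rounded"
```

The values sum to 20 over 6 rows, so the rounded mean should print as 3.33.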

Use case 4: Get the mean of a single column of numbers ignoring “Na” and “NaN” (literal) strings

Code:

echo -e '1\n2\nNa\n3\nNaN' | datamash --narm mean 1

Motivation: When dealing with data sets, it is common to encounter missing values represented as “Na” or “NaN”. This use case allows you to calculate the mean of a column of numbers while ignoring these missing values.

Explanation:

  • echo -e '1\n2\nNa\n3\nNaN' prints a series of numbers and the literal strings “Na” and “NaN”, each on a new line.
  • --narm instructs the ‘datamash’ command to ignore the literal strings “Na” and “NaN”.
  • mean 1 calculates the mean value of column number 1.

Example output:

2
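
To see what --narm changes, the same input can be run with and without it. This is a sketch under the assumption that datamash exits with a non-zero status when it meets a field it cannot parse as a number; the variable names are made up for illustration:

```shell
# Sample input containing the missing-value markers "Na" and "NaN".
data=$(printf '1\n2\nNa\n3\nNaN\n')

# Without --narm, the non-numeric "Na" field is expected to make datamash fail.
if printf '%s\n' "$data" | datamash mean 1 >/dev/null 2>&1; then
  strict_status=ok
else
  strict_status=failed
fi

# With --narm, the markers are skipped and the mean covers 1, 2 and 3 only.
lenient_mean=$(printf '%s\n' "$data" | datamash --narm mean 1)
printf 'without --narm: %s\nwith --narm: %s\n' "$strict_status" "$lenient_mean"
```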

Conclusion:

The ‘datamash’ command provides a convenient way to summarize textual data files from the shell. The use cases above cover summary statistics for a column, locale-specific decimal separators, output rounding with -R, and skipping missing values with --narm.
