How to use the command 'datamash' (with examples)

The ‘datamash’ command is a powerful tool that allows users to perform basic numeric, textual, and statistical operations on input textual data files. It is especially useful for data analysis and manipulation tasks.

Use case 1: Get max, min, mean and median of a single column of numbers


seq 3 | datamash max 1 min 1 mean 1 median 1

Motivation: This use case is useful when you want to summarize a column of numerical data quickly. A single ‘datamash’ invocation calculates the maximum, minimum, mean, and median of one column.


  • seq 3 generates a sequence of numbers from 1 to 3.
  • max 1 calculates the maximum value in column number 1.
  • min 1 calculates the minimum value in column number 1.
  • mean 1 calculates the mean value (average) of column number 1.
  • median 1 calculates the median value of column number 1 (the middle value when the numbers are sorted in ascending order).

Example output:

3       1       2       2
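
The column number matters once the input has more than one field. The following is a minimal sketch, assuming tab-separated input (the default field separator for ‘datamash’); the sample values and the variable name summary are made up for illustration:

```shell
# Hypothetical two-column, tab-separated sample data:
# column 1 is an ID, column 2 holds the values to summarize.
summary=$(printf '1\t10\n2\t20\n3\t30\n' | datamash max 2 min 2 mean 2 median 2)

# The four results are printed on one line, separated by tabs: 30, 10, 20, 20
printf '%s\n' "$summary"
```

Every operation takes its own column number, so different columns can be mixed in one invocation.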

Use case 2: Get the mean of a single column of float numbers (in a locale whose decimal separator is “,”)


echo -e '1.0\n2.5\n3.1\n4.3\n5.6\n5.7' | tr '.' ',' | datamash mean 1

Motivation: ‘datamash’ parses numbers according to the current locale. In locales that use a comma as the decimal separator, floats written with a period are not read correctly, so they must be converted first. This use case is helpful when you have a single column of float numbers and need comma separators before calculating the mean value.


  • echo -e '1.0\n2.5\n3.1\n4.3\n5.6\n5.7' prints a series of float numbers, each on a new line.
  • tr '.' ',' replaces all occurrences of ‘.’ with ‘,’ in the input to convert decimal separators.
  • mean 1 calculates the mean value of column number 1.

Example output (in a locale that uses “,” as the decimal separator):

3,7


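Which decimal separator ‘datamash’ accepts depends on the active locale. As a minimal sketch, assuming the input already uses “.” and that forcing the C locale for this one command is acceptable, the tr step can be dropped (the variable name mean is illustrative):

```shell
# LC_ALL=C forces the C locale, whose decimal separator is ".",
# so the period-separated floats are parsed as they are.
mean=$(printf '1.0\n2.5\n3.1\n4.3\n5.6\n5.7\n' | LC_ALL=C datamash mean 1)

printf '%s\n' "$mean"   # 3.7
```
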
Use case 3: Get the mean of a single column of numbers with a given decimal precision


echo -e '1\n2\n3\n4\n5\n5' | datamash -R number_of_decimals_wanted mean 1

Motivation: When calculating the mean value of a column, you may want to specify the decimal precision of the output. This use case is useful when you have a single column of numbers and you want to round the mean value to a specific number of decimal places.


  • echo -e '1\n2\n3\n4\n5\n5' prints a series of numbers, each on a new line.
  • -R number_of_decimals_wanted rounds numeric output to the given number of decimal places (replace the placeholder with an integer such as 2).
  • mean 1 calculates the mean value of column number 1.

Example output (with -R 2, in a locale that uses “.” as the decimal separator):

3.33


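To see how the -R value affects the result, the sketch below runs the same input with three precisions. It assumes a ‘datamash’ version that supports -R and forces the C locale so the output uses “.”; the variable names are illustrative:

```shell
# Same six input numbers each time; only the -R value changes.
rounded=$(
    for decimals in 1 2 3; do
        printf '1\n2\n3\n4\n5\n5\n' | LC_ALL=C datamash -R "$decimals" mean 1
    done
)

# Prints 3.3, 3.33 and 3.333, one per line
printf '%s\n' "$rounded"
```
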
Use case 4: Get the mean of a single column of numbers ignoring “Na” and “NaN” (literal) strings


echo -e '1\n2\nNa\n3\nNaN' | datamash --narm mean 1

Motivation: When dealing with data sets, it is common to encounter missing values represented as “Na” or “NaN”. This use case allows you to calculate the mean of a column of numbers while ignoring these missing values.


  • echo -e '1\n2\nNa\n3\nNaN' prints a series of numbers and the literal strings “Na” and “NaN”, each on a new line.
  • --narm instructs the ‘datamash’ command to ignore the literal strings “Na” and “NaN”.
  • mean 1 calculates the mean value of column number 1.

Example output:

2


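The effect of --narm is easiest to see by running the same input with and without it. This is a minimal sketch; the assumption that ‘datamash’ exits with an error on the “Na” line when --narm is omitted is labeled in the comments, and the variable names are illustrative:

```shell
# Same sample data as in the command above.
input=$(printf '1\n2\nNa\n3\nNaN\n')

# Assumption: without --narm, the non-numeric "Na" line is expected
# to make datamash fail with a non-zero exit status.
if ! printf '%s\n' "$input" | datamash mean 1 2>/dev/null; then
    echo "datamash rejected the input without --narm"
fi

# With --narm, the missing values are skipped and only 1, 2 and 3 are averaged.
narm_mean=$(printf '%s\n' "$input" | datamash --narm mean 1)
printf '%s\n' "$narm_mean"   # 2
```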

The ‘datamash’ command provides a convenient way to perform basic numeric, textual, and statistical operations on input textual data files. With its various options and operations, it can be a valuable tool for data analysis and manipulation tasks.

