How to use the command 'mashtree' (with examples)
- Linux
- November 5, 2023
The mashtree command quickly builds a tree from a set of genomes. The result is not a true phylogeny, because the tree is derived from Mash distance estimates between genomes rather than from an alignment. It accepts FASTQ and/or FASTA files as input and writes the tree in Newick format.
Use case 1: Creating a fast tree with multiple threads
Code:
mashtree --numcpus 12 *.fastq.gz *.fasta > mashtree.dnd
Motivation: This is the fastest way to build a tree from FASTQ and/or FASTA files with mashtree. Setting the --numcpus option to 12 lets the command run on 12 threads, which speeds up the computation.
Explanation:
- mashtree is the command-line tool being used.
- --numcpus 12 sets the number of threads used for the computation, here 12.
- *.fastq.gz *.fasta are the input files used to build the tree. The * wildcard matches every file with a .fastq.gz or .fasta extension.
- > mashtree.dnd redirects the output of the command to a file named mashtree.dnd, which will contain the resulting tree in Newick format.
Example Output:
The command generates a tree in Newick format and saves it in the mashtree.dnd file.
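The hard-coded thread count of 12 will not suit every machine. As a minimal sketch, the thread count can be taken from the number of available cores instead. The helper name pick_cpus is hypothetical, and the sketch assumes that either nproc or getconf is present on the system; it only runs mashtree if the command is found on the PATH.

```shell
# Hypothetical helper: print the number of available CPU cores,
# falling back to 1 if neither nproc nor getconf can report it.
pick_cpus() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
  fi
}

cpus="$(pick_cpus)"

# Only call mashtree when it is actually installed.
if command -v mashtree >/dev/null 2>&1; then
  mashtree --numcpus "$cpus" *.fastq.gz *.fasta > mashtree.dnd
else
  echo "mashtree not found on PATH; would have used --numcpus $cpus" >&2
fi
```

On a shared server it may be more considerate to pass a smaller number than the full core count.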
Use case 2: Creating an accurate tree with multiple threads
Code:
mashtree --mindepth 0 --numcpus 12 *.fastq.gz *.fasta > mashtree.dnd
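Motivation: This use case aims for the most accurate tree by setting the --mindepth option to 0. With this value mashtree does not use its fixed default minimum depth. Instead it chooses the k-mer depth threshold automatically, using a slower method that discards low-abundance k-mers. The run takes longer, but the result is usually more accurate.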
Explanation:
- mashtree is the command-line tool being used.
- --mindepth 0 controls the minimum k-mer depth (abundance) threshold. A value of 0 tells mashtree to choose the threshold automatically with a slower method that discards low-abundance k-mers, rather than using the fixed default.
- --numcpus 12 sets the number of threads used for the computation, here 12.
- *.fastq.gz *.fasta are the input files used to build the tree. The * wildcard matches every file with a .fastq.gz or .fasta extension.
- > mashtree.dnd redirects the output of the command to a file named mashtree.dnd, which will contain the resulting tree in Newick format.
Example Output:
The command generates a tree in Newick format, with accuracy improved by the automatically chosen minimum depth threshold. The resulting tree is saved in the mashtree.dnd file.
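Because the tree is written through a shell redirect, a failed or interrupted run can leave an empty or truncated mashtree.dnd behind. The following is a rough sanity check, not a full Newick parser: it only confirms that the file is non-empty, ends with a semicolon, and has balanced parentheses. The function name check_newick is hypothetical, and quoted labels that contain parentheses would confuse it.

```shell
# Hypothetical helper: shallow structural check of a Newick file.
# Returns 0 if the file is non-empty, ends with ';' and has as many
# opening as closing parentheses; returns non-zero otherwise.
check_newick() {
  # Read the whole file as one string, dropping line breaks.
  tree="$(tr -d '\n\r' < "$1")"
  [ -n "$tree" ] || return 1
  case "$tree" in
    *';') ;;
    *) return 1 ;;
  esac
  # Count each kind of parenthesis; $(( )) strips wc's padding.
  opens=$(( $(printf '%s' "$tree" | tr -cd '(' | wc -c) ))
  closes=$(( $(printf '%s' "$tree" | tr -cd ')' | wc -c) ))
  [ "$opens" -eq "$closes" ]
}

# Check the output file from the command above, if it exists.
if [ -f mashtree.dnd ]; then
  if check_newick mashtree.dnd; then
    echo "mashtree.dnd looks like a complete Newick tree"
  else
    echo "mashtree.dnd is empty or truncated" >&2
  fi
fi
```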
Use case 3: Creating a tree with confidence values
Code:
mashtree_bootstrap.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.bootstrap.dnd
Motivation: This use case creates a tree with confidence values. The mashtree_bootstrap.pl script, run with the --reps option set to 100, performs 100 bootstrap replicates and uses them to assign confidence values to the tree.
Explanation:
- mashtree_bootstrap.pl is a script that creates a tree with confidence values.
- --reps 100 sets the number of bootstrap replicates to perform, here 100.
- --numcpus 12 sets the number of threads used for the computation, here 12.
- *.fastq.gz are the input files used to build the tree. The * wildcard matches every file with a .fastq.gz extension.
- -- separates the options for mashtree_bootstrap.pl from the options passed on to mashtree itself.
- --min-depth 0 is passed to mashtree. A value of 0 makes mashtree choose the minimum k-mer depth threshold automatically with a slower method, which usually gives more accurate results.
- > mashtree.bootstrap.dnd redirects the output of the command to a file named mashtree.bootstrap.dnd, which will contain the resulting tree with confidence values in Newick format.
Example Output:
The command generates a tree in Newick format with confidence values and saves it in the mashtree.bootstrap.dnd file.
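To get a quick look at the confidence values without opening a tree viewer, they can be pulled out of the Newick text. This sketch assumes the values are stored the conventional Newick way, as numeric labels directly after the closing parenthesis of each internal node, for example (A:0.1,B:0.2)95:0.05. That layout is an assumption about the output, and the function name list_support is hypothetical.

```shell
# Hypothetical helper: print one support value per line.
# Assumes each value is a number that directly follows a ')'.
list_support() {
  tr -d '\n\r' < "$1" | grep -o ')[0-9][0-9.]*' | tr -d ')'
}

# List the values from the bootstrap tree, if the file exists.
if [ -f mashtree.bootstrap.dnd ]; then
  list_support mashtree.bootstrap.dnd
fi
```

Piping the result through sort -n puts the weakest branches first, which is often where a closer look is worthwhile.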
Conclusion:
The mashtree command builds trees from genomic data quickly, and its options let you trade speed for accuracy: --numcpus speeds up a run, --mindepth 0 improves accuracy at the cost of run time, and mashtree_bootstrap.pl adds confidence values to the tree.