How to use the command 'nextclade' (with examples)
The nextclade command is a bioinformatics tool for viral genome alignment, clade assignment, and quality control. It analyzes virus genetic sequences, with a focus on SARS-CoV-2. The use cases below cover common analysis and reporting tasks.
Use case 1: Create a TSV report, auto-downloading the latest dataset
Code:
nextclade run -d dataset_name path/to/fasta -t path/to/output_tsv
Motivation: Use this when you want a TSV report without managing datasets yourself. Nextclade downloads the latest version of the named dataset, so you skip the separate download step and work with up-to-date reference data.
Explanation:
run: Runs the nextclade analysis on the input sequences.
-d dataset_name (--dataset-name): Specifies the name of the dataset to use.
path/to/fasta: Specifies the path to the input FASTA file.
-t path/to/output_tsv (--output-tsv): Specifies the path to the output TSV file.
Example output: The command will generate a TSV report file containing the analysis results for the given input FASTA file.
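To inspect the report from the shell, a small helper can pull out columns by header name. This is a minimal sketch and not part of nextclade: tsv_columns is a hypothetical helper, and the column names in the usage note (seqName, clade, qc.overallStatus) are assumed to be present in your report, so check the header line of your own file first.

```shell
# Hypothetical helper (not part of nextclade): print selected columns
# of a tab-separated file, looked up by header name.
# Usage: tsv_columns FILE "col1,col2,..."
tsv_columns() {
  awk -F '\t' -v OFS='\t' -v cols="$2" '
    NR == 1 {
      # Map each header name to its column position.
      n = split(cols, want, ",")
      for (i = 1; i <= NF; i++) idx[$i] = i
      for (j = 1; j <= n; j++) {
        if (!(want[j] in idx)) {
          print "missing column: " want[j] > "/dev/stderr"
          exit 1
        }
      }
    }
    {
      # Emit the requested columns in the requested order
      # (the header line is printed too).
      line = $(idx[want[1]])
      for (j = 2; j <= n; j++) line = line OFS $(idx[want[j]])
      print line
    }
  ' "$1"
}
```

For example, tsv_columns path/to/output_tsv "seqName,clade,qc.overallStatus" would print just those three columns, assuming they exist in the header.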
Use case 2: List all available datasets
Code:
nextclade dataset list
Motivation: This use case is useful when you want to quickly view a list of all available datasets. It helps you easily identify and select the appropriate dataset for your analysis.
Explanation:
dataset list: Executes the list command of the nextclade dataset subcommand.
Example output: The command will display a list of available datasets, including their names and descriptions.
Use case 3: Download the latest SARS-CoV-2 dataset
Code:
nextclade dataset get --name sars-cov-2 --output-dir path/to/directory
Motivation: This use case is beneficial when you need to download the latest dataset specifically for the SARS-CoV-2 virus. It ensures you have the most up-to-date information for your analysis.
Explanation:
dataset get: Executes the get command of the nextclade dataset subcommand.
--name sars-cov-2: Specifies the name of the dataset to download (SARS-CoV-2 in this case).
--output-dir path/to/directory: Specifies the directory where the downloaded dataset will be saved.
Example output: The command will download the latest SARS-CoV-2 dataset and save it in the specified directory.
Use case 4: Use a downloaded dataset, producing all outputs
Code:
nextclade run -D path/to/dataset_dir -O path/to/output_dir path/to/dataset_dir/sequences.fasta
Motivation: This use case is applicable when you want to use a previously downloaded dataset and generate all possible outputs. It allows for a comprehensive analysis of the input sequences.
Explanation:
run: Runs the nextclade analysis on the input sequences.
-D path/to/dataset_dir (--input-dataset): Specifies the path to the downloaded dataset directory.
-O path/to/output_dir (--output-all): Specifies the directory where all output files will be saved.
path/to/dataset_dir/sequences.fasta: Specifies the path to the input FASTA file, here located inside the dataset directory.
Example output: The command will analyze the input sequences using the downloaded dataset and generate various output files in the specified output directory.
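Because this form depends on files already on disk, it can help to check the paths before starting. The wrapper below is a minimal sketch under that assumption; run_with_dataset is a hypothetical name, and the only nextclade invocation it makes is the one shown in this use case.

```shell
# Hypothetical wrapper: run nextclade against a local dataset, refusing
# to start if the dataset directory or the input FASTA file is missing.
# Usage: run_with_dataset DATASET_DIR OUTPUT_DIR INPUT_FASTA
run_with_dataset() {
  dataset_dir=$1
  output_dir=$2
  fasta=$3
  if [ ! -d "$dataset_dir" ]; then
    echo "dataset directory not found: $dataset_dir" >&2
    return 1
  fi
  if [ ! -f "$fasta" ]; then
    echo "input FASTA not found: $fasta" >&2
    return 1
  fi
  # Same command as in this use case, with the checked paths filled in.
  nextclade run -D "$dataset_dir" -O "$output_dir" "$fasta"
}
```

Calling run_with_dataset path/to/dataset_dir path/to/output_dir path/to/dataset_dir/sequences.fasta behaves like the command above, but fails early with a message on standard error if either input path does not exist.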
Use case 5: Run on multiple files
Code:
nextclade run -d dataset_name -t path/to/output_tsv -- path/to/input_fasta_1 path/to/input_fasta_2 ...
Motivation: This use case is useful when you want to analyze multiple input FASTA files in one go. It saves time and streamlines the analysis process.
Explanation:
run: Runs the nextclade analysis on the input sequences.
-d dataset_name (--dataset-name): Specifies the name of the dataset to use.
-t path/to/output_tsv (--output-tsv): Specifies the path to the output TSV file.
--: Separates the command options from the input file paths.
path/to/input_fasta_1 path/to/input_fasta_2 ...: Specifies the paths to the input FASTA files.
Example output: The command will analyze each input FASTA file using the specified dataset and produce a consolidated TSV report containing the analysis results.
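When the inputs live in one directory, the argument list after -- can be assembled with a glob. The following is a sketch only: print_multi_cmd is a hypothetical helper, the .fasta extension is an assumption about how your files are named, and the function prints the command for review instead of running it.

```shell
# Hypothetical dry-run helper: print the nextclade command that would
# analyze every *.fasta file in a directory.
# Usage: print_multi_cmd DATASET_NAME OUTPUT_TSV INPUT_DIR
print_multi_cmd() {
  dataset=$1
  out_tsv=$2
  dir=$3
  # Start with the options, then "--" so that everything after it is
  # treated as an input path.
  set -- nextclade run -d "$dataset" -t "$out_tsv" --
  for f in "$dir"/*.fasta; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    set -- "$@" "$f"
  done
  # Joined with spaces for display; paths containing spaces would need
  # quoting before the printed line is reused.
  printf '%s\n' "$*"
}
```

Printing first makes it easy to confirm which files the glob picked up before committing to a long run.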
Use case 6: Try reverse complement if sequence does not align
Code:
nextclade run --retry-reverse-complement -d dataset_name -t path/to/output_tsv path/to/input_fasta
Motivation: Use this when some input sequences may have been recorded in the reverse orientation and therefore fail to align as given. Nextclade then retries those sequences as their reverse complement instead of reporting an alignment failure.
Explanation:
run: Runs the nextclade analysis on the input sequences.
--retry-reverse-complement: Retries the alignment with the reverse complement if the sequence does not align initially.
-d dataset_name (--dataset-name): Specifies the name of the dataset to use.
-t path/to/output_tsv (--output-tsv): Specifies the path to the output TSV file.
path/to/input_fasta: Specifies the path to the input FASTA file.
Example output: The command will analyze the input sequence. If the initial alignment fails, it will automatically attempt the reverse complement of the sequence and proceed with the analysis. The analysis results will be saved in the specified TSV file.
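For readers unfamiliar with the term, the reverse complement of a nucleotide sequence is obtained by swapping each base with its pair (A with T, C with G) and reading the result backwards. The snippet below illustrates the concept only; it is not how nextclade implements the retry, revcomp is a hypothetical helper, and it maps only A, C, G, and T, leaving any other character (such as N) unchanged.

```shell
# Hypothetical helper illustrating the concept: print the reverse
# complement of a nucleotide string given as the first argument.
revcomp() {
  printf '%s\n' "$1" |
    tr 'ACGTacgt' 'TGCAtgca' |   # complement each base
    awk '{
      # Reverse the line one character at a time.
      out = ""
      for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
      print out
    }'
}
```

For instance, revcomp ATGC prints GCAT.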
Conclusion:
The nextclade command is a versatile tool for virus genome analysis. The use cases above range from generating reports to downloading datasets and running comprehensive analyses. Knowing these commands lets researchers and bioinformaticians analyze virus genomes efficiently.