How to Use the Command 'blastp' (with Examples)

blastp is the protein-protein program of BLAST (Basic Local Alignment Search Tool). It compares an amino acid query sequence against a protein sequence database, or directly against other protein sequences, and reports statistically significant local alignments. Researchers use it to identify homologous proteins and infer functional and evolutionary relationships in areas such as comparative genomics and proteomics. Rather than aligning every pair of sequences exhaustively, it uses a heuristic that seeds on short word matches and extends them, scoring with a substitution matrix (BLOSUM62 by default).

Use case 1: Align two or more sequences using blastp, with the e-value threshold of 1e-9, pairwise output format, output to screen

Code:

blastp -query query.fa -subject subject.fa -evalue 1e-9

Motivation:

This use case is ideal for researchers looking to determine the degree of similarity between two protein sequences. Using a stringent e-value threshold ensures only highly significant matches are considered, which is useful in filtering out random matches and focusing on biologically relevant alignments.

Explanation:

  • -query query.fa: Specifies the file containing the query protein sequence.
  • -subject subject.fa: Specifies the file containing the subject protein sequence for alignment.
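  • -evalue 1e-9: Sets the expectation value (e-value) threshold for reporting matches. The e-value is the number of alignments with an equal or better score that would be expected by chance in a search of this size, so a cutoff of 1e-9 (the default is 10) reports only highly significant alignments and filters out less significant ones.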
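To make the threshold concrete, the sketch below uses the standard relation between a normalized bit score S' and the e-value, E = m * n * 2^(-S'), where m is the query length and n is the length of the sequence or database being searched. The function names are hypothetical, and the sketch uses raw lengths rather than the edge-corrected effective lengths BLAST uses internally, so its numbers only approximate what blastp reports.

```python
def expected_chance_hits(bit_score: float, query_len: int, target_len: int) -> float:
    """Approximate e-value: E = m * n * 2**(-S') for a bit score S'."""
    return query_len * target_len * 2.0 ** (-bit_score)


def passes_threshold(bit_score: float, query_len: int, target_len: int,
                     evalue_cutoff: float = 1e-9) -> bool:
    """Return True if the alignment would survive an -evalue style cutoff."""
    return expected_chance_hits(bit_score, query_len, target_len) <= evalue_cutoff


if __name__ == "__main__":
    # Two 300-residue proteins, as in a -query/-subject comparison.
    for score in (30.0, 50.0, 100.0):
        e = expected_chance_hits(score, 300, 300)
        kept = passes_threshold(score, 300, 300)
        print(f"bit score {score}: E ~ {e:.2e}, kept at 1e-9: {kept}")
```

Each additional bit of score halves the expected number of chance hits, which is why tightening the cutoff by a few orders of magnitude removes many weak alignments while leaving strong ones untouched.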

Example Output:

Identities = 250/300 (83%), Positives = 270/300 (90%), Gaps = 2/300 (0.67%)
Query  1    MEIVA...  300
Sbjct  1    MEIVA...  300

Use case 2: Align two or more sequences using blastp-fast

Code:

blastp -task blastp-fast -query query.fa -subject subject.fa

Motivation:

The blastp-fast task is a quicker alignment option for users who need results promptly and can accept slightly lower sensitivity. This is particularly beneficial in large-scale studies or time-sensitive research projects.

Explanation:

  • -task blastp-fast: Selects the blastp-fast task, a variant of blastp optimized for faster run time at the cost of some sensitivity.
  • -query query.fa: Input query protein sequence file.
  • -subject subject.fa: Input subject protein sequence file for alignment.
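Whether the speed gain matters depends on the data, so it can be worth timing both tasks on your own sequences. The following is a minimal sketch, assuming blastp is on the PATH and that query.fa and subject.fa exist; the helper names are hypothetical, and a single run per task is only a rough measurement.

```python
import subprocess
import time


def blastp_cmd(query, subject, task=None):
    """Build the argument list for a blastp query-vs-subject run."""
    cmd = ["blastp", "-query", query, "-subject", subject]
    if task is not None:
        cmd += ["-task", task]
    return cmd


def time_blastp(query, subject, task=None):
    """Run blastp once, discard its output, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(blastp_cmd(query, subject, task), check=True,
                   stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare_tasks(query, subject, tasks=("blastp", "blastp-fast")):
    """Return a {task: seconds} mapping for each task name given."""
    return {task: time_blastp(query, subject, task) for task in tasks}
```

Calling compare_tasks("query.fa", "subject.fa") returns the wall-clock time per task. Comparing the hit lists from both runs as well shows whether the faster task dropped any alignments you care about.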

Example Output:

Identities = 240/300 (80%), Positives = 260/300 (87%)
Query  1    MFVLK...  300
Sbjct  1    MFVLK...  300

Use case 3: Align two or more sequences, custom tabular output format, output to file

Code:

blastp -query query.fa -subject subject.fa -outfmt '6 qseqid qlen qstart qend sseqid slen sstart send bitscore evalue pident' -out output.tsv

Motivation:

Researchers may require customized output formats to meet specific data analysis needs. By tailoring the results, they can integrate output into other bioinformatics tools or pipelines for further investigation or visualization.

Explanation:

  • -query query.fa: Input query file containing the protein sequence.
  • -subject subject.fa: Input subject file containing the protein sequence for comparison.
  • -outfmt '6 qseqid qlen qstart qend sseqid slen sstart send bitscore evalue pident': Selects tabular output (format 6, tab-separated with no header line) restricted to the listed columns, in the listed order: query ID, query length, query start and end, subject ID, subject length, subject start and end, bit score, e-value, and percentage identity.
  • -out output.tsv: Writes the results to the file output.tsv instead of the screen.

Example Output (output.tsv):

query1  300  1  300  subject1  300  1  300  500  1e-20  95.0
query2  285  1  285  subject2  280  1  280  450  3e-15  90.0
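Because format 6 has no header line, downstream code has to supply the column names itself. The sketch below is one way to load such a file in Python, assuming the exact column order from the command above and tab-separated fields; the function names, the derived qcov field, and the filter thresholds are illustrative choices, not part of BLAST.

```python
import csv
import io

# Column order must match the -outfmt string used to produce the file.
COLUMNS = ["qseqid", "qlen", "qstart", "qend", "sseqid", "slen",
           "sstart", "send", "bitscore", "evalue", "pident"]
INT_COLS = ("qlen", "qstart", "qend", "slen", "sstart", "send")
FLOAT_COLS = ("bitscore", "evalue", "pident")


def read_hits(handle):
    """Parse tab-separated BLAST rows into dicts with typed values."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        for name in INT_COLS:
            hit[name] = int(hit[name])
        for name in FLOAT_COLS:
            hit[name] = float(hit[name])
        # Percent of the query covered by this alignment (derived field).
        hit["qcov"] = 100.0 * (hit["qend"] - hit["qstart"] + 1) / hit["qlen"]
        hits.append(hit)
    return hits


def filter_hits(hits, min_pident=90.0, min_qcov=80.0):
    """Keep hits that meet both the identity and the coverage minimum."""
    return [h for h in hits
            if h["pident"] >= min_pident and h["qcov"] >= min_qcov]


# The two example rows from above, written with real tab separators.
SAMPLE = (
    "query1\t300\t1\t300\tsubject1\t300\t1\t300\t500\t1e-20\t95.0\n"
    "query2\t285\t1\t285\tsubject2\t280\t1\t280\t450\t3e-15\t90.0\n"
)

if __name__ == "__main__":
    for hit in filter_hits(read_hits(io.StringIO(SAMPLE)), min_pident=92.0):
        print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["qcov"])
```

For a real run, pass an open file instead of the in-memory sample, for example with open("output.tsv", newline="") as fh: hits = read_hits(fh).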

Use case 4: Search a protein database with a protein query, using 16 threads and keeping at most 10 aligned sequences

Code:

blastp -query query.fa -db blast_database_name -num_threads 16 -max_target_seqs 10

Motivation:

Incorporating parallel computing by utilizing 16 threads accelerates the BLAST search, making it feasible to process large datasets efficiently. Limiting output to the top 10 sequences helps focus the analysis on the most relevant alignments.

Explanation:

  • -query query.fa: Specifies the input query sequence.
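  • -db blast_database_name: Designates the protein database to search. The database must already exist in BLAST format, for example one built from a FASTA file with makeblastdb.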
  • -num_threads 16: Utilizes 16 threads for parallel processing, enhancing performance on multicore systems.
  • -max_target_seqs 10: Keeps at most 10 aligned database sequences per query, concentrating on top hits.
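When the same search runs on machines with different core counts, a hard-coded thread count can oversubscribe a small machine. The sketch below, with hypothetical helper names, builds the command from this use case but caps the thread count at the number of CPUs Python reports. It assumes blastp is installed and that the database name points to an existing BLAST protein database.

```python
import os
import shutil
import subprocess


def db_search_cmd(query, db, threads=16, max_targets=10):
    """Build a blastp database search command with a capped thread count."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    threads = max(1, min(threads, available))
    return ["blastp", "-query", query, "-db", db,
            "-num_threads", str(threads),
            "-max_target_seqs", str(max_targets)]


def run_db_search(query, db, **kwargs):
    """Run the search and return blastp's standard output as text."""
    if shutil.which("blastp") is None:
        raise RuntimeError("blastp was not found on PATH")
    result = subprocess.run(db_search_cmd(query, db, **kwargs),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

For example, run_db_search("query.fa", "blast_database_name") returns the report as a string; check=True makes a non-zero blastp exit status raise subprocess.CalledProcessError rather than pass silently.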

Example Output:

Sequence     E-value       Identity
Seq1         2e-50         95%
Seq2         4e-45         93%
...

Use case 5: Search the remote non-redundant protein database using a protein query

Code:

blastp -query query.fa -db nr -remote

Motivation:

Accessing remote databases like the non-redundant (nr) protein database allows researchers to utilize comprehensive datasets offered by NCBI. This is particularly useful for comparative studies against a wide array of known proteins.

Explanation:

  • -query query.fa: Denotes the protein sequence to query.
  • -db nr: Specifies using the non-redundant protein database.
  • -remote: Sends the search to NCBI's servers instead of running it locally, giving access to up-to-date and extensive datasets without a local database installation. It cannot be combined with -num_threads.

Example Output:

Protein_ID   Description                   E-value
XP_002345   Conserved protein precursor   2e-40
XP_001234   Hypothetical Protein 1        1e-35
...

Use case 6: Display help (use -help for detailed help)

Code:

blastp -h

Motivation:

Researchers and bioinformaticians often need to quickly check command options and syntax, especially when dealing with intricate parameters or when scripting automated workflows.

Explanation:

  • -h: Displays a brief help message, listing available command-line options and their short descriptions.

Example Output:

Usage: blastp [options]

Options:
  -help                       Print full usage, including all advanced options.
  -query <File_In>            File name of input file containing query sequence(s).
  ...

Conclusion:

BLASTP remains an invaluable tool in bioinformatics for protein sequence comparison and functional annotation. From aligning sequences to analyzing vast protein databases, the command offers a robust set of features to support diverse research endeavors. By understanding its different usage scenarios, researchers can leverage BLASTP to unveil novel insights into protein function and evolution.
