How to use the command einfo (with examples)
- Linux
- December 25, 2023
The einfo command is part of Entrez Direct (EDirect), the suite of command-line programs provided by the National Center for Biotechnology Information (NCBI). It retrieves information about the databases, search fields, and links available in the NCBI Entrez system. This makes it especially useful for biologists and researchers who need to find out what a database offers before querying it.
Use case 1: Print all database names
Code:
einfo -dbs
Motivation:
The einfo -dbs command prints the names of all databases available in the NCBI Entrez system. The list helps you decide which databases to explore further when looking for specific biological information.
Explanation:
- einfo: The command to retrieve information about the available databases.
- -dbs: A flag that prints all database names.
Example output:
protein
nucleotide
gene
genome
taxonomy
pubmed
...
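A common follow-up is to check whether a particular database name appears in that list before scripting against it. The sketch below assumes Entrez Direct is installed and on your PATH; has_db is a hypothetical helper name chosen for illustration, not part of EDirect.

```shell
#!/bin/sh
# has_db NAME: succeed if NAME appears as a complete line on stdin.
# (Hypothetical helper for illustration; not an Entrez Direct command.)
has_db() {
    grep -qxF -- "$1"
}

# Only contact NCBI when einfo is actually installed.
if command -v einfo >/dev/null 2>&1; then
    if einfo -dbs | sort | has_db "nuccore"; then
        echo "nuccore is available"
    else
        echo "nuccore was not listed"
    fi
fi
```

Matching whole lines (-x) with a fixed string (-F) avoids false positives such as a short name matching part of a longer one.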
Use case 2: Print all information of the protein database in XML format
Code:
einfo -db protein
Motivation:
To explore the details and structure of the protein database, use the einfo -db protein command. It returns all the information about the protein database in XML format.
Explanation:
- einfo: The command to retrieve information about the available databases.
- -db protein: A flag that selects the protein database.
Example output (illustrative and truncated for brevity; the element names and description text in a live response may differ):
<?xml version="1.0"?>
<!DOCTYPE DatabaseProperties PUBLIC "-//NLM//DTD DatabaseProperties, 13th January 2017//EN" "https://www.ncbi.nlm.nih.gov/corehtml/query/DTD/database.dtd">
<DatabaseProperties>
<DbBuild>Build201</DbBuild>
<Name>protein</Name>
<MenuName>Protein</MenuName>
<Description>The
protein sequence database curated by the Swiss Institute of Bioinformatics (SIB) UniProt consortium. This
database, UniProt, is a central hub for the collection and dissemination of functional information on proteins,
...
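Because the result is XML, a single value can be pulled out with standard text tools. The sketch below is a deliberately simple approach that assumes each element sits on its own line, as in the sample above. xml_first is a hypothetical helper, and MenuName is used only because it appears in the sample. For anything more involved, a real XML parser is the safer choice.

```shell
#!/bin/sh
# xml_first TAG: print the text of the first <TAG>...</TAG> pair found on stdin.
# (Hypothetical helper; assumes the opening and closing tags share one line.)
xml_first() {
    tag=$1
    sed -n "s|.*<${tag}>\(.*\)</${tag}>.*|\1|p" | head -n 1
}

# Only contact NCBI when einfo is actually installed.
if command -v einfo >/dev/null 2>&1; then
    einfo -db protein | xml_first MenuName
fi
```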
Use case 3: Print all fields of the nuccore database
Code:
einfo -db nuccore -fields
Motivation:
When working with the nuccore database, it helps to know which fields you can query. The einfo -db nuccore -fields command prints the complete list of fields associated with the nuccore database.
Explanation:
- einfo: The command to retrieve information about the available databases.
- -db nuccore: A flag that selects the nuccore database.
- -fields: A flag that instructs the command to print all available fields.
Example output (truncated for brevity):
Caption
CommentOnStatus
Completeness
...
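The field list can be long, so it is often easier to search it than to read it. The following sketch filters the list by a case-insensitive keyword. find_field is a hypothetical helper name, and the keyword "comp" is only an example.

```shell
#!/bin/sh
# find_field KEYWORD: print the lines on stdin that contain KEYWORD, ignoring case.
# (Hypothetical helper for illustration; not an Entrez Direct command.)
find_field() {
    grep -iF -- "$1"
}

# Only contact NCBI when einfo is actually installed.
if command -v einfo >/dev/null 2>&1; then
    einfo -db nuccore -fields | find_field "comp" || echo "no matching field"
fi
```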
Use case 4: Print all links of the protein database
Code:
einfo -db protein -links
Motivation:
The protein database in NCBI Entrez is connected to various other databases, which lets you move between related records. The einfo -db protein -links command prints all the available links from the protein database to other Entrez databases.
Explanation:
- einfo: The command to retrieve information about the available databases.
- -db protein: A flag that selects the protein database.
- -links: A flag that instructs the command to print all available links.
Example output (illustrative and truncated; link names in a live response may differ):
nucleotide_protein_links
nucleotide_nuccore_links
pubmed_protein_citedin
protein_protein_domains
...
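To compare how richly several databases are linked, the same command can be run in a loop and its output counted. This is a rough sketch that assumes one link per output line. count_nonblank is a hypothetical helper, and the database names are just the ones used in this article.

```shell
#!/bin/sh
# count_nonblank: print how many non-empty lines arrive on stdin.
# (Hypothetical helper for illustration; not an Entrez Direct command.)
count_nonblank() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Only contact NCBI when einfo is actually installed.
if command -v einfo >/dev/null 2>&1; then
    for db in protein nuccore; do
        n=$(einfo -db "$db" -links | count_nonblank)
        echo "$db: $n links"
    done
fi
```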
Conclusion:
The einfo command is a versatile tool for understanding how the NCBI Entrez system is organized. Whether you need to list the available databases, see which fields a database can be searched by, or find out how it links to other databases, einfo can help you gather the details you need for your research.