Overview

The NCBI Taxonomy Database is the curated classification and nomenclature resource used across the sequence databases maintained by the National Center for Biotechnology Information (NCBI). It provides a single hierarchy of organism names and classifications shared by GenBank, RefSeq, and other NCBI resources, making it one of the most heavily referenced biological vocabularies in existence. NCBI itself notes that the database is not an authoritative source for nomenclature or classification, so the primary taxonomic literature remains the final reference.

Background

The NCBI Taxonomy Database was developed in the early 1990s as part of NCBI's mission to provide integrated access to molecular biology information. As nucleotide and protein sequence databases grew, there was a critical need for a consistent taxonomic framework to organize sequences by organism. Rather than adopting any single existing taxonomic authority, the NCBI Taxonomy group curates a synthetic classification that draws on published taxonomic literature while maintaining internal consistency across all NCBI databases.

The database is continuously updated as new organisms are sequenced and as taxonomic revisions are published. It currently contains over 2.5 million taxa, representing approximately 10% of the described species of life on the planet.

Purpose & Scope

The NCBI Taxonomy serves as the standard reference for organism classification within the INSDC (International Nucleotide Sequence Database Collaboration), which includes GenBank (USA), the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ). Every sequence submitted to these databases must be associated with a valid NCBI Taxonomy identifier (TaxID).

The taxonomy covers all domains of life (Bacteria, Archaea, and Eukaryota) as well as viruses and unclassified sequences. It includes scientific names, common names, synonyms, and a hierarchical classification from domain (labeled superkingdom in older releases) down to subspecies and strain level where applicable.
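Because every taxon points to a single parent, a full classification can be recovered by walking parent links until the root is reached. The following is a minimal sketch of that idea, not NCBI code: the `TOY_TAXA` table and the `lineage` helper are hypothetical, and the table is hand-typed with most intermediate nodes collapsed, so its parent links do not match the real database.

```python
from typing import Dict, List, Tuple

# TaxID -> (parent TaxID, rank, scientific name).
# Hand-typed toy data: intermediate nodes are deliberately collapsed,
# so these parent links are illustrative only.
TOY_TAXA: Dict[int, Tuple[int, str, str]] = {
    1: (1, "no rank", "root"),          # the root is its own parent
    2759: (1, "domain", "Eukaryota"),
    9605: (2759, "genus", "Homo"),
    9606: (9605, "species", "Homo sapiens"),
}


def lineage(tax_id: int, taxa: Dict[int, Tuple[int, str, str]]) -> List[Tuple[str, str]]:
    """Return (rank, name) pairs from the root down to tax_id."""
    path: List[Tuple[str, str]] = []
    seen = set()
    while tax_id not in seen:           # guard against accidental cycles
        seen.add(tax_id)
        parent, rank, name = taxa[tax_id]
        path.append((rank, name))
        if parent == tax_id:            # reached the self-parented root
            break
        tax_id = parent
    return list(reversed(path))


for rank, name in lineage(9606, TOY_TAXA):
    print(f"{rank}: {name}")
```

Storing only the parent link per taxon keeps the table compact; the cost is that each lineage lookup walks the tree, which is cheap because the hierarchy is shallow.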

Key Features

Feature	Description
Taxa	2.5 million+
Coverage	All organisms in INSDC sequence databases
Identifiers	Numeric TaxID (e.g., 9606 for Homo sapiens)
Updates	Continuous
Ranks	Domain through subspecies/strain

Serializations & Technical Formats

The NCBI Taxonomy is available for bulk download from the NCBI FTP server in a flat-file dump format (taxdump). Individual records can be retrieved through the NCBI Taxonomy Browser web interface or programmatically via the Entrez E-Utilities API. The data is also integrated into the NCBI Datasets resource.
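As a rough illustration of both access routes, the sketch below parses a few rows in the taxdump row layout and builds an E-Utilities request URL without sending it. It assumes the layout used by the `nodes.dmp` and `names.dmp` files in the taxdump archive (fields separated by tab-pipe-tab, rows ending in tab-pipe). The sample rows are hand-typed and truncated to their leading columns, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Hand-typed sample rows. Real nodes.dmp rows carry more columns than the
# three kept here; only the leading columns are assumed.
NODES_SAMPLE = (
    "1\t|\t1\t|\tno rank\t|\n"
    "9606\t|\t9605\t|\tspecies\t|\n"
)
NAMES_SAMPLE = (
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n"
)


def split_dmp_line(line: str) -> List[str]:
    # Drop the newline and the trailing tab-pipe, then split on tab-pipe-tab.
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_nodes(text: str) -> Dict[int, Tuple[int, str]]:
    # Map TaxID -> (parent TaxID, rank).
    nodes: Dict[int, Tuple[int, str]] = {}
    for line in text.splitlines():
        if line.strip():
            fields = split_dmp_line(line)
            nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def load_scientific_names(text: str) -> Dict[int, str]:
    # Keep only rows whose name class is "scientific name".
    names: Dict[int, str] = {}
    for line in text.splitlines():
        if line.strip():
            fields = split_dmp_line(line)
            if fields[3] == "scientific name":
                names[int(fields[0])] = fields[1]
    return names


def efetch_url(tax_id: int) -> str:
    # Build (but do not send) an EFetch request for one taxonomy record.
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    return base + "?" + urlencode({"db": "taxonomy", "id": tax_id, "retmode": "xml"})


nodes = load_nodes(NODES_SAMPLE)
names = load_scientific_names(NAMES_SAMPLE)
print(names[9606], nodes[9606])
print(efetch_url(9606))
```

The bulk dump suits pipelines that resolve many TaxIDs offline, while a per-record request is simpler for occasional lookups; which fits better depends on volume and on NCBI's usage guidelines for the API.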

Governance & Maintenance

The NCBI Taxonomy is maintained by a dedicated curation team at the National Center for Biotechnology Information, part of the National Library of Medicine (NLM) at the US National Institutes of Health (NIH). Taxonomic updates are informed by published literature, submissions from sequence depositors, and consultation with domain experts. As a US government resource, the data is in the public domain.

Notable Implementations

The NCBI TaxID is used as the organism identifier in GenBank, RefSeq, UniProt, the Protein Data Bank, and many other biological databases. It serves as the de facto standard for linking molecular data to organism identity in bioinformatics pipelines. NCBI's Common Tree tool displays the taxonomic hierarchy connecting a user-selected set of taxa; the result reflects the curated classification rather than a phylogeny inferred from sequence data.

Related Standards

Darwin Core — biodiversity data standard that references taxonomic authorities including NCBI

Resources & Links

Specification

Taxonomy Browser

Documentation

Registry Entry

Wikidata Entry

API

E-Utilities API
