Darwin Core is the most widely adopted standard for sharing biodiversity information, providing a stable, well-documented glossary of terms that enables institutions, researchers, and data aggregators to exchange data about organisms, their occurrences in nature, and related evidence such as specimens, samples, and observations. Maintained by Biodiversity Information Standards (TDWG), it underpins the Global Biodiversity Information Facility (GBIF) and hundreds of other biodiversity data networks worldwide.
Background
Darwin Core originated from work by the TDWG community to create a common language for biological collection data. The standard was ratified by TDWG on October 9, 2009, drawing on earlier efforts including the original "Darwin Core" concept developed in the late 1990s for natural history collections. The name references Charles Darwin's foundational contributions to biodiversity science. Since ratification, the standard has been continuously maintained and extended by the Darwin Core Maintenance Group through an open process on GitHub.
Purpose & Scope
Darwin Core is scoped to biological collection data in the broadest sense: any record of an organism's existence at a place and time, whether documented through physical specimens, preserved samples, human observations, or machine sensors. The standard provides terms for describing taxa, occurrences, events, locations, geological contexts, and identification histories.
Explicitly out of scope are data interchange protocols (handled by standards such as TAPIR and BioCASe), purely taxonomic name data (covered by the TDWG Taxon Name Usage standard), and data unrelated to biodiversity.
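As a rough illustration of an in-scope record, the sketch below represents a single observation as a Python dictionary keyed by Darwin Core term names, writes it as a CSV row, and expands the keys to full term IRIs. The keys (occurrenceID, basisOfRecord, scientificName, eventDate, decimalLatitude, decimalLongitude) are Darwin Core terms. The values, the urn:example identifier, and the helper names to_csv and term_iris are invented for illustration and are not part of the standard.

```python
import csv
import io

# Namespace under which the core Darwin Core terms are published.
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical record: every value here is made up for the example.
record = {
    "occurrenceID": "urn:example:occurrence:0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Puma concolor",
    "eventDate": "2021-06-15",
    "decimalLatitude": "-41.0525",
    "decimalLongitude": "-71.5310",
}


def to_csv(records):
    """Serialize records to CSV text, using term names as the header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def term_iris(rec):
    """Return a copy of the record keyed by full term IRIs."""
    return {DWC_TERMS + name: value for name, value in rec.items()}


if __name__ == "__main__":
    print(to_csv([record]))
    for iri in term_iris(record):
        print(iri)
```

Keeping the record flat, with one column per term, mirrors the simple tabular shape in which Darwin Core data is usually shared.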
Key Components
The standard comprises seven vocabularies and twelve documents, which fall into the following main groups:
Core vocabulary (http://rs.tdwg.org/dwc/) -- The main set of terms covering Record-level, Occurrence, Event, Location, Taxon, Identification, and other classes.
Controlled vocabularies -- Separate vocabularies for establishmentMeans, degreeOfEstablishment, and pathway, providing standardized values for describing how organisms arrive at and persist in locations.
Chronometric Age Extension -- Terms for recording the results of chronometric age assays on material samples.
Humboldt Extension -- Added in 2024, this vocabulary supports reporting on ecological inventories and surveys, including hierarchical event structures and relative abundance data.
Serialization guides -- Separate documents for RDF, text (CSV/TSV using Darwin Core Archive format), and XML implementations.
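The controlled vocabularies listed above lend themselves to simple validation. The sketch below checks a raw establishmentMeans value against a set of controlled values. The six values are given here as recalled from the published establishmentMeans vocabulary and should be verified against the current TDWG document before use. The function name check_establishment_means, and the choice to treat values as case-sensitive, are assumptions made for this example.

```python
# Controlled values for establishmentMeans, as recalled from the published
# vocabulary. ASSUMPTION: verify this list against the current TDWG document.
ESTABLISHMENT_MEANS = frozenset(
    {
        "native",
        "nativeReintroduced",
        "introduced",
        "introducedAssistedColonisation",
        "vagrant",
        "uncertain",
    }
)


def check_establishment_means(value):
    """Return the trimmed value if it is a recognised controlled value.

    Returns None otherwise. Matching is case-sensitive in this sketch.
    """
    candidate = value.strip()
    return candidate if candidate in ESTABLISHMENT_MEANS else None


if __name__ == "__main__":
    for raw in ["native", " introduced ", "Native", "escaped"]:
        print(repr(raw), "->", check_establishment_means(raw))
```

Returning None rather than raising lets a caller collect every nonconforming value in a dataset before deciding how to report it.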
Serializations & Technical Formats
Darwin Core data is most commonly exchanged as Darwin Core Archives (DwC-A), which are ZIP packages containing CSV/TSV files with a meta.xml descriptor conforming to the Darwin Core Text Guide. RDF representations use the DwC namespace (http://rs.tdwg.org/dwc/terms/) and a parallel IRI namespace (http://rs.tdwg.org/dwc/iri/) for non-literal values. XML serialization follows the Darwin Core XML Guide.
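The archive layout described above can be sketched with the Python standard library alone. The example builds a tiny archive in memory, then reads meta.xml to find the core data file and to map column indexes to term names. The element and attribute names (core, files, location, id, field, fieldsTerminatedBy, ignoreHeaderLines) and the http://rs.tdwg.org/dwc/text/ descriptor namespace follow the Darwin Core Text Guide as understood here. The file name occurrence.csv, the sample data, the fallback defaults, and the function names are assumptions for illustration. Extensions, quoting options, and line terminators are ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespace of the meta.xml descriptor.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Hypothetical descriptor and data file, invented for this example.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="," ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.csv</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>
"""
DATA = (
    "occurrenceID,basisOfRecord,scientificName\n"
    "urn:example:occ:1,HumanObservation,Puma concolor\n"
)


def build_example_archive():
    """Return the bytes of a minimal in-memory archive (ZIP)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrence.csv", DATA)
    return buffer.getvalue()


def read_core_rows(archive_bytes):
    """Read the core file of an archive into a list of dictionaries."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find("dwc:core", NS)
        location = core.find("dwc:files/dwc:location", NS).text
        # The descriptor may spell a tab as the two characters "\t".
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Map column index -> local term name (last segment of the term IRI).
        columns = {
            int(field.get("index")): field.get("term").rsplit("/", 1)[-1]
            for field in core.findall("dwc:field", NS)
            if field.get("index") is not None
        }
        id_element = core.find("dwc:id", NS)
        if id_element is not None:
            columns[int(id_element.get("index"))] = "id"
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))

    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[skip:]
    return [
        {name: row[index] for index, name in sorted(columns.items())}
        for row in rows
    ]


if __name__ == "__main__":
    for row in read_core_rows(build_example_archive()):
        print(row)
```

Driving the column mapping from meta.xml rather than from the header row reflects the descriptor's role in the format: it, not the data file, states which term each column carries.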
Governance & Maintenance
The Darwin Core Maintenance Group, operating under TDWG governance, manages all modifications and enhancements. Proposals are submitted as GitHub issues, discussed openly, and decided according to TDWG's published standards process. The standard is licensed under CC-BY 4.0, and all normative content is maintained in the tdwg/dwc GitHub repository.
Notable Implementations
Darwin Core is the backbone of the GBIF network, which aggregates over 2.5 billion occurrence records from thousands of institutions. The Atlas of Living Australia, iDigBio, VertNet, and national biodiversity portals across the world rely on Darwin Core for data exchange. The Darwin Core Archive format is the primary mechanism for publishing datasets to GBIF and similar aggregators. OBIS (Ocean Biodiversity Information System) extends Darwin Core for marine biodiversity data.
Related Standards
Darwin Core is closely related to ABCD (Access to Biological Collection Data), another TDWG standard that takes a more comprehensive, highly structured XML approach to the same domain. The two standards serve complementary roles: Darwin Core favors simplicity and flat structures, while ABCD supports deeply nested, atomized data. Many data networks support both.