The DataCite Metadata Schema is the core metadata standard used by the DataCite consortium for the registration and citation of research data. By defining a set of mandatory and recommended properties for describing datasets and other research outputs, it ensures that DOI-identified resources are findable, citable, and interoperable across repositories and discovery platforms. It is among the most widely adopted metadata standards in research data management.
Background
DataCite was founded in 2009 as an international consortium of research institutions, data centers, and libraries with the mission of improving access to research data through persistent identifiers. The Metadata Schema was developed to accompany the DOI registration service, providing a standardized way to describe the resources receiving DOIs. The first public version of the schema was released around 2011, and it has been updated regularly since then to accommodate evolving community needs.
The schema is developed and maintained by the DataCite Metadata Working Group, which consults with DataCite members and coordinates with related community standards and organizations, including ORCID (for researcher identifiers), the Open Funder Registry (for funding organizations), the International DOI Foundation (IDF), and the Dublin Core Metadata Initiative (DCMI).
Purpose and Scope
The DataCite Metadata Schema defines a list of core metadata properties for the accurate and consistent identification of resources for citation and retrieval purposes. It is designed to describe research data, but its scope extends to software, workflows, collections, and other research outputs that can receive DOIs.
The schema distinguishes between:
- Mandatory properties -- required for every DOI registration (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType)
- Recommended properties -- strongly encouraged for discoverability (Subject, Contributor, Date, RelatedIdentifier, Description, GeoLocation)
- Optional properties -- additional detail as relevant (Language, AlternateIdentifier, Size, Format, Version, Rights, FundingReference)
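The three obligation levels lend themselves to a simple completeness check before a record is submitted. The following is a minimal sketch, assuming a flat Python dict keyed by the property names used in this article; the function names and the example record are hypothetical, and the real schema nests most properties (for example, creators with names and identifiers), which this sketch ignores.

```python
# Obligation levels as listed above. The keys follow this article's property
# names; they are an assumption, not the exact element names in the XSD.
MANDATORY = ("Identifier", "Creator", "Title", "Publisher",
             "PublicationYear", "ResourceType")
RECOMMENDED = ("Subject", "Contributor", "Date", "RelatedIdentifier",
               "Description", "GeoLocation")


def _is_empty(value):
    """Treat None, blank strings, and empty containers as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def check_obligations(record):
    """Return (missing_mandatory, missing_recommended) for a flat dict."""
    missing_mandatory = [p for p in MANDATORY if _is_empty(record.get(p))]
    missing_recommended = [p for p in RECOMMENDED if _is_empty(record.get(p))]
    return missing_mandatory, missing_recommended


# Hypothetical record; 10.1234/example is a placeholder, not a registered DOI.
example_record = {
    "Identifier": "10.1234/example",
    "Creator": ["Doe, Jane"],
    "Title": "Example Dataset",
    "Publisher": "Example Repository",
    "PublicationYear": 2024,
    "ResourceType": "Dataset",
    "Subject": ["oceanography"],
}

missing_mandatory, missing_recommended = check_obligations(example_record)
print("missing mandatory:", missing_mandatory)
print("missing recommended:", missing_recommended)
```

A record with any missing mandatory property would be rejected at registration, whereas missing recommended properties only reduce discoverability, so a caller might treat the first list as an error and the second as a warning.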
Key Properties
| Property | Obligation | Description |
|---|---|---|
| Identifier | Mandatory | The DOI being registered |
| Creator | Mandatory | The main researchers or authors |
| Title | Mandatory | Name of the resource |
| Publisher | Mandatory | Entity responsible for making the resource available |
| PublicationYear | Mandatory | Year the resource was published or made available |
| ResourceType | Mandatory | Type of resource (Dataset, Software, Collection, etc.) |
| Subject | Recommended | Subject or keyword terms |
| Contributor | Recommended | Additional contributors with typed roles |
| Date | Recommended | Dates relevant to the resource lifecycle |
| RelatedIdentifier | Recommended | Links to related resources via typed relationships |
| Description | Recommended | Free-text description or abstract |
| GeoLocation | Recommended | Spatial information for the resource |
Serializations and Technical Formats
The DataCite Metadata Schema is formally defined as an XML Schema (XSD). Metadata can be submitted to DataCite in XML, and the DataCite REST API also accepts and returns metadata in JSON format. The schema documentation is hosted at schema.datacite.org, with each version published under its own kernel path (e.g., kernel-4.7).
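To give a sense of the XML serialization, the sketch below builds a record containing only the six mandatory properties using Python's standard library. The namespace URI and element names follow the kernel-4 XML format as commonly published; the helper name `build_resource` and all example values are hypothetical, and the output is not validated against the XSD here.

```python
import xml.etree.ElementTree as ET

# Namespace used by kernel-4 XML records.
NS = "http://datacite.org/schema/kernel-4"


def build_resource(doi, creators, title, publisher, year,
                   resource_type_general, resource_type=""):
    """Build a minimal DataCite-style <resource> element (mandatory fields only)."""
    # Serialize with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", NS)

    def tag(name):
        return f"{{{NS}}}{name}"

    resource = ET.Element(tag("resource"))

    identifier = ET.SubElement(resource, tag("identifier"),
                               {"identifierType": "DOI"})
    identifier.text = doi

    creators_el = ET.SubElement(resource, tag("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, tag("creator"))
        ET.SubElement(creator, tag("creatorName")).text = name

    titles_el = ET.SubElement(resource, tag("titles"))
    ET.SubElement(titles_el, tag("title")).text = title

    ET.SubElement(resource, tag("publisher")).text = publisher
    ET.SubElement(resource, tag("publicationYear")).text = str(year)

    rtype = ET.SubElement(resource, tag("resourceType"),
                          {"resourceTypeGeneral": resource_type_general})
    rtype.text = resource_type
    return resource


# Placeholder values; 10.1234/example is not a registered DOI.
root = build_resource(
    doi="10.1234/example",
    creators=["Doe, Jane"],
    title="Example Dataset",
    publisher="Example Repository",
    year=2024,
    resource_type_general="Dataset",
    resource_type="Tabular data",
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

In practice a record would be validated against the XSD for the targeted kernel version before submission; that step needs a schema-aware XML library and is left out of this sketch.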
Version History
The schema has undergone regular updates. Version 4.7, released in March 2026, added new resourceTypeGeneral values (Poster, Presentation), new relatedIdentifierType values (RAiD, SWHID), a new relationType value (Other), and a relationTypeInformation sub-property. Earlier 4.x releases include 4.5 (2024), 4.4 (2021), and 4.3 (2019), each expanding controlled vocabularies and refining property definitions.
Governance and Maintenance
The DataCite Metadata Working Group is responsible for maintaining the schema. Members are drawn from DataCite member organizations and listed on the DataCite website. The working group actively solicits community feedback through a public contribution process documented on the schema site. Changes are discussed openly and tracked through a formal release history. The schema is licensed under CC-BY.
Notable Implementations
The DataCite Metadata Schema is implemented across thousands of data repositories worldwide:
- All DataCite member organizations use the schema when registering DOIs for datasets
- Major repository platforms including Zenodo, Dryad, Figshare, and institutional repositories submit metadata conforming to the schema
- DataCite Commons provides a discovery interface built on the accumulated metadata
- The DataCite REST API enables programmatic access to all registered metadata
- Crossref and DataCite coordinate on metadata exchange for linking publications to datasets
Related Standards
- Dublin Core -- DataCite maps to Dublin Core elements and coordinates with DCMI
- DCAT (Data Catalog Vocabulary) -- a W3C vocabulary for describing datasets in catalogs, with overlap in scope
- Schema.org -- DataCite metadata can be mapped to Schema.org for search engine visibility
- ORCID -- integrated for researcher identification in Creator and Contributor fields
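A crosswalk between DataCite properties and Dublin Core elements can be sketched as a lookup table. The mapping below is illustrative only and is not the official DataCite crosswalk; the pairing of each property with a Dublin Core element is an assumption based on the similarly named elements, and `to_dublin_core` is a hypothetical helper operating on a flat dict.

```python
# Illustrative mapping from DataCite property names (as used in this article)
# to Dublin Core element names. This is an assumption, not the official
# crosswalk; properties without an obvious counterpart are left out.
DATACITE_TO_DC = {
    "Identifier": "identifier",
    "Creator": "creator",
    "Title": "title",
    "Publisher": "publisher",
    "PublicationYear": "date",
    "ResourceType": "type",
    "Subject": "subject",
    "Contributor": "contributor",
    "Description": "description",
    "Language": "language",
    "Format": "format",
    "Rights": "rights",
}


def to_dublin_core(record):
    """Map a flat DataCite-style dict to a dict of Dublin Core value lists."""
    dc = {}
    for prop, value in record.items():
        element = DATACITE_TO_DC.get(prop)
        if element is None:
            continue  # no counterpart in this sketch, so the value is dropped
        values = value if isinstance(value, (list, tuple)) else [value]
        dc.setdefault(element, []).extend(str(v) for v in values)
    return dc


# Hypothetical record with one property (GeoLocation) that has no mapping here.
dc_record = to_dublin_core({
    "Title": "Example Dataset",
    "Creator": ["Doe, Jane", "Roe, Richard"],
    "PublicationYear": 2024,
    "GeoLocation": "North Atlantic",
})
print(dc_record)
```

The sketch makes the usual cost of such a mapping visible: typed detail such as contributor roles, relation types, and spatial information has no direct place in simple Dublin Core and is lost or flattened.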