The Ecological Metadata Language (EML) is one of the most widely adopted metadata standards in the environmental and ecological sciences. Developed to address the need for comprehensive, machine-readable documentation of research datasets, EML provides a modular XML-based schema that captures everything from basic bibliographic information to detailed descriptions of spatial coverage, taxonomic scope, research methods, and data table structures. It is used by data repositories and research networks worldwide, making it a cornerstone of open data practices in ecology and earth science.
Background
EML originated in 1997 at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California, Santa Barbara. Its creation was motivated by a report from the Ecological Society of America's Committee on the Future of Long-Term Ecological Data and by foundational work on ecological metadata by William Michener and colleagues. Version 1.0 was used internally at NCEAS, with subsequent internal releases (1.2, 1.3, 1.4) closely following the committee's recommendations.
With version 2.0, EML transitioned to a community-maintained, open specification. Significant improvements in the 2.x series drew on practical experience at NCEAS and extensive feedback from the Long Term Ecological Research (LTER) Network's information managers. Version 2.1 introduced internationalization support. Version 2.2.0, released in 2019, added semantic annotations, data paper support, structured funding information, and dataset licensing, reflecting the growing influence of the FAIR data principles and open science.
Purpose & Scope
EML is designed for documenting research data from observational disciplines, with a primary focus on ecology, earth science, and environmental science. It serves researchers, data managers, repository operators, and software developers who need to describe datasets in a structured, interoperable way.
The standard addresses several layers of data documentation:
- Resource identification: titles, creators, keywords, citations, and DOIs
- Coverage: geographic, temporal, and taxonomic extents
- Methods and protocols: detailed descriptions of research methodologies
- Data structure: entity types, attributes, measurement scales, and constraints for data tables, spatial rasters, spatial vectors, and other data formats
- Semantic annotations: formal links to ontology terms using RDF-compatible annotations
- Project context: funding sources, project descriptions, and personnel
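As a rough illustration of the first two layers (resource identification and coverage), the sketch below builds a very small EML-style document with Python's standard library. The element names follow the EML 2.2.0 schema as commonly documented, but the package identifier, the `system` value, and all sample values are hypothetical, and the result is not validated against the schema here.

```python
import xml.etree.ElementTree as ET

EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"
ET.register_namespace("eml", EML_NS)


def build_minimal_eml(package_id, title, surname, bounds):
    """Build a small EML-style tree: title, creator, geographic coverage, contact.

    `bounds` is a (west, east, north, south) tuple in decimal degrees.
    """
    # Only the root element is namespace-qualified; children are unqualified.
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": package_id, "system": "example"},  # hypothetical values
    )
    dataset = ET.SubElement(root, "dataset")

    # Resource identification: title and creator.
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    creator_name = ET.SubElement(creator, "individualName")
    ET.SubElement(creator_name, "surName").text = surname

    # Coverage: a geographic bounding box.
    coverage = ET.SubElement(dataset, "coverage")
    geo = ET.SubElement(coverage, "geographicCoverage")
    ET.SubElement(geo, "geographicDescription").text = "Illustrative study area"
    box = ET.SubElement(geo, "boundingCoordinates")
    tags = (
        "westBoundingCoordinate",
        "eastBoundingCoordinate",
        "northBoundingCoordinate",
        "southBoundingCoordinate",
    )
    for tag, value in zip(tags, bounds):
        ET.SubElement(box, tag).text = str(value)

    # Contact: repeated in full here; EML also allows reuse by reference.
    contact = ET.SubElement(dataset, "contact")
    contact_name = ET.SubElement(contact, "individualName")
    ET.SubElement(contact_name, "surName").text = surname
    return root


if __name__ == "__main__":
    doc = build_minimal_eml("example.1.1", "Sample dataset", "Doe", (-120.0, -119.0, 35.0, 34.0))
    print(ET.tostring(doc, encoding="unicode"))
```

A real record would normally carry more than this (abstract, keywords, temporal coverage, methods, and entity descriptions), and should be checked with a schema validator before submission to a repository.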
Key Modules
EML is organized into a modular architecture with specialized sub-schemas:
| Module | Purpose |
|---|---|
| eml | Root container module |
| eml-resource | Base bibliographic information |
| eml-dataset | Dataset-specific metadata |
| eml-literature | Citation information |
| eml-software | Software-specific metadata |
| eml-protocol | Research protocol descriptions |
| eml-dataTable | Tabular data structure |
| eml-attribute | Variable/column-level descriptions |
| eml-coverage | Geographic, temporal, taxonomic extents |
| eml-methods | Research methodology |
| eml-project | Research project context |
| eml-physical | File format and distribution |
| eml-party | People and organizations |
| eml-semantics | Semantic annotation support |
| eml-spatialRaster | Gridded geospatial data |
| eml-spatialVector | Vector geospatial data |
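To make the division of labour between modules more concrete, the following sketch shows how a table description (eml-dataTable) might wrap column descriptions (eml-attribute). It is a minimal illustration under stated assumptions: the helper names are hypothetical, only the ratio measurement scale is handled, the unit names are sample values, and elements a complete record would usually include, such as the physical file description from eml-physical, are omitted.

```python
import xml.etree.ElementTree as ET


def build_ratio_attribute(name, definition, standard_unit):
    """Describe one numeric column measured on a ratio scale."""
    attribute = ET.Element("attribute")
    ET.SubElement(attribute, "attributeName").text = name
    ET.SubElement(attribute, "attributeDefinition").text = definition
    scale = ET.SubElement(attribute, "measurementScale")
    ratio = ET.SubElement(scale, "ratio")
    unit = ET.SubElement(ratio, "unit")
    ET.SubElement(unit, "standardUnit").text = standard_unit
    domain = ET.SubElement(ratio, "numericDomain")
    ET.SubElement(domain, "numberType").text = "real"
    return attribute


def build_data_table(entity_name, attributes):
    """Wrap column descriptions in a dataTable entity."""
    table = ET.Element("dataTable")
    ET.SubElement(table, "entityName").text = entity_name
    attribute_list = ET.SubElement(table, "attributeList")
    attribute_list.extend(attributes)
    return table


if __name__ == "__main__":
    columns = [
        build_ratio_attribute("depth", "Water depth at the sampling point", "meter"),
        build_ratio_attribute("temp", "Water temperature", "celsius"),
    ]
    table = build_data_table("samples.csv", columns)
    print(ET.tostring(table, encoding="unicode"))
```

Keeping the column builder separate from the table builder mirrors the modular layout: the same attribute description could in principle be attached to other entity types that carry an attribute list.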
Serializations & Technical Formats
EML is defined entirely in XML Schema (XSD). Documents are validated against the schema using standard XML validation tools, and an EML-specific validity parser enforces additional content reference constraints beyond what XSD alone can express. The canonical namespace for version 2.2.0 is https://eml.ecoinformatics.org/eml-2.2.0.
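Before running full schema validation, a tool may want a quick check that a document's root element is in the expected namespace. The sketch below does only that, using Python's standard library. It is not XSD validation and does not replace the EML validity parser; the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

EML_220_NS = "https://eml.ecoinformatics.org/eml-2.2.0"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or '' if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified tags as '{namespace}localname'.
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def is_eml_220_root(xml_text):
    """True when the root element is <eml> in the EML 2.2.0 namespace."""
    root = ET.fromstring(xml_text)
    return root.tag == f"{{{EML_220_NS}}}eml"


# Hypothetical sample document used only for illustration.
SAMPLE = (
    '<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" '
    'packageId="example.1.1" system="example">'
    "<dataset><title>Sample dataset</title></dataset>"
    "</eml:eml>"
)

if __name__ == "__main__":
    print(root_namespace(SAMPLE))
    print(is_eml_220_root(SAMPLE))
```

A passing check here says nothing about whether the content is valid EML; it only rules out documents that declare a different schema version or no namespace at all.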
EML documents can be authored using text editors, XML editors such as Oxygen, scripting libraries such as the R EML package, or web-based metadata editors such as MetacatUI. The schema supports content references between elements: an element that carries an id attribute can be reused elsewhere through a references element, so complex data packages can be described without redundancy.
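Content references pair an `id` attribute on one element with a `references` child element elsewhere in the document. The sketch below illustrates the kind of consistency check this requires: every `id` should be unique and every `references` value should point at an existing `id`. It is a simplified stand-in, not the actual EML validity parser, which applies further rules; the function name and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_references(xml_text):
    """Return a list of problems with id / references usage (empty if none)."""
    root = ET.fromstring(xml_text)

    # Count every id attribute in the document.
    ids = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    problems = [f"duplicate id: {value}" for value, count in ids.items() if count > 1]

    # Each <references> element must name an id that exists.
    for ref in root.iter("references"):
        target = (ref.text or "").strip()
        if target not in ids:
            problems.append(f"dangling reference: {target}")
    return problems


# Hypothetical fragment: the contact reuses the creator defined above it.
GOOD = (
    "<dataset>"
    '<creator id="jones"><individualName><surName>Jones</surName></individualName></creator>'
    "<contact><references>jones</references></contact>"
    "</dataset>"
)
# Same fragment, but the contact points at an id that is never defined.
BAD = GOOD.replace("<references>jones</references>", "<references>smith</references>")

if __name__ == "__main__":
    print(check_references(GOOD))
    print(check_references(BAD))
```

XSD alone cannot express this cross-element constraint for EML's reference mechanism, which is why a separate check of this general shape is needed on top of schema validation.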
Governance & Maintenance
EML is maintained by a community of voluntary project members coordinated through NCEAS. Decisions are made by consensus among current project maintainers. Development occurs in feature branches on GitHub, with contributions accepted via pull requests. Discussion takes place on a dedicated Slack channel and through the GitHub issue tracker.
The specification is versioned using a major.minor.patch scheme, with backward compatibility maintained within major versions. The project is funded through NCEAS, with support from the University of California, Santa Barbara, the State of California, and multiple National Science Foundation grants.
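Software that handles several EML releases may need to read the version out of a namespace URI and compare major versions. The sketch below shows one way this might be done. It assumes the version appears as a trailing `eml-MAJOR.MINOR.PATCH` segment, as in the 2.2.0 namespace; the function names are hypothetical, and the older `eml://` namespace in the usage example is an assumption about how earlier releases were identified.

```python
import re

# Matches a trailing 'eml-MAJOR.MINOR.PATCH' segment in a namespace URI.
_VERSION_RE = re.compile(r"eml-(\d+)\.(\d+)\.(\d+)$")


def version_from_namespace(namespace):
    """Extract (major, minor, patch) from an EML namespace URI."""
    match = _VERSION_RE.search(namespace)
    if match is None:
        raise ValueError(f"no EML version found in namespace: {namespace!r}")
    return tuple(int(group) for group in match.groups())


def same_major_version(namespace_a, namespace_b):
    """True when both namespaces belong to the same major version."""
    return version_from_namespace(namespace_a)[0] == version_from_namespace(namespace_b)[0]


if __name__ == "__main__":
    current = "https://eml.ecoinformatics.org/eml-2.2.0"
    older = "eml://ecoinformatics.org/eml-2.1.1"  # assumed form of an earlier namespace
    print(version_from_namespace(current))
    print(same_major_version(current, older))
```

Sharing a major version is only a first signal. Whether a particular older document can be processed unchanged still depends on the release notes for the versions involved.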
Notable Implementations
EML is the primary metadata standard for the Knowledge Network for Biocomplexity (KNB) data repository, the DataONE federation, and the LTER Network Information System. It is used by the Environmental Data Initiative (EDI) and by numerous individual research groups and field stations. The Arctic Data Center, the National Ecological Observatory Network (NEON), and many biodiversity data platforms also use EML for dataset documentation.
Tools supporting EML include the R EML package (part of rOpenSci), the Metacat data management system, and MetacatUI, a web-based metadata editor.
Related Standards
EML exists within a broader ecosystem of scientific metadata standards. It is complementary to Darwin Core (for biodiversity occurrence data), ISO 19115 (for geospatial metadata), and DataCite (for dataset citation). EML's semantic annotation features in version 2.2.0 allow bridging to linked data standards and ontologies such as the Semantic Web for Earth and Environmental Terminology (SWEET) and the Environment Ontology (ENVO).
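A semantic annotation can be read as a subject, predicate, object statement: the annotated element is the subject, a property URI is the predicate, and a value URI is the object. The sketch below extracts such triples from annotations nested directly inside an element that carries an `id`. It is a simplified illustration: the function name is hypothetical, the URIs are `example.org` placeholders rather than real ontology terms, and other annotation placements allowed by EML 2.2.0 are not handled.

```python
import xml.etree.ElementTree as ET


def extract_triples(xml_text):
    """Return (subject_id, property_uri, value_uri) for each nested annotation."""
    root = ET.fromstring(xml_text)
    triples = []
    for element in root.iter():
        subject = element.get("id")
        if subject is None:
            continue  # only elements with an id can serve as a subject here
        # findall with a bare tag name matches direct children only.
        for annotation in element.findall("annotation"):
            prop = annotation.findtext("propertyURI", default="").strip()
            value = annotation.findtext("valueURI", default="").strip()
            if prop and value:
                triples.append((subject, prop, value))
    return triples


# Hypothetical annotated column; URIs are placeholders, not real ontology terms.
SAMPLE_ATTRIBUTE = (
    '<attribute id="att.1">'
    "<attributeName>cover</attributeName>"
    "<annotation>"
    '<propertyURI label="contains measurements of type">https://example.org/prop/measures</propertyURI>'
    '<valueURI label="plant cover">https://example.org/term/plant-cover</valueURI>'
    "</annotation>"
    "</attribute>"
)

if __name__ == "__main__":
    for triple in extract_triples(SAMPLE_ATTRIBUTE):
        print(triple)
```

In practice the value URI would resolve to a term in an ontology such as ENVO or SWEET, which is what lets annotated datasets be discovered through linked data queries.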