Ecological Metadata Language

EML

The Ecological Metadata Language (EML) defines a comprehensive vocabulary and a readable XML markup syntax for documenting research data. It is in widespread use in the earth and environmental sciences, and increasingly in other research disciplines. EML includes modules for identifying and citing data packages, for describing the spatial, temporal, taxonomic, and thematic extent of data, for describing research methods and protocols, and for precisely annotating data with semantic vocabularies. Version 2.2.0 added support for semantic annotations, data papers, structured funding information, and dataset licensing.

Overview

The Ecological Metadata Language (EML) is one of the most widely adopted metadata standards in the environmental and ecological sciences. Developed to address the need for comprehensive, machine-readable documentation of research datasets, EML provides a modular XML-based schema that captures everything from basic bibliographic information to detailed descriptions of spatial coverage, taxonomic scope, research methods, and data table structures. It is used by data repositories and research networks worldwide, including the KNB, DataONE, and the LTER Network, making it a cornerstone of open data practices in ecology and earth science.

Background

EML originated in 1997 at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California, Santa Barbara. Its creation was motivated by a report from the Ecological Society of America's Committee on the Future of Long-Term Ecological Data and by foundational work on ecological metadata by William Michener and colleagues. Version 1.0 was used internally at NCEAS, with subsequent internal releases (1.2, 1.3, 1.4) closely following the committee's recommendations.

With version 2.0, EML transitioned to a community-maintained, open specification. Significant improvements in the 2.x series drew on practical experience at NCEAS and extensive feedback from the Long Term Ecological Research (LTER) Network's information managers. Version 2.1 introduced internationalization support. Version 2.2.0, released in 2019, added semantic annotations, data paper support, structured funding information, and dataset licensing, in line with the FAIR data principles and open science practice.

Purpose & Scope

EML is designed for documenting any research data relevant to observational disciplines, with a primary focus on ecology, earth science, and environmental science. It serves researchers, data managers, repository operators, and software developers who need to describe datasets in a structured, interoperable way.

The standard addresses several layers of data documentation:

  • Resource identification: titles, creators, keywords, citations, and DOIs
  • Coverage: geographic, temporal, and taxonomic extents
  • Methods and protocols: detailed descriptions of research methodologies
  • Data structure: entity types, attributes, measurement scales, and constraints for data tables, spatial rasters, spatial vectors, and other data formats
  • Semantic annotations: formal links to ontology terms using RDF-compatible annotations
  • Project context: funding sources, project descriptions, and personnel
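
As a rough illustration of how the identification and coverage layers fit together in one document, the following Python sketch builds a minimal EML tree with the standard library. The element names are written from memory of the EML 2.2.0 schema, and the package identifier, title, names, dates, and coordinates are hypothetical placeholders. The output has not been validated against the schema, so treat it as a sketch rather than a reference document.

```python
import xml.etree.ElementTree as ET

EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"
ET.register_namespace("eml", EML_NS)


def add_text(parent, tag, text):
    """Append a child element holding plain text and return it."""
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


def build_minimal_eml():
    # Only the root element is namespace-qualified; its descendants are not.
    # packageId and system values are hypothetical.
    root = ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": "example.1.1", "system": "example"})
    dataset = ET.SubElement(root, "dataset")

    # Resource identification: title and creator.
    add_text(dataset, "title", "Hypothetical lake temperature survey")
    creator = ET.SubElement(dataset, "creator")
    add_text(ET.SubElement(creator, "individualName"), "surName", "Doe")

    # Coverage: geographic bounding box and temporal range.
    coverage = ET.SubElement(dataset, "coverage")
    geo = ET.SubElement(coverage, "geographicCoverage")
    add_text(geo, "geographicDescription", "Hypothetical study lake")
    box = ET.SubElement(geo, "boundingCoordinates")
    for tag, value in [("westBoundingCoordinate", "-120.10"),
                       ("eastBoundingCoordinate", "-120.00"),
                       ("northBoundingCoordinate", "39.10"),
                       ("southBoundingCoordinate", "39.00")]:
        add_text(box, tag, value)
    temporal = ET.SubElement(coverage, "temporalCoverage")
    dates = ET.SubElement(temporal, "rangeOfDates")
    add_text(ET.SubElement(dates, "beginDate"), "calendarDate", "2020-01-01")
    add_text(ET.SubElement(dates, "endDate"), "calendarDate", "2020-12-31")

    # Contact for the dataset.
    contact = ET.SubElement(dataset, "contact")
    add_text(ET.SubElement(contact, "individualName"), "surName", "Doe")
    return root


xml_text = ET.tostring(build_minimal_eml(), encoding="unicode")
print(xml_text)
```

In practice a dedicated tool such as the R EML package or a metadata editor would normally generate this structure; the sketch only shows the nesting.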

Key Modules

EML is organized into a modular architecture with specialized sub-schemas:

Module              Purpose
eml                 Root container module
eml-resource        Base bibliographic information
eml-dataset         Dataset-specific metadata
eml-literature      Citation information
eml-software        Software-specific metadata
eml-protocol        Research protocol descriptions
eml-dataTable       Tabular data structure
eml-attribute       Variable/column-level descriptions
eml-coverage        Geographic, temporal, taxonomic extents
eml-methods         Research methodology
eml-project         Research project context
eml-physical        File format and distribution
eml-party           People and organizations
eml-semantics       Semantic annotation support
eml-spatialRaster   Gridded geospatial data
eml-spatialVector   Vector geospatial data
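
To make the dataTable and attribute modules more concrete, here is a small Python sketch that reads column-level descriptions from a data table fragment. The fragment is hand-written for illustration, with hypothetical file and column names, and it omits elements a complete EML document would require. The helper name describe_attributes is likewise an invention of this example, not part of any EML tool.

```python
import xml.etree.ElementTree as ET

# Hand-written, incomplete fragment; names and values are hypothetical.
SNIPPET = """
<dataTable>
  <entityName>lake_temps.csv</entityName>
  <attributeList>
    <attribute>
      <attributeName>site</attributeName>
      <attributeDefinition>Sampling site code</attributeDefinition>
      <measurementScale>
        <nominal>
          <nonNumericDomain>
            <textDomain><definition>Site code</definition></textDomain>
          </nonNumericDomain>
        </nominal>
      </measurementScale>
    </attribute>
    <attribute>
      <attributeName>temp_c</attributeName>
      <attributeDefinition>Water temperature</attributeDefinition>
      <measurementScale>
        <interval>
          <unit><standardUnit>celsius</standardUnit></unit>
          <numericDomain><numberType>real</numberType></numericDomain>
        </interval>
      </measurementScale>
    </attribute>
  </attributeList>
</dataTable>
"""


def describe_attributes(data_table):
    """Return one dict per column: name, definition, scale, and unit."""
    rows = []
    for attr in data_table.findall("attributeList/attribute"):
        scale = attr.find("measurementScale")
        # The first child of measurementScale names the scale type.
        has_scale = scale is not None and len(scale) > 0
        rows.append({
            "name": attr.findtext("attributeName"),
            "definition": attr.findtext("attributeDefinition"),
            "scale": scale[0].tag if has_scale else None,
            "unit": attr.findtext(".//standardUnit"),  # None if no unit
        })
    return rows


columns = describe_attributes(ET.fromstring(SNIPPET))
for column in columns:
    print(column)
```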

Serializations & Technical Formats

EML is defined in XML Schema (XSD). Documents are validated against the schema using standard XML validation tools, and an EML-specific validity parser enforces additional content reference constraints beyond what XSD alone can express. The canonical namespace for version 2.2.0 is https://eml.ecoinformatics.org/eml-2.2.0
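
Before running full schema validation, a quick sanity check is to confirm that a document's root element sits in the expected namespace. The Python sketch below does only that, using the standard library. It is not a substitute for XSD validation or for the EML validity parser, and the function name is_eml_220 and the sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

EML_220_NS = "https://eml.ecoinformatics.org/eml-2.2.0"


def is_eml_220(xml_text):
    """Return True if the root element is <eml> in the EML 2.2.0 namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified tags as "{namespace}localname".
    return root.tag == f"{{{EML_220_NS}}}eml"


# Hypothetical documents: one namespaced root, one without a namespace.
good_doc = ('<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" '
            'packageId="example.1.1" system="example"><dataset/></eml:eml>')
bad_doc = '<eml packageId="example.1.1" system="example"><dataset/></eml>'

print(is_eml_220(good_doc))
print(is_eml_220(bad_doc))
```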

EML documents can be authored using text editors, XML-specific tools like Oxygen, scripting libraries such as the R EML package, or web-based metadata editors like MetacatUI. The schema supports content references between elements, allowing complex data packages to be described without redundancy.
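
Content references work by giving an element an id attribute and pointing to it elsewhere with a references child element. The Python sketch below checks two of the constraints involved: ids should be unique, and every reference should resolve to an existing id. It is a simplified illustration and not the EML validity parser, which applies further rules. The function name check_references and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_references(root):
    """Return a list of problems with id/references usage in the tree."""
    # Count every non-empty id attribute in the document.
    ids = Counter(el.get("id") for el in root.iter() if el.get("id"))
    problems = [f"duplicate id: {i}" for i, n in ids.items() if n > 1]
    # Every <references> element must point at a known id.
    for ref in root.iter("references"):
        target = (ref.text or "").strip()
        if target not in ids:
            problems.append(f"dangling reference: {target}")
    return problems


# Hypothetical fragment: the creator is reused as contact via a reference,
# while the metadataProvider points at an id that does not exist.
FRAGMENT = """
<dataset>
  <creator id="person.1">
    <individualName><surName>Doe</surName></individualName>
  </creator>
  <metadataProvider><references>person.2</references></metadataProvider>
  <contact><references>person.1</references></contact>
</dataset>
"""

problems = check_references(ET.fromstring(FRAGMENT))
print(problems)
```

Here the contact reuses the creator's details without repeating them, which is the redundancy-avoiding pattern described above, and the unresolved reference is reported.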

Governance & Maintenance

EML is maintained by a community of voluntary project members coordinated through NCEAS. Decisions are made by consensus among current project maintainers. Development occurs in feature branches on GitHub, with contributions accepted via pull requests. Discussion takes place on a dedicated Slack channel and through the GitHub issue tracker.

The specification is versioned using a major.minor.patch scheme, with backward compatibility maintained within major versions. The project is funded through NCEAS, with support from the University of California, Santa Barbara, the State of California, and multiple National Science Foundation grants.

Notable Implementations

EML is the primary metadata standard for the Knowledge Network for Biocomplexity (KNB) data repository, the DataONE federation, and the LTER Network Information System. It is used by the Environmental Data Initiative (EDI) and by numerous individual research groups and field stations. The Arctic Data Center, the National Ecological Observatory Network (NEON), and many biodiversity data platforms also use EML for dataset documentation.

Tools supporting EML include the R EML package (part of rOpenSci), the Metacat data management system, and MetacatUI, a web-based metadata editor.

Related Standards

EML exists within a broader ecosystem of scientific metadata standards. It is complementary to Darwin Core (for biodiversity occurrence data), ISO 19115 (for geospatial metadata), and DataCite (for dataset citation). EML's semantic annotation features in version 2.2.0 allow bridging to linked data standards and ontologies such as the Semantic Web for Earth and Environmental Terminology (SWEET) and the Environment Ontology (ENVO).
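
A semantic annotation in EML 2.2.0 pairs a property URI with a value URI on an identified element, which maps naturally onto a subject, predicate, object triple. The Python sketch below extracts such triples from an attribute fragment. The example.org URIs are placeholders rather than real ontology terms, the fragment leaves out other children an attribute would normally carry, and the helper name annotation_triples is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Incomplete, hand-written fragment; the URIs are placeholders.
SNIPPET = """
<attribute id="att.temp">
  <attributeName>temp_c</attributeName>
  <annotation>
    <propertyURI label="contains measurements of type">https://example.org/vocab#containsMeasurementsOfType</propertyURI>
    <valueURI label="water temperature">https://example.org/vocab#WaterTemperature</valueURI>
  </annotation>
</attribute>
"""


def annotation_triples(element):
    """Return (subject id, property URI, value URI) for each annotation."""
    subject = element.get("id")
    triples = []
    for annotation in element.findall("annotation"):
        prop = annotation.findtext("propertyURI", default="").strip()
        value = annotation.findtext("valueURI", default="").strip()
        triples.append((subject, prop, value))
    return triples


triples = annotation_triples(ET.fromstring(SNIPPET))
for triple in triples:
    print(triple)
```

With real annotations, the value URI would typically resolve to a term in an ontology such as ENVO, which is what lets EML metadata link into wider linked data graphs.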
