Open Archives Initiative Protocol for Metadata Harvesting

OAI-PMH

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier interoperability framework for harvesting metadata from repositories. It defines six verbs — Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord — that are invoked via HTTP requests and return XML responses. Data Providers expose structured metadata through OAI-PMH endpoints, while Service Providers harvest that metadata to build aggregated services. Version 2.0, released in 2002, remains the current specification and is widely deployed across digital libraries, institutional repositories, and cultural heritage aggregators worldwide.

Overview

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is one of the foundational interoperability protocols for digital libraries and institutional repositories. By providing a simple, HTTP-based mechanism for harvesting metadata records from distributed repositories, OAI-PMH has enabled the construction of aggregated discovery services across thousands of independent collections worldwide.

Background

OAI-PMH emerged from the Open Archives Initiative, which grew out of a meeting held in Santa Fe, New Mexico, in October 1999. That meeting brought together the e-print and digital library communities to address interoperability challenges, and its outcome became known as the Santa Fe Convention. The first version of the protocol was released in 2001, and version 2.0, the current specification, was published on 14 June 2002. The protocol was designed to be deliberately simple, lowering the barrier for repository operators to expose their metadata to harvesting services.

The Open Archives Initiative itself was incubated at Cornell University with funding from the Digital Library Federation, the Coalition for Networked Information, and the National Science Foundation. While the initiative has since produced other specifications (OAI-ORE for object reuse, ResourceSync for resource synchronization), OAI-PMH remains its most widely adopted protocol.

Purpose & Scope

OAI-PMH establishes a client-server model with two roles:

  • Data Providers maintain repositories and expose metadata through OAI-PMH endpoints
  • Service Providers issue harvest requests to collect metadata from one or more Data Providers and build aggregated services

The protocol is metadata-format agnostic, though all implementations must support unqualified Dublin Core (metadataPrefix oai_dc) as a baseline. Repositories may additionally expose metadata in any other XML-based format, each identified by its own metadataPrefix.
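A harvester can check which formats a repository offers by parsing a ListMetadataFormats response. The sketch below is illustrative only: the sample XML is hand-written for this example (the repository URL and the second format are assumptions, not taken from any real endpoint), and the helper name list_metadata_prefixes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response, trimmed to the elements used below.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request verb="ListMetadataFormats">https://repository.example.org/oai</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>mods</metadataPrefix>
      <schema>http://www.loc.gov/standards/mods/v3/mods-3-7.xsd</schema>
      <metadataNamespace>http://www.loc.gov/mods/v3</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""


def list_metadata_prefixes(xml_text):
    """Return the metadataPrefix values advertised in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text.strip())
    path = "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix"
    return [element.text for element in root.findall(path, OAI_NS)]


prefixes = list_metadata_prefixes(SAMPLE_RESPONSE)
# The baseline format should always be among the advertised prefixes.
print("oai_dc" in prefixes, prefixes)
```

A harvester would typically run a check like this once per repository before choosing which metadataPrefix to request.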

The Six Verbs

OAI-PMH defines exactly six verbs, each invoked as an HTTP GET or POST request:

Verb                 Purpose
Identify             Retrieve information about the repository
ListMetadataFormats  List available metadata formats
ListSets             List the set structure of the repository
ListIdentifiers      List headers (identifiers and datestamps) for records
ListRecords          Harvest full metadata records
GetRecord            Retrieve a single metadata record by identifier

Responses are returned as well-formed XML documents conforming to the OAI-PMH XML Schema. Flow control for large result sets is handled through resumption tokens, allowing incremental harvesting.
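The resumption-token flow can be sketched as a loop: issue the initial request, yield the items in the response, and repeat with the returned token until the repository sends none (or an empty one). This is a minimal sketch, not a complete harvester: the function names and the injectable fetch parameter are assumptions made for illustration, and error responses, retries, and rate limiting are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Clark-notation prefix for elements in the OAI-PMH 2.0 namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Default fetcher: perform an HTTP GET and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(base_url, metadata_prefix="oai_dc", fetch=http_fetch):
    """Yield every record identifier, following resumption tokens page by page."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(f"{OAI}ListIdentifiers/{OAI}resumptionToken")
        token = (token or "").strip()
        if not token:
            break  # A missing or empty token marks the final page.
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}
```

Passing a custom fetch callable makes the loop easy to exercise against canned responses without touching the network; the same pattern applies to ListRecords and ListSets.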

Technical Details

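OAI-PMH requests are standard HTTP requests. With GET, the arguments are encoded in the query string; with POST, they are sent in the request body as application/x-www-form-urlencoded data. A typical GET harvest request looks like: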

https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc

The protocol supports selective harvesting by date range (using from and until parameters) and by set membership, enabling Service Providers to request only new or changed records since their last harvest. Deleted records can be tracked through the repository's deleted-record policy (no, transient, or persistent).
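As an illustration of selective harvesting, the sketch below builds a ListRecords URL with optional from, until, and set arguments. The function and parameter names are hypothetical. It assumes timezone-aware datetimes and defaults to day-level datestamps (YYYY-MM-DD); seconds-level datestamps should only be sent to a repository whose Identify response advertises that granularity.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# strftime patterns for the two datestamp granularities used by OAI-PMH 2.0.
GRANULARITY_FORMATS = {
    "day": "%Y-%m-%d",
    "second": "%Y-%m-%dT%H:%M:%SZ",
}


def format_datestamp(moment, granularity="day"):
    """Render a timezone-aware datetime as a UTC datestamp string."""
    return moment.astimezone(timezone.utc).strftime(GRANULARITY_FORMATS[granularity])


def selective_request_url(base_url, metadata_prefix, since=None, until=None,
                          set_spec=None, granularity="day"):
    """Build a ListRecords URL restricted by date range and/or set membership."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since is not None:
        params["from"] = format_datestamp(since, granularity)
    if until is not None:
        params["until"] = format_datestamp(until, granularity)
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Example: everything in a (hypothetical) "theses" set changed since the last harvest.
last_harvest = datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc)
print(selective_request_url("https://repository.example.org/oai", "oai_dc",
                            since=last_harvest, set_spec="theses"))
# prints https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2024-01-15&set=theses
```

A Service Provider would usually store the time of each successful harvest and pass it as the from value on the next run.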

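All OAI-PMH identifiers must conform to the URI syntax. The OAI implementation guidelines additionally recommend, but do not require, the oai: scheme, which combines a namespace identifier with a local identifier (e.g., oai:repository.example.org:item/12345).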
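For identifiers that use the oai: scheme, the namespace and local parts can be separated with a simple split. The helper below is a hypothetical, deliberately loose structural check for illustration; it does not validate the permitted character set, and identifiers using other URI schemes are rejected here even though the protocol allows them.

```python
def parse_oai_identifier(identifier):
    """Split 'oai:<namespace>:<local-id>' into a (namespace, local_id) tuple."""
    # Split at most twice so colons inside the local identifier are preserved.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai: scheme identifier: {identifier!r}")
    return parts[1], parts[2]


namespace, local_id = parse_oai_identifier("oai:repository.example.org:item/12345")
print(namespace, local_id)  # prints: repository.example.org item/12345
```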

Governance & Maintenance

OAI-PMH is maintained by the Open Archives Initiative. The version 2.0 specification has been stable since 2002 and is accompanied by implementation guidelines and a static repository specification for small, infrequently changing collections. While no new versions have been released, the protocol's simplicity has contributed to its longevity. The OAI provides a validation service for testing Data Provider compliance.

Notable Implementations

OAI-PMH is deployed across thousands of repositories and aggregation services:

  • Europeana harvests cultural heritage metadata from institutions across Europe via OAI-PMH
  • DPLA (Digital Public Library of America) uses OAI-PMH as one of its primary ingestion pathways
  • BASE (Bielefeld Academic Search Engine) harvests from over 10,000 content providers
  • OpenDOAR lists thousands of OAI-PMH-compliant open access repositories
  • Repository platforms including DSpace, EPrints, Fedora, and Islandora include built-in OAI-PMH Data Provider support

Related Standards

  • OAI-ORE — Object Reuse and Exchange, a companion OAI specification for describing aggregations of web resources
  • Dublin Core — the required baseline metadata format for all OAI-PMH implementations
  • ResourceSync — a newer OAI specification using Sitemaps for resource synchronization, designed to complement or succeed OAI-PMH for some use cases

Further Reading