Research Object Crate (RO-Crate) is a community-driven specification for packaging research data together with rich, machine-actionable metadata. With reproducibility and the FAIR principles (Findable, Accessible, Interoperable, Reusable) now central concerns for the research community, RO-Crate offers a practical, developer-friendly way to bundle datasets, software, workflows, and documentation into self-describing packages that both humans and machines can interpret.
Background
RO-Crate emerged around 2019 from the broader Research Object movement, which since 2010 had been developing methods for identifying, aggregating, and exchanging scholarly information on the web. The Research Object approach sought to improve scientific reproducibility by associating related research artifacts -- data, code, documentation, provenance -- under a single identifier. RO-Crate simplified this vision by adopting Schema.org vocabulary expressed in JSON-LD, aiming for "just enough" linked data to be practical without requiring deep semantic web expertise.
The design philosophy explicitly prioritized developer friendliness. The team assumed a target audience of web application developers familiar with JSON, which informed the choice of JSON-LD as the serialization format and the emphasis on self-contained, human-readable metadata files. The result is a specification that a typical developer can understand and implement without prior linked data experience.
Purpose and Scope
An RO-Crate is fundamentally a directory containing a metadata file (ro-crate-metadata.json) that describes its contents using Schema.org types and properties serialized as JSON-LD. The metadata describes:
- Data entities -- files and directories, either stored within the crate or referenced by persistent identifiers/URLs
- Contextual entities -- people (typically identified by an ORCID iD), organizations, licenses, places, and other entities that provide context for the data
Each entity is described with at least an identifier, a type, and a name, so that clients can render meaningful information without following external links. RO-Crates can be distributed as ZIP archives, published to repositories such as Zenodo, or hosted directly on the web via GitHub Pages or similar platforms.
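As a minimal sketch of the layout described above, the following Python script writes a crate directory containing one data file and an ro-crate-metadata.json. The file name results.csv, the crate name and description, the placeholder ORCID iD, and the license choice are assumptions made for illustration; only the ro-crate-metadata.json file name, the context URI, and the general @context / @graph shape are taken from the specification.

```python
import json
import tempfile
from pathlib import Path

# Context URI for RO-Crate 1.1.
CONTEXT = "https://w3id.org/ro/crate/1.1/context"

# Placeholder identifiers, used only for illustration.
PERSON_ID = "https://orcid.org/0000-0000-0000-0000"
LICENSE_ID = "https://creativecommons.org/licenses/by/4.0/"


def build_minimal_crate(root: Path) -> dict:
    """Write one data file and ro-crate-metadata.json into `root`."""
    root.mkdir(parents=True, exist_ok=True)
    # Hypothetical payload file.
    (root / "results.csv").write_text("sample,value\na,1\n", encoding="utf-8")

    metadata = {
        "@context": CONTEXT,
        "@graph": [
            {   # Metadata descriptor: points at the root data entity.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root data entity: the crate as a whole.
                "@id": "./",
                "@type": "Dataset",
                "name": "Example crate",
                "description": "A small illustrative crate.",
                "datePublished": "2024-01-01",
                "license": {"@id": LICENSE_ID},
                "author": {"@id": PERSON_ID},
                "hasPart": [{"@id": "results.csv"}],
            },
            {   # Data entity: a file stored inside the crate.
                "@id": "results.csv",
                "@type": "File",
                "name": "Results table",
                "encodingFormat": "text/csv",
            },
            {   # Contextual entity: a person.
                "@id": PERSON_ID,
                "@type": "Person",
                "name": "Example Researcher",
            },
            {   # Contextual entity: the license.
                "@id": LICENSE_ID,
                "@type": "CreativeWork",
                "name": "Creative Commons Attribution 4.0",
            },
        ],
    }
    (root / "ro-crate-metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return metadata


with tempfile.TemporaryDirectory() as tmp:
    crate_root = Path(tmp) / "example-crate"
    metadata = build_minimal_crate(crate_root)
    # Read the file back to confirm it round-trips as plain JSON.
    written = json.loads(
        (crate_root / "ro-crate-metadata.json").read_text(encoding="utf-8")
    )
    file_names = sorted(p.name for p in crate_root.iterdir())
    print(file_names)
```

Entities refer to one another only through {"@id": ...} references into the flat @graph list, which is why the person and license appear as separate top-level entries rather than nested objects.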
Structure
| Component | Description |
|---|---|
| Root Data Entity | The Dataset entity that represents the crate as a whole; it corresponds to the top-level directory, which is identified by the presence of ro-crate-metadata.json |
| RO-Crate Metadata File | JSON-LD file conforming to the RO-Crate specification, describing all entities |
| Data Entities | Files and directories within or referenced by the crate |
| Contextual Entities | People, organizations, licenses, and other context stored as metadata only |
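
One way to see how these components relate is to sort the entries of a metadata file into the categories in the table. The sketch below is a simplification under stated assumptions: it assumes the metadata descriptor has the @id ro-crate-metadata.json, treats anything reachable from the root through hasPart as a data entity, and treats everything else as contextual. The sample graph and the function names are hypothetical.

```python
from typing import Any

Entity = dict[str, Any]

# Hypothetical, hand-written @graph used only for illustration.
graph: list[Entity] = [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Example crate",
     "hasPart": [{"@id": "data/"}]},
    {"@id": "data/", "@type": "Dataset", "name": "Data directory",
     "hasPart": {"@id": "data/results.csv"}},
    {"@id": "data/results.csv", "@type": "File", "name": "Results table"},
    {"@id": "https://orcid.org/0000-0000-0000-0000", "@type": "Person",
     "name": "Example Researcher"},
]


def as_list(value: Any) -> list:
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def split_entities(graph: list[Entity]) -> tuple[Entity, list[Entity], list[Entity]]:
    """Return (root, data entities, contextual entities)."""
    by_id = {entity["@id"]: entity for entity in graph}
    # Assumption: the descriptor uses the conventional @id.
    descriptor = by_id["ro-crate-metadata.json"]
    root = by_id[descriptor["about"]["@id"]]

    # Walk hasPart links transitively, starting from the root.
    data_ids: set[str] = set()
    stack = [root]
    while stack:
        current = stack.pop()
        for ref in as_list(current.get("hasPart")):
            part_id = ref["@id"]
            if part_id in by_id and part_id not in data_ids:
                data_ids.add(part_id)
                stack.append(by_id[part_id])

    excluded = data_ids | {descriptor["@id"], root["@id"]}
    data = [e for e in graph if e["@id"] in data_ids]
    contextual = [e for e in graph if e["@id"] not in excluded]
    return root, data, contextual


root, data, contextual = split_entities(graph)
print("root:", root["@id"])
print("data:", [e["@id"] for e in data])
print("contextual:", [e["@id"] for e in contextual])
```

A production reader would more likely use an existing RO-Crate library and would also check entity types; this sketch only illustrates the relationships between the components.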
Profiles
RO-Crate supports domain-specific profiles that extend the base specification with additional requirements:
- Workflow RO-Crate -- Packages executable workflows with documentation, used by WorkflowHub
- Workflow Run RO-Crate -- Records provenance of workflow executions
- Process Run Crate -- Captures provenance of individual computational steps
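A crate typically signals that it follows a profile through conformsTo references in its metadata. The sketch below collects those references from both the metadata descriptor and the root data entity, since where a profile expects the declaration can differ between profiles. The profile URI shown is illustrative and the helper names are hypothetical; the profile's own documentation should be consulted for the exact URI and placement.

```python
from typing import Any

Entity = dict[str, Any]

# Illustrative profile URI; check the profile documentation for the exact value.
PROCESS_RUN_PROFILE = "https://w3id.org/ro/wfrun/process/0.1"

# Hypothetical, hand-written @graph used only for illustration.
graph: list[Entity] = [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Example run",
     "conformsTo": [{"@id": PROCESS_RUN_PROFILE}]},
]


def as_list(value: Any) -> list:
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def declared_profiles(graph: list[Entity]) -> set[str]:
    """Collect conformsTo URIs from the descriptor and the root data entity."""
    by_id = {entity["@id"]: entity for entity in graph}
    # Assumption: the descriptor uses the conventional @id.
    descriptor = by_id["ro-crate-metadata.json"]
    root = by_id[descriptor["about"]["@id"]]
    uris: set[str] = set()
    for entity in (descriptor, root):
        for ref in as_list(entity.get("conformsTo")):
            uris.add(ref["@id"])
    return uris


def conforms_to(graph: list[Entity], profile_uri: str) -> bool:
    """True if the crate declares the given profile URI (exact match)."""
    return profile_uri in declared_profiles(graph)


profiles = declared_profiles(graph)
print(sorted(profiles))
print(conforms_to(graph, PROCESS_RUN_PROFILE))
```

The check only reports what the crate declares; it does not validate that the crate actually meets the profile's additional requirements.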
Governance and Maintenance
RO-Crate is developed as a community effort, with contributions from the University of Technology Sydney, The University of Manchester, and a broad international community of research infrastructure developers. The specification is maintained openly on GitHub. The stable version described here is 1.1, identified by the context URI https://w3id.org/ro/crate/1.1/context.
Notable Implementations
RO-Crate has been adopted by a growing number of research platforms and communities:
- WorkflowHub -- European workflow registry that imports and exports Workflow RO-Crates
- Language Data Commons of Australia (LDaCA) -- Uses RO-Crate for language data interchange and archiving
- Open Microscopy Environment (OME) -- Uses RO-Crate for data transfer between OMERO servers
- M@TE (Model Atlas of the Earth) -- Packages numerical Earth system models with RO-Crate metadata
- HUN-REN ARP (AROMA) -- Extends the Dataverse repository software with RO-Crate metadata editing
- nf-prov Nextflow plugin -- Generates Workflow Run RO-Crate provenance from Nextflow pipeline runs
Related Standards
- Schema.org -- The vocabulary foundation for RO-Crate metadata annotations
- JSON-LD -- The serialization format used for the RO-Crate metadata file
- DCAT -- Data Catalog Vocabulary, complementary for dataset discovery and cataloging