PROV: Provenance Data Model and Supporting Definitions

PROV

PROV is a W3C family of twelve documents that define a data model, serializations, and supporting definitions for the interoperable interchange of provenance information in heterogeneous environments such as the Web. At its core is PROV-DM, a conceptual data model describing the entities, activities, and agents involved in producing data or things. PROV-O expresses that model as an OWL2 ontology, while PROV-N and PROV-XML offer alternative serializations. Four of the twelve documents are W3C Recommendations (PROV-DM, PROV-O, PROV-N, PROV-CONSTRAINTS); the remaining eight were published as Working Group Notes.

Overview

The PROV family of specifications is the W3C's standardized framework for representing, exchanging, and querying provenance information across the Web and other distributed systems. Provenance -- information about the entities, activities, and agents involved in producing a piece of data -- is essential for assessing the quality, reliability, and trustworthiness of digital resources. PROV provides a vendor-neutral, interoperable foundation for recording that lineage.

Background

Work on PROV grew out of the W3C Provenance Incubator Group, which between 2009 and 2010 cataloged use cases, elicited requirements, and surveyed existing provenance literature. The Incubator Group identified eight broad recommendations for a provenance framework, covering core concepts of object identification, attribution, processing steps, versioning, reproducibility, and derivation. The Provenance Working Group was chartered in 2011 to develop specifications meeting those requirements. The resulting twelve documents were published on 30 April 2013, with four achieving W3C Recommendation status.

Purpose and Scope

PROV is designed for any scenario where it is important to record how data was produced, who was responsible, and what processes were involved. Use cases range from scientific reproducibility and audit trails to content attribution and regulatory compliance. The standard is intentionally domain-neutral: its core vocabulary of Entities, Activities, and Agents can be extended to serve fields as diverse as genomics, geospatial analysis, journalism, and government open data.

Document Family

The twelve PROV documents are organized by audience:

Document          Type            Audience    Purpose
PROV-Overview     Note            Users       Roadmap to the family of documents
PROV-Primer       Note            Users       Introduction to the data model
PROV-DM           Recommendation  Advanced    Conceptual data model with UML diagrams
PROV-O            Recommendation  Developers  OWL2 ontology for Linked Data
PROV-N            Recommendation  Advanced    Human-readable provenance notation
PROV-CONSTRAINTS  Recommendation  Advanced    Validity constraints for provenance
PROV-XML          Note            Developers  XML Schema serialization
PROV-AQ           Note            Developers  Mechanisms for accessing provenance
PROV-DC           Note            Developers  Mapping to Dublin Core Terms
PROV-DICTIONARY   Note            Developers  Provenance of dictionary data structures
PROV-SEM          Note            Advanced    First-order logic semantics
PROV-LINKS        Note            Advanced    Linking across provenance bundles

Key Concepts

PROV-DM defines three core types:

  • Entity -- a physical, digital, or conceptual thing with fixed aspects.
  • Activity -- something that occurs over a period of time and acts upon or with entities.
  • Agent -- something that bears responsibility for an activity or the existence of an entity.

Relations between these types express derivation ("was derived from"), usage ("used"), generation ("was generated by"), attribution ("was attributed to"), association ("was associated with"), delegation ("acted on behalf of"), and influence ("was influenced by").
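
To make the three core types and their relations concrete, the following minimal Python sketch records a small provenance trace and prints it in PROV-N style. The class ProvSketch, its methods, the ex: prefix, and the example.org namespace are hypothetical names introduced here for illustration; they do not come from the PROV documents or from any library. The statement syntax follows PROV-N as commonly shown in examples and should be checked against the PROV-N Recommendation before being relied on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvSketch:
    """Hypothetical in-memory collector of PROV-N style statements."""
    prefix: str = "ex"                         # assumed example prefix
    namespace: str = "http://example.org/"     # assumed example namespace
    statements: List[str] = field(default_factory=list)

    def _qualify(self, local_name: str) -> str:
        # Turn a local name into a prefixed name such as ex:report.
        return f"{self.prefix}:{local_name}"

    def declare(self, kind: str, name: str) -> None:
        # kind must be one of the three core PROV-DM types.
        if kind not in ("entity", "activity", "agent"):
            raise ValueError(f"unknown PROV type: {kind}")
        self.statements.append(f"{kind}({self._qualify(name)})")

    def relate(self, relation: str, subject: str, obj: str,
               marker: bool = False) -> None:
        # marker=True appends "-" for an optional argument left unspecified.
        args = [self._qualify(subject), self._qualify(obj)]
        if marker:
            args.append("-")
        self.statements.append(f"{relation}({', '.join(args)})")

    def to_provn(self) -> str:
        lines = ["document", f"  prefix {self.prefix} <{self.namespace}>"]
        lines += [f"  {s}" for s in self.statements]
        lines.append("endDocument")
        return "\n".join(lines)


doc = ProvSketch()
doc.declare("entity", "dataset")
doc.declare("entity", "report")
doc.declare("activity", "analysis")
doc.declare("agent", "alice")
doc.relate("used", "analysis", "dataset", marker=True)
doc.relate("wasGeneratedBy", "report", "analysis", marker=True)
doc.relate("wasDerivedFrom", "report", "dataset")
doc.relate("wasAssociatedWith", "analysis", "alice", marker=True)
doc.relate("wasAttributedTo", "report", "alice")

provn = doc.to_provn()
print(provn)
```

The trace reads as: the analysis activity used a dataset and generated a report, the report was derived from the dataset, and alice was both associated with the activity and credited with the report.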

Serializations and Technical Formats

All PROV terms share a single namespace: http://www.w3.org/ns/prov# (prefix prov:). The standard supports multiple serializations:

  • PROV-O -- OWL2 ontology, usable with any RDF serialization (Turtle, RDF/XML, JSON-LD, N-Triples)
  • PROV-XML -- Native XML Schema for non-RDF environments
  • PROV-N -- Human-readable notation used in examples and constraint definitions
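
As an illustration of how the shared namespace is used with an RDF serialization, the sketch below builds a few PROV-O statements as N-Triples using only the Python standard library. The example.org resources and the helper function triple are assumptions made for this example; the class and property names (prov:Entity, prov:Activity, prov:Agent, prov:wasGeneratedBy, prov:wasAssociatedWith, prov:wasAttributedTo) are PROV-O terms. A real application would more likely use an RDF library than string formatting.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the example resources


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples line whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


triples = [
    # Type assertions for the three core PROV classes.
    triple(EX + "report", RDF_TYPE, PROV + "Entity"),
    triple(EX + "analysis", RDF_TYPE, PROV + "Activity"),
    triple(EX + "alice", RDF_TYPE, PROV + "Agent"),
    # Relations between them, all drawn from the prov: namespace.
    triple(EX + "report", PROV + "wasGeneratedBy", EX + "analysis"),
    triple(EX + "analysis", PROV + "wasAssociatedWith", EX + "alice"),
    triple(EX + "report", PROV + "wasAttributedTo", EX + "alice"),
]

ntriples = "\n".join(triples)
print(ntriples)
```

Because every term resolves to a full IRI under http://www.w3.org/ns/prov#, the same statements can be re-expressed in Turtle, RDF/XML, or JSON-LD without changing their meaning.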

Governance and Maintenance

PROV was produced by the W3C Provenance Working Group, which concluded its work in 2013. The four Recommendations (PROV-DM, PROV-O, PROV-N, PROV-CONSTRAINTS) are maintained under the W3C Process, with errata tracked on a dedicated errata page. While the Working Group is no longer active, the specifications remain stable W3C standards and are widely referenced.

Notable Implementations

PROV has been adopted across a broad range of systems and communities. Scientific workflow platforms such as Apache Taverna and VisTrails generate PROV traces. The ProvStore service provides a repository and API for PROV documents. Government open data initiatives, digital humanities projects, and life sciences pipelines use PROV to document data lineage. The W3C's own implementation report catalogs dozens of software implementations, datasets using PROV, and extensions built on top of the standard.

Related Standards

  • Dublin Core -- PROV-DC defines a formal mapping between PROV-O and Dublin Core Terms, allowing provenance to be linked with standard resource descriptions.
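
A rough sketch of how such a mapping might be applied is shown below. The dictionary covers only a small subset of terms, recalled here as sub-property relationships (for example, dct:creator to prov:wasAttributedTo and dct:source to prov:wasDerivedFrom); treat it as an assumption and verify each entry against the PROV-DC Note. The function infer_prov and the example.org resources are hypothetical.

```python
PROV = "http://www.w3.org/ns/prov#"
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace for the example resources

# Assumed subset of the PROV-DC direct mappings; check against the Note.
DC_TO_PROV = {
    DCT + "creator": PROV + "wasAttributedTo",
    DCT + "publisher": PROV + "wasAttributedTo",
    DCT + "contributor": PROV + "wasAttributedTo",
    DCT + "source": PROV + "wasDerivedFrom",
}


def infer_prov(triples):
    """Return the PROV triples implied by Dublin Core triples under the mapping.

    Each triple is a (subject, predicate, object) tuple of IRIs. Triples whose
    predicate is not in the mapping are ignored.
    """
    return [(s, DC_TO_PROV[p], o) for (s, p, o) in triples if p in DC_TO_PROV]


dc_triples = [
    (EX + "report", DCT + "creator", EX + "alice"),
    (EX + "report", DCT + "source", EX + "dataset"),
    (EX + "report", DCT + "language", EX + "en"),  # no provenance mapping here
]

prov_triples = infer_prov(dc_triples)
for s, p, o in prov_triples:
    print(s, p, o)
```

The original Dublin Core statements are left untouched; the inferred PROV statements are additional triples that sit alongside them.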

Further Reading