The PROV family of specifications is the W3C's standardized framework for representing, exchanging, and querying provenance information across the Web and other distributed systems. Provenance -- information about the entities, activities, and agents involved in producing a piece of data -- is essential for assessing the quality, reliability, and trustworthiness of digital resources. PROV provides a vendor-neutral, interoperable foundation for recording that lineage.
Background
Work on PROV grew out of the W3C Provenance Incubator Group, which between 2009 and 2010 cataloged use cases, elicited requirements, and surveyed existing provenance literature. The Incubator Group identified eight broad recommendations for a provenance framework, covering core concepts such as object identification, attribution, processing steps, versioning, reproducibility, and derivation. The Provenance Working Group was chartered in 2011 to develop specifications meeting those requirements. The resulting twelve documents were published on 30 April 2013: four as W3C Recommendations and eight as Working Group Notes.
Purpose and Scope
PROV is designed for any scenario where it is important to record how data was produced, who was responsible, and what processes were involved. Use cases range from scientific reproducibility and audit trails to content attribution and regulatory compliance. The standard is intentionally domain-neutral: its core vocabulary of Entities, Activities, and Agents can be extended to serve fields as diverse as genomics, geospatial analysis, journalism, and government open data.
Document Family
The twelve PROV documents are organized by audience:
| Document | Type | Audience | Purpose |
|---|---|---|---|
| PROV-Overview | Note | Users | Roadmap to the family of documents |
| PROV-Primer | Note | Users | Introduction to the data model |
| PROV-DM | Recommendation | Advanced | Conceptual data model with UML diagrams |
| PROV-O | Recommendation | Developers | OWL2 ontology for Linked Data |
| PROV-N | Recommendation | Advanced | Human-readable provenance notation |
| PROV-CONSTRAINTS | Recommendation | Advanced | Validity constraints for provenance |
| PROV-XML | Note | Developers | XML Schema serialization |
| PROV-AQ | Note | Developers | Mechanisms for accessing provenance |
| PROV-DC | Note | Developers | Mapping to Dublin Core Terms |
| PROV-DICTIONARY | Note | Developers | Provenance of dictionary data structures |
| PROV-SEM | Note | Advanced | First-order logic semantics |
| PROV-LINKS | Note | Advanced | Linking across provenance bundles |
Key Concepts
PROV-DM defines three core types:
- Entity -- a physical, digital, or conceptual thing with fixed aspects.
- Activity -- something that occurs over a period of time and acts upon or with entities.
- Agent -- something that bears responsibility for an activity or the existence of an entity.
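Relations between these types express derivation (wasDerivedFrom, entity to entity), usage (used, activity to entity), generation (wasGeneratedBy, entity to activity), attribution (wasAttributedTo, entity to agent), delegation (actedOnBehalfOf, agent to agent), and influence (wasInfluencedBy), the general relation of which the others are specializations.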
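To make the core types and relations concrete, the following is a minimal sketch that stores elements and relations in plain Python structures and renders them as PROV-N-style statements. It is not part of any PROV specification or library: the `ex:` identifiers, the `to_provn` helper, and the `SIGNATURES` table are illustrative assumptions, and the output is not validated against the full PROV-N grammar.

```python
# Expected (subject kind, object kind) for a few PROV relations.
SIGNATURES = {
    "wasDerivedFrom": ("entity", "entity"),
    "used": ("activity", "entity"),
    "wasGeneratedBy": ("entity", "activity"),
    "wasAttributedTo": ("entity", "agent"),
    "actedOnBehalfOf": ("agent", "agent"),
}

# Relations whose PROV-N form carries an optional time, written "-" when absent.
TIMED = {"used", "wasGeneratedBy"}


def to_provn(elements, relations):
    """Render elements and relations as PROV-N-style statements.

    elements:  dict mapping identifier -> kind ("entity", "activity", "agent")
    relations: list of (relation name, subject id, object id) tuples
    """
    lines = [f"{kind}({ident})" for ident, kind in elements.items()]
    for name, subj, obj in relations:
        expected = SIGNATURES[name]
        actual = (elements[subj], elements[obj])
        if actual != expected:
            raise ValueError(f"{name} expects {expected}, got {actual}")
        args = [subj, obj] + (["-"] if name in TIMED else [])
        lines.append(f"{name}({', '.join(args)})")
    return lines


# Hypothetical scenario: a report compiled from a dataset by Alice for her lab.
elements = {
    "ex:dataset": "entity",
    "ex:report": "entity",
    "ex:compile": "activity",
    "ex:alice": "agent",
    "ex:lab": "agent",
}
relations = [
    ("used", "ex:compile", "ex:dataset"),
    ("wasGeneratedBy", "ex:report", "ex:compile"),
    ("wasDerivedFrom", "ex:report", "ex:dataset"),
    ("wasAttributedTo", "ex:report", "ex:alice"),
    ("actedOnBehalfOf", "ex:alice", "ex:lab"),
]

statements = to_provn(elements, relations)
print("\n".join(statements))
```

Checking each relation against the kinds of its endpoints catches simple modeling mistakes early, such as an entity "using" another entity; the full set of validity rules is defined in PROV-CONSTRAINTS and is not reproduced here.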
Serializations and Technical Formats
All PROV terms share a single namespace: http://www.w3.org/ns/prov# (prefix prov:). The standard supports multiple serializations:
- PROV-O -- OWL2 ontology, usable with any RDF serialization (Turtle, RDF/XML, JSON-LD, N-Triples)
- PROV-XML -- Native XML Schema for non-RDF environments
- PROV-N -- Human-readable notation used in examples and constraint definitions
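As an illustration of the shared namespace, the sketch below builds a few PROV-O statements as N-Triples using only string formatting. The `http://example.org/` namespace and the resource names are hypothetical, and the `triple` helper is an assumption made for this example; a real application would more likely use an RDF library.

```python
PROV = "http://www.w3.org/ns/prov#"  # the shared PROV namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the example's resources


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


triples = [
    # Declare one instance of each core type.
    triple(EX + "report", RDF_TYPE, PROV + "Entity"),
    triple(EX + "compile", RDF_TYPE, PROV + "Activity"),
    triple(EX + "alice", RDF_TYPE, PROV + "Agent"),
    # Relate them with PROV-O properties.
    triple(EX + "report", PROV + "wasGeneratedBy", EX + "compile"),
    triple(EX + "report", PROV + "wasAttributedTo", EX + "alice"),
]
print("\n".join(triples))
```

Because every PROV term expands from the same namespace IRI, the same statements could be written in Turtle or JSON-LD by declaring the prov: prefix once and using names such as prov:wasGeneratedBy.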
Governance and Maintenance
PROV was produced by the W3C Provenance Working Group, which concluded its work in 2013. The four Recommendations (PROV-DM, PROV-O, PROV-N, PROV-CONSTRAINTS) are maintained under the W3C Process, with errata tracked on a dedicated errata page. While the Working Group is no longer active, the specifications remain stable W3C standards and are widely referenced.
Notable Implementations
PROV has been adopted across a broad range of systems and communities. Scientific workflow platforms such as Apache Taverna and VisTrails generate PROV traces. The ProvStore service provides a repository and API for PROV documents. Government open data initiatives, digital humanities projects, and life sciences pipelines use PROV to document data lineage. The W3C's own implementation report catalogs dozens of software implementations, datasets using PROV, and extensions built on top of the standard.
Related Standards
- Dublin Core -- PROV-DC defines a formal mapping between PROV-O and Dublin Core Terms, allowing provenance to be linked with standard resource descriptions.
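A mapping of this kind can be pictured as a lookup from Dublin Core properties to PROV-O properties. The sketch below assumes a small subset of the direct mappings (creator, contributor, and publisher to attribution; source to derivation); the table and the `dc_to_prov` helper are illustrative, and the PROV-DC note should be consulted for the authoritative and complete list.

```python
# Assumed subset of PROV-DC direct mappings (Dublin Core term -> PROV-O property).
# Verify against the PROV-DC note before relying on it.
DC_TO_PROV = {
    "dct:creator": "prov:wasAttributedTo",
    "dct:contributor": "prov:wasAttributedTo",
    "dct:publisher": "prov:wasAttributedTo",
    "dct:source": "prov:wasDerivedFrom",
}


def dc_to_prov(statements):
    """Return the PROV statements implied by Dublin Core statements.

    statements: iterable of (subject, dct property, object) tuples.
    Statements whose property has no entry in DC_TO_PROV are skipped.
    """
    return [
        (subj, DC_TO_PROV[prop], obj)
        for subj, prop, obj in statements
        if prop in DC_TO_PROV
    ]


# Hypothetical Dublin Core description of a report.
dc_statements = [
    ("ex:report", "dct:creator", "ex:alice"),
    ("ex:report", "dct:source", "ex:dataset"),
    ("ex:report", "dct:title", "Annual Report"),  # no mapping in this subset
]

prov_statements = dc_to_prov(dc_statements)
for statement in prov_statements:
    print(statement)
```

The original Dublin Core statements are left untouched; the PROV statements are additional facts derived from them, which keeps existing metadata consumers working.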
Further Reading
- PROV-Overview -- the roadmap document for the entire family
- PROV Model Primer -- the recommended starting point for new users
- PROV FAQ -- answers to common questions