Data Package is a lightweight standard for packaging and describing datasets, designed to make data easier to find, share, and use across tools and platforms. Developed by the Open Knowledge Foundation under the Frictionless Data initiative, it provides a minimal but extensible set of specifications that follow the FAIR principles. The v2.0 release in June 2024 marked a significant maturation of the standard, consolidating years of community feedback and real-world deployment.
Background
The Frictionless Data project originated around 2013 at the Open Knowledge Foundation, motivated by the observation that much of the friction in working with data comes not from analysis but from the mundane tasks of finding, fetching, understanding, and cleaning data. The Data Package concept was designed as a "container" that could travel alongside data, providing just enough metadata to make the data immediately usable by software without manual intervention. The standard evolved through extensive community use in open data portals, scientific research, and government data publishing before reaching its v2.0 milestone.
Purpose & Scope
The Data Package standard comprises four interlocking specifications:
- Data Package -- a container format describing a coherent collection of data (a dataset), including its contributors, licenses, and other metadata
- Data Resource -- a format for describing an individual data file, including its name, format, path, and media type
- Table Dialect -- a description of how a tabular data file is structured, covering delimiters, header rows, escape characters, quoting, and line terminators
- Table Schema -- a format for describing tabular data columns, including field names, data types, constraints, missing value sentinels, and foreign key relationships
Together, these specifications allow a dataset to be fully self-describing. A datapackage.json file at the root of a dataset directory contains all the metadata needed for software to read, validate, and process the data without guesswork.
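As a rough illustration of how the four specifications nest, the sketch below builds a small descriptor as a Python dictionary and serializes it to JSON. The dataset name, file name, field names, and license are all hypothetical, and only a handful of the available properties are shown.

```python
import json

# Hypothetical dataset: one semicolon-delimited CSV file of cities.
# Every name and value below is illustrative, not taken from a real package.
descriptor = {
    # Data Package level: metadata about the dataset as a whole
    "name": "example-cities",
    "licenses": [{"name": "CC0-1.0"}],
    "resources": [
        {
            # Data Resource level: one entry per data file
            "name": "cities",
            "path": "cities.csv",
            "format": "csv",
            "mediatype": "text/csv",
            # Table Dialect: how the file is laid out
            "dialect": {"delimiter": ";"},
            # Table Schema: what the columns mean
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer", "constraints": {"required": True}},
                    {"name": "city", "type": "string"},
                    {"name": "population", "type": "integer"},
                ],
                "missingValues": ["", "NA"],
                "primaryKey": ["id"],
            },
        }
    ],
}

# This text is what would be saved as datapackage.json at the dataset root.
text = json.dumps(descriptor, indent=2)
print(text)
```

The nesting mirrors the list above: the package holds resources, and each tabular resource carries its own dialect and schema.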
Key Descriptor Properties
| Property | Specification | Description |
|---|---|---|
| name | Data Package | A short identifier for the package |
| resources | Data Package | An array of Data Resource descriptors |
| licenses | Data Package | Licensing information |
| path | Data Resource | Location of the data file |
| schema | Data Resource | A Table Schema for the resource, given inline or by reference |
| delimiter | Table Dialect | Field separator character |
| fields | Table Schema | Array of field descriptors with names and types |
| primaryKey | Table Schema | Field(s) forming the primary key |
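The properties in the table lend themselves to simple structural checks. The following is a minimal sketch of such a check, not the official validator: the function name `check_descriptor` and its messages are hypothetical, it assumes a resource may supply inline `data` instead of a `path`, and it covers only a small part of what the specifications require.

```python
def check_descriptor(descriptor):
    """Return a list of problems; an empty list means the basic shape looks right."""
    problems = []
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        problems.append("resources must be a non-empty array")
        return problems

    for i, resource in enumerate(resources):
        if "name" not in resource:
            problems.append(f"resources[{i}] has no name")
        # Assumption: a resource points at a file (path) or embeds its data inline.
        if "path" not in resource and "data" not in resource:
            problems.append(f"resources[{i}] has neither path nor data")

        schema = resource.get("schema")
        if isinstance(schema, dict):  # a string would be a reference; not followed here
            field_names = {field.get("name") for field in schema.get("fields", [])}
            primary_key = schema.get("primaryKey", [])
            if isinstance(primary_key, str):  # tolerate a single field name
                primary_key = [primary_key]
            for key in primary_key:
                if key not in field_names:
                    problems.append(
                        f"resources[{i}] primaryKey refers to unknown field '{key}'"
                    )
    return problems


# Usage with a deliberately inconsistent, made-up descriptor.
broken = {
    "name": "example",
    "resources": [
        {
            "name": "t",
            "path": "t.csv",
            "schema": {"fields": [{"name": "a", "type": "string"}], "primaryKey": ["b"]},
        }
    ],
}
print(check_descriptor(broken))
```

A real validator would also check types, constraints, and foreign keys against the published specification.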
Serializations & Technical Formats
Data Package metadata is serialized as JSON. The descriptor file is conventionally named datapackage.json. The standard does not prescribe the format of the data files themselves, though CSV is the most common for tabular data and has the deepest tooling support.
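To show how JSON metadata can drive the reading of a CSV file, here is a small sketch using only the Python standard library. It applies the dialect's delimiter, converts cells using the schema's field types, and maps missing value sentinels to `None`. The resource fragment, the sample data, and the `read_rows` helper are hypothetical, and only three field types are handled; the Frictionless tooling covers far more.

```python
import csv
import io

# Hypothetical resource descriptor fragment and CSV content, for illustration only.
resource = {
    "name": "cities",
    "dialect": {"delimiter": ";"},
    "schema": {
        "fields": [
            {"name": "id", "type": "integer"},
            {"name": "city", "type": "string"},
            {"name": "population", "type": "integer"},
        ],
        "missingValues": ["", "NA"],
    },
}
raw = "id;city;population\n1;Lisbon;545000\n2;Atlantis;NA\n"

# Only a subset of Table Schema types is supported in this sketch.
CASTERS = {"integer": int, "number": float, "string": str}


def read_rows(resource, text):
    """Parse CSV text according to the resource's dialect and schema."""
    dialect = resource.get("dialect", {})
    schema = resource["schema"]
    missing = set(schema.get("missingValues", [""]))
    reader = csv.DictReader(io.StringIO(text), delimiter=dialect.get("delimiter", ","))
    rows = []
    for record in reader:
        row = {}
        for field in schema["fields"]:
            value = record[field["name"]]
            if value in missing:
                row[field["name"]] = None  # sentinel becomes a null value
            else:
                row[field["name"]] = CASTERS[field["type"]](value)
        rows.append(row)
    return rows


rows = read_rows(resource, raw)
print(rows)
```

Because the delimiter, types, and sentinels all come from the descriptor, the same function can read a differently structured file without code changes, as long as its descriptor is accurate.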
Governance & Maintenance
The standard is maintained by the Open Knowledge Foundation. Development is conducted openly on GitHub, with community input through issues and pull requests. The v2.0 specification was released on June 26, 2024, after an extended development and review period. An online validator is provided for checking conformance.
Notable Implementations
- CKAN -- the widely deployed open data portal software supports Data Package import and export
- Open Data Editor -- a no-code visual tool for creating and editing Data Packages
- Frictionless Framework and companion libraries -- implementations in Python, JavaScript, R, and other languages
- data.gov and similar portals -- government open data platforms use Data Package for dataset distribution
- Zenodo and Figshare -- research data repositories support Data Package descriptors
Related Standards
Data Package complements DCAT (Data Catalog Vocabulary), which operates at the catalog level rather than describing the internals of a dataset. Schema.org provides general-purpose dataset discovery metadata and can wrap Data Package descriptors. The standard also interoperates with CSV on the Web (CSVW) for detailed tabular data description.