
Data Package

A standard consisting of a set of simple yet extensible specifications to describe datasets, data files, and tabular data. Developed by the Open Knowledge Foundation as part of the Frictionless Data project, Data Package acts as a data definition language and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data. The v2.0 standard, released in June 2024, comprises four sub-specifications: Data Package (dataset containers), Data Resource (individual files), Table Dialect (CSV dialect description), and Table Schema (column types and constraints).

Overview

Data Package is a lightweight standard for packaging and describing datasets, designed to make data easier to find, share, and use across tools and platforms. Developed by the Open Knowledge Foundation under the Frictionless Data initiative, it provides a minimal but extensible set of specifications that support the FAIR principles. The v2.0 release in June 2024 consolidated years of community feedback and real-world deployment into a more mature standard.

Background

The Frictionless Data project originated around 2013 at the Open Knowledge Foundation, motivated by the observation that much of the friction in working with data comes not from analysis but from the mundane tasks of finding, fetching, understanding, and cleaning data. The Data Package concept was designed as a "container" that could travel alongside data, providing just enough metadata to make the data immediately usable by software without manual intervention. The standard evolved through extensive community use in open data portals, scientific research, and government data publishing before reaching its v2.0 milestone.

Purpose & Scope

The Data Package standard comprises four interlocking specifications:

  • Data Package -- a container format describing a coherent collection of data (a dataset), including its contributors, licenses, and other metadata
  • Data Resource -- a format for describing an individual data file, including its name, format, path, and media type
  • Table Dialect -- a description of how a tabular data file is structured, covering delimiters, header rows, escape characters, quoting, and line terminators
  • Table Schema -- a format for describing tabular data columns, including field names, data types, constraints, missing value sentinels, and foreign key relationships

Together, these specifications allow a dataset to be fully self-describing. A datapackage.json file at the root of a dataset directory contains all the metadata needed for software to read, validate, and process the data without guesswork.
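As an illustration of what "self-describing" means in practice, here is a minimal sketch of a datapackage.json descriptor built as a Python dict, with one resource that carries an inline Table Dialect and Table Schema. The package name, resource name, file path, and field names are hypothetical examples, as are the helpers serialize and write_descriptor; only the standard-library json and os modules are used.

```python
import json
import os

# A minimal descriptor combining all four specifications. Every name, path,
# and field below is a hypothetical example.
descriptor = {
    "name": "city-temperatures",
    "title": "Daily temperature readings",
    "licenses": [
        {"name": "CC-BY-4.0", "path": "https://creativecommons.org/licenses/by/4.0/"}
    ],
    "resources": [  # Data Package: the list of resources is required
        {
            # Data Resource: where the file is and what kind of file it is
            "name": "readings",
            "path": "data/readings.csv",
            "format": "csv",
            "mediatype": "text/csv",
            # Table Dialect: how the CSV text is laid out
            "dialect": {"delimiter": ",", "header": True},
            # Table Schema: what each column contains
            "schema": {
                "fields": [
                    {"name": "station_id", "type": "string"},
                    {"name": "date", "type": "date"},
                    {"name": "temp_c", "type": "number"},
                ],
                "primaryKey": ["station_id", "date"],
            },
        }
    ],
}


def serialize(descriptor: dict) -> str:
    """Render the descriptor as pretty-printed JSON (hypothetical helper)."""
    return json.dumps(descriptor, indent=2)


def write_descriptor(descriptor: dict, directory: str = ".") -> None:
    """Write datapackage.json at the root of the dataset directory (hypothetical helper)."""
    with open(os.path.join(directory, "datapackage.json"), "w", encoding="utf-8") as f:
        f.write(serialize(descriptor) + "\n")
```

Keeping the dialect and schema inline makes the descriptor a single file; the specifications also allow a schema to be referenced by path or URL instead.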

Key Descriptor Properties

Property     Specification   Description
name         Data Package    A short, lower-case identifier for the package
resources    Data Package    An array of Data Resource descriptors (required)
licenses     Data Package    An array of license objects describing how the data may be used
path         Data Resource   Location of the data file, as a relative POSIX path or a URL
schema       Data Resource   A Table Schema, given inline or as a path or URL
delimiter    Table Dialect   Field separator character (defaults to a comma)
fields       Table Schema    Array of field descriptors with names and types
primaryKey   Table Schema    Field or fields forming the primary key
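To show how the dialect and schema properties above work together, the sketch below reads CSV text using delimiter and header from a Table Dialect, casts each value according to its field's type, maps missingValues sentinels to None, and rejects duplicate primaryKey values. read_table and CASTERS are hypothetical names, only four Table Schema types are handled, and this is not the Frictionless Framework API.

```python
import csv
import io
from datetime import date

# Casting functions for a small subset of Table Schema types (an assumption
# for illustration; the specification defines more, e.g. boolean, datetime).
CASTERS = {
    "string": str,
    "integer": int,
    "number": float,
    "date": date.fromisoformat,
}


def read_table(text: str, dialect: dict, schema: dict) -> list:
    """Hypothetical reader: parse CSV text using a Table Dialect and Table Schema."""
    delimiter = dialect.get("delimiter", ",")         # default: comma
    has_header = dialect.get("header", True)          # default: header row present
    missing = set(schema.get("missingValues", [""]))  # default: empty string
    fields = schema["fields"]
    names = [f["name"] for f in fields]

    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if has_header:
        header = next(reader)
        if header != names:
            raise ValueError(f"header {header} does not match schema {names}")

    rows = []
    for raw in reader:
        if len(raw) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(raw)}")
        row = {}
        for field, value in zip(fields, raw):
            # Fields without a type are read as plain strings in this sketch.
            cast = CASTERS[field.get("type", "string")]
            row[field["name"]] = None if value in missing else cast(value)
        rows.append(row)

    # primaryKey may be a single field name or a list of names.
    key = schema.get("primaryKey", [])
    key = [key] if isinstance(key, str) else key
    seen = set()
    for row in rows:
        values = tuple(row[name] for name in key)
        if key and values in seen:
            raise ValueError(f"duplicate primary key {values}")
        seen.add(values)
    return rows


# Usage with a semicolon-delimited file and "NA" as an extra missing-value marker.
schema = {
    "fields": [
        {"name": "station_id", "type": "string"},
        {"name": "date", "type": "date"},
        {"name": "temp_c", "type": "number"},
    ],
    "primaryKey": ["station_id", "date"],
    "missingValues": ["", "NA"],
}
text = "station_id;date;temp_c\nS1;2024-06-01;21.5\nS1;2024-06-02;NA\n"
rows = read_table(text, {"delimiter": ";"}, schema)
print(rows[1])  # {'station_id': 'S1', 'date': datetime.date(2024, 6, 2), 'temp_c': None}
```

The dialect only decides how text is split into cells; the schema then gives those cells meaning, which is why the two are separate specifications.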

Serializations & Technical Formats

Data Package metadata is serialized as JSON. The descriptor file is conventionally named datapackage.json. The standard does not prescribe the format of the data files themselves, though CSV is the most common for tabular data and has the deepest tooling support.
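Because the descriptor is plain JSON, any JSON parser can load it. The sketch below parses a datapackage.json string, runs a few structural checks (a non-empty resources array, each resource having a name and exactly one of path or data), and resolves relative resource paths against the descriptor's directory. load_descriptor and resolve_paths are hypothetical helpers; the checks are a small subset of the specification as we understand it, not a substitute for the official validation tooling.

```python
import json
import posixpath


def load_descriptor(text: str) -> dict:
    """Hypothetical loader: parse a datapackage.json and run basic structural checks."""
    descriptor = json.loads(text)
    if not isinstance(descriptor, dict):
        raise ValueError("descriptor must be a JSON object")
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        raise ValueError("'resources' must be a non-empty array")
    for i, res in enumerate(resources):
        if not isinstance(res, dict) or "name" not in res:
            raise ValueError(f"resource {i} must be an object with a 'name'")
        if ("path" in res) == ("data" in res):
            raise ValueError(f"resource {res['name']!r} needs exactly one of 'path' or 'data'")
    return descriptor


def resolve_paths(descriptor: dict, base_dir: str) -> dict:
    """Map each file-backed resource name to its locations, relative to base_dir."""
    resolved = {}
    for res in descriptor["resources"]:
        path = res.get("path")
        if path is None:  # inline "data" resources have no file to locate
            continue
        parts = path if isinstance(path, list) else [path]  # multi-part files use a list
        locations = []
        for p in parts:
            if "://" in p:
                locations.append(p)  # URLs are used as-is
                continue
            # Relative POSIX paths only: reject absolute paths and ".." segments.
            if p.startswith("/") or ".." in p.split("/"):
                raise ValueError(f"unsafe path {p!r}")
            locations.append(posixpath.join(base_dir, p))
        resolved[res["name"]] = locations
    return resolved


text = '{"name": "demo", "resources": [{"name": "readings", "path": "data/readings.csv"}]}'
descriptor = load_descriptor(text)
print(resolve_paths(descriptor, "city-temperatures"))
# {'readings': ['city-temperatures/data/readings.csv']}
```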

Governance & Maintenance

The standard is maintained by the Open Knowledge Foundation. Development is conducted openly on GitHub, with community input through issues and pull requests. The v2.0 specification was released on June 26, 2024, after an extended development and review period. An online validator is provided for checking conformance.

Notable Implementations

  • CKAN -- the widely deployed open data portal software supports Data Package import and export
  • Open Data Editor -- a no-code visual tool for creating and editing Data Packages
  • Frictionless Framework -- the Python reference implementation, with companion Frictionless libraries available for JavaScript, R, and other languages
  • Government open data portals -- some government platforms, particularly those built on CKAN, can distribute datasets as Data Packages
  • Zenodo and Figshare -- research data repositories support Data Package descriptors

Related Standards

Data Package complements DCAT (Data Catalog Vocabulary), which operates at the catalog level rather than the dataset-internal level. Schema.org provides general-purpose dataset discovery metadata, and the core fields of a Data Package descriptor can be mapped onto a Schema.org Dataset. The W3C CSV on the Web (CSVW) recommendations cover similar ground for detailed tabular data description, and many Table Schema and Table Dialect properties have close CSVW counterparts.
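To illustrate how Data Package metadata can feed Schema.org discovery markup, the sketch below maps a few core descriptor properties onto a Schema.org Dataset expressed as JSON-LD, turning each single-file resource into a DataDownload. The function to_schema_org, the base URL, and the specific mapping choices (title falling back to name, only the first license's path used as license) are our assumptions, not a published crosswalk.

```python
import json
from urllib.parse import urljoin


def to_schema_org(descriptor: dict, base_url: str) -> dict:
    """Hypothetical crosswalk from a Data Package descriptor to Schema.org Dataset JSON-LD."""
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # Prefer the human-readable title, falling back to the package name.
        "name": descriptor.get("title") or descriptor.get("name"),
    }
    for key in ("description", "keywords", "version"):
        if key in descriptor:
            dataset[key] = descriptor[key]
    licenses = descriptor.get("licenses", [])
    if licenses and "path" in licenses[0]:
        dataset["license"] = licenses[0]["path"]  # only the first license is mapped
    distribution = []
    for res in descriptor.get("resources", []):
        path = res.get("path")
        if not isinstance(path, str):
            continue  # skip inline data and multi-part resources in this sketch
        download = {
            "@type": "DataDownload",
            "name": res["name"],
            "contentUrl": urljoin(base_url, path),  # relative paths become absolute URLs
        }
        if "mediatype" in res:
            download["encodingFormat"] = res["mediatype"]
        distribution.append(download)
    if distribution:
        dataset["distribution"] = distribution
    return dataset


descriptor = {
    "name": "city-temperatures",
    "title": "Daily temperature readings",
    "keywords": ["weather"],
    "licenses": [{"name": "CC-BY-4.0", "path": "https://creativecommons.org/licenses/by/4.0/"}],
    "resources": [{"name": "readings", "path": "data/readings.csv", "mediatype": "text/csv"}],
}
print(json.dumps(to_schema_org(descriptor, "https://example.org/city-temperatures/"), indent=2))
```

The JSON-LD output can be embedded in a dataset landing page for search engines, while the original datapackage.json remains the detailed, machine-actionable description.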

Further Reading