Data Catalog Vocabulary

DCAT

DCAT is an RDF vocabulary, standardized as a W3C Recommendation, designed to facilitate interoperability between data catalogs published on the Web. It lets publishers describe datasets, data services, and dataset series in catalogs using a standard model, which supports federated search and metadata aggregation across multiple catalogs. DCAT 3, the current version, was published in August 2024 and adds support for dataset series, versioning, and inverse properties while maintaining backward compatibility with DCAT 2. The vocabulary is built around seven main classes: Catalog, Resource, Dataset, Distribution, DataService, DatasetSeries, and CatalogRecord.

Overview

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. The current version, DCAT 3, became a W3C Recommendation on 22 August 2024. It enables publishers to describe datasets and data services in catalogs using a standard model that supports federated search, metadata aggregation, and dataset discovery across organizational boundaries. DCAT underpins much of the world's open data infrastructure, particularly the European Union's open data portal ecosystem.

Background

The original DCAT vocabulary was developed at the Digital Enterprise Research Institute (DERI), then refined by the W3C eGov Interest Group, and standardized as DCAT 1 in January 2014 by the Government Linked Data (GLD) Working Group. It grew out of work on government data catalogs such as data.gov and data.gov.uk. DCAT 2, published in February 2020 by the Dataset Exchange Working Group, addressed shortcomings identified through community experience -- notably adding a class for data services and improving support for identifiers, quality information, and citation. DCAT 3, the current version, addresses further use cases including dataset series, versioning, and inverse properties.

Purpose & Scope

DCAT provides RDF classes and properties to allow datasets and data services to be described and included in a catalog. Use of a standard model facilitates:

  • Increased discoverability of datasets and data services
  • Federated search for datasets across catalogs at multiple sites
  • Aggregation of metadata from multiple catalogs, which can also serve as a manifest in digital preservation workflows

DCAT makes no assumptions about the serialization formats of the data being described. It distinguishes between the abstract dataset and its different distributions, accommodating data in any format from spreadsheets and XML to RDF and specialized scientific formats.
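The split between an abstract dataset and its distributions can be illustrated with a handful of triples. The following is a minimal sketch in plain Python with no RDF library; every example.org URI and the dataset title are hypothetical placeholders, and the helper names `to_ntriples` and `distributions_of` are invented for illustration. The DCAT and Dublin Core terms used (dcat:distribution, dcat:downloadURL, dcat:mediaType, dcterms:title) are real vocabulary terms.

```python
# Minimal sketch: one abstract dataset with two distributions, as plain triples.
# All example.org URIs and the title are hypothetical placeholders.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
IANA = "https://www.iana.org/assignments/media-types/"

dataset = "https://example.org/dataset/air-quality"
csv_dist = "https://example.org/dataset/air-quality/csv"
json_dist = "https://example.org/dataset/air-quality/json"

triples = [
    # The dataset itself carries the descriptive metadata.
    (dataset, RDF_TYPE, DCAT + "Dataset"),
    (dataset, DCT + "title", '"Air quality measurements"'),
    (dataset, DCAT + "distribution", csv_dist),
    (dataset, DCAT + "distribution", json_dist),
    # Each distribution describes one concrete, accessible form.
    (csv_dist, RDF_TYPE, DCAT + "Distribution"),
    (csv_dist, DCAT + "downloadURL", "https://example.org/files/air-quality.csv"),
    (csv_dist, DCAT + "mediaType", IANA + "text/csv"),
    (json_dist, RDF_TYPE, DCAT + "Distribution"),
    (json_dist, DCAT + "downloadURL", "https://example.org/files/air-quality.json"),
    (json_dist, DCAT + "mediaType", IANA + "application/json"),
]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines; quoted objects are literals."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def distributions_of(dataset_uri, triples):
    """Return the distribution URIs linked from a dataset."""
    return [o for s, p, o in triples
            if s == dataset_uri and p == DCAT + "distribution"]


print(to_ntriples(triples))
```

The dataset appears once, however many formats it is offered in; adding a format means adding a distribution rather than duplicating the dataset description.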

Key Classes

DCAT 3 is organized around seven main classes:

Class               Description
dcat:Catalog        A dataset in which each item is a metadata record describing some resource
dcat:Resource       Parent class of Dataset, DataService, and Catalog (not used directly)
dcat:Dataset        A collection of data published or curated by a single agent
dcat:Distribution   An accessible form of a dataset such as a downloadable file
dcat:DataService    A collection of operations (API) providing access to datasets
dcat:DatasetSeries  A collection of separately published datasets sharing common characteristics
dcat:CatalogRecord  A metadata record describing the registration of a resource in a catalog
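The relationships among these classes can be sketched as a small lookup table. The subclass links below reflect the DCAT specification as best understood here (dcat:Catalog and dcat:DatasetSeries specialize dcat:Dataset, which in turn specializes dcat:Resource); the dictionary layout and the helper names `ancestors` and `is_resource` are illustrative assumptions, not part of any library.

```python
# Sketch of the subclass relations among the seven main DCAT 3 classes.
# The mapping is hand-written for illustration, not loaded from the ontology.
SUBCLASS_OF = {
    "dcat:Dataset": "dcat:Resource",
    "dcat:DataService": "dcat:Resource",
    "dcat:Catalog": "dcat:Dataset",
    "dcat:DatasetSeries": "dcat:Dataset",
    # These three have no DCAT parent among the main classes.
    "dcat:Resource": None,
    "dcat:Distribution": None,
    "dcat:CatalogRecord": None,
}


def ancestors(cls):
    """Return the chain of parent classes, nearest first."""
    chain = []
    parent = SUBCLASS_OF.get(cls)
    while parent is not None:
        chain.append(parent)
        parent = SUBCLASS_OF.get(parent)
    return chain


def is_resource(cls):
    """True if cls is dcat:Resource or one of its descendants."""
    return cls == "dcat:Resource" or "dcat:Resource" in ancestors(cls)


for name in SUBCLASS_OF:
    print(name, "->", ancestors(name) or "(no parent)")
```

Under this reading, a distribution or catalog record is not itself a catalogued resource; it describes a form of, or the registration of, something that is.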

Versioning & Compatibility

DCAT 3 supersedes DCAT 2 but does not make it obsolete. It maintains backward compatibility -- existing DCAT 2 deployments that do not use DCAT 3 features (versioning, dataset series, inverse properties) remain conformant without changes. Key additions in DCAT 3 include the spdx:checksum property, versioning properties (dcat:version, dcat:previousVersion, dcat:hasCurrentVersion), and the dcat:DatasetSeries class.
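How the three versioning properties fit together can be sketched with a short version chain. This is a rough illustration in plain Python: the example.org URIs and version strings are hypothetical, and the functions `history` and `current_version` are invented helpers that assume a simple linear chain with no branches.

```python
# Sketch of a linear version chain; URIs and version strings are hypothetical.
BASE = "https://example.org/dataset/budget"

# Each version records its dcat:version literal and its dcat:previousVersion link.
versions = {
    f"{BASE}/v1": {"dcat:version": "1.0", "dcat:previousVersion": None},
    f"{BASE}/v2": {"dcat:version": "2.0", "dcat:previousVersion": f"{BASE}/v1"},
    f"{BASE}/v3": {"dcat:version": "3.0", "dcat:previousVersion": f"{BASE}/v2"},
}


def history(uri, versions):
    """Follow dcat:previousVersion links from uri back to the first version."""
    chain = []
    while uri is not None:
        chain.append(versions[uri]["dcat:version"])
        uri = versions[uri]["dcat:previousVersion"]
    return chain


def current_version(versions):
    """Return the version that no other version names as its predecessor."""
    predecessors = {v["dcat:previousVersion"] for v in versions.values()}
    heads = [uri for uri in versions if uri not in predecessors]
    if len(heads) != 1:
        raise ValueError("expected exactly one head of the version chain")
    return heads[0]


# The unversioned resource points at its newest version.
current_link = (BASE, "dcat:hasCurrentVersion", current_version(versions))
print(history(current_link[2], versions))
```

A catalog that never emits these properties is unaffected, which is the sense in which existing DCAT 2 descriptions carry over unchanged.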

External Vocabularies

DCAT incorporates terms from several established vocabularies where stable terms with appropriate meanings exist, including Dublin Core (dcterms), FOAF, PROV-O, SKOS, OWL, ODRL, SPDX, OWL-Time, and vCard. It defines only a minimal set of its own classes and properties. The namespace for DCAT terms is http://www.w3.org/ns/dcat# with the suggested prefix dcat.
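Prefixed names such as dcat:Dataset are shorthand for full URIs in these namespaces. The sketch below expands them in plain Python. Only the dcat namespace is stated in the text above; the other namespace URIs and prefix labels are the commonly published ones and should be checked against each vocabulary's own documentation. The function names `expand` and `turtle_prefixes` are hypothetical.

```python
# Sketch: expanding prefixed names (CURIEs) into full URIs.
# Only the dcat namespace comes from the text above; the rest are the
# commonly published namespace URIs, included here as assumptions.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "prov": "http://www.w3.org/ns/prov#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "spdx": "http://spdx.org/rdf/terms#",
    "time": "http://www.w3.org/2006/time#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}


def expand(curie):
    """Turn 'prefix:local' into a full URI, failing on unknown prefixes."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def turtle_prefixes():
    """Render the table as Turtle @prefix declarations."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items())


print(expand("dcat:Dataset"))
print(expand("dcterms:title"))
print(turtle_prefixes())
```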

Governance & Maintenance

DCAT is developed and maintained by the W3C Dataset Exchange Working Group (DXWG). The DCAT 3 editors are Riccardo Albertoni (CNR, Italy), David Browning, Simon Cox, Alejandra Gonzalez Beltran (STFC, UK), Andrea Perego, and Peter Winstanley. Former editors include Fadi Maali (DERI) and John Erickson (RPI). The specification source, issues, and discussion are hosted on the W3C GitHub repository (w3c/dxwg).

Notable Implementations

DCAT is deployed extensively in government open data portals. The European Data Portal (now data.europa.eu) and national portals across EU member states implement DCAT-AP, the European application profile. CKAN, a widely used open source data catalog platform, supports DCAT through the ckanext-dcat extension. The US data.gov, Australia's data.gov.au, and similar national portals adopt DCAT or DCAT-based profiles. Research data management platforms including DataCite and re3data also align with DCAT. The Healthcare and Life Sciences Community Profile and GeoDCAT-AP for geospatial data are notable domain-specific profiles.

Related Standards

  • Dublin Core -- DCAT relies heavily on Dublin Core terms for basic descriptive metadata
  • Schema.org -- Complementary vocabulary for structured data on the Web; crosswalks exist between DCAT and Schema.org's Dataset type
  • VoID -- Can be used with DCAT to describe statistics about RDF datasets, as noted in the DCAT specification itself

Further Reading