Vocabulary of Interlinked Datasets

VoID

VoID is an RDF Schema vocabulary for expressing metadata about RDF datasets. Published as a W3C Interest Group Note, it serves as a bridge between the publishers and users of RDF data, covering general metadata based on Dublin Core, access metadata (SPARQL endpoints, data dumps, URI lookups), structural metadata (class and property partitions, statistics), and descriptions of links between datasets. VoID defines 4 classes and 21 properties that enable dataset discovery, cataloging, and archiving within the Linked Data ecosystem.

Overview

VoID is an RDF Schema vocabulary for expressing metadata about RDF datasets. Published as a W3C Interest Group Note on 3 March 2011, it serves as a bridge between the publishers and users of RDF data, with applications ranging from data discovery to cataloging and archiving of datasets. VoID descriptions help users find the right data for their tasks by answering fundamental questions: what is in this dataset, how can I access it, what does it link to, and how large is it.

Background

The authors developed and published a first version of VoID, starting in 2008. An extended and improved version, based on community feedback on that first publication, was submitted to the W3C Semantic Web Interest Group (SWIG) as a W3C Interest Group Note. The authors are Keith Alexander (Talis), Richard Cyganiak (DERI, NUI Galway), Michael Hausenblas (DERI, NUI Galway), and Jun Zhao (University of Oxford). The work was partly supported by EC FP7 projects (ROMULUS, OKKAM, LATC), Science Foundation Ireland, and EPSRC.

Purpose & Scope

VoID covers four areas of metadata about RDF datasets:

  1. General metadata -- Titles, descriptions, creators, publishers, licenses, subjects, and dates following the Dublin Core model. Datasets can be categorized by subject using DBpedia resource URIs or domain-specific SKOS concepts.

  2. Access metadata -- How to reach the data: resolvable HTTP URIs, SPARQL endpoints (void:sparqlEndpoint), RDF data dumps (void:dataDump), root resources for crawling, URI lookup endpoints, and OpenSearch description documents.

  3. Structural metadata -- The internal structure of datasets: example resources, URI patterns (void:uriSpace, void:uriRegexPattern), vocabularies used (void:vocabulary), class and property partitions for describing which classes and properties appear in the data, and statistical counts for triples, entities, classes, properties, distinct subjects, and distinct objects.

  4. Link descriptions -- Linksets (void:Linkset, a subclass of void:Dataset) that describe RDF links between datasets, including the link predicate (void:linkPredicate) and the two target datasets (void:target, void:subjectsTarget, void:objectsTarget).
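
The first three areas above can be illustrated with a small generated description. The following is a minimal sketch, not a reference implementation: it renders one void:Dataset as Turtle text using only the Python standard library. Every example.org URI, the title, and the triple count are hypothetical placeholders, and the helper names (escape_literal, describe_dataset) are made up for this illustration.

```python
# Minimal sketch: render a VoID dataset description as Turtle text.
# All example.org URIs and literal values below are hypothetical placeholders.

PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def escape_literal(text):
    """Escape backslashes and double quotes for a short Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def describe_dataset(uri, title, sparql_endpoint, data_dump, uri_space, triples):
    """Return a Turtle document describing a single void:Dataset."""
    lines = [
        f"<{uri}> a void:Dataset ;",
        # General metadata (Dublin Core)
        f'    dcterms:title "{escape_literal(title)}" ;',
        # Access metadata
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:dataDump <{data_dump}> ;",
        # Structural metadata
        f'    void:uriSpace "{escape_literal(uri_space)}" ;',
        f"    void:triples {int(triples)} .",
    ]
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


dataset_ttl = describe_dataset(
    uri="http://example.org/void.ttl#people",
    title='The "People" dataset',
    sparql_endpoint="http://example.org/sparql",
    data_dump="http://example.org/dumps/people.nt.gz",
    uri_space="http://example.org/people/",
    triples=125000,
)
print(dataset_ttl)
```

Building the text by hand keeps the sketch dependency-free; a real publisher would more likely use an RDF library to serialize and validate the description.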

Key Classes & Properties

VoID defines 4 classes and 21 properties:

Class                      Description
void:Dataset               A set of RDF triples published, maintained, or aggregated by a single provider
void:DatasetDescription    A web resource whose topics include VoID datasets
void:Linkset               A collection of RDF links between two datasets (subclass of void:Dataset)
void:TechnicalFeature      A technical feature such as a supported serialization format
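
To show how void:Linkset relates two datasets, here is a hedged sketch that emits a linkset description as Turtle from Python. The dataset URIs, the link count, and the function name describe_linkset are assumptions for illustration; owl:sameAs is used only as a typical link predicate.

```python
# Minimal sketch: describe a void:Linkset between two datasets.
# The example.org URIs and the triple count are hypothetical.

def describe_linkset(uri, subjects_target, objects_target, link_predicate, triples):
    """Return Turtle text for one void:Linkset.

    subjects_target: dataset containing the subjects of the link triples
    objects_target:  dataset containing the objects of the link triples
    link_predicate:  full URI of the property used for the links
    """
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "",
        f"<{uri}> a void:Linkset ;",
        f"    void:subjectsTarget <{subjects_target}> ;",
        f"    void:objectsTarget <{objects_target}> ;",
        f"    void:linkPredicate <{link_predicate}> ;",
        f"    void:triples {int(triples)} .",
    ]
    return "\n".join(lines) + "\n"


linkset_ttl = describe_linkset(
    uri="http://example.org/void.ttl#people-to-places",
    subjects_target="http://example.org/void.ttl#people",
    objects_target="http://example.org/void.ttl#places",
    link_predicate="http://www.w3.org/2002/07/owl#sameAs",
    triples=4200,
)
print(linkset_ttl)
```

When the direction of the links does not matter, void:target could be stated twice instead of the subjectsTarget/objectsTarget pair.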

Key properties include void:sparqlEndpoint, void:dataDump, void:vocabulary, void:triples, void:entities, void:classes, void:properties, void:subset, void:classPartition, void:propertyPartition, void:linkPredicate, void:uriSpace, void:exampleResource, void:inDataset, and void:rootResource.
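
Class and property partitions are easiest to understand from an example. The sketch below, written under stated assumptions, turns two dictionaries of counts into void:classPartition and void:propertyPartition blocks. It uses void:class and void:property inside the partitions, which are VoID terms not named in the list above; the counts, the dataset URI, and the function name describe_partitions are hypothetical.

```python
# Minimal sketch: emit class and property partitions with statistics.
# Counts and the example.org dataset URI are hypothetical.

def describe_partitions(dataset_uri, class_counts, property_counts):
    """Return Turtle text with one partition per class and per property.

    class_counts:    maps a class URI to its number of entities
    property_counts: maps a property URI to its number of triples
    """
    statements = [f"<{dataset_uri}> a void:Dataset"]
    for class_uri, entities in sorted(class_counts.items()):
        statements.append(
            f"    void:classPartition [ void:class <{class_uri}> ; "
            f"void:entities {int(entities)} ]"
        )
    for prop_uri, triples in sorted(property_counts.items()):
        statements.append(
            f"    void:propertyPartition [ void:property <{prop_uri}> ; "
            f"void:triples {int(triples)} ]"
        )
    # Join predicate-object pairs with ';' and close the statement with '.'
    body = " ;\n".join(statements) + " .\n"
    return "@prefix void: <http://rdfs.org/ns/void#> .\n\n" + body


partition_ttl = describe_partitions(
    "http://example.org/void.ttl#people",
    class_counts={"http://xmlns.com/foaf/0.1/Person": 1200},
    property_counts={"http://xmlns.com/foaf/0.1/name": 1150},
)
print(partition_ttl)
```

Each partition is written as a blank node here for brevity; a partition is itself a subset of the dataset, so it could equally be given its own URI.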

Deployment & Discovery

A VoID description can be deployed in three ways: as a Turtle file (void.ttl) placed alongside the dataset, with a hash URI identifying the dataset; as RDFa embedded in the dataset homepage; or through content negotiation. Discovery relies on two mechanisms: void:inDataset backlinks from individual RDF documents to the dataset they belong to, and a well-known URI (/.well-known/void), registered under RFC 5785, that allows automatic discovery on any web server.
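
The well-known URI mechanism can be sketched in a few lines. Given the URL of any resource on a server, a client can derive where a VoID description would be expected if the server supports this convention. The function name well_known_void_url and the example URL are hypothetical; the sketch only computes the address and does not perform an HTTP request.

```python
# Minimal sketch: derive the /.well-known/void URL for the host of a resource.
from urllib.parse import urlsplit, urlunsplit


def well_known_void_url(resource_url):
    """Return the well-known VoID URL for the server hosting resource_url."""
    parts = urlsplit(resource_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute HTTP(S) URL: {resource_url!r}")
    # Keep scheme and authority; replace path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/void", "", ""))


void_url = well_known_void_url("http://example.org/people/alice?format=ttl#me")
print(void_url)  # → http://example.org/.well-known/void
```

A client would then fetch this URL and parse whatever RDF the server returns; a server that does not support the convention will typically answer with a 404.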

VoID integrates with the SPARQL 1.1 Service Description vocabulary. void:Dataset is a superclass of both sd:Dataset and sd:Graph, allowing VoID metadata to be provided within SPARQL service descriptions.

Governance & Maintenance

VoID was produced within the W3C Semantic Web Interest Group. The SWIG did not expect the document to become a W3C Recommendation. The vocabulary definition is maintained separately as a companion document. The namespace URI is http://rdfs.org/ns/void# (hosted at rdfs.org, not w3.org). The vocabulary is considered stable and has not undergone significant revision since 2011.

Notable Implementations

VoID is widely used across the Linked Open Data ecosystem. It is supported by data registries such as the DataHub, LODStats, and Sindice. Major Linked Data publishers including DBpedia, Bio2RDF, and LinkedGeoData provide VoID descriptions. The LOD Cloud diagram relies on VoID linkset descriptions to map connections between datasets. DCAT references VoID as a complementary vocabulary for describing statistics about RDF datasets.

Related Standards

  • DCAT -- A broader W3C Recommendation for general dataset cataloging, often used alongside VoID
  • RDF -- The data model that VoID describes and is itself expressed in
  • SPARQL -- The query language for accessing datasets described by VoID; SPARQL Service Description integrates directly with VoID

Further Reading