Wikidata

A free, collaborative, multilingual, secondary knowledge base operated by the Wikimedia Foundation. Wikidata collects structured data to support Wikipedia, Wikimedia Commons, and other Wikimedia projects, as well as third-party applications worldwide. Each data item is identified by a Q-number and described through property-value statements (properties use P-numbers), with optional qualifiers and source references. The data is published under the Creative Commons CC0 public domain dedication, enabling unrestricted reuse. Wikidata exposes its data through a SPARQL endpoint, REST APIs, and linked data serializations.

Overview

Wikidata is a free, open, multilingual knowledge base that serves as the central structured data repository for the Wikimedia ecosystem. Launched in October 2012, it has grown into one of the largest openly available knowledge graphs in the world. Unlike traditional metadata standards that define schemas for others to populate, Wikidata functions as both an ontology and a living, collaboratively edited dataset -- providing a shared structure for representing knowledge that is maintained by humans and machines alike under a CC0 public domain dedication.

Background

Wikidata was conceived to address a fundamental problem: identical factual information (population figures, birth dates, geographic coordinates) was being maintained independently across hundreds of language editions of Wikipedia, leading to inconsistencies and enormous duplicated effort. Wikimedia Deutschland led the initial development, building Wikidata on the Wikibase software platform. At launch the project provided centralized interlanguage links, and it rapidly expanded into a general-purpose knowledge base used far beyond the Wikimedia projects.

Purpose & Scope

Wikidata serves multiple overlapping roles:

  • Central data store for Wikimedia -- providing structured facts (infobox data, sitelinks, labels) to Wikipedia and sister projects in all languages
  • Open knowledge graph -- offering a freely reusable, machine-readable dataset for researchers, developers, and institutions
  • Authority file -- functioning as a cross-domain identifier system where every entity receives a unique, persistent Q-identifier (items) or P-identifier (properties)
  • Linked data hub -- interconnecting with external identifiers from hundreds of authority files, databases, and classification systems worldwide

The scope is encyclopedic and domain-agnostic: any entity or concept that meets Wikidata's notability criteria can be represented. Because client wikis load this content dynamically from Wikidata, it does not need to be maintained separately in each individual wiki project.

Data Model

Wikidata uses a statement-based data model. The Wikidata repository consists mainly of items, each identified by a Q-number (e.g., Douglas Adams is Q42). Statements describe characteristics of an item through property-value pairs, where properties are identified by P-numbers (e.g., educated at is P69).
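The identifier scheme described above can be illustrated with a small sketch that classifies an identifier as an item or a property. The pattern (a letter followed by a positive integer without a leading zero) and the function name `classify_entity_id` are assumptions made for illustration; they are not part of any Wikidata library.

```python
import re

# Assumed pattern: "Q" or "P" followed by a positive integer with no leading zero.
_ENTITY_ID = re.compile(r"([QP])([1-9][0-9]*)")


def classify_entity_id(entity_id: str) -> tuple[str, int]:
    """Return ("item" or "property", numeric part) for an identifier like "Q42"."""
    match = _ENTITY_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not a Q- or P-identifier: {entity_id!r}")
    kind = "item" if match.group(1) == "Q" else "property"
    return kind, int(match.group(2))


print(classify_entity_id("Q42"))  # Douglas Adams, an item
print(classify_entity_id("P69"))  # educated at, a property
```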

Component      Description
Labels         Human-readable names in multiple languages
Descriptions   Brief disambiguating text per language
Aliases        Alternative names and spellings
Statements     Property-value pairs with optional qualifiers and references
Sitelinks      Links to corresponding pages in Wikimedia projects

Statements can carry qualifiers (additional context such as date ranges or measurement methods) and references (source citations), reflecting the diversity of knowledge available and supporting verifiability. Properties that link items to external databases, such as authority control systems used by libraries and archives, are called identifiers.
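To make the components concrete, here is a minimal in-memory sketch of an item with one qualified, referenced statement. The class and field names (`Item`, `Statement`, `values_of`, `unreferenced`) are hypothetical and deliberately simplified: they do not reproduce Wikidata's actual JSON layout. The college Q-number, the dates, and the reference URL are placeholders rather than values copied from Wikidata. P580 (start time), P582 (end time), and P854 (reference URL) are real property IDs that do not appear in the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    property_id: str                 # e.g. "P69" (educated at)
    value: str                       # simplified: a Q-number or a literal string
    qualifiers: dict[str, list[str]] = field(default_factory=dict)
    references: list[dict[str, list[str]]] = field(default_factory=list)


@dataclass
class Item:
    item_id: str
    labels: dict[str, str] = field(default_factory=dict)         # language -> label
    descriptions: dict[str, str] = field(default_factory=dict)   # language -> text
    aliases: dict[str, list[str]] = field(default_factory=dict)  # language -> names
    statements: list[Statement] = field(default_factory=list)
    sitelinks: dict[str, str] = field(default_factory=dict)      # site key -> title

    def values_of(self, property_id: str) -> list[str]:
        """All values stated for one property."""
        return [s.value for s in self.statements if s.property_id == property_id]

    def unreferenced(self) -> list[Statement]:
        """Statements that carry no source reference."""
        return [s for s in self.statements if not s.references]


adams = Item(
    item_id="Q42",
    labels={"en": "Douglas Adams"},
    descriptions={"en": "English writer"},        # illustrative wording
    sitelinks={"enwiki": "Douglas Adams"},
    statements=[
        Statement(
            property_id="P69",                    # educated at
            value="Q-PLACEHOLDER-COLLEGE",        # placeholder, not a real Q-number
            qualifiers={"P580": ["1971"], "P582": ["1974"]},   # illustrative dates
            references=[{"P854": ["https://example.org/source"]}],
        )
    ],
)

print(adams.values_of("P69"))
print(len(adams.unreferenced()))
```

Keeping qualifiers and references on the statement, rather than on the item, mirrors the point made above: context and sourcing attach to each individual claim.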

Serializations & Technical Formats

Wikidata exposes its content through multiple channels:

  • SPARQL Query Service -- powered by Blazegraph, allows complex queries over the full dataset
  • REST API and MediaWiki API -- provide programmatic access to individual entities
  • Full dataset dumps -- available in JSON and RDF (N-Triples, Turtle) formats
  • Lua Scribunto interface -- allows client wikis to access data directly in page templates
  • Entity namespace -- entity URIs follow the pattern https://www.wikidata.org/entity/ followed by the identifier (e.g., https://www.wikidata.org/entity/Q42 for Douglas Adams)
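The access channels listed above can be sketched as URL builders. The snippet below only constructs request URLs and performs no network calls. The entity namespace comes from the list above; the query service address (`https://query.wikidata.org/sparql`) and the `Special:EntityData` path are the publicly documented addresses as far as I know, but they do not appear in this text and should be checked against current Wikidata documentation. The function names are hypothetical.

```python
from urllib.parse import urlencode

ENTITY_NS = "https://www.wikidata.org/entity/"                     # from the text
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"              # assumed address
ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/"  # assumed address

# Which institutions is Douglas Adams (Q42) recorded as "educated at" (P69)?
# The wd:, wdt:, wikibase: and bd: prefixes are predefined by the query service.
QUERY = """
SELECT ?school ?schoolLabel WHERE {
  wd:Q42 wdt:P69 ?school .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def entity_uri(entity_id: str) -> str:
    """Linked-data URI for an entity, e.g. .../entity/Q42."""
    return ENTITY_NS + entity_id


def entity_data_url(entity_id: str, fmt: str = "json") -> str:
    """URL of a single-entity export; fmt may be e.g. "json", "ttl" or "nt"."""
    return f"{ENTITY_DATA}{entity_id}.{fmt}"


def sparql_url(query: str) -> str:
    """GET URL that asks the query service for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


print(entity_uri("Q42"))
print(entity_data_url("Q42", "ttl"))
print(sparql_url(QUERY))
```

A real client would pass these URLs to an HTTP library and send a descriptive User-Agent header. For bulk analysis, the full dataset dumps are usually a better fit than many individual requests.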

Governance & Maintenance

Wikidata is governed by the Wikimedia community through consensus-based processes, with infrastructure support from the Wikimedia Foundation and development primarily by Wikimedia Deutschland. Editing is open to anyone, including anonymous contributors. The community coordinates through Project Chat, WikiProjects (focused editing groups), and regular events such as WikidataCon. Property proposals go through a formal community review process before creation. Automated bots also enter and maintain data.

Structured data is available under the Creative Commons CC0 license (public domain dedication), enabling unrestricted reuse for any purpose.

Notable Implementations

Wikidata is used extensively across sectors. Search engines (for example, in Google's knowledge panels), virtual assistants, and mapping platforms consume Wikidata data. Cultural heritage institutions -- libraries, museums, archives -- use Wikidata as an authority hub, mapping their local identifiers to Q-numbers. The research community uses Wikidata as both a subject of study and a data source for computational analyses. Tools like OpenRefine provide reconciliation services against Wikidata for data enrichment. The Reasonator tool provides enhanced browsing of Wikidata items.

Related Standards

Wikidata connects to nearly every major authority file and controlled vocabulary through its property system, including VIAF, GND, LCNAF, and hundreds of domain-specific identifiers. It complements rather than replaces traditional metadata standards, often serving as the linking hub between them. Schema.org types and properties are mapped to Wikidata equivalents, enabling interoperability with web-scale structured data.
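The linking-hub role can be sketched as a crosswalk: given the external identifiers recorded on each item, an identifier from one authority file is translated into another by going through the shared Q-number. The property IDs below (P214 for VIAF, P227 for GND, P244 for the Library of Congress authority ID) are the ones Wikidata uses to the best of my knowledge. The identifier values are placeholders, not real authority records, and the function names are hypothetical.

```python
# Wikidata properties for the authority files named above.
EXTERNAL_ID_PROPERTIES = {"VIAF": "P214", "GND": "P227", "LCNAF": "P244"}

# item Q-number -> {property ID -> external identifiers}; values are placeholders.
ITEMS = {
    "Q42": {"P214": ["viaf-placeholder-1"], "P227": ["gnd-placeholder-1"]},
}


def build_crosswalk(items: dict[str, dict[str, list[str]]]) -> dict[tuple[str, str], str]:
    """Index (property ID, external identifier) -> Q-number."""
    index: dict[tuple[str, str], str] = {}
    for qid, identifiers in items.items():
        for property_id, values in identifiers.items():
            for value in values:
                index[(property_id, value)] = qid
    return index


def translate(items, index, source: str, source_id: str, target: str) -> list[str]:
    """Map an identifier in one authority file to identifiers in another."""
    qid = index.get((EXTERNAL_ID_PROPERTIES[source], source_id))
    if qid is None:
        return []          # the source identifier is not linked to any item
    return items[qid].get(EXTERNAL_ID_PROPERTIES[target], [])


index = build_crosswalk(ITEMS)
print(translate(ITEMS, index, "VIAF", "viaf-placeholder-1", "GND"))
print(translate(ITEMS, index, "VIAF", "viaf-placeholder-1", "LCNAF"))
```

The second lookup returns an empty list because the sample item carries no P244 value, which is the common case in practice: coverage of each authority file varies from item to item.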

Further Reading