CSV on the Web

CSVW

A suite of W3C Recommendations defining a model and metadata vocabulary for tabular data (primarily CSV files) on the Web. The core specification, Model for Tabular Data and Metadata on the Web, provides a data model for interpreting CSV and similar tabular formats. The companion Metadata Vocabulary for Tabular Data allows publishers to annotate CSV files with structural information, datatypes, foreign key relationships, and transformation rules using JSON-LD metadata documents. Additional specifications define standard procedures for converting annotated tabular data to JSON (CSV2JSON) and RDF (CSV2RDF). The four specifications were published as W3C Recommendations on 17 December 2015 by the CSV on the Web Working Group.

Overview

CSV on the Web is a suite of W3C Recommendations that brings structured metadata to one of the most widely used data formats. CSV files are everywhere, from government open data portals to scientific repositories, yet they carry no inherent schema, no type information, and no standardized way to describe their structure. CSVW fills this gap by defining a metadata vocabulary and processing model for tabular data on the Web.

Background

The CSV on the Web Working Group was chartered by W3C in 2013 in response to the widespread use of CSV for publishing open data, particularly by governments. Editors Jeni Tennison (Open Data Institute) and Gregg Kellogg (Kellogg Associates), along with authors Rufus Pollock (Open Knowledge) and Ivan Herman (W3C), developed four specifications published simultaneously as W3C Recommendations on 17 December 2015.

Purpose and Scope

CSVW addresses several fundamental problems with CSV data on the web:

  • Ambiguity: CSV files lack a formal schema; column semantics, datatypes, and table relationships are undefined
  • Discoverability: No standard mechanism exists to associate metadata with a CSV file
  • Interoperability: Different tools make different assumptions about delimiters, encodings, null values, and line endings
  • Integration: CSV data cannot participate in the linked data ecosystem without transformation

The Four Specifications

  1. Model for Tabular Data and Metadata on the Web defines the abstract data model for tables, columns, rows, cells, and their annotations
  2. Metadata Vocabulary for Tabular Data provides a JSON-LD vocabulary for describing CSV structure, including datatypes, foreign keys, transformations, and dialect settings
  3. Generating JSON from Tabular Data on the Web (CSV2JSON) defines standard conversion from annotated CSV to JSON
  4. Generating RDF from Tabular Data on the Web (CSV2RDF) defines standard conversion from annotated CSV to RDF
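As a rough illustration of what the JSON conversion produces, the sketch below turns a small CSV into a list of objects, one per row, keyed by the header text. It is a simplified approximation of the minimal-mode output rather than a conforming CSV2JSON implementation: the sample data and the function name csv_to_minimal_json are hypothetical, every value stays a string because no metadata is supplied, and dropping empty cells reflects an assumption that null cells are omitted from the output.

```python
import csv
import io
import json


def csv_to_minimal_json(csv_text):
    """Approximate minimal-mode CSV2JSON: one object per data row.

    Hypothetical helper; not part of any CSVW library.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row is treated as the column names
    objects = []
    for record in reader:
        # Assumption: empty cells are treated as null and left out of the object.
        obj = {name: value for name, value in zip(header, record) if value != ""}
        objects.append(obj)
    return objects


# Hypothetical sample data; the second row has an empty population cell.
SAMPLE = "country,population\nAD,84000\nAE,\n"

result = csv_to_minimal_json(SAMPLE)
print(json.dumps(result, indent=2))
```

A conforming processor would also apply datatypes, derive property names from column annotations, and support the richer standard mode, none of which this sketch attempts.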

How It Works

Publishers place a JSON-LD metadata document alongside their CSV files, or point to it with an HTTP Link header or through a site-wide configuration file at a well-known URI. The metadata describes table structure, column names, datatypes, default values, null values, foreign key relationships, and transformation templates. Consumers can then process the CSV with full awareness of its structure and semantics.
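The following sketch shows how a consumer might use such a metadata document. The file name countries.csv, the column definitions, and the helper read_annotated are hypothetical, and only two datatypes and a single string-valued null marker are handled; a real processor covers far more of the vocabulary.

```python
import csv
import io

# Hypothetical metadata describing a hypothetical file "countries.csv".
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "countries.csv",
    "tableSchema": {
        "columns": [
            {"name": "code", "titles": "Country Code", "datatype": "string"},
            {
                "name": "population",
                "titles": "Population",
                "datatype": "integer",
                "null": "n/a",  # cells holding this text are treated as null
            },
        ]
    },
}

# Stand-in for the contents of countries.csv.
CSV_TEXT = "Country Code,Population\nAD,84000\nXX,n/a\n"

# Only the two datatypes used above are supported in this sketch.
CASTS = {"string": str, "integer": int}


def read_annotated(csv_text, meta):
    """Parse CSV text into typed rows using the column descriptions."""
    columns = meta["tableSchema"]["columns"]
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row; names come from the metadata
    rows = []
    for record in reader:
        row = {}
        for column, raw in zip(columns, record):
            # The null marker defaults to the empty string when not given.
            if raw == column.get("null", ""):
                row[column["name"]] = None
            else:
                cast = CASTS[column.get("datatype", "string")]
                row[column["name"]] = cast(raw)
        rows.append(row)
    return rows


rows = read_annotated(CSV_TEXT, metadata)
print(rows)
```

Without the metadata, a consumer would see "84000" and "n/a" as plain strings and would have to guess that the second column is numeric.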

Serialization and Namespace

  • Namespace: http://www.w3.org/ns/csvw#
  • Metadata documents are expressed in JSON-LD
  • A comprehensive primer provides introductory guidance for publishers
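To make the relationship between the namespace and a metadata document concrete, the sketch below expands a vocabulary term into a full IRI and checks whether a document declares the CSVW context. The helper names expand_term and uses_csvw_context are hypothetical; the check assumes the context is either the string http://www.w3.org/ns/csvw or an array whose first item is that string.

```python
CSVW_NS = "http://www.w3.org/ns/csvw#"
CSVW_CONTEXT = "http://www.w3.org/ns/csvw"


def expand_term(term):
    """Return the full IRI for a CSVW vocabulary term such as 'tableSchema'."""
    return CSVW_NS + term


def uses_csvw_context(document):
    """Check whether a parsed metadata document declares the CSVW context."""
    context = document.get("@context")
    if isinstance(context, str):
        return context == CSVW_CONTEXT
    if isinstance(context, list):
        # Array form: the context URL comes first, optionally followed by
        # an object carrying settings such as a default language.
        return bool(context) and context[0] == CSVW_CONTEXT
    return False


doc = {"@context": [CSVW_CONTEXT, {"@language": "en"}], "url": "data.csv"}
print(expand_term("tableSchema"))
print(uses_csvw_context(doc))
```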

Governance and Maintenance

Developed by the W3C CSV on the Web Working Group. The specifications are accompanied by a test suite and an implementation report demonstrating interoperability across implementations. Source code and issues are tracked on GitHub.

Notable Implementations

CSVW metadata is used by government open data platforms and data publishing tools. Libraries exist in multiple programming languages for parsing CSVW metadata, validating CSV files against it, and performing the defined JSON and RDF conversions.

Related Standards

  • JSON-LD (json-ld): The format used for CSVW metadata documents
  • DCAT (dcat): Often used alongside CSVW for dataset catalog metadata

Further Reading