CSV on the Web (CSVW) is a suite of W3C Recommendations that brings structured metadata to one of the world's most widely used data formats. CSV files are everywhere, from government open data portals to scientific repositories, yet they carry no inherent schema, no type information, and no standardized way to describe their structure. CSVW fills this gap by defining a metadata vocabulary and processing model for tabular data on the Web.
Background
The CSV on the Web Working Group was chartered by W3C in 2013 in response to the widespread use of CSV for publishing open data, particularly by governments. Editors Jeni Tennison (Open Data Institute) and Gregg Kellogg (Kellogg Associates), along with authors Rufus Pollock (Open Knowledge) and Ivan Herman (W3C), developed four specifications published simultaneously as W3C Recommendations on 17 December 2015.
Purpose and Scope
CSVW addresses several fundamental problems with CSV data on the web:
- Ambiguity: CSV files lack a formal schema; column semantics, datatypes, and table relationships are undefined
- Discoverability: No standard mechanism exists to associate metadata with a CSV file
- Interoperability: Different tools make different assumptions about delimiters, encodings, null values, and line endings
- Integration: CSV data cannot participate in the linked data ecosystem without transformation
The Four Specifications
- Model for Tabular Data and Metadata on the Web defines the abstract data model for tables, columns, rows, cells, and their annotations
- Metadata Vocabulary for Tabular Data provides a JSON-LD vocabulary for describing CSV structure, including datatypes, foreign keys, transformations, and dialect settings
- Generating JSON from Tabular Data on the Web (CSV2JSON) defines standard conversion from annotated CSV to JSON
- Generating RDF from Tabular Data on the Web (CSV2RDF) defines standard conversion from annotated CSV to RDF
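To make the conversion specifications more concrete, the following Python sketch approximates the simplest case of CSV2JSON: a CSV file with a header row and no metadata document, converted to an array of one JSON object per row. The function name csv_to_minimal_json and the sample data are hypothetical. The sketch assumes that empty cells are treated as null and left out of the output, and that every value stays a string because no datatypes have been declared. It is a loose illustration, not a conforming implementation.

```python
import csv
import io
import json


def csv_to_minimal_json(csv_text):
    """Convert CSV text to a JSON array with one object per data row.

    Hypothetical helper: keys come from the header row, values stay
    strings, and empty cells (treated here as null) are omitted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row supplies the column names
    objects = []
    for record in reader:
        obj = {
            name: value
            for name, value in zip(header, record)
            if value != ""  # assumption: an empty string means null
        }
        objects.append(obj)
    return json.dumps(objects, indent=2)


# Made-up sample data for illustration only.
SAMPLE_CSV = "id,name\n1,Ada\n2,\n"
print(csv_to_minimal_json(SAMPLE_CSV))
```

With a metadata document, the same conversion would produce typed values (for example, JSON numbers for an integer column) instead of strings.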
How It Works
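Publishers place a JSON-LD metadata document alongside their CSV files, by default at the CSV file's URL with -metadata.json appended or in a csv-metadata.json file in the same directory. Alternatively, they can point to it with an HTTP Link header or through the location patterns listed at the well-known URI /.well-known/csvm. The metadata describes table structure, column names, datatypes, default values, null values, foreign key relationships, and transformation templates. Consumers can then process the CSV with full awareness of its structure and semantics.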
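As a rough illustration of this workflow, the Python sketch below pairs a small metadata document with a CSV file and uses the column descriptions to type the cells. The file name countries.csv, the column names, the sample values, and the helper read_typed_rows are all hypothetical. The sketch handles only three datatypes and ignores dialect settings, foreign keys, and most of the vocabulary, so it should be read as a simplified picture of what a consumer does, not as a conforming CSVW processor.

```python
import csv
import io
import json

# Hypothetical metadata document describing a file named "countries.csv".
METADATA = json.loads("""
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "columns": [
      {"name": "code", "titles": "Country Code", "datatype": "string"},
      {"name": "population", "titles": "Population",
       "datatype": "integer", "null": "n/a"}
    ],
    "primaryKey": "code"
  }
}
""")

# Made-up contents of countries.csv.
CSV_TEXT = "Country Code,Population\nAD,77000\nAQ,n/a\n"

# Assumption: only these three datatypes are supported by the sketch.
CASTS = {"string": str, "integer": int, "number": float}


def read_typed_rows(csv_text, metadata):
    """Return one dict per data row, keyed by column name, with typed values."""
    columns = metadata["tableSchema"]["columns"]
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row; the names come from the metadata
    rows = []
    for record in reader:
        row = {}
        for column, raw in zip(columns, record):
            if raw == column.get("null", ""):
                # The cell matches the column's null marker (empty by default).
                row[column["name"]] = None
            else:
                cast = CASTS[column.get("datatype", "string")]
                row[column["name"]] = cast(raw)
        rows.append(row)
    return rows


rows = read_typed_rows(CSV_TEXT, METADATA)
print(rows)
```

Without the metadata, a consumer would have to guess that "n/a" marks a missing value and that the population column holds integers; with it, both facts are declared by the publisher.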
Serialization and Namespace
- Namespace: http://www.w3.org/ns/csvw#
- Metadata documents are expressed in JSON-LD
- A comprehensive primer provides introductory guidance for publishers
Governance and Maintenance
CSVW was developed by the W3C CSV on the Web Working Group. The specifications are accompanied by a test suite and an implementation report demonstrating interoperability across implementations. Source code and issues are tracked on GitHub.
Notable Implementations
CSVW metadata is used by government open data platforms and data publishing tools. Libraries exist in multiple programming languages for parsing CSVW metadata, validating CSV files against it, and performing the defined JSON and RDF conversions.
Related Standards
- JSON-LD (json-ld): The format used for CSVW metadata documents
- DCAT (dcat): Often used alongside CSVW for dataset catalog metadata