
CodeMeta

CodeMeta is a minimal metadata vocabulary for software, designed to support research software discovery, citation, and interoperability across platforms. Founded in 2016 by a consortium of researchers, it builds on Schema.org terms and JSON-LD to create a shared language for describing software in academic and research contexts. It serves as a metadata exchange layer: crosswalk mappings translate between diverse repository-specific and language-specific metadata formats, preserving information across systems.

Overview

CodeMeta is a community-driven metadata vocabulary designed to make research software findable, citable, and interoperable across platforms and repositories. It provides a shared, minimal set of terms expressed in JSON-LD and grounded in Schema.org, bridging software development practices and scholarly communication needs.

Background

CodeMeta was founded in 2016 by a consortium of researchers seeking to address the persistent challenge of software metadata fragmentation. Different repositories, package managers, and programming language ecosystems each use their own metadata formats, making it difficult to exchange information about software across systems. CodeMeta emerged as a translation layer — a Rosetta Stone for software metadata — that could map between these diverse formats while preserving essential information.

The project has grown from its initial research community roots into a recognized framework used by a worldwide community of developers, researchers, and archivists. It is now part of major European research infrastructure initiatives, notably FAIRCORE4EOSC, which uses CodeMeta to make research software FAIR (Findable, Accessible, Interoperable, Reusable).

Purpose & Scope

CodeMeta addresses five key use cases:

  • enabling reliable software citation in scholarly work
  • standardizing software metadata across incompatible tools and formats
  • improving software discovery and reuse by humans and machines
  • supporting interoperability between repositories and registries
  • enhancing transparency and reproducibility in science by providing clear metadata about what software does and how it was created

The vocabulary is recommended by the Research Software MetaData Guidelines (RSMD) for several of its essential and important practices, including maintaining machine-readable metadata in a single source file and adding software descriptions, classification metadata, and author information.

Key Elements

CodeMeta defines a vocabulary of properties for describing software, building on Schema.org's existing terms. Key metadata fields include software name, description, version, authors (with ORCID support), license, programming language, operating system, related publications, funding sources, and development status.
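As a rough illustration of the fields listed above, the following Python sketch builds a small record and serializes it as JSON-LD. It assumes the CodeMeta 3.0 context URL; the project name, author, ORCID iD, and all other values are placeholders invented for this example, not taken from a real project.

```python
import json

# Illustrative record only: every value below is a placeholder.
record = {
    "@context": "https://w3id.org/codemeta/3.0",  # assumed: CodeMeta 3.0 context
    "@type": "SoftwareSourceCode",
    "name": "example-tool",
    "description": "A placeholder description of what the software does.",
    "version": "1.0.0",
    "author": [
        {
            "@type": "Person",
            "@id": "https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID iD
            "givenName": "Ada",
            "familyName": "Example",
        }
    ],
    "license": "https://spdx.org/licenses/MIT",
    "programmingLanguage": "Python",
    "operatingSystem": "Linux",
    "developmentStatus": "active",
}

# Serialize to the text that would be saved as codemeta.json.
codemeta_text = json.dumps(record, indent=2)
print(codemeta_text)
```

Identifying the author by an ORCID iD in "@id" lets downstream tools link the person unambiguously instead of matching on a name string.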

Serializations & Technical Formats

CodeMeta files are expressed in JSON-LD and are typically stored as a codemeta.json file in the root of a software repository. The use of JSON-LD enables compatibility with linked data ecosystems and Schema.org tooling.
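Because codemeta.json is ordinary JSON, a tool can locate and read it with a standard JSON parser before doing any linked-data processing. The sketch below shows one way this might look in Python. The helper names `load_codemeta` and `missing_fields`, and the short checklist of required keys, are hypothetical choices for this example and not part of CodeMeta itself.

```python
import json
import tempfile
from pathlib import Path


def load_codemeta(repo_root):
    """Return the parsed codemeta.json found in repo_root, or None if absent."""
    path = Path(repo_root) / "codemeta.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def missing_fields(record, required=("@context", "@type", "name")):
    """List the keys from `required` (a hypothetical checklist) that are absent."""
    return [key for key in required if key not in record]


# Demo against a throwaway directory standing in for a repository root.
with tempfile.TemporaryDirectory() as repo_root:
    sample = {
        "@context": "https://w3id.org/codemeta/3.0",  # assumed context URL
        "@type": "SoftwareSourceCode",
        "name": "example-tool",  # placeholder name
    }
    (Path(repo_root) / "codemeta.json").write_text(
        json.dumps(sample, indent=2), encoding="utf-8"
    )
    loaded = load_codemeta(repo_root)
    absent = load_codemeta(Path(repo_root) / "no-such-subdir")

print(missing_fields(loaded))
```

This only treats the file as plain JSON. Resolving the "@context" and expanding terms into full IRIs would need a JSON-LD processor, which is outside this sketch.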

Crosswalks

A distinguishing feature of CodeMeta is its crosswalk system. Crosswalks map CodeMeta's vocabulary to the metadata formats used by specific repositories and tools, such as PyPI, CRAN, npm, Maven, Debian packages, DataCite, and many others. With CodeMeta acting as the central interchange format, each format needs only one mapping to CodeMeta, so metadata can flow between systems without a bespoke converter for every pair of formats.
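The hub-and-spoke idea can be sketched in a few lines of Python. The two mapping tables below are hypothetical excerpts written for illustration, covering only a handful of simple string fields. The published CodeMeta crosswalk tables remain the authoritative source, and real conversions also have to handle structured values such as authors.

```python
# Hypothetical crosswalk excerpts: source-format key -> CodeMeta term.
# Illustrative only; not copied from the official crosswalk tables.
PYTHON_PACKAGE_TO_CODEMETA = {
    "name": "name",
    "version": "version",
    "summary": "description",
    "license": "license",
}
NPM_TO_CODEMETA = {
    "name": "name",
    "version": "version",
    "description": "description",
    "license": "license",
}


def to_codemeta(source, crosswalk):
    """Translate source-format metadata into a CodeMeta-style record."""
    record = {
        "@context": "https://w3id.org/codemeta/3.0",  # assumed context URL
        "@type": "SoftwareSourceCode",
    }
    for key, term in crosswalk.items():
        if key in source:
            record[term] = source[key]
    return record


def from_codemeta(record, crosswalk):
    """Translate a CodeMeta-style record back into a source format's keys."""
    return {key: record[term] for key, term in crosswalk.items() if term in record}


# Placeholder Python-package metadata, routed through CodeMeta to npm-style keys.
python_meta = {
    "name": "example-tool",
    "version": "1.0.0",
    "summary": "A placeholder summary.",
    "license": "MIT",
}
hub_record = to_codemeta(python_meta, PYTHON_PACKAGE_TO_CODEMETA)
npm_meta = from_codemeta(hub_record, NPM_TO_CODEMETA)
print(npm_meta)
```

Neither table knows about the other: adding a new format means writing one new mapping to CodeMeta terms rather than one converter per existing format.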

Governance & Maintenance

CodeMeta is maintained by its open community, with development coordinated through GitHub. The project welcomes contributions from anyone interested in software metadata. The CodeMeta Generator tool provides a web-based interface for creating compliant codemeta.json files.

Notable Implementations

CodeMeta is used by Software Heritage for software archival and citation, by FAIRCORE4EOSC for European Open Science Cloud integration, and by numerous research institutions and software repositories. The codemeta.json format is recognized by Zenodo, HAL (the French national open archive), and the Software Heritage archive.

Related Standards

  • Schema.org — CodeMeta builds on Schema.org vocabulary
  • Citation File Format — Complementary software citation standard; CFF can be converted to CodeMeta
