Lexicon Model for Ontologies (OntoLex-Lemon)

A W3C Community Group specification for representing lexical information relative to ontologies. OntoLex-Lemon provides a core model and several extension modules for describing lexical entries, forms, senses, and their relationships to ontology concepts. It evolved from the original lemon (Lexicon Model for Ontologies) model developed in the Monnet project. Wikidata's lexicographic data model is based on it, and it is used by numerous linguistic linked data resources across Europe and beyond.

Overview

OntoLex-Lemon provides an RDF model for representing lexicographic and linguistic data in relation to ontologies. It enables machine-readable lexicons in which words, their forms, their meanings, and the relationships among them are expressed as linked data, bridging the gap between natural language processing resources and the Semantic Web.

Background

The need for a standard way to connect lexical information with ontologies emerged as the Semantic Web matured. Ontologies define concepts and relationships, but they do not inherently capture the linguistic expressions used to denote those concepts across different languages. The original lemon (Lexicon Model for Ontologies) was developed in the Monnet project, a European research initiative focused on multilingual access to structured data. In 2011, the W3C Ontology-Lexica Community Group was formed to develop an improved and standardized version. The result, OntoLex-Lemon, was published as a W3C Community Group Final Report in May 2016.

Purpose & Scope

OntoLex-Lemon provides a core model and several extension modules:

Module            Purpose
ontolex (core)    Lexical entries, forms, senses, and their links to ontology concepts
synsem            Syntactic frames and the syntax-semantics interface
decomp            Decomposition of compounds and multiword expressions into their components
vartrans          Lexical variations and translations between languages
lime              Linguistic metadata for describing lexical resources
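
Each module has its own RDF namespace. The following Python sketch lists the namespace IRIs as they are commonly given for the five modules and expands a prefixed name into a full IRI. The dictionary name, the `expand` helper, and its error handling are hypothetical and exist only for this illustration.

```python
# Namespace IRIs for the OntoLex-Lemon modules (as commonly published;
# verify against the specification before relying on them).
ONTOLEX_NAMESPACES = {
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "synsem": "http://www.w3.org/ns/lemon/synsem#",
    "decomp": "http://www.w3.org/ns/lemon/decomp#",
    "vartrans": "http://www.w3.org/ns/lemon/vartrans#",
    "lime": "http://www.w3.org/ns/lemon/lime#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'ontolex:LexicalEntry' to a full IRI.

    Hypothetical helper: raises ValueError for an unknown prefix or a
    name without a local part.
    """
    prefix, _, local = prefixed_name.partition(":")
    if not local or prefix not in ONTOLEX_NAMESPACES:
        raise ValueError(f"cannot expand {prefixed_name!r}")
    return ONTOLEX_NAMESPACES[prefix] + local


if __name__ == "__main__":
    print(expand("ontolex:LexicalEntry"))
    print(expand("lime:Lexicon"))
```

Keeping the prefixes in one mapping makes it easy to emit matching `@prefix` declarations when a lexicon is serialized.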

The core model centers on three key classes: LexicalEntry (a word, multi-word expression, or affix), Form (a specific morphological realization with a written or phonetic representation), and LexicalSense (the meaning of a lexical entry in relation to an ontology concept). A LexicalEntry has at least one Form and may have any number of LexicalSense instances, each of which references exactly one entity in an ontology.

Governance & Maintenance

OntoLex-Lemon is maintained by the W3C Ontology-Lexica Community Group, which continues active development of extension modules. The FrAC (Frequency, Attestation, and Corpus) module is a more recent addition. The community group holds regular meetings and publishes updates through the W3C community group process.

Notable Implementations

The most prominent deployment related to OntoLex-Lemon is in Wikidata, which introduced lexicographic data in 2018 with a lexeme data model based on OntoLex-Lemon. Wikidata now contains more than a million lexemes covering hundreds of languages. The model is also used by DBnary (a multilingual lexical resource extracted from Wiktionary), by many datasets in the LLOD (Linguistic Linked Open Data) cloud, and by various European linguistic infrastructure projects, including ELEXIS (European Lexicographic Infrastructure).

Related Standards

  • SKOS -- Used for simpler thesaurus-style vocabularies; OntoLex-Lemon provides richer lexical modeling
  • LexInfo -- An ontology of linguistic categories that extends OntoLex-Lemon with detailed morphological and syntactic properties

Further Reading