OntoLex-Lemon is a W3C Community Group specification that provides an RDF model for representing lexicographic and linguistic data in relation to ontologies. It enables the creation of machine-readable lexicons where words, their forms, meanings, and relationships are expressed as linked data, bridging the gap between natural language processing resources and the Semantic Web. The model is used by Wikidata for its lexicographic data and by numerous linguistic linked data resources across Europe.
Background
The need for a standard way to connect lexical information with ontologies emerged as the Semantic Web matured. Ontologies define concepts and relationships, but they do not inherently capture the linguistic expressions used to denote those concepts across different languages. OntoLex-Lemon's predecessor, lemon (Lexicon Model for Ontologies), was developed in the Monnet project, a European research initiative focused on multilingual access to structured data. In 2011, the W3C Ontology-Lexica Community Group was formed to develop an improved, standardized successor. The result, OntoLex-Lemon, was published as a W3C Community Group Final Report in May 2016.
Purpose & Scope
OntoLex-Lemon provides a core model and several extension modules:
| Module | Purpose |
|---|---|
| ontolex (core) | Lexical entries, forms, senses, and their links to ontology concepts |
| synsem | Syntactic frames and the syntax-semantics interface |
| decomp | Decomposition of compound words and multi-word expressions into their constituent parts |
| vartrans | Lexical variations and translations between languages |
| lime | Linguistic metadata for describing lexical resources |
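As a rough illustration of what the lime module describes, the following Python sketch emits Turtle for a `lime:Lexicon` that states its language, links its entries with `lime:entry`, and reports the number of distinct entries with `lime:lexicalEntries`. It uses only the standard library; the function name `lexicon_turtle` and all `example.org` IRIs are hypothetical, and a real lexicon would normally carry more metadata than this.

```python
from collections.abc import Iterable

LIME = "http://www.w3.org/ns/lemon/lime#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def lexicon_turtle(lexicon_iri: str, language: str, entry_iris: Iterable[str]) -> str:
    """Return Turtle describing a lime:Lexicon and its entries (hypothetical helper)."""
    # Deduplicate so the reported count matches the number of distinct lime:entry links.
    entries = sorted(set(entry_iris))
    predicate_objects = [
        "a lime:Lexicon",
        f'lime:language "{language}"',
        f'lime:lexicalEntries "{len(entries)}"^^xsd:integer',
    ] + [f"lime:entry <{iri}>" for iri in entries]
    header = [f"@prefix lime: <{LIME}> .", f"@prefix xsd: <{XSD}> .", ""]
    # Join predicate-object pairs with ";" and close the subject block with ".".
    body = f"<{lexicon_iri}> " + " ;\n    ".join(predicate_objects) + " ."
    return "\n".join(header) + "\n" + body + "\n"


if __name__ == "__main__":
    print(lexicon_turtle(
        "http://example.org/lexicon/en",  # hypothetical lexicon IRI
        "en",
        ["http://example.org/lexicon/cat", "http://example.org/lexicon/dog"],
    ))
```

Computing `lime:lexicalEntries` from the same deduplicated list that produces the `lime:entry` links keeps the metadata consistent with the data it describes.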
The core model centers on three key classes: LexicalEntry (a word, multi-word expression, or affix), Form (a morphological realization of an entry, with a written or phonetic representation), and LexicalSense (a meaning of a lexical entry, defined relative to an ontology entity). Each LexicalEntry has at least one Form, typically including a canonical form (its lemma), and may have one or more LexicalSense instances; each sense links the entry, via the ontolex:reference property, to a class, property, or individual in an external ontology.
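To make the relationship between these classes concrete, here is a minimal standard-library Python sketch that serializes one lexical entry as Turtle: a LexicalEntry with a canonical Form, one other Form, and a LexicalSense whose `ontolex:reference` points to an external resource. The function `lexical_entry_turtle`, the `example.org` namespace, and the resource naming scheme (`_canonical`, `_other`, `_sense1`) are hypothetical; the DBpedia IRI in the usage line is just an example target.

```python
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
BASE = "http://example.org/lexicon/"  # hypothetical namespace for the example's own resources


def _literal(text: str, lang: str) -> str:
    """Quote text as a language-tagged Turtle literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def lexical_entry_turtle(entry_id: str, lemma: str, other_form: str,
                         reference: str, lang: str = "en") -> str:
    """Return Turtle for one LexicalEntry with a canonical Form, one other Form,
    and one LexicalSense whose ontolex:reference is an external IRI."""
    return "\n".join([
        f"@prefix ontolex: <{ONTOLEX}> .",
        f"@prefix : <{BASE}> .",
        "",
        # The entry links to its forms and its sense.
        f":{entry_id} a ontolex:LexicalEntry ;",
        f"    ontolex:canonicalForm :{entry_id}_canonical ;",
        f"    ontolex:otherForm :{entry_id}_other ;",
        f"    ontolex:sense :{entry_id}_sense1 .",
        "",
        f":{entry_id}_canonical a ontolex:Form ;",
        f"    ontolex:writtenRep {_literal(lemma, lang)} .",
        "",
        f":{entry_id}_other a ontolex:Form ;",
        f"    ontolex:writtenRep {_literal(other_form, lang)} .",
        "",
        # The sense ties the entry to an entity in an external ontology.
        f":{entry_id}_sense1 a ontolex:LexicalSense ;",
        f"    ontolex:reference <{reference}> .",
    ]) + "\n"


if __name__ == "__main__":
    print(lexical_entry_turtle("cat", "cat", "cats", "http://dbpedia.org/resource/Cat"))
```

The entry itself carries no meaning; meaning enters only through the sense and its reference, which is what lets several entries in different languages share one ontology entity.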
Governance & Maintenance
OntoLex-Lemon is maintained by the W3C Ontology-Lexica Community Group, which continues to develop extension modules beyond the 2016 report. One such addition is the FrAC (Frequency, Attestation and Corpus Information) module, which covers corpus-derived information such as frequencies and attestations of lexical elements. The community group holds regular meetings and publishes its work through the W3C Community Group process.
Notable Implementations
The most prominent deployment of OntoLex-Lemon is Wikidata, which introduced lexicographic data in 2018 with a lexeme model (lexemes, forms, and senses) based on OntoLex-Lemon; its RDF exports type lexemes as ontolex:LexicalEntry. Wikidata now contains millions of lexemes modeled this way, covering hundreds of languages. OntoLex-Lemon is also used by DBnary (a multilingual lexical resource extracted from Wiktionary), by many datasets in the LLOD (Linguistic Linked Open Data) cloud, and by European linguistic infrastructure projects such as ELEXIS (European Lexicographic Infrastructure).
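One way to see this modeling in practice is to query the Wikidata Query Service for resources typed as `ontolex:LexicalEntry`. The Python sketch below only builds a SPARQL query and a GET request URL; it does not contact the network. It assumes `wd:Q1860` (the Wikidata item for English) as the language filter and declares its prefixes explicitly rather than relying on the endpoint's built-in ones; the function names and the `limit` parameter are hypothetical.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Prefixes declared explicitly so the query does not depend on the
# endpoint's predefined prefix list.
SPARQL_PREFIXES = """PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
"""


def english_lexemes_query(limit: int = 10) -> str:
    """SPARQL selecting English lexemes (wd:Q1860) and their lemmas."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return SPARQL_PREFIXES + (
        "SELECT ?lexeme ?lemma WHERE {\n"
        "  ?lexeme a ontolex:LexicalEntry ;\n"
        "          dct:language wd:Q1860 ;\n"
        "          wikibase:lemma ?lemma .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(query: str) -> str:
    """Build a GET URL asking the endpoint for SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(query_url(english_lexemes_query(5)))
```

Fetching the results is left out on purpose; a client would issue the request with, for example, `urllib.request` and a descriptive User-Agent header, as Wikimedia's User-Agent policy asks.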
Related Standards
- SKOS -- A W3C Recommendation used for simpler thesaurus-style vocabularies; OntoLex-Lemon provides richer lexical modeling, and its LexicalConcept class is defined as a subclass of skos:Concept
- LexInfo -- An ontology of linguistic categories that extends OntoLex-Lemon with detailed morphological and syntactic properties
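The following standard-library Python sketch shows how these vocabularies typically combine with the core model: LexInfo categories annotate an entry (part of speech) and a form (grammatical number), while the entry evokes an `ontolex:LexicalConcept` that carries a `skos:definition`. The `example.org` resources and the definition text are hypothetical, and the LexInfo 3.0 namespace IRI is an assumption to check against the LexInfo version in use.

```python
TURTLE_PREFIXES = {
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "lexinfo": "http://www.lexinfo.net/ontology/3.0/lexinfo#",  # assumed LexInfo 3.0 namespace
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "": "http://example.org/lexicon/",  # hypothetical namespace
}

# (subject, predicate, object) triples written in prefixed Turtle notation.
TRIPLES = [
    (":cat", "a", "ontolex:LexicalEntry"),
    (":cat", "lexinfo:partOfSpeech", "lexinfo:noun"),
    (":cat", "ontolex:otherForm", ":cat_form_pl"),
    (":cat", "ontolex:evokes", ":concept_cat"),
    (":cat_form_pl", "a", "ontolex:Form"),
    (":cat_form_pl", "ontolex:writtenRep", '"cats"@en'),
    (":cat_form_pl", "lexinfo:number", "lexinfo:plural"),
    (":concept_cat", "a", "ontolex:LexicalConcept"),
    (":concept_cat", "skos:definition", '"a small domesticated feline"@en'),
]


def to_turtle(triples, prefixes) -> str:
    """Serialize triples as Turtle, grouping predicate-object pairs by subject."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items()]
    grouped: dict[str, list[str]] = {}
    for s, p, o in triples:
        grouped.setdefault(s, []).append(f"{p} {o}")  # dicts keep first-seen subject order
    for subject, pairs in grouped.items():
        lines.append("")
        lines.append(f"{subject} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle(TRIPLES, TURTLE_PREFIXES))
```

Because LexicalConcept specializes skos:Concept, SKOS properties such as `skos:definition` apply to it directly, so an existing SKOS thesaurus can be linked to lexical entries without being remodeled.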