
Gene Ontology

GO

By GO

The most widely used biomedical ontology, providing a structured, controlled vocabulary for annotating gene and gene product attributes across all species. GO comprises three sub-ontologies — molecular function, biological process, and cellular component — organized as a directed acyclic graph with over 45,000 terms. Maintained by a large international consortium, GO annotations are used in genomic databases worldwide to enable functional analysis, enrichment studies, and cross-species comparison of gene function.

Overview

The Gene Ontology (GO) is the most widely adopted biomedical ontology in use today, providing a unified framework for describing the functions, processes, and cellular locations associated with gene products across all species. Since its creation in 1998, GO has become an indispensable tool in genomics and bioinformatics, with millions of annotations connecting gene products to standardized functional descriptions.

Background

The Gene Ontology project was founded in 1998 by researchers studying three model organisms: Drosophila melanogaster (fruit fly), Mus musculus (mouse), and Saccharomyces cerevisiae (yeast). The motivation was straightforward: while gene nomenclature conventions varied across species, the underlying biological functions were often conserved. A shared vocabulary for functional attributes would enable meaningful cross-species comparison and data integration.

The initiative rapidly grew beyond its original three organisms. The Gene Ontology Consortium now includes contributions from dozens of model organism databases and multi-species protein databases worldwide. GO was one of the initial candidate members of the OBO Foundry, which coordinates the development of interoperable biomedical ontologies.

Purpose & Scope

GO provides a controlled vocabulary organized into three complementary sub-ontologies:

  • Molecular Function — elemental activities of gene products at the molecular level, such as catalytic activity or binding
  • Biological Process — ordered sets of molecular events with a defined beginning and end, such as signal transduction or DNA repair
  • Cellular Component — the parts of a cell or its extracellular environment where gene products are active, such as the nucleus or plasma membrane

Each GO term has a unique identifier consisting of the prefix GO: followed by seven digits (e.g., GO:0000016), a human-readable name, a textual definition with cited sources, and defined relationships to other terms, such as is_a and part_of. The ontology is structured as a directed acyclic graph, so a term can have more than one parent term.
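Because the ontology is a directed acyclic graph rather than a tree, collecting every ancestor of a term means following all parent links, not a single chain. The sketch below is a minimal illustration that assumes a toy, hypothetical parent map. Its identifiers are placeholders, not real GO IDs. It gathers ancestors with a breadth-first walk.

```python
from collections import deque

# Hypothetical toy fragment: each key maps a term to its direct parents.
# Identifiers are placeholders, not real GO IDs.
PARENTS: dict[str, list[str]] = {
    "root": [],
    "B": ["root"],
    "C": ["root"],
    "D": ["B", "C"],  # D has two parents, which a tree could not express
    "E": ["D"],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every term reachable by following parent edges upward (excluding term itself)."""
    seen: set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("E", PARENTS)))  # ['B', 'C', 'D', 'root']
```

In GO practice, an annotation to a term is generally taken to imply annotation to its ancestors (the "true path rule"). Ancestor sets like this one are therefore computed before enrichment or summary counts. Real tools usually treat relationship types such as is_a and part_of differently when propagating, whereas this sketch treats all parent edges alike.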

Key Statistics

Metric                  Value
Total terms             45,000+
Annotations             6.4 million+
Organisms annotated     4,400+
Sub-ontologies          3

Serializations & Technical Formats

GO is distributed in several formats, including OBO format (historically the default for OBO Foundry ontologies), OWL (Web Ontology Language), and JSON. Term IRIs are formed by appending the numeric part of the identifier to the prefix http://purl.obolibrary.org/obo/GO_ (for example, http://purl.obolibrary.org/obo/GO_0000016). Annotation files use the GO Annotation File (GAF, historically called the gene association file) and Gene Product Association Data (GPAD) formats.
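The Python sketch below makes these formats concrete. It does three things:

- expands a GO CURIE to its PURL-based IRI;
- reads the id, name, namespace, and is_a tags from the [Term] stanzas of OBO text;
- extracts a few columns from one tab-separated GAF line.

The GAF reader assumes the GAF 2.x column order, in which the GO ID, evidence code, aspect, and taxon sit in columns 5, 7, 9, and 13. The sample OBO text and GAF row are hypothetical: GO:9999999, ExampleDB, and REF:placeholder are placeholders. The parser also ignores most OBO syntax (trailing qualifiers, escapes, obsolete terms), so an established OBO parsing library is the safer choice for real files.

```python
from dataclasses import dataclass, field
from typing import Optional

OBO_PURL = "http://purl.obolibrary.org/obo/"

# GAF column 9 ("Aspect") uses one letter per sub-ontology.
GAF_ASPECTS = {"F": "molecular_function", "P": "biological_process", "C": "cellular_component"}


def curie_to_iri(curie: str) -> str:
    """Expand a CURIE such as 'GO:0000016' to 'http://purl.obolibrary.org/obo/GO_0000016'."""
    prefix, local = curie.split(":", 1)
    return f"{OBO_PURL}{prefix}_{local}"


@dataclass
class Term:
    id: str = ""
    name: str = ""
    namespace: str = ""
    is_a: list[str] = field(default_factory=list)


def parse_obo_terms(text: str) -> list[Term]:
    """Collect [Term] stanzas, keeping only the id, name, namespace, and is_a tags."""
    terms: list[Term] = []
    current: Optional[Term] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept ([Typedef] etc. are skipped).
            current = Term() if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or lines inside skipped stanzas
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            current.id = value
        elif tag == "name":
            current.name = value
        elif tag == "namespace":
            current.namespace = value
        elif tag == "is_a":
            # Drop the trailing "! label" comment, e.g. "GO:0003674 ! molecular_function".
            current.is_a.append(value.split("!", 1)[0].strip())
    return terms


def parse_gaf_line(line: str) -> Optional[dict[str, str]]:
    """Extract selected columns from one GAF 2.x data line; return None for '!' or blank lines."""
    if line.startswith("!") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    return {
        "db": cols[0],
        "object_id": cols[1],
        "symbol": cols[2],
        "qualifier": cols[3],
        "go_id": cols[4],                # column 5
        "evidence": cols[6],             # column 7
        "aspect": GAF_ASPECTS[cols[8]],  # column 9
        "taxon": cols[12],               # column 13
    }


# Hypothetical sample data for illustration only.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: GO:0003674
name: molecular_function
namespace: molecular_function

[Term]
id: GO:9999999
name: hypothetical example term
namespace: molecular_function
is_a: GO:0003674 ! molecular_function
"""

SAMPLE_GAF_LINE = "\t".join([
    "ExampleDB", "EX:0001", "geneX", "enables", "GO:0003674", "REF:placeholder",
    "ND", "", "F", "example gene", "", "protein", "taxon:9606",
    "20240101", "ExampleDB", "", "",
])

if __name__ == "__main__":
    for term in parse_obo_terms(SAMPLE_OBO):
        print(term.id, curie_to_iri(term.id), term.is_a)
    print(parse_gaf_line(SAMPLE_GAF_LINE))
```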

Governance & Maintenance

The Gene Ontology Consortium maintains the ontology through a dedicated editorial office supported by contributions from the broader research community. Additions and corrections are suggested by annotators and domain experts, then reviewed by ontology editors. The consortium includes model organism databases (such as FlyBase, Mouse Genome Informatics, and the Saccharomyces Genome Database), multi-species resources (such as UniProt), and software development groups.

GO is funded primarily by the National Human Genome Research Institute (NHGRI) of the US National Institutes of Health, with additional support from international funding bodies.

Notable Implementations

GO annotations are a core component of nearly every major genomic and proteomic database, including UniProt, Ensembl, NCBI Gene, and all major model organism databases. GO enrichment analysis — determining whether a set of genes is statistically enriched for particular GO terms — is one of the most common analytical steps in transcriptomic and genomic studies. AmiGO is the official web-based tool for searching and browsing the GO database.
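Enrichment analysis is often framed as a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. Suppose the background population has N genes, K of which carry a given term. How surprising is it to see k or more carriers in a study set of n genes? The sketch below computes that raw p-value per term using only the standard library.

It assumes annotations have already been propagated to ancestor terms, and it applies no multiple-testing correction. The gene names and the term label "T" are hypothetical. Real enrichment tools differ in background choice, correction method (for example Benjamini–Hochberg), and handling of evidence codes.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N population genes, K annotated, n drawn).

    P(X >= k) = sum_{i=k}^{min(n, K)} C(K, i) * C(N - K, n - i) / C(N, n)
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # math.comb returns 0 when the lower index exceeds the upper, so
    # impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(n, K) + 1))
    return hits / comb(N, n)


def term_enrichment(study: set[str], population: set[str],
                    annotations: dict[str, set[str]]) -> dict[str, float]:
    """Raw (uncorrected) one-sided p-value for every GO term seen in the population.

    `annotations` maps gene -> GO term IDs and is assumed to be propagated to ancestors.
    """
    if not study <= population:
        raise ValueError("study set must be a subset of the population")
    genes_by_term: dict[str, set[str]] = {}
    for gene in population:
        for term in annotations.get(gene, set()):
            genes_by_term.setdefault(term, set()).add(gene)
    N, n = len(population), len(study)
    return {
        term: hypergeom_upper_tail(len(genes & study), n, len(genes), N)
        for term, genes in genes_by_term.items()
    }


# Hypothetical toy data: 10 background genes, 3 of which carry term "T".
population = {f"g{i}" for i in range(1, 11)}
annotations = {"g1": {"T"}, "g2": {"T"}, "g3": {"T"}}
study = {"g1", "g2", "g3"}
print(term_enrichment(study, population, annotations))  # term T: p = 1/120 (about 0.0083)
```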

Related Standards

  • OBO Foundry — GO is a founding member of the Open Biomedical Ontologies Foundry
  • Evidence and Conclusion Ontology (ECO) — provides the evidence codes used in GO annotations
