Cellosaurus

A comprehensive, freely accessible knowledge resource on cell lines, maintained by the SIB Swiss Institute of Bioinformatics as part of the neXtProt project and ExPASy. Cellosaurus provides nomenclature, cross-references to over 100 external resources, species of origin, disease associations, and provenance information for human and animal cell lines used in biomedical research. It serves as the primary reference for cell line identification and authentication in the life sciences.

Overview

Cellosaurus is a comprehensive, freely accessible knowledge resource on cell lines maintained by the SIB Swiss Institute of Bioinformatics. It provides standardized nomenclature, detailed provenance information, and extensive cross-referencing for more than 150,000 human and animal cell lines used in biomedical research. As the most thorough cell line catalog available, it has become an essential tool for researchers who need to identify, authenticate, and correctly cite cell lines in scientific publications.

Background

Cell lines are fundamental tools in biological and medical research, yet their management has long been plagued by problems of misidentification, contamination, and inconsistent naming. Hundreds of research papers have been retracted or questioned due to the use of misidentified cell lines. The Cellosaurus project was initiated to address this problem by creating a single authoritative resource that documents every known cell line with its correct identity, origin, and history. Developed within the framework of the neXtProt project and hosted on the ExPASy bioinformatics resource portal, Cellosaurus has grown from a focused reference list into a comprehensive knowledge base covering over 150,000 cell lines.

Purpose & Scope

Cellosaurus catalogs cell lines from a wide range of species and tissue types, with particularly deep coverage of human cell lines used in cancer research, immunology, and drug development. For each cell line entry, the resource provides:

  • A unique accession number (CVCL identifier) for unambiguous reference
  • Recommended name and synonyms
  • Species of origin and disease association
  • Cross-references to over 100 external databases and resources
  • Literature references
  • Provenance and authentication data, including STR profiles
  • Information on known problematic cell lines (contaminated, misidentified)

The resource explicitly flags cell lines known to be contaminated or misidentified, helping researchers avoid using compromised materials.

Data Model

Field             Description
Accession         Unique CVCL identifier (e.g., CVCL_0030 for HeLa)
Name              Recommended cell line name
Synonyms          Alternative names and identifiers
Species           Species of origin
Disease           Associated disease or condition
Cross-references  Links to external databases (ATCC, DSMZ, RIKEN, etc.)
STR Profile       Short tandem repeat authentication data
Comments          Provenance, contamination warnings, and other notes
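
The fields above could be modeled in code roughly as follows. This is a minimal sketch, not an official Cellosaurus schema: the class name `CellLineEntry`, the attribute names, and the container types are hypothetical, and the accession pattern (the prefix `CVCL_` followed by four uppercase letters or digits) is an assumption based on identifiers such as CVCL_0030.

```python
import re
from dataclasses import dataclass, field

# Assumed accession shape: "CVCL_" plus four uppercase letters or digits.
ACCESSION_RE = re.compile(r"CVCL_[A-Z0-9]{4}")


@dataclass
class CellLineEntry:
    """Hypothetical in-memory record mirroring the Data Model fields."""

    accession: str
    name: str
    synonyms: list[str] = field(default_factory=list)
    species: str = ""
    disease: str = ""
    # Database name -> identifier in that database (simplified to one per database).
    cross_references: dict[str, str] = field(default_factory=dict)
    # STR marker name -> observed alleles.
    str_profile: dict[str, list[str]] = field(default_factory=dict)
    comments: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything that does not look like a CVCL accession.
        if not ACCESSION_RE.fullmatch(self.accession):
            raise ValueError(f"not a CVCL accession: {self.accession!r}")


hela = CellLineEntry(
    accession="CVCL_0030",
    name="HeLa",
    species="Homo sapiens",
    cross_references={"ATCC": "CCL-2"},
)
```

Validating the accession at construction time means a recommended name such as "HeLa" cannot be passed by mistake where the identifier is expected.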

Serializations & Technical Formats

Cellosaurus data is available in multiple formats for computational use. The primary distribution formats are an OBO flat file and an XML export, both available via FTP from the ExPASy server. A web-based search interface supports interactive lookup, and a REST API provides programmatic access to individual records. The data is also available as RDF for integration with Semantic Web resources.
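
As an illustration of working with the OBO distribution, the sketch below collects the tag-value pairs of each `[Term]` stanza. It relies only on the general OBO flat-file layout (stanza headers in square brackets, one `tag: value` pair per line). The sample text is a hand-written illustration rather than an excerpt from the real file, the function name `parse_obo_terms` is hypothetical, and the tags that the actual Cellosaurus file uses may differ. Trailing `!` comments and escaped characters are not handled.

```python
from collections import defaultdict

# Hand-written illustrative stanza, not copied from the distributed file.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: CVCL_0030
name: HeLa
synonym: "HELA" RELATED []
xref: NCBI_TaxID:9606
"""


def parse_obo_terms(text: str) -> list[dict[str, list[str]]]:
    """Return one dict per [Term] stanza, mapping each tag to its values."""
    terms: list[defaultdict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = defaultdict(list) if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            # Tags such as synonym and xref may repeat, so values are lists.
            tag, value = line.split(": ", 1)
            current[tag].append(value)
    return [dict(term) for term in terms]


terms = parse_obo_terms(SAMPLE_OBO)
```

For anything beyond a quick inspection, an established OBO parsing library is likely a better choice than this hand-rolled loop.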

Governance & Maintenance

Cellosaurus is maintained by the CALIPHO group at SIB Swiss Institute of Bioinformatics, led by Amos Bairoch. The resource is updated regularly with new cell line entries, corrections, and expanded cross-references. It is released under a Creative Commons Attribution 4.0 license, allowing free reuse with attribution. Major cell line repositories such as ATCC, DSMZ, JCRB, and RIKEN collaborate by providing data feeds.

Notable Implementations

Cellosaurus accession numbers are increasingly used as standard identifiers in scientific journals. Several publishers and funding agencies recommend citing cell lines using Cellosaurus identifiers. The resource is cross-referenced by major biomedical databases including UniProt, ChEMBL, and the Catalogue of Somatic Mutations in Cancer (COSMIC). It is also registered in FAIRsharing as a recognized bioinformatics resource.

Related Standards

  • OBO Foundry -- Cellosaurus uses OBO format for one of its distribution files
  • MIRIAM/Identifiers.org -- Cellosaurus accession numbers are registered as a MIRIAM data type
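
Registration with Identifiers.org means an accession can be written as a compact identifier and turned into a resolvable URL. The sketch below assumes the registered prefix is `cellosaurus` and that the resolver accepts URLs of the form `https://identifiers.org/<prefix>:<accession>`; both should be checked against the Identifiers.org registry before use, and the function names are hypothetical.

```python
def compact_identifier(accession: str, prefix: str = "cellosaurus") -> str:
    """Build a prefix:accession compact identifier (the prefix is an assumption)."""
    return f"{prefix}:{accession}"


def resolver_url(accession: str) -> str:
    """Build an Identifiers.org resolver URL for a Cellosaurus accession."""
    return f"https://identifiers.org/{compact_identifier(accession)}"


hela_url = resolver_url("CVCL_0030")
```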

Further Reading