Bioschemas is a community-driven initiative that extends Schema.org to improve the findability of life sciences resources on the web. Rather than creating a standalone vocabulary, Bioschemas works within the Schema.org ecosystem by proposing new types and properties for biological entities and by defining usage profiles that constrain Schema.org's broad vocabulary to the specific needs of the life sciences community.
Background
Bioschemas started as a community effort in November 2015, bringing together representatives from a wide variety of institutions across the life sciences. The initiative grew from the recognition that while Schema.org provided an excellent framework for structured web markup, it lacked the specificity needed to adequately describe biological databases, proteins, genes, chemical substances, training materials, and other resources central to life sciences research.
A notable milestone came with the Schema.org version 13.0 release, in which six Bioschemas types were added to Schema.org's pending area: BioChemEntity, ChemicalSubstance, Gene, MolecularEntity, Protein, and Taxon. This marks a successful upstream contribution of community-developed extensions to Schema.org, although terms in the pending area are still under review and are not yet part of the core vocabulary.
Purpose & Scope
Bioschemas aims to make life sciences resources (including datasets, software, training materials, samples, and molecular entities) more findable by search engines and other services. It achieves this through two main contributions:
- New types and properties: proposed extensions to Schema.org for life science resources that no existing type describes adequately.
- Usage profiles: profiles over Schema.org types that specify which properties are mandatory (minimum), recommended, or optional, along with cardinality constraints and preferred domain ontologies for property values.
For example, the Schema.org Dataset type has over 100 available properties. The Bioschemas Dataset profile reduces this to 5 mandatory and 8 recommended properties, focusing on those that are meaningful for scientific datasets and that enable discovery through tools like Google Dataset Search.
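The way a profile narrows a broad type can be sketched in a few lines of Python. This is a minimal illustration, not Bioschemas tooling: the `Profile` class, the `check` function, and the property sets in `DATASET_SKETCH` are all hypothetical, and the listed properties are placeholders rather than the published Dataset profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical, simplified stand-in for a usage profile."""
    schema_type: str
    mandatory: frozenset
    recommended: frozenset
    single_valued: frozenset = frozenset()  # properties with cardinality ONE


# Illustrative property sets only, NOT the published Dataset profile.
DATASET_SKETCH = Profile(
    schema_type="Dataset",
    mandatory=frozenset({"name", "description", "url"}),
    recommended=frozenset({"license", "keywords"}),
    single_valued=frozenset({"name", "url"}),
)


def check(markup: dict, profile: Profile) -> dict:
    """Report how a JSON-LD-style dict measures up against a profile."""
    # Ignore JSON-LD keywords such as @context and @type, and empty values.
    present = {
        key for key, value in markup.items()
        if not key.startswith("@") and value not in (None, "", [])
    }
    return {
        "type_ok": markup.get("@type") == profile.schema_type,
        "missing_mandatory": sorted(profile.mandatory - present),
        "missing_recommended": sorted(profile.recommended - present),
        # A list value violates a cardinality-ONE constraint.
        "cardinality_errors": sorted(
            prop for prop in profile.single_valued
            if isinstance(markup.get(prop), list)
        ),
    }
```

A record that lacks a mandatory property would typically be treated as non-conformant, whereas a missing recommended property would only reduce the quality of the description.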
Key Profiles
Bioschemas defines profiles for various life sciences resource types, built on top of both existing Schema.org types and new Bioschemas types:
| Profile | Based On | Purpose |
|---|---|---|
| Dataset | schema:Dataset | Biological datasets |
| Gene | bioschemas:Gene | Gene information |
| Protein | bioschemas:Protein | Protein data |
| Taxon | bioschemas:Taxon | Taxonomic information |
| TrainingMaterial | schema:LearningResource | Life sciences training |
| Course | schema:Course | Training courses |
| Tool | schema:SoftwareApplication | Bioinformatics tools |
Serializations & Technical Formats
Bioschemas markup uses JSON-LD embedded in web pages, following Schema.org conventions. This allows search engines and specialized harvesters to discover and index structured metadata directly from resource web pages.
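As a rough sketch of both sides of this exchange, the Python below embeds a JSON-LD object in a page as a `script` element of type `application/ld+json` and then recovers it the way a simple harvester might. It uses only the standard library; the helper names (`to_script_tag`, `JsonLdExtractor`, `extract_jsonld`) and the dataset values are assumptions made for illustration.

```python
import json
from html.parser import HTMLParser


def to_script_tag(markup: dict) -> str:
    """Serialize markup as a JSON-LD script element for a page head."""
    # Escape "</" so the payload cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(markup, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return all JSON-LD objects embedded in an HTML document."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Placeholder values; example.org is not a real resource.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example protein interaction dataset",
    "url": "https://example.org/datasets/1",
}
page = f"<html><head>{to_script_tag(dataset)}</head><body></body></html>"
harvested = extract_jsonld(page)
```

Because the metadata travels inside the page itself, a harvester needs nothing beyond the page URL to pick it up.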
Governance & Maintenance
Bioschemas operates as an open community initiative, with monthly calls that alternate between community and technical discussions and with participation from researchers, developers, and data managers worldwide. The community coordinates through GitHub and mailing lists. Bioschemas is a flagship initiative of ELIXIR, the European life sciences infrastructure for biological information, and a key component of its 2024-2028 Scientific Programme.
The initiative has received funding through the ELIXIR-EXCELERATE grant and ELIXIR Implementation Studies. Its use has been endorsed by the European Research Council in its Open Research Data policy and by the International Society for Biocuration.
Notable Implementations
Bioschemas markup is deployed across the ELIXIR infrastructure, in biological databases, training portals, and tool registries. The structured markup enables these resources to be discovered through Google Dataset Search and other Schema.org-aware services.
Related Standards
- Schema.org: the vocabulary that Bioschemas extends and profiles.
- Darwin Core: a complementary biodiversity data standard used for occurrence records.