MARCXML is the XML representation of MARC 21 records developed by the Library of Congress. It provides a standardized way to express the full content of MARC records in XML, enabling interoperability with modern web technologies while preserving the complete fidelity of the original MARC data structure.
Background
In the early 2000s, as XML became the dominant data interchange format on the web, the Library of Congress Network Development and MARC Standards Office recognized the need for a standard way to represent MARC 21 data in XML. Earlier efforts had produced SGML DTDs for MARC in the mid-1990s, but these were extremely large because they mapped every individual MARC data element to a separate element of its own. MARCXML took a different path: its slim schema mirrors the generic MARC record structure (leader, control fields, data fields with indicators, and subfields) rather than enumerating every possible tag and subfield code.
Purpose & Scope
MARCXML serves as a bridge between the traditional MARC binary format (ISO 2709) and XML-based systems. The framework is designed to be flexible and extensible, allowing institutions to work with MARC data in ways specific to their needs. Key use cases include data exchange between library systems, transformation pipelines (e.g., converting MARC to MODS or Dublin Core), and long-term preservation of bibliographic data in an open, text-based format.
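As a rough illustration of that bridging role, the sketch below converts a single ISO 2709 record into a MARCXML <record> element using only the Python standard library. It is not the Library of Congress toolkit. The function name iso2709_to_marcxml and the helper _el are hypothetical, and the code assumes a UTF-8 encoded record with the usual MARC 21 layout (24-byte leader, 12-byte directory entries, two indicators per data field). It does no MARC-8 decoding and no error recovery.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
FIELD_END = b"\x1e"       # ISO 2709 field terminator
RECORD_END = b"\x1d"      # ISO 2709 record terminator
SUBFIELD_DELIM = b"\x1f"  # ISO 2709 subfield delimiter


def _el(parent, name, attrib=None):
    """Create a namespaced child element (hypothetical helper for this sketch)."""
    return ET.SubElement(parent, f"{{{MARC_NS}}}{name}", attrib or {})


def iso2709_to_marcxml(raw: bytes) -> ET.Element:
    """Convert one UTF-8 ISO 2709 record to a MARCXML <record> element."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])       # leader positions 12-16: base address of data
    directory = raw[24:base - 1]    # exclude the directory's own terminator

    record = ET.Element(f"{{{MARC_NS}}}record")
    _el(record, "leader").text = leader

    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # Slice the field out of the data area, dropping its terminator.
        data = raw[base + start:base + start + length - 1]
        if tag.startswith("00"):
            # Control fields (00X) have no indicators or subfields.
            _el(record, "controlfield", {"tag": tag}).text = data.decode("utf-8")
            continue
        indicators, *chunks = data.split(SUBFIELD_DELIM)
        inds = indicators.decode("ascii")
        field = _el(record, "datafield",
                    {"tag": tag, "ind1": inds[0], "ind2": inds[1]})
        for chunk in chunks:
            if chunk:  # first byte is the subfield code, the rest is the value
                sub = _el(field, "subfield", {"code": chr(chunk[0])})
                sub.text = chunk[1:].decode("utf-8")
    return record


# Build a tiny two-field ISO 2709 record by hand for demonstration.
fields = [("001", b"12345"), ("245", b"10" + SUBFIELD_DELIM + b"aA title")]
directory, body = b"", b""
for tag, data in fields:
    data += FIELD_END
    directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
    body += data
directory += FIELD_END
base = 24 + len(directory)
total = base + len(body) + 1
leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
raw = leader + directory + body + RECORD_END

ET.register_namespace("", MARC_NS)  # serialize with a default namespace
print(ET.tostring(iso2709_to_marcxml(raw), encoding="unicode"))
```

The converter never looks up what a tag or subfield code means. It only follows the leader, directory, and delimiters, which is the same generic view of a record that the slim schema takes.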
The schema itself is intentionally generic: it represents the structure of a MARC record without constraining which tags or subfield codes are valid. Validation of MARC content rules is handled separately through additional stylesheets.
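To illustrate the split between structural and content validation, the following sketch applies two content rules to an already-parsed record. The Library of Congress validation is done with XSLT; Python is used here only for illustration. The function name check_content and the rule table are hypothetical, and the two rules shown (001 and 245 are each required and not repeatable in a bibliographic record) are a small sample, not the full MARC 21 rule set.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Hypothetical rule table for this sketch: tag -> (required, repeatable).
RULES = {"001": (True, False), "245": (True, False)}


def check_content(record: ET.Element) -> list[str]:
    """Return human-readable content-rule violations for one <record>."""
    counts = Counter(
        field.get("tag")
        for field in record
        if field.tag in (MARC_NS + "controlfield", MARC_NS + "datafield")
    )
    problems = []
    for tag, (required, repeatable) in RULES.items():
        if required and counts[tag] == 0:
            problems.append(f"{tag}: required field is missing")
        if not repeatable and counts[tag] > 1:
            problems.append(f"{tag}: field is not repeatable")
    return problems


# Structurally well formed (leader, control field, data field) but has no 245.
xml_text = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="999" ind1=" " ind2=" ">
    <subfield code="z">local note</subfield>
  </datafield>
</record>
"""
print(check_content(ET.fromstring(xml_text)))
# ['245: required field is missing']
```

The record uses only the generic elements, so nothing about its structure signals a problem; the missing title field is caught only by the separate content check.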
Key Components
| Component | Purpose |
|---|---|
| MARC21slim.xsd | Core XML Schema for MARCXML records |
| Conversion stylesheets | XSLT transforms to/from MODS, Dublin Core, OAI MARC, ONIX |
| MARCXML Toolkit | Java tools for MARC-to-XML conversion with full character set support |
| Validation stylesheets | XSLT-based MARC bibliographic validation |
| HTML display stylesheets | Tagged view and English-labeled view for browser rendering |
Technical Architecture
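A MARCXML document represents one or more MARC records using a small set of XML elements: <record>, <leader>, <controlfield>, <datafield>, and <subfield>, with a <collection> wrapper when a document holds more than one record. Each <controlfield> and <datafield> carries a tag attribute holding the three-character MARC tag; a <datafield> also carries ind1 and ind2 attributes for its two indicator values. Subfields are identified by a code attribute. Because these elements correspond directly to the parts of an ISO 2709 record, the structure preserves round-trip fidelity with the binary format.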
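The element structure described above can be read with any namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree module. The sample record is invented for illustration, and the function name read_records and the dictionary layout it returns are assumptions of this example, not part of MARCXML.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample: one record inside a <collection> wrapper.
SAMPLE = """
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">12345</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">A title :</subfield>
      <subfield code="b">a subtitle.</subfield>
    </datafield>
  </record>
</collection>
"""


def read_records(xml_text: str) -> list[dict]:
    """Parse MARCXML text into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    records = []
    # iter() also yields the root itself, so a bare <record> document works too.
    for rec in root.iter("{%s}record" % NS["marc"]):
        records.append({
            "leader": rec.findtext("marc:leader", namespaces=NS),
            "control": {cf.get("tag"): cf.text
                        for cf in rec.findall("marc:controlfield", NS)},
            "data": [
                (df.get("tag"), df.get("ind1"), df.get("ind2"),
                 [(sf.get("code"), sf.text)
                  for sf in df.findall("marc:subfield", NS)])
                for df in rec.findall("marc:datafield", NS)
            ],
        })
    return records


for rec in read_records(SAMPLE):
    for tag, ind1, ind2, subfields in rec["data"]:
        print(tag, ind1, ind2, subfields)
# 245 1 0 [('a', 'A title :'), ('b', 'a subtitle.')]
```

Data fields are kept as an ordered list rather than a dictionary keyed by tag, because tags and subfield codes can repeat within a record and their order carries meaning.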
The framework includes an extensive library of XSLT stylesheets maintained by the Library of Congress for transforming MARCXML to other metadata formats, notably MODS (multiple versions from 3.0 through 3.7) and Dublin Core (in RDF, OAI, and SRW encodings). Reverse transformations from these formats back to MARCXML are also provided.
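The Library of Congress stylesheets are XSLT, and the Python standard library has no XSLT processor, so the sketch below imitates the idea of a crosswalk in plain Python instead of running those stylesheets. The mapping table and the function name to_simple_dc are hypothetical and cover only three fields; the actual crosswalks handle far more fields and cases.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical, heavily reduced mapping: (tag, subfield code) -> DC element name.
CROSSWALK = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
}


def to_simple_dc(record: ET.Element) -> dict[str, list[str]]:
    """Collect Dublin Core style values from one MARCXML <record>."""
    dc: dict[str, list[str]] = {}
    for df in record.findall("marc:datafield", NS):
        for sf in df.findall("marc:subfield", NS):
            element = CROSSWALK.get((df.get("tag"), sf.get("code")))
            if element and sf.text:
                # Trim ISBD punctuation that MARC keeps at the end of subfields.
                dc.setdefault(element, []).append(sf.text.rstrip(" /:;,"))
    return dc


# Invented sample record for demonstration.
record = ET.fromstring("""
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane,</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">A title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
</record>
""")
print(to_simple_dc(record))
# {'creator': ['Doe, Jane'], 'title': ['A title']}
```

Unmapped subfields such as 245 $c are simply dropped here, which shows why a transformation to a simpler format is generally lossy even though MARCXML itself preserves the full record.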
Governance & Maintenance
MARCXML is maintained by the Network Development and MARC Standards Office at the Library of Congress. Updates to the schema track changes in the MARC 21 formats. The official MARCXML page was last updated on February 2, 2022.
Notable Implementations
MARCXML is widely used across the library community as an interchange format. It serves as the primary XML representation for MARC data in systems such as Ex Libris Alma, OCLC WorldCat, and numerous institutional repository and digital library platforms. The format is central to metadata transformation pipelines, particularly the MARC-to-MODS and MARC-to-BIBFRAME conversion workflows maintained by the Library of Congress.
Related Standards
- MARC 21 — the underlying record format that MARCXML encodes
- MODS — a derivative XML schema for which extensive MARCXML-to-MODS conversion stylesheets exist
- MADS — the authority counterpart to MODS, also linked through MARCXML conversions
- BIBFRAME — the Library of Congress successor initiative to MARC, with conversion tools operating on MARCXML as an intermediate format