BagIt File Packaging Format

A set of hierarchical file system conventions designed to support disk-based storage and network transfer of arbitrary digital content. Defined in IETF RFC 8493, a "bag" consists of a payload directory containing the digital content and tag files that document the package, including a manifest of filenames with corresponding checksums for integrity verification. Originally developed through a collaboration between the Library of Congress and the California Digital Library, BagIt is widely adopted in digital preservation and library communities for archival transfers.

Overview

BagIt is a straightforward file packaging convention widely adopted in the digital preservation community for transferring and verifying collections of files. Defined in IETF RFC 8493, it provides a minimal structure --- a payload directory, a manifest of checksums, and a small set of metadata tags --- that makes it easy to package, ship, and validate arbitrary digital content using ordinary filesystem tools.

Background

BagIt emerged from a practical need at the Library of Congress and the California Digital Library in 2007. The California Digital Library needed to transfer several terabytes of web archiving data to the Library of Congress, and existing approaches (XML wrapping, custom scripts) were cumbersome for large-scale transfers. The "bag" concept drew inspiration from work at the University of Tsukuba on the "enclose and deposit" model for mutual depositing of archived resources.

John Kunze wrote up the specification as an IETF Internet-Draft in December 2008. After several revisions and extensive community use, version 1.0 was published as RFC 8493, an Informational RFC, in October 2018.

Purpose & Scope

A BagIt "bag" is a directory that contains:

  • A payload (data/ directory) holding the actual digital content --- files in any format, organised in any subdirectory structure
  • A manifest file listing every payload file alongside its checksum (MD5, SHA-256, SHA-512, etc.) for integrity verification
  • A bagit.txt tag file identifying the directory as a bag, specifying the BagIt version and character encoding

Upon receipt, verification software compares the checksums in the manifest against the actual files, immediately flagging any corruption, truncation, or missing files.
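The packaging and verification steps described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names (sha256_of, make_bag, verify_bag) are hypothetical, only SHA-256 is handled, and the sketch does not check for payload files that are present on disk but absent from the manifest, nor does it handle the filename escaping rules in the RFC.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> None:
    """Write payload files under data/, then bagit.txt and a SHA-256 manifest."""
    lines = []
    for rel_name, content in sorted(files.items()):
        target = bag_dir / "data" / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # One manifest line per payload file: "<checksum>  <path relative to the bag>"
        lines.append(f"{sha256_of(target)}  data/{rel_name}\n")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")


def verify_bag(bag_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every listed file matched."""
    problems = []
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

Calling make_bag on an empty directory and then verify_bag on the result should return an empty list; altering or deleting a payload file afterwards should make verify_bag report it.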

Key Components

File / Directory       | Required           | Description
bagit.txt              | Yes                | Version declaration and encoding
manifest-{alg}.txt     | Yes (at least one) | Payload file checksums
data/                  | Yes                | Payload directory
bag-info.txt           | No                 | Key-value metadata about the bag
tagmanifest-{alg}.txt  | No                 | Checksums for tag files themselves
fetch.txt              | No                 | URLs for files to be fetched remotely
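As a rough illustration of the required entries in the table above, a structural check might look like the following. The function name missing_required_elements is hypothetical, and the sketch only tests for presence; it does not inspect the contents of any file.

```python
from pathlib import Path


def missing_required_elements(bag_dir: Path) -> list[str]:
    """List the required bag elements that are absent from bag_dir."""
    missing = []
    if not (bag_dir / "bagit.txt").is_file():
        missing.append("bagit.txt")
    if not (bag_dir / "data").is_dir():
        missing.append("data/")
    # At least one payload manifest is needed; the pattern is anchored at the
    # start of the name, so tagmanifest-*.txt files do not satisfy it.
    if not any(bag_dir.glob("manifest-*.txt")):
        missing.append("manifest-{alg}.txt")
    return missing
```

An empty return value means the directory has the minimum shape of a bag; the checksums still need to be verified separately.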

The bag-info.txt file supports metadata fields such as Source-Organization, Bag-Count, Bag-Size, Bagging-Date, and External-Identifier, using a simple colon-separated key-value format similar to HTTP headers.
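The key-value layout of bag-info.txt is simple enough to parse by hand. The sketch below assumes one "Label: value" pair per line, treats lines that begin with whitespace as continuations of the previous value, and returns a list of pairs rather than a dictionary so that repeated labels are preserved. The function name parse_bag_info and the sample values are made up for illustration.

```python
def parse_bag_info(text: str) -> list[tuple[str, str]]:
    """Parse bag-info.txt content into (label, value) pairs, in file order."""
    entries: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # A line starting with whitespace continues the previous value.
        if line[0] in " \t" and entries:
            label, value = entries[-1]
            entries[-1] = (label, f"{value} {line.strip()}")
            continue
        # Split on the first colon only, so values may themselves contain colons.
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'Label: value' line: {line!r}")
        entries.append((label.strip(), value.strip()))
    return entries


sample = (
    "Source-Organization: Example Library\n"
    "Bagging-Date: 2024-01-15\n"
    "External-Description: A long description\n"
    "  continued on a second line\n"
    "Bag-Size: 1.2 GB\n"
)
for label, value in parse_bag_info(sample):
    print(f"{label} = {value}")
```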

Serializations & Technical Formats

BagIt is a filesystem convention, not a serialization format. Bags can be transferred as loose directory trees or serialized into archive formats such as ZIP or TAR for transport, though the specification itself (from version 15 onward) does not prescribe serialization. The bag structure relies on cross-platform filesystem naming conventions compatible with both Windows and Unix.
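Serializing a bag for transport can be done with any archive tool. As one possible approach, the sketch below uses Python's standard tarfile module to write a gzip-compressed TAR file. The function name serialize_bag is hypothetical, and keeping the bag's base directory as the single top-level entry is a design choice made here so that unpacking yields exactly one directory.

```python
import tarfile
from pathlib import Path


def serialize_bag(bag_dir: Path, archive_path: Path) -> Path:
    """Pack bag_dir into a .tar.gz whose only top-level entry is the bag itself."""
    # archive_path should live outside bag_dir so the archive is not added to itself.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(bag_dir, arcname=bag_dir.name)
    return archive_path
```

The receiver can unpack the archive with any TAR tool and then validate the resulting directory exactly as it would a bag delivered as a loose directory tree.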

Governance & Maintenance

BagIt is published as an IETF RFC under the IETF Trust license. The specification was authored by John Kunze (California Digital Library), Justin Littman (Stanford University), Elizabeth Madden (Library of Congress), John Scancella (Library of Congress), and Chris Adams (Library of Congress). Community discussion takes place on the Digital Curation mailing list. The GitHub repository hosts the working copy of the specification.

Notable Implementations

BagIt is a de facto standard in the digital preservation community:

  • Library of Congress --- uses BagIt for large-scale content transfers and preservation workflows
  • DPN (Digital Preservation Network) --- adopted BagIt as its transfer format
  • APTrust --- the Academic Preservation Trust requires BagIt packaging for ingest
  • Archivematica --- digital preservation system that can ingest BagIt bags
  • bagit-python and bagit-java --- reference libraries maintained by the Library of Congress
  • Bagger --- GUI tool from the Library of Congress for creating and validating bags

Related Standards

  • METS (Metadata Encoding and Transmission Standard) --- a more complex XML-based packaging and metadata standard sometimes used alongside BagIt
  • OCFL (Oxford Common File Layout) --- a newer filesystem convention for digital preservation that can use BagIt bags as a transfer mechanism
  • WARC --- web archive format; BagIt is often used to package collections of WARC files

Further Reading