Introduction

This Oxford Common File Layout (OCFL) specification describes an application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories.

This specification covers two principal areas:

  1. Structure. A normative specification of the nature of an OCFL Object (the "object-at-rest");
  2. Client Behaviours. A set of recommendations for how OCFL Objects should be acted upon (the "object-in-motion").

Need

The OCFL initiative arose from a need to have well-defined application-independent file management within digital repositories.

A general observation is that the contents of a digital repository -- that is, the digital files and metadata that an institution might wish to manage -- are largely stable. Once content has been accessioned, it is unlikely to change significantly over its lifetime. This is in contrast to the software applications that manage these contents, which are ephemeral, requiring constant updating and replacement. Thus, transitions between application-specific methods of file management, made to support software upgrade and replacement cycles, can be seen as unnecessary and risky: they change long-term, stable objects to suit short-term, ephemeral software.

By providing a specification for the file-and-folder layout on disk, the OCFL attempts to reduce, or even eliminate, the need for these transitions. Because the specification is application-independent, conforming applications will natively 'understand' the underlying file structure without first needing to transition the contents to their own format.

Terminology

Object Specification

An OCFL Object is a group of one or more content bitstreams (data and metadata) and their administrative information, which are together identified by a URI. The object may contain a sequence of versions of the bitstreams that represent the evolution of the object's contents.

Basic Structure

The basic OCFL Object structure has a minimal set of files and folders necessary to support data storage and object validation. The minimum required is shown in the following figure:

[object_root]
    ├── 0=ocfl_object_1.0
    ├── inventory.jsonld
    ├── inventory.jsonld.sha512
    ├── logs
    │   └── .keep
    └── v1
        ├── inventory.jsonld
        └── inventory.jsonld.sha512
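
As an illustration only, the layout in the figure can be checked with a short Python sketch. The names `MINIMAL_LAYOUT` and `missing_entries` are hypothetical and not defined by this specification. The sketch assumes the sha512 sidecar suffix shown in the figure and treats `logs/.keep` as a placeholder rather than a required file.

```python
from pathlib import Path

# Relative paths taken from the figure above. The ".sha512" suffix assumes
# the default digest algorithm; "logs/.keep" is treated as a placeholder.
MINIMAL_LAYOUT = (
    "0=ocfl_object_1.0",
    "inventory.jsonld",
    "inventory.jsonld.sha512",
    "logs",
    "v1/inventory.jsonld",
    "v1/inventory.jsonld.sha512",
)


def missing_entries(object_root: Path) -> list[str]:
    """Return the minimal-layout entries that are absent under object_root."""
    return [
        entry for entry in MINIMAL_LAYOUT
        if not (object_root / entry).exists()
    ]
```

An empty result means every entry from the figure is present. It says nothing about whether the inventory contents are valid.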
        

Object Conformance Declaration

The version declaration MUST be formatted according to the [[!NAMASTE]] specification. It MUST be an empty file in the base directory of the object giving the OCFL object version in the filename. The filename MUST be constructed with a leading zero-equals (0=) string, followed by the string ocfl_object_, followed by the OCFL specification version number. For example, 0=ocfl_object_1.0 for version 1.0 of this specification.
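
The filename rule above could be sketched in Python as follows. The function names `declaration_filename` and `write_declaration` are hypothetical helpers for illustration, not part of this specification.

```python
from pathlib import Path


def declaration_filename(spec_version: str) -> str:
    """Build the declaration filename for a given specification version."""
    # Leading "0=", then the string "ocfl_object_", then the version number.
    return f"0=ocfl_object_{spec_version}"


def write_declaration(object_root: Path, spec_version: str = "1.0") -> Path:
    """Create the declaration file in the base directory of the object."""
    path = object_root / declaration_filename(spec_version)
    # touch() creates an empty file if none exists; it does not truncate
    # a file that is already there.
    path.touch()
    return path
```

For example, `declaration_filename("1.0")` yields `0=ocfl_object_1.0`, matching the example in the text.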

Logs Directory

Top-level Inventory

Top-level Inventory Checksum

Versions Directories

OCFL object content is stored as a sequence of one or more versions. Each object version is stored in a version directory under the object root. The sequence of version numbers is the sequence of positive integers: 1, 2, 3, etc., and the version directory name is constructed by adding the prefix v to the version number.

Implementations SHOULD use version directory names constructed without zero-padding the version number, i.e., v1, v2, v3, etc.

For compatibility with existing filesystem conventions, implementations MAY use zero-padded version numbers, with the following restriction: if zero-padded version numbers are used, then they MUST start with a zero. For example, in an implementation that uses five digits, the version directory names v00001 to v09999 are allowed, but v10000 is not.
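
A minimal sketch of these naming rules in Python follows. The function `is_valid_version_dir` and its `width` parameter are assumptions made for illustration: `width=0` stands for the non-padded convention, and any other value stands for the fixed number of digits in a zero-padded scheme.

```python
import re

# "v" followed by one or more ASCII digits.
_VERSION_DIR = re.compile(r"v([0-9]+)")


def is_valid_version_dir(name: str, width: int = 0) -> bool:
    """Check a version directory name against one naming convention."""
    match = _VERSION_DIR.fullmatch(name)
    if match is None:
        return False
    digits = match.group(1)
    if int(digits) < 1:
        # Version numbers are positive integers.
        return False
    if width == 0:
        # Non-padded convention: no leading zeros.
        return not digits.startswith("0")
    # Zero-padded convention: fixed length and a leading zero.
    return len(digits) == width and digits.startswith("0")
```

With `width=5`, the names `v00001` and `v09999` pass, while `v10000` is rejected because it does not start with a zero.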

The first version of an object defines the naming convention for all versions of the object. All versions of an object MUST use the same naming convention: either a non-padded version number, or a zero-padded version number of consistent length. Operations that add a new version to an object MUST follow the directory naming convention established by earlier versions. In all cases, references to files inside version directories from inventory files MUST use the actual version directory names.
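
One way an operation that adds a version might follow the established convention is sketched below. The function `next_version_dir` is hypothetical. It assumes the existing names have already been validated and share one convention, and that a brand-new object uses the recommended non-padded form.

```python
def next_version_dir(existing: list[str]) -> str:
    """Return the directory name for the next version of an object."""
    if not existing:
        # Assumption: new objects use the recommended non-padded form.
        return "v1"
    digits = [name[1:] for name in existing]  # strip the "v" prefix
    latest = max(int(d) for d in digits)
    sample = digits[0]
    if not sample.startswith("0"):
        # Non-padded convention established by earlier versions.
        return f"v{latest + 1}"
    # Zero-padded convention: keep the same length and a leading zero.
    width = len(sample)
    candidate = f"{latest + 1:0{width}d}"
    if len(candidate) != width or not candidate.startswith("0"):
        raise ValueError("zero-padded version numbers exhausted")
    return "v" + candidate
```

Under these assumptions, `["v1", "v2"]` leads to `v3` and `["v0001", "v0002"]` leads to `v0003`. An object at `v09` in a two-digit scheme cannot receive another version, because `v10` would not start with a zero.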

OCFL Object Inventory

Basic Structure

Digest Algorithm Choice

Hashing plays two roles in an OCFL Object. The first is that it allows for content-addressable storage; that is, for a file to be addressed by its contents, rather than its filename. The second is that it provides for fixity checks to determine whether a file has been corrupted by hardware degradation or by malicious actors.

OCFL Objects SHOULD use SHA512 by default. The choice of SHA512 recognizes that it has no known collision vulnerabilities and is typically less computationally intensive than SHA256 on 64-bit hardware.

However, it is acknowledged that requiring SHA512 may be a significant barrier for legacy content migration and consistency. Implementers may thus choose from the following list of hashing algorithms:

Algorithm  Note
md5        Insecure. Use only for legacy content.
sha1       Insecure. Use only for legacy content.
sha256     Non-truncated only; note performance implications.
sha512     Default choice. Non-truncated forms only.
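
A fixity digest for any of the algorithms listed above could be computed with Python's standard `hashlib` module, as in the sketch below. The names `ALLOWED_ALGORITHMS` and `file_digest` are hypothetical, and the lowercase hexadecimal encoding of the result is an assumption, since this section does not state an encoding.

```python
import hashlib
from pathlib import Path

# Algorithm names as listed above; sha512 is the default choice.
ALLOWED_ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def file_digest(path: Path, algorithm: str = "sha512") -> str:
    """Return the hex digest of a file, reading it in chunks."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with path.open("rb") as stream:
        # Read 1 MiB at a time so large files are not loaded into memory.
        for chunk in iter(lambda: stream.read(1024 * 1024), b""):
            digest.update(chunk)
    # Assumption: lowercase hexadecimal encoding of the full digest.
    return digest.hexdigest()
```

Because the digest depends only on file contents, two files with identical bytes produce the same value regardless of filename, which is what makes content-addressable storage possible.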

Implementers MAY store their file hashes in a system external to their OCFL object stores at the point of ingest, to further safeguard against the possibility of malicious manipulation of file contents and checksums.

Digest Sidecar File

Every occurrence of an inventory file MUST have an accompanying sidecar file stating its checksum. The sidecar filename MUST be of the form inventory.jsonld.HASHALG, where HASHALG is the name of the hashing algorithm chosen for the object. An example might be inventory.jsonld.sha512.

The digest sidecar file MUST contain the hash of the inventory file. This MUST follow the format CHECKSUM inventory.jsonld; that is, the hash of the inventory file, a single space, and the name of the file being checksummed.

The hash of the inventory MUST be computed only after all changes to the inventory have been made, so this sidecar file SHOULD be created as the last step in the versioning process.
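
The sidecar format described above might be written and checked as in the following sketch. The helpers `write_sidecar` and `verify_sidecar` are hypothetical; the hexadecimal digest encoding and the trailing newline are assumptions not stated in this section.

```python
import hashlib
from pathlib import Path


def write_sidecar(inventory: Path, algorithm: str = "sha512") -> Path:
    """Write the digest sidecar for an inventory file; call this last."""
    digest = hashlib.new(algorithm, inventory.read_bytes()).hexdigest()
    sidecar = inventory.with_name(f"{inventory.name}.{algorithm}")
    # Format: CHECKSUM, a single space, then the inventory filename.
    # Assumption: hex digest and a trailing newline.
    sidecar.write_text(f"{digest} {inventory.name}\n", encoding="utf-8")
    return sidecar


def verify_sidecar(sidecar: Path) -> bool:
    """Recompute the inventory digest and compare it with the sidecar."""
    text = sidecar.read_text(encoding="utf-8").strip()
    recorded, name = text.split(" ", 1)
    # The algorithm name is the final suffix of the sidecar filename.
    algorithm = sidecar.suffix.lstrip(".")
    data = sidecar.with_name(name).read_bytes()
    return hashlib.new(algorithm, data).hexdigest() == recorded
```

Calling `write_sidecar` only after the inventory is final reflects the ordering requirement above: any later change to the inventory makes `verify_sidecar` return `False`.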

Object Manifest

Object Versions

Storage Root

Root Conformance Declaration

Examples

Minimal Object

Moab in OCFL Object

Implementation Considerations