This Oxford Common File Layout (OCFL) specification describes an application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories.
This specification covers two principal areas:
The OCFL initiative arose from a need to have well-defined application-independent file management within digital repositories.
A general observation is that the contents of a digital repository (that is, the digital files and metadata that an institution might wish to manage) are largely stable. Once content has been accessioned, it is unlikely to change significantly over its lifetime. This is in contrast to the software applications that manage these contents, which are ephemeral and require constant updating and replacement. Thus, transitions between application-specific methods of file management, made to support software upgrades and replacement cycles, can be seen as unnecessary and risky: they change long-term, stable objects to accommodate short-term, ephemeral software.
By providing a specification for the file-and-folder layout on disk, the OCFL attempts to reduce, or even eliminate, the need for these transitions. Because the specification is application-independent, conforming applications will natively 'understand' the underlying file structure without first needing to transition its contents to their own format.
An OCFL Object is a group of one or more content bitstreams (data and metadata) and their administrative information, which are together identified by a URI. The object may contain a sequence of versions of the bitstreams that represent the evolution of the object's contents.
The basic OCFL Object structure has a minimal set of files and folders necessary to support data storage and object validation. The minimum required is shown in the following figure:
    [object_root]
    ├── 0=ocfl_object_1.0
    ├── inventory.jsonld
    ├── inventory.jsonld.sha512
    ├── logs
    │   └── .keep
    └── v1
        ├── inventory.jsonld
        └── inventory.jsonld.sha512
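As an illustration of the minimal layout in the figure, the following sketch reports which of the expected entries are absent from an object root. It is not a validator: the function name `missing_entries` and the tuple `REQUIRED_ENTRIES` are hypothetical, SHA512 is assumed as the object's hashing algorithm, and only the presence of the `logs` directory is checked, not its `.keep` placeholder.

```python
from pathlib import Path

# Entries taken from the minimal structure figure (assumes sha512 sidecars).
REQUIRED_ENTRIES = (
    "0=ocfl_object_1.0",
    "inventory.jsonld",
    "inventory.jsonld.sha512",
    "logs",
    "v1/inventory.jsonld",
    "v1/inventory.jsonld.sha512",
)


def missing_entries(object_root):
    """Return the required entries that do not exist under object_root."""
    root = Path(object_root)
    return [entry for entry in REQUIRED_ENTRIES if not (root / entry).exists()]
```

An empty list means every entry in the figure is present; it says nothing about whether the inventory contents are valid.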
The version declaration MUST be formatted according to the [[!NAMASTE]] specification. It MUST be an empty
file in the base directory of the object giving the OCFL object version in the filename. The filename MUST
be constructed with a leading zero-equals (0=) string, the string ocfl_object_, followed by the OCFL
specification version number. For example, 0=ocfl_object_1.0 for version 1.0 of this specification.
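The filename rule above can be sketched as follows. This is a minimal illustration, not part of the specification: the helper names `write_declaration` and `declared_version` are hypothetical, and the `MAJOR.MINOR` shape assumed for the version number is inferred only from the `1.0` example.

```python
import re
from pathlib import Path
from typing import Optional

# "0=" prefix, the string "ocfl_object_", then the specification version.
# Assumption: the version number has the form MAJOR.MINOR, as in "1.0".
DECLARATION_RE = re.compile(r"^0=ocfl_object_(\d+\.\d+)$")


def write_declaration(object_root: Path, spec_version: str = "1.0") -> Path:
    """Create the empty version declaration file in the object root."""
    path = object_root / f"0=ocfl_object_{spec_version}"
    path.touch()  # the file stays empty; only its name carries information
    return path


def declared_version(object_root: Path) -> Optional[str]:
    """Return the declared specification version, or None if none is found."""
    for entry in object_root.iterdir():
        match = DECLARATION_RE.match(entry.name)
        if match and entry.is_file() and entry.stat().st_size == 0:
            return match.group(1)
    return None
```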
OCFL object content is stored as a sequence of one or more versions. Each object version is stored in a
version directory under the object root. The sequence of version numbers is the sequence of positive
integers: 1, 2, 3, etc., and the version directory name is constructed by adding the prefix v to the
version number. Implementations SHOULD use version directory names constructed without zero-padding the
version number, i.e. v1, v2, v3, etc.
For compatibility with existing filesystem conventions, implementations MAY use zero-padded version numbers,
with the following restriction: If zero-padded version numbers are used then they MUST start with a zero.
For example, in an implementation that uses five digits, the version directory names v00001 to
v09999 are allowed, but v10000 is not allowed.
The first version of an object defines the naming convention for all versions of the object. All versions of an object MUST use the same naming convention: either a non-padded version number, or a zero-padded version number of consistent length. Operations that add a new version to an object MUST follow the directory naming convention established by earlier versions. In all cases, references to files inside version directories from inventory files MUST use the actual version directory names.
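One way to follow the naming convention set by earlier versions is sketched below. The function `next_version_dir` is a hypothetical helper, and its choice of the non-padded form for a brand-new object is an assumption based on the SHOULD recommendation above.

```python
import re

VERSION_DIR_RE = re.compile(r"^v(\d+)$")


def next_version_dir(existing):
    """Return the next version directory name, given the existing names."""
    if not existing:
        return "v1"  # assumption: new objects use the non-padded form

    digit_strings = []
    for name in existing:
        match = VERSION_DIR_RE.match(name)
        if match is None or int(match.group(1)) < 1:
            raise ValueError(f"not a version directory name: {name!r}")
        digit_strings.append(match.group(1))

    # The first name fixes the convention; every other name must agree.
    padded = digit_strings[0].startswith("0")
    width = len(digit_strings[0])
    for digits in digit_strings:
        if digits.startswith("0") != padded or (padded and len(digits) != width):
            raise ValueError("version directories mix naming conventions")

    next_number = max(int(digits) for digits in digit_strings) + 1
    if not padded:
        return f"v{next_number}"

    candidate = str(next_number).zfill(width)
    # Padded names must keep their length and still start with a zero.
    if len(candidate) > width or not candidate.startswith("0"):
        raise ValueError(f"v{candidate} would break the zero-padding rule")
    return f"v{candidate}"
```

With five-digit padding, `next_version_dir(["v09999"])` raises an error rather than returning `v10000`, which matches the restriction described above.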
Hashing plays two roles in an OCFL Object. The first is that it allows for content-addressable storage; that is, for a file to be addressed by its contents, rather than its filename. The second is that it provides for fixity checks to determine whether a file has become corrupt through hardware degradation or malicious actors.
OCFL Objects SHOULD use SHA512 by default. The choice of SHA512 recognizes that it has no known collision vulnerabilities and is less computationally intensive than SHA256.
However, it is acknowledged that requiring SHA512 may be a significant barrier for legacy content migration and consistency. Implementers may thus choose from the following list of hashing algorithms:
|Algorithm|Note|
|md5|Insecure. Use only for legacy content.|
|sha1|Insecure. Use only for legacy content.|
|sha256|Non-truncated only; note performance implications.|
|sha512|Default choice. Non-truncated forms only.|
Implementers MAY wish to store their file hashes in a system external to their OCFL object stores at the point of ingest, to further safeguard against the possibility of malicious manipulation of file contents and checksums.
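A fixity check of the kind described above might look like the following sketch, which uses Python's standard `hashlib` module. The names `file_digest`, `fixity_ok` and `ALLOWED_ALGORITHMS` are hypothetical, and lowercase hexadecimal encoding of digests is an assumption that the text does not state.

```python
import hashlib

# The algorithm names listed in the table above.
ALLOWED_ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def file_digest(path, algorithm="sha512", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, expected_hex, algorithm="sha512"):
    """Compare a file's current digest with a previously recorded one."""
    # Assumption: digests are compared as lowercase hexadecimal strings.
    return file_digest(path, algorithm) == expected_hex.lower()
```

Recording the value returned by `file_digest` in an external system at ingest, and later passing it to `fixity_ok`, is one way to realize the safeguard suggested above.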
Every occurrence of an inventory file MUST have an accompanying sidecar file stating its checksum. This
sidecar file must be of the form inventory.jsonld.ALGORITHM, where ALGORITHM is the chosen hashing
algorithm for the object. An example might be inventory.jsonld.sha512.
The digest sidecar file MUST contain the hash of the inventory file. This MUST follow the format
CHECKSUM inventory.jsonld; that is, the hash of the inventory file, a single space, and the
name of the file being checksummed.
The hash of the inventory MUST be computed only after all changes to the inventory have been made, so this sidecar file SHOULD be created as the last step in the versioning process.
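The sidecar format can be illustrated with a short sketch that writes and re-checks such a file. The helpers `write_sidecar` and `verify_sidecar` are hypothetical, and two details are assumptions that the text does not fix: the digest is written as lowercase hexadecimal, and the line ends with a newline.

```python
import hashlib
from pathlib import Path


def write_sidecar(inventory_path: Path, algorithm: str = "sha512") -> Path:
    """Write '<hash> <inventory filename>' next to a finished inventory."""
    digest = hashlib.new(algorithm, inventory_path.read_bytes()).hexdigest()
    sidecar = inventory_path.with_name(f"{inventory_path.name}.{algorithm}")
    # Hash, a single space, then the name of the file being checksummed.
    # Assumption: a trailing newline is acceptable.
    sidecar.write_text(f"{digest} {inventory_path.name}\n", encoding="utf-8")
    return sidecar


def verify_sidecar(inventory_path: Path, algorithm: str = "sha512") -> bool:
    """Check that the sidecar matches the current inventory contents."""
    sidecar = inventory_path.with_name(f"{inventory_path.name}.{algorithm}")
    line = sidecar.read_text(encoding="utf-8").strip()
    recorded, _, name = line.partition(" ")
    actual = hashlib.new(algorithm, inventory_path.read_bytes()).hexdigest()
    return name == inventory_path.name and recorded == actual
```

Calling `write_sidecar` only after the inventory has been fully written reflects the ordering recommended above; any later change to the inventory makes `verify_sidecar` return `False` until the sidecar is regenerated.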