Introduction

This Oxford Common File Layout (OCFL) specification describes an application-independent approach to the storage of digital objects in a structured, transparent, and predictable manner. It is designed to promote long-term access and management of digital objects within digital repositories.

This specification is a normative specification which describes the nature of an OCFL Object (the "object-at-rest"). A set of recommendations for how OCFL Objects should be acted upon (the "object-in-motion") can be found in the [[OCFL-Implementation-Notes]].

Need

The OCFL initiative arose from a need to have well-defined application-independent file management within digital repositories.

A general observation is that the contents of a digital repository -- that is, the digital files and metadata that an institution might wish to manage -- are largely stable. Once content has been accessioned, it is unlikely to change significantly over its lifetime. This is in contrast to the software applications that manage these contents, which are ephemeral, requiring constant updating and replacement. Thus, transitions between application-specific methods of file management, undertaken to support software upgrades and replacement cycles, can be seen as unnecessary and risky changes that affect long-term, stable objects for the sake of short-term, ephemeral software.

By providing a specification for the file and directory layout on disk, OCFL attempts to reduce, or even eliminate, the need for these transitions. Because the specification is application-independent, conforming applications will natively "understand" the underlying file structure without first having to transition the contents to their own format.

Terminology

Existing File Path:
The file path of a file on disk or in an object store, relative to the OCFL Object Root. Existing file paths are used in the Manifest within an Inventory.
Digest:
An algorithmic characterization of the contents of a file conforming to a standard digest algorithm.
Inventory:
A file, expressed in JSON, that tracks the history and current state of an OCFL Object.
Logical File Path:
A path that represents a file's location in the logical state of an object. Logical file paths are used in conjunction with a digest to represent the file name for a given bitstream at a given version.
Logical State:
A grouping of logical file paths reflecting the state of the content for a given version.
Logs Directory:
A directory for storing information about the content (e.g., actions performed) that may not necessarily be part of the content itself.
Manifest:
A section of the Inventory listing all files and their digests within an OCFL Object.
OCFL Object:
A group of one or more content files and administrative information, that are together identified by a URI. The object may contain a sequence of versions of the files that represent the evolution of the object's contents.
OCFL Object Root:
The base directory of an OCFL Object, identified by a [[NAMASTE]] file "0=ocfl_object_1.0".
OCFL Storage Root:
A base directory used to store OCFL Objects, identified by a [[NAMASTE]] file "0=ocfl_1.0".
OCFL Version:
The state of an OCFL Object's content which is constructed using the incremental changes recorded in the sequence of corresponding and prior version directories.

OCFL Object

An OCFL Object is a group of one or more content files and administrative information, that are together identified by a URI. The object may contain a sequence of versions of the files that represent the evolution of the object's contents.

A file is defined as a content bitstream that can be stored and transmitted. Directories (also called "folders") allow files to be organized into tree-like hierarchies. The content of an OCFL Object consists of the files, and the directories in which they are organized, stored within the layout described in this specification.

An OCFL Object includes administrative information that identifies a directory as an OCFL Object, and also provides a means of tracking changes to the contents of the object over time.

An OCFL Object is therefore:

  1. A conceptual gathering of all files (data and metadata), the directories they are organized in, and their changes over time, which together form the digital representation of an entity that needs to be managed, in preservation terms, as a single coherent whole (i.e., content); and
  2. A file and directory layout and administrative information on a storage medium that provides a defined structure for the storage of this content, and through which these files and their changes may be understood (i.e., structure).

A key goal of OCFL is the rebuildability of a repository from an OCFL Storage Root without additional information resources. Consequently, a key implementation consideration should be to ensure that OCFL Objects contain all the data and metadata required to achieve this. With reference to the [[OAIS]] model, this would include all the descriptive, administrative, structural, representation and preservation metadata relevant to the object.

A central feature of the OCFL specification is support for versioning. This recognizes that digital objects will change over time, through new requirements, fixes, updates, or format shifts. The specification takes no position on what constitutes a version or a versionable action, but it is recommended that implementers have a clear position on this within their local storage policies.

Object Structure

The OCFL Object structure organizes content files and administrative information in order to support content storage and object validation. The structure for an object with one version is shown in the following figure:

[object_root]
    ├── 0=ocfl_object_1.0
    ├── inventory.json
    ├── inventory.json.sha512
    └── v1
        ├── inventory.json
        ├── inventory.json.sha512
        └── content
            └── ... content files ...

Object Conformance Declaration

The version declaration MUST be formatted according to the [[!NAMASTE]] specification. It MUST be an empty file in the base directory of the object giving the OCFL Object version in the filename. The filename MUST be constructed with a leading zero-equals (0=) string, the string ocfl_object_, followed by the OCFL specification version number. For example, 0=ocfl_object_1.0 for version 1.0 of this specification.
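As a minimal sketch, the declaration can be written and checked with Python's standard library. The function names below are hypothetical, and the check follows this section's requirement that the declaration be an empty file named for version 1.0.

```python
from pathlib import Path

# Declaration filename for version 1.0 of this specification.
OBJECT_DECLARATION = "0=ocfl_object_1.0"


def write_object_declaration(object_root: Path) -> Path:
    """Create the empty NAMASTE declaration in the object's base directory."""
    object_root.mkdir(parents=True, exist_ok=True)
    declaration = object_root / OBJECT_DECLARATION
    declaration.write_bytes(b"")  # the declaration MUST be an empty file
    return declaration


def has_object_declaration(object_root: Path) -> bool:
    """Return True if the base directory holds an empty 0=ocfl_object_1.0 file."""
    declaration = object_root / OBJECT_DECLARATION
    return declaration.is_file() and declaration.stat().st_size == 0
```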

Version Directories

OCFL Object content MUST be stored as a sequence of one or more versions. Each object version is stored in a version directory under the object root. The sequence of version numbers is the sequence of positive integers: 1, 2, 3, etc., and the version directory name is constructed by adding the prefix v.

Implementations SHOULD use version directory names constructed without zero-padding the version number, i.e., v1, v2, v3, etc.

For compatibility with existing filesystem conventions, implementations MAY use zero-padded version directory numbers, with the following restriction: if zero-padded version directory numbers are used, they MUST start with the prefix v followed by a zero. For example, in an implementation that uses five digits for version directory names, v00001 to v09999 are allowed, but v10000 is not.

The first version of an object defines the naming convention for all version directories for the object. All version directories of an object MUST use the same naming convention: either a non-padded version directory number, or a zero-padded version directory number of consistent length. Operations that add a new version to an object MUST follow the version directory naming convention established by earlier versions. In all cases, references to files inside version directories from inventory files MUST use the actual version directory names.
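The naming rules above lend themselves to a mechanical check. The following is a minimal Python sketch; `check_version_dirs` is a hypothetical helper that receives only the names of an object's version directories (the caller is assumed to have excluded other entries such as inventory.json) and treats the first version's name as fixing the convention for the whole object.

```python
import re

# "v" followed by a positive integer, possibly zero-padded (e.g. v1, v00001).
VERSION_DIR = re.compile(r"v([0-9]+)")


def check_version_dirs(names: list[str]) -> list[str]:
    """Return version directory names in version order, or raise ValueError."""
    if not names:
        raise ValueError("an object needs at least one version")
    parsed = []
    for name in names:
        match = VERSION_DIR.fullmatch(name)
        if match is None:
            raise ValueError(f"not a version directory name: {name!r}")
        parsed.append((int(match.group(1)), name))
    parsed.sort()
    numbers = [number for number, _ in parsed]
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError(f"versions do not form the sequence 1..n: {numbers}")
    # The first version fixes the naming convention for the whole object.
    first = parsed[0][1]
    padded, width = first.startswith("v0"), len(first)
    for number, name in parsed:
        if padded and (len(name) != width or not name.startswith("v0")):
            raise ValueError(f"inconsistent zero-padding: {name!r}")
        if not padded and name != f"v{number}":
            raise ValueError(f"unexpected zero-padding: {name!r}")
    return [name for _, name in parsed]
```

With five-digit padding, v10000 fails the check because it no longer starts with v0, which matches the restriction described above.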

Empty directories within a version directory are not permitted. Otherwise empty directories MAY be maintained by creating a .keep file within that directory.

Version directories MUST have a sub-directory called content if there are files present, and SHOULD NOT contain a content sub-directory otherwise. There MUST be no other files or directories as children of a version directory, other than an inventory file, an inventory digest, and a content directory.
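A validator might apply the two rules above (permitted children, no empty directories) roughly as follows. This is a minimal Python sketch with a hypothetical `check_version_dir` helper; it assumes the inventory digest sidecar is any file named inventory.json.ALGORITHM and does not verify which algorithm is used.

```python
from pathlib import Path


def check_version_dir(version_dir: Path) -> list[str]:
    """Return a list of problems found in a single version directory."""
    problems = []
    for child in version_dir.iterdir():
        # Permitted children: inventory.json, its digest sidecar
        # (inventory.json.ALGORITHM), and the content directory.
        if child.name in ("inventory.json", "content"):
            continue
        if child.name.startswith("inventory.json."):
            continue
        problems.append(f"unexpected entry: {child.name}")
    content = version_dir / "content"
    if content.is_dir():
        if not any(p.is_file() for p in content.rglob("*")):
            problems.append("content directory holds no files")
        # Empty directories are not permitted; a .keep file can preserve one.
        for path in sorted(content.rglob("*")):
            if path.is_dir() and not any(path.iterdir()):
                rel = path.relative_to(version_dir).as_posix()
                problems.append(f"empty directory: {rel}")
    return problems
```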

Digests

Digests play two roles in an OCFL Object. The first is that digests allow for content-addressable storage; that is, for a file to be addressed by the digest of its contents, rather than its filename. The second is that digests provide for fixity checks to determine whether a file has become corrupt through hardware degradation, accident, or malicious actors.

OCFL Objects SHOULD use sha512 by default. The choice of sha512 recognizes that it has no known collision vulnerabilities and is less computationally intensive than sha256 on 64-bit processors [[Stop-Using-SHA-256]].

For content addressability, OCFL Objects MUST use either sha256 or sha512, to reduce the likelihood of digest collisions.

However, for legacy content migration and consistency, implementers MAY choose from the following list of digest algorithms for storing fixity values:

md5
Insecure. Use only for legacy fixity values. MD5 algorithm and hex encoding defined by [[!RFC1321]]. For example, the md5 digest of a zero-length bitstream is d41d8cd98f00b204e9800998ecf8427e.
sha1
Insecure. Use only for legacy fixity values. SHA-1 algorithm defined by [[!FIPS-180-4]] and MUST be encoded using hex (base16) encoding [[!RFC4648]]. For example, the sha1 digest of a zero-length bitstream is da39a3ee5e6b4b0d3255bfef95601890afd80709.
sha256
Non-truncated form only; note performance implications. SHA-256 algorithm defined by [[!FIPS-180-4]] and MUST be encoded using hex (base16) encoding [[!RFC4648]]. For example, the sha256 digest of a zero-length bitstream starts e3b0c44298fc1c149afbf4c8996fb92427ae41e4... (64 hex digits long).
sha512
Default choice. Non-truncated form only. SHA-512 algorithm defined by [[!FIPS-180-4]] and MUST be encoded using hex (base16) encoding [[!RFC4648]]. For example, the sha512 digest of a zero-length bitstream starts cf83e1357eefb8bdf1542850d66d8007d620e405... (128 hex digits long).
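The following minimal Python sketch computes a base16 digest for any algorithm in the list above using the standard hashlib module. The `file_digest` name and its 1 MiB chunk size are illustrative choices, not part of this specification. Run against a zero-length file, it reproduces the example values given in the notes above.

```python
import hashlib
from pathlib import Path

# Digest algorithms listed in this section; all are available in hashlib.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def file_digest(path: Path, algorithm: str = "sha512", chunk_size: int = 1 << 20) -> str:
    """Return the lowercase base16 digest of a file, reading it in chunks."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        # Chunked reads keep memory use flat for large content files.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```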

An OCFL Inventory MAY contain a fixity section that can store one or more blocks containing fixity values using multiple digest algorithms. See the section on fixity below for further details.

Non-normative note: Implementers may also store copies of their file digests in a system external to their OCFL Object stores at the point of ingest, to further safeguard against the possibility of malicious manipulation of file contents and digests.

Implementers should be aware that base16 digests are case insensitive. Different tools will generate digests in uppercase or lowercase, and this may lead to case differences between references to a digest and the digest itself within the inventory. If string-based methods are used to work with digests and inventories (as is the case in most common JSON libraries) then extra care must be taken to ensure case-insensitive comparisons are being made.
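One way to make the comparison case-insensitive is to normalize every digest to lowercase before comparing strings. This is a minimal Python sketch; `normalize_digest` and `find_manifest_key` are hypothetical helpers, and the manifest is assumed to be an already-parsed JSON object whose keys are full (non-abbreviated) hex digests.

```python
import re
from typing import Optional

HEX_DIGEST = re.compile(r"[0-9a-fA-F]+")


def normalize_digest(digest: str) -> str:
    """Lowercase a base16 digest so that plain string comparison ignores case."""
    if not HEX_DIGEST.fullmatch(digest):
        raise ValueError(f"not a base16 digest: {digest!r}")
    return digest.lower()


def find_manifest_key(manifest: dict, digest: str) -> Optional[str]:
    """Return the manifest key equal to digest, ignoring case, or None."""
    wanted = normalize_digest(digest)
    for key in manifest:
        if normalize_digest(key) == wanted:
            return key
    return None
```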

Inventory

An OCFL Object Inventory MUST follow the [[!JSON]] structure described in this section and MUST be named inventory.json. The order of entries in both the [[JSON]] objects and arrays used in inventory files has no significance.

Non-normative note: A [[JSON-Schema]] for validating OCFL Object Inventory files is provided at inventory_schema.json.

Basic Structure

An OCFL Object Inventory MUST include the following keys:

id
A unique identifier for the OCFL Object. This MUST be unique in the local context, and SHOULD be either a URI or URN. There is no expectation that values given as URIs are resolvable as URLs.
type
A type for the inventory JSON object. This MUST have the value Object.
digestAlgorithm
The digest algorithm used for calculating digests within the OCFL Object. This SHOULD be sha512, however other values are permitted. See the section on Digests for more information.
head
A value corresponding to the identifier of the most recent, or 'head,' version of the object.

In addition to these keys, there MUST be two other blocks present, manifest and versions, which are discussed in the next two sections.
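A first pass over an inventory might simply confirm that these keys are present. The minimal Python sketch below uses a hypothetical `check_inventory_keys` helper on an already-parsed inventory. Treating "most recent" as the highest-numbered key in versions is this sketch's assumption, as is the v&lt;N&gt; key format it parses.

```python
# Top-level keys this section requires in every inventory.
REQUIRED_KEYS = ("id", "type", "digestAlgorithm", "head", "manifest", "versions")


def check_inventory_keys(inventory: dict) -> list[str]:
    """Return a list of problems with an inventory's basic structure."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in inventory]
    if "type" in inventory and inventory["type"] != "Object":
        problems.append("type must have the value Object")
    versions = inventory.get("versions") or {}
    if "head" in inventory and versions:
        # Assumes version keys are v<N>; int() ignores any zero-padding.
        latest = max(versions, key=lambda name: int(name[1:]))
        if inventory["head"] != latest:
            problems.append(f"head is {inventory['head']!r}, expected {latest!r}")
    return problems
```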

Manifest

The value of the manifest key is a JSON object, with keys corresponding to the digests of every content file in all versions of the OCFL Object. The value for each key is an array containing the existing file paths of files in the OCFL Object that have content with the given digest. Existing file paths within a manifest block MUST be relative to the OCFL Object Root.

Non-normative note: If only one file is stored in the OCFL Object for each digest, fully de-duplicating the content, then there will be only one existing file path for each digest. There may, however, be multiple existing file paths for a given digest if the content was not entirely de-duplicated when constructing the OCFL Object.

An example manifest object for three existing file paths, all in version 1, is shown below:

  "manifest": {
    "7dcc35...c31": [ "v1/content/foo/bar.xml" ],
    "cf83e1...a3e": [ "v1/content/empty.txt" ],
    "ffccf6...62e": [ "v1/content/image.tiff" ]
  }
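A manifest block can be assembled by hashing every file under a version's content directory. This minimal Python sketch covers a single version only; merging its result with earlier versions' entries, and any de-duplication policy, are left to the implementation. The `build_manifest` name is hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(object_root: Path, version: str, algorithm: str = "sha512") -> dict:
    """Map each digest to the existing file paths, relative to the object
    root, of the files stored under <version>/content."""
    manifest: dict = {}
    content = object_root / version / "content"
    for path in sorted(p for p in content.rglob("*") if p.is_file()):
        # read_bytes() keeps the sketch short; stream large files instead.
        digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
        existing_path = path.relative_to(object_root).as_posix()
        manifest.setdefault(digest, []).append(existing_path)
    return manifest
```

If two files with identical content are both stored, the digest maps to two existing file paths, which is the not-fully-de-duplicated case described in the note above.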

Versions

An OCFL Object Inventory MUST include a block for storing versions. This block MUST have the key of versions within the inventory, and it MUST be a JSON object. The keys of this object MUST correspond to the names of the version directories used. The values MUST be another JSON object that describes this version.

Version

A JSON object to describe one OCFL Version, which MUST include the following keys:

type
A type for the version JSON object. This MUST have the value Version.
created
The value of this key MUST be expressed in [[!ISO8601]], and SHOULD include a timezone value or UTC. It SHOULD also be granular to the second level.
message
The value of this key is freeform text, used to record the rationale for creating this version.
user
The value of this key is a JSON object with two keys: name, a human-readable user name; and address, whose value MAY be either an e-mail address or a URI reference to a personal identifier, e.g., an ORCID iD.
state
The value of this key is a JSON object, containing a list of keys and values corresponding to the logical state of the object at that version. The keys of this JSON object are digest values, each of which MUST correspond to an entry in the manifest of the inventory. The value for each key is an array containing logical file path names of files in the OCFL Object state that have content with the given digest.

Non-normative note: The logical state of the object uses content-addressing to map logical file paths to their bitstreams, as expressed in the manifest section of the inventory.

Notably, the version state can be used to provide de-duplication of content within the OCFL Object, by mapping multiple logical file paths to the same content digest in the manifest. Implementers may choose to use this functionality on a case-by-case basis. For example, they may choose to implement renaming (that is, changes in file name, but not file content) by pointing a new logical file path to the digest for content that was added in a previous version. Implementers may also, however, choose to accession a new file with a new name and the same content as a new version. See [[OCFL-Implementation-Notes]].

An example state block is shown below:

  "state": {
    "4d27c8...b53": [ "foo/bar.xml" ],
    "cf83e1...a3e": [ "empty.txt", "empty2.txt" ]
  }

This state block describes an object with 3 files, two of which have the same content (empty.txt and empty2.txt), and one of which is in a sub-directory (bar.xml). The logical state shown as a tree is thus:

    ├── empty.txt
    ├── empty2.txt
    └── foo
        └── bar.xml
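Because state and manifest share digests as keys, resolving a version's logical state to stored bitstreams is a two-step lookup. The minimal Python sketch below uses a hypothetical `resolve_state` helper on a parsed inventory; when the manifest lists several existing file paths for a digest, it assumes any of them is acceptable and takes the first.

```python
def resolve_state(inventory: dict, version: str) -> dict:
    """Map each logical file path in a version to one existing file path."""
    manifest = inventory["manifest"]
    resolved = {}
    for digest, logical_paths in inventory["versions"][version]["state"].items():
        # Every state digest MUST have a corresponding manifest entry.
        if digest not in manifest:
            raise KeyError(f"state digest {digest} missing from manifest")
        existing = manifest[digest][0]  # all listed paths hold identical bytes
        for logical in logical_paths:
            resolved[logical] = existing
    return resolved
```

For the state block above, empty.txt and empty2.txt both resolve to the same existing file, which is how the de-duplication described earlier works in practice.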

Fixity

An OCFL Object inventory MAY include a block for storing fixity checks. This block MUST have the key of fixity within the inventory.

The structure of the fixity section MUST contain a key corresponding to an approved digest algorithm. The value of this key MUST follow the structure of the manifest section; that is, a key corresponding to the digest value, and an array of existing file paths that match that digest.

An example fixity section with md5 and sha1 digests is shown below.

  "fixity": {
    "md5": {
      "184f84e28cbe75e050e9c25ea7f2e939": [ "v1/content/foo/bar.xml" ],
      "2673a7b11a70bc7ff960ad8127b4adeb": [ "v2/content/foo/bar.xml" ],
      "c289c8ccd4bab6e385f5afdd89b5bda2": [ "v1/content/image.tiff" ],
      "d41d8cd98f00b204e9800998ecf8427e": [ "v1/content/empty.txt" ]
    },
    "sha1": {
      "66709b068a2faead97113559db78ccd44712cbf2": [ "v1/content/foo/bar.xml" ],
      "a6357c99ecc5752931e133227581e914968f3b9c": [ "v2/content/foo/bar.xml" ],
      "b9c7ccc6154974288132b63c15db8d2750716b49": [ "v1/content/image.tiff" ],
      "da39a3ee5e6b4b0d3255bfef95601890afd80709": [ "v1/content/empty.txt" ]
    }
  }
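Since each fixity block has the same shape as the manifest, checking it amounts to recomputing every listed digest. This is a minimal Python sketch with a hypothetical `check_fixity` helper; it reads whole files for brevity and compares digests case-insensitively, as recommended in the section on digests.

```python
import hashlib
from pathlib import Path


def check_fixity(object_root: Path, fixity: dict) -> list[str]:
    """Recompute every digest in a fixity block; return mismatching paths.

    fixity maps algorithm name -> {digest: [existing file paths]}.
    """
    failures = []
    for algorithm, block in fixity.items():
        for expected, paths in block.items():
            for rel in paths:
                actual = hashlib.new(algorithm, (object_root / rel).read_bytes()).hexdigest()
                # base16 digests may be stored in upper or lower case.
                if actual != expected.lower():
                    failures.append(f"{algorithm} mismatch: {rel}")
    return failures
```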

Inventory Digest

Every occurrence of an inventory file MUST have an accompanying sidecar file stating its digest. This sidecar file MUST be named inventory.json.ALGORITHM, where ALGORITHM is the chosen digest algorithm for the object; for example, inventory.json.sha512.

The digest sidecar file MUST contain the digest of the inventory file. This MUST follow the format DIGEST inventory.json; that is, the digest of the inventory file, a single space, and the name of the inventory file.

The digest of the inventory MUST be computed only after all changes to the inventory have been made, so this sidecar file SHOULD be created as the last step in the versioning process.
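The sidecar format is simple enough to write and verify in a few lines. This is a minimal Python sketch; the helper names are hypothetical, and writing the line without a trailing newline is this sketch's reading of the "DIGEST inventory.json" format above. `write_sidecar` is meant to be called only once inventory.json is final.

```python
import hashlib
from pathlib import Path


def sidecar_path(inventory_path: Path, algorithm: str) -> Path:
    """inventory.json -> inventory.json.<algorithm>, e.g. inventory.json.sha512."""
    return inventory_path.with_name(f"{inventory_path.name}.{algorithm}")


def write_sidecar(inventory_path: Path, algorithm: str = "sha512") -> Path:
    """Write 'DIGEST inventory.json' once the inventory file is final."""
    digest = hashlib.new(algorithm, inventory_path.read_bytes()).hexdigest()
    sidecar = sidecar_path(inventory_path, algorithm)
    sidecar.write_text(f"{digest} {inventory_path.name}")
    return sidecar


def verify_sidecar(inventory_path: Path, algorithm: str = "sha512") -> bool:
    """Recompute the inventory digest and compare it with the sidecar."""
    recorded, name = sidecar_path(inventory_path, algorithm).read_text().split()
    actual = hashlib.new(algorithm, inventory_path.read_bytes()).hexdigest()
    return name == inventory_path.name and recorded.lower() == actual
```

Any edit to inventory.json after `write_sidecar` runs makes `verify_sidecar` return False, which is why the sidecar should be the last file written.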

Version Inventory and Inventory Digest

Every version directory SHOULD include an inventory file that is an Inventory of all content for versions up to and including that particular version. Every inventory file MUST have a corresponding Inventory Digest.

Non-normative note: Storing an inventory for every version provides redundancy for this critical information in a way that is compatible with storage strategies that have immutable version directories.

Logs Directory

The base directory of the object MAY contain a logs directory, which MAY be empty. Implementers SHOULD use this for storing files that contain a record of actions taken on the object. Since these logs may be subject to local standards requirements, the format of these logs is considered out-of-scope for the OCFL Object. Clients operating on the object MAY log actions here that are not otherwise captured.

Non-normative note: The purpose of the logs directory is to provide implementers with a location for storing local information about actions taken on the OCFL Object's content that is not part of the content itself.

As an example, implementers may have different local requirements to store audit information for their content. Some may wish to store a log entry indicating that an audit was conducted, and nothing was wrong, while others may wish to only store a log entry if an intervention was required.

OCFL Storage Root

An OCFL Storage Root is the base directory of an OCFL storage layout.

Root Structure

An OCFL Storage Root MUST contain a Root Conformance Declaration identifying it as such.

An OCFL Storage Root SHOULD also contain the OCFL specification in human-readable plain-text format in the root.

An OCFL Storage Root MUST NOT contain any files or directories other than those described here.

Although implementations may require multiple OCFL Storage Roots (that is, several logical or physical volumes, or multiple "buckets" in an object store), each OCFL Storage Root MUST be independent.

The following example OCFL Storage Root represents the minimal set of files and folders:

[storage_root]
    ├── 0=ocfl_1.0
    └── ocfl_1.0.txt (optional)
        

Root Conformance Declaration

The OCFL version declaration MUST be formatted according to the [[!NAMASTE]] specification. It MUST be an empty file in the base directory of the OCFL Storage Root giving the OCFL version in the filename. The filename MUST be constructed with a leading zero-equals (0=) string, the string ocfl_, followed by the OCFL specification version number. For example, 0=ocfl_1.0 for version 1.0 of this specification.

Root conformance indicates that the OCFL Storage Root conforms to this section (i.e. the OCFL Storage Root section) of the specification. OCFL Objects within the OCFL Storage Root also include a conformance declaration which MUST indicate OCFL Object conformance to the same or earlier version of the specification.
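The "same or earlier version" rule can be checked by parsing both declaration filenames. This is a minimal Python sketch with hypothetical helper names; it compares (major, minor) as integers so that a hypothetical 1.10 sorts after 1.9, which plain string comparison would get wrong. Only version 1.0 exists in this specification; the other numbers in the test are illustrative.

```python
import re

ROOT_DECLARATION = re.compile(r"0=ocfl_([0-9]+)\.([0-9]+)")
OBJECT_DECLARATION = re.compile(r"0=ocfl_object_([0-9]+)\.([0-9]+)")


def declared_version(filename: str, pattern: re.Pattern) -> tuple:
    """Extract (major, minor) from a conformance declaration filename."""
    match = pattern.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a conformance declaration: {filename!r}")
    return int(match.group(1)), int(match.group(2))


def object_conforms_to_root(root_file: str, object_file: str) -> bool:
    """True if the object declares the same or an earlier specification
    version than the storage root (numeric, not string, comparison)."""
    object_version = declared_version(object_file, OBJECT_DECLARATION)
    root_version = declared_version(root_file, ROOT_DECLARATION)
    return object_version <= root_version
```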

Storage Hierarchies

OCFL Object Roots MUST be stored either as the terminal resource at the end of a directory storage hierarchy or as direct children of a containing OCFL Storage Root.

A common practice is to use a unique identifier scheme to compose this storage hierarchy, typically arranged according to some form of the [[PairTree]] specification. Irrespective of the pattern chosen for the storage hierarchies, the following restrictions apply:

  1. Storage hierarchies MUST NOT include files within intermediate directories
  2. Storage hierarchies MUST be terminated by OCFL Object Roots
  3. Storage hierarchies within the same OCFL Storage Root SHOULD use just one layout pattern
  4. Storage hierarchies within the same OCFL Storage Root SHOULD consistently use either a directory hierarchy of OCFL Objects or top-level OCFL Objects
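Restrictions 1 and 2 can be checked without knowing which layout pattern produced the hierarchy. This minimal Python sketch (the `find_objects` name is hypothetical) walks a storage root, treats any directory holding 0=ocfl_object_1.0 as an OCFL Object Root, and stops descending there. It does not implement PairTree or any other layout, and it skips files directly in the storage root, which may hold the root declaration, the specification text, and extension files.

```python
import os
from pathlib import Path

OBJECT_DECLARATION = "0=ocfl_object_1.0"


def find_objects(storage_root: Path) -> tuple[list[Path], list[str]]:
    """Walk a storage root and return (object roots, problems).

    Files in intermediate directories (restriction 1) and branches that
    end without an object root (restriction 2) are reported as problems.
    """
    objects: list[Path] = []
    problems: list[str] = []
    for dirpath, dirnames, filenames in os.walk(storage_root):
        here = Path(dirpath)
        relative = here.relative_to(storage_root)
        if OBJECT_DECLARATION in filenames:
            objects.append(relative)
            dirnames.clear()  # never descend into an object's own files
            continue
        if here == storage_root:
            continue  # the root may hold its declaration and extension files
        if filenames:
            problems.append(f"files in intermediate directory: {relative.as_posix()}")
        if not dirnames:
            problems.append(f"hierarchy not terminated by an object: {relative.as_posix()}")
    return objects, problems
```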

Extensions

The behavior of the storage root may be extended to support features from other specifications. An OCFL validator MUST ignore any files in the storage root it does not understand. Additional files MUST NOT appear in other directories under the storage root.

Non-normative note: Storage extensions can be used to support additional features, such as providing the storage hierarchy disposition when pairtree is in use, or additional human-readable text about the nature of the storage root.

Examples

Minimal OCFL Object

The following example OCFL Object has content that is a single file (file.txt), and just one version (v1):

[object root]
    ├── 0=ocfl_object_1.0
    ├── inventory.json
    ├── inventory.json.sha512
    └── v1
        ├── inventory.json
        ├── inventory.json.sha512
        └── content
            └── file.txt

The inventory for this OCFL Object, the same both at the top-level and in the v1 directory, might be:

{
  "digestAlgorithm": "sha512",
  "head": "v1",
  "id": "http://example.org/minimal",
  "manifest": {
    "7545b8...f67": [ "v1/content/file.txt" ]
  },
  "type": "Object",
  "versions": {
    "v1": {
      "created": "2018-10-02T12:00:00Z",
      "message": "One file",
      "state": {
        "7545b8...f67": [ "file.txt" ]
      },
      "type": "Version",
      "user": {
        "address": "alice@example.org",
        "name": "Alice"
      }
    }
  }
}

Versioned OCFL Object

The following example OCFL Object has three versions:

[object root]
    ├── 0=ocfl_object_1.0
    ├── inventory.json
    ├── inventory.json.sha512
    ├── v1
    │   ├── inventory.json
    │   ├── inventory.json.sha512
    │   └── content
    │       ├── empty.txt
    │       ├── foo
    │       │   └── bar.xml
    │       └── image.tiff
    ├── v2
    │   ├── inventory.json
    │   ├── inventory.json.sha512
    │   └── content
    │       └── foo
    │           └── bar.xml
    └── v3
        ├── inventory.json
        └── inventory.json.sha512

In v1 there are three files: empty.txt, foo/bar.xml, and image.tiff. In v2, the content of foo/bar.xml is changed, empty2.txt is added with the same content as empty.txt, and image.tiff is removed. In v3, empty.txt is removed and image.tiff is reinstated. As a result of forward-delta versioning, the object tree above shows only the new content added in each version. The inventory shown below details the other changes, and includes additional fixity information using the md5 and sha1 digest algorithms as well as minimal metadata for each version.

{
  "digestAlgorithm": "sha512",
  "fixity": {
    "md5": {
      "184f84e28cbe75e050e9c25ea7f2e939": [ "v1/content/foo/bar.xml" ],
      "2673a7b11a70bc7ff960ad8127b4adeb": [ "v2/content/foo/bar.xml" ],
      "c289c8ccd4bab6e385f5afdd89b5bda2": [ "v1/content/image.tiff" ],
      "d41d8cd98f00b204e9800998ecf8427e": [ "v1/content/empty.txt" ]
    },
    "sha1": {
      "66709b068a2faead97113559db78ccd44712cbf2": [ "v1/content/foo/bar.xml" ],
      "a6357c99ecc5752931e133227581e914968f3b9c": [ "v2/content/foo/bar.xml" ],
      "b9c7ccc6154974288132b63c15db8d2750716b49": [ "v1/content/image.tiff" ],
      "da39a3ee5e6b4b0d3255bfef95601890afd80709": [ "v1/content/empty.txt" ]
    }
  },
  "head": "v3",
  "id": "ark:/12345/bcd987",
  "manifest": {
    "4d27c8...b53": [ "v2/content/foo/bar.xml" ],
    "7dcc35...c31": [ "v1/content/foo/bar.xml" ],
    "cf83e1...a3e": [ "v1/content/empty.txt" ],
    "ffccf6...62e": [ "v1/content/image.tiff" ]
  },
  "type": "Object",
  "versions": {
    "v1": {
      "created": "2018-01-01T01:01:01Z",
      "message": "Initial import",
      "state": {
        "7dcc35...c31": [ "foo/bar.xml" ],
        "cf83e1...a3e": [ "empty.txt" ],
        "ffccf6...62e": [ "image.tiff" ]
      },
      "type": "Version",
      "user": {
        "address": "alice@example.com",
        "name": "Alice"
      }
    },
    "v2": {
      "created": "2018-02-02T02:02:02Z",
      "message": "Fix bar.xml, remove image.tiff, add empty2.txt",
      "state": {
        "4d27c8...b53": [ "foo/bar.xml" ],
        "cf83e1...a3e": [ "empty.txt", "empty2.txt" ]
      },
      "type": "Version",
      "user": {
        "address": "bob@example.com",
        "name": "Bob"
      }
    },
    "v3": {
      "created": "2018-03-03T03:03:03Z",
      "message": "Reinstate image.tiff, delete empty.txt",
      "state": {
        "4d27c8...b53": [ "foo/bar.xml" ],
        "cf83e1...a3e": [ "empty2.txt" ],
        "ffccf6...62e": [ "image.tiff" ]
      },
      "type": "Version",
      "user": {
        "address": "cecilia@example.com",
        "name": "Cecilia"
      }
    }
  }
}
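The changes between versions described above can be derived from the inventory alone by comparing two state blocks keyed by logical path. This is a minimal Python sketch with a hypothetical `diff_states` helper; a path counts as "modified" when it appears in both versions with different digests.

```python
def diff_states(inventory: dict, old: str, new: str) -> dict:
    """Compare two version states by logical file path."""

    def by_path(version: str) -> dict:
        # Invert state (digest -> paths) into logical path -> digest.
        state = inventory["versions"][version]["state"]
        return {path: digest for digest, paths in state.items() for path in paths}

    a, b = by_path(old), by_path(new)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "modified": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }
```

Applied to the inventory above, comparing v1 with v2 reports empty2.txt as added, image.tiff as removed, and foo/bar.xml as modified, matching the description of the changes.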

BagIt in an OCFL Object

[[BagIt]] is a common file packaging specification, but unlike OCFL it does not provide a mechanism for content versioning. Using OCFL, it is possible to store a BagIt structure with content versioning, such that resolving the object state produces a valid BagIt 'bag'. This example illustrates how this can be accomplished, using the basic bag example given in the BagIt specification.

[object root]
    ├── 0=ocfl_object_1.0
    ├── inventory.json
    ├── inventory.json.sha512
    └── v1
        ├── inventory.json
        ├── inventory.json.sha512
        └── content
            └── myfirstbag
                ├── bagit.txt
                ├── data
                │   └── 27613-h
                │       └── images
                │           ├── q172.png
                │           └── q172.txt
                └── manifest-md5.txt

If, for example, a new directory were added in a subsequent version, the OCFL Object would look like this:

[object root]
    ├── 0=ocfl_object_1.0
    ├── inventory.json
    ├── inventory.json.sha512
    ├── v1
    │   ├── inventory.json
    │   ├── inventory.json.sha512
    │   └── content
    │       └── myfirstbag
    │           ├── bagit.txt
    │           ├── data
    │           │   └── 27613-h
    │           │       └── images
    │           │           ├── q172.png
    │           │           └── q172.txt
    │           └── manifest-md5.txt
    └── v2
        ├── inventory.json
        ├── inventory.json.sha512
        └── content
            └── myfirstbag
                ├── data
                │   └── 27614-h
                │       └── images
                │           ├── q173.png
                │           └── q173.txt
                └── manifest-md5.txt

The state of the object at version 2 would be the following BagIt object:

myfirstbag
    ├── bagit.txt
    ├── data
    │   ├── 27613-h
    │   │   └── images
    │   │       ├── q172.png
    │   │       └── q172.txt
    │   └── 27614-h
    │       └── images
    │           ├── q173.png
    │           └── q173.txt
    └── manifest-md5.txt

The OCFL Inventory for this object would be as follows:

{
  "digestAlgorithm": "sha512",
  "head": "v2",
  "id": "urn:uri:example.com/myfirstbag",
  "manifest": {
    "cf83e1...a3e": [ "v1/content/myfirstbag/bagit.txt" ],
    "f15428...83f": [ "v1/content/myfirstbag/manifest-md5.txt" ],
    "85f2b0...007": [ "v1/content/myfirstbag/data/27613-h/images/q172.png" ],
    "d66d80...8bd": [ "v1/content/myfirstbag/data/27613-h/images/q172.txt" ],
    "2b0ff8...620": [ "v2/content/myfirstbag/manifest-md5.txt" ],
    "921d36...877": [ "v2/content/myfirstbag/data/27614-h/images/q173.png" ],
    "b8bdf1...927": [ "v2/content/myfirstbag/data/27614-h/images/q173.txt" ]
  },
  "type": "Object",
  "versions": {
    "v1": {
      "created": "2018-10-09T11:20:29.209164Z",
      "message": "Initial Ingest",
      "state": {
        "cf83e1...a3e": [ "myfirstbag/bagit.txt" ],
        "85f2b0...007": [ "myfirstbag/data/27613-h/images/q172.png" ],
        "d66d80...8bd": [ "myfirstbag/data/27613-h/images/q172.txt" ],
        "f15428...83f": [ "myfirstbag/manifest-md5.txt" ]
      },
      "type": "Version",
      "user": {
        "address": "someone@example.org",
        "name": "Some One"
      }
    },
    "v2": {
      "created": "2018-10-31T11:20:29.209164Z",
      "message": "Added new images",
      "state": {
        "cf83e1...a3e": [ "myfirstbag/bagit.txt" ],
        "85f2b0...007": [ "myfirstbag/data/27613-h/images/q172.png" ],
        "d66d80...8bd": [ "myfirstbag/data/27613-h/images/q172.txt" ],
        "2b0ff8...620": [ "myfirstbag/manifest-md5.txt" ],
        "921d36...877": [ "myfirstbag/data/27614-h/images/q173.png" ],
        "b8bdf1...927": [ "myfirstbag/data/27614-h/images/q173.txt" ]
      },
      "type": "Version",
      "user": {
        "address": "somebody-else@example.org",
        "name": "Somebody Else"
      }
    }
  }
}

Moab in an OCFL Object

[[Moab]] is an archive information package format developed and used by Stanford University. Many of the ideas in Moab have been refined by OCFL, and OCFL is designed to give institutions currently using Moab an easy path to adoption.

Converting content preserved in a Moab object in a way that does not compromise existing Moab access patterns whilst allowing for the eventual use of OCFL-native workflows requires a Moab to OCFL conversion tool. This tool uses the Moab-versioning gem to extract deltas and digests of the Moab data directory for each Moab version and translate those into version state blocks in an OCFL inventory file, which would be placed in the root directory of the Moab object. All extant files in the Moab version directories are tracked in the manifest block. The contents of the Moab manifests directory are not tracked in the version state blocks; they are effectively deleted from the OCFL object without compromising legacy access patterns.

During the transitional period, the OCFL inventory file exists only in the root of the Moab object. Once OCFL-native object creation workflows are in place, future versions of that object will be fully OCFL compliant: new versions will no longer have a manifests directory and will contain an OCFL inventory file. At this stage, OCFL tools will be able to access all versions of the content originally preserved by Moab.

Consider the following sample Moab object:

[object root]
    └── bj102hs9687
        ├── v0001
        │   └── content
        │       ├── data
        │       │   ├── content
        │       │   │   ├── eric-smith-dissertation-augmented.pdf
        │       │   │   └── eric-smith-dissertation.pdf
        │       │   └── metadata
        │       │       ├── contentMetadata.xml
        │       │       ├── descMetadata.xml
        │       │       ├── identityMetadata.xml
        │       │       ├── provenanceMetadata.xml
        │       │       ├── relationshipMetadata.xml
        │       │       ├── rightsMetadata.xml
        │       │       ├── technicalMetadata.xml
        │       │       └── versionMetadata.xml
        │       └── manifests
        │           ├── fileInventoryDifference.xml
        │           ├── manifestInventory.xml
        │           ├── signatureCatalog.xml
        │           ├── versionAdditions.xml
        │           └── versionInventory.xml
        ├── v0002
        │   └── content
        │       ├── data
        │       │   └── metadata
        │       │       ├── contentMetadata.xml
        │       │       ├── embargoMetadata.xml
        │       │       ├── events.xml
        │       │       ├── identityMetadata.xml
        │       │       ├── provenanceMetadata.xml
        │       │       ├── relationshipMetadata.xml
        │       │       ├── rightsMetadata.xml
        │       │       ├── versionMetadata.xml
        │       │       └── workflows.xml
        │       └── manifests
        │           ├── fileInventoryDifference.xml
        │           ├── manifestInventory.xml
        │           ├── signatureCatalog.xml
        │           ├── versionAdditions.xml
        │           └── versionInventory.xml
        └── v0003
            └── content
                ├── data
                │   └── metadata
                │       ├── contentMetadata.xml
                │       ├── descMetadata.xml
                │       ├── embargoMetadata.xml
                │       ├── events.xml
                │       ├── identityMetadata.xml
                │       ├── provenanceMetadata.xml
                │       ├── rightsMetadata.xml
                │       ├── technicalMetadata.xml
                │       ├── versionMetadata.xml
                │       └── workflows.xml
                └── manifests
                    ├── fileInventoryDifference.xml
                    ├── manifestInventory.xml
                    ├── signatureCatalog.xml
                    ├── versionAdditions.xml
                    └── versionInventory.xml

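An OCFL inventory that tracks the data directory would include a manifest like the following. It lists every file present in the Moab version directories, including the files in each manifests directory: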

"manifest": {
    "197320...2e1": [ "v0001/content/manifests/manifestInventory.xml" ],
    "a64b62...5c3": [ "v0001/content/manifests/versionAdditions.xml" ],
    "794d5e...38b": [ "v0001/content/manifests/signatureCatalog.xml" ],
    "a4d20f...b48": [ "v0001/content/manifests/fileInventoryDifference.xml" ],
    "363785...bfb": [ "v0001/content/manifests/versionInventory.xml" ],
    "98114a...588": [ "v0001/content/data/content/eric-smith-dissertation-augmented.pdf" ],
    "7f3d87...15b": [ "v0001/content/data/content/eric-smith-dissertation.pdf" ],
    "6d19f0...064": [ "v0001/content/data/metadata/technicalMetadata.xml" ],
    "6e4be4...375": [ "v0001/content/data/metadata/provenanceMetadata.xml" ],
    "d8a319...d0f": [ "v0001/content/data/metadata/descMetadata.xml" ],
    "de823a...acc": [ "v0001/content/data/metadata/rightsMetadata.xml" ],
    "080617...40c": [ "v0001/content/data/metadata/identityMetadata.xml" ],
    "e15267...58d": [ "v0001/content/data/metadata/versionMetadata.xml" ],
    "0d9e0b...9a2": [ "v0001/content/data/metadata/contentMetadata.xml" ],
    "dd9289...31d": [ "v0001/content/data/metadata/relationshipMetadata.xml" ],
    "f90947...11b": [ "v0002/content/manifests/manifestInventory.xml" ],
    "bb3011...a58": [ "v0002/content/manifests/versionAdditions.xml" ],
    "0dc4fc...f9f": [ "v0002/content/manifests/signatureCatalog.xml" ],
    "2f1cf8...ea1": [ "v0002/content/manifests/fileInventoryDifference.xml" ],
    "45b457...db5": [ "v0002/content/manifests/versionInventory.xml" ],
    "7519c5...63f": [ "v0002/content/data/metadata/provenanceMetadata.xml" ],
    "abda4c...622": [ "v0002/content/data/metadata/workflows.xml" ],
    "76549e...b2b": [ "v0002/content/data/metadata/rightsMetadata.xml" ],
    "bdc4d6...3b6": [ "v0002/content/data/metadata/events.xml" ],
    "7b331c...f9b": [ "v0002/content/data/metadata/identityMetadata.xml" ],
    "80ceac...b9c": [ "v0002/content/data/metadata/versionMetadata.xml" ],
    "4853a2...fbe": [ "v0002/content/data/metadata/contentMetadata.xml" ],
    "1d5090...f5f": [ "v0002/content/data/metadata/relationshipMetadata.xml" ],
    "f209bf...ceb": [ "v0002/content/data/metadata/embargoMetadata.xml" ],
    "04461b...5c6": [ "v0003/content/manifests/manifestInventory.xml" ],
    "190103...20f": [ "v0003/content/manifests/versionAdditions.xml" ],
    "24a84a...82d": [ "v0003/content/manifests/signatureCatalog.xml" ],
    "5f4d52...7ef": [ "v0003/content/manifests/fileInventoryDifference.xml" ],
    "2d7f8d...d67": [ "v0003/content/manifests/versionInventory.xml" ],
    "dd9125...d4b": [ "v0003/content/data/metadata/technicalMetadata.xml" ],
    "d9e177...477": [ "v0003/content/data/metadata/provenanceMetadata.xml" ],
    "4f5908...4f5": [ "v0003/content/data/metadata/workflows.xml" ],
    "e64db0...500": [ "v0003/content/data/metadata/descMetadata.xml" ],
    "05fa51...818": [ "v0003/content/data/metadata/rightsMetadata.xml" ],
    "d70dd8...5ad": [ "v0003/content/data/metadata/events.xml" ],
    "509a2d...dc6": [ "v0003/content/data/metadata/identityMetadata.xml" ],
    "548066...893": [ "v0003/content/data/metadata/versionMetadata.xml" ],
    "93884e...aae": [ "v0003/content/data/metadata/contentMetadata.xml" ],
    "4c5ab4...b02": [ "v0003/content/data/metadata/embargoMetadata.xml" ]
}
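A manifest like this could be produced by hashing every file under the version directories and grouping content paths by digest. The sketch below assumes SHA-512, the OCFL default digest algorithm, and assumes that version directories are named `v` followed by digits. The truncated digests above do not say which algorithm was actually used. `build_manifest` and `file_digest` are hypothetical helpers. Files sitting directly in the object root, such as the inventory itself, are skipped.

```python
import hashlib
import os
import re
from collections import defaultdict
from typing import Dict, List


def file_digest(path: str, algorithm: str = "sha512", chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(object_root: str, algorithm: str = "sha512") -> Dict[str, List[str]]:
    """Group the content path of every file in the version directories
    (v0001, v0002, ...) under its digest. Files directly in the object
    root, such as an inventory file, are not visited."""
    manifest: Dict[str, List[str]] = defaultdict(list)
    # Lexical order equals version order only for zero-padded names like v0001
    version_dirs = sorted(
        d for d in os.listdir(object_root)
        if re.fullmatch(r"v\d+", d) and os.path.isdir(os.path.join(object_root, d))
    )
    for vdir in version_dirs:
        for dirpath, dirnames, filenames in os.walk(os.path.join(object_root, vdir)):
            dirnames.sort()  # visit subdirectories in a deterministic order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # Content paths are relative to the object root and use '/'
                rel = os.path.relpath(full, object_root).replace(os.sep, "/")
                manifest[file_digest(full, algorithm)].append(rel)
    return dict(manifest)
```

Grouping by digest means that byte-identical files share a single manifest entry with several content paths. In a Moab object this rarely happens across versions, because Moab stores only new or modified files in each version directory.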

The version 1 state block would look like this. Note the absence of the manifests/ directory, and note that logical paths are given relative to the version's content directory.

"state": {
    "98114a...588": [ "data/content/eric-smith-dissertation-augmented.pdf" ],
    "7f3d87...15b": [ "data/content/eric-smith-dissertation.pdf" ],
    "6d19f0...064": [ "data/metadata/technicalMetadata.xml" ],
    "6e4be4...375": [ "data/metadata/provenanceMetadata.xml" ],
    "d8a319...d0f": [ "data/metadata/descMetadata.xml" ],
    "de823a...acc": [ "data/metadata/rightsMetadata.xml" ],
    "080617...40c": [ "data/metadata/identityMetadata.xml" ],
    "e15267...58d": [ "data/metadata/versionMetadata.xml" ],
    "0d9e0b...9a2": [ "data/metadata/contentMetadata.xml" ],
    "dd9289...31d": [ "data/metadata/relationshipMetadata.xml" ]
}

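The version 2 state block would contain the following. Files that have no copy in v0002 (the two PDFs, technicalMetadata.xml and descMetadata.xml) keep their version 1 digests: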

"state": {
    "98114a...588": [ "data/content/eric-smith-dissertation-augmented.pdf" ],
    "7f3d87...15b": [ "data/content/eric-smith-dissertation.pdf" ],
    "6d19f0...064": [ "data/metadata/technicalMetadata.xml" ],
    "7519c5...63f": [ "data/metadata/provenanceMetadata.xml" ],
    "d8a319...d0f": [ "data/metadata/descMetadata.xml" ],
    "76549e...b2b": [ "data/metadata/rightsMetadata.xml" ],
    "7b331c...f9b": [ "data/metadata/identityMetadata.xml" ],
    "80ceac...b9c": [ "data/metadata/versionMetadata.xml" ],
    "4853a2...fbe": [ "data/metadata/contentMetadata.xml" ],
    "1d5090...f5f": [ "data/metadata/relationshipMetadata.xml" ],
    "abda4c...622": [ "data/metadata/workflows.xml" ],
    "bdc4d6...3b6": [ "data/metadata/events.xml" ],
    "f209bf...ceb": [ "data/metadata/embargoMetadata.xml" ]
}
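The carry-forward behaviour in the version 2 state can be sketched as follows. This is a minimal illustration and assumes that each version's delta has already been extracted as a `{logical path: digest}` map with the manifests/ files removed. `build_states` is a hypothetical helper, not part of the Moab-versioning gem. It also ignores deletions, which a real conversion would need to take from the deltas reported by that gem.

```python
from typing import Dict, List


def build_states(deltas: List[Dict[str, str]]) -> List[Dict[str, List[str]]]:
    """Turn per-version deltas into full OCFL version state blocks.

    deltas[i] maps logical path -> digest for files added or modified in
    version i+1. Paths not mentioned in a delta are carried forward from the
    previous version unchanged. Deletions are NOT modelled (assumption).
    """
    current: Dict[str, str] = {}
    states: List[Dict[str, List[str]]] = []
    for delta in deltas:
        current = {**current, **delta}  # later versions override earlier digests
        state: Dict[str, List[str]] = {}
        for path, digest in sorted(current.items()):
            state.setdefault(digest, []).append(path)
        states.append(state)
    return states
```

Fed the v0001 and v0002 deltas, this keeps `6d19f0...064` for technicalMetadata.xml in version 2 because v0002 has no copy of that file, while rightsMetadata.xml takes its new v0002 digest.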

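The version 3 state block would be as follows. The PDFs keep their version 1 digests, and relationshipMetadata.xml, which has no copy in v0003, keeps its version 2 digest: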

"state": {
    "98114a...588": [ "data/content/eric-smith-dissertation-augmented.pdf" ],
    "7f3d87...15b": [ "data/content/eric-smith-dissertation.pdf" ],
    "dd9125...d4b": [ "data/metadata/technicalMetadata.xml" ],
    "d9e177...477": [ "data/metadata/provenanceMetadata.xml" ],
    "e64db0...500": [ "data/metadata/descMetadata.xml" ],
    "05fa51...818": [ "data/metadata/rightsMetadata.xml" ],
    "509a2d...dc6": [ "data/metadata/identityMetadata.xml" ],
    "548066...893": [ "data/metadata/versionMetadata.xml" ],
    "93884e...aae": [ "data/metadata/contentMetadata.xml" ],
    "1d5090...f5f": [ "data/metadata/relationshipMetadata.xml" ],
    "4f5908...4f5": [ "data/metadata/workflows.xml" ],
    "d70dd8...5ad": [ "data/metadata/events.xml" ],
    "4c5ab4...b02": [ "data/metadata/embargoMetadata.xml" ]
}
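Two checks follow naturally from these blocks. First, consecutive states can be compared to see what each version changed. Second, every digest used in a state must also appear in the manifest, as OCFL requires. The sketch below treats the truncated digests shown above as opaque strings. `diff_states` and `missing_from_manifest` are hypothetical helper names.

```python
from typing import Dict, List, Set

State = Dict[str, List[str]]  # digest -> logical paths


def invert(state: State) -> Dict[str, str]:
    """Turn a {digest: [logical paths]} state into {logical path: digest}."""
    return {path: digest for digest, paths in state.items() for path in paths}


def diff_states(old: State, new: State) -> Dict[str, List[str]]:
    """Classify logical paths as added, modified, or removed between versions."""
    a, b = invert(old), invert(new)
    return {
        "added": sorted(p for p in b if p not in a),
        "modified": sorted(p for p in b if p in a and a[p] != b[p]),
        "removed": sorted(p for p in a if p not in b),
    }


def missing_from_manifest(state: State, manifest: Dict[str, List[str]]) -> Set[str]:
    """Digests referenced by a state but absent from the manifest (should be empty)."""
    return set(state) - set(manifest)
```

Applied to the full version 2 and version 3 states above, `diff_states` would report no added or removed paths and would list every metadata file except relationshipMetadata.xml as modified.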