Rules for Encoding Ingest Packages in METS (i.e., Fedora Extension)

As usual, an example is worth a thousand words, so please refer to the sample Fedora object that is encoded for ingest in METS: mets-ingest-example.xml

Fedora supports ingest of objects in a Fedora-specific extension of the Metadata Encoding and Transmission Standard (METS). More information on METS can be found at http://www.loc.gov/standards/mets/. As of Fedora 1.0, the repository will only accept a Fedora-specific extension of the METS 1.0 Schema. We made a few minor additions to METS 1.0 to accommodate the requirements of Fedora. In the future, we plan to accept XML submissions encoded to the METS 1.3 Schema, which will have changes that accommodate the few Fedora-specific extensions to METS. In the meantime, we validate Fedora Object XML submissions against the Fedora extension of METS, which is published at: http://www.fedora.info/definitions/1/0/mets-fedora-ext.xsd

Since METS was designed to be very generic and support a variety of uses, the rules of the METS Schema are very general-purpose. Fedora objects must conform to other rules that are beyond the scope of what is expressed in the METS schema. Therefore, the Fedora Object XML submissions will also be validated against a set of Fedora-specific rules that are expressed using the Schematron language (link for Schematron). Internally, the repository will use Schematron to enforce these rules on incoming XML submission packages. The Schematron rules are expressed in XML and can be found in the Fedora server distribution at:

%FEDORA_HOME%\dist\server\schematron\metsExtRules1-0.xml
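
For readers unfamiliar with Schematron, a rule of this kind might look like the sketch below. It is written in Schematron 1.5 syntax and is not copied from metsExtRules1-0.xml; the pattern name and the message text are made up for illustration. It asserts the constraint on the TYPE attribute of the METS root element that appears in the plain-English rules.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical Schematron 1.5 rule; not the rule file shipped with Fedora. -->
<schema xmlns="http://www.ascc.net/xml/schematron">
  <!-- Bind the METS prefix so it can be used in XPath expressions. -->
  <ns prefix="METS" uri="http://www.loc.gov/METS/"/>
  <pattern name="Root element checks (illustrative)">
    <rule context="METS:mets">
      <!-- The assertion fails, and the message is reported, when TYPE
           is missing or holds any other value. -->
      <assert test="@TYPE='FedoraObject' or @TYPE='FedoraBDefObject' or @TYPE='FedoraBMechObject'">
        TYPE on the METS root must be FedoraObject, FedoraBDefObject, or FedoraBMechObject.
      </assert>
    </rule>
  </pattern>
</schema>
```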

For convenience and ease of understanding we have enumerated the Fedora rules in plain English below:

Object Encoding Rules:

Encoding by hand requires a good working understanding of METS, although it can be done by following the patterns in the demo objects that come with the Fedora distribution. Demo objects are located at: %FEDORA_HOME%\dist\client\demo

  1. General attributes:

    1. On the METS root element, the OBJID attribute represents the Fedora object PID. Normally, this should be left empty so that the Fedora repository can generate a new PID. However, you can assign test/demo PIDs by inserting a value in OBJID that begins with “demo:” or “test:” (for example, “demo:100”).

    2. On the METS root element, the value of the TYPE attribute must be “FedoraObject”, “FedoraBDefObject”, or “FedoraBMechObject”.

    3. On METS root element, the value of LABEL serves as the official description of the object. If there is no Dublin Core record present in the object, the Fedora repository will use this label to populate the title element of a baseline Dublin Core record for the object.

    4. On the METS root element, the PROFILE attribute can be used by institutions to classify different types of Fedora data objects.

    5. On the METS:metsHdr element, the CREATEDATE attribute should be omitted, since the Fedora repository assigns it at ingest time. Fedora dates are in ISO 8601 format, with millisecond precision and in UTC, as follows: yyyy-MM-ddTHH:mm:ss.SSSZ. The same applies to LASTMODDATE.

    6. On the METS:metsHdr element the RECORDSTATUS should be set to “I” to indicate the METS serves as an “Ingest” package for Fedora.
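
The general attribute rules above can be sketched as the following skeleton. It is an illustration rather than a complete ingest package: the LABEL and PROFILE values are made up, the “demo:100” PID is the example from rule 1, and the namespace URI should be checked against the published mets-fedora-ext.xsd.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative skeleton only; LABEL and PROFILE values are hypothetical. -->
<METS:mets xmlns:METS="http://www.loc.gov/METS/"
           OBJID="demo:100"
           TYPE="FedoraObject"
           LABEL="Sample image object"
           PROFILE="sample-image-profile">
  <!-- CREATEDATE and LASTMODDATE are omitted; the repository assigns them
       at ingest in the form yyyy-MM-ddTHH:mm:ss.SSSZ. -->
  <METS:metsHdr RECORDSTATUS="I"/>
  <!-- Datastream, metadata, and disseminator sections would follow here. -->
</METS:mets>
```

To let the repository generate the PID, leave OBJID empty instead of supplying a “demo:” or “test:” value.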

  2. Datastreams:

    1. To create a proper section for Datastreams in the METS file, the METS:fileSec must have a single child METS:fileGrp element whose ID attribute has the value “DATASTREAMS”.

    2. Datastreams that are encoded in the METS:fileSec must follow this pattern to establish proper version groups and datastream IDs. Each datastream has its own METS:fileGrp whose ID attribute is the official datastream ID. The recommended convention is ID=”DSn” where n is a number (for example ID=”DS1” or ID=”DS2”).

    3. Within a METS:fileGrp, there can be one or more METS:file elements to represent different versions of a datastream. As of Fedora 1.2, versioning of data objects is supported. The METS:file element for the datastream must have an ID attribute that represents the version number relative to the datastream ID set in the METS:fileGrp. The recommended convention is ID=”DSn.v” where n is the number of the datastream and v is the version number (for example ID=”DS1.0” or ID=”DS1.1”).

    4. The METS:file element for a datastream must have a MIMETYPE attribute.

    5. The METS:file element for a datastream must have an OWNERID attribute. In Fedora, the OWNERID attribute is used to encode the Datastream Control Group. The following are valid values:

      • “M” – Managed Content. This tells the repository to store the datastream’s content byte stream inside the repository. When the METS:file contains “M” on the OWNERID, the repository will resolve the URL associated with the METS:file element and pull the content into the repository for permanent storage. Fedora will establish its own local identifier for retrieving the content, and disregard the original URL that came in on the METS submission package.

      • “E” – External Referenced Content. This tells the repository to store the URL for the datastream content, not the content byte stream itself. For this type of datastream, Fedora does not actually manage or have custodianship of the content, but it manages the link to the content and some basic metadata about it.

      • “R” – Redirected Content. Like “E” this tells the repository to store the URL for the datastream content, not the content byte stream itself. More importantly, it tells the repository not to mediate or proxy this content at runtime. Instead, the repository will redirect to the URL at run time. This is desirable when a datastream points to a streaming media source, or to a complex web page where some components are lost during proxying.
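
A file section that follows the datastream rules above might look like the sketch below, with one datastream per control group. The content URLs and MIME types are hypothetical, and the use of a METS:FLocat child with LOCTYPE=”URL” to carry the content URL, as well as the xlink namespace URI, are assumptions that should be checked against the demo objects and the published schema.

```xml
<!-- Illustrative fragment; URLs and MIME types are hypothetical. -->
<METS:fileSec xmlns:METS="http://www.loc.gov/METS/"
              xmlns:xlink="http://www.w3.org/TR/xlink">
  <!-- Single container group required by rule 1. -->
  <METS:fileGrp ID="DATASTREAMS">
    <!-- DS1: managed content, pulled into the repository at ingest. -->
    <METS:fileGrp ID="DS1">
      <METS:file ID="DS1.0" MIMETYPE="image/jpeg" OWNERID="M">
        <METS:FLocat LOCTYPE="URL" xlink:href="http://example.org/images/photo.jpg"/>
      </METS:file>
    </METS:fileGrp>
    <!-- DS2: external referenced content, only the URL is stored. -->
    <METS:fileGrp ID="DS2">
      <METS:file ID="DS2.0" MIMETYPE="application/pdf" OWNERID="E">
        <METS:FLocat LOCTYPE="URL" xlink:href="http://example.org/docs/report.pdf"/>
      </METS:file>
    </METS:fileGrp>
    <!-- DS3: redirected content, the repository redirects at run time. -->
    <METS:fileGrp ID="DS3">
      <METS:file ID="DS3.0" MIMETYPE="video/mpeg" OWNERID="R">
        <METS:FLocat LOCTYPE="URL" xlink:href="http://example.org/stream/lecture.mpg"/>
      </METS:file>
    </METS:fileGrp>
  </METS:fileGrp>
</METS:fileSec>
```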

  3. Inline XML Datastreams:

    1. Datastreams can also be encoded in the METS:dmdSecFedora and METS:amdSec. These are considered “inline XML datastreams” in Fedora. The METS:dmdSecFedora and METS:amdSec elements act as datastream version group containers just like the METS:fileGrp acts for regular datastreams. Within these elements, the METS “metadata section” elements (i.e., METS:techMD, METS:rightsMD, etc.) are used for the specific version instances of the inline metadata datastreams, just like the METS:file acts for regular datastreams. The datastream IDs work the same way, where the ID attribute on the container element acts as the datastream ID, and the ID on the metadata section element acts as the datastream version ID.

    2. Do not use the schemaLocation attribute in the root element of inline XML datastreams (within METS:mdWrap element).
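
As a sketch of the inline pattern, the fragment below encodes one technical metadata datastream with a single version. The IDs “TECH1” and “TECH1.0”, the MDTYPE and LABEL values, and the tech: vocabulary with its namespace are all made up for illustration. Note that the root element of the inline XML carries no schemaLocation attribute.

```xml
<!-- Illustrative fragment; IDs and the tech: vocabulary are hypothetical. -->
<!-- The amdSec ID acts as the datastream ID. -->
<METS:amdSec xmlns:METS="http://www.loc.gov/METS/" ID="TECH1">
  <!-- The techMD ID acts as the datastream version ID. -->
  <METS:techMD ID="TECH1.0">
    <METS:mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" LABEL="Technical metadata">
      <METS:xmlData>
        <!-- Inline XML root: no schemaLocation attribute here. -->
        <tech:image xmlns:tech="http://example.org/ns/tech">
          <tech:width>1024</tech:width>
          <tech:height>768</tech:height>
        </tech:image>
      </METS:xmlData>
    </METS:mdWrap>
  </METS:techMD>
</METS:amdSec>
```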

  4. Dublin Core Record Datastream:

    1. A Dublin Core (DC) record is optional in the Fedora object submission package. If one is not provided the repository will automatically create a minimal DC record in the object by using the LABEL (on METS root) as the DC title element. It will also use the object PID as the DC identifier element.

    2. If a DC record is provided in the METS submission package it should be encoded within a METS:dmdSecFedora. The dmdSecFedora element will act as the datastream version group container. It MUST have an ID attribute whose value is “DC” to be recognized by Fedora!

    3. Within the METS:dmdSecFedora, there must be one METS:descMD element. This element is part of the Fedora extension of METS 1.0 and is used to encode a specific version of the DC datastream within the version group container. The ID attribute on the METS:descMD element MUST have the value “DC1.0” to be recognized by Fedora.

    4. The actual DC metadata should be encoded using the Open Archives Initiative (OAI) Dublin Core schema.
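
A Dublin Core datastream that follows these rules might be encoded as in the sketch below. The “DC” and “DC1.0” IDs are the required values from the rules above; the title, the identifier, and the MDTYPE and LABEL attributes on METS:mdWrap are illustrative assumptions.

```xml
<!-- Illustrative fragment; metadata values are hypothetical. -->
<!-- ID must be "DC" for Fedora to recognize the version group. -->
<METS:dmdSecFedora xmlns:METS="http://www.loc.gov/METS/" ID="DC">
  <!-- ID must be "DC1.0" for this version of the DC datastream. -->
  <METS:descMD ID="DC1.0">
    <METS:mdWrap MIMETYPE="text/xml" MDTYPE="DC" LABEL="Dublin Core record">
      <METS:xmlData>
        <!-- OAI Dublin Core container, again without schemaLocation. -->
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample image object</dc:title>
          <dc:identifier>demo:100</dc:identifier>
        </oai_dc:dc>
      </METS:xmlData>
    </METS:mdWrap>
  </METS:descMD>
</METS:dmdSecFedora>
```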

  5. Disseminators:

    1. Each Disseminator is encoded in its own METS:behaviorSec element. The METS:behaviorSec element acts as a version container for different versions of the Disseminator. As of Fedora 1.2.1, only one version is supported. Each Disseminator must have a disseminator ID, which is encoded in the ID attribute of the METS:behaviorSec. The recommended convention is ID=”DISSn” where n is a number (for example ID=”DISS1” or ID=”DISS2”).

    2. The METS:serviceBinding element represents a particular version of the disseminator. Again, in Fedora 1.2.1 only one version is supported. The element must have an ID attribute that represents the version number relative to the Disseminator ID that is set in the METS:behaviorSec. The recommended convention is ID=”DISSn.v” where n is the number of the Disseminator and v is the version number (for example ID=”DISS1.0” or ID=”DISS1.1”).

    3. The METS:serviceBinding element must have a STRUCTID attribute. The value of this attribute is the ID of a METS:structMap section in the submission package. The METS:structMap section constitutes the Fedora “Datastream Binding Map”, which identifies the Datastreams in the object that will be used by the Disseminator. Specifically, these are the datastreams that fulfill the “data contract” defined by the Behavior Mechanism Object that is pointed to by the Disseminator.

    4. The METS:structMap, in turn, points to Datastreams in the object, and gives them a special name via the TYPE attribute of the METS:structMap. Again, the METS:structMap encodes the fulfillment of the “data contract” that the Behavior Mechanism object specifies so that datastreams can act as input parameters to service methods (described earlier).

    5. To make a Disseminator point to a Behavior Definition Object (to make the object subscribe to a “behavior contract”), there must be a single METS:interfaceMD element as a child to the METS:serviceBinding element. The METS:interfaceMD element must have a LOCTYPE attribute whose value is “URN” and an xlink:href attribute whose value is the PID of a Fedora Behavior Definition Object.

    6. To make a Disseminator point to a Behavior Mechanism Object (to associate a particular service that runs a behavior contract’s methods), there must be a single METS:serviceBindMD element as a child to the METS:serviceBinding element. The METS:serviceBindMD element must have a LOCTYPE attribute whose value is “URN” and an xlink:href attribute whose value is the PID of a Fedora Behavior Mechanism Object.
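
The disseminator rules above can be sketched together as follows. This is an illustration under several assumptions: the PIDs “demo:1” (Behavior Definition) and “demo:2” (Behavior Mechanism), the structMap ID “S1”, and all TYPE and LABEL values are hypothetical, and the internal layout of the METS:structMap (nested METS:div elements with a METS:fptr pointing at a datastream ID) is a guess that should be checked against the demo objects. The binding attribute is spelled STRUCTID here, as in the METS schema.

```xml
<!-- Illustrative fragment; PIDs, IDs, TYPE, and LABEL values are hypothetical. -->
<METS:mets xmlns:METS="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/TR/xlink"
           OBJID="demo:100" TYPE="FedoraObject" LABEL="Sample image object">
  <!-- Header, datastream, and metadata sections omitted for brevity. -->

  <!-- Datastream Binding Map: names the datastreams the disseminator uses. -->
  <METS:structMap ID="S1" TYPE="fedora:dsBindingMap">
    <METS:div TYPE="demo:2" LABEL="Binding map for the image service">
      <METS:div TYPE="THUMBNAIL" LABEL="Binding to the thumbnail image">
        <!-- Points at datastream DS1 in the METS:fileSec. -->
        <METS:fptr FILEID="DS1"/>
      </METS:div>
    </METS:div>
  </METS:structMap>

  <!-- One disseminator (DISS1) with a single version (DISS1.0). -->
  <METS:behaviorSec ID="DISS1">
    <METS:serviceBinding ID="DISS1.0" STRUCTID="S1">
      <!-- Behavior Definition Object: the behavior contract. -->
      <METS:interfaceMD LOCTYPE="URN" xlink:href="demo:1"/>
      <!-- Behavior Mechanism Object: the service that runs the methods. -->
      <METS:serviceBindMD LOCTYPE="URN" xlink:href="demo:2"/>
    </METS:serviceBinding>
  </METS:behaviorSec>
</METS:mets>
```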