Metadata entity analysis


'''

Objects
'''

Common Object [Entity] Semantic Units
Useful information to include:
 * When metadata is generated (at what step, ex. AD1v.4)
 * How metadata is generated - i.e., what application/software component, input by archivist, implicit in policy, etc.
 * possible values


 * PREMIS
 * 1.1 objectIdentifier (M, R)
 * 1.1.1 objectIdentifierType (M, NR)
 * Possible values:
 * TRIM record URI
 * TRIM [record] number (TRIM xml propid=2)
 * TRIM or Creator's file name
 * Archives ID number (Accession number + auto generated number)
 * 1.1.2 objectIdentifierValue (M, NR)
 * 1.2 objectCategory (M, NR) {representation, file, bitstream}
 * PREMIS suggests: representation, file, bitstream
 * 1.3 preservationLevel (O, R) [representation, file]
 * 1.3.1 preservationLevelValue (M, NR) [representation, file]
 * 1.3.2 preservationLevelRole (O, NR) [representation, file]
 * 1.3.3 preservationLevelRationale (O, R) [representation, file]
 * 1.3.4 preservationLevelDateAssigned (O, NR) [representation, file]
 * 1.4 significantProperties (O, R)
 * 1.4.1 significantPropertiesType (O, NR)
 * 1.4.2 significantPropertiesValue (O, NR)
 * 1.4.3 significantPropertiesExtension (O, R)
 * 1.5 objectCharacteristics (M, R) [file, bitstream]
 * 1.5.1 compositionLevel (M, NR) [file, bitstream]
 * 1.5.2 fixity (O, R) [file, bitstream]
 * 1.5.2.1 messageDigestAlgorithm (M, NR) [file, bitstream]
 * Possible values:
 * md5 hash
 * 1.5.2.2 messageDigest (M, NR) [file, bitstream]
 * 1.5.2.3 messageDigestOriginator (O, NR) [file, bitstream]
 * 1.5.3 size (O, NR) [file, bitstream]
 * 1.5.4 format (M, R) [file, bitstream]
 * 1.5.4.1 formatDesignation (O, NR) [file, bitstream]
 * 1.5.4.1.1 formatName (M, NR) [file, bitstream]
 * 1.5.4.1.2 formatVersion (O, NR) [file, bitstream]
 * 1.5.4.2 formatRegistry (O, NR) [file, bitstream]
 * 1.5.4.2.1 formatRegistryName (M, NR) [file, bitstream]
 * 1.5.4.2.2 formatRegistryKey (M, NR) [file, bitstream]
 * 1.5.4.2.3 formatRegistryRole (O, NR) [file, bitstream]
 * 1.5.4.3 formatNote (O, R) [file, bitstream]
 * 1.5.5 creatingApplication (O, R) [file, bitstream]
 * 1.5.5.1 creatingApplicationName (O, NR) [file, bitstream]
 * 1.5.5.2 creatingApplicationVersion (O, NR) [file, bitstream]
 * 1.5.5.3 dateCreatedByApplication (O, NR) [file, bitstream]
 * 1.5.5.4 creatingApplicationExtension (O, R) [file, bitstream]
 * 1.5.6 inhibitors (O, R) [file, bitstream]
 * 1.5.6.1 inhibitorType (M, NR) [file, bitstream]
 * 1.5.6.2 inhibitorTarget (O, R) [file, bitstream]
 * 1.5.6.3 inhibitorKey (O, NR) [file, bitstream]
 * 1.5.7 objectCharacteristicsExtension (O, R) [file, bitstream]
 * 1.6 originalName (O, NR) [representation, file]
 * 1.7 storage (M, R) [file, bitstream]
 * 1.7.1 contentLocation (O, NR) [file, bitstream]
 * 1.7.1.1 contentLocationType (M, NR) [file, bitstream]
 * 1.7.1.2 contentLocationValue (M, NR) [file, bitstream]
 * 1.7.2 storageMedium (O, NR) [file, bitstream]
 * 1.8 environment (O, R)
 * 1.8.1 environmentCharacteristic (O, NR)
 * 1.8.2 environmentPurpose (O, R)
 * 1.8.3 environmentNote (O, R)
 * 1.8.4 dependency (O, R)
 * 1.8.4.1 dependencyName (O, R)
 * 1.8.4.2 dependencyIdentifier (O, R)
 * 1.8.4.2.1 dependencyIdentifierType (M, NR)
 * 1.8.4.2.2 dependencyIdentifierValue (M, NR)
 * 1.8.5 software (O, R)
 * 1.8.5.1 swName (M, NR)
 * 1.8.5.2 swVersion (O, NR)
 * 1.8.5.3 swType (M, NR)
 * 1.8.5.4 swOtherInformation (O, R)
 * 1.8.5.5 swDependency (O, R)
 * 1.8.6 hardware (O, R)
 * 1.8.6.1 hwName (M, NR)
 * 1.8.6.2 hwType (M, NR)
 * 1.8.6.3 hwOtherInformation (O, R)
 * 1.8.7 environmentExtension (O, R)
 * 1.9 signatureInformation (O, R) [file, bitstream]
 * 1.9.1 signature (O, R)
 * 1.9.1.1 signatureEncoding (M, NR) [file, bitstream]
 * 1.9.1.2 signer (O, NR) [file, bitstream]
 * 1.9.1.3 signatureMethod (M, NR) [file, bitstream]
 * 1.9.1.4 signatureValue (M, NR) [file, bitstream]
 * 1.9.1.5 signatureValidationRules (M, NR) [file, bitstream]
 * 1.9.1.6 signatureProperties (O, R) [file, bitstream]
 * 1.9.1.7 keyInformation (O, NR) [file, bitstream]
 * 1.9.2 signatureInformationExtension (O, R) [file, bitstream]
 * 1.10 relationship (O, R)
 * 1.10.1 relationshipType (M, NR)
 * 1.10.2 relationshipSubType (M, NR)
 * 1.10.3 relatedObjectIdentification (M, R)
 * 1.10.3.1 relatedObjectIdentifierType (M, NR)
 * 1.10.3.2 relatedObjectIdentifierValue (M, NR)
 * 1.10.3.3 relatedObjectSequence (O, NR)
 * 1.10.4 relatedEventIdentification (O, R)
 * 1.10.4.1 relatedEventIdentifierType (M, NR)
 * 1.10.4.2 relatedEventIdentifierValue (M, NR)
 * 1.10.4.3 relatedEventSequence (O, NR)
 * 1.11 linkingEventIdentifier (O, R)
 * 1.11.1 linkingEventIdentifierType (M, NR)
 * 1.11.2 linkingEventIdentifierValue (M, NR)
 * 1.12 linkingIntellectualEntityIdentifier (O, R)
 * 1.12.1 linkingIntellectualEntityIdentifierType (M, NR)
 * 1.12.2 linkingIntellectualEntityIdentifierValue (M, NR)
 * 1.13 linkingRightsStatementIdentifier (O, R)
 * 1.13.1 linkingRightsStatementIdentifierType (M, NR)
 * 1.13.2 linkingRightsStatementIdentifierValue (M, NR)
 * Other
 * date
 * time
 * size
 * SIPs and AIPs would have total file counts (for manifest checking)
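
As a rough illustration of how the fixity, size and identifier units above might be populated for a single file, the following Python sketch computes an MD5 message digest and returns the values keyed by PREMIS semantic unit name. The helper names `describe_file` and `demo`, the flattened dictionary layout, the originator value and the sample identifier "2009-001-0001" are assumptions made for illustration; they are not part of this analysis.

```python
import hashlib
import os
import tempfile


def describe_file(path, identifier_value, originator="Archives"):
    """Return a PREMIS-style description of one file (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "objectIdentifierType": "Archives ID number",  # one possible value listed above
        "objectIdentifierValue": identifier_value,
        "objectCategory": "file",
        "messageDigestAlgorithm": "MD5",
        "messageDigest": digest.hexdigest(),
        "messageDigestOriginator": originator,  # assumed value
        "size": os.path.getsize(path),  # size in bytes
        "originalName": os.path.basename(path),
    }


def demo():
    """Describe a throwaway file; the identifier value is made up."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "report.txt")
        with open(path, "wb") as handle:
            handle.write(b"hello")
        return describe_file(path, "2009-001-0001")
```

A real implementation would nest these values under objectIdentifier and objectCharacteristics as PREMIS does; the flat layout is only meant to show where each value could come from.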

Information Packages

 * City SIP
 * A City SIP is a directory folder that contains one or more documents exported from any City system outside of TRIM, and a single .xml file that contains all of the metadata for the exported files.


 * City AIP
 * A zipped file containing content and metadata from a City SIP, normalized or other derivative versions of the content objects, additional metadata generated on or extracted from City SIP content objects during the ingest process, and packaging information about the City AIP. All objects in the City AIP are organized according to the BagIt file packaging format.


 * Donor SIP
 * One (or many?) directory folder(s) that contain one or more documents exported from a private donor to the City, and, ideally, .xml file(s) that contain all of the metadata for the files.


 * Donor AIP
 * A zipped file containing content and metadata from a Donor SIP, normalized or other derivative versions of the content objects, additional metadata generated on or extracted from Donor SIP content objects during the ingest process, and packaging information about the Donor AIP. All objects in the Donor AIP are organized according to the BagIt file packaging format.


 * Transfer
 * A transfer is a set of one or more SIPs that the Archives has agreed to receive. All SIPs contained within a transfer have the same OPR.


 * VanDocs SIP
 * A VanDocs SIP is a directory folder that contains one or more documents exported from TRIM, and a single .xml file that contains all of the TRIM metadata for the exported files.


 * VanDocs AIP
 * A zipped file containing content and metadata from a VanDocs SIP, normalized or other derivative versions of the content objects, additional metadata generated on or extracted from VanDocs SIP content objects during the ingest process, and packaging information about the VanDocs AIP. All objects in the VanDocs AIP are organized according to the BagIt file packaging format.
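
The AIP definitions above all end with a zipped, BagIt-organized package. A minimal sketch of that layout, using only the Python standard library, might look as follows. It covers only the bag declaration, the `data/` payload directory and an MD5 payload manifest; derivatives, ingest metadata and other tag files are left out. The function names, the directory names and the BagIt version string are assumptions for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_bag(source_dir, bag_dir):
    """Copy source_dir into a BagIt-style layout under bag_dir (simplified sketch)."""
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    payload = bag_dir / "data"
    shutil.copytree(source_dir, payload)  # also creates bag_dir itself

    # Bag declaration file required by the BagIt format (version is an assumption).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Payload manifest: one "<md5>  <path relative to the bag>" line per file.
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-md5.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return bag_dir


def zip_bag(bag_dir):
    """Zip the bag so that the AIP is a single file; returns the archive path."""
    bag_dir = Path(bag_dir)
    return shutil.make_archive(
        str(bag_dir), "zip", root_dir=str(bag_dir.parent), base_dir=bag_dir.name
    )


def demo():
    """Package a one-file SIP in a scratch directory."""
    with tempfile.TemporaryDirectory() as scratch:
        sip = Path(scratch) / "sip"
        sip.mkdir()
        (sip / "a.txt").write_bytes(b"hello")
        bag = make_bag(sip, Path(scratch) / "aip")
        archive = Path(zip_bag(bag))
        manifest = (bag / "manifest-md5.txt").read_text(encoding="utf-8")
        return archive.name, manifest
```

The payload manifest written here is separate from the TRIM or City metadata document, which would travel inside the payload as one of the SIP's files.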

Smaller Objects

 * Data object (bitstream or file)
 * Just the object without Representation Information or PDI.


 * Derivative (representation or file)
 * A copy of a record created by the Archives during Ingest or Preservation activities (e.g. normalized version of a file, jpg of a tif, etc.)


 * Record (representation or file)
 * Data object together with associated Representation Information and PDI - the target of Preservation activities


 * Sequestered Object (bitstream or file)
 * A data object removed from a SIP because of the presence of malware. [example: email message with attachment whose msg is corrupted, but the attachment is fine]

Object Characteristics
Transfer


 * Is initiated by RIM
 * Is approved by OPR
 * Is approved by Archives
 * Has an OPR
 * Contains 1 or more SIPs

VanDocs SIP


 * Contains 1 or more data objects
 * Contains 1 or more records
 * Contains 1 TRIM metadata document (aka manifest)
 * Belongs to 1 transfer event
 * Has origin location
 * Has target location

VanDocs AIP


 * Contains 1 or more data objects
 * Contains 1 or more records
 * Contains 0 or more derivative objects
 * Contains 1 TRIM metadata document (aka manifest)
 * Contains ingest metadata (Rep Info, PDI, and...)
 * Contains packaging information
 * Was generated from 1 VanDocs SIP
 * Is stored at a location
 * Has backup copies at a location

City SIP


 * Contains 1 or more data objects
 * Contains 1 metadata document
 * Contains 1 or more records
 * Belongs to 1 or more transfer(s)
 * Has storage location
 * Has backup location

City AIP


 * Contains 1 or more data objects
 * Contains 1 or more records
 * Contains 0 or more derivative objects
 * Contains 1 metadata document (could be manifest)
 * Contains ingest metadata (Rep Info, PDI, and...)
 * Contains packaging information
 * Was generated from 1 or more City SIPs
 * Is stored at a location
 * Has backup copies at a location

Donor SIP


 * Contains 1 or more data objects
 * Contains 1 or more records
 * Contains 1 metadata document (could be manifest)
 * Belongs to 1 or more transfer(s)
 * Has storage location
 * Has backup location

Donor AIP


 * Contains 1 or more data objects
 * Contains 1 or more records
 * Contains 0 or more derivative objects
 * Contains 1 metadata document (could be manifest)
 * Contains ingest metadata (Rep Info, PDI, and...)
 * Contains packaging information
 * Was generated from 1 or more Donor SIP(s)
 * Is stored at a location
 * Has backup copies at a location

Record


 * Has record metadata
 * Belongs to record description
 * Belongs to 1 transfer
 * Belongs to 1 SIP
 * Has Rep Info and PDI metadata
 * Has 0 or more derivative object(s)
 * Was derived from 0 or 1 data objects
 * Belongs to 1 AIP
 * Belongs to classification
 * Belongs to file
 * Has author
 * Has creator
 * Has addressee
 * Has title
 * Has subject
 * Etc.
 * Has original data object(s)

Sequestered object
 * Belongs to 1 SIP
 * Has sequester location
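
The cardinalities listed above ("contains 1 or more", "contains 1", "has an OPR") could be checked mechanically. The sketch below models only a Transfer and its VanDocs SIPs, with class and field names chosen for illustration rather than taken from any existing schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VanDocsSIP:
    data_objects: List[str] = field(default_factory=list)  # file names, for illustration
    manifest: Optional[str] = None  # the single TRIM metadata document


@dataclass
class Transfer:
    opr: Optional[str] = None
    sips: List[VanDocsSIP] = field(default_factory=list)


def check_transfer(transfer):
    """Return a list of cardinality violations (empty if there are none)."""
    problems = []
    if not transfer.opr:
        problems.append("transfer has no OPR")
    if not transfer.sips:
        problems.append("transfer contains no SIPs")
    for index, sip in enumerate(transfer.sips):
        if not sip.data_objects:
            problems.append(f"SIP {index} contains no data objects")
        if sip.manifest is None:
            problems.append(f"SIP {index} has no TRIM metadata document")
    return problems
```

For example, `check_transfer(Transfer(opr="x"))` reports that the transfer contains no SIPs, while a transfer with an OPR and one complete SIP returns an empty list.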

Events
'''

Common Event Semantic Units

 * PREMIS
 * 2.1 eventIdentifier (M, R)
 * 2.1.1 eventIdentifierType (M, NR) [see types below]
 * 2.1.2 eventIdentifierValue (M, NR)
 * 2.2 eventType (M, NR)
 * 2.3 eventDateTime (M, NR) [see below, should this be expanded to start time/end time?]
 * 2.4 eventDetail (O, NR)
 * 2.5 eventOutcomeInformation (O, R)
 * 2.5.1 eventOutcome (O, NR)
 * 2.5.2 eventOutcomeDetail (O, R) [see "Reports generated" below]
 * 2.5.2.1 eventOutcomeDetailNote (O, NR)
 * 2.5.2.2 eventOutcomeDetailExtension (O, R)
 * 2.6 linkingAgentIdentifier (O, R)
 * 2.6.1 linkingAgentIdentifierType (M, NR)
 * 2.6.2 linkingAgentIdentifierValue (M, NR)
 * 2.6.3 linkingAgentRole (O, R)
 * 2.7 linkingObjectIdentifier (O, R)
 * 2.7.1 linkingObjectIdentifierType (M, NR)
 * 2.7.2 linkingObjectIdentifierValue (M, NR)
 * 2.7.3 linkingObjectRole (O, R)


 * Other
 * Start time
 * End Time
 * Reports generated [could this be included in Outcome Detail?]
 * location of reports generated (another environment?)
 * Status? {not started, in progress, completed}
 * Environment - could apply to location of any event, such as data management location, backup location, etc.
 * Aggregate result {pass= all passed; fail=single failure or more} [could this be included in Outcome Detail?]
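
One way the "Other" units above could fit together is sketched below: status is derived from the start and end times, and the aggregate result follows the rule that a single failure fails the whole event. The class name `Event`, its field names and the choice to key outcomes by object identifier are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Event:
    event_type: str
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    # Per-object outcomes keyed by object identifier value: True means passed.
    outcomes: Dict[str, bool] = field(default_factory=dict)

    @property
    def status(self):
        """One of: not started, in progress, completed."""
        if self.start_time is None:
            return "not started"
        if self.end_time is None:
            return "in progress"
        return "completed"

    @property
    def aggregate_result(self):
        """'pass' only if every object passed; one or more failures give 'fail'."""
        if self.status != "completed":
            return None
        # Assumption: a completed event with no recorded outcomes counts as a pass.
        return "pass" if all(self.outcomes.values()) else "fail"
```

Deriving status from the timestamps avoids storing a value that can drift out of step with them, though a stored status may still be wanted if events can be cancelled or retried.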

Event Types

 * Accession
 * The Archives takes formal custody and control of a SIP. The creator may destroy all related objects in their custody.


 * Archival Appraisal
 * The process of assessing the value of records for the purpose of determining the length and conditions of their preservation. Selection considers enduring value (fonds) and technical (format, record type, etc.) feasibility of preservation. Includes automated and manual (human) processes at multiple points in the Ingest process [See City of Vancouver Archives AD1 (v1), AD2 (v5), and AD3 (v4)]. Formal accession follows the final archival appraisal.


 * Description
 * Any process towards the complete arrangement and description of an object. An object that has been described is, at a minimum, accessible (at least internally) with some contextual information.


 * Back-up
 * To make a copy of a data file for the purpose of system recovery.


 * Compression
 * The process of coding data to save storage space or transmission time. (PREMIS Data Dictionary)


 * Deaccession
 * The process of removing an object from the inventory of a repository. (PREMIS Data Dictionary)


 * Decompression
 * The process of reversing the effects of compression. (PREMIS Data Dictionary)


 * Destruction
 * The process of removing an object from any part of the system.


 * Digital Signature Validation
 * The process of determining that a decrypted digital signature matches an expected value. (PREMIS Data Dictionary)


 * Dissemination
 * The process of retrieving an object from repository storage and making it available to users. (PREMIS Data Dictionary)


 * Fixity (Integrity) Checking
 * Using the information which documents the authentication mechanisms and provides authentication keys (e.g. a Cyclical Redundancy Check (CRC) code for a file, checksums, etc.) to check that the Content Information object has not been altered in an undocumented manner.

 * Format Characterization
 * Message digest calculation, metadata extraction, identifying and validating formats


 * Ingest Scheduling
 * [could have a variety of types of ingest, e.g. for City SIPs vs. Donor SIPs; or could be media-specific, i.e. we already know what the SIP consists of media-wise, like a bunch of videos from a single known system, and can skip some of the Ingest activities]
 * A transferred SIP is queued for bundled Ingest activities, which include Manifest Checking, Malware Checking, Quarantine, Accession, Fixity Checking, Format Characterization, Transformation (aka Normalization/type of Migration instance), Packaging, Storage, Uploading to Data Management. Once each event is completed, its status will change to "complete" and it will go on to the next event. Upon completion of the final step, Ingest Scheduling status will be "complete".


 * Malware Checking
 * The process of scanning files for malicious code.


 * Manifest Checking
 * The process during Ingest of checking that all files listed on the manifest are present and that all files present are listed on the manifest.


 * Migration
 * The transfer of digital information, while intending to preserve it, within the OAIS. It is distinguished from transfers in general by three attributes: a focus on the preservation of the full information content; a perspective that the new archival implementation of the information is a replacement for the old; and an understanding that full control and responsibility over all aspects of the transfer resides with the OAIS. (as defined by CASPAR)
 * Types of migration:
 * refreshment - A digital migration where the effect is to replace a media instance with a copy that is sufficiently exact that all Archival Storage hardware and software continues to run as before. (OAIS)
 * replication - The process of creating a copy of an object that is, bit-wise, identical to the original (duplication) (PREMIS Data Dictionary)
 * repackaging - A migration in which there is an alteration in the Packaging Information of the AIP. (CASPAR)
 * transformation - A Digital Migration in which there is an alteration to the Content Information or PDI of an Archival Information Package. (OAIS) note: Derivation, could be Normalization, file type conversion, etc.


 * Notification
 * [As of this draft, Nov 09, the Archives would notify the creator about whether their SIP (or part(s) of the SIP) was accepted or rejected, whether the SIP (or part(s) of the SIP) requires resubmission, and whether it has failed multiple times]


 * Packaging
 * Binding and identifying the components of an Archival Information Package (AIP). (Currently done using BagIt)


 * Preservation Scheduling
 * [based on risk to objects; depending on object type and risk, preservation actions would be scheduled, likely as variations of migration events]


 * Quarantine
 * The process of isolating an object for an assigned period of time prior to malware checking to avoid possible object corruption and infection of other system files, while allowing malware registries time to update their contents in order to perform a relevant check on the isolated files.


 * Repair
 * The process of repairing sequestered objects that have failed the malware check.


 * Sequestering
 * The process of secluding objects that have failed the malware check.


 * Storage
 * The process of queuing and placing the AIP into Archival Storage.


 * Transfer
 * The process by which the repository actively obtains an object - not the same as accession [PREMIS Data Dictionary calls this "capture"]


 * Uploading to Data Management
 * The process of uploading the AIP and/or AIP derivatives to Data Management, which may or may not simultaneously create derivatives of certain file types.
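
Of the event types above, Manifest Checking has the most mechanical definition: every listed file must be present and every present file must be listed. A sketch of that two-way comparison follows. It assumes the list of file names has already been extracted from the .xml metadata document (parsing is out of scope here), and the function name, the default manifest file name and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def check_manifest(listed_names, sip_dir, manifest_name="metadata.xml"):
    """Compare the names listed on the manifest with the files actually in the SIP."""
    sip_dir = Path(sip_dir)
    present = set()
    for path in sip_dir.rglob("*"):
        relative = path.relative_to(sip_dir).as_posix()
        # The manifest document itself is not expected to be listed.
        if path.is_file() and relative != manifest_name:
            present.add(relative)
    listed = set(listed_names)
    return {
        "missing": sorted(listed - present),  # listed but not present
        "unlisted": sorted(present - listed),  # present but not listed
        "passed": listed == present,
    }


def demo():
    """Check a scratch SIP that lacks one listed file and holds one unlisted file."""
    with tempfile.TemporaryDirectory() as folder:
        sip = Path(folder)
        for name in ("a.doc", "c.doc", "metadata.xml"):
            (sip / name).write_bytes(b"")
        return check_manifest(["a.doc", "b.doc"], sip)
```

Reporting the two kinds of mismatch separately would let a Notification event tell the creator exactly which files to resubmit.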

Unique Event Characteristics
These are special characteristics of events that may not be covered by PREMIS or other entities identified above.


 * Accession
 * special status indicators: {SIP transfer received (Archives accession pending audit); SIP audit passed (Archives has ownership, RIM may destroy corresponding records)}
 * Transfer
 * assigned transfer location - would have to have "from" and "to" environments
 * Quarantine
 * Quarantine Duration
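
Quarantine Duration implies a simple date calculation: the object may move on to malware checking only once the assigned period has elapsed. A small sketch follows, with hypothetical function names. The duration is passed in as a parameter because this analysis does not fix a value.

```python
from datetime import datetime, timedelta


def release_time(quarantine_start, duration):
    """Earliest moment at which the quarantined object may go on to malware checking."""
    return quarantine_start + duration


def quarantine_elapsed(quarantine_start, duration, now):
    """True once the assigned quarantine period has passed."""
    return now >= release_time(quarantine_start, duration)
```

For instance, with an assumed 30-day period, `release_time(datetime(2009, 11, 2, 9, 0), timedelta(days=30))` falls on 2 December 2009 at 09:00.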

'''

Agents
'''

Common Agent Semantic Units

 * PREMIS
 * 3.1 agentIdentifier
 * 3.1.1 agentIdentifierType
 * 3.1.2 agentIdentifierValue
 * 3.2 agentName
 * 3.3 agentType

Agent Types

 * Archival Storage (database?)
 * Archives (ex. name: City of Vancouver)
 * Archivist
 * Creator
 * Consumer
 * Data Management (ex. name: Qubit)
 * Digital Archives (ex. name: Archivematica)
 * Donor
 * FITS
 * Normalization software (ex. name: Xena)

Agent Characteristics
'''
 * Archival Storage (database?)
 * Archives (ex. name: City of Vancouver)
 * Archivist
 * Creator
 * Consumer
 * Data Management (ex. name: Qubit)
 * Digital Archives (ex. name: Archivematica)
 * Donor
 * FITS
 * Normalization software (ex. name: Xena)

Rights
'''

Common Rights Semantic Units

 * PREMIS