RQ-2


The technology or tools used to provide Archival storage functionality should, or may, offer the following features:

snapshots
Snapshots are an important feature when considering archival storage. If an ingest error has occurred, it is important to be able to roll back to a previous state. A scheme of daily-2 weekly-1 monthly-1 should be considered usable. If snapshot features are not native to the filesystem but a POSIX environment exists, *nix utilities such as amanda or rsnapshot could be used to carry out this task.
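To make the retention scheme concrete, here is a minimal sketch in Python. It assumes that daily-2 weekly-1 monthly-1 means "keep the two newest snapshots, plus the newest snapshot from the previous week and the newest from the previous month"; that reading, and the function name snapshots_to_keep, are illustrative assumptions rather than the behaviour of any particular tool.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, daily=2, weekly=1, monthly=1):
    """Return the set of snapshot dates to retain (hypothetical helper)."""
    ordered = sorted(set(snapshot_dates), reverse=True)  # newest first
    if not ordered:
        return set()
    keep = set(ordered[:daily])  # the most recent `daily` snapshots

    def newest_in_earlier_periods(period_of, limit):
        # Newest snapshot from each of the `limit` most recent periods
        # that precede the period containing the newest snapshot.
        current = period_of(ordered[0])
        chosen = {}
        for d in ordered:
            p = period_of(d)
            if p == current or p in chosen:
                continue
            if len(chosen) == limit:
                break
            chosen[p] = d
        return set(chosen.values())

    # Weeks are keyed by (ISO year, ISO week); months by (year, month).
    keep |= newest_in_earlier_periods(lambda d: tuple(d.isocalendar())[:2], weekly)
    keep |= newest_in_earlier_periods(lambda d: (d.year, d.month), monthly)
    return keep


if __name__ == "__main__":
    # One snapshot per day from 2024-03-01 to 2024-04-17.
    dates = [date(2024, 3, 1) + timedelta(days=i) for i in range(48)]
    for kept in sorted(snapshots_to_keep(dates)):
        print(kept)
```

Any snapshot not in the returned set would be a candidate for deletion. A tool such as rsnapshot handles this rotation itself, so the sketch is only meant to show what the scheme retains.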

incremental backups
In order for a snapshot service to scale, it must use incremental backup features. By using file links (hard links), an unchanged file does not need to be duplicated for each snapshot iteration. This allows many snapshots to exist that each contain only new changes. There are also filesystem utilities that can carry out this task.
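The link-based approach can be sketched as follows. This is a minimal illustration, assuming both snapshots live on the same filesystem (a requirement for hard links); the function name take_snapshot and the directory layout are hypothetical.

```python
import filecmp
import os
import shutil


def take_snapshot(source_dir, new_snapshot, previous_snapshot=None):
    """Copy source_dir into new_snapshot, hard-linking unchanged files.

    Files whose content matches the previous snapshot are linked to the
    existing copy, so they take up no additional space.
    """
    for root, _dirs, files in os.walk(source_dir):
        rel = os.path.relpath(root, source_dir)
        target_root = os.path.normpath(os.path.join(new_snapshot, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            prev = None
            if previous_snapshot is not None:
                prev = os.path.normpath(os.path.join(previous_snapshot, rel, name))
            if prev and os.path.isfile(prev) and filecmp.cmp(src, prev, shallow=False):
                os.link(prev, dst)      # unchanged: share the stored data
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy
```

Comparing full file contents keeps the sketch simple; comparing size and modification time instead would be cheaper on large archives, at the cost of missing changes that preserve both.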

periodic file integrity checks using checksums (aka scrubbing)
A checksum-aware filesystem ensures that any data inconsistencies are logged and repaired by the filesystem. This is often coordinated by an MDS (metadata service), which keeps track of checksums.
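Where the filesystem does not provide scrubbing, the detection half of the idea can be approximated in user space. The sketch below is an assumption-laden illustration: it uses SHA-256 and an in-memory manifest standing in for the metadata service, and the names build_manifest and scrub are hypothetical. It only reports inconsistencies; repairing them requires a redundant copy, which is outside this sketch.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def scrub(root, manifest):
    """Return the sorted relative paths that are missing or whose checksum changed."""
    damaged = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_sha256(full) != expected:
            damaged.append(rel)
    return sorted(damaged)
```

A periodic job would call build_manifest once at ingest time, store the result somewhere safe, and then run scrub on a schedule, logging whatever it returns.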

high availability
When managing archival data, constant access to the data is very important. Issues such as hardware failure or software faults must be taken into account when deploying a system. Failover allows a hot-spare machine to take over application duties while a systems administrator replaces the failed hardware, which minimizes downtime.

Backup to Tape
Tape backups provide a non-volatile, high-capacity method of data storage. Tapes are also used for off-site backup on low-bandwidth networks. They also provide insurance for disaster recovery in case any of the networked or hard-disk storage is permanently lost for whatever reason (e.g. human error, natural disaster, malicious attack).

multiple file access
Filesystems that support multiple access are important as they allow more than one machine to access and modify data. Without this, load balancing and failover become more complicated.

distributed storage (raid/striping or distribution of files)
Performance and reliability are the main benefits of distributed storage. Striping spreads disk access across several devices, and when combined with parity or replication it also allows for data repair.
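How striping can support repair is easiest to see with a single XOR parity block, in the style of RAID 5. The following is a minimal sketch under that assumption; the function names are hypothetical, and real systems rotate parity across disks and work on fixed-size blocks rather than on a whole byte string.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def stripe_with_parity(data, data_disks):
    """Split data into data_disks equal chunks plus one parity chunk."""
    chunk_len = -(-len(data) // data_disks)  # ceiling division
    padded = data.ljust(chunk_len * data_disks, b"\0")  # zero-pad the tail
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(data_disks)]
    return chunks, xor_blocks(chunks)


def rebuild_chunk(surviving_chunks, parity):
    """Recover the single missing chunk from the survivors and the parity."""
    return xor_blocks(list(surviving_chunks) + [parity])


if __name__ == "__main__":
    chunks, parity = stripe_with_parity(b"archival storage", data_disks=4)
    survivors = chunks[:2] + chunks[3:]  # pretend the third disk failed
    print(rebuild_chunk(survivors, parity) == chunks[2])
```

Because each chunk sits on a different disk, reads can proceed in parallel, and the loss of any one disk can be repaired from the remaining chunks and the parity.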

rapid expansion
The ability to hot-add new nodes to a pool or grid is important. Administrators need the ability to add storage to the grid without bringing down the network.

network aware
Network-aware filesystems allow us to scale without the purchase of expensive hardware (Fibre Channel, SCSI, etc., $500+). They allow filesystems to use normal 100/1000 Mbit/s network adapters ($25). This can make a world of difference for a smaller organization that is seeking to increase its data storage.

ease of deployment
When setting out to deploy OAIS, a method is needed that will work for a small organization but will also scale out for large organizations.

IRODS
?? Waiting for mailing list replies  ??