Implementing a Reliable Digital Object Archive

  • Brian Cooper
  • Arturo Crespo
  • Hector Garcia-Molina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1923)

Abstract

An Archival Repository reliably stores digital objects for long periods of time (decades or centuries). The archival nature of the system requires new techniques for storing, indexing, and replicating digital objects. In this paper we discuss the specialized indexing needs of a write-once archive. We also present a reliability algorithm for effectively replicating sets of related objects. We describe a data import utility for archival repositories. Finally, we discuss and evaluate a prototype repository we have built, the Stanford Archival Vault (SAV).

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Brian Cooper
    • 1
  • Arturo Crespo
    • 1
  • Hector Garcia-Molina
    • 1
  1. Department of Computer Science, Stanford University, USA
