Preserving the Fabric of Our Lives: A Survey of Web Preservation Initiatives

  • Michael Day
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2769)


This paper argues that the growing importance of the World Wide Web means that Web sites are key candidates for digital preservation. After a brief outline of some of the main reasons why the preservation of Web sites can be problematic, a review of selected Web archiving initiatives shows that most current initiatives are based on combinations of three main approaches: automatic harvesting, selection and deposit. The paper ends with a discussion of issues relating to collection and access policies, software, costs and preservation.


National Library, National Archive, Access Policy, Selective Approach, Digital Preservation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Michael Day (1)
  1. UKOLN, University of Bath, Bath, United Kingdom
