Encyclopedia of Social Network Analysis and Mining

Editors: Reda Alhajj, Jon Rokne

Web Archives

  • Klaus Berberich
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-6170-8_128

Synonyms

Web archiving; Web preservation

Glossary

Crawler

Software that harvests content from the World Wide Web

URL

Uniform Resource Locator

Scope

The resources that a web archive seeks to preserve

Website

A collection of related URLs

Definition

Web archives are repositories of web contents collected in the past. They act against the ephemeral nature of the World Wide Web, where new contents are constantly added while others are removed and thus lost forever. Web archives counter this loss by preserving web contents as part of the cultural heritage for future generations. To this end, web archives select resources (e.g., specific websites) worth preserving, repeatedly acquire snapshots of these resources, store them together with metadata (e.g., a time stamp or keywords), and provide access to the archived web contents (e.g., via keyword search). Institutions operating web archives include nonprofit organizations, universities, national libraries, and for-profit companies. Users of...
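
The four steps named in the definition (select, acquire, store with metadata, provide access) can be illustrated with a minimal sketch. It is not modeled on any real archive or crawler: the class ToyWebArchive, its methods, and the example.org URLs are hypothetical names introduced here, and page content is passed in directly rather than fetched from the web.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    url: str
    captured_at: datetime   # time stamp metadata
    keywords: tuple         # keyword metadata
    content: str


class ToyWebArchive:
    """Hypothetical in-memory archive, for illustration only."""

    def __init__(self, scope):
        self.scope = set(scope)   # resources selected as worth preserving
        self.snapshots = []       # acquired snapshots, in acquisition order

    def acquire(self, url, content, captured_at, keywords=()):
        """Store a snapshot of an in-scope URL; return False if out of scope."""
        if url not in self.scope:
            return False
        self.snapshots.append(Snapshot(url, captured_at, tuple(keywords), content))
        return True

    def search(self, term):
        """Return snapshots whose content contains, or whose keywords equal, the term."""
        needle = term.lower()
        return [
            s for s in self.snapshots
            if needle in s.content.lower()
            or any(needle == k.lower() for k in s.keywords)
        ]


# Assumed example data: two snapshots of one selected URL, one out-of-scope URL.
archive = ToyWebArchive(scope={"http://example.org/"})
archive.acquire("http://example.org/", "Welcome to our old homepage",
                datetime(2012, 1, 1), keywords=["homepage"])
archive.acquire("http://example.org/", "Welcome to our redesigned homepage",
                datetime(2013, 1, 1), keywords=["homepage"])
archive.acquire("http://example.net/", "Not selected", datetime(2013, 1, 1))
```

Because each acquisition is kept as a separate time-stamped snapshot instead of overwriting the previous one, a keyword search can return several versions of the same URL, which is what distinguishes an archive from a cache of the current web.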

References

  1. Archive-It (2013) http://www.archive-it.org. Last access 25 Apr 2013
  2. Archive The Net (2013) http://archivethe.net. Last access 25 Apr 2013
  3. Arvidson A, Lettenstrom F (1998) The Kulturarw project - the Swedish royal web archive. Electron Libr 16(2):105–108
  4. Berberich K, Bedathur S, Neumann T, Weikum G (2007) A time machine for text search. In: SIGIR'07: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, Amsterdam, pp 519–526
  5. Denev D, Mazeika A, Spaniol M, Weikum G (2011) The SHARC framework for data quality in web archiving. VLDB J 20(2):183–207
  6. Dougherty M, Meyer ET, Madsen C, van den Heuvel C, Thomas A, Wyatt S (2010) Researcher engagement with web archives - state of the art. JISC Project Report. http://ssrn.com/abstract=1714997. Last access 25 Apr 2013
  7. Gomes D, Miranda J, Costa M (2011) A survey on web archiving initiatives. In: Proceedings of the 15th international conference on theory and practice of digital libraries: research and advanced technology for digital libraries, TPDL'11, Berlin, pp 408–420
  8. Heritrix (2013) http://crawler.archive.org. Last access 25 Apr 2013
  9. International Internet Preservation Consortium (2013) http://www.netpreserve.org. Last access 25 Apr 2013
  10. International Web Archiving Workshop (2013) http://www.iwaw.net. Last access 25 Apr 2013
  11. Internet Archive (2013) http://www.archive.org. Last access 25 Apr 2013
  12. Internet Memory Foundation (2013) http://www.internetmemory.org. Last access 25 Apr 2013
  13. Kahle B (1997) Preserving the internet. Scientific American, New York
  14. Library of Congress (2013) http://www.loc.gov. Last access 25 Apr 2013
  15. Madhavan J, Ko D, Kot L, Ganapathy V, Rasmussen A, Halevy AY (2008) Google's deep web crawl. PVLDB 1(2):1241–1252
  16. Masanès J (2006) Web archiving. Springer, Heidelberg
  17. Meyer ET, Thomas A, Schroeder R (2011) Web archives: the future(s). Oxford Internet Institute Technical Report. http://ssrn.com/abstract=1830025. Last access 25 Apr 2013
  18. National Library of France (2013) http://www.bnf.fr. Last access 25 Apr 2013
  19. Niu J (2012a) Functionalities of web archives. D-Lib Mag 18(3/4). http://www.dlib.org/dlib/march12/niu/03niu2.html
  20. Niu J (2012b) An overview of web archiving. D-Lib Mag 18(3/4). http://www.dlib.org/dlib/march12/niu/03niu1.html
  21. Olston C, Najork M (2010) Web crawling. Found Trends Inf Retr 4(3):175–246
  22. Portuguese Web Archive (2013) http://www.arquivo.pt. Last access 25 Apr 2013
  23. Stanford WebBase Project (2013) http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase. Last access 25 Apr 2013
  24. Thomas A, Meyer ET, Dougherty M, van den Heuvel C, Madsen CM, Wyatt S (2010) Researcher engagement with web archives: challenges and opportunities for investment. http://ssrn.com/abstract=1715000. Last access 25 Apr 2013
  25. Toyoda M, Kitsuregawa M (2012) The history of web archiving. Proc IEEE 100:1441–1443
  26. UNESCO (2003) Charter on the preservation of digital heritage. http://portal.unesco.org/ci/en/files/13367/10700115911Charter_en.pdf/Charter_en.pdf. Last access 25 Apr 2013

Recommended Reading

  1. Masanès (2006) remains the key reference on web archives. While web technology has evolved since its publication, the majority of issues discussed therein are still current. More recent accounts of the state of the art in web archiving can be found in Niu (2012b) and Dougherty et al. (2010). Meyer et al. (2011), as a final recommendation, give a glimpse of web archives' future.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Klaus Berberich, Max Planck Institute for Informatics, Saarbrücken, Germany