Encyclopedia of Social Network Analysis and Mining

2018 Edition
Editors: Reda Alhajj, Jon Rokne

Web Archives

Klaus Berberich
Reference work entry
DOI: https://doi.org/10.1007/978-1-4939-7131-2_128




Software that harvests content from the World Wide Web


Uniform Resource Locator


The resources that a web archive seeks to preserve


A collection of related URLs


Web archives are repositories of web contents collected in the past. They counteract the ephemeral nature of the World Wide Web, where new contents are constantly added while others are removed and thus lost forever. Web archives counter this loss by preserving web contents as part of the cultural heritage for future generations. To this end, web archives select resources worth preserving (e.g., specific websites), repeatedly acquire snapshots of these resources, store the snapshots together with metadata (e.g., a time stamp or keywords), and provide access to the archived web contents (e.g., via keyword search). Institutions operating web archives include nonprofit organizations, universities, national libraries, and for-profit companies. Users...
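The cycle described above (select resources, repeatedly acquire snapshots, store them with metadata, provide access) can be illustrated with a minimal in-memory sketch. It is not how any particular archive is built: all names (`WebArchive`, `store`, `as_of`, `search`) are hypothetical, and actual crawling and persistent storage are deliberately left out, so snapshots are passed in by hand.

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Snapshot:
    """One archived copy of a resource plus its metadata (hypothetical model)."""
    url: str
    timestamp: datetime
    content: str
    keywords: set = field(default_factory=set)


class WebArchive:
    """Toy in-memory web archive; a sketch, not a real system's API."""

    def __init__(self, seeds):
        # Selection: only these URLs are considered worth preserving.
        self.selected = set(seeds)
        # url -> list of snapshots, kept sorted by timestamp.
        self.snapshots = {}

    def store(self, url, timestamp, content, keywords=()):
        """Acquisition and storage: record one snapshot of a selected URL."""
        if url not in self.selected:
            raise ValueError(f"{url} was not selected for preservation")
        versions = self.snapshots.setdefault(url, [])
        times = [s.timestamp for s in versions]
        # Insert so that snapshots stay ordered by time stamp.
        pos = bisect.bisect_right(times, timestamp)
        snap = Snapshot(url, timestamp, content, {k.lower() for k in keywords})
        versions.insert(pos, snap)

    def as_of(self, url, when):
        """Access: the most recent snapshot taken at or before `when`, or None."""
        versions = self.snapshots.get(url, [])
        times = [s.timestamp for s in versions]
        pos = bisect.bisect_right(times, when)
        return versions[pos - 1] if pos > 0 else None

    def search(self, term, when):
        """Access: keyword search over the archive as it looked at `when`.

        Matches a stored keyword exactly or the content by a simple
        case-insensitive substring test.
        """
        term = term.lower()
        hits = []
        for url in sorted(self.snapshots):
            snap = self.as_of(url, when)
            if snap and (term in snap.keywords or term in snap.content.lower()):
                hits.append(snap)
        return hits


if __name__ == "__main__":
    archive = WebArchive(seeds=["http://example.org/"])
    archive.store("http://example.org/", datetime(2010, 1, 1), "Hello 2010", ["greeting"])
    archive.store("http://example.org/", datetime(2012, 6, 1), "Hello again", ["greeting"])
    print(archive.as_of("http://example.org/", datetime(2011, 1, 1)).content)
```

The `as_of` lookup returns the latest snapshot that is not newer than the requested time, so each query sees the archive as it stood at that moment. A production system would presumably keep a persistent time index instead of rebuilding the list of time stamps on every call.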


References

  1. Archive-It (2013) http://www.archive-it.org. Last access 25 Apr 2013
  2. Archive The Net (2013) http://archivethe.net. Last access 25 Apr 2013
  3. Arvidson A, Lettenstrom F (1998) The Kulturarw project – the Swedish royal web archive. Electron Libr 16(2):105–108
  4. Berberich K, Bedathur S, Neumann T, Weikum G (2007) A time machine for text search. In: SIGIR’07: proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, Amsterdam, pp 519–526
  5. Denev D, Mazeika A, Spaniol M, Weikum G (2011) The SHARC framework for data quality in web archiving. VLDB J 20(2):183–207
  6. Dougherty M, Meyer ET, Madsen C, van den Heuvel C, Thomas A, Wyatt S (2010) Researcher engagement with web archives – state of the art. JISC project report. http://ssrn.com/abstract=1714997. Last access 25 Apr 2013
  7. Gomes D, Miranda JA, Costa M (2011) A survey on web archiving initiatives. In: Proceedings of the 15th international conference on theory and practice of digital libraries: research and advanced technology for digital libraries, TPDL’11, Berlin, pp 408–420
  8. Heritrix (2013) http://crawler.archive.org. Last access 25 Apr 2013
  9. International Internet Preservation Consortium (2013) http://www.netpreserve.org. Last access 25 Apr 2013
  10. International Web Archiving Workshop (2013) http://www.iwaw.net. Last access 25 Apr 2013
  11. Internet Archive (2013) http://www.archive.org. Last access 25 Apr 2013
  12. Internet Memory Foundation (2013) http://www.internetmemory.org. Last access 25 Apr 2013
  13. Kahle B (1997) Preserving the internet. Scientific American, New York
  14. Library of Congress (2013) http://www.loc.gov. Last access 25 Apr 2013
  15. Madhavan J, Ko D, Kot L, Ganapathy V, Rasmussen A, Halevy AY (2008) Google’s deep web crawl. PVLDB 1(2):1241–1252
  16. Masanès J (2006) Web archiving. Springer, Heidelberg
  17. Meyer ET, Thomas A, Schroeder R (2011) Web archives: the future(s). Oxford internet institute technical report. http://ssrn.com/abstract=1830025. Last access 25 Apr 2013
  18. National Library of France (2013) http://www.bnf.fr. Last access 25 Apr 2013
  19. Niu J (2012a) Functionalities of web archives. D-Lib Mag 18(3/4). http://www.dlib.org/dlib/march12/niu/03niu2.html
  20. Niu J (2012b) An overview of web archiving. D-Lib Mag 18(3/4). http://www.dlib.org/dlib/march12/niu/03niu1.html
  21. Olston C, Najork M (2010) Web crawling. Found Trends Inf Retr 4(3):175–246
  22. Portuguese Web Archive (2013) http://www.arquivo.pt. Last access 25 Apr 2013
  23. Stanford WebBase Project (2013) http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase. Last access 25 Apr 2013
  24. Thomas A, Meyer ET, Dougherty M, van den Heuvel C, Madsen CM, Wyatt S (2010) Researcher engagement with web archives: challenges and opportunities for investment. http://ssrn.com/abstract=1715000. Last access 25 Apr 2013
  25. Toyoda M, Kitsuregawa M (2012) The history of web archiving. Proc IEEE 100:1441–1443
  26. UNESCO (2003) Charter on the preservation of digital heritage. http://portal.unesco.org/ci/en/files/13367/10700115911Charter_en.pdf/Charter_en.pdf. Last access 25 Apr 2013

Recommended Reading

Masanès (2006) remains the key reference on web archives. While web technology has evolved since its publication, most of the issues discussed therein are still current. More recent accounts of the state of the art in web archiving can be found in Niu (2012b) and Dougherty et al. (2010). As a final recommendation, Meyer et al. (2011) give a glimpse of the future of web archives.

Copyright information

© Springer Science+Business Media LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Max Planck Institute for Informatics, Saarbrücken, Germany

Section editors and affiliations

  • Thomas Gottron (1)
  • Stefan Schlobach (2)
  • Steffen Staab (3)

  1. Institute for Web Science and Technologies, Universität Koblenz-Landau, Koblenz, Germany
  2. VU, Amsterdam, The Netherlands
  3. Institute for Web Science and Technologies – WeST, University of Koblenz-Landau, Koblenz, Germany