Archival Tools to Match the Web: Open, International, Comprehensive

  • Conference paper
Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers (ICADL 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4822)

Abstract

Together with a number of national libraries, the Internet Archive committed itself in 2003 to international collaboration to create open source tools and standardized formats for web archiving. This project was motivated by our experience as home to over 100 billion archived web resources dating back to 1996, and as a partner to memory institutions building thematic web archives. Resulting tools include the Heritrix archival web crawler/harvester, the Wayback archive browsing service, and the NutchWAX archive full-text index and query utilities. A standard ingest/archival format for web resources called WARC has also been developed. Software with full source code is free to download and reuse, and organizations worldwide have adopted and contributed to these tools. Working with large collections remains a challenge, and the web itself is constantly growing and changing, so we continue to seek international cooperation to expand and improve this web archive tool set.

Editor information

Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, Edie Rasmussen

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mohr, G. (2007). Archival Tools to Match the Web: Open, International, Comprehensive. In: Goh, D.HL., Cao, T.H., Sølvberg, I.T., Rasmussen, E. (eds) Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. ICADL 2007. Lecture Notes in Computer Science, vol 4822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77094-7_3

  • DOI: https://doi.org/10.1007/978-3-540-77094-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77093-0

  • Online ISBN: 978-3-540-77094-7

  • eBook Packages: Computer Science, Computer Science (R0)