Profiling Web Archive Coverage for Top-Level Domain and Content Language

  • Ahmed Alsum
  • Michele C. Weigle
  • Michael L. Nelson
  • Herbert Van de Sompel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8092)

Abstract

The Memento aggregator currently polls every known public web archive when serving a request for an archived web page, even though some web archives focus only on specific domains and ignore the others. Similar to query routing in distributed search, we investigate how aggregated Memento TimeMaps (lists of when and where a web page was archived) are affected when queries are sent only to the archives likely to hold the archived page. We profile twelve public web archives using data from a variety of sources (the web, archives’ access logs, and full-text queries to archives) and find that sending each request to only the top three web archives (i.e., a 75% reduction in the number of queries) produces the full TimeMap in 84% of the cases.
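
The routing idea in the abstract can be illustrated with a minimal Python sketch. It assumes a per-TLD profile that scores each archive by how often it held sampled URIs; the archive names, the scores, and the helper functions `tld_of`, `route`, and `query_reduction` are all hypothetical and are not taken from the paper or from the Memento aggregator code. The sketch only shows the general shape of top-k routing and the arithmetic behind the 75% figure (3 of 12 archives queried).

```python
from urllib.parse import urlsplit

# Hypothetical profile: for each TLD, the fraction of sampled URIs that each
# archive held. These numbers are invented for illustration only.
PROFILE = {
    "com": {"archive_a": 0.80, "archive_b": 0.30, "archive_c": 0.10, "archive_d": 0.05},
    "uk": {"archive_a": 0.60, "archive_b": 0.05, "archive_c": 0.70, "archive_d": 0.02},
}


def tld_of(uri):
    """Return the last label of the URI's host name, lowercased."""
    host = urlsplit(uri).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def route(uri, profile, k=3):
    """Pick the k archives with the highest profile score for the URI's TLD.

    If the TLD is not profiled, fall back to every archive in the profile,
    which mirrors the poll-everything behaviour described in the abstract.
    """
    scores = profile.get(tld_of(uri))
    if scores is None:
        return sorted({name for row in profile.values() for name in row})
    # Sort by descending score; break ties by name so the result is stable.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]


def query_reduction(total_archives, k):
    """Fraction of queries saved by polling k archives instead of all."""
    return 1 - k / total_archives


if __name__ == "__main__":
    print(route("http://www.bbc.co.uk/", PROFILE))
    print(query_reduction(12, 3))
```

With twelve archives and k = 3, `query_reduction(12, 3)` evaluates to 0.75, matching the reduction stated in the abstract. How the real profiles are built and scored is described in the paper itself, not in this sketch.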

Keywords

Web archive, query routing, Memento aggregator

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ahmed Alsum (1)
  • Michele C. Weigle (1)
  • Michael L. Nelson (1)
  • Herbert Van de Sompel (2)
  1. Computer Science Department, Old Dominion University, Norfolk, USA
  2. Los Alamos National Laboratory, Los Alamos, USA
