On the Change in Archivability of Websites Over Time

Kelly, Mat; Brunelle, Justin F.; Weigle, Michele C.; Nelson, Michael L.

doi:10.1007/978-3-642-40501-3_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8092)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

As web technologies evolve, web archivists work to keep up so that our digital history is preserved. Recent advances in web technologies have introduced client-side executed scripts that load data without a referential identifier or that require user interaction (e.g., content that loads only when the page is scrolled). These advances have made it more difficult to automate the capture of web pages. Because publishing practices have evolved alongside the growing capability of web preservation tools, the archivability of pages on the web has varied over time. In this paper we show that the archivability of a web page can be deduced from the type of page being archived, which aligns with that page’s accessibility with respect to dynamic content. We show concrete examples of when these technologies were introduced by referencing mementos of pages that have persisted through a long evolution of available technologies. Identifying why these web pages could not be archived in the past, with respect to accessibility, serves as a guide for ensuring that content meant to last is published using good-practice methods that make it available for preservation.

Author information

Authors and Affiliations

Department of Computer Science, Old Dominion University, Norfolk, VA, 23529, USA
Mat Kelly, Justin F. Brunelle, Michele C. Weigle & Michael L. Nelson

Editor information

Editors and Affiliations

Department of Computer and Information Science, Norwegian University of Science and Technology, 7491, Trondheim, Norway
Trond Aalberg
Department of Archives and Library Science, Ionian University, 49100, Corfu, Greece
Christos Papatheodorou
Department of Library Information and Archive Sciences, University of Malta, MSD2280, Msida, Malta
Milena Dobreva
Library and Information Center, University of Patras, 26504, Patras, Greece
Giannis Tsakonas
National Archives of Malta, RBT1043, Rabat, Malta
Charles J. Farrugia

About this paper

Cite this paper

Kelly, M., Brunelle, J.F., Weigle, M.C., Nelson, M.L. (2013). On the Change in Archivability of Websites Over Time. In: Aalberg, T., Papatheodorou, C., Dobreva, M., Tsakonas, G., Farrugia, C.J. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2013. Lecture Notes in Computer Science, vol 8092. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40501-3_5

DOI: https://doi.org/10.1007/978-3-642-40501-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40500-6
Online ISBN: 978-3-642-40501-3
eBook Packages: Computer Science, Computer Science (R0)
