Abstract
As web technologies evolve, web archivists work to keep up so that our digital history is preserved. Recent advances in web technologies have introduced client-side executed scripts that load data without a referential identifier or that require user interaction (e.g., content loading when the page has scrolled). These advances have made automating methods for capturing web pages more difficult. Because of the evolving schemes of publishing web pages along with the progressive capability of web preservation tools, the archivability of pages on the web has varied over time. In this paper we show that the archivability of a web page can be deduced from the type of page being archived, which aligns with that page’s accessibility in respect to dynamic content. We show concrete examples of when these technologies were introduced by referencing mementos of pages that have persisted through a long evolution of available technologies. Identifying these reasons for the inability of these web pages to be archived in the past in respect to accessibility serves as a guide for ensuring that content that has longevity is published using good practice methods that make it available for preservation.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Twitter Donates Entire Tweet Archive to Library of Congress (2010), http://www.loc.gov/today/pr/2010/10-081.html
Ainsworth, S.G., Alsum, A., SalahEldeen, H., Weigle, M.C., Nelson, M.L.: How Much of the Web is Archived. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL), JCDL 2011, pp. 133–136. ACM, New York (2011)
Ast, P., Kapfenberger, M., Hauswiesner, S.: Crawler Approaches And Technology. Graz University of Technology, Styria, Austria (2008), http://www.iicm.tugraz.at/cguetl/courses/isr/uearchive/uews2008/Ue01%20-%20Crawler-Approaches-And-Technology.pdf
Bass, J.: Getting Personal: Confronting the Challenges of Archiving Personal Records in the Digital Age. Master’s thesis, University of Winnipeg (2012)
Bergman, M.: White Paper: the Deep Web: Surfacing Hidden Value. Journal of Electronic Publishing 7(1) (2001)
Brunelle, J.F., Kelly, M., Weigle, M.C., Nelson, M.L.: Losing the Moment: The Unarchivability of Shared Links (submitted for publication)
Chisholm, W., Vanderheiden, G., Jacobs, I.: Web Content Accessibility Guidelines 1.0. Interactions 8(4), 35–54 (2001)
Crook, E.: Web Archiving in a Web 2.0 World. The Electronic Library 27(5), 831–836 (2009)
Garrett, J.: Ajax: A New Approach to Web Applications (2005), http://www.adaptivepath.com/ideas/ajax-new-approach-web-applications
Brunelle, J.F.: Zombies in the Archives (2012), http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
Kelly, M.: An Extensible Framework For Creating Personal Archives Of Web Resources Requiring Authentication. Master’s thesis, Old Dominion University (2012)
Kelly, M., Weigle, M.C.: WARCreate - Create Wayback-Consumable WARC Files from Any Webpage. In: Proceedings of the 12th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL), Washington, DC, pp. 437–438 (June 2012)
Kıcıman, E., Livshits, B.: AjaxScope: A Platform for Remotely Monitoring the Client-Side Behavior of Web 2.0 Applications. In: Proceedings of Symposium on Operating Systems Principles (2007)
Likarish, P., Jung, E.: A Targeted Web Crawling for Building Malicious Javascript Collection. In: Proceedings of the ACM First International Workshop on Data-Intensive Software Management and Mining, DSMM 2009, pp. 23–26. ACM, New York (2009)
Livshits, B., Guarnieri, S.: Gulfstream: Incremental Static Analysis for Streaming JavaScript Applications. Technical Report MSR-TR-2010-4, Microsoft (January 2010)
McCown, F., Diawara, N., Nelson, M.L.: Factors Affecting Website Reconstruction from the Web Infrastructure. In: Proceedings of the 7th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 39–48 (2007)
McCown, F., Marshall, C.C., Nelson, M.L.: Why Websites Are Lost (and How They’re Sometimes Found). Communications of the ACM 52(11), 141–145 (2009)
McCown, F., Nelson, M.L.: What Happens When Facebook is Gone. In: Proceedings of the 9th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 251–254. ACM, New York (2009)
Meyer, E.: Researcher Engagement with Web Archives-Challenges and Opportunities. Technical report, University of Oxford (2010)
Meyerovich, L., Livshits, B.: Conscript: Specifying and Enforcing Fine-Grained Security Policies for Javascript in the Browser. In: 2010 IEEE Symposium on Security and Privacy (SP), pp. 481–496. IEEE (2010)
Mohr, G., Kimpton, M., Stack, M., Ranitovic, I.: Introduction to Heritrix, an Archival Quality Web Crawler. In: Proceedings of the 4th International Web Archiving Workshop (IWAW 2004) (September 2004)
Parmanto, B., Zeng, X.: Metric for Web Accessibility Evaluation. Journal of the American Society for Information Science and Technology 56(13), 1394–1404 (2005)
Prellwitz, M., Nelson, M.L.: Music Video Redundancy and Half-Life in YouTube. In: Gradmann, S., Borri, F., Meghini, C., Schuldt, H. (eds.) TPDL 2011. LNCS, vol. 6966, pp. 143–150. Springer, Heidelberg (2011)
Shah, C.: Tubekit: a Query-based YouTube Crawling Toolkit. In: Proceedings of the 8th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL), p. 433. ACM (2008)
Sigurðsson, K.: Incremental Crawling with Heritrix. In: Proceedings of the 5th International Web Archiving Workshop, IWAW 2005 (2005)
Tofel, B.: ‘Wayback’ for Accessing Web Archives. In: Proceedings of the 7th International Web Archiving Workshop, IWAW 2007 (2007)
Van de Sompel, H., Nelson, M.L., Sanderson, R., Balakireva, L.L., Ainsworth, S., Shankar, H.: Memento: Time Travel for the Web. Technical Report arXiv:0911.1112 (2009)
Vikram, K., Prateek, A., Livshits, B.: Ripley: Automatically Securing Web 2.0 Applications Through Replicated Execution. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, pp. 173–186. ACM (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kelly, M., Brunelle, J.F., Weigle, M.C., Nelson, M.L. (2013). On the Change in Archivability of Websites Over Time. In: Aalberg, T., Papatheodorou, C., Dobreva, M., Tsakonas, G., Farrugia, C.J. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2013. Lecture Notes in Computer Science, vol 8092. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40501-3_5
Download citation
DOI: https://doi.org/10.1007/978-3-642-40501-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40500-6
Online ISBN: 978-3-642-40501-3
eBook Packages: Computer ScienceComputer Science (R0)