On the Change in Archivability of Websites Over Time
Abstract
As web technologies evolve, web archivists work to keep up so that our digital history is preserved. Recent advances in web technologies have introduced client-side scripts that load data without a referential identifier or that require user interaction (e.g., content that loads only when the page is scrolled). These advances have made it more difficult to automate the capture of web pages. Because publishing practices have evolved alongside the capabilities of web preservation tools, the archivability of pages on the web has varied over time. In this paper we show that the archivability of a web page can be deduced from the type of page being archived, which aligns with that page’s accessibility with respect to dynamic content. We show concrete examples of when these technologies were introduced by referencing mementos of pages that have persisted through a long evolution of available technologies. Identifying why these web pages could not be archived in the past, with respect to accessibility, serves as a guide for publishing long-lived content using good-practice methods that make it available for preservation.
Keywords
Web Archiving, Digital Preservation