International Journal on Digital Libraries, Volume 17, Issue 2, pp 95–117

The impact of JavaScript on archivability

  • Justin F. Brunelle
  • Mat Kelly
  • Michele C. Weigle
  • Michael L. Nelson
Article

DOI: 10.1007/s00799-015-0140-8

Cite this article as:
Brunelle, J.F., Kelly, M., Weigle, M.C. et al. Int J Digit Libr (2016) 17: 95. doi:10.1007/s00799-015-0140-8

Abstract

As web technologies evolve, web archivists work to adapt so that digital history is preserved. Recent advances in web technologies have introduced client-side executed scripts (Ajax) that, for example, load data without a change in the top-level Uniform Resource Identifier (URI) or require user interaction (e.g., content loading via Ajax when the page has scrolled). These advances have made automating methods for capturing web pages more difficult. In an effort to understand why mementos (archived versions of live resources) in today’s archives vary in completeness and sometimes pull content from the live web, we present a study of web resources and archival tools. We used a collection of URIs shared over Twitter and a collection of URIs curated by Archive-It in our investigation. We created local archived versions of the URIs from the Twitter and Archive-It sets using WebCite, wget, and the Heritrix crawler. We found that only 4.2 % of the Twitter collection is perfectly archived by all of these tools, while 34.2 % of the Archive-It collection is perfectly archived. After studying the quality of these mementos, we identified the practice of loading resources via JavaScript (Ajax) as the source of archival difficulty. Further, we show that resources are increasing their use of JavaScript to load embedded resources. By 2012, over half (54.5 %) of pages use JavaScript to load embedded resources. The number of embedded resources loaded via JavaScript has increased by 12.0 % from 2005 to 2012. We also show that JavaScript is responsible for 33.2 % more missing resources in 2012 than in 2005. This shows that JavaScript is responsible for an increasing proportion of the embedded resources unsuccessfully loaded by mementos. JavaScript is also responsible for 52.7 % of all missing embedded resources in our study.

Keywords

Web architecture, Web archiving, Digital preservation

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Justin F. Brunelle (1)
  • Mat Kelly (1)
  • Michele C. Weigle (1)
  • Michael L. Nelson (1)

  1. Department of Computer Science, Old Dominion University, Norfolk, USA