Creating Structure in Web Archives with Collections: Different Concepts from Web Archivists

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13541)

Abstract

As web archives’ holdings grow, archivists subdivide them into collections so they are easier to understand and manage. In this work, we review the collection structures of eight web archive platforms. We note a plethora of different approaches to web archive collection structures. Some web archive collections support sub-collections and some permit embargoes. Curatorial decisions may be attributed to a single organization or many. Archived web pages are known by many names: mementos, copies, captures, or snapshots. Some platforms restrict a memento to a single collection and others allow mementos to cross collections. Knowledge of collection structures has implications for many different applications and users. Visitors will need to understand how to navigate collections. Future archivists will need to understand what options are available for designing collections. Platform designers need this knowledge to see what possibilities exist. The developers of tools that consume collections need to understand collection structures so they can meet the needs of their users.
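
The structural options named in the abstract can be pictured with a minimal Python sketch. It is an illustration only, not a model taken from the paper or from any of the reviewed platforms: the class names, the field names (`urim`, `original_url`, `embargo_until`), and the `place` helper are all hypothetical, and the `exclusive` flag simply stands in for the difference between platforms that restrict a memento to one collection and those that let mementos cross collections.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Memento:
    """An archived web page (also called a copy, capture, or snapshot)."""
    urim: str          # hypothetical field: URI of the archived copy
    original_url: str  # hypothetical field: URL of the live page
    captured: date


@dataclass
class Collection:
    """A collection that may hold mementos and nested sub-collections."""
    name: str
    curators: list[str]                   # one organization or many
    embargo_until: Optional[date] = None  # None means no embargo
    mementos: list[Memento] = field(default_factory=list)
    subcollections: list["Collection"] = field(default_factory=list)

    def is_visible(self, today: date) -> bool:
        """A collection is visible once any embargo date has been reached."""
        return self.embargo_until is None or today >= self.embargo_until

    def all_mementos(self) -> list[Memento]:
        """Gather mementos from this collection and every sub-collection."""
        found = list(self.mementos)
        for sub in self.subcollections:
            found.extend(sub.all_mementos())
        return found


def place(memento: Memento, target: Collection,
          roots: list[Collection], exclusive: bool) -> None:
    """Add a memento to target.

    When exclusive is True (assumed single-collection policy), refuse if
    any other top-level collection tree in roots already holds the memento.
    """
    if exclusive and any(
        memento in root.all_mementos() for root in roots if root is not target
    ):
        raise ValueError("memento already belongs to another collection")
    target.mementos.append(memento)
```

Under these assumptions, a platform that allows mementos to cross collections would call `place(..., exclusive=False)`, while a single-collection platform would pass `exclusive=True` and reject the second placement.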

Keywords

  • Web archives
  • Collections
  • Information organization
  • Memento


Acknowledgements

Our many thanks go to the International Internet Preservation Consortium (IIPC) for funding this work in part. This research was supported by the Information Science and Technology Institute and by the Laboratory Directed Research and Development program of Los Alamos National Laboratory (LANL) under project number 20210529CR. LANL is operated by Triad National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy (Contract No. 89233218CNA000001).

Author information

Corresponding author

Correspondence to Himarsha R. Jayanetti.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Jayanetti, H.R. et al. (2022). Creating Structure in Web Archives with Collections: Different Concepts from Web Archivists. In: , et al. Linking Theory and Practice of Digital Libraries. TPDL 2022. Lecture Notes in Computer Science, vol 13541. Springer, Cham. https://doi.org/10.1007/978-3-031-16802-4_45

  • DOI: https://doi.org/10.1007/978-3-031-16802-4_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16801-7

  • Online ISBN: 978-3-031-16802-4

  • eBook Packages: Computer Science, Computer Science (R0)