
ArchiveWeb: collaboratively extending and exploring web archive collections—How would you like to work with your collections?


Curated web archive collections contain focused digital content, collected by archiving organizations, groups, and individuals to provide a representative sample of specific topics and events and to preserve it for future exploration and analysis. In this paper, we discuss how best to support the collaborative construction and exploration of these collections through the ArchiveWeb system. ArchiveWeb has been developed using an iterative, evaluation-driven, design-based research approach, with considerable user feedback at all stages. The first part of this paper describes the key insights we gained from our initial requirements engineering phase during the first year of the project, and presents the main functionalities of the current ArchiveWeb system for searching, constructing, exploring, and discussing web archive collections. The second part summarizes the feedback we received on this version from archiving organizations and libraries, as well as our corresponding plans for improving and extending the system in the next release.


  20. ArchiveWeb is not (yet) directly coupled to the Archive-It administrative interface: curators have to switch to this interface to add suggested seed URLs to the corresponding Archive-It collection.


  22. Web Archive Transformation (WAT) files contain key metadata such as capture information, essential text and link data, and other information extracted from (W)ARC files.

  23. Web Archive Named Entities (WANE) files contain a list of the people, places, and organizations mentioned in each valid archived record, extracted using the Stanford Named Entity Recognizer.

  24. Longitudinal Graph Analysis (LGA) files list, for an entire web archive collection, which URLs link to which other URLs, with a timestamp for each link.
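WAT files (footnote 22) carry the link data extracted from (W)ARC files, and LGA files (footnote 24) record which URLs link to which over time. The Python sketch below gives a rough illustration of the longitudinal analysis this data enables. It extracts timestamped links from WAT-style JSON payloads and then counts how many distinct pages link to each URL in each capture year.

The sketch rests on several assumptions:

- It expects the JSON layout commonly seen in WAT payloads, with links under `Envelope` → `Payload-Metadata` → `HTTP-Response-Metadata` → `HTML-Metadata` → `Links`.
- It does not read the WARC container itself.
- `edges_from_wat_json`, `inlinks_per_year`, and `make_wat_json` are hypothetical helpers, not part of ArchiveWeb or Archive-It.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (source URL, target URL, capture timestamp): an assumed in-memory form of a
# timestamped link, not the actual LGA file format.
Edge = Tuple[str, str, str]


def edges_from_wat_json(payload: str) -> List[Edge]:
    """Extract outgoing links from the JSON payload of one WAT record.

    ASSUMPTION: links sit under Envelope -> Payload-Metadata ->
    HTTP-Response-Metadata -> HTML-Metadata -> Links; verify the field
    names against your own WAT files before relying on this.
    """
    envelope = json.loads(payload).get("Envelope", {})
    header = envelope.get("WARC-Header-Metadata", {})
    source = header.get("WARC-Target-URI")
    timestamp = header.get("WARC-Date", "")
    if source is None:
        return []  # no capture URI, so the links cannot be attributed
    links = (envelope.get("Payload-Metadata", {})
             .get("HTTP-Response-Metadata", {})
             .get("HTML-Metadata", {})
             .get("Links", []))
    return [(source, link["url"], timestamp) for link in links if link.get("url")]


def inlinks_per_year(edges: Iterable[Edge]) -> Dict[Tuple[str, str], int]:
    """Count distinct linking pages per (target URL, capture year)."""
    linkers = defaultdict(set)
    for source, target, timestamp in edges:
        year = timestamp[:4]  # WARC-Date is ISO 8601, e.g. 2015-03-01T10:00:00Z
        linkers[(target, year)].add(source)
    return {key: len(sources) for key, sources in linkers.items()}


def make_wat_json(source: str, date: str, targets: List[str]) -> str:
    """Hypothetical helper: build a minimal WAT-like payload for testing."""
    html_meta = {"Links": [{"path": "A@/href", "url": t} for t in targets]}
    return json.dumps({"Envelope": {
        "WARC-Header-Metadata": {"WARC-Target-URI": source, "WARC-Date": date},
        "Payload-Metadata": {"HTTP-Response-Metadata": {"HTML-Metadata": html_meta}},
    }})


payloads = [
    make_wat_json("http://a.example/", "2015-03-01T10:00:00Z", ["http://c.example/"]),
    make_wat_json("http://b.example/", "2015-06-01T10:00:00Z", ["http://c.example/"]),
    make_wat_json("http://a.example/", "2016-03-01T10:00:00Z", ["http://c.example/"]),
]
edges = [edge for p in payloads for edge in edges_from_wat_json(p)]
counts = inlinks_per_year(edges)
print(counts[("http://c.example/", "2015")])  # 2
print(counts[("http://c.example/", "2016")])  # 1
```

The sketch counts distinct source pages rather than raw link occurrences. As a result, repeated captures of the same page within a year, or a page linking to the same URL several times, add only one to the count.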
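WANE files (footnote 23) list the people, places, and organizations found in each archived record. A collection explorer could use them to surface the entities mentioned most often. The Python sketch below does this under an assumed layout: one JSON object per line, with a `named_entities` object containing `persons`, `organizations`, and `locations` lists. The field names, the function `top_entities`, and the sample records are illustrative assumptions to check against real WANE files.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# ASSUMED WANE layout: one JSON object per archived record, one per line,
# with a "named_entities" object holding three lists of entity strings.
ENTITY_TYPES = ("persons", "organizations", "locations")


def top_entities(lines: Iterable[str], n: int = 3) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n most frequent entities of each type across a collection.

    Each entity is counted at most once per record, so a single page that
    repeats a name many times cannot dominate the ranking.
    """
    counters = {t: Counter() for t in ENTITY_TYPES}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        entities = json.loads(line).get("named_entities", {})
        for t in ENTITY_TYPES:
            counters[t].update(set(entities.get(t, [])))
    return {t: counters[t].most_common(n) for t in ENTITY_TYPES}


# Invented sample records, for illustration only.
sample = [
    '{"url": "http://a.example/", "named_entities": {"persons": ["Jane Doe"], '
    '"organizations": ["Internet Archive"], "locations": ["Hannover"]}}',
    '{"url": "http://b.example/", "named_entities": {"persons": ["Jane Doe", "John Smith"], '
    '"organizations": [], "locations": ["Hannover", "Hannover"]}}',
]
print(top_entities(sample, n=2))
```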


We especially thank Jefferson Bailey from the Internet Archive for putting us in contact with his colleagues at university libraries and archiving institutions, and for his helpful comments during the requirements and evaluation phases. We are also grateful to all the experts who participated enthusiastically in our evaluation, providing valuable feedback and useful suggestions for improving the ArchiveWeb system. This work was partially funded by the European Commission in the context of the Alexandria project (ERC Advanced Grant No. 339233).

Author information

Corresponding author

Correspondence to Ivana Marenzi.

About this article

Cite this article

Fernando, Z.T., Marenzi, I. & Nejdl, W. ArchiveWeb: collaboratively extending and exploring web archive collections—How would you like to work with your collections? Int J Digit Libr 19, 39–55 (2018).


  • Working with web archives
  • Collaborative search and exploration
  • Web archive requirements and evaluation