Web archives as a data resource for digital scholars

Abstract

The aim of this article is to provide an exploratory analysis of the landscape of web archiving activities in Europe. Our contribution, based on desk research and complemented with data from interviews with representatives of European heritage institutions, provides a descriptive overview of the state of the art in national web archiving in Europe. It is written for a broad interdisciplinary audience, including cultural heritage professionals, IT specialists and managers, and humanities and social science researchers. The legal, technical and operational aspects of web archiving and the value of web archives as born-digital primary research resources are both explored. In addition to investigating the organisations involved and the scope of their web archiving programmes, the curatorial aspects of the web archiving process, such as the selection of web content, the tools used and the provision of access and discovery services, are also considered. Furthermore, general policies related to web archiving programmes are analysed. The article concludes by raising four important issues that digital scholars should consider when using web archives as a historical data source. Whilst recognising that this study was limited to a sample of only nine web archives, this article can nevertheless offer some useful insights into the technical, legal, curatorial and policy-related aspects of web archiving. Finally, this paper could function as a stepping stone for more extensive and qualitative research.

Notes

  1. For instance, obtaining prior authorisation of right holders, creating new exceptions for reproduction or communication to the public for archiving purposes and striking a fair balance between the public interest in preserving information of cultural or historical significance and the interests of right holders.

  2. This is the case for France with the DADVSI Law (see « Loi n° 2006–961 du 1er août 2006 relative au droit d’auteur et aux droits voisins dans la société de l’information »), for Luxembourg (see « Loi luxembourgeoise du 25 juin 2004 portant réorganisation des instituts culturels de l’Etat »), for the United Kingdom (see « Legal Deposit Libraries (Non-Print Works) Regulations of 5th April 2013 ») and for Denmark (see “Danish Act n° 1439 on Legal Deposit of Published Material of 22nd December 2004”).

  3. For instance, The Netherlands, Portugal and Switzerland (at the federal level).

  4. Prior authorization of the right holders is not necessary for websites that have fallen into the public domain or that were made available under a Creative Commons licence (Beunen and Schiphof 2006, p. 16).

  5. This is the approach of the National Library of The Netherlands (KB Nederland, n.d.-b and n.d.-d).

  6. This is the approach of Arquivo.pt in Portugal (Arquivo.pt, n.d.-c).

  7. Websites are composed of a set of elements, each of which can be protected by copyright (original texts, images, search engine, database, etc.) and each of which may have a different right holder (KB Nederland n.d.-e). Websites can also contain elements protected by other rights, such as trademark law, database right, neighboring rights and image rights (KB Nederland n.d.-b).

  8. An act for which the consent of the right holders is in principle required.

  9. In France, the DADVSI Law has introduced an exception allowing acts of reproduction and communication related to the web legal deposit (see French Heritage Code, art. L132–4 to L132–6). In the United Kingdom, Sections 19 to 31 of the Legal Deposit Libraries (Non-Print Works) Regulations of 5th April 2013 and Section 44A of the Copyright, Designs and Patents Act of 15th November 1988 allow certain activities related to web legal deposit to be carried out without infringing copyright.

  10. For instance, in France, Article L132–2-1 of the French Heritage Code authorizes the “Bibliothèque Nationale de France” to turn to domain name management bodies or to the Higher Audiovisual Council to identify the publishers and producers of websites. There is a similar legal provision in Denmark (see Danish Act n° 1439 on Legal Deposit of Published Material of 22nd December 2004, §11).

  11. This is the case in France (see French Heritage Code, art. R132–23-1, II), the United Kingdom (see Legal Deposit Libraries (Non-Print Works) Regulations of 5th April 2013, Section 16 (4)) and Denmark (see Danish Act n° 1439 on Legal Deposit of Published Material of 22nd December 2004, §10).

  12. The Legal Deposit Libraries (Non-Print Works) Regulations 2013

  13. Sierman and Teszelszky 2017; BnF 2017a, b; Maurer and Els 2017b; UK Web Archive (n.d.-a); Hockx-Yu 2014; Brügger et al. 2017; Arquivo.pt n.d.-c; Ryan 2017; National Library of Ireland 2017a, b.

  14. Tanésie et al. 2017; Maurer and Els 2017b; British Library 2017a; British Library (n.d.-b); National Archives (n.d.-a); Netarkivet.dk 2017; Moesgaard and Larsen 2017a.

  15. In the case of the National Library of Ireland, this applies only to the web archive collections that were based on a selective policy. Access conditions for the web material collected during the top-level domain crawl that started in 2017 had not yet been defined at the time of the interview.

  16. See Legal Deposit Libraries (Non-Print Works) Regulations of 5th April 2013, Section 23.

  17. See French Heritage Code, art. R132–23-2.

  18. See http://timetravel.mementoweb.org/

Acknowledgements

The research outlined in this article was conducted in the context of the PROMISE project. This project received funding from the Belgian Science Policy Office (BELSPO) in December 2016, through its Belgian Research Action through Interdisciplinary Networks (BRAIN) research programme, for a 24-month period. The project was initiated by the Royal Library of Belgium and the State Archives of Belgium. The project consortium also includes the universities of Ghent and Namur and the Information and Documentation School of the Brussels-Brabant Institute of Higher Education (HE2B IESSID). We would like to thank the interviewees and their colleagues for taking the time to answer our many questions.

Author information

Corresponding authors

Correspondence to Eveline Vlassenroot or Sally Chambers.

List of institutions and representatives consulted

  • National Library of The Netherlands: Kees Teszelszky (Researcher web archiving, Digital Preservation Department)

  • National Archive of The Netherlands: Antal Posthumus (Adviser recordkeeping, Directie Infrastructuur & Advies) and Jeroen van Luin (Acquisition and Maintenance of Digital Archives)

  • National Library of France (BnF): Pascal Tanésie (Assistant to the head of the department of digital legal deposit), Sara Aubry (Web Archiving Project Manager, IT department) and Bert Wendland (IT Department)

  • National Library of Luxembourg: Yves Maurer (Webarchiving Technical Manager) and Ben Els (Digital Curator)

  • The Royal Danish Library: Jakob Moesgaard (Specialkonsulent, Department of Digital Legal Deposit and Preservation) and Tue Hejlskov Larsen (IT analyst)

  • The UK National Archives: Tom Storrar (Head of Web Archiving) and Claire Newing (Web Archivist)

  • The British Library: Jason Webber (Web Archiving Engagement and Liaison Manager)

  • Arquivo.pt: Daniel Gomes (Head of Arquivo.pt, the Portuguese web archive, Advanced Services Department)

  • National Library of Ireland (NLI): Maria Ryan (Web Archivist)

About this article

Cite this article

Vlassenroot, E., Chambers, S., Di Pretoro, E. et al. Web archives as a data resource for digital scholars. Int J Digit Humanities 1, 85–111 (2019). https://doi.org/10.1007/s42803-019-00007-7

  • DOI: https://doi.org/10.1007/s42803-019-00007-7

Keywords

  • Web archives
  • Digital scholarship
  • Curation of digital collections
  • Copyright
  • Technology for web archiving