A Survey on Web Archiving Initiatives

  • Daniel Gomes
  • João Miranda
  • Miguel Costa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6966)

Abstract

Web archiving has been gaining interest and is increasingly recognized as important for modern societies around the world. However, web archivists frequently find it difficult to demonstrate this importance, for instance to funders. This study provides an updated, global overview of web archiving. The results show that the number of web archiving initiatives grew significantly after 2003 and that these initiatives are concentrated in developed countries. We statistically analyzed metrics such as the volume of archived data, the archive file formats used, and the number of people engaged. Taken together, web archives must process more data than any web search engine. Given the complexity and the large amounts of data involved in web archiving, the results show that the resources assigned to it are scarce. A Wikipedia page was created to complement this work and to be kept up to date collaboratively by the community.

Keywords

National Library; Archive Collection; Internet Archive; Archive Content; Joint Information System Committee
These keywords were added by machine and not by the authors.


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Daniel Gomes (1)
  • João Miranda (1)
  • Miguel Costa (1)
  1. FCCN: Portuguese Web Archive, Lisboa, Portugal
