
How old is the Web? Characterizing the age and the currency of the European scientific Web

Scientometrics

Abstract

The aim of this paper is to model and study the age of the Web using a sample of about four million web pages from the 16 European Research Area countries, collected during 2004 and 2005. Three attributes of each page are analysed: its time-stamp (the date on which the page was created or last modified), its format, and its size in bytes. Several indicators are introduced to measure longitudinal aspects of the Web. Half-age is proposed as a measure of the age distribution because that distribution is found to be exponential. A “Web Update Index” and a “Lifespan Index” are introduced to measure the rate of change of a small sample over time. Results show that the British Web space has the youngest Web pages, while the Greek and Belgian ones have the oldest. The study also compares Web page topics and finds that Biology pages are more stable than Physics pages.
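The half-age idea can be illustrated with a minimal Python sketch. This preview does not give the paper's exact definition, so the sketch assumes that half-age is the median page age, by analogy with a half-life. Under an exponential age distribution with rate λ, that median equals ln(2)/λ. The function names, the crawl date and the time-stamps below are hypothetical and are not taken from the study.

```python
import math
from datetime import date
from statistics import mean, median


def ages_in_days(timestamps, crawl_date):
    """Age of each page: days between its time-stamp and the crawl date."""
    return [(crawl_date - ts).days for ts in timestamps]


def half_age_exponential(ages):
    """Median of an exponential distribution fitted to the ages.

    The maximum-likelihood estimate of the rate is 1 / mean(ages),
    so the fitted median (assumed here to be the half-age) is
    ln(2) * mean(ages).
    """
    return math.log(2) * mean(ages)


def half_age_empirical(ages):
    """Sample median of the ages, with no distributional assumption."""
    return median(ages)


# Hypothetical time-stamps for five pages and a hypothetical crawl date.
crawl = date(2005, 6, 1)
stamps = [
    date(2005, 5, 2),
    date(2005, 3, 3),
    date(2004, 12, 3),
    date(2004, 6, 1),
    date(2003, 6, 2),
]

ages = ages_in_days(stamps, crawl)
print("ages (days):", ages)
print("fitted half-age (days):", round(half_age_exponential(ages), 1))
print("empirical half-age (days):", half_age_empirical(ages))
```

If the ages really are exponentially distributed, the fitted and empirical values should converge as the sample grows. A persistent gap between them would suggest that the exponential model does not describe the sample well.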



Author information


Correspondence to José Luis Ortega.


About this article

Cite this article

Ortega, J.L., Cothey, V. & Aguillo, I.F. How old is the Web? Characterizing the age and the currency of the European scientific Web. Scientometrics 81, 295–309 (2009). https://doi.org/10.1007/s11192-008-2149-x


  • DOI: https://doi.org/10.1007/s11192-008-2149-x
