Abstract
The aim of this paper is to model and study the age of the Web using a sample of about four million web pages from the 16 European Research Area countries, obtained during 2004 and 2005. Web page time-stamps (the date when a page was created or last changed), formats, and sizes in bytes were analysed. Several indicators are introduced to measure longitudinal aspects of the Web. Half-age is proposed as a measure of the age distribution because that distribution is found to be exponential. A “Web Update Index” and a “Lifespan Index” are introduced to measure the rate of change of a small sample over time. Results show that the British Web space has the youngest Web pages, while the Greek and Belgian ones have the oldest. The study also compared Web page topics and found that Biology pages are more stable than Physics pages.
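The abstract does not state how half-age is computed. The following is a minimal sketch under an assumption: half-age is read here as the median page age, by analogy with a half-life, so that for an exponential age distribution with rate λ it equals ln(2)/λ, or ln(2) times the mean age. The function names, the synthetic data, and the 400-day mean are hypothetical and are not taken from the paper.

```python
import math
import random
import statistics


def half_age(ages_days):
    """Empirical half-age: the median of the observed page ages (assumed definition)."""
    return statistics.median(ages_days)


def half_age_exponential(ages_days):
    """Half-age implied by an exponential fit: ln(2) times the mean age."""
    return math.log(2) * statistics.fmean(ages_days)


# Synthetic ages in days, drawn from an exponential distribution with a
# hypothetical mean of 400 days; real input would be crawl date minus time-stamp.
rng = random.Random(0)
sample_ages = [rng.expovariate(1 / 400) for _ in range(50_000)]

empirical = half_age(sample_ages)
fitted = half_age_exponential(sample_ages)
print(f"empirical half-age: {empirical:.1f} days")
print(f"exponential-fit half-age: {fitted:.1f} days")
```

If the ages really are exponentially distributed, the two estimates should roughly agree; a large gap between them would suggest that the exponential model fits poorly for that sample.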
Cite this article
Ortega, J.L., Cothey, V. & Aguillo, I.F. How old is the Web? Characterizing the age and the currency of the European scientific Web. Scientometrics 81, 295–309 (2009). https://doi.org/10.1007/s11192-008-2149-x
DOI: https://doi.org/10.1007/s11192-008-2149-x