Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Web Characteristics and Evolution

  • Dennis Fetterly
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_456

Definition

Web characteristics are properties of collections of documents accessible via the World Wide Web. A vast number of properties can be characterized. Examples include the number of words in a document, the length of a document in bytes, the language in which a document is written, the MIME type of a document, properties of the URL that identifies a document, the HTML tags used to author a document, and the hyperlink structure created by the collection of documents.
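As an illustration of how several of these properties might be measured for a single document, the following is a minimal sketch using only the Python standard library. The function name `characterize`, the field names in the returned dictionary, and the choice of "path depth" as a URL property are assumptions made for this example and do not come from the entry. Language identification is omitted because it needs a trained model or an external library.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _PageScanner(HTMLParser):
    """Collects tag counts, text outside script/style, and link targets."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.text_parts = []
        self.hrefs = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def characterize(url, body, content_type="text/html; charset=utf-8"):
    """Return a few per-document characteristics (hypothetical field names).

    url          -- the URL the document was fetched from
    body         -- the raw response body as bytes
    content_type -- the value of the HTTP Content-Type header
    """
    scanner = _PageScanner()
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    scanner.feed(body.decode("utf-8", errors="replace"))
    scanner.close()

    parsed = urlparse(url)
    words = " ".join(scanner.text_parts).split()
    return {
        "length_bytes": len(body),
        "word_count": len(words),
        "mime_type": content_type.split(";")[0].strip().lower(),
        "url_host": parsed.hostname,
        "url_path_depth": len([p for p in parsed.path.split("/") if p]),
        "tag_counts": dict(scanner.tag_counts),
        # Resolve relative links so they can serve as edges of a link graph.
        "out_links": [urljoin(url, h) for h in scanner.hrefs],
    }
```

Applying such a function to every page in a crawl and aggregating the `out_links` fields would give a rough view of the hyperlink structure of the collection. Whitespace-based word counting is a simplification that does not suit every language.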

As in the physical world, the process of change that the web continually undergoes is identified as web evolution. The web is a tremendously dynamic place, with new users, servers, and pages entering and leaving the system continuously, which causes the web to change very rapidly. Web evolution encompasses changes in all web characteristics, as defined above.
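One simple way to observe this kind of change is to compare two snapshots of the same set of URLs taken at different times. The sketch below is illustrative only: the snapshot representation (a dictionary from URL to page bytes) and the names `fingerprint` and `diff_snapshots` are assumptions for this example, and exact hashing is the crudest possible change measure, since it flags any byte-level difference.

```python
import hashlib


def fingerprint(body):
    """Return a hex digest that changes whenever the page bytes change."""
    return hashlib.sha256(body).hexdigest()


def diff_snapshots(old, new):
    """Classify URLs by what happened between two crawls.

    old, new -- dictionaries mapping URL (str) to page content (bytes)
    """
    old_urls, new_urls = set(old), set(new)
    common = old_urls & new_urls
    changed = {u for u in common if fingerprint(old[u]) != fingerprint(new[u])}
    return {
        "added": sorted(new_urls - old_urls),      # pages that entered
        "removed": sorted(old_urls - new_urls),    # pages that left
        "changed": sorted(changed),                # same URL, new content
        "unchanged": sorted(common - changed),
    }
```

Repeating the comparison across a series of snapshots gives rough rates of page creation, deletion, and modification. A measure that tolerates small edits would need a similarity function rather than an exact hash.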

Historical Background

As early as 1994, researchers were interested in studying characteristics of the World Wide Web. As documented by...


Recommended Reading

  1. Angeles Serrano M, Maguitman A, Boguñá M, Fortunato S, Vespignani A. Decoding the structure of the WWW: a comparative analysis of web crawls. ACM Trans Web. 2007;1(2):10.
  2. Baeza-Yates R, Castillo C, Efthimiadis EN. Characterization of national web domains. ACM Trans Int Tech. 2007;7(2):9.
  3. Broder AZ, Glassman SC, Manasse MS, Zweig G. Syntactic clustering of the web. In: Selected papers from the Sixth International Conference on World Wide Web; 1997. p. 1157–66.
  4. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J. Graph structure in the web: experiments and models. In: Proceedings of the 9th International World Wide Web Conference; 2000.
  5. Charikar MS. Similarity estimation techniques from rounding algorithms. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing; 2002. p. 380–88.
  6. Cho J, Garcia-Molina H. The evolution of the web and implications for an incremental crawler. In: Proceedings of the 26th International Conference on Very Large Data Bases; 2000. p. 200–09.
  7. Douglis F, Feldmann A, Krishnamurthy B, Mogul JC. Rate of change and other metrics: a live study of the world wide web. In: Proceedings of the 1st USENIX Symposium on Internet Technologies and Systems; 1997.
  8. Fetterly D, Manasse M, Najork M, Wiener J. A large-scale study of the evolution of web pages. In: Proceedings of the 12th International World Wide Web Conference; 2003. p. 669–78.
  9. Henzinger M. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2006. p. 284–91.
  10. Henzinger MR, Heydon A, Mitzenmacher M, Najork M. On near-uniform URL sampling. Comput Netw. 2000;33(1–6):295–308.
  11. Lawrence S, Giles CL. Accessibility of information on the web. Nature. 1999;400(6740):107–9.
  12. Pitkow JE. Summary of WWW characterizations. Comput Netw. 1998;30(1–7):551–8.
  13. Woodruff A, Aoki PM, Brewer EA, Gauthier P, Rowe LA. An investigation of documents from the world wide web. Comput Netw. 1996;28(7–11):963–80.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Google, Inc., Mountain View, USA

Section editors and affiliations

  • Cong Yu (1)
  1. Google Research, New York, USA