Level-Biased Statistics in the Hierarchical Structure of the Web
In the literature on web search and mining, researchers have usually treated the World Wide Web as a flat network in which every page and every hyperlink is handled identically. However, it is common knowledge that the Web has a natural hierarchical structure defined by the URLs of its pages. Exploring this hierarchical structure, we found several level-biased characteristics of the Web. First, the distribution of pages over levels has a spindle shape. Second, the average indegree in each level decreases sharply as the level goes deeper. Third, although the indegree distributions in deeper levels obey the same power law as the global indegree distribution, the top levels show quite different statistical characteristics. We believe that these findings may be essential properties of the Web, and that by making use of them, current web search and mining technologies could be improved to provide better services to web users.
Keywords: Hierarchical Structure, Mining Technology, Link Structure, Spindle Shape, Virtual Page
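The level-wise statistics described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a page's level is the number of non-empty path segments in its URL (so a site root is level 0); the abstract does not give the exact definition used in the paper. The names `url_level` and `level_statistics` are hypothetical helpers, not part of the original work.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def url_level(url: str) -> int:
    """Assumed level definition: count of non-empty path segments.

    Under this assumption http://a.com/ is level 0 and
    http://a.com/x/1.html is level 2.
    """
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])


def level_statistics(links):
    """Compute per-level page counts and average indegree.

    `links` is an iterable of (source_url, target_url) pairs.
    Returns (pages_per_level, avg_indegree_per_level) as dicts
    keyed by level.
    """
    indegree = Counter()
    pages = set()
    for source, target in links:
        pages.update((source, target))
        indegree[target] += 1  # one incoming hyperlink for the target

    # Distribution of pages over levels.
    pages_per_level = Counter(url_level(page) for page in pages)

    # Sum of indegrees per level; pages never linked to contribute 0.
    indegree_sum = defaultdict(int)
    for page in pages:
        indegree_sum[url_level(page)] += indegree[page]

    avg_indegree = {
        level: indegree_sum[level] / count
        for level, count in pages_per_level.items()
    }
    return dict(pages_per_level), avg_indegree
```

Calling `level_statistics` on a crawled edge list and plotting the two returned dictionaries against level is one way to inspect the page distribution and the change in average indegree with depth on one's own data. Other level definitions (for example, ones that treat directory index pages specially) would only require changing `url_level`.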