On the Stability of Web Crawling and Web Search
In this paper, we analyze a graph-theoretic property motivated by web crawling. We introduce the notion of a stable core: the set of web pages that are usually contained in the crawling buffer when the buffer size is smaller than the total number of web pages. We analyze the size of the core in a random graph model based on the bounded Pareto power-law distribution. We prove that a core of significant size exists for a large range of power-law parameters, 2 < α < 3.
Keywords: Random Graph, Degree Distribution, Total Degree, Core Size, Multiple Edge
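
To make the degree model in the abstract concrete, the following is a minimal sketch of drawing values from a bounded power law by inverse-transform sampling. It assumes a density proportional to x^(-α) on an interval [x_min, x_max]; the abstract does not give the paper's exact parametrization, so that form, the function names, and the parameters `x_min`, `x_max`, and `seed` are all illustrative assumptions rather than the authors' construction.

```python
import random


def sample_bounded_power_law(alpha, x_min, x_max, rng):
    """Draw one value with density proportional to x**(-alpha) on [x_min, x_max].

    Assumed form (not taken from the paper): requires alpha > 1 and
    0 < x_min < x_max. Uses the closed-form inverse of the CDF.
    """
    u = rng.random()                 # uniform in [0, 1)
    e = 1.0 - alpha                  # exponent of the antiderivative, negative here
    lo, hi = x_min ** e, x_max ** e  # antiderivative (up to a constant) at the bounds
    # Solve (x**e - lo) / (hi - lo) = u for x.
    return (lo + u * (hi - lo)) ** (1.0 / e)


def sample_weights(n, alpha, x_min, x_max, seed=0):
    """Hypothetical helper: n independent draws, e.g. target degrees for n pages."""
    rng = random.Random(seed)
    return [sample_bounded_power_law(alpha, x_min, x_max, rng) for _ in range(n)]


if __name__ == "__main__":
    weights = sample_weights(n=10_000, alpha=2.5, x_min=1.0, x_max=100.0)
    print("min:", min(weights), "max:", max(weights))
    print("mean:", sum(weights) / len(weights))
```

Values of α between 2 and 3 give a heavy tail, so most draws sit near `x_min` while a few are much larger. The upper bound `x_max` keeps every draw finite and bounded.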