On the Stability of Web Crawling and Web Search

  • Reid Anderson
  • Christian Borgs
  • Jennifer Chayes
  • John Hopcroft
  • Vahab Mirrokni
  • Shang-Hua Teng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5369)

Abstract

In this paper, we analyze a graph-theoretic property motivated by web crawling. We introduce the notion of a stable core: the set of web pages that are usually contained in the crawling buffer when the buffer size is smaller than the total number of web pages. We analyze the size of the core in a random graph model based on the bounded Pareto power law distribution. We prove that a core of significant size exists for a large range of parameters 2 < α < 3 for the power law.
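As an illustration of the degree model named in the abstract, the following minimal sketch draws values from a bounded Pareto distribution by inverse-transform sampling. It is not the paper's construction. The function names, the bounds `low` and `high`, and the reading of α as the exponent of the density (so that the Pareto shape parameter is α − 1) are all assumptions made for this example.

```python
import random


def sample_bounded_pareto(alpha, low, high, rng):
    """Draw one value with density proportional to x**(-alpha) on [low, high].

    Assumption: alpha is the density exponent, so the Pareto shape is alpha - 1.
    """
    shape = alpha - 1.0
    u = rng.random()  # uniform in [0, 1)
    ratio = (low / high) ** shape
    # Inverse of the bounded Pareto CDF F(x) = (1 - (low/x)**shape) / (1 - ratio).
    return low * (1.0 - u * (1.0 - ratio)) ** (-1.0 / shape)


def sample_degrees(n, alpha, low, high, seed=0):
    """Return n independent draws (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    return [sample_bounded_pareto(alpha, low, high, rng) for _ in range(n)]


if __name__ == "__main__":
    degrees = sample_degrees(10, alpha=2.5, low=1.0, high=100.0)
    print([round(d, 2) for d in degrees])
```

With α between 2 and 3, most draws fall near the lower bound while a few are much larger, which is the heavy-tailed behavior the power law is meant to capture.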

Keywords

Random Graph, Degree Distribution, Total Degree, Core Size, Multiple Edge
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Reid Anderson (1)
  • Christian Borgs (1)
  • Jennifer Chayes (1)
  • John Hopcroft (2)
  • Vahab Mirrokni (3)
  • Shang-Hua Teng (4)

  1. Microsoft Research, USA
  2. Cornell University, USA
  3. Google Research, USA
  4. Boston University, USA
