The Web as a Graph: Measurements, Models, and Methods

  • Jon M. Kleinberg
  • Ravi Kumar
  • Prabhakar Raghavan
  • Sridhar Rajagopalan
  • Andrew S. Tomkins
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1627)

Abstract

The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons — mathematical, sociological, and commercial — for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.
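The directed-graph view described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or data: the link pairs and the helper names `build_graph` and `in_degrees` are hypothetical and chosen only for illustration. It shows pages as nodes, hyperlinks as directed edges, and an in-degree histogram as one simple measurement that could be taken on such a graph.

```python
from collections import Counter

# Hypothetical toy link data: (source page, target page) pairs.
links = [
    ("a.example", "b.example"),
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("d.example", "c.example"),
]

def build_graph(edges):
    """Return an adjacency map: page -> set of pages it links to."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # make sure link targets appear as nodes
    return graph

def in_degrees(graph):
    """Count incoming links for every node, including nodes with none."""
    counts = Counter({node: 0 for node in graph})
    for targets in graph.values():
        for dst in targets:
            counts[dst] += 1
    return counts

graph = build_graph(links)
indeg = in_degrees(graph)
# Degree histogram: in-degree value -> number of pages with that in-degree.
histogram = Counter(indeg.values())
```

On real crawl data the same histogram would be computed over hundreds of millions of nodes, so an in-memory dictionary like this is only a stand-in for the idea.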

Keywords

Association Rule; Random Graph; Query Term; Giant Component; Principal Eigenvector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Jon M. Kleinberg (1)
  • Ravi Kumar (2)
  • Prabhakar Raghavan (2)
  • Sridhar Rajagopalan (2)
  • Andrew S. Tomkins (2)
  1. Department of Computer Science, Cornell University, Ithaca
  2. IBM Almaden Research Center K53/B1, San Jose
