
Graph Mining: Laws and Generators

Chapter in Managing and Mining Graph Data

Part of the book series: Advances in Database Systems (ADBS, volume 40)

Abstract

How does the Web look? How could we tell an “abnormal” social network from a “normal” one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks, to sociology, to biology, and many more. Indeed, any M:N relation in database terminology can be represented as a graph. Many of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs, and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology and computer science.
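To make the remark about M:N relations concrete, here is a minimal Python sketch. It assumes the relation arrives as a list of (left, right) row tuples; the function names `relation_to_graph` and `degree_distribution` and the purchase data are hypothetical and serve only as illustration. Each row becomes an edge of a bipartite graph. The node-degree counts it produces are one of the simplest statistics one might inspect when comparing a real graph against a synthetic one.

```python
from collections import Counter, defaultdict


def relation_to_graph(rows):
    """Turn (left, right) rows of an M:N relation into a bipartite adjacency map.

    Nodes are tagged "L" or "R" so that a left value and a right value that
    happen to share an identifier are kept as distinct nodes.
    Duplicate rows collapse into a single edge because neighbors are stored in sets.
    """
    adj = defaultdict(set)
    for left, right in rows:
        u, v = ("L", left), ("R", right)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_distribution(adj):
    """Map each degree value k to the number of nodes that have degree k."""
    return Counter(len(neighbors) for neighbors in adj.values())


# Hypothetical customer-buys-product relation (illustrative data only).
purchases = [
    ("alice", "book"), ("alice", "lamp"), ("bob", "book"),
    ("carol", "book"), ("carol", "pen"), ("bob", "book"),  # duplicate row
]
graph = relation_to_graph(purchases)
print(degree_distribution(graph))  # Counter({1: 3, 2: 2, 3: 1})
```

Storing neighbors in sets is a deliberate choice: it treats the relation as a simple (unweighted) graph. If repeated rows should count as edge weights, a `Counter` per node would be the natural replacement.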



Author information

Correspondence to Deepayan Chakrabarti.


Copyright information

© 2010 Springer-Verlag US

About this chapter

Cite this chapter

Chakrabarti, D., Faloutsos, C., McGlohon, M. (2010). Graph Mining: Laws and Generators. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_3


  • DOI: https://doi.org/10.1007/978-1-4419-6045-0_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-6044-3

  • Online ISBN: 978-1-4419-6045-0

  • eBook Packages: Computer Science, Computer Science (R0)
