Graph Mining: Laws and Generators

Chakrabarti, Deepayan; Faloutsos, Christos; McGlohon, Mary

doi:10.1007/978-1-4419-6045-0_3

Part of the book series: Advances in Database Systems (ADBS, volume 40)

Abstract

How does the Web look? How could we tell an “abnormal” social network from a “normal” one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology, and many more. Indeed, any M:N relation in database terminology can be represented as a graph. Many of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science.
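The remark that any M:N relation can be represented as a graph can be illustrated with a minimal Python sketch. It assumes a hypothetical Purchases(customer, product) table supplied as a list of pairs; the names `relation_to_bipartite_graph` and `degree_counts` are illustrative and do not come from the chapter. Each row becomes an edge between a customer node and a product node. Counting how many nodes have each degree is one of the simplest statistics one might then inspect when looking for the patterns the survey discusses.

```python
from collections import defaultdict


def relation_to_bipartite_graph(rows):
    """Turn (left, right) pairs of an M:N relation into an undirected bipartite graph.

    Nodes are tagged with their side ("L" or "R") so that equal keys coming
    from the two sides of the relation never collide. Duplicate rows collapse
    into a single edge because neighbours are stored in sets.
    """
    adjacency = defaultdict(set)
    for left, right in rows:
        u = ("L", left)
        v = ("R", right)
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def degree_counts(adjacency):
    """Map each degree value to the number of nodes that have that degree."""
    counts = defaultdict(int)
    for neighbours in adjacency.values():
        counts[len(neighbours)] += 1
    return dict(counts)


# Hypothetical Purchases(customer, product) relation, for illustration only.
purchases = [
    ("alice", "book"),
    ("alice", "laptop"),
    ("bob", "book"),
    ("carol", "book"),
    ("carol", "pen"),
    ("alice", "book"),  # duplicate row: becomes the same single edge
]

graph = relation_to_bipartite_graph(purchases)
print(degree_counts(graph))
```

Tagging nodes with their side keeps the bipartite structure explicit; a real pipeline over a large relation would more likely stream rows from the database and use a dedicated graph library, but the mapping from rows to edges is the same.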

Author information

Authors and Affiliations

Yahoo! Research, Sunnyvale, CA, USA
Deepayan Chakrabarti
School of Computer Science Carnegie Mellon University, Pittsburgh, PA, USA
Christos Faloutsos & Mary McGlohon

Corresponding author

Correspondence to Deepayan Chakrabarti.

Editor information

Editors and Affiliations

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, U.S.A.
Charu C. Aggarwal
Microsoft Research Asia, Zhichun Road 49, Beijing, 100080, People's Republic of China
Haixun Wang

About this chapter

Cite this chapter

Chakrabarti, D., Faloutsos, C., McGlohon, M. (2010). Graph Mining: Laws and Generators. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_3

DOI: https://doi.org/10.1007/978-1-4419-6045-0_3
Published: 18 January 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer Science, Computer Science (R0)
