Abstract
How does the Web look? How could we tell an “abnormal” social network from a “normal” one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks, to sociology, to biology, and many more. Indeed, any M : N relation in database terminology can be represented as a graph. Many of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs, and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology and computer science.
Copyright information
© 2010 Springer-Verlag US
About this chapter
Cite this chapter
Chakrabarti, D., Faloutsos, C., McGlohon, M. (2010). Graph Mining: Laws and Generators. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_3
DOI: https://doi.org/10.1007/978-1-4419-6045-0_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer Science (R0)