A Survey of Clustering Algorithms for Graph Data

  • Charu C. Aggarwal
  • Haixun Wang
Part of the Advances in Database Systems book series (ADBS, volume 40)


In this chapter, we survey clustering algorithms for graph data. We discuss the different categories of clustering algorithms and recent efforts to design clustering methods for various kinds of graph data. Graph clustering algorithms are typically of two types. The first type consists of node clustering algorithms, which attempt to determine dense regions of a single graph based on edge behavior. The second type consists of structural clustering algorithms, which attempt to cluster a collection of graphs based on their overall structural behavior. We also discuss the applicability of these approaches to other kinds of data, such as semi-structured data, and the utility of graph mining algorithms for such representations.


Keywords: Graph Clustering; Dense Subgraph Discovery





Copyright information

© Springer-Verlag US 2010

Authors and Affiliations

  1. IBM T. J. Watson Research Center, Hawthorne, USA
  2. Microsoft Research Asia, Beijing, China
