Is There a Best Quality Metric for Graph Clusters?

  • Hélio Almeida
  • Dorgival Guedes
  • Wagner Meira Jr.
  • Mohammed J. Zaki
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6911)

Abstract

Graph clustering, the process of discovering groups of similar vertices in a graph, is an active area of study with applications in many different scenarios. One of the most important aspects of graph clustering is the evaluation of cluster quality, which matters not only for measuring the effectiveness of clustering algorithms, but also for gaining insight into the dynamics of relationships in a given network. Many quality evaluation metrics for graph clustering have been proposed in the literature, but there is no consensus on how they compare to each other or how well they perform on different kinds of graphs. In this work we study five major graph clustering quality metrics in terms of their formal biases and their behavior when applied to clusters found by four implementations of classic graph clustering algorithms on five large, real-world graphs. Our results show that these popular quality metrics are strongly biased toward incorrectly awarding good scores to some kinds of clusters, especially in larger networks. They also indicate that currently used clustering algorithms and quality metrics do not behave as expected when cluster structures differ from the more traditional, clique-like ones.

Keywords

Cluster Algorithm · Quality Metrics · Spectral Cluster · Good Cluster · Graph Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hélio Almeida (1)
  • Dorgival Guedes (1)
  • Wagner Meira Jr. (1)
  • Mohammed J. Zaki (2)

  1. Universidade Federal de Minas Gerais, Brazil
  2. Rensselaer Polytechnic Institute, USA
