
Social Network Analysis and Mining, Volume 3, Issue 4, pp 1039–1062

Communities validity: methodical evaluation of community mining algorithms

  • Reihaneh Rabbany
  • Mansoureh Takaffoli
  • Justin Fagnan
  • Osmar R. Zaïane
  • Ricardo J. G. B. Campello
Original Article

Abstract

Grouping data points is one of the fundamental tasks in data mining, and is commonly known as clustering when the data points are described by attributes. For interrelated data, represented in the form of a graph wherein a link between two nodes indicates a relationship between them, a considerable number of approaches have been proposed in recent years for mining communities in a given network. However, little work has been done on how to evaluate community mining algorithms. The common practice is to evaluate the algorithms based on their performance on standard benchmarks for which the ground truth is known. This technique is similar to external evaluation of attribute-based clustering methods. The other two well-studied clustering evaluation approaches are less explored in the community mining context: internal evaluation, which statistically validates the clustering result, and relative evaluation, which compares alternative clustering results. These two approaches enable us to validate communities discovered in a real-world application, where the true community structure is hidden in the data. In this article, we investigate different clustering quality criteria applied for relative and internal evaluation of clustering data points with attributes, as well as different clustering agreement measures used for external evaluation, and incorporate proper adaptations to make them applicable in the context of interrelated data. We further compare the performance of the proposed adapted criteria in evaluating community mining results in different settings through an extensive set of experiments.
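
As an illustration of what external evaluation involves, the following is a minimal sketch of one widely used clustering agreement measure, the adjusted Rand index, applied to two community assignments over the same nodes. It is not the adapted criteria proposed in the article; the function name `adjusted_rand_index` and the example label lists are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same nodes.

    labels_a[i] and labels_b[i] are the community labels of node i in the
    two partitions (for example, ground truth vs. a discovered result).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same nodes")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two nodes: nothing to disagree on

    # Contingency counts: nodes sharing the same (label_a, label_b) pair.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of node pairs placed together in both / in each partition.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_a = sum(comb(c, 2) for c in a_counts.values())
    together_b = sum(comb(c, 2) for c in b_counts.values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial and equal
    return (together_both - expected) / (maximum - expected)


# Hypothetical example: four nodes, known communities vs. a found result.
ground_truth = [0, 0, 1, 1]
discovered = [0, 0, 1, 2]
score = adjusted_rand_index(ground_truth, discovered)
```

A score of 1 indicates identical partitions up to relabeling, while values near 0 indicate agreement no better than chance. This kind of comparison requires known ground truth, which is why internal and relative criteria are needed when the true community structure is hidden.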

Keywords

Evaluation approaches, Quality measures, Clustering evaluation, Clustering objective function, Community mining

Notes

Acknowledgments

The authors are grateful for the support from Alberta Innovates Centre for Machine Learning and NSERC. Ricardo Campello also acknowledges the financial support of Fapesp and CNPq.

Copyright information

© Springer-Verlag Wien 2013

Authors and Affiliations

  • Reihaneh Rabbany (1)
  • Mansoureh Takaffoli (1)
  • Justin Fagnan (1)
  • Osmar R. Zaïane (1)
  • Ricardo J. G. B. Campello (1)
  1. Department of Computing Science, University of Alberta, Edmonton, Canada
