Evaluation of Community Mining Algorithms in the Presence of Attributes

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9441)

Abstract

Grouping data points is one of the fundamental tasks in data mining, commonly known as clustering. In the case of interrelated data, where the data is represented as nodes and their relationships, a group is referred to as a community. A community is often defined based on the connectivity of nodes rather than their attributes or features. The variety of definitions and methods, together with the subjective nature of the task, makes the evaluation of community mining methods non-trivial. In this paper, we point out the critical issues in common evaluation practices and discuss the alternatives. In particular, we focus on the common practice of using attributes as the ground-truth communities in large real networks. We suggest treating these attributes as another source of information, and using them to refine the communities and tune parameters.
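
As a rough illustration of using an attribute to tune parameters, the following Python sketch scores candidate partitions against a node attribute with normalized mutual information (NMI) and keeps the parameter value with the highest agreement. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the function names `nmi` and `best_parameter`, the toy attribute, and the three candidate partitions are all hypothetical, and NMI is only one of several agreement measures that could be plugged in.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same nodes."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    # Mutual information from the joint and marginal counts.
    mi = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    # Entropy of each labeling.
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings place every node in a single group
    return 2 * mi / (h_a + h_b)


def best_parameter(candidates, attribute):
    """Return the parameter whose partition agrees most with the attribute."""
    return max(candidates, key=lambda p: nmi(candidates[p], attribute))


# Hypothetical toy data: six nodes, one attribute, and partitions produced
# by three made-up parameter values of some community mining algorithm.
attribute = ["x", "x", "x", "y", "y", "y"]
candidates = {
    0.5: [0, 0, 0, 0, 0, 0],
    1.0: [0, 0, 0, 1, 1, 1],
    2.0: [0, 0, 1, 1, 2, 2],
}

if __name__ == "__main__":
    for param, partition in candidates.items():
        print(param, round(nmi(partition, attribute), 3))
    print("selected parameter:", best_parameter(candidates, attribute))
```

Here the attribute acts as one more source of information for choosing between partitions, rather than as a ground truth that every partition is expected to reproduce.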

Keywords

  • Network clusters
  • Community mining
  • Networks with attributes
  • Community evaluation
  • Community validation

Chapter
  • DOI: 10.1007/978-3-319-25660-3_13
  • Chapter length: 12 pages

Notes

  1. Code available at: https://github.com/rabbanyk/CommunityEvaluation.

  2. This graph representation has also been used in link recommendation; see, e.g., [10].

  3. The concept is, however, general and can be applied to fine-tune the parameters of any community mining algorithm that can provide different perspectives on the community structure for different parameter values.

  4. For the attribute ‘highschool’, the true k is 1075, which is outside the plot’s scale.

References

  1. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Statis. Mech.: Theory Exp. 2008(10), P10008 (2008)

  2. Chen, J., Zaïane, O.R., Goebel, R.: An unsupervised approach to cluster web search results based on word sense communities. In: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2008, vol. 1, pp. 725–729, December 2008

  3. Chen, J., Zaïane, O.R., Goebel, R.: Detecting communities in social networks using max-min modularity. In: SIAM International Conference on Data Mining, pp. 978–989 (2009)

  4. Clauset, A.: Finding local community structure in networks. Phys. Rev. E (Statis., Nonlinear, Soft Matter Phys.) 72(2), 026132 (2005)

  5. Crandall, D., Cosley, D., Huttenlocher, D., Kleinberg, J., Suri, S.: Feedback effects between similarity and social influence in online communities. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 160–168. ACM (2008)

  6. Cruz Gomez, J.D., Bothorel, C.: Information integration for detecting communities in attributed graphs. In: 2013 Fifth International Conference on Computational Aspects of Social Networks (CASoN), pp. 62–67 (2013)

  7. Danon, L., Guilera, A.D., Duch, J., Arenas, A.: Comparing community structure identification. J. Statis. Mech.: Theory Exp. 2005(09), P09008 (2005)

  8. Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75–174 (2010)

  9. Fortunato, S., Castellano, C.: Community structure in graphs. In: Computational Complexity, pp. 490–512. Springer (2012)

  10. Gong, N.Z., Talwalkar, A., Mackey, L., Huang, L., Shin, E.C.R., Stefanov, E., Song, D., et al.: Jointly predicting links and inferring attributes using a social-attribute network (SAN). arXiv preprint arXiv:1112.3265 (2011)

  11. Günnemann, S., Boden, B., Färber, I., Seidl, T.: Efficient mining of combined subspace and subgraph clusters in graphs with feature vectors. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) PAKDD 2013, Part I. LNCS, vol. 7818, pp. 261–275. Springer, Heidelberg (2013)

  12. Gustafsson, M., Hörnquist, M., Lombardi, A.: Comparison and validation of community structures in complex networks. Phys. A Statis. Mech. Its Appl. 367, 559–576 (2006)

  13. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intel. Inf. Syst. 17, 107–145 (2001)

  14. Hanisch, D., Zien, A., Zimmer, R., Lengauer, T.: Co-clustering of biological networks and gene expression data. Bioinformatics 18(suppl. 1), S145–S154 (2002)

  15. Hu, B., Song, Z., Ester, M.: User features and social networks for topic modeling in online social media. In: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 202–209. IEEE (2012)

  16. La Fond, T., Neville, J.: Randomization tests for distinguishing social influence and homophily effects. In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 601–610. ACM, New York (2010)

  17. Lancichinetti, A., Fortunato, S.: Community detection algorithms: A comparative analysis. Phys. Rev. E 80(5), 056117 (2009)

  18. Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 046110 (2008)

  19. Lancichinetti, A., Kivelä, M., Saramäki, J., Fortunato, S.: Characterizing the community structure of complex networks. PloS One 5(8), e11976 (2010)

  20. Largeron, C., Mougel, P., Rabbany, R., Zaïane, O.R.: Generating attributed networks with communities. PloS One (to appear, 2015)

  21. Lee, C., Cunningham, P.: Benchmarking community detection methods on social media data. arXiv preprint arXiv:1302.0739 (2013)

  22. Leskovec, J., Lang, K.J., Mahoney, M.: Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th International Conference on World Wide Web, pp. 631–640. ACM (2010)

  23. Lewis, K., Gonzalez, M., Kaufman, J.: Social selection and peer influence in an online social network. Proc. Nat. Acad. Sci. 109(1), 68–72 (2012)

  24. Luo, F., Wang, J.Z., Promislow, E.: Exploring local community structures in large networks. Web Intel. Agent Syst. 6, 387–400 (2008)

  25. Mislove, A., Viswanath, B., Gummadi, K.P., Druschel, P.: You are who you know: inferring user profiles in online social networks. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM 2010, pp. 251–260. ACM, New York (2010)

  26. Moser, F., Colak, R., Rafiey, A., Ester, M.: Mining cohesive patterns from graphs with feature vectors. SDM 9, 593–604 (2009)

  27. Moussiades, L., Vakali, A.: Benchmark graphs for the evaluation of clustering algorithms. In: Proceedings of the Third IEEE International Conference on Research Challenges in Information Science, RCIS 2009, pp. 197–206 (2009)

  28. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)

  29. Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004)

  30. Onnela, J.P., Arbesman, S., González, M.C., Barabási, A.L., Christakis, N.A.: Geographic constraints on social network groups. PLoS One 6(4), e16939 (2011)

  31. Orman, G.K., Labatut, V.: The effect of network realism on community detection algorithms. In: Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2010, pp. 301–305 (2010)

  32. Orman, G.K., Labatut, V., Cherifi, H.: Qualitative comparison of community detection algorithms. In: Cherifi, H., Zain, J.M., El-Qawasmeh, E. (eds.) DICTAP 2011 Part II. CCIS, vol. 167, pp. 265–279. Springer, Heidelberg (2011)

  33. Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043), 814–818 (2005)

  34. Pons, P., Latapy, M.: Computing communities in large networks using random walks. In: Yolum, I., Özturan, C., Gürgen, F., Güngör, T. (eds.) ISCIS 2005. LNCS, vol. 3733, pp. 284–293. Springer, Heidelberg (2005)

  35. Rabbany, R., Takaffoli, M., Fagnan, J., Zaïane, O.R., Campello, R.: Relative validity criteria for community mining algorithms. In: 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM), August 2012

  36. Rabbany, R., Chen, J., Zaïane, O.R.: Top leaders community detection approach in information networks. In: Proceedings of the 4th Workshop on Social Network Mining and Analysis (2010)

  37. Rabbany, R., Chen, J., Zaïane, O.R.: Top leaders community detection approach in information networks. In: SNA-KDD Workshop on Social Network Mining and Analysis (2010)

  38. Rabbany, R., Takaffoli, M., Fagnan, J., Zaïane, O.R., Campello, R.: Relative validity criteria for community mining algorithms. In: Social Networks Analysis and Mining (SNAM) (2013)

  39. Rabbany, R., Zaïane, O.R.: A diffusion of innovation-based closeness measure for network associations. In: IEEE International Conference on Data Mining Workshops, pp. 381–388 (2011)

  40. Rabbany, R., Zaïane, O.R.: Generalization of clustering agreements and distances for overlapping clusters and network communities. CoRR abs/1412.2601 (2014)

  41. Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. Proc. Nat. Acad. Sci. 104(18), 7327–7331 (2007)

  42. Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proc. Nat. Acad. Sci. 105(4), 1118–1123 (2008)

  43. Rosvall, M., Bergstrom, C.T.: Mapping change in large networks. PloS One 5(1), e8694 (2010)

  44. Spirin, V., Mirny, L.A.: Protein complexes and functional modules in molecular networks. Proc. Nat. Acad. Sci. 100(21), 12123–12128 (2003)

  45. Traud, A.L., Kelsic, E.D., Mucha, P.J., Porter, M.A.: Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 53(3), 526–543 (2011)

  46. Traud, A.L., Mucha, P.J., Porter, M.A.: Social structure of Facebook networks. Phys. A: Statis. Mech. Appl. 391(16), 4165–4180 (2012)

  47. Wagner, A., Fell, D.A.: The small world inside large metabolic networks. Proc. Royal Soc. Lond. Ser. B: Biol. Sci. 268(1478), 1803–1810 (2001)

  48. Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.: SCAN: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 824–833. ACM (2007)

  49. Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, p. 3. ACM (2012)

  50. Yang, T., Jin, R., Chi, Y., Zhu, S.: Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 927–936. ACM (2009)

  51. Yang, Y., Sun, Y., Pandit, S., Chawla, N.V., Han, J.: Perspective on measurement metrics for community detection algorithms. In: Mining Social Networks and Security Informatics, pp. 227–242. Springer (2013)

  52. Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. Proc. VLDB Endowment 2(1), 718–729 (2009)

Author information

Correspondence to Reihaneh Rabbany.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Rabbany, R., Zaïane, O.R. (2015). Evaluation of Community Mining Algorithms in the Presence of Attributes. In: Li, XL., Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D. (eds) Trends and Applications in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol 9441. Springer, Cham. https://doi.org/10.1007/978-3-319-25660-3_13

  • DOI: https://doi.org/10.1007/978-3-319-25660-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25659-7

  • Online ISBN: 978-3-319-25660-3

  • eBook Packages: Computer Science, Computer Science (R0)