Abstract
Grouping data points, commonly known as clustering, is one of the fundamental tasks in data mining. For interrelated data, where the data is represented as nodes and their relationships, such a group is referred to as a community. A community is often defined based on the connectivity of nodes rather than their attributes or features. The variety of definitions and methods, together with the subjective nature of the task, makes the evaluation of community mining methods non-trivial. In this paper we point out critical issues in common evaluation practices and discuss alternatives. In particular, we focus on the common practice of using attributes as ground-truth communities in large real networks. We suggest treating these attributes as another source of information and using them to refine the communities and tune parameters.
Notes
1. Code available at: https://github.com/rabbanyk/CommunityEvaluation.
2. This graph representation has also been used in link recommendation; see, e.g., [10].
3. The concept is, however, general and can be applied to fine-tune the parameters of any community mining algorithm that can provide different perspectives on the community structure depending on the values of its parameters.
4. For the attribute ‘highschool’, the true k is 1075, which is beyond the plot’s scale.
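The parameter tuning idea in note 3 can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository in note 1 for that). It assumes a toy stand-in algorithm, `threshold_communities`, whose single parameter is an edge-weight threshold, and it uses plain normalized mutual information (NMI) as the agreement measure. The function names, the toy graph, and the attribute values are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same nodes."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum((c / n) * math.log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings put every node in one group
    return 2 * mutual / (h_a + h_b)


def threshold_communities(nodes, weighted_edges, threshold):
    """Toy stand-in for a community mining algorithm (an assumption):
    connected components over edges with weight >= threshold."""
    adj = defaultdict(set)
    for u, v, w in weighted_edges:
        if w >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    label = {}
    for start in nodes:
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            for v in adj[u]:
                if v not in label:
                    label[v] = start
                    stack.append(v)
    return [label[x] for x in nodes]


def tune_parameter(nodes, weighted_edges, attribute, candidates):
    """Return (best NMI, best threshold) by agreement with the attribute."""
    scored = [(nmi(threshold_communities(nodes, weighted_edges, t), attribute), t)
              for t in candidates]
    return max(scored)


# Hypothetical toy network: two tight triangles joined by one weak edge.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1, 0.9), (1, 2, 0.9), (0, 2, 0.9),
         (3, 4, 0.9), (4, 5, 0.9), (3, 5, 0.9),
         (2, 3, 0.3)]
attribute = ["A", "A", "A", "B", "B", "B"]  # hypothetical node attribute

best_score, best_threshold = tune_parameter(nodes, edges, attribute,
                                            [0.1, 0.5, 0.95])
print(best_threshold, round(best_score, 3))
```

Here the attribute is used only to choose among the community structures the algorithm can produce, not as a ground truth that the algorithm is scored against. Any parameterized algorithm and any agreement measure could be substituted for the two placeholders.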
References
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Statis. Mech.: Theory Exp. 2008(10), P10008 (2008)
Chen, J., Zaiane, O., Goebel, R.: An unsupervised approach to cluster web search results based on word sense communities. In: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2008, vol. 1, pp. 725–729, December 2008
Chen, J., Zaïane, O.R., Goebel, R.: Detecting communities in social networks using max-min modularity. In: SIAM International Conference on Data Mining, pp. 978–989 (2009)
Clauset, A.: Finding local community structure in networks. Phys. Rev. E (Statis., Nonlinear, Soft Matter Phys.) 72(2), 026132 (2005)
Crandall, D., Cosley, D., Huttenlocher, D., Kleinberg, J., Suri, S.: Feedback effects between similarity and social influence in online communities. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 160–168. ACM (2008)
Cruz Gomez, J.D., Bothorel, C.: Information integration for detecting communities in attributed graphs. In: 2013 Fifth International Conference on Computational Aspects of Social Networks (CASoN), pp. 62–67 (2013)
Danon, L., Guilera, A.D., Duch, J., Arenas, A.: Comparing community structure identification. J. Statis. Mech.: Theory Exp. 2005(09), P09008 (2005)
Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75–174 (2010)
Fortunato, S., Castellano, C.: Community structure in graphs. In: Computational Complexity, pp. 490–512. Springer (2012)
Gong, N.Z., Talwalkar, A., Mackey, L., Huang, L., Shin, E.C.R., Stefanov, E., Song, D., et al.: Jointly predicting links and inferring attributes using a social-attribute network (san). arXiv preprint arXiv:1112.3265 (2011)
Günnemann, S., Boden, B., Färber, I., Seidl, T.: Efficient mining of combined subspace and subgraph clusters in graphs with feature vectors. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) PAKDD 2013, Part I. LNCS, vol. 7818, pp. 261–275. Springer, Heidelberg (2013)
Gustafsson, M., Hörnquist, M., Lombardi, A.: Comparison and validation of community structures in complex networks. Phys. A Statis. Mech. Its Appl. 367, 559–576 (2006)
Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intel. Inf. Syst. 17, 107–145 (2001)
Hanisch, D., Zien, A., Zimmer, R., Lengauer, T.: Co-clustering of biological networks and gene expression data. Bioinformatics 18(suppl. 1), S145–S154 (2002)
Hu, B., Song, Z., Ester, M.: User features and social networks for topic modeling in online social media. In: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 202–209. IEEE (2012)
La Fond, T., Neville, J.: Randomization tests for distinguishing social influence and homophily effects. In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 601–610. ACM, New York (2010)
Lancichinetti, A., Fortunato, S.: Community detection algorithms: A comparative analysis. Phys. Rev. E 80(5), 056117 (2009)
Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 046110 (2008)
Lancichinetti, A., Kivelä, M., Saramäki, J., Fortunato, S.: Characterizing the community structure of complex networks. PloS One 5(8), e11976 (2010)
Largeron, C., Mougel, P., Rabbany, R., Zaïane, O.R.: Generating attributed networks with communities. PloS One (to appear, 2015)
Lee, C., Cunningham, P.: Benchmarking community detection methods on social media data. arXiv preprint arXiv:1302.0739 (2013)
Leskovec, J., Lang, K.J., Mahoney, M.: Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th International Conference on World Wide Web, pp. 631–640. ACM (2010)
Lewis, K., Gonzalez, M., Kaufman, J.: Social selection and peer influence in an online social network. Proc. Nat. Acad. Sci. 109(1), 68–72 (2012)
Luo, F., Wang, J.Z., Promislow, E.: Exploring local community structures in large networks. Web Intel. Agent Syst. 6, 387–400 (2008)
Mislove, A., Viswanath, B., Gummadi, K.P., Druschel, P.: You are who you know: inferring user profiles in online social networks. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM 2010, pp. 251–260. ACM, New York (2010)
Moser, F., Colak, R., Rafiey, A., Ester, M.: Mining cohesive patterns from graphs with feature vectors. SDM 9, 593–604 (2009)
Moussiades, L., Vakali, A.: Benchmark graphs for the evaluation of clustering algorithms. In: Proceedings of the Third IEEE International Conference on Research Challenges in Information Science, RCIS 2009, pp. 197–206 (2009)
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)
Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004)
Onnela, J.P., Arbesman, S., González, M.C., Barabási, A.L., Christakis, N.A.: Geographic constraints on social network groups. PLoS One 6(4), e16939 (2011)
Orman, G.K., Labatut, V.: The effect of network realism on community detection algorithms. In: Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2010, pp. 301–305 (2010)
Orman, G.K., Labatut, V., Cherifi, H.: Qualitative comparison of community detection algorithms. In: Cherifi, H., Zain, J.M., El-Qawasmeh, E. (eds.) DICTAP 2011 Part II. CCIS, vol. 167, pp. 265–279. Springer, Heidelberg (2011)
Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043), 814–818 (2005)
Pons, P., Latapy, M.: Computing communities in large networks using random walks. In: Yolum, I., Özturan, C., Gürgen, F., Güngör, T. (eds.) ISCIS 2005. LNCS, vol. 3733, pp. 284–293. Springer, Heidelberg (2005)
Rabbany, R., Takaffoli, M., Fagnan, J., Zaiane, O., Campello, R.: Relative validity criteria for community mining algorithms. In: 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM), August 2012
Rabbany, R., Chen, J., Zaïane, O.R.: Top leaders community detection approach in information networks. In: Proceedings of the 4th Workshop on Social Network Mining and Analysis (2010)
Rabbany, R., Chen, J., Zaïane, O.R.: Top leaders community detection approach in information networks. In: SNA-KDD Workshop on Social Network Mining and Analysis (2010)
Rabbany, R., Takaffoli, M., Fagnan, J., Zaïane, O.R., Campello, R.: Relative validity criteria for community mining algorithms. In: Social Networks Analysis and Mining (SNAM) (2013)
Rabbany, R., Zaïane, O.R.: A diffusion of innovation-based closeness measure for network associations. In: IEEE International Conference on Data Mining Workshops, pp. 381–388 (2011)
Rabbany, R., Zaïane, O.R.: Generalization of clustering agreements and distances for overlapping clusters and network communities. CoRR abs/1412.2601 (2014)
Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. Proc. Nat. Acad. Sci. 104(18), 7327–7331 (2007)
Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proc. Nat. Acad. Sci. 105(4), 1118–1123 (2008)
Rosvall, M., Bergstrom, C.T.: Mapping change in large networks. PloS One 5(1), e8694 (2010)
Spirin, V., Mirny, L.A.: Protein complexes and functional modules in molecular networks. Proc. Nat. Acad. Sci. 100(21), 12123–12128 (2003)
Traud, A.L., Kelsic, E.D., Mucha, P.J., Porter, M.A.: Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 53(3), 526–543 (2011)
Traud, A.L., Mucha, P.J., Porter, M.A.: Social structure of facebook networks. Phys. A: Statis. Mech. Appl. 391(16), 4165–4180 (2012)
Wagner, A., Fell, D.A.: The small world inside large metabolic networks. Proc. Royal Soc. Lond. Ser. B: Biol. Sci. 268(1478), 1803–1810 (2001)
Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.: Scan: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 824–833. ACM (2007)
Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, p. 3. ACM (2012)
Yang, T., Jin, R., Chi, Y., Zhu, S.: Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 927–936. ACM (2009)
Yang, Y., Sun, Y., Pandit, S., Chawla, N.V., Han, J.: Perspective on measurement metrics for community detection algorithms. In: Mining Social Networks and Security Informatics, pp. 227–242. Springer (2013)
Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. Proc. VLDB Endowment 2(1), 718–729 (2009)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Rabbany, R., Zaïane, O.R. (2015). Evaluation of Community Mining Algorithms in the Presence of Attributes. In: Li, XL., Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D. (eds) Trends and Applications in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol 9441. Springer, Cham. https://doi.org/10.1007/978-3-319-25660-3_13
DOI: https://doi.org/10.1007/978-3-319-25660-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25659-7
Online ISBN: 978-3-319-25660-3
eBook Packages: Computer Science, Computer Science (R0)