Abstract
The pervasiveness of Web 2.0 and social networking sites has enabled people to interact with one another easily through various social media. For instance, popular sites such as Del.icio.us, Flickr, and YouTube allow users to comment on shared content (bookmarks, photos, videos) and to tag their favorite content. Users can also connect with one another, subscribe to others, or become their fans or followers. These diverse activities result in a multi-dimensional network among actors, forming group structures whose members share similar interests or affiliations. This work systematically addresses two challenges. First, it is challenging to effectively integrate interactions over multiple dimensions to discover hidden community structures shared by heterogeneous interactions. We show that representative community detection methods for single-dimensional networks can be presented in a unified view. Based on this unified view, we present and analyze four possible integration strategies to extend community detection from single-dimensional to multi-dimensional networks. In particular, we propose a novel integration scheme based on structural features. Second, it is challenging to evaluate different methods without ground-truth information about community membership. We employ a novel cross-dimension network validation (CDNV) procedure to compare the performance of different methods. We use synthetic data to deepen our understanding, and large-scale real-world data to compare the integration strategies with one another and with baseline methods. We further study the computational time of the different methods, the effect of normalization during integration, sensitivity to related parameters, and alternative community detection methods for integration.
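The idea of integrating dimensions through structural features can be illustrated with a small sketch. It assumes one possible instantiation of the unified view: the structural features of each dimension are taken to be the top eigenvectors of that dimension's modularity matrix B = A - d d^T / (2m); the features from all dimensions are concatenated and reduced with PCA; and k-means then assigns each actor to a community. The function names, the parameters `ell` (features per dimension) and `dim` (PCA dimensions kept), and the two-clique toy data are hypothetical, per-dimension normalization is omitted, and this is not the authors' implementation.

```python
import numpy as np


def modularity_features(A: np.ndarray, ell: int) -> np.ndarray:
    """Structural features of one dimension (assumed here): the top-`ell`
    eigenvectors of the modularity matrix B = A - d d^T / (2m)."""
    d = A.sum(axis=1)
    B = A - np.outer(d, d) / d.sum()  # d.sum() equals 2m for an undirected graph
    _, vecs = np.linalg.eigh(B)       # eigh returns eigenvalues in ascending order
    return vecs[:, ::-1][:, :ell]     # columns for the ell largest eigenvalues


def kmeans(X: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def structural_feature_integration(adjs, k: int, ell: int, dim: int) -> np.ndarray:
    """Hypothetical pipeline: per-dimension features -> concatenate -> PCA -> k-means."""
    S = np.hstack([modularity_features(A, ell) for A in adjs])
    S = S - S.mean(axis=0)                       # center columns before PCA
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    X = U[:, :dim] * sigma[:dim]                 # principal-component scores
    return kmeans(X, k)


def two_clique_graph(n_per_group: int, cross_edge) -> np.ndarray:
    """Toy dimension: two equal cliques joined by a single cross edge."""
    n = 2 * n_per_group
    A = np.zeros((n, n))
    for start in (0, n_per_group):
        A[start:start + n_per_group, start:start + n_per_group] = 1.0
    np.fill_diagonal(A, 0.0)
    i, j = cross_edge
    A[i, j] = A[j, i] = 1.0
    return A


# Two dimensions that share the same planted groups {0..9} and {10..19}
# but differ in which cross edge connects the groups.
adjs = [two_clique_graph(10, (0, 10)), two_clique_graph(10, (5, 15))]
labels = structural_feature_integration(adjs, k=2, ell=1, dim=1)
print(labels)
```

Because each dimension is reduced to its own structural features before integration, dimensions with very different densities contribute comparably sized feature blocks; the eigenvector signs may differ across dimensions, but PCA on the concatenated block is unaffected by such sign flips.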
Additional information
Responsible editor: Eamonn Keogh.
About this article
Cite this article
Tang, L., Wang, X. & Liu, H. Community detection via heterogeneous interaction analysis. Data Min Knowl Disc 25, 1–33 (2012). https://doi.org/10.1007/s10618-011-0231-0
DOI: https://doi.org/10.1007/s10618-011-0231-0