Skip to main content
Log in

Community detection via heterogeneous interaction analysis

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

The pervasiveness of Web 2.0 and social networking sites has enabled people to interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment on shared content (bookmarks, photos, videos), and users can tag their favorite content. Users can also connect with one another, and subscribe to or become a fan or a follower of others. These diverse activities result in a multi-dimensional network among actors, forming group structures with group members sharing similar interests or affiliations. This work systematically addresses two challenges. First, it is challenging to effectively integrate interactions over multiple dimensions to discover hidden community structures shared by heterogeneous interactions. We show that representative community detection methods for single-dimensional networks can be presented in a unified view. Based on this unified view, we present and analyze four possible integration strategies to extend community detection from single-dimensional to multi-dimensional networks. In particular, we propose a novel integration scheme based on structural features. Another challenge is the evaluation of different methods without ground truth information about community membership. We employ a novel cross-dimension network validation (CDNV) procedure to compare the performance of different methods. We use synthetic data to deepen our understanding, and real-world data to compare integration strategies as well as baseline methods in a large scale. We study further the computational time of different methods, normalization effect during integration, sensitivity to related parameters, and alternative community detection methods for integration.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Abou-Rjeili A, Karypis G (2006) Multilevel algorithms for partitioning power-law graphs. In: Proceedings of the 20th international conference on parallel and distributed processing (IPDPS’06). IEEE Computer Society, Washington, DC, p 124

  • Airodi EM, Blei D, Fienberg SE, Xing EP (2008) Mixed membership stochastic blockmodels. J Mach Learn Res 9: 1981–2014

    Google Scholar 

  • Aleman-Meza B, Nagarajan M, Ramakrishnan C, Ding L, Kolari P, Sheth AP, Arpinar BI, Joshi A, Finin T (2006) Semantic analytics on social networks: experiences in addressing the problem of conflict of interest detection. In: WWW ’06: proceedings of the 15th international conference on World Wide Web. ACM Press, New York, pp 407–416

  • Argyriou A, Herbster M, Pontil M (2006) Combining graph Laplacians for semi-supervised learning. Adv Neural Inf Process Syst 18: 67

    Google Scholar 

  • Asur S, Parthasarathy S, Ucar D (2007) An event-based framework for characterizing the evolutionary behavior of interaction graphs. In: KDD ’07: proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 913–921

  • Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: KDD ’06: proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 44–54

  • Bansal N, Chiang F, Koudas N, Tompa FW (2007) Seeking stable clusters in the blogosphere. In: VLDB ’07: proceedings of the 33rd international conference on very large data bases. Vienna, Austria, pp 806–817

  • Bickel S, Scheffere T (2004) Multi-view clustering. In: ICDM ’04: proceedings of the fourth IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp 19–26

  • Borg I, Groenen P (2005) Modern multidimensional scaling: theory and applications. Springer, New York

    MATH  Google Scholar 

  • Chaudhuri K, Kakade SM, Livescu K, Sridharan K (2009) Multi-view clustering via canonical correlation analysis. In: ICML ’09: proceedings of the 26th annual international conference on machine learning. ACM, New York, pp 1–8

  • Clauset A, Newman M, Moore C (2004) Finding community structure in very large networks. Arxiv preprint cond-mat/0408187

  • Clauset A, Shalizi CR, Newman MEJ (2007) Power-law distributions in empirical data. arXiv, 706

  • de Sa VR (2005) Spectral clustering with two views. In: Proceedings of workshop of learning with multiple views. Bonn, Germany, pp 20–27

  • Demmel J, Dongarra J, Ruhe A, van der Vorst H (2000) Templates for the solution of algebraic eigenvalue problems: a practical guide. Society for Industrial and Applied Mathematics, Philadelphia

    MATH  Google Scholar 

  • Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: ICML ’04: proceedings of the twenty-first international conference on machine learning. ACM, New York, p 36

  • Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5): 75–174

    Article  MathSciNet  Google Scholar 

  • Fortunato S, Barthelemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci USA 104(1): 36–41

    Article  Google Scholar 

  • Goder A, Filkov V (2008) Consensus clustering algorithms: comparison and refinement. In: Proceedings of the workshop on algorithm engineering and experiments, ALENEX 2008, San Francisco, CA, January 19, 2008

  • Good B, De Montjoye Y, Clauset A (2010) Performance of modularity maximization in practical contexts. Phys Rev E 81(4): 046106

    Article  MathSciNet  Google Scholar 

  • Guimera R, Amaral LAN (2005) Functional cartography of complex metabolic networks. Nature 433:895–900

    Google Scholar 

  • Hopcroft J, Khan O, Kulis B, Selman B (2003) Natural communities in large linked networks. In: KDD ’03: proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 541–546

  • Hotelling H (1936) Relations between two sets of variates. Biometrika 28(3/4): 321–377

    Article  MATH  Google Scholar 

  • Hu T, Sung SY (2005) Consensus clustering. Intell Data Anal 9(6): 551–565

    Google Scholar 

  • Jung J, Euzenat J (2007) Towards semantic social networks. In: Proceedings of the European semantic Web conference (ESWC2007), volume 4519 of LNCS. Springer, Berlin, pp 267–280

  • Kang H, Getoor L, Singh L (2007) Visual analysis of dynamic group membership in temporal social networks. SIGKDD Explor Spec Issue Vis Anal 9(2): 13–21

    Article  Google Scholar 

  • Kemp C, Tenenbaum J, Griffiths T, Yamada T, Ueda N (1999/2006) Learning systems of concepts with an infinite relational model. In: Proceedings of the national conference on artificial intelligence, vol 21. AAAI Press/MIT Press, Menlo Park/Cambridge, p 381

  • Kettenring J (1971) Canonical analysis of several sets of variables. Biometrika 58: 433–451

    Article  MathSciNet  MATH  Google Scholar 

  • Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM 46(5): 604–632

    Article  MathSciNet  MATH  Google Scholar 

  • Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparative analysis. Phys Rev E 80(5): 056117

    Article  Google Scholar 

  • Leskovec J, Lang KJ, Mahoney M (2010) Empirical comparison of algorithms for network community detection. In: WWW ’10: proceedings of the 19th international conference on World Wide Web. ACM, New York, pp 631–640

  • Lin Y-R, Chi Y, Zhu S, Sundaram H, Tseng BL (2008) Facetnet: a framework for analyzing communities and their evolutions in dynamic networks. In: Proceeding of the 17th international conference on World Wide Web (WWW ’08). ACM, New York, NY, pp 685–694

  • Lin Y-R, Chi Y, Zhu S, Sundaram H, Tseng BL (2009) Analyzing communities and their evolutions in dynamic social networks. ACM Trans Knowl Discov Data 3(2): 1–31

    Article  Google Scholar 

  • Lizorkin D, Medelyan O, Grineva M (2009) Analysis of community structure in Wikipedia. In: WWW ’09: proceedings of the 18th international conference on World Wide Web. ACM, New York, pp 1221–1222

  • Long B, Yu PS, Zhang ZM (2008) A general model for multiple view unsupervised learning. In: Proceedings of the 2008 SIAM international conference on data mining, pp 822–833

  • Luxburg Uv (2007) A tutorial on spectral clustering. Stat Comput 17(4): 395–416

    Article  MathSciNet  Google Scholar 

  • McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27: 415–444

    Article  Google Scholar 

  • Mika P (2007) Ontologies are us: a unified model of social networks and semantics. In: Web semantics: science, services and agents on the World Wide Web, vol 5, no 1, pp 5–15. Selected papers from the international semantic Web conference (ISWC2005)

  • Miller K, Griffiths T, Jordan M (2009) Nonparametric latent feature models for link prediction. Adv Neural Inf Process Syst 22: 1276–1284

    Google Scholar 

  • Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2): 91–118

    Article  MATH  Google Scholar 

  • Mucha P, Richardson T, Macon K, Porter M, Onnela J (2010) Community structure in time-dependent, multiscale, and multiplex networks. Science 328(5980): 876–878

    Article  MathSciNet  MATH  Google Scholar 

  • Newman M (2006a) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3): 036104

    Article  MathSciNet  Google Scholar 

  • Newman M (2006b) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23): 8577–8582

    Article  Google Scholar 

  • Nguyen N, Caruana R (2007) Consensus clusterings. In: ICDM ’07: proceedings of the 2007 seventh IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp 607–612

  • Palla G, Barabasi A-L, Vicsek T (2007) Quantifying social group evolution. Nature 446(7136): 664–667

    Article  Google Scholar 

  • Richardson M, Domingos P (2002) Mining knowledge-sharing sites for viral marketing. In: KDD ’02: proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, pp 61–70

  • Rosvall M, Bergstrom C (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105(4): 1118–1123

    Article  Google Scholar 

  • Sarkar P, Moore AW (2005) Dynamic social network analysis using latent space models. SIGKDD Explor Newsl 7(2): 31–40

    Article  Google Scholar 

  • Specia L, Motta E (2007) Integrating folksonomies with the semantic web. In: Proceedings of the 4th European conference on the semantic Web: research and applications. Springer, Berlin, pp 624–639

  • Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3: 583–617

    MathSciNet  MATH  Google Scholar 

  • Tang L, Liu H (2009a) Relational learning via latent social dimensions. In: KDD ’09: proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 817–826

  • Tang L, Liu H (2009b) Scalable learning of collective behavior based on sparse social dimensions. In: CIKM ’09: proceeding of the 18th ACM conference on information and knowledge management. ACM, New York, pp 1107–1116

  • Tang L, Liu H, Zhang J, Nazeri Z (2008) Community evolution in dynamic multi-mode networks. In: KDD ’08: proceeding of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 677–685

  • Tang L, Wang X, Liu H (2009a) Uncovering groups via heterogeneous interaction analysis. In: Proceeding of IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp 503–512

  • Tang W, Lu Z, Dhillon IS (2009b) Clustering with multiple graphs. In: ICDM ’09: proceedings of the 2009 ninth IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp 1016–1021

  • Topchy A, Jain AK, Punch W (2003) Combining multiple weak clusterings. In: ICDM ’03: proceedings of the third IEEE international conference on data mining. IEEE Computer Society, Washington, DC, p 331

  • Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, Cambridge

    Google Scholar 

  • White S, Smyth P (2005) A spectral clustering approach to finding communities in graphs. In: Proceedings of the fifth SIAM international conference on data mining, pp 274–285

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufman, San Francisco

    MATH  Google Scholar 

  • Yingzi Jin YM, Ishizuka M (2008) Ranking entities on the web using social network mining and ranking learning. In: WWW ’08: proceedings of the 17th international conference on World Wide Web. ACM Press, New York, pp 21–25

  • Yu K, Yu S, Tresp V (2005) Soft clustering on graphs. In: Advances in neural information processing systems 18. MIT Press, Cambridge, pp 1553–1560

  • Zhou D, Burges CJC (2007) Spectral clustering and transductive learning with multiple views. In: ICML ’07: proceedings of the 24th international conference on machine learning. ACM, New York, 1159–1166

  • Zhou D, Zhu S, Yu K, Song X, Tseng BL, Zha H, Giles CL (2008) Learning multiple graphs for document recommendations. In: WWW ’08: proceeding of the 17th international conference on World Wide Web. ACM, New York, pp 141–150

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lei Tang.

Additional information

Responsible editor: Eamonn Keogh.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Tang, L., Wang, X. & Liu, H. Community detection via heterogeneous interaction analysis. Data Min Knowl Disc 25, 1–33 (2012). https://doi.org/10.1007/s10618-011-0231-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-011-0231-0

Keywords

Navigation