
Data Mining and Knowledge Discovery, Volume 28, Issue 5–6, pp 1586–1610

Overlapping community detection in labeled graphs

  • Esther Galbrun
  • Aristides Gionis
  • Nikolaj Tatti

Abstract

We present a new approach to the problem of finding overlapping communities in graphs and social networks. Our approach consists of a novel problem definition and three accompanying algorithms. We are particularly interested in graphs that have labels on their vertices, although our methods are also applicable to graphs with no labels. Our goal is to find k communities so that the total edge density over all k communities is maximized. In the case of labeled graphs, we require that each community be succinctly described by a set of labels. This requirement makes the discovered communities easier to interpret. The proposed problem formulation leads to the discovery of vertex-overlapping, dense communities that cover as many graph edges as possible. We capture these properties with a simple objective function, which we optimize by adapting efficient approximation algorithms for the generalized maximum-coverage problem and the densest-subgraph problem. Our proposed algorithm is a generic greedy scheme. We experiment with three variants of the scheme, obtained by varying the greedy step of finding a dense subgraph. We validate our algorithms by comparing them with other state-of-the-art community-detection methods on a variety of performance measures. Our experiments confirm that our algorithms achieve high-quality results in terms of the reported measures and are practical in terms of performance.
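
As an illustration of the densest-subgraph building block mentioned above, the following is a minimal sketch of the classic greedy peeling heuristic, which is known to give a 1/2-approximation when density is measured as |E|/|V|. It is not the authors' algorithm: the function name `densest_subgraph_peel`, the edge-list input format, and the choice of density measure are assumptions made for this example, and the density used in the article may differ.

```python
from collections import defaultdict


def densest_subgraph_peel(edges):
    """Greedy peeling sketch (hypothetical helper, not from the article).

    Repeatedly removes a minimum-degree vertex and returns the intermediate
    vertex set with the highest density |E| / |V|, together with that density.
    """
    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_set = 0.0, set(remaining)

    while remaining:
        # Record the current subgraph if it is the densest seen so far.
        density = num_edges / len(remaining)
        if density > best_density:
            best_density, best_set = density, set(remaining)

        # Peel off a vertex of minimum degree (linear scan, for simplicity).
        u = min(remaining, key=lambda x: len(adj[x]))
        for w in adj[u]:
            adj[w].discard(u)
        num_edges -= len(adj[u])
        remaining.discard(u)
        del adj[u]

    return best_set, best_density


if __name__ == "__main__":
    # A 4-clique on {0, 1, 2, 3} with a short path 3-4-5 attached.
    clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    tail = [(3, 4), (4, 5)]
    print(densest_subgraph_peel(clique + tail))
```

On this toy graph the peeling removes the two path vertices first and returns the 4-clique, whose density is 6/4 = 1.5. A greedy scheme for k communities could call such a routine repeatedly, but how the article combines it with labels and edge coverage is not specified in the abstract and is not sketched here.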

Keywords

Social networks, Graph partitioning, Multi-labeled graphs

Copyright information

© The Author(s) 2014

Authors and Affiliations

  • Esther Galbrun (1)
  • Aristides Gionis (2)
  • Nikolaj Tatti (2)
  1. Department of Computer Science, Boston University, Boston, USA
  2. Helsinki Institute for Information Technology (HIIT) and Department of Information and Computer Science, Aalto University, Espoo, Finland
