Data Mining and Knowledge Discovery, Volume 30, Issue 5, pp 1134–1165

Top-k overlapping densest subgraphs

Abstract

Finding dense subgraphs is an important problem in graph mining and has many practical applications. At the same time, while large real-world networks are known to have many communities that are not well-separated, the majority of the existing work focuses on the problem of finding a single densest subgraph. Hence, it is natural to consider the question of finding the top-k densest subgraphs. One major challenge in addressing this question is how to handle overlaps: eliminating overlaps completely is one option, but this may lead to extracting subgraphs that are not as dense as would be possible if a limited amount of overlap were allowed. Furthermore, overlaps are desirable, as in most real-world graphs there are vertices that belong to more than one community and thus to more than one densest subgraph. In this paper we study the problem of finding top-k overlapping densest subgraphs, and we present a new approach that improves over the existing techniques, both in theory and in practice. First, we reformulate the problem definition in a way that allows us to obtain an algorithm with a constant-factor approximation guarantee. Our approach relies on techniques for solving the max-sum diversification problem, which, however, we need to extend in order to make them applicable to our setting. Second, we evaluate our algorithm on a collection of benchmark datasets and show that it convincingly outperforms the previous methods, both in terms of quality and efficiency.
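
As a small illustration of why forbidding overlap can cost density, the following sketch compares the two options on a toy graph. It is not taken from the paper: it assumes the common definition of density as the number of induced edges divided by the number of vertices, and the graph (two 4-cliques sharing one vertex) and the helper name `density` are hypothetical.

```python
from itertools import combinations

def density(vertices, edges):
    """Number of edges induced by `vertices`, divided by the number of vertices.
    (Assumed definition; the abstract itself does not spell it out.)"""
    vs = set(vertices)
    induced = [(u, v) for (u, v) in edges if u in vs and v in vs]
    return len(induced) / len(vs)

# Hypothetical toy graph: two 4-cliques that share vertex 3.
clique_a = [0, 1, 2, 3]
clique_b = [3, 4, 5, 6]
edges = list(combinations(clique_a, 2)) + list(combinations(clique_b, 2))

# Suppose clique_a has already been reported as the first dense subgraph.
# If overlap is allowed, the second subgraph can reuse vertex 3.
overlapping = density(clique_b, edges)
# If overlap is forbidden, only vertices 4, 5, 6 remain available.
disjoint = density([4, 5, 6], edges)

print(overlapping, disjoint)  # 1.5 1.0
```

Sharing a single vertex keeps the second subgraph a full 4-clique, while the disjoint alternative is only a triangle.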

Keywords

Community detection; Overlapping communities; Social network analysis; Dense subgraphs; Diverse subgraphs; Approximation algorithm

Copyright information

© The Author(s) 2016

Authors and Affiliations

  • Esther Galbrun (1)
  • Aristides Gionis (2)
  • Nikolaj Tatti (2)

  1. Inria Nancy – Grand Est, Villers-lès-Nancy, France
  2. Helsinki Institute for Information Technology (HIIT) and Department of Computer Science, Aalto University, Helsinki, Finland