Data Mining and Knowledge Discovery, Volume 30, Issue 4, pp 819–847

SONIC: streaming overlapping community detection

  • Ahmet Erdem Sarıyüce
  • Buğra Gedik
  • Gabriela Jacques-Silva
  • Kun-Lung Wu
  • Ümit V. Çatalyürek

Abstract

A community within a graph can be broadly defined as a set of vertices that exhibits high cohesiveness (a relatively high number of edges within the set) and low conductance (a relatively low number of edges leaving the set). Community detection is a fundamental graph analytic that applies to many domains, including social networks. In this context, communities are often overlapping, since a person can belong to more than one community (e.g., friends and family), and evolving, since the structure of the network changes over time. We address the problem of streaming overlapping community detection, where the goal is to maintain communities in the presence of streaming updates, so that communities are updated incrementally rather than recomputed from scratch. To this end, we introduce SONIC, a find-and-merge type of community detection algorithm that can efficiently handle streaming updates. SONIC first detects when graph updates yield significant community changes. Upon detection, it updates the communities via an incremental merge procedure. The SONIC algorithm incorporates two additional techniques to speed up the incremental merge: min-hashing and inverted indexes. Results show that SONIC can provide high-quality overlapping communities while handling streaming updates several orders of magnitude faster than alternatives that recompute from scratch.
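
The two speed-up techniques named in the abstract can be illustrated with a minimal Python sketch. This is not SONIC's implementation; it assumes communities are sets of integer vertex ids, that overlap between communities is measured by Jaccard similarity, and that candidate merge partners are communities sharing at least one vertex. All function names and parameters here (`make_hashes`, `signature`, `estimate_jaccard`, `build_inverted_index`, `candidates`, `k`, `seed`) are hypothetical and chosen for illustration only.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2**31 - 1; vertex ids are assumed smaller


def make_hashes(k, seed=0):
    """Return k random linear hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]
    # Default arguments bind a and b per function (avoids late-binding closures).
    return [lambda x, a=a, b=b: (a * x + b) % PRIME for a, b in params]


def signature(vertices, hashes):
    """Min-hash signature: the minimum hash value of the set under each function."""
    return tuple(min(h(v) for v in vertices) for h in hashes)


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


def build_inverted_index(communities):
    """Map each vertex to the ids of the communities that contain it."""
    index = {}
    for cid, members in communities.items():
        for v in members:
            index.setdefault(v, set()).add(cid)
    return index


def candidates(cid, communities, index):
    """Ids of other communities sharing at least one vertex with community cid."""
    found = set()
    for v in communities[cid]:
        found |= index[v]
    found.discard(cid)
    return found
```

In this sketch, the inverted index narrows a merge check to communities that actually share a vertex, and comparing fixed-length signatures stands in for intersecting full member sets. A larger `k` gives a tighter similarity estimate at the cost of longer signatures.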

Keywords

  • Streaming graph processing
  • Community detection
  • Overlapping communities

Copyright information

© The Author(s) 2015

Authors and Affiliations

  • Ahmet Erdem Sarıyüce (1)
  • Buğra Gedik (2)
  • Gabriela Jacques-Silva (3)
  • Kun-Lung Wu (3)
  • Ümit V. Çatalyürek (4)

  1. Sandia National Labs, Livermore, USA
  2. Bilkent University, Ankara, Turkey
  3. IBM Thomas J. Watson Research Center, IBM Research, New York, USA
  4. The Ohio State University, Columbus, USA
