SONIC: streaming overlapping community detection

Data Mining and Knowledge Discovery

Abstract

A community within a graph can be broadly defined as a set of vertices that exhibits high cohesiveness (a relatively high number of edges within the set) and low conductance (a relatively low number of edges leaving the set). Community detection is a fundamental graph analytic that applies to several domains, including social networks. In this context, communities are often overlapping, since a person can belong to more than one community (e.g., friends and family), and evolving, since the structure of the network changes over time. We address the problem of streaming overlapping community detection, where the goal is to maintain communities in the presence of streaming updates, so that they can be updated more efficiently than by recomputing them from scratch. To this end, we introduce SONIC, a find-and-merge type of community detection algorithm that can efficiently handle streaming updates. SONIC first detects when graph updates yield significant community changes. Upon detection, it updates the communities via an incremental merge procedure. SONIC incorporates two additional techniques to speed up the incremental merge: min-hashing and inverted indexes. Results show that SONIC provides high-quality overlapping communities while handling streaming updates several orders of magnitude faster than alternatives that recompute from scratch.
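As a rough illustration of how the two speed-up techniques named in the abstract can work together, the Python sketch below estimates the Jaccard similarity of vertex sets with min-hash signatures and uses an inverted index (vertex to community ids) to restrict comparisons to communities that share at least one vertex. It is not the SONIC implementation: the function names, the linear hash family, the integer vertex ids, and the similarity threshold are all assumptions made for this example.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # assumed modulus; vertex ids must be ints below it


def make_hash_funcs(k, seed=0):
    """Draw k linear hash functions h(v) = (a*v + b) mod PRIME (illustrative)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(vertices, hash_funcs):
    """Signature = minimum hash value of the set under each hash function."""
    return tuple(min((a * v + b) % PRIME for v in vertices) for a, b in hash_funcs)


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(communities):
    """Inverted index: pair up only communities that share at least one vertex."""
    index = defaultdict(set)
    for cid, members in communities.items():
        for v in members:
            index[v].add(cid)
    pairs = set()
    for cids in index.values():
        pairs.update(combinations(sorted(cids), 2))
    return pairs


def similar_pairs(communities, threshold=0.5, k=128):
    """Return candidate pairs whose estimated similarity reaches the threshold."""
    funcs = make_hash_funcs(k)
    sigs = {cid: minhash_signature(m, funcs) for cid, m in communities.items()}
    return [
        (a, b)
        for a, b in sorted(candidate_pairs(communities))
        if estimate_jaccard(sigs[a], sigs[b]) >= threshold
    ]


# Hypothetical communities keyed by id; values are sets of integer vertex ids.
communities = {
    0: {1, 2, 3, 4},
    1: {1, 2, 3, 4},
    2: {10, 11},
    3: {4, 20, 21, 22, 23, 24, 25, 26},
}
merge_candidates = similar_pairs(communities)
```

In this sketch the inverted index keeps community 2 out of every comparison because it shares no vertex with the others, and the signatures let the remaining pairs be compared in time proportional to the signature length rather than the community size.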

Author information

Corresponding author

Correspondence to Ahmet Erdem Sarıyüce.

Additional information

Responsible editors: G. Karypis.

About this article

Cite this article

Sarıyüce, A.E., Gedik, B., Jacques-Silva, G. et al. SONIC: streaming overlapping community detection. Data Min Knowl Disc 30, 819–847 (2016). https://doi.org/10.1007/s10618-015-0440-z

  • DOI: https://doi.org/10.1007/s10618-015-0440-z
