Abstract
A community within a graph can be broadly defined as a set of vertices that exhibits high cohesiveness (a relatively high number of edges within the set) and low conductance (a relatively low number of edges leaving the set). Community detection is a fundamental graph processing analytic that applies to several domains, including social networks. In this context, communities are often overlapping, since a person can belong to more than one community (e.g., friends and family), and evolving, since the structure of the network changes over time. We address the problem of streaming overlapping community detection, where the goal is to maintain communities in the presence of streaming updates, so that communities are updated incrementally rather than recomputed from scratch. To this end, we introduce SONIC, a find-and-merge type of community detection algorithm that can efficiently handle streaming updates. SONIC first detects when graph updates yield significant community changes. Upon detection, it updates the communities via an incremental merge procedure. The SONIC algorithm incorporates two additional techniques to speed up the incremental merge: min-hashing and inverted indexes. Results show that SONIC can provide high-quality overlapping communities while handling streaming updates several orders of magnitude faster than alternatives that perform from-scratch computation.
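The abstract names min-hashing and inverted indexes as the two techniques that speed up the merge step, but does not say how SONIC applies them. The following is a minimal, generic sketch of the two ideas, not SONIC's actual procedure: an inverted index from vertex to community restricts merge candidates to communities that share at least one vertex, and min-hash signatures give a cheap estimate of the Jaccard similarity between two communities. All function names, the hash family, and the parameter `k` are assumptions made for illustration.

```python
import random

# Hypothetical helper: k random linear hash functions h(v) = (a*v + b) mod p.
# Assumes vertex ids are non-negative integers smaller than the prime p.
def make_hash_funcs(k, seed=0, prime=2_147_483_647):
    rng = random.Random(seed)
    return [(rng.randrange(1, prime), rng.randrange(0, prime), prime)
            for _ in range(k)]

def signature(vertices, hash_funcs):
    """Min-hash signature: the minimum hash value of the set under each function."""
    return tuple(min((a * v + b) % p for v in vertices)
                 for a, b, p in hash_funcs)

def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates |A & B| / |A | B|."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

def build_inverted_index(communities):
    """Map each vertex to the ids of the communities that contain it."""
    index = {}
    for cid, members in communities.items():
        for v in members:
            index.setdefault(v, set()).add(cid)
    return index

def merge_candidates(cid, communities, index):
    """Communities sharing at least one vertex with community `cid`."""
    candidates = set()
    for v in communities[cid]:
        candidates |= index[v]
    candidates.discard(cid)
    return candidates
```

With communities `{0: {1, 2, 3, 4}, 1: {3, 4, 5, 6}, 2: {10, 11, 12}}`, the inverted index returns only community 1 as a merge candidate for community 0, so the unrelated community 2 is never compared. The signature comparison then costs `k` equality checks regardless of community size, at the price of an approximate similarity value.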
Additional information
Responsible editors: G. Karypis.
About this article
Cite this article
Sarıyüce, A.E., Gedik, B., Jacques-Silva, G. et al. SONIC: streaming overlapping community detection. Data Min Knowl Disc 30, 819–847 (2016). https://doi.org/10.1007/s10618-015-0440-z
DOI: https://doi.org/10.1007/s10618-015-0440-z