Abstract
In this paper, we investigate the problem of (k,r)-core which intends to find cohesive subgraphs on social networks considering both user engagement and similarity perspectives. In particular, we adopt the popular concept of k-core to guarantee the engagement of the users (vertices) in a group (subgraph) where each vertex in a (k,r)-core connects to at least k other vertices. Meanwhile, we consider the pairwise similarity among users based on their attributes. Efficient algorithms are proposed to enumerate all maximal (k,r)-cores and find the maximum (k,r)-core, where both problems are shown to be NP-hard. Effective pruning techniques substantially reduce the search space of two algorithms. A novel (\(k\),\(k'\))-core based (\(k\),\(r\))-core size upper bound enhances the performance of the maximum (k,r)-core computation. We also devise effective search orders for two algorithms with different search priorities for vertices. Besides, we study the diversified (\(k\),\(r\))-core search problem to find l maximal (\(k\),\(r\))-cores which cover the most vertices in total. These maximal (\(k\),\(r\))-cores are distinctive and informationally rich. An efficient algorithm is proposed with a guaranteed approximation ratio. We design a tight upper bound to prune unpromising partial (\(k\),\(r\))-cores. A new search order is designed to speed up the search. Initial candidates with large size are generated to further enhance the pruning power. Comprehensive experiments on real-life data demonstrate that the maximal (k,r)-cores enable us to find interesting cohesive subgraphs, and performance of three mining algorithms is effectively improved by all the proposed techniques.
Similar content being viewed by others
Notes
Following the convention, when the distance metric (e.g., Euclidean distance) is employed, we say two vertices are similar if their distance is not larger than the given distance threshold.
To avoid the noise, we enforce that there are at least three co-authored papers between two connected authors in the case study.
To make the figure clear, in this case study, one edge represents that there are at least three co-authored papers between two corresponding authors.
References
Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: WSDM, pp. 5–14 (2009)
Angel, A., Koudas, N.: Efficient diversity-aware search. In: SIGMOD, pp. 781–792 (2011)
Ausiello, G., Boria, N., Giannakos, A., Lucarelli, G., Paschos, V.T.: Online maximum k-coverage. Discrete Appl. Math. 160(13–14), 1901–1913 (2012)
Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: massive data summarization on the fly. In: KDD, pp. 671–680 (2014)
Batagelj, V., Zaversnik, M.: An o(m) algorithm for cores decomposition of networks. In: CoRR, cs.DS/0310049 (2003)
Bhawalkar, K., Kleinberg, J.M., Lewi, K., Roughgarden, T., Sharma, A.: Preventing unraveling in social networks: the anchored k-core problem. SIAM J. Discrete Math. 29(3), 1452–1475 (2015)
Bird, C., Gourley, A., Devanbu, P. T., Gertz, M., Swaminathan, A.: Mining email social networks. In: MSR, pp. 137–143 (2006)
Borodin, A., Lee, H.C., Ye, Y.: Max-sum diversification, monotone submodular functions and dynamic updates. In: PODS, pp. 155–166 (2012)
Bron, C., Kerbosch, J.: Finding all cliques of an undirected graph (algorithm 457). Commun. ACM 16(9), 575–576 (1973)
Chang, L.: Efficient maximum clique computation over large sparse graphs. In: SIGKDD, pp. 529–538 (2019)
Chang, L., Yu, J.X., Qin, L.: Fast maximal cliques enumeration in sparse graphs. Algorithmica 66(1), 173–186 (2013)
Chen, K., Lei, C.: Network game design: hints and implications of player interaction. In: NETGAMES, p. 17 (2006)
Chen, L., Liu, C., Zhou, R., Li, J., Yang, X., Wang, B.: Maximum co-located community search in large scale social networks. PVLDB 11(10), 1233–1246 (2018)
Cheng, J., Zhu, L., Ke, Y., Chu, S.: Fast algorithms for maximal clique enumeration with limited memory. In: KDD, pp. 1240–1248 (2012)
Clark, B.N., Colbourn, C.J., Johnson, D.S.: Unit disk graphs. Discrete Math. 86(1–3), 165–177 (1990)
Deng, T., Fan, W.: On the complexity of query result diversification. PVLDB 6(8), 577–588 (2013)
Drosou, M., Pitoura, E.: Search result diversification. SIGMOD Rec. 39(1), 41–47 (2010)
Eppstein, D., Strash, D.: Listing all maximal cliques in large sparse real-world graphs. In: SEA, pp. 364–375 (2011)
Facebook. How does facebook suggest groups for me? https://www.facebook.com/help/382485908586472?helpref=uf_permalink. Accessed 16 Sep 2019
Fan, W., Wang, X., Wu, Y.: Diversified top-k graph pattern matching. PVLDB 6(13), 1510–1521 (2013)
Fang, Y., Cheng, R., Li, X., Luo, S., Hu, J.: Effective community search over large spatial graphs. PVLDB 10(6), 709–720 (2017)
Fang, Y., Cheng, R., Luo, S., Hu, J.: Effective community search for large attributed graphs. PVLDB 9(12), 1233–1244 (2016)
Fang, Y., Zhang, H., Ye, Y., Li, X.: Detecting hot topics from twitter: a multiview approach. J. Inf. Sci. 40(5), 578–593 (2014)
Feige, U.: A threshold of ln n for approximating set cover. J. ACM 45(4), 634–652 (1998)
Ferrara, E., JafariAsbagh, M., Varol, O., Qazvinian, V., Menczer, F., Flammini, A.: Clustering memes in social media. In: ASONAM, pp. 548–555 (2013)
Garey, M.R., Johnson, D.S.: The complexity of near-optimal graph coloring. JACM 23(1), 43–49 (1976)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H Freeman, New York (1979)
Goldberg, M.K., Kelley, S., Magdon-Ismail, M., Mertsalov, K., Wallace, A.: Finding overlapping communities in social networks. In: SocialCom/PASSAT, pp. 104–113 (2010)
Gupta, R., Walrand, J., Goldschmidt, O.: Maximal cliques in unit disk graphs: polynomial approximation. In: Proceedings INOC, vol. 2005. Citeseer (2005)
Hristova, D., Musolesi, M., Mascolo, C.: Keep your friends close and your facebook friends closer: A multiplex network approach to the analysis of offline and online social ties. In: ICWSM (2014)
Huang, X., Lu, W., Lakshmanan, L.V.S.: Truss decomposition of probabilistic graphs: Semantics and algorithms. In: SIGMOD, pp. 77–90 (2016)
Pfeiffer, J.J III., Moreno, S., Fond, T.L., Neville, J., Gallagher, B.: Attributed graph models: modeling network structure with correlated attributes. In: WWW, pp. 831–842 (2014)
Izumi, T., Suzuki, D.: Faster enumeration of all maximal cliques in unit disk graphs using geometric structure. IEICE Trans. 98–D(3), 490–496 (2015)
Kitsak, M., Gallos, L.K., Havlin, S., Liljeros, F., Muchnik, L., Stanley, H.E., Makse, H.A.: Identification of influential spreaders in complex networks. Nat. Phys. 6(11), 888–893 (2010)
Lee, P., Lakshmanan, L.V.S., Milios, E.E.: CAST: a context-aware story-teller for streaming social content. In: CIKM, pp. 789–798 (2014)
Lin, X., Yuan, Y., Zhang, Q., Zhang, Y.: Selecting stars: the k most representative skyline operator. In: ICDE, pp. 86–95 (2007)
Liu, Y., Sutanto, J.: Buyers purchasing time and herd behavior on deal-of-the-day group-buying websites. Electron. Mark. 22(2), 83–93 (2012)
Luo, M.M., Chea, S.: The effect of social rewards and perceived effectiveness of e-commerce institutional mechanisms on intention to group buying. In: Advances in Human Factors, Business Management, Training and Education, pp. 833–840. Springer, Berlin (2017)
Luo, X., Andrews, M., Song, Y., Aspara, J.: Group-buying deal popularity. J. Mark. 78(2), 20–33 (2014)
Malliaros, F.D., Vazirgiannis, M.: To stay or not to stay: modeling engagement dynamics in social graphs. In: CIKM, pp. 469–478 (2013)
Minack, E., Siberski, W., Nejdl, W.: Incremental diversification for very large sets: a streaming-based approach. In: SIGIR, pp. 585–594 (2011)
Mitzlaff, F., Atzmüller, M., Hotho, A., Stumme, G.: The social distributional hypothesis: a pragmatic proxy for homophily in online social networks. Soc. Netw. Anal. Min. 4(1), 216 (2014)
PokemonGo. Developer insights: Inside the philosophy of friends and trading. https://pokemongolive.com/en/post/jundevupdate-trading/. Accessed 16 Sep 2019
Qin, L., Yu, J.X., Chang, L.: Diversifying top-k results. PVLDB 5(11), 1124–1135 (2012)
Seidman, S.B.: Network structure and minimum degree. Soc. Netw. 5(3), 269–287 (1983)
Sharma, P., Govindan, S.: Information seeking behavior of expats in asia on facebook open groups. Singap. J. Libr. Inf. Manag. 44, 35 (2016)
Singla, P., Richardson, M.: Yes, there is a correlation—from social networks to personal behavior on the web. In: WWW, pp. 655–664 (2008)
Statista. Number of active users of pokemon go worldwide from 2016 to 2020, by region (in millions). https://www.statista.com/statistics/665640. Accessed 16 Sep 2019
Ugander, J., Backstrom, L., Marlow, C., Kleinberg, J.: Structural diversity in social contagion. PNAS 109(16), 5962–5966 (2012)
Vieira, M.R., Razente, H.L., Barioni, M.C.N., Hadjieleftheriou, M., Srivastava, D., Traina, Jr. C., Tsotras, V.J.: On query result diversification. In: ICDE, pp. 1163–1174 (2011)
Wang, J., Cheng, J., Fu, A.W.: Redundancy-aware maximal cliques. In: KDD, pp. 122–130 (2013)
Wang, K., Cao, X., Lin, X., Zhang, W., Qin, L.: Efficient computing of radius-bounded k-cores. In: ICDE, pp. 233–244 (2018)
Wang, K., Lin, X., Qin, L., Zhang, W., Zhang, Y.: Vertex priority based butterfly counting for large-scale bipartite networks. PVLDB 12(10), 1139–1152 (2019)
Wen, D., Qin, L., Zhang, Y., Lin, X., Yu, J.X.: I/O efficient core graph decomposition at web scale. In: ICDE, pp. 133–144 (2016)
Wu, S., Sarma, A.D., Fabrikant, A., Lattanzi, S., Tomkins, A.: Arrival and departure dynamics in social networks. In: WSDM, pp. 233–242 (2013)
Wu, Y., Jin, R., Zhu, X., Zhang, X.: Finding dense and connected subgraphs in dual networks. In: ICDE, pp. 915–926 (2015)
Xu, Z., Ke, Y., Wang, Y., Cheng, H., Cheng, J.: A model-based approach to attributed graph clustering. In: SIGMOD, pp. 505–516 (2012)
Yang, J., McAuley, J.J., Leskovec, J.: Community detection in networks with node attributes. In: ICDM, pp. 1151–1156 (2013)
Yu, H., Yuan, D.: Set coverage problems in a one-pass data stream. In: SDM, pp. 758–766 (2013)
Yuan, L., Qin, L., Lin, X., Chang, L., Zhang, W.: Diversified top-k clique search. In: ICDE, pp. 387–398 (2015)
Yuan, Q., Zhao, S., Chen, L., Liu, Y., Ding, S., Zhang, X., Zheng, W.: Augmenting collaborative recommender by fusing explicit social relationships. In: Recsys Workshop (2009)
Zhang, F., Yuan, L., Zhang, Y., Qin, L., Lin, X., Zhou, A.: Discovering strong communities with user engagement and tie strength. In: DASFAA, pp. 425–441 (2018)
Zhang, F., Zhang, W., Zhang, Y., Qin, L., Lin, X.: OLAK: an efficient algorithm to prevent unraveling in social networks. PVLDB 10(6), 649–660 (2017)
Zhang, F., Zhang, Y., Qin, L., Zhang, W., Lin, X.: Finding critical users for social network engagement: the collapsed k-core problem. In: AAAI, pp. 245–251 (2017)
Zhang, F., Zhang, Y., Qin, L., Zhang, W., Lin, X.: When engagement meets similarity: efficient (k, r)-core computation on social networks. PVLDB 10(10), 998–1009 (2017)
Zhang, F., Zhang, Y., Qin, L., Zhang, W., Lin, X.: Efficiently reinforcing social networks over user engagement and tie strength. In: ICDE, pp. 557–568 (2018)
Zhang, Y., Qin, L., Zhang, F., Zhang, W.: Hierarchical decomposition of big graphs. In: ICDE, pp. 2064–2067 (2019)
Zhou, Z., Zhang, F., Lin, X., Zhang, W., Chen, C.: K-core maximization: An edge addition approach. In: IJCAI, pp. 4867–4873 (2019)
Zhu, Q., Hu, H., Xu, C., Xu, J., Lee, W.: Geo-social group queries with minimum acquaintance constraints. VLDB J. 26(5), 709–727 (2017)
Acknowledgements
Xuemin Lin is supported by 2018YFB1003504, NSFC61232006, ARC DP180103096 and DP170101628. Ying Zhang is supported by ARC DP180103096 and FT170100128. Lu Qin is supported by ARC DP160101513. Wenjie Zhang is supported by ARC DP180103096.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zhang, F., Lin, X., Zhang, Y. et al. Efficient community discovery with user engagement and similarity. The VLDB Journal 28, 987–1012 (2019). https://doi.org/10.1007/s00778-019-00579-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-019-00579-4