The VLDB Journal, Volume 28, Issue 6, pp 987–1012

Efficient community discovery with user engagement and similarity

  • Fan Zhang
  • Xuemin Lin
  • Ying Zhang
  • Lu Qin
  • Wenjie Zhang
Regular Paper


In this paper, we investigate the (k,r)-core problem, which aims to find cohesive subgraphs in social networks by considering both user engagement and user similarity. In particular, we adopt the popular concept of k-core to guarantee the engagement of the users (vertices) in a group (subgraph): each vertex in a (k,r)-core connects to at least k other vertices within the subgraph. Meanwhile, we consider the pairwise similarity among users based on their attributes. We propose efficient algorithms to enumerate all maximal (k,r)-cores and to find the maximum (k,r)-core; both problems are shown to be NP-hard. Effective pruning techniques substantially reduce the search space of the two algorithms. A novel (k,k')-core based upper bound on the (k,r)-core size enhances the performance of the maximum (k,r)-core computation. We also devise effective search orders for the two algorithms, with different search priorities for vertices. Moreover, we study the diversified (k,r)-core search problem, which finds l maximal (k,r)-cores that together cover the most vertices. These maximal (k,r)-cores are distinctive and informationally rich. We propose an efficient algorithm for this problem with a guaranteed approximation ratio. We design a tight upper bound to prune unpromising partial (k,r)-cores and a new search order to speed up the search. Initial candidates of large size are generated to further enhance the pruning power. Comprehensive experiments on real-life data demonstrate that the maximal (k,r)-cores enable us to find interesting cohesive subgraphs, and that the performance of all three mining algorithms is effectively improved by the proposed techniques.
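The abstract states the two constraints only informally. The following is a minimal Python sketch, not the authors' algorithm, written under several assumptions: the graph is undirected and stored as an adjacency dict, similarity is a user-supplied function `sim` that every pair of vertices must meet with threshold `r`, and (our assumption) a (k,r)-core must also be connected. It shows three pieces: standard k-core peeling, which is a safe preprocessing step because every vertex of a (k,r)-core has at least k neighbours inside it, so the whole (k,r)-core lies inside the k-core; a checker for the constraints; and the textbook greedy for maximum coverage over an already-enumerated list of maximal (k,r)-cores, which is known to give a (1 - 1/e) approximation when picking l sets. The paper's own diversified algorithm avoids full enumeration and may differ. The helper names `k_core`, `is_kr_core`, `greedy_diversified` and the toy data are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Callable, Dict, Hashable, Iterable, List, Set

# Hypothetical representation: undirected graph as vertex -> set of neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def k_core(adj: Graph, k: int) -> Set[Hashable]:
    """Vertices of the k-core, found by repeatedly peeling vertices of degree < k."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed: Set[Hashable] = set()
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:  # a vertex may be queued more than once
            continue
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return set(adj) - removed


def is_kr_core(adj: Graph, S: Iterable[Hashable], k: int,
               sim: Callable[[Hashable, Hashable], float], r: float) -> bool:
    """Check the engagement, similarity and (assumed) connectivity constraints for S."""
    S = set(S)
    if not S:
        return False
    # Engagement: every vertex has at least k neighbours inside S.
    if any(len(adj[v] & S) < k for v in S):
        return False
    # Similarity (assumed form): every pair in S has similarity >= r.
    if any(sim(u, v) < r for u, v in combinations(S, 2)):
        return False
    # Connectivity (assumption): a DFS from any vertex must reach all of S.
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        for u in adj[stack.pop()] & S:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == S


def greedy_diversified(cores: List[Set[Hashable]], l: int) -> List[Set[Hashable]]:
    """Textbook greedy max-coverage: repeatedly take the core adding the most new vertices."""
    chosen: List[Set[Hashable]] = []
    covered: Set[Hashable] = set()
    remaining = list(cores)
    while remaining and len(chosen) < l:
        best = max(remaining, key=lambda c: len(c - covered))
        if not best - covered:  # nothing new left to cover
            break
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen


# Toy example (hypothetical data): two triangles sharing vertex 3, plus a pendant vertex 6.
adj: Graph = {1: {2, 3, 6}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}, 6: {1}}
x = {1: 0.0, 2: 0.1, 3: 0.5, 4: 0.9, 5: 1.0, 6: 0.0}  # one numeric attribute per user


def sim(u: Hashable, v: Hashable) -> float:
    return 1.0 - abs(x[u] - x[v])


print(sorted(k_core(adj, 2)))                          # [1, 2, 3, 4, 5]
print(is_kr_core(adj, {1, 2, 3}, 2, sim, 0.45))        # True
print(is_kr_core(adj, {1, 2, 3, 4, 5}, 2, sim, 0.45))  # False: sim(1, 5) = 0.0
print(greedy_diversified([{1, 2, 3}, {3, 4, 5}], l=1))  # [{1, 2, 3}]
```

Checking candidate sets one by one like this is exponential if applied to all subsets; the enumeration and maximum-search algorithms summarized above rely on pruning, search orders and the (k,k')-core based size bound to avoid that, which this sketch does not attempt.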


Keywords: Community detection, User engagement, User similarity, Diversification



Xuemin Lin is supported by 2018YFB1003504, NSFC61232006, ARC DP180103096 and DP170101628. Ying Zhang is supported by ARC DP180103096 and FT170100128. Lu Qin is supported by ARC DP160101513. Wenjie Zhang is supported by ARC DP180103096.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Guangzhou University, Guangzhou, China
  2. University of New South Wales, Sydney, Australia
  3. East China Normal University, Shanghai, China
  4. Centre for AI, University of Technology Sydney, Sydney, Australia
