The VLDB Journal, Volume 25, Issue 2, pp 171–196

Diversified top-k clique search

  • Long Yuan
  • Lu Qin
  • Xuemin Lin
  • Lijun Chang
  • Wenjie Zhang
Regular Paper

Abstract

Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, it is time-consuming on large graphs and returns an enormous number of cliques with large overlaps. Motivated by this, in this paper we study the diversified top-k clique search problem, which is to find k cliques that together cover the largest number of nodes in the graph. Diversified top-k clique search has many applications, including community search, motif discovery, and anomaly detection in large graphs. A naive solution is to keep all maximal cliques in memory and then select k of them that cover the most nodes using the approximate greedy max k-cover algorithm. However, such a solution is impractical when the graph is large. Instead of keeping all maximal cliques in memory, we devise an algorithm that maintains k candidates during maximal clique enumeration. Our algorithm has a limited memory footprint and achieves a guaranteed approximation ratio. We also introduce a novel lightweight \(\mathsf {PNP}\)-\(\mathsf {Index}\), based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies that avoid enumerating all maximal cliques and thus greatly reduce the computational cost. Moreover, for massive input graphs, we develop an I/O-efficient algorithm to tackle the problem when the input graph cannot fit in main memory. We conduct extensive performance studies on real and synthetic graphs; one of the real graphs contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.
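The abstract contrasts a naive baseline (enumerate every maximal clique, keep them all, then run greedy max k-cover) with keeping only k candidates while cliques stream out of the enumerator. The Python sketch below illustrates both ideas under stated assumptions: Bron-Kerbosch with pivoting stands in for the enumerator, and the helper names (`maximal_cliques`, `coverage`, `greedy_k_cover`, `streaming_top_k`) are hypothetical. The swap rule in `streaming_top_k` (replace the candidate with the fewest private nodes if coverage grows by a factor of at least 1 + 1/k) is an assumed rule in the spirit of streaming max-cover, not necessarily the authors' exact rule; the PNP-Index, the pruning optimizations and the I/O-efficient variant are not modelled.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected simple graph
Clique = FrozenSet[int]


def maximal_cliques(g: Graph) -> Iterator[Clique]:
    """Bron-Kerbosch with pivoting; yields each maximal clique exactly once."""
    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[Clique]:
        if not p and not x:
            yield frozenset(r)  # R cannot be extended: it is maximal
            return
        # Pivot u maximises |N(u) & P|; only non-neighbours of u need branching.
        u = max(p | x, key=lambda v: len(g[v] & p))
        for v in list(p - g[u]):
            yield from expand(r | {v}, p & g[v], x & g[v])
            p.remove(v)  # cliques containing v are done: move v from P to X
            x.add(v)
    yield from expand(set(), set(g), set())


def coverage(cliques: Iterable[Clique]) -> Set[int]:
    """Union of the node sets of the given cliques."""
    return set().union(*cliques)


def greedy_k_cover(cliques: List[Clique], k: int) -> List[Clique]:
    """Naive baseline: all cliques in memory, then greedy max k-cover."""
    chosen: List[Clique] = []
    covered: Set[int] = set()
    pool = list(cliques)
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda c: len(c - covered))
        if not best - covered:
            break  # no remaining clique adds a new node
        chosen.append(best)
        covered |= best
        pool.remove(best)
    return chosen


def streaming_top_k(cliques: Iterable[Clique], k: int) -> List[Clique]:
    """Keep only k candidates while cliques are produced one at a time.

    HYPOTHETICAL swap rule: replace the candidate with the fewest private
    nodes if that enlarges the coverage by a factor of at least (1 + 1/k).
    """
    if k <= 0:
        return []
    cands: List[Clique] = []
    for c in cliques:
        if len(cands) < k:
            cands.append(c)  # fill phase: accept the first k cliques
            continue

        def private(i: int) -> int:
            # nodes covered by cands[i] and by no other candidate
            others = coverage(d for j, d in enumerate(cands) if j != i)
            return len(cands[i] - others)

        i_min = min(range(k), key=private)
        trial = cands[:i_min] + [c] + cands[i_min + 1:]
        if len(coverage(trial)) >= (1 + 1 / k) * len(coverage(cands)):
            cands = trial
    return cands


if __name__ == "__main__":
    # Two triangles sharing node 2, a pendant edge 4-5, and a separate edge 6-7.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4},
         4: {2, 3, 5}, 5: {4}, 6: {7}, 7: {6}}
    all_cliques = list(maximal_cliques(g))
    print(len(coverage(greedy_k_cover(all_cliques, 2))))  # prints 5
    print(streaming_top_k(maximal_cliques(g), 2))
```

The design difference is memory: `greedy_k_cover` needs the full clique list, whereas `streaming_top_k` consumes the generator lazily and never stores more than k cliques, at the cost of checking O(k) candidates per incoming clique in this naive form.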

Keywords

Graph; Diversified top-k search; Clique; I/O efficient

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Long Yuan (1)
  • Lu Qin (2)
  • Xuemin Lin (1)
  • Lijun Chang (1)
  • Wenjie Zhang (1)

  1. The University of New South Wales, Sydney, Australia
  2. Centre for QCIS, University of Technology, Sydney, Australia