
Towards efficient solutions of bitruss decomposition for large-scale bipartite graphs

Abstract

In recent years, cohesive subgraph mining in bipartite graphs has become a popular research topic. An important cohesive subgraph model, the k-bitruss, is the maximal subgraph in which each edge is contained in at least k butterflies (i.e., (2, 2)-bicliques). In this paper, we study the bitruss decomposition problem, which aims to find all the k-bitrusses for \(k \ge 0\). The existing algorithms follow a bottom-up strategy that iteratively peels the edges with the lowest butterfly support. During this peeling process, they must enumerate all the supporting butterflies of each edge they remove, which is time-consuming. To solve this issue, we propose a novel online index, the \(\mathsf {BE}\)-\(\mathsf {Index}\), which compresses butterflies into k-blooms (i.e., (2, k)-bicliques). Based on the \(\mathsf {BE}\)-\(\mathsf {Index}\), we propose a new bitruss decomposition algorithm, \(\mathsf {BiT}\)-\(\mathsf {BU}\), along with two batch-based optimizations, to accomplish the butterfly enumeration of the peeling process efficiently. Furthermore, we design the \(\mathsf {BiT}\)-\(\mathsf {PC}\) algorithm, which handles edges with high butterfly support more efficiently. We also explore shared-memory parallel solutions to handle large graphs more efficiently, and in the parallel algorithms we propose effective techniques to reduce conflicts among threads. We theoretically show that our new algorithms significantly reduce the time complexities of the existing algorithms. In addition, we conduct extensive empirical evaluations on real-world datasets. The experimental results further validate the effectiveness of the bitruss model and demonstrate that our proposed solutions significantly outperform the state-of-the-art techniques by several orders of magnitude.
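To make the bottom-up peeling strategy concrete, here is a minimal sketch of the *baseline* approach the abstract describes. It is not the paper's \(\mathsf {BE}\)-\(\mathsf {Index}\), \(\mathsf {BiT}\)-\(\mathsf {BU}\) or \(\mathsf {BiT}\)-\(\mathsf {PC}\) algorithm. The function name `bitruss_decomposition` and the input format (a list of `(u, v)` tuples with `u` in U and `v` in V) are illustrative assumptions. The sketch computes each edge's butterfly support, then repeatedly removes the edge with the lowest current support, recording the running maximum as that edge's bitruss number. Each removal re-enumerates every butterfly containing the removed edge, and that repeated enumeration is the cost the abstract identifies as the bottleneck.

```python
import heapq
from collections import defaultdict


def bitruss_decomposition(edges):
    """Return {edge: bitruss number} for a bipartite edge list [(u, v), ...].

    Hypothetical baseline: butterflies are enumerated directly from adjacency
    sets during bottom-up peeling (no BE-Index).
    """
    adj_u = defaultdict(set)  # u in U -> neighbours in V
    adj_v = defaultdict(set)  # v in V -> neighbours in U
    for u, v in edges:
        adj_u[u].add(v)
        adj_v[v].add(u)

    def butterflies_of(u, v):
        # A butterfly containing (u, v) is fixed by another U-vertex w adjacent
        # to v and another V-vertex x adjacent to both u and w.
        # Yield the butterfly's three other edges.
        for w in adj_v[v]:
            if w == u:
                continue
            for x in adj_u[u] & adj_u[w]:
                if x != v:
                    yield (u, x), (w, v), (w, x)

    # Initial butterfly support of every (deduplicated) edge.
    sup = {e: sum(1 for _ in butterflies_of(*e)) for e in set(edges)}
    heap = [(s, e) for e, s in sup.items()]
    heapq.heapify(heap)

    phi, k = {}, 0
    while heap:
        s, e = heapq.heappop(heap)
        if e in phi or s != sup[e]:
            continue  # stale heap entry: support was lowered after this push
        k = max(k, s)  # the peeling level never decreases
        phi[e] = k
        u, v = e
        # Removing e destroys every butterfly it belongs to, so each of the
        # other three edges loses one unit of support per such butterfly.
        for others in list(butterflies_of(u, v)):
            for f in others:
                sup[f] -= 1
                heapq.heappush(heap, (sup[f], f))
        adj_u[u].discard(v)
        adj_v[v].discard(u)
    return phi
```

Consider a single butterfly plus one pendant edge, `[(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "a")]`. Here the four butterfly edges receive bitruss number 1 and the pendant edge receives 0. For any k, the k-bitruss is then `{e for e, b in phi.items() if b >= k}`.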

Notes

  1. https://www.openmp.org/

  2. http://konect.uni-koblenz.de/networks/


Acknowledgements

Xuemin Lin is supported by the National Key R&D Program of China under grant 2018AAA0102502 and ARC DP200101338. Lu Qin is supported by ARC FT200100787. Wenjie Zhang is supported by ARC DP210101393 and ARC DP200101116. Ying Zhang is supported by FT170100128 and ARC DP180103096.

Author information

Corresponding author

Correspondence to Kai Wang.

About this article

Cite this article

Wang, K., Lin, X., Qin, L. et al. Towards efficient solutions of bitruss decomposition for large-scale bipartite graphs. The VLDB Journal (2021). https://doi.org/10.1007/s00778-021-00658-5

Keywords

  • Bipartite graph
  • Cohesive subgraph
  • Bitruss decomposition
  • Parallelization