Top-k heavy weight triangles listing on graph stream

World Wide Web

Abstract

Graph streams express complex, highly dynamic relationships between entities, such as friendships in social networks. Storing and mining graph streams is a major research area in big data. Triangle listing and counting is an important topic in graph mining: the triangle, as the simplest cycle and clique structure, appears in many real-world applications. A large body of work studies triangles on graph streams. However, existing research has focused on triangle counting on static graphs or graph streams, and little work targets heavy weight triangle listing. This paper formally defines triangle weight on a graph stream. Based on this definition, it presents an approximation algorithm for the top-k heavy weight triangle listing problem on graph streams and proposes three optimized data structures, DolhaT, Filtered DolhaT and Double Filtered DolhaT (DFD), to solve it. Experiments on real graph stream datasets demonstrate the effectiveness of the proposed structures for heavy weight triangle listing on graph streams.
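
To make the problem statement concrete, the following is a minimal exact baseline, not the approximation algorithm or the DolhaT structures proposed in the paper. It rests on assumptions that do not come from the abstract: edges are undirected, an edge's weight is the accumulated weight of its occurrences in the stream, and a triangle's weight is the sum of its three edge weights. The function name `top_k_heavy_triangles` and the `(u, v, w)` stream format are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_heavy_triangles(stream, k):
    """Exact baseline: return the k heaviest triangles in an edge stream.

    ASSUMPTIONS (not taken from the paper): edges are undirected, an edge's
    weight is the sum of the weights of its occurrences in the stream, and
    a triangle's weight is the sum of its three edge weights.
    """
    weight = defaultdict(int)      # (u, v) with u < v -> accumulated weight
    neighbors = defaultdict(set)   # vertex -> set of adjacent vertices

    # Single pass over the stream: accumulate edge weights and adjacency.
    for u, v, w in stream:
        if u == v:
            continue               # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        weight[(a, b)] += w
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Enumerate each triangle a < b < c exactly once, from its edge (a, b).
    triangles = []
    for (a, b), w_ab in weight.items():
        for c in neighbors[a] & neighbors[b]:
            if c > b:
                total = w_ab + weight[(a, c)] + weight[(b, c)]
                triangles.append((total, (a, b, c)))

    # Heaviest first; ties are broken by the vertex tuple.
    return heapq.nlargest(k, triangles)
```

This baseline stores every edge and enumerates every triangle, so its memory and time grow with the whole stream. That cost is the reason an approximate, space-bounded approach such as the one the abstract describes is of interest.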

Data availability

All experimental data generated and analysed during this study are included in this published article.

Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Contributions

Fan Zhang was a major contributor to the problem definition, the algorithm design and implementation, and the writing of the manuscript. Xiangyang Gou revised the manuscript. Lei Zou guided the whole project and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lei Zou.

Ethics declarations

Ethical approval and Consent to participate

Not applicable.

Consent for publication

Not applicable.

Human and animal ethics

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, F., Gou, X. & Zou, L. Top-k heavy weight triangles listing on graph stream. World Wide Web 26, 1827–1851 (2023). https://doi.org/10.1007/s11280-022-01117-z

  • DOI: https://doi.org/10.1007/s11280-022-01117-z
