Sliding window-based approximate triangle counting with bounded memory usage

  • Regular Paper
  • Published in The VLDB Journal

Abstract

Streaming graph analysis is gaining importance in many fields because real-world graph applications are naturally dynamic. However, approximately counting triangles in real-world streaming graphs with duplicate edges under the sliding window model remains an unsolved problem. In this paper, we propose the SWTC algorithm to address the approximate sliding-window triangle counting problem in streaming graphs. SWTC uses a fixed-length slicing strategy that handles both sample maintenance and cardinality estimation with bounded memory usage. We theoretically prove that our method is superior in sample graph size and estimation accuracy under a given memory upper bound. To further improve performance, we propose two optimization techniques: vision counting, which avoids computation peaks, and asynchronous grouping, which stabilizes the accuracy. Extensive experiments confirm that our approach achieves higher accuracy than the baseline method under the same memory usage.
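
For orientation, the quantity that a sliding-window triangle counter estimates can be computed exactly on small inputs. The Python sketch below is not the SWTC algorithm. It is a brute-force reference that assumes binary counting semantics (duplicate edges count once), undirected edges, integer node ids, and a time-based window `(now - window, now]`. The function name `window_triangles` and the `(u, v, timestamp)` tuple layout are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def window_triangles(stream, now, window):
    """Exact number of triangles among the distinct edges whose
    timestamp t satisfies now - window < t <= now (assumed window model)."""
    adj = defaultdict(set)
    for u, v, t in stream:
        # Skip expired or future edges and self-loops; storing neighbours
        # in sets collapses duplicate edges (binary counting semantics).
        if now - window < t <= now and u != v:
            adj[u].add(v)
            adj[v].add(u)
    adj = dict(adj)  # freeze the keys before iterating

    count = 0
    for u, neighbours in adj.items():
        for v in neighbours:
            if u < v:
                # Only accept common neighbours w > v, so every triangle
                # u < v < w is counted exactly once.
                count += sum(1 for w in neighbours & adj[v] if w > v)
    return count


# Edge (1, 2) appears twice (t=1 and t=4) but is counted once.
stream = [(1, 2, 1), (2, 3, 2), (1, 3, 3), (1, 2, 4), (3, 4, 5), (2, 4, 6)]
print(window_triangles(stream, now=6, window=6))  # 2: {1,2,3} and {2,3,4}
print(window_triangles(stream, now=6, window=3))  # 0: only edges 12, 34, 24 remain
```

This reference stores every edge in the window, which is exactly the memory cost that a bounded-memory sampling approach is designed to avoid.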

Notes

  1. Depending on whether binary or weighted counting semantics is used, we either count only the distinct edges or include the duplicate edges in the count.

  2. On page 3, Sect. 3.1 of [19].

  3. Note that a bidirectional edge means a single edge marked as bidirectional rather than two edges with opposite directions. In the latter case, the motif includes 4 edges rather than 3.

  4. http://snap.stanford.edu/data/sx-stackoverflow.html

  5. https://webscope.sandbox.yahoo.com/catalog.php?datatype=g

  6. http://konect.cc/networks/wiki_talk_en/

  7. http://konect.cc/

References

  1. Berry, J.W., Hendrickson, B., LaViolette, R.A., Phillips, C.A.: Tolerating the community detection resolution limit with edge weighting. Phys. Rev. E 83(5), 056119 (2011)

  2. Eckmann, J.-P., Moses, E.: Curvature of co-links uncovers hidden thematic layers in the world wide web. Proc. Natl. Acad. Sci. USA 99(9), 5825–5829 (2002)

  3. Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient algorithms for large-scale local triangle counting. ACM Trans. Know. Dis. Data (TKDD) 4(3), 13 (2010)

  4. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298(5594), 824–827 (2002)

  5. Kang, U., Meeder, B., Papalexakis, E.E., Faloutsos, C.: Heigen: Spectral analysis for billion-scale graphs. IEEE Trans. Know. Data Eng. 26(2), 350–362 (2012)

  6. Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B.Y., Dai, Y.: Uncovering social network sybils in the wild. ACM Trans. Know. Dis. Data (TKDD) 8(1), 1–29 (2014)

  7. Li, Z., Lu, Y., Zhang, W.-P., Li, R.-H., Guo, J., Huang, X., Mao, R.: Discovering hierarchical subgraphs of k-core-truss. Data Sci. Eng. 3(2), 136–149 (2018)

  8. Pavan, A., Tangwongsan, K., Tirthapura, S., Wu, K.-L.: Counting and sampling triangles from a graph stream. Proc. VLDB Endowment 6(14), 1870–1881 (2013)

  9. Ahmed, N.K., Duffield, N., Neville, J., Kompella, R.: Graph sample and hold: A framework for big-graph analytics. In: ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2014)

  10. Wang, P., Qi, Y., Sun, Y., Zhang, X., Guan, X.: Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage. Proc. VLDB Endowment 11(2), 162–175 (2017)

  11. Boykin, P.O., Roychowdhury, V.P.: Leveraging social networks to fight spam. Computer 38(4), 61–68 (2005)

  12. Datar, M., Gionis, A., Indyk, P., Motwani, R.: Maintaining stream statistics over sliding windows. SIAM J. Comput. 31(6), 1794–1813 (2002)

  13. Li, Y., Zou, L., Özsu, M.T., Zhao, D.: Time constrained continuous subgraph search over streaming graphs. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1082–1093. IEEE, (2019)

  14. Crouch, M.S., McGregor, A., Stubbs, D.: Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pages 337–348. Springer, (2013)

  15. Qiu, X., Cen, W., Qian, Z., Peng, Y., Zhang, Y., Lin, X., Zhou, J.: Real-time constrained cycle detection in large dynamic graphs. Proc. VLDB Endowment 11(12), 1876–1888 (2018)

  16. Jung, M., Lim, Y., Lee, S., Kang, U.: Furl: fixed-memory and uncertainty reducing local triangle counting for multigraph streams. Data Min. Know. Dis. 33(5), 1225–1253 (2019)

  17. De Stefani, L., Epasto, A., Riondato, M., Upfal, E.: Triest: counting local and global triangles in fully dynamic streams with fixed memory size. ACM Trans. Know. Dis. Data (TKDD) 11(4), 1–50 (2017)

  18. Shin, K., Oh, S., Kim, J., Hooi, B., Faloutsos, C.: Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Trans. Know. Dis. Data (TKDD) 14(2), 1–39 (2020)

  19. Gemulla, R., Lehner, W.: Sampling time-based sliding windows in bounded space. In: ACM SIGMOD International Conference on Management of Data, (2008)

  20. Flajolet, P., Fusy, É., Gandouet, O., Meunier, F.: Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Discrete Mathematics and Theoretical Computer Science, pages 137–156. Discrete Mathematics and Theoretical Computer Science, (2007)

  21. Ting, D.: Streamed approximate counting of distinct elements: Beating optimal batch methods. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 442–451 (2014)

  22. Lee, D., Shin, K., Faloutsos, C.: Temporal locality-aware sampling for accurate triangle counting in real graph streams. The VLDB Journal, pages 1–25 (2020)

  23. Source code of SWTC and the baseline method. https://github.com/StreamingTriangleCounting/TriangleCounting.git

  24. Babcock, B., Datar, M., Motwani, R.: Sampling from a moving window over streaming data. In Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, pages 633–634. Society for Industrial and Applied Mathematics (2002)

  25. Seidel, R., Aragon, C.R.: Randomized search trees. Algorithmica 16(4), 464–497 (1996)

  26. Kac, M.: Statistical Independence in Probability, Analysis and Number Theory. Courier Dover Publications, New York (2018)

  27. Duffield, N.G., Grossglauser, M.: Trajectory sampling for direct traffic observation. IEEE/ACM Trans. Netw. 9(3), 280–292 (2001)

  28. Duffield, N.: Sampling for passive internet measurement: A review. Stat. Sci. 19(3), 472–498 (2004)

  29. Aggarwal, C.C., Zhao, Y., Yu, P.S.: Outlier detection in graph streams. In 2011 IEEE 27th international conference on data engineering, pages 399–409. IEEE, (2011)

  30. Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Zhang, N., Antony, S., Liu, H., Murthy, R.: Hive - a petabyte scale data warehouse using Hadoop. In: 2010 IEEE 26th international conference on data engineering (ICDE 2010), pages 996–1005. IEEE (2010)

  31. Molina, M., Niccolini, S., Duffield, N.G.: A comparative experimental study of hash functions applied to packet sampling. In: Proc. of International Teletraffic Congress (ITC) (2005)

  32. Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)

  33. Slota, G.M., Madduri, K.: Complex network analysis using parallel approximate motif counting. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pages 405–414. IEEE, (2014)

  34. Bressan, M., Chierichetti, F., Kumar, R., Leucci, S., Panconesi, A.: Motif counting beyond five nodes. ACM Trans. Know. Dis. Data (TKDD) 12(4), 1–25 (2018)

  35. Bobhash function. http://burtleburtle.net/bob/hash/doobs.html

  36. Murmurhash function. Published by Austin Appleby at https://github.com/aappleby/smhasher

  37. Sedgewick, R.: Algorithms in C. Pearson Education, (2001)

  38. Aphash and collection of other hash functions. http://www.partow.net/programming/hashfunctions/#RSHashFunction

  39. Alon, N., Yuster, R., Zwick, U.: Finding and counting given length cycles. Algorithmica 17(3), 209–223 (1997)

  40. Arifuzzaman, S., Khan, M., Marathe, M.: Patric: a parallel algorithm for counting triangles in massive networks. In ACM International Conference on Information & Knowledge Management, (2013)

  41. Hu, X., Tao, Y., Chung, C.-W.: Massive graph triangulation. In: ACM SIGMOD International Conference on Management of Data, (2013)

  42. Kim, J., Han, W.-S., Lee, S., Park, K., Yu, H.: Opt: a new framework for overlapped and parallel triangulation in large-scale graphs. (2014)

  43. Park, H.-M., Myaeng, S.-H., Kang, U.: Pte: Enumerating trillion triangles on distributed systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1115–1124 (2016)

  44. Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, pages 623–632. Society for Industrial and Applied Mathematics (2002)

  45. Buriol, L.S., Frahling, G., Leonardi, S., Marchetti-Spaccamela, A., Sohler, C.: Counting triangles in data streams. In ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, (2006)

  46. Jowhari, H., Ghodsi, M.: New streaming algorithms for counting triangles in graphs. In International Computing and Combinatorics Conference, pages 710–716. Springer, (2005)

  47. Lim, Y., Kang, U.: Mascot: Memory-efficient and accurate sampling for counting local triangles in graph streams. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 685–694. ACM (2015)

  48. Jha, M., Seshadhri, C., Pinar, A.: A space efficient streaming algorithm for triangle counting using the birthday paradox. (2013)

  49. Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: counting triangles in massive graphs with a coin. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 837–846, (2009)

  50. Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Soft. (TOMS) 11(1), 37–57 (1985)

  51. Gemulla, R., Lehner, W., Haas, P.J.: Maintaining bounded-size sample synopses of evolving datasets. The VLDB J. 17(2), 173–201 (2008)

  52. Braverman, V., Ostrovsky, R., Zaniolo, C.: Optimal sampling from sliding windows. In Twenty-eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, (2009)

  53. Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 77–86, (2010)

  54. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: Densification and shrinking diameters. ACM Trans. Know. Dis. Data (TKDD), 1(1):2–es, (2007)

Author information

Corresponding author

Correspondence to Lei Zou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by NSFC under Grant 61932001 and U20A20174.

Appendix

1.1 Influence of duplication ratio

To evaluate the influence of the duplication ratio of the streaming graph, we carry out experiments on a synthetic dataset, FF. This dataset is generated by the Forest Fire model [54] and includes 18,311,282 edges and 1M nodes. It originally contains no duplicate edges, so we generate edge frequencies following a power-law distribution and vary the duplication ratio. The timestamps in this dataset are randomly generated. We formally define the duplication ratio as \(\frac{\text{total number of edges}}{\text{number of distinct edges}}-1\). The window length is set to 3M and the sample rate is set to \(4\%\). We use binary counting semantics in this experiment. The memory usage and the valid sample size do not change with the duplication ratio. The experimental results in Fig. 17 show that both MAPE and max error decrease as the duplication ratio increases. This is because, with more duplicate edges, the number of distinct edges in the sliding window decreases, so the sample size becomes relatively large.
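
The definition of the duplication ratio can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, edges are treated as undirected pairs, and `expected_distinct_in_window` further assumes that a window has the same duplication ratio as the whole stream, which can only hold on average.

```python
def duplication_ratio(edges):
    """(total number of edges) / (number of distinct edges) - 1."""
    edges = list(edges)
    if not edges:
        raise ValueError("edge list is empty")
    # Assumption: edges are undirected, so (u, v) and (v, u) are the same edge.
    distinct = {(min(u, v), max(u, v)) for u, v in edges}
    return len(edges) / len(distinct) - 1


def expected_distinct_in_window(window_length, ratio):
    """Rough number of distinct edges in a window of `window_length` edges,
    assuming the window shares the stream-wide duplication ratio."""
    return window_length / (1 + ratio)


edges = [(1, 2), (2, 1), (2, 3), (1, 2)]
print(duplication_ratio(edges))                     # 4 edges, 2 distinct: 4 / 2 - 1 = 1.0
print(expected_distinct_in_window(3_000_000, 1.0))  # 1500000.0
```

Under these assumptions, a 3M-edge window with duplication ratio 1 holds about 1.5M distinct edges, so a sample of fixed size covers a larger fraction of them. This is consistent with the explanation given for the decreasing error.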

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gou, X., Zou, L. Sliding window-based approximate triangle counting with bounded memory usage. The VLDB Journal 32, 1087–1110 (2023). https://doi.org/10.1007/s00778-023-00783-3

  • DOI: https://doi.org/10.1007/s00778-023-00783-3
