
On Counting Triangles Through Edge Sampling in Large Dynamic Graphs

  • Guyue Han
  • Harish Sethu
Chapter
Part of the Lecture Notes in Social Networks book series (LNSN)

Abstract

Traditional frameworks for dynamic graphs have relied on processing only the stream of edges added to or deleted from an evolving graph, but not any additional related information such as the degrees or neighbor lists of nodes incident to the edges. In this chapter, we propose a new edge sampling framework for big-graph analytics in dynamic graphs that enhances the traditional model by enabling the use of additional related information. To demonstrate the advantages of this framework, we present a new sampling algorithm, called Edge Sample and Discard (esd). It generates an unbiased estimate of the total number of triangles, which can be continuously updated in response to both edge additions and deletions. We provide a comparative analysis of the accuracy and computational complexity of esd under the new framework against two state-of-the-art algorithms operating under the traditional framework. Experiments on real graphs show that, with the help of the neighborhood information of the sampled edges, our algorithm achieves substantially better accuracy than these two algorithms. We also characterize how properties of the graph affect the performance of our algorithm through tests on several Barabási–Albert graphs.
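The abstract does not spell out the update rules of esd. The Python sketch below is offered only as a hedged illustration of why neighborhood information helps. It shows one simple way to obtain an unbiased, continuously updated triangle estimate from sampled edges when the neighbor lists of their endpoints can be queried. It is not the authors' algorithm. The class name `SampledTriangleEstimator`, the sampling probability `p`, and the in-memory adjacency sets standing in for the "additional related information" are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class SampledTriangleEstimator:
    """Hypothetical sketch of an edge-sampling triangle estimator for a fully
    dynamic stream. Assumes simple undirected edges, no self-loops, no
    duplicate insertions, and deletions only of edges currently present."""

    def __init__(self, p, seed=None):
        if not 0 < p <= 1:
            raise ValueError("sampling probability p must be in (0, 1]")
        self.p = p
        self.rng = random.Random(seed)
        # Stand-in for the neighbor lists of the current graph. Under the
        # framework described above, this information is assumed to be
        # queryable from wherever the graph is stored; keeping it here is
        # only for the sake of a self-contained example.
        self.adj = defaultdict(set)
        self.estimate = 0.0

    def _common_neighbors(self, u, v):
        # Number of triangles that contain edge (u, v) in the current graph.
        a, b = self.adj[u], self.adj[v]
        if len(a) > len(b):
            a, b = b, a  # iterate over the smaller neighbor set
        return sum(1 for w in a if w in b)

    def add_edge(self, u, v):
        # Inserting (u, v) creates exactly |N(u) & N(v)| new triangles.
        # Sampling the edge with probability p and scaling by 1/p gives an
        # unbiased estimate of that change; the edge itself is not retained
        # as part of any sample.
        if self.rng.random() < self.p:
            self.estimate += self._common_neighbors(u, v) / self.p
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete_edge(self, u, v):
        # Deleting (u, v) destroys exactly |N(u) & N(v)| triangles.
        if self.rng.random() < self.p:
            self.estimate -= self._common_neighbors(u, v) / self.p
        self.adj[u].discard(v)
        self.adj[v].discard(u)


if __name__ == "__main__":
    # With p = 1 every edge is sampled, so the estimate is exact.
    est = SampledTriangleEstimator(p=1.0)
    for u, v in [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]:
        est.add_edge(u, v)
    print(est.estimate)  # K4 has 4 triangles: prints 4.0
```

Each addition or deletion changes the true triangle count by exactly the number of common neighbors of the edge's endpoints at that moment. The estimate is therefore a sum of independently sampled, inverse-probability-weighted increments, and its expectation equals the true count at every point in the stream. Smaller `p` reduces the number of neighbor-list queries at the cost of higher variance.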

Notes

Acknowledgements

This work was partially supported by the National Science Foundation Award #1250786.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of ECE, Drexel University, Philadelphia, USA
