Tracking triadic cardinality distributions for burst detection in high-speed graph streams

Abstract

In everyday life, we often observe unusually frequent interactions among people before or during important events, e.g., people send/receive more greetings to/from their friends on holidays than regular days. We also observe that some videos or hashtags suddenly go viral through people’s sharing on online social networks (OSNs). Do these seemingly different phenomena share a common structure? All these phenomena are associated with the sudden surges of node interactions in networks, which we call “bursts” in this work. We uncover that, in many scenarios, the emergence of a burst is accompanied with the formation of triangles in networks. This finding motivates us to propose a new and robust method for burst detection on an OSN. We first introduce a new measure, i.e., “triadic cardinality distribution,” corresponding to the fractions of nodes with different numbers of triangles, i.e., triadic cardinalities, in a network. We show that this distribution not only changes when a burst occurs, but it also has a robustness property that it is immunized against common spamming social-bot attacks. Hence, by tracking triadic cardinality distributions, we can more reliably detect bursts than simply counting node interactions on an OSN. To avoid handling massive activity data generated by OSN users during the triadic tracking, we design an efficient “sample-estimate” framework to provide maximum likelihood estimate of the triadic cardinality distribution. We propose several sampling methods and provide insights into their performance difference through both theoretical analysis and empirical experiments on real-world networks.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15

Notes

  1. 1.

    We assume two Enron users are friends if they have at least one email communication in the dataset.

References

  1. 1.

    Ahmed NK, Duffield N, Neville J, Kompella R (2014) Graph sample and hold: a framework for big-graph analytics. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining

  2. 2.

    Arifuzzaman S, Khan M, Vishnu M (2019) Fast parallel algorithms for counting and listing triangles in big graphs. ACM Trans Knowl Discov Data 14(1):1–34

    Article  Google Scholar 

  3. 3.

    Backstrom L, Bakshy E, Kleinberg J, Lento TM, Rosenn I (2011) Center of attention: How Facebook users allocate attention across friends. In: Proceedings of the 5th international AAAI conference on weblogs and social media

  4. 4.

    Barabasi AL (2005) The origin of bursts and heavy tails in human dynamics. Nature 435:207–211

    Article  Google Scholar 

  5. 5.

    Becchetti L, Boldi P, Castillo C, Gionis A (2008) Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining

  6. 6.

    Beta-binomial distribution. https://en.wikipedia.org/wiki/Beta-binomial_distribution (Retrieved June 2020)

  7. 7.

    Beutel A, Xu W, Guruswami V, Palow C, Faloutsos C (2013) CopyCatch: stopping group attacks by spotting lockstep behavior in social networks. In: Proceedings of the 22nd international world wide web conference

  8. 8.

    Boshmaf Y, Muslukhov I, Beznosov K, Ripeanu M (2011) The socialbot network: When bots socialize for fame and money. In: Proceedings of the 27th annual computer security applications conference

  9. 9.

    Budak C, Agrawal D, Abbadi AE (2011) Structural trend analysis for online social networks. In: Proceedings of the VLDB endowment

  10. 10.

    Cadena J, Vullikanti A (2018) Mining heavy temporal subgraphs: Fast algorithms and applications. In: Proceedings of the 32nd AAAI conference on artificial intelligence

  11. 11.

    Chierichetti F, Kleinberg J, Kumar R, Mahdian M, Pandey S (2014) Event detection via communication pattern analysis. In: Proceedings of the 8th international AAAI conference on weblogs and social media

  12. 12.

    Chu Z, Gianvecchio S, Wang H, Jajodia S (2010) Who is tweeting on Twitter: Human, bot, or cyborg? In: Proceedings of the 26th annual computer security applications conference

  13. 13.

    Costa AF, Yamaguchi Y, Traina AJM Jr, Faloutsos CT (2017) Modeling temporal activity to detect anomalous behavior in social media. ACM Trans Knowl Discov Data 11(4):1–23

    Article  Google Scholar 

  14. 14.

    Cramér-Rao bound. https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound (Retrived June 2020)

  15. 15.

    Duffield N, Lund C, Thorup M (2003) Estimating flow distributions from sampled flow statistics. In: Proceedings of the ACM special interest group on data communication

  16. 16.

    Durand M, Flajolet P (2003) Loglog counting of large cardinalities. In: Proceedings of the 11th annual European symposium on algorithms

  17. 17.

    Eftekhar M, Koudas N, Ganjali Y (2013) Bursty subgraphs in social networks. In: Proceedings of the 6th international ACM conference on web search and data mining

  18. 18.

    Eswaran D, Faloutsos C, Guha S, Mishra N (2018) SpotLight: Detecting anomalies in streaming graphs. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining

  19. 19.

    Gjoka M, Kurant M, Butts CT, Markopoulou A (2011) Practical recommendations on crawling online social networks. IEEE J Sel Areas Commun 29(9):1872–1892

    Article  Google Scholar 

  20. 20.

    Gorman JD, Hero AO (1990) Lower bounds for parametric estimation with constraints. IEEE Transit Inf Theory 26(6):1285–1301

    MathSciNet  Article  Google Scholar 

  21. 21.

    Grier C, Thomas K, Paxson V, Zhang M (2010) @spam: The underground on 140 characters or less. In: Proceedings of the ACM SIGSAC conference on computer and communications security

  22. 22.

    Harvey M (2020) Fans mourn artist for whom it didn’t matter if you were black or white. http://wayback.archive.org/web/20100531165925/http://www.timesonline.co.uk/tol/news/world/us_and_americas/article6580897.ece (Retrieved June 2020)

  23. 23.

    Jha M, Seshadhri C, Pinar A (2013) A space efficient streaming algorithm for triangle counting using the birthday paradox. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining

  24. 24.

    Kleinberg J (2002) Bursty and hierarchical structure in streams. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining

  25. 25.

    Klimt B, Yang Y (2004) The Enron corpus: A new dataset for email classification research. In: Proceeding of the European conference on machine learning and principles and practice of knowledge discovery in databases

  26. 26.

    Kossinets G, Watts DJ (2006) Empirical analysis of an evolving social network. Science 311(5757):88–90

    MathSciNet  Article  Google Scholar 

  27. 27.

    Krikorian R (2020) New tweets per second record, and how! https://blog.twitter.com/engineering/en_us/a/2013/new-tweets-per-second-record-and-how.html (Retrived June 2020)

  28. 28.

    Leskovec J, McGlohon M, Faloutsos C, Glance N, Hurst M (2007) Cascading behavior in large blog graphs. In: Proceedings of the 7th SIAM international conference on data mining

  29. 29.

    Lim Y, Kang U (2015) MASCOT: Memory-efficient and accurate sampling for counting local triangles in graph streams. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining. Sydney, Australia

  30. 30.

    Liu X, Ge T, Wu Y (2019) Finding densest lasting subgraphs in dynamic graphs: a stochastic approach. In: Proceedings of the 35th IEEE international conference on data engineering

  31. 31.

    London riots: more than 2,000 people arrested over disorder. http://www.mirror.co.uk/news/uk-news/london-riots-more-than-2000-people-185548 (Retrived June 2020)

  32. 32.

    Manzoor E, Milajerdi, SM, Akoglu L (2016) Fast memory-efficient anomaly detection in streaming heterogeneous graphs. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. San Francisco, California, USA

  33. 33.

    Mathioudakis M, Bansal N, Koudas N (2010) Identifying, attributing and describing spatial bursts. In: Proceedings of the VLDB endowment

  34. 34.

    Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827

    Article  Google Scholar 

  35. 35.

    Pagh R, Tsourakakis CE (2012) Colorful triangle counting and a MapReduce implementation. J Inf Process Lett 112(7):277–281

    MathSciNet  Article  Google Scholar 

  36. 36.

    Paranjape A, Benson AR, Leskovec J (2017)Motifs in temporal networks. In: Proceedings of the 10th ACM international conference on web search and data mining

  37. 37.

    Parikh N, Sundaresan N (2008)Scalable and near real-time burst detection from ecommerce queries. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining

  38. 38.

    Pavan A, Tangwongsan K, Tirthapura S, Wu KL (2013)Counting and sampling triangles from a graph stream. In: Proceedings of the VLDB endowment

  39. 39.

    Ribeiro B, Towsley D, Ye T, Bolot JC (2006) Fisher information of sampled packets: An application to flow size estimation. In: Proceedings of the 6th ACM SIGCOMM conference on internet measurement

  40. 40.

    Rodrigues T, Benevenuto F, Cha M, Gummadi KP, Almeida V (2011) On word-of-mouth based discovery of the web. In: Proceedings of the 11th ACM SIGCOMM conference on internet measurement

  41. 41.

    Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: Real-time event detection by social sensors. In: Proceedings of the 19th international world wide web conference

  42. 42.

    Sanei-Mehri SV, Sarüyüce AE, Tirthapura S (2018) Butterfly counting in bipartite networks. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining

  43. 43.

    Sarüyüce AE, Pinar A (2018) Peeling bipartite networks for dense subgraph discovery. In: Proceedings of the 11th international ACM conference on web search and data mining, pp. 504–512

  44. 44.

    Seshadhri C, Pinar A, Kolda TG (2013) Triadic measures on graphs: The power of wedge sampling. In: Proceedings of the 13th SIAM international conference on data mining

  45. 45.

    Shiels M (2020) Web slows after Jackson’s death. http://news.bbc.co.uk/2/hi/technology/8120324.stm (Retrieved June 2020)

  46. 46.

    Shin K, Oh S, Kim J, Hooi B, Faloutsos C (2019) Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Trans Knowl Discov Data 14(2):1–39

    Article  Google Scholar 

  47. 47.

    Stefani LD, Epasto A, Riondato M, Upfal E (2016) TRIEST: Counting local and global triangles in fully-dynamic streams with fixed memory size. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining

  48. 48.

    Stringhini G, Kruegel C, Vigna G (2010) Detecting spammers on social networks. In: Proceedings of the 26th annual computer security applications conference

  49. 49.

    Takahashi T, Tomioka R, Yamanishi K (2011) Discovering emerging topics in social streams via link anomaly detection. In: Proceedings of the IEEE international conference on data mining

  50. 50.

    Teng X, Yan M, Ertugrul AM, Lin YR (2018) Deep into hypersphere: Robust and unsupervised anomaly discovery in dynamic networks. In: Proceedings of the 27th international joint conference on artificial intelligence

  51. 51.

    Thomas K, Grier C, Paxson V, Song D (2011) Suspended accounts in retrospect: an analysis of Twitter spam. In: Proceedings of the 11th ACM SIGCOMM conference on internet measurement

  52. 52.

    Trees HLV (2001) Detection, estimation, and modulation theory. Part I. Wiley, Hoboken

    Google Scholar 

  53. 53.

    Tsotsis A (2020) First credible reports of Bin Laden’s death spread like wildfire on Twitter. https://techcrunch.com/2011/05/01/news-of-osama-bin-ladens-death-spreads-like-wildfire-on-twitter (Retrived June 2020)

  54. 54.

    Tsourakakis CE, Kang U, Miller GL, Faloutsos C (2009) DOULION: Counting triangles in massive graphs with a coin. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining

  55. 55.

    Tune P, Veitch D (2011) Fisher information in flow size distribution estimation. IEEE Trans Inf Theory 57(10):7011–7035

    MathSciNet  Article  Google Scholar 

  56. 56.

    Turkett W, Fulp E, Lever C, Edward Allan J (2011) Graph mining of motif profiles for computer network activity inference. In: Proceedings of the 7th workshop on mining and learning with graphs

  57. 57.

    Veitch D, Tune P (2015) Optimal sampling for the flow size distribution. IEEE Trans Inf Theory 61(6):3075–3099

    Article  Google Scholar 

  58. 58.

    Wang P, Guan X, Zhao J, Tao J, Qin T (2014) A new sketch method for measuring host connection degree distribution. IEEE Trans Inf Forensics Secur 9(6):948–960

    Article  Google Scholar 

  59. 59.

    Wang P, Lui JC, Ribeiro B, Towsley D, Zhao J, Guan X (2014) Efficiently estimating motif statistics of large networks. ACM Trans Knowl Discov Data 9(2):1–27

    Article  Google Scholar 

  60. 60.

    Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442

    Article  Google Scholar 

  61. 61.

    Wu B, Yi K, Li Z (2016) Counting triangles in large graphs by random sampling. IEEE Trans Knowl Data Eng 28(8):2013–2026

    Article  Google Scholar 

  62. 62.

    Yi J (2005) Detecting buzz from time-sequenced document streams. In: Proceedings of the IEEE international conference on e-technology, e-commerce and e-service

  63. 63.

    Yoon M, Hooi B, Shin K, Faloutsos C (2019) Fast and accurate anomaly detection in dynamic graphs with a two-pronged approach. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining

  64. 64.

    Yu C, Zelterman D (2002) Sums of dependent Bernoulli random variables and disease clustering. Stat Probab Lett 57(1):363–373

    MathSciNet  Article  Google Scholar 

  65. 65.

    Yu M, Song C, Gu J, Liu M (2019) Distributed triangle counting algorithms in simple graph stream. In: Proceedings of the 25th IEEE international conference on parallel and distributed system

  66. 66.

    Zhao J, Lui JC, Towsley D, Wang P, Guan X (2015) Tracking triadic cardinality distributions for burst detection in social activity streams. In: ACM conference on online social networks

  67. 67.

    Zhu Y, Shasha D (2003) Efficient elastic burst detection in data streams. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining

Download references

Acknowledgements

This work was supported in part by the National Key R&D Program of China (2016YFE0206700), Shenzhen Basic Research Grant (JCYJ20170816100819428), Sichuan Science and Technology Program (2019YFG0407), and the National Natural Science Foundation of China (61922067, U1736205, 61902305).

Author information

Affiliations

Authors

Corresponding author

Correspondence to Pinghui Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Zhao, J., Wang, P., Chen, Z. et al. Tracking triadic cardinality distributions for burst detection in high-speed graph streams. Knowl Inf Syst 63, 939–969 (2021). https://doi.org/10.1007/s10115-021-01543-x

Download citation

Keywords

  • Burst detection
  • Graph sampling
  • Streaming algorithms
  • Online social network
  • Estimation theory