Advertisement

Mobile Networks and Applications

, Volume 24, Issue 5, pp 1732–1741 | Cite as

Efficient Identification of TOP-K Heavy Hitters over Sliding Windows

  • Haina TangEmail author
  • Yulei Wu
  • Tong Li
  • Chunjing Han
  • Jingguo Ge
  • Xiangpeng Zhao
Article
  • 141 Downloads

Abstract

Due to the increasing volume of network traffic and growing complexity of network environment, rapid identification of heavy hitters is quite challenging. To deal with the massive data streams in real-time, accurate and scalable solution is required. The traditional method to keep an individual counter for each host in the whole data streams is very resource-consuming. This paper presents a new data structure called FCM and its associated algorithms. FCM combines the count-min sketch with the stream-summary structure simultaneously for efficient TOP-K heavy hitter identification in one pass. The key point of this algorithm is that it introduces a novel filter-and-jump mechanism. Given that the Internet traffic has the property of being heavy-tailed and hosts of low frequencies account for the majority of the IP addresses, FCM periodically filters the mice from input streams to efficiently improve the accuracy of TOP-K heavy hitter identification. On the other hand, considering that abnormal events are always time sensitive, our algorithm works by adjusting its measurement window to the newly arrived elements in the data streams automatically. Our experimental results demonstrate that the performance of FCM is superior to the previous related algorithm. Additionally this solution has a good prospect of application in advanced network environment.

Keywords

Heavy hitters Count-min sketch Space saving Sliding window 

Notes

Acknowledgements

This work was supported in part by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010306 and the National Natural Science Foundation of China under Grant No. 61303241. Furthermore, this work is done also with the support of Chinese Academy of Sciences project under Grant No. CXJJ-16 M119.

References

  1. 1.
    Zhao Q, Kumar A, Xu J (2005) Joint data streaming and sampling techniques for detection of super sources and destinations. Proc 5th ACM SIGCOMM Conf Internet Measure: 7–7Google Scholar
  2. 2.
    Kompella R, Singh S, Varghese G (2004) On scalable attack detection in the network. Proc 4th ACM SIGCOMM Conf Internet Measure: 187–200Google Scholar
  3. 3.
    Akamai (2016) Akamai Q1 2016 State of the Internet Security Report [Online]. https://content.akamai.com/PG6292-SOTI-Security.html
  4. 4.
    Shapsough S, Qatan F, Aburukba R, Aloul F, Ali A (2015) Smart grid cyber security: challenges and solutions. Int Conf Smart Grid Clean Energy Technol (ICSGCE): 170–175Google Scholar
  5. 5.
    Yao Y, Xiong S, Qi H, Liu Y, Tolbert L, Cao Q (2014) Efficient histogram estimation for smart grid data processing with the Loglog-bloom-filter. IEEE Trans Smart Grid 6(1):199–208CrossRefGoogle Scholar
  6. 6.
    Procopiou A, Komninos N (2015) Current and future threats framework in smart grid domain. IEEE Int Conf Cyber Technol Auto Contrl Intell Syst(CYBER): 1852–1857Google Scholar
  7. 7.
    Homem N, Carvalho J (December 2010) Finding top- k elements in data streams. Inf Sci 180(24):4958–4974CrossRefGoogle Scholar
  8. 8.
    Roesch M (1999) Snort–lightweight intrusion detection for networks. Proc USENIX LISA 1999:229–238Google Scholar
  9. 9.
    Plonka D (2000) FlowScan: a network traffic flow reporting and visualization tool. Proc USENIX LISA 2000:305–317Google Scholar
  10. 10.
    Estan C, Varghese G, Fiskin M (2003) Bitmap algorithms for counting active flows on high speed links. Proc 3rd ACM SIGCOMM Conf Internet Measure: 153–166Google Scholar
  11. 11.
    Wang P, Guan X, Gong W, Towsley D (2011) A new virtual indexing method for measuring host connection degrees. INFOCOM 2011:156–160Google Scholar
  12. 12.
    S. Venkataraman, D. Song, P. Gibbons, and A. Blum, (2005) New streaming algorithms for fast detection of Superspreaders. Proc Netwk Distributed Syst Security Sym (NDSS): 149–166Google Scholar
  13. 13.
    Bandi N, Agrawal D, El A (2007) Fast Algorithms for heavy distinct hitters using associative memories. Int Conf Distrib Comput Syst: 6–6Google Scholar
  14. 14.
    Dimitropoulos X, Hurley P, Kind A (January 2008) Probabilistic lossy counting: an efficient algorithm for finding heavy hitters. ACM SIGCOMM Comput Commun Rev 38(1):5–5CrossRefGoogle Scholar
  15. 15.
    Karp R, Shenker S, Papadimitriou C (March 2003) A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS) 28(1):51–55CrossRefGoogle Scholar
  16. 16.
    Metwally A, Agrawal D, El A (2005) Efficient computation of frequent and top-k elements in data streams. Int Conf Database Theory. Springer Berlin Heidelberg: 398–412CrossRefGoogle Scholar
  17. 17.
    M. Charikar, K. Chen, and M. Farach-Colton, (2002) Finding frequent items in data streams. International colloquium on automata, languages, and programming. Springer berlin Heidelberg, pp. 693–703Google Scholar
  18. 18.
    Cormode G (November 2014) Count-min sketch. Encyclopedia Algorithms Springer US 29(1):1–6Google Scholar
  19. 19.
    Huang Q, Lee P (2014) LD-sketch: a distributed sketching design for accurate and scalable anomaly detection in network data streams. Int Conf Comput Commun: 1420–1428Google Scholar
  20. 20.
    Anceaume E, Busnel Y, Rivetti N (2015) Estimating the frequency of data items in massive distributed streams. IEEE Sym Netwrk Cloud Comput Appl (NCCA): 59–66Google Scholar
  21. 21.
    Roy P, Khan A, Alonso G (2016) Augmented sketch: faster and more accurate stream processing. Proc 2016 Int Conf Manag Data: 1449–1463Google Scholar
  22. 22.
    Pitel G, Fouquier G (2015) Count-min-log sketch: approximately counting with approximate counters. 1st Int Sym Web AlgorithmsGoogle Scholar
  23. 23.
    Ben-Basat R, Einziger G, Friedman R, Kassner Y (2016) Heavy hitters in streams and sliding windows. IEEE INFOCOM 2016:1–9zbMATHGoogle Scholar
  24. 24.
    Roy P, Teubner J, Alonso G (2012) Efficient frequent item counting in multi-core hardware. Proc 18th ACM SIGKDD Int Conf Knowledge Discov Data Mining: 1451–1459Google Scholar
  25. 25.
    Das S, Antony S, Agrawal D, El A (2009) Thread cooperation in multicore architectures for frequency counting over multiple data streams. Proc VLDB Endowment 2(1):217–228CrossRefGoogle Scholar
  26. 26.
    Einziger G, Friedman R (2016) Counting with TinyTable: every bit counts!. Proc Int Conf Distrib Comput Network (ICDCN 2016), Article No 27Google Scholar
  27. 27.
    Homem N, Carvalho J (2011) Finding top-k elements in a time-sliding window. Evol Syst 2(1):51–70CrossRefGoogle Scholar
  28. 28.
    Zhang Z, Wang B, Lan J (2015) Identifying elephant flows in internet backbone traffic with bloom filters and LRU. Comput Commun 61:70–78CrossRefGoogle Scholar
  29. 29.
    Cafaro M, Pulimeno M, Epicoco I, Aloisio G (2016) Mining frequent items in the time fading model. Inf Sci 370:221–238CrossRefGoogle Scholar
  30. 30.
    Cormode G, Hadjieleftheriou M (2010) Methods for finding frequent items in data streams. VLDB J 19(1):3–20CrossRefGoogle Scholar
  31. 31.
    Cormode G, Hadjieleftheriou M (2008) Finding frequent items in data streams. Proc VLDB Endowment 1(2):1530–1541CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Haina Tang
    • 1
    Email author
  • Yulei Wu
    • 2
  • Tong Li
    • 3
  • Chunjing Han
    • 3
  • Jingguo Ge
    • 3
  • Xiangpeng Zhao
    • 1
  1. 1.School of Engineering ScienceUniversity of Chinese Academy of SciencesBeijingChina
  2. 2.School of Engineering, Mathematics and Physical SciencesUniversity of ExeterExeterUK
  3. 3.Institute of Information EngineeringChinese Academy of ScienceBeijingChina

Personalised recommendations