Skip to main content
Log in

Efficient Identification of TOP-K Heavy Hitters over Sliding Windows

  • Published:
Mobile Networks and Applications Aims and scope Submit manuscript

Abstract

Due to the increasing volume of network traffic and growing complexity of network environment, rapid identification of heavy hitters is quite challenging. To deal with the massive data streams in real-time, accurate and scalable solution is required. The traditional method to keep an individual counter for each host in the whole data streams is very resource-consuming. This paper presents a new data structure called FCM and its associated algorithms. FCM combines the count-min sketch with the stream-summary structure simultaneously for efficient TOP-K heavy hitter identification in one pass. The key point of this algorithm is that it introduces a novel filter-and-jump mechanism. Given that the Internet traffic has the property of being heavy-tailed and hosts of low frequencies account for the majority of the IP addresses, FCM periodically filters the mice from input streams to efficiently improve the accuracy of TOP-K heavy hitter identification. On the other hand, considering that abnormal events are always time sensitive, our algorithm works by adjusting its measurement window to the newly arrived elements in the data streams automatically. Our experimental results demonstrate that the performance of FCM is superior to the previous related algorithm. Additionally this solution has a good prospect of application in advanced network environment.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

References

  1. Zhao Q, Kumar A, Xu J (2005) Joint data streaming and sampling techniques for detection of super sources and destinations. Proc 5th ACM SIGCOMM Conf Internet Measure: 7–7

  2. Kompella R, Singh S, Varghese G (2004) On scalable attack detection in the network. Proc 4th ACM SIGCOMM Conf Internet Measure: 187–200

  3. Akamai (2016) Akamai Q1 2016 State of the Internet Security Report [Online]. https://content.akamai.com/PG6292-SOTI-Security.html

  4. Shapsough S, Qatan F, Aburukba R, Aloul F, Ali A (2015) Smart grid cyber security: challenges and solutions. Int Conf Smart Grid Clean Energy Technol (ICSGCE): 170–175

  5. Yao Y, Xiong S, Qi H, Liu Y, Tolbert L, Cao Q (2014) Efficient histogram estimation for smart grid data processing with the Loglog-bloom-filter. IEEE Trans Smart Grid 6(1):199–208

    Article  Google Scholar 

  6. Procopiou A, Komninos N (2015) Current and future threats framework in smart grid domain. IEEE Int Conf Cyber Technol Auto Contrl Intell Syst(CYBER): 1852–1857

  7. Homem N, Carvalho J (December 2010) Finding top- k elements in data streams. Inf Sci 180(24):4958–4974

    Article  Google Scholar 

  8. Roesch M (1999) Snort–lightweight intrusion detection for networks. Proc USENIX LISA 1999:229–238

    Google Scholar 

  9. Plonka D (2000) FlowScan: a network traffic flow reporting and visualization tool. Proc USENIX LISA 2000:305–317

    Google Scholar 

  10. Estan C, Varghese G, Fiskin M (2003) Bitmap algorithms for counting active flows on high speed links. Proc 3rd ACM SIGCOMM Conf Internet Measure: 153–166

  11. Wang P, Guan X, Gong W, Towsley D (2011) A new virtual indexing method for measuring host connection degrees. INFOCOM 2011:156–160

    Google Scholar 

  12. S. Venkataraman, D. Song, P. Gibbons, and A. Blum, (2005) New streaming algorithms for fast detection of Superspreaders. Proc Netwk Distributed Syst Security Sym (NDSS): 149–166

  13. Bandi N, Agrawal D, El A (2007) Fast Algorithms for heavy distinct hitters using associative memories. Int Conf Distrib Comput Syst: 6–6

  14. Dimitropoulos X, Hurley P, Kind A (January 2008) Probabilistic lossy counting: an efficient algorithm for finding heavy hitters. ACM SIGCOMM Comput Commun Rev 38(1):5–5

    Article  Google Scholar 

  15. Karp R, Shenker S, Papadimitriou C (March 2003) A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS) 28(1):51–55

    Article  Google Scholar 

  16. Metwally A, Agrawal D, El A (2005) Efficient computation of frequent and top-k elements in data streams. Int Conf Database Theory. Springer Berlin Heidelberg: 398–412

    Chapter  Google Scholar 

  17. M. Charikar, K. Chen, and M. Farach-Colton, (2002) Finding frequent items in data streams. International colloquium on automata, languages, and programming. Springer berlin Heidelberg, pp. 693–703

  18. Cormode G (November 2014) Count-min sketch. Encyclopedia Algorithms Springer US 29(1):1–6

    Google Scholar 

  19. Huang Q, Lee P (2014) LD-sketch: a distributed sketching design for accurate and scalable anomaly detection in network data streams. Int Conf Comput Commun: 1420–1428

  20. Anceaume E, Busnel Y, Rivetti N (2015) Estimating the frequency of data items in massive distributed streams. IEEE Sym Netwrk Cloud Comput Appl (NCCA): 59–66

  21. Roy P, Khan A, Alonso G (2016) Augmented sketch: faster and more accurate stream processing. Proc 2016 Int Conf Manag Data: 1449–1463

  22. Pitel G, Fouquier G (2015) Count-min-log sketch: approximately counting with approximate counters. 1st Int Sym Web Algorithms

  23. Ben-Basat R, Einziger G, Friedman R, Kassner Y (2016) Heavy hitters in streams and sliding windows. IEEE INFOCOM 2016:1–9

    MATH  Google Scholar 

  24. Roy P, Teubner J, Alonso G (2012) Efficient frequent item counting in multi-core hardware. Proc 18th ACM SIGKDD Int Conf Knowledge Discov Data Mining: 1451–1459

  25. Das S, Antony S, Agrawal D, El A (2009) Thread cooperation in multicore architectures for frequency counting over multiple data streams. Proc VLDB Endowment 2(1):217–228

    Article  Google Scholar 

  26. Einziger G, Friedman R (2016) Counting with TinyTable: every bit counts!. Proc Int Conf Distrib Comput Network (ICDCN 2016), Article No 27

  27. Homem N, Carvalho J (2011) Finding top-k elements in a time-sliding window. Evol Syst 2(1):51–70

    Article  Google Scholar 

  28. Zhang Z, Wang B, Lan J (2015) Identifying elephant flows in internet backbone traffic with bloom filters and LRU. Comput Commun 61:70–78

    Article  Google Scholar 

  29. Cafaro M, Pulimeno M, Epicoco I, Aloisio G (2016) Mining frequent items in the time fading model. Inf Sci 370:221–238

    Article  Google Scholar 

  30. Cormode G, Hadjieleftheriou M (2010) Methods for finding frequent items in data streams. VLDB J 19(1):3–20

    Article  Google Scholar 

  31. Cormode G, Hadjieleftheriou M (2008) Finding frequent items in data streams. Proc VLDB Endowment 1(2):1530–1541

    Article  Google Scholar 

Download references

Acknowledgements

This work was supported in part by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010306 and the National Natural Science Foundation of China under Grant No. 61303241. Furthermore, this work is done also with the support of Chinese Academy of Sciences project under Grant No. CXJJ-16 M119.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Haina Tang.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tang, H., Wu, Y., Li, T. et al. Efficient Identification of TOP-K Heavy Hitters over Sliding Windows. Mobile Netw Appl 24, 1732–1741 (2019). https://doi.org/10.1007/s11036-018-1051-x

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11036-018-1051-x

Keywords

Navigation