Efficient Identification of TOP-K Heavy Hitters over Sliding Windows

Tang, Haina; Wu, Yulei; Li, Tong; Han, Chunjing; Ge, Jingguo; Zhao, Xiangpeng

doi:10.1007/s11036-018-1051-x


Published: 19 May 2018

Volume 24, pages 1732–1741, (2019)

Mobile Networks and Applications



Abstract

Due to the increasing volume of network traffic and the growing complexity of network environments, rapid identification of heavy hitters is challenging. Dealing with massive data streams in real time requires an accurate and scalable solution. The traditional method of keeping an individual counter for each host across the whole data stream is very resource-consuming. This paper presents a new data structure called FCM and its associated algorithms. FCM combines the count-min sketch with the stream-summary structure for efficient TOP-K heavy hitter identification in one pass. The key point of the algorithm is a novel filter-and-jump mechanism. Given that Internet traffic is heavy-tailed and low-frequency hosts account for the majority of IP addresses, FCM periodically filters the mice from the input stream to improve the accuracy of TOP-K heavy hitter identification. In addition, because abnormal events are always time-sensitive, the algorithm automatically adjusts its measurement window to the newly arrived elements in the data stream. Our experimental results demonstrate that FCM outperforms the previous related algorithm. This solution also has good prospects for application in advanced network environments.
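To make the building blocks named in the abstract concrete, the following is a minimal illustrative sketch of pairing a count-min sketch with a small table of top-k candidates. It is not the authors' FCM: the abstract does not specify the filter-and-jump mechanism or the sliding-window adjustment, so both are omitted, and a plain dictionary stands in for the stream-summary structure. All class names, method names, and parameters (CountMinSketch, TopKTracker, width, depth) are assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator; estimates never fall below the true count."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row, so rows behave like independent hash functions.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += count
        return self.estimate(item)

    def estimate(self, item):
        # The minimum across rows is the counter least inflated by collisions.
        return min(
            self.rows[row][self._index(item, row)] for row in range(self.depth)
        )


class TopKTracker:
    """Keeps at most k candidate heavy hitters, ranked by sketch estimate.

    Assumption: a dictionary replaces the stream-summary structure here.
    """

    def __init__(self, k, width=1024, depth=4):
        self.k = k
        self.sketch = CountMinSketch(width, depth)
        self.candidates = {}  # item -> latest estimated count

    def update(self, item):
        estimate = self.sketch.add(item)
        if item in self.candidates or len(self.candidates) < self.k:
            self.candidates[item] = estimate
            return
        # Evict the weakest candidate only if the new item now outranks it.
        weakest = min(self.candidates, key=self.candidates.get)
        if estimate > self.candidates[weakest]:
            del self.candidates[weakest]
            self.candidates[item] = estimate

    def top_k(self):
        return sorted(self.candidates.items(), key=lambda kv: (-kv[1], kv[0]))
```

Feeding each source address of a packet stream to TopKTracker.update and then calling top_k returns the current candidates in one pass, using memory that depends on width, depth, and k rather than on the number of distinct hosts. Because the sketch can only overestimate, low-frequency hosts that collide with heavy ones may be ranked too high, which is the kind of error the abstract's periodic filtering of mice is meant to reduce.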



Acknowledgements

This work was supported in part by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010306 and the National Natural Science Foundation of China under Grant No. 61303241. It was also supported by a Chinese Academy of Sciences project under Grant No. CXJJ-16 M119.

Author information

Authors and Affiliations

School of Engineering Science, University of Chinese Academy of Sciences, Beijing, 100049, China
Haina Tang & Xiangpeng Zhao
School of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, EX4 4QF, UK
Yulei Wu
Institute of Information Engineering, Chinese Academy of Science, Beijing, 100195, China
Tong Li, Chunjing Han & Jingguo Ge


Corresponding author

Correspondence to Haina Tang.


About this article

Cite this article

Tang, H., Wu, Y., Li, T. et al. Efficient Identification of TOP-K Heavy Hitters over Sliding Windows. Mobile Netw Appl 24, 1732–1741 (2019). https://doi.org/10.1007/s11036-018-1051-x

Issue Date: October 2019
