Science China Information Sciences

, Volume 53, Issue 3, pp 659–676 | Cite as

Identifying heavy hitters in high-speed network monitoring

Research Papers

Abstract

Identifying heavy hitters in a network traffic stream is important for a variety of network applications ranging from traffic engineering to anomaly detection such as detection of denial-of-service attacks. Existing methods generally examine newly arriving items in the stream, perform a small number of operations using a small amount of memory, and still provide guarantees on the identifying accuracy. In high-speed network monitoring, the update speed per item is extremely critical. However, so far as we know, there are no identifying algorithms which can provide constant update time (O(1)) in a weighted data stream. In this paper, we present an algorithm named Weighted Lossy Counting (WLC) which is able to identify heavy hitters in a high-speed weighted data stream with constant update time. WLC employs a novel efficient partially ordered data structure which is able to provide a fast per-item update speed while keeping the memory cost relatively low. We compare WLC with state-of-the-art algorithms for finding heavy hitters in real traffic traces. The experimental results show that WLC performs well in accuracy (recall, precision and average relative error) as other algorithms; moreover it has a much higher update speed at the cost of relatively larger memory space used. A theoretical worst-case memory bound of WLC is also derived in this paper; however, experiments with long real traffic traces show that WLC requires much less space than the theoretical bound in practice.

Keywords

network traffic monitoring heavy hitter weighted data streams 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Sommer R, Feldmann A. NetFlow: Information loss or win? In: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement. New York: ACM, 2002. 173–174CrossRefGoogle Scholar
  2. 2.
    Fred S, Bonald T, Proutiere A, et al. Statistical bandwidth sharing: A study of congestion at flow level. ACM SIGCOMM Comput Commun Rev, 2001, 31: 111–122CrossRefGoogle Scholar
  3. 3.
    Mori T, Kawahara S, Naito S, et al. On the characteristics of Internet traffic variability: Spikes and elephants. In: Proceedings of the 2004 International Symposium on Applications and the Internet. Tokyo: IEEE, 2004. 99–106Google Scholar
  4. 4.
    Papagiannaki K, Taft N, Bhattachayya S, et al. On the feasibility of identifying elephants in Internet backbone traffic. Sprint ATL Technical Report TR01-ATL-110918, 2001Google Scholar
  5. 5.
    Zhang Y, Breslau L, Paxson V, et al. On the characteristics and origins of internet flow rates. In: Proceedings of the 2002 SIGCOMM Conference. New York: ACM, 2002. 309–322Google Scholar
  6. 6.
    Fang W J, Peterson L. Inter-AS traffic patterns and their implications. In: Proceedings of GLOBECOM’99. Rio de Janeireo: IEEE, 1999. 1859–1868Google Scholar
  7. 7.
    Mahajan R, Floyd S, Wetherall D. Controlling high-bandwidth flows at the congested router. In: Proceedings of IEEE ICNP’01. Washington: IEEE, 2001. 192Google Scholar
  8. 8.
    Feldmann A, Greenberg A, Lund C, et al. Deriving traffic demands for operational IP networks: Methodology and experience. IEEE/ACM Trans Netw (ToN), 2001, 9: 265–280CrossRefGoogle Scholar
  9. 9.
    Estan C, Varghese G. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans Comput Syst (ToCS), 2003, 21: 270–313CrossRefGoogle Scholar
  10. 10.
    Wang F Y, Yun X C, Wang X F, et al. Identifying heavy hitters in high-speed network monitoring (in Chinese). J Softw, 2007, 18: 3060–3070Google Scholar
  11. 11.
    Wang H B, Pei Y J, Lin Y, et al. A LRU based algorithm for identifying and measuring large flows (in Chinese). J Electron Inf Tech, 2007, 29: 2487–2492Google Scholar
  12. 12.
    Pei Y J, Wang H B, Cheng S D, et al. A dual-LRU based algorithm for identifying and measuring large flows (in Chinese). Acta Electron Sin, 2009, 37: 684–691Google Scholar
  13. 13.
    Raspall F, Sallent S. Adaptive shared-state sampling. In: Proceedings of the 8th ACM SIGCOMM Workshop on Internet Measurement. New York: ACM, 2008. 271–284Google Scholar
  14. 14.
    Kumar A, Sung M, Xu J, et al. Data streaming algorithms for efficient and accurate estimation of flow size distribution. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems. New York: ACM, 2004. 177–188Google Scholar
  15. 15.
    Metwally A, Agrawal D, El Abbadi A. An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans Database Syst (TODS), 2006, 31: 1095–1133CrossRefGoogle Scholar
  16. 16.
    Charikar M, Chen K, Farach-Colton M. Finding frequent items in data streams. Theor Comput Sci, 2004, 312: 3–15MATHCrossRefMathSciNetGoogle Scholar
  17. 17.
    Cormode G, Muthukrishnan S. An improved data stream summary: The count-min sketch and its applications. J Algorithms, 2005, 55: 58–75MATHCrossRefMathSciNetGoogle Scholar
  18. 18.
    Karp R, Shenker S, Papadimitriou C. A simple algorithm for finding frequent elements in streams and bags. ACM Trans Database Syst (TODS), 2003, 28: 51–55CrossRefGoogle Scholar
  19. 19.
    Manku G, Motwani R. Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases. Hong Kong: VLDB Endowment, 2002. 346–357CrossRefGoogle Scholar
  20. 20.
    Wang W P, Li J Z, Zhang D D, et al. An efficient algorithm for mining approximate frequent item over data streams (in Chinese). J Software, 2007, 18: 884–892MATHCrossRefGoogle Scholar
  21. 21.
    Jin C, Qian W, Sha C, et al. Dynamically maintaining frequent items over a data stream. In: Proceedings of the 12th International Conference on Information and Knowledge Management. New York: ACM, 2003. 287–294Google Scholar
  22. 22.
    Cormode G, Muthukrishnan S. What’s hot and what’s not: Tracking most frequent items dynamically. ACM Trans Database Syst (TODS), 2005, 30: 249–278CrossRefGoogle Scholar
  23. 23.
    Cormode G, Hadjieleftheriou M. Finding frequent items in data streams. In: Proceedings of the VLDB Endowment. Auckland: VLDB Endowment, 2008. 1530–1541Google Scholar
  24. 24.
    Pang Y H, Wang J L, Xu C F. State-of-the-art on frequent pattern mining in data streams (in Chinese). Acta Automat Sin, 2006, 32: 594–602Google Scholar
  25. 25.
    Agilent Technologies. JTC 003: Mixed packet size throughput. J Internet Test Method, 2004, 16–18. Available at http://advanced.comms.agilent.com/n2x/docs/journal/index.htm
  26. 26.
    Cao Z, Wang Z. Flow identification for supporting per-flow queueing. In: Proceedings of 9th International Conference on Computer Communications and Networks. Las Vegas: IEEE, 2000. 88–93Google Scholar
  27. 27.
    Cormode G, Hadjieleftheriou M. Finding frequent items in data streams: Source code. http://www.research.att.com/~marioh/frequent-items/index.html
  28. 28.
    Cormode G, Muthukrishnan S. MassDAL public code bank. http://www.cs.rutgers.edu/~muthu/massdal.html
  29. 29.
    CAIDA anonymized OC48 Internet traces dataset. http://www.caida.org/data/passive/passive oc48 dataset.xml
  30. 30.
    CAIDA anonymized 2008 Internet traces dataset. http://www.caida.org/data/passive/passive 2008 dataset.xml

Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. 1.Research Center of Computer Network and Information Security TechnologyHarbin Institute of TechnologyHarbinChina
  2. 2.Research Center of Information Security, Institute of Computing TechnologyChinese Academy of SciencesBeijingChina

Personalised recommendations