Frequency Estimation of Internet Packet Streams with Limited Space

  • Conference paper
Algorithms — ESA 2002 (ESA 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2461))

Abstract

We consider a router on the Internet analyzing the statistical properties of a TCP/IP packet stream. A fundamental difficulty with measuring traffic behavior on the Internet is that there is simply too much data to be recorded for later analysis, on the order of gigabytes per second. As a result, network routers can collect only relatively few statistics about the data. The central problem addressed here is to use the limited memory of routers to determine essential features of the network traffic stream. A particularly difficult and representative subproblem is to determine the top k categories to which the most packets belong, for a desired value of k and for a given notion of categorization such as the destination IP address.

We present an algorithm that deterministically finds (in particular) all categories having a frequency above 1/(m+1) using m counters, which we prove is best possible in the worst case. We also present a sampling-based algorithm for the case that packet categories follow an arbitrary distribution, but their order over time is permuted uniformly at random. Under this model, our algorithm identifies flows above a frequency threshold of roughly 1/√(nm) with high probability, where m is the number of counters and n is the number of packets observed. This guarantee is not far off from the ideal of identifying all flows (probability 1/n), and we prove that it is best possible up to a logarithmic factor. We show that the algorithm ranks the identified flows according to frequency within any desired constant factor of accuracy.
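The abstract does not spell out the deterministic algorithm, so the following is only a minimal sketch of one well-known counter-based scheme that meets the stated guarantee: with m counters, every category whose frequency exceeds 1/(m+1) is still held in a counter when the stream ends. The function name `frequent_candidates` and the sample stream are hypothetical, chosen for illustration. The counters may also retain categories below the threshold, so the result is a candidate set rather than an exact answer.

```python
def frequent_candidates(stream, m):
    """Return a dict of candidate categories using at most m counters.

    Assumed scheme (not taken from the abstract): increment the counter of
    a monitored category, start a counter if one is free, and otherwise
    decrement every counter, discarding those that reach zero.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < m:
            counters[item] = 1
        else:
            # All m counters are busy: the new packet cancels against
            # one occurrence of each monitored category.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# 11 packets, m = 2 counters: the threshold is 11 / (2 + 1), about 3.67.
packets = ["a"] * 5 + ["b"] * 3 + ["c", "d", "e"]
print(frequent_candidates(packets, m=2))  # {'a': 2}
```

Each decrement step cancels m+1 packets at once (one per counter plus the incoming packet), so there can be at most n/(m+1) such steps. A category with more than n/(m+1) packets therefore cannot be cancelled out completely. The surviving counts are lower bounds, not exact frequencies; a second pass over the stream would be needed to confirm which candidates truly exceed the threshold.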

This research is partially supported by the Natural Sciences and Engineering Research Council of Canada, by the Canada Research Chairs Program, and by the Nippon Telegraph and Telephone Corporation through the NTT-MIT research collaboration.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Demaine, E.D., López-Ortiz, A., Munro, J.I. (2002). Frequency Estimation of Internet Packet Streams with Limited Space. In: Möhring, R., Raman, R. (eds) Algorithms — ESA 2002. ESA 2002. Lecture Notes in Computer Science, vol 2461. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45749-6_33

  • DOI: https://doi.org/10.1007/3-540-45749-6_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44180-9

  • Online ISBN: 978-3-540-45749-7

  • eBook Packages: Springer Book Archive
