Frequency Estimation of Internet Packet Streams with Limited Space

  • Erik D. Demaine
  • Alejandro López-Ortiz
  • J. Ian Munro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2461)


We consider a router on the Internet analyzing the statistical properties of a TCP/IP packet stream. A fundamental difficulty with measuring traffic behavior on the Internet is that there is simply too much data to be recorded for later analysis, on the order of gigabytes per second. As a result, network routers can collect only relatively few statistics about the data. The central problem addressed here is to use the limited memory of routers to determine essential features of the network traffic stream. A particularly difficult and representative subproblem is to determine the top k categories to which the most packets belong, for a desired value of k and for a given notion of categorization such as the destination IP address.

We present an algorithm that deterministically finds (in particular) all categories having a frequency above 1/(m+1) using m counters, which we prove is best possible in the worst case. We also present a sampling-based algorithm for the case in which packet categories follow an arbitrary distribution, but their order over time is permuted uniformly at random. Under this model, our algorithm identifies flows above a frequency threshold of roughly 1/√(nm) with high probability, where m is the number of counters and n is the number of packets observed. This guarantee is not far off from the ideal of identifying all flows (probability 1/n), and we prove that it is best possible up to a logarithmic factor. We show that the algorithm ranks the identified flows according to frequency within any desired constant factor of accuracy.
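The abstract states the deterministic guarantee but does not spell out the procedure. As an illustration only, the following is a minimal sketch of the classic counter-based scheme that generalizes majority voting to m counters; it is assumed here to be representative of the approach and is not taken from the paper's text. The function name `frequent_candidates` and the sample stream are hypothetical.

```python
def frequent_candidates(stream, m):
    """One pass over `stream` using at most m counters.

    Every category whose frequency exceeds 1/(m+1) is guaranteed to be
    among the returned keys. Other keys may be false positives, and the
    stored counts are lower bounds, not exact frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Category already monitored: count this packet.
            counters[item] += 1
        elif len(counters) < m:
            # A counter is free: start monitoring this category.
            counters[item] = 1
        else:
            # All m counters are busy: decrement every counter and
            # discard the current packet. Each such step cancels m+1
            # packets of distinct categories, so it can happen at most
            # n/(m+1) times over n packets.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical stream of destination categories; "a" occurs 4 times
    # out of 8, which is above the 1/(m+1) = 1/3 threshold for m = 2.
    packets = ["a", "b", "a", "c", "a", "d", "a", "b"]
    print(frequent_candidates(packets, m=2))
```

Because a category with more than n/(m+1) occurrences can lose at most n/(m+1) of them to decrement steps, its counter is still positive at the end of the pass. The sketch returns candidates only; confirming which of them truly exceed the threshold would need a second pass, which a router generally cannot make.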


Keywords (machine-generated, not supplied by the authors): Frequency Estimation; Frequency Threshold; Current Element; Popular Category; Packet Stream





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Erik D. Demaine (1)
  • Alejandro López-Ortiz (2)
  • J. Ian Munro (2)
  1. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, USA
  2. Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada