Frequency Estimation of Internet Packet Streams with Limited Space

Demaine, Erik D.; López-Ortiz, Alejandro; Munro, J. Ian

doi:10.1007/3-540-45749-6_33


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2461)

Included in the following conference series:

European Symposium on Algorithms


Abstract

We consider a router on the Internet analyzing the statistical properties of a TCP/IP packet stream. A fundamental difficulty with measuring traffic behavior on the Internet is that there is simply too much data to be recorded for later analysis, on the order of gigabytes a second. As a result, network routers can collect only relatively few statistics about the data. The central problem addressed here is to use the limited memory of routers to determine essential features of the network traffic stream. A particularly difficult and representative subproblem is to determine the top k categories to which the most packets belong, for a desired value of k and for a given notion of categorization such as the destination IP address.

We present an algorithm that deterministically finds (in particular) all categories having a frequency above 1/(m+1) using m counters, which we prove is best possible in the worst case. We also present a sampling-based algorithm for the case that packet categories follow an arbitrary distribution, but their order over time is permuted uniformly at random. Under this model, our algorithm identifies flows above a frequency threshold of roughly 1/√(nm) with high probability, where m is the number of counters and n is the number of packets observed. This guarantee is not far off from the ideal of identifying all flows (probability 1/n), and we prove that it is best possible up to a logarithmic factor. We show that the algorithm ranks the identified flows according to frequency within any desired constant factor of accuracy.
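
The abstract states the deterministic guarantee but not the procedure behind it. As a minimal sketch, assuming the guarantee is obtained with the classic decrement-all counter scheme (widely known as Misra-Gries or "Frequent"), the following Python shows how m counters can retain every category whose frequency exceeds 1/(m+1). The function name `frequent_candidates` and the representation of the stream as an iterable of category labels are illustrative assumptions, not taken from the paper.

```python
def frequent_candidates(stream, m):
    """Return a dict of at most m candidate categories with residual counts.

    Assumed scheme (not quoted from the paper): keep at most m counters;
    when a packet of an unmonitored category arrives and no counter is free,
    decrement every counter and discard those that reach zero.
    Any category with more than n/(m+1) of the n packets survives.
    """
    counters = {}
    for category in stream:
        if category in counters:
            # Category already monitored: count the packet.
            counters[category] += 1
        elif len(counters) < m:
            # A free counter exists: start monitoring this category.
            counters[category] = 1
        else:
            # No free counter: cancel this packet against one packet
            # of each monitored category.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical stream of destination labels; "a" holds 5 of 10 packets,
    # which is above the 1/(m+1) = 1/3 threshold for m = 2.
    print(frequent_candidates("abacadaeaf", m=2))
```

The surviving counters are candidates only: a category below the threshold may also remain, and the residual counts underestimate the true frequencies, so a second pass (or additional bookkeeping) would be needed to confirm exact counts.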

This research is partially supported by the Natural Sciences and Engineering Research Council of Canada, by the Canada Research Chairs Program, and by the Nippon Telegraph and Telephone Corporation through the NTT-MIT research collaboration.



Author information

Authors and Affiliations

Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Erik D. Demaine
Department of Computer Science, University of Waterloo, N2L 3G1, Waterloo, Ontario, Canada
Alejandro López-Ortiz & J. Ian Munro


Editor information

Editors and Affiliations

Fakultät II: Mathematik und Naturwissenschaften, Technische Universität Berlin, Strasse des 17. Juni 136, 10623, Berlin, Germany
Rolf Möhring
Department of Mathematics and Computer Science, University of Leicester, University Road, LE1 7RH, Leicester, UK
Rajeev Raman


About this paper

Cite this paper

Demaine, E.D., López-Ortiz, A., Munro, J.I. (2002). Frequency Estimation of Internet Packet Streams with Limited Space. In: Möhring, R., Raman, R. (eds) Algorithms — ESA 2002. ESA 2002. Lecture Notes in Computer Science, vol 2461. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45749-6_33


DOI: https://doi.org/10.1007/3-540-45749-6_33
Published: 29 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44180-9
Online ISBN: 978-3-540-45749-7
eBook Packages: Springer Book Archive
