Top-\(k\) Frequent Item Maintenance over Streams

  • Moses Charikar
Chapter
Part of the Data-Centric Systems and Applications book series (DCSA)

Abstract

We consider the problem of finding the most frequent items in a data stream. Given a data stream \(a_{1},a_{2},\ldots,a_{n}\), where each \(a_{i} \in \{1,\ldots,m\}\), we would like to identify the items that occur most frequently, using one pass over the stream and a small amount of storage space. Such problems arise in a variety of settings. For example, a search engine might want to gather statistics about its query stream and, in particular, identify the most popular queries. Another application is detecting network anomalies by monitoring network traffic. We describe a variety of approaches that have been proposed to solve these problems. Our goal is to give a flavor of the various techniques that have been used in this area.
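
To make the one-pass, small-space setting concrete, the following is a minimal sketch of a counter-based deterministic summary in the style usually attributed to Misra and Gries. It is an illustration under stated assumptions, not the chapter's own presentation: the function name `misra_gries` and the parameter `k` are hypothetical, and the stream is assumed to be any Python iterable of hashable items.

```python
def misra_gries(stream, k):
    """One-pass frequent-item summary using at most k - 1 counters.

    Assumed guarantee (standard for this technique): for a stream of
    length n, every item occurring more than n / k times is still
    present at the end, and each stored count underestimates the true
    count by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: bump its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The new item is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Item 1 occurs 6 times in a stream of length 10, more than 10 / 3.
    summary = misra_gries([1, 1, 1, 2, 1, 3, 1, 4, 1, 5], k=3)
    print(summary)
```

The stored counts are lower bounds on the true frequencies, so a second pass over the data (when one is possible) would be needed to obtain exact counts for the surviving candidates.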

Keywords

Data Stream, Hash Function, Total Count, Frequent Item, Deterministic Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Computer Science Department, Stanford University, Stanford, USA
