
Top-\(k\) Frequent Item Maintenance over Streams

Data Stream Management

Part of the book series: Data-Centric Systems and Applications (DCSA)


Abstract

We consider the problem of finding the most frequent items in a data stream. Given a data stream \(a_{1},a_{2},\ldots,a_{n}\), where each \(a_{i} \in \{1,\ldots,m\}\), we would like to identify the items that occur most frequently, making one pass over the data stream and using a small amount of storage space. Such problems arise in a variety of settings. For example, a search engine might be interested in gathering statistics about its query stream and, in particular, in identifying the most popular queries. Another application is detecting network anomalies by monitoring network traffic. We describe a variety of approaches that have been proposed to solve these problems. Our goal is to give a flavor of the various techniques that have been used in this area.
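One classic one-pass, small-space technique for this problem is the counter-based algorithm of Misra and Gries (1982). The Python code below is a minimal sketch of that idea, assuming items are hashable; it illustrates the general technique and is not necessarily the presentation used in the chapter. The sketch keeps at most s = `num_counters` (item, count) pairs. When a new item arrives and every counter is in use, all counters are decremented and any that reach zero are discarded. Each such step cancels s + 1 occurrences (one per tracked item plus the arriving one), so there are at most n/(s + 1) of them. Hence every stored count underestimates the true frequency by at most n/(s + 1), and every item occurring more than n/(s + 1) times is guaranteed to survive. The names `misra_gries`, `top_candidates`, `num_counters` and `t` are hypothetical, chosen for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def misra_gries(stream: Iterable[Hashable], num_counters: int) -> Dict[Hashable, int]:
    """Single pass over `stream`, storing at most `num_counters` (item, count) pairs.

    With s = num_counters and stream length n, each stored count is a lower
    bound on the true frequency and is off by at most n / (s + 1); any item
    occurring more than n / (s + 1) times is guaranteed to be stored.
    """
    if num_counters < 1:
        raise ValueError("num_counters must be at least 1")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Item already tracked: count this occurrence exactly.
            counters[item] += 1
        elif len(counters) < num_counters:
            # A counter is free: start tracking the new item.
            counters[item] = 1
        else:
            # All counters busy: cancel this occurrence against one
            # occurrence of every tracked item, freeing counters that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def top_candidates(counters: Dict[Hashable, int], t: int) -> List[Tuple[Hashable, int]]:
    """Return up to t (item, estimated count) pairs, largest estimate first."""
    return sorted(counters.items(), key=lambda kv: kv[1], reverse=True)[:t]


if __name__ == "__main__":
    summary = misra_gries([1, 2, 1, 3, 1, 2, 4, 1, 2, 1], num_counters=2)
    print(summary)                     # {1: 3, 2: 1}
    print(top_candidates(summary, 1))  # [(1, 3)]
```

In the example, n = 10 and s = 2, so the error bound is 10/3: item 1 occurs 5 times and is reported with count 3, and item 2 occurs 3 times and is reported with count 1. Because the stored counts are only lower bounds, the t largest estimates are candidates for the top-t items rather than a guaranteed exact answer when frequencies are close.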



Author information


Correspondence to Moses Charikar.


Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Charikar, M. (2016). Top-\(k\) Frequent Item Maintenance over Streams. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_5


  • DOI: https://doi.org/10.1007/978-3-540-28608-0_5


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28607-3

  • Online ISBN: 978-3-540-28608-0

  • eBook Packages: Computer Science, Computer Science (R0)
