Abstract
We consider the problem of finding the most frequent items in a data stream. Given a data stream \(a_{1},a_{2},\ldots,a_{n}\), where each \(a_{i} \in \{1,\ldots,m\}\), we would like to identify the items that occur most frequently, using one pass over the stream and a small amount of storage space. Such problems arise in a variety of settings. For example, a search engine might want to gather statistics about its query stream and, in particular, identify the most popular queries. Another application is detecting network anomalies by monitoring network traffic. We describe a variety of approaches that have been proposed to solve these problems. Our goal is to give a flavor of the various techniques that have been used in this area.
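To make the one-pass, small-space setting concrete, here is a minimal sketch of one classic approach from the literature, the Misra-Gries summary. It is offered only as an illustration of the problem statement; the abstract does not say which algorithms the chapter presents or how it presents them. The function name `misra_gries` and the parameter `k` are assumptions introduced for this example. With at most k - 1 counters, every item occurring more than n/k times in a stream of length n is guaranteed to survive in the summary, and each stored count underestimates the true frequency by at most n/k.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass summary that keeps at most k - 1 counters (illustrative sketch).

    Any item occurring more than n/k times in a stream of length n is
    guaranteed to be a key of the returned dict. Stored counts are lower
    bounds on the true frequencies.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: increment its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement all counters and drop any that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# "a" occurs 4 times out of 7, which exceeds 7/3, so it must be reported.
print(misra_gries(["a", "b", "a", "c", "a", "b", "a"], k=3))
# prints {'a': 3, 'b': 1}
```

The summary may also contain items that are not frequent (here `"b"`), so when exact answers are required and a second pass over the data is possible, the candidates can be verified by counting them exactly.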
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Charikar, M. (2016). Top-\(k\) Frequent Item Maintenance over Streams. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_5
DOI: https://doi.org/10.1007/978-3-540-28608-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28607-3
Online ISBN: 978-3-540-28608-0
eBook Packages: Computer Science, Computer Science (R0)