Count-Min Sketch

Cormode, Graham

doi:10.1007/978-1-4614-8265-9_87

Reference work entry
First Online: 01 January 2018

Synonyms

CM Sketch

Definition

The Count-Min (CM) Sketch is a compact summary data structure that represents a high-dimensional vector and answers queries on it, in particular point queries and dot product queries, with strong accuracy guarantees. Because such queries are at the core of many computations, the structure can also be used to answer a variety of other queries, such as frequent items (heavy hitters), quantile finding, and join size estimation. The data structure easily processes updates in the form of additions to or subtractions from dimensions of the vector (which may correspond to insertions, deletions, or other transactions), so it can operate over streams of updates at high rates.

The data structure maintains the linear projection of the vector with a number of other random vectors. These vectors are defined implicitly by simple hash functions. Increasing the range of the hash functions increases the accuracy of the summary, and...
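
As an illustration of the definition above, the following Python sketch shows one way such a structure might be laid out; it is not taken from the entry. It assumes items are non-negative integer indices of the vector. The class name `CountMinSketch`, the parameters `width`, `depth`, and `seed`, and the particular hash family are all choices made for this example. The minimum-based estimates assume that every entry of the summarized vector is non-negative.

```python
import random


class CountMinSketch:
    """Illustrative Count-Min sketch: a depth x width array of counters."""

    # Mersenne prime for the hash family (an assumed, commonly used choice).
    _PRIME = (1 << 61) - 1

    def __init__(self, width, depth, seed=0):
        self.width = width  # range of each hash function
        self.depth = depth  # number of hash functions (rows)
        rng = random.Random(seed)
        # One hash h(x) = ((a*x + b) mod p) mod width per row.
        self._coeffs = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        a, b = self._coeffs[row]
        return ((a * item + b) % self._PRIME) % self.width

    def update(self, item, delta=1):
        """Add delta (possibly negative) to dimension `item` of the vector."""
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += delta

    def point_query(self, item):
        """Estimate entry `item`; never too small if all entries are >= 0."""
        return min(
            self.table[row][self._bucket(row, item)]
            for row in range(self.depth)
        )

    def dot_product(self, other):
        """Estimate the dot product of two vectors sketched with the
        same width, depth, and seed (so both use the same hash functions)."""
        return min(
            sum(x * y for x, y in zip(mine, theirs))
            for mine, theirs in zip(self.table, other.table)
        )


sketch = CountMinSketch(width=272, depth=5)
for item in [7, 7, 7, 42]:
    sketch.update(item)       # insertions
sketch.update(42, -1)         # a deletion: item 42 now has net count 0
print(sketch.point_query(7))  # 3: item 7 is the only nonzero entry
```

Each update touches one counter per row, so the summary is a linear function of the vector, which is why subtractions and dot products of two compatible sketches work. A larger `width` means fewer collisions between items in each row and so a more accurate estimate, at the cost of more space.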

Author information

Authors and Affiliations

Computer Science, University of Warwick, Warwick, UK
Graham Cormode

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

AT&T Labs-Research, Bedminster, NJ, USA
Divesh Srivastava

Cite this entry

Cormode, G. (2018). Count-Min Sketch. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_87

DOI: https://doi.org/10.1007/978-1-4614-8265-9_87
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering