Synonyms
Definition
The Count-Min (CM) Sketch is a compact summary data structure that represents a high-dimensional vector and answers queries on it, in particular point queries and dot product queries, with strong accuracy guarantees. Such queries are at the core of many computations, so the structure can be used to answer a variety of other queries, such as frequent items (heavy hitters), quantile finding, join size estimation, and more. Because the data structure easily processes updates in the form of additions to or subtractions from dimensions of the vector (which may correspond to insertions, deletions, or other transactions), it can operate over streams of updates arriving at high rates.
The data structure maintains the linear projection of the vector with a number of other random vectors. These vectors are defined implicitly by simple hash functions. Increasing the range of the hash functions increases the accuracy of the summary, and...
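The description above can be made concrete with a minimal sketch in Python. It is an illustration under stated assumptions, not the entry's reference implementation: the class and method names (CountMinSketch, from_error, update, point_query, inner_product) are hypothetical, vector dimensions are assumed to be non-negative integers, and the hash family ((a*x + b) mod p) mod width is one common pairwise-independent choice. The sizing rule in from_error (width = ceil(e / epsilon), depth = ceil(ln(1 / delta))) follows the standard analysis of the structure, and the minimum-based estimates assume that every entry of the vector stays non-negative.

```python
import math
import random


class CountMinSketch:
    """Illustrative Count-Min sketch: a depth x width array of counters."""

    _PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus (assumption)

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One hash function per row, defined implicitly by a pair (a, b).
        self._hashes = [
            (rng.randrange(1, self._PRIME), rng.randrange(0, self._PRIME))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    @classmethod
    def from_error(cls, epsilon, delta, seed=0):
        # Standard sizing: additive error epsilon * ||a||_1 with
        # failure probability at most delta.
        width = math.ceil(math.e / epsilon)
        depth = math.ceil(math.log(1 / delta))
        return cls(width, depth, seed)

    def _bucket(self, row, index):
        a, b = self._hashes[row]
        return ((a * index + b) % self._PRIME) % self.width

    def update(self, index, amount=1):
        # Add `amount` (possibly negative) to dimension `index` of the vector.
        for row in range(self.depth):
            self.table[row][self._bucket(row, index)] += amount

    def point_query(self, index):
        # Each row overestimates a non-negative vector; take the smallest.
        return min(
            self.table[row][self._bucket(row, index)]
            for row in range(self.depth)
        )

    def inner_product(self, other):
        # Both sketches must share dimensions and hash functions.
        if (self.width, self.depth, self._hashes) != (
            other.width, other.depth, other._hashes
        ):
            raise ValueError("sketches must be built with identical parameters")
        return min(
            sum(x * y for x, y in zip(row_a, row_b))
            for row_a, row_b in zip(self.table, other.table)
        )
```

Each update touches one counter per row, so every row is a linear function of the underlying vector. This is why subtractions are handled in the same way as additions, and why two sketches built with the same hash functions can be combined row by row to estimate a dot product. For non-negative vectors, point_query never returns less than the true value, because colliding items can only add to a counter.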
© 2017 Springer Science+Business Media LLC
About this entry
Cite this entry
Cormode, G. (2017). Count-Min Sketch. In: Liu, L., Özsu, M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_87-2
DOI: https://doi.org/10.1007/978-1-4899-7993-3_87-2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4899-7993-3
Online ISBN: 978-1-4899-7993-3
eBook Packages: Springer Reference Computer Sciences, Reference Module Computer Science and Engineering