Count-Min Sketch

Synonyms

CM Sketch

Definition

The Count-Min (CM) Sketch is a compact summary data structure that represents a high-dimensional vector and answers queries on it, in particular point queries and dot product queries, with strong accuracy guarantees. Such queries are at the core of many computations, so the structure can be used to answer a variety of other queries, such as frequent items (heavy hitters), quantile finding, and join size estimation. Because the data structure easily processes updates in the form of additions to or subtractions from dimensions of the vector (which may correspond to insertions, deletions, or other transactions), it can operate over streams of updates at high rates.
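For concreteness, the accuracy guarantees usually quoted for these two query types can be written as follows. This is a summary of the standard Count-Min analysis rather than text from this entry: the symbols $\varepsilon$, $\delta$, $w$, $d$, $\hat{a}_i$ and $\widehat{a \cdot b}$ are notation introduced here for illustration, and the bounds assume that all entries of the vectors $a$ and $b$ are non-negative.

```latex
% Sketch dimensions: w counters per row, d rows.
w = \left\lceil \frac{e}{\varepsilon} \right\rceil ,
\qquad
d = \left\lceil \ln \frac{1}{\delta} \right\rceil

% Point query: the estimate never undercounts, and with
% probability at least 1 - \delta it overcounts by at most
% \varepsilon times the L1 norm of the vector.
a_i \;\le\; \hat{a}_i \;\le\; a_i + \varepsilon \lVert a \rVert_1

% Dot product query between two vectors sketched with the same
% hash functions, again with probability at least 1 - \delta.
a \cdot b \;\le\; \widehat{a \cdot b} \;\le\; a \cdot b + \varepsilon \lVert a \rVert_1 \lVert b \rVert_1
```

In both cases the lower bound holds with certainty and only the upper bound is probabilistic, which is why the structure takes a minimum over its rows when answering a query.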

The data structure maintains linear projections of the vector onto a number of random vectors. These vectors are defined implicitly by simple hash functions. Increasing the range of the hash functions increases the accuracy of the summary, and...
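The description above can be made concrete with a minimal sketch in Python. It is an illustration under stated assumptions, not the entry's reference implementation: the class name `CountMinSketch`, its method names, the parameters `epsilon`, `delta` and `seed`, the restriction to integer keys, and the particular hash family `((a * x + b) mod p) mod w` are all choices made here. Each row of the table corresponds to one hash function, whose range is the row width, and the minimum over rows is used as the estimate, which assumes the underlying vector stays non-negative.

```python
import math
import random


class CountMinSketch:
    """Illustrative Count-Min sketch over non-negative integer keys."""

    # Mersenne prime for the modular hash family (an assumption of this sketch).
    PRIME = (1 << 61) - 1

    def __init__(self, epsilon=0.01, delta=0.01, seed=0):
        # The width is the range of each hash function; depth is the number of rows.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        rng = random.Random(seed)
        # One (a, b) pair per row defines h(x) = ((a*x + b) mod PRIME) mod width.
        self.hashes = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _bucket(self, row, key):
        a, b = self.hashes[row]
        return ((a * key + b) % self.PRIME) % self.width

    def update(self, key, count=1):
        """Add `count` (which may be negative) to dimension `key` of the vector."""
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def point_query(self, key):
        """Estimate the value of dimension `key`; never an underestimate
        as long as every dimension of the vector is non-negative."""
        return min(
            self.table[row][self._bucket(row, key)] for row in range(self.depth)
        )

    def inner_product(self, other):
        """Estimate the dot product of two sketched vectors.
        Both sketches must share the same dimensions and hash functions."""
        if self.hashes != other.hashes or self.width != other.width:
            raise ValueError("sketches must be built with identical parameters")
        return min(
            sum(x * y for x, y in zip(row_a, row_b))
            for row_a, row_b in zip(self.table, other.table)
        )


if __name__ == "__main__":
    cms = CountMinSketch(epsilon=0.01, delta=0.01, seed=1)
    cms.update(7, 5)    # an insertion of five occurrences of key 7
    cms.update(7, -2)   # a deletion of two of them
    print(cms.point_query(7))
```

Because every update adds the same amount to one counter in each row, the table is a linear function of the input vector, so two sketches built with the same hash functions can be added counter by counter to summarize the sum of their vectors.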

Corresponding author

Correspondence to Graham Cormode.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Cormode, G. (2018). Count-Min Sketch. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_87
