Count-Min Sketch

  • Living reference work entry

Synonyms

CM Sketch

Definition

The Count-Min (CM) Sketch is a compact summary data structure that represents a high-dimensional vector and answers queries on it, in particular point queries and dot product queries, with strong accuracy guarantees. Such queries are at the core of many computations, so the structure can be used to answer a variety of other queries, such as frequent items (heavy hitters), quantile finding, and join size estimation. Because the data structure easily processes updates in the form of additions to or subtractions from dimensions of the vector (which may correspond to insertions, deletions, or other transactions), it can operate over streams of updates at high rates.
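The update, point query, and dot product operations described above can be illustrated with a minimal Python sketch. It is not the entry's reference implementation. The class name `CountMinSketch`, its method names, the restriction to non-negative integer keys, and the hash family `((a*x + b) mod p) mod width` are assumptions made for illustration. The sizing rule `width = ceil(e / epsilon)`, `depth = ceil(ln(1 / delta))` follows the usual analysis of the structure.

```python
import math
import random


class CountMinSketch:
    """Illustrative Count-Min sketch over non-negative integer keys."""

    PRIME = 2**31 - 1  # Mersenne prime for the hash family; keys must be < PRIME

    def __init__(self, epsilon=0.01, delta=0.01, seed=0):
        # Assumed sizing: error about epsilon * (sum of counts),
        # with failure probability about delta.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1 / delta))
        rng = random.Random(seed)
        # One (a, b) pair per row defines a hash h(x) = ((a*x + b) mod p) mod width.
        self.hashes = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _bucket(self, row, key):
        a, b = self.hashes[row]
        return ((a * key + b) % self.PRIME) % self.width

    def update(self, key, count=1):
        """Add `count` (which may be negative) to dimension `key` of the vector."""
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def query(self, key):
        """Point query: the minimum counter over all rows.

        If every dimension of the vector is non-negative, this never
        underestimates the true value.
        """
        return min(
            self.table[row][self._bucket(row, key)] for row in range(self.depth)
        )

    def inner_product(self, other):
        """Dot product estimate; both sketches must share parameters and seed."""
        return min(
            sum(x * y for x, y in zip(r1, r2))
            for r1, r2 in zip(self.table, other.table)
        )


if __name__ == "__main__":
    cms = CountMinSketch(epsilon=0.01, delta=0.01, seed=1)
    for _ in range(5):
        cms.update(7)       # five insertions of item 7
    cms.update(7, -2)       # two deletions of item 7
    print(cms.query(7))     # an estimate that is at least the true count, 3
```

Because every counter is a plain sum, deletions are handled by the same code path as insertions, which is what lets the structure follow a stream of mixed updates.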

The data structure maintains linear projections of the vector onto a number of random vectors, which are defined implicitly by simple hash functions. Increasing the range of the hash functions increases the accuracy of the summary, and...
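To make the "linear projection" view concrete, the following Python sketch builds one row of counters in two ways: by streaming updates through a hash function, and by taking explicit dot products with the 0/1 vectors that the hash function implicitly defines. The helper names (`make_hash`, `projection_vectors`, `dot`), the hash family, and the small sizes are assumptions chosen for illustration only. Here `width` plays the role of the hash range.

```python
import random

PRIME = 2**31 - 1  # Mersenne prime for the assumed hash family


def make_hash(width, rng):
    """Return h(i) = ((a*i + b) mod PRIME) mod width for random a, b."""
    a = rng.randrange(1, PRIME)
    b = rng.randrange(PRIME)
    return lambda i: ((a * i + b) % PRIME) % width


def projection_vectors(h, width, n):
    """Explicit 0/1 vectors r_j, where r_j[i] = 1 exactly when h(i) == j."""
    return [[1 if h(i) == j else 0 for i in range(n)] for j in range(width)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


n, width = 20, 4
rng = random.Random(0)
h = make_hash(width, rng)
x = [rng.randrange(10) for _ in range(n)]  # the vector being summarized

# Streaming view: each dimension's value is added to the counter it hashes to.
row = [0] * width
for i, xi in enumerate(x):
    row[h(i)] += xi

# Projection view: counter j is the dot product of x with the vector r_j.
explicit = [dot(r, x) for r in projection_vectors(h, width, n)]

assert row == explicit
```

The projection vectors are never stored in practice; the hash function alone determines them, which is why the summary stays small even when the vector has many dimensions.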

Corresponding author

Correspondence to Graham Cormode.

Copyright information

© 2017 Springer Science+Business Media LLC

About this entry

Cite this entry

Cormode, G. (2017). Count-Min Sketch. In: Liu, L., Özsu, M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_87-2

  • DOI: https://doi.org/10.1007/978-1-4899-7993-3_87-2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4899-7993-3

  • Online ISBN: 978-1-4899-7993-3

  • eBook Packages: Springer Reference Computer Sciences; Reference Module Computer Science and Engineering
