Abstract
A sketch is a memory-efficient data structure used to store and query the frequency of any item in a given multiset. Because it supports fast queries and updates, it has been applied in many fields, and different sketches offer different trade-offs. Sketches were originally proposed for estimating flow sizes in network measurement, where the key factors are insertion speed and accuracy. In this paper, we propose a new sketch that significantly improves insertion speed while also improving accuracy. Our key techniques are on-chip/off-chip separation and a partial update algorithm. Extensive experimental results show that our sketch significantly outperforms the state of the art in both accuracy and speed.
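To make the abstract's terms concrete, the following is a minimal sketch of the classic Count-Min sketch (Cormode and Muthukrishnan), the kind of baseline such work builds on. It is not the sketch proposed in this paper: the abstract does not describe the on-chip/off-chip layout or the partial update algorithm in enough detail to reproduce them. The class name, the `width`/`depth` parameters, the salted BLAKE2b hashing, and the `insert_conservative` method (the well-known conservative update of Estan and Varghese, shown only as one example of updating a subset of counters) are illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """Classic Count-Min sketch, for illustration only.

    Hypothetical names and parameters; not the paper's proposed sketch.
    """

    def __init__(self, width: int, depth: int) -> None:
        self.width = width  # counters per row
        self.depth = depth  # number of rows, one hash function each
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Derive a separate hash per row by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=row.to_bytes(16, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def insert(self, item: str, count: int = 1) -> None:
        # Standard update: increment the item's counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += count

    def insert_conservative(self, item: str, count: int = 1) -> None:
        # Conservative update (Estan and Varghese): raise each counter only
        # as far as needed so that the row minimum grows by `count`.
        idx = [self._index(item, r) for r in range(self.depth)]
        target = min(self.rows[r][i] for r, i in enumerate(idx)) + count
        for r, i in enumerate(idx):
            self.rows[r][i] = max(self.rows[r][i], target)

    def query(self, item: str) -> int:
        # Collisions only add to counters, so the minimum over rows never
        # underestimates the true frequency.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))


# Usage: count a small stream of flow IDs.
cms = CountMinSketch(width=1024, depth=4)
for flow in ["a", "b", "a", "c", "a"]:
    cms.insert(flow)
print(cms.query("a"))  # at least 3
```

The width trades memory for accuracy, while the depth trades update cost (one memory access per row) for a lower chance that every row collides. That per-row memory access cost is the insertion-speed bottleneck the abstract targets.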
About this article
Cite this article
Xin, Q., Wu, J. An accurate estimation algorithm for big data streams. Distrib Parallel Databases 36, 461–483 (2018). https://doi.org/10.1007/s10619-018-7225-5