Fast Identification of Heavy Hitters by Cached and Packed Group Testing

  • Yusaku Kaneta
  • Takeaki Uno
  • Hiroki Arimura
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11811)


The \(\epsilon \)-approximate \(\phi \)-heavy hitters problem is to maintain the frequency of every element of a universe \(\mathbb {U}=[0..n)\) under an arbitrary data stream of updates \((x_i, \varDelta _i)\in \mathbb {U}\times \mathbb {Z}\), each of which changes the frequency of \(x_i\) by \(\varDelta _i\), such that one can output every element with frequency more than \(\phi {N}\) and no element with frequency at most \((\phi -\epsilon ){N}\), where \({N}=\sum _i \varDelta _i\) and \(\epsilon , \phi \in \mathbb {R}\) are prespecified parameters. To solve this problem in small space, Cormode and Muthukrishnan (ACM TODS, 2005) proposed an \({O}(\rho \epsilon ^{-1}\lg {n})\)-space probabilistic data structure with good practical performance, where \(\rho =\lg {(1/(\delta \phi ))}\) for any failure probability \(\delta \in \mathbb {R}\). In this paper, we improve its output time from \({O}(\rho \epsilon ^{-1}(\lg {n}+\rho ))\) to \({O}(\rho ^2\epsilon ^{-1})\) for arbitrary updates (\(\varDelta _i\in \mathbb {Z}\)) and its update time from \({O}(\rho \lg {n})\) to amortized \({O}(\rho )\) for constant updates (\(\varDelta _i\in {O}(1)\)), with the same space and output guarantee. The improvement comes from removing the application-specific \(\lg {n}\) terms, which are not tunable, unlike the other parameters \(\delta \), \(\epsilon \), and \(\phi \).
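
To make the output guarantee concrete, the following is a minimal sketch of the problem statement only. It is not the data structure of this paper or of Cormode and Muthukrishnan: it assumes exact counters kept in a dictionary, so its space grows with the number of distinct elements. The names `ExactHeavyHitters` and `is_valid_output` are hypothetical and introduced purely for illustration.

```python
from collections import defaultdict


class ExactHeavyHitters:
    """Exact reference for the problem statement (assumption: no space limit)."""

    def __init__(self, phi):
        self.phi = phi
        self.freq = defaultdict(int)  # frequency of each element seen so far
        self.total = 0                # N = sum of all deltas

    def update(self, x, delta):
        """Apply one stream item (x, delta); delta may be negative."""
        self.freq[x] += delta
        self.total += delta

    def output(self):
        """Return every element with frequency more than phi * N."""
        threshold = self.phi * self.total
        return sorted(x for x, f in self.freq.items() if f > threshold)


def is_valid_output(reported, freq, total, phi, eps):
    """Check the eps-approximate phi-heavy hitters guarantee for `reported`."""
    reported = set(reported)
    # Every element with frequency more than phi * N must be reported.
    must_report = {x for x, f in freq.items() if f > phi * total}
    if not must_report <= reported:
        return False
    # No element with frequency at most (phi - eps) * N may be reported.
    return all(freq.get(x, 0) > (phi - eps) * total for x in reported)


hh = ExactHeavyHitters(phi=0.3)
for x, delta in [(1, 5), (2, 3), (3, 2), (1, -1)]:
    hh.update(x, delta)
print(hh.output())  # [1, 2]
```

In this example the frequencies are 4, 3, and 2 with \({N}=9\). For \(\phi =0.3\) and \(\epsilon =0.1\), elements 1 and 2 must be reported, while element 3 lies between the two thresholds, so an approximate structure may or may not report it.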



The authors would like to thank anonymous referees for their comments that greatly improved the readability and structure of this paper.


  1. Basat, R.B., Einziger, G., Friedman, R., Luizelli, M.C., Waisbard, E.: Constant time updates in hierarchical heavy hitters. In: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM 2017, pp. 127–140 (2017)
  2. Belazzougui, D., Gagie, T., Navarro, G.: Better space bounds for parameterized range majority and minority. In: Dehne, F., Solis-Oba, R., Sack, J.-R. (eds.) WADS 2013. LNCS, vol. 8037, pp. 121–132. Springer, Heidelberg (2013)
  3. Bender, M.A., et al.: The online event-detection problem. arXiv e-prints arXiv:1812.09824 (2018)
  4. Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 1297–1308 (2010)
  5. Boyer, R.S., Moore, J.S.: MJRTY: a fast majority vote algorithm. In: Boyer, R.S. (ed.) Automated Reasoning: Essays in Honor of Woody Bledsoe, pp. 105–118. Springer, Dordrecht (1991)
  6. Carter, J.L., Wegman, M.N.: Universal classes of hash functions. J. Comput. Syst. Sci. 18(2), 143–154 (1979)
  7. Charikar, M., Chen, K.C., Farach-Colton, M.: Finding frequent items in data streams. Theor. Comput. Sci. 312(1), 3–15 (2004)
  8. Cormode, G., Hadjieleftheriou, M.: Methods for finding frequent items in data streams. VLDB J. 19(1), 3–20 (2010)
  9. Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)
  10. Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)
  11. Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: Möhring, R., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 348–360. Springer, Heidelberg (2002)
  12. Durocher, S., He, M., Munro, J.I., Nicholson, P.K., Skala, M.: Range majority in constant time and linear space. Inf. Comput. 222, 169–179 (2013)
  13. Feigenblat, G., Itzhaki, O., Porat, E.: The frequent items problem, under polynomial decay, in the streaming model. Theor. Comput. Sci. 411(34–36), 3048–3054 (2010)
  14. Frandsen, G.S., Skyum, S.: Dynamic maintenance of majority information in constant time per update. Inf. Process. Lett. 63(2), 75–78 (1997)
  15. Gagie, T., He, M., Munro, J.I., Nicholson, P.K.: Finding frequent elements in compressed 2D arrays and strings. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 295–300. Springer, Heidelberg (2011)
  16. Gagie, T., He, M., Navarro, G.: Compressed dynamic range majority datastructures. In: 2017 Data Compression Conference, DCC 2017, pp. 260–269 (2017)
  17. Grabowski, S., Fredriksson, K.: Bit-parallel string matching under Hamming distance in \(O(n\lceil m/w\rceil )\) worst case time. Inf. Process. Lett. 105(5), 182–187 (2008)
  18. Hovmand, J.N., Nygaard, M.H.: Estimating frequencies and finding heavy hitters. Master’s thesis, Aarhus University (2016)
  19. Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28(1), 51–55 (2003)
  20. Karpinski, M., Nekrich, Y.: Searching for frequent colors in rectangles. In: Proceedings of the 20th Annual Canadian Conference on Computational Geometry, CCCG 2008 (2008)
  21. Kveton, B., Muthukrishnan, S., Vu, H.T., Xian, Y.: Finding subcube heavy hitters in analytics data streams. In: Proceedings of the 2018 World Wide Web Conference, WWW 2018, pp. 1705–1714 (2018)
  22. Larsen, K.G., Nelson, J., Nguyen, H.L., Thorup, M.: Heavy hitters via cluster-preserving clustering. In: Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, pp. 61–70 (2016)
  23. Metwally, A., Agrawal, D., Abbadi, A.E.: An integrated efficient solution for computing frequent and top-\(k\) elements in data streams. ACM Trans. Database Syst. 31(3), 1095–1133 (2006)
  24. Misra, J., Gries, D.: Finding repeated elements. Sci. Comput. Program. 2(2), 143–152 (1982)
  25. Muthukrishnan, S.: Data streams: algorithms and applications. Found. Trends Theor. Comput. Sci. 1(2), 117–236 (2005)
  26. Navarro, G., Thankachan, S.V.: Encodings for range majority queries. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 262–272. Springer, Cham (2014)
  27. Pike, R., Dorward, S., Griesemer, R., Quinlan, S.: Interpreting the data: parallel analysis with Sawzall. Sci. Program. J. 13, 277–298 (2005)
  28. Sivaraman, V., Narayana, S., Rottenstreich, O., Muthukrishnan, S., Rexford, J.: Heavy-hitter detection entirely in the data plane. In: Proceedings of the Symposium on SDN Research, SOSR 2017, pp. 164–176 (2017)
  29. Thorup, M.: High speed hashing for integers and strings. arXiv e-prints arXiv:1504.06804 (2015)
  30. Tong, D., Prasanna, V.K.: Sketch acceleration on FPGA and its applications in network anomaly detection. IEEE Trans. Parallel Distrib. Syst. 29(4), 929–942 (2018)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Autonomous Networking Research and Innovation Department, Rakuten Mobile, Inc. and Rakuten Institute of Technology, Rakuten, Inc., Tokyo, Japan
  2. National Institute of Informatics, Tokyo, Japan
  3. IST, Hokkaido University, Sapporo, Japan
