Finding Frequent Elements in Compressed 2D Arrays and Strings

  • Travis Gagie
  • Meng He
  • J. Ian Munro
  • Patrick K. Nicholson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7024)

Abstract

We show how to store a compressed two-dimensional array such that, if we are asked for the elements with high relative frequency in a range, we can quickly return a short list of candidates that includes them. More specifically, given an m × n array A and a fraction α > 0, we can store A in \(\mathcal{O}(m n (H + 1) \log^2 (1/\alpha))\) bits, where H is the entropy of the elements’ distribution in A, such that later, given a rectangular range in A and a fraction β ≥ α, in \(\mathcal{O}(1/\beta)\) time we can return a list of \(\mathcal{O}(1/\beta)\) distinct array elements that includes all the elements that have relative frequency at least β in that range. We do not verify that the elements in the list have relative frequency at least β, so the list may contain false positives. In the case when m = 1, i.e., A is a string, we improve this space bound by a factor of log(1/α), and explore a space-time trade-off for verifying the frequency of the elements in the list. This leads to an \(\mathcal{O}(n \min(\log(1/\alpha), H + 1) \log n)\)-bit data structure for strings that, in \(\mathcal{O}(1/\beta)\) time, can return the \(\mathcal{O}(1/\beta)\) elements that have relative frequency at least β in a given range, without false positives, for β ≥ α.
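The query semantics described in the abstract (a candidate list that may contain false positives, optionally followed by verification) can be illustrated with a brute-force sketch. This is not the authors' data structure: it scans the whole range, so it takes time linear in the range size rather than \(\mathcal{O}(1/\beta)\), and it builds no compressed index. As an assumption, it generates candidates with the classic Misra-Gries summary using ⌊1/β⌋ counters, which keeps every element occurring in more than a 1/(⌊1/β⌋ + 1) fraction of the range and therefore every element with relative frequency at least β. The function names `candidates`, `verify`, `string_range`, and `rect_range` are hypothetical and use inclusive, 0-based indices. Pass a `Fraction` for β when an exact threshold matters.

```python
from fractions import Fraction
from math import floor
from typing import Dict, Hashable, Iterable, List, Sequence, Union

# Hypothetical brute-force illustration of the query semantics only;
# it is NOT the compressed O(1/beta)-time structure from the paper.
Number = Union[float, Fraction]


def misra_gries(items: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Misra-Gries summary with at most k counters. Any element occurring
    more than n/(k+1) times among the n items is guaranteed to be a key."""
    counters: Dict[Hashable, int] = {}
    for x in items:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # No free counter: decrement all counters, dropping those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def candidates(items: Sequence[Hashable], beta: Number) -> List[Hashable]:
    """At most floor(1/beta) distinct candidates that include every element
    with relative frequency >= beta in items (false positives possible)."""
    b = Fraction(beta)
    if not 0 < b <= 1:
        raise ValueError("beta must lie in (0, 1]")
    k = floor(1 / b)  # k + 1 > 1/b, so count >= b*n implies count > n/(k+1)
    return list(misra_gries(items, k))


def verify(items: Sequence[Hashable], cands: Iterable[Hashable],
           beta: Number) -> List[Hashable]:
    """Keep only candidates whose relative frequency in items is >= beta."""
    b = Fraction(beta)
    n = len(items)
    return [c for c in cands if sum(1 for x in items if x == c) >= b * n]


def string_range(s: Sequence[Hashable], i: int, j: int,
                 beta: Number) -> List[Hashable]:
    """Elements of s[i..j] (inclusive) with relative frequency >= beta;
    no false positives because every candidate is verified."""
    window = s[i:j + 1]
    return verify(window, candidates(window, beta), beta)


def rect_range(A: Sequence[Sequence[Hashable]], r1: int, r2: int,
               c1: int, c2: int, beta: Number) -> List[Hashable]:
    """Unverified candidate list for rows r1..r2 and columns c1..c2
    (inclusive); it may contain false positives."""
    cells = [x for row in A[r1:r2 + 1] for x in row[c1:c2 + 1]]
    return candidates(cells, beta)


if __name__ == "__main__":
    print(candidates("abracadabra", 0.25))           # ['a', 'b', 'r']
    print(string_range("abracadabra", 0, 10, 0.25))  # ['a']
```

For example, on `"abracadabra"` with β = 1/4, `candidates` returns `['a', 'b', 'r']`, of which `'b'` and `'r'` are false positives (each occurs 2 times, below 11/4), and verification leaves `['a']`. Like the two-dimensional structure in the abstract, `rect_range` stops at the candidate list and does not verify it.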

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

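  • Travis Gagie, Department of Computer Science and Engineering, Aalto University, Finland
  • Meng He, Cheriton School of Computer Science, University of Waterloo, Canada
  • J. Ian Munro, Cheriton School of Computer Science, University of Waterloo, Canada
  • Patrick K. Nicholson, Cheriton School of Computer Science, University of Waterloo, Canada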
