Counting Colours in Compressed Strings

  • Travis Gagie
  • Juha Kärkkäinen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6661)

Abstract

Motivated by the problem of counting unique visitors to a website, we consider how to preprocess a string \(s[1..n]\) such that later, given a substring’s endpoints, we can quickly count how many distinct characters that substring contains. The smallest reasonably fast previous data structure for this problem takes \(n \log \sigma + \mathcal{O}(n \log \log n)\) bits and answers queries in \(\mathcal{O}(\log n)\) time. We give a data structure for this problem that takes \(n H_0(s) + \mathcal{O}(n) + o(n H_0(s))\) bits, where \(H_0(s)\) is the 0th-order empirical entropy of \(s\), and answers queries in \(\mathcal{O}(\log \ell)\) time, where \(\ell\) is the length of the query substring. As far as we know, this is the first data structure whose query time depends only on \(\ell\) and not on \(n\). We also show how our data structure can be made partially dynamic.
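To make the query concrete, here is a minimal baseline sketch in Python. It is not the compressed data structure described in the abstract. It relies on a standard observation: a position i contributes a new character to s[lo..hi] exactly when the previous occurrence of s[i] lies before lo. The function names `build_prev` and `count_distinct` are hypothetical, indices are 0-based (the abstract uses 1-based), and each query takes time linear in ℓ rather than logarithmic.

```python
def build_prev(s):
    """For each position i, store the index of the previous occurrence
    of s[i], or -1 if s[i] has not appeared before."""
    last_seen = {}
    prev = []
    for i, ch in enumerate(s):
        prev.append(last_seen.get(ch, -1))
        last_seen[ch] = i
    return prev


def count_distinct(prev, lo, hi):
    """Count distinct characters in s[lo..hi] (0-based, inclusive).

    Position i is the first occurrence of its character inside the
    range exactly when prev[i] < lo, so counting those positions
    counts each distinct character once."""
    return sum(1 for i in range(lo, hi + 1) if prev[i] < lo)


if __name__ == "__main__":
    prev = build_prev("abracadabra")
    print(count_distinct(prev, 0, 10))  # whole string
    print(count_distinct(prev, 3, 7))   # substring "acada"
```

The `prev` array here costs a full machine word per position, which is the kind of overhead the entropy-based space bound in the abstract is meant to avoid.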

Keywords

Block Size, Range Query, Query Time, Distinct Character, Answer Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Travis Gagie (1)
  • Juha Kärkkäinen (2)
  1. Aalto University, Finland
  2. University of Helsinki, Finland
