Counting Colours in Compressed Strings

  • Conference paper
Combinatorial Pattern Matching (CPM 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6661)

Abstract

Motivated by the problem of counting unique visitors to a website, we consider how to preprocess a string s[1..n] such that later, given a substring’s endpoints, we can quickly count how many distinct characters that substring contains. The smallest reasonably fast previous data structure for this problem takes \(n \log \sigma + \mathcal{O}(n \log \log n)\) bits and answers queries in \(\mathcal{O}(\log n)\) time. We give a data structure for this problem that takes \(n H_0(s) + \mathcal{O}(n) + o(n H_0(s))\) bits, where \(H_0(s)\) is the 0th-order empirical entropy of s, and answers queries in \(\mathcal{O}(\log \ell)\) time, where ℓ is the length of the query substring. As far as we know, this is the first data structure whose query time depends only on ℓ and not on n. We also show how our data structure can be made partially dynamic.
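
To make the query concrete, here is a minimal Python sketch of an uncompressed baseline. It is not the data structure described in the abstract. It assumes the standard reduction in which each position stores the index of the previous occurrence of its character, so that the number of distinct characters in a substring equals the number of positions in it whose previous occurrence lies before the substring's start. The names `build_prev` and `ColourCounter` are hypothetical, indices are 0-based (the abstract uses s[1..n]), and the sketch uses far more space than the bounds quoted above, answering a query in roughly \(\mathcal{O}(\log^2 n)\) time.

```python
from bisect import bisect_left


def build_prev(s):
    """prev[k] = index of the previous occurrence of s[k], or -1 if there is none."""
    last = {}
    prev = []
    for k, c in enumerate(s):
        prev.append(last.get(c, -1))
        last[c] = k
    return prev


class ColourCounter:
    """Merge-sort tree over prev[]; an illustrative baseline only."""

    def __init__(self, s):
        self.n = len(s)
        prev = build_prev(s)
        size = 1
        while size < max(self.n, 1):
            size *= 2
        self.size = size
        # tree[node] holds the sorted prev-values of the positions under node.
        self.tree = [[] for _ in range(2 * size)]
        for k, p in enumerate(prev):
            self.tree[size + k] = [p]
        for node in range(size - 1, 0, -1):
            self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])

    def count(self, i, j):
        """Number of distinct characters in s[i..j], 0-based and inclusive."""
        lo, hi = i + self.size, j + self.size + 1
        total = 0
        while lo < hi:
            if lo & 1:
                # Positions under this node whose previous occurrence is before i.
                total += bisect_left(self.tree[lo], i)
                lo += 1
            if hi & 1:
                hi -= 1
                total += bisect_left(self.tree[hi], i)
            lo //= 2
            hi //= 2
        return total
```

For example, `ColourCounter("abracadabra").count(3, 5)` counts the distinct characters of the substring "aca". A position k in the range contributes exactly when it holds the first occurrence of its character inside the range, which is why comparing `prev[k]` against `i` suffices.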

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gagie, T., Kärkkäinen, J. (2011). Counting Colours in Compressed Strings. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_18

  • DOI: https://doi.org/10.1007/978-3-642-21458-5_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21457-8

  • Online ISBN: 978-3-642-21458-5

  • eBook Packages: Computer Science, Computer Science (R0)
