Abstract
Motivated by the problem of counting unique visitors to a website, we consider how to preprocess a string s [1..n] such that later, given a substring’s endpoints, we can quickly count how many distinct characters that substring contains. The smallest reasonably fast previous data structure for this problem takes \(n \log \sigma + \ensuremath{\mathcal{O}\!\left( {n \log \log n} \right)}\) bits and answers queries in \(\ensuremath{\mathcal{O}\!\left( {\log n} \right)}\) time. We give a data structure for this problem that takes \(n H_0 (s) + \ensuremath{\mathcal{O}\!\left( {n} \right)} + o (n H_0 (s))\) bits, where H 0 (s) is the 0th-order empirical entropy of s, and answers queries in \(\ensuremath{\mathcal{O}\!\left( {\log \ell} \right)}\) time, where ℓ is the length of the query substring. As far as we know, this is the first data structure, where the query time depends only on ℓ and not on n. We also show how our data structure can be made partially dynamic.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Bozanis, P., Kitsios, N., Makris, C., Tsakalidis, A.K.: New upper bounds for generalized intersection searching problems. In: Proceedings of the 22nd International Colloquium on Automata, Language and Programming (ICALP), pp. 464–474 (1995)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms 3(2) (2007)
Gagie, T., Navarro, G., Puglisi, S.J.: Colored range queries and document retrieval. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 67–81. Springer, Heidelberg (2010)
González, R., Navarro, G.: Rank/select on dynamic compressed sequences and applications. Theoretical Computer Science 410(43), 4414–4422 (2009)
Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proceedings of the 14th Symposium on Discrete Algorithms (SODA), pp. 636–645 (2003)
Kaplan, H., Rubin, N., Sharir, M., Verbin, E.: Efficient colored orthogonal range counting. SIAM Journal on Computing 38(3), 982–1011 (2008)
Lai, Y.K., Poon, C.K., Shi, B.: Approximate colored range and pointer enclosure queries. Journal of Discrete Algorithms 6(3), 420–432 (2008)
Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 45–56. Springer, Heidelberg (2005)
Mäkinen, V., Navarro, G.: Implicit compression boosting with applications to self-indexing. In: Ziviani, N., Baeza-Yates, R. (eds.) SPIRE 2007. LNCS, vol. 4726, pp. 229–241. Springer, Heidelberg (2007)
Mäkinen, V., Navarro, G.: Rank and select revisited and extended. Theoretical Computer Science 387(3), 332–347 (2007)
Morrison, D.R.: PATRICIA — practical algorithm to retrieve information coded in alphanumeric. Journal of the ACM 15(4) (1968)
Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: Proceedings of the 13th Symposium on Discrete Algorithms (SODA), pp. 657–666 (2002)
Patrascu, M.: Predecessor search. In: Kao, M.-Y. (ed.) Encyclopedia of Algorithms. Springer, Heidelberg (2008)
Sadakane, K.: Succinct data structures for flexible text retrieval systems. Journal of Discrete Algorithms 5(1), 12–22 (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gagie, T., Kärkkäinen, J. (2011). Counting Colours in Compressed Strings. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_18
Download citation
DOI: https://doi.org/10.1007/978-3-642-21458-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21457-8
Online ISBN: 978-3-642-21458-5
eBook Packages: Computer ScienceComputer Science (R0)