Keywords and Synonyms
Space-efficient text indexing; Compressed full-text indexing; Self-indexing
Problem Definition
Given a text string \( T = t_1 t_2 \dots t_n \) over an alphabet Σ of size σ, the compressed text indexing (CTI) problem asks to replace T with a space-efficient data structure that can efficiently answer basic string matching and substring queries on T. Typical queries required from such an index are the following:
\( count(P) \): count how many times a given pattern string \( P = p_1 p_2 \dots p_m \) occurs in T.
\( locate(P) \): return the starting positions of all occurrences of P in T.
\( display(i, j) \): return the substring \( T[i,j] = t_i t_{i+1} \dots t_j \).
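To pin down what the three queries return, the following minimal Python sketch answers them over a plain, uncompressed suffix array, using 1-based positions as in the definitions above. It only illustrates the query semantics and is not a compressed index: it keeps T and the full suffix array in memory, which is exactly the space a CTI structure tries to avoid. The naive sorting-based construction and the names `build_suffix_array`, `count`, `locate`, and `display` are illustrative assumptions.

```python
def build_suffix_array(t: str) -> list[int]:
    """Return SA: the 1-based starting positions of the suffixes of t in lexicographic order.

    Naive construction by sorting suffix strings; illustration only.
    """
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])


def _sa_interval(t: str, sa: list[int], p: str) -> tuple[int, int]:
    """Return the half-open 0-based range [lo, hi) of sa whose suffixes have p as a prefix."""
    m = len(p)

    def prefix(k: int) -> str:
        # First m characters of the suffix stored at sa[k].
        return t[sa[k] - 1:sa[k] - 1 + m]

    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first entry whose prefix is >= p
        mid = (lo + hi) // 2
        if prefix(mid) < p:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first entry whose prefix is > p
        mid = (lo + hi) // 2
        if prefix(mid) <= p:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def count(t: str, sa: list[int], p: str) -> int:
    """count(P): number of occurrences of p in t."""
    lo, hi = _sa_interval(t, sa, p)
    return hi - lo


def locate(t: str, sa: list[int], p: str) -> list[int]:
    """locate(P): 1-based starting positions of all occurrences of p in t, in text order."""
    lo, hi = _sa_interval(t, sa, p)
    return sorted(sa[lo:hi])


def display(t: str, i: int, j: int) -> str:
    """display(i, j): the substring T[i, j], with 1-based inclusive bounds."""
    return t[i - 1:j]


if __name__ == "__main__":
    text = "abracadabra"
    sa = build_suffix_array(text)
    print(count(text, sa, "abra"))   # 2
    print(locate(text, sa, "abra"))  # [1, 8]
    print(display(text, 1, 4))       # abra
```

The binary search works because the suffixes sharing the prefix P occupy one contiguous range of the suffix array, so count(P) is the length of that range and locate(P) reads its entries.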
Key Results
An elegant solution to the problem is obtained by exploiting the connection between the Burrows-Wheeler Transform (BWT) [1] and the suffix array data structure [9]. The suffix array \( SA[1,n] \) of T is the permutation of the text positions \( 1 \dots n \) that lists the suffixes \( T[i,n] \) in lexicographic order. That is, ...
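The entry is truncated at this point, so the following is a hedged illustration of the well-known BWT/suffix-array relationship, not a reconstruction of the missing text. Assuming T ends with a unique end marker `$` that is smaller than every symbol of Σ, the BWT string L satisfies L[i] = T[SA[i] - 1], where the position before the first suffix wraps around to the marker. The counting routine follows the backward-search idea of the FM-index [3]: it narrows a suffix-array interval one pattern symbol at a time, from right to left, using only L and the array C (the number of text symbols smaller than each symbol). The function names, 0-based indexing (unlike the 1-based definitions above), and the linear-scan `rank` are illustrative simplifications; compressed indexes answer rank queries with succinct structures such as the wavelet tree [5].

```python
def suffix_array(t: str) -> list[int]:
    """0-based suffix array of t by naive sorting (illustration only)."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def bwt_from_sa(t: str, sa: list[int]) -> str:
    """L[i] = character preceding suffix sa[i]; t[-1] wraps around to the end marker."""
    return "".join(t[i - 1] for i in sa)


def backward_search(bwt: str, p: str) -> tuple[int, int]:
    """Return the half-open suffix-array interval [lo, hi) of suffixes prefixed by p.

    Uses only the BWT string: C[c] counts symbols smaller than c, and rank(c, i)
    counts occurrences of c in bwt[:i]. Here rank is a linear scan (an assumption
    for readability); a compressed index answers it with a succinct structure.
    """
    C: dict[str, int] = {}
    total = 0
    for c in sorted(set(bwt)):
        C[c] = total
        total += bwt.count(c)

    def rank(c: str, i: int) -> int:
        return bwt.count(c, 0, i)

    lo, hi = 0, len(bwt)  # the empty pattern matches every suffix
    for c in reversed(p):  # extend the match one symbol to the left per step
        if c not in C:
            return 0, 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0, 0
    return lo, hi


if __name__ == "__main__":
    text = "abracadabra$"  # '$' is assumed unique and smaller than 'a'..'z'
    sa = suffix_array(text)
    bwt = bwt_from_sa(text, sa)
    print(bwt)  # ard$rcaaaabb
    lo, hi = backward_search(bwt, "abra")
    print(hi - lo, sorted(sa[lo:hi]))  # 2 [0, 7]
```

Since the result is a suffix-array interval, count(P) is hi - lo without consulting SA at all; in this sketch the full SA is kept only to list the occurrence positions for checking.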
Recommended Reading
Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
Ferragina, P., Giancarlo, R., Manzini, G., Sciortino, M.: Boosting textual compression in optimal linear time. J. ACM 52(4), 688–713 (2005)
Ferragina, P., Manzini, G.: Indexing compressed texts. J. ACM 52(4), 552–581 (2005)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representation of sequences and full-text indexes. ACM Trans. Algorithms 3(2), Article 20 (2007)
Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 841–850 (2003)
Jacobson, G.: Space-efficient static trees and graphs. In: Proc. 30th IEEE Symposium on Foundations of Computer Science (FOCS), pp. 549–554 (1989)
Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nord. J. Comput. 12(1), 40–66 (2005)
Mäkinen, V., Navarro, G.: Dynamic entropy-compressed sequences and full-text indexes. In: Proc. 17th Annual Symposium on Combinatorial Pattern Matching (CPM). LNCS, vol. 4009, pp. 307–318 (2006). Extended version as TR/DCC-2006-10, Department of Computer Science, University of Chile, July 2006
Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
Manzini, G.: An analysis of the Burrows-Wheeler transform. J. ACM 48(3), 407–430 (2001)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1), Article 2 (2007)
Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 233–242 (2002)
Copyright information
© 2008 Springer-Verlag
About this entry
Cite this entry
Mäkinen, V., Navarro, G. (2008). Compressed Text Indexing. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30162-4_83
DOI: https://doi.org/10.1007/978-0-387-30162-4_83
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30770-1
Online ISBN: 978-0-387-30162-4
eBook Packages: Computer Science, Reference Module Computer Science and Engineering