Abstract
A full-text index is a data structure built over a text string T[1,n]. The most basic functionality it provides is (a) counting how many times a pattern string P[1,m] appears in T and (b) locating all those occ positions. Several existing indexes solve (a) in O(m) time and (b) in O(occ) time. In this paper we propose two new queries: (c) counting how many times P[1,m] appears in T[l,r] and (d) locating all those occ_{l,r} positions. These can be solved using (a) and (b), but this requires O(occ) time. We present two solutions to (c) and (d) in this paper. The first is an index that requires O(n log n) bits of space and answers (c) in O(m + log n) time and (d) in O(log n) time per occurrence (that is, O(occ_{l,r} log n) time overall). A variant of the first solution answers (c) in O(m + log log n) time and (d) in constant time per occurrence, but requires O(n log^{1+ε} n) bits of space for any constant ε > 0. The second solution requires O(nm log σ) bits of space, solving (c) in O(m⌈log σ / log log n⌉) time and (d) in O(m⌈log σ / log log n⌉) time per occurrence, where σ is the alphabet size. This second structure takes less space when the text is compressible.
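To make queries (c) and (d) concrete, the following minimal Python sketch shows the baseline the abstract mentions: find all occ occurrences with an ordinary index, then filter them by position. It is not the paper's structure. The function names, the naive suffix array construction, and the 0-based inclusive convention for l and r are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Naive suffix array: start positions sorted by suffix (illustration only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def pattern_range(text, sa, pattern):
    """Half-open range [sp, ep) of suffix-array entries whose suffix starts with pattern."""
    m = len(pattern)
    # Truncating lexicographically sorted suffixes to m characters keeps them sorted.
    prefixes = [text[i:i + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def locate_restricted(text, sa, pattern, l, r):
    """Query (d): starts of occurrences lying entirely inside text[l..r] (0-based, inclusive)."""
    m = len(pattern)
    sp, ep = pattern_range(text, sa, pattern)
    # Baseline: inspect every one of the occ occurrences, keep those inside the window.
    return sorted(i for i in sa[sp:ep] if l <= i and i + m - 1 <= r)


def count_restricted(text, sa, pattern, l, r):
    """Query (c): number of occurrences lying entirely inside text[l..r]."""
    return len(locate_restricted(text, sa, pattern, l, r))


text = "abracadabra"
sa = build_suffix_array(text)
print(locate_restricted(text, sa, "a", 3, 7))    # [3, 5, 7]
print(count_restricted(text, sa, "abra", 1, 10))  # 1
```

The filtering step touches every occurrence in T even when few or none fall inside T[l,r], which is the cost the proposed indexes are designed to avoid.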
Our solutions can be seen as a generalization of rank and select dictionaries, which allow computing how many times a given character c appears in a prefix T[1,i] and locating the j-th occurrence of c in T. Our solution to (c) extends character rank queries to substring rank queries, and our solution to (d) extends character select to substring select queries.
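The analogy with rank and select can be illustrated with naive, linear-time definitions. This is only a sketch of the query semantics, not of the paper's data structures. The names `substring_rank`, `substring_select`, and `count_in_window` are hypothetical, positions are 0-based, and ranking occurrences by their start position is an assumption chosen so that the single-character case reduces to ordinary rank.

```python
def occurrences(text, pattern):
    """Start positions of all (possibly overlapping) occurrences of pattern."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text.startswith(pattern, i)]


def char_rank(text, c, i):
    """Classic rank: number of times character c appears in the prefix text[0:i]."""
    return text[:i].count(c)


def char_select(text, c, j):
    """Classic select: position of the j-th occurrence of c (j starts at 1), or None."""
    return substring_select(text, c, j)


def substring_rank(text, pattern, i):
    """Number of occurrences of pattern that start at a position smaller than i."""
    return sum(1 for s in occurrences(text, pattern) if s < i)


def substring_select(text, pattern, j):
    """Start position of the j-th occurrence of pattern (j starts at 1), or None."""
    occ = occurrences(text, pattern)
    return occ[j - 1] if 1 <= j <= len(occ) else None


def count_in_window(text, pattern, l, r):
    """Occurrences entirely inside text[l..r], written as a difference of two ranks."""
    m = len(pattern)
    # Valid starts are l .. r - m + 1, so count the starts below r - m + 2 and below l.
    diff = substring_rank(text, pattern, r - m + 2) - substring_rank(text, pattern, l)
    return max(0, diff)


text = "abracadabra"
print(char_rank(text, "a", 4))               # 2
print(substring_rank(text, "abra", 8))       # 2
print(substring_select(text, "abra", 2))     # 7
print(count_in_window(text, "abra", 1, 10))  # 1
```

For a one-character pattern, `substring_rank` and `char_rank` return the same value, which is the sense in which substring rank generalizes character rank in this sketch.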
As a byproduct, we show how rank queries can be used to implement fractional cascading in little space, so as to obtain an alternative implementation of a well-known two-dimensional range search data structure by Chazelle. We also show how Grossi et al.’s wavelet trees are suitable for two-dimensional range searching, and their connection with Chazelle’s data structure.
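One plausible way to picture the link between wavelet trees and two-dimensional range searching is the sketch below. It builds a pointer-based wavelet tree over a suffix array and counts the entries in a suffix-array interval whose text position falls in a given window. This is an illustrative, non-succinct version: the class and function names are hypothetical, plain Python lists of prefix counts stand in for the rank structures, and positions are 0-based.

```python
from bisect import bisect_left, bisect_right


class WaveletTree:
    """Wavelet tree over an integer sequence; supports orthogonal range counting."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = (min(seq), max(seq)) if seq else (0, 0)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        # went_left[k] = how many of the first k elements go to the left child.
        self.went_left = [0]
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        for v in seq:
            self.went_left.append(self.went_left[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, hi)

    def range_count(self, i, j, a, b):
        """Count indices k in [i, j) whose value lies in [a, b]."""
        if i >= j or a > b or b < self.lo or a > self.hi:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i
        # Map the index interval into each child using the prefix counts (rank).
        li, lj = self.went_left[i], self.went_left[j]
        return (self.left.range_count(li, lj, a, b)
                + self.right.range_count(i - li, j - lj, a, b))


def build_suffix_array(text):
    """Naive suffix array construction (illustration only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def position_restricted_count(text, sa, wt, pattern, l, r):
    """Occurrences of pattern entirely inside text[l..r], as a 2D range count."""
    m = len(pattern)
    prefixes = [text[i:i + m] for i in sa]  # naive search for the suffix-array interval
    sp, ep = bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)
    # Rectangle: suffix-array index in [sp, ep), text position in [l, r - m + 1].
    return wt.range_count(sp, ep, l, r - m + 1)


text = "abracadabra"
sa = build_suffix_array(text)
wt = WaveletTree(sa)
print(position_restricted_count(text, sa, wt, "a", 3, 7))  # 3
```

Each `range_count` call descends the tree without enumerating the occurrences, so the count does not depend on how many matches lie outside the window. The interval search here is still naive and would be replaced by a real index in practice.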
References
Alstrup, S., Brodal, G., Rauhe, T.: New data structures for orthogonal range searching. In: Proc. 41st IEEE Symposium on Foundations of Computer Science (FOCS), pp. 198–207 (2000)
Apostolico, A.: The myriad virtues of subword trees. In: Combinatorial Algorithms on Words. NATO ASI Series, pp. 85–96. Springer, Heidelberg (1985)
Chazelle, B.: A functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing 17(3), 427–462 (1988)
Clark, D.: Compact Pat Trees. PhD thesis, University of Waterloo (1996)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. 41st IEEE Symposium on Foundations of Computer Science (FOCS), pp. 390–398 (2000)
Ferragina, P., Manzini, G.: Indexing compressed texts. Journal of the ACM 52(4), 552–581 (2005)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: An alphabet-friendly FM-index. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 150–160. Springer, Heidelberg (2004)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representation of sequences and full-text indexes. Technical Report 2004-05, Technische Fakultät, Universität Bielefeld, Germany (December 2004); Submitted to a journal
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Succinct representation of sequences. Technical Report TR/DCC-2004-5, Department of Computer Science, University of Chile, Chile (August 2004), ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/sequences.ps.gz
Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 841–850 (2003)
Grossi, R., Gupta, A., Vitter, J.: When indexing equals compression: Experiments with compressing suffix arrays and applications. In: Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 636–645 (2004)
Jacobson, G.: Space-efficient static trees and graphs. In: Proc. 30th IEEE Symp. Foundations of Computer Science (FOCS 1989), pp. 549–554 (1989)
Kärkkäinen, J.: Repetition-based text indexes. PhD thesis, Dept. of Computer Science, University of Helsinki, Finland (1999); Also available as Report A-1999-4, Series A
Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 935–948 (1993)
Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)
Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. Technical Report TR/DCC-2005-7, Department of Computer Science, University of Chile, Chile (June 2005), ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/survcompr.ps.gz , Submitted to a journal
Raman, R., Raman, V., Srinivasa Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2002), pp. 233–242 (2002)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mäkinen, V., Navarro, G. (2006). Position-Restricted Substring Searching. In: Correa, J.R., Hevia, A., Kiwi, M. (eds) LATIN 2006: Theoretical Informatics. LATIN 2006. Lecture Notes in Computer Science, vol 3887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11682462_64
DOI: https://doi.org/10.1007/11682462_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32755-4
Online ISBN: 978-3-540-32756-1
eBook Packages: Computer Science, Computer Science (R0)