Position-Restricted Substring Searching

Mäkinen, Veli; Navarro, Gonzalo

doi:10.1007/11682462_64

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3887)

Included in the following conference series:

Latin American Symposium on Theoretical Informatics

Abstract

A full-text index is a data structure built over a text string T[1,n]. The most basic functionality provided is (a) counting how many times a pattern string P[1,m] appears in T and (b) locating all those occ positions. There exist several indexes that solve (a) in O(m) time and (b) in O(occ) time. In this paper we propose two new queries: (c) counting how many times P[1,m] appears in T[l,r], and (d) locating all those occ_{l,r} positions. These can be solved using (a) and (b), but this requires O(occ) time, which may be much larger than occ_{l,r}. We present two solutions to (c) and (d) in this paper. The first is an index that requires O(n log n) bits of space and answers (c) in O(m + log n) time and (d) in O(log n) time per occurrence (that is, O(occ_{l,r} log n) time overall). A variant of the first solution answers (c) in O(m + log log n) time and (d) in constant time per occurrence, but requires O(n log^{1+ε} n) bits of space for any constant ε > 0. The second solution requires O(nm log σ) bits of space, solving (c) in O(m⌈log σ / log log n⌉) time and (d) in O(m⌈log σ / log log n⌉) time per occurrence, where σ is the alphabet size. This second structure takes less space when the text is compressible.
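For comparison, the naive route to (c) and (d) via (a) and (b) can be sketched in Python. It finds the suffix-array interval of P, whose size is occ (query (a)), then scans all occ occurrences and keeps those that fit entirely inside T[l,r]. This is a minimal illustration, not the paper's index. The function names are hypothetical, positions are 1-based as in the text, and the suffix array is built by plain sorting for brevity.

```python
from bisect import bisect_left, bisect_right


def suffix_array(text):
    """1-based suffix starting positions in lexicographic order (naive sort)."""
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def sa_interval(text, sa, pattern):
    """Half-open range [sp, ep) of suffix-array entries whose suffix starts with pattern."""
    m = len(pattern)
    # Length-m prefixes of sorted suffixes are still sorted, so binary search applies.
    prefixes = [text[i - 1:i - 1 + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def naive_restricted_locate(text, sa, pattern, l, r):
    """Query (d) by filtering: O(occ) work even when few occurrences lie in T[l, r]."""
    m = len(pattern)
    sp, ep = sa_interval(text, sa, pattern)  # ep - sp == occ, i.e. query (a)
    # Keep starts j whose whole occurrence T[j, j+m-1] lies inside T[l, r].
    return sorted(j for j in sa[sp:ep] if l <= j and j + m - 1 <= r)


text = "abracadabra"
sa = suffix_array(text)
print(naive_restricted_locate(text, sa, "a", 3, 9))           # [4, 6, 8]
print(len(naive_restricted_locate(text, sa, "abra", 1, 10)))  # 1, i.e. query (c)
```

The filtering step is exactly the O(occ) cost that the proposed structures avoid.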

Our solutions can be seen as a generalization of rank and select dictionaries, which allow computing how many times a given character c appears in a prefix T[1,i] and locating the i-th occurrence of c in T. Our solution to (c) extends character rank queries to substring rank queries, and our solution to (d) extends character select queries to substring select queries.
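The rank/select vocabulary can be made concrete with a small sketch. Character rank and select follow the definitions above. For the substring versions we assume one natural definition: rank_P(i) counts occurrences of P that start at or before position i, and select_P(k) returns the start of the k-th occurrence. Under this definition, (c) becomes a difference of two ranks, and (d) becomes a run of selects between them. The occurrence list is precomputed by brute force, so this shows only the query algebra, not the paper's space-efficient structures. All names are hypothetical.

```python
from bisect import bisect_right


def char_rank(text, c, i):
    """Occurrences of character c in the prefix T[1, i] (1-based)."""
    return text[:i].count(c)


def char_select(text, c, k):
    """1-based position of the k-th occurrence of c, or None if there is none."""
    seen = 0
    for pos, ch in enumerate(text, start=1):
        if ch == c:
            seen += 1
            if seen == k:
                return pos
    return None


def occurrence_starts(text, pattern):
    """All 1-based starting positions of pattern (overlaps allowed), increasing."""
    m = len(pattern)
    return [j for j in range(1, len(text) - m + 2) if text[j - 1:j - 1 + m] == pattern]


def substring_rank(starts, i):
    """rank_P(i): occurrences of P starting at a position <= i."""
    return bisect_right(starts, i)


def substring_select(starts, k):
    """select_P(k): starting position of the k-th occurrence of P (k >= 1)."""
    return starts[k - 1]


def restricted_count(starts, m, l, r):
    """Query (c): occurrences with l <= j and j + m - 1 <= r, i.e. j <= r - m + 1."""
    return max(0, substring_rank(starts, r - m + 1) - substring_rank(starts, l - 1))


def restricted_locate(starts, m, l, r):
    """Query (d): select_P(k) for k = rank_P(l - 1) + 1, ..., rank_P(r - m + 1)."""
    lo = substring_rank(starts, l - 1)
    hi = substring_rank(starts, r - m + 1)
    return [substring_select(starts, k) for k in range(lo + 1, hi + 1)]


text = "abracadabra"
print(char_rank(text, "a", 5))                               # 2
print(char_select(text, "a", 3))                             # 6
print(restricted_locate(occurrence_starts(text, "a"), 1, 3, 9))  # [4, 6, 8]
```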

As a byproduct, we show how rank queries can be used to implement fractional cascading in little space, so as to obtain an alternative implementation of a well-known two-dimensional range search data structure by Chazelle. We also show how Grossi et al.’s wavelet trees are suitable for two-dimensional range searching, and discuss their connection with Chazelle’s data structure.
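The wavelet-tree remark can be illustrated with a minimal pointer-based wavelet tree in Python. It counts points (x, y) with x in a position range and y in a value range, where the point set has one value y per position x. Applied to the suffix array, with x the suffix-array index and y = SA[x], it answers (c) as a single two-dimensional count: x ranges over the interval of P, and y ranges over the starting positions l..r−m+1. This is a sketch under our own assumptions (plain Python lists instead of compressed bit vectors, naive suffix-array construction, hypothetical names). It is not claimed to be the paper's exact structure.

```python
from bisect import bisect_left, bisect_right


class WaveletTree:
    """Wavelet tree over a sequence of integers in [lo, hi] (sketch with plain lists)."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf or empty node: no children needed
        mid = (lo + hi) // 2
        # ranks[i] = how many of seq[0:i] have value <= mid (go to the left child).
        self.ranks = [0]
        for v in seq:
            self.ranks.append(self.ranks[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, hi)

    def count(self, i, j, a, b):
        """Number of positions x in [i, j) (0-based) with a <= seq[x] <= b."""
        if i >= j or b < self.lo or self.hi < a:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i  # node's whole value range lies inside [a, b]
        li, lj = self.ranks[i], self.ranks[j]  # map [i, j) into the left child
        return self.left.count(li, lj, a, b) + self.right.count(i - li, j - lj, a, b)


def suffix_array(text):
    """1-based suffix starting positions in lexicographic order (naive sort)."""
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def sa_interval(text, sa, pattern):
    """Half-open range [sp, ep) of suffix-array entries whose suffix starts with pattern."""
    m = len(pattern)
    prefixes = [text[i - 1:i - 1 + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def restricted_count(text, sa, wt, pattern, l, r):
    """Query (c) as a 2D count: SA index in [sp, ep), start position in [l, r - m + 1]."""
    sp, ep = sa_interval(text, sa, pattern)
    return wt.count(sp, ep, l, r - len(pattern) + 1)


text = "abracadabra"
sa = suffix_array(text)
wt = WaveletTree(sa, 1, len(text))
print(restricted_count(text, sa, wt, "a", 3, 9))      # 3
print(restricted_count(text, sa, wt, "abra", 1, 11))  # 2
```

In this sketch, each node's prefix-rank list translates a position range into the child's coordinates with two lookups. As we read the abstract, this kind of rank-based translation is what lets rank queries stand in for fractional-cascading pointers. Locating the counted points (query (d)) would also require mapping positions back up the tree, which the sketch omits.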

References

Alstrup, S., Brodal, G., Rauhe, T.: New data structures for orthogonal range searching. In: Proc. 41st IEEE Symposium on Foundations of Computer Science (FOCS), pp. 198–207 (2000)
Apostolico, A.: The myriad virtues of subword trees. In: Combinatorial Algorithms on Words. NATO ISI Series, pp. 85–96. Springer, Heidelberg (1985)
Chazelle, B.: A functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing 17(3), 427–462 (1988)
Clark, D.: Compact Pat Trees. PhD thesis, University of Waterloo (1996)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. 41st IEEE Symposium on Foundations of Computer Science (FOCS), pp. 390–398 (2000)
Ferragina, P., Manzini, G.: Indexing compressed texts. Journal of the ACM 52(4), 552–581 (2005)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: An alphabet-friendly FM-index. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 150–160. Springer, Heidelberg (2004)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representation of sequences and full-text indexes. Technical Report 2004-05, Technische Fakultät, Universität Bielefeld, Germany (December 2004); submitted to a journal
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Succinct representation of sequences. Technical Report TR/DCC-2004-5, Department of Computer Science, University of Chile, Chile (August 2004), ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/sequences.ps.gz
Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 841–850 (2003)
Grossi, R., Gupta, A., Vitter, J.: When indexing equals compression: Experiments with compressing suffix arrays and applications. In: Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 636–645 (2004)
Jacobson, G.: Space-efficient static trees and graphs. In: Proc. 30th IEEE Symposium on Foundations of Computer Science (FOCS 1989), pp. 549–554 (1989)
Kärkkäinen, J.: Repetition-based text indexes. PhD thesis, Dept. of Computer Science, University of Helsinki, Finland (1999); also available as Report A-1999-4, Series A
Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 935–948 (1993)
Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)
Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. Technical Report TR/DCC-2005-7, Department of Computer Science, University of Chile, Chile (June 2005), ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/survcompr.ps.gz; submitted to a journal
Raman, R., Raman, V., Srinivasa Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2002), pp. 233–242 (2002)

Author information

Authors and Affiliations

Department of Computer Science, University of Helsinki, Finland
Veli Mäkinen
Center for Web Research, Dept. of Computer Science, University of Chile, Chile
Gonzalo Navarro

Editor information

Editors and Affiliations

School of Business, Universidad Adolfo Ibáñez, Chile
José R. Correa
Dept. of Computer Science, University of Chile, Blanco Encalada 2120, 3er piso, Santiago, Chile
Alejandro Hevia
Dept. Ing. Matemática & Ctr. de Modelamiento Matemático, UMI 2807 U. Chile–CNRS, Chile
Marcos Kiwi

About this paper

Cite this paper

Mäkinen, V., Navarro, G. (2006). Position-Restricted Substring Searching. In: Correa, J.R., Hevia, A., Kiwi, M. (eds) LATIN 2006: Theoretical Informatics. LATIN 2006. Lecture Notes in Computer Science, vol 3887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11682462_64

DOI: https://doi.org/10.1007/11682462_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32755-4
Online ISBN: 978-3-540-32756-1
eBook Packages: Computer Science, Computer Science (R0)