Position-Restricted Substring Searching

  • Conference paper
LATIN 2006: Theoretical Informatics (LATIN 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3887)

Abstract

A full-text index is a data structure built over a text string T[1,n]. The most basic functionality provided is (a) counting how many times a pattern string P[1,m] appears in T and (b) locating all those occ positions. There exist several indexes that solve (a) in O(m) time and (b) in O(occ) time. In this paper we propose two new queries, (c) counting how many times P[1,m] appears in T[l,r] and (d) locating all those occ_{l,r} positions. These can be solved using (a) and (b) but this requires O(occ) time. We present two solutions to (c) and (d) in this paper. The first is an index that requires O(n log n) bits of space and answers (c) in O(m + log n) time and (d) in O(log n) time per occurrence (that is, O(occ_{l,r} log n) time overall). A variant of the first solution answers (c) in O(m + log log n) time and (d) in constant time per occurrence, but requires O(n log^{1+ε} n) bits of space for any constant ε > 0. The second solution requires O(nm log σ) bits of space, solving (c) in O(m⌈log σ / log log n⌉) time and (d) in O(m⌈log σ / log log n⌉) time per occurrence, where σ is the alphabet size. This second structure takes less space when the text is compressible.
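
As a minimal sketch of the baseline mentioned above (answering (c) and (d) by running (a) and (b) and then filtering), the Python below uses a naively built suffix array. It assumes that an occurrence lies in T[l,r] when it starts at or after position l and ends at or before position r, with 1-based inclusive positions; the abstract does not spell this out. All function names are hypothetical, and this is not one of the indexes proposed in the paper.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    # Naive construction by sorting suffixes; for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_range(text, sa, pattern):
    # Half-open suffix-array interval [lo, hi) of suffixes that start with pattern.
    # The m-character prefixes are materialised for clarity, not for speed.
    m = len(pattern)
    prefixes = [text[i:i + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def locate_restricted(text, sa, pattern, l, r):
    # Query (d) by brute force: list all occurrences, keep those inside T[l,r].
    # Assumption: an occurrence starting at 1-based position s qualifies
    # when l <= s and s + m - 1 <= r.
    lo, hi = suffix_range(text, sa, pattern)
    m = len(pattern)
    return sorted(p + 1 for p in sa[lo:hi] if l <= p + 1 and p + m <= r)


def count_restricted(text, sa, pattern, l, r):
    # Query (c) by brute force: count what locate_restricted reports.
    return len(locate_restricted(text, sa, pattern, l, r))


if __name__ == "__main__":
    T = "abracadabra"
    SA = build_suffix_array(T)
    print(locate_restricted(T, SA, "abra", 1, 11))
    print(count_restricted(T, SA, "a", 4, 8))
```

The filtering step inspects every one of the occ occurrences in T, even when only a few of them fall inside T[l,r]; avoiding that dependence on occ is what the two new queries are about.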

Our solutions can be seen as a generalization of rank and select dictionaries, which allow computing how many times a given character c appears in a prefix T[1,i] and also locate the i-th occurrence of c in T. Our solution to (c) extends character rank queries to substring rank queries, and our solution to (d) extends character select to substring select queries.
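
To make the character-level operations concrete, here is a small sketch of rank and select over a text, using one sorted position list per character. This is an illustrative, non-succinct stand-in chosen for readability; the class and method names are hypothetical, and positions are 1-based to match the T[1,i] notation.

```python
from bisect import bisect_right
from collections import defaultdict


class NaiveRankSelect:
    """Character rank/select via sorted position lists (not space-efficient)."""

    def __init__(self, text):
        # positions[c] holds the 1-based positions of c in increasing order.
        self.positions = defaultdict(list)
        for i, c in enumerate(text, start=1):
            self.positions[c].append(i)

    def rank(self, c, i):
        # Number of occurrences of c in the prefix T[1,i].
        return bisect_right(self.positions.get(c, []), i)

    def select(self, c, j):
        # 1-based position of the j-th occurrence of c, or None if absent.
        occ = self.positions.get(c, [])
        return occ[j - 1] if 1 <= j <= len(occ) else None

    def count_in_range(self, c, l, r):
        # Occurrences of c in T[l,r], obtained from two rank queries.
        return self.rank(c, r) - self.rank(c, l - 1)


if __name__ == "__main__":
    rs = NaiveRankSelect("abracadabra")
    print(rs.rank("a", 4), rs.select("a", 3), rs.count_in_range("a", 4, 8))
```

The `count_in_range` method shows the pattern that query (c) generalizes: a count restricted to T[l,r] is a difference of two rank values, here for a single character rather than for a substring P[1,m].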

As a byproduct, we show how rank queries can be used to implement fractional cascading in little space, so as to obtain an alternative implementation of a well-known two-dimensional range search data structure by Chazelle. We also show how Grossi et al.’s wavelet trees are suitable for two-dimensional range searching, and their connection with Chazelle’s data structure.
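
The following sketch illustrates, under simplifying assumptions, how a wavelet-tree-like structure can answer two-dimensional range counting: given a sequence of integers, count the positions in a position range whose values fall in a value range. Each node stores plain prefix counts where a succinct implementation would use rank on a bit vector, so this version is not space-efficient. The class name, method names, and the 0-based half-open position convention are choices made for this example, not taken from the paper.

```python
class WaveletTree:
    """Pointer-based wavelet tree over integers, with prefix counts per node."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            if not seq:
                raise ValueError("sequence must be non-empty")
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf or empty node: nothing left to split
        mid = (lo + hi) // 2
        # prefix[i] = how many of the first i elements go to the left child;
        # this plays the role of rank on the node's bit vector.
        self.prefix = [0]
        for v in seq:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, hi)

    def range_count(self, i, j, a, b):
        """Count positions p with i <= p < j (0-based) and a <= seq[p] <= b."""
        if i >= j or b < self.lo or a > self.hi:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i  # the node's whole value range is inside [a, b]
        li, lj = self.prefix[i], self.prefix[j]
        # Map the position range into each child using the prefix counts.
        return (self.left.range_count(li, lj, a, b)
                + self.right.range_count(i - li, j - lj, a, b))


if __name__ == "__main__":
    wt = WaveletTree([3, 1, 4, 1, 5, 9, 2, 6])
    print(wt.range_count(2, 6, 1, 5))
```

Mapping a position range from a node to its children with two prefix-count lookups is the step where rank queries take over the job of explicit pointers between levels; the recursion visits a number of nodes proportional to the tree height for each query.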

References

  1. Alstrup, S., Brodal, G., Rauhe, T.: New data structures for orthogonal range searching. In: Proc. 41st IEEE Symposium on Foundations of Computer Science (FOCS), pp. 198–207 (2000)

  2. Apostolico, A.: The myriad virtues of subword trees. In: Combinatorial Algorithms on Words. NATO ASI Series, pp. 85–96. Springer, Heidelberg (1985)

  3. Chazelle, B.: A functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing 17(3), 427–462 (1988)

  4. Clark, D.: Compact Pat Trees. PhD thesis, University of Waterloo (1996)

  5. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. 41st IEEE Symposium on Foundations of Computer Science (FOCS), pp. 390–398 (2000)

  6. Ferragina, P., Manzini, G.: Indexing compressed texts. Journal of the ACM 52(4), 552–581 (2005)

  7. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: An alphabet-friendly FM-index. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 150–160. Springer, Heidelberg (2004)

  8. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representation of sequences and full-text indexes. Technical Report 2004-05, Technische Fakultät, Universität Bielefeld, Germany (December 2004); Submitted to a journal

  9. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Succinct representation of sequences. Technical Report TR/DCC-2004-5, Department of Computer Science, University of Chile, Chile (August 2004), ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/sequences.ps.gz

  10. Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 841–850 (2003)

  11. Grossi, R., Gupta, A., Vitter, J.: When indexing equals compression: Experiments with compressing suffix arrays and applications. In: Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 636–645 (2004)

  12. Jacobson, G.: Space-efficient static trees and graphs. In: Proc. 30th IEEE Symp. Foundations of Computer Science (FOCS 1989), pp. 549–554 (1989)

  13. Kärkkäinen, J.: Repetition-based text indexes. PhD thesis, Dept. of Computer Science, University of Helsinki, Finland (1999); Also available as Report A-1999-4, Series A

  14. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing 22(5), 935–948 (1993)

  15. Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)

  16. Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)

  17. Navarro, G., Mäkinen, V.: Compressed full-text indexes. Technical Report TR/DCC-2005-7, Department of Computer Science, University of Chile, Chile (June 2005), ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/survcompr.ps.gz; Submitted to a journal

  18. Raman, R., Raman, V., Srinivasa Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2002), pp. 233–242 (2002)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mäkinen, V., Navarro, G. (2006). Position-Restricted Substring Searching. In: Correa, J.R., Hevia, A., Kiwi, M. (eds) LATIN 2006: Theoretical Informatics. LATIN 2006. Lecture Notes in Computer Science, vol 3887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11682462_64

  • DOI: https://doi.org/10.1007/11682462_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32755-4

  • Online ISBN: 978-3-540-32756-1

  • eBook Packages: Computer Science, Computer Science (R0)
