CPM 2015: Combinatorial Pattern Matching, pp. 185–195
Succinct Non-overlapping Indexing
Abstract
Given a text \(\mathsf {T}\) having \(n\) characters, we consider the non-overlapping indexing problem defined as follows: pre-process \(\mathsf {T}\) into a data structure, such that whenever a pattern \(P\) comes as input, we can report a maximal set of non-overlapping occurrences of \(P\) in \(\mathsf {T}\). The best known solution for this problem takes linear space, in which a suffix tree of \(\mathsf {T}\) is augmented with \(O(n)\)-word data structures. A query \(P\) can be answered in optimal \(O(|P|+\mathsf {nocc})\) time, where \(\mathsf {nocc}\) is the output size [Cohen and Porat, ISAAC 2009]. We present the following new result: let \(\mathsf {CSA}\) (not necessarily a compressed suffix array) be an index of \(\mathsf {T}\) that can compute (i) the suffix range of \(P\) in \(\mathsf {search}(P)\) time, and (ii) a suffix array or an inverse suffix array value in \(\mathsf {t_{SA}}\) time; then by using \(\mathsf {CSA}\) alone, we can answer a query \(P\) in \(O(\mathsf {search}(P)+ \mathsf {nocc}\cdot \mathsf {t_{SA}})\) time. Additionally, we present an improved result for a generalized version of this problem called range non-overlapping indexing.
Keywords
Query Time, Suffix Tree, Suffix Array, Output Size, Short Pattern
References
- 1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discret. Algorithms 2(1), 53–86 (2004)
- 2. Alstrup, S., Brodal, G.S., Rauhe, T.: New data structures for orthogonal range searching. In: 41st Annual Symposium on Foundations of Computer Science, FOCS 2000, 12–14 November 2000, Redondo Beach, California, USA, pp. 198–207 (2000)
- 3. Alstrup, S., Brodal, G.S., Rauhe, T.: Optimal static range reporting in one dimension. In: Proceedings on 33rd Annual ACM Symposium on Theory of Computing, 6–8 July 2001, Heraklion, Crete, Greece, pp. 476–482 (2001)
- 4. Belazzougui, D., Navarro, G.: Alphabet-independent compressed text indexing. ACM Trans. Algorithms 10(4), 23 (2014)
- 5. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)
- 6. Cohen, H., Porat, E.: Range non-overlapping indexing. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1044–1053. Springer, Heidelberg (2009)
- 7. Crochemore, M., Iliopoulos, C.S., Kubica, M., Rahman, M.S., Walen, T.: Improved algorithms for the range next value problem and applications. In: STACS 2008, Proceedings of the 25th Annual Symposium on Theoretical Aspects of Computer Science, Bordeaux, France, 21–23 February 2008, pp. 205–216 (2008)
- 8. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)
- 9. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract). In: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, 21–23 May 2000, Portland, OR, USA, pp. 397–40 (2000)
- 10. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York (1997)
- 11. Hon, W., Shah, R., Thankachan, S.V., Vitter, J.S.: On position restricted substring searching in succinct space. J. Discret. Algorithms 17, 109–114 (2012)
- 12. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)
- 13. Keller, O., Kopelowitz, T., Lewenstein, M.: Range non-overlapping indexing and successive list indexing. In: Dehne, F., Sack, J.-R., Zeh, N. (eds.) WADS 2007. LNCS, vol. 4619, pp. 625–636. Springer, Heidelberg (2007)
- 14. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
- 15. Mäkinen, V., Navarro, G.: Position-restricted substring searching. In: Correa, J.R., Hevia, A., Kiwi, M. (eds.) LATIN 2006. LNCS, vol. 3887, pp. 703–714. Springer, Heidelberg (2006)
- 16. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22, 935–948 (1993)
- 17. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007)
- 18. Nekrich, Y., Navarro, G.: Sorted range reporting. In: Fomin, F.V., Kaski, P. (eds.) SWAT 2012. LNCS, vol. 7357, pp. 271–282. Springer, Heidelberg (2012)
- 19. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
- 20. Weiner, P.: Linear pattern matching algorithms. In: 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, 15–17 October 1973, pp. 1–11 (1973)