Optimal Exact String Matching Based on Suffix Arrays

  • Mohamed Ibrahim Abouelhoda
  • Enno Ohlebusch
  • Stefan Kurtz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2476)

Abstract

Using the suffix tree of a string S, decision queries of the type “Is P a substring of S?” can be answered in O(|P|) time, and enumeration queries of the type “Where are all z occurrences of P in S?” can be answered in O(|P|+z) time, independent of the size of S. However, in large-scale applications such as genome analysis, the space requirements of the suffix tree are a severe drawback. The suffix array is a more space-economical index structure. Using it and an additional table, Manber and Myers (1993) showed that decision queries and enumeration queries can be answered in O(|P|+log|S|) and O(|P|+log|S|+z) time, respectively, but no optimal-time algorithms are known. In this paper, we show how to achieve the optimal O(|P|) and O(|P|+z) time bounds for the suffix array. Our approach is not confined to exact pattern matching: it can be used to efficiently solve all problems that are usually solved by a top-down traversal of the suffix tree. Experiments show that our method is not only of theoretical interest but also of practical relevance.
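
To make the two query types concrete, here is a minimal Python sketch of the plain suffix-array baseline. It is not the method of this paper, nor the Manber and Myers variant with its additional table: it uses an ordinary binary search, so a query costs O(|P| log|S|) time, plus the time to report the z occurrences. The function names (build_suffix_array, find_interval, contains, occurrences) and the naive sorting-based construction are assumptions made for illustration only.

```python
def build_suffix_array(s):
    # Naive construction (illustrative only): sort the suffix start
    # positions by the text of the suffix they point to.
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_interval(s, sa, p):
    """Return (lo, hi) such that sa[lo:hi] holds every suffix that starts with p."""
    m = len(p)

    # Lower bound: first suffix whose length-m prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def contains(s, sa, p):
    # Decision query: "Is P a substring of S?"
    lo, hi = find_interval(s, sa, p)
    return lo < hi


def occurrences(s, sa, p):
    # Enumeration query: "Where are all z occurrences of P in S?"
    lo, hi = find_interval(s, sa, p)
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    text = "mississippi"
    sa = build_suffix_array(text)
    print(contains(text, sa, "sip"))
    print(occurrences(text, sa, "ssi"))
```

All occurrences of P form one contiguous interval of the suffix array, so an enumeration query costs only z extra steps once that interval is found. The sketch pays a log|S| factor on every comparison to locate the interval; the bounds quoted in the abstract remove that factor.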

Keywords

Space Requirement; Suffix Tree; Input String; Alphabet Size; Additional Table
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. The Enhanced Suffix Array and its Applications to Genome Analysis. In Proceedings of the Second Workshop on Algorithms in Bioinformatics. Springer Verlag, Lecture Notes in Computer Science, accepted for publication, 2002.
  2. A. Apostolico. The Myriad Virtues of Subword Trees. In Combinatorial Algorithms on Words, Springer Verlag, pages 85–96, 1985.
  3. J. Bentley and R. Sedgewick. Fast Algorithms for Sorting and Searching Strings. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pages 360–369, 1997.
  4. P. Ferragina and G. Manzini. Opportunistic data structures with applications. In IEEE Symposium on Foundations of Computer Science, pages 390–398, 2000.
  5. P. Ferragina and G. Manzini. An experimental study of an opportunistic index. In Symposium on Discrete Algorithms, pages 269–278, 2001.
  6. G. Gonnet, R. Baeza-Yates, and T. Snider. New Indices for Text: PAT trees and PAT arrays. In W. Frakes and R. A. Baeza-Yates, editors, Information Retrieval: Algorithms and Data Structures, pages 66–82. Prentice-Hall, Englewood Cliffs, NJ, 1992.
  7. R. Grossi and J. S. Vitter. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. In ACM Symposium on the Theory of Computing (STOC 2000), pages 397–406. ACM Press, 2000.
  8. D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.
  9. T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and its Applications. In Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, July 2001, Lecture Notes in Computer Science 2089, Springer Verlag, pages 181–192, 2001.
  10. S. Kurtz. Reducing the Space Requirement of Suffix Trees. Software-Practice and Experience, 29(13):1149–1171, 1999.
  11. N. J. Larsson and K. Sadakane. Faster Suffix Sorting. Technical Report LU-CSTR: 99-214, Dept. of Computer Science, Lund University, 1999.
  12. U. Manber and E. W. Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing, 22(5):935–948, 1993.
  13. P. Weiner. Linear Pattern Matching Algorithms. In Proceedings of the 14th IEEE Annual Symposium on Switching and Automata Theory, pages 1–11, The University of Iowa, 1973.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Mohamed Ibrahim Abouelhoda (1)
  • Enno Ohlebusch (1)
  • Stefan Kurtz (1)
  1. Faculty of Technology, University of Bielefeld, Bielefeld, Germany
