Skip to main content

Optimal Exact String Matching Based on Suffix Arrays

  • Conference paper
  • First Online:
String Processing and Information Retrieval (SPIRE 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2476))

Included in the following conference series:

Abstract

Using the suffix tree of a string S, decision queries of the type “Is P a substring of S?” can be answered in O(|P|) time and enumeration queries of the type “Where are all z occurrences of P in S?” can be answered in O(|P|+z) time, totally independent of the size of S. However, in large scale applications as genome analysis, the space requirements of the suffix tree are a severe drawback. The suffix array is a more space economical index structure. Using it and an additional table, Manber and Myers (1993) showed that decision queries and enumeration queries can be answered in O(|P|+log|S|) and O(|P|+log|S|+z) time, respectively, but no optimal time algorithms are known. In this paper, we show how to achieve the optimal O(|P|) and O(|P| + z) time bounds for the suffix array. Our approach is not confined to exact pattern matching. In fact, it can be used to efficiently solve all problems that are usually solved by a top-down traversal of the suffix tree. Experiments show that our method is not only of theoretical interest but also of practical relevance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. The Enhanced Suffix Array and its Applications to Genome Analysis. In Proceedings of the Second Workshop on Algorithms in Bioinformatics. Springer Verlag, Lecture Notes in Computer Science, accepted for publication, 2002.

    Google Scholar 

  2. A. Apostolico. The Myriad Virtues of Subword Trees. In Combinatorial Algorithms on Words, Springer Verlag, pages 85–96, 1985.

    Google Scholar 

  3. J. Bentley and R. Sedgewick. Fast Algorithms for Sorting and Searching Strings. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pages 360–369, 1997.

    Google Scholar 

  4. P. Ferragina and G. Manzini. Opportunistic data structures with applications. In IEEE Symposium on Foundations of Computer Science, pages 390–398, 2000.

    Google Scholar 

  5. P. Ferragina and G. Manzini. An experimental study of an opportunistic index. In Symposium on Discrete Algorithms, pages 269–278, 2001.

    Google Scholar 

  6. G. Gonnet, R. Baeza-Yates, and T. Snider. New Indices for Text: PAT trees and PAT arrays. In W. Frakes and R.A. Baeza-Yates, editors, Information Retrieval: Algorithms and Data Structures, pages 66–82. Prentice-Hall, Englewood Cliffs, NJ, 1992.

    Google Scholar 

  7. R. Grossi and J. S. Vitter. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. In ACM Symposium on the Theory of Computing (STOC 2000), pages 397–406. ACM Press, 2000.

    Google Scholar 

  8. D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.

    Google Scholar 

  9. T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and its Applications. In Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, July 2001, Lecture Notes in Computer Science 2089, Springer Verlag, pages 181–192, 2001.

    Google Scholar 

  10. S. Kurtz. Reducing the Space Requirement of Suffix Trees. Software-Practice and Experience, 29(13):1149–1171, 1999.

    Article  Google Scholar 

  11. N. J. Larsson and K. Sadakane. Faster Suffix Sorting. Technical Report LU-CSTR: 99-214, Dept. of Computer Science, Lund University, 1999.

    Google Scholar 

  12. U. Manber and E.W. Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing, 22(5):935–948, 1993.

    Article  MATH  MathSciNet  Google Scholar 

  13. P. Weiner. Linear Pattern Matching Algorithms. In Proceedings of the 14th IEEE Annual Symposium on Switching and Automata Theory, pages 1–11, The University of Iowa, 1973.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abouelhoda, M.I., Ohlebusch, E., Kurtz, S. (2002). Optimal Exact String Matching Based on Suffix Arrays. In: Laender, A.H.F., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2002. Lecture Notes in Computer Science, vol 2476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45735-6_4

Download citation

  • DOI: https://doi.org/10.1007/3-540-45735-6_4

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44158-8

  • Online ISBN: 978-3-540-45735-0

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics