Suffix arrays for multiple strings: A method for on-line multiple string searches

  • Conference paper
Concurrency and Parallelism, Programming, Networking, and Security (ASIAN 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1179)

Abstract

Manber and Myers' suffix array for a single string is a useful data structure for solving string matching problems. In this paper, we show how to generalize their idea to multiple strings. We call this generalization the generalized suffix array. We present algorithms for constructing a generalized suffix array and for searching the array. Let A denote the set of strings for which we are to build a generalized suffix array. Let N be the sum of the lengths of all strings in A and n the length of the longest string in A. Our sort algorithm needs O(N log n) time in the worst case using O(N) storage to construct the generalized suffix array and the information about the longest common prefixes (lcps) between adjacent suffixes in the suffix array, which is required by the search algorithm. Given the suffix array and its lcp information, the search algorithm answers an on-line search query of the type "Is W a substring of some strings in A? If so, where does it occur within strings of A?" in O(|W| + log N) time in the worst case. The above bounds are independent of the size of the underlying alphabet Σ. We then apply the generalized suffix array to the problem of finding all occurrences of an m×m matrix (the pattern) as a submatrix in a larger n×n matrix (the text). Our solution falls into the class of 2D pattern matching algorithms that first preprocess the text and then search for the pattern. After preprocessing the text using O(n^2 log n) time and O(n^2) space, our algorithm can find all occurrences of the pattern in the text in expected time sublinear in the size of the pattern. To the best of our knowledge, our algorithm is the average-case fastest algorithm in its class.
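To make the data structure in the abstract concrete, the following is a minimal Python sketch of a generalized suffix array, its adjacent-lcp table, and a substring query. It is an illustration only, not the paper's algorithm: construction here uses a plain comparison sort (far slower than the stated O(N log n) bound), and the query is a simple binary search costing O(|W| log N) rather than the lcp-accelerated O(|W| + log N) search. The function names and the (string index, offset) representation are assumptions made for this sketch.

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (index of string in A, start offset of the suffix)


def build_generalized_suffix_array(strings: List[str]) -> List[Entry]:
    """Naive construction: sort every suffix of every string lexicographically."""
    entries = [(i, j) for i, s in enumerate(strings) for j in range(len(s))]
    entries.sort(key=lambda e: strings[e[0]][e[1]:])
    return entries


def adjacent_lcps(strings: List[str], gsa: List[Entry]) -> List[int]:
    """lcp[k] = length of the longest common prefix of suffixes k-1 and k."""
    def lcp(a: str, b: str) -> int:
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        return n

    result = [0] * len(gsa)
    for k in range(1, len(gsa)):
        (i1, j1), (i2, j2) = gsa[k - 1], gsa[k]
        result[k] = lcp(strings[i1][j1:], strings[i2][j2:])
    return result


def find_occurrences(strings: List[str], gsa: List[Entry], w: str) -> List[Entry]:
    """Return all (string index, offset) pairs where w occurs, via binary search."""
    def prefix(k: int) -> str:
        i, j = gsa[k]
        return strings[i][j:j + len(w)]

    # Leftmost suffix whose length-|w| prefix is >= w.
    lo, hi = 0, len(gsa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < w:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix after that whose length-|w| prefix is > w.
    hi = len(gsa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= w:
            lo = mid + 1
        else:
            hi = mid
    return sorted(gsa[start:lo])
```

For example, with `A = ["banana", "bandana"]`, calling `find_occurrences(A, build_generalized_suffix_array(A), "ana")` reports offsets 1 and 3 in the first string and offset 4 in the second. The lcp table is computed here only to show what information accompanies the array; this simple query does not use it.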

References

  1. A. Amir, G. Benson and M. Farach, Alphabet Independent Two-dimensional Matching. Proc. the 24th annual ACM Symposium on Theory of Computing, 1992, 59–68.

  2. A. Apostolico and F.P. Preparata, Structural properties of the string statistics problem, Journal of Computer and System Sciences 31 (1985), pp. 394–411.

  3. B. Clift, D. Haussler, R. McConnell, T.D. Schneider and G.D. Stormo, Sequence landscapes, Nucleic Acids Research 4, 1 (1986), pp. 141–158.

  4. W. Chang and E. Lawler, Approximate String Matching in Sublinear Expected Time, Proc. 31st FOCS, St. Louis, MO, Oct. 1990, IEEE, pp. 116–124.

  5. T. Cormen, C. Leiserson and R. Rivest, Introduction to Algorithms. The MIT Press, 1990.

  6. D. Harel and R.E. Tarjan, Fast algorithms for finding nearest common ancestors, SIAM Journal on Computing 13 (1984), pp. 338–355.

  7. L.C.K. Hui, Color set size problem with applications to string matching, Proc. CPM'92, LNCS 644 (1992), Springer-Verlag, pp. 230–243.

  8. R. Karp and M. Rabin, Efficient Randomized Pattern Matching Algorithms. IBM J. Res. Develop. Vol. 31, No. 2, March 1987, pp. 249–260.

  9. E.M. McCreight, A Space-economical Suffix Tree Construction Algorithm. Journal of the ACM 23 (1976), 262–272.

  10. U. Manber and G. Myers, Suffix Arrays: A New Method for On-Line String Searches. Proc. the 1st ACM-SIAM Symposium on Discrete Algorithms, 1990, pp. 319–327.

  11. Fei Shi, An algorithm for two-dimensional pattern matching, Proceedings of the 2nd South American Workshop on String Processing, Valparaíso, Chile, (eds.) U. Manber, R. Baeza-Yates, 1995, pp. 101–116.

  12. Fei Shi, Fast approximate string matching with q-blocks sequences, in Proceedings of the third South American Workshop on String Processing, Carleton University Press, Ottawa, Canada, 1996, pp. 257–271.

  13. B. Schieber and U. Vishkin, On finding lowest common ancestors: Simplification and parallelization, SIAM Journal on Computing 17 (December 1988), pp. 1253–1262.

  14. P. Weiner, Linear Pattern Matching Algorithm, Proc. 14th IEEE Symposium on Switching and Automata Theory, 1973, 1–11.

Editor information

Joxan Jaffar Roland H. C. Yap

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shi, F. (1996). Suffix arrays for multiple strings: A method for on-line multiple string searches. In: Jaffar, J., Yap, R.H.C. (eds) Concurrency and Parallelism, Programming, Networking, and Security. ASIAN 1996. Lecture Notes in Computer Science, vol 1179. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0027775

  • DOI: https://doi.org/10.1007/BFb0027775

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62031-0

  • Online ISBN: 978-3-540-49626-7

  • eBook Packages: Springer Book Archive
