A Fast Algorithm for Constructing Suffix Arrays for Fixed-Size Alphabets

  • Conference paper
Experimental and Efficient Algorithms (WEA 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3059)

Abstract

The suffix array of a string T is a sorted list of all the suffixes of T. Suffix arrays have been fundamental index data structures in computational biology. To search for a DNA sequence in a genome sequence, we construct the suffix array of the genome sequence and then search for the DNA sequence in the suffix array. In this paper, we consider the construction of the suffix array of a string T of length n where the size of the alphabet is fixed. It is well known that the suffix array of T can be constructed in O(n) time by constructing the suffix tree of T and traversing it. Although this approach takes O(n) time, it is not appropriate for practical use because it uses a lot of space and is complicated to implement. Recently, and almost simultaneously, several algorithms have been developed that construct suffix arrays directly in O(n) time. However, these algorithms are designed for integer alphabets and thus do not exploit the properties that hold when the size of the alphabet is fixed. We present a fast algorithm for constructing suffix arrays for fixed-size alphabets. When the size of the alphabet is fixed, our algorithm constructs suffix arrays faster than any other algorithm developed for integer or general alphabets. For example, we reduced the time required for constructing suffix arrays for DNA sequences by 25%-38%. In addition, we do not sacrifice space to improve the running time: the space required by our algorithm is almost equal to, or even less than, that required by previous fast algorithms.
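To make the definition and the search use case above concrete, the following is a minimal sketch in Python. It is not the algorithm presented in this paper: it builds the suffix array by naive comparison sorting, which is far slower than O(n), and then finds a pattern by binary search over the sorted suffixes. The function names `build_suffix_array` and `find_occurrences` and the example string are assumptions made for illustration only.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive illustration only: sorting whole suffixes by comparison is
    much slower than the linear-time constructions the abstract discusses.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text.

    Uses two binary searches over the suffix array sa, comparing the
    pattern with the first len(pattern) characters of each suffix.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


genome = "GATTACA"  # hypothetical toy sequence over the DNA alphabet
sa = build_suffix_array(genome)
print(sa)
print(find_occurrences(genome, sa, "TA"))
```

Because every occurrence of a pattern is a prefix of some suffix, all matches occupy one contiguous range of the suffix array, which is why two binary searches are enough to locate them.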

This work is supported by Korea Research Foundation grant KRF-2003-03-D00343.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, D.K., Jo, J., Park, H. (2004). A Fast Algorithm for Constructing Suffix Arrays for Fixed-Size Alphabets. In: Ribeiro, C.C., Martins, S.L. (eds) Experimental and Efficient Algorithms. WEA 2004. Lecture Notes in Computer Science, vol 3059. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24838-5_23

  • DOI: https://doi.org/10.1007/978-3-540-24838-5_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22067-1

  • Online ISBN: 978-3-540-24838-5

  • eBook Packages: Springer Book Archive
