A Fast Algorithm for Constructing Suffix Arrays for Fixed-Size Alphabets

Kim, Dong K.; Jo, Junha; Park, Heejin

doi:10.1007/978-3-540-24838-5_23

Dong K. Kim¹⁶,
Junha Jo¹⁶ &
Heejin Park¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3059))

Included in the following conference series:

International Workshop on Experimental and Efficient Algorithms

903 Accesses
10 Citations

Abstract

The suffix array of a string T is basically a sorted list of all the suffixes of T. Suffix arrays have been fundamental index data structures in computational biology. If we are to search a DNA sequence in a genome sequence, we construct the suffix array for the genome sequence and then search the DNA sequence in the suffix array. In this paper, we consider the construction of the suffix array of T of length n where the size of the alphabet is fixed. It has been well-known that one can construct the suffix array of T in O(n) time by constructing suffix tree of T and traversing the suffix tree. Although this approach takes O(n) time, it is not appropriate for practical use because it uses a lot of spaces and it is complicated to implement. Recently, almost at the same time, several algorithms have been developed to directly construct suffix arrays in O(n) time. However, these algorithms are developed for integer alphabets and thus do not exploit the properties given when the size of the alphabet is fixed. We present a fast algorithm for constructing suffix arrays for the fixed-size alphabet. Our algorithm constructs suffix arrays faster than any other algorithms developed for integer or general alphabets when the size of the alphabet is fixed. For example, we reduced the time required for constructing suffix arrays for DNA sequences by 25%-38%. In addition, we do not sacrifice the space to improve the running time. The space required by our algorithm is almost equal to or even less than those required by previous fast algorithms.

This work is supported by Korea Research Foundation grant KRF-2003-03-D00343.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm, Technical Report 124 (1994), Digital Equipment Corporation (1994)
Google Scholar
Farach, M.: Optimal suffix tree construction with large alphabets. In: IEEE Symp. Found. Computer Science, pp. 137–143 (1997)
Google Scholar
Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. Assoc. Comput. Mach. 47, 987–1011 (2000)
MATH MathSciNet Google Scholar
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: IEEE Symp. Found. Computer Science, pp. 390–398 (2001)
Google Scholar
Ferragina, P., Manzini, G.: An experimental study of an opportunistic index. In: ACM-SIAM Symp. on Discrete Algorithms, pp. 269–278 (2001)
Google Scholar
Gonnet, G., Baeza-Yates, R., Snider, T.: New indices for text: Pat trees and pat arrays. In: Frakes, W.B., Baeza-Yates, R.A. (eds.) Information Retrieval: Data Structures & Algorithms, pp. 66–82. Prentice-Hall, Englewood Cliffs (1992)
Google Scholar
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge Univ. Press, Cambridge (1997)
Book MATH Google Scholar
Gusfield, D.: An “Increment-by-one” approach to suffix arrays and trees (1990) (manuscript)
Google Scholar
Grossi, R., Gupta, A., Vitter, J.S.: When indexing equals compression: Experiments with compressing suffix arrays and applications. In: ACM-SIAM Symp. on Discrete Algorithms (2004)
Google Scholar
Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In: ACM Symp. Theory of Computing, pp. 397–406 (2000)
Google Scholar
Hon, W.K., Sadakane, K., Sung, W.K.: Breaking a time-and-space barrier in constructing full-text indices. In: IEEE Symp. Found. Computer Science, pp. 251–260 (2003)
Google Scholar
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longestcommon- prefix computation in suffix arrays and its applications. In: Symp. Combinatorial Pattern Matching, pp. 181–192 (2001)
Google Scholar
Kärkkäinen, J., Sanders, P.: Simpler linear work suffix array construction. In: Int. Colloq. Automata Languages and Programming, pp. 943–955 (2003)
Google Scholar
Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Symp. Combinatorial Pattern Matching, pp. 186–199 (2003)
Google Scholar
Ko, P., Aluru, S.: Space-efficient linear time construction of suffix arrays. In: Symp. Combinatorial Pattern Matching, pp. 200–210 (2003)
Google Scholar
Manber, U., Myers, G.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22, 935–938 (1993)
Article MATH MathSciNet Google Scholar
McCreight, E.M.: A space-economical suffix tree construction algorithm, J. Assoc. Comput. Mach. 23, 262–272 (1976)
MATH MathSciNet Google Scholar
Sadakane, K.: Compressed text databases with efficient query algorithms based on the compressed suffix array. In: Int. Symp. Algorithms and Computation, pp. 410–421 (2000)
Google Scholar
Sadakane, K.: Succinct representations of lcp Information and improvements in the compressed suffix arrays. In: ACM-SIAM Symp. on Discrete Algorithms, pp. 225–232 (2002)
Google Scholar
Sim, J.S., Kim, D.K., Park, H., Park, K.: Linear-time search in suffix arrays. In: Australasian Workshop on Combinatorial Algorithms, pp. 139–146 (2003)
Google Scholar
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)
Article MATH MathSciNet Google Scholar
Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th IEEE Symp. Switching and Automata Theory, pp. 1–11 (1973)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, Pusan National University, Busan, 609-735, South Korea
Dong K. Kim & Junha Jo
College of Information and Communications, Hanyang University, South Korea
Heejin Park

Authors

Dong K. Kim
View author publications
You can also search for this author in PubMed Google Scholar
Junha Jo
View author publications
You can also search for this author in PubMed Google Scholar
Heejin Park
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science, Universidade Federal Fluminense, Rua Passo da Pátria, 156, Bloco E, RJ 24210-240, Niterói, Brazil
Celso C. Ribeiro
Departamento de Ciência da Computação, Universidade Federal Fluminense, Niterói, RJ, Brazil
Simone L. Martins

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kim, D.K., Jo, J., Park, H. (2004). A Fast Algorithm for Constructing Suffix Arrays for Fixed-Size Alphabets. In: Ribeiro, C.C., Martins, S.L. (eds) Experimental and Efficient Algorithms. WEA 2004. Lecture Notes in Computer Science, vol 3059. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24838-5_23

Download citation

DOI: https://doi.org/10.1007/978-3-540-24838-5_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22067-1
Online ISBN: 978-3-540-24838-5
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics