Abstract
We present a linear-time algorithm for computing longest-common-prefix information in suffix arrays. We give two applications: we show that the algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm that simulates the bottom-up traversal of a suffix tree using a suffix array combined with the longest-common-prefix information.
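The abstract does not spell out the algorithm, so the following is only a minimal sketch of one well-known way to compute longest-common-prefix values from a suffix array in linear time. It assumes the text fits in memory as a Python string; the names `build_suffix_array`, `lcp_array`, `rank`, and `h` are illustrative and not taken from the paper, and the suffix array is built naively just to make the example self-contained.

```python
def build_suffix_array(text):
    """Naive suffix array construction (not linear time), for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Return lcp, where lcp[k] is the length of the longest common prefix of
    the suffixes starting at sa[k - 1] and sa[k]; lcp[0] is defined as 0."""
    n = len(text)
    # rank is the inverse of sa: rank[i] is the position of suffix i in sa.
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos

    lcp = [0] * n
    h = 0  # number of characters already known to match
    # Visit suffixes in text order. Moving from suffix i to suffix i + 1
    # can reduce the matched length by at most one, so h is decremented
    # at most n times overall and the inner loop does O(n) total work.
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix immediately before suffix i in sa
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(lcp_array(text, sa))
```

One simple use of the resulting array: the largest entry gives the length of the longest substring that occurs at least twice in the text.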
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K. (2001). Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications. In: Amir, A. (eds) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_17
DOI: https://doi.org/10.1007/3-540-48194-X_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42271-6
Online ISBN: 978-3-540-48194-2
eBook Packages: Springer Book Archive