Abstract
We consider the problem of computing the suffix array of a text T [1], [n]. Thisproblem consists in sorting the suffixes of T in lexicographic order. The suffix array [16] (or pat array [9]) is a simple, easy to code, and elegant data structure used for several fundamental string matching problems involving both linguistic texts and biological data [4], [11]. Recently, the interest in this data structure has been revitalized by its use as a building block for three novel applications: (1) the Burrows-Wheeler compression algorithm [3], which is a provably [17] and practically [20] effective compression tool; (2) the construction of succinct [10], [19] and compressed [7], [8] indexes; the latter can store both the input text and its full-text index using roughly the same space used by traditional compressors for the text alone; and (3) algorithms for clustering and ranking the answers to user queries in web-search engines [22]. In all these applications the construction of the suffix array is the computational bottleneck both in time and space. This motivated our interest in designing yet another suffix array construction algorithm which is fast and “lightweight” in the sense that it uses small space.
Partially supported by Italian MIUR project on “Technologies and services for enhanced content delivery”.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
J. L. Bentley and M.D. McIlroy. Engineering a sort function. Software-Practice and Experience, 23(11):1249–1265, 1993.
J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In Proceedings of the 8th ACM-SIAM Symposium on Discrete Algorithms, pages 360–369, 1997.
M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, 1994.
M. Farach-Colton, P. Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. Journal of the ACM, 47(6):987–1011, 2000.
P. Ferragina and R. Grossi. The string B-tree: A new data structure for string search in external memory and its applications. Journal of the ACM, 46(2):236–280, 1999.
P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Proc. of the 41st IEEE Symposium on Foundations of Computer Science, pages 390–398, 2000.
P. Ferragina and G. Manzini. An experimental study of an opportunistic index. In Proc. 12th ACM-SIAM Symposium on Discrete Algorithms, pages 269–278, 2001.
G.H. Gonnet, R. A. Baeza-Yates, and T. Snider. New indices for text: PAT trees and PAT arrays. In B. Frakes and R.A. Baeza-Yates and, editors, Information Retrieval: Data Structures and Algorithms, chapter 5, pages 66–82. Prentice-Hall, 1992.
R. Grossi and J. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In Proc. of the 32nd ACM Symposium on Theory of Computing, pages 397–406, 2000.
D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
H. Itoh and H. Tanaka. An efficient method for in memory construction of suffix arrays. In Proceedings of the sixth Symposium on String Processing and Information Retrieval, SPIRE’ 99, pages 81–88. IEEE Computer Society Press, 1999.
R. Karp, R. Miller, and A. Rosenberg. Rapid Identification of Repeated Patterns in Strings, Arrays and Trees. In Proceedings of the ACM Symposium on Theory of Computation, pages 125–136, 1972.
S. Kurtz. Reducing the space requirement of suffix trees. Software—Practice and Experience, 29(13):1149–1171, 1999.
N. J. Larsson and K. Sadakane. Faster suffix sorting. Technical Report LUCS-TR:99-214, LUNDFD6/(NFCS-3140)/1-43/(1999), Department of Computer Science, Lund University, Sweden, 1999.
U. Manber and G. Myers. Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 22(5):935–948, 1993.
G. Manzini. An analysis of the Burrows-Wheeler transform. Journal of the ACM, 48(3):407–430, 2001.
P. M. McIlroy and K. Bostic. Engineering radix sort. Computing Systems, 6(1):5–27, 1993.
K. Sadakane. Compressed text databases with efficient query algorithms based on the compressed suffix array. In Proceeding of the 11th International Symposium on Algorithms and Computation, pages 410–421. Springer-Verlag, LNCS n. 1969, 2000.
J. Seward. The bzip2 home page, 1997. http://sourceware.cygnus.com/bzip2/index.html.
J. Seward. On the performance of BWT sorting algorithms. In DCC: Data Compression Conference, pages 173–182. IEEE Computer Society TCC, 2000.
O. Zamir and O. Etzioni. Grouper: A dynamic clustering interface to web search results. Computer Networks, 31(11–16):1361–1374, 1999.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Manzini, G., Ferragina, P. (2002). Engineering a Lightweight Suffix Array Construction Algorithm. In: Möhring, R., Raman, R. (eds) Algorithms — ESA 2002. ESA 2002. Lecture Notes in Computer Science, vol 2461. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45749-6_61
Download citation
DOI: https://doi.org/10.1007/3-540-45749-6_61
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44180-9
Online ISBN: 978-3-540-45749-7
eBook Packages: Springer Book Archive