Engineering a Lightweight Suffix Array Construction Algorithm

Manzini, Giovanni; Ferragina, Paolo

doi:10.1007/3-540-45749-6_61

Giovanni Manzini and
Paolo Ferragina

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2461)

Included in the following conference series:

European Symposium on Algorithms


Abstract

We consider the problem of computing the suffix array of a text T[1, n]. This problem consists in sorting the suffixes of T in lexicographic order. The suffix array [16] (or PAT array [9]) is a simple, easy-to-code, and elegant data structure used for several fundamental string matching problems involving both linguistic texts and biological data [4], [11]. Recently, interest in this data structure has been revitalized by its use as a building block for three novel applications: (1) the Burrows-Wheeler compression algorithm [3], which is a provably [17] and practically [20] effective compression tool; (2) the construction of succinct [10], [19] and compressed [7], [8] indexes, the latter of which can store both the input text and its full-text index using roughly the same space used by traditional compressors for the text alone; and (3) algorithms for clustering and ranking the answers to user queries in web search engines [22]. In all these applications the construction of the suffix array is the computational bottleneck, both in time and in space. This motivated our interest in designing yet another suffix array construction algorithm, one that is fast and "lightweight" in the sense that it uses little space.
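To make the problem statement concrete, the following is a minimal Python sketch of what "sorting the suffixes of T in lexicographic order" means, together with one of the applications listed above: reading the Burrows-Wheeler transform [3] off the suffix array. It is a plain comparison sort, not the lightweight algorithm of this paper; the helper names (compare_suffixes, suffix_array, bwt) are hypothetical, positions are 0-based rather than 1-based, and the BWT helper assumes the text ends with a unique end marker '$' that is smaller than every other character.

```python
from functools import cmp_to_key
from typing import List


def compare_suffixes(text: str, i: int, j: int) -> int:
    """Compare text[i:] and text[j:] lexicographically without copying them."""
    if i == j:
        return 0
    n = len(text)
    while i < n and j < n:
        if text[i] != text[j]:
            return -1 if text[i] < text[j] else 1
        i += 1
        j += 1
    # One suffix is a proper prefix of the other; the shorter one sorts first.
    # Distinct start positions give distinct lengths, so exactly one index hit n.
    return -1 if i == n else 1


def suffix_array(text: str) -> List[int]:
    """Return the 0-based start positions of the suffixes of text in lexicographic order."""
    return sorted(range(len(text)),
                  key=cmp_to_key(lambda i, j: compare_suffixes(text, i, j)))


def bwt(text: str) -> str:
    """Burrows-Wheeler transform read off the suffix array.

    Assumes text ends with a unique end marker '$' smaller than every other
    character, so that sorting suffixes is equivalent to sorting rotations.
    (The assert checks uniqueness and position only, not that '$' is smallest.)
    """
    assert text and text[-1] == "$" and text.count("$") == 1
    # Output the character preceding each sorted suffix; for i == 0,
    # text[i - 1] is text[-1], which wraps around to the end marker.
    return "".join(text[i - 1] for i in suffix_array(text))


print(suffix_array("banana$"))  # [6, 5, 3, 1, 0, 4, 2]
print(bwt("banana$"))           # annb$aa
```

Each comparison may scan O(n) characters, so on highly repetitive texts such as "aaaa…" this naive approach can spend O(n^2 log n) time overall; it avoids copying the suffixes, but it makes no attempt at the time and space engineering the abstract refers to.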

Partially supported by Italian MIUR project on “Technologies and services for enhanced content delivery”.


References

[1] J. L. Bentley and M. D. McIlroy. Engineering a sort function. Software: Practice and Experience, 23(11):1249–1265, 1993.
[2] J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In Proceedings of the 8th ACM-SIAM Symposium on Discrete Algorithms, pages 360–369, 1997.
[3] M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
[4] M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, 1994.
[5] M. Farach-Colton, P. Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. Journal of the ACM, 47(6):987–1011, 2000.
[6] P. Ferragina and R. Grossi. The string B-tree: A new data structure for string search in external memory and its applications. Journal of the ACM, 46(2):236–280, 1999.
[7] P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Proceedings of the 41st IEEE Symposium on Foundations of Computer Science, pages 390–398, 2000.
[8] P. Ferragina and G. Manzini. An experimental study of an opportunistic index. In Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms, pages 269–278, 2001.
[9] G. H. Gonnet, R. A. Baeza-Yates, and T. Snider. New indices for text: PAT trees and PAT arrays. In B. Frakes and R. A. Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, chapter 5, pages 66–82. Prentice-Hall, 1992.
[10] R. Grossi and J. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In Proceedings of the 32nd ACM Symposium on Theory of Computing, pages 397–406, 2000.
[11] D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
[12] H. Itoh and H. Tanaka. An efficient method for in memory construction of suffix arrays. In Proceedings of the 6th Symposium on String Processing and Information Retrieval (SPIRE '99), pages 81–88. IEEE Computer Society Press, 1999.
[13] R. Karp, R. Miller, and A. Rosenberg. Rapid identification of repeated patterns in strings, arrays and trees. In Proceedings of the ACM Symposium on Theory of Computation, pages 125–136, 1972.
[14] S. Kurtz. Reducing the space requirement of suffix trees. Software: Practice and Experience, 29(13):1149–1171, 1999.
[15] N. J. Larsson and K. Sadakane. Faster suffix sorting. Technical Report LUCS-TR:99-214, LUNDFD6/(NFCS-3140)/1-43/(1999), Department of Computer Science, Lund University, Sweden, 1999.
[16] U. Manber and G. Myers. Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 22(5):935–948, 1993.
[17] G. Manzini. An analysis of the Burrows-Wheeler transform. Journal of the ACM, 48(3):407–430, 2001.
[18] P. M. McIlroy and K. Bostic. Engineering radix sort. Computing Systems, 6(1):5–27, 1993.
[19] K. Sadakane. Compressed text databases with efficient query algorithms based on the compressed suffix array. In Proceedings of the 11th International Symposium on Algorithms and Computation, pages 410–421. Springer-Verlag, LNCS 1969, 2000.
[20] J. Seward. The bzip2 home page, 1997. http://sourceware.cygnus.com/bzip2/index.html.
[21] J. Seward. On the performance of BWT sorting algorithms. In DCC: Data Compression Conference, pages 173–182. IEEE Computer Society TCC, 2000.
[22] O. Zamir and O. Etzioni. Grouper: A dynamic clustering interface to web search results. Computer Networks, 31(11–16):1361–1374, 1999.


Author information

Authors and Affiliations

Dipartimento di Informatica, Università del Piemonte Orientale, I-15100, Alessandria, Italy
Giovanni Manzini
Istituto di Informatica e Telematica, CNR, I-56100, Pisa, Italy
Giovanni Manzini
Dipartimento di Informatica, Università di Pisa, I-56100, Pisa, Italy
Paolo Ferragina


Editor information

Editors and Affiliations

Fakultät II: Mathematik und Naturwissenschaften, Technische Universität Berlin, Strasse des 17. Juni 136, 10623, Berlin, Germany
Rolf Möhring
Department of Mathematics and Computer Science, University of Leicester, University Road, LE1 7RH, Leicester, UK
Rajeev Raman


About this paper

Cite this paper

Manzini, G., Ferragina, P. (2002). Engineering a Lightweight Suffix Array Construction Algorithm. In: Möhring, R., Raman, R. (eds) Algorithms — ESA 2002. ESA 2002. Lecture Notes in Computer Science, vol 2461. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45749-6_61


DOI: https://doi.org/10.1007/3-540-45749-6_61
Published: 29 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44180-9
Online ISBN: 978-3-540-45749-7
eBook Packages: Springer Book Archive
