A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays

Lam, Tak-Wah; Sadakane, Kunihiko; Sung, Wing-Kin; Yiu, Siu-Ming

doi:10.1007/3-540-45655-4_43

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2387)

Included in the following conference series:

International Computing and Combinatorics Conference

Abstract

With the first human DNA decoded into a sequence of about 2.8 billion base pairs, much biological research has centered on analyzing this sequence. Theoretically speaking, it is now feasible to accommodate an index for human DNA in main memory so that any pattern can be located efficiently. This is due to the recent breakthrough on compressed suffix arrays, which reduces the space requirement from O(n log n) bits to O(n) bits. However, constructing compressed suffix arrays is still not an easy task, because the suffix array must be computed first, which requires a working memory of O(n log n) bits (i.e., more than 13 Gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from text. The main contribution is a new construction algorithm that uses only O(n) bits of working memory and, more importantly, keeps the time complexity the same as before, i.e., O(n log n).
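
As a rough illustration of the objects the abstract refers to, the sketch below builds a plain suffix array for a tiny DNA string and derives from it the Ψ function, which is the array that compressed suffix arrays typically store in compressed form. This is not the paper's algorithm: it deliberately takes the naive route of computing the full suffix array first, which is exactly the O(n log n)-bit step the paper sets out to avoid. The function names, the `$` end marker, and the wrap-around convention for the last suffix are assumptions made for this example.

```python
def suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin.

    Assumes text ends with a unique sentinel '$' that sorts before all
    other characters. Suitable only for tiny inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def psi_function(sa):
    """Psi[i] is the rank of the suffix that starts one position after SA[i].

    Assumed convention: the suffix at the last text position wraps around
    to the suffix starting at position 0.
    """
    n = len(sa)
    inverse = [0] * n  # inverse[pos] = rank of the suffix starting at pos
    for rank, pos in enumerate(sa):
        inverse[pos] = rank
    return [inverse[(pos + 1) % n] for pos in sa]


text = "acaaccg$"
sa = suffix_array(text)   # [7, 2, 0, 3, 1, 4, 5, 6]
psi = psi_function(sa)    # [2, 3, 4, 5, 1, 6, 7, 0]
```

Within each block of suffixes that share a first character (here ranks 1 to 3 for `a` and ranks 4 to 6 for `c`), the Ψ values are increasing. That regularity is what allows Ψ to be encoded in O(n) bits rather than the O(n log n) bits of the plain suffix array.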

This research was supported in part by NUS Academic Research Grant R-252-000-119-112

Author information

Authors and Affiliations

Department of Computer Science, University of Hong Kong, Hong Kong
Tak-Wah Lam & Siu-Ming Yiu
Department of System Information Sciences Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Kunihiko Sadakane
Department of Computer Science, National University of Singapore, Singapore
Wing-Kin Sung

Editor information

Editors and Affiliations

Department of Computer Science, University of California, Santa Barbara, California, 93106, USA
Oscar H. Ibarra
Department of Mathematics, National University of Singapore, Singapore, Singapore, 117543
Louxin Zhang

About this paper

Cite this paper

Lam, TW., Sadakane, K., Sung, WK., Yiu, SM. (2002). A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays. In: Ibarra, O.H., Zhang, L. (eds) Computing and Combinatorics. COCOON 2002. Lecture Notes in Computer Science, vol 2387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45655-4_43

DOI: https://doi.org/10.1007/3-540-45655-4_43
Published: 29 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43996-7
Online ISBN: 978-3-540-45655-1
eBook Packages: Springer Book Archive