A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays

  • Conference paper
Computing and Combinatorics (COCOON 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2387)

Abstract

With the first human DNA sequence decoded into about 2.8 billion base pairs, much biological research has centered on analyzing this sequence. Theoretically speaking, it is now feasible to keep an index of human DNA in main memory so that any pattern can be located efficiently. This is due to the recent breakthrough on compressed suffix arrays, which reduce the space requirement from O(n log n) bits to O(n) bits, where n is the length of the text. However, constructing a compressed suffix array is still not easy, because the suffix array must be computed first, which requires O(n log n) bits of working memory (i.e., more than 13 gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from the text. The main contribution is a new construction algorithm that uses only O(n) bits of working memory while, more importantly, keeping the time complexity the same as before, i.e., O(n log n).
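To make concrete what a compressed suffix array stores, here is a minimal sketch of the *conventional* route the abstract describes: build the suffix array first, then derive the Ψ function, Ψ[i] = SA⁻¹[SA[i] + 1], used by the Grossi–Vitter and Sadakane compressed suffix arrays. It is not the authors' O(n)-bit construction, and the function names are hypothetical. The sketch assumes the text ends with a unique terminator `$` that sorts before every other character, and it sorts suffixes naively, which is only reasonable for tiny inputs.

```python
def suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Illustration only; this is far slower than real SA construction algorithms.
    return sorted(range(len(text)), key=lambda i: text[i:])


def psi_from_sa(sa: list[int]) -> list[int]:
    # Inverse suffix array: rank[j] = position of suffix j in the sorted order.
    n = len(sa)
    rank = [0] * n
    for i, j in enumerate(sa):
        rank[j] = i
    # Psi[i] = rank of the suffix starting one position after sa[i].
    # The suffix "$" (the last one) wraps around to the whole text (position 0).
    return [rank[(sa[i] + 1) % n] for i in range(n)]


def decode_text(psi: list[int], first_chars: list[str], start: int) -> str:
    # Psi plus the first character of each sorted suffix is enough to
    # recover the text: begin at the rank of suffix 0 and repeatedly apply Psi.
    out = []
    i = start
    for _ in range(len(psi)):
        out.append(first_chars[i])
        i = psi[i]
    return "".join(out)


text = "acaaccg$"
sa = suffix_array(text)
psi = psi_from_sa(sa)
first_chars = [text[j] for j in sa]  # the sorted "first column" of suffixes
print(sa)   # [7, 2, 0, 3, 1, 4, 5, 6]
print(psi)  # [2, 3, 4, 5, 1, 6, 7, 0]
print(decode_text(psi, first_chars, sa.index(0)))  # acaaccg$
```

Within each block of suffixes that share a first character, Ψ is increasing (3, 4, 5 for `a` and 1, 6, 7 for `c` above). Because of this property, Ψ can be stored as a few gap-encoded increasing sequences in O(n) bits for a constant-size alphabet such as DNA. Note that this sketch still holds `sa` and `rank` as plain integer arrays, i.e., exactly the O(n log n)-bit working space that the paper's direct construction avoids.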

This research was supported in part by NUS Academic Research Grant R-252-000-119-112

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lam, TW., Sadakane, K., Sung, WK., Yiu, SM. (2002). A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays. In: Ibarra, O.H., Zhang, L. (eds) Computing and Combinatorics. COCOON 2002. Lecture Notes in Computer Science, vol 2387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45655-4_43

  • DOI: https://doi.org/10.1007/3-540-45655-4_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43996-7

  • Online ISBN: 978-3-540-45655-1

  • eBook Packages: Springer Book Archive