Engineering a Lightweight Suffix Array Construction Algorithm

Extended Abstract

  • Conference paper
Algorithms — ESA 2002 (ESA 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2461)

Abstract

We consider the problem of computing the suffix array of a text T[1, n]. This problem consists in sorting the suffixes of T in lexicographic order. The suffix array [16] (or pat array [9]) is a simple, easy to code, and elegant data structure used for several fundamental string matching problems involving both linguistic texts and biological data [4], [11]. Recently, the interest in this data structure has been revitalized by its use as a building block for three novel applications: (1) the Burrows-Wheeler compression algorithm [3], which is a provably [17] and practically [20] effective compression tool; (2) the construction of succinct [10], [19] and compressed [7], [8] indexes; the latter can store both the input text and its full-text index using roughly the same space used by traditional compressors for the text alone; and (3) algorithms for clustering and ranking the answers to user queries in web-search engines [22]. In all these applications the construction of the suffix array is the computational bottleneck in both time and space. This motivated our interest in designing yet another suffix array construction algorithm which is fast and “lightweight” in the sense that it uses small space.
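
To make the definition concrete, the following is a minimal sketch of a suffix array built by direct sorting, together with the Burrows-Wheeler transform read off from it. It is not the lightweight algorithm of the paper: it materializes every suffix as a sort key and so uses quadratic space in the worst case, which is exactly the cost a lightweight construction aims to avoid. The function names `suffix_array` and `bwt_from_suffix_array` are illustrative, positions are 0-based (the abstract writes T[1, n]), and the `$` end marker is an assumption of this example.

```python
def suffix_array(text: str) -> list[int]:
    """Return the start positions of the suffixes of `text`,
    listed in lexicographic order of the suffixes (0-based)."""
    # Naive approach: sort positions using the suffix itself as the key.
    # Simple and correct, but slow and memory-hungry on large inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sa: list[int]) -> str:
    """Read the Burrows-Wheeler transform off a suffix array.

    Assumes `text` ends with a unique end marker that compares smaller
    than every other character (here '$')."""
    # The BWT character for each sorted suffix is the character that
    # precedes it in the text; index -1 wraps around to the end marker.
    return "".join(text[i - 1] for i in sa)


if __name__ == "__main__":
    t = "banana$"
    sa = suffix_array(t)
    print(sa)                            # [6, 5, 3, 1, 0, 4, 2]
    print(bwt_from_suffix_array(t, sa))  # annb$aa
```

The second function illustrates why suffix array construction dominates the cost of application (1): once the suffixes are sorted, the transform is a single linear pass.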

Partially supported by Italian MIUR project on “Technologies and services for enhanced content delivery”.

References

  1. J. L. Bentley and M. D. McIlroy. Engineering a sort function. Software: Practice and Experience, 23(11):1249–1265, 1993.

  2. J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In Proceedings of the 8th ACM-SIAM Symposium on Discrete Algorithms, pages 360–369, 1997.

  3. M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.

  4. M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, 1994.

  5. M. Farach-Colton, P. Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. Journal of the ACM, 47(6):987–1011, 2000.

  6. P. Ferragina and R. Grossi. The string B-tree: A new data structure for string search in external memory and its applications. Journal of the ACM, 46(2):236–280, 1999.

  7. P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Proc. of the 41st IEEE Symposium on Foundations of Computer Science, pages 390–398, 2000.

  8. P. Ferragina and G. Manzini. An experimental study of an opportunistic index. In Proc. 12th ACM-SIAM Symposium on Discrete Algorithms, pages 269–278, 2001.

  9. G. H. Gonnet, R. A. Baeza-Yates, and T. Snider. New indices for text: PAT trees and PAT arrays. In B. Frakes and R. A. Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, chapter 5, pages 66–82. Prentice-Hall, 1992.

  10. R. Grossi and J. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In Proc. of the 32nd ACM Symposium on Theory of Computing, pages 397–406, 2000.

  11. D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.

  12. H. Itoh and H. Tanaka. An efficient method for in memory construction of suffix arrays. In Proceedings of the Sixth Symposium on String Processing and Information Retrieval, SPIRE ’99, pages 81–88. IEEE Computer Society Press, 1999.

  13. R. Karp, R. Miller, and A. Rosenberg. Rapid Identification of Repeated Patterns in Strings, Arrays and Trees. In Proceedings of the ACM Symposium on Theory of Computation, pages 125–136, 1972.

  14. S. Kurtz. Reducing the space requirement of suffix trees. Software: Practice and Experience, 29(13):1149–1171, 1999.

  15. N. J. Larsson and K. Sadakane. Faster suffix sorting. Technical Report LUCS-TR:99-214, LUNDFD6/(NFCS-3140)/1-43/(1999), Department of Computer Science, Lund University, Sweden, 1999.

  16. U. Manber and G. Myers. Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 22(5):935–948, 1993.

  17. G. Manzini. An analysis of the Burrows-Wheeler transform. Journal of the ACM, 48(3):407–430, 2001.

  18. P. M. McIlroy and K. Bostic. Engineering radix sort. Computing Systems, 6(1):5–27, 1993.

  19. K. Sadakane. Compressed text databases with efficient query algorithms based on the compressed suffix array. In Proceedings of the 11th International Symposium on Algorithms and Computation, pages 410–421. Springer-Verlag, LNCS n. 1969, 2000.

  20. J. Seward. The bzip2 home page, 1997. http://sourceware.cygnus.com/bzip2/index.html.

  21. J. Seward. On the performance of BWT sorting algorithms. In DCC: Data Compression Conference, pages 173–182. IEEE Computer Society TCC, 2000.

  22. O. Zamir and O. Etzioni. Grouper: A dynamic clustering interface to web search results. Computer Networks, 31(11–16):1361–1374, 1999.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Manzini, G., Ferragina, P. (2002). Engineering a Lightweight Suffix Array Construction Algorithm. In: Möhring, R., Raman, R. (eds) Algorithms — ESA 2002. ESA 2002. Lecture Notes in Computer Science, vol 2461. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45749-6_61

  • DOI: https://doi.org/10.1007/3-540-45749-6_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44180-9

  • Online ISBN: 978-3-540-45749-7

  • eBook Packages: Springer Book Archive
