Advertisement

Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform

  • Alberto Policriti
  • Nicola Gigante
  • Nicola PrezzaEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8977)

Abstract

The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let \(n\), \(\sigma \), and \(H_k\) be the text length, the alphabet size, and the \(k\)-th order empirical entropy of the text, respectively. Moreover, let \(H_k^*=\min \{H_k+1,\lceil \log \sigma \rceil \}\). Under the word RAM model with word size \(w\in \Theta (\log n)\), our algorithm builds the BWT in average \(\mathcal {O}(nH_k^*)\) time using \(nH_k^* + o(nH_k^*)\) bits of space, where \(k=\log _\sigma (n/\log ^2 n)-1\). We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.

Keywords

Internal Node Input Symbol Automaton State Word Size Alphabet Size 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Belazzougui, D.: Linear time construction of compressed text indices in compact space. arXiv preprint arXiv:1401.0936 (2014)
  2. 2.
    Beller, T., Zwerger, M., Gog, S., Ohlebusch, E.: Space-efficient construction of the burrows-wheeler transform. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 5–16. Springer, Heidelberg (2013) CrossRefGoogle Scholar
  3. 3.
    de Bruijn, N.G., Erdos, P.: A combinatorial problem. Koninklijke Nederlandse Akademie v. Wetenschappen 49(49), 758–764 (1946)zbMATHGoogle Scholar
  4. 4.
    Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm (1994)Google Scholar
  5. 5.
    Ferragina, P., Gagie, T., Manzini, G.: Lightweight data indexing and compression in external memory. Algorithmica 63(3), 707–730 (2012)CrossRefzbMATHMathSciNetGoogle Scholar
  6. 6.
    Ferragina, P., Giancarlo, R., Manzini, G., Sciortino, M.: Boosting textual compression in optimal linear time. Journal of the ACM (JACM) 52(4), 688–713 (2005)CrossRefMathSciNetGoogle Scholar
  7. 7.
    Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, pp. 390–398. IEEE (2000)Google Scholar
  8. 8.
    Fredman, M., Saks, M.: The cell probe complexity of dynamic data structures. In: Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, pp. 345–354. ACM (1989)Google Scholar
  9. 9.
    Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: plug and play with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 326–337. Springer, Heidelberg (2014) CrossRefGoogle Scholar
  10. 10.
    González, R., Navarro, G.: Rank/select on dynamic compressed sequences and applications. Theoretical Computer Science 410(43), 4414–4422 (2009)CrossRefzbMATHMathSciNetGoogle Scholar
  11. 11.
    He, M., Munro, J.I.: Succinct representations of dynamic strings. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 334–346. Springer, Heidelberg (2010) CrossRefGoogle Scholar
  12. 12.
    Huffman, D.A., et al.: A method for the construction of minimum redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)CrossRefGoogle Scholar
  13. 13.
    Lippert, R.A., Mobarry, C.M., Walenz, B.P.: A space-efficient construction of the burrows-wheeler transform for genomic data. Journal of Computational Biology 12(7), 943–951 (2005)CrossRefGoogle Scholar
  14. 14.
    Manzini, G.: An analysis of the burrows-wheeler transform. Journal of the ACM (JACM) 48(3), 407–430 (2001)CrossRefMathSciNetGoogle Scholar
  15. 15.
    Navarro, G.: Wavelet trees for all. Journal of Discrete Algorithms 25, 2–20 (2014)CrossRefzbMATHMathSciNetGoogle Scholar
  16. 16.
    Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys (CSUR) 39(1), 2 (2007)CrossRefGoogle Scholar
  17. 17.
    Navarro, G., Nekrich, Y.: Optimal dynamic sequence representations. SIAM Journal on Computing 43(5), 1781–1806 (2014)CrossRefzbMATHMathSciNetGoogle Scholar
  18. 18.
    Navarro, G., Sadakane, K.: Fully functional static and dynamic succinct trees. ACM Transactions on Algorithms (TALG) 10(3), 16 (2014)MathSciNetGoogle Scholar
  19. 19.
    Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: Data Compression Conference, DCC 2009, pp. 193–202. IEEE (2009)Google Scholar
  20. 20.
    Raman, R., Raman, V., Rao, S.S.: Succinct dynamic data structures. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 426–437. Springer, Heidelberg (2001) CrossRefGoogle Scholar
  21. 21.
    Tischler, G.: Faster average case low memory semi-external construction of the burrows-wheeler transform. In: CEUR Workshop Proceedings, vol. 1146, pp. 61–68 (2014)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Alberto Policriti
    • 1
    • 2
  • Nicola Gigante
    • 1
  • Nicola Prezza
    • 1
    Email author
  1. 1.Department of Mathematics and InformaticsUniversity of UdineUdineItaly
  2. 2.Istituto di Genomica ApplicataUdineItaly

Personalised recommendations