The Myriad Virtues of Wavelet Trees

  • Paolo Ferragina
  • Raffaele Giancarlo
  • Giovanni Manzini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4051)

Abstract

Wavelet Trees were introduced in [Grossi, Gupta and Vitter, SODA ’03] and were rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compressors. Although several papers have investigated the beauty and usefulness of this data structure in the full-text indexing scenario, its impact on data compression has not been fully explored. In this paper we provide a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees. We also show how to improve their asymptotic performance by introducing a novel framework, called Generalized Wavelet Trees, that aims for the best combination of binary compressors (such as Run-Length encoders), non-binary compressors (such as Huffman and Arithmetic encoders), and Wavelet Trees of properly designed shapes. As a corollary, we prove high-order entropy bounds for the challenging combination of the Burrows-Wheeler Transform and Wavelet Trees.
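To make the combination of Wavelet Trees and binary compressors concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the construction analyzed in the paper: it assumes a balanced tree shape over the sorted alphabet, stores each internal node's binary string as a plain Python string, and uses a naive run-length encoder. The names `Node`, `build`, `run_lengths`, and `rank` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    bits: str                       # binary string stored at this internal node
    left: Optional["Node"] = None   # subtree for symbols routed with bit 0
    right: Optional["Node"] = None  # subtree for symbols routed with bit 1


def build(text: str, alphabet: List[str]) -> Optional[Node]:
    """Build a balanced wavelet tree (assumed shape); leaves store nothing."""
    if len(alphabet) <= 1:
        return None
    mid = len(alphabet) // 2
    left_syms = set(alphabet[:mid])
    # Bit 0 routes a symbol to the left half of the alphabet, bit 1 to the right.
    bits = "".join("0" if c in left_syms else "1" for c in text)
    left_text = "".join(c for c in text if c in left_syms)
    right_text = "".join(c for c in text if c not in left_syms)
    return Node(bits,
                build(left_text, alphabet[:mid]),
                build(right_text, alphabet[mid:]))


def run_lengths(bits: str) -> Tuple[str, List[int]]:
    """Naive run-length encoding: (first bit, lengths of the maximal runs)."""
    if not bits:
        return "", []
    runs = [1]
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)
    return bits[0], runs


def rank(node: Optional[Node], alphabet: List[str], symbol: str, i: int) -> int:
    """Count occurrences of `symbol` (assumed to be in `alphabet`) in text[:i]."""
    while node is not None:
        mid = len(alphabet) // 2
        if symbol in alphabet[:mid]:
            i = node.bits[:i].count("0")
            node, alphabet = node.left, alphabet[:mid]
        else:
            i = node.bits[:i].count("1")
            node, alphabet = node.right, alphabet[mid:]
    return i


text = "abracadabra"
alphabet = sorted(set(text))            # ['a', 'b', 'c', 'd', 'r']
root = build(text, alphabet)
print(root.bits)                        # binary string at the root
print(run_lengths(root.bits))           # its run-length encoding
print(rank(root, alphabet, "a", len(text)))
```

The sketch scans bits linearly to answer `rank`; practical implementations would replace that scan with a succinct rank structure, and the paper's framework additionally considers other tree shapes and other compressors for the node strings.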

Keywords

Internal Node · Binary String · Compression Algorithm · Complete Binary Tree · Alphabet Symbol
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Abel, J.: Improvements to the Burrows-Wheeler compression algorithm: After BWT stages, http://citeseer.ist.psu.edu/abel03improvements.html
  2. Arnavut, Z., Magliveras, S.: Block sorting and compression. In: DCC: Data Compression Conference, pp. 181–190. IEEE Computer Society TCC, Los Alamitos (1997)
  3. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
  4. Ferragina, P., Giancarlo, R., Manzini, G., Sciortino, M.: Boosting textual compression in optimal linear time. Journal of the ACM 52, 688–713 (2005)
  5. Ferragina, P., Manzini, G.: Indexing compressed text. Journal of the ACM 52(4), 552–581 (2005)
  6. Foschini, L., Grossi, R., Gupta, A., Vitter, J.: Fast compression with a static model in high order entropy. In: DCC: Data Compression Conference, pp. 62–71. IEEE Computer Society TCC, Los Alamitos (2004)
  7. Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA 2003), pp. 841–850 (2003)
  8. Grossi, R., Gupta, A., Vitter, J.: When indexing equals compression: Experiments on compressing suffix arrays and applications. In: Proc. 15th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA 2004), pp. 636–645 (2004)
  9. Grossi, R., Vitter, J.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM Journal on Computing 35, 378–407 (2005)
  10. Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing 12(1), 40–66 (2005)
  11. Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Paolo Ferragina (1)
  • Raffaele Giancarlo (2)
  • Giovanni Manzini (3)
  1. Dipartimento di Informatica, Università di Pisa, Italy
  2. Dipartimento di Matematica ed Applicazioni, Università di Palermo, Italy
  3. Dipartimento di Informatica, Università del Piemonte Orientale, Italy
