The Myriad Virtues of Wavelet Trees
Wavelet Trees were introduced in [Grossi, Gupta and Vitter, SODA ’03] and were rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compressors. Although several papers have investigated the beauty and usefulness of this data structure in the full-text indexing scenario, its impact on data compression has not been fully explored. In this paper we provide a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees. We also show how to improve their asymptotic performance by introducing a novel framework, called Generalized Wavelet Trees, that aims for the best combination of binary compressors (such as run-length encoders) versus non-binary compressors (such as Huffman and arithmetic encoders) and Wavelet Trees of properly designed shapes. As a corollary, we prove high-order entropy bounds for the challenging combination of the Burrows-Wheeler Transform and Wavelet Trees.
Keywords: Internal Node, Binary String, Compression Algorithm, Complete Binary Tree, Alphabet Symbol
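For readers unfamiliar with the data structure named in the abstract, the following is a minimal sketch of a Wavelet Tree, not the construction analyzed in the paper. It assumes a balanced tree shape over the sorted alphabet and stores each internal node's binary string as a plain Python list; the names `WaveletTree`, `rank`, and `access` are illustrative choices. Counting bits by a linear scan stands in for the succinct rank structures and the binary compressors (for example run-length encoders) that a real implementation would apply to the node bitmaps.

```python
class WaveletTree:
    """Illustrative balanced wavelet tree (assumed shape, uncompressed bitmaps)."""

    def __init__(self, text, alphabet=None):
        # Sorted list of distinct symbols handled by this node.
        self.alphabet = sorted(set(text)) if alphabet is None else alphabet
        self.bits = []          # binary string stored at an internal node
        self.left = None
        self.right = None
        if len(self.alphabet) <= 1:
            return              # leaf: at most one symbol, nothing to store
        mid = len(self.alphabet) // 2
        left_syms = set(self.alphabet[:mid])
        # Bit 0 routes a symbol to the left child, bit 1 to the right child.
        self.bits = [0 if ch in left_syms else 1 for ch in text]
        self.left = WaveletTree(
            [ch for ch in text if ch in left_syms], self.alphabet[:mid])
        self.right = WaveletTree(
            [ch for ch in text if ch not in left_syms], self.alphabet[mid:])

    def _descend(self, bit, i):
        # Count how many of the first i bits equal `bit` (linear scan for
        # clarity), then return the matching child and the mapped position.
        count = sum(1 for b in self.bits[:i] if b == bit)
        return (self.left if bit == 0 else self.right), count

    def rank(self, symbol, i):
        """Number of occurrences of `symbol` in text[:i]."""
        if symbol not in self.alphabet:
            return 0
        if self.left is None:
            return i            # leaf: every position holds `symbol`
        mid = len(self.alphabet) // 2
        bit = 0 if symbol in self.alphabet[:mid] else 1
        child, count = self._descend(bit, i)
        return child.rank(symbol, count)

    def access(self, i):
        """Symbol at position i of the original text."""
        if self.left is None:
            return self.alphabet[0]
        child, count = self._descend(self.bits[i], i)
        return child.access(count)
```

Building `WaveletTree("abracadabra")` splits the alphabet into `{a, b}` and `{c, d, r}` at the root, so the root stores one bit per character. Because each node holds only a binary string, the choice of tree shape and of the compressor applied to those strings can be varied independently, which is the design space the abstract refers to.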
- 1. Abel, J.: Improvements to the Burrows-Wheeler compression algorithm: After BWT stages, http://citeseer.ist.psu.edu/abel03improvements.html
- 3. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
- 6. Foschini, L., Grossi, R., Gupta, A., Vitter, J.: Fast compression with a static model in high order entropy. In: DCC: Data Compression Conference, pp. 62–71. IEEE Computer Society TCC, Los Alamitos (2004)
- 7. Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA 2003), pp. 841–850 (2003)
- 8. Grossi, R., Gupta, A., Vitter, J.: When indexing equals compression: Experiments on compressing suffix arrays and applications. In: Proc. 15th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA 2004), pp. 636–645 (2004)