Abstract
In this paper we describe the Burrows-Wheeler Transform (BWT), a completely new approach to data compression that underlies some of the best compressors available today. Although it is easy to see intuitively why the BWT helps compression, analyzing BWT-based algorithms requires a careful study of every algorithmic component. We describe two algorithms that use the BWT and show that their compression ratio can be bounded in terms of the k-th order empirical entropy of the input string for any k ≥ 0. Intuitively, this means that these algorithms are able to exploit all the regularity present in the input string.
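To make the entropy bound concrete, the sketch below computes the k-th order empirical entropy under its usual definition: for each length-k context w, collect the symbols that follow w in the input, take the zeroth-order entropy of that collection, and average weighted by collection size. The abstract does not spell out the definition, so this is an assumption, and the function names `h0` and `hk` are hypothetical. Entropy is measured in bits per symbol.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """Zeroth-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(s).values())


def hk(s, k):
    """k-th order empirical entropy of s (assumed standard definition)."""
    if k == 0:
        return h0(s)
    # Group every symbol by the k symbols that immediately precede it.
    followers = defaultdict(list)
    for i in range(len(s) - k):
        followers[s[i:i + k]].append(s[i + k])
    # Average of per-context entropies, weighted by context frequency.
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)
```

For example, `hk("abababab", 1)` is zero because each symbol is fully determined by its predecessor, while `h0("abababab")` is one bit per symbol.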
We also discuss some of the algorithmic issues that arise in computing the BWT, and we describe two variants of the BWT that promise interesting developments.
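As a minimal illustration of the transform itself, the sketch below computes the BWT by sorting all rotations of the input and reading off the last column, then inverts it. It is a textbook construction, not one of the algorithms analyzed in the paper. The function names and the use of a `"\0"` end-of-string sentinel are assumptions made for this example. Sorting full rotations takes quadratic space, which is the kind of cost that practical suffix-sorting methods avoid.

```python
def bwt(text, sentinel="\0"):
    """Naive BWT: last column of the sorted rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, sentinel="\0"):
    """Recover the original text from the last column of the sorted rotations."""
    n = len(last)
    # A stable sort of positions by symbol pairs each row's first symbol
    # with the position of that same symbol in the last column.
    order = sorted(range(n), key=lambda i: last[i])
    # The row ending in the sentinel is the unrotated text + sentinel.
    row = last.index(sentinel)
    out = []
    for _ in range(n - 1):
        row = order[row]  # move to the rotation starting one symbol later
        out.append(last[row])
    return "".join(out)
```

With the default sentinel, `bwt("banana")` returns `"annb\0aa"`: equal symbols are grouped together, which is what makes the output easier to compress. `inverse_bwt` applied to that string returns `"banana"`.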
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Manzini, G. (1999). The Burrows-Wheeler Transform: Theory and Practice. In: Kutyłowski, M., Pacholski, L., Wierzbicki, T. (eds) Mathematical Foundations of Computer Science 1999. MFCS 1999. Lecture Notes in Computer Science, vol 1672. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48340-3_4
DOI: https://doi.org/10.1007/3-540-48340-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66408-6
Online ISBN: 978-3-540-48340-3
eBook Packages: Springer Book Archive