
The Burrows-Wheeler Transform: Theory and Practice

Invited Lecture

  • Conference paper
Mathematical Foundations of Computer Science 1999 (MFCS 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1672)

Abstract

In this paper we describe the Burrows-Wheeler Transform (BWT), a completely new approach to data compression that is the basis of some of the best compressors available today. Although it is easy to understand intuitively why the BWT helps compression, analyzing BWT-based algorithms requires a careful study of each of their algorithmic components. We describe two algorithms that use the BWT and show that their compression ratio can be bounded in terms of the k-th order empirical entropy of the input string for any k ≥ 0. Intuitively, this means that these algorithms are able to exploit all the regularity present in the input string.
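To make the transform and the entropy measure concrete, here is a minimal Python sketch, not the paper's own code, of the forward BWT, its inverse via the last-to-first (LF) mapping, and the k-th order empirical entropy H_k. It assumes a unique end-of-string sentinel `\0` that sorts before every other character. The original Burrows-Wheeler formulation instead records the row index of the input among the sorted rotations. H_k is computed with one common definition: for each length-k context w, collect the symbols that follow w in the string, then average the zero-order entropy of those collections weighted by their size. The names `bwt`, `inverse_bwt`, `h0` and `hk` are illustrative. Sorting all rotations explicitly costs O(n² log n) time, so this is only suitable for small examples.

```python
from collections import Counter, defaultdict
from math import log2

SENTINEL = "\0"  # assumed end-of-string marker, smaller than every other symbol


def bwt(s: str) -> str:
    """Forward BWT: last column of the sorted cyclic rotations of s + sentinel."""
    assert SENTINEL not in s, "input must not contain the sentinel"
    t = s + SENTINEL
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str) -> str:
    """Invert the BWT using the LF (last-to-first column) mapping."""
    # first[c] = number of symbols in `last` strictly smaller than c,
    # i.e. the row where c's block starts in the (sorted) first column.
    counts = Counter(last)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # lf[i] = row of the rotation obtained by moving last[i] to the front.
    seen = defaultdict(int)
    lf = []
    for c in last:
        lf.append(first[c] + seen[c])
        seen[c] += 1
    # Row 0 starts with the sentinel, so last[0] is the final character of s;
    # following LF walks backwards through s one character at a time.
    out, row = [], 0
    for _ in range(len(last) - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))


def h0(s: str) -> float:
    """Zero-order empirical entropy, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum((m / n) * log2(m / n) for m in Counter(s).values())


def hk(s: str, k: int) -> float:
    """k-th order empirical entropy: size-weighted H_0 of the symbols following each length-k context."""
    if k == 0:
        return h0(s)
    followers = defaultdict(list)
    for i in range(len(s) - k):
        followers[s[i:i + k]].append(s[i + k])
    return sum(len(f) * h0("".join(f)) for f in followers.values()) / len(s)
```

For example, `bwt("banana")` returns `"annb\0aa"`: equal characters end up clustered together, which is what later coding stages (such as the move-to-front coding used in the original Burrows-Wheeler compressor) can exploit. Likewise `h0("abab")` is 1.0 bit per symbol while `hk("abab", 1)` is 0.0, showing how H_k captures regularity that H_0 cannot see.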

We also discuss some of the algorithmic issues that arise in computing the BWT, and we describe two variants of the BWT that promise interesting developments.
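One of the central computational issues is that sorting the rotations naively is far too slow for large blocks. When the input ends with a unique smallest sentinel, sorting its rotations is equivalent to sorting its suffixes, so the BWT can be read off a suffix array: the output at position i is the character preceding the i-th smallest suffix. The minimal Python sketch below builds the suffix array by prefix doubling (the approach of Manber and Myers), which is roughly O(n log² n) with a comparison sort. This is an illustrative choice, not necessarily the technique analyzed in the paper, and practical compressors use faster, specialized suffix sorting. The function names and the `\0` sentinel are assumptions made for the example.

```python
SENTINEL = "\0"  # assumed unique end-of-string marker, smaller than every other symbol


def suffix_array(t: str) -> list[int]:
    """Sort the suffixes of t by prefix doubling; returns their starting positions."""
    n = len(t)
    rank = [ord(c) for c in t]  # initial ranks: single characters
    sa = list(range(n))
    k = 1
    while True:
        def key(i: int, rank=rank, k=k) -> tuple[int, int]:
            # Ranks of t[i:i+k] and t[i+k:i+2k]; -1 means "past the end".
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)  # suffixes now sorted by their first 2k characters
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            return sa
        k *= 2


def bwt_from_suffix_array(s: str) -> str:
    """BWT of s + sentinel, read off the suffix array instead of sorting rotations."""
    assert SENTINEL not in s, "input must not contain the sentinel"
    t = s + SENTINEL
    # Character preceding each sorted suffix; for i == 0, t[-1] wraps to the sentinel.
    return "".join(t[i - 1] for i in suffix_array(t))
```

For instance, `suffix_array("banana\0")` yields `[6, 5, 3, 1, 0, 4, 2]`, and `bwt_from_suffix_array("banana")` gives the same `"annb\0aa"` as sorting all rotations, while using only O(n) extra integers rather than n full rotations.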




Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Manzini, G. (1999). The Burrows-Wheeler Transform: Theory and Practice. In: Kutyłowski, M., Pacholski, L., Wierzbicki, T. (eds) Mathematical Foundations of Computer Science 1999. MFCS 1999. Lecture Notes in Computer Science, vol 1672. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48340-3_4


  • DOI: https://doi.org/10.1007/3-540-48340-3_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66408-6

  • Online ISBN: 978-3-540-48340-3

  • eBook Packages: Springer Book Archive
