Most Burrows-Wheeler Based Compressors Are Not Optimal

Conference paper
Combinatorial Pattern Matching (CPM 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4580)

Abstract

We present a technique for proving lower bounds on the compression ratio of algorithms that are based on the Burrows-Wheeler Transform (BWT). We study three well-known BWT-based compressors: the original algorithm suggested by Burrows and Wheeler; BWT with distance coding; and BWT with run-length encoding. For each compressor, we exhibit a Markov source such that, for asymptotically large text generated by the source, the compression ratio divided by the entropy of the source is a constant greater than 1. This constant is 2 − ε, 1.26, and 1.29 for the three compressors, respectively. Our technique is robust and can be used to prove similar claims for most BWT-based compressors (with a few notable exceptions). This stands in contrast to statistical compressors and Lempel-Ziv-style dictionary compressors, which have long been known to be optimal in the sense that, for any Markov source, the compression ratio divided by the entropy of the source asymptotically tends to 1.
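As a rough illustration of the pipelines named above, here is a minimal Python sketch of the first stages of the original Burrows-Wheeler compressor: the BWT followed by move-to-front (MTF). It also includes a plain run-length pass, and it measures an idealized order-0 code length in place of the final entropy coder. This is a sketch under stated assumptions, not the compressors analyzed in the paper:

- The BWT sorts rotations naively and appends a sentinel `\0`, which is assumed to be absent from the input.
- MTF starts from the sorted set of symbols that occur in the input. Real implementations usually start from a fixed alphabet.
- The Huffman or arithmetic coding stage is replaced by the order-0 empirical entropy `h0_bits`.
- Distance coding is not shown.

All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import groupby

SENTINEL = "\0"  # assumption: end-of-string marker that never occurs in the input


def bwt(text: str) -> str:
    """Naive BWT: sort all rotations of text + SENTINEL, return the last column.

    O(n^2 log n) time; suitable only for small illustrative inputs.
    """
    assert SENTINEL not in text, "sentinel must not occur in the input"
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str) -> str:
    """Invert the BWT via the LF mapping (relies on a unique, smallest SENTINEL)."""
    n = len(last)
    # Stable sort of positions by character: row k of the sorted matrix begins
    # with last[order[k]], and row order[k] starts one text position after row k.
    order = sorted(range(n), key=lambda i: last[i])
    k = order[0]  # row 0 starts with SENTINEL, so order[0] is the row starting at text[0]
    out = []
    for _ in range(n - 1):
        k = order[k]
        out.append(last[k])
    return "".join(out)


def move_to_front(data: str) -> list[int]:
    """Emit each symbol's index in a recency list, then move that symbol to the front.

    Assumption: the list starts as the sorted distinct symbols of `data`.
    """
    table = sorted(set(data))
    out = []
    for ch in data:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def run_length(data: str) -> list[tuple[str, int]]:
    """Collapse each maximal run of equal symbols into a (symbol, length) pair."""
    return [(ch, len(list(group))) for ch, group in groupby(data)]


def h0_bits(symbols) -> float:
    """Idealized order-0 code length in bits, n * H0(symbols).

    Stands in for the final Huffman/arithmetic coding stage.
    """
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    text = "abracadabra abracadabra"
    last = bwt(text)
    assert inverse_bwt(last) == text
    mtf = move_to_front(last)
    print("BWT:", repr(last))
    print("MTF:", mtf)
    print("runs in BWT output:", run_length(last))
    print(f"order-0 bits: raw {h0_bits(text):.1f}, after BWT+MTF {h0_bits(mtf):.1f}")
```

Comparing the two order-0 figures on repetitive inputs gives a feel for what the transform buys. The paper's lower bounds, however, concern specific Markov sources, not toy strings like this one.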

We experimentally corroborate our theoretical bounds. Furthermore, we compare BWT-based compressors to other compressors and show that, for "realistic" Markov sources, they indeed perform badly, often worse than other compressors. This contrasts with the well-known fact that, on English text, BWT-based compressors are superior to many other types of compressors.
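One way to measure "compression ratio divided by entropy" in an experiment is to sample text from a known Markov source, compute the source's entropy rate exactly, and divide each compressor's output in bits per symbol by that rate. The minimal sketch below does this for a two-state chain with hypothetical transition probabilities. It uses Python's standard `bz2` module (bzip2, a BWT-based compressor) and `zlib` (DEFLATE, the LZ77-plus-Huffman format used by gzip).

It is an illustration under assumptions, not the paper's experimental setup:

- bzip2 adds its own run-length and Huffman stages, so it differs from the three compressors analyzed.
- Finite sample length and block sizes affect the figures.
- No particular outcome is claimed here.

```python
import bz2
import math
import random
import zlib


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropy_rate(p_ab: float, p_ba: float) -> float:
    """Entropy rate (bits/symbol) of a stationary two-state Markov chain on {a, b}
    with P(a -> b) = p_ab and P(b -> a) = p_ba."""
    pi_a = p_ba / (p_ab + p_ba)  # stationary probability of state a
    pi_b = p_ab / (p_ab + p_ba)  # stationary probability of state b
    return pi_a * binary_entropy(p_ab) + pi_b * binary_entropy(p_ba)


def sample_chain(n: int, p_ab: float, p_ba: float, seed: int = 0) -> bytes:
    """Draw n symbols from the chain, starting from its stationary distribution."""
    rng = random.Random(seed)
    state = "a" if rng.random() < p_ba / (p_ab + p_ba) else "b"
    out = []
    for _ in range(n):
        out.append(state)
        flip = p_ab if state == "a" else p_ba
        if rng.random() < flip:
            state = "b" if state == "a" else "a"
    return "".join(out).encode("ascii")


if __name__ == "__main__":
    n = 200_000
    p_ab, p_ba = 0.1, 0.3  # hypothetical source parameters
    data = sample_chain(n, p_ab, p_ba)
    h = entropy_rate(p_ab, p_ba)
    print(f"source entropy rate: {h:.4f} bits/symbol")
    compressors = {
        "bz2 (BWT-based)": lambda d: bz2.compress(d, 9),
        "zlib (LZ77 + Huffman)": lambda d: zlib.compress(d, 9),
    }
    for name, compress in compressors.items():
        bits = 8 * len(compress(data)) / n
        print(f"{name}: {bits:.4f} bits/symbol, ratio to entropy {bits / h:.3f}")
```

Varying `p_ab`, `p_ba`, and `n` shows how the measured ratios move. The constants stated in the abstract come from specific sources constructed in the paper, not from this toy chain.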

Editor information

Bin Ma, Kaizhong Zhang

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kaplan, H., Verbin, E. (2007). Most Burrows-Wheeler Based Compressors Are Not Optimal. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_13

  • DOI: https://doi.org/10.1007/978-3-540-73437-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73436-9

  • Online ISBN: 978-3-540-73437-6

  • eBook Packages: Computer Science, Computer Science (R0)
