
Move-to-Front, Distance Coding, and Inversion Frequencies Revisited

  • Travis Gagie
  • Giovanni Manzini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4580)

Abstract

Move-to-Front, Distance Coding and Inversion Frequencies are three somewhat related techniques used to process the output of the Burrows-Wheeler Transform. In this paper we analyze these techniques from the point of view of how effective they are in the task of compressing low-entropy strings, that is, strings which have many regularities and are therefore highly compressible. This is a non-trivial task since many compressors have non-constant overheads that become non-negligible when the input string is highly compressible.
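To make the setting concrete, here is a minimal Python sketch of the first two stages of such a pipeline: a naive Burrows-Wheeler Transform (sorting all rotations, only suitable for toy inputs) followed by textbook Move-to-Front. The function names, the `\0` sentinel, and the choice of a sorted initial MTF table are illustrative assumptions, not taken from the paper. The final entropy coder that would normally follow is omitted.

```python
def bwt(s: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler Transform: sort all rotations of s + sentinel
    and return the last column. Assumes the sentinel does not occur in s
    and sorts before every other character."""
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def mtf_encode(s: str, alphabet: list[str]) -> list[int]:
    """Move-to-Front: emit each symbol's current index in the table,
    then move that symbol to the front of the table."""
    table = list(alphabet)
    out = []
    for ch in s:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))
    return out


def mtf_decode(codes: list[int], alphabet: list[str]) -> str:
    """Inverse of mtf_encode, starting from the same initial table."""
    table = list(alphabet)
    out = []
    for i in codes:
        ch = table.pop(i)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)


if __name__ == "__main__":
    last = bwt("banana")
    alphabet = sorted(set(last))
    codes = mtf_encode(last, alphabet)
    print(repr(last), codes)  # 'annb\x00aa' [1, 3, 0, 3, 3, 3, 0]
    assert mtf_decode(codes, alphabet) == last
```

The BWT groups equal characters into runs, and MTF turns each repeat within a run into a 0; on longer, more repetitive inputs the MTF output is dominated by small integers, which is what the subsequent entropy coder exploits.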

Because of the properties of the Burrows-Wheeler transform, being locally optimal ensures an algorithm compresses low-entropy strings effectively. Informally, local optimality implies that an algorithm is able to effectively compress an arbitrary partition of the input string. We show that in their original formulation neither Move-to-Front, nor Distance Coding, nor Inversion Frequencies is locally optimal. Then, we describe simple variants of the above algorithms which are locally optimal. To achieve local optimality with Move-to-Front it suffices to combine it with Run Length Encoding. To achieve local optimality with Distance Coding and Inversion Frequencies we use a novel “escape and re-enter” strategy. Since we build on previous results, our analyses are simple and shed new light on the inner workings of the three techniques considered in this paper.
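The abstract states that combining Move-to-Front with Run Length Encoding suffices for local optimality, but it does not spell out how the two are combined. The Python sketch below assumes one common arrangement: each maximal run of zeros in the MTF output is replaced by a single length token. The `("Z", length)` / `("S", value)` token format and the function names are hypothetical, and the coding of the tokens themselves, which any optimality analysis must account for, is omitted.

```python
from itertools import groupby


def mtf_encode(s: str, alphabet: list[str]) -> list[int]:
    """Plain Move-to-Front, as in the classic scheme."""
    table = list(alphabet)
    out = []
    for ch in s:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))
    return out


def rle_zero_runs(codes: list[int]) -> list[tuple[str, int]]:
    """Collapse each maximal run of zeros into ("Z", run_length);
    every nonzero value is passed through as ("S", value)."""
    tokens = []
    for is_zero, group in groupby(codes, key=lambda c: c == 0):
        run = list(group)
        if is_zero:
            tokens.append(("Z", len(run)))
        else:
            tokens.extend(("S", c) for c in run)
    return tokens


def rle_zero_runs_decode(tokens: list[tuple[str, int]]) -> list[int]:
    """Expand the tokens back into the original MTF sequence."""
    codes = []
    for kind, value in tokens:
        if kind == "Z":
            codes.extend([0] * value)
        else:
            codes.append(value)
    return codes


if __name__ == "__main__":
    # A highly compressible input: two long runs of single characters.
    s = "a" * 1000 + "b" * 1000
    codes = mtf_encode(s, ["a", "b"])
    tokens = rle_zero_runs(codes)
    print(len(codes), tokens)  # 2000 [('Z', 1000), ('S', 1), ('Z', 999)]
    assert rle_zero_runs_decode(tokens) == codes
```

An encoder that spends at least one bit per emitted integer, such as a Huffman coder applied directly to the MTF values, would pay for every one of the 2000 integers in this example; after the run-length step only three tokens remain.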




Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Travis Gagie (1)
  • Giovanni Manzini (1)
  1. Dipartimento di Informatica, Università del Piemonte Orientale, I-15100 Alessandria, Italy
