An Application of Self-organizing Data Structures to Compression

  • Reza Dorrigiv
  • Alejandro López-Ortiz
  • J. Ian Munro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5526)

Abstract

List update algorithms have been widely used as subroutines in compression schemes, most notably as part of Burrows-Wheeler compression. The Burrows-Wheeler transform (BWT), which is the basis of many state-of-the-art general-purpose compressors, applies a compression algorithm to a permuted version of the original text. List update algorithms are a common choice for this second stage of BWT-based compression. In this paper we perform an experimental comparison of various list update algorithms, both as stand-alone compression mechanisms and as the second stage of BWT-based compression. Our experiments show that Move-To-Front (MTF) outperforms other list update algorithms in practice after BWT. This is consistent with the intuition that BWT increases locality of reference, and with the result predicted by the locality of reference model of Angelopoulos et al. [1]. Lastly, we observe that, due to an often neglected difference in the cost models, good list update algorithms may be far from optimal for BWT compression, and we construct an explicit example of this phenomenon. This is a fact that had yet to be supported theoretically in the literature.
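
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the naive rotation-sorting BWT, and the use of a NUL character as an end-of-text sentinel are all assumptions made for illustration. The list update stage shown is Move-To-Front, which emits the current list position of each symbol and then moves that symbol to the front.

```python
def bwt(text, sentinel="\0"):
    """Naive Burrows-Wheeler transform (illustrative only, quadratic space).

    Appends a sentinel (assumed not to occur in `text`), sorts all
    rotations, and returns the last column of the sorted rotations.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def mtf_encode(data, alphabet):
    """Move-To-Front: output each symbol's list position, then move it to the front."""
    lst = list(alphabet)
    ranks = []
    for ch in data:
        i = lst.index(ch)          # position of the accessed symbol
        ranks.append(i)
        lst.insert(0, lst.pop(i))  # self-organizing step: move to front
    return ranks


def mtf_decode(ranks, alphabet):
    """Inverse of mtf_encode, starting from the same initial list."""
    lst = list(alphabet)
    out = []
    for i in ranks:
        ch = lst[i]
        out.append(ch)
        lst.insert(0, lst.pop(i))
    return "".join(out)


if __name__ == "__main__":
    text = "banana"
    transformed = bwt(text)
    alphabet = sorted(set(transformed))
    ranks = mtf_encode(transformed, alphabet)
    print(repr(transformed), ranks)
```

In a full compressor the resulting positions would then be passed to an integer or entropy coder, so runs of equal symbols produced by the BWT turn into runs of small positions that are cheap to encode. Swapping `mtf_encode` for another list update rule changes only the self-organizing step.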

References

  1. Angelopoulos, S., Dorrigiv, R., López-Ortiz, A.: List update with locality of reference. In: Laber, E.S., Bornstein, C., Nogueira, L.T., Faria, L. (eds.) LATIN 2008. LNCS, vol. 4957, pp. 399–410. Springer, Heidelberg (2008)
  2. Bentley, J.L., Sleator, D.D., Tarjan, R.E., Wei, V.K.: A locally adaptive data compression scheme. Communications of the ACM 29, 320–330 (1986)
  3. Albers, S., Mitzenmacher, M.: Average case analyses of list update algorithms, with applications to data compression. Algorithmica 21(3), 312–329 (1998)
  4. Bachrach, R., El-Yaniv, R., Reinstadtler, M.: On the competitive theory and practice of online list accessing algorithms. Algorithmica 32(2), 201–245 (2002)
  5. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Technical Report 124, DEC SRC (1994)
  6. Kaplan, H., Landau, S., Verbin, E.: A simpler analysis of Burrows-Wheeler based compression. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 282–293. Springer, Heidelberg (2006)
  7. Chapin, B.: Switching between two on-line list update algorithms for higher compression of Burrows-Wheeler transformed data. In: Data Compression Conference, pp. 183–192 (2000)
  8. Nagy, D.A., Linder, T.: Experimental study of a binary block sorting compression scheme. In: Data Compression Conference, pp. 439–448 (2003)
  9. Deorowicz, S.: Improvements to Burrows-Wheeler compression algorithm. Software, Practice, and Experience 30(13), 1465–1483 (2000)
  10. Fenwick, P.M.: The Burrows-Wheeler Transform for block sorting text compression: principles and improvements. The Computer Journal 39(9), 731–740 (1996)
  11. Balkenhol, B., Kurtz, S.: Universal data compression based on the Burrows-Wheeler transformation: Theory and practice. IEEE Transactions on Computers 49(10), 1043–1053 (2000)
  12. Balkenhol, B., Kurtz, S., Shtarkov, Y.M.: Modifications of the Burrows and Wheeler data compression algorithm. In: Data Compression Conference, pp. 188–197 (1999)
  13. Seward, J.: bzip2, a program and library for data compression, http://www.bzip.org/
  14. Sleator, D.D., Tarjan, R.E.: Amortized efficiency of list update and paging rules. Communications of the ACM 28, 202–208 (1985)
  15. Albers, S.: Improved randomized on-line algorithms for the list update problem. SIAM Journal on Computing 27(3), 682–693 (1998)
  16. Schulz, F.: Two new families of list update algorithms. In: Chwa, K.-Y., Ibarra, O.H. (eds.) ISAAC 1998. LNCS, vol. 1533, pp. 99–108. Springer, Heidelberg (1998)
  17. Witten, I.H., Bell, T.: The Calgary text compression corpus. Anonymous ftp from ftp.cpsc.ucalgary.ca/pub/text.compression/corpus/text.compression.corpus.tar.Z
  18. Arnold, R., Bell, T.C.: A corpus for the evaluation of lossless compression algorithms. In: Data Compression Conference, pp. 201–210 (1997)
  19. Elias, P.: Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory 21(2), 194–203 (1975)
  20. Sleator, D.D., Tarjan, R.E.: Self-adjusting binary search trees. Journal of the ACM 32(3), 652–686 (1985)
  21. Jones, D.W.: Application of splay trees to data compression. Communications of the ACM 31(8), 996–1007 (1988)
  22. Grinberg, D., Rajagopalan, S., Venkatesan, R., Wei, V.K.: Splay trees for data compression. In: Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms (SODA 1995), pp. 522–530 (1995)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Reza Dorrigiv
    • 1
  • Alejandro López-Ortiz
    • 1
  • J. Ian Munro
    • 1
  1. Cheriton School of Computer Science, University of Waterloo, Canada