An Application of Self-organizing Data Structures to Compression
List update algorithms have been widely used as subroutines in compression schemes, most notably as part of Burrows-Wheeler compression. The Burrows-Wheeler transform (BWT), which is the basis of many state-of-the-art general-purpose compressors, applies a compression algorithm to a permuted version of the original text. List update algorithms are a common choice for this second stage of BWT-based compression. In this paper we perform an experimental comparison of various list update algorithms, both as stand-alone compression mechanisms and as the second stage of BWT-based compression. Our experiments show that Move-To-Front (MTF) outperforms other list update algorithms in practice after BWT. This is consistent with the intuition that BWT increases locality of reference, and with the result predicted by the locality of reference model of Angelopoulos et al. Lastly, we observe that, due to an often neglected difference in the cost models, good list update algorithms may be far from optimal for BWT compression, and we construct an explicit example of this phenomenon. This fact had yet to be supported theoretically in the literature.
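To make the two-stage pipeline described above concrete, the following is a minimal sketch of a BWT followed by Move-To-Front coding. It is not the implementation used in the paper's experiments: the function names, the `"\0"` sentinel, and the naive sorted-rotations construction are all assumptions chosen for illustration, and a practical compressor would use a suffix array and follow MTF with an entropy coder.

```python
def bwt(text, sentinel="\0"):
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes `sentinel` does not occur in `text` and sorts before every
    other symbol. Quadratic in space; for illustration only.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def mtf_encode(data, alphabet):
    """Move-To-Front: emit each symbol's current list position,
    then move that symbol to the front of the list."""
    lst = list(alphabet)
    out = []
    for ch in data:
        idx = lst.index(ch)
        out.append(idx)
        lst.insert(0, lst.pop(idx))
    return out


def mtf_decode(codes, alphabet):
    """Inverse of mtf_encode, starting from the same initial list."""
    lst = list(alphabet)
    out = []
    for idx in codes:
        ch = lst.pop(idx)
        out.append(ch)
        lst.insert(0, ch)
    return "".join(out)


if __name__ == "__main__":
    transformed = bwt("banana")
    alphabet = sorted(set(transformed))
    codes = mtf_encode(transformed, alphabet)
    print(repr(transformed), codes)
```

Because BWT tends to group identical symbols into runs, the MTF output is dominated by small integers, which a later entropy coder can represent in few bits. Note that this sketch only counts list positions; the number of bits needed to encode those positions is a different cost measure, which is the distinction the abstract draws between the list update and compression cost models.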
- 5. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Technical Report 124, DEC SRC (1994)
- 7. Chapin, B.: Switching between two on-line list update algorithms for higher compression of Burrows-Wheeler transformed data. In: Data Compression Conference, pp. 183–192 (2000)
- 8. Nagy, D.A., Linder, T.: Experimental study of a binary block sorting compression scheme. In: Data Compression Conference, pp. 439–448 (2003)
- 12. Balkenhol, B., Kurtz, S., Shtarkov, Y.M.: Modifications of the Burrows and Wheeler data compression algorithm. In: Data Compression Conference, pp. 188–197 (1999)
- 13. Seward, J.: bzip2, a program and library for data compression, http://www.bzip.org/
- 17. Witten, I.H., Bell, T.: The Calgary text compression corpus. Anonymous ftp from ftp.cpsc.ucalgary.ca/pub/text.compression/corpus/text.compression.corpus.tar.Z
- 18. Arnold, R., Bell, T.C.: A corpus for the evaluation of lossless compression algorithms. In: Data Compression Conference, pp. 201–210 (1997)
- 22. Grinberg, D., Rajagopalan, S., Venkatesan, R., Wei, V.K.: Splay trees for data compression. In: Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 1995), pp. 522–530 (1995)