A weight-based dynamic compression method has recently been proposed that is especially suitable for encoding files with locally skewed distributions. Its main idea is to assign larger weights to symbols closer to the one about to be encoded, by means of an increasing weight function, rather than treating every position in the text equally. A well-known transformation that tends to convert input files into files with a more skewed distribution is the Burrows–Wheeler Transform (BWT). This paper proposes applying the weighted approach to Burrows–Wheeler transformed files. Although it is shown that no permutation of the symbols, and hence in particular the BWT, alters the compression performance of static and adaptive arithmetic coding, empirical evidence is provided for the efficiency of combining the BWT with the weighted approach.
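The two claims in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the weight function, so the exponential form g(i) = k^i (realised here by ageing earlier counts by a factor 1/k), the parameter name `k`, the fixed prior of 1 per symbol, and the function names `bwt`, `static_cost_bits` and `weighted_cost_bits` are all assumptions made for illustration. The code computes ideal code lengths in bits rather than running an actual arithmetic coder.

```python
import math
from collections import Counter


def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive BWT: last column of the sorted rotations of text + sentinel."""
    s = text + sentinel  # assumption: the sentinel does not occur in text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def static_cost_bits(text: str) -> float:
    """Ideal static arithmetic coding cost: n times the order-0 entropy."""
    n = len(text)
    counts = sorted(Counter(text).values())  # sorted: sum is order-independent
    return -sum(c * math.log2(c / n) for c in counts)


def weighted_cost_bits(text: str, k: float = 1.0) -> float:
    """Ideal cost of a backward-looking adaptive model with aged counts.

    Assumption: weight g(i) = k**i for position i, implemented by dividing
    all earlier counts by k at every step. k = 1 gives the usual adaptive
    model; k > 1 favours recently seen symbols. Every symbol keeps a fixed
    prior of 1 so that no probability ever reaches zero.
    """
    seen = dict.fromkeys(set(text), 0.0)
    bits = 0.0
    for ch in text:
        total = len(seen) + sum(seen.values())
        bits -= math.log2((1.0 + seen[ch]) / total)
        for c in seen:          # age every earlier occurrence
            seen[c] /= k
        seen[ch] += 1.0         # the newest symbol gets full weight
    return bits


sample = "abracadabra" * 8
transformed = bwt(sample)
# Same multiset of symbols, so the static cost is identical.
print(static_cost_bits(sample + "\0"), static_cost_bits(transformed))
# Unweighted (k = 1) versus weighted (k = 1.2) adaptive cost on the BWT output.
print(weighted_cost_bits(transformed, 1.0), weighted_cost_bits(transformed, 1.2))
```

With k = 1 the adaptive cost depends only on the symbol counts, so it is the same for the input and its BWT. Any gain from the transform can therefore only appear once recent symbols are weighted more heavily (k > 1 in this sketch).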
All data generated or analysed during this study are included in this published article.
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
This article is part of the topical collection “String Processing and Combinatorial Algorithms”, guest edited by Simone Faro.
About this article
Cite this article
Fruchtman, A., Gross, Y., Klein, S.T. et al. Weighted Burrows–Wheeler Compression. SN COMPUT. SCI. 4, 265 (2023). https://doi.org/10.1007/s42979-022-01629-5
- Adaptive compression
- Huffman code
- Arithmetic code
- Burrows–Wheeler Transform