Abstract
It is well known from a theoretical point of view that LZ78 converges asymptotically to the entropy faster than LZ77. A faster rate of convergence to the theoretical compression limit should lead to a better compression ratio, and early LZ78-like and LZ77-like compressors do behave in accordance with the theory. On the contrary, most recent commercial LZ77-like compressors seem to perform better than LZ78-like ones. This is probably due to optimal parsing, a strategy used to factorize the text that, as recent results suggest, can be applied in both the LZ77 and the LZ78 cases. To the best of our knowledge, there are no theoretical results on the rate of convergence to the entropy of either LZ77-like or LZ78-like compressors when a strategy of optimal parsing is used. In this paper we investigate how optimal parsing affects the rate of convergence to the entropy of LZ78-like compressors. We discuss some experimental results on LZ78-like compressors and consider the ratio between the speed of convergence to the entropy of a compressor with optimal parsing and that of a classical LZ78-like compressor. This ratio exhibits a kind of wave effect that grows as the entropy of the memoryless source decreases, yet it always seems to converge slowly to one. These results suggest that, for sources with non-zero entropy, optimal parsing does not improve the speed of convergence to the entropy of LZ78-like compressors.
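The abstract relies on two notions that a small experiment can make concrete: the classical greedy LZ78 factorization, whose cost per symbol approaches the entropy H of a memoryless source only slowly, and optimal parsing, which chooses the factorization of minimum cost instead of the greedy one. The Python sketch below is illustrative only and is not the authors' experimental code. The bit-cost model and the names lz78_parse, lz78_bits, binary_entropy, redundancy, greedy_parse_count and optimal_parse_count are hypothetical. The optimal parse is computed as a shortest path over a fixed (static) dictionary, which is a simplification of the dynamic-dictionary LZ78 setting studied in the paper.

```python
import math
import random


def lz78_parse(text):
    """Classical (greedy) LZ78 factorization.

    Each phrase is the longest dictionary word that is a prefix of the
    remaining text, followed by one extra symbol; the new phrase is then
    added to the dictionary. Returns a list of (pointer, symbol) pairs.
    """
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty word
    phrases = []
    i, n = 0, len(text)
    while i < n:
        j = i
        # Extend the match while text[i:j+1] is already a dictionary word.
        while j < n and text[i:j + 1] in dictionary:
            j += 1
        prefix = text[i:j]
        if j < n:
            symbol = text[j]
            dictionary[prefix + symbol] = len(dictionary)
            phrases.append((dictionary[prefix], symbol))
            i = j + 1
        else:
            # The text ended inside an existing phrase: emit it with no symbol.
            phrases.append((dictionary[prefix], ""))
            i = j
    return phrases


def lz78_bits(phrases, alphabet_size):
    """Bit cost under a simple, assumed code: the k-th phrase (1-based) pays
    ceil(log2 k) bits for its pointer and ceil(log2 |alphabet|) for its symbol."""
    symbol_bits = (alphabet_size - 1).bit_length()  # = ceil(log2 alphabet_size)
    # (k - 1).bit_length() == ceil(log2 k) for k >= 1, computed exactly on integers.
    return sum((k - 1).bit_length() + symbol_bits
               for k in range(1, len(phrases) + 1))


def binary_entropy(p):
    """Entropy, in bits per symbol, of a memoryless binary source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def redundancy(n, p, seed=0):
    """Empirical LZ78 bits per symbol minus H(p) on n symbols of a memoryless
    binary source; LZ78 theory predicts this gap tends to 0 as n grows."""
    rng = random.Random(seed)
    text = "".join("1" if rng.random() < p else "0" for _ in range(n))
    return lz78_bits(lz78_parse(text), 2) / n - binary_entropy(p)


def greedy_parse_count(text, words):
    """Longest-match-first parse over a FIXED dictionary (hypothetical helper).
    Returns the number of phrases, or None if the text cannot be covered."""
    max_len = max(map(len, words))
    i = count = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in words:
                break
        else:
            return None  # no dictionary word starts at position i
        i += length
        count += 1
    return count


def optimal_parse_count(text, words):
    """Minimum number of phrases covering text with words from a FIXED
    dictionary: a shortest path where each phrase costs 1 (hypothetical helper)."""
    inf = float("inf")
    max_len = max(map(len, words))
    best = [inf] * (len(text) + 1)  # best[i] = fewest phrases covering text[:i]
    best[0] = 0
    for i in range(len(text)):
        if best[i] == inf:
            continue
        for length in range(1, min(max_len, len(text) - i) + 1):
            if text[i:i + length] in words and best[i] + 1 < best[i + length]:
                best[i + length] = best[i] + 1
    return best[-1] if best[-1] < inf else None


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        print(n, round(redundancy(n, 0.1), 4))
    # Greedy is not always optimal: with these words, "abcd" needs
    # 3 greedy phrases (ab|c|d) but only 2 optimal ones (a|bcd).
    words = {"a", "b", "c", "d", "ab", "bcd"}
    print(greedy_parse_count("abcd", words), optimal_parse_count("abcd", words))
```

Running the sketch with smaller values of p (lower entropy) is one way to see how slowly the classical LZ78 gap closes. Reproducing the ratio discussed in the abstract would additionally require an LZ78-like compressor with optimal parsing over the dynamically built dictionary, which this sketch does not implement.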
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Aronica, S., Langiu, A., Marzi, F., Mazzola, S., Mignosi, F., Nazzicone, G. (2016). Compressing Big Data: When the Rate of Convergence to the Entropy Matters. In: Kotsireas, I., Rump, S., Yap, C. (eds) Mathematical Aspects of Computer and Information Sciences. MACIS 2015. Lecture Notes in Computer Science, vol 9582. Springer, Cham. https://doi.org/10.1007/978-3-319-32859-1_24
DOI: https://doi.org/10.1007/978-3-319-32859-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32858-4
Online ISBN: 978-3-319-32859-1
eBook Packages: Computer Science, Computer Science (R0)