Compressing Big Data: When the Rate of Convergence to the Entropy Matters

  • Salvatore Aronica (email author)
  • Alessio Langiu
  • Francesca Marzi
  • Salvatore Mazzola
  • Filippo Mignosi
  • Giulio Nazzicone
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9582)

Abstract

It is well known from a theoretical point of view that LZ78 converges asymptotically to the entropy faster than LZ77. A faster rate of convergence to the theoretical compression limit should lead to a better compression ratio. Indeed, early LZ78-like and LZ77-like compressors behave according to the theory. In contrast, most recent commercial LZ77-like compressors seem to perform better than the others. This is probably due to a strategy of optimal parsing, which is used to factorize the text and can be applied to both the LZ77 and LZ78 cases, as recent results suggest. To the best of our knowledge, there are no theoretical results concerning the rate of convergence to the entropy of either LZ77-like or LZ78-like compressors when a strategy of optimal parsing is used. In this paper we investigate how optimal parsing affects the rate of convergence to the entropy of LZ78-like compressors. We discuss some experimental results on LZ78-like compressors and consider the ratio between the speed of convergence to the entropy of a compressor with optimal parsing and that of a classical LZ78-like compressor. This ratio exhibits a kind of wave effect that grows larger as the entropy of the memoryless source decreases, but it always seems to converge slowly to one. These results suggest that, for non-zero entropy sources, optimal parsing does not improve the speed of convergence to the entropy in the case of LZ78-like compressors.
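
As an illustration of the quantity discussed in the abstract, the following is a minimal sketch of classical (greedy) LZ78 parsing on a binary memoryless source, comparing an estimated code length per symbol with the source entropy. It is not the authors' experimental setup: the function names, the fixed-cost code-length model (a pointer of ceil(log2 i) bits plus one symbol per phrase), the source parameter and the text lengths are all assumptions made for this example, and the optimal-parsing variant is not implemented.

```python
import math
import random


def lz78_phrase_count(text):
    """Greedy LZ78 parsing: return the number of phrases in `text`."""
    trie = {}       # maps (parent_node_id, symbol) -> child_node_id
    next_id = 1     # node 0 is the root (the empty phrase)
    node = 0
    phrases = 0
    for symbol in text:
        child = trie.get((node, symbol))
        if child is None:
            # Longest match ended: emit (node, symbol) and grow the dictionary.
            trie[(node, symbol)] = next_id
            next_id += 1
            phrases += 1
            node = 0
        else:
            node = child
    if node != 0:
        phrases += 1  # trailing phrase that is already in the dictionary
    return phrases


def lz78_code_length(phrases, alphabet_size=2):
    """Bits under an assumed fixed-cost model, not a real compressor's output.

    The i-th phrase pays ceil(log2(i)) bits for the dictionary pointer
    (i candidates, counting the empty phrase) plus one raw symbol.
    """
    symbol_bits = (alphabet_size - 1).bit_length()
    pointer_bits = sum((i - 1).bit_length() for i in range(1, phrases + 1))
    return pointer_bits + phrases * symbol_bits


def binary_entropy(p):
    """Entropy in bits per symbol of a binary memoryless source, 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so runs are repeatable
    p = 0.1                  # assumed probability of symbol '1'
    h = binary_entropy(p)
    for n in (10_000, 100_000, 1_000_000):
        text = ["1" if rng.random() < p else "0" for _ in range(n)]
        rate = lz78_code_length(lz78_phrase_count(text)) / n
        print(f"n={n:>9}  rate={rate:.4f}  entropy={h:.4f}  redundancy={rate - h:.4f}")
```

Running the script prints, for each text length, the estimated bits per symbol and its gap from the entropy. Repeating the measurement with a second parser and taking the ratio of the two gaps would give a quantity of the kind the abstract describes, under the same assumed cost model.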

Keywords

Lempel-Ziv compression algorithms; Text compression; Text entropy; String algorithms

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Salvatore Aronica (1), email author
  • Alessio Langiu (1, 2)
  • Francesca Marzi (3)
  • Salvatore Mazzola (1)
  • Filippo Mignosi (3)
  • Giulio Nazzicone (3)

  1. IAMC-CNR Unit of Capo Granitola, National Research Council, Trapani, Italy
  2. King’s College London, London, UK
  3. DISIM Department, University of L’Aquila, L’Aquila, Italy
