Compressing Big Data: When the Rate of Convergence to the Entropy Matters

  • Conference paper
Mathematical Aspects of Computer and Information Sciences (MACIS 2015)

Abstract

It is well known from a theoretical point of view that LZ78 converges asymptotically to the entropy faster than LZ77. A faster rate of convergence to the theoretical compression limit should lead to a better compression ratio. Indeed, early LZ78-like and LZ77-like compressors behave according to the theory. By contrast, most recent commercial LZ77-like compressors seem to perform better than the others. This is probably due to an optimal parsing strategy, which is used to factorize the text and can be applied to both the LZ77 and LZ78 cases, as recent results suggest. To the best of our knowledge, there are no theoretical results concerning the rate of convergence to the entropy in either the LZ77-like or the LZ78-like case when an optimal parsing strategy is used. In this paper we investigate how optimal parsing affects the rate of convergence to the entropy of LZ78-like compressors. We discuss some experimental results on LZ78-like compressors, and we consider the ratio between the speed of convergence to the entropy of a compressor with optimal parsing and that of a classical LZ78-like compressor. This ratio exhibits a kind of wave effect that becomes larger and larger as the entropy of the memoryless source decreases, but it always seems to converge slowly to one. These results suggest that, for non-zero entropy sources, optimal parsing does not improve the speed of convergence to the entropy in the case of LZ78-like compressors.
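As an illustration of what "rate of convergence to the entropy" means for a classical LZ78-like compressor, the following minimal sketch parses a binary memoryless source greedily and compares an estimated rate with the source entropy. It is not the authors' experimental setup and does not implement optimal parsing. The function names (`lz78_phrase_count`, `lz78_bits_per_symbol`, `binary_entropy`) are hypothetical, and the cost model (a fixed-width dictionary pointer plus one fixed-width symbol per phrase) is a simplifying assumption made here.

```python
import math
import random


def lz78_phrase_count(text):
    """Count the phrases of a greedy LZ78 parse of `text`."""
    trie = {}      # (parent node id, symbol) -> child node id; node 0 is the root
    next_id = 1
    node = 0
    phrases = 0
    for symbol in text:
        child = trie.get((node, symbol))
        if child is not None:
            node = child                     # keep extending the current match
        else:
            trie[(node, symbol)] = next_id   # new phrase = longest match + one symbol
            next_id += 1
            phrases += 1
            node = 0
    if node != 0:
        phrases += 1                         # trailing phrase already in the dictionary
    return phrases


def lz78_bits_per_symbol(text, alphabet_size=2):
    """Estimated rate under an assumed fixed-width cost model (not the paper's)."""
    if not text:
        return 0.0
    c = lz78_phrase_count(text)
    pointer_bits = math.ceil(math.log2(c))             # assumed pointer width
    symbol_bits = math.ceil(math.log2(alphabet_size))  # assumed symbol width
    return c * (pointer_bits + symbol_bits) / len(text)


def binary_entropy(p):
    """Entropy in bits per symbol of a memoryless binary source with P(1) = p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed, chosen only for repeatability
    p = 0.1                  # illustrative source parameter
    for n in (10**3, 10**4, 10**5):
        text = "".join("1" if rng.random() < p else "0" for _ in range(n))
        print(n, round(lz78_bits_per_symbol(text), 4), round(binary_entropy(p), 4))
```

Running the script prints, for each input length, the estimated rate next to the entropy of the source. Under the assumptions above, the gap between the two columns is the quantity whose decay the abstract calls the speed of convergence.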

Author information

Corresponding author

Correspondence to Salvatore Aronica.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Aronica, S., Langiu, A., Marzi, F., Mazzola, S., Mignosi, F., Nazzicone, G. (2016). Compressing Big Data: When the Rate of Convergence to the Entropy Matters. In: Kotsireas, I., Rump, S., Yap, C. (eds) Mathematical Aspects of Computer and Information Sciences. MACIS 2015. Lecture Notes in Computer Science, vol 9582. Springer, Cham. https://doi.org/10.1007/978-3-319-32859-1_24

  • DOI: https://doi.org/10.1007/978-3-319-32859-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32858-4

  • Online ISBN: 978-3-319-32859-1

  • eBook Packages: Computer Science, Computer Science (R0)
