On the Approximation Ratio of LZ-End to LZ77

  • Conference paper
String Processing and Information Retrieval (SPIRE 2021)

Abstract

Lempel-Ziv factorizations form a well-studied family of string structures. The LZ-End factorization is a member of this family that supports faster extraction of arbitrary substrings (Kreft & Navarro, TCS 2013). A question of particular interest for LZ-End factorizations is how large the gap between the sizes of the LZ-End and LZ77 factorizations can be. Kreft and Navarro also showed families of strings for which the approximation ratio of the number of LZ-End phrases to the number of LZ77 phrases asymptotically approaches 2. However, the alphabet size of these strings is unbounded. In this paper, we analyze the LZ-End factorization of the period-doubling sequence, and we show that for this sequence the approximation ratio asymptotically approaches 2 over a binary alphabet.
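The quantities compared above can be explored on short prefixes of the period-doubling sequence with a brute-force sketch. It assumes the morphism 0 → 01, 1 → 00 applied to "0", the non-overlapping LZ77 of Note 1 without trailing characters, and an LZ-End variant in which each phrase is the longest prefix of the unparsed suffix that is a suffix of q_1⋯q_j for some earlier phrase q_j (a single character otherwise). These definitions may differ in detail from the paper's formal ones; the function names are hypothetical, and the naive searches are meant only for small inputs.

```python
# Brute-force sketch, not the authors' code. Assumed definitions:
#   - period_doubling(k): k-fold image of "0" under 0 -> 01, 1 -> 00;
#   - LZ77: greedy, non-overlapping, no trailing character;
#   - LZ-End: each source must end exactly at an earlier phrase boundary.


def period_doubling(k: int) -> str:
    """Return the prefix of length 2**k of the period-doubling sequence."""
    rules = {"0": "01", "1": "00"}
    word = "0"
    for _ in range(k):
        word = "".join(rules[c] for c in word)
    return word


def lz77_nonoverlapping(w: str) -> list[str]:
    """Greedy LZ77 where each source lies entirely inside w[:i]."""
    phrases, i, n = [], 0, len(w)
    while i < n:
        length = 0
        # If x occurs inside w[:i], so does every prefix of x,
        # so extending one character at a time finds the longest match.
        while i + length < n and w[i:i + length + 1] in w[:i]:
            length += 1
        length = max(length, 1)  # no earlier occurrence: fresh character
        phrases.append(w[i:i + length])
        i += length
    return phrases


def lz_end(w: str) -> list[str]:
    """Greedy LZ-End: the source of a phrase ends at some earlier phrase end."""
    phrases, ends, i, n = [], [], 0, len(w)
    while i < n:
        best = 1  # fall back to a single character
        # Valid lengths are not closed under shortening, so try every length.
        for length in range(1, n - i + 1):
            cand = w[i:i + length]
            if any(e >= length and w[e - length:e] == cand for e in ends):
                best = length
        phrases.append(w[i:i + best])
        i += best
        ends.append(i)  # prefix length q_1...q_j after this phrase
    return phrases


if __name__ == "__main__":
    for k in range(2, 10):
        w = period_doubling(k)
        z77, zend = len(lz77_nonoverlapping(w)), len(lz_end(w))
        print(f"k={k} n={len(w)} z77={z77} zend={zend} ratio={zend / z77:.3f}")
```

Every LZ-End phrase is either a single character or has an earlier occurrence that ends at a phrase boundary, and hence lies inside the already-parsed prefix; greedy non-overlapping LZ77 is optimal among such parsings, so under these definitions the printed ratio is never below 1. The abstract's claim concerns its limit as k grows, which small prefixes need not reflect.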

Notes

  1. This version of LZ77 is often called non-overlapping LZ77 or LZ77 without self-references, since no phrase \(p_i\) overlaps any of its sources.

  2. This definition of LZ77 differs from the original one [33] (see [21] for more information).
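The distinction drawn in Note 1 can be made concrete with a small sketch. It assumes the greedy variant without trailing characters (which, as Note 2 says, is not the original LZ77 of [33]); the helper `lz77` and its `overlapping` flag are hypothetical names, and the substring search is naive, intended only for short strings.

```python
def lz77(w: str, overlapping: bool) -> list[str]:
    """Greedy LZ77 without trailing characters (hypothetical helper).

    overlapping=False: the source must lie entirely in w[:i] (Note 1).
    overlapping=True: the source only has to start before position i,
    so it may run into the phrase being built (a self-reference).
    """
    phrases, i, n = [], 0, len(w)
    while i < n:
        length = 0
        while i + length < n:
            cand = w[i:i + length + 1]
            # An occurrence starting before i lies inside w[:i + length];
            # a non-overlapping one must lie inside w[:i].
            region = w[:i + length] if overlapping else w[:i]
            if cand not in region:
                break
            length += 1
        length = max(length, 1)  # no earlier occurrence: fresh character
        phrases.append(w[i:i + length])
        i += length
    return phrases


if __name__ == "__main__":
    print(lz77("aaaaaaaa", overlapping=False))
    print(lz77("aaaaaaaa", overlapping=True))
```

On the unary string `aaaaaaaa`, the self-referencing parse is `['a', 'aaaaaaa']`, because the second phrase may copy from a source that overlaps it, whereas the non-overlapping parse needs `['a', 'a', 'aa', 'aaaa']`, each phrase at most doubling the parsed prefix.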

References

  1. Allouche, J.P., Shallit, J.: Automatic Sequences: Theory, Applications, Generalizations. Cambridge University Press, Cambridge (2003). https://doi.org/10.1017/CBO9780511546563

  2. Belazzougui, D., et al.: Queries on LZ-bounded encodings. In: Bilgin, A., Marcellin, M.W., Serra-Sagristà, J., Storer, J.A. (eds.) 2015 Data Compression Conference, DCC 2015, Snowbird, UT, USA, 7–9 April 2015, pp. 83–92. IEEE (2015). https://doi.org/10.1109/DCC.2015.69

  3. Berstel, J., Savelli, A.: Crochemore factorization of sturmian and other infinite words. In: Královič, R., Urzyczyn, P. (eds.) MFCS 2006. LNCS, vol. 4162, pp. 157–166. Springer, Heidelberg (2006). https://doi.org/10.1007/11821069_14

  4. Bille, P., Gagie, T., Gørtz, I.L., Prezza, N.: A separation between RLSLPs and LZ77. J. Discret. Algorithms 50, 36–39 (2018). https://doi.org/10.1016/j.jda.2018.09.002

  5. Burrows, M., Wheeler, D.: A block-sorting lossless data compression algorithm. Technical report, Digital SRC Research Report (1994)

  6. Charikar, M., et al.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005). https://doi.org/10.1109/TIT.2005.850116

  7. Chen, K.T., Fox, R.H., Lyndon, R.C.: Free differential calculus, IV. The quotient groups of the lower central series. Ann. Math. 68(1), 81–95 (1958). http://www.jstor.org/stable/1970044

  8. Christiansen, A.R., Ettienne, M.B., Kociumaka, T., Navarro, G., Prezza, N.: Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms 17(1), 8:1–8:39 (2021). https://doi.org/10.1145/3426473

  9. Crochemore, M.: An optimal algorithm for computing the repetitions in a word. Inf. Process. Lett. 12(5), 244–250 (1981). https://doi.org/10.1016/0020-0190(81)90024-7

  10. Do, H.H., Jansson, J., Sadakane, K., Sung, W.: Fast relative Lempel-Ziv self-index for similar sequences. Theor. Comput. Sci. 532, 14–30 (2014). https://doi.org/10.1016/j.tcs.2013.07.024

  11. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28332-1_21

  12. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Pardo, A., Viola, A. (eds.) LATIN 2014. LNCS, vol. 8392, pp. 731–742. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54423-1_63

  13. Goto, K., Bannai, H., Inenaga, S., Takeda, M.: LZD Factorization: simple and practical online grammar compression with variable-to-fixed encoding. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 219–230. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19929-0_19

  14. Kärkkäinen, J., Kempa, D., Nakashima, Y., Puglisi, S.J., Shur, A.M.: On the size of Lempel-Ziv and Lyndon factorizations. In: Vollmer, H., Vallée, B. (eds.) 34th Symposium on Theoretical Aspects of Computer Science, STACS 2017. LIPIcs, Hannover, Germany, 8–11 March 2017, vol. 66, pp. 45:1–45:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017). https://doi.org/10.4230/LIPIcs.STACS.2017.45

  15. Kempa, D., Kociumaka, T.: Resolution of the Burrows-Wheeler transform conjecture. In: 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, 16–19 November 2020, pp. 1002–1013. IEEE (2020). https://doi.org/10.1109/FOCS46700.2020.00097

  16. Kempa, D., Kosolobov, D.: LZ-End parsing in compressed space. In: Bilgin, A., Marcellin, M.W., Serra-Sagristà, J., Storer, J.A. (eds.) 2017 Data Compression Conference, DCC 2017, Snowbird, UT, USA, 4–7 April 2017, pp. 350–359. IEEE (2017). https://doi.org/10.1109/DCC.2017.73

  17. Kempa, D., Kosolobov, D.: LZ-End parsing in linear time. In: Pruhs, K., Sohler, C. (eds.) 25th Annual European Symposium on Algorithms, ESA 2017. LIPIcs, Vienna, Austria, 4–6 September 2017, vol. 87, pp. 53:1–53:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017). https://doi.org/10.4230/LIPIcs.ESA.2017.53

  18. Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: Diakonikolas, I., Kempe, D., Henzinger, M. (eds.) Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, 25–29 June 2018, pp. 827–840. ACM (2018). https://doi.org/10.1145/3188745.3188814

  19. Kociumaka, T., Navarro, G., Prezza, N.: Towards a definitive measure of repetitiveness. In: Kohayakawa, Y., Miyazawa, F.K. (eds.) LATIN 2021. LNCS, vol. 12118, pp. 207–219. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61792-9_17

  20. Kosolobov, D., Valenzuela, D., Navarro, G., Puglisi, S.J.: Lempel–Ziv-Like Parsing in Small Space. Algorithmica 82(11), 3195–3215 (2020). https://doi.org/10.1007/s00453-020-00722-6

  21. Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theor. Comput. Sci. 483, 115–133 (2013). https://doi.org/10.1016/j.tcs.2012.02.006

  22. Kärkkäinen, J., Ukkonen, E.: Lempel-Ziv parsing and sublinear-size index structures for string matching (extended abstract). In: Proceedings of the 3rd South American Workshop on String Processing, WSP 1996, pp. 141–155. Carleton University Press (1996)

  23. Kuruppu, S., Puglisi, S.J., Zobel, J.: Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 201–206. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16321-0_20

  24. Kutsukake, K., Matsumoto, T., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: On repetitiveness measures of Thue-Morse words. In: Boucher, C., Thankachan, S.V. (eds.) SPIRE 2020. LNCS, vol. 12303, pp. 213–220. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59212-7_15

  25. Lothaire, M.: Applied Combinatorics on Words, vol. 105. Cambridge University Press, Cambridge (2005)

  26. Mitsuya, S., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Compressed communication complexity of Hamming distance. Algorithms 14(4), 116 (2021). https://doi.org/10.3390/a14040116

  27. Navarro, G., Ochoa, C., Prezza, N.: On the approximation ratio of ordered parsings. IEEE Trans. Inf. Theory 67(2), 1008–1026 (2021). https://doi.org/10.1109/TIT.2020.3042746

  28. Nishimoto, T., I, T., Inenaga, S., Bannai, H., Takeda, M.: Dynamic index and LZ factorization in compressed space. Discret. Appl. Math. 274, 116–129 (2020). https://doi.org/10.1016/j.dam.2019.01.014

  29. Nishimoto, T., Tabei, Y.: LZRR: LZ77 parsing with right reference. In: Bilgin, A., Marcellin, M.W., Serra-Sagristà, J., Storer, J.A. (eds.) Data Compression Conference, DCC 2019, Snowbird, UT, USA, 26–29 March 2019, pp. 211–220. IEEE (2019). https://doi.org/10.1109/DCC.2019.00029

  30. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1–3), 211–222 (2003). https://doi.org/10.1016/S0304-3975(02)00777-6

  31. Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29(4), 928–951 (1982). https://doi.org/10.1145/322344.322346

  32. Urabe, Y., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: On the size of overlapping Lempel-Ziv and Lyndon factorizations. In: Pisanti, N., Pissis, S.P. (eds.) 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019. LIPIcs, Pisa, Italy, 18–20 June 2019, vol. 128, pp. 29:1–29:11. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). https://doi.org/10.4230/LIPIcs.CPM.2019.29

  33. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23(3), 337–343 (1977). https://doi.org/10.1109/TIT.1977.1055714

  34. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978). https://doi.org/10.1109/TIT.1978.1055934

Acknowledgments

This work was supported by JSPS KAKENHI Grant Numbers JP20J11983 (TM), JP20J21147 (MF), JP18K18002 (YN), JP21K17705 (YN), JP18H04098 (MT), JP20H05964 (MT), and by JST PRESTO Grant Number JPMJPR1922 (SI).

Author information

Correspondence to Takumi Ideue.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ideue, T., Mieno, T., Funakoshi, M., Nakashima, Y., Inenaga, S., Takeda, M. (2021). On the Approximation Ratio of LZ-End to LZ77. In: Lecroq, T., Touzet, H. (eds) String Processing and Information Retrieval. SPIRE 2021. Lecture Notes in Computer Science, vol 12944. Springer, Cham. https://doi.org/10.1007/978-3-030-86692-1_10

  • DOI: https://doi.org/10.1007/978-3-030-86692-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86691-4

  • Online ISBN: 978-3-030-86692-1

  • eBook Packages: Computer Science, Computer Science (R0)