The Smallest Grammar Problem Revisited

  • Danny Hucke
  • Markus Lohrey
  • Carl Philipp Reh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9954)


In a seminal paper of Charikar et al. on the smallest grammar problem, the authors derive upper and lower bounds on the approximation ratios for several grammar-based compressors, but in all cases there is a gap between the lower and upper bound. Here we close the gaps for LZ78 and BISECTION by showing that the approximation ratio of LZ78 is \(\varTheta ( (n/\log n)^{2/3})\), whereas the approximation ratio of BISECTION is \(\varTheta ( (n/\log n)^{1/2})\). We also derive a lower bound on the size of a smallest grammar for a word in terms of its number of LZ77 factors, which refines existing bounds of Rytter. Finally, we improve results of Arpe and Reischuk relating grammar-based compression for arbitrary alphabets and binary alphabets.
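To make the two compressors named above concrete, here is a minimal Python sketch under our own assumptions; it is not taken from the paper. LZ78 is the standard dictionary parse, in which each factor is the longest earlier factor extended by one symbol. The LZ78 grammar has one rule per distinct factor plus a start rule listing all factors. BISECTION splits a word of length n ≥ 2 into a prefix of length 2^k, with 2^k the largest power of two strictly below n, and a suffix, recursing on both and sharing one rule among identical substrings (the split rule as we understand it from Charikar et al.). Grammar size is taken to be the total length of all right-hand sides. All function names (`lz78_factors`, `lz78_grammar_size`, `bisection_rules`, `bisection_grammar_size`, `expand`) are hypothetical. The sketch does not compute a smallest grammar, so on its own it says nothing about the approximation ratios.

```python
from typing import Dict, List, Tuple


def lz78_factors(w: str) -> List[str]:
    """LZ78 parse: each factor is the longest earlier factor extended by one
    symbol; the final factor may equal an earlier factor if the input ends."""
    seen = set()
    factors: List[str] = []
    current = ""
    for ch in w:
        if current + ch in seen:
            current += ch              # keep extending a known phrase
        else:
            seen.add(current + ch)     # new phrase = known phrase + one symbol
            factors.append(current + ch)
            current = ""
    if current:
        factors.append(current)        # leftover phrase repeats an earlier factor
    return factors


def lz78_grammar_size(w: str) -> int:
    """Size of S -> X_1 ... X_m plus one rule per distinct factor:
    X -> a (size 1) for a one-symbol factor, X -> Y a (size 2) otherwise,
    where Y derives the factor without its last symbol."""
    factors = lz78_factors(w)
    rule_sizes = sum(1 if len(f) == 1 else 2 for f in set(factors))
    return len(factors) + rule_sizes


def bisection_rules(w: str) -> Dict[str, Tuple[str, str]]:
    """BISECTION: split s (|s| >= 2) into s[:k] s[k:], where k is the largest
    power of two strictly below |s|. Each nonterminal is represented by the
    substring it derives, so identical substrings share one rule."""
    rules: Dict[str, Tuple[str, str]] = {}

    def build(s: str) -> None:
        if len(s) < 2 or s in rules:
            return                     # terminal symbol, or rule already exists
        k = 1
        while 2 * k < len(s):
            k *= 2
        rules[s] = (s[:k], s[k:])
        build(s[:k])
        build(s[k:])

    build(w)
    return rules


def bisection_grammar_size(w: str) -> int:
    # Every BISECTION rule has a right-hand side of exactly two symbols.
    return 2 * len(bisection_rules(w))


def expand(rules: Dict[str, Tuple[str, str]], s: str) -> str:
    """Derive the word generated by nonterminal s (sanity check)."""
    if s not in rules:
        return s                       # a single terminal symbol
    left, right = rules[s]
    return expand(rules, left) + expand(rules, right)


if __name__ == "__main__":
    w = "abababab"
    print(lz78_factors(w))             # ['a', 'b', 'ab', 'aba', 'b']
    print(lz78_grammar_size(w))        # 11
    print(bisection_grammar_size(w))   # 6
    assert expand(bisection_rules(w), w) == w
```

On `abababab`, BISECTION needs only the three rules for `abababab`, `abab` and `ab`, while the LZ78 grammar pays for every factor in its start rule. This gap between the two constructions on highly repetitive inputs is the kind of behavior the bounds above quantify asymptotically.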





The work in this paper was supported by the DFG grant LO 748/10-1.


  1. Arpe, J., Reischuk, R.: On the complexity of optimal grammar-based compression. In: Proceedings of the DCC 2006, pp. 173–182. IEEE Computer Society (2006)
  2. Berstel, J., Brlek, S.: On the length of word chains. Inf. Process. Lett. 26(1), 23–28 (1987)
  3. Casel, K., Fernau, H., Gaspers, S., Gras, B., Schmid, M.L.: On the complexity of grammar-based compression over fixed alphabets. In: Proceedings of ICALP 2016, LNCS. Springer, Heidelberg (2016, to appear)
  4. Charikar, M., Lehman, E., Lehman, A., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
  5. Diwan, A.A.: A New Combinatorial Complexity Measure for Languages. Tata Institute, Bombay (1986)
  6. Gasieniec, L., Karpinski, M., Plandowski, W., Rytter, W.: Efficient algorithms for Lempel-Ziv encoding (extended abstract). In: Karlsson, R., Lingas, A. (eds.) SWAT 1996. LNCS, vol. 1097, pp. 392–403. Springer, Heidelberg (1996)
  7. Jeż, A.: Approximation of grammar-based compression via recompression. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 165–176. Springer, Heidelberg (2013)
  8. Kieffer, J.C., Yang, E.-H.: Grammar-based codes: a new class of universal lossless source codes. IEEE Trans. Inf. Theory 46(3), 737–754 (2000)
  9. Kieffer, J.C., Yang, E.-H., Nelson, G.J., Cosman, P.C.: Universal lossless compression via multilevel pattern matching. IEEE Trans. Inf. Theory 46(4), 1227–1245 (2000)
  10. Larsson, N.J., Moffat, A.: Offline dictionary-based compression. In: Proceedings of the DCC 1999, pp. 296–305. IEEE Computer Society (1999)
  11. Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn. Springer, Heidelberg (2008)
  12. Lohrey, M.: The Compressed Word Problem for Groups. Springer, Heidelberg (2014)
  13. Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artif. Intell. Res. 7, 67–82 (1997)
  14. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1–3), 211–222 (2003)
  15. Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29(4), 928–951 (1982)
  16. Tabei, Y., Takabatake, Y., Sakamoto, H.: A succinct grammar compression. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 235–246. Springer, Heidelberg (2013)
  17. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. University of Siegen, Siegen, Germany
