Abstract
A grammar-based compressor computes, for a given input w, a context-free grammar that produces only w. So-called global grammar-based compressors (\(\mathsf {RePair}\), \(\mathsf {LongestMatch}\) and \(\mathsf {Greedy}\)) achieve impressive compression results in practice, but the recursive character of these algorithms makes it hard to obtain strong theoretical results. To address this, this paper studies the approximation ratio of these algorithms on unary input strings, a setting that is strongly related to the field of addition chains. We show that in this setting, \(\mathsf {RePair}\) and \(\mathsf {LongestMatch}\) produce grammars of equal size that are larger than a smallest grammar by a factor of at most \(\log _2(3)\). We also provide a matching lower bound. The main result of this paper is a new lower bound of 1.348... for \(\mathsf {Greedy}\), which improves the best known lower bound for arbitrary (not necessarily unary) input strings.
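To make the RePair scheme on unary strings concrete, here is a minimal Python sketch, not the paper's formal definition. It repeatedly replaces a most frequent digram, counting non-overlapping occurrences left to right, by a fresh nonterminal until no digram occurs at least twice. Three choices are assumptions of this sketch: ties are broken by first occurrence in the current string, nonterminals are named X0, X1, ..., and grammar size is measured as the total length of all right-hand sides, including the start rule.

```python
def count_nonoverlapping(seq, pair):
    """Greedy left-to-right count of non-overlapping occurrences of pair."""
    count, i = 0, 0
    while i < len(seq) - 1:
        if (seq[i], seq[i + 1]) == pair:
            count += 1
            i += 2  # skip both symbols so occurrences cannot overlap
        else:
            i += 1
    return count


def replace_pair(seq, pair, symbol):
    """Replace non-overlapping occurrences of pair, left to right, by symbol."""
    out, i = [], 0
    while i < len(seq):
        if i < len(seq) - 1 and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(w):
    """RePair-style sketch: returns (start rule, {nonterminal: digram})."""
    seq = list(w)
    rules = {}
    next_id = 0
    while True:
        # Candidate digrams, in order of first occurrence (assumed tie-break).
        candidates = list(dict.fromkeys(zip(seq, seq[1:])))
        best, best_count = None, 1  # a digram must occur at least twice
        for pair in candidates:
            c = count_nonoverlapping(seq, pair)
            if c > best_count:
                best, best_count = pair, c
        if best is None:
            break
        symbol = f"X{next_id}"  # hypothetical nonterminal name
        next_id += 1
        rules[symbol] = best
        seq = replace_pair(seq, best, symbol)
    return seq, rules


def expand(symbol, rules):
    """Derive the terminal string produced by symbol."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def grammar_size(start, rules):
    """Total length of all right-hand sides (every digram rule has length 2)."""
    return len(start) + 2 * len(rules)
```

For example, `repair("a" * 8)` returns the start rule `X1 X1` with rules `X0 → a a` and `X1 → X0 X0`, a grammar of size 6 under this measure. Each new rule doubles the length of the block it stands for, which gives an intuition for why the unary case connects to addition chains.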
Notes
- 1.
Although LZ78 was not introduced as a grammar-based compressor, it is straightforward to compute, from the LZ78-factorization of w, an SLP (straight-line program) for w of roughly the same size as the factorization.
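The construction mentioned in this note can be sketched in Python as follows, under our own assumptions. Each LZ78 phrase is either a single new character or an earlier phrase extended by one character, so phrase i becomes a rule P_i → P_j c (or P_i → c), and a start rule lists the phrases in order. If w ends inside an already known phrase, the last factor simply reuses that phrase's nonterminal. The function names `lz78_factorize` and `lz78_to_slp` and the nonterminal names P1, P2, ... are hypothetical.

```python
def lz78_factorize(w):
    """LZ78 phrases as (prefix_index, char); index 0 is the empty phrase.

    If w ends inside a known phrase, the last entry is (index, None).
    """
    trie = {}  # (node, char) -> phrase index
    phrases = []
    node = 0
    for ch in w:
        if (node, ch) in trie:
            node = trie[(node, ch)]  # extend the current match
        else:
            phrases.append((node, ch))  # new phrase = known phrase + ch
            trie[(node, ch)] = len(phrases)
            node = 0
    if node != 0:
        phrases.append((node, None))
    return phrases


def lz78_to_slp(w):
    """Turn the LZ78 phrases of w into (start rule, {nonterminal: rhs})."""
    rules = {}
    start = []
    for i, (j, ch) in enumerate(lz78_factorize(w), start=1):
        if ch is None:
            start.append(f"P{j}")  # final factor repeats phrase j
            continue
        rules[f"P{i}"] = ([f"P{j}"] if j else []) + [ch]
        start.append(f"P{i}")
    return start, rules


def expand(symbol, rules):
    """Derive the terminal string produced by symbol."""
    if symbol not in rules:
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])


def slp_size(start, rules):
    """Total length of all right-hand sides, including the start rule."""
    return len(start) + sum(len(rhs) for rhs in rules.values())
```

Every rule has a right-hand side of length at most 2 and the start rule has one symbol per phrase, so the size is at most three times the number of LZ78 phrases. Splitting the start rule into binary rules would add at most one rule per phrase.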
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hucke, D. (2019). Approximation Ratios of \(\mathsf {RePair}\), \(\mathsf {LongestMatch}\) and \(\mathsf {Greedy}\) on Unary Strings. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_1
DOI: https://doi.org/10.1007/978-3-030-32686-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32685-2
Online ISBN: 978-3-030-32686-9
eBook Packages: Computer Science, Computer Science (R0)