Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya

Grammar-Based Compression

  • Sebastian Maneth
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_56



Grammar-based compression represents an object by a grammar that generates it. For instance, a string can be represented by a context-free grammar that generates only that string, and a node-labeled tree can be represented by a context-free tree grammar that generates only that tree. The two main questions are how to construct a small grammar for a given object and how to execute algorithms directly over grammars, without decompression.
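
As a minimal sketch of the string case, the following Python fragment encodes a grammar in which every nonterminal has exactly one rule and no rule is recursive, so the grammar generates exactly one string. The rule sets `RULES` and `doubling` and the helper names `expand`, `derived_length`, and `grammar_size` are hypothetical, chosen for illustration only; they are not taken from this entry. The length computation is meant as one simple example of an algorithm that works on the grammar without decompressing it.

```python
# Hypothetical example grammar: one rule per nonterminal, no recursion,
# so exactly one string is generated from the start symbol "S".
RULES = {
    "S": ["A", "A", "B"],
    "A": ["B", "B"],
    "B": ["a", "b"],
}


def expand(symbol, rules):
    """Decompress: return the string derived from `symbol`."""
    if symbol not in rules:  # terminal symbol
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])


def derived_length(symbol, rules, memo=None):
    """Length of the derived string, computed over the grammar only.

    Each nonterminal is evaluated once, so the cost is linear in the
    grammar size rather than in the length of the generated string.
    """
    if memo is None:
        memo = {}
    if symbol not in rules:  # terminal symbol
        return len(symbol)
    if symbol not in memo:
        memo[symbol] = sum(derived_length(s, rules, memo) for s in rules[symbol])
    return memo[symbol]


def grammar_size(rules):
    """Size measured as the total length of all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())


# A grammar with 41 rules whose string has length 2**40:
# X0 -> a, and Xi -> X(i-1) X(i-1) for i = 1..40.
doubling = {"X0": ["a"]}
for i in range(1, 41):
    doubling[f"X{i}"] = [f"X{i-1}", f"X{i-1}"]

if __name__ == "__main__":
    print(expand("S", RULES))            # the generated string
    print(derived_length("S", RULES))    # its length, without expanding
    print(grammar_size(RULES))
    print(derived_length("X40", doubling))
```

The `doubling` grammar illustrates why working directly on the grammar matters: its generated string is far too long to materialize comfortably, yet its length is obtained from a few dozen additions.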


This entry presents grammar-based compression of three types of finite objects: strings, trees, and (hyper)graphs, using context-free string, tree, and hyperedge-replacement grammars, respectively. The entry by Bannai (2016) focuses on grammar construction algorithms for strings; here, only a few grammar construction algorithms are discussed. As a prototypical example of algorithms over grammars, the equivalence...



  1. Bannai H (2016) Grammar compression. In: Encyclopedia of Algorithms, Springer, pp 861–866. https://doi.org/10.1007/978-1-4939-2864-4_635
  2. Bille P, Landau GM, Raman R, Sadakane K, Satti SR, Weimann O (2015) Random access to grammar-compressed strings and trees. SIAM J Comput 44(3):513–539. https://doi.org/10.1137/130936889
  3. Bodlaender HL (1990) Polynomial algorithms for graph isomorphism and chromatic index on partial k-trees. J Algorithms 11(4):631–643. https://doi.org/10.1016/0196-6774(90)90013-5
  4. Casel K, Fernau H, Gaspers S, Gras B, Schmid ML (2016) On the complexity of grammar-based compression over fixed alphabets. In: Proceedings of the 43rd international colloquium on automata, languages, and programming, ICALP 2016, 11–15 July 2016, Rome, pp 122:1–122:14. https://doi.org/10.4230/LIPIcs.ICALP.2016.122
  5. Charikar M, Lehman E, Liu D, Panigrahy R, Prabhakaran M, Sahai A, Shelat A (2005) The smallest grammar problem. IEEE Trans Inf Theory 51(7):2554–2576. https://doi.org/10.1109/TIT.2005.850116
  6. Downey PJ, Sethi R, Tarjan RE (1980) Variations on the common subexpression problem. J ACM 27(4):758–771. http://doi.acm.org/10.1145/322217.322228
  7. Engelfriet J (1997) Context-free graph grammars. In: Rozenberg G, Salomaa A (eds) Handbook of formal languages: beyond words, vol 3. Springer, Berlin/Heidelberg, pp 125–213. https://doi.org/10.1007/978-3-642-59126-6_3
  8. Ershov AP (1958) On programming of arithmetic operations. Commun ACM 1(8):3–9
  9. Ganardi M, Hucke D, Jez A, Lohrey M, Noeth E (2017) Constructing small tree grammars and small circuits for formulas. J Comput Syst Sci 86:136–158. https://doi.org/10.1016/j.jcss.2016.12.007
  10. Hermelin D, Landau GM, Landau S, Weimann O (2009) A unified algorithm for accelerating edit-distance computation via text-compression. In: Proceedings of the 26th international symposium on theoretical aspects of computer science, STACS 2009, 26–28 Feb 2009, Freiburg, pp 529–540. https://doi.org/10.4230/LIPIcs.STACS.2009.1804
  11. Jez A (2015) Faster fully compressed pattern matching by recompression. ACM Trans Algorithms 11(3):20:1–20:43. http://doi.acm.org/10.1145/2631920
  12. Jez A, Lohrey M (2016) Approximation of smallest linear tree grammar. Inf Comput 251:215–251. https://doi.org/10.1016/j.ic.2016.09.007
  13. Kieffer JC, Yang E (2000) Grammar-based codes: a new class of universal lossless source codes. IEEE Trans Inf Theory 46(3):737–754. https://doi.org/10.1109/18.841160
  14. Kieffer JC, Yang E, Nelson GJ, Cosman PC (2000) Universal lossless compression via multilevel pattern matching. IEEE Trans Inf Theory 46(4):1227–1245. https://doi.org/10.1109/18.850665
  15. Larsson NJ, Moffat A (1999) Offline dictionary-based compression. In: Data Compression Conference, DCC 1999, Snowbird, 29–31 Mar 1999, pp 296–305. https://doi.org/10.1109/DCC.1999.755679
  16. Liu Q, Yang Y, Chen C, Bu J, Zhang Y, Ye X (2008) RNACompress: grammar-based compression and informational complexity measurement of RNA secondary structure. BMC Bioinform 9(1):176. https://doi.org/10.1186/1471-2105-9-176
  17. Lohrey M (2012) Algorithmics on SLP-compressed strings: a survey. Groups Complex Cryptol 4(2):241–299. https://doi.org/10.1515/gcc-2012-0016
  18. Lohrey M, Maneth S, Mennicke R (2013) XML tree structure compression using RePair. Inf Syst 38(8):1150–1167. https://doi.org/10.1016/j.is.2013.06.006
  19. Lohrey M, Maneth S, Peternek F (2015) Compressed tree canonization. In: Proceedings of the 42nd International Colloquium on Automata, Languages, and Programming, ICALP 2015, Part II, Kyoto, 6–10 July 2015, pp 337–349. https://doi.org/10.1007/978-3-662-47666-6_27
  20. Maneth S, Peternek F (2017) Grammar-based graph compression. CoRR abs/1704.05254. http://arxiv.org/abs/1704.05254
  21. Maneth S, Sebastian T (2010) Fast and tiny structural self-indexes for XML. CoRR abs/1012.5696. http://arxiv.org/abs/1012.5696
  22. Nevill-Manning CG, Witten IH (1997) Identifying hierarchical structure in sequences: a linear-time algorithm. J Artif Intell Res 7:67–82. https://doi.org/10.1613/jair.374
  23. Rytter W (2003) Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor Comput Sci 302(1–3):211–222. https://doi.org/10.1016/S0304-3975(02)00777-6
  24. Sakr S (2009) XML compression techniques: a survey and comparison. J Comput Syst Sci 75(5):303–322. https://doi.org/10.1016/j.jcss.2009.01.004
  25. Senin P, Lin J, Wang X, Oates T, Gandhi S, Boedihardjo AP, Chen C, Frankenstein S, Lerner M (2014) Grammarviz 2.0: a tool for grammar-based pattern discovery in time series. In: Proceedings of the European conference on machine learning and knowledge discovery in databases, ECML PKDD 2014, Nancy, Part III, 15–19 Sept 2014, pp 468–472. https://doi.org/10.1007/978-3-662-44845-8_37
  26. Senin P, Lin J, Wang X, Oates T, Gandhi S, Boedihardjo AP, Chen C, Frankenstein S (2015) Time series anomaly discovery with grammar-based compression. In: Proceedings of the 18th international conference on extending database technology, EDBT 2015, Brussels, Belgium, 23–27 March 2015, pp 481–492. https://doi.org/10.5441/002/edbt.2015.42
  27. Storer JA, Szymanski TG (1978) The macro model for data compression (extended abstract). In: Proceedings of the 10th annual ACM symposium on theory of computing, 1–3 May 1978, San Diego, pp 30–39. http://doi.acm.org/10.1145/800133.804329
  28. Storer JA, Szymanski TG (1982) Data compression via textual substitution. J ACM 29(4):928–951. http://doi.acm.org/10.1145/322344.322346
  29. Tabei Y, Saigo H, Yamanishi Y, Puglisi SJ (2016) Scalable partial least squares regression on grammar-compressed data matrices. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 13–17 Aug 2016, pp 1875–1884. http://doi.acm.org/10.1145/2939672.2939864
  30. Takabatake Y, Nakashima K, Kuboyama T, Tabei Y, Sakamoto H (2016) SIEDM: an efficient string index and search algorithm for edit distance with moves. Algorithms 9(2):26. https://doi.org/10.3390/a9020026
  31. Takabatake Y, Tomohiro I, Sakamoto H (2017) A space-optimal grammar compression. In: Proceedings of 25th annual European symposium on algorithms, ESA 2017, 4–6 Sept 2017, Vienna, pp 67:1–67:15. https://doi.org/10.4230/LIPIcs.ESA.2017.67
  32. Weisfeiler B, Lehman AA (1968) A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia, Seriya 2(9):12–16 (in Russian)
  33. Zhao Y, Hayashida M, Cao Y, Hwang J, Akutsu T (2015) Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs. BMC Bioinform 16(1):128. https://doi.org/10.1186/s12859-015-0558-4
  34. Ziv J, Lempel A (1978) Compression of individual sequences via variable-rate coding. IEEE Trans Inf Theory 24(5):530–536. https://doi.org/10.1109/TIT.1978.1055934

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Mathematics and Informatics, Universität Bremen, Bremen, Germany