Abstract
In the smallest grammar problem, we are given a word w and we want to compute a preferably small context-free grammar G for the singleton language {w} (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is NP-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless P = NP. We settle the long-standing open problem whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains NP-complete (and its optimisation version is APX-hard), even if the alphabet is fixed and has size at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size measure of grammars that is commonly used in the literature (i. e., a size measure also taking the number of rules into account), and it also allows us to conclude that even computing the number of rules required by a smallest grammar is a hard problem. On the other hand, if the number of nonterminals (or, equivalently, the number of rules) is bounded by a constant, then the smallest grammar problem can be solved in polynomial time, which is shown by encoding it as a problem on graphs with interval structure. However, treating the number of rules as a parameter (in terms of parameterised complexity) yields W[1]-hardness. Furthermore, we present an \(\mathcal {O}(3^{|w|})\) exact exponential-time algorithm, based on dynamic programming. These three main questions are also investigated for 1-level grammars, i. e., grammars for which only the start rule contains nonterminals on the right side; thus, we investigate the impact of the “hierarchical depth” of grammars on the complexity of the smallest grammar problem.
In this regard, we obtain similar, but slightly stronger, results for 1-level grammars.
Introduction
Context-free grammars are among the most classical concepts in theoretical computer science. Their wide range of applications, both of theoretical and practical nature, is well-known and usually forms an integral part of academic undergraduate courses in computer science. In this paper, we are concerned with grammars G that describe singleton languages {w} (or, by slightly abusing notation, grammars describing single words).^{Footnote 1}
Grammars as Inference Tools and Compressors
Although, from a formal languages point of view, describing a single word by a context-free grammar seems excessive, there are at least two evident motivations:

Compression Perspective:^{Footnote 2} The grammar G is a compressed representation of the word w.

Inference Perspective: The grammar G identifies the hierarchical structure of the word w.
The inference perspective can be traced back to the work of Nevill-Manning and Witten [1, 2],^{Footnote 3} in which the authors consider algorithmic possibilities of extracting (hierarchical) structure from sequential data, such as texts (in a natural or formal language), music or DNA, by constructing a grammar for a given sequence. The hypothesis that small grammars are to be preferred can be considered an application of Occam’s razor (note that the size of a grammar is the sum of the sizes of its rules, where the size of a rule is measured by the length of its right side). In a more general sense, Nevill-Manning and Witten’s approach embarks on the quest of inferring the intrinsic information content of a given sequence, which is a central problem in learning theory and algorithmic information theory (especially Kolmogorov complexity, as mentioned below). In Nevill-Manning’s PhD thesis [2], a multitude of connections between the compression perspective of computing grammars for single words and other core topics of mathematics and theoretical computer science are discussed (e. g., the minimum description length principle in learning theory, information theory, data compression). The inference perspective of computing grammars for single words has been applied in two more PhD theses, namely by de Marcken [3], in order to investigate whether analysing the structure of small grammars for large English texts could help in understanding the structure of the language itself, and by Gallé [4], in order to infer hierarchical structures in DNA. Moreover, Lanctot et al. [5] contribute to the work on estimating the entropy of DNA sequences (see the references in [5]) by using an algorithm first proposed by Kieffer and Yang [6] to compute grammars for DNA sequences.
While in the above mentioned work grammars are mainly used as an inference tool, the obvious connections to data compression are often highlighted as well (e. g., in [2]). The work of Kieffer et al. [6,7,8] directly approaches the concept of representing words by grammars from a traditional data compression perspective, i. e., we want to compute a small grammar representing a large given word w (in the following, we denote the general concept of compressing a single word by a context-free grammar as grammar-based compression). Besides the above mentioned papers by Nevill-Manning and Witten, the work by Kieffer et al. is usually stated as the second origin of using grammars for single words, but a closer look into the older literature reveals that the external pointer macro scheme (without overlapping and with pointer size 1) defined by Storer and Szymanski [9, 10] is also equivalent to grammar-based compression.
Another motivation is that grammar-based compression, like any lossless data compression scheme, provides a computable upper bound on the Kolmogorov complexity (see [11]). Since this central measure of algorithmic information theory is generally incomputable, such computable approximations are important and, in this regard, grammars are of particular relevance, since, in comparison to other practically applied compression schemes, they achieve high compression rates and therefore yield a better approximation of the Kolmogorov complexity (note that many practically relevant compression schemes, e. g., some of the ones mentioned in Section 1.3, allow fast compression and decompression, but cannot achieve exponential compression rates).
Algorithmics on Compressed Strings
The original motivations outlined so far are still relevant, but the actual reason why grammar-based compression has experienced a renaissance and thrives today as an independent and important field of research on its own is the following. While in the early days of computer science the most important requirements for compression schemes were fast (i. e., linear or near-linear time) compression and decompression, nowadays the investigation of whether they are suitable for solving problems directly on the compressed data, without prior decompression, forms a vibrant research area.^{Footnote 4} This area is usually subsumed under the term algorithmics on compressed strings, and grammar-based compression is particularly well suited for this purpose.
The success of grammars with respect to algorithmics on compressed strings is due to the fact that they cover many compression schemes from practice (most notably, the family of Lempel-Ziv encodings) and that they are mathematically easy to handle (see Lohrey [15] for a survey on the role of grammar-based compression for algorithmics on compressed strings). Many basic problems on strings, e. g., comparison, pattern matching, membership in a regular language, retrieving subwords, etc., can all be solved in polynomial time directly on the grammars [15]. In addition, grammar-based compression has been successfully applied in combinatorial group theory (see the textbook [16] by Lohrey) and to prove problems in computational topology to be polynomial-time solvable [15]. Grammars as compression schemes have also been extended to more complicated objects, e. g., trees (see [17,18,19,20,21], and [21, 22] for applications in term unification) and two-dimensional words (see [23]). It is also worth pointing out the successful applications of compression techniques for solving word equations (see, e. g., [24, 25]).
A rather recent result is that any context-free grammar for a single word can be transformed in linear time into an equivalent one that is balanced in the sense that the depth of its derivation tree is logarithmic in the size of the represented word (see [26]). This result has a direct impact on basic algorithmic problems on grammar-compressed data, e. g., the random access problem (i. e., accessing in the compressed string the symbol at a given position).
The Smallest Grammar Problem
For grammar-based compression, the central computational problem is that of computing a smallest (or at least small) grammar for a given word, which is called the smallest grammar problem,^{Footnote 5} and the respective literature is mainly about approximation algorithms:^{Footnote 6} LZ78 [35], LZW [36], Bisection [7], Sequitur [1, 2] and Sequential [8], Longest Match [6], Greedy [37], RePair [38] (the names of the algorithms in this list are according to [33, 34]). These algorithms share the benefit of being rather simple and fast, and their approximation ratios have been studied thoroughly by Charikar et al. in [33] and by Lehman in his PhD thesis [34]; some bounds have recently been further improved by Hucke et al. [39]. Unfortunately, none of the approximation ratios are constant and the currently best achieved approximation ratio is \(\mathcal {O}\left (\log \left (\frac {|w|}{m^{*}}\right )\right )\), where m^{∗} is the size of a smallest grammar (i. e., it is still open whether an approximation algorithm with a constant approximation ratio exists, or, equivalently, whether the problem is in APX). This result is due to the algorithms by Rytter [40] and Charikar et al. [33, 34], which have been developed independently of each other and are not mentioned in the above list. On the other hand, assuming P ≠ NP, it has been shown in [33, 34] that an approximation ratio better than \(\frac {8569}{8568} \approx 1.0001\) is not possible (thus ruling out a polynomial-time approximation scheme (PTAS)). However, the research seems to have stagnated at this huge gap between lower and upper bound, and still neither an approximation algorithm with a constant approximation ratio nor stronger inapproximability results are known.
The strong bias towards approximation algorithms is usually justified by the general NP-hardness of the smallest grammar problem, but, as explained next, this theoretical justification is seriously flawed. The NP-completeness can be shown by a reduction from vertex cover (see [33, 34]), but the reduction requires an unbounded number of symbols in the underlying alphabet. This means nothing less than that the hardness reduction is invalid for any realistic scenario, where we deal with a constant alphabet (even more so if the alphabet is rather small, as is the case in practical applications). Consequently, since the motivation for the approximation algorithms mentioned above is of a rather practical kind (i. e., string compression in real-world scenarios), this theoretical foundation falls apart (in particular, note that an unbounded alphabet is also necessary for the inapproximability result of [33, 34]). One reason for this situation is probably that in [41] it is claimed that the hardness for alphabets of size 3 follows from [10], but a closer look into [10] does not confirm this (we elaborate on this claim in Section 2.4). Consequently, the NP-hardness of the smallest grammar problem for fixed alphabets has been essentially open (for well over 30 years, taking [9, 10], which investigate hardness and complexity questions, as the first reference).
Our Contribution
The main result of this paper is a reduction that proves the smallest grammar problem for fixed alphabets to be NP-complete, for all alphabets of size at least 17. As explained above, this closes an important gap in the literature and therefore puts the previous work on grammar-based compression on a more solid theoretical foundation.
Moreover, it also follows that the optimisation version of the smallest grammar problem is APX-hard; thus, the impossibility of a PTAS, previously only known for unbounded alphabets, carries over to the more realistic case of bounded alphabets. By a minor modification of this reduction, we can also show that these two hardness results hold for a slightly different (but frequently used) size measure of grammars, i. e., the rule-size, which equals the size of a grammar as defined above plus the number of its rules (both measures are formally defined in Section 2.2).
Given these negative complexity results, we move on to the question of whether smallest grammars can be efficiently computed if certain parameters (e. g., levels of the derivation tree, number of rules) are bounded. In this regard, we show that smallest grammars can be computed in polynomial time, provided that the size of the nonterminal alphabet (i. e., the number of rules) is bounded. This result, which is due to an encoding of the smallest grammar problem as a problem on graphs with interval structure, raises two follow-up questions: (1) is the problem fixed-parameter tractable with respect to the number of rules, and (2) is it possible to efficiently compute how many rules are at least necessary for a smallest grammar? Both of these questions are answered in the negative, by showing W[1]-hardness and NP-hardness, respectively.
Finally, we investigate exact exponential-time algorithms, which have not yet been considered in the literature. We consider this a relevant topic, since grammars are particularly suitable for solving basic problems directly on the compressed representation without decompression, which motivates scenarios where an extensive running time is invested only once, in order to obtain an optimal compression, which is then stored and worked with. While brute-force algorithms with running time \(\mathcal {O}^{*}(c^{|w|})\), for a constant c, can be easily found, we present a dynamic programming algorithm with running time \(\mathcal {O}^{*}(3^{|w|})\).
The exploitation of hierarchical structure is one of the main features of grammars (making them suitable tools for structural inference, and also allowing exponential compression rates) and is reflected in the number of levels of the corresponding derivation tree. Hence, from a (parameterised) complexity point of view, it is natural to measure the impact of this “hierarchical depth” of grammars on the complexity of the smallest grammar problem. To this end, we investigate the above mentioned questions also for 1-level grammars, i. e., grammars in which only the start rule contains nonterminals, and, surprisingly, our results suggest that computing general grammars is, if at all, only insignificantly more difficult than computing 1-level grammars. More precisely, the smallest grammar problem for 1-level grammars is NP-hard for alphabets of size 5 (also with respect to the rule-size measure) and W[1]-hard if parameterised by the number of rules; it can be solved in polynomial time if the number of rules is bounded by a constant, and there is an \(\mathcal {O}^{*}(1.8392^{|w|})\) exact algorithm. Moreover, the exact exponential-time algorithm for the general case works incrementally, in the sense that in the process of producing a smallest grammar, it also produces a smallest 1-level grammar, a smallest 2-level grammar and so on.
Outline of the Paper
In Section 2, we give basic definitions, define the smallest grammar problem, illustrate it with several examples and also illustrate in detail the connections between grammar-based compression and the related macro schemes by Storer and Szymanski [9]. The next section contains the hardness results mentioned above, where the 1-level and the multi-level cases are treated separately in Sections 3.1 and 3.2, respectively (in Section 3.3, we define and discuss possible extensions of the hardness reductions). The second main part of the paper is Section 4, where we show that the smallest grammar problem can be solved in polynomial time if the number of nonterminals is bounded (in Section 4.1, we discuss some related questions). In the last part, Section 5, we first present a (simple) exact exponential-time algorithm for the 1-level case and then, in Section 5.2, we define the dynamic programming algorithm for the multi-level case. Finally, in Section 6, we summarise our results, point out open problems and mention further research tasks.
Preliminaries
In this section, we first introduce some general mathematical definitions and terminology about strings, and some basic concepts from graph theory and complexity theory. Then we define grammars and the smallest grammar problem and illustrate it by several examples. We conclude this section by a discussion of Storer and Szymanski’s external pointer macro scheme already mentioned in Section 1.
Let \(\mathbb {N} = \{1, 2, 3, \ldots \}\) denote the natural numbers. By |A|, we denote the cardinality of a set A. Let Σ be a finite alphabet of symbols. A word or string (over Σ) is a sequence of symbols from Σ. For any word w over Σ, |w| denotes the length of w and ε denotes the empty word, i. e., |ε| = 0. The symbol Σ^{+} denotes the set of all non-empty words over Σ and Σ^{∗} = Σ^{+} ∪ {ε}. For the concatenation of two words w_{1}, w_{2} we write w_{1} ⋅ w_{2} or simply w_{1}w_{2}. For every symbol a ∈ Σ, we denote by |w|_{a} the number of occurrences of the symbol a in w. We say that a word v ∈ Σ^{∗} is a factor of a word w ∈ Σ^{∗} if there are \(u_{1}, u_{2} \in {\Sigma }^{*}\) such that w = u_{1}vu_{2}. If u_{1} = ε (or u_{2} = ε), then v is a prefix (or a suffix, respectively) of w. Furthermore, F(w) = {u ∣ u is a factor of w} and F_{≥ 2}(w) = {u ∣ u ∈ F(w), |u| ≥ 2}. For a position j, 1 ≤ j ≤ |w|, we refer to the symbol at position j of w by the expression w[j] and \(w[j..j^{\prime }] = w[j] w[j + 1] {\ldots } w[j^{\prime }]\), \(j \leq j^{\prime } \leq |w|\). By w^{R}, we denote the reversal of w, i. e., w^{R} = w[n]w[n − 1]…w[1], where |w| = n.
A factorisation of a word w is a tuple (u_{1}, u_{2},…, u_{k}) with u_{i}≠ε, 1 ≤ i ≤ k, such that w = u_{1}u_{2}…u_{k}.
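To make the notation concrete, here is a small Python sketch of these string definitions (the function names are ours, not the paper's):

```python
def factors(w):
    # F(w): all factors (contiguous subwords) of w; i == j yields the empty word.
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def factors_geq2(w):
    # F_{>=2}(w): all factors of w of length at least 2.
    return {u for u in factors(w) if len(u) >= 2}

def is_factorisation(parts, w):
    # A factorisation of w: a tuple of non-empty words whose concatenation is w.
    return all(p != "" for p in parts) and "".join(parts) == w
```

For example, factors_geq2("aba") is {"ab", "ba", "aba"}, and ("b", "aa", "b") is a factorisation of baab.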
Basic Concepts of Graph Theory and Complexity Theory
We use undirected graphs, which are represented as pairs (V, E), where V is the set of vertices and E is the set of edges. For the sake of convenience, we write edges {u, v} ∈ E also as (u, v) or (v, u). For a vertex v ∈ V, N(v) = {u ∣ (v, u) ∈ E} is the (open) neighbourhood (of v), N[v] = N(v) ∪ {v} is the closed neighbourhood (of v) and, furthermore, we extend the notation of closed neighbourhoods to sets \(C \subseteq V\) in the obvious way, i. e., \(N[C] = \bigcup _{v \in C}N[v]\). A graph is cubic (or subcubic) if, for every v ∈ V, |N(v)| = 3 (or |N(v)| ≤ 3, respectively).
A set \(C \subseteq V\) is

an independent set if, for every u, v ∈ C, (u, v)∉E,

a dominating set if N[C] = V,

an independent dominating set if it is both an independent and a dominating set,

a vertex cover if, for every (u, v) ∈ E, {u, v}∩ C≠∅.
We are concerned with the corresponding problems of deciding, for a given graph G and a \(k \in \mathbb {N}\), whether there is a vertex cover (or an independent dominating set, respectively) of cardinality at most k. It is a well-known fact that these decision problems are NP-complete (see [42]).
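The four properties above can be checked directly from the definitions; the following Python sketch (our own helper names, for a graph given as vertex and edge lists) does exactly that:

```python
def is_independent(C, E):
    # No edge has both endpoints in C.
    return not any(u in C and v in C for (u, v) in E)

def closed_neighbourhood(C, V, E):
    # N[C]: C together with all vertices adjacent to some vertex of C.
    N = set(C)
    for (u, v) in E:
        if u in C:
            N.add(v)
        if v in C:
            N.add(u)
    return N

def is_dominating(C, V, E):
    return closed_neighbourhood(C, V, E) == set(V)

def is_vertex_cover(C, E):
    # Every edge has at least one endpoint in C.
    return all(u in C or v in C for (u, v) in E)

def is_independent_dominating(C, V, E):
    return is_independent(C, E) and is_dominating(C, V, E)
```

For the path graph with V = {1, 2, 3} and E = {(1, 2), (2, 3)}, the set {2} is simultaneously an independent dominating set and a vertex cover.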
For \(k \in \mathbb {N}\), a graph G = (V, E) with |V| = n is a k-interval graph if there are intervals I_{i, j}, 1 ≤ i ≤ |V|, 1 ≤ j ≤ k, on the real line, such that G is isomorphic to \((\{v_{i} \mid 1 \leq i \leq |V|\}, \{(v_{i}, v_{i^{\prime }}) \mid \bigcup ^{k}_{j = 1} I_{i, j} \cap \bigcup ^{k}_{j = 1} I_{i^{\prime }, j} \neq \emptyset \})\). For 1-interval graphs (which are also just called interval graphs), it is possible to compute minimum independent dominating sets in linear time (see [43]; note that a perfect elimination ordering (which is part of the input of Farber’s algorithm) can easily be computed in our applications, because the intervals are clear).
We assume the reader to be familiar with the basic concepts of complexity theory (for unexplained notions, see Papadimitriou [44]) and the theory of NPcompleteness (see [44] and [42]).
As usual, for our running-time estimations, we mainly use the \(\mathcal {O}\)-notation, but sometimes also the \(\mathcal {O}^{*}\)-notation (which ignores polynomial factors). The latter is appropriate if we are dealing with exponential-time algorithms (see Section 5).
Since we also wish to discuss some of our results from the parameterised complexity point of view, we shall briefly mention the concepts relevant for us (for detailed explanations on parameterised complexity, the reader is referred to the textbooks [45,46,47]). A parameterised problem is a decision problem with instances (x, k), where x is the actual input and \(k \in \mathbb {N}\) is the parameter. By XP, we denote the class of parameterised problems that are solvable in time \(\mathcal {O}(n^{f(k)})\), for a computable function f (where n is the size of the instance), and FPT denotes the class of fixed-parameter tractable problems, i. e., problems having an algorithm with running time \(\mathcal {O}(g(k) \cdot f(n))\), for a computable function g and a polynomial f.
In order to argue about fixed-parameter intractability, we need the following kind of reduction. A (classical) many-one reduction R from one parameterised problem to another is an fpt-reduction if the parameter of the target problem is bounded in terms of the parameter of the source problem, i. e., there is a recursive function \(h\colon \mathbb {N} \rightarrow \mathbb {N}\) such that \(R(x, k) = (x^{\prime }, k^{\prime })\) implies \(k^{\prime } \leq h(k)\).
We shall use two different kinds of fixed-parameter intractability. First, if a parameterised problem is NP-hard when the parameter is fixed to a constant, then it is not in FPT, unless P = NP. As a slightly weaker form of fixed-parameter intractability, the framework of parameterised complexity provides the classes of the so-called W-hierarchy, for which the hard problems (with respect to fpt-reductions) are considered fixed-parameter intractable, i. e., they are not in FPT (under certain complexity-theoretic assumptions). For a detailed definition of the W-hierarchy, we refer to the textbooks [45,46,47]; in this paper, we only use the first level of this hierarchy, i. e., the class W[1], and our respective intractability results are W[1]-hardness results.
A minimisation problem^{Footnote 7} P is a triple (I, S, m), with I being the set of instances, S being a function that maps instances x ∈ I to the set of feasible solutions for x, and m being the objective function that maps pairs (x, y), with x ∈ I and y ∈ S(x), to a positive rational number. For every x ∈ I, we denote \(m^{*}(x):=\min \limits \{m(x,y)\colon y\in S(x)\}\). For two minimisation problems P_{1}, P_{2} with P_{j} given by (I_{j}, S_{j}, m_{j}), j ∈{1,2}, an L-reduction from P_{1} to P_{2} is a quadruple (f, g, β, γ) such that

f is a polynomial-time computable function from I_{1} to I_{2} that satisfies S_{2}(f(x)) ≠ ∅ for every x ∈ I_{1} with S_{1}(x) ≠ ∅.

g is a polynomial-time computable function that, for every x ∈ I_{1} and y ∈ S_{2}(f(x)), maps (x, y) to a solution in S_{1}(x).

β is a constant such that \(m_{2}^{*}(f(x))\leq \beta \cdot m_{1}^{*}(x)\) for each x ∈ I_{1}.

γ is a constant such that \(m_{1}(x,g(x,y)) - m_{1}^{*}(x)\leq \gamma \cdot (m_{2}(f(x),y) - m_{2}^{*}(f(x)))\) for each x ∈ I_{1} and y ∈ S_{2}(f(x)).
We shall use L-reductions in order to show hardness for APX, the class of optimisation problems for which there exists an approximation algorithm with a constant approximation ratio. Note that, unless P = NP, an APX-hard problem does not have a polynomial-time approximation scheme (see [48] for detailed information on approximation hardness).
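To see why L-reductions serve this purpose, suppose that y is an α-approximate solution for f(x) with respect to P_{2}, i. e., m_{2}(f(x), y) ≤ α ⋅ m_{2}^{*}(f(x)). Chaining the γ-condition and the β-condition then bounds the quality of g(x, y) for P_{1} (a standard consequence of the definition, spelled out here for convenience):

```latex
m_{1}(x, g(x,y)) \leq m_{1}^{*}(x) + \gamma \cdot \bigl( m_{2}(f(x), y) - m_{2}^{*}(f(x)) \bigr)
                 \leq m_{1}^{*}(x) + \gamma (\alpha - 1) \, m_{2}^{*}(f(x))
                 \leq \bigl( 1 + \beta \gamma (\alpha - 1) \bigr) \cdot m_{1}^{*}(x),
```

so a constant-ratio approximation for P_{2} translates into a constant-ratio approximation for P_{1}, which is exactly how L-reductions transfer APX-hardness.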
Grammars
A context-free grammar is a tuple G = (N, Σ, R, S), where N is the set of nonterminals, Σ is the terminal alphabet, S ∈ N is the start symbol and \(R \subseteq N \times (N \cup {\Sigma })^{+}\) is the set of rules (as a convention, we write rules (A, w) ∈ R also in the form A → w). A context-free grammar G = (N, Σ, R, S) is a singleton grammar if R is a total function N → (N ∪ Σ)^{+} and the relation {(A, B) ∣ (A, w) ∈ R, |w|_{B} ≥ 1} is acyclic.
For a singleton grammar G = (N, Σ, R, S), let D_{G}: (N ∪ Σ) → (N ∪ Σ)^{+} be defined by D_{G}(A) = R(A), A ∈ N, and D_{G}(a) = a, a ∈ Σ. We extend D_{G} to a morphism (N ∪ Σ)^{+} → (N ∪ Σ)^{+} by setting D_{G}(α_{1}α_{2}…α_{n}) = D_{G}(α_{1})D_{G}(α_{2})…D_{G}(α_{n}), for α_{i} ∈ (N ∪ Σ), 1 ≤ i ≤ n. Furthermore, for every α ∈ (N ∪ Σ)^{+}, we set \({\mathsf {D}^{1}_{G}}(\alpha ) = \mathsf {D}_{G}(\alpha )\), \({\mathsf {D}^{k}_{G}}(\alpha ) = \mathsf {D}_{G}(\mathsf {D}^{k-1}_{G}(\alpha ))\), for every k ≥ 2, and \(\mathfrak {D}_{G}(\alpha ) = \lim _{k \to \infty } {\mathsf {D}^{k}_{G}}(\alpha )\) is the derivative of α. By definition, \(\mathfrak {D}_{G}(\alpha )\) exists for every α ∈ (N ∪ Σ)^{+} and is an element of Σ^{+}. The size of the singleton grammar G is defined by \(|G| = {\sum }_{A \in N} |\mathsf {D}_{G}(A)|\) and the rule-size of G is defined by |G|_{r} = |G| + |N| or, equivalently, \(|G|_{\mathsf {r}} = {\sum }_{A \in N} (|\mathsf {D}_{G}(A)| + 1)\). Our main size measure will be |⋅|. The rule-size |⋅|_{r} will play a role in Section 3.3 and will be discussed in more detail there.
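The derivative and the two size measures can be phrased operationally; the following Python sketch (our own encoding: a grammar is a dict mapping each nonterminal, including S, to its right side, with single-character symbols) is a direct transcription of the definitions:

```python
def derivative(rules, alpha):
    # Compute the derivative of alpha: iterate D_G until only terminals remain;
    # this terminates because the rule relation of a singleton grammar is acyclic.
    while any(sym in rules for sym in alpha):
        alpha = "".join(rules.get(sym, sym) for sym in alpha)
    return alpha

def size(rules):
    # |G|: the sum of the lengths of all right sides.
    return sum(len(rhs) for rhs in rules.values())

def rule_size(rules):
    # |G|_r = |G| + |N|: the size plus the number of rules.
    return size(rules) + len(rules)
```

For instance, for rules = {"S": "ABB", "A": "BB", "B": "ab"}, the derivative of S is abababab, the size is 7 and the rule-size is 10.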
Remark 1
The class of singleton grammars coincides exactly with the class of context-free grammars that do not have unreachable rules (i. e., rules that cannot occur in any derivation) and that derive exactly one word. As mentioned before, such grammars are also called straight-line programs in the literature. A context-free grammar that can derive only a single word and is not a singleton grammar must contain some rules that are not reachable. Since unreachable rules can easily be discovered and removed, we directly add this restriction to the concept of singleton grammars.
The derivation tree of G is a ranked ordered tree with node labels from Σ ∪ N, inductively defined as follows. The root is labelled by S and every node labelled by A ∈ N with D(A) = α_{1}α_{2}…α_{n} has n children labelled by α_{1}, α_{2},…, α_{n} in exactly this order; note that this means that all leaves are labelled by symbols from Σ.
From now on, we simply use the term grammar instead of singleton grammar and, if the grammar under consideration is clear from the context, we also drop the subscript G. We set \(\mathfrak {D}(G) = \mathfrak {D}(S)\) and say that G is a grammar for \(\mathfrak {D}(G)\). Since for singleton grammars the start symbol is somewhat superfluous, we will ignore it and denote grammars G = (N, Σ, R, S) in the form G = (N, Σ, R, ax) instead, where ax = R(S) is called the axiom (of G). In particular, we interpret derivations to start directly with the axiom and, correspondingly, we also sometimes ignore the root of derivation trees. However, this does not change the size measures |⋅| and |⋅|_{r}, which, when ignoring the start symbol, can also be defined as \(|G| = ({\sum }_{A \in N} |\mathsf {D}_{G}(A)|) + |\mathsf {ax}|\) and \(|G|_{\mathsf {r}} = ({\sum }_{A \in N} (|\mathsf {D}_{G}(A)| + 1)) + |\mathsf {ax}| + 1\).
The number of levels of a grammar G = (N, Σ, R, ax) is \(\min \limits \{k \mid {\mathsf {D}^{k}_{G}}(\mathsf {ax}) = \mathfrak {D}_{G}(\mathsf {ax})\}\), and a grammar with d levels is a d-level grammar. Intuitively speaking, a grammar G is a d-level grammar if we need exactly d derivation steps in order to derive \(\mathfrak {D}(G)\) from the axiom; thus, the number of levels measures what we called in the introduction the “hierarchical depth” of a grammar. Note that for a d-level grammar, the derivation tree has a maximum depth of d + 1, i. e., at most d + 2 levels when counting the root as well. With this definition, the grammars that are the most restricted with respect to their hierarchical depth, and that are still reasonable, are 1-level grammars (i. e., an axiom that derives a word in one step).
Let G = (N, Σ, R, ax) be a 1-level grammar. The profit of a rule (A, α) ∈ R is defined by p(A) = |ax|_{A} ⋅ (|α| − 1) − |α|. Intuitively speaking, if all occurrences of A in ax are replaced by α and the rule A → α is deleted, then the size of the grammar increases by exactly p(A). Consequently, \(|G| = |\mathfrak {D}(G)| - {\sum }_{A \in N} \mathsf {p}(A)\).
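A quick Python check of the profit identity on an ad-hoc 1-level example (the grammar below is ours, not from the paper): with axiom ax = AabA and the single rule A → abab, we have \(\mathfrak {D}(G)\) = ababababab.

```python
def profit(ax, A, alpha):
    # p(A) = |ax|_A * (|alpha| - 1) - |alpha| for a rule A -> alpha
    # of a 1-level grammar (single-character nonterminal names assumed).
    return ax.count(A) * (len(alpha) - 1) - len(alpha)

def one_level_size(ax, rules):
    # |G| = |ax| + sum of the rule sizes.
    return len(ax) + sum(len(rhs) for rhs in rules.values())

def expand_once(ax, rules):
    # For a 1-level grammar, one application of D_G already yields the derivative.
    return "".join(rules.get(sym, sym) for sym in ax)
```

Here profit("AabA", "A", "abab") = 2 ⋅ 3 − 4 = 2 and indeed |G| = 8 = |𝔇(G)| − p(A) = 10 − 2.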
Example 1
The grammar G = (N, Σ, R, ax) with N = {A, B}, Σ = {a, b}, ax = AAB and with R containing the rules A → BaB and B → baab is a 2-level grammar of size 13 (and rule-size 16). Furthermore, \(\mathfrak {D}(B) = {\mathtt {b}} {\mathtt {a}} {\mathtt {a}} {\mathtt {b}}\) and \(\mathfrak {D}(A) = \mathfrak {D}(B) {\mathtt {a}} \mathfrak {D}(B) = {\mathtt {b}} {\mathtt {a}} {\mathtt {a}} {\mathtt {b}} {\mathtt {a}} {\mathtt {b}} {\mathtt {a}} {\mathtt {a}} {\mathtt {b}}\).
Consequently, G is a size 13 representation of a word of length 25. A derivation tree of G can be seen in Fig. 1.
Replacing the axiom by R(A)R(A)B = BBBBB and deleting the rule A → BaB turns G into a 1-level grammar \(G^{\prime }\) with \(\mathfrak {D}(G^{\prime }) = \mathfrak {D}(G)\). Moreover, p(B) = |ax|_{B} ⋅ (|R(B)| − 1) − |R(B)| = 5 ⋅ (4 − 1) − 4 = 11 and \(|G^{\prime }| = |\mathfrak {D}(G^{\prime })| - \mathsf {p}(B) = 25 - 11 = 14\).
A smallest grammar for a word w is any grammar G with \(\mathfrak {D}(G) = w\) and \(|G| \leq |G^{\prime }|\) for every grammar \(G^{\prime }\) with \(\mathfrak {D}(G^{\prime }) = w\); generally, a grammar G is smallest if it is a smallest grammar for \(\mathfrak {D}(G)\) (grammars that are smallest with respect to the rule-size measure will be called r-smallest grammars). The decision problem variant of computing smallest grammars is defined as follows:

Smallest Grammar Problem (SGP)

Instance: A word w and a \(k \in \mathbb {N}\).

Question: Does there exist a grammar G with \(\mathfrak {D}(G) = w\) and |G| ≤ k?
The Smallest 1-Level Grammar Problem (1SGP) is defined analogously, with the only difference that we ask for a 1-level grammar of size at most k. By SGP_{r} and 1SGP_{r}, we denote the problem variants where we consider the rule-size instead of the size, i. e., we require |G|_{r} ≤ k.
The optimisation variant of SGP, i. e., the task of actually producing a smallest grammar for a given word w, shall be denoted by SGP_{opt} (and SGP_{r, opt} if we are concerned with the rule-size). More precisely, according to the definitions given in Section 2.1, SGP_{opt} = (I, S, m), where I = Σ^{∗}, \(S(w) = \{G \mid \mathfrak {D}(G) = w\}\) and m(w, G) = |G| (or m(w, G) = |G|_{r} for SGP_{r, opt}).
Examples
While the following examples illustrate the smallest grammar problem in general, they are particularly tailored to the technicalities to be encountered in Section 3, i. e., they shall point out the difficulties arising in predicting how factors in a larger word are compressed by a smallest grammar, which is crucial in the design of gadgets for a hardness reduction.
Let \(w = {\prod }^{n}_{i = 1} 1 0^{i}\) be a word over the binary alphabet Σ = {0,1}, where n = 2^{k}, \(k \in \mathbb {N}\). This word has a very simple structure and can be interpreted as a list of a (potentially unbounded) number of integers. This is crucial, since if we want to encode objects (e. g., graphs), the size of which is not bounded in terms of the alphabet size, then structures of this form will inevitably appear.
One way of compressing w that comes to mind is by the use of rules A_{1} → 10, A_{i} → A_{i− 1}0, 2 ≤ i ≤ n − 1, and an axiom A_{1}A_{2}…A_{n− 1}A_{n− 1}0; call the resulting grammar G_{1} = (N, Σ, R, ax).
This grammar has an overall size of \(|G_{1}| = \underbrace {n + 1}_{|\mathsf {ax}|} + \underbrace {2(n - 1)}_{\text {rules}} = 3n - 1\).
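The grammar G_{1} and its size bound can be verified mechanically. In the Python sketch below (our own encoding: the nonterminal A_{i} is modelled as the integer i, terminals as the characters '0' and '1'), we build G_{1} for a given n, expand it, and check both \(\mathfrak {D}(G_{1}) = w\) and |G_{1}| = 3n − 1:

```python
def build_g1(n):
    # Rules A_1 -> 10 and A_i -> A_{i-1} 0 for 2 <= i <= n - 1,
    # with axiom A_1 A_2 ... A_{n-1} A_{n-1} 0.
    rules = {1: ["1", "0"]}
    for i in range(2, n):
        rules[i] = [i - 1, "0"]
    ax = list(range(1, n)) + [n - 1, "0"]
    return ax, rules

def expand(ax, rules):
    # Compute the derivative by recursively replacing nonterminals.
    out = []
    def rec(sym):
        if sym in rules:
            for s in rules[sym]:
                rec(s)
        else:
            out.append(sym)
    for sym in ax:
        rec(sym)
    return "".join(out)

def grammar_size(ax, rules):
    # |G| = |ax| + sum of the rule sizes.
    return len(ax) + sum(len(rhs) for rhs in rules.values())
```

For n = 8, expand(*reversed(build_g1(8))) is not needed; expand(ax, rules) reproduces \(w = {\prod }^{8}_{i = 1} 1 0^{i}\) and grammar_size(ax, rules) equals 23 = 3 ⋅ 8 − 1.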
However, it is also possible to construct the factors 0^{i}, 1 ≤ i ≤ n, “from the middle” by rules A_{1} → 010, A_{i} → 0A_{i− 1}0, \(2 \leq i \leq \frac {n}{2} - 1\), and an axiom 1(A_{1})^{2}(A_{2})^{2}… By using these ideas, we can construct a smaller grammar G_{2} = (N, Σ, R, ax).
We have \(|G_{2}| = \underbrace {n + 4}_{\mathsf {ax}} + \underbrace {3(\tfrac {n}{2} − 1) + 2(k−2)}_{\text {rules}} = \frac {5n}{2} + 2k − 3\).
Both of these grammars achieve an asymptotic compression rate of order \(\mathcal {O}(\sqrt {|w|})\), but, generally, grammars are capable of exponential compression rates (see [33, 34]). Aiming for such exponential compression, it seems worthwhile to represent every unary factor \(0^{2^{\ell }}\), 1 ≤ ℓ ≤ k, by a nonterminal B_{ℓ} (obviously, this requires only k rules of size 2) and then represent all unary factors by sums of these powers (e. g., 0^{74} is compressed by B_{1}B_{3}B_{6}, since 74 = 2 + 8 + 64). Formally, consider G_{3} = (N,Σ, R, ax), where
where α_{i} = x_{0}x_{1}…x_{k− 1} and, for every j, 0 ≤ j ≤ k − 1, x_{j} = B_{j} if the j^{th} bit (i. e., the one representing 2^{j}) of the binary representation of i is 1 and x_{j} = ε otherwise. However, this yields a grammar of size
which, if k is sufficiently large, is worse than the previous grammars.
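The binary-powers idea behind G_{3} can be illustrated as follows. The helper names are ours, and we let a literal 0 cover the lowest bit, for which the construction above provides no nonterminal.

```python
# Represent a unary factor 0^i by nonterminals B_l with B_l deriving
# 0^(2^l), following the binary expansion of i (a literal 0 covers bit 0).
def powers_factor(i):
    syms = ["0"] if i & 1 else []
    syms += [f"B{l}" for l in range(1, i.bit_length()) if (i >> l) & 1]
    return syms

def expand(syms):
    return "".join("0" if s == "0" else "0" * (2 ** int(s[1:])) for s in syms)

assert powers_factor(74) == ["B1", "B3", "B6"]   # 74 = 2 + 8 + 64
assert all(expand(powers_factor(i)) == "0" * i for i in range(1, 300))
```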
A grammar that is even smaller than G_{2} can be obtained by combining the idea of G_{2} with that of representing factors \(0^{2^{\ell }}\) by nonterminals B_{ℓ}. More precisely, for every ℓ, 1 ≤ ℓ ≤ k − 1, we represent \(0^{2^{\ell }}\) by an individual nonterminal B_{ℓ} and, in addition, we use rules A_{1} → 010, A_{i} → 0A_{i− 1}0, \(2 \leq i \leq \frac {n}{4}\). Then the left and right half of w can be compressed in the way of G_{2}, with the only difference that in the right part, for every unary factor, we also need an occurrence of B_{k− 1}, i. e., consider G_{4} = (N,Σ, R, ax) with:
This grammar has size \(|G_{4}| = \underbrace {\tfrac {3n}{2} + 1}_{\mathsf {ax}} + \underbrace {\tfrac {3n}{4} + 2(k−1)}_{\text {rules}} = \frac {9n}{4} + 2k − 1\). Note that again the asymptotic compression rate is of order \(\mathcal {O}(\sqrt {|w|})\).
These considerations show that, even for simply structured words like w, it is very difficult to determine the structure of a smallest grammar or its size. However, for reducing an NP-hard problem, we need to know, at least to some extent, how smallest grammars compress the constructed strings in order to relate the reduced instances to the original ones. The above examples thus illustrate the challenges that arise in this regard.
We conclude this list of examples by pointing out that determining a smallest grammar for our toy example \(w = {\prod }^{n}_{i = 1} 1 0^{i}\) as a function of n is essentially an open problem. A lower bound of \({\Omega }(\sqrt {|w|})\) seems plausible, but we have no proof for this claim.
Storer and Szymanski’s External Pointer Macro Scheme and Grammar-Based Compression
Storer and Szymanski [9] introduce a very general compression scheme that covers a large variety of different compression strategies, in particular also grammar-based compression. On the one hand, we cite their work as the first that, in a sense, considered grammar-based compression; in the context of our paper, however, it is also of greater importance for the following reasons. The technical report [10]^{Footnote 8} provides a comprehensive complexity analysis of many different variants of Storer and Szymanski’s compression scheme, with many NP-hardness reductions. Some of the considered variants also concern the case of fixed alphabets, which has led to the misunderstanding that the hardness of the smallest grammar problem for fixed alphabets is already provided by [10], and thus to the misconception that grammar-based compression is known to be intractable also in practical scenarios, i. e., for fixed alphabets. Since closing this gap by providing the assumed hardness result is one of the main objectives of this paper, we shall discuss in some more detail why it cannot already be found among the many hardness results of [10].
First, we recall the definitions of Storer and Szymanski [9] that are relevant here. For a word w ∈Σ^{+} and a pointer size \(p \in \mathbb {N}\), a compressed form of w for pointer size p using the external pointer macro, EPM for short, is any word s_{0}#s_{1} with \(s_{0}, s_{1} \in ({\Sigma } \cup \{1, 2, \ldots , |s_{0}|\}^{2})^{+}\), #∉Σ, and w can be obtained from s_{0}#s_{1} by repeating the following two steps:

Replace every symbol (i, j) in s_{1} by s_{0}[i..j],

Repeat the first step until s_{1} equals w.
The size of an EPM s_{0}#s_{1} is defined by \({\sum }_{i = 1}^{|s_{0} s_{1}|} \ell _{i}\), where ℓ_{i} = 1, if s_{0}s_{1}[i] ∈Σ and ℓ_{i} = p, otherwise (i. e., each occurrence of a symbol from \(\{1, 2, \ldots , |s_{0}|\}^{2}\) (the actual pointers) contributes the pointer size p to the overall size of the EPM).
A grammar for a word w easily translates into an EPM for w. For example, the grammar G = (N,Σ, R, ax) with N = {A, B}, Σ = {a, b, c}, R = {A → BcB, B → ba} and ax = AabBBAc translates into the external pointer macro ba(1,2)c(1,2)#(3,5)ab(1,2)(1,2)(3,5)c. More precisely, the prefix ba is the right side of the rule for B, (1,2)c(1,2) corresponds to the right side of the rule for A, where the occurrences of B are represented by pointers (1,2) to the prefix s_{0}[1..2] = ba, and (3,5)ab(1,2)(1,2)(3,5)c corresponds to the axiom, where occurrences of A and B are represented by pointers (3,5) and (1,2), respectively. If the pointer size is 1, then the EPM has the same size as the grammar.
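The worked example can be replayed in code. The following sketch (the data structures are ours) expands the grammar G and decompresses the EPM by repeated pointer replacement, checking that both yield the same word.

```python
# Grammar G: A -> BcB, B -> ba, axiom AabBBAc.
rules = {"A": ["B", "c", "B"], "B": ["b", "a"]}
axiom = ["A", "a", "b", "B", "B", "A", "c"]

def expand(sym):
    return "".join(expand(s) for s in rules[sym]) if sym in rules else sym

word = "".join(expand(s) for s in axiom)

# EPM ba(1,2)c(1,2)#(3,5)ab(1,2)(1,2)(3,5)c: pointers (i, j) are pairs
# referring to (1-indexed) positions of s0; replace until no pointer is left.
s0 = ["b", "a", (1, 2), "c", (1, 2)]
s1 = [(3, 5), "a", "b", (1, 2), (1, 2), (3, 5), "c"]

while any(isinstance(x, tuple) for x in s1):
    step = []
    for x in s1:
        step.extend(s0[x[0] - 1:x[1]] if isinstance(x, tuple) else [x])
    s1 = step

assert "".join(s1) == word == "bacbaabbababacbac"
```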
If an EPM s_{0}#s_{1} is nonoverlapping, i. e., it is never the case that for two pointers (i, j) and (k, ℓ) we have i ≤ k ≤ j or k ≤ i ≤ ℓ, then it also translates into a grammar by transforming each pointer (i, j) into a nonterminal A_{(i, j)} with a rule A_{(i, j)} → s_{0}[i..j]. In this regard, it is important to note that the property of an EPM that s_{1} can be turned into w by repeated replacement of the pointers ensures that the derivation function of the grammar constructed in this way is acyclic.
We conclude that the concept of singleton grammars and the concept of EPMs with pointer size 1 and without overlapping are more or less identical, i. e., they just differ syntactically. Consequently, the problem of grammarbased compression and the problem of computing smallest EPMs with pointer size 1 and without overlapping are identical problems.
However, a closer look at Storer [10] shows that in this paper the variant of computing EPMs with pointer size 1 is not considered. Instead, the focus is on EPMs (and other kinds of compression schemes) for which the pointer size is not even constant, but a function of the length of the word that is compressed, typically logarithmic in the size |w|. Note that this avoids the main difficulties encountered when designing a reduction for grammar-based compression with fixed alphabets (see Section 3): the factors that encode vertices of a graph must have unbounded length, which makes it rather difficult to control how the grammar compresses these codewords. On the other hand, if the pointers (which correspond to nonterminals in the grammar) have size \(\log ({|w|})\), then it does not make sense to compress factors that are smaller than this size (since we gain nothing by replacing them by pointers). It is straightforward to represent a graph as a word of length linear in the size of the graph, where the length of the factors (i. e., the codewords) that represent single vertices is logarithmic in the size of the graph (this is the case in all reductions of [9, 33, 34]). The property mentioned above, i. e., that factors of logarithmic size are not compressed, then simply means that we can assume that the codewords for vertices are not compressed in the string that describes the graph, which makes it rather simple to devise a hardness reduction (in fact, controlling the possible compression of codewords is the main technical challenge in our reductions).
NP-Hardness of Computing Smallest Grammars for Fixed Alphabets
In their basic structure, the hardness reductions to be presented next are similar to the one from [33, 34], which shows NP-hardness of SGP for unbounded alphabets by a reduction from the vertex cover problem. All the effort of this section will consist in extending this general idea to the case of a fixed alphabet. In order to facilitate the accessibility of our technical proofs, we shall sketch this reduction from [33, 34].
Let \(\mathcal {G} = (V, E)\) be a graph with
We define the following word over the alphabet V ∪{◇_{i}∣1 ≤ i ≤ 5n + m}∪{#} (for the sake of simplicity, every individual occurrence of ◇ in the word stands for a distinct symbol of {◇_{i}∣1 ≤ i ≤ 5n + m}):
Let G = (N,Σ, R, S) be a smallest grammar for \(w_{\mathcal {G}}\), then we can observe the following:

For every A ∈ N, \(\mathfrak {D}(A) \in \{\# v_{i}, v_{i} \#, \# v_{i} \# \mid 1 \leq i \leq n\}\). This is due to the fact that the only factors of \(w_{\mathcal {G}}\) with repetitions are of the form #v_{i}, v_{i}# or #v_{i}#.

We can assume that, for every i, 1 ≤ i ≤ n, there are rules A_{i} → #v_{i} and B_{i} → v_{i}#, since if some of these rules are missing, then adding them and compressing the respective factors does not increase the size of the grammar.

Let \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\) contain exactly the indices i such that a rule with derivative #v_{i}# exists; moreover, we can assume that all these rules have the form C_{i} → A_{i}#.

Let \({\Gamma } = \{v_{i} \mid i \in \mathfrak {I}\}\). If an edge \((v_{j_{2i−1}}, v_{j_{2i}})\) is not covered by Γ, then adding a rule \(C_{j_{2i−1}} \to A_{j_{2i−1}} \#\) or \(C_{j_{2i}} \to A_{j_{2i}} \#\) does not increase the size of the grammar. So we can assume that Γ is a vertex cover.
These observations show that there exists a grammar G for \(w_{\mathcal {G}}\) with |G| ≤ 15n + 3m + k if and only if there is a vertex cover for \(\mathcal {G}\) of size at most k (for a formal proof, we refer to [33, 34]).
A simple modification of this reduction yields the following.
Theorem 1
1SGP is NP-complete.
Proof
We slightly change the reduction from [33, 34] as follows:
The only difference from the original reduction is that the size of the rules with derivative #v_{i}# has increased by 1, i. e., they now have the form C_{i} → #v_{i}#, so by repeating the factors #v_{i}# ◇, we make sure that adding such a rule whenever an edge is not covered does not increase the size of the grammar. □
In these reductions, we encode the different vertices of a graph by single symbols and also use individual separator symbols (i. e., symbols with only one occurrence in the word to be compressed). This makes it particularly easy to devise suitable gadgets, but, on the other hand, it assumes that we have an arbitrarily large alphabet at our disposal. In the remainder of this section, we shall extend these hardness results to the more realistic case of fixed alphabets. The general structure of our reductions is similar to the ones of [10, 33, 34] sketched above, but, due to the constraint of having a fixed alphabet, they substantially differ on a more detailed level. More precisely, since fixed alphabets make it impossible to use single symbols (or even words of constant size) as separators or as representatives for vertices, we need to use special encodings for which we are able to determine how a smallest grammar will compress them (in this regard, recall our examples from Section 2.3 demonstrating how difficult it can be to determine a smallest grammar even for a single simply structured word). This constitutes a substantial technical challenge, which complicates our reductions considerably.
In the following, we prove that 1SGP and SGP are NP-hard, even for constant alphabets of size 5 and 24, respectively. The stronger result claimed in the abstract and introduction, i. e., the hardness of SGP for alphabets of size 17, is presented later as an improvement (see Section 3.4, Corollary 1).
The 1Level Case
As a tool for proving the hardness of 1SGP, but also as a result in its own right, we first show that the compression of any 1-level grammar is at best quadratic (in contrast to general grammars, which can achieve exponential compression). Note that the bound of Lemma 1 is tight, e. g., consider \(\mathtt {a}^{n^{2}}\) and a grammar with rules S → A^{n} and A → \(\mathtt {a}^{n}\).
Lemma 1
Let G be a 1-level grammar. Then \(|G| \geq 2 \sqrt {|\mathfrak {D}(G)|}\).
Proof
Let \(n = |\mathfrak {D}(G)|\), let ax be the axiom and let A → u be a rule with a right side of maximum length. Obviously, |ax| ⋅ |u| ≥ n, and, since \(x+y\geq 2\sqrt {xy}\) holds for all x, y ≥ 0, also \(|\mathsf {ax}| + |u| \geq 2 \sqrt {|\mathsf {ax}||u|}\). Consequently, \(|G| \geq |\mathsf {ax}| + |u| \geq 2 \sqrt {|\mathsf {ax}||u|} \geq 2\sqrt {n}\).
□
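The tightness example for Lemma 1 can be checked directly; the concrete choice of n below is ours.

```python
# For a^(n^2), the 1-level grammar with axiom A^n and rule A -> a^n
# has size 2n, matching the lower bound 2 * sqrt(n^2) of Lemma 1.
n = 6
word = "a" * (n * n)

axiom = ["A"] * n          # axiom A^n, size n
rule_rhs = "a" * n         # rule A -> a^n, size n

assert "".join(rule_rhs for _ in axiom) == word
size = len(axiom) + len(rule_rhs)
assert size == 2 * n
assert size ** 2 == 4 * len(word)    # i.e. size == 2 * sqrt(|word|)
```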
In order to prove the NP-hardness of 1SGP for constant alphabets, we also devise a reduction from the vertex cover problem. To this end, let \(\mathcal {G} = (V, E)\) be the graph defined above and, without loss of generality, we assume n ≥ 40. We define Σ = {a, b, ◇, ⋆, #} and \([\diamond ] = \diamond ^{n^{3}}\). For each i, 1 ≤ i ≤ n, we encode v_{i} by a word \(\overline {v_{i}} \in \{\mathtt {a},\mathtt {b}\}^{\lceil \log (n)\rceil }\) such that \(\overline {v_{i}} \neq \overline {v_{j}}\) if and only if i≠j (e. g., by taking \(\overline {v_{i}}\) to be the binary representation of i over the symbols a and b with \(\lceil \log (n)\rceil \) many digits). We now define the following word over Σ:
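The vertex encoding can be sketched as follows; the helper name is ours, and the sketch assumes n is not an exact power of two, so that all of 1, …, n fit in ⌈log n⌉ bits.

```python
# Encode v_i by the binary representation of i over {a, b},
# padded to ceil(log2(n)) digits.
from math import ceil, log2

def encode_vertex(i, n):
    bits = format(i, "0%db" % max(1, ceil(log2(n))))
    return bits.replace("0", "a").replace("1", "b")

n = 40                                              # our sample size, n >= 40
codes = [encode_vertex(i, n) for i in range(1, n + 1)]
assert len(set(codes)) == n                         # pairwise distinct
assert all(len(c) == ceil(log2(n)) for c in codes)  # fixed length
```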
First, we show how a vertex cover for \(\mathcal {G}\) translates into a grammar for w:
Lemma 2
If there exists a size-k vertex cover of \(\mathcal {G}\), then there exists a 1-level grammar G with \(\mathfrak {D}(G) = w\) and \(|G| = 13n\left \lceil \log (n) \right \rceil + 17n + k + 6m + 1 + 2n^{3}\).
Proof
Let \({\Gamma } \subseteq V\) be a sizek vertex cover of \(\mathcal {G}\). We define a grammar G = (N,Σ, R, ax) with
where, for every i, 1 ≤ i ≤ n, \(y_{i} = \overset {{~}_{\leftrightarrow }}{V_{i}}\) if v_{i} ∈Γ and \(y_{i} = \overset {{~}_{\leftarrow }}{V_{i}} \#\) otherwise, and, for every i, 1 ≤ i ≤ m, \(z_{i} = \overset {{~}_{\leftrightarrow }}{V}_{j_{2i−1}} {\!}_{\rightarrow }{V}_{j_{2i}}\) if \(v_{j_{2i−1}} \in {\Gamma }\) and \(z_{i} = \overset {{~}_{\leftarrow }}{V}_{j_{2i−1}} \overset {{~}_{\leftrightarrow }}{V}_{j_{2i}}\) if \(v_{j_{2i−1}} \notin {\Gamma }\) (note that in this case \(v_{j_{2i}} \in {\Gamma }\)).
Obviously, G is a 1-level grammar and it can be easily verified that \(\mathfrak {D}(G) = w\). It remains to determine the size of G. To this end, we first observe that each rule \(\overset {{~}_{\leftarrow }}{V_{i}} \to \# \overline {v_{i}}\) and \({\!}_{\rightarrow }{V_{i}} \to \overline {v_{i}} \#\), 1 ≤ i ≤ n, has size \(\lceil \log (n)\rceil + 1\), each rule \(\overset {{~}_{\leftrightarrow }}{V_{j}} \to \# \overline {v_{j}} \#\), v_{j} ∈Γ, has size \(\lceil \log (n)\rceil + 2\), and the rule D → [◇] has size n^{3}. Hence, the size contributed by these rules is
The axiom has size
So the total size is
□
Next, we take care of the opposite direction, i. e., we show how a vertex cover can be extracted from a grammar for w:
Lemma 3
If there exists a 1-level grammar G with \(\mathfrak {D}(G) = w\) and \(|G| \leq 13n\left \lceil \log (n) \right \rceil + 17n + k + 6m + 1 + 2n^{3}\), then there exists a size-k vertex cover of \(\mathcal {G}\).
Proof
Let G = (N,Σ, R, ax) be a smallest 1-level grammar with
and \(\mathfrak {D}(G) = w\). We first observe that, since n ≥ 40,
Thus, \(|G| < \frac {n^{3}}{2} + 2n^{3} = \frac {5n^{3}}{2}\). Due to the separator symbol ⋆ with only one occurrence in w, we know that the axiom of G has the form \(u \star u^{\prime }\). Hence, we can consider all the nonterminals (and their rules) that occur in \(u^{\prime }\) as an individual 1-level grammar \(G^{\prime }\) for the word \(\mathfrak {D}(u^{\prime }) = [\diamond ]^{n^{3}}\) of size n^{6}. By Lemma 1, we can conclude that \(|G^{\prime }| \geq 2n^{3}\); thus, \(2n^{3} \leq |G| < \frac {5n^{3}}{2}\). Claim 1: There is a D ∈ N with D → [◇] and, for every other rule A → x in R, |x|_{◇} = 0.
Proof of Claim 1: First, we assume that there is a rule A →◇^{ℓ} with ℓ > n^{3}. This rule can only be used in order to compress the suffix \([\diamond ]^{n^{3}}\) of w, since the other part of w has no occurrence of a factor ◇^{ℓ}. Hence, we can replace A →◇^{ℓ} by the rule \(A \to \diamond ^{n^{3}}\) and change the axiom to \(u \star A^{n^{3}}\). By Lemma 1, the rule \(A \to \diamond ^{n^{3}}\) with axiom \(A^{n^{3}}\) compresses the subword \([\diamond ]^{n^{3}}\) optimally which means that this operation does not increase the size of G. Therefore, we conclude that G does not contain a rule A →◇^{ℓ} with ℓ > n^{3}.
Since w contains at least n^{3} nonoverlapping occurrences of the factor [◇] and since \(|G| < 3n^{3}\), at least one of these factors must be produced by at most 2 nonterminals. This implies that there is a rule B → v with \(|v| \geq \frac {|[\diamond ]|}{2} = \frac {n^{3}}{2}\). If v contains a symbol from Σ ∖{◇}, then B → v is not a rule of \(G^{\prime }\); thus, by Lemma 1, it follows that \(|G| \geq |G^{\prime }| + \frac {n^{3}}{2} \geq 2n^{3} + \frac {n^{3}}{2} = \frac {5n^{3}}{2}\), which is a contradiction. Hence, we can conclude that v ∈{◇}^{∗} and we further assume that, among all rules with a right side in {◇}^{∗} of size at least \(\frac {n^{3}}{2}\), B → v is such that |v| is maximal. Moreover, let |v| = n^{3} − t, for a \(t \in \mathbb {N}\).
We note that, due to the maximality of B → v and the fact that all rules in \(G^{\prime }\) have a right side in {◇}^{∗}, a rule of maximum size in \(G^{\prime }\) has size at most n^{3} − t. In particular, this implies
where \(u^{\prime }\) is the right side of the axiom as defined above.
We now remove rule B → v, add the rule D → [◇] and replace part \(u^{\prime }\) of the axiom by \(D^{n^{3}}\). Since |[◇]| = |v| + t and \(|u^{\prime }| \geq n^{3} + t = |D^{n^{3}}| + t\), this does not increase the size of the grammar. However, the rule B → v might have been used in order to produce some of the factors [◇] in the left part u of the axiom of G; thus, since we removed the rule B → v, we have to repair G accordingly.
To this end, we first note that every occurrence of [◇] to the left of ⋆ in w is compressed by a sequence E_{1}C_{1}C_{2}…C_{p}E_{2} of terminals or nonterminals, such that \(\mathfrak {D}(E_{1} C_{1} C_{2} {\ldots } C_{p} E_{2}) = x [\diamond ] y\), where E_{1} → x ◇^{q}, q ≥ 1, or E_{1} = ε, and E_{2} →◇^{r}y, r ≥ 1, or E_{2} = ε. For every such occurrence of [◇] to the left of ⋆ in w, we exchange E_{1}C_{1}C_{2}…C_{p}E_{2} by \(E^{\prime }_{1} D E^{\prime }_{2}\), where \(E^{\prime }_{1} = \varepsilon \), if E_{1} = ε and \(E^{\prime }_{1} = x\) if E_{1} → x ◇^{q}, q ≥ 1, and \(E^{\prime }_{2} = \varepsilon \), if E_{2} = ε and \(E^{\prime }_{2} = y\) if E_{2} →◇^{r}y, r ≥ 1. This construction removes rules or shortens them; thus, in order to conclude that the overall size of the grammar does not increase, we only have to observe that the size of the axiom is not increased. To this end, we first observe that if p = 0, then E_{1} or E_{2} must have a right side of length at least \(\frac {n^{3}}{2}\) that contains a symbol from Σ ∖{◇}, but, as shown above, such rules do not exist. Hence, we can assume that p ≥ 1. Furthermore, since E_{1} = ε implies \(E^{\prime }_{1} = \varepsilon \) and E_{2} = ε implies \(E^{\prime }_{2} = \varepsilon \), \(|E_{1} C_{1} C_{2} {\ldots } C_{p} E_{2}| \geq |E^{\prime }_{1} D E^{\prime }_{2}|\) follows.
We conclude that the overall size of the grammar did not increase due to these modifications. Moreover, G now contains a rule D → [◇] and, since all occurrences of ◇ in w are produced by this rule, we can safely remove all other rules that produce an occurrence of ◇ from the grammar. (Claim 1) \(\square \)
The statement of the previous claim particularly implies that the axiom of G has the form
where \(\alpha _{i}, \alpha ^{\prime }_{i}, \beta _{i}, \gamma _{j} \in (N \cup {\Sigma })^{*}\), 1 ≤ i ≤ n, 1 ≤ j ≤ m.
Claim 2: For every i, 1 ≤ i ≤ n, \(\alpha _{i} = \overset {{~}_{\leftarrow }}{V_{i}}\), \(\alpha ^{\prime }_{i} = {\!}_{\rightarrow }{V_{i}}\), where \(\overset {{~}_{\leftarrow }}{V_{i}}, {\!}_{\rightarrow }{V_{i}}\) are nonterminals with rules \(\overset {{~}_{\leftarrow }}{V_{i}} \rightarrow \# \overline {v_{i}}\) and \({\!}_{\rightarrow }{V_{i}} \rightarrow \overline {v_{i}} \#\).
Proof of Claim 2: Obviously, for every i, 1 ≤ i ≤ n, \(\mathfrak {D}(\alpha _{i}) = \# \overline {v_{i}}\), which means that |α_{i}| = 1 implies that α_{i} is a nonterminal with derivative \(\# \overline {v_{i}}\). We now assume that |α_{i}| ≥ 2 for some i, 1 ≤ i ≤ n. If we substitute α_{i} by a new nonterminal \(\overset {{~}_{\leftarrow }}{V_{i}}\) with a rule \(\overset {{~}_{\leftarrow }}{V_{i}} \rightarrow \# \overline {v_{i}}\), then we shorten the axiom by at least \(2\lceil \log (n)\rceil +3\) and the size of the new rule is \(|\# \overline {v_{i}}| = \left \lceil \log (n) \right \rceil + 1\); thus, the overall size of the grammar does not increase. An analogous argument applies if \(|\alpha ^{\prime }_{i}| \geq 2\) for some i, 1 ≤ i ≤ n. Consequently, we can assume that we have \(\overset {{~}_{\leftarrow }}{V_{i}}, {\!}_{\rightarrow }{V_{i}} \in N\) with rules \( \overset {{~}_{\leftarrow }}{V_{i}} \rightarrow \# \overline {v_{i}}\) and \({\!}_{\rightarrow }{V_{i}} \rightarrow \overline {v_{i}} \#\), and \(\alpha _{i} = \overset {{~}_{\leftarrow }}{V_{i}}\), \(\alpha ^{\prime }_{i} = {\!}_{\rightarrow }{V_{i}}\), 1 ≤ i ≤ n. (Claim 2) \(\square \)
We recall that, for every i, 1 ≤ i ≤ n, \(\mathfrak {D}(\beta _{i}) = \# \overline {v_{i}} \#\). Hence, if, for some i, 1 ≤ i ≤ n, |β_{i}| ≥ 2, then we can as well replace β_{i} by \(\overset {{~}_{\leftarrow }}{V_{i}} \#\) without increasing the size of the grammar. This implies that, for every i, 1 ≤ i ≤ n, \(\beta _{i} = \overset {{~}_{\leftarrow }}{V_{i}} \#\) or \(\beta _{i} = \overset {{~}_{\leftrightarrow }}{V_{i}}\) with \(\overset {{~}_{\leftrightarrow }}{V_{i}} \to \# \overline {v_{i}} \#\).
Next, recall that, for every i, 1 ≤ i ≤ m, \(\mathfrak {D}(\gamma _{i}) = \# \overline {v_{j_{2i−1}}} \# \overline {v_{j_{2i}}} \#\). If, for some i, 1 ≤ i ≤ m, |γ_{i}| ≥ 3, then we can as well replace γ_{i} by \(\overset {{~}_{\leftarrow }}{V}_{j_{2i−1}} \overset {{~}_{\leftarrow }}{V}_{j_{2i}} \#\) without increasing the size of the grammar. If |γ_{i}| = 1, then there is a rule \(E \to \# \overline {v_{j_{2i−1}}} \# \overline {v_{j_{2i}}} \#\) of size \(2 \lceil \log (n)\rceil + 3\). If we now replace γ_{i} by \(\overset {{~}_{\leftarrow }}{V}_{j_{2i−1}} \overset {{~}_{\leftarrow }}{V}_{j_{2i}} \#\), then we increase the size of the axiom (and therefore of the grammar) by 4. However, since there are no other occurrences of \(\# \overline {v_{j_{2i−1}}} \# \overline {v_{j_{2i}}} \#\) in w, there are no other occurrences of E in the axiom; thus, we can remove the rule \(E \to \# \overline {v_{j_{2i−1}}} \# \overline {v_{j_{2i}}} \#\), which decreases the size of the grammar by \(2 \lceil \log (n)\rceil + 3 \geq 4\). Hence, the overall size of the grammar does not increase. If |γ_{i}| = 2, then γ_{i} = E_{1}E_{2} with \(E_{1} \to \# \overline {v_{j_{2i−1}}} \# x\) or \(E_{2} \to x \# \overline {v_{j_{2i}}} \#\). Let us assume that there is a rule \(E_{1} \to \# \overline {v_{j_{2i−1}}} \# x\) (the case \(E_{2} \to x \# \overline {v_{j_{2i}}} \#\) is analogous). If we now change this rule to \(E_{1} \to \# \overline {v_{j_{2i−1}}} \#\) and substitute every E_{2} by \({\!}_{\rightarrow }{V}_{j_{2i}}\), then the size of the grammar does not increase (note that the nonterminals E_{1} and E_{2} can only occur in some γ_{j}, which has been replaced in this way).
These considerations demonstrate that we can assume that, in addition to the rule D → [◇], the rules of G are \(\overset {{~}_{\leftarrow }}{V_{i}} \rightarrow \# \overline {v_{i}}\), \({\!}_{\rightarrow }{V_{i}} \to \overline {v_{i}}\#\), 1 ≤ i ≤ n, and rules \(\overset {{~}_{\leftrightarrow }}{V_{i}} \to \#\overline {v_{i}}\#\) with \(i \in \mathfrak {I}\), for some \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\). We now define \(\ell = |\mathfrak {I}|\) and the vertex set \(\mathcal {V} = \{v_{i} \mid i \in \mathfrak {I}\}\); furthermore, let t be the number of edges of \(\mathcal {G}\) that are covered by some vertex of \(\mathcal {V}\). The axiom has the following form:
where, for every i, 1 ≤ i ≤ n, \(y_{i} = \overset {{~}_{\leftrightarrow }}{V_{i}}\) if \(v_{i} \in \mathcal {V}\) and \(y_{i} = \overset {{~}_{\leftarrow }}{V_{i}} \#\) otherwise, and, for every i, 1 ≤ i ≤ m, \(z_{i} = \overset {{~}_{\leftarrow }}{V}_{j_{2i−1}} \overset {{~}_{\leftarrow }}{V}_{j_{2i}} \#\), if the edge \((v_{j_{2i−1}}, v_{j_{2i}})\) is not covered by \(\mathcal {V}\), and \(z_{i} = \overset {{~}_{\leftrightarrow }}{V}_{j_{2i−1}} {\!}_{\rightarrow }{V}_{j_{2i}}\) or \(z_{i} = \overset {{~}_{\leftarrow }}{V}_{j_{2i−1}} \overset {{~}_{\leftrightarrow }}{V}_{j_{2i}}\), if \(v_{j_{2i−1}} \in \mathcal {V}\) or \(v_{j_{2i}} \in \mathcal {V}\), respectively.
The total size of the rules is
Moreover,
Consequently, \(|G| = 13n\left \lceil \log (n) \right \rceil + 17n + \ell + 8m − 2t + 1 + 2n^{3}\). Since, by assumption, \(|G| \leq 13n\left \lceil \log (n) \right \rceil + 17n + k + 6m + 1 + 2n^{3}\), we conclude that ℓ + 8m − 2t ≤ k + 6m. From this inequality, since t ≤ m, we can deduce ℓ ≤ k on the one hand and \(m − \frac {k−\ell }{2} \leq t\) on the other.
Consequently, the vertex set \(\mathcal {V}\) already covers \(m − \frac {k−\ell }{2}\) edges of \(\mathcal {G}\). This implies that we can extend \(\mathcal {V}\) to a vertex cover \(\mathcal {V}^{\prime }\) for \(\mathcal {G}\) by adding q vertices, where \(q \leq \frac {k−\ell }{2} \leq k−\ell \). Since \(|\mathcal {V}| = \ell \), \(|\mathcal {V}^{\prime }| \leq |\mathcal {V}| + q \leq \ell + (k−\ell) = k\). □
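The final counting step of this proof can be sanity-checked by brute force; the function below is ours and simply tests the implication over a small parameter range.

```python
# Check: if t <= m and l + 8m - 2t <= k + 6m, then l <= k and
# t >= m - (k - l)/2 (equivalently 2t >= 2m - (k - l)).
def cover_bound_holds(l, m, t, k):
    if not (0 <= t <= m and l + 8 * m - 2 * t <= k + 6 * m):
        return True   # hypotheses not met; nothing to check
    return l <= k and 2 * t >= 2 * m - (k - l)

assert all(
    cover_bound_holds(l, m, t, k)
    for m in range(10) for t in range(12) for l in range(10) for k in range(10)
)
```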
From Lemmas 2 and 3, we can directly conclude the following theorem:
Theorem 2
1SGP is NP-complete, even for |Σ| = 5.
The Multi-Level Case
In the above reduction for the 1-level case, the main difficulty is the use of unary factors as separators. However, once those separators are in place, we know the factors of w that are produced by nonterminals and, for a smallest 1-level grammar, this already fully determines the axiom and therefore also the grammar itself. For the multi-level case, the situation is much more complicated. Even if we manage to force the axiom to factorise w into parts that are either separators or codewords of vertices, this only determines the topmost level of the grammar and we do not necessarily know how these single factors are further hierarchically compressed and, more importantly, the dependencies between these compressions (i. e., how they share the same rules).
To deal with these issues, we rely on a larger alphabet Σ and we use palindromic codewords u ⋆ u^{R}, where ⋆ ∈Σ and u is a word over an alphabet of size 7 representing a 7-ary number. The purpose of the palindromic structure is twofold. Firstly, it implies that codewords always start and end with the same symbol, which, in the construction of w, makes it easier to avoid the situation that an overlapping between neighbouring codewords is repeated elsewhere in w (see Lemma 4). Secondly, if all codewords are produced by individual nonterminals, then we can show that they are produced best “from the middle”, similar to the rules of the example grammar G_{2} from Section 2.3. In addition to this, we also need a vertex colouring and an edge colouring of certain variants of the graph to be encoded.
In order to formally define the reduction, we first give some preparatory definitions. Let
be an alphabet of size 24. The function \(M \colon \mathbb {N}\times \mathbb {N}\rightarrow \mathbb {N}\) is defined by
(note that M is the positive modulo function, i. e., M(q, k) = q mod k, if q mod k ≠ 0, and M(q, k) = k, otherwise). Let the functions \(f\colon \mathbb {N} \rightarrow \{x_{1},\dots ,x_{7}\}^{+}\) and \(g\colon \mathbb {N} \rightarrow \{d_{1},\dots ,d_{7}\}^{+}\) be defined by
for every \(q \in \mathbb {N}\), where \(k \in \mathbb {N} \cup \{0\}\) and a_{i} ∈{1,2,…,7}, 0 ≤ i ≤ k, such that \(q={\sum }^{k}_{i=0} a_{i} 7^{i}\) is satisfied. Note that since, for every \(q \in \mathbb {N}\), there are unique \(k \in \mathbb {N} \cup \{0\}\) and a_{i} ∈{1,2,…,7}, 0 ≤ i ≤ k, such that \(q={\sum }^{k}_{i = 0} a_{i} 7^{i}\), the functions f and g are well-defined.
For every \(i \in \mathbb {N}\), let 〈i〉_{v} := f(i) ⋆ f(i)^{R} and 〈i〉_{◇} := g(i) ⋆ g(i)^{R}. The factors 〈i〉_{v} and 〈i〉_{◇} are called codewords; 〈i〉_{v} represents a vertex v_{i}, while the 〈i〉_{◇} are used as separators.
Observation 1
The functions f and g are bijections and they are 7-ary representations of the integers n > 0 (least significant digit first). Thus, for any \(n \in \mathbb {N} \cup \{0\}\), g(7n + i)[1] = d_{i} and f(7n + i)[1] = x_{i}, 1 ≤ i ≤ 7. In particular, this means that \(\{g(n+i)[1]\mid 0\leq i \leq 6\}=\{d_{1},\dots ,d_{7}\}\) and \(\{f(n+i)[1]\mid 0\leq i \leq 6\}=\{x_{1},\dots ,x_{7}\}\), for every \(n \in \mathbb {N}\). Consequently, for every \(n, n^{\prime } \in \mathbb {N}\) with \(M(n, 7) \neq M(n^{\prime }, 7)\), the factors 〈n〉_{v} and \(\langle n^{\prime } \rangle _{v}\) do not share any prefixes or suffixes (and the same holds for the words 〈n〉_{◇}).
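The functions f and g are what is sometimes called bijective base-7 numeration (digits 1 to 7, least significant first). The following sketch, with our own helper names, checks Observation 1 over a small range.

```python
def M(q, k):
    """Positive modulo: value in {1, ..., k}."""
    return q % k if q % k != 0 else k

def digits7(q):
    """Digits a_i in {1,...,7} with q = sum a_i * 7^i, least significant first."""
    ds = []
    while q > 0:
        a = M(q, 7)
        ds.append(a)
        q = (q - a) // 7
    return ds

def f(q):
    return ["x%d" % a for a in digits7(q)]

def g(q):
    return ["d%d" % a for a in digits7(q)]

def codeword_v(i):
    """<i>_v = f(i) * f(i)^R: a palindrome around the central star."""
    return f(i) + ["*"] + f(i)[::-1]

# Observation 1: f(7n + i)[1] = x_i and g(7n + i)[1] = d_i for 1 <= i <= 7.
for n in range(20):
    for i in range(1, 8):
        assert f(7 * n + i)[0] == "x%d" % i
        assert g(7 * n + i)[0] == "d%d" % i

# f is injective on a small range (it is in fact a bijection).
assert len({tuple(f(q)) for q in range(1, 2000)}) == 1999

# Codewords are palindromes.
assert all(codeword_v(q) == codeword_v(q)[::-1] for q in (1, 7, 8, 50, 343))
```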
Let \(\mathcal {G}=(V,E)\) be a subcubic graph (i. e., a graph with maximum degree 3) with \(V=\{v_{1},\dots ,v_{n}\}\) and \(E=\{\{v_{j_{2i−1}},v_{j_{2i}}\}\mid 1 \leq i \leq m\}\) (note that the vertex cover problem remains NP-hard if restricted to subcubic graphs (see [49])). Let \(\mathcal {G}^{\prime }=(V,E^{\prime })\) be the multigraph defined by
By [50], it is possible to compute in polynomial time a proper edge-colouring (meaning a colouring such that no two edges which share one or two vertices have the same colour) for a multigraph with at most \(\lfloor \tfrac 32 {\Delta }\rfloor \) colours, where Δ is the maximum degree of the multigraph. Since the graph \(\mathcal {G}\) is subcubic, the maximum degree of \(\mathcal {G}^{\prime }\) is three and we can compute a proper edge-colouring \(C_{e}\colon E^{\prime }\rightarrow \{1,2,3,4\}\) for \(\mathcal {G}^{\prime }\) with colours {1,2,3,4}. Let \(\mathcal {G}^{2}=(V,E^{\prime \prime })\) be the graph defined by
Since \(\mathcal {G}\) is subcubic, \(\mathcal {G}^{2}\) has maximum degree at most six. Let \(C_{v}\colon \{1,\dots ,n\}\rightarrow \{1,2,3,4,5,6,7\}\) be a proper vertex-colouring (defined over the vertex indices of \(V=\{v_{1},\dots ,v_{n}\}\)) for \(\mathcal {G}^{2}\) with colours {1,2,3,4,5,6,7}. Such a colouring can be computed by an algorithmic version of Brooks’ theorem [51].
Let \(w_{\mathcal {G}} = u v w\) be the word representing \(\mathcal {G}\), where u, v, w ∈Σ^{+} are defined as follows (note that \(m \leq \frac {3n}{2}\), so 7m < 14n in the word w).
This concludes the definition of the reduction. Since the following proof of correctness is very complicated, we first present a corresponding “roadmap”, to make it more accessible:

First, and completely independent from the question of how a grammar could compress \(w_{\mathcal {G}}\), we take a closer look at the structure of this word. More precisely, in Propositions 1 and 2, we show that if a factor of \(w_{\mathcal {G}}\) spans over the symbol ⋆ of some codeword 〈i〉_{v} or 〈i〉_{◇} and also reaches over the boundaries of this codeword into some other factor, then it is not repeated in \(w_{\mathcal {G}}\). This property is the main reason for the complicated structure of \(w_{\mathcal {G}}\) (especially the factor v).

An immediate consequence of the property described in the previous point is that in a smallest grammar, any nonterminal that derives a factor with an occurrence of ⋆ necessarily derives a factor that is completely contained in some codeword 〈i〉_{◇} or in some codeword 〈i〉_{v} delimited by two occurrences of the symbol # (see Lemma 4).

Next, we show that we can assume that in a smallest grammar, there are nonterminals that have exactly our codewords as derivatives (see Lemma 5).

The next result (Lemma 6) states that we can also assume that in a smallest grammar there are nonterminals with derivative #〈7i + C_{v}(i)〉_{v} and nonterminals with derivative 〈7i + C_{v}(i)〉_{v}#.

Finally, we are able to fix the structure of a smallest grammar (Lemma 7) and we can show that, just like in the reduction from [33, 34] (see Page 16), the set of rules that derive factors of the form #〈7i + C_{v}(i)〉_{v}# can be transformed into a vertex cover (see Lemma 8).
The following simple, but crucial observation shall be helpful throughout the proof of correctness:
Observation 2
The word \(w_{\mathcal {G}}\) contains each of the symbols $_{1},…, $_{6} exactly once, which implies that any smallest grammar for \(w_{\mathcal {G}}\) has an axiom of the form \({\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\), \(\beta _{i} \in ((V\cup {\Sigma }) \setminus \{\$_{1},\dots ,\$_{6}\})^{+}\), 1 ≤ i ≤ 7.
We now prove the two propositions that establish the property with respect to the repetitions of factors containing ⋆.
Proposition 1
For every i, 1 ≤ i ≤ 14n, and j, 1 ≤ j ≤ 7, the word \(w_{\mathcal {G}}\) contains at most one occurrence of a factor of the form
Furthermore, if such a factor occurs in \(w_{\mathcal {G}}\), then the occurrence is in u.
Proof
We first note that factors of the form stated in the proposition can only occur in factors of the form \(\langle i \rangle _{v} \langle i^{\prime } \rangle _{\diamond }\) or \(\langle i \rangle _{\diamond } \langle i^{\prime } \rangle _{v}\). Since such factors only occur in u, the second statement of the proposition holds.
We first take care of factors of the form \(\langle i \rangle _{v} d_{j^{\prime }}\), 1 ≤ i ≤ 14n, \(1 \leq j^{\prime } \leq 7\). These factors are subwords of 〈M(x + j,14n)〉_{v}〈x + 1〉_{◇} for some \(j\in \{0,\dots ,6\}\) and x such that i = M(x + j,14n), which for each choice of pair (j, x) occur at most once in u. For every i, 6 < i ≤ 14n, this gives the seven choices (j, i − j) with 0 ≤ j ≤ 6; note that i = M(x + j,14n) implies x = i − j. This shows that the word u contains the subword 〈i〉_{v}g(x + 1)[1] = 〈i〉_{v}g(i − j + 1)[1] once for each j, 0 ≤ j ≤ 6, and these are the only occurrences of a subword of the form \(\langle i \rangle _{v} d_{j^{\prime }}\) for some \(j^{\prime }\in \{1,\dots ,7\}\) in u. Since \(\{g(i-j+1)[1] \mid 0 \leq j \leq 6\}=\{d_{1},\dots ,d_{7}\}\) by Observation 1, it follows that no subword of the form \(\langle i \rangle _{v}d_{j^{\prime }}\) with \(j^{\prime }\in \{1,\dots ,7\}\) appears in u more than once. For every i, 1 ≤ i ≤ 6, the choices of pairs (j, x) involve a modular wrap-around of x and are (j, i − j) for 0 ≤ j < i and (j,14n − j + i) for i ≤ j ≤ 6. The word u hence contains the subword 〈i〉_{v}g(i − j + 1)[1] once for each j, 0 ≤ j < i, the subword 〈i〉_{v}g(14n − j + i + 1)[1] once for each j, i ≤ j ≤ 6, and these are the only occurrences of a subword of the form \(\langle i \rangle _{v} d_{j^{\prime }}\) for some \(j^{\prime }\in \{1,\dots ,7\}\) in u. By reducing 14n modulo 7 to zero, shifting by + 7 and substituting j by 7 − r, we get that {g(14n − j + i + 1)[1]∣i ≤ j ≤ 6} = {g(i + 1 + r)[1]∣1 ≤ r ≤ 7 − i} and {g(i − j + 1)[1]∣0 ≤ j < i} = {g(i + 1 + r)[1]∣7 − i < r ≤ 7}. By Observation 1, we can hence conclude that each subword of the form \(\langle i \rangle _{v}d_{j^{\prime }}\) with \(j^{\prime }\in \{1,\dots ,7\}\) appears in u at most once. Note that for i = 6, the factor 〈i〉_{v}g(14n − j + i + 1)[1] for the only choice j = 6 does not show up, as in this case u ends and 〈6〉_{v} is followed by $_{1}.
Consequently, for every i, 1 ≤ i ≤ 14n, every factor ⋆ f(i)^{R}d_{j}, 1 ≤ j ≤ 7, has at most one occurrence in u.
Analogously, we can show that, for every i, 1 ≤ i ≤ 14n, every factor d_{j}f(i) ⋆, 1 ≤ j ≤ 7, has at most one occurrence in u. More precisely, it is sufficient to observe that, for every 6 < i ≤ 14n, the word u contains the subword g(i − j)[1]〈i〉_{v} once for each j, 0 ≤ j ≤ 6; for every 1 ≤ i ≤ 6, the subword g(i − j)[1]〈i〉_{v} once for each j, 0 ≤ j ≤ i − 1, and the subword g(14n − j)[1]〈i〉_{v} once for each j, 0 ≤ j ≤ 6 − i. As before, these are the only occurrences of a subword of the form \(d_{j^{\prime }} \langle i \rangle _{v}\) for some \(j^{\prime }\in \{1,\dots ,7\}\) in u.
For every i, 1 ≤ i ≤ 14n, there are exactly 7 factors of the form ⋆ g(i)^{R}x_{j}, for some j, 1 ≤ j ≤ 7. Let \(\star g(i)^{R} x_{j_{\ell }}\), 1 ≤ ℓ ≤ 7, be these 7 factors. By the structure of u, we observe that \(\{x_{j_{\ell }} \mid 1 \leq \ell \leq 7\} = \{x_{1}, x_{2},\dots , x_{7}\}\), which directly implies that, for every i, 1 ≤ i ≤ 14n, every factor \(\star g(i)^{R} x_{j_{\ell }}\), 1 ≤ ℓ ≤ 7, has at most one occurrence in u. Analogously, we can show that, for every i, 1 < i ≤ 14n, every factor of the form x_{j}g(i) ⋆, 1 ≤ j ≤ 7, has at most one occurrence in u. Finally, there are exactly 6 factors of the form x_{j}g(1) ⋆, 1 ≤ j ≤ 7, namely the factors f(14n)[1]g(1) ⋆ and f(j)[1]g(1) ⋆, 1 ≤ j ≤ 5. Since {f(14n)[1], f(j)[1]∣1 ≤ j ≤ 5} = {x_{7}, x_{1}, x_{2},…, x_{5}}, it follows that every factor of the form x_{j}g(1) ⋆, 1 ≤ j ≤ 7, has at most one occurrence in u. □
Proposition 2
For every i, 1 ≤ i ≤ 14n, and j, 1 ≤ j ≤ 7, the word \(w_{\mathcal {G}}\) contains at most one occurrence of a factor of the form
where y ∈Σ∖{d_{1},…, d_{7}} and z ∈Σ∖{x_{1},…, x_{7}, #}.
Proof
We first consider the factors ⋆ g(i)^{R}y with y ∈Σ∖{d_{1},…, d_{7}}. In the case y ∈{x_{1},…, x_{7}}, Proposition 1 shows that such factors have at most one occurrence in \(w_{\mathcal {G}}\). For y ∈{⋆, #,¢_{1},¢_{2}, $_{1},…, $_{6}}, there are occurrences of factors of the form ⋆ g(i)^{R}y in v and in w, but not in u. We note that any two occurrences of factors ⋆ g(i)^{R}y and \(\star g(i^{\prime })^{R} y^{\prime }\) in w satisfy \(i \neq i^{\prime }\) and are therefore different. Moreover, all factors ⋆ g(i)^{R}y in w satisfy g(i)[1] ∈{1,2,3,4} (this is due to the colouring C_{e}). We next observe that all factors ⋆ g(i)^{R}y in v satisfy \(i \in \{7i^{\prime }, 7i^{\prime }-1, 7i^{\prime }-2 \mid i^{\prime } \in \mathbb {N}\}\), which implies that for these factors, we have g(i)[1] ∈{5,6,7}; thus, they all differ from the factors ⋆ g(i)^{R}y in w. Consequently, if a factor of the form ⋆ g(i)^{R}y repeats, then there must be individual occurrences of factors 〈i〉_{◇}y and \(\langle i \rangle _{\diamond } y^{\prime }\) in v. This is only the case for \(i = 7i^{\prime } - 1\), but then there are exactly two such factors, with y ∈{#, $_{2}} and \(y^{\prime } = {\cent}_{2}\), or for \(i = 7i^{\prime } - 2\), but then there are exactly two such factors, with y ∈{#, $_{3}} and \(y^{\prime } = {\cent}_{1}\). This shows that each factor ⋆ g(i)^{R}y with y ∈Σ∖{d_{1},…, d_{7}} has at most one occurrence in \(w_{\mathcal {G}}\). For the factors yg(i) ⋆ the argument is the same up to the point where we consider individual occurrences of factors y〈i〉_{◇} and \(y^{\prime } \langle i \rangle _{\diamond }\) in v. Again, this is only possible for \(i = 7i^{\prime } - 1\) or \(i = 7i^{\prime } - 2\), but in the first case, we have y = ¢_{1}, \(y^{\prime } = \#\), while in the second case, we have y = ¢_{2}, \(y^{\prime } = \#\).
We next turn to the factors ⋆ f(i)^{R}z with z ∈Σ∖{x_{1},…, x_{7}, #}. Again, Proposition 1 shows that for z ∈{d_{1},…, d_{7}} such factors have at most one occurrence in \(w_{\mathcal {G}}\); thus, we consider the case z ∈{⋆,¢_{1},¢_{2}, $_{1},…, $_{6}}. We first note that such factors have no occurrence in u. Moreover, for every i, 1 ≤ i ≤ 14n, any factor of the form 〈i〉_{v}y with y∉{d_{1},…, d_{7}, x_{1},…, x_{7}} has either no occurrence in vw, or exactly 5 occurrences in v and at most 3 occurrences in w (this is due to the fact that \(\mathcal {G}\) is subcubic). However, y is equal to # for all but two of those occurrences, one with y = ¢_{1} and the other with y = ¢_{2}. Consequently, each factor ⋆ f(i)^{R}z with z ∈Σ∖{x_{1},…, x_{7}, #} has at most one occurrence in \(w_{\mathcal {G}}\). The argument for the factors zf(i) ⋆ with z ∈Σ∖{x_{1},…, x_{7}, #} is analogous, with the difference that the only two occurrences of a factor y〈i〉_{v} in v with y∉{d_{1},…, d_{7}, x_{1},…, x_{7}, #} are once with y ∈{$_{3},¢_{1}} and once with y ∈{$_{4},¢_{2}}.
We next consider the factors d_{j}#f(i) ⋆ and first note that such a factor only occurs in a factor #〈i〉_{v} that is preceded by a factor \(\langle i^{\prime } \rangle _{\diamond }\), for some \(i^{\prime }\), \(1 \leq i^{\prime } \leq 14n\), and that such factors only occur in v or w. In v, there are either no or exactly 3 occurrences of #〈i〉_{v}. The first one is either a prefix of v or preceded by 〈7ℓ − 1〉_{◇}, 1 ≤ ℓ ≤ n, the second is preceded by either $_{2} or 〈7ℓ − 2〉_{◇}, 1 ≤ ℓ ≤ n, and the third one is preceded by either $_{5} or 〈7ℓ〉_{◇}, 1 ≤ ℓ ≤ n. Hence, these three occurrences are preceded by the symbols d_{6}, d_{5} and d_{7}, respectively (or by symbols not in {d_{1},…, d_{7}}). Consequently, the factor d_{j}#f(i) ⋆ is not repeated in v and if it occurs, j ∈{5,6,7} holds. Next, we note that every #〈i〉_{v} in w that is preceded by a \(\langle i^{\prime } \rangle _{\diamond }\) satisfies \(i^{\prime } = 7\ell + C_{e}(v_{j_{2\ell }}, v_{j_{2\ell + 1}})\), and since the range of C_{e} is {1,2,3,4}, this occurrence of #〈i〉_{v} is preceded by symbol d_{1}, d_{2}, d_{3} or d_{4}. Finally, we have to show that no d_{j}#〈i〉_{v} is repeated in w. To this end, we assume that d_{j}#〈i〉_{v} with j ∈{1,2,3,4} is repeated. This implies that there are \(k, k^{\prime }\), \(1 \leq k < k^{\prime } \leq m-1\), with \(j_{2k-1} = j_{2k^{\prime }-1} = i\), and, furthermore, \(\langle 7(k - 1) + C_{e}(v_{j_{2(k-1)}}, v_{j_{2(k - 1)+1}}) \rangle _{\diamond }\) and \(\langle 7(k^{\prime } - 1) + C_{e}(v_{j_{2(k^{\prime }-1)}}, v_{j_{2(k^{\prime } - 1)+1}}) \rangle _{\diamond }\) both end with the symbol d_{j}.
Thus, \(C_{e}(v_{j_{2(k-1)}}, v_{j_{2(k - 1)+1}}) = C_{e}(v_{j_{2(k^{\prime }-1)}}, v_{j_{2(k^{\prime } - 1)+1}}) = j\), which is a contradiction, since the edges \((v_{j_{2(k-1)}}, v_{j_{2(k - 1)+1}})\) and \((v_{j_{2(k^{\prime }-1)}}, v_{j_{2(k^{\prime } - 1)+1}})\) of \(\mathcal {G}^{\prime }\) are incident with the same vertex \(v_{j_{2k-1}} = v_{j_{2k^{\prime }-1}} = v_{i}\) and C_{e} is a proper edge colouring for \(\mathcal {G}^{\prime }\). Consequently, no d_{j}#〈i〉_{v} is repeated in w; thus, the word \(w_{\mathcal {G}}\) contains at most one occurrence of a factor of the form d_{j}#f(i) ⋆.
In an analogous way, we can show that every factor of the form ⋆ f(i)^{R}#d_{j} in v satisfies j ∈{5,6,7} and in w it satisfies j ∈{1,2,3,4}. That these factors do not repeat follows from the fact that ⋆ f(i)^{R}# occurs at most 3 times in v (followed by the distinct symbols d_{5}, d_{6} and d_{7}) and that the repetitions of ⋆ f(i)^{R}# in w are followed by distinct symbols from {d_{1}, d_{2}, d_{3}, d_{4}}, due to the proper edge colouring C_{e} of \(\mathcal {G}^{\prime }\). Thus, the word \(w_{\mathcal {G}}\) contains at most one occurrence of a factor of the form ⋆ f(i)^{R}#d_{j}.
For any i, 1 ≤ i ≤ 14n, and j, 1 ≤ j ≤ 7, the factor ⋆ f(i)^{R}#x_{j} only occurs in w and only in a factor of the form \(\langle 7\ell +C_{v}(\ell ) \rangle _{v} \# \langle 7\ell ^{\prime }+C_{v}(\ell ^{\prime }) \rangle _{v}\), \(1 \leq \ell , \ell ^{\prime } \leq n\), with i = 7ℓ + C_{v}(ℓ) and \(f(7\ell ^{\prime }+C_{v}(\ell ^{\prime }))[1] = x_{j}\). Hence, if ⋆ f(i)^{R}#x_{j} has two occurrences, then there are distinct \(\ell ^{\prime }, \ell ^{\prime \prime }\), \(1 \leq \ell ^{\prime }, \ell ^{\prime \prime } \leq n\), such that the vertices \(v_{\ell ^{\prime }}\) and \(v_{\ell ^{\prime \prime }}\) are neighbours of v_{ℓ} (in \(\mathcal {G}\)) and are therefore adjacent in \(\mathcal {G}^{2}\), and \(f(7\ell ^{\prime }+C_{v}(\ell ^{\prime }))[1] = f(7\ell ^{\prime \prime }+C_{v}(\ell ^{\prime \prime }))[1] = x_{j}\), which implies \(C_{v}(\ell ^{\prime }) = C_{v}(\ell ^{\prime \prime }) = j\). This is a contradiction to the fact that C_{v} is a proper vertex colouring for the graph \(\mathcal {G}^{2}\). In an analogous way, it follows that the factor x_{j}#f(i) ⋆ is not repeated. □
Since a smallest grammar does not contain rules which produce a factor that is not repeated, Propositions 1 and 2 yield the following:
Lemma 4
For every smallest grammar G = (N,Σ, R, ax) for \(w_{\mathcal {G}}\), \(|\mathfrak {D}(A)|_{\star } \geq 1\) for some A ∈ N implies that \(\mathfrak {D}(A)\) is a factor of some #〈7i + C_{v}(i)〉_{v}#, 1 ≤ i ≤ n, or a factor of some 〈j〉_{v}, 1 ≤ j ≤ 14n, or a factor of some 〈j〉_{◇}, 1 ≤ j ≤ 14n.
The main consequence of Lemma 4 is that, in a smallest grammar, the axiom has a length of at least the number of occurrences of ⋆ in \(w_{\mathcal {G}}\). This allows us to show that, without increasing the size of the grammar, the axiom can be restructured, such that each individual codeword is produced by its own nonterminal.
Lemma 5
There is a smallest grammar G for \(w_{\mathcal {G}}\) such that, for every i, 1 ≤ i ≤ 14n, there is a nonterminal with derivative 〈i〉_{◇} and a nonterminal with derivative 〈i〉_{v}.
Proof
Let G = (N,Σ, R, ax) be a smallest grammar with \(\mathfrak {D}(G) = w_{\mathcal {G}}\). We shall first show how G can be modified in such a way that, for every i, 1 ≤ i ≤ 14n, there is a nonterminal with derivative 〈i〉_{◇}. To this end, we assume that for some \(\mathfrak {I}_{\diamond } \subseteq \{1, 2, \ldots , 14n\}\) and every i, 1 ≤ i ≤ 14n, there currently is a nonterminal in G with derivative 〈i〉_{◇} if and only if \(i \in \mathfrak {I}_{\diamond }\); furthermore, let \(\overline {\mathfrak {I}_{\diamond }} = \{1, 2, \ldots , 14n\} \setminus \mathfrak {I}_{\diamond }\). For the sake of concreteness, for every \(i \in \mathfrak {I}_{\diamond }\), let \(\widehat {D}_{i}\) be the nonterminal with \(\mathfrak {D}(\widehat {D}_{i}) = \langle i \rangle _{\diamond }\).
We now recursively define a set of rules R_{◇} := {r_{◇, i}∣1 ≤ i ≤ 14n} for nonterminals D_{i}, 1 ≤ i ≤ 14n, by \(r_{\diamond , i} := D_{i} \rightarrow d_{i} \star d_{i}\), 1 ≤ i ≤ 7, and \(r_{\diamond , i} := D_{i} \rightarrow g(i)[1] D_{h(i)} g(i)[1]\), 8 ≤ i ≤ 14n, where \(h(i):= \frac {i-M(i,7)}{7}\). Obviously, \(\mathfrak {D}(D_{i}) = \langle i \rangle _{\diamond }\), 1 ≤ i ≤ 14n. We modify G by the following algorithm. For every i = 1,2,…,14n, if \(i \in \overline {\mathfrak {I}_{\diamond }}\), then we add the rule r_{◇, i} from R_{◇} to G, and if \(i \in \mathfrak {I}_{\diamond }\), then we replace the rule \(\widehat {D}_{i} \to \alpha \) by D_{i} → α. Furthermore, we can carry out an analogous modification with respect to derivatives 〈i〉_{v}. More precisely, we define \(\mathfrak {I}_{v} \subseteq \{1, 2, \ldots , 14n\}\) to be such that, for exactly the \(i \in \mathfrak {I}_{v}\), there is a nonterminal with derivative 〈i〉_{v}. Then, in the same way as above, we can add rules from the set R_{v} := {r_{v, i}∣1 ≤ i ≤ 14n}, where \(r_{v, i} := V_{i} \rightarrow x_{i} \star x_{i}\), 1 ≤ i ≤ 7, and \(r_{v, i} := V_{i} \rightarrow f(i)[1] V_{h(i)} f(i)[1]\), 8 ≤ i ≤ 14n, where \(h(i):= \frac {i-M(i,7)}{7}\).
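The recursive shape of the rules R_{◇} can be illustrated by expanding them. The following sketch assumes that \(M(i,m)\) denotes the modulus of i taken in \(\{1,\dots ,m\}\) and models the outer symbol g(i)[1] as \(d_{M(i,7)}\); both are our reading of the definitions, chosen so that the expansion matches the palindromic codewords \(\langle i \rangle _{\diamond }\):

```python
def M(i, m):
    # Assumption: M(i, m) is the modulus of i taken in {1, ..., m}.
    return ((i - 1) % m) + 1

def h(i):
    # h(i) from the rules r_{<>,i}: strip the last base-7 "digit" of i.
    return (i - M(i, 7)) // 7

def derivative_D(i):
    """Expand r_{<>,i}: D_i -> d_i * d_i for i <= 7, and
    D_i -> g(i)[1] D_{h(i)} g(i)[1] for i >= 8, with g(i)[1]
    modelled as d_{M(i,7)}; returns the derivative as a symbol list."""
    if i <= 7:
        return [f"d{i}", "*", f"d{i}"]
    outer = f"d{M(i, 7)}"
    return [outer] + derivative_D(h(i)) + [outer]
```

By construction, every derivative is a palindrome around the single ⋆, and every rule has a right side of length 3, which is the size-3 accounting per added rule used in the proof.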
We denote this modified grammar by \(G^{\prime }\) and note that, by the considerations from above, for every i, 1 ≤ i ≤ 14n, \(G^{\prime }\) contains nonterminals D_{i} and V_{i} with
Moreover, since every rule from R_{◇} and R_{v} has size 3, \(|G^{\prime }| = |G| + 3(|\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|)\). In the remainder of this proof, we show that this size increase can be compensated by using the new rules in order to significantly shorten the axiom. Hence, we obtain a smallest grammar with the properties claimed in the lemma. To this end, we first measure the size of the axiom of the original grammar G.

Claim 1: \(\mathsf {ax} = {\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\), where \(\beta _{i}\in ((N\cup {\Sigma }) \setminus \{\$_{1},\dots ,\$_{6}\})^{+}\), 1 ≤ i ≤ 7, and β_{1} contains at least 196n occurrences of symbols (terminal or nonterminal), each of which produces exactly one occurrence of ⋆.
Proof of Claim 1: From Observation 2, it follows that \(\mathsf {ax} = {\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\), \(\beta _{i}\in ((N\cup {\Sigma }) \setminus \{\$_{1},\dots ,\$_{6}\})^{+}\), 1 ≤ i ≤ 7. Furthermore, β_{1} contains at least \(|u|_{\star }\) symbols (terminal or nonterminal), since otherwise at least two occurrences of ⋆ of u are produced by the same nonterminal, which is a contradiction to Lemma 4. Hence, β_{1} contains at least 196n occurrences of symbols, each of which produces exactly one occurrence of ⋆. (Claim 1) \(\square \)

Claim 2: There are at least \(7\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|}{2}\rceil \) occurrences of symbols in β_{1} (terminal or nonterminal), each of which has a derivative without any occurrence of ⋆.
Proof of Claim 2: Let \(i \in \overline {\mathfrak {I}_{\diamond }}\), i. e., there is no nonterminal with derivative 〈i〉_{◇}. Furthermore, a derivative that properly contains 〈i〉_{◇} (and the corresponding nonterminal which occurs in β_{1}) contains an occurrence of ⋆ and occurrences of symbols from both sets {d_{1},…, d_{7}} and {x_{1},…, x_{7}}, which contradicts Lemma 4. Consequently, each of the 7 occurrences of 〈i〉_{◇} is produced by at least two symbols. Hence, for each of these 7 occurrences, there is one symbol producing a factor of 〈i〉_{◇} containing the symbol ⋆ and a second symbol, which produces a factor of 〈i〉_{◇} that contains symbols from {d_{1},…, d_{7}}. Due to Lemma 4, this second symbol cannot also produce the next or preceding occurrence of ⋆. This means that for each \(i \in \overline {\mathfrak {I}_{\diamond }}\), there exist 7 symbols that do not produce a symbol ⋆. In the same way, we can also conclude that for each \(i \in \overline {\mathfrak {I}_{v}}\), there exist 7 symbols that do not produce a symbol ⋆. However, it is possible that these symbols in β_{1} which do not produce a ⋆ coincide, i. e., such a symbol can produce parts of some 〈i〉_{◇} with \(i \in \overline {\mathfrak {I}_{\diamond }}\) and of some \(\langle i^{\prime } \rangle _{v}\) with \(i^{\prime } \in \overline {\mathfrak {I}_{v}}\). So we can only conclude that there are at least \(7\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|}{2}\rceil \) occurrences of symbols in β_{1} that do not produce an occurrence of ⋆. (Claim 2) \(\square \)

From these two claims, it follows that the axiom of G (and therefore the whole grammar G) has size at least \(196n + 7\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|}{2}\rceil \). We now change \(G^{\prime }\) a second time (into \(G^{\prime \prime }\)), as follows.
We replace β_{1} in the axiom \(\mathsf {ax}^{\prime } = {\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\) of \(G^{\prime }\) (note that Observation 2 implies that \(\mathsf {ax}^{\prime }\) must have this structure) by \(\beta ^{\prime }_{1} = {\prod }^{6}_{j=0} {\prod }^{14n}_{i=1} D_{i} V_{M(i+j,14n)}\). We note that \(|\beta _{1}| \geq 196n + 7\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|}{2}\rceil \), whereas \(|\beta ^{\prime }_{1}| = 196n\). Consequently,
□
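The size comparison concluding this proof can also be spelled out in one line (writing \(a = |\overline {\mathfrak {I}_{\diamond }}|\) and \(b = |\overline {\mathfrak {I}_{v}}|\) as a shorthand of ours): the rules added to obtain \(G^{\prime }\) cost \(3(a+b)\), while the shortened axiom saves at least \(7\lceil \frac {a+b}{2}\rceil \) symbols, so

\[ |G^{\prime \prime }| \leq |G| + 3(a+b) - 7\left \lceil \tfrac {a+b}{2}\right \rceil \leq |G| + 3(a+b) - \tfrac {7}{2}(a+b) \leq |G|, \]

with equality \(|G^{\prime \prime }| = |G|\) in particular when \(a + b = 0\); hence \(G^{\prime \prime }\) is indeed a smallest grammar with the claimed nonterminals.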
In the hardness proof from [33, 34] for the case of unbounded alphabets (see Page 16), one simple, but crucial fact was that for every i, 1 ≤ i ≤ n, we can assume that nonterminals for each factor #v_{i} and v_{i}# exist. By using the previously mentioned lemmas, we now show a similar statement for our reduction:
Lemma 6
There is a smallest grammar G for \(w_{\mathcal {G}}\) such that, for every i, 1 ≤ i ≤ n, there is a nonterminal with derivative #〈7i + C_{v}(i)〉_{v} and a nonterminal with derivative 〈7i + C_{v}(i)〉_{v}#.
Proof
Let G = (N,Σ, R, ax) be a smallest grammar for \(w_{\mathcal {G}}\). By Lemma 5, we can assume that, for every i, 1 ≤ i ≤ 14n, there is a nonterminal D_{i} with derivative 〈i〉_{◇} and a nonterminal V_{i} with derivative 〈i〉_{v}.
Let ℓ be the total number of occurrences of symbols from {⋆, ¢_{1}, ¢_{2}, #, $_{1}, \(\dots \), $_{6}} in \(w_{\mathcal {G}}\). We can conclude that \(|\mathsf {ax}| \leq \ell \), since an axiom of length ℓ can be obtained from \(w_{\mathcal {G}}\) (without introducing any new rules) by replacing all occurrences of 〈i〉_{◇} and 〈i〉_{v} by D_{i} and V_{i}, respectively.
Let \(N_{\mathsf {ax}} = \{A \mid A \in N, |\mathsf {ax}|_{A} \geq 1, |\mathfrak {D}(A)|_{\star } \geq 1\}\) and let Γ = {⋆,¢_{1},¢_{2}, #}. Furthermore, for every i, 1 ≤ i ≤ 3, \(N_{\mathsf {ax}, i} = \{A \mid A \in N_{\mathsf {ax}}, {\sum }_{x \in {\Gamma }} |\mathfrak {D}(A)|_{x} = i\}\). Since, for every A ∈ N_{ax}, \({\sum }_{x \in {\Gamma }} |\mathfrak {D}(A)|_{x} > 3\) is a contradiction to Lemma 4, we can conclude that {N_{ax,1}, N_{ax,2}, N_{ax,3}} is a partition of N_{ax}. Consequently, we can use this partition in order to estimate the length of the axiom in the following way: \(|\mathsf {ax}| \geq \ell - {\sum }_{A \in N_{\mathsf {ax}, 2}} |\mathsf {ax}|_{A} - 2 {\sum }_{A \in N_{\mathsf {ax}, 3}} |\mathsf {ax}|_{A}\) (note that each A ∈ N_{ax, j}, j ∈{2,3}, is responsible for \(|\mathsf {ax}|_{A}\) units of the size \(|\mathsf {ax}|\), but also for exactly \(j |\mathsf {ax}|_{A}\) occurrences of the total amount ℓ of symbols from \(\{\star , {\cent}_{1}, {\cent}_{2}, \#, \$_{1},\dots ,\$_{6}\}\)). Moreover, also due to Lemma 4, for every A ∈ N_{ax,2}, \(\mathfrak {D}(A) = \# f(7i+C_{v}(i)) \star r_{i}\) or \(\mathfrak {D}(A) = r_{i} \star f(7i+C_{v}(i))^{R}\#\) with \(|r_{i}| \leq |f(7i+C_{v}(i))|\) and, for every A ∈ N_{ax,3}, \(\mathfrak {D}(A) = \#\langle 7i+C_{v}(i) \rangle _{v} \#\).
We now add to G, for every i, 1 ≤ i ≤ n, the rules \(\overset {{~}_{\leftarrow }}{V_{i}}\rightarrow \# V_{7i+C_{v}(i)}\) and \({\!}_{\rightarrow }{V_{i}}\rightarrow V_{7i+C_{v}(i)} \#\), and, for every A ∈ N_{ax,3}, we add the rule \(\overset {{~}_{\leftrightarrow }}{V_{i}}\rightarrow \overset {{~}_{\leftarrow }}{V_{i}} \#\), where \(\mathfrak {D}(A) = \#\langle 7i+C_{v}(i) \rangle _{v} \#\). Then, we replace ax by a new axiom \(\mathsf {ax}^{\prime }\) that is obtained from \(w_{\mathcal {G}}\) in the following way. Every factor 〈i〉_{◇} is replaced by D_{i}. For every occurrence of ⋆ in \(w_{\mathcal {G}}\), if this occurrence of ⋆ is produced (according to ax) by a nonterminal A ∈ N_{ax,3}, which, since \(\mathfrak {D}(A) = \#\langle 7i+C_{v}(i) \rangle _{v} \#\), implies that it is inside a factor #〈7i + C_{v}(i)〉_{v}#, then we replace #〈7i + C_{v}(i)〉_{v}# by \(\overset {{~}_{\leftrightarrow }}{V_{i}}\). All remaining factors of the form #〈7i + C_{v}(i)〉_{v}# are replaced by \(\overset {{~}_{\leftarrow }}{V_{i}} \#\). Then, all remaining factors #〈7i + C_{v}(i)〉_{v} and 〈7i + C_{v}(i)〉_{v}# are replaced by \(\overset {{~}_{\leftarrow }}{V_{i}}\) and \({\!}_{\rightarrow }{V_{i}}\), respectively (note that since there are no factors of the form #〈7i + C_{v}(i)〉_{v}# left, this is unambiguous). We note that \(\mathsf {ax}^{\prime } = \ell  {\sum }_{i=1}^{n}(\mathsf {ax}^{\prime }{\overset {{~}_{\leftarrow }}{V_{i}}}+\mathsf {ax}^{\prime }{{\!}_{\rightarrow }{V_{i}}})  2 {\sum }_{i=1}^{n} \mathsf {ax}^{\prime }{\overset {{~}_{\leftrightarrow }}{V_{i}}}\).
Next, we show that all the rules for the nonterminals of N_{ax,2} ∪ N_{ax,3} can be removed from the grammar. To this end, let A ∈ N_{ax,2} ∪ N_{ax,3}, which means that \(|\mathfrak {D}(A)|_{\#} \geq 1\). However, every occurrence of # of \(w_{\mathcal {G}}\) that is produced by a rule (and is not already present in the new axiom \(\mathsf {ax}^{\prime }\)) is directly produced by \(\overset {{~}_{\leftarrow }}{V_{i}}\), \({\!}_{\rightarrow }{V_{i}}\) or \(\overset {{~}_{\leftrightarrow }}{V_{i}}\), i. e., it occurs on the right side of these rules and is not produced by means of any other nonterminal. Consequently, in the derivation of \(w_{\mathcal {G}}\), the nonterminal A is not used and, therefore, its rule can be erased.
It only remains to show that the modified grammar is not larger than the original one, i. e., we have to compare \(|\mathsf {ax}^{\prime }|\) to \(|\mathsf {ax}|\) and show that the size increase of 2 caused by each added rule is compensated. For every new rule \(\overset {{~}_{\leftrightarrow }}{V_{i}}\rightarrow \overset {{~}_{\leftarrow }}{V_{i}} \#\) (of cost 2), there is an A ∈ N_{ax,3} with \(\mathfrak {D}(A) = \# \langle 7i+C_{v}(i) \rangle _{v} \#\) (of cost at least 2), for which the rule is erased, and all occurrences of A in ax correspond to occurrences of some \(\overset {{~}_{\leftrightarrow }}{V_{i}}\) in \(\mathsf {ax}^{\prime }\); hence \({\sum }_{i=1}^{n} |\mathsf {ax}^{\prime }|_{\overset {{~}_{\leftrightarrow }}{V_{i}}}={\sum }_{A \in N_{\mathsf {ax}, 3}} |\mathsf {ax}|_{A}\). For every new rule \(\overset {{~}_{\leftarrow }}{V_{i}}\rightarrow \# V_{7i+C_{v}(i)}\), consider \(\overset {{~}_{\leftarrow }}{I}:=\{i\colon \mathfrak {D}(A) = \# f(7i+C_{v}(i)) \star r_{i} \text { for some } A \in N_{\mathsf {ax}, 2}\}\). If \(i\in \overset {{~}_{\leftarrow }}{I}\), we have removed at least one rule A → α with \( \mathfrak {D}(A) = \# f(7i+C_{v}(i)) \star r_{i}\) and \(|\alpha | \geq 2\), so the cost for all rules \(\overset {{~}_{\leftarrow }}{V_{i}}\rightarrow \# V_{7i+C_{v}(i)}\) with \(i\in \overset {{~}_{\leftarrow }}{I}\) is compensated. Further, every occurrence of this A in ax yields an occurrence of \(\overset {{~}_{\leftarrow }}{V_{i}}\) in \(\mathsf {ax}^{\prime }\). If \(i\not \in \overset {{~}_{\leftarrow }}{I}\), then both occurrences of #〈7i + C_{v}(i)〉_{v} in the factor v of \(w_{\mathcal {G}}\) are produced in ax by at least two nonterminals each. An analogous argument applies to the new rules \({\!}_{\rightarrow }{V_{i}}\rightarrow V_{7i+C_{v}(i)} \#\) with \({\!}_{\rightarrow }{I}:=\{i\colon \mathfrak {D}(A) = r_{i}\star f(7i+C_{v}(i))^{R} \# \text { for some } A \in N_{\mathsf {ax}, 2}\}\).
This yields \({\sum }_{i=1}^{n}(|\mathsf {ax}^{\prime }|_{\overset {{~}_{\leftarrow }}{V_{i}}}+|\mathsf {ax}^{\prime }|_{{\!}_{\rightarrow }{V_{i}}}) \geq {\sum }_{A \in N_{\mathsf {ax}, 2}} |\mathsf {ax}|_{A} + 2(n-|\overset {{~}_{\leftarrow }}{I}|)+2(n-|{\!}_{\rightarrow }{I}|)\). Together with \({\sum }_{i=1}^{n} |\mathsf {ax}^{\prime }|_{\overset {{~}_{\leftrightarrow }}{V_{i}}}={\sum }_{A \in N_{\mathsf {ax}, 3}} |\mathsf {ax}|_{A}\), we can conclude:
Since every new rule for \({\overset {{~}_{\leftarrow }}{V_{i}}}\) or \({{\!}_{\rightarrow }{V_{i}}}\) is added at a cost of two, the difference between \(|\mathsf {ax}^{\prime }|\) and \(|\mathsf {ax}|\) compensates for the additional rules \(\overset {{~}_{\leftarrow }}{V_{i}}\rightarrow \#V_{7i+C_{v}(i)} \) with \(i\not \in \overset {{~}_{\leftarrow }}{I}\) and \({\!}_{\rightarrow }{V_{i}}\rightarrow V_{7i+C_{v}(i)} \#\) with \(i\not \in {\!}_{\rightarrow }{I}\). Recall further that the cost of the rules for \({\overset {{~}_{\leftrightarrow }}{V_{i}}}\) is compensated by deleting the rules for the nonterminals in N_{ax,3}. Overall, the modified grammar is not larger than the original grammar. Furthermore, the new grammar now has the form stated in the lemma. □
Now, by the lemmas presented above, we are able to sufficiently pin down the structure of a smallest grammar for \(w_{\mathcal {G}}\):
Lemma 7
There is a smallest grammar G for \(w_{\mathcal {G}}\) that contains all the rules

R_{◇} := {r_{◇, i}∣1 ≤ i ≤ 14n}, with \(r_{\diamond , i} := D_{i} \rightarrow d_{i} \star d_{i}\), 1 ≤ i ≤ 7, and \(r_{\diamond , i} := D_{i} \rightarrow g(i)[1] D_{h(i)} g(i)[1]\), 8 ≤ i ≤ 14n, where \(h(i):= \frac {i-M(i,7)}{7}\),

R_{v} := {r_{v, i}∣1 ≤ i ≤ 14n}, with \(r_{v, i} := V_{i} \rightarrow x_{i} \star x_{i}\), 1 ≤ i ≤ 7, and \(r_{v, i} := V_{i} \rightarrow f(i)[1] V_{h(i)} f(i)[1]\), 8 ≤ i ≤ 14n, where \(h(i):= \frac {i-M(i,7)}{7}\),

\(\overset {{~}_{\leftarrow }}{V} := \{\overset {{~}_{\leftarrow }}{V_{i}}\rightarrow \# V_{7i+C_{v}(i)} \mid 1 \leq i \leq n\}\),

\({\!}_{\rightarrow }{V} := \{{\!}_{\rightarrow }{V_{i}} \rightarrow V_{7i+C_{v}(i)} \# \mid 1 \leq i \leq n\}\),

\(\overset {{~}_{\leftrightarrow }}{V} := \{\overset {{~}_{\leftrightarrow }}{V_{i}} \rightarrow \# {\!}_{\rightarrow }{V_{i}} \mid i \in \mathfrak {I}\}\), for some \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\),
and an axiom \(\mathsf {ax} = {\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\) with
Proof
Let G be a smallest grammar for \(w_{\mathcal {G}}\). By Lemma 5, we can assume that, for every i, 1 ≤ i ≤ 14n, there is a nonterminal D_{i} with derivative 〈i〉_{◇} and a nonterminal V_{i} with derivative 〈i〉_{v}, and, by Lemma 6, we can assume that, for every i, 1 ≤ i ≤ n, there is a nonterminal \(\overset {{~}_{\leftarrow }}{V_{i}}\) with derivative #〈7i + C_{v}(i)〉_{v} and a nonterminal \({\!}_{\rightarrow }{V_{i}}\) with derivative 〈7i + C_{v}(i)〉_{v}#. Obviously, for every i, 1 ≤ i ≤ n, we can substitute the rule for \(\overset {{~}_{\leftarrow }}{V_{i}}\) by \(\overset {{~}_{\leftarrow }}{V_{i}} \to \# V_{7i+C_{v}(i)}\) and the rule for \({\!}_{\rightarrow }{V_{i}}\) by \({\!}_{\rightarrow }{V_{i}} \to V_{7i+C_{v}(i)} \#\), without increasing the size of G.
Next, for every V_{j} → α_{j} with \(|\alpha _{j}| \geq 3\), we can replace V_{j} → α_{j} by V_{j} → x_{j} ⋆ x_{j}, if j ≤ 7, and by \(V_{j} \rightarrow f(j)[1] V_{h(j)} f(j)[1]\), if 8 ≤ j, where \(h(j):= \frac {j-M(j,7)}{7}\). This does not increase the size of G, since the size of the modified rules can only decrease and no new rules need to be added. Now let \(j = \max \limits \{i \mid 1 \leq i \leq 14n, V_{i} \to \alpha _{i}, |\alpha _{i}| = 2\}\). We can now again replace V_{j} → α_{j} by V_{j} → x_{j} ⋆ x_{j}, if j ≤ 7, and by \(V_{j} \rightarrow f(j)[1] V_{h(j)} f(j)[1]\), if 8 ≤ j, where \(h(j):= \frac {j-M(j,7)}{7}\), but now this operation increases the size of the grammar by 1, which, as shall be shown next, is compensated by removing a rule from the grammar. To this end, we note that α_{j} = A_{j}B_{j} and \(\mathfrak {D}(A_{j}) = f(j) \star t_{j}\) or \(\mathfrak {D}(B_{j}) = t_{j} \star f(j)^{R}\) for some \(t_{j} \in \{x_{1}, \ldots , x_{7}\}^{*}\). Let us assume that \(\mathfrak {D}(A_{j}) = f(j) \star t_{j}\) (the case \(\mathfrak {D}(B_{j}) = t_{j} \star f(j)^{R}\) can be handled analogously); note that this particularly implies that A_{j}∉{V_{i}∣1 ≤ i ≤ 14n}, since its derivative is not of the form 〈i〉_{v}. Since f(j) ⋆ t_{j} does not occur in any \(\langle j^{\prime } \rangle _{v}\) with \(j^{\prime } < j\), A_{j} is not involved in a production of any \(\langle j^{\prime } \rangle _{v}\) with \(j^{\prime } < j\). Moreover, A_{j} cannot occur on the right side of the rule for a \(V_{j^{\prime }}\) with \(j < j^{\prime }\), since, due to the maximality of j and the modifications from above, those only have nonterminals of the form V_{i} on the right side. Thus, A_{j} has no occurrence in any of the rules for the nonterminals V_{i}, 1 ≤ i ≤ 14n.
This means that A_{j} can only occur on the right side of some nonterminal with a derivative that is not a factor of some 〈i〉_{v} and, since \(|\mathfrak {D}(A_{j})|_{\star } \geq 1\), with Lemma 4, we can further conclude that A_{j} can only occur on the right side of some nonterminal with a derivative #〈i〉_{v}, 〈i〉_{v}# or #〈i〉_{v}#. The rules \(\overset {{~}_{\leftarrow }}{V_{i}} \to \# V_{7i+C_{v}(i)}\) and \({\!}_{\rightarrow }{V_{i}} \to V_{7i+C_{v}(i)} \#\) have the derivatives #〈7i + C_{v}(i)〉_{v} and 〈7i + C_{v}(i)〉_{v}#, respectively, and their right sides do not contain A_{j}. Furthermore, if the right side of a nonterminal with derivative #〈i〉_{v}# contains A_{j}, we can replace it by \(\overset {{~}_{\leftarrow }}{V_{i}} \#\) without increasing the size of the grammar. Consequently, we can assume that the nonterminal A_{j} is never used and therefore its rule can be removed. By repeating this argument, it follows that G contains all the rules R_{v}.
In a similar way, we can show that G contains all the rules R_{◇} (in fact, the argument is simpler, since in this case, Lemma 4 together with the fact that A_{j} can only occur on the right side of some nonterminal with a derivative that is not a factor of some 〈i〉_{◇} immediately implies that A_{j} does not occur on any right side).
We now assume that \(\mathsf {ax} = {\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\) is the axiom of G. In the same way as in the proofs of Lemmas 5 and 6, we can conclude that \(|\beta _{1}| \geq 196n\) and \(|\beta _{\ell }| \geq 3n\), 1 ≤ ℓ ≤ 5. Hence, replacing ax by \(\mathsf {ax}^{\prime } = {\prod }^{6}_{i=1}(\beta ^{\prime }_{i} \$_{i}) \beta ^{\prime }_{7}\) with
does not increase the size of the grammar. We now consider β_{6}, which produces the word \(v_{6} = {\prod }_{i=1}^{n} \left (\# \langle 7i+C_{v}(i) \rangle _{v} \# \langle 7i \rangle _{\diamond }\right )\). We can conclude the following from Lemma 4. No two occurrences of ⋆ in v_{6} can be produced by the same nonterminal; thus, \(|\beta _{6}| \geq 2n\). Furthermore, the only factors that are repeated in \(w_{\mathcal {G}}\) and that contain an occurrence of both ⋆ and # are factors of #〈7i + C_{v}(i)〉_{v}#. Hence, for every i, 1 ≤ i ≤ n, if the factor #〈7i + C_{v}(i)〉_{v}# in #〈7i + C_{v}(i)〉_{v}#〈7i〉_{◇} is not produced by a single nonterminal, then there is an additional nonterminal in β_{6} (i. e., in addition to the two nonterminals producing the two occurrences of ⋆ in #〈7i + C_{v}(i)〉_{v}#〈7i〉_{◇}). This implies that \(|\beta _{6}| \geq 3n - p\), where p is the number of nonterminals with a derivative of #〈7i + C_{v}(i)〉_{v}#. This means that we can replace every such nonterminal and its rule by \(\overset {{~}_{\leftrightarrow }}{V_{i}} \rightarrow \# {\!}_{\rightarrow }{V_{i}}\) without increasing the size of the grammar. Furthermore, again without increasing the size of the grammar, we can replace β_{6} by \({\prod }_{i=1}^{n} \left (y_{i} D_{7i} \right )\), where, for every i, 1 ≤ i ≤ n, \(y_{i} = \overset {{~}_{\leftrightarrow }}{V_{i}}\) if this nonterminal exists and \(y_{i} = \overset {{~}_{\leftarrow }}{V_{i}} \#\) otherwise.
Next, we consider β_{7}, which produces the word
Similarly to the word v_{6}, every occurrence of ⋆ in v_{7} requires a distinct nonterminal and, in addition to that, also a distinct nonterminal for each factor #〈7i + C_{v}(i)〉_{v}# that is not completely produced by a single nonterminal. Hence, \(|\beta _{7}| \geq 4m - 1 - q\), where q is the number of nonterminals \(\overset {{~}_{\leftrightarrow }}{V_{i}}\) used in β_{7}. Consequently, we can also replace β_{7} by \({\prod }_{i=1}^{m-1}(y_{i} D_{7i+C_{e}(v_{j_{2i}},v_{j_{2i+1}})}) y_{m}\), where, for every i, 1 ≤ i ≤ m, \(y_{i} \in \{\overset {{~}_{\leftrightarrow }}{V}_{j_{2i-1}} {\!}_{\rightarrow }{V}_{j_{2i}}, \overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftrightarrow }}{V}_{j_{2i}}\}\), if \(\overset {{~}_{\leftrightarrow }}{V}_{j_{2i-1}}\) or \(\overset {{~}_{\leftrightarrow }}{V}_{j_{2i}}\) exist, and \(y_{i} = \overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftarrow }}{V}_{j_{2i}}\#\), otherwise. We note that this does not increase the size of the grammar.
The grammar has now the form claimed in the statement of the lemma (note that all other rules not mentioned in the statement of the lemma can be ignored, since they are not used anymore). □
Finally, we are able to conclude the proof of correctness by establishing the connection between the size of a smallest grammar for \(w_{\mathcal {G}}\) and the size of a vertex cover for \(\mathcal {G}\).
Lemma 8
The graph \(\mathcal {G}\) has a vertex cover of size k if and only if \(w_{\mathcal {G}}\) has a grammar of size 299n + k + 3m + 5.
Proof
Let Γ be a size-k vertex cover of \(\mathcal {G}\). We construct the grammar described in Lemma 7 with respect to \(\mathfrak {I} = \{i \mid v_{i} \in {\Gamma }\}\). Since Γ is a vertex cover, in the definition of β_{7}, we have \(y_{i} \in \{\overset {{~}_{\leftrightarrow }}{V}_{j_{2i-1}} {\!}_{\rightarrow }{V}_{j_{2i}}, \overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftrightarrow }}{V}_{j_{2i}}\}\), for every 1 ≤ i ≤ m. Consequently, by simply counting the symbols on the right sides of the rules, we conclude \(|G| = 299n + |\mathfrak {I}| + 3m + 5 = 299n + k + 3m + 5\).
On the other hand, if there is a grammar of size 299n + k + 3m + 5 for \(w_{\mathcal {G}}\), then, by Lemma 7, we can also assume that there exists a grammar G for \(w_{\mathcal {G}}\) with \(|G| = 299n + |\mathfrak {I}| + 3m + 5 \leq 299n + k + 3m + 5\) that has the form described in Lemma 7, with respect to some \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\). If, for some edge (v_{i}, v_{j}), \(\{v_{i}, v_{j}\} \cap \mathfrak {I} = \emptyset \), then adding i to \(\mathfrak {I}\) (and therefore the rule \(\overset {{~}_{\leftrightarrow }}{V_{i}} \rightarrow \# {\!}_{\rightarrow }{V_{i}}\) to the grammar) does not increase the size of the grammar. This is due to the fact that the additional cost of 2 for introducing the rule is compensated by using \(\overset {{~}_{\leftrightarrow }}{V_{i}}\) once in β_{6} and once in β_{7}. Consequently, we can assume that \({\Gamma } = \{v_{i} \mid i \in \mathfrak {I}\}\) is a vertex cover. Since \(|G| = 299n + |\mathfrak {I}| + 3m + 5 \leq 299n + k + 3m + 5\), this means that Γ is a vertex cover for \(\mathcal {G}\) of size \(|\mathfrak {I}| \leq k\). □
From Lemma 8, we directly conclude our main result:
Theorem 3
SGP is NP-complete, even for alphabets of size 24.
Obviously, Theorem 3 leaves some room for improvement with respect to smaller alphabet sizes. In our reduction, we did use terminal symbols economically, but, for reasons explained next, this was not our main concern. While we generally believe that the alphabet size can be slightly reduced in our reduction, we consider it very unlikely that its current structure allows a substantial improvement in this regard (e. g., an alphabet size below 10). Thus, we did not further pursue this point, which we expect to lead to an even more involved reduction while at the same time only insignificantly decreasing the alphabet size. Consequently, the NP-hardness of the smallest grammar problem for small alphabets (with the most interesting candidates being 2 (i. e., binary strings) and 4 (due to the fact that DNA sequences use a 4-letter alphabet)) remains open. Furthermore, we expect that completely new techniques are required for respective hardness reductions. In this regard, note that for alphabets of size 1, the smallest grammar problem is strongly connected to the problem of computing the smallest addition chain for a single integer; a problem that is neither known to be in P nor to be NP-hard (see [34] or Section 6 for details).
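The connection mentioned above can be made concrete: for a unary word \(a^{n}\), a grammar builds n by repeatedly adding previously derived lengths, much like an addition chain for n. The following brute-force sketch (the function name is ours, not from the paper, and the exact correspondence between grammar size and chain length is more subtle; see [34]) computes the length of a shortest addition chain for small n:

```python
def shortest_addition_chain(n, limit=12):
    """Length of a shortest addition chain for n (backtracking search).

    An addition chain for n is a sequence 1 = a_0 < a_1 < ... < a_r = n in
    which every element is the sum of two (not necessarily distinct) earlier
    elements; its length is r.  Only feasible for small n and small limit.
    """
    best = limit

    def dfs(chain):
        nonlocal best
        top = chain[-1]
        if top == n:
            best = min(best, len(chain) - 1)
            return
        if len(chain) - 1 >= best:
            return
        # Each step at most doubles the largest element: prune hopeless branches.
        if top * (2 ** (best - (len(chain) - 1))) < n:
            return
        # Extend with sums of pairs of existing elements, largest first.
        for c in sorted({a + b for a in chain for b in chain
                         if top < a + b <= n}, reverse=True):
            dfs(chain + [c])

    dfs([1])
    return best
```

For instance, `shortest_addition_chain(15)` returns 5, realised by the chain 1, 2, 3, 6, 12, 15; no chain of four additions reaches 15, even though four doublings reach 16. No polynomial-time algorithm is known for this problem.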
Extensions of the Reductions
In this section, we conclude several important hardness results by slight modifications of the reduction presented in Section 3.2. First, we show that the optimisation variant of the smallest grammar problem (over fixed alphabets) is APX-hard and therefore does not allow for a polynomial-time approximation scheme, unless P = NP. Just like Theorem 3 lifts the known NP-hardness of the smallest grammar problem for unbounded alphabets to the practically relevant case of fixed alphabets, this APX-hardness result lifts the inapproximability result for unbounded alphabets of [33, 34] to the fixed alphabet case. There is one caveat, though, which is that the corresponding constant lower bound on the approximation ratio is much lower than the already low 1.0001 achieved for unbounded alphabets; thus, we do not compute it explicitly and consider the main value of the APX-hardness result to be that it rules out the existence of a PTAS.
Theorem 4
SGP_{opt} is APX-hard, even for alphabets of size 24.
Proof
The reduction used for Theorem 3 can also be seen as an L-reduction from the optimisation variant of the minimum vertex cover problem restricted to cubic graphs (each vertex has degree 3), which remains APX-hard (see [52]). More precisely, this problem is denoted by (I_{VC}, S_{VC}, m_{VC}), where I_{VC} is the set of undirected cubic graphs, \(S_{\textsc {VC}}(\mathcal {G}) = \{C \mid C \text { is a vertex cover for } \mathcal {G}\}\) and \(m_{\textsc {VC}}(\mathcal {G}, C) = |C|\); we denote SGP_{opt} by (I_{SGP}, S_{SGP}, m_{SGP}).
Next, we describe an L-reduction from the problem (I_{VC}, S_{VC}, m_{VC}) to the problem (I_{SGP}, S_{SGP}, m_{SGP}). The above described translation of a graph \(\mathcal {G}\) to the word \(w_{\mathcal {G}}\) (i. e., the one defined in Section 3.2 in order to prove Theorem 3) gives the function f for the L-reduction. The function g, which maps \(\mathcal {G} \in I_{\textsc {VC}}\) and a grammar \(G \in S_{\textsc {SGP}}(f(\mathcal {G}))\) to a vertex cover \(C \in S_{\textsc {VC}}(\mathcal {G})\), works as follows. We first build a grammar \(G^{\prime }\) with \(|G^{\prime }|\leq |G|\) which is of the form described in Lemma 7; observe that all transformations that are necessary to reach this kind of normal form are constructive and computable in polynomial time. Then \(g(\mathcal {G}, G) = \{v_{i} \mid i \in \mathfrak {I}\}\), which is a vertex cover for \(\mathcal {G}\) by Lemma 8 (note that the set \(\mathfrak {I}\) is ensured by Lemma 7). Finally, we show that choosing β = 613 and γ = 1 satisfies the inequalities. To this end, we first note that, for any cubic graph \(\mathcal {G}\) with n vertices and m edges, we have \(m=\frac {3}{2}n\) (since each vertex has degree 3), \(m_{\textsc {VC}}^{*}(\mathcal {G}) \geq \frac {n}{2}\) (since each vertex can cover at most three edges), and \(m_{\textsc {VC}}^{*}(\mathcal {G}) \geq 1\).
Using these facts, we obtain \(m^{*}_{\textsc {SGP}}(f(\mathcal {G})) = 299n + m^{*}_{\textsc {VC}}(\mathcal {G}) + 3m + 5 \leq 598\, m^{*}_{\textsc {VC}}(\mathcal {G}) + m^{*}_{\textsc {VC}}(\mathcal {G}) + 9\, m^{*}_{\textsc {VC}}(\mathcal {G}) + 5\, m^{*}_{\textsc {VC}}(\mathcal {G}) = \beta \, m^{*}_{\textsc {VC}}(\mathcal {G})\) for any \(\mathcal {G}\in I_{\textsc {VC}}\). Furthermore, \(m_{\textsc {VC}}(\mathcal {G}, g(\mathcal {G}, G)) - m^{*}_{\textsc {VC}}(\mathcal {G}) \leq (m_{\textsc {SGP}}(f(\mathcal {G}), G) - 299n - 3m - 5) - (m^{*}_{\textsc {SGP}}(f(\mathcal {G})) - 299n - 3m - 5) = \gamma \, (m_{\textsc {SGP}}(f(\mathcal {G}), G) - m^{*}_{\textsc {SGP}}(f(\mathcal {G})))\) for any \(\mathcal {G}\in I_{\textsc {VC}}\) and \(G\in S_{\textsc {SGP}}(w_{\mathcal {G}})\). □
Next, we take a closer look at the rule-size measure of grammars, i. e., at the problems SGP_{r} and 1SGP_{r}. As defined in Section 2.2, the rule-size also takes the number of rules into account. In fact, the literature on grammar-based compression is inconsistent with respect to which kind of size is used, e. g., in [6, 8, 15, 33, 34, 41], the size of a grammar coincides with our definition \(|\cdot |\), while in [4, 53,54,55], the rule-size is used. The rule-size seems to be mainly motivated by the question of how a grammar is encoded as a single string, which, in any reasonable way, requires an additional symbol per rule.^{Footnote 9} In many contexts, the difference between size and rule-size of grammars seems negligible, but, formally, the problems SGP and SGP_{r} (as well as 1SGP and 1SGP_{r}) are different decision problems and hardness results do not automatically carry over from one to the other. Since the existing literature suggests that the rule-size is of interest as well, we consider it a worthwhile task to extend our hardness results accordingly.
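To make the difference between the two measures concrete, the following sketch (the helper names and the toy grammar are ours) computes both sizes and illustrates why a length-2 rule used only twice is neutral under \(|\cdot |\) but strictly unprofitable under the rule-size:

```python
def grammar_size(rules):
    """|G|: the sum of the lengths of all right sides (axiom included)."""
    return sum(len(rhs) for rhs in rules.values())

def rule_size(rules):
    """|G|_r: |G| plus one extra unit per rule, i.e. the overhead incurred
    when the grammar is serialised as a single string with rule separators."""
    return grammar_size(rules) + len(rules)

# Toy straight-line grammar for w = abcabcabc:  S -> AAA,  A -> abc.
rules = {"S": list("AAA"), "A": list("abc")}
assert grammar_size(rules) == 6   # 3 + 3
assert rule_size(rules) == 8      # 6 plus 2 rules

# A length-2 rule used twice:  S -> BB, B -> ab  versus the plain  S -> abab.
with_rule = {"S": list("BB"), "B": list("ab")}
without = {"S": list("abab")}
assert grammar_size(with_rule) == grammar_size(without) == 4  # neutral under |.|
assert rule_size(with_rule) > rule_size(without)              # worse under |.|_r
```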
It seems intuitively clear that the size increase caused by measuring with the rule-size does not have an impact on the complexity of the smallest grammar problem. In fact, the arguments in the proof of Theorem 2 for the 1-level case also apply for the rule-size, but with an addition of 2n + k + 2 (i. e., the number of rules) to the size of an r-smallest grammar. This is due to the fact that the rules that are introduced in the proof of Lemma 3 also shorten the grammar with respect to the rule-size measure.
Theorem 5
1SGP_{r} is NP-complete, even for alphabets of size 5.
In the multi-level case, however, the situation is not so simple. In particular, in the proof of Theorem 3, there are some arguments which do not apply to the rule-size. For example, a rule which only compresses a factor of length two is only profitable (with respect to the rule-size) if it can be used at least three times, which is problematic, since the rules which correspond to the vertex cover have length two and, in case the vertex only covers one edge, compress factors which only occur twice. Besides these problems, already in Lemma 5, we can see that it is hard to prove that the rule-size of the desired grammar \(G^{\prime \prime }\) is smaller than \(|G|_{\mathsf {r}}\), as we now have to pay a cost of 4 for each rule V_{i} (or D_{i}) with \(i\notin \mathfrak {I}_{v}\) (or \(i\notin \mathfrak {I}_{\diamond }\)), which cannot be compensated by shortening the axiom for u only by \(7\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}|+|\overline {\mathfrak {I}_{v}}|}{2}\rceil \).
With a larger alphabet and certain repetitions of subwords of \(w_{\mathcal {G}}\), we can modify the reduction to accommodate the rule-size, such that the arguments used for Theorem 3 still hold for this measure. To this end, we now encode 〈i〉_{v} and 〈i〉_{◇} over 8-ary instead of 7-ary alphabets \(\{x_{1},\dots ,x_{8}\}\) and \(\{d_{1},\dots ,d_{8}\}\), respectively, with analogous functions f and g. Let \(v^{\prime }\) and \(w^{\prime }\) be defined as v and w on page 23, but with respect to the new 8-ary codewords, which only means that each occurrence of ‘7’ in the definition of v and w is replaced by ‘8’. Moreover, let \(u^{\prime }\) be defined as u on page 23, but with the ‘6’ of the first product replaced by ‘7’ and the ‘14n’ of the second product replaced by ‘24n + 4’ (the latter is necessary, since we need more separators of the form 〈i〉_{◇}). The colourings C_{v} and C_{e} remain unchanged.
In order to adapt the reduction to the rule-size measure, we have to repeat each factor #〈8i + C_{v}(i)〉_{v} and each factor 〈8i + C_{v}(i)〉_{v}# once more, but in such a way that Proposition 2 still holds, which is done by using three new symbols $_{7}, $_{8} and ¢_{3}, and by adding the following to \(v^{\prime }\):
In order to also repeat once more the factors #〈8j_{2i} + C_{v}(j_{2i})〉_{v}#, to make covering edges profitable with respect to the rule-size, we repeat the complete list of edges, but every edge \((v_{j_{2i-1}}, v_{j_{2i}})\) is represented in reverse order as #〈8j_{2i} + C_{v}(j_{2i})〉_{v}#〈8j_{2i− 1} + C_{v}(j_{2i− 1})〉_{v}# to make sure that no subword of the form 〈i〉_{v}#x_{j} or x_{j}#〈i〉_{v} is repeated. We further choose a new, previously unused set of separators 〈i〉_{◇} (namely the 2m + 4 additional ones for which we created codewords with \(u^{\prime }\)) to make sure that each factor of the form 〈i〉_{◇}# or #〈i〉_{◇} occurs at most once. We still choose the separators according to the edge-colouring to make sure that no factors of the form 〈i〉_{v}#d_{j} or d_{j}#〈i〉_{v} are repeated; observe that by repeating the edges in reverse order, a factor of the form 〈i〉_{v}#d_{j} in \(w^{\prime }\) becomes a factor of the form d_{j}#〈i〉_{v} in the reverse listing. Formally, we define:
where
Finally, we set \(w^{\prime }_{\mathcal {G}} = u^{\prime }v^{\prime \prime }w^{\prime \prime }\).
It can be easily verified that Lemma 4 remains true for the new construction; observe that appending the new part of \(w^{\prime \prime }\) yields the only occurrence of the factor ## (note that \(w^{\prime }\) ends with #), which implies that the old and the new part are separated in the axiom of any r-smallest grammar for \(w^{\prime }_{\mathcal {G}}\). The analogue of Lemma 5 also holds, since the part of the axiom for \(u^{\prime }\) now has a length of at least \(384n+64+8\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}|+|\overline {\mathfrak {I}_{v}}|}{2}\rceil +1\) and the set of new rules, which now costs \(4(|\overline {\mathfrak {I}_{\diamond }}|+|\overline {\mathfrak {I}_{v}}|)\), shortens this to 384n + 65 (i. e., the number of occurrences of ⋆ in \(u^{\prime }\) plus 1 for $_{1}). Lemma 6 follows with the same arguments as before, just with 3 occurrences for each #〈i〉_{v} and 〈i〉_{v}#, which makes the rules for these subwords profitable even with respect to the rule-size. An analogue of Lemma 7 then follows exactly as before (the only addition is that the new parts of \(v^{\prime \prime }\) and \(w^{\prime \prime }\) are compressed in the obvious way by the existing rules). The following observation shall be helpful.
Observation 3
If \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\) is such that \(\{v_{i} \mid i \in \mathfrak {I}\}\) is a vertex cover, then the grammar for \(w^{\prime }_{\mathcal {G}}\) according to the adapted version of Lemma 7 with respect to \(\mathfrak {I}\) (see the proof of Lemma 8) satisfies \(|G| = 553n + |\mathfrak {I}| + 6m + 94\) and \(|G|_{\mathsf {r}} = 603n + 2|\mathfrak {I}| + 6m + 103\) (note that for the rule-size, we also have to count the start rule, so the sizes differ by the number of rules, which is \(50n + |\mathfrak {I}| + 9\)).
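As a quick sanity check of the bookkeeping in Observation 3, one can compare the two size expressions coefficient-wise (a sketch; the dictionaries simply record the stated coefficients of n, |𝔍|, m and the constant term):

```python
# |G| = 553n + |I| + 6m + 94 and |G|_r = 603n + 2|I| + 6m + 103 should differ
# exactly by the number of rules, 50n + |I| + 9; check this per coefficient.
size = {"n": 553, "I": 1, "m": 6, "const": 94}
rsize = {"n": 603, "I": 2, "m": 6, "const": 103}
num_rules = {"n": 50, "I": 1, "m": 0, "const": 9}
assert all(rsize[k] - size[k] == num_rules[k] for k in size)
```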
An analogous statement of Lemma 8 can now be concluded as follows. For a size-k vertex cover Γ of \(\mathcal {G}\), we set \(\mathfrak {I} = \{i \mid v_{i} \in {\Gamma }\}\) and then construct a grammar G for \(w^{\prime }_{\mathcal {G}}\) according to the adapted version of Lemma 7 with respect to \(\mathfrak {I}\) with \(|G|_{\mathsf {r}} = 603n + 2|\mathfrak {I}| + 6m + 103\) (see Observation 3). On the other hand, if there is a grammar for \(w^{\prime }_{\mathcal {G}}\) of rule-size 603n + 2k + 6m + 103, then, by the adapted version of Lemma 7, there is a grammar G for \(w^{\prime }_{\mathcal {G}}\) with \(|G|_{\mathsf {r}} = 603n + 2|\mathfrak {I}| + 6m + 103 \leq 603n + 2k + 6m + 103\) that has the form given by the adapted version of Lemma 7, with respect to some \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\). If, for some edge (v_{i}, v_{j}), \(\{v_{i}, v_{j}\} \cap \mathfrak {I} = \emptyset \), then the factors #〈8i + C_{v}(i)〉_{v}#〈8j + C_{v}(j)〉_{v}# and #〈8j + C_{v}(j)〉_{v}#〈8i + C_{v}(i)〉_{v}# in \(w^{\prime \prime }\) each correspond to three symbols in the axiom, and the factor #〈8i + C_{v}(i)〉_{v}# in \(v^{\prime \prime }\) corresponds to two symbols in the axiom. Hence, introducing the rule \(\overset {{~}_{\leftrightarrow }}{V_{i}}\rightarrow \#{\!}_{\rightarrow }{V_{i}}\) has a cost of three with respect to the rule-size and shortens the axiom by at least three. Consequently, as in the proof of Lemma 8, we can assume that \({\Gamma } = \{v_{i} \mid i \in \mathfrak {I}\}\) is a vertex cover. Since \(|G|_{\mathsf {r}} = 603n + 2|\mathfrak {I}| + 6m + 103 \leq 603n + 2k + 6m + 103\), this means that Γ is a vertex cover for \(\mathcal {G}\) of size at most k. Thus, we conclude that the graph \(\mathcal {G}\) has a vertex cover of size k if and only if there exists a grammar of rule-size 603n + 2k + 6m + 103 for \(w^{\prime }_{\mathcal {G}}\), which yields the following:
Theorem 6
SGP_{r} is NP-complete, even for alphabets of size 29.
Similar to Theorem 4, the above reduction can also be seen as an L-reduction (with the only change of setting β = 1329), which shows that the optimisation variant of the smallest grammar problem remains APX-hard under the rule-size measure.
Theorem 7
SGP_{r, opt} is APX-hard, even for alphabets of size 29.
We conclude that if we change from the normal size measure to the rule-size measure, the NP- and APX-hardness of the smallest grammar problem over fixed alphabets remains, although the smallest alphabet size in our constructions is slightly larger. We conclude this section with another interesting observation that follows from the rule-size variant of our reduction.
Obviously, the modified reduction to SGP_{r} can also be interpreted as a reduction to SGP. While, at first glance, this only seems to yield a weaker hardness result compared to the one of Theorem 3, it has a nice feature that entails an interesting result in its own right. More precisely, with respect to the modified reduction and the normal size measure, every rule from Lemma 7 has a positive profit (i. e., replacing all occurrences of the nonterminal by the right side of the rule would increase the overall size) and, furthermore, every rule added in the proofs of Lemmas 5 and 6 yields a strictly smaller grammar (note that this directly follows from the correctness of the construction for the rule-size measure). Moreover, there are no repeated substrings in the grammar with this set of rules, which means that no additional rules with non-negative profit can be added. Consequently, we have not only determined the size of a smallest (with respect to \(|\cdot |\)) grammar G for \(w^{\prime }_{\mathcal {G}}\) to be 553n + k + 6m + 94, where k is the size of a smallest vertex cover for \(\mathcal {G}\) (see Observation 3), but also that G requires exactly \(|G|_{\mathsf {r}} - |G| = 50n + k + 9\) rules (or nonterminals). Hence, the modified reduction also serves as a reduction from the vertex cover problem to the following (weaker) variant of the smallest grammar problem:

Rule-Number SGP (RNSGP)

Instance: A word w and a \(k \in \mathbb {N}\).

Question: Does there exist a smallest grammar G = (N,Σ, R, S) for w with |N|≤ k?
Theorem 8
RNSGP is NP-hard, even for alphabets of size 29.
For the 1-level case, the original reduction already provides the analogous result (here, 1RNSGP denotes the variant of RNSGP, where we ask whether there is a smallest 1-level grammar with |N|≤ k):
Theorem 9
1RNSGP is NP-hard, even for alphabets of size 5.
While the problems RNSGP and 1RNSGP naturally arise in the context of grammarbased compression, they are particularly interesting in the light of the results presented in Section 4.1 and their relevance shall be discussed there in more detail.
(Limits of) Alphabet Reduction
As shall be discussed in this section, we can achieve a slight reduction of the alphabet size in Theorem 3. However, it seems rather unlikely that a substantial decrease is possible with our current general approach. In particular, this suggests that a different approach is needed to prove the hardness of SGP for small, e. g., binary, alphabets.
We first note that we already saved one further unique separator of the form $_{i} in the construction for the rule-size by using ## instead, simply exploiting the fact that this substring of length two is not repeated anywhere else, which makes a rule containing it impossible in a smallest grammar. We can actually also shrink our alphabet in the construction used to prove Theorem 3 by saving separator symbols, more precisely, by only using one symbol $ instead of \(\$_{1},\dots ,\$_{6}\). Recall that \(\$_{1},\dots ,\$_{6}\) only had the purpose to cut the grammar at these symbols as described in Observation 2 and hence avoid unwanted repetitions.
As a first observation, it is not hard to see that $_{2}, $_{4}, $_{5} can be removed from \(w_{\mathcal {G}}\) without creating unwanted repetitions. Removing $_{2} only creates the two unwanted (in the sense that those should not repeat by Propositions 1 and 2) substrings ⋆ g(7n − 1)# and d_{1}#f(C_{v}(1)) ⋆, which do not occur elsewhere in \(w_{\mathcal {G}}\) (more precisely for the second substring: y#f(C_{v}(1)) ⋆ with \(y\notin \{x_{1},\dots ,x_{7}\}\) occurs only two other times: once with y = $_{1} and, after removal of $_{5}, once with y = ¢_{2}). Similar arguments hold for removing $_{4} and $_{5}. The remaining $_{i} occur in the subwords \(x_{6}\$_{1}\#x_{C_{v}(1)}\), \(d_{5}\$_{3}x_{C_{v}(1)}\), \(d_{7}\$_{6}\#x_{C_{v}(j_{1})}\). Now consider replacing $_{1}, $_{3}, $_{6} each by the same symbol $. If we make sure to list the edges in an order such that C_{v}(1)≠C_{v}(j_{1}), the only repeating factor of length more than one containing this new symbol $ is $#. As this subword of length two only occurs twice, it is not profitable for a smallest grammar to compress it with a rule. So with the little adjustments of deleting $_{2}, $_{4}, $_{5}, possibly picking another order to list the edges and replacing $_{1}, $_{3}, $_{6} by $, we need five symbols less for our reduction.
Further reduction of the alphabet size requires much more effort. Our main kind of argument is that certain rules cannot exist, simply because their derivative does not occur more than once in \(w_{\mathcal {G}}\). There are cases, where it is possible to show that certain rules with a repeated derivative do not occur, but the respective argument cannot be local and would rather depend on the structure of the whole grammar. On the other hand, rules that we want fixed in a smallest grammar have to be provably profitable. With these properties in mind, it is quite obvious that there is not much room to reduce the alphabet size further.
The symbols ⋆, #,¢_{1},¢_{2} and, after applying the replacement above, $ each have a very specific purpose. It seems very difficult to reduce the alphabet by replacing one of those characters by another or some codeword.
For the symbols \(x_{1},\dots ,x_{7},d_{1},\dots ,d_{7}\), we see that in Lemma 5, which fixes the codewords for vertices and separators built from these symbols, we require at least six repetitions of each desired codeword. Doing this without repeating unwanted subwords means that, at least with the idea we used to repeat these codewords in the alternating fashion given by the subword u, we need at least six different symbols in each encoding. For the separators 〈j〉_{◇}, our construction requires the seven different symbols \(d_{1},\dots ,d_{7}\) to have unique separators between the repetitions of the subwords #〈i〉_{v}, 〈i〉_{v}# and #〈i〉_{v}# in v and between the edges in the listing in w, for which we need four different kinds of separators, one for each colour of the edge-colouring C_{e}. For the vertex codewords 〈i〉_{v}, we also need seven different symbols to represent the vertex colouring C_{v}. So, first of all, the only way to save symbols among \(x_{1},\dots ,x_{7},d_{1},\dots ,d_{7}\) seems to be to modify the input graph in such a way that the colourings C_{e} and C_{v} require fewer colours. It is possible to do this with the adjustments described in the following.
Given a subcubic graph \(\mathcal {G}=(V,E)\), we first build the graph \(\bar {\mathcal {G}}\) from \(\mathcal {G}\) by subdividing each edge twice, i. e., we replace each edge (u, v) ∈ E by three edges (u, u_{v}),(u_{v}, v_{u}) and (v_{u}, v), where u_{v} and v_{u} are two new vertices which are not incident to further edges. We now construct the word for SGP to represent the graph \(\bar {\mathcal {G}}\). This shift to the graph \(\bar {\mathcal {G}}\) can be used to decrease the number of colours we require both for C_{v} and C_{e}. First observe that the graph \(\bar {\mathcal {G}}^{2}\) (i. e., the graph obtained from \(\bar {\mathcal {G}}\) by the same operation used to obtain \(\mathcal {G}^{2}\) from \(\mathcal {G}\) in the original reduction; see page 23) has maximum degree three, as a vertex v ∈ V is adjacent to the at most three vertices in {u_{v}: (u, v) ∈ E}, and a vertex v_{u}, added by the subdivision process for an edge (u, v), is adjacent to u and possibly the at most two vertices in {v_{x}: (v, x) ∈ E, x≠u}. The vertex colouring C_{v} hence only needs four different colours to properly colour \(\bar {\mathcal {G}}^{2}\).
Next, we choose a specific listing of the edges of \(\bar {\mathcal {G}}\) such that the three edges of \(\bar {\mathcal {G}}\) corresponding to an edge (u, v) of \(\mathcal {G}\) are consecutively listed as (u_{v}, u),(v_{u}, u_{v}),(v, v_{u}) (and the relative order of such triples is arbitrary). In this way, the multigraph \(\bar {\mathcal {G}}^{\prime }\) (i. e., the graph obtained from \(\bar {\mathcal {G}}\) by the same operation used to obtain \(\mathcal {G}^{\prime }\) from \(\mathcal {G}\) in the original reduction; see page 23) contains the edges {(u, v_{u}),(u_{v}, v): (u, v) ∈ E} for vertices from V and, in addition, we have at most one edge of the form \((u_{v}, u^{\prime }_{v^{\prime }})\) for each new vertex added by the subdivision. This means that in \(\bar {\mathcal {G}}^{\prime }\), a vertex v ∈ V is only adjacent to the at most three vertices in {u_{v}: (u, v) ∈ E}, and a vertex u_{v} added by the subdivision process for the edge (u, v) is incident to one edge connected to v and to at most one other edge connected to a vertex added by the subdivision process different from u_{v}. Consequently, \(\bar {\mathcal {G}}^{\prime }\) is a simple graph of maximum degree three. Further, observe that the vertices of degree three in \(\bar {\mathcal {G}}^{\prime }\) (which are a subset of the vertices in V) form an independent set in \(\bar {\mathcal {G}}^{\prime }\). By a theorem of Fournier [56], an edge-colouring for a graph with these properties only requires three colours and can be computed in polynomial time with Vizing's algorithm [57]. With the same arguments used to prove Theorem 3, it follows that a smallest grammar encodes a minimum vertex cover for \(\bar {\mathcal {G}}\). It remains to observe that the size of a minimum vertex cover for the original input graph \({\mathcal {G}}\) can be derived from a minimum vertex cover for \(\bar {\mathcal {G}}\).
If \(\mathcal {G}\) has a vertex cover of size k, then this can be extended to a vertex cover of size k + |E| for \(\bar {\mathcal {G}}\) by adding exactly one of u_{v} and v_{u} for each edge (u, v) of \(\mathcal {G}\). On the other hand, it can be easily seen that, without loss of generality, a minimum vertex cover for \(\bar {\mathcal {G}}\) contains exactly one of u_{v} and v_{u} for each edge (u, v) of \(\mathcal {G}\), and, moreover, the remaining k vertices in the vertex cover for \(\bar {\mathcal {G}}\) must be a vertex cover for the graph \(\mathcal {G}\).
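The double-subdivision gadget and the resulting shift of the vertex-cover size by |E| can be sketched as follows (a hypothetical illustration; vertex names such as u_{v} are encoded here as strings like "a_b", which assumes the original vertex names contain no underscores):

```python
from itertools import combinations

def subdivide_twice(vertices, edges):
    """Replace every edge (u, v) by the path u - u_v - v_u - v."""
    new_vertices = list(vertices)
    new_edges = []
    for (u, v) in edges:
        uv, vu = f"{u}_{v}", f"{v}_{u}"  # the two fresh subdivision vertices
        new_vertices += [uv, vu]
        new_edges += [(u, uv), (uv, vu), (vu, v)]
    return new_vertices, new_edges

def min_vertex_cover_size(vertices, edges):
    """Brute-force minimum vertex cover; fine for tiny gadget graphs."""
    for k in range(len(vertices) + 1):
        for cover in combinations(vertices, k):
            s = set(cover)
            if all(u in s or v in s for (u, v) in edges):
                return k
    return len(vertices)

# Triangle: minimum cover 2; after double subdivision it becomes 2 + |E| = 5.
V, E = ["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")]
Vb, Eb = subdivide_twice(V, E)
assert min_vertex_cover_size(V, E) == 2
assert min_vertex_cover_size(Vb, Eb) == 5
```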
Overall, the adjustments described so far lead to a hardness reduction which only uses an alphabet with 17 symbols, as we now only require a 6-ary encoding for vertices and separators. Observe that, although the colouring C_{v} only requires four colours now, we cannot reduce the alphabet for the vertices to fewer than six symbols, as we need six different symbols for the repetitions in u.
Corollary 1
SGP is NP-complete, even for alphabets of size 17.
The reduction sketched above can still be seen as an L-reduction from the optimisation version of vertex cover to SGP_{opt}. To see this, observe that the adjustments made to reduce the alphabet only cause an addition of \(\mathcal {O}(m)\) to the size of a smallest grammar for the word constructed for the input graph \(\mathcal {G}\). As \(\mathcal {O}(m)\subseteq \mathcal {O}(m^{*}_{VC}(\mathcal {G}))\) (recall that \(\mathcal {G}\) is cubic), the size of the smallest grammar can be linearly bounded by \(m^{*}_{VC}(\mathcal {G})\) in a similar way as shown in the proof of Theorem 4.
Corollary 2
SGP_{opt} is APX-hard, even for alphabets of size 17.
The only way to further reduce the alphabet would be to not just use the repetitions in u to prove Lemma 5 but the repetitions in the whole word. This, however, is very difficult, as including the rules we want to fix can no longer easily be shown to shorten the axiom. If there is no nonterminal V_{i} which derives 〈i〉_{v} for some index i, the larger substring #〈i〉_{v}¢_{1} in v, for example, might still only require three symbols in the axiom by compressing parts of 〈i〉_{v} with # or ¢_{1}. Similarly for all occurrences of the substring 〈i〉_{v} in v or w. This problem is actually the reason why we need the nonterminals V_{i} and D_{i} fixed for Lemma 6: they ensure that our desired rules derive 〈i〉_{v}# and #〈i〉_{v} in the cheapest possible way, which enables the argument that other unwanted rules in N_{ax} cannot be more profitable. Consequently, an alphabet of size 17 seems to be necessary to cleanly prove Theorem 3 with our construction.
Similar ideas and limits for alphabet reduction hold for the rule-size measure. A reduction that only uses $ instead of \(\$_{1},\dots , \$_{8}\) works analogously. The symbols $_{i} with i ∈{2,4,5,6,7} can be deleted without creating repetitions of unwanted subwords. Replacing the remaining $_{i}, i ∈{1,3,8}, by $ and again reordering the edges in the listing given in \(w^{\prime \prime }\) such that \(x_{C_{v}(1)}\not =x_{C_{v}(j_{1})}\) makes sure that the only repeating factor of length more than one containing the new symbol $ is $#. This factor occurs exactly twice and is hence not compressed by a rule in a smallest grammar (observe that with the rule-size as measure, such a rule is not just unprofitable but even makes the grammar larger). As we here require eight repetitions to show the equivalent of Lemma 5 for the rule-size, saving symbols among \(x_{1},\dots ,x_{8},d_{1},\dots ,d_{8}\) is not possible. Consequently, Theorems 6 and 7 can be improved to require only an alphabet of size 22, but a reduction with a smaller alphabet will be very difficult with our construction.
Smallest Grammars with a Bounded Number of Nonterminals
A natural follow-up question to the hardness for fixed alphabets is whether polynomial-time solvability is possible if instead the cardinality of the nonterminal alphabet N (or, equivalently, the number of rules) is bounded. In this section, we answer this question in the affirmative by representing words w ∈Σ^{∗} as graphs Φ_{m}(w) and Φ_{1}(w), such that smallest independent dominating sets of these graphs correspond to smallest grammars and smallest 1-level grammars, respectively, for w.
It will be more convenient to first take care of the simpler 1-level case and then to treat the multi-level case as an extension of it, i. e., we first define Φ_{1}(w) and then derive Φ_{m}(w) from Φ_{1}(w). Recall that, as defined in Section 2, F_{≥ 2}(w) is the set of factors of w with length at least 2. Let Φ_{1}(w) = (V, E) be defined by V = V_{1} ∪ V_{2} ∪ V_{3} and E = E_{1} ∪ E_{2} ∪ E_{3}, where:
Intuitively speaking, the vertices of V_{1} represent every factor by its start and end position, whereas V_{2} contains exactly one vertex per factor of length at least 2. Every u ∈ V_{2} is connected to (i, j) if and only if w[i..j] = u. Vertices (i, j), \((i^{\prime }, j^{\prime })\) are connected if they refer to overlapping factors. For every u ∈ V_{2}, there are |u| + 1 special vertices in V_{3} that are adjacent only with u. Consequently, we can view Φ_{1}(w) as consisting of |w| layers, where the i^{th} layer contains the vertices (j, j + (i − 1)) ∈ V_{1}, 1 ≤ j ≤ |w|− (i − 1), the vertices {u ∈ V_{2} ∣ |u| = i} and the vertices {(u, j) ∈ V_{3} ∣ |u| = i, 0 ≤ j ≤ |u|} (see Fig. 2 for an illustration).
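Based on this informal description, the construction of Φ_{1}(w) can be sketched as follows (a non-optimised illustration; the function name `phi_1` and the plain-tuple vertex encoding are ours, not part of the formal definition):

```python
from itertools import combinations

def phi_1(w):
    """Sketch of Phi_1(w): V1 = factor occurrences (i, j) (1-indexed,
    inclusive), V2 = distinct factors of length >= 2, V3 = |u| + 1
    pendant vertices per factor u in V2."""
    n = len(w)
    V1 = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    V2 = sorted({w[i - 1:j] for (i, j) in V1 if j - i + 1 >= 2})
    V3 = [(u, k) for u in V2 for k in range(len(u) + 1)]
    # E1: pairs of occurrences whose position intervals overlap
    E1 = [(p, q) for p, q in combinations(V1, 2)
          if not (p[1] < q[0] or q[1] < p[0])]
    # E2: each factor u in V2 is joined to all of its occurrences
    E2 = [(w[i - 1:j], (i, j)) for (i, j) in V1 if j - i + 1 >= 2]
    # E3: pendant edges to the special vertices
    E3 = [(u, (u, k)) for (u, k) in V3]
    return V1, V2, V3, E1, E2, E3
```

For w = abab this yields 10 occurrence vertices, 5 factor vertices and 19 pendant vertices, matching the layer picture of Fig. 2.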
Next, we show that 1-level grammars for w correspond to independent dominating sets for Φ_{1}(w). Intuitively speaking, the vertices in an independent dominating set from V_{1} induce a factorisation of w, which, in turn, induces the axiom of a 1-level grammar in the natural way (i. e., every factor of length at least 2 is represented by a rule). If (i, j) ∈ V_{1} is in the independent dominating set, then w[i..j] ∈ V_{2} is not; thus, due to the domination property, all (w[i..j], ℓ) ∈ V_{3}, 0 ≤ ℓ ≤ j − i + 1, are in the independent dominating set, which represents the size of the rule.
Lemma 9
Let w ∈Σ^{∗}, k ≥ 1. There exists an independent dominating set D of cardinality at most k for Φ_{1}(w) if and only if there exists a 1-level grammar G for w with |G| ≤ k − |F_{≥ 2}(w)|.
Proof
We start with the if direction. If G = (N,Σ, R, ax) is a 1-level grammar for w with size k − |F_{≥ 2}(w)|, then we can construct an independent dominating set D for Φ_{1}(w) of size k as follows. Let ax = A_{1}A_{2}…A_{n}, A_{i} ∈ N ∪Σ, 1 ≤ i ≤ n, and let \(F = \{\mathfrak {D}(A) \mid A \in N\}\). For every i, 1 ≤ i ≤ n, we add \((|\mathfrak {D}(A_{1} {\ldots } A_{i-1})| + 1, |\mathfrak {D}(A_{1} {\ldots } A_{i})|) \in V_{1}\) to D and, if A_{i} ∈ N, then we also add all \(\{(\mathfrak {D}(A_{i}), j) \mid 0 \leq j \leq |\mathfrak {D}(A_{i})|\}\) to D. Furthermore, we add all of V_{2} ∖ F to D. It can be easily verified that D is an independent dominating set. Moreover, \(|D| = |\mathsf {ax}| + {\sum }_{v \in F} (|v| + 1) + |V_{2} \setminus F| = |\mathsf {ax}| + {\sum }_{v \in F} |v| + |V_{2}| = |\mathsf {ax}| + {\sum }_{A \in N} |\mathfrak {D}(A)| + |V_{2}| = |G| + |\mathsf {F}_{\geq 2}(w)|\). Since |G| = k − |F_{≥ 2}(w)|, we conclude that |D| = k.
Next, we prove the only if direction. Let D be an independent dominating set for Φ_{1}(w). We first note that, for every u ∈ V_{2} ∖ D, \(\{(u, j) \mid 0 \leq j \leq |u|\} \subseteq D\), which implies that
For every i, 1 ≤ i ≤ |w|, we say that i is covered by \((j, j^{\prime }) \in V_{1}\) if \((j, j^{\prime }) \in D\) and \(j \leq i \leq j^{\prime }\) (recall that any vertex (i, i) can only be dominated by some vertex \((j, j^{\prime })\) with \(j \leq i \leq j^{\prime }\), since vertex (i, i) has no neighbours in V_{2}). If some i, 1 ≤ i ≤ |w|, is not covered by any \((j, j^{\prime }) \in V_{1}\), then (i, i) is not dominated by D, and if i is covered by two different elements from V_{1}, then there is an edge (from E_{1}) between them, so that D is not an independent set. Thus, every i, 1 ≤ i ≤ |w|, is covered by exactly one element \((j, j^{\prime }) \in V_{1}\). This directly implies that D ∩ V_{1} = {(ℓ_{1}, r_{1}),(ℓ_{2}, r_{2}),…,(ℓ_{m}, r_{m})}, such that (u_{1}, u_{2},…, u_{m}) is a factorisation of w, where u_{j} = w[ℓ_{j}..r_{j}], 1 ≤ j ≤ m. Due to the edges in E_{2}, we know that, for every j, 1 ≤ j ≤ m, with ℓ_{j} < r_{j}, there is an edge (u_{j},(ℓ_{j}, r_{j})); thus, u_{j} ∈ (V_{2} ∖ D). Next, we define N = {A_{u} ∣ u ∈ (V_{2} ∖ D)} and R = {A_{u} → u ∣ u ∈ (V_{2} ∖ D)}. Since now, for each j, 1 ≤ j ≤ m, either u_{j} ∈Σ or there exists a nonterminal \(A_{u_{j}}\) which derives u_{j}, we can define an axiom of length m by \(\mathsf {ax} = C_{u_{1}} C_{u_{2}} {\ldots } C_{u_{m}}\) with \(C_{u_{j}}=A_{u_{j}}\) for all j with |u_{j}| > 1 and \(C_{u_{j}}=u_{j}\) otherwise, in order to obtain a 1-level grammar G = (N,Σ, R, ax) with \(\mathfrak {D}(G) = w\). Finally, we note that
□
Since in the multi-level case the derivatives of the nonterminals that appear in the axiom are again compressed by a grammar, a first idea that comes to mind is to somehow represent the vertices u ∈ V_{2} again by graph structures of the type Φ_{1}(u) and to iterate this step. However, naively carrying out this idea would lead to redundancies (copies of the subgraph representing a factor u would appear inside subgraphs representing different superstrings w_{1}uw_{2} and \(w^{\prime }_{1} u w^{\prime }_{2}\)) that even seem to cause an exponential size increase of the graph structure. Fortunately, it turns out that these redundancies can be avoided and that a surprisingly simple modification of Φ_{1}(w) is sufficient.
For a word w ∈Σ^{∗}, let Φ_{m}(w) = (V, E) be defined as follows. Let V = V_{1} ∪ V_{2} ∪ V_{3} ∪ V_{4}, where V_{1} and V_{2} are defined as for Φ_{1}(w), whereas
Moreover, E = E_{1} ∪ E_{2} ∪ E_{3} ∪ E_{4} ∪ E_{5}, where E_{1} and E_{2} are defined as for Φ_{1}(w), while
Intuitively speaking, Φ_{m}(w) differs from Φ_{1}(w) in the following way. We add to every vertex u ∈ V_{2} a subgraph (V_{4, u}, E_{4, u}), which is completely connected to u and which represents u in the same way as the subgraph (V_{1}, E_{1}) of Φ_{1}(w) represents w, i. e., factors u[i..j] are represented by (u, i, j) and edges represent overlaps. Moreover, if some u ∈ V_{2} is a factor of some v ∈ V_{2}, then there is an edge from u to all the vertices (v, i, j) ∈ V_{4, v} that satisfy v[i..j] = u (by these “cross-links”, we get rid of the redundancies mentioned above). Finally, every u ∈ V_{2} is also connected with an otherwise isolated vertex (u,0) ∈ V_{3}. See Fig. 3 for a partial illustration of a Φ_{m}(w).
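Following this description, Φ_{m}(w) can be sketched by extending the construction of Φ_{1}(w) (again an illustration with our own vertex encoding; `phi_m` is a hypothetical helper name):

```python
from itertools import combinations

def phi_m(w):
    """Sketch of Phi_m(w): Phi_1's V1, V2, E1, E2 plus, for every
    u in V2, an occurrence graph V4_u of u, cross-link edges E5,
    and a single pendant vertex (u, 0)."""
    n = len(w)
    V1 = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    V2 = sorted({w[i - 1:j] for (i, j) in V1 if j - i + 1 >= 2})
    V3 = [(u, 0) for u in V2]
    V4 = {u: [(u, i, j) for i in range(1, len(u) + 1)
              for j in range(i, len(u) + 1)] for u in V2}
    E1 = [(p, q) for p, q in combinations(V1, 2)
          if not (p[1] < q[0] or q[1] < p[0])]
    E2 = [(w[i - 1:j], (i, j)) for (i, j) in V1 if j - i + 1 >= 2]
    E3 = [(u, (u, 0)) for u in V2]
    # E4: overlaps inside each occurrence graph, plus u joined to all of V4_u
    E4 = [(p, q) for u in V2 for p, q in combinations(V4[u], 2)
          if not (p[2] < q[1] or q[2] < p[1])]
    E4 += [(u, x) for u in V2 for x in V4[u]]
    # E5: cross-links from u to the occurrences of u inside other factors v
    E5 = [(u, (v, i, j)) for u in V2 for v in V2 if u != v
          for (_, i, j) in V4[v] if v[i - 1:j] == u]
    return V1, V2, V3, V4, E1, E2, E3, E4, E5
```

For w = abab, the occurrence graphs of ab, ba, aba, bab and abab have 3, 3, 6, 6 and 10 vertices, respectively, and there are nine cross-link edges.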
Similarly to the 1-level case, we can show that (multi-level) grammars for w correspond to independent dominating sets for Φ_{m}(w):
Lemma 10
Let w ∈Σ^{∗}, k ≥ 1. There is an independent dominating set D of cardinality k for Φ_{m}(w) if and only if there is a grammar G for w with |G| = k − |F_{≥ 2}(w)|.
Proof
Let D be an independent dominating set of cardinality k for Φ_{m}(w). In the same way as in the proof of Lemma 9, it can be concluded that the set \(V_{1} \cap D = \{(\ell _{1}, r_{1}), (\ell _{2}, r_{2}), \ldots , (\ell _{m_{w}}, r_{m_{w}})\}\) corresponds to a factorisation \((w_{1}, w_{2}, \ldots , w_{m_{w}})\) of w, where w_{j} = w[ℓ_{j}..r_{j}], 1 ≤ j ≤ m_{w}, and satisfies \(\{w_{1}, w_{2}, \ldots , w_{m_{w}}\} \cap D = \emptyset \).
Next, for an arbitrary u ∈ V_{2}, we consider the subgraph with the vertices N[u] ∖ V_{1} = V_{4, u} ∪{(v, i, j) ∣ v[i..j] = u, u≠v}∪{u,(u,0)}. If u ∈ D, then N(u) ∩ D = ∅. On the other hand, if u∉D, then (u,0) ∈ D and, analogously to the case of V_{1}, we can conclude that
such that \((u_{1}, u_{2}, \ldots , u_{m_{u}})\) is a factorisation of u (note that, in the same way as for V_{1}, if a position i of u is not covered in the sense that \((u, j, j^{\prime }) \in D\) with \(j \leq i \leq j^{\prime }\), then the vertex (u, i, i) would neither be in D nor adjacent to a vertex in D), where u_{j} = u[ℓ_{u, j}..r_{u, j}], 1 ≤ j ≤ m_{u}. Furthermore, for every j, 1 ≤ j ≤ m_{u}, with |u_{j}| ≥ 2, {u_{j},(u, ℓ_{u, j}, r_{u, j})}∈ E; thus, u_{j}∉D. Consequently, by induction, D induces a factorisation \((u_{1}, u_{2}, \ldots , u_{m_{u}})\) for every u ∈ (V_{2} ∖ D) ∪{w}, such that, for every j, 1 ≤ j ≤ m_{u}, |u_{j}| ≥ 2 implies u_{j} ∈ V_{2} ∖ D, which means that there is also a factorisation for u_{j}.
For every u ∈ V_{2} ∖ D, we can now define a nonterminal A_{u} and a rule \(A_{u} \to B_{1} B_{2} {\ldots } B_{m_{u}}\), where, for every j, 1 ≤ j ≤ m_{u}, \(B_{j} = A_{u_{j}}\) if |u_{j}| ≥ 2 and B_{j} = u_{j} if |u_{j}| = 1. Obviously, these rules together with the axiom \(\mathsf {ax} = C_{1} C_{2} {\ldots } C_{m_{w}}\), where, for every j, 1 ≤ j ≤ m_{w}, \(C_{j} = A_{w_{j}}\) if |w_{j}| ≥ 2 and C_{j} = w_{j} if |w_{j}| = 1, defines a grammar G for w.
We note that |ax| = |V_{1} ∩ D| and, for every rule A_{u} → α_{u}, |α_{u}| = |V_{4, u} ∩ D|. Since
we conclude that |G| = |D| − |V_{2}| = k − |F_{≥ 2}(w)|.
For a grammar G for w, we can select vertices from Φ_{m}(w) according to the factorisations induced by the rules of G, which results in an independent dominating set D for Φ_{m}(w) with |D| = |G| + |V_{2}|. □
For the algorithmic application of these graph encodings, it is important to note that the proofs of Lemmas 9 and 10 are constructive, i. e., they also show how an independent dominating set D of Φ_{m}(w) or Φ_{1}(w) can be transformed into a grammar for w (a 1-level grammar for w, respectively) of size |D| − |F_{≥ 2}(w)|, which, in the following, we will denote by G(D).
Thus, the smallest grammar problem can be solved by constructing Φ_{m}(w) or Φ_{1}(w), then computing a smallest independent dominating set D for Φ_{m}(w) (or Φ_{1}(w), respectively) and finally constructing G(D). Unfortunately, this does not lead to a polynomial-time algorithm, since computing a minimal independent dominating set is an NP-complete problem, even for quite restricted graph classes [58, Theorem 13].
In the following, we shall analyse the graph structures Φ_{m}(w) and Φ_{1}(w) more thoroughly and we begin with their respective sizes:
Proposition 3
Let w ∈Σ^{∗}. Then Φ_{1}(w) has \(\mathcal {O}(|w|^{3})\) vertices and \(\mathcal {O}(|w|^{4})\) edges; Φ_{m}(w) has \(\mathcal {O}(|w|^{4})\) vertices and \(\mathcal {O}(|w|^{6})\) edges.
Proof
We first consider Φ_{m}(w). The subgraph (V_{1}, E_{1}) has \(\mathcal {O}(|w|^{2})\) vertices and \(\mathcal {O}(|w|^{4})\) edges. Similarly, every induced subgraph on the set of vertices V_{4, u} ∪{u,(u,0)}, u ∈ V_{2}, has \(\mathcal {O}(|w|^{2})\) vertices and \(\mathcal {O}(|w|^{4})\) edges, and there are \(\mathcal {O}(|w|^{2})\) such subgraphs. In addition to this, there are \(\mathcal {O}(|w|)\) edges connecting any u ∈ V_{2} with vertices from V_{1} and \(\mathcal {O}(|w|^{2})\) edges connecting any u ∈ V_{2} with vertices from V_{4}. Finally, there are \(\mathcal {O}(|w|^{2})\) vertices in V_{3} with one incident edge each. Consequently, Φ_{m}(w) has \(\mathcal {O}(|w|^{4})\) vertices and \(\mathcal {O}(|w|^{6})\) edges.
For Φ_{1}(w), the situation is simpler. The subgraph (V_{1}, E_{1}) has \(\mathcal {O}(|w|^{2})\) vertices and \(\mathcal {O}(|w|^{4})\) edges. There are \(\mathcal {O}(|w|^{2})\) vertices in V_{2}, each with \(\mathcal {O}(|w|)\) incident edges. Finally, there are \(\mathcal {O}(|w|^{3})\) vertices in V_{3} with one edge each. Consequently, Φ_{1}(w) has \(\mathcal {O}(|w|^{3})\) vertices and \(\mathcal {O}(|w|^{4})\) edges. □
Next, we investigate the interval structure of Φ_{m}(w) and Φ_{1}(w).
Proposition 4
Φ_{m}(w) and Φ_{1}(w) are 2-interval graphs.
Proof
In the following 2-interval representations, we denote by I_{1}(v) the first and by I_{2}(v) the second interval that represents a vertex v.
We first consider the graph Φ_{1}(w). For every (i, j) ∈ V_{1}, we set I_{1}((i, j)) = [i, j]; this already yields the subgraph (V_{1}, E_{1}). In addition, let I_{1}(u), u ∈ V_{2}, be a sequence of pairwise disjoint intervals that are also disjoint with the intervals I_{1}((i, j)), (i, j) ∈ V_{1}. For every (u, j) ∈ V_{3}, let I_{1}((u, j)) be an interval that lies within I_{1}(u) and is disjoint from every other interval. Now, it only remains to represent the edges from E_{2}, for which we simply let I_{2}((i, j)), (i, j) ∈ V_{1}, be an interval that lies within I_{1}(w[i..j]) and is disjoint from every other interval. Note that only the vertices from V_{1} are represented by two intervals each.
For Φ_{m}(w), we represent V_{1} ∪ V_{2} and the edges E_{1} ∪ E_{2} by intervals in the same way as for the graph Φ_{1}(w). Then, for every u ∈ V_{2} and (u, i, j) ∈ V_{4, u}, we set I_{1}((u, i, j)) = [i + k_{u}, j + k_{u}], where k_{u} is chosen such that all these intervals lie inside I_{1}(u) without intersecting an interval I_{2}((i, j)) for some (i, j) ∈ V_{1}. In particular, this takes care of all the edges E_{4, u} (due to the intersections between these intervals) and the edges between u and the vertices V_{4, u} (due to the fact that these intervals lie inside I_{1}(u)). In order to take care of the edges from E_{5}, for every u and for every (v, i, j) ∈ V_{4, v} with v[i..j] = u, we place a new interval I_{2}((v, i, j)) inside of I_{1}(u) such that it does not intersect with any other interval inside of I_{1}(u). This creates all the edges from E_{5}. Now it only remains to take care of vertices (u,0), u ∈ V_{2}, and their edges, which can be done by placing a new interval I_{1}((u,0)) inside I_{1}(u) such that it does not intersect with any other interval. □
Unfortunately, the independent dominating set problem for 2-interval graphs is still NP-complete (in [58], the hardness of the independent dominating set problem for subcubic graphs is shown, and from [59] it follows that subcubic graphs are 2-interval graphs). Nevertheless, solving the smallest grammar problem by computing small independent dominating sets for Φ_{m}(w) or Φ_{1}(w), as sketched before Proposition 3, might still be worthwhile, since computing small independent dominating sets is a well-researched problem, for which the literature provides fast and sophisticated algorithms (see [60, 61]). In particular, the 2-interval structure suggests that we are dealing with simpler instances of the independent dominating set problem.
Our algorithmic application of the graph encodings, which leads to the polynomial-time solvability of the smallest grammar problem with a bounded number of nonterminals, can be sketched as follows. If we have fixed the set of factors \(F \subseteq \mathsf {F}_{\geq 2}(w)\) that occur as derivatives of nonterminals in the grammar, i. e., \(\{\mathfrak {D}(A) \mid A \in N\} = F\), then, for the corresponding independent dominating set D of Φ_{m}(w) or Φ_{1}(w), we must have \((\mathsf {F}_{\geq 2}(w) \setminus F) \subseteq D\) and F ∩ D = ∅. Thus, in order to find an independent dominating set that is minimal among all those that correspond to a grammar with \(\{\mathfrak {D}(A) \mid A \in N\} = F\), it is sufficient to first select the vertices \(\mathsf {F}_{\geq 2}(w) \setminus F\), then delete the closed neighbourhood of this vertex set, and finally compute a smallest independent dominating set for what remains, which is the graph \({\mathscr{H}} = {\Phi }(w) \setminus (N[\mathsf {F}_{\geq 2}(w) \setminus F] \cup F)\).^{Footnote 10} Crucially, \({\mathscr{H}}\) is an interval graph, so a smallest independent dominating set for it can be computed in linear time.
In order to carry out this approach, we first formally prove that \({\mathscr{H}}\) is an interval graph:
Proposition 5
Let w ∈Σ^{+}, \(F \subseteq \mathsf {F}_{\geq 2}(w)\) and Φ(w) ∈{Φ_{m}(w),Φ_{1}(w)}. Then \({\mathscr{H}} = {\Phi }(w) \setminus (N[\mathsf {F}_{\geq 2}(w) \setminus F] \cup F)\) is an interval graph.
Proof
We only prove the case Φ(w) = Φ_{m}(w), since the case Φ(w) = Φ_{1}(w) can be handled analogously. First, we consider the 2-interval representation of Φ_{m}(w) described in the proof of Proposition 4. We can obtain a 1-interval representation of \({\mathscr{H}}\) from it as follows. Since \({\mathscr{H}}\) does not contain any vertex from V_{2}, we first remove the corresponding intervals for vertices from V_{2}. The only vertices represented by more than one interval are the ones from V_{1} and V_{4}. However, the second intervals of these only intersect intervals which represent vertices from V_{2} in the 2-interval representation of Φ_{m}(w), which means that they are now all isolated and can therefore be removed. Consequently, every vertex of \({\mathscr{H}}\) can be represented by one interval.□
Next, we show that independent dominating sets for \({\mathscr{H}}\) can be easily extended to independent dominating sets for Φ_{m}(w) (or Φ_{1}(w)).
Proposition 6
Let w ∈Σ^{+}, \(F \subseteq \mathsf {F}_{\geq 2}(w)\), Φ(w) ∈{Φ_{m}(w),Φ_{1}(w)} and let \(D_{{\mathscr{H}}}\) be an independent dominating set for \({\mathscr{H}} = {\Phi }(w) \setminus (N[\mathsf {F}_{\geq 2}(w) \setminus F] \cup F)\). Then \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\) is an independent dominating set for Φ(w).
Proof
We start with the multilevel case. Since \(D_{{\mathscr{H}}}\) is an independent dominating set for \({\mathscr{H}}\), it is also an independent set for Φ_{m}(w). The only vertices of Φ_{m}(w) that are not necessarily dominated by \(D_{{\mathscr{H}}}\) are from N[F_{≥ 2}(w) ∖ F] or F. Since \(\mathsf {F}_{\geq 2}(w) \setminus F \subseteq D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\), the vertices from N[F_{≥ 2}(w) ∖ F] are dominated by \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\). Regarding the vertices from F, we note that since \(F \cap D_{{\mathscr{H}}} = \emptyset \), the vertices {(u,0)∣u ∈ F} occur in \({\mathscr{H}}\) as isolated vertices and, thus, they must be included in \(D_{{\mathscr{H}}}\), which means that the vertices F are dominated in Φ_{m}(w) by \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\) as well. Now it only remains to observe that, by definition of Φ_{m}(w), the vertices (F_{≥ 2}(w)∖F) are clearly independent and, since their neighbourhood is completely excluded from \({\mathscr{H}}\) and therefore also from \(D_{{\mathscr{H}}}\), they are also independent from the vertices in \(D_{{\mathscr{H}}}\). Consequently, \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\) is an independent dominating set for Φ_{m}(w).
The argument for the 1-level case is very similar, with the only difference that {(u, i) ∣ u ∈ F, 0 ≤ i ≤ |u|} are the vertices from \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\) that dominate the vertices F. □
For the sake of convenience, for any \(F \subseteq \mathsf {F}_{\geq 2}(w)\), we call a grammar G = (N,Σ, R, ax) for w with \(\{\mathfrak {D}(A) \mid A \in N\} = F\) an F-grammar; a smallest F-grammar for w is one that is minimal among all F-grammars for w.
Lemma 11
Let w ∈Σ^{+} and \(F \subseteq \mathsf {F}_{\geq 2}(w)\). A smallest F-grammar for w can be computed in time \(\mathcal {O}(|w|^{6})\) and a smallest 1-level F-grammar for w can be computed in time \(\mathcal {O}(|w|^{4})\).
Proof
Again, we only prove the multi-level case, since the 1-level case can be dealt with analogously. We compute a smallest F-grammar for w as follows. First, we construct Φ_{m}(w) and then \({\mathscr{H}} = {\Phi }_{m}(w) \setminus (N[\mathsf {F}_{\geq 2}(w) \setminus F] \cup F)\), which can be done in time \(\mathcal {O}(|{\Phi }_{m}(w)|) = \mathcal {O}(|w|^{6})\) (see Proposition 3). Obviously, we could also construct \({\mathscr{H}}\) directly, which would not change the overall running time. Next, we compute a minimal independent dominating set \(D_{{\mathscr{H}}}\) for \({\mathscr{H}}\), which, since \({\mathscr{H}}\) is an interval graph (see Proposition 5), can be done in time \(\mathcal {O}(|{\mathscr{H}}|) = \mathcal {O}(|w|^{6})\) (see Section 2.1). Finally, we construct \(G = \mathsf {G}(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \setminus F))\) (note that, by Proposition 6, \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \setminus F)\) is an independent dominating set for Φ_{m}(w); thus, G is well-defined), which can be done in time \(\mathcal {O}(|w|^{6})\) as well.
It remains to prove that G is a smallest F-grammar. To this end, we assume that there exists an F-grammar \(G^{\prime }\) for w with \(|G^{\prime }| < |G|\). Consequently, by Lemma 10, there is an independent dominating set \(D^{\prime }\) for Φ_{m}(w) with \(|G^{\prime }| = |D^{\prime }| - |\mathsf {F}_{\geq 2}(w)|\). Since both G and \(G^{\prime }\) are F-grammars, \(\mathsf {F}_{\geq 2}(w) \setminus D =\mathsf {F}_{\geq 2}(w) \setminus D^{\prime }=F\), where \(D = D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \setminus F)\). This implies that \(D^{\prime }_{{\mathscr{H}}} = D^{\prime } \setminus (\mathsf {F}_{\geq 2}(w) \setminus F)\) is an independent dominating set for \({\mathscr{H}}\). Since, by Lemma 10, \(|G| = |D| - |\mathsf {F}_{\geq 2}(w)|\) and, by assumption, \(|D^{\prime }| < |D|\), it follows that \(|D^{\prime }_{{\mathscr{H}}}| < |D_{{\mathscr{H}}}|\), which is a contradiction to the minimality of \(D_{{\mathscr{H}}}\). Consequently, G is a smallest F-grammar for w. □
If, instead of a set F of factors, we are only given an upper bound k on |N|, then we can compute a smallest grammar by enumerating all \(F \subseteq \mathsf {F}_{\geq 2}(w)\) with |F| ≤ k and computing a smallest F-grammar for each. This shows that smallest grammars can be computed in polynomial time if the number of nonterminals is bounded.
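This enumeration scheme can also be illustrated directly on words, bypassing the graph encoding: for a fixed F, a smallest F-grammar is obtained by shortest parses, since every rule for u ∈ F may only use terminals and nonterminals deriving strictly shorter members of F. The following is a brute-force sketch under our own naming (`shortest_parse`, `smallest_grammar_size`), feasible only for tiny inputs:

```python
from itertools import combinations

def shortest_parse(v, factors):
    # DP: least number of pieces needed to write v as a concatenation
    # of single symbols and words from `factors`.
    n = len(v)
    best = [0] + [n + 1] * n
    for i in range(1, n + 1):
        best[i] = best[i - 1] + 1                    # spell a terminal
        for u in factors:
            if i >= len(u) and v[i - len(u):i] == u:
                best[i] = min(best[i], best[i - len(u)] + 1)
    return best[n]

def smallest_grammar_size(w, k):
    """Size of a smallest grammar for w with at most k rules,
    by exhaustive search over the rule sets F."""
    F2 = sorted({w[i:j] for i in range(len(w))
                 for j in range(i + 2, len(w) + 1)})
    best = len(w)                                    # empty F: axiom spells w
    for r in range(1, k + 1):
        for F in combinations(F2, r):
            # axiom parses w over Sigma and F; the rule for u parses u
            # over Sigma and the strictly shorter members of F.
            size = shortest_parse(w, F)
            for u in F:
                size += shortest_parse(u, [v for v in F if len(v) < len(u)])
            best = min(best, size)
    return best
```

For example, for w = ababab and k = 1, the rule set F = {ab} gives axiom length 3 plus a rule of length 2, i. e., a grammar of size 5.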
Theorem 10
Let w ∈Σ^{∗} and \(k \in \mathbb {N}\). A grammar (1-level grammar, resp.) for w with at most k rules that is smallest among all grammars (1-level grammars, resp.) for w with at most k rules can be computed in time \(\mathcal {O}(|w|^{2k+6})\) (\(\mathcal {O}(|w|^{2k+4})\), resp.).
Proof
Obviously, a grammar G for w with k rules and
is smallest among all grammars for w with at most k rules. In order to compute such a grammar, it is sufficient to compute, for every set \(F \subseteq \mathsf {F}_{\geq 2}(w)\) with |F| ≤ k, a smallest F-grammar, which requires time \(\mathcal {O}(|w|^{2k} \cdot |w|^{6}) = \mathcal {O}(|w|^{2k+6})\).
Analogously, we can compute a 1-level grammar for w with at most k rules that is smallest among all 1-level grammars for w with at most k rules in time \(\mathcal {O}(|w|^{2k+4})\). □
This result raises some related questions, which shall be discussed next.
Related Questions
In the literature on grammar-based compression, the size of a smallest grammar has been interpreted in terms of a computable upper bound of the Kolmogorov complexity and, thus, as some measure for the entropy or information content of strings (see Section 1). Similarly, we could treat the minimal number of nonterminals (i. e., the number of rules) needed by a smallest grammar as a general parameter of strings, which we call the rule-number. The main motivation for doing this is pointed out by Theorem 10, which shows that a smallest grammar for w can be computed in time that is exponential only in the rule-number of w (or, in parameterised complexity terms, the smallest grammar problem parameterised by |N| is in XP). However, in order to apply the algorithm of Theorem 10 in this regard, we need to know the rule-number, which naturally leads to the question whether the rule-number of a given string can be computed efficiently. The hardness reductions for the rule-size variants of the smallest grammar problem (see Section 3.3) have already provided a negative answer to this question (see Theorems 8 and 9).
The XP-membership of the smallest grammar problem, provided by Theorem 10, shows that the parameter |N| has a stronger impact on the complexity than |Σ| and, furthermore, it gives reason to hope that bounding |N| might also lead to practically relevant algorithms. In this regard, the algorithm of Theorem 10, with its running time of the form \(|w|^{\mathcal {O}(|N|)}\), is a bit disappointing, since it cannot be considered practical for larger constant bounds on |N|. On the other hand, an algorithm with a running time of f(|N|) ⋅ g(|w|), for a polynomial g, would be a huge improvement. In other words, the question is whether the smallest grammar problem is also fixed-parameter tractable with respect to the number of nonterminals. Unfortunately, this seems unlikely, since, as stated by the next result, these parameterisations of 1SGP and SGP are W[1]-hard. To prove this, we devise a parameterised reduction from the independent set problem parameterised by the size of the independent set, which is known to be W[1]-hard (see [62]).
Let \(\mathcal {G} = (V, E)\) be a graph with V = {v_{1}, v_{2},…, v_{n}}, |E| = m, and let \(k \in \mathbb {N}\). We define the alphabet \({\Sigma } = V \cup \{\#\} \cup \{\diamond _{i} \mid 1 \leq i \leq m + {\sum }^{n}_{i = 1} (n - |N(v_{i})|)\}\) and the following word over Σ
As already done in Section 3, every occurrence of ◇ in the word stands for a distinct symbol of \(\{\diamond _{i} \mid 1 \leq i \leq m + {\sum }^{n}_{i = 1} (n - |N(v_{i})|)\}\). Note that |w| = 6m + 4(n^{2} − 2m) = 4n^{2} − 2m.
Lemma 12
The following statements are equivalent for each k ≤ n:

\(\mathcal {G}\) has an independent set I with |I| = k.

There is a grammar G for w with at most k nonterminals and |G| ≤ 4n^{2} − 2m + 3k − 2kn.

There is a 1-level grammar G for w with at most k nonterminals and |G| ≤ 4n^{2} − 2m + 3k − 2kn.
Proof
We first prove the equivalence of the first and the third statement. Let I be an independent set for \(\mathcal {G}\) with |I| = k. We define a grammar G = (N,Σ, R, ax) by N = {A_{i} ∣ v_{i} ∈ I}, R = {A_{i} → #v_{i}# ∣ A_{i} ∈ N} and \(\mathsf {ax} = w^{\prime }\), where \(w^{\prime }\) is obtained from w by replacing, for every v_{i} ∈ I, all occurrences of #v_{i}# by A_{i} (note that, since I is an independent set, no two occurrences of factors #v_{i}# and #v_{j}# with v_{i}, v_{j} ∈ I overlap). Obviously, G is a 1-level grammar for w with k nonterminals. For every v_{i} ∈ I, \(|\mathsf {ax}|_{A_{i}} = |N(v_{i})| + (n - |N(v_{i})|) = n\); thus, p(A_{i}) = 2n − 3 (recall that the concept of the profit p(A) of a nonterminal A of a 1-level grammar is defined on page 11). Consequently, \(|G| = |w| - {\sum }_{A \in N} \mathsf {p}(A) = 4n^{2}-2m - k(2n - 3)\).
Let G = (N,Σ, R, ax) be a 1-level grammar of size at most 4n^{2} − 2m − 2kn + 3k, with at most k nonterminals. We note that, for every A ∈ N, p(A) ≤ 2n − 3, since every repeated factor of w has length at most 3 and is repeated at most n times. Since, by assumption, |G| ≤ 4n^{2} − 2m − k(2n − 3) and \(|G| = 4n^{2}-2m - {\sum }_{A \in N} \mathsf {p}(A)\), we conclude that \({\sum }_{A \in N} \mathsf {p}(A) \geq k(2n - 3)\). Hence, there are exactly k nonterminals A ∈ N, each with a right side of length 3, which implies A → #v_{i}#, for some i, 1 ≤ i ≤ n, and, furthermore, |ax|_{A} = n. It can be easily verified that this is only possible if {v_{i} ∣ there is (A → #v_{i}#) ∈ R} is an independent set for \(\mathcal {G}\).
The third statement obviously implies the second statement. We assume that the second statement holds, i. e., there is a grammar G = (N,Σ, R, ax) for w with at most k nonterminals and |G| ≤ 4n^{2} − 2m + 3k − 2kn. If G is not a 1-level grammar, then it has a rule A → α with α∉Σ^{+} and, since the only repeated factors of w with a length of at least 3 have the form #x#, for some \(x \in \{v_{1},\dots , v_{n}\}\), we also know that \(\mathfrak {D}(A) = \# x \#\). In particular, this implies that α = B# or α = #B with B → #x ∈ R or B → x# ∈ R. Generally, each rule in G has a length (and hence cost) of at least 2, compresses a factor of length at most 3 and its nonterminal occurs in the axiom at most n times. The nonterminals A and B together can occur at most n times in ax, as they both derive words containing the symbol x. This means that the axiom has a length of at least |w| − (k − 1)2n and, therefore, the overall grammar has size at least |ax| + 2k ≥ |w| − (k − 1)2n + 2k = 4n^{2} − 2m − 2kn + 2n + 2k. Since we assumed that |G| ≤ 4n^{2} − 2m + 3k − 2kn, this implies 4n^{2} − 2m − 2kn + 2n + 2k ≤ 4n^{2} − 2m + 3k − 2kn, so 2n ≤ k, which contradicts the assumption k ≤ n. □
Lemma 12 directly yields the following result:
Theorem 11
1SGP and SGP parameterised by |N| are W[1]-hard.
We emphasise that Theorem 11 shows W[1]-hardness for the smallest grammar problem parameterised by |N| only for the case where the terminal alphabet Σ is unbounded. The most important respective question, which, unfortunately, is left open here, is whether the smallest grammar problem is fixed-parameter tractable with respect to the combined parameter (|N|, |Σ|) (we discuss the open cases of the parameterised complexity of the smallest grammar problem in more detail in Section 6).
Finally, we note that we can use Lemma 11 in order to obtain a simple exact exponential-time algorithm for the smallest grammar problem. More precisely, we compute for each subset \(F \subseteq \mathsf {F}_{\geq 2}(w)\) a smallest F-grammar, which yields an algorithm with an overall running time of \(2^{\mathcal {O}(|w|^{2})}\). In the next section, we present more advanced exact exponential-time algorithms for SGP and 1SGP.
Exact Exponential-Time Algorithms
An obvious approach for an exact exponential-time algorithm for SGP is to enumerate all ordered trees with |w| leaves and to interpret them as derivation trees of a grammar for w. More precisely, for a given ordered tree with |w| leaves, we first label the leaves with the symbols of w and then we inductively label each internal node with u_{1}u_{2}…u_{k}, where the u_{i} are the labels of its children. Finally, for every factor u that occurs as a label of some internal node, we substitute all occurrences of this label by a nonterminal A_{u}. In order to estimate the number of such trees, we first note that the i^{th} Catalan number C_{i} is the number of full binary trees (i. e., every non-leaf has exactly two children) with i + 1 leaves. Moreover, every tree with |w| leaves can be obtained from a full binary tree with |w| leaves by contracting some of its ‘non-leaf’ edges (i. e., edges not incident to a leaf). Since every full binary tree with |w| leaves has fewer than |w| such ‘non-leaf’ edges, the number of trees that we have to consider is at most \(C_{|w|-1} \cdot 2^{|w|}\). Since \(C_{|w|-1} \in \mathcal {O}(4^{|w|-1})\), this leads to an algorithm with running time \(\mathcal {O}^{*}(8^{|w|})\).
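The Catalan-number fact used in this counting argument can be checked numerically; the following illustrative sketch (helper names ours) counts full binary trees by recursing over the size of the left subtree:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def full_binary_trees(leaves):
    # Number of full binary trees (every internal node has exactly
    # two children) with the given number of leaves.
    if leaves == 1:
        return 1
    return sum(full_binary_trees(l) * full_binary_trees(leaves - l)
               for l in range(1, leaves))

def catalan(i):
    return comb(2 * i, i) // (i + 1)

# The i-th Catalan number counts full binary trees with i + 1 leaves.
assert all(full_binary_trees(i + 1) == catalan(i) for i in range(10))
```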
In the following, we shall give more sophisticated exact exponential-time algorithms with running times \(\mathcal {O}^{*}(1.8392^{|w|})\), for the 1-level case, and \(\mathcal {O}^{*}(3^{|w|})\), for the multi-level case. First, we need to introduce some helpful notations.
Let G = (N,Σ, R, ax) be a grammar for w and let α = A_{1}…A_{k}, A_{i} ∈ (Σ ∪ N), 1 ≤ i ≤ k. The factorisation of \(\mathfrak {D}(\alpha )\) induced by α is the tuple \((\mathfrak {D}_{G}(A_{1}),\dots ,\mathfrak {D}_{G}(A_{k}))\). Furthermore, the factorisation of w induced by ax is called the factorisation of w induced by G. A factorisation q = (u_{1}, u_{2},…, u_{k}) of a word w with |w| = n can be characterised by the vector \(v_{q}\in \{0,1\}^{n-1}\) defined by setting v_{q}[i] = 1 if and only if i = |u_{1}…u_{j}| for some 1 ≤ j < k. For the sake of convenience, we implicitly assume v_{q}[0] = v_{q}[n] = 1, and we treat vectors as words over the alphabet \(\mathbb {N}\), which allows us to use notations already defined for words. From now on, we shall use these two representations of factorisations, i. e., tuples of factors and vectors in {0,1}^{n − 1}, interchangeably, without mentioning it.
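The correspondence between the two representations can be sketched as follows (function names ours; vectors are modelled as Python lists over {0,1}, with the implicit boundary entries v_q[0] and v_q[n] left out, as in the definition):

```python
def factorisation_to_vector(q):
    """Characteristic vector v_q in {0,1}^(n-1) of a factorisation q:
    v_q[i] = 1 iff some prefix u_1 ... u_j of the factorisation
    ends at position i (1-indexed)."""
    n = sum(len(u) for u in q)
    v = [0] * (n - 1)
    pos = 0
    for u in q[:-1]:
        pos += len(u)
        v[pos - 1] = 1          # cut after position pos
    return v

def vector_to_factorisation(w, v):
    cuts = [0] + [i + 1 for i, b in enumerate(v) if b] + [len(w)]
    return tuple(w[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1))
```

For instance, the factorisation (ab, ba) of abba corresponds to the vector 010.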
The 1-Level Case
In the 1-level case, as long as we are only concerned with smallest grammars, the factorisation induced by the axiom already fully determines the grammar. More formally, let \(q=(u_{1},u_{2},\dots ,u_{k})\) be a factorisation of a word w and let F_{q} = {u_{i} ∣ 1 ≤ i ≤ k, |u_{i}| ≥ 2}. We define the 1-level grammar G_{q} = (N_{q},Σ, R_{q}, ax_{q}) by R_{q} = {(A_{u}, u): u ∈ F_{q}}, N_{q} = {A_{u}: u ∈ F_{q}} and ax_{q} = B_{1}…B_{k} with \(B_{j} = A_{u_{j}}\), if u_{j} ∈ F_{q}, and B_{j} = u_{j}, otherwise.
Lemma 13
For any factorisation \(q=(u_{1},u_{2},\dots ,u_{k})\) of w, G_{q} is a smallest grammar among all 1-level grammars for w that induce the factorisation q.
Proof
Let \(q=(u_{1},u_{2},\dots ,u_{k})\) be a factorisation of a word w. Every 1-level grammar G = (N,Σ, R, ax) for w that induces q satisfies \(|G| = k + {\sum }_{A \in N} |\mathfrak {D}(A)| \geq k + {\sum }_{u \in F_{q}} |u|\). Since \(|G_{q}| = k + {\sum }_{u \in F_{q}} |u|\), G_{q} is a smallest 1-level grammar for w that induces q. □
Choosing the smallest among all grammars {G_{q} ∣ q is a factorisation of w} yields an \(\mathcal{O}^{*}(2^{n})\) algorithm for 1-SGP. However, it is not necessary to enumerate factorisations that contain at least two consecutive factors of length 1, which improves this result as follows.
Theorem 12
1-SGP can be solved exactly in polynomial space and in time \(\mathcal{O}^{*}(1.8392^{|w|})\).
Proof
For any \(k \in \mathbb{N}\), let Γ_{k} contain all q ∈ {0,1}^{k} such that q has no prefix 11, no suffix 11 and no factor 111; furthermore, let \({\Gamma}^{\prime}_{k}\) contain all q ∈ {0,1}^{k} such that q has no suffix 11 and no factor 111. Clearly, Γ_{|w|−1} contains exactly the factorisations for w that have no consecutive factors of length 1. In order to solve the smallest 1-level grammar problem, we enumerate Γ_{|w|−1} and, for every q ∈ Γ_{|w|−1}, we construct G_{p}, where p is obtained from q by replacing every non-repeated factor u of q with the factors u[1], u[2],…, u[|u|]. It remains to prove the correctness of this algorithm and to estimate its running time.
To this end, let G be a smallest 1-level grammar for w and let p = (u_{1}, u_{2},…, u_{k}) be the factorisation induced by G. Furthermore, let q be the factorisation obtained from p by joining any maximal sequence u_{i}, u_{i+1},…, u_{j}, 1 ≤ i < j ≤ k, of factors with |u_{ℓ}| = 1, i ≤ ℓ ≤ j (note that q ∈ Γ_{|w|−1}). If none of the newly constructed factors of q is repeated, then the algorithm, when enumerating q, constructs the grammar G_{p} that, according to Lemma 13, is smallest among all 1-level grammars for w that induce p; thus, G_{p} is a smallest 1-level grammar. If, on the other hand, any of these newly constructed factors is repeated and has a length of at least 3, or has length 2 and is repeated at least 3 times, then a 1-level grammar smaller than G could be constructed, which is a contradiction. This leaves the case where all newly constructed factors of q have length 2 and are repeated exactly twice. In this case the algorithm will, when enumerating q, construct a grammar that differs from G_{p} only in that it compresses some factors of length 2 that are repeated only twice, and that G_{p} does not compress. This grammar obviously has the same size as G_{p} and is therefore a smallest 1-level grammar as well.
In order to estimate the running time, let \(T(k) = |{\Gamma}_{k}|\) and \(T^{\prime}(k) = |{\Gamma}^{\prime}_{k}|\), for every \(k \in \mathbb{N}\). Obviously, \(T(k) = |\{q \in {\Gamma}_{k} \mid q[1] = 0\}| + |\{q \in {\Gamma}_{k} \mid q[1] = 1\}|\);
so, in the following, we shall determine |{q ∈ Γ_{k} ∣ q[1] = 0}| and |{q ∈ Γ_{k} ∣ q[1] = 1}| separately. To this end, we first note that \(|\{q \in {\Gamma}_{k} \mid q[1] = 1\}| = T(k-1) - T^{\prime}(k-3)\) (this is due to the fact that T(k−1) also counts all \(q = 110q^{\prime}\ldots\) with \(q^{\prime} \in {\Gamma}^{\prime}_{k-3}\), so we have to subtract \(T^{\prime}(k-3)\)). Moreover, for all sufficiently large k, \(|\{q \in {\Gamma}_{k} \mid q[1] = 0\}| = T(k-2) + T(k-3) + T^{\prime}(k-3)\).
This is due to the fact that extending the prefix 01 or 001 with 11 yields a factor 111, whereas the prefix 000 can be extended by 11. With the above observations, we can now conclude that, for all sufficiently large k, \(T(k) = T(k-1) + T(k-2) + T(k-3)\).
This yields \(T(k) = \mathcal{O}(1.8392^{k})\); since we can also enumerate Γ_{|w|−1} in time \(\mathcal{O}^{*}(1.8392^{|w|})\), the algorithm has a running time of \(\mathcal{O}^{*}(1.8392^{|w|})\). □
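The count |Γ_{k}| can be cross-checked by brute force (our own sanity check, not part of the paper's proof); for small k the counts are 2, 3, 5, 9, 17, …, and from k = 5 on they grow tribonacci-style, i.e. like the stated bound of roughly 1.8392^{k}:

```python
from itertools import product

def gamma_size(k):
    """|Γ_k|: binary words of length k with no prefix 11, no suffix 11 and
    no factor 111 (with the implicit outer boundaries q[0] = q[k+1] = 1,
    this forbids three consecutive cut positions)."""
    count = 0
    for q in product("01", repeat=k):
        s = "".join(q)
        if s.startswith("11") or s.endswith("11") or "111" in s:
            continue
        count += 1
    return count
```

In our experiments the computed counts satisfy T(k) = T(k−1) + T(k−2) + T(k−3) for k ≥ 5, whose characteristic root is approximately 1.839.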
The Multi-Level Case
The obvious idea for a dynamic programming algorithm is to build up grammars level by level, e. g., by starting with a 1-level grammar, then extending it by a new axiom, which can derive the old axiom in one derivation step, and iterating this procedure. Obviously, we have to try an exponential number of axioms, which will lead to an exponential-time algorithm (as suggested by the NP-completeness of the problem). However, there is a more fundamental problem with this general approach, which shall be pointed out by going a bit more into detail.
For every i and every factorisation p of w, we store in entry T[i, p] of a table the size of a smallest i-level grammar with an axiom ax that induces factorisation p (in the sense defined at the beginning of this section). Then, for every factorisation q, such that p is a refinement of q, we construct a new axiom \(\mathsf{ax}^{\prime}\) that induces factorisation q and that can derive ax in one step, which is treated as the axiom of a new (i + 1)-level grammar. We subtract the profit of the rules needed to derive ax from \(\mathsf{ax}^{\prime}\) from T[i, p] and store the obtained number in T[i + 1, q]. Note that the axioms ax and \(\mathsf{ax}^{\prime}\) are fully determined by the factorisations p and q (similar to how a factorisation determines a smallest 1-level grammar with an axiom inducing this factorisation, see Lemma 13). However, this approach is fundamentally flawed, since in order to compute the size of the new (i + 1)-level grammar, we need to know whether the rules needed to derive ax from \(\mathsf{ax}^{\prime}\) have already been used earlier in the i-level grammar and are therefore already counted by T[i, p], or whether they are newly introduced. On the other hand, it should clearly be avoided to additionally store all previously used rules as well.
To overcome this problem, we do not consider the levels of a grammar as strings ax, D(ax), D(D(ax)),…, w, which is the obvious choice, but we define them in such a way that all occurrences of a nonterminal are on the same level. With this definition, all the rules that are needed for the extension to the new level must be completely new rules without prior application; thus, a dynamic programming approach similar to the one described above will be successful. Next, we give the required definitions (which are also illustrated by Example 2).
For a d-level grammar G = (N, Σ, R, ax), we partition the set of nonterminals N according to the number of derivation steps that are necessary to derive a terminal word (or, equivalently, according to their height, i. e., the maximum distance to a leaf in the derivation tree). More precisely, let \(N_{1},\dots,N_{d}\) be the partition of N into \(N_{i}=\{A\in N \mid ({\mathsf{D}^{i}_{G}}(A)\in {\Sigma}^{+})\wedge (\mathsf{D}^{i-1}_{G}(A)\notin {\Sigma}^{+})\}\). We recall that the morphism D : (N ∪ Σ)^{∗} → (N ∪ Σ)^{∗} replaces every occurrence of a nonterminal by the right side of its rule. For every i, 1 ≤ i ≤ d, we modify D, such that it only considers nonterminals from N_{i} and ignores the rest. More formally, for every i, 1 ≤ i ≤ d, we define a morphism \(\widehat{\mathsf{D}}_{i}\colon (N\cup {\Sigma})^{*}\rightarrow (N\cup {\Sigma})^{*}\) componentwise by \(\widehat{\mathsf{D}}_{i}(x) = \mathsf{D}(x)\), if x ∈ N_{i}, and \(\widehat{\mathsf{D}}_{i}(x) = x\), otherwise. Using these morphisms, we now inductively define the levels L_{i}, 0 ≤ i ≤ d, of G by L_{d} = ax and, for every i, 0 ≤ i ≤ d − 1, \(\mathsf{L}_{i} = \widehat{\mathsf{D}}_{i+1}(\mathsf{L}_{i+1})\).
Observation 4
The sequence L_{d}, L_{d−1},…, L_{1}, L_{0} is a derivation with L_{d} = ax, L_{0} = w and, by a simple induction over i, it can be verified that, for every i, 1 ≤ i ≤ d, all applications of rules for nonterminals from N_{i} happen in the single derivation step from L_{i} to L_{i−1}. In particular, this implies that, for every i, 1 ≤ i ≤ d, L_{i} contains all occurrences of nonterminals A ∈ N_{i} that are ever derived in the derivation of w or, in other words, for every j, 0 ≤ j ≤ i − 1, \({\sum}_{A \in N_{i}} |\mathsf{L}_{j}|_{A} = 0\).
Since in the derivation L_{d}, L_{d−1},…, L_{1}, L_{0} occurrences of a nonterminal A are not derived until all of them are collected in L_{i} and then they are derived all at once in the same derivation step, we can conveniently define the term profit for all rules (of the d-level grammar G) as follows. For every i, 1 ≤ i ≤ d, we define the profit of every A ∈ N_{i} by \(\mathsf{p}(A) = |\mathsf{L}_{i}|_{A}(|\mathsf{D}(A)| - 1) - |\mathsf{D}(A)|\). Note that for d = 1 this corresponds to the definition of profit for 1-level grammars as introduced on page 11. In particular, we can now express the size of a grammar in terms of the profit of its rules:
Proposition 7
Let G be a grammar. Then \(|G| = |w| - {\sum}^{d}_{i=1}{\sum}_{A\in N_{i}} \mathsf{p}(A)\).
Proof
We recall that, by definition of the size of a grammar and as a conclusion of Observation 4, we have \(|G| = |\mathsf{ax}| + {\sum}^{d}_{i=1}{\sum}_{A\in N_{i}} |\mathsf{D}(A)|\) and \(|w| = |\mathsf{L}_{0}| = |\mathsf{ax}| + {\sum}^{d}_{i=1}{\sum}_{A\in N_{i}} |\mathsf{L}_{i}|_{A}(|\mathsf{D}(A)| - 1)\).
Consequently, \(|w| - {\sum}^{d}_{i=1}{\sum}_{A\in N_{i}} \mathsf{p}(A) = |\mathsf{ax}| + {\sum}^{d}_{i=1}{\sum}_{A\in N_{i}} \left(|\mathsf{L}_{i}|_{A}(|\mathsf{D}(A)| - 1) - \mathsf{p}(A)\right) = |\mathsf{ax}| + {\sum}^{d}_{i=1}{\sum}_{A\in N_{i}} |\mathsf{D}(A)| = |G|.\)
□
Example 2
Let G = (N, Σ, R, ax) with N = {A, B, C, D}, Σ = {,}, R = {A → D, B →, C → AB, D →} and ax = CDC be the 3-level grammar illustrated in Fig. 4. According to the definitions from above, the partition of N is N_{1} = {B, D}, N_{2} = {A}, N_{3} = {C}, and the levels are
Note that, for every i, 1 ≤ i ≤ 3, L_{i} contains all occurrences of all nonterminals from N_{i} and the rules for all nonterminals in N_{i} are exclusively applied in deriving L_{i−1} from L_{i}. In particular, note that in the derivation L_{3},…, L_{0}, the derivation of occurrences of nonterminals B and D is delayed until the very last derivation step.
Furthermore, the profits are as follows: \(\mathsf{p}(A) = 2 \cdot (3-1) - 3 = 1\), \(\mathsf{p}(B) = 2 \cdot (2-1) - 2 = 0\), \(\mathsf{p}(C) = 2 \cdot (2-1) - 2 = 0\) and \(\mathsf{p}(D) = 3 \cdot (3-1) - 3 = 3\).
Moreover, \(|w| - {\sum}_{A\in N} \mathsf{p}(A) = 17 - 4 = 13\) and \(|G| = |\mathsf{ax}| + |\mathsf{D}(A)| + |\mathsf{D}(B)| + |\mathsf{D}(C)| + |\mathsf{D}(D)| = 3 + 3 + 2 + 2 + 3 = 13\).
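Since the terminal symbols of Example 2 were lost in typesetting, the identity of Proposition 7 can instead be checked on a small hypothetical 2-level grammar of our own (axiom CC with rules C → Ab and A → aa, so w = aabaab):

```python
rules = {"C": "Ab", "A": "aa"}          # hypothetical grammar, not Example 2
axiom = "CC"

def height(x):
    """Derivation steps needed to turn symbol x into a terminal word."""
    return 0 if x not in rules else 1 + max(height(y) for y in rules[x])

d = max(height(A) for A in rules)       # number of levels of the grammar
N = {i: {A for A in rules if height(A) == i} for i in range(1, d + 1)}

levels = {d: axiom}                     # L_d = ax; L_{i-1} = D̂_i(L_i)
for i in range(d, 0, -1):
    levels[i - 1] = "".join(rules[x] if x in N[i] else x for x in levels[i])
w = levels[0]

# profit p(A) = |L_i|_A * (|D(A)| - 1) - |D(A)| for A in N_i
profit = {A: levels[i].count(A) * (len(rules[A]) - 1) - len(rules[A])
          for i in N for A in N[i]}

size_direct = len(axiom) + sum(len(r) for r in rules.values())
assert w == "aabaab"
assert size_direct == len(w) - sum(profit.values()) == 6   # Proposition 7
```

Here both profits are 0, so |G| = |w| − 0 = 6, matching the direct count |ax| + |D(A)| + |D(C)| = 2 + 2 + 2.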
Before we formally present the dynamic programming algorithm, we sketch its behaviour in a more intuitive way. We first need the following definition. A factorisation p = (u_{1}, u_{2},…, u_{k}) is a refinement of a factorisation q = (v_{1}, v_{2},…, v_{m}), denoted by p ≼ q, if \((u_{j_{i-1}+1}, u_{j_{i-1}+2}, \ldots, u_{j_{i}})\) is a factorisation of v_{i}, 1 ≤ i ≤ m, for some {j_{i}}_{0≤i≤m} with 0 = j_{0} < j_{1} < … < j_{m} = k.
The algorithm runs through steps \(i = 1, 2, \ldots, \frac{|w|}{2}\) and in step i, it considers all possibilities for two factorisations q_{i−1} and q_{i} of w induced by L_{i−1} and L_{i}, respectively (note that this implies q_{i−1} ≼ q_{i}). The differences between q_{i−1} and q_{i} implicitly define N_{i} as follows. Let \(q_{i}=(v_{1},v_{2},\dots,v_{k})\) and let q_{i−1} = (u_{1}, u_{2},…, u_{ℓ}), which, since q_{i−1} ≼ q_{i}, means that for some j_{t}, 0 ≤ t ≤ k, with 1 = j_{0} < j_{1} < … < j_{k} = ℓ + 1, \((u_{j_{t-1}}, u_{j_{t-1}+1}, \ldots, u_{j_{t}-1})\) is a factorisation of v_{t}, 1 ≤ t ≤ k. If j_{s} − j_{s−1} > 1 for some 1 ≤ s ≤ k, then N_{i} contains a nonterminal A with \(|\mathsf{D}(A)| = j_{s} - j_{s-1}\) and \(\mathfrak{D}(A)=v_{s}\). The number |L_{i}|_{A} is also implicitly given by counting how often the sequence of factors \((u_{j_{s-1}},\dots,u_{j_{s}-1})\) independently occurs in q_{i−1} and is combined into one single factor in q_{i}; more precisely, \(|\mathsf{L}_{i}|_{A} = |\{t\colon (u_{j_{t-1}},\dots,u_{j_{t}-1})=(u_{j_{s-1}},\dots,u_{j_{s}-1})\}|\). This allows us to calculate the profit of the rule for A without knowing the exact structure of the rules for nonterminals in N_{j} with j ≠ i. By Lemma 13, this choice of nonterminals for N_{i} is optimal for the fixed induced factorisations, which means that a search among all choices for q_{i−1} and q_{i} yields a smallest i-level grammar for w. The running time of this algorithm is dominated by enumerating all pairs q_{i−1} and q_{i} of factorisations of w. However, due to q_{i−1} ≼ q_{i}, these pairs can be compressed as vectors in {0,1,2}^{|w|−1} (the entries denote whether the corresponding position in w is factorised by both (entry ‘1’), only by the refinement (entry ‘2’) or none (entry ‘0’) of the factorisations). Hence, enumerating these pairs of vectors can be done in time \(\mathcal{O}(3^{|w|})\).
Theorem 13
SGP can be solved in time and space \(\mathcal{O}^{*}(3^{|w|})\).
Proof
Let n = |w|. We use dynamic programming to consider all possible factorisations of w and refinements for each level \(i=1,\dots,d\). A factorisation of w is stored as a vector q ∈ {0,1}^{n−1} and, furthermore, we use vectors q ∈ {0,1,2}^{n−1} in order to represent a factorisation together with a refinement, as explained above (for the sake of convenience, we implicitly assume q[0] = q[n] = 1). For such a vector q ∈ {0,1,2}^{n−1} that describes two factorisations p and \(p^{\prime}\) with \(p \preceq p^{\prime}\), we denote by F(q) the factorisation \(p^{\prime}\) (represented as a vector from {0,1}^{n−1}) and by R(q) the refinement p (represented as a vector from {0,1}^{n−1}). More formally, let \(F \colon \{0,1,2\}^{n-1}\rightarrow \{0,1\}^{n-1}\) be the mapping that replaces each ‘2’-entry by a ‘0’-entry (and leaves all other entries unchanged), and let \(R \colon \{0,1,2\}^{n-1}\rightarrow \{0,1\}^{n-1}\) be the mapping that replaces each ‘2’-entry by a ‘1’-entry (and leaves all other entries unchanged).
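The mappings F and R, and the packing of a factorisation together with one of its refinements into a single vector over {0,1,2}, are straightforward (a sketch with our own function names):

```python
def F(q):
    """Forget the refinement-only cuts: map every 2-entry to 0."""
    return [0 if x == 2 else x for x in q]

def R(q):
    """Keep all cuts of the refinement: map every 2-entry to 1."""
    return [1 if x == 2 else x for x in q]

def encode(vp, vq):
    """Pack a factorisation vq together with a refinement vp (every cut of
    vq is also a cut of vp) into one vector over {0,1,2}: entry 1 if both
    factorisations cut here, 2 if only the refinement does, 0 if neither."""
    assert all(a >= b for a, b in zip(vp, vq)), "vp must refine vq"
    return [1 if b == 1 else (2 if a == 1 else 0) for a, b in zip(vp, vq)]
```

By construction, F(encode(vp, vq)) recovers vq and R(encode(vp, vq)) recovers vp.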
The dynamic program uses the following tables:
– T[i, q], for \(i\in \{2,\dots,\frac{n}{2}\}\) and all q ∈ {0,1,2}^{n−1} ∖ {0,1}^{n−1}, stores the size of a smallest i-level grammar for w for which the axiom ax induces the factorisation F(q) and for which \(\widehat{\mathsf{D}}_{i}(\mathsf{ax})\) induces the factorisation R(q).
– S[i, q], for all \(i\in \{1,\dots,\frac{n}{2}\}\) and all q ∈ {0,1}^{n−1}, stores the size of a smallest i-level grammar for w for which the axiom induces the factorisation q.
– P[i, q], for all \(i\in \{2,\dots,\frac{n}{2}\}\) and all q ∈ {0,1}^{n−1}, stores the refinement of q which equals the factorisation induced by \(\widehat{\mathsf{D}}_{i}(\mathsf{ax})\) for an optimal i-level grammar for which ax induces factorisation q.
– opt_{i}, for all \(i\in \{1,\dots,\frac{n}{2}\}\), stores the size of a smallest i-level grammar for w.
We point out that the tables T and S are sufficient to compute the size of a smallest grammar; the purpose of table P is to construct an actual grammar of minimal size after termination of the algorithm. Intuitively speaking, in order to determine S[i, q], i. e., the size of a smallest ilevel grammar for which the axiom induces the factorisation q, we have to check all entries \(T[i, q^{\prime }]\) for which the factorisation of \(q^{\prime }\) (note that \(q^{\prime }\) represents a factorisation and a refinement) equals q and for a minimal one of these entries, we store the actual refinement (which is not needed anymore to compute the size of a minimal grammar) in P[i, q]. In this way, the entries of P[i, q] allow us to restore an actual smallest grammar.
We first initialise S by setting \(S[1, q] = |G_{q}|\), for every q ∈ {0,1}^{n−1}, where, according to Lemma 13, G_{q} is a smallest 1-level grammar for w that induces factorisation q, and we set \(opt_{1} = \min\{S[1,q]\mid q\in \{0,1\}^{n-1}\}\).
We then compute iteratively for each \(i=2,\dots , \frac {n}{2}\) the entries T[i, q], \(S[i, q^{\prime }]\) and \(P[i, q^{\prime }]\), for every q ∈{0,1,2}^{n− 1} ∖{0,1}^{n− 1} and \(q^{\prime }\in \{0,1\}^{n1}\) as follows.
First, for any q ∈ {0,1,2}^{n−1} ∖ {0,1}^{n−1}, we define the set I(q) of (boundary tuples of) consecutive factors in R(q) which are combined into one factor in F(q): \(I(q) := \{(j_{0},j_{1},\dots,j_{k}) \mid k \geq 2,\ j_{0} < j_{1} < \dots < j_{k},\ q[j_{0}] = q[j_{k}] = 1,\ q[j_{1}] = \dots = q[j_{k-1}] = 2\) and \(q[j] = 0\) for every other j with \(j_{0} < j < j_{k}\}\).
Furthermore, from I(q), we can extract the set N(q) of nonterminals which create these factors on level i, i. e., \(N(q):=\{w(j_{0},j_{1},\dots,j_{k})\mid (j_{0},\dots,j_{k})\in I(q)\}\), where \(w(j_{0},j_{1},\dots,j_{k})\) denotes a nonterminal whose rule has a right side of length k, the t-th symbol of which derives the factor \(w[j_{t-1}+1..j_{t}]\), 1 ≤ t ≤ k (thus, \(\mathfrak{D}(w(j_{0},\dots,j_{k})) = w[j_{0}+1..j_{k}]\)).
The corresponding number of occurrences of the nonterminal \(w(j_{0},j_{1},\dots,j_{k})\) on level i is given by \(m_{w(j_{0},\dots,j_{k})} := |\{(j^{\prime}_{0},\dots,j^{\prime}_{k})\in I(q) \mid w[j^{\prime}_{t-1}+1..j^{\prime}_{t}] = w[j_{t-1}+1..j_{t}], 1 \leq t \leq k\}|\).
The entry T[i, q] can now be computed as follows: \(T[i,q] = S[i-1, R(q)] - {\sum}_{w(j_{0},\dots,j_{k})\in N(q)} \left(m_{w(j_{0},\dots,j_{k})}(k-1) - k\right)\), where \(m_{w(j_{0},\dots,j_{k})}\) denotes the number of occurrences of \(w(j_{0},\dots,j_{k})\) on level i, as given above.
Then, for every \(q^{\prime}\in \{0,1\}^{n-1}\), we can compute the entries \(S[i,q^{\prime}]\) and \(P[i,q^{\prime}]\) by \(S[i,q^{\prime}] = \min\{T[i,q] \mid q \in \{0,1,2\}^{n-1}\setminus\{0,1\}^{n-1}, F(q) = q^{\prime}\}\) and \(P[i,q^{\prime}] = R(q)\),
where q ∈{0,1,2}^{n− 1} ∖{0,1}^{n− 1} with \(F(q)=q^{\prime }\) and \(T[i,q]=S[i,q^{\prime }]\). Finally, the value opt_{i} is computed by \(opt_{i} = \min \limits \{S[i,q^{\prime }]\mid q^{\prime }\in \{0,1\}^{n1}\}\).
After termination of step \(\frac{n}{2}\), the size of a smallest grammar for the word w is \(\min\{opt_{i} \mid 1 \leq i \leq \frac{n}{2}\}\). Since the values in T[i, q] for any \(i=2,3,\dots,\frac{n}{2}\) and q ∈ {0,1,2}^{n−1} ∖ {0,1}^{n−1} are constructively computed from S[i − 1, R(q)] by defining the rules in N(q), the set \(\bigcup_{j=1}^{i} N(q_{j})\) with q_{i} := q and q_{j−1} := P[j, q_{j}] for \(j=i-1,\dots,1\) yields an i-level grammar for w of size T[i, q]. For the index i with \(opt_{i} = \min\{opt_{j} \mid 1 \leq j \leq \frac{n}{2}\}\) and a vector q ∈ {0,1,2}^{n−1} ∖ {0,1}^{n−1} such that \(opt_{i} = T[i, q]\), this construction gives a smallest grammar for w.
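To illustrate how a single vector q over {0,1,2} determines the new rules and their occurrence counts, the following sketch (our own reading of the construction; the function name is ours) extracts, for each maximal run of ‘2’-cuts, the merged sequence of refinement factors, and counts equal sequences:

```python
def merged_nonterminals(w, q):
    """For q in {0,1,2}^(n-1) (with implicit outer boundaries set to 1),
    return a map from each merged sequence of R(q)-factors to the number
    of its occurrences, i.e. the data from which the DP derives the profit
    of the rules introduced on the new level."""
    n = len(w)
    assert len(q) == n - 1
    full = [1] + list(q) + [1]
    ones = [i for i, x in enumerate(full) if x == 1]
    counts = {}
    for j0, jk in zip(ones, ones[1:]):
        twos = [i for i in range(j0 + 1, jk) if full[i] == 2]
        if twos:                      # cuts surviving only in the refinement
            cuts = [j0] + twos + [jk]
            key = tuple(w[a:b] for a, b in zip(cuts, cuts[1:]))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

For w = abcabc and q = 02102 (the factorisation abc·abc refined into ab·c·ab·c), a single nonterminal merging the sequence (ab, c) occurs twice; with k = 2 and m = 2 the subtracted profit m(k − 1) − k is 0, so this level would not pay off for such a short word.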
In order to prove the correctness of the algorithm, we show for each q ∈ {0,1}^{n−1}, inductively for each \(i=1,\dots,\frac{n}{2}\), that S[i, q] equals the size of a smallest i-level grammar for w which induces the factorisation q. For i = 1 this is implied by Lemma 13. Assuming that this statement is true for some value i − 1, let G_{i} = (N, Σ, R, ax) be a smallest i-level grammar for w with \(i \leq \frac{n}{2}\). Let q_{i} and q_{i−1} be the vector representations of the factorisations induced by ax and \(\widehat{\mathsf{D}}_{i}(\mathsf{ax})\), respectively. The grammar \(G_{i-1}:=(N\setminus N_{i},{\Sigma},R\setminus \{(A,\mathsf{D}(A))\mid A\in N_{i}\},\widehat{\mathsf{D}}_{i}(\mathsf{ax}))\) is an (i − 1)-level grammar for w with induced factorisation q_{i−1}; its size is \(|G_{i}|+{\sum}_{A\in N_{i}} \mathsf{p}(A)\), which, by the induction hypothesis, is at least S[i − 1, q_{i−1}]. By definition of the profit, the term \(|G_{i}|+{\sum}_{A\in N_{i}} \mathsf{p}(A)\) can be rewritten as \(|G_{i}|+|\widehat{\mathsf{D}}_{i}(\mathsf{ax})|-|\mathsf{ax}|-{\sum}_{A\in N_{i}}|\mathsf{D}(A)|\).
Let q ∈ {0,1,2}^{n−1} be such that F(q) = q_{i} and R(q) = q_{i−1}, i. e., for every j, 1 ≤ j ≤ n − 1, q[j] = 2, if q_{i}[j] ≠ q_{i−1}[j], and q[j] = q_{i}[j], otherwise. The value T[i, q] is computed from S[i − 1, q_{i−1}] by subtracting \({\sum}_{w(j_{0},\dots,j_{k})\in N(q)} \left(m_{w(j_{0},\dots,j_{k})}(k-1) - k\right)\), where \(m_{w(j_{0},\dots,j_{k})}\) is the number of occurrences of \(w(j_{0},\dots,j_{k})\) on level i.
Each 2-entry in q occurs in exactly one tuple in I(q), and each tuple \((j_{0},\dots,j_{k})\in I(q)\) contains exactly k − 1 2-entries, which, by definition of q, yields: \(|\widehat{\mathsf{D}}_{i}(\mathsf{ax})| - |\mathsf{ax}| = {\sum}_{(j_{0},\dots,j_{k})\in I(q)} (k-1) = {\sum}_{w(j_{0},\dots,j_{k})\in N(q)} m_{w(j_{0},\dots,j_{k})}(k-1)\).
For each \(w(j_{0},j_{1},\dots,j_{k})\in N(q)\), N_{i} contains a nonterminal A with \(|\mathsf{D}(A)| = k\), which means that \({\sum}_{A\in N_{i}}|\mathsf{D}(A)|\geq {\sum}_{w(j_{0},\dots,j_{k})\in N(q)}k\); thus, \(T[i,q] = S[i-1,q_{i-1}] - (|\widehat{\mathsf{D}}_{i}(\mathsf{ax})| - |\mathsf{ax}|) + {\sum}_{w(j_{0},\dots,j_{k})\in N(q)}k \leq |G_{i}| + {\sum}_{A\in N_{i}} \mathsf{p}(A) - (|\widehat{\mathsf{D}}_{i}(\mathsf{ax})| - |\mathsf{ax}|) + {\sum}_{A\in N_{i}}|\mathsf{D}(A)| = |G_{i}|\).
Consequently, the algorithm computes the size of a grammar for w that is smallest among all grammars for w with at most \(\frac{n}{2}\) levels and, since for any word w there always exists a smallest grammar with at most \(\frac{|w|}{2}\) levels, we conclude that the described algorithm finds a smallest grammar for w. □
We conclude this section by pointing out some features of the algorithm of Theorem 13. First, note that the brute-force enumeration of all q ∈ {0,1,2}^{n−1} ∖ {0,1}^{n−1}, which dominates the running time, provides some possibilities for modifications. For example, if we only consider q such that at most 2 neighbouring factors of R(q) are combined in F(q) (of which there are far fewer than the full set {0,1,2}^{n−1} ∖ {0,1}^{n−1}), then we automatically compute smallest grammars in Chomsky normal form.^{Footnote 11} Moreover, for a fixed i and two \(q_{1}, q_{2} \in \{0,1,2\}^{n-1}\setminus \{0,1\}^{n-1}\), the computations that are necessary to compute T[i, q_{1}] and T[i, q_{2}] are independent from each other and only require the previously computed values S[i − 1,⋅] (an analogous observation can be made for the computation of the S[i,⋅] and P[i,⋅]). Hence, the brute-force enumeration of the q ∈ {0,1,2}^{n−1} ∖ {0,1}^{n−1} and of the \(q^{\prime} \in \{0,1\}^{n-1}\) can easily be done in parallel.
Conclusions
We conclude this work by discussing some important open problems and additional questions that are motivated by our results.
Small Alphabets
For hard problems on strings, we usually encounter the situation that either the problem becomes polynomial-time solvable for constant alphabets, or there is a hardness reduction that works for some constant alphabet, which, by simple encoding techniques, extends to binary alphabets as well. Moreover, the unary case is often trivially solvable in polynomial time, even if the problem becomes intractable for larger alphabets. However, the smallest grammar problem shows a drastically different behaviour: it is not polynomial-time solvable for every constant alphabet (unless P = NP), but the NP-hardness for very small alphabets (even for the binary or unary case) is still open. Thus, we consider the following as one of the most important open questions:
Open Problem 1
Is it possible to compute smallest grammars for binary alphabets in polynomial time?
We believe that answering this question in the negative might be rather difficult. In fact, the substantial effort that was necessary to prove Theorem 3 suggests that further strengthening our reduction to the case of binary alphabets is problematic. Thus, a completely different kind of reduction seems necessary. However, the main technical challenge seems to be the necessity to control the compression of factors that function as codewords for parts of the source problem of the reduction. It is arguably difficult to think of reductions that somehow circumvent this issue.
On the other hand, it is not apparent how a small alphabet could help to efficiently compute smallest grammars and, if this is possible, it seems that deeper combinatorial insights with respect to grammar-based compression are necessary.
Approximation
So far, no constant-factor approximation algorithm is known for the smallest grammar problem (as already mentioned in Section 1.3, the best approximation algorithms achieve a ratio in \(\mathcal{O}\left(\log\left(\frac{|w|}{m^{*}}\right)\right)\) [33, 34, 40]) and, although not backed by any hardness results, the existing literature suggests that no such algorithm exists. Moreover, this apparent hardness of approximating smallest grammars also applies to the case of fixed alphabets, since, as shown in [39], if there is an approximation algorithm for the smallest grammar problem over a binary alphabet with a constant approximation ratio c, then there also is a 6c-approximation algorithm for arbitrary alphabets. This especially means that disproving the existence of a 6-approximation for the smallest grammar problem for unbounded alphabets, under some complexity-theoretic assumption, implies, under the same assumption, that there is no polynomial-time exact algorithm for the restriction to binary alphabets. Considering the substantial effort that went into designing a reduction for alphabet size 17 in this paper, such an inapproximability result for unbounded alphabets might actually be an easier way to show computational lower bounds for binary alphabets.
Aside from these consequences for binary alphabets, an inapproximability result (with some ratio significantly larger than the current bound of \(\frac{8569}{8568}\)) for the smallest grammar problem would be very interesting, yet not unexpected. The common belief that general constant-factor approximations probably do not exist is based on the fact that, despite substantial effort, such algorithms have not been found so far, but also on the close relation to the problem of computing shortest addition chains for a set of integers — a problem which has been extensively studied for over 100 years (see [63] for a survey on addition chains and [33, 34] for their connections to the smallest grammar problem). Formally, an addition chain is a strictly increasing sequence \((a_{1}, a_{2}, \ldots, a_{k}) \in \mathbb{N}^{k}\) with a_{1} = 1 such that, for every i, 2 ≤ i ≤ k, there are b, c ∈ {a_{1},…, a_{i-1}} with a_{i} = b + c; the task is to compute a desirably short addition chain that contains a given set of integers. In a sense, grammars can be seen as the natural extension of addition chains (i. e., instead of integers, we are concerned with strings, and integer addition becomes string concatenation).
It has been shown in [33, 34] that a set of integers can be translated into a word (over an alphabet that grows with the number of integers) whose smallest grammar is larger than the length of a shortest addition chain of the integers by only a constant factor. Consequently, an approximation algorithm for the smallest grammar problem with approximation ratio in \(\text{o}(\frac{\log n}{\log\log n})\) would imply an improvement of long-standing results for addition chains, for which the best known approximation algorithm achieves an approximation ratio in \(\mathcal{O}(\frac{\log n}{\log\log n})\) (see [34] for details). Note that, with the results of [39] mentioned above, this statement also holds for the case of constant, even binary, alphabets.
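To make the connection tangible, here is a small exhaustive search for shortest addition chains (our own brute-force sketch, only feasible for tiny inputs; the approximation-theoretic statements above of course concern far more efficient algorithms):

```python
def shortest_addition_chain(targets, limit=12):
    """Depth-first search for a shortest strictly increasing addition chain
    (a_1 = 1, every later element the sum of two earlier ones) containing
    all integers in `targets`; `limit` caps the chain length explored."""
    targets = set(targets)
    best = None

    def extend(chain):
        nonlocal best
        if best is not None and len(chain) >= len(best):
            return                       # cannot beat the best chain found
        if targets <= set(chain):
            best = list(chain)
            return
        if len(chain) >= limit:
            return
        for s in sorted({b + c for b in chain for c in chain
                         if chain[-1] < b + c <= max(targets)}):
            extend(chain + [s])

    extend([1])
    return best
```

For instance, a shortest chain containing 15 has 6 elements (e.g. 1, 2, 3, 6, 12, 15). Roughly, and up to constant factors, a grammar for the unary word a^{n} yields an addition chain for n and vice versa, since concatenating right sides adds the lengths they derive.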
Moreover, we can also observe that the fundamental technique of the approximation algorithms of [33, 34, 40], which links smallest grammars with the size of LZ77-factorisations, is unlikely to yield an approximation ratio in \(\text{o}(\frac{\log n}{\log\log n})\). More precisely, by bounding the size of a smallest grammar of a word from below by the length of its shortest LZ77-factorisation, the performance of these algorithms is shown by comparison with this LZ77-bound. However, it is also shown (see [33, 40]) that there are words for which a smallest grammar is \(\mathcal{O}(\frac{\log n}{\log\log n})\)-times as large as the size of a smallest LZ77-factorisation; thus, for such algorithms, an approximation ratio better than \(\mathcal{O}(\frac{\log n}{\log\log n})\) cannot be shown by this technique. Moreover, note that this result is improved in [39], where binary words are presented for which a smallest grammar is \(\mathcal{O}(\frac{\log n}{\log\log n})\)-times as large as the size of a smallest LZ77-factorisation.
Open Problem 2
Is there a constant-factor approximation algorithm for the smallest grammar problem? (Note that a negative result disproving a ratio of 6 or larger yields a bound for the restriction to binary alphabets.)
Parameterised Complexity
This work can also be seen as the starting point of a comprehensive parameterised complexity analysis of the smallest grammar problem. More precisely, our results show that the problem is most likely not in FPT, if parameterised by |Σ|, |N| or the number of levels. However, with respect to the parameter |N|, we saw that it is at least in XP. A simple fixed-parameter tractable case can be obtained, if we parameterise by both |Σ| and \(\ell = \max\{|\mathfrak{D}(A)| \mid A \in N\}\). More precisely, for every \(F \subseteq \{u \mid u \in {\Sigma}^{+}, 2 \leq |u| \leq \ell\}\), we compute a smallest F-grammar according to Lemma 11 and we output one that is minimal among them. Since the number of the sets F is bounded by a function of the parameters, this yields an fpt-algorithm. However, we consider the following parameterised variant, for which the existence of an fpt-algorithm is still open, the most interesting:
Open Problem 3
Is the smallest grammar problem parameterised by |Σ| and |N| fixed-parameter tractable?
A More Abstract View
From a rather abstract point of view, one could generally interpret any set of factors \(F \subseteq {\Sigma}^{*}\) as a grammar. More precisely, an F-grammar is then a triple G_{F} = (N, Σ, R) (the axiom or start symbol is intentionally missing) with N = {A_{u} ∣ u ∈ F} and R is a set of rules over Σ and N that satisfies \(\mathfrak{D}(A_{u}) = u\), for every u ∈ F. In this way, an F-grammar is a representation of F (just that none of the words in F is the designated compressed word). Obviously, there is a large element of freedom in this definition of F-grammars, since many choices for R are possible. However, as long as we are only interested in small grammars, this is justified, since a grammar that is smallest among all F-grammars (in the sense described above) can be computed in polynomial time. To see this, we can slightly adapt the approach from Section 4 as follows. For every u ∈ F, we first construct the subgraph with vertices V_{4, u} and edges E_{4, u}; then we delete all vertices (u, i, j) with i < j and u[i..j] ∉ F (and adjacent edges). As before, it can be shown that an independent dominating set for the resulting interval graph corresponds to a smallest F-grammar. In the following, we denote by G_{F} the smallest F-grammar obtained in this way.
In a sense, this abstracts away the question of how factors are compressed by other factors and boils the problem of computing small grammars down to its core of hardness, which lies in choosing the right factors. While this perspective is interesting from a theoretical point of view, it also yields questions that might have algorithmic applications. For example, as an alternative to the exponential brute-force enumeration of all \(F \subseteq \mathsf{F}_{\geq 2}(w)\) in order to obtain an F-grammar that is smallest among all grammars, one could compute G_{F} for a factor set F that is inclusion maximal in the sense that, for every \(F^{\prime} \supsetneq F\), \(|G_{F}| < |G_{F^{\prime}}|\) (or inclusion minimal, which can be defined analogously). However, this approach only seems applicable in a reasonable way, if this concept of inclusion maximality is monotone, i. e., if the inclusion maximality of F is characterised by \(|G_{F}| < |G_{(F\cup \{u\})}|\), for every u ∈ Σ^{∗}. In this regard, note that \(|G_{F}| = |G_{F^{\prime}}|\) is possible for \(F \subsetneq F^{\prime}\), as witnessed by F = {^{4}} and F′ = {^{4},^{2}}.
Open Problem 4
Are there \(F_{1} \subsetneq F_{2} \subsetneq F_{3} \subseteq \mathsf{F}_{\geq 2}(w)\), such that \(|G_{F_{1}}| < |G_{F_{2}}|\) and \(|G_{F_{3}}| < |G_{F_{1}}|\)?
If the inclusion maximality is monotone, then every inclusion maximal F (thus, also an optimal F for which G_{F} is a smallest grammar) can be computed by starting with F = {w} and iteratively adding factors from w, until every possible new factor would increase the size of G_{F}. This also yields an obvious greedy strategy: always choose the new factor that results in a smallest G_{F}. In this regard, we stress the fact that this kind of greedy strategy differs from the algorithm Greedy [37], analysed in [33, 34], since the latter iteratively changes an existing grammar and the greediness is with respect to the rules of the intermediate grammars.
This also points out an interesting fact (and a potential difficulty) of this approach: The grammars corresponding to the factor sets F, F ∪{u}, \(F \cup \{u, u^{\prime }\}\) and so on, i. e., the grammars G_{F}, G_{(F∪{u})}, etc., could be quite different and do not necessarily share the incremental character of the factor sets, in the sense that one grammar can be obtained from the previous one by small, local modifications.
Notes
Such context-free grammars are also called straight-line programs in the literature.
In this work, the term “compression” always refers to lossless data compression.
The work [2] also considers the compression perspective.
A concept of grammar complexity has also been introduced and is investigated in the area of descriptional complexity of formal languages (see [27,28,29,30,31,32]). However, this differs from the topic of this paper, since there, grammars for finite languages are investigated and the complexity measure under interest is the number of rules (note that in [31, 32], the size of grammars is also considered).
Most of these algorithms were originally designed as compression algorithms (with slightly different purposes than solving the smallest grammar problem), but they can also be regarded as approximation algorithms for the smallest grammar problem and have also been investigated in this regard in [33, 34].
As we are not considering maximisation problems, we define the relevant terminology only for minimisation problems.
The report can be downloaded at http://www.informatik.uni-trier.de/~fernau/Sto77.pdf.
For example, a grammar can be formed into a single string by using an order on the rules and then listing the right sides with separators in between, or by listing the rules with the corresponding nonterminals.
See page 8 for the definition of the closed neighbourhood.
The restriction to grammars in Chomsky normal form is quite common, since many of the existing approximation algorithms also compute grammars in Chomsky normal form.
References
NevillManning, C.G., Witten, I.H.: Identifying hierarchical structure in sequences: A lineartime algorithm. J. Artif. Intell. Res. 7, 67–82 (1997)
Nevill-Manning, C.G.: Inferring sequential structure. Ph.D. Thesis, University of Waikato, NZ (1996)
de Marcken, C.: Unsupervised language acquisition. Ph.D. Thesis, Department of Electrical Engineering and Computer Science, MIT, USA (1996)
Gallé, M.: Searching for compact hierarchical structures in DNA by means of the smallest grammar problem. Ph.D. Thesis, University of Rennes 1, France (2011)
Lanctôt, J.K., Li, M., Yang, E.: Estimating DNA sequence entropy. In: Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2000, January 9-11, 2000, San Francisco, CA, USA, pp. 409–418 (2000)
Kieffer, J.C., Yang, E.H.: Grammar-based codes: A new class of universal lossless source codes. IEEE Trans. Inf. Theory 46(3), 737–754 (2000)
Kieffer, J.C., Yang, E.H., Nelson, G.J., Cosman, P.C.: Universal lossless compression via multilevel pattern matching. IEEE Trans. Inf. Theory 46(4), 1227–1245 (2000)
Yang, E.H., Kieffer, J.C.: Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform - part one: Without context models. IEEE Trans. Inf. Theory 46(3), 755–777 (2000)
Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. Journal of the ACM 29(4), 928–951 (1982)
Storer, J.A.: NP-completeness results concerning data compression. Tech. Rep. 234, Dept. of Electrical Engineering and Computer Science, Princeton University, USA (1977)
Li, M., Vitányi, P.: An introduction to Kolmogorov complexity and its applications, 2nd edn. Springer, Berlin (1997)
Böttcher, S., Lohrey, M., Maneth, S., Rytter, W.: 08261 abstracts collection - structure-based compression of complex massive data. In: Structure-Based Compression of Complex Massive Data, 22.06.-27.06.2008. http://drops.dagstuhl.de/opus/volltexte/2008/1694/ (2008)
Maneth, S., Navarro, G.: Indexes and computation over compressed structured data (Dagstuhl Seminar 13232). Dagstuhl Reports 3(6), 22–37 (2013). https://doi.org/10.4230/DagRep.3.6.22
Bille, P., Lohrey, M., Maneth, S., Navarro, G.: Computation over compressed structured data (Dagstuhl Seminar 16431). Dagstuhl Reports 6(10), 99–119 (2016). https://doi.org/10.4230/DagRep.6.10.99
Lohrey, M.: Algorithmics on SLP-compressed strings: A survey. Groups, Complexity, Cryptology 4(2), 241–299 (2012)
Lohrey, M.: The compressed word problem for groups. SpringerBriefs in Mathematics. Springer, Berlin (2014)
Akutsu, T.: A bisection algorithm for grammar-based compression of ordered trees. Inf. Process. Lett. 110(18-19), 815–820 (2010)
Lohrey, M., Maneth, S.: The complexity of tree automata and XPath on grammar-compressed trees. Theor. Comput. Sci. 363(2), 196–210 (2006)
Lohrey, M., Maneth, S., Mennicke, R.: XML tree structure compression using RePair. Inf. Syst. 38(8), 1150–1167 (2013)
Lohrey, M., Maneth, S., Schmidt-Schauß, M.: Parameter reduction and automata evaluation for grammar-compressed trees. J. Comput. Syst. Sci. 78(5), 1651–1669 (2012)
Gascón, A., Lohrey, M., Maneth, S., Reh, C.P., Sieber, K.: Grammar-based compression of unranked trees. In: Computer Science - Theory and Applications - 13th International Computer Science Symposium in Russia, CSR 2018, Moscow, Russia, June 6-10, 2018, Proceedings, pp. 118–131 (2018)
Gascón, A., Godoy, G., Schmidt-Schauß, M.: Unification with singleton tree grammars. In: Rewriting Techniques and Applications, 20th International Conference, RTA 2009, Brasília, Brazil, June 29 - July 1, 2009, Proceedings, pp. 365–379 (2009)
Berman, P., Karpinski, M., Larmore, L.L., Plandowski, W., Rytter, W.: On the complexity of pattern matching for highly compressed two-dimensional texts. J. Comput. Syst. Sci. 65(2), 332–350 (2002)
Plandowski, W., Rytter, W.: Application of Lempel-Ziv encodings to the solution of word equations. In: Automata, Languages and Programming, 25th International Colloquium, ICALP 1998, Aalborg, Denmark, July 13-17, 1998, Proceedings, pp. 731–742 (1998)
Jez, A.: Recompression: A simple and powerful technique for word equations. J. ACM 63(1), 4:1–4:51 (2016)
Ganardi, M., Jez, A., Lohrey, M.: Balancing straight-line programs. In: 60th Annual Symposium on Foundations of Computer Science, FOCS ’19, Baltimore, Maryland, USA, November 9-12, 2019 (2019)
Alspach, B., Eades, P., Rose, G.: A lower-bound for the number of productions required for a certain class of languages. Discret. Appl. Math. 6(2), 109–115 (1983)
Filmus, Y.: Lower bounds for context-free grammars. Inf. Process. Lett. 111, 895–898 (2011)
Bucher, W., Maurer, H.A., Culik II, K., Wotschke, D.: Concise description of finite languages. Theor. Comput. Sci. 14, 227–246 (1981)
Eberhard, S., Hetzl, S.: Compressibility of finite languages by grammars. In: Descriptional Complexity of Formal Systems - 17th International Workshop, DCFS 2015, Waterloo, ON, Canada, June 25-27, 2015, Proceedings, pp. 93–104 (2015)
Gruber, H., Holzer, M., Wolfsteiner, S.: On minimal grammar problems for finite languages. In: Developments in Language Theory - 22nd International Conference, DLT 2018, Tokyo, Japan, September 10-14, 2018, Proceedings, pp. 342–353 (2018)
Holzer, M., Wolfsteiner, S.: On the grammatical complexity of finite languages. In: Descriptional Complexity of Formal Systems - 20th IFIP WG 1.02 International Conference, DCFS 2018, Halifax, NS, Canada, July 25-27, 2018, Proceedings, pp. 151–162 (2018)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
Lehman, E.: Approximation algorithms for grammar-based data compression. Ph.D. Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology (2002)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)
Welch, T.A.: A technique for high-performance data compression. IEEE Computer 17(6), 8–19 (1984)
Apostolico, A., Lonardi, S.: Off-line compression by greedy textual substitution. Proceedings of the IEEE 88, 1733–1744 (2000)
Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proceedings of the IEEE 88, 1722–1732 (2000)
Hucke, D., Lohrey, M., Reh, C.P.: The smallest grammar problem revisited. In: String Processing and Information Retrieval - 23rd International Symposium, SPIRE 2016, Beppu, Japan, October 18-20, 2016, Proceedings, pp. 35–49 (2016)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)
Arpe, J., Reischuk, R.: On the complexity of optimal grammar-based compression. In: 2006 Data Compression Conference (DCC 2006), 28-30 March 2006, Snowbird, UT, USA, pp. 173–182 (2006)
Garey, M.R., Johnson, D.S.: Computers and intractability. Freeman, New York (1979)
Farber, M.: Independent domination in chordal graphs. Oper. Res. Lett. 1(4), 134–138 (1982)
Papadimitriou, C.H.: Computational complexity. AddisonWesley, Boston (1994)
Downey, R.G., Fellows, M.R.: Fundamentals of parameterized complexity. Texts in Computer Science. Springer, Berlin (2013)
Flum, J., Grohe, M.: Parameterized complexity theory. Springer, Berlin (2006)
Cygan, M., Fomin, F., Kowalik, L., Lokshtanov, D., Marx, D., Pilipczuk, M., Pilipczuk, M., Saurabh, S.: Parameterized algorithms. Springer, Berlin (2015)
Ausiello, G.: Complexity and approximation: combinatorial optimization problems and their approximability properties. Springer, Berlin (1999)
Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete graph problems. Theor. Comput. Sci. 1(3), 237–267 (1976)
Shannon, C.E.: A theorem on coloring the lines of a network. Journal of Mathematics and Physics 28, 148–151 (1949)
Skulrattanakulchai, S.: Δ-list vertex coloring in linear time. Inf. Process. Lett. 98(3), 101–106 (2006)
Alimonti, P., Kann, V.: Some APX-completeness results for cubic graphs. Theor. Comput. Sci. 237(1-2), 123–134 (2000)
Nevill-Manning, C.G., Witten, I.H.: On-line and off-line heuristics for inferring hierarchies of repetitions in sequences. Proceedings of the IEEE 88, 1745–1755 (2000)
Benz, F., Kötzing, T.: An effective heuristic for the smallest grammar problem. In: Genetic and Evolutionary Computation Conference, GECCO ’13, Amsterdam, The Netherlands, July 6-10, 2013, pp. 487–494 (2013)
Carrascosa, R., Coste, F., Gallé, M., Infante-López, G.: Searching for smallest grammars on large sequences and application to DNA. Journal of Discrete Algorithms 11, 62–72 (2012)
Fournier, J.C.: Colorations des arêtes d’un graphe. Cahiers Centre Études Recherche Opér. 15, 311–314 (1973). Colloque sur la Théorie des Graphes (Brussels, 1973)
Vizing, V.G.: The chromatic class of a multigraph. Kibernetika (Kiev) 1(3), 29–39 (1965)
Manlove, D.F.: On the algorithmic complexity of twelve covering and independence parameters of graphs. Discret. Appl. Math. 91(1-3), 155–175 (1999)
Griggs, J.R., West, D.B.: Extremal values of the interval number of a graph. SIAM Journal on Algebraic and Discrete Methods 1(1), 1–7 (1980)
Haynes, T.W., Hedetniemi, S.T., Slater, P.J.: Fundamentals of domination in graphs. Monographs and Textbooks in Pure and Applied Mathematics, vol. 208. Marcel Dekker, New York (1998)
Bourgeois, N., Croce, F.D., Escoffier, B., Paschos, V.T.: Fast algorithms for min independent dominating set. Discret. Appl. Math. 161(4-5), 558–572 (2013)
Downey, R.G., Fellows, M.R.: Fixed-parameter tractability and completeness. Congressus Numerantium 87, 161–187 (1992)
Thurber, E.G.: Efficient generation of minimal length addition chains. SIAM Journal on Computing 28, 1247–1263 (1999)
Acknowledgments
Katrin Casel was supported by the Deutsche Forschungsgemeinschaft (FE 560/6-1). Serge Gaspers is the recipient of an Australian Research Council (ARC) Future Fellowship (FT140100048) and acknowledges support under the ARC’s Discovery Projects funding scheme (DP150101134). NICTA is funded by the Australian Government through the Department of Communications and the ARC through the ICT Centre of Excellence Program. We thank Gabriele Fici for pointing us to the “rule number” variant of the smallest grammar problem that we discussed at the end of Section 3.3.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work represents an extended version of the paper “On the Complexity of Grammar-Based Compression over Fixed Alphabets”, presented at the 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016) and published in LIPIcs - Leibniz International Proceedings in Informatics (https://doi.org/10.4230/LIPIcs.ICALP.2016.122).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Casel, K., Fernau, H., Gaspers, S. et al. On the Complexity of the Smallest Grammar Problem over Fixed Alphabets. Theory Comput Syst 65, 344–409 (2021). https://doi.org/10.1007/s00224-020-10013-w
DOI: https://doi.org/10.1007/s00224-020-10013-w
Keywords
 Grammar-based compression
 Smallest grammar problem
 Straight-line programs
 NP-completeness
 Exact exponential-time algorithms