1 Introduction

Context-free grammars are among the most classical concepts in theoretical computer science. Their wide range of applications, both of theoretical and practical nature, is well-known and usually forms an integral part of academic undergraduate courses in computer science. In this paper, we are concerned with grammars G that describe singleton languages {w} (or, by slightly abusing notation, grammars describing single words).Footnote 1

1.1 Grammars as Inference Tools and Compressors

Although, from a formal languages point of view, describing a single word by a context-free grammar seems excessive, there are at least two evident motivations:

  • Compression Perspective:Footnote 2 The grammar G is a compressed representation of the word w.

  • Inference Perspective: The grammar G identifies the hierarchical structure of the word w.

The inference perspective can be traced back to the work of Nevill-Manning and Witten [1, 2],Footnote 3 in which the authors consider algorithmic possibilities of extracting (hierarchical) structure from sequential data, such as texts (in a natural or formal language), music or DNA, by constructing a grammar for a given sequence. The hypothesis that small grammars are to be preferred can be considered as an application of Occam’s razor (note that the size of a grammar is the sum of the sizes of its rules, where the size of a rule is measured by the length of its right side). In a more general sense, Nevill-Manning and Witten’s approach embarks on the quest of inferring the intrinsic information content of a given sequence, which is a central problem in learning theory and algorithmic information theory (especially Kolmogorov complexity, as mentioned below). In Nevill-Manning’s PhD-thesis [2], a multitude of connections between the compression perspective of computing grammars for single words and other core topics of mathematics and theoretical computer science are discussed (e. g., the minimum description length principle in learning theory, information theory, data compression). The inference perspective of computing grammars for single words has been applied in two more PhD-theses, namely by de Marcken [3] in order to investigate whether analysing the structure of small grammars for large English texts could help understanding the structure of the language itself, and by Gallé [4] in order to infer hierarchical structures in DNA. Moreover, Lanctot et al. [5] contribute to the work on estimating the entropy of DNA-sequences (see the references in [5]), by using an algorithm first proposed by Kieffer and Yang [6] to compute grammars for DNA-sequences.

While in the above-mentioned work grammars are mainly used as an inference tool, the obvious connections to data compression are often highlighted as well (e. g., in [2]). The work of Kieffer et al. [6,7,8] directly approaches the concept of representing words by grammars from a traditional data compression perspective, i. e., we want to compute a small grammar representing a large given word w (in the following, we denote the general concept of compressing a single word by a context-free grammar as grammar-based compression). Besides the above-mentioned papers by Nevill-Manning and Witten, the work by Kieffer et al. is usually cited as the second origin of using grammars for single words, but a closer look into the older literature reveals that the external pointer macro scheme (without overlapping and with pointer size 1) defined by Storer and Szymanski [9, 10] is also equivalent to grammar-based compression.

Another motivation is that grammar-based compression, like any lossless data compression scheme, provides a computable upper bound on the Kolmogorov complexity (see [11]). Since this central measure in algorithmic information theory is generally incomputable, such computable approximations are important. In this regard, grammars are relevant because, in comparison to other practically applied compression schemes, they achieve high compression rates and therefore yield a better approximation of the Kolmogorov complexity (note that many practically relevant compression schemes, e. g., some of the ones mentioned in Section 1.3, allow fast compression and decompression, but cannot achieve exponential compression rates).

1.2 Algorithmics on Compressed Strings

The original motivations outlined so far are still relevant, but the actual reasons why grammar-based compression has experienced a renaissance and thrives today as an important field of research in its own right are the following. While in the early days of computer science the most important requirements for compression schemes were fast (i. e., linear or near-linear time) compression and decompression, nowadays the question of whether they are suitable for solving problems directly on the compressed data, without prior decompression, forms a vibrant research area.Footnote 4 This area is usually subsumed under the term algorithmics on compressed strings, and grammar-based compression is particularly well suited for this purpose.

The success of grammars with respect to algorithmics on compressed strings is due to the fact that they cover many compression schemes from practice (most notably, the family of Lempel-Ziv encodings) and that they are mathematically easy to handle (see Lohrey [15] for a survey on the role of grammar-based compression for algorithmics on compressed strings). Many basic problems on strings, e. g., comparison, pattern matching, membership in a regular language, retrieving subwords, etc., can all be solved in polynomial time directly on the grammars [15]. In addition, grammar-based compression has been successfully applied in combinatorial group theory (see the textbook [16] by Lohrey) and to show that certain problems in computational topology are solvable in polynomial time [15]. Grammars as compression schemes have also been extended to more complicated objects, e. g., trees (see [17,18,19,20,21], and [21, 22] for applications in term unification) and two-dimensional words (see [23]). It is also worth pointing out the successful applications of compression techniques for solving word equations (see, e. g., [24, 25]).

A rather recent result is that any context-free grammar for a single word can be transformed in linear time into an equivalent one that is balanced in the sense that the depth of its derivation tree is logarithmic in the size of the represented word (see [26]). This result has a direct impact on basic algorithmic problems on grammar-compressed data, e. g., the random access problem (i. e., accessing in the compressed string the symbol at a given position).

1.3 The Smallest Grammar Problem

For grammar-based compression, the central computational problem is that of computing a smallest (or at least small) grammar for a given word, which is called the smallest grammar problem,Footnote 5 and the respective literature is mainly about approximation algorithms:Footnote 6 LZ78 [35], LZW [36], Bisection [7], Sequitur [1, 2] and Sequential [8], Longest Match [6], Greedy [37], Re-Pair [38] (the names of algorithms in this list are according to [33, 34]). These algorithms share the benefit of being rather simple and fast, and their approximation ratios have been studied thoroughly by Charikar et al. in [33] and by Lehman in his PhD-thesis [34]; some bounds have recently been further improved by Hucke et al. [39]. Unfortunately, none of the approximation ratios is constant, and the currently best approximation ratio is \(\mathcal {O}\left (\log \left (\frac {|{w}|}{m^{*}}\right )\right )\), where \(m^{*}\) is the size of a smallest grammar (i. e., it is still open whether an approximation algorithm with a constant approximation ratio exists, or equivalently, whether the problem is in APX). This result is due to the algorithms by Rytter [40] and Charikar et al. [33, 34], which were developed independently of each other and are not mentioned in the above list. On the other hand, assuming P ≠ NP, it has been shown in [33, 34] that an approximation ratio better than \(\frac {8569}{8568} \approx 1.0001\) is not possible (thus ruling out a polynomial-time approximation scheme (PTAS)). However, research seems to have stagnated at this huge gap between the lower and the upper bound, and still neither an approximation algorithm with a constant approximation ratio nor stronger inapproximability results are known.

The strong bias towards approximation algorithms is usually justified by the general NP-hardness of the smallest grammar problem, but, as explained next, this theoretical justification is seriously flawed. The NP-completeness can be shown by a reduction from vertex cover (see [33, 34]), but in the reduction, an unbounded number of symbols in the underlying alphabet is needed. This means nothing less than that the hardness-reduction is invalid for any realistic scenario, where we deal with a constant alphabet (even more, if the alphabet is rather small, as it is the case in practical applications). Consequently, since the motivation for the approximation algorithms mentioned above is of a rather practical kind (i. e., string compression in real-world scenarios), this theoretical foundation falls apart (in particular, note that an unbounded alphabet is also necessary for the inapproximability result of [33, 34]). One reason for this situation is probably that in [41], it is claimed that the hardness for alphabets of size 3 follows from [10], but a closer look into [10] does not confirm this (we elaborate on this claim in Section 2.4). Consequently, the NP-hardness of the smallest grammar problem for fixed alphabets is essentially open (for well over 30 years, taking [9, 10] as the first reference, which investigates hardness and complexity questions).

1.4 Our Contribution

The main result of this paper is a reduction that proves the smallest grammar problem for fixed alphabets to be NP-complete, at least for alphabet sizes of 17 or larger. As explained above, this closes an important gap in the literature and therefore puts the previous work on grammar-based compression on a more solid theoretical foundation.

Moreover, it also follows that the optimisation version of the smallest grammar problem is APX-hard; thus, the impossibility of a PTAS, previously only known for unbounded alphabets, carries over to the more realistic case of bounded alphabets. By a minor modification of this reduction, we can also show that these two hardness results hold for a slightly different (but frequently used) size measure of grammars, i. e., the rule-size, which equals the size of a grammar as defined above plus the number of its rules (both these measures are formally defined in Section 2.2).

Given these negative complexity results, we move on to the question of whether smallest grammars can be efficiently computed if certain parameters (e. g., levels of the derivation tree, number of rules) are bounded. In this regard, we show that smallest grammars can be computed in polynomial time, provided that the size of the nonterminal alphabet (i. e., the number of rules) is bounded. This result, which is due to an encoding of the smallest grammar problem as a problem on graphs with interval structure, raises two follow-up questions: (1) is the problem fixed-parameter tractable with respect to the number of rules, and (2) is it possible to efficiently compute the minimum number of rules that a smallest grammar needs? Both of these questions are answered in the negative, by showing W[1]-hardness and NP-hardness, respectively.

Finally, we investigate exact exponential-time algorithms, which have not yet been considered in the literature. We consider this a relevant topic, since grammars are particularly suitable for solving basic problems directly on the compressed representation without decompression, which motivates scenarios where a large running time is invested only once in order to obtain an optimal compression, which is then stored and worked with. While brute-force algorithms with running time \(\mathcal {O}^{*}(c^{|{w}|})\), for a constant c, are easy to find, we present a dynamic programming algorithm with running time \(\mathcal {O}^{*}(3^{{|w|}})\).

The exploitation of hierarchical structure is one of the main features of grammars (making them suitable tools for structural inference, and also allowing exponential compression rates) and is reflected in the number of levels of the corresponding derivation tree. Hence, from a (parameterised) complexity point of view, it is natural to measure the impact of this “hierarchical depth” of grammars with respect to the complexity of the smallest grammar problem. To this end, we investigate the above mentioned questions also for 1-level grammars, i. e., grammars in which only the start rule contains nonterminals and, surprisingly, our results suggest that computing general grammars is, if at all, only insignificantly more difficult than computing 1-level grammars. More precisely, the smallest grammar problem for 1-level grammars is NP-hard for alphabets of size 5 (also with respect to the rule size measure), W[1]-hard if parameterised by the number of rules, it can be solved in polynomial time if the number of rules is bounded by a constant and there is an \(\mathcal {O}^{*}(1.8392^{|{w}|})\) exact algorithm. Moreover, the exact exponential-time algorithm for the general case works incrementally in the sense that in the process of producing a smallest grammar, it also produces a smallest 1-level grammar, a smallest 2-level grammar and so on.

1.5 Outline of the Paper

In Section 2, we give basic definitions, define the smallest grammar problem, illustrate it with several examples and also explain in detail the connections between grammar-based compression and the related macro schemes by Storer and Szymanski [9]. The next section contains the hardness results mentioned above, where the 1-level and the multi-level cases are treated separately in Sections 3.1 and 3.2, respectively (in Section 3.3, we define and discuss possible extensions of the hardness reductions). The second main part of the paper is Section 4, where we show that the smallest grammar problem can be solved in polynomial time if the number of nonterminals is bounded (in Section 4.1, we discuss some related questions). In the last part, Section 5, we first present a (simple) exact exponential-time algorithm for the 1-level case and then, in Section 5.2, we define the dynamic programming algorithm for the multi-level case. Finally, in Section 6, we summarise our results, point out open problems and mention further research tasks.

2 Preliminaries

In this section, we first introduce some general mathematical definitions and terminology about strings, and some basic concepts from graph theory and complexity theory. Then we define grammars and the smallest grammar problem and illustrate it by several examples. We conclude this section by a discussion of Storer and Szymanski’s external pointer macro scheme already mentioned in Section 1.

Let \(\mathbb {N} = \{1, 2, 3, \ldots \}\) denote the natural numbers. By |A|, we denote the cardinality of a set A. Let Σ be a finite alphabet of symbols. A word or string (over Σ) is a sequence of symbols from Σ. For any word w over Σ, |w| denotes the length of w and ε denotes the empty word, i. e., |ε| = 0. The symbol Σ+ denotes the set of all non-empty words over Σ and \({\Sigma }^{*} = {\Sigma }^{+} \cup \{\varepsilon \}\). For the concatenation of two words w1, w2 we write \(w_{1} \cdot w_{2}\) or simply w1w2. For every symbol a ∈ Σ, we denote by |w|a the number of occurrences of symbol a in w. We say that a word \(v \in {\Sigma }^{*}\) is a factor of a word \(w \in {\Sigma }^{*}\) if there are \(u_{1}, u_{2} \in {\Sigma }^{*}\) such that w = u1vu2. If u1 = ε (or u2 = ε), then v is a prefix (or a suffix, respectively) of w. Furthermore, \(F(w) = \{u \mid u \text { is a factor of } w\}\) and \(F_{\geq 2}(w) = \{u \mid u \in F(w), |u| \geq 2\}\). For a position j, 1 ≤ j ≤ |w|, we refer to the symbol at position j of w by the expression w[j] and \(w[j..j^{\prime }] = w[j] w[j + 1] {\ldots } w[j^{\prime }]\), \(j \leq j^{\prime } \leq |{w}|\). By \(w^{R}\), we denote the reversal of w, i. e., \(w^{R} = w[n] w[n-1] \ldots w[1]\), where |w| = n.

A factorisation of a word w is a tuple (u1, u2, …, uk) with \(u_{i} \neq \varepsilon\), 1 ≤ i ≤ k, such that \(w = u_{1} u_{2} \ldots u_{k}\).
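The string notation above can be mirrored in a few lines of Python. This is a minimal illustrative sketch, not part of the paper: the helper names `factors`, `factors_min2`, `sub`, `reversal` and `is_factorisation` are hypothetical, and Python strings stand in for words over Σ, with 0-based slicing translated to the 1-indexed convention \(w[j..j^{\prime }]\).

```python
from typing import Set, Tuple


def factors(w: str) -> Set[str]:
    """F(w): all factors of w, including the empty word."""
    n = len(w)
    return {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def factors_min2(w: str) -> Set[str]:
    """F_{>=2}(w): the factors of w of length at least 2."""
    return {u for u in factors(w) if len(u) >= 2}


def sub(w: str, j: int, jp: int) -> str:
    """w[j..j'] using the 1-indexed, inclusive positions of the text."""
    return w[j - 1:jp]


def reversal(w: str) -> str:
    """w^R = w[n] w[n-1] ... w[1]."""
    return w[::-1]


def is_factorisation(parts: Tuple[str, ...], w: str) -> bool:
    """True iff every u_i is non-empty and u_1 u_2 ... u_k = w."""
    return all(parts) and "".join(parts) == w


print(sorted(factors_min2("abab")))  # ['ab', 'aba', 'abab', 'ba', 'bab']
```

Note that the set F≥2(w) is exactly the pool of candidate right sides for rules that can ever pay off, which is why it is singled out in the text.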

2.1 Basic Concepts of Graph Theory and Complexity Theory

We use undirected graphs, which are represented as pairs (V, E), where V is the set of vertices and E is the set of edges. For the sake of convenience, we write edges {u, v} ∈ E also as (u, v) or (v, u). For a vertex v ∈ V, N(v) = {u ∣ (v, u) ∈ E} is the (open) neighbourhood (of v), N[v] = N(v) ∪ {v} is the closed neighbourhood (of v) and, furthermore, we extend the notation of closed neighbourhood to sets \(C \subseteq V\) in the obvious way, i. e., \(N[C] = \bigcup _{v \in C}N[v]\). A graph is cubic (or subcubic) if, for every v ∈ V, |N(v)| = 3 (or |N(v)| ≤ 3, respectively).

A set \(C \subseteq V\) is

  • an independent set if, for every u, v ∈ C, (u, v) ∉ E,

  • a dominating set if N[C] = V,

  • an independent dominating set if it is both an independent and a dominating set,

  • a vertex cover if, for every (u, v) ∈ E, {u, v} ∩ C ≠ ∅.
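The four set properties can be checked directly from their definitions. The following sketch is an illustration under stated assumptions (graphs given as Python dictionaries of symmetric adjacency sets; all function names hypothetical). `min_size` is a naive exhaustive search meant only for tiny graphs and says nothing about how to solve the NP-complete problems in general.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, Set

Graph = Dict[int, Set[int]]  # undirected graph as symmetric adjacency sets


def closed_nbh(g: Graph, c: Iterable[int]) -> Set[int]:
    """N[C]: the union of N[v] = N(v) ∪ {v} over all v in C."""
    result: Set[int] = set()
    for v in c:
        result |= g[v] | {v}
    return result


def is_independent(g: Graph, c: Set[int]) -> bool:
    return all(v not in g[u] for u, v in combinations(c, 2))


def is_dominating(g: Graph, c: Set[int]) -> bool:
    return closed_nbh(g, c) == set(g)


def is_independent_dominating(g: Graph, c: Set[int]) -> bool:
    return is_independent(g, c) and is_dominating(g, c)


def is_vertex_cover(g: Graph, c: Set[int]) -> bool:
    # every edge (u, v) must have at least one endpoint in C
    return all(u in c or v in c for u in g for v in g[u])


def is_cubic(g: Graph) -> bool:
    return all(len(nbs) == 3 for nbs in g.values())


def min_size(g: Graph, pred: Callable[[Graph, Set[int]], bool]) -> int:
    """Brute force over all vertex subsets by increasing size (tiny graphs only)."""
    for k in range(len(g) + 1):
        for c in combinations(g, k):
            if pred(g, set(c)):
                return k
    raise ValueError("no feasible set")  # unreachable for the predicates above
```

For the path 1 – 2 – 3, the set {2} is both an independent dominating set and a vertex cover, whereas {1} is not dominating.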

We are concerned with the corresponding problems of deciding, for a given graph G and a \(k \in \mathbb {N}\), whether there is a vertex cover (or an independent dominating set) of cardinality at most k. It is a well-known fact that these decision problems are NP-complete (see [42]).

For \(k \in \mathbb {N}\), a graph G = (V, E), with |V| = n, is a k-interval graph if there are intervals Ii, j, 1 ≤ i ≤ |V|, 1 ≤ j ≤ k, on the real line such that G is isomorphic to \((\{v_{i} \mid 1 \leq i \leq |V|\}, \{(v_{i}, v_{i^{\prime }}) \mid \bigcup ^{k}_{j = 1} I_{i, j} \cap \bigcup ^{k}_{j = 1} I_{i^{\prime }, j} \neq \emptyset \})\). For 1-interval graphs (which are also just called interval graphs), it is possible to compute minimum independent dominating sets in linear time (see [43]; note that a perfect elimination ordering, which is part of the input of Farber’s algorithm, can easily be computed in our applications, because the intervals are explicitly known).
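To make the definition of k-interval graphs concrete, the sketch below builds the graph from an explicit list of intervals per vertex. It assumes closed intervals and omits self-loops (the case i = i′); it only constructs the graph and does not implement Farber’s algorithm [43]. The names `overlaps` and `k_interval_graph` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Interval = Tuple[float, float]  # assumed closed interval [left, right]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def k_interval_graph(intervals: Dict[int, List[Interval]]) -> Dict[int, Set[int]]:
    """Vertices i != i' are adjacent iff the union of the intervals of i meets that of i'.
    Two finite unions of intervals intersect iff some pair of member intervals does."""
    g: Dict[int, Set[int]] = {v: set() for v in intervals}
    for u, v in combinations(intervals, 2):
        if any(overlaps(a, b) for a in intervals[u] for b in intervals[v]):
            g[u].add(v)
            g[v].add(u)
    return g


# a 2-interval graph: vertex 1 owns two intervals, one of which meets vertex 3's interval
print(k_interval_graph({1: [(0, 1), (5, 6)], 2: [(2, 3)], 3: [(5.5, 7)]}))
```

Reducing "unions intersect" to a pairwise check is what keeps this construction polynomial in the total number of intervals.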

We assume the reader to be familiar with the basic concepts of complexity theory (for unexplained notions, see Papadimitriou [44]) and the theory of NP-completeness (see [44] and [42]).

As usual, for our running-time estimations, we mainly use the \(\mathcal {O}\)-notation, but sometimes also the \(\mathcal {O}^{*}\)-notation (ignoring polynomial factors). The latter is appropriate, if we are dealing with exponential-time algorithms (see Section 5).

Since we also wish to discuss some of our results from the parameterised complexity point of view, we shall briefly mention the concepts relevant for us (for detailed explanations on parameterised complexity, the reader is referred to the textbooks [45,46,47]). A parameterised problem is a decision problem with instances (x, k), where x is the actual input and \(k \in \mathbb {N}\) is the parameter. By XP, we denote the class of parameterised problems that are solvable in time \(\mathcal {O}(n^{f(k)})\) (where n is the size of the instance) and FPT denotes the class of fixed-parameter tractable problems, i. e., problems having an algorithm with running-time \(\mathcal {O}(g(k) \cdot f(n))\), for a computable function g and polynomial f.

In order to argue about fixed-parameter intractability, we need the following kind of reductions. A (classical) many-one reduction R from a parameterised problem to another is an fpt-reduction, if the parameter of the target problem is bounded in terms of the parameter of the source problem, i. e., there is a recursive function \(h\colon \mathbb {N} \rightarrow \mathbb {N}\) such that \(R(x, k) = (x^{\prime }, k^{\prime })\) implies \(k^{\prime } \leq h(k)\).

We shall use two different kinds of fixed-parameter intractability. First, if a parameterised problem is NP-hard even when the parameter is fixed to a constant, then it is not in FPT, unless P = NP. As a slightly weaker form of fixed-parameter intractability, the framework of parameterised complexity provides the classes of the so-called W-hierarchy, for which the hard problems (with respect to fpt-reductions) are considered fixed-parameter intractable, i. e., they are not in FPT (under some complexity theoretical assumptions). For a detailed definition of the W-hierarchy, we refer to the textbooks [45,46,47]; in this paper, we only use the first level of this hierarchy, i. e., the class W[1], and our respective intractability results are W[1]-hardness results.

A minimisation problemFootnote 7 P is a triple (I, S, m) with I being the set of instances, S being a function that maps instances \(x \in I\) to the set of feasible solutions for x, and m being the objective function that maps pairs (x, y) with \(x \in I\) and \(y \in S(x)\) to a positive rational number. For every \(x \in I\), we denote \(m^{*}(x):=\min \limits \{m(x,y)\colon y\in S(x)\}\). For two minimisation problems P1, P2 with Pj given by (Ij, Sj, mj), j ∈ {1, 2}, an L-reduction from P1 to P2 is a quadruple (f, g, β, γ) such that

  • f is a polynomial-time computable function from I1 to I2 that satisfies, for every \(x \in I_{1}\) with \(S_{1}(x) \neq \emptyset\), \(S_{2}(f(x)) \neq \emptyset\).

  • g is a polynomial-time computable function that, for every \(x \in I_{1}\) and \(y \in S_{2}(f(x))\), maps (x, y) to a solution in S1(x).

  • β is a constant such that \(m_{2}^{*}(f(x))\leq \beta \cdot m_{1}^{*}(x)\) for each \(x \in I_{1}\).

  • γ is a constant such that \(m_{1}(x,g(x,y))-m_{1}^{*}(x)\leq \gamma \cdot (m_{2}(f(x),y)-m_{2}^{*}(f(x)))\) for each \(x \in I_{1}\) and \(y \in S_{2}(f(x))\).

We shall use L-reductions in order to show hardness for APX, the class of optimisation problems for which there exists an approximation algorithm with a constant approximation ratio. Note that, unless P = NP, an APX-hard problem does not have a polynomial-time approximation scheme (see [48] for detailed information of approximation hardness).

2.2 Grammars

A context-free grammar is a tuple G = (N, Σ, R, S), where N is the set of nonterminals, Σ is the terminal alphabet, S ∈ N is the start symbol and \(R \subseteq N \times (N \cup {\Sigma })^{+}\) is the set of rules (as a convention, we write rules (A, w) ∈ R also in the form A → w). A context-free grammar G = (N, Σ, R, S) is a singleton grammar if R is a total function N → (N ∪ Σ)+ and the relation {(A, B) ∣ (A, w) ∈ R, |w|B ≥ 1} is acyclic.

For a singleton grammar G = (N, Σ, R, S), let \(\mathsf {D}_{G}\colon (N \cup {\Sigma }) \to (N \cup {\Sigma })^{+}\) be defined by \(\mathsf {D}_{G}(A) = R(A)\), A ∈ N, and \(\mathsf {D}_{G}(a) = a\), a ∈ Σ. We extend \(\mathsf {D}_{G}\) to a morphism (N ∪ Σ)+ → (N ∪ Σ)+ by setting \(\mathsf {D}_{G}(\alpha _{1} \alpha _{2} \ldots \alpha _{n}) = \mathsf {D}_{G}(\alpha _{1}) \mathsf {D}_{G}(\alpha _{2}) \ldots \mathsf {D}_{G}(\alpha _{n})\), for \(\alpha _{i} \in (N \cup {\Sigma })\), 1 ≤ i ≤ n. Furthermore, for every α ∈ (N ∪ Σ)+, we set \({\mathsf {D}^{1}_{G}}(\alpha ) = \mathsf {D}_{G}(\alpha )\), \({\mathsf {D}^{k}_{G}}(\alpha ) = \mathsf {D}_{G}(\mathsf {D}^{k-1}_{G}(\alpha ))\), for every k ≥ 2, and \(\mathfrak {D}_{G}(\alpha ) = \lim _{k \to \infty } {\mathsf {D}^{k}_{G}}(\alpha )\) is the derivative of α. By definition, \(\mathfrak {D}_{G}(\alpha )\) exists for every α ∈ (N ∪ Σ)+ and is an element from Σ+. The size of the singleton grammar G is defined by \(|G| = {\sum }_{A \in N} |\mathsf {D}_{G}(A)|\) and the rule-size of G is defined by |G|r = |G| + |N| or, equivalently, \(|G|_{\mathsf {r}} = {\sum }_{A \in N} (|\mathsf {D}_{G}(A)| + 1)\). Our main size measure will be |⋅|. The rule-size |⋅|r will play a role in Section 3.3 and will be discussed in more detail there.

Remark 1

The class of singleton grammars exactly coincides with the class of context-free grammars that do not have unreachable rules (i. e., rules that cannot occur in any derivation) and that can derive exactly one word. As mentioned before, such grammars are also called straight-line programs in the literature. A context-free grammar that can derive only a single word and is not a singleton grammar must contain some rules that are not reachable. Since unreachable rules can easily be discovered and removed, we directly add this restriction to the concept of singleton grammars.

The derivation tree of G is a ranked ordered tree with node-labels from Σ ∪ N, inductively defined as follows. The root is labelled by S and every node labelled by A ∈ N with \(\mathsf {D}(A) = \alpha _{1} \alpha _{2} \ldots \alpha _{n}\) has n children labelled by α1, α2, …, αn in exactly this order; note that this means that all leaves are labelled by symbols from Σ.

From now on, we simply use the term grammar instead of singleton grammar and if the grammar under consideration is clear from the context, we also drop the subscript G. We set \(\mathfrak {D}(G) = \mathfrak {D}(S)\) and say that G is a grammar for \(\mathfrak {D}(G)\). Since for singleton grammars the start symbol is somewhat superfluous, we will ignore it and denote grammars G = (N, Σ, R, S) in the form G = (N, Σ, R, ax) instead, where ax = R(S) is called the axiom (of G). In particular, we interpret derivations to start directly with the axiom and, correspondingly, we also sometimes ignore the root of derivation trees. However, this does not change the size measures |⋅| and |⋅|r, which, when ignoring the start symbol, can also be defined as \(|G| = ({\sum }_{A \in N} |\mathsf {D}_{G}(A)|) + |\mathsf {ax}|\) and \(|G|_{\mathsf {r}} = ({\sum }_{A \in N} (|\mathsf {D}_{G}(A)| + 1)) + |\mathsf {ax}| + 1\).
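A small sketch of the derivative \(\mathfrak {D}\) and of the two size measures in the axiom form, assuming a grammar is encoded as a Python dictionary that maps each nonterminal to its right side (a list of symbols), with every symbol that is not a key treated as a terminal. The encoding and the function names are illustrative assumptions; acyclicity is assumed, not checked.

```python
from typing import Dict, List

Rules = Dict[str, List[str]]  # nonterminal -> right side; non-key symbols are terminals


def derivative(alpha: List[str], rules: Rules) -> str:
    """Derivative of alpha: expand nonterminals until only terminals remain.
    Assumes the rule relation is acyclic (a singleton grammar)."""
    memo: Dict[str, str] = {}

    def expand(x: str) -> str:
        if x not in rules:  # terminal: D_G(a) = a
            return x
        if x not in memo:  # each nonterminal is expanded only once
            memo[x] = "".join(expand(y) for y in rules[x])
        return memo[x]

    return "".join(expand(x) for x in alpha)


def size(rules: Rules, ax: List[str]) -> int:
    """|G| = (sum over A in N of |D_G(A)|) + |ax|."""
    return sum(len(rhs) for rhs in rules.values()) + len(ax)


def rule_size(rules: Rules, ax: List[str]) -> int:
    """|G|_r = (sum over A in N of (|D_G(A)| + 1)) + |ax| + 1."""
    return sum(len(rhs) + 1 for rhs in rules.values()) + len(ax) + 1


rules = {"A": ["B", "B"], "B": ["a", "b"]}
print(derivative(["A", "b"], rules), size(rules, ["A", "b"]), rule_size(rules, ["A", "b"]))  # ababb 6 9
```

The memoisation mirrors why grammars are convenient for compressed-string algorithms: each nonterminal is processed once, regardless of how often it occurs.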

The number of levels of a grammar G = (N,Σ, R, ax) is \(\min \limits \{k \mid {\mathsf {D}^{k}_{G}}(\mathsf {ax}) = \mathfrak {D}_{G}(\mathsf {ax})\}\), and a grammar with d levels is a d-level grammar. Intuitively speaking, a grammar G is a d-level grammar if we need exactly d derivation steps in order to derive \(\mathfrak {D}(G)\) from the axiom; thus, the number of levels measures what we called in the introduction the “hierarchical depth” of a grammar. Note that for a d-level grammar, the derivation tree has a maximum depth of d + 1 and d + 2 levels (when counting the root as well). With this definition, the grammars that are the most restricted with respect to their hierarchical depth and that are still reasonable, are 1-level grammars (i. e., an axiom that derives a word in one step).
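The number of levels can be computed by applying the morphism \(\mathsf {D}_{G}\) to the axiom until no nonterminal remains. A minimal sketch, using the same kind of hypothetical dictionary encoding (nonterminals are the keys, everything else is a terminal) and assuming an acyclic grammar, since a cyclic one would make the loop run forever:

```python
from typing import Dict, List

Rules = Dict[str, List[str]]


def step(alpha: List[str], rules: Rules) -> List[str]:
    """One application of the morphism D_G: every nonterminal is replaced by its right side."""
    out: List[str] = []
    for x in alpha:
        out.extend(rules.get(x, [x]))
    return out


def levels(rules: Rules, ax: List[str]) -> int:
    """min{k | D^k_G(ax) equals the derivative}, i.e. the first k with no nonterminal left.
    Assumes an acyclic grammar."""
    k, current = 1, step(ax, rules)
    while any(x in rules for x in current):
        current = step(current, rules)
        k += 1
    return k


print(levels({"A": list("BaB"), "B": list("baab")}, list("AbaABb")))  # 2
```

The grammar in the usage line is the one of Example 1, which the text identifies as a 2-level grammar.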

Let G = (N, Σ, R, ax) be a 1-level grammar. The profit of a rule (A, α) ∈ R is defined by p(A) = |ax|A(|α| − 1) − |α|. Intuitively speaking, if all occurrences of A in ax are replaced by α and the rule A → α is deleted, then the size of the grammar increases by exactly p(A). Consequently, \(|G| = |\mathfrak {D}(G)| - {\sum }_{A \in N} \mathsf {p}(A)\).
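The profit and the identity \(|G| = |\mathfrak {D}(G)| - {\sum }_{A \in N} \mathsf {p}(A)\) are easy to check mechanically. The sketch below is an illustration only, with hypothetical names and a dictionary encoding in which every right side consists of terminals (the 1-level case); `inline` performs exactly the replace-and-delete operation described in the text, so the size change it causes can be compared with p(A).

```python
from typing import Dict, List, Tuple

Rules = Dict[str, List[str]]  # 1-level grammar: right sides contain terminals only


def profit(rules: Rules, ax: List[str], a: str) -> int:
    """p(A) = |ax|_A (|alpha| - 1) - |alpha| for the rule A -> alpha."""
    alpha = rules[a]
    return ax.count(a) * (len(alpha) - 1) - len(alpha)


def size(rules: Rules, ax: List[str]) -> int:
    return sum(len(rhs) for rhs in rules.values()) + len(ax)


def derive_1level(rules: Rules, ax: List[str]) -> str:
    # one derivation step suffices, since right sides contain no nonterminals
    return "".join("".join(rules.get(x, [x])) for x in ax)


def inline(rules: Rules, ax: List[str], a: str) -> Tuple[Rules, List[str]]:
    """Replace every occurrence of a in ax by its right side and delete the rule for a."""
    new_ax = [y for x in ax for y in (rules[a] if x == a else [x])]
    return {b: rhs for b, rhs in rules.items() if b != a}, new_ax


rules = {"A": ["a", "b"], "B": ["c", "c", "c"]}
ax = ["A", "A", "B", "c", "A"]
print(profit(rules, ax, "A"), profit(rules, ax, "B"))  # 1 -1
```

A negative profit, as for B here, signals a rule that does not pay off: inlining B shrinks the grammar by one symbol.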

Example 1

The grammar G = (N, Σ, R, ax) with N = {A, B}, \(\Sigma = \{{\mathtt {a}}, {\mathtt {b}}\}\), \(\mathsf {ax} = A {\mathtt {b}} {\mathtt {a}} A B {\mathtt {b}}\) and

$$ R = \{A \to B {\mathtt{a}} B, B \to {\mathtt{b}} {\mathtt{a}} {\mathtt{a}} {\mathtt{b}}\} $$

is a 2-level grammar of size 13 (and rule-size 16) with axiom \(A {\mathtt {b}} {\mathtt {a}} A B {\mathtt {b}}\). Furthermore, \(\mathfrak {D}(B) = {\mathtt {b}} {\mathtt {a}} {\mathtt {a}} {\mathtt {b}}\), \(\mathfrak {D}(A) = \mathfrak {D}(B) {\mathtt {a}} \mathfrak {D}(B) = {\mathtt {b}} {\mathtt {a}} {\mathtt {a}} {\mathtt {b}} {\mathtt {a}} {\mathtt {b}} {\mathtt {a}} {\mathtt {a}} {\mathtt {b}}\) and

$$ \mathfrak{D}(G) = \mathfrak{D}(S) = \underbrace{{\mathtt{b}} {\mathtt{a}} {\mathtt{a}} {\mathtt{b}} {\mathtt{a}} {\mathtt{b}} {\mathtt{a}} {\mathtt{a}} {\mathtt{b}}}_{\mathfrak{D}(A)} {\mathtt{b}} {\mathtt{a}} \underbrace{{\mathtt{b}} {\mathtt{a}} {\mathtt{a}} {\mathtt{b}} {\mathtt{a}} {\mathtt{b}} {\mathtt{a}} {\mathtt{a}} {\mathtt{b}}}_{\mathfrak{D}(A)} \underbrace{{\mathtt{b}} {\mathtt{a}} {\mathtt{a}} {\mathtt{b}}}_{\mathfrak{D}(B)} {\mathtt{b}} . $$

Consequently, G is a size 13 representation of a word of length 25. A derivation tree of G can be seen in Fig. 1.

Fig. 1 Derivation tree for the grammar G from Example 1 (for the sake of convenience, neighbouring leaves are merged)

Replacing the axiom by \(R(A) {\mathtt {b}} {\mathtt {a}} R(A) B {\mathtt {b}} = B {\mathtt {a}} B {\mathtt {b}} {\mathtt {a}} B {\mathtt {a}} B B {\mathtt {b}}\) and deleting the rule \(A \to B {\mathtt {a}} B\) turns G into a 1-level grammar \(G^{\prime }\) with \(\mathfrak {D}(G^{\prime }) = \mathfrak {D}(G)\). Moreover, p(B) = |ax|B(|R(B)| − 1) − |R(B)| = 5(4 − 1) − 4 = 11 and \(|G^{\prime }| = |\mathfrak {D}(G^{\prime })| - \mathsf {p}(B) = 25 - 11 = 14\).
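The numbers in Example 1 can be re-derived mechanically. The following self-contained sketch uses a hypothetical dictionary encoding with single-character symbols (uppercase letters for nonterminals) and recomputes the derivative, the size and rule-size of G, and the profit and size of the 1-level grammar \(G^{\prime }\).

```python
from typing import Dict, List

Rules = Dict[str, List[str]]  # hypothetical encoding: nonterminal -> right side


def derivative(alpha: List[str], rules: Rules) -> str:
    return "".join(derivative(rules[x], rules) if x in rules else x for x in alpha)


def size(rules: Rules, ax: List[str]) -> int:
    return sum(len(rhs) for rhs in rules.values()) + len(ax)


def rule_size(rules: Rules, ax: List[str]) -> int:
    return size(rules, ax) + len(rules) + 1  # one extra unit per rule and for the axiom


def profit(rules: Rules, ax: List[str], a: str) -> int:
    return ax.count(a) * (len(rules[a]) - 1) - len(rules[a])


# Example 1: A -> BaB, B -> baab, axiom AbaABb
rules = {"A": list("BaB"), "B": list("baab")}
ax = list("AbaABb")

# G': replace A by its right side in the axiom and delete the rule for A
ax1 = [y for x in ax for y in (rules["A"] if x == "A" else [x])]
rules1 = {"B": rules["B"]}

print(len(derivative(ax, rules)), size(rules, ax), rule_size(rules, ax))  # 25 13 16
print("".join(ax1), profit(rules1, ax1, "B"), size(rules1, ax1))  # BaBbaBaBBb 11 14
```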

A smallest grammar for a word w is any grammar G with \(\mathfrak {D}(G) = w\) and \(|G| \leq |G^{\prime }|\) for every grammar \(G^{\prime }\) with \(\mathfrak {D}(G^{\prime }) = w\); generally, a grammar G is smallest if it is a smallest grammar for \(\mathfrak {D}(G)\) (grammars that are smallest with respect to the rule-size measure will be called r-smallest grammars). The decision problem variant of computing smallest grammars is defined as follows:

  • Smallest Grammar Problem (SGP)

  • Instance: A word w and a \(k \in \mathbb {N}\).

  • Question: Does there exist a grammar G with \(\mathfrak {D}(G) = w\) and |G|≤ k?

The Smallest 1-Level Grammar Problem (1-SGP) is defined analogously, with the only difference that we ask for a 1-level grammar of size at most k. By SGPr and 1-SGPr, we denote the problem variants where we consider the rule-size instead of the size, i. e., we require \(|G|_{\mathsf {r}} \leq k\).
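For very short words, 1-SGP can be decided by exhaustive search. The sketch below is a naive illustration, not the exact algorithm of Section 5, and all names are hypothetical. It relies on three observations that follow from the definitions: every right side that is actually used in a 1-level grammar for w is a factor of w; a rule with a right side of length 1 has profit −1 and never helps; and unused or duplicate rules only add to the size. Hence it suffices to range over all subsets of F≥2(w) and, for each, compute a shortest axiom by dynamic programming over prefixes of w. The running time is exponential in |F≥2(w)|, so only tiny inputs are feasible.

```python
from itertools import combinations
from typing import FrozenSet, Tuple


def shortest_axiom(w: str, rhs: FrozenSet[str]) -> int:
    """Length of a shortest factorisation of w into single terminals and words from rhs."""
    n = len(w)
    best = [0] * (n + 1)  # best[i]: shortest axiom deriving the prefix w[:i]
    for i in range(1, n + 1):
        best[i] = best[i - 1] + 1  # last axiom symbol is the terminal w[i - 1]
        for u in rhs:
            if len(u) <= i and w[i - len(u):i] == u:
                best[i] = min(best[i], best[i - len(u)] + 1)  # last symbol derives u
    return best[n]


def smallest_1level(w: str) -> Tuple[int, FrozenSet[str]]:
    """Exhaustive search over all sets of right sides taken from F_{>=2}(w)."""
    candidates = sorted({w[i:j] for i in range(len(w)) for j in range(i + 2, len(w) + 1)})
    best_size, best_rhs = len(w), frozenset()  # no rules at all: the axiom is w itself
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            rhs = frozenset(subset)
            total = sum(len(u) for u in rhs) + shortest_axiom(w, rhs)
            if total < best_size:
                best_size, best_rhs = total, rhs
    return best_size, best_rhs


def one_sgp(w: str, k: int) -> bool:
    """Decision version: is there a 1-level grammar G for w with |G| <= k?"""
    return smallest_1level(w)[0] <= k


print(smallest_1level("abababab")[0])  # 6
```

For abababab, both {ab} (axiom of length 4) and {abab} (axiom of length 2) reach size 6; the search returns whichever it finds first.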

The optimisation variant of SGP, i. e., the task of actually producing a smallest grammar for a given word w, shall be denoted by SGPopt (and SGPr, opt if we are concerned with the rule-size). More precisely, according to the definitions given in Section 2.1, SGPopt = (I, S, m), where \(I = {\Sigma }^{*}\), \(S(w) = \{G \mid \mathfrak {D}(G) = w\}\) and m(w, G) = |G| (or m(w, G) = |G|r for SGPr, opt).

2.3 Examples

While the following examples illustrate the smallest grammar problem in general, they are particularly tailored to the technicalities to be encountered in Section 3, i. e., they shall point out the difficulties arising in predicting how factors in a larger word are compressed by a smallest grammar, which is crucial in the design of gadgets for a hardness reduction.

Let \(w = {\prod }^{n}_{i = 1} 1 0^{i}\) be a word over the binary alphabet Σ = {0, 1}, where \(n = 2^{k}\), \(k \in \mathbb {N}\). This word has a very simple structure and can be interpreted as a list of a (potentially unbounded) number of integers. This is crucial, since if we want to encode objects (e. g., graphs) whose size is not bounded in terms of the alphabet size, then structures of this form will inevitably appear.

One way of compressing w that comes to mind is by the use of rules A1 → 10, \(A_{i} \to A_{i-1} 0\), 2 ≤ i ≤ n − 1, and an axiom \(A_{1} A_{2} \ldots A_{n-1} A_{n-1} 0\), which leads to the grammar G1 = (N, Σ, R, ax), with:

$$ \begin{array}{@{}rcl@{}} N &=&\{A_{i} \mid 1 \leq i \leq n-1\} ,\\ R &=&\{A_{1} \to 1 0\} \cup \{A_{i} \to A_{i - 1}0 \mid 2 \leq i \leq n-1\} ,\\ \mathsf{ax} &=& A_{1} A_{2} {\ldots} A_{n - 1} A_{n - 1} 0 . \end{array} $$

This grammar has an overall size given by \(|G_{1}| = \underbrace {n + 1}_{\mathsf {ax}} + \underbrace {2(n - 1)}_{\text {rules}} = 3n - 1\).
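The construction of G1 and its size can be checked for small n. The sketch below is illustrative only: nonterminals are encoded as hypothetical strings such as "A3", and the helpers `word`, `derivative`, `size` and `grammar_g1` are not from the paper.

```python
from typing import Dict, List, Tuple

Rules = Dict[str, List[str]]


def word(n: int) -> str:
    """w = 1 0^1 1 0^2 ... 1 0^n."""
    return "".join("1" + "0" * i for i in range(1, n + 1))


def derivative(alpha: List[str], rules: Rules) -> str:
    return "".join(derivative(rules[x], rules) if x in rules else x for x in alpha)


def size(rules: Rules, ax: List[str]) -> int:
    return sum(len(rhs) for rhs in rules.values()) + len(ax)


def grammar_g1(n: int) -> Tuple[Rules, List[str]]:
    """G1: A1 -> 10, Ai -> A(i-1) 0 for 2 <= i <= n-1, axiom A1 A2 ... A(n-1) A(n-1) 0."""
    rules: Rules = {"A1": ["1", "0"]}
    for i in range(2, n):
        rules[f"A{i}"] = [f"A{i - 1}", "0"]
    ax = [f"A{i}" for i in range(1, n)] + [f"A{n - 1}", "0"]
    return rules, ax


rules, ax = grammar_g1(64)
print(derivative(ax, rules) == word(64), size(rules, ax))  # True 191
```

Here 191 = 3 · 64 − 1, in line with the formula for |G1|.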

However, it is also possible to construct the factors \(0^{i}\), 1 ≤ i ≤ n, “from the middle” by rules A1 → 010, \(A_{i} \to 0 A_{i-1} 0\), \(2 \leq i \leq \frac {n}{2} - 1\), and an axiom \(1 (A_{1})^{2} (A_{2})^{2} \ldots\). By using these ideas, we can construct the smaller grammar G2 = (N, Σ, R, ax), where

$$ \begin{array}{@{}rcl@{}} N &= &\{A_{i} \mid 1 \leq i \leq \tfrac{n}{2} - 1\} \cup \{B_{i} \mid 1 \leq i \leq k - 2\} ,\\ R &= &\{A_{1} \to 0 1 0, B_{1} \to 0 0\} \cup \{A_{i} \to 0 A_{i - 1}0 \mid 2 \leq i \leq \tfrac{n}{2} - 1\} \cup \\ &&\{B_{i} \to B_{i - 1} B_{i - 1} \mid 2 \leq i \leq k - 2\} ,\\ \mathsf{ax} &= &1 (A_{1})^{2} (A_{2})^{2} {\ldots} (A_{\frac{n}{2} - 1})^{2} 0 A_{\frac{n}{2} - 1}0 B_{k-2} B_{k-2} . \end{array} $$

We have \(|G_{2}| = \underbrace {n + 4}_{\mathsf {ax}} + \underbrace {3(\tfrac {n}{2} - 1) + 2(k-2)}_{\text {rules}} = \frac {5n}{2} + 2k - 3\).
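To see that the axiom of G2 really reassembles w, note that each pair \((A_{i})^{2}\) contributes the factors \(0^{2i-1}\) and \(0^{2i}\) and leaves \(0^{i}\) open for the next pair. The sketch below builds G2 for \(n = 2^{k}\) (it needs k ≥ 3 so that \(B_{k-2}\) exists) and checks derivation and size; the string encoding of nonterminals and all helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Rules = Dict[str, List[str]]


def word(n: int) -> str:
    """w = 1 0^1 1 0^2 ... 1 0^n."""
    return "".join("1" + "0" * i for i in range(1, n + 1))


def derivative(alpha: List[str], rules: Rules) -> str:
    return "".join(derivative(rules[x], rules) if x in rules else x for x in alpha)


def size(rules: Rules, ax: List[str]) -> int:
    return sum(len(rhs) for rhs in rules.values()) + len(ax)


def grammar_g2(k: int) -> Tuple[Rules, List[str]]:
    """G2 for n = 2^k (requires k >= 3 so that B_{k-2} exists)."""
    n = 2 ** k
    m = n // 2 - 1
    rules: Rules = {"A1": ["0", "1", "0"], "B1": ["0", "0"]}
    for i in range(2, m + 1):
        rules[f"A{i}"] = ["0", f"A{i - 1}", "0"]  # Ai derives 0^i 1 0^i
    for i in range(2, k - 1):
        rules[f"B{i}"] = [f"B{i - 1}", f"B{i - 1}"]  # Bi derives 0^(2^i)
    ax = ["1"]
    for i in range(1, m + 1):
        ax += [f"A{i}", f"A{i}"]
    ax += ["0", f"A{m}", "0", f"B{k - 2}", f"B{k - 2}"]
    return rules, ax


rules, ax = grammar_g2(6)
print(derivative(ax, rules) == word(64), size(rules, ax))  # True 169
```

For n = 64 (k = 6), this gives 169 = 5 · 64/2 + 2 · 6 − 3, compared with 191 for G1.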

Both of these grammars achieve an asymptotic compression rate of order \(\mathcal {O}(\sqrt {|{w}|})\), but, generally, grammars are capable of exponential compression rates (see [33, 34]). Aiming for such exponential compression, it seems worthwhile to represent every unary factor \(0^{2^{\ell }}\), \(1 \leq \ell \leq k\), by a nonterminal \(B_{\ell }\) (obviously, this requires only k rules of size 2) and then represent all unary factors by sums of these powers (e. g., \(0^{74}\) is compressed by \(B_{1} B_{3} B_{6}\), since 74 = 2 + 8 + 64). Formally, consider G3 = (N, Σ, R, ax), where

$$ \begin{array}{@{}rcl@{}} N &= &\{B_{i} \mid 1 \leq i \leq k - 1\} ,\\ R &= &\{B_{1} \to 0 0\} \cup \{B_{i} \to B_{i - 1} B_{i - 1} \mid 2 \leq i \leq k - 1\} ,\\ \mathsf{ax} &= &\left( \prod\limits_{i = 1}^{n-1} 1 \alpha_{i}\right) 1 (B_{k-1})^{2} , \end{array} $$

where αi = x0x1xk− 1 and, for every j, 1 ≤ jk − 1, xj = Bj if the jth bit (i. e., the one representing 2j) of the binary representation of i is 1 and xj = ε otherwise. However, this yields a grammar of size

$$ |G_{3}| = \underbrace{\tfrac{1}{2}(n-1)k}_{\mathsf{ax}} + \underbrace{2(k-1)}_{\text{rules}} = \frac{k(n + 3)}{2} - 2 , $$

which, if k is sufficiently large, is worse than the previous grammars.
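The binary decomposition behind G3 can be sketched in Python as follows (hypothetical names, n = 2^k assumed). Two details are our own choices, and both are needed for the derivation to produce w exactly. Bit 0 of i is realised by the terminal 0, since there is no nonterminal \(B_{0}\). The axiom is closed by \(1(B_{k-1})^{2}\), so that the last block is \(1 0^{n}\). The check confirms the derivation and the example \(0^{74} \mapsto B_{1}B_{3}B_{6}\).

```python
def toy_word(n):
    return "".join("1" + "0" * i for i in range(1, n + 1))


def derive(rules, axiom):
    def expand(sym):
        return sym if sym not in rules else "".join(expand(s) for s in rules[sym])
    return "".join(expand(s) for s in axiom)


def alpha(i, k):
    """Encode 0^i (1 <= i < 2**k) as a sum of powers of two."""
    parts = ["0"] if i & 1 else []       # bit 0: the terminal 0 (there is no B_0)
    for j in range(1, k):
        if (i >> j) & 1:                 # bit j set: B_j derives 0^(2^j)
            parts.append(f"B{j}")
    return parts


def grammar_g3(k):
    """G3 for n = 2**k (k >= 2)."""
    n = 2 ** k
    rules = {"B1": ["0", "0"]}
    for i in range(2, k):
        rules[f"B{i}"] = [f"B{i - 1}", f"B{i - 1}"]
    axiom = []
    for i in range(1, n):
        axiom += ["1"] + alpha(i, k)
    axiom += ["1", f"B{k - 1}", f"B{k - 1}"]   # final block 1 0^n
    return rules, axiom


print(alpha(74, 7))  # ['B1', 'B3', 'B6']
```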

A grammar that is even smaller than G2 can be obtained by combining the idea of G2 with that of representing factors \(0^{2^{\ell }}\) by nonterminals \(B_{\ell}\). More precisely, for every ℓ, \(1 \leq \ell \leq k - 1\), we represent \(0^{2^{\ell }}\) by an individual nonterminal \(B_{\ell}\) and, in addition, we use rules \(A_{1} \to 010\), \(A_{i} \to 0A_{i-1}0\), \(2 \leq i \leq \frac {n}{4}\). Then the left and right halves of w can be compressed in the way of G2, with the only difference that in the right half, every unary factor additionally requires an occurrence of \(B_{k-1}\), i. e., consider G4 = (N,Σ, R, ax) with:

$$ \begin{array}{@{}rcl@{}} N &= &\{A_{i} \mid 1 \leq i \leq \tfrac{n}{4}\} \cup \{B_{i} \mid 1 \leq i \leq k-1\} ,\\ R &= &\{A_{1} \to 0 1 0, B_{1} \to 0 0\} \cup \{A_{i} \to 0 A_{i - 1} 0 \mid 2 \leq i \leq \tfrac{n}{4}\} \cup \\ &&\{B_{i} \to B_{i - 1} B_{i - 1} \mid 2 \leq i \leq k-1\} ,\\ \mathsf{ax} &= &1 (A_{1})^{2} (A_{2})^{2} {\ldots} (A_{\frac{n}{4}})^{2} B_{k-2} \\ &&(A_{1} B_{k-1})^{2} (A_{2} B_{k-1})^{2} {\ldots} (A_{\frac{n}{4} - 1} B_{k-1})^{2} A_{\frac{n}{4}} B_{k-1} B_{k-2} . \end{array} $$

This grammar yields a size of \(|G_{4}| = \underbrace {\tfrac {3n}{2} + 1}_{\mathsf {ax}} + \underbrace {\tfrac {3n}{4} + 2(k-1)}_{\text {rules}} = \frac {9n}{4} + 2k - 1\). Note that again the asymptotic compression rate is of order \(\mathcal {O}(\sqrt {|{w}|})\).
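G4 can be verified in the same way. This Python sketch (hypothetical names; n = 2^k with k ≥ 3 assumed, so that \(A_{n/4}\) and \(B_{k-2}\) exist) builds N, R and ax as displayed above and checks both the derivation and the size \(\frac{9n}{4} + 2k - 1\).

```python
def toy_word(n):
    return "".join("1" + "0" * i for i in range(1, n + 1))


def derive(rules, axiom):
    def expand(sym):
        return sym if sym not in rules else "".join(expand(s) for s in rules[sym])
    return "".join(expand(s) for s in axiom)


def grammar_size(rules, axiom):
    return len(axiom) + sum(len(rhs) for rhs in rules.values())


def grammar_g4(k):
    """G4 for n = 2**k (k >= 3); A_i derives 0^i 1 0^i, B_i derives 0^(2^i)."""
    n = 2 ** k
    q = n // 4
    rules = {"A1": ["0", "1", "0"], "B1": ["0", "0"]}
    for i in range(2, q + 1):
        rules[f"A{i}"] = ["0", f"A{i - 1}", "0"]
    for i in range(2, k):                      # B_2, ..., B_{k-1}
        rules[f"B{i}"] = [f"B{i - 1}", f"B{i - 1}"]
    axiom = ["1"]
    for i in range(1, q + 1):                  # 1 (A_1)^2 ... (A_{n/4})^2
        axiom += [f"A{i}", f"A{i}"]
    axiom.append(f"B{k - 2}")
    for i in range(1, q):                      # (A_i B_{k-1})^2 for i < n/4
        axiom += [f"A{i}", f"B{k - 1}"] * 2
    axiom += [f"A{q}", f"B{k - 1}", f"B{k - 2}"]
    return rules, axiom


rules, axiom = grammar_g4(5)   # n = 32
print(derive(rules, axiom) == toy_word(32), grammar_size(rules, axiom))  # True 81
```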

These considerations show that even for simply structured words like w, it is very difficult to determine the structure of a smallest grammar or its size. However, when reducing an NP-hard problem to the smallest grammar problem, we need to know, at least to some extent, how smallest grammars compress the constructed strings in order to relate the reduced instances to the original instances. The above examples therefore illustrate the challenges that arise in this regard.

We conclude this list of examples by pointing out that determining a smallest grammar for our toy example \(w = {\prod }^{n}_{i = 1} 1 0^{i}\) as a function of n is essentially an open problem. An asymptotic lower bound of \({\Omega }(\sqrt {|w|})\) on its size is a reasonable conjecture, but we have no proof of this claim.

2.4 Storer and Szymanski’s External Pointer Macro Scheme and Grammar-Based Compression

Storer and Szymanski [9] introduce a very general form of a compression scheme that covers a large variety of different compression strategies, in particular also grammar-based compression. We cite their work as the first that, in a sense, considered grammar-based compression, but in the context of our paper it is important for a further reason. The technical report [10]Footnote 8 provides a comprehensive complexity analysis of many different variants of Storer and Szymanski’s compression scheme, with many NP-hardness reductions. Some of the considered variants also concern the case of fixed alphabets, which has led to the misconception that [10] already establishes the hardness of the smallest grammar problem for fixed alphabets and, hence, that grammar-based compression is known to be intractable also in practical scenarios (i. e., for fixed alphabets). Since closing this gap by providing the assumed hardness result is one of the main objectives of this paper, we discuss in some more detail why it cannot already be found among the many hardness results of [10].

First, we recall the definitions of Storer and Szymanski [9] that are relevant here. For a word w ∈Σ+ and a pointer size \(p \in \mathbb {N}\), a compressed form of w for pointer size p using the external pointer macro, EPM for short, is any word s0#s1 with \(s_{0}, s_{1} \in ({\Sigma } \cup \{1, 2, \ldots , |s_{0}|\}^{2})^{+}\), #∉Σ, and w can be obtained from s0#s1 by repeating the following two steps:

  • Replace every symbol (i, j) in s1 by s0[i..j],

  • repeat the first step until s1 equals w.

The size of an EPM s0#s1 is defined by \({\sum }_{i = 1}^{|s_{0} s_{1}|} \ell _{i}\), where \(\ell _{i} = 1\) if \(s_{0}s_{1}[i] \in {\Sigma }\) and \(\ell _{i} = p\) otherwise (i. e., each occurrence of a symbol from \(\{1, 2, \ldots , |s_{0}|\}^{2}\), which are the actual pointers, contributes the pointer size p to the overall size of the EPM).
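A minimal Python sketch of this decompression procedure and of the size measure follows. The representation is our own assumption, not taken from [9]: s0 and s1 are lists whose entries are terminal characters or pointer pairs (i, j), indexed from 1 as in the text. The sample EPM is hypothetical, and a round limit guards against pointers that never resolve.

```python
def expand_epm(s0, s1, max_rounds=1000):
    """Repeatedly replace every pointer (i, j) in s1 by s0[i..j] (1-indexed, inclusive)."""
    current = list(s1)
    for _ in range(max_rounds):
        if all(isinstance(c, str) for c in current):
            return "".join(current)
        nxt = []
        for c in current:
            if isinstance(c, tuple):
                i, j = c
                nxt.extend(s0[i - 1:j])
            else:
                nxt.append(c)
        current = nxt
    raise ValueError("pointers do not resolve (cyclic EPM?)")


def epm_size(s0, s1, p):
    """Terminal symbols cost 1, pointers cost the pointer size p; '#' is not counted."""
    return sum(p if isinstance(c, tuple) else 1 for c in list(s0) + list(s1))


s0 = ["a", "b", (1, 2)]
s1 = [(1, 3), (3, 3), "c"]
print(expand_epm(s0, s1), epm_size(s0, s1, 1), epm_size(s0, s1, 3))  # abababc 6 12
```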

A grammar for a word w easily translates into an EPM for w. For example, the grammar G = (N,Σ, R, ax) with N = {A, B}, Σ = {a, b, c}, R = {A → BcB, B → ba} and ax = AabBBAc translates into the external pointer macro ba(1,2)c(1,2)#(3,5)ab(1,2)(1,2)(3,5)c. More precisely, the prefix ba is the right side of the rule for B, and (1,2)c(1,2) corresponds to the right side of the rule for A, where the occurrences of B are represented by pointers (1,2) to the prefix s0[1..2] = ba. Finally, (3,5)ab(1,2)(1,2)(3,5)c corresponds to the axiom, where occurrences of A and B are represented by pointers (3,5) and (1,2), respectively. If the pointer size is 1, then the EPM has the same size as the grammar.
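The translation used in this example can be sketched in Python as follows (our illustration, with hypothetical names). The right sides are laid out in a chosen rule order to form s0, and every nonterminal is replaced by the pointer to the interval that holds its right side. With the order B, A, it reproduces the EPM above, and both objects expand to the same word.

```python
def grammar_to_epm(rules, order, axiom):
    """Lay out the right sides in `order` to form s0; nonterminals become pointers."""
    layout, span = [], {}
    for nt in order:
        start = len(layout) + 1                 # positions are 1-indexed
        layout.extend(rules[nt])
        span[nt] = (start, len(layout))

    def to_pointers(symbols):
        return [span.get(x, x) for x in symbols]

    return to_pointers(layout), to_pointers(axiom)


def expand_epm(s0, s1):
    """Replace pointers (i, j) by s0[i..j] until only terminals remain.

    Terminates for EPMs obtained from (acyclic) grammars."""
    current = list(s1)
    while any(isinstance(c, tuple) for c in current):
        current = [x for c in current
                   for x in (s0[c[0] - 1:c[1]] if isinstance(c, tuple) else [c])]
    return "".join(current)


def derive(rules, axiom):
    return "".join(derive(rules, rules[s]) if s in rules else s for s in axiom)


rules = {"A": ["B", "c", "B"], "B": ["b", "a"]}
axiom = list("AabBBAc")
s0, s1 = grammar_to_epm(rules, ["B", "A"], axiom)
print(s0)  # ['b', 'a', (1, 2), 'c', (1, 2)]
print(s1)  # [(3, 5), 'a', 'b', (1, 2), (1, 2), (3, 5), 'c']
```

With pointer size 1, both representations have 12 symbols.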

If an EPM s0#s1 is non-overlapping, i. e., for any two distinct pointers (i, j) and (k, ℓ), neither \(i \leq k \leq j\) nor \(k \leq i \leq \ell \) holds (so the intervals [i, j] and [k, ℓ] are disjoint), then it also translates into a grammar by transforming each pointer (i, j) into a nonterminal \(A_{(i, j)}\) with a rule \(A_{(i, j)} \to s_{0}[i..j]\). In this regard, it is important to note that the property of an EPM that s1 can be turned into w by repeated replacement of the pointers ensures that the derivation function of the grammar constructed in this way is acyclic.
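The reverse direction can be sketched similarly. Here we read “non-overlapping” as “the intervals [i, j] of any two distinct pointers are disjoint”, which is our interpretation of the condition above. The nonterminal naming scheme A(i,j) and all function names are hypothetical.

```python
def is_non_overlapping(pointers):
    """True if the intervals [i, j] of any two distinct pointers are disjoint."""
    ps = sorted(set(pointers))
    for x in range(len(ps)):
        for y in range(x + 1, len(ps)):
            (i, j), (k, ell) = ps[x], ps[y]
            if i <= k <= j or k <= i <= ell:
                return False
    return True


def epm_to_grammar(s0, s1):
    """Each pointer (i, j) becomes a nonterminal A(i,j) with rule A(i,j) -> s0[i..j]."""
    def name(c):
        return f"A({c[0]},{c[1]})" if isinstance(c, tuple) else c

    pointers = {c for c in list(s0) + list(s1) if isinstance(c, tuple)}
    rules = {name(p): [name(c) for c in s0[p[0] - 1:p[1]]] for p in pointers}
    return rules, [name(c) for c in s1]


def derive(rules, axiom):
    return "".join(derive(rules, rules[s]) if s in rules else s for s in axiom)


s0 = ["b", "a", (1, 2), "c", (1, 2)]
s1 = [(3, 5), "a", "b", (1, 2), (1, 2), (3, 5), "c"]
pointers = [c for c in s0 + s1 if isinstance(c, tuple)]
rules, axiom = epm_to_grammar(s0, s1)
print(is_non_overlapping(pointers), derive(rules, axiom))  # True bacbaabbababacbac
```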

We conclude that the concept of singleton grammars and the concept of EPMs with pointer size 1 and without overlapping are more or less identical, i. e., they just differ syntactically. Consequently, the problem of grammar-based compression and the problem of computing smallest EPMs with pointer size 1 and without overlapping are identical problems.

However, a closer look at Storer [10] reveals that the variant of computing EPMs with pointer size 1 is not considered there. Instead, the focus is on EPMs (and other kinds of compression schemes) for which the pointer size is not even constant, but a function of the length of the word that is compressed, typically logarithmic in |w|. Note that this avoids the main difficulty encountered when designing a reduction for grammar-based compression with fixed alphabets (see Section 3): the factors that encode vertices of a graph must have unbounded length, which makes it rather difficult to control how the grammar compresses these codewords. If, on the other hand, the pointers (which correspond to nonterminals in the grammar) have size \(\log (|{w}|)\), then it does not make sense to compress factors that are shorter than this size (since we gain nothing by replacing them by pointers). It is straightforward to represent a graph as a word of length linear in the size of the graph, where the length of the factors (i. e., the codewords) that represent single vertices is logarithmic in the size of the graph (this is the case in all reductions of [9, 33, 34]). The property mentioned above, i. e., that factors of logarithmic size are not compressed, then simply means that we can assume that the codewords for vertices are not compressed in the string that describes the graph, which makes it rather simple to devise a hardness reduction (in fact, controlling the possible compression of codewords is the main technical challenge in our reductions).

3 NP-Hardness of Computing Smallest Grammars for Fixed Alphabets

In their basic structure, the hardness reductions to be presented next are similar to the one from [33, 34], which shows NP-hardness of SGP for unbounded alphabets by a reduction from the vertex cover problem. The main effort of this section lies in extending this general idea to the case of a fixed alphabet. To make our technical proofs more accessible, we first sketch this reduction from [33, 34].

Let \(\mathcal {G} = (V, E)\) be a graph with

$$ V=\{v_{1},\dots,v_{n}\} \text{ and } E=\{(v_{j_{2i-1}},v_{j_{2i}})\mid 1 \leq i \leq m\} . $$

We define the following word over the alphabet V ∪{◇i∣1 ≤ i ≤ 5n + m}∪{#} (for the sake of simplicity, every individual occurrence of ◇ in the word stands for a distinct symbol of {◇i∣1 ≤ i ≤ 5n + m}):

$$ \begin{array}{@{}rcl@{}} w_{\mathcal{G}} = \prod\limits_{i = 1}^{n}(\#v_{i} \diamond v_{i}\# \diamond)^{2}\prod\limits_{i = 1}^{n}(\# v_{i} \# \diamond) \prod\limits_{i = 1}^{m}(\#v_{j_{2i-1}}\#v_{j_{2i}}\#\diamond) . \end{array} $$
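To make the shape of \(w_{\mathcal {G}}\) concrete, the following Python sketch (our illustration) generates it from a vertex count and an edge list, and counts the separators it uses. The word is a list of tokens, with vertex symbols written v1, v2, … and the distinct separators ◇1, ◇2, ….

```python
def reduction_word(n, edges):
    """Token list of w_G for vertices 1..n and edges given as index pairs."""
    word, used = [], 0

    def diamond():
        nonlocal used
        used += 1
        return f"◇{used}"              # a fresh separator symbol

    for i in range(1, n + 1):
        for _ in range(2):
            word += ["#", f"v{i}", diamond(), f"v{i}", "#", diamond()]
    for i in range(1, n + 1):
        word += ["#", f"v{i}", "#", diamond()]
    for a, b in edges:
        word += ["#", f"v{a}", "#", f"v{b}", "#", diamond()]
    return word, used


word, used = reduction_word(3, [(1, 2), (2, 3)])
print(used)  # 17 = 5n + m
```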

Let G = (N,Σ, R, S) be a smallest grammar for \(w_{\mathcal {G}}\). Then we can observe the following:

  • For every A ∈ N, \(\mathfrak {D}(A) \in \{\# v_{i}, v_{i} \#, \# v_{i} \# \mid 1 \leq i \leq n\}\). This is due to the fact that the only factors of length at least 2 that occur more than once in \(w_{\mathcal {G}}\) are of the form #vi, vi# or #vi#.

  • We can assume that, for every i, 1 ≤ i ≤ n, there are rules Ai → #vi and Bi → vi#, since if some of these rules are missing, then adding them and compressing the respective factors does not increase the size of the grammar.

  • Let \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\) contain exactly the indices i such that a rule with derivative #vi# exists; moreover, we can assume that all these rules have the form Ci → Ai#.

  • Let \({\Gamma } = \{v_{i} \mid i \in \mathfrak {I}\}\). If an edge \((v_{j_{2i-1}}, v_{j_{2i}})\) is not covered by Γ, then adding a rule \(C_{j_{2i-1}} \to A_{j_{2i-1}} \#\) or \(C_{j_{2i}} \to A_{j_{2i}} \#\) does not increase the size of the grammar. So we can assume that Γ is a vertex cover.

These observations show that there exists a grammar G for \(w_{\mathcal {G}}\) with |G|≤ 15n + 3m + k if and only if there is a vertex cover for \(\mathcal {G}\) of size at most k (for a formal proof, we refer to [33, 34]).
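The size bound 15n + 3m + k can be recomputed on a small instance. The Python sketch below (hypothetical names) builds the grammar described by the observations from a given vertex cover, that is, rules \(A_{i} \to \#v_{i}\), \(B_{i} \to v_{i}\#\) and \(C_{i} \to A_{i}\#\) for the cover vertices. It expands the grammar and compares its size with the formula. The word generator is repeated so that the block runs on its own.

```python
def reduction_word(n, edges):
    word, used = [], 0

    def diamond():
        nonlocal used
        used += 1
        return f"◇{used}"

    for i in range(1, n + 1):
        for _ in range(2):
            word += ["#", f"v{i}", diamond(), f"v{i}", "#", diamond()]
    for i in range(1, n + 1):
        word += ["#", f"v{i}", "#", diamond()]
    for a, b in edges:
        word += ["#", f"v{a}", "#", f"v{b}", "#", diamond()]
    return word


def cover_grammar(n, edges, cover):
    """Grammar for w_G obtained from a vertex cover (rules: nonterminal -> symbols)."""
    rules, used = {}, 0

    def diamond():
        nonlocal used
        used += 1
        return f"◇{used}"

    for i in range(1, n + 1):
        rules[f"A{i}"] = ["#", f"v{i}"]
        rules[f"B{i}"] = [f"v{i}", "#"]
    for i in cover:
        rules[f"C{i}"] = [f"A{i}", "#"]
    axiom = []
    for i in range(1, n + 1):
        for _ in range(2):
            axiom += [f"A{i}", diamond(), f"B{i}", diamond()]
    for i in range(1, n + 1):
        axiom += ([f"C{i}"] if i in cover else [f"A{i}", "#"]) + [diamond()]
    for a, b in edges:
        assert a in cover or b in cover, "not a vertex cover"
        axiom += ([f"C{a}", f"B{b}"] if a in cover else [f"A{a}", f"C{b}"]) + [diamond()]
    return rules, axiom


def expand(rules, axiom):
    return [t for s in axiom for t in (expand(rules, rules[s]) if s in rules else [s])]


def size(rules, axiom):
    return len(axiom) + sum(len(rhs) for rhs in rules.values())


edges = [(1, 2), (2, 3)]
rules, axiom = cover_grammar(3, edges, {2})
print(expand(rules, axiom) == reduction_word(3, edges), size(rules, axiom))  # True 52
```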

A simple modification of this reduction yields the following.

Theorem 1

1-SGP is NP-complete.

Proof

We slightly change the reduction from [33, 34] as follows:

$$ \begin{array}{@{}rcl@{}} w_{\mathcal{G}} = \prod\limits_{i = 1}^{n}(\#v_{i}\diamond v_{i}\# \diamond)^{2}\prod\limits_{i = 1}^{n}(\#v_{i}\#\diamond)^{2} \prod\limits_{i = 1}^{m}(\#v_{j_{2i-1}}\#v_{j_{2i}}\#\diamond) . \end{array} $$

The only difference from the original reduction is that the size of the rules with derivative #vi# has increased by 1, since in a 1-level grammar their right sides cannot use the nonterminals Ai, i. e., they now have the form Ci → #vi#. By doubling the factors #vi#◇, we make sure that adding such a rule whenever an edge is not covered still does not increase the size of the grammar. □

In these reductions, we encode the different vertices of a graph by single symbols and also use individual separator symbols (i. e., symbols with only one occurrence in the word to be compressed). This makes it particularly easy to devise suitable gadgets, but, on the other hand, it assumes that we have an arbitrarily large alphabet at our disposal. In the remainder of this section, we shall extend these hardness results to the more realistic case of fixed alphabets. The general structure of our reductions is similar to the ones of [10, 33, 34] sketched above, but, due to the constraint of having a fixed alphabet, they substantially differ on a more detailed level. More precisely, since fixed alphabets make it impossible to use single symbols (or even words of constant size) as separators or as representatives for vertices, we need to use special encodings for which we are able to determine how a smallest grammar will compress them (in this regard, recall our examples from Section 2.3 demonstrating how difficult it can be to determine a smallest grammar even for a single simply structured word). This constitutes a substantial technical challenge, which complicates our reductions considerably.

In the following, we prove that 1-SGP and SGP are NP-hard, even for constant alphabets of size 5 and 24, respectively. The stronger result claimed in the abstract and introduction, i. e., the hardness of SGP for alphabets of size 17, is presented later as an improvement (see Section 3.4, Corollary 1).

3.1 The 1-Level Case

As a tool for proving the hardness of 1-SGP, but also as a result in its own right, we first show that the compression of any 1-level grammar is at best quadratic (in contrast to general grammars, which can achieve exponential compression). Note that the bound of Lemma 1 is tight, e. g., consider \(\mathtt {a}^{n^{2}}\) and a grammar with rules \(S \to A^{n}\) and \(A \to \mathtt {a}^{n}\), which has size \(2n = 2\sqrt {n^{2}}\).

Lemma 1

Let G be a 1-level grammar. Then \(|G| \geq 2 \sqrt {|\mathfrak {D}(G)|}\).

Proof

Let \(n = |\mathfrak {D}(G)|\), let ax be the axiom and let A → u be a rule with a right side of maximum length. Since every symbol of ax derives a factor of length at most |u|, we have |ax||u| ≥ n, and, since \(x+y\geq 2\sqrt {xy}\) holds for all x, y ≥ 0, also \(|\mathsf {ax}| + |u| \geq 2 \sqrt {|\mathsf {ax}||u|}\). Consequently,

$$ |G| \geq |\mathsf{ax}| + |u| \geq 2 \sqrt{|\mathsf{ax}||u|} \geq 2\sqrt{n} . $$
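The tightness example can be explored numerically. The following Python sketch is only an illustration and not a proof of Lemma 1. It restricts attention to unary 1-level grammars with a single rule \(A \to \mathtt {a}^{r}\) and axiom \(A^{q}\mathtt {a}^{s}\), where \(N = qr + s\) and \(0 \leq s < r\), and computes the smallest size r + q + s over all r.

```python
import math


def best_single_rule_size(N):
    """min over r of |A -> a^r| + |A^q a^s| = r + q + s, where N = q*r + s, 0 <= s < r."""
    return min(r + N // r + N % r for r in range(1, N + 1))


for n in (3, 7, 12):
    N = n * n
    print(N, best_single_rule_size(N), 2 * math.sqrt(N))
```

For \(N = n^{2}\), the minimum is attained at r = n and equals 2n, which matches the bound.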

In order to prove the NP-hardness of 1-SGP for constant alphabets, we also devise a reduction from the vertex cover problem. To this end, let \(\mathcal {G} = (V, E)\) be the graph defined above and, without loss of generality, we assume n ≥ 40. We define \({\Sigma } = \{\mathtt {a}, \mathtt {b}, \diamond , \star , \#\}\) and \([\diamond ] = \diamond ^{n^{3}}\). For each i, \(1 \leq i \leq n\), we encode vi by a word \(\overline {v_{i}} \in \{\mathtt {a},\mathtt {b}\}^{\lceil \log (n)\rceil }\) such that \(\overline {v_{i}} \neq \overline {v_{j}}\) if and only if \(i \neq j\) (e. g., by taking \(\overline {v_{i}}\) to be the binary representation of i − 1 over the symbols \(\mathtt {a}\) and \(\mathtt {b}\) with \(\lceil \log (n)\rceil \) many digits). We now define the following word over Σ:

$$ \begin{array}{@{}rcl@{}} w &= &\prod\limits_{i=1}^{n}(\# \overline{v_{i}} [\diamond] \overline{v_{i}} \# [\diamond])^{2\left\lceil \log(n) \right\rceil+3} \prod\limits_{i=1}^{n}(\# \overline{v_{i}} \# [\diamond])^{\left\lceil \log(n) \right\rceil +1} \\ &&\prod\limits_{i = 1}^{m}(\# \overline{v_{j_{2i-1}}} \# \overline{v_{j_{2i}}} \# [\diamond])^{2} \star [\diamond]^{n^{3}} . \end{array} $$
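For small n, the word w can be generated directly. In this Python sketch (hypothetical names), the codewords are the ⌈log n⌉-digit binary representations of i − 1 over a and b. Using i − 1 instead of i is our assumption, made so that i = n also fits when n is a power of two. The condition n ≥ 40 is ignored, since only the construction is illustrated.

```python
def codewords(n):
    """Distinct codewords over {a, b} of length ceil(log2 n), for n >= 2."""
    L = (n - 1).bit_length()            # equals ceil(log2 n) for n >= 2
    table = str.maketrans("01", "ab")
    return L, {i: format(i - 1, f"0{L}b").translate(table) for i in range(1, n + 1)}


def one_level_word(n, edges):
    """The word w of the 1-level reduction; edges are pairs of vertex indices."""
    L, v = codewords(n)
    D = "◇" * n ** 3                    # the unary separator [◇]
    parts = []
    for i in range(1, n + 1):
        parts.append(("#" + v[i] + D + v[i] + "#" + D) * (2 * L + 3))
    for i in range(1, n + 1):
        parts.append(("#" + v[i] + "#" + D) * (L + 1))
    for a, b in edges:
        parts.append(("#" + v[a] + "#" + v[b] + "#" + D) * 2)
    parts.append("⋆" + D * n ** 3)
    return "".join(parts)


w = one_level_word(4, [(1, 2), (1, 3), (1, 4)])
print(len(w), sorted(set(w)))
```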

First, we show how a vertex cover for \(\mathcal {G}\) translates into a grammar for w:

Lemma 2

If there exists a size k vertex cover of \(\mathcal {G}\), then there exists a 1-level grammar G with \(\mathfrak {D}(G) = w\) and \(|G| = 13n\left \lceil \log (n) \right \rceil + 17n + k + 6m + 1 + 2n^{3}\).

Proof

Let \({\Gamma } \subseteq V\) be a size-k vertex cover of \(\mathcal {G}\). We define a grammar G = (N,Σ, R, ax) with

$$ \begin{array}{@{}rcl@{}} N &= &\{D, \overset{{~}_{\leftarrow}}{V_{i}}, {\!}_{\rightarrow}{V_{i}}, \overset{{~}_{\leftrightarrow}}{V_{j}} \mid 1 \leq i \leq n, v_{j} \in {\Gamma}\} ,\\ R &= &\{D \to [\diamond]\} \cup \{\overset{{~}_{\leftarrow}}{V_{i}} \to \# \overline{v_{i}}, {\!}_{\rightarrow}{V_{i}} \to \overline{v_{i}} \# \mid 1 \leq i \leq n\} \cup \\ &&\{\overset{{~}_{\leftrightarrow}}{V_{j}} \to \# \overline{v_{j}} \# \mid v_{j} \in {\Gamma}\} ,\\ \mathsf{ax} &= &\prod\limits_{i=1}^{n}(\overset{{~}_{\leftarrow}}{V_{i}} D {\!}_{\rightarrow}{V_{i}} D)^{2\left\lceil \log(n) \right\rceil+3} \prod\limits_{i=1}^{n}(y_{i} D)^{\left\lceil \log(n) \right\rceil +1} \prod\limits_{i = 1}^{m}(z_{i} D)^{2} \star D^{n^{3}} , \end{array} $$

where, for every i, \(1 \leq i \leq n\), \(y_{i} = \overset {{~}_{\leftrightarrow }}{V_{i}}\) if \(v_{i} \in {\Gamma }\) and \(y_{i} = \overset {{~}_{\leftarrow }}{V_{i}} \#\) otherwise, and, for every i, \(1 \leq i \leq m\), \(z_{i} = \overset {{~}_{\leftrightarrow }}{V}_{j_{2i-1}} {\!}_{\rightarrow }{V}_{j_{2i}}\) if \(v_{j_{2i-1}} \in {\Gamma }\) and \(z_{i} = \overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftrightarrow }}{V}_{j_{2i}}\) if \(v_{j_{2i-1}} \notin {\Gamma }\) (note that in this case \(v_{j_{2i}} \in {\Gamma }\)).

Obviously, G is a 1-level grammar and it can be easily verified that \(\mathfrak {D}(G) = w\). It remains to determine the size of G. To this end, we first observe that each rule \(\overset {{~}_{\leftarrow }}{V_{i}} \to \# \overline {v_{i}}\) and \({\!}_{\rightarrow }{V_{i}} \to \overline {v_{i}} \#\), \(1 \leq i \leq n\), has size \(\lceil \log (n)\rceil + 1\), each rule \(\overset {{~}_{\leftrightarrow }}{V_{j}} \to \# \overline {v_{j}} \#\), \(v_{j} \in {\Gamma }\), has size \(\lceil \log (n)\rceil + 2\), and the rule D → [◇] has size \(n^{3}\). Hence, the size contributed by these rules is

$$ 2n\lceil\log(n)\rceil + 2n + k\lceil\log(n)\rceil+ 2k + n^{3} . $$

The axiom has size

$$ \begin{array}{@{}rcl@{}} &&4n(2\left\lceil \log(n) \right\rceil+3) + (3n - k)(\left\lceil \log(n) \right\rceil+1) + 6m + 1 + n^{3}\\ &= &11n\left\lceil \log(n) \right\rceil - k \left\lceil \log(n) \right\rceil + 15n - k + 6m + 1 + n^{3} . \end{array} $$

So the total size is

$$ 13n\left\lceil \log(n) \right\rceil + 17n + k + 6m + 1 + 2n^{3} . $$
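The size computation of Lemma 2 can be double-checked on small graphs. In the Python sketch below (hypothetical names), V<i, V>i and V=i stand for the nonterminals with derivatives \(\#\overline {v_{i}}\), \(\overline {v_{i}}\#\) and \(\#\overline {v_{i}}\#\), and the codewords encode i − 1 in binary, which is our assumption. The sketch builds G from a vertex cover, checks that it derives w, and compares |G| with the formula of Lemma 2.

```python
def codewords(n):
    """Distinct codewords over {a, b} of length ceil(log2 n), for n >= 2."""
    L = (n - 1).bit_length()
    table = str.maketrans("01", "ab")
    return L, {i: format(i - 1, f"0{L}b").translate(table) for i in range(1, n + 1)}


def one_level_word(n, edges):
    L, v = codewords(n)
    D = "◇" * n ** 3
    return "".join(
        [("#" + v[i] + D + v[i] + "#" + D) * (2 * L + 3) for i in range(1, n + 1)]
        + [("#" + v[i] + "#" + D) * (L + 1) for i in range(1, n + 1)]
        + [("#" + v[a] + "#" + v[b] + "#" + D) * 2 for a, b in edges]
        + ["⋆" + D * n ** 3]
    )


def lemma2_grammar(n, edges, cover):
    """1-level grammar of Lemma 2; rules map nonterminals to terminal words."""
    L, v = codewords(n)
    rules = {"D": "◇" * n ** 3}
    for i in range(1, n + 1):
        rules[f"V<{i}"] = "#" + v[i]
        rules[f"V>{i}"] = v[i] + "#"
    for j in cover:
        rules[f"V={j}"] = "#" + v[j] + "#"
    ax = []
    for i in range(1, n + 1):
        ax += [f"V<{i}", "D", f"V>{i}", "D"] * (2 * L + 3)
    for i in range(1, n + 1):
        y = [f"V={i}"] if i in cover else [f"V<{i}", "#"]
        ax += (y + ["D"]) * (L + 1)
    for a, b in edges:
        assert a in cover or b in cover, "not a vertex cover"
        z = [f"V={a}", f"V>{b}"] if a in cover else [f"V<{a}", f"V={b}"]
        ax += (z + ["D"]) * 2
    ax += ["⋆"] + ["D"] * n ** 3
    return rules, ax


def derive(rules, ax):
    return "".join(rules.get(s, s) for s in ax)


def size(rules, ax):
    return len(ax) + sum(len(rhs) for rhs in rules.values())


def lemma2_formula(n, m, k):
    L = (n - 1).bit_length()
    return 13 * n * L + 17 * n + k + 6 * m + 1 + 2 * n ** 3


edges = [(1, 2), (1, 3), (1, 4)]
rules, ax = lemma2_grammar(4, edges, {1})
print(derive(rules, ax) == one_level_word(4, edges), size(rules, ax), lemma2_formula(4, 3, 1))  # True 320 320
```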

Next, we take care of the opposite direction, i. e., we show how a vertex cover can be extracted from a grammar for w:

Lemma 3

If there exists a 1-level grammar G with \(\mathfrak {D}(G) = w\) and \(|G| \leq 13n\left \lceil \log (n) \right \rceil + 17n + k + 6m + 1 + 2n^{3}\), then there exists a size k vertex cover of \(\mathcal {G}\).

Proof

Let G = (N,Σ, R, ax) be a smallest 1-level grammar with

$$ |G| \leq 13n\left\lceil \log(n) \right\rceil + 17n + k + 6m + 1 + 2n^{3} $$

and \(\mathfrak {D}(G) = w\). We first observe that, since n ≥ 40,

$$ \begin{array}{@{}rcl@{}} 13n\left\lceil \log(n) \right\rceil + 17n + k + 6m + 1 < 19n^{2} + 18n < 20n^{2} = \frac{40}{2}n^{2} \leq \frac{n}{2}n^{2} = \frac{n^{3}}{2} . \end{array} $$

Thus, \(|G| < \frac {n^{3}}{2} + 2n^{3} = \frac {5n^{3}}{2}\). Due to the separator symbol ⋆ with only one occurrence in w, we know that the axiom of G has the form \(u \star u^{\prime }\). Hence, we can consider all the nonterminals (and their rules) that occur in \(u^{\prime }\) as an individual 1-level grammar \(G^{\prime }\) for the word \(\mathfrak {D}(u^{\prime }) = [\diamond ]^{n^{3}}\) of length \(n^{6}\). By Lemma 1, we can conclude that \(|G^{\prime }| \geq 2n^{3}\); thus, \(2n^{3} \leq |G| < \frac {5n^{3}}{2}\).

Claim 1: There is a D ∈ N with D → [◇] and, for every other rule A → x in R, \(|x|_{\diamond } = 0\).

Proof of Claim 1: First, we assume that there is a rule \(A \to \diamond ^{\ell }\) with \(\ell > n^{3}\). This rule can only be used in order to compress the suffix \([\diamond ]^{n^{3}}\) of w, since the other part of w has no occurrence of the factor \(\diamond ^{\ell }\). Hence, we can replace \(A \to \diamond ^{\ell }\) by the rule \(A \to \diamond ^{n^{3}}\) and change the axiom to \(u \star A^{n^{3}}\). By Lemma 1, the rule \(A \to \diamond ^{n^{3}}\) with axiom \(A^{n^{3}}\) compresses the subword \([\diamond ]^{n^{3}}\) optimally, which means that this operation does not increase the size of G. Therefore, we conclude that G does not contain a rule \(A \to \diamond ^{\ell }\) with \(\ell > n^{3}\).

Since w contains at least \(n^{3}\) non-overlapping occurrences of the factor [◇] and since \(|G| < 3n^{3}\), at least one of these factors must be produced by at most 2 nonterminals. This implies that there is a rule B → v with \(|v| \geq \frac {|[\diamond ]|}{2} = \frac {n^{3}}{2}\). If v contains a symbol from Σ ∖ {◇}, then B → v is not a rule of \(G^{\prime }\); thus, by Lemma 1, it follows that \(|G| \geq |G^{\prime }| + \frac {n^{3}}{2} \geq 2n^{3} + \frac {n^{3}}{2} = \frac {5n^{3}}{2}\), which is a contradiction. Hence, we can conclude that \(v \in \{\diamond \}^{*}\) and we further assume that, among all rules with a right side in \(\{\diamond \}^{*}\) of size at least \(\frac {n^{3}}{2}\), B → v is such that |v| is maximal. Moreover, let \(|v| = n^{3} - t\), for a \(t \in \mathbb {N}\).

We note that, due to the maximality of B → v and the fact that all rules in \(G^{\prime }\) have a right side in \(\{\diamond \}^{*}\), a rule of maximum size in \(G^{\prime }\) has size at most \(n^{3} - t\). In particular, this implies

$$ |u^{\prime}| \geq \frac{n^{6}}{n^{3} - t} > \frac{n^{6} - t^{2}}{n^{3} - t} = \frac{(n^{3} + t)(n^{3} - t)}{n^{3} - t} = n^{3} + t , $$

where \(u^{\prime }\) is the right side of the axiom as defined above.

We now remove rule B → v, add the rule D → [◇] and replace part \(u^{\prime }\) of the axiom by \(D^{n^{3}}\). Since |[◇]| = |v| + t and \(|u^{\prime }| \geq n^{3} + t = |D^{n^{3}}| + t\), this does not increase the size of the grammar. However, the rule B → v might have been used in order to produce some of the factors [◇] in the left part u of the axiom of G; thus, since we removed the rule B → v, we have to repair G accordingly.

To this end, we first note that every occurrence of [◇] to the left of ⋆ in w is compressed by a sequence \(E_{1} C_{1} C_{2} \cdots C_{p} E_{2}\) of terminals or nonterminals, such that \(\mathfrak {D}(E_{1} C_{1} C_{2} {\ldots } C_{p} E_{2}) = x [\diamond ] y\), where \(E_{1} \to x\diamond ^{q}\), q ≥ 1, or \(E_{1} = \varepsilon \), and \(E_{2} \to \diamond ^{r}y\), r ≥ 1, or \(E_{2} = \varepsilon \). For every such occurrence of [◇] to the left of ⋆ in w, we exchange \(E_{1} C_{1} C_{2} \cdots C_{p} E_{2}\) by \(E^{\prime }_{1} D E^{\prime }_{2}\), where \(E^{\prime }_{1} = \varepsilon \) if \(E_{1} = \varepsilon \), \(E^{\prime }_{1} = x\) if \(E_{1} \to x\diamond ^{q}\), q ≥ 1, \(E^{\prime }_{2} = \varepsilon \) if \(E_{2} = \varepsilon \), and \(E^{\prime }_{2} = y\) if \(E_{2} \to \diamond ^{r}y\), r ≥ 1. This construction removes rules or shortens them; thus, in order to conclude that the overall size of the grammar does not increase, we only have to observe that the size of the axiom is not increased. To this end, we first observe that if p = 0, then \(E_{1}\) or \(E_{2}\) must have a right side of length at least \(\frac {n^{3}}{2}\) that contains a symbol from Σ ∖ {◇}, but, as shown above, such rules do not exist. Hence, we can assume that p ≥ 1. Furthermore, since \(E_{1} = \varepsilon \) implies \(E^{\prime }_{1} = \varepsilon \) and \(E_{2} = \varepsilon \) implies \(E^{\prime }_{2} = \varepsilon \), \(|E_{1} C_{1} C_{2} {\ldots } C_{p} E_{2}| \geq |E^{\prime }_{1} D E^{\prime }_{2}|\) follows.

We conclude that the overall size of the grammar did not increase due to these modifications. Moreover, G now contains a rule D → [◇] and, since all occurrences of ◇ in w are produced by this rule, we can safely remove all other rules that produce an occurrence of ◇ from the grammar. (Claim 1) \(\square \)

The statement of the previous claim particularly implies that the axiom of G has the form

$$ \mathsf{ax} = \prod\limits_{i=1}^{n}(\alpha_{i} D \alpha^{\prime}_{i} D)^{2\left\lceil \log(n) \right\rceil+3} \prod\limits_{i=1}^{n}(\beta_{i} D)^{\left\lceil \log(n) \right\rceil +1} \prod\limits_{i = 1}^{m}(\gamma_{i} D)^{2} \star D^{n^{3}} , $$

where \(\alpha _{i}, \alpha ^{\prime }_{i}, \beta _{i}, \gamma _{j} \in (N \cup {\Sigma })^{*}\), \(1 \leq i \leq n\), \(1 \leq j \leq m\).

Claim 2: For every i, \(1 \leq i \leq n\), \(\alpha _{i} = \overset {{~}_{\leftarrow }}{V_{i}}\), \(\alpha ^{\prime }_{i} = {\!}_{\rightarrow }{V_{i}}\), where \(\overset {{~}_{\leftarrow }}{V_{i}}, {\!}_{\rightarrow }{V_{i}}\) are nonterminals with rules \(\overset {{~}_{\leftarrow }}{V_{i}} \rightarrow \# \overline {v_{i}}\) and \({\!}_{\rightarrow }{V_{i}} \rightarrow \overline {v_{i}} \#\).

Proof of Claim 2: Obviously, for every i, \(1 \leq i \leq n\), \(\mathfrak {D}(\alpha _{i}) = \# \overline {v_{i}}\), which means that |αi| = 1 implies that αi is a nonterminal with derivative \(\# \overline {v_{i}}\). We now assume that |αi| ≥ 2 for some i, \(1 \leq i \leq n\). If we substitute αi by a new nonterminal \(\overset {{~}_{\leftarrow }}{V_{i}}\) with a rule \(\overset {{~}_{\leftarrow }}{V_{i}} \rightarrow \# \overline {v_{i}}\), then we shorten the axiom by at least \(2\lceil \log (n)\rceil +3\) (at least one symbol for each of the \(2\lceil \log (n)\rceil +3\) occurrences of αi) and the size of the new rule is \(|\# \overline {v_{i}}| = \left \lceil \log (n) \right \rceil + 1\); thus, the overall size of the grammar does not increase. An analogous argument applies if \(|\alpha ^{\prime }_{i}| \geq 2\) for some i, \(1 \leq i \leq n\). Consequently, we can assume that we have \(\overset {{~}_{\leftarrow }}{V_{i}}, {\!}_{\rightarrow }{V_{i}} \in N\) with rules \( \overset {{~}_{\leftarrow }}{V_{i}} \rightarrow \# \overline {v_{i}}\) and \({\!}_{\rightarrow }{V_{i}} \rightarrow \overline {v_{i}} \#\), and \(\alpha _{i} = \overset {{~}_{\leftarrow }}{V_{i}}\), \(\alpha ^{\prime }_{i} = {\!}_{\rightarrow }{V_{i}}\), \(1 \leq i \leq n\). (Claim 2) \(\square \)

We recall that, for every i, \(1 \leq i \leq n\), \(\mathfrak {D}(\beta _{i}) = \# \overline {v_{i}} \#\). Hence, if, for some i, \(1 \leq i \leq n\), |βi| ≥ 2, then we can as well replace βi by \(\overset {{~}_{\leftarrow }}{V_{i}} \#\) without increasing the size of the grammar. This implies that, for every i, \(1 \leq i \leq n\), \(\beta _{i} = \overset {{~}_{\leftarrow }}{V_{i}} \#\) or \(\beta _{i} = \overset {{~}_{\leftrightarrow }}{V_{i}}\) with \(\overset {{~}_{\leftrightarrow }}{V_{i}} \to \# \overline {v_{i}} \#\).

Next, recall that, for every i, \(1 \leq i \leq m\), \(\mathfrak {D}(\gamma _{i}) = \# \overline {v_{j_{2i-1}}} \# \overline {v_{j_{2i}}} \#\). If, for some i, \(1 \leq i \leq m\), |γi| ≥ 3, then we can as well replace γi by \(\overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftarrow }}{V}_{j_{2i}} \#\) without increasing the size of the grammar. If |γi| = 1, then there is a rule \(E \to \# \overline {v_{j_{2i-1}}} \# \overline {v_{j_{2i}}} \#\) of size \(2 \lceil \log (n)\rceil + 3\). If we now replace γi by \(\overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftarrow }}{V}_{j_{2i}} \#\), then we increase the size of the axiom (and therefore of the grammar) by 4 (by 2 for each of the two occurrences of γi). However, since there are no other occurrences of \(\# \overline {v_{j_{2i-1}}} \# \overline {v_{j_{2i}}} \#\) in w, there are no other occurrences of E in the axiom; thus, we can remove the rule \(E \to \# \overline {v_{j_{2i-1}}} \# \overline {v_{j_{2i}}} \#\), which decreases the size of the grammar by \(2 \lceil \log (n)\rceil + 3 \geq 4\). Hence, the overall size of the grammar does not increase. If |γi| = 2, then γi = E1E2 with \(E_{1} \to \# \overline {v_{j_{2i-1}}} \# x\) or \(E_{2} \to x \# \overline {v_{j_{2i}}} \#\). Let us assume that there is a rule \(E_{1} \to \# \overline {v_{j_{2i-1}}} \# x\) (the case \(E_{2} \to x \# \overline {v_{j_{2i}}} \#\) is analogous). If we now change this rule to \(E_{1} \to \# \overline {v_{j_{2i-1}}} \#\) and substitute every E2 by \({\!}_{\rightarrow }{V}_{j_{2i}}\), then the size of the grammar does not increase (note that the nonterminals E1 and E2 can only occur in some γj, which has been replaced in this way).

These considerations demonstrate that we can assume that, in addition to the rule D → [◇], the rules of G are \(\overset {{~}_{\leftarrow }}{V_{i}} \rightarrow \# \overline {v_{i}}\), \({\!}_{\rightarrow }{V_{i}} \to \overline {v_{i}}\#\), 1 ≤ in, and rules \(\overset {{~}_{\leftrightarrow }}{V_{i}} \to \#\overline {v_{i}}\#\) with \(i \in \mathfrak {I}\), for some \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\). We now define \(\ell = |\mathfrak {I}|\) and the vertex set \(\mathcal {V} = \{v_{i} \mid i \in \mathfrak {I}\}\); furthermore, let t be the number of edges from \(\mathcal {G}\) that are covered by some vertex of \(\mathcal {V}\). The axiom has the following form:

$$ \mathsf{ax} = \prod\limits_{i=1}^{n}(\overset{{~}_{\leftarrow}}{V_{i}} D {\!}_{\rightarrow}{V_{i}} D)^{2\left\lceil \log(n) \right\rceil+3} \prod\limits_{i=1}^{n}(y_{i} D)^{\left\lceil \log(n) \right\rceil +1} \prod\limits_{i = 1}^{m}(z_{i} D)^{2} \star D^{n^{3}} , $$

where, for every i, \(1 \leq i \leq n\), \(y_{i} = \overset {{~}_{\leftrightarrow }}{V_{i}}\) if \(v_{i} \in \mathcal {V}\) and \(y_{i} = \overset {{~}_{\leftarrow }}{V_{i}} \#\) otherwise, and, for every i, \(1 \leq i \leq m\), \(z_{i} = \overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftarrow }}{V}_{j_{2i}} \#\) if the edge \((v_{j_{2i-1}}, v_{j_{2i}})\) is not covered by \(\mathcal {V}\), and \(z_{i} = \overset {{~}_{\leftrightarrow }}{V}_{j_{2i-1}} {\!}_{\rightarrow }{V}_{j_{2i}}\) or \(z_{i} = \overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftrightarrow }}{V}_{j_{2i}}\) if \(v_{j_{2i-1}} \in \mathcal {V}\) or \(v_{j_{2i}} \in \mathcal {V}\), respectively.

The total size of the rules is

$$ 2n\lceil \log(n) \rceil + 2n + \ell\lceil \log(n) \rceil + 2\ell + n^{3} . $$

Moreover,

$$ \begin{array}{@{}rcl@{}} |\mathsf{ax}|& = &4n(2\left\lceil \log(n) \right\rceil + 3) + (\left\lceil \log(n) \right\rceil + 1)(3n - \ell) + 6t + 8(m - t) + 1 + n^{3} \\ &= &11n\left\lceil \log(n) \right\rceil + 15n - \ell\left\lceil \log(n) \right\rceil - \ell + 8m - 2t + 1 + n^{3} . \end{array} $$

Consequently, \(|G| = 13n\left \lceil \log (n) \right \rceil + 17n + \ell + 8m - 2t + 1 + 2n^{3}\). Since, by assumption, \(|G| \leq 13n\left \lceil \log (n) \right \rceil + 17n + k + 6m + 1 + 2n^{3}\), we conclude that \(\ell + 8m - 2t \leq k + 6m\). From this inequality, since \(t \leq m\), we can deduce \(\ell \leq k\) on the one hand and also \(m - \frac {k-\ell }{2} \leq t\) on the other.

Consequently, the vertex set \(\mathcal {V}\) covers already \(m - \frac {k-\ell }{2}\) edges of \(\mathcal {G}\). This implies that we can extend \(\mathcal {V}\) to a vertex cover \(\mathcal {V}^{\prime }\) for \(\mathcal {G}\) by adding q vertices, where \(q \leq \frac {k-\ell }{2} \leq k-\ell \). Since \(|\mathcal {V}| = \ell \), \(|\mathcal {V}^{\prime }| \leq |\mathcal {V}| + q \leq \ell + k-\ell = k\). □
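The final counting step can be sanity-checked by brute force. Taking for granted the canonical form established in the proof, the Python sketch below (hypothetical names) minimises \(|G| = 13n\lceil \log (n)\rceil + 17n + \ell + 8m - 2t + 1 + 2n^{3}\) over all vertex sets \(\mathcal {V}\) of a small graph. It then checks that the minimum exceeds \(13n\lceil \log (n)\rceil + 17n + 6m + 1 + 2n^{3}\) by exactly the size of a minimum vertex cover.

```python
from itertools import combinations


def ceil_log2(n):
    return (n - 1).bit_length()        # ceil(log2 n) for n >= 2


def all_subsets(n):
    for r in range(n + 1):
        for c in combinations(range(1, n + 1), r):
            yield set(c)


def canonical_size(n, edges, chosen):
    """|G| of the canonical grammar with two-sided rules for the vertices in `chosen`."""
    m, L = len(edges), ceil_log2(n)
    t = sum(1 for a, b in edges if a in chosen or b in chosen)   # covered edges
    return 13 * n * L + 17 * n + len(chosen) + 8 * m - 2 * t + 1 + 2 * n ** 3


def min_vertex_cover(n, edges):
    return min(len(s) for s in all_subsets(n)
               if all(a in s or b in s for a, b in edges))


def excess_equals_cover(n, edges):
    m, L = len(edges), ceil_log2(n)
    base = 13 * n * L + 17 * n + 6 * m + 1 + 2 * n ** 3
    best = min(canonical_size(n, edges, s) for s in all_subsets(n))
    return best - base == min_vertex_cover(n, edges)


print(excess_equals_cover(4, [(1, 2), (2, 3), (3, 4), (4, 1)]))  # True
```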

From Lemmas 2 and 3, we can directly conclude the following theorem:

Theorem 2

1-SGP is NP-complete, even for |Σ| = 5.

3.2 The Multi-Level Case

In the above reduction for the 1-level case, the main difficulty is the use of unary factors as separators. However, once those separators are in place, we know the factors of w that are produced by nonterminals and, for a smallest 1-level grammar, this already fully determines the axiom and therefore also the grammar itself. For the multi-level case, the situation is much more complicated. Even if we manage to force the axiom to factorise w into parts that are either separators or codewords of vertices, this only determines the top-most level of the grammar and we do not necessarily know how these single factors are further hierarchically compressed and, more importantly, the dependencies between these compressions (i. e., how they share the same rules).

To deal with these issues, we rely on a larger alphabet Σ and we use palindromic codewords \(u \star u^{R}\), where ⋆ ∈ Σ and u is a word over an alphabet of size 7 representing a 7-ary number. The purpose of the palindromic structure is twofold. Firstly, it implies that codewords always start and end with the same symbol, which, in the construction of w, makes it easier to avoid the situation that an overlap between neighbouring codewords is repeated elsewhere in w (see Lemma 4). Secondly, if all codewords are produced by individual nonterminals, then we can show that they are produced best “from the middle”, similar to the rules of the example grammar G2 from Section 2.3. In addition to this, we also need a vertex colouring and an edge colouring of certain variants of the graph to be encoded.

In order to formally define the reduction, we first give some preparatory definitions. Let

$$ {\Sigma}=\{x_{1},\dots,x_{7}, d_{1},\dots,d_{7}, \star,\#, {\cent}_{1},{\cent}_{2},\$_{1},\dots, \$_{6}\} $$

be an alphabet of size 24. The function \(M \colon \mathbb {N}\times \mathbb {N}\rightarrow \mathbb {N}\) is defined by

$$ M(q,k):=\min\{r>0\mid \exists \ t\in \mathbb{N}\cup\{0\}\colon q=tk+r\} $$

(note that M is the positive modulo-function, i. e., M(q, k) = q%k, if q%k≠ 0 and M(q, k) = k, otherwise). Let the functions \(f\colon \mathbb {N} \rightarrow \{x_{1},\dots ,x_{7}\}^{+}\) and \(g\colon \mathbb {N} \rightarrow \{d_{1},\dots ,d_{7}\}^{+}\) be defined by

$$ \begin{array}{@{}rcl@{}} f(q) &:= &x_{a_{0}} x_{a_{1}}{\dots} x_{a_{k}} \text{ and}\\ g(q) &:= &d_{a_{0}} d_{a_{1}}{\dots} d_{a_{k}} , \end{array} $$

for every \(q \in \mathbb {N}\), where \(k \in \mathbb {N} \cup \{0\}\) and \(a_{i} \in \{1,2,\ldots ,7\}\), \(0 \leq i \leq k\), such that \(q={\sum }^{k}_{i=0} a_{i} 7^{i}\) is satisfied. Note that since, for every \(q \in \mathbb {N}\), there are unique \(k \in \mathbb {N} \cup \{0\}\) and \(a_{i} \in \{1,2,\ldots ,7\}\), \(0 \leq i \leq k\), such that \(q={\sum }^{k}_{i=0} a_{i} 7^{i}\), the functions f and g are well-defined.

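For every \(i \in \mathbb {N}\), let \(\langle i \rangle _{v} := f(i) \star f(i)^{R}\) and \(\langle i \rangle := g(i) \star g(i)^{R}\). The factors \(\langle i \rangle _{v}\) and \(\langle i \rangle \) are called codewords; \(\langle i \rangle _{v}\) represents a vertex \(v_{i}\), while the \(\langle i \rangle \) are used as separators.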

Observation 1

The functions f and g are bijections onto \(\{x_{1},\dots ,x_{7}\}^{+}\) and \(\{d_{1},\dots ,d_{7}\}^{+}\), respectively: they are 7-ary representations of the positive integers with digits 1, …, 7 (least significant digit first). Thus, for any \(n \in \mathbb {N} \cup \{0\}\), \(g(7n + i)[1] = d_{i}\) and \(f(7n + i)[1] = x_{i}\), 1 ≤ i ≤ 7. In particular, \(\{g(n+i)[1]\mid 0\leq i \leq 6\}=\{d_{1},\dots ,d_{7}\}\) and \(\{f(n+i)[1]\mid 0\leq i \leq 6\}=\{x_{1},\dots ,x_{7}\}\), for every \(n \in \mathbb {N}\). Consequently, for every \(n, n^{\prime } \in \mathbb {N}\) with \(M(n, 7) \neq M(n^{\prime }, 7)\), the codewords \(\langle n \rangle_{v}\) and \(\langle n^{\prime } \rangle _{v}\) have no nonempty common prefix or suffix (and the same holds for the codewords \(\langle n \rangle_{\diamond}\) and \(\langle n^{\prime} \rangle_{\diamond}\)).
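Observation 1 can be spot-checked numerically. The Python sketch below is a sanity check on a finite range, not a proof; the helper names and the string encoding of symbols ("x3", "d5", "*" for ⋆) are our own assumptions. It checks the first-symbol property, the fact that seven consecutive indices yield seven distinct first symbols, and, using that codewords are palindromes, that codewords whose indices differ modulo 7 already differ in their first and in their last symbol.

```python
def digits(q: int) -> list[int]:
    """Bijective base-7 digits of q >= 1 (digits 1..7), least significant first."""
    out = []
    while q > 0:
        a = q % 7 or 7
        out.append(a)
        q = (q - a) // 7
    return out


def codeword(i: int, letter: str) -> list[str]:
    """<i>_v for letter 'x' and <i>_diamond for letter 'd'; '*' stands for the symbol star."""
    half = [f"{letter}{a}" for a in digits(i)]
    return half + ["*"] + half[::-1]


LIMIT = 200

# f(7n + i)[1] = x_i and g(7n + i)[1] = d_i for n >= 0 and 1 <= i <= 7.
for n in range(LIMIT):
    for i in range(1, 8):
        assert codeword(7 * n + i, "x")[0] == f"x{i}"
        assert codeword(7 * n + i, "d")[0] == f"d{i}"

# Any seven consecutive indices yield seven distinct first symbols.
for n in range(1, LIMIT):
    assert {codeword(n + i, "d")[0] for i in range(7)} == {f"d{a}" for a in range(1, 8)}

# Codewords are palindromes: if the indices differ modulo 7, both the first and the last
# symbols differ, so the codewords have no nonempty common prefix or suffix.
for a in range(1, LIMIT):
    for b in range(1, LIMIT):
        if (a - b) % 7 != 0:
            ca, cb = codeword(a, "x"), codeword(b, "x")
            assert ca[0] != cb[0] and ca[-1] != cb[-1]

print("Observation 1 verified for all indices below", LIMIT)
```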

Let \(\mathcal {G}=(V,E)\) be a subcubic graph (i. e., a graph with maximum degree at most 3) with \(V=\{v_{1},\dots ,v_{n}\}\) and \(E=\{\{v_{j_{2i-1}},v_{j_{2i}}\}\mid 1 \leq i \leq m\}\) (note that the vertex cover problem remains NP-hard if restricted to subcubic graphs; see [49]). Let \(\mathcal {G}^{\prime }=(V,E^{\prime })\) be the multi-graph defined by

$$ E^{\prime}:=\left\{\{v_{j_{2i}},v_{j_{2i+1}}\}\mid 1 \leq i \leq m-1\right\} . $$

By [50], it is possible to compute in polynomial time a proper edge-colouring (i. e., a colouring in which no two edges that share one or two vertices have the same colour) of a multi-graph with at most \(\lfloor \tfrac 32 \Delta\rfloor \) colours, where Δ is the maximum degree of the multi-graph. Since \(\mathcal {G}\) is subcubic, the maximum degree of \(\mathcal {G}^{\prime }\) is at most three (every vertex occurs at most three times in the sequence \(v_{j_{1}},\dots ,v_{j_{2m}}\), and each occurrence is an endpoint of at most one edge of \(E^{\prime}\)), so we can compute a proper edge-colouring \(C_{e}\colon E^{\prime }\rightarrow \{1,2,3,4\}\) of \(\mathcal {G}^{\prime }\) with colours {1,2,3,4}. Let \(\mathcal {G}^{2}=(V,E^{\prime \prime })\) be the graph defined by

$$ E^{\prime\prime}=\left\{\{u,v\}\mid \ \{u,w\},\{w,v\}\in E\text{ for some } w\in V\!\setminus\!\{u,v\}, u\not= v\right\} . $$

Since \(\mathcal {G}\) is subcubic, \(\mathcal {G}^{2}\) has maximum degree at most six (every vertex has at most three neighbours, each of which has at most two further neighbours). Let \(C_{v}\colon \{1,\dots ,n\}\rightarrow \{1,2,3,4,5,6,7\}\) be a proper vertex-colouring (defined over the vertex indices of \(V=\{v_{1},\dots ,v_{n}\}\)) of \(\mathcal {G}^{2}\) with colours {1,2,3,4,5,6,7}. Such a colouring can be computed by an algorithmic version of Brooks' theorem [51].
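The three graphs and the two colourings can be illustrated on a small example. The following Python sketch uses names of our own and makes two simplifying assumptions that are not the algorithms cited above. First, the 4-edge-colouring of \(\mathcal{G}^{\prime}\) is found by plain backtracking, which is exponential in the worst case and only suitable for tiny inputs (it stands in for the polynomial-time algorithm of [50]). Second, \(C_{v}\) is computed greedily, which never uses more than Δ + 1 colours and hence stays within 7 colours when \(\mathcal{G}^{2}\) has maximum degree at most 6. The example graph is K4 with its edges listed in an arbitrarily chosen order.

```python
from itertools import combinations


def neighbours(n, edges):
    """Adjacency sets for vertices 1..n."""
    nbrs = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


def square_edges(n, edges):
    """E'': pairs {a, b}, a != b, that have a common neighbour w in G."""
    nbrs = neighbours(n, edges)
    return {(a, b) for w in nbrs for a, b in combinations(sorted(nbrs[w]), 2)}


def greedy_vertex_colouring(n, edges):
    """Greedy colouring with colours 1..7; never needs more than (max degree + 1) colours,
    so 7 colours suffice when the maximum degree is at most 6."""
    nbrs = neighbours(n, edges)
    colour = {}
    for v in range(1, n + 1):
        used = {colour[u] for u in nbrs[v] if u in colour}
        colour[v] = min(c for c in range(1, 8) if c not in used)
    return colour


def sequence_multigraph(edges):
    """E': joins the second endpoint of edge i to the first endpoint of edge i + 1
    (the result may contain parallel edges and loops)."""
    return [(edges[i][1], edges[i + 1][0]) for i in range(len(edges) - 1)]


def edge_colouring(multi_edges, colours=4):
    """Proper edge colouring by backtracking; exponential in the worst case.
    Two edges clash if they share an endpoint (a loop shares its single endpoint)."""
    col = [0] * len(multi_edges)

    def clash(k, c):
        return any(col[p] == c and set(multi_edges[p]) & set(multi_edges[k]) for p in range(k))

    def solve(k):
        if k == len(multi_edges):
            return True
        for c in range(1, colours + 1):
            if not clash(k, c):
                col[k] = c
                if solve(k + 1):
                    return True
        col[k] = 0
        return False

    return col if solve(0) else None


n = 4
edges = [(1, 2), (3, 4), (1, 3), (2, 4), (1, 4), (2, 3)]  # K4, edges in a fixed order
e_prime = sequence_multigraph(edges)
c_e = edge_colouring(e_prime)
c_v = greedy_vertex_colouring(n, square_edges(n, edges))
print(e_prime)  # [(2, 3), (4, 1), (3, 2), (4, 1), (4, 2)]
print(c_e)      # [1, 1, 2, 2, 3]
print(c_v)      # {1: 1, 2: 2, 3: 3, 4: 4}
```

For K4, \(\mathcal{G}^{2}\) is again K4, so every proper vertex colouring must use four distinct colours, which is what the greedy colouring produces here.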

Let \(w_{\mathcal {G}} = u v w\) be the word representing \(\mathcal {G}\), where u, v, w ∈ Σ+ are defined as follows (note that \(m \leq \frac {3n}{2}\), so 7m < 14n; hence, all codewords \(\langle i \rangle_{\diamond}\) used in w satisfy i < 14n).

$$ u = \prod\limits_{j=0}^{6} \left( {\prod}_{i=1}^{14n} (\langle i \rangle_{\diamond} \langle M(i+j,14n) \rangle_{v})\right) \$_{1} $$
$$ \begin{array}{@{}rcl@{}} v &=& \prod\limits_{i=1}^{n} \left( \# \langle 7i+C_{v}(i) \rangle_{v} {\cent}_{1} \langle 7i-1 \rangle_{\diamond}\right) \$_{2} \prod\limits_{i=1}^{n} \left( \# \langle 7i+C_{v}(i) \rangle_{v} {\cent}_{2} \langle 7i-2 \rangle_{\diamond}\right) \$_{3}\\ &&\prod\limits_{i=1}^{n} \left( \langle 7i+C_{v}(i) \rangle_{v} \# \langle 7i-2 \rangle_{\diamond} {\cent}_{1}\right) \$_{4} \prod\limits_{i=1}^{n} \left( \langle 7i+C_{v}(i) \rangle_{v} \# \langle 7i-1 \rangle_{\diamond} {\cent}_{2}\right) \$_{5}\\ &&\prod\limits_{i=1}^{n} \left( \# \langle 7i+C_{v}(i) \rangle_{v} \# \langle 7i \rangle_{\diamond}\right) \$_{6} \end{array} $$
$$ \begin{array}{@{}rcl@{}} w = \prod\limits_{i=1}^{m-1} (\!\!\!\!\!&&\# \langle 7j_{2i-1}+C_{v}(j_{2i-1}) \rangle_{v} \# \langle 7j_{2i}+C_{v}(j_{2i}) \rangle_{v} \# \langle 7i+C_{e}(v_{j_{2i}},v_{j_{2i+1}}) \rangle_{\diamond} )\\ &&\# \langle 7j_{2m-1}+C_{v}(j_{2m-1}) \rangle_{v} \# \langle 7j_{2m}+C_{v}(j_{2m}) \rangle_{v} \# \end{array} $$
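To make the definition of \(w_{\mathcal{G}}\) concrete, the following Python sketch builds the word for a small graph. The encoding is our own assumption: each symbol of Σ is a string ("x1"…"x7", "d1"…"d7", "*" for ⋆, "#", "c1"/"c2" for ¢1/¢2, "$1"…"$6"). The colourings are passed in as plain data; the example uses K4 with its edges in a fixed, arbitrarily chosen order, a proper colouring \(C_{v}\) of \(\mathcal{G}^{2}\) (all colours distinct, since \(\mathcal{G}^{2}\) is again K4) and a hand-picked proper 4-edge-colouring \(C_{e}\) of the resulting multi-graph \(\mathcal{G}^{\prime}\).

```python
def M(q, k):
    """Positive modulo: the remainder of q modulo k, taken from {1, ..., k}."""
    return q % k or k


def codeword(kind, i):
    """<i>_v for kind 'x', <i>_diamond for kind 'd'; the string '*' stands for star."""
    half = []
    while i > 0:
        a = M(i, 7)
        half.append(f"{kind}{a}")
        i = (i - a) // 7
    return half + ["*"] + half[::-1]


def layout(n, edges, c_v, c_e):
    """w_G as a list of items: plain symbols (str) or codewords (kind, index).

    edges: the m edges as pairs (j_{2i-1}, j_{2i}); c_v: vertex -> colour in 1..7;
    c_e: m - 1 colours, c_e[i - 1] being the colour of {v_{j_{2i}}, v_{j_{2i+1}}} in G'.
    """
    top = 14 * n
    X = {i: ("x", 7 * i + c_v[i]) for i in range(1, n + 1)}   # codeword of vertex v_i
    items = []
    for j in range(7):                                          # the factor u
        for i in range(1, top + 1):
            items += [("d", i), ("x", M(i + j, top))]
    items.append("$1")
    blocks = [                                                  # the five products in v
        lambda i: ["#", X[i], "c1", ("d", 7 * i - 1)],
        lambda i: ["#", X[i], "c2", ("d", 7 * i - 2)],
        lambda i: [X[i], "#", ("d", 7 * i - 2), "c1"],
        lambda i: [X[i], "#", ("d", 7 * i - 1), "c2"],
        lambda i: ["#", X[i], "#", ("d", 7 * i)],
    ]
    for s, block in enumerate(blocks, start=2):
        for i in range(1, n + 1):
            items += block(i)
        items.append(f"${s}")
    for i, (a, b) in enumerate(edges, start=1):                 # the factor w
        items += ["#", X[a], "#", X[b]]
        if i < len(edges):
            items += ["#", ("d", 7 * i + c_e[i - 1])]
    items.append("#")
    return items


def build_word(n, edges, c_v, c_e):
    """Expands every codeword item into its symbols."""
    word = []
    for it in layout(n, edges, c_v, c_e):
        word += [it] if isinstance(it, str) else codeword(*it)
    return word


# K4 with its edges in a fixed order, a proper C_v (for G^2) and a proper C_e (for G').
edges = [(1, 2), (3, 4), (1, 3), (2, 4), (1, 4), (2, 3)]
word = build_word(4, edges, {1: 1, 2: 2, 3: 3, 4: 4}, [1, 1, 2, 2, 3])
print(word.count("*"), word.count("#"))  # 841 42
```

For n = 4 and m = 6, the printed counts agree with the definitions: u contributes 196n occurrences of ⋆, v contributes 10n and w contributes 3m − 1, while # occurs 6n times in v and 3m times in w.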

This concludes the definition of the reduction. Since the following proof of correctness is rather involved, we first present a road-map to make it more accessible:

  • First, and completely independently of the question of how a grammar could compress \(w_{\mathcal {G}}\), we take a closer look at the structure of this word. More precisely, in Propositions 1 and 2, we show that if a factor of \(w_{\mathcal {G}}\) contains the symbol ⋆ of some codeword \(\langle i \rangle_{v}\) or \(\langle i \rangle_{\diamond}\) and also reaches beyond the boundaries of this codeword (if a codeword \(\langle i \rangle_{v}\) is next to an occurrence of #, the factor has to reach beyond this # as well), then it is not repeated in \(w_{\mathcal {G}}\). This property is the main reason for the complicated structure of \(w_{\mathcal {G}}\) (especially of the factor v).

  • An immediate consequence of this property is that, in a smallest grammar, any nonterminal that derives a factor with an occurrence of ⋆ necessarily derives a factor that is completely contained in some codeword \(\langle i \rangle_{\diamond}\), in some codeword \(\langle i \rangle_{v}\), or in some factor \(\#\langle i \rangle_{v}\#\) (see Lemma 4).

  • Next, we show that we may assume that a smallest grammar contains, for each codeword, a nonterminal whose derivative is exactly this codeword (see Lemma 5).

  • The next result (Lemma 6) states that we may also assume that a smallest grammar contains, for every i, 1 ≤ i ≤ n, a nonterminal with derivative \(\#\langle 7i + C_{v}(i)\rangle_{v}\) and a nonterminal with derivative \(\langle 7i + C_{v}(i)\rangle_{v}\#\).

  • Finally, we are able to fix the structure of a smallest grammar (Lemma 7), and we can show that, just like in the reduction from [33, 34] (see Page 16), the set of rules that derive factors of the form \(\#\langle 7i + C_{v}(i)\rangle_{v}\#\) can be transformed into a vertex cover (see Lemma 8).

The following simple but crucial observation will be helpful throughout the proof of correctness:

Observation 2

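The word \(w_{\mathcal {G}}\) contains each of the symbols $1, …, $6 exactly once. A nonterminal whose derivative contains one of these symbols would be used exactly once in the derivation, and replacing its single occurrence by the right side of its rule would yield a smaller grammar. Hence, any smallest grammar G = (N, Σ, R, ax) for \(w_{\mathcal {G}}\) has an axiom of the form \({\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\) with \(\beta _{i} \in ((N\cup {\Sigma }) \setminus \{\$_{1},\dots ,\$_{6}\})^{+}\), 1 ≤ i ≤ 7.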

We now prove the two propositions that establish the property with respect to the repetitions of factors containing ⋆.

Proposition 1

For every i, 1 ≤ i ≤ 14n, and j, 1 ≤ j ≤ 7, the word \(w_{\mathcal {G}}\) contains at most one occurrence of a factor of the form

$$ \begin{array}{@{}rcl@{}} \star f(i)^{R} d_{j}, \qquad d_{j} f(i) \star, \qquad \star g(i)^{R} x_{j}, \qquad x_{j} g(i) \star . \end{array} $$

Furthermore, if such a factor occurs in \(w_{\mathcal {G}}\), then the occurrence is in u.

Proof

We first note that factors of the form stated in the proposition can only occur in factors of the form \(\langle i \rangle _{v} \langle i^{\prime } \rangle _{\diamond }\) or \(\langle i \rangle _{\diamond } \langle i^{\prime } \rangle _{v}\). Since such factors only occur in u, the second statement of the proposition holds.

We first take care of factors of the form \(\langle i \rangle _{v} d_{j^{\prime }}\), 1 ≤ i ≤ 14n, \(1 \leq j^{\prime } \leq 7\). These factors are subwords of \(\langle M(x + j,14n)\rangle_{v} \langle x + 1\rangle_{\diamond}\) for some \(j\in \{0,\dots ,6\}\) and x such that i = M(x + j, 14n), which, for each choice of the pair (j, x), occur at most once in u. (If x = 14n, the codeword \(\langle x + 1\rangle_{\diamond}\) is to be read as \(\langle 1\rangle_{\diamond}\), the first codeword of the next block; for j = 6, there is no next block and u continues with $1 instead, see below. Since 14n is a multiple of 7, \(g(14n+1)[1] = g(1)[1] = d_{1}\), so the symbol following \(\langle i \rangle_{v}\) is still g(x + 1)[1].) For every i, 6 < i ≤ 14n, this gives the seven choices (j, i − j) with 0 ≤ j ≤ 6; note that i = M(x + j, 14n) implies x = i − j. This shows that the word u contains the subword \(\langle i \rangle_{v}\, g(x + 1)[1] = \langle i \rangle_{v}\, g(i - j + 1)[1]\) once for each j, 0 ≤ j ≤ 6, and these are the only occurrences of a subword of the form \(\langle i \rangle _{v} d_{j^{\prime }}\) with \(j^{\prime }\in \{1,\dots ,7\}\) in u. Since \(\{g(i-j+1)[1] \mid 0 \leq j \leq 6\}=\{d_{1},\dots ,d_{7}\}\) by Observation 1, it follows that no subword of the form \(\langle i \rangle _{v}d_{j^{\prime }}\) with \(j^{\prime }\in \{1,\dots ,7\}\) appears in u more than once. For every i, 1 ≤ i ≤ 6, the value of x wraps around modulo 14n, and the pairs (j, x) are (j, i − j) for 0 ≤ j < i and (j, 14n − j + i) for i ≤ j ≤ 6. Hence, the word u contains the subword \(\langle i \rangle_{v}\, g(i - j + 1)[1]\) once for each j, 0 ≤ j < i, and the subword \(\langle i \rangle_{v}\, g(14n - j + i + 1)[1]\) once for each j, i ≤ j ≤ 6, and these are the only occurrences of a subword of the form \(\langle i \rangle _{v} d_{j^{\prime }}\) with \(j^{\prime }\in \{1,\dots ,7\}\) in u. Since 14n ≡ 0 (mod 7), adding 7 and substituting j by 7 − r yields \(\{g(14n-j+i+1)[1] \mid i \leq j \leq 6\} = \{g(i+1+r)[1] \mid 1 \leq r \leq 7-i\}\) and \(\{g(i-j+1)[1] \mid 0 \leq j < i\} = \{g(i+1+r)[1] \mid 7-i < r \leq 7\}\). By Observation 1, we can hence conclude that each subword of the form \(\langle i \rangle _{v}d_{j^{\prime }}\) with \(j^{\prime }\in \{1,\dots ,7\}\) appears in u at most once. Note that, for i = 6, the factor \(\langle i \rangle_{v}\, g(14n - j + i + 1)[1]\) for the only choice j = 6 does not occur, as in this case u ends and \(\langle 6 \rangle_{v}\) is followed by $1.
Consequently, for every i, 1 ≤ i ≤ 14n, every factor \(\star f(i)^{R} d_{j}\), 1 ≤ j ≤ 7, has at most one occurrence in u.

Analogously, we can show that, for every i, 1 ≤ i ≤ 14n, every factor \(d_{j} f(i) \star\), 1 ≤ j ≤ 7, has at most one occurrence in u. More precisely, it is sufficient to observe that, for every i, 6 < i ≤ 14n, the word u contains the subword \(g(i-j)[1] \langle i \rangle_{v}\) once for each j, 0 ≤ j ≤ 6, and that, for every i, 1 ≤ i ≤ 6, it contains the subword \(g(i-j)[1] \langle i \rangle_{v}\) once for each j, 0 ≤ j ≤ i − 1, and the subword \(g(14n-j)[1] \langle i \rangle_{v}\) once for each j, 0 ≤ j ≤ 6 − i. As before, these are the only occurrences of a subword of the form \(d_{j^{\prime }} \langle i \rangle _{v}\) with \(j^{\prime }\in \{1,\dots ,7\}\) in u, and their first symbols are pairwise distinct by Observation 1.

For every i, 1 ≤ i ≤ 14n, there are exactly 7 occurrences of factors of the form \(\star g(i)^{R} x_{j}\), 1 ≤ j ≤ 7. Let \(\star g(i)^{R} x_{j_{\ell }}\), 1 ≤ ℓ ≤ 7, be these 7 factors. By the structure of u (the occurrence of \(\langle i \rangle_{\diamond}\) in the j-th block is followed by \(\langle M(i+j,14n) \rangle_{v}\), whose first symbol is \(x_{M(i+j,7)}\)), we observe that \(\{x_{j_{\ell}} \mid 1 \leq \ell \leq 7\} = \{x_{1}, x_{2}, \dots , x_{7}\}\), which directly implies that, for every i, 1 ≤ i ≤ 14n, every factor \(\star g(i)^{R} x_{j}\), 1 ≤ j ≤ 7, has at most one occurrence in u. Analogously, we can show that, for every i, 1 < i ≤ 14n, every factor of the form \(x_{j} g(i) \star\), 1 ≤ j ≤ 7, has at most one occurrence in u. Finally, there are exactly 6 occurrences of factors of the form \(x_{j} g(1) \star\), 1 ≤ j ≤ 7, namely of \(f(14n)[1]\, g(1) \star\) and of \(f(j)[1]\, g(1) \star\), 1 ≤ j ≤ 5. Since \(\{f(14n)[1], f(j)[1] \mid 1 \leq j \leq 5\} = \{x_{7}, x_{1}, x_{2}, \dots , x_{5}\}\), it follows that every factor of the form \(x_{j} g(1) \star\), 1 ≤ j ≤ 7, has at most one occurrence in u. □

Proposition 2

For every i, 1 ≤ i ≤ 14n, and j, 1 ≤ j ≤ 7, the word \(w_{\mathcal {G}}\) contains at most one occurrence of a factor of the form

$$ \begin{array}{@{}rcl@{}} &&\star g(i)^{R} y,\qquad y g(i) \star, \qquad \star f(i)^{R} z, \qquad z f(i) \star,\\ &&d_{j} \# f(i) \star, \qquad \star f(i)^{R} \# d_{j}, \star f(i)^{R} \# x_{j}, \quad x_{j} \# f(i) \star , \end{array} $$

where y ∈Σ∖{d1,…, d7} and z ∈Σ∖{x1,…, x7, #}.

Proof

We first consider the factors \(\star g(i)^{R} y\) with \(y \in \Sigma \setminus \{d_{1},\dots , d_{7}\}\). In the case \(y \in \{x_{1},\dots , x_{7}\}\), Proposition 1 shows that such factors have at most one occurrence in \(w_{\mathcal {G}}\). For \(y \in \{\star , \#, {\cent}_{1}, {\cent}_{2}, \$_{1},\dots , \$_{6}\}\), there are occurrences of factors of the form \(\star g(i)^{R} y\) in v and in w, but not in u. We note that any two occurrences of factors \(\star g(i)^{R} y\) and \(\star g(i^{\prime })^{R} y^{\prime }\) in w satisfy \(i \neq i^{\prime }\) and are therefore different. Moreover, all factors \(\star g(i)^{R} y\) in w satisfy \(g(i)[1] \in \{d_{1}, d_{2}, d_{3}, d_{4}\}\) (this is due to the colouring \(C_{e}\)). We next observe that all factors \(\star g(i)^{R} y\) in v satisfy \(i \in \{7i^{\prime }, 7i^{\prime }-1, 7i^{\prime }-2 \mid i^{\prime } \in \mathbb {N}\}\), which implies that, for these factors, \(g(i)[1] \in \{d_{5}, d_{6}, d_{7}\}\); thus, they all differ from the factors \(\star g(i)^{R} y\) in w. Consequently, if a factor of the form \(\star g(i)^{R} y\) repeats, then there must be two distinct occurrences of factors \(\langle i \rangle_{\diamond} y\) and \(\langle i \rangle _{\diamond } y^{\prime }\) in v. This is only the case for \(i = 7i^{\prime } - 1\), but then there are exactly two such factors, with \(y \in \{\#, \$_{2}\}\) and \(y^{\prime } = {\cent}_{2}\), or for \(i = 7i^{\prime } - 2\), but then there are exactly two such factors, with \(y \in \{\#, \$_{3}\}\) and \(y^{\prime } = {\cent}_{1}\). This shows that each factor \(\star g(i)^{R} y\) with \(y \in \Sigma \setminus \{d_{1},\dots , d_{7}\}\) has at most one occurrence in \(w_{\mathcal {G}}\). For the factors \(y g(i) \star\), the argument is the same up to the point where we consider two distinct occurrences of factors \(y \langle i \rangle_{\diamond}\) and \(y^{\prime } \langle i \rangle _{\diamond }\) in v. Again, this is only possible for \(i = 7i^{\prime } - 1\) or \(i = 7i^{\prime } - 2\), but in the first case, we have \(y = {\cent}_{1}\), \(y^{\prime } = \#\), while in the second case, we have \(y = {\cent}_{2}\), \(y^{\prime } = \#\).

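We next turn to the factors \(\star f(i)^{R} z\) with \(z \in \Sigma \setminus \{x_{1},\dots , x_{7}, \#\}\). Again, Proposition 1 shows that, for \(z \in \{d_{1},\dots , d_{7}\}\), such factors have at most one occurrence in \(w_{\mathcal {G}}\); thus, we consider the case \(z \in \{\star , {\cent}_{1}, {\cent}_{2}, \$_{1},\dots , \$_{6}\}\). We first note that, apart from the factor \(\star f(6)^{R} \$_{1}\) at the end of u (which occurs only once, as $1 does), such factors have no occurrence in u. Moreover, for every i, 1 ≤ i ≤ 14n, the codeword \(\langle i \rangle_{v}\) either has no occurrence in vw, or it has exactly 5 occurrences in v and at most 3 occurrences in w (the latter is due to the fact that \(\mathcal {G}\) is subcubic), and in each of these occurrences it is followed by a symbol not in \(\{d_{1},\dots , d_{7}, x_{1},\dots , x_{7}\}\). However, this symbol is # for all but two of these occurrences; in one of them it is \({\cent}_{1}\) and in the other one \({\cent}_{2}\). Consequently, each factor \(\star f(i)^{R} z\) with \(z \in \Sigma \setminus \{x_{1},\dots , x_{7}, \#\}\) has at most one occurrence in \(w_{\mathcal {G}}\). The argument for the factors \(z f(i) \star\) with \(z \in \Sigma \setminus \{x_{1},\dots , x_{7}, \#\}\) is analogous, with the difference that the only two occurrences of a factor \(y \langle i \rangle_{v}\) in v with \(y \notin \{d_{1},\dots , d_{7}, x_{1},\dots , x_{7}, \#\}\) are one with \(y \in \{\$_{3}, {\cent}_{1}\}\) and one with \(y \in \{\$_{4}, {\cent}_{2}\}\).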

We next consider the factors \(d_{j} \# f(i) \star\) and first note that such a factor only occurs in a factor \(\#\langle i \rangle_{v}\) that is preceded by a codeword \(\langle i^{\prime } \rangle _{\diamond }\), for some \(i^{\prime }\), \(1 \leq i^{\prime } \leq 14n\), and that such factors only occur in v or w. In v, there are either no or exactly 3 occurrences of \(\#\langle i \rangle_{v}\). The first one is either a prefix of v or preceded by \(\langle 7\ell - 1\rangle_{\diamond}\), \(1 \leq \ell < n\), the second one is preceded by either $2 or \(\langle 7\ell - 2\rangle_{\diamond}\), \(1 \leq \ell < n\), and the third one is preceded by either $5 or \(\langle 7\ell \rangle_{\diamond}\), \(1 \leq \ell < n\). Hence, these three occurrences are preceded by the symbols d6, d5 and d7, respectively (or by symbols not in {d1, …, d7}). Consequently, no factor \(d_{j} \# f(i) \star\) is repeated in v, and if such a factor occurs in v, then j ∈ {5, 6, 7}. Next, we note that every \(\#\langle i \rangle_{v}\) in w that is preceded by some \(\langle i^{\prime } \rangle _{\diamond }\) satisfies \(i^{\prime } = 7\ell + C_{e}(v_{j_{2\ell }}, v_{j_{2\ell + 1}})\) for some ℓ, and since the range of \(C_{e}\) is {1, 2, 3, 4}, this occurrence of \(\#\langle i \rangle_{v}\) is preceded by d1, d2, d3 or d4. Finally, we have to show that no such factor is repeated in w. To this end, we assume that, for some vertex \(v_{i}\), the factor \(d_{j} \# \langle 7i + C_{v}(i) \rangle_{v}\) with j ∈ {1, 2, 3, 4} occurs twice in w. This implies that there are \(k, k^{\prime }\), \(2 \leq k < k^{\prime } \leq m\), with \(j_{2k-1} = j_{2k^{\prime }-1} = i\), and, furthermore, \(\langle 7(k - 1) + C_{e}(v_{j_{2(k-1)}}, v_{j_{2(k - 1)+1}}) \rangle _{\diamond }\) and \(\langle 7(k^{\prime } - 1) + C_{e}(v_{j_{2(k^{\prime }-1)}}, v_{j_{2(k^{\prime } - 1)+1}}) \rangle _{\diamond }\) both end with the symbol \(d_{j}\). Thus, \(C_{e}(v_{j_{2(k-1)}}, v_{j_{2(k - 1)+1}}) = C_{e}(v_{j_{2(k^{\prime }-1)}}, v_{j_{2(k^{\prime } - 1)+1}}) = j\), which is a contradiction, since the edges \(\{v_{j_{2(k-1)}}, v_{j_{2(k - 1)+1}}\}\) and \(\{v_{j_{2(k^{\prime }-1)}}, v_{j_{2(k^{\prime } - 1)+1}}\}\) of \(\mathcal {G}^{\prime }\) are distinct, both are incident with the vertex \(v_{j_{2k-1}} = v_{j_{2k^{\prime }-1}} = v_{i}\), and \(C_{e}\) is a proper edge colouring of \(\mathcal {G}^{\prime }\).
Consequently, no factor \(d_{j} \# f(i) \star\) is repeated in w; thus, the word \(w_{\mathcal {G}}\) contains at most one occurrence of each factor of the form \(d_{j} \# f(i) \star\).

In an analogous way, we can show that every factor of the form \(\star f(i)^{R} \# d_{j}\) satisfies j ∈ {5, 6, 7} if it occurs in v, and j ∈ {1, 2, 3, 4} if it occurs in w. That these factors do not repeat follows from the fact that \(\star f(i)^{R} \#\) occurs at most 3 times in v (followed by the distinct symbols d5, d6 and d7) and that the occurrences of \(\star f(i)^{R} \#\) in w that are followed by a codeword \(\langle i^{\prime} \rangle_{\diamond}\) are followed by distinct symbols from {d1, d2, d3, d4}, due to the proper edge colouring \(C_{e}\) of \(\mathcal {G}^{\prime }\). Thus, the word \(w_{\mathcal {G}}\) contains at most one occurrence of each factor of the form \(\star f(i)^{R} \# d_{j}\).

For any i, 1 ≤ i ≤ 14n, and j, 1 ≤ j ≤ 7, the factor \(\star f(i)^{R} \# x_{j}\) only occurs in w, and only in a factor of the form \(\langle 7\ell +C_{v}(\ell ) \rangle _{v} \# \langle 7\ell ^{\prime }+C_{v}(\ell ^{\prime }) \rangle _{v}\), \(1 \leq \ell , \ell ^{\prime } \leq n\), with \(\{v_{\ell}, v_{\ell^{\prime}}\} \in E\), \(i = 7\ell + C_{v}(\ell)\) and \(f(7\ell ^{\prime }+C_{v}(\ell ^{\prime }))[1] = x_{j}\). Hence, if \(\star f(i)^{R} \# x_{j}\) has two occurrences, then (as \(\mathcal {G}\) has no multiple edges) there are distinct \(\ell ^{\prime }, \ell ^{\prime \prime }\), \(1 \leq \ell ^{\prime }, \ell ^{\prime \prime } \leq n\), such that the vertices \(v_{\ell ^{\prime }}\) and \(v_{\ell ^{\prime \prime }}\) are neighbours of \(v_{\ell}\) in \(\mathcal {G}\) and \(f(7\ell ^{\prime }+C_{v}(\ell ^{\prime }))[1] = f(7\ell ^{\prime \prime }+C_{v}(\ell ^{\prime \prime }))[1] = x_{j}\), which implies \(C_{v}(\ell ^{\prime }) = C_{v}(\ell ^{\prime \prime }) = j\). Since \(v_{\ell ^{\prime }}\) and \(v_{\ell ^{\prime \prime }}\) are adjacent in \(\mathcal {G}^{2}\), this contradicts the fact that \(C_{v}\) is a proper vertex colouring of \(\mathcal {G}^{2}\). In an analogous way, it follows that no factor \(x_{j} \# f(i) \star\) is repeated. □
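Propositions 1 and 2 lend themselves to a brute-force sanity check on small instances; such a check is of course no substitute for the proofs. The Python sketch below is our own (symbols of Σ are encoded as strings, e.g. "*" for ⋆ and "c1" for ¢1, and all names are hypothetical). It rebuilds \(w_{\mathcal{G}}\) for K4 with a fixed edge order, collects for every codeword the factors that start at its ⋆ and reach one symbol beyond the codeword (two symbols if a vertex codeword is next to #), in both directions, and checks that each such factor occurs exactly once. Replacing \(C_{e}\) by a colouring that is not proper makes the check fail, as the proof of Proposition 2 predicts.

```python
from collections import Counter


def M(q, k):
    return q % k or k


def layout(n, edges, c_v, c_e):
    """Items of w_G: plain symbols (str) or codewords (kind, index) with kind 'x' or 'd'."""
    top = 14 * n
    X = {i: ("x", 7 * i + c_v[i]) for i in range(1, n + 1)}
    items = [it for j in range(7) for i in range(1, top + 1)
             for it in (("d", i), ("x", M(i + j, top)))]
    items.append("$1")
    blocks = [lambda i: ["#", X[i], "c1", ("d", 7 * i - 1)],
              lambda i: ["#", X[i], "c2", ("d", 7 * i - 2)],
              lambda i: [X[i], "#", ("d", 7 * i - 2), "c1"],
              lambda i: [X[i], "#", ("d", 7 * i - 1), "c2"],
              lambda i: ["#", X[i], "#", ("d", 7 * i)]]
    for s, block in enumerate(blocks, start=2):
        items += [it for i in range(1, n + 1) for it in block(i)] + [f"${s}"]
    for i, (a, b) in enumerate(edges, start=1):
        items += ["#", X[a], "#", X[b]]
        if i < len(edges):
            items += ["#", ("d", 7 * i + c_e[i - 1])]
    return items + ["#"]


def flatten(items):
    """Symbols of w_G, plus (start, star, end, kind) for every codeword occurrence."""
    word, spans = [], []
    for it in items:
        if isinstance(it, str):
            word.append(it)
            continue
        kind, i = it
        half = []
        while i > 0:
            a = M(i, 7)
            half.append(f"{kind}{a}")
            i = (i - a) // 7
        start = len(word)
        word += half + ["*"] + half[::-1]
        spans.append((start, start + len(half), len(word), kind))
    return word, spans


def star_factors(word, spans):
    """For every codeword: the factor from its star to one symbol beyond the codeword, in
    both directions; for a vertex codeword next to '#', also the symbol behind the '#'."""
    out = []
    for start, star, end, kind in spans:
        right = 2 if kind == "x" and end < len(word) and word[end] == "#" else 1
        if end + right <= len(word):
            out.append(tuple(word[star:end + right]))
        left = 2 if kind == "x" and start > 0 and word[start - 1] == "#" else 1
        if start - left >= 0:
            out.append(tuple(word[start - left:star + 1]))
    return out


def all_unique(word, factors):
    """True iff each of the given factors occurs exactly once in word."""
    counts = Counter()
    for length in {len(fac) for fac in factors}:
        for p in range(len(word) - length + 1):
            counts[tuple(word[p:p + length])] += 1
    return all(counts[fac] == 1 for fac in factors)


edges = [(1, 2), (3, 4), (1, 3), (2, 4), (1, 4), (2, 3)]   # K4, edges in a fixed order
c_v = {1: 1, 2: 2, 3: 3, 4: 4}                              # proper colouring of G^2
for c_e in ([1, 1, 2, 2, 3], [1, 1, 1, 1, 1]):              # proper / improper for G'
    word, spans = flatten(layout(4, edges, c_v, c_e))
    print(c_e, all_unique(word, star_factors(word, spans)))
# [1, 1, 2, 2, 3] True
# [1, 1, 1, 1, 1] False
```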

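Since, in a smallest grammar, no nonterminal derives a factor that occurs only once in the derived word (such a nonterminal would be used exactly once, and replacing its single occurrence by the right side of its rule would yield a smaller grammar), Propositions 1 and 2 yield the following: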

Lemma 4

For every smallest grammar G = (N, Σ, R, ax) for \(w_{\mathcal {G}}\) and every A ∈ N, \(|\mathfrak {D}(A)|_{\star } \geq 1\) implies that \(\mathfrak {D}(A)\) is a factor of some \(\#\langle 7i + C_{v}(i)\rangle_{v}\#\), 1 ≤ i ≤ n, or a factor of some \(\langle j \rangle_{v}\), 1 ≤ j ≤ 14n, or a factor of some \(\langle j \rangle_{\diamond}\), 1 ≤ j ≤ 14n.

The main consequence of Lemma 4 is that, in a smallest grammar, the length of the axiom is at least the number of occurrences of ⋆ in \(w_{\mathcal {G}}\). This allows us to show that, without increasing the size of the grammar, the axiom can be restructured such that each individual codeword is produced by its own nonterminal.

Lemma 5

There is a smallest grammar G for \(w_{\mathcal {G}}\) such that, for every i, 1 ≤ i ≤ 14n, there is a nonterminal with derivative \(\langle i \rangle_{\diamond}\) and a nonterminal with derivative \(\langle i \rangle_{v}\).

Proof

Let G = (N, Σ, R, ax) be a smallest grammar with \(\mathfrak {D}(G) = w_{\mathcal {G}}\). We first show how G can be modified in such a way that, for every i, 1 ≤ i ≤ 14n, there is a nonterminal with derivative \(\langle i \rangle_{\diamond}\). To this end, let \(\mathfrak {I}_{\diamond } \subseteq \{1, 2, \ldots , 14n\}\) be such that, for every i, 1 ≤ i ≤ 14n, G contains a nonterminal with derivative \(\langle i \rangle_{\diamond}\) if and only if \(i \in \mathfrak {I}_{\diamond }\); furthermore, let \(\overline {\mathfrak {I}_{\diamond }} = \{1, 2, \ldots , 14n\} \setminus \mathfrak {I}_{\diamond }\). For every \(i \in \mathfrak {I}_{\diamond }\), let \(\widehat {D}_{i}\) be the nonterminal with \(\mathfrak {D}(\widehat {D}_{i}) = \langle i \rangle _{\diamond }\).

We now recursively define a set of rules \(R_{\diamond} := \{r_{\diamond , i} \mid 1 \leq i \leq 14n\}\) for nonterminals \(D_{i}\), 1 ≤ i ≤ 14n, by \(r_{\diamond , i} := D_{i} \rightarrow d_{i} \star d_{i}\), 1 ≤ i ≤ 7, and \(r_{\diamond , i} := D_{i} \rightarrow g(i)[1] D_{h(i)} g(i)[1]\), 8 ≤ i ≤ 14n, where \(h(i):= \frac {i-M(i,7)}{7}\). Since \(g(i) = g(i)[1]\, g(h(i))\) for i ≥ 8, induction on i shows that \(\mathfrak {D}(D_{i}) = \langle i \rangle _{\diamond }\), 1 ≤ i ≤ 14n. We modify G by the following algorithm. For every i = 1, 2, …, 14n, if \(i \in \overline {\mathfrak {I}_{\diamond }}\), then we add the rule \(r_{\diamond , i}\) from \(R_{\diamond}\) to G, and if \(i \in \mathfrak {I}_{\diamond }\), then we rename \(\widehat {D}_{i}\) to \(D_{i}\), i. e., we replace the rule \(\widehat {D}_{i} \to \alpha \) by \(D_{i} \to \alpha\) (and every occurrence of \(\widehat {D}_{i}\) by \(D_{i}\)). Furthermore, we can carry out an analogous modification with respect to the derivatives \(\langle i \rangle_{v}\). More precisely, we define \(\mathfrak {I}_{v} \subseteq \{1, 2, \ldots , 14n\}\) such that there is a nonterminal with derivative \(\langle i \rangle_{v}\) exactly for the \(i \in \mathfrak {I}_{v}\). Then, in the same way as above, we can add rules from the set \(R_{v} := \{r_{v, i} \mid 1 \leq i \leq 14n\}\), where \(r_{v, i} := V_{i} \rightarrow x_{i} \star x_{i}\), 1 ≤ i ≤ 7, and \(r_{v, i} := V_{i} \rightarrow f(i)[1] V_{h(i)} f(i)[1]\), 8 ≤ i ≤ 14n, with h as above.

We denote this modified grammar by \(G^{\prime }\) and note that, by the considerations from above, for every i, 1 ≤ i ≤ 14n, \(G^{\prime }\) contains nonterminals Di and Vi with

$$ \mathfrak{D}(D_{i}) = \langle i \rangle_{\diamond} \text{ and } \mathfrak{D}(V_{i}) = \langle i \rangle_{v}, 1 \leq i \leq 14n . $$
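The rule sets \(R_{\diamond}\) and \(R_{v}\) can be checked mechanically. In the Python sketch below (our own encoding: terminals are strings such as "d3" or "*", nonterminals are pairs such as ("D", 12); all names are hypothetical), the rules are built for a fixed n, every \(D_{i}\) and \(V_{i}\) is expanded and compared with the corresponding codeword, and the total size of all rules and the length of the new axiom part \(\beta^{\prime}_{1}\) used below are recomputed.

```python
def M(q, k):
    return q % k or k


def h(i):
    """h(i) = (i - M(i, 7)) / 7: drops the least significant digit a_0 of i."""
    return (i - M(i, 7)) // 7


def codeword(kind, i):
    """<i>_v (kind 'x') or <i>_diamond (kind 'd'), built digit by digit."""
    half = []
    while i > 0:
        half.append(f"{kind}{M(i, 7)}")
        i = h(i)
    return half + ["*"] + half[::-1]


def lemma5_rules(kind, name, n):
    """X_i -> c * c for i <= 7 and X_i -> c X_{h(i)} c for 8 <= i <= 14n, with c = kind + M(i, 7)."""
    rules = {}
    for i in range(1, 14 * n + 1):
        c = f"{kind}{M(i, 7)}"
        rules[(name, i)] = [c, "*", c] if i <= 7 else [c, (name, h(i)), c]
    return rules


def expand(rules, symbol):
    """Derivative of a symbol: terminals are strings, nonterminals are (name, index) pairs."""
    if isinstance(symbol, str):
        return [symbol]
    return [t for part in rules[symbol] for t in expand(rules, part)]


n = 4
rules = {**lemma5_rules("d", "D", n), **lemma5_rules("x", "V", n)}
for i in range(1, 14 * n + 1):
    assert expand(rules, ("D", i)) == codeword("d", i)
    assert expand(rules, ("V", i)) == codeword("x", i)

# beta'_1 = prod_{j=0}^{6} prod_{i=1}^{14n} D_i V_{M(i+j, 14n)}
beta1 = [s for j in range(7) for i in range(1, 14 * n + 1)
         for s in (("D", i), ("V", M(i + j, 14 * n)))]
print(sum(len(rhs) for rhs in rules.values()), len(beta1))  # 336 784
```

Every rule has size 3, so even adding all of \(R_{\diamond} \cup R_{v}\) costs only 6 · 14n = 84n (336 for n = 4), while \(\beta^{\prime}_{1}\) has length 196n (784 for n = 4).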

Moreover, since every rule from \(R_{\diamond}\) and \(R_{v}\) has size 3, \(|G^{\prime }| = |G| + 3(|\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|)\). In the remainder of this proof, we show that this size increase can be compensated by using the new rules to significantly shorten the axiom. Hence, we obtain a smallest grammar with the properties claimed in the lemma. To this end, we first measure the size of the axiom of the original grammar G.

Claim 1: \(\mathsf {ax} = {\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\), where \(\beta _{i}\in ((N\cup {\Sigma }) \setminus \{\$_{1},\dots ,\$_{6}\})^{+}\), 1 ≤ i ≤ 7, and β1 contains at least 196n occurrences of symbols (terminal or nonterminal) that each produce exactly one occurrence of ⋆.

Proof of Claim 1: From Observation 2, it follows that \(\mathsf {ax} = {\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\), \(\beta _{i}\in ((N\cup {\Sigma }) \setminus \{\$_{1},\dots ,\$_{6}\})^{+}\), 1 ≤ i ≤ 7. Furthermore, β1 derives u without its last symbol $1, and u contains \(|u|_{\star} = 196n\) occurrences of ⋆. By Lemma 4, no symbol of β1 (terminal or nonterminal) produces two of these occurrences of ⋆. Hence, β1 contains at least 196n occurrences of symbols that each produce exactly one occurrence of ⋆. (Claim 1) \(\square \)

Claim 2: There are at least \(7\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|}{2}\rceil \) occurrences of symbols in β1 (terminal or nonterminal), each of which has a derivative without any occurrence of ⋆.

Proof of Claim 2: Let \(i \in \overline {\mathfrak {I}_{\diamond }}\), i. e., there is no nonterminal with derivative \(\langle i \rangle_{\diamond}\). Furthermore, a derivative that properly contains an occurrence of \(\langle i \rangle_{\diamond}\) in u (and the corresponding nonterminal in β1) contains an occurrence of ⋆ and occurrences of symbols from both sets {d1, …, d7} and {x1, …, x7}, which contradicts Lemma 4. Consequently, each of the 7 occurrences of \(\langle i \rangle_{\diamond}\) in u is produced by at least two symbols of β1. Hence, for each of these 7 occurrences, there is one symbol producing a factor of \(\langle i \rangle_{\diamond}\) that contains the symbol ⋆, and a second symbol producing a factor of \(\langle i \rangle_{\diamond}\) that contains symbols from {d1, …, d7}. Due to Lemma 4, this second symbol cannot also produce the next or preceding occurrence of ⋆. This means that, for each \(i \in \overline {\mathfrak {I}_{\diamond }}\), there exist 7 symbols that do not produce an occurrence of ⋆. In the same way, we can conclude that, for each \(i \in \overline {\mathfrak {I}_{v}}\), there exist 7 symbols that do not produce an occurrence of ⋆. However, these symbols in β1 which do not produce ⋆ may coincide, i. e., such a symbol can produce parts of some \(\langle i \rangle_{\diamond}\) with \(i \in \overline {\mathfrak {I}_{\diamond }}\) and of some \(\langle i^{\prime } \rangle _{v}\) with \(i^{\prime } \in \overline {\mathfrak {I}_{v}}\). Since such a symbol lies between two consecutive occurrences of ⋆ in u, and codewords \(\langle \cdot \rangle_{\diamond}\) and \(\langle \cdot \rangle_{v}\) alternate in u, it contributes to at most one occurrence of a codeword \(\langle i \rangle_{\diamond}\) and to at most one occurrence of a codeword \(\langle i^{\prime} \rangle_{v}\). Hence, there are at least \(\max\{7|\overline {\mathfrak {I}_{\diamond }}|, 7|\overline {\mathfrak {I}_{v}}|\} \geq 7\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|}{2}\rceil \) occurrences of symbols in β1 that do not produce an occurrence of ⋆. (Claim 2) \(\square \)

From these two claims, it follows that the axiom of G (and therefore the whole grammar G) has size at least \(196n + 7\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|}{2}\rceil \). We now change \(G^{\prime }\) a second time (into \(G^{\prime \prime }\)), as follows. We replace β1 in the axiom \(\mathsf {ax}^{\prime } = {\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\) of \(G^{\prime }\) (since \(G^{\prime }\) has the same axiom as G, up to renaming of nonterminals, Observation 2 implies that \(\mathsf {ax}^{\prime }\) has this structure) by \(\beta ^{\prime }_{1} = {\prod }^{6}_{j=0} {\prod }^{14n}_{i=1} D_{i} V_{M(i+j,14n)}\). We note that \(|\beta _{1}| \geq 196n + 7\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|}{2}\rceil \), whereas \(|\beta ^{\prime }_{1}| = 196n\). Consequently,

$$ \begin{array}{@{}rcl@{}} |G^{\prime\prime}| &=& \underbrace{|G| + 3(|\overline{\mathfrak{I}_{\diamond}}| + |\overline{\mathfrak{I}_{v}}|)}_{|G^{\prime}|} + |\beta^{\prime}_{1}| - |\beta_{1}|\\ &\leq& |G| + 3(|\overline{\mathfrak{I}_{\diamond}}| + |\overline{\mathfrak{I}_{v}}|) + 196n - \left( 196n + 7\left\lceil \frac{|\overline{\mathfrak{I}_{\diamond}}| + |\overline{\mathfrak{I}_{v}}|}{2}\right\rceil\right)\\ &=& |G| + 3(|\overline{\mathfrak{I}_{\diamond}}| + |\overline{\mathfrak{I}_{v}}|) - 7\left\lceil \frac{|\overline{\mathfrak{I}_{\diamond}}| + |\overline{\mathfrak{I}_{v}}|}{2}\right\rceil\\ &\leq& |G| . \end{array} $$
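The last step of this estimate only uses \(\lceil k/2 \rceil \geq k/2\); writing \(k := |\overline {\mathfrak {I}_{\diamond }}| + |\overline {\mathfrak {I}_{v}}|\) as an abbreviation,

$$ 3k - 7\left\lceil \frac{k}{2}\right\rceil \;\leq\; 3k - \frac{7k}{2} \;=\; -\frac{k}{2} \;\leq\; 0 . $$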

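Since G is a smallest grammar, this means that \(|G^{\prime \prime }| = |G|\); moreover, \(\beta ^{\prime }_{1}\) derives the same word as β1, so \(G^{\prime \prime }\) is a smallest grammar for \(w_{\mathcal {G}}\) with the properties claimed in the lemma. □

In the hardness proof from [33, 34] for the case of unbounded alphabets (see Page 16), one simple but crucial fact was that, for every i, 1 ≤ i ≤ n, we can assume that nonterminals for the factors \(\# v_{i}\) and \(v_{i} \#\) exist. Using the previous lemmas, we now show a similar statement for our reduction: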

Lemma 6

There is a smallest grammar G for \(w_{\mathcal {G}}\) such that, for every i, 1 ≤ i ≤ n, there is a nonterminal with derivative \(\#\langle 7i + C_{v}(i)\rangle_{v}\) and a nonterminal with derivative \(\langle 7i + C_{v}(i)\rangle_{v}\#\).

Proof

Let G = (N, Σ, R, ax) be a smallest grammar for \(w_{\mathcal {G}}\). By Lemma 5, we can assume that, for every i, 1 ≤ i ≤ 14n, there is a nonterminal \(D_{i}\) with derivative \(\langle i \rangle_{\diamond}\) and a nonterminal \(V_{i}\) with derivative \(\langle i \rangle_{v}\).

Let ℓ be the total number of occurrences of symbols from \(\{\star , {\cent}_{1}, {\cent}_{2}, \#, \$_{1}, \dots , \$_{6}\}\) in \(w_{\mathcal {G}}\). We can conclude that \(|\mathsf {ax}| \leq \ell\), since an axiom of length ℓ can be obtained from \(w_{\mathcal {G}}\) (without introducing any new rules) by replacing all occurrences of \(\langle i \rangle_{\diamond}\) and \(\langle i \rangle_{v}\) by \(D_{i}\) and \(V_{i}\), respectively.
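For orientation, ℓ can also be computed explicitly by counting in the definitions of u, v and w (the proof itself only needs the bound \(|\mathsf{ax}| \leq \ell\)): u contains 196n occurrences of ⋆; v contains 10n occurrences of ⋆, 4n occurrences of ¢1 or ¢2 and 6n occurrences of #; and w contains 3m − 1 occurrences of ⋆ and 3m occurrences of #. Hence,

$$ \ell = \underbrace{196n + 10n + (3m-1)}_{\star} + \underbrace{4n}_{{\cent}_{1},\,{\cent}_{2}} + \underbrace{6n + 3m}_{\#} + \underbrace{6}_{\$_{1},\dots,\$_{6}} = 216n + 6m + 5 . $$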

Let \(N_{\mathsf {ax}} = \{A \mid A \in N, |\mathsf {ax}|_{A} \geq 1, |\mathfrak {D}(A)|_{\star } \geq 1\}\) and let \(\Gamma = \{\star , {\cent}_{1}, {\cent}_{2}, \#\}\). Furthermore, for every i, 1 ≤ i ≤ 3, let \(N_{\mathsf {ax}, i} = \{A \mid A \in N_{\mathsf {ax}}, {\sum }_{x \in {\Gamma }} |\mathfrak {D}(A)|_{x} = i\}\). Since, for every \(A \in N_{\mathsf {ax}}\), \({\sum }_{x \in {\Gamma }} |\mathfrak {D}(A)|_{x} > 3\) contradicts Lemma 4, we can conclude that \(\{N_{\mathsf {ax},1}, N_{\mathsf {ax},2}, N_{\mathsf {ax},3}\}\) is a partition of \(N_{\mathsf {ax}}\). Consequently, we can use this partition to estimate the length of the axiom in the following way: \(|\mathsf {ax}| \geq \ell - {\sum }_{A \in N_{\mathsf {ax}, 2}} |\mathsf {ax}|_{A} - 2 {\sum }_{A \in N_{\mathsf {ax}, 3}} |\mathsf {ax}|_{A}\) (note that the occurrences of some \(A \in N_{\mathsf {ax}, j}\), j ∈ {2, 3}, contribute only \(|\mathsf {ax}|_{A}\) to |ax|, but account for exactly \(j|\mathsf {ax}|_{A}\) occurrences of symbols from \(\{\star , {\cent}_{1}, {\cent}_{2}, \#, \$_{1},\dots ,\$_{6}\}\), while every other symbol of ax accounts for at most one such occurrence). Moreover, also due to Lemma 4, for every \(A \in N_{\mathsf {ax}, 2}\), \(\mathfrak {D}(A) = \# f(7i+C_{v}(i)) \star r_{i}\) or \(\mathfrak {D}(A) = r_{i} \star f(7i+C_{v}(i))^{R}\#\) with \(|r_{i}| \leq |f(7i + C_{v}(i))|\) and, for every \(A \in N_{\mathsf {ax}, 3}\), \(\mathfrak {D}(A) = \#\langle 7i+C_{v}(i) \rangle _{v} \#\).

We now add to G, for every i, 1 ≤ i ≤ n, the rules \(\overset {{~}_{\leftarrow }}{V_{i}}\rightarrow \# V_{7i+C_{v}(i)}\) and \({\!}_{\rightarrow }{V_{i}}\rightarrow V_{7i+C_{v}(i)} \#\), and, for every \(A \in N_{\mathsf {ax}, 3}\), we add the rule \(\overset {{~}_{\leftrightarrow }}{V_{i}}\rightarrow \overset {{~}_{\leftarrow }}{V_{i}} \#\), where \(\mathfrak {D}(A) = \#\langle 7i+C_{v}(i) \rangle _{v} \#\). Then, we replace ax by a new axiom \(\mathsf {ax}^{\prime }\) that is obtained from \(w_{\mathcal {G}}\) in the following way. Every codeword \(\langle i \rangle_{\diamond}\) is replaced by \(D_{i}\), and every codeword \(\langle i \rangle_{v}\) in u is replaced by \(V_{i}\). For every occurrence of ⋆ in \(w_{\mathcal {G}}\), if this occurrence of ⋆ is produced (according to ax) by a nonterminal \(A \in N_{\mathsf {ax}, 3}\), which, since \(\mathfrak {D}(A) = \#\langle 7i+C_{v}(i) \rangle _{v} \#\), implies that it is inside a factor \(\#\langle 7i + C_{v}(i)\rangle_{v}\#\), then we replace this factor by \(\overset {{~}_{\leftrightarrow }}{V_{i}}\). All remaining factors of the form \(\#\langle 7i + C_{v}(i)\rangle_{v}\#\) are replaced by \(\overset {{~}_{\leftarrow }}{V_{i}} \#\). Then, all remaining factors \(\#\langle 7i + C_{v}(i)\rangle_{v}\) and \(\langle 7i + C_{v}(i)\rangle_{v}\#\) are replaced by \(\overset {{~}_{\leftarrow }}{V_{i}}\) and \({\!}_{\rightarrow }{V_{i}}\), respectively (note that, since there are no factors of the form \(\#\langle 7i + C_{v}(i)\rangle_{v}\#\) left, this is unambiguous). We note that \(|\mathsf {ax}^{\prime }| = \ell - {\sum }_{i=1}^{n}(|\mathsf {ax}^{\prime }|_{\overset {{~}_{\leftarrow }}{V_{i}}}+|\mathsf {ax}^{\prime }|_{{\!}_{\rightarrow }{V_{i}}}) - 2 {\sum }_{i=1}^{n} |\mathsf {ax}^{\prime }|_{\overset {{~}_{\leftrightarrow }}{V_{i}}}\).

Next, we show that all the rules for the nonterminals of \(N_{\mathsf {ax}, 2} \cup N_{\mathsf {ax}, 3}\) can be removed from the grammar. To this end, let \(A \in N_{\mathsf {ax}, 2} \cup N_{\mathsf {ax}, 3}\), which means that \(|\mathfrak {D}(A)|_{\#} \geq 1\). However, every occurrence of # in \(w_{\mathcal {G}}\) that is produced by a rule (and is not already present in the new axiom \(\mathsf {ax}^{\prime }\)) is directly produced by \(\overset {{~}_{\leftarrow }}{V_{i}}\), \({\!}_{\rightarrow }{V_{i}}\) or \(\overset {{~}_{\leftrightarrow }}{V_{i}}\), i. e., it occurs on the right side of one of these rules and is not produced by means of any other nonterminal. Consequently, in the derivation of \(w_{\mathcal {G}}\), the nonterminal A is not used and, therefore, its rule can be erased.

It only remains to show that the modified grammar is not larger than the original one, i. e., we have to compare \(|\mathsf {ax}^{\prime }|\) to |ax| and show that the size increase of 2 caused by each added rule is compensated. For every new rule \(\overset {{~}_{\leftrightarrow }}{V_{i}}\rightarrow \overset {{~}_{\leftarrow }}{V_{i}} \#\) (of cost 2), there is an \(A \in N_{\mathsf {ax}, 3}\) with \(\mathfrak {D}(A) = \# \langle 7i+C_{v}(i) \rangle _{v} \#\) (of cost at least 2) whose rule is erased, and all occurrences of A in ax correspond to occurrences of some \(\overset {{~}_{\leftrightarrow }}{V_{i}}\) in \(\mathsf {ax}^{\prime }\); hence \({\sum }_{i=1}^{n} |\mathsf {ax}^{\prime }|_{\overset {{~}_{\leftrightarrow }}{V_{i}}}={\sum }_{A \in N_{\mathsf {ax}, 3}} |\mathsf {ax}|_{A}\). For the new rules \(\overset {{~}_{\leftarrow }}{V_{i}}\rightarrow \# V_{7i+C_{v}(i)}\), consider \(\overset {{~}_{\leftarrow }}{I}:=\{i\colon \mathfrak {D}(A) = \# f(7i+C_{v}(i)) \star r_{i} \text { for some } A \in N_{\mathsf {ax}, 2}\}\). If \(i\in \overset {{~}_{\leftarrow }}{I}\), then we have removed at least one rule A → α with \( \mathfrak {D}(A) = \# f(7i+C_{v}(i)) \star r_{i}\) and |α| ≥ 2, so the cost of all rules \(\overset {{~}_{\leftarrow }}{V_{i}}\rightarrow \# V_{7i+C_{v}(i)}\) with \(i\in \overset {{~}_{\leftarrow }}{I}\) is compensated. Further, every occurrence of such an A in ax yields an occurrence of \(\overset {{~}_{\leftarrow }}{V_{i}}\) in \(\mathsf {ax}^{\prime }\). If \(i\not \in \overset {{~}_{\leftarrow }}{I}\), then each of the two occurrences of \(\#\langle 7i + C_{v}(i)\rangle_{v}\) in the factor v of \(w_{\mathcal {G}}\) that are followed by ¢1 and ¢2, respectively, is produced in ax by at least two symbols, but by the single symbol \(\overset {{~}_{\leftarrow }}{V_{i}}\) in \(\mathsf {ax}^{\prime }\). An analogous argument applies to the new rules \({\!}_{\rightarrow }{V_{i}}\rightarrow V_{7i+C_{v}(i)} \#\) with \({\!}_{\rightarrow }{I}:=\{i\colon \mathfrak {D}(A) = r_{i}\star f(7i+C_{v}(i))^{R} \# \text { for some } A \in N_{\mathsf {ax}, 2}\}\).
This yields \({\sum }_{i=1}^{n}(|\mathsf {ax}^{\prime }|_{\overset {{~}_{\leftarrow }}{V_{i}}}+|\mathsf {ax}^{\prime }|_{{\!}_{\rightarrow }{V_{i}}}) \geq {\sum }_{A \in N_{\mathsf {ax}, 2}} |\mathsf {ax}|_{A} +2(n-|\overset {{~}_{\leftarrow }}{I}|)+2(n-|{\!}_{\rightarrow }{I}|)\). Together with \({\sum }_{i=1}^{n} |\mathsf {ax}^{\prime }|_{\overset {{~}_{\leftrightarrow }}{V_{i}}}={\sum }_{A \in N_{\mathsf {ax}, 3}} |\mathsf {ax}|_{A}\), we can conclude:

$$ \begin{array}{@{}rcl@{}} |\mathsf{ax}^{\prime}| &=& \ell - \sum\limits_{i=1}^{n}(|\mathsf{ax}^{\prime}|_{\overset{{~}_{\leftarrow}}{V_{i}}}+|\mathsf{ax}^{\prime}|_{{\!}_{\rightarrow}{V_{i}}}) - 2 \sum\limits_{i=1}^{n} |\mathsf{ax}^{\prime}|_{\overset{{~}_{\leftrightarrow}}{V_{i}}}\\ &\leq& \ell - \sum\limits_{A \in N_{\mathsf{ax}, 2}} |\mathsf{ax}|_{A} -2(n-|\overset{{~}_{\leftarrow}}{I}|)-2(n-|{\!}_{\rightarrow}{I}|) - 2\!\!\sum\limits_{A \in N_{\mathsf{ax}, 3}} |\mathsf{ax}|_{A}\\ & \leq&|\mathsf{ax}| -2(n-|\overset{{~}_{\leftarrow}}{I}|)-2(n-|{\!}_{\rightarrow}{I}|) \end{array} $$

Since every new rule for \({\overset {{~}_{\leftarrow }}{V_{i}}}\) or \({{\!}_{\rightarrow }{V_{i}}}\) is added at a cost of two, the difference between \(|\mathsf {ax}^{\prime }|\) and |ax| compensates for the additional rules \(\overset {{~}_{\leftarrow }}{V_{i}}\rightarrow \#V_{7i+C_{v}(i)} \) with \(i\not \in \overset {{~}_{\leftarrow }}{I}\) and \({\!}_{\rightarrow }{V_{i}}\rightarrow V_{7i+C_{v}(i)} \#\) with \(i\not \in {\!}_{\rightarrow }{I}\). Recall further that the cost of the rules for \({\overset {{~}_{\leftrightarrow }}{V_{i}}}\) is compensated by deleting the rules for the nonterminals in Nax,3. Overall, the modified grammar is not larger than the original grammar. Furthermore, the new grammar now has the form stated in the lemma. □

Now, by the lemmas presented above, we are able to sufficiently pin down the structure of a smallest grammar for \(w_{\mathcal {G}}\):

Lemma 7

There is a smallest grammar G for \(w_{\mathcal {G}}\) that contains all the rules

  • \(R_{\diamond} := \{r_{\diamond, i} \mid 1 \leq i \leq 14n\}\), with \(r_{\diamond , i} := D_{i} \rightarrow d_{i} \star d_{i}\), 1 ≤ i ≤ 7, and \(r_{\diamond , i} := D_{i} \rightarrow g(i)[1] D_{h(i)} g(i)[1]\), 8 ≤ i ≤ 14n, where \(h(i):= \frac {i-M(i,7)}{7}\),

  • \(R_{v} := \{r_{v, i} \mid 1 \leq i \leq 14n\}\), with \(r_{v, i} := V_{i} \rightarrow x_{i} \star x_{i}\), 1 ≤ i ≤ 7, and \(r_{v, i} := V_{i} \rightarrow f(i)[1] V_{h(i)} f(i)[1]\), 8 ≤ i ≤ 14n, where \(h(i):= \frac {i-M(i,7)}{7}\),

  • \(\overset {{~}_{\leftarrow }}{V} := \{\overset {{~}_{\leftarrow }}{V_{i}}\rightarrow \# V_{7i+C_{v}(i)} \mid 1 \leq i \leq n\}\),

  • \({\!}_{\rightarrow }{V} := \{{\!}_{\rightarrow }{V_{i}} \rightarrow V_{7i+C_{v}(i)} \# \mid 1 \leq i \leq n\}\),

  • \(\overset {{~}_{\leftrightarrow }}{V} := \{\overset {{~}_{\leftrightarrow }}{V_{i}} \rightarrow \# {\!}_{\rightarrow }{V_{i}} \mid i \in \mathfrak {I}\}\), for some \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\),

and an axiom \(\mathsf {ax} = {\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\) with

$$ \begin{array}{@{}rcl@{}} &&\beta_{1} = \prod\limits_{j=0}^{6}\left( \prod\limits_{i=1}^{14n} (D_{i} V_{M(i+j,14n)}) \right) ,\qquad \beta_{2} = \prod\limits_{i=1}^{n} \left( \overset{{~}_{\leftarrow}}{V_{i}} {\cent}_{1} D_{7i-1} \right) ,\\ &&\beta_{3} = \prod\limits_{i=1}^{n} \left( \overset{{~}_{\leftarrow}}{V_{i}} {\cent}_{2} D_{7i-2}\right) , \qquad \qquad\quad \beta_{4} = \prod\limits_{i=1}^{n} \left( {\!}_{\rightarrow}{V_{i}} D_{7i-2} {\cent}_{1}\right) , \\ &&\beta_{5} = \prod\limits_{i=1}^{n} \left( {\!}_{\rightarrow}{V_{i}} D_{7i-1} {\cent}_{2}\right) , \end{array} $$
$$ \begin{array}{@{}rcl@{}} \beta_{6} &= &\prod\limits_{i=1}^{n} \left( y_{i} D_{7i} \right),\text{ where for every \textit{i}, $1 \leq i \leq n$, } y_{i}=\begin{cases} \overset{{~}_{\leftrightarrow}}{V_{i}}& \text{ if } i \in \mathfrak{I},\\ \overset{{~}_{\leftarrow}}{V_{i}} \#& \text{ otherwise}, \end{cases}\\ \beta_{7} &= &\prod\limits_{i=1}^{m-1}(y_{i} D_{7i+C_{e}(v_{j_{2i}},v_{j_{2i+1}})}) y_{m}, \text{ where for every \textit{i}, $1 \leq i \leq m$, }\\ &&y_{i} \in \{\overset{{~}_{\leftrightarrow}}{V}_{j_{2i-1}} {\!}_{\rightarrow}{V}_{j_{2i}}, \overset{{~}_{\leftarrow}}{V}_{j_{2i-1}} \overset{{~}_{\leftrightarrow}}{V}_{j_{2i}}\} \text{ if } \{j_{2i-1}, j_{2i}\} \cap \mathfrak{I} \neq \emptyset,\\ &&y_{i} = \overset{{~}_{\leftarrow}}{V}_{j_{2i-1}} \overset{{~}_{\leftarrow}}{V}_{j_{2i}}\# \text{ otherwise}. \end{array} $$
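The choice of the blocks y_i in β6 and β7 depends only on whether the relevant vertex indices belong to 𝔍, so it can be written down mechanically. The following Python sketch is illustrative only: the token names (`Vleft{i}` for the left-arrow nonterminal, `Vright{i}` for the right-arrow one, `Vboth{i}` for the double-arrow one, `D{k}` for Dk) and the callable `sep`, which stands in for the index 7i + Ce(v_{j_{2i}}, v_{j_{2i+1}}), are hypothetical, and edges are passed as index pairs (j_{2i−1}, j_{2i}).

```python
from typing import Callable, List, Sequence, Set, Tuple


def y_vertex(i: int, cover: Set[int]) -> List[str]:
    """Block y_i of beta_6; it derives #<7i+C_v(i)>_v#."""
    return [f"Vboth{i}"] if i in cover else [f"Vleft{i}", "#"]


def y_edge(a: int, b: int, cover: Set[int]) -> List[str]:
    """Block y_i of beta_7 for the edge (v_a, v_b); it derives #<.>_v#<.>_v#.
    If both endpoints are in the cover, Lemma 7 allows either option; we take the first."""
    if a in cover:
        return [f"Vboth{a}", f"Vright{b}"]
    if b in cover:
        return [f"Vleft{a}", f"Vboth{b}"]
    return [f"Vleft{a}", f"Vleft{b}", "#"]


def beta6(n: int, cover: Set[int]) -> List[str]:
    out: List[str] = []
    for i in range(1, n + 1):
        out += y_vertex(i, cover) + [f"D{7 * i}"]
    return out


def beta7(edges: Sequence[Tuple[int, int]], cover: Set[int],
          sep: Callable[[int], int]) -> List[str]:
    out: List[str] = []
    for idx, (a, b) in enumerate(edges, start=1):
        out += y_edge(a, b, cover)
        if idx < len(edges):  # separators only between consecutive edges
            out.append(f"D{sep(idx)}")
    return out
```

For the complete graph on four vertices with 𝔍 = {1, 2, 3}, β6 has 3n − |𝔍| = 9 symbols and β7 has 3m − 1 = 17 symbols; with 𝔍 = ∅, β7 grows to 4m − 1 = 23 symbols.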

Proof

Let G be a smallest grammar for \(w_{\mathcal {G}}\). By Lemma 5, we can assume that, for every i, 1 ≤ i ≤ 14n, there is a nonterminal Di with derivative 〈i〉⋄ and a nonterminal Vi with derivative 〈i〉v, and, by Lemma 6, we can assume that, for every i, 1 ≤ i ≤ n, there is a nonterminal \(\overset {{~}_{\leftarrow }}{V_{i}}\) with derivative #〈7i + Cv(i)〉v and a nonterminal \({\!}_{\rightarrow }{V_{i}}\) with derivative 〈7i + Cv(i)〉v#. Obviously, for every i, 1 ≤ i ≤ n, we can substitute the rule for \(\overset {{~}_{\leftarrow }}{V_{i}}\) by \(\overset {{~}_{\leftarrow }}{V_{i}} \to \# V_{7i+C_{v}(i)}\) and the rule for \({\!}_{\rightarrow }{V_{i}}\) by \({\!}_{\rightarrow }{V_{i}} \to V_{7i+C_{v}(i)} \#\), without increasing the size of G.

Next, for every rule Vj → αj with |αj| ≥ 3, we can replace it by Vj → xj ⋆ xj if j ≤ 7, and by \(V_{j} \rightarrow f(j)[1] V_{h(j)} f(j)[1]\) if 8 ≤ j, where \(h(j):= \frac {j-M(j,7)}{7}\). This does not increase the size of G, since the size of the modified rules can only decrease and no new rules need to be added. Now let \(j = \max \limits \{i \mid 1 \leq i \leq 14n, V_{i} \to \alpha _{i}, |\alpha _{i}| = 2\}\). We can again replace Vj → αj by Vj → xj ⋆ xj if j ≤ 7, and by \(V_{j} \rightarrow f(j)[1] V_{h(j)} f(j)[1]\) if 8 ≤ j, but now this operation increases the size of the grammar by 1, which, as shown next, is compensated by removing a rule from the grammar. To this end, we note that αj = AjBj and \(\mathfrak {D}(A_{j}) = f(j) \star t_{j}\) or \(\mathfrak {D}(B_{j}) = t_{j} \star f(j)^{R}\) for some \(t_{j} \in \{x_{1}, \ldots , x_{7}\}^{*}\). Let us assume that \(\mathfrak {D}(A_{j}) = f(j) \star t_{j}\) (the case \(\mathfrak {D}(B_{j}) = t_{j} \star f(j)^{R}\) can be handled analogously); note that this in particular implies that Aj ∉ {Vi ∣ 1 ≤ i ≤ 14n}, since its derivative is not of the form 〈i〉v. Since f(j) ⋆ tj does not occur in any \(\langle j^{\prime } \rangle _{v}\) with \(j^{\prime } < j\), Aj is not involved in the production of any \(\langle j^{\prime } \rangle _{v}\) with \(j^{\prime } < j\). Moreover, Aj cannot occur on the right side of the rule for a \(V_{j^{\prime }}\) with \(j < j^{\prime }\), since, due to the maximality of j and the modifications from above, the only nonterminals on the right sides of those rules are of the form Vi. Thus, Aj has no occurrence in any of the rules for the nonterminals Vi, 1 ≤ i ≤ 14n. This means that Aj can only occur on the right side of the rule for some nonterminal whose derivative is not a factor of some 〈i〉v and, since \(|\mathfrak {D}(A_{j})|_{\star } \geq 1\), Lemma 4 lets us further conclude that Aj can only occur on the right side of the rule for some nonterminal with a derivative of the form #〈i〉v, 〈i〉v# or #〈i〉v#. 
The right sides of the rules for \(\overset {{~}_{\leftarrow }}{V_{i}}\) and \({\!}_{\rightarrow }{V_{i}}\), which have the derivatives #〈7i + Cv(i)〉v and 〈7i + Cv(i)〉v#, respectively, do not contain Aj. Furthermore, if the right side of the rule for a nonterminal with derivative #〈7i + Cv(i)〉v# contains Aj, we can replace this right side by \(\overset {{~}_{\leftarrow }}{V_{i}} \#\) without increasing the size of the grammar. Consequently, we can assume that the nonterminal Aj is never used and therefore its rule can be removed. By repeating this argument, it follows that G contains all the rules \(R_{v}\).

In a similar way, we can show that G contains all the rules \(R_{\diamond}\) (in fact, the argument is simpler, since in this case, Lemma 4 together with the fact that Aj can only occur on the right side of the rule for some nonterminal whose derivative is not a factor of some 〈i〉⋄ immediately implies that Aj does not occur on any right side).

We now assume that \(\mathsf {ax} = {\prod }^{6}_{i=1}(\beta _{i} \$_{i}) \beta _{7}\) is the axiom of G. In the same way as in the proofs of Lemmas 5 and 6, we can conclude that |β1| ≥ 196n and |βℓ| ≥ 3n for 2 ≤ ℓ ≤ 5. Hence, replacing ax by \(\mathsf {ax}^{\prime } = {\prod }^{6}_{i=1}(\beta ^{\prime }_{i} \$_{i}) \beta ^{\prime }_{7}\) with

$$ \begin{array}{@{}rcl@{}} &&\beta^{\prime}_{1} = \prod\limits_{j=0}^{6}\left( \prod\limits_{i=1}^{14n} (D_{i} V_{M(i+j,14n)}) \right) , \beta^{\prime}_{2} = \prod\limits_{i=1}^{n} \left( \overset{{~}_{\leftarrow}}{V_{i}} {\cent}_{1} D_{7i-1} \right) ,\\ &&\beta^{\prime}_{3} = \prod\limits_{i=1}^{n} \left( \overset{{~}_{\leftarrow}}{V_{i}} {\cent}_{2} D_{7i-2}\right) , {\kern35pt}\beta^{\prime}_{4} = \prod\limits_{i=1}^{n} \left( {\!}_{\rightarrow}{V_{i}} D_{7i-2} {\cent}_{1}\right) , \\ &&\beta^{\prime}_{5} = \prod\limits_{i=1}^{n} \left( {\!}_{\rightarrow}{V_{i}} D_{7i-1} {\cent}_{2}\right) , \end{array} $$

does not increase the size of the grammar. We now consider β6, which produces the word \(v_{6} = {\prod }_{i=1}^{n} \left (\# \langle 7i+C_{v}(i) \rangle _{v} \# \langle 7i \rangle _{\diamond }\right )\). We can conclude the following from Lemma 4. No two occurrences of ⋆ in v6 can be produced by the same nonterminal; thus, |β6| ≥ 2n. Furthermore, the only factors that are repeated in \(w_{\mathcal {G}}\) and that contain an occurrence of both ⋆ and # are factors of #〈7i + Cv(i)〉v#. Hence, for every i, 1 ≤ i ≤ n, if the factor #〈7i + Cv(i)〉v# in #〈7i + Cv(i)〉v#〈7i〉⋄ is not produced by a single nonterminal, then there is an additional nonterminal in β6 (i. e., in addition to the two nonterminals producing the two occurrences of ⋆ in #〈7i + Cv(i)〉v#〈7i〉⋄). This implies that |β6| ≥ 3n − p, where p is the number of nonterminals with a derivative of the form #〈7i + Cv(i)〉v#. This means that we can replace every such nonterminal and its rule by \(\overset {{~}_{\leftrightarrow }}{V_{i}} \rightarrow \# {\!}_{\rightarrow }{V_{i}}\) without increasing the size of the grammar. Furthermore, again without increasing the size of the grammar, we can replace β6 by \({\prod }_{i=1}^{n} \left (y_{i} D_{7i} \right )\), where, for every i, 1 ≤ i ≤ n, \(y_{i} = \overset {{~}_{\leftrightarrow }}{V_{i}}\) if this nonterminal exists and \(y_{i} = \overset {{~}_{\leftarrow }}{V_{i}} \#\) otherwise.

Next, we consider β7, which produces the word

$$ \begin{array}{@{}rcl@{}} v_{7} = \prod\limits_{i=1}^{m-1}&(\!\!&\# \langle 7j_{2i-1}+C_{v}(j_{2i-1}) \rangle_{v} \# \langle 7j_{2i}+C_{v}(j_{2i}) \rangle_{v} \# \langle 7i+C_{e}(v_{j_{2i}},v_{j_{2i+1}}) \rangle_{\diamond} )\\ &&\# \langle 7j_{2m-1}+C_{v}(j_{2m-1}) \rangle_{v} \# \langle 7j_{2m}+C_{v}(j_{2m}) \rangle_{v} \# . \end{array} $$

As for the word v6, every occurrence of ⋆ in v7 requires a distinct nonterminal and, in addition, a distinct nonterminal is required for each factor #〈7i + Cv(i)〉v# that is not completely produced by a single nonterminal. Hence, |β7| ≥ 4m − 1 − q, where q is the number of nonterminals \(\overset {{~}_{\leftrightarrow }}{V_{i}}\) used in β7. Consequently, we can also replace β7 by \({\prod }_{i=1}^{m-1}(y_{i} D_{7i+C_{e}(v_{j_{2i}},v_{j_{2i+1}})}) y_{m}\), where, for every i, 1 ≤ i ≤ m, \(y_{i} \in \{\overset {{~}_{\leftrightarrow }}{V}_{j_{2i-1}} {\!}_{\rightarrow }{V}_{j_{2i}}, \overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftrightarrow }}{V}_{j_{2i}}\}\) if \(\overset {{~}_{\leftrightarrow }}{V}_{j_{2i-1}}\) or \(\overset {{~}_{\leftrightarrow }}{V}_{j_{2i}}\) exists, and \(y_{i} = \overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftarrow }}{V}_{j_{2i}}\#\) otherwise. We note that this does not increase the size of the grammar.

The grammar now has the form claimed in the statement of the lemma (note that all other rules not mentioned in the statement of the lemma can be ignored, since they are no longer used). □

Finally, we are able to conclude the proof of correctness by establishing the connection between the size of a smallest grammar for \(w_{\mathcal {G}}\) and the size of a vertex cover for \(\mathcal {G}\).

Lemma 8

The graph \(\mathcal {G}\) has a vertex cover of size k if and only if \(w_{\mathcal {G}}\) has a grammar of size 299n + k + 3m + 5.

Proof

Let Γ be a size-k vertex cover of \(\mathcal {G}\). We construct the grammar described in Lemma 7 with respect to \(\mathfrak {I} = \{i \mid v_{i} \in {\Gamma }\}\). Since Γ is a vertex cover, in the definition of β7, we have \(y_{i} \in \{\overset {{~}_{\leftrightarrow }}{V}_{j_{2i-1}} {\!}_{\rightarrow }{V}_{j_{2i}}, \overset {{~}_{\leftarrow }}{V}_{j_{2i-1}} \overset {{~}_{\leftrightarrow }}{V}_{j_{2i}}\}\) for every i, 1 ≤ i ≤ m. Consequently, by simply counting the symbols on the right sides of the rules, we conclude \(|G| = 299n + |\mathfrak {I}| + 3m + 5 = 299n + k + 3m + 5\).

On the other hand, if there is a grammar of size 299n + k + 3m + 5 for \(w_{\mathcal {G}}\), then, by Lemma 7, there also exists a grammar G for \(w_{\mathcal {G}}\) with \(|G| \leq 299n + k + 3m + 5\) that has the form described in Lemma 7 with respect to some \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\). If, for some edge (vi, vj), \(\{i, j\} \cap \mathfrak {I} = \emptyset \), then adding i to \(\mathfrak {I}\) (and therefore the rule \(\overset {{~}_{\leftrightarrow }}{V_{i}} \rightarrow \# {\!}_{\rightarrow }{V_{i}}\) to the grammar) does not increase the size of the grammar. This is due to the fact that the additional cost of 2 for introducing the rule is compensated by using \(\overset {{~}_{\leftrightarrow }}{V_{i}}\) once in β6 and once in β7, each of which saves one symbol. Consequently, we can assume that \({\Gamma } = \{v_{i} \mid i \in \mathfrak {I}\}\) is a vertex cover, and then, as in the first part of the proof, \(|G| = 299n + |\mathfrak {I}| + 3m + 5\). Since \(|G| \leq 299n + k + 3m + 5\), this means that Γ is a vertex cover for \(\mathcal {G}\) of size \(|\mathfrak {I}| \leq k\). □
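The symbol count used in both directions of the proof can be reproduced mechanically. The Python sketch below is a hypothetical helper (not part of the construction) that tallies the rule families and the axiom blocks of the grammar from Lemma 7, assuming exactly the rules and axiom stated there. For an index set 𝔍 that leaves u edges uncovered it returns 299n + |𝔍| + 3m + 5 + u, so each uncovered edge costs one extra symbol, which mirrors the exchange argument above.

```python
from typing import Iterable, Set, Tuple


def lemma7_grammar_size(n: int, edges: Iterable[Tuple[int, int]], cover: Set[int]) -> int:
    """Sum of right-side lengths of the Lemma 7 grammar for the index set `cover`."""
    edge_list = list(edges)
    m = len(edge_list)
    size = 2 * 3 * 14 * n        # R_diamond and R_v: 14n rules each, right sides of length 3
    size += 2 * n + 2 * n        # left rules  # V_(7i+C_v(i))  and right rules  V_(7i+C_v(i)) #
    size += 2 * len(cover)       # double-arrow rules  # (right V_i)  for i in `cover`
    size += 7 * 14 * n * 2       # beta_1: seven rounds of 14n pairs D_i V_(...)
    size += 4 * 3 * n            # beta_2, ..., beta_5: n triples each
    size += sum((1 if i in cover else 2) + 1 for i in range(1, n + 1))  # beta_6: y_i D_7i
    size += sum(2 if (a in cover or b in cover) else 3 for a, b in edge_list)  # y_i in beta_7
    size += m - 1                # separators D_(7i+C_e(...)) between edges in beta_7
    size += 6                    # the symbols $_1, ..., $_6 in the axiom
    return size
```

For the complete graph on four vertices (n = 4, m = 6), the index sets {1, 2, 3} (a vertex cover) and {1, 2} (one uncovered edge) both give size 1222, illustrating why adding an endpoint of an uncovered edge to 𝔍 never increases the size.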

From Lemma 8, we directly conclude our main result:

Theorem 3

SGP is NP-complete, even for alphabets of size 24.

Obviously, Theorem 3 leaves some room for improvement with respect to smaller alphabet sizes. In our reduction, we did use terminal symbols economically, but, for reasons explained next, this was not our main concern. While we generally believe that the alphabet size can be slightly reduced in our reduction, we consider it very unlikely that its current structure allows a substantial improvement in this regard (e. g., an alphabet size below 10). Thus, we did not further pursue this point, which we expect to lead to an even more involved reduction while only insignificantly decreasing the alphabet size. Consequently, the NP-hardness of the smallest grammar problem for small alphabets (with the most interesting candidates being 2 (i. e., binary strings) and 4 (due to the fact that DNA-sequences use a 4-letter alphabet)) remains open. Furthermore, we expect that completely new techniques are required for respective hardness reductions. In this regard, note that for alphabets of size 1, the smallest grammar problem is strongly connected to the problem of computing the smallest addition chain for a single integer; a problem that is neither known to be in P nor to be NP-hard (see [34] or Section 6 for details).

3.3 Extensions of the Reductions

In this section, we conclude several important hardness results by slight modifications of the reduction presented in Section 3.2. First, we show that the optimisation variant of the smallest grammar problem (over fixed alphabets) is APX-hard and therefore does not allow for a polynomial-time approximation scheme, unless P = NP. Just like Theorem 3 lifts the known NP-hardness of the smallest grammar problem for unbounded alphabets to the practically relevant case of fixed alphabets, this APX-hardness result lifts the inapproximability result for unbounded alphabets of [33, 34] to the fixed alphabet case. There is one caveat, though: the corresponding constant lower bound on the approximation ratio is much lower than the already low 1.0001 achieved for unbounded alphabets; thus, we do not bother to actually compute it, and we see the main value of the APX-hardness result in the fact that it rules out the existence of a PTAS.

Theorem 4

SGPopt is APX-hard, even for alphabets of size 24.

Proof

The reduction used for Theorem 3 can also be seen as an L-reduction from the optimisation variant of the minimum vertex cover problem restricted to cubic graphs (each vertex has degree 3), which remains APX-hard (see [52]). More precisely, this problem is denoted by (IVC, SVC, mVC), where IVC is the set of undirected cubic graphs, \(S_{\textsc {VC}}(\mathcal {G}) = \{C \mid C \text { is a vertex cover for } \mathcal {G}\}\) and \(m_{\textsc {VC}}(\mathcal {G}, C) = |C|\); we denote SGPopt by (ISGP, SSGP, mSGP).

Next, we describe an L-reduction from the problem (IVC, SVC, mVC) to the problem (ISGP, SSGP, mSGP). The above described translation of a graph \(\mathcal {G}\) to the word \(w_{\mathcal {G}}\) (i. e., the one defined in Section 3.2 in order to prove Theorem 3) gives the function f for the L-reduction. The function g, which maps \(\mathcal {G} \in I_{\textsc {VC}}\) and a grammar \(G \in S_{\textsc {SGP}}(f(\mathcal {G}))\) to a vertex cover \(C \in S_{\textsc {VC}}(\mathcal {G})\), works as follows. We first build a grammar \(G^{\prime }\) with \(|G^{\prime }|\leq |G|\) which has the form described in Lemma 7; observe that all transformations that are necessary to reach this kind of normal form are constructive and computable in polynomial time. Then \(g(\mathcal {G}, G) = \{v_{i} \mid i \in \mathfrak {I}\}\), which is a vertex cover for \(\mathcal {G}\) by Lemma 8 (note that the set \(\mathfrak {I}\) is provided by Lemma 7). Finally, we show that choosing β = 613 and γ = 1 satisfies the two inequalities of an L-reduction. To this end, we first note that, for any cubic graph \(\mathcal {G}\) with n vertices and m edges, we have \(m=\frac {3}{2}n\) (since each vertex has degree 3), \(m_{\textsc {VC}}^{*}(\mathcal {G}) \geq \frac {n}{2}\) (since each vertex can cover at most three edges) and \(m_{\textsc {VC}}^{*}(\mathcal {G}) \geq 1\).

$$ \begin{array}{@{}rcl@{}} m_{\textsc{SGP}}^{*}(w_{\mathcal{G}}) &=& 299n + 3m + 5 + m_{\textsc{VC}}^{*}(\mathcal{G})\\ &=& 607 \cdot \frac{n}{2} + 5 + m_{\textsc{VC}}^{*}(\mathcal{G}) \\ &\leq& 607 \cdot m_{\textsc{VC}}^{*}(\mathcal{G}) + 5 + m_{\textsc{VC}}^{*}(\mathcal{G}) \\ &\leq& 613 \cdot m_{\textsc{VC}}^{*}(\mathcal{G}) = \beta \cdot m_{\textsc{VC}}^{*}(\mathcal{G}) , \end{array} $$

for any \(\mathcal {G}\in I_{\textsc {VC}}\). Furthermore,

$$ \begin{array}{@{}rcl@{}} m_{\textsc{VC}}(\mathcal{G},g(\mathcal{G},G)) - m_{\textsc{VC}}^{*}(\mathcal{G}) &= &(299n + 3m + 5 + m_{\textsc{VC}}(\mathcal{G},g(\mathcal{G},G))) - \\ &&(299n + 3m + 5 + m_{\textsc{VC}}^{*}(\mathcal{G}))\\ &\leq &1\cdot (m_{\textsc{SGP}}(w_{\mathcal{G}},G) - m_{\textsc{SGP}}^{*}(w_{\mathcal{G}})) , \end{array} $$

for any \(\mathcal {G}\in I_{\textsc {VC}}\) and \(G\in S_{\textsc {SGP}}(w_{\mathcal {G}})\), where the last step uses \(m_{\textsc{SGP}}(w_{\mathcal{G}},G) = |G| \geq |G^{\prime}| = 299n + 3m + 5 + |\mathfrak{I}|\) and \(m_{\textsc{VC}}(\mathcal{G},g(\mathcal{G},G)) = |\mathfrak{I}|\). □
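The first inequality can be sanity-checked on small cubic graphs by brute force. The Python sketch below takes the optimum grammar size 299n + 3m + 5 + m*VC from Lemma 8 and checks it against β · m*VC with β = 613; the helper names and the two test graphs (the complete graph on four vertices and the 3-cube) are our own choices for illustration, and the exponential vertex-cover search is only meant for tiny inputs.

```python
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[int, int]


def min_vertex_cover_size(n: int, edges: List[Edge]) -> int:
    """Brute-force minimum vertex cover; exponential, so only for tiny graphs."""
    for k in range(n + 1):
        for cand in combinations(range(1, n + 1), k):
            chosen = set(cand)
            if all(a in chosen or b in chosen for a, b in edges):
                return k
    raise ValueError("vertices must be numbered 1..n")


def check_l_reduction_bound(n: int, edges: List[Edge], beta: int = 613) -> Tuple[int, int]:
    """Checks m*_SGP <= beta * m*_VC for a cubic graph, with m*_SGP taken from Lemma 8."""
    m = len(edges)
    assert 2 * m == 3 * n, "graph is not cubic"
    tau = min_vertex_cover_size(n, edges)
    assert 2 * tau >= n                   # each vertex covers at most three edges
    sgp_opt = 299 * n + 3 * m + 5 + tau   # size of a smallest grammar for w_G (Lemma 8)
    assert sgp_opt <= beta * tau
    return sgp_opt, tau


K4: List[Edge] = list(combinations(range(1, 5), 2))
# 3-cube: vertices 0..7 shifted to 1..8, adjacent iff their labels differ in one bit
CUBE: List[Edge] = [(u + 1, (u ^ (1 << bit)) + 1)
                    for u in range(8) for bit in range(3) if u < (u ^ (1 << bit))]
```

For the 3-cube (n = 8, τ = 4) the optimum is 2437 against a bound of 613 · 4 = 2452, so the constant is fairly tight when m*VC = n/2.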

Next, we take a closer look at the rule-size measure of grammars, i. e., at the problems SGPr and 1-SGPr. As defined in Section 2.2, the rule-size also takes the number of rules into account. In fact, the literature on grammar-based compression is inconsistent with respect to which kind of size is used, e. g., in [6, 8, 15, 33, 34, 41], the size of a grammar coincides with our definition |⋅|, while in [4, 53,54,55], the rule-size is used. The rule-size seems to be mainly motivated by the question of how a grammar is encoded as a single string, which, in any reasonable way, requires an additional symbol per rule.Footnote 9 In many contexts, the difference between size and rule-size of grammars seems negligible, but, formally, the problems SGP and SGPr (as well as 1-SGP and 1-SGPr) are different decision problems and hardness results do not automatically carry over from one to the other. Since the existing literature suggests that the rule-size is of interest as well, we consider it a worthwhile task to extend our hardness results accordingly.

It seems intuitively clear that the size increase caused by measuring with the rule-size does not have an impact on the complexity of the smallest grammar problem. In fact, the arguments in the proof for Theorem 2 for the 1-level case also apply for the rule-size, but with an addition of 2n + k + 2 (i. e., the number of rules) to the size of an r-smallest grammar. This is due to the fact that the rules that are introduced in the proof of Lemma 3 also shorten the grammar with respect to the rule-size measure.

Theorem 5

1-SGPr is NP-complete, even for alphabets of size 5.

In the multi-level case, however, the situation is not so simple. In particular, in the proof of Theorem 3, there are some arguments that do not apply to the rule-size. For example, a rule that only compresses a factor of length two is only profitable (with respect to the rule-size) if it can be used at least three times, which is problematic, since the rules that correspond to the vertex cover have length two and, in case the vertex only covers one edge, compress factors that only occur twice. Besides these problems, already in Lemma 5 we can see that it is hard to prove that the rule-size of the desired grammar \(G^{\prime \prime }\) is smaller than |G|r, as we now have to pay a cost of 4 for each rule Vi (or Di) with \(i\notin \mathfrak {I}_{v}\) (or \(i\notin \mathfrak {I}_{\diamond }\)), which cannot be compensated by shortening the axiom for u by only \(7\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}|+|\overline {\mathfrak {I}_{v}}|}{2}\rceil \).
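The break-even points mentioned above follow from simple arithmetic on a single rule. The Python sketch below uses a simplified model (an assumption of ours): the rule A → α with |α| = ℓ replaces c non-overlapping occurrences of α, each of which shrinks to the single symbol A, and the rule-size adds one symbol per rule. The returned value is the profit in the sense used later in this section, i.e., how much the grammar would grow if the rule were inlined again.

```python
def rule_profit(rhs_len: int, uses: int, rule_size: bool = False) -> int:
    """Net symbols saved by a rule A -> alpha with |alpha| = rhs_len that replaces
    `uses` non-overlapping occurrences of alpha. Positive: strictly smaller grammar;
    zero: break-even; negative: the rule does not pay off."""
    if rhs_len < 2:
        raise ValueError("a compressing rule needs a right side of length at least 2")
    if uses < 0:
        raise ValueError("the number of uses must be non-negative")
    saved = uses * (rhs_len - 1)               # each occurrence shrinks to the single symbol A
    cost = rhs_len + (1 if rule_size else 0)   # the new right side, plus one per rule under rule-size
    return saved - cost
```

For a right side of length two, two uses break even under |⋅| (`rule_profit(2, 2) == 0`) but cost one symbol under the rule-size (`rule_profit(2, 2, rule_size=True) == -1`), which only breaks even at three uses; this is exactly the situation of a vertex-cover rule whose vertex covers a single edge.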

With a larger alphabet and certain repetitions of subwords of \(w_{\mathcal {G}}\), we can modify the reduction to accommodate the rule-size, such that the arguments used for Theorem 3 still hold for this measure. To this end, we now encode 〈i〉v and 〈i〉⋄ over 8-ary instead of 7-ary alphabets \(\{x_{1},\dots ,x_{8}\}\) and \(\{d_{1},\dots ,d_{8}\}\), respectively, with analogous functions f and g. Let \(v^{\prime }\) and \(w^{\prime }\) be defined as v and w on page 23, but with respect to the new 8-ary codewords, which simply means that each occurrence of ‘7’ in the definition of v and w is replaced by ‘8’. Moreover, let \(u^{\prime }\) be defined as u on page 23, but with the ‘6’ of the first product replaced by ‘7’ and the ‘14n’ of the second product replaced by ‘24n + 4’ (the latter is necessary, since we need more separators of the form 〈i〉⋄). The colourings Cv and Ce remain unchanged.

In order to adapt the reduction to the rule-size measure, we have to repeat each factor #〈8i + Cv(i)〉v and each factor 〈8i + Cv(i)〉v# once more, but in such a way that Proposition 2 still holds. To this end, we use three new symbols $7, $8 and ¢3, and append the following to \(v^{\prime }\):

$$ \begin{array}{@{}rcl@{}} v^{\prime\prime} = v^{\prime}& & \prod\limits_{i=1}^{n} \left( \langle 8i+C_{v}(i) \rangle_{v} \# \langle 8i-3 \rangle_{\diamond} {\cent}_{3}\right) {\$_{7}}\\ &&\prod\limits_{i=1}^{n}\left( \# \langle 8i+C_{v}(i) \rangle_{v} {\cent}_{3} \langle 8i-3 \rangle_{\diamond}\right) {\$_{8}} . \end{array} $$

In order to also repeat the factors \(\# \langle 8j_{2i}+C_{v}(j_{2i}) \rangle_{v} \#\) once more, which makes covering edges profitable with respect to the rule-size, we repeat the complete list of edges, but every edge \((v_{j_{2i-1}}, v_{j_{2i}})\) is represented in reverse order as \(\# \langle 8j_{2i}+C_{v}(j_{2i}) \rangle_{v} \# \langle 8j_{2i-1}+C_{v}(j_{2i-1}) \rangle_{v} \#\), which ensures that no subword of the form 〈i〉v#xj or xj#〈i〉v is repeated. We further choose a new, previously unused set of separators 〈i〉⋄ (namely the 2m + 4 additional ones for which we created codewords with \(u^{\prime}\)) to make sure that each factor of the form 〈i〉⋄# or #〈i〉⋄ occurs at most once. We still choose the separators according to the edge-colouring to make sure that no factors of the form 〈i〉v#dj or dj#〈i〉v are repeated; observe that, by repeating the edges in reverse order, a factor of the form 〈i〉v#dj in \(w^{\prime }\) becomes a factor of the form dj#〈i〉v in the reverse listing. Formally, we define:

$$ \begin{array}{@{}rcl@{}} w^{\prime\prime} = w^{\prime} \tilde{w} \# \langle 8j_{2}+C_{v}(j_{2}) \rangle_{v} \# \langle 8j_{1}+C_{v}(j_{1}) \rangle_{v}\# , \end{array} $$

where

$$ \begin{array}{@{}rcl@{}} \tilde{w} = \prod\limits_{i=m}^{2} (& \# \langle 8j_{2i}+C_{v}(j_{2i}) \rangle_{v} \# \langle 8j_{2i-1}+C_{v}(j_{2i-1}) \rangle_{v} \\ &\# \langle 8(i+m)+C_{e}(v_{j_{2(i-1)}},v_{j_{2i-1}}) \rangle_{\diamond}) . \end{array} $$
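The index pattern of the reversed listing (edges from m down to 2, each followed by a fresh separator, then the first edge closing the word) can be sketched as a token list. In the Python sketch below, codewords are abstracted as labelled integers, the function name is hypothetical, and the callables `cv` and `ce` stand in for the colourings Cv and Ce, with vertex indices passed in place of the vertices themselves; the returned list is w̃ followed by the trailing part #〈8j2 + Cv(j2)〉v#〈8j1 + Cv(j1)〉v# of \(w^{\prime\prime}\).

```python
from typing import Callable, List, Sequence, Tuple


def reversed_edge_listing(edges: Sequence[Tuple[int, int]],
                          cv: Callable[[int], int],
                          ce: Callable[[int, int], int]) -> List[str]:
    """Tokens of w~ followed by the reversed first edge (the part appended to w')."""
    m = len(edges)

    def code(j: int) -> str:                  # stands for the codeword <8j + C_v(j)>_v
        return f"<{8 * j + cv(j)}>_v"

    out: List[str] = []
    for i in range(m, 1, -1):                 # i = m, m-1, ..., 2
        a, b = edges[i - 1]                   # edge i = (v_{j_{2i-1}}, v_{j_{2i}})
        prev_b = edges[i - 2][1]              # v_{j_{2(i-1)}}
        out += ["#", code(b), "#", code(a),
                "#", f"<{8 * (i + m) + ce(prev_b, a)}>_d"]
    a, b = edges[0]                           # the first edge closes the listing
    out += ["#", code(b), "#", code(a), "#"]
    return out
```

Since \(w^{\prime}\) ends with # and the listing starts with #, the concatenation contains the factor ## exactly at this junction.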

Finally, we set \(w^{\prime }_{\mathcal {G}} = u^{\prime }v^{\prime \prime }w^{\prime \prime }\).

It is easy to verify that Lemma 4 remains true for the new construction; observe that appending the new part of \(w^{\prime \prime }\) yields the only occurrence of the factor ## (note that \(w^{\prime }\) ends with #), which implies that the old and the new part are separated in the axiom of any r-smallest grammar for \(w^{\prime }_{\mathcal {G}}\). The equivalent of Lemma 5 also holds, since the part of the axiom for \(u^{\prime }\) now has a length of at least \(384n+64+8\lceil \frac {|\overline {\mathfrak {I}_{\diamond }}|+|\overline {\mathfrak {I}_{v}}|}{2}\rceil +1\) and the set of new rules, which now costs \(4(|\overline {\mathfrak {I}_{\diamond }}|+|\overline {\mathfrak {I}_{v}}|)\), shortens this to 384n + 65 (i. e., the number of occurrences of ⋆ in \(u^{\prime }\) plus 1 for $1). Lemma 6 follows with the same arguments as before, just with 3 occurrences for each #〈i〉v and 〈i〉v#, which makes the rules for these subwords profitable even with respect to the rule-size. An analogue of Lemma 7 then follows exactly as before (the only addition is that the new parts of \(v^{\prime \prime }\) and \(w^{\prime \prime }\) are compressed in the obvious way by the existing rules). The following observation will be helpful.

Observation 3

If \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\) is such that \(\{v_{i} \mid i \in \mathfrak {I}\}\) is a vertex cover, then the grammar for \(w^{\prime }_{\mathcal {G}}\) according to the adapted version of Lemma 7 with respect to \(\mathfrak {I}\) (see the proof of Lemma 8) satisfies \(|G| = 553n + |\mathfrak {I}| + 6m + 94\) and \(|G|_{\mathsf {r}} = 603n + 2|\mathfrak {I}| + 6m + 103\) (note that for the rule-size, we also have to count the start rule, so the two sizes differ by the number of rules, which is \(50n + |\mathfrak {I}| + 9\)).
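The two formulas in Observation 3 differ exactly by the number of rules. The Python sketch below checks this arithmetic under an assumed rule inventory that we infer by analogy with Lemma 7 (it is not spelled out in the text): 24n + 4 rules each for the Di and the Vi (matching the enlarged second product of \(u^{\prime}\)), n rules each for the left- and right-arrow variants of Vi, one rule per index in 𝔍, and the start rule. Under this assumption the count is 50n + |𝔍| + 9, and adding it to |G| reproduces |G|r.

```python
from typing import Tuple


def adapted_grammar_sizes(n: int, m: int, cover_size: int) -> Tuple[int, int, int]:
    """Returns (|G|, number of rules, |G|_r) for the adapted grammar of Observation 3.

    Assumed rule inventory (inferred by analogy with Lemma 7): 24n + 4 rules each
    for D_i and V_i, n rules each for the left/right variants of V_i, one rule per
    index in the cover set, and the start rule."""
    size = 553 * n + cover_size + 6 * m + 94
    num_rules = 2 * (24 * n + 4) + 2 * n + cover_size + 1
    return size, num_rules, size + num_rules   # the rule-size adds one symbol per rule
```

For example, `adapted_grammar_sizes(4, 6, 3)` yields 212 rules and a rule-size of 2557, which equals 603 · 4 + 2 · 3 + 6 · 6 + 103.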

An analogue of Lemma 8 can now be concluded as follows. For a size-k vertex cover Γ of \(\mathcal {G}\), we set \(\mathfrak {I} = \{i \mid v_{i} \in {\Gamma }\}\) and then construct a grammar G for \(w^{\prime }_{\mathcal {G}}\) according to the adapted version of Lemma 7 with respect to \(\mathfrak {I}\) with \(|G|_{\mathsf {r}} = 603n + 2|\mathfrak {I}| + 6m + 103\) (see Observation 3). On the other hand, if there is a grammar for \(w^{\prime }_{\mathcal {G}}\) of rule-size 603n + 2k + 6m + 103, then, by the adapted version of Lemma 7, there is a grammar G for \(w^{\prime }_{\mathcal {G}}\) with \(|G|_{\mathsf {r}} = 603n + 2|\mathfrak {I}| + 6m + 103 \leq 603n + 2k + 6m + 103\) that has the form given by the adapted version of Lemma 7, with respect to some \(\mathfrak {I} \subseteq \{1, 2, \ldots , n\}\). If, for some edge (vi, vj), \(\{i, j\} \cap \mathfrak {I} = \emptyset \), then the factors #〈8i + Cv(i)〉v#〈8j + Cv(j)〉v# and #〈8j + Cv(j)〉v#〈8i + Cv(i)〉v# in \(w^{\prime \prime }\) each correspond to three symbols in the axiom, and the factor #〈8i + Cv(i)〉v# in \(v^{\prime \prime }\) corresponds to two symbols in the axiom. Hence, introducing the rule \(\overset {{~}_{\leftrightarrow }}{V_{i}}\rightarrow \#{\!}_{\rightarrow }{V_{i}}\) has a cost of three with respect to the rule-size and shortens the axiom by at least three. Consequently, as in the proof of Lemma 8, we can assume that \({\Gamma } = \{v_{i} \mid i \in \mathfrak {I}\}\) is a vertex cover. Since \(|G|_{\mathsf {r}} = 603n + 2|\mathfrak {I}| + 6m + 103 \leq 603n + 2k + 6m + 103\), this means that Γ is a vertex cover for \(\mathcal {G}\) of size at most k. Thus, we conclude that the graph \(\mathcal {G}\) has a vertex cover of size k if and only if there exists a grammar of rule-size 603n + 2k + 6m + 103 for \(w^{\prime }_{\mathcal {G}}\), which yields the following:

Theorem 6

SGPr is NP-complete, even for alphabets of size 29.

Similar to Theorem 4, the above reduction can also be seen as an L-reduction (with the only change of setting β = 1329), which shows that the optimisation variant of the smallest grammar problem remains APX-hard under the rule-size measure.
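Assuming, as in the proof of Theorem 4, that \(\mathcal{G}\) is cubic, so that \(m = \frac{3}{2}n\), \(n \leq 2 m_{\textsc{VC}}^{*}(\mathcal{G})\) and \(m_{\textsc{VC}}^{*}(\mathcal{G}) \geq 1\), the constant β = 1329 can be recovered from the optimal rule-size given by Observation 3 (with \(|\mathfrak{I}| = m_{\textsc{VC}}^{*}(\mathcal{G})\)):

$$ \begin{array}{@{}rcl@{}} m_{\textsc{SGP}_{\mathsf{r}}}^{*}(w^{\prime}_{\mathcal{G}}) &=& 603n + 6m + 103 + 2\, m_{\textsc{VC}}^{*}(\mathcal{G})\\ &=& 612n + 103 + 2\, m_{\textsc{VC}}^{*}(\mathcal{G})\\ &\leq& 1224\, m_{\textsc{VC}}^{*}(\mathcal{G}) + 103 + 2\, m_{\textsc{VC}}^{*}(\mathcal{G})\\ &\leq& 1329\, m_{\textsc{VC}}^{*}(\mathcal{G}) . \end{array} $$

For the second inequality, γ = 1 can be kept: each index in 𝔍 now contributes 2 instead of 1 to the rule-size, so the difference in vertex-cover size is at most half the difference in rule-size.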

Theorem 7

SGPr, opt is APX-hard, even for alphabets of size 29.

We conclude that if we change from the normal size measure to the rule-size measure, NP- and APX-hardness of the smallest grammar problem over fixed alphabets remain, although the smallest alphabet size in our constructions is slightly larger. We conclude this section with another interesting observation that follows from the rule-size variant of our reduction.

Obviously, the modified reduction to SGPr can also be interpreted as a reduction to SGP. While, at first glance, this only seems to yield a weaker hardness result compared to the one of Theorem 3, it has a nice feature that entails an interesting result in its own right. More precisely, with respect to the modified reduction and the normal size measure, every rule from Lemma 7 has a positive profit (i.e., replacing all occurrences of the nonterminal by the right side of the rule would increase the overall size) and, furthermore, every rule added in the proofs of Lemmas 5 and 6 yields a strictly smaller grammar (note that this directly follows from the correctness of the construction for the rule-size measure). Moreover, there are no repeated substrings in the grammar with this set of rules, which means that no additional rules with nonnegative profit can be added. Consequently, we have not only determined the size of a smallest (with respect to |⋅|) grammar G for \(w^{\prime }_{\mathcal {G}}\) to be 553n + k + 6m + 94, where k is the size of a smallest vertex cover for \(\mathcal {G}\) (see Observation 3), but also that G requires exactly |G|r −|G| = 50n + k + 9 rules (or nonterminals). Hence, the modified reduction also serves as a reduction from the vertex cover problem to the following (weaker) variant of the smallest grammar problem:

  • Rule Number-SGP (RN-SGP)

  • Instance: A word w and a \(k \in \mathbb {N}\).

  • Question: Does there exist a smallest grammar G = (N,Σ, R, S) for w with |N|≤ k?

Theorem 8

RN-SGP is NP-hard, even for alphabets of size 29.

For the 1-level case, the original reduction already provides the analogous result (here, 1-RN-SGP denotes the variant of RN-SGP, where we ask whether there is a smallest 1-level grammar with |N|≤ k):

Theorem 9

1-RN-SGP is NP-hard, even for alphabets of size 5.

While the problems RN-SGP and 1-RN-SGP naturally arise in the context of grammar-based compression, they are particularly interesting in the light of the results presented in Section 4.1 and their relevance shall be discussed there in more detail.

3.4 (Limits of) Alphabet Reduction

As shall be discussed in this section, we can achieve a slight reduction of the alphabet size in Theorem 3. However, a substantial decrease seems rather unlikely with our current general approach, which suggests that a different approach is needed to prove the hardness of SGP for small, e. g., binary, alphabets.

We first note that we already saved one further unique separator of the form $i in the construction for the rule-size by using ## instead, simply exploiting the fact that this substring of length two is not repeated anywhere else, which makes a rule containing it impossible in a smallest grammar. We can actually also shrink our alphabet in the construction used to prove Theorem 3 by saving separator symbols, more precisely, by only using one symbol $ instead of \(\$_{1},\dots ,\$_{6}\). Recall that \(\$_{1},\dots ,\$_{6}\) only had the purpose to cut the grammar at these symbols as described in Observation 2 and hence avoid unwanted repetitions.

As a first observation, it is not hard to see that $2, $4, $5 can be removed from \(w_{\mathcal {G}}\) without creating unwanted repetitions. Removing $2 only creates the two unwanted substrings (in the sense that they should not repeat by Propositions 1 and 2) ⋆ g(7n − 1)# and d1#f(Cv(1)) ⋆, which do not occur elsewhere in \(w_{\mathcal {G}}\) (more precisely, for the second substring: y#f(Cv(1)) ⋆ with \(y\notin \{x_{1},\dots ,x_{7}\}\) occurs only two other times, once with y = $1 and, after removal of $5, once with y = ¢2). Similar arguments hold for removing $4 and $5. The remaining $i occur in the subwords \(x_{6}\$_{1}\#x_{C_{v}(1)}\), \(d_{5}\$_{3}x_{C_{v}(1)}\) and \(d_{7}\$_{6}\#x_{C_{v}(j_{1})}\). Now consider replacing $1, $3, $6 each by the same symbol $. If we make sure to list the edges in an order such that Cv(1) ≠ Cv(j1), the only repeated factor of length more than one containing this new symbol $ is $#. As this subword of length two only occurs twice, it is not profitable for a smallest grammar to compress it with a rule. So with the small adjustments of deleting $2, $4, $5, possibly picking another order in which to list the edges, and replacing $1, $3, $6 by $, our reduction needs five fewer symbols.
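The claim that $# is the only repeated factor containing $ can be checked on a toy word. The Python sketch below lists all factors of length at least two that contain a given symbol and occur at least twice (overlapping occurrences counted). The test word is not \(w_{\mathcal{G}}\) itself: it only imitates the three local contexts quoted above, separated by hypothetical filler tokens, with the assumed colours Cv(1) = 1 and Cv(j1) = 3.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def repeated_factors_with(word: Sequence[str], symbol: str,
                          min_len: int = 2) -> Dict[Tuple[str, ...], int]:
    """All factors of length >= min_len that contain `symbol` and occur at least
    twice (occurrences may overlap). Cubic time: intended for toy inputs only."""
    counts: Counter = Counter()
    for i in range(len(word)):
        for j in range(i + min_len, len(word) + 1):
            factor = tuple(word[i:j])
            if symbol in factor:
                counts[factor] += 1
    return {f: c for f, c in counts.items() if c >= 2}
```

With Cv(1) ≠ Cv(j1), only ('$', '#') repeats (twice), which by the profit arithmetic for a length-two rule used twice is not worth a rule; if the colours coincided, the longer factor ('$', '#', x) would repeat as well.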

Further reduction of the alphabet size requires much more effort. Our main type of argument is that certain rules cannot exist simply because their derivative does not occur more than once in \(w_{\mathcal {G}}\). There are cases where it is possible to show that certain rules with a repeated derivative do not occur, but the respective argument cannot be local and would rather depend on the structure of the whole grammar. On the other hand, rules that we want to be fixed in a smallest grammar have to be provably profitable. With these properties in mind, it is quite obvious that there is not much room to reduce the alphabet size further.

The symbols ⋆, #, ¢1, ¢2 and, after applying the replacement above, $ each have a very specific purpose. It seems very difficult to reduce the alphabet by replacing one of these characters by another one or by some codeword.

For the symbols \(x_{1},\dots ,x_{7},d_{1},\dots ,d_{7}\), we see that Lemma 5, which fixes the codewords for vertices and separators built from these symbols, requires at least six repetitions of each desired codeword. Doing this without repeating unwanted subwords means that, at least with the idea we used to repeat these codewords in the alternating fashion given by the subword u, we need at least six different symbols in each encoding. For the separators 〈j, our construction requires the seven different symbols \(d_{1},\dots ,d_{7}\) in order to have unique separators between the repetitions of the subwords \(\#\langle i\rangle_{v}\), \(\langle i\rangle_{v}\#\) and \(\#\langle i\rangle_{v}\#\) in v, and between the edges in the listing in w, for which we need four different kinds of separators, one for each colour of the edge-colouring \(C_{e}\). For the vertex codewords \(\langle i\rangle_{v}\), we also need seven different symbols to represent the vertex colouring \(C_{v}\). So, first of all, the only way to save symbols among \(x_{1},\dots ,x_{7},d_{1},\dots ,d_{7}\) seems to be to modify the input graph in such a way that the colourings \(C_{e}\) and \(C_{v}\) require fewer colours. This is possible with the adjustments described in the following.

Given a subcubic graph \(\mathcal {G}=(V,E)\), we first build the graph \(\bar {\mathcal {G}}\) from \(\mathcal {G}\) by subdividing each edge twice, i. e., we replace each edge (u, v) ∈ E by the three edges \((u, u_{v})\), \((u_{v}, v_{u})\) and \((v_{u}, v)\), where \(u_{v}\) and \(v_{u}\) are two new vertices that are not incident to any further edges. We now construct the word for SGP so that it represents the graph \(\bar {\mathcal {G}}\). This shift to the graph \(\bar {\mathcal {G}}\) can be used to decrease the number of colours we require for both \(C_{v}\) and \(C_{e}\). First observe that the graph \(\bar {\mathcal {G}}^{2}\) (i. e., the graph obtained from \(\bar {\mathcal {G}}\) by the same operation used to obtain \(\mathcal {G}^{2}\) from \(\mathcal {G}\) in the original reduction; see page 23) has maximum degree three, as a vertex v ∈ V is adjacent to the at most three vertices in \(\{u_{v} : (u, v) \in E\}\), and a vertex \(v_{u}\), added by the subdivision process for an edge (u, v), is adjacent to u and possibly to the at most two vertices in \(\{v_{x} : (v, x) \in E, x \neq u\}\). Hence, the vertex colouring \(C_{v}\) only needs four different colours to properly colour \(\bar {\mathcal {G}}^{2}\).

Next, we choose a specific listing of the edges of \(\bar {\mathcal {G}}\) such that the three edges of \(\bar {\mathcal {G}}\) corresponding to an edge (u, v) of \(\mathcal {G}\) are listed consecutively as \((u_{v}, u), (v_{u}, u_{v}), (v, v_{u})\) (the relative order of such triples is arbitrary). In this way, the multi-graph \(\bar {\mathcal {G}}^{\prime }\) (i. e., the graph obtained from \(\bar {\mathcal {G}}\) by the same operation used to obtain \(\mathcal {G}^{\prime }\) from \(\mathcal {G}\) in the original reduction; see page 23) contains the edges \(\{(u, v_{u}), (u_{v}, v) : (u, v) \in E\}\) for vertices from V and, in addition, at most one edge of the form \((u_{v}, u^{\prime }_{v^{\prime }})\) for each new vertex added by the subdivision. This means that, in \(\bar {\mathcal {G}}^{\prime }\), a vertex v ∈ V is only adjacent to the at most three vertices in \(\{u_{v} : (u, v) \in E\}\), and a vertex \(u_{v}\) added by the subdivision process for the edge (u, v) is incident to one edge connecting it to v and to at most one other edge connecting it to a subdivision vertex different from \(u_{v}\). Consequently, \(\bar {\mathcal {G}}^{\prime }\) is a simple graph of maximum degree three. Further, observe that the vertices of degree three in \(\bar {\mathcal {G}}^{\prime }\) (which form a subset of V) form an independent set in \(\bar {\mathcal {G}}^{\prime }\). By a theorem of Fournier [56], a graph with these properties has an edge-colouring with only three colours, which can be computed in polynomial time with Vizing's algorithm [57]. With the same arguments used to prove Theorem 3, it follows that a smallest grammar encodes a minimum vertex cover for \(\bar {\mathcal {G}}\). It remains to observe that the size of a minimum vertex cover for the original input graph \({\mathcal {G}}\) can be derived from a minimum vertex cover for \(\bar {\mathcal {G}}\).
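If \(\mathcal {G}\) has a vertex cover C of size k, then it can be extended to a vertex cover of size k + |E| for \(\bar {\mathcal {G}}\) by adding, for each edge (u, v) of \(\mathcal {G}\), exactly one of \(u_{v}\) and \(v_{u}\): the one adjacent to an endpoint that is not in C, or either one if both u and v are in C. Conversely, it can easily be seen that, without loss of generality, a minimum vertex cover for \(\bar {\mathcal {G}}\) contains exactly one of \(u_{v}\) and \(v_{u}\) for each edge (u, v) of \(\mathcal {G}\) (the middle edge \((u_{v}, v_{u})\) forces at least one of them), and, moreover, the remaining vertices of this vertex cover must form a vertex cover for the graph \(\mathcal {G}\). Hence, the size of a minimum vertex cover for \(\bar {\mathcal {G}}\) is exactly the size of a minimum vertex cover for \(\mathcal {G}\) plus |E|.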
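The relation between minimum vertex covers of \(\mathcal {G}\) and \(\bar {\mathcal {G}}\) can be sanity-checked on tiny graphs with a short Python sketch. It is only an illustration under our own assumptions: the helper names `subdivide_twice` and `min_vertex_cover_size`, the tuple encoding of the new vertices \(u_{v}\) and \(v_{u}\), and the exhaustive cover search are hypothetical choices that are not part of the reduction.

```python
from itertools import combinations


def subdivide_twice(edges):
    """Replace every edge (u, v) by the path u - u_v - v_u - v.

    The new vertices u_v and v_u are encoded as ("sub", u, v) and ("sub", v, u);
    this naming is an illustrative choice.
    """
    result = []
    for u, v in edges:
        u_v, v_u = ("sub", u, v), ("sub", v, u)
        result += [(u, u_v), (u_v, v_u), (v_u, v)]
    return result


def min_vertex_cover_size(edges):
    """Size of a minimum vertex cover by exhaustive search (tiny graphs only)."""
    vertices = list({x for edge in edges for x in edge})
    for k in range(len(vertices) + 1):
        for candidate in combinations(vertices, k):
            cover = set(candidate)
            if all(a in cover or b in cover for a, b in edges):
                return k
    return len(vertices)


triangle = [(1, 2), (2, 3), (1, 3)]
k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
for graph in (triangle, k4):
    # Expected relation: tau(G-bar) == tau(G) + |E|
    print(min_vertex_cover_size(graph), min_vertex_cover_size(subdivide_twice(graph)))
# prints "2 5" and then "3 9"
```

On the triangle and on the cubic graph K4, the minimum cover grows by exactly |E|. The exhaustive search is exponential and is only meant for such toy instances.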

Overall, the adjustments described so far lead to a hardness reduction that only uses an alphabet of 17 symbols, as we now only require a 6-ary encoding for vertices and separators. Observe that, although the colouring \(C_{v}\) now only requires four colours, we cannot reduce the alphabet for the vertices to fewer than six symbols, as we need six different symbols for the repetitions in u.

Corollary 1

SGP is NP-complete, even for alphabets of size 17.

The reduction sketched above can still be seen as an L-reduction from the optimisation version of vertex cover to SGPopt. To see this, observe that the adjustments made to reduce the alphabet only increase the size of a smallest grammar for the word constructed for the input graph \(\mathcal {G}\) by \(\mathcal {O}(m)\). As \(\mathcal {O}(m)\subseteq \mathcal {O}(m^{*}_{VC}(\mathcal {G}))\) (recall that \(\mathcal {G}\) is cubic), the size of the smallest grammar can be linearly bounded by \(m^{*}_{VC}(\mathcal {G})\) in a similar way as shown in the proof of Theorem 4.

Corollary 2

SGPopt is APX-hard, even for alphabets of size 17.

The only way to further reduce the alphabet would be to use not just the repetitions in u to prove Lemma 5, but the repetitions in the whole word. This, however, is very difficult, as it can then no longer easily be shown that including the rules we want to fix shortens the axiom. If there is no nonterminal \(V_{i}\) that derives \(\langle i\rangle_{v}\) for some index i, the larger substring \(\#\langle i\rangle_{v}\)¢1 in v, for example, might still only require three symbols in the axiom by compressing parts of \(\langle i\rangle_{v}\) together with # or ¢1. The same holds for all occurrences of the substring \(\langle i\rangle_{v}\) in v or w. This problem is actually the reason why we need the nonterminals \(V_{i}\) and \(D_{i}\) to be fixed for Lemma 6: they make our desired rules derive \(\langle i\rangle_{v}\#\) and \(\#\langle i\rangle_{v}\) in the cheapest possible way, which enables the argument that other unwanted rules in \(N_{\mathsf{ax}}\) cannot be more profitable. Consequently, an alphabet of size 17 seems to be necessary to cleanly prove Theorem 3 with our construction.

Similar ideas and limits for alphabet reduction hold for the rule-size measure. A reduction that only uses $ instead of \(\$_{1},\dots , \$_{8}\) works analogously. The symbols \(\$_{i}\) with i ∈ {2,4,5,6,7} can be deleted without creating repetitions of unwanted subwords. Replacing the remaining \(\$_{i}\), i ∈ {1,3,8}, by $ and again reordering the edges in the listing given in \(w^{\prime \prime }\) such that \(x_{C_{v}(1)}\not =x_{C_{v}(j_{1})}\) ensures that the only repeating factor of length more than one containing the new symbol $ is $#. This factor occurs exactly twice and is hence not compressed by a rule in a smallest grammar (observe that, with the rule-size as measure, such a rule is not just unprofitable but even makes the grammar larger). As we here require eight repetitions to show the equivalent of Lemma 5 for the rule-size, saving symbols among \(x_{1},\dots ,x_{8},d_{1},\dots ,d_{8}\) is not possible. Consequently, Theorems 6 and 7 can be improved to require only an alphabet of size 22, but a reduction with a smaller alphabet will be very difficult with our construction.

4 Smallest Grammars with a Bounded Number of Nonterminals

A natural follow-up question to the hardness for fixed alphabets is whether polynomial-time solvability is possible if, instead, the cardinality of the nonterminal alphabet N (or, equivalently, the number of rules) is bounded. In this section, we answer this question in the affirmative by representing words \(w \in \Sigma^{*}\) as graphs Φm(w) and Φ1(w) such that smallest independent dominating sets of these graphs correspond to smallest grammars and smallest 1-level grammars, respectively, for w.

It will be more convenient to first take care of the simpler 1-level case and then to treat the multi-level case as an extension of it, i. e., we first define Φ1(w) and then derive Φm(w) from Φ1(w). Recall that, as defined in Section 2, \(\mathsf {F}_{\geq 2}(w)\) is the set of factors of w of size at least 2. Let Φ1(w) = (V, E) be defined by \(V = V_{1} \cup V_{2} \cup V_{3}\) and \(E = E_{1} \cup E_{2} \cup E_{3}\), where:

$$ \begin{array}{@{}rcl@{}} &&V_{1} = \{(i, j) \mid 1 \leq i \leq j \leq |{w}|\}, \qquad E_{1} = \{\{(i_{1},j_{1}), (i_{2},j_{2})\} \mid i_{1}\leq i_{2} \leq j_{1}\} ,\\ &&V_{2} = \mathsf{F}_{\geq 2}(w) , \qquad\qquad\qquad\qquad\quad E_{2} = \{\{w[i..j], (i, j)\} \mid 1 \leq i <j \leq |{w}|\} ,\\ &&V_{3} = \{(u,i) \mid u \in V_{2}, 0 \leq i \leq |u|\}, \!\!\quad E_{3} = \{\{u, (u,i)\} \mid u \in V_{2}, 0 \leq i \leq |u|\} . \end{array} $$

Intuitively speaking, the vertices of V1 represent every factor by its start and end position, whereas V2 contains exactly one vertex per factor of length at least 2. Every \(u \in V_{2}\) is connected to (i, j) if and only if w[i..j] = u. Vertices (i, j) and \((i^{\prime }, j^{\prime })\) are connected if they refer to overlapping factors. For every \(u \in V_{2}\), there are |u| + 1 special vertices in V3 that are only adjacent to u. Consequently, we can view Φ1(w) as consisting of |w| layers, where the ith layer contains the vertices (j, j + (i − 1)) ∈ V1, 1 ≤ j ≤ |w| − (i − 1), the vertices \(\{u \in V_{2} \mid |u| = i\}\) and the vertices \(\{(u, j) \in V_{3} \mid |u| = i, 0 \leq j \leq |u|\}\) (see Fig. 2 for an illustration).
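To make the definition concrete, the following Python sketch builds Φ1(w) directly from the definitions of V1, V2, V3 and E1, E2, E3. The function name `phi1` and the vertex labels `("V1", i, j)`, `("V2", u)` and `("V3", u, l)` are our own conventions; positions are 1-based as in the text.

```python
from itertools import combinations


def phi1(w):
    """Build Phi_1(w) = (V, E); edges are frozensets of two vertex labels.

    Labels (illustrative encoding):
      ("V1", i, j)  for 1 <= i <= j <= |w|
      ("V2", u)     for every distinct factor u of w with |u| >= 2
      ("V3", u, l)  for u in V2 and 0 <= l <= |u|
    """
    n = len(w)
    V1 = [("V1", i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    factors = sorted({w[i - 1:j] for i in range(1, n + 1) for j in range(i + 1, n + 1)})
    V2 = [("V2", u) for u in factors]
    V3 = [("V3", u, l) for u in factors for l in range(len(u) + 1)]

    E = set()
    # E1: V1 is listed by increasing start, so for a = (i1, j1), b = (i2, j2)
    # we have i1 <= i2, and the two factors overlap iff i2 <= j1.
    for a, b in combinations(V1, 2):
        if b[1] <= a[2]:
            E.add(frozenset((a, b)))
    # E2: link each occurrence (i, j) of a factor of length >= 2 to its V2 vertex.
    for (_, i, j) in V1:
        if i < j:
            E.add(frozenset((("V2", w[i - 1:j]), ("V1", i, j))))
    # E3: the |u| + 1 pendant vertices of every u in V2.
    for (_, u, l) in V3:
        E.add(frozenset((("V2", u), ("V3", u, l))))
    return V1 + V2 + V3, E


V, E = phi1("aba")
print(len(V), len(E))   # 19 23
```

For w = aba, this yields 6 + 3 + 10 = 19 vertices (V1, V2, V3) and 10 + 3 + 10 = 23 edges (E1, E2, E3).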

Fig. 2

The third layer of Φ1() (edges from E1 are omitted). The uppermost vertices \((1,3), (2,4), \dots \) are from V1, the ones in the middle labelled by \({\mathtt {a}} {\mathtt {b}} {\mathtt {b}}, {\mathtt {b}} {\mathtt {b}} {\mathtt {a}}, \dots \) are the ones from V2 and, finally, the lower vertices are from V3 (for the sake of convenience, these are labelled by i instead of (u, i))

Next, we show that 1-level grammars for w correspond to independent dominating sets for Φ1(w). Intuitively speaking, the vertices from V1 in an independent dominating set induce a factorisation of w, which, in turn, induces the axiom of a 1-level grammar in the natural way (i. e., every factor of size at least 2 is represented by a rule). If (i, j) ∈ V1 with i < j is in the independent dominating set, then w[i..j] ∈ V2 is not; thus, due to the domination property, all \((w[i..j], \ell) \in V_{3}\), \(0 \leq \ell \leq j - i + 1\), are in the independent dominating set, which represents the size of the rule.

Lemma 9

Let \(w \in \Sigma^{*}\), k ≥ 1. There exists an independent dominating set D of cardinality at most k for Φ1(w) if and only if there exists a 1-level grammar G for w with \(|G| \leq k - |\mathsf {F}_{\geq 2}(w)|\).

Proof

We start with the if direction. If G = (N, Σ, R, ax) is a 1-level grammar for w of size \(k - |\mathsf {F}_{\geq 2}(w)|\), then we can construct an independent dominating set D for Φ1(w) of size k as follows. Let \(\mathsf {ax} = A_{1} A_{2} \cdots A_{n}\), \(A_{i} \in N \cup \Sigma\), 1 ≤ i ≤ n, and let \(F = \{\mathfrak {D}(A) \mid A \in N\}\). For every i, 1 ≤ i ≤ n, we add \((|\mathfrak {D}(A_{1} {\ldots } A_{i-1})| + 1, |\mathfrak {D}(A_{1} {\ldots } A_{i})|) \in V_{1}\) to D and, if \(A_{i} \in N\), then we also add all \(\{(\mathfrak {D}(A_{i}), j) \mid 0 \leq j \leq |\mathfrak {D}(A_{i})|\}\) to D. Furthermore, we add all vertices of \(V_{2} \setminus F\) to D. It can be easily verified that D is an independent dominating set. Moreover, \(|D| = |\mathsf {ax}| + {\sum }_{v \in F} (|v| + 1) + |V_{2} \setminus F| = |\mathsf {ax}| + {\sum }_{v \in F} |v| + |V_{2}| = |\mathsf {ax}| + {\sum }_{A \in N} |\mathfrak {D}(A)| + |V_{2}| = |G| + |\mathsf {F}_{\geq 2}(w)|\). Since \(|G| = k - |\mathsf {F}_{\geq 2}(w)|\), we conclude that |D| = k.

Next, we prove the only if direction. Let D be an independent dominating set for Φ1(w). We first note that, for every \(u \in V_{2} \setminus D\), \(\{(u, j) \mid 0 \leq j \leq |u|\} \subseteq D\), since each (u, j) is adjacent only to u; this implies that

$$ \begin{array}{@{}rcl@{}} |D| &=& |D \cap V_{1}| + |D \cap V_{2}| + |D \cap V_{3}|\\ &\geq& |D \cap V_{1}| + |D \cap V_{2}| + \sum\limits_{u \in (V_{2} \setminus D)} \left|\{(u, j) \mid 0 \leq j \leq |u|\}\right|\\ &=& |D \cap V_{1}| + |D \cap V_{2}| + \sum\limits_{u \in (V_{2} \setminus D)} (|u| + 1) \\ &=& |D \cap V_{1}| + |V_{2}| + \sum\limits_{u \in (V_{2} \setminus D)} |u| . \end{array} $$

For every i, 1 ≤ i ≤ |w|, we say that i is covered by \((j, j^{\prime }) \in V_{1}\) if \((j, j^{\prime }) \in D\) and \(j \leq i \leq j^{\prime }\) (recall that any vertex (i, i) can only be dominated by some vertex \((j, j^{\prime })\) with \(j \leq i \leq j^{\prime }\), since vertex (i, i) has no neighbours in V2). If some i, 1 ≤ i ≤ |w|, is not covered by any \((j, j^{\prime }) \in V_{1}\), then (i, i) is not dominated by D, and if i is covered by two different elements from V1, then there is an edge (from E1) between them, so that D is not an independent set. Thus, every i, 1 ≤ i ≤ |w|, is covered by exactly one element \((j, j^{\prime }) \in V_{1}\). This directly implies that \(D \cap V_{1} = \{(\ell _{1}, r_{1}), (\ell _{2}, r_{2}), \ldots , (\ell _{m}, r_{m})\}\) such that \((u_{1}, u_{2}, \ldots , u_{m})\) is a factorisation of w, where \(u_{j} = w[\ell _{j}..r_{j}]\), \(1 \leq j \leq m\). Due to the edges in E2, we know that, for every j, \(1 \leq j \leq m\), with \(\ell _{j} < r_{j}\), there is an edge \(\{u_{j}, (\ell _{j}, r_{j})\}\); thus, \(u_{j} \in V_{2} \setminus D\). Next, we define \(N = \{A_{u} \mid u \in V_{2} \setminus D\}\) and \(R = \{A_{u} \to u \mid u \in V_{2} \setminus D\}\). Since now, for each j, \(1 \leq j \leq m\), either \(u_{j} \in \Sigma\) or there exists a nonterminal \(A_{u_{j}}\) that derives \(u_{j}\), we can define an axiom of length m by \(\mathsf {ax} = C_{u_{1}} C_{u_{2}} {\ldots } C_{u_{m}}\) with \(C_{u_{j}}=A_{u_{j}}\) for all j with \(|u_{j}| > 1\) and \(C_{u_{j}}=u_{j}\) otherwise, in order to obtain a 1-level grammar G = (N, Σ, R, ax) with \(\mathfrak {D}(G) = w\). Finally, we note that

$$ \begin{array}{@{}rcl@{}} |G| &=& |\mathsf{ax}| + \sum\limits_{u \in (V_{2} \setminus D)} (|u|)\\ &=& |D \cap V_{1}| + |V_{2}| + \left( \sum\limits_{u \in (V_{2} \setminus D)} |u|\right) - |V_{2}| \\ &\leq& |D| - |\mathsf{F}_{\geq 2}(w)| . \end{array} $$
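Both directions of this proof can be exercised on a toy word. The Python sketch below, with hypothetical helper names, builds D from a 1-level grammar given by the factors of its axiom, checks that D is an independent dominating set of Φ1(w) with \(|D| = |G| + |\mathsf {F}_{\geq 2}(w)|\), and reads the axiom and the rules back off D. It assumes, as a simplification, that the grammar has exactly one rule per distinct axiom factor of length at least 2.

```python
from itertools import combinations


def phi1_adjacency(w):
    """Adjacency sets of Phi_1(w) with labels ("V1", i, j), ("V2", u), ("V3", u, l)."""
    n = len(w)
    V1 = [("V1", i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    V2 = {w[i - 1:j] for i in range(1, n + 1) for j in range(i + 1, n + 1)}
    adj = {v: set() for v in V1}
    for u in V2:
        adj[("V2", u)] = set()
        for l in range(len(u) + 1):
            adj[("V3", u, l)] = set()

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for a, b in combinations(V1, 2):    # E1 (V1 is ordered by start position)
        if b[1] <= a[2]:
            link(a, b)
    for (_, i, j) in V1:                # E2
        if i < j:
            link(("V2", w[i - 1:j]), ("V1", i, j))
    for u in V2:                        # E3
        for l in range(len(u) + 1):
            link(("V2", u), ("V3", u, l))
    return adj, V2


def is_independent_dominating(adj, D):
    independent = all(not (adj[v] & D) for v in D)
    dominating = all(v in D or adj[v] & D for v in adj)
    return independent and dominating


def ids_from_axiom(w, axiom):
    """'If' direction: from the axiom factors of a 1-level grammar to D."""
    assert "".join(axiom) == w
    adj, V2 = phi1_adjacency(w)
    F = {u for u in axiom if len(u) >= 2}        # assumed: one rule A_u -> u per long factor
    D, start = set(), 1
    for u in axiom:
        D.add(("V1", start, start + len(u) - 1))
        start += len(u)
    for u in F:
        D |= {("V3", u, l) for l in range(len(u) + 1)}
    D |= {("V2", u) for u in V2 - F}
    size = len(axiom) + sum(len(u) for u in F)   # |G| = |ax| + total rule length
    return adj, V2, D, size


def grammar_from_ids(w, V2, D):
    """'Only if' direction: axiom from the V1 vertices in D, rules from V2 outside D."""
    pieces = sorted((v for v in D if v[0] == "V1"), key=lambda v: v[1])
    axiom = [w[i - 1:j] for (_, i, j) in pieces]
    rules = sorted(u for u in V2 if ("V2", u) not in D)
    return axiom, rules


adj, V2, D, size = ids_from_axiom("abab", ["ab", "ab"])
print(is_independent_dominating(adj, D), size, len(D) - len(V2))   # True 4 4
print(grammar_from_ids("abab", V2, D))                             # (['ab', 'ab'], ['ab'])
```

Here \(|\mathsf {F}_{\geq 2}(abab)| = 5\), so the grammar of size 4 corresponds to an independent dominating set of size 9.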

Since, in the multi-level case, the derivatives of the nonterminals that appear in the axiom are again compressed by a grammar, a first idea that comes to mind is to represent the vertices \(u \in V_{2}\) again by graph structures of the type Φ1(u) and to iterate this step. However, naively carrying out this idea would lead to redundancies (copies of the subgraph representing a factor u would appear inside subgraphs representing different superstrings \(w_{1}uw_{2}\) and \(w^{\prime }_{1} u w^{\prime }_{2}\)) that even seem to cause an exponential increase in the size of the graph structure. Fortunately, it turns out that these redundancies can be avoided and that a surprisingly simple modification of Φ1(w) suffices.

For a word \(w \in \Sigma^{*}\), let Φm(w) = (V, E) be defined as follows. Let \(V = V_{1} \cup V_{2} \cup V_{3} \cup V_{4}\), where V1 and V2 are defined as for Φ1(w), whereas

$$ \begin{array}{@{}rcl@{}} V_{3} &=& \{(u, 0) \mid u \in V_{2}\} \text{ and}\\ V_{4} &=& \bigcup\limits_{u \in V_{2}} V_{4, u} \text{ with } V_{4, u} = \{(u, i, j) \mid 1 \leq i \leq j \leq |u|, u[i..j]\neq u\} \text{ for } u \in V_{2} . \end{array} $$

Moreover, \(E = E_{1} \cup E_{2} \cup E_{3} \cup E_{4} \cup E_{5}\), where E1 and E2 are defined as for Φ1(w), while

$$ \begin{array}{@{}rcl@{}} E_{3} &= &\{\{u, (u, 0)\} \mid u \in V_{2}\} \cup \{\{u, (u,i,j)\} \mid u \in V_{2}, (u,i,j) \in V_{4, u}\} ,\\ E_{4} &= &\bigcup\limits_{u \in V_{2}} E_{4, u}, \text{ with } E_{4, u} = \{\{(u, i_{1},j_{1}), (u, i_{2},j_{2})\}\subseteq V_{4,u} \mid i_{1} \leq i_{2} \leq j_{1}\},\\ &&\text{ for every } u \in V_{2}, \text{ and}\\ E_{5} &= &\{\{u, (v, i, j)\} \mid u, v \in V_{2}, v[i..j] = u,u\neq v\} . \end{array} $$

Intuitively speaking, Φm(w) differs from Φ1(w) in the following way. To every vertex \(u \in V_{2}\), we add a subgraph \((V_{4,u}, E_{4,u})\), all of whose vertices are adjacent to u, which represents u in the same way as the subgraph (V1, E1) of Φ1(w) represents w, i. e., factors u[i..j] are represented by (u, i, j) and edges represent overlaps. Moreover, if some \(u \in V_{2}\) is a factor of some \(v \in V_{2}\), then there is an edge from u to all the vertices \((v, i, j) \in V_{4,v}\) that satisfy v[i..j] = u (by these “crosslinks”, we get rid of the redundancies mentioned above). Finally, every \(u \in V_{2}\) is also connected with an otherwise isolated vertex \((u, 0) \in V_{3}\). See Fig. 3 for a partial illustration of Φm(w).
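Compared with Φ1(w), the new ingredients of Φm(w) are the sets \(V_{4,u}\) and the crosslinks E5. A Python sketch of the construction, written directly from the definitions, could look as follows; the function name `phi_m` and the labels `("V3", u, 0)` and `("V4", u, i, j)` are our own illustrative encoding.

```python
from itertools import combinations


def phi_m(w):
    """Build Phi_m(w) = (V, E) from the definitions of V1..V4 and E1..E5.

    Labels (illustrative encoding, 1-based positions):
      ("V1", i, j), ("V2", u), ("V3", u, 0) and ("V4", u, i, j) for (u, i, j) in V_{4,u}.
    """
    n = len(w)
    V1 = [("V1", i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    V2 = sorted({w[i - 1:j] for i in range(1, n + 1) for j in range(i + 1, n + 1)})
    vertices, edges = set(V1), set()

    def link(a, b):
        edges.add(frozenset((a, b)))

    for a, b in combinations(V1, 2):          # E1: overlapping factors of w
        if b[1] <= a[2]:
            link(a, b)
    for (_, i, j) in V1:                      # E2: occurrence -> factor vertex
        if i < j:
            link(("V2", w[i - 1:j]), ("V1", i, j))
    for u in V2:
        vertices |= {("V2", u), ("V3", u, 0)}
        link(("V2", u), ("V3", u, 0))         # E3: pendant vertex (u, 0)
        V4u = [("V4", u, i, j) for i in range(1, len(u) + 1)
               for j in range(i, len(u) + 1) if u[i - 1:j] != u]
        vertices |= set(V4u)
        for x in V4u:
            link(("V2", u), x)                # E3: u is adjacent to all of V_{4,u}
            piece = u[x[2] - 1:x[3]]
            if len(piece) >= 2:               # E5: crosslink to the V2 vertex of u[i..j]
                link(("V2", piece), x)
        for a, b in combinations(V4u, 2):     # E4: overlapping factors of u
            if b[2] <= a[3]:
                link(a, b)
    return vertices, edges


V, E = phi_m("aba")
print(len(V), len(E))   # 21 32
```

For w = aba, we count 6 + 3 + 3 + 9 = 21 vertices (V1 to V4) and 10 + 3 + 12 + 5 + 2 = 32 edges (E1 to E5); the two crosslinks join ab and ba to (aba, 1, 2) and (aba, 2, 3), respectively.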

Fig. 3

Second and third layer of Φm() (vertices from V1 and edges from E1 ∪ E2 omitted). For example, vertex () ∈ V2 is connected to all the vertices V4, = {(, i, j) ∣ 1 ≤ i ≤ j ≤ 3, j − i ≤ 1}, and with (,0) ∈ V3. Moreover, since ()[1..2] =, there is an edge between (,1,2) and () ∈ V2, and since ()[2..3] =, there is an edge between (,2,3) and () ∈ V2

Similarly to the 1-level case, we can show that (multi-level) grammars for w correspond to independent dominating sets for Φm(w):

Lemma 10

Let \(w \in \Sigma^{*}\), k ≥ 1. There is an independent dominating set D of cardinality k for Φm(w) if and only if there is a grammar G for w with \(|G| = k - |\mathsf {F}_{\geq 2}(w)|\).

Proof

Let D be an independent dominating set of cardinality k for Φm(w). In the same way as in the proof of Lemma 9, it can be concluded that the set \(V_{1} \cap D = \{(\ell _{1}, r_{1}), (\ell _{2}, r_{2}), \ldots , (\ell _{m_{w}}, r_{m_{w}})\}\) corresponds to a factorisation \((w_{1}, w_{2}, \ldots , w_{m_{w}})\) of w, where \(w_{j} = w[\ell _{j}..r_{j}]\), \(1 \leq j \leq m_{w}\), and satisfies \(\{w_{1}, w_{2}, \ldots , w_{m_{w}}\} \cap D = \emptyset \).

Next, for an arbitrary \(u \in V_{2}\), we consider the subgraph with the vertices \(N[u] \setminus V_{1} = V_{4,u} \cup \{(v, i, j) \mid v[i..j] = u, u \neq v\} \cup \{u, (u, 0)\}\). If u ∈ D, then N(u) ∩ D = ∅. On the other hand, if u ∉ D, then (u, 0) ∈ D and, analogously as for V1, we can conclude that

$$ V_{4, u} \cap D = \{(u, \ell_{u,1}, r_{u,1}), (u, \ell_{u,2}, r_{u,2}), \ldots, (u, \ell_{u,m_{u}}, r_{u,m_{u}})\} , $$

such that \((u_{1}, u_{2}, \ldots , u_{m_{u}})\) is a factorisation of u, where \(u_{j} = u[\ell _{u,j}..r_{u,j}]\), \(1 \leq j \leq m_{u}\) (note that, in the same way as for V1, if a position i of u is not covered, in the sense that there is no \((u, j, j^{\prime }) \in D\) with \(j \leq i \leq j^{\prime }\), then vertex (u, i, i) would neither be in D nor adjacent to a vertex in D). Furthermore, for every j, \(1 \leq j \leq m_{u}\), with \(|u_{j}| \geq 2\), we have \(\{u_{j}, (u, \ell _{u,j}, r_{u,j})\} \in E\); thus, \(u_{j} \notin D\). Consequently, by induction, D induces a factorisation \((u_{1}, u_{2}, \ldots , u_{m_{u}})\) for every \(u \in (V_{2} \setminus D) \cup \{w\}\) such that, for every j, \(1 \leq j \leq m_{u}\), \(|u_{j}| \geq 2\) implies \(u_{j} \in V_{2} \setminus D\), which means that there is also a factorisation for \(u_{j}\).

For every \(u \in V_{2} \setminus D\), we can now define a nonterminal \(A_{u}\) and a rule \(A_{u} \to B_{1} B_{2} {\ldots } B_{m_{u}}\), where, for every j, \(1 \leq j \leq m_{u}\), \(B_{j} = A_{u_{j}}\) if \(|u_{j}| \geq 2\) and \(B_{j} = u_{j}\) if \(|u_{j}| = 1\). Obviously, these rules, together with the axiom \(\mathsf {ax} = C_{1} C_{2} {\ldots } C_{m_{w}}\), where, for every j, \(1 \leq j \leq m_{w}\), \(C_{j} = A_{w_{j}}\) if \(|w_{j}| \geq 2\) and \(C_{j} = w_{j}\) if \(|w_{j}| = 1\), define a grammar G for w.

We note that \(|\mathsf {ax}| = |V_{1} \cap D|\) and, for every rule \(A_{u} \to \alpha _{u}\), \(|\alpha _{u}| = |V_{4,u} \cap D|\). Since

$$ \begin{array}{@{}rcl@{}} |D| &= &|D \cap V_{1}| + |(D \cap (\bigcup\limits_{u \in V_{2}} V_{4, u}))| + |D \cap (V_{2} \cup V_{3})| ,\\ |V_{2}| &= &|D \cap (V_{2} \cup V_{3})| \text{ and}\\ |G| &= &|D \cap V_{1}| + |(D \cap (\bigcup\limits_{u \in V_{2}} V_{4, u}))| , \end{array} $$

we conclude that |G| = |D|−|V2| = k −|F≥ 2(w)|.

For a grammar G for w, we can select vertices from Φm(w) according to the factorisations induced by the rules of G, which results in an independent dominating set D for Φm(w) with |D| = |G| + |V2|. □

For the algorithmic application of these graph encodings, it is important to note that the proofs of Lemmas 9 and 10 are constructive, i. e., they also show how an independent dominating set D of Φm(w) or Φ1(w) can be transformed into a grammar for w (a 1-level grammar for w, respectively) of size |D|−|F≥ 2(w)|, which, in the following, we will denote by G(D).

Thus, the smallest grammar problem can be solved by constructing Φm(w) or Φ1(w), then computing a smallest independent dominating set D for Φm(w) (or Φ1(w), respectively), and finally constructing G(D). Unfortunately, this does not lead to a polynomial-time algorithm, since computing a minimum independent dominating set is an NP-complete problem, even for quite restricted graph classes [58, Theorem 13].

In the following, we shall analyse the graph structures Φm(w) and Φ1(w) more thoroughly and we begin with their respective sizes:

Proposition 3

Let \(w \in \Sigma^{*}\). Then Φ1(w) has \(\mathcal {O}(|{w}|^{3})\) vertices and \(\mathcal {O}(|{w}|^{4})\) edges; Φm(w) has \(\mathcal {O}(|{w}|^{4})\) vertices and \(\mathcal {O}(|{w}|^{6})\) edges.

Proof

We first consider Φm(w). The subgraph (V1, E1) has \(\mathcal {O}(|{w}|^{2})\) vertices and \(\mathcal {O}(|{w}|^{4})\) edges. Similarly, every induced subgraph on a vertex set \(V_{4,u} \cup \{u, (u,0)\}\), \(u \in V_{2}\), has \(\mathcal {O}(|{w}|^{2})\) vertices and \(\mathcal {O}(|{w}|^{4})\) edges, and there are \(\mathcal {O}(|{w}|^{2})\) such subgraphs. In addition to this, there are \(\mathcal {O}(|{w}|)\) edges connecting any \(u \in V_{2}\) with vertices from V1 and \(\mathcal {O}(|{w}|^{3})\) edges connecting any \(u \in V_{2}\) with vertices from V4 (at most |w| occurrences of u in each of the \(\mathcal {O}(|{w}|^{2})\) words \(v \in V_{2}\)). Finally, there are \(\mathcal {O}(|{w}|^{2})\) vertices in V3 with one incident edge each. Consequently, Φm(w) has \(\mathcal {O}(|{w}|^{4})\) vertices and \(\mathcal {O}(|{w}|^{6})\) edges.

For Φ1(w), the situation is easier. The subgraph (V1, E1) has \(\mathcal {O}(|{w}|^{2})\) vertices and \(\mathcal {O}(|{w}|^{4})\) edges. There are \(\mathcal {O}(|{w}|^{2})\) vertices in V2 and each \(u \in V_{2}\) has \(\mathcal {O}(|{w}|)\) edges. Finally, there are \(\mathcal {O}(|{w}|^{3})\) vertices in V3 (each \(u \in V_{2}\) contributes \(|u| + 1 \leq |w| + 1\) of them) with one edge each. Consequently, Φ1(w) has \(\mathcal {O}(|{w}|^{3})\) vertices and \(\mathcal {O}(|{w}|^{4})\) edges. □
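The vertex bounds of Proposition 3 can also be read off by counting directly from the definitions. With n = |w|, and using that w has at most n − L + 1 distinct factors of length L (with equality for every L ≥ 2 if no factor of length at least 2 occurs twice in w), we obtain:

$$ \begin{array}{@{}rcl@{}} |V_{1}| &=& \frac{n(n+1)}{2}, \qquad |V_{2}| \;\leq\; \sum\limits_{L=2}^{n} (n-L+1) \;=\; \frac{n(n-1)}{2},\\ |V_{3}| &=& \sum\limits_{u \in V_{2}} (|u|+1) \;\leq\; \sum\limits_{L=2}^{n} (n-L+1)(L+1) \;=\; \mathcal{O}(n^{3}) \qquad \text{(in } {\Phi}_{1}(w)\text{)},\\ |V_{4}| &=& \sum\limits_{u \in V_{2}} \left(\frac{|u|(|u|+1)}{2} - 1\right) \;\leq\; \sum\limits_{L=2}^{n} (n-L+1)\,\frac{L(L+1)}{2} \;=\; \mathcal{O}(n^{4}) \qquad \text{(in } {\Phi}_{m}(w)\text{)}. \end{array} $$

Since V3 only contributes |V2| vertices in Φm(w), the dominant terms are |V3| for Φ1(w) and |V4| for Φm(w).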

Next, we investigate the interval-structure of Φm(w) and Φ1(w).

Proposition 4

Φm(w) and Φ1(w) are 2-interval graphs.

Proof

In the following 2-interval representations, we denote by I1(v) the first and by I2(v) the second interval that represents a vertex v.

We first consider the graph Φ1(w). For every (i, j) ∈ V1, we set I1((i, j)) = [i, j]; this already yields the subgraph (V1, E1). In addition, let I1(u), \(u \in V_{2}\), be a sequence of pairwise disjoint intervals that are also disjoint from the intervals I1((i, j)), (i, j) ∈ V1. For every (u, j) ∈ V3, let I1((u, j)) be an interval that lies within I1(u) and is disjoint from every other interval. Now, it only remains to represent the edges from E2, for which we simply let I2((i, j)), (i, j) ∈ V1 with i < j, be an interval that lies within I1(w[i..j]) and is disjoint from every other interval. Note that only the vertices from V1 are represented by two intervals each.

For Φm(w), we represent \(V_{1} \cup V_{2}\) and the edges \(E_{1} \cup E_{2}\) by intervals in the same way as for the graph Φ1(w). Then, for every \(u \in V_{2}\) and \((u, i, j) \in V_{4,u}\), we set \(I_{1}((u, i, j)) = [i + k_{u}, j + k_{u}]\), where \(k_{u}\) is chosen such that all these intervals lie inside I1(u) without intersecting an interval I2((i, j)) for some (i, j) ∈ V1. In particular, this takes care of all the edges \(E_{4,u}\) (due to the intersections between these intervals) and the edges between u and the vertices \(V_{4,u}\) (due to the fact that these intervals lie inside I1(u)). In order to take care of the edges from E5, for every u and for every \((v, i, j) \in V_{4,v}\) with v[i..j] = u, we place a new interval I2((v, i, j)) inside of I1(u) such that it does not intersect with any other interval inside of I1(u). This creates all the edges from E5. Now it only remains to take care of the vertices (u, 0), \(u \in V_{2}\), and their edges, which can be done by placing a new interval I1((u, 0)) inside I1(u) such that it does not intersect with any other interval. □
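The 2-interval representation of Φ1(w) from this proof can be checked mechanically: assign the intervals, compute their intersection graph and compare it with the edge set of Φ1(w). The Python sketch below does this for closed integer intervals. The concrete layout (one block per \(u \in V_{2}\) to the right of position |w|, holding a unit interval for every (u, l) ∈ V3 and for every second interval I2((i, j)) with w[i..j] = u) is our own instantiation of the pairwise disjoint intervals I1(u) in the proof, and all function names are hypothetical.

```python
from itertools import combinations


def phi1_edges(w):
    """Edge set of Phi_1(w); labels ("V1", i, j), ("V2", u), ("V3", u, l)."""
    n = len(w)
    V1 = [("V1", i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    V2 = {w[i - 1:j] for i in range(1, n + 1) for j in range(i + 1, n + 1)}
    E = {frozenset((a, b)) for a, b in combinations(V1, 2) if b[1] <= a[2]}   # E1
    for (_, i, j) in V1:
        if i < j:
            E.add(frozenset((("V2", w[i - 1:j]), ("V1", i, j))))              # E2
    for u in V2:
        for l in range(len(u) + 1):
            E.add(frozenset((("V2", u), ("V3", u, l))))                       # E3
    return E


def two_interval_representation(w):
    """Map each vertex of Phi_1(w) to at most two closed integer intervals."""
    n = len(w)
    rep = {("V1", i, j): [(i, j)] for i in range(1, n + 1) for j in range(i, n + 1)}
    V2 = sorted({w[i - 1:j] for i in range(1, n + 1) for j in range(i + 1, n + 1)})
    pos = n + 1                                   # blocks start right of [1, |w|]
    for u in V2:
        block_start = pos
        for l in range(len(u) + 1):               # unit interval for each (u, l)
            rep[("V3", u, l)] = [(pos, pos)]
            pos += 1
        for i in range(1, n - len(u) + 2):        # each occurrence w[i..j] = u
            j = i + len(u) - 1
            if w[i - 1:j] == u:
                rep[("V1", i, j)].append((pos, pos))   # second interval I2((i, j))
                pos += 1
        rep[("V2", u)] = [(block_start, pos - 1)]      # I1(u) covers its block
    return rep


def intersection_graph(rep):
    def meet(x, y):
        return max(x[0], y[0]) <= min(x[1], y[1])
    return {frozenset((a, b)) for a, b in combinations(rep, 2)
            if any(meet(x, y) for x in rep[a] for y in rep[b])}


print(intersection_graph(two_interval_representation("abbab")) == phi1_edges("abbab"))   # True
```

Every vertex receives at most two intervals, and only vertices (i, j) ∈ V1 with i < j receive a second one, matching the last sentence of the proof for Φ1(w).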

Unfortunately, the independent dominating set problem for 2-interval graphs is still NP-complete (in [58], the hardness of the independent dominating set problem for subcubic graphs is shown and from [59], it follows that subcubic graphs are 2-interval graphs). Nevertheless, solving the smallest grammar problem by computing small independent dominating sets for Φm(w) or Φ1(w), as sketched before Proposition 3, might still be worthwhile, since computing small independent dominating sets is a well-researched problem, for which the literature provides fast and sophisticated algorithms (see [60, 61]). In particular, the 2-interval structure suggests that we are dealing with simpler instances of the independent dominating set problem.

Our algorithmic application of the graph encodings, which leads to the polynomial-time solvability of the smallest grammar problem with a bounded number of nonterminals, can be sketched as follows. If we have fixed the set of factors \(F \subseteq \mathsf {F}_{\geq 2}(w)\) that occur as derivatives of nonterminals in the grammar, i. e., \(\{\mathfrak {D}(A) \mid A \in N\} = F\), then, for the corresponding independent dominating set D of Φm(w) or Φ1(w), we must have \((\mathsf {F}_{\geq 2}(w) \setminus F) \subseteq D\) and \(F \cap D = \emptyset \). Thus, in order to find an independent dominating set that is minimum among all those that correspond to a grammar with \(\{\mathfrak {D}(A) \mid A \in N\} = F\), it is sufficient to first select the vertices \(\mathsf {F}_{\geq 2}(w) \setminus F\), then delete the closed neighbourhood of this vertex set together with F, and finally compute a smallest independent dominating set for what remains, which is the graph \({\mathscr{H}} = {\Phi }(w) \setminus (N[\mathsf {F}_{\geq 2}(w) \setminus F] \cup F)\).Footnote 10 Crucially, \({\mathscr{H}}\) is an interval graph, so a smallest independent dominating set for it can be computed in linear time.

In order to carry out this approach, we first formally prove that \({\mathscr{H}}\) is an interval graph:

Proposition 5

Let w ∈Σ+, \(F \subseteq \mathsf {F}_{\geq 2}(w)\) and Φ(w) ∈{Φm(w),Φ1(w)}. Then \({\mathscr{H}} = {\Phi }(w) \setminus (N[\mathsf {F}_{\geq 2}(w) \setminus F] \cup F)\) is an interval graph.

Proof

We only prove the case Φ(w) = Φm(w), since the case Φ(w) = Φ1(w) can be handled analogously. First, we consider the 2-interval representation of Φm(w) described in the proof of Proposition 4. We can now obtain a 1-interval representation of \({\mathscr{H}}\) from the 2-interval representation of Φm(w) as follows. Since \({\mathscr{H}}\) does not contain any vertex from V2, we first remove the corresponding intervals for vertices from V2. The only vertices represented by more than one interval are the ones from V1 and V4. However, the second intervals of these only intersect intervals which represent vertices from V2 in the 2-interval representation of Φm(w), which means that they are now all isolated and can therefore be removed. Consequently, every vertex of \({\mathscr{H}}\) can be represented by one interval.□

Next, we show that independent dominating sets for \({\mathscr{H}}\) can be easily extended to independent dominating sets for Φm(w) (or Φ1(w)).

Proposition 6

Let w ∈Σ+, \(F \subseteq \mathsf {F}_{\geq 2}(w)\), Φ(w) ∈{Φm(w),Φ1(w)} and let \(D_{{\mathscr{H}}}\) be an independent dominating set for \({\mathscr{H}} = {\Phi }(w) \setminus (N[\mathsf {F}_{\geq 2}(w) \setminus F] \cup F)\). Then \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\) is an independent dominating set for Φ(w).

Proof

We start with the multi-level case. Since \(D_{{\mathscr{H}}}\) is an independent dominating set for \({\mathscr{H}}\), it is also an independent set for Φm(w). The only vertices of Φm(w) that are not necessarily dominated by \(D_{{\mathscr{H}}}\) are from \(N[\mathsf {F}_{\geq 2}(w) \setminus F]\) or F. Since \(\mathsf {F}_{\geq 2}(w) \setminus F \subseteq D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\), the vertices from \(N[\mathsf {F}_{\geq 2}(w) \setminus F]\) are dominated by \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\). Regarding the vertices from F, we note that, since \(F \cap D_{{\mathscr{H}}} = \emptyset \), the vertices \(\{(u, 0) \mid u \in F\}\) occur in \({\mathscr{H}}\) as isolated vertices and, thus, they must be included in \(D_{{\mathscr{H}}}\), which means that the vertices F are dominated in Φm(w) by \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\) as well. Now it only remains to observe that, by definition of Φm(w), the vertices \(\mathsf {F}_{\geq 2}(w) \setminus F\) are clearly independent and, since their neighbourhood is completely excluded from \({\mathscr{H}}\) and therefore also from \(D_{{\mathscr{H}}}\), none of them is adjacent to a vertex in \(D_{{\mathscr{H}}}\). Consequently, \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\) is an independent dominating set for Φm(w).

The argument for the 1-level case is very similar, with the only difference that \(\{(u, i) \mid u \in F, 0 \leq i \leq |u|\}\) are the vertices from \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \backslash F)\) that dominate the vertices F. □

For the sake of convenience, for any \(F \subseteq \mathsf {F}_{\geq 2}(w)\), we call a grammar G = (N, Σ, R, ax) for w with \(\{\mathfrak {D}(A) \mid A \in N\} = F\) an F-grammar; a smallest F-grammar for w is one that is minimal among all F-grammars for w.

Lemma 11

Let w ∈Σ+ and \(F \subseteq \mathsf {F}_{\geq 2}(w)\). A smallest F-grammar for w can be computed in time \(\mathcal {O}(|{w}|^{6})\) and a smallest 1-level F-grammar for w can be computed in time \(\mathcal {O}(|{w}|^{4})\).

Proof

Again, we only prove the multi-level case, since the 1-level case can be dealt with analogously. We compute a smallest F-grammar for w as follows. First, we construct Φm(w) and then \({\mathscr{H}} = {\Phi }_{m}(w) \setminus (N[\mathsf {F}_{\geq 2}(w) \setminus F] \cup F)\), which can be done in time \(\mathcal {O}(|{\Phi }_{m}(w)|) = \mathcal {O}(|{w}|^{6})\) (see Proposition 3). Obviously, we could also construct \({\mathscr{H}}\) directly, which would not change the overall running time. Next, we compute a minimum independent dominating set \(D_{{\mathscr{H}}}\) for \({\mathscr{H}}\), which, since \({\mathscr{H}}\) is an interval graph (see Proposition 5), can be done in time \(\mathcal {O}(|{\mathscr{H}}|) = \mathcal {O}(|{w}|^{6})\) (see Section 2.1). Finally, we construct \(G = \mathsf {G}(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \setminus F))\) (note that, by Proposition 6, \(D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \setminus F)\) is an independent dominating set for Φm(w); thus, G is well-defined), which can be done in time \(\mathcal {O}(|{w}|^{6})\) as well.

It remains to prove that G is a smallest F-grammar. Let \(D = D_{{\mathscr{H}}} \cup (\mathsf {F}_{\geq 2}(w) \setminus F)\), so that, by Lemma 10, \(|G| = |D| - |\mathsf {F}_{\geq 2}(w)|\). To obtain a contradiction, we assume that there exists an F-grammar \(G^{\prime }\) for w with \(|G^{\prime }| < |G|\). Consequently, by Lemma 10, there is an independent dominating set \(D^{\prime }\) for Φm(w) with \(|G^{\prime }| = |D^{\prime }| - |\mathsf {F}_{\geq 2}(w)|\), which implies \(|D^{\prime }| < |D|\). Since both G and \(G^{\prime }\) are F-grammars, \(\mathsf {F}_{\geq 2}(w) \setminus D =\mathsf {F}_{\geq 2}(w) \setminus D^{\prime }=F\). This implies that \(D^{\prime }_{{\mathscr{H}}} = D^{\prime } \setminus (\mathsf {F}_{\geq 2}(w) \setminus F)\) is an independent dominating set for \({\mathscr{H}}\) and, since \(|D^{\prime }| < |D|\), it follows that \(|D^{\prime }_{{\mathscr{H}}}| < |D_{{\mathscr{H}}}|\), which contradicts the minimality of \(D_{{\mathscr{H}}}\). Consequently, G is a smallest F-grammar for w. □
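For the 1-level case, the algorithm of Lemma 11 can be sketched in Python by working with an interval representation of \({\mathscr{H}}\) directly, following the proof of Proposition 5: the surviving V1 vertices keep their first intervals, and the surviving V3 vertices are isolated. As a stated substitution, we replace the linear-time interval-graph algorithm referred to in Section 2.1 by a simple cubic-time dynamic program, which uses the fact that an independent dominating set of an interval graph is a set of pairwise disjoint intervals such that no interval lies entirely in a gap between them. All names are hypothetical.

```python
def h_intervals(w, F):
    """Interval representation of H for Phi_1(w) and a set F of factors.

    A V1 vertex (i, j) survives iff i == j or w[i..j] is in F (every other V1
    vertex is a neighbour of a removed V2 vertex); the V3 vertices (u, l), u in F,
    survive as isolated vertices, placed as unit intervals to the right of |w|.
    """
    n = len(w)
    intervals = [(("V1", i, j), (i, j))
                 for i in range(1, n + 1) for j in range(i, n + 1)
                 if i == j or w[i - 1:j] in F]
    pos = n + 1
    for u in sorted(F):
        for l in range(len(u) + 1):
            intervals.append((("V3", u, l), (pos, pos)))
            pos += 1
    return intervals


def min_independent_dominating_set(intervals):
    """Minimum independent dominating set of an interval graph (closed integer intervals).

    Cubic-time DP (not the linear-time algorithm cited in the text): a set is
    independent and dominating iff its intervals are pairwise disjoint and no
    interval lies before the first, between two consecutive, or after the last one.
    """
    items = sorted(intervals, key=lambda item: item[1][1])   # by right endpoint
    inf = float("inf")

    def some_interval_within(lo, hi):
        # Is there an interval [a, b] with lo < a and b < hi?
        return any(lo < a and b < hi for _, (a, b) in items)

    best = [inf] * len(items)      # best[t]: smallest valid prefix ending with items[t]
    prev = [None] * len(items)
    for t, (_, (a_t, _)) in enumerate(items):
        if not some_interval_within(-inf, a_t):
            best[t] = 1
        for s in range(t):
            b_s = items[s][1][1]
            if b_s < a_t and best[s] + 1 < best[t] and not some_interval_within(b_s, a_t):
                best[t], prev[t] = best[s] + 1, s
    t = min((k for k in range(len(items))
             if not some_interval_within(items[k][1][1], inf)),
            key=lambda k: best[k])
    chosen = []
    while t is not None:
        chosen.append(items[t][0])
        t = prev[t]
    return chosen


def smallest_1level_f_grammar(w, F):
    """Sketch of Lemma 11 (1-level case): returns (axiom factors, |G|)."""
    F = set(F)
    D_H = min_independent_dominating_set(h_intervals(w, F))
    pieces = sorted((v for v in D_H if v[0] == "V1"), key=lambda v: v[1])
    axiom = [w[i - 1:j] for (_, i, j) in pieces]
    size = len(axiom) + sum(len(u) for u in F)
    assert size == len(D_H) - len(F)     # |G| = |D| - |F_{>=2}(w)|
    return axiom, size


print(smallest_1level_f_grammar("abcabc", {"abc"}))   # (['abc', 'abc'], 5)
print(smallest_1level_f_grammar("abcabc", {"bca"}))   # (['a', 'bca', 'b', 'c'], 7)
```

For w = abcabc, the set F = {abc} yields the axiom (abc, abc) and |G| = 2 + 3 = 5, whereas F = {bca} forces the axiom (a, bca, b, c) and |G| = 4 + 3 = 7, since the rule for bca is counted even though it saves little.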

If instead of a set F of factors, we are only given an upper bound k on |N|, then we can compute a smallest grammar by enumerating all \(F \subseteq \mathsf {F}_{\geq 2}(w)\) with |F|≤ k and computing a smallest F-grammar. This shows that smallest grammars can be computed in polynomial time if the number of nonterminals is bounded.
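To make the enumeration concrete, the following Python sketch lists the candidate sets F. It assumes (our assumption, for illustration only) that \(\mathsf {F}_{\geq 2}(w)\) is the set of distinct factors of w of length at least 2, and it only enumerates the sets; the routine that would compute a smallest F-grammar for each of them (Lemma 11) is not shown. The closed-form count illustrates why there are \(\mathcal {O}(|{w}|^{2k})\) candidates for a fixed k.

```python
from itertools import combinations
from math import comb


def factors_ge2(w: str) -> set:
    """Distinct factors of w of length >= 2 (assumed reading of F_{>=2}(w))."""
    return {w[i:j] for i in range(len(w)) for j in range(i + 2, len(w) + 1)}


def candidate_sets(w: str, k: int):
    """Yield every subset F of F_{>=2}(w) with |F| <= k.

    Each F would be handed to a (hypothetical) smallest-F-grammar routine."""
    facs = sorted(factors_ge2(w))
    for size in range(k + 1):
        yield from combinations(facs, size)


def number_of_candidates(w: str, k: int) -> int:
    """Closed form: sum over s <= k of binom(M, s), where M = |F_{>=2}(w)| <= |w|^2."""
    m = len(factors_ge2(w))
    return sum(comb(m, s) for s in range(k + 1))


w = "abab"
print(sorted(factors_ge2(w)))      # ['ab', 'aba', 'abab', 'ba', 'bab']
print(number_of_candidates(w, 2))  # 1 + 5 + 10 = 16
```

For fixed k, the number of candidate sets is a polynomial of degree k in \(|\mathsf {F}_{\geq 2}(w)|\), which is where the factor \(|{w}|^{2k}\) in Theorem 10 comes from.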

Theorem 10

Let \(w \in \Sigma^{+}\) and \(k \in \mathbb {N}\). A grammar (1-level grammar, resp.) for w with at most k rules that is smallest among all grammars (1-level grammars, resp.) for w with at most k rules can be computed in time \(\mathcal {O}(|{w}|^{2k+6})\) (\(\mathcal {O}(|{w}|^{2k+4})\), resp.).

Proof

Obviously, a grammar G for w with at most k rules and

$$ |G| = \min\{|G^{\prime}| \mid G^{\prime} \text{ is a smallest } F\text{-grammar for } w \text{, with } F \subseteq \mathsf{F}_{\geq 2}(w), |F| \leq k\} $$

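is smallest among all grammars for w with at most k rules. In order to compute such a grammar, it is sufficient to compute, for every set \(F \subseteq \mathsf {F}_{\geq 2}(w)\) with \(|F| \leq k\), a smallest F-grammar. Since \(|\mathsf {F}_{\geq 2}(w)| \leq |{w}|^{2}\), there are \(\mathcal {O}(|{w}|^{2k})\) such sets and, by Lemma 11, each of them can be handled in time \(\mathcal {O}(|{w}|^{6})\), which yields an overall running-time of \(\mathcal {O}(|{w}|^{2k} \cdot |{w}|^{6}) = \mathcal {O}(|{w}|^{2k+6})\).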

Analogously, we can compute a 1-level grammar for w with at most k rules that is smallest among all 1-level grammars for w with at most k rules in time \(\mathcal {O}(|{w}|^{2k+4})\). □

This result raises some related questions, which shall be discussed next.

4.1 Related Questions

In the literature on grammar-based compression, the size of a smallest grammar has been interpreted as a computable upper bound on the Kolmogorov complexity and, thus, as a measure of the entropy or information content of strings (see Section 1). Similarly, we could treat the minimal number of nonterminals (i. e., number of rules) that are needed for a smallest grammar as a general parameter of strings, which we call the rule-number. The main motivation for doing this is provided by Theorem 10, which shows that a smallest grammar for w can be computed in time that is exponential only in the rule-number of w (or, in parameterised complexity terms, the smallest grammar problem parameterised by |N| is in XP). However, in order to apply the algorithm of Theorem 10 in this regard, we need to know the rule-number, which naturally leads to the question whether the rule-number of a given string can be computed efficiently. The hardness reductions for the rule-size variants of the smallest grammar problem (see Section 3.3) have already provided a negative answer to this question (see Theorems 8 and 9).

The XP-membership of the smallest grammar problem, provided by Theorem 10, shows that the parameter |N| has a stronger impact on the complexity than |Σ| and, furthermore, it gives reason to hope that bounding |N| might also lead to practically relevant algorithms. In this regard, the algorithm of Theorem 10 with its running-time of the form \(|{w}|^{\mathcal {O}(|N|)}\) is somewhat disappointing, since it cannot be considered practical even for moderately large constant bounds on |N|. On the other hand, an algorithm with a running-time of \(f(|N|) \cdot g(|{w}|)\), for a computable function f and a polynomial g, would be a huge improvement. In other words, the question is whether the smallest grammar problem is also fixed-parameter tractable with respect to the number of nonterminals. Unfortunately, this seems unlikely, since, as stated by the next result, these parameterisations of 1-SGP and SGP are W[1]-hard. To prove this, we devise a parameterised reduction from the independent set problem parameterised by the size of the independent set, which is known to be W[1]-hard (see [62]).

Let \(\mathcal {G} = (V, E)\) be a graph with \(V = \{v_{1}, v_{2},\dots, v_{n}\}\) and \(|E| = m\), and let \(k \in \mathbb {N}\). We define the alphabet \({\Sigma } = V \cup \{\#\} \cup \{\diamond _{j} \mid 1 \leq j \leq m + {\sum }^{n}_{i = 1} (n - |N(v_{i})|)\}\), where \(N(v_{i})\) denotes the neighbourhood of \(v_{i}\), and the following word over Σ:

$$ w = \prod\limits_{\{v_{i}, v_{j}\} \in E}(\#v_{i}\#v_{j}\#\diamond) \prod\limits^{n}_{i = 1}(\#v_{i}\#\diamond)^{n-|N(v_{i})|} . $$

As already done in Section 3, every occurrence of ◇ in the word stands for a distinct symbol of \(\{\diamond _{j} \mid 1 \leq j \leq m + {\sum }^{n}_{i = 1} (n - |N(v_{i})|)\}\). Note that, since \({\sum }^{n}_{i = 1} (n - |N(v_{i})|) = n^{2} - 2m\), we have \(|{w}| = 6m + 4(n^{2} - 2m) = 4n^{2} - 2m\).
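The construction of w is easy to check mechanically. The Python sketch below (hypothetical helper name; the graph is assumed to be simple, with vertices 1, …, n) builds w as a list of symbols, using fresh symbols d1, d2, … for the pairwise distinct ◇-symbols, and compares |w| with \(4n^{2} - 2m\) on a path with three vertices.

```python
def reduction_word(n: int, edges: list) -> list:
    """Symbols: 'v1'..'vn', '#', and fresh diamonds 'd1', 'd2', ... (each used once)."""
    degree = [0] * (n + 1)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    w = []
    fresh = 0
    for i, j in edges:                 # one block  # v_i # v_j # <>  per edge
        fresh += 1
        w += ["#", f"v{i}", "#", f"v{j}", "#", f"d{fresh}"]
    for i in range(1, n + 1):          # n - |N(v_i)| blocks  # v_i # <>  per vertex
        for _ in range(n - degree[i]):
            fresh += 1
            w += ["#", f"v{i}", "#", f"d{fresh}"]
    return w


n, edges = 3, [(1, 2), (2, 3)]         # path v1 - v2 - v3, so m = 2
w = reduction_word(n, edges)
print(len(w), 4 * n * n - 2 * len(edges))  # 32 32
```

The number of fresh symbols used is \(m + \sum_{i}(n - |N(v_{i})|)\), matching the size of the ◇-part of Σ.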

Lemma 12

The following statements are equivalent for each \(k \leq n\):

  • \(\mathcal {G}\) has an independent set I with |I| = k.

  • There is a grammar G for w with at most k nonterminals and \(|G| \leq 4n^{2} - 2m + 3k - 2kn\).

  • There is a 1-level grammar G for w with at most k nonterminals and \(|G| \leq 4n^{2} - 2m + 3k - 2kn\).

Proof

We first prove the equivalence of the first and the third statement. Let I be an independent set for \(\mathcal {G}\) with |I| = k. We define a grammar G = (N, Σ, R, ax) by \(N = \{A_{i} \mid v_{i} \in I\}\), \(R = \{A_{i} \rightarrow \# v_{i} \# \mid A_{i} \in N\}\) and \(\mathsf {ax} = w^{\prime }\), where \(w^{\prime }\) is obtained from w by replacing, for every \(v_{i} \in I\), all occurrences of \(\# v_{i} \#\) by \(A_{i}\) (note that, since I is an independent set, no two occurrences of factors \(\# v_{i} \#\) and \(\# v_{j} \#\) with \(v_{i}, v_{j} \in I\) overlap). Obviously, G is a 1-level grammar for w with k nonterminals. For every \(v_{i} \in I\), \(|\mathsf {ax}|_{A_{i}} = |N(v_{i})| + (n - |N(v_{i})|) = n\); thus, \(\mathsf {p}(A_{i}) = 2n - 3\) (recall that the concept of the profit p(A) of a nonterminal A of a 1-level grammar is defined on page 11). Consequently, \(|G| = |{w}| - {\sum }_{A \in N} \mathsf {p}(A) = 4n^{2}-2m - k(2n - 3)\).

Conversely, let G = (N, Σ, R, ax) be a 1-level grammar for w of size at most \(4n^{2} - 2m - 2kn + 3k\) with at most k nonterminals. We note that, for every \(A \in N\), \(\mathsf {p}(A) \leq 2n - 3\), since in w every repeated factor has length at most 3 and occurs at most n times. Since, by assumption, \(|G| \leq 4n^{2} - 2m - k(2n - 3)\) and \(|G| = 4n^{2}-2m - {\sum }_{A \in N} \mathsf {p}(A)\), we conclude that \({\sum }_{A \in N} \mathsf {p}(A) \geq k(2n - 3)\). Hence, there are exactly k nonterminals \(A \in N\), each with a right side of length 3, which implies \(A \rightarrow \# v_{i} \#\), for some i, \(1 \leq i \leq n\), and, furthermore, \(|\mathsf {ax}|_{A} = n\). It can be easily verified that this is only possible if \(\{v_{i} \mid (A \rightarrow \# v_{i} \#) \in R\}\) is an independent set for \(\mathcal {G}\).

The third statement obviously implies the second statement. Conversely, we assume that the second statement holds, i. e., there is a grammar G = (N, Σ, R, ax) for w with at most k nonterminals and \(|G| \leq 4n^{2} - 2m + 3k - 2kn\). If G is not a 1-level grammar, then it has a rule \(A \rightarrow \alpha\) with \(\alpha \notin \Sigma^{+}\) and, since the only repeated factors of w with a length of at least 3 have the form \(\# x \#\), for some \(x \in \{v_{1},\dots , v_{n}\}\), we also know that \(\mathfrak {D}(A) = \# x \#\). In particular, this implies that \(\alpha = B\#\) or \(\alpha = \# B\) with \(B \rightarrow \# x \in R\) or \(B \rightarrow x \# \in R\), respectively. In general, each rule in G has a length (and hence cost) of at least 2, compresses a factor of length at most 3 and occurs in the axiom at most n times. The nonterminals A and B together can occur at most n times in ax, as they both derive the symbol x. This means that the axiom has a length of at least \(|{w}| - (k - 1)2n\) and, therefore, the overall grammar has a size of at least \(|{w}| - (k - 1)2n + 2k = 4n^{2} - 2m - 2kn + 2n + 2k\). Since we assumed that \(|G| \leq 4n^{2} - 2m + 3k - 2kn\), this implies \(4n^{2} - 2m - 2kn + 2n + 2k \leq 4n^{2} - 2m + 3k - 2kn\), so \(2n \leq k\), which contradicts the assumption \(k \leq n\). Hence, G is a 1-level grammar, i. e., the second statement implies the third. □
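The first direction of the proof can be replayed on a small instance. The Python sketch below (hypothetical helper names; nonterminals are written A_v) rebuilds w, replaces every occurrence of #v# with v ∈ I by A_v in a single left-to-right pass (this suffices because, for an independent set, these occurrences do not overlap), and compares the size of the resulting 1-level grammar with \(4n^{2} - 2m - k(2n - 3)\).

```python
def reduction_word(n, edges):
    """w from the reduction, as a list of symbols; fresh diamonds are 'd1', 'd2', ..."""
    degree = [0] * (n + 1)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    w, fresh = [], 0
    for i, j in edges:
        fresh += 1
        w += ["#", f"v{i}", "#", f"v{j}", "#", f"d{fresh}"]
    for i in range(1, n + 1):
        for _ in range(n - degree[i]):
            fresh += 1
            w += ["#", f"v{i}", "#", f"d{fresh}"]
    return w


def grammar_from_independent_set(w, independent):
    """1-level grammar with rules A_v -> # v # for every v in the independent set."""
    ax, pos = [], 0
    while pos < len(w):
        if (pos + 2 < len(w) and w[pos] == "#"
                and w[pos + 1] in independent and w[pos + 2] == "#"):
            ax.append("A_" + w[pos + 1])   # occurrence of # v # with v in I
            pos += 3
        else:
            ax.append(w[pos])
            pos += 1
    rules = {"A_" + v: ["#", v, "#"] for v in independent}
    return ax, rules


n, edges, independent = 3, [(1, 2), (2, 3)], {"v1", "v3"}
ax, rules = grammar_from_independent_set(reduction_word(n, edges), independent)
size = len(ax) + sum(len(rhs) for rhs in rules.values())
k, m = len(independent), len(edges)
print(size, 4 * n * n - 2 * m - k * (2 * n - 3))  # 26 26
```

Each A_v occurs exactly n = 3 times in the axiom, so each rule has profit 2n − 3 = 3.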

Lemma 12 directly yields the following result:

Theorem 11

1-SGP and SGP parameterised by |N| are W[1]-hard.

We emphasise that Theorem 11 shows W[1]-hardness for the smallest grammar problem parameterised by |N| only for the case where the terminal alphabet Σ is unbounded. The most important question in this respect, which, unfortunately, is left open here, is whether the smallest grammar problem is fixed-parameter tractable with respect to the combined parameter (|N|,|Σ|) (we discuss the open cases of the parameterised complexity of the smallest grammar problem in more detail in Section 6).

Finally, we note that we can use Lemma 11 in order to obtain a simple exact exponential-time algorithm for the smallest grammar problem. More precisely, we compute for each subset \(F \subseteq \mathsf {F}_{\geq 2}(w)\) a smallest F-grammar, which yields an algorithm with an overall running-time of \(2^{\mathcal {O}(|{w}|^{2})}\). In the next section, we present more advanced exact exponential-time algorithms for SGP and 1-SGP.

5 Exact Exponential-Time Algorithms

An obvious approach for an exact exponential-time algorithm for SGP is to enumerate all ordered trees with |w| leaves (in which every internal node has at least two children) and to interpret them as derivation trees of a grammar for w. More precisely, for a given ordered tree with |w| leaves, we first label the leaves with the symbols of w and then we inductively label each internal node with \(u_{1}u_{2}\cdots u_{k}\), where \(u_{1}, \dots, u_{k}\) are the labels of its children. Finally, for every factor u that occurs as a label of some internal node, we substitute all occurrences of this label by a nonterminal \(A_{u}\). In order to estimate the number of such trees, we first note that the ith Catalan number \(C_{i}\) is the number of full binary trees (i. e., every non-leaf has exactly two children) with i + 1 leaves. Moreover, every tree with |w| leaves can be obtained from a full binary tree with |w| leaves by contracting some of its ‘non-leaf’ edges (i. e., edges not incident to a leaf). Since every full binary tree with |w| leaves has fewer than |w| such ‘non-leaf’ edges, the number of trees that we have to consider is at most \(C_{|{w}|-1} \cdot 2^{|{w}|}\). Since \(C_{|{w}|-1} \in \mathcal {O}(4^{|{w}|-1})\), this leads to an algorithm with running-time \(\mathcal {O}^{*}(8^{|{w}|})\).
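The bound can be sanity-checked numerically. The Python sketch below (our own illustration, hypothetical helper names) counts the ordered trees with a given number of leaves in which every internal node has at least two children, i.e., the candidate derivation-tree shapes, via a recurrence over the leaf counts of the subtrees, and compares the counts with \(C_{|w|-1} \cdot 2^{|w|}\) and with \(8^{|w|}\).

```python
from math import comb


def catalan(m: int) -> int:
    """C_m = number of full binary trees with m + 1 leaves."""
    return comb(2 * m, m) // (m + 1)


def count_trees(max_leaves: int) -> list:
    """t[n] = number of ordered trees with n leaves whose internal nodes have >= 2 children."""
    t = [0] * (max_leaves + 1)   # t[n]: trees with n leaves
    g = [0] * (max_leaves + 1)   # g[n]: sequences of trees with n leaves in total
    g[0] = 1                     # the empty sequence
    for n in range(1, max_leaves + 1):
        # a root has a first subtree with j < n leaves followed by >= 1 further subtrees
        t[n] = 1 if n == 1 else sum(t[j] * g[n - j] for j in range(1, n))
        g[n] = sum(t[j] * g[n - j] for j in range(1, n + 1))
    return t


t = count_trees(10)
print(t[1:7])                    # [1, 1, 3, 11, 45, 197]
for n in range(1, 11):
    assert t[n] <= catalan(n - 1) * 2 ** n <= 8 ** n
```

The exact counts grow considerably more slowly than the bound, so the \(\mathcal {O}^{*}(8^{|{w}|})\) estimate is crude, which is one motivation for the more refined algorithms below.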

In the following, we shall give more sophisticated exact exponential-time algorithms with running times \(\mathcal {O}^{*}(1.8392^{|{w}|})\), for the 1-level case, and \(\mathcal {O}^{*}(3^{|{w}|})\), for the multi-level case. First, we need to introduce some helpful notations.

Let G = (N, Σ, R, ax) be a grammar for w and let \(\alpha = A_{1} \cdots A_{k}\), \(A_{i} \in (\Sigma \cup N)\), \(1 \leq i \leq k\). The factorisation of \(\mathfrak {D}(\alpha )\) induced by α is the tuple \((\mathfrak {D}_{G}(A_{1}),\dots ,\mathfrak {D}_{G}(A_{k}))\). Furthermore, the factorisation of w induced by ax is called the factorisation of w induced by G. A factorisation \(q = (u_{1}, u_{2},\dots, u_{k})\) of a word w with |w| = n can be characterised by the vector \(v_{q}\in \{0,1\}^{n-1}\) defined by setting \(v_{q}[i] = 1\) if and only if \(i = |u_{1}\cdots u_{j}|\) for some \(1 \leq j < k\). For the sake of convenience, we implicitly assume \(v_{q}[0] = v_{q}[n] = 1\), and treat vectors as words over the alphabet \(\mathbb {N}\), which allows us to use notations already defined for words. From now on, we shall use these two representations of factorisations, i. e., tuples of factors and vectors in \(\{0,1\}^{n-1}\), interchangeably, without mentioning it.
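The correspondence between tuples and vectors is straightforward to implement. In the Python sketch below (hypothetical function names), entry i of \(v_{q}\), \(1 \leq i \leq n-1\), is stored at list index i − 1, and the implicit entries \(v_{q}[0] = v_{q}[n] = 1\) are not stored.

```python
from itertools import product


def to_vector(factors: tuple) -> list:
    """(u_1, ..., u_k) -> v_q in {0,1}^(n-1); v_q[i] = 1 iff i = |u_1 ... u_j| for some j < k."""
    n = sum(len(u) for u in factors)
    v = [0] * (n - 1)
    pos = 0
    for u in factors[:-1]:
        pos += len(u)
        v[pos - 1] = 1              # paper index pos -> list index pos - 1
    return v


def to_factors(w: str, v: list) -> tuple:
    """Inverse direction: cut w after every position i with v_q[i] = 1."""
    cuts = [0] + [i for i in range(1, len(w)) if v[i - 1] == 1] + [len(w)]
    return tuple(w[a:b] for a, b in zip(cuts, cuts[1:]))


print(to_vector(("ab", "a", "bab")))           # [0, 1, 1, 0, 0]
print(to_factors("ababab", [0, 1, 1, 0, 0]))  # ('ab', 'a', 'bab')
# round trip over all 2^4 factorisations of a word of length 5
assert all(to_vector(to_factors("abcab", list(v))) == list(v)
           for v in product((0, 1), repeat=4))
```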

5.1 The 1-Level Case

In the 1-level case, as long as we are only concerned with smallest grammars, the factorisation induced by the axiom already fully determines the grammar. More formally, let \(q=(u_{1},u_{2},\dots ,u_{k})\) be a factorisation of a word w and let \(F_{q} = \{u_{i} \mid 1 \leq i \leq k, |u_{i}| \geq 2\}\). We define the 1-level grammar \(G_{q} = (N_{q}, \Sigma, R_{q}, \mathsf{ax}_{q})\) by \(R_{q} = \{(A_{u}, u) \mid u \in F_{q}\}\), \(N_{q} = \{A_{u} \mid u \in F_{q}\}\) and \(\mathsf{ax}_{q} = B_{1} \cdots B_{k}\) with \(B_{j} = A_{u_{j}}\), if \(u_{j} \in F_{q}\), and \(B_{j} = u_{j}\), otherwise.

Lemma 13

For any factorisation \(q=(u_{1},u_{2},\dots ,u_{k})\) for w, Gq is a smallest grammar among all 1-level grammars for w that induce the factorisation q.

Proof

Let \(q=(u_{1},u_{2},\dots ,u_{k})\) be a factorisation for a word w. Every 1-level grammar G = (N,Σ, R, ax) for w that induces q satisfies \(|G| = k + {\sum }_{A \in N} |\mathfrak {D}(A)| \geq k + {\sum }_{u \in F_{q}} |u|\). Since \(|G_{q}| = k + {\sum }_{u \in F_{q}} |u|\), Gq is a smallest 1-level grammar for w that induces q. □
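Lemma 13 translates directly into code. The following Python sketch (hypothetical names; nonterminals are named A0, A1, … and right sides are stored as strings) builds \(G_{q}\) from a factorisation given as a tuple of factors and computes its size \(k + \sum_{u \in F_{q}} |u|\). The usage line takes the factorisation of aaabbabaaaaaabbab that is induced by the level \(\mathsf{L}_{1}\) in Example 2.

```python
def one_level_grammar(factors: tuple):
    """G_q from Lemma 13: one rule A_u -> u for every distinct factor u with |u| >= 2."""
    long_factors = sorted({u for u in factors if len(u) >= 2})
    name = {u: f"A{idx}" for idx, u in enumerate(long_factors)}
    rules = {name[u]: u for u in long_factors}
    axiom = [name.get(u, u) for u in factors]   # factors of length 1 stay terminals
    return axiom, rules


def grammar_size(axiom, rules) -> int:
    """|G| = |ax| + sum of the lengths of all right sides."""
    return len(axiom) + sum(len(rhs) for rhs in rules.values())


q = ("aaa", "b", "b", "ab", "aaa", "aaa", "b", "b", "ab")
axiom, rules = one_level_grammar(q)
print(grammar_size(axiom, rules))   # 14 = 9 axiom symbols + |aaa| + |ab|
```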

Choosing the smallest among all grammars \(\{G_{q} \mid q \text{ is a factorisation of } w\}\) yields an \(\mathcal {O}^{*}(2^{|{w}|})\) algorithm for 1-SGP. However, it is not necessary to enumerate factorisations that contain at least two consecutive factors of length 1, which improves this result as follows.
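For very short words, this \(\mathcal {O}^{*}(2^{|{w}|})\) enumeration fits in a few lines. The Python sketch below (a hypothetical helper, useful mainly as a reference for testing faster methods) evaluates \(|G_{q}| = k + \sum_{u \in F_{q}} |u|\) from Lemma 13 for every vector in \(\{0,1\}^{|w|-1}\).

```python
from itertools import product


def smallest_one_level_size(w: str) -> int:
    """min |G_q| over all 2^(|w|-1) factorisations q of w (Lemma 13)."""
    n = len(w)
    best = n                               # all factors of length 1: |G_q| = |w|
    for v in product((0, 1), repeat=n - 1):
        cuts = [0] + [i for i in range(1, n) if v[i - 1] == 1] + [n]
        factors = [w[a:b] for a, b in zip(cuts, cuts[1:])]
        size = len(factors) + sum(len(u) for u in set(factors) if len(u) >= 2)
        best = min(best, size)
    return best


print(smallest_one_level_size("abcabcabc"))  # 6, via the factorisation (abc, abc, abc)
```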

Theorem 12

1-SGP can be solved exactly in polynomial space and in time \(\mathcal {O}^{*}(1.8392^{|{w}|})\).

Proof

For any \(k \in \mathbb {N}\), let \(\Gamma_{k}\) contain all \(q \in \{0,1\}^{k}\) such that q has no prefix 11, no suffix 11 and no factor 111; furthermore, let \({\Gamma }^{\prime }_{k}\) contain all \(q \in \{0,1\}^{k}\) such that q has no suffix 11 and no factor 111. Clearly, \(\Gamma_{|w|-1}\) contains exactly the factorisations of w that have no two consecutive factors of length 1 (recall that \(q[0] = q[|w|] = 1\) is implicitly assumed). In order to solve the smallest 1-level grammar problem, we enumerate \(\Gamma_{|w|-1}\) and, for every \(q \in \Gamma_{|w|-1}\), we construct \(G_{p}\), where p is obtained from q by replacing every non-repeated factor u of q with the factors \(u[1], u[2],\dots, u[|u|]\). It remains to prove the correctness of this algorithm and to estimate its running-time.

To this end, let G be a smallest 1-level grammar for w and let \(p = (u_{1}, u_{2},\dots, u_{k})\) be the factorisation induced by G. Furthermore, let q be the factorisation obtained from p by joining every maximal sequence \(u_{i}, u_{i+1},\dots, u_{j}\), \(1 \leq i < j \leq k\), of consecutive factors of length 1 into a single factor (note that \(q \in \Gamma_{|w|-1}\)). If none of the newly constructed factors of q is repeated, then the algorithm, when enumerating q, constructs the grammar \(G_{p}\) that, according to Lemma 13, is smallest among all 1-level grammars for w that induce p; thus, \(G_{p}\) is a smallest 1-level grammar. If, on the other hand, any of these newly constructed factors is repeated and has a length of at least 3, or has length 2 and is repeated at least 3 times, then a 1-level grammar smaller than G could be constructed, which is a contradiction. This leaves the case where all repeated newly constructed factors of q have length 2 and occur exactly twice. In this case, the algorithm will, when enumerating q, construct a grammar that differs from \(G_{p}\) only in that it compresses some factors of length 2 that occur only twice and that \(G_{p}\) does not compress. This grammar obviously has the same size as \(G_{p}\) and is therefore a smallest 1-level grammar as well.

In order to estimate the running-time, let T(k) = |Γk| and \(T^{\prime }(k) = |{\Gamma }^{\prime }_{k}|\), for every \(k \in \mathbb {N}\). Obviously,

$$ T(k) = |\{q \in {\Gamma}_{k}:q[1] = 0\}| + |\{q \in {\Gamma}_{k}:q[1] = 1\}| , $$

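so, in the following, we shall determine \(|\{q \in {\Gamma}_{k} \mid q[1] = 0\}|\) and \(|\{q \in {\Gamma}_{k} \mid q[1] = 1\}|\) separately. To this end, we first note that \(|\{q \in {\Gamma }_{k} \mid q[1] = 1\}| = T(k-1) - T^{\prime }(k-3)\): every such q has the form \(q = 1q^{\prime\prime}\), where \(q^{\prime\prime} \in \Gamma_{k-1}\) starts with 0, and the vectors of \(\Gamma_{k-1}\) that start with 1 are exactly the vectors \(10q^{\prime }\) with \(q^{\prime } \in {\Gamma }^{\prime }_{k-3}\) (prepending 1 to these would produce the forbidden prefix 11, so their number \(T^{\prime }(k-3)\) has to be subtracted from \(T(k-1)\)). Moreover,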

$$ \begin{array}{@{}rcl@{}} |\{q \in {\Gamma}_{k} \mid q[1]q[2] = 01\}| &=& T(k-2) ,\\ |\{q \in {\Gamma}_{k} \mid q[1]q[2]q[3] = 001\}| &=& T(k-3) ,\\ |\{q \in {\Gamma}_{k} \mid q[1]q[2]q[3] = 000\}| &=& T^{\prime}(k-3) . \end{array} $$

This is due to the fact that, after the prefix 01 or 001, the remainder must not start with 11 (otherwise a factor 111 would arise), whereas the prefix 000 can be followed by 11. With the above observations, we can now conclude the following:

$$ \begin{array}{@{}rcl@{}} T(k) &= &|\{q \in {\Gamma}_{k} \mid q[1] = 0\}| + |\{q \in {\Gamma}_{k} \mid q[1] = 1\}|\\ &= &|\{q \in {\Gamma}_{k} \mid q = 01\ldots\}| + |\{q \in {\Gamma}_{k} \mid q = 001\ldots\}| +\\ &&|\{q \in {\Gamma}_{k} \mid q = 000\ldots\}| + |\{q \in {\Gamma}_{k} \mid q[1] = 1\}|\\ &= &T(k-2) + T(k-3) + T^{\prime}(k-3) + T(k-1) - T^{\prime}(k-3)\\ &= &T(k-1) + T(k-2) + T(k-3) . \end{array} $$

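This recurrence holds for all \(k \geq 5\) (for smaller k, the boundary of the vector interferes with some of the above counting arguments; e. g., \(T(4) = 9 \neq T(3) + T(2) + T(1) = 10\)), which suffices for the asymptotic analysis and yields \(T(k) = \mathcal {O}(1.8392^{k})\). Since \(\Gamma_{|w|-1}\) can be enumerated in time \(\mathcal {O}^{*}(1.8392^{|{w}|})\) and polynomial space, the algorithm has a running-time of \(\mathcal {O}^{*}(1.8392^{|{w}|})\). □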
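The counting argument can be cross-checked by brute force. The Python sketch below (our own illustration) enumerates \(\Gamma_{k}\) directly from its definition: the counts for k = 0, …, 6 are 1, 2, 3, 5, 9, 17, 31, they satisfy T(k) = T(k − 1) + T(k − 2) + T(k − 3) from k = 5 on, and the ratio of consecutive counts approaches the real root of \(x^{3} = x^{2} + x + 1\).

```python
from itertools import product


def in_gamma(q: tuple) -> bool:
    """q in Gamma_k: no prefix 11, no suffix 11, no factor 111."""
    s = "".join(map(str, q))
    return not s.startswith("11") and not s.endswith("11") and "111" not in s


def gamma_size(k: int) -> int:
    return sum(in_gamma(q) for q in product((0, 1), repeat=k))


T = [gamma_size(k) for k in range(16)]
print(T[:7])                                                       # [1, 2, 3, 5, 9, 17, 31]
print(all(T[k] == T[k - 1] + T[k - 2] + T[k - 3] for k in range(5, 16)))  # True
```

Compared with all \(2^{k}\) vectors, the filter already discards well over half of the candidates for k = 15.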

5.2 The Multi-Level Case

The obvious idea for a dynamic programming algorithm is to build up grammars level by level, e. g., by starting with a 1-level grammar, then extending it by a new axiom, which can derive the old axiom in one derivation step, and iterating this procedure. Obviously, we have to try an exponential number of axioms, which will lead to an exponential-time algorithm (as suggested by the NP-completeness of the problem). However, there is a more fundamental problem with this general approach, which shall be pointed out by going a bit more into detail.

For every i and every factorisation p of w, we store in entry T[i, p] of a table the size of a smallest i-level grammar with an axiom ax that induces factorisation p (in the sense defined at the beginning of this section). Then, for every factorisation q such that p is a refinement of q, we construct a new axiom \(\mathsf {ax}^{\prime }\) that induces factorisation q and that can derive ax in one step, which is treated as the axiom of a new (i + 1)-level grammar. We subtract the profit of the rules needed to derive ax from \(\mathsf {ax}^{\prime }\) from T[i, p] and store the obtained number in T[i + 1, q]. Note that the axioms ax and \(\mathsf {ax}^{\prime }\) are fully determined by the factorisations p and q (similarly to how a factorisation determines a smallest 1-level grammar with an axiom inducing this factorisation, see Lemma 13). However, this approach is fundamentally flawed, since in order to compute the size of the new (i + 1)-level grammar, we need to know whether the rules needed to derive ax from \(\mathsf {ax}^{\prime }\) have already been used earlier in the i-level grammar, and are therefore already counted by T[i, p], or whether they are newly introduced. On the other hand, additionally storing all previously used rules should clearly be avoided.

To overcome this problem, we do not consider the levels of a grammar as strings ax, D(ax), D(D(ax)),…, w, which is the obvious choice, but we define them in such a way that all occurrences of a nonterminal are on the same level. With this definition, all the rules that are needed for the extension to the new level must be completely new rules without prior application; thus, a dynamic programming approach similar to the one described above will be successful. Next, we give the required definitions (which are also illustrated by Example 2).

For a d-level grammar G = (N, Σ, R, ax), we partition the set of nonterminals N according to the number of derivation steps that are necessary to derive a terminal word (or, equivalently, according to their height, i. e., the maximum distance to a leaf in the derivation tree). More precisely, let \(N_{1},\dots ,N_{d}\) be the partition of N into \(N_{i}=\{A\in N \mid ({\mathsf {D}^{i}_{G}}(A)\in {\Sigma }^{+})\wedge (\mathsf {D}^{i-1}_{G}(A)\notin {\Sigma }^{+})\}\). We recall that the morphism \(\mathsf {D} \colon (N \cup \Sigma)^{*} \rightarrow (N \cup \Sigma)^{*}\) replaces every occurrence of a nonterminal by the right side of its rule. For every i, \(1 \leq i \leq d\), we modify D such that it only considers nonterminals from \(N_{i}\) and ignores the rest. More formally, for every i, \(1 \leq i \leq d\), we define a morphism \(\widehat {\mathsf {D}}_{i}\colon (N\cup {\Sigma })^{*}\rightarrow (N\cup {\Sigma })^{*}\) component-wise by \(\widehat {\mathsf {D}}_{i}(x) = \mathsf {D}(x)\), if \(x \in N_{i}\), and \(\widehat {\mathsf {D}}_{i}(x) = x\), otherwise. Using these morphisms, we now inductively define the levels \(\mathsf {L}_{i}\), \(0 \leq i \leq d\), of G by \(\mathsf {L}_{d} = \mathsf {ax}\) and, for every i, \(0 \leq i \leq d - 1\), \(\mathsf {L}_{i} = \widehat {\mathsf {D}}_{i+1}(\mathsf {L}_{i + 1})\).
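The partition and the levels can be computed with a few lines of code. In the Python sketch below, a grammar is represented (our own convention, with hypothetical helper names) as a dictionary mapping each nonterminal to the list of symbols of its right side; every symbol that is not a key is a terminal. The height of a nonterminal, which determines its block \(N_{i}\), is computed by memoised recursion, and the levels are obtained by applying \(\widehat {\mathsf {D}}_{d}, \dots, \widehat {\mathsf {D}}_{1}\) to the axiom. The usage lines take the grammar of Example 2.

```python
def heights(rules: dict) -> dict:
    """height(A) = i  iff  A is in N_i (terminals have height 0)."""
    memo = {}

    def h(x):
        if x not in rules:
            return 0
        if x not in memo:
            memo[x] = 1 + max(h(y) for y in rules[x])
        return memo[x]

    return {A: h(A) for A in rules}


def levels(rules: dict, axiom: list) -> list:
    """Return [L_0, L_1, ..., L_d] with L_d = axiom and L_{i-1} = hatD_i(L_i)."""
    height = heights(rules)
    d = max((height[x] for x in axiom if x in rules), default=0)
    L = [None] * (d + 1)
    L[d] = list(axiom)
    for i in range(d, 0, -1):
        nxt = []
        for x in L[i]:
            # hatD_i expands only nonterminals from N_i and leaves every other symbol unchanged
            nxt.extend(rules[x] if height.get(x) == i else [x])
        L[i - 1] = nxt
    return L


rules = {"A": ["D", "b", "b"], "B": ["a", "b"], "C": ["A", "B"], "D": ["a", "a", "a"]}
for i, level in enumerate(levels(rules, ["C", "D", "C"])):
    print(i, "".join(level))
# 0 aaabbabaaaaaabbab
# 1 DbbBDDbbB
# 2 ABDAB
# 3 CDC
```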

Observation 4

The sequence \(\mathsf {L}_{d}, \mathsf {L}_{d-1},\dots, \mathsf {L}_{1}, \mathsf {L}_{0}\) is a derivation with \(\mathsf {L}_{d} = \mathsf {ax}\) and \(\mathsf {L}_{0} = w\), and, by a simple induction over i, it can be verified that, for every i, \(1 \leq i \leq d\), all applications of rules for nonterminals from \(N_{i}\) happen in the single derivation step from \(\mathsf {L}_{i}\) to \(\mathsf {L}_{i-1}\). In particular, this implies that, for every i, \(1 \leq i \leq d\), \(\mathsf {L}_{i}\) contains all occurrences of nonterminals \(A \in N_{i}\) that are ever derived in the derivation of w or, in other words, for every j, \(0 \leq j \leq i - 1\), \({\sum }_{A \in N_{i}} |\mathsf {L}_{j}|_{A} = 0\).

Since in the derivation \(\mathsf {L}_{d}, \mathsf {L}_{d-1},\dots, \mathsf {L}_{1}, \mathsf {L}_{0}\) the occurrences of a nonterminal \(A \in N_{i}\) are not derived until all of them are collected in \(\mathsf {L}_{i}\), and then they are all derived in the same derivation step, we can conveniently define the profit for all rules (of the d-level grammar G) as follows. For every i, \(1 \leq i \leq d\), we define the profit of every \(A \in N_{i}\) by \(\mathsf {p}(A) = |\mathsf {L}_{i}|_{A}(|\mathsf {D}(A)| - 1) - |\mathsf {D}(A)|\). Note that for d = 1 this corresponds to the definition of profit for 1-level grammars as introduced on page 11. In particular, we can now express the size of a grammar in terms of the profit of its rules:

Proposition 7

Let G be a d-level grammar for w. Then \(|G| = |{w}|-({\sum }^{d}_{i = 1}{\sum }_{A\in N_{i}} \mathsf {p}(A))\).

Proof

We recall that, by definition of the size of a grammar and as a conclusion of Observation 4, we have

$$ \begin{array}{@{}rcl@{}} |G| = \left( \sum\limits_{i = 1}^{d} \sum\limits_{A \in N_{i}} |\mathsf{D}(A)| \right) + |\mathsf{ax}| , |{w}| = \left( \sum\limits_{i = 1}^{d} \sum\limits_{A \in N_{i}} |\mathsf{L}_{i}|_{A} (|\mathsf{D}(A)| - 1)\right) + |\mathsf{ax}| . \end{array} $$

Consequently,

$$ \begin{array}{@{}rcl@{}} &&|{w}|-\left( \sum\limits_{i = 1}^{d}\sum\limits_{A\in N_{i}}\! \mathsf{p}(A)\! \right)\! = |{w}| - \left( \sum\limits_{i = 1}^{d} \sum\limits_{A \in N_{i}} |\mathsf{L}_{i}|_{A} (|\mathsf{D}(A)| - 1) - |\mathsf{D}(A)|\right) = \\ &&|{w}| - \left( \left( \sum\limits_{i = 1}^{d} \sum\limits_{A \in N_{i}} |\mathsf{L}_{i}|_{A} (|\mathsf{D}(A)| - 1)\right) - \left( \sum\limits_{i = 1}^{d} \sum\limits_{A \in N_{i}} |\mathsf{D}(A)|\right)\right) = \\ &&|{w}| - \left( (|{w}| - |\mathsf{ax}|) - (|G| - |\mathsf{ax}|)\right) = |G| . \end{array} $$

Example 2

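Let G = (N, Σ, R, ax) with N = {A, B, C, D}, \(\Sigma = \{\mathtt{a}, \mathtt{b}\}\), \(R = \{A \rightarrow D\mathtt{b}\mathtt{b},\ B \rightarrow \mathtt{a}\mathtt{b},\ C \rightarrow AB,\ D \rightarrow \mathtt{a}\mathtt{a}\mathtt{a}\}\) and ax = CDC be the 3-level grammar illustrated in Fig. 4. According to the definitions from above, the partition of N is \(N_{1} = \{B, D\}\), \(N_{2} = \{A\}\), \(N_{3} = \{C\}\), and the levels are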

$$ \begin{array}{@{}rcl@{}} \begin{array}{ll} \mathsf{L}_{3}= \mathsf{ax} & \qquad =CDC ,\\ \mathsf{L}_{2} = \widehat{\mathsf{D}}_{3}(CDC) & \qquad = A B D A B ,\\ \mathsf{L}_{1} = \widehat{\mathsf{D}}_{2}(A B D A B) & \qquad = D {\mathtt{b}} {\mathtt{b}} B D D {\mathtt{b}} {\mathtt{b}} B ,\\ \mathsf{L}_{0} = \widehat{\mathsf{D}}_{1}(D {\mathtt{b}} {\mathtt{b}} B D D {\mathtt{b}} {\mathtt{b}} B) & \qquad = {\mathtt{a}} {\mathtt{a}} {\mathtt{a}} {\mathtt{b}} {\mathtt{b}} {\mathtt{a}} {\mathtt{b}} {\mathtt{a}} {\mathtt{a}} {\mathtt{a}} {\mathtt{a}} {\mathtt{a}} {\mathtt{a}} {\mathtt{b}} {\mathtt{b}} {\mathtt{a}} {\mathtt{b}} . \end{array} \end{array} $$

Note that, for every i, \(1 \leq i \leq 3\), \(\mathsf {L}_{i}\) contains all occurrences of all nonterminals from \(N_{i}\), and the rules for all nonterminals in \(N_{i}\) are exclusively applied in deriving \(\mathsf {L}_{i-1}\) from \(\mathsf {L}_{i}\). In particular, note that in the derivation \(\mathsf {L}_{3},\dots, \mathsf {L}_{0}\), the derivation of occurrences of the nonterminals B and D is delayed until the very last derivation step.

Fig. 4

A derivation tree for the 3-level grammar of Example 2 (neighbouring leaves are combined, the start rule is omitted)

Furthermore, the profits are as follows

$$ \begin{array}{@{}rcl@{}} \mathsf{p}(A) &=& |\mathsf{L}_{2}|_{A} (|\mathsf{D}(A)| - 1) - |\mathsf{D}(A)| = 2 (3 - 1) - 3 = 1, \\ \mathsf{p}(B) &=& |\mathsf{L}_{1}|_{B} (|\mathsf{D}(B)| - 1) - |\mathsf{D}(B)| = 2 (2 - 1) - 2 = 0, \\ \mathsf{p}(C) &=& |\mathsf{L}_{3}|_{C} (|\mathsf{D}(C)| - 1) - |\mathsf{D}(C)| = 2 (2 - 1) - 2 = 0, \\ \mathsf{p}(D) &=& |\mathsf{L}_{1}|_{D} (|\mathsf{D}(D)| - 1) - |\mathsf{D}(D)| = 3 (3 - 1) - 3 = 3 . \end{array} $$

Moreover, \(|{w}|-{\sum }_{A\in N} \mathsf {p}(A) = 17 - 4 = 13\) and |G| = |ax| + |D(A)| + |D(B)| + |D(C)| + |D(D)| = 3 + 3 + 2 + 2 + 3 = 13.
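The numbers of Example 2 can be recomputed mechanically. By Observation 4, \(|\mathsf {L}_{i}|_{A}\) for \(A \in N_{i}\) equals the total number of occurrences of A in the derivation of w. The Python sketch below (hypothetical helper names; a grammar is represented, as our own convention, by a dictionary from nonterminals to right sides) obtains these counts by pushing occurrence counts from the axiom down through the rules in order of decreasing height, and then checks Proposition 7.

```python
from collections import Counter


def height(rules: dict, x) -> int:
    """0 for terminals; otherwise 1 + the maximum height of a right-side symbol."""
    return 0 if x not in rules else 1 + max(height(rules, y) for y in rules[x])


def profits(rules: dict, axiom: list) -> dict:
    """p(A) = |L_i|_A (|D(A)| - 1) - |D(A)| for every nonterminal A."""
    occ = Counter(x for x in axiom if x in rules)
    # A only occurs in right sides of nonterminals of larger height, so in order of
    # decreasing height all occurrences of A are counted before A itself is expanded
    for A in sorted(rules, key=lambda x: -height(rules, x)):
        for y in rules[A]:
            if y in rules:
                occ[y] += occ[A]
    return {A: occ[A] * (len(rules[A]) - 1) - len(rules[A]) for A in rules}


def expand(rules: dict, x) -> str:
    return x if x not in rules else "".join(expand(rules, y) for y in rules[x])


rules = {"A": ["D", "b", "b"], "B": ["a", "b"], "C": ["A", "B"], "D": ["a", "a", "a"]}
axiom = ["C", "D", "C"]
p = profits(rules, axiom)
w = "".join(expand(rules, x) for x in axiom)
size = len(axiom) + sum(len(rhs) for rhs in rules.values())
print(p)                                # {'A': 1, 'B': 0, 'C': 0, 'D': 3}
print(len(w) - sum(p.values()), size)   # 13 13
```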

Before we formally present the dynamic programming algorithm, we sketch its behaviour in a more intuitive way. We first need the following definition. A factorisation \(p = (u_{1}, u_{2},\dots, u_{k})\) is a refinement of a factorisation \(q = (v_{1}, v_{2},\dots, v_{m})\), denoted by \(p \preceq q\), if there are indices \(0 = j_{0} < j_{1} < \dots < j_{m} = k\) such that \((u_{j_{i - 1} + 1}, u_{j_{i - 1} + 2}, \ldots , u_{j_{i}})\) is a factorisation of \(v_{i}\), for every i, \(1 \leq i \leq m\).
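For factorisations of the same word, the definition has a simple vector form: \(p \preceq q\) holds if and only if every cut of q is also a cut of p, i.e., \(v_{q}[i] = 1\) implies \(v_{p}[i] = 1\). The Python sketch below (hypothetical helper names) implements both the definition on tuples and this vector criterion, so that they can be compared on small examples.

```python
def refines_tuples(p: tuple, q: tuple) -> bool:
    """Definition: p splits into consecutive blocks whose concatenations are v_1, ..., v_m."""
    idx = 0
    for v in q:
        block = ""
        while len(block) < len(v) and idx < len(p):
            block += p[idx]
            idx += 1
        if block != v:
            return False
    return idx == len(p)


def refines_vectors(vp: list, vq: list) -> bool:
    """Vector form: every cut of q is a cut of p."""
    return len(vp) == len(vq) and all(a >= b for a, b in zip(vp, vq))


print(refines_tuples(("a", "b", "ab"), ("ab", "ab")))   # True
print(refines_vectors([1, 1, 0], [0, 1, 0]))            # True (the same pair as vectors)
print(refines_tuples(("ab", "ab"), ("a", "bab")))       # False
```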

The algorithm runs through steps \(i = 1, 2, \ldots , \frac {|w|}{2}\) and, in step i, it considers all possibilities for two factorisations \(q_{i-1}\) and \(q_{i}\) of w induced by \(\mathsf {L}_{i-1}\) and \(\mathsf {L}_{i}\), respectively (note that this implies \(q_{i-1} \preceq q_{i}\)). The differences between \(q_{i-1}\) and \(q_{i}\) implicitly define \(N_{i}\) as follows. Let \(q_{i}=(v_{1},v_{2},\dots ,v_{k})\) and let \(q_{i-1} = (u_{1}, u_{2},\dots, u_{\ell})\); since \(q_{i-1} \preceq q_{i}\), there are indices \(0 = j_{0} < j_{1} < \dots < j_{k} = \ell\) such that \((u_{j_{t-1}+1}, \ldots , u_{j_{t}})\) is a factorisation of \(v_{t}\), for every t, \(1 \leq t \leq k\). If \(j_{s} - j_{s-1} > 1\) for some s, \(1 \leq s \leq k\), then \(N_{i}\) contains a nonterminal A with \(|\mathsf {D}(A)| = j_{s} - j_{s-1}\) and \(\mathfrak {D}(A)=v_{s}\). The number \(|\mathsf {L}_{i}|_{A}\) is also implicitly given by counting how often the sequence of factors \((u_{j_{s-1}+1},\dots ,u_{j_{s}})\) occurs in \(q_{i-1}\) as one of these blocks and is combined into a single factor in \(q_{i}\); more precisely, \(|\mathsf {L}_{i}|_{A} = |\{t \mid (u_{j_{t-1}+1},\dots ,u_{j_{t}})=(u_{j_{s-1}+1},\dots ,u_{j_{s}})\}|\). This allows us to calculate the profit of the rule for A without knowing the exact structure of the rules for nonterminals in \(N_{j}\) with \(j < i\). By Lemma 13, this choice of nonterminals for \(N_{i}\) is optimal for the fixed induced factorisations, which means that a search among all choices for \(q_{i-1}\) and \(q_{i}\) yields a smallest i-level grammar for w. The running-time of this algorithm is dominated by enumerating all pairs \(q_{i-1}\) and \(q_{i}\) of factorisations of w. However, due to \(q_{i-1} \preceq q_{i}\), each such pair can be encoded as a single vector in \(\{0,1,2\}^{|w|-1}\) (an entry is ‘1’ if the corresponding position of w is a cut of both factorisations, ‘2’ if it is a cut of the refinement only, and ‘0’ if it is a cut of neither). Hence, enumerating these pairs can be done in time \(\mathcal {O}(3^{|{w}|})\).
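The encoding of a pair \(q_{i-1} \preceq q_{i}\) as a single vector is easy to make explicit. The Python sketch below (hypothetical helper names) combines the two 0/1 vectors into one vector over {0, 1, 2} and recovers them again; decoding corresponds to the mappings F and R used in the proof of Theorem 13, which replace each ‘2’ by 0 or by 1, respectively. Since each of the |w| − 1 positions independently takes one of three values, there are exactly \(3^{|w|-1}\) such pairs.

```python
from itertools import product


def encode(fine: list, coarse: list) -> list:
    """Pair (refinement, coarser factorisation) -> vector over {0, 1, 2}."""
    if any(f < c for f, c in zip(fine, coarse)):
        raise ValueError("fine is not a refinement of coarse")
    # 1: cut in both, 2: cut only in the refinement, 0: cut in neither
    return [2 if f == 1 and c == 0 else c for f, c in zip(fine, coarse)]


def F(q: list) -> list:
    """Coarser factorisation: every '2' becomes 0."""
    return [0 if x == 2 else x for x in q]


def R(q: list) -> list:
    """Refinement: every '2' becomes 1."""
    return [1 if x == 2 else x for x in q]


print(encode([1, 1, 0, 1], [0, 1, 0, 0]))   # [2, 1, 0, 2]
# decoding and re-encoding is the identity on all 3^4 vectors of length 4
print(all(encode(R(list(q)), F(list(q))) == list(q)
          for q in product((0, 1, 2), repeat=4)))   # True
```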

Theorem 13

SGP can be solved in time and space \(\mathcal {O}^{*}(3^{|{w}|})\).

Proof

Let n = |w|. We use dynamic programming to consider all possible factorisations of w and refinements for each level \(i=1,\dots ,d\). A factorisation of w is stored as a vector \(q \in \{0,1\}^{n-1}\) and, furthermore, we use vectors \(q \in \{0,1,2\}^{n-1}\) in order to represent a factorisation together with a refinement, as explained above (for the sake of convenience, we implicitly assume q[0] = q[n] = 1). For such a vector \(q \in \{0,1,2\}^{n-1}\) that describes two factorisations p and \(p^{\prime }\) with \(p \preceq p^{\prime }\), we denote by F(q) the factorisation \(p^{\prime }\) (represented as a vector from \(\{0,1\}^{n-1}\)) and by R(q) the refinement p (represented as a vector from \(\{0,1\}^{n-1}\)). More formally, let \(F \colon \{0,1,2\}^{n-1}\rightarrow \{0,1\}^{n-1}\) be the mapping that replaces each ‘2’-entry by a ‘0’-entry (and leaves all other entries unchanged), and let \(R \colon \{0,1,2\}^{n-1}\rightarrow \{0,1\}^{n-1}\) be the mapping that replaces each ‘2’-entry by a ‘1’-entry (and leaves all other entries unchanged).

The dynamic program uses the following tables:

  • T[i, q] for \(i\in \{2,\dots , \frac {n}{2}\}\) and all \(q \in \{0,1,2\}^{n-1} \setminus \{0,1\}^{n-1}\) stores the size of a smallest i-level grammar for w for which the axiom ax induces the factorisation F(q) and for which \(\widehat {\mathsf {D}}_{i}(\mathsf {ax})\) induces the factorisation R(q).

  • S[i, q] for all \(i\in \{1,\dots ,\frac {n}{2}\}\) and all \(q \in \{0,1\}^{n-1}\) stores the size of a smallest i-level grammar for w for which the axiom induces the factorisation q.

  • P[i, q] for all \(i\in \{2,\dots , \frac {n}{2}\}\) and all \(q \in \{0,1\}^{n-1}\) stores the refinement of q that equals the factorisation induced by \(\widehat {\mathsf {D}}_{i}(\mathsf {ax})\) for an optimal i-level grammar for which ax induces the factorisation q.

  • \(opt_{i}\) for all \(i\in \{1,\dots ,\frac {n}{2}\}\) stores the size of a smallest i-level grammar for w.

We point out that the tables T and S are sufficient to compute the size of a smallest grammar; the purpose of table P is to construct an actual grammar of minimal size after termination of the algorithm. Intuitively speaking, in order to determine S[i, q], i. e., the size of a smallest i-level grammar for which the axiom induces the factorisation q, we have to check all entries \(T[i, q^{\prime }]\) with \(F(q^{\prime }) = q\) (note that \(q^{\prime }\) represents a factorisation and a refinement) and, for a minimal one of these entries, we store the actual refinement (which is not needed anymore to compute the size of a smallest grammar) in P[i, q]. In this way, the entries of P allow us to restore an actual smallest grammar.

We first initialise S by setting \(S[1, q] = |G_{q}|\), for every \(q \in \{0,1\}^{n-1}\), where, according to Lemma 13, \(G_{q}\) is a smallest 1-level grammar for w that induces factorisation q, and we set \(opt_{1} = \min \limits \{S[1,q]\mid q\in \{0,1\}^{n-1}\}\).

We then compute iteratively, for each \(i=2,\dots , \frac {n}{2}\), the entries T[i, q], \(S[i, q^{\prime }]\) and \(P[i, q^{\prime }]\), for every \(q \in \{0,1,2\}^{n-1} \setminus \{0,1\}^{n-1}\) and \(q^{\prime }\in \{0,1\}^{n-1}\), as follows.

First, for any \(q \in \{0,1,2\}^{n-1} \setminus \{0,1\}^{n-1}\), we define the set I(q) of consecutive factors in R(q) that are combined into one factor in F(q):

$$ \begin{array}{@{}rcl@{}} I(q):=\{(j_{0},j_{1},\dots,j_{k}) \mid & & |q[j_{0}..j_{k}]|_{1} = |q[j_{0}]q[j_{k}]|_{1} = 2,\\ && |q[j_{0}..j_{k}]|_{2} = |q[j_{1}]{\ldots} q[j_{k-1}]|_{2} = k-1 \geq 1\} . \end{array} $$

Furthermore, from I(q), we can extract the set N(q) of nonterminals which create these factors on level i, i. e., \(N(q):=\{w(j_{0},j_{1},\dots ,j_{k})\mid (j_{0},\dots ,j_{k})\in I(q)\}\), where

$$ \begin{array}{@{}rcl@{}} w(j_{0},j_{1},\dots,j_{k}):=(w[j_{0}+1.. j_{1}],w[j_{1}+1.. j_{2}],\dots,w[j_{k-1}+1 .. j_{k}]) . \end{array} $$

The corresponding number of occurrences of the nonterminal \(w(j_{0},j_{1},\dots ,j_{k})\) on level i is given by

$$c(j_{0},j_{1},\dots,j_{k}){}:={}|\{(j^{\prime}_{0},j^{\prime}_{1},\dots,j^{\prime}_{k}){}\in{} I(q){}\mid{} w(j_{0},j_{1},\dots,j_{k}){}={}w(j^{\prime}_{0},j^{\prime}_{1},\dots,j^{\prime}_{k})\}| .$$

The entry T[i, q] can now be computed as follows:

$$T[i,q]=S[i-1,R(q)]-\sum\limits_{w(j_{0},j_{1},\dots,j_{k})\in N(q)} \left( c(j_{0},j_{1},\dots,j_{k})(k-1)-k\right)$$

Then, for every \(q^{\prime }\in \{0,1\}^{n-1}\), we can compute entries \(S[i,q^{\prime }]\) and \(P[i,q^{\prime }]\) by

$$ \begin{array}{@{}rcl@{}} S[i,q^{\prime}] &=&\min\{T[i, q]\mid F(q)=q^{\prime}\} \text{ and}\\ P[i,q^{\prime}] &=& q , \end{array} $$

where \(q \in \{0,1,2\}^{n-1} \setminus \{0,1\}^{n-1}\) with \(F(q)=q^{\prime }\) and \(T[i,q]=S[i,q^{\prime }]\). Finally, the value \(opt_{i}\) is computed by \(opt_{i} = \min \limits \{S[i,q^{\prime }]\mid q^{\prime }\in \{0,1\}^{n-1}\}\).

After termination of step \(\frac {n}{2}\), the size of a smallest grammar for the word w is \(\min \limits \{opt_{i} \mid 1 \leq i \leq \frac {n}{2}\}\). Since the values T[i, q], for any \(i=2,3,\dots , \frac {n}{2}\) and \(q \in \{0,1,2\}^{n-1} \setminus \{0,1\}^{n-1}\), are constructively computed from S[i − 1, R(q)] by defining the rules in N(q), the set \(\bigcup _{j=2}^{i} N(q_{j})\) with \(q_{i} := q\) and \(q_{j-1} := P[j-1, R(q_{j})]\), for \(j=i,\dots ,3\), together with the rules of the 1-level grammar \(G_{R(q_{2})}\), yields an i-level grammar for w of size T[i, q]. For an index i with \(opt_{i} = \min \limits \{opt_{j} \mid 1 \leq j \leq \frac {n}{2}\}\) and a vector \(q \in \{0,1,2\}^{n-1} \setminus \{0,1\}^{n-1}\) such that \(opt_{i} = T[i, q]\), this construction gives a smallest grammar for w.

In order to prove the correctness of the algorithm, we show for each \(q \in \{0,1\}^{n-1}\), inductively for each \(i=1,\dots ,\frac {n}{2}\), that S[i, q] equals the size of a smallest i-level grammar for w whose axiom induces the factorisation q. For i = 1, this is implied by Lemma 13. Assuming that this statement is true for some value i − 1, let \(q_{i} \in \{0,1\}^{n-1}\) and let \(G_{i} = (N, \Sigma, R, \mathsf {ax})\) be a smallest i-level grammar for w whose axiom induces \(q_{i}\), where \(i \leq \frac {n}{2}\). Let \(q_{i-1}\) be the vector representation of the factorisation induced by \(\widehat {\mathsf {D}}_{i}(\mathsf {ax})\). The grammar \(G_{i-1}:=(N\setminus N_{i},{\Sigma },R\setminus \{(A,\mathsf {D}(A))\mid A\in N_{i}\},\widehat {\mathsf {D}}_{i}(\mathsf {ax}))\) is an (i − 1)-level grammar for w with induced factorisation \(q_{i-1}\); its size is \(|G_{i}|+{\sum }_{A\in N_{i}} \mathsf {p}(A)\), which is at least \(S[i-1, q_{i-1}]\) by the induction hypothesis. By definition of the profit, the term \(|G_{i}|+{\sum }_{A\in N_{i}} \mathsf {p}(A)\) can be rewritten as \(|G_{i}|+|\widehat {\mathsf {D}}_{i}(\mathsf {ax})|-|\mathsf {ax}| -{\sum }_{A\in N_{i}}|\mathsf {D}(A)|\).

Let \(q \in \{0,1,2\}^{n-1}\) be such that \(F(q) = q_{i}\) and \(R(q) = q_{i-1}\), i. e., for every j with \(1 \leq j \leq n-1\), q[j] = 2 if \(q_{i}[j] \neq q_{i-1}[j]\), and \(q[j] = q_{i}[j]\) otherwise. The value T[i, q] is computed from \(S[i-1, q_{i-1}]\) by subtracting

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{w(j_{0},j_{1},\dots,j_{k})\in N(q)} \bigl(c(j_{0},j_{1},\dots,j_{k})(k-1)-k\bigr) = \\ &&\left( \sum\limits_{(j_{0},\dots,j_{k})\in I(q)}(k-1)\right) - \left( \sum\limits_{w(j_{0},\dots,j_{k})\in N(q)}k\right) . \end{array} $$

Each 2-entry of q occurs in exactly one tuple in I(q); by the definition of q, this yields:

$$\sum\limits_{(j_{0},j_{1},\dots,j_{k})\in I(q)}(k-1)= {\sum}_{j=1}^{n-1}(q_{i-1}[j]-q_{i}[j]) = |\widehat{\mathsf{D}}_{i}(\mathsf{ax})|-|\mathsf{ax}| .$$
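The vector encoding and the second equality above can be checked on a small word with the following Python sketch. It assumes (our reading of the proof, not a definition quoted from the algorithm's description) that q[j] = 1 marks a factor boundary after the j-th letter of w (0-indexed in the code), that a 2-entry is a boundary present at level i − 1 but removed at level i, and hence that F(q) turns 2-entries into 0 and R(q) turns them into 1. The helpers `encode` and `factors` are hypothetical names used only for this illustration.

```python
def encode(q_i, q_prev):
    """q[j] = 2 where the level-i and level-(i-1) vectors differ, q[j] = q_i[j] otherwise."""
    return [2 if a != b else a for a, b in zip(q_i, q_prev)]


def F(q):
    """Level-i factorisation: a 2-entry is a boundary removed at level i (becomes 0)."""
    return [0 if x == 2 else x for x in q]


def R(q):
    """Level-(i-1) factorisation: a 2-entry is a boundary still present (becomes 1)."""
    return [1 if x == 2 else x for x in q]


def factors(w, v):
    """Split w at every position j with v[j] == 1 (boundary after w[j])."""
    parts, start = [], 0
    for j, bit in enumerate(v):
        if bit == 1:
            parts.append(w[start:j + 1])
            start = j + 1
    parts.append(w[start:])
    return parts


q_prev, q_i = [0, 1, 1, 0, 1], [0, 0, 1, 0, 0]   # ab|c|ab|c at level i-1, abc|abc at level i
q = encode(q_i, q_prev)
assert F(q) == q_i and R(q) == q_prev
# 2-entries = sum of (q_{i-1}[j] - q_i[j]) = (#factors at level i-1) - (#factors at level i)
diff = len(factors("abcabc", R(q))) - len(factors("abcabc", F(q)))
assert q.count(2) == sum(b - a for a, b in zip(q_i, q_prev)) == diff
print(q, factors("abcabc", F(q)), factors("abcabc", R(q)))
# [0, 2, 1, 0, 2] ['abc', 'abc'] ['ab', 'c', 'ab', 'c']
```

Here the number of factors at level i − 1 plays the role of \(|\widehat{\mathsf{D}}_{i}(\mathsf{ax})|\) and the number at level i that of |ax|.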

For each \(w(j_{0},j_{1},\dots ,j_{k})\in N(q)\), \(N_{i}\) contains a nonterminal A with \(|\mathsf{D}(A)| = k\), which means that \({\sum }_{A\in N_{i}}|\mathsf {D}(A)|\geq {\sum }_{w(j_{0},j_{1},\dots ,j_{k})\in N(q)}k\); thus,

$$ \begin{array}{@{}rcl@{}} |G_{i}|&=&|G_{i-1}|-|\widehat{\mathsf{D}}_{i}(\mathsf{ax})|+|\mathsf{ax}| +\sum\limits_{A\in N_{i}}|\mathsf{D}(A)|\\ &\geq& S[i-1,q_{i-1}] - \sum\limits_{w(j_{0},j_{1},\dots,j_{k})\in N(q)} \bigl(c(j_{0},j_{1},\dots,j_{k})(k-1)-k\bigr)\\ &=&T[i,q]\geq S[i,F(q)]=S[i,q_{i}] . \end{array} $$
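The value T[i, q] in this chain is S[i − 1, R(q)] minus the total profit of the new factors. A minimal Python sketch of that subtraction follows. It is an illustration under assumptions, not the paper's implementation: q[j] describes the boundary after w[j] (0-indexed), the count c of a new factor is taken to be its number of occurrences in the factorisation F(q), and occurrences of the same word with different k are treated as different new factors. The names `new_factors` and `level_cost` are hypothetical.

```python
from collections import Counter


def new_factors(w, q):
    """List (factor, k) for every factor of F(q) that merges k >= 2 factors of R(q).
    q[j] is 0, 1 or 2 and describes the boundary after w[j] (0-indexed)."""
    result, start, merged = [], 0, 0
    for j, x in enumerate(list(q) + [1]):      # sentinel boundary after the last letter
        if x == 2:
            merged += 1                        # R(q)-boundary inside the current F(q)-factor
        elif x == 1:
            if merged:
                result.append((w[start:j + 1], merged + 1))
            start, merged = j + 1, 0
    return result


def level_cost(w, q, s_prev):
    """T[i, q] = S[i-1, R(q)] - sum over new factors of (c * (k - 1) - k),
    where c is the number of occurrences of the factor in F(q)."""
    occurrences = Counter(new_factors(w, q))
    return s_prev - sum(c * (k - 1) - k for (_, k), c in occurrences.items())


q = [0, 2, 1, 0, 2]                            # abc|abc at level i, ab|c|ab|c at level i-1
pieces = new_factors("abcabc", q)
assert sum(k - 1 for _, k in pieces) == q.count(2)   # each 2-entry is counted once
print(pieces, level_cost("abcabc", q, 6))      # [('abc', 2), ('abc', 2)] 6
```

In this example the level-(i − 1) grammar with axiom BcBc and B → ab has size 6, and the level-i grammar with axiom CC, C → Bc and B → ab also has size 6, which is the value computed for T[i, q].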

Consequently, the algorithm computes the size of a grammar for w that is smallest among all grammars for w with at most \(\frac {n}{2}\) levels. Since for every word w there exists a smallest grammar with at most \(\frac {|{w}|}{2}\) levels, we conclude that the described algorithm finds a smallest grammar for w. □

We conclude this section by pointing out some features of the algorithm of Theorem 13. First, note that the brute-force enumeration of all \(q \in \{0,1,2\}^{n-1}\setminus\{0,1\}^{n-1}\), which dominates the running time, leaves room for modifications. For example, if we only consider vectors q such that at most 2 neighbouring factors of R(q) are combined in F(q) (of which there are far fewer than in the full set \(\{0,1,2\}^{n-1}\setminus\{0,1\}^{n-1}\)), then we automatically compute smallest grammars in Chomsky normal form.Footnote 11 Moreover, for a fixed i and two vectors \(q_{1}, q_{2} \in \{0,1, 2\}^{n-1}\setminus \{0,1\}^{n-1}\), the computations of \(T[i, q_{1}]\) and \(T[i, q_{2}]\) are independent of each other and only require the previously computed values S[i − 1,⋅] (an analogous observation holds for the computation of S[i,⋅] and P[i,⋅]). Hence, the brute-force enumeration of the vectors \(q \in \{0,1,2\}^{n-1}\setminus\{0,1\}^{n-1}\) and \(q^{\prime } \in \{0,1\}^{n-1}\) can easily be parallelised.
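The gap between the full enumeration and the restriction to Chomsky normal form can be illustrated by counting candidate vectors for small n. The sketch assumes that a factor of F(q) combines k factors of R(q) exactly when it contains k − 1 entries equal to 2, so the restriction means at most one 2-entry between two consecutive 1-entries; `merges_at_most_two` and `candidates` are hypothetical helper names.

```python
from itertools import product


def merges_at_most_two(q):
    """True if every factor of F(q) combines at most two factors of R(q),
    i.e. between consecutive 1-entries of q there is at most one 2-entry."""
    twos = 0
    for x in q:
        if x == 1:
            twos = 0                 # a surviving boundary: a new F(q)-factor starts
        elif x == 2:
            twos += 1
            if twos > 1:
                return False
    return True


def candidates(m, cnf_only=False):
    """Enumerate {0,1,2}^m minus {0,1}^m, optionally only the CNF-style vectors."""
    for q in product((0, 1, 2), repeat=m):
        if 2 in q and (not cnf_only or merges_at_most_two(q)):
            yield q


for m in (3, 4):
    print(m, sum(1 for _ in candidates(m)), sum(1 for _ in candidates(m, True)))
# 3 19 13
# 4 65 39
```

Both counts grow exponentially in m = n − 1, but the restricted one grows more slowly, so the fraction of vectors that remain shrinks as n grows.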

6 Conclusions

We conclude this work by discussing some important open problems and additional questions that are motivated by our results.

6.1 Small Alphabets

For hard problems on strings, we usually encounter one of two situations: either the problem becomes polynomial-time solvable for constant alphabets, or there is a hardness reduction that works for some constant alphabet and, by simple encoding techniques, extends to binary alphabets as well. Moreover, the unary case is often trivially solvable in polynomial time, even if the problem is intractable for larger alphabets. However, the smallest grammar problem behaves quite differently: it is NP-hard for some constant alphabet (of size 17, by our reduction), so it is not polynomial-time solvable for all constant alphabets unless P = NP, but whether it remains NP-hard for very small alphabets (even binary or unary ones) is still open. Thus, we consider the following as one of the most important open questions:

Open Problem 1

Is it possible to compute smallest grammars for binary alphabets in polynomial time?

We believe that answering this question in the negative might be rather difficult. In fact, the substantial effort that was necessary to prove Theorem 3 suggests that strengthening our reduction further to the case of binary alphabets is problematic, so a completely different kind of reduction seems necessary. However, the main technical challenge seems to be the need to control the compression of factors that serve as codewords for parts of the source problem of the reduction, and it is arguably difficult to conceive of reductions that circumvent this issue.

On the other hand, it is not apparent how a small alphabet could help to compute smallest grammars efficiently; if this is possible at all, deeper combinatorial insights into grammar-based compression seem to be necessary.

6.2 Approximation

So far, no constant-factor approximation algorithm is known for the smallest grammar problem (as already mentioned in Section 1.3, the best approximation algorithms achieve a ratio in \(\mathcal {O}\left (\log \left (\frac {|{w}|}{m^{*}}\right )\right )\) [33, 34, 40]), and, although this is not backed by any hardness results, the existing literature suggests that no such algorithm exists. Moreover, this apparent hardness of approximating smallest grammars also applies to fixed alphabets since, as shown in [39], an approximation algorithm for the smallest grammar problem over a binary alphabet with a constant approximation ratio c yields a 6c-approximation algorithm for arbitrary alphabets. In particular, a polynomial-time algorithm for binary alphabets is a 1-approximation and would therefore yield a 6-approximation for unbounded alphabets; hence, disproving the existence of a 6-approximation for unbounded alphabets under some complexity-theoretic assumption implies, under the same assumption, that there is no polynomial-time algorithm for the restriction to binary alphabets. Considering the substantial effort that went into designing a reduction for alphabet size 17 in this paper, such an inapproximability result for unbounded alphabets might actually be an easier way to show computational lower bounds for binary alphabets.

Aside from these consequences for binary alphabets, an inapproximability result (with some ratio significantly larger than the current bound of \(\frac {8569}{8568}\)) for the smallest grammar problem would be very interesting, yet not unexpected. The common belief that general constant-factor approximations probably do not exist is based not only on the fact that, despite substantial effort, no such algorithm has been found so far, but also on the close relation to the problem of computing shortest addition chains for a set of integers, a problem that has been studied extensively for over 100 years (see [63] for a survey on addition chains and [33, 34] for their connections to the smallest grammar problem). Formally, an addition chain is a strictly increasing sequence \((a_{1}, a_{2}, \ldots , a_{k}) \in \mathbb {N}^{k}\) with \(a_{1} = 1\) such that, for every i with \(2 \leq i \leq k\), there are \(b, c \in \{a_{1},\ldots, a_{i-1}\}\) with \(a_{i} = b + c\); the task is to compute a shortest possible addition chain that contains a given set of integers. In a sense, grammars can be seen as the natural extension of addition chains (i. e., instead of integers, we are concerned with strings, and integer addition becomes string concatenation).
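The definition of an addition chain, and the way a chain for a single integer already yields a grammar for a unary word, can be illustrated with a short Python sketch. It only illustrates the analogy between integer addition and string concatenation; it is not the translation from sets of integers to words used in [33, 34], and `is_addition_chain`, `chain_to_rules` and `derive` are hypothetical names.

```python
def is_addition_chain(chain):
    """a_1 = 1, strictly increasing, and every later a_i = b + c for earlier b, c (b = c allowed)."""
    if not chain or chain[0] != 1:
        return False
    for i in range(1, len(chain)):
        earlier = set(chain[:i])
        if chain[i] <= chain[i - 1] or not any(chain[i] - b in earlier for b in earlier):
            return False
    return True


def chain_to_rules(chain):
    """For a valid chain: rule X_a -> X_b X_c for every element a = b + c with a > 1."""
    rules = {}
    for i in range(1, len(chain)):
        earlier = set(chain[:i])
        b = min(x for x in earlier if chain[i] - x in earlier)
        rules[chain[i]] = (b, chain[i] - b)
    return rules


def derive(rules, x, letter="a"):
    """Expand X_x into letter^x, with the terminal rule X_1 -> letter."""
    if x == 1:
        return letter
    b, c = rules[x]
    return derive(rules, b, letter) + derive(rules, c, letter)


chain = (1, 2, 3, 5, 10)
rules = chain_to_rules(chain)
print(is_addition_chain(chain), rules, derive(rules, 10) == "a" * 10)
# True {2: (1, 1), 3: (1, 2), 5: (2, 3), 10: (5, 5)} True
```

A chain of length k thus gives k − 1 rules with two symbols each plus the rule X_1 → a, that is, a grammar of size 2(k − 1) + 1 for the unary word of length \(a_{k}\).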

It has been shown in [33, 34] that a set of integers can be translated into a word (over an alphabet that grows with the number of integers) whose smallest grammar is larger than the length of a shortest addition chain for these integers by only a constant factor. Consequently, an approximation algorithm for the smallest grammar problem with an approximation ratio in \(o(\frac {\log n}{\log \log n})\) would improve on long-standing results for addition chains, for which the best known approximation algorithm achieves an approximation ratio in \(\mathcal {O}(\frac {\log n}{\log \log n})\) (see [34] for details). Note that, by the results of [39] mentioned above, this statement also holds for constant, even binary, alphabets.

Furthermore, the fundamental technique of the approximation algorithms of [33, 34, 40], which links smallest grammars with the size of LZ77-factorisations, is unlikely to yield an approximation ratio in \(o(\frac {\log n}{\log \log n})\). More precisely, the size of a smallest grammar of a word is bounded from below by the length of its shortest LZ77-factorisation, and the performance of these algorithms is established by comparison with this LZ77-bound. However, it has also been shown (see [33, 40]) that there are words for which a smallest grammar is \(\Omega(\frac {\log n}{\log \log n})\) times as large as a smallest LZ77-factorisation; thus, for such algorithms, an approximation ratio in \(o(\frac {\log n}{\log \log n})\) cannot be shown by this technique. Moreover, this result is strengthened in [39], where binary words with this property are presented.
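For experimenting with the LZ77-bound on small words, the following naive Python sketch computes a greedy LZ77 parse without self-references: each phrase is either a letter not seen before or the longest prefix of the unparsed suffix that already occurs in the parsed prefix. Whether [33, 34, 40] use exactly this variant is an assumption here, and `lz77_phrases` is a hypothetical name; the code is deliberately simple and not efficient.

```python
def lz77_phrases(w):
    """Greedy LZ77 parse without self-reference: each phrase is the longest prefix of
    the unparsed suffix that occurs in the parsed prefix, or a single new letter."""
    phrases, pos = [], 0
    while pos < len(w):
        length = 0
        while pos + length < len(w) and w[pos:pos + length + 1] in w[:pos]:
            length += 1
        length = max(length, 1)              # first occurrence of a letter
        phrases.append(w[pos:pos + length])
        pos += length
    return phrases


print(lz77_phrases("abababab"))  # ['a', 'b', 'ab', 'abab']
```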

Open Problem 2

Is there a constant-factor approximation algorithm for the smallest grammar problem? (Note that a negative result ruling out an approximation ratio of 6 or larger yields a lower bound for the restriction to binary alphabets.)

6.3 Parameterised Complexity

This work can also be seen as the starting point of a comprehensive parameterised complexity analysis of the smallest grammar problem. More precisely, our results show that the problem is most likely not in FPT if parameterised by |Σ|, |N| or the number of levels. However, with respect to the parameter |N|, we saw that it is at least in XP. A simple fixed-parameter tractable case is obtained if we parameterise by both |Σ| and \(\ell = \max \limits \{|\mathfrak {D}(A)| \mid A \in N\}\). More precisely, for every \(F \subseteq \{u \mid u \in {\Sigma }^{+}, 2 \leq |u| \leq \ell \}\), we compute a smallest F-grammar according to Lemma 11 and output one that is minimal among them. Since the number of sets F is bounded by a function of the parameters, this yields an fpt-algorithm. However, we consider the following parameterisation, for which the existence of an fpt-algorithm is still open, to be the most interesting:
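To make the parameter dependence explicit: there are \(\sum_{l=2}^{\ell} |\Sigma|^{l}\) words u with \(2 \leq |u| \leq \ell\), so the number of candidate sets F and, assuming (as the fpt claim requires) that Lemma 11 runs in time polynomial in |w|, the total running time are bounded as follows. For instance, |Σ| = 2 and ℓ = 3 give \(2^{4+8} = 4096\) sets F.

$$ \left|\left\{F \;\middle|\; F \subseteq \{u \in \Sigma^{+} \mid 2 \leq |u| \leq \ell\}\right\}\right| = 2^{\sum_{l=2}^{\ell} |\Sigma|^{l}}, \qquad \text{running time } 2^{\sum_{l=2}^{\ell} |\Sigma|^{l}} \cdot |w|^{\mathcal{O}(1)} . $$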

Open Problem 3

Is the smallest grammar problem parameterised by |Σ| and |N| fixed-parameter tractable?

6.4 A More Abstract View

From a rather abstract point of view, one could interpret any set of factors \(F \subseteq {\Sigma }^{*}\) as a grammar. More precisely, an F-grammar is then a triple \(G_{F} = (N,\Sigma, R)\) (the axiom or start symbol is intentionally missing) with \(N = \{A_{u} \mid u \in F\}\), where R is a set of rules over Σ and N that satisfies \(\mathfrak {D}(A_{u}) = u\) for every \(u \in F\). In this way, an F-grammar is a representation of F, except that none of the words in F is designated as the compressed word. Obviously, this definition leaves considerable freedom, since many choices for R are possible. However, as long as we are only interested in small grammars, this is justified, since a grammar that is smallest among all F-grammars (in the sense described above) can be computed in polynomial time. To see this, we can slightly adapt the approach from Section 4 as follows. For every \(u \in F\), we first construct the subgraph with vertices \(V_{4,u}\) and edges \(E_{4,u}\), and then delete all vertices (u, i, j) with i < j and \(u[i..j] \notin F\) (together with their incident edges). As before, it can be shown that an independent dominating set for the resulting interval graph corresponds to a smallest F-grammar. In the following, we denote by \(G_{F}\) the smallest F-grammar obtained in this way.
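The claim that a smallest F-grammar is easy to compute can be sanity-checked with a simpler technique than the interval-graph construction of Section 4: a word-break style dynamic program, sketched below in Python. It assumes that F contains only words of length at least 2 and that each \(A_{u}\) has exactly one rule. A right side for \(A_{u}\) then spells u with at least two symbols, so every nonterminal it uses stands for a strictly shorter word; the rules never form a cycle and can be minimised independently, and \(|G_{F}|\) is the sum, over u ∈ F, of the fewest pieces (letters or other words of F) into which u splits. The function names are hypothetical.

```python
def min_pieces(u, F):
    """Length of a shortest right side for A_u: u split into letters and words of F other than u."""
    best = [0] * (len(u) + 1)                      # best[j]: fewest symbols spelling u[:j]
    for j in range(1, len(u) + 1):
        best[j] = best[j - 1] + 1                  # last symbol is the letter u[j-1]
        for i in range(j - 1):                     # last symbol is A_v for v = u[i:j], |v| >= 2
            if u[i:j] != u and u[i:j] in F:
                best[j] = min(best[j], best[i] + 1)
    return best[len(u)]


def smallest_f_grammar_size(F):
    """|G_F| for a smallest F-grammar: the rules are acyclic, so each is minimised on its own."""
    return sum(min_pieces(u, F) for u in F)


print(smallest_f_grammar_size({"aaaa"}), smallest_f_grammar_size({"aaaa", "aa"}))  # 4 4
```

The two calls show that adding a factor to F need not change the size of a smallest F-grammar: \(A_{aaaa} \to aaaa\) has size 4, and so does the pair \(A_{aaaa} \to A_{aa}A_{aa}\), \(A_{aa} \to aa\).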

In a sense, this abstracts away the question of how factors are compressed by other factors and reduces the problem of computing small grammars to its hard core, which lies in choosing the right factors. While this perspective is interesting from a theoretical point of view, it also raises questions that might have algorithmic applications. For example, as an alternative to the exponential brute-force enumeration of all \(F \subseteq \mathsf {F}_{\geq 2}(w)\) in order to obtain an F-grammar that is smallest among all grammars, one could compute \(G_{F}\) for a factor set F that is inclusion maximal in the sense that \(|G_{F}| < |G_{F^{\prime }}|\) for every \(F^{\prime } \supsetneq F\) (or inclusion minimal, which can be defined analogously). However, this approach only seems reasonably applicable if this concept of inclusion maximality is monotone, i. e., if the inclusion maximality of F is characterised by \(|G_{F}| < |G_{F\cup\{u\}}|\) for every \(u \in \mathsf {F}_{\geq 2}(w)\setminus F\). In this regard, note that \(|G_{F}| = |G_{F^{\prime }}|\) is possible for \(F \subsetneq F^{\prime }\), as witnessed by \(F = \{\mathtt{a}^{4}\}\) and \(F^{\prime } = \{\mathtt{a}^{4},\mathtt{a}^{2}\}\).

Open Problem 4

Are there \(F_{1} \subsetneq F_{2} \subsetneq F_{3} \subseteq \mathsf {F}_{\geq 2}(w)\) such that \(|G_{F_{1}}| < |G_{F_{2}}|\) and \(|G_{F_{3}}| < |G_{F_{1}}|\)?

If the inclusion maximality is monotone, then every inclusion maximal F (and thus also an optimal F, for which \(G_{F}\) is a smallest grammar) can be computed by starting with F = {w} and iteratively adding factors of w until every possible new factor would increase the size of \(G_{F}\). This also suggests an obvious greedy strategy: always choose the new factor that results in a smallest \(G_{F}\). We stress that this greedy strategy differs from the algorithm Greedy [37], analysed in [33, 34], since the latter iteratively changes an existing grammar and its greediness is with respect to the rules of the intermediate grammars.
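The greedy factor-set strategy just described can be prototyped in a few lines of Python. This sketch is an illustration under stated assumptions, unrelated to the algorithm Greedy [37]: it computes \(|G_{F}|\) by splitting every u ∈ F into as few letters and other words of F as possible (a word-break style dynamic program used here in place of the interval-graph construction of Section 4; since each right side has at least two symbols, the rules are acyclic and can be minimised independently), it breaks ties lexicographically, and it comes with no optimality or approximation guarantee. All function names are hypothetical.

```python
def min_pieces(u, F):
    """Length of a shortest right side for A_u: u split into letters and words of F other than u."""
    best = [0] * (len(u) + 1)
    for j in range(1, len(u) + 1):
        best[j] = best[j - 1] + 1                  # last symbol is the letter u[j-1]
        for i in range(j - 1):                     # last symbol is A_v for v = u[i:j], |v| >= 2
            if u[i:j] != u and u[i:j] in F:
                best[j] = min(best[j], best[i] + 1)
    return best[len(u)]


def f_grammar_size(F):
    """|G_F|: the rules are acyclic, so each right side is minimised on its own."""
    return sum(min_pieces(u, F) for u in F)


def greedy_factor_set(w):
    """Grow F from {w}, always adding the factor (length >= 2) that yields the smallest G_F;
    stop as soon as every remaining candidate would strictly increase |G_F|."""
    F, size = {w}, len(w)
    candidates = {w[i:j] for i in range(len(w)) for j in range(i + 2, len(w) + 1)}
    while True:
        options = [(f_grammar_size(F | {u}), u) for u in sorted(candidates - F)]
        if not options:
            break
        best_size, best_u = min(options)           # ties broken lexicographically
        if best_size > size:
            break
        F, size = F | {best_u}, best_size
    return F, size


F, size = greedy_factor_set("a" * 8)
print(sorted(F, key=len), size)  # ['aa', 'aaaa', 'aaaaaaaa'] 6
```

On this input the sketch first adds aa, then aaaa, and stops at size 6; on abab it adds ab and stops at size 4.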

This also points out an interesting feature (and a potential difficulty) of this approach: the grammars \(G_{F}\), \(G_{F\cup\{u\}}\), \(G_{F\cup\{u,u^{\prime }\}}\), and so on, corresponding to the factor sets F, F ∪ {u}, \(F \cup \{u, u^{\prime }\}\), and so on, could be quite different; they do not necessarily share the incremental character of the factor sets, in the sense that one grammar need not be obtainable from the previous one by small, local modifications.