1 Introduction and definitions

In the last 30 years, the following problem was studied very intensively: How behaves the complexity of languages under operations (on languages). More precisely, for a complexity measure K on regular languages and a regularity preserving n-ary function f on languages, one is interested in the set \(g_f^K(k_1,k_2,\ldots ,k_n)\) of all numbers k such that there are regular languages \(L_1,L_2,\ldots , L_n\) such that \(K(L_i)=k_i\) for \(1\le i\le n\) and \(K(f(L_1,L_2,\ldots , L_n))=k\) (in many cases only the maximal number \(h_f^K(k_1,k_2,\ldots ,k_n)\) in \(g_f^K(k_1,k_2,\ldots ,k_n)\) is considered).

Most of the research concerns the state complexity \({{\,\mathrm{sc}\,}}\), where \({{\,\mathrm{sc}\,}}(L)\) is defined as the minimal number of states of a deterministic finite automata which accepts L. As an example we mention

$$\begin{aligned} g_{\cup }^{{{\,\mathrm{sc}\,}}}(m,n)=\{1,2,\ldots ,mn\} \quad \text{ for } m\ge 2 \text{ and } n\ge 2 \end{aligned}$$

(see [12]). For further results concerning the behaviour of the state complexity under operations, we refer to the survey article [5].

Because the family of regular languages can be characterized by nondeterministic finite automata, by incomplete deterministic finite automata, regular expressions etc., it is natural to study the set \(g_f^{K}\) and the function \(h_f^K\) for measures K connected with those characterizations.

The state complexity of a regular language coincides with the number of its left quotients by words. However, using the quotients instead of states gives some further interesting aspects. The behaviour of the quotient complexity under operations was intensively studied by Brzozowski and others. A summary is given in [1].

In 2003, Holzer and Kutrib started the investigation with respect to the nondeterministic state complexity \({{\,\mathrm{nsc}\,}}\) where \({{\,\mathrm{nsc}\,}}(L)\) is defined as the minimal number of states of a nondeterministic finite automata which accepts L  [10, 11]. We mention here the facts given in Table 1.

Table 1 Some results on ranges of nondeterministic state complexities for some operations

A complete deterministic automaton with r states and s input symbols has \(r\cdot s\) transitions. However, often it is not necessary to require completeness, i.e., it can be that, for some pairs of states and input symbols, no transition is defined. For such incomplete automata, one can study the transition complexity tc where tc(L) is the minimal number of transitions in incomplete deterministic finite automata which accepts L. The behaviour of this measure was studied in [6, 16]. For instance, in [6], it was shown that for all numbers \(m\ge 1\) and \(n\ge 1\),

$$\begin{aligned} m\cdot n + m+n-1\le h_{\cup }^{{{\,\mathrm{tc}\,}}}(m,n) \le 2(m\cdot n+m+n). \end{aligned}$$

The behaviour of the size of regular expressions was studied in [4, 7].

We now introduce a further concept which characterizes regular languages. We start with some notation.

For a set M, by \({{\,\mathrm{card}\,}}(M)\) we denote its cardinality. An alphabet is a non-empty finite set. The elements of an alphabet are called letters. A word over an alphabet V is a finite sequence of letters of V. The length of a word is the number of its letters (where we count the letters as often as they occur in the word); the length of a word w is denoted by \(\vert w\vert \). A word w of length \(n\ge 1\) is written as \(w=a_1a_2\ldots a_n\), where \(a_i\in V\) for \(1\le i\le n\). The empty word (of length 0) is designated by \(\lambda \). The set of all non-empty words of V is denoted by \(V^+\), and we set \(V^*=V^+\cup \{\lambda \}\). The reversal of a non-empty word \(w=a_1a_2\ldots a_n\) is defined by \(w^R=a_na_{n-1}\ldots a_1\); moreover, we set \(\lambda ^R=\lambda \).

Any subset of \(V^*\) is called a language over V. For two languages L and \(L'\), we define the concatenation by \(LL'=\{ ww' \mid w\in L, w'\in L'\}\), the powers of L by \(L^0=\{\lambda \}\), \(L^n=L^{n-1}L\) for \(n\ge 1\), the positive Kleene-closure, the Kleene-closure and the reversal by

$$\begin{aligned} L^+= \bigcup _{n\ge 1} L^n,\quad \ L^*=\bigcup _{n\ge 0} L^n \quad \text{ and } \quad L^R=\{ w^R \mid w\in L\}. \end{aligned}$$

A right linear grammar is a quadruple \(G=(N,T,P,S)\) where N and T are two disjoint (non-empty) alphabets, P is a finite subset of \(N\times (T^*N\cup T^*)\), and \(S\in N\). The elements of N and T are called nonterminals and terminals, respectively. The elements of P are called rules. For a rule (Aw), we mostly use the notation \(A\rightarrow w\). S is called the axiom or the start symbol.

For two words \(x\in T^*N\) and \(y\in T^*N\cup T^*\), we write \(x\Longrightarrow y\) if and only if there is a rule \(A\rightarrow w\) in P such that \(x=vA\) and \(y=vw\), and then we say that x derives y in one derivation step. Let \(\Longrightarrow ^*\) be the reflexive and transitive closure of \(\Longrightarrow \). If \(x\Longrightarrow ^* y\), we say that y is derived from x.

The language L(G) generated by a right linear grammar \(G=(N,T,P,S)\) is defined by

$$\begin{aligned} L(G)= \{ z \mid z\in T^*, \ S\Longrightarrow ^* z\}. \end{aligned}$$

It is known that a language L is regular if and only if it is generated by a right linear grammar.

For a right linear grammar \(G=(N,T,P,S)\), we define the complexity measure \({{\,\mathrm{Var}\,}}\) and extend it for regular languages by setting

$$\begin{aligned} {{\,\mathrm{Var}\,}}(G)= {{\,\mathrm{card}\,}}(N) \end{aligned}$$

and

$$\begin{aligned} {{\,\mathrm{Var}\,}}(L)=\min \{ {{\,\mathrm{Var}\,}}(G) \mid G \text{ is } \text{ a } \text{ right } \text{ linear } \text{ grammar } \text{ with } L(G)=L\}. \end{aligned}$$

A right linear grammar \(G=(N,T,P,S)\) is called to be in normal form, if all rules of P are of the form \(A\rightarrow aB\) or \(A\rightarrow \lambda \) where A and B are nonterminals and a is a terminal. Let \({{\,\mathrm{Varnf}\,}}\) be the complexity if one restricts to right linear grammars in normal form. Then it is known that \({{\,\mathrm{Varnf}\,}}(L)={{\,\mathrm{nsc}\,}}(L)\). However, the complexities of concrete languages differ essentially. For instance, it is obvious that \({{\,\mathrm{nsc}\,}}(\{a^m\}^*)=m\) and \({{\,\mathrm{Var}\,}}(\{a^m\}^*)=1\).

Obviously, context-free grammars (where the rules have the form \(A\rightarrow w\in (N\cup T)^*\) with \(A\in N\) and \(w\in (N\cup T)^*\)) are a generalization of right linear grammars, and one can define the \({{\,\mathrm{Var}\,}}\)-complexity for context-free grammars, which is denoted by \({{\,\mathrm{Varcf}\,}}\). The behaviour of this measure under operations was studied in [3]. Some results are presented in Table 2 (there are no results for intersection and complement because the class of context-free languages is not closed under these two operations).

Table 2 Some results on ranges of \({{\,\mathrm{Varcf}\,}}\)-complexities for some operations

We mention that, again, the \({{\,\mathrm{Var}\,}}\)-complexities of languages can considerably differ. For instance, the set \(U=\{aba\}^+\{ab^2a\}^+\cdots \{ab^{2m}a\}^+\) has the complexities \({{\,\mathrm{Varcf}\,}}(U)=m\) (by Lemma 2.3 in [3]) and \({{\,\mathrm{Var}\,}}(U)=2m\) (by Lemma 1 below).

In this paper we continue the research mentioned above by a (partial) determination of the sets \(g_f^{{{\,\mathrm{Var}\,}}}\) with respect to the operations union, concatenation, Kleene-closure, intersection, and reversal.

By the above remarks, the study of the sets \(g_f^{{{\,\mathrm{Var}\,}}}\) is an intermediate step between the investigation of \(g_f^{{{\,\mathrm{nsc}\,}}}\) and the study of \(g_f^{{{\,\mathrm{Varcf}\,}}}\). Therefore our results sometimes look similar to those of [3, 12,13,14]. In some cases, below, we can use modifications of the proofs in those papers. However, by the above mentioned differences in the complexities for certain languages, mostly, we cannot follow the ideas of that papers. For instance, to prove the first line of Table 1 the authors only use finite and unary languages. However, such languages cannot be used to prove an analogous statement for the measure \({{\,\mathrm{Var}\,}}\) because \({{\,\mathrm{Var}\,}}(L)\le 2\) holds for any finite and/or unary language L.

2 The complexity of some special languages

In this section we determine the complexities of some languages, which are used later.

For two different letters a and b, a natural number p with \(p\ge 1\), and pairwise different natural numbers \(r_1, r_2,\ldots ,r_p\) with \(1\le r_i\) for \(1\le i\le p\), we set

$$\begin{aligned} P(r_1,r_2,\ldots ,r_p)= & {} \{ab^{r_1}a\}^+\cdot \{ab^{r_2}a\}^+\ldots \{ab^{r_p}a\}^+, \\ P'(r_1,r_2,\ldots ,r_p)= & {} \{ab^{r_1}a\}^*\cdot \{ab^{r_2}a\}^*\ldots \{ab^{r_p}a\}^*, \\ U(r_1,r_2,\ldots ,r_p)= & {} \{ab^{r_1}a, ab^{r_2}a, \ldots ,ab^{r_p}a\}^+. \end{aligned}$$

Lemma 1

For two different letters a and b, natural numbers \(n\ge 1\), \(p_1,p_2, \ldots ,p_n\), \(p_i\ge 1\) for \(1\le i\le n\), and pairwise different natural numbers \(r_{1,1}, r_{1,2},\ldots ,r_{1,p_1}, r_{2,1},r_{2,2},\ldots ,r_{2,p_2}, \ldots ,r_{n,p_n}\), \(r_{i,j}\ge 1\) for \(1\le i\le n\), \(1\le j\le p_i\), let

$$\begin{aligned} L=P(r_{1,1},\ldots ,r_{1,p_1})\cup P(r_{2,1},\ldots ,r_{2,p_2})\cup \ldots \cup P(r_{n,1},\ldots ,r_{n,p_n}). \end{aligned}$$

Then we have

$$\begin{aligned} Var(L)= \left\{ \begin{array}{ll} p_1+p_2+\cdots +p_n+1 &{} \text{ for } n\ge 2 \\ p_1 &{} \text{ for } n=1 \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} {{\,\mathrm{Var}\,}}(P'(r_{1,1},r_{1,2},\ldots ,r_{1,p_1}))=p_1. \end{aligned}$$

Proof

Let \(G=(N,\{a,b\},P,S)\) be a right linear grammar such that

$$\begin{aligned} L(G)=L \quad \text{ and } \quad {{\,\mathrm{Var}\,}}(G)={{\,\mathrm{card}\,}}(N)={{\,\mathrm{Var}\,}}(L). \end{aligned}$$

Further, let \(t={{\,\mathrm{card}\,}}(N)\) and \(t'=\max \{ \vert z\vert \mid A\rightarrow z\in P\}\).

For i and j, \(1\le i\le n\), \(1\le j\le p_i\), we consider the word

$$\begin{aligned} z_{i,j}=ab^{r_{i,1}}aab^{r_{i,2}}a\ldots ab^{r_{i,j-1}}a(ab^{r_{i,j}}a)^{(t+1)(5r_{i,j}+t')}ab^{r_{i,j+1}}a ab^{r_{i,j+2}}a\ldots ab^{r_{i,p_i}}a \end{aligned}$$

and its derivation. There are nonterminals \(A_1,A_2,\ldots ,A_{t+1}\) such that

$$\begin{aligned} S&\Longrightarrow ^*&w_iA_1\Longrightarrow ^* w_iv_{i,1}A_2 \Longrightarrow ^* w_iv_{i,1}v_{i,2}A_3 \Longrightarrow ^* \ldots \\&\Longrightarrow ^*&w_iv_{i,1}v_{i,2}\ldots v_{i,t-1}A_{t}\Longrightarrow ^* w_iv_{i,1}v_{i,2}\ldots v_{i,t-1}v_{i,t}A_{t+1}\Longrightarrow ^*z_i \end{aligned}$$

with \(w_i=ab^{r_{i,1}}aab^{r_{i,2}}a\ldots ab^{r_{i,j-1}}aw_i'\), \(\vert w_i'\vert \le t'\), \(2r_{i,j}+3\le \vert v_{i,l}\vert \le 2r_{i,j}+3+t'\) for \(1\le l\le t\). Because \(\vert w_i'v_{i,1}v_{i,2}\ldots v_{i,t-1}v_{i,t}\vert \le t'+(2r_{i,j}+3+t')t\le (5r_{i,j}+t')(t+1)\), we get \(w_i'v_{i,1}v_{i,2}\ldots v_{i,t-1}v_{i,t}=(ab^{r_{i,j}}a)^pw''\) with some natural number p and a proper prefix \(w''\) of \(ab^{r_{i,j}}a\). By the length condition for the words \(v_{i,l}\), we obtain that \(v_{i,l}\) contains at least one occurrence of \(ab^{r_{i,j}}a\) as a subword.

Since N contains only t nonterminals, it follows that there are two numbers \(l_1\) and \(l_2\) such that \(A_{l_1}=A_{l_2}\). We set \(B_{i,j}=A_{l_1}=A_{l_2}\). Then the derivation

$$\begin{aligned} S\Longrightarrow ^* y_{i,j}A_{l_1}\Longrightarrow ^* y_{i,j}x_{i,j}A_{l_2}\Longrightarrow ^* y_{i,j}x_{i,j}y_{i,j}'\in L(G) \end{aligned}$$

can be written as

$$\begin{aligned} S\Longrightarrow ^* y_{i,j}B_{i,j}\Longrightarrow ^* y_{i,j}x_{i,j}B_{i,j}\Longrightarrow ^* y_{i,j}x_{i,j}y_{i,j}'\in L(G) \end{aligned}$$

and by construction \(x_{i,j}=x_{i,j}'(ab^{r_{i,j}}a)^{p(i,j)}x_{i,j}''\) for some natural number \(p(i,j)\ge 1\), a proper suffix \(x_{(i,j)}'\) of \(ab^{r_{i,j}}a\), and a proper prefix \(x_{(i,j)}''\) of \(ab^{r_{i,j}}a\).

If \(B_{i,j}=B_{i,j'}\) for some \(1\le j< j'\le p_i\), then we have a derivation

By construction, \(z\in L(G)=L\). But z contains an occurrence of \(ab^{r_{i,j}}a\) after an occurrence of \(ab^{r_{i,j'}}a\) with \(j<j'\) which contradicts the structure of the words in L (more precisely, in \(P(r_{i,1},r_{i,2},\ldots , p_{i,p_i})\)).

If \(B_{i,j}=B_{i',j'}\) for some \(1\le i<i'\le n\), \(1\le j\le p_i\), and \(1\le j'\le p_{i'}\), then we can analogously show that a word is generated which contains an occurrence of \(ab^{r_{i,j}}a\) as well as an occurrence of \(ab^{r_{i',j'}}a\) which contradicts the structure of words in L, again.

Thus N contains at least the \(p_1+p_2+\cdots +p_n\) different letters \(B_{i,j}\), \(1\le i\le n\), \(1\le j\le p_i\).

Moreover, if \(n\ge 2\) and \(B_{i,j}\) is the axiom for certain \(1\le i\le n\), \(1\le j\le p_i\), then we can as above produce a word which contains an occurrence of \(ab^{r_{i,j}}a\) as well as an occurrence of \(ab^{r_{i',j'}}a\), where \(i'\ne i\). Again, we obtain a contradiction to the structure of L. Thus, N contains a start symbol different from all symbols \(B_{i,j}\).

Hence we obtain \({{\,\mathrm{Var}\,}}(L)\ge p_1+ p_2+\cdots +p_n+1\) for \(n\ge 2\) and \({{\,\mathrm{Var}\,}}(L)\ge p_1\) for \(n=1\).

On the other hand, the right linear grammars \(G_1=(N_1,\{a,b\},P_1,C_1)\) and \(G_{\ge 2}=(N_{\ge 2},\{a,b\},P_{\ge 2},S)\) with

$$\begin{aligned} N_1= & {} \{C_1,C_2,\ldots ,C_{p_1}\}, \\ P_1= & {} \{C_{p_1}\rightarrow ab^{r_{1,p_1}}aC_{p_1}, C_{p_1}\rightarrow ab^{r_{1,p_1}}a\} \\&\qquad \cup \bigcup _{i=1}^{p_1-1} \{C_i\rightarrow ab^{r_{1,i}}aC_i, C_i\rightarrow ab^{r_{1,i}}aC_{i+1}\}, \\ N_{\ge 2}= & {} \{S\} \cup \{C_{i,j} \mid 1\le i\le n, 1\le j\le p_i \}, \\ P_{\ge 2}= & {} \{S\rightarrow C_{i,1} \mid 1\le i\le n\} \cup \bigcup _{i=1}^n \{C_{i,p_i}\rightarrow ab^{r_{i,p_i}}aC_{i,p_i}, C_{i,p_i}\rightarrow ab^{r_{i,p_i}}a\} \\&\qquad \cup \bigcup _{i=1}^n \bigcup _{j=1}^{p_i-1} \{C_{i,j}\rightarrow ab^{r_{i,j}}aC_{i,j}, C_{i,j}\rightarrow ab^{r_{i,j}}aC_{i,j+1}\} \end{aligned}$$

generate L for \(n=1\) and \(n\ge 2\), respectively. Thus

$$\begin{aligned}{{\,\mathrm{Var}\,}}(L)=\left\{ \begin{array}{ll} p_1+p_2+\cdots +p_n+1 &{} \text{ for } n\ge 2, \\ p_1 &{} \text{ for } n=1. \end{array} \right. \end{aligned}$$

The proof that we need at least \(p_1\) nonterminals for the generation of the set \(P'(r_{1,1},r_{1,2},\ldots ,r_{1,p_1})\) can be analogously given to the proof of this statement for \(P(r_{1,1},r_{1,2},\ldots ,r_{1,p_1})\). To show that that \(p_1\) nonterminals are sufficient we add the rules \(C_i\rightarrow C_{i+1}\) for \(1\le i\le p_1-1\) and \(C_{p_1}\rightarrow \lambda \) to the rules of \(G_1\), which results in a grammar generating \(P'(r_{1,1},r_{1,2},\ldots ,r_{1,p_1})\) with \(p_1\) nonterminals. \(\square \)

Lemma 2

For three pairwise different letters a, b and c, natural numbers \(p\ge 1\) and \(q\ge 1\), and pairwise different numbers \(r_1,r_2,\ldots , r_p, s_1,s_2,\ldots , s_q\), \(r_i\ge 1\) for \(1\le i\le p\) and \(s_j\ge 1\) for \(1\le j\le q\), we have

  1. (i)

    \({{\,\mathrm{Var}\,}}(P(r_1,r_2,\ldots ,r_p)\{c\})=p\),

  2. (ii)

    \({{\,\mathrm{Var}\,}}(\{c\}P(r_1,r_2,\ldots ,r_p))=p+1\),

  3. (iii)

    \({{\,\mathrm{Var}\,}}(\{c\}P(r_1,r_2,\ldots ,r_p)\cup P(s_1,s_2,\ldots ,s_q))=p+q+1\).

Proof

  1. (i)

    The proof that we need at least p nonterminals for the generation of \(P(r_1,r_2,\ldots ,r_p)\{c\}\) can be analogously given to the corresponding statement for the set \(P(r_{1,1},r_{1,2},\ldots ,r_{1,p_1})\) in the proof of Lemma 1. To show that p rules are sufficient we consider the right linear grammar \(G_1'=(\{C_1,C_2,\ldots C_p\}, \{a,b,c\},P_1',C_1)\) with

    $$\begin{aligned} P_1'=\{C_p\rightarrow ab^{r_p}aC_p, C_p\rightarrow ab^{r_p}ac\} \cup \bigcup _{i=1}^{p-1} \{C_i\rightarrow ab^{r_i}aC_i, C_i\rightarrow ab^{r_i}aC_{i+1}\}, \end{aligned}$$

    which generates \(P(r_1,r_2,\ldots ,r_p)\{c\}\) with p nonterminals.

  2. (ii)

    As in the proof of Lemma 1, we can show the existence of p different nonterminals \(B_i\), \(1\le i\le p\), with derivations

    If one of these nonterminals is the axiom, say \(B_i\), we have the derivation \(B_i\Longrightarrow ^* u_i(ab^{r_i}a)^{p_i}v_iB_i\Longrightarrow ^* u_iab^{r_i}av_iz_i\) which produces a word which does not start with c. Thus, we need at least \(p+1\) nonterminals.

    Moreover, we modify \(G_1'\) of part i) to \(G_1''\) adding a new nonterminal C which is the axiom of \(G_1''\), by replacing \(C_p\rightarrow ab^{r_p}ac\) by \(C_p\rightarrow ab^{r_p}a\) and adding the rule \(C\rightarrow cC_1\). Then \(G_1''\) generates \(\{c\}P(r_1,r_2,\ldots ,r_p)\) with \(p+1\) nonterminals.

  3. (iii)

    As in the proof of Lemma 1, we can show that we need \(p+q\) different nonterminals to generate the words \(cab^{r_1}a\ldots ab^{r_{i-1}}a(ab^{r_i}a)^tab^{r_{i+1}}a\ldots ab^{r_p}a\), \(1\le i\le p\), and \(ab^{s_1}a\ldots ab^{s_{j-1}}a(ab^{s_j}a)^tab^{s_{j+1}}a\ldots ab^{s_q}a\), \(1\le j\le q\), for sufficiently large t. Furthermore, as in the proof of Lemma 1, we can show that these nonterminals are pairwise different and different from the axiom. Therefore \({{\,\mathrm{Var}\,}}(\{c\}P(r_1,r_2,\ldots ,r_p)\cup P(s_1,s_2,\ldots ,s_q))\ge p+q+1\).

    Because the grammar \(G_2'=(\{S,C_1,C_2,\ldots ,C_p,D_1,D_2,\ldots ,D_q\}, \{a,b,c\},P_2',S)\) with

    $$\begin{aligned} P_2'= & {} \{S\rightarrow cC_1, S\rightarrow D_1, C_p\rightarrow ab^{r_p}aC_p, C_p\rightarrow ab^{r_p}a, D_q\rightarrow ab^{s_q}aD_q, D_q\rightarrow ab^{s_q}a\} \\&\cup \!\bigcup _{i=1}^{p-1} \{C_i\rightarrow ab^{r_i}aC_i, C_i\rightarrow ab^{r_i}aC_{i+1}\} \cup \!\bigcup _{j=1}^{q-1} \{D_j\rightarrow ab^{s_j}aD_j, D_j\rightarrow ab^{s_j}aD_{j+1}\} \end{aligned}$$

    generates \(\{c\}P(r_1,r_2,\ldots ,r_p)\cup P(s_1,s_2,\ldots ,s_q)\) with \(p+q+1\) nonterminals, the statement follows. \(\square \)

Lemma 3

For pairwise different letters a, b, and c, natural numbers \(p\ge 1\), \(q\ge 1\), and pairwise different numbers \(r_1,r_2,\ldots , r_p, s_1,s_2,\ldots , s_q\), \(r_i\ge 1\) for \(1\le i\le p\) and \(s_j\ge 1\) for \(1\le j\le q\), we have

  1. (i)

    \({{\,\mathrm{Var}\,}}(P(r_1,r_2,\ldots ,r_p)\cup U(s_1,s_2,\ldots ,s_q))=p+2\),

  2. (ii)

    \({{\,\mathrm{Var}\,}}(\{c\}P(r_1,r_2,\ldots ,r_p)\cup U(s_1,s_2,\ldots ,s_q)=p+2\),

  3. (iii)

    \({{\,\mathrm{Var}\,}}(P'(r_1,r_2,\ldots ,r_{p'})U(s_1,s_2,\ldots ,s_q)P(r_{p'+1},r_{p'+2},\ldots ,r_p))=p+1\), where \(p> p'\ge 1\) holds in addition,

  4. (iv)

    \({{\,\mathrm{Var}\,}}(U(s_1,s_2,\ldots ,s_q)P(r_{1},r_{2},\ldots ,r_p))=p+1\).

Proof

(i) As in Lemma 1, we can show that the generation of

$$\begin{aligned} ab^{r_1}a\ldots ab^{r_{i-1}}a(ab^{r_i}a)^tab^{r_{i+1}}a\ldots ab^{r_p}a,\ 1\le i\le p \quad \text{ and } \quad (ab^{s_1}a)^t \end{aligned}$$

for sufficiently large t requires \(p+1\) different nonterminals which are also different from the axiom. Thus \({{\,\mathrm{Var}\,}}((P(r_1,r_2,\ldots ,r_p)\cup U(s_1,s_2,\ldots ,s_q))\ge p+2\).

On the other hand, the grammar \(H=(\{S,C,C_1,C_2,\ldots ,C_p\},\{a,b\},P,S)\) with

$$\begin{aligned} P= & {} \{S\rightarrow C, S\rightarrow C_1, C_p\rightarrow ab^{r_p}aC_p, C_p\rightarrow ab^{r_p}a\} \\&\qquad \cup \bigcup _{j=1}^q \{C\rightarrow ab^{s_j}aC, C\rightarrow ab^{s_j}a\} \cup \bigcup _{i=1}^{p-1} \{C_i\rightarrow ab^{r_i}aC_i, C_i\rightarrow ab^{r_i}aC_{i+1}\} \end{aligned}$$

generates \(P(r_1,r_2,\ldots ,r_p)\cup U(s_1,s_2,\ldots ,s_q)\) with \(p+2\) nonterminals. (ii) can be analogously shown. (iii) As in Lemma 1, we can show that the generation of the words

$$\begin{aligned}&(ab^{r_i}a)^tab^{s_1}aab^{r_{p'+1}}a\ldots ab^{r_p}a \quad \text{ for } \quad 1\le i\le p', \\&(ab^{s_1}a)^tab^{r_{p'+1}}a\ldots ab^{r_p}a, \\&ab^{s_1}aab^{r_{p'+1}}a\ldots ab^{r_{j-1}}a(ab^{r_j}a)^tab^{r_{j+1}}a\ldots ab^{r_p}a \quad \text{ for } \quad 1\le p'<j\le p \end{aligned}$$

for sufficiently large t requires \(p+1\) different nonterminals. Thus

$$\begin{aligned} {{\,\mathrm{Var}\,}}(P'(r_1,r_2,\ldots ,r_{p'})U(s_1,s_2,\ldots ,s_q)P(r_{p'+1},r_{p'+2},\ldots ,r_p))\ge p+1. \end{aligned}$$

Since the grammar \(H'=(\{C,C_1,C_2,\ldots C_p\},\{a,b\},P',C_1)\) with

$$\begin{aligned} P'= & {} \{ C_{p'}\rightarrow ab^{r_{p'}}aC_{p'}, C_{p'}\rightarrow C, C_p\rightarrow ab^{r_p}aC_{p}, C_p\rightarrow ab^{r_p}a\} \\&\quad \cup \bigcup _{i=1}^{p'-1} \{C_i\rightarrow ab^{r_i}aC_i, C_i\rightarrow C_{i+1}\} \cup \bigcup _{i=p'+1}^{p-1}\{C_i\rightarrow ab^{r_i}aC_i, C_i\rightarrow ab^{r_i}aC_{i+1}\} \\&\quad \cup \bigcup _{j=1}^q \{C\rightarrow ab^{s_j}aC, C\rightarrow ab^{s_j}aC_{p'+1}\} \end{aligned}$$

generates \(P'(r_1,r_2,\ldots ,r_{p'})U(s_1,s_2,\ldots ,s_q)P(r_{p'+1},r_{p'+2},\ldots ,r_p)\) with \(p+1\) nonterminals, the statement follows. (iv) can be shown analogously. \(\square \)

Lemma 4

  1. (i)

    For different letters a and b, we have \({{\,\mathrm{Var}\,}}(\{b\}\{a,b\}^*)=2\).

  2. (ii)

    For different letters a and b, a natural \(p\ge 1\), and pairwise different natural numbers \(r_1,r_2,\ldots ,r_p\), \(r_i\ge 1\) for \(1\le i\le p\), we have \({{\,\mathrm{Var}\,}}(P(r_1,r_2,\ldots ,r_p)\cup \{c\}\{a,b\}^*)=p+2\).

Proof

  1. (i)

    Let \(G=(N,\{a,b\},P,S)\) be a right linear grammar with \(L(G)=\{b\}\{a,b\}^*\) and \({{\,\mathrm{Var}\,}}(G)={{\,\mathrm{Var}\,}}(\{b\}\{a,b\}^*)\). Since \(ba^t\in L(G)\) for all (sufficiently large) numbers \(t\ge 1\), there is a nonterminal \(A\in N\) such that \(A\Longrightarrow ^* a^rA\) for some \(r\ge 1\). If \(A=S\), then we have a derivation \(S=A\Longrightarrow ^* a^rA=a^rS\Longrightarrow ^* a^rb\) (because \(S\Longrightarrow ^* b\in L(G)=\{b\}\{a,b\}^*\)), which produces a word not in \(\{b\}\{a,b\}^*\), which is a contradiction. Hence, \({{\,\mathrm{Var}\,}}(\{b\}\{a,b\}^*)={{\,\mathrm{Var}\,}}(G)\ge 2\).

    Because \(H=(\{S,S'\},\{a,b\},\{S\rightarrow bS', S'\rightarrow aS', S'\rightarrow bS', S'\rightarrow \lambda \},S)\) generates \(\{b\}\{a,b\}^*\) with two nonterminals, we obtain \({{\,\mathrm{Var}\,}}(\{b\}\{a,b\}^*)=2\).

  2. (ii)

    can be shown analogously to the proofs of Lemma 3 and part i). \(\square \)

3 Behaviour under operations

In this section we study the behaviour of the measure \({{\,\mathrm{Var}\,}}\) for regular languages (presented by right linear grammars) under some operations.

3.1 Reversal

We start with the operation reversal.

Theorem 1

We have

$$\begin{aligned} g_R^{{{\,\mathrm{Var}\,}}}(n)=\left\{ \begin{array}{ll} \{n-1,n,n+1\} &{}\quad \text{ for } n\ge 2, \\ \{1,2\} &{}\quad \text{ for } n=1. \end{array} \right. . \end{aligned}$$

Proof

  1. (i)

    We first prove that, for a language L with \({{\,\mathrm{Var}\,}}(L)=n\), \({{\,\mathrm{Var}\,}}(L^R)\le n+1\) holds.

    Since \({{\,\mathrm{Var}\,}}(L)=n\), L is generated by a right linear grammar \(G=(N,T,P,S)\) with \({{\,\mathrm{card}\,}}(N)=n\). We construct the grammar \(G^R=(N\cup \{S^R\}, T,P^R, S^R)\) where \(S^R\) is a new symbol, i.e., \(S^R\notin N\), and \(P^R\) consists of all rules constructed as follows:

    • if \(A\rightarrow w\in P\), \(w\in T^*\) and \(A\in N\), then \(S^R\rightarrow w^RA\in P^R\),

    • if \(A\rightarrow wB\in P\), \(w\in T^*\), \(A\in N\), and \(B\in N\), then \(B\rightarrow w^RA\in P^R\),

    • if \(S\rightarrow w\in P\) and \(w\in T^*\), then \(S^R\rightarrow w^R\in P^R\), and

    • \(S\rightarrow \lambda \in P^R\).

    By construction, \(G^R\) is right linear and satisfies \(L(G^R)=L^R\). Moreover, we have \({{\,\mathrm{Var}\,}}(G^R)=n+1\). Therefore \({{\,\mathrm{Var}\,}}(L^R)\le n+1\).

  2. (ii)

    We now prove that, for a language L with \({{\,\mathrm{Var}\,}}(L)=n\), \({{\,\mathrm{Var}\,}}(L^R)\ge n-1\) holds.

    Assume that there is a language L such that \({{\,\mathrm{Var}\,}}(L)=n\) and \({{\,\mathrm{Var}\,}}(L^R)\le n-2\). Since \((L^R)^R=L\), we obtain from i) that \({{\,\mathrm{Var}\,}}(L)={{\,\mathrm{Var}\,}}((L^R)^R)\le {{\,\mathrm{Var}\,}}(L^R)+1 \le n-1\) in contrast to our supposition.

  3. (iii)

    We now present witnesses for the values n for \(n\ge 1\), \(n+1\) for \(n\ge 1\), and \(n-1\) for \(n\ge 2\).

    Let \(n\ge 1\) and \(L_n=P(1,2,\ldots ,n)\). Then we obtain \(L_n^R=P(n,n-1,\ldots ,1)\). By Lemma 1, we get \({{\,\mathrm{Var}\,}}(L_n)={{\,\mathrm{Var}\,}}(L_n^R)=n\).

    Let \(n\ge 1\) and \(K_n=P(1,2,\ldots ,n)\{c\}\). Then \(K_n^R=\{c\}P(n,n-1,\ldots ,1)\) and, by Lemma 2, \({{\,\mathrm{Var}\,}}(K_n)=n\) and \({{\,\mathrm{Var}\,}}(K_n^R)=n+1\).

    Let \(n\ge 2\) and \(M_n=\{c\}P(1,2,\ldots ,n-1)\). Then \(M_n^R=P(n-1,n-2,\ldots ,1)\{c\}\). By Lemma 2, we have \({{\,\mathrm{Var}\,}}(M_n)=n\) and \({{\,\mathrm{Var}\,}}(M_n^R)=n-1\). \(\square \)

3.2 Kleene-closure

First, we consider the positive closure.

Theorem 2

For \(n\ge 1\), we have \(g_+^{{{\,\mathrm{Var}\,}}}(n)=\{1,2,\ldots ,n\}\).

Proof

  1. (i)

    We show that \({{\,\mathrm{Var}\,}}(L)=n\) implies \({{\,\mathrm{Var}\,}}(L^+)\le n\).

    Let L be a language with \({{\,\mathrm{Var}\,}}(L)=n\). Then there is a right linear grammar \(G=(N,T,P,S)\) such that \({{\,\mathrm{card}\,}}(N)=n\) and \(L(G)=L\). We construct the grammar \(G'=(N,T,P',S)\) where

    $$\begin{aligned} P'=P\cup \{A\rightarrow wS \mid A\rightarrow w\in P\}. \end{aligned}$$

    By construction, after finishing a derivation in G, we can start a new derivation in \(G'\). Thus \(L(G')=L^+\). Because \({{\,\mathrm{Var}\,}}(G')={{\,\mathrm{card}\,}}(N)=n\), we obtain \({{\,\mathrm{Var}\,}}(L^+)\le n\).

  2. (ii)

    We now show that \(n\in g_+^{{{\,\mathrm{Var}\,}}}(n)\) for \(n\ge 1\). Let \(L_n=P(1,2,\ldots ,n)\). By Lemma 1, \({{\,\mathrm{Var}\,}}(L_n)=n\).

    Let \(G=(N,\{a,b\}, P, S)\) be a right linear grammar such that \(L(G)=L_n^+\). Since \(L_n\subseteq L_n^+\), we can prove as in the proof of Lemma 1 that there are n different nonterminals \(B_1,B_2,\ldots ,B_n\) in N such that there are derivations

    $$\begin{aligned} S\Longrightarrow ^* y_iB_i\Longrightarrow ^* y_ix_iB_i\Longrightarrow ^* y_ix_iy_i' \end{aligned}$$

    where \(x_i=x_i'ab^iax_i''\) with \(x_i',x_i'', y_i,y_i'\in \{a,b\}^*\). Thus we have \({{\,\mathrm{Var}\,}}(L_n^+)\ge n\). By i) we get \({{\,\mathrm{Var}\,}}(L_n^+)=n\). Hence \(n\in g_+^{{{\,\mathrm{Var}\,}}}(n)\) for \(n\ge 1\).

  3. (iii)

    We now prove that all values k with \(3\le k\le n-1\) are in \(g_+^{{{\,\mathrm{Var}\,}}}(n)\). Let

    $$\begin{aligned} K_{n,k}=P(1)\cup P(2)\cup \ldots \cup P(n-k)\cup P(n-k+1,n-k+2,\ldots ,n-1). \end{aligned}$$

    By Lemma 1, we have \({{\,\mathrm{Var}\,}}(K_{n,k})=n\). Moreover, let

    $$\begin{aligned} U=(\{aba, ab^2a,\ldots ,ab^{n-k}a\}\cup P(n-k+1,n-k+2,\ldots ,n-1))^+. \end{aligned}$$

    Because \(ab^ia\in K_{n,k}\) for \(1\le i\le n-k\) and \(P(n-k+1,n-k+2,\ldots ,n-1)\subseteq K_{n,k}\), we get \(U\subseteq K_{n,k}^+\). Furthermore, \((ab^ia)^j\in U\) for \(1\le i\le n-k\) and \(j\ge 1\) and \(P(n-k+1,n-k+2,\ldots ,n-1)\subseteq U\) imply \(K_{n,k}^+\subseteq U\). Therefore we have \(U=K_{n,k}^+\).

    Because the right linear grammar \((\{S,A_1,A_2,\ldots ,A_{k-1}\}, \{a,b\}, P,S)\) with

    $$\begin{aligned} P= & {} \{S\rightarrow A_1, A_{k-1}\rightarrow ab^{n-1}aA_{k-1}, A_{k-1}\rightarrow ab^{n-1}a, A_{k-1}\rightarrow ab^{n-1}aS\} \\&\qquad \cup \bigcup _{i=1}^{n-k}\{ S\rightarrow ab^iaS, S\rightarrow ab^ia\} \\&\qquad \cup \bigcup _{j=1}^{k-2} \{A_j\rightarrow ab^{n-k+j}aA_j, A_j\rightarrow ab^{n-k+j}aA_{j+1}\} \end{aligned}$$

    generates U, we have \({{\,\mathrm{Var}\,}}(K_{n,k}^+)={{\,\mathrm{Var}\,}}(U)\le k\). On the other hand, let \(G'=(N',\{a,b\},P',S')\) be a right linear grammar such that \(L(G')=K_{n,k}^+\). As in the proof of Lemma 1, we can show that, starting from the words \((aba)^t\) and \(ab^{n-k+1}a\ldots ab^{n-k+j-1}a(ab^{n-k+j}a)^tab^{n-k+j+1}a\ldots ab^{n-1}a\), \(1\le j\le k-1\), for sufficiently large t, there are k letters \(B_1, B_{n-k+1}, \ldots ,B_{n-1}\) with derivations

    $$\begin{aligned} B_i&\Longrightarrow ^*&u_i(ab^{i}a)^{p_i}v_iB_i \text{ where } p_i \text{ is } \text{ a } \text{ natural } \text{ number }, u_i \text{ and } v_i \text{ are } \\&\text{ a } \text{ proper } \text{ suffix } \text{ and } \text{ a } \text{ proper } \text{ prefix } \text{ of } ab^{i}a, \text{ respectively }, \\ B_i&\Longrightarrow ^*&z_i \in \{a,b\}^*. \end{aligned}$$

    As in Lemma 1, we can prove that all these nonterminals are different. Thus \({{\,\mathrm{Var}\,}}(K_{n,k}^+)\ge k\).

    Combining these estimations we get \({{\,\mathrm{Var}\,}}(K_{n,k}^+)=k\). Thus \(k\in g_+^{{{\,\mathrm{Var}\,}}}(n)\).

  4. (iv)

    We show that \(1\in g_+^{{{\,\mathrm{Var}\,}}}(n)\) for \(n\ge 3\). Let \(M_n=P(1)\cup P(2)\cup \ldots \cup P(n-1)\). By Lemma 1, \({{\,\mathrm{Var}\,}}(M_n)=n\). Since \(M_n^+=\{aba,ab^2a,\ldots ,ab^{n-1}a\}^+\) is generated by the grammar

    $$\begin{aligned} G=\left( \{S\},\{a,b\},\bigcup _{i=1}^{n-1}\{S\rightarrow ab^iaS, S\rightarrow ab^ia\}, S\right) , \end{aligned}$$

    we have \({{\,\mathrm{Var}\,}}(M_n^+)=1\). Therefore \(1\in g_+^{{{\,\mathrm{Var}\,}}}(n)\).

  5. (v)

    We show that \(2\in g_+^{{{\,\mathrm{Var}\,}}}(n)\) for \(n\ge 3\). Let \(R_n=P(1)\cup P(2)\cup \ldots \cup P(n-2)\cup \{c\}P(n-1)\).

    As in Lemma 2, we can prove that \({{\,\mathrm{Var}\,}}(R_n)=n\).

    Moreover, we have \(R_n^+=(\{aba,ab^2a,\ldots ,ab^{n-2}a\}\cup \{c\}\{ab^{n-1}a\}^+)^+\). Again, as in the proof of Lemma 1, we can show the existence of a nonterminal \(B_{n-1}\) with derivations

    $$\begin{aligned} B_{n-1}&\Longrightarrow ^*&u_{n-1}(ab^{n-1}a)^{p_{n-1}}v_{n-1}B_{n-1} \text{ where } p_{n-1} \text{ is } \text{ a } \text{ natural } \text{ number, } \\&u_{n-1} \text{ and } v_{n-1} \text{ are } \text{ a } \text{ proper } \text{ suffix } \text{ and } \text{ a } \text{ proper } \text{ prefix } \\&\text{ of } ab^{n-1}a, \text{ respectively }, \\ B_{n-1}&\Longrightarrow ^*&z_{n-1} \in \{a,b\}^*. \end{aligned}$$

    If \({{\,\mathrm{Var}\,}}(R_n^+)=1\), then \(B_{n-1}\) is the start symbol. Then there is a derivation

    $$\begin{aligned} B_{n-1}\Longrightarrow ^* u_{n-1}(ab^{r_{n-1}}a)^{p_{n-1}}v_{n-1}B_{n-1}\Longrightarrow ^* u_{n-1}(ab^{r_{n-1}}a)^{p_{n-1}}v_{n-1}z_{n-1} \end{aligned}$$

    which derives a word not in \(R_n^+\) since it contains the subword \(ab^{n-1}a\) but no c. Therefore \({{\,\mathrm{Var}\,}}(R_n^+)\ge 2.\)

    The right linear grammar \((\{S,S'\},\{a,b,c\},P,S)\) with

    $$\begin{aligned} P= & {} \{S\rightarrow cS', S'\rightarrow ab^{n-1}aS', S'\rightarrow ab^{n-1}a, S'\rightarrow ab^{n-1}aS\} \\&\qquad \cup \bigcup _{i=1}^{n-2}\{S\rightarrow ab^iaS, S\rightarrow ab^ia\} \end{aligned}$$

    generates \(R_n^+\), which gives \({{\,\mathrm{Var}\,}}(R_n^+)=2\). This proves \(2\in g_+^{{{\,\mathrm{Var}\,}}}(n)\).

  6. (vi)

    Finally, we prove \(1\in g_+^{{{\,\mathrm{Var}\,}}}(2)\).

    Let \(Q=\{a\} \cup \{a^3\}^*\). Assume that \({{\,\mathrm{Var}\,}}(Q)=1\). Then there is a right linear grammar \((\{S\},\{a\},P,S)\) which generates Q. Obviously, P contains a rule \(S\rightarrow a^kS\), \(k\ge 1\), since otherwise a finite language is generated. Moreover, there is a derivation \(S\Longrightarrow ^* a\). Then we have the derivations

    $$\begin{aligned} S\Longrightarrow a^kS\Longrightarrow ^* a^ka=a^{k+1} \text{ and } S\Longrightarrow a^kS\Longrightarrow a^ka^kS\Longrightarrow ^* a^ka^ka=a^{2k+1}. \end{aligned}$$

    Because \(k+1>1\), we obtain \(k+1=3p\) and \(2k+1=3q\) for certain positive integers p and q. We add these equations and get \(3k+2=3(p+q)\) or equivalently \(2=3(p+q-k)\) which is impossible. This contradiction proves \({{\,\mathrm{Var}\,}}(Q)\ge 2\).

    The right linear grammar \((\{S,S'\}, \{a\},\{S\rightarrow a, S\rightarrow S', S'\rightarrow a^3S', S'\rightarrow a^3\},S)\) generates Q. Therefore \({{\,\mathrm{Var}\,}}(Q)=2\).

    Moreover, we have \(Q^+=\{a\}^+\) and \(\{a\}^+\) is generated by the right linear grammar \((\{S\},\{a\},\{S\rightarrow aS, S\rightarrow a\},S)\). Hence \({{\,\mathrm{Var}\,}}(Q^+)=1\) and \(1\in g_+^{{{\,\mathrm{Var}\,}}}(2)\). \(\square \)

Theorem 3

For \(n\ge 1\), we have \(g_*^{{{\,\mathrm{Var}\,}}}(n)=\{1,2,\ldots ,n+1\}\).

Proof

  1. (i)

    We first prove that \({{\,\mathrm{Var}\,}}(L)=n\) implies \({{\,\mathrm{Var}\,}}(L^*)\le n+1\). We start with a grammar \(G=(N,T,P,S)\) such that \(L(G)=L\) and \({{\,\mathrm{Var}\,}}(G)={{\,\mathrm{Var}\,}}(L)=n\). Then we construct the grammar \(G'=(N,T,P',S)\) which generates \(L^+\) as in part i) of the proof of Theorem 2. Now we modify \(G'\) to \(G''=(N\cup \{S'\}, T, P'\cup \{S'\rightarrow \lambda ,\) \(S'\rightarrow S\},S')\) where \(S'\) is a new symbol. Then \(L(G'')=L^*\) and \({{\,\mathrm{Var}\,}}(G'')=n+1\). Thus \({{\,\mathrm{Var}\,}}(L^*)\le n+1\).

  2. (ii)

    Let \(n\ge 1\). Let \(L_n=P(1,2,\ldots ,n)\{c\}\). Then \({{\,\mathrm{Var}\,}}(L_n)=n\) by Lemma 2.

    Let \(G=(N,\{a,b,c\},P,S)\) be a grammar such that \(L(G)=L_n^*\) and \({{\,\mathrm{Var}\,}}(G)={{\,\mathrm{Var}\,}}(L_n^*)\). Since \(L_n\subseteq L_n^*\), as in Lemma 1, we can prove the existence of n different nonterminals \(B_1,B_2,\ldots ,B_n\) with a derivation

    $$\begin{aligned} B_i&\Longrightarrow ^*&u_i(ab^{i}a)^{p_i}v_iB_i \text{ where } p_i \text{ is } \text{ a } \text{ natural } \text{ number }, \nonumber \\&u_i \text{ and } v_i \text{ are } \text{ a } \text{ proper } \text{ suffix } \text{ and } \text{ a } \text{ proper } \text{ prefix } \text{ of } ab^{i}a, \text{ respectively }, \end{aligned}$$
    (1)

    \(1\le i\le n\), in N. Moreover, since \(\lambda \in L_n^*\), there is a derivation \(S\Longrightarrow ^* C\Longrightarrow \lambda \) where \(C\rightarrow \lambda \) is applied in the last step. If \(C=B_i\) for some i, \(1\le i\le n\), then we have a derivation

    $$\begin{aligned} S\Longrightarrow ^* C=B_i\Longrightarrow u_i(ab^{i}a)^{p_i}v_iB_i = u_i(ab^{i}a)^{p_i}v_iC \Longrightarrow u_i(ab^{i}a)^{p_i}v_i, \end{aligned}$$

    which produces a word in \(L(G)=L_n^*\) which is not empty and contains no c. This contradicts the fact that a non-empty word of \(L_n^*\) contains a c. Thus we need an additional nonterminal, i.e., \({{\,\mathrm{Var}\,}}(L_n^*)\ge n+1\). By part i) of this proof, \({{\,\mathrm{Var}\,}}(L_n^*)=n+1\). Hence \(n+1\in g_*^{{{\,\mathrm{Var}\,}}}(n)\) for \(n\ge 1\).

  3. (iii)

    Let \(n\ge 1\) and \(L_n'=P'(1,2,\ldots ,n)\). Then we have \({{\,\mathrm{Var}\,}}(L_n')=n\) by Lemma 1. Let \(G=(N,\{a,b\},P,S)\) be a grammar such that \(L(G)=(L_n')^*\) and \({{\,\mathrm{Var}\,}}(G)={{\,\mathrm{Var}\,}}((L_n')^*)\). Since \(L_n'\subseteq (L_n')^*\), as in Lemma 1, we can prove the existence of n different nonterminals \(B_1,B_2,\ldots ,B_n\) with a derivation (1). Hence \({{\,\mathrm{Var}\,}}((L_n')^*)\ge ~n\).

    Because the grammar \(H=(\{B_1,B_2,\ldots ,B_n\}, \{a,b\}, P,B_1)\) with

    $$\begin{aligned} P= \{ B_n\rightarrow ab^naB_n, B_n\rightarrow B_1, B_n\rightarrow \lambda \} \cup \bigcup _{i=1}^{n-1} \{B_i\rightarrow ab^iaB_i, B_i\rightarrow B_{i+1}\} \end{aligned}$$

    generates \((L_n')^*\), we obtain \({{\,\mathrm{Var}\,}}((L_n')^*)=n\).

    Hence \(n\in g_*^{{{\,\mathrm{Var}\,}}}(n)\) for \(n\ge 1\).

  4. (iv)

    \(1,2,\ldots ,n-1\in g_*^{{{\,\mathrm{Var}\,}}}(n)\) can be shown by the witnesses given in the proof of Theorem 2; we only have to add the rule \(S\rightarrow \lambda \) for the axiom S because then the Kleene-closure of the witness is generated. \(\square \)

3.3 Concatenation

For concatenation, we only present some partial results.

Lemma 5

For any positive integers \(n\ge 1\) and \(m\ge 1\), languages \(L_n\) and \(K_m\) with \({{\,\mathrm{Var}\,}}(L_n)=n\) and \({{\,\mathrm{Var}\,}}(K_m)=m\), we have \({{\,\mathrm{Var}\,}}(L_nK_m)\le n+m\).

Proof

Let \(G_n=(N_n,T,P_n,S_n)\) and \(H_m=(N_m,T,P_m,S_m)\) be two right linear grammars such that \(L(G_n)=L_n\), \(L(H_m)=K_m\), \({{\,\mathrm{card}\,}}(N_n)={{\,\mathrm{Var}\,}}(L_n)=n\), and \({{\,\mathrm{card}\,}}(N_m)={{\,\mathrm{Var}\,}}(K_m)=m\). We can assume that \(N_n\cap N_m=\emptyset \). We consider the right linear grammar

$$\begin{aligned} G=(N_n\cup N_m, T, \{A\rightarrow wB \mid A\rightarrow wB\in P_n\} \cup \{A\rightarrow wS_m \mid A\rightarrow w\in P_n\} \cup P_m , S_n). \end{aligned}$$

Since the finishing of a derivation in \(G_n\) by a rule \(A\rightarrow w\) is replaced by an application of \(A\rightarrow wS_m\), we have to continue with a derivation in \(H_m\). Thus \(L(G)=L_nK_m\). Because \({{\,\mathrm{Var}\,}}(G)=n+m\), we obtain the relation \({{\,\mathrm{Var}\,}}(L_nK_m)\le n+m\). \(\square \)

Theorem 4

For \(n\ge 1\), \(m\ge 1\), we have

$$\begin{aligned} \{\min \{m,n\}, \min \{m,n\}+1,\ldots ,m+n\}\subseteq g_{\cdot }^{{{\,\mathrm{Var}\,}}}(m,n). \end{aligned}$$

Proof

  1. (i)

    For \(n\ge 1\) and \(m\ge 1\), let

    $$\begin{aligned} L_n=P(1,2,\ldots ,n) \quad \text{ and } \quad K_{m,n}=P(n+1,n+2,\ldots ,n+m). \end{aligned}$$

    By Lemma 1, \({{\,\mathrm{Var}\,}}(L_n)=n\) and \({{\,\mathrm{Var}\,}}(K_m)=m\). Moreover,

    $$\begin{aligned} L_nK_m=P(1,2,\ldots ,n,n+1,\ldots ,n+m). \end{aligned}$$

    Again, by Lemma 1, \({{\,\mathrm{Var}\,}}(L_nK_m)=n+m\). This proves \(n+m\in g_{\cdot }^{{{\,\mathrm{Var}\,}}}(m,n)\).

  2. (ii)

    Let \(n\ge m\ge 2\) and \(n-1\ge k'\ge 0\). We consider the languages

    $$\begin{aligned}&L_n=P'(1,2,\ldots ,n) \text{ and } K_{m,k'}\\&\quad =U(k'+1,k'+2,\ldots ,n)P(n+1,n+2,\ldots n+m-1). \end{aligned}$$

    Then we obtain

    $$\begin{aligned} L_nK_{m,k'}=P'(1,2,\ldots ,k')U(k'+1,k'+2,\ldots ,n)P(n+1,n+2,\ldots n+m-1). \end{aligned}$$

    By Lemmas 1 and 3 , we get

    $$\begin{aligned} {{\,\mathrm{Var}\,}}(L_n)=n,\ {{\,\mathrm{Var}\,}}(K_{m,k'})=m, \quad \text{ and } \quad {{\,\mathrm{Var}\,}}(L_nK_{m,k'})=k'+m. \end{aligned}$$

    Thus \(k\in g_{\cdot }^{{{\,\mathrm{Var}\,}}}(n,m)\) for \(m\le k\le m+n-1\).

    If \(n\ge m=1\), then we consider the languages \(L_n=P'(1,2,\ldots ,n)\) and \(K_{m,k'}=U(k'+1,k'+2,\ldots ,n)\).

  3. (iii)

    For \(m\ge n\ge 2\) and \(m-1\ge k'\ge 0\), we consider

    $$\begin{aligned}&L_{n,k'}=P(1,2,\ldots ,n-1)U(n+1,n+2,\ldots ,n+m-k'),\\&\quad K_m=P'(n+1,n+2,\ldots n+m) \end{aligned}$$

    and obtain \(k\in g_{\cdot }^{{{\,\mathrm{Var}\,}}}(n,m)\) for \(n\le k\le m+n-1\).

    For \(m\ge n=1\), we consider the modification analogous to that in ii). \(\square \)

With respect to numbers which are smaller than \(\min \{n,m\}\), we only know that 1 can be obtained if \(n\ge 2\) and \(m\ge 2\).

Lemma 6

For \(n\ge 2\) and \(m\ge 2\), we have \(1\in g_{\cdot }^{{{\,\mathrm{Var}\,}}}(n,m)\).

Proof

Let h be the homomorphism defined by \(h(a)=b\) and \(h(b)=a\).

If \(n\ge 3\) and \(m\ge 3\), we consider the languages

$$\begin{aligned} L_n=P(1,2,\ldots ,n-2)\cup \{b\}\{a,b\}^* \text{ and } K_m=h(P(1,2,\ldots ,m-2\}\cup \{b\}\{a,b\}^*). \end{aligned}$$

Then we obtain \({{\,\mathrm{Var}\,}}(L_n)=n\) and \({{\,\mathrm{Var}\,}}(K_m)=m\) by Lemma 4 and symmetry. Moreover, \(L_nK_m=\{a,b\}^+\), which gives \({{\,\mathrm{Var}\,}}(L_nK_m)=1\).

If \(n=2\) or \(m=2\), we replace \(P(1,2,\ldots ,n-2)\) and \(P(1,2,\ldots ,m-2)\) by the empty set. \(\square \)

With respect to state complexity, there are investigations where instead of all regular languages only members of some subregular families are considered. For instance, the restriction to prefix-free languages was considered in [8, 15] (where a language L is prefix-free if and only if no proper prefix of a word in L belongs to L). We now show that such restrictions lead to the situation that small values cannot be obtained for the \({{\,\mathrm{Var}\,}}\)-complexity of \(L_nK_m\).

Definition 1

Two languages L and K are well concatenated if and only if, for any words u and v, \(u\in L\) and \(uv\in LK\) implies \(v\in K\).

As an example, the languages \(L=\{x,x^2\}\) and \(K=\{x^2\}\) are not well concatenated since \(x\in L\) in \(xx^3=x^4\in LK\), but \(x^3\) is not in K. On the other hand, \(L'=\{x^2,x^3\}\) and \(K'=\{x^2,x^3\}\) are well concatenated.

Remark 1

If L is prefix-free and K is an arbitrary language, then L and K are well concatenated.

This can be seen as follows. Let us assume that L and K are not well concatenated. Then there are words u and v such that \(u\in L\), \(uv\in LK\), and \(v\notin K\). Then there are words \(u'\) and \(v'\) such that \(u'v'=uv\), \(u'\in L\), and \(v'\in K\). Obviously, \(u\ne u'\). Therefore u is a proper prefix of \(u'\) or \(u'\) is a proper prefix of u. Since both words are in L, we obtain a contradiction to the prefix-freeness of L.

We now show that small numbers cannot occur if L and K are well concatenated.

Lemma 7

Let L and K be well concatenated. Then \({{\,\mathrm{Var}\,}}(LK)\ge {{\,\mathrm{Var}\,}}(K)-1\).

Proof

Let \(G=(N,T,P,S)\) be a right linear grammar such that \(L(G)=LK\) and \({{\,\mathrm{Var}\,}}(G)={{\,\mathrm{Var}\,}}(LK)\). Let t be the maximal length of a terminal word w with \(A\rightarrow wB\in P\) or \(A\rightarrow w\in P\).

Let \(S'\) be a symbol not in \(N\cup T\). For any derivation

$$\begin{aligned} D:\ S\Longrightarrow ^* zX\Longrightarrow zuvY\Longrightarrow ^* y\in L(G) \text{ or } D:\ S\Longrightarrow ^* zX\Longrightarrow zuv\in L(G) \end{aligned}$$

with \(zu\in L\) (and \(X,Y\in N\)), we set \(p_D=S'\rightarrow vY\) or \(p_D=S'\rightarrow v\), respectively. Let \(P'\) be the set of all rules obtained in this way. Since we have only finitely many rules for any nonterminal X, \(P'\) is finite. Then we construct the the right linear grammar \(G'=(N\cup \{S'\},T,P\cup P',S')\) and prove that \(L(G')=K\).

Let \(w\in K\). We take a word \(w'\in L\). Then \(w'w\in LK\). Hence, in G, there are derivations

$$\begin{aligned} D:\ S\Longrightarrow ^* zX\Longrightarrow zuvY\Longrightarrow ^* zuvy\in L(G) \text{ or } D: \ S\Longrightarrow ^* zX\Longrightarrow zuv\in L(G) \end{aligned}$$

with \(zu=w'\) and \(vy=w\) or \(v=w\), respectively. In the former case, by construction \(S'\rightarrow vY\in P'\), and hence we have the derivation \(S'\Longrightarrow vY\Longrightarrow ^* vy=w\) in \(G'\) (because the derivation \(Y\Longrightarrow ^* y\) uses only rules of P), which proves \(w\in L(G')\). Analogously, we conclude in the latter case. Thus \(K\subseteq L(G')\).

Conversely, let \(w\in L(G')\). Then there are derivations

$$\begin{aligned} D':\ S'\Longrightarrow vY\Longrightarrow ^* vy' \text{ or } D':\ S'\Longrightarrow v \end{aligned}$$

with \(w=vy'\) or \(w=v\). Since \(S'\rightarrow vY\) or \(S'\rightarrow v\) in \(P'\), there are derivations

$$\begin{aligned} D:\ S\Longrightarrow ^* zX\Longrightarrow zuvY\Longrightarrow ^* zuvy\in L(G) \text{ or } D: \ S\Longrightarrow ^* zX\Longrightarrow zuv\in L(G) \end{aligned}$$

in G with \(zu\in L\). However, then we also have the derivations

$$\begin{aligned} D:\ S\Longrightarrow ^* zX\Longrightarrow zuvY\Longrightarrow ^* zuvy'\in L(G) \text{ or } D: \ S\Longrightarrow ^* zX\Longrightarrow zuv\in L(G). \end{aligned}$$

We obtain \(zuvy'\in LK\) or \(zuv\in LK\), respectively. Since \(zu\in L\) and L and K are well concatenated, we get \(vy'=w\in K\) or \(v=w\in K\), respectively. Therefore \(L(G')\subseteq K\).

By definition \({{\,\mathrm{Var}\,}}(K)\le {{\,\mathrm{Var}\,}}(G')={{\,\mathrm{card}\,}}(N)+1={{\,\mathrm{Var}\,}}(LK)+1\). \(\square \)

We note that the witnesses given in the proof of Theorem 4 are not prefix-free. However, if we consider the prefix-free languages \(L_n=P(1,2,\ldots ,n)\{c\}\) and \(K_m=P(n+1,n+2,\ldots ,n+m)\{d\}\), where c and d are additional letters, we also get \({{\,\mathrm{Var}\,}}(L_n)=n\), \({{\,\mathrm{Var}\,}}(K_m)=m\), and \({{\,\mathrm{Var}\,}}(L_nK_m)=n+m\), i.e., the upper bound \(m+n\) is also obtained with prefix-free sets. We do not know whether Theorem 4 also holds for prefix-free languages (together with Lemma 7 and Remark 1 it would be a nearly optimal result).

3.4 Intersection

We start with a lemma which is an intermediate consequence of the commutativity of intersection.

Lemma 8

For \(n\ge 1\) and \(m\ge 1\), we have \(g_{\cap }^{{{\,\mathrm{Var}\,}}}(n,m)=g_{\cap }^{{{\,\mathrm{Var}\,}}}(m,n)\). \(\Box \)

We now give a partial result concerning \(g_{\cap }^{{{\,\mathrm{Var}\,}}}\). In contrast to concatenation, we only know that certain small numbers are in the intersection.

Lemma 9

  1. (i)

    For \(n\ge 3\) and \(m\ge 3\), we have \(\{0,1,2,\ldots , n+m-3\}\subseteq g_{\cap }^{{{\,\mathrm{Var}\,}}}(n,m)\).

  2. (ii)

    (i) For \(n\ge 3\) and \(m\in \{1,2\}\), we have \(\{0,1,2,\ldots , n\}\subseteq g_{\cap }^{{{\,\mathrm{Var}\,}}}(n,m)\).

Proof

By Lemma 8, it is sufficient to consider the situation that \(n\ge m\).

We distinguish some cases and give witnesses \(L_n\) with \({{\,\mathrm{Var}\,}}(L_n)=n\) and \(K_m\) with \({{\,\mathrm{Var}\,}}(K_m)=m\) such that \({{\,\mathrm{Var}\,}}(L_n\cap K_m)=k\). Since the proofs that the languages have the required \({{\,\mathrm{Var}\,}}\)-complexities follow directly from Lemmas 14 or can be given by arguments analogous to those used in the proofs of Lemmas 14 in all cases, we only present the witnesses in Table 3 (in order to simplify the notation we omit the possible dependence of the languages by the parameters k or \(k'\)). \(\square \)

Table 3 Table of the witnesses to prove Lemma 9

We do not know an upper bound for the numbers in \(g_{\cap }^{{{\,\mathrm{Var}\,}}}(n,m)\). This comes from the fact that all constructions of a right linear grammar for \(L_n\cap K_m\) from right linear grammars \(G_n\) and \(H_m\) for \(L_n\) and \(K_m\), respectively, do not only depend on n and m. For instance, we can transform the given grammars into right linear grammars \(G_n'=(N_n,T,P_n,S_n)\) and \(H_m'=(N_m,T,P_m,S_m)\) in normal form. Then the right linear grammar \((N_n\times N_m, T,P,S)\) with \(S=(S_n,S_m)\) and

$$\begin{aligned} P= & {} \{ (A,B)\rightarrow a(C,D) \mid A\rightarrow aC\in P_n, B\rightarrow aD\in P_m\} \\&\quad \cup \{(A,B)\rightarrow \lambda \mid A\rightarrow \lambda \in P_n, B\rightarrow \lambda \in P_m\} \end{aligned}$$

generates \(L_n\cap K_m\). However, the blowup of the number of nonterminals in the construction of the normal forms depends on the length of the right hand sides of the rules in the original grammars. Since this length can be arbitrarily large, we have no upper bound for \(g_{\cap }^{{{\,\mathrm{Var}\,}}}(n,m)\) using this construction. The situation for other constructions—known to us—is similar.

3.5 Union

Finally we study the behaviour under union.

We start by proving that \(g_{\cup }^{{{\,\mathrm{Var}\,}}}\) is a symmetric function.

Lemma 10

For \(n\ge 1\) and \(m\ge 1\), we have \(g_{\cup }^{{{\,\mathrm{Var}\,}}}(n,m)=g_{\cup }^{{{\,\mathrm{Var}\,}}}(m,n)\). \(\Box \)

The statement follows immediately from the commutativity of union.

Theorem 5

For \(n\ge 1\) and \(m\ge 1\), we have \(g_{\cup }^{{{\,\mathrm{Var}\,}}}(n,m)=\{1,2,\ldots ,m+n+1\}\)

Proof

We first prove that \({{\,\mathrm{Var}\,}}(L_n)=n\) and \({{\,\mathrm{Var}\,}}(K_m)=m\) imply \({{\,\mathrm{Var}\,}}(L_n\cup K_m)\le n+m+1\). Let \(G_n=(N_n,T,P_n,S_n)\) and \(H_m=(N_m,T,P_m,S_m)\) be right linear grammars with \(L(G_n)=L_n\), \(L(H_m)=K_m\), \({{\,\mathrm{Var}\,}}(G_n)={{\,\mathrm{Var}\,}}(L_n)=n\), and \({{\,\mathrm{Var}\,}}(H_m)={{\,\mathrm{Var}\,}}(K_m)=m\). We assume that \(N_n\cap N_m=\emptyset \). Then, by the standard construction, the right linear grammar

$$\begin{aligned} G=(N_n\cup N_m\cup \{S\}, T, P_n\cup P_m\cup \{S\rightarrow S_n, S\rightarrow S_m\},S) \end{aligned}$$

generates \(L_n\cup K_m\) and has \(n+m+1\) nonterminals. Therefore, \({{\,\mathrm{Var}\,}}(L_n\cup K_m)\le n+m+1\).

For \(n\ge 1\) and \(m\ge 1\), let \(L_n=P(1,2,\ldots ,n)\) and \(K_m=P(n+1,n+2,\ldots ,n+m)\). Then we have \({{\,\mathrm{Var}\,}}(L_n)=n\), \({{\,\mathrm{Var}\,}}(K_m)=m\), and \({{\,\mathrm{Var}\,}}(L_n\cup K_m)=n+m+1\) by Lemma 1.

For \(n\ge 1\), \(m\ge 1\), and \(1\le k\le n+m\), in Tables 4 and 5 , we now present witness languages \(L_n\) and \(K_m\) such that the relations \({{\,\mathrm{Var}\,}}(L_n)=n\), \({{\,\mathrm{Var}\,}}(K_m)=m\), and \({{\,\mathrm{Var}\,}}(L_n\cup K_m)=k\) hold. By Lemma 10, we can assume without loss of generality that \(n\ge m\). Because in all cases the \({{\,\mathrm{Var}\,}}\)-complexity of the given languages follows directly by Lemmas 14 or can be given by arguments similar to those used in the proof of Lemmas 14, we only give the witnesses and their union. \(\square \)

Table 4 Table of the witnesses \(L_n\) with \({{\,\mathrm{Var}\,}}(L_n)=n\) and \(K_m\) with \({{\,\mathrm{Var}\,}}(K_m)=m\) for \(n\ge 3\)
Table 5 Table of the witnesses \(L_n\) with \({{\,\mathrm{Var}\,}}(L_n)=n\) and \(K_m\) with \({{\,\mathrm{Var}\,}}(K_m)=m\) for \(n\le 2\)

4 Conclusions

In this paper we have started the investigation of the sets \(g_f^{{{\,\mathrm{Var}\,}}}(n,m)\) (and \(g_f^{{{\,\mathrm{Var}\,}}}(n)\)) for regularity preserving binary (and unary, respectively) operations f. Summarizing our results, we obtain Table 6.

Table 6 Summary of the results on \(g_f^{{{\,\mathrm{Var}\,}}}(n,m)\) and \(g_f^{{{\,\mathrm{Var}\,}}}(n)\)

We see that we have complete results for reversal, positive Kleene-closure, Kleene-closure and union. However, for concatenation and intersection, we have only given some partial results; an exact determination of \(g_.^{{{\,\mathrm{Var}\,}}}(n,m)\) and \(g_{\cap }^{{{\,\mathrm{Var}\,}}}(n,m)\) remains to be done.

In comparison with the measures \({{\,\mathrm{Varnf}\,}}\) of right linear grammars in normal forms (which equals the nondeterministic state complexity \({{\,\mathrm{nsc}\,}}\)) and \({{\,\mathrm{Varcf}\,}}\) of context-free grammars given in the Tables 1 and 2 , we see that our results correspond more to those of \({{\,\mathrm{Varnf}\,}}\) (\(={{\,\mathrm{nsc}\,}}\)) (those for \({{\,\mathrm{nsc}\,}}\) are more complete) than to those of \({{\,\mathrm{Varcf}\,}}\) where the upper bound for concatenation and the range for reversal are different.

It is left open to determine the range for further operations as—for instance—complement, set-subtraction, and symmetric difference.

The behaviour of other complexity measures defined on special regular languages under operations is already investigated; for \({{\,\mathrm{sc}\,}}\) see [5]. Especially, the behaviour of finite and unary languages is studied. With respect to \({{\,\mathrm{Var}\,}}\), the behaviour of finite and unary languages is not of interest since, for any finite language L, \({{\,\mathrm{Var}\,}}(L)=1\) holds, and for any unary language K, \({{\,\mathrm{Var}\,}}(K)\le 2\) is valid. However, for other special languages as regular prefix-free languages or regular suffix-closed languages (any suffix of a word in L is in L, too) the behaviour can be studied.

Furthermore, for a right linear grammar \(G=(N,T,P,S)\), we can also define the complexity measures

$$\begin{aligned} {{\,\mathrm{Prod}\,}}(G)={{\,\mathrm{card}\,}}(P), \text{ and } {{\,\mathrm{Symb}\,}}(G)=\sum _{A\rightarrow w\in P} (\vert w\vert +2), \end{aligned}$$

which count the number of rules and the sum of symbols contained in rules, respectively. Then we can extend these measures for regular languages as in Section 1. These measures describe the complexity of a language in more precise manner. For context-free languages, the investigation of the behaviour of the corresponding complexity measures under operations was done in [2, 9]. A study of the ranges for the measures \({{\,\mathrm{Prod}\,}}\) and \({{\,\mathrm{Symb}\,}}\) for right linear grammars is left as an open field.