
1 Introduction

Brzozowski derivatives [5] and Antimirov’s partial derivatives [4] are well-known tools to transform regular expressions to finite automata and to define algorithms for equivalence and containment of regular languages [3, 10]. Both automata constructions rely on the finiteness of the set of iterated derivatives. Brzozowski derivatives need to be considered up to similarity (commutativity, associativity, and idempotence for union) to obtain finiteness. Derivatives have had considerable impact on the study of algorithms for regular languages on finite words and trees [6, 15].

There are many studies of derivative structures for enhancements of regular expressions. While Brzozowski’s original work covered extended regular expressions, partial derivatives were originally limited to simple expressions without intersection and complement; defining partial derivatives for extended regular expressions takes significant effort [6]. Many further operators have been considered, among them shuffle operators [16], multi-tilde-bar expressions [7], expressions with multiplicities [12], approximate regular expressions [9], and many more. There have been a number of approaches to develop general frameworks for derivation: Caron and coworkers [8] abstract over the support for creating derivations, and Thiemann [17] develops criteria for derivable language operators.

Recently, there has been practical interest in the study of derivatives and partial derivatives. Owens and coworkers [14] report a functional implementation with some extensions (e.g., character classes) to handle large character sets, which partially rediscovers work on the FIRE library [18]. Might and coworkers [1, 13] push beyond regular languages by implementing parsing for context-free languages using derivatives and demonstrate its feasibility in practice.

Winter and coworkers [19] study context-free languages in a coalgebraic setting. They use a notion of derivative to give three equivalent characterizations of context-free languages by grammars in weak Greibach normal form, behavioral differential equations, and guarded \(\mu \)-regular expressions.

In this work, we focus on using derivatives for parsing of context-free languages. While Might and coworkers explore algorithmic issues, we investigate the correctness of context-free parsing with derivatives. To this end, we develop the theory of derivatives for \(\mu \)-regular expressions, which extend regular expressions with a least fixed point operator. Our results are relevant for context-free parsing because \(\mu \)-regular expressions are equivalent to context-free grammars in generating power. Compared to the work of Winter and coworkers [19], we do not require recursion to be guarded (i.e., we admit left recursion) and we focus on establishing the connection to pushdown automata. Unguarded recursion forces us to consider derivation by \(\varepsilon \), which corresponds to an unfolding of a left-recursive \(\mu \)-expression. Guarded expressions always admit a proper derivation by a symbol.

Our theory is the proper generalization of Antimirov’s theory of partial derivatives to \(\mu \)-regular expressions: our derivative function corresponds exactly to the transition function of the nondeterministic pushdown automaton that recognizes the same language. The pendant of Antimirov’s finiteness result yields the finiteness of the set of pushdown symbols of this automaton.

2 Preliminaries

We write \(\mathbb {N}\) for the set of natural numbers, \(\mathbb {B}= \{ \mathbf {ff}, \mathbf {tt}\}\) for the set of booleans, and \(X \uplus Y\) for the disjoint union of sets X and Y. We consider total maps \(m:X \rightarrow Y\) as sets of pairs in the usual way, so that \(m \subseteq X \times Y\) and \(\emptyset \) denotes the empty mapping. For \(x_0\in X\) and \(y_0\in Y\), the map update of m is defined as \(m[y_0/x_0] (x) = y_0\) if \(x=x_0\) and \(m[y_0/x_0] (x) = m (x)\) if \(x \ne x_0\). We write \(m[x_0 \mapsto y_0]\) interchangeably for \(m[y_0/x_0]\).

For conciseness, we fix a finite set of symbols, \(\varSigma \), as the underlying alphabet. We write \(\varSigma ^*\) for the set of finite words over \(\varSigma \), \(\varepsilon \in \varSigma ^*\) stands for the empty word, and \(\varSigma ^+ = \varSigma ^*\setminus \{\varepsilon \}\). For \(u,v\in \varSigma ^*\), we write \(|u|\in \mathbb {N}\) for the length of u and \(u\cdot v \) (or just uv) for the concatenation of words.

Given languages \(U,V,W \subseteq \varSigma ^*\), concatenation extends to languages as usual: \(U \cdot V = \{ u \cdot v \mid u\in U, v \in V\}\). The Kleene closure is defined as the smallest set \(U^* \subseteq \varSigma ^*\) such that \(U^* = \{\varepsilon \} \cup U \cdot U^*\). We write the left quotient as \(U\backslash W = \{ v \mid v\in \varSigma ^*, \exists u\in U: uv \in W \}\). For a singleton language \(U = \{u\}\), we write \(u\backslash W\) for the left quotient.
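For instance, for \(W = \{ab, ba, aa\}\) we have \(a\backslash W = \{a, b\}\), \(b\backslash W = \{a\}\), and \(ab\backslash W = \{\varepsilon \}\).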

Definition 1

A (nondeterministic) finite automaton (NFA) is a tuple \(\mathcal {A} = (Q, \varSigma , \delta , q_0, F)\) where Q is a finite set of states, \(\varSigma \) an alphabet, \(\delta \subseteq Q \times \varSigma \times Q\) the transition relation, \(q_0 \in Q\) the initial state, and \(F \subseteq Q\) the set of final states.

Let \(n\in \mathbb {N}\). A run of \(\mathcal {A}\) on \(w = a_0\dots a_{n-1}\in \varSigma ^*\) is a sequence \(q_0\dots q_n \in Q^*\) such that, for all \(0\le i <n\), \((q_i, a_i, q_{i+1}) \in \delta \). The run is accepting if \(q_n \in F\). The language recognized by \(\mathcal {A}\) is \(\mathcal {L}^{}(\mathcal {A}) = \{ w \in \varSigma ^* \mid \text {there is an accepting run of } \mathcal {A} \text { on } w \}\).

Definition 2

A (nondeterministic) pushdown automaton (PDA) is a tuple \(\mathcal {P} = (Q, \varSigma , \varGamma , \delta , q_0, Z_0)\) where Q is a finite set of states, \(\varSigma \) the input alphabet, \(\varGamma \) the pushdown alphabet (a finite set), \(\delta \subseteq Q \times (\varSigma \cup \{\varepsilon \}) \times \varGamma \times Q \times \varGamma ^* \) is the transition relation, \(q_0 \in Q\) is the initial state, \(Z_0 \in \varGamma \) is the bottom symbol.

A configuration of \(\mathcal {P} \) is a tuple \(c \in Q \times \varSigma ^* \times \varGamma ^*\) of the current state, the rest of the input, and the current contents of the pushdown.

The transition relation \(\delta \) gives rise to a binary stepping relation \(\vdash \) on configurations defined by (for all \(q, q' \in Q\), \(\alpha \in \varSigma \cup \{\varepsilon \}\), \(Z\in \varGamma \), \(\gamma ,\gamma ' \in \varGamma ^*\), \(v\in \varSigma ^*\)):

$$\begin{aligned} (q, \alpha v, Z\gamma ) \vdash (q', v, \gamma '\gamma ) \quad \text { whenever } (q, \alpha , Z, q', \gamma ') \in \delta . \end{aligned}$$

The language of the PDA is \(\mathcal {L}^{}(\mathcal {P}) = \{ v \in \varSigma ^* \mid \exists q \in Q: (q_0, v, Z_0) \vdash ^* (q, \varepsilon , \varepsilon ) \}\) where \(\vdash ^*\) is the reflexive transitive closure of \(\vdash \).

3 \(\mu \)-Regular Expressions

Regular expressions can be enriched with a least fixed point operator \(\mu \) to extend their scope to the context-free languages [11].

Definition 3

The set \(\mathbf {R}(\varSigma , X)\) of \(\mu \)-regular pre-expressions over alphabet \(\varSigma \) and set of variables X is defined as the smallest set such that

  • \(\mathbf 0\in \mathbf {R}(\varSigma , X)\),

  • \(\mathbf 1\in \mathbf {R}(\varSigma , X)\),

  • \(a\in \varSigma \) implies \(a \in \mathbf {R}(\varSigma , X)\),

  • \(r, s \in \mathbf {R}(\varSigma , X)\) implies \(r \cdot s\in \mathbf {R}(\varSigma , X)\),

  • \(r,s \in \mathbf {R}(\varSigma , X)\) implies \(r+s \in \mathbf {R}(\varSigma , X)\),

  • \(r \in \mathbf {R}(\varSigma , X)\) implies \(r^*\in \mathbf {R}(\varSigma , X)\),

  • \(x \in X\) implies \(x \in \mathbf {R}(\varSigma , X)\),

  • \(r \in \mathbf {R}(\varSigma , X \cup \{x\})\) implies \(\mu x.r \in \mathbf {R}(\varSigma , X)\).

The set \(\mathbf {R}(\varSigma )\) of \(\mu \)-regular expressions over \(\varSigma \) is defined as \(\mathbf {R}(\varSigma ) := \mathbf {R}(\varSigma , \emptyset )\).
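For readers who prefer code, the abstract syntax of Definition 3 can be rendered as the following Haskell data type. This is an illustrative sketch only; the constructor names, the representation of variables as strings, and the fixed symbol type Char are our own choices and not part of the development.

```haskell
-- Illustrative only: one possible Haskell rendering of Definition 3.
type Var = String

data Rexp
  = Zero            -- 0
  | One             -- 1
  | Sym Char        -- a, for a in Sigma
  | Cat Rexp Rexp   -- r . s
  | Alt Rexp Rexp   -- r + s
  | Star Rexp       -- r*
  | Var Var         -- recursion variable x
  | Mu Var Rexp     -- mu x. r  (binds x in r)
  deriving (Eq, Ord, Show)

-- The expression of Example 20: r = mu x. 1 + x . a   (equivalent to a*).
exampleR :: Rexp
exampleR = Mu "x" (Alt One (Cat (Var "x") (Sym 'a')))
```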

As customary, we consider the elements of \(\mathbf {R}(\varSigma ,X)\) as abstract syntax trees and freely use parentheses to disambiguate. We further assume that \(*\) has higher precedence than \(\cdot \), which has higher precedence than \(+\). The \(\mu x\)-operator binds the recursion variable x with lowest precedence: its scope extends as far to the right as possible. A variable x occurs free if there is no enclosing \(\mu x\)-operator. A closed expression has no free variables.

Definition 4

The language denoted by a \(\mu \)-regular pre-expression is defined inductively by \(\mathcal {L}^{}: \mathbf {R}(\varSigma , X) \times (X \rightarrow \wp (\varSigma ^*)) \rightarrow \wp (\varSigma ^*)\). Let \(\eta \in X \rightarrow \wp (\varSigma ^*)\) be a mapping from variables to languages.

  • \(\mathcal {L}^{}(\mathbf 0, \eta ) = \{\}\).

  • \(\mathcal {L}^{}(\mathbf 1, \eta ) = \{\varepsilon \}\).

  • \(\mathcal {L}^{}(a, \eta ) = \{a\}\) (singleton letter word) for each \(a\in \varSigma \).

  • \(\mathcal {L}^{}(r\cdot s, \eta ) = \mathcal {L}^{}(r, \eta ) \cdot \mathcal {L}^{}(s, \eta )\).

  • \(\mathcal {L}^{}(r+s, \eta ) = \mathcal {L}^{}(r, \eta ) \cup \mathcal {L}^{}(s, \eta )\).

  • \(\mathcal {L}^{}(r^*, \eta ) = (\mathcal {L}^{}(r, \eta ))^*\).

  • \(\mathcal {L}^{}(x, \eta ) = \eta (x)\).

  • \(\mathcal {L}^{}(\mu x. r, \eta ) = \textsf {lfp}\ L. \mathcal {L}^{}(r, {\eta [x \mapsto L]}) \).

For an expression \(r \in \mathbf {R}(\varSigma )\), we write \(\mathcal {L}^{}(r) := \mathcal {L}^{}(r, \emptyset )\).

Here, \(\textsf {lfp}\) is the least fixed point operator on the complete lattice \(\wp (\varSigma ^*)\) (ordered by set inclusion). Its application in the definition yields the smallest set \(L \subseteq \varSigma ^*\) such that \(L = \mathcal {L}^{}(r, {\eta [x \mapsto L]}) \). This fixed point exists by Tarski’s theorem because \(\mathcal {L}^{}\) is a monotone function, which is captured precisely in the following lemma.

Lemma 5

For each finite set X, \(\eta \in X \rightarrow \wp (\varSigma ^*)\), \(r \in \mathbf {R}(\varSigma , X \cup \{x\})\), the function \(L \mapsto \mathcal {L}^{}(r, { \eta [x \mapsto L]})\) is monotone on \(\wp (\varSigma ^*)\). That is, if \(L\subseteq L'\), then \(\mathcal {L}^{}(r, {\eta [x \mapsto L]}) \subseteq \mathcal {L}^{}(r, { \eta [x \mapsto L']})\).

According to Leiss [11], it is a folklore theorem that the languages generated by \(\mu \)-regular expressions are exactly the context-free languages.

Theorem 6

\(L\subseteq \varSigma ^*\) is context-free if and only if there exists a \(\mu \)-regular expression \(r \in \mathbf {R}(\varSigma )\) such that \(L = \mathcal {L}^{}(r)\).
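For example, the closed expression \(\mu x.\,\mathbf 1+ a\cdot x\cdot b\) denotes the least solution of \(L = \{\varepsilon \} \cup \{a\}\cdot L\cdot \{b\}\), namely \(\{ a^n b^n \mid n \in \mathbb {N}\}\), a context-free language that is not regular.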

Subsequently we will deal syntactically with fixed points. To this end, we define properties of expressions and substitutions to make substitution application well-defined.

Definition 7

Let \(\mathcal {X}\) be the universe of variables occurring in expressions equipped with a strict partial order \(\prec \).

An expression is order-respecting if each subexpression of the form \(\mu x.r\) has only free variables which are strictly before x: \(\forall y\in {\textit{fv}}(\mu x.r), y \prec x\).

A mapping \(\sigma : X \rightarrow \mathbf {R}(\varSigma , X)\) is order-closed if \(\forall x\in X\), \(\sigma (x)\) is order-respecting and \(\forall y \in {\textit{fv}}(\sigma (x))\), \(y\prec x\) and \(y\in {\textit{dom}}(\sigma )\).

A variable ordering for an expression always exists: assume that all binders bind distinct variables and take a topological sort of the subexpression containment relation.

We define the application \(\sigma \bullet r\) of an order-closed mapping \(\sigma \) to an order-respecting expression r by substituting a maximal free variable with its image and repeating this process until all free variables are eliminated.

Definition 8

Let \(X \subseteq \mathcal {X}\) a finite set of variables, \(r \in \mathbf {R}(\varSigma , X)\) order-respecting, and \(\sigma : X \rightarrow \mathbf {R}(\varSigma , X)\) be order-closed.

The application \(\sigma \bullet r \in \mathbf {R}(\varSigma , X)\) yields an expression that is defined by substituting for the free variables in r in descending order.

$$\begin{aligned} \sigma \bullet r&= {\left\{ \begin{array}{ll} r &{} \text {if } {\textit{fv}}(r) = \emptyset \\ \sigma \bullet r[\sigma (x)/x]&{} \text {if } x \text { is a maximal element of } {\textit{fv}}(r) \end{array}\right. } \end{aligned}$$

Application is well-defined because the variables x are drawn from the finite set X and the substitution step for x only introduces new variables that are strictly smaller than x due to order-closedness. The outcome does not depend on the choice of the maximal variable because the unfolding of a maximal variable cannot contain one of the other maximal variables. Furthermore, all intermediate expressions (and thus the result) are order-respecting.
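The following Haskell sketch, our illustration building on the Rexp type given after Definition 3, implements \(\sigma \bullet r\). It relies on the convention of Definition 7 that all binders bind distinct variables, so substitution cannot capture; unlike Definition 8 it substitutes an arbitrary free variable instead of a maximal one, which for an order-closed \(\sigma \) still terminates and, as far as this sketch is concerned, reaches the same closed result.

```haskell
import qualified Data.Map as Map

-- sigma is represented as a finite map from variables to expressions.
type Subst = Map.Map Var Rexp

-- Free variables of an expression (possibly with duplicates; only
-- emptiness and membership matter here).
freeVars :: Rexp -> [Var]
freeVars = go []
  where
    go bnd (Var x)   = [x | x `notElem` bnd]
    go bnd (Mu x r)  = go (x : bnd) r
    go bnd (Cat r s) = go bnd r ++ go bnd s
    go bnd (Alt r s) = go bnd r ++ go bnd s
    go bnd (Star r)  = go bnd r
    go _   _         = []                     -- Zero, One, Sym

-- Substitute img for the free occurrences of x.  No capture can occur
-- because all binders bind distinct variables (Definition 7).
substVar :: Var -> Rexp -> Rexp -> Rexp
substVar x img = go
  where
    go (Var y)   = if y == x then img else Var y
    go (Mu y r)  = if y == x then Mu y r else Mu y (go r)
    go (Cat r s) = Cat (go r) (go s)
    go (Alt r s) = Alt (go r) (go s)
    go (Star r)  = Star (go r)
    go e         = e

-- sigma . r of Definition 8: substitute free variables by their images
-- until the expression is closed; order-closedness guarantees termination.
applySubst :: Subst -> Rexp -> Rexp
applySubst sg r = case freeVars r of
  []      -> r
  (x : _) -> applySubst sg (substVar x (sg Map.! x) r)
```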

4 Partial Derivatives

Antimirov [4] introduced partial derivatives to study the syntactic transformation from regular expressions to nondeterministic and deterministic finite automata. A partial derivative \(\partial ^{}_{a}(r)\) with respect to an input symbol a maps an expression r to a set of expressions such that the union of their languages is the left quotient \(a\backslash \mathcal {L}^{}(r)\). Antimirov’s definition corresponds to the left part of Fig. 1. We write \(\mathbf {R}_o (\varSigma )\) for the set of ordinary regular expressions that neither contain the \(\mu \)-operator nor any variables. We extend \(\cdot \) to a function \((\cdot ) : \wp (\mathbf {R}(\varSigma ,X)) \times \mathbf {R}(\varSigma ,X) \rightarrow \wp (\mathbf {R}(\varSigma ,X))\) on sets of expressions R defined pointwise by

$$\begin{aligned} R \cdot s&= \{ r \cdot s \mid r \in R \} . \end{aligned}$$
Fig. 1. Antimirov’s definition of partial derivatives and nullability

The definition of partial derivatives relies on nullability, which is tested by a function \(\mathcal {N}: \mathbf {R}_o (\varSigma ) \rightarrow \mathbb {B}\). The right side of the figure corresponds to Antimirov’s definition.
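As an illustration (not the paper's code), a Haskell sketch of nullability and partial derivatives for the \(\mu \)-free fragment might read as follows, following Antimirov's standard equations [4]. It reuses the Rexp type from the sketch after Definition 3; the function names are ours, and the Var and Mu cases are deliberately out of scope here.

```haskell
import Data.List (nub)

-- Nullability on R_o(Sigma): does the language of r contain the empty word?
nullable :: Rexp -> Bool
nullable Zero      = False
nullable One       = True
nullable (Sym _)   = False
nullable (Cat r s) = nullable r && nullable s
nullable (Alt r s) = nullable r || nullable s
nullable (Star _)  = True
nullable _         = error "nullable: mu-free fragment only"

-- Pointwise extension  R . s = { r . s | r in R }.
dotAll :: [Rexp] -> Rexp -> [Rexp]
dotAll rs s = [ Cat r s | r <- rs ]

-- Antimirov's partial derivative by a symbol, as a set (list) of expressions.
pderiv :: Char -> Rexp -> [Rexp]
pderiv _ Zero      = []
pderiv _ One       = []
pderiv a (Sym b)   = [ One | a == b ]
pderiv a (Alt r s) = nub (pderiv a r ++ pderiv a s)
pderiv a (Cat r s) = nub (dotAll (pderiv a r) s ++ if nullable r then pderiv a s else [])
pderiv a (Star r)  = dotAll (pderiv a r) (Star r)
pderiv _ _         = error "pderiv: mu-free fragment only"
```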

Lemma 9

For all \(r\in \mathbf {R}_o (\varSigma )\), \(\mathcal {N}(r)\) iff \(\varepsilon \in \mathcal {L}^{}(r)\).

Theorem 10

(Correctness [4]). For all \(r \in \mathbf {R}_o (\varSigma )\), \(a\in \varSigma \), \(\mathcal {L}^{}( \partial ^{}_{a}(r)) = a\backslash \mathcal {L}^{}(r)\).

Here we adopt the convention that if R is a set of expressions, then \(\mathcal {L}^{}( R)\) denotes the union of the languages of all expressions: \(\mathcal {L}^{}( R) = \bigcup \{ \mathcal {L}^{}(r) \mid r\in R\}\).

Theorem 11

(Expansion). For \(r \in \mathbf {R}_o (\varSigma )\), \(\mathcal {L}^{}(r) = \{ \varepsilon \mid \mathcal {N}(r) \} \cup \bigcup _{a\in \varSigma } a\cdot \mathcal {L}^{}(\partial ^{}_{a}(r))\).

Partial derivatives give rise to a nondeterministic finite automaton.

Theorem 12

(Finiteness [4]). Let \(r\in \mathbf {R}_o (\varSigma )\) be a regular expression. Define partial derivatives by words by \(\partial ^{}_{\varepsilon }(r) = \{r\}\) and \(\partial ^{}_{aw}(r) = \bigcup \{ \partial ^{}_{w}(s) \mid s \in \partial ^{}_{a}(r) \}\) and by a language L by \(\partial ^{}_{L}(r) = \bigcup \{ \partial ^{}_{w}(r) \mid w\in L\}\).

The set \(\partial ^{}_{\varSigma ^*}(r)\) is finite.

Theorem 13

(Nondeterministic finite automaton construction [4]). Let \(r\in \mathbf {R}_o (\varSigma )\) be a regular expression and define \(Q = \partial ^{}_{\varSigma ^*}(r)\) and \(\delta \subseteq Q \times \varSigma \times Q\) by \((q, a, q') \in \delta \) iff \(q' \in \partial ^{}_{a}(q)\). Let further \(q_0 = r\) and \(F = \{ q \in Q \mid \mathcal {N}(q) \}\).

Then \(\mathcal {A}= (Q, \varSigma , \delta , q_0, F)\) is an NFA such that \(\mathcal {L}^{}(r)= \mathcal {L}^{}(\mathcal {A})\).
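The construction of Theorem 13 can be phrased as a worklist computation over the partial derivatives. The sketch below (again ours, reusing Rexp, nullable, and pderiv from the previous listings, with the alphabet passed as a list sigma) computes the state set \(Q = \partial ^{}_{\varSigma ^*}(r)\), the transition relation, and the final states.

```haskell
import Data.List (nub, (\\))

-- Q = the set of iterated partial derivatives of r0 (finite by Theorem 12).
states :: [Char] -> Rexp -> [Rexp]
states sigma r0 = go [r0] [r0]
  where
    go acc []       = acc
    go acc (q : ws) =
      let new = nub [ q' | a <- sigma, q' <- pderiv a q ] \\ acc
      in  go (acc ++ new) (ws ++ new)

-- Transition relation and final states of the NFA of Theorem 13;
-- the initial state is r0 itself.
nfa :: [Char] -> Rexp -> ([(Rexp, Char, Rexp)], [Rexp])
nfa sigma r0 =
  let qs    = states sigma r0
      delta = [ (q, a, q') | q <- qs, a <- sigma, q' <- pderiv a q ]
      final = [ q | q <- qs, nullable q ]
  in  (delta, final)
```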

The plan is to extend these results to \(\mu \)-regular expressions. We start with the extension of the nullability function.

5 Nullability

Figure 2 extends nullability to \(\mu \)-regular expressions. To cater for recursion, the function \(\mathcal {N}\) takes a nullability environment \(\nu \) of type \(X \rightarrow \mathbb {B}\) as an additional argument. With this extension, an expression \(\mu x.r\) is deemed nullable if its body r is nullable, where the least fixed point operator feeds the nullability of the body back to the free occurrences of the recursion variable. This fixed point is computed on the two-element Boolean lattice \(\mathbb {B}\), ordered by \(\mathbf {ff}\sqsubseteq \mathbf {tt}\), with disjunction \((\vee ) : \mathbb {B}\times \mathbb {B}\rightarrow \mathbb {B}\) as the least upper bound operation. Accordingly, the case for a free variable x obtains its nullability from the environment \(\nu \).

Lemma 14

For each \(r\in \mathbf {R}(\varSigma , X)\), \(\mathcal {N}(r)\) is a monotone function from \(X \rightarrow \mathbb {B}\) (ordered pointwise) to \(\mathbb {B}\).

Fig. 2. Nullability of \(\mu \)-regular expressions

To prepare for the correctness proof of \(\mathcal {N}\), we first simplify the case for the fixed point. It turns out that one iteration is sufficient to obtain the fixed point. This fact is also a consequence of a standard result, namely that the number of iterations needed to compute the fixed point of a monotone function on a lattice is bounded by the height of the lattice. In this case, the Boolean lattice has height one.

Lemma 15

Let X be a set of variables, \(r \in \mathbf {R}(\varSigma , X \cup \{x\})\), \(\eta : X \rightarrow \wp (\varSigma ^*)\), and \(L\subseteq \varSigma ^*\) such that \(\varepsilon \notin L\). If \(\varepsilon \notin \mathcal {L}^{}(r, { \eta [x \mapsto \emptyset ]}) \), then \(\varepsilon \notin \mathcal {L}^{}(r, { \eta [x \mapsto L]})\).

Lemma 16

For all \(r\in \mathbf {R}(\varSigma ,X)\), for all \(\nu : X\rightarrow \mathbb {B}\),

$$\textsf {lfp}\ b.\mathcal {N}(r)\nu [x \mapsto b] = \mathcal {N}(r)\nu [x \mapsto \mathbf {ff}].$$
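Lemma 16 licenses computing the \(\mu \)-case with a single iteration that starts from \(\mathbf {ff}\). The Haskell sketch below reflects our reading of Fig. 2 (regular cases as in Antimirov's definition, environment lookup for variables, one-step fixed point for \(\mu \)); it reuses the Rexp and Var types from the sketch after Definition 3, and the name nullableM is ours.

```haskell
import qualified Data.Map as Map

-- Nullability environments nu : X -> B as finite maps.
type NEnv = Map.Map Var Bool

nullableM :: NEnv -> Rexp -> Bool
nullableM _  Zero      = False
nullableM _  One       = True
nullableM _  (Sym _)   = False
nullableM nu (Cat r s) = nullableM nu r && nullableM nu s
nullableM nu (Alt r s) = nullableM nu r || nullableM nu s
nullableM _  (Star _)  = True
nullableM nu (Var x)   = Map.findWithDefault False x nu
nullableM nu (Mu x r)  = nullableM (Map.insert x False nu) r   -- one iteration, Lemma 16
```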

For the statement of the correctness, we need to define what it means for a nullability environment to agree with a language environment.

Definition 17

Nullability environment \(\nu : X \rightarrow \mathbb {B}\) agrees with language environment \(\eta : X \rightarrow \wp (\varSigma ^*)\), written \(\eta \models \nu \), if for all \(x\in X\), \(\varepsilon \in \eta (x)\) iff \(\nu (x)\).

Lemma 18

(Correctness of \(\mathcal {N}\) ). For all X, \(r \in \mathbf {R}(\varSigma , X)\), \(\eta \in X \rightarrow \wp (\varSigma ^*)\), \(\nu \in X \rightarrow \mathbb {B}\), such that \(\eta \models \nu \), it holds that \(\varepsilon \in \mathcal {L}^{}(r, {\eta })\) iff \(\mathcal {N}(r)\nu \).

6 Derivation

The derivative for \(\mu \)-regular expressions has a different type than for ordinary regular expressions: A partial derivative is a set of non-empty sequences (i.e., stack fragments) of regular expressions. The idea is that deriving a recursion operator \(\mu x.r\) pushes the current context on the stack and starts afresh with the derivation of r. In other words, the derivative function for \(\mu \)-regular expressions has the same signature as the transition function for a nondeterministic PDA.

Fig. 3. Partial derivatives of \(\mu \)-regular expressions for \(\alpha \in \varSigma \cup \{\varepsilon \}\)

To distinguish operations on stacks from operations on words over \(\varSigma \), we write “\(:\)” (read “push”) for the concatenation operator on stacks. We also use this operator for pattern matching parts of a stack. We write \([\,]\) for the empty stack, \({[r_1, \dots , r_n]}\) for a stack with n elements, and \(\overline{\mathbf {r}}\) for any stack of expressions. We extend the concatenation operator for regular expressions to non-empty stacks by having it operate on the last (bottom) element of a stack.

Definition 19

Let \((M, (\cdot ), \mathbf 1)\) be a monoid. We lift the monoid operation to non-empty stacks \((\cdot ) \in M^+ \times M \rightarrow M^+\) for \(\overline{a} \in M^*\) and \(a, b \in M\) by

$$\begin{aligned} (\overline{a} :{[a]}) \cdot b&= (\overline{a} :{[a \cdot b]}) \text{. } \end{aligned}$$

We further lift it pointwise to sets \(A \subseteq M^+\) to obtain \((\cdot )\in \wp (M^+) \times M \rightarrow \wp (M^+)\):

$$\begin{aligned} A \cdot b&= \{ \overline{a} \cdot b \mid \overline{a} \in A \} \text{. } \end{aligned}$$

We use this definition for \(M= \mathbf {R}(\varSigma ,X)\) and also extend the push operation ( : ) pointwise to sets of stacks.

$$\begin{aligned} (:)&\in \wp (\mathbf {R}(\varSigma ,X)^+) \times \mathbf {R}(\varSigma ,X)^+ \rightarrow \wp (\mathbf {R}(\varSigma ,X)^+) \\ R :\overline{\mathbf {s}}&= \{ \overline{\mathbf {r}}:\overline{\mathbf {s}}\mid \overline{\mathbf {r}}\in R \} \end{aligned}$$

Most of the time, the second argument will be a singleton stack \({[s]}\).
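For instance, \([r_1, r_2] \cdot s = [r_1, r_2\cdot s]\), and pushing pointwise gives \(\{ [r_1], [r_2] \} :{[s]} = \{ [r_1, s], [r_2, s] \}\); in Examples 20 and 21 below, \(\{ [\mathbf 1] \} :{[\mathbf 1]} = \{ [\mathbf 1, \mathbf 1] \}\).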

Before we discuss the intricacies of the full definition in Fig. 3, let’s first consider a naive extension of the derivative function in Fig. 1 to \(\mu \)-regular expressions and analyze its problems:

Taking the derivative of a recursive definition means applying the derivative to the unrolled definition. At the same time, we push an empty context on the stack so that the context of the recursion does not become a direct part of the derivative; for the \(\mu \)-case this amounts to (cf. the first step in Example 20)
$$\begin{aligned} \partial ^{}_{a}(\mu x.r)&= \partial ^{}_{a}(r[\mu x.r/x]) :{[\mathbf 1]} . \end{aligned}$$
This proposed definition makes sure that the partial derivative \(\partial ^{}_{a}(r)\) is only ever applied to closed expressions \(r\in \mathbf {R}(\varSigma )\). Hence, the case of a free recursion variable x would not occur during the computation of \(\partial ^{}_{a}(r)\).

Example 20

The “naive unrolling” definition of the partial derivative has a problem. While it can be shown to be (partially) correct, it is not well-defined for all arguments. Consider the left-recursive expression \(r = \mu x. \mathbf 1+ x \cdot a\), which is equivalent to \(a^*\). Computing its partial derivative according to “naive unrolling” reveals that it depends on itself, so that \(\partial ^{}_{a}(r)\) would be undefined.

$$\begin{aligned} \underline{\partial ^{}_{a}(r)}&= \partial ^{}_{a}(\mathbf 1+ r \cdot a) :{[\mathbf 1]} \\&= (\{\} \cup \partial ^{}_{a}(r \cdot a)) :{[\mathbf 1]} \\&= (\partial ^{}_{a}(r) \cdot a \cup \partial ^{}_{a}(a)) :{[\mathbf 1]} \\&= (\underline{\partial ^{}_{a}(r)} \cdot a \cup \{ {[\mathbf 1]}\}) :{[\mathbf 1]} \end{aligned}$$

We remark that the expression r corresponds to a left-recursive grammar, where the naive construction of a top-down parser using the method of recursive descent also runs into problems [2]. There would be no problem with the right-recursive equivalent \(r' = \mu x.\mathbf 1+ a\cdot x\) where the naive unrolling yields \(\partial ^{}_{a}(r') = \{ [r', \mathbf 1] \}\). Indeed, the work by Winter and others [19] only allows guarded uses of the recursion operator, which rules out expressions like r from the start and which enables them to use the “naive unrolling” definition of the derivative.

For that reason, the derivative must not simply unroll recursions as they are encountered. Our definition distinguishes between left-recursive occurrences of a recursion variable, which must not be unrolled, and guarded occurrences, which can be unrolled safely. The derivative function remembers deferred unrollings in a substitution \(\sigma \) and applies them only when it is safe.

These observations lead to the signature of the definition of partial derivative in Fig. 3. Its type is

$$\begin{aligned} \partial&: (\varSigma \cup \{\varepsilon \}) \times (X \rightarrow \mathbf {R}(\varSigma , X)) \times (X \rightarrow \mathbb {B}) \times \mathbf {R}(\varSigma , X) \rightarrow \wp ( \mathbf {R}(\varSigma )^+) \end{aligned}$$

and we write it as \(\partial ^{\sigma ,\nu }_{\alpha }(r)\). Its arguments are a symbol or the empty word \(\alpha \in \varSigma \cup \{\varepsilon \}\) to derive by, a substitution \(\sigma : X \rightarrow \mathbf {R}(\varSigma , X)\) that maps free recursion variables to expressions (i.e., their unrollings), a nullability function \(\nu :X \rightarrow \mathbb {B}\) that maps free recursion variables to their nullability, and the expression \(r \in \mathbf {R}(\varSigma , X)\) to be derived; the result is the set of partial derivatives, given as non-empty stacks of expressions.

Let’s examine how the revised definition guarantees well-definedness. Example 20 demonstrates that left recursion is the cause for non-termination of the naive definition. The problem is that the naive definition indiscriminately substitutes all occurrences of x by its unfolding and propagates the derivative into the unfolding. However, this substitution is only safe in guarded positions (i.e., behind at least one terminal symbol in the unfolding). To avoid substitution in unguarded positions, the definition in Fig. 3 reifies this substitution as an additional argument \(\sigma \) and takes care to only apply it in guarded positions.

To make these deferred unfoldings accessible, the derived symbol \(\alpha \) ranges over \(\varSigma \cup \{\varepsilon \}\) in Fig. 3: for \(\alpha =\varepsilon \), the derivative function unfolds one step of left recursion.

Example 21

Recall \(r = \mu x. \mathbf 1+ x \cdot a\) from Example 20. Observe that \(\mathcal {N}(r) \emptyset = \mathcal {N}(\mathbf 1+ x \cdot a) [\mathbf {ff}/x] = \mathbf {tt}\).

$$\begin{aligned} \partial ^{\emptyset ,\emptyset }_{a}(r)&= (\partial ^{[r/x],[\mathbf {tt}/x]}_{a}(\mathbf 1+ x \cdot a)) :{[\mathbf 1]} \\&= ( \{ \} \cup \partial ^{[r/x],[\mathbf {tt}/x]}_{a}(x \cdot a)) :{[\mathbf 1]} \\&= ( \{ {[\mathbf 1]}\}) :{[\mathbf 1]} \\&= \{ {[\mathbf 1, \mathbf 1]} \} \end{aligned}$$

The spontaneous derivative unfolds one level of left recursion.

$$\begin{aligned} \partial ^{\emptyset ,\emptyset }_{\varepsilon }(r)&= (\partial ^{[r/x],[\mathbf {tt}/x]}_{\varepsilon }(\mathbf 1+ x \cdot a)) :{[\mathbf 1]} \\&= ( \{ \} \cup \partial ^{[r/x],[\mathbf {tt}/x]}_{\varepsilon }(x \cdot a)) :{[\mathbf 1]} \\&= ( \{ {[r \cdot a]}\}) :{[\mathbf 1]} \\&= \{ {[r\cdot a, \mathbf 1]} \} \end{aligned}$$

Thus, the spontaneous derivative corresponds to \(\varepsilon \)-transitions of the PDA that is to be constructed.
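Because Fig. 3 is given here only by its caption, the Haskell sketch below is a reconstruction of the derivative rather than a transcription: its clauses are pinned down by Examples 20 and 21 and by the equations used in the proof of Theorem 27, while the \(\varepsilon \)-clauses for the base cases and for \(*\) follow the same pattern by analogy. It reuses Rexp, Subst, applySubst, NEnv, and nullableM from the earlier sketches; Nothing requests the spontaneous derivative and Just a derivation by the symbol a.

```haskell
import qualified Data.Map as Map

-- A partial derivative is a set (list) of non-empty stacks of closed expressions.
type Stack = [Rexp]

-- Lifted concatenation of Definition 19: attach s to the bottom (last) element.
dotBottom :: [Stack] -> Rexp -> [Stack]
dotBottom stks s = [ init stk ++ [Cat (last stk) s] | stk <- stks ]

pderivM :: Maybe Char -> Subst -> NEnv -> Rexp -> [Stack]
pderivM _        _  _  Zero    = []
pderivM _        _  _  One     = []
pderivM (Just a) _  _  (Sym b) = [ [One] | a == b ]
pderivM Nothing  _  _  (Sym _) = []
pderivM al sg nu (Alt r s)     = pderivM al sg nu r ++ pderivM al sg nu s
pderivM al sg nu (Cat r s)     =
  dotBottom (pderivM al sg nu r) (applySubst sg s)
    ++ (if nullableM nu r then pderivM al sg nu s else [])
pderivM al sg nu (Star r)      = dotBottom (pderivM al sg nu r) (applySubst sg (Star r))
pderivM (Just _) _  _  (Var _) = []                -- unguarded occurrence: never unroll
pderivM Nothing  sg _ (Var x)  = [[sg Map.! x]]    -- unfold one step of left recursion
pderivM al sg nu (Mu x r)      =
  let sg' = Map.insert x (Mu x r) sg
      nu' = Map.insert x (nullableM (Map.insert x False nu) r) nu
  in  map (++ [One]) (pderivM al sg' nu' r)        -- push the fresh empty context 1
```

With exampleR from the first sketch, pderivM (Just 'a') Map.empty Map.empty exampleR yields [[One, One]] and pderivM Nothing Map.empty Map.empty exampleR yields [[Cat exampleR (Sym 'a'), One]], matching the two computations of Example 21.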

7 Correctness

To argue about the correctness of our derivative operation, we define the membership of a word \(w\in \varSigma ^*\) in the language of an order-respecting expression \(r\in \mathbf {R}(\varSigma , X)\) under an order-closed mapping \(\sigma : X \rightarrow \mathbf {R}(\varSigma ,X)\) inductively by the judgment \(\sigma \vdash w \in r\) in Fig. 4 along with \(\sigma \vdash w \in \overline{\mathbf {r}}\) for an expression stack \(\overline{\mathbf {r}}\) and \(\sigma \vdash w \in R\) for a set of such stacks \(R \subseteq \mathbf {R}(\varSigma , X)^*\). This inductive definition mirrors the previous fixed point definition of the language of an expression.

Lemma 22

For all \(r\in \mathbf {R}(\varSigma )\) and \(w\in \varSigma ^*\): \(\emptyset \vdash w \in r\) iff \(w\in \mathcal {L}^{}(r)\).

It is straightforward to prove the following derived rule.

Fig. 4. Membership in a \(\mu \)-regular expression, a stack of expressions, and a set of stacks

Lemma 23

If \(R \subseteq S \subseteq \mathbf {R}(\varSigma , X)^*\), then \(\sigma \vdash w \in R\) implies \(\sigma \vdash w \in S\).

Lemma 24

Let \(r\in \mathbf {R}(\varSigma , X)\) be order-respecting and \(\sigma : X \rightarrow \mathbf {R}(\varSigma , X)\) be order-closed. If \(\sigma \vdash w \in r\), then \(\emptyset \vdash w \in \sigma \bullet r\).

The derivation closure \(\tilde{\partial }^{}_{a}(\overline{\mathbf {r}})\) of a non-empty closed stack of expressions is defined by the union of the partial derivatives after taking an arbitrary number of \(\varepsilon \)-steps. It is our main tool in proving the correctness of the derivative.

Definition 25

For \(a\in \varSigma \), the derivation closure \(\tilde{\partial }^{\sigma ,\nu }_{a}(r :\overline{\mathbf {r}})\) is inductively defined as the smallest set of stacks such that

  1. \(\tilde{\partial }^{\sigma ,\nu }_{a}(r :\overline{\mathbf {r}}) \supseteq \partial ^{\sigma ,\nu }_{a}(r) :\overline{\mathbf {r}}\) and

  2. \(\tilde{\partial }^{\sigma ,\nu }_{a}(r :\overline{\mathbf {r}}) \supseteq \bigcup \{ \tilde{\partial }^{\sigma ,\nu }_{a}(\overline{\mathbf {s}}:\overline{\mathbf {r}}) \mid \overline{\mathbf {s}}\in \partial ^{\sigma ,\nu }_{\varepsilon }(r) \}\).

Lemma 26

(Unfolding). Let \(r \in \mathbf {R}(\varSigma , X)\) be an order-respecting expression, \(\sigma : X \rightarrow \mathbf {R}(\varSigma , X)\) order-closed with \(\sigma (x) = \mu x.s_x\), and \(\nu : X \rightarrow \mathbb {B}\) such that \(\nu (x) = \mathcal {N}(\sigma \bullet x)\emptyset \).

$$\begin{aligned} \sigma \vdash w \in r&\Leftarrow \emptyset \vdash w \in \partial ^{\sigma ,\nu }_{\varepsilon }( r) \end{aligned}$$

Theorem 27

(Correctness). Let \(r \in \mathbf {R}(\varSigma , X)\) be an order-respecting expression, \(\sigma : X \rightarrow \mathbf {R}(\varSigma , X)\) order-closed with \(\sigma (x) = \mu x.s_x\), and \(\nu : X \rightarrow \mathbb {B}\) such that \(\nu (x) = \mathcal {N}(\sigma \bullet x)\emptyset \).

$$\begin{aligned} \sigma \vdash aw \in r&\Leftrightarrow \emptyset \vdash w \in \tilde{\partial }^{\sigma ,\nu }_{a}({[r]}) \end{aligned}$$

Proof

The direction from left to right is proved by induction on \(\sigma \vdash aw \in r\).

We demonstrate the right-to-left direction.

Suppose that \(\varDelta = \emptyset \vdash w \in \tilde{\partial }^{\sigma ,\nu }_{a}({[r]})\) and show that \(\sigma \vdash aw \in r\).

The proof is by induction on the size of the derivation of \(\varDelta \). Inversion yields that there is some \(\overline{\mathbf {r}}\in \tilde{\partial }^{\sigma ,\nu }_{a}({[r]})\) such that \(\emptyset \vdash w \in \overline{\mathbf {r}}\). Now there are two cases.

Case \(w=\varepsilon \) and \(\overline{\mathbf {r}}={[]}\) so that the empty-sequence-rule \(\emptyset \vdash \varepsilon \in {[]} \) applies. But this case cannot happen because \(\overline{\mathbf {r}}\ne {[]}\).

Case \(\emptyset \vdash vw \in {[s]}:\overline{\mathbf {r}}\) because \(\emptyset \vdash v \in s\) and \(\emptyset \vdash w \in \overline{\mathbf {r}}\).

These two cases boil down to \(w=w_1\dots w_n\), \(\overline{\mathbf {r}}= [r_1, \dots , r_n]\), for some \(n\ge 1\), and \(\emptyset \vdash w_1\dots w_n \in [r_1, \dots , r_n]\) because \(\emptyset \vdash w_i \in r_i\).

We perform an inner induction on r.

Case \(\mathbf 0\), \(\mathbf 1\), \(b\ne a\): contradictory because \(\tilde{\partial }^{\sigma ,\nu }_{a}({[r]}) = \emptyset \).

Case a: \(\tilde{\partial }^{\sigma ,\nu }_{a}({[a]}) = \{{[\mathbf 1]}\}\) so that \(w=\varepsilon \). Clearly, \(\sigma \vdash a \in a\).

Case \(r+s\): We can show that \(\tilde{\partial }^{\sigma ,\nu }_{a}(r+s) = \tilde{\partial }^{\sigma ,\nu }_{a}(r) \cup \tilde{\partial }^{\sigma ,\nu }_{a}(s)\). Assuming that \(\overline{\mathbf {r}}\in \tilde{\partial }^{\sigma ,\nu }_{a}(r)\), induction on r yields \(\sigma \vdash aw \in r\) and the \(+\)-rule yields \(\sigma \vdash aw \in r+s\). Analogously for s.

Case \(r \cdot s\): We can show that \(\tilde{\partial }^{\sigma ,\nu }_{a}(r\cdot s) = \tilde{\partial }^{\sigma ,\nu }_{a}(r) \cdot (\sigma \bullet s) \cup \{ \overline{\mathbf {s}}\mid \mathcal {N}(r)\nu , \overline{\mathbf {s}}\in \tilde{\partial }^{\sigma ,\nu }_{a}(s) \}\). There are two cases.

Subcase \(\overline{\mathbf {r}}\in \tilde{\partial }^{\sigma ,\nu }_{a}(r)\cdot (\sigma \bullet s)\). Hence, \(\overline{\mathbf {r}}= [r_1, \dots , r_n \cdot (\sigma \bullet s)]\) so that \(w= w_1\dots w_nw_{n+1}\) and \(\emptyset \vdash w_1 \in r_1\), ..., \(\emptyset \vdash w_n \in r_n\), and \(\emptyset \vdash w_{n+1} \in (\sigma \bullet s)\). Now, \(\overline{\mathbf {r}}' = [r_1,\dots , r_n] \in \tilde{\partial }^{\sigma ,\nu }_{a}(r)\) and thus \(\emptyset \vdash w_1\dots w_n \in \tilde{\partial }^{\sigma ,\nu }_{a}(r)\). By induction, \(\sigma \vdash aw_1\dots w_n \in r\). Because \(\sigma \bullet s\) is closed, we also have \(\emptyset \vdash w_{n+1} \in (\sigma \bullet s)\) and thus by Lemma 24 \(\sigma \vdash w_{n+1} \in s\). Taken together \(\sigma \vdash aw_1\dots w_nw_{n+1} \in r\cdot s\).

Subcase \(\mathcal {N}(r)\nu \) and \(\overline{\mathbf {r}}\in \partial ^{\sigma ,\nu }_{a}(s)\). Hence, \(\sigma \vdash \varepsilon \in r\), by induction \(\sigma \vdash aw\in s\), and the concatenation rule yields \(\sigma \vdash aw \in r\cdot s\).

Case \(r^*\). Because \(\overline{\mathbf {r}}\in \tilde{\partial }^{\sigma ,\nu }_{a}(r) \cdot (\sigma \bullet r^*)\), it must be that \(\overline{\mathbf {r}}= [r_1,\dots , r_n\cdot (\sigma \bullet r^*)]\) and \(w=w_1\dots w_nw_{n+1}\) so that \(\emptyset \vdash w_1 \in r_1\), ..., \(\emptyset \vdash w_n \in r_n\), and \(\emptyset \vdash w_{n+1} \in (\sigma \bullet r^*)\). Proceed as in the first subcase for concatenation.

Case \(\mu x.r\). As usual, let \(\hat{\sigma }= \sigma {[\mu x.r/x]}\) and \(\hat{\nu }= \nu {[\mathcal {N}(r)\nu [\mathbf {ff}/x]/x]}\). Again, \(\tilde{\partial }^{\sigma ,\nu }_{a}(\mu x.r) = \tilde{\partial }^{\hat{\sigma },\hat{\nu }}_{a}(r) :{[\mathbf 1]}\). Hence, \(\overline{\mathbf {r}}= \overline{\mathbf {r}}' :{[\mathbf 1]}\) for some \(\overline{\mathbf {r}}' \in \tilde{\partial }^{\hat{\sigma },\hat{\nu }}_{a}(r)\) such that \(\emptyset \vdash w \in \overline{\mathbf {r}}'\). Induction yields that \(\hat{\sigma }\vdash aw \in r\) and application of the \(\mu \)-rule yields \(\sigma \vdash aw \in \mu x.r\).

Case x. Then \(\tilde{\partial }^{\hat{\sigma },\hat{\nu }}_{a}(x) = \tilde{\partial }^{\sigma ,\nu }_{a}(\mu x.r)\) if \(\hat{\sigma }= \sigma {[\mu x.r/x]}\) and \(\hat{\nu }= \nu {[\mathcal {N}(r)\nu [\mathbf {ff}/x]/x]}\).

Now \(\emptyset \vdash w \in \tilde{\partial }^{\hat{\sigma },\hat{\nu }}_{a}(x)\) iff exists \(\overline{\mathbf {r}}\in \tilde{\partial }^{\hat{\sigma },\hat{\nu }}_{a}(x) = \tilde{\partial }^{\sigma ,\nu }_{a}(\mu x.r)\) such that \(\emptyset \vdash w \in \overline{\mathbf {r}}\). But \(\tilde{\partial }^{\sigma ,\nu }_{a}(\mu x.r) = \tilde{\partial }^{\hat{\sigma },\hat{\nu }}_{a}(r) :{[\mathbf 1]}\) so that \(\overline{\mathbf {r}}= \overline{\mathbf {r}}':{[\mathbf 1]}\) and \(\emptyset \vdash w \in \overline{\mathbf {r}}'\) with a smaller derivation tree. Thus, induction yields that \(\hat{\sigma }\vdash aw \in r\), application of the \(\mu \)-rule yields \(\sigma \vdash aw \in \mu x.r\), and application of the variable rule yields \(\hat{\sigma }\vdash aw \in x\), as desired.    \(\square \)

8 Finiteness

In analogy to Antimirov’s finite automaton construction, we aim to use the set of iterated derivatives as a building block for a pushdown automaton. In our construction, derivatives end up as pushdown symbols rather than states: the top of the pushdown plays the role of the state. It remains to prove that this set is finite to obtain a proper PDA.

Our finiteness argument is based on an analysis of the syntactic form of the derivatives. It turns out that a derivative is, roughly, the concatenation of a strictly ascending sequence of certain subexpressions of the initial expression, with respect to an ordering on subexpression occurrences defined below. As this ordering lives on a finite set, we obtain a finite bound on the syntactically possible derivatives.

We start with an analysis of the output of \(\partial ^{\sigma ,\nu }_{\alpha }(r)\). The elements in the stack of a partial derivative are vectors of the form \(((h\cdot s_1)\cdot s_2)\cdots s_k\) that we abbreviate \(h\cdot \vec {s}\), where the \(s_i\) are arbitrary expressions and h is either \(\mathbf 1\) or \({\mu x.r}\) where \(\mu x.r\) is closed.

It turns out that the vectors produced by derivation are always strictly ascending chains in the subterm ordering of the original expression, say t. We first define this ordering, then we define the structure of these vectors in Definition 31.

Definition 28

Let \(r \in \mathbf {R}(\varSigma )\) be a closed expression. We define the addressing function \(A_r : \mathbb {N}^* \hookrightarrow \mathbf {R}(\varSigma , X)\) by induction on r.

$$\begin{aligned} A_\mathbf 0&= \{ (\varepsilon , \mathbf 0) \}&A_{r+s}&= \{ (\varepsilon , r+s) \} \cup 1.A_r \cup 2.A_s \\ A_\mathbf 1&= \{ (\varepsilon , \mathbf 1) \}&A_{r\cdot s}&= \{ (\varepsilon , r\cdot s) \} \cup 1.A_r \cup 2.A_s\\ A_a&= \{ (\varepsilon ,a) \}&A_{r^*}&= \{ (\varepsilon , r^*) \} \cup 1.A_r\\ A_{x}&= \{ (\varepsilon , x) \}&A_{\mu x.r}&= \{ (\varepsilon , \mu x.r) \} \cup 1.A_r \end{aligned}$$

Here \(i.A\) modifies the function \(A\) by prepending i to each element of \(A\)’s domain:

$$\begin{aligned} (i.A) (w)&= {\left\{ \begin{array}{ll} A(w') &{} \text {if } w = i w' \text { and } A(w') \text { is defined} \\ \text {undefined} &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

It is well known that \({\textit{dom}}(A_r)\) is prefix-closed and assigns a unique \(w\in \mathbb {N}^*\) to each occurrence of a subexpression in r. Let \(r_1 = A_r (w_1)\) and \(r_2 = A_r (w_2)\) be subexpression occurrences of r. We say that \(r_1\) occurs before \(r_2\) in r if \(w_1 \preceq w_2\) in the lexicographic order on \(\mathbb {N}^*\):

$$\begin{aligned} w_1 \preceq w_2 \quad \text { iff }\quad w_1 \text { is a prefix of } w_2, \text { or } w_1 = u\,i\,v_1 \text { and } w_2 = u\,j\,v_2 \text { with } i < j. \end{aligned}$$

We write \(w_1 \prec w_2\) if \(w_1 \preceq w_2\) and \(w_1\ne w_2\), in which case we say that \(r_1\) occurs strictly before \(r_2\).

Lemma 29

For each closed expression \(r\in \mathbf {R}(\varSigma )\), the strict lexicographic ordering \(\prec \) on \({\textit{dom}}(A_r)\) has no infinite chains.

Definition 30

Let \(t \in \mathbf {R}(\varSigma )\) be a closed expression such that each variable occurring in t is bound exactly once. The unfolding substitution \(\sigma _t\) is defined by induction on t.

$$\begin{aligned} \sigma _\mathbf 0&= []&\sigma _{r+s}&= \sigma _r \cup \sigma _s \\ \sigma _\mathbf 1&= []&\sigma _{r\cdot s}&= \sigma _r \cup \sigma _s\\ \sigma _a&= []&\sigma _{r^*}&= \sigma _r \\ \sigma _x&= []&\sigma _{\mu x.r}&= [\mu x.r / x] \cup \sigma _r \end{aligned}$$

Definition 31

A vector \(\vec {s} = (s_1\cdot s_2)\cdots s_k\) is t-sorted if for all \(1\le i < j \le k\): \(s_i\) and \(s_j\) are subexpressions of t and \(s_i\) occurs strictly before \(s_j\), which means that there are \(w_1, \dots , w_k \in \mathbb {N}^*\) such that \(s_i = {A_t (w_i)}\) and \(w_i \prec w_{i+1}\), for \(1\le i < k\).

For a t-sorted vector \(\vec {s} = (s_1\cdot s_2)\cdots s_k\) define two forms of expressions:

  • top: \(\sigma _t \bullet (\mathbf 1\cdot \vec {s})\).

  • rec: \(\sigma _t \bullet (( {\mu x.s})\cdot \vec {s})\) where \(\mu x.s\) is a subexpression of t and either \(\mu x.s\) or an occurrence of x is strictly before \(s_i\), for all \(1\le i \le k\).

A stack \(\overline{\mathbf {r}}= [r_1, \dots , r_n]\) (for \(n\ge 1\)) has form \(\mathbf {top}^+\) if \(r_1, \dots , r_n\) have form \(\mathbf {top}\).

A stack \(\overline{\mathbf {r}}= [r_1, \dots , r_n]\) (for \(n\ge 1\)) has form \(\mathbf {rec}.\mathbf {top}^*\) if \(r_1\) has form \(\mathbf {rec}\) and \(r_2, \dots , r_n\) have form \(\mathbf {top}\).

Next, we show that the derivatives by a symbol and the spontaneous derivatives of subexpressions of a closed expression t indeed have the forms \(\mathbf {top}^+\) and \(\mathbf {rec}.\mathbf {top}^*\), respectively.

Lemma 32

(Classification of derivatives). Suppose that \(t \in \mathbf {R}(\varSigma )\) is a closed expression, \(r \in \mathbf {R}(\varSigma ,X)\) is a subexpression of t, \(\sigma : X \rightarrow \mathbf {R}(\varSigma ,X)\) is order-closed with \(\sigma (x) = \mu x.s\) (for \(x\in X\) and \(\mu x.s\) a subterm of t), and \(\nu : X \rightarrow \mathbb {B}\) such that \(\nu (x) = \mathcal {N}(\sigma \bullet x)\emptyset \). If \(\overline{\mathbf {r}}= [r_1, \dots , r_n] \in \partial ^{\sigma ,\nu }_{a}(r)\), then \(n\ge 1\) and \(\overline{\mathbf {r}}\) has form \(\mathbf {top}^+\) and each \(r_i = h_i \cdot \vec {s_i}\) for some t-sorted \(\vec {s_i}\) which is before r.

Lemma 33

(Classification of spontaneous derivatives). Suppose that \(t \in \mathbf {R}(\varSigma )\) is a closed expression, \(r \in \mathbf {R}(\varSigma ,X)\) is a subexpression of t, \(\sigma : X \rightarrow \mathbf {R}(\varSigma ,X)\) is order-closed with \(\sigma (x) = \mu x.s\) (for \(x\in X\) and \(\mu x.s\) a subterm of t), and \(\nu : X \rightarrow \mathbb {B}\) such that \(\nu (x) = \mathcal {N}(\sigma \bullet x)\emptyset \). If \(\overline{\mathbf {r}}= [r_1, \dots , r_n] \in \partial ^{\sigma ,\nu }_{\varepsilon }(r)\), then \(n\ge 1\) and \(\overline{\mathbf {r}}\) has form \(\mathbf {rec}.\mathbf {top}^*\) and each \(r_i = h_i \cdot \vec {s_i}\) for some t-sorted \(\vec {s_i}\) which is before r.

Lemma 34

(Classification of derivatives of vectors). Let \(t \in \mathbf {R}(\varSigma )\) be a closed expression and \(t_0\) be a closed expression of form top or form rec with respect to t. Then the elements of \(\partial ^{\emptyset ,\emptyset }_{a}(t_0)\) are stacks of the form \(\mathbf {top}^+\) as in Lemma 32 and the elements of \(\partial ^{\emptyset ,\emptyset }_{\varepsilon }(t_0)\) are stacks of the form \(\mathbf {rec}.\mathbf {top}^*\).

We define the set of iterated partial derivatives as the expressions that may show up in the stack of a partial derivative. This set will serve as the basis for defining the set of pushdown symbols of a PDA.

Definition 35

(Iterated Partial Derivatives). Let \(t \in \mathbf {R}(\varSigma )\) be a closed expression. Define \(\varDelta (t)\), the set of iterated partial derivatives of t, as the smallest set such that

  • \(\mathbf 1\cdot t \in \varDelta (t)\);

  • if \(r \in \varDelta (t)\) and \([t_1,\dots , t_n] \in \partial ^{\emptyset ,\emptyset }_{a}(r)\), then \(t_j \in \varDelta (t)\), for all \(1\le j\le n\); and

  • if \(r \in \varDelta (t)\) and \([t_1,\dots , t_n] \in \partial ^{\emptyset ,\emptyset }_{\varepsilon }(r)\), then \(t_j \in \varDelta (t)\), for all \(1\le j\le n\).

Lemma 36

(Closure). Let \(t \in \mathbf {R}(\varSigma )\) be a closed expression. Then all elements of \(\varDelta (t)\) either have form top or rec with respect to t.

Proof

Follows from Lemmas 32, 33, and 34.

Lemma 37

(Finiteness). Let \(t \in \mathbf {R}(\varSigma )\) be closed. Then \(\varDelta (t)\) is finite.

Proof

By construction, the elements of \(\varDelta (t)\) are all closed and have either form top or form rec, that is, they are of the form \(\sigma _t \bullet ( h\cdot \vec {s})\) where \(\vec {s}\) is t-sorted. As t is a finite expression and the components of a t-sorted vector occur at strictly \(\prec \)-increasing addresses, there are only finitely many candidates for \(\vec {s}\) (by Lemma 29).

The head h of the vector is either \(\mathbf 1\) or it is a subexpression of t of the form \(\mu x.s_x\). Hence, there are only finitely many choices for h.

Thus \(\varDelta (t)\) is a subset of a finite set and hence finite.    \(\square \)

9 Automaton Construction

Given that the derivative for a closed \(\mu \)-regular expression gives rise to a finite set of iterated partial derivatives, we use that set as the pushdown alphabet to construct a nondeterministic pushdown automaton that recognizes the same language. This construction is straightforward because its transition relation corresponds exactly to the derivative and the spontaneous derivative function; a small executable sketch follows Definition 38.

Definition 38

Suppose that \(t \in \mathbf {R}(\varSigma )\) is closed. Define the PDA \(\mathcal {U}\mathcal {A}(t) = (Q, \varSigma , \varGamma , \delta , q_0, Z_0)\) by a singleton set \(Q = \{ q \}\), \(\varGamma = \varDelta (t)\), \(q_0 = q\), \(Z_0 = \mathbf 1\cdot t\), and \(\delta \subseteq Q \times (\varSigma \cup \{\varepsilon \})\times \varGamma \times Q \times \varGamma ^*\) as the smallest relation such that

  • \((q, a, s, q, \overline{\mathbf {s}}) \in \delta \) if \(\overline{\mathbf {s}}\in \partial ^{\emptyset ,\emptyset }_{a}(s)\), for all \(s \in \varGamma \), \(\overline{\mathbf {s}}\in \varGamma ^*\), \(a\in \varSigma \);

  • \((q, \varepsilon , s, q, {\overline{\mathbf {s}}}) \in \delta \) if \( \overline{\mathbf {s}}\in \partial ^{\emptyset ,\emptyset }_{\varepsilon }(s) \), for all \(s \in \varGamma \), \(\overline{\mathbf {s}}\in \varGamma ^*\);

  • \((q, \varepsilon , s, q, \varepsilon )\), for all \(s\in \varGamma \) with \(\mathcal {N}(s)\emptyset \).
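To close the circle, the sketch below (ours, reusing Rexp, nullableM, and pderivM from the earlier listings, with the input alphabet passed as a list sigma) computes the pushdown alphabet \(\varGamma = \varDelta (t)\) by a worklist closure and enumerates the transitions of Definition 38; Lemma 37 guarantees that the worklist terminates.

```haskell
import Data.List (nub, (\\))
import qualified Data.Map as Map

-- The initial pushdown symbol Z0 = 1 . t.
startSymbol :: Rexp -> Rexp
startSymbol t = Cat One t

-- Gamma = Delta(t) of Definition 35: close {Z0} under the elements of all
-- symbol and epsilon derivatives (finite by Lemma 37).
iteratedDerivatives :: [Char] -> Rexp -> [Rexp]
iteratedDerivatives sigma t = go [z0] [z0]
  where
    z0 = startSymbol t
    go acc []       = acc
    go acc (r : ws) =
      let stacks = concat [ pderivM al Map.empty Map.empty r
                          | al <- Nothing : map Just sigma ]
          new    = nub (concat stacks) \\ acc
      in  go (acc ++ new) (ws ++ new)

-- Transitions of Definition 38, written as (input, top symbol, replacement);
-- Nothing stands for an epsilon move.  The single state q is left implicit.
transitions :: [Char] -> Rexp -> [(Maybe Char, Rexp, [Rexp])]
transitions sigma t =
  [ (al, s, stk) | s   <- gamma
                 , al  <- Nothing : map Just sigma
                 , stk <- pderivM al Map.empty Map.empty s ]
  ++ [ (Nothing, s, []) | s <- gamma, nullableM Map.empty s ]
  where
    gamma = iteratedDerivatives sigma t
```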

Theorem 39

(Automaton correctness). For all closed expressions \(t\in \mathbf {R}(\varSigma )\), \( \mathcal {L}^{}(t) = \mathcal {L}^{}(\mathcal {U}\mathcal {A}(t))\).

Proof

Let \(\mathcal {U}\mathcal {A}(t) = (Q, \varSigma , \varGamma , \delta , q_0, Z_0)\). We prove a generalized statement from which the original statement follows trivially: for all \(\overline{\mathbf {r}}\in \varDelta (t)^*\), \(\emptyset \vdash w \in \overline{\mathbf {r}}\) iff \((q, \overline{\mathbf {r}}, w) \vdash ^* (q, \varepsilon , \varepsilon )\). The proof in the left-to-right direction is by induction on the derivation of \(\emptyset \vdash w \in \overline{\mathbf {r}}\).

Case \(\emptyset \vdash \varepsilon \in {[]}\). Immediate.

Case \(\emptyset \vdash w \in \overline{\mathbf {r}}\) because \(w=w_1w_2\), \(\overline{\mathbf {r}}= {[r]}:\overline{\mathbf {r}}'\), \(\emptyset \vdash w_1 \in r\), and \(\emptyset \vdash w_2 \in \overline{\mathbf {r}}'\). By induction, we find that \((q, {[r]}, w_1) \vdash ^* (q, {[]}, \varepsilon )\). By a standard argument that means \((q, {[r]}:\overline{\mathbf {r}}', w_1w_2) \vdash ^* (q, \overline{\mathbf {r}}', w_2)\). By the second inductive hypothesis, we find that \( (q, \overline{\mathbf {r}}', w_2) \vdash ^* (q, {[]}, \varepsilon )\). Taken together, we obtain the desired result.

Now we consider the derivation of \(\emptyset \vdash w \in r\) by performing a case analysis on w and using Theorem 27.

Case \(\varepsilon \). In this case, \(\emptyset \vdash \varepsilon \in r\) iff \(\mathcal {N}(r)\emptyset \) iff \((q, \varepsilon , r, q, \varepsilon ) \in \delta \) so that \((q, {[r]}, \varepsilon ) \vdash ^+ (q, \varepsilon , \varepsilon )\).

Case aw. In this case \(\emptyset \vdash aw \in r\). By Theorem 27, \(\emptyset \vdash aw \in r\) is equivalent to \(\emptyset \vdash w \in \tilde{\partial }^{\emptyset ,\emptyset }_{a}({[r]})\) and we perform a subsidiary induction on its definition. That is, either \(\exists \overline{\mathbf {s}}\in \partial ^{\emptyset ,\emptyset }_{a}(r)\) such that \(\emptyset \vdash w \in \overline{\mathbf {s}}\). In that case, \(\mathcal {U}\mathcal {A}(t)\) has a transition \((q, r, aw) \vdash (q, \overline{\mathbf {s}}, w)\) by definition of \(\delta \). By induction we know that \((q, \overline{\mathbf {s}}, w) \vdash ^+ (q, \varepsilon , \varepsilon )\).

Alternatively, \(\exists \overline{\mathbf {s}}\in \partial ^{\emptyset ,\emptyset }_{\varepsilon }(r)\) such that \(\emptyset \vdash aw \in \overline{\mathbf {s}}\). In this case, \((q, r, aw) \vdash (q, \overline{\mathbf {s}}, aw)\) is a transition and by induction we have \((q, \overline{\mathbf {s}}, aw) \vdash ^+ (q, \varepsilon , \varepsilon )\).

Right-to-left direction. By induction on the length of \((q, \overline{\mathbf {r}}, w) \vdash ^* (q, {[]}, \varepsilon )\).

Case length 0: it must be \(\overline{\mathbf {r}}={[]}\) and \(w=\varepsilon \). Obviously, \(\emptyset \vdash \varepsilon \in {[]}\).

Case length \(>0\): Thus the first configuration must have the form \((q, {[s]}:\overline{\mathbf {r}}, w)\). There are three possibilities.

Subcase \((q, {[s]}:\overline{\mathbf {r}}, w) \vdash (q, \overline{\mathbf {s}}:\overline{\mathbf {r}}, w')\) if \(w=aw'\) and \(\overline{\mathbf {s}}\in \partial ^{}_{a}(s)\). We split the run of the automaton at the point where \(\overline{\mathbf {s}}\) is first consumed: let \(w' = w_1w_2\) such that \((q, \overline{\mathbf {s}}:\overline{\mathbf {r}}, w_1w_2) \vdash ^* (q, \overline{\mathbf {r}}, w_2) \vdash ^* (q, {[]}, \varepsilon )\). Hence, there is also a shorter run on \(w_1\): \((q, \overline{\mathbf {s}}, w_1) \vdash ^* (q, {[]}, \varepsilon )\). Induction yields \(\emptyset \vdash w_1 \in \overline{\mathbf {s}}\). By Theorem 27, we also have a derivation \(\emptyset \vdash aw_1 \in s\). By induction on the \(\overline{\mathbf {r}}\) run, we obtain \(\emptyset \vdash w_2 \in \overline{\mathbf {r}}\) and applying the stack rule yields \(\emptyset \vdash aw_1w_2 \in {[s]}:\overline{\mathbf {r}}\) or in other words \(\emptyset \vdash w \in {[s]}:\overline{\mathbf {r}}\).

Subcase \((q, {[s]}:\overline{\mathbf {r}}, w) \vdash (q, \overline{\mathbf {s}}:\overline{\mathbf {r}}, w)\) if \(\overline{\mathbf {s}}\in \partial ^{}_{\varepsilon }(s)\). We split the run of the automaton at the point where \(\overline{\mathbf {s}}\) is first consumed: let \(w = w_1w_2\) such that \((q, \overline{\mathbf {s}}:\overline{\mathbf {r}}, w_1w_2) \vdash ^* (q, \overline{\mathbf {r}}, w_2) \vdash ^* (q, {[]}, \varepsilon )\). Hence there is also a shorter run on \(w_1\): \((q, \overline{\mathbf {s}}, w_1) \vdash ^* (q, {[]}, \varepsilon )\). By induction, we have a derivation \(\emptyset \vdash w_1 \in \overline{\mathbf {s}}\), which yields \(\emptyset \vdash w_1\in s\) by Lemma 26, and a derivation \(\emptyset \vdash w_2 \in \overline{\mathbf {r}}\), which we can combine to \(\emptyset \vdash w_1w_2 \in {[s]}:\overline{\mathbf {r}}\) as desired.

Subcase \((q, {[s]}:\overline{\mathbf {r}}, w) \vdash (q, \overline{\mathbf {r}}, w)\) if \(\mathcal {N}(s)\emptyset \). By induction, \(\emptyset \vdash w \in \overline{\mathbf {r}}\). As \(\mathcal {N}(s)\emptyset \), it must be that \(\emptyset \vdash \varepsilon \in s\). Hence, \(\emptyset \vdash w\in {[s]}:\overline{\mathbf {r}}\).    \(\square \)

If all recursion operators in an expression t are guarded, in the sense that they consume some input before entering a recursive call, then all \(\varepsilon \)-transitions in the constructed automaton pop the stack. In fact, when restricting to guarded expressions, the spontaneous derivative function is not needed at all, which explains the simplicity of the derivative in the work of Winter and coworkers [19].