1 Introduction

Kleene Algebra (KA) [10, 18] provides an algebraic perspective on the equivalence of regular expressions. It is the foundation for Kleene Algebra with Tests (KAT) [9, 19, 20, 24], which has been applied to reason about equivalence of programs in general [22, 25], and programming languages such as NetKAT [1, 34].

Central to Kleene Algebra and its extensions is the completeness property, which says that every equivalence valid in the language model can be proved using the laws of KA. Salomaa showed an important precursor to this result [32], and other authors [6, 10, 13, 26] have studied alternative axiomatizations.

The axiomatization most commonly used today is due to Kozen [18], and has the advantage of being algebraic, i.e., it allows one to define a “Kleene algebra” as a model that may verify or falsify equations. A number of alternative proofs of the same result have been proposed [12, 14, 21, 23]; notably, it was shown that one of the quasi-equations can be dropped from Kozen’s axioms [12, 23].

Another phenomenon of interest is the finite model property (FMP) [5]. For KA, the FMP states that any invalid equivalence is witnessed by some finite Kleene algebra where it does not hold—contrapositively, equivalences valid in any finite Kleene algebra are also valid in any (possibly infinite) Kleene algebra.

Palka [29] showed that the FMP is a consequence of completeness for KA, and moreover that completeness can be recovered if one assumes the FMP. This equivalence raises a question: can we provide an elementary proof of the FMP for KA, i.e., one that does not rely on completeness? Indeed, Palka writes that “an independent proof of [the FMP] would provide a quite different proof of the Kozen completeness theorem, based on purely logical tools” [29].

Our main contribution is a positive answer to this question, providing a proof of the FMP for KA. More specifically, our argument weaves together considerations from Palka’s proof as well as classical facts from automata theory, in such a way that both the FMP and completeness can be concluded.

In contrast with earlier completeness proofs, our method does not center on minimality [18], bisimilarity [14, 23] or the construction of a cyclic proof system [12]. Instead, we rely purely on the fact that KA allows one to find least solutions to linear systems [3, 10, 19], or in our case, to automata. The arguments towards our main result exploit this property in concert with various ideas around automata, such as the transition monoid [27], and Antimirov’s construction [2], eventually building a particular finite Kleene algebra with sufficient structure to conclude both completeness and the finite model property.

The remainder of this paper is organized as follows. Section 2 provides an overview of the context, and defines fundamental notions. Section 3 recalls the notion of solutions to an automaton, a technique that will be leveraged repeatedly. Sections 4 and 5 provide an algebraic perspective on transformation automata [27], and Antimirov’s construction [2] respectively. Section 6 shows how to construct a particularly useful Kleene algebra, and Sect. 7 shows how to conclude completeness and the FMP using the notions discussed up to that point. Section 8 concludes with some discussion and suggestions for future work.

To save space, proofs of auxiliary facts appear in the full version [15]. Our formalization of the proofs in Coq is available online [16].

2 Overview

Our primary objects of study are Kleene algebras. The equations that hold in a Kleene algebra correspond well to properties expected of program composition, which makes them a suitable semantic domain for programs.

Definition 2.1

A (weak) Kleene algebra (KA) is a tuple \((K, +, \cdot , {}^*, 0, 1)\), where K is a set (the carrier), \(+\) and \(\cdot \) are binary operators on K, \({}^*\) is a unary operator on K, and \(0, 1 \in K\) are constants, satisfying the following for all \(x, y, z \in K\):

figure a

Here, we use \(\le \) to denote the natural order induced by \(+\), that is, \(x \le y\) if and only if \(x + y = y\); it is straightforward to verify that this makes \(\le \) a partial order on K, and that all operators are monotone w.r.t. this order.
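In a bit more detail, the verification that \(\le \) is a partial order is a short calculation using only idempotence, commutativity, and associativity of \(+\):

```latex
\begin{align*}
\text{reflexivity:}  \quad & x + x = x, \text{ hence } x \le x \\
\text{antisymmetry:} \quad & x \le y \text{ and } y \le x \text{ give }
                             y = x + y = y + x = x \\
\text{transitivity:} \quad & x \le y \text{ and } y \le z \text{ give }
                             x + z = x + (y + z) = (x + y) + z = y + z = z
\end{align*}
```

Monotonicity of the operators follows similarly from distributivity and the fixpoint laws.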

We often denote a generic KA \((K, +, \cdot , {}^*, 0, 1)\) by its carrier K, and simply write \(+\), \(\cdot \), etc. for the operators and constants when there is no risk of ambiguity.

Typically, the additive operator \(+\) is used to implement nondeterministic composition, the multiplicative operator \(\cdot \) corresponds to sequential composition, the Kleene star operator \({}^*\) implements iteration, 0 represents a program that fails immediately, and 1 is the program that does nothing and terminates successfully. The equations of KA correspond well to what might be expected of such operators on programs; for instance, iteration is characterized as a least fixpoint.

One very natural instance of Kleene algebras, which we will connect to the interpretation of programs shortly, is given by the relational model.

Example 2.2

(KA of relations). Let X be a set. The set of relations on X, i.e., \(\mathcal {P}(X \times X)\), can be equipped with a KA \(\mathcal {R}_X = (\mathcal {P}(X \times X), \cup , \circ , {}^*, \emptyset , \textsf{id}_X)\), in which \(\circ \) is relational composition; \({}^*\) is the reflexive-transitive closure operator on relations; and \(\textsf{id}_X\) is the diagonal or identity relation on X given by \(\{ (x, x) : x \in X \}\).

When interpreting programs in \(\mathcal {R}_X\), we think of the relations on X as a way of representing how a program may transform the machine states represented by X. To make this more precise, we need a syntax and semantics for programs.

Definition 2.3

(Expressions). We fix a set of actions \(\varSigma = \{ \texttt{a}, \texttt{b}, \texttt{c}, \dots \}\) called the alphabet. The set of regular expressions \(\mathbb {E}\) is given by

$$ e, f \,{::=}\, 0 \mid 1 \mid \texttt{a} \in \varSigma \mid e + f \mid e \cdot f \mid e^* $$

Given a KA K and a function \(h: \varSigma \rightarrow K\), we can define \(\widehat{h}: \mathbb {E} \rightarrow K\) inductively:

$$\begin{aligned} \widehat{h}(0)&= 0&\widehat{h}(\texttt{a})&= h(\texttt{a})&\widehat{h}(e \cdot f)&= \widehat{h}(e) \cdot \widehat{h}(f) \\ \widehat{h}(1)&= 1&\widehat{h}(e + f)&= \widehat{h}(e) + \widehat{h}(f)&\widehat{h}(e^*)&= \widehat{h}(e)^* \end{aligned}$$
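To make the relational model and the lifted homomorphism concrete, here is a small Python sketch (the AST encoding and function names are ours, not part of the formal development): relations on a finite X are sets of pairs, and \({}^*\) is computed as a reflexive-transitive closure.

```python
def compose(r, s):
    """Relational composition r ; s (first r, then s)."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def star(r, xs):
    """Reflexive-transitive closure of r on the state space xs."""
    out = {(x, x) for x in xs}            # start from the unit id_X
    while True:
        nxt = out | compose(out, r)
        if nxt == out:
            return out
        out = nxt

def interp(e, h, xs):
    """The lifted homomorphism: interpret a regex AST in the KA R_X."""
    op = e[0]
    if op == '0':
        return set()
    if op == '1':
        return {(x, x) for x in xs}
    if op == 'sym':
        return h[e[1]]
    if op == '+':
        return interp(e[1], h, xs) | interp(e[2], h, xs)
    if op == '.':
        return compose(interp(e[1], h, xs), interp(e[2], h, xs))
    return star(interp(e[1], h, xs), xs)  # op == '*'
```

For instance, interpreting \(\texttt{a}^*\) over \(X = \{0, 1, 2\}\) with \(h(\texttt{a}) = \{(0,1), (1,2)\}\) yields the reflexive-transitive closure, and instances of the KA laws can be checked pointwise.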

Example 2.4

Consider a programming language with integer variables \(\textsf{Var} = \{ \texttt{x}, \texttt{y}, \dots \}\), and statements \(\varSigma \) comprised of (for all \(\texttt{x}, \texttt{y} \in \textsf{Var}\), \(n \in \mathbb {N}\) and \(v \in \textsf{Var} \cup \mathbb {N}\)) assignments \(\texttt{x} \leftarrow n\), increments \(\texttt{x} \leftarrow \texttt{x} + v\), and comparisons \(\texttt{x} < \texttt{y}, \texttt{x} \ge \texttt{y}\).

The state of the machine is defined by the value of each variable, and so we choose \(S = \{ \sigma : \textsf{Var} \rightarrow \mathbb {N} \}\) as the state space. The semantics of the actions are relations that represent their effect, i.e., we define \(h: \varSigma \rightarrow \mathcal {P}(S \times S)\) by

$$\begin{aligned} h(\texttt{x} \leftarrow n)&= \{ (\sigma , \sigma [n/\texttt{x}]) : \sigma \in S \} \\ h(\texttt{x} \leftarrow \texttt{x} + n)&= \{ (\sigma , \sigma [\sigma (\texttt{x}) + n/\texttt{x}]) : \sigma \in S \} \\ h(\texttt{x} \leftarrow \texttt{x} + \texttt{y})&= \{ (\sigma , \sigma [\sigma (\texttt{x}) + \sigma (\texttt{y})/\texttt{x}]) : \sigma \in S \} \\ h(\texttt{x}< \texttt{y})&= \{ (\sigma , \sigma ) : \sigma \in S, \sigma (\texttt{x}) < \sigma (\texttt{y}) \} \\ h(\texttt{x} \ge \texttt{y})&= \{ (\sigma , \sigma ) : \sigma \in S, \sigma (\texttt{x}) \ge \sigma (\texttt{y}) \} \end{aligned}$$

Here, \(\sigma [n/\texttt{x}]\) denotes the function that assigns n to \(\texttt{x}\), and \(\sigma (\texttt{y})\) to all \(\texttt{y} \ne \texttt{x}\).

This gives us a semantics \(\widehat{h}: \mathbb {E} \rightarrow \mathcal {P}(S \times S)\) for regular expressions over \(\varSigma \), and allows us to express and interpret programs like the following:

$$ \texttt{x} \leftarrow 1 \;\cdot \; \texttt{y} \leftarrow 0 \;\cdot \; \texttt{i} \leftarrow 0 \;\cdot \; (\texttt{i} < \texttt{n} \;\cdot \; \texttt{y} \leftarrow \texttt{y} + \texttt{x} \; \cdot \; \texttt{x} \leftarrow \texttt{x} + 2 \;\cdot \; \texttt{i} \leftarrow \texttt{i} + 1)^* \;\cdot \; (\texttt{i} \ge \texttt{n}) $$

which will compute the square of \(\texttt{n}\) and store it in \(\texttt{y}\).
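Since each action above is interpreted as a deterministic state transformation, the behavior of this expression can be mirrored by an imperative sketch (ours), which makes the invariant visible: \(\texttt{y}\) accumulates the successive odd numbers held by \(\texttt{x}\).

```python
def square_via_odds(n):
    """Mirror the program: after i iterations, x = 2i + 1 and y = i**2,
    so on exit (when i >= n) we have y = n**2."""
    x, y, i = 1, 0, 0                 # x <- 1 ; y <- 0 ; i <- 0
    while i < n:                      # guard i < n
        y = y + x                     # y <- y + x
        x = x + 2                     # x <- x + 2
        i = i + 1                     # i <- i + 1
    return y                          # the guard i >= n holds here
```

This is just \(\texttt{y} = 1 + 3 + \cdots + (2\texttt{n} - 1) = \texttt{n}^2\).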

Of course, one can build more involved programming languages based on KA; one elaborate and well-studied instance is NetKAT [1], a programming language for specifying and reasoning about software-defined networks.

Let \(e, f \in \mathbb {E}\). When \(\widehat{h}(e) = \widehat{h}(f)\) for all \(h: \varSigma \rightarrow K\), we write \(K \models e = f\). If \(\mathfrak {C}\) is a class of KAs and \(K \models e = f\) for each KA K in \(\mathfrak {C}\), then we write \(\mathfrak {C} \models e = f\). We use \(\equiv \) for the smallest congruence on \(\mathbb {E}\) that satisfies the axioms of KA, and use \(e \leqq f\) as shorthand for \(e + f \equiv f\). One can show that all operators are monotone w.r.t. the preorder \(\leqq \), and that \(e \leqq f\) and \(f \leqq e\) together imply \(e \equiv f\).

We use \([\phi ]\) for some logical condition \(\phi \) to denote \(1 \in \mathbb {E}\) when \(\phi \) holds, and \(0 \in \mathbb {E}\) otherwise. We also use the familiar \(\sum \) notation to generalize \(+\). The empty sum is defined to be 0, the unit of \(+\). Note that the sum notation is well-defined up to \(\equiv \), because \(+\) is associative, commutative and idempotent in any KA.

The following is a standard fact of universal algebra—see, e.g., [8].

Lemma 2.5

Let \(e, f \in \mathbb {E}\). We have \(e \equiv f\) iff \(K \models e = f\) for all KAs K.

Given that KA provides such a suitable semantic domain, can we characterize the equational theory of KA, i.e., the equations valid in all models (programming languages) captured by \(\equiv \), as the equations that hold for a particular model or class of models? Conversely, can we guarantee any properties of countermodels (pathological programming languages) that witness invalid equations?

Kozen [19] answered these questions by showing that the valid equations of KA are characterized by the language model. Intuitively, this model assigns to each expression the set of possible sequences of actions that may be executed by the program it represents. We will now make this more precise.

A word is a finite sequence of actions \(\texttt{a}_1\cdots {}\texttt{a}_n\); the empty word (with no letters) is denoted \(\epsilon \). We write \(\varSigma ^*\) for the set of words, and denote its elements by \(w, x, y, \dots \). We can concatenate words by juxtaposing them, i.e., if \(w = \texttt{a}_1\cdots {}\texttt{a}_n\) and \(x = \texttt{b}_1\cdots {}\texttt{b}_m\), then wx is the word given by \(\texttt{a}_1\cdots {}\texttt{a}_n{}\texttt{b}_1\cdots {}\texttt{b}_m\).

A set of words \(L, K, M, \dots \) is called a language. We can combine languages as one would combine sets (e.g., by taking their union). The concatenation of words can also be lifted to languages in a pointwise manner, writing \(L \cdot K\) for the set \(\{ wx : w \in L, x \in K \}\). Finally, the Kleene star of a language L, denoted \(L^*\), is the set \(\{ w_1\cdots {}w_n : w_1, \dots , w_n \in L \}\). Note that \(L^*\) includes the empty word.

Definition 2.6

The KA of languages \(\mathcal {L}\) is given by \((\mathcal {P}(\varSigma ^*), \cup , \cdot , {}^*, \emptyset , \{ \epsilon \})\), where \(\cdot \) is language concatenation, and \({}^*\) is the Kleene star of a language as above. We furthermore define the function \(\ell : \varSigma \rightarrow \mathcal {P}(\varSigma ^*)\) by \(\ell (\texttt{a}) = \{ \texttt{a} \}\).

Remark 2.7

Readers familiar with regular languages will recognize that \(\widehat{\ell }: \mathbb {E} \rightarrow \mathcal {P}(\varSigma ^*)\) is the standard language interpretation of regular expressions.
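For readers who want to experiment, \(\widehat{\ell }\) can be approximated in Python. Since \(L^*\) is generally infinite, the sketch below (encoding ours) truncates every language to words of length at most k; this is harmless when checking membership of words no longer than k.

```python
def lang(e, k):
    """Words of length at most k in the language of the regex AST e."""
    op = e[0]
    if op == '0':
        return set()
    if op == '1':
        return {''}
    if op == 'sym':
        return {e[1]} if k >= 1 else set()
    if op == '+':
        return lang(e[1], k) | lang(e[2], k)
    if op == '.':
        return {w + x for w in lang(e[1], k) for x in lang(e[2], k)
                if len(w + x) <= k}
    base = lang(e[1], k)   # op == '*': iterate concatenation to a fixpoint
    out = {''}
    while True:
        nxt = out | {w + x for w in out for x in base if len(w + x) <= k}
        if nxt == out:
            return out
        out = nxt
```

The truncation commutes with the operators because a word of length at most k can only be built from subwords of length at most k.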

Algebraically, Kozen’s theorem can now be stated as follows.

Theorem 2.8

(Kozen [19]). For all \(e, f \in \mathbb {E}\), we have that \(e \equiv f\) iff \(\mathcal {L} \models e = f\).

One of the payoffs of this result is that we can decide \(e \equiv f\) by checking whether \(\mathcal {L} \models e = f\), which turns out to be a PSPACE-complete problem [36].

Another property, known as the finite model property and proved by Palka [29], states that the equational theory of KA can also be characterized by the class of finite KAs, denoted \(\mathfrak {F}\). Her result can be stated as follows:

Theorem 2.9

(Palka [29]). For all \(e, f \in \mathbb {E}\), we have that \(e \equiv f\) iff \(\mathfrak {F} \models e = f\).

In her proof of the above, Palka applied Kozen’s theorem. The central contribution of this paper is that both theorems can be proved independently of each other, by a generic construction that allows one to conclude either result.

3 Solutions to Automata

In this section, we recall automata as a way of defining a language, as well as the notion of the least solution to an automaton. Both of these are well-known, but since they play such a central role for our results we discuss them in detail.

Definition 3.1

(Automaton). A (non-deterministic finite) automaton A is a tuple \((Q, \delta , I, F)\) where Q is a finite set of states, \(\delta : Q \times \varSigma \rightarrow \mathcal {P}(Q)\) is the transition function and \(I \subseteq Q\) (resp. \(F \subseteq Q\)) holds the initial (resp. final) states.

For \(\texttt{a} \in \varSigma \), we write \(\delta _{\texttt{a}}\) for the relation given by \(\{ (q, q') : q' \in \delta (q, \texttt{a}) \}\). This family of relations can be extended to words, as follows:

$$ \delta _{\epsilon } = \textsf{id}_Q \qquad \qquad \delta _{w\texttt{a}} = \delta _w \circ \delta _{\texttt{a}} $$

The language of a state \(q \in Q\), denoted \(L(A, q)\), is the set of words w such that q can reach a final state through \(\delta _w\), given by \(L(A, q) = \{ w \in \varSigma ^* : \exists q_f \in F.\ q \mathrel {\delta _w} q_f \}\). The language of A is defined by its initial states, i.e., \(L(A) = \bigcup _{q_i \in I} L(A, q_i)\).
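The definition above can be run directly: a small Python sketch (encoding ours) computes \(\delta _w\) by composing the single-letter relations, and then checks membership in L(A).

```python
def delta_rel(trans, a):
    """The relation delta_a = {(q, q') : q' in delta(q, a)}."""
    return {(q, q2) for (q, b), qs in trans.items() if b == a for q2 in qs}

def delta_word(trans, w, states):
    """delta_w as a relation: compose delta_a for each letter of w."""
    rel = {(q, q) for q in states}          # delta_epsilon = identity
    for a in w:
        step = delta_rel(trans, a)
        rel = {(q, q2) for (q, q1) in rel for (p, q2) in step if q1 == p}
    return rel

def language_member(w, trans, states, initial, final):
    """w in L(A) iff some initial state reaches a final state via delta_w."""
    return any((qi, qf) in delta_word(trans, w, states)
               for qi in initial for qf in final)
```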

It is well known that the set of languages defined by regular expressions is the same as the set of languages described by (finite) automata [17]. In fact, the translations that demonstrate this equivalence will play an important role in the remainder of this paper, and we will outline one of these now.

Definition 3.2

(Solutions). Let \(A = (Q, \delta , I, F)\) be an automaton, and \(e \in \mathbb {E}\). An e-solution to A is a function \(s: Q \rightarrow \mathbb {E}\) s.t. the following hold for all \(q, q' \in Q\):

$$ q \in F \implies e \leqq s(q) \qquad \qquad q' \in \delta (q, \texttt{a}) \implies \texttt{a} \cdot s(q') \leqq s(q) $$

A 1-solution to A is simply called a solution to A. We say that s is the least e-solution to A if for all e-solutions \(s'\) it holds for all \(q \in Q\) that \(s(q) \leqq s'(q)\).

Least e-solutions are unique up to the laws of KA; this explains why we can speak of the least e-solution to an automaton.

3.1 Computing Solutions

It is well-known that least e-solutions always exist for (finite) automata; the process to compute these [10, 18] closely resembles the state elimination technique from [17, 28], which computes a regular expression representing the language accepted by an automaton.

Theorem 3.3

(Computing solutions). Let \(A = (Q, \delta , I, F)\) be an automaton, and let \(e \in \mathbb {E}\). We can compute the least e-solution to A, denoted \(\overline{A^e}\).

In fact, the above statement can be strengthened: as it turns out, the least solution to A gives rise to all of the least e-solutions [18], in the following sense.

Theorem 3.4

(Relating solutions). Let \(A = (Q, \delta , I, F)\) be an automaton, and let \(e \in \mathbb {E}\). For all \(q \in Q\), it holds that \(\overline{A^1}(q) \cdot e \equiv \overline{A^e}(q)\).

The two results above form the technical nexus of this paper, and will be applied repeatedly throughout the coming three sections. The second result in particular, which connects solutions to e-solutions, will prove to be rather useful.

To lighten notation, we will simply write \(\overline{A}\) for \(\overline{A^1}\), which we call the least solution to A. We also write \(\lfloor A\rfloor \) for the expression \(\sum _{q \in I} \overline{A}(q)\).
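While Theorem 3.3 is stated abstractly, the computation it refers to can be sketched concretely. The Python fragment below (encoding ours) solves the system \(s(q) = \sum _{\texttt{a}} \sum _{q' \in \delta (q, \texttt{a})} \texttt{a} \cdot s(q') + [q \in F]\) by Gauss-Jordan elimination with Arden's rule; to keep the result testable, it builds Python regex strings rather than terms of \(\mathbb {E}\), and cross-checks them against the automaton.

```python
import re
from itertools import product

def alt(x, y):
    """x + y on regex strings; None plays the role of 0."""
    if x is None: return y
    if y is None: return x
    return f"(?:{x}|{y})"

def cat(x, y):
    """x . y on regex strings; 0 annihilates, and "" is the unit 1."""
    return None if x is None or y is None else x + y

def star(x):
    """x*; note that 0* = 1, i.e. the empty-word regex."""
    return "" if x is None else f"(?:{x})*"

def least_solution(n, trans, final):
    """Solve s(q) = sum_a sum_{q' in trans[q, a]} a . s(q') + [q in final]
    by Gauss-Jordan elimination with Arden's rule; one regex per state."""
    M = [[None] * n for _ in range(n)]
    b = ["" if q in final else None for q in range(n)]
    for (q, a), succs in trans.items():
        for q2 in succs:
            M[q][q2] = alt(M[q][q2], a)
    for k in range(n):
        loop = star(M[k][k])              # Arden: x_k = loop . (rest of row k)
        M[k][k] = None
        M[k] = [cat(loop, c) for c in M[k]]
        b[k] = cat(loop, b[k])
        for i in range(n):                # substitute x_k into every other row
            if i == k or M[i][k] is None:
                continue
            c = M[i][k]; M[i][k] = None
            for j in range(n):
                M[i][j] = alt(M[i][j], cat(c, M[k][j]))
            b[i] = alt(b[i], cat(c, b[k]))
    return b

def accepts(w, trans, initial, final):
    """Direct NFA acceptance, used to cross-check the computed solution."""
    states = set(initial)
    for a in w:
        states = {q2 for q in states for q2 in trans.get((q, a), set())}
    return bool(states & final)

def check(regex, trans, initial, final, k):
    """Compare regex and automaton on all words over {a, b} up to length k."""
    words = (''.join(p) for m in range(k + 1) for p in product('ab', repeat=m))
    return all(bool(re.fullmatch(regex, w)) == accepts(w, trans, initial, final)
               for w in words)
```

For the two-state automaton with \(\delta (0, \texttt{a}) = \{0, 1\}\) and \(\delta (1, \texttt{b}) = \{1\}\), the solver produces expressions denoting \(\texttt{a}^*\texttt{a}\texttt{b}^*\) for state 0 and \(\texttt{b}^*\) for state 1.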

3.2 Properties of Solutions

We conclude this section by recording three more properties of solutions. For the remainder of this section, we fix two automata \(A_i = (Q_i, \delta _i, I_i, F_i)\) for \(i \in \{1,2\}\).

For the first property, we need to define morphisms of automata.

Definition 3.5

A morphism from \(A_1\) to \(A_2\) is a function \(h: Q_1 \rightarrow Q_2\) where (1) if \(q \in F_1\) then \(h(q) \in F_2\), and (2) if \(q' \in \delta _1(q, \texttt{a})\), then \(h(q') \in \delta _2(h(q), \texttt{a})\). Furthermore, h is strong when for all \(q \in I_1\) we have that \(h(q) \in I_2\).

Morphisms between automata relate their least solutions, as follows.

Lemma 3.6

Let \(h: Q_1 \rightarrow Q_2\) be a morphism from \(A_1\) to \(A_2\). For all \(q \in Q\), it holds that \({\overline{A_1}(q) \leqq \overline{A_2}(h(q))}\). Furthermore, if h is strong, then \({\lfloor A_1\rfloor \leqq \lfloor A_2\rfloor }\).

For the second property, we need the notion of a subautomaton.

Definition 3.7

We say \(A_1\) is a subautomaton of \(A_2\) when \(Q_1 \subseteq Q_2\), and furthermore for all \(\texttt{a} \in \varSigma \) we have that \(\delta _1(q, \texttt{a}) = \delta _2(q, \texttt{a})\).

Unsurprisingly, the least solution to a subautomaton coincides with the least solution of the automaton that contains it, on the states where they overlap.

Lemma 3.8

If \(A_1\) is a subautomaton of \(A_2\) and \(q \in Q_1\), then \(\overline{A_1}(q) \equiv \overline{A_2}(q)\).

The third and last property that we will use connects the least solution of an automaton to the languages of that automaton.

Lemma 3.9

Both of the following hold for all \(q \in Q_1\):

figure d

Here, \([q \in F]\) is shorthand for 1 if \(q \in F\), and 0 otherwise.

4 Transformation Automata

Throughout this section, we fix an automaton \(A = (Q, \delta , I, F)\).

We now turn our attention to transformation automata [27]. Intuitively, the states of a transformation automaton \(A'\) obtained from A are relations on Q, with the intention that reading \(w \in \varSigma ^*\) starting from a state R in \(A'\) leads (uniquely) to \(R \circ \delta _w\). In particular, reading w in \(A'\) from \(\textsf{id}_Q\) takes us to \(\delta _w\), which is why we will pay special attention to the solutions to \(\textsf{id}_Q\) in transformation automata.

Definition 4.1

We define \(\delta ^\tau : \mathcal {P}(Q \times Q) \times \varSigma \rightarrow \mathcal {P}(\mathcal {P}(Q \times Q))\) by setting

$$ \delta ^\tau (R, \texttt{a}) = \{ R \circ \delta _{\texttt{a}} \} $$

For each \(R \subseteq Q \times Q\), write A[R] for the R-transformation automaton

$$ (\mathcal {P}(Q \times Q), \delta ^\tau , \{ \textsf{id}_Q \}, \{ R \}) $$

Note that the above still fits our definition of an automaton, since if Q is finite then so is the set of relations on Q, i.e., \(\mathcal {P}(Q \times Q)\). It is also useful to point out that transformation automata are deterministic, in that each state leads to one (and only one) next state for a given letter.
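A quick computational sketch of this construction (encoding ours): starting from \(\textsf{id}_Q\) and composing on the right with the relations \(\delta _{\texttt{a}}\), one enumerates exactly the states reachable in a transformation automaton, of which there are at most \(2^{|Q|^2}\).

```python
def compose(r, s):
    """Relational composition r ; s (first r, then s)."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def transformation_states(states, deltas):
    """Enumerate the relations reachable from id_Q under R |-> R ; delta_a.

    deltas maps each letter to its one-step relation delta_a; the result
    is the set of all delta_w, i.e. the reachable part of P(Q x Q)."""
    ident = frozenset((q, q) for q in states)
    seen, todo = {ident}, [ident]
    while todo:
        r = todo.pop()
        for d in deltas.values():
            nxt = frozenset(compose(r, d))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen
```

Determinism is visible in the code: each letter sends the state r to the single state compose(r, d).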

Remark 4.2

Readers familiar with formal language theory may recognize transformation automata as the construction used to show that each language accepted by an automaton can also be recognized by a (finite) monoid [27].

In the remainder of this section, we characterize the solution to A in terms of solutions to its transformation automata. To this end, we first analyze the solutions to transformation automata in general. A useful first observation is that, for each \(\texttt{a} \in \varSigma \), words read from \(\textsf{id}_Q\) to \(\delta _{\texttt{a}}\) in the transformation automaton include \(\texttt{a}\). This gives rise to the following property on the level of solutions.

Lemma 4.3

For all \(\texttt{a} \in \varSigma \), it holds that \(\texttt{a} \leqq \lfloor A[\delta _{\texttt{a}}]\rfloor \).

Furthermore, if \(R_1\), \(R_2\) and \(R_3\) are relations, and if we can read w by moving from \(R_1\) to \(R_2\), then we can also read w by moving from \(R_3 \circ R_1\) to \(R_3 \circ R_2\). This can be expressed in terms of solutions to transformation automata, as follows.

Lemma 4.4

For all \(R_1, R_2, R_3 \subseteq Q \times Q\), it holds that

$$ \overline{A[R_2]}(R_1) \leqq \overline{A[R_3 \circ R_2]}(R_3 \circ R_1) $$

Proof Sketch. Fix \(R_2\) and \(R_3\), and choose \(s: \mathcal {P}(Q \times Q) \rightarrow \mathbb {E}\) by setting \( s(R) = \overline{A[R_3 \circ R_2]}(R_3 \circ R) \). One then shows that s is a solution to \(A[R_2]\), from which the claim follows by leastness.    \(\square \)

We can think of the least solution to \(\textsf{id}_Q\) in the R-transformation automaton of A as an expression representing all words w such that \(\delta _w = R\). This explains the next property, which is an algebraic encoding of the fact that if \(w_1, w_2 \in \varSigma ^*\) are such that \(\delta _{w_1} = R_1\) and \(\delta _{w_2} = R_2\), then \(\delta _{w_1 \cdot w_2} = \delta _{w_1} \circ \delta _{w_2} = R_1 \circ R_2\).

Lemma 4.5

For all \(R_1, R_2 \subseteq Q \times Q\) it holds that \( \lfloor A[R_1]\rfloor \cdot \lfloor A[R_2]\rfloor \leqq \lfloor A[R_1 \circ R_2]\rfloor \)

Proof Sketch. Using Lemmas 4.3 and 4.4, one can show that \(\overline{A[R_1 \circ R_2]}\) is an \(\lfloor A[R_2]\rfloor \)-solution to \(A[R_1]\), which implies the claim.    \(\square \)

With this property in hand, we can now express the least solution to A in terms of the least solutions to its transformation automata, as follows.

Lemma 4.6

For all \(q \in Q\) it holds that \( \overline{A}(q) \equiv \sum _{q \mathrel {R} q_f \in F} \lfloor A[R]\rfloor \)

Proof Sketch

To show that the left-hand side is contained in the right-hand side, we use the preceding lemmas to show that it constitutes a solution to A. For the converse containment, we argue that the solution of A gives rise to a solution to each of the automata A[R] that appear on the right-hand side.    \(\square \)

5 Antimirov’s Construction

We now discuss the least solution to an automaton \(A_e\) that accepts \(\widehat{\ell }(e)\), for each \(e \in \mathbb {E}\). Many methods to obtain such an automaton exist (for instance [7, 38]; see [39] for a good overview). We focus on Antimirov’s construction [2], and show that an expression e can be recovered from its Antimirov automaton.

Remark 5.1

In a sense, the property we prove is analogous to the one shown by Kozen [21] (see also [14]), who proved that e can be recovered from the solution to its Brzozowski automaton [7]. We diverge from this for two reasons.

  1.

    Antimirov’s construction produces non-deterministic automata, which makes it a bit easier to express than Brzozowski’s construction, which uses deterministic automata. In particular, this saves us from having to consider the theory necessary to make Brzozowski’s construction produce a finite automaton.

  2.

    Kozen’s result about Brzozowski’s construction leverages the fact that bisimilar automata have equivalent solutions. This is a very powerful (and somewhat tricky to prove) observation, which also underlies the completeness proof in [14, 21]. For Antimirov automata, however, it turns out that we can rely on morphisms of automata instead, which are fairly easy to establish.

Having said that, the structure of the proof that follows is very much inspired by the strategy employed in [14, 21], especially when it comes to Lemma 5.15.

5.1 Recalling Antimirov’s Automata

The main idea behind Antimirov’s construction is that expressions are endowed with the structure of an automaton. The language of a state in this automaton is meant to be \(\widehat{\ell }(e)\), the language denoted by its expression. From this perspective, the accepting states should be those representing expressions whose language contains the empty word. This set of expressions is fairly easy to describe.

Definition 5.2

The set \(\mathbb {F}\) is defined as the smallest subset of \(\mathbb {E}\) satisfying:

$$ \frac{}{1 \in \mathbb {F}} \qquad \frac{}{e^* \in \mathbb {F}} \qquad \frac{e \in \mathbb {F}}{e + f \in \mathbb {F}} \qquad \frac{f \in \mathbb {F}}{e + f \in \mathbb {F}} \qquad \frac{e \in \mathbb {F} \quad f \in \mathbb {F}}{e \cdot f \in \mathbb {F}} $$

Next, we recall Antimirov’s transition function. The intuition is that an expression e has an \(\texttt{a}\)-transition to an expression \(e'\) when \(e'\) denotes remainders of words in \(\widehat{\ell }(e)\) that start with \(\texttt{a}\)—i.e., if \(w \in \widehat{\ell }(e')\), then \(\texttt{a}w \in \widehat{\ell }(e)\). Together, the expressions reachable by \(\texttt{a}\)-transitions from e should describe all such words.

In the following, when \(S \subseteq \mathbb {E}\) and \(e \in \mathbb {E}\), we write \(S \cdot e\) for \(\{ e' \cdot e : e' \in S \}\). We are now ready to define Antimirov’s transition function, as follows.

Definition 5.3

We define \(\partial : \mathbb {E} \times \varSigma \rightarrow \mathcal {P}(\mathbb {E})\) recursively, as follows

$$\begin{aligned} \partial (0, \texttt{a})&= \emptyset&\partial (e + f, \texttt{a})&= \partial (e, \texttt{a}) \cup \partial (f, \texttt{a}) \\ \partial (1, \texttt{a})&= \emptyset&\partial (e \cdot f, \texttt{a})&= \partial (e, \texttt{a}) \cdot f \cup e \star \partial (f, \texttt{a}) \\ \partial (\texttt{b}, \texttt{a})&= \{ 1 \mid \texttt{b} = \texttt{a} \}&\partial (e^*, \texttt{a})&= \partial (e, \texttt{a}) \cdot e^* \end{aligned}$$

Here, we use \(e \star S\) as a shorthand for S when \(e \in \mathbb {F}\), and \(\emptyset \) otherwise.
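Concretely, the standard partial-derivative operator that Definition 5.3 captures can be sketched in Python (AST encoding ours): reading a word through \(\partial \) and testing membership in \(\mathbb {F}\) decides language membership, and the set of expressions reachable from e is finite.

```python
def nullable(e):
    """Membership in F: does the language of e contain the empty word?"""
    op = e[0]
    if op in ('0', 'sym'):
        return False
    if op in ('1', '*'):
        return True
    if op == '+':
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])         # op == '.'

def pd(e, a):
    """Antimirov's partial derivatives of e with respect to the letter a."""
    op = e[0]
    if op in ('0', '1'):
        return set()
    if op == 'sym':
        return {('1',)} if e[1] == a else set()
    if op == '+':
        return pd(e[1], a) | pd(e[2], a)
    if op == '.':
        out = {('.', d, e[2]) for d in pd(e[1], a)}  # lift: S . f
        if nullable(e[1]):                           # e in F: add f's derivatives
            out |= pd(e[2], a)
        return out
    return {('.', d, e) for d in pd(e[1], a)}        # op == '*'

def matches(e, w):
    """Accept w iff some iterated derivative of e is nullable."""
    states = {e}
    for a in w:
        states = set().union(*(pd(q, a) for q in states))
    return any(nullable(q) for q in states)

def reachable(e, alphabet):
    """The expressions reachable from e under pd; always a finite set."""
    seen, todo = {e}, [e]
    while todo:
        q = todo.pop()
        for a in alphabet:
            for q2 in pd(q, a):
                if q2 not in seen:
                    seen.add(q2)
                    todo.append(q2)
    return seen
```

For \(e = (\texttt{a} + \texttt{b})^* \cdot \texttt{a}\), only three expressions are reachable, illustrating the finiteness that \(\rho \) makes precise below.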

Of course, the expression e could serve as the sole initial state in the automaton for e. However, our automata allow multiple initial states, and distributing this task among them will simplify some of the arguments that follow.

Definition 5.4

We define \(\iota : \mathbb {E} \rightarrow \mathcal {P}(\mathbb {E})\) recursively, as follows:

figure h

We could now try to package these parts into an automaton \((\mathbb {E}, \partial , \iota (e), \mathbb {F})\) for each expression e. Unfortunately, we have defined our automata to be finite, so a little more work is necessary to identify the expressions that are relevant (i.e., represented by reachable states) for a starting expression e.

Definition 5.5

We define \(\rho : \mathbb {E} \rightarrow \mathcal {P}(\mathbb {E})\) recursively, as follows:

figure i

With this function in hand, we can verify that it fits all of the requirements of the state space of an automaton with respect to the other parts identified above.

Lemma 5.6

For all \(e \in \mathbb {E}\), the set \(\rho (e)\) is finite and closed under \(\partial \), i.e., if \(e' \in \rho (e)\) and \(e'' \in \partial (e', \texttt{a})\), then \(e'' \in \rho (e)\) as well. Furthermore, \(\iota (e) \subseteq \rho (e)\).

In light of this, we write \(\partial _e\) for the function \(\partial _e: \rho (e) \times \varSigma \rightarrow \mathcal {P}(\rho (e))\) obtained by restricting \(\partial \). We can now define Antimirov automata, as follows.

Definition 5.7

Let \(e \in \mathbb {E}\). We write \(A_e\) for the Antimirov automaton

$$ (\rho (e), \partial _e, \iota (e), \mathbb {F} \cap \rho (e)) $$

Antimirov’s transition function can be used to decompose an expression e into several “derivatives” \(e'\), which can then reconstitute e. This validates the intuition that the derivatives collectively contain (only) the “tails” of words denoted by e. Similarly, the initial expressions \(\iota (e)\) can also be used to reconstitute e.

Theorem 5.8

Let \(e \in \mathbb {E}\). The following two equivalences hold:

$$ e \equiv [e \in \mathbb {F}] + \sum _{\texttt{a} \in \varSigma } \sum _{e' \in \partial (e, \texttt{a})} \texttt{a} \cdot e' \qquad \qquad e \equiv \sum _{e' \in \iota (e)} e' $$

The first property above is usually referred to as the fundamental theorem of Kleene algebra [31], because of its close resemblance to the fundamental theorem of calculus. One caveat is that one needs to prove that the sums on the right-hand sides are in fact finite, but this turns out to be the case.

We end this subsection by recording two more useful properties of \(\iota \).

Lemma 5.9

Let \(e \in \mathbb {E}\). We have \(e \in \mathbb {F}\) if and only if there exists an \(e' \in \iota (e)\) such that \(e' \in \mathbb {F}\). Also, \(e'' \in \partial (e, \texttt{a})\) if and only if \(e'' \in \partial (e', \texttt{a})\) for some \(e' \in \iota (e)\).

5.2 Solving Antimirov’s Automata

Having fully described Antimirov’s construction, we resume with the proof of the main technical point of this section, which is that the solution to \(A_e\) can be used to construct an expression equivalent to e.

More precisely, we will prove that e is equivalent to the sum of the solutions to its initial states, \(\lfloor A_e\rfloor \), by showing that \(\lfloor A_e\rfloor \leqq e\) and \(e \leqq \lfloor A_e\rfloor \). The former property is easy to prove using the theory established up to this point.

Lemma 5.10

For all \(e \in \mathbb {E}\), it holds that \(\lfloor A_e\rfloor \leqq e\).

Proof Sketch

By Theorem 5.8, the injection of \(\rho (e)\) into \(\mathbb {E}\) is a solution to \(A_e\).

   \(\square \)

To show that \(e \leqq \lfloor A_e\rfloor \), we cannot exploit the fact that \(\overline{A_e}\) is the least solution to \(A_e\), as above. Our proof will instead operate by induction on e. First, we will need to develop some theory; the following abstraction is useful.

Definition 5.11

Let \(e, f \in \mathbb {E}\). We write \(e \lesssim f\) when there exists a strong automaton morphism \(h: \rho (e) \rightarrow \rho (f)\) from \(A_e\) to \(A_f\).

By Lemma 3.6, if we want to show that \(\lfloor A_e\rfloor \leqq \lfloor A_f\rfloor \), it is sufficient to prove \(e \lesssim f\). We record the following instances of expressions being related by \(\lesssim \).

Lemma 5.12

The following hold for all \(e_0, e_1, e_2 \in \mathbb {E}\):

figure k

Proof Sketch. In all cases, a map can be gleaned from the structure of the relevant state spaces; checking that it is a strong automaton morphism is routine.    \(\square \)

Remark 5.13

Kozen [21] and Jacobs [14] show that if \(e, f \in \mathbb {E}\) are such that \(e \leqq f\), then the Brzozowski automaton of e is simulated by that of f, and hence these automata yield solutions \(e'\) and \(f'\) such that \(e' \leqq f'\).

It is tempting to try and prove a similar property for Antimirov automata, along the lines of “if \(e \leqq f\), then \(e \lesssim f\)”. Unfortunately, this is not true. For instance, if \(e = \texttt{a} \cdot (\texttt{b} + \texttt{c})\) and \(f = \texttt{a} \cdot \texttt{b} + \texttt{a} \cdot \texttt{c}\), then \(e \leqq f\), but there is no strong morphism from \(A_e\) to \(A_f\). Fortunately, Lemma 5.12 is sufficient for our purposes.

The solutions to the automata for e, \(e \cdot 1\) and \(1 \cdot e\) are also related.

Lemma 5.14

Let \(e \in \mathbb {E}\). It holds that \(\lfloor A_{e \cdot 1}\rfloor \leqq \lfloor A_e\rfloor \) and \(\lfloor A_e\rfloor \leqq \lfloor A_{1 \cdot e}\rfloor \).

Proof Sketch. We show that the solution to one automaton gives rise to a solution to the other automaton, using Lemmas 5.9 and 3.9 for the latter claim.    \(\square \)

The next lemma is the main workhorse that we need to show that \(e \leqq \lfloor A_e\rfloor \). The proof is very similar to that of [21, Lemma 3].

Lemma 5.15

Let \(e, f \in \mathbb {E}\). It holds that \( e \cdot \lfloor A_f\rfloor \leqq \lfloor A_{e \cdot f}\rfloor \)

Proof Sketch. As in [21, Lemma 3], we proceed by induction on e; we use Lemmas 3.6 and 5.14 in the base, and Lemma 5.12 in the inductive cases.    \(\square \)

With this in hand, we now have everything required to conclude the desired property of solutions to Antimirov automata, which we record below.

Lemma 5.16

For all \(e \in \mathbb {E}\), we have that \(e \equiv \lfloor A_e\rfloor \).

Proof

We already knew that \(\lfloor A_e\rfloor \leqq e\) by Lemma 5.10. To show that \(e \leqq \lfloor A_e\rfloor \), we derive using Lemmas 5.14 and 5.15, as follows:

$$ e \equiv e \cdot 1 \leqq e \cdot \lfloor A_1\rfloor \leqq \lfloor A_{e \cdot 1}\rfloor \leqq \lfloor A_e\rfloor $$

The second step is valid because \(1 \in \iota (1)\) and \(1 \in \mathbb {F}\), so \(1 \leqq \overline{A_1}(1) \leqq \lfloor A_1\rfloor \).    \(\square \)

6 From Monoids to Kleene Algebras

Recall that our objective was to derive a finite KA for two expressions, whose properties can then be used to conclude completeness and the FMP. We already saw how an expression gives rise to an automaton, which can then be turned into a transformation automaton. As it happens, the states of this transformation automaton have the internal structure of a monoid—indeed, this was the original motivation for the construction [27]—but we still do not have a KA.

In this section, we recall a straightforward translation from monoids to KAs proposed by Palka [29], and prove a useful property that we will leverage in the proof later on. Let us start by recalling the definition of a monoid.

Definition 6.1

A monoid is a tuple \((M, \cdot , 1)\) where M is a set, \(\cdot \) is a binary operator and \(1 \in M\) such that the following hold for all \(m_0, m_1, m_2 \in M\):

$$ (m_0 \cdot m_1) \cdot m_2 = m_0 \cdot (m_1 \cdot m_2) \qquad \qquad 1 \cdot m_0 = m_0 = m_0 \cdot 1 $$

A function \(h: \varSigma \rightarrow M\) gives rise to the function \(\widetilde{h}: \varSigma ^* \rightarrow M\), defined by

$$ \widetilde{h}(\texttt{a}_1 \cdots \texttt{a}_n) = h(\texttt{a}_1) \cdot \cdots \cdot h(\texttt{a}_n) $$

where, in particular, the empty word is mapped to \(1\).

As for KAs, we may identify a monoid \((M, \cdot , 1)\) with its carrier M, if the accompanying operator and unit are clear from context.
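To make the lifting concrete, here is a small Python sketch (not part of the paper's Coq development); the monoid, the letter map, and all names are illustrative assumptions.

```python
from functools import reduce

def lift(h, op, unit):
    """Lift a letter map h: Sigma -> M to words: the word a1...an is sent
    to h(a1) op ... op h(an), and the empty word to the unit of (M, op, unit)."""
    return lambda word: reduce(op, (h(a) for a in word), unit)

# Hypothetical letter map into the multiplicative monoid of integers:
# 'a' -> 2, 'b' -> 3, 'c' -> 4, ...
h_tilde = lift(lambda a: ord(a) - ord('a') + 2, lambda x, y: x * y, 1)
print(h_tilde("ab"))  # 2 * 3 = 6
print(h_tilde(""))    # empty word maps to the unit, 1
```

Associativity of the operator is all that is needed for the fold to be well-defined, which is why \(\widetilde{h}\) is determined by the monoid structure alone.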

As stated above, if \(A = (Q, \delta , I, F)\) is an automaton, then the state space of its transformation automaton is given by \(\mathcal {P}(Q \times Q)\)—i.e., the relations on Q—which has a monoidal structure: the operator is relational composition, and the unit is the identity relation on Q. In the sequel, we write \(M_A\) for this monoid.
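As an illustration of \(M_A\) (a sketch under assumed encodings, not the paper's formalization), relations on a finite state set can be represented as sets of pairs, with composition and identity as follows:

```python
def compose(r, s):
    """Relational composition r ; s: (p, q) is included iff p r x and x s q."""
    return frozenset((p, q) for (p, x) in r for (y, q) in s if x == y)

def identity(states):
    """The identity relation, i.e., the unit of the monoid M_A."""
    return frozenset((q, q) for q in states)

# Hypothetical transition relations of a two-state automaton.
Q = {0, 1}
da = frozenset({(0, 1)})          # delta_a
db = frozenset({(1, 0), (1, 1)})  # delta_b
print(compose(da, db) == frozenset({(0, 0), (0, 1)}))  # True: delta_ab
print(compose(da, identity(Q)) == da)                  # True: unit law
```

Since relational composition is associative and the identity relation is its unit, these operations indeed form a monoid on \(\mathcal {P}(Q \times Q)\).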

The composition operator of a monoid can be lifted to sets of its elements, which can then be used to derive a fixed point operator, as follows.

Lemma 6.2

(Palka [29]). If \((M, \cdot , 1)\) is a monoid, then \((\mathcal {P}(M), \cup , \otimes , {}^{\circledast }, \emptyset , \{ 1 \})\) is a KA, where \(\otimes \) and \({}^\circledast \) are defined by choosing for \(U, V \subseteq M\):

$$ U \otimes V = \{ u \cdot v : u \in U, v \in V \} \qquad \qquad U^{\circledast } = \bigcup _{n \in \mathbb {N}} U^{\otimes n} $$

where \(U^{\otimes 0} = \{ 1 \}\) and \(U^{\otimes (n+1)} = U \otimes U^{\otimes n}\).

For example, applying this construction to the free monoid \((\varSigma ^*, \cdot , \epsilon )\) yields precisely the free KA of languages.

Now, given an expression \(e \in \mathbb {E}\), a monoid \((M, \cdot , 1)\), and a map \(h: \varSigma \rightarrow M\), we have two ways of interpreting e inside of the KA that arises from this monoid:

  1. We lift the map \(\texttt{a} \mapsto \{ h(\texttt{a}) \}\) to obtain a map \(\mathbb {E} \rightarrow \mathcal {P}(M)\).

  2. We map each \(w \in \widehat{\ell }(e)\) to an element of M via \(\widetilde{h}: \varSigma ^* \rightarrow M\).

The next lemma shows that these two interpretations of expressions inside the KA for \((M, \cdot , 1)\) are actually the same; it can be thought of as a generalization of [29, Lemma 3.1], which covers the special case for the syntactic monoid.

Lemma 6.3

Let \((M, \cdot , 1)\) be a monoid and let \((\mathcal {P}(M), \cup , \otimes , {}^\circledast , \emptyset , \{ 1 \})\) be the KA obtained from it, per Lemma 6.2. Furthermore, let \(h_1: \varSigma \rightarrow M\) and \(h_2: \varSigma \rightarrow \mathcal {P}(M)\) be such that for all \(\texttt{a} \in \varSigma \), we have that \(h_2(\texttt{a}) = \{ h_1(\texttt{a}) \}\). Then for \(e \in \mathbb {E}\):

$$ \widehat{h_2}(e) = \{ \widetilde{h_1}(w) : w \in \widehat{\ell }(e) \} $$
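To illustrate Lemma 6.3 on a concrete expression (star-free, so that its language is finite), the following Python sketch evaluates \(\widehat{h_2}(e)\) and \(\{ \widetilde{h_1}(w) : w \in \widehat{\ell }(e) \}\) over an assumed example monoid, \(\mathbb {Z}_3\) under addition; the tuple encoding of expressions is our own and purely illustrative:

```python
from functools import reduce

op, UNIT = (lambda x, y: (x + y) % 3), 0   # the monoid (Z_3, +, 0)
h1 = {'a': 1, 'b': 2}                      # hypothetical letter map h1
h2 = lambda a: frozenset({h1[a]})          # h2(a) = { h1(a) }

# Expressions as nested tuples; star is omitted to keep the language finite.
def eval_ka(e):
    """Interpretation 1: evaluate e inside the KA P(M) of Lemma 6.2."""
    tag = e[0]
    if tag == 'sym': return h2(e[1])
    if tag == 'one': return frozenset({UNIT})
    if tag == 'seq': return frozenset(op(u, v)
                                      for u in eval_ka(e[1])
                                      for v in eval_ka(e[2]))
    if tag == 'alt': return eval_ka(e[1]) | eval_ka(e[2])

def lang(e):
    """The (finite) language of a star-free expression."""
    tag = e[0]
    if tag == 'sym': return {e[1]}
    if tag == 'one': return {''}
    if tag == 'seq': return {u + v for u in lang(e[1]) for v in lang(e[2])}
    if tag == 'alt': return lang(e[1]) | lang(e[2])

def h_tilde(w):
    """Interpretation 2: map a word through the lifted h1."""
    return reduce(op, (h1[a] for a in w), UNIT)

e = ('seq', ('sym', 'a'), ('alt', ('sym', 'b'), ('one',)))  # a . (b + 1)
lhs = eval_ka(e)
rhs = frozenset(h_tilde(w) for w in lang(e))
print(lhs == rhs)  # True: the two interpretations coincide
```

Of course, this checks only one instance; the lemma itself is proved by induction on e.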

We conclude this section by leveraging the above to prove a pivotal lemma: the solution to a state q of an automaton A can be recovered by interpreting that solution inside the KA obtained from the transformation automaton of A, and summing the solutions to the relations inside that interpretation.

Lemma 6.4

Let \(A = (Q, \delta , I, F)\) be an automaton and \(q \in Q\). Furthermore, let \(h: \varSigma \rightarrow \mathcal {P}(M_A)\) be given by \(h(\texttt{a}) = \{ \delta _{\texttt{a}} \}\). The following holds.

$$ \overline{A}(q) \equiv \sum _{R \in \widehat{h}(\overline{A}(q))} \lfloor A[R]\rfloor $$

Proof

We start by massaging the proof goal. Lemma 6.3 tells us that \(R \in \widehat{h}(\overline{A}(q))\) if and only if there exists a \(w \in \widehat{\ell }(\overline{A}(q))\) such that \(R = \delta _w\). Using this observation and Lemmas 4.6 and 3.9, it suffices to show

$$ \sum _{q \mathrel {R} q' \in F} \lfloor A[R]\rfloor \equiv \sum _{w \in L(A,q)} \lfloor A[\delta _w]\rfloor $$

For the inclusion from left to right, let \(R \subseteq Q \times Q\) and \(q' \in F\) be such that \(q \mathrel {R} q'\). If \(\widehat{\ell }(\lfloor A[R]\rfloor ) = \emptyset \), then an easy inductive argument shows that \(\lfloor A[R]\rfloor \equiv 0\), which means that the term \(\lfloor A[R]\rfloor \) does not contribute to the sum. Otherwise, let \(w \in \widehat{\ell }(\lfloor A[R]\rfloor )\). By Lemma 3.9, we know that \(w \in L(A[R], \textsf{id}_Q)\), and thus \(\delta _w = R\). Since \(q \mathrel {\delta _w} q' \in F\), we also know that \(w \in L(A, q)\). Therefore \(\lfloor A[R]\rfloor = \lfloor A[\delta _w]\rfloor \) appears in the sum on the right-hand side.

For the inclusion from right to left, let \(w \in L(A,q)\). In that case, \(q \mathrel {\delta _w} q'\) for some \(q' \in F\). Thus \(\lfloor A[\delta _w]\rfloor \) appears in the sum on the left-hand side.    \(\square \)

7 Completeness and the FMP

We are now ready to prove our main claims. In a nutshell, our proof will take two expressions e and f, apply the transformation automaton construction to \(A_{e+f}\), and then use the resulting state space monoid to obtain a KA. The previously derived facts connect e and f to their interpretation inside this KA, which we will use in two different ways to conclude both completeness and the FMP.

Throughout this section, we fix \(e, f \in \mathbb {E}\). For brevity, we also write \(\partial \) for the transition function of \(A_{e+f}\). We fix \(h: \varSigma \rightarrow \mathcal {P}(M_{A_{e+f}})\) by \(h(\texttt{a}) = \{ \partial _{\texttt{a}} \}\). Note that \(M_{A_{e+f}}\) is a finite monoid, and hence \(\mathcal {P}(M_{A_{e+f}})\) is a finite KA by Lemma 6.2.

The next lemma puts the results of the previous sections together to connect e and f to their interpretations inside \(\mathcal {P}(M_{A_{e+f}})\), in the following way.

Lemma 7.1

The following two equivalences hold:

$$ e \equiv \sum _{R \in \widehat{h}(e)} \lfloor A_{e+f}[R]\rfloor \qquad \qquad f \equiv \sum _{R \in \widehat{h}(f)} \lfloor A_{e+f}[R]\rfloor $$

Proof

We prove the first property, the second being symmetric, by deriving:

$$\begin{aligned} e&\equiv \lfloor A_e\rfloor&\text {(Lemma 5.16)} \\&\equiv \sum _{e' \in \iota (e)} \overline{A_e}(e')&\text {(def. } \lfloor A_e\rfloor \text {)} \\&\equiv \sum _{e' \in \iota (e)} \overline{A_{e+f}}(e')&\text {(Lemma 3.8)} \\&\equiv \sum _{e' \in \iota (e)} \sum _{R \in \widehat{h}(\overline{A_{e+f}}(e'))} \lfloor A_{e+f}[R]\rfloor&\text {(Lemma 6.4)} \\&\equiv \sum _{R \in \widehat{h}(e)} \lfloor A_{e+f}[R]\rfloor&\text {(see below)} \end{aligned}$$

The last equivalence holds because by Lemmas 3.8 and 5.16, we have that:

$$ e \equiv \lfloor A_e\rfloor \equiv \sum _{e' \in \iota (e)} \overline{A_e}(e') \equiv \sum _{e' \in \iota (e)} \overline{A_{e+f}}(e') $$

and thus \(\widehat{h}(e) = \bigcup _{e' \in \iota (e)} \widehat{h}(\overline{A_{e+f}}(e'))\) by Lemma 2.5.    \(\square \)

We are now ready to conclude our first main claim: the finite model property holds for KA. Recall that \(\mathfrak {F}\) denotes the class of all finite Kleene algebras, to which \(\mathcal {P}(M_{A_{e+f}})\) belongs. This allows us to apply Lemma 7.1, as follows.

Theorem 7.2

(Finite model property). If \(\mathfrak {F} \models {e = f}\), then \(e \equiv f\).

Proof

By the premise, we have that \(\widehat{h}(e) = \widehat{h}(f)\), and so by Lemmas 7.1 and 2.5:

$$ e \equiv \sum _{R \in \widehat{h}(e)} \lfloor A_{e+f}[R]\rfloor = \sum _{R \in \widehat{h}(f)} \lfloor A_{e+f}[R]\rfloor \equiv f $$

   \(\square \)

Finally, we note that a very similar proof also allows us to conclude that the axioms of KA are complete w.r.t. its language model, thanks to the connection between interpretations in lifted monoids given by Lemma 6.3.

Theorem 7.3

(Completeness). If \(\mathcal {L} \models {e = f}\), then \(e \equiv f\).

Proof

Let \(h': \varSigma \rightarrow M_{A_{e+f}}\) be given by \(h'(\texttt{a}) = \partial _{\texttt{a}}\). By the premise, \(\widehat{\ell }(e) = \widehat{\ell }(f)\), and thus \(\widehat{h}(e) = \widehat{h}(f)\) because we can use Lemma 6.3 to derive

$$ \widehat{h}(e) = \{ \widetilde{h'}(w) : w \in \widehat{\ell }(e) \} = \{ \widetilde{h'}(w) : w \in \widehat{\ell }(f) \} = \widehat{h}(f) $$

We can then conclude by leveraging Lemma 7.1, as in the proof of Theorem 7.2.    \(\square \)

8 Discussion

We leave the reader with some final considerations regarding our formalization and directions for possible further work.

Coq Formalization. We have formalized all of our results in Coq [4, 11]. The trusted base comes down to (1) the axioms of the Calculus of Inductive Constructions, (2) injectivity of dependent equality (equivalent to Streicher’s axiom K [37]), and (3) dependent functional extensionality. The latter is a result of our encoding of subsets, and can most likely be factored out with better data structures.

All proofs as presented here are faithful to the insights underlying the claim, although some encodings differ slightly. For instance, the definition of \(\rho (e)\) in the development is more accurately rendered using disjoint union.

Possible Extensions. Guarded Kleene Algebra with Tests (GKAT) [33, 35] is a fragment of KAT with favorable decidability properties. In particular, GKAT admits a set of axioms that is complete w.r.t. its language (resp. relational, probabilistic) model, but this set is infinite as a result of an axiom scheme. We wonder whether the techniques discussed here could be applied to arrive at a more satisfactory completeness result. To start answering this question, one would first have to devise analogues of transformation automata and their monoids for GKAT.

Relational Models. Pratt [30] connected the language model of KA to the relational model—essentially saying that if \(\mathfrak {R}\) is the class of relational KAs (as in Example 2.2), then \(\mathfrak {R} \models e = f\) if and only if \(\mathcal {L} \models e = f\) for all \(e, f \in \mathbb {E}\). By Theorem 7.3, this means that relational models are also complete for KA.

In light of Theorem 7.2, we wonder: can this form of completeness be strengthened to finite relational models? A positive answer would mean that the finite countermodel accompanying an invalid equation would correspond to an interpretation of the primitive actions as state transformers on a finite state space.

There is a tantalizing candidate for a canonical model that might be able to fill the role of \(\mathcal {P}(M_{A_{e+f}})\) in the previous section: simply use the relational KA with the carrier \(M_{A_{e+f}}\). For this to work, we would have to connect e and f with their interpretations inside this KA, which will require further research.