Aggregation-based minimization of finite state automata

We present a minimization algorithm for non-deterministic finite state automata that finds and merges bisimulation-equivalent states. The bisimulation relation is computed through partition aggregation, in contrast to existing algorithms that use partition refinement. The algorithm simultaneously generalises and simplifies an earlier one by Watson and Daciuk for deterministic devices. We show the algorithm to be correct and to run in time $O(n^2 r^2 |\Sigma|)$, where $n$ is the number of states of the input automaton $M$, $r$ is the maximal out-degree in the transition graph for any combination of state and input symbol, and $|\Sigma|$ is the size of the input alphabet. The algorithm has a higher time complexity than derivatives of Hopcroft's partition-refinement algorithm, but represents a promising new solution approach that preserves language equivalence throughout the computation process.
Furthermore, since the algorithm essentially computes the maximal model of a logical formula derived from $M$, optimisation techniques from the field of model checking become applicable.


Introduction
Non-deterministic finite state automata (nfa) are a fundamental concept in theoretical computer science, and their computational and representational complexity is the subject of extensive investigation. In this work, we revisit the minimization problem for nfa, which takes as input an automaton $M$ with $n$ states and outputs a minimal language-equivalent automaton $M'$. In the case of deterministic finite state automata (dfa), it is well known that $M'$ is always unique (up to isomorphism) and canonical with respect to the recognized language. In the more general, non-deterministic case, no analogous result exists and $M'$ is typically only one of several equally compact automata. Moreover, finding any one of these is PSPACE-complete [17], and the problem cannot even be efficiently approximated within a factor $o(n)$ unless P = PSPACE [12].
Since nfa minimization is inherently difficult, attention has turned to efficient heuristic minimization algorithms that often, if not always, perform well. In this category we find bisimulation minimization. Intuitively, two states are bisimulation equivalent if every transition that can be made from one of them can be mirrored starting from the other. More formally, an equivalence relation $E$ on the states $Q$ of an nfa $M$ is a bisimulation relation if the following holds: (i) the relation respects the separation in $M$ of final and non-final states, and (ii) for every $p, q \in Q$ such that $(p, q) \in E$, if $p' \in Q$ can be reached from $p$ on the symbol $a$, then there must be a $q' \in Q$ that can be reached from $q$ on $a$ such that $(p', q') \in E$.
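The two conditions can be checked mechanically for a given partition of the states. The sketch below is a minimal Python illustration under our own assumptions, not code from this paper: an nfa is encoded by a list of states, an alphabet, a dict `delta` mapping `(state, symbol)` to a set of successors (a missing key means no transition) and a set of final states; the equivalence relation $E$ is passed as a list of disjoint blocks, and the function name `is_bisimulation` is hypothetical.

```python
from itertools import product


def is_bisimulation(states, alphabet, delta, finals, blocks):
    """Check conditions (i) and (ii) for the equivalence whose classes are `blocks`.

    Assumed encoding: `delta` maps (state, symbol) to a set of successor
    states; a missing key means "no transition".
    """
    block_of = {q: i for i, block in enumerate(blocks) for q in block}
    assert set(block_of) == set(states), "blocks must partition the states"
    for block in blocks:
        for p, q in product(block, repeat=2):
            # (i) final and non-final states must not be related
            if (p in finals) != (q in finals):
                return False
            # (ii) every a-successor of p must be mirrored by an
            #      equivalent a-successor of q
            for a in alphabet:
                q_targets = {block_of[s] for s in delta.get((q, a), ())}
                if any(block_of[s] not in q_targets for s in delta.get((p, a), ())):
                    return False
    return True


# Usage: state 0 branches nondeterministically into two looping final states.
delta = {(0, "a"): {1, 2}, (1, "a"): {1}, (2, "a"): {2}}
print(is_bisimulation([0, 1, 2], ["a"], delta, {0, 1, 2}, [{0, 1, 2}]))   # True
print(is_bisimulation([0, 1, 2], ["a"], delta, {0, 1, 2}, [{0, 1}, {2}]))  # False
```

In the second call, state 0 has an $a$-successor in block $\{2\}$ that state 1 cannot mirror, so condition (ii) fails.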
The transitive closure of the union of two bisimulation relations is again a bisimulation relation, so every nfa $M$ has a unique coarsest bisimulation relation $E$. When each equivalence class of $E$ is merged into a single state, the result is a possibly smaller but language-equivalent nfa. If $M$ is deterministic, then this approach coincides with classical dfa minimization. The currently predominant method of finding $E$ is through partition refinement: the states are initially divided into final and non-final states, and the minimization algorithm resolves contradictions to the bisimulation condition by refining the partition until a fixed point is reached. This method is fast and requires $O(m \log n)$ computation steps (see [20]), where $m$ is the size of $M$'s transition function. The drawback is that, until the algorithm terminates, merging the current equivalence classes into states does not in general preserve the recognized language.
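To make the contrast concrete, here is a deliberately naive partition-refinement sketch in Python. It is not Paige and Tarjan's $O(m \log n)$ algorithm, and the name `coarsest_bisimulation` and the dict encoding of `delta` are our own assumptions. It starts from the split into final and non-final states and repeatedly splits blocks by each state's "signature", that is, its current block together with the set of blocks reachable on each symbol, until no block splits.

```python
def coarsest_bisimulation(states, alphabet, delta, finals):
    """Naive partition refinement; returns the blocks as a set of frozensets."""
    block_of = {q: int(q in finals) for q in states}   # final vs. non-final
    while True:
        # signature: own block plus, per symbol, the set of reachable blocks
        signature = {
            q: (block_of[q],) + tuple(
                frozenset(block_of[s] for s in delta.get((q, a), ()))
                for a in alphabet)
            for q in states}
        ids = {}
        refined = {q: ids.setdefault(signature[q], len(ids)) for q in states}
        # the refined partition only splits blocks, so equal counts mean stability
        if len(ids) == len(set(block_of.values())):
            break
        block_of = refined
    blocks = {}
    for q in states:
        blocks.setdefault(block_of[q], set()).add(q)
    return {frozenset(b) for b in blocks.values()}


delta = {(0, "a"): {1, 2}}
print(sorted(sorted(b) for b in coarsest_bisimulation([0, 1, 2], ["a"], delta, {1, 2})))
# [[0], [1, 2]]
```

Each round either increases the number of blocks or terminates, so there are at most $n$ rounds. The intermediate partitions are coarser than $E$, which is exactly why merging them before termination may change the recognized language.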
In this paper, which extends and revises [5], we present an nfa minimization algorithm that produces intermediate solutions that are language-equivalent to $M$. Like previous approaches, the algorithm computes the coarsest bisimulation relation $E$ on $M$. However, the initial partition consists entirely of singleton classes, and these are repeatedly merged until a fixed point is reached. The algorithm runs in time $O(n^2 \cdot (\log n^2 + r^2 |\Sigma|))$, where $r$ is the maximal out-degree in the transition graph for any combination of state and input symbol, and $\Sigma$ is the input alphabet. This is slower than the derivatives of Hopcroft's partition-refinement algorithm, of which Paige and Tarjan's algorithm is one, but we believe that it is a useful first step, and it is still an open question whether partition aggregation can be computed as efficiently as partition refinement.
The use of aggregation was inspired by a family of minimization algorithms for dfa (see Sect. 1.1), and we lift the technique to non-deterministic devices. In the deterministic case, our algorithm runs in $O(n^2 |\Sigma|)$, which is the same as for the fastest aggregation-based dfa minimisation algorithms.
Another contribution is the computational approach: we derive a characteristic propositional-logic formula $w_M$ for the input automaton $M$, in which the variables are pairs of states. The algorithm's main task is to compute a maximal model $\hat{v}$ of $w_M$, in the sense that $\hat{v}$ assigns 'true' to as many variables as possible. We show that if $w_M$ is satisfiable, then $\hat{v}$ is unique and efficiently computable by a greedy algorithm, and that $\hat{v}$ encodes the coarsest bisimulation relation on $M$.

Related work
dfa minimization has been studied extensively since the 1950s (see [13,15,18]). ten Eikelder [22] observed that the equivalence problem for recursive types can be formulated as a dfa reachability problem, and gave a recursive procedure for deciding equivalence for a pair of dfa states. This procedure was later used by Watson [23] to formulate a dfa minimization algorithm that works through partition aggregation. The algorithm runs in exponential time, and two mutually exclusive optimization methods were proposed by Watson and Daciuk [24]. One uses memoization to limit the number of recursive invocations; the other bases the implementation on the union-find data structure (see [2,14,21]). The union-find method reduces the complexity from $O(|\Sigma|^{n-2} n^2)$ down to $O(\alpha(n^2)\, n^2)$, where $\alpha(n)$, roughly speaking, is the inverse of Ackermann's function. The value of this function is less than 5 for $n \le 2^{2^{16}}$, so it can be treated as a constant.
The original formulation of the algorithm was later rectified by Daciuk [10], who discovered and removed an incorrect combination of memoization and restricted recursion depth. The fact that this combination was problematic had been pointed out by Almeida et al. [3], who had found situations in which the Watson-Daciuk algorithm returned non-minimal dfa. Almeida et al. [3] also presented a simpler version, doing away with presumably costly dependency list management. Assuming a constant alphabet size, they state that their algorithm has a worst-case running time of $O(\alpha(n^2)\, n^2)$ for all practical cases, yet also claim it to be faster than the Watson-Daciuk one. Based on Almeida's reporting, Daciuk [10, Section 7.4] provided a new version, presented as a compromise between the corrected Watson-Daciuk and the Almeida-Moreira-Reis algorithm, but did not discuss its efficiency. The original version of the algorithm has been lifted to deterministic tree automata (a generalisation of finite state automata), both as an imperative sequential algorithm and in terms of communicating sequential processes (see [9]).
nfa minimisation has also received much attention, and we restrict our discussion to heuristics that compute weaker relations than the actual Nerode congruence (recalled in Sect. 2). Paige and Tarjan [20] presented three partition refinement algorithms, one of which is essentially bisimulation minimization for nfa. The technique was revived by Abdulla et al. [1] for finite-state tree automata. That paper was soon followed by bisimulation-minimization algorithms for weighted and unranked tree automata by Björklund et al. [6] and Björklund et al. [7], and also by algorithms based on more general simulation relations by Abdulla et al. [1] and Maletti [16]. To the best of our knowledge, our work is the first in which the bisimulation relation is computed through partition aggregation.

Sets, numbers, and relations
We write $\mathbb{N}$ for the set of natural numbers, including 0. For $n \in \mathbb{N}$, $[n] = \{i \in \mathbb{N} \mid 1 \le i \le n\}$. Thus, $[0] = \emptyset$. The cardinality of a set $S$ is written $|S|$ and the powerset of $S$ is written $\mathrm{pow}(S)$. A binary relation is an equivalence relation if it is reflexive, symmetric and transitive. Let $E$ and $F$ be equivalence relations on $S$. We say that $F$ is coarser than $E$ (or, equivalently, that $E$ is finer than $F$) if $E \subseteq F$.
An alphabet is a finite nonempty set. Given an alphabet $\Sigma$, we write $\Sigma^*$ for the set of all strings over $\Sigma$, and $\varepsilon$ for the empty string. A string language is a subset of $\Sigma^*$.

Finite state automata
A nondeterministic finite state automaton is a tuple $M = (Q, \Sigma, \delta, Q_I, Q_F)$, where $Q$ is a finite set of states; $\Sigma$ is an alphabet of input symbols; the transition function $\delta = (\delta_f)_{f \in \Sigma}$ is a family of functions $\delta_f : Q \to \mathrm{pow}(Q)$; $Q_I \subseteq Q$ is a set of initial states; and $Q_F \subseteq Q$ is a set of final states.
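We immediately extend $\delta$ to $(\hat{\delta}_w)_{w \in \Sigma^*}$, where $\hat{\delta}_w : \mathrm{pow}(Q) \to \mathrm{pow}(Q)$ is defined as follows: for every string $w \in \Sigma^*$, symbol $f \in \Sigma$ and set of states $P \subseteq Q$, $\hat{\delta}_\varepsilon(P) = P$ and $\hat{\delta}_{fw}(P) = \hat{\delta}_w\big(\bigcup_{q \in P} \delta_f(q)\big)$. From here on, we identify $\delta$ with $\hat{\delta}$. If $|Q_I| \le 1$, and if $|\delta_f(\{q\})| \le 1$ for every $f \in \Sigma$ and $q \in Q$, then $M$ is said to be deterministic. Let $E$ be an equivalence relation on $Q$. The aggregated nfa with respect to $E$ is the nfa $(Q/E, \Sigma, \delta', \{[q] \mid q \in Q_I\}, \{[q] \mid q \in Q_F\})$, where $[q]$ denotes the equivalence class of $q$ and $\delta'_f([q]) = \{[p] \mid p \in \delta_f(q') \text{ for some } q' \in [q]\}$. The Nerode congruence (see [19]) is the coarsest congruence relation $E$ on $Q$ with respect to the right languages of the states in $Q$. This means that $(p, q) \in E$ if and only if …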
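The extended transition function and the aggregated automaton can be sketched as follows. This is a minimal Python illustration assuming a dict encoding in which `delta[(state, symbol)]` is a set of states; the function names are hypothetical, and `aggregate` implements the standard quotient construction in which each class inherits the outgoing transitions of all of its members.

```python
def delta_hat(delta, word, P):
    """Extended transition function: the states reachable from the set P on `word`."""
    current = set(P)
    for f in word:
        current = {p for q in current for p in delta.get((q, f), ())}
    return current


def accepts(delta, initial, finals, word):
    """A word is accepted if some run from an initial state ends in a final state."""
    return bool(delta_hat(delta, word, initial) & set(finals))


def aggregate(delta, initial, finals, blocks):
    """Quotient nfa: each block (equivalence class) becomes a single state."""
    block_of = {q: frozenset(b) for b in blocks for q in b}
    new_delta = {}
    for (q, f), targets in delta.items():
        new_delta.setdefault((block_of[q], f), set()).update(block_of[p] for p in targets)
    new_states = set(block_of.values())
    return (new_states, new_delta,
            {block_of[q] for q in initial}, {block_of[q] for q in finals})


# Usage: merging the bisimulation-equivalent states 0, 1 and 2 into one state.
delta = {(0, "a"): {1, 2}, (1, "a"): {1}, (2, "a"): {2}}
Q2, delta2, I2, F2 = aggregate(delta, {0}, {0, 1, 2}, [{0, 1, 2}])
print(len(Q2))  # 1
print(all(accepts(delta, {0}, {0, 1, 2}, "a" * k) == accepts(delta2, I2, F2, "a" * k)
          for k in range(5)))  # True
```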

Propositional logic
We assume that the reader is familiar with propositional logic, but recall some basic facts to fix terminology. It is important to note that, in the definitions that follow, interpretations are in general partial functions.
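The Boolean values true and false are written as $\top$ and $\bot$, respectively, and we use $\mathbb{B}$ for $\{\top, \bot\}$. Let $L$ be a propositional logic over the logical variables $X$, and let $\mathrm{WF}(L)$ be the set of well-formed formulas over $L$. An interpretation of $L$ is a partial function $X \to \mathbb{B}$. Given interpretations $v$ and $v'$, we say that $v'$ is an extension of $v$ if $\mathrm{dom}(v) \subseteq \mathrm{dom}(v')$ and $v'(x) = v(x)$ for every $x \in \mathrm{dom}(v)$. The set of all such extensions is written $\mathrm{Ext}(v)$.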
As usual, the semantics of a well-formed formula $w \in \mathrm{WF}(L)$ is a function from the set of all total interpretations (i.e., all total mappings $X \to \mathbb{B}$) to $\mathbb{B}$. A total interpretation $v$ is a total model for $w$ if $w(v) = \top$ (by convention, we hereafter write this application as $v(w)$). The set of all total models for $w$ is written $\mathrm{Mod}_t(w)$. Given a pair of formulas $w, w' \in \mathrm{WF}(L)$, we write $w \equiv w'$ to denote that $\mathrm{Mod}_t(w) = \mathrm{Mod}_t(w')$.
A substitution of formulas for a finite set of variables $X$ is a set $\{x_1 \leftarrow w_1, \ldots, x_n \leftarrow w_n\}$, where each $x_i \in X$ is a distinct variable and each $w_i \in \mathrm{WF}(L) \setminus X$ is a formula. The empty substitution is defined by the empty set. Let $\theta = \{x_1 \leftarrow w_1, \ldots, x_n \leftarrow w_n\}$ and $\sigma = \{y_1 \leftarrow w'_1, \ldots, y_k \leftarrow w'_k\}$ be two substitutions. Let $X$ and $Y$ be the sets of variables substituted for in $\theta$ and $\sigma$, respectively. The composition $\theta\sigma$ of $\theta$ and $\sigma$ is the substitution $\{x_i \leftarrow w_i\sigma \mid i \in [n]\} \cup \{y_j \leftarrow w'_j \mid y_j \in Y \setminus X\}$. The application of $\theta$ to a formula $w$ is denoted $w\theta$ and defined by (simultaneously) replacing every occurrence of each $x_i$ in $w$ by the corresponding $w_i$. Finally, given a set of formulas $W \subseteq \mathrm{WF}(L)$, we let $W\theta = \{w\theta \mid w \in W\}$.
Every partial interpretation $v$ of $L$ can be seen as a substitution, in which each $x \in \mathrm{dom}(v)$ is replaced by $v(x)$, resulting in a new formula $wv$ in $\mathrm{WF}(L)$ with variables in $X \setminus \mathrm{dom}(v)$. This allows us to extend $v$ to a function $\mathrm{WF}(L) \to ((X \to \mathbb{B}) \to \mathbb{B})$ defined by $v(w) = wv$.
Example 1 Consider the formulas $w = x_1 \to x_2$ and $w' = x_1 \wedge x_2$, and the partial interpretation …
Conversely, given a substitution $\sigma$ we can define a partial interpretation … The join of a pair of partial interpretations $v$ and $v'$ is the total interpretation $v \vee v'$ …
A formula is in conjunctive normal form if it is a conjunction of clauses, where each clause is a disjunction of possibly negated variables. A formula is negation-free if no variable occurs negated.
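Partial interpretations and substitutions can be illustrated with a small formula representation. In this minimal Python sketch, which uses our own encoding rather than anything prescribed by the text, a formula is `True`, `False`, `('var', x)`, `('not', w)` or `('and' | 'or', (w1, ..., wk))`; a substitution (and hence a partial interpretation) is a dict from variables to formulas or Booleans, applied simultaneously, and constants are simplified away using the conventions that an empty conjunction is true and an empty disjunction is false. The helper names are hypothetical.

```python
def simplify(op, args):
    """Build an 'and'/'or' node, absorbing the constants True and False."""
    unit, zero = (True, False) if op == "and" else (False, True)
    kept = []
    for a in args:
        if a is zero:           # e.g. False inside a conjunction decides the node
            return zero
        if a is not unit:       # e.g. True inside a conjunction can be dropped
            kept.append(a)
    if not kept:                # empty conjunction = True, empty disjunction = False
        return unit
    return kept[0] if len(kept) == 1 else (op, tuple(kept))


def substitute(w, sub):
    """Apply the substitution `sub` (variable -> formula or bool) simultaneously."""
    if isinstance(w, bool):
        return w
    if w[0] == "var":
        return sub.get(w[1], w)
    if w[0] == "not":
        inner = substitute(w[1], sub)
        return (not inner) if isinstance(inner, bool) else ("not", inner)
    return simplify(w[0], [substitute(a, sub) for a in w[1]])


x1, x2, x3 = ("var", "x1"), ("var", "x2"), ("var", "x3")
implication = ("or", (("not", x1), x2))   # x1 -> x2
conjunction = ("and", (x1, x2))           # x1 /\ x2

print(substitute(implication, {"x1": True}))            # ('var', 'x2')
print(substitute(implication, {"x1": False}))           # True
print(substitute(conjunction, {"x1": True, "x2": x3}))  # ('var', 'x3')
```

The first call leaves a formula over the remaining variable only, just as $wv$ has its variables in $X \setminus \mathrm{dom}(v)$; the last call shows a substitution that maps a variable to another formula rather than to a Boolean.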

Logical framework
In this section, we express the problem of finding the coarsest bisimulation relation on a finite automaton as a problem of computing the maximal model of a propositional-logic formula.
From here on, $M = (Q, \Sigma, \delta, Q_I, Q_F)$ is a fixed but arbitrary nfa, free from useless states.
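Definition 1 (Bisimulation) Let $E$ be an equivalence relation on $Q$. Then $E$ is a bisimulation relation on $M$ if, for every $(p, q) \in E$,
1. $p \in Q_F$ if and only if $q \in Q_F$; and
2. for every symbol $f \in \Sigma$ and every $p' \in \delta_f(p)$, there is a $q' \in \delta_f(q)$ such that $(p', q') \in E$.
We shall express the second of these conditions in a propositional logic in which the variables are pairs of states. The resulting formula is such that if the variable $\langle p, q \rangle$ is assigned the value $\top$, then $p$ and $q$ must satisfy Condition 2 of Definition 1 for the whole formula to be true.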
In the following, we take the conjunction of an empty set of Boolean values to be true ($\top$), and the disjunction of an empty set of Boolean values to be false ($\bot$).
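Definition 2 (Characteristic formula) Let $X_M = \{\langle p, q \rangle \mid p, q \in Q\}$. For every $f \in \Sigma$ and $\langle p, q \rangle \in X_M$, we denote by $w^f_{\langle p, q \rangle}$ the formula
$$\bigwedge_{p' \in \delta_f(p)} \bigvee_{q' \in \delta_f(q)} \langle p', q' \rangle \;\wedge\; \bigwedge_{q' \in \delta_f(q)} \bigvee_{p' \in \delta_f(p)} \langle p', q' \rangle,$$
and by $w_x$ the formula $\bigwedge_{f \in \Sigma} w^f_x$. It should be clear that, for every $f \in \Sigma$ and $x \in X_M$, the formulas $w^f_x$ and $w_x$ are negation-free. Finally, $w_M$ denotes the conjunction $\bigwedge_{x \in X_M} (x \to w_x)$, and $w_x$ is said to be the right-hand side of the implication $x \to w_x$. We could also model Condition 1 of Definition 1 in the formula $w_M$, but that would introduce negations and make the presentation more involved. To find the coarsest bisimulation relation for $M$, we instead start with a partial interpretation of $X_M$ satisfying Condition 1 of Definition 1 and search for a 'maximal' total extension that also satisfies Condition 2. By 'maximal' we mean that it assigns the value $\top$ to as many variables as possible.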
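Because each $w_x$ is a conjunction of disjunctions of unnegated pair variables, it can be stored as a plain list of clauses. The following minimal Python sketch (our own encoding; all names are hypothetical) builds these clauses so that, for every symbol $f$, each $f$-successor on one side must be matched by some $f$-successor on the other, in both directions. It then partially evaluates them under the interpretation that assigns $\top$ to identical pairs and $\bot$ to pairs that differ in finality. The transition table is hypothetical as well; it was chosen so that the result coincides with the simplified formula $w_{\langle q_0, q_1 \rangle}\sigma_0$ of Example 2.

```python
from itertools import product


def rhs_clauses(p, q, alphabet, delta):
    """Right-hand side w_<p,q> as a list of clauses: each clause is a list of
    state pairs read disjunctively, and the clauses are read conjunctively."""
    clauses = []
    for f in alphabet:
        succ_p = sorted(delta.get((p, f), ()))
        succ_q = sorted(delta.get((q, f), ()))
        # every f-successor of p needs a matching f-successor of q ...
        clauses += [[(p2, q2) for q2 in succ_q] for p2 in succ_p]
        # ... and every f-successor of q needs a matching f-successor of p
        clauses += [[(p2, q2) for p2 in succ_p] for q2 in succ_q]
    return clauses


def initial_interpretation(states, finals):
    """True on identical pairs, False on pairs that differ in finality."""
    v0 = {}
    for p, q in product(states, repeat=2):
        if p == q:
            v0[(p, q)] = True
        elif (p in finals) != (q in finals):
            v0[(p, q)] = False
    return v0


def partially_evaluate(clauses, v):
    """Drop satisfied clauses and false literals.
    Returns False if some clause becomes empty; [] stands for True."""
    result = []
    for clause in clauses:
        if any(v.get(x) is True for x in clause):
            continue
        rest = [x for x in clause if x not in v]   # unresolved literals only
        if not rest:
            return False
        result.append(rest)
    return result


# Hypothetical transitions for states 0, 1, 2 (standing for q0, q1, q2), all final.
delta = {(0, "a"): {0, 1}, (1, "a"): {0, 2}, (2, "a"): {2}}
v0 = initial_interpretation([0, 1, 2], {0, 1, 2})
print(partially_evaluate(rhs_clauses(0, 1, ["a"], delta), v0))
# [[(1, 0), (1, 2)], [(0, 2), (1, 2)]]
```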

Definition 3 (Maximal model)
Let $v$ and $v'$ be interpretations of $X_M$. We say that the total …
Due to the structure of $w_M$, its models are closed under the join operator.
Assume the former, without loss of generality. Then $v(w_x) \equiv \top$, since $v \in \mathrm{Mod}(w_M)$. Now, the fact that more variables are assigned the value $\top$ in $v \vee v'$ cannot cause $w_x$ to become false, since $w_x$ is negation-free. Hence $(v \vee v')(w_x) \equiv \top$.
From Lemma 1, we conclude that when a solution exists, it is unique.
Proof If $v$ cannot be extended to a model of $w_M$, then the statement is trivially true. If it can be extended to a model, then, since $X_M$ is finite, repeated application of Lemma 1 shows that the join of all such extensions is a model of $w_M$, and it is unique since join is idempotent.
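Given $v \in \mathrm{Mod}(w_M)$, Lemma 2 allows us to unambiguously write $\mathrm{Max}(M, v)$ for the unique maximal model of $w_M$ that extends $v$. To translate our logical models back into the domain of bisimulation relations, we introduce the notion of their associated relations.
Definition 4 (Associated relation) The relation associated with an interpretation $v$ of $X_M$ is $\sim_v = \{(p, q) \in Q \times Q \mid v(\langle p, q \rangle) = \top\}$.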
We say that the interpretation $v$ is reflexive, symmetric, and transitive, respectively, whenever $\sim_v$ is.
Note that Definition 4 does not distinguish between a state pair $x$ for which $v(x) = \bot$ and a state pair for which $v$ is undefined. If $v$ is an arbitrary model of $w_M$, then its associated relation need not be an equivalence relation, but for the maximal model it is.

Lemma 3
Let $v$ be a partial interpretation of $X_M$ such that $\sim_v$ is an equivalence relation. Then $\sim_{\hat{v}}$, where $\hat{v} = \mathrm{Max}(M, v)$, is also an equivalence relation.
Proof Since $\sim_v$ is reflexive, $v(\langle p, p \rangle) = \top$ for every $p \in Q$, so the associated relation of every extension of $v$ is also reflexive.
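Since the logical operators $\vee$ and $\wedge$ are commutative, every extension $v'$ of $v$ in which $v'(\langle p, q \rangle) = \top$ can be turned into a model $v''$ in which $v''(\langle q, p \rangle) = \top$ by swapping the order of every pair in $X_M$; since $\sim_v$ is symmetric, $v''$ still extends $v$. By taking the join of $v'$ and $v''$, we arrive at a greater model $v' \vee v''$ in which both $\langle p, q \rangle$ and $\langle q, p \rangle$ are assigned $\top$. Since $\hat{v}$ is the maximal model of $v$, it is necessarily already symmetric.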
A similar argument holds for transitivity. Let $v'$ be the transitive closure of $\hat{v}$; in other words, let $v'$ be the total interpretation that assigns the value $\top$ to the fewest variables in $X_M$ while still guaranteeing that, for all $p, q$, … We verify that $v'$ is also a model for $w_M$ by checking that … since $v'$ assigns the value $\top$ to more variables than $\hat{v}$ does, and since $w_{\langle p_i, p_{i+1} \rangle}$ is negation-free. Suppose, for the sake of contradiction, that $P = \langle p_1, \ldots$ … This means that for every … so $v'(w_{\langle p_1, p_{k+1} \rangle}) \equiv \top$, that is, a contradiction. Since $\hat{v}$ is already maximal, it has to be transitive.
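We introduce a partial interpretation $v_0$ to reflect Condition 1 of Definition 1 and use it as the starting point for our search.
Let $v_0$ be the partial interpretation of $X_M$ such that $v_0(\langle p, p \rangle) = \top$ for every $p \in Q$, $v_0(\langle p, q \rangle) = \bot$ for every $p, q \in Q$ such that exactly one of $p$ and $q$ is in $Q_F$, and $v_0$ is undefined on all other state pairs.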

Lemma 4
The interpretation $v_0$ is in $\mathrm{Mod}(w_M)$ and $\sim_{v_0}$ is an equivalence relation.
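Proof To verify that $v_0$ is a model for $w_M$, we must ensure that $v_0(\langle p, p \rangle \to w_{\langle p, p \rangle}) \equiv \top$ for every $p \in Q$. By Definition 2, every disjunction in $w_{\langle p, p \rangle}$ has the form $\bigvee_{q' \in \delta_f(p)} \langle p', q' \rangle$ or $\bigvee_{q' \in \delta_f(p)} \langle q', p' \rangle$ for some $f \in \Sigma$ and $p' \in \delta_f(p)$. This means that for every $p' \in \delta_f(p)$ we know that there is some $q' \in \delta_f(p)$, namely $q' = p'$, such that $v_0(\langle p', q' \rangle) = \top$, so every disjunction, and hence $w_{\langle p, p \rangle}$, evaluates to $\top$. For the second part of the statement, we note that $\sim_{v_0} = I_{X_M}$. Furthermore, $I_{X_M}$ is clearly an equivalence relation, namely the finest one, in which each state forms an equivalence class of its own.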
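On very small automata, the join argument behind Lemmas 1 and 2 can be checked by brute force: enumerate every total extension of $v_0$, keep those that are models of $w_M$, and join them pointwise. The following Python sketch does this under our own encoding assumptions (hypothetical function names, dict-based `delta`, each $w_x$ stored as a list of clauses of state pairs). It is exponential in the number of undecided pairs and only meant to make the argument concrete, not to serve as an algorithm.

```python
from itertools import product


def rhs_clauses(p, q, alphabet, delta):
    """w_<p,q> as clauses of state pairs (a conjunction of disjunctions)."""
    clauses = []
    for f in alphabet:
        sp, sq = delta.get((p, f), set()), delta.get((q, f), set())
        clauses += [[(p2, q2) for q2 in sq] for p2 in sp]
        clauses += [[(p2, q2) for p2 in sp] for q2 in sq]
    return clauses


def is_model(v, states, alphabet, delta):
    """Check the total interpretation v (pair -> bool) against every x -> w_x."""
    return all(
        not v[(p, q)] or all(any(v[y] for y in clause)
                             for clause in rhs_clauses(p, q, alphabet, delta))
        for p, q in product(states, repeat=2))


def max_model_brute_force(states, alphabet, delta, finals):
    pairs = list(product(states, repeat=2))
    # v0: True on identical pairs, False on pairs that differ in finality
    v0 = {(p, q): True for p, q in pairs if p == q}
    v0.update({(p, q): False for p, q in pairs if (p in finals) != (q in finals)})
    free = [x for x in pairs if x not in v0]
    join = {x: False for x in pairs}
    for bits in product([False, True], repeat=len(free)):
        v = {**v0, **dict(zip(free, bits))}
        if is_model(v, states, alphabet, delta):
            join = {x: join[x] or v[x] for x in pairs}   # pointwise join
    assert is_model(join, states, alphabet, delta)      # the join is again a model
    return join


# Hypothetical nfa: 0 and 3 are non-final, 1 and 2 are final dead ends.
delta = {(0, "a"): {1, 2}, (3, "a"): {1}}
v_max = max_model_brute_force([0, 1, 2, 3], ["a"], delta, {1, 2})
print(sorted(x for x, b in v_max.items() if b and x[0] < x[1]))  # [(0, 3), (1, 2)]
```

In this example the maximal model relates 0 with 3 and 1 with 2, and its associated relation is symmetric, in line with Lemma 3.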
We summarize this section's main findings in Theorem 1.

Algorithm
An aggregation-based minimisation algorithm starts with a singleton partition, in which each state is viewed as a separate block, and iteratively merges blocks found to be equivalent. When all blocks have become mutually distinguishable, the algorithm terminates. We take the same approach for the more general problem of minimizing nfas with respect to bisimulation equivalence. The procedure is outlined in Algorithm 1 and the auxiliary Algorithm 2.
The input to Algorithm 1 is an nfa $M = (Q, \Sigma, \delta, Q_I, Q_F)$. The algorithm computes the interpretation $\hat{v}$ of the set of variables $X_M = \{\langle p, q \rangle \mid p, q \in Q\}$, where $\hat{v}(x) = \top$ means that $x$ is a pair of equivalent states, and $\hat{v}(x) = \bot$ that $x$ is a pair of distinguishable states. The interpretation $\hat{v}$ is an extension of $v_0$, in the sense of Definition 5, and a maximal model for the characteristic formula $w_M$. Due to the structure of $w_M$, this maximal model can, as we shall see, be computed greedily.
The maximal model $\mathrm{Max}(M, v_0)$ is derived by incrementally assembling a substitution $\sigma$, which replaces state pairs by logical formulas. When outlining the algorithm, we add an index to $\sigma$ to distinguish successive assignments to $\sigma$. The method is such that (i) the substitution eventually becomes a total function, and (ii) no right-hand side of the substitution contains a variable that is also in the domain of the substitution. In combination, this means that when the algorithm terminates, the logical value of every variable is resolved to $\top$ or $\bot$. The substitution thus comes to represent a total interpretation of $X_M$. In the computations, $\sigma_i$ is a global variable. It is initialised such that it substitutes $\top$ for each pair of identical states, and $\bot$ for each pair of states that differ in their finality (see Line 2 of Algorithm 1). Following this initialisation, the function equiv (see Algorithm 2) is called for each pair of states not yet resolved by the substitution.
Function equiv has two parameters: the pair of states $x$ for which equivalence should be determined, and a set $S$ of pairs of states that are under investigation in previous, not yet completed, invocations of the function. In other words, $S$ contains pairs that are higher up in the call hierarchy. The function recursively invokes itself with those pairs of states that occur as a variable in the formula $w_x\sigma_i$, but which have not yet been resolved and are not part of the call stack $S$.
After these calls have been completed and the while loop has exited, the following two steps are taken: first, the formula $w_x\sigma_i\{x \leftarrow \top\}$ is derived from $w_x\sigma_i$ by replacing every occurrence of $x$ by $\top$, and second, the substitution $\sigma_{i+1}$ is derived from $\sigma_i$ by adding a rule that substitutes $w_x\sigma_i\{x \leftarrow \top\}$ for $x$, that is, $\sigma_{i+1} = \sigma_i\{x \leftarrow w_x\sigma_i\{x \leftarrow \top\}\}$.
Algorithm 1 Aggregation-based bisimulation minimization algorithm. 1: function minimize(M) 2: …
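The procedure described above can be sketched compactly in Python. This is an unoptimised illustration under our own assumptions (dict-based `delta`, formulas as nested tuples, and the hypothetical names `minimize_relation`, `rhs`, `substitute` and `variables`); it follows the prose rather than reproducing the algorithm listing. It initialises $\sigma_0$, calls `equiv` on every unresolved pair, recurses on the unresolved variables of $w_x\sigma_i$ outside $S$, and finally composes $\sigma_i$ with $\{x \leftarrow w_x\sigma_i\{x \leftarrow \top\}\}$. Formulas are copied rather than shared, so this sketch does not attain the DAG-based bound discussed under Complexity.

```python
from itertools import product


def simplify(op, args):
    """Build an 'and'/'or' node, absorbing the constants True and False."""
    unit, zero = (True, False) if op == "and" else (False, True)
    kept = []
    for a in args:
        if a is zero:
            return zero
        if a is not unit:
            kept.append(a)
    if not kept:
        return unit
    return kept[0] if len(kept) == 1 else (op, tuple(kept))


def substitute(w, sub):
    """Simultaneously replace variables by formulas/Booleans, then simplify."""
    if isinstance(w, bool):
        return w
    if w[0] == "var":
        return sub.get(w[1], w)
    return simplify(w[0], [substitute(a, sub) for a in w[1]])


def variables(w):
    if isinstance(w, bool):
        return set()
    if w[0] == "var":
        return {w[1]}
    return set().union(*(variables(a) for a in w[1]))


def rhs(p, q, alphabet, delta):
    """Negation-free right-hand side w_<p,q>."""
    parts = []
    for f in alphabet:
        sp, sq = sorted(delta.get((p, f), ())), sorted(delta.get((q, f), ()))
        parts += [("or", tuple(("var", (p2, q2)) for q2 in sq)) for p2 in sp]
        parts += [("or", tuple(("var", (p2, q2)) for p2 in sp)) for q2 in sq]
    return ("and", tuple(parts))


def minimize_relation(states, alphabet, delta, finals):
    """Return the pairs resolved to True, i.e. the relation of the final substitution."""
    w = {(p, q): rhs(p, q, alphabet, delta) for p, q in product(states, repeat=2)}
    sigma = {}                       # sigma_0: resolve what identity/finality decide
    for p, q in w:
        if p == q:
            sigma[(p, q)] = True
        elif (p in finals) != (q in finals):
            sigma[(p, q)] = False

    def equiv(x, S):
        # recurse on unresolved variables of w_x sigma_i that are not on the call stack
        while True:
            pending = variables(substitute(w[x], sigma)) - S
            if not pending:
                break
            y = min(pending)
            equiv(y, S | {y})
        # sigma_{i+1} = sigma_i {x <- w_x sigma_i {x <- True}} (composition)
        new = substitute(substitute(w[x], sigma), {x: True})
        for y in sigma:
            sigma[y] = substitute(sigma[y], {x: new})
        sigma[x] = new

    for x in w:
        if x not in sigma:
            equiv(x, {x})
    return {x for x, value in sigma.items() if value is True}


delta = {(0, "a"): {1}, (1, "a"): {0}}
print(sorted(minimize_relation([0, 1], ["a"], delta, {0, 1})))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In the usage example, $\langle 0, 1 \rangle$ and $\langle 1, 0 \rangle$ depend on each other; the call stack $S$ breaks the cycle, and resolving the outer pair to $\top$ also resolves the inner one. The recursion depth can reach $|X_M|$, so larger inputs may need Python's recursion limit raised.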

Example 2
To illustrate the algorithm and sketch the intuition behind it, we consider the automaton in Fig. 1a. The automaton is a non-minimal nfa for the language $L = a^*$; for comparison, Fig. 1b shows the minimal nfa for the same language. The non-minimal nfa gives rise to nine pairs of states as variables. For the pair $\langle q_0, q_1 \rangle$, for example, the corresponding formula $w_{\langle q_0, q_1 \rangle}$ is … Line 2 of Algorithm 1 ensures that the three pairs of identical states all resolve to $\top$. In other words, $\sigma_0 = \{\langle q_0, q_0 \rangle \leftarrow \top, \langle q_1, q_1 \rangle \leftarrow \top, \langle q_2, q_2 \rangle \leftarrow \top\}$. This means that $w_{\langle q_0, q_1 \rangle}\sigma_0 = (\langle q_1, q_0 \rangle \vee \langle q_1, q_2 \rangle) \wedge (\langle q_0, q_2 \rangle \vee \langle q_1, q_2 \rangle)$. As observed in the proof of Lemma 3, the solution will be symmetric, so we need only consider, without loss of generality, the three pairs $\langle q_0, q_1 \rangle$, $\langle q_0, q_2 \rangle$ and $\langle q_1, q_2 \rangle$ and the corresponding formula for each of these.
Assuming that the 'for' loop in Algorithm 1 initially selects the pair $\langle q_0, q_1 \rangle$, a call to $\mathrm{equiv}(\langle q_0, q_1 \rangle, \{\langle q_0, q_1 \rangle\})$ occurs. In the called function equiv, the existential quantification on Line 2 will be true, namely for each of the other two of the three pairs indicated above, i.e., for $\langle q_0, q_2 \rangle$ and $\langle q_1, q_2 \rangle$.

Correctness
The correctness proof is based on the fact that throughout the computation, $\mathrm{var}(w_x\sigma_i) \cap \mathrm{dom}(\sigma_i) = \emptyset$ for every $x \in X_M$. In other words, at every point of the computation, the set of variables in the domain of $\sigma_i$ is disjoint from the set of variables that occur in $w_x\sigma_i$, $x \in X_M$. This invariant means that there are no circular dependencies, and helps us prove that every variable will eventually be resolved. Intuitively, the invariant holds because every time $\sigma_i$ is updated by adding a variable $x$ to its domain, the assignment on Line 5 of …
Proof The proof is by induction. Lemma 5 is trivially true after the initialisation of $\sigma_0$ in Algorithm 1.
Consider the assignment to $\sigma_{i+1}$ on Line 5 of Algorithm 2. By the induction hypothesis, …
Let us now ensure that the recursive calls always come to an end.

Lemma 6 Algorithm 1 terminates.
Proof We need only consider calls to function equiv. Since $S$ grows with each recursive call to equiv on Line 3 of Algorithm 2, and $S \subseteq X_M$ is finite, the recursion is finite. Due to Line 5, each call to equiv terminates with $\mathrm{dom}(\sigma_i)$ larger than before, hence the number of iterations of the while loop is also finite.

It remains to verify that every intermediate solution is a partial solution.
Lemma 7 Throughout the execution of Algorithm 1, and for every $x \in \mathrm{dom}(\sigma_i)$, the formula $(x \to w_x)\sigma_i$ is a tautology.

Proof
The proof is by induction on the index of $\sigma_i$. For the base case, there are two possibilities. First, if $\sigma_0(x) = \top$, then $x = \langle p, p \rangle$ for some $p \in Q$, which means that, by Definition 2 of the characteristic formula, $w_x\sigma_0$ is a tautology, and so is $(x \to w_x)\sigma_0$. Second, if $\sigma_0(x) = \bot$, then $(\bot \to w_x)\sigma_0$ is clearly a tautology.
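We continue with the inductive step, which extends the substitution by letting $\sigma_{i+1} = \sigma_i\{y \leftarrow w_y\sigma_i\{y \leftarrow \top\}\}$. For every $x \in \mathrm{dom}(\sigma_{i+1})$, there are two cases:
- The variable $x \in \mathrm{dom}(\sigma_i)$. By the induction hypothesis, $(x \to w_x)\sigma_i$ is a tautology, and replacing every occurrence of a variable in a tautology with one and the same formula yields a new tautology.
- The variable $x = y$. Writing $u = w_y\sigma_i\{y \leftarrow \top\}$, the formula $(y \to w_y)\sigma_{i+1}$ expands to $u \to w_y\sigma_i\{y \leftarrow u\}$, which is a tautology: under every interpretation that makes $u$ true, replacing $y$ by $u$ has the same effect as replacing it by $\top$, so the right-hand side evaluates to $u$, that is, to true.
This completes the proof.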

Lemma 8 Throughout the execution of Algorithm …
Proof The proof is by induction on the index of $\sigma_i$. By construction, $\sigma_0 = v_0$, which establishes the base case.
We continue with the inductive step, which extends the substitution by letting … We prove that for every … Due to the conjunctive structure of $w_M$, we can take advantage of the fact that … The argument has three cases: … and since $w_x$ is negation-free, it must be the case that … This completes the case analysis and the proof.
Proof The proof is by induction on the index of $\sigma_i$. By construction, $\sigma_0 = v_0$, so the base case is trivially true. We continue with the inductive step, which extends the substitution by letting … We first observe that if $\sigma_i$ is updated to $\sigma_i\{x \leftarrow w_x\}$, then by Lemma 8 we have …
Let $\sigma_t$ be the value of $\sigma_i$ at the point of termination, in other words, when control reaches Line 6 of Algorithm 1.

Observation 2
Since $\mathrm{var}(w_x\sigma_t) = \emptyset$ for every $x \in X_M$, the interpretation $\sigma_t$ is total. Lemmas 6 and 9 and Observation 2 are combined in Theorem 3.
Theorem 3 Algorithm 1 terminates, and when it does, the relation $\sim_{\sigma_t}$ is the unique coarsest bisimulation equivalence on $M$.

Complexity
Let us now discuss an efficient implementation of Algorithm 1. The key idea is to keep the representations of the characteristic formula and the computed substitutions small by linking recurring structures rather than copying them. We use the parameter $r$ to capture the amount of nondeterminism in $M$. It is defined as $r = \max_{q \in Q, f \in \Sigma} |\delta_f(q)|$. In particular, $r \le 1$ whenever the automaton $M$ is deterministic.
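As a small worked illustration of $r$ and of why the initial representation has size $O(n^2 r^2 |\Sigma|)$, the sketch below (hypothetical helper names, dict-based `delta`) computes $r$ and counts the variable occurrences in all right-hand sides: each $w^f_{\langle p, q \rangle}$ pairs every $f$-successor of $p$ with every $f$-successor of $q$ once in each mirroring direction, contributing $2\,|\delta_f(p)|\,|\delta_f(q)| \le 2r^2$ occurrences, summed over $n^2$ pairs and $|\Sigma|$ symbols.

```python
def nondeterminism(states, alphabet, delta):
    """r = max over states q and symbols f of |delta_f(q)|."""
    return max((len(delta.get((q, f), ())) for q in states for f in alphabet), default=0)


def rhs_occurrences(states, alphabet, delta):
    """Number of variable occurrences in all right-hand sides w_x."""
    total = 0
    for p in states:
        for q in states:
            for f in alphabet:
                a, b = len(delta.get((p, f), ())), len(delta.get((q, f), ()))
                total += 2 * a * b   # both mirroring directions of w^f_<p,q>
    return total


states, alphabet = [0, 1, 2], ["a"]
delta = {(0, "a"): {1, 2}, (1, "a"): {1}, (2, "a"): {2}}
r = nondeterminism(states, alphabet, delta)
n = len(states)
print(r, rhs_occurrences(states, alphabet, delta), 2 * n * n * r * r * len(alphabet))
# 2 32 72
```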
Let us denote the set of all $w_x$, $x \in X_M$, in other words, the formulas that appear as right-hand sides in $w_M$, by $\mathrm{rhs}_M$. In the update of $\sigma_i$ on Line 5, some of these formulas may be copied into others, so the size of $\mathrm{rhs}_M\sigma_i$ can potentially grow exponentially. For the sake of compactness, we therefore represent $\mathrm{rhs}_M\sigma_i$ as a directed acyclic graph (DAG) and allow node sharing between formulas. In the following, we represent a DAG as a tuple $(V, E, l)$, where $V$ is a set of nodes, $E \subseteq V \times V$ is a set of (directed) edges, and $l : V \to X_M \cup \{\vee, \wedge, \bot, \top\}$ is a labelling function that labels each node with a variable name or logical symbol. In the initial DAG, only nodes representing variables and the logical constants $\top$ and $\bot$ are shared, but as the algorithm proceeds, more substantial parts of the graph come to overlap. The construction is straightforward but has many steps, so readers who are satisfied with a high-level view may want to continue to Theorem 4.
… where $\mathrm{root}(D(w \otimes w')) = u$, and then merging leaf nodes with identical labels.
Given the above definition, we obtain the many-rooted DAG representation $D(\mathrm{rhs}_M)$ of $\mathrm{rhs}_M$ by taking the disjoint union of $D(w_x)$, $w_x \in \mathrm{rhs}_M$, and merging all leaf nodes that have identical labels. Thus, for each state pair $x$ and for each of $\top$ and $\bot$, there is a single leaf node in $D(\mathrm{rhs}_M)$.
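A minimal sketch of such a many-rooted DAG with shared leaves, in Python and under our own assumptions: formulas use the nested-tuple encoding `('var', x)`, `('and' | 'or', children)` with Booleans for constants; every node keeps parent pointers, which an upward propagation of resolved values needs; and the class and method names are hypothetical. Only leaves are shared here, matching the description of the initial DAG.

```python
from itertools import product


class Node:
    def __init__(self, label, children=()):
        self.label = label              # 'and', 'or', ('var', x) or ('const', b)
        self.children = list(children)
        self.parents = []
        for child in self.children:
            child.parents.append(self)


class SharedDag:
    def __init__(self):
        self.leaves = {}                # label -> the unique leaf with that label

    def leaf(self, label):
        if label not in self.leaves:
            self.leaves[label] = Node(label)
        return self.leaves[label]

    def build(self, w):
        """D(w): fresh inner nodes, shared leaves."""
        if isinstance(w, bool):
            return self.leaf(("const", w))
        if w[0] == "var":
            return self.leaf(w)
        return Node(w[0], [self.build(a) for a in w[1]])


def rhs(p, q, alphabet, delta):
    parts = []
    for f in alphabet:
        sp, sq = sorted(delta.get((p, f), ())), sorted(delta.get((q, f), ()))
        parts += [("or", tuple(("var", (p2, q2)) for q2 in sq)) for p2 in sp]
        parts += [("or", tuple(("var", (p2, q2)) for p2 in sp)) for q2 in sq]
    return ("and", tuple(parts))


states, alphabet = [0, 1, 2], ["a"]
delta = {(0, "a"): {1, 2}, (1, "a"): {1}, (2, "a"): {2}}
dag = SharedDag()
ref_rhs = {x: dag.build(rhs(*x, alphabet, delta)) for x in product(states, repeat=2)}
shared = dag.leaves[("var", (1, 1))]
print(len(dag.leaves), len(shared.parents))  # 4 8
```

The nine right-hand sides here mention only four distinct pair variables, each stored once; the leaf for $\langle 1, 1 \rangle$, for instance, is referenced by eight disjunction nodes spread over four different right-hand sides.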
Throughout the computation, we maintain a DAG representing $D(\mathrm{rhs}_M\sigma_i)$. This is initialised to $D(\mathrm{rhs}_M\emptyset)$ and then immediately updated to $D(\mathrm{rhs}_M\sigma_0)$. On top of this DAG, we assume that for each pair $x$ we have a reference $\mathrm{ref}_{\mathrm{rhs}}(x)$ to $w_x$, in other words, to the root of $D(w_x)$. Figure 2 illustrates the structure of the initial DAG. During the computation, the graph $D(\mathrm{rhs}_M\sigma_i)$ is reorganised by changing the targets of certain edges, but it does not grow. The exceptions are a potential one-off addition of $\top$- and $\bot$-labelled nodes during initialisation in Algorithm 1, and the addition of a single outgoing edge to each of the initial leaf nodes. Moreover, every time a variable is resolved, $D(\mathrm{rhs}_M\sigma_i)$ is updated to reflect this; while the references $\mathrm{ref}_{\mathrm{rhs}}(x)$ continue to point at $w_x\sigma_i$, the expression $w_x\sigma_i$ changes to reflect the latest $\sigma_i$ and is simplified as much as possible.
There are two cases to consider at Line 5 of Algorithm 2. The first is that $w_x\sigma_i\{x \leftarrow \top\}$ resolves to $\top$ or $\bot$. In this case, a number of adjustments are made to $D(\mathrm{rhs}_M\sigma_i)$ to reflect the updated $w_x\sigma_{i+1}$ (illustrated in Fig. 3):
1. The formula $w_x\sigma_i$ in $D(\mathrm{rhs}_M\sigma_i)$ is replaced by $\top$ or $\bot$, as the case may be. Thus, the graph $D(\mathrm{rhs}_M\sigma_i)$ is modified to remove the nodes and edges of $w_x\sigma_i$.
2. The unique shared leaf node representing $x$ in the DAG is relabelled to either $\top$ or $\bot$.
3. The relabelling is propagated upwards along each DAG branch leading to this node, now labelled $\top$ or $\bot$, since this resolution of $x$ may cause subtrees rooted further up the branch to resolve to $\top$ or $\bot$ as well. In the case of $\bot$, if the immediate parent is labelled $\wedge$, then it can be resolved to (i.e., replaced by a reference to) $\bot$. If the parent is instead labelled $\vee$, then we can simplify the graph by deleting the edge and, if it was the last edge, also resolving the parent to $\bot$. In the case of $\top$ and parent $\vee$, the parent can be resolved to $\top$. In the case of $\top$ with parent $\wedge$, the graph can be simplified by deleting the edge between them and, if it was the last edge, resolving the parent to $\top$. This process continues until no more simplifications or resolutions are possible.
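The upward propagation in step 3 can be sketched with a worklist. This is a minimal Python illustration under our own assumptions (a hypothetical `Node` class with parent pointers and labels `'and'`, `'or'`, `('var', x)` and `('const', b)`); it applies the four local rules listed above and detaches each resolved node from its children, since the subtree below it is no longer needed.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label              # 'and', 'or', ('var', x) or ('const', b)
        self.children = list(children)
        self.parents = []
        for child in self.children:
            child.parents.append(self)

    def resolved(self):
        return isinstance(self.label, tuple) and self.label[0] == "const"


def resolve(node, value):
    """Relabel `node` with the constant `value` and simplify upwards."""
    work = [(node, value)]
    while work:
        n, val = work.pop()
        if n.resolved():
            continue                    # already decided earlier in this propagation
        n.label = ("const", val)
        for child in n.children:        # the subtree below n is no longer needed
            child.parents.remove(n)
        n.children = []
        for parent in list(n.parents):
            # True under 'or' and False under 'and' decide the parent ...
            if (parent.label == "or") == val:
                work.append((parent, val))
            else:
                # ... otherwise the child is a neutral element: drop the edge,
                # and an emptied parent takes the same value
                # (empty conjunction = True, empty disjunction = False).
                parent.children.remove(n)
                n.parents.remove(parent)
                if not parent.children:
                    work.append((parent, val))


a, b, c = Node(("var", "a")), Node(("var", "b")), Node(("var", "c"))
root = Node("and", [Node("or", [a, b]), Node("or", [a, c])])
resolve(a, True)
print(root.label)  # ('const', True)

x, y = Node(("var", "x")), Node(("var", "y"))
r1, r2 = Node("or", [x]), Node("and", [x, y])   # x is shared by two right-hand sides
resolve(x, False)
print(r1.label, r2.label)  # ('const', False) ('const', False)
```

Every step of the loop either removes an edge or resolves a node, which mirrors the amortisation argument used in the complexity analysis.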
In the second case, $w_x\sigma_i\{x \leftarrow \top\}$ does not resolve to $\bot$ or $\top$. Here, as illustrated in Fig. 4, two updates are made to $D(\mathrm{rhs}_M\sigma_i)$:
1. The references in $w_x\sigma_i$ to the unique shared leaf node for $x$ itself are replaced by references to $\top$.
2. The change is propagated upwards along each DAG branch leading to this reference to $\top$, as this local resolution of $x$ may either simplify (in the case of $\wedge$) or resolve (in the case of $\vee$) subtrees rooted further up in $\mathrm{rhs}_M\sigma_i$.
The resulting modified right-hand side $w_x\sigma_{i+1}$ may either resolve to $\top$, or still contain unresolved variables. In the first case, the unique shared leaf node representing $x$ in the DAG is relabelled to $\top$. This change is then propagated upwards, as described in the previous paragraph. In the second case, the node $x$ may still be used in right-hand sides other than $w_x\sigma_{i+1}$, and is replaced there by a reference to the modified $w_x\sigma_{i+1}$.
Fig. 3 The update of the DAG $D(\mathrm{rhs}_M\sigma_i)$ in the case where $x$ gets resolved to either $\top$ or $\bot$. The symbol $\otimes$ denotes a node labelled by either $\wedge$ or $\vee$. The upper part of the images shows the nodes and edges that are about to be deleted (outlined in grey), and the lower part shows how the information is propagated through the graph (dashed lines).
The above graph manipulations permit an efficient implementation of Algorithm 1. In our complexity analysis, we assume that the sets of state pairs involved provide constant-time insertion and deletion. This is an idealised view of the matter. In practice, one can represent such a set as a hash table indexed by pairs of states. With this implementation, both set operations essentially take constant time, as long as the hash table is sufficiently large to make hash collisions rare (see [10, p. 207]).
Proof The initialisation of $\sigma_0$ in Algorithm 1 can be done in $O(n^2)$, whereupon the algorithm proceeds to call Algorithm 2, which is called $O(n^2)$ times in total over the entire execution of Algorithm 1.
Let us look more closely at the body of Algorithm 2 on input $x$ and $S$. To satisfy the existence clause in the while loop of Algorithm 2, the algorithm needs to decide which variable to resolve next. To do this, it finds the left-most leaf (i.e., a node with no outgoing edges) in the DAG representation of $w_x\sigma_i$. In other words, the algorithm follows the left-most path from the root downwards in $w_x\sigma_i$ (the top-most subfigure of Fig. 2 gives an idea of what such a path looks like). In total, over the entire run of Algorithm 1, the algorithm must in the worst case traverse each path from the root of $D(w_x)$ to a leaf in $D(w_x)$, for every $x \in X_M$. However, it only needs to go down each path once, even when the representations of the formulas $w_x$, $x \in X_M$, start to link up (as in the lower subfigure of Fig. 4). The reason is that after a call to equiv with arguments $x$ and $S$, the only variables left in $w_x\sigma_i$ are those in $S$, and these will all be resolved to $\top$ or $\bot$ at later stages of the algorithm, at which points the DAG representing $w_x\sigma_i$ will be simplified until it finally has a decided Boolean value. The cost of these simplifications is discussed in the next paragraph, and the cost of deciding which variable to resolve next totals $|D(\mathrm{rhs}_M)| = O(n^2 r^2 |\Sigma|)$ when summed over every call to Algorithm 2.
The update to $\sigma_i$ on Line 5 is where the majority of the work is done. First, the local substitution $w_x\sigma_i\{x \leftarrow \top\}$ on Line 5 requires $O(r^2|\Sigma|)$ steps. As argued above, there are two cases:
- If $w_x\sigma_i\{x \leftarrow \top\}$ resolves to $\top$ or $\bot$, we start at the node labelled $x$ and follow all edges in $D(\mathrm{rhs}_M\sigma_i)$ backwards, updating the graph structure to reflect the truth value of $x$. Every time we follow an edge, we are able to simplify $D(\mathrm{rhs}_M\sigma_i)$ by removing at least one edge and one node. Hence the total amount of work done at Line 5 is in $O(|D(\mathrm{rhs}_M\sigma_i)|) = O(n^2 r^2 |\Sigma|)$ when summed over every call to Algorithm 2.
- If, instead, $w_x\sigma_i\{x \leftarrow \top\}$ is not resolved, then the update of $\sigma_i$ to $\sigma_{i+1}$ takes only constant time, as it involves the redirection of a single pointer.
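The backward propagation in the first case can be illustrated with a small sketch. The `Node` class, `link` and `resolve` are hypothetical names, and the sketch assumes a DAG of `"and"`/`"or"` gates over `"var"` leaves with explicit parent pointers. Each backward step deletes the edge it follows, which is the property that bounds the total work by the size of the DAG. Here `list.remove` costs time linear in the node's degree; an implementation aiming for the stated bound would need constant-time edge deletion, for example via doubly linked adjacency lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity equality, so list.remove deletes the exact edge
class Node:
    op: str                       # "and", "or" or "var" (hypothetical encoding)
    name: Optional[str] = None    # variable name for "var" leaves
    children: List["Node"] = field(default_factory=list)
    parents: List["Node"] = field(default_factory=list)
    value: Optional[bool] = None  # None while undecided


def link(parent: Node, child: Node) -> None:
    """Add the edge parent -> child in both directions."""
    parent.children.append(child)
    child.parents.append(parent)


def resolve(node: Node, value: bool) -> None:
    """Fix node's truth value and propagate it backwards along parent edges."""
    if node.value is not None:
        return
    node.value = value
    for parent in list(node.parents):
        # Delete the edge parent -> node; every edge is deleted at most once.
        parent.children.remove(node)
        node.parents.remove(parent)
        if parent.value is not None:
            continue  # parent was already decided by another child
        absorbing = (parent.op == "and" and not value) or (parent.op == "or" and value)
        if absorbing:
            # A false conjunct or a true disjunct decides the gate immediately.
            resolve(parent, value)
        elif not parent.children:
            # All children were neutral: an empty AND is true, an empty OR is false.
            resolve(parent, parent.op == "and")
```

For the formula $x \wedge (y \vee z)$, resolving $y$ to true decides the disjunction and removes it from the conjunction, which stays undecided until $x$ is resolved.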
Recall that bisimulation minimization coincides with classical minimization in the case of dfas. Since we always have $r \le 1$ for such devices, we arrive at Corollary 1. This means that the running time of our algorithm, when applied to deterministic automata, is comparable to that of the algorithm by Watson and Daciuk [24].
Corollary 1 Algorithm 1 minimizes any dfa in $O(n^2 |\Sigma|)$.
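The accounting behind these bounds can be collected in one place. The derivation below is a sketch under stated assumptions: $\delta(p, a)$ denotes the set of $a$-successors of $p$; $w_{(p,q)}$ is assumed to contain, for each $a \in \Sigma$, a constant amount of structure plus one literal per pair $(p', q') \in \delta(p,a) \times \delta(q,a)$ in each direction; and Line 5 is assumed to run $O(n^2)$ times in total.

```latex
\begin{align*}
  |D(\mathrm{rhs}_M)|
    &= O\Big(\sum_{(p,q) \in X_M} \sum_{a \in \Sigma}
         \big(1 + |\delta(p,a)|\,|\delta(q,a)|\big)\Big)
     = O\big(n^2 (1 + r^2)\,|\Sigma|\big), \\
  T &= \underbrace{O(n^2)}_{\text{init.\ of } \sigma_0}
     + \underbrace{O(n^2) \cdot O(r^2 |\Sigma|)}_{\text{local substitutions}}
     + \underbrace{O(|D(\mathrm{rhs}_M)|)}_{\text{leaf search}}
     + \underbrace{O(|D(\mathrm{rhs}_M)|)}_{\text{backward propagation}}
     = O\big(n^2 (1 + r^2)\,|\Sigma|\big).
\end{align*}
```

For $r \ge 1$ this is $O(n^2 r^2 |\Sigma|)$, and for a dfa, where $r \le 1$, it collapses to $O(n^2 |\Sigma|)$.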

Lazy evaluation and heuristic improvements
We argued from the outset that one advantage of the aggregation approach is that intermediate solutions are also language-equivalent to the original automaton. Let us now show that every time the call hierarchy returns to the level of Algorithm 1 (i.e., the function minimize), every variable $x \in X_M$ on which equiv has been called, either from Algorithm 1 or through a recursive call from equiv itself, has also been resolved to $\top$ or $\bot$.

Lemma 10 Throughout the computation, $\mathrm{var}(\sigma_i) \subseteq S$.
Proof The proof is by induction on the index of $\sigma_i$. When Algorithm 2 is invoked for the first time, $\mathrm{var}(\sigma_0) = \emptyset$, so the statement is true.
In the construction of $\sigma_{i+1}$, we know from the induction hypothesis that $\mathrm{var}(\sigma_i) \subseteq S$. The modification of $\sigma_i$ consists of substituting $w_x\sigma_i\{x \leftarrow \top\}$ for $x$. The right-hand side $w_x$ only contains variables that are in $S$ or in $\mathrm{dom}(\sigma_i)$, so by the induction hypothesis $\mathrm{var}(w_x\sigma_i) \subseteq S$. Finally, all occurrences of $x \in \mathrm{var}(w_x\sigma_i)$ are replaced by $\top$, so $x$ can safely be removed from $S$ when the function exits, without violating the statement of Lemma 10.

Theorem 5
Every time the process control returns to minimize, every pair $x$ on which equiv has been called is resolved.
Proof Every time the call hierarchy returns to minimize, the stack $S$ is empty. Let $\sigma_i$ be the state of the substitution when this happens. By Lemma 10 and the argument in its proof, $\mathrm{var}(w_x\sigma_i) \subseteq S = \emptyset$, so $w_x\sigma_i$ is resolved.
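Theorem 5 is what allows intermediate results to be put to use: whenever control is back in minimize, the pairs resolved to $\top$ can already be merged. The sketch below shows such a merge step with a union-find structure. It assumes, as this section argues, that every pair resolved to $\top$ at that point is bisimulation-equivalent. The function name `merge_equivalent` and the encoding of an automaton as sets of states, `(source, symbol, target)` triples, initial states and final states are illustrative choices, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable
Transition = Tuple[State, str, State]


def merge_equivalent(states: Iterable[State],
                     transitions: Iterable[Transition],
                     initials: Iterable[State],
                     finals: Iterable[State],
                     equal_pairs: Iterable[Tuple[State, State]]):
    """Quotient an nfa by the equivalence generated by equal_pairs (hypothetical helper)."""
    parent: Dict[State, State] = {s: s for s in states}

    def find(s: State) -> State:
        # Path halving keeps the union-find trees shallow.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for p, q in equal_pairs:
        rp, rq = find(p), find(q)
        if rp != rq:
            parent[rq] = rp  # merge the two classes

    # Map every state to the representative of its class.
    rep = {s: find(s) for s in list(parent)}
    new_states: Set[State] = set(rep.values())
    new_transitions = {(rep[s], a, rep[t]) for s, a, t in transitions}
    new_initials = {rep[s] for s in initials}
    new_finals = {rep[s] for s in finals}
    return new_states, new_transitions, new_initials, new_finals
```

Merging states 1 and 2 of the automaton with transitions `(0,'a',1)`, `(0,'a',2)`, `(1,'b',3)` and `(2,'b',3)` leaves three states and two transitions, since the duplicated transitions collapse.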
Finally, we note that Algorithm 1 can be extended to add $(p, q) \to \bot$ to $\sigma_0$ for each pair of states $p$ and $q$ such that the set of symbols labelling the transitions from $p$ differs from the corresponding set for $q$. This technique does not affect the asymptotic running time of the algorithm, but may be of practical value.
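The extension of $\sigma_0$ described above can be sketched in a few lines. The function name `initial_bottoms`, the encoding of transitions as `(source, symbol, target)` triples and the use of `False` for $\bot$ are illustrative assumptions. The sketch compares the label sets of all $n^2$ ordered pairs, which takes $O(n^2 |\Sigma|)$ time and thus stays within the overall bound.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

State = Hashable
Transition = Tuple[State, str, State]


def initial_bottoms(states: Iterable[State],
                    transitions: Iterable[Transition]) -> Dict[Tuple[State, State], bool]:
    """Map every pair whose outgoing label sets differ to False (standing for bottom)."""
    state_list: List[State] = list(states)
    # Collect the set of symbols labelling the outgoing transitions of each state.
    labels: Dict[State, Set[str]] = {s: set() for s in state_list}
    for source, symbol, _target in transitions:
        labels[source].add(symbol)
    return {
        (p, q): False
        for p in state_list
        for q in state_list
        if labels[p] != labels[q]
    }
```

Pairs that this pre-pass marks as $\bot$ need no evaluation of their characteristic formula.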

Conclusion
We have presented a minimization algorithm for nfas that identifies and merges bisimulation-equivalent states. In terms of running time, it is as efficient as any existing aggregation-based minimisation algorithm for dfas, but less efficient than the fastest refinement-based minimisation algorithms for nfas, which are derived from Hopcroft's algorithm. However, compared to the latter group, it has the advantage that intermediate solutions are usable for language-preserving reduction of the input automaton $M$.
The algorithm is the first to compute the coarsest bisimulation relation on $M$ through partition aggregation. The logical framework used for representation and computation also appears to be new for this application. For this reason, the investigation of optimization techniques similar to those used in SAT solvers is an interesting future endeavour. Furthermore, the generalization of the algorithm to, for example, nondeterministic tree automata could be considered.
A disadvantage of the proposed algorithm is that it builds the entire characteristic formula for the input machine before the actual evaluation starts. In cases where the question of state similarity can be settled after considering only a fraction of the formula, building the rest is wasted work. We are therefore interested in approaches that assemble the clauses of the characteristic formula as they are needed. The updated algorithm would likely have the same worst-case complexity as the original one, but may perform better in the average case. Average-case complexity is in itself a relevant subject of study. Recently, Bassino et al. [4] and David [11] published results on the average-case complexity of Hopcroft's and Moore's well-known algorithms, and a similar analysis could shed further light on Algorithm 1.