What’s Wrong with On-the-Fly Partial Order Reduction

Partial order reduction and on-the-fly model checking are well-known approaches for improving model checking performance. The two optimizations interact in subtle ways, so care must be taken when using them in combination. A standard algorithm combining the two optimizations, published over twenty years ago, has been widely studied and deployed in popular model checking tools. Yet the algorithm is incorrect. Counterexamples were discovered using the Alloy analyzer. A fix for a restricted class of property automata is proposed.


Introduction
Partial order reduction (POR) refers to a family of model checking techniques used to reduce the size of the state space that must be explored when verifying a property of a program. The techniques vary, but all share the core observation that when two independent operations are enabled in a state, it is often safe to ignore traces that begin with one of them. A large number of POR techniques have been explored, differing in details such as the range of properties to which they apply. This paper focuses on ample set POR [4], an approach which applies to stutter-invariant properties and is used in the model checker Spin [8].
In the automata-theoretic view of model checking, the negation of the property to be verified is represented by an ω-automaton. The basic algorithm computes the product of this automaton with the state space of the program. The language of the product is empty if and only if the program cannot violate the property. On-the-fly model checking refers to an optimization of this basic algorithm in which the enumeration of the reachable program states, computation of the product, and language emptiness check are interleaved, rather than occurring in sequence.
These two optimizations must be combined with care, because they interact in subtle ways. A standard algorithm for on-the-fly ample set POR is described in [12] and in further detail in [13]. I shall refer to this algorithm as the combined algorithm. Theorem 4.2 of [13] asserts the soundness of the combined algorithm. A proof of the theorem is also given in [13].
The proof has a gap. This was pointed out in [16, Sect. 5], with details in [15]. The gap was rediscovered in the course of developing mechanized correctness proofs for model checking algorithms; an explicit counterexample to the incorrect proof step was also found ([2, Sect. 8.4.5] and [3, Sect. 5]). The fact that the proof is erroneous, however, does not imply the theorem is wrong. To the best of my knowledge, no one has yet produced a proof or a counterexample for the soundness of the combined algorithm.
In this paper, I show that the combined algorithm is not sound; a counterexample is given in Sect. 3.1. I found this counterexample by modeling the combined algorithm in Alloy and using the Alloy analyzer [11] to check its soundness. Sect. 4 describes this model. Spin's POR is based on the combined algorithm, and in Sect. 5, Spin is seen to return an incorrect result on a Promela model derived from the theoretical counterexample.
There is a small adjustment to the combined algorithm, yielding an algorithm that is arguably more natural and that returns the correct result on the previous counterexample; this is described in Sect. 6. It turns out this one is also unsound, as demonstrated by another Alloy-produced counterexample. However, in Sect. 7, I show that this variation is sound if certain restrictions are placed on the property automaton.

Preliminaries
Definition 1. A finite state program is a triple $P = \langle T, Q, \iota\rangle$, where $Q$ is a finite set of states, $\iota \in Q$ is the initial state, and $T$ is a finite set of operations. Each operation $\alpha \in T$ is a function from a set $en_\alpha \subseteq Q$ to $Q$. Fix a finite state program $P = \langle T, Q, \iota\rangle$.

Definition 4. A Büchi automaton is a tuple $B = \langle S, \Delta, \Sigma, \delta, F\rangle$, where $S$ is a finite set of automaton states, $\Delta \subseteq S$ is the set of initial states, $\Sigma$ is a finite set called the alphabet, $\delta \subseteq S \times \Sigma \times S$ is the transition relation, and $F \subseteq S$ is the set of accepting states. The language of $B$, denoted $L(B)$, is the set of all $\xi \in \Sigma^\omega$ generated by infinite paths in $B$ that start in an initial state and pass through an accepting state infinitely often.
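Fix a finite set $AP$ of atomic propositions and let $\Sigma = 2^{AP}$. Fix an interpretation mapping for $P$, i.e., a function $L : Q \to \Sigma$.

Definition 5. The language of $P$, denoted $L(P)$, is the set of all infinite words $L(q_0)L(q_1)\cdots \in \Sigma^\omega$, where $q_0 q_1 \cdots$ is the sequence of states generated by an execution of $P$.

Definition 6. A language $L \subseteq \Sigma^\omega$ is stutter-invariant if, for any $a_0, a_1, \ldots \in \Sigma$ and positive integers $i_0, i_1, \ldots$, we have $a_0 a_1 \cdots \in L \Leftrightarrow a_0^{i_0} a_1^{i_1} \cdots \in L$, where $a^i$ denotes the concatenation of $i$ copies of $a$.

Definition 7. Let $B = \langle S, \Delta, \Sigma, \delta, F\rangle$ be a Büchi automaton with alphabet $\Sigma$. The product of $P$ and $B$ is the Büchi automaton $P \otimes B$ with states $Q \times S$, initial states $\{\iota\} \times \Delta$, alphabet $T \times \Sigma$, and accepting states $Q \times F$, which has a transition from $\langle q, s\rangle$ to $\langle \alpha(q), s'\rangle$ labeled $\langle \alpha, L(q)\rangle$ whenever $q \in en_\alpha$ and $\langle s, L(q), s'\rangle \in \delta$.

Note 1. Consider a product state $x = \langle q, s\rangle$. One can think of a transition from $x$ as two steps. First, a property transition $s \xrightarrow{L(q)} s'$ in $B$ executes, leading to an "intermediate state" $x' = \langle q, s'\rangle$. Then a program transition $q \xrightarrow{\alpha} q'$ executes, culminating in $y = \langle q', s'\rangle$. While this is a good mental model, the product automaton does not necessarily contain a transition from $x$ to $x'$ or from $x'$ to $y$. The intermediate state $x'$ is not even necessarily reachable in the product. The transition in the product goes directly from $x$ to $y$ with label $\langle \alpha, L(q)\rangle$.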
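As an illustration of Definition 7 and Note 1, the reachable part of the product can be enumerated directly by a worklist search, without ever materializing intermediate states. The Python sketch below is only a sketch: the data representation (each operation as a dict from the states of $en_\alpha$ to successor states, labels as frozensets, $\delta$ as a set of triples) and the function name `product_transitions` are assumptions, not taken from the paper. The usage encodes the program and automaton $B_1$ of Sect. 3.1, assuming that $\alpha$ is a self-loop enabled at both A and B.

```python
def product_transitions(ops, labels, delta, initial, iota):
    """Enumerate the reachable part of the product P ⊗ B (Definition 7).

    ops:     dict operation name -> dict {q: successor}; its keys form en_alpha
    labels:  dict program state -> frozenset of atomic propositions, i.e. L(q)
    delta:   set of Büchi transitions (s, sigma, s2)
    initial: set of initial Büchi states (Delta)
    iota:    initial program state
    Returns (reachable product states, set of labeled product transitions).
    """
    init = {(iota, s) for s in initial}
    seen, work, trans = set(init), list(init), set()
    while work:
        q, s = work.pop()
        sigma = labels[q]
        for (s1, a, s2) in delta:
            if s1 != s or a != sigma:
                continue  # the property step must read L(q) from s
            for name, fn in ops.items():
                if q in fn:  # q is in en_alpha for this operation
                    target = (fn[q], s2)
                    trans.add(((q, s), (name, sigma), target))
                    if target not in seen:
                        seen.add(target)
                        work.append(target)
    return seen, trans


# Usage: the program and automaton B_1 of Sect. 3.1 (alpha assumed enabled at A and B).
NO_P, HAS_P = frozenset(), frozenset({"p"})
ops = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
labels = {"A": NO_P, "B": HAS_P}
delta = {(0, NO_P, 0), (0, NO_P, 1), (1, HAS_P, 1)}
states, trans = product_transitions(ops, labels, delta, {0}, "A")
print(len(states), len(trans))  # 4 5
```

The counts agree with the 4 states and 5 transitions that Sect. 5 reports for the full space of Fig. 1 (right).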
It is well known that $L(P \otimes B)$ is empty if, and only if, $L(P) \cap L(B)$ is empty. In the context of model checking, $B$ is used to represent the negation of a desirable property; the program $P$ satisfies the property if, and only if, no execution of $P$ is accepted by $B$, i.e., $L(P) \cap L(B) = \emptyset$. The automaton $B$ may be generated from a (negated) LTL formula, but that assumption is not needed here.
The goal of "offline" (not on-the-fly) partial order reduction is to generate some subspace $P'$ of $P$ with the guarantee that $L(P) \cap L(B) = \emptyset \Leftrightarrow L(P') \cap L(B) = \emptyset$. The emptiness of $L(P' \otimes B)$, and hence of $L(P') \cap L(B)$, can be decided in various ways, such as a nested depth-first search (NDFS) [5].
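For concreteness, the following is a minimal Python sketch of the classic nested depth-first search; it is a generic textbook version, not the code of any particular tool. The successor function `succ` is only called when a state is actually reached, which is the essence of on-the-fly exploration. The names `nested_dfs`, `succ`, and `accepting` are hypothetical.

```python
def nested_dfs(init, succ, accepting):
    """Return True iff some accepting state reachable from init lies on a cycle."""
    visited, flagged = set(), set()

    def inner(seed, x):
        # Second search: look for a path from x back to the accepting seed.
        for y in succ(x):
            if y == seed:
                return True
            if y not in flagged:
                flagged.add(y)
                if inner(seed, y):
                    return True
        return False

    def outer(x):
        # First search; an inner search starts when an accepting state is
        # finished (post-order), which is what makes sharing `flagged` safe.
        visited.add(x)
        for y in succ(x):
            if y not in visited and outer(y):
                return True
        return accepting(x) and inner(x, x)

    return any(outer(x) for x in init if x not in visited)


graph = {0: [1], 1: [2], 2: [1]}
print(nested_dfs([0], lambda x: graph[x], lambda x: x == 2))  # True: cycle 1 -> 2 -> 1
print(nested_dfs([0], lambda x: graph[x], lambda x: x == 0))  # False: 0 is on no cycle
```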

On-the-Fly Partial Order Reduction
In on-the-fly model checking, the state space of the product automaton is enumerated directly, without first enumerating the program states. Adding POR to the mix means that at each state reached in the product automaton, some subset of enabled transitions will be explored. The goal is to ensure that if the language of the full product automaton is nonempty, then the language of the resulting reduced automaton must be nonempty.
To make this precise, fix a finite state program $P = \langle T, Q, \iota\rangle$, a set $AP$ of atomic propositions, an interpretation $L : Q \to \Sigma = 2^{AP}$, and a Büchi automaton $B = \langle S, \Delta, \Sigma, \delta, F\rangle$. Let $A = P \otimes B$.
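An ample selector determines a subautomaton $A' = \mathrm{reduced}(A, amp)$ of $A$: $A'$ is defined exactly as in Definition 7, except that the transition relation has the additional restriction that $\alpha \in amp(q, s')$. That is, a transition from $\langle q, s\rangle$ to $\langle \alpha(q), s'\rangle$ labeled $\langle \alpha, L(q)\rangle$ is kept only when $\alpha \in amp(q, s')$.

Definition 9. An ample selector $amp$ is POR-sound if the following holds: $L(A) \neq \emptyset \Rightarrow L(\mathrm{reduced}(A, amp)) \neq \emptyset$.

The goal is to define some constraints on an ample selector that guarantee it is POR-sound. Before stating the constraints, we need two more concepts.

Definition 10. An independence relation is an irreflexive and symmetric relation $I \subseteq T \times T$ satisfying the following: if $(\alpha, \beta) \in I$ and $q \in en_\alpha \cap en_\beta$, then $\alpha(q) \in en_\beta$, $\beta(q) \in en_\alpha$, and $\alpha(\beta(q)) = \beta(\alpha(q))$.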
Fix an independence relation $I$. We say $\alpha$ and $\beta$ are dependent if $(\alpha, \beta) \notin I$.

Definition 11. An operation $\alpha \in T$ is invisible with respect to $L$ if, for all $q \in en_\alpha$, $L(q) = L(\alpha(q))$.

Note 2. The definition in [13] is slightly different. Given an LTL formula $\varphi$ over $AP$, let $AP'$ be the set of atomic propositions occurring syntactically in $\varphi$. The definition in [13] says $\alpha$ is invisible in $\varphi$ if, for all $p \in AP'$ and $q \in en_\alpha$, $p \in L(q) \Leftrightarrow p \in L(\alpha(q))$. However, there is no loss of generality in using Definition 11, since one can define a new interpretation $L' : Q \to 2^{AP'}$ by $L'(q) = L(q) \cap AP'$. Then $\alpha$ is invisible for $\varphi$ if, and only if, $\alpha$ is invisible with respect to $L'$, and the results of this paper can be applied without modification to $P$, $AP'$, and $L'$.
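For an explicitly given program, Definitions 10 and 11 are finite checks. The sketch below assumes each operation is a dict mapping the states of $en_\alpha$ to their successors (an illustrative representation, not the paper's); `independent` and `invisible` are hypothetical names. Because the conditions of Definition 10 are stated per pair and are symmetric in $\alpha$ and $\beta$, the set of all pairs passing `independent` is itself an independence relation, namely the largest one.

```python
def independent(ops, a, b):
    """Check the conditions of Definition 10 for the pair (a, b)."""
    if a == b:
        return False  # an independence relation is irreflexive
    fa, fb = ops[a], ops[b]
    for q in fa.keys() & fb.keys():           # q in en_a ∩ en_b
        qa, qb = fa[q], fb[q]
        if qa not in fb or qb not in fa:      # need a(q) in en_b and b(q) in en_a
            return False
        if fb[qa] != fa[qb]:                  # need b(a(q)) == a(b(q))
            return False
    return True


def invisible(ops, labels, a):
    """Definition 11: executing a never changes the label L(q)."""
    return all(labels[q] == labels[q2] for q, q2 in ops[a].items())


# The two-state program of Sect. 3.1 (alpha assumed enabled at A and B).
ops = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
labels = {"A": frozenset(), "B": frozenset({"p"})}
print(independent(ops, "alpha", "beta"))   # True
print(invisible(ops, labels, "alpha"))     # True
print(invisible(ops, labels, "beta"))      # False
```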
We now define the following constraints on an ample selector $amp$:
C0 For all $q \in Q$, $s \in S$: $en(q) \neq \emptyset \Rightarrow amp(q, s) \neq \emptyset$.
C1 For all $q \in Q$, $s \in S$: in any admissible sequence in $P$ starting from $q$, no operation in $T \setminus amp(q, s)$ that is dependent on an operation in $amp(q, s)$ can occur before some operation in $amp(q, s)$ occurs.
C2 For all $q \in Q$, $s \in S$: if $amp(q, s) \neq en(q)$, then every $\alpha \in amp(q, s)$ is invisible.
C3 There is a depth-first search of $A' = \mathrm{reduced}(A, amp)$ with the following property: whenever there is a transition in $A'$ from a node $\langle q, s\rangle$ on the top of the stack to a node $\langle q', s'\rangle$ on the stack, $amp(q, s') = en(q)$.
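Conditions C0 and C2 refer to a single program state, so they can be checked by scanning the table of an ample selector. A minimal sketch, assuming the selector is a dict from pairs $(q, s)$ to sets of operation names and operations are dicts from enabled states to successors; C1 and C3 require exploring paths and are not covered by this function. The function names are hypothetical, and the usage takes the ample selector of Sect. 3.1 (Fig. 1).

```python
def enabled(ops, q):
    """en(q): the operations whose domain contains q."""
    return {a for a, fn in ops.items() if q in fn}


def is_invisible(ops, labels, a):
    return all(labels[q] == labels[q2] for q, q2 in ops[a].items())


def check_c0_c2(ops, labels, amp):
    """Check C0 and C2 for every entry amp[(q, s)] of the selector."""
    for (q, _s), chosen in amp.items():
        en = enabled(ops, q)
        if en and not chosen:                     # C0 violated
            return False
        if chosen != en and not all(is_invisible(ops, labels, a) for a in chosen):
            return False                          # C2 violated
    return True


ops = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
labels = {"A": frozenset(), "B": frozenset({"p"})}
amp = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}
print(check_c0_c2(ops, labels, amp))                          # True
print(check_c0_c2(ops, labels, {**amp, ("A", 1): {"beta"}}))  # False: beta is visible
```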
Condition C3 is the interesting one. The combined algorithm of [13] enforces it using a DFS (the outer search of the NDFS) of the reduced space and the following protocol: given a new state $\langle q, s\rangle$ that has just been pushed onto the stack, first iterate over all Büchi transitions $\langle s, L(q), s'\rangle$ departing from $s$ and labeled by $L(q)$. For each of these, a candidate ample set for $amp(q, s')$ that satisfies the first three conditions is computed; this computation does not depend on $s'$. If any operation in that candidate set leads back to a state on the search stack (a "back edge"), a different candidate is tried, and the process is repeated until a satisfactory one is found. If no such candidate is found, $en(q)$ is used as the ample set.
Hence the process for choosing the ample set depends on the current state of the search. If $y_1 \neq y_2$, it is not necessarily the case that $amp(x, y_1) = amp(x, y_2)$, because it is possible that when $\langle x, y_1\rangle$ was encountered, a back edge existed for a candidate, but when $\langle x, y_2\rangle$ was encountered, there was no back edge.
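The state dependence just described can be made concrete with a small model of the back-edge test. In the hypothetical `choose_ample` below, candidate generation is abstracted away (candidates are passed in, already assumed to satisfy C0-C2), and only the back-edge part of the protocol is modeled; it is a sketch of the idea, not the code of [13] or of Spin. With the program of Sect. 3.1, the same candidate $\{\alpha\}$ is rejected for the intermediate state $\langle A, 0\rangle$, whose $\alpha$-successor is on the stack, and accepted for $\langle A, 1\rangle$.

```python
def choose_ample(ops, q, s2, candidates, on_stack):
    """Pick amp(q, s2) by the back-edge protocol (a hypothetical rendering).

    ops:        dict op -> {state: successor}
    candidates: sets of operations assumed to satisfy C0-C2 at q; they do not
                depend on s2
    on_stack:   product states currently on the outer DFS stack
    """
    en_q = {a for a, fn in ops.items() if q in fn}
    for cand in candidates:
        # A back edge: some operation in cand leads from the intermediate
        # state (q, s2) to a product state that is on the stack.
        if all((ops[a][q], s2) not in on_stack for a in cand):
            return cand
    return en_q  # no candidate avoids a back edge: fall back to en(q)


ops = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
stack = {("A", 0)}  # (A, 0) has just been pushed
print(sorted(choose_ample(ops, "A", 0, [{"alpha"}], stack)))  # ['alpha', 'beta']
print(sorted(choose_ample(ops, "A", 1, [{"alpha"}], stack)))  # ['alpha']
```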

Counterexample
Theorem 4.2 of [13] can be expressed as follows: if $L(B)$ is stutter-invariant and the language of an LTL formula, and $amp$ satisfies C0-C3, then $amp$ is POR-sound. A counterexample to this claim is given in Fig. 1. The program consists of two states, A and B, and two operations, $\alpha$ and $\beta$. There is a single atomic proposition, $p$, which is false at A and true at B. Note that $\alpha$ and $\beta$ are independent. Also, $\alpha$ is invisible, and $\beta$ is not.
The property automaton, $B_1$, is shown in Fig. 1 (center top). It has two states, numbered 0 and 1. State 1 is the sole accepting state. The language consists of all infinite words of the following form: a finite nonempty prefix of $\emptyset$s followed by an infinite sequence of $\{p\}$s. This language is stutter-invariant, and is the language of the LTL formula $(\neg p) \wedge ((\neg p)\, \mathsf{U}\, \mathsf{G}\, p)$.
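The ample selector is specified by the table (center bottom). Notice that $amp(A, 1) \neq en(A)$, but the other three ample sets are full. C0 holds because the ample sets are never empty. C1 holds because $\beta$ is independent of $\alpha$. C2 holds because $\alpha$ is invisible. The reachable product space is shown in Fig. 1 (right). In any DFS of $\mathrm{reduced}(A, amp)$, the only back edge is the self-loop on A0 labeled $\langle \alpha, \emptyset\rangle$. Since $amp(A, 0)$ is full, C3 holds. Yet there is an accepting path in the full space (the transition from A0 to B1 followed by the self-loop on B1 forever), but not in the reduced space: that first transition executes $\beta$ from the intermediate state $\langle A, 1\rangle$, and $\beta \notin amp(A, 1)$.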
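The counterexample is small enough to check mechanically. The sketch below uses explicit-state Python rather than the Alloy model of Sect. 4, and its helper names are hypothetical. It builds the full product and the V0 reduced product, in which a transition from $\langle q, s\rangle$ through the intermediate state $\langle q, s'\rangle$ is kept only if the operation is in $amp(q, s')$, and tests each for a reachable accepting cycle. It assumes, consistent with the Promela encoding of Sect. 5, that $\alpha$ is also enabled at B.

```python
NO_P, HAS_P = frozenset(), frozenset({"p"})
OPS = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}   # op -> {q: op(q)}
LABELS = {"A": NO_P, "B": HAS_P}                              # p false at A, true at B
DELTA = {(0, NO_P, 0), (0, NO_P, 1), (1, HAS_P, 1)}           # automaton B_1
ACCEPTING = {1}
AMP = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}


def successors(x, keep):
    """Product successors of x = (q, s); keep(q, s, s2, op) filters transitions."""
    q, s = x
    return {(fn[q], s2)
            for (s1, sigma, s2) in DELTA if s1 == s and sigma == LABELS[q]
            for op, fn in OPS.items() if q in fn and keep(q, s, s2, op)}


def reachable(starts, keep):
    seen, work = set(starts), list(starts)
    while work:
        for y in successors(work.pop(), keep):
            if y not in seen:
                seen.add(y)
                work.append(y)
    return seen


def has_accepting_cycle(keep, init=("A", 0)):
    """True iff some reachable accepting product state lies on a cycle."""
    return any(x[1] in ACCEPTING and x in reachable(successors(x, keep), keep)
               for x in reachable([init], keep))


full = lambda q, s, s2, op: True
v0 = lambda q, s, s2, op: op in AMP[(q, s2)]  # ample set of the intermediate state (q, s2)
print(has_accepting_cycle(full))  # True
print(has_accepting_cycle(v0))    # False
```

In the reduced space, A1 is accepting but has no successors, so no accepting cycle remains.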

Alloy Models of POR Schemes
Alloy is a "lightweight formal methods" language and tool. It has been used in a wide variety of contexts, from exploring software designs to studying weak memory-consistency models. An Alloy model specifies signatures, each of which defines a type, relations on signatures, and constraints on the signatures and relations. Constraints are expressed in a logic that combines elements of first order logic and relational logic, and includes a transitive closure operator. An instance of a model assigns a finite set of atoms to each signature, and a finite set of tuples (of the right type) to each relation, in such a way that the constraints are satisfied. The Alloy analyzer can be used to check that an assertion holds on all instances in which the sizes of the signatures are within some specified bounds. The analyzer converts the question of the validity of the assertion into a SAT problem and invokes a SAT solver. Based on the result, it reports either that the assertion holds within the given bounds, or it produces an instance of the model violating the assertion.
I developed an Alloy model to search for counterexamples to various POR claims, such as the one in Sect. 3.1. The model encodes the main concepts of the previous two sections, including program, operations, interpretation, invisibility and independence, property automaton, the product space, ample selectors and the constraints on them, and a language emptiness predicate. The model culminates in an assertion which states that an ample selector satisfying the four constraints is POR-sound.
I was not able to find a way to encode stutter-invariance. In the end, I developed a small set of Büchi automata based on my own intuition of what would make interesting tests. I encoded these in Alloy and used the analyzer to explore all possible programs and ample selectors for each.
The first part of the model, in file ba.als, is a simple encoding of a finite state automaton. The alphabet is some unconstrained set Sigma. The set of states is represented by signature BState. There is a single initial state, and any number of accepting states. Each transition has a source state, a destination state, and a label. Relations declared within a signature declaration have that signature as an implicit first argument. So, for example, src is a binary relation of type BTrans × BState. Furthermore, the relation is many-to-one: each transition has exactly one BState atom associated to it by the src relation.
The remaining concepts are incorporated into module por_v0. Its facts are constraints that any instance must satisfy; some of the facts are given names for readability. A pred declaration defines a (typed) predicate.
Most aspects of this model are self-explanatory; I will comment only on the less obvious features. The relations nextFull and nextReduced represent the next state relations in the full and reduced spaces, respectively. They are declared in ProdState, but specified completely in the final fact on lines 56-58. Strictly speaking, one could remove those predicates and substitute their definitions, but this seemed more convenient. Line 60 asserts that a product state is determined uniquely by its program and property components. Line 61 specifies the initial product state.
Line 59 insists that only states reachable (in the full space) from the initial state will be included in an instance (* is the reflexive transitive closure operator). Lines 62-64 specify the converse. Hence in any instance of this model, ProdState will consist of exactly the reachable product states in the full space.
The encoding of C1 is based on the following observation: given q ∈ Q and a set A of operations enabled at q, define r ⊆ Q × Q by removing from the program's next-state relation all edges labeled by operations in A. Then "no operation dependent on an operation in A can occur unless an operation in A occurs first" is equivalent to the statement that on any path from q using edges in r, all enabled operations encountered will either be in A or independent of every operation in A.
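The observation above translates directly into a search over the relation $r$. This Python sketch assumes operations are dicts from enabled states to successors and that independence is supplied as a predicate `indep`; it is not the Alloy encoding itself, and `satisfies_c1` is a hypothetical name.

```python
def satisfies_c1(ops, q, amp_set, indep):
    """Check C1 at q for the candidate ample set amp_set.

    Explores r (the program's edges minus those labeled by operations in
    amp_set) from q, and fails if some operation outside amp_set that is
    enabled along the way is dependent on an operation in amp_set.
    """
    seen, work = {q}, [q]
    while work:
        x = work.pop()
        for op, fn in ops.items():
            if x not in fn or op in amp_set:
                continue  # not enabled here, or an edge removed from r
            if not all(indep(op, b) for b in amp_set):
                return False  # a dependent operation could occur first
            y = fn[x]
            if y not in seen:
                seen.add(y)
                work.append(y)
    return True


ops = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
indep = lambda a, b: {a, b} == {"alpha", "beta"}
print(satisfies_c1(ops, "A", {"alpha"}, indep))               # True
print(satisfies_c1(ops, "A", {"alpha"}, lambda a, b: False))  # False
```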
Condition C3 is difficult to encode, in that it depends on specifying a depth-first search. I have replaced it with a weaker condition, which is similar to a well-known cycle proviso in the offline theory:
C3' In any cycle in $\mathrm{reduced}(A, amp)$, there is a transition from $\langle q, s\rangle$ to $\langle q', s'\rangle$ for which $amp(q, s') = en(q)$.

Equivalently: if one removes from the reduced product space all such transitions, then the resulting graph should have no cycles. This is the meaning of lines 50-54 (^ is the strict transitive closure operator).

The next step is to create tests for specific property automata. One such test is for the automaton $B_1$ of Fig. 1; it places upper bounds on the numbers of operations, program states, and product states while checking the soundness assertion. Using the Alloy analyzer to check this assertion results in a counterexample like the one in Fig. 1. The runtime is a fraction of a second. The Alloy instance uses two uninterpreted atoms for the elements of Sigma; I have simply substituted the sets $\emptyset$ and $\{p\}$ for them to produce Fig. 1. As we have seen, this counterexample happens to also satisfy the stronger constraint C3.
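The weakened cycle condition can likewise be checked on an explicit graph: delete the transitions whose ample set is full and test what remains for cycles. The sketch below uses Kahn's topological-sort algorithm for the cycle test and hard-codes the example of Sect. 3.1; it illustrates the condition and is not the Alloy encoding on lines 50-54. The helper names are hypothetical. Under the V0 semantics, the ample set governing a transition from $\langle q, s\rangle$ into property state $s'$ is $amp(q, s')$.

```python
from collections import deque

NO_P, HAS_P = frozenset(), frozenset({"p"})
OPS = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
LABELS = {"A": NO_P, "B": HAS_P}
DELTA = {(0, NO_P, 0), (0, NO_P, 1), (1, HAS_P, 1)}
EN = {"A": {"alpha", "beta"}, "B": {"alpha"}}


def v0_edges(amp, init=("A", 0)):
    """Edges of reduced(A, amp), each tagged with whether amp(q, s2) = en(q)."""
    edges, seen, work = [], {init}, [init]
    while work:
        q, s = work.pop()
        for (s1, sigma, s2) in DELTA:
            if s1 != s or sigma != LABELS[q]:
                continue
            for op, fn in OPS.items():
                if q in fn and op in amp[(q, s2)]:
                    y = (fn[q], s2)
                    edges.append(((q, s), y, amp[(q, s2)] == EN[q]))
                    if y not in seen:
                        seen.add(y)
                        work.append(y)
    return edges


def weak_c3_holds(amp):
    """Removing all edges with a full ample set must leave an acyclic graph."""
    kept = [(x, y) for (x, y, full) in v0_edges(amp) if not full]
    nodes = {n for edge in kept for n in edge}
    indeg = {n: 0 for n in nodes}
    for _, y in kept:
        indeg[y] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    removed = 0
    while queue:  # Kahn's algorithm: acyclic iff every node gets removed
        n = queue.popleft()
        removed += 1
        for x, y in kept:
            if x == n:
                indeg[y] -= 1
                if indeg[y] == 0:
                    queue.append(y)
    return removed == len(nodes)


AMP = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}
print(weak_c3_holds(AMP))                           # True
print(weak_c3_holds({**AMP, ("A", 0): {"alpha"}}))  # False: self-loop on A0 survives
```

Restricting $amp(A, 0)$ to $\{\alpha\}$ makes the self-loop on A0 a transition with a non-full ample set, so that loop survives the deletion and the condition fails.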

Spin
The POR algorithm used by Spin is described in [10] and is similar to the combined algorithm. We can see what Spin actually does by encoding examples in Promela and executing Spin with and without POR. In the Promela encoding of the example of Fig. 1, transition $\alpha$ corresponds to the assignment x = 0, where x is a variable local to p1, and transition $\beta$ corresponds to the assignment p = 1, where p is a shared variable. Applying Spin with the following commands allows one to see the structure of the program graphs for each process, as well as each step in the search of the full space:
spin -a test1.pml; cc -o pan -DCHECK -DNOREDUCE pan.c; ./pan -d; ./pan -a
I did this with Spin version 6.4.9, the latest stable release. The output indicates that 4 states and 5 transitions are explored and one state is matched, exactly as in Fig. 1 (right). As expected, the output also reports a violation: a path to an accepting cycle that corresponds to the transition from A0 to B1 followed by the self-loop on B1 repeated forever.
Repeat this experiment without the -DNOREDUCE flag, however, and Spin finds no errors. The output indicates that it misses the transition from A0 to B1.

Ignoring the Intermediate States
An interesting aspect of the combined algorithm is that the ample set is a function of an intermediate state. I.e., given a product state $x = \langle q, s\rangle$, the ample set is determined by the intermediate state $x' = \langle q, s'\rangle$ obtained after executing a property transition. This introduces a difference between the on-the-fly scheme and offline schemes, where there is no notion of intermediate state. It also introduces other complexities. For example, it is possible that $x'$ was reached earlier in the search through some other state $\langle q, s_2\rangle$, because of a property transition $s_2 \xrightarrow{L(q)} s'$. How does the algorithm guarantee that the ample set selected for $x'$ will be the same as the earlier choice? This issue is not addressed in [13] or [10].
These problems go away if one simply makes the ample set a function of the source product state x. The intermediate states do not have to play a role.
Specifically, given an ample selector $amp$, define $\mathrm{reduced}_2(A, amp)$ exactly as $\mathrm{reduced}(A, amp)$, except that the condition "$\alpha \in amp(q, s')$" on transitions is replaced with "$\alpha \in amp(q, s)$". Perform the same substitution in C3 and call the resulting condition $C3_1$. The weaker version of $C3_1$ is simply: in any cycle in $\mathrm{reduced}_2(A, amp)$ there is a state $\langle q, s\rangle$ with $amp(q, s) = en(q)$.
Conditions C0-C2 are unchanged. I refer to this scheme as V1, and to the original combined algorithm as V0. The Alloy model of V0 in Sect. 4 can be easily modified to represent V1. Using V1, the example of Fig. 1 is no longer a counterexample. In fact, Alloy reports there are no counterexamples using $B_1$, at least for small bounds on the program size. Figure 5 gives detailed results for this and other Alloy experiments.
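On the example of Fig. 1, the difference between V0 and V1 comes down to which table entry governs the transitions out of A0. In the sketch below, the hypothetical helper `out_edges` takes a `key` function that chooses the ample-set entry: $(q, s')$ for V0 and $(q, s)$ for V1. The data reproduce the example of Sect. 3.1, again assuming $\alpha$ is enabled at B. Under V1 the transition from A0 to B1 is retained, and with it the accepting path A0, B1, B1, ...

```python
NO_P, HAS_P = frozenset(), frozenset({"p"})
OPS = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
LABELS = {"A": NO_P, "B": HAS_P}
DELTA = {(0, NO_P, 0), (0, NO_P, 1), (1, HAS_P, 1)}
AMP = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}


def out_edges(x, key):
    """Reduced transitions leaving x = (q, s); key(q, s, s2) picks the amp entry."""
    q, s = x
    return {(op, (fn[q], s2))
            for (s1, sigma, s2) in DELTA if s1 == s and sigma == LABELS[q]
            for op, fn in OPS.items() if q in fn and op in AMP[key(q, s, s2)]}


v0_key = lambda q, s, s2: (q, s2)  # V0: entry of the intermediate state
v1_key = lambda q, s, s2: (q, s)   # V1: entry of the source state
v0 = out_edges(("A", 0), v0_key)
v1 = out_edges(("A", 0), v1_key)
print(v1 - v0)                        # {('beta', ('B', 1))}
print(out_edges(("B", 1), v1_key))    # {('alpha', ('B', 1))}: the accepting self-loop
```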
Unfortunately, Alloy does find a counterexample for a slightly more complicated property automaton, $B_2$, which is shown in Fig. 3. The program is the same as the one in Sect. 3.1. Automaton $B_2$ has four states, with state 3 the sole accepting state. The language is the same as that of $B_1$: all infinite words formed by concatenating a finite nonempty prefix of $\emptyset$s and an infinite sequence of $\{p\}$s. If the prefix has odd length, the accepting run begins with the transition $0 \to 1$; otherwise it begins with the transition $0 \to 2$.
In the ample selector, only the ample sets at A0 and A2 are not full: C0-C2 hold for the reasons given in Sect. 3.1. $C3_1$ holds for any DFS in which A2 is pushed onto the stack before A1. In that case, there is no back edge from A2; there will be a back edge when A1 is pushed, but the ample set at A1 is full.

What's Right
In this section, I show that POR scheme V1 of Sect. 6 is sound if one introduces certain assumptions on the property automaton. The following definition is similar to the notion of stutter invariant (SI) automaton in [6] and to that of closure under stuttering in [9]. The main differences derive from the use of Muller automata in [6] and Büchi transition systems in [9], while we are dealing with ordinary Büchi automata.

Definition 12. A Büchi automaton $B = \langle S, \{s_{init}\}, \Sigma, \delta, F\rangle$ is in SI normal form if … Moreover, if $s_3$ is accepting, then $s_2$ is accepting.
Following the approach of [6], one can show that the language of an automaton in SI normal form is stutter-invariant. Moreover, any Büchi automaton with a stutter-invariant language can be transformed into SI normal form without changing the language. The conversion satisfies $|S'| = O(|\Sigma| \cdot |S|)$, where $|S|$ and $|S'|$ are the numbers of states in the original and new automaton, respectively. For details and proofs, see [17]. An example is given in Fig. 4; the language of $B_3$ (or $B_4$) consists of all words with a finite number of $\{p\}$s.

The remainder of this section is devoted to the proof of Theorem 1. The proof is similar to the proof of the offline case in [4].
Let $\theta$ be an accepting path in the full space $A$. An infinite sequence of accepting paths $\pi_0, \pi_1, \ldots$ will be constructed, where $\pi_0 = \theta$. For each $i \geq 0$, $\pi_i$ will be decomposed as $\eta_i \bullet \theta_i$, where $\eta_i$ is a finite path of length $i$ in the reduced space, $\theta_i$ is an infinite path, $\eta_i$ is a prefix of $\eta_{i+1}$, and $\bullet$ denotes concatenation. For $i = 0$, $\eta_0$ is empty and $\theta_0 = \theta$.
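Assume $i \geq 0$ and we have defined $\eta_j$ and $\theta_j$ for $j \leq i$. Write
$$\theta_i = \langle q_0, s_0\rangle \xrightarrow{\langle \alpha_1, \sigma_0\rangle} \langle q_1, s_1\rangle \xrightarrow{\langle \alpha_2, \sigma_1\rangle} \langle q_2, s_2\rangle \xrightarrow{\langle \alpha_3, \sigma_2\rangle} \cdots \qquad (3)$$
where $\sigma_k = L(q_k)$ for $k \geq 0$. Then $\eta_{i+1}$ and $\theta_{i+1}$ are defined as follows. Let $A = amp(q_0, s_0)$. There are two cases:

Case 1: $\alpha_1 \in A$. Let $\eta_{i+1}$ be the path obtained by appending the first transition of $\theta_i$ to $\eta_i$, and $\theta_{i+1}$ the path obtained by removing the first transition from $\theta_i$.

Case 2: $\alpha_1 \notin A$.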
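Then there are two sub-cases:

Case 2a: Some operation in $A$ occurs in $\theta_i$. Let $n$ be the index of the first occurrence, so that $\alpha_n \in A$, but $\alpha_j \notin A$ for $1 \leq j < n$. By C1, $\alpha_j$ and $\alpha_n$ are independent for $1 \leq j < n$. By repeated application of the independence property, writing $q_{j+1}' = \alpha_n(q_j)$ for $0 \leq j \leq n-2$, there is an admissible sequence in $P$
$$q_0 \xrightarrow{\alpha_n} q_1' \xrightarrow{\alpha_1} q_2' \xrightarrow{\alpha_2} \cdots \xrightarrow{\alpha_{n-2}} q_{n-1}' \xrightarrow{\alpha_{n-1}} q_n \xrightarrow{\alpha_{n+1}} q_{n+1} \xrightarrow{\alpha_{n+2}} \cdots \qquad (4)$$
Since $\alpha_1 \in en(q_0) \setminus A$, we have $A \neq en(q_0)$, so by C2, $\alpha_n$ is invisible, whence $L(q_{j+1}') = \sigma_j$ for $0 \leq j \leq n-2$, and $\sigma_{n-1} = \sigma_n$. Hence the admissible sequence (4) generates the word
$$\sigma_0 \sigma_0 \sigma_1 \cdots \sigma_{n-2} \sigma_n \sigma_{n+1} \cdots \qquad (5)$$
Now the projection of $\theta_i$ onto $B$ gives rise to a path … (6) through $B$ which accepts the word (5). Composing (4) and (6) therefore gives a path through the product space. Removing the first transition (labeled $\langle \alpha_n, \sigma_0\rangle$) from this path yields $\theta_{i+1}$. Appending that transition to $\eta_i$ yields $\eta_{i+1}$.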
Case 2b: No operation in $A$ occurs in $\theta_i$. By C0, $A$ is nonempty. Let $\beta \in A$. By C1, every operation in $\theta_i$ is independent of $\beta$, and since $A \neq en(q_0)$, C2 implies that $\beta$ is invisible. With an argument that is similar to the one for Case 2a, we can see there is a path in the product space for which the projection onto the program component has the form
$$q_0 \xrightarrow{\beta} \beta(q_0) \xrightarrow{\alpha_1} \beta(q_1) \xrightarrow{\alpha_2} \beta(q_2) \xrightarrow{\alpha_3} \cdots$$
and the projection onto the property component has the form … . Removing the first transition from this path yields $\theta_{i+1}$. Appending that transition to $\eta_i$ yields $\eta_{i+1}$. This completes the definitions of $\eta_{i+1}$ and $\theta_{i+1}$.

Let $\eta$ be the limit of the $\eta_i$. Clearly $\eta$ is an infinite path through the reduced product space, starting from the initial state. We must show that it passes through an accepting state infinitely often. To do so, we must examine more closely the sequence of property states through which each $\theta_i$ passes.
Let $i \geq 0$, and let $s_0$ be the final state of $\eta_i$. Say $\theta_i$ passes through states $s_0 s_1 s_2 \cdots$. Then the final state of $\eta_{i+1}$ will be $s_1$, and the state sequence of $\theta_{i+1}$ is determined by the three cases as follows: … (7)

We first claim that for all $i \geq 0$, $\theta_i$ passes through an accepting state infinitely often. This holds for $\theta_0$, which is an accepting path by assumption. Assume it holds for $\theta_i$. In each case of (7), we see that the state sequence of $\theta_{i+1}$ has a suffix which is a suffix of the state sequence of $\theta_i$, so the claim holds for $\theta_{i+1}$.

Definition 13. For any path $\xi = s_0 \to s_1 \to \cdots$ through $B$ which passes through an accepting state infinitely often, define the accepting distance of $\xi$, written $AD(\xi)$, to be the minimum $k \geq 1$ for which $s_k$ is accepting.
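For the special case of a lasso-shaped path, the accepting distance is easy to compute. In the sketch below, the path is given as a hypothetical pair `prefix`, `cycle` of property states, so that $s_0 s_1 \cdots$ is `prefix` followed by `cycle` repeated forever; note that $s_0$ itself is never counted, since $k \geq 1$.

```python
def accepting_distance(prefix, cycle, accepting):
    """AD of the lasso path prefix + cycle^omega, whose states are s_0 s_1 ...

    Returns the least k >= 1 such that s_k is accepting (Definition 13).
    """
    if not cycle or not any(s in accepting for s in cycle):
        raise ValueError("the path must visit an accepting state infinitely often")
    k = 1
    while True:
        s = prefix[k] if k < len(prefix) else cycle[(k - len(prefix)) % len(cycle)]
        if s in accepting:
            return k
        k += 1


print(accepting_distance([0, 1], [2, 3], {3}))  # 3
print(accepting_distance([3], [2, 3], {3}))     # 2: s_0 itself is not counted
```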

Lemma 2. Let $i \geq 0$ and say the state sequence of $\theta_i$ is $s_0 s_1 s_2 \cdots$. If $s_1$ is not accepting, then one of the following holds:
- Case 1 holds and $AD(\theta_{i+1}) < AD(\theta_i)$, or
- Case 2a or 2b holds and $AD(\theta_{i+1}) \leq AD(\theta_i)$.

Proof. If $s_1$ is not accepting then there is some $k \geq 2$ for which $s_k$ is accepting. The result follows by examining (7). In Case 1, the accepting distance decreases by 1. In Case 2a, the accepting distance is either unchanged (if $k \leq n$) or decreases by 1 (if $k > n$). In Case 2b, the accepting distance is unchanged.

Lemma 3. Case 1 holds for infinitely many $i$.

Proof. Suppose not. Then there is some $i \geq 0$ such that Case 2 holds for all $j \geq i$. Let $\alpha_1$ be the first program operation of $\theta_i$. Then $\alpha_1$ is the first program operation of $\theta_j$, for all $j \geq i$. Furthermore, for all $j \geq i$, $\alpha_1$ is not in the ample set of the final state of $\eta_j$. Since the product space has only a finite number of states, this means there is a cycle in the reduced space for which $\alpha_1$ is enabled but never in the ample set, contradicting $C3_1$.
We now show that $\eta$ passes through an accepting state infinitely often. Note that if $AD(\theta_i) = 1$, an accepting state is added to $\eta_i$ to form $\eta_{i+1}$. Suppose $\eta$ does not pass through an accepting state infinitely often. Then there is some $i \geq 0$ such that $AD(\theta_j) > 1$ for all $j \geq i$. By Lemma 2, $(AD(\theta_j))_{j \geq i}$ is a nonincreasing sequence of positive integers, and by Lemmas 2 and 3, this sequence strictly decreases infinitely often, a contradiction. This completes the proof of Theorem 1.

Summary of Experimental Results and Conclusion
We have seen that standard ways of combining POR and on-the-fly model checking are unsound. This is not only a theoretical issue: the defect in the algorithm is realized in Spin, which can produce an incorrect result. A modification (V1) seems to help, but is still not enough to guarantee soundness for any Büchi automaton with a stutter-invariant language. However, any such automaton can be transformed into a normal form for which algorithm V1 is sound.

Alloy proved useful for reasoning about the algorithms and generating small counterexamples. A summary of the Alloy experiments and results is given in Fig. 5. These were run on an 8-core 3.7GHz Intel Xeon W-2145 and used the plingeling SAT solver [1]. In addition to the experiments already discussed, Alloy found no soundness counterexamples for property automata $B_3$ or $B_4$, using V0 or V1. In the case of $B_4$, this is what Theorem 1 predicts. For further confirmation of Theorem 1, I constructed a general Alloy model of Büchi automata in SI normal form, represented by $B_5$ in the table. Alloy confirms that both V0 and V1 are sound for all such automata within small bounds on program and automata size.
It is possible that the use of the normal form, while correct, cancels out the benefits of POR. A comprehensive exploration of this issue is beyond the scope of this paper, but I can provide data on one non-trivial example. I encoded an $n$-process version of Peterson's mutual exclusion algorithm in Promela, and used Spin to verify starvation-freedom for one process in the case $n = 5$. If $p$ is the predicate that holds whenever the process is enabled, a trace violates this property if $p$ holds only a finite number of times in the trace, i.e., if the trace is in $L(B_3) = L(B_4)$. Figure 6 shows the results of Spin verification using $B_3$ without POR, and using $B_3$ and $B_4$ with POR. The results indicate that POR significantly improves performance on this problem, and that using the normal form $B_4$ in place of $B_3$ actually improves performance further by a small amount.

It is likely that V1 is sound for other interesting classes of automata. Observe, for example, that $B_2$ of Fig. 3 has states $u$ such that the language of the automaton with $u$ considered as the initial state is not stutter-invariant. If we restrict to automata in which every state has a stutter-invariant language, is V1 sound? I have neither a proof nor a counterexample. (This is certainly not true of V0, as $B_1$ is a counterexample.) To explore this question, it would help to find a way to encode the stutter-invariant property (or a suitable approximation) in Alloy.
Finally, the proof of Theorem 1 is complicated and might also be flawed. Recent work mechanizing such proofs [3] represents an important advance in raising the level of assurance in model checking algorithms. It would be interesting to see if the proof of this theorem is amenable to such methods. However, constructing such proofs requires far more effort than the Alloy approach described here. One possible approach moving forward is to use tools such as Alloy when prototyping a new algorithm, to get feedback quickly and root out bugs. Once Alloy no longer finds any counterexamples, one could then expend the considerable effort required to construct a formal mechanized proof.