State space reduction in modeling checking parameterized cache coherence protocol by two-dimensional abstraction
DOI: 10.1007/s11227-012-0755-0
- Cite this article as:
- Guo, Y., Qu, W., Zhang, L. et al. J Supercomput (2012) 62: 828. doi:10.1007/s11227-012-0755-0
Abstract
Scalability of the cache coherence protocol is a key concern in future shared-memory multi-core or multi-processor systems. State space explosion is the first hurdle when applying model checking to scalable protocols. In order to validate parameterized cache coherence protocols effectively, we present a new method of reducing the state space of parameterized systems, two-dimensional abstraction (TDA). Drawing inspiration from the design principle of parameterized systems, an abstract model of an unbounded system is constructed out of finite states. The mathematical principles underlying TDA are presented. Theoretical reasoning demonstrates that TDA is correct and sound. An example of a parameterized cache coherence protocol based on MESI illustrates how to produce a much smaller abstract model by TDA. We also demonstrate the power of our method by applying it to various well-known classes of protocols. During the development of the TH-1A supercomputer system, TDA was used to verify the coherence protocol in the FT-1000 CPU and showed potential advantages in reducing verification complexity.
Keywords
Parameterized cache coherence protocol · True concurrency · Model checking · Two-dimensional abstraction
1 Introduction
Model checking is an automatic technique for verifying finite state concurrent systems, which uses a finite state machine to describe the system under consideration and temporal logic to state the properties that the system must satisfy. This method has been used successfully in practice to verify complex software and hardware systems [1, 2]. However, efficient verification of parameterized cache coherence protocols is one of the most challenging problems in the verification domain today. Firstly, parameterized systems are composed of an arbitrary number of processes that run concurrently and cooperate (the number of processes is called the system parameter). The behavior of one process is determined not only by its current state, but also by changes in the environment in which it lives. Secondly, parameterized systems are by nature unbounded. The system parameter may be arbitrarily large, and the ultimate goal is to validate the properties of a system for every possible number of processes. In such cases, the number of global states can be enormous, resulting in state space explosion. Formal verification of parameterized systems is known to be undecidable in general and thus cannot be fully automated. Thirdly, symbolic methods such as BDD or SAT, which can enable scalable formal verification, can be ineffective when it comes to cache coherence protocols because most of the state variables are relevant in protocol property verification. As faster and larger systems are designed, the complexity of cache protocols will continue to increase.
Fong Pong [3] presented a comprehensive survey of various approaches to the verification of cache coherence protocols based on state enumeration, model checking, and symbolic state models. He pointed out that no framework had been proposed so far to deal with the memory consistency model in the context of formal verification based on state expansion. Monolithic formal verification methods that treat the protocol as a whole have been used fairly routinely for verifying cache coherence protocols since the early 1990s [4, 5]. However, these monolithic techniques will not be able to handle the very large state space of parameterized protocols. While techniques like indexed predicates [6], counter abstraction [7], environment abstraction [8, 9], and cutoff-based approaches [10] have been proposed for parameterized protocol verification in recent years, most of them do not scale well to large protocols, and those that do scale require an inordinate amount of manual effort to succeed [11]. We are not aware of any published work that has reported formal verification of a parameterized cache coherence protocol with reasonable complexity.
All successful applications of model checking thus far have made use of domain-specific abstraction techniques. Continuing this trend and drawing inspiration from recent work like environment abstraction [8, 9], we exploit the domain knowledge about parameterized systems to devise an appropriate abstraction method. We propose a novel generic approach called two-dimensional abstraction (TDA), which can effectively reduce the state space of parameterized systems. In our work, the size of the state transition graph for each process is first reduced independently; then the whole system composed of the reduced processes is abstracted based on the design principles of parameterized systems, thus avoiding the construction of the complete state space that might be too large to fit into memory.
TDA has a number of advantages over other approaches. First, TDA abstracts away redundant information from a concrete system via decomposition–abstraction–composition–reabstraction, thus effectively alleviating the state explosion problem during the verification of parameterized systems. Second, TDA can be used for parallel systems in the usual fashion because it places no limitation on the communication mode among processes. Third, TDA can be used with any model checker. The freedom to choose model checkers is important in practice. Fourth, TDA is sound and complete. We give complete soundness and completeness proofs for our method. Finally, constant heterogeneous processes and infinite state systems are allowed, which makes TDA suitable for large-scale heterogeneous systems. We demonstrate the power of our method by applying it to various well-known classes of protocols.
The rest of this paper is organized as follows. In Sect. 2, we introduce previous related work. Section 3 gives some background information. In Sect. 4, we propose a model with true concurrency semantics for parameterized systems. In Sect. 5, we present concepts of a TDA model and the method to construct a TDA model. A cache coherence protocol based on MESI is used to illustrate the approach of getting a much smaller state space by TDA in Sect. 6. Experimental results of various well-known protocols and application are presented in Sect. 7. Section 8, the last section, presents concluding remarks.
2 Related works
The development of effective techniques for checking parameterized systems is one of the most challenging problems in verification today. Prior research in the area of coherence protocol verification has ranged from simulation to formal methods. These techniques have had varying degrees of success, but few of them have been applied to a large industrial-strength protocol like FLASH.
Simulation with random or directed stimulus has been shown to be effective at finding most protocol errors [12]. However, simulation tends not to be effective at uncovering subtle bugs, especially those related to the consistency model. Subtle consistency bugs often occur only under unusual combinations of circumstances, and it is unlikely that simulation will drive the protocol to these situations.
For verification of high level specifications, modern industrial practice consists of modeling small instances of the protocols in guard/action languages such as Murphi [13] or TLA+ [14], and exploring the reachable states through explicit state enumeration.
The idea of using non-interference lemmas for parameterized model checking is attributed to McMillan [15], Chou [16], and Li [17], and is also called the CMP method. The CMP approach to parameterized verification is a combination of data type reduction and compositional reasoning. In this approach, a model checker is used as a proof assistant and the user guides the proof by supplying invariants or non-interference lemmas. Similar types of reasoning have been applied by Chen to verify non-parameterized hierarchical protocols [18]. The compositional method of McMillan is used for compositional reasoning to handle infinite state systems including directory-based protocols. This technique, which requires user intervention at various stages, has been applied to verify safety and liveness properties of the FLASH protocol. The paper by Chou [16] presented a method along similar lines that was used to verify safety of the FLASH and GERMAN protocols. Krstic [19] gave a formalization of the method. The CMP method scales well. As far as we are aware, the CMP method is one of the few methods that can handle the full complexity of the FLASH protocol. Intel used CMP to verify an industrial-strength cache protocol several orders of magnitude larger than even the FLASH protocol [20]. Talupur and Tuttle showed how to derive high-quality invariants from message flows and how to use these invariants to accelerate the CMP method [21, 22]. A message flow is a sequence of messages sent among processors during the execution of a protocol. The hardest part of using CMP is finding a set of protocol invariants that enable CMP to work. The user has the burden of coming up with non-interference lemmas, which can be non-trivial and require a deep understanding of the protocol under verification.
Another effective method for parameterized verification is the abstraction approach [6–9, 11, 23–25]. Predicate abstraction, first proposed by Graf [11] as a special case of the general framework of abstract interpretation, has been used in the verification of parameterized protocols. In predicate abstraction, a finite set of predicates is defined over the concrete set of states. These predicates are used to construct a finite state abstraction of a concrete system. The automation in generating the finite abstract model makes this scheme attractive in combining deductive and algorithmic approaches for infinite state verification. Lahiri [26] proposed the use of a symbolic decision procedure and its application to predicate abstraction. One of the main problems in predicate abstraction is that it typically makes a large number of theorem prover calls when computing the abstract transition relation or the abstract state space. Pnueli [23] presented the method of invisible invariants, which combines a small-model theorem with a heuristic to generate proofs of correctness of parameterized systems. Wang [24] used monotonic abstraction to provide an over-approximation of the transition system induced by a parameterized system. The over-approximation gives a transition system which is monotonic with respect to a well quasi-ordering on the set of configurations. Timm [25] presented an approach combining symmetry arguments with spotlight abstractions. The technique determines (the size of) a particular instantiation of the parameterized system from the given temporal logic formula, and feeds this into an abstracting model checker. Environment abstraction [8, 9] exploits the replicated structure of a parameterized system to make its verification easy, and it converts the unbounded system into a bounded one via a finite state description method.
In real cache coherence protocols, the internal state of each cache can be quite complex, and thus environment abstraction might fail. The other method is divide-and-conquer: the abstraction of each process is made independently before the model of the whole system is constructed [27]. Unfortunately, the many constraints it imposes on the systems under consideration make this approach impractical.
Other related work includes that of Pandav [28], who proposed a set of heuristics to aid in constructing invariants for cache protocols. Delzanno [29] used arithmetic constraints to model possibly infinite sets of global states of a multi-processor system with many identical caches. General purpose symbolic model checkers for infinite-state systems working over arithmetical domains were used. Delzanno and Bultan [30, 31] described a constraint-based verification method for handling the safety and liveness properties of the GERMAN protocol, but their method cannot verify single-index liveness properties. Emerson and Kahlon [32] verified GERMAN by first reducing it to a snoopy bus protocol and then invoking a theorem asserting that if a snoopy bus protocol of a certain form is correct for 7 nodes then it is correct for any number of nodes. Pnueli proposed an elegant cutoff method that can verify the DIR protocol [10], but it was sound but not complete, and worked only for safety properties. A broad technique was proposed for the verification of WSIS systems that can handle the DIR protocol as an example [33], yet again the resulting technique was sound but not complete.
3 Preliminaries
This section contains basic material about Kripke structures, temporal logic, and equivalence relations on Kripke structures [34].
Definition 1
(Kripke structure)
Let AP be a set of atomic propositions. A Kripke structure M over AP is a tuple M=(AP,S,I,R,L), where
1. S is a finite set of states.
2. I⊆S is the set of initial states.
3. R⊆S×S is a transition relation that must be total, that is, for every state s∈S there is a state s′∈S such that R(s,s′).
4. L:S→2^{ AP } is a function that labels each state with the set of atomic propositions true in that state.
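As an illustration of Definition 1, the following is a minimal Python sketch of a Kripke structure together with a check of the side conditions (I⊆S, R⊆S×S, totality of R, labels drawn from AP). The class name `Kripke`, the method `is_well_formed`, and the two toy structures are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Kripke:
    """A Kripke structure (AP, S, I, R, L); all names here are illustrative."""
    ap: set      # atomic propositions AP
    states: set  # finite set of states S
    init: set    # initial states I
    trans: set   # transition relation R as a set of (s, s') pairs
    label: dict  # labeling function L: state -> set of propositions

    def is_well_formed(self) -> bool:
        init_ok = self.init <= self.states
        trans_ok = all(s in self.states and t in self.states
                       for s, t in self.trans)
        # Totality: every state has at least one successor.
        total = all(any(a == s for a, _ in self.trans) for s in self.states)
        labels_ok = all(self.label.get(s, set()) <= self.ap
                        for s in self.states)
        return init_ok and trans_ok and total and labels_ok


# A toy two-state structure (assumed example): a cache line that is
# either Invalid ("I") or Shared ("S").
toy = Kripke(ap={"shared"}, states={"I", "S"}, init={"I"},
             trans={("I", "S"), ("S", "I"), ("S", "S")},
             label={"I": set(), "S": {"shared"}})

# The same structure with a dead end: "S" has no successor, so R is not total.
dead_end = Kripke(ap={"shared"}, states={"I", "S"}, init={"I"},
                  trans={("I", "S")},
                  label={"I": set(), "S": {"shared"}})
```

A dead-end state is usually repaired by adding a self-loop, which keeps every path infinite as required by the path semantics used later.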
Temporal logic is used to specify properties of Kripke structures. CTL ^{⋆}, a powerful logic, describes properties of computation trees. A tree is formed by designating a state in a Kripke structure as the initial state and then unwinding the structure into an infinite tree with the designated state at the root. In CTL ^{⋆}, formulas are composed of path quantifiers and temporal operators. The path quantifiers are used to describe the branching structure in the computation tree. There are two such quantifiers A (for all computation paths) and E (for some computation path). The temporal operators, X (next time), F (in the future), G (always), U (until), and R (release) describe properties of a path through the tree.
State formulas and path formulas are defined inductively by the following rules:
1. If p∈AP, then p is a state formula.
2. If f and g are state formulas, then ¬f, f∧g, and f∨g are state formulas.
3. If f is a path formula, then Ef and Af are state formulas.
4. If f is a state formula, then f is also a path formula.
5. If f and g are path formulas, then ¬f, f∧g, f∨g, Xf, Ff, fUg, and fRg are path formulas.
Let M be a Kripke structure over AP. A path in M from a state s is an infinite sequence of states π=s _{0} s _{1} s _{2}⋯ such that s _{0}=s and R(s _{ i },s _{ i+1}) holds for all i≥0. We use π ^{ i } to denote the suffix of π starting at s _{ i }.
The restriction of CTL ^{⋆} to universal path quantifiers A is called ACTL ^{⋆}.
Simulation equivalence restricts the logic and relaxes the requirement that the structures should satisfy exactly the same formulas, resulting in a great reduction.
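Definition 2
(Simulation relation)
Given two Kripke structures M=(AP,S,I,R,L) and M′=(AP′,S′,I′,R′,L′) with AP⊇AP′, a relation H⊆S×S′ is a simulation relation between M and M′ if and only if, for all s and s′, H(s,s′) implies the following conditions:
1. L(s)∩AP′=L′(s′).
2. For every state s _{1} such that R(s,s _{1}), there is a state \(s_{1}^{\prime}\) with the property that \(R^{\prime}(s^{\prime},s_{1}^{\prime})\) and \(H(s_{1},s_{1}^{\prime})\).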
If there exists a simulation relation H such that for every initial state s _{0} in M there is an initial state \(s_{0}^{\prime}\) in M′ for which \(H(s_{0},s_{0}^{\prime})\), we say that M′ simulates M (denoted by M⪯M′).
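For small explicit structures, the two conditions of Definition 2 can be checked mechanically. The sketch below computes the greatest simulation relation by a standard fixpoint refinement: start from all label-compatible pairs and repeatedly delete pairs that violate condition 2. The function names and the toy structures are assumptions made for illustration; the paper does not prescribe an algorithm.

```python
def successors(R, s):
    """All states reachable from s in one step of R."""
    return {b for (a, b) in R if a == s}


def greatest_simulation(S1, R1, L1, S2, R2, L2, AP2):
    """Largest H over S1 x S2 satisfying conditions 1 and 2 of Definition 2."""
    # Condition 1: labels agree on the propositions of the second structure.
    H = {(s, t) for s in S1 for t in S2 if L1[s] & AP2 == L2[t]}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(H):
            # Condition 2: every move of s is matched by some move of t.
            ok = all(any((s1, t1) in H for t1 in successors(R2, t))
                     for s1 in successors(R1, s))
            if not ok:
                H.discard((s, t))
                changed = True
    return H


def simulated_by(I1, I2, H):
    """True if every initial state of M is related to some initial state of M'."""
    return all(any((s0, t0) in H for t0 in I2) for s0 in I1)


# Toy concrete structure M over AP = {p, x} (assumed example).
S1 = {"a0", "a1", "a2"}
R1 = {("a0", "a1"), ("a1", "a2"), ("a2", "a0")}
L1 = {"a0": {"p", "x"}, "a1": {"x"}, "a2": set()}

# Toy abstract structure M' over AP' = {p}.
S2 = {"b0", "b1"}
R2 = {("b0", "b1"), ("b1", "b1"), ("b1", "b0")}
L2 = {"b0": {"p"}, "b1": set()}
AP2 = {"p"}

H = greatest_simulation(S1, R1, L1, S2, R2, L2, AP2)

# Dropping the transition b1 -> b0 breaks the simulation.
R2_cut = {("b0", "b1"), ("b1", "b1")}
H_cut = greatest_simulation(S1, R1, L1, S2, R2_cut, L2, AP2)
```

Here `simulated_by({"a0"}, {"b0"}, H)` holds, so M⪯M′ in the toy example, whereas the cut-down abstract structure no longer simulates M.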
4 Modeling parameterized systems
States of each process in a parameterized system are considered as interpretations over a finite variable set, V. For each V, a subset V ^{ e } is called an external variable set that is used by the process to communicate with the environment consisting of other processes. The set V ^{ i }=V−V ^{ e } is an internal variable set. Obviously, the environment may update only external variables, whereas the process may update all the variables. Such processes are modeled by Kripke structures which describe a class of finite state systems with first-order logic propositions. A complex parameterized system is modeled as a composition of such smaller processes when the following conditions are met.
Definition 3
(Compatible structure)
Two Kripke structures M _{1}=(AP _{1},S _{1},I _{1},R _{1},L _{1}) and M _{2}=(AP _{2},S _{2},I _{2},R _{2},L _{2}) are involved, in which V _{1} and V _{2} are their respective state variable sets. If \(V_{1}^{i} \cap V_{2}^{i}= \varnothing\) and \(V_{1}^{e}=V_{2}^{e}\) are true, then M _{1} and M _{2} are compatible structures. The former condition indicates that each internal variable is owned by only one process, and the latter requires that the external variables be shared by both processes.
Definition 4
(Compatible state)
Let M _{1}=(AP _{1},S _{1},I _{1},R _{1},L _{1}) and M _{2}=(AP _{2},S _{2},I _{2},R _{2},L _{2}) be two compatible structures. If L _{1}(s _{1}) ∩ AP _{2}=L _{2}(s _{2}) ∩ AP _{1} is true, then s _{1}∈S _{1} and s _{2}∈S _{2} are compatible. Compatible states agree on the external variables as well as the common atomic propositions.
Processes communicate with each other in the synchronous or asynchronous mode. In the synchronous execution mode, all processes execute their transitions at the same time, whereas in the asynchronous execution mode, the process state transitions are independent of each other: the system evolves by interleaving the evolution of its processes, and at each execution cycle only one process is chosen to perform a transition. However, parameterized systems in which different processes may change their states at the same time are very common in reality. There is no order between these transitions, which preserves the true meaning of concurrency. We call such a communication mode asynchronous composition with true concurrency semantics. From the viewpoint of computer science, it is more interesting to investigate asynchronous products of Kripke structures with true concurrency semantics. We propose a formal model with true concurrency semantics for parameterized systems, which is more suitable for describing concurrent systems in the usual fashion.
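Definition 5
(Asynchronous composition with true concurrency semantics)
Let \(M_{k}=(\mathit{AP}_{k},S_{k},I_{k},R_{k},L_{k})\) (1≤k≤n) be n compatible structures. Their asynchronous composition with true concurrency semantics, \(\prod_{a\ k=1}^{n} M_{k}=(\mathit{AP},S,I,R,L)\), is defined as follows:
1. \(\mathit{AP}={\bigcup}^{n}_{k=1} \mathit{AP}_{k}\).
2. \(S=\{\langle s_{1},s_{2},\ldots,s_{n}\rangle \mid s_{k}\in S_{k}\ (1 \le k \le n)\mbox{\textit{~are~compatible~states}}\} \subseteq{\prod}^{n}_{k=1}{S_{k}}\).
3. \(I=\{\langle s_{1},s_{2},\ldots,s_{n}\rangle \mid {\bigwedge}^{n}_{k=1}(s_{k}\in I_{k})\}\subseteq S\).
4. \(R=\{(\langle s_{1,i},s_{2,i},\ldots,s_{n,i}\rangle,\langle s_{1,i+1},s_{2,i+1},\ldots,s_{n,i+1}\rangle) \mid \exists j, 1 \le j \le n, (s_{j,i},s_{j,i+1})\in R_{j}\}\).
5. \(L(\langle s_{1},s_{2},\ldots,s_{n}\rangle)={\bigcup}^{n}_{k=1}L_{k}(s_{k})\).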
Theorem 1
The asynchronous composition operator with true concurrency semantics, ∏_{ a }, is commutative and associative.
Proof
By Definition 5, the set of atomic propositions of the composition is a union of component atomic propositions; so is the set of labels. States of the composition are vectors of component states that are compatible, and they are elements of the Cartesian product of component states. Each transition of the composition involves a transition of at least one of the n components. Because the union and product of sets are commutative and associative, the asynchronous composition operator with true concurrency semantics is also commutative and associative. □
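To make Definition 5 concrete, here is a small Python sketch that builds the composition of explicit components. It rests on one stated assumption that the definition leaves open: in a global step, every component either stays in its current state or takes one of its own transitions, and at least one component must take a transition. The dictionary layout and the names `compatible` and `compose` are hypothetical.

```python
from itertools import product


def compatible(states, comps):
    """Pairwise check of Definition 4: labels agree on shared propositions."""
    for a in range(len(comps)):
        for b in range(a + 1, len(comps)):
            la = comps[a]["L"][states[a]]
            lb = comps[b]["L"][states[b]]
            if la & comps[b]["AP"] != lb & comps[a]["AP"]:
                return False
    return True


def compose(comps):
    """Asynchronous composition with true concurrency (sketch of Definition 5)."""
    n = len(comps)
    AP = set().union(*(c["AP"] for c in comps))
    S = {s for s in product(*(c["S"] for c in comps)) if compatible(s, comps)}
    I = {s for s in S if all(s[k] in comps[k]["I"] for k in range(n))}

    def moves(k, sk):
        # Assumption: a component either stays put or takes a local transition.
        return {sk} | {t for (u, t) in comps[k]["R"] if u == sk}

    R = set()
    for s in S:
        for t in product(*(moves(k, s[k]) for k in range(n))):
            # At least one component j performs a transition of R_j.
            if t in S and any((s[j], t[j]) in comps[j]["R"] for j in range(n)):
                R.add((s, t))
    L = {s: set().union(*(comps[k]["L"][s[k]] for k in range(n))) for s in S}
    return {"AP": AP, "S": S, "I": I, "R": R, "L": L}


# Two independent toggles with disjoint propositions (assumed example).
A = {"AP": {"a"}, "S": {"a0", "a1"}, "I": {"a0"},
     "R": {("a0", "a1"), ("a1", "a0")}, "L": {"a0": set(), "a1": {"a"}}}
B = {"AP": {"b"}, "S": {"b0", "b1"}, "I": {"b0"},
     "R": {("b0", "b1"), ("b1", "b0")}, "L": {"b0": set(), "b1": {"b"}}}

PS = compose([A, B])
```

In this toy system the step from (a0, b0) to (a1, b1) is a single global transition in which both components move at once, which is exactly what distinguishes true concurrency from pure interleaving.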
5 Two-dimensional abstraction
Definition 6
(Two-dimensional abstraction) For asynchronous concurrent parameterized systems with true concurrency semantics, two-dimensional abstraction is a process constructing an abstract model by first reducing the state space of each process independently along the y axis in order to reduce m and then hiding the system parameter n along the x axis based on the design principles of parameterized systems. The former step is called y-abstraction, and the latter x-abstraction. The corresponding reduced results are called the y-abstract model and TDA model, respectively.
The selection of an equivalence relation between a TDA model and a concrete system is of prime importance for the successful application of TDA in practice. Simulation relationship [35] will result in a greater reduction of the number of states by restricting logic and relaxing the requirement that two structures should satisfy exactly the same set of formulas. Given two Kripke structures M _{1}=(AP _{1},S _{1},I _{1},R _{1},L _{1}) and M _{2}=(AP _{2},S _{2},I _{2},R _{2},L _{2}) with AP _{2}⊆AP _{1}, if there exists a simulation relation H such that for every initial state s _{10} (s _{10}∈I _{1}) in M _{1} there is an initial state s _{20} (s _{20}∈I _{2}) in M _{2} for which H(s _{10},s _{20}), we say that M _{2} simulates M _{1} and denote it by M _{1}≼M _{2}. Intuitively, for every transition in M _{1}, there is a corresponding transition in M _{2}.
In the following sections, PS ^{ c }(n) refers to the concrete model of asynchronous concurrent parameterized systems with true concurrency semantics consisting of n concrete processes. PS ^{ y }(n) is the y-abstract model of PS ^{ c }(n) and PS ^{ t }(n) is its TDA model.
5.1 y-Abstraction
The y-abstraction deals with each concrete process independently in order to abstract away the information that is irrelevant to the system properties. Any property-preserving abstraction method is available. We construct a finite predicate set Φ={φ _{1},φ _{2},…,φ _{ r }} from properties and system description, and build the y-abstract model through the method of basic predicate abstraction.
The predicate set Φ defines an equivalence relation on \(S_{k}^{c}\), the set of states of \(M_{k}^{c}=(\mathit{AP}_{k}^{c},S_{k}^{c},I_{k}^{c},R_{k}^{c},L_{k}^{c})\ (1 \le k \le n)\), and each equivalence class is denoted by an abstract state. The concrete state is labeled with a predicate formula which is satisfied in that state. In other words, the labeling function \(L_{k}^{c}\) maps a concrete state into a predicate set. The set of states of the y-abstract model \(M_{k}^{y}\), \(S_{k}^{y}\), is a set of normal boolean expressions on b _{1},b _{2},…,b _{ r }, where b _{ j } (1≤j≤r) corresponds to predicate φ _{ j }. A y-abstract state is a truth assignment to the r boolean variables. The labeling function \(L_{k}^{y}\) maps a y-abstract state into a boolean expression. The abstract operator \(H_{k}^{cy}\) determines the relationship between concrete states and abstract states. The method of building the transition relation \(R_{k}^{y}\) of the y-abstract model \(M_{k}^{y}\) from the concrete transition relation \(R_{k}^{c}\) is the same as that introduced by Graf and Saidi [11]. From the above definitions, we can conclude that \(H_{k}^{cy} \subseteq S_{k}^{c} \times S_{k}^{y}\) is a simulation relation between \(M_{k}^{c}\) and \(M_{k}^{y}\), so the following theorem holds.
Theorem 2
\(M_{k}^{c} \preccurlyeq M_{k}^{y}\ (1 \le k \le n)\).
Proof
The proof is given in [11]. □
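The y-abstraction of a single process can be sketched for an explicitly enumerated state space. Note that this is a swapped-in, explicit-state existential abstraction used only for illustration: the construction of Graf and Saidi [11] computes the abstract transition relation with a theorem prover rather than by enumeration. The function `y_abstract`, the predicate, and the toy process with a (state, data) pair are assumptions.

```python
def y_abstract(S, I, R, predicates):
    """Explicit-state predicate abstraction of one process (illustrative).

    Each concrete state is mapped to the tuple of truth values of the
    predicates; an abstract transition exists whenever some concrete
    transition connects members of the two equivalence classes.
    """
    h = {s: tuple(bool(p(s)) for p in predicates) for s in S}
    Sy = set(h.values())
    Iy = {h[s] for s in I}
    Ry = {(h[s], h[t]) for (s, t) in R}
    return h, Sy, Iy, Ry


# Toy concrete process: a cache line with a state and one data bit.
S = {(st, d) for st in ("I", "S") for d in (0, 1)}
I = {("I", 0), ("I", 1)}
R = set()
for d in (0, 1):
    R.add((("I", d), ("S", 0)))   # a load may bring in either data value
    R.add((("I", d), ("S", 1)))
    R.add((("S", d), ("I", d)))   # eviction
    R.add((("S", d), ("S", d)))   # a hit leaves the line unchanged

# A single predicate that ignores the data bit: "the line is Shared".
PREDICATES = [lambda s: s[0] == "S"]

h, Sy, Iy, Ry = y_abstract(S, I, R, PREDICATES)
```

Because every concrete transition (s, t) is mapped to (h[s], h[t]), the mapping h plays the role of the abstract operator and is a simulation by construction, in line with Theorem 2.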
In the following, we will demonstrate how the y-abstraction affects the parameterized concurrent systems.
Definition 7
(Visible transitions set and invisible transitions set)
Given a Kripke structure M=(AP,S,I,R,L), we assume that AP _{ f } is the set of atomic propositions involved in the temporal formula f. The set of visible transitions of M w.r.t. AP _{ f } includes transitions affecting the truth of atomic propositions in AP _{ f }, which is denoted by VTS(M,AP _{ f })={(s,t)|(s,t)∈R∧(L(s)∩AP _{ f }≠L(t)∩AP _{ f })}. The set of IVTS(M,AP _{ f })=R−VTS(M,AP _{ f }) is called the set of invisible transitions of M w.r.t. AP _{ f }.
It is obvious that VTS(M,AP _{ f }) and IVTS(M,AP _{ f }) relate to the system property. Both of them satisfy VTS(M,AP _{ f })∩IVTS(M,AP _{ f })=∅ and VTS(M,AP _{ f })∪IVTS(M,AP _{ f })=R.
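Definition 7 translates directly into a set comprehension. The sketch below splits a transition relation into visible and invisible transitions with respect to AP _{ f }; the function name `split_transitions` and the three-state example are hypothetical.

```python
def split_transitions(R, L, AP_f):
    """Return (VTS, IVTS) of a transition relation R w.r.t. AP_f."""
    # A transition is visible if it changes the truth of some proposition
    # in AP_f, i.e. the restricted labels of source and target differ.
    vts = {(s, t) for (s, t) in R if L[s] & AP_f != L[t] & AP_f}
    ivts = set(R) - vts
    return vts, ivts


# Toy structure (assumed example) with propositions p and q.
L = {0: {"p"}, 1: {"p", "q"}, 2: {"q"}}
R = {(0, 1), (1, 2), (2, 0), (2, 2)}
AP_f = {"p"}

VTS, IVTS = split_transitions(R, L, AP_f)
```

With AP _{ f }={p}, the transition (0, 1) only changes q and is therefore invisible, while (1, 2) and (2, 0) change the truth of p and are visible.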
Theorem 3
The asynchronous composition with true concurrency semantics operator ∏_{ a } is monotonic w.r.t. ≼, that is, \(M_{k}^{c}\preccurlyeq M_{k}^{y}\) (1≤k≤n)⇒PS ^{ c }(n)≼PS ^{ y }(n).
Proof
Let \(\mathit{PS}^{c}(n)=(\mathit{AP}^{c},S^{c},I^{c},R^{c},L^{c})=\prod_{a\ k=1}^{n} M_{k}^{c}\) be an asynchronous composition with true concurrency semantics, where \(M_{k}^{c}=(\mathit{AP}_{k}^{c},S_{k}^{c},I_{k}^{c},R_{k}^{c},L_{k}^{c})\). Its y-abstract model is denoted by \(\mathit{PS}^{y}(n)=(\mathit{AP}^{y},S^{y},I^{y},R^{y},L^{y})=\prod^{n}_{a\ k=1} M_{k}^{y}\), where \(M_{k}^{y}=(\mathit{AP}_{k}^{y},S_{k}^{y},I_{k}^{y},R_{k}^{y},L_{k}^{y})\).
That is to say, a y-abstract state is obtained by applying \(H_{k}^{cy}\ (1 \le k \le n)\) to the kth element in concrete state s ^{ c }.
1. \(L^{c}(s^{c}) \cap \mathit{AP}^{y} = L^{y}(s^{y})\).
2. \(\forall t^{c}\, (t^{c} \in S^{c} \wedge R^{c}(s^{c},t^{c}) \Rightarrow \exists t^{y}\, (t^{y} \in S^{y} \wedge R^{y}(s^{y},t^{y}) \wedge H^{cy}(t^{c},t^{y})))\).
Proof of condition (1): L ^{ c }(s ^{ c })∩AP ^{ y }=L ^{ y }(s ^{ y }).
Hence, condition (1) is true.
Proof of condition (2): ∀t ^{ c } t ^{ c }∈S ^{ c }∧R ^{ c }(s ^{ c },t ^{ c })⇒∃t ^{ y } t ^{ y }∈S ^{ y }∧R ^{ y }(s ^{ y },t ^{ y })∧H ^{ cy }(t ^{ c },t ^{ y }).
For each \(t^{c}=\langle t_{1a^{\prime}}^{c},t_{2b^{\prime}}^{c},\dots,t_{kl^{\prime}}^{c},\dots,t_{ng^{\prime}}^{c}\rangle \in S^{c}\), R ^{ c }(s ^{ c },t ^{ c }) implies that there is at least one component in the concrete model that makes a transition. Suppose that the first k (1≤k≤n) components make transitions, while the remaining n−k components do not. There are several cases to be considered.
Case 1: \(t^{c} \neq s^{c} \wedge R_{k}^{c}(s_{kl}^{c},t_{kl^{\prime}}^{c}) \in \mathit{IVTS}(M_{k}^{c},\mathit{AP}_{f})\), as represented in the middle of Fig. 2.
This expression indicates that applying \(H_{k}^{cy}\) to the kth element of t ^{ c } will yield its y-abstract state, thus, (t ^{ c },t ^{ y })∈H ^{ cy }.
From (11), there is at least one element in s ^{ y } and t ^{ y } that satisfies \(R_{k}^{y}(s_{ke}^{y},t_{ke^{\prime}}^{y})\), so (s ^{ y },t ^{ y })∈R ^{ y }.
The other two cases, \(t^{c} \neq s^{c} \wedge R_{k}^{c}(s_{kl}^{c},t_{kl^{\prime}}^{c})\in \mathit{VTS}(M_{k}^{c},\mathit{AP}_{f})\) and t ^{ c }=s ^{ c }, can be discussed in a similar way.
To this point, both conditions (1) and (2) are true. We conclude that H ^{ cy }⊆S ^{ c }×S ^{ y } is a simulation between PS ^{ c }(n) and PS ^{ y }(n). By Definition 2, for every initial state \(s_{0}^{c} \in I^{c}\) in PS ^{ c }(n) there is an initial state \(s_{0}^{y} \in I^{y}\) in PS ^{ y }(n) such that \(H^{cy}(s_{0}^{c},s_{0}^{y})\), as a consequence, this theorem is proved. □
Theorem 3 implies that the y-abstract model is weakly-preserved w.r.t. ACTL* formula. Applying this theorem to each kind of ACTL* formula, we get the following conclusion.
Theorem 4
For each ACTL* formula f(AP _{ f }⊆AP ^{ y }), PS ^{ y }(n)⊨f⇒PS ^{ c }(n)⊨f.
Proof
By Theorem 3, PS ^{ c }(n)≼PS ^{ y }(n), and simulation preserves ACTL* formulas, as proved in [34]. Hence, PS ^{ y }(n)⊨f⇒PS ^{ c }(n)⊨f holds. □
Intuitively, this theorem is true because formulas in ACTL* describe properties that are quantified over all possible behaviors of a system. Because every behavior of PS ^{ c }(n) is a behavior of PS ^{ y }(n), every formula of ACTL* that is true in PS ^{ y }(n) must also be true in PS ^{ c }(n). Theorem 4 is very useful for large-scale system verification since it provides a way of accelerating the verification by taking advantage of an exhaustive search of a smaller state space.
5.2 x-Abstraction
During the construction of parameterized systems, the designers reason about its correctness by focusing on the execution of one process (called hub) and consider its interaction with other processes (called rims, all rims constitute the hub’s environment) [8]. The x-abstraction, following this idea, produces a much smaller state space.
It is straightforward to find that \(L_{k}^{y}(s_{k}^{y})\) (1≤k≤n) on the right hand side of the identity is the set of all labels of rims (or hubs) and they are atomic propositions that process k satisfies in the current state. These atomic propositions reflect process properties. Consequently, the object of x-abstraction is the whole parameterized system whose properties relate to either one process or many processes.
Definition 8
(Process property)
The first-order predicate prop(k), 1≤k≤n, indicating that the kth process has property prop, is called process property. We use PROP(k)={prop(k)} to denote all properties the kth process holds.
Given a process d, the d-label is an instance of prop(k), meaning that process d meets the property prop. PROP(d)={prop(d)} is the set of all d-labels. For every s ^{ y } (s ^{ y }∈S ^{ y }) and process d (1≤d≤n), we have either s ^{ y }⊨prop(d) or s ^{ y }⊭prop(d). If s ^{ y }⊨prop(d) holds, the y-abstract state s ^{ y } has the label prop(d).
It is interesting to note that the global label of the y-abstract state s ^{ y } consists of all the process properties it satisfies. Next we will introduce a new notation to describe the parameterized system.
Definition 9
The first-order predicate snps(k)=prop(k)∧(⋀_{ j≠k } prop(j)) describes not only the kth process but also its environment (comprising the processes j≠k). snps(k) is a quite detailed picture of the global system and is called a snapshot; the set of all snapshots is denoted by SNPS={snps(k)}.
A snapshot snps(k) gives the necessary condition that an equivalence partition meets on PS ^{ y }(n): if there exists a process d satisfying s ^{ y }⊨snps(d), snps(k) is one of the abstract states of s ^{ y }. All y-abstract states which satisfy the above condition compose an equivalence class. If snps(k) is of the form ±prop _{1}(k)∧±prop _{2}(k)∧⋯∧±prop _{ r }(k), r>1, where prop _{1}(k),…,prop _{ r }(k) are r process properties and ±prop _{ i }(k) (1≤i≤r) indicates that prop _{ i }(k) appears positively or negatively, snps(k) can be expressed by a tuple 〈b _{1},b _{2},…,b _{ r }〉, where b _{ i }=1⇔snps(k)⇒prop _{ i }(k). That is, the value of each bit b _{ i } reflects the polarity of the corresponding predicate prop _{ i }(k) in snps(k). Labeling the y-abstract states with atomic formulas will result in a much smaller state space.
In order to construct a TDA model, PROP and SNPS must meet two conditions: coverage and congruence. Coverage means that every y-abstract state is reflected by some snapshots, and congruence implies that snps(k) contains enough information about a process to conclude a label holds true for this process or not. That is to say, for each snps(k)∈SNPS and each prop(k)∈PROP it holds that snps(k)→prop(k) or snps(k)→¬prop(k).
The TDA model PS ^{ t }=(AP ^{ t },S ^{ t },I ^{ t },R ^{ t },L ^{ t }) is then constructed as follows:
1. AP ^{ t } is the set of atomic propositions involved in the process property prop(k), and AP ^{ t }=AP ^{ y } according to Definition 8;
2. S ^{ t }=SNPS is the set of abstract states: the abstract operator α _{ n }(s ^{ y })={snps(k)∈SNPS|∃d,1≤d≤n,s ^{ y }⊨snps(d)} maps all the y-abstract states s ^{ y } in which the hub meets the condition of snps(k) into the TDA abstract state snps(k);
3. I ^{ t } is the set of initial abstract states: snps(k)∈I ^{ t } if there exist a parameterized system PS ^{ y }(n) and a y-abstract state s ^{ y }∈I ^{ y } such that snps(k)∈α _{ n }(s ^{ y });
4. L ^{ t } is the labeling function: for each snps(k)∈S ^{ t }, L ^{ t }(snps(k))={prop(k):snps(k)⇒prop(k)};
5. R ^{ t } is the set of abstract transitions: for each snps _{1}(k)∈S ^{ t }, snps _{2}(k)∈S ^{ t }, if there exist a parameterized system PS ^{ y }(n) and two y-abstract states s ^{ y }∈S ^{ y }, t ^{ y }∈S ^{ y } which meet the condition snps _{1}(k)∈α _{ n }(s ^{ y })∧snps _{2}(k)∈α _{ n }(t ^{ y })∧(s ^{ y },t ^{ y })∈R ^{ y }, then (snps _{1}(k),snps _{2}(k))∈R ^{ t }.
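The x-abstraction can be sketched on explicit y-abstract states as follows. The sketch follows the reading in which a snapshot belongs to the abstraction of a state whenever some process d can play the role of the hub, and it encodes each snapshot as the bit vector 〈b _{1},…,b _{ r }〉. The two process properties, the string encoding of global states, and all function names are hypothetical choices made for illustration, not the predicates used in the paper.

```python
def snapshot(state, k, props):
    """Bit vector of the process properties evaluated with process k as hub."""
    return tuple(int(p(state, k)) for p in props)


def alpha(state, props):
    """Abstract operator: snapshots obtained by letting each process be the hub."""
    return {snapshot(state, d, props) for d in range(len(state))}


def tda_model(Sy, Iy, Ry, props):
    """Existential construction of S^t, I^t and R^t from a y-abstract model."""
    St = set().union(*(alpha(s, props) for s in Sy))
    It = set().union(*(alpha(s, props) for s in Iy))
    Rt = {(a, b) for (s, t) in Ry
          for a in alpha(s, props) for b in alpha(t, props)}
    return St, It, Rt


# Hypothetical process properties over global states written as strings such
# as "SSI" (one letter per cache, processes indexed from 0 in the code).
def hub_is_shared(state, k):
    return state[k] == "S"


def some_rim_is_invalid(state, k):
    return any(c == "I" for j, c in enumerate(state) if j != k)


PROPS = [hub_is_shared, some_rim_is_invalid]

# A three-state fragment of a y-abstract model (assumed example).
Sy = {"III", "SII", "SSI"}
Iy = {"III"}
Ry = {("III", "SII"), ("SII", "SSI")}

St, It, Rt = tda_model(Sy, Iy, Ry, PROPS)
```

The number of abstract states is bounded by 2^r for r process properties, independently of the number of processes n, which is where the reduction along the x axis comes from.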
The TDA abstract state is labeled with prop(k) which process k satisfies, and now k becomes finite after y-abstraction, therefore, S ^{ t } is finite, too. From the theoretical perspective, TDA will reduce the space by (|S|−|S ^{ t }|)/|S| where S is the set of asynchronous composition states defined in Definition 5. At this time, our goal of reducing the state space of parametric verification has been achieved.
Theorem 5
For a single-indexed ACTL* specification ∀x φ(x) where the atomic formulas involved in φ(x) are labels in L ^{ t }, the following holds: PS ^{ t }⊨φ(x)⇒∀n PS ^{ y }(n)⊨∀xφ(x).
Proof
The proof is given in [36]. □
The correctness of TDA means that TDA model is weakly-preserved for single indexed ACTL* specifications, which is guaranteed by Theorems 3, 4, and 5. In addition, Theorem 5 implies that TDA is sound, namely, any single-indexed ACTL* specification which holds in a TDA model also holds in a concrete model with arbitrary number of processes. The completeness and soundness of our approach provide a solid theoretical foundation for optimizing the state space of parameterized systems.
6 An example
We show how TDA runs on a parameterized MESI protocol. The MESI protocol is a four-state write-invalidate cache coherence protocol in which every memory block can be in one of the following states: Modified, Exclusive, Shared, and Invalid [37]. Invalid means that a memory block is not present in the cache; to load it, the processor has to send a request (LD) to the main memory. Modified identifies cache lines that have been written by the corresponding processor (ST). The current version of the modified block resides in the cache and is not visible to the rest of the system at this time. The processor can perform LD, ST, and Eviction on this data. Shared is the only state which allows other valid copies of the same memory block to be stored in other caches. A processor can load from a Shared memory block or evict it without notifying other processors or the memory. Exclusive means that the processor alone holds the right to modify the block and that the main memory is up to date with the contents of the cache. If one cache holds a line in the Exclusive or Modified state, all matching lines in other caches are marked Invalid.
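The state description above can be sketched as a small transition function. This is a minimal, textbook-style simplification and not the model of Fig. 3: it assumes atomic bus transactions, tracks only the cache-line state (no data values), and the names `step` and `reachable` are hypothetical helpers introduced for illustration.

```python
from collections import deque

def step(state, i, op):
    """Return the global state after processor i issues op ("LD", "ST" or "EV").

    `state` is a tuple of per-cache MESI letters, e.g. ("I", "S", "I").
    Assumption: every transaction completes atomically.
    """
    g = list(state)
    others = [j for j in range(len(g)) if j != i]
    if op == "LD" and g[i] == "I":
        holders = [j for j in others if g[j] != "I"]
        for j in holders:          # copies held elsewhere (M, E or S) become Shared
            g[j] = "S"
        g[i] = "S" if holders else "E"
    elif op == "ST":
        for j in others:           # write-invalidate: all other copies are dropped
            g[j] = "I"
        g[i] = "M"
    elif op == "EV":
        g[i] = "I"                 # eviction leaves the line Invalid
    return tuple(g)

def reachable(n):
    """Breadth-first exploration of global states from the all-Invalid state."""
    start = ("I",) * n
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for i in range(n):
            for op in ("LD", "ST", "EV"):
                t = step(s, i, op)
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return seen
```

Under these assumptions, `reachable(3)` contains 14 global states, the same number of states that appear in the PS ^{ y }(3) partition table of this section, and none of them pairs a Modified or Exclusive line with another valid copy.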
Now we want to verify that PS ^{ c }(3) satisfies the following property: when a block of memory is shared by one processor, there exists another processor without a copy of it. The first step is to simplify the MESI protocol for a single processor through y-abstraction, following Definition 6. Because the property concerns only the state of the cache line and not its value, cachedata is redundant. The Kripke structure of the MESI protocol reduced by y-abstraction is shown on the right-hand side of Fig. 3, where each state is labeled with the predicates it satisfies; for example, ‘M’ means cachestate=MODF.
PS ^{ y }(3) state space partition using snps(2)
Equivalence class | Label of equivalence class | Bit-vector of label |
---|---|---|
{ISI,SSI,ISS} | prop _{1}(2)∧prop _{2}(2)=snps(2) | 〈11〉 |
{III,IEI,IMI,EII,MII,IIE,IIS,IIM,SII} | ¬prop _{1}(2)∧prop _{2}(2)=¬snps(2) | 〈01〉 |
{SSS} | prop _{1}(2)∧¬prop _{2}(2)=¬snps(2) | 〈10〉 |
{SIS} | ¬prop _{1}(2)∧¬prop _{2}(2)=¬snps(2) | 〈00〉 |
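The partition in the table can be reproduced with a short sketch. Two readings are assumed here, inferred from the table entries rather than taken from stated definitions: prop _{1}(2) is read as "process 2 holds the block in Shared" and prop _{2}(2) as "some other process is Invalid", with the bit-vector ordered as 〈prop _{1}, prop _{2}〉. The state space is obtained by filtering all MESI combinations with the exclusivity rule (a Modified or Exclusive line forces all other copies to Invalid), not by exploring transitions; all function names are hypothetical.

```python
from itertools import product

def coherent(g):
    """MESI exclusivity rule: an M or E line implies every other line is I."""
    owners = [c for c in g if c in "ME"]
    if not owners:
        return True
    return len(owners) == 1 and all(c == "I" for c in g if c not in "ME")

def prop1(g, k):
    # Assumed reading: process k (1-based) holds the block in Shared.
    return g[k - 1] == "S"

def prop2(g, k):
    # Assumed reading: at least one process other than k is Invalid.
    return any(c == "I" for i, c in enumerate(g) if i != k - 1)

def partition(n=3, k=2):
    """Group coherent global states by the bit-vector (prop1, prop2)."""
    classes = {}
    for g in product("MESI", repeat=n):
        if coherent(g):
            key = (int(prop1(g, k)), int(prop2(g, k)))
            classes.setdefault(key, set()).add("".join(g))
    return classes

if __name__ == "__main__":
    for bits, states in sorted(partition().items(), reverse=True):
        print(bits, sorted(states))
```

With these readings, the class for (1, 1) is {ISI, SSI, ISS}, which corresponds to snps(2), while the other three classes together hold the remaining 11 states.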
7 Case studies
To validate our approach, we have implemented TDA and applied it to verify several classical cache coherence protocols as described in [38] and a hierarchical cache protocol in FT-1000 CPU.
7.1 Protocols and properties to be verified
We briefly introduce the classical protocols and the properties they should satisfy.
Synapse N+1
There are two possible sources of data inconsistency for Synapse:
UNS1: a dirty cache co-exists with one or more caches in state valid;
UNS2: more than one cache is in state dirty.
Illinois
The possible sources of data inconsistency are:
UNS1: a dirty cache co-exists with caches either in state shared or valid-exclusive;
UNS2: there is more than one dirty cache.
The other possible violations of the exclusivity of state valid-exclusive are:
UNS3: there is more than one valid-exclusive cache;
UNS4: a shared cache co-exists with a cache in state valid-exclusive.
Berkeley
In the Berkeley protocol, we have the following sources of data inconsistency:
UNS1: an owned-exclusively cache co-exists with one or more caches either in state owned-non-exclusively or unowned;
UNS2: there is more than one owned-exclusively cache.
Dragon
P≜Number(exclusive)=0∧Number(dirty)=0∧Number(shared-dirty)=0∧Number(shared-clean)=0,
Q≜Number(shared-dirty)+Number(shared-clean)≥2,
S≜Number(shared-dirty)=0∧Number(shared-clean)=1,
T≜Number(shared-dirty)=1∧Number(shared-clean)=0.
In the Dragon protocol, there are several possible sources of data inconsistency:
UNS1: a dirty cache co-exists with one or more caches either in state shared-dirty, shared-clean, or valid-exclusive;
UNS2: a valid-exclusive cache co-exists with one or more caches either in state shared-clean or shared-dirty;
UNS3: there is more than one dirty cache;
UNS4: there is more than one valid-exclusive cache.
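Each unsafe condition listed in this section is a statement about how many caches are in each state, so it can be written as a predicate over per-state counts. The sketch below does this for the Illinois protocol only. The function name and the state string "invalid" are assumptions made for illustration; the other state names follow the text.

```python
from collections import Counter

def illinois_violations(global_state):
    """Return the set of Illinois unsafe conditions that hold in a global state.

    `global_state` is a sequence of per-cache state names, for example
    ("dirty", "shared", "invalid").
    """
    n = Counter(global_state)
    dirty = n["dirty"]
    vex = n["valid-exclusive"]
    shared = n["shared"]

    found = set()
    if dirty >= 1 and (shared >= 1 or vex >= 1):
        found.add("UNS1")   # dirty co-exists with shared or valid-exclusive
    if dirty > 1:
        found.add("UNS2")   # more than one dirty cache
    if vex > 1:
        found.add("UNS3")   # more than one valid-exclusive cache
    if shared >= 1 and vex >= 1:
        found.add("UNS4")   # shared co-exists with valid-exclusive
    return found
```

A global state is safe with respect to these four conditions when the returned set is empty. The conditions of the other protocols could be encoded in the same way with their own state names.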
7.2 Experimental results
The asynchronous composition of an n-processor system that maintains data consistency through a given protocol is a concrete system. Figure 10 shows the number of concrete states of each protocol, computed according to Definition 5, for different values of the system parameter. Although in the worst case the number of states in the asynchronous composition can be as large as \({\prod}^{n}_{k=1}|{S_{k}}|\), in practice it is typically much smaller, because some states, such as 〈dirty,dirty〉 in the Illinois protocol and 〈owned-exclusively,owned-exclusively〉 in the Berkeley protocol, are prohibited. As the figure shows, the number of states grows rapidly as the number of processors increases (especially beyond 13 for Berkeley and Dragon, and beyond 20 for Synapse N+1 and Illinois). As a result, the largest asynchronous composition we could construct comprises only 24 processors (Synapse N+1).
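The gap between the worst-case product and the states that are not prohibited can be illustrated by brute force for small n. The sketch below uses the Synapse N+1 conditions UNS1 and UNS2 from Sect. 7.1 and assumes three per-cache states named "invalid", "valid", and "dirty" (the name "invalid" is an assumption). It counts states that are merely not excluded by those conditions, which is an upper bound on what is reachable, and it does not reproduce the values plotted in Fig. 10.

```python
from itertools import product

# Assumed per-cache state names for Synapse N+1.
SYNAPSE_STATES = ("invalid", "valid", "dirty")

def permitted(global_state):
    """True if neither UNS1 nor UNS2 holds in the given global state."""
    dirty = global_state.count("dirty")
    valid = global_state.count("valid")
    uns1 = dirty >= 1 and valid >= 1   # dirty co-exists with valid
    uns2 = dirty > 1                   # more than one dirty cache
    return not (uns1 or uns2)

def count_permitted(n):
    """Count the permitted global states of an n-cache system by enumeration."""
    return sum(permitted(g) for g in product(SYNAPSE_STATES, repeat=n))

if __name__ == "__main__":
    for n in range(1, 7):
        worst_case = len(SYNAPSE_STATES) ** n
        print(n, count_permitted(n), worst_case)
```

Even in this simplified count the number of permitted states is well below the product bound, yet it still grows exponentially with n, which is why the concrete composition becomes infeasible for larger systems.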
In Fig. 11, we plot the number of states in the TDA model of each protocol. Because the process properties used in TDA are built from predicates taken from the properties to be verified, different properties of the same protocol yield different TDA models. Two predicates, cachestate(i)=dirty/shared and \({\mathit{Number}(\mathit{dirty}/\mathit{valid} \text {-}\mathit{exclusive})}\), are enough to express these properties formally, so a TDA model has at most 4 abstract states. AHG denotes the number of reachable states in the abstract history graph described in [39], which is greater than the number of states in the TDA model. It is also important to note that the number of states in the TDA model does not change with the system parameter, which is consistent with the conclusion in Sect. 6. All experiments were conducted on a PC with a 3.3 GHz Intel Core processor and 8 GB of available main memory, running Red Hat Linux (6.1) and GCC (4.4.5).
7.3 Application for FT-1000 CPU
Experimental results of FT-1000 chip-level protocol
Asynchronous composition state number | TDA state number (UNS1) | TDA state number (UNS2) | Time (ms) (UNS1) | Time (ms) (UNS2) |
---|---|---|---|---|
264 | 4 | 2 | 64 | 57 |
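As a rough worked instance of the reduction ratio (|S|−|S ^{ t }|)/|S| given before Theorem 5, and assuming the table entries are read at face value (264 asynchronous composition states, and 4 and 2 TDA states for UNS1 and UNS2, respectively), the reduction would be approximately:

\[
\frac{|S|-|S^{t}|}{|S|}=\frac{264-4}{264}\approx 0.985 \quad (\text{UNS1}), \qquad \frac{264-2}{264}\approx 0.992 \quad (\text{UNS2}).
\]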
8 Conclusions
The verification of cache coherence in general is known to be NP-hard. In the age of exascale computing, scalability is emerging as one of the key components in parallel computing [41]. Scalable multi-core and multi-processor architectures are inevitable. Increasingly complex processes and an unbounded system parameter lead to state explosion during the verification of parameterized cache coherence protocols. This paper has put forward a generic abstraction method for parameterized systems, two-dimensional abstraction (TDA). The novelty of our approach is that it analyzes in depth the intrinsic factors affecting the size of the state space and reduces the state space in two dimensions, producing a much smaller abstract model. Compared with traditional approaches, our approach effectively reduces the verification complexity and greatly extends the verification capabilities. We give complete soundness and completeness proofs for our method. We have demonstrated the benefits of our approach on several coherence protocols with realistic features.
Our future work is to integrate TDA with model-checking tools and to check the hierarchically organized cache coherence protocol of a next-generation supercomputer. We also plan to investigate combining TDA with the CMP method.
Acknowledgements
This work is inspired by the idea from M. Talupur’s work on environment abstraction, and supported by the National Natural Science Foundation of China under Grant No. 61070036 and 61133007.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.