# State space reduction in modeling checking parameterized cache coherence protocol by two-dimensional abstraction

DOI: 10.1007/s11227-012-0755-0

- Cite this article as:
- Guo, Y., Qu, W., Zhang, L. et al. J Supercomput (2012) 62: 828. doi:10.1007/s11227-012-0755-0

## Abstract

Scalability of cache coherence protocols is a key component in future shared-memory multi-core and multi-processor systems. State space explosion is the first hurdle when applying model checking to scalable protocols. In order to validate parameterized cache coherence protocols effectively, we present a new method for reducing the state space of parameterized systems: two-dimensional abstraction (TDA). Drawing inspiration from the design principles of parameterized systems, an abstract model of an unbounded system is constructed out of finite states. The mathematical principles underlying TDA are presented. Theoretical reasoning demonstrates that TDA is correct and sound. An example of a parameterized cache coherence protocol based on MESI illustrates how TDA produces a much smaller abstract model. We also demonstrate the power of our method by applying it to various well-known classes of protocols. During the development of the TH-1A supercomputer system, TDA was used to verify the coherence protocol of the FT-1000 CPU and showed potential advantages in reducing verification complexity.

### Keywords

Parameterized cache coherence protocol, True concurrency, Model checking, Two-dimensional abstraction

## 1 Introduction

Model checking is an automatic technique for verifying finite state concurrent systems. It uses a finite state machine to describe the system under consideration and temporal logic to state the properties that the system must satisfy. This method has been used successfully in practice to verify complex software and hardware systems [1, 2]. However, efficient verification of parameterized cache coherence protocols remains one of the most challenging problems in the verification domain today. Firstly, parameterized systems are composed of an arbitrary number of processes that run concurrently and cooperate (the number of processes is called the system parameter). The behavior of one process is determined not only by its current state, but also by changes in the environment in which it runs. Secondly, parameterized systems are by nature unbounded. The system parameter may be arbitrarily large, and the ultimate goal is to validate the properties of a system for every possible number of processes. In such cases, the number of global states can be enormous, resulting in state space explosion. Formal verification of parameterized systems is known to be undecidable in general and thus cannot be fully automated. Thirdly, symbolic methods such as BDDs or SAT, which enable scalable formal verification in other settings, can be ineffective for cache coherence protocols because most of the state variables are relevant to the properties being verified. As faster and larger systems are designed, the complexity of cache protocols will continue to increase.

Fong Pong [3] presented a comprehensive survey of approaches to the verification of cache coherence protocols based on state enumeration, model checking, and symbolic state models. He pointed out that no framework had yet been proposed to deal with the memory consistency model in the context of formal verification based on state expansion. Monolithic formal verification methods that treat the protocol as a whole have been used fairly routinely for verifying cache coherence protocols since the early 1990s [4, 5]. However, these monolithic techniques cannot handle the very large state space of parameterized protocols. While techniques like indexed predicates [6], counter abstraction [7], environment abstraction [8, 9], and cutoff-based approaches [10] have been proposed for parameterized protocol verification in recent years, most of them do not scale well to large protocols, and those that do scale require an inordinate amount of manual effort to succeed [11]. We are not aware of any published work that has reported formal verification of a parameterized cache coherence protocol with reasonable complexity.

All successful applications of model checking thus far have made use of domain-specific abstraction techniques. Continuing this trend and drawing inspiration from recent work such as environment abstraction [8, 9], we exploit domain knowledge about parameterized systems to devise an appropriate abstraction method. We propose a novel generic approach called two-dimensional abstraction (TDA), which can effectively reduce the state space of parameterized systems. In our work, the state transition graph of each process is first reduced independently; the whole system composed of the reduced processes is then abstracted based on the design principles of parameterized systems, thus avoiding the construction of the complete state space, which might be too large to fit into memory.

TDA has a number of advantages over other approaches. First, TDA abstracts away redundant information from a concrete system via decomposition–abstraction–composition–reabstraction, thus effectively alleviating the state explosion problem during the verification of parameterized systems. Second, TDA can be used for parallel systems in the usual fashion because it places no restriction on the communication mode among processes. Third, TDA can be used with any model checker; the freedom to choose a model checker is important in practice. Fourth, TDA is sound and complete, and we give full soundness and completeness proofs for our method. Finally, constant heterogeneous processes and infinite state systems are allowed, which makes TDA suitable for large-scale heterogeneous systems. We demonstrate the power of our method by applying it to various well-known classes of protocols.

The rest of this paper is organized as follows. In Sect. 2, we introduce previous related work. Section 3 gives some background information. In Sect. 4, we propose a model with true concurrency semantics for parameterized systems. In Sect. 5, we present concepts of a TDA model and the method to construct a TDA model. A cache coherence protocol based on MESI is used to illustrate the approach of getting a much smaller state space by TDA in Sect. 6. Experimental results of various well-known protocols and application are presented in Sect. 7. Section 8, the last section, presents concluding remarks.

## 2 Related works

The development of effective techniques for checking parameterized systems is one of the most challenging problems in verification today. Prior research in the area of coherence protocol verification has ranged from simulation to formal methods. These techniques have had varying degrees of success, but few of them have been applied to a large industrial-strength protocol like FLASH.

Simulation with random or directed stimulus has been shown to be effective at finding most protocol errors [12]. However, simulation tends not to be effective at uncovering subtle bugs, especially those related to the consistency model. Subtle consistency bugs often occur only under unusual combinations of circumstances, and it is unlikely that simulation will drive the protocol to these situations.

For verification of high level specifications, modern industrial practice consists of modeling small instances of the protocols in guard/action languages such as Murphi [13] or TLA+ [14], and exploring the reachable states through explicit state enumeration.

The idea of using non-interference lemmas for parameterized model checking is attributed to McMillan [15], Chou [16], and Li [17]; the resulting approach is also called the CMP method. The CMP approach to parameterized verification combines data type reduction with compositional reasoning. In this approach, a model checker is used as a proof assistant and the user guides the proof by supplying invariants or non-interference lemmas. Similar types of reasoning have been applied by Chen to verify non-parameterized hierarchical protocols [18]. The compositional method of McMillan handles infinite state systems, including directory-based protocols. This technique, which requires user intervention at various stages, has been applied to verify safety and liveness properties of the FLASH protocol. Chou [16] presented a method along similar lines that was used to verify safety of the FLASH and GERMAN protocols. Krstic [19] gave a formalization of the method. The CMP method scales well. As far as we are aware, it is one of the few methods able to handle the full complexity of the FLASH protocol. Intel used CMP to verify an industrial-strength cache protocol several orders of magnitude larger than even the FLASH protocol [20]. Talupur and Tuttle showed how to derive high-quality invariants from message flows and how to use these invariants to accelerate the CMP method [21, 22]. A message flow is a sequence of messages sent among processors during the execution of a protocol. The hardest part of using CMP is finding a set of protocol invariants that enable CMP to work. The user bears the burden of coming up with non-interference lemmas, which can be non-trivial and require a deep understanding of the protocol under verification.

Another effective method for parameterized verification is the abstraction approach [6–9, 11, 23–25]. Predicate abstraction, first proposed by Graf [11] as a special case of the general framework of abstract interpretation, has been used in the verification of parameterized protocols. In predicate abstraction, a finite set of predicates is defined over the concrete set of states. These predicates are used to construct a finite state abstraction of a concrete system. The automation in generating the finite abstract model makes this scheme attractive for combining deductive and algorithmic approaches to infinite state verification. Lahiri [26] proposed the use of a symbolic decision procedure and its application to predicate abstraction. One of the main problems of predicate abstraction is that it typically makes a large number of theorem prover calls when computing the abstract transition relation or the abstract state space. Pnueli [23] presented the method of invisible invariants, which combines a small-model theorem with a heuristic to generate proofs of correctness of parameterized systems. Wang [24] used monotonic abstraction to provide an over-approximation of the transition system induced by a parameterized system. The over-approximation gives a transition system that is monotonic with respect to a well quasi-ordering on the set of configurations. Timm [25] presented an approach combining symmetry arguments with spotlight abstractions. The technique determines (the size of) a particular instantiation of the parameterized system from the given temporal logic formula, and feeds this into an abstracting model checker. Environment abstraction [8, 9] exploits the replicated structure of a parameterized system to make its verification easier, converting the unbounded system into a bounded one via a finite state description. In real cache coherence protocols, however, the internal state of each cache can be quite complex, and thus environment abstraction might fail. Another method is divide-and-conquer: abstraction is performed for each process independently before the model of the whole system is constructed [27]. Unfortunately, this method imposes too many constraints on the systems under consideration to be practical.

Other related work includes that of Pandav [28], who proposed a set of heuristics to aid in constructing invariants for cache protocols. Delzanno [29] used arithmetic constraints to model possibly infinite sets of global states of a multi-processor system with many identical caches. General purpose symbolic model checkers for infinite-state systems working over arithmetical domains were used. Delzanno and Bultan [30, 31] described a constraint-based verification method for handling the safety and liveness properties of the GERMAN protocol, but their method cannot verify single index liveness properties. Emerson and Kahlon [32] verified GERMAN by first reducing it to a snoopy bus protocol and then invoking a theorem asserting that if a snoopy bus protocol of a certain form is correct for 7 nodes then it is correct for any number of nodes. Pnueli proposed an elegant cutoff method that can verify the DIR protocol [10], but it was sound but not complete, and worked only for safety properties. A broad technique was proposed for the verification of WSIS systems that can handle the DIR protocol as an example [33], yet again the resulting technique was sound but not complete.

## 3 Preliminaries

This section contains basic material about Kripke structures, temporal logic, and equivalence relations on Kripke structures [34].

### Definition 1

(Kripke structure)

Let *AP* be a set of atomic propositions. A *Kripke structure* *M* over *AP* is a five-tuple *M*=(*AP*,*S*,*I*,*R*,*L*) where

1. *S* is a finite set of states.
2. *I*⊆*S* is the set of initial states.
3. *R*⊆*S*×*S* is a transition relation that must be total, that is, for every state *s*∈*S* there is a state *s*′∈*S* such that *R*(*s*,*s*′).
4. *L*:*S*→2^{*AP*} is a function that labels each state with the set of atomic propositions true in that state.
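As an illustration of Definition 1, a Kripke structure can be represented directly as a small data type. The sketch below is only one possible encoding: the class name `Kripke`, the method `is_well_formed`, and the two-state toy cache line (states `I` and `M`, proposition `valid`) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Kripke:
    ap: frozenset      # AP, the atomic propositions
    states: frozenset  # S, a finite set of states
    init: frozenset    # I, the initial states
    trans: frozenset   # R, a set of (s, s') pairs
    label: dict        # L, mapping each state to a set of propositions

    def is_well_formed(self) -> bool:
        """Check I <= S, R <= S x S, totality of R, and L(s) <= AP."""
        if not self.init <= self.states:
            return False
        if any(s not in self.states or t not in self.states
               for s, t in self.trans):
            return False
        # R is total when every state is the source of some transition.
        if {s for s, _ in self.trans} != self.states:
            return False
        return all(self.label.get(s, frozenset()) <= self.ap
                   for s in self.states)


# Hypothetical toy example: a cache line that is either invalid or modified.
toy = Kripke(
    ap=frozenset({"valid"}),
    states=frozenset({"I", "M"}),
    init=frozenset({"I"}),
    trans=frozenset({("I", "I"), ("I", "M"), ("M", "I")}),
    label={"I": frozenset(), "M": frozenset({"valid"})},
)
print(toy.is_well_formed())
```

Dropping the transitions out of `M` would leave `M` without a successor, so the totality check would reject the structure.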

Temporal logic is used to specify properties of Kripke structures. *CTL*^{⋆}, a powerful logic, describes properties of computation trees. A tree is formed by designating a state in a Kripke structure as the initial state and then unwinding the structure into an infinite tree with the designated state at the root. In *CTL*^{⋆}, formulas are composed of path quantifiers and temporal operators. The path quantifiers are used to describe the branching structure in the computation tree. There are two such quantifiers *A* (for all computation paths) and *E* (for some computation path). The temporal operators, *X* (next time), *F* (in the future), *G* (always), *U* (until), and *R* (release) describe properties of a path through the tree.

There are two types of formulas in *CTL*^{⋆}: state formulas, which are true in a specific state, and path formulas, which are true along a specific path. Let *AP* be the set of atomic propositions; the syntax of *CTL*^{⋆} is given by the following rules:

1. If *p*∈*AP*, then *p* is a state formula.
2. If *f* and *g* are state formulas, then ¬*f*, *f*∧*g*, and *f*∨*g* are state formulas.
3. If *f* is a path formula, then *Ef* and *Af* are state formulas.
4. If *f* is a state formula, then *f* is also a path formula.
5. If *f* and *g* are path formulas, then ¬*f*, *f*∧*g*, *f*∨*g*, *Xf*, *Ff*, *fUg*, and *fRg* are path formulas.
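The five syntax rules can be read as a pair of mutually recursive predicates. The following sketch assumes a hypothetical encoding of formulas as nested tuples such as `("A", ("F", ("ap", "p")))`; the tag names and function names are illustrative only, and operator arity is not checked.

```python
def is_state_formula(f):
    """Rules 1-3: atomic propositions, boolean combinations, E and A."""
    op, *args = f
    if op == "ap":                      # rule 1
        return True
    if op in ("not", "and", "or"):      # rule 2
        return all(is_state_formula(a) for a in args)
    if op in ("E", "A"):                # rule 3
        return is_path_formula(args[0])
    return False


def is_path_formula(f):
    """Rules 4-5: every state formula, plus temporal combinations."""
    if is_state_formula(f):             # rule 4
        return True
    op, *args = f
    if op in ("not", "and", "or", "X", "F", "U", "R"):  # rule 5
        return all(is_path_formula(a) for a in args)
    return False


p = ("ap", "p")
af_p = ("A", ("F", p))          # AF p
print(is_state_formula(af_p))   # a path quantifier yields a state formula
print(is_state_formula(("F", p)))  # F p alone is only a path formula
```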

Let *M* be a Kripke structure over *AP*. A path in *M* from a state *s* is an infinite sequence of states *π*=*s*_{0}*s*_{1}*s*_{2}⋯ such that *s*_{0}=*s* and *R*(*s*_{i},*s*_{i+1}) holds for all *i*≥0. We use *π*^{i} to denote the suffix of *π* starting at *s*_{i}.

The restriction of *CTL*^{⋆} to universal path quantifiers *A* is called *ACTL*^{⋆}.

Simulation equivalence restricts the logic and relaxes the requirement that the two structures satisfy exactly the same formulas, resulting in a greater reduction of the state space.

### Definition 2

(Simulation relation)

Given two Kripke structures *M* and *M*′ with *AP*′⊆*AP*, a relation *H*⊆*S*×*S*′ is a *simulation relation* between *M* and *M*′ if and only if for all *s* and *s*′, if *H*(*s*,*s*′) then the following conditions hold:

1. *L*(*s*)∩*AP*′=*L*′(*s*′).
2. For every state *s*_{1} such that *R*(*s*,*s*_{1}), there is a state \(s_{1}^{\prime}\) with the property that \(R^{\prime}(s^{\prime},s_{1}^{\prime})\) and \(H(s_{1},s_{1}^{\prime})\).

If there exists a simulation relation *H* such that for every initial state *s*_{0} in *M* there is an initial state \(s_{0}^{\prime}\) in *M*′ for which \(H(s_{0},s_{0}^{\prime})\), we say that *M*′ *simulates* *M* (denoted by *M*≼*M*′).
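For finite structures, the simulation preorder can be decided by computing the greatest relation that satisfies both conditions of Definition 2. The sketch below uses a plain fixed-point refinement; it is a textbook-style procedure, not an algorithm taken from the paper. Structures are assumed to be tuples `(AP, S, I, R, L)`, and the three-state and two-state example structures are hypothetical.

```python
def greatest_simulation(S1, R1, L1, S2, R2, L2, AP2):
    """Largest H within S1 x S2 satisfying both conditions of Definition 2."""
    succ1 = {s: {t for (u, t) in R1 if u == s} for s in S1}
    succ2 = {s: {t for (u, t) in R2 if u == s} for s in S2}
    # Condition 1: labels agree on the propositions of the second structure.
    H = {(s, t) for s in S1 for t in S2 if L1[s] & AP2 == L2[t]}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(H):
            # Condition 2: every move of s must be matched by a move of t.
            if not all(any((s1, t1) in H for t1 in succ2[t])
                       for s1 in succ1[s]):
                H.discard((s, t))
                changed = True
    return H


def simulates(M1, M2):
    """True if M2 simulates M1; each structure is (AP, S, I, R, L)."""
    _, S1, I1, R1, L1 = M1
    AP2, S2, I2, R2, L2 = M2
    H = greatest_simulation(S1, R1, L1, S2, R2, L2, AP2)
    return all(any((s0, t0) in H for t0 in I2) for s0 in I1)


# Hypothetical concrete structure with an extra proposition q.
M1 = ({"p", "q"}, {"a0", "a1", "a2"}, {"a0"},
      {("a0", "a1"), ("a1", "a2"), ("a2", "a0")},
      {"a0": set(), "a1": {"p"}, "a2": {"p", "q"}})
# Hypothetical abstract structure over {p}; b1 merges a1 and a2.
M2 = ({"p"}, {"b0", "b1"}, {"b0"},
      {("b0", "b1"), ("b1", "b1"), ("b1", "b0")},
      {"b0": set(), "b1": {"p"}})
# Same abstract states, but without the self-loop on b1.
M2_no_loop = ({"p"}, {"b0", "b1"}, {"b0"},
              {("b0", "b1"), ("b1", "b0")},
              {"b0": set(), "b1": {"p"}})
print(simulates(M1, M2), simulates(M1, M2_no_loop))
```

Removing the self-loop on `b1` breaks the simulation because the concrete step from `a1` to `a2` no longer has an abstract counterpart.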

## 4 Modeling parameterized systems

States of each process in a parameterized system are considered as interpretations over a finite variable set, *V*. For each *V*, a subset *V*^{e} is called an external variable set that is used by the process to communicate with the environment consisting of other processes. The set *V*^{i}=*V*−*V*^{e} is an internal variable set. Obviously, the environment may update only external variables, whereas the process may update all the variables. Such processes are modeled by Kripke structures which describe a class of finite state systems with first-order logic propositions. A complex parameterized system is modeled as a composition of such smaller processes when the following conditions are met.

### Definition 3

(Compatible structure)

Consider two Kripke structures *M*_{1}=(*AP*_{1},*S*_{1},*I*_{1},*R*_{1},*L*_{1}) and *M*_{2}=(*AP*_{2},*S*_{2},*I*_{2},*R*_{2},*L*_{2}) whose state variable sets are *V*_{1} and *V*_{2}, respectively. If \(V_{1}^{i} \cap V_{2}^{i}= \varnothing\) and \(V_{1}^{e}=V_{2}^{e}\), then *M*_{1} and *M*_{2} are compatible structures. The former condition indicates that each internal variable is owned by only one process, and the latter requires that the external variables be shared by both processes.

### Definition 4

(Compatible state)

Let *M*_{1}=(*AP*_{1},*S*_{1},*I*_{1},*R*_{1},*L*_{1}) and *M*_{2}=(*AP*_{2},*S*_{2},*I*_{2},*R*_{2},*L*_{2}) be two compatible structures. If *L*_{1}(*s*_{1}) ∩ *AP*_{2}=*L*_{2}(*s*_{2}) ∩ *AP*_{1} is true, then *s*_{1}∈*S*_{1} and *s*_{2}∈*S*_{2} are compatible. Compatible states agree on the external variables as well as the common atomic propositions.

Processes communicate with each other in either synchronous or asynchronous mode. In the synchronous execution mode, all processes execute their transitions at the same time, whereas in the asynchronous execution mode, the process state transitions are independent of each other: the system evolves by interleaving the evolution of its processes, and at each execution cycle only one process is chosen to perform a transition. However, parameterized systems in which different processes may change their states at the same time are very common in reality. There is no order between these transitions, which preserves the true meaning of concurrency. We call such a communication mode asynchronous composition with true concurrency semantics. From the viewpoint of computer science, it is more interesting to investigate asynchronous products of Kripke structures with true concurrency semantics. We propose a formal model with true concurrency semantics for parameterized systems, which is more suitable for describing concurrent systems in the usual fashion.

### Definition 5

(Asynchronous composition with true concurrency semantics)

Let *M*_{k}=(*AP*_{k},*S*_{k},*I*_{k},*R*_{k},*L*_{k}) be the *k*th (1≤*k*≤*n*) Kripke structure among compatible structures. Their asynchronous composition with true concurrency semantics, \(M=\prod_{a\ k=1}^{n} M_{k}=(\mathit{AP},S,I,R,L)\), is defined as follows:

1. \(\mathit{AP}=\bigcup_{k=1}^{n} \mathit{AP}_{k}\).
2. \(S=\{\langle s_{1},s_{2},\ldots,s_{n}\rangle \mid s_{k}\in S_{k}\ (1 \le k \le n)\ \text{are compatible states}\} \subseteq \prod_{k=1}^{n} S_{k}\).
3. \(I=\{\langle s_{1},s_{2},\ldots,s_{n}\rangle \mid \bigwedge_{k=1}^{n} (s_{k}\in I_{k})\}\subseteq S\).
4. \(R=\{(\langle s_{1,i},s_{2,i},\ldots,s_{n,i}\rangle,\langle s_{1,i+1},s_{2,i+1},\ldots,s_{n,i+1}\rangle) \mid \exists j,\ 1\le j\le n,\ (s_{j,i},s_{j,i+1})\in R_{j}\}\).
5. \(L(\langle s_{1},s_{2},\ldots,s_{n}\rangle)=\bigcup_{k=1}^{n}L_{k}(s_{k})\).
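The composition can be sketched for small explicit structures as follows. Two points are assumptions of this sketch rather than statements of Definition 5: compatibility is checked pairwise on labels (Definition 4), and in each global step every component either takes one of its own transitions or keeps its state, with at least one component moving. The helper names `compatible`, `compose`, and `cache` are hypothetical.

```python
from itertools import product


def compatible(state, components):
    """Definition 4 applied pairwise: agree on the shared propositions."""
    for a in range(len(components)):
        for b in range(a + 1, len(components)):
            APa, La = components[a][0], components[a][4]
            APb, Lb = components[b][0], components[b][4]
            if La[state[a]] & APb != Lb[state[b]] & APa:
                return False
    return True


def compose(components):
    """Compose a list of (AP, S, I, R, L) tuples with true concurrency."""
    n = len(components)
    AP = frozenset().union(*(c[0] for c in components))
    S = {s for s in product(*(list(c[1]) for c in components))
         if compatible(s, components)}
    I = {s for s in S if all(s[k] in components[k][2] for k in range(n))}
    R = set()
    for s in S:
        # Assumption: a component either moves along R_k or stays put.
        options = [[(s[k], False)]
                   + [(t, True) for (u, t) in components[k][3] if u == s[k]]
                   for k in range(n)]
        for choice in product(*options):
            target = tuple(t for t, _ in choice)
            if any(moved for _, moved in choice) and target in S:
                R.add((s, target))
    L = {s: frozenset().union(*(components[k][4][s[k]] for k in range(n)))
         for s in S}
    return AP, S, I, R, L


def cache(k):
    """Hypothetical two-state cache with its own proposition v<k>."""
    v = f"v{k}"
    return (frozenset({v}), {"I", "V"}, {"I"},
            {("I", "V"), ("V", "I")},
            {"I": frozenset(), "V": frozenset({v})})


AP, S, I, R, L = compose([cache(1), cache(2)])
print(len(S), len(R))
```

In this example the pair `(("I", "I"), ("V", "V"))` belongs to `R`: both caches move in the same step, which a purely interleaved composition would exclude.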

### Theorem 1

*The asynchronous composition operator with true concurrency semantics*, ∏_{a}, *is commutative and associative*.

### Proof

By Definition 5, the set of atomic propositions of the composition is the union of the component atomic propositions; so is the set of labels. States of the composition are vectors of compatible component states, and they are elements of the Cartesian product of the component state sets. Each transition of the composition involves a transition of at least one of the *n* components. Because the union and product of sets are commutative and associative, the asynchronous composition operator with true concurrency semantics is also commutative and associative. □

## 5 Two-dimensional abstraction

A parameterized system can be viewed along two dimensions: the *x* axis denotes the system parameter *n*, and the *y* axis denotes the size *m* of the state space of each process. To simplify the presentation, it is supposed that all processes are identical. Since the full cross-product of the process states needs to be considered in the global system at each step, the result of the asynchronous composition with true concurrency semantics is very large, *m*^{n} in the worst case. Too many reachable states impede automatic verification in many practical cases. The two-dimensional abstraction technique proposed in this paper is specifically tailored for parameterized systems with true concurrency semantics and helps avoid the problem of state explosion.

### Definition 6

(Two-dimensional abstraction) For asynchronous concurrent parameterized systems with true concurrency semantics, *two-dimensional abstraction* is a process constructing an abstract model by first reducing the state space of each process independently along the *y* axis in order to reduce *m* and then hiding the system parameter *n* along the *x* axis based on the design principles of parameterized systems. The former step is called *y-abstraction*, and the latter *x-abstraction*. The corresponding reduced results are called the *y-abstract* model and *TDA* model, respectively.

The selection of an equivalence relation between a TDA model and a concrete system is of prime importance for the successful application of TDA in practice. Simulation relationship [35] will result in a greater reduction of the number of states by restricting logic and relaxing the requirement that two structures should satisfy exactly the same set of formulas. Given two Kripke structures *M*_{1}=(*AP*_{1},*S*_{1},*I*_{1},*R*_{1},*L*_{1}) and *M*_{2}=(*AP*_{2},*S*_{2},*I*_{2},*R*_{2},*L*_{2}) with *AP*_{2}⊆*AP*_{1}, if there exists a simulation relation *H* such that for every initial state *s*_{10} (*s*_{10}∈*I*_{1}) in *M*_{1} there is an initial state *s*_{20} (*s*_{20}∈*I*_{2}) in *M*_{2} for which *H*(*s*_{10},*s*_{20}), we say that *M*_{2} simulates *M*_{1} and denote it by *M*_{1}≼*M*_{2}. Intuitively, for every transition in *M*_{1}, there is a corresponding transition in *M*_{2}.

In the following sections, *PS*^{c}(*n*) refers to the concrete model of asynchronous concurrent parameterized systems with true concurrency semantics consisting of *n* concrete processes. *PS*^{y}(*n*) is the *y*-abstract model of *PS*^{c}(*n*) and *PS*^{t}(*n*) is its TDA model.

### 5.1 *y*-Abstraction

The *y*-abstraction deals with each concrete process independently in order to abstract away information that is irrelevant to the system properties. Any property-preserving abstraction method may be used. We construct a finite predicate set *Φ*={*φ*_{1},*φ*_{2},…,*φ*_{r}} from the properties and the system description, and build the *y*-abstract model through basic predicate abstraction.

The predicate set *Φ* defines an equivalence relation on \(S_{k}^{c}\), the set of states of \(M_{k}^{c}=(\mathit{AP}_{k}^{c},S_{k}^{c},I_{k}^{c},R_{k}^{c},L_{k}^{c})\ (1 \le k \le n)\), and each equivalence class is denoted by an abstract state. A concrete state is labeled with the predicate formulas that are satisfied in that state. In other words, the labeling function \(L_{k}^{c}\) maps a concrete state into a predicate set. The set of states of the *y*-abstract model \(M_{k}^{y}\), \(S_{k}^{y}\), is a set of normal boolean expressions on *b*_{1},*b*_{2},…,*b*_{r}, where *b*_{j} (1≤*j*≤*r*) corresponds to predicate *φ*_{j}. A *y*-abstract state is a truth assignment to the *r* boolean variables. The labeling function \(L_{k}^{y}\) maps a *y*-abstract state into a boolean expression. The abstraction operator \(H_{k}^{cy}\) determines the relationship between concrete states and abstract states. The method of building the transition relation \(R_{k}^{y}\) of the *y*-abstract model \(M_{k}^{y}\) from the concrete transition relation \(R_{k}^{c}\) is the same as that introduced by Graf and Saidi [11]. From the above definitions, we can conclude that \(H_{k}^{cy} \subseteq S_{k}^{c} \times S_{k}^{y}\) is a simulation relation between \(M_{k}^{c}\) and \(M_{k}^{y}\), so the following theorem holds.
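For an explicitly enumerated process, the idea can be sketched as an existential abstraction: each concrete state is mapped to the tuple of truth values of the predicates, and an abstract transition exists whenever some pair of concrete representatives has one. This is a simplified illustration, not the theorem-prover-based construction of Graf and Saidi; the function name, the `(mode, timer)` state encoding, and the predicate `valid` are hypothetical.

```python
def predicate_abstraction(S, I, R, predicates):
    """Abstract an explicit transition system w.r.t. a list of predicates.

    predicates is a list of functions state -> bool; an abstract state is
    the tuple (b_1, ..., b_r) of their truth values.
    """
    def h(s):
        return tuple(bool(p(s)) for p in predicates)

    S_abs = {h(s) for s in S}
    I_abs = {h(s) for s in I}
    # Abstract transition: induced by at least one concrete transition.
    R_abs = {(h(s), h(t)) for (s, t) in R}
    return h, S_abs, I_abs, R_abs


# Hypothetical concrete process: a mode plus an internal timer.
order = [(m, t) for m in ("I", "S") for t in range(3)]
R = {(order[i], order[(i + 1) % len(order)]) for i in range(len(order))}


def valid(s):
    """Predicate phi_1: the line is not invalid."""
    return s[0] != "I"


h, S_abs, I_abs, R_abs = predicate_abstraction(order, [order[0]], R, [valid])
print(len(order), "concrete states ->", len(S_abs), "abstract states")
```

By construction, every concrete transition `(s, t)` has the abstract counterpart `(h(s), h(t))`, which is the property that makes `h` a simulation relation in this sketch.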

### Theorem 2

\(M_{k}^{c} \preccurlyeq M_{k}^{y}\ (1 \le k \le n)\).

### Proof

The proof is given in [11]. □

In the following, we will demonstrate how the *y*-abstraction affects the parameterized concurrent systems.

### Definition 7

(Visible transitions set and invisible transitions set)

Given a Kripke structure *M*=(*AP*,*S*,*I*,*R*,*L*), we assume that *AP*_{f} is the set of atomic propositions involved in the temporal formula *f*. The *set of visible transitions* of *M* w.r.t. *AP*_{f} includes transitions affecting the truth of atomic propositions in *AP*_{f}, which is denoted by *VTS*(*M*,*AP*_{f})={(*s*,*t*)|(*s*,*t*)∈*R*∧(*L*(*s*)∩*AP*_{f}≠*L*(*t*)∩*AP*_{f})}. The set of *IVTS*(*M*,*AP*_{f})=*R*−*VTS*(*M*,*AP*_{f}) is called the *set of invisible transitions* of *M* w.r.t. *AP*_{f}.

It is obvious that *VTS*(*M*,*AP*_{f}) and *IVTS*(*M*,*AP*_{f}) relate to the system property. Both of them satisfy *VTS*(*M*,*AP*_{f})∩*IVTS*(*M*,*AP*_{f})=∅ and *VTS*(*M*,*AP*_{f})∪*IVTS*(*M*,*AP*_{f})=*R*.
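Definition 7 translates directly into a partition of the transition relation. In the sketch below, the function name `split_transitions` and the three-state example with propositions `valid` and `dirty` are hypothetical.

```python
def split_transitions(R, L, AP_f):
    """Partition R into (VTS, IVTS) w.r.t. the propositions AP_f."""
    vts = {(s, t) for (s, t) in R if L[s] & AP_f != L[t] & AP_f}
    ivts = set(R) - vts
    return vts, ivts


# Hypothetical process: invalid, shared, and modified states.
L = {"I": set(), "S": {"valid"}, "M": {"valid", "dirty"}}
R = {("I", "S"), ("S", "S"), ("S", "M"), ("M", "I")}
vts, ivts = split_transitions(R, L, {"valid"})
print(sorted(vts), sorted(ivts))
```

With respect to `{"valid"}`, the step from `S` to `M` is invisible because it changes only `dirty`, a proposition outside *AP*_{f}.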

For the *k*th process in a concrete model *PS*^{c}(*n*), if \(R_{k}^{c}(s_{k}^{c},t_{k}^{c}) \in \mathit{IVTS}(M_{k}^{c},\mathit{AP}_{f})\), the corresponding *y*-abstract transition \(R_{k}^{y}(s_{k}^{y},t_{k}^{y})\) is a loop in the state graph and \(M_{k}^{y}\) does not change the current state. If \(R_{k}^{c}(s_{k}^{c},t_{k}^{c}) \in \mathit{VTS}(M_{k}^{c},\mathit{AP}_{f})\), \(R_{k}^{y}(s_{k}^{y},t_{k}^{y})\) connects two different *y*-abstract states in \(M_{k}^{y}\), that is to say, \(M_{k}^{y}\) performs a transition. Hence, all transitions in \(M_{k}^{c}\) are maintained. Figure 2 illustrates two kinds of concrete transitions and their *y*-abstract transitions. Therefore, the *y*-abstract model *PS*^{y}(*n*) is an asynchronous composition of \(M_{1}^{y},M_{2}^{y},\dots,M_{n}^{y}\).

### Theorem 3

*The asynchronous composition operator with true concurrency semantics*, ∏_{a}, *is monotonic w.r.t.* ≼, *that is*, \(M_{k}^{c}\preccurlyeq M_{k}^{y}\) (1≤*k*≤*n*)⇒*PS*^{c}(*n*)≼*PS*^{y}(*n*).

### Proof

Let \(\mathit{PS}^{c}(n)=(\mathit{AP}^{c},S^{c},I^{c},R^{c},L^{c})=\prod_{a\ k=1}^{n} M_{k}^{c}\) be an asynchronous composition with true concurrency semantics, where \(M_{k}^{c}=(\mathit{AP}_{k}^{c},S_{k}^{c},I_{k}^{c},R_{k}^{c},L_{k}^{c})\). Its *y*-abstract model is denoted by \(\mathit{PS}^{y}(n)=(\mathit{AP}^{y},S^{y},I^{y},R^{y},L^{y})=\prod^{n}_{a\ k=1} M_{k}^{y}\), where \(M_{k}^{y}=(\mathit{AP}_{k}^{y},S_{k}^{y},I_{k}^{y},R_{k}^{y},L_{k}^{y})\).

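For every state \(s^{c}=\langle s_{1}^{c},s_{2}^{c},\dots,s_{n}^{c}\rangle\) in *PS*^{c}(*n*) and the corresponding state *s*^{y} in *PS*^{y}(*n*), the following identity holds:

\(s^{y}=H^{cy}(s^{c})=\langle H_{1}^{cy}(s_{1}^{c}),H_{2}^{cy}(s_{2}^{c}),\dots,H_{n}^{cy}(s_{n}^{c})\rangle\)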

That is to say, a *y*-abstract state is obtained by applying \(H_{k}^{cy}\ (1 \le k \le n)\) to the *k*th element in concrete state *s*^{c}.

We show that *H*^{cy}⊆*S*^{c}×*S*^{y} is a simulation relation between *PS*^{c}(*n*) and *PS*^{y}(*n*). For every \(s^{c}=\langle s_{1a}^{c},s_{2b}^{c},\dots,s_{kl}^{c},\dots,s_{ng}^{c}\rangle \in S^{c}\), suppose that \(s^{y}=\langle s_{1a}^{y},s_{2b}^{y},\dots,s_{kl}^{y},\dots,s_{ng}^{y} \rangle \in S^{y}\) is its *y*-abstract state, namely, *H*^{cy}(*s*^{c})=*s*^{y}; then, by Definition 2, both of the following conditions must hold:

1. *L*^{c}(*s*^{c})∩*AP*^{y}=*L*^{y}(*s*^{y}).
2. ∀*t*^{c} *t*^{c}∈*S*^{c}∧*R*^{c}(*s*^{c},*t*^{c})⇒∃*t*^{y} *t*^{y}∈*S*^{y}∧*R*^{y}(*s*^{y},*t*^{y})∧*H*^{cy}(*t*^{c},*t*^{y}).

Proof of condition (1): *L*^{c}(*s*^{c})∩*AP*^{y}=*L*^{y}(*s*^{y}).

Replacing *AP*^{y} in (5) with the right-hand side of (6), we obtain

Hence, condition (1) is true.
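One way to carry out the label argument for condition (1) is sketched below; it is an illustrative derivation and not necessarily identical to the authors' numbered equations. It assumes that \(\mathit{AP}^{y}=\bigcup_{j}\mathit{AP}_{j}^{y}\) and \(L^{c}(s^{c})=\bigcup_{k}L_{k}^{c}(s_{k}^{c})\) (Definition 5), that \(\mathit{AP}_{j}^{y}\subseteq \mathit{AP}_{j}^{c}\), that the component states of \(s^{c}\) are pairwise compatible (Definition 4), and that each \(H_{k}^{cy}\) satisfies condition 1 of Definition 2.

```latex
\begin{align*}
L^{c}(s^{c}) \cap \mathit{AP}^{y}
  &= \Bigl(\bigcup_{k=1}^{n} L_{k}^{c}(s_{k}^{c})\Bigr) \cap
     \Bigl(\bigcup_{j=1}^{n} \mathit{AP}_{j}^{y}\Bigr)
     && \text{(Definition 5, items 1 and 5)} \\
  &= \bigcup_{k=1}^{n}\bigcup_{j=1}^{n}
     \bigl(L_{k}^{c}(s_{k}^{c}) \cap \mathit{AP}_{j}^{y}\bigr)
     && \text{(distributivity)} \\
  &= \bigcup_{j=1}^{n}
     \bigl(L_{j}^{c}(s_{j}^{c}) \cap \mathit{AP}_{j}^{y}\bigr)
     && \text{(compatibility, see below)} \\
  &= \bigcup_{j=1}^{n} L_{j}^{y}(s_{j}^{y})
     && \text{(condition 1 for each } H_{j}^{cy}\text{)} \\
  &= L^{y}(s^{y})
     && \text{(Definition 5, item 5)}
\end{align*}
% Third step: for k \neq j,
%   L_k^c(s_k^c) \cap AP_j^y \subseteq L_k^c(s_k^c) \cap AP_j^c
%                            = L_j^c(s_j^c) \cap AP_k^c
%                            \subseteq L_j^c(s_j^c),
% so every mixed term is already contained in L_j^c(s_j^c) \cap AP_j^y.
```

The third step is where compatibility is needed: a proposition of process *j* that appears in the label of another process must also appear in the label of process *j* itself.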

Proof of condition (2): ∀*t*^{c} *t*^{c}∈*S*^{c}∧*R*^{c}(*s*^{c},*t*^{c})⇒∃*t*^{y} *t*^{y}∈*S*^{y}∧*R*^{y}(*s*^{y},*t*^{y})∧*H*^{cy}(*t*^{c},*t*^{y}).

For each \(t^{c}=\langle t_{1a^{\prime}}^{c},t_{2b^{\prime}}^{c},\dots,t_{kl^{\prime}}^{c},\dots,t_{ng^{\prime}}^{c}\rangle \in S^{c}\), *R*^{c}(*s*^{c},*t*^{c}) implies that at least one component of the concrete model makes a transition. Suppose that the first *k* (1≤*k*≤*n*) components make transitions, while the last *n*−*k* components do not. There are several cases to be considered.

Case 1: \(t^{c} \neq s^{c} \wedge R_{k}^{c}(s_{kl}^{c},t_{kl^{\prime}}^{c}) \in \mathit{IVTS}(M_{k}^{c},\mathit{AP}_{f})\), as represented in the middle of Fig. 2.

Since the last *n*−*k* components in the concrete model do not make transitions, we obtain

This expression indicates that applying \(H_{k}^{cy}\) to the *k*th element of *t*^{c} will yield its *y*-abstract state, thus, (*t*^{c},*t*^{y})∈*H*^{cy}.

From (11), there is at least one element in *s*^{y} and *t*^{y} that satisfies \(R_{k}^{y}(s_{ke}^{y},t_{ke^{\prime}}^{y})\), so (*s*^{y},*t*^{y})∈*R*^{y}.

The other two cases, \(t^{c} \neq s^{c} \wedge R_{k}^{c}(s_{kl}^{c},t_{kl^{\prime}}^{c})\in \mathit{VTS}(M_{k}^{c},\mathit{AP}_{f})\) and *t*^{c}=*s*^{c}, can be discussed in a similar way.

At this point, both conditions (1) and (2) hold. We conclude that *H*^{cy}⊆*S*^{c}×*S*^{y} is a simulation relation between *PS*^{c}(*n*) and *PS*^{y}(*n*). Moreover, for every initial state \(s_{0}^{c} \in I^{c}\) in *PS*^{c}(*n*) there is an initial state \(s_{0}^{y} \in I^{y}\) in *PS*^{y}(*n*) such that \(H^{cy}(s_{0}^{c},s_{0}^{y})\). Consequently, by Definition 2, the theorem is proved. □

Theorem 3 implies that the *y*-abstract model weakly preserves *ACTL*^{⋆} formulas. Applying this theorem to each kind of *ACTL*^{⋆} formula, we get the following conclusion.

### Theorem 4

*For each* *ACTL*^{⋆} *formula* *f* (*AP*_{f}⊆*AP*^{y}), *PS*^{y}(*n*)⊨*f*⇒*PS*^{c}(*n*)⊨*f*.

### Proof

By Theorem 3, *PS*^{c}(*n*)≼*PS*^{y}(*n*). Since simulation preserves *ACTL*^{⋆} formulas, as proved in [34], *PS*^{y}(*n*)⊨*f*⇒*PS*^{c}(*n*)⊨*f* holds. □

Intuitively, this theorem is true because formulas in *ACTL*^{⋆} describe properties that are quantified over all possible behaviors of a system. Because every behavior of *PS*^{c}(*n*) is a behavior of *PS*^{y}(*n*), every formula of *ACTL*^{⋆} that is true in *PS*^{y}(*n*) must also be true in *PS*^{c}(*n*). Theorem 4 is very useful for large-scale system verification since it provides a way of accelerating verification by exhaustively searching a smaller state space.

### 5.2 *x*-Abstraction

During the construction of a parameterized system, the designers reason about its correctness by focusing on the execution of one process (called the *hub*) and considering its interaction with the other processes (called *rims*; all *rims* together constitute the *hub*’s environment) [8]. The *x*-abstraction, following this idea, produces a much smaller state space.

*PS*^{y}(*n*) is an asynchronous concurrent system with true concurrency semantics. Without loss of generality, assume that *PS*^{y}(*n*) contains *n*−1 (*n*>1) *rims* (numbered from 1 to *n*−1) and one *hub* (numbered *n*). We get the following identity by expanding *L*^{y}, the labeling function of *PS*^{y}(*n*):

\(L^{y}(s^{y})=L^{y}(\langle s_{1}^{y},s_{2}^{y},\dots,s_{n}^{y}\rangle)=\bigcup_{k=1}^{n}L_{k}^{y}(s_{k}^{y})\)

It is straightforward to find that \(L_{k}^{y}(s_{k}^{y})\) (1≤*k*≤*n*) on the right hand side of the identity is the set of all labels of *rims* (or *hubs*) and they are atomic propositions that process *k* satisfies in the current state. These atomic propositions reflect process properties. Consequently, the object of *x*-abstraction is the whole parameterized system whose properties relate to either one process or many processes.

### Definition 8

(Process property)

The first-order predicate *prop*(*k*), 1≤*k*≤*n*, indicating that the *k*th process has property *prop*, is called process property. We use *PROP*(*k*)={*prop*(*k*)} to denote all properties the *k*th process holds.

Given a process *d*, the *d*-*label* is an instance of *prop*(*k*), meaning that process *d* meets the property *prop*. *PROP*(*d*)={*prop*(*d*)} is the set of all *d*-*labels*. For every *s*^{y} (*s*^{y}∈*S*^{y}) and process *d* (1≤*d*≤*n*), we have either *s*^{y}⊨*prop*(*d*) or *s*^{y}⊭*prop*(*d*). If *s*^{y}⊨*prop*(*d*) holds, the *y*-abstract state *s*^{y} has the label *prop*(*d*).

The *y*-abstract model can be simplified as follows, by Definition 8:

It is interesting to note that the global label of the *y*-abstract state *s*^{y} is the set of all process properties it satisfies. Next we introduce a new notation to describe the parameterized system.

### Definition 9

The first-order predicate *snps*(*k*)=*prop*(*k*)∧(⋀_{j≠k}*prop*(*j*)) describes not only the *k*th process but also its environment (comprising the *j*th process). *snps*(*k*) is a quite detailed picture of the global system, and all the snapshots are represented as *SNPS*={*snps*(*k*)}.

A snapshot *snps*(*k*) gives the necessary condition that an equivalence partition on *PS*^{y}(*n*) meets: if there exists a process *d* satisfying *s*^{y}⊨*snps*(*d*), *snps*(*k*) is one of the abstract states of *s*^{y}. All *y*-abstract states which satisfy this condition compose an equivalence class. If *snps*(*k*) is of the form ±*prop*_{1}(*k*)∧±*prop*_{2}(*k*)∧⋯∧±*prop*_{r}(*k*), *r*>1, where *prop*_{1}(*k*),…,*prop*_{r}(*k*) are *r* process properties and ±*prop*_{i}(*k*) (1≤*i*≤*r*) indicates that *prop*_{i}(*k*) appears positive or negative, then *snps*(*k*) can be expressed by a tuple 〈*b*_{1},*b*_{2},…,*b*_{r}〉, where *b*_{i}=1⇔*snps*(*k*)⇒*prop*_{i}(*k*). That is, the value of each bit *b*_{i} reflects the polarity of the corresponding predicate *prop*_{i}(*k*) in *snps*(*k*). Labeling the *y*-abstract states with atomic formulas results in a much smaller state space.

In order to construct a TDA model, *PROP* and *SNPS* must meet two conditions: coverage and congruence. Coverage means that every *y*-abstract state is reflected by some snapshots, and congruence implies that *snps*(*k*) contains enough information about a process to conclude a label holds true for this process or not. That is to say, for each *snps*(*k*)∈*SNPS* and each *prop*(*k*)∈*PROP* it holds that *snps*(*k*)→*prop*(*k*) or *snps*(*k*)→¬*prop*(*k*).

If *PROP* and *SNPS* of *PS*^{y}(*n*) satisfy the above conditions, the TDA model is a Kripke structure *PS*^{t}=〈*AP*^{t},*S*^{t},*I*^{t},*R*^{t},*L*^{t}〉:

1. *AP*^{t} is the set of atomic propositions involved in the process property *prop*(*k*), and *AP*^{t}=*AP*^{y} according to Definition 8;
2. *S*^{t}=*SNPS* is the set of abstract states: the abstract operator *α*_{n}(*s*^{y})={*snps*(*k*)∈*SNPS* | *s*^{y}⊨*snps*(*n*)} maps all the *y*-abstract states *s*^{y}, where *hub* meets the condition of *snps*(*k*), into the TDA abstraction state *snps*(*k*);
3. *I*^{t} is the set of initial abstract states: *snps*(*k*)∈*I*^{t} if there exists a parameterized system *PS*^{y}(*n*) and a *y*-abstract state *s*^{y}∈*I*^{y} such that *snps*(*k*)∈*α*_{n}(*s*^{y});
4. *L*^{t} is the labeling function: for each *snps*(*k*)∈*S*^{t}, *L*^{t}(*snps*(*k*))={*prop*(*k*): *snps*(*k*)⇒*prop*(*n*)};
5. *R*^{t} is the set of abstract transitions: for each *snps*_{1}(*k*)∈*S*^{t}, *snps*_{2}(*k*)∈*S*^{t}, if there exist a parameterized system *PS*^{y}(*n*) and two *y*-abstract states *s*^{y}∈*S*^{y}, *t*^{y}∈*S*^{y} which meet the condition *snps*_{1}(*k*)∈*α*_{n}(*s*^{y})∧*snps*_{2}(*k*)∈*α*_{n}(*t*^{y})∧(*s*^{y},*t*^{y})∈*R*^{y}, then (*snps*_{1}(*k*),*snps*_{2}(*k*))∈*R*^{t}.
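For one fixed system size, the five components above can be computed mechanically once the *y*-abstract states and the process properties are available. The following Python sketch illustrates this under simplifying assumptions that are not part of the original definition: a global state is a tuple of local states, each process property is a Boolean function `prop(state, k)`, a snapshot is represented directly by its bit-vector, and the names `alpha` and `build_tda` are hypothetical. The definition above quantifies existentially over all *n*, whereas the sketch handles a single instance, so the full model would be the union of the results over all instances.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Sequence, Set, Tuple

State = Tuple[str, ...]               # one local state per process (assumed encoding)
Prop = Callable[[State, int], bool]   # prop(state, k): does process k satisfy the property?
Snapshot = Tuple[int, ...]            # bit-vector <b_1, ..., b_r>


def alpha(state: State, props: Sequence[Prop], hub: int) -> Snapshot:
    """Map a y-abstract state to the snapshot observed from the hub process."""
    return tuple(int(p(state, hub)) for p in props)


def build_tda(
    states: Iterable[State],
    initial: Iterable[State],
    transitions: Iterable[Tuple[State, State]],
    props: Sequence[Prop],
    hub: int,
):
    """Return (S_t, I_t, R_t, L_t) for one instance of the system."""
    s_t: Set[Snapshot] = {alpha(s, props, hub) for s in states}
    i_t: Set[Snapshot] = {alpha(s, props, hub) for s in initial}
    r_t = {(alpha(s, props, hub), alpha(t, props, hub)) for s, t in transitions}
    # A snapshot is labeled with the indices of the properties it implies.
    l_t: Dict[Snapshot, FrozenSet[int]] = {
        snap: frozenset(i for i, bit in enumerate(snap) if bit) for snap in s_t
    }
    return s_t, i_t, r_t, l_t


# Toy instance with two processes and one property "process k is in state S".
toy_states = [("I", "I"), ("S", "I"), ("I", "S")]
toy_initial = [("I", "I")]
toy_transitions = [(("I", "I"), ("S", "I")), (("I", "I"), ("I", "S"))]
toy_props = [lambda s, k: s[k] == "S"]
S_t, I_t, R_t, L_t = build_tda(toy_states, toy_initial, toy_transitions, toy_props, hub=1)
```

In the toy instance, the three global states collapse to two snapshots because the hub cannot distinguish 〈*I*,*I*〉 from 〈*S*,*I*〉 with the single property used.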

The TDA abstract state is labeled with the properties *prop*(*k*) that process *k* satisfies. Since the set of these labels is finite after *y*-abstraction, *S*^{t} is finite, too. In theory, TDA reduces the state space by a fraction of (|*S*|−|*S*^{t}|)/|*S*|, where *S* is the set of asynchronous composition states defined in Definition 5. With this, our goal of reducing the state space of parametric verification has been achieved.

### Theorem 5

*For a single-indexed ACTL** *specification* ∀*x* *φ*(*x*), *where the atomic formulas involved in* *φ*(*x*) *are labels in* *L*^{t}, *the following holds*: *PS*^{t}⊨*φ*(*x*)⇒∀*n* *PS*^{y}(*n*)⊨∀*x* *φ*(*x*).

### Proof

The proof is given in [36]. □

The correctness of TDA means that the TDA model weakly preserves single-indexed *ACTL** specifications, which is guaranteed by Theorems 3, 4, and 5. In addition, Theorem 5 implies that TDA is sound, namely, any single-indexed *ACTL** specification which holds in a TDA model also holds in a concrete model with an arbitrary number of processes. The completeness and soundness of our approach provide a solid theoretical foundation for optimizing the state space of parameterized systems.

## 6 An example

We show how TDA works on a parameterized MESI protocol. The MESI protocol is a four-state write-invalidate cache coherence protocol in which every memory block can be in one of the following states: *Modified*, *Exclusive*, *Shared*, and *Invalid* [37]. *Invalid* means that a memory block is not present in the cache; to load it, the processor has to send a request (LD) to the main memory. *Modified* identifies cache lines that have been written by the corresponding processor (ST). The current version of the modified block resides in the cache and is not visible to the rest of the system at this time. The processor can perform LD, ST, and Eviction on this data. *Shared* is the only state which allows other valid copies of the same memory block to be stored in other caches. A processor can load from a *Shared* memory block or evict it without notifying other processors or the memory. *Exclusive* means that the processor is the only one that owns the right to modify the block and that the main memory is up to date with the contents of the cache. If one cache holds a line in the *Exclusive* or *Modified* state, all matching lines in other caches are marked *Invalid*.

Let *PS*^{c}(3) be a distributed shared-memory multi-processor system with three processors which ensures data consistency through a directory-based MESI protocol, considering a single memory block and a single cache line. The directory itself is a data structure whose entries record, for every block of memory, the state (i.e., the cache access permission, namely *dirstate*) and the identities of the processors which have cached that block (*sharedset*). Each cache tag residing in a processor includes at least three fields: *memaddr*, *cachestate*, and *cachedata*. From the viewpoint of each cache controller, a particular memory block can be in one of four states: *MODF*, *EXCL*, *SHRD*, or *INVD*. From a system-wide perspective, the state of a cache line is determined by the corresponding *dirstate* and *cachestate*. Regardless of *dirstate*, if the range of *cachedata* is contained in [0,1], there are as many as 32 transitions in the state machine of a single processor for a single memory block, even though only 7 states are valid (shown on the left hand side of Fig. 3). It is very difficult to draw the state machine graph if *cachedata* and *memaddr* are allowed to take on any values from their domains.

Now we want to verify that *PS*^{c}(3) satisfies the following property: whenever a block of memory is shared by one processor, there exists another processor without a copy of it. The first step is to simplify the MESI protocol for a single processor through *y*-abstraction by Definition 6. Because the above property only relates to the state of the cache line and does not depend on its value, *cachedata* is redundant. The Kripke structure of the MESI protocol reduced by *y*-abstraction is shown on the right hand side of Fig. 3, where states are labeled with the predicates satisfied in the current state; for example, ‘M’ means *cachestate*=*MODF*.

In the *y*-abstract model of *PS*^{c}(3) (shown in Fig. 4), each state is labeled with a predicate-vector of length three, whose three components represent the predicate that the current memory block satisfies in processors 1, 2, and 3, respectively. For example, 〈*EII*〉 implies that processor 1 owns the right to modify the memory block and the memory data is not present in the caches of processors 2 and 3. To load the memory data, both of them must issue a request to the main memory. Other states are excluded due to compatibility constraints. Take 〈*MMM*〉 as an example. For the particular cache line in processors 1, 2, and 3, *cachestate* is an internal variable, whereas *dirstate* and *sharedset* are external variables. The labels for an *M* state in each processor are {*dirstate*=*M*, *sharedset*=*P1*, *cachestate*=*M*}, {*dirstate*=*M*, *sharedset*=*P2*, *cachestate*=*M*}, and {*dirstate*=*M*, *sharedset*=*P3*, *cachestate*=*M*}, respectively. The *M* states do not agree on the external variable *sharedset*, so they are not compatible.
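The compatibility argument for 〈*MMM*〉 can be replayed as a small check. The Python sketch below is only an illustration: the dictionary encoding of labels and the helper names `label_for_m` and `agree_on_external` are assumptions, while the variable names and values are the ones quoted above.

```python
# External variables are shared by all processors, so compatible local
# states must assign them the same value (assumed reading of the text).
EXTERNAL = ("dirstate", "sharedset")


def label_for_m(processor: str) -> dict:
    """Label of an M state as seen from the given processor."""
    return {"dirstate": "M", "sharedset": processor, "cachestate": "M"}


def agree_on_external(labels: list) -> bool:
    """True if all labels give every external variable the same value."""
    return all(len({label[var] for label in labels}) == 1 for var in EXTERNAL)


mmm_labels = [label_for_m(p) for p in ("P1", "P2", "P3")]
mmm_compatible = agree_on_external(mmm_labels)                 # sharedset differs
single_m_compatible = agree_on_external([label_for_m("P1")])   # nothing to disagree with
```

The three labels agree on *dirstate* but carry three different values of *sharedset*, which is why the combined state is rejected.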

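The snapshots are built from two predicates, stating that the block of memory is shared by the *k*th processor and that there is another processor which has no copy of the block of memory.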

Table 1 shows the state space of *PS*^{y}(3) partitioned by *snps*(2). The first column lists the equivalence classes, the second gives the label of each equivalence class, and the last column shows the bit-vector expression of that label. From the table we note that there are only 4 states in the TDA model, which reduces the state space by 71.4 % compared with the *y*-abstract model. The state 〈11〉 in the resulting model means that processor 2 has a shared copy of the memory block and the memory data is not present in the caches of processor 1 and/or processor 3. Therefore, the TDA model is precise enough to prove the above system property, namely, TDA is correct.

Table 1 *PS*^{y}(3) state space partition using *snps*(2)

Equivalence class | Label of equivalence class | Bit-vector of label |
---|---|---|
{ | | 〈11〉 |
{ | ¬ | 〈01〉 |
{ | | 〈10〉 |
{ | ¬ | 〈00〉 |
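The partition can be reproduced approximately with a short script. This is a sketch under stated assumptions rather than the authors' implementation: global states are enumerated from the single compatibility rule of this section (a *Modified* or *Exclusive* line forces all other copies to be *Invalid*), and the two predicates behind *snps*(2) are read as "processor 2 holds the block in state *Shared*" and "some other processor holds it in state *Invalid*". Under these assumptions the enumeration yields 14 global states and 4 equivalence classes, which agrees with the 71.4 % reduction reported above. The individual classes are not claimed to match the table cell by cell.

```python
from collections import defaultdict
from itertools import product


def compatible(global_state: str) -> bool:
    """At most one M/E owner, and an owner forces every other copy to be I."""
    owners = [s for s in global_state if s in "ME"]
    if not owners:
        return True
    return len(owners) == 1 and global_state.count("I") == len(global_state) - 1


def snapshot(global_state: str, hub: int) -> tuple:
    """Bit-vector <b1, b2> of the two assumed predicates, seen from the hub."""
    shared_at_hub = global_state[hub] == "S"
    other_without_copy = any(
        s == "I" for j, s in enumerate(global_state) if j != hub
    )
    return (int(shared_at_hub), int(other_without_copy))


states = ["".join(g) for g in product("MESI", repeat=3) if compatible("".join(g))]

classes = defaultdict(list)
for g in states:
    classes[snapshot(g, hub=1)].append(g)   # index 1 is processor 2

# Reduction (|S| - |S_t|) / |S| of the y-abstract state space.
reduction = (len(states) - len(classes)) / len(states)
print(len(states), len(classes), f"{reduction:.1%}")   # prints: 14 4 71.4%
```

The class for 〈11〉 collects the states in which processor 2 is *Shared* and at least one of processors 1 and 3 is *Invalid*, matching the reading of 〈11〉 given in the text.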

Because the system parameter *n* is existentially quantified, a group of parameterized systems with different system parameters can be modeled by the same TDA model. To examine the soundness, we applied our method to several other concrete systems. As expected, at least 3 concrete systems have the same TDA model as *PS*^{c}(3). Figure 5 shows one such system.

## 7 Case studies

To validate our approach, we have implemented TDA and applied it to verify several classical cache coherence protocols as described in [38] and a hierarchical cache protocol in FT-1000 CPU.

### 7.1 Protocols and properties to be verified

Classical protocols and properties these protocols should have are introduced briefly here.

*Synapse* *N*+1

Synapse *N*+1 is a write-allocation protocol developed by Synapse for the *N*+1 computer. A cache can be in one of three possible states: *invalid* (the cache has no valid data), *valid* (the cache has a potentially shared copy of the data), and *dirty* (the cache has a modified copy of the data). *dirty* is an exclusive state: only one cache can have a dirty line. The state changes according to write and read commands issued by the corresponding processor (for example, *R*_{m}, *W*) or coming from the system bus (such as \(\overline{R_{m}}\) and \(\overline{W}\)), as shown in Fig. 6, where *R*_{h} is an internal action that denotes a read hit, *R*_{m} denotes a read miss, and *W* denotes a write.

There are two possible sources of data inconsistency for Synapse:

UNS1: a *dirty* cache co-exists with one or more caches in state *valid*;

UNS2: more than one cache is in state *dirty*.

*Illinois*

Besides *invalid*, caches can be in one of the following states: *valid-exclusive* (the cache has an exclusive copy of the data that is consistent with the memory such that a modification of its content requires no bus invalidation signal), *shared* (the cache has a copy of the data consistent with the memory and other caches may have copies of the data), and *dirty* (the cache has a modified copy of the data, i.e., the data in main memory are obsolete and the content of the other caches is not valid). The transitions are given in Fig. 7, and the behavior of one cache may consist of the internal actions *R*_{h} (read hit), *R*_{m} (read miss), *W*_{e} (write in exclusive state), *W*_{d} (write in dirty state), *WI* (write and invalidate), and *Rep* (replacement with a new memory line). In this figure, *P* is defined as *Number*(*dirty*)=0∧*Number*(*shared*)=0∧*Number*(*valid*-*exclusive*)=0, where *Number*(*q*) denotes the number of caches in state *q* in the current global state.

The possible sources of data inconsistency are:

UNS1: a *dirty* cache co-exists with caches either in state *shared* or *valid-exclusive*;

UNS2: there is more than one *dirty* cache.

The other possible violations of the exclusivity of state *valid-exclusive* are:

UNS3: there is more than one *valid-exclusive* cache;

UNS4: a *shared* cache co-exists with a cache in state *valid-exclusive*.
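The four conditions can be written as predicates over a global state. The Python sketch below assumes that a global state is simply a list of per-cache state names and uses a hypothetical helper `illinois_violations`; it illustrates how the unsafe conditions read as counting constraints and is not taken from the verification tool used in the experiments.

```python
from collections import Counter


def illinois_violations(caches: list) -> set:
    """Return the names of the unsafe conditions that a global state satisfies."""
    number = Counter(caches)   # number[q] plays the role of Number(q)
    violated = set()
    if number["dirty"] >= 1 and number["shared"] + number["valid-exclusive"] >= 1:
        violated.add("UNS1")   # dirty together with shared or valid-exclusive
    if number["dirty"] > 1:
        violated.add("UNS2")   # more than one dirty cache
    if number["valid-exclusive"] > 1:
        violated.add("UNS3")   # more than one valid-exclusive cache
    if number["shared"] >= 1 and number["valid-exclusive"] >= 1:
        violated.add("UNS4")   # shared together with valid-exclusive
    return violated


safe = illinois_violations(["dirty", "invalid", "invalid"])
mixed = illinois_violations(["valid-exclusive", "valid-exclusive", "shared"])
```

A state is safe with respect to these properties exactly when the returned set is empty.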

*Berkeley*

The Berkeley protocol has a state *owned non-exclusively*. In this state, the main memory is not coherent with the possibly multiple cached copies of the owner data. The other three states are *invalid*, *unowned* (similar to the MESI *Shared* state), and *owned exclusively* (similar to the MESI *Modified* state). Figure 8 demonstrates how one cache changes its state according to different commands.

In the Berkeley protocol, we have the following sources of data inconsistency:

UNS1: an *owned exclusively* cache co-exists with one or more caches either in state *owned non-exclusively*, or *unowned*;

UNS2: there is more than one *owned exclusively* cache.

*Dragon*

The states of a cache in the Dragon protocol include *shared clean* (multiple clean copies may coexist), *shared dirty* (multiple dirty copies may coexist), *valid exclusive* (the cache has an exclusive clean copy), and *dirty* (the cache has an exclusive dirty copy). The possible transitions from the perspective of cache *C*_{i} are shown in Fig. 9, where *P*, *Q*, *S*, *T* are defined as follows:

*P*≜*Number*(*exclusive*)=0∧*Number*(*dirty*)=0∧*Number*(*shared*-*dirty*)=0∧*Number*(*shared*-*clean*)=0,

*Q*≜*Number*(*shared*-*dirty*)+*Number*(*shared*-*clean*)≥2,

*S*≜*Number*(*shared*-*dirty*)=0∧*Number*(*shared*-*clean*)=1,

*T*≜*Number*(*shared*-*dirty*)=1∧*Number*(*shared*-*clean*)=0.
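The guards are plain counting conditions on the current global state and can be evaluated directly. The following Python sketch assumes a global state encoded as a list of state names; the strings used for the states (including `"invalid"` for a cache without a copy) and the function names are hypothetical choices made for illustration.

```python
from collections import Counter


def number(caches: list, q: str) -> int:
    """Number(q): how many caches are in state q in the global state."""
    return Counter(caches)[q]


def guard_p(caches: list) -> bool:
    return all(
        number(caches, q) == 0
        for q in ("exclusive", "dirty", "shared-dirty", "shared-clean")
    )


def guard_q(caches: list) -> bool:
    return number(caches, "shared-dirty") + number(caches, "shared-clean") >= 2


def guard_s(caches: list) -> bool:
    return number(caches, "shared-dirty") == 0 and number(caches, "shared-clean") == 1


def guard_t(caches: list) -> bool:
    return number(caches, "shared-dirty") == 1 and number(caches, "shared-clean") == 0


def active_guards(caches: list) -> list:
    """Names of the guards that hold in the given global state."""
    guards = {"P": guard_p, "Q": guard_q, "S": guard_s, "T": guard_t}
    return [name for name, guard in guards.items() if guard(caches)]
```

For example, `active_guards(["shared-clean", "shared-dirty", "invalid"])` evaluates only *Q* to true, since two caches hold a shared copy.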

In the Dragon protocol, there are several possible sources of data inconsistency:

UNS1: a *dirty* cache co-exists with one or more caches either in state *shared dirty*, *shared clean* or *valid exclusive*;

UNS2: a *valid exclusive* cache co-exists with one or more caches either in state *shared clean*, or *shared dirty*;

UNS3: there is more than one *dirty* cache;

UNS4: there is more than one *valid exclusive* cache.

### 7.2 Experimental results

The asynchronous composition of an *n*-processor system which ensures data consistency through some protocol is a concrete system. Figure 10 shows the number of concrete states of each protocol for different values of the system parameter, according to Definition 5. Although in the worst case the number of states in the asynchronous composition could be as large as \(\prod_{k=1}^{n}|S_{k}|\), in practice it typically turns out to be much smaller. This is because some states, such as 〈*dirty*,*dirty*〉 in the Illinois protocol and 〈*owned*-*exclusively*,*owned*-*exclusively*〉 in the Berkeley protocol, are prohibited. As seen from this figure, the number of states grows rapidly as the number of processors increases (especially beyond 13 for Berkeley and Dragon, and beyond 20 for Synapse *N*+1 and Illinois). Therefore, the largest asynchronous composition we can obtain comprises only 24 processors (Synapse *N*+1).
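To get a feeling for the gap between the worst case and the states that survive the prohibition, consider Synapse *N*+1 with its three local states. The Python sketch below counts the global states that violate neither UNS1 nor UNS2 and compares the count with the worst case of 3 to the power *n*. The function names are hypothetical, and the counts are not the values plotted in Fig. 10, which depend on reachability in the actual protocol. In this simplified encoding they only bound from above the number of states that a correct protocol could reach.

```python
from itertools import product

LOCAL_STATES = ("invalid", "valid", "dirty")


def worst_case(n: int) -> int:
    """Size of the unconstrained product of n copies of the local state set."""
    return len(LOCAL_STATES) ** n


def safe_count_bruteforce(n: int) -> int:
    """Count global states that violate neither UNS1 nor UNS2."""
    count = 0
    for g in product(LOCAL_STATES, repeat=n):
        dirty, valid = g.count("dirty"), g.count("valid")
        if dirty > 1:                    # UNS2: more than one dirty cache
            continue
        if dirty == 1 and valid >= 1:    # UNS1: dirty together with valid
            continue
        count += 1
    return count


def safe_count_closed_form(n: int) -> int:
    """No dirty cache: 2**n states. One dirty cache, all others invalid: n states."""
    return 2 ** n + n


for n in (2, 4, 8):
    print(n, worst_case(n), safe_count_bruteforce(n), safe_count_closed_form(n))
```

Both quantities grow exponentially in *n*, which is consistent with the observation that explicit composition stops being feasible at a few dozen processors, while the number of TDA abstract states does not depend on *n*.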

In Fig. 11, we plot the number of states in the TDA model of each protocol. Because the process properties used in TDA are made of predicates taken from the properties to be verified, different properties for the same protocol have different TDA models. Two predicates, *cachestate*(*i*)=*dirty*/*shared* and \(\mathit{Number}(\mathit{dirty}/\mathit{valid}\text{-}\mathit{exclusive})\), are enough to express these properties formally, so a TDA model has at most 4 abstract states (one per polarity combination of the two predicates). *AHG* denotes the number of reachable states in the abstract history graph described in [39], which is greater than that in TDA. It is also important to notice that the number of states in the TDA model does not change with the system parameter, which is consistent with the conclusion in Sect. 6. All experiments were conducted on a PC with a 3.3 GHz Intel Core processor and 8 GB of available main memory, running Red Hat Linux (6.1) and GCC (4.4.5).

### 7.3 Application for FT-1000 CPU

The FT-1000 CPU uses a hierarchical (*unowned*, *shared*, *exclusive*) invalidation-based, directory-based cache coherence protocol with some extensions. This hierarchical protocol is more complicated, with more corner cases and a bigger state space than non-hierarchical protocols: it has eight instances of the chip-level protocol and at most four instances of the inter-chip protocol running concurrently. Such hierarchical protocols therefore cannot be checked directly by current model checkers, e.g., Murphi and NuSMV. During the development of the FT-1000 CPU, we applied TDA to reduce the state space of the chip-level protocol, and checked several safety properties using NuSMV. Then, the FT-1000 CPU is regarded as a single-core processor, which simplifies the verification of the inter-chip protocol. We claimed the correctness of the original protocol by verifying the second-level protocol. Some chip-level experimental results are given in Table 2, where UNS1 and UNS2 are the same as those of Synapse *N*+1.

Table 2 Experimental results of FT-1000 chip-level protocol

Asynchronous composition state number | TDA state number (UNS1) | TDA state number (UNS2) | Time in ms (UNS1) | Time in ms (UNS2) |
---|---|---|---|---|
264 | 4 | 2 | 64 | 57 |

## 8 Conclusions

The verification of cache coherence in general is known to be NP-hard. In the age of exascale computing, scalability is emerging as one of the key components in parallel computing [41]. Scalable multi-core multi-processor architectures are inevitable. Increasingly complex processes and an unbounded system parameter result in state explosion during the verification of parameterized cache coherence protocols. A generic abstraction method for parameterized systems, two-dimensional abstraction (TDA), has been put forward in this paper. The novelty of our approach lies in that it analyzes in depth the intrinsic factors affecting the size of the state space and reduces the state space in two dimensions, so that a much smaller abstract model is produced. Compared with traditional approaches, our approach can effectively reduce the verification complexity and greatly scale the verification capabilities. We give complete soundness and completeness proofs for our method. We have demonstrated the benefits of our approach on several coherence protocols with realistic features.

Our future work is to integrate TDA with model-checking tools and to check the hierarchically organized advanced cache coherence protocol of a next-generation supercomputer. We also plan to investigate combining TDA with the CMP method in the future.

## Acknowledgements

This work is inspired by the idea from M. Talupur’s work on environment abstraction, and supported by the National Natural Science Foundation of China under Grant No. 61070036 and 61133007.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.