1 Introduction

Modern software and hardware systems are becoming increasingly complex, resulting in new design challenges. For safety-critical applications, correctness evidence for designed systems must be presented to the regulatory bodies (see the automotive standard ISO 26262 [33]). It follows that verification and validation techniques must be used to provide evidence that the designed system meets its requirements. Testing remains the preferred practice in industry for gaining confidence in design correctness. In classical testing, an engineer designs a test experiment, i.e., an input vector that is executed on the system under test (\(\textsc {SUT}\)) to check whether it satisfies its requirements. Due to the finite number of experiments, testing cannot prove the absence of errors. However, it is an effective technique for catching bugs. Testing remains a predominantly manual and ad hoc activity that is prone to human errors. As a result, it is often a bottleneck in complex system design.

Model-based testing (MBT) is a technology that enables systematic and automatic test-case generation (TCG) and execution, thus reducing system design time and cost. In MBT, the \(\textsc {SUT}\) is tested for conformance against its specification, a mathematical model of the \(\textsc {SUT}\). In contrast to the specification, which is a formal object, the \(\textsc {SUT}\) is a physical implementation with an often unknown internal structure, also called a “black box”. The \(\textsc {SUT}\) can be accessed by the tester only through its external interface. To reason about the conformance of the \(\textsc {SUT}\) to its specification, one needs to rely on the testing assumption [48] that the \(\textsc {SUT}\) can react at all times to all inputs and can be modeled in the same language as its specification.

The formal model of the \(\textsc {SUT}\) is derived from its informal requirements. The process of formulating, documenting, and maintaining system requirements is called requirement engineering. The requirements are typically written in a textual form, using possibly constrained English, and are gathered in a requirements document. The requirements document is structured into chapters describing different views of the system, e.g., behavior, safety, or timing. Intuitively, a system must correctly implement the conjunction of all its requirements. Sometimes, requirements can be inconsistent, resulting in a specification that does not admit any correct implementation.

In this paper, we propose a requirement-driven MBT framework for synchronous data-flow reactive systems. In contrast to classical MBT, in which the requirements document is usually formalized into one big monolithic specification, we exploit the structure of the requirements and adopt a multiple-viewpoint approach.

Fig. 1 Overview of using requirement interfaces for testing, analysis, and tracing

We first introduce requirement interfaces as the formalism for modeling system views as subsets of requirements. It is a state-transition formalism that supports compositional specification of synchronous data-flow systems by means of assume/guarantee rules that we call contracts. We associate subsets of contracts with requirement identifiers to facilitate their tracing to the informal requirements from which the specification is derived. These associations can later be used to generate links between the work products [4], connecting several tools.

A requirement interface is intended to model a specific view of the \(\textsc {SUT}\). We define the conjunction operation that enables combining different views of the \(\textsc {SUT}\). Intuitively, a conjunction of two requirement interfaces is another requirement interface that requires contracts of both interfaces to hold. We assume that the overall specification of the \(\textsc {SUT}\) is given as a conjunction of requirement interfaces modeling its different views. Requirement interfaces are inspired by the synchronous interfaces [23], with the difference that we allow hidden variables in addition to the interface (input and output) variables and that the requirement identifiers are part of the formal model. The conjunction operator was first defined in [25] as shared refinement, while [13] establishes the link of the conjunction to multiple-viewpoint modeling and requirement engineering.

We then formally define consistency for requirement interfaces and develop a bounded consistency checking procedure. In addition, we show that falsifying consistency is compositional with respect to conjunction, i.e., the conjunction of an inconsistent interface with any other interface remains inconsistent. Next, we develop a requirement-driven TCG and execution procedure from requirement interfaces, with language inclusion as the conformance relation. We present a procedure for TCG from a specific \(\textsc {SUT}\) view, modeled as a requirement interface, and a test purpose. Here, the test purpose is a formal specification of the target state(s) that a test case should cover. Such a test case can be used directly to detect whether the \(\textsc {SUT}\) violates a given requirement, but it cannot detect violations of other requirements in the conjunction. Next, we extend this procedure by completing such a partial test case with additional constraints from the other view models, enabling detection of violations of any other requirement.

Then, we develop a tracing procedure that exploits the natural mapping between informal requirements and our formal model. Thus, inconsistent contracts or failing test cases can be traced back to the violated requirements. We believe that such tracing information provides valuable maintenance and debugging information to the engineers.

Finally, we show how to apply fault-based test generation to requirement interfaces. The technique, called model-based mutation testing [2], is applied to automatically generate a set of test purposes. The corresponding test suite is able to provide fault coverage for a specified set of fault models, under the assumption of a deterministic \(\textsc {SUT}\). The approach includes the following steps: first, we define a set of fault models for requirement interfaces. These are applied to all applicable parts of the contracts in the requirement interface, generating a set of faulty models, called mutants. We then check whether a mutated contract introduces any new behavior. This check is encoded as a test purpose, so we can simply pass it to the previously defined test-case generation. If the mutation introduces new behavior that deviates from the reference model, the procedure generates a test case; otherwise, the test purpose is unreachable and the mutant is considered equivalent.

We illustrate the entire workflow of using requirement interfaces for consistency checking, testing, and tracing in Fig. 1, where the test purpose may be produced by model-based mutation testing or any other technique.

Parts of this paper have already been published in the proceedings of the Formal Methods for Industrial Critical Systems 2015 workshop [5]. The current paper improves on that work by adding the theory for model-based mutation testing of requirement interfaces, proofs of all theorems, a second industrial case study, and various improvements throughout the paper.

The rest of the paper is structured as follows: first, we introduce requirement interfaces in Sect. 2. Then, in Sect. 3, we present how to perform consistency checks and test-case generation from requirement interfaces, and how to trace the involved requirements from consistency violations or test cases. Next, Sect. 4 gives the theory for applying model-based mutation testing to requirement interfaces. Then, in Sect. 5, we present results of our test-case generation on our running example and on the two industrial case studies. Finally, we discuss related work (Sect. 6) and conclude our work (Sect. 7).

2 Requirement interfaces

We introduce requirement interfaces, a formalism for the specification of synchronous data-flow systems. Their semantics is given in the form of labeled transition systems (\(\text {LTS}\)). We define consistent interfaces as those that admit at least one correct implementation. The refinement relation between interfaces is given as language inclusion. Finally, we define the conjunction of requirement interfaces as another interface that admits exactly the behaviors allowed by both interfaces.

2.1 Syntax

Let \(X\) be a set of typed variables. A valuation v over \(X\) is a function that assigns to each \(x \in X\) a value v(x) of the appropriate type. We denote by \(V(X)\) the set of all valuations over \(X\). We denote by \(X'=\{x'~|~x \in X\}\) the set obtained by priming each variable in \(X\). Given a valuation \(v \in V(X)\) and a predicate \(\varphi \) on \(X\), we denote by \(v \models \varphi \) the fact that \(\varphi \) is satisfied under the variable valuation v. Given two valuations \(v,v' \in V(X)\) and a predicate \(\varphi \) on \(X\cup X'\), we denote by \((v,v') \models \varphi \) the fact that \(\varphi \) is satisfied by the valuation that assigns to \(x \in X\) the value v(x), and to \(x' \in X'\) the value \(v'(x')\).

Given a subset \(Y \subseteq X\) of variables and a valuation \(v \in V(X)\), we denote by \(\pi (v)[Y]\), the projection of v to Y. We will commonly use the symbol \(w_{Y}\) to denote a valuation projected to the subset \(Y \subseteq X\). Given the sets X, \(Y_{1} \subseteq X\), \(Y_{2} \subseteq X\), \(w_{1} \in V(Y_{1})\), and \(w_{2} \in V(Y_{2})\), we denote by \(w =w_{1} \cup w_{2}\) the valuation \(w \in V(Y_{1} \cup Y_{2})\), such that \(\pi (w)[Y_{1}] = w_{1}\) and \(\pi (w)[Y_{2}] = w_{2}\).
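The valuation operations above are straightforward to mirror in code. The following Python sketch (the dictionary representation and helper names are ours, not part of the formalism) shows projection \(\pi (v)[Y]\) and the union of projected valuations:

```python
# A valuation v over X = {enq, deq, E, F, k} as a mapping from variables to values.
v = {"enq": True, "deq": False, "E": False, "F": False, "k": 1}

def project(v, Y):
    """pi(v)[Y]: the projection of valuation v to the variable subset Y."""
    return {x: v[x] for x in Y}

def union(w1, w2):
    """w1 u w2: the combined valuation over Y1 u Y2 (values must agree on overlaps)."""
    assert all(w1[x] == w2[x] for x in w1.keys() & w2.keys())
    return {**w1, **w2}

# Projecting to inputs and outputs and re-joining recovers the observable part.
w_obs = union(project(v, {"enq", "deq"}), project(v, {"E", "F"}))
print(w_obs == project(v, {"enq", "deq", "E", "F"}))  # True
```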

Given a set \(X\) of variables, we denote by \(X_{I}\), \(X_{O}\), and \(X_{H}\) the three disjoint sets of input, output, and hidden variables that partition \(X\), such that \(X= X_{I}\cup X_{O}\cup X_{H}\). We denote by \(X_{\text {obs}}= X_{I}\cup X_{O}\) the set of observable variables and by \(X_{\text {ctr}}= X_{H}\cup X_{O}\) the set of controllable variables.Footnote 1 A contract c on \(X \cup X'\), denoted by \((\varphi \vdash \psi )\), is a pair consisting of an assumption predicate \(\varphi \) on \(X_{I}' \cup X\) and a guarantee predicate \(\psi \) on \(X_{\text {ctr}}' \cup X\). A contract \({\hat{c}} = ({\hat{\varphi }} \vdash {\hat{\psi }})\) is said to be an initial contract if \({\hat{\varphi }}\) and \({\hat{\psi }}\) are predicates on \(X_{I}'\) and \(X_{\text {ctr}}'\), respectively, and an update contract otherwise. Given two valuations \(v, v' \in V(X)\) and a contract \(c = (\varphi \vdash \psi )\) over \(X \cup X'\), we say that \((v,v')\) satisfies c, denoted by \((v,v') \models c\), if \((v, \pi (v')[X_{I}]) \models \varphi \rightarrow (v, \pi (v')[X_{\text {ctr}}]) \models \psi \). In addition, we say that \((v, v')\) satisfies the assumption of c, denoted by \((v,v') \models _{A} c\), if \((v, \pi (v')[X_{I}]) \models \varphi \). The valuation pair \((v,v')\) satisfies the guarantee of c, denoted by \((v,v') \models _{G} c\), if \((v,\pi (v')[X_{\text {ctr}}]) \models \psi \).Footnote 2

Definition 1

A requirement interface \(A\) is a tuple \(\langle X_{I}, X_{O}, \) \(X_{H}, \hat{C}, C, \mathcal {R}, \rho \rangle \), where

  • \(X_{I}\), \(X_{O}\), and \(X_{H}\) are disjoint finite sets of input, output, and hidden variables, respectively, and \(X= X_{I}\cup X_{O}\cup X_{H}\) denotes the set of all variables;

  • \(\hat{C}\) and \(C\) are finite non-empty sets of initial and update contracts, respectively;

  • \(\mathcal {R}\) is a finite set of requirement identifiers;

  • \(\rho :\mathcal {R}\rightarrow {\mathcal {P}}(C\cup \hat{C})\) is a function mapping requirement identifiers to subsets of contracts, such that \(\bigcup _{r \in \mathcal {R}} \rho (r) = C\cup \hat{C}\).

We say that a requirement interface is receptive if, in any state, it has defined behaviors for all inputs, that is, \(\bigvee _{({\hat{\varphi }} \vdash {\hat{\psi }}) \in \hat{C}} {\hat{\varphi }}\) and \(\bigvee _{(\varphi \vdash \psi ) \in C} \varphi \) are both valid. A requirement interface is fully observable if \(X_{H}= \emptyset \). A requirement interface is deterministic if for all \(({\hat{\varphi }} \vdash {\hat{\psi }}) \in \hat{C}\), \({\hat{\psi }}\) has the form \(\bigwedge _{x \in X_{O}} x' = c\),Footnote 3 where c is a constant of the appropriate type, and for all \((\varphi \vdash \psi ) \in C\), \(\psi \) has the form \(\bigwedge _{x \in X_{\text {ctr}}} x' = f(X)\), where f is a function over X that has the same type as x.

Example 1

We use an abstract N-bounded FIFO buffer example to illustrate all the concepts introduced in the paper. Let \(A^{beh }\) be the behavioral model of the buffer. The buffer has two Boolean input variables enq, deq, i.e., \(X_{I}^{beh } = \{\mathsf{enq}, \mathsf{deq}\}\), two Boolean output variables E, F, i.e., \(X_{O}^{beh } = \{\mathsf{E}, \mathsf{F}\}\), and a bounded integer hidden variable \(k \in [0:N]\) for some \(N \in {\mathbb {N}}\), i.e., \(X_{H}^{beh }=\{k\}\). The textual requirements are listed below:


The buffer is empty and the inputs are ignored in the initial state.


enq triggers an enqueue operation when the buffer is not full.


deq triggers a dequeue operation when the buffer is not empty.


E signals that the buffer is empty.


F signals that the buffer is full.


Simultaneous enq and deq (or their simultaneous absence), an enq on the full buffer, or a deq on the empty buffer have no effect.

We formally define \(A^{beh }\) as \(\hat{C}^{beh } = \{c_{0}\}\), \(C^{beh } = \{c_{i}~|~i \in [1,5]\}\), \(\mathcal {R}^{beh } = \{ r_{i}~|~ i \in [0,5]\}\) and \(\rho ^{beh }(r_{i}) = \{c_{i}\}\), where

$$\begin{aligned} \begin{array}{lcl} c_{0} &{} : &{} \mathbf{true } \;\vdash \; (k' = 0) \wedge \mathsf{E'} \wedge \lnot \mathsf{F'} \\ c_{1} &{} : &{} \mathsf{enq'} \wedge \lnot \mathsf{deq'} \wedge k < N \;\vdash \; k' = k+1\\ c_{2} &{} : &{} \lnot \mathsf{enq'} \wedge \mathsf{deq'} \wedge k > 0 \;\vdash \; k' = k-1 \\ c_{3} &{} : &{} \mathbf{true } \;\vdash \; k' = 0 \Leftrightarrow \mathsf{E'}\\ c_{4} &{} : &{} \mathbf{true } \;\vdash \; k' = N \Leftrightarrow \mathsf{F'}\\ c_{5} &{} : &{} (\mathsf{enq'} = \mathsf{deq'}) \vee (\mathsf{enq'} \wedge \mathsf{F}) \vee (\mathsf{deq'} \wedge \mathsf{E}) \;\vdash \; k' = k. \\ \end{array} \end{aligned}$$

2.2 Semantics

Given a requirement interface \(A\) defined over \(X\), let \(V = V(X) \cup \{ {\hat{v}} \}\) denote the set of states in \(A\), where a state v is a valuation \(v \in V(X)\) or the initial state \({\hat{v}} \not \in V(X)\). The latter is not a valuation, as the initial contracts do not specify unprimed variables. There is a transition between two states v and \(v'\) if \((v,v')\) satisfies all its contracts. The transitions are labeled by the (possibly empty) set of requirement identifiers corresponding to contracts for which \((v,v')\) satisfies their assumptions. The semantics \([[A]]\) of \(A\) is the following \(\text {LTS}\).

Definition 2

The semantics of the requirement interface \(A\) is the \(\text {LTS}\) \([[A]] = \langle V, {\hat{v}}, L, T \rangle \), where V is the set of states, \({\hat{v}}\) is the initial state, \(L = {\mathcal {P}}(\mathcal {R})\) is the set of labels, and \(T \subseteq V \times L \times V\) is the transition relation, such that:

  • \(({\hat{v}}, R, v) \in T\) if \(v \in V(X)\), \(\bigwedge _{{\hat{c}} \in \hat{C}} ({\hat{v}}, v) \models {\hat{c}}\) and \(R = \{ r~|~({\hat{v}},v) \models _{A} {\hat{c}} \; \text {for some} \; {\hat{c}} \in \hat{C}\; \text {and} \; {\hat{c}} \in \rho (r)\}\);

  • \((v, R, v') \in T\) if \(v, v' \in V(X)\), \(\bigwedge _{c \in C} (v, v') \models c\) and \(R = \{ r~|~(v,v') \models _{A} c \; \text {for some} \; c \in C\; \text {and} \; c \in \rho (r)\}\).

Fig. 2 Labeled transition graph \([[A^{beh }]]\) illustrating the semantics of the bounded FIFO specification \(A^{beh }\), where \(N=2\)

We say that \(\tau = v_{0} \xrightarrow {R_{1}} v_{1} \xrightarrow {R_{2}} \cdots \xrightarrow {R_{n}} v_{n}\) is an execution of the requirement interface \(A\) if \(v_{0} = {\hat{v}}\) and for all \(0 \le i \le n-1\), \((v_{i}, R_{i+1}, v_{i+1}) \in T\). In addition, we use the following notation: (1) \(v \xrightarrow {R}\) iff \(\exists v' \in V(X) \; \text {s.t.} \; v \xrightarrow {R} v'\); (2) \(v \rightarrow v'\) iff \(\exists R \in L \; \text {s.t.} \; v \xrightarrow {R} v'\); (3) \(v \rightarrow \) iff \(\exists v' \in V(X) \; \text {s.t.} \; v \rightarrow v'\); (4) \(v \xRightarrow {\epsilon } v\); (5) \(v \xRightarrow {w} v'\) iff \(\exists Y \subseteq X \; \text {s.t.} \; \pi (v')[Y] = w\) and \(v \rightarrow v'\); (6) \(v \xRightarrow {w}\) iff \(\exists v', Y \subseteq X \; \text {s.t.} \; \pi (v')[Y] = w \; \text {and}\; v \rightarrow v'\); (7) \(v \xRightarrow {w_{1} \cdots w_{n}} v_{n}\) iff \(\exists v_{1}, \ldots , v_{n-1}, v_{n}\) s.t. \(v \xRightarrow {w_{1}} v_{1} \xRightarrow {w_{2}} \cdots \xRightarrow {w_{n}} v_{n}\); and (8) \(v \xRightarrow {\sigma }\) iff \(\exists v'\) s.t. \(v \xRightarrow {\sigma } v'\).

We say that a sequence \(\sigma \in V(X_{\text {obs}})^{*}\) is a trace of \(A\) if \({\hat{v}} \xRightarrow {\sigma }\). We denote by \(\mathcal {L}(A)\) the set of all traces of \(A\). Given a trace \(\sigma \) of \(A\), let \(A \;\text {after}\; \sigma = \{ v~|~{\hat{v}} \xRightarrow {\sigma } v \}\) denote the set of states reached via \(\sigma \). Given a state \(v \in V\), let \(\text {succ}(v) = \{v'~|~v \rightarrow v'\}\) be the set of successors of v.

Example 2

In Fig. 2, we show the \(\text {LTS}\) \([[A^{beh }]]\) of \(A^{beh }\). For instance, \({\hat{v}} \xrightarrow {r_{0}} v_{3} \xrightarrow {r_{1,3,4}} v_{5} \xrightarrow {r_{3,4,5}} v_{6}\) is an executionFootnote 4 in \([[A^{beh }]]\), and the trace \(\sigma \) induced by the above execution is

$$\begin{aligned} \sigma = (\lnot \textsf {enq},\lnot \textsf {deq},\textsf {E},\lnot \textsf {F}) \cdot (\textsf {enq},\lnot \textsf {deq},\lnot \textsf {E},\lnot \textsf {F}) \cdot (\textsf {enq},\textsf {deq},\lnot \textsf {E},\lnot \textsf {F}). \end{aligned}$$

2.3 Consistency, refinement, and conjunction

The contracts of a requirement interface can be conflicting; such an interface does not admit any correct implementation. We say that a requirement interface is consistent if it admits at least one correct implementation.

Definition 3

Let A be a requirement interface, [[A]] its associated \(\text {LTS}\), \(v \in V\) a state, and \({\mathcal {C}} = \hat{C}\) if v is initial, and \(C\) otherwise. We say that v is consistent, denoted by \(\text {cons}(v)\), if for all \(w_{I} \in V(X_{I})\), there exists \(v'\), such that \(w_I = \pi (v')[X_{I}]\), \(\bigwedge _{c \in {\mathcal {C}}} (v,v') \models c\) and \(\text {cons}(v')\). We say that A is consistent if \(\text {cons}({\hat{v}})\).

Example 3

\(A^{beh }\) is consistent, that is, every reachable state accepts every input valuation and generates an output valuation satisfying all contracts. Consider now replacing \(c_{2}\) in \(A^{beh }\) with the contract \(c'_{2} : \lnot \mathsf{enq'} \wedge \mathsf{deq'} \wedge k\ {\ge }\ 0 \;\vdash \; k' = k-1\), which incorrectly models \(r_{2}\) and decreases the counter k upon \(\textsf {deq}\) even when the buffer is empty, setting it to \(-1\). This causes an inconsistency with the contracts \(c_{3}\) and \(c_{5}\), which state that if k equals zero, the buffer is empty, and that a dequeue on an empty buffer has no effect on k.

We define the refinement relation between two requirement interfaces \(A^{1}\) and \(A^{2}\), denoted by \(A^{2} \preceq A^{1}\), as trace inclusion.

Definition 4

Let \(A^{1}\) and \(A^{2}\) be two requirement interfaces. We say that \(A^{2}\) refines \(A^{1}\), denoted by \(A^{2} \preceq A^{1}\), if (1) \(A^{1}\) and \(A^{2}\) have the same sets \(X_{I}\), \(X_{O}\), and \(X_{H}\) of variables; and (2) \(\mathcal {L}(A^{2}) \subseteq \mathcal {L}(A^{1})\).

We use a requirement interface to model a view of a system. Multiple views are combined by conjunction. The conjunction of two requirement interfaces is another requirement interface that is either inconsistent due to a conflict between views, or is the greatest lower bound with respect to the refinement relation. The conjunction of \(A^{1}\) and \(A^{2}\), denoted by \(A^{1} \wedge A^{2}\), is defined if the two interfaces share the same sets \(X_{I}\), \(X_{O}\), and \(X_{H}\) of variables.

Definition 5

Let \(A^{1} = \langle X_{I}, X_{H}, X_{O}, \hat{C}^{1}, C^{1}, \mathcal {R}^{1}, \rho ^{1} \rangle \) and \(A^{2} = \langle X_{I}, X_{H}, X_{O}, \hat{C}^{2}, C^{2}, \mathcal {R}^{2}, \rho ^{2} \rangle \) be two requirement interfaces defined over the same alphabet. Their conjunction \(A = A^{1} \wedge A^{2}\) is the requirement interface \(\langle X_{I}, X_{H}, X_{O}, \hat{C}, C, \mathcal {R}, \rho \rangle \), where

  • \(\hat{C}= \hat{C}^{1} \cup \hat{C}^{2}\) and \(C= C^{1} \cup C^{2}\);

  • \(\mathcal {R}= \mathcal {R}^{1} \cup \mathcal {R}^{2}\); and

  • \(\rho (r) = \rho ^{1}(r)\) if \(r \in \mathcal {R}^{1}\) and \(\rho (r) = \rho ^{2}(r)\) otherwise.


For refinement and conjunction, we require the two interfaces to share the same alphabet. This additional condition is used to simplify definitions. It does not restrict the modeling—arbitrary interfaces can have their alphabets equalized without changing their properties by taking the union of the respective input, output, and hidden variables. Contracts in the transformed interfaces do not constrain newly introduced variables. For requirement interfaces \(A^{1}\) and \(A^{2}\), alphabet equalization is defined if \((X_{I}^1 \cup X_{I}^2) \cap (X_{\text {ctr}}^1 \cup X_{\text {ctr}}^2) = (X_{O}^1 \cup X_{O}^2) \cap (X_{H}^1 \cup X_{H}^2) = \emptyset \). Otherwise, \(A^{1} \not \preceq A^{2}\) and vice versa, and \(A^{1} \wedge A^{2}\) is not defined.

Example 4

We now consider a power consumption view of the bounded FIFO buffer. Its model \(A^{pc}\) has the Boolean input variables \(\textsf {enq}\) and \(\textsf {deq}\) and a bounded integer output variable \(\textsf {pc}\). The following textual requirements specify \(A^{pc}\):


The power consumption equals zero when no enq/deq is requested.


The power consumption is bounded to two units otherwise.

The interface \(A^{pc}\) consists of \(\hat{C}^{pc } = C^{pc } = \{c_{a},c_{b}\}\), \(\mathcal {R}^{pc } = \{ r_{i}~|~ i \in \{a,b\}\}\), and \(\rho (r_{i}) = \{c_{i}\}\), where:

$$\begin{aligned} \begin{array}{lclcl} c_a &{} : &{}\ \lnot \textsf {enq} \wedge \lnot \textsf {deq} &{} \vdash &{} \textsf {pc}' = 0\\ c_b &{} : &{}\ \textsf {enq} \vee \textsf {deq} &{} \vdash &{} \textsf {pc}' \le 2. \\ \end{array} \end{aligned}$$

The conjunction \(A^{buf } = A^{beh } \wedge A^{pc }\) is the requirement interface where \(X_{I}^{buf } = \{ \textsf {enq}, \textsf {deq} \}\), \(X_{O}^{buf } = \{ \textsf {E}, \textsf {F}, \textsf {pc} \}\), \(X_{H}^{buf } = \{k\}\), \(\hat{C}^{buf } = \{c_{0}, c_{a},c_{b}\}\), \(C^{buf } = \{c_{1}, c_{2}, c_{3}, c_{4}, c_{5}, c_{a},c_{b}\}\), \(\mathcal {R}^{buf } = \{ r_{i}~|~ i \in \{a,b,0,1,2,3,4,5\}\}\), and \(\rho (r_{i}) = \{c_{i}\}\).

We now show some properties of requirement interfaces. The conjunction of two requirement interfaces over the same alphabet, when consistent, accepts exactly the intersection of their trace languages.

Theorem 1

Let \(A^{1}\) and \(A^{2}\) be two consistent requirement interfaces defined over the same alphabet. Then, either \(A^1 \wedge A^2\) is inconsistent, or \(\mathcal {L}(A^{1} \wedge A^{2}) = \mathcal {L}(A^{1}) \cap \mathcal {L}(A^2)\).

A proof of the theorem can be found in the appendix.

The conjunction of two requirement interfaces with the same alphabet is either inconsistent, or it is the greatest lower bound with respect to refinement.

Theorem 2

Let \(A^{1}\) and \(A^{2}\) be two consistent requirement interfaces defined over the same alphabet, such that \(A^1 \wedge A^2\) is consistent. Then, \(A^{1} \wedge A^{2} \preceq A^{1}\) and \(A^{1} \wedge A^{2} \preceq A^{2}\), and for all consistent requirement interfaces A, if \(A \preceq A^{1}\) and \(A \preceq A^{2}\), then \(A \preceq A^{1} \wedge A^{2}\).

A proof of the theorem can be found in the appendix.

The following theorem states that the conjunction of an inconsistent requirement interface with any other interface remains inconsistent. This result enables incremental detection of inconsistent specifications.

Theorem 3

Let A be an inconsistent requirement interface. Then, for all consistent requirement interfaces \(A'\) with the same alphabet as A, \(A\wedge A'\) is also inconsistent.

The proof follows directly from the definition of conjunction, which constrains the guarantees of individual interfaces.

3 Consistency, testing, and tracing

In this section, we present our test-case generation and execution framework and instantiate it with bounded model checking techniques. For now, we assume that all variables range over finite domains. This restriction can be lifted by considering richer data domains from theories that admit decidable quantifier elimination, such as linear arithmetic over the reals. Before executing the test-case generation, we can apply a consistency check on the requirement interface to ensure that the generation starts from an implementable specification.

3.1 Bounded consistency checking

To check k-bounded consistency of a requirement interface \(A\), we unfold the transition relation of \(A\) in k steps and encode the definition of consistency in a straightforward manner. The transition relation of an interface is the conjunction of its contracts, where a contract is represented as an implication between its assumption and guarantee predicates. Let

$$\begin{aligned} {\hat{\theta }} = \bigwedge _{({\hat{\varphi }} \vdash {\hat{\psi }}) \in \hat{C}} {\hat{\varphi }} \rightarrow {\hat{\psi }} \end{aligned}$$


$$\begin{aligned} \theta = \bigwedge _{(\varphi \vdash \psi ) \in C} \varphi \rightarrow \psi . \end{aligned}$$

Then, the k-bounded consistency check for \(A\) corresponds to checking the satisfiability of the formula

$$\begin{aligned} \forall X_{I}^{0}. \exists X_{\text {ctr}}^0 \dots \forall X_{I}^{k}. \exists X_{\text {ctr}}^{k} .\ \theta ^0 \wedge \theta ^1 \wedge \dots \wedge \theta ^{k} \text { where} \end{aligned}$$

\(\theta ^{0} = {\hat{\theta }}[X' \backslash X^{0}]\) and \(\theta ^{i} = \theta [X' \backslash X^{i}, X\backslash X^{i-1}]\), \(1 \le i \le k\).

To implement a consistency check in our prototype, we transform it to a satisfiability problem and use the SMT solver Z3 to solve it.

The first step is to construct a symbolic representation of the initial contracts and the transition relation.

The transition relation is then unfolded for each step by renaming the occurrence of each variable, such that it is indexed by the corresponding step. In each step i, the undecorated variables are indexed with \(i-1\), while the decorated variables are indexed with i, thus keeping the relation between the valuations of each step. Given a set \(X\) of variables, we denote by \(X^{i}\) the copy of the set, in which every variable is indexed by i.

The conjunction of all instances up to a certain depth is an open formula, leaving all variables free. The consistency check is bounded by a certain depth.
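Since all variables of the running example range over finite domains, the bounded consistency check can also be made concrete by explicit enumeration instead of SMT solving. The following Python sketch checks k-bounded consistency of \(A^{beh }\) with \(N=2\) (the `successors` and `consistent` helpers are ours, not part of the paper's prototype):

```python
from itertools import product

N = 2  # buffer bound of the running example

def successors(state, enq, deq):
    """All valuations (k, E, F) satisfying every update contract c1-c5
    of A_beh for the given predecessor state and input valuation."""
    k_prev, E_prev, F_prev = state
    result = []
    for k, E, F in product(range(N + 1), (False, True), (False, True)):
        c1 = (not (enq and not deq and k_prev < N)) or k == k_prev + 1
        c2 = (not (deq and not enq and k_prev > 0)) or k == k_prev - 1
        c3 = (k == 0) == E
        c4 = (k == N) == F
        c5 = (not ((enq == deq) or (enq and F_prev) or (deq and E_prev))) or k == k_prev
        if c1 and c2 and c3 and c4 and c5:
            result.append((k, E, F))
    return result

def consistent(depth):
    """k-bounded consistency: from every reachable state, every input
    valuation must admit at least one legal successor."""
    frontier = {(0, True, False)}  # initial contract c0: k' = 0, E', not F'
    for _ in range(depth):
        next_frontier = set()
        for state in frontier:
            for enq, deq in product((False, True), repeat=2):
                succ = successors(state, enq, deq)
                if not succ:
                    return False  # an input with no legal output: inconsistency
                next_frontier.update(succ)
        frontier = next_frontier
    return True

print(consistent(5))  # True: A_beh is consistent up to depth 5
```

Replacing \(c_{2}\) by the faulty \(c'_{2}\) of Example 3 makes the check return False: a deq on the empty buffer then admits no value of k satisfying all contracts.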

3.2 Test-case generation

A test case is an experiment executed on the \(\textsc {SUT}\) by the tester. We assume that the \(\textsc {SUT}\) is a black-box that is only accessed via its observable interface. We assume that it can be modeled as an input-enabled, deterministicFootnote 5 requirement interface. Without loss of generality, we can represent the \(\textsc {SUT}\) as a total sequential function \(\textsc {SUT}: V(X_{I}) \times V(X_{\text {obs}})^{*} \rightarrow V(X_{O})\). A test case \(T_{A}\) for a requirement interface \(A\) over \(X\) takes a history of actual input/output observations \(\sigma \in \mathcal {L}(A)\) and returns either the next input value to be executed or a verdict. Hence, a test case can be represented as a partial function \(T_{A}: \mathcal {L}(A) \rightarrow V(X_{I}) \cup \{ \mathbf{pass }, \mathbf{fail } \}\).

We first consider the problem of generating a test case from \(A\). The test-case generation procedure is driven by a test purpose. Here, a test purpose is a condition specifying the target set of states that a test execution should reach. Hence, it is a formula \(\varPi \) defined over \(X_{\text {obs}}\).

Given a requirement interface \(A\), let \({\hat{\phi }} = \bigvee _{({\hat{\varphi }} \vdash {\hat{\psi }}) \in \hat{C}} {\hat{\varphi }} \; \wedge \; \bigwedge _{({\hat{\varphi }} \vdash {\hat{\psi }}) \in \hat{C}} {\hat{\varphi }} \rightarrow {\hat{\psi }}\) and \(\phi = \bigvee _{(\varphi \vdash \psi ) \in C} \varphi \; \wedge \; \bigwedge _{(\varphi \vdash \psi ) \in C} \varphi \rightarrow \psi \). The predicates \({\hat{\phi }}\) and \(\phi \) encode the transition relation of \(A\), with the additional requirement that at least one assumption must be satisfied, thus avoiding input vectors for which the test purpose can be trivially reached due to under-specification. A test case for \(A\) that can reach \(\varPi \) is defined iff there exists a trace \(\sigma = \sigma ' \cdot w_{\mathrm{obs}}\) in \(\mathcal {L}(A)\), such that \(w_{\mathrm{obs}} \models \varPi \). The test purpose \(\varPi \) can be reached in \(A\) in at most k steps if

$$\begin{aligned} \exists X^{0},\ldots , X^{k}.\, \phi ^{0} \wedge \cdots \wedge \phi ^{k} \wedge \bigvee _{i \le k} \varPi [X_{\text {obs}}\backslash X_{\text {obs}}^{i}], \end{aligned}$$

where \(\phi ^{0} = {\hat{\phi }}[X' \backslash X^{0}]\) and \(\phi ^{i} = \phi [X' \backslash X^{i}, X\backslash X^{i-1}]\) represent the transition relation of \(A\) unfolded in the i-th step.
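For finite domains, this bounded reachability check can likewise be carried out by explicit search in place of SMT solving. A Python sketch for the behavioral buffer view with test purpose \(\textsf {F}\) (the `step` and `reach` helpers are ours; `step` exploits that \(A^{beh }\) is deterministic once the state is fixed):

```python
from itertools import product

N = 2  # buffer bound of the running example

def step(state, enq, deq):
    """Deterministic successor in the behavioral view A_beh
    (None stands for the initial state, where inputs are ignored)."""
    if state is None:
        k2 = 0                              # initial contract c0
    else:
        k, E, F = state
        if enq and not deq and k < N:
            k2 = k + 1                      # c1: enqueue
        elif deq and not enq and k > 0:
            k2 = k - 1                      # c2: dequeue
        else:
            k2 = k                          # c5: no effect
    return (k2, k2 == 0, k2 == N)           # c3, c4 determine E and F

def reach(purpose, bound):
    """Breadth-first search for a shortest input sequence sigma_I
    whose induced trace reaches the test purpose."""
    frontier = [(None, [])]                  # (state, input history)
    for _ in range(bound):
        next_frontier = []
        for state, history in frontier:
            for enq, deq in product((False, True), repeat=2):
                succ = step(state, enq, deq)
                witness = history + [(enq, deq)]
                if purpose(succ):
                    return witness
                next_frontier.append((succ, witness))
        frontier = next_frontier
    return None  # purpose unreachable within the bound

sigma_I = reach(lambda s: s[2], bound=5)     # test purpose: F
print(len(sigma_I))                          # 3, as in Example 5 below
```

An unreachable test purpose (e.g., requesting \(\textsf {E} \wedge \textsf {F}\)) makes `reach` return None, mirroring the unsatisfiable case of the bounded check.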

Given \(A\) and \(\varPi \), assume that there exists a trace \(\sigma \) in \(\mathcal {L}(A)\) that reaches \(\varPi \). Let \(\sigma _{I} = \pi (\sigma )[X_{I}] = w_{I}^{0} \cdot w_{I}^{1} \cdots w_{I}^{n}\) be its projection to inputs. We first compute \(\omega _{\sigma _{I},A}\) (see Algorithm 1), a formulaFootnote 6 characterizing the set of output sequences that \(A\) allows on input \(\sigma _{I}\).

Algorithm 1: computation of the output monitor \(\omega _{\sigma _{I},A}\)

Let \({\hat{\theta }} = \bigwedge _{({\hat{\varphi }}\vdash {\hat{\psi }}) \in \hat{C}} {\hat{\varphi }} \rightarrow {\hat{\psi }}\) and \(\theta = \bigwedge _{(\varphi \vdash \psi ) \in C} \varphi \rightarrow \psi \). For every step i, we represent by \(\omega ^{i}_{\sigma _{I},A}\) the allowed behavior of \(A\) constrained by \(\sigma _{I}\) (Lines 1–4). The formula \(\omega ^{*}_{\sigma _{I}, A}\) (Line 5) describes the transition relation of \(A\), unfolded to n steps and constrained by \(\sigma _{I}\). However, this formula refers to the hidden variables of \(A\) and cannot be directly used to characterize the set of output sequences allowed by \(A\) under \(\sigma _{I}\). Since any implementation of the hidden variables that preserves correctness of the outputs is acceptable, it suffices to existentially quantify over the hidden variables in \(\omega ^{*}_{\sigma _{I}, A}\). After eliminating the existential quantifiers with strategy qe, we obtain a simplified formula \(\omega _{\sigma _{I}, A}\) over output variables only (Line 6).

Algorithm 2: construction of the test case \(T_{\sigma _{I},A}\)

Let \(T_{\sigma _{I}, A}\) be a test case, parameterized by the input sequence \(\sigma _{I}\) and the requirement interface \(A\) from which it was generated. It is a partial function, where \(T_{\sigma _{I},A}(\sigma )\) is defined if \(|\sigma | \le |\sigma _{I}|\) and for all \(0 \le i \le |\sigma |\), \(w_{I}^{i} = \pi (w_\mathrm{obs}^{i})[X_{I}]\), where \(\sigma _{I} = w_{I}^{0} \cdots w_{I}^{n}\) and \(\sigma = w_\mathrm{obs}^{0} \cdots w_\mathrm{obs}^{k}\). Algorithm 2 gives a constructive definition of the test case \(T_{\sigma _{I},A}\). It starts by producing the output monitor for the given input sequence (Line 1). Then, it substitutes all output variables in the monitor with the outputs observed from the \(\textsc {SUT}\) (Lines 2–5). If the monitor is satisfied by the outputs, it returns the verdict pass; otherwise, it returns fail.
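Over finite domains, the existential elimination of the hidden variable k performed by Algorithm 1 can be mimicked by plain enumeration. The following sketch (the `allowed` and `out_monitor` helpers are ours; `allowed` recovers \(\textsf {E}\), \(\textsf {F}\) of the predecessor from \(k\) via \(c_{3}\) and \(c_{4}\)) computes the per-step sets of allowed observable outputs of \(A^{beh }\) for a fixed input vector:

```python
from itertools import product

N = 2  # buffer bound of the running example

def allowed(k_prev, enq, deq):
    """All (k, E, F) satisfying the update contracts c1-c5 of A_beh,
    with E_prev and F_prev recovered from k_prev via c3 and c4."""
    res = []
    for k, E, F in product(range(N + 1), (False, True), (False, True)):
        c1 = (not (enq and not deq and k_prev < N)) or k == k_prev + 1
        c2 = (not (deq and not enq and k_prev > 0)) or k == k_prev - 1
        c34 = ((k == 0) == E) and ((k == N) == F)
        c5 = (not ((enq == deq) or (enq and k_prev == N) or (deq and k_prev == 0))) or k == k_prev
        if c1 and c2 and c34 and c5:
            res.append((k, E, F))
    return res

def out_monitor(sigma_I):
    """Per-step sets of allowed observable outputs (E, F); the hidden
    variable k is existentially quantified away by enumerating it."""
    states = {0}                     # initial contract c0 fixes k = 0
    monitor = [{(True, False)}]      # step 0: E and not F, inputs ignored
    for enq, deq in sigma_I[1:]:
        succs = {s for kp in states for s in allowed(kp, enq, deq)}
        monitor.append({(E, F) for _, E, F in succs})
        states = {k for k, _, _ in succs}
    return monitor

sigma_I = [(True, True), (True, False), (True, False)]
print(out_monitor(sigma_I))
```

For this input vector, the monitor allows exactly one output per step: \(\textsf {E} \wedge \lnot \textsf {F}\), then \(\lnot \textsf {E} \wedge \lnot \textsf {F}\), then \(\lnot \textsf {E} \wedge \textsf {F}\).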

Incremental test-case generation So far, we considered test-case generation for a complete requirement interface \(A\), without considering its internal structure. We now describe how test cases can be incrementally generated when the interface \(A\) consists of multiple views,Footnote 7 i.e., \(A= A^{1} \wedge A^{2}\). Let \(\varPi \) be a test purpose for the view modeled with \(A^{1}\). We first check whether \(\varPi \) can be reached in \(A^{1}\), which is a simpler check than doing it on the conjunction \(A^{1} \wedge A^{2}\). If \(\varPi \) can be reached, we fix the input sequence \(\sigma _{I}\) that steers \(A^{1}\) to \(\varPi \). Instead of creating the test case \(T_{\sigma _{I}, A^{1}}\), we generate \(T_{\sigma _{I}, A^{1} \wedge A^{2}}\), which keeps \(\sigma _I\) as the input sequence, but collects the output guarantees of both \(A^{1}\) and \(A^{2}\). Such a test case steers the \(\textsc {SUT}\) towards the test purpose in the view modeled by \(A^{1}\), but is able to detect possible violations of both \(A^{1}\) and \(A^{2}\).

We note that test-case generation for fully observable interfaces is simpler than the general case: due to the absence of hidden variables in the model, there is no need for quantifier elimination. A test case from a deterministic interface is simpler still, as it is a direct mapping from the observable trace that reaches the test purpose—there is no need to collect constraints on the output, since a deterministic interface leaves the implementation no freedom in the choice of output valuations.

Example 5

Consider the requirement interface \(A_{beh }\) for the behavioral view of the two-bounded buffer, and the test purpose \(\textsf {F}\). Our test-case generation procedure gives the input vector \(\sigma _{I}\) of size 3, such that

$$\begin{aligned} \begin{array}{lcl} & & (\textsf {enq}, \textsf {deq})\\ \sigma _{I} &=& (\textsf {enq}, \lnot \textsf {deq})\\ & & (\textsf {enq}, \lnot \textsf {deq}). \end{array} \end{aligned}$$

The observable output constraints for \(\sigma _{I}\) (which are encoded in \(\text {OutMonitor}\)) are \(\textsf {E} \wedge \lnot \textsf {F}\) in Step 0, \(\lnot \textsf {E} \wedge \lnot \textsf {F}\) in Step 1, and \(\lnot \textsf {E} \wedge \textsf {F}\) in Step 2. Together, the input vector \(\sigma _{I}\) and the associated output constraints form the test case \(T_{\sigma _{I},beh }\). Using the incremental test-case generation procedure, we can extend \(T_{\sigma _{I},beh }\) to a test case \(T_{\sigma _{I},buf }\) that also considers the power consumption view of the buffer, resulting in output constraints \(\textsf {E} \wedge \lnot \textsf {F} \wedge \textsf {pc} \le 2\) in Step 0, \(\lnot \textsf {E} \wedge \lnot \textsf {F} \wedge \textsf {pc} \le 2\) in Step 1, and \(\lnot \textsf {E} \wedge \textsf {F} \wedge \textsf {pc} \le 2\) in Step 2.
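The incremental extension can be sketched by conjoining the power-consumption constraint onto every step of the behavioral monitor. In the sketch below, output valuations are dictionaries with a numeric pc entry (an assumption of this sketch, not the tool's representation):

```python
# Per-step behavioral monitor of T_{sigma_I, beh} over the (E, F) outputs.
beh_monitor = [
    lambda o: (o["E"], o["F"]) == (True, False),    # step 0:  E and not F
    lambda o: (o["E"], o["F"]) == (False, False),   # step 1:  not E and not F
    lambda o: (o["E"], o["F"]) == (False, True),    # step 2:  not E and F
]

def extend(monitor, extra):
    """Conjoin a second view's per-step constraint onto each step,
    as in going from T_{sigma_I, beh} to T_{sigma_I, buf}."""
    return [lambda o, m=m: m(o) and extra(o) for m in monitor]

# Add the power-consumption view: pc <= 2 in every step.
buf_monitor = extend(beh_monitor, lambda o: o["pc"] <= 2)

out = {"E": False, "F": True, "pc": 3}   # behavior correct, power too high
print(beh_monitor[2](out), buf_monitor[2](out))  # → True False
```

The extended test case keeps the same input sequence and only strengthens the output monitor, which is exactly what makes the incremental construction cheap.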

[Algorithm 3 (listing not reproduced)]

3.3 Test-case execution

Let \(A\) be a requirement interface, \(\textsc {SUT}\) a system under test with the same set of variables as \(A\), and \(T_{\sigma _{I}, A}\) a test case generated from \(A\). Algorithm 3 defines the test-case execution procedure \(\text {TestExec}\) that takes as input the \(\textsc {SUT}\) and \(T_{\sigma _{I}, A}\) and outputs a verdict \(\mathbf{pass }\) or \(\mathbf{fail }\). At every step, \(\text {TestExec}\) gets the next test input in from the given test case \(T_{\sigma _{I},A}\) (Lines 4, 8), stimulates the system under test with this input, and waits for an output out (Line 6). The newly observed inputs/outputs are stored in \(\sigma \) (Line 7), which is given as input to \(T_{\sigma _{I}, A}\). The test case monitors whether the observed output is correct with respect to A. The procedure continues until a \(\mathbf{pass }\) or \(\mathbf{fail }\) verdict is reached (Line 5). Finally, the verdict is returned (Line 10).

Proposition 1

Let A, \(T_{\sigma _{I}, A}\), and \(\textsc {SUT}\) be an arbitrary requirement interface, a test case generated from A, and a system under test, respectively. Then, we have

  1. if \(\textsc {SUT}\preceq A\), then \(\text {TestExec}(\textsc {SUT}, T_{\sigma _{I}, A}) = \mathbf{pass }\);

  2. if \(\text {TestExec}(\textsc {SUT}, T_{\sigma _{I}, A}) = \mathbf{fail }\), then \(\textsc {SUT}\not \preceq A\).

Proposition 1 immediately holds for test cases generated incrementally from a requirement interface of the form \(A= A^{1} \wedge A^{2}\). In addition, we notice that a test case \(T_{\sigma _{I}, A^{1}}\) generated from a single view \(A^{1}\) of \(A\) does not need to be extended to be useful, and can be used to incrementally show that a \(\textsc {SUT}\) does not conform to its specification. We state the property in the following corollary that follows directly from Proposition 1 and Theorem 2.

Corollary 1

Let \(A= A^{1} \wedge A^{2}\) be an arbitrary requirement interface composed of \(A^{1}\) and \(A^{2}\), \(\textsc {SUT}\) an arbitrary system under test, and \(T_{\sigma _{I}, A^{1}}\) an arbitrary test case generated from \(A^{1}\). Then, if \(\text {TestExec}(\textsc {SUT}, T_{\sigma _{I}, A^{1}}) = \mathbf{fail }\), then \(\textsc {SUT}\not \preceq A^{1} \wedge A^{2}\).

[Algorithm 4 (listing not reproduced)]

Example 6

Consider as an \(\textsc {SUT}\) the implementation of a 3-place buffer, as illustrated in Algorithm 4. We assume that the power consumption is updated directly in a \(\textsc {pc}\) variable. Although the \(\textsc {SUT}\) correctly implements a 3-place buffer, it is a faulty implementation of a 2-place buffer: when the \(\textsc {SUT}\) already contains two items, the buffer is still not full, which contradicts requirement \(r_4\) of a 2-place buffer. Executing the tests \(T_{\sigma _{I},beh }\) and \(T_{\sigma _{I},buf }\) from Example 5 will both result in a \(\mathbf{fail }\) test verdict.
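Example 6 can be reproduced in a small sketch, abstracting the SUT as a callable from input valuations to (E, F) output valuations (hypothetical names; the monitor encodes the output constraints of the 2-place-buffer test):

```python
class ThreePlaceBuffer:
    """A correct 3-place buffer: a faulty implementation of the
    2-place-buffer interface, since after two enqueues it is not full."""
    def __init__(self):
        self.k = 0
    def __call__(self, inp):
        enq, deq = inp
        if enq and not deq and self.k < 3:
            self.k += 1
        elif deq and not enq and self.k > 0:
            self.k -= 1
        return (self.k == 0, self.k == 3)  # F only holds at three items

monitor = [lambda o: o == (True, False),
           lambda o: o == (False, False),
           lambda o: o == (False, True)]   # step 2 expects the 2-buffer's F

def run(sut):
    sigma_I = [(True, True), (True, False), (True, False)]
    for i, inp in enumerate(sigma_I):
        if not monitor[i](sut(inp)):
            return "fail"   # observed outputs violate the output monitor
    return "pass"

print(run(ThreePlaceBuffer()))  # → fail
```

After the third input, the hidden counter is 2, so the 3-place buffer still reports \(\lnot F\) while the monitor expects \(F\).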

3.4 Traceability

Requirement identifiers as first-class elements in requirement interfaces facilitate traceability between informal requirements, views, and test cases. A test case generated from a view \(A^i\) of an interface \(A= A^1 \wedge \cdots \wedge A^n\) is naturally mapped to the set \(\mathcal {R}^i\) of requirements. In addition, requirement identifiers enable tracing violations caught during consistency checking and test-case execution back to the conflicting/violated requirements.

Tracing inconsistent interfaces to conflicting requirements When we detect an inconsistency in a requirement interface \(A\) defining a set of contracts \(C\), we use QuickXPlain, a standard conflict set detection algorithm [36], to compute a minimal set of contracts \(C' \subseteq C\), such that \(C'\) is inconsistent. Once we have computed \(C'\), we use the requirement mapping function \(\rho \) defined in \(A\), to trace back the set \(\mathcal {R}' \subseteq \mathcal {R}\) of conflicting requirements.
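The conflict-set computation can be sketched with a simple deletion-based minimization: a naive stand-in for QuickXPlain that also yields one minimal inconsistent subset, but without QuickXPlain's divide-and-conquer strategy. The consistency oracle below is a toy over interval constraints on a single variable; in the tool it would be an SMT query:

```python
def minimize_conflict(contracts, is_consistent):
    """Shrink `contracts` to a minimal subset that is still inconsistent
    (a deletion-based stand-in for QuickXPlain)."""
    core = list(contracts)
    i = 0
    while i < len(core):
        candidate = core[:i] + core[i + 1:]
        if not is_consistent(candidate):
            core = candidate   # contract i is not needed for the conflict
        else:
            i += 1             # contract i is essential for the conflict
    return core

# Toy consistency check: each contract is a (low, high) bound on x; the
# set is consistent iff the intersection of all intervals is non-empty.
def is_consistent(cs):
    return not cs or max(lo for lo, _ in cs) <= min(hi for _, hi in cs)

contracts = [(0, 10), (5, 8), (9, 12), (0, 6)]   # (9, 12) clashes with (0, 6)
print(minimize_conflict(contracts, is_consistent))  # → [(9, 12), (0, 6)]
```

Each contract in the returned core would then be traced back, via \(\rho \), to the informal requirements involved in the conflict.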

Tracing fail verdicts to violated requirements In fully observable interfaces, every trace induces at most one execution. In that case, a test case resulting in \(\mathbf{fail }\) can be traced to a unique set of violated requirements. This is not the case in general for interfaces with hidden variables. A trace that violates such an interface may induce multiple executions resulting in \(\mathbf{fail }\) with different valuations of hidden variables, and thus different sets of violated requirements. In this case, we report all sets to the user, but ignore internal valuations that would introduce an internal requirement violation before inducing the visible violation.

We propose a tracing procedure \(TraceFailTC \), presented in Algorithm 5, that provides useful debugging data regarding the violation of test cases in the general case. The algorithm takes as input a requirement interface \(A\) and a trace \(\sigma \not \in \mathcal {L}(A)\). The trace \(\sigma \) that is given as input to the algorithm is obtained from executing a test case for \(A\) that leads to a \(\mathbf{fail }\) verdict. The algorithm runs a main loop that at each iteration computes a debugging pair consisting of an execution \(\tau \) with \(\pi (\tau )[X_{\text {obs}}] = \sigma \) and a set \(\text {failR}\subseteq \mathcal {R}\) of requirements. The execution \(\tau \) completes the faulty trace with valuations of hidden variables that are consistent with the violation of the requirement interface in the last step. The set \(\text {failR}\) contains all the requirements that are violated by the execution \(\tau \). We initialize the algorithm by setting an auxiliary variable \(C^*\) to the set of all update contracts \(C\) (Line 3). In every iteration of the main loop, we encode in \(\phi ^{*}_{\mathrm{obs}}\) all the executions induced by \(\sigma \) that violate at least one contract in \(C^*\) (Lines 6 and 7). In the next step (Line 8), we check the satisfiability of the formula \(\phi ^{*}_{\mathrm{obs}}\) (\(\mathbf{sat }(\phi ^{*}_{\mathrm{obs}})\)), a function that returns \(b = \mathbf{true }\) and a sequence (model) of hidden-variable valuations \(\sigma _H = w^0_H \cdots w^n_H\) if \(\phi ^{*}_{\mathrm{obs}}\) is satisfiable, and \((b = \mathbf{false }, \sigma _H = \epsilon )\) otherwise. In the former case, we combine \(\sigma \) and \(\sigma _H\) into an execution \(\tau \) (Line 10). We collect in \(\text {failR}\) all requirements that are violated by \(\tau \) and remove the corresponding contracts from \(C^*\) (Lines 11–16). The debugging pair \((\tau , \text {failR})\) is added to \(\text {debugSet}\) (Line 16).
The procedure terminates and returns \(\text {debugSet}\) when either \(C^*\) is empty or \(\sigma \) cannot violate any remaining contract in \(C^*\), thus ensuring that every requirement that can be violated by \(\sigma \) is part of at least one debugging pair in \(\text {debugSet}\).
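For the two-bounded buffer, the idea behind \(TraceFailTC \) can be sketched exhaustively: enumerate all hidden completions of the faulty trace that satisfy the contracts up to the last step, and collect the set of requirements each completion violates there. The contract encoding r1–r4 below is a guess reconstructed from Examples 5 and 7; unlike Algorithm 5, which greedily stops once every contract is covered, this sketch returns every possible violation set:

```python
from itertools import product

def holds(rid, k, inp, k2, out):
    """Hypothetical encoding of four buffer contracts over the hidden
    counter before (k) and after (k2) a step, inputs, and outputs."""
    enq, deq = inp
    E, F = out
    if rid == "r1":   # enq and not deq and k < 2  |-  k' = k + 1
        return not (enq and not deq and k < 2) or k2 == k + 1
    if rid == "r2":   # deq and not enq and k > 0  |-  k' = k - 1
        return not (deq and not enq and k > 0) or k2 == k - 1
    if rid == "r3":   # true  |-  k' = 0 <=> E'
        return (k2 == 0) == E
    if rid == "r4":   # true  |-  k' = 2 <=> F'
        return (k2 == 2) == F
    raise ValueError(rid)

REQS = ["r1", "r2", "r3", "r4"]

def trace_fail_tc(sigma):
    """Exhaustive variant of TraceFailTC: hidden completions must satisfy
    all contracts before the last step and violate some contract there."""
    n = len(sigma)
    debug = set()
    for ks in product(range(3), repeat=n):     # candidate hidden sequence
        ok_prefix = all(
            holds(r, (0, *ks)[i], sigma[i][0], ks[i], sigma[i][1])
            for r in REQS for i in range(n - 1))
        if not ok_prefix:
            continue
        k_prev = ks[n - 2] if n > 1 else 0
        violated = frozenset(
            r for r in REQS
            if not holds(r, k_prev, sigma[n - 1][0], ks[n - 1], sigma[n - 1][1]))
        if violated:
            debug.add(violated)
    return debug

# Faulty trace of Example 7: inputs (enq, deq), outputs (E, F).
sigma = [((True, True), (True, False)),
         ((True, False), (False, False)),
         ((True, False), (False, False))]
print(trace_fail_tc(sigma))
```

On this trace, the sketch reports {r4} and {r1, r3} as in Example 7, plus the singleton {r1} (hidden k left unchanged in the last step), which Algorithm 5's greedy covering may not report separately once r1 is covered.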

[Algorithm 5 (listing not reproduced)]

Example 7

Consider the execution trace

$$\begin{aligned} \begin{array}{lcl} & & (\textsf {enq}, \textsf {deq}, \textsf {E}, \lnot \textsf {F})\\ \sigma &=& (\textsf {enq}, \lnot \textsf {deq}, \lnot \textsf {E}, \lnot \textsf {F})\\ & & (\textsf {enq}, \lnot \textsf {deq}, \lnot \textsf {E}, \lnot \textsf {F}) \end{array} \end{aligned}$$

that results in a \(\mathbf{fail }\) verdict when executing the test \(T_{\sigma _I,beh }\). The tracing procedure gives as debugging information the set \(\text {debugSet}= \{(\tau _1, \{r_4\}), (\tau _2, \{r_1, r_3\})\}\), where \(\tau _1\) and \(\tau _2\) correspond to the following executions that can lead to violations of requirements \(r_4\) and \(r_1, r_3\), respectively.

$$\begin{aligned} \begin{array}{lcl} & & (\textsf {enq}, \textsf {deq}, k=0, \textsf {E}, \lnot \textsf {F})\\ \tau _1 &=& (\textsf {enq}, \lnot \textsf {deq}, k=1, \lnot \textsf {E}, \lnot \textsf {F})\\ & & (\textsf {enq}, \lnot \textsf {deq}, k=2, \lnot \textsf {E}, \lnot \textsf {F})\\ & & \\ & & (\textsf {enq}, \textsf {deq}, k=0, \textsf {E}, \lnot \textsf {F})\\ \tau _2 &=& (\textsf {enq}, \lnot \textsf {deq}, k=1, \lnot \textsf {E}, \lnot \textsf {F})\\ & & (\textsf {enq}, \lnot \textsf {deq}, k=0, \lnot \textsf {E}, \lnot \textsf {F}). \end{array} \end{aligned}$$

Requirements \(r_0\) and \(r_5\) cannot be violated in the last step of this test execution. We note that, for the faulty 2-buffer implementation I from Algorithm 4, the debugging pair \((\tau _1, \{r_4\})\) would allow the error to be localized exactly and traced back to the violation of requirement \(r_4\).

For requirement interfaces with hidden variables, the underlying implementation is only partially observable. The best the tracing procedure can do when the execution of a test leads to the \(\mathbf{fail }\) verdict is to complete the missing hidden variables with valuations that are consistent with the partial observations of input and output variables. It follows that \(\text {debugSet}\) consists of “hints” on possibly violated requirements and the causes of their violation. We note that Algorithm 5 aims at a compromise between minimizing the amount of data presented to the designer and still providing useful information. In particular, it focuses on implementation errors that occur at the time of the failure, for both the hidden and the output variables. We note that in some faulty implementations, errors in updating hidden variables may not immediately result in observable faults. For instance, in the execution

$$\begin{aligned} \begin{array}{lcl} & & (\textsf {enq}, \textsf {deq}, k=1, \textsf {E}, \lnot \textsf {F})\\ \tau _3 &=& (\textsf {enq}, \lnot \textsf {deq}, k=1, \lnot \textsf {E}, \lnot \textsf {F})\\ & & (\textsf {enq}, \lnot \textsf {deq}, k=1, \lnot \textsf {E}, \lnot \textsf {F}) \end{array} \end{aligned}$$

the requirement \(r_0\) is immediately violated in the initial step, but the implementation errors are only observed in the last step of the test execution. Algorithm 5 does not give such executions as possible causes that lead to a \(\mathbf{fail }\) verdict. It is a design choice—we believe that choosing hidden variables without any restriction would result in executions that are too arbitrary and have little debugging value.

4 Model-based mutation testing

In this section, we apply a fault-based variant of model-based testing to requirement interfaces. In MBT, test cases are generated according to predefined coverage criteria, producing test suites that, e.g., cover all states in the specification model, or, in the case of a contract-based specification, enable all assumptions at least once.

Similarly, we generate a test suite covering a set of faults. The faults are specified via a set of mutation operators that apply specific faults to all applicable parts of the model. When applied to requirement interfaces, we mutate one contract at a time. Then, we check for conformance between the original requirement interface and the mutated one. If the mutated requirement interface can produce controllable variable values that are forbidden by the original, the conformance is violated. In that case, we produce a test case leading exactly to that violation. If that test case is executed on a deterministic \(\textsc {SUT}\) and passes, we can guarantee that the corresponding fault was not implemented. Thus, by generating all tests for all fault models, we can prove the absence of all corresponding faults in the system.

Definition 6

We define a mutation operator \(\mu \) as a function \(\mu : C\rightarrow 2 ^ {C}\), which takes a contract \(c = (\varphi \vdash \psi ) \in C\) and produces a set of mutated contracts \(C^\mu \subseteq C\), where a specific kind of fault is applied to all valid parts of \(\psi \). We only consider mutations in the guarantee, as the fault models should simulate situations where the system produces wrong outputs, after receiving valid inputs.

We currently consider the following fault models:

  1. Off-by-one: mutate every integer constant or variable, both by adding and subtracting 1;

  2. Negation: flip every boolean constant or variable;

  3. Change comparison operators: replace equality operators by inequality operators, and vice versa; replace every operator in \(\{<=, <, >, >=\}\) by each of the operators in \(\{<=, <, ==, >, >=\}\);

  4. Change and/or: replace every and operator by an or operator, and vice versa;

  5. Change implication/bi-implication: replace every implication by a bi-implication, and vice versa.
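As a sketch, the first and third operators can be implemented purely syntactically over guarantees written as Python-syntax strings; this is a stand-in for the tool's mutation of the actual contract formulas (all names hypothetical):

```python
import re

def off_by_one(expr):
    """Mutate every integer literal both by +1 and by -1."""
    muts = []
    for m in re.finditer(r"\b\d+\b", expr):
        v = int(m.group())
        for w in (v + 1, v - 1):
            muts.append(expr[:m.start()] + str(w) + expr[m.end():])
    return muts

def change_comparison(expr):
    """Replace each comparison operator by every other comparison operator."""
    ops = ["<=", "<", "==", ">", ">="]
    muts = []
    for m in re.finditer(r"<=|>=|==|<|>", expr):
        for op in ops:
            if op != m.group():
                muts.append(expr[:m.start()] + op + expr[m.end():])
    return muts

guarantee = "k2 == k + 1"            # stand-in for the guarantee k' = k + 1
print(off_by_one(guarantee))         # → ['k2 == k + 2', 'k2 == k + 0']
print(len(change_comparison(guarantee)))  # → 4
```

Each mutated string corresponds to one first-order mutant \(c_m\) of the contract.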

Definition 7

A mutant \(c_m = (\varphi \vdash \psi _m) \in C^\mu \) is an intentionally altered (mutated) version of the contract \(c = (\varphi \vdash \psi )\). A mutant is called a first-order mutant, if it only contains one fault. This paper only considers first-order mutants.

If a mutation does not introduce new behavior to the requirement interface, it is considered an equivalent mutation. If it leads to an inconsistency, it is considered an unproductive mutation.

Given the contract \((\varphi \vdash \psi ) \in C\), we denote by \({\bar{C}}\) the set of the other contracts in the requirement interface, i.e., \({\bar{C}} = C\setminus \{(\varphi \vdash \psi )\} \), by \(C^\mu \) the set of mutants obtained by applying all mutation operators to \((\varphi \vdash \psi )\) and by \(c_m=(\varphi \vdash \psi _m)\) one single mutant in \(C^\mu \).

The mutant \(c_m\) is a non-equivalent mutation if there exist two valuations \(v,v'\) such that:

  • v is reachable from \({\hat{v}}\)

  • \((v,v') \models \varphi \)

  • \(\forall _{({\bar{\varphi }} \vdash {\bar{\psi }}) \in {\bar{C}}} (v,v') \models ({\bar{\varphi }} \vdash {\bar{\psi }})\)

  • \((v,v') \models (\varphi \vdash \psi _m) \wedge \lnot (\varphi \vdash \psi )\).

We consider a mutant k-equivalent to the original requirement interface if it is equivalent up to a bound k.

The test purpose \(\varPi \) for detecting \((\varphi \vdash \psi _m)\) can be encoded by the formula

$$\begin{aligned}\varPi = \varphi \wedge \psi _m \wedge \lnot \psi \wedge \bigwedge _{({\bar{\varphi }} \vdash {\bar{\psi }}) \in {\bar{C}}} ({\bar{\varphi }} \rightarrow {\bar{\psi }}). \end{aligned}$$

The reachability formula for such a test purpose differs from the one presented in Sect. 3.2 in two details: in the step where the test purpose is evaluated, the transition relation is not asserted, as we require the original contract to be violated. In addition, a test purpose is a relation over primed and unprimed variables.

A test purpose \(\varPi \) can be reached in step k, if

$$\begin{aligned} \exists X^{0},\ldots , X^{k}.\, \phi ^{0} \wedge \cdots \wedge \phi ^{k-1} \wedge \varPi [X\backslash X^{k-1}, X'\backslash X^{k}], \end{aligned}$$

where, as in Sect. 3.2, \(\phi ^{0} = {\hat{\phi }}[X' \backslash X^{0}]\) and \(\phi ^{i} = \phi [X' \backslash X^{i}, X\backslash X^{i-1}]\) represent the transition relation of \(A\) unfolded in the i-th step. If the test purpose is reachable, the mutation is not k-equivalent.
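For the finite-state buffer, this reachability check can be sketched by explicit unfolding instead of an SMT query. The sketch below (hypothetical encoding) searches for the first step at which the assumption of contract \(c_2\) is enabled; there the mutated guarantee \(k' = k-2\) of the mutant \(c_{2,m}\) discussed in Example 8 necessarily disagrees with the original \(k' = k-1\). Note that this simplified encoding counts input steps only, so the purpose is reached after two inputs:

```python
def post(k, enq, deq):
    """Correct successor of the hidden counter k of the 2-bounded buffer."""
    if enq and not deq and k < 2:
        return k + 1
    if deq and not enq and k > 0:
        return k - 1
    return k

INPUTS = [(e, d) for e in (False, True) for d in (False, True)]

def reach_test_purpose(bound):
    """Bounded, explicit-state unfolding: return the first step (and the
    input vector) at which the assumption of c_2 (deq, not enq, k > 0)
    holds, i.e., where k' = k - 2 and k' = k - 1 can disagree."""
    frontier = [(0, [])]                  # (hidden k, input history)
    for step in range(1, bound + 1):
        nxt = []
        for k, hist in frontier:
            for enq, deq in INPUTS:
                if deq and not enq and k > 0:      # test purpose reached
                    return step, hist + [(enq, deq)]
                nxt.append((post(k, enq, deq), hist + [(enq, deq)]))
        frontier = nxt
    return None                           # mutant k-equivalent up to bound

print(reach_test_purpose(3))  # → (2, [(True, False), (False, True)])
```

If the search returns None up to the bound, the mutant is k-equivalent; otherwise, the returned input vector is extracted as \(\sigma _I\) and combined with the correct interface into a positive test case.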

Remark 1

In this paper, we consider weak mutation testing [35]. This means that wrong behavior of internal variables is already considered a conformance violation. In contrast, strong mutation testing also requires that an internal fault propagates to an observable failure. The encoding of the reachability of the mutation as a test purpose, without altering the step relation, is only possible for weak mutation testing. Strong mutation testing would require the step relation to use the mutated contract in all steps, and then detect the failure in the last step. Due to considering weak mutation testing, we also weaken the definition of a test purpose compared with the definition in Sect. 3, allowing it to use internal variables.

Contrary to the previously defined test purposes, the test purposes in model-based mutation testing lead to negative counterexamples, that is, counterexamples steering towards an incorrect state. However, as defined in Sect. 3, we only extract the input vector \(\sigma _I\), which is then combined with the correct requirement interface to form a positive test case.

Often, different mutations of a contract generate different negative counterexamples, but these then combine into the same positive test case. However, if different mutations require different inputs to be enabled, they also produce different positive test cases.

Example 8

Consider the requirement interface \(A_{beh }\) for the behavioral view of the 2-bounded buffer. Let \(c_{2,m} : \lnot \mathsf{enq'} \wedge \mathsf{deq'} \wedge k > 0 \;\vdash \; k' = k-2 \) be a mutant of \(c_2\), where \(k' = k-1\) was mutated to \(k' = k-2\). The test purpose to detect this mutation is

$$\begin{aligned} \varPi = \varphi _{c_2} \wedge \psi _{c_{2,m}} \wedge \lnot \psi _{c_2} \wedge \bigwedge _{i \in \{0,1,3,4,5\}} (\varphi _{c_i} \rightarrow \psi _{c_i}). \end{aligned}$$

The test purpose is not valid in the initial state, as the assumption requires k to be greater than 0. Thus, a corresponding test case needs to execute at least one enqueue operation, before the mutated dequeue functionality can occur. The shortest vector \({\bar{\sigma }}\) leading to the test purpose is

$$\begin{aligned} \begin{array}{lcl} & & (\textsf {enq}, \textsf {deq}, k=0, \textsf {E}, \lnot \textsf {F})\\ {\bar{\sigma }} &=& (\textsf {enq}, \lnot \textsf {deq}, k=1, \lnot \textsf {E}, \lnot \textsf {F})\\ & & (\lnot \textsf {enq}, \textsf {deq}, k=-1, \lnot \textsf {E}, \lnot \textsf {F}). \end{array} \end{aligned}$$

We extract the input vector \(\sigma _{I}\), so that

$$\begin{aligned} \begin{array}{lcl} & & (\textsf {enq}, \textsf {deq})\\ \sigma _{I} &=& (\textsf {enq}, \lnot \textsf {deq})\\ & & (\lnot \textsf {enq}, \textsf {deq}) \end{array} \end{aligned}$$

Now we can build the positive test case by applying the correct step relation, obtaining

$$\begin{aligned} \begin{array}{lcl} & & (\textsf {enq}, \textsf {deq}, k=0, \textsf {E}, \lnot \textsf {F})\\ \sigma _{I} &=& (\textsf {enq}, \lnot \textsf {deq}, k=1, \lnot \textsf {E}, \lnot \textsf {F})\\ & & (\lnot \textsf {enq}, \textsf {deq}, k=0, \textsf {E}, \lnot \textsf {F}). \end{array} \end{aligned}$$

As a second mutant, consider \(c_{3,m} : \mathbf{true } \;\vdash \; k' = 0 \Leftrightarrow \lnot \mathsf{E'}\), where \(E'\) is mutated to \(\lnot E'\). In this case, the test purpose is not reachable, as the initial contract \(c_0\) requires both \(k'=0\) and \(E'\), which causes an inconsistency with the mutated contract. This makes \(c_{3,m}\) an unproductive mutation, as it does not generate a test case. However, consider another mutant of \(c_3\), \(c_{3,m'} : \mathbf{true } \;\vdash \; k' = 0 \implies \mathsf{E'}\), that changes the bi-implication to an implication. In this case, the mutated contract can be enabled after an enqueue in the first step, when \(k' = 1\): the left-hand side of the implication is then false, allowing \(E'\) to take any value. Thus, any vector of length two that starts with an enqueue operation, e.g., \({\bar{\sigma }}[0] = (\textsf {enq}, \textsf {deq}, k=0, E, \lnot F)\), \({\bar{\sigma }}[1] = (\textsf {enq}, \lnot \textsf {deq}, k=1, E, \lnot F)\), detects the mutation. This example shows that, for different mutations of the same contract, the test generation results in different outcomes.

5 Implementation and experimental results

In this section, we present a prototype that implements our test-case generation framework introduced in Sects. 3 and 4. The prototype was added to the model-based testing tool family MoMuT and goes by the name MoMuT::REQs. The implementation uses the programming language Scala 2.10 and Microsoft's SMT solver Z3 [40]. The tool implements both the monolithic and the incremental approach to test-case generation. All experiments were run on a MacBook Pro with a 2.53 GHz Intel Core 2 Duo processor and 4 GB RAM. We demonstrate the tool on the buffer example and two industrial case studies.

5.1 Demonstrating example

To experiment with our algorithms, we model three variants of the buffer behavioral interface. All three variants model buffers of size 150, with different internal structures. Buffer 1 models a simple buffer with a single counter variable k. Buffer 2 models a buffer that is composed of two internal buffers of size 75 each, and Buffer 3 models a buffer that is composed of three internal buffers of size 50 each. We also remodel a variant of the power-consumption interface that creates a dependency between the power used and the state of the internal buffers (idle/used).

All versions of the behavioral interface can be combined with the power-consumption viewpoint, either using the incremental approach or building the conjunction first and generating test cases from the monolithic specification.

Incremental consistency checking To evaluate the consistency check, we introduce three faults into the behavioral and power-consumption models of the buffer: Fault 1 makes deq decrease k when the buffer is empty; Fault 2 mutates an assumption, resulting in conflicting requirements for power consumption upon enq; and Fault 3 makes enq increase k when the buffer is full. The fault injection results in nine single-faulty variants of the interfaces.

We compare monolithic consistency checking to the consistency checking of individual views. We first note that the consistency check is coupled with the algorithm for finding minimal inconsistent sets of contracts. We set the range of the integer values to \([-2:152]\) and bound the search depth to 3. In the monolithic approach, we first conjoin all view models and then check for consistency. In the incremental approach, we first check the consistency of the individual views, and then, if no inconsistency is found, conjoin them one by one, checking the consistency of the partial conjunctions. However, in the current example, as we knew which view was faulty, we always started with the faulty view. As the inconsistency was detectable by examining only one view, we did not need to conjoin the second view. Table 1 summarizes and compares the time it takes to find an inconsistency and to compute the minimal inconsistent set of requirements in the requirement interface of a single view and in the monolithic interface that is the conjunction of both views. It clearly illustrates how separating the different views decreases the complexity, and thus the runtime, of the consistency checks. For example, for Fault 2 in the second buffer, it reduces the runtime of the consistency check from 26 s to 1 s. Fault 3 is omitted in the table, as neither approach was able to find an inconsistency. This is caused by the fact that the fault lies too deep in the system and cannot be detected with the given search depth.

Table 1 Runtime in seconds for checking consistency of single and conjuncted interfaces

Bounded consistency checking is very sensitive to the search depth. Setting the bound to 5 increases the runtime from seconds to minutes—this is not surprising, since a search of depth n involves simplifying formulas with alternating quantifiers of depth n, which is a very hard problem for SMT solvers.

Test-case generation We compare the monolithic and the incremental approach to test-case generation: monolithically, by generating tests for the conjunction of the buffer interfaces and the power-consumption interface, and incrementally, by generating tests only for the buffer interfaces and completing them with the power-consumption interface. The tests were generated according to manually defined test purposes that require the buffer to be full. Thus, the corresponding test cases need to perform 150 enqueue operations and are of length 150. Table 2 summarizes the results, presenting the number of contracts and variables of the requirement interfaces, the runtime of the incremental test-case generation, and the runtime of the monolithic approach. For the incremental approach, the runtime includes the test-case generation using only the behavioral view and the completion of the test case according to the power consumption. The three examples diverge in complexity, expressed in the number of contracts and variables. Our results show that the incremental approach outperforms the monolithic one, resulting in speed-ups from 1.33 to 1.68.

Table 2 Runtime in seconds for incremental and monolithic test-case generation

Model-based mutation testing We applied the model-based mutation testing technique to all three variants of the buffer. For these experiments, we did not consider the power consumption, which could be added incrementally after generating the tests. We used all mutation operators defined in Sect. 4. Table 3 shows the results of the approach, giving the number of mutants, the number of k-equivalent mutants, the number of unique test cases that were produced, and the total time for applying the complete approach to all mutants. The bound k for the equivalence check was set to 150. The reported times include mutation, generation of the corresponding test purposes, test-case generation, conversion into positive test cases, and detection of the unique test cases. Buffer 2 and Buffer 3 are more complex and create more mutants, and thus have a longer runtime. Yet, they also generate more unique tests, and thus a more thorough test suite.

Table 3 Results for model-based mutation testing on depth 150

5.2 Safing engine

As a first industrial application, we present an automotive use case supplied by the European ARTEMIS project MBAT, which partially motivated our work on requirement interfaces. The use case was initiated by our industrial partner Infineon and revolves around building a formal model for analysis and test-case generation for the safing engine of an airbag chip. The requirements document, developed by a customer of Infineon, is written in natural (English) language. We identified 39 requirements that represent the core of the system's functionality and iteratively formalized them in collaboration with the designers at Infineon. The resulting formal requirement interface is deterministic and consists of 36 contracts.

The formalization process revealed several under-specifications in the informal requirements that caused some ambiguities. These ambiguities were resolved in collaboration with the designers. The consistency check revealed two inconsistencies between the requirements. Tracing the conflicts back to the informal requirements allowed them to be fixed in the customer requirements document.

We generated 21 test cases from the formalized requirements, designed to ensure that every boolean internal and output variable is activated at least once and that every possible state of the underlying finite-state machine is reached at least once. Thus, the test suite provides state and signal coverage. The average length of the test cases was 3.4 and the maximal length was 6; however, since the test cases are synchronous, each step can trigger several inputs and outputs at once. The test cases were used to test the SIMULINK model of the system, developed by Infineon as part of their design process. The SIMULINK model of the safing engine consists of a state machine with seven states, ten smaller blocks transforming the input signals, and a MATLAB function calculating the final outputs according to the current state and the input signals. To execute the test cases, Infineon's engineers developed a test adapter that transforms abstract input values from the test cases into actual inputs passed to the SIMULINK model. We illustrate a part of the use case with three customer requirements that give the flavor of the underlying system's functionality:

\(\textsf {r}_1\)::

There shall be seven operating states for the safing engine: RESET state, INITIAL state, DIAGNOSTIC state, TEST state, NORMAL state, SAFE state, and DESTRUCTION state.

\(\textsf {r}_2\)::

The safing engine shall change per default from RESET state to INIT state.

\(\textsf {r}_3\)::

On a reset signal, the safing engine shall enter RESET state and stay, while the reset signal is active.

These three informal requirements were formalized as contracts, with a one-to-one relationship between the example requirements and the contracts.

This case study extends an earlier one [4] with test-case execution and a detailed mutation analysis evaluating the quality of the generated test cases. We created 66 faulty SIMULINK models (six turned out to be equivalent) by flipping every boolean signal (also internal ones) involved in the MATLAB function calculating the final output signals. Our 21 test cases were able to detect 31 of the 60 non-equivalent faulty models, giving a mutation score of \(51.6\%\). These numbers show that state and signal coverage is not enough to find all faults and confirm the need for a more sophisticated test-case generation methodology. Therefore, we manually added ten test purposes, generating 10 additional test cases. The combined 31 test cases finally reached a \(100\%\) mutation score, meaning that all injected faults in the SIMULINK models were detected.

Model-based mutation testing In addition, we applied two iterations of the model-based mutation testing approach, setting the bound k to 6. In the first iteration, we generated 362 mutants, applying all mutation operators. We generated 165 negative tests—197 mutants were k-equivalent. From the 165 negative tests, we extracted 28 unique positive test cases.

The mutation score achieved by these 28 test cases on the 60 faulty SIMULINK models was surprisingly low, with only \(49.2\%\). A closer investigation of the requirement interface shows that many of the contracts work globally, without being bound to a specific state of the state machine. For mutants from these contracts, our approach only generates one test case, even though the mutants generate multiple faults, in several different states. Due to the decomposed structure, even though we only insert one fault, our mutants are not classic first-order mutants anymore.

There are two ways to deal with this problem. The first would be the generation of multiple test cases per mutant, covering all possible faulty states. We already applied this technique previously, in a different context [3]. However, this technique can become very expensive, and is impossible for systems with an infinite state space.

The second approach is based on refactoring the contracts, splitting global contracts into multiple, more fine-grained ones. For example, contract \(c_3\) could be refactored into several state-bound contracts, each restricted to a specific state of the state machine.
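The idea of the refactoring can be sketched with contracts represented as assumption/guarantee predicate pairs; the representation and the variable names below are purely illustrative and do not reproduce the actual contract \(c_3\):

```python
# Illustrative only: a contract as an assumption/guarantee pair of
# predicates over a valuation of the interface variables.
def contract(assume, guarantee):
    return {"assume": assume, "guarantee": guarantee}

# A global contract: whenever the trigger holds, the output must hold,
# regardless of the current state of the state machine.
global_c = contract(lambda v: v["trigger"],
                    lambda v: v["out"])

# State-bound refactoring: one copy per state in which the contract
# applies, with the state condition added to the assumption.
states = ["s1", "s2", "s3"]
state_bound = [
    contract(lambda v, s=s: v["state"] == s and v["trigger"],
             global_c["guarantee"])
    for s in states
]

# A mutant of one state-bound copy now faults in exactly one state,
# so each generated test targets a single faulty state.
```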

Applying this refactoring, we obtained 17 new contracts. The second run of our test-case generation produced 525 mutants, of which 293 were detected as non-equivalent. This led to 61 unique test cases, which were able to detect 53 of the 60 faulty SIMULINK models, resulting in a mutation score of \(88\%\).

This shows that the quality of model-based mutation testing for requirement interfaces strongly depends on the modeling style. While the fine-grained contracts might slightly decrease the clarity of the requirement interfaces, they, in turn, increase traceability and facilitate fault detection.

5.3 Automated speed limiter

The second industrial case study is a use case provided by the industrial partner Volvo in the ARTEMIS project CRYSTAL. It revolves around an automated speed limiter (ASL), which adapts the current speed according to a desired speed limit. The ASL contains an internal state machine with three states: OFF, LIMITING, and OVERRIDDEN. Upon activation, it takes either the current speed or a predefined value as the limit. The limit can then be increased and decreased manually, and a kickdown of the gas pedal overrides the speed limiter for some time threshold. Adjusting the speed, or setting it to the predefined value, ends the overridden mode. Finally, the speed limiter can be turned off again, both from the overridden and the limiting mode.

The part of the ASL analysed within the project was documented by 17 informal requirements. These were refined into 26 formal requirements, collected in one requirement interface. The interface contains two input variables, two output variables, and four internal variables. Three characteristic contracts illustrate the functionality of the speed limiter, where set and state are output variables, in and kickdown are input variables, preset_value and timer are internal variables, and preset and plus are enum values. Contract \(c_1\) switches the ASL on, assigning the preset value as the current limit. \(c_2\) adjusts the current limit, increasing it by one. Finally, \(c_3\) activates the overridden mode in case of a gas pedal kickdown; it also resets a clock variable for the automated timeout that leads back to the limiting mode.
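The behaviour described by these three contracts can be reconstructed as a small step function. The sketch below is a Python illustration only; the preset value of 50 and the function shape are our assumptions, and the actual contracts are written in the requirement-interface syntax:

```python
PRESET_VALUE = 50  # hypothetical value of the internal preset_value

def step(state, limit, timer, inp, kickdown):
    """One synchronous step of the ASL, covering only c1-c3."""
    if state == "OFF" and inp == "preset":       # c1: switch on
        return "LIMITING", PRESET_VALUE, timer
    if state == "LIMITING" and inp == "plus":    # c2: raise limit
        return "LIMITING", limit + 1, timer
    if state == "LIMITING" and kickdown:         # c3: override
        return "OVERRIDDEN", limit, 0            # reset timeout clock
    return state, limit, timer                   # otherwise unchanged
```

For example, `step("OFF", 0, 3, "preset", False)` yields `("LIMITING", 50, 3)`, i.e., the ASL is switched on with the preset value as limit.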

Applying the mutation-based test generation to this case study generates 291 mutants, using all mutation operators introduced in Sect. 4 and setting the bound k to 4. Fifty-seven of the mutants are equivalent, leaving a total of 234 non-equivalent mutants. Ninety-six of these mutants can be detected within one step, sixty-four after two steps, and seventy-two after the third step. This clearly reflects the state-based structure of the model, which consists of three states. An analysis of the test cases shows that 60 of the tests are unique. Given that the model is deterministic, each of the unique tests enables different contracts in the individual steps. A further analysis of these 60 unique test cases shows that 12 are of length one, 18 are of length two, and 30 are of length three.

To evaluate the quality of the test cases, we implemented a Java version of the ASL, and used the Major mutation framework [37] to generate a set of 64 faulty implementations, using all mutation operators supported by Major. By executing our generated tests on these faulty implementations, we could perform a classic mutation analysis: our test suite was able to detect 48 of the faults. Further investigation of the undetected faults revealed that 13 of the remaining Java mutants were equivalent, and could thus not be detected by any test case. Another two of the faults were introduced in the conditions of if statements. The conditions correspond to the assumptions of our interfaces, which we did not mutate during the test-case generation.

The last remaining fault was introduced in the timing behavior of the Java implementation, which was simulated via a tick method indicating the passage of 1 s. In the requirement interface, the timing was modeled via a non-deterministically increasing variable. The fault caused the implementation to trigger the state change already after 9 s instead of 10 s. The test driver was not sensitive enough to detect this, as the test case only specified the behavior after 10 s and did not specify what the correct behavior after 9 s would be.
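The following sketch, a hypothetical reconstruction rather than the actual Java code, illustrates why the fault escaped: a test that observes the state only after the tenth tick cannot distinguish a 9 s timeout from a 10 s one:

```python
def make_override_timer(timeout):
    """Overridden mode that falls back to LIMITING after `timeout` ticks
    (each tick models the passage of 1 s)."""
    clock = {"t": 0}
    def tick():
        clock["t"] += 1
        return "LIMITING" if clock["t"] >= timeout else "OVERRIDDEN"
    return tick

correct, faulty = make_override_timer(10), make_override_timer(9)

# Observing only the 10th tick: both implementations agree.
for _ in range(9):
    correct(), faulty()
print(correct(), faulty())  # LIMITING LIMITING -> fault undetected

# Only an observation at the 9th tick would reveal the difference.
```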

6 Related work

Synchronous languages were introduced in the 1980s, and their development was mostly driven in France, where the three most well-known synchronous languages originated: Lustre [22], Signal [27], and Esterel [17]. In 1991, the IEEE devoted a special issue to synchronous systems, featuring, e.g., a paper by Benveniste and Berry [11] discussing the major issues and approaches of synchronous specifications of real-time systems. A decade later, Benveniste et al. [14] gave an overview of the development of synchronous languages during that decade, especially noting the rising tool support and industrial acceptance. They also mention globally asynchronous, locally synchronous systems [47] as an upcoming trend.

The main inspiration for this work was the introduction of the conjunction operation and the investigation of its properties [25] in the context of synchronous interface theories [23]. While the mathematical properties of the conjunction in different interface theories were further studied in [12, 30, 43], we are not aware of any similar work related to MBT.

Synchronous data-flow modeling [15] has been an active area of research. The most important synchronous data-flow programming languages are Lustre [22] and Signal [27]. These are implementation languages, while requirement interfaces enable specifying high-level properties of such programs. Testing of Lustre-like programs was studied by Raymond et al. [42] and Papailiopoulou [41]. The specification language SCADE [16] supports a graphical representation of synchronous systems. Internally, SCADE models are stored in a textual representation very similar to Lustre. Wakankar et al. [49] presented experimental results for test-case generation, in which they manually translated SCADE models to SAL-ATG models and used SAL-ATG for the test-case generation.

The tool STIMULUS [34] allows the synchronous specification and debugging of real-time systems via predefined sentence templates, which can be simulated using a constraint solver. As in our approach, this enables tight traceability between the natural-language requirements, the formalized requirements, and the outcome of the verification tasks. STIMULUS combines a user-friendly specification language with formal verification via BDDs, and was evaluated on an automotive case study.

Compositional properties of specifications in the context of testing have been studied before [7, 18, 24, 38, 44]. None of these works consider synchronous data-flow specifications, and the compositional properties are investigated with respect to the parallel composition and hiding operations, but not conjunction. A different notion of conjunction is introduced for test-case generation with SAL [28]: the authors encode test purposes as trap variables and conjoin them to drive the test-case generation process towards reaching all test purposes with a single test case. Consistency checking of contracts has been studied in [26], albeit for a weaker notion of consistency.

Our specifications using constraints share similarities with the Z specification language [45] that also follows a multiple-viewpoint approach to structuring a specification into pieces called schemas. However, a Z schema defines the dynamics of a system in terms of operations. In contrast, our requirement interfaces follow the style of synchronous languages.

Brillout et al. [20] performed mutation-based test-case generation on SIMULINK models. They implemented the approach in the tool COVER, which is based on the model checker CBMC. He et al. [29] exploit similarity measures on mutants of SIMULINK models to decrease the cost of mutation-based test-case generation. Their experiments show the advantages of model-based mutation testing over random testing and over simpler mutation-based testing approaches.

Several tools exist for test-case generation for synchronous systems. The tool Lutess [19] is based on Lustre. It takes a specification of the environment (written in Lustre), a test-sequence generator, and an oracle, and performs online testing on the system under test, with traces selected by the generator according to several different modes. Another tool based on Lustre is Lurette [42]. Lurette only performs random testing, but is able to validate systems with numerical inputs and outputs. A third Lustre-based testing tool is GATeL [39]. It generates tests according to test purposes, using constraint logic programming to search for suitable traces.

The tool Autofocus [32] facilitates test-case generation from time-synchronous communicating extended finite-state machines that form a distributed system. It is based on constraint logic programming and supports functional, structural, and stochastic test specifications.

The application of the test-case generation and consistency-checking tool for requirement interfaces and its integration into a set of software engineering tools was presented in [4]. That work focuses on the requirement-driven testing methodology, workflow, and tool integration, and gives no technical details about requirement interfaces. In contrast, this paper provides a sound mathematical theory for requirement interfaces and their associated incremental test-case generation, consistency checking, and tracing procedures.

Model-based mutation testing was initially used for predicate-calculus specifications [21] and later applied to formal Z specifications [46]. Ammann et al. [8] used temporal formulae to check equivalence between models and mutants, and converted counterexamples to test cases in case of non-equivalence. Belli et al. [9, 10] applied model-based mutation testing to event-sequence graphs and pushdown automata. Hierons and Merayo [31] applied mutation-based test-case generation to probabilistic finite-state machines; their work presents mutation operators and describes how to create input sequences that kill a given mutated state machine.

Model-based mutation testing has already been applied to UML models [1, 3], action systems [2], and timed automata [6].

7 Conclusions and future work

We presented a framework for requirement-driven modeling and testing of complex systems that naturally enables multiple-view incremental modeling of synchronous data-flow systems. The formalism supports conformance testing of complex systems against their requirements and the combination of partial models via conjunction.

We also adapted the model-based mutation testing technique to requirement interfaces, and evaluated its applicability for two industrial case studies.

Our requirement-driven framework opens many future directions. We will extend our procedure to allow the generation of adaptive test cases. In the context of model-based mutation testing, we will investigate strong mutation testing and the mutation of assumptions. We will also investigate other compositional operations in the context of testing synchronous systems, such as parallel composition and quotient. We intend to add timed semantics to requirement interfaces, for a more thorough timing analysis. Finally, we will consider additional coverage criteria and test purposes, and use our implementation to generate test cases for safety-critical domains, including automotive, avionics, and railway applications.