1 Introduction

The holy grail of model-based testing is a complete test suite: a test suite that can detect any possible faulty implementation. This is impossible for black-box testing, since a tester can only make a finite number of observations, but for an implementation of unknown size, it is unclear when to stop. Often, a so-called n-complete test suite is used to tackle this problem, meaning it is complete for all implementations with at most n states.

A celebrated result for deterministic finite state machines (FSMs, or Mealy machines) is the existence of efficient n-complete test suites (Chow 1978). Nowadays, many variations exist (Dorofeeva et al. 2010), all of which share the basic structure. The test suites usually provide some way to reach all states and transitions of the implementation. After reaching some implementation state, state identification is used to test whether it is equivalent to the intended specification state: the intended state is distinguished from inequivalent specification states.

We will explore how an n-complete test suite can be constructed for more general labeled transition systems instead. We use the ioco relation (Tretmans 2008) as a conformance relation. Unlike FSM equivalence, ioco is not an equivalence relation, meaning that many inequivalent implementations may conform to the same specification and, conversely, an implementation may conform to several inequivalent specifications. Specification states which can be implemented with a single state are called compatible. Standard distinguishing techniques cannot be applied to compatible states. We investigate and characterize the notion of compatibility, and we introduce an alternative to the usual way of distinguishing compatible states. Using these insights, we give a construction for an n-complete test suite and prove it to be correct.

We already addressed this problem in van den Bos et al. (2017). This paper improves on it in the following ways:

  • We give a more detailed discussion on equivalence and compatibility of states, and we discuss the construction of a merge of two states explicitly. In particular, we now prove that states are compatible if and only if their merge is valid.

  • The algorithm in van den Bos et al. (2017) to compute distinguishing trees assumes incompatible states, but no means of deciding compatibility of states is given. In this paper, we instead give an algorithm for computing the compatibility relation, while simultaneously computing witnesses (i.e., distinguishing graphs) if states are incompatible.

  • Instead of distinguishing trees, we use distinguishing graphs by reusing nodes. This gives a more efficient algorithm, as the graphs are polynomial in size.

  • For sets of distinguishing graphs, we define the notions of characterization sets and (harmonized set of) state identifiers. This makes the relation to FSM theory more explicit.

  • We add examples to highlight properties of compatibility, and the construction and execution of test suites.

This paper is structured as follows. In Section 2, we introduce the domain of specifications and implementations, as well as the ioco relation. Furthermore, we give a short overview of existing theory on n-complete test suites for FSMs. We formalize the notions of equivalence of states in Section 3 and of compatibility in Section 4. In Section 5, we show how to compute distinguishing graphs for incompatible states. The construction of n-complete test suites is then described in Section 6, together with a correctness proof. We conclude in Section 7.

Related work

Testing methods for FSMs have been analyzed thoroughly, including n-complete test suites and various ways of distinguishing states. A survey is given by Dorofeeva et al. (2010). Progress has been made on generalizing these testing methods to nondeterministic FSMs. Petrenko and Yevtushenko (2005, 2011) use the reduction relation for testing nondeterministic FSMs, which resembles ioco more closely than equivalence.

Complete testing received less attention within ioco theory. With the original test generation method (Tretmans 2008), test cases are generated randomly. This method is described as complete, but only in the sense that any fault can eventually be found: there is no upper bound to the required number and length of test cases.

Paiva and Simao (2016) construct complete test suites for Mealy-IOTSes. Mealy-IOTSes are a subclass of labeled transition systems, but are similar to Mealy machines, as (sequences of) outputs are coupled to inputs. This makes the translation from FSM testing more straightforward.

The work most similar to ours is that of Simao and Petrenko (2014), which addresses deterministic labeled transition systems. Some further restrictions are made on the specification domain. In particular, every specification state should be certainly reachable, i.e., all conforming implementations must implement that state. Furthermore, all states should be mutually incompatible, such that an implementation state cannot possibly conform to multiple specification states. In this sense, our test suite construction can be applied to a broader set of systems, potentially at the cost of efficiency. Thus, we explore the bounds of n-complete test suites for ioco in an unrestricted setting, whereas Simao and Petrenko (2014) aim at efficient test suites in a restricted setting.

2 Preliminaries

To model implementations and specifications, we use a particular domain of labeled transition systems, namely suspension automata. We essentially regard them as deterministic automata, for which the transitions are labeled with an input or output.

For the remainder of this paper, we fix \(L_{I}\) and \(L_{O}\) as disjoint finite sets of input and output labels respectively, with \(L = L_{I} \cup L_{O}\). Furthermore, we use a, b as input labels and w, x, y, z as output labels. We use μ as a label that can be either input or output. The set \(L^{*}\) denotes the set of sequences of labels in L. For a partial function \(f : X \rightharpoonup Y\), let \(f(x){\downarrow}\) and \(f(x){\uparrow}\) mean that f(x) is defined and undefined respectively.

Definition 1

An automaton with inputs and outputs is a tuple (Q, T, q0) where

  • Q is a finite set of states,

  • \(T : Q \times L \rightharpoonup Q\) is the (partial) transition function, and

  • \(q_{0} \in Q\) is the initial state.

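We interchangeably use T as partial function and as the set of transitions \(T = \{(q, \mu, T(q, \mu)) \mid T(q, \mu){\downarrow}\}\). For \(q \in Q\), we denote the set of enabled inputs and outputs in q by \(\textsf{in}(q) = \{a \in L_{I} \mid T(q, a){\downarrow}\}\) and \(\textsf{out}(q) = \{x \in L_{O} \mid T(q, x){\downarrow}\}\) respectively. An automaton \((Q, T, q_{0})\) is input-enabled if \(\forall q \in Q : \textsf{in}(q) = L_{I}\), and non-blocking if \(\forall q \in Q : \textsf{out}(q) \neq \emptyset\).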

The set of all automata with inputs and outputs is denoted by \(\mathcal {A}\mathcal {I}\mathcal {O}\). With \(\mathcal {S\!A}\), we denote the set of suspension automata, which are non-blocking automata with inputs and outputs. \(\mathcal {S\!A}_{\textit {IE}}\) denotes the set of input-enabled suspension automata.

We will use \(\mathcal {S\!A}\) as the domain of specifications, and \(\mathcal {S\!A}_{\textit {IE}}\) as the domain of implementations. Both thus have an output transition in every state, and implementations have a transition for every input (see Fig. 1 for an example specification and two implementations). We will encounter automata in \(\mathcal {A}\mathcal {I}\mathcal {O}\) only as an intermediate product of an operation introduced in Section 4.
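To make Definition 1 and the two domains concrete, the following is a minimal Python sketch. The dictionary encoding of the partial function T, the helper names, and the two-state example are our own illustrative assumptions; the example is not taken from Fig. 1.

```python
def enabled(T, q, labels):
    """Labels from `labels` for which T(q, label) is defined."""
    return {m for m in labels if (q, m) in T}

def is_input_enabled(Q, T, LI):
    """in(q) = L_I for every state q."""
    return all(enabled(T, q, LI) == set(LI) for q in Q)

def is_non_blocking(Q, T, LO):
    """out(q) is non-empty for every state q."""
    return all(enabled(T, q, LO) for q in Q)

# Hypothetical example: T maps (state, label) to the destination state.
LI, LO = {"a"}, {"x", "y"}
Q = {0, 1}
T = {(0, "a"): 1, (0, "x"): 0, (1, "y"): 1}

# Non-blocking: usable as a specification (a suspension automaton).
is_suspension_automaton = is_non_blocking(Q, T, LO)
# State 1 has no transition for input a, so this is not an implementation.
is_implementation = is_suspension_automaton and is_input_enabled(Q, T, LI)
```

Storing T as a dictionary keeps the automaton deterministic by construction, since each pair of state and label has at most one destination.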

Fig. 1 A specification with a conforming and a non-conforming implementation. An edge labeled with a, x indicates two independent transitions leaving a state. (a) Specification S. (b) Conforming implementation. (c) Non-conforming implementation

At this point, we remark that \(\mathcal {S\!A}\) and \(\mathcal {S\!A}_{\textit {IE}}\) are different from the usual implementation and specification domains for ioco: the original theory considers nondeterministic labeled transition systems with inputs, outputs, internal transitions, and the artificial output quiescence, i.e., observation of the absence of explicit outputs. Quiescence ensures that every labeled transition system in ioco theory is non-blocking. By determinizing these non-blocking labeled transition systems, any labeled transition system may equivalently be expressed as a suspension automaton (Tretmans 2008). For suspension automata, we will consider quiescent transitions to be output transitions like any other. By using suspension automata, we thus do not need to concern ourselves with nondeterminism and internal transitions, as suspension automata describe the observable behavior.

Readers familiar with suspension automata may remark that they usually adhere to particular restrictions. For example, quiescence should not be followed by any output (other than quiescence itself) and it should not cause any actual transition in the underlying nondeterministic labeled transition system. We refer to Willemse (2006) for a more elaborate description of these restrictions. We will not pose such restrictions in order to simplify reasoning, and our domains are thus a generalization of the usual domains for ioco theory. This implies soundness of our test suites: if any faulty implementation in our general domain can be detected, then we can certainly detect all faults in a more restricted implementation domain. When reusing results from other works in which this difference is relevant, we will clarify the translation to our domain.

Throughout the paper, we will use the following notation (Definition 2), where 𝜖 denotes the empty sequence. With after, we lift the transition relation to sets of states and sequences of labels. With traces, we denote the set of all traces of a set of states. We also lift in and out to sets, and use init to obtain the labels of all enabled transitions. We sometimes interchange a singleton set with its element, e.g., we write out(q) instead of out({q}). Following Simao and Petrenko (2014), we write S/q to refer to the suspension automaton starting in state q of specification S.

Definition 2

Let \(S = (Q,T,q_{0}) \in \mathcal {A}\mathcal {I}\mathcal {O}\), \(q \in Q\), \(B \subseteq Q\), \(\mu \in L\) and \(\sigma \in L^{*}\). Then, we define

$$\begin{array}{ll} q \textsf{ after } \epsilon = \{q\} & \textsf{out}(B) = \bigcup\limits_{q^{\prime} \in B} \textsf{out}(q^{\prime})\\ q \textsf{ after } \mu\sigma = \left\{\begin{array}{ll} T(q, \mu) \textsf{ after } \sigma & \text{if } T(q, \mu){\downarrow}\\ \emptyset & \text{otherwise} \end{array}\right. & \textsf{in}(B) = \bigcup\limits_{q^{\prime} \in B} \textsf{in}(q^{\prime})\\ B \textsf{ after } \sigma = \bigcup\limits_{q^{\prime} \in B} q^{\prime} \textsf{ after } \sigma & \textsf{init}(B) = \textsf{in}(B) \cup \textsf{out}(B)\\ S \textsf{ after } \sigma = q_{0} \textsf{ after } \sigma & \textsf{traces}(B) = \{\sigma^{\prime} \in L^{*} \mid B \textsf{ after } \sigma^{\prime} \neq \emptyset\}\\ S/q = (Q,T,q) & \textsf{traces}(S) = \textsf{traces}(\{q_{0}\}) \end{array} $$
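The notation of Definition 2 can be sketched in Python as follows. The dictionary encoding of T, the function names, and the small example automaton are illustrative assumptions of ours.

```python
def after(T, B, sigma):
    """B after sigma: follow sigma from every state in B.

    States in which the next label is undefined are dropped, so the
    result is empty exactly when sigma is not a trace of B.
    """
    current = set(B)
    for mu in sigma:
        current = {T[(q, mu)] for q in current if (q, mu) in T}
    return current

def out(T, B, LO):
    """out(B): union of the enabled outputs of all states in B."""
    return {x for q in B for x in LO if (q, x) in T}

def is_trace(T, B, sigma):
    """sigma is in traces(B) iff B after sigma is non-empty."""
    return bool(after(T, B, sigma))

# Hypothetical automaton: T maps (state, label) to the destination state.
LO = {"x", "y"}
T = {(0, "a"): 1, (0, "x"): 0, (1, "y"): 1}
```

For instance, `after(T, {0}, ("x", "a"))` yields `{1}`, while `after(T, {0}, ("y",))` is empty because y is not enabled in state 0.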

The ioco relation formalizes when implementations conform to specifications. We give a definition relating traces, following (Tretmans 1996; Willemse 2006), and a coinductive definition relating states. This last definition can be seen as an alternating simulation. Several papers (Aarts and Vaandrager 2010; Noroozi 2014; Veanes and Bjørner 2012) have related the original ioco definition to alternating simulation, and proven that the two coincide for deterministic systems. Note that our domain of suspension automata extends the usual domain, and as such, our definition of ioco is also an extension with respect to Noroozi (2014) and Tretmans (1996).

Definition 3

Let \(S \in \mathcal {S\!A}\) and \(I \in \mathcal {S\!A}_{\textit {IE}}\). Then, we say that I ioco S if for all \(\sigma \in \textsf{traces}(S)\) we have \(\textsf{out}(I \textsf{ after } \sigma) \subseteq \textsf{out}(S \textsf{ after } \sigma)\).

Definition 4

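Let \(S = (Q_{S},T_{S},{q_{0}^{S}}) \in \mathcal {S\!A}\) and \(I = (Q_{I},T_{I},{q_{0}^{I}}) \in \mathcal {S\!A}_{\textit {IE}}\). Then, for \(q_{I} \in Q_{I}\), \(q_{S} \in Q_{S}\), we say that \(q_{I} \mathrel{\textsf{ioco}} q_{S}\) if there exists a relation \(R \subseteq Q_{I} \times Q_{S}\) such that \((q_{I}, q_{S}) \in R\), and for all \((q, q^{\prime}) \in R\):

  • \(\forall a \in \textsf{in}(q^{\prime}) : (T_{I}(q, a), T_{S}(q^{\prime}, a)) \in R\), and

  • \(\forall x \in \textsf{out}(q) : x \in \textsf{out}(q^{\prime})\) and \((T_{I}(q, x), T_{S}(q^{\prime}, x)) \in R\).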

Any such relation R is called a coinductive ioco relation.

Proposition 5

Let \(S \in \mathcal {S\!A}\), \(I \in \mathcal {S\!A}_{\textit {IE}}\) and let \({q_{0}^{S}}\) and \({q_{0}^{I}}\) be their initial states. We have I ioco S if and only if \({q_{0}^{I}} \mathrel{\textsf{ioco}} {q_{0}^{S}}\).
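Definition 4 and Proposition 5 suggest a simple way to decide conformance for finite automata: explore the pairs reachable from the initial states and check the output condition in each pair. The sketch below is our own illustration, not a procedure given in this paper; it assumes a dictionary encoding of the transition functions and an input-enabled implementation, and the example automata are hypothetical.

```python
def ioco_relation(TI, qI0, TS, qS0, LI, LO):
    """Least coinductive ioco relation containing (qI0, qS0), or None.

    TI and TS map (state, label) to a state. The implementation TI is
    assumed to be input-enabled, so TI[(qi, a)] exists for every input a.
    """
    R, todo = set(), [(qI0, qS0)]
    while todo:
        qi, qs = todo.pop()
        if (qi, qs) in R:
            continue
        R.add((qi, qs))
        for x in LO:
            if (qi, x) in TI:
                if (qs, x) not in TS:
                    return None  # implementation output not allowed by the spec
                todo.append((TI[(qi, x)], TS[(qs, x)]))
        for a in LI:
            if (qs, a) in TS:  # only inputs specified in the spec state matter
                todo.append((TI[(qi, a)], TS[(qs, a)]))
    return R

# Hypothetical specification and two implementations.
LI, LO = {"a"}, {"x", "y"}
TS = {("s0", "a"): "s1", ("s0", "x"): "s0", ("s1", "x"): "s1", ("s1", "y"): "s1"}
good = {("i0", "a"): "i1", ("i0", "x"): "i0", ("i1", "a"): "i1", ("i1", "y"): "i1"}
bad = {("i0", "a"): "i1", ("i0", "y"): "i0", ("i1", "a"): "i1", ("i1", "y"): "i1"}

good_relation = ioco_relation(good, "i0", TS, "s0", LI, LO)
bad_relation = ioco_relation(bad, "i0", TS, "s0", LI, LO)
```

The exploration visits at most \(|Q_{I}| \cdot |Q_{S}|\) pairs, so it terminates on finite automata.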

The relation ioco is a preorder on input-enabled labeled transition systems (Tretmans 2008), and it is also a preorder on our extended domain \(\mathcal {S\!A}_{\textit {IE}}\). We introduce the notion of ioco counterexample as a witness for non-conformance, since this is sometimes convenient for reasoning about the ioco relation.

Definition 6

Let \(S \in \mathcal {S\!A}\), \(\sigma \in L^{*}\), and \(x \in L_{O}\). We call σx an ioco counterexample for S if \(\sigma \in \textsf{traces}(S)\) and \(x \notin \textsf{out}(S \textsf{ after } \sigma)\).

Lemma 7

Let \(S \in \mathcal {S\!A}\) be a specification and \(I \in \mathcal {S\!A}_{\textit {IE}}\) an implementation. Then, I ioco S if and only if traces(I) contains no ioco counterexample for S.

Proof

\(I \mathrel{\textsf{ioco}} S\)

$$\begin{array}{@{}rcl@{}} &&\iff \forall\sigma \in \textsf{traces}(S): \textsf{out}(I \textsf{ after } \sigma) \subseteq \textsf{out}(S \textsf{ after } \sigma) \text{\quad(Definition~3)}\\ &&\iff \forall\sigma \in \textsf{traces}(S): \forall x \in L_{O}: x \in \textsf{out}(I \textsf{ after } \sigma) \implies x \in \textsf{out}(S \textsf{ after } \sigma)\\ &&\iff \forall\sigma \in \textsf{traces}(S): \forall x \in L_{O}: x \notin \textsf{out}(S \textsf{ after } \sigma) \implies x \notin \textsf{out}(I \textsf{ after } \sigma)\\ &&\iff \forall\sigma \in \textsf{traces}(S): \forall x \in L_{O}:\\ &&\hspace{1cm}\sigma x \text{ is an ioco counterexample for } S \implies \sigma x \notin \textsf{traces}(I) \text{\quad(Definition~6)}\\ &&\iff \textsf{traces}(I) \text{ contains no ioco counterexample for } S \end{array} $$
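Lemma 7 also indicates how a witness for non-conformance might be searched for: a breadth-first traversal of the traces shared by implementation and specification finds a shortest ioco counterexample, if one exists. The following Python sketch is our own illustration under the assumption that transition functions are dictionaries and labels are strings; the example automata are hypothetical.

```python
from collections import deque

def ioco_counterexample(TI, qI0, TS, qS0, labels, LO):
    """Shortest trace sigma.x of the implementation that is an ioco
    counterexample for the specification, or None if there is none."""
    seen = {(qI0, qS0)}
    queue = deque([(qI0, qS0, ())])
    while queue:
        qi, qs, sigma = queue.popleft()
        # sigma is a trace of both; look for an output the spec forbids.
        for x in sorted(LO):
            if (qi, x) in TI and (qs, x) not in TS:
                return sigma + (x,)
        # Extend sigma with every label enabled in both automata.
        for mu in sorted(labels):
            if (qi, mu) in TI and (qs, mu) in TS:
                nxt = (TI[(qi, mu)], TS[(qs, mu)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt[0], nxt[1], sigma + (mu,)))
    return None

# Hypothetical specification and implementations.
LI, LO = {"a"}, {"x", "y", "z"}
TS = {("s0", "a"): "s1", ("s0", "x"): "s0", ("s1", "x"): "s1", ("s1", "y"): "s1"}
good = {("i0", "a"): "i1", ("i0", "x"): "i0", ("i1", "a"): "i1", ("i1", "y"): "i1"}
faulty = {("i0", "a"): "i1", ("i0", "x"): "i0", ("i1", "a"): "i1", ("i1", "z"): "i1"}

witness = ioco_counterexample(faulty, "i0", TS, "s0", LI | LO, LO)
no_witness = ioco_counterexample(good, "i0", TS, "s0", LI | LO, LO)
```

Here the faulty implementation produces z after input a, which the specification does not allow in state s1.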

Example 8

Figure 1 shows two implementations for the specification S in Fig. 1a. The first (Fig. 1b) is conforming and to see this we can define the relation R = {(1,1),(2,2),(2,3),(5,4),(5,5),(6,6)} and check that it is a coinductive ioco relation. In particular, observe that the state 2 is related to two different specification states. This will be important when we discuss compatible states. Ioco counterexample awzx shows that Fig. 1c does not conform to the specification. (The final x is not allowed by the specification.)

2.1 n-Complete test suites for FSMs

As this paper is founded on the ideas of existing theory on n-complete test suites for deterministic complete FSMs (Chow 1978), we give a short overview to ease comparison.

A finite state machine (FSM) is a state machine in which every transition has both an input and output label. A deterministic complete FSM contains precisely one transition for every input in every state. We only consider deterministic complete FSMs in this section.

One can provide a sequence of inputs to an FSM, on which it will produce a sequence of outputs following the transitions. Every state can thus be characterized as a function from input sequences to output sequences, which induces an equivalence on states. When both the specification and the implementation are FSMs, we take equivalence of initial states as implementation relation. An input sequence represents a test for this equivalence: the sequence is provided to the implementation, and the outputs are compared to the specification. An n-complete test suite is a set of tests which detect all faulty implementations having at most n states.

If m is the number of states of a specification FSM, then an m-complete test suite can be constructed as follows. We construct a set P containing access sequences to every specification state and a set W containing sequences which distinguish every pair of specification states. The set P is usually called the state-cover and W the characterization set. The set \(P \cdot L_{I}^{\le 1} \cdot W\) is then an m-complete test suite, with \(L_{I}^{\le 1}\) the set of input sequences of length 0 or 1. By executing every distinguishing sequence after every access sequence (\(P \cdot W\)), we ensure that the implementation shows at least |P| different behaviors, i.e., the implementation has at least as many states as the specification. Executing the access sequence with an additional input before the distinguishing sequence (\(P \cdot L_{I} \cdot W\)) ensures that after every transition, we observe the correct destination state in the implementation. By extending the set \(L_{I}^{\leq 1}\) to \(L_{I}^{\leq k + 1}\), one can construct (m + k)-complete test suites. Such a test suite then detects all faulty implementations with at most k more states than the specification. There exist various variants of distinguishing sequences from which more efficient (i.e., smaller) test suites can be constructed. An overview is given in Dorofeeva et al. (2010).
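The concatenation \(P \cdot L_{I}^{\le k+1} \cdot W\) can be sketched in a few lines of Python. The function name and the example sets P and W below are hypothetical; computing an actual state cover and characterization set for a given FSM is not shown.

```python
from itertools import product

def w_method_suite(P, LI, W, k=0):
    """All sequences p . m . w with p in P, w in W, and m an input
    sequence of length at most k + 1 (including the empty sequence)."""
    middles = [()]
    for length in range(1, k + 2):
        middles += list(product(sorted(LI), repeat=length))
    return {p + m + w for p in P for m in middles for w in W}

# Hypothetical state cover and characterization set for a two-state FSM.
P = [(), ("a",)]
W = [("a",)]
LI = {"a", "b"}

suite = w_method_suite(P, LI, W)          # P . L_I^{<=1} . W
bigger = w_method_suite(P, LI, W, k=1)    # P . L_I^{<=2} . W
```

Representing the suite as a set removes duplicates such as the sequence a·a, which arises both from the empty access sequence with middle input a and from access sequence a with an empty middle part.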

3 Equivalent states

If two specifications or two specification states have precisely the same implementations conforming to them, it is impossible but also unnecessary to distinguish them. We provide a characterization of this equivalence.

Definition 9

Two specifications \(S_{1}, S_{2} \in \mathcal {S\!A}\) are equivalent, denoted \(S_{1} \simeq S_{2}\), if \(\forall I \in \mathcal {S\!A}_{\textit {IE}}: I \mathrel {\textsf {ioco}} S_{1} \iff I \mathrel {\textsf {ioco}} S_{2}\).

This defines an equivalence relation. Algorithmically, it is useful to have a coinductive definition. However, a direct definition might be cumbersome as it has to relate explicit underspecification with implicit underspecification. The former is a specification which allows all outputs after an input transition while the latter is a specification which omits such an input transition altogether. One can make all underspecifications explicit with demonic completion (Tretmans 2008). This will lead to a simple coinductive definition of equivalence.

Definition 10

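Let \(S = (Q,T,q_{0}) \in \mathcal {S\!A}\), and let \(\chi \notin Q\). The demonic completion of S is defined as \(X(S) = (Q \cup \{\chi\}, T^{\prime}, q_{0})\) where \(T^{\prime} = T \cup \{(q, a, \chi) \mid q \in Q, a \in L_{I}, T(q, a){\uparrow}\} \cup \{(\chi, \mu, \chi) \mid \mu \in L\}\).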
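As an illustration of Definition 10, demonic completion can be sketched in Python as follows. The dictionary encoding, the default name `"chi"` for the fresh state, and the example automaton are assumptions of ours.

```python
def demonic_completion(Q, T, LI, LO, chi="chi"):
    """Return (Q_X, T_X): unspecified inputs lead to the chaos state chi,
    in which every label is enabled as a self-loop."""
    assert chi not in Q, "chi must be a fresh state"
    TX = dict(T)
    for q in Q:
        for a in LI:
            TX.setdefault((q, a), chi)   # only fills in undefined inputs
    for mu in set(LI) | set(LO):
        TX[(chi, mu)] = chi
    return set(Q) | {chi}, TX

# Hypothetical specification in which state 1 does not specify input a.
LI, LO = {"a"}, {"x", "y"}
Q = {0, 1}
T = {(0, "a"): 1, (0, "x"): 0, (1, "y"): 1}

QX, TX = demonic_completion(Q, T, LI, LO)
```

As in the definition, the state χ is always added, even when the specification is already input-enabled and χ is therefore unreachable.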

Using the demonic completion one can transform specifications to equivalent, input-enabled ones. The basic properties are listed in the next lemma. These properties are used on suspension automata by Beneš et al. (2015).

Lemma 11

For all \(S \in \mathcal {S\!A}\), we have that X(S) is input-enabled and \(S \simeq X(S)\). Moreover, we have X(S) ioco S.

With these properties, we can characterize equivalence as follows.

Lemma 12

Let \(S_{1},S_{2} \in \mathcal {S\!A}\). Then, we have

$$S_{1} \simeq S_{2} \iff X(S_{1})~ {\mathsf{ioco}} X(S_{2}) \wedge X(S_{2}) ~{\mathsf{ioco}} X(S_{1}).$$

Proof

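(⇒) Let \(S_{1} \simeq S_{2}\). From \(X(S_{1}) \mathrel{\textsf{ioco}} S_{1}\) (Lemma 11), it follows that \(X(S_{1}) \mathrel{\textsf{ioco}} S_{2}\) by equivalence. By using Lemma 11 again, we conclude that \(X(S_{1}) \mathrel{\textsf{ioco}} X(S_{2})\). Similarly, \(X(S_{2}) \mathrel{\textsf{ioco}} X(S_{1})\).

(⇐) Let \(I \in \mathcal {S\!A}_{\textit {IE}}\) and assume that \(I \mathrel{\textsf{ioco}} S_{1}\). We have to show that \(I \mathrel{\textsf{ioco}} S_{2}\). By Lemma 11 we have \(I \mathrel{\textsf{ioco}} X(S_{1})\), and by assumption, we have \(X(S_{1}) \mathrel{\textsf{ioco}} X(S_{2})\). Using the transitivity of ioco on \(\mathcal {S\!A}_{\textit {IE}}\), we get \(I \mathrel{\textsf{ioco}} X(S_{2})\). By Lemma 11, we conclude that \(I \mathrel{\textsf{ioco}} S_{2}\). The implication from \(I \mathrel{\textsf{ioco}} S_{2}\) to \(I \mathrel{\textsf{ioco}} S_{1}\) is proven similarly. □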

We note that the right-hand side in Lemma 12 can be defined coinductively by using Proposition 5. If we spell this out, we get the following definition.

Definition 13

Let \(S \in \mathcal {S\!A}\) be a specification and \(X(S) = (Q_{X}, T_{X}, q_{0})\) its demonic completion. A relation \(R \subseteq Q_{X} \times Q_{X}\) is a coinductive equivalence relation if for all \((q, q^{\prime}) \in R\):

$$\textsf{out}(q) = \textsf{out}(q^{\prime}), \text{ and} \tag{1}$$
$$\forall \mu \in \textsf{init}(q) \cap \textsf{init}(q^{\prime}): (q \textsf{ after } \mu, q^{\prime} \textsf{ after } \mu) \in R. \tag{2}$$

We define \(q \approx q^{\prime}\) if there is a coinductive equivalence relation R with \((q, q^{\prime}) \in R\).

Proposition 14

Let \(S=(Q,T,q_{0}) \in \mathcal {S\!A}\) and \(q, q^{\prime} \in Q\) two states. Then, we have \(q \approx q^{\prime} \iff S/q \simeq S/q^{\prime}\).

Proof

By Lemma 12, we need to prove \(q \approx q^{\prime} \iff X(S/q) \mathrel{\textsf{ioco}} X(S/q^{\prime}) \wedge X(S/q^{\prime}) \mathrel{\textsf{ioco}} X(S/q)\). Note that all relations involved here are on the set \(Q \cup \{\chi\}\). (⇒) Any coinductive equivalence relation is also a coinductive ioco relation. (⇐) Let R and R′ be the coinductive ioco relations for \(X(S/q) \mathrel{\textsf{ioco}} X(S/q^{\prime})\) and \(X(S/q^{\prime}) \mathrel{\textsf{ioco}} X(S/q)\) respectively. Then, we conclude that \(R \cap (R^{\prime})^{-1}\) is a coinductive equivalence relation. □

4 Compatible states

For two inequivalent specification states, there may still exist an implementation that conforms to the two, which we should be able to handle in our test suite construction. For example, in Fig. 1, states 2 and 3 of the specification are both implemented by state 2 of the implementation (as shown by ioco relation R in Example 8). In that case, we say that the two specification states are compatible, following the terminology introduced by Petrenko and Yevtushenko (2011) and Simao and Petrenko (2014). We give an explicit coinductive relation for compatibility and relate it to ioco in Lemma 24.

Definition 15

Let \((Q,T,q_{0}) \in \mathcal {S\!A}\). A relation \(R \subseteq Q \times Q\) is a compatibility relation if for all \((q, q^{\prime}) \in R\) we have

$$\forall a \in \textsf{in}(q) \cap \textsf{in}(q^{\prime}): (q \textsf{ after } a, q^{\prime} \textsf{ after } a) \in R \text{, and} \tag{1}$$
$$\exists x \in \textsf{out}(q) \cap \textsf{out}(q^{\prime}): (q \textsf{ after } x, q^{\prime} \textsf{ after } x) \in R. \tag{2}$$

Two states \(q, q^{\prime}\) are compatible, denoted by \(q \mathrel{\Diamond} q^{\prime}\), if there exists a compatibility relation R relating q and q′. Otherwise, the states are incompatible, denoted by \(q \mathrel{\not\Diamond} q^{\prime}\).

Lemma 16

Let \((Q,T,q_{0}) \in \mathcal {S\!A}\). The relation ◊ is the largest compatibility relation. Furthermore, ◊ is reflexive and symmetric.

Proof

Symmetry follows from the fact that the definition is symmetric, and reflexivity holds as (1) holds trivially for any (q, q), and (2) follows from suspension automata being non-blocking. Thus, \(\{(q, q) \mid q \in Q\}\) is a compatibility relation.

Second, note that ◊ is a compatibility relation: for any element \((q, q^{\prime}) \in \Diamond\), there is a compatibility relation R containing it, and so the successors of q and q′ required by Definition 15 are related by R as well, meaning that these successors are also included in ◊. To show that ◊ is the largest, let R be any compatibility relation, then all its elements are included in ◊ by definition. □

Example 17 shows that compatibility is not transitive, thus it is not an equivalence relation. We will later show that equivalence is stronger than compatibility.

Example 17

In Fig. 2, we have 1 ◊ 2 and 1 ◊ 3, but \(2 \mathrel{\not\Diamond} 3\). This last fact can be immediately deduced from the common outputs of states 2 and 3, since out(2) ∩ out(3) = {y, z} ∩ {x} = ∅. From the observations {1, 2} after y = 2, {1, 3} after x = 2, and in({1, 2, 3}) = ∅, it follows that 1 ◊ 2 and 1 ◊ 3.

Fig. 2 An example showing that compatibility is non-transitive

Definition 18

Let \(S = (Q,T,q_{0}) \in \mathcal {S\!A}\). Define \(F_{\mathrel {\Diamond }}: \mathcal {P}(Q \times Q) \rightarrow \mathcal {P}(Q \times Q)\) as

$$\begin{array}{@{}rcl@{}} F_{\mathrel{\Diamond}}(U) = \{ (q, q^{\prime}) \in Q \times Q \mid {}&\forall a \in \textsf{in}(q) \cap \textsf{in}({q^{\prime}}): (q \textsf{after} {a}, {q^{\prime}} \textsf{after} {a}) \in U \\ &\wedge \exists x \in \textsf{out}(q) \cap \textsf{out}({q^{\prime}}): (q \textsf{after} {x}, {q^{\prime}} \textsf{after} {x}) \in U \}. \end{array} $$

Lemma 19

Relation ◊ can be computed iteratively as the greatest fixpoint of \(F_{\mathrel{\Diamond}}\).

Proof

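First, we remark that \(F_{\mathrel{\Diamond}}\) is a monotone function on the set of relations on Q. Define the relations \(\Diamond_{0} = Q \times Q\) and \(\Diamond_{i+1} = F_{\mathrel{\Diamond}}(\Diamond_{i})\). Now note that \(\Diamond_{0} \supseteq F_{\mathrel{\Diamond}}(\Diamond_{0})\) and so by monotonicity, we get \(\Diamond_{0} \supseteq \Diamond_{1} \supseteq \Diamond_{2} \supseteq \ldots\). Since \(\Diamond_{0}\) is finite, this sequence stabilizes at some stage k: \(\Diamond_{k} = \Diamond_{k+1}\), which is the greatest fixpoint of \(F_{\mathrel{\Diamond}}\). Due to the correspondence between \(F_{\mathrel{\Diamond}}\) and Definition 15, a relation U is a compatibility relation if and only if \(U \subseteq F_{\mathrel{\Diamond}}(U)\). The greatest fixpoint of a monotone function is also the largest relation with this property, so \(\Diamond_{k} = \Diamond\). Since ◊ is reflexive, pairs (q, q) are never removed during this computation, and since it is symmetric, we remove (q, q′) and (q′, q) at the same time. Thus, k is bounded by \(\frac {|Q| \cdot (|Q|-1)}{2}\). □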
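The iteration from the proof of Lemma 19 can be sketched directly in Python. The encoding and the three-state example are our own assumptions; the example is only in the spirit of Example 17 and its transitions are not copied from Fig. 2.

```python
def compatibility(Q, T, LI, LO):
    """Greatest fixpoint of F: start from Q x Q and repeatedly drop pairs
    that violate condition (1) or (2) of Definition 15."""
    def keeps(q, p, U):
        # (1) all common inputs lead to related states
        for a in LI:
            if (q, a) in T and (p, a) in T and (T[(q, a)], T[(p, a)]) not in U:
                return False
        # (2) some common output leads to related states
        return any((q, x) in T and (p, x) in T
                   and (T[(q, x)], T[(p, x)]) in U for x in LO)

    U = {(q, p) for q in Q for p in Q}
    while True:
        # The sequence is decreasing, so F(U) can be computed by filtering U.
        nxt = {(q, p) for (q, p) in U if keeps(q, p, U)}
        if nxt == U:
            return U
        U = nxt

# Hypothetical automaton without inputs.
Q = {1, 2, 3}
LI, LO = set(), {"x", "y", "z"}
T = {(1, "x"): 3, (1, "y"): 2, (2, "y"): 2, (2, "z"): 2, (3, "x"): 3}

compat = compatibility(Q, T, LI, LO)
```

In this example, states 2 and 3 share no output and are removed in the first round, while state 1 remains compatible with both of them.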

Compatibility of two specification states means that there is some common behavior allowed by both states. Beneš et al. (2015) introduce the merge-operator, which produces a new specification allowing precisely this common behavior. We present the definitions here, although in a somewhat different notation. In particular, we specialize the n-ary operator to a binary operator. We prove that compatibility indeed corresponds to existence of such a merge. Intuitively, merging is similar to parallel composition, removing blocking states afterwards.

Definition 20

Let \(S = (Q,T,q_{0}), S^{\prime } = (Q^{\prime },T^{\prime },q_{0}^{\prime }) \in \mathcal {S\!A}\) and let \((Q_{X}, T_{X}, q_{0})\) and \((Q_{X}^{\prime },T_{X}^{\prime },q_{0}^{\prime })\) be their demonic completions. For \(q \in Q\) and \(q^{\prime} \in Q^{\prime}\), we define their parallel composition as \(q \parallel q^{\prime } = (Q_{\parallel },T_{\parallel },(q,q^{\prime })) \in \mathcal {A}\mathcal {I}\mathcal {O}\), where

  • \(Q_{\parallel } = Q_{X} \times Q_{X}^{\prime }\)

  • \(T_{\parallel } = \left \{((q_{1},q_{1}^{\prime }),\mu ,(q_{2},q_{2}^{\prime })) \mid (q_{1},\mu ,q_{2}) \in T_{X} \wedge (q_{1}^{\prime },\mu ,q_{2}^{\prime }) \in T_{X}^{\prime } \right \}\).

Note that \(q \parallel q^{\prime}\) may contain states without any outputs (i.e., blocking states) and may therefore not be a suspension automaton. A blocking state cannot be implemented in a conforming manner, as an implementation must produce an output. States with transitions unavoidably leading to blocking states can also not be implemented. These states are called invalid by Beneš et al. (2015). We prove that two states are compatible exactly when their parallel composition has a valid initial state.

Definition 21

Let \((Q,T,q_{0}) \in \mathcal {A}\mathcal {I}\mathcal {O}\). We define the set of invalid states, \(\textsf{inv}(Q) \subseteq Q\), inductively as follows. A state \(q \in Q\) is invalid if (Footnote 1)

$$\textsf{out}(q) = \emptyset, \text{ or} \tag{1}$$
$$\exists a \in \textsf{in}(q): q \textsf{ after } a \in \textsf{inv}(Q), \text{ or} \tag{2}$$
$$\forall x \in \textsf{out}(q): q \textsf{ after } x \in \textsf{inv}(Q). \tag{3}$$

A state is called valid if it is not invalid and we define valid(Q) = Q ∖inv(Q).
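Since inv(Q) is defined inductively, it can be computed as a least fixpoint: start with the empty set and keep adding states that satisfy one of the three conditions. The following Python sketch illustrates this; the encoding and the four-state example are assumptions of ours.

```python
def invalid_states(Q, T, LI, LO):
    """Least set of states closed under conditions (1)-(3) of Definition 21."""
    inv = set()
    changed = True
    while changed:
        changed = False
        for q in Q:
            if q in inv:
                continue
            outs = [x for x in LO if (q, x) in T]
            # (2) some enabled input leads to an invalid state
            bad_input = any((q, a) in T and T[(q, a)] in inv for a in LI)
            # (3) all outputs lead to invalid states; this also covers (1),
            # because all(...) over an empty list of outputs is True
            all_outputs_bad = all(T[(q, x)] in inv for x in outs)
            if bad_input or all_outputs_bad:
                inv.add(q)
                changed = True
    return inv

# Hypothetical automaton in which state 2 is blocking.
Q = {0, 1, 2, 3}
LI, LO = {"a"}, {"x", "y"}
T = {(0, "a"): 1, (0, "x"): 0, (1, "x"): 2, (3, "x"): 3, (3, "y"): 2}

inv = invalid_states(Q, T, LI, LO)
valid = Q - inv
```

State 3 stays valid because it can avoid the blocking state by choosing output x, whereas state 0 is invalid because the environment may supply input a.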

Lemma 22

Let \(S = (Q,T,q_{0})\in \mathcal {S\!A}\), and let \(q, q^{\prime} \in Q\). The initial state of \(q \parallel q^{\prime}\) is valid if and only if \(q \mathrel{\Diamond} q^{\prime}\).

Proof

Let \(q \parallel q^{\prime} = (Q_{\parallel}, T_{\parallel}, (q, q^{\prime}))\). We first remark that condition (1) in Definition 21 is redundant as it implies condition (3). So we have that \(\textsf{inv}(Q_{\parallel})\) is the smallest set closed under (2) and (3). Thus, since the set of valid states is its complement, \(\textsf{valid}(Q_{\parallel})\) is the largest set for which the negations of (2) and (3) hold. We unfold these negated definitions to see that this coincides with Definition 15, by using De Morgan's laws and Definition 20:

$$\begin{array}{@{}rcl@{}} &&\quad\neg(\exists a \in \textsf{in}((p,p^{\prime})): (p,p^{\prime}) \textsf{ after } a \in \textsf{inv}(Q_{\parallel}))\\ &&\quad \wedge\ \neg(\forall x \in \textsf{out}((p,p^{\prime})): (p,p^{\prime}) \textsf{ after } x \in \textsf{inv}(Q_{\parallel}))\\ &&\iff \\ &&\quad(\forall a \in \textsf{in}(p)\cap\textsf{in}(p^{\prime}): (p \textsf{ after } a, p^{\prime} \textsf{ after } a) \in \textsf{valid}(Q_{\parallel}))\\ &&\quad\wedge\ (\exists x \in \textsf{out}(p)\cap\textsf{out}(p^{\prime}): (p \textsf{ after } x, p^{\prime} \textsf{ after } x) \in \textsf{valid}(Q_{\parallel})) \end{array} $$

According to Definition 15, \(\textsf{valid}(Q_{\parallel})\) is thus the largest compatibility relation on X(S). Removing all pairs of states (p, χ) and (χ, p) from \(\textsf{valid}(Q_{\parallel})\) results in the largest compatibility relation for S, that is, the relation ◊. □

We can now define the merge of two states as the parallel composition in which the invalid states have been removed. Figure 3 shows an example.

Fig. 3 In the specification S from Fig. 1a, the states 2 and 3 are compatible, but not equivalent. This shows (the reachable states of) the specification 2 ∧ 3

Definition 23

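Let \(S = (Q,T,q_{0}) \in \mathcal {S\!A}\), \(q, q^{\prime} \in Q\) and \(q \parallel q^{\prime} = (Q_{\parallel}, T_{\parallel}, (q, q^{\prime}))\) be their parallel composition. If \((q, q^{\prime}) \in \textsf{valid}(Q_{\parallel})\), then the merge of q and q′ is defined as \(q \wedge q^{\prime } = (Q_{\wedge }, T_{\wedge }, (q,q^{\prime })) \in \mathcal {S\!A}\), where \(Q_{\wedge} = \textsf{valid}(Q_{\parallel})\) and \(T_{\wedge} = T_{\parallel} \cap (Q_{\wedge} \times L \times Q_{\wedge})\).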

Beneš et al. (2015) prove that removing the invalid states yields no new invalid states. The merge thus yields a suspension automaton, except when its initial state would be removed. The initial state thus should be valid for the merge to be well-defined. From Lemma 22, it then follows that ∧ yields a suspension automaton precisely for compatible states.
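Definitions 10, 20, 21 and 23 can be combined into one small Python sketch of the merge. This is an illustration under our own assumptions (dictionary encoding, fresh state named `"chi"`, hypothetical example automaton), not the implementation of Beneš et al. (2015).

```python
def merge(Q, T, LI, LO, q, p, chi="chi"):
    """Merge of states q and p of one specification, or None if (q, p)
    is invalid in the parallel composition."""
    L = set(LI) | set(LO)
    # Demonic completion (Definition 10).
    TX = dict(T)
    for s in Q:
        for a in LI:
            TX.setdefault((s, a), chi)
    for mu in L:
        TX[(chi, mu)] = chi
    QX = set(Q) | {chi}
    # Parallel composition (Definition 20): synchronize on common labels.
    Qp = {(s, t) for s in QX for t in QX}
    Tp = {((s, t), mu): (TX[(s, mu)], TX[(t, mu)])
          for (s, t) in Qp for mu in L
          if (s, mu) in TX and (t, mu) in TX}
    # Invalid states (Definition 21), computed as a least fixpoint.
    inv, changed = set(), True
    while changed:
        changed = False
        for st in Qp - inv:
            bad_input = any((st, a) in Tp and Tp[(st, a)] in inv for a in LI)
            all_out_bad = all(Tp[(st, x)] in inv for x in LO if (st, x) in Tp)
            if bad_input or all_out_bad:
                inv.add(st)
                changed = True
    if (q, p) in inv:
        return None
    # Merge (Definition 23): keep valid states and transitions between them.
    valid = Qp - inv
    Tm = {k: v for k, v in Tp.items() if k[0] in valid and v in valid}
    return valid, Tm, (q, p)

# Hypothetical automaton without inputs; states 2 and 3 share no output.
Q = {1, 2, 3}
LI, LO = set(), {"x", "y", "z"}
T = {(1, "x"): 3, (1, "y"): 2, (2, "y"): 2, (2, "z"): 2, (3, "x"): 3}

merged_12 = merge(Q, T, LI, LO, 1, 2)
merged_23 = merge(Q, T, LI, LO, 2, 3)
```

The sketch builds the full product rather than only its reachable part, which keeps the code short at the cost of some unnecessary states.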

We introduced the merge as an operation that describes the common behavior of two compatible states. The following lemma states that implementations conform to both compatible states exactly when these implementations implement their merge. Moreover, there exists an implementation conforming to two states exactly when two states are compatible. This also means that our compatibility relation coincides with the one given by Simao and Petrenko (2014).

Lemma 24

Let \(S = (Q,T,q_{0}) \in \mathcal {S\!A}\) and \(q, q^{\prime} \in Q\). Then, the following holds:

  1. \(q \mathrel {\Diamond } q^{\prime } \implies (\forall I \in \mathcal {S\!A}_{\textit {IE}}: I \mathrel {\mathsf {ioco}} (q \wedge q^{\prime }) \iff (I \mathrel {\mathsf {ioco}} S/q \wedge I \mathrel {\mathsf {ioco}} S/q^{\prime}))\)

  2. \(q \mathrel {\Diamond } q^{\prime } \iff \exists I \in \mathcal {S\!A}_{\textit {IE}}: I \mathrel {\mathsf {ioco}} S/q \wedge I \mathrel {\mathsf {ioco}} S/q^{\prime}\).

Proof

Let \(q \parallel q^{\prime} = (Q_{\parallel}, T_{\parallel}, (q, q^{\prime}))\). For both statements, we can replace \(q \mathrel{\Diamond} q^{\prime}\) by \((q, q^{\prime}) \in \textsf{valid}(Q_{\parallel})\) by Lemma 22. The merge is then well-defined (Definition 23). Statement 1 then follows from [Beneš et al. (2015), Axiom (M)]. Although \(\mathcal {S\!A}\) is an extension of the specification domain of Beneš et al. (2015), the proof holds in our setting as well.

For statement 2 (⇐), we prove the contrapositive: if the initial state of \(q \parallel q^{\prime}\) is invalid, no implementation exists. If condition 1 of Definition 21 holds for \((q, q^{\prime})\), then trivially no implementation exists, as implementations are non-blocking by Definition 1. If condition 2 or 3 holds, then there exists no implementation by induction: if condition 2 holds, an implementation cannot prevent receiving any input that reaches an invalid state, as implementations are input-enabled by Definition 1; if condition 3 holds, any output transition for \(x \in \textsf{out}((q, q^{\prime}))\) leads to an invalid state. Hence, \(q \parallel q^{\prime}\) cannot be implemented. By statement 1, we then obtain that S/q and S/q′ cannot be implemented by the same implementation.

To prove 2 (⇒), note first that we take the demonic completion before computing the parallel composition. Therefore, \(q \parallel q^{\prime}\) is input-enabled. Pruning preserves this, as a state is invalid already if it has one input transition to an invalid state (Definition 21). Hence, \(q \wedge q^{\prime } \in \mathcal {S\!A}_{\textit {IE}}\). As ioco is reflexive for \(\mathcal {S\!A}_{\textit {IE}}\), \(q \wedge q^{\prime}\) conforms to itself. We obtain the conclusion by applying statement 1. □

From the established properties of ◊ and ≈, we can now easily relate the two.

Lemma 25

Let \(S=(Q,T,q_{0}) \in \mathcal {S\!A}\). Then, ≈ ⊆ ◊.

Proof

Let \(q, q^{\prime} \in Q\) be two states with \(q \approx q^{\prime}\). By Lemma 11, we have X(S/q) ioco S/q and by equivalence of q and q′, we get X(S/q) ioco S/q′. We conclude that S/q and S/q′ are both implemented by X(S/q). This implies \(q \mathrel{\Diamond} q^{\prime}\) by Lemma 24. □

5 Distinguishing graphs

In Definition 27, we define distinguishing graphs. Intuitively, such a graph describes how a tester can distinguish the specification states in a set D. That is, how to steer an implementation in state qi in such a way that it can only show conformance to at most one specification state in D, forcing it to reveal non-conformance to other specification states in D. Figure 4 shows an example distinguishing graph. Distinguishing graphs are very similar to the distinguishing sequences used in FSM theory.

Fig. 4

Distinguishing graph of the suspension automaton in Fig. 5. For readability, some nodes are shown multiple times to obtain a tree representation

In our context, we may either want to observe outputs, or we may want to apply some input. In the latter case, this gives a race-condition between the tester and the implementation, if the implementation delivers an output before the desired input can be supplied. We then simply re-attempt the test. We will elaborate on this in Section 6.

When distinguishing states D, we require that every input that we take is specified in all states of D. Furthermore, if multiple states of D have the same destination state for some common input or output μ, i.e., T(q, μ) = T(q′, μ) for different q, q′ ∈ D, then μ cannot be used to distinguish D. The reason is that after performing μ, the resulting behavior is the same for both states. We then say that μ is not injective for D. Injectivity as we define it here is similar to the concept of validity as used in Lee and Yannakakis (1994) (not to be confused with validity as introduced in Definition 21).

Definition 26

Let \((Q,T,q_{0}) \in \mathcal {S\!A}\), D ⊆ Q a set of states, and μ ∈ L a label. Then, injective(D, μ) holds if \(\mu \in L_{O} \cup \bigcap _{q \in D}\textsf {in}(q)\) and for all distinct q, q′ ∈ D, we have μ ∈ init(q) ∩ init(q′) ⇒ q after μ ≠ q′ after μ.
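As an illustration of Definition 26, the following Python sketch checks injectivity on a deterministic automaton. The encoding is an assumption made for this example only: the transition function T is a nested dictionary, `inputs` and `outputs` stand for \(L_{I}\) and \(L_{O}\), and `after` and `injective` are hypothetical helper names. The example automaton is made up and does not correspond to any figure.

```python
def after(T, q, mu):
    """Successor of q for label mu, or None if mu is not enabled in q."""
    return T.get(q, {}).get(mu)


def injective(T, inputs, outputs, D, mu):
    """Check injective(D, mu) in the sense of Definition 26 (sketch)."""
    if mu in inputs:
        # An input must be specified in every state of D.
        if any(after(T, q, mu) is None for q in D):
            return False
    elif mu not in outputs:
        return False
    # States of D that enable mu must have pairwise distinct successors.
    successors = [after(T, q, mu) for q in D if after(T, q, mu) is not None]
    return len(successors) == len(set(successors))


# Hypothetical example with input "a" and outputs "x" and "y".
T = {
    1: {"a": 3, "x": 2},
    2: {"a": 3, "y": 1},
    3: {"a": 1, "x": 3},
    4: {"x": 4},
}
INPUTS, OUTPUTS = {"a"}, {"x", "y"}
```

In this example, input a is not injective for {1, 2}, since both states move to state 3, whereas it is injective for {1, 3}.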

Definition 27

Let \((Q,T,q_{0}) \in \mathcal {S\!A}\), and D ⊆ Q a set of states. A distinguishing graph for D is a directed acyclic graph with a finite set of nodes \(V \subseteq \mathcal {P}(Q) \cup \{\textbf {reset}\}\), labeled edges E ⊆ V × L × V, and root node D. For every node v ∈ V, we require

  1. if |v| ≤ 1, then v is a leaf node, and

  2. if |v| > 1, then v is a non-leaf node and either of the following holds:

    (a) for every output \(x \in L_{O}\), there is an edge (v, x, v after x) ∈ E, and injective(v, x) holds, or

    (b) for some input \(a \in L_{I}\) such that injective(v, a) holds, there is an edge (v, a, v after a) ∈ E, and for every output \(x \in L_{O}\) there is an edge (v, x, reset) ∈ E.

A node v ∈ V is a pass node if v ≠ reset and |v| ≤ 1. We define \(\mathcal {DG}(S,D)\) as the set of all distinguishing graphs for D′ with D ⊆ D′ ⊆ Q.

A node v of a distinguishing graph describes the states of the specification that can be reached from states in the root node, by taking the sequence of labels from the root node to v. By injectivity, if a node is reached with fewer states than the root, then the sequence to that node disproves conformance to some states of the root. A pass node is reached when at most one state is left, disproving conformance to all, or all but one state of the root node. Any graph \(w \in \mathcal {DG}(S,\{q, q^{\prime }\})\) distinguishes q and q′. By Definition 27, we have \(\mathcal {DG}(S,D) \subseteq \mathcal {DG}(S,D^{\prime })\) for D′ ⊆ D, because a distinguishing graph that can distinguish all states of D can also distinguish every subset of states D′ ⊆ D.
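The conditions of Definition 27 can be checked mechanically. The sketch below is one possible reading, under assumptions that are not part of the definition: the automaton is a nested dictionary T, a graph is a dictionary `edges` from nodes (frozensets of states, or the string "reset") to their outgoing edges, and a non-leaf node carries exactly the edges required by rule 2(a) or 2(b) and no others. All function names are hypothetical.

```python
RESET = "reset"


def after_set(T, v, mu):
    """The set 'v after mu' for a set of states v."""
    return frozenset(T[q][mu] for q in v if mu in T[q])


def injective(T, inputs, outputs, v, mu):
    """injective(v, mu) as in Definition 26 (sketch)."""
    if mu in inputs and any(mu not in T[q] for q in v):
        return False
    if mu not in inputs and mu not in outputs:
        return False
    enabled = [q for q in v if mu in T[q]]
    return len(after_set(T, v, mu)) == len(enabled)


def is_distinguishing_graph(T, inputs, outputs, edges, root):
    """Check acyclicity and rules 1, 2(a), 2(b) for every node below root."""
    on_path, done = set(), set()

    def visit(v):
        if v in done:
            return True
        if v in on_path:  # a back edge means the graph has a cycle
            return False
        out = edges.get(v, {})
        if v == RESET or len(v) <= 1:
            ok = not out  # rule 1: leaves have no outgoing edges
        else:
            by_outputs = set(out) == set(outputs) and all(
                out[x] == after_set(T, v, x)
                and injective(T, inputs, outputs, v, x)
                for x in outputs
            )
            ins = [a for a in out if a in inputs]
            by_input = (
                len(ins) == 1
                and set(out) == set(outputs) | set(ins)
                and injective(T, inputs, outputs, v, ins[0])
                and out[ins[0]] == after_set(T, v, ins[0])
                and all(out[x] == RESET for x in outputs)
            )
            ok = by_outputs or by_input
        on_path.add(v)
        ok = ok and all(visit(child) for child in out.values())
        on_path.discard(v)
        done.add(v)
        return ok

    return visit(root)


# Hypothetical three-state example with outputs only.
T = {1: {"x": 2}, 2: {"x": 3}, 3: {"y": 3}}
GOOD = {
    frozenset({1, 2}): {"x": frozenset({2, 3}), "y": frozenset()},
    frozenset({2, 3}): {"x": frozenset({3}), "y": frozenset({3})},
}
BAD = {frozenset({1, 2}): {"x": frozenset({1, 2}), "y": frozenset()}}
```

Here `GOOD` distinguishes states 1 and 2 by observing two outputs, while `BAD` is rejected because its x-edge does not lead to {1, 2} after x.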

Example 28

Figure 4 shows a distinguishing graph for states {1,2,3,4,5} of the specification in Fig. 5. Suppose that we observe outputs zz from some implementation. Then, the distinguishing graph tells us that we can perform input a. Suppose that we then observe outputs xy. We then have observed trace zzaxy, thus we must be in state 1. We can trace this path backwards from state 1, traversing only states in the nodes of the distinguishing graph, to find our starting state. We must have reached state 1 with y from state 5, which in turn we have reached with x from 4. State 4 has two incoming edges for a from states 2 and 4, but only state 4 is in the respective node of the distinguishing graph. Continuing, we find that we started in state 4. Indeed, no other state has this trace.

Fig. 5

Example specification with mutually incompatible states

Lemma 29

Let \(S = (Q, T, q_{0}) \in \mathcal {S\!A}\), and q, q′ ∈ Q. There is a distinguishing graph \(Y \in \mathcal {DG}(S, \{q, q^{\prime }\})\) if and only if \(q \mathrel{\not\Diamond} q^{\prime}\).

Proof

(⇒) Note that the graph is directed and acyclic, so successor nodes define strictly smaller graphs. This means we can prove the implication by induction on the graph Y. We know that Y is a distinguishing graph for q and q′, so its root node is {q, q′}. This excludes that Y is constructed with rule (1) of Definition 27.

Assume Y is constructed by rule 2(a). Then, for all x ∈ out(q) ∩ out(q′), we have that q after x ≠ q′ after x by injectivity. We then have a distinguishing graph for q after x and q′ after x. By induction, we may assume that \(q \textsf{ after } x \mathrel{\not\Diamond} q^{\prime} \textsf{ after } x\). Hence, \(q \mathrel{\not\Diamond} q^{\prime}\), as condition (2) of Definition 15 cannot be satisfied.

Now assume Y is constructed by rule 2(b). Then, we have an a ∈ in(q) ∩ in(q′) with q after a ≠ q′ after a. Again, we have a graph distinguishing q after a and q′ after a. By induction, we know \(q \textsf{ after } a \mathrel{\not\Diamond} q^{\prime} \textsf{ after } a\). So \(q \mathrel{\not\Diamond} q^{\prime}\), as condition (1) of Definition 15 cannot be satisfied.

In both cases, we showed that \(q \mathrel{\not\Diamond} q^{\prime}\), as required.

(⇐) By Lemma 19, we know that ◊ can be computed iteratively as \(\Diamond_{i}\). Let i be the smallest number such that \((q, q^{\prime}) \notin \Diamond_{i}\). (Note that i ≠ 0.) Since \((q, q^{\prime}) \notin \Diamond_{i}\), at least one of the conditions (1) and (2) in Definition 15 is false.

If i = 1, the first condition trivially holds: for all a ∈ in(q) ∩ in(q′), we have (q after a, q′ after a) ∈ Q × Q, as \(\Diamond_{0} = Q \times Q\). So the second condition must be false. This means that for all x ∈ out(q) ∩ out(q′), we have (q after x, q′ after x) ∉ Q × Q. This can only happen if out(q) ∩ out(q′) = ∅. So we can make a distinguishing graph with root node {q, q′} and, for every \(x \in L_{O}\), an edge to the leaf node {q, q′} after x, which is either a singleton or ∅.

If i > 1, both conditions can be false. If the first condition is false, there exists an a ∈ in(q) ∩ in(q′) such that \((q \textsf{ after } a, q^{\prime} \textsf{ after } a) \notin \Diamond_{i-1}\). We then make a distinguishing graph with root node {q, q′}, with an edge for a to a distinguishing graph for {q, q′} after a, which exists by induction, and x-labeled edges to reset nodes for each \(x \in L_{O}\). Otherwise, the second condition is false and we have for all x ∈ out(q) ∩ out(q′) that \((q \textsf{ after } x, q^{\prime} \textsf{ after } x) \notin \Diamond_{i-1}\). In this case, we make a node with several edges, one for each such x. In all cases, the children are constructed inductively using the fact that \((q \textsf{ after } \mu, q^{\prime} \textsf{ after } \mu) \notin \Diamond_{i-1}\). □

Lemma 29 tells us that a distinguishing graph always exists for two incompatible states. However, for a set D of more than two mutually incompatible states, a distinguishing graph for D may not exist.

Example 30

Consider mutually incompatible states 1, 3, and 5 in Fig. 6. States 1 and 3 both reach the same state after a, so injective({1,3,5},a) does not hold, and these states can thus not be distinguished by a. Similarly, states 3 and 5 cannot be distinguished after b. For the only output z ∈out({1,3,5}), we have that {1,3,5}afterz = {1,3,5}, so we cannot distinguish {1,3,5} on outputs as this would make the distinguishing graph cyclic.

Fig. 6

No distinguishing graph exists for {1,3,5}

Definition 31 defines properties on sets of distinguishing graphs needed for constructing n-complete test suites.

Definition 31

Let \(S = (Q,T,q_{0}) \in \mathcal {S\!A}\) be a specification. Let W be a set of distinguishing graphs.

  • W is a characterization set if \(\forall q, q^{\prime} \in Q: q \mathrel{\not\Diamond} q^{\prime} \implies \exists w \in W: w \in \mathcal {DG}(S,\{q, q^{\prime }\})\).

  • W is a state identifier for q if \(\forall q^{\prime} \in Q: q \mathrel{\not\Diamond} q^{\prime} \implies \exists w \in W: w \in \mathcal {DG}(S,\{q, q^{\prime }\})\).

  • A set of state identifiers {W(q) ∣ q ∈ Q} is harmonized if \(\forall q, q^{\prime} \in Q: q \mathrel{\not\Diamond} q^{\prime} \implies \exists w \in W(q) \cap W(q^{\prime }): w \in \mathcal {DG}(S,\{q, q^{\prime }\})\).

Algorithm 1 shows how to construct a set of distinguishing graphs that is a set of harmonized state identifiers. We will only construct distinguishing graphs for pairs of states, as we can guarantee that these graphs have polynomial size.

This algorithm extends the fixpoint algorithm described in Lemma 19, in which ◊ is computed. We add a partial function W, which keeps track of all distinguishing graphs for sets D of at most two states. Initially, we already know that every D with size zero or one has a trivial distinguishing graph consisting of a pass root node. We then start computing \(\Diamond_{i}\) for increasing i until this procedure stabilizes. During every iteration, we find new pairs of incompatible states and immediately construct a distinguishing graph for each of them.

Incompatibility arises for two reasons: either, for some input \(a \in L_{I}\), successor states q after a and q′ after a have earlier been found incompatible, or, for all outputs \(x \in L_{O}\), states q after x and q′ after x have been found incompatible. We thus know that we have already constructed a distinguishing graph for these successor states in an earlier iteration. Since the transitions from q and q′ for the found input or all outputs lead to incompatible states, we can then use the distinguishing graph for the successor states to create a distinguishing graph for q and q′, which we add to W. The result of this algorithm is thus the compatibility relation, proven to be correct by Lemma 19, together with distinguishing graphs for all incompatible states.

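Algorithm 1: computing the compatibility relation ◊ together with the distinguishing graphs W (pseudocode figure)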

At first sight, the algorithm may seem to miss a base case, as it finds incompatible states only if the successor states for some input or for all outputs are also incompatible. However, the condition on line 10 is trivially true if out(q) ∩ out(q′) = ∅ for some incompatible states q and q′. The successors {q, q′} after x for \(x \in L_{O}\) are then singleton or empty, for which W contains a (trivial) distinguishing graph.
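The fixpoint computation described above can be sketched in Python as follows. This is an illustrative reconstruction, not the pseudocode of Algorithm 1: it assumes a deterministic automaton given as a nested dictionary T, assumes that every state is compatible with itself, reads Definition 15 as "all common inputs lead to compatible successors and some common output leads to compatible successors", and tries inputs before outputs. W stores, per node, only the outgoing edges; the full graph is obtained by following children through W. All names are hypothetical.

```python
from itertools import combinations

RESET = "reset"


def compatibility_with_graphs(T, inputs, outputs):
    """Return (compatible pairs, W) for a deterministic automaton T."""

    def step(D, mu):
        return frozenset(T[q][mu] for q in D if mu in T[q])

    # Nodes with at most one state are trivial pass leaves.
    W = {frozenset(): {}}
    W.update({frozenset({q}): {} for q in T})
    compatible = {frozenset(pair) for pair in combinations(T, 2)}
    while True:
        previous = set(compatible)  # the relation of the previous iteration

        def incompatible(D):
            return len(D) == 2 and D not in previous

        for D in previous:
            q, r = tuple(D)
            common = T[q].keys() & T[r].keys()
            a = next((a for a in sorted(common & inputs)
                      if incompatible(step(D, a))), None)
            if a is not None:
                # Distinguish by input a; any output forces a retry.
                W[D] = {a: step(D, a), **{x: RESET for x in outputs}}
            elif all(incompatible(step(D, x)) for x in common & outputs):
                # Distinguish by observing outputs.
                W[D] = {x: step(D, x) for x in outputs}
            else:
                continue
            compatible.discard(D)
        if compatible == previous:
            return compatible, W


# Hypothetical example: input "a", outputs "x" and "y".
T_EXAMPLE = {
    1: {"x": 2},
    2: {"x": 3},
    3: {"y": 3},
    4: {"x": 4, "y": 4},
    5: {"a": 1, "x": 5},
    6: {"a": 3, "x": 6},
}
COMPATIBLE, W = compatibility_with_graphs(T_EXAMPLE, {"a"}, {"x", "y"})
```

In this made-up example, pairs with disjoint outputs such as {1, 3} are found in the first iteration, which illustrates the implicit base case, and {5, 6} is distinguished in the second iteration by input a. State 4 allows all outputs and remains compatible with every other state.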

Note that at line 27 of the algorithm, we describe how to distinguish states by applying an input: we do this with an edge to an existing distinguishing graph for this input, and an edge to reset for all outputs. This indicates that a failed attempt of applying an input should simply be retried, until it succeeds. However, after an output, we may still reach incompatible states, which instead we may attempt to distinguish without resetting. Furthermore, one may want to prioritize distinguishing with inputs (if waiting for outputs may be slow) or with outputs (if one wants to prevent race conditions). One may thus adapt Algorithm 1 to his or her needs.

Example 32

We demonstrate how to apply Algorithm 1 on specification S in Fig. 1a. Since \(\Diamond_{0}\) contains all pairs of states, iteration i = 1 finds a pair of states to be incompatible only if the states have disjoint outputs. These are all pairs except (1,4), (2,3), and (4,5) (and, obviously, their mirrored variants, as well as all pairs of equal states (1,1), (2,2), …). Every newly found incompatible pair is assigned a distinguishing graph on outputs, with leaf nodes as children. For example, the distinguishing graph for one such pair is shown in Fig. 7. We find

$${\mathrel{\Diamond}_{1}} = \{(1,4),(4,1),(2,3),(3,2),(4,5),(5,4),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6)\}.$$

In iteration i = 2, we additionally find that 1 and 4 are incompatible, as out(1) ∩ out(4) = {x}, 1 after x = 2, 4 after x = 6, and \((2,6) \notin \Diamond_{1}\). The distinguishing graph for 1 and 4 is built up from the previously found graph, as also shown in Fig. 7. We find

$${\mathrel{\Diamond}_{2}} = \{(2,3),(3,2),(4,5),(5,4),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6)\}. $$

In iteration i = 3, no new incompatible states are found so ◊ = ◊3 = ◊2. Indeed, 2 ◊ 3 and 4 ◊ 5 are the only (non-trivial) compatible state pairs.

Fig. 7

Two distinguishing trees resulting from Algorithm 1

Lemma 33

Let \(S = (Q,T,q_{0}) \in \mathcal {S\!A}\), and let (◊, W) be the result of Algorithm 1. Then,

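  1. \(\forall q, q^{\prime} \in Q: q \mathrel{\not\Diamond} q^{\prime} \implies \mathbf {W}(\{q, q^{\prime }\})\) is defined.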

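  2. \(\forall q, q^{\prime} \in Q: q \mathrel{\not\Diamond} q^{\prime} \implies \mathbf {W}(\{q,q^{\prime }\}) \in \mathcal {DG}(S,\{q,q^{\prime }\})\).

  3. For any distinguishing graph in W, the number of its nodes is bounded by \(O(|Q|^{2})\) and the number of its edges is bounded by \(O(|Q|^{2} \cdot |L_{O}|)\).

  4. By taking \(W(q) = \{\mathbf {W}(\{q,q^{\prime }\}) \mid q^{\prime} \in Q, q \mathrel{\not\Diamond} q^{\prime}\}\), we obtain a harmonized set of state identifiers {W(q) ∣ q ∈ Q}.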

Proof

  (1) This follows from the simultaneous construction of W and ◊: we add a distinguishing graph for {q, q′} precisely when we conclude \(q \mathrel{\not\Diamond} q^{\prime}\).

  (2) We indeed find a graph by (1). Thus, we only need to show that it is acyclic and finite, conforming to Definition 27. For any graph in W constructed in iteration i, the graph is acyclic, and the height of the graph is at most i. This can be shown by induction on i: at iteration i = 0, W contains only leaf nodes, which have no outgoing edges. For all graphs constructed in iteration i + 1, the root node only has edges to root nodes of graphs from previous iterations, and to reset. By induction, these contain no cycles and have height at most i.

  (3) For any distinguishing graph with root D, all nodes D′ in that graph have |D′| ≤ |D|, by Definition 27. Since nodes of distinguishing graphs of W are sets of at most two states, the number of nodes is bounded by \(|Q|^{2} + |Q| + 2\) (including the node reset). Since every node in the graph has at most one outgoing edge for every output, and possibly a single edge for some input, we find the claimed bounds.

  (4) The fact that W(q) is a state identifier follows from (1) and (2). The set {W(q) ∣ q ∈ Q} is harmonized because for each pair q, q′ ∈ Q we constructed one graph, which is then added to both W(q) and W(q′).

6 Test suites

An n-complete test suite \(\mathbb {T}(S,n)\) for a specification S guarantees for any implementation I that I ioco S if I passes \(\mathbb {T}\), assuming that the size of I is at most n. Implementations may contain many states which are unspecified in S, and these states are not relevant for conformance. We will first define the size of an implementation in this respect, after which we will introduce all ingredients required for n-complete test suites.

Definition 34

Let \(S=(Q,T,q_{0}) \in \mathcal {S\!A}\) be a suspension automaton and \(I=(Q_{I},T_{I},{q_{0}^{I}}) \in \mathcal {S\!A}_{\textit {IE}}\) be an implementation.

  • Define the set of reachable states from a state q ∈ Q in S as the set \({\textsf {Reachable}}(S, q) = \bigcup _{\sigma \in L^{*}} q \textsf {after} {\sigma }\). The set of reachable states from \(q_{0}\) is denoted by Reachable(S).

  • A state \(q \in Q_{I}\) is specified if ∃σ ∈ traces(S): I after σ = q. A transition \((q, \mu, q^{\prime}) \in T_{I}\) is specified if q is specified, and if either \(\mu \in L_{O}\), or \(\mu \in L_{I}\) ∧ ∃σ ∈ L*: I after σ = q ∧ σμ ∈ traces(S).

  • We denote the number of specification states by |S| = |Reachable(S)|.

  • The set of reachable specified implementation states is denoted \(\textsf{Specified}_{S}(I) = \{q \in \textsf{Reachable}(I) \mid q \text{ is specified}\}\). We define \(|I|_{S} = |\textsf{Specified}_{S}(I)|\).
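The set Reachable(S, q), and with it |S|, can be computed by a standard breadth-first search. The sketch below assumes, for illustration only, that the automaton is deterministic and encoded as a nested dictionary T from states to label-successor maps; the function name `reachable` is hypothetical.

```python
from collections import deque


def reachable(T, q):
    """States reachable from q by any sequence of labels (including q itself)."""
    seen = {q}
    queue = deque([q])
    while queue:
        state = queue.popleft()
        for successor in T.get(state, {}).values():
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


# Hypothetical example: state 2 is not reachable from the initial state 0.
T = {0: {"a": 1}, 1: {"x": 0}, 2: {"x": 2}}
SIZE = len(reachable(T, 0))  # plays the role of |S| when q0 = 0
```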

Definition 35

Let \(S \in \mathcal {S\!A}\) be a specification. A test suite \(\mathbb {T}\) for S is n-complete if \(\forall I \in \mathcal {S\!A}_{\textit {IE}}\): \(\mathbb {T}\) produces verdict pass for I ⟹ (I ioco S ∨ \(|I|_{S} > n\)).

In particular, |S|-complete means that if an implementation passes the test suite, then the implementation is correct (w.r.t. ioco) or it has strictly more states than the specification.

In the FSM setting, n-complete test suites require access sequences and distinguishing sequences. In our context, we will use the term distinguishing experiments instead of distinguishing sequences. We already have distinguishing graphs for distinguishing incompatible states. Distinguishing experiments for compatible states, as well as access sequences, will be explained in the next two sections. After that, we give the definition of a test suite consisting of these parts and explain how it must be executed. We also give a proof that this test suite is indeed n-complete.

6.1 Distinguishing compatible states

Distinguishing graphs as described in Section 5 rely on incompatibility of states, by steering the implementation to a point where the specification states disagree on the allowed outputs, i.e., the states have disjoint out-sets. In this way, an implementation state cannot conform to both states, so it shows a non-conformance to at least one of the states. By using multiple distinguishing graphs, we hence show that an implementation state conforms to at most one specification state. By doing this for all implementation states, each implementation state conforms to a different specification state.

This technique fails for compatible specification states, as an implementation state may conform to multiple specification states. In such a case, a tester cannot with certainty steer the implementation to showing a non-conformance to any of the compatible specification states.

We thus extend the aim of a distinguishing experiment: instead of showing a non-conformance to any of two states q and q′ of specification S, we may also prove conformance to both. As our implementation is black-box, we can only prove this by testing: this is achieved precisely by an n-complete test suite for q ∧ q′, as this describes all common behavior of S/q and S/q′ (Lemma 24). Hence, failing an n-complete test suite for q ∧ q′ means disproving conformance to either S/q, S/q′, or both, thus achieving the original goal of a distinguishing experiment. Passing this n-complete test suite means proving conformance to both S/q and S/q′, under the assumption that the implementation has no more than n states. This is already assumed when distinguishing q and q′ in the context of an n-complete test suite for S.

6.2 Access sequences

In FSM-based testing, the implementation states are reached in a rather efficient way. A set P of access sequences is used to reach |P| implementation states, after which all other states are reached by extending P with sequences over \(L_{I}\). A direct translation to our setting, using P ⊆ traces(S) and alphabet L, is not sufficient for reaching all states \(\textsf{Specified}_{S}(I)\) of implementation I. This is because P may reach fewer than |P| states of I, and hence \(P \cdot L^{k}\) may reach fewer than n = |P| + k states of I. This has two causes: (1) the specification has multiple compatible states, which are implemented by a single state; (2) ioco allows a sequence p ∈ P with p ∉ traces(I) if p = σxρ with |out(S after σ)| > 1, i.e., transition x is optional for I to implement (S after σx is then not certainly reachable according to Simao and Petrenko 2014).

Example 36

Consider Fig. 8 for an example. An implementation can omit state 2 of specification S, as shown in Fig. 8b, while still conforming to S. The implementation in Fig. 8c exploits this: it is non-conforming, while still having no more states than S, yet it is not detected by test suite \(P \cdot L^{\leq 1} \cdot W\). We have \(P \cdot L^{\leq 1} \cdot W\) = {𝜖, x, y}⋅{𝜖, x, y, z}⋅{x, y, z}, so if we take y ∈ P (the implementation has no x-transition), \(z \in L^{\leq 1}\) (the implementation has no other possible transitions), and observe z ∈ W(3), we do not reach the faulty y transition in the implementation. This means that we may need to increase the size of the test suite in order to obtain the desired completeness. In this example, a test suite \(P \cdot L^{\leq 2} \cdot W\) is sufficient, as the test suite will contain a test with yzz ∈ P⋅L⋅L after which the faulty output y ∈ W(3) will be observed.

Fig. 8

A specification with not certainly reachable states 2 and 3. (a) Specification S. (b) Conforming implementation. (c) Non-conforming implementation

Clearly, we reach all states in an n-state implementation for any specification S, by taking P to be all traces in traces(S) of length less than n. This set P can be constructed by simple enumeration. We then have that the traces in the set P will reach all specified, reachable states in all implementations I such that \(|I|_{S} \leq n\). In particular, this means that \(P^{+} = P \cdot L\) reaches all specified transitions. We conjecture that a much more efficient construction is possible with a careful analysis of compatible states and not certainly reachable states.
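The enumeration of P can be sketched as follows. The encoding is an assumption made for illustration: the specification is deterministic and given as a nested dictionary T, traces are tuples of labels, and the extension by one label is restricted to labels enabled in the reached state (as Definition 37 does), rather than using all of L. The function names are hypothetical.

```python
def access_sequences(T, q0, n):
    """Map every trace of length < n (a tuple of labels) to the state it reaches."""
    if n <= 0:
        return {}
    frontier = {(): q0}
    P = dict(frontier)
    for _ in range(n - 1):
        # Extend every trace of the current length by one enabled label.
        frontier = {
            sigma + (mu,): target
            for sigma, q in frontier.items()
            for mu, target in T[q].items()
        }
        P.update(frontier)
    return P


def extend_by_one(T, P):
    """All traces sigma.mu with sigma in P and mu enabled after sigma."""
    return {sigma + (mu,) for sigma, q in P.items() for mu in T[q]}


# Hypothetical two-state example.
T = {0: {"a": 1, "x": 0}, 1: {"y": 0}}
P = access_sequences(T, 0, 2)
P_PLUS = extend_by_one(T, P)
```

The number of enumerated traces grows exponentially in n in general, which is why the conjectured more efficient construction would matter in practice.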

6.3 Test suite definition

We now have all ingredients to define a test suite. Definition 37 uses mutual recursion, as a test suite can show up inside another test suite (as discussed in Section 6.1).

Definition 37

Let \(S=(Q,T,q_{0}) \in \mathcal {S\!A}\), and \(n \in \mathbb {N}\). Let {W(q) ∣ q ∈ Q} be a harmonized set of state identifiers for S. The distinguishing test suite \(\mathbb {T}(S,n)\) is defined as follows.

$$\begin{array}{@{}rcl@{}} \mathbb{T}(S,n) &=& \{(\sigma,\tau) \mid \sigma \in P^{+}(S,n), \tau \in \mathit{D{\kern-.75pt}E}(S,S \textsf{after} {\sigma}) \}, \text{where}\\ P(S,n) &=& \{ \sigma \in \textsf{traces}({S}) \mid |\sigma| < n\}\\ P^{+}(S,n) &=& \{ \sigma\mu \mid \sigma \in P(S,n), \mu \in \textsf{init} ({{S} \textsf{after} {\sigma}}) \}\\ \mathit{D{\kern-.75pt}E}(S,q) &=& W(q) \cup \{\mathbb{T}(q \wedge q^{\prime},n) \mid q^{\prime} \in Q, q \mathrel{\Diamond} q^{\prime}, q \not\approx q^{\prime}\} \end{array} $$

Given a test suite \(\mathbb {T}(S,n)\), we refer to its set P(S, n) as its access sequences, and to its sets DE(S, q) as its distinguishing experiments. A merge q ∧ q′ used as part of a distinguishing experiment may be even bigger than S itself, which may cause the distinguishing test suite of Definition 37 to be infinite. We give an alternative solution with a finite upper bound in Section 6.6.

We remark that specification states which allow all behavior (i.e., all states equivalent to χ) never need to be tested, as conformance for any implementation is intrinsic. Thus, we can remove these states from the specification (similar to Beneš et al. 2015) before constructing a test suite.

Example 38

We will briefly show the ingredients for a test suite for S in Fig. 1a by constructing \(\mathbb {T}(S,6)\). The set of access sequences \(P^{+}(S,6)\) contains all traces of S up to length 6. To also determine the distinguishing experiments for all states, we first analyze the compatible states as explained in Example 32. This analysis shows that the only pairs of inequivalent, compatible states are 2 ◊ 3 and 4 ◊ 5. For all incompatible pairs, we obtain a distinguishing graph. For example, the distinguishing graph for states 1 and 4 as constructed in Example 32 is included in the distinguishing experiments DE(S,1) and DE(S,4).

For every compatible pair, we recursively compute a test suite for their merge, which we use as distinguishing experiments: \(\mathbb {T}(2 \wedge 3, 6) \in \mathit {D{\kern -.75pt}E}(S,2) \cap \mathit {D{\kern -.75pt}E}(S,3)\) and \(\mathbb {T}(4 \wedge 5, 6) \in \mathit {D{\kern -.75pt}E}(S,4) \cap \mathit {D{\kern -.75pt}E}(S,5)\). The merge 2 ∧ 3 was given in Fig. 3 and the merge 4 ∧ 5 occurs as a sub-automaton. When making the distinguishing experiments for these compatible states, we can remove the state (χ, χ) as it is equivalent to the chaos state. This leaves us with a 3-state and a 2-state automaton.

To recursively compute \(\mathbb {T}(2 \wedge 3, 6)\), we take all prefixes of \(wz^{5}\) and \(ayz^{4}\) as access sequences. Performing Algorithm 1 on these automata, we find that all pairs of states in these automata are incompatible. Distinguishing experiments DE(2 ∧ 3, q) thus only contain distinguishing graphs for all states q of 2 ∧ 3, so no new test suites have to be computed recursively. Computing \(\mathbb {T}(4 \wedge 5, 6)\) is done likewise, and also terminates without recursion.

6.4 Execution of test suites

So far we have introduced distinguishing test suites, access sequences and distinguishing graphs. Each of those describes an executable experiment, for which we need to define how it is executed.

First, we consider the execution of a trace σ as a sequential execution of its labels, where inputs and outputs are treated differently.

An output x is executed by waiting for the implementation to produce an output y, and then checking whether x = y. If so, we continue with the next label of σ. Otherwise, we try again by resetting the implementation to its initial state and execute σ from its first label. We require execution to be fair: if a trace σx is executed often enough, then every output y appearing in the implementation after σ will eventually be observed. Therefore, after a finite number of times resetting, we may conclude that the implementation cannot show the intended x-transition. Determining the exact number is left to the tester. Concluding that the implementation does not contain the trace σx is also considered a successful execution.

An input is executed by providing it to the implementation. An implementation may produce an output after σ before the tester can supply an input. Again, we require fairness: if a trace σa is executed often enough, then the tester will eventually succeed in executing a after σ. Assuming fairness is unavoidable for any notion of completeness in testing: a fault can never be detected if an implementation consistently chooses paths that avoid this fault.

Distinguishing test suites are executed by executing all tests contained in them. A test (σ, τ) is executed by first executing σ as described, and then executing the distinguishing experiment τ. If we conclude by fairness that some output of σ cannot be produced by the implementation, we declare σ, and also (σ, τ), to have been executed. While executing any test of a test suite \(\mathbb {T}\) for specification S, it is always checked whether any executed trace is a trace of S. If an ioco counterexample for S is observed, the test suite \(\mathbb {T}\) produces the verdict fail, and test execution stops. If all tests have been executed without encountering an ioco counterexample, then the test suite \(\mathbb {T}\) produces a pass verdict.

Example 39

Consider the distinguishing test suite \(\mathbb {T}\) for specification S in Fig. 1a. One of the tests contained in \(\mathbb {T}\) is \((a,\mathbb {T^{\prime }})\), where \(\mathbb {T^{\prime }}\) is a distinguishing test suite for the merge of compatible states 2 and 3 (see Example 38).

We continuously execute access sequence a before executing a test from \(\mathbb {T^{\prime }}\). If during execution of \(\mathbb {T^{\prime }}\), we observe the trace ax, then \(\mathbb {T^{\prime }}\) fails: trace ax is not a trace of 2 ∧ 3. We thus have successfully distinguished states 2 and 3 in the test \((a,\mathbb {T^{\prime }})\). This corresponds to observing trace aax during execution of \(\mathbb {T}\), which is no ioco counterexample for S, and hence does not result in a fail for \(\mathbb {T}\) itself.

If τ is a distinguishing test suite, then we execute it recursively, as already shown in Example 39. If it is a distinguishing graph, then it can be executed on an implementation by providing the inputs and observing the outputs on the edges of the tree going downwards from the root. In other words, if we view a distinguishing graph G with nodes V, edges E, and root node D, as a suspension automaton G = (V, E, D), then test execution of G on an implementation I is taking the parallel composition of G and I as in Definition 20. If (g, i) is a state of the composition, and g is a pass node, then distinguishing graph τ has been executed successfully. If g is a reset node, then the test needs to be reattempted. Note that the pass and reset states of the composition are the only blocking states, as all nodes of the distinguishing graph have edges for all outputs, and the implementation is input-enabled and non-blocking. Again, a test suite \(\mathbb {T}\) using the distinguishing graph τ does not use the verdict of τ, similar to Example 39: it only requires that distinguishing is successful. In the proof of Theorem 40, we need the following consequence of fairness. If a certain sequence ρ is observed in executing τ and τ is also used in testing another state, then if the other state does not show ρ (at some point), we conclude that ρ is not a trace of that state.

Finishing a distinguishing experiment τ may take several attempts: a distinguishing graph may give a reset because an input transition was not taken, and an n-complete test suite for distinguishing two compatible states may contain multiple tests. Access sequence σ needs to be executed before every attempt. By assuming fairness and finiteness of the test suite, every distinguishing experiment is guaranteed to terminate, and thus also every test.

6.5 Completeness proof for distinguishing test suites

Theorem 40

Let \(S \in \mathcal {S\!A}\) be a specification and \(n \in \mathbb {N}\). The distinguishing test suite \(\mathbb {T}(S,n)\) from Definition 37 is n-complete.

Proof

We will show that for any implementation I with \(|I|_{S} \leq n\) which passes the test suite, we can build a coinductive ioco relation which contains the initial states. As a basis for that relation, we take the states which are reached by the set P(S, n). This may not be an ioco relation, but by extending it (in two ways) we obtain a full ioco relation. Extending the relation is an instance of a so-called up-to technique; we will use terminology from Bonchi and Pous (2015).

More precisely, let \(S = (Q_{S},T_{S},{q_{0}^{S}})\) and let \(I = (Q_{I},T_{I},{q_{0}^{I}})\) be an implementation with \(|I|_{S} \leq n\) which passes \(\mathbb {T}(S,n)\). By construction of P(S, n), all states \(\textsf{Specified}_{S}(I)\) are reached by P(S, n) and so all specified transitions are reached by \(P^{+}(S, n)\).

Using the set P(S, n), we define \( R = \{ (q_{0}^{I} \textsf {after} \sigma , q_{0}^{S} \textsf {after} \sigma ) \mid \sigma \in P(S,n) \} \) as a subset of \(Q_{I} \times Q_{S}\). First, we extend R by adding relations for all equivalent specification states: \(R^{\prime} = \{(i, s^{\prime}) \mid (i, s) \in R, s^{\prime} \in Q_{S}, s \approx s^{\prime}\}\). Second, let \(\mathcal {J} = \{ (i,s) \mid i \in Q_{I}, s \in Q_{S} \text { such that } i \textsf { ioco } s \}\) and \(R_{i,s}\) be the ioco relation for i ioco s; now define \(\overline {R} = R^{\prime } \cup \bigcup _{(i,s) \in \mathcal {J}} R_{i,s}\). We want to show that \(\overline {R}\) defines a coinductive ioco relation. We do this by showing that R progresses to \(\overline {R}\).

Let (i, s) ∈ R. We assume that we have seen all of out(i) and that out(i) ⊆out(s) (this is taken care of by the test suite and the fairness assumption). Then, because we use P+(S, n), we also reach the transitions after i. We need to show that the input and output successors are again related.

  • Let \(a \in L_{I}\). Since the implementation is input-enabled, there is a transition for a with \(i \textsf{ after } a = i_{2}\). Suppose there is a transition for a from s: \(s \textsf{ after } a = s_{2}\) (if not, then we are done). We have to show that \((i_{2}, s_{2}) \in \overline {R}\).

  • Let \(x \in L_{O}\). Suppose there is a transition for x: \(i \textsf{ after } x = i_{2}\). Then (since out(i) ⊆ out(s)), there is a transition for x from s: \(s \textsf{ after } x = s_{2}\). We have to show that \((i_{2}, s_{2}) \in \overline {R}\).

In both cases, we have a successor \((i_{2}, s_{2})\) which we have to prove to be in \(\overline {R}\). Now since P(S, n) reaches all specified states of I, we know that \(i_{2}\) is reached and so \((i_{2}, s_{2}^{\prime }) \in R\) for some \(s_{2}^{\prime }\). If \(s_{2} \approx s_{2}^{\prime }\), then \((i_{2}, s_{2}) \in R^{\prime } \subseteq \overline {R}\) holds and we are done. So now suppose that \(s_{2} \not \approx s_{2}^{\prime }\). There are two cases:

  • If \(s_{2} \mathrel{\not\Diamond} s_{2}^{\prime}\), then there exists a distinguishing graph \(w \in W(s_{2}) \cap W(s_{2}^{\prime })\) (since W is a harmonized set of state identifiers). This graph w is executed twice in \(i_{2}\): once as a test (σ, w) for some σ ∈ P(S, n) with \(S \textsf{ after } \sigma = s_{2}^{\prime}\), and once as a test (σ′, w) for some \(\sigma^{\prime} \in P^{+}(S, n)\) with \(S \textsf{ after } \sigma^{\prime} = s_{2}\). By fairness, there is a single sequence ρ in w executed in both executions. This sequence reaches a pass state of w in both cases, as our implementation passed the test suite. By construction of distinguishing graphs, ρ must be an ioco counterexample for either \(S/s_{2}\) or \(S/s_{2}^{\prime }\). This contradicts that the two tests passed, so this case cannot happen.

  • If \(s_{2} \mathrel {\Diamond } s_{2}^{\prime }\) (but \(s_{2} \not \approx s_{2}^{\prime }\) as assumed above), then we executed a test suite \(\tau \in \mathit{DE}(S, s_{2})\) for \(s_{2} \wedge s_{2}^{\prime }\). By induction, we assume that τ is n-complete. If all the tests in τ pass, then we can conclude that \(i_{2}\) ioco \(s_{2}\) and so \((i_{2}, s_{2}) \in R_{i_{2},s_{2}} \subseteq \overline {R}\). It can happen that a test in the distinguishing test suite τ fails, so that \(i_{2}\) does not conform to \(s_{2} \wedge s_{2}^{\prime }\). In that case, there is a sequence ρ which is an ioco counterexample executed after an access sequence of \(s_{2}\). By fairness, we may assume this trace ρ is also executed after \(s_{2}^{\prime }\) (since we execute it from the same implementation state). Since \(i_{2}\) does not conform to \(s_{2} \wedge s_{2}^{\prime }\), either execution makes the whole test suite \(\mathbb {T}(S,n)\) fail, contradicting the assumption.

In both cases, we either have a contradiction, so that \(s_{2} \not \approx s_{2}^{\prime }\) cannot hold, or we have proven directly that \((i_{2}, s_{2}) \in \overline {R}\).

So we have now seen that R progresses to \(\overline {R}\). It is clear that R′ progresses to \(\overline {R}\) too. Then, since each \(R_{i,s}\) is an ioco relation, each of them progresses to \(R_{i,s} \subseteq \overline {R}\). And so the union, \(\overline {R}\), progresses to \(\overline {R}\), meaning that \(\overline {R}\) is a coinductive ioco relation. Furthermore, we have \((i_{0}, s_{0}) \in \overline {R}\) (since ε ∈ P(S, n)), concluding the proof. □

We remark that if the specification does not contain any compatible states, the proof can be simplified considerably. In particular, we do not need test suites for merges of states, and we can use the relation R instead of \(\overline {R}\).

6.6 Unconditional test suite

The distinguishing test suite relies on executing distinguishing experiments. If a specification contains compatible states, the test suite contains distinguishing experiments which are themselves distinguishing test suites. This is thus a recursive construction: we need to show that such a test suite is finite. For particular specifications, recursive repetition of the distinguishing test suite as described above is already finite. For example, specification S in Fig. 3 contains compatible states, but in the merge of every two compatible states, no further compatible states remain (when ignoring state (χ, χ) as explained in Example 38). Consequently, the distinguishing test suites of each merge only have distinguishing graphs as distinguishing experiments, and hence the recursion terminates.

However, the merge of two compatible states may in general again contain compatible states. In these cases, recursive repetition of distinguishing test suites may cause a blow-up in the size of the test suite. We therefore provide an alternative: the unconditional test suite, which has a clear upper bound. This bound is based on what is called state counting in FSM theory (Hierons 2004). The bound is obtained by counting the number of times a specification state is visited while executing a trace on the implementation. Definition 41 and Lemma 42 make this precise in our ioco setting.

Fig. 9 A specification and a non-conforming implementation. (a) Specification S. (b) Implementation I

Definition 41

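Let \(S = (Q, T, q_{0}) \in \mathcal {S\!A}\) and \(n \in \mathbb {N}\). A trace \(\sigma \in \textsf {traces}({S})\) is n-bounded if \(\forall q \in Q : |\{\rho \mid \rho \text { is a prefix of } \sigma \wedge S \mathbin {\textsf {after}} \rho = q\}| \le n\).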
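The check in Definition 41 can be sketched in a few lines of Python. This is only an illustration under assumptions that are ours: the suspension automaton is encoded as a dictionary `transitions` mapping a hypothetical `(state, label)` pair to the unique successor state, and the names `is_n_bounded`, `transitions` and `q0` do not come from the definition itself.

```python
from collections import Counter

def is_n_bounded(transitions, q0, trace, n):
    """Check whether `trace` is a trace of the automaton and is n-bounded.

    Assumed encoding (hypothetical): `transitions` maps (state, label) to the
    unique successor state of a deterministic automaton with initial state q0.
    """
    # The empty prefix already leads to q0, so q0 starts with one visit.
    visits = Counter({q0: 1})
    if visits[q0] > n:
        return False
    state = q0
    for label in trace:
        if (state, label) not in transitions:
            return False  # the trace is not in traces(S)
        state = transitions[(state, label)]
        visits[state] += 1  # one more prefix leads to `state`
        if visits[state] > n:
            return False
    return True
```

For a two-state automaton that alternates between states p and q, the trace xyxy has three prefixes leading to p, so it is 3-bounded but not 2-bounded.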

Lemma 42

Let \(S=(Q,T,q_{0}) \in \mathcal {S\!A}\) and \(I \in \mathcal {S\!A}_{\textit {IE}}\). If \(I \not \mathrel {\textbf {ioco}} S\), then traces(I) contains an |I|S-bounded ioco counterexample for S.

Proof

If \(I \not \mathrel {\textbf {ioco}} S\), then traces(I) contains an ioco counterexample σ for S by Lemma 7. If σ is |I|S-bounded, the proof is trivial, so assume it is not. Hence, there exists a state q ∈ Q with at least |I|S + 1 prefixes of σ leading to q. At least two of those prefixes, ρ and ρ′, must lead to the same implementation state, i.e., it holds that I after ρ = I after ρ′ and S after ρ = S after ρ′. Assuming |ρ| < |ρ′| without loss of generality, we can thus create an ioco counterexample σ′ shorter than σ by replacing ρ′ by ρ. If σ′ is still not |I|S-bounded, we can repeat this process until it is. □
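The shortening step of this proof can be illustrated with a small Python sketch. It rests on assumptions of ours: both automata are deterministic and encoded as dictionaries from a hypothetical `(state, label)` pair to a successor state, and the helper names `run` and `cut_loops` are illustrative only. The sketch is slightly more aggressive than the proof requires: it removes every repetition of a pair (implementation state, specification state), so that in the result each specification state is visited at most as many times as there are implementation states.

```python
def run(transitions, start, trace):
    """States reached after each prefix of `trace` (None once undefined)."""
    states = [start]
    for label in trace:
        cur = states[-1]
        states.append(None if cur is None else transitions.get((cur, label)))
    return states

def cut_loops(impl, i0, spec, s0, trace):
    """Shorten `trace` by replacing a longer prefix with a shorter one
    whenever both lead to the same (implementation, specification) pair."""
    trace = list(trace)
    while True:
        # pairs[k] is the state pair reached after the prefix of length k.
        pairs = list(zip(run(impl, i0, trace), run(spec, s0, trace)))
        first_seen = {}
        for idx, pair in enumerate(pairs):
            if pair in first_seen:
                shorter = first_seen[pair]
                # Replace the prefix of length idx by the one of length shorter.
                trace = trace[:shorter] + trace[idx:]
                break
            first_seen[pair] = idx
        else:
            return trace  # no pair repeats any more
```

Since both runs continue from the same pair of states after the cut, the remaining suffix behaves as before, so a counterexample stays a counterexample.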

Contrapositively, if we observe all n-bounded traces of an implementation and find no ioco counterexamples, we know that the implementation must be conforming. Note that an n-bounded trace has a length of at most |S|⋅ n, so exhaustively checking all n-bounded traces is possible.

Definition 43

Let \(S \in \mathcal {S\!A}\) and \(n \in \mathbb {N}\). The unconditional test suite is then: \(\mathbb {U}(S,n) = \{\sigma \in \textsf {traces}({S}) \mid \sigma \text { is } n\text {-bounded}\}\).
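To make the finiteness of this set concrete, the following Python sketch enumerates all n-bounded traces by a depth-first search that tracks how often each state has been visited. The encoding is an assumption of ours: a deterministic automaton given as a dictionary from a hypothetical `(state, label)` pair to a successor state. The name `unconditional_suite` is likewise illustrative.

```python
def unconditional_suite(transitions, q0, n):
    """Enumerate all n-bounded traces of the automaton as tuples of labels.

    Assumed encoding (hypothetical): `transitions` maps (state, label) to the
    unique successor state; q0 is the initial state.
    """
    if n < 1:
        return []  # even the empty trace visits q0 once
    suite = []

    def extend(state, trace, visits):
        suite.append(tuple(trace))
        for (src, label), dst in transitions.items():
            # Only extend if the target state may be visited once more.
            if src == state and visits.get(dst, 0) < n:
                visits[dst] = visits.get(dst, 0) + 1
                trace.append(label)
                extend(dst, trace, visits)
                trace.pop()
                visits[dst] -= 1

    extend(q0, [], {q0: 1})
    return suite
```

The recursion terminates because every extension consumes one of the at most n visits available to some state. The number of enumerated traces can still grow exponentially in their length.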

Corollary 44

Let \(S \in \mathcal {S\!A}\) and \(n \in \mathbb {N}\). Then, \(\mathbb {U}(S,n)\) is an n-complete test suite. We have \(\forall \sigma \in \mathbb {U}(S,n): |\sigma | \le |S|\cdot n\).

The upper bound |S|⋅ n is tight, as shown in Example 45. A set of traces of length at most |S|⋅ n is much bigger than the set P+(S, n) of traces of length at most n, as the number of traces grows exponentially in their length. Thus, a distinguishing test suite as introduced in Section 6.3 may be significantly smaller, depending on the number of compatible states. The unconditional test suite shows the possibility of unconditional termination with a fixed upper bound, though.

As \(\mathbb {U}(S,n)\) consists of traces, test execution amounts to executing all these traces, i.e., by executing them according to the fairness assumption. If the implementation produces a trace not in traces(S), the test suite has verdict fail. If all traces of \(\mathbb {U}(S,n)\) have been executed without obtaining a fail verdict, the test suite has verdict pass.

Example 45

Figure 9 shows a specification and a non-conforming implementation with ioco counterexample yyxyyxyyxyyx, of maximal length |S|⋅|I|S = 12.

7 Conclusions and future work

We firmly embedded theory on n-complete test suites into ioco theory, without making any restrictive assumptions. We have identified several problems where classical FSM techniques fail for suspension automata, in particular for compatible states. The concept of distinguishing states has been extended such that compatible states can be handled, by n-complete testing of the merge of such states. Additionally, we have given a construction for distinguishing graphs for incompatible states, which follows naturally from the computation of the compatibility relation.

We use an extended domain of suspension automata, which may not respect the usual conditions for quiescence. This is a conservative approach: detecting any faulty implementation in our extended domain also finds any faulty implementation which does respect quiescence. However, this may produce more tests than required, since some tests only serve to detect “spurious” implementations. A further area of research is to tighten the definitions of equivalence, compatibility and n-complete test suites to capture the more restricted usual implementation domain.

For reaching all implementation states, we used all traces up to length n, which is hence an upper bound exponential in the number of states. Furthermore, the recursion of using a test suite for testing a merge of compatible states may possibly not terminate. We therefore introduced an unconditional test suite, which provides an exponential but finite upper bound. These two exponential upper bounds may limit practical applicability, so further investigation is needed to efficiently tackle these problems. Furthermore, experiments are needed to determine the actual efficiency of computation and execution time, preferably on real world case studies. This should include a quantitative comparison with other methods, for example random testing as by Tretmans (2008).