1 Introduction

Model-based testing (MBT) [50] is a technique to automatically generate, execute, and evaluate test suites on black-box implementations under test (IUT). The theoretical ingredients of an MBT framework are a formal model that specifies the desired system behaviour, often in terms of (some extension of) input–output transition systems; a notion of conformance that specifies when an IUT is considered a valid implementation of the model; and a precise definition of what a test case is. For the framework to be applicable in practice, we also need algorithms to derive test cases from the model, execute them on the IUT, and evaluate the results, i.e. decide conformance. They need to be sound (i.e. every implementation that fails a test case does not conform to the model), and ideally also complete (i.e. for every non-conforming implementation, there theoretically exists a failing test case). MBT is attractive due to its high degree of automation: given a model, the otherwise labour-intensive and error-prone derivation, execution and evaluation steps can be performed in a fully automatic way.

Model-based testing originally gained prominence for input–output transition systems (IOTS) using the ioco relation for input–output conformance [49]. IOTS partition the observable actions of the IUT (and thus of the model and test cases) into inputs (or stimuli) that can be provided at any time, e.g. pressing a button or receiving a network message, and outputs that are signals or activities that the environment can observe, e.g. delivering a product or sending a network message. IOTS include nondeterministic choices, allowing underspecification: the IUT may implement any or all of the modelled alternatives. MBT with IOTS tests for functional correctness: the IUT shall only exhibit behaviours allowed by the model. In the presence of nondeterminism, the IUT is allowed to use any deterministic or randomised policy to decide between the specified alternatives.

Stochastic behaviour and requirements are an important aspect of today’s complex systems: network protocols extensively rely on randomised algorithms, cloud providers commit to service level agreements, probabilistic robotics [46] allows the automation of complex tasks via simple randomised strategies (as seen in, e.g. vacuuming and lawn mowing robots), and we see a proliferation of probabilistic programming languages [23]. Stochastic systems must satisfy stochastic requirements. Consider the example of exponential backoff in Ethernet: an adapter that, after a collision, sometimes retransmits earlier than prescribed by the standard may not impact the overall functioning of the network, but may well gain an unfair advantage in throughput at the expense of overall network performance. In the case of cloud providers, the service level agreements are inherently stochastic when guaranteeing a certain availability (i.e. average uptime) or a certain distribution of maximum response times for different tasks. This has given rise to extensive research in stochastic model checking techniques [30]. However, in practice, testing remains the dominant technique to evaluate and certify systems outside of a limited area of highly safety-critical applications.

In this paper, we present two MBT frameworks based on input–output Markov automata [17] (IOMA) and stochastic automata [11, 12] (IOSA), which are transition systems augmented with discrete probabilistic choices and stochastic delays. Markov automata are a memoryless continuous-time model, essentially the extension of continuous-time Markov chains with nondeterminism: the time spent in any state of the automaton follows some exponential distribution. In stochastic automata, on the other hand, the progress of time is governed by clock variables whose expiration times follow general probability distributions. By using IOMA or IOSA models, we can quantitatively specify stochastic aspects of a system, in particular, w.r.t. timing. While IOMA are more suitable for the abstract specification of soft real-time systems, IOSA enable precise modelling of both hard and soft real-time systems and requirements. Since both models extend transition systems, nondeterminism is available for underspecification as usual. After introducing the models and their semantics (Sect. 3), we formally define the notions of Markovian and stochastic ioco (mar-ioco and sa-ioco, respectively), and of test cases as restrictions of IOMA and IOSA (Sect. 4). We then outline practical algorithms for conformance testing (Sect. 5). The latter combines per-trace functional verdicts as in standard ioco with a statistical evaluation that builds upon confidence interval estimation for IOMA and the Kolmogorov–Smirnov test [29] for IOSA. We finally exemplify our frameworks’ capabilities and the tradeoffs between the IOMA and IOSA approaches by testing timing aspects of different implementation variants of the Bluetooth device discovery protocol (Sect. 6).

1.1 Related work

Our mar-ioco and sa-ioco frameworks generalise the pioco framework [20] for probabilistic automata (or Markov decision processes), which only supports discrete probabilistic choices and has no notion of time at all.

Early influential work on model-based testing had only deterministic time [4, 31, 33, 34], later extended with timeouts/quiescence [5]. Probabilistic testing relations and equivalences are well studied [9, 14, 42]. Probabilistic bisimulation via hypothesis testing was first introduced in [35]. Our work is largely influenced by [8], which introduced a way to compare trace frequencies with collected samples. A more restricted approach is given in the work on stochastic finite state machines [28, 40]: stochastic delays are specified similarly, but discrete probability distributions over target states are not included. Closely related to our testing relation for Markov automata are the studies of bisimulation relations [17], which inspired further work on weak bisimulation [15] and late-weak bisimulation [43]. By studying relations based on trace distribution semantics, rather than equivalence relations, we grant vastly more implementation freedom.

Probabilistic and non-probabilistic MBT are part of a greater ecosystem of formal methods developed to improve the correctness, dependability, and trustworthiness of various types of systems, ranging from software over cyber-physical systems to, for example, organisational processes and biological applications. Model checking [1], probabilistic model checking [30], and statistical model checking [26, 54] serve to prove or disprove the conformance of a (probabilistic) model of a system to a (probabilistic) specification usually given in terms of temporal logic formulas. Notable probabilistic model checkers include Prism [32], Storm [13], and the mcsta tool of the Modest Toolset [25], while two current examples of statistical model checkers are Plasmalab [36] and the Modest Toolset’s modes simulator [6]. These techniques and tools are complementary to MBT, which establishes a relation between a model (which now acts as a specification, and may earlier have been verified with model checking) and the real implementation. Notably, the Modest Toolset also includes an MBT tool [24], thus providing all three techniques for probabilistic systems in one package. The “opposite” of MBT, deriving a model from an implementation using automata learning [51, 53], is also gaining popularity and is especially well suited for the analysis of legacy systems [41]. Automata learning typically uses MBT internally to check whether the model learned so far is approximately equivalent to the implementation under learning.

1.2 Previous work

This paper provides a new integrated presentation of our previous papers on model-based testing for Markov automata [21] and stochastic automata [19]. We explain the differences and tradeoffs between the two frameworks in theory and practice. We added examples and more detailed explanations throughout the paper. Test cases for both models are now effectively IOTS (Sect. 4.2), where our previous work used probabilistic test cases, providing a clean distinction between test generation and test selection.

Specifically compared to [21], we use a more standard definition of IOMA (Definition 1) that does not rely on being input-reactive and output-generative [52]. We discuss how to implement quiescence in a Markovian setting in a way that does not affect the statistical evaluation yet minimises the testing runtime and the chance for errors of the second kind (Sect. 5.2). Finally, we study an additional protocol mutant with IOMA in the Bluetooth case study (Sect. 6).

Compared to [19], we adapted the sa-ioco conformance relation such that it now properly extends ioco. That is, where [19] relied on trace distribution inclusion of closed systems, we now utilise schedulers for open systems. As a result, sa-ioco is in line with mar-ioco and with earlier work on untimed probabilistic systems [20]. We also present full proofs for the soundness and completeness of the IOSA MBT framework (Sect. 4.4).

2 Preliminaries

2.1 Mathematical notation

\({\mathbb {N}}\) is \(\{\,0, 1, \ldots \,\}\), the set of natural numbers. \({\mathbb {R}} \), \({\mathbb {R}}^+ \), and \({\mathbb {R}}^{+}_{0} \) are the sets of all, all positive, and all nonnegative real numbers, respectively. We write closed intervals as \([a, b] := \{\,x \in {\mathbb {R}} \mid a \le x \le b\,\}\), open intervals as \(]a, b{[} := \{\,x \in {\mathbb {R}} \mid a< x < b\,\}\), and half-open intervals analogously as \(]a, b]\) and \([a, b{[}\). For a given set \(\varOmega \), we denote its powerset by \({\mathcal {P}}({\varOmega }) \). A multiset is written as . Let the function \(\mathbb {1} \in \{\, \textit{true}, \textit{false} \,\} \rightarrow \{\,0, 1\,\}\) be defined by \(\mathbb {1}(\textit{true}) = 1\) and \(\mathbb {1}(\textit{false}) = 0\). We write \(\mathbb {1}_b\) to denote \(\mathbb {1}(b)\).

We use angled brackets \(\langle \cdot \rangle \) to denote tuples, and define \(\varOmega ^* := \bigcup _{i\in {\mathbb {N}}} \varOmega ^i\), the set of all finite tuples or sequences consisting of elements from \(\varOmega \). Correspondingly, we write \(\varOmega ^\omega \) for the set of all infinite sequences, \(\varOmega ^{\le \omega }\) for the set of all finite and infinite sequences, and \(\varOmega ^{\le k}\) for the set of all sequences of length at most k. For a sequence

$$\begin{aligned} \sigma = \omega _0 \ldots \omega _n := \langle \omega _0, \ldots , \omega _n \rangle \in \varOmega ^{n+1}, \end{aligned}$$

we write \(\sigma \mathbin {.} \omega _{n+1}\) for \(\omega _0 \ldots \omega _n\, \omega _{n+1} \in \varOmega ^{n+2}\), i.e. \(\sigma \) extended by \(\omega _{n+1} \in \varOmega \). We also use the generalisation of the \(\mathbin {.}\) operator to the concatenation of two sequences.

2.2 Probability theory

For a given set \(\varOmega \), a probability subdistribution is a function \(\mu \in \varOmega \rightarrow [0, 1]\) such that

$$\begin{aligned} \mathrm {support}({\mu }) := \{\,\omega \in \varOmega \mid \mu (\omega ) > 0\,\} \end{aligned}$$

is countable. Its probability mass is \(|\mu | := \sum _{\omega \in \mathrm {support}({\mu })}{\mu (\omega )}\). If \(|\mu | = 1\), then \(\mu \) is a probability distribution. We write \(\mathrm {SubDistr}(\varOmega )\) and \(\mathrm {Distr}(\varOmega )\) for the sets of all probability subdistributions and distributions over \(\varOmega \), respectively. The Dirac distribution for \(\omega \) is \({\mathcal {D}}(\omega ) \), defined by \({\mathcal {D}}(\omega )(\omega ) = 1\) and \({\mathcal {D}}(\omega )(\omega ') = 0\) for all \(\omega ' \ne \omega \). Given probability distributions \(\mu _1\) and \(\mu _2\), we denote by \(\mu _1 \otimes \mu _2\) the product distribution, which is the unique probability distribution defined by

$$\begin{aligned} (\mu _1 \otimes \mu _2)(\langle \omega _1, \omega _2 \rangle ) = \mu _1(\omega _1) \cdot \mu _2(\omega _2) \end{aligned}$$

for all \(\langle \omega _1, \omega _2 \rangle \in \mathrm {support}({\mu _1}) \times \mathrm {support}({\mu _2}) \).
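To make these notions concrete, the following Python sketch represents finite-support (sub)distributions as dictionaries with exact rational probabilities. It is an illustration only and not part of the formal development: the function names `support`, `mass`, `dirac`, and `product`, as well as the example distributions `coin` and `outcome`, are our own assumptions.

```python
from fractions import Fraction
from itertools import product as cartesian


def support(mu):
    """Elements that carry strictly positive probability."""
    return {w for w, p in mu.items() if p > 0}


def mass(mu):
    """Probability mass |mu| of a finite-support subdistribution."""
    return sum(mu[w] for w in support(mu))


def dirac(w):
    """Dirac distribution: all probability mass sits on w."""
    return {w: Fraction(1)}


def product(mu1, mu2):
    """Product distribution over pairs drawn from the two supports."""
    return {
        (w1, w2): mu1[w1] * mu2[w2]
        for w1, w2 in cartesian(support(mu1), support(mu2))
    }


# Hypothetical example distributions (not taken from the paper's figures).
coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
outcome = {"ack": Fraction(9, 10), "err": Fraction(1, 10)}
joint = product(coin, outcome)
```

Since both factors have mass 1, so does `joint`, which is what makes the product a distribution again.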

Let \(\varOmega \) be endowed with a \(\sigma \)-algebra \(\sigma (\varOmega )\): a collection of measurable subsets of \(\varOmega \). A probability measure over \(\varOmega \) is a function \(\mu \in \sigma (\varOmega ) \rightarrow [0, 1]\) such that

$$\begin{aligned} \mu (\varOmega )=1 \quad \text {and}\quad \mu (\cup _{i \in I}\, B_i) = \sum _{i \in I}\, \mu (B_i) \end{aligned}$$

for any countable index set I and pairwise disjoint measurable sets \(B_i\subseteq \varOmega \). \(\mathrm {Meas}(\varOmega )\) is the set of probability measures over \(\varOmega \). Each \(\mu \in \mathrm {Distr}(\varOmega )\) induces a probability measure, and we also write \({\mathcal {D}}(\cdot ) \) for the Dirac measure.

2.3 Valuations

\(\textit{Val} := V \rightarrow {\mathbb {R}}^{+}_{0} \) is the set of valuations for an (implicit) set V of (nonnegative real-valued) variables. Valuation \(\mathbf 0 \) assigns value zero to all variables. Given \(X\subseteq V\) and \(v \in \textit{Val}\), we write \(v[X \mapsto 0]\) for the valuation defined by \(v[X \mapsto 0](x) = 0\) if \(x \in X\) and \(v[X \mapsto 0](y) = v(y)\) otherwise. For \(t \in {\mathbb {R}}^{+}_{0} \), \(v + t\) is the valuation defined by \((v + t)(x) = v(x) + t\) for all \(x \in V\).
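The two operations on valuations can be sketched in a few lines of Python. This is a minimal illustration under the assumption that a valuation is stored as a dictionary from variable names to floats; the helper names `zero`, `reset`, and `advance` are hypothetical and do not appear elsewhere in the paper.

```python
def zero(variables):
    """The valuation 0: every variable is mapped to zero."""
    return {x: 0.0 for x in variables}


def reset(v, X):
    """v[X -> 0]: variables in X become zero, all others keep their value."""
    return {x: (0.0 if x in X else value) for x, value in v.items()}


def advance(v, t):
    """v + t: every variable grows by the same nonnegative amount t."""
    if t < 0:
        raise ValueError("time can only advance by a nonnegative amount")
    return {x: value + t for x, value in v.items()}


# Let 1.5 time units pass, then reset only variable "x".
v = advance(zero({"x", "y"}), 1.5)
w = reset(v, {"x"})
```

Both operations return fresh dictionaries, mirroring the fact that \(v[X \mapsto 0]\) and \(v + t\) denote new valuations rather than updates of v.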

3 Automata with stochastic time

We now present the formal automata-based models underlying our model-based testing approaches: Markov automata for memoryless time and stochastic automata for general stochastic time. In addition to their syntax and semantics (in terms of paths, traces and trace distributions), we define parallel composition operators to formally capture the interaction between implementations and test cases.

3.1 Markov automata

Our approach to testing memoryless stochastic-timed systems builds upon the framework of Markov automata [17]. They are a formal model that unifies the discrete probabilistic and nondeterministic choices of Markov decision processes (MDP) with the exponentially distributed delays of continuous-time Markov chains (CTMC) in a compositional way. The exponential distribution provides an appropriate approximation of reality if only the mean durations of activities are known, as is often the case in practice.

In Markov automata, we distinguish between probabilistic and Markovian transitions. The former take place as soon as possible and lead into a probability distribution over successor states (as in MDP). The latter are defined via a rate parameter in \({\mathbb {R}}^+\): the time until the transition is taken follows the exponential distribution with that rate (as in CTMC).

Definition 1

(IOMA) An input–output Markov automaton (IOMA) is a tuple

$$\begin{aligned} {\mathcal {M}}= \langle S, s_{0}, \textit{Act}, T_P, T_M \rangle \end{aligned}$$

where

  • S is a finite set of states,

  • \(s_0 \in S\) is the initial state,

  • \(\textit{Act}= \textit{Act}_I \uplus \textit{Act}_O \uplus \{\, \tau \,\}\) is the set of actions partitioned into inputs, outputs, and the internal action \(\tau \), respectively, with \(\delta \in \textit{Act}_O\) being the distinct quiescence action,

  • \(T_P \in S \rightarrow {\mathcal {P}}({\textit{Act}\times \mathrm {Distr}(S)}) \) is the finite probabilistic transition function, and

  • \(T_M \in S \rightarrow {\mathcal {P}}({{\mathbb {R}}^+ \times S}) \) is the finite Markovian transition function.

If \(\langle \lambda , s' \rangle \in T_M(s)\), we say that \(\langle s, \lambda , s' \rangle \) is a (Markovian) transition (of \({\mathcal {M}}\)), also written . If \(\langle a, \mu \rangle \in T_P(s)\), we say that \(\langle s, a, \mu \rangle \) is a (probabilistic) transition (of \({\mathcal {M}}\)), also written \(s \xrightarrow {a} \mu \). We say that s is Markovian if \(|T_M(s)| \ne 0\); s is probabilistic if \(|T_P(s)| \ne 0\). We write \(s \rightarrow a\) if \(\exists \, \mu :s \xrightarrow {a} \mu \), and \(s\not \rightarrow a\) if \(\not \exists \, \mu :s \xrightarrow {a} \mu \). In the former case, we also say that action a is enabled in s. The set \(\textit{enabled}(s)\) contains all enabled actions in s. We write \(s\xrightarrow {a}_{\!\!\!{\mathcal {M}}} \mu \), etc., to clarify that a transition belongs to IOMA \({\mathcal {M}}\) if ambiguities arise. For brevity, whenever we refer to an IOMA \({\mathcal {M}}\), we assume it to be a tuple with components \(\langle S, s_{0}, \textit{Act}, T_P, T_M \rangle \) as in the above definition unless otherwise noted. \({\mathcal {M}}\) is input-enabled if all inputs are enabled in all states, i.e. we have that \(\forall \, a \in \textit{Act}_I, s \in S :s \rightarrow a\).

We partition the action alphabet into inputs and outputs. This captures communication ports of a system with its environment (e.g. a tester). \(\tau \) represents internal progress of a system that is not visible to an external observer. The existence of a distinct quiescence action \(\delta \) is required to explicitly characterise the absence of any other output for an indefinite amount of time. The combination of exponentially distributed delays and quiescence poses a particular challenge to an MBT framework since quiescence in practice is frequently judged by waiting a finite amount of time [5]. We further investigate this challenge in Sect. 5.2.

Fig. 1 Protocol specification IOMA and two erroneous implementations

A Markov automaton starts in its initial state and then progresses through the state space, incurring exponentially distributed delays and jumping between states. When in state s, the next transition to take is selected as follows: if there is an outgoing probabilistic transition labelled with an action in \(\textit{Act}_O \cup \{\, \tau \,\}\), we apply the maximal progress assumption [27]: no time can pass, and one of these transitions is selected nondeterministically. We also say that outputs and internal actions are urgent. Otherwise, time passes until a Markovian transition takes place or an input arrives. The sum of the rates of all outgoing Markovian transitions of s is called its exit rate, denoted \(\mathbf E \left( s\right) \). Multiple Markovian transitions represent a race between exponential distributions. Thus, the time until any Markovian transition takes place is exponentially distributed with rate \(\mathbf E \left( s\right) \); at that point, the actual transition to take is selected probabilistically, with the probability of each transition being its rate divided by \(\mathbf E \left( s\right) \). We define \(\mathbf R \left( s,s'\right) = \sum _{\langle \lambda , s' \rangle \in T_M(s)} \lambda \), the rate from s to \(s'\).
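The race between Markovian transitions can be sketched as follows. The Python code below is an illustration under our own assumptions: the outgoing Markovian transitions of a state are given as a list of `(rate, target)` pairs (a list rather than a set, purely for convenience), and the function names as well as the example rates are hypothetical.

```python
import random


def exit_rate(markovian):
    """E(s): sum of the rates of all outgoing Markovian transitions."""
    return sum(rate for rate, _ in markovian)


def rate_to(markovian, target):
    """R(s, s'): accumulated rate of all transitions leading to target."""
    return sum(rate for rate, t in markovian if t == target)


def branching_probabilities(markovian):
    """Probability of each successor winning the race: R(s, s') / E(s)."""
    total = exit_rate(markovian)
    return {t: rate_to(markovian, t) / total for _, t in markovian}


def sample_step(markovian, rng):
    """Sample the sojourn time (exponential with rate E(s)) and the successor."""
    delay = rng.expovariate(exit_rate(markovian))
    probabilities = branching_probabilities(markovian)
    targets = list(probabilities)
    weights = [probabilities[t] for t in targets]
    successor = rng.choices(targets, weights=weights)[0]
    return delay, successor


# Hypothetical state with three Markovian transitions, two of them to s1.
T_M_s = [(2.0, "s1"), (1.0, "s2"), (1.0, "s1")]
```

Sampling the delay from the exit rate first and the successor second is equivalent to sampling one exponential delay per transition and picking the minimum; we use the former because it matches the description above directly.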

Example 1

Figure 1 shows three IOMA describing a protocol that associates a delay with every send action, followed by an acknowledgement or error. As a convention, we indicate inputs by a ? suffix and outputs by a ! suffix. Discrete probability distributions follow an intermediate dot. Markovian transitions are presented as wavy arrows.

After the send? input is received by the specification in Fig. 1a, there is an exponentially distributed delay with rate \(\lambda _1\): the probability to go from \(s_1\) to \(s_2\) in at most T time units is \(1-\hbox {e}^{-\lambda _1 T}\). State \(s_2\) has one probabilistic transition. The specification requires that only \(10\%\) of all messages end in an error report and the remaining \(90\%\) are delivered correctly. After a message is delivered, the automaton goes back to its initial state where it stays quiescent until input is provided. The \(\delta \) self-loop marks the absence of outputs.

The “unfair” implementation model in Fig. 1b has the same structure, except for altered probabilities in the distribution out of \(s_2\). While the delay conforms to the one prescribed in the specification model, sufficiently many executions of the implementation should reveal that an error is reported more frequently than required. The “slow” implementation model of Fig. 1c assigns rate \(\lambda _2\) to the exponential delay between input and output. This is conforming iff \(\lambda _1=\lambda _2\); if \(\lambda _2 < \lambda _1\), it would be slower than required. This paper aims at establishing an MBT framework capable of identifying that implementations like these two do not conform to the given specification model.
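The effect of a smaller rate can be quantified directly from the exponential distribution. The following Python sketch evaluates the probability \(1-\hbox {e}^{-\lambda T}\) of leaving \(s_1\) within T time units for two rates. The concrete values chosen for \(\lambda _1\), \(\lambda _2\), and T are assumptions made for illustration; the example above does not fix them.

```python
import math


def prob_delay_at_most(rate, T):
    """P(delay <= T) for an exponentially distributed delay with the given rate."""
    return 1.0 - math.exp(-rate * T)


lambda_1 = 2.0  # hypothetical rate of the specification
lambda_2 = 1.0  # hypothetical rate of a "slow" implementation
T = 1.0         # hypothetical observation deadline

p_spec = prob_delay_at_most(lambda_1, T)
p_slow = prob_delay_at_most(lambda_2, T)

# The mean of an exponential distribution is the reciprocal of its rate.
mean_spec = 1.0 / lambda_1
mean_slow = 1.0 / lambda_2
```

With \(\lambda _2 < \lambda _1\), the slow variant is less likely to respond before any fixed deadline and has a larger mean delay, which is the kind of deviation a statistical evaluation of observed delays is meant to expose.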

3.2 Stochastic automata

We use stochastic automata [11] to develop an MBT approach for general stochastic-timed systems. They are MDP augmented with real-time clocks that expire after delays governed by general (continuous) probability distributions. In this way, they allow every stochastic delay to be modelled precisely, without the need for exponential or phase-type approximation as with Markov automata.

The progress of time is governed and tracked across locations and edges explicitly by clocks. This is necessary because, working in general continuous time not restricted to exponential distributions, delays in stochastic automata do not have the memoryless property. Clocks are real-valued variables that increase synchronously with rate 1 over time and expire some random amount of time after they have been restarted. The expiration time is drawn from a probability distribution specified for each clock. Stochastic automata are thus a symbolic model, so they consist of locations and edges rather than states and transitions.

Definition 2

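(IOSA) An input–output stochastic automaton (IOSA) is a tuple

$$\begin{aligned} {\mathcal {I}}= \langle \textit{Loc}, \ell _0, {\mathcal {C}}, \textit{Act}, E, F \rangle \end{aligned}$$

where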

  • \(\textit{Loc} \) is a finite set of locations,

  • \(\ell _0 \in \textit{Loc} \) is the initial location,

  • \({\mathcal {C}} \) is a finite set of clocks,

  • \(\textit{Act}= \textit{Act}_I \uplus \textit{Act}_O \uplus \{\, \tau \,\}\) is the set of actions partitioned into inputs, outputs, and the internal action \(\tau \), respectively, with \(\delta \in \textit{Act}_O\) being the distinct quiescence action,

  • \(E \in \textit{Loc} \rightarrow {\mathcal {P}}({ \textit{Edges}}) \) with \(\textit{Edges} := {\mathcal {P}}({{\mathcal {C}}}) \times \textit{Act}\times \mathrm {Distr}(\textit{T})\) and \(\textit{T} := {\mathcal {P}}({{\mathcal {C}}}) \times \textit{Loc} \) is the edge function mapping each location to a finite set of edges that in turn consist of a guard set, an action label, and a distribution over targets in \(\textit{T} \) consisting of a restart set of clocks and target locations, and

  • \(F\in {\mathcal {C}} \rightarrow \mathrm {Meas}({\mathbb {R}}^{+}_{0})\) is the delay measure function that maps each clock to a probability measure.

We write \(\textit{pdf}(c)\) to refer to the probability density function associated with the measure F(c) for \(c \in {\mathcal {C}} \). As for Markov automata, we use an input–output variant of stochastic automata, along the lines of [12]. We transfer the notation used for transitions in IOMA to edges in IOSA. We call an IOSA \({\mathcal {I}}\) input-enabled if all inputs are available in every location at every time, i.e. \(\exists \, \mu :\ell \xrightarrow {\varnothing , a_I} \mu \) for all \(\ell \in \textit{Loc} \) and \(a_I \in \textit{Act}_I\).

Intuitively, a stochastic automaton starts in the initial location with all clocks expired. An edge may be taken only if all clocks in its guard set G are expired. If any output or internal edge is enabled, some edge must be taken, i.e. all outputs and internal actions are urgent. When an edge \(\ell \xrightarrow {G, a } \mu \) is taken, its action is a, we select a target \(\langle R, \ell ' \rangle \in \textit{T} \) randomly according to the discrete distribution \(\mu \), all clocks in R are restarted, and we move to successor location \(\ell '\). There, another edge may be taken immediately or we may need to wait until some further clocks expire, and so on. When a clock c is restarted, the time until it expires is chosen randomly according to the probability measure F(c).

Fig. 2 File server specification and implementation IOSA

Example 2

Figure 2a shows an example IOSA specifying the behaviour of a file server with archival storage. We omit empty restart sets and the empty guard sets of inputs. Upon receiving a request in the initial location \(\ell _0\), the specification allows implementations to either move to \(\ell _1\) or \(\ell _2\). The edge, i.e. the element of \(E (\ell _0)\), corresponding to the move to \(\ell _1\) is \(\langle \varnothing , \texttt {req?}, {\mathcal {D}}(\langle \{\,x\,\}, \ell _1 \rangle ) \rangle \), where \(\varnothing \) is the edge’s empty guard set; it must be empty since req? is an input. The move to \(\ell _2\) represents the case of a file in archive: the server must immediately deliver a wait! notification and then attempt to retrieve the file from the archive. Clocks y and z are restarted, and used to specify that retrieving the file shall take on average \(\frac{1}{3}\) of a time unit, exponentially distributed, but no more than 5 time units. In location \(\ell _3\), there is thus a race between retrieving the file and a deterministic timeout. In case of timeout, an error message (action err!) is returned; otherwise, the file can be delivered as usual from location \(\ell _1\). Clock x is used to specify the transmission time of the file: it shall be uniformly distributed between 0 and 1 time units.

In Fig. 2b, we show an implementation of this specification. On average, one out of ten files, selected at random, has to be fetched from the archive. This is allowed by the specification: it is one particular (randomised) resolution of the nondeterminism, i.e. underspecification, defined in \(\ell _0\). The implementation also manages to transmit files from archive directly while fetching them, as evidenced by the direct edge from \(\ell _3\) back to \(\ell _0\) labelled file!. This violates the timing prescribed by the specification, and must be detected by an MBT procedure for IOSA.
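The timing prescribed by the specification for archived files can be explored with a small simulation. The Python sketch below follows the textual description only: retrieval is exponentially distributed with mean \(\frac{1}{3}\) (rate 3), the timeout is a deterministic 5 time units, and transmission is uniform on [0, 1]. The function names are hypothetical, and which of the clocks y and z plays which role is deliberately left open.

```python
import math
import random

RETRIEVAL_RATE = 3.0  # exponential retrieval time with mean 1/3
TIMEOUT = 5.0         # deterministic timeout


def timeout_probability(rate=RETRIEVAL_RATE, timeout=TIMEOUT):
    """Probability that the exponential retrieval loses the race to the timeout."""
    return math.exp(-rate * timeout)


def archive_request(rng):
    """Simulate one archived request, timed from the restart of the two clocks.

    Returns the observed output action and the total elapsed time.
    """
    retrieval = rng.expovariate(RETRIEVAL_RATE)
    if retrieval >= TIMEOUT:
        return "err!", TIMEOUT
    transmission = rng.uniform(0.0, 1.0)
    return "file!", retrieval + transmission
```

Under the specification, a file from the archive always arrives after retrieval plus transmission. An implementation that skips the transmission delay, as the one in Fig. 2b does, produces a different delay distribution for file!, even though every single observed delay is individually possible.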

In the remainder of this paper, whenever a statement applies to both IOMA and IOSA, we will say that it applies to an automaton \({\mathcal {A}}\) for brevity.

3.3 Parallel composition

To give a semantics for synchronisation and communication between components of a system, we define a binary parallel composition operator. Two components synchronise on inputs and outputs, and otherwise evolve independently. Our operators are defined w.r.t. a binary input–output relation M that associates outputs of one component with inputs of the other component, and vice versa. Wherever we use the !/?-suffix convention for action labels, we assume that M relates every output \(a\texttt {!}\) with the input \(a\texttt {?}\) and vice versa.

Markov automata. IOMA interact via probabilistic transitions, while Markovian transitions evolve independently, with the single technical exception of Markovian self-loops:

Definition 3

(parallel composition, IOMA) For two IOMA

$$\begin{aligned} {\mathcal {M}}_i = \langle S_i, s_{0_i}, \textit{Act}_i, T_P^i, T_M^i \rangle , \end{aligned}$$

\(i \in \{\,1, 2\,\}\), and an input–output relation

$$\begin{aligned} M \subseteq (\textit{Act}_{O_1} \times \textit{Act}_{I_2}) \cup (\textit{Act}_{I_1} \times \textit{Act}_{O_2}), \end{aligned}$$

the parallel composition of \({\mathcal {M}}_1\) and \({\mathcal {M}}_2\) w.r.t. M is

$$\begin{aligned} {\mathcal {M}}_1 \!\parallel \!_M {\mathcal {M}}_2 := \langle S_1 \times S_2, \langle s_{0_1}, s_{0_2} \rangle , \textit{Act}, T_P, T_M \rangle \end{aligned}$$

with \(\textit{Act} := \textit{Act}_I \uplus \textit{Act}_O \uplus \{\, \tau \,\}\), \(\textit{Act}_O = \textit{Act}_{O_1} \cup \textit{Act}_{O_2}\), and

$$\begin{aligned} \textit{Act}_I := (\textit{Act}_{I_1} \cup \textit{Act}_{I_2}) {\setminus } ( \sqcap ^{\textit{Act}_{I_1}}_{\textit{Act}_{O_2}}(M) \cup \sqcap ^{\textit{Act}_{I_2}}_{\textit{Act}_{O_1}}(M^{-1}) ) \end{aligned}$$

where \(\sqcap ^{I}_{O}(M)\) are the inputs in I that are matched to an output in O by M:

$$\begin{aligned} \sqcap ^{I}_{O}(M) := \{\, a_I \in I \mid \exists \, a_O \in O :\langle a_I, a_O \rangle \in M \,\}. \end{aligned}$$

The transition functions \(T_P\) and \(T_M\) are the smallest functions satisfying the inference rules given in Fig. 3 plus symmetric rules \(\textit{indep}_2\), \(\textit{sync}_2\), \(\textit{mar}_2\), and \(\textit{marloop}_2\) for the corresponding independent steps, synchronising outputs, Markovian transitions, and Markovian loops of \({\mathcal {M}}_2\).

Fig. 3 Inference rules for IOMA parallel composition

In the action alphabet, only those inputs carry over that do not have a synchronising output in the other component associated with them via M. If \(s_1 \rightarrow _{{\mathcal {M}}_1} a_1\) and \(\langle a_1, a_2 \rangle \in M\), an \(a_1\)-labelled transition can only take place in synchronisation with an \(a_2\)-labelled transition from the second component (assuming no other action is associated with \(a_1\) by M). In particular, if \(s_2 \not \rightarrow _{{\mathcal {M}}_2} a_2\), then \(\langle s_1, s_2 \rangle \) has no \(a_1\)-\(a_2\)-synchronising transition: synchronisation waits for all partners to be ready. We later restrict to input-enabled models to make sure that outputs cannot be prevented from occurring immediately.
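The construction of the composed action alphabet can be sketched in Python as follows. The code is an illustration under our own assumptions: actions are plain strings, the relation M is a set of pairs, and the component alphabets in the example (including the unmatched input `reset?`) are hypothetical.

```python
def matched_inputs(inputs, outputs, relation):
    """Inputs from `inputs` that `relation` pairs with some output in `outputs`.

    The relation is read as a set of (input, output) pairs; pairs of any
    other shape are ignored.
    """
    return {a_i for (a_i, a_o) in relation if a_i in inputs and a_o in outputs}


def inverse(relation):
    """M^{-1}: swap the two components of every pair."""
    return {(b, a) for (a, b) in relation}


def composed_alphabet(in1, out1, in2, out2, M):
    """Inputs and outputs of the parallel composition w.r.t. M."""
    act_o = out1 | out2
    synchronised = (matched_inputs(in1, out2, M)
                    | matched_inputs(in2, out1, inverse(M)))
    act_i = (in1 | in2) - synchronised
    return act_i, act_o


# Hypothetical component and tester alphabets using the !/? convention.
in1, out1 = {"send?", "reset?"}, {"ack!", "err!"}
in2, out2 = {"ack?", "err?"}, {"send!"}
M = {("ack!", "ack?"), ("err!", "err?"), ("send?", "send!")}
act_i, act_o = composed_alphabet(in1, out1, in2, out2, M)
```

In this example only `reset?` remains an input of the composition, since every other input has a synchronising output in the other component.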

Stochastic automata. The definition of parallel composition for IOSA is similar: while there are no Markovian transitions, the synchronisation of probabilistic edges now requires building the unions of the involved guard and restart sets. This means that a synchronising edge in the parallel composition takes place only once both of its constituent edges are enabled: synchronisation partners wait, just as in IOMA.

Definition 4

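(parallel composition, IOSA) For two IOSA

$$\begin{aligned} {\mathcal {I}}_i = \langle \textit{Loc} _i, \ell _{0_i}, {\mathcal {C}} _i, \textit{Act}_i, E _i, F_i \rangle , \end{aligned}$$

\(i \in \{\, 1, 2 \,\}\), with \({\mathcal {C}} _1 \cap {\mathcal {C}} _2 = \varnothing \) and an input–output relation M as in Definition 3, the parallel composition of \({\mathcal {I}}_1\) and \({\mathcal {I}}_2\) w.r.t. M is

$$\begin{aligned} {\mathcal {I}}_1 \!\parallel \!_M {\mathcal {I}}_2 := \langle \textit{Loc} _1 \times \textit{Loc} _2, \langle \ell _{0_1}, \ell _{0_2} \rangle , {\mathcal {C}} _1 \cup {\mathcal {C}} _2, \textit{Act}, E, F_1 \cup F_2 \rangle \end{aligned}$$

with \(\textit{Act}\) as in Definition 3 and E being the smallest function satisfying the inference rules given in Fig. 4, plus symmetric rules for the corresponding steps of \({\mathcal {I}}_2\).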

Fig. 4 Inference rules for IOSA parallel composition

3.4 Qualitative semantics

The non-probabilistic aspects of the semantics of IOMA and IOSA are captured in the notion of a path, which precisely represents a single execution of an automaton.

3.4.1 Paths

A concrete execution of an automaton—the exact amount of time spent in each state, the transition/edge taken, and the selected successor state/location—is captured by a path.

Markov automata. The definition of paths for IOMA is based on the automaton’s states and transitions:

Definition 5

(path, IOMA) The set of all paths of an IOMA \({\mathcal {M}}\) is

$$\begin{aligned} \textit{paths}({\mathcal {M}}) \subseteq S \times ({\mathbb {R}}^{+}_{0} \times T \times \{\,\varnothing \,\} \times S)^{\le \omega }, \end{aligned}$$

with \(T := (\textit{Act}\times \mathrm {Distr}(S)) \cup {\mathbb {R}}^+ \) serving to characterise transitions, and contains precisely the sequences \(\pi \) of the form

$$\begin{aligned} \pi =s_0 \, t_1 \, \alpha _1 \, \varnothing \, s_1 \, t_2 \, \alpha _2 \, \varnothing \ldots \end{aligned}$$

where, for all applicable \(i \ge 1\), for the \(\alpha _i \in T\) we have that either \(\alpha _i = \langle a_i, \mu _i \rangle \in \textit{Act}\times \mathrm {Distr}(S)\) such that

$$\begin{aligned} \langle a_i, \mu _i \rangle \in T_P(s_{i-1}) \wedge \mu _i(s_i) > 0, \end{aligned}$$

i.e. \(\alpha _i\) is a probabilistic transition, or \(\alpha _i = \lambda _i \in {\mathbb {R}}^+ \) with \(\langle \lambda _i, s_i \rangle \in T_M({s_{i-1}})\), i.e. it is a Markovian transition.

By definition, every finite path ends in a state, and for every non-final state \(s_i\), either \(s_{i} \xrightarrow {a_{i + 1}} \mu _{i + 1}\) or \(\langle \lambda _{i+1}, s_{i+1} \rangle \in T_M(s_i)\). A subsequence \(s_{i-1}\, t_{i}\, \alpha _i\, \varnothing \, s_{i}\) means that \({\mathcal {M}}\) resided \(t_i\) time units in state \(s_{i-1}\) before moving to \(s_{i}\) via \(\alpha _i\). The empty sets \(\varnothing \) are for consistent notation with paths for IOSA (see below).
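The conditions of Definition 5 can be checked mechanically for a finite candidate path. The following Python sketch does so under several assumptions of ours: the transition functions are dictionaries from states to lists, a step is a triple of sojourn time, transition descriptor, and successor state, and the small model is hypothetical and only loosely inspired by Example 1. The placeholder \(\varnothing \) component is omitted, and the sketch does not check the maximal progress assumption.

```python
def is_ioma_path(start, steps, s0, T_P, T_M):
    """Check the conditions of Definition 5 for a finite candidate path.

    steps is a list of (t, alpha, successor) triples, where alpha is either
    ("prob", action, mu) for a probabilistic transition or ("mar", rate)
    for a Markovian transition.
    """
    if start != s0:
        return False
    previous = start
    for t, alpha, successor in steps:
        if t < 0:
            return False
        if alpha[0] == "prob":
            _, action, mu = alpha
            if (action, mu) not in T_P.get(previous, []):
                return False
            if mu.get(successor, 0) <= 0:
                return False
        elif alpha[0] == "mar":
            _, rate = alpha
            if (rate, successor) not in T_M.get(previous, []):
                return False
        else:
            return False
        previous = successor
    return True


# Hypothetical model: an input, a Markovian delay, then a probabilistic choice.
to_s1 = {"s1": 1.0}
outcome = {"s3": 0.9, "s4": 0.1}
T_P = {"s0": [("send?", to_s1)], "s2": [("tau", outcome)]}
T_M = {"s1": [(2.0, "s2")]}

good = [(0.4, ("prob", "send?", to_s1), "s1"),
        (0.3, ("mar", 2.0), "s2"),
        (0.0, ("prob", "tau", outcome), "s3")]
bad = [(0.4, ("prob", "send?", to_s1), "s2")]
```

The sequence `bad` is rejected because its successor state has probability zero under the chosen distribution.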

Stochastic automata IOSA comprise real-valued clocks; to define a path through an IOSA \({\mathcal {I}}\), we need to keep track of their values and expiration times. We do so by defining the state of \({\mathcal {I}}\) to include these values: the set of states of an IOSA \({\mathcal {I}}\) is \(S := \textit{Loc} \times \textit{Val}\times \textit{Val}\). Each state \(\langle \ell , v, x \rangle \in S\) consists of the current location \(\ell \) and the values v and expiration times x of all clocks. Consequently, the state space of an IOSA is uncountably infinite.

Definition 6

(path, IOSA) Let us define the predicate

$$\begin{aligned} {\mathrm {Ex}}(G, v, x) := \forall \, c \in G :v(c) \ge x(c) \end{aligned}$$

that indicates whether all clocks in G are expired. Then, the set of all paths of an IOSA \({\mathcal {I}}\) is

$$\begin{aligned} \textit{paths}({\mathcal {I}}) \subseteq S \times ({\mathbb {R}}^{+}_{0} \times \textit{Edges} \times {\mathcal {P}}({{\mathcal {C}}}) \times S)^{\le \omega } \end{aligned}$$

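and contains precisely the sequences \(\pi \) of the form

$$\begin{aligned} \pi = \langle \ell _0, v_0, x_0 \rangle \, t_1 \, \langle G_1, a_1, \mu _1 \rangle \, R_1 \, \langle \ell _1, v_1, x_1 \rangle \, t_2 \, \langle G_2, a_2, \mu _2 \rangle \, R_2 \ldots \end{aligned}$$

where \(\ell _0\) is the initial location, \(v_0 = x_0 = \mathbf 0 \) and, for all applicable \(i \ge 1\), we have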

  • \(\ell _{i-1} \xrightarrow {G_i, a_i} \mu _i\),

  • \(v_i = (v_{i-1} + t_i)[R_i \mapsto 0]\),

  • \({\mathrm {Ex}}(G_i, v_{i-1} + t_i, x_{i-1})\) is satisfied,

  • \(\mu _i(\langle R_i, \ell _i \rangle ) > 0\),

  • the expiration times satisfy

    $$\begin{aligned} \begin{aligned} x_i \in \{\, x \in \textit{Val}\mid \, \forall \, c \in {\mathcal {C}} {\setminus } R_i&:x(c) = x_{i-1}(c)\\ \wedge \, \forall \, c \in R_i&:x(c) \ge 0 \,\}, \end{aligned} \end{aligned}$$

  • and if \(a_i \notin \textit{Act}_I\), then additionally

    $$\begin{aligned} \not \exists \,t' \in [0,t_i[:\, \exists \, \ell _{i-1} \xrightarrow {G, a} \mu :{\mathrm {Ex}}(G, v_{i-1} + t', x_{i-1}). \end{aligned}$$

The last condition implements the urgency of outputs and internal actions. We require that every path starts in the initial location with all clocks and expiration times set to zero. An edge may only be taken if all clocks in its guard set are expired (which is the case when predicate \({\mathrm {Ex}}\) is satisfied). The clock values in the successor state are obtained by advancing all clocks by the elapsed delay and then resetting exactly the clocks in the restart set \(R_i\) to zero; the restarted clocks also receive new expiration times. All other clocks keep their advanced value and their expiration time.
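The clock bookkeeping of Definition 6 can be illustrated with a small Python sketch. It is not part of the formal development: the function names, the dictionary encoding of valuations, and the helper `minimal_delay` (the earliest point at which some guard is expired, as referred to by the urgency condition) are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical encoding: a valuation maps clock names to values or expiration times.
Valuation = Dict[str, float]


def expired(guard: Iterable[str], v: Valuation, x: Valuation) -> bool:
    """Predicate Ex(G, v, x): every clock in the guard has reached its expiration time."""
    return all(v[c] >= x[c] for c in guard)


def enabling_delay(guard: Iterable[str], v: Valuation, x: Valuation) -> float:
    """Smallest t >= 0 such that Ex(G, v + t, x) holds."""
    return max([0.0] + [x[c] - v[c] for c in guard])


def minimal_delay(guards: List[Set[str]], v: Valuation, x: Valuation) -> float:
    """Earliest delay at which any outgoing edge is enabled (raises on a deadlock)."""
    return min(enabling_delay(g, v, x) for g in guards)


def successor(
    v: Valuation, x: Valuation, t: float, restart: Set[str], sampled: Valuation
) -> Tuple[Valuation, Valuation]:
    """Advance all clocks by t, then reset the restarted clocks and
    give them the freshly sampled expiration times."""
    v_next = {c: (0.0 if c in restart else val + t) for c, val in v.items()}
    x_next = {c: (sampled[c] if c in restart else exp) for c, exp in x.items()}
    return v_next, x_next


# Two clocks, two outgoing edges guarded by {c1} and {c2}, respectively.
v0 = {"c1": 0.0, "c2": 0.0}
x0 = {"c1": 1.5, "c2": 4.0}
t1 = minimal_delay([{"c1"}, {"c2"}], v0, x0)
v1, x1 = successor(v0, x0, t1, {"c1"}, {"c1": 2.0})
```

In this example, the edge guarded by c1 becomes enabled first; taking it restarts c1 with the (here hard-coded) new expiration time 2.0, while c2 keeps its expiration time and its advanced value.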

We write \(\textit{last}(\pi )\) to denote the last state of a finite path. We write \(\pi '\sqsubseteq \pi \) if \(\pi '\) is a prefix of \(\pi \). The set of all finite paths of an automaton \({\mathcal {A}}\) is \(\textit{paths}^{\textit{fin}}({\mathcal {A}})\). The set of complete paths, denoted \(\textit{paths}^{\textit{com}}({\mathcal {A}})\), contains every path ending in a deadlock, i.e. in a state s where \(T_P(s) = T_M(s) = \varnothing \) (for IOMA) or a location \(\ell \) where \(E(\ell ) = \varnothing \) (for IOSA).

3.4.2 Traces

A trace is the projection of a path to its delays and actions, recording the path’s visible behaviour:

Definition 7

(trace) The trace of \(\pi \) is

$$\begin{aligned} \textit{tr}(\pi ) \in ({\mathbb {R}}^{+}_{0} \times \textit{Act}{\setminus } \{\, \tau \,\})^{\le \omega } \end{aligned}$$

given as the projection of

$$\begin{aligned} \pi =s_0 \, t_1 \, \alpha _1 \, R_1 \, s_1 \, t_2 \, \alpha _2 \, R_2 \ldots \end{aligned}$$

to the \(t_i\) and the actions \(a_i \ne \tau \) of those \(\alpha _i\) that are of the form \(\langle a_i, \mu _i \rangle \in \textit{Act}\times \mathrm {Distr}(S)\) for IOMA or \(\langle G_i, a_i, \mu _i \rangle \in \textit{Edges} \) for IOSA, where the delays \(t_i\) of all consecutive steps whose \(\alpha _i\) is of another form (i.e. internal and Markovian transitions for IOMA and internal edges for IOSA) are added to the delay of the next visible action. The length of \(\pi \), denoted \(|\pi |\), is the number of actions in \(\textit{tr}(\pi )\). The set \(\textit{tr}^{-1}(\sigma )\) is the set of all paths that have trace \(\sigma \). The set of all traces of an automaton \({\mathcal {A}}\) is \(\textit{traces}({\mathcal {A}})\), while \(\textit{traces}^{\textit{fin}}({\mathcal {A}})\) is the set of all of its finite traces. Finally, \(\textit{traces}^{\textit{com}}({\mathcal {A}})\) is the set of all its complete traces, i.e. those \(\sigma \) for which \(\textit{tr}^{-1}(\sigma )\) contains at least one complete path.

3.4.3 Abstract traces

When delays are governed by continuous probability distributions, the probability of any single time point is zero. Hence, we will need a notion that represents an automaton’s behaviour over time intervals instead of points.

Definition 8

(abstract trace) An abstract trace is a trace where each delay \(t_i\) is replaced by an interval \(I_i\subseteq {\mathbb {R}}^{+}_{0} \) with \(t_i \in I_i\).

W.l.o.g. we only consider non-empty intervals of the form \(\left[ 0,t\right] \) in the remainder of this paper. Consequently, every trace can be replaced by its abstract trace by changing all \(t_i\) to \(\left[ 0,t_i\right] \) and vice versa, defining a bijection between traces and their abstract counterparts. Hence, for a trace \(\sigma \) we denote by \(\varSigma \) its corresponding abstract trace. \(\textit{AbsTraces}({\mathcal {A}})\) is the set of all abstract traces of automaton \({\mathcal {A}}\), and \(\textit{AbsTraces}^{\textit{fin}}({\mathcal {A}})\) is the set of all its finite abstract traces. For \(\varSigma \) and \(\varSigma '\) with \(\varSigma = I_1\, a_1\, I_2\, a_2 \ldots a_n\) and \(\varSigma ' = I'_1\, a'_1\, I'_2\, a'_2\ldots \), we say \(\varSigma \) is a prefix of \(\varSigma '\), denoted \(\varSigma \sqsubseteq \varSigma '\), if \(I_i = I'_i\) and \(a_i = a_i'\) for \(i = 1, 2, \ldots , n\). That is, \(\varSigma \) and \(\varSigma '\) coincide on the first n steps. Finally, we define \(\textit{act}\left( \sigma \right) \) as the action trace of \(\sigma \), obtained by removing all time values \(t_i\) from \(\sigma \), i.e. \(\textit{act}\left( \sigma \right) \) consists of actions in \(\textit{Act}{\setminus } \{\, \tau \,\}\) only.

Fig. 5 Example IOMA for paths and traces

Example 3

Consider the IOMA \({\mathcal {M}}\) given in Fig. 5. Let the three Dirac distributions of the transitions labelled \(\tau \), a?, and b? be \(\mu _\tau ,\mu _a\) and \(\mu _b\), respectively. For the path

$$\begin{aligned} \pi = s_0 ~ 2.9 ~ 3 ~ \varnothing ~ s_1 ~ 0 ~ \langle \tau , \mu _\tau \rangle ~ \varnothing ~ s_0 ~ 0 ~ \langle \texttt {b?}, \mu _b \rangle ~ \varnothing ~ s_2 \end{aligned}$$

we have \(\pi \in \textit{paths}^{\textit{com}}({\mathcal {M}})\), trace \(\textit{tr}(\pi ) = \sigma = 2.9 ~ \texttt {b?}\), abstract trace \(\varSigma = [0,2.9] ~ \texttt {b?}\), action trace \(\textit{act}\left( \sigma \right) = \texttt {b?}\), and path length \(|\pi |= 1\). Note that the trace is much shorter than the path since it omits the internal \(\tau \) steps and then merges all the delay steps between any two consecutive remaining (i.e. non-\(\tau \)) actions.
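The projections used in Example 3 can be reproduced with a short Python sketch. The list-based encoding of paths (one `(delay, action)` pair per step, with `None` for a Markovian transition and the string `"tau"` for an internal step) is an assumption made for illustration and drops the states and distributions of the path.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of one path step: (delay, action).
# None marks a Markovian transition, "tau" an internal action; both are invisible.
Step = Tuple[float, Optional[str]]
TAU = "tau"


def trace(steps: List[Step]) -> List[Tuple[float, str]]:
    """Project a finite path to its trace, merging the delays of invisible steps
    into the delay of the next visible action. Delays after the last visible
    action do not appear in the trace."""
    result = []
    pending = 0.0  # delay accumulated since the last visible action
    for delay, action in steps:
        pending += delay
        if action is not None and action != TAU:
            result.append((pending, action))
            pending = 0.0
    return result


def abstract_trace(sigma: List[Tuple[float, str]]):
    """Replace each delay t by the interval [0, t], encoded as the pair (0.0, t)."""
    return [((0.0, t), a) for t, a in sigma]


def action_trace(sigma: List[Tuple[float, str]]) -> List[str]:
    """Drop all delays and keep only the visible actions."""
    return [a for _, a in sigma]


# The path of Example 3: a Markovian step after 2.9 time units, then tau, then b?.
pi = [(2.9, None), (0.0, TAU), (0.0, "b?")]
sigma = trace(pi)
```

Here `sigma` is the trace consisting of the single pair of delay 2.9 and action b?, so the path length in the sense of Definition 7 is `len(sigma)`, i.e. 1.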

3.5 Quantitative semantics

Our goal is now to quantify the frequency of observed traces. For this purpose, we first define schedulers, which resolve all nondeterministic choices, and then a probability space and measure over the remaining paths. The space and measure will allow us to specify trace distributions.

3.5.1 Schedulers

IOMA and IOSA comprise nondeterministic choices, discrete probability distributions, and delays following continuous probability distributions. Due to the nondeterminism, we cannot assign probabilities to paths and traces directly. Rather, we resort to schedulers that resolve nondeterminism, and consequently yield a purely probabilistic system. Given any finite history leading to a state/location, a scheduler returns a discrete probability distribution over the set of next transitions/edges. In order to model termination, we define schedulers such that they can continue paths with a halting extension \(\perp \), after which only quiescence is observed.

Definition 9

(scheduler, IOMA) A scheduler of an IOMA \({\mathcal {M}}\) is a function

$$\begin{aligned} {\mathfrak {S}}\in \textit{paths}^{\textit{fin}}({\mathcal {M}}) \rightarrow \mathrm {SubDistr}( \textit{Act}\times \mathrm {Distr}(S) \cup \{\, \perp \,\}) \end{aligned}$$

such that, with \(\textit{last}(\pi ) = s\), \({\mathfrak {S}}(\pi )(\langle a, \mu \rangle ) > 0\) implies \(s \xrightarrow {a} \mu \), and if \(s \xrightarrow {a}\) for some \(a \in \textit{Act}_O \cup \{\, \tau \,\}\) then \(|{\mathfrak {S}}(\pi )|=1\). The probability to halt is \({\mathfrak {S}}\left( \pi \right) \left( \perp \right) \); we say that \({\mathfrak {S}}\) halts on \(\pi \) if \({\mathfrak {S}}\left( \pi \right) \left( \perp \right) =1\), and that \({\mathfrak {S}}\) is of length \(k\in {\mathbb {N}}\) if it halts on all paths \(\pi \) with \(|\pi | \ge k\) and on every complete path of length less than k. The set of all schedulers of \({\mathcal {M}}\) of length k is \(\textit{Sched}({\mathcal {M}})^{\le k}\); the set of all schedulers of finite length is \(\textit{Sched}({\mathcal {M}})\).

The definition of schedulers ensures that only enabled transitions are chosen. We use subdistributions, as opposed to distributions, such that the probability mass a scheduler did not assign to actions in \(\textit{Act}\) is left for Markovian transitions. That is, a scheduler chooses an action, halts immediately (\(\bot \)), or leaves a chance for Markovian actions to take place. Schedulers for IOSA are defined similarly:

Definition 10

(scheduler, IOSA) A scheduler of an IOSA \({\mathcal {I}}\) is a measurable function

$$\begin{aligned} {\mathfrak {S}}\in \textit{paths}^{\textit{fin}}({\mathcal {I}})\rightarrow \mathrm {Distr}(\textit{Edges} \cup \{\,\bot \,\}) \end{aligned}$$

such that, with \(\textit{last}(\pi )= \langle \ell , v, x \rangle \), \({\mathfrak {S}}(\pi )(\langle G, a, \mu \rangle )>0\) implies \(\ell \xrightarrow {G, a} \mu \wedge {\mathrm {Ex}}(G, v + t, x)\) where \(t \in {\mathbb {R}}^{+}_{0} \) is the minimal delay for which no other transition was available before, i.e.

$$\begin{aligned} \not \exists \,t' \in [0,t[:\, \bigvee _{\ell \xrightarrow {G', a'} \mu '} {\mathrm {Ex}}(G', v + t', x). \end{aligned}$$

\({\mathfrak {S}}(\pi )(\bot )\) is the probability to halt. \({\mathfrak {S}}\) halts on \(\pi \) if \({\mathfrak {S}}(\pi )(\bot ) = 1\). \({\mathfrak {S}}\) is of length \(k\in {\mathbb {N}}\) if it halts on all paths \(\pi \) with \(|\pi | \ge k\) and on every complete path of length less than k. The set of all schedulers of \({\mathcal {I}}\) of length k is \(\textit{Sched}({\mathcal {I}})^{\le k}\); the set of all schedulers of finite length is \(\textit{Sched}({\mathcal {I}})\).

A scheduler for an IOSA can only choose between the edges enabled at the points where any edge just became enabled. While actions (via probabilistic transitions) and the passage of time (via Markovian transitions) were decoupled in IOMA, edges in IOSA directly govern delays. Schedulers thus return distributions, not subdistributions.
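For the IOMA case of Definition 9, the constraints on a single scheduler decision \({\mathfrak {S}}(\pi )\) can be checked as in the following Python sketch. The encoding is an assumption made for illustration: a decision is a dictionary from hashable transition identifiers (or the marker `HALT`) to probabilities, and the flag `urgent` states whether an output or internal action is enabled in the last state of the path.

```python
from typing import Dict, Hashable, Set

HALT = "halt"  # hypothetical marker for the halting extension


def is_valid_decision(
    decision: Dict[Hashable, float],
    enabled: Set[Hashable],
    urgent: bool,
    tol: float = 1e-9,
) -> bool:
    """Check one scheduler decision against the constraints of an IOMA scheduler."""
    if any(p < 0.0 for p in decision.values()):
        return False
    # Only enabled probabilistic transitions (or halting) may get positive probability.
    for choice, p in decision.items():
        if p > 0.0 and choice != HALT and choice not in enabled:
            return False
    mass = sum(decision.values())
    # A subdistribution has total mass at most 1 ...
    if mass > 1.0 + tol:
        return False
    # ... and exactly 1 if an output or internal action is enabled, so that
    # no probability mass is left for Markovian transitions.
    if urgent and abs(mass - 1.0) > tol:
        return False
    return True


# A decision that leaves probability 0.7 for Markovian transitions.
partial = {("a?", "mu_a"): 0.3}
ok_without_urgency = is_valid_decision(partial, {("a?", "mu_a")}, urgent=False)
ok_with_urgency = is_valid_decision(partial, {("a?", "mu_a")}, urgent=True)
```

The same decision is accepted when only inputs are enabled and rejected when an output or internal action is enabled, which mirrors the requirement \(|{\mathfrak {S}}(\pi )|=1\) in that case.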

Remark 1

We use schedulers in the context of MBT in an open environment, yet schedule both inputs and outputs. This is in contrast to similar approaches in the literature; for instance, [7] use a partial scheduler for each component and an arbiter scheduler that tells precisely how progress of the composed system is determined. Our approach is non-compositional (see, for example, [44]). However, we utilise schedulers only to determine the probabilities of paths and traces, which does not require compositionality.

For both IOMA and IOSA, we restrict to finite-length schedulers in the remainder of the paper. As is usual, we also consider only schedulers that let time diverge with probability 1.

3.5.2 Probabilities of paths

By resolving all nondeterminism, a scheduler makes it possible to calculate the probability for measurable sets of paths via step probability functions. A scheduler schedules without delay. Hence, there are no additional races between Markovian transitions or edges and scheduler decisions.

Definition 11

(step probability, IOMA) Let \({\mathfrak {S}}\) be a scheduler of an IOMA \({\mathcal {M}}\). We define the step probability function \(Q^{\mathfrak {S}}\) from \(\textit{paths}^{\textit{fin}}({\mathcal {M}})\) to

$$\begin{aligned} \mathrm {Meas}(( {\mathbb {R}}^{+}_{0} \times T \times \{\, \varnothing \,\} \times S )\cup \{\, \bot \,\} ), \end{aligned}$$

with \(T := (\textit{Act}\times \mathrm {Distr}(S)) \cup {\mathbb {R}}^+ \), by \(Q^{\mathfrak {S}}(\pi )(\bot ) = {\mathfrak {S}}(\pi )(\bot )\) and, for \(\pi \) with \(\textit{last}(\pi ) = s\), by

$$\begin{aligned} Q^{\mathfrak {S}}(\pi )(I \times A_Q \times \{\, \varnothing \,\} \times S_Q) = {}&\sum _{s' \in S_Q} \big ( \sum _{\alpha _P \in T_P(s) \cap A_Q} P_\pi (I, \alpha _P, s') + \sum _{\alpha _M \in T_M(s) \cap A_Q} M_\pi (I, \alpha _M, s') \big )\\&\quad \text {with } P_\pi (I, \langle a, \mu \rangle , s') := \mathbb {1}_{0 \in I} \cdot {\mathfrak {S}}(\pi )(\langle a, \mu \rangle )\cdot \mu (s')\\&\quad \text {and } M_\pi (I, \langle \lambda , s'' \rangle , s') := \mathbb {1}_{s'' = s'} \cdot (1 - |{\mathfrak {S}}(\pi )|) \cdot \int _{t \in I} \lambda \, \hbox {e}^{-\mathbf E \left( s\right) \cdot t} \,\mathrm {d}t. \end{aligned}$$

The probability to halt right after \(\pi \) is inferred from the probability a scheduler assigns to the halting extension \(\perp \). Otherwise, this function defines, for every path \(\pi \), a measure quantifying the probability to continue from state \(\textit{last}(\pi ) = s\) by incurring a delay in the interval \(I \subseteq {\mathbb {R}}^{+}_{0} \), taking a transition in \(A_Q\), and ending up in a state in \(S_Q\). Auxiliary function \(P_\pi \) calculates the probability of doing so via a probabilistic transition while \(M_\pi \) considers Markovian transitions. The integral in \(M_\pi \) implements the exponential distribution of delays.
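As a numerical illustration of Definition 11, the following Python sketch evaluates the step probability for a closed interval and a set of target states. It rests on several assumptions that are not fixed by the text above: \(\mathbf E(s)\) is taken to be the sum of all Markovian rates leaving s, there is at most one Markovian transition per successor state, \(A_Q\) contains all transitions, and the names and dictionary encodings are hypothetical.

```python
import math
from typing import Dict, Set, Tuple


def markovian_mass(rate: float, exit_rate: float, lo: float, hi: float) -> float:
    """Closed form of the integral of rate * exp(-exit_rate * t) over [lo, hi]."""
    return rate / exit_rate * (math.exp(-exit_rate * lo) - math.exp(-exit_rate * hi))


def step_probability(
    sched: Dict[str, Tuple[float, Dict[str, float]]],  # action -> (scheduler prob., mu)
    halt: float,                                       # scheduler probability of halting
    markov: Dict[str, float],                          # successor state -> rate
    interval: Tuple[float, float],
    targets: Set[str],
) -> float:
    lo, hi = interval
    prob = 0.0
    # Probabilistic transitions are taken without delay, hence the indicator 0 in I.
    if lo <= 0.0 <= hi:
        for p, mu in sched.values():
            prob += p * sum(mu.get(s, 0.0) for s in targets)
    # The mass not assigned by the scheduler is left for Markovian transitions.
    remaining = 1.0 - (halt + sum(p for p, _ in sched.values()))
    exit_rate = sum(markov.values())  # assumed meaning of E(s)
    for s, rate in markov.items():
        if s in targets:
            prob += remaining * markovian_mass(rate, exit_rate, lo, hi)
    return prob


# A state with two Markovian transitions and no enabled actions.
rates = {"s1": 3.0, "s2": 1.0}
p_s1 = step_probability({}, 0.0, rates, (0.0, 1.0), {"s1"})
p_any = step_probability({}, 0.0, rates, (0.0, math.inf), {"s1", "s2"})
```

With rates 3 and 1, the probability of moving to s1 within one time unit is \(\tfrac{3}{4}(1 - \hbox {e}^{-4})\), and the probability of eventually leaving the state via either transition is 1.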

Definition 12

(step probability, IOSA) Let \({\mathfrak {S}}\) be a scheduler of an IOSA \({\mathcal {I}}\). We define the step probability function \(Q^{\mathfrak {S}}\) in

$$\begin{aligned} \textit{paths}^{\textit{fin}}({\mathcal {I}}) \rightarrow \mathrm {Meas}(( {\mathbb {R}}^{+}_{0} \times \textit{Edges} \times {\mathcal {P}}({{\mathcal {C}}}) \times S )\cup \{\, \bot \,\} ) \end{aligned}$$

by \(Q^{\mathfrak {S}}(\pi )(\bot ) = {\mathfrak {S}}(\pi )(\bot )\) and, for \(\pi \) with \(\textit{last}(\pi ) = \langle \ell , v, x \rangle \) and t the minimal delay in \(\ell \) as in Definition 6,

$$\begin{aligned} Q^{\mathfrak {S}}(\pi )(I {\times } E _Q {\times } R_Q {\times } S_Q) = \mathbb {1}_{t \in I} \cdot \sum _{e \in E _Q} Y^{S_Q}_{R_Q}(\pi , e) \end{aligned}$$

where

$$\begin{aligned} Y^{S_Q}_{R_Q}(\pi , e) := {\mathfrak {S}}(\pi )(e) \cdot \sum _{R \in R_Q, \ell ' \in \textit{Loc}} \mu (\langle R, \ell ' \rangle ) \cdot \int _{\langle \ell ', v', x' \rangle \in S_Q} \!\!X_R^x(v'\!, x') \end{aligned}$$

and

$$\begin{aligned} X_R^x(v', x') := \mathbb {1}_{v' = (v+t)[R \mapsto 0]} \prod _{c \in {\mathcal {C}}} {\left\{ \begin{array}{ll} 1 &amp; \text {if } c \notin R \wedge x(c) = x'(c) \\ 0 &amp; \text {if } c \notin R \wedge x(c) \ne x'(c) \\ \textit{pdf}(c)(x'(c)) &amp; \text {if } c \in R. \end{array}\right. } \end{aligned}$$

This function defines, for every path \(\pi \), a measure quantifying the probability to continue from state \(\textit{last}(\pi ) = \langle \ell , v, x \rangle \) by incurring a delay in the interval \(I \subseteq {\mathbb {R}}^{+}_{0} \), taking an edge in \(E_Q\), resetting a set of clocks in \(R_Q\), and ending up in a state in \(S_Q\). First, the factor \(\mathbb {1}_{t \in I}\) ensures that only delays in I have positive probability. We then sum the probabilities over all edges, with the value for each edge being given by auxiliary function \(Y^{S_Q}_{R_Q}\). In that function, we multiply the probability that the scheduler selects this edge, the probability for each probabilistic branch, and the probability to end up in a state in \(S_Q\) by following that branch. States are uncountable, so we integrate the probability density for every state as given by auxiliary function \(X_R^x\). A state can only have positive probability if the values it assigns to clocks are the previous values plus the selected delay plus the branch’s clock restarts (factor \(\mathbb {1}_{v' = (v+t)[R \mapsto 0]}\)). The final multiplication in \(X_R^x\) assigns the correct probability mass (via \(\textit{pdf}(c)(x'(c))\)) to sampling new expiration times for the clocks that are restarted (identified by \(c \in R\)); all other clocks retain their expiration times (as enforced by the first two lines of the case distinction).

3.5.3 Trace distributions

Overall, the two step probability functions induce unique probability measures \(P_{{\mathfrak {S}}}\) over \(\textit{paths}^{\textit{fin}}({\mathcal {A}})\) for an automaton \({\mathcal {A}}\) and a scheduler \({\mathfrak {S}}\). We can define the trace distribution for \({\mathcal {A}}\) and a scheduler as the probability measure over traces (using abstract traces to construct the corresponding \(\sigma \)-algebra) induced by these probability measures over paths in the usual way. The probability of a set of abstract traces X is the probability of all paths whose trace is in X.

Definition 13

(trace distribution) The trace distribution \({\mathcal {T}}\) of a scheduler \({\mathfrak {S}}\in \textit{Sched}({\mathcal {M}})\), denoted \({\mathcal {T}}=\textit{trd}({\mathfrak {S}})\), is given by the probability space \(\langle \varOmega _{\mathcal {T}}, {\mathcal {F}}_{\mathcal {T}}, P_{\mathcal {T}} \rangle \) where

  • \(\varOmega _{\mathcal {T}} := \textit{AbsTraces}({\mathcal {M}})\),

  • \({\mathcal {F}}_{\mathcal {T}}\) is the smallest \(\sigma \)-field generated by the sets

    $$\begin{aligned} \{\, C_\varSigma \mid \varSigma \in \textit{AbsTraces}^{\textit{fin}}({\mathcal {M}}) \,\} \end{aligned}$$

    with \(C_{\varSigma } := \{\, \varSigma ' \in \varOmega _{\mathcal {T}}\mid \varSigma \sqsubseteq \varSigma ' \,\}\), and

  • \(P_{\mathcal {T}}\) is the unique probability measure on \({\mathcal {F}}_{\mathcal {T}}\) defined by \(P_{\mathcal {T}}(X) = P_{{\mathfrak {S}}}(\textit{tr}^{-1}({X}))\) for \(X\in {\mathcal {F}}_{\mathcal {T}}\).

We can also use trace distributions to relate two automata: \({\mathcal {A}}_1\) and \({\mathcal {A}}_2\) are related if they induce the same trace distributions. In particular, a trace distribution \({\mathcal {T}}\) of \({\mathcal {A}}_1\) is contained in the set of trace distributions of \({\mathcal {A}}_2\) if there is a scheduler \({\mathfrak {S}}\) in \({\mathcal {A}}_2\) such that \({\mathcal {T}}=\textit{trd}({\mathfrak {S}})\). We write \(\textit{trd}({\mathcal {A}},k)\) for the set of trace distributions based on a scheduler of length k and \(\textit{trd}({\mathcal {A}})\) for the set of all finite trace distributions. Finally, we write \({\mathcal {A}}_1\sqsubseteq ^k_{\textit{TD}}{\mathcal {A}}_2\) if \(\textit{trd}({\mathcal {A}}_1,k)\subseteq \textit{trd}({\mathcal {A}}_2,k)\) for \(k\in {\mathbb {N}}\), and \({\mathcal {A}}_1\sqsubseteq ^\textit{fin}_{\textit{TD}}{\mathcal {A}}_2\) if \({\mathcal {A}}_1\sqsubseteq _{\textit{TD}}^k{\mathcal {A}}_2\) for some \(k\in {\mathbb {N}}\). This induces an equivalence relation \(=_{\textit{TD}}\): \({\mathcal {A}}_1\) and \({\mathcal {A}}_2\) are trace distribution equivalent, written \({\mathcal {A}}_1 =_{\textit{TD}} {\mathcal {A}}_2\), iff \(\textit{trd}({\mathcal {A}}_1) = \textit{trd}({\mathcal {A}}_2)\).

4 Stochastic testing theory

Model-based testing comprises automatic test case generation, execution, and evaluation based on a requirements model. We now establish this three-step procedure for IOMA and IOSA. As a first step, we define formal conformance between two models via two conformance relations akin to ioco [49], called mar-ioco and sa-ioco. We then specify what a test case is, and when an observed trace should be judged as correct via test annotations. Working in a stochastic environment also necessitates a statistical verdict. We describe the sampling process for an IUT and then define verdict functions. Finally, we prove the correctness of the framework.

The main difference of our stochastic test theory, compared to the probabilistic test theory of [20], lies in the sampling process and its resulting observations, in particular, in the trace frequency counting functions. We carefully defined IOMA and IOSA in such a way that many of the notions in the remainder of this section apply to both settings. For this reason, we will write \(\mathbf * \mathbf{-ioco } \), \(\sqsubseteq ^{*}_{\textit{ioco}}\), etc., to summarise a definition for both \(\mathbf{mar-ioco }\) and \(\mathbf{sa-ioco } \), \(\sqsubseteq ^\textit{mar}_\textit{ioco}\) and \(\sqsubseteq ^{\textit{sa}}_{\textit{ioco}}\), etc.

4.1 Stochastic conformance relations

The purpose of the conformance relation is to judge whether an implementation model conforms to the requirements specification model. We define our relations for IOMA and IOSA such that they only rely on trace distributions. Trace distribution equivalence \(=_{\textit{TD}}\) is the probabilistic counterpart of trace equivalence for transition systems. However, trace equivalence or inclusion is too fine as a conformance relation for testing [48]. The ioco relation for functional conformance solves this problem by allowing underspecification of functional behaviour: an implementation \({\mathcal {I}}\) is conforming to a specification \({\mathcal {S}}\) if every experiment derived from \({\mathcal {S}}\) executed on \({\mathcal {I}}\) leads to an output that was foreseen in \({\mathcal {S}}\):

$$\begin{aligned} {\mathcal {I}}\sqsubseteq _{\textit{ioco}}{\mathcal {S}} ~\Leftrightarrow ~ \forall \sigma \in \textit{traces}^{\textit{fin}}({\mathcal {S}}) :\textit{out}_{{\mathcal {I}}}(\sigma ) \subseteq \textit{out}_{{\mathcal {S}}}(\sigma ) \end{aligned}$$

where \(\textit{out}_{{\mathcal {I}}}(\sigma )\) is the set of outputs in \({\mathcal {I}}\) that are enabled after trace \(\sigma \). To extend ioco testing to stochastic systems, we need two auxiliary concepts that mirror trace prefixes and the set \(\textit{out}\) stochastically:

Definition 14

(prefix and output continuation) For trace distributions \({\mathcal {T}}\) of length k and \({\mathcal {T}}'\) of length \(\ge k\), the prefix relation \(\sqsubseteq _k\) is defined by

$$\begin{aligned} {\mathcal {T}}\sqsubseteq _k{\mathcal {T}}' ~\Leftrightarrow ~ \forall \sigma \in ( {\mathbb {R}}^{+}_{0} \times \textit{Act})^{\le k}:P_{{\mathcal {T}}}(\varSigma ) = P_{{\mathcal {T}}'}(\varSigma ). \end{aligned}$$

For an automaton \({\mathcal {A}}\), the output continuation of trace distribution \({\mathcal {T}}\) of length k is \(\textit{outcont}_{{\mathcal {A}}}({\mathcal {T}})\) defined as the set of all \({\mathcal {T}}' \in \textit{trd}({\mathcal {A}},k+1)\) such that

$$\begin{aligned} {\mathcal {T}}\sqsubseteq _{k}{\mathcal {T}}' \wedge \forall \sigma \in ({\mathbb {R}}^{+}_{0} \times \textit{Act})^{k} \times {\mathbb {R}}^{+}_{0} \times \textit{Act}_{I} :P_{{\mathcal {T}}'} ( \varSigma ) = 0. \end{aligned}$$

The prefix relation extends the one for traces to trace distributions. The output continuation of \({\mathcal {T}}\) of length k in \({\mathcal {A}}\) contains all trace distributions \({\mathcal {T}}'\) of length \(k+1\) such that \({\mathcal {T}}\sqsubseteq _k{\mathcal {T}}'\) and \({\mathcal {T}}'\) assigns probability zero to every abstract trace of length \(k+1\) that ends with an input.

We can now define the mar-ioco and sa-ioco conformance relations that relate input-enabled implementations \({\mathcal {I}}\) to specifications \({\mathcal {S}}\). Intuitively, \({\mathcal {I}}\) conforms to \({\mathcal {S}}\) if the probability of every output trace of \({\mathcal {I}}\) can be matched by \({\mathcal {S}}\) under some scheduler. This includes the functional behaviour, probabilistic behaviour, and stochastic timing, as accounted for in the definition of output continuations.

Definition 15

(mar-ioco and sa-ioco) Let \({\mathcal {I}}\) and \({\mathcal {S}}\) be automata over the same action signature with \({\mathcal {I}}\) input-enabled. \({\mathcal {I}}\) is \(\mathbf * \mathbf{-ioco } \)-conforming to \({\mathcal {S}}\), written \({\mathcal {I}}\sqsubseteq ^{*}_{\textit{ioco}}{\mathcal {S}}\), if for all \(k\in {\mathbb {N}}\) we have

$$\begin{aligned} \forall {\mathcal {T}}\in \textit{trd}({\mathcal {S}},k): \textit{outcont}_{{\mathcal {I}}}({\mathcal {T}})\subseteq \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}). \end{aligned}$$

Example 4

Recall the protocol models of Fig. 1. After the send? input, there is a delay before the file transmission is either acknowledged or an error is reported. Let \({\mathcal {S}}\) be the leftmost automaton and \({\mathcal {I}}\) be the rightmost one. Consider now the scheduler of \({\mathcal {S}}\) that schedules send? with probability 1. Its set of output continuations in \({\mathcal {S}}\) contains all trace distributions that schedule the outgoing distribution leading to ack! and err! with probability p and halt with \(1-p\), for \(p\in [0,1]\). The same holds for the set of output continuations in \({\mathcal {I}}\), but the probability to reach \(s_2\) within a certain amount of time t differs from \({\mathcal {S}}\) whenever \(\lambda _1\ne \lambda _2\). Hence, there are trace distributions in \({\mathcal {I}}\) such that the probability of, for example,

$$\begin{aligned}{}[0, 0] ~ \texttt {send?} ~ [0, t] ~ \texttt {ack!} \end{aligned}$$

cannot be matched. The implementation is therefore not conforming with respect to mar-ioco in this case.

Relationship to other relations If \({\mathcal {A}}\) is an IOMA without Markovian transitions or an IOSA where \({\mathcal {C}} = \varnothing \), then \({\mathcal {A}}\) is a probabilistic input–output transition system (pIOTS). Under this restriction, mar-ioco and sa-ioco coincide with pioco of [20] and are thus extensions of pioco:

Theorem 1

For two pIOTS \({\mathcal {I}}\) and \({\mathcal {S}}\) with \({\mathcal {I}}\) input-enabled, we have \({\mathcal {I}}\sqsubseteq ^{*}_{\textit{ioco}}{\mathcal {S}}\Leftrightarrow {\mathcal {I}}\sqsubseteq _{\textit{pioco}}{\mathcal {S}}\).

Proof sketch

All three relations are defined in the same way over trace distributions and schedulers, the notions for which coincide if \(T_M = \varnothing \) or \({\mathcal {C}} = \varnothing \), respectively. \(\square \)

Consequently, the relationships already established between pioco and other relations in [20] carry over as well: mar-ioco and sa-ioco extend ioco (i.e. the relations coincide on IOTS), and for trace distribution inclusion, we have the following result:

Theorem 2

Let \({\mathcal {A}},{\mathcal {B}}\) and \({\mathcal {C}}\) be automata and let \({\mathcal {A}}\) and \({\mathcal {B}}\) be input-enabled, then

$$\begin{aligned}&{\mathcal {A}} \sqsubseteq ^{*}_{\textit{ioco}}{\mathcal {B}} ~\Leftrightarrow ~ {\mathcal {A}} \sqsubseteq _{\textit{TD}}^{\textit{fin}}{\mathcal {B}}\\&\quad \text {and } {\mathcal {A}} \sqsubseteq ^{*}_{\textit{ioco}}{\mathcal {B}} ~\wedge ~ {\mathcal {B}} \sqsubseteq ^{*}_{\textit{ioco}}{\mathcal {C}} ~\Rightarrow ~ {\mathcal {A}}\sqsubseteq ^{*}_{\textit{ioco}}{\mathcal {C}}.\end{aligned}$$

Proof sketch

The fact that finite trace distribution inclusion implies conformance with respect to \(\sqsubseteq ^{*}_{\textit{ioco}}\) is immediate if we consider that the relation is defined via trace distributions. The opposite direction follows from the fact that all abstract traces of \({\mathcal {A}}\) ending in output assuredly can get assigned the same probabilities in \({\mathcal {B}}\) by \(\sqsubseteq ^{*}_{\textit{ioco}}\). All abstract traces ending in input are taken care of because \({\mathcal {A}}\) and \({\mathcal {B}}\) are input-enabled, and all such distributions are input-reactive. The second result is a direct consequence of the first. \(\square \)

4.2 Test cases and annotations

The advantage of MBT over manual testing is that test cases can be automatically generated from the specification and automatically executed on an implementation. We are interested in the result of a parallel composition of a test case and an implementation model. We define test cases over an action signature \(\langle \textit{Act}_I, \textit{Act}_O \rangle \). A test case is a collection of traces that represent the possible behaviour of a tester. It is summarised by an IOMA without Markovian transitions, or an IOSA without clocks, whose graph is a tree. The action signature describes the potential interaction with the implementation. In each state/location, the test may either stop, wait for a response of the system, or provide some stimulus. When a test is waiting for a response, it has to take into account all potential outputs including the situation that the system provides no response at all, modelled by quiescence \(\delta \). A single test case may provide multiple options, giving rise to multiple concrete testing sequences. It may also prescribe different reactions to different outputs.

Definition 16

(test case, test suite) A test case over an action signature \(\langle \textit{Act}_{I},\textit{Act}_{O} \rangle \) of system inputs \(\textit{Act}_{I}\) and system outputs \(\textit{Act}_{O}\) is an IOMA

$$\begin{aligned} {\mathfrak {t}}= \langle S, s_{0}, \textit{Act}^{\mathfrak {t}}, T_P, \varnothing \rangle \end{aligned}$$

or an IOSA without clocks, i.e. with \({\mathcal {C}} = \varnothing \), over the same action set \(\textit{Act}^{\mathfrak {t}}\),

where \(\textit{Act}^{\mathfrak {t}}= \textit{Act}^{\mathfrak {t}}_I \uplus \textit{Act}^{\mathfrak {t}}_O\) with inputs \(\textit{Act}^{\mathfrak {t}}_I = \textit{Act}_O \cup \{\, \delta \,\}\) and outputs \(\textit{Act}^{\mathfrak {t}}_O = \textit{Act}_I \backslash \{\, \delta \,\}\). The automaton must be a finite, internally deterministic, and connected tree. In addition, all discrete distributions of the transitions or edges must be Dirac, and for every state or location s we require that either

  1. \(\textit{enabled}(s) = \emptyset \) (stop the test), or

  2. \(\textit{enabled}(s) = \textit{Act}^{\mathfrak {t}}_{I}\) (wait for some response), or

  3. \(\textit{enabled}(s) \subseteq \textit{Act}^{\mathfrak {t}}_{O} \wedge |\textit{enabled}(s)| = 1\) (provide a single stimulus, deterministically).

A test suite \({\mathfrak {T}}\) is a set of test cases. A test case (suite) for an automaton \({\mathcal {S}}\) with inputs \(\textit{Act}_I\) and outputs \(\textit{Act}_O\) is a test case (suite) that is defined over the action signature \(\langle \textit{Act}_{I},\textit{Act}_{O} \rangle \) and that additionally satisfies the following requirement in item 3 above: if a transition or edge labelled \(a \in \textit{Act}_O^{\mathfrak {t}}\) can lead to state or location \(s'\) with positive probability, then there exists a \(\sigma \in \textit{traces}({\mathcal {S}})\) such that \(\sigma \,{.}\; t\; a \in \textit{traces}({\mathcal {S}})\) for some \(t\in {\mathbb {R}}^{+}_{0} \).
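The three admissible shapes of a test case state in Definition 16 can be checked mechanically. The following Python sketch is only an illustration; the function name, the returned labels, and the concrete action names in the usage example are hypothetical.

```python
from typing import Set


def classify_state(enabled: Set[str], test_inputs: Set[str], test_outputs: Set[str]) -> str:
    """Classify a test case state by its set of enabled actions.

    test_inputs are the system outputs plus quiescence (the inputs of the test),
    test_outputs are the system inputs (the outputs of the test)."""
    if not enabled:
        return "stop"        # item 1: stop the test
    if enabled == test_inputs:
        return "observe"     # item 2: wait for any response, including quiescence
    if enabled <= test_outputs and len(enabled) == 1:
        return "stimulate"   # item 3: provide exactly one stimulus
    raise ValueError(f"state violates the test case conditions: {sorted(enabled)}")


# Hypothetical action signature of a small file transfer protocol.
TEST_INPUTS = {"ack!", "err!", "delta"}
TEST_OUTPUTS = {"send?", "abort?"}
kind = classify_state({"send?"}, TEST_INPUTS, TEST_OUTPUTS)
```

A state that offers two stimuli at once, or that waits for only some of the possible responses, is rejected with a `ValueError`.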

Test cases are, in effect, IOMA or IOSA that are IOTS. The inputs of a test case are the outputs of the action signature, i.e. the outputs of the implementation or specification, and vice versa. The last requirement in the definition ensures that only specified inputs are provided: a test may only judge the correctness of specified behaviour. This is referred to as being input minimal in the literature [47].

In order to identify the behaviour which we deem as functionally acceptable/correct, each complete trace of a test, i.e. every leaf state or location, is annotated with a pass or fail verdict. We annotate exactly the traces that are present in the specification with the \(pass \) verdict, formally:

Definition 17

(test annotation) For a test \({\mathfrak {t}}\), a test annotation is a function

$$\begin{aligned} \textit{ann}\in \textit{traces}^{\textit{com}}({\mathfrak {t}}) \rightarrow \{\, \textit{pass},\, \textit{fail} \,\}. \end{aligned}$$

A pair \({\hat{{\mathfrak {t}}}}= \langle {\mathfrak {t}}, \textit{ann} \rangle \) consisting of a test and a test annotation is an annotated test. The set of all such \({\hat{{\mathfrak {t}}}}\), denoted by \({\hat{{\mathfrak {T}}}}=\left\{ \left( t_{i},\textit{ann}_{i}\right) _{i\in {\mathcal {I}}}\right\} \) for some index set \({\mathcal {I}}\), is an annotated test suite. If \({\mathfrak {t}}\) is a test case for a specification \({\mathcal {S}}\) with signature \(\langle \textit{Act}_I, \textit{Act}_O \rangle \), we define

$$\begin{aligned} \textit{ann}_{*\textit{-ioco}}^{\mathcal {S}}\in \textit{traces}^{\textit{com}}({\mathfrak {t}}) \rightarrow \{\, \textit{pass},\, \textit{fail}\,\} \end{aligned}$$

by \(\textit{ann}_{*\textit{-ioco}}^{\mathcal {S}}(\sigma ) = \textit{fail}\) if there exist \(\rho \in \textit{traces}^{\textit{fin}}({\mathcal {S}})\), \(t \in {\mathbb {R}}^{+}_{0} \) and \(a \in \textit{Act}_{O}\) such that

$$\begin{aligned} \rho \,{.}\; t\; a \sqsubseteq \sigma ~\wedge ~ \rho \,{.}\; t\; a \notin \textit{traces}^{\textit{fin}}({\mathcal {S}}) \end{aligned}$$

and \(\textit{ann}_{*\textit{-ioco}}^{\mathcal {S}}(\sigma ) = \textit{pass}\) otherwise.

Annotations decide functional correctness only. The correctness of discrete probabilistic choices and stochastic delays is assessed in a separate second step.
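A minimal Python sketch of the annotation function may help to see how the definition is evaluated. It assumes that membership in \(\textit{traces}^{\textit{fin}}({\mathcal {S}})\) is available as a predicate `in_spec`; the toy specification below ignores delays and is purely hypothetical.

```python
from typing import Callable, Set, Tuple

Step = Tuple[float, str]   # (delay, action)
Trace = Tuple[Step, ...]


def annotate(sigma: Trace,
             in_spec: Callable[[Trace], bool],
             outputs: Set[str]) -> str:
    """Return 'fail' if some prefix rho of sigma is a specification trace
    whose extension by an output (as observed in sigma) is not; else 'pass'."""
    for i, (_, action) in enumerate(sigma):
        rho = sigma[:i]            # prefix before the i-th step
        extended = sigma[:i + 1]   # rho . t a
        if action in outputs and in_spec(rho) and not in_spec(extended):
            return "fail"
    return "pass"


# Hypothetical specification: allowed action sequences, delays ignored.
ALLOWED = {
    (),
    ("req?",),
    ("req?", "file!"),
    ("req?", "wait!"),
    ("req?", "wait!", "file!"),
}


def toy_in_spec(trace: Trace) -> bool:
    return tuple(action for _, action in trace) in ALLOWED


OUTPUTS = {"file!", "wait!", "err!"}
```

With these assumptions, a trace delivering the file after a request is annotated with pass, whereas a trace that answers the request with the unspecified output `err!` is annotated with fail.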

Fig. 6 Three test cases for the file server specification

Example 5

Figure 6 presents a test suite for the file server specification IOSA of Fig. 2. Test case \({\hat{{\mathfrak {t}}}}_1\) uses the quiescence observation \(\delta \) to assure no output is given in the initial state. \({\hat{{\mathfrak {t}}}}_2\) checks for eventual delivery of the file, which may be archived, requiring the intermediate wait! notification, or may be sent directly. Finally, \({\hat{{\mathfrak {t}}}}_3\) tests the abort? edge.

4.3 Sampling and verdicts

Functional conformance is assessed via test annotations in the same way as in classical ioco theory [47]. However, we test stochastic systems; thus, executing a test case once is insufficient to establish \(\mathbf * \mathbf{-ioco } \) conformance. We now focus on the statistical evaluation of the probabilistic and stochastic-timed behaviour based on a sample of multiple traces.

4.3.1 Sampling

We perform a statistical hypothesis test on the implementation based on the outcome of a push-button experiment in the sense of [37]. We assume a black-box timed trace machine with inputs, a time and an action window, and a reset button, as illustrated in Fig. 7. An observer records each individual execution before the reset button is pressed and a new execution starts. An increasing clock is started and is stopped once the next visible action is recorded. We assume that recording an action resets the clock. Thus, the recordings of the external observer match the notion of (abstract) traces. After a sample of sufficient size has been collected, we compare the collected frequencies of abstract traces to their expected frequencies according to the specification. If the empirical observations are close to the expectations, we accept the probabilistic behaviour of the implementation.

Before the experiment, we fix the parameters for sample length \(k\in {\mathbb {N}}\) (the length of the individual test executions), sample size \(m\in {\mathbb {N}}\) (how many test executions to observe), and level of significance \(\alpha \in \; ]0, 1[\) (the probability of erroneously rejecting a correct implementation). Checking the abstract trace frequencies contained in the sample versus their expectancy w.r.t. the specification \({\mathcal {S}}\) requires a scheduler due to the presence of nondeterminism in \({\mathcal {S}}\). In order for any statistical reasoning to work, we assume each iteration of the sampling process to be governed by the same scheduler, which induces a trace distribution \({\mathcal {T}}\in \textit{trd}({\mathcal {I}})\).

4.3.2 Frequencies and expectations

To quantify how close a sample is to its expectations, we require a notion of distance. Our goal is to evaluate the deviation of a collected sample to the expected distribution. Thus, we require (1) a metric space for the quantification of distances between measures, (2) the frequency measure of abstract traces in a sample, and (3) the expected measure of abstract traces in the specification under \({\mathcal {T}}\).

Fig. 7 Black-box timed trace machine

For automaton \({\mathcal {A}}\), we use metric space \(\langle \mathrm {Meas}({\mathcal {A}}), \textit{dist} \rangle \) where the metric

$$\begin{aligned} \textit{dist}(u,v) := \sup _{\sigma \in ({\mathbb {R}}^{+}_{0} \times \textit{Act})^{\le k}}|u(\varSigma )-v(\varSigma )|\end{aligned}$$

is the maximal variation distance of two measures u and v. (Recall we denote by \(\varSigma \) the abstract trace corresponding to the trace \(\sigma \).) We next define the two measures—the frequency measure for a sample and the expected measure according to the specification—that need to be compared. Our definitions for the former differ between IOMA and IOSA due to their different models of stochastic time.
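For intuition, the distance can be computed directly when both measures are only evaluated on finitely many abstract traces. The sketch below makes exactly this assumption: measures are dictionaries from (hypothetical) abstract-trace labels to probabilities, and the supremum becomes a maximum.

```python
from typing import Dict, Hashable


def dist(u: Dict[Hashable, float], v: Dict[Hashable, float]) -> float:
    """Largest absolute difference between u and v over all abstract
    traces mentioned in either measure; missing entries count as 0."""
    keys = set(u) | set(v)
    return max((abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in keys), default=0.0)


observed = {"A": 0.5, "B": 0.5}
expected = {"A": 0.25, "B": 0.75}
```

Here `dist(observed, expected)` evaluates to 0.25, the deviation on either of the two abstract traces.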

Memoryless time For IOMA, our frequency measure can assume the independence of all time intervals since the delays are memoryless. Thus, we order the i-th time intervals of all \(\rho \) increasingly and compare them to \(\sigma \). We achieve this by grouping traces into classes based on the same visible action behaviour. For a given trace \(\sigma \), its class \(\varSigma _\sigma \) is the set of all traces \(\rho \in O\) such that \(\textit{act}\left( \rho \right) =\textit{act}\left( \sigma \right) \). A sample of length k and width m then induces the frequency measure

$$\begin{aligned} \textit{freq} \in (({\mathbb {R}}^{+}_{0} \times {\mathbb {R}}^{+}_{0}) \times \textit{Act})^{\le k\times m} \rightarrow \mathrm {Meas}(({\mathbb {R}}^{+}_{0} \times \textit{Act})^{\le k}) \end{aligned}$$

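defined by

$$\begin{aligned} \textit{freq}(O)(\varSigma ) = \frac{|\varSigma _\sigma |}{m} \prod _{i=1}^{|\sigma |} \frac{|\{\, \rho \in \varSigma _\sigma \mid t_i^\rho \in I_i \,\}|}{|\varSigma _\sigma |}, \end{aligned}$$

where \(I_i\) denotes the i-th time interval of the abstract trace \(\varSigma \) and \(t_i^\rho \) denotes the i-th time stamp of trace \(\rho \). In this way, the distributions for each time stamp in a trace converge to the true underlying distribution by the Glivenko–Cantelli theorem [22].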

General stochastic time For IOSA, we define the frequency measure by

$$\begin{aligned} \textit{freq}(O)(\varSigma ) = \frac{|\{\, \rho \in O \mid \rho \in \varSigma \,\}|}{m}, \end{aligned}$$

i.e. the fraction of traces in O that are in \(\varSigma \). Specifically, we require all time stamps to be contained in the intervals given in \(\varSigma \). In contrast to IOMA, this function does not assume the independence of clock valuations from locations.

Expected measure The last missing ingredient is the expected measure according to a specification. Let \({\mathcal {T}}\) be the trace distribution resulting from the resolution of all nondeterministic choices. We treat each iteration of the sampling process of the implementation as a Bernoulli trial. Recall that a Bernoulli trial has two outcomes: success with probability p and failure with probability \(1-p\). For any trace \(\sigma \), we say that success occurred at position i of the sample if \(\sigma =\sigma _i\). Therefore, let \(X_i\sim \textit{Ber}(P_{{\mathcal {T}}}(\varSigma ))\) be Bernoulli distributed random variables for \(i=1,\ldots , m\). Let \(Z=\frac{1}{m}\sum _{i=1}^m X_i\) be the empirical mean with which we observe \(\sigma \) in a sample. The expected probability under \({\mathcal {T}}\) is then calculated as

$$\begin{aligned} {\mathbb {E}}^{{\mathcal {T}}}(Z)={\mathbb {E}}^{\mathcal {T}}\left( \frac{1}{m}\sum _{i=1}^m X_i\right) =\frac{1}{m}\sum _{i=1}^m{\mathbb {E}}^{{\mathcal {T}}}(X_i)=P_{{\mathcal {T}}}(\varSigma ). \end{aligned}$$

Hence, the expected probability for each abstract trace \(\varSigma \) is the probability of \(\varSigma \) under trace distribution \({\mathcal {T}}\).

Example 6

Returning to the example of

$$\begin{aligned} \sigma _1 = 0.5 ~ \texttt {a?} ~ 0.6 ~ \texttt {b!} \text {~~and~~} \sigma _2 = 0.6 ~ \texttt {a?} ~ 0.5 ~ \texttt {b!}, \end{aligned}$$

assume \(O=\{\, \sigma _1, \sigma _2 \,\}\). Then,

$$\begin{aligned}&\textit{freq}(O)([0,0.5]\,\texttt {a?}\,[0,0.5]\,\texttt {b!})=\frac{2}{2}\cdot \frac{1}{2}\cdot \frac{1}{2}=\frac{1}{4},\\&\textit{freq}(O)([0,0.5]\,\texttt {a?}\,[0,0.6]\,\texttt {b!})=\frac{2}{2}\cdot \frac{1}{2}\cdot \frac{2}{2}=\frac{1}{2},\\&\textit{freq}(O)([0,0.6]\,\texttt {a?}\,[0,0.6]\,\texttt {b!})=\frac{2}{2}\cdot \frac{2}{2}\cdot \frac{2}{2}=1. \end{aligned}$$
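The values of this example can be reproduced with a few lines of Python. The sketch rests on our reading of the memoryless frequency measure as the class fraction multiplied by the per-position fractions of time stamps falling into each interval \([0,b_i]\); the function names are hypothetical. For contrast, `freq_joint` counts the fraction of traces whose time stamps lie in all intervals simultaneously, which corresponds to the description given for general stochastic time.

```python
from fractions import Fraction
from typing import Sequence, Tuple

Step = Tuple[float, str]   # (time stamp, action)
Trace = Tuple[Step, ...]


def freq_memoryless(sample: Sequence[Trace],
                    actions: Sequence[str],
                    bounds: Sequence[float]) -> Fraction:
    """Frequency of [0,b_1] a_1 ... [0,b_n] a_n, treating the time stamps
    at each position independently. Assumes a non-empty sample."""
    m = len(sample)
    cls = [rho for rho in sample if [a for _, a in rho] == list(actions)]
    if not cls:
        return Fraction(0)
    result = Fraction(len(cls), m)
    for i, bound in enumerate(bounds):
        hits = sum(1 for rho in cls if rho[i][0] <= bound)
        result *= Fraction(hits, len(cls))
    return result


def freq_joint(sample: Sequence[Trace],
               actions: Sequence[str],
               bounds: Sequence[float]) -> Fraction:
    """Fraction of traces with matching actions whose time stamps all
    lie within the given bounds. Assumes a non-empty sample."""
    hits = sum(
        1 for rho in sample
        if [a for _, a in rho] == list(actions)
        and all(t <= b for (t, _), b in zip(rho, bounds))
    )
    return Fraction(hits, len(sample))


sigma1 = ((0.5, "a?"), (0.6, "b!"))
sigma2 = ((0.6, "a?"), (0.5, "b!"))
O = [sigma1, sigma2]

print(freq_memoryless(O, ["a?", "b!"], [0.5, 0.5]))  # 1/4
print(freq_memoryless(O, ["a?", "b!"], [0.5, 0.6]))  # 1/2
print(freq_memoryless(O, ["a?", "b!"], [0.6, 0.6]))  # 1
print(freq_joint(O, ["a?", "b!"], [0.5, 0.5]))       # 0
```

The last line shows the difference between the two readings: neither \(\sigma _1\) nor \(\sigma _2\) has both time stamps below 0.5, so the joint frequency is 0, while the memoryless measure combines the first time stamp of \(\sigma _1\) with the second of \(\sigma _2\).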

4.3.3 Acceptable outcomes

We accept a sample O if \(\textit{freq}(O)\) lies within some distance \(r_{\alpha }\) of the expected measure \({\mathbb {E}}^{\mathcal {T}}\). All measures deviating at most \(r_\alpha \) from the expected measures are contained within the ball \(B_{r_\alpha }({\mathbb {E}}^{\mathcal {T}})\). The actual \(r_\alpha \) is chosen such that the error of accepting an erroneous sample is limited while keeping the error of rejecting a correct sample smaller than \(\alpha \), i.e.

$$\begin{aligned} r_{\alpha }=\inf \{r\in {\mathbb {R}}^{+}_{0} \mid P_{{\mathcal {T}}}(\textit{freq}^{-1}(B_r({\mathbb {E}}^{\mathcal {T}})))\ge 1-\alpha \}. \end{aligned}$$

Definition 18

(acceptable outcomes) For \(k,m\in {\mathbb {N}}\) and an automaton \({\mathcal {A}}\), the set of acceptable outcomes under \({\mathcal {T}}\in \textit{trd}({\mathcal {A}},k)\) of significance level \(\alpha \in (0,1)\) is \(\textit{Obs}({\mathcal {T}},\alpha ,k,m) =\)

$$\begin{aligned} \{\, O \in ({\mathbb {R}}^{+}_{0} \times \textit{Act})^{\le k\times m} \mid \textit{dist}(\textit{freq}(O),{\mathbb {E}}^{\mathcal {T}})\le r_\alpha \,\}. \end{aligned}$$

We obtain the set of acceptable outcomes of \({\mathcal {A}}\) by

$$\begin{aligned} \textit{Obs}({\mathcal {A}},\alpha ,k,m)=\bigcup _{{\mathcal {T}}\in \textit{trd}({\mathcal {A}},k)}\textit{Obs}({\mathcal {T}},\alpha ,k,m). \end{aligned}$$

The set of acceptable outcomes consists of all possible samples that we are willing to accept as close enough to the expectations. Note that this takes all possible trace distributions of \({\mathcal {A}}\) into consideration. The set of acceptable outcomes has two properties reflecting the error of false rejection and the error of false acceptance, respectively: first, if a sample was generated under a trace distribution of \({\mathcal {A}}\) or a trace distribution-equivalent automaton, we correctly accept it with probability at least \(1-\alpha \), i.e.

$$\begin{aligned} P_{{\mathcal {T}}}(\textit{Obs}({\mathcal {T}},\alpha ,k,m))\ge 1-\alpha ; \end{aligned}$$

second, if a sample was generated by a non-admitted trace distribution, the chance of erroneously accepting it is smaller than some \(\beta _m\). Again, \(\alpha \) is the a priori defined level of significance, and \(\beta _m\) is unknown, but minimal by construction. Additionally, \(\beta _m\rightarrow 0\) as \(m\rightarrow \infty \): the error of falsely accepting an observation decreases with increasing sample size.
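To make the role of \(r_\alpha \) and of the sample size concrete, the following Python sketch computes the radius exactly in a deliberately reduced setting. It assumes a fixed trace distribution and tracks a single abstract trace of probability p, so that the number of its occurrences in m runs is binomially distributed and the distance reduces to \(|c/m - p|\) for count c. The function name and this reduction are ours; the general definition takes the supremum over all abstract traces.

```python
from fractions import Fraction
from math import comb


def critical_radius(p: Fraction, m: int, alpha: Fraction) -> Fraction:
    """Smallest r with P(|count/m - p| <= r) >= 1 - alpha, where
    count ~ Binomial(m, p). Exact arithmetic via fractions."""
    # (distance, probability) for every possible count, closest first
    outcomes = sorted(
        (abs(Fraction(c, m) - p), comb(m, c) * p**c * (1 - p)**(m - c))
        for c in range(m + 1)
    )
    mass = Fraction(0)
    for distance, prob in outcomes:
        mass += prob
        if mass >= 1 - alpha:
            return distance
    return outcomes[-1][0]  # only reached for alpha <= 0


half = Fraction(1, 2)
alpha = Fraction(1, 20)
r_small = critical_radius(half, 10, alpha)    # 3/10
r_large = critical_radius(half, 100, alpha)   # 1/10
```

For \(p=0.5\) and \(\alpha =0.05\), the radius shrinks from 3/10 at \(m=10\) to 1/10 at \(m=100\): with more observations, a smaller deviation suffices to keep the probability of false rejection below \(\alpha \), which in turn leaves less room for falsely accepting a deviating implementation.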

Remark 2

The set of acceptable outcomes comprises samples of the form \(O \in ({\mathbb {R}}^{+}_{0} \times \textit{Act})^{\le k\times m}\). In order to align observations with the \(\mathbf * \mathbf{-ioco } \) relations, we define the set of acceptable output outcomes \(\textit{OutObs}({\mathcal {T}},\alpha ,k,m)\) as the set of those \(O\in (({\mathbb {R}}^{+}_{0} \times \textit{Act})^{\le k-1} \times {\mathbb {R}}^{+}_{0} \times \textit{Act}_O)^m\) for which we have \(\textit{dist}(\textit{freq}(O), {\mathbb {E}}^{\mathcal {T}})\le r_\alpha \).

Verdict functions With all necessary components in place, the following decision process summarises whether an implementation fails a test case or test suite based on a functional or statistical verdict. The overall pass verdict is given iff both sub-verdicts yield a pass. Let \(\textit{Aut}_{*}\) denote the set of all IOMA or IOSA, respectively.

Definition 19

(verdicts) Given a specification automaton \({\mathcal {S}}\), an annotated test \({\hat{{\mathfrak {t}}}}\) for \({\mathcal {S}}\), \(k,m\in {\mathbb {N}}\) where k is the length of the longest trace of \({\hat{{\mathfrak {t}}}}\), and \(\alpha \in (0,1)\), we define the functional verdict as the function

$$\begin{aligned} v_{\textit{func}} \in \textit{Aut}_{*} \times \textit{Aut}_{*} \rightarrow \{\, \textit{pass},\, \textit{fail}\,\} \end{aligned}$$

with \(v_{\textit{func}}({\mathcal {I}}, {\hat{{\mathfrak {t}}}}) = \textit{pass}\) if

$$\begin{aligned} \forall \sigma \in \textit{traces}^{\textit{com}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}}) :\textit{ann}_{*\textit{-ioco}}^{\mathcal {S}}(\sigma ) = \textit{pass}\end{aligned}$$

and \(v_{\textit{func}}({\mathcal {I}}, {\hat{{\mathfrak {t}}}}) = \textit{fail}\) otherwise, the statistical verdict as

$$\begin{aligned} v_{\textit{prob}} \in \textit{Aut}_{*} \times \textit{Aut}_{*} \rightarrow \{\, \textit{pass},\, \textit{fail}\,\} \end{aligned}$$

with \(v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {t}}}}) = \textit{pass}\) if for all \({\mathcal {T}}\in \textit{trd}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}},k)\) there exists a \({\mathcal {T}}'\in \textit{trd}({\mathcal {S}},k)\) such that

$$\begin{aligned} P_{{\mathcal {T}}'}(\textit{OutObs}({\mathcal {T}},\alpha ,k,m))\ge 1-\alpha \end{aligned}$$

and \(v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {t}}}}) = \textit{fail}\) otherwise, and the overall verdict as

$$\begin{aligned} V \in \textit{Aut}_{*} \times \textit{Aut}_{*} \rightarrow \{\, \textit{pass},\, \textit{fail}\,\} \end{aligned}$$

\(\text {with } V({\mathcal {I}},{\hat{{\mathfrak {t}}}})={\left\{ \begin{array}{ll} \textit{pass}&{} \text{ if } v_{\textit{func}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\\ \textit{fail}&{} \text{ otherwise }. \end{array}\right. }\)

An implementation passes a test suite \({\hat{{\mathfrak {T}}}}\) if it passes the overall verdict for all annotated tests \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\).
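The combination of the two sub-verdicts is simple enough to state as code. The sketch below assumes that the functional and statistical verdicts have already been determined for each annotated test and are given as the strings "pass" and "fail"; the function names are hypothetical.

```python
from typing import Iterable, Tuple


def overall_verdict(v_func: str, v_prob: str) -> str:
    """Overall verdict V: pass iff both sub-verdicts are pass."""
    return "pass" if v_func == "pass" and v_prob == "pass" else "fail"


def passes_suite(verdicts: Iterable[Tuple[str, str]]) -> bool:
    """An implementation passes a suite iff it passes every annotated test.
    Each element is a pair (functional verdict, statistical verdict)."""
    return all(overall_verdict(f, p) == "pass" for f, p in verdicts)
```

A single statistical failure thus suffices to fail the whole suite, even if every test passes functionally.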

Although IOMA and IOSA include three properties in terms of (1) functional behaviour, (2) discrete probabilistic behaviour, and (3) continuous time, we only have two verdicts. This is because continuous time is only present in the form of stochastic delays. Thus, on the purely mathematical level, the decision whether or not a delay in the implementation adheres to the one specified is covered by the probabilistic verdict \(v_{\textit{prob}}\). Only on the practical side of things do we need a new decision procedure. We study this in Sect. 5.

4.4 Soundness and completeness

Ideally, only \(\mathbf * \mathbf{-ioco } \) correct implementations pass a test suite. However, due to the stochastic nature of our models, there remains a degree of uncertainty upon giving verdicts. This is phrased as errors of first and second kind in hypothesis testing: the probability to reject a true hypothesis and to accept a false one, respectively. They are reflected as the probability to reject a correct implementation and to accept an erroneous one in the context of probabilistic MBT. The relevance of these errors becomes evident when we consider the correctness of our test frameworks. Correctness comprises soundness and completeness: every conforming implementation passes, and there is a test case to expose every non-conforming one. A test suite can only be considered correct with some guaranteed (high) probability.

Definition 20

(sound, complete) Let \({\mathcal {S}}\) be a specification automaton over action signature \(\langle \textit{Act}_I, \textit{Act}_O \rangle \), \(\alpha \in \; ]0,1[\) the level of significance, and \({\hat{{\mathfrak {T}}}}\) an annotated test suite for \({\mathcal {S}}\). Then, \({\hat{{\mathfrak {T}}}}\) is sound for \({\mathcal {S}}\) with respect to \(\sqsubseteq ^{*}_{\textit{ioco}}\) if, for all input-enabled automata \({\mathcal {I}}\) and sufficiently large \(m\in {\mathbb {N}}\), it holds for all \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\) that

$$\begin{aligned} {\mathcal {I}}\sqsubseteq ^{*}_{\textit{ioco}}{\mathcal {S}}~\Rightarrow ~ V({\mathcal {I}}, {\hat{{\mathfrak {t}}}}) = \textit{pass}. \end{aligned}$$

\({\hat{{\mathfrak {T}}}}\) is complete for \({\mathcal {S}}\) with respect to \(\sqsubseteq ^{*}_{\textit{ioco}}\) if, for all input-enabled automata \({\mathcal {I}}\) and sufficiently large \(m\in {\mathbb {N}}\), there is at least one \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\) such that

$$\begin{aligned} {\mathcal {I}}\not \sqsubseteq ^{*}_{\textit{ioco}}{\mathcal {S}}~\Rightarrow ~ V({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{fail}. \end{aligned}$$

Soundness expresses for a given \(\alpha \in \; ]0,1[\) that there is a \(1-\alpha \) chance that a correct system passes the annotated test suite for sufficiently large sample size m. This relates to false rejection of a correct hypothesis in statistical hypothesis testing, or rejection of a correct implementation, respectively.

For the following theorems, we provide full proofs for sa-ioco. The proofs for mar-ioco use the exact same arguments and only lack some of the technical complications of the more general IOSA setting. The interested reader may find the full proofs for mar-ioco in [18].

Theorem 3

Each annotated test case for an automaton \({\mathcal {S}}\) is sound for every level of significance \(\alpha \in (0,1)\) with respect to \(\sqsubseteq ^{*}_{\textit{ioco}}\).

Proof

Let \({\mathcal {I}}\) be an input-enabled IOSA and \({\hat{{\mathfrak {t}}}}\) be a test for \({\mathcal {S}}\). Assume that \({\mathcal {I}}\sqsubseteq ^{\textit{sa}}_{\textit{ioco}}{\mathcal {S}}\). We want to show \(V({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\). By Definition 19, we have that \(V({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\) if and only if \(v_{\textit{func}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\). We proceed by showing \(v_{\textit{func}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\) and \(v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\) in separate steps:

Functional verdict By Definition 19, we need to show that

$$\begin{aligned} \textit{ann}_{\textit{sa-ioco}}^{\mathcal {S}}\left( \sigma \right) =\textit{pass} \text{ for } \text{ all } \sigma \in \textit{traces}^{\textit{com}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}}). \end{aligned}$$

Let \(\sigma \in \textit{traces}^{\textit{com}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})\) and use Definition 17. Assume \(\sigma '\in \textit{traces}^{\textit{fin}}({\mathcal {S}})\) and \(a\in \textit{Act}_O\) such that \(\sigma ' \!\!\mathbin {.} t\,a\sqsubseteq \sigma \) for some \(t\in {\mathbb {R}}^{+}_{0} \). We observe that (a) since the empty trace is in \(\textit{traces}^{\textit{fin}}({\mathcal {S}})\), \(\sigma '\) always exists, and (b) if no such \(a \in \textit{Act}_O\) exists, then \(\sigma \) only consists of inputs, and by Definition 17 consequently \(\textit{ann}_{\textit{sa-ioco}}^{\mathcal {S}}(\sigma )=\textit{pass}\). By construction of \(\sigma \), we have \(\sigma ' \!\!\mathbin {.} t\, a \in \textit{traces}^{\textit{fin}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})\) and therefore also \(\sigma ' \!\!\mathbin {.} t\, a \in \textit{traces}^{\textit{fin}}({\mathcal {I}})\). In particular, the parallel composition with a test case does not alter the guard sets on edges. We conclude that \(\sigma '\in \textit{traces}^{\textit{fin}}({\mathcal {I}})\cap \textit{traces}^{\textit{fin}}({\mathcal {S}})\). Our goal is to show \(\sigma ' \!\!\mathbin {.} \, t\, a \in \textit{traces}^{\textit{fin}}({\mathcal {S}})\).

Let \(l=\left| \sigma '\right| \) be the length of \(\sigma '\). W.l.o.g. we can now choose \({\mathcal {T}}\in \textit{trd}({\mathcal {S}},l)\) such that \(P_{\mathcal {T}}(\varSigma ')>0\). In particular, this choice is not invalidated by urgent transitions. If a transition has a guard set with a clock that can never expire in a location due to another urgent output, then this transition is never part of a path (Definition 6). With the previous observation, this yields \(\textit{outcont}_{{\mathcal {I}}}({\mathcal {T}})\ne \varnothing \). Again, w.l.o.g. we choose \({\mathcal {T}}'\in \textit{outcont}_{{\mathcal {I}}}({\mathcal {T}})\) such that \(P_{{\mathcal {T}}'}(\varSigma ' \!\!\mathbin {.} [0,t]\,a)>0\). Finally, we assumed \({\mathcal {I}}\sqsubseteq ^{\textit{sa}}_{\textit{ioco}}{\mathcal {S}}\); hence,

$$\begin{aligned} \textit{outcont}_{{\mathcal {I}}}({\mathcal {T}})\subseteq \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}). \end{aligned}$$

We conclude \({\mathcal {T}}'\in \textit{trd}({\mathcal {S}},l+1)\) and \(P_{{\mathcal {T}}'}(\varSigma ' \!\!\mathbin {.} [0,t]\, a)>0\). By Definition 13, this implies \(\sigma ' \!\!\mathbin {.} t\, a\in \textit{traces}^{\textit{fin}}({\mathcal {S}})\). If additionally \(\sigma ' \!\!\mathbin {.} t\,a\in \textit{traces}^{\textit{com}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})\), then \(\sigma =\sigma ' \!\!\mathbin {.} t\,a\). Consequently, \(\textit{ann}_{\textit{sa-ioco}}^{\mathcal {S}}(\sigma )=\textit{pass}\) by Definition 17 and \(v_{\textit{func}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\).

Statistical verdict By Definition 19, we must show that for all \({\mathcal {T}}\in \textit{trd}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}},k)\) there exists a \({\mathcal {T}}'\in \textit{trd}({\mathcal {S}},k)\) such that

$$\begin{aligned} P_{{\mathcal {T}}'}\left( \textit{OutObs}\left( {\mathcal {T}},\alpha ,k,m\right) \right) \ge 1-\alpha . \end{aligned}$$

Let \({\mathcal {T}}\in \textit{trd}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}},k)\). By Remark 2, \(\textit{OutObs}({\mathcal {T}},\alpha ,k,m)\) is the set of all \(O \in (({\mathbb {R}}^{+}_{0} \times \textit{Act})^{\le k-1} \times {\mathbb {R}}^{+}_{0} \times \textit{Act}_O)^m\) such that \(\textit{dist}(\textit{freq}(O),{\mathbb {E}}^{{\mathcal {T}}})\le r_{\alpha }\). There exists \({\mathcal {T}}'\in \textit{trd}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}},k)\) with

$$\begin{aligned} P_{{\mathcal {T}}'}(\varSigma )={\left\{ \begin{array}{ll} 0 &{} \text {if } \sigma \in ({\mathbb {R}}^{+}_{0} \times \textit{Act})^{k-1} \times {\mathbb {R}}^{+}_{0} \times \textit{Act}_I \\ P_{{\mathcal {T}}}(\varSigma ) &{} \text {if } \sigma \in ({\mathbb {R}}^{+}_{0} \times \textit{Act})^{\le k-1} \times {\mathbb {R}}^{+}_{0} \times \textit{Act}_O. \end{array}\right. } \end{aligned}$$
(1)

To see why, consider the scheduler that assigns all probability to halting instead of inputs for traces of length k while assigning the same probability to outputs as the scheduler of \({\mathcal {T}}\). By construction of \(\textit{OutObs}\) (Remark 2), observe that

$$\begin{aligned} P_{{\mathcal {T}}'}(\textit{OutObs}({\mathcal {T}}',\alpha ,k,m))= & {} P_{{\mathcal {T}}'}(\textit{OutObs}({\mathcal {T}},\alpha ,k,m)) \\= & {} P_{{\mathcal {T}}}(\textit{OutObs}({\mathcal {T}},\alpha ,k,m)) \\\ge & {} 1-\alpha \end{aligned}$$

since only traces ending in output are measured.

It is now sufficient to show that \({\mathcal {T}}'\in \textit{trd}({\mathcal {S}},k)\). As an intermediate step, we first show that \({\mathcal {T}}'\in \textit{trd}({\mathcal {I}},k)\), as this will let us make use of the assumption \({\mathcal {I}}\sqsubseteq ^{\textit{sa}}_{\textit{ioco}}{\mathcal {S}}\). Consider the mapping

$$\begin{aligned} f \in \textit{paths}^{\textit{fin}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})\longrightarrow \textit{paths}^{\textit{fin}}({\mathcal {I}}) \end{aligned}$$

where for every fragment of the path we have

$$\begin{aligned}&f(\ldots \langle \langle \ell ,q \rangle ,v_0,x_0\rangle \langle t_1,e_1,R_1,\langle \langle \ell ',q' \rangle ,v_1,x_1\rangle \rangle \ldots )\\&\quad = \ldots \langle \ell ,{\bar{v}}_0,{\bar{x}}_0\rangle \langle {\bar{t}}_1,{\bar{e}}_1,{\bar{R}}_1,\langle \ell ',{\bar{v}}_1,{\bar{x}}_1\rangle \rangle \ldots \end{aligned}$$

This is possible because test cases do not contain clocks and parallel composition thus does not change guard sets, restart sets, or expiration times (Definition 4) and implies \(v_i={\bar{v}}_i \wedge x_i={\bar{x}}_i\) for \(i=0,1\) and \(t_1={\bar{t}}_1 \wedge R_1={\bar{R}}_1\). For \({\bar{e}}_1\) consider \(g \in E _{{\mathcal {I}}\,\!\parallel \!\,{\hat{{\mathfrak {t}}}}}\rightarrow E _{{\mathcal {I}}}\) such that

$$\begin{aligned} g(e) = g(C,a,\mu (R,(\ell ,q))) = (C,a,{\bar{\mu }}(R,\ell ))={\bar{e}} \end{aligned}$$

where \(\mu (R,\langle \ell ,q \rangle )={\bar{\mu }}(R,\ell )\) for all \(\ell \). This construction of \(\mu \) is possible because tests only contain Dirac distributions and discrete probabilities thus directly transfer. Hence, q is uniquely determined by parallel composition. Since \({\hat{{\mathfrak {t}}}}\) is internally deterministic, f is an injective mapping, i.e.

$$\begin{aligned} f(\pi _1)=f(\pi _2) \Rightarrow \pi _1=\pi _2. \end{aligned}$$

By Definition 13, there is a scheduler \({\mathfrak {S}}'\in \textit{Sched}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})^{\le k}\) such that \(\textit{trd}({\mathfrak {S}}')={\mathcal {T}}'\). With the help of f, we show the existence of a scheduler \({\mathfrak {S}}''\in \textit{Sched}({\mathcal {I}})\) such that for all traces \(\sigma \) we have \(P_{\textit{trd}({\mathfrak {S}}')}(\varSigma )=P_{\textit{trd}({\mathfrak {S}}'')}(\varSigma )\), i.e. \(\textit{trd}({\mathfrak {S}}'')={\mathcal {T}}'\).

For every path \(\pi \in \textit{paths}^{\textit{fin}}({\mathcal {I}})\) with

$$\begin{aligned} f^{-1}(\pi )\in \textit{paths}^{\textit{fin}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}}), \end{aligned}$$

we define \({\mathfrak {S}}''\) as \({\mathfrak {S}}''(\pi )({\bar{e}}) := {\mathfrak {S}}'(f^{-1}(\pi ))(e)\), and we set \(P_{{\mathfrak {S}}''}(\Pi )=0\) if \(\pi \notin \textit{paths}^{\textit{fin}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})\). The construction of \({\mathfrak {S}}''\) is straightforward: due to the construction of test cases, \({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}}\) is internally deterministic. In particular, there is no interleaving. This means that \({\mathfrak {S}}''\) can copy the behaviour of \({\mathfrak {S}}'\) step by step. We set \({\mathcal {T}}''=\textit{trd}({\mathfrak {S}}'')\) and conclude \({\mathcal {T}}''\in \textit{trd}({\mathcal {I}},k)\). By construction \(P_{{\mathcal {T}}''}(\varSigma )=P_{{\mathcal {T}}'}(\varSigma )\) for all traces \(\sigma \). Further,

$$\begin{aligned} P_{{\mathcal {T}}''}(\textit{OutObs}({\mathcal {T}}'',\alpha ,k,m))= & {} P_{{\mathcal {T}}''}(\textit{OutObs}({\mathcal {T}}',\alpha ,k,m)) \\= & {} P_{{\mathcal {T}}''}(\textit{OutObs}({\mathcal {T}},\alpha ,k,m))\\= & {} P_{{\mathcal {T}}'}(\textit{OutObs}({\mathcal {T}},\alpha ,k,m)) \\= & {} P_{{\mathcal {T}}}(\textit{OutObs}({\mathcal {T}},\alpha ,k,m)) \\\ge & {} 1-\alpha . \end{aligned}$$

We proceed to show that \({\mathcal {T}}''\in \textit{trd}({\mathcal {S}},k)\). The proof is by induction over trace distribution length of prefixes of \({\mathcal {T}}''\) up to k. Trivially, if \({\mathcal {T}}''\in \textit{trd}({\mathcal {I}},0)\), then also \({\mathcal {T}}''\in \textit{trd}({\mathcal {S}},0)\). Assume this has been shown for length n. We proceed by showing that the statement holds for \(n+1\le k\). Let \({\mathcal {T}}''\in \textit{trd}({\mathcal {I}},n+1)\) and take \({\mathcal {T}}'''\sqsubseteq _n{\mathcal {T}}''\). By induction assumption \({\mathcal {T}}'''\in \textit{trd}({\mathcal {S}},n)\). Together with \({\mathcal {I}}\sqsubseteq ^{\textit{sa}}_{\textit{ioco}}{\mathcal {S}}\), we have

$$\begin{aligned} \textit{outcont}_{{\mathcal {I}}}({\mathcal {T}}''')\subseteq \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}'''). \end{aligned}$$

Since \({\mathcal {T}}''\in \textit{outcont}_{{\mathcal {I}}}({\mathcal {T}}''')\) (Eq. 1), we also have that \({\mathcal {T}}''\in \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}''')\), and consequently \({\mathcal {T}}''\in \textit{trd}({\mathcal {S}},n+1)\). We showed \({\mathcal {T}}''\in \textit{trd}({\mathcal {S}},k)\) and conclude

$$\begin{aligned} P_{{\mathcal {T}}''}(\textit{OutObs}({\mathcal {T}},\alpha ,k,m))\ge 1-\alpha . \end{aligned}$$

Ultimately, this yields \(v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\) by Definition 19. \(\square \)

Completeness of a test suite is an inherently theoretical result. Infinite behaviour of the implementation, for instance, via loops, would require an infinite test suite. Moreover, the possibility of accepting an erroneous implementation by chance, i.e. committing an error of the second kind, remains. However, the latter is bounded from above by construction, and decreases with increasing sample size (Definition 18).

Theorem 4

The set of all annotated test cases for an automaton \({\mathcal {S}}\) is complete for every level of significance \(\alpha \in (0,1)\) with respect to \(\sqsubseteq ^{\textit{sa}}_{\textit{ioco}}\) for sufficiently large sample size.

Proof

Assume \({\mathcal {I}}\not \sqsubseteq ^{\textit{sa}}_{\textit{ioco}}{\mathcal {S}}.\) We want to show that \(V({\mathcal {I}},{\hat{{\mathfrak {T}}}})=\textit{fail}\). By the definition of verdicts (Definition 19), this is the case iff \(v_{\textit{func}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{fail}\) or \(v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{fail}\) for some \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\). Since \({\mathcal {I}}\not \sqsubseteq ^{\textit{sa}}_{\textit{ioco}}{\mathcal {S}}\), there is a \(k\in {\mathbb {N}}\) such that there is a \({\mathcal {T}}^*\in \textit{trd}({\mathcal {S}},k)\) for which \(\textit{outcont}_{{\mathcal {I}}}({\mathcal {T}}^*) \nsubseteq \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}^*)\). More specifically, there exists a \({\mathcal {T}}\in \textit{outcont}_{{\mathcal {I}}}({\mathcal {T}}^*)\) such that

$$\begin{aligned} \forall {\mathcal {T}}'\in \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}^*):\exists \,\sigma \in {\mathfrak {C}}:P_{{\mathcal {T}}}(\varSigma )\ne P_{{\mathcal {T}}'}(\varSigma ) \end{aligned}$$
(2)

where \({\mathfrak {C}} := \textit{traces}^{\textit{fin}}({\mathcal {I}})\cap ({\mathbb {R}}^{+}_{0} \times \textit{Act})^k \times {\mathbb {R}}^{+}_{0} \times \textit{Act}_O\) and \(\varSigma \) is the abstract trace of \(\sigma \). W.l.o.g. we can assume k to be minimal. There are two cases to consider: (1) \(\exists \sigma \in {\mathfrak {C}}:\sigma \notin \textit{traces}^{\textit{fin}}({\mathcal {S}})\), or (2) \(\forall \sigma \in {\mathfrak {C}}:\sigma \in \textit{traces}^{\textit{fin}}({\mathcal {S}})\). We will relate the two cases to the functional and the probabilistic verdict (Definition 19): we prove that case 1 implies that \(v_{\textit{func}}({\mathcal {I}},{\hat{{\mathfrak {T}}}})=\textit{fail}\) and that case 2 implies \(v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {T}}}})=\textit{fail}\). Now let \({\mathcal {T}}\in \textit{outcont}_{{\mathcal {I}}}({\mathcal {T}}^*)\) such that Eq. 2 holds for all \({\mathcal {T}}'\in \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}^*)\).

Functional verdict By Definition 19, we need to show

$$\begin{aligned} \exists \sigma \in \textit{traces}^{\textit{com}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}}):\textit{ann}_{\textit{sa-ioco}}^{\mathcal {S}}(\sigma )=\textit{fail}\end{aligned}$$

for some \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\). Assume there is a \(\sigma \in {\mathfrak {C}}\) such that \(\sigma \notin \textit{traces}^{\textit{fin}}({\mathcal {S}})\). Our goal is to show that there is \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\) for which \(\sigma \in \textit{traces}^{\textit{com}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})\) and \(\textit{ann}_{\textit{sa-ioco}}^{\mathcal {S}}(\sigma )=\textit{fail}\).

Without loss of generality, we assume \(P_{{\mathcal {T}}}(\varSigma )>0\). To see why, assume \(P_{{\mathcal {T}}}(\varSigma )=0\). Then, we can find a trace distribution in \(\textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}^*)\) with an underlying scheduler \(\textit{Sched}({\mathcal {S}})\) that does not assign positive probability to the last action in \(\sigma \) to obtain overall probability zero. This violates the assumption that \(P_{{\mathcal {T}}}(\varSigma )\ne P_{{\mathcal {T}}'}(\varSigma )\) for all \({\mathcal {T}}'\in \textit{trd}({\mathcal {S}})\). We conclude \(\sigma =\sigma ' \!\!\mathbin {.} t\,a\), for some \(\sigma ' \in ({\mathbb {R}}^{+}_{0} \times \textit{Act})^k\), \(a\in \textit{Act}_O\) and \(t\in {\mathbb {R}}^{+}_{0} \). The prefix \(\sigma '\) is in \(\textit{traces}^{\textit{fin}}({\mathcal {S}})\) because it is of length k and since \({\mathcal {T}}^*\in \textit{trd}({\mathcal {S}},k)\). Since \({\mathcal {T}}\) and all \({\mathcal {T}}'\in \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}^*)\) are continuations of \({\mathcal {T}}^*\), we conclude that \(P_{{\mathcal {T}}^*}(\varSigma ')=P_{{\mathcal {T}}}(\varSigma ')=P_{{\mathcal {T}}'}(\varSigma '),\) i.e. that all trace distributions of the respective sets assign every prefix of \(\sigma \) the same probability by merit of \(\textit{outcont}\). We conclude \(\sigma '\in \textit{traces}^{\textit{fin}}({\mathcal {S}})\), but \(\sigma ' \!\!\mathbin {.} t\,a\notin \textit{traces}^{\textit{fin}}({\mathcal {S}})\).

By initial assumption \({\hat{{\mathfrak {T}}}}\) contains all annotated test cases. Let \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\) such that \(\sigma \in \textit{traces}^{\textit{com}}({\hat{{\mathfrak {t}}}})\). This is possible because \(\sigma '\in \textit{traces}^{\textit{fin}}({\mathcal {S}})\). By Definition 17, \(\textit{ann}_{\textit{sa-ioco}}^{\mathcal {S}}(\sigma )=\textit{fail}\). Recall that the set of clocks in test cases is empty. Since \(\sigma \in \textit{traces}^{\textit{fin}}({\mathcal {I}})\) and \(\sigma \in \textit{traces}^{\textit{com}}({\hat{{\mathfrak {t}}}})\), we consequently also have \(\sigma \in \textit{traces}^{\textit{com}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})\) as no guard or restart sets are changed under parallel composition with a test case. Ultimately, this yields \(v_{\textit{func}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{fail}\).

Statistical verdict By Definition 19, we must show that there is \({\mathcal {T}}\in \textit{trd}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}},l)\) such that for all \({\mathcal {T}}'\in \textit{trd}({\mathcal {S}},l)\) we have

$$\begin{aligned} P_{{\mathcal {T}}'}(\textit{OutObs}({\mathcal {T}},\alpha ,l,m))<1-\alpha , \end{aligned}$$

for some \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\) and some \(l\in {\mathbb {N}}\).

Together with Eq. 2 and Definition 18, we conclude that for all \({\mathcal {T}}'\in \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}^*)\) we have

$$\begin{aligned} P_{{\mathcal {T}}'}(\textit{OutObs}({\mathcal {T}},\alpha ,k+1,m))<\beta _m \end{aligned}$$
(3)

for some \(\beta _m\rightarrow 0\) as \(m\rightarrow \infty \). Observe that

$$\begin{aligned}&\sup _{{\mathcal {T}}'\in \textit{trd}({\mathcal {S}},k+1)}P_{{\mathcal {T}}'}(\textit{OutObs}({\mathcal {T}},\alpha ,k+1,m))\nonumber \\&\quad = \sup _{{\mathcal {T}}'\in \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}^*)}P_{{\mathcal {T}}'}(\textit{OutObs}({\mathcal {T}},\alpha ,k+1,m)), \end{aligned}$$
(4)

by Remark 2. \(\textit{OutObs}\) only comprises traces ending in output; thus, its measure under any trace distribution of \(\textit{trd}({\mathcal {S}},k+1)\) cannot be larger than the measure of the ones already contained in \(\textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}^*)\). Together with Eq. 3, this yields that for all \({\mathcal {T}}'\in \textit{trd}({\mathcal {S}},k+1)\) we have

$$\begin{aligned} P_{{\mathcal {T}}'}(\textit{OutObs}({\mathcal {T}},\alpha ,k+1,m))<\beta _m \end{aligned}$$
(5)

for some \(\beta _m\rightarrow 0\) as \(m\rightarrow \infty \). We are left to show that \({\mathcal {T}}\in \textit{trd}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}},k+1)\) for some \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\). Let

$$\begin{aligned} {\mathfrak {K}}=\{\,\sigma \in \textit{traces}^{\textit{fin}}({\mathcal {I}})\mid P_{{\mathcal {T}}}(\varSigma )>0\,\}, \end{aligned}$$

i.e. the set of all traces assigned positive probability under \({\mathcal {T}}\). Obviously \({\mathfrak {C}}\subseteq {\mathfrak {K}}\). By initial assumption, we know that all \(\sigma \in {\mathfrak {C}}\) are contained in \(\textit{traces}^{\textit{fin}}({\mathcal {S}})\). Hence, all \(\sigma \in {\mathfrak {K}}\) are necessarily in \(\textit{traces}^{\textit{fin}}({\mathcal {S}})\). Thus, there is a test case \({\hat{{\mathfrak {t}}}}\) for \({\mathcal {S}}\) such that all \(\sigma \in {\mathfrak {K}}\) are in \(\textit{traces}^{\textit{com}}({\hat{{\mathfrak {t}}}})\). In particular, all \(\sigma \) end in output by assumption. Hence, the last stage of every test case is item 2 in Definition 16. We now construct a scheduler \({\mathfrak {S}}'\in \textit{Sched}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})^{\le k+1}\) such that \(\textit{trd}({\mathfrak {S}}')={\mathcal {T}}\).

Consider the mapping \(f \in \textit{tr}^{-1}({\mathfrak {K}})\rightarrow \textit{paths}^{\textit{fin}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})\) where for every path fragment we have

$$\begin{aligned}&f(\ldots \langle \ell ,v_0,x_0\rangle \langle t_1,e_1,R_1,\langle \ell ',v_1,x_1\rangle \rangle \ldots )\\&\quad = \ldots \langle \langle \ell ,q \rangle ,{\bar{v}}_0,{\bar{x}}_0\rangle \langle {\bar{t}}_1,{\bar{e}}_1,{\bar{R}}_1,\langle \langle \ell ',q' \rangle ,{\bar{v}}_1,{\bar{x}}_1\rangle \rangle \ldots . \end{aligned}$$

By Definition 16, \(v_i={\bar{v}}_i \wedge x_i={\bar{x}}_i\) for \(i=0,1\) and \(t_1={\bar{t}}_1 \wedge R_1={\bar{R}}_1\), because test cases do not have clocks. Further, we define \(g \in E _{{\mathcal {I}}}\rightarrow E _{{\mathcal {I}}\,\!\parallel \!\,{\mathfrak {t}}}\) such that

$$\begin{aligned} g(e)=g(C,a,\mu (\langle R,\ell \rangle ))=(C,a,{\bar{\mu }}(\langle R,\langle \ell ,q \rangle \rangle ))={\bar{e}} \end{aligned}$$

where \({\bar{\mu }}(\langle R,\langle \ell ,q \rangle \rangle )=\mu (\langle R,\ell \rangle )\) for all \(\ell \). The state q is uniquely determined because tests are internally deterministic and every distribution is the Dirac distribution. Thus, discrete probabilities carry over from \(\mu \) to \({\bar{\mu }}\). In particular, \(q=q'\) if \(a=\tau \). Then, f is an injection, i.e. \(f(\pi _1)=f(\pi _2)\Rightarrow \pi _1=\pi _2\).

We now construct \({\mathfrak {S}}'\). Let \({\mathfrak {S}}\) be the scheduler that induces \({\mathcal {T}}\) by Definition 13. For every \(\pi \in \textit{tr}^{-1}({\mathfrak {K}})\), we define

$$\begin{aligned} {\mathfrak {S}}'(\pi )({\bar{e}}):={\mathfrak {S}}(f^{-1}(\pi ))(e). \end{aligned}$$

The construction of \({\mathfrak {S}}'\) is straightforward: since \({\hat{{\mathfrak {t}}}}\) is internally deterministic, and each of its discrete distributions is a Dirac distribution, there is no interleaving in \({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}}\). Hence, a scheduler of \({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}}\) may copy the decisions of \({\mathfrak {S}}\) step by step. In particular, \(P_{\textit{trd}({\mathfrak {S}}')}(\varSigma )=0\) for \(\sigma \notin {\mathfrak {K}}\). We conclude \(\textit{trd}({\mathfrak {S}}')={\mathcal {T}}\) and therefore \({\mathcal {T}}\in \textit{trd}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}},k+1)\).

Together with Eq. 4, we have found a scheduler \({\mathfrak {S}}'\) such that \(\textit{trd}({\mathfrak {S}}')\in \textit{trd}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}},k+1)\), and for all \({\mathcal {T}}'\in \textit{trd}({\mathcal {S}},k+1)\) we have

$$\begin{aligned} P_{{\mathcal {T}}'}(\textit{OutObs}(\textit{trd}({\mathfrak {S}}'),\alpha ,k+1,m))<\beta _m. \end{aligned}$$

Now if \(\alpha \le 1-\beta _m\), we estimate this further to

$$\begin{aligned} P_{{\mathcal {T}}'}(\textit{OutObs}(\textit{trd}({\mathfrak {S}}'),\alpha ,k+1,m))<\beta _m\le 1-\alpha . \end{aligned}$$

However, the inequality \(\alpha \le 1-\beta _m\) always holds for sufficiently large m, since \(\beta _m\rightarrow 0\) as \(m\rightarrow \infty \) by Definition 18. Ultimately, this yields \(v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{fail}\). \(\square \)

5 Implementing stochastic testing

We now present practical procedures to implement the concepts defined in the previous section. First, we propose a goodness-of-fit method in the form of Pearson’s \(\chi ^2\) test enriched with confidence interval analysis on the time stamps to evaluate the stochastic behaviour of the observed traces in the IOMA setting. Waiting times recorded in traces are grouped and compared to the prescribed rate parameters in the specification. Some additional assumptions are necessary to enable a clean and efficient framework. Since IOSA are not limited to exponential distributions, we need more powerful ways to infer if a sample was drawn from a particular distribution. In the IOSA setting, we thus apply the Kolmogorov–Smirnov (KS) test, which is able to infer general probability distributions, in place of interval estimation. Next, we discuss the interplay of stochastic delays and quiescence. Finally, we summarise the overall stochastic MBT procedure from test case generation to final verdicts.

5.1 Goodness of fit

We need practically applicable methods to decide about the verdicts given by Definition 19. While the functional verdict is determined via test annotations in the same straightforward way as in traditional ioco testing, we also need a procedure to decide the probabilistic verdict. We propose a two-step procedure consisting of Pearson’s \(\chi ^2\) hypothesis test for the discrete probabilities followed by interval estimation (in the IOMA setting) or multiple KS tests (in the IOSA setting) for the time stamps resulting from the stochastic delays.

Our method is based on a theorem known from the literature [8] relating trace distributions to the set of acceptable outcomes. However, neither is readily available to us in case of a real black-box implementation—only experiments and samples give evidence about its inner workings. Therefore, we pose a null-hypothesis test based on a gathered sample of the implementation. Should the sample turn out to be an acceptable outcome of the specification, too, then we accept the hypothesis that all observations of the implementation are also observations of the specification. In tandem with the theorem by Cheung et al. [8], this would imply an embedding on the set of trace distributions. Consequently, the resulting probabilistic verdict in Definition 19 would be pass.

5.1.1 Pearson’s \({{\varvec{\chi }}}^2\) test

In previous work for pIOTS models [20], we used the \(\chi ^2\) hypothesis test to judge discrete probabilistic behaviour. Its outcome is based on a sample O taken from the implementation under test. Should O prove to be a sample of the set \(\textit{OutObs}({\mathcal {S}},\alpha ,k,m)\) for some \(\alpha \in (0,1)\), we are willing to accept the hypothesis of the embeddings of observations. In the continuous-time stochastic case, we argue along the same lines. However, only applying the \(\chi ^2\) hypothesis test is insufficient, as it does not take into account the delays observed in abstract traces. Nonetheless, passing the \(\chi ^2\) test is a necessary condition for an implementation to be accepted.

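For a finite trace \(\sigma =t_1\,a_1\,t_2\,a_2\ldots \,t_n\,a_n\), we define its time closure as \({\bar{\sigma }}={\mathbb {R}}^{+}_{0} \,a_1\,{\mathbb {R}}^{+}_{0} \,a_2\ldots \,{\mathbb {R}}^{+}_{0} \,a_n\). Then, the empiric \(\chi ^2\) score is given as

$$\begin{aligned} \chi ^2=\sum _{{\bar{\sigma }}\in \{{\bar{\varrho }}\,\mid \,\varrho \in O\}}\frac{\left( n({\bar{\sigma }})-m\cdot {\mathbb {E}}^{\mathcal {T}}({\bar{\sigma }})\right) ^2}{m\cdot {\mathbb {E}}^{\mathcal {T}}({\bar{\sigma }})}, \end{aligned}$$
(6)

where \(n({\bar{\sigma }})\) is the number of traces in O whose time closure is \({\bar{\sigma }}\), m is the sample size, and \({\mathbb {E}}^{\mathcal {T}}({\bar{\sigma }})\) is the expected probability of \({\bar{\sigma }}\) under the trace distribution \({\mathcal {T}}\). The score essentially compares observed traces to their respective expected counterparts. We use the time closure of traces to ignore time stamps for the \(\chi ^2\) analysis. The empirical \(\chi ^2\) value is compared to critical values of given degrees of freedom and levels of significance. The degrees of freedom are given by the number of different time closures in O minus one. The critical value can be calculated, or looked up in a \(\chi ^2\) table. In case the empiric \(\chi ^2\) score is below the given threshold \(\chi ^2_{\textit{crit}}\), the hypothesis is accepted, and otherwise, it is rejected.

However, the expected value \({\mathbb {E}}^{\mathcal {T}}\) depends on the resulting trace distribution of a scheduler. Thus, finding a scheduler such that \(\chi ^2\le \chi ^2_{\textit{crit}}\) turns (6) into a minimisation problem (or satisfaction problem, respectively):

$$\begin{aligned} \min _{{\mathcal {T}}\in \textit{trd}({\mathcal {S}},k)}\;\sum _{{\bar{\sigma }}\in \{{\bar{\varrho }}\,\mid \,\varrho \in O\}}\frac{\left( n({\bar{\sigma }})-m\cdot {\mathbb {E}}^{\mathcal {T}}({\bar{\sigma }})\right) ^2}{m\cdot {\mathbb {E}}^{\mathcal {T}}({\bar{\sigma }})}. \end{aligned}$$
(7)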

The probability of a trace is given by a scheduler and the corresponding path probability function. Hence, we need to find probabilities p used by a scheduler to resolve nondeterminism. This turns (7) into a minimisation or constraint solving problem of a rational function f(p) / g(p) with inequality constraints on the vector p. This type of problem is NP-hard in general [39].

5.1.2 Interval estimation for IOMA

In addition to the \(\chi ^2\) test defined above, we need a metric to decide whether the observed delays correspond to exponential distributions prescribed by the specification in the IOMA setting. For this purpose, we use interval estimation on the parameters of the exponential distributions.

In general, assume values \(x_1,\ldots ,x_n\) are given, and suppose we are to test whether the values follow an exponential distribution with rate \(\lambda \). Our goal is to construct the confidence interval of these values for a given \(\alpha \in \; ]0,1[\), i.e. upon further sampling and estimations, there is a \(1-\alpha \) chance that the true parameter \(\lambda _{\textit{real}}\) is contained in the interval. The \(1-\alpha \) confidence interval is given by

$$\begin{aligned} \left[ \frac{\chi ^2_{1-\alpha /2,2n}}{2\sum _{i=1}^n x_i},\frac{\chi ^2_{\alpha /2,2n}}{2\sum _{i=1}^n x_i}\right] \end{aligned}$$
(8)

where \(\chi ^2_{\alpha ,2n}\) is the \(1-\alpha \) quantile of the \(\chi ^2\) distribution with 2n degrees of freedom.
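The interval of Eq. 8 can be computed without a statistics library, since the \(\chi ^2\) distribution with an even number 2n of degrees of freedom has a closed-form CDF. The following is a minimal sketch under that observation; the function names and the sample delays are illustrative assumptions and not part of the framework or of any figure.

```python
import math

def chi2_cdf_even(x, dof):
    """CDF of the chi-square distribution for an even number of degrees of freedom.

    For dof = 2n it has the closed form 1 - exp(-x/2) * sum_{j<n} (x/2)^j / j!.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_quantile_even(q, dof):
    """q-quantile of the chi-square distribution (even dof), found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < q:   # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, dof) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def rate_confidence_interval(delays, alpha):
    """(1 - alpha) confidence interval for the rate of an exponential distribution.

    In the notation of Eq. 8, chi^2_{1-alpha/2,2n} is the alpha/2 quantile and
    chi^2_{alpha/2,2n} is the (1 - alpha/2) quantile.
    """
    n = len(delays)
    total = sum(delays)
    lower = chi2_quantile_even(alpha / 2.0, 2 * n) / (2.0 * total)
    upper = chi2_quantile_even(1.0 - alpha / 2.0, 2 * n) / (2.0 * total)
    return lower, upper

# Hypothetical delays (assumption, not data from the paper); hypothesised rate 1.0.
delays = [0.2, 1.5, 0.7, 0.9, 2.1, 0.4, 1.1, 1.3]
low, high = rate_confidence_interval(delays, alpha=0.1)
accept = low <= 1.0 <= high
```

The hypothesised rate is accepted exactly when it lies inside the computed interval. Bisection is chosen here only to keep the sketch self-contained; a library quantile function would serve equally well.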

Fig. 8 Specification IOMA and observation sample

Example 7

Figure 8 shows an example specification model alongside an example observation sample from an implementation. State \(s_0\) has two outgoing \(\tau \) transitions, followed by one Markovian transition in each of \(s_1\) and \(s_2\). In states \(s_3\) and \(s_4\), we either observe action a! or b!, respectively. The sample shows 14 recorded traces of length one, thus \(m=14\) and \(k=1\). There are two steps to assess whether the observed data are a truthful sample of the specification model with a confidence of \(\alpha =0.1\): first find a trace distribution that minimises the \(\chi ^2\) statistic, then evaluate two confidence intervals to assess whether the observed time data are a sample of \(\lambda _1=1\) and \(\lambda _2=0.1\), respectively.

There are two classes of traces solely based on the action signature: ID 1-8 with a! and ID 9-14 with b!. Let p be the probability that a scheduler assigns to taking the left branch in \(s_0\), and \(1-p\) the probability for the right branch. Upon drawing a sample with \(m=14\) we expect \(m\cdot p\) as frequency for a! and \(m\cdot (1-p)\) as frequency for b!. The empirical \(\chi ^2\) score therefore calculates as

$$\begin{aligned} \chi ^2 = \frac{(8-14\cdot p)^2}{14\cdot p} + \frac{(6- 14\cdot (1-p))^2}{14\cdot (1-p)}. \end{aligned}$$

This yields \(\chi ^2=0\) for \(p=8/14\), which is obviously smaller than the value \(\chi ^2_{\textit{crit}} = \chi ^2_{0.1,1} = 2.706\). We thus proceed to confidence interval estimation.
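The search for a suitable scheduler probability p can be illustrated numerically. The sketch below evaluates Pearson's \(\chi ^2\) statistic for the two observed classes (8 traces with a!, 6 with b!) and minimises it over a grid of candidate values for p. The grid search and the function names are assumptions made for illustration; the general problem calls for optimisation or constraint solving.

```python
def chi2_score(observed_counts, expected_probs):
    """Pearson's chi-square statistic: sum of (observed - expected)^2 / expected."""
    m = sum(observed_counts)
    return sum((obs - m * prob) ** 2 / (m * prob)
               for obs, prob in zip(observed_counts, expected_probs))

def score_for(p):
    """Score of the sample when the scheduler takes the left branch with probability p."""
    return chi2_score([8, 6], [p, 1.0 - p])

# Candidate scheduler probabilities strictly between 0 and 1.
grid = [i / 1000.0 for i in range(1, 1000)]
best_p = min(grid, key=score_for)

CHI2_CRIT = 2.706  # critical value for alpha = 0.1 and one degree of freedom (from the text)
passes = score_for(best_p) <= CHI2_CRIT
```

The grid minimum lies next to \(p=8/14\), where the statistic vanishes, so the sample passes this first step.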

\(t_1=0.03,\ldots ,t_8=2.69\) is the data associated with \(\lambda _1\) and \(t_1'=2.28,\ldots ,t'_6=19.01\) the data associated with \(\lambda _2\). Calculating the confidence intervals according to Eq. 8 yields \(C_1=[0.441,1.458]\) and \(C_2=[0.092,0.368]\). We see that \(\lambda _1\in C_1\) and \(\lambda _2\in C_2\) and are therefore willing to accept that the recorded sample was drawn under the prescribed parameters.

These two steps do not yet make a sound statement about the acceptance of the hypothesis \(O\in \textit{OutObs}({\mathcal {S}},0.1,1,14)\) since we test multiple hypotheses at once. We need to adjust the individual level of significance for the statistical tests, to conclude the overall acceptance with \(\alpha =0.1\). This inflation of the error of the first kind is discussed in Sect. 5.1.4.

Example 7 highlights the necessity of two assumptions if we are to apply confidence intervals as the method of choice:

  • We must be able to uniquely identify every recorded trace. Assume for illustration that the transition currently labelled b! was labelled a! instead. It would not directly be possible to associate values \(t_i\) with \(\lambda _1\) and \(t_i'\) with \(\lambda _2\); we would need to check all possible permutations. This becomes infeasible in practice even for moderate sample sizes or moderately sized models; we therefore assume all specification models to be internally deterministic, i.e. there must be a bijection between paths and traces.

  • The sum of exponential distributions is not an exponential distribution. Hence, confidence interval estimation would be flawed for two sequential Markovian actions. We would need to deal with phase-type distributions instead, which are dense in the set of all positively valued distributions. We thus assume models to contain an input or output between any two Markovian transitions.

5.1.3 Kolmogorov–Smirnov tests for IOSA

Working with IOSA means that specifications and implementations are not limited to the exponential distribution. Since they neither comprise one specific distribution nor one specific parameter to test for, we use the nonparametric KS test to validate that the observed delays were drawn from the specified clocks and distributions. The KS test assesses whether observed data matches a hypothesised continuous probability measure. We thus restrict the practical application of our approach to IOSA where the \(F(c)\) for all clocks c are continuous distributions.

Let \(t_1,\ldots ,t_n\) be the delays observed for a certain edge over multiple traces in ascending order and \(F_n\) be the resulting step function, i.e. the right-continuous function \(F_n\) defined by

$$\begin{aligned} F_n(t) = {\left\{ \begin{array}{ll} 0 &{} \text {if } t< t_1 \\ n_i/n &{} \text {if } t_i \le t < t_{i+1} \\ 1 &{} \text {if } t \ge t_n \end{array}\right. } \end{aligned}$$

where \(n_i\) is the number of \(t_j\) that are smaller than or equal to \(t_i\). Further, let c be a clock with CDF \(F_c\) for the measure F(c). Then the n-th KS statistic is given by

$$\begin{aligned} K_n:=\sup _{t\in {\mathbb {R}}^{+}_{0}}|F_c(t)-F_n(t)|. \end{aligned}$$
(9)

If the sample values \(t_1,\ldots ,t_n\) are truly drawn from the CDF \(F_c\), then \(K_n\rightarrow 0\) almost surely as \(n\rightarrow \infty \) by the Glivenko–Cantelli theorem [22]. Hence, for given \(\alpha \) and sample size n, we accept the hypothesis that the \(t_i\) were drawn from \(F_c\) iff \(K_n\le K_{\textit{crit}}\), where \(K_\textit{crit}\) is a critical value given by the Kolmogorov distribution. Again, the critical values can be calculated or found in tables.
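For a continuous CDF \(F_c\), the supremum in Eq. 9 is attained at a sample point or immediately before one, so \(K_n\) can be computed by a single pass over the sorted delays. The following minimal sketch does this; the helper names and the two-element sample are illustrative assumptions, and the critical value is left to be supplied from a table.

```python
def ks_statistic(delays, cdf):
    """Kolmogorov-Smirnov statistic sup_t |F_c(t) - F_n(t)| for a continuous CDF."""
    ordered = sorted(delays)
    n = len(ordered)
    worst = 0.0
    for i, t in enumerate(ordered, start=1):
        f = cdf(t)
        # F_n jumps from (i - 1)/n to i/n at t; check the distance on both sides of the jump.
        worst = max(worst, i / n - f, f - (i - 1) / n)
    return worst

def uniform_cdf(a, b):
    """CDF of the uniform distribution on [a, b]."""
    def cdf(t):
        if t <= a:
            return 0.0
        if t >= b:
            return 1.0
        return (t - a) / (b - a)
    return cdf

def ks_accepts(delays, cdf, k_crit):
    """Accept the hypothesis that the delays were drawn from cdf iff K_n <= K_crit."""
    return ks_statistic(delays, cdf) <= k_crit

# Hypothetical sample (assumption, not data from the paper).
k2 = ks_statistic([0.25, 0.75], uniform_cdf(0.0, 1.0))
```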

Fig. 9 Specification IOSA and observation sample

Example 8

The left-hand side of Fig. 9 shows a tiny example specification IOSA with clocks x and y. The expiration times of both are uniformly distributed with different parameters. In \(\ell _0\) there is a nondeterministic choice to either take the left or the right branch. The right-hand side depicts a sample from this IOSA. There are two steps to assess whether the observed data are a truthful sample of the specification with a confidence of \(\alpha =0.05\): first find a trace distribution that minimises the \(\chi ^2\) statistic, and then evaluate two KS tests to assess whether the observed time data are a truthful sample of Uni\(\left[ 0,2\right] \) and Uni\(\left[ 0,3\right] \), respectively.

In the same way as in Example 7, the empirical \(\chi ^2\) value calculates as

$$\begin{aligned} \chi ^2 = \frac{(8 - 14\cdot p)^2}{(14\cdot p)} + \frac{(6 - 14\cdot \left( 1-p)\right) ^2}{(14\cdot (1-p))}, \end{aligned}$$

which is minimal for \(p=8/14\) and smaller than \(\chi ^2_{\textit{crit}}=3.84\). We thus found a scheduler that maximises the likelihood of the observed frequencies.

For the second step, \(t_1=0.26,\ldots ,t_8=1.97\) is the data associated with clock x and \(t'_1=0.29,\ldots ,t'_6=2.74\) is the data associated with clock y. Since there is no time that was recorded twice, the step function of the \(t_i\) is

$$\begin{aligned} F_8\left( t\right) ={\left\{ \begin{array}{ll} 0 &{} \text {if } t<t_1\\ \frac{k}{8} &{} \text {if } t_k\le t< t_{k+1}, k=1,\ldots ,7 \\ 1 &{} \text {if } t \ge t_8. \end{array}\right. } \end{aligned}$$

\(K_8=0.145\) is the maximal distance between this empirical step function and Uni\(\left[ 0,2\right] \). The critical value of the Kolmogorov distribution for \(n=8\) and \(\alpha =0.05\) is \(K_\textit{crit}=0.46\). With \(K_8<K_{\textit{crit}}\), the empiric value is below the given threshold. Hence, the inferred measure is sufficiently close to the specification. The KS test for \(t'_i\) and Uni\(\left[ 0,3\right] \) can be performed analogously. To conclude overall acceptance with \(\alpha = 0.05\), we again need to adjust the level of significance due to performing multiple tests; see Sect. 5.1.4.

Our intention is to provide a general and universally applicable procedure. The KS test is conservative for general distributions, but can be made precise [10]. Specialised and thus more efficient tests exist for specific distributions, e.g. the Lilliefors test [29] for Gaussian distributions, and parametric tests are generally preferred due to higher power at equal sample size. The KS test requires a comparably large sample size, an alternative being, e.g. the Anderson–Darling test [29].

Remark 3

The connection of two nonparametric tests is immensely more difficult in the presence of internal nondeterminism in a specification, cf. Example 8 with only a! on both visible edges. Time values can no longer be unambiguously attributed to unique distributions, and no confidence bound for the measured time data can be given. In this case, the scheduler probability decisions p are used as parameters for mixture distributions, e.g. \(F\left( p\right) :=p\cdot F_x + (1-p)\cdot F_y\) in Fig. 9. The parameterised distribution can then be used in the iterative expectation–maximisation algorithm [38], and confidence can be given upon convergence.

For the sake of simplicity, we assume that the specification is internally deterministic, i.e. there are no two paths that result in the same trace. While this decreases the space of potential specifications, we deem it a necessary compromise to come up with a feasible and general method.

5.1.4 Multiple comparisons

Since the \(\chi ^2\) test and all subsequent confidence interval estimations or KS tests are statistical hypothesis tests on their own, their errors accumulate. To illustrate: if a hypothesis test is performed at \(\alpha =0.05\), there is a 5% chance of committing an error of the first kind, i.e. of erroneously rejecting a true hypothesis. If we apply 100 individual tests with \(\alpha =0.05\), we might naively expect to commit this error 5 times. If we assume the tests to be independent, the probability of committing at least one error of the first kind actually calculates as \(1-(1-0.05)^{100}\approx 99.4\%\).

There are several techniques to cope with the inflation of the error of first kind. For the remainder of this section, we use Bonferroni correction: \( \alpha _{\textit{local}}=\alpha _{\textit{global}}/{l} \) where l is the total number of statistical hypothesis tests to be performed.
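The effect of the correction is easy to check numerically. The sketch below computes the probability of at least one error of the first kind among independent tests and the Bonferroni-corrected local level; the function names are illustrative assumptions.

```python
def familywise_error(alpha_local, tests):
    """Probability of at least one error of the first kind among independent tests."""
    return 1.0 - (1.0 - alpha_local) ** tests

def bonferroni(alpha_global, tests):
    """Local level of significance under Bonferroni correction."""
    return alpha_global / tests

uncorrected = familywise_error(0.05, 100)             # 100 tests, each at alpha = 0.05
alpha_local = bonferroni(0.1, 3)                      # three tests, global alpha = 0.1
corrected = familywise_error(alpha_local, 3)          # stays below the global level
```

With the corrected local level, the overall error probability of the three tests remains below the global level of 0.1.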

Example 9

We return to Example 7. Applying Bonferroni correction for a total of three hypothesis tests with desired \(\alpha = \alpha _\textit{global} = 0.1\) yields a necessary \(\alpha _{\textit{local}}\approx 0.033\). This applies to the \(\chi ^2\) test and the two interval estimations. The \(\chi ^2\) test still passes, and the new confidence intervals are \(C'_1=[0.353,1.677]\) and \(C'_2=[0.070,0.432]\). We see that \(\lambda _1\in C_1'\) and \(\lambda _2\in C_2'\) still hold, so we give the implementation the probabilistic pass verdict.

5.2 Stochastic delays and quiescence

A test case needs to assess if an implementation is allowed to be unresponsive when output was expected [45]. In our formalism, quiescence \(\delta \) models the absence of output for an indefinite time. It should be regarded with caution in practical testing scenarios. A common way to deal with quiescence is a global fixed timeout value set by a user [2, 5]. The time progress in IOMA and IOSA is governed by continuous probability distributions; hence, a global timeout has two disadvantages: first, a timeout might occur before a specified Markovian transition or edge takes place. The average waiting time of this event might be substantially higher than the global timeout. Second, a global timeout might unnecessarily prolong the overall test process.

A timeout can be seen as a delay that follows a Dirac distribution. While this naturally fits into the framework of stochastic automata, it is incompatible with the IOMA approach: Dirac delays cannot be represented in IOMA, and consequently, they were not considered in the statistical evaluation that we developed in Sect. 5.1.2. We now detail an approach for IOMA that avoids the problem of Dirac distributions and aims to minimise the probability of erroneously declaring quiescence while keeping the overall testing time as low as possible. While Dirac distributions are supported by IOSA, similar ideas apply to IOSA, too.

In order to avoid Dirac distributions, an MBT tool for IOMA needs to implement quiescence by racing an exponentially distributed delay with rate \(\mu _\delta \) against the implementation; this quiescence timer winning the race is then treated as the quiescence output \(\delta \). Let \(\lambda >0\) be the minimum exit rate over all Markovian states. With level of significance \(\alpha \in \; ]0,1[\), we would like the probability that the quiescence timer expires before a Markovian transition is executed, i.e. that we incorrectly report quiescence when the implementation could make progress, to be at most \(\alpha \). Choosing \(\mu _\delta = \lambda \cdot \frac{\alpha }{1 - \alpha }\) as the quiescence timer’s rate achieves this probability with the shortest waiting time in case of actual quiescence. We can further reduce the waiting time by using a different rate in every state: if the exit rate of state s is \(\lambda _s\), we use rate \(\mu _\delta ^s = \lambda _s \cdot \frac{\alpha }{1 - \alpha }\) to judge quiescence in s.
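The choice of \(\mu _\delta \) can be retraced in one line, assuming that the quiescence timer and the Markovian transition are independent exponential delays with rates \(\mu _\delta \) and \(\lambda \) that race against each other:

$$\begin{aligned} P(\text {timer expires first})=\frac{\mu _\delta }{\mu _\delta +\lambda }\le \alpha \;\Longleftrightarrow \; \mu _\delta \,(1-\alpha )\le \alpha \,\lambda \;\Longleftrightarrow \; \mu _\delta \le \lambda \cdot \frac{\alpha }{1-\alpha }. \end{aligned}$$

Since the expected waiting time of the timer is \(1/\mu _\delta \), the largest admissible rate, i.e. equality in the last bound, gives the shortest expected wait in case of actual quiescence.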

The statistical evaluation only has to be adjusted to consider the new exit rate \(\lambda + \mu _\delta \) and the new “Markovian transition” for quiescence. In fact, we can directly represent this approach by rewriting the specification model as shown in Example 10. For non-Markovian states, a default maximal waiting time is still applicable.

Fig. 10 Two example specifications for quiescence timeouts

Example 10

Figure 10 (top) shows a simple specification of a file transmission protocol. Exponential distributions model the delay between sending a file and acknowledging its reception. Different delays are associated with sending a small or a large file, respectively. After a file has been sent, there is a chance that it gets lost, and we do not receive an acknowledgement. In this case, the system is judged as quiescent, and therefore erroneous.

However, since \(\lambda _2\ll \lambda _1\), a test should use a quiescence timer rate of \(\mu _\delta ^{s_1} = 10 \cdot \frac{\alpha }{1 - \alpha }\) in \(s_1\) and \(\mu _\delta ^{s_2} = \frac{\alpha }{1 - \alpha }\) in \(s_2\) to minimise the probability to erroneously judge the system as quiescent while also keeping the global testing time as low as possible. Regardless, for sufficiently large sample size, an MBT tool eventually erroneously observes quiescence. Figure 10 (bottom) therefore allows some amount of quiescence observations depending on \(\alpha \), i.e. on how many erroneous quiescent judgements we are willing to accept.

Example 11

We compare a global quiescence timer rate to individual ones by assuming \(\alpha =0.05\) and that we are to test the protocol as in Fig. 10 (top) 100 times:

  1. Long global:

    A sensible long global quiescence timer rate is \(\mu _\delta = \mu _\delta ^{s_2} \approx 0.053\). Executing 100 test cases yields a worst-case expected waiting time (for the case where the implementation is always quiescent) of \(100/\mu _\delta ^{s_2} = 1900\) time units. However, we are (more than) guaranteed to incorrectly judge the implementation quiescent in at most \(5\,\%\) of all cases.

  2. Short global:

    A sensible short global quiescence timer rate is \(\mu _\delta = \mu _\delta ^{s_1} \approx 0.526\). The worst-case expected time is now only 190 time units. However, the probability of the Markovian transition with rate \(\lambda _2\) not firing before the quiescence timer becomes \(\approx 34\,\%\). We would then incorrectly judge the implementation quiescent even though the transition might still take place.

  3. Individual:

    Using the long rate in state \(s_2\) and the short one in state \(s_1\) guarantees that we erroneously judge quiescence overall in at most 5% of the cases. Note that this is accounted for in the specification in Fig. 10 (bottom). The worst-case waiting time now depends on the probability p of sending a small file instead of a large one; it is \(p \cdot 190 + (1 - p) \cdot 1900\). Time is saved in the overall test process whenever a small file is sent.
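The figures of the three variants can be reproduced with a few lines of arithmetic. The sketch below assumes exit rates \(\lambda _1=10\) and \(\lambda _2=1\), as suggested by the timer rates of Example 10; these values and all function names are assumptions made for illustration.

```python
def quiescence_rate(exit_rate, alpha):
    """Largest timer rate that wins the race with probability at most alpha."""
    return exit_rate * alpha / (1.0 - alpha)

def false_quiescence_probability(timer_rate, exit_rate):
    """Probability that the exponential timer expires before the Markovian transition."""
    return timer_rate / (timer_rate + exit_rate)

ALPHA = 0.05
RATE_S1, RATE_S2 = 10.0, 1.0   # assumed exit rates of s_1 and s_2
RUNS = 100

mu_s1 = quiescence_rate(RATE_S1, ALPHA)   # short timer, about 0.526
mu_s2 = quiescence_rate(RATE_S2, ALPHA)   # long timer, about 0.053

long_global_wait = RUNS / mu_s2           # worst-case expected wait, long global timer
short_global_wait = RUNS / mu_s1          # worst-case expected wait, short global timer
short_global_error = false_quiescence_probability(mu_s1, RATE_S2)

def individual_wait(p_small):
    """Worst-case expected wait with per-state timers; p_small = probability of a small file."""
    return p_small * short_global_wait + (1.0 - p_small) * long_global_wait
```

With per-state timers, the probability of a false quiescence verdict is \(\alpha \) in each state, while the waiting time interpolates between the two global variants.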

5.3 Stochastic test procedure outline

Test cases for IOMA and IOSA are essentially IOTS. Hence, the standard test generation algorithms for ioco [47] apply directly, except for the inclusion of explicit quiescence timeouts as in Fig. 10 (bottom), if desired. We summarise all necessary steps to perform model-based testing with Markov automata or stochastic automata using our framework:

  1. Generate an annotated test case (suite) for the specification automaton.

  2. Execute the test case (all test cases of the test suite) m times. If the functional \(\textit{fail}\) verdict is encountered in any of the m executions, then fail the implementation for functional reasons.

  3. Calculate the number of necessary statistical hypothesis tests for each test case. Correct \(\alpha \) accordingly.

  4. Perform statistical analysis on the gathered sample of size m for the test case (all test cases) with the new parameter \({\bar{\alpha }}\).

    (a) Use optimisation or constraint solving to find a scheduler such that \(\chi ^2\le \chi ^2_{\textit{crit}}\). If no such scheduler is found, reject the implementation for statistical reasons.

    (\(\hbox {b}_1\)) For IOMA, perform confidence interval estimation, and check if all Markovian parameters are contained in their respective intervals. If there is at least one parameter not contained in its confidence interval, reject the implementation for statistical reasons.

    (\(\hbox {b}_2\)) For IOSA, group all time stamps assigned to the same clock and perform a KS test for each clock. If any of them fail, reject \({\mathcal {I}}\) for statistical reasons.

  5. Accept the implementation.
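The control flow of steps 2 to 5 can be sketched as a small driver. In the sketch below, the individual statistical tests are passed in as callables that take the corrected level of significance and report whether they pass; this interface, the verdict strings and the use of Bonferroni correction for step 3 are assumptions made for illustration, not a prescribed tool design.

```python
from typing import Callable, Sequence

def overall_verdict(functional_verdicts: Sequence[str],
                    chi2_test: Callable[[float], bool],
                    timing_tests: Sequence[Callable[[float], bool]],
                    alpha: float) -> str:
    """Combine functional and statistical results into one verdict.

    functional_verdicts: 'pass' or 'fail' for each of the m executions (step 2).
    chi2_test: step 4(a); returns True if a suitable scheduler was found.
    timing_tests: steps 4(b1)/4(b2); one interval estimation or KS test each.
    """
    # Step 2: a single functional failure fails the implementation.
    if any(verdict == "fail" for verdict in functional_verdicts):
        return "fail (functional)"
    # Step 3: count all hypothesis tests and correct alpha (Bonferroni).
    alpha_local = alpha / (1 + len(timing_tests))
    # Step 4(a): chi-square test on the observed frequencies.
    if not chi2_test(alpha_local):
        return "fail (statistical)"
    # Step 4(b): every timing test must pass at the corrected level.
    if not all(test(alpha_local) for test in timing_tests):
        return "fail (statistical)"
    # Step 5: no test rejected the implementation.
    return "pass"
```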

6 A Bluetooth device discovery example

Bluetooth is a wireless communication standard [3] aimed at low-powered devices that communicate over short distances. Before any communication can take place, Bluetooth devices organise into small networks of one master and up to seven slave devices. To cope with interference, this device discovery protocol uses a frequency hopping scheme.

To illustrate and compare our frameworks for IOMA and IOSA, we study the discovery phase for one master and one slave device. The device discovery protocol is inherently stochastic due to the initially random and unsynchronised state of the devices. We give a high-level overview of the protocol here and refer the interested reader to a verification case study performed with PRISM [16] for a more detailed description and formal analysis in a more general setting.

6.1 Device discovery protocol

To resolve possible interference, the master and slave device communicate via a prescribed sequence of 32 frequencies. Both devices have a 28-bit clock that ticks every 312.5 \(\upmu \hbox {s}\).

The master broadcasts on two frequencies for two consecutive ticks followed by a two-tick listening period on the same frequencies. It picks the broadcasting frequency \(\textit{freq}\) as

$$\begin{aligned} (\textit{CLK}_{16\ldots 12} + \textit{o} + (\textit{CLK}_{4\ldots 2,0} {-} \textit{CLK}_{16\ldots 12} ) \text{ mod } 16 ) \text{ mod } 32 \end{aligned}$$

where \(\textit{CLK}_{i\ldots j}\) marks bits i to j of the clock and \(\textit{o}\in {\mathbb {N}}\) is an offset. The master chooses one of two tracks and switches to the respective other every 2.56 s. Every 1.28 s, i.e. every time the 12th bit of the clock changes, a frequency is swapped between the two tracks. For simplicity, we choose \(\textit{o}=1\) for track one and \(\textit{o}=17\) for track two, such that the two tracks initially comprise frequencies \(1,\ldots ,16\) and \(17,\ldots ,32\).
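The following Python sketch evaluates the frequency formula for a given clock value. The function names are illustrative and are taken neither from the Bluetooth standard nor from our implementation; we assume that bit 0 is the least significant bit and that \(\textit{CLK}_{4\ldots 2,0}\) concatenates bits 4, 3, 2 and 0 into a 4-bit value.

```python
TICK_SECONDS = 312.5e-6  # duration of one clock tick, as stated above


def clk_bits(clk: int, high: int, low: int) -> int:
    """Return bits high..low (inclusive) of clk as an integer."""
    return (clk >> low) & ((1 << (high - low + 1)) - 1)


def broadcast_frequency(clk: int, offset: int) -> int:
    """Evaluate the frequency formula for a clock value and a track offset."""
    upper = clk_bits(clk, 16, 12)                             # CLK_{16..12}
    lower = (clk_bits(clk, 4, 2) << 1) | clk_bits(clk, 0, 0)  # CLK_{4..2,0}
    return (upper + offset + ((lower - upper) % 16)) % 32


# Frequencies on each track while CLK_{16..12} is still 0,
# i.e. before the first frequency swap.
track_one = sorted({broadcast_frequency(clk, 1) for clk in range(32)})
track_two = sorted({broadcast_frequency(clk, 17) for clk in range(32)})
```

Because of the final modulo, frequency 32 appears as 0 in `track_two`. The swap period also follows directly from the tick length: \(2^{12}\) ticks of 312.5 \(\upmu \hbox {s}\) amount to 1.28 s.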

The slave device periodically scans on the 32 frequencies. It is in either a sleeping or a listening state. To ensure eventual connection, the hopping rate of the slave device is much slower. The Bluetooth standard leaves some flexibility with respect to the length of the listening period. For our study, every 0.64 s, it listens to one frequency for 11.25 ms and sleeps during the remaining time. It cycles to the next frequency after 1.28 s. This is enough time for the master device to broadcast on 16 different frequencies.

6.2 Specification models

The time to connect two devices is deterministic for a fixed initial state. That is, assuming we know the initial state of both devices, we can calculate the time needed until a connection is established. To study a realistic scenario, however, we have to assume that the clocks of both devices are initially desynchronised. Thus, in our models, the master starts broadcasting immediately while the slave starts listening after a uniformly chosen random waiting time. We then have four scenarios to reach synchronisation:

  • Synchronisation happens during the first 16 broadcast frequencies (0 to 1.28 s, 16 frequencies).

  • Synchronisation happens after the first frequency swap of the master device (1.28 to 2.56 s, one frequency).

  • Synchronisation happens after the first switch of tracks and two frequency swaps of the master device (2.56 to 3.84 s, 14 frequencies).

  • Synchronisation happens after the first switch of tracks and three frequency swaps of the master device (3.84 to 5.12 s, one frequency).

These four scenarios are exhaustive, i.e. the master device broadcasts on frequencies such that the slave necessarily listens to at least one of them within 5.12 s. The different scenarios yield 32 possible exact waiting times to connect, i.e. after 2 or 3 ticks, 6 or 7 ticks, etc.

This protocol specification prescribes a delay that is not exponentially distributed, as is evident from the sample CDF that we collected for the specification, shown in Fig. 13 (dark blue line). This is no problem for IOSA-based testing. Our IOSA specification is shown in Fig. 11; we directly incorporate the exact probability distribution to connect within a certain time, as prescribed by the protocol description, as the distribution F(x). Thus, the structure of the IOSA can be extremely simple; the complexity is hidden in F(x). For IOMA, we have to approximate the true distribution by an exponential distribution. Calculating the mean of all waiting times gives us the average time to connect as approximately 1.325 s and thus \(\lambda = 1/1.325 \approx 0.755\) as the estimated rate parameter. Note that F(x) in the IOSA case could also be specified as the exponential distribution with \(\lambda =0.755\) to pose the same requirement, which concerns the mean time to connection only.
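The step from the mean waiting time to the rate parameter, and what the resulting approximation implies, can be sketched in a few lines of Python. The function names are illustrative; the only inputs taken from the text are the mean of 1.325 s and the 5.12 s bound on the time to connect.

```python
import math

MEAN_TIME_TO_CONNECT = 1.325  # seconds, the approximate mean reported above


def rate_from_mean(mean: float) -> float:
    """Rate of the exponential distribution with the given mean."""
    return 1.0 / mean


def exponential_cdf(t: float, rate: float) -> float:
    """Probability that an exponentially distributed delay is at most t."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0


rate = rate_from_mean(MEAN_TIME_TO_CONNECT)
# Probability mass that the exponential approximation places within 5.12 s.
p_within_bound = exponential_cdf(5.12, rate)
```

Here `rate` rounds to 0.755, and `p_within_bound` is slightly below 0.98. The exponential approximation thus assigns roughly 2% probability to connection times beyond 5.12 s, which the protocol itself excludes; this is one concrete way in which the IOMA specification is coarser than the IOSA one.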

Fig. 11 IOSA specification for the Bluetooth example

Fig. 12 Experimental setup

6.3 Experimental setup

Our toolchain is depicted in Fig. 12. The implementation is tested on-the-fly via the MBT tool JTorX [2], which generates tests with respect to the transition system abstraction of the specifications. JTorX returns the functional fail verdict if unforeseen output is observed at any time throughout the test process. Additionally, we chose a timeout of approximately 5.2 s in accordance with the protocol description: this is the time that the master device needs to broadcast all available frequencies at least once. We can use this fixed timeout even in the IOMA setting since we know that no correct implementation may take this long to connect; any implementation that does can be functionally rejected without the need for statistical analysis. The recorded log files of JTorX comprise the sample. We use MATLAB to calculate the statistical verdict. We implemented the correct protocol and three mutants in Java 7:

\({\mathcal {M}_1}\) :

The first master mutant never switches between tracks one and two, therefore covering far fewer different frequencies than the correct protocol in the same time. It will need a total of \(16 \cdot 1.28\,{\mathrm {s}} = 20.48\,{\mathrm {s}}\) to cover all 32 frequencies. Hence, we expect a much longer time to connect when compared to the correct implementation.

\({\mathcal {M}_2}\) :

The second master mutant never swaps frequencies, only switching between tracks one and two. The expected time to connect will therefore be around 2.56 s.

\({\mathcal {S}_1}\) :

The slave mutant has its listening period halved, and thus only listens for 5.625 ms every 1.28 s. Therefore, it has a longer sleeping period and we expect that the probability to connect is slightly reduced when compared to the correct counterpart.

6.4 Results

We collected \(m=100\), \(m=1000\), and \(m=10{,}000\) test executions for each of the four implementations. We set the level of significance to \(\alpha =0.05\). No \(\chi ^2\) tests are necessary due to the absence of nondeterminism and probabilistic branching in the specifications. Furthermore, we need only one statistical test in each setting and thus no \(\alpha \) correction. Figure 13 shows the cumulative distribution of the sample data collected for \(m = 1000\) runs of the correct implementation and mutants (coloured lines).

Fig. 13 Probability to establish connection over time

IOMA For comparison, we show as a dashed line in Fig. 13 the cumulative probability to connect within T seconds for the exponential distribution with rate \(\lambda = 0.755\), which is the specified distribution in the IOMA setting. Table 1 shows the confidence intervals calculated based on our samples. All intervals of the correct implementation contain the specified value \(\lambda =0.755\); the implementation is therefore judged correct. \({\mathcal {M}}_1 \!\parallel \!{\mathcal {S}}\) was consistently rejected for functional reasons by JTorX because it exceeded the fixed timeout. The remaining two mutants required the statistical verdict for rejection: both were still accepted for \(m = 100\), and at least 1000 test executions were needed for the confidence interval to become narrow enough for rejection. In particular, halving the listening time of the slave had the least impact on the behaviour; this mutant was consequently rejected only by a very small margin.
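To illustrate the kind of check behind the IOMA verdicts, the sketch below estimates the rate from a sample of connection delays and builds a two-sided confidence interval. It uses the large-sample normal approximation for the maximum-likelihood estimate of an exponential rate; this choice, like the function names, is an assumption of the sketch, and the interval construction of Sect. 4 may differ.

```python
from statistics import NormalDist, fmean


def rate_confidence_interval(delays, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for an exponential rate."""
    m = len(delays)
    rate_hat = 1.0 / fmean(delays)               # maximum-likelihood estimate
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    half_width = z * rate_hat / m ** 0.5         # standard error is rate / sqrt(m)
    return rate_hat - half_width, rate_hat + half_width


def conforms(delays, specified_rate, alpha=0.05):
    """Statistical pass iff the specified rate lies inside the interval."""
    low, high = rate_confidence_interval(delays, alpha)
    return low <= specified_rate <= high
```

The half-width shrinks proportionally to \(1/\sqrt{m}\), so a deviation that is invisible at \(m = 100\) can lead to rejection at \(m = 1000\).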

Table 1 Connection time confidence intervals (IOMA)

IOSA We used MATLAB’s kstest2 function to execute a two-sample KS test to analyse the samples with respect to the specified time distribution. Table 2 shows the verdicts and the observed KS statistics \(K_m\) alongside the corresponding critical values \(K_{\textit{crit}}\) for our experiments. The statistical verdict \(\textit{pass}\) was given if \(K_m<K_{\textit{crit}}\), and \(\textit{fail}\) otherwise. The critical values depend on \(\alpha \) and m. The correct implementation was accepted in all three experiments. During the sampling of \({\mathcal {M}}_1\!\parallel \!{\mathcal {S}}\), we again observed several timeouts leading to a functional \(\textit{fail}\) verdict. It would also have failed the KS test in all three experiments. \({\mathcal {M}}_2\!\parallel \!{\mathcal {S}}\) passed the test for \(m=100\), but was rejected with increased sample size. \({\mathcal {M}}\!\parallel \!{\mathcal {S}}_1\) is the most subtle of the three mutants and was only rejected with \(m=10{,}000\) at a narrow margin.
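For readers without access to MATLAB, the comparison of \(K_m\) with \(K_{\textit{crit}}\) can be sketched in plain Python. The statistic is the largest gap between the two empirical CDFs; for the critical value we assume the common asymptotic approximation \(\sqrt{-\ln (\alpha /2)/2}\cdot \sqrt{(n+m)/(nm)}\), which is not necessarily how the values in Table 2 were obtained. All function names are illustrative.

```python
import math


def ks_statistic(xs, ys):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    gap = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance both samples past all observations equal to v.
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        gap = max(gap, abs(i / n - j / m))
    return gap


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic two-sample critical value (an assumed approximation)."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


def ks_verdict(observed, reference, alpha=0.05):
    """Statistical pass iff the statistic stays below the critical value."""
    k_m = ks_statistic(observed, reference)
    k_crit = ks_critical_value(len(observed), len(reference), alpha)
    return "pass" if k_m < k_crit else "fail"
```

Under this approximation, the critical value for two samples of size 100 at \(\alpha = 0.05\) is about 0.192, and it decreases proportionally to \(1/\sqrt{m}\) for equally sized samples.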

Discussion The case study was not tailored to MBT with Markov automata. The waiting time of interest is clearly not exponentially distributed, and only the means of the delay until the connection is established are compared. Nonetheless, the IOMA framework is applicable and rightfully judged the correct implementation as conforming while eliminating the mutants. The confidence intervals for the slave mutant missed the parameter \(\lambda \) only marginally. Consequently, there is a relatively high probability of committing an error of the second kind. On the other hand, the second master mutant was eliminated with a large margin.

Table 2 Verdicts and KS test results (IOSA)

In the IOSA setting, observe that the critical value decreases faster than the observed KS statistic for all three faulty implementations. We conjecture that an even larger sample would yield a clearer verdict; this is in line with the decreasing error of the second kind for increasing sample size pointed out in Sect. 4. This is especially desirable in the case of \({\mathcal {M}}\!\parallel \!{\mathcal {S}}_1\), where a sample of size \(m=10{,}000\) was needed to refute the faulty implementation. This is in contrast to the IOMA setting, where \(m = 1000\) sufficed, and highlights that the statistical evaluation for IOMA is in general more efficient (it needs fewer samples for clearer verdicts) than the one for IOSA. We point out that an alternative to the very compact specification given in Fig. 11 is possible. For instance, the entire specification could comprise a probabilistic branching over 32 locations with deterministic guard sets according to the step values of the distribution of the Bluetooth specification. This illustrates the flexibility of the modelling capabilities in the IOSA test framework, and shows that there is no unique best model.

Overall, there is a trade-off in expressivity and efficiency when comparing the test theory for Markov automata and stochastic automata in practical applications.

7 Conclusion

We presented two closely related sound and complete MBT frameworks to test probabilistic systems with stochastic delays. The underlying modelling formalisms are Markov automata and stochastic automata with a separation of their alphabet into inputs and outputs: IOMA and IOSA. The former restrict delays to exponential distributions, but mark a relevant intermediate step between previous work on testing untimed probabilistic models [20] and the full generality, and complexity, of stochastic automata. In particular, the statistical evaluation of testing results is far simpler and more efficient in the case of IOMA. On the other hand, our Bluetooth case study shows that being able to represent arbitrary distributions over time directly, as in IOSA, may lead to specifications that match reality much more closely, and to results that are more precise and understandable.