Model-based testing of stochastically timed systems
Abstract
Many systems are inherently stochastic: they interact with unpredictable environments or use randomised algorithms. Classical model-based testing is insufficient for such systems: it only covers functional correctness. In this paper, we present two model-based testing frameworks that additionally cover the stochastic aspects in hard and soft real-time systems. Using the theory of Markov automata and stochastic automata for specifications, test cases, and a formal notion of conformance, they provide clean mechanisms to represent underspecification, randomisation, and stochastic timing. Markov automata provide a simple memoryless model of time, while stochastic automata support arbitrary continuous and discrete probability distributions. We cleanly define the theoretical foundations, outline practical algorithms for statistical conformance checking, and evaluate both frameworks’ capabilities by testing timing aspects of the Bluetooth device discovery protocol. We highlight the trade-off of simple and efficient statistical evaluation for Markov automata versus precise and realistic modelling with stochastic automata.
Keywords
Model-based testing · Markov automata · Stochastic automata · Ioco conformance
1 Introduction
Model-based testing (MBT) [50] is a technique to automatically generate, execute, and evaluate test suites on black-box implementations under test (IUT). The theoretical ingredients of an MBT framework are a formal model that specifies the desired system behaviour, often in terms of (some extension of) input–output transition systems; a notion of conformance that specifies when an IUT is considered a valid implementation of the model; and a precise definition of what a test case is. For the framework to be applicable in practice, we also need algorithms to derive test cases from the model, execute them on the IUT, and evaluate the results, i.e. decide conformance. These algorithms need to be sound (i.e. every implementation that fails a test case does not conform to the model), and ideally also complete (i.e. for every non-conforming implementation, there theoretically exists a failing test case). MBT is attractive due to its high degree of automation: given a model, the otherwise labour-intensive and error-prone derivation, execution, and evaluation steps can be performed fully automatically.
Model-based testing originally gained prominence for input–output transition systems (IOTS) using the ioco relation for input–output conformance [49]. IOTS partition the observable actions of the IUT (and thus of the model and test cases) into inputs (or stimuli) that can be provided at any time, e.g. pressing a button or receiving a network message, and outputs that are signals or activities that the environment can observe, e.g. delivering a product or sending a network message. IOTS include nondeterministic choices, allowing underspecification: the IUT may implement any or all of the modelled alternatives. MBT with IOTS tests for functional correctness: the IUT shall only exhibit behaviours allowed by the model. In the presence of nondeterminism, the IUT is allowed to use any deterministic or randomised policy to decide between the specified alternatives.
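The core ioco check can be illustrated on a single observed trace. This is a minimal sketch under stated assumptions, not the paper's procedure: the specification is a hypothetical finite, τ-free IOTS given as a dictionary, with quiescence δ (written `"delta"`) already added as self-loops on states without outputs. Following the usual formulation of ioco, a trace fails as soon as the IUT produces an output (or quiescence) that the specification does not allow after the trace observed so far.

```python
# Hypothetical tau-free specification: state -> list of (action, successor).
# Inputs end in "?", outputs in "!", and "delta" denotes quiescence.
SPEC = {
    "s0": [("send?", "s1"), ("delta", "s0")],
    "s1": [("ack!", "s0"), ("err!", "s0")],
}


def after(spec, states, action):
    """States reachable from `states` via one `action`-labelled transition."""
    return {t for s in states for (a, t) in spec[s] if a == action}


def out(spec, states):
    """Outputs, including quiescence, that the specification allows in `states`."""
    return {a for s in states for (a, _) in spec[s] if not a.endswith("?")}


def check_trace(spec, init, trace):
    """Return "fail" if some output in `trace` is not allowed after its prefix."""
    current = {init}
    for action in trace:
        if not action.endswith("?") and action not in out(spec, current):
            return "fail"  # unexpected output (or unexpected quiescence)
        current = after(spec, current, action)
        if not current:
            # An input the specification does not cover: ioco imposes
            # no further obligations on this trace.
            return "pass"
    return "pass"
```

For example, `["send?", "ack!", "delta"]` passes, whereas `["send?", "delta"]` fails because the specification requires an output after `send?`.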
Stochastic behaviour and requirements are an important aspect of today’s complex systems: network protocols extensively rely on randomised algorithms, cloud providers commit to service level agreements, probabilistic robotics [46] allows the automation of complex tasks via simple randomised strategies (as seen in, e.g. vacuuming and lawn mowing robots), and we see a proliferation of probabilistic programming languages [23]. Stochastic systems must satisfy stochastic requirements. Consider the example of exponential backoff in Ethernet: an adapter that, after a collision, sometimes retransmits earlier than prescribed by the standard may not impact the overall functioning of the network, but may well gain an unfair advantage in throughput at the expense of overall network performance. In the case of cloud providers, the service level agreements are inherently stochastic when guaranteeing a certain availability (i.e. average uptime) or a certain distribution of maximum response times for different tasks. This has given rise to extensive research in stochastic model checking techniques [30]. However, in practice, testing remains the dominant technique to evaluate and certify systems outside of a limited area of highly safety-critical applications.
In this paper, we present two MBT frameworks based on input–output Markov automata [17] (IOMA) and stochastic automata [11, 12] (IOSA), which are transition systems augmented with discrete probabilistic choices and stochastic delays. Markov automata are a memoryless continuous-time model, essentially the extension of continuous-time Markov chains with nondeterminism: the time spent in any state of the automaton follows some exponential distribution. In stochastic automata, on the other hand, the progress of time is governed by clock variables whose expiration times follow general probability distributions. By using IOMA or IOSA models, we can quantitatively specify stochastic aspects of a system, in particular w.r.t. timing. While IOMA are more suitable for the abstract specification of soft real-time systems, IOSA enable precise modelling of both hard and soft real-time systems and requirements. Since both models extend transition systems, nondeterminism is available for underspecification as usual. After introducing the models and their semantics (Sect. 3), we formally define the notions of Markovian and stochastic ioco (marioco and saioco, respectively), and of test cases as restrictions of IOMA and IOSA (Sect. 4). We then outline practical algorithms for conformance testing (Sect. 5). These algorithms combine per-trace functional verdicts, as in standard ioco, with a statistical evaluation that builds upon confidence interval estimation for IOMA and the Kolmogorov–Smirnov test [29] for IOSA. We finally exemplify our frameworks’ capabilities and the trade-offs between the IOMA and IOSA approaches by testing timing aspects of different implementation variants of the Bluetooth device discovery protocol (Sect. 6).
1.1 Related work
Our marioco and saioco frameworks generalise the pioco framework [20] for probabilistic automata (or Markov decision processes), which only supports discrete probabilistic choices and has no notion of time at all.
Early influential work on model-based testing had only deterministic time [4, 31, 33, 34], later extended with timeouts/quiescence [5]. Probabilistic testing relations and equivalences are well studied [9, 14, 42]. Probabilistic bisimulation via hypothesis testing was first introduced in [35]. Our work is largely influenced by [8], which introduced a way to compare trace frequencies with collected samples. A more restricted approach is given in the work on stochastic finite state machines [28, 40]: stochastic delays are specified similarly, but discrete probability distributions over target states are not included. Closely related to our testing relation for Markov automata are the studies of bisimulation relations [17], which inspired further work on weak bisimulation [15] and late-weak bisimulation [43]. By studying relations based on trace distribution semantics, rather than equivalence relations, we grant vastly more implementation freedom.
Probabilistic and non-probabilistic MBT are part of a greater ecosystem of formal methods developed to improve the correctness, dependability, and trustworthiness of various types of systems, ranging from software over cyber-physical systems to, for example, organisational processes and biological applications. Model checking [1], probabilistic model checking [30], and statistical model checking [26, 54] serve to prove or disprove the conformance of a (probabilistic) model of a system to a (probabilistic) specification, usually given in terms of temporal logic formulas. Notable probabilistic model checkers include Prism [32], Storm [13], and the mcsta tool of the Modest Toolset [25], while two current examples of statistical model checkers are Plasmalab [36] and the Modest Toolset’s modes simulator [6]. These techniques and tools are complementary to MBT, which establishes a relation between a model (which now acts as a specification, and may earlier have been verified with model checking) and the real implementation. Notably, the Modest Toolset also includes an MBT tool [24], thus providing all three techniques for probabilistic systems in one package. The “opposite” of MBT, deriving a model from an implementation using automata learning [51, 53], is also gaining popularity and is especially well suited for the analysis of legacy systems [41]. Automata learning typically uses MBT internally to check whether the model learned so far is approximately equivalent to the implementation under learning.
1.2 Previous work
This paper provides a new integrated presentation of our previous papers on model-based testing for Markov automata [21] and stochastic automata [19]. We explain the differences and trade-offs between the two frameworks in theory and practice. We added examples and more detailed explanations throughout the paper. Test cases for both models are now effectively IOTS (Sect. 4.2), whereas our previous work used probabilistic test cases; this provides a clean distinction between test generation and test selection.
Specifically compared to [21], we use a more standard definition of IOMA (Definition 1) that does not rely on being input-reactive and output-generative [52]. We discuss how to implement quiescence in a Markovian setting in a way that does not affect the statistical evaluation yet minimises the testing runtime and the chance for errors of the second kind (Sect. 5.2). Finally, we study an additional protocol mutant with IOMA in the Bluetooth case study (Sect. 6).
Compared to [19], we adapted the saioco conformance relation such that it now properly extends ioco. That is, where [19] relied on trace distribution inclusion of closed systems, we now utilise schedulers for open systems. As a result, saioco is in line with marioco and with earlier work on untimed probabilistic systems [20]. We also present full proofs for the soundness and completeness of the IOSA MBT framework (Sect. 4.4).
2 Preliminaries
2.1 Mathematical notation
\({\mathbb {N}}\) is \(\{\,0, 1, \ldots \,\}\), the set of natural numbers. \({\mathbb {R}} \), \({\mathbb {R}}^+ \), and \({\mathbb {R}}^{+}_{0} \) are the sets of all, all positive, and all non-negative real numbers, respectively. We write closed intervals as \([a, b] = \{\,x \in {\mathbb {R}} \mid a \le x \le b\,\}\), open intervals as \(]a, b{[} = \{\,x \in {\mathbb {R}} \mid a< x < b\,\}\), and half-open intervals analogously as ]a, b] and [a, b[. For a given set \(\varOmega \), we denote its powerset by \({\mathcal {P}}({\varOmega }) \). Let the function \(\mathbb {1} \in \{\, \textit{true}, \textit{false} \,\} \rightarrow \{\,0, 1\,\}\) be defined by \(\mathbb {1}(\textit{true}) = 1\) and \(\mathbb {1}(\textit{false}) = 0\). We write \(\mathbb {1}_b\) to denote \(\mathbb {1}(b)\).
2.2 Probability theory
2.3 Valuations
\(\textit{Val} = V \rightarrow {\mathbb {R}}^{+}_{0} \) is the set of valuations for an (implicit) set V of (non-negative real-valued) variables. Valuation \(\mathbf 0 \) assigns value zero to all variables. Given \(X\subseteq V\) and \(v \in \textit{Val}\), we write \(v[X \mapsto 0]\) for the valuation defined by \(v[X \mapsto 0](x) = 0\) if \(x \in X\) and \(v[X \mapsto 0](y) = v(y)\) otherwise. For \(t \in {\mathbb {R}}^{+}_{0} \), \(v + t\) is the valuation defined by \((v + t)(x) = v(x) + t\) for all \(x \in V\).
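The two valuation operations can be mirrored directly in code. In this minimal sketch, a valuation is assumed to be a Python dictionary from variable names to non-negative floats; the function names `zero`, `reset`, and `delay` are our own illustrative choices.

```python
from typing import Dict, Iterable

Valuation = Dict[str, float]  # variable name -> non-negative real value


def zero(variables: Iterable[str]) -> Valuation:
    """The valuation 0, assigning zero to every variable."""
    return {c: 0.0 for c in variables}


def reset(v: Valuation, X: Iterable[str]) -> Valuation:
    """v[X -> 0]: variables in X become 0, all others keep their value in v."""
    X = set(X)
    return {c: (0.0 if c in X else val) for c, val in v.items()}


def delay(v: Valuation, t: float) -> Valuation:
    """v + t: every variable advances by the same amount t >= 0."""
    if t < 0:
        raise ValueError("delays must be non-negative")
    return {c: val + t for c, val in v.items()}
```

Composing the two, `reset(delay(v, t), R)` yields the valuation \((v + t)[R \mapsto 0]\), which is the shape of the clock update used for IOSA paths.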
3 Automata with stochastic time
We now present the formal automata-based models underlying our model-based testing approaches: Markov automata for memoryless time and stochastic automata for general stochastic time. In addition to their syntax and semantics (in terms of paths, traces and trace distributions), we define parallel composition operators to formally capture the interaction between implementations and test cases.
3.1 Markov automata
Our approach to testing memoryless stochastic-timed systems builds upon the framework of Markov automata [17]. They are a formal model that unifies the discrete probabilistic and nondeterministic choices of Markov decision processes (MDP) with the exponentially distributed delays of continuous-time Markov chains (CTMC) in a compositional way. The exponential distribution provides an appropriate approximation of reality if only the mean durations of activities are known, as is often the case in practice.
In Markov automata, we distinguish between probabilistic and Markovian transitions. The former take place as soon as possible and lead into a probability distribution over successor states (as in MDP). The latter are defined via a rate parameter in \({\mathbb {R}}^+\): the time until the transition is taken follows the exponential distribution with that rate (as in CTMC).
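Definition 1
An input–output Markov automaton (IOMA) is a tuple \({\mathcal {M}} = \langle S, s_0, \textit{Act}, T_P, T_M \rangle \), where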
S is a finite set of states,

\(s_0 \in S\) is the initial state,

\(\textit{Act}= \textit{Act}_I \uplus \textit{Act}_O \uplus \{\, \tau \,\}\) is the set of actions partitioned into inputs, outputs, and the internal action \(\tau \), respectively, with \(\delta \in \textit{Act}_O\) being the distinct quiescence action,

\(T_P \in S \rightarrow {\mathcal {P}}({\textit{Act}\times \mathrm {Distr}(S)}) \) is the finite probabilistic transition function, and

\(T_M \in S \rightarrow {\mathcal {P}}({{\mathbb {R}}^+ \times S}) \) is the finite Markovian transition function.
If \(\langle \lambda , s' \rangle \in T_M(s)\), we say that \(\langle s, \lambda , s' \rangle \) is a (Markovian) transition (of \({\mathcal {M}}\)), also written \(s \overset{\lambda }{\rightsquigarrow } s'\). If \(\langle a, \mu \rangle \in T_P(s)\), we say that \(\langle s, a, \mu \rangle \) is a (probabilistic) transition (of \({\mathcal {M}}\)), also written \(s \xrightarrow {a} \mu \). We say that s is Markovian if \(T_M(s) \ne \varnothing \); s is probabilistic if \(T_P(s) \ne \varnothing \). We write \(s \rightarrow a\) if \(\exists \, \mu :s \xrightarrow {a} \mu \), and \(s\not \rightarrow a\) if \(\not \exists \, \mu :s \xrightarrow {a} \mu \). In the former case, we also say that action a is enabled in s. The set \(\textit{enabled}(s)\) contains all actions enabled in s. We write \(s\xrightarrow {a}_{\!\!\!{\mathcal {M}}} \mu \), etc., to clarify that a transition belongs to IOMA \({\mathcal {M}}\) if ambiguities arise. For brevity, whenever we refer to an IOMA \({\mathcal {M}}\), we assume it to be a tuple with components \(\langle S, s_{0}, \textit{Act}, T_P, T_M \rangle \) as in the above definition unless otherwise noted. \({\mathcal {M}}\) is input-enabled if all inputs are enabled in all states, i.e. \(\forall \, a \in \textit{Act}_I, s \in S :s \rightarrow a\).
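A direct encoding can help when reading this notation. The following sketch assumes a hypothetical dictionary-based representation (class `IOMA`, with fields `prob` for \(T_P\) and `markov` for \(T_M\)); it implements \(\textit{enabled}(s)\), the test for Markovian states, and input-enabledness as stated above, and is not a reference implementation from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Distr = Dict[str, float]  # successor state -> probability


@dataclass
class IOMA:
    states: Set[str]
    initial: str
    inputs: Set[str]
    outputs: Set[str]  # expected to contain "delta" for quiescence
    # T_P: state -> list of (action, distribution over successors)
    prob: Dict[str, List[Tuple[str, Distr]]] = field(default_factory=dict)
    # T_M: state -> list of (rate, successor)
    markov: Dict[str, List[Tuple[float, str]]] = field(default_factory=dict)

    def enabled(self, s: str) -> Set[str]:
        """All actions a such that s -a-> mu for some distribution mu."""
        return {a for (a, _) in self.prob.get(s, [])}

    def is_markovian(self, s: str) -> bool:
        """s is Markovian iff T_M(s) is non-empty."""
        return bool(self.markov.get(s))

    def is_input_enabled(self) -> bool:
        """Every input is enabled in every state."""
        return all(self.inputs <= self.enabled(s) for s in self.states)
```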
A Markov automaton starts in its initial state and then progresses through the state space, incurring exponentially distributed delays and jumping between states. When in state s, the next transition to take is selected as follows: if there is an outgoing probabilistic transition labelled with an action in \(\textit{Act}_O \cup \{\, \tau \,\}\), we apply the maximal progress assumption [27]: no time can pass, and one of these transitions is selected nondeterministically. We also say that outputs and internal actions are urgent. Otherwise, time passes until a Markovian transition takes place or an input arrives. The sum of the rates of all outgoing Markovian transitions of s is called its exit rate, denoted \(\mathbf E \left( s\right) \). Multiple Markovian transitions represent a race between exponential distributions. Thus, the time until any Markovian transition takes place is exponentially distributed with rate \(\mathbf E \left( s\right) \); at that point, the actual transition to take is selected probabilistically, with the probability of each transition being its rate divided by \(\mathbf E \left( s\right) \). We define \(\mathbf R \left( s,s'\right) = \sum _{\langle \lambda , s' \rangle \in T_M(s)} \lambda \), the rate from s to \(s'\).
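The race described above can be sampled directly. This sketch assumes Markovian transitions are given as a list of (rate, successor) pairs and that no output or internal transition is enabled in the current state (otherwise maximal progress applies and no time passes). It relies on the standard fact that, in a race of independent exponential delays, the minimum is exponentially distributed with the summed rate and the winner is independent of that minimum, chosen with probability proportional to its rate.

```python
import random
from typing import List, Tuple

Markov = List[Tuple[float, str]]  # outgoing Markovian transitions (rate, successor)


def exit_rate(markov: Markov) -> float:
    """E(s): the sum of the rates of all outgoing Markovian transitions."""
    return sum(rate for rate, _ in markov)


def rate_to(markov: Markov, target: str) -> float:
    """R(s, s'): total rate of the Markovian transitions from s to `target`."""
    return sum(rate for rate, succ in markov if succ == target)


def sample_race(markov: Markov, rng: random.Random) -> Tuple[float, str]:
    """Sample the delay until the first Markovian transition and its target."""
    delay = rng.expovariate(exit_rate(markov))  # minimum of the race ~ Exp(E(s))
    # The winner is independent of the delay; transition i wins w.p. rate_i / E(s).
    winner = rng.choices([succ for _, succ in markov],
                         weights=[rate for rate, _ in markov])[0]
    return delay, winner
```

Note that two transitions to the same successor simply add up in \(\mathbf R (s,s')\), which is why `rate_to` sums over all matching entries.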
Example 1
Figure 1 shows three IOMA describing a protocol that associates a delay with every send action, followed by an acknowledgement or error. As a convention, we indicate inputs by a ? suffix and outputs by a ! suffix. Discrete probability distributions follow an intermediate dot. Markovian transitions are presented as wavy arrows.
After the send? input is received by the specification in Fig. 1a, there is an exponentially distributed delay with rate \(\lambda _1\): the probability to go from \(s_1\) to \(s_2\) in at most T time units is \(1-\hbox {e}^{-\lambda _1 T}\). State \(s_2\) has one probabilistic transition. The specification requires that only \(10\%\) of all messages end in an error report and the remaining \(90\%\) are delivered correctly. After a message is delivered, the automaton goes back to its initial state, where it stays quiescent until input is provided. The \(\delta \) self-loop marks the absence of outputs.
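The probability stated above follows from integrating the exponential density with rate \(\lambda _1\); its mean, the reciprocal of the rate, is what a test can estimate from observed delays:

```latex
P(\text{delay} \le T)
  = \int_0^T \lambda_1 \mathrm{e}^{-\lambda_1 t}\,\mathrm{d}t
  = \Bigl[-\mathrm{e}^{-\lambda_1 t}\Bigr]_0^T
  = 1 - \mathrm{e}^{-\lambda_1 T},
\qquad
\mathbb{E}[\text{delay}] = \int_0^\infty t\,\lambda_1 \mathrm{e}^{-\lambda_1 t}\,\mathrm{d}t = \frac{1}{\lambda_1}.
```

For instance, with \(\lambda _1 = 2\) and \(T = 1\), this probability is \(1 - \mathrm{e}^{-2} \approx 0.865\).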
The “unfair” implementation model in Fig. 1b has the same structure, except for altered probabilities in the distribution out of \(s_2\). While the delay conforms to the one prescribed in the specification model, sufficiently many executions of the implementation should reveal that an error is reported more frequently than required. The “slow” implementation model of Fig. 1c assigns rate \(\lambda _2\) to the exponential delay between input and output. This is conforming iff \(\lambda _1=\lambda _2\); if \(\lambda _2 < \lambda _1\), it would be slower than required. This paper aims at establishing an MBT framework capable of identifying that implementations like these two do not conform to the given specification model.
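The statistical evaluation for IOMA builds on confidence interval estimation (Sect. 5). As an illustration only, the sketch below uses the textbook normal-approximation (Wald) interval to ask whether an observed error frequency is compatible with the specified \(10\%\); the counts and the significance level are made-up assumptions, the Wald interval behaves poorly for small samples or extreme proportions, and the paper's actual procedure may differ.

```python
import math
from statistics import NormalDist


def wald_interval(hits: int, n: int, alpha: float = 0.05):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = hits / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def consistent_with(p_spec: float, hits: int, n: int, alpha: float = 0.05) -> bool:
    """True if the specified probability lies inside the confidence interval."""
    low, high = wald_interval(hits, n, alpha)
    return low <= p_spec <= high
```

For example, 100 error reports in 1000 runs are compatible with the specified \(10\%\), whereas 200 in 1000, as a hypothetical unfair implementation might produce, are not. The same idea applies to the mean delay, whose expected value \(1/\lambda _1\) would reveal a slow implementation.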
3.2 Stochastic automata
We use stochastic automata [11] to develop an MBT approach for general stochastic-timed systems. They are MDP augmented with real-time clocks that expire after delays governed by general (continuous) probability distributions. In this way, they allow every stochastic delay to be modelled precisely, without the need for exponential or phase-type approximation as with Markov automata.
The progress of time is governed and tracked across locations and edges explicitly by clocks. This is necessary because, working in general continuous time not restricted to exponential distributions, delays in stochastic automata do not have the memoryless property. Clocks are real-valued variables that increase synchronously with rate 1 over time and expire some random amount of time after they have been restarted. The expiration time is drawn from a probability distribution specified for each clock. Stochastic automata are thus a symbolic model, so they consist of locations and edges rather than states and transitions.
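Definition 2
An input–output stochastic automaton (IOSA) is a tuple \({\mathcal {I}} = \langle \textit{Loc}, \ell _0, {\mathcal {C}}, \textit{Act}, E, F \rangle \), where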
\(\textit{Loc} \) is a finite set of locations,

\(\ell _0 \in \textit{Loc} \) is the initial location,

\({\mathcal {C}} \) is a finite set of clocks,

\(\textit{Act}= \textit{Act}_I \uplus \textit{Act}_O \uplus \{\, \tau \,\}\) is the set of actions partitioned into inputs, outputs, and the internal action \(\tau \), respectively, with \(\delta \in \textit{Act}_O\) being the distinct quiescence action,

\(E \in \textit{Loc} \rightarrow {\mathcal {P}}({ \textit{Edges}}) \) with \(\textit{Edges} = {\mathcal {P}}({{\mathcal {C}}}) \times \textit{Act}\times \mathrm {Distr}(\textit{T})\) and \(\textit{T} = {\mathcal {P}}({{\mathcal {C}}}) \times \textit{Loc} \) is the edge function mapping each location to a finite set of edges that in turn consist of a guard set, an action label, and a distribution over targets in \(\textit{T} \) consisting of a restart set of clocks and target locations, and

\(F\in {\mathcal {C}} \rightarrow \mathrm {Meas}({\mathbb {R}}^{+}_{0})\) is the delay measure function that maps each clock to a probability measure.
We write \(\textit{pdf}(c)\) to refer to the probability density function associated with the measure F(c) for \(c \in {\mathcal {C}} \). As for Markov automata, we use an input–output variant of stochastic automata, along the lines of [12]. We transfer the notation used for transitions in IOMA to edges in IOSA. We call an IOSA \({\mathcal {I}}\) input-enabled if all inputs are available in every location at every time, i.e. \(\exists \, \mu :\ell \xrightarrow {\varnothing , a_I} \mu \) for all \(\ell \in \textit{Loc} \) and \(a_I \in \textit{Act}_I\).
Example 2
Figure 2a shows an example IOSA specifying the behaviour of a file server with archival storage. We omit empty restart sets and the empty guard sets of inputs. Upon receiving a request in the initial location \(\ell _0\), the specification allows implementations to move either to \(\ell _1\) or to \(\ell _2\). The edge, i.e. the element of \(E (\ell _0)\), corresponding to the move to \(\ell _1\) is \(\langle \varnothing , \texttt {req?}, {\mathcal {D}}(\langle \{\,x\,\}, \ell _1 \rangle ) \rangle \), where \(\varnothing \) is the edge’s guard set, which must be empty since req? is an input. The move to \(\ell _2\) represents the case of a file in archive: the server must immediately deliver a wait! notification and then attempt to retrieve the file from the archive. Clocks y and z are restarted and used to specify that retrieving the file shall take on average \(\frac{1}{3}\) of a time unit, exponentially distributed, but no more than 5 time units. In location \(\ell _3\), there is thus a race between retrieving the file and a deterministic timeout. In case of timeout, an error message (action err!) is returned; otherwise, the file can be delivered as usual from location \(\ell _1\). Clock x is used to specify the transmission time of the file: it shall be uniformly distributed between 0 and 1 time units.
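The race in \(\ell _3\) can be simulated in a few lines. The encoding is our own assumption: clock y is exponential with rate 3 (mean \(\frac{1}{3}\)), clock z expires deterministically after 5 time units, and clock x is uniform on [0, 1]. Since y and z are restarted together, whichever has the smaller expiration time wins the race, and the exact timeout probability is \(\mathrm{e}^{-3 \cdot 5} = \mathrm{e}^{-15}\).

```python
import math
import random

RATE_Y = 3.0     # clock y: exponential with mean 1/3 (assumed encoding)
TIMEOUT_Z = 5.0  # clock z: expires deterministically after 5 time units


def sample_archive_fetch(rng: random.Random):
    """Race in l3: the clock with the smaller expiration time wins."""
    y = rng.expovariate(RATE_Y)
    if y < TIMEOUT_Z:
        return "retrieved", y      # continue to l1 and transmit the file
    return "err!", TIMEOUT_Z       # timeout: report an error


def sample_transmission(rng: random.Random) -> float:
    """Expiration time of clock x: uniform between 0 and 1."""
    return rng.uniform(0.0, 1.0)


# Exact probability of a timeout: P(y >= 5) = exp(-3 * 5), roughly 3.1e-7.
P_TIMEOUT = math.exp(-RATE_Y * TIMEOUT_Z)
```

Because the timeout is so unlikely, a test would need an impractical number of runs to observe err! at all; what it can check is the shape of the observed retrieval and transmission delays.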
In Fig. 2b, we show an implementation of this specification. One in ten files, chosen at random, has to be fetched from the archive. This is allowed by the specification: it is one particular (randomised) resolution of the nondeterminism, i.e. underspecification, defined in \(\ell _0\). The implementation also manages to transmit files from the archive directly while fetching them, as evidenced by the direct edge from \(\ell _3\) back to \(\ell _0\) labelled file!. This violates the timing prescribed by the specification, and must be detected by an MBT procedure for IOSA.
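The statistical evaluation for IOSA is based on the Kolmogorov–Smirnov test (Sect. 5). As an illustration only, the sketch below computes the one-sample KS statistic \(D_n = \sup _x |F_n(x) - F(x)|\) of observed delays against a specified continuous distribution, here the uniform transmission time of clock x, and compares it with the common large-sample critical value \(1.358/\sqrt{n}\) for \(\alpha = 0.05\). The helper names are hypothetical, and how the paper turns the statistic into a verdict may differ.

```python
import math
from typing import Callable, Sequence


def ks_statistic(samples: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D_n = sup_x |F_n(x) - F(x)|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d


def ks_rejects(samples: Sequence[float], cdf: Callable[[float], float],
               c_alpha: float = 1.358) -> bool:
    """Approximate large-sample test; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return ks_statistic(samples, cdf) > c_alpha / math.sqrt(len(samples))


def uniform01_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)
```

Evenly spread samples yield a tiny statistic and are accepted, while samples whose distribution is skewed away from uniform, as delays shortened by a shortcut edge might be, are rejected.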
In the remainder of this paper, whenever a statement applies to both IOMA and IOSA, we will say that it applies to an automaton \({\mathcal {A}}\) for brevity.
3.3 Parallel composition
To give a semantics for synchronisation and communication between components of a system, we define a binary parallel composition operator. Two components synchronise on inputs and outputs, and otherwise evolve independently. Our operators are defined w.r.t. a binary input–output relation M that associates outputs of one component with inputs of the other component, and vice versa. Wherever we use the !/? suffix convention for action labels, we assume that M relates every output \(a\texttt {!}\) with the input \(a\texttt {?}\) and vice versa.
Markov automata IOMA interact via probabilistic transitions, while Markovian transitions evolve independently, with the single technical exception of Markovian self-loops:
Definition 3
In the action alphabet of the composition, only those inputs carry over that do not have a synchronising output in the other component associated with them via M. If \(s_1 \rightarrow _{{\mathcal {M}}_1} a_1\) and \(\langle a_1, a_2 \rangle \in M\), an \(a_1\)-labelled transition can only take place in synchronisation with an \(a_2\)-labelled transition of the second component (assuming no other action is associated with \(a_1\) by M). In particular, if \(s_2 \not \rightarrow _{{\mathcal {M}}_2} a_2\), then \(\langle s_1, s_2 \rangle \) has no \(a_1\)–\(a_2\)-synchronising transition: synchronisation waits for all partners to be ready. We later restrict to input-enabled models to make sure that outputs cannot be prevented from occurring immediately.
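The waiting behaviour of synchronisation can be illustrated with a sketch of the synchronising part of IOMA composition. Definition 3 itself is not reproduced in this excerpt, so the encoding is an assumption: probabilistic transitions are (action, distribution) pairs, M is a set of related action pairs, a synchronised transition is labelled with the pair of actions, and its successor distribution is the product of the two component distributions. Interleaving of non-synchronising actions and of Markovian transitions is omitted.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Distr = Dict[str, float]
Trans = Dict[str, List[Tuple[str, Distr]]]  # state -> [(action, distribution)]


def sync_transitions(s1: str, s2: str, t1: Trans, t2: Trans,
                     M: Set[Tuple[str, str]]):
    """Synchronising transitions of <s1, s2>; both partners must be ready."""
    result = []
    for (a1, mu1), (a2, mu2) in product(t1.get(s1, []), t2.get(s2, [])):
        if (a1, a2) in M:
            # Independent product of the two successor distributions.
            mu = {(u, w): p * q for u, p in mu1.items() for w, q in mu2.items()}
            result.append(((a1, a2), mu))
    return result
```

If the partner state does not enable the related action, the list is empty: the composed state has no synchronising transition and has to wait.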
Stochastic automata The definition of parallel composition for IOSA is similar: while there are no Markovian transitions, the synchronisation of probabilistic edges now requires building the unions of the involved guard and restart sets. This means that a synchronising edge in the parallel composition can only be taken once both of its constituent edges are enabled: synchronisation partners wait, just as in IOMA.
Definition 4
3.4 Qualitative semantics
The non-probabilistic aspects of the semantics of IOMA and IOSA are captured in the notion of a path, which precisely represents a single execution of an automaton.
3.4.1 Paths
A concrete execution of an automaton—the exact amount of time spent in each state, the transition/edge taken, and the selected successor state/location—is captured by a path.
Markov automata The definition of paths for IOMA is based on the automaton’s states and transitions:
Definition 5
By definition, every finite path ends in a state, and for every non-final state \(s_i\), either \(s_{i} \xrightarrow {a_{i + 1}} \mu _{i + 1}\) or \(s_{i} \overset{\lambda }{\rightsquigarrow } s_{i + 1}\) for some \(\lambda \in {\mathbb {R}}^+\). A subsequence \(s_{i-1}\, t_{i}\, \alpha _i\, \varnothing \, s_{i}\) means that \({\mathcal {M}}\) resided \(t_i\) time units in state \(s_{i-1}\) before moving to \(s_{i}\) via \(\alpha _i\). The empty sets \(\varnothing \) are for consistent notation with paths for IOSA (see below).
Stochastic automata IOSA comprise real-valued clocks; to define a path through an IOSA \({\mathcal {I}}\), we need to keep track of their values and expiration times. We do so by defining the state of \({\mathcal {I}}\) to include these values: the set of states of an IOSA \({\mathcal {I}}\) is \(S = \textit{Loc} \times \textit{Val}\times \textit{Val}\). Each state \(\langle \ell , v, x \rangle \in S\) consists of the current location \(\ell \), the values v, and the expiration times x of all clocks. Consequently, the state space of an IOSA is uncountably infinite.
Definition 6

\(\ell _{i-1} \xrightarrow {G_i, a_i} \mu _i\),

\(v_i = (v_{i-1} + t)[R_i \mapsto 0]\),

\({\mathrm {Ex}}(G_i, v_{i-1} + t, x_{i-1})\) is satisfied,

\(\mu _i(\langle R_i, \ell _i \rangle ) > 0\),
the expiration times satisfy$$\begin{aligned} \begin{aligned} x_i \in \{\, x \in \textit{Val}\mid \, \forall \, c \in {\mathcal {C}} {\setminus } R_i&:x(c) = x_{i-1}(c)\\ \wedge \, \forall \, c \in R_i&:x(c) \ge 0 \,\}, \end{aligned} \end{aligned}$$
and if \(a_i \notin \textit{Act}_I\), then additionally$$\begin{aligned} \not \exists \,t' \in [0,t[:\, \exists \, \ell _{i-1} \xrightarrow {G, a} \mu :{\mathrm {Ex}}(G, v_{i-1} + t', x_{i-1}). \end{aligned}$$
The last condition implements the urgency of outputs and internal actions. We require that every path starts in the initial location with all clocks and expiration times set to zero. An edge may only be taken if all clocks in its guard set are expired (which is the case when predicate \({\mathrm {Ex}}\) is satisfied). The clock values in the successor state are obtained by advancing all clocks by the delay t and then resetting exactly those clocks in the restart set \(R_i\) to zero; the restarted clocks also receive fresh expiration times. All other clocks keep their advanced values and their expiration times.
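The clock conditions above can be phrased as small functions. This sketch assumes that a clock c counts as expired once its value has reached its expiration time, i.e. \(v(c) \ge x(c)\); the names `ex`, `earliest_enabling`, and `step` are hypothetical. `earliest_enabling` returns the smallest delay after which a guard set holds, which is the delay the urgency condition singles out for an output or internal edge.

```python
from typing import Dict, Iterable, Set, Tuple

Val = Dict[str, float]  # clock -> value (or clock -> expiration time)


def ex(guard: Iterable[str], v: Val, x: Val) -> bool:
    """Ex(G, v, x), assuming clock c is expired iff v(c) >= x(c)."""
    return all(v[c] >= x[c] for c in guard)


def earliest_enabling(guard: Iterable[str], v: Val, x: Val) -> float:
    """Smallest delay t >= 0 such that Ex(G, v + t, x) holds."""
    return max([0.0] + [x[c] - v[c] for c in guard])


def step(v: Val, x: Val, t: float, restart: Set[str], fresh: Val) -> Tuple[Val, Val]:
    """Advance all clocks by t, reset the restart set, and resample its expirations."""
    v_next = {c: (0.0 if c in restart else val + t) for c, val in v.items()}
    x_next = {c: (fresh[c] if c in restart else exp) for c, exp in x.items()}
    return v_next, x_next
```

In a simulation, the entries of `fresh` would be drawn from the delay measures F(c) of the restarted clocks.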
We write \(\textit{last}(\pi )\) to denote the last state of a finite path. We write \(\pi '\sqsubseteq \pi \) if \(\pi '\) is a prefix of \(\pi \). The set of all finite paths of an automaton \({\mathcal {A}}\) is \(\textit{paths}^{\textit{fin}}({\mathcal {A}})\). The set of complete paths, denoted \(\textit{paths}^{\textit{com}}({\mathcal {A}})\), contains every path ending in a deadlock, i.e. in a state s where \(T_P(s) = T_M(s) = \varnothing \) (for IOMA) or a location \(\ell \) where \(E(\ell ) = \varnothing \) (for IOSA).
3.4.2 Traces
A trace is the projection of a path to its delays and actions, recording the path’s visible behaviour:
Definition 7
3.4.3 Abstract traces
When delays are governed by continuous probability distributions, the probability of any single time point is zero. Hence, we will need a notion that represents an automaton’s behaviour over time intervals instead of points.
Definition 8
(abstract trace) An abstract trace is a trace where each delay \(t_i\) is replaced by an interval \(I_i\subseteq {\mathbb {R}}^{+}_{0} \) with \(t_i \in I_i\).
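Checking whether a concrete trace belongs to an abstract trace amounts to interval membership for each delay and equality for each action. Since Definition 7 is not reproduced here, this sketch assumes traces are flat lists alternating delays and actions, \(t_1\, a_1\, t_2\, a_2 \ldots \), and encodes each interval as a tuple with flags for closed or open ends; all names are illustrative.

```python
from typing import Sequence, Tuple

# (low, high, low_closed, high_closed); e.g. [0, 1[ is (0.0, 1.0, True, False)
Interval = Tuple[float, float, bool, bool]


def in_interval(t: float, iv: Interval) -> bool:
    """Membership of t in an interval with open or closed ends."""
    lo, hi, lo_closed, hi_closed = iv
    above = t >= lo if lo_closed else t > lo
    below = t <= hi if hi_closed else t < hi
    return above and below


def matches(trace: Sequence, abstract: Sequence) -> bool:
    """Does the trace (t1, a1, t2, a2, ...) lie in the abstract trace (I1, a1, ...)?"""
    if len(trace) != len(abstract):
        return False
    for i, (elem, pattern) in enumerate(zip(trace, abstract)):
        if i % 2 == 0:             # even positions: delay vs. interval
            if not in_interval(elem, pattern):
                return False
        elif elem != pattern:      # odd positions: actions must agree
            return False
    return True
```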
Example 3
3.5 Quantitative semantics
Our goal is now to quantify the frequency of observed traces. For this purpose, we first define schedulers, which resolve all nondeterministic choices, and then a probability space and measure over the remaining paths. The space and measure will allow us to specify trace distributions.
3.5.1 Schedulers
IOMA and IOSA comprise nondeterministic choices, discrete probability distributions, and delays following continuous probability distributions. Due to the nondeterminism, we cannot assign probabilities to paths and traces directly. Rather, we resort to schedulers that resolve nondeterminism, and consequently yield a purely probabilistic system. Given any finite history leading to a state/location, a scheduler returns a discrete probability distribution over the set of next transitions/edges. In order to model termination, we define schedulers such that they can continue paths with a halting extension \(\perp \), after which only quiescence is observed.
Definition 9
The definition of schedulers ensures that only enabled transitions are chosen. We use subdistributions, as opposed to distributions, such that the probability mass a scheduler does not assign to actions in \(\textit{Act}\) (or to \(\bot \)) is left for Markovian transitions. That is, a scheduler chooses an action, halts immediately (\(\bot \)), or leaves a chance for Markovian transitions to take place. Schedulers for IOSA are defined similarly:
Definition 10
A scheduler for an IOSA can only choose between the edges enabled at the points where any edge just became enabled. While actions (via probabilistic transitions) and the passage of time (via Markovian transitions) were decoupled in IOMA, edges in IOSA directly govern delays. Schedulers thus return distributions, not subdistributions.
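The difference between IOMA and IOSA schedulers can be made concrete for a single history. In this minimal sketch, a scheduler's choice is a hypothetical dictionary from enabled actions (or a halting marker standing for \(\bot \)) to probabilities; for an IOMA, the leftover mass goes to Markovian transitions, while an IOSA choice must already sum to one. The function names and the `"markovian"` key are our own.

```python
from typing import Dict, Set

HALT = "halt"  # stands for the halting extension


def complete_ioma_choice(choice: Dict[str, float], enabled: Set[str]) -> Dict[str, float]:
    """Add the mass an IOMA scheduler leaves for Markovian transitions."""
    if any(a != HALT and a not in enabled for a in choice):
        raise ValueError("schedulers may only choose enabled actions")
    total = sum(choice.values())
    if any(p < 0 for p in choice.values()) or total > 1 + 1e-9:
        raise ValueError("not a subdistribution")
    return {**choice, "markovian": max(0.0, 1.0 - total)}


def is_iosa_choice(choice: Dict[str, float], tol: float = 1e-9) -> bool:
    """IOSA schedulers return proper distributions over edges."""
    return all(p >= 0 for p in choice.values()) and abs(sum(choice.values()) - 1.0) <= tol
```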
Remark 1
We use schedulers in the context of MBT in an open environment, yet schedule both inputs and outputs. This is in contrast to similar approaches in the literature; for instance, [7] use a partial scheduler for each component and an arbiter scheduler that tells precisely how progress of the composed system is determined. Our approach is non-compositional (see, for example, [44]). However, we utilise schedulers only to determine the probabilities of paths and traces, which does not require compositionality.
For both IOMA and IOSA, we restrict to finite-length schedulers in the remainder of the paper. As is usual, we also consider only schedulers that let time diverge with probability 1.
3.5.2 Probabilities of paths
By resolving all nondeterminism, a scheduler makes it possible to calculate the probability of measurable sets of paths via step probability functions. Scheduler decisions are made without delay; hence, there are no additional races between Markovian transitions or edges on the one hand and scheduler decisions on the other.
Definition 11
The probability to halt right after \(\pi \) is inferred from the probability a scheduler assigns to the halting extension \(\perp \). Otherwise, this function defines, for every path \(\pi \), a measure quantifying the probability to continue from state \(\textit{last}(\pi ) = s\) by incurring a delay in the interval \(I \subseteq {\mathbb {R}}^{+}_{0} \), taking a transition in \(A_Q\), and ending up in a state in \(S_Q\). Auxiliary function \(P_\pi \) calculates the probability of doing so via a probabilistic transition while \(M_\pi \) considers Markovian transitions. The integral in \(M_\pi \) implements the exponential distribution of delays.
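Definition 11 is not reproduced in this excerpt. Assuming that its Markovian part factorises as in the race semantics of Sect. 3.1, i.e. leftover scheduler mass, times the branching probability \(\mathbf R (s,s')/\mathbf E (s)\), times the exponential probability of a delay in I, the following sketch evaluates it for an interval \(I = [a, b]\). It is meant only to show what the integral in \(M_\pi \) computes.

```python
import math
from typing import List, Tuple


def exp_interval_prob(rate: float, a: float, b: float) -> float:
    """Integral of rate * exp(-rate * t) over [a, b]; b may be math.inf."""
    upper = 0.0 if math.isinf(b) else math.exp(-rate * b)
    return math.exp(-rate * a) - upper


def markovian_step_prob(leftover: float, markov: List[Tuple[float, str]],
                        target: str, a: float, b: float) -> float:
    """Probability of a Markovian move to `target` after a delay in [a, b].

    `leftover` is the mass the scheduler did not assign to actions or halting.
    """
    e = sum(rate for rate, _ in markov)                       # E(s)
    r = sum(rate for rate, succ in markov if succ == target)  # R(s, s')
    return leftover * (r / e) * exp_interval_prob(e, a, b)
```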
Definition 12
This function defines, for every path \(\pi \), a measure quantifying the probability to continue from state \(\textit{last}(\pi ) = \langle \ell , v, x \rangle \) by incurring a delay in the interval \(I \subseteq {\mathbb {R}}^{+}_{0} \), taking an edge in \(E_Q\), resetting a set of clocks in \(R_Q\), and ending up in a state in \(S_Q\). First, the factor \(\mathbb {1}_{t \in I}\) ensures that only delays in I have positive probability. We then sum the probabilities over all edges, with the value for each edge being given by auxiliary function \(Y^{S_Q}_{R_Q}\). In that function, we multiply the probability that the scheduler selects this edge, the probability of each probabilistic branch, and the probability to end up in a state in \(S_Q\) by following that branch. Since states are uncountable, we integrate the probability density of every state as given by auxiliary function \(X_R^x\). A state can only have positive probability if the clock values it assigns are the previous values advanced by the selected delay, with the clocks restarted by the branch set to zero (factor \(\mathbb {1}_{v' = (v+t)[R \mapsto 0]}\)). The final multiplication in \(X_R^x\) assigns the correct probability density (via \(\textit{pdf}(c)(x'(c))\)) to sampling new expiration times for the clocks that are restarted (identified by \(c \in R\)); all other clocks retain their expiration times (as enforced by the first two lines of the case distinction).
3.5.3 Trace distributions
Overall, the two-step probability functions induce unique probability measures \(P_{{\mathfrak {S}}}\) over \(\textit{paths}^{\textit{fin}}({\mathcal {A}})\) for an automaton \({\mathcal {A}}\) and a scheduler \({\mathfrak {S}}\). We define the trace distribution for \({\mathcal {A}}\) and a scheduler as the probability measure over traces (using abstract traces to construct the corresponding \(\sigma \)-algebra) induced by these probability measures over paths in the usual way: the probability of a set of abstract traces X is the probability of all paths whose trace is in X.
Definition 13

\(\varOmega _{\mathcal {T}} = \textit{AbsTraces}({\mathcal {M}})\),

\({\mathcal {F}}_{\mathcal {T}}\) is the smallest \(\sigma \)-field generated by the sets$$\begin{aligned} \{\, C_\varSigma \mid \varSigma \in \textit{AbsTraces}^{\textit{fin}}({\mathcal {M}}) \,\} \end{aligned}$$with \(C_{\varSigma } = \{\, \varSigma ' \in \varOmega _{\mathcal {T}}\mid \varSigma \sqsubseteq \varSigma ' \,\}\), and

\(P_{\mathcal {T}}\) is the unique probability measure on \({\mathcal {F}}_{\mathcal {T}}\) defined by \(P_{\mathcal {T}}(X) = P_{{\mathfrak {S}}}(\textit{tr}^{-1}({X}))\) for \(X\in {\mathcal {F}}_{\mathcal {T}}\).
We can also use trace distributions to relate two automata \({\mathcal {A}}_1\) and \({\mathcal {A}}_2\). A trace distribution \({\mathcal {T}}\) of \({\mathcal {A}}_1\) is contained in the set of trace distributions of \({\mathcal {A}}_2\) if there is a scheduler \({\mathfrak {S}}\) of \({\mathcal {A}}_2\) such that \({\mathcal {T}}=\textit{trd}({\mathfrak {S}})\). We write \(\textit{trd}({\mathcal {A}},k)\) for the set of trace distributions based on schedulers of length k and \(\textit{trd}({\mathcal {A}})\) for the set of all finite trace distributions. Finally, we write \({\mathcal {A}}_1\sqsubseteq ^k_{\textit{TD}}{\mathcal {A}}_2\) if \(\textit{trd}({\mathcal {A}}_1,k)\subseteq \textit{trd}({\mathcal {A}}_2,k)\) for \(k\in {\mathbb {N}}\), and \({\mathcal {A}}_1\sqsubseteq ^\textit{fin}_{\textit{TD}}{\mathcal {A}}_2\) if \({\mathcal {A}}_1\sqsubseteq _{\textit{TD}}^k{\mathcal {A}}_2\) for all \(k\in {\mathbb {N}}\). This induces an equivalence relation \(=_{\textit{TD}}\): \({\mathcal {A}}_1\) and \({\mathcal {A}}_2\) are trace distribution equivalent, written \({\mathcal {A}}_1 =_{\textit{TD}} {\mathcal {A}}_2\), iff \(\textit{trd}({\mathcal {A}}_1) = \textit{trd}({\mathcal {A}}_2)\), i.e. iff they induce the same trace distributions.
4 Stochastic testing theory
Model-based testing comprises automatic test case generation, execution, and evaluation based on a requirements model. We now establish this three-step procedure for IOMA and IOSA. As a first step, we define formal conformance between two models via two conformance relations akin to ioco [49], called marioco and saioco. We then specify what a test case is, and when an observed trace should be judged as correct via test annotations. Working in a stochastic environment also necessitates a statistical verdict. We describe the sampling process for an IUT and then define verdict functions. Finally, we prove the correctness of the framework.
The main difference of our stochastic test theory, compared to the probabilistic test theory of [20], lies in the sampling process and its resulting observations, in particular, in the trace frequency counting functions. We carefully defined IOMA and IOSA in such a way that many of the notions in the remainder of this section apply to both settings. For this reason, we will write \(\mathbf * \mathbf{ioco } \), \(\sqsubseteq ^{*}_{\textit{ioco}}\), etc., to summarise a definition for both \(\mathbf{marioco }\) and \(\mathbf{saioco } \), \(\sqsubseteq ^\textit{mar}_\textit{ioco}\) and \(\sqsubseteq ^{\textit{sa}}_{\textit{ioco}}\), etc.
4.1 Stochastic conformance relations
Definition 14
The prefix relation extends the one for traces to trace distributions. The output continuation of \({\mathcal {T}}\) of length k in \({\mathcal {M}}\) contains all trace distributions \({\mathcal {T}}'\) of length \(k+1\) such that \({\mathcal {T}}\sqsubseteq _k{\mathcal {T}}'\) and \({\mathcal {T}}'\) assigns probability zero to every abstract trace of length \(k+1\) that ends with an input.
We can now define the marioco and saioco conformance relations that relate input-enabled implementations \({\mathcal {I}}\) to specifications \({\mathcal {S}}\). Intuitively, \({\mathcal {I}}\) conforms to \({\mathcal {S}}\) if the probability of every output trace of \({\mathcal {I}}\) can be matched by \({\mathcal {S}}\) under some scheduler. This includes the functional behaviour, probabilistic behaviour, and stochastic timing, as accounted for in the definition of output continuations.
Definition 15
Example 4
Relationship to other relations. If \({\mathcal {A}}\) is an IOMA without Markovian transitions or an IOSA where \({\mathcal {C}} = \varnothing \), then \({\mathcal {A}}\) is a probabilistic input–output transition system (pIOTS). Under this restriction, marioco and saioco coincide with pioco of [20] and are thus extensions of pioco:
Theorem 1
For two pIOTS \({\mathcal {I}}\) and \({\mathcal {S}}\) with \({\mathcal {I}}\) input-enabled, we have \({\mathcal {I}}\sqsubseteq ^{*}_{\textit{ioco}}{\mathcal {S}}\Leftrightarrow {\mathcal {I}}\sqsubseteq _{\textit{pioco}}{\mathcal {S}}\).
Proof sketch
All three relations are defined in the same way over trace distributions and schedulers, the notions for which coincide if \(T_M = \varnothing \) or \({\mathcal {C}} = \varnothing \), respectively. \(\square \)
Consequently, the relationships already established between pioco and other relations in [20] carry over as well: marioco and saioco extend ioco (i.e. the relations coincide on IOTS), and for trace distribution inclusion, we have the following result:
Theorem 2
Proof sketch
The fact that finite trace distribution inclusion implies conformance with respect to \(\sqsubseteq ^{*}_{\textit{ioco}}\) is immediate, since the relation is defined via trace distributions. The opposite direction follows from the fact that, by \(\sqsubseteq ^{*}_{\textit{ioco}}\), every abstract trace of \({\mathcal {A}}\) ending in an output can be assigned the same probability in \({\mathcal {B}}\). All abstract traces ending in an input are taken care of because \({\mathcal {A}}\) and \({\mathcal {B}}\) are input-enabled, and all such distributions are input-reactive. The second result is a direct consequence of the first. \(\square \)
4.2 Test cases and annotations
The advantage of MBT over manual testing is that test cases can be automatically generated from the specification and automatically executed on an implementation. We are interested in the result of a parallel composition of a test case and an implementation model. We define test cases over an action signature \(\langle \textit{Act}_I, \textit{Act}_O \rangle \). A test case is a collection of traces that represent the possible behaviour of a tester. It is summarised by an IOMA without Markovian transitions, or an IOSA without clocks, whose graph is a tree. The action signature describes the potential interaction with the implementation. In each state/location, the test may either stop, wait for a response of the system, or provide some stimulus. When a test is waiting for a response, it has to take into account all potential outputs including the situation that the system provides no response at all, modelled by quiescence \(\delta \). A single test case may provide multiple options, giving rise to multiple concrete testing sequences. It may also prescribe different reactions to different outputs.
Definition 16
 (1)
\(\textit{enabled}(s) = \emptyset \) (stop the test) or
 (2)
\(\textit{enabled}(s) = \textit{Act}^{\mathfrak {t}}_{I}\) (wait for some response) or
 (3)
\(\textit{enabled}(s) \subseteq \textit{Act}^{\mathfrak {t}}_{O} \wedge |\textit{enabled}(s)| = 1\) (provide a single stimulus, deterministically).
Test cases are, in effect, IOMA or IOSA that are IOTS. The inputs of a test case are the outputs of the action signature, i.e. the outputs of the implementation or specification, and vice versa. The last requirement in the definition ensures that only specified inputs are provided: a test may only judge the correctness of specified behaviour. This is referred to as being input minimal in the literature [47].
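The three conditions of Definition 16 can be checked mechanically on a finite tree. The Python sketch below uses a hypothetical encoding: each node maps action labels to child nodes, `test_inputs` are the test's inputs (the implementation's outputs, with quiescence written `delta`), and `test_outputs` are the stimuli the test may provide. The tree shape itself is assumed by construction (no node is shared), and the labels in the usage are placeholders rather than actions from Fig. 2 or Fig. 6.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Node:
    # action label -> successor node; a leaf has no children
    children: Dict[str, "Node"] = field(default_factory=dict)


def node_ok(node: Node, test_inputs: set, test_outputs: set) -> bool:
    enabled = set(node.children)
    return (
        not enabled                                              # (1) stop the test
        or enabled == test_inputs                                # (2) wait for any response
        or (len(enabled) == 1 and enabled <= test_outputs)       # (3) a single stimulus
    )


def is_test_case(root: Node, test_inputs: Iterable[str],
                 test_outputs: Iterable[str]) -> bool:
    ins, outs = set(test_inputs), set(test_outputs)
    stack = [root]
    while stack:  # depth-first traversal over all nodes of the tree
        node = stack.pop()
        if not node_ok(node, ins, outs):
            return False
        stack.extend(node.children.values())
    return True
```

Condition (2) forces a waiting node to react to every possible output, including \(\delta \), which matches the requirement that a waiting test must take all potential outputs into account.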
In order to identify the behaviour which we deem as functionally acceptable/correct, each complete trace of a test, i.e. every leaf state or location, is annotated with a pass or fail verdict. We annotate exactly the traces that are present in the specification with the \(pass \) verdict, formally:
Definition 17
Example 5
Figure 6 presents a test suite for the file server specification IOSA of Fig. 2. Test case \({\hat{{\mathfrak {t}}}}_1\) uses the quiescence observation \(\delta \) to check that no output is given in the initial state. \({\hat{{\mathfrak {t}}}}_2\) checks for eventual delivery of the file, which may be archived, requiring the intermediate wait! notification, or may be sent directly. Finally, \({\hat{{\mathfrak {t}}}}_3\) tests the abort? edge.
4.3 Sampling and verdicts
Functional conformance is assessed via test annotations in the same way as in classical ioco theory [47]. However, we test stochastic systems; thus, executing a test case once is insufficient to establish \(\mathbf * \mathbf{ioco } \) conformance. We now focus on the statistical evaluation of the probabilistic and stochastically timed behaviour based on a sample of multiple traces.
4.3.1 Sampling
We perform a statistical hypothesis test on the implementation based on the outcome of a push-button experiment in the sense of [37]. We assume a black-box timed trace machine with inputs, a time and an action window, and a reset button, as illustrated in Fig. 7. An observer records each individual execution before the reset button is pressed and a new execution starts. A clock is started and is stopped once the next visible action is recorded; recording an action resets the clock. Thus, the recordings of the external observer match the notion of (abstract) traces. After a sample of sufficient size has been collected, we compare the collected frequencies of abstract traces to their expected frequencies according to the specification. If the empirical observations are close to the expectations, we accept the probabilistic behaviour of the implementation.
Before the experiment, we fix the parameters for sample length \(k\in {\mathbb {N}}\) (the length of the individual test executions), sample size \(m\in {\mathbb {N}}\) (how many test executions to observe), and level of significance \(\alpha \in \; ]0, 1[\) (the probability of erroneously rejecting a correct implementation). Checking the abstract trace frequencies contained in the sample against their expected frequencies w.r.t. the specification \({\mathcal {S}}\) requires a scheduler due to the presence of nondeterminism in \({\mathcal {S}}\). In order for any statistical reasoning to work, we assume each iteration of the sampling process to be governed by the same scheduler, which induces a trace distribution \({\mathcal {T}}\in \textit{trd}({\mathcal {I}})\).
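The frequency function itself is not reproduced in this section. The Python sketch below takes a deliberately simplified view, which is an assumption: each observed execution is a list of (delay, action) pairs of length at most k, the untimed action sequences are counted to obtain relative frequencies over the m executions, and the recorded delays are kept per sequence for the later analysis of the timing. Abstract traces as defined in the paper additionally constrain time stamps, which this sketch ignores. All names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TimedTrace = List[Tuple[float, str]]  # (delay since previous action, action)


def summarise_sample(sample: List[TimedTrace], k: int):
    """Relative frequencies of action sequences, plus the delays observed for each."""
    if any(len(trace) > k for trace in sample):
        raise ValueError("trace longer than the sample length k")
    m = len(sample)
    untimed = [tuple(action for _, action in trace) for trace in sample]
    freqs = {seq: count / m for seq, count in Counter(untimed).items()}
    # keep the delay vectors per action sequence for the timing analysis
    delays: Dict[tuple, List[List[float]]] = defaultdict(list)
    for seq, trace in zip(untimed, sample):
        delays[seq].append([d for d, _ in trace])
    return freqs, dict(delays)
```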
4.3.2 Frequencies and expectations
Example 6
4.3.3 Acceptable outcomes
Definition 18
Remark 2
The set of acceptable outcomes comprises samples of the form \(O \in ({\mathbb {R}}^{+}_{0} \times \textit{Act})^{\le k\times m}\). In order to align observations with the \(\mathbf * \mathbf{ioco } \) relations, we define the set of acceptable output outcomes \(\textit{OutObs}({\mathcal {T}},\alpha ,k,m)\) as the set of those \(O\in (({\mathbb {R}}^{+}_{0} \times \textit{Act})^{\le k-1} \times {\mathbb {R}}^{+}_{0} \times \textit{Act}_O)^m\) for which we have \(\textit{dist}(\textit{freq}(O), {\mathbb {E}}^{\mathcal {T}})\le r_\alpha \).
Verdict functions. With all necessary components in place, the following decision process summarises whether an implementation fails a test case or test suite based on a functional or statistical verdict. The overall pass verdict is given iff both subverdicts yield a pass. Let \(\textit{Aut}_{*}\) denote the set of all IOMA or IOSA, respectively.
Definition 19
An implementation passes a test suite \({\hat{{\mathfrak {T}}}}\) if it passes the overall verdict for all annotated tests \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\).
Although IOMA and IOSA capture three aspects, namely (1) functional behaviour, (2) discrete probabilistic behaviour, and (3) continuous time, we only have two verdicts. This is because continuous time is only present in the form of stochastic delays. Thus, on the purely mathematical level, the decision whether a delay in the implementation adheres to the specified one is covered by the probabilistic verdict \(v_{\textit{prob}}\). Only on the practical side do we need a new decision procedure, which we study in Sect. 5.
4.4 Soundness and completeness
Ideally, only \(\mathbf * \mathbf{ioco } \)-correct implementations pass a test suite. However, due to the stochastic nature of our models, there remains a degree of uncertainty upon giving verdicts. This is phrased as errors of the first and second kind in hypothesis testing: the probability to reject a true hypothesis and to accept a false one, respectively. In the context of probabilistic MBT, they are reflected as the probability to reject a correct implementation and to accept an erroneous one. The relevance of these errors becomes evident when we consider the correctness of our test frameworks. Correctness comprises soundness and completeness: every conforming implementation passes, and there is a test case to expose every non-conforming one. A test suite can only be considered correct with some guaranteed (high) probability.
Definition 20
Soundness expresses, for a given \(\alpha \in \; ]0,1[\), that there is a \(1-\alpha \) chance that a correct system passes the annotated test suite for sufficiently large sample size m. This relates to false rejection of a correct hypothesis in statistical hypothesis testing, or rejection of a correct implementation, respectively.
For the following theorems, we provide full proofs for saioco. The proofs for marioco use the exact same arguments and only lack some of the technical complications of the more general IOSA setting. The interested reader may find the full proofs for marioco in [18].
Theorem 3
Each annotated test case for an automaton \({\mathcal {S}}\) is sound for every level of significance \(\alpha \in (0,1)\) with respect to \(\sqsubseteq ^{*}_{\textit{ioco}}\).
Proof
Let \({\mathcal {I}}\) be an input-enabled IOSA and \({\hat{{\mathfrak {t}}}}\) be a test for \({\mathcal {S}}\). Assume that \({\mathcal {I}}\sqsubseteq ^{\textit{sa}}_{\textit{ioco}}{\mathcal {S}}\). We want to show \(V({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\). By Definition 19, we have that \(V({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\) if and only if \(v_{\textit{func}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\). We proceed by showing \(v_{\textit{func}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\) and \(v_{\textit{prob}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{pass}\) in separate steps:
Completeness of a test suite is an inherently theoretical result. Infinite behaviour of the implementation, for instance, via loops, would require an infinite test suite. Moreover, the possibility of accepting an erroneous implementation by chance, i.e. committing an error of the second kind, remains. However, the latter is bounded from above by construction, and decreases with increasing sample size (Definition 18).
Theorem 4
The set of all annotated test cases for an automaton \({\mathcal {S}}\) is complete for every level of significance \(\alpha \in (0,1)\) with respect to \(\sqsubseteq ^{\textit{sa}}_{\textit{ioco}}\) for sufficiently large sample size.
Proof
Without loss of generality, we assume \(P_{{\mathcal {T}}}(\varSigma )>0\). To see why, assume \(P_{{\mathcal {T}}}(\varSigma )=0\). Then, we can find a trace distribution in \(\textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}^*)\) with an underlying scheduler in \(\textit{Sched}({\mathcal {S}})\) that does not assign positive probability to the last action in \(\sigma \), to obtain overall probability zero. This violates the assumption that \(P_{{\mathcal {T}}}(\varSigma )\ne P_{{\mathcal {T}}'}(\varSigma )\) for all \({\mathcal {T}}'\in \textit{trd}({\mathcal {S}})\). We conclude \(\sigma =\sigma ' \!\!\mathbin {.} t\,a\), for some \(\sigma ' \in ({\mathbb {R}}^{+}_{0} \times \textit{Act})^k\), \(a\in \textit{Act}_O\) and \(t\in {\mathbb {R}}^{+}_{0} \). The prefix \(\sigma '\) is in \(\textit{traces}^{\textit{fin}}({\mathcal {S}})\) because it is of length k and since \({\mathcal {T}}^*\in \textit{trd}({\mathcal {S}},k)\). Since \({\mathcal {T}}\) and all \({\mathcal {T}}'\in \textit{outcont}_{{\mathcal {S}}}({\mathcal {T}}^*)\) are continuations of \({\mathcal {T}}^*\), we conclude that \(P_{{\mathcal {T}}^*}(\varSigma ')=P_{{\mathcal {T}}}(\varSigma ')=P_{{\mathcal {T}}'}(\varSigma '),\) i.e. that all trace distributions of the respective sets assign every prefix of \(\sigma \) the same probability by merit of \(\textit{outcont}\). We conclude \(\sigma '\in \textit{traces}^{\textit{fin}}({\mathcal {S}})\), but \(\sigma ' \!\!\mathbin {.} t\,a\notin \textit{traces}^{\textit{fin}}({\mathcal {S}})\).
By assumption, \({\hat{{\mathfrak {T}}}}\) contains all annotated test cases. Let \({\hat{{\mathfrak {t}}}}\in {\hat{{\mathfrak {T}}}}\) such that \(\sigma \in \textit{traces}^{\textit{com}}({\hat{{\mathfrak {t}}}})\). This is possible because \(\sigma '\in \textit{traces}^{\textit{fin}}({\mathcal {S}})\). By Definition 17, \(\textit{ann}_{\textit{saioco}}^{\mathcal {S}}(\sigma )=\textit{fail}\). Recall that the set of clocks in test cases is empty. Since \(\sigma \in \textit{traces}^{\textit{fin}}({\mathcal {I}})\) and \(\sigma \in \textit{traces}^{\textit{com}}({\hat{{\mathfrak {t}}}})\), we consequently also have \(\sigma \in \textit{traces}^{\textit{com}}({\mathcal {I}}\!\parallel \!{\hat{{\mathfrak {t}}}})\), as no guard or restart sets are changed under parallel composition with a test case. Ultimately, this yields \(v_{\textit{func}}({\mathcal {I}},{\hat{{\mathfrak {t}}}})=\textit{fail}\).
5 Implementing stochastic testing
We now present practical procedures to implement the concepts defined in the previous section. First, we propose a goodness-of-fit method in the form of Pearson's \(\chi ^2\) test enriched with confidence interval analysis on the time stamps to evaluate the stochastic behaviour of the observed traces in the IOMA setting. Waiting times recorded in traces are grouped and compared to the prescribed rate parameters in the specification. Some additional assumptions are necessary to enable a clean and efficient framework. Since IOSA are not limited to exponential distributions, we need more powerful ways to infer whether a sample was drawn from a particular distribution. In the IOSA setting, we thus apply the Kolmogorov–Smirnov (KS) test, which can test against general continuous probability distributions, in place of interval estimation. Next, we discuss the interplay of stochastic delays and quiescence. Finally, we summarise the overall stochastic MBT procedure from test case generation to final verdicts.
5.1 Goodness of fit
We need practically applicable methods to decide about the verdicts given by Definition 19. While the functional verdict is determined via test annotations in the same straightforward way as in traditional ioco testing, we also need a procedure to decide the probabilistic verdict. We propose a two-step procedure consisting of Pearson's \(\chi ^2\) hypothesis test for the discrete probabilities, followed by interval estimation (in the IOMA setting) or multiple KS tests (in the IOSA setting) for the time stamps resulting from the stochastic delays.
Our method is based on a theorem known from the literature [8] relating trace distributions to the set of acceptable outcomes. However, neither is readily available to us in the case of a real black-box implementation: only experiments and samples give evidence about its inner workings. Therefore, we pose a null-hypothesis test based on a gathered sample of the implementation. Should the sample turn out to be an acceptable outcome of the specification, too, then we accept the hypothesis that all observations of the implementation are also observations of the specification. In tandem with the theorem by Cheung et al. [8], this would imply an embedding on the set of trace distributions. Consequently, the resulting probabilistic verdict in Definition 19 would be pass.
5.1.1 Pearson’s \({{\varvec{\chi }}}^2\) test
In previous work for pIOTS models [20], we used the \(\chi ^2\) hypothesis test to judge discrete probabilistic behaviour. Its outcome is based on a sample O taken from the implementation under test. Should O prove to be a sample of the set \(\textit{OutObs}({\mathcal {S}},\alpha ,k,m)\) for some \(\alpha \in (0,1)\), we are willing to accept the hypothesis of the embeddings of observations. In the continuous-time stochastic case, we argue along the same lines. However, only applying the \(\chi ^2\) hypothesis test is insufficient, as it does not take into account the delays observed in abstract traces. Nonetheless, passing the \(\chi ^2\) test is a necessary condition for an implementation to be accepted.
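As an illustration of the first step, the Python sketch below fits a scheduler for the simplest case of a single binary nondeterministic choice, as in Example 7, where the 14 traces split into 8 observations of a! and 6 of b!. The scheduler picks the left branch with probability p, so the expected counts are \(14p\) and \(14(1-p)\). Each term \((o-e)^2/e\) is convex in e and e is linear in p, so the statistic is convex in p and a ternary search finds its minimum. The critical value for one degree of freedom uses \(P(X \le x) = \textit{erf}(\sqrt{x/2})\); which degrees of freedom are appropriate after fitting a scheduler is not stated in this section, so df = 1 is an assumption. Function names are hypothetical; in general, the optimisation over schedulers is a multi-parameter problem.

```python
import math
from typing import Sequence, Tuple


def pearson_chi2(observed: Sequence[float], expected: Sequence[float]) -> float:
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fit_binary_scheduler(n_left: int, n_right: int,
                         iterations: int = 200) -> Tuple[float, float]:
    """Scheduler probability p minimising chi^2, found by ternary search."""
    m = n_left + n_right

    def chi2_of(p: float) -> float:
        return pearson_chi2([n_left, n_right], [m * p, m * (1 - p)])

    lo, hi = 1e-9, 1 - 1e-9  # stay away from zero expected counts
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if chi2_of(a) <= chi2_of(b):
            hi = b
        else:
            lo = a
    p = (lo + hi) / 2
    return p, chi2_of(p)


def chi2_critical_df1(alpha: float) -> float:
    """Upper-alpha critical value of chi^2 with one degree of freedom (bisection)."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.erf(math.sqrt(mid / 2)) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For the counts of Example 7, the fitted scheduler is \(p = 8/14\) with a statistic of essentially zero, well below the critical value of about 2.706 for \(\alpha = 0.1\).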
5.1.2 Interval estimation for IOMA
In addition to the \(\chi ^2\) test defined above, we need a metric to decide whether the observed delays correspond to exponential distributions prescribed by the specification in the IOMA setting. For this purpose, we use interval estimation on the parameters of the exponential distributions.
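Eq. 8 itself is not reproduced in this section. Assuming it is the standard exact confidence interval for the rate of an exponential distribution, \(\big [\chi ^2_{\alpha /2,\,2n}/(2S),\ \chi ^2_{1-\alpha /2,\,2n}/(2S)\big ]\), where n is the number of observed delays, S their sum, and \(\chi ^2_{q,\,d}\) the q-quantile of the \(\chi ^2\) distribution with d degrees of freedom, the ratios between the upper and lower bounds reported in Examples 7 and 9 appear consistent with it. Since the degrees of freedom 2n are always even, the \(\chi ^2\) CDF has the closed form \(1 - e^{-x/2}\sum _{j<n} (x/2)^j/j!\), so the Python sketch below needs no statistics library. Function names are hypothetical, and for very large samples a log-space evaluation would be more robust.

```python
import math
from typing import Sequence, Tuple


def chi2_cdf_even(x: float, df: int) -> float:
    """CDF of chi^2 with an even number df = 2n of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of the Poisson sum
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(q: float, df: int) -> float:
    """Invert chi2_cdf_even by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < q:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exp_rate_interval(delays: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Two-sided (1 - alpha) confidence interval for an exponential rate."""
    n, total = len(delays), sum(delays)
    lower = chi2_quantile_even(alpha / 2.0, 2 * n) / (2.0 * total)
    upper = chi2_quantile_even(1.0 - alpha / 2.0, 2 * n) / (2.0 * total)
    return lower, upper
```

A specified rate \(\lambda \) is then accepted if it lies in the returned interval; under multiple comparisons, `alpha` would be the corrected local level.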
Example 7
Figure 8 shows an example specification model alongside an example observation sample from an implementation. State \(s_0\) has two outgoing \(\tau \) transitions, followed by one Markovian transition in each of \(s_1\) and \(s_2\). In states \(s_3\) and \(s_4\), we either observe action a! or b!, respectively. The sample shows 14 recorded traces of length one, thus \(m=14\) and \(k=1\). There are two steps to assess whether the observed data are a truthful sample of the specification model at a level of significance of \(\alpha =0.1\): first find a trace distribution that minimises the \(\chi ^2\) statistic, then evaluate two confidence intervals to assess whether the observed time data are a sample of \(\lambda _1=1\) and \(\lambda _2=0.1\), respectively.
\(t_1=0.03,\ldots ,t_8=2.69\) is the data associated with \(\lambda _1\) and \(t_1'=2.28,\ldots ,t'_6=19.01\) the data associated with \(\lambda _2\). Calculating the confidence intervals according to Eq. 8 yields \(C_1=[0.441,1.458]\) and \(C_2=[0.092,0.368]\). We see that \(\lambda _1\in C_1\) and \(\lambda _2\in C_2\) and are therefore willing to accept that the recorded sample was drawn under the prescribed parameters.
These two steps do not yet make a sound statement about the acceptance of the hypothesis \(O\in \textit{OutObs}({\mathcal {S}},0.1,1,14)\), since we test multiple hypotheses at once. We need to adjust the individual level of significance for the statistical tests to conclude the overall acceptance with \(\alpha =0.1\). This inflation of the error of the first kind is discussed in Sect. 5.1.4.

We must be able to uniquely identify every recorded trace. Assume for illustration that the transition currently labelled b! was labelled a! instead. It would not directly be possible to associate values \(t_i\) with \(\lambda _1\) and \(t_i'\) with \(\lambda _2\); we would need to check all possible permutations. This becomes infeasible in practice even for moderate sample sizes or moderately sized models; we therefore assume all specification models to be internally deterministic, i.e. there must be a bijection between paths and traces.

The sum of exponentially distributed random variables is not exponentially distributed. Hence, confidence interval estimation would be flawed for two sequential Markovian transitions. We would need to deal with phase-type distributions instead, which are dense in the set of all positive-valued distributions. We thus assume models to contain an input or output between any two Markovian transitions.
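A short calculation illustrates the first claim for the special case of two independent delays \(X_1, X_2\) with the same rate \(\lambda \):$$\begin{aligned} {\mathbb {E}}[X_1+X_2] &= \frac{1}{\lambda }+\frac{1}{\lambda } = \frac{2}{\lambda }, \\ \textit{Var}[X_1+X_2] &= \frac{1}{\lambda ^2}+\frac{1}{\lambda ^2} = \frac{2}{\lambda ^2}, \end{aligned}$$whereas an exponential distribution with the same mean \(2/\lambda \) has rate \(\lambda /2\) and variance \(4/\lambda ^2\). The sum is Erlang-distributed, which is a phase-type distribution, so fitting a single exponential rate to the summed delays would misjudge their spread.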
5.1.3 Kolmogorov–Smirnov tests for IOSA
Working with IOSA means that specifications and implementations are not limited to the exponential distribution. Since they comprise neither one specific distribution nor one specific parameter to test for, we use the non-parametric KS test to validate that the observed delays were drawn from the specified clocks and distributions. The KS test assesses whether observed data match a hypothesised continuous probability distribution. We thus restrict the practical application of our approach to IOSA where \(F(c)\) is a continuous distribution for every clock c.
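A minimal pure-Python sketch of the one-sample KS test follows, with the hypothesised CDF passed in as a function. It computes \(D_n = \sup _t |F_n(t) - F(t)|\) from the sorted sample, using the fact that the supremum is attained at a sample point, just before or after the step of the empirical CDF \(F_n\). The critical value uses the asymptotic approximation \(\sqrt{-\ln (\alpha /2)/2}/\sqrt{n}\), which is only reliable for larger n; exact tables or a library routine such as SciPy's `scipy.stats.kstest` would be preferable in practice. Function names are hypothetical; the uniform CDF mirrors the Uni\([0,2]\) and Uni\([0,3]\) clocks of Example 8.

```python
import math
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D_n = sup |F_n(t) - F(t)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # compare F with the empirical CDF just after and just before the step at x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical_asymptotic(alpha: float, n: int) -> float:
    """Large-sample approximation of the KS critical value at level alpha."""
    return math.sqrt(-math.log(alpha / 2.0) / 2.0) / math.sqrt(n)


def uniform_cdf(a: float, b: float) -> Callable[[float], float]:
    return lambda t: min(max((t - a) / (b - a), 0.0), 1.0)
```

The hypothesis that the delays of clock x follow Uni\([0,2]\) would be rejected if `ks_statistic(delays_x, uniform_cdf(0, 2))` exceeds the critical value for the (corrected) level of significance.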
Example 8
The left-hand side of Fig. 9 shows a tiny example specification IOSA with clocks x and y. The expiration times of both are uniformly distributed with different parameters. In \(\ell _0\) there is a nondeterministic choice to take either the left or the right branch. The right-hand side depicts a sample from this IOSA. There are two steps to assess whether the observed data are a truthful sample of the specification at a level of significance of \(\alpha =0.05\): first find a trace distribution that minimises the \(\chi ^2\) statistic, and then evaluate two KS tests to assess whether the observed time data are a truthful sample of Uni\(\left[ 0,2\right] \) and Uni\(\left[ 0,3\right] \), respectively.
Our intention is to provide a general and universally applicable procedure. The KS test is conservative for general distributions, but can be made precise [10]. Specialised and thus more efficient tests exist for specific distributions, e.g. the Lilliefors test [29] for Gaussian distributions, and parametric tests are generally preferred due to their higher power at equal sample size. The KS test requires a comparatively large sample size; an alternative is, e.g., the Anderson–Darling test [29].
Remark 3
Combining two non-parametric tests is much more difficult in the presence of internal nondeterminism in a specification, cf. Example 8 with only a! on both visible edges. Time values can no longer be unambiguously attributed to unique distributions, and no confidence bound for the measured time data can be given. In this case, the scheduler probability decisions p are used as parameters for mixture distributions, e.g. \(F(p) = p\cdot F_x + (1-p)\cdot F_y\) in Fig. 9. The parameterised distribution can then be used in the iterative expectation–maximisation algorithm [38], and confidence can be given upon convergence.
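To make the mixture idea concrete, the Python sketch below estimates only the scheduler probability p of \(F(p) = p\cdot F_x + (1-p)\cdot F_y\), assuming both component densities are known. This is the textbook EM update for a mixing weight, not necessarily the exact procedure of [38]: the E-step computes the responsibility \(r_i = p f_x(t_i) / (p f_x(t_i) + (1-p) f_y(t_i))\) of component x for each delay, and the M-step sets p to the mean responsibility. Function names are hypothetical; the densities mirror Uni\([0,2]\) and Uni\([0,3]\) of Fig. 9.

```python
from typing import Callable, Sequence

Density = Callable[[float], float]


def uniform_pdf(a: float, b: float) -> Density:
    return lambda t: 1.0 / (b - a) if a <= t <= b else 0.0


def em_mixture_weight(delays: Sequence[float], f_x: Density, f_y: Density,
                      p: float = 0.5, iterations: int = 200) -> float:
    """EM estimate of p in the mixture p * f_x + (1 - p) * f_y."""
    for _ in range(iterations):
        responsibilities = []
        for t in delays:
            wx, wy = p * f_x(t), (1.0 - p) * f_y(t)
            if wx + wy == 0.0:
                raise ValueError(f"delay {t} has zero density under the mixture")
            responsibilities.append(wx / (wx + wy))  # E-step
        p = sum(responsibilities) / len(responsibilities)  # M-step
    return p
```

Delays above 2 can only stem from clock y and push the estimate towards 0, while delays that are equally plausible under both clocks shift it according to the ratio of the two densities.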
For the sake of simplicity, we assume that the specification is internally deterministic, i.e. there are no two paths that result in the same trace. While this decreases the space of potential specifications, we deem it a necessary compromise to come up with a feasible and general method.
5.1.4 Multiple comparisons
Since the \(\chi ^2\) test and all subsequent confidence interval estimations or KS tests are statistical hypothesis tests in their own right, their errors accumulate. To illustrate: if a hypothesis test is performed at \(\alpha =0.05\), there is a 5% chance of committing an error of the first kind, i.e. of erroneously rejecting a true hypothesis. If we apply 100 individual tests with \(\alpha =0.05\), we expect to commit this error 5 times on average. If we assume the tests to be independent, the probability of committing at least one error of the first kind is \(1-(1-0.05)^{100}\approx 99.4\%\).
There are several techniques to cope with the inflation of the error of first kind. For the remainder of this section, we use Bonferroni correction: \( \alpha _{\textit{local}}=\alpha _{\textit{global}}/{l} \) where l is the total number of statistical hypothesis tests to be performed.
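The two formulas can be checked with a few lines of Python. The sketch below is a minimal illustration with hypothetical function names: it computes the probability of at least one false rejection among l independent tests and the Bonferroni-corrected local level, reproducing the 99.4% above and the \(\alpha _{\textit{local}}\approx 0.033\) of Example 9.

```python
def familywise_error(alpha: float, l: int) -> float:
    """Probability of at least one false rejection among l independent tests."""
    return 1.0 - (1.0 - alpha) ** l


def bonferroni(alpha_global: float, l: int) -> float:
    """Local level of significance for each of l tests."""
    return alpha_global / l


print(round(familywise_error(0.05, 100), 3))  # about 0.994
print(round(bonferroni(0.1, 3), 4))           # about 0.0333
```

Bonferroni correction does not require independence; under independence the resulting familywise error, here \(1-(1-0.1/3)^3\approx 0.097\), stays just below the global level.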
Example 9
We return to Example 7. Applying Bonferroni correction for a total of three hypothesis tests with desired \(\alpha = \alpha _\textit{global} = 0.1\) yields a necessary \(\alpha _{\textit{local}}\approx 0.033\). This applies to the \(\chi ^2\) test and the two interval estimations. The \(\chi ^2\) test still passes, and the new confidence intervals are \(C'_1=[0.353,1.677]\) and \(C'_2=[0.070,0.432]\). We see that \(\lambda _1\in C_1'\) and \(\lambda _2\in C_2'\) still hold, so we give the implementation the probabilistic pass verdict.
5.2 Stochastic delays and quiescence
A test case needs to assess if an implementation is allowed to be unresponsive when output was expected [45]. In our formalism, quiescence \(\delta \) models the absence of output for an indefinite time. It should be regarded with caution in practical testing scenarios. A common way to deal with quiescence is a global fixed timeout value set by a user [2, 5]. The time progress in IOMA and IOSA is governed by continuous probability distributions; hence, a global timeout has two disadvantages: first, a timeout might occur before a specified Markovian transition or edge takes place. The average waiting time of this event might be substantially higher than the global timeout. Second, a global timeout might unnecessarily prolong the overall test process.
A timeout can be seen as a delay that follows a Dirac distribution. While this naturally fits into the framework of stochastic automata, it is incompatible with the IOMA approach: Dirac delays cannot be represented in IOMA, and consequently, they were not considered in the statistical evaluation that we developed in Sect. 5.1.2. We now detail an approach for IOMA that avoids Dirac distributions and aims to minimise the probability of erroneously declaring quiescence while keeping the overall testing time as low as possible. Although IOSA do support Dirac distributions, similar ideas apply to IOSA, too.
In order to avoid Dirac distributions, an MBT tool for IOMA needs to implement quiescence by racing an exponentially distributed delay with rate \(\mu _\delta \) against the implementation; this quiescence timer winning the race is then treated as the quiescence output \(\delta \). Let \(\lambda >0\) be the minimum exit rate over all Markovian states. With level of significance \(\alpha \in \; ]0,1[\), we would like the probability that the quiescence timer expires before a Markovian transition is executed, i.e. that we incorrectly report quiescence when the implementation could make progress, to be at most \(\alpha \). Choosing \(\mu _\delta = \lambda \cdot \frac{\alpha }{1 - \alpha }\) as the quiescence timer's rate achieves this probability with the shortest waiting time in case of actual quiescence. We can further reduce the waiting time by using a different rate in every state: if the exit rate of state s is \(\lambda _s\), we use rate \(\mu _\delta ^s = \lambda _s \cdot \frac{\alpha }{1 - \alpha }\) to judge quiescence in s.
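The choice of \(\mu _\delta \) follows from the race between two exponential delays: the quiescence timer fires first with probability \(\mu _\delta /(\lambda + \mu _\delta )\), and setting this equal to \(\alpha \) gives \(\mu _\delta = \lambda \alpha /(1-\alpha )\). Any larger rate would exceed the bound, so this is the fastest admissible timer, with mean waiting time \(1/\mu _\delta \). The Python sketch below, with hypothetical function names, checks this; the exit rates 10 and 1 in the usage are inferred from the timer rates given for \(s_1\) and \(s_2\) in Example 10.

```python
def quiescence_rate(exit_rate: float, alpha: float) -> float:
    """Fastest quiescence timer rate whose false-quiescence probability is alpha."""
    return exit_rate * alpha / (1.0 - alpha)


def false_quiescence_probability(exit_rate: float, mu: float) -> float:
    """Probability that an Exp(mu) timer beats a Markovian delay with the given exit rate."""
    return mu / (exit_rate + mu)


mu_s1 = quiescence_rate(10.0, 0.05)  # about 0.526
mu_s2 = quiescence_rate(1.0, 0.05)   # about 0.053
print(false_quiescence_probability(1.0, mu_s2))  # about 0.05
print(false_quiescence_probability(1.0, mu_s1))  # about 0.345: short timer in s2
```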
Example 10
Figure 10 (top) shows a simple specification of a file transmission protocol. Exponential distributions model the delay between sending a file and acknowledging its reception. Different delays are associated with sending small or large files, respectively. After a file has been sent, there is a chance that it gets lost, and we do not receive an acknowledgement. In this case, the system is judged as quiescent, and therefore erroneous.
However, since \(\lambda _2\ll \lambda _1\), a test should use a quiescence timer rate of \(\mu _\delta ^{s_1} = 10 \cdot \frac{\alpha }{1 - \alpha }\) in \(s_1\) and \(\mu _\delta ^{s_2} = \frac{\alpha }{1 - \alpha }\) in \(s_2\) to minimise the probability to erroneously judge the system as quiescent while also keeping the global testing time as low as possible. Regardless, for sufficiently large sample size, an MBT tool will eventually observe quiescence erroneously. Figure 10 (bottom) therefore allows some amount of quiescence observations depending on \(\alpha \), i.e. on how many erroneous quiescence judgements we are willing to accept.
Example 11
 Long global:
A sensible long global quiescence timer rate is \(\mu _\delta = \mu _\delta ^{s_2} \approx 0.053\). Executing 100 test cases yields a worst-case expected waiting time (for the case where the implementation is always quiescent) of \(100/\mu _\delta ^{s_2} = 1900\) time units. In return, we are guaranteed to incorrectly judge the implementation quiescent in at most \(5\,\%\) of all cases, and in state \(s_1\) even far less often.
 Short global:
A sensible short global quiescence timer rate is \(\mu _\delta = \mu _\delta ^{s_1} \approx 0.526\). The worst-case expected waiting time is now only 190 time units. However, the probability that the Markovian transition with rate \(\lambda _2\) does not fire before the quiescence timer rises to \(\approx 34\,\%\). In these cases we would incorrectly judge the implementation quiescent even though the transition might still take place.
 Individual:
Using the long rate in state \(s_2\) and the short one in state \(s_1\) guarantees that we erroneously judge quiescence in at most \(5\,\%\) of the cases overall. Note that this is accounted for in the specification in Fig. 10 (bottom). The worst-case expected waiting time now depends on the probability p of sending a small file instead of a large one; it is \(p \cdot 190 + (1 - p) \cdot 1900\) time units. Compared to the long global rate, time is thus saved whenever a small file is sent.
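The figures of Example 11 can be recomputed directly. The Python sketch below assumes \(\alpha = 0.05\) and exit rates 10 for \(s_1\) and 1 for \(s_2\); these values are inferred from the timer rates in Example 10 and the approximations 0.526 and 0.053, not read off Fig. 10, and all names are illustrative.

```python
ALPHA = 0.05
# Exit rates of s1 (small file) and s2 (large file): inferred (assumption) from
# the timer rates 10 * alpha / (1 - alpha) and alpha / (1 - alpha) in Example 10.
EXIT_RATE_S1, EXIT_RATE_S2 = 10.0, 1.0
NUM_TEST_CASES = 100


def timer_rate(exit_rate: float, alpha: float = ALPHA) -> float:
    return exit_rate * alpha / (1 - alpha)


def timer_wins(exit_rate: float, mu: float) -> float:
    """Probability that the quiescence timer fires before the Markovian transition."""
    return mu / (exit_rate + mu)


mu_long = timer_rate(EXIT_RATE_S2)   # ~0.053
mu_short = timer_rate(EXIT_RATE_S1)  # ~0.526

# Worst case: the implementation is always quiescent, so every test case
# waits 1/mu time units on average for the timer to expire.
wait_long = NUM_TEST_CASES / mu_long
wait_short = NUM_TEST_CASES / mu_short

# Probability of a false quiescence verdict in the slow state s2.
error_long_s2 = timer_wins(EXIT_RATE_S2, mu_long)
error_short_s2 = timer_wins(EXIT_RATE_S2, mu_short)


def wait_individual(p_small: float) -> float:
    """Worst-case expected waiting time with per-state timer rates, where
    p_small is the probability of sending a small file (state s1)."""
    return p_small * wait_short + (1 - p_small) * wait_long


if __name__ == "__main__":
    print(f"long global:  wait {wait_long:.0f}, error in s2 {error_long_s2:.1%}")
    print(f"short global: wait {wait_short:.0f}, error in s2 {error_short_s2:.1%}")
    print(f"individual (p = 0.5): wait {wait_individual(0.5):.0f}")
```

The per-state strategy keeps the error bound of the long global rate while paying the long waiting time only when a large file is sent.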
5.3 Stochastic test procedure outline
1. Generate an annotated test case (suite) for the specification automaton.
2. Execute the test case (all test cases of the test suite) m times. If the functional \(\textit{fail}\) verdict is encountered in any of the m executions, fail the implementation for functional reasons.
3. Calculate the number of necessary statistical hypothesis tests for each test case. Correct \(\alpha \) accordingly, yielding the new parameter \({\bar{\alpha }}\).
4. Perform statistical analysis on the gathered sample of size m for the test case (all test cases) with the corrected parameter \({\bar{\alpha }}\).
 (a) Use optimisation or constraint solving to find a scheduler such that \(\chi ^2\le \chi ^2_{\textit{crit}}\). If no such scheduler is found, reject the implementation for statistical reasons.
 (\(\hbox {b}_1\)) For IOMA, perform confidence interval estimation, and check whether all Markovian parameters are contained in their respective intervals. If at least one parameter is not contained in its confidence interval, reject the implementation for statistical reasons.
 (\(\hbox {b}_2\)) For IOSA, group all time stamps assigned to the same clock and perform a KS test for each clock. If any of these tests fails, reject the implementation \({\mathcal {I}}\) for statistical reasons.
5. Accept the implementation.
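Steps 2 to 5 of this outline can be sketched as a small driver that combines the functional and statistical verdicts. The sketch below is hypothetical: test generation (step 1) is omitted, each statistical check is passed in as a function, and the Bonferroni correction \({\bar{\alpha }} = \alpha /k\) is assumed as one common way to "correct \(\alpha \)"; the outline itself does not prescribe a particular correction.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A statistical check receives the gathered sample and the corrected
# significance level and returns True if its hypothesis is not rejected.
StatisticalCheck = Callable[[Sequence[float], float], bool]


@dataclass(frozen=True)
class Execution:
    functional_fail: bool  # did this run end in the functional fail verdict?
    observation: float     # e.g. an observed delay (hypothetical payload)


def corrected_alpha(alpha: float, num_tests: int) -> float:
    """Bonferroni correction (an assumption): split alpha over all tests."""
    return alpha / max(num_tests, 1)


def stochastic_verdict(executions: List[Execution],
                       checks: List[StatisticalCheck],
                       alpha: float) -> str:
    # Step 2: any functional failure among the m executions is decisive.
    if any(e.functional_fail for e in executions):
        return "fail (functional)"
    # Step 3: correct alpha for the number of hypothesis tests.
    alpha_bar = corrected_alpha(alpha, len(checks))
    sample = [e.observation for e in executions]
    # Step 4: every statistical check (chi-square, confidence interval or
    # KS test) must accept the sample at the corrected level.
    for check in checks:
        if not check(sample, alpha_bar):
            return "fail (statistical)"
    # Step 5: accept the implementation.
    return "pass"
```

Passing the checks as functions keeps the driver independent of whether the IOMA (\(\hbox {b}_1\)) or IOSA (\(\hbox {b}_2\)) evaluation is used.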
6 A Bluetooth device discovery example
Bluetooth is a wireless communication standard [3] aimed at low-powered devices that communicate over short distances. Before any communication can take place, Bluetooth devices organise into small networks of one master and up to seven slave devices. To cope with interference, the device discovery protocol that sets up these networks uses a frequency hopping scheme.
To illustrate and compare our frameworks for IOMA and IOSA, we study the discovery phase for one master and one slave device. The device discovery protocol is inherently stochastic due to the initially random and unsynchronised state of the devices. We give a high-level overview of the protocol here and refer the interested reader to a verification case study performed with PRISM [16] for a more detailed description and a formal analysis in a more general setting.
6.1 Device discovery protocol
To resolve possible interference, the master and slave devices communicate via a prescribed sequence of 32 frequencies. Both devices have a 28-bit clock that ticks every 312.5 \(\upmu \hbox {s}\).
The slave device periodically scans the 32 frequencies, alternating between a sleeping and a listening state. To ensure that a connection is eventually established, the slave device hops between frequencies much more slowly than the master device. The Bluetooth standard leaves some flexibility with respect to the length of the listening period. For our study, the slave listens on one frequency for 11.25 ms every 0.64 s and sleeps during the remaining time. It cycles to the next frequency after 1.28 s, which is enough time for the master device to broadcast on 16 different frequencies.
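Expressed in ticks of the 312.5 \(\upmu \hbox {s}\) clock, the slave's timing parameters of this study are whole numbers, which makes them straightforward to encode in a discrete model. The short Python sketch below only performs this arithmetic; the constant names are illustrative.

```python
TICK = 312.5e-6      # seconds per tick of the 28-bit Bluetooth clock
LISTEN = 11.25e-3    # listening window of the slave (this study's choice)
SCAN_PERIOD = 0.64   # the slave listens once every 0.64 s
FREQ_PERIOD = 1.28   # the slave moves to its next frequency after 1.28 s


def to_ticks(seconds: float) -> int:
    """Convert a duration to clock ticks, checking that it is a whole number."""
    ticks = seconds / TICK
    if abs(ticks - round(ticks)) > 1e-6:
        raise ValueError(f"{seconds} s is not a whole number of ticks")
    return round(ticks)


# Fraction of time the slave spends listening.
duty_cycle = LISTEN / SCAN_PERIOD

if __name__ == "__main__":
    print(to_ticks(LISTEN), to_ticks(SCAN_PERIOD), to_ticks(FREQ_PERIOD))
    print(f"duty cycle {duty_cycle:.2%}")
```

The slave thus listens for 36 out of every 2048 ticks, i.e. for less than 2 % of the time.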
6.2 Specification models

Synchronisation happens during the master device's first 16 broadcast frequencies (0 to 1.28 s, 16 frequencies).

Synchronisation happens after the first frequency swap of the master device (1.28 to 2.56 s, one frequency).

Synchronisation happens after the first switch of tracks and two frequency swaps of the master device (2.56 to 3.84 s, 14 frequencies).

Synchronisation happens after the first switch of tracks and three frequency swaps of the master device (3.84 to 5.12 s, one frequency).
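The four cases cover 16 + 1 + 14 + 1 = 32 frequencies, i.e. together they account for all frequencies the slave may listen on. Under the additional assumption, not stated above, that the slave's initial frequency is uniformly distributed, each case occurs with probability proportional to the number of frequencies it covers. The Python sketch below tabulates these probabilities; it ignores the timing within each window and is not the specification model of Fig. 11.

```python
from fractions import Fraction
from itertools import accumulate

# The four synchronisation cases listed above:
# (window start [s], window end [s], number of master frequencies covered)
WINDOWS = [(0.00, 1.28, 16), (1.28, 2.56, 1), (2.56, 3.84, 14), (3.84, 5.12, 1)]
NUM_FREQUENCIES = 32


def window_probabilities(windows=WINDOWS, total=NUM_FREQUENCIES):
    """Probability of synchronising in each window, ASSUMING the slave's
    listening frequency is uniformly distributed over all frequencies."""
    return [Fraction(covered, total) for _, _, covered in windows]


def probability_connected_by_window_end(windows=WINDOWS, total=NUM_FREQUENCIES):
    """Cumulative probability of having synchronised by the end of each window."""
    probabilities = window_probabilities(windows, total)
    return {end: p for (_, end, _), p in zip(windows, accumulate(probabilities))}


if __name__ == "__main__":
    for end, p in probability_connected_by_window_end().items():
        print(f"P(connected by {end:.2f} s) = {p}")
```

Under this assumption, half of all runs synchronise within the first 1.28 s and all runs within 5.12 s.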
6.3 Experimental setup
 \({\mathcal {M}_1}\)

The first master mutant never switches between tracks one and two, and therefore covers far fewer different frequencies than the correct protocol in the same time. It needs a total of \(16 \cdot 1.28\,{\mathrm {s}} = 20.48\,{\mathrm {s}}\) to cover all 32 frequencies. Hence, we expect a much longer time to connect than for the correct implementation.
 \({\mathcal {M}_2}\)

The second master mutant never swaps frequencies and only switches between tracks one and two. The expected time to connect will therefore be around 2.56 s.
 \({\mathcal {S}_1}\)

The slave mutant has its listening period halved, and thus only listens for 5.625 ms every 1.28 s. It therefore has a longer sleeping period, and we expect the probability of connecting to be slightly reduced compared to the correct counterpart.
6.4 Results
Connection time confidence intervals (IOMA)
| Configuration | Quantity | \({\mathcal {M}} \!\parallel \! {\mathcal {S}}\) (correct) | \({\mathcal {M}}_1 \!\parallel \! {\mathcal {S}}\) (mutant) | \({\mathcal {M}}_2 \!\parallel \! {\mathcal {S}}\) (mutant) | \({\mathcal {M}} \!\parallel \! {\mathcal {S}}_1\) (mutant) |
| \(k=2\), \(m=100\) | Verdict | \(\textit{pass}\) | \(\textit{fail}\) | \(\textit{pass}\) | \(\textit{pass}\) |
| | Confidence interval | [0.586, 0.868] | – | [0.597, 0.885] | [0.673, 0.997] |
| | Timeouts | 0 | 33 | 0 | 0 |
| \(k=2\), \(m=1000\) | Verdict | \(\textit{pass}\) | \(\textit{fail}\) | \(\textit{fail}\) | \(\textit{fail}\) |
| | Confidence interval | [0.729, 0.826] | – | [0.767, 0.868] | [0.756, 0.855] |
| | Timeouts | 0 | 376 | 0 | 0 |
| \(k=2\), \(m=10{,}000\) | Verdict | \(\textit{pass}\) | \(\textit{fail}\) | \(\textit{fail}\) | \(\textit{fail}\) |
| | Confidence interval | [0.735, 0.764] | – | [0.772, 0.803] | [0.757, 0.787] |
| | Timeouts | 0 | 3753 | 0 | 0 |
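Step (\(\hbox {b}_1\)) of the test procedure compares each Markovian parameter with a confidence interval estimated from the sample. The Python sketch below shows one way to do this for a single exponentially distributed delay: it uses the maximum-likelihood estimate \(\hat{\lambda } = m/\sum _i x_i\) and the large-sample normal approximation \(\hat{\lambda }(1 \pm z/\sqrt{m})\). This construction is an assumption for illustration; the text does not state which interval estimator produced the table above, and an exact interval would use chi-square quantiles instead.

```python
import math
import random
from statistics import NormalDist
from typing import Sequence, Tuple


def rate_confidence_interval(delays: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Approximate (1 - alpha) confidence interval for the rate of an
    exponential distribution, based on the observed delays."""
    m = len(delays)
    total = sum(delays)
    if m == 0 or total <= 0:
        raise ValueError("need at least one positive delay")
    rate_hat = m / total                      # maximum-likelihood estimate
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided normal quantile
    half_width = z * rate_hat / math.sqrt(m)  # standard error of rate_hat ~ rate / sqrt(m)
    return rate_hat - half_width, rate_hat + half_width


def markovian_parameter_accepted(spec_rate: float, delays: Sequence[float], alpha: float) -> bool:
    """Accept iff the specified rate lies inside its confidence interval."""
    low, high = rate_confidence_interval(delays, alpha)
    return low <= spec_rate <= high


if __name__ == "__main__":
    random.seed(1)
    # Synthetic data with a hypothetical rate of 0.75, not the case study's samples.
    sample = [random.expovariate(0.75) for _ in range(10_000)]
    print(rate_confidence_interval(sample, 0.05))
```

Since the half-width shrinks with \(1/\sqrt{m}\), a tenfold larger sample narrows the interval by a factor of about 3.2, which is why mutants that pass at \(m=100\) can be rejected at larger sample sizes.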
IOSA. We used MATLAB’s kstest2 function to perform a two-sample KS test comparing the samples with the specified time distribution. Table 2 shows the verdicts and the observed KS statistics \(K_m\) alongside the corresponding critical values \(K_{\textit{crit}}\) for our experiments. The statistical verdict \(\textit{pass}\) was given if \(K_m<K_{\textit{crit}}\), and \(\textit{fail}\) otherwise. The critical values depend on \(\alpha \) and m. The correct implementation was accepted in all three experiments. During the sampling of \({\mathcal {M}}_1\!\parallel \!{\mathcal {S}}\), we again observed several timeouts leading to a functional \(\textit{fail}\) verdict; it would also have failed the KS test in all three experiments. \({\mathcal {M}}_2\!\parallel \!{\mathcal {S}}\) passed the test for \(m=100\), but was rejected with increased sample size. \({\mathcal {M}}\!\parallel \!{\mathcal {S}}_1\) is the most subtle of the three mutants and was only rejected with \(m=10{,}000\), by a narrow margin.
Verdicts and KS test results (IOSA)
| Configuration | Quantity | \({\mathcal {M}} \!\parallel \! {\mathcal {S}}\) (correct) | \({\mathcal {M}}_1 \!\parallel \! {\mathcal {S}}\) (mutant) | \({\mathcal {M}}_2 \!\parallel \! {\mathcal {S}}\) (mutant) | \({\mathcal {M}} \!\parallel \! {\mathcal {S}}_1\) (mutant) |
| \(k=2\), \(m=100\) | Verdict | \(\textit{pass}\) | \(\textit{fail}\) | \(\textit{pass}\) | \(\textit{pass}\) |
| | \(K_m\) | 0.065 | – | 0.110 | 0.065 |
| | \(K_{\textit{crit}}\) | 0.136 | – | 0.136 | 0.136 |
| | Timeouts | 0 | 40 | 0 | 0 |
| \(k=2\), \(m=1000\) | Verdict | \(\textit{pass}\) | \(\textit{fail}\) | \(\textit{fail}\) | \(\textit{pass}\) |
| | \(K_m\) | 0.028 | – | 0.050 | 0.020 |
| | \(K_{\textit{crit}}\) | 0.045 | – | 0.045 | 0.045 |
| | Timeouts | 0 | 399 | 0 | 0 |
| \(k=2\), \(m=10{,}000\) | Verdict | \(\textit{pass}\) | \(\textit{fail}\) | \(\textit{fail}\) | \(\textit{fail}\) |
| | \(K_m\) | 0.006 | – | 0.043 | 0.0193 |
| | \(K_{\textit{crit}}\) | 0.019 | – | 0.019 | 0.0192 |
| | Timeouts | 0 | 3726 | 0 | 0 |
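The KS verdicts above were computed with MATLAB’s kstest2. As a tool-independent illustration, the Python sketch below computes the two-sample KS statistic from the empirical distribution functions and compares it with the standard large-sample critical value \(c(\alpha )\sqrt{(n+m)/(nm)}\), where \(c(\alpha ) = \sqrt{-\ln (\alpha /2)/2}\). This asymptotic formula is an assumption of the sketch: kstest2 may compute exact or differently approximated p-values, so its critical values need not match the table. In Python, scipy.stats.ks_2samp offers a library implementation of the same test.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest vertical distance between the
    empirical CDFs of xs and ys. Both step functions only jump at data
    points, so it suffices to evaluate them there."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    distance = 0.0
    for v in xs + ys:
        f_x = bisect.bisect_right(xs, v) / n  # fraction of xs <= v
        f_y = bisect.bisect_right(ys, v) / m  # fraction of ys <= v
        distance = max(distance, abs(f_x - f_y))
    return distance


def ks_critical_value(alpha: float, n: int, m: int) -> float:
    """Large-sample critical value of the two-sample KS test at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def ks_verdict(observed: Sequence[float], reference: Sequence[float], alpha: float) -> str:
    """pass if K_m < K_crit, fail otherwise, as in the text."""
    k_m = ks_statistic(observed, reference)
    k_crit = ks_critical_value(alpha, len(observed), len(reference))
    return "pass" if k_m < k_crit else "fail"
```

Because the critical value shrinks with \(1/\sqrt{m}\) while the statistic of a faulty implementation converges to a positive distance, larger samples eventually reject any fixed deviation, which matches the trend in the table.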
In the IOSA setting, observe that for \({\mathcal {M}}_2\!\parallel \!{\mathcal {S}}\) and \({\mathcal {M}}\!\parallel \!{\mathcal {S}}_1\), the critical value decreases faster with growing sample size than the observed KS statistic. We conjecture that an even larger sample would yield a clearer verdict, in line with the decreasing error of the second kind for increasing sample size pointed out in Sect. 4. This is especially desirable in the case of \({\mathcal {M}}\!\parallel \!{\mathcal {S}}_1\), where a sample of size \(m=10{,}000\) was needed to refute the faulty implementation. In the IOMA setting, by contrast, \(m = 1000\) sufficed, which highlights that the statistical evaluation for IOMA is in general more efficient (it needs fewer samples for clearer verdicts) than the one for IOSA. We point out that alternatives to the very compact specification given in Fig. 11 are possible. For instance, the entire specification could comprise a probabilistic branching over 32 locations with deterministic guard sets according to the step values of the distribution of the Bluetooth specification. This illustrates the flexibility of the modelling capabilities in the IOSA test framework, and shows that there is no unique best model.
Overall, there is a trade-off between expressivity and efficiency when comparing the test theories for Markov automata and stochastic automata in practical applications.
7 Conclusion
We presented two closely related sound and complete MBT frameworks to test probabilistic systems with stochastic delays. The underlying modelling formalisms are Markov automata and stochastic automata with a separation of their alphabet into inputs and outputs: IOMA and IOSA. The former restrict delays to exponential distributions, but mark a relevant intermediate step between previous work on testing untimed probabilistic models [20] and the full generality (and complexity) of stochastic automata. In particular, the statistical evaluation of test results is far simpler and more efficient for IOMA. On the other hand, our Bluetooth case study shows that being able to represent arbitrary distributions over time directly, as in IOSA, may lead to specifications that match reality much more closely and provide results that are more precise and understandable.
References
1. Baier C, Katoen J-P (2008) Principles of model checking. MIT Press, Cambridge
2. Belinfante A (2014) JTorX: exploring model-based testing. Ph.D. thesis, University of Twente, Enschede, The Netherlands. http://purl.utwente.nl/publications/91781
3. Bluetooth SIG (2003) Bluetooth specification, version 1.2. www.bluetooth.com
4. Bohnenkamp HC, Belinfante A (2005) Timed testing with TorX. In: Formal methods: international symposium of Formal Methods Europe (FM). Lecture notes in computer science, vol 3582. Springer, pp 173–188. https://doi.org/10.1007/11526841_13
5. Briones LB, Brinksma E (2004) A test generation framework for quiescent real-time systems. In: 4th international workshop on formal approaches to software testing (FATES). Lecture notes in computer science, vol 3395. Springer, pp 64–78. https://doi.org/10.1007/978-3-540-31848-4_5
6. Budde CE, D’Argenio PR, Hartmanns A, Sedwards S (2018) A statistical model checker for nondeterminism and rare events. In: 24th international conference on tools and algorithms for the construction and analysis of systems (TACAS). Lecture notes in computer science, vol 10806. Springer, pp 340–358. https://doi.org/10.1007/978-3-319-89963-3_20
7. Cheung L, Lynch NA, Segala R, Vaandrager FW (2006) Switched PIOA: parallel composition via distributed scheduling. Theor Comput Sci 365(1–2):83–108. https://doi.org/10.1016/j.tcs.2006.07.033
8. Cheung L, Stoelinga M, Vaandrager FW (2007) A testing scenario for probabilistic processes. J ACM 54(6):29. https://doi.org/10.1145/1314690.1314693
9. Cleaveland R, Dayar Z, Smolka SA, Yuen S (1999) Testing preorders for probabilistic processes. Inf Comput 154(2):93–148. https://doi.org/10.1006/inco.1999.2808
10. Conover WJ (1972) A Kolmogorov goodness-of-fit test for discontinuous distributions. J Am Stat Assoc 67(339):591–596
11. D’Argenio PR, Katoen J-P (2005) A theory of stochastic systems part I: stochastic automata. Inf Comput 203(1):1–38. https://doi.org/10.1016/j.ic.2005.07.001
12. D’Argenio PR, Lee MD, Monti RE (2016) Input/output stochastic automata—compositionality and determinism. In: 14th international conference on formal modeling and analysis of timed systems (FORMATS). Lecture notes in computer science, vol 9884. Springer, pp 53–68. https://doi.org/10.1007/978-3-319-44878-7_4
13. Dehnert C, Junges S, Katoen J-P, Volk M (2017) A Storm is coming: A modern probabilistic model checker. In: 29th international conference on computer aided verification (CAV). Lecture notes in computer science, vol 10427. Springer, pp 592–600. https://doi.org/10.1007/978-3-319-63390-9_31
14. Deng Y, van Glabbeek RJ, Hennessy M, Morgan C (2008) Characterising testing preorders for finite probabilistic processes. Log Methods Comput Sci 4(4):4. https://doi.org/10.2168/LMCS-4(4:4)2008
15. Deng Y, Hennessy M (2013) On the semantics of Markov automata. Inf Comput 222:139–168. https://doi.org/10.1016/j.ic.2012.10.010
16. Duflot M, Kwiatkowska MZ, Norman G, Parker D (2006) A formal analysis of Bluetooth device discovery. STTT 8(6):621–632. https://doi.org/10.1007/s10009-006-0014-x
17. Eisentraut C, Hermanns H, Zhang L (2010) On probabilistic automata in continuous time. In: 25th annual IEEE symposium on logic in computer science (LICS). IEEE Computer Society, pp 342–351. https://doi.org/10.1109/LICS.2010.41
18. Gerhold M (2018) Choice and chance—model-based testing of stochastic behaviour. Ph.D. thesis, University of Twente, Enschede, The Netherlands. https://doi.org/10.3990/1.9789036546959
19. Gerhold M, Hartmanns A, Stoelinga M (2018) Model-based testing for general stochastic time. In: 10th international NASA formal methods symposium (NFM). Lecture notes in computer science, vol 10811. Springer, pp 203–219. https://doi.org/10.1007/978-3-319-77935-5_15
20. Gerhold M, Stoelinga M (2016) Model-based testing of probabilistic systems. In: 19th international conference on fundamental approaches to software engineering (FASE). Lecture notes in computer science, vol 9633. Springer, pp 251–268. https://doi.org/10.1007/978-3-662-49665-7_15
21. Gerhold M, Stoelinga M (2017) Model-based testing of probabilistic systems with stochastic time. In: 11th international conference on tests and proofs (TAP). Lecture notes in computer science, vol 10375. Springer, pp 77–97. https://doi.org/10.1007/978-3-319-61467-0_5
22. Gibbons JD, Chakraborti S (2011) Nonparametric statistical inference. In: International encyclopedia of statistical science. Springer, pp 977–979. https://doi.org/10.1007/978-3-642-04898-2_420
23. Gordon AD, Henzinger TA, Nori AV, Rajamani SK (2014) Probabilistic programming. In: Future of software engineering (FOSE). ACM, pp 167–181. https://doi.org/10.1145/2593882.2593900
24. Graf-Brill A, Hartmanns A, Hermanns H, Rose S (2017) Modelling and certification for electric mobility. In: 15th IEEE international conference on industrial informatics (INDIN). IEEE, pp 109–114. https://doi.org/10.1109/INDIN.2017.8104755
25. Hartmanns A, Hermanns H (2014) The Modest Toolset: an integrated environment for quantitative modelling and verification. In: 20th international conference on tools and algorithms for the construction and analysis of systems (TACAS). Lecture notes in computer science, vol 8413. Springer, pp 593–598. https://doi.org/10.1007/978-3-642-54862-8_51
26. Hérault T, Lassaigne R, Magniette F, Peyronnet S (2004) Approximate probabilistic model checking. In: 5th international conference on verification, model checking, and abstract interpretation (VMCAI). Lecture notes in computer science, vol 2937. Springer, pp 73–84. https://doi.org/10.1007/978-3-540-24622-0_8
27. Hermanns H (2002) Interactive Markov chains: the quest for quantified quality. Lecture notes in computer science, vol 2428. Springer. https://doi.org/10.1007/3-540-45804-2
28. Hierons RM, Merayo MG, Núñez M (2009) Testing from a stochastic timed system with a fault model. J Log Algebr Program 78(2):98–115. https://doi.org/10.1016/j.jlap.2008.06.001
29. Hollander M, Wolfe DA, Chicken E (2013) Nonparametric statistical methods. Wiley, New York
30. Katoen J-P (2016) The probabilistic model checking landscape. In: 31st annual ACM/IEEE symposium on logic in computer science (LICS). ACM, pp 31–45. https://doi.org/10.1145/2933575.2934574
31. Krichen M, Tripakis S (2009) Conformance testing for real-time systems. Form Methods Syst Des 34(3):238–304. https://doi.org/10.1007/s10703-009-0065-1
32. Kwiatkowska MZ, Norman G, Parker D (2011) PRISM 4.0: verification of probabilistic real-time systems. In: 23rd international conference on computer aided verification (CAV). Lecture notes in computer science, vol 6806. Springer, pp 585–591. https://doi.org/10.1007/978-3-642-22110-1_47
33. Larsen KG, Mikucionis M, Nielsen B (2004) Online testing of real-time systems using uppaal. In: 4th international workshop on formal approaches to software testing (FATES). Lecture notes in computer science, vol 3395. Springer, pp 79–94. https://doi.org/10.1007/978-3-540-31848-4_6
34. Larsen KG, Mikucionis M, Nielsen B (2009) Uppaal Tron user manual. CISS, BRICS, Aalborg University, Aalborg
35. Larsen KG, Skou A (1989) Bisimulation through probabilistic testing. In: Sixteenth annual ACM symposium on principles of programming languages (POPL). ACM Press, pp 344–352. https://doi.org/10.1145/75277.75307
36. Legay A, Sedwards S, Traonouez L-M (2016) Plasma Lab: a modular statistical model checking platform. In: 7th international symposium on leveraging applications of formal methods, verification and validation: foundational techniques (ISoLA). Lecture notes in computer science, vol 9952, pp 77–93. https://doi.org/10.1007/978-3-319-47166-2_6
37. Milner R (1980) A calculus of communicating systems. Lecture notes in computer science, vol 92. Springer. https://doi.org/10.1007/3-540-10235-3
38. Moon TK (1996) The expectation–maximization algorithm. IEEE Signal Process Mag 13(6):47–60
39. Nie J, Demmel J, Gu M (2008) Global minimization of rational functions and the nearest GCDs. J Global Optim 40(4):697–718. https://doi.org/10.1007/s10898-006-9119-8
40. Núñez M, Rodríguez I (2003) Towards testing stochastic timed systems. In: 23rd IFIP WG 6.1 international conference on formal techniques for networked and distributed systems (FORTE). Lecture notes in computer science, vol 2767. Springer, pp 335–350. https://doi.org/10.1007/978-3-540-39979-7_22
41. Schuts M, Hooman J, Vaandrager FW (2016) Refactoring of legacy software using model learning and equivalence checking: An industrial experience report. In: 12th international conference on integrated formal methods (IFM). Lecture notes in computer science, vol 9681. Springer, pp 311–325. https://doi.org/10.1007/978-3-319-33693-0_20
42. Segala R (1995) Modeling and verification of randomized distributed real-time systems. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA, USA
43. Song L, Zhang L, Godskesen JC (2012) Late weak bisimulation for Markov automata. CoRR. arXiv:1202.4116
44. Stoelinga M (2002) Alea Jacta Est: verification of probabilistic, real-time and parametric systems. Ph.D. thesis, University of Nijmegen, Nijmegen, The Netherlands
45. Stokkink WGJ, Timmer M, Stoelinga M (2013) Divergent quiescent transition systems. In: 7th international conference on tests and proofs (TAP). Lecture notes in computer science, vol 7942. Springer. https://doi.org/10.1007/978-3-642-38916-0_13
46. Thrun S, Burgard W, Fox D (2005) Probabilistic robotics. MIT Press, Cambridge
47. Timmer M, Brinksma E, Stoelinga M (2011) Model-based testing. In: Software and systems safety—specification and verification, NATO science for peace and security series—D: information and communication security, vol 30. IOS Press, pp 1–32. https://doi.org/10.3233/978-1-60750-711-6-1
48. Tretmans J (1996) Conformance testing with labelled transition systems: implementation relations and test generation. Comput Netw ISDN Syst 29(1):49–79. https://doi.org/10.1016/S0169-7552(96)00017-7
49. Tretmans J (2008) Model based testing with labelled transition systems. In: Formal methods and testing, an outcome of the FORTEST network, revised selected papers. Lecture notes in computer science, vol 4949. Springer, pp 1–38. https://doi.org/10.1007/978-3-540-78917-8_1
50. Utting M, Pretschner A, Legeard B (2012) A taxonomy of model-based testing approaches. Softw Test Verif Reliab 22(5):297–312. https://doi.org/10.1002/stvr.456
51. Vaandrager FW (2017) Model learning. Commun ACM 60(2):86–95. https://doi.org/10.1145/2967606
52. van Glabbeek RJ, Smolka SA, Steffen B, Tofts CMN (1990) Reactive, generative, and stratified models of probabilistic processes. In: Fifth annual symposium on logic in computer science (LICS). IEEE Computer Society, pp 130–141. https://doi.org/10.1109/LICS.1990.113740
53. Volpato M, Tretmans J (2014) Active learning of nondeterministic systems from an ioco perspective. In: 6th international symposium on leveraging applications of formal methods, verification and validation. Technologies for mastering change (ISoLA). Lecture notes in computer science, vol 8802. Springer, pp 220–235. https://doi.org/10.1007/978-3-662-45234-9_16
54. Younes HLS, Simmons RG (2002) Probabilistic verification of discrete event systems using acceptance sampling. In: 14th international conference on computer aided verification (CAV). Lecture notes in computer science, vol 2404. Springer, pp 223–235. https://doi.org/10.1007/3-540-45657-0_17
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.