Scenario-Based Verification of Uncertain Parametric MDPs

We consider parametric Markov decision processes (pMDPs) that are augmented with unknown probability distributions over parameter values. The problem is to compute the probability that a temporal logic specification is satisfied by a concrete MDP induced by a sample from these distributions. As solving this problem exactly is infeasible, we resort to sampling techniques that exploit the so-called scenario approach. Based on a finite number of samples of the parameters, the proposed method yields high-confidence bounds on the probability of satisfying the specification. The number of samples required to obtain high confidence in these bounds is independent of the number of states and the number of random parameters. Experiments on a large set of benchmarks show that several thousand samples suffice to obtain tight, high-confidence lower and upper bounds on the satisfaction probability.

Introduction

Markov decision processes (MDPs) are the standard model for sequential decision-making in stochastic environments [48]. They are widely used in areas such as planning [50], reinforcement learning [54], formal verification [7], and robotics [40]. Mature model checking tools such as PRISM [37] and Storm [25] employ efficient algorithms to verify the correctness of MDPs against temporal logic specifications [45], provided all transition probabilities and cost functions are exactly known. In many applications, however, this assumption is unrealistic, as certain system parameters are typically not exactly known and are governed by external sources.
Uncertainty on MDPs. A common approach to dealing with unknown system parameters is to describe the transition probabilities of an MDP using intervals [26,47,29] or generalizations to a class of uncertain MDPs [57,42,55]. Solution approaches rely on the limiting assumption that the uncertainty at different states of the MDP is independent of each other. As an example, consider a simple motion planning scenario where an unmanned aerial vehicle (UAV) is tasked to transport a certain payload to a target location. The problem is to compute a strategy (or policy) for the UAV to successfully deliver the payload while taking into account the weather conditions. External factors such as the wind strength or direction may affect the movement of the UAV. The assumption that such weather conditions are independent between the different possible states of the UAV is unrealistic and may yield pessimistic verification results.
Illustrative examples. We stress that the same situation appears in various systems. For example, in the verification of network protocols, we typically do not precisely know the channel quality (i.e., the loss rate). However, the loss rate is independent of the question of whether we are, e.g., probing or actually sending useful data over the network. A typical verification task would be to show that the protocol yields a sufficiently high quality of service. A verification approach that pessimistically assumes that the channel quality depends on the protocol state may be too pessimistic and fail to establish that the protocol provides the required quality of service.
Parametric models. Parametric Markov models allow one to explicitly describe that some probabilities are unknown but related [23,30,36]. In a parametric MDP, one uses variables (parameters) that encode, e.g., the probability of wind gusts affecting a UAV, or the probability of packet loss in a network. Transition probabilities are then given as expressions over these parameters. Substituting the parameters with concrete values yields an induced (standard) MDP. A variety of parameter synthesis methods have been devised; see the related work in Sect. 8 for details. A typical verification query concerns feasibility, that is, whether there exist parameter values such that the induced model satisfies a specification, which implicitly assumes that the parameters are controllable. Another query is to ask whether for all parameter values the induced model satisfies a specification. The latter can lead to pessimistic verification results: a UAV may be able to fly during most weather conditions, but it may be impossible to find a satisfying strategy for flying during a rare storm.
Uncertain parametric models. Rather than asking about the existence of parameter values, we want to analyze a system by considering the typical parameter values. In terms of our examples, this means that we want to investigate the typical weather conditions and the typical channel qualities. Similar to [51], we, therefore, assume that the parameters are random variables. For instance, weather data in the form of probability distributions may provide additional information on potential changes during the mission, or a measurement series may provide typical channel qualities. For weather data, such probability distributions may be derived from historical data of, for example, the wind speed [43].
Problem statement. We study a setting where the uncertain parameters are random variables that are defined on an arbitrary (joint) probability space over all parameters. We assume that we can sample independent and identically distributed parameter values from this distribution and solve the following problem.
Problem statement. Given a parametric MDP and a distribution over the parameter values, compute the probability with which any randomly drawn parameter values yield an induced MDP that satisfies a given specification.
We call this probability the satisfaction probability. The intuition is that the question of whether all (or some) parameter values satisfy a specification, as is often done in parameter synthesis [36], is replaced by the question of how likely it is that a (sampled) model satisfies a specification. For example, a satisfaction probability of 80% means that, if we randomly sample the parameters, then with a probability of 80% there exists a strategy for the resulting MDP satisfying the specification. Importantly, we thus assume that the parameter values are observable, and hence known when synthesizing a strategy. In every concrete MDP, we may use a different strategy. This is in contrast to a robust strategy synthesis approach, where a single strategy is sought that is robust against all (or a portion of the) parameter valuations.
Scenario-based verification. In this paper, we devise a method that answers the problem statement up to a user-specified confidence level. That is, we aim to solve the problem statement up to a statistical guarantee. To achieve this, we resort to sampling-based algorithms that yield a confidence (probability) on the bounds of the satisfaction probability. In doing so, we do not make any assumptions about the distribution over the parameter values. Referring back to the UAV example, we want to compute a confidence bound on the probability for the UAV to successfully finish its mission for some strategy. To derive confidence bounds, we first formulate the problem of (exactly) computing the satisfaction probability as a chance-constrained optimization program. However, this problem is very hard to solve [16], especially because we do not assume any knowledge on the probability distribution of the parameters. We, therefore, use a technique known as scenario optimization (also called the scenario approach), which provides guarantees on the satisfaction probability via sampling techniques [13,15]. The basic idea is to consider a finite set of samples from the distribution over the parameters and restrict the problem to these samples only. This so-called scenario optimization problem can be solved efficiently [11]. The solution to the scenario program is, with a high confidence probability, a solution to the previously mentioned chance-constrained program.
Our approach. For our setting, we first sample a finite number of parameter instantiations, each of which induces a concrete MDP. We can check the satisfaction of the specification for these concrete MDPs efficiently using, e.g., a probabilistic model checker. Based on the results, we compute an estimate of the satisfaction probability, which is a lower bound on the true satisfaction probability with the desired confidence probability. For example, we may obtain a lower bound on the satisfaction probability of 80%, which holds with a confidence probability of at least 90%. We show that the probability of an incorrect lower bound on the satisfaction probability diminishes to zero exponentially rapidly with an increasing sample size. Moreover, the number of required samples depends on neither the size of the state space nor the number of random parameters. Finally, we show that we can use the same technique to additionally compute upper bounds on the satisfaction probability.
Empirical evaluation. In our experiments, we validate the theoretical results using several MDPs that have different sizes of state and parameter spaces. We demonstrate experimentally that the required number of parameter samples is indeed not sensitive to the dimension of the state and parameter space. In addition, we show the effectiveness of our method with a new dedicated case study based on the aforementioned UAV example which incorporates 900 random parameters.
Contributions. This paper revises an earlier conference paper [19] as follows. Due to new results in [28] that lift some assumptions required for the scenario approach, we can simplify and generalize our approach by a simplified chance-constrained program. This program can be used in conjunction with all (standard temporal) properties on parametric MDPs. This change in the approach yields completely revised technical sections of the paper. Furthermore, this paper fixes a technical error in [19]. The (new) bounds in Theorem 1 of this paper are less pessimistic. The (new) bounds in Theorem 2 are now correct at the cost of being slightly more conservative.

Motivating Example
We consider the previously mentioned UAV motion planning example in more detail, where the objective is to transport a payload from one end of a valley to the other. A specification for the UAV would be that it (with at least probability x) realizes this objective. The typical approach to verify the UAV against this specification is to create a model that faithfully captures its dynamics.
However, the dynamics of the UAV depend on the weather conditions, which may be different on each day. Thus, each weather condition induces a distinct model for the UAV. In line with the problem statement, we assume that the weather conditions are deterministically observed on the day itself, and we can adapt the strategy accordingly. When designing the UAV, we may require that the expected number of days per year on which the UAV can satisfy a mission objective is sufficiently high. Concretely, this translates to the requirement that the UAV shall satisfy the specification above on, e.g., at least 90% of the days. More abstractly, we want to show that, with probability at least 90%, a random day yields weather conditions under which the corresponding model satisfies the specification. To this end, we assume that we have historical data that describes a distribution over weather conditions.

Model construction. Planning scenarios like the UAV example are naturally modelled by MDPs, where the actions determine the high-level control actions and the action outcomes are stochastic due to the influences of the environment. While this planning problem is, to some degree, continuous, high-level planners often discretize the world. We thus obtain the following grid world in which the UAV can decide to fly in any of six directions (N, W, S, E, up, down). States encode the position of the UAV, the current weather condition (sunny, stormy), and the general wind direction in the valley. In this particular scenario, we assume that the probabilistic outcomes are (only) determined by the wind in the valley and the control action. Concretely, we assume that an action moves the UAV one cell in the corresponding direction. Moreover, the wind moves the UAV one cell in the wind direction with a probability p, which depends on the wind speed.
Furthermore, we assume that the weather and wind-conditions change during the day and are described by a stochastic process.
We observe that some probabilities in the system are not fixed but rather a function of the weather. Thus, the model is an uncertain MDP (uMDP) whose transition probabilities depend on the weather. Concretely, parameters describe how the weather affects the UAV in different zones of the valley, and how the weather/wind may change during the day. Historical weather data now induces a distribution over the (joint) parameters.
Sampling from this distribution yields a concrete instantiated MDP. The problem is to compute the satisfaction probability, i.e., the probability that for any sampled MDP, we are able to synthesize a UAV strategy that satisfies the specification. Fig. 1 shows an example environment for the UAV, with the target zone in green and zones to avoid shown in red. The shown trajectories are typical paths under three different weather conditions (we refer to the experiments in Sect. 7 for details).
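The transition structure just described can be sketched in a few lines. The code below is our own illustration (the coordinate encoding and a fixed push probability are hypothetical simplifications); it shows how one action outcome combines the deterministic move with the wind push of probability p.

```python
# Hypothetical sketch of the grid-world transitions: an action moves the UAV
# one cell deterministically; with probability p, the wind additionally
# pushes it one cell in the wind direction. In the upMDP, p would itself be
# an expression over the (weather) parameters.
DIRS = {
    "N": (0, 1, 0), "S": (0, -1, 0), "E": (1, 0, 0),
    "W": (-1, 0, 0), "up": (0, 0, 1), "down": (0, 0, -1),
}

def move(pos, direction):
    dx, dy, dz = DIRS[direction]
    x, y, z = pos
    return (x + dx, y + dy, z + dz)

def transition(pos, action, wind_dir, p):
    """Distribution over successor cells for one control action."""
    base = move(pos, action)        # intended outcome of the action
    pushed = move(base, wind_dir)   # outcome if the wind also pushes
    return {base: 1.0 - p, pushed: p}

dist = transition((0, 0, 0), "N", "E", 0.3)
```

Sampling a concrete wind-push probability p for each zone would instantiate one day's MDP from this parametric structure.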

Preliminaries
In the following, we use probability distributions over finite and infinite sets; we refer to [9] for details. Let V = {x_1, . . . , x_n} be a finite set of variables (parameters), each ranging over R. The set of polynomials over V with rational coefficients is denoted by Q[V]. We denote the cardinality of a set U by |U|.

Parametric Models
We introduce parametric Markov decision processes. Note that we omit reward models, but our methods are directly applicable to reward measures.
Definition 1 (pMDP) A parametric Markov decision process (pMDP) M is a tuple M = (S, Act, s_I, V, P) with a finite set S of states, a finite set Act of actions, an initial state s_I ∈ S, a finite set V of parameters, and a transition function P : S × Act × S → Q[V]. The set ActS(s) of enabled actions at state s ∈ S is ActS(s) = {α ∈ Act | ∃s' ∈ S. P(s, α, s') ≠ 0}. Without loss of generality, we require ActS(s) ≠ ∅ for all s ∈ S. If |ActS(s)| = 1 for all s ∈ S, then M is a parametric discrete-time Markov chain (pMC), and we denote its transition function by P(s, s') ∈ Q[V].
A parameter instantiation is a function u : V → R that assigns a real value to each parameter. Applying an instantiation u to a pMDP M replaces each polynomial P(s, α, s') by its evaluation P(s, α, s')[u], which yields the instantiated MDP M[u]. We denote the parameter space, i.e., the set of all instantiations of M, by V_M. In the remainder, we assume that all parameter instantiations in V_M yield well-defined MDPs. We call u graph-preserving if for all s, s' ∈ S and α ∈ Act it holds that P(s, α, s') ≠ 0 ⇒ P(s, α, s')[u] ∈ (0, 1].
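As a minimal sketch (our own illustration, not the paper's tooling), the parametric transition function can be represented by callables over a parameter valuation; instantiation then evaluates them while checking the graph-preserving condition:

```python
# Sketch: transition polynomials represented as Python callables taking an
# instantiation u (a dict from parameter names to values). State and
# parameter names are hypothetical.
pmdp = {
    # (state, action): {successor: polynomial over the parameters}
    ("s0", "a"): {"s1": lambda u: u["v"], "s2": lambda u: 1 - u["v"]},
    ("s1", "a"): {"s1": lambda u: 1.0},
    ("s2", "a"): {"s2": lambda u: 1.0},
}

def instantiate(pmdp, u):
    """Apply instantiation u, checking that it is graph-preserving:
    every symbolic transition must evaluate to a probability in (0, 1]."""
    mdp = {}
    for (s, a), succ in pmdp.items():
        dist = {t: f(u) for t, f in succ.items()}
        assert all(0 < pr <= 1 for pr in dist.values()), "not graph-preserving"
        assert abs(sum(dist.values()) - 1.0) < 1e-9, "not a distribution"
        mdp[(s, a)] = dist
    return mdp

mdp = instantiate(pmdp, {"v": 0.4})
```

The instantiation {"v": 0.0} would be rejected here, since it removes a transition and thus changes the graph of the model.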

Assumption 1.
We consider only parameter instantiations for upMDPs that are graph-preserving.
To define measures on MDPs, nondeterministic choices are resolved by a strategy σ : S → Act with σ(s) ∈ ActS(s). The set of all strategies over M is Str^M. For the specifications we consider in this paper, memoryless deterministic strategies are sufficient [7]. Applying a strategy σ to an MDP M yields an induced Markov chain M[σ] in which all nondeterminism is resolved.
Measures. For an MC D, the reachability probability Pr^D(♦T) describes the (time-unbounded) probability of reaching a set T ⊆ S of target states from the initial state s_I [7]. Similar definitions can be given for the step-bounded reachability probability of reaching a set T from the initial state within k steps and, given rewards for every state, for the expected reward accumulated until reaching the target states, the long-run average reward, and so forth. For an MDP M, these measures are typically lifted. The maximum reachability probability Pr^max_M(♦T) is the maximum reachability probability over all induced Markov chains (for all strategies σ ∈ Str^M), i.e., Pr^max_M(♦T) = max_σ Pr^{M[σ]}(♦T). Similar definitions hold for the minima and for the other measures described above. Our approach is directly applicable to more general measures, e.g., measures on paths described by LTL properties [45].
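The measure Pr^max_M(♦T) can be computed by standard value iteration. The following sketch is our own, simplified illustration on a dictionary-based MDP encoding (not a production model checker); it repeatedly maximizes the one-step expectation over enabled actions:

```python
def max_reach_prob(mdp, targets, tol=1e-10):
    """Value iteration for Pr^max(<>T) on a concrete MDP given as
    {(state, action): {successor: probability}}. Simplified sketch:
    assumes the iteration converges to the reachability values."""
    states = {s for (s, _) in mdp} | {t for d in mdp.values() for t in d}
    x = {s: (1.0 if s in targets else 0.0) for s in states}
    while True:
        diff = 0.0
        for s in states:
            if s in targets:
                continue
            dists = [d for (s2, _), d in mdp.items() if s2 == s]
            if not dists:
                continue  # absorbing non-target state keeps value 0
            new = max(sum(p * x[t] for t, p in d.items()) for d in dists)
            diff = max(diff, abs(new - x[s]))
            x[s] = new
        if diff < tol:
            return x

# Tiny hypothetical example: from s0, action "a" reaches the target with
# probability 0.5 and action "b" with 0.9; the maximizing strategy picks "b".
mdp = {
    ("s0", "a"): {"goal": 0.5, "sink": 0.5},
    ("s0", "b"): {"goal": 0.9, "sink": 0.1},
    ("sink", "a"): {"sink": 1.0},
}
vals = max_reach_prob(mdp, {"goal"})
```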

Solution Functions
Recall that every parameter instantiation u ∈ V M for pMDP M induces a concrete MDP M [u]. For this MDP, we may then compute any of the measures described above. We exploit this relationship to create a direct mapping from parameter instantiations to real values.

Definition 2 (Solution Function) A solution function sol_M : V_M → R for a pMDP M maps a parameter instantiation u ∈ V_M to a real value.

In particular, we are interested in solution functions that map a parameter instantiation to the solution of computing a particular measure.¹ For instance, we may consider a solution function sol_M that maps a parameter instantiation u to the probability Pr^min_{M[u]}(♦T). Solution functions for parametric models with reachability and expected reward measures are well-studied, in particular their computation [23,30,24], but also some of their properties [56,6]. Already for pMCs, these functions are typically infeasible to compute. In the context of this paper, the important idea is that we can determine the size of the region where this function exceeds a threshold by sampling, as explained next.

Uncertain Parametric MDPs
We now introduce the setting studied in this paper. Specifically, we use pMDPs whose parameters define the uncertainty in the transition probabilities of an MDP. We add another layer of uncertainty, where each parameter follows a probability distribution. For example, referring back to the UAV example in Sect. 2, each weather condition has a certain probability, and every condition leads to a certain parameter instantiation. Importantly, the probability distribution of the parameters is assumed to be unknown, and we just assume that we are able to sample this distribution.

¹ Notice that we later assume the existence of the integral in Def. 4, which excludes some esoteric functions.

Definition 3 (upMDP) An uncertain pMDP (upMDP) M_P is a tuple M_P = (M, P) with M a pMDP and P a probability distribution over the parameter space V_M. If M is a pMC, then we call M_P a upMC.
Intuitively, a upMDP is a pMDP with an associated distribution over possible parameter instantiations. That is, sampling from V_M according to P yields concrete MDPs M[u] with instantiations u ∈ V_M (and P(u) > 0).

Definition 4 (Satisfaction Probability) Let M_P = (M, P) be a upMDP and ϕ a specification. The (weighted) satisfaction probability of ϕ in M_P is

F(M_P, ϕ) = ∫_{V_M} I_ϕ(u) dP(u),  where I_ϕ(u) = 1 if M[u] |= ϕ and I_ϕ(u) = 0 otherwise.

Note that I_ϕ is measurable for all specifications mentioned in this paper, as it partitions V_M into a finite union of semi-algebraic sets [8,56]. Moreover, we have that F(M_P, ϕ) ∈ [0, 1] and

F(M_P, ¬ϕ) = 1 − F(M_P, ϕ).  (1)

Example 2 We expand the pMC in Fig. 2 toward a upMC with a uniform distribution for the parameter v over the interval [0, 1]. In Fig. 4, we again plot the solution function for the reachability probability in the pMC from Fig. 2, which was also shown in Fig. 3. Additionally, we compare this probability against a threshold λ = 0.13 with comparison operator ≤, and we plot the satisfying region and its complement as green and red regions. The satisfying region is given by the union of the intervals [0.13, 0.525] and [0.89, 1.0], and the satisfaction probability F(M_P, ϕ) is 0.395 + 0.11 = 0.505.
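Under the uniform distribution of Example 2, the satisfaction probability is simply the total length of the satisfying region; the integral of the indicator I_ϕ can also be approximated by plain Monte Carlo sampling. A small sketch of both computations, taking the satisfying region from Example 2 as given:

```python
import random

# Satisfying region from Example 2 (taken as given here): under the uniform
# distribution on [0, 1], F(M_P, phi) equals the region's total length.
def satisfies(v):
    return 0.13 <= v <= 0.525 or 0.89 <= v <= 1.0

exact = (0.525 - 0.13) + (1.0 - 0.89)  # total length = 0.505

# Monte Carlo approximation of the integral of the indicator I_phi.
random.seed(0)
N = 200_000
estimate = sum(satisfies(random.random()) for _ in range(N)) / N
```

The Monte Carlo estimate converges to 0.505 but gives no a priori confidence statement by itself; the scenario approach developed below supplies exactly such guarantees.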

Problem Statement
Let us now formalize the problem of interest. We aim to compute the satisfaction probability of the parameter space for a specification ϕ on a upMDP. Equivalently, we seek the probability that a randomly sampled instantiation u from the parameter space V_M induces an MDP M[u] that satisfies ϕ. Formally: given a upMDP M_P = (M, P) and a specification ϕ, compute the satisfaction probability F(M_P, ϕ). We approximate this probability by sampling the parameters. Such an approach cannot be precise and correct in all cases, because we only have a finite number of samples at our disposal. Instead, we provide the following probably approximately correct (PAC) style formulation [33], meaning that we compute a lower bound on the satisfaction probability that is correct with high confidence:

Formal problem 1. Given a upMDP M_P = (M, P), a specification ϕ, and a confidence probability β ∈ (0, 1), compute a lower bound η on the satisfaction probability such that F(M_P, ϕ) ≥ η holds with a confidence probability of at least β.
Intuitively, given a confidence probability β close below one, we obtain η as a high-confidence lower bound on the satisfaction probability F (M P , ϕ).

Remark 1
We can also compute an upper bound on the satisfaction probability by exploiting (1) and computing a lower bound for the negated specification ¬ϕ.
Furthermore, as is typical in PAC settings, if a specific value for η is desired, we are also able to compute the confidence that η is indeed a lower bound on the satisfaction probability. We will exploit both directions, with either given β or η, in the numerical examples in Sect. 7. We illustrate our formal problem by continuing our examples on the upMC in Fig. 2 and the UAV.
Example 3 Let us reconsider the upMC from Example 2 with ϕ and satisfaction probability F(M_P, ϕ) = 0.505. Assume we do not yet know this probability. The problem statement then asks us to compute an η such that, with high confidence β, say 0.99, F(M_P, ϕ) ≥ η.
Example 4 For the UAV motion planning example introduced in Sect. 2, consider the question "What is a lower bound on the probability that on a given day, there exists a strategy for the UAV to successfully complete the mission?" Our specification ϕ for successfully completing a mission could then be that the maximal reachability probability to a target state is above 0.99, or that the expected travel time is below 12 hours. For any such ϕ, assume that we want to answer the question above with a confidence of β = 0.9. The resulting lower bound on the satisfaction probability could be, e.g., η = 0.81. This means that with a confidence probability of β = 0.9, the actual satisfaction probability is indeed at least η = 0.81. If we change the confidence β to 0.99, the obtained lower bound may reduce to η = 0.78. Intuitively, the more confidence we want to have, the lower the lower bound.

Computing the Satisfaction Probability
In this section, we introduce our approach for solving the problem presented in Sect. 4. We focus on a practical overview of our approach in this section, while postponing technical details and the derivation of our main results (Theorems 1 and 2) to Sect. 6. First, in Sect. 5.1, we fix some notation for our concrete setup. In particular, we discuss how we obtain solutions sol_M(u) by sampling parameter instantiations u from V_M. In Sect. 5.2, we then first address a simpler yet related problem in which we let the specification ϕ depend on the set of sampled solutions at hand. In Sect. 5.3, we return to our original problem statement: we introduce our approach in which we keep the specification fixed. An algorithmic overview of both of these methods is shown in Fig. 5. Finally, we discuss the quality of the obtained lower bounds in Sect. 5.4.

Fig. 6: In Fig. 6a, the specification threshold λ*(U_N) is chosen after observing the solutions such that all samples are satisfying; Fig. 6b uses a fixed threshold λ and has two violating samples.

Obtaining Solutions from Parameter Samples
We describe how we obtain solutions by sampling from the parameter space. Specifically, we define the set U_N = {u_1, u_2, . . . , u_N} as the outcome of sampling N parameter instantiations from V_M according to the probability distribution P. Recall that we assume that these samples are independent and identically distributed. Thus, the set U_N of N samples is a random element drawn according to the product probability P^N = P × · · · × P (N times) over the parameter probability distribution P. For each sample u ∈ U_N, we compute the resulting solution sol_M(u), as shown in the following example.
Example 5 We continue from Example 3 on the upMC in Fig. 2. We sample N = 10 parameters from the (uniform) probability distribution of this upMC, which are shown in red on the x-axis in Fig. 6. The resulting solutions sol_M(u_1), . . . , sol_M(u_10) are depicted as the blue points. As expected, these points indeed lie on the solution function curve shown in Fig. 3. In Figs. 6a and 6b, we check these solutions against a specification in two different ways: (1) In Fig. 6a, we first observe the solutions and then devise a threshold for the given measure such that these samples all satisfy the resulting specification. That is, the threshold, denoted by λ*(U_N), depends on the solutions at hand, such that M[u] |= ϕ for all u ∈ U_N. For this example, the specification is P_{≤λ*(U_N)}(♦T), and the tightest threshold satisfying this condition is λ*(U_N) = max_{u ∈ U_N} sol_M(u). (2) In Fig. 6b, we fix the specification with its threshold first and then evaluate the number of samples satisfying the specification. This may lead to samples violating the specification (e.g., Fig. 6b has two violating samples).
In both cases, we can partition U_N into disjoint sets of samples that satisfy (U_N^ϕ) or violate (U_N^¬ϕ) the specification, i.e., U_N = U_N^ϕ ∪ U_N^¬ϕ. Note that in the first case (Fig. 6a), the set of violating samples is empty by construction, i.e., U_N^¬ϕ = ∅. Let N_ϕ = |U_N^ϕ| denote the number of satisfying samples and N_¬ϕ = |U_N^¬ϕ| the number of violating samples.

Restriction to Satisfying Samples
Before solving the main problem introduced in Sect. 4, we consider a simpler setting to introduce some of the necessary ideas. Intuitively, we want to investigate the case where we adapt the specification (or rather the threshold in this specification) such that U_N = U_N^ϕ, or equivalently N_ϕ = N. This simpler setting is shown in Fig. 6a. Here, we do not fix a threshold λ for the specification ϕ a priori, but instead derive a threshold λ*(U_N) from the solutions at hand such that all samples are satisfying, i.e., we ensure that

M[u] |= ϕ for all u ∈ U_N.  (Assumption A)

Problem. We raise the question: "What is the probability that, given these N samples and a specification threshold that makes all samples satisfying, the next sampled parameter valuation u (on the x-axis of Fig. 6a) with the corresponding solution sol_M(u) will also satisfy this specification?" This probability is similar to the satisfaction probability F(M_P, ϕ) in Def. 4, but the threshold of specification ϕ is not fixed a priori.
Result. Using Theorem 1, we compute a lower bound η on this satisfaction probability that holds with a user-specified confidence probability β:

Theorem 1 Let M_P be a upMDP and let U_N be the set of N ≥ 1 sampled parameters. For any set U_N, choose the threshold λ*(U_N) of specification ϕ such that U_N = U_N^ϕ, and fix a confidence probability β ∈ (0, 1). Then, it holds that

P^N { F(M_P, ϕ) ≥ (1 − β)^(1/N) } ≥ β.

Applying Theorem 1 to the solutions in Fig. 6a, we compute that the satisfaction probability F(M_P, ϕ) with respect to specification ϕ with threshold λ*(U_N) = 0.142 is bounded from below by η = 0.794 (with a confidence probability of at least β = 0.9) and by η = 0.631 (with a confidence of at least β = 0.99).
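The bound of Theorem 1 is a one-liner to evaluate. The sketch below (our own illustration) reproduces the example values for N = 10 samples:

```python
# Theorem 1: with confidence at least beta, the satisfaction probability is
# lower bounded by (1 - beta)^(1/N) when all N samples are satisfying.
def theorem1_bound(N, beta):
    return (1 - beta) ** (1 / N)

eta_90 = theorem1_bound(10, 0.90)  # ~0.794
eta_99 = theorem1_bound(10, 0.99)  # ~0.631
```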
Sample complexity. More generally, Theorem 1 asserts that, with a probability of at least β, the next sampled parameter from V_M will satisfy the specification (with sample-dependent threshold λ*(U_N)) with a probability of at least (1 − β)^(1/N). Thus, the satisfaction probability is lower bounded by η = (1 − β)^(1/N) with high confidence, given that β is close to one. This high confidence is easily achieved for a sufficiently large number of samples N, as seen from the following corollary.
Corollary 1. The sample size N necessary to obtain a desired lower bound η ∈ (0, 1) on the satisfaction probability with a confidence of at least β ∈ (0, 1) is

N = ceil( ln(1 − β) / ln(η) ),

where the function ceil(x) rounds its argument x ∈ R upwards to the nearest integer.
Corollary 1 states that the sample size N grows only logarithmically in 1 − β. Thus, a significant improvement in β (i.e., closer to one) only requires a marginal increase in N. Similarly, increasing the sample size N improves the lower bound η on the satisfaction probability. For example, when increasing the number of samples in Fig. 6a to N = 100 (note that we still assume that M[u] |= ϕ for all u ∈ U_N), Theorem 1 concludes that the satisfaction probability is lower bounded by 0.977 (with a confidence of at least β = 0.9) and by 0.955 (with a confidence of at least β = 0.99). Next, consider the extreme case where β is infinitely close to one. We observe from Corollary 1 that such a confidence probability can only be obtained for N = ∞. Intuitively, this observation makes sense: we can only be absolutely certain of our lower bound on the satisfaction probability if we base this estimate on infinitely many samples. In practice, our sample set is finite, and a typical confidence probability may be β = 1 − 10^−3.
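Corollary 1 can likewise be evaluated directly; a small sketch (our own illustration):

```python
import math

# Corollary 1: N = ceil(ln(1 - beta) / ln(eta)) samples suffice for a lower
# bound eta on the satisfaction probability with confidence at least beta.
def required_samples(eta, beta):
    return math.ceil(math.log(1 - beta) / math.log(eta))

n1 = required_samples(0.794, 0.90)  # 10 samples, consistent with Theorem 1
n2 = required_samples(0.900, 0.99)  # a tighter bound needs more samples
```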

Treatment of Violating Samples
We return to a setting with a fixed threshold λ and possibly violating samples. In other words, we drop Assumption A, i.e., that M[u] |= ϕ for all samples u ∈ U_N. Consider again Fig. 6b, where for some of the samples u ∈ U_N, the value sol_M(u) exceeds λ, so U_N ≠ U_N^ϕ, and Theorem 1 does not apply. Instead, we state Theorem 2³ as a generalization that uses a fixed threshold λ and is also applicable in the presence of violating samples:

Theorem 2 Let M_P be a upMDP and let U_N be the set of N ≥ 1 sampled parameters. Choose a confidence probability β ∈ (0, 1). Then, it holds that

P^N { F(M_P, ϕ) ≥ t*(N_¬ϕ) } ≥ β,

where t*(N) = 0 for N_¬ϕ = N, and, for k = 0, . . . , N − 1, t*(k) is the solution of

(1 − β) / N = Σ_{i=0}^{k} C(N, i) (1 − t)^i t^(N−i).

Theorem 2 solves the formal problem stated in Sect. 4. Recall that N_¬ϕ denotes the number of samples whose value sol_M(u) violates the specification ϕ. Applying Theorem 2 to the solutions in Fig. 6b (with N_¬ϕ = 2), we conclude that the satisfaction probability is bounded from below by 0.388 (with a confidence of at least β = 0.9) and by 0.282 (with β = 0.99). When we increase the number of samples to N = 100 and assume that N_¬ϕ = 20, these results improve to the lower bounds 0.654 (with β = 0.9) and 0.622 (with β = 0.99). We note that the intuition in Corollary 1 about the relationships between the sample size N, the lower bound η, and the confidence probability β also holds for Theorem 2.

Fig. 7 illustrates how the number of violating samples, N_¬ϕ, influences the quality of the lower bound on the satisfaction probability. The points at N_¬ϕ = 0 are the bounds returned by Theorem 1, while the lines correspond to Theorem 2. Intuitively, the lower bound on the satisfaction probability computed by Theorem 2 decreases with an increasing number of violating samples. Moreover, Theorem 1 yields a better lower bound than Theorem 2 (points versus lines in Fig. 7), at the cost of not using a fixed threshold on the specification and not being able to deal with violating samples.

³ We derive this theorem using Lemma 2, which is provided later on in Sect. 6.4.
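The threshold t*(k) has no closed form, but the binomial tail is monotone in t, so t*(k) can be found by bisection. The sketch below is our own illustration, assuming the binomial-tail form of the Theorem 2 equation, i.e., that t*(k) solves (1 − β)/N = Σ_{i=0}^{k} C(N, i) (1 − t)^i t^(N−i); it reproduces the bounds quoted for Fig. 6b:

```python
from math import comb

def theorem2_bound(N, k, beta):
    """Bisection solve (1 - beta)/N = sum_{i=0}^{k} C(N,i)(1-t)^i t^(N-i)
    for t in (0, 1); this tail probability is increasing in t."""
    if k == N:
        return 0.0  # all samples violate: trivial bound t*(N) = 0
    target = (1 - beta) / N
    def tail(t):
        return sum(comb(N, i) * (1 - t) ** i * t ** (N - i)
                   for i in range(k + 1))
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if tail(mid) < target:
            lo = mid
        else:
            hi = mid
    return lo

t_90 = theorem2_bound(10, 2, 0.90)  # ~0.388, as for Fig. 6b with beta = 0.9
t_99 = theorem2_bound(10, 2, 0.99)  # ~0.282
```

For k = 0 the equation degenerates to (1 − β)/N = t^N, which is slightly more conservative than the Theorem 1 bound (1 − β)^(1/N), in line with the discussion above.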

Quality of the Obtained Lower Bounds
In Fig. 8, we fix the fraction N_¬ϕ/N of violating samples and plot the lower bounds on the satisfaction probability obtained using Theorem 2 for different values of N and β. Note that the lower bounds grow toward one minus the fraction of violating samples as the sample size increases. As also shown with Corollary 1, the confidence probability β has only a marginal effect on the obtained lower bounds.
Finally, we make the following remark with respect to the sample complexity of Theorems 1 and 2.
Remark 2 (Independence of model size) The number of samples needed to obtain a certain confidence probability in Theorems 1 and 2 is independent of the number of states, transitions, or parameters of the upMDP.

Derivation of the Main Results
In this section, we explain how we obtain Theorems 1 and 2. Toward proving these theorems, we reformulate our problem statement in the domain of linear programs (LPs). First, we consider the case where we account for all but a small fraction of the parameter instantiations u ∈ V_M (recall that V_M typically has infinite cardinality), which we formalize using a so-called chance-constrained LP. We remark that solving this chance-constrained LP directly is difficult [16]. Instead, we formalize our sampling-based approach, which is based on scenario optimization and considers only a finite number of sampled parameters u ∈ U_N.

Chance-Constrained LP Reformulation
Recall from Sect. 4 that the problem is to compute a lower bound η on the satisfaction probability F(M_P, ϕ). In other words, when sampling a parameter instantiation u ∈ V_M according to the probability measure P, compute a lower bound η on the probability that M[u] |= ϕ. If the specification ϕ has a comparison operator ≤ and a threshold λ (e.g., ϕ = P_≤λ(♦T)), then the condition M[u] |= ϕ is equivalent to sol_M(u) ≤ λ. As the solution function (Def. 2) is a function of (only) the parameter instantiation, the solution sol_M(u) is also a random variable with probability measure P. Thus, we can formalize the problem of finding a lower bound η based on the following chance-constrained LP:

  minimize τ                                        (6a)
  subject to P{ u ∈ V_M : sol_M(u) ≤ τ } ≥ η.      (6b)

Then, the satisfaction probability F(M_P, ϕ) is lower bounded by η, given that the optimal solution τ* to (6) is at most λ. Similarly, if the specification ϕ has a lower-bound comparison operator ≥ and a threshold λ, we consider the following chance-constrained LP:

  maximize τ                                        (7a)
  subject to P{ u ∈ V_M : sol_M(u) ≥ τ } ≥ η.      (7b)

Note that the differences between (6) and (7) are the optimization direction and the operator within the chance constraint. Solving these chance-constrained problems is very hard in general, in particular because the probability distribution over the parameters is unknown [16].
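For intuition, a toy numeric sketch (ours; the quadratic solution function is purely illustrative and not from the paper): if the distribution of sol_M(u) were known, the optimal τ of the chance-constrained LP (6) would simply be the η-quantile of sol_M(u), which an empirical quantile over many samples approximates:

```python
import random

random.seed(0)

# Toy stand-in for the solution function sol_M(u); purely illustrative.
def sol(u: float) -> float:
    return u * u

eta = 0.8

# The chance constraint Pr{ sol_M(u) <= tau } >= eta is satisfied by every
# tau at or above the eta-quantile of sol_M(u), so minimizing tau yields
# exactly that quantile; here we approximate it empirically.
samples = sorted(sol(random.random()) for _ in range(100_000))
tau = samples[int(eta * len(samples))]
print(tau)  # close to the exact quantile: Pr{u*u <= t} = sqrt(t), so t = 0.64
```

The difficulty in our setting is precisely that this quantile cannot be computed, because the distribution P over the parameters is unknown; this is what the scenario approach below circumvents.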
In what follows, we introduce our sampling-based approach to solve these problems with high confidence. For brevity, we assume specifications with an upper-bound comparison operator ≤, as in (6), but modifying our results for the other operator is straightforward.

Scenario Optimization Program
Instead of solving the chance-constrained LP in (6) directly, we compute probably approximately correct lower bounds on the satisfaction probability based on scenario optimization [12,13]. Specifically, we replace the chance constraint (6b), which asks for the satisfaction of a certain fraction of a set of infinitely many constraints, with a finite number of hard constraints induced by the sampled parameters u ∈ U_N. The resulting optimization problem is called a scenario program [16] and is formulated as follows:

  minimize τ                                        (8a)
  subject to sol_M(u) ≤ τ  for all u ∈ U_N.        (8b)

Upon solving scenario program (8), we obtain the unique optimal solution τ* = max_{u ∈ U_N} sol_M(u). In Sect. 6.3, we show that Theorem 1 follows from solving program (8) directly, while Theorem 2 corresponds to the setting where some samples violate the specification.
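Solving (8) after model checking the N samples is a one-liner; a hedged sketch with synthetic stand-ins for the model-checking results sol_M(u):

```python
import random

random.seed(1)

N, beta = 1000, 0.99

# Stand-ins for the solutions sol_M(u) of N sampled parameter values; in the
# actual approach each value is obtained by model checking M[u] with Storm.
solutions = [random.random() for _ in range(N)]

# Scenario program (8): enforce sol_M(u) <= tau for every sample and
# minimize tau, so the optimum is simply the largest sampled solution.
tau_star = max(solutions)

# Lemma 1/Theorem 1 (next section): with confidence at least beta, a fresh
# sample satisfies sol_M(u) <= tau_star with probability >= (1-beta)**(1/N).
eta = (1 - beta) ** (1 / N)
print(tau_star, eta)  # eta ~ 0.9954 for N = 1000, beta = 0.99
```

Note that the bound η depends only on N and β, not on the sampled values themselves; the samples determine only the threshold τ*.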

Restriction to Satisfying Samples
Consider the case where we directly solve the scenario program (8). In this case, the following lemma, which is based on [13, Theorem 2.4], enables us to compute a high-confidence lower bound on the satisfaction probability.
Lemma 1 Let upMDP M_P, a specification ϕ with operator ≤, and the set U_N of N ≥ 1 sampled parameters be given. Let τ* be the optimal solution of (8), and choose a confidence probability β ∈ (0, 1). Then, it holds that

  P^N { Pr{ u ∈ V_M | sol_M(u) ≤ τ* } ≥ (1 − β)^{1/N} } ≥ β.    (9)

Lemma 1 states that, with a probability of at least β, the probability that sol_M(u) ≤ τ* for the next sampled parameter u ∈ V_M is at least (1 − β)^{1/N}. To derive Theorem 1 from Lemma 1, we choose the (sample-dependent) specification threshold λ*(U_N) ≥ τ* after solving the scenario program. Then, Theorem 1 follows directly by observing that F(M_P, ϕ) ≥ Pr{u ∈ V_M | sol_M(u) ≤ τ*}, since λ*(U_N) ≥ τ*. We provide the proof of Lemma 1, and thus of Theorem 1, in Appendix A.

Treatment of Violating Samples
We now derive Theorem 2, which assumes a fixed threshold λ. In this case, we cannot guarantee a priori that λ ≥ τ*, because some samples may induce a reachability probability above λ, as in Fig. 6b. Recall that N_¬ϕ is the number of samples that violate the specification. Loosely speaking, we relax the constraints for these N_¬ϕ samples and compute the maximum probability over the remaining samples U_{N_ϕ} ⊆ U_N, which we write as τ+ = max_{u ∈ U_{N_ϕ}} sol_M(u). The following lemma is adapted from [15, Theorem 2.1] and computes a high-confidence lower bound on the satisfaction probability, using the values of N_¬ϕ and τ+.
Lemma 2 Let upMDP M_P, a specification ϕ with operator ≤, and the set U_N of N ≥ 1 sampled parameters be given. Fix a confidence probability β ∈ (0, 1). Then, it holds that

  P^N { Pr{ u ∈ V_M | sol_M(u) ≤ τ+ } ≥ t*(N_¬ϕ) } ≥ β,    (10)

where t*(N) = 0 for N_¬ϕ = N, and for k = 0, . . . , N−1, t*(k) is the solution of

  (1 − β)/N = ∑_{i=0}^{k} C(N, i) (1 − t)^i t^{N−i},    (11)

with C(N, i) the binomial coefficient. Lemma 2 asserts that, with a probability of at least β, the probability that sol_M(u) ≤ τ+ for the next sampled parameter u ∈ V_M is at least t*(N_¬ϕ), given that N_¬ϕ samples violate the specification ϕ. Theorem 2 follows directly from Lemma 2 by observing that, by construction, τ+ ≤ λ. We provide the proof of Lemma 2, and thus of Theorem 2, in Appendix A. We note that (11) involves the cumulative distribution function of a beta distribution with parameters N_¬ϕ + 1 and N − N_¬ϕ [16], so it can easily be solved numerically for t. Moreover, we can speed up the computations at run time by tabulating the solutions to (11) for all relevant values of N, β, and N_¬ϕ up front.
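Equation (11) can be solved numerically, e.g., by bisection, since its right-hand side increases monotonically from 0 to 1 in t; a stdlib-only sketch (function name ours):

```python
from math import comb

def t_star(N: int, k: int, beta: float, tol: float = 1e-12) -> float:
    """Solve (11) for t: (1-beta)/N = sum_{i=0}^{k} C(N,i) (1-t)^i t^(N-i),
    i.e., the lower bound of Theorem 2 when k samples violate phi."""
    if k == N:
        return 0.0
    target = (1 - beta) / N

    def rhs(t: float) -> float:
        # Monotonically increasing in t, with rhs(0) = 0 and rhs(1) = 1.
        return sum(comb(N, i) * (1 - t) ** i * t ** (N - i) for i in range(k + 1))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rhs(mid) < target:
            lo = mid
        else:
            hi = mid
    return lo

# Sanity check against the closed form for k = 0: t*(0) = ((1-beta)/N)**(1/N).
print(t_star(100, 0, 0.9))   # ~0.933, equals 0.001**(1/100)
print(t_star(100, 20, 0.9))  # the bound degrades as more samples violate phi
```

Tabulating t_star for the relevant (N, β, N_¬ϕ) triples up front, as suggested above, makes the per-run cost of the bound negligible.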

Numerical Examples
We implemented our approach in Python, using the model checker Storm [25] to construct and analyze samples of MDPs. Our implementation is available at https://doi.org/10.5281/zenodo.6674059. All computations ran on a computer with 32 cores at 3.7 GHz and 64 GB of RAM.
First, we apply our method to the UAV motion planning example introduced in Sect. 2. Thereafter, we report on a set of well-known benchmarks used in parameter synthesis [36] that are, for instance, available on the website of the tool PARAM [30] or part of the PRISM benchmark suite [38]. We demonstrate that with our method, we can either specify a lower bound η on the satisfaction probability and compute with what confidence probability β we can guarantee this lower bound, or proceed in the opposite direction (i.e., specify β and compute η).

UAV Motion Planning
Setup. Recall the example from Sect. 2 of a UAV which needs to deliver a payload while avoiding obstacles. The weather conditions are uncertain, and this uncertainty is reflected in the parameters of the uMDP. For the distributions over parameter values, that is, over weather conditions, we consider the following three cases:
1. a uniform distribution over the different weather conditions in each zone;
2. a weather condition inducing a wind direction that pushes the UAV northbound (i.e., into the positive y-direction) is twice as likely as the other directions;
3. a weather condition pushing the UAV westbound (i.e., into the negative x-direction) is twice as likely.
Trajectories. We depict example trajectories of the UAV for these three cases in Fig. 1. The trajectory depicted by the black line represents a simulated trajectory for the first case (uniform distribution), taking a direct route to reach the target area. Similarly, the trajectories shown by the dotted purple and dashed blue lines are simulated trajectories for the second case (stronger northbound wind, i.e., positive y-direction) and third case (stronger westbound wind, i.e., negative x-direction), respectively. Under these two weather conditions, the UAV takes different paths toward the goal in order to account for the stronger wind. In particular, for the case with northbound wind, we observe that the UAV is able to fly close to the obstacle at the bottom right. By contrast, for the case with westbound wind, the UAV avoids getting close to this obstacle, as the wind may push the UAV into it, and then reaches the target area.
Bounds on satisfaction probabilities. We sample N = 5 000 parameters for each case and consider different confidence probabilities β between 0.9 and 0.9999. Specifically, we consider the specification ϕ = P_≥0.9(♦T), which is satisfied if the probability to safely reach the goal region is at least 90%. For all three weather conditions, we compute the lower bounds η on both the satisfaction probability F(M_P, ϕ) and the unsatisfaction probability F(M_P, ¬ϕ), using Theorem 2.
The results are presented in Table 1. The highest lower bound on the satisfaction probability is given by the first weather condition, and is η = 0.911 (for β = 0.9) and η = 0.906 (for β = 0.9999). In other words, under a uniform distribution over the weather conditions, the UAV will (with a confidence of at least β = 0.9999) satisfy the specification on at least 90.6% of the days. The second and third weather conditions lead to η = 0.770 and η = 0.768 (for β = 0.9999), respectively, showing that it is harder to navigate around the obstacles with non-uniform probability distributions over the parameters. The average time to run a full iteration of our approach on this uMDP with 900 parameters and around 10 000 states (i.e., performing the sampling, model checking, and computing the lower bounds η) with 5 000 parameter samples is 9.5 minutes.

Parameter Synthesis Benchmarks
Setup. In our second set of benchmarks, we adopt parametric MDPs and pMCs from [49]. We adapt the Consensus protocol [3] and the Bounded Retransmission Protocol (brp) [34,22] to uMDPs; the Crowds Protocol (crowds) [52] and the NAND Multiplexing benchmark (nand) [31] become uMCs. Essentially, the PLA technique from [49] allows approximating the percentage of instantiations that satisfy (or do not satisfy) a specification, while assuming a uniform distribution over the parameter space. Table 2 lists the specification checked (ϕ) and the number of parameters, states, and transitions for all benchmarks. Note that the satisfying regions reported in Table 2 approximate F(M_P, ϕ), while the unsatisfying regions approximate F(M_P, ¬ϕ). We provide these numbers as a baseline only: the computation via PLA does not scale beyond tens of parameters [49] and cannot cope with unknown distributions. For all benchmarks, we assume a uniform distribution over the parameters.
Specifications with variable thresholds λ. We demonstrate Theorem 1 on brp (16,5) with a variable threshold λ*(U_N) in the specification ϕ = E_≤λ*(U_N)(♦T). We use either N = 1 000 or 10 000 parameter samples and compute the tightest threshold λ*(U_N) such that all samples are satisfying. As explained in Example 5, this tightest threshold is λ*(U_N) = max_{u ∈ U_N} sol_M(u). We repeat both cases ten thousand times and show a histogram of the obtained thresholds in Fig. 9. The corresponding lower bounds on the satisfaction probability (which depend only on N and β) are η = 0.9954 (for N = 1 000) and η = 0.9995 (for N = 10 000). We observe from Fig. 9 that for a higher number of samples, the thresholds are, on average, higher, and their variability is lower. These results are as expected, since the threshold is computed as the maximum of all solutions.

Table 1: Lower bounds η on the (un)satisfaction probability for the UAV benchmark with N = 5 000 samples.

Table 2: Information for the benchmark instances and the approximate (un)satisfaction probabilities taken from [49].
Computing β for a given η. We use Theorem 2 to compute the confidence probabilities β that the approximate satisfying regions in Table 2 are indeed a lower bound on the actual satisfaction probability F (M P , ϕ) (and on F (M P , ¬ϕ) for the unsatisfying regions). Thus, we let η be the approximate (un)satisfying regions in Table 2, and sample a desired number of parameters (between N = 1 000 and 25 000). For every instance, we then compute the confidence probability β using Theorem 2.
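The computation behind this experiment can be sketched as follows (function name ours; small N for illustration, whereas the experiments use up to 25 000 samples): for a fixed η and an observed number of violating samples, the confidence β follows from evaluating the binomial sum in (11) once:

```python
from math import comb

def confidence(N: int, n_viol: int, eta: float) -> float:
    """Confidence beta with which Theorem 2 certifies the lower bound eta,
    given n_viol violating samples: solve (11) for beta at t = eta."""
    s = sum(comb(N, i) * (1 - eta) ** i * eta ** (N - i) for i in range(n_viol + 1))
    return max(0.0, 1.0 - N * s)  # clamp: beta below 0 means no guarantee

# Suppose 100 of 1000 samples violate the specification (10% violation rate):
print(confidence(1000, 100, 0.85))  # eta below 1 - 10%: high confidence
print(confidence(1000, 100, 0.90))  # eta at the empirical rate: no confidence
```

This direction needs no root finding, which is why checking a given baseline η, as done here for the PLA regions, is cheap even for large N.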
For every benchmark and sample size, we report the average confidence probabilities β over 10 full iterations of the same benchmark in Table 3. Furthermore, we list the run time (in seconds) per 1 000 samples for each instance. The results in Table 3 show that we can conclude with high confidence that the (un)satisfying regions are indeed a lower bound on the actual (un)satisfaction probabilities. In particular, for N = 25 000 samples, most confidence probabilities are very close to one. For example, for the crowds benchmark, instance (10,5) with N = 25 000, we obtain a confidence probability of β = 0.99943 on the unsatisfying region of size 0.413. Thus, the probability that the approximate unsatisfying region of 0.413 in Table 2 is not a lower bound on the actual unsatisfaction probability F(M_P, ¬ϕ) is less than 1 − β = 0.00057. Moreover, in line with Remark 2, larger models do not (in general) lead to worse confidence bounds (although model checking times do typically increase with the size of the MDP, cf. Table 3).
The instance for which we obtained the worst confidence probability is the unsatisfying probability of nand (10,5), namely β = 0.975. Recomputing the results of [49] with a much smaller tolerance revealed that the approximate unsatisfying region of 0.733 was already a very tight lower bound (the best bound we were able to compute was 0.747). As such, we could only conclude with a confidence of β = 0.975 that η = 0.733 is a correct lower bound (as shown in Table 3, for N = 25 000).
Computing η for a given β. Conversely, we can also use Theorem 2 to compute the best lower bound η on the (un)satisfaction probability that holds with at least a confidence probability β. For each benchmark, we sample N = 25 000 parameters and apply Theorem 2 for increasing confidence probabilities β. We report the resulting lower bounds in Table 4. We observe that the obtained values of η are slightly more conservative (i.e., lower) for higher values of β. This observation is intuitive: to reduce our chance 1 − β of obtaining an incorrect bound on the (un)satisfaction probability, the value of η must be more conservative. Moreover, increasing the confidence probability β only marginally reduces the obtained lower bound η. For example, the obtained lower bound on the satisfaction probability for brp (256,5) with β = 0.9 is η = 0.07244, while for β = 0.9999, it is only reduced to η = 0.07036 (an absolute reduction of only 0.21 percentage points). This observation confirms the important result of Corollary 1: a high confidence probability β can typically be obtained without sacrificing the tightness of the obtained lower bound η.

Table 3: Average confidence probabilities β for different sample sizes N, and run times per 1 000 samples.

Table 4: Lower bounds η on the (un)satisfaction probability for N = 25 000 samples.
Recall that, based on Table 3, we can only confirm the validity of the lower bound η = 0.733 on the unsatisfying region for nand (10,5) with a confidence probability of β = 0.975. Interestingly, Table 4 shows that we can guarantee a marginally weaker lower bound of η = 0.73271 with a confidence probability β = 0.9999. In other words, by weakening our lower bound η by an almost negligible 0.0003, we increase the confidence in the result from 97.51% to a remarkable 99.99%. This highlights that the confidence probability β is highly sensitive to the chosen lower bound η, especially for large sample sizes N.

Discussion and Related work
The so-called parameter synthesis problem is to compute parameter values such that the induced nonparametric MDP satisfies the specification under some strategy. Most works on parameter synthesis focus on finding one parameter value that satisfies the specification. The approaches involve computing a rational function of the reachability probabilities [23,30,27,6], utilizing convex optimization [20,21], and sampling-based methods [18,41]. Deciding whether there exists a value in the parameter space that satisfies a reachability specification is ETR-complete [56], and finding a satisfying parameter value is exponential in the number of parameters.
The work in [4] considers the analysis of Markov models in the presence of uncertain rewards, utilizing statistical methods to reason about the probability of a parametric MDP satisfying an expected cost specification. This approach is restricted to reward parameters and does not explicitly compute confidence bounds. The work in [46] obtains data-driven bounds on the parameter ranges and then uses parameter synthesis techniques to validate properties for all parameter values in this range. Paper [39] computes bounds on the long-run probability of satisfying a specification with probabilistic uncertainty for Markov chains. Other related techniques include multi-objective model checking to maximize the average performance with probabilistic uncertainty sets [51], sampling-based methods which minimize the regret with uncertainty sets [1], and Bayesian reasoning to compute parameter values that satisfy a metric temporal logic specification on a continuous-time Markov chain (CTMC) [10]. Sampling-based methods similar to ours for verifying CTMCs with uncertain rates are developed in [5]. Finally, the work in [2] considers a variant of the problem in this paper where parameter values cannot be observed and thus must be learned. The paper formulates the strategy synthesis problem as a computationally harder partially observable Markov decision process (POMDP) synthesis problem and uses off-the-shelf point-based POMDP methods [44,17].
The works [47,57] consider the verification of MDPs with convex uncertainties. However, the uncertainty sets for different states in an MDP are restricted to be independent, which does not hold in our problem setting where we have parameter dependencies.
Uncertainties in MDPs have received quite some attention in the artificial intelligence and planning literature. Interval MDPs [47,29] use probability intervals for the transition probabilities. Dynamic programming, robust value iteration, and robust strategy iteration have been developed for MDPs with uncertain transition probabilities whose parameters are statistically independent, also referred to as rectangular, to find a strategy ensuring the highest expected total reward at a given confidence level [42,57]. The work in [55] slightly relaxes this independence assumption and, using conic programming, determines a strategy that satisfies a given performance with a pre-defined confidence, provided an observation history of the MDP is given. State-of-the-art exact methods can handle models of up to a few hundred states [35]. Multi-model MDPs [53] treat distributions over probability and cost parameters and aim at finding a single strategy maximizing a weighted value function. This problem is NP-hard for deterministic strategies and PSPACE-hard for history-dependent strategies.

Conclusion
We have presented a new sampling-based approach to analyze uncertain Markov models. Theoretically, we have shown how to effectively and efficiently bound the probability that any randomly drawn sample satisfies a temporal logic specification. Furthermore, we have shown the computational tractability of our approaches by means of well-known benchmarks and a new, dedicated case study. In the future, we plan to exploit our approaches for more involved models such as Markov automata [32]. Another line of future work will be a closer integration with a parameter synthesis framework.
A Proofs

In this appendix, we provide the proofs of our main theoretical contributions. Since our theorems and lemmas are tailored to specifications ϕ with comparison operator ≤, we also use this assumption throughout the proofs. The proofs for a specification ϕ with comparison operator ≥ are analogous, mirroring the difference between LPs (6) and (7): we maximize instead of minimize, and the operator in the constraint changes direction.

A.1 Proof of Lemma 1 and Theorem 1
We first prove Lemma 1, and then show that Theorem 1 follows directly. Let us rewrite LP (8) in a more compact way. To this end, for a parameter sample u ∈ U_N, let C_u denote the interval of values for τ for which constraint (8b) is satisfied. Note that for a specification ϕ with comparison operator ≤, we have C_u = [sol_M(u), +∞), i.e., C_u is lower bounded by the solution sol_M(u). Using C_u, we reformulate the scenario program (8) as the equivalent program

  minimize τ  subject to  τ ∈ ⋂_{u ∈ U_N} C_u.    (12)

Note that (12) is exactly in the form of the scenario program formulated in [13]. Let τ* denote the optimal value of the scenario program with respect to the sample set U_N, and let u be an independently sampled parameter from V_M according to P. Then, Theorem 2.4 of [13] bounds the cumulative distribution function of the probability that the set C_u associated with the sampled parameter u violates the optimal solution τ*, that is, τ* ∉ C_u:

  P^N { Pr{ u ∈ V_M | τ* ∉ C_u } > ε } ≤ ∑_{i=0}^{d−1} C(N, i) ε^i (1 − ε)^{N−i},    (13)

where ε ∈ (0, 1) bounds the violation probability, d is the number of decision variables of (12), and C(N, i) denotes the binomial coefficient. Since we have d = 1, and we are after the satisfaction probability (rather than the violation probability), we simplify (13) to

  P^N { Pr{ u ∈ V_M | τ* ∈ C_u } ≥ 1 − ε } ≥ 1 − (1 − ε)^N.    (14)

Let β = 1 − (1 − ε)^N, which implies that ε = 1 − (1 − β)^{1/N}, so that (14) becomes

  P^N { Pr{ u ∈ V_M | τ* ∈ C_u } ≥ (1 − β)^{1/N} } ≥ β.    (15)

Moreover, the event τ* ∈ C_u is equivalent to sol_M(u) ≤ τ*, so (15) reduces to

  P^N { Pr{ u ∈ V_M | sol_M(u) ≤ τ* } ≥ (1 − β)^{1/N} } ≥ β,    (16)

which equals the desired expression in (9) for Lemma 1.
Finally, we show that Theorem 1 follows from Lemma 1. Because τ* ≤ λ*(U_N), with λ*(U_N) the sample-dependent threshold of specification ϕ, the inner probability in (16) is a lower bound on the satisfaction probability F(M_P, ϕ):

  F(M_P, ϕ) ≥ Pr{ u ∈ V_M | sol_M(u) ≤ τ* }.    (17)

By combining (16) with (17), we find that

  P^N { F(M_P, ϕ) ≥ (1 − β)^{1/N} } ≥ β,    (18)

and thus the claim in Theorem 1 follows.
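As a sanity check of this argument (ours, not part of the paper), the guarantee (16) can be tested empirically: if the stand-in solution values are uniform on (0,1), then Pr{sol_M(u) ≤ τ*} equals τ* itself, and the event in (16) should occur with frequency (in fact, exactly) β:

```python
import random

random.seed(42)

N, beta, trials = 10, 0.9, 20_000

# Lower bound claimed by (16): with probability >= beta over the N samples,
# Pr{ sol_M(u) <= tau* } >= (1 - beta)**(1 / N).
threshold = (1 - beta) ** (1 / N)

# With uniform(0,1) stand-ins for sol_M(u), Pr{ sol_M(u) <= tau* } is tau*
# itself, so we simply count how often tau* = max exceeds the threshold.
hits = sum(max(random.random() for _ in range(N)) >= threshold
           for _ in range(trials))
print(hits / trials)  # empirically close to beta = 0.9: the bound is tight here
```

That the empirical frequency matches β exactly for continuous distributions reflects the exactness of the scenario bound of [13] for d = 1.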

A.2 Proof of Lemma 2 and Theorem 2
We first prove Lemma 2, and then show that Theorem 2 follows directly. We modify the scenario program (8) into a scenario program with discarded samples [15], which allows for the removal of undesirable constraints:

  minimize τ  subject to  sol_M(u) ≤ τ  for all u ∈ U_N \ Q,    (19)

where we introduced the sample removal set Q ⊆ {1, . . . , N}, which accounts for a subset of samples whose constraints have been discarded. We explicitly write the dependency of the optimal solution τ*_{|Q|} on the number of discarded samples |Q|. Critically, samples are removed based on the following rule:

Lemma 3 The sample removal set Q ⊆ {1, . . . , N} is obtained by iteratively removing the active constraints from (19), i.e., the samples u ∈ U_N for which sol_M(u) = τ*_{|Q|}.
Note that the active constraint may not be unique, e.g., if the solution sol M (u 1 ) = sol M (u 2 ) for u 1 = u 2 . In that case, a suitable tie-break rule may be used to select a constraint to discard, as discussed in [14].
The main difference between programs (19) and (8) is that instead of enforcing the constraint for every sample u ∈ U_N, we only enforce the constraints for the subset of samples u ∈ U_N \ Q. Based on the solution to (19), Theorem 2.1 of [15] bounds the cumulative distribution function of the violation probability, in a similar manner as the guarantee given by (13):

  P^N { Pr{ u ∈ V_M | τ*_{|Q|} ∉ C_u } > ε } ≤ ∑_{i=0}^{|Q|} C(N, i) ε^i (1 − ε)^{N−i},    (20)

where ε ∈ (0, 1) bounds the violation probability, d = 1 is the number of decision variables of (19), |Q| is the cardinality of the sample removal set, and C(N, i) denotes the binomial coefficient. As our goal is to bound the satisfaction probability (rather than the violation probability), we define t = 1 − ε and rewrite (20) as

  P^N { Pr{ u ∈ V_M | τ*_{|Q|} ∈ C_u } ≥ t } ≥ 1 − ∑_{i=0}^{|Q|} C(N, i) (1 − t)^i t^{N−i}.    (22)

We equate the right-hand side of (22) to 1 − (1 − β)/N, where β ∈ (0, 1) is a confidence probability (typically close to one), yielding

  P^N { Pr{ u ∈ V_M | τ*_{|Q|} ∈ C_u } ≥ t*(|Q|) } ≥ 1 − (1 − β)/N,    (23)

where t*(|Q|) is the solution to

  (1 − β)/N = ∑_{i=0}^{|Q|} C(N, i) (1 − t)^i t^{N−i}.    (24)

We divide the confidence level by N to account for all N possible values of |Q|, ranging from 0 to N − 1. The value of |Q| that is actually needed depends on the sample set at hand and is, therefore, not known a priori (i.e., before observing the actual samples). For brevity, denote by A_n the event that Pr{ u ∈ V_M | τ*_n ∈ C_u } ≥ t*(n).
The probability for this event to hold is P^N{A_n} ≥ 1 − (1 − β)/N, and its complement Ā_n has a probability of P^N{Ā_n} ≤ (1 − β)/N. Based on Boole's inequality, it holds that

  P^N { ⋃_{n=0}^{N−1} Ā_n } ≤ ∑_{n=0}^{N−1} P^N{ Ā_n } ≤ 1 − β.    (26)

Thus, the probability of the intersection of all events is

  P^N { ⋂_{n=0}^{N−1} A_n } = 1 − P^N { ⋃_{n=0}^{N−1} Ā_n } ≥ β.    (27)

In other words, the bounds on the satisfaction probability given by (23) hold simultaneously for all values |Q| = 0, . . . , N − 1 with a confidence probability of at least β. After observing the samples at hand, we determine the actual value of N_¬ϕ and plug it as |Q| into (23). The probability that this expression holds cannot be smaller than that of the intersection of all events in (27). Hence, we obtain

  P^N { Pr{ u ∈ V_M | τ*_{N_¬ϕ} ∈ C_u } ≥ t*(N_¬ϕ) } ≥ β.    (28)

Recall from Sect. 6.4 that τ+ = max_{u ∈ U_{N_ϕ}} sol_M(u), which is, by construction, equivalent to τ*_{|Q|} under |Q| = N_¬ϕ removed samples. Thus, (28) is equivalent to (10), and the definition of t*(|Q|) in (24) equals (11), so the claim in Lemma 2 follows.
Finally, we show that Theorem 2 follows from Lemma 2. Because τ+ ≤ λ by construction, the inner probability in (28) is a lower bound on the satisfaction probability F(M_P, ϕ):

  F(M_P, ϕ) ≥ Pr{ u ∈ V_M | sol_M(u) ≤ τ+ }.    (29)

By combining (28) with (29), we find that

  P^N { F(M_P, ϕ) ≥ t*(N_¬ϕ) } ≥ β,    (30)

and thus, we conclude the proof of Theorem 2.