Symbolically Quantifying Response Time in Stochastic Models using Moments and Semirings

Abstract. We study quantitative properties of the response time in stochastic models. For instance, we are interested in quantifying bounds such that a high percentage of the runs answers a query within these bounds. To study such problems, one could compute probabilities on a state space blown up by a factor depending on the bound, but this solution is not satisfactory when the bound is large. In this paper, we propose a new symbolic method to quantify bounds on the response time, using the moments of the distribution of simple stochastic systems. We prove that the distribution (and hence the bounds) is uniquely defined given its moments. We provide optimal bounds for the response time over all distributions having a given pair of these moments. We explain how to symbolically compute in polynomial time any moment of the distribution of response times using adequately defined semirings. This allows us to compute optimal bounds in parametric models and to reduce the complexity of computing optimal bounds in hierarchical models.


Introduction
Response time has lately been considered an important property of systems [8,15,21]. In this context, one does not simply want a query to be answered eventually, but to be answered in a reasonable amount of time. In the model-checking community, problems on response time have been studied mainly qualitatively, in the context of (pure, that is, non-stochastic) two-player games [8,21]. There, one looks for a strategy ensuring that the lim-sup of the response time is finite. This ensures that under this strategy, there will be a bound on the response time to any query. This has been extended in [15] to a quantitative setting, where one wants to optimize the mean response time in a pure two-player game.
In this paper, we consider stochastic systems. In such systems, the response time is a random variable, and it is unlikely to be bounded: even a single probabilistic loop on a reachable state makes the response time longer than T for a set of runs of small but positive probability, no matter the value of T. Instead, we propose to quantify such response times. One way to do so is to obtain the distribution of response times. Another way is to compute, for a probability 0 < p < 1, the bound T that is satisfied (by a set of runs) with probability at least 1 − p. In this paper, we tackle both problems. For that, we use the concept of moments of the distribution of response times, as described next.
phase-type distribution of order n. However, [20] does not help in characterizing bounds, as it does not ensure that a non-phase-type distribution cannot have exactly the same moments as a phase-type distribution, unlike our result.
Bounding the response time has also been studied in the performance evaluation community. Again, the methods used there are mostly numerical [6,19]. In [19] (pp. 68-69), a symbolic bound is also provided in the particular case of moments of order 1, 2 and 3. In [2], it is shown how to use the first two moments of response time across various components to compute general bounds, using techniques close to ours, but restricted to moments of order 1 and 2. In our paper, we provide optimal bounds for any pair of orders (i, j) ∈ N^2. Taking into account moments of orders i, j > 3 is important when the proportion of runs to answer is close to 1.
Last, computing moments finds other applications. For instance, in [4,7,14], complex functions describing the evolution of molecular species are approximated using the first k moments, for some k.

Probabilistic Automata
We first introduce a simple class of models, namely probabilistic automata (also called labeled discrete-time Markov chains), on which we can demonstrate our techniques. Later, we will extend our results to handle continuous time, considering Continuous-Time Markov Chains (CTMCs), as well as parametric and hierarchical systems.

Definition 1. A probabilistic automaton A over a finite alphabet Σ is a tuple (S, Pr, δ_0) where:
- S is a finite set of states,
- Pr : S × Σ × S → [0, 1] is a stochastic transition function such that for all s ∈ S, Σ_{a∈Σ, t∈S} Pr(s, a, t) = 1: the weights of transitions leaving s sum to 1,
- δ_0 : S → [0, 1] is the initial distribution over states, such that Σ_{s∈S} δ_0(s) = 1.
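A minimal data-structure sketch can make Definition 1 concrete. The encoding and function names below are ours (not from the paper), and the label of the self-loop on state 3 of Fig. 1 is not specified in the text, so we call it `tick`:

```python
from collections import defaultdict

# Hypothetical encoding of Definition 1: states are hashable labels and
# Pr is a map (source, letter, target) -> probability.
def is_probabilistic_automaton(states, alphabet, Pr, delta0, tol=1e-9):
    """Check the two stochasticity conditions of Definition 1."""
    out = defaultdict(float)
    for (s, a, t), p in Pr.items():
        if not (0.0 <= p <= 1.0) or a not in alphabet:
            return False
        out[s] += p
    # Outgoing weights of every state must sum to 1.
    if any(abs(out[s] - 1.0) > tol for s in states):
        return False
    # The initial distribution must sum to 1.
    return abs(sum(delta0.get(s, 0.0) for s in states) - 1.0) <= tol

# The automaton of Fig. 1: a query from 1 to 2, a 0.9 wait loop on 2,
# a 0.1 response to 3, and a sink loop on 3.
states = {1, 2, 3}
alphabet = {"query", "wait", "response", "tick"}
Pr = {(1, "query", 2): 1.0,
      (2, "wait", 2): 0.9, (2, "response", 3): 0.1,
      (3, "tick", 3): 1.0}
delta0 = {1: 1.0}
print(is_probabilistic_automaton(states, alphabet, Pr, delta0))  # prints True
```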
Example 1. The model depicted in Fig. 1 is a probabilistic automaton with 3 states {1, 2, 3}. There is a transition from 1 to 2 labeled query, with probability 1. From state 2, with probability 0.9 we stay in state 2 with a transition labeled wait, and with probability 0.1 we go to state 3 with a transition labeled response. We loop in state 3 with probability 1.

A finite sequence π = s_0, a_1, s_1, ..., a_n, s_n ∈ (SΣ)^n S is called a finite path starting from s_0 and ending in s_n, and we write t ∈ π if t = s_i a_{i+1} s_{i+1} for some i. We denote by |π| = n the length of the path π. For a path π_1 ending in s_n and a path π_2 starting from s_n, we can define the concatenated path π_1 · π_2, where the last node of π_1 and the first node of π_2 are merged. A path π_1 is a prefix of π if there exists a path π_2 such that π_1 · π_2 = π.
For a path π starting in a state s_0, we define P(π) = Π_{t∈π} Pr(t), the probability that a path with prefix π is executed from s_0. A path π is realizable if P(π) > 0.
Let s be a state, and let Π be a set of finite paths starting from s such that no path in Π is a prefix of another path in Π. Then the probability that a path starting from s has a prefix in Π is P(Π) = Σ_{ρ∈Π} P(ρ). We say that such a Π is disjoint.

Some labels of an automaton will be of particular interest concerning response time. Let Σ_Q ⊆ Σ be a subset of labels standing for queries, and Σ_R ⊆ Σ be a subset of labels standing for responses. For simplicity, we will assume that there is a unique query type Σ_Q = {q} and a unique response type Σ_R = {r}, with q ≠ r. We will also assume that there is no path with two (similar) queries q. To handle cases with several query/response types, it suffices, for each type, to consider only the queries and answers of that type and to disregard the other types.
Problem statement: We are interested in quantifying the time between queries and responses, called the response time, which is a random variable. One way to quantify it is to produce the distribution of response times, either for each transition labeled by a query, or averaged over these transitions, weighted by the probability to see each of these transitions. Another way is to answer model-checking questions such as: what is the smallest delay T such that the mass of paths unanswered after T units of time is smaller than some probability p?
To compute both the distribution and the delay T, we will use the so-called moments of the distribution of response times. The moment of order 1 is the mean value, and the moment of order 2 allows one to compute the standard deviation.

Symbolically computing moments using semirings
In this section, we define moments and explain how to compute them symbolically using appropriately-defined semirings.
Let X be the random variable of the response time. If all queries are answered, then X takes values in N; otherwise, X takes values in N ∪ {∞}. Let p(x) be the probability that the response is obtained x units of time after the query, that is, the probability that X = x. The function p is a distribution over response times, with Σ_x p(x) = 1.

Definition 2. For p : N → [0, 1] and n ∈ N, we define the n-th moment of p by µ_n(p) = Σ_{x∈N} p(x) · x^n.
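Definition 2 can be checked numerically on the running example. This is only an illustration (the paper's algorithm avoids enumerating the distribution): the response time of Fig. 1 is geometric with success probability 0.1, truncated here at a large horizon:

```python
def moment(p, n, support):
    """n-th moment of a discrete distribution p over the given support:
    mu_n = sum over x of p(x) * x**n (Definition 2)."""
    return sum(p(x) * x ** n for x in support)

# Response-time distribution of Fig. 1: geometric with success probability 0.1.
geo = lambda x: 0.9 ** (x - 1) * 0.1
mu1 = moment(geo, 1, range(1, 5000))
mu2 = moment(geo, 2, range(1, 5000))
print(round(mu1, 6), round(mu2, 6))  # close to 10 and 190
```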

Semirings associated with moments
We will compute moments of the distribution of response times by considering each query individually. We can then take, e.g., the average over all queries (as we assumed that there are no two queries on the same path). Thus, we first fix a state q, the target of a transition labeled by a query. State q symbolizes that a query has just been asked. We then let R be the set of target states of transitions labeled by a response. A state is in R if a response to this query has just been given. For instance, in Fig. 1, we have q = 2 and R = {3}.
We introduce a set of semirings that will allow us to compute symbolically the moment of order n of the distribution of response times to the query associated with state q, for all n ∈ N. We will compute the moment inductively on a disjoint subset Π of paths of A from q to R. For an integer n, we denote µ_n(Π) = Σ_{ρ∈Π} P(ρ) |ρ|^n. Let Path^R_q be the set of paths in the automaton A between q and the first occurrence of R. Notice that Path^R_q is disjoint. Thus, µ_n(Path^R_q) is the moment of order n of the distribution of response times to the query associated with state q. To avoid heavy notations, when R is reduced to one state t, we let Path^t_s be the set of paths between s and the first occurrence of t, and we denote µ_n(s, t) = µ_n(Path^t_s).

We now give some properties of µ. Let Π_1 be a set of paths ending in some state s, and let Π_2 be a set of paths starting from s. We denote by Π_1 · Π_2 the set of concatenated paths {ρ_1 · ρ_2 | ρ_1 ∈ Π_1, ρ_2 ∈ Π_2}.

Proposition 1. For all n, we have µ_n(Π_1 · Π_2) = Σ_{k=0}^{n} C(n, k) µ_k(Π_1) µ_{n−k}(Π_2), where C(n, k) denotes the binomial coefficient.

This property hints at a set of semirings (R, ⊕_n, ⊗_n, 0_n, 1_n) with good properties to compute moments. For (n + 1)-tuples (x_0, ..., x_n) and (y_0, ..., y_n), we define the operations ⊕_n and ⊗_n by (x ⊕_n y)_i = x_i + y_i and (x ⊗_n y)_i = Σ_{j=0}^{i} C(i, j) x_j y_{i−j}, for 0 ≤ i ≤ n. The neutral element for ⊕_n is 0_n = (0, ..., 0); 0_n is an annihilator for ⊗_n. The neutral element for ⊗_n is 1_n = (1, 0, ..., 0). In the following, we will denote the different laws and elements simply by ⊕, ⊗, 0 and 1.

Proposition 2. For n ≥ 0, (R^{n+1}, ⊕, ⊗, 0, 1) defines a commutative semiring.
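The two operations can be written down directly; the function names `splus` and `stimes` are ours. ⊕ handles disjoint unions of path sets (componentwise addition), while ⊗ handles concatenation, since (|ρ_1| + |ρ_2|)^i expands by the binomial theorem:

```python
from math import comb

def splus(x, y):
    """Componentwise addition: disjoint union of path sets."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def stimes(x, y):
    """Binomial convolution: concatenation of path sets."""
    n = len(x) - 1
    return tuple(sum(comb(i, j) * x[j] * y[i - j] for j in range(i + 1))
                 for i in range(n + 1))

ZERO = lambda n: (0.0,) * (n + 1)        # neutral for splus, annihilator for stimes
ONE = lambda n: (1.0,) + (0.0,) * n      # neutral for stimes

# Concatenating two one-step transitions of probability 1: the resulting
# paths have probability 1, length 2, squared length 4.
step = (1.0, 1.0, 1.0)   # (P, mu_1, mu_2) of a single unit-time transition
print(stimes(step, step))  # (1.0, 2.0, 4.0)
```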

Computations in a semiring
Following the Floyd-Warshall algorithm for summing weights of paths reaching a state, we will decompose Path^R_q inductively using the operations ∪ and ·. We will then use the semiring (R^{n+1}, ⊕, ⊗, 0, 1) to perform these computations inductively. The induction is over the number of states of S. Let G be a subset of S disjoint from R: G ∩ R = ∅. For all states s ∈ S \ R and t ∈ S, we define Path^t_s(G) as the set of paths from state s to state t using only states of G as intermediate states; the initial state s and the last state t are not constrained, even if s, t ∈ R or s, t ∉ G. For a set of paths Π, we define w_n(Π) = (P(Π), µ_1(Π), ..., µ_n(Π)).

Let g ∈ G be a state of G. A path ρ in Path^t_s(G) has two possibilities: either it does not use g, or it uses g one or several times. We deduce the inductive formula:

Path^t_s(G) = Path^t_s(G \ {g}) ∪ Path^g_s(G \ {g}) · (Path^g_g(G \ {g}))^* · Path^t_g(G \ {g}).

Proof (sketch). If ρ does not use g, then ρ is in Path^t_s(G \ {g}). Otherwise, ρ can be expressed as ρ_0 · ρ_1 · ... · ρ_k with ρ_0 ∈ Path^g_s(G \ {g}), ρ_k ∈ Path^t_g(G \ {g}), and for all 0 < j < k, ρ_j ∈ Path^g_g(G \ {g}).

In order to use this formula, we need to compute w^*_n(Path^g_g(G \ {g})), which represents what happens along the cycles from g to g. Let (g, Π) be a pair with g a state and Π a set of paths (cycles) using g exactly twice: as the first state and as the last state. The pair (g, Path^g_g(G \ {g})) satisfies this property. We define w^*_n(Π) = ⊕_{k∈N} w_n(Π)^{⊗k}; notice that the set Π^k of concatenations of k cycles of Π is disjoint. We show that w^*_n(Π) is defined in most cases, namely when P(Π) < 1.
Proposition 4. Let Π be a set of paths using state g exactly twice, as first and last state. If P(Π) < 1, then w^*_n(Π)[0] = 1/(1 − P(Π)), and for i > 0, w^*_n(Π)[i] = (1/(1 − P(Π))) Σ_{j=1}^{i} C(i, j) w_n(Π)[j] · w^*_n(Π)[i − j].

Notice that P(Π) = 1 describes cases where s cannot reach t (as t ∉ G, if P(Path^g_g(G)) = 1, it would mean that every path reaching g stays in G forever, and in particular never meets t). Thus, we first compute the set of states S_1 from which there exists a path to R. Notice that for each set Π of cycles on a state g ∈ S_1 \ R, we have P(Π) < 1, because there is a positive probability to reach R from g, which is not captured by the paths in Π.
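Proposition 4 translates directly into code: the recurrence solves s = 1 ⊕ (x ⊗ s) component by component. The function name `star` is ours:

```python
from math import comb

def star(x):
    """Kleene star w*: sum over k of x^{(tensor) k}, defined when
    x[0] = P(Pi) < 1 (Proposition 4)."""
    assert x[0] < 1.0, "star is only defined when P(Pi) < 1"
    n = len(x) - 1
    s = [1.0 / (1.0 - x[0])]
    for i in range(1, n + 1):
        s.append(sum(comb(i, j) * x[j] * s[i - j] for j in range(1, i + 1))
                 / (1.0 - x[0]))
    return tuple(s)

# The 0.9 wait-loop of Fig. 1: a one-step cycle of probability 0.9.
print(star((0.9, 0.9, 0.9)))  # approximately (10.0, 90.0, 1710.0)
```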

A symbolic algorithm
From the inductive formulas for computing sets of paths from subsets of paths, and for computing w^*_n(Π)[i] from w^*_n(Π)[j] for j < i, we deduce Algorithm 1, following the ideas of Floyd-Warshall: we incrementally add non-response states from S_1 \ R, which can then be used as intermediate states. Notice that states in S \ S_1 cannot reach R anyway. This algorithm is symbolic (or algebraic) in that every constant (e.g., Pr(s, a, t)) can be replaced by a variable (see, e.g., Section 4.2).
Theorem 1. Let A = (S, Pr, δ_0) be a probabilistic automaton. One can compute µ_i(s, t) for all i ≤ n and all s, t ∈ S in time O(n^2 × |S|^3).
Proof. In Algorithm 1, after running the outer for-loop on g_1, ..., g_j, we have w(s, t) = w_n(Path^t_s({g_1, ..., g_j})) for all s, t. To obtain µ_i(s, t) for all i ≤ n, it suffices to run Algorithm 1 inductively on the moments of order 1, ..., n. Computing w^*_n[i](s, t) in the inner for-loop takes time O(i), as w_n[j](s, t) = w_j[j](s, t) has already been computed inductively for all j < i. This yields the overall complexity O(Σ_{i≤n} i × |S|^3) = O(n^2 × |S|^3).

Now, for each query q, we have µ_i(Path^R_q) = Σ_{r∈R} µ_i(q, r), as Path^{r_1}_q and Path^{r_2}_q have no path prefix of each other for r_1 ≠ r_2, r_1, r_2 ∈ R. The moment of order n of the distribution of response times of q is formally ∞ if µ_0(Path^R_q) < 1 (there is a positive probability to never answer q, that is, to have an infinite response time), and µ_n(Path^R_q) otherwise.

Example 2. For the example of Fig. 1, unfolding the algorithm for n = 2 (that is, for the probability and the moments of order 1 and 2) gives, after initialization: w(1, 2) = (1, 1, 1), w(2, 2) = (0.9, 0.9, 0.9), w(2, 3) = (0.1, 0.1, 0.1), and w(1, 3) = (0, 0, 0), as there is no direct transition from state 1 to state 3.
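A full elimination can be sketched as follows. This is our reconstruction of Algorithm 1, not the paper's verbatim pseudocode; on the initialization of Example 2 it recovers the moments of the geometric response time of Fig. 1:

```python
from math import comb

def moments_all_pairs(all_states, elim_states, w, n):
    """Floyd-Warshall-style elimination (a sketch of Algorithm 1): w maps
    (s, t) to (P, mu_1, ..., mu_n) of the direct transitions and is updated
    so that, after eliminating elim_states, w[s, t] summarizes all paths
    from s to the first occurrence of t with intermediates in elim_states.
    Assumes every eliminated state can reach a target, so stars are defined."""
    def oplus(x, y):   # disjoint union of path sets
        return tuple(a + b for a, b in zip(x, y))

    def otimes(x, y):  # concatenation of path sets (binomial convolution)
        return tuple(sum(comb(i, j) * x[j] * y[i - j] for j in range(i + 1))
                     for i in range(n + 1))

    def star(x):       # Kleene star, defined when x[0] < 1 (Proposition 4)
        s = [1.0 / (1.0 - x[0])]
        for i in range(1, n + 1):
            s.append(sum(comb(i, j) * x[j] * s[i - j] for j in range(1, i + 1))
                     / (1.0 - x[0]))
        return tuple(s)

    zero = (0.0,) * (n + 1)
    for g in elim_states:
        loop = star(w.get((g, g), zero))
        row_g = {t: w.get((g, t), zero) for t in all_states}  # snapshot
        for s in all_states:
            sg = w.get((s, g), zero)
            for t in all_states:
                if t == g:
                    continue  # paths to the *first* occurrence of g avoid g
                via = otimes(otimes(sg, loop), row_g[t])
                w[s, t] = oplus(w.get((s, t), zero), via)
    return w

# Fig. 1, initialized as in Example 2 (n = 2); states 1 and 2 are eliminated.
w = {(1, 2): (1.0, 1.0, 1.0), (2, 2): (0.9, 0.9, 0.9), (2, 3): (0.1, 0.1, 0.1)}
w = moments_all_pairs([1, 2, 3], [1, 2], w, 2)
print(tuple(round(v, 6) for v in w[2, 3]))  # (1.0, 10.0, 190.0)
print(tuple(round(v, 6) for v in w[1, 3]))  # (1.0, 11.0, 211.0)
```

The output matches the exact values: from state 2, the response time is geometric with mean 10 and second moment 190; from state 1, one extra unit of time shifts these to 11 and 1 + 2·10 + 190 = 211.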

Extension to continuous time
We now extend the symbolic computation of moments to Continuous-Time Markov Chains (CTMCs). In order to stay as close as possible to the setting of probabilistic automata, we use the sojourn-time representation of CTMCs. This representation is fully equivalent to the more usual representation of CTMCs with transition rates; see Chapter 7.3 of [9].

Definition 3. A CTMC is a tuple (S, Pr, δ_0, (λ_s)_{s∈S}) where:
- (S, Pr, δ_0) is a probabilistic automaton, and
- for all s, λ_s is the sojourn parameter associated with state s. That is, the probability density function of the sojourn time in s is X_s(t) = λ_s e^{−λ_s·t}, and the probability to stay in s at least t units of time is e^{−λ_s·t}.
In this continuous context, we need integrals instead of sums to define the i-th moment of a variable X: µ_i(X) = ∫_0^∞ t^i f_X(t) dt, where f_X is the density of X. We can easily extend the computation of moments to CTMCs. The inductive formulas for the probabilities and moments of the reaching-time distribution remain unchanged. We only need to change the definition of the moments of every transition, which is the input of the initialization phase of Algorithm 1: for all s, t ∈ S, we set w_n(s, t) to be (w_0(s, t), w_1(s, t), ..., w_n(s, t)), where w_0(s, t) = Σ_{a∈Σ} Pr(s, a, t) and w_i(s, t) accounts for the i-th moment of the exponential sojourn time in s.

Theorem 2. Let A = (S, Pr, δ_0, (λ_s)_{s∈S}) be a CTMC. One can compute µ_i(s, t) for all i ≤ n and all s, t ∈ S in time O(n^2 × |S|^3).
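For an exponential sojourn with parameter λ, the i-th moment is i!/λ^i, so the initialization can be sketched as below. The function name is ours, and the closed form for w_i is our reading of the truncated display:

```python
from math import factorial

def ctmc_edge_weights(pr_out, lam, n):
    """Initial (w_0, ..., w_n) for a CTMC transition bundle out of a state s
    with sojourn parameter lam: the i-th moment of Exp(lam) is i!/lam**i, so
    w_i(s, t) = (sum over letters a of Pr(s, a, t)) * i!/lam**i.
    pr_out is that total probability."""
    return tuple(pr_out * factorial(i) / lam ** i for i in range(n + 1))

# State 2 of Fig. 1 read as a CTMC with sojourn parameter 2: the wait loop
# keeps probability 0.9 but now carries the moments of an Exp(2) sojourn.
print(ctmc_edge_weights(0.9, 2.0, 2))  # (0.9, 0.45, 0.45)
```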

Uniqueness of distribution, parameters and hierarchy
In this section, we present cases where having a symbolic algorithm enables efficient techniques, compared to numerical methods. We start with hierarchical systems, which are a way to describe systems compactly. Then, we present the possibility to work on systems with parameters. Finally, thanks to the symbolic expression of moments, we prove that there is a unique distribution having the moments of a distribution of reaching times of a (continuous-time) Markov chain.

Hierarchical Probabilistic Automata
We use notations mainly from [3] to describe hierarchical structures.

Definition 4. A hierarchical probabilistic automaton (HPA) A over a finite alphabet Σ is a tuple of n modules (S_i, Pr_i, λ_i, s^0_i, s^f_i)_{1≤i≤n} where, for all i:
- S_i is the finite set of states of module i, s^0_i ∈ S_i is the initial state of module i, and s^f_i is the final state of module i,
- Pr_i is the stochastic transition function of module i,
- λ_i : S_i → {i + 1, ..., n} is a partial mapping associating some states of module i with deeper modules.
Intuitively, the system starts in module 1, in state s^0_1. Each time a state s ∈ S_i associated with a module j > i (that is, λ_i(s) = j) is entered by a transition t → s, the system goes to state s^0_j and stays in S_j till s^f_j is reached, at which point it comes back to state s and takes a transition out of s (according to the probability distribution from s). This process can be repeated from any state in a module i to any module j, as long as j > i.
To define the semantics of (S_i, Pr_i, λ_i, s^0_i, s^f_i)_{1≤i≤n} formally, we inductively replace the states associated with the deepest module by its definition. Indeed, nodes of the deepest module are not associated with any module, by definition. Once every module has been replaced, a (flat) probabilistic automaton with the intended semantics is obtained. Hence, HPAs have the same expressive power as probabilistic automata. Yet, they may be much more compact: we denote by |A| the size of the description of the hierarchical automaton and by ||A|| the size of the unfolded automaton. The interest of such a description is that it may be exponentially smaller than the unfolded automaton, as depicted in Fig. 2: here, every module contains two copies of the next module, with the exception of the last one. While the number of states in the description is linear (4n), the number of states in the unfolded automaton is exponential in n.

The symbolic Algorithm 1 is naturally modular, in that computations on a module used several times can be performed only once, by considering the states of the deepest module first. Indeed, a module can be summarized by three information items: the probability (and moments) to answer the query in this module, the probability (and moments) to leave this module without answering the query, and the probability to stay forever in this module without answering the query. This information can then be used for shallower modules: every time a state s in a module i is associated with the deepest module, it can be replaced by a small set of states containing all the relevant information about the deepest module (computed only once). This process can then be repeated to eliminate modules recursively. This leads to a complexity in the small size |A| of the compact HPA representation rather than in the large size ||A|| of the unfolded PA:

Theorem 3. Let A be an HPA with k modules of size at most m. The n first moments of the distribution associated with A can be computed in time O(n^2 × k × m^3).
Not only does Theorem 3 reduce the complexity for hierarchical representations with redundancy (O(n^2 × k) for the example in Fig. 2, instead of O(n^2 × 2^{3k}) when running the algorithm of [13] on the equivalent flat PA), it also gives a better complexity on structures without redundancy. Consider the example in Fig. 3, without redundancy, whose unfolded PA has 3k + 1 states. Theorem 3 takes time O(n^2 × k × 3^3), while the algorithm of [13] on the equivalent flat PA would take time O(n^2 × (3k)^3).
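The reuse of a module summary can be sketched as follows. The scenario is hypothetical (a module whose traversal time happens to be geometric with parameter 0.1): the summary tuple is computed once, and sequential reuse is just ⊗ in the moment semiring, with no need to re-run the algorithm on the unfolded automaton:

```python
from math import comb

def otimes(x, y):
    """Sequential composition in the moment semiring (binomial convolution)."""
    n = len(x) - 1
    return tuple(sum(comb(i, j) * x[j] * y[i - j] for j in range(i + 1))
                 for i in range(n + 1))

# Summary (P, mu_1, mu_2) of a hypothetical module, computed once.
summary = (1.0, 10.0, 190.0)
# The module used twice in sequence: moments of the sum of two traversals.
two_copies = otimes(summary, summary)
print(two_copies)  # (1.0, 20.0, 580.0)
```

The result matches the moments of a sum of two independent traversals: the mean doubles to 20, and the second moment is 190 + 2·10·10 + 190 = 580.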

Parametric systems
Another case where having a symbolic algorithm is helpful is when the system has parameters standing for probability values (see for instance Fig. 4, where p is such a parameter). We illustrate two cases here. The first case is when parameters help with redundancy. Often, stochastic systems reuse the same constructions, but with different probability values. This would naturally be encoded as a module M of a hierarchical system using a set of parameters P. This module M would be used several times, with different values of the parameters specified in each module using it.
In this case, one can run Algorithm 1 on M, using the parameter values literally in the equations. This yields rational functions f_n of the parameters in [0, 1]^P expressing the moments of order n for module M, for all n. For instance, with the example of Fig. 4, the probability to reach state 4 from state 1 is equal to (2p + 4)/(5p + 4), and the mean time is equal to (112 + 44p − 12p^2)/((5p + 4)(2p + 4)). Each time module M is used, f_n can be evaluated using the values of the parameters P for this particular usage.
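A small sketch of this idea on a hypothetical one-parameter module (a loop taken with probability 1 − p, exited with probability p): the mean exit time is the rational function 1/p, and running the semiring computation with exact rational arithmetic evaluates this function exactly at any parameter value:

```python
from fractions import Fraction
from math import comb

def star(x):
    """w* in the moment semiring, with exact rational arithmetic."""
    n = len(x) - 1
    s = [Fraction(1) / (1 - x[0])]
    for i in range(1, n + 1):
        s.append(sum(comb(i, j) * x[j] * s[i - j] for j in range(1, i + 1))
                 / (1 - x[0]))
    return tuple(s)

def otimes(x, y):
    n = len(x) - 1
    return tuple(sum(comb(i, j) * x[j] * y[i - j] for j in range(i + 1))
                 for i in range(n + 1))

def mean_exit_time(p):
    loop = (1 - p, 1 - p)   # (P, mu_1) of the one-step cycle
    exit_ = (p, p)          # (P, mu_1) of the exit transition
    return otimes(star(loop), exit_)[1]

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(4, 5)):
    print(p, mean_exit_time(p))  # mean is exactly 1/p for each value of p
```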
Another possible use of parameters is to model uncertainty on values. In the example of Fig. 4, we may not know exactly the value of parameter p, but only know that it is above 0.8. In this case, one may be interested in synthesizing the largest (resp. smallest) moment of order n which is smaller (resp. larger) than the moment of any system realizing the parametric system, that is, where p is replaced by any value above 0.8. This will be particularly interesting in the next section, discussing bounds. To do so, one can use the rational function f_n and compute its minimal and maximal values (e.g., differentiating it and looking for zeros with Newton's method). In this way, we also obtain the best/worst value for p.

Uniqueness of the distribution
Last, we use the symbolic expression of moments obtained in Section 3 in order to prove the uniqueness of the distribution having the moments of the first-passage-time distribution of a (continuous-time) Markov chain. Thus, this distribution is exactly the distribution of response times of the system considered.
Notice that, in general, there may be several distributions that correspond to a given sequence of moments (µ_n)_{n∈N}. This would compromise approximating the distribution using moments, as there would not be a unique such distribution.
Example 3. Let us consider a distribution δ on R_+. If δ has the sequence of moments {µ_n = n! | n ∈ N}, then δ is the exponential distribution with parameter 1. Similarly, the sequence of moments {µ_n = (2n)! | n ∈ N} for a distribution on R_+ is characteristic of the square of the exponential distribution of parameter 1. Now, consider the cube of the exponential distribution of parameter 1. Its sequence of moments is {µ_n = (3n)! | n ∈ N}. However, there exists an infinite number of distributions with this sequence of moments [18].

We now answer positively the Stieltjes moment problem for the case of the distribution of response times in a (continuous-time) Markov chain: its sequence of moments satisfies Carleman's condition from 1922, which guarantees the uniqueness of the distribution. The condition is that Σ_{n≥1} µ_n^{−1/(2n)} = +∞.

Theorem 4. Let A be a probabilistic automaton or a CTMC. For all n ∈ N, let µ_n be the moment of order n of the times of first passage in a set of states R of A. Then there exists a unique distribution δ such that µ_n(δ) = µ_n for all n ∈ N.
Sketch of proof: We first consider CTMCs where all states have the same sojourn parameter λ. Then, a path that uses i transitions to answer a query follows the gamma distribution with parameters (i, λ). We have a symbolic expression for the moments of this distribution thanks to Section 3. This can be used to lower-bound the terms µ_n^{−1/(2n)} of Carleman's sum by the terms of a diverging sum. For general CTMCs, we use the fact that the moments of Γ(i, λ) decrease when λ increases. This allows us to lower-bound the Carleman sum of the CTMC considered by the Carleman sum of the CTMC where all sojourn parameters are replaced by the smallest sojourn parameter λ, hence the divergence.
The case of probabilistic automata is simpler. We show how this theorem allows us to approximate the distribution δ in the next subsection.

A sequence of distributions converging towards δ
Since we have uniqueness of the distribution corresponding to the sequence of moments of the distribution of response times of a probabilistic automaton, we obtain the following convergence in law:

Proposition 5 ([17]). Let δ be the distribution of response times of a probabilistic automaton. Let (δ_i)_{i∈N} be a sequence of distributions on R_+ such that for all n, lim_{i→∞} µ_n(δ_i) = µ_n(δ). Then, if C_i is the cumulative distribution function of δ_i and C the cumulative distribution function of δ, we have for all x: lim_{i→∞} C_i(x) = C(x).
Thus, C can be approximated by taking a sequence (δ_n)_{n∈N} of distributions such that for all i ≤ n, µ_i(δ_n) = µ_i(δ). A reasonable choice for δ_n is the distribution of maximal entropy corresponding to the moments µ_1, ..., µ_n, as presented in [11]. The distribution of maximal entropy can be understood as the distribution that assumes the least information. It can be approximated as closely as desired; for instance, one can take δ_n to be 1/n-close to the distribution of maximal entropy having moments (µ_1(δ), ..., µ_n(δ)). Applying Prop. 5, we thus obtain that the cumulative distribution function associated with δ_n converges towards the cumulative distribution function associated with δ.

Bounding the response time
We now explain how to use moments in order to obtain optimal bounds on the response time. First, notice that as soon as there exists a loop between a query and a response (as in Fig. 1), there will be runs with arbitrarily long response times, although there might be probability 1 to eventually answer every query (which is the case in Fig. 1). We thus turn to a more quantitative evaluation of the response time.
Let 0 < p < 1. We are interested in a bound T on the delay between a query and a response such that more than 1 − p of the queries are answered before this bound. For a distribution δ : R_+ → R_+ of response times, we denote by B(δ, p) the lowest T such that the probability to have a response time above T is lower than p. Equivalently, we look for the highest T such that the probability of a response time above T is at least p.
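For an integer-valued distribution whose tail is known in closed form, B(δ, p) can be computed exactly; this is an illustration on the running example, where the tail of the geometric response time of Fig. 1 is P(X > T) = 0.9^T:

```python
def quantile_bound(tail, p, horizon=10_000):
    """Smallest integer T with tail(T) = P(X > T) <= p, i.e. B(delta, p)
    for an integer-valued response-time distribution given by its tail."""
    for T in range(horizon + 1):
        if tail(T) <= p:
            return T
    return None  # not reached within the horizon

tail = lambda T: 0.9 ** T  # Fig. 1: geometric with success probability 0.1
print(quantile_bound(tail, 0.1))   # 22
print(quantile_bound(tail, 0.01))  # 44
```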
We place ourselves in the general setting of continuous distributions, where Dirac delta functions are allowed for simplicity. Discrete distributions form a special case, with delta functions at integer values. One could get rid of Dirac delta functions by ε-approximating them without changing the moments, obtaining the same bounds as we prove here.

Tchebychev bounds associated with one moment
Let i ∈ N and µ_i > 0. We let ∆_{i,µ_i} be the set of distributions of response times which have µ_i as moment of order i. We are interested in bounding B(δ, p) for all δ ∈ ∆_{i,µ_i}, that is, for all distributions with µ_i as moment of order i. Such a bound is provided by the Tchebychev inequality, and it is optimal: for all δ ∈ ∆_{i,µ_i}, B(δ, p) ≤ α_i(µ_i, p) = (µ_i/p)^{1/i}.

Proof. It suffices to remark that µ_i ≥ p b^i for any b such that the probability of a response time above b is at least p. Further, this bound is trivially optimal: it suffices to consider a distribution with a Dirac of mass 1 − p at 0 and a Dirac of mass p at α_i(µ_i, p).
Given a probabilistic automaton, let δ be its associated distribution of response times. We can compute its moments µ_i using Algorithm 1, described in the previous section. We thus know that δ ∈ ∆_{i,µ_i}. Given different values of i, one can compute the different moments, apply the Tchebychev bound to each of them, and use the minimal bound obtained.
Understanding the relationship between the α_i is thus important. For i < j, one can use Jensen's inequality for the convex function f : x ↦ x^{j/i} over R_+, and obtain (µ_i)^j ≤ (µ_j)^i. For instance, µ_1^2 ≤ µ_2. For p = 1, this gives α_i(p = 1) ≤ α_j(p = 1). On the other hand, for p sufficiently close to 0, we have α_j(p) < α_i(p). That is, when p is very small, moments of high orders will give better bounds than moments of lower orders. On the other hand, if p is not that small, moments of small orders will suffice.
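This trade-off is easy to observe numerically on the running example; the moments below are those of the geometric response time of Fig. 1 (the third moment, 5410, is our own computation for this geometric distribution):

```python
def alpha(mu_i, i, p):
    """Tchebychev-style bound from a single moment: any distribution with
    i-th moment mu_i satisfies B(delta, p) <= (mu_i / p) ** (1 / i)."""
    return (mu_i / p) ** (1.0 / i)

# Moments of the Fig. 1 response time (geometric with parameter 0.1).
mu = {1: 10.0, 2: 190.0, 3: 5410.0}
for p in (0.1, 0.01):
    print(p, [round(alpha(mu[i], i, p), 2) for i in (1, 2, 3)])
```

For p = 0.1 the bounds are roughly 100, 43.6 and 37.8; for p = 0.01 they are roughly 1000, 137.8 and 81.5, so higher-order moments pay off more as p shrinks. For comparison, the exact values of B(δ, p) for this distribution are 22 and 44 respectively, so the single-moment bounds are conservative but cheap.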

Optimal bounds for a pair of moments
We now explain how to extend Tchebychev bounds to pairs of moments: we consider the set of distributions where two moments are fixed. Let i < j be two orders of moments and µ_i, µ_j > 0. We denote by ∆^{j,µ_j}_{i,µ_i} the set of distributions with µ_i, µ_j as moments of order i, j respectively. As ∆^{j,µ_j}_{i,µ_i} is strictly included in ∆_{i,µ_i} and in ∆_{j,µ_j}, min(α_i(p), α_j(p)) is a bound for any δ ∈ ∆^{j,µ_j}_{i,µ_i}. However, it may be the case that min(α_i(p), α_j(p)) is not optimal. We now provide optimal bounds α^j_i(p) for any pair i < j of orders of moments and probability p.

Theorem 5. Let i < j be natural integers, p ∈ (0, 1), and let µ_i, µ_j > 0. Let α_i = (µ_i/p)^{1/i} and α_j = (µ_j/p)^{1/j}. We define α^j_i(p) by a case distinction, where in the non-trivial case 0 ≤ M ≤ µ_j is the smallest positive real root of a polynomial equation in µ_i, µ_j and p. For all δ ∈ ∆^{j,µ_j}_{i,µ_i}, we have B(δ, p) ≤ α^j_i(p), and there exists δ ∈ ∆^{j,µ_j}_{i,µ_i} with B(δ, p) = α^j_i(p).

To obtain a value for M, one can use for instance Newton's method. For i = 1, j = 2, we can compute M explicitly and obtain a closed-form bound.

Example 4. Consider the distribution associated with the system of Fig. 1. We computed the bounds α_i(p) and α^i_{i−1}(p) for different values of p and i. For p = 0.1, it is not useful to consider moments of order higher than 3. On the other hand, for p = 0.01, the moment of order 5 provides better bounds than moments of lower orders.
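The explicit case (i, j) = (1, 2) can be sketched with the classical one-sided Chebyshev (Cantelli) inequality. This is our assumption of a standard closed form, not a reproduction of the theorem's lost display; it is the optimal bound over all distributions with the given mean and second moment whenever the extremal two-point distribution stays on R_+:

```python
def pair_bound_12(mu1, mu2, p):
    """One-sided Chebyshev (Cantelli) bound from the first two moments:
    B <= mu1 + sqrt(var * (1 - p) / p), with var = mu2 - mu1**2.
    The witness puts mass 1-p at mu1 - sqrt(var * p / (1 - p)) and mass p
    at the bound; it is a valid distribution on R+ when the lower point
    is nonnegative."""
    var = mu2 - mu1 ** 2
    return mu1 + (var * (1 - p) / p) ** 0.5

# Fig. 1: mu_1 = 10, mu_2 = 190 (variance 90), p = 0.1.
b = pair_bound_12(10.0, 190.0, 0.1)
print(round(b, 2))  # 38.46
```

On the running example this pair bound (about 38.5) improves on both single-moment bounds α_1(0.1) = 100 and α_2(0.1) ≈ 43.6.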
For hierarchical systems, one can compute the moments in an efficient way using Theorem 3, and then use Theorem 5 to obtain the associated optimal bounds. In order to handle parametric systems, we use the following result (Lemma 1), which allows us to underapproximate the value of M, and thus overapproximate the optimal bound, by iterating an operator f from x = 0: the sequence (f^n(0))_{n∈N} is strictly increasing and converges towards M.
We show how to ε-approximate the optimal bound B of a parametric probabilistic automaton A with set of parameters P, that is, such that for all val ∈ V^P, the probabilistic automaton A with valuation val for the parameter values has a bound b(val) ≤ B, and there exists a val ∈ V^P such that b(val) = B. First, we obtain the moments as symbolic functions of the parameters using Section 4.2. Then, we compute M_1 = f(0) as a function of the parameters, using Lemma 1 and replacing µ_i, µ_j by their expressions. One can then compute the minimum m_1 of the function M_1 over all the parameters. We then proceed with M_2 = f(m_1), and so on, till obtaining a value m. This allows us to obtain a lower bound m over the values of M for all parameter values. Computing the largest µ_j over all parameters then allows us to obtain an upper bound B_up. A lower bound B_lw is easily obtained by considering the value ≥ m of M for the parameters maximizing µ_j. If the distance between B_up and B_lw is larger than ε, one can partition the space of parameter values into zones and proceed in the same way on each zone, forgetting the zones for which B_up is lower than the B_lw of another zone, till the distance between max(B_lw) and max(B_up) over the zones is smaller than ε.

Conclusion
In this paper, we have shown how to compute moments symbolically for probabilistic automata and CTMCs, using adequately defined semirings. This method has the same complexity as the efficient numerical methods already known [13]. The proof of this symbolic computation allows proving that there is a unique distribution of response times corresponding to a probabilistic automaton or a CTMC. This yields a simple approximation scheme converging in law towards the distribution of response times. The symbolic computation of moments also allows computing moments in compact (hierarchical) models faster, as well as finding the lowest/highest values of moments in parametric systems.
We also provide optimal bounds on the delay after which very few queries remain unanswered. The bound is optimal when considering distributions displaying a given pair of moments, and we showed on a simple example how this improves Tchebychev bounds. This can be used efficiently to obtain bounds for compact (hierarchical) models, or to compute an optimal bound which guarantees a response to almost all queries even for systems where some parameter values are not known exactly.
Proof. It is clear that (R^{n+1}_+, ⊕, 0) is a commutative monoid. Associativity and commutativity of (R^{n+1}_+, ⊗, 1) come from the symmetric roles of the x_i and y_i in ⊗. Thus, we have to prove that ⊗ is distributive over ⊕. Since ⊗ is commutative, we only have to prove that for all x, y, z ∈ R^{n+1}_+, (x ⊗ y) ⊕ (x ⊗ z) = x ⊗ (y ⊕ z). For i ≥ 0, we check the i-th component: ((x ⊗ y) ⊕ (x ⊗ z))[i] = Σ_{j=0}^{i} C(i, j) x_j y_{i−j} + Σ_{j=0}^{i} C(i, j) x_j z_{i−j} = Σ_{j=0}^{i} C(i, j) x_j (y_{i−j} + z_{i−j}) = (x ⊗ (y ⊕ z))[i].

Proof. We have the inductive formula Path^t_s(G) = Path^t_s(G \ {g}) ∪ Path^g_s(G \ {g}) · (Path^g_g(G \ {g}))^* · Path^t_g(G \ {g}). By using Proposition 1, we deduce that w_n(Path^t_s(G)) = w_n(Path^t_s(G \ {g})) ⊕ w_n(Path^g_s(G \ {g})) ⊗ w^*_n(Path^g_g(G \ {g})) ⊗ w_n(Path^t_g(G \ {g})). Notice that by associativity, the second part of the sum can be grouped in any order.

Proposition 4. Let Π be a set of paths using state g exactly twice, as first and last state. If P(Π) < 1, then w^*_n(Π)[0] = 1/(1 − P(Π)), and for i > 0, w^*_n(Π)[i] = (1/(1 − P(Π))) Σ_{j=1}^{i} C(i, j) w_n(Π)[j] · w^*_n(Π)[i − j].

Proof. Let x = w_n(Π), and denote by x_{(m,i)} the i-th component of x^{⊗m}; in particular, for all i, x_{(1,i)} = x_i. First, let us prove by induction on i that x_{(m,i)} → 0 as m → ∞. It is true for i = 0, as x_{(m,0)} = x_0^m → 0 since 0 ≤ x_0 < 1. Assume by induction that it is true for all j < i. Then x_{(m,i)} = Σ_{j=0}^{i} C(i, j) x_j x_{(m−1,i−j)} = x_0 x_{(m−1,i)} + Σ_{j=1}^{i} C(i, j) x_j x_{(m−1,i−j)}, where the second summand converges towards 0 by the induction hypothesis. Since x_0 < 1, it follows that x_{(m,i)} converges to 0.
Then, define S_m = ⊕_{k=0}^{m} R_k, with R_0 = 1 and R_k = x^{⊗k}. We write S_m = (z_{(m,0)}, ..., z_{(m,n)}). Let us prove that the sequence z_{(m,i)} converges, by induction on i. For i = 0, it is easy to see that lim_{m→∞} z_{(m,0)} = 1/(1 − x_0). Assume by induction that for all j < i with i > 0, the sequence z_{(m,j)} converges; we denote its limit x^*_{|j}. For i > 0, we have z_{(m,i)} = Σ_{j=0}^{i} C(i, j) x_j z_{(m−1,i−j)} = x_0 z_{(m−1,i)} + Σ_{j=1}^{i} C(i, j) x_j z_{(m−1,i−j)}, where the second summand converges by the induction hypothesis. Since x_0 < 1, the sequence z_{(m,i)} converges, and passing to the limit we obtain x^*_{|i} = (1/(1 − x_0)) Σ_{j=1}^{i} C(i, j) x_j x^*_{|i−j}. Hence we obtain the claimed formulation of w^*_n(Π)[i] = lim_{m→∞} z_{(m,i)} by induction.

Proofs of Section 4
Formal Semantics of a Hierarchical Probabilistic Automaton.
Let A = (S_i, Pr_i, s_i^0, s_i^f)_{i≤n} be a hierarchical automaton. We now give a formal semantics to A by defining the flat probabilistic automaton associated with it.
Let r ∈ S_i be such that λ_i(r) = n. We redefine module i as (S′_i, Pr′_i, s_i^0, s_i^f), with S′_i = (S_i \ {r}) ∪ S_n, where Pr′_i(s, a, t) is defined as follows for all s ∈ S′_i \ {s_i^f}, t ∈ S′_i and a ∈ Σ:
• for s, t ∈ S_i \ {r}, Pr′_i(s, a, t) = Pr_i(s, a, t),
• for s, t ∈ S_n, Pr′_i(s, a, t) = Pr_n(s, a, t),
• for s ∈ S_i \ {r} and t = s_n^0, Pr′_i(s, a, t) = Pr_i(s, a, r),
• for r ≠ s_i^f, s = s_n^f and t ∈ S_i \ {r}, Pr′_i(s, a, t) = Pr_i(r, a, t),
• otherwise, Pr′_i(s, a, t) = 0.
We proceed by inductively replacing all nodes associated with the deepest module n until no such node remains. Then we can remove module n altogether and proceed inductively with nodes associated with module n − 1.
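The inlining step above can be sketched as follows (a minimal illustration with hypothetical data structures; the alphabet Σ is folded into plain probabilities, state names are assumed disjoint, and self-loops on r are omitted):

```python
# One flattening step: replace supernode r of a parent module by a child
# module, redirecting entries into r to the child's initial state and
# exits of r to depart from the child's final state.

def inline(parent, child, r):
    """parent, child: {'states': [...], 'init': s, 'final': s,
    'P': {(s, t): prob}}. Returns the parent with r replaced by child."""
    states = [s for s in parent["states"] if s != r] + child["states"]
    P = {}
    for (s, t), pr in parent["P"].items():
        if s != r and t != r:
            P[(s, t)] = pr                      # transitions untouched by r
        elif t == r and s != r:
            P[(s, child["init"])] = pr          # entries into r go to s_n^0
        elif s == r and t != r:
            P[(child["final"], t)] = pr         # exits of r leave from s_n^f
    P.update(child["P"])                        # internal child transitions
    init = child["init"] if parent["init"] == r else parent["init"]
    final = child["final"] if parent["final"] == r else parent["final"]
    return {"states": states, "init": init, "final": final, "P": P}

parent = {"states": ["a", "r", "b"], "init": "a", "final": "b",
          "P": {("a", "r"): 1.0, ("r", "b"): 1.0}}
child = {"states": ["c0", "c1"], "init": "c0", "final": "c1",
         "P": {("c0", "c1"): 1.0}}
flat = inline(parent, child, "r")
```

Iterating this step over all supernodes, deepest module first, yields the flat probabilistic automaton.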
Adaptation of the algorithm to the HPA model. We now turn to the proof of Theorem 3. In the model of probabilistic automata, the sojourn time in each state takes exactly one time unit. We extend the computation of moments in a straightforward way to systems where the sojourn time in a state s depends on the outgoing transition (s, a, t) and follows a distribution δ_{s,a,t}: it suffices to replace, in Algorithm 1, the initialization Σ_{a∈Σ} Pr(s, a, t) by Σ_{a∈Σ} (Pr(s, a, t) · µ_n(δ_{s,a,t})). Equivalently, we can specify (µ_i(δ_{s,a,t}))_{i≤n} on each transition.
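This per-transition initialization can be sketched as follows (hypothetical helper names, not the paper's Algorithm 1; the Dirac and exponential moment formulas µ_i = d^i and µ_i = i!/λ^i are the standard ones for fixed and exponential sojourn times):

```python
from math import factorial

# Weight of a transition (s, t) in the moment semiring:
# ( sum_a Pr(s,a,t) * µ_i(δ_{s,a,t}) )_{i<=n}.

def edge_weight(actions, n):
    """actions: list of (prob, moments) pairs, one per action a in Σ,
    where moments[i] = µ_i(δ_{s,a,t}); returns the semiring weight."""
    return [sum(p * mom[i] for p, mom in actions) for i in range(n + 1)]

def dirac_moments(d, n):
    """Sojourn of exactly d time units: µ_i = d**i."""
    return [d ** i for i in range(n + 1)]

def exp_moments(lam, n):
    """Exponential sojourn with rate λ (CTMC case): µ_i = i!/λ**i."""
    return [factorial(i) / lam ** i for i in range(n + 1)]

# Unit sojourn time recovers the plain probabilistic-automaton weight,
# since all moments of the Dirac at 1 equal 1:
w = edge_weight([(0.25, dirac_moments(1, 3)), (0.5, dirac_moments(1, 3))], 3)
assert w == [0.75, 0.75, 0.75, 0.75]
```

Swapping `dirac_moments` for `exp_moments` on each transition is what the CTMC instantiation amounts to under this reading.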
Proof. Algorithm 1 is suitable for HPA because it can be performed by going through the states module by module, deepest module first. We now show how to replace a module S_i by a set of four states carrying equivalent information, thanks to transitions characterized by probabilities and moments.
Assume that a query has been performed and not yet answered before entering a module. There are three possibilities. First, the query can be answered in the module. Second, it is not answered in the module, but the run leaves the module. Last, the query is not answered and the run does not leave the module. Thus, the module can be summarized by four states, as presented in Figure 5: the initial state, a final state, a self-looping state representing paths where the query is answered in the module, and another self-looping state representing paths where the query is not answered and the run does not leave the module. We recall that in each module, the final state has no transition to another state of the module. Now, we need to compute the following quantities: the probability (and moments) of reaching the goal from s_i^0, and those of reaching s_i^f from s_i^0 without having reached the goal before. These two quantities can be computed inductively by Algorithm 1, as presented before.
Then, module i can be replaced by this new four-state module in modules of higher rank. Instead of a cubic complexity in the size of the flat automaton, this procedure yields a complexity proportional to the sum of the cubes of the sizes of the modules, which can lead to a significant improvement, especially in redundant systems.

Proofs for Section 5
We now give a sequence of lemmas that will yield the proof of Theorem 5 and Lemma 1.
Case α_i < α_j. We prove that, in the case where α_i < α_j, the value α_i is actually optimal in ∆^{j,µj}_{i,µi}. This is the first item of Theorem 5. As α_i is a bound for all δ ∈ ∆^{j,µj}_{i,µi}, we just need to show that it is optimal. Let 0 < η < 1 and 0 < p < 1. We let z be a positive real, larger than α_i, whose value will be set later.
Let δ be the distribution with mass (1 − p) at 0, mass p_1 at ηα_i, mass p_2 at α_j, and mass p_3 at z, with p_1 + p_2 + p_3 = p.
We want to choose p_1, p_2, p_3 such that µ_i is the moment of order i and µ_j is the moment of order j of δ, that is, such that δ ∈ ∆^{j,µj}_{i,µi}. We thus have the following equations:

(1) p_1 (ηα_i)^i + p_2 α_j^i + p_3 z^i = µ_i,
(2) p_1 (ηα_i)^j + p_2 α_j^j + p_3 z^j = µ_j,
(3) p_1 + p_2 + p_3 = p.

From (1) and (3), granted p_2 < p and z > α_i (which we assumed), we get p_3 > 0. Now, using (2) and equivalents for z going to ∞, p_3 tends to 0 from above and, η being fixed, we get p_2 > 0 for z big enough.
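For a concrete instance of this construction (hypothetical values, taking i = 1, j = 2 and assuming α_k = (µ_k/p)^{1/k} as for Tchebychev-style bounds), one can solve the linear system of equations (1)-(3) numerically and check that p_1, p_2, p_3 are all positive for a large z:

```python
# Solve equations (1)-(3) for the masses p_1, p_2, p_3 of δ at the support
# points ηα_1, α_2, z, and verify that they are positive once z is large.

def solve3(A, b):
    """Cramer's rule for a 3x3 system A x = b."""
    def det(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
              - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
              + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    d = det(A)
    return [det([[b[r] if cc == c else A[r][cc] for cc in range(3)]
                 for r in range(3)]) / d
            for c in range(3)]

# Hypothetical parameters satisfying α_1 < α_2 (the case treated here).
p, mu1, mu2, eta, z = 0.1, 0.5, 5.0, 0.9, 1000.0
a1 = mu1 / p                    # α_1 = 5
a2 = (mu2 / p) ** 0.5           # α_2 = sqrt(50) ≈ 7.07 > α_1
pts = [eta * a1, a2, z]         # support points of δ besides 0
A = [pts,                       # (1): moment of order 1 equals µ_1
     [t ** 2 for t in pts],     # (2): moment of order 2 equals µ_2
     [1.0, 1.0, 1.0]]           # (3): p_1 + p_2 + p_3 = p
p1, p2, p3 = solve3(A, [mu1, mu2, p])
assert p1 > 0 and p2 > 0 and p3 > 0
```

As the proof predicts, p_3 is tiny (the mass at z vanishes as z grows) while p_1 and p_2 remain positive, so δ ∈ ∆^{j,µj}_{i,µi} with B(δ, p) = ηα_1.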
That is, for z big enough, we can choose p_1, p_2, p_3 positive and satisfying the desired equations: p_1, p_2, p_3 < p as p = p_1 + p_2 + p_3, and µ_i(δ) = µ_i and µ_j(δ) = µ_j. Thus, δ ∈ ∆^{j,µj}_{i,µi}. Finally, we have B(δ, p) = ηα_i.

Case α_j < α_i. We now consider the case where α_j < α_i, that is, the second item of Theorem 5. We first prove that the value α^j_i defined there is a bound for all δ ∈ ∆^{j,µj}_{i,µi}. We take any distribution δ with µ_i, µ_j as moments of the response time of orders i, j. We let b = B(δ, p). We partition δ into two parts: δ_1 between 0 and b (and 0 elsewhere), and δ_2 after b (and 0 before). We denote µ_k(δ_ℓ) = ∫_0^∞ δ_ℓ(t) t^k dt, for ℓ ∈ {1, 2}. As δ_2 represents a proportion p of the distribution, and as all its mass is after b, we have the following.

Proof. We apply Jensen's inequality to both δ_1 and δ_2.
The M of Lemma 1 and Theorem 5 will be µ_j(δ_1), for δ a distribution realizing B(δ, p) = α^j_i(p). We now prove the second part of Theorem 5 and Lemma 1.
Secondly, we show by induction on n that f^n(0) is an increasing sequence. We already proved the first step of the induction: f(0) ≥ 0.

Fig. 1. A simple example of a query-response model

Fig. 4. Example of a parametric system with set of parameters {p}

Theorem 3. Given an HPA A with k modules of size at most m, one can compute the n first moments of the distribution associated with A in time O(n²km³).