Model Checking Finite-Horizon Markov Chains with Probabilistic Inference

We revisit the symbolic verification of Markov chains with respect to finite horizon reachability properties. The prevalent approach iteratively computes step-bounded state reachability probabilities. By contrast, recent advances in probabilistic inference suggest symbolically representing all horizon-length paths through the Markov chain. We ask whether this perspective advances the state-of-the-art in probabilistic model checking. First, we formally describe both approaches in order to highlight their key differences. Then, using these insights we develop Rubicon, a tool that transpiles Prism models to the probabilistic inference tool Dice. Finally, we demonstrate better scalability compared to probabilistic model checkers on selected benchmarks. Altogether, our results suggest that probabilistic inference is a valuable addition to the probabilistic model checking portfolio -- with Rubicon as a first step towards integrating both perspectives.

[Fig. 1(b): a Prism model of (a) with 3 factories.]

Whereas qualitative model checking tracks the states that can reach a target state (or, dually, that can be reached from an initial state), probabilistic model checking tracks the i-step reachability probability for each state in the chain. The (i+1)-step reachability can then be computed via multiplication with the transition matrix. The scalability concern is that this matrix grows with the state space of the Markov chain. Mature model checking tools such as Storm [38], Modest [36], and Prism [52] utilize a variety of methods to alleviate the state space explosion. Nevertheless, various natural models cannot be analyzed by the available techniques.
In parallel, within the AI community a different approach to representing a distribution has emerged, which on first glance can seem unintuitive. Rather than marginalizing out the paths and tracking reachability probabilities per state, the probabilistic AI community commonly aggregates all paths that reach the target state. At its core, inference is then a weighted sum over all these paths [17]. This hinges on the observation that this set of paths can often be stored more compactly, and that the probability of two paths that share the same prefix or suffix can be efficiently computed on this concise representation. This inference technique has been used in a variety of domains in the artificial intelligence (AI) and verification communities [40,28,15,9], but is not part of any mature probabilistic model checking tools.
This paper theoretically and experimentally compares and contrasts these two approaches. In particular, we describe and motivate Rubicon, a probabilistic model checker that leverages the successful probabilistic inference techniques. We begin with an example that explains the core ideas of Rubicon followed by the paper structure and key contributions.
Motivating Example. Consider the example illustrated in Fig. 1(a). Suppose there are n factories. Each day, the workers at each factory collectively decide whether or not to strike. To simplify, we model each factory i with two states, striking (t_i) and not striking (s_i). Furthermore, since no two factories are identical, we take the probability to begin striking (p_i) and to stop striking (q_i) to be different for each factory. Assuming that each factory transitions synchronously and in parallel with the others, we query: "what is the probability that all the factories are simultaneously striking within h days?" Despite its simplicity, we observe that state-of-the-art model checkers like Storm and Prism do not scale beyond 15 factories. For example, Figure 1(b) provides a Prism encoding of this simple model (we show the instance with 3 factories), where a Boolean variable c_i is used to encode the state of each factory. The "allStrike" label identifies the target state. Figure 1(c) shows the run time for an increasing number of factories. While all methods eventually time out, Rubicon scales to systems with an order of magnitude more states.
Why is this problem hard? To understand the scalability issue, observe that tools such as Storm and Prism store the transition matrix, either explicitly or symbolically using algebraic decision diagrams (ADDs). Every distinct entry of this transition matrix needs to be represented -- in the case of ADDs, by a unique leaf node. Because each factory in our example has a different probability of going on strike, each subset of factories will likely have a unique probability of jointly going on strike. Hence, the transition matrix will have a number of distinct probabilities that is exponential in the number of factories, and its representation as an ADD must blow up in size. Concretely, for 10 factories, the ADD representing the transition matrix has 1.9 million nodes. Moreover, the explicit engine fails due to the dense nature of the underlying transition matrix. We discuss this method in Sec. 3.
How to overcome this limitation? This problematic combinatorial explosion is often unnecessary. For the sake of intuition, consider the simple case where the horizon is 1. Still, the standard transition matrix representations blow up exponentially with the number of factories n. Yet, the probability of reaching the "allStrike" state is easy to compute, even when n grows: it is p 1 · p 2 · · · p n .
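The contrast can be sketched in a few lines of Python (a hypothetical illustration; the probabilities below are invented, not taken from the paper's benchmarks): summing over all 2^n joint outcomes of one day gives the same answer as the product p_1 · p_2 · · · p_n, but only the latter avoids the exponential enumeration.

```python
import itertools

# Hypothetical per-factory strike probabilities (invented for illustration).
p = [0.3, 0.5, 0.7]
n = len(p)

# Naive view: enumerate all 2^n joint outcomes of one day and sum the mass
# of the single outcome in which every factory strikes.
naive = 0.0
for outcome in itertools.product([False, True], repeat=n):
    weight = 1.0
    for strikes, p_i in zip(outcome, p):
        weight *= p_i if strikes else 1 - p_i
    if all(outcome):
        naive += weight

# Factorized view: by independence, the answer is just the product p_1 ... p_n.
factorized = 1.0
for p_i in p:
    factorized *= p_i
```

The product form touches each factory once, while the naive sum visits every joint outcome.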
Rubicon aims to compute probabilities in this compact factorized way by representing the computation as a binary decision diagram (BDD). Fig. 1(d) gives an example of such a BDD, for three factories and a horizon of one. A key property of this BDD, elaborated in Sec. 3, is that it can be interpreted as a parametric Markov chain, where the weight of each edge corresponds to the probability of a particular factory striking. Then, the probability that the goal state is reached is given by the weighted sum of paths terminating in T: for this instance, there is a single such path, with weight p_1 · p_2 · p_3. These BDDs are tree-like Markov chains, so model checking can be performed in time linear in the size of the BDD using dynamic programming. Essentially, the BDD represents the set of paths that reach a target state -- an idea common in probabilistic inference.
To construct this BDD, we propose to encode our reachability query symbolically as a weighted model counting (WMC) query on a logical formula. By compiling that formula into a BDD, we obtain a diagram on which computing the query probability is efficient (in the size of the BDD). Concretely, for Fig. 1(d), the BDD represents the formula c_1 ∧ c_2 ∧ c_3, which encodes all paths through the chain that terminate in the goal state (all factories strike on day 1). For this example and this horizon, this is a single path. WMC is a well-known strategy for probabilistic inference and is currently among the state-of-the-art approaches for discrete graphical models [17], discrete probabilistic programs [40], and probabilistic logic programs [28].
In general, the exponential growth of the number of paths might seem to doom this approach: for n = 3 factories and horizon h = 1, we need to represent only 8 paths, but for h = 2, we would need to consider 64 different paths, and so on. However, a key insight is that, for many systems -- such as the factory example -- the structural compression of BDDs allows a concise representation of exponentially many paths, all while being parametric over path probabilities (see Sec. 4). To see why, observe that in the above discussion, the state of each factory is independent of the other factories: independence, and its natural generalizations like conditional and contextual independence, are the driving force behind many successful probabilistic inference algorithms [48]. Succinctly, the key advantage of Rubicon is that it exploits a form of structure that has thus far been under-exploited by model checkers, which is why it scales to more parallel factories than the existing approaches on this hard task. In Section 6 we consider an extension of this motivating example that adds dependencies between factories. This dependency (or rather, the accompanying increase in the size of the underlying MC) significantly decreases the scalability of the existing approaches but negligibly affects Rubicon. This leads to the question: how does one go from a Prism model to a concise BDD efficiently? To do so, Rubicon leverages a novel translation from Prism models into a probabilistic programming language called Dice (outlined in Section 5).
Contribution and Structure. Inspired by the example, we contribute conceptual and empirical arguments for leveraging BDD-based probabilistic inference in model checking. Concretely:
1. We demonstrate fundamental advantages of using probabilistic inference on a natural class of models (Sec. 1 and 6).
2. We explain these advantages by showing the fundamental differences between existing model checking approaches and probabilistic inference (Sec. 3 and 4). To that end, Section 4 presents probabilistic inference from an operational and a logical perspective and combines these perspectives.
3. We leverage those insights to build Rubicon, a tool that transpiles Prism to Dice, a probabilistic programming language (Sec. 5).
4. We demonstrate that Rubicon indeed attains an order-of-magnitude scaling improvement on several natural problems, including sampling from parametric Markov chains and verifying network protocol stabilization (Sec. 6).
Ultimately, we argue that Rubicon makes a valuable contribution to the portfolio of probabilistic model checking backends, and brings the extensive developments in probabilistic inference to bear on well-known model checking problems.

Preliminaries and Problem Statement
We state the problem formally and recap relevant concepts; see [7] for details. We sometimes use p̄ to denote 1−p. A Markov chain (MC) is a tuple M = (S, ι, P, T) with S a (finite) set of states, ι ∈ S the initial state, P : S → Distr(S) the transition function, and T ⊆ S a set of target states, where Distr(S) is the set of distributions over a (finite) set S. We write P(s, s′) to denote P(s)(s′) and call P a transition matrix. The successors of s are Succ(s) = {s′ | P(s, s′) > 0}. To support MCs with billions of states, we may describe MCs symbolically, e.g., with Prism [52] or as a probabilistic program [49,43]. For such a symbolic description P, we denote the corresponding MC with [[P]]. States then reflect assignments to symbolic variables. A path π = s_0 … s_n is a sequence of states, π ∈ S⁺. We use π↓ to denote the last state s_n; the length of π above is n, denoted |π|. Let Paths_h denote the paths of length h. The probability of a path is the product of the transition probabilities, defined inductively by Pr(s) = 1 and Pr(π · s) = Pr(π) · P(π↓, s). For a fixed horizon h and set of states T, let [[s→♦≤h T]] = {π | π_0 = s ∧ |π| ≤ h ∧ π↓ ∈ T ∧ ∀i < |π|. π_i ∉ T} denote the paths from s of length at most h that terminate at the first visited state contained in T. Furthermore, let Pr_M(s |= ♦≤h T) = Σ_{π ∈ [[s→♦≤h T]]} Pr(π) describe the probability of reaching T within h steps. We simplify notation when s = ι and write [[♦≤h T]] and Pr_M(♦≤h T), respectively. We omit M whenever it is clear from the context. It is helpful to separate the topology and the probabilities. We do this by means of a parametric MC (pMC) [23]. A pMC over a fixed set of parameters p⃗ generalises MCs by allowing a transition function that maps to Q[p⃗], i.e., to polynomials over these parameters [23]. A pMC M and a valuation of parameters u : p⃗ → R describe an MC M[u] by replacing p⃗ with u in the transition function P to obtain P[u].
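As a sanity check of the definitions above, the following sketch enumerates the paths in [[s→♦≤h T]] for a small hypothetical chain (the states and probabilities are invented for illustration) and sums their probabilities; note the first-hit condition: a path ends as soon as it enters T.

```python
# A small hypothetical MC: P[s] = list of (successor, probability); T = targets.
P = {
    0: [(0, 0.5), (1, 0.5)],
    1: [(0, 0.2), (2, 0.8)],
    2: [(2, 1.0)],
}
T = {2}

def prob_reach(s, h):
    """Sum Pr(pi) over paths from s of length <= h that first hit T at the end."""
    total = 0.0
    stack = [(s, 1.0, 0)]          # (current state, path probability, steps taken)
    while stack:
        state, pr, steps = stack.pop()
        if state in T:             # first visit to T: the path ends here
            total += pr
            continue
        if steps == h:             # horizon exhausted without reaching T
            continue
        for s2, q in P[state]:
            stack.append((s2, pr * q, steps + 1))
    return total
```

For instance, prob_reach(0, 2) finds the single qualifying path 0 → 1 → 2 with probability 0.5 · 0.8.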
If P[u](s) is a distribution for every s, then we call u a well-defined valuation. We can then think of a pMC M as a generator of a set of MCs {M[u] | u well-defined}. Fig. 2(b) shows a pMC; any valuation u with u(p), u(q) ∈ [0, 1] is well-defined. We consider the following associated problem: Parameter Sampling: Given a pMC M, a finite set of well-defined valuations U, and a horizon h, compute Pr_{M[u]}(♦≤h T) for each u ∈ U.
We recap binary decision diagrams (BDDs) and their generalization to algebraic decision diagrams (ADDs, a.k.a. multi-terminal BDDs). ADDs over a set of variables X are directed acyclic graphs whose vertices V can be partitioned into terminal nodes V_t without successors and inner nodes V_i with two successors. Each terminal node is labeled with a polynomial over some parameters p⃗ (or just a constant in Q), val : V_t → Q[p⃗], and each inner node is labeled with a variable, var : V_i → X. One node is the root node v_0. Edges are described by the two successor functions E_0 : V_i → V and E_1 : V_i → V. A BDD is an ADD with exactly two terminals, labeled T and F. Formally, we denote an ADD by the tuple (V, v_0, X, var, val, E_0, E_1). ADDs describe functions f : B^X → Q[p⃗] (each input selects a path in the underlying graph; the output is the label of the corresponding terminal node). As finite sets can be encoded with bit vectors, ADDs represent functions from (tuples of) finite sets to polynomials.
Example 2. The transition matrix P of the MC in Fig. 2(a) maps states, encoded by bit vectors ⟨x, y⟩ and ⟨x′, y′⟩, to the probability of moving from state ⟨x, y⟩ to ⟨x′, y′⟩. Fig. 2(c) shows the corresponding ADD.
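For intuition, evaluating an ADD means following a single root-to-terminal path. A minimal sketch in Python, using a hypothetical diagram (not the ADD of Fig. 2(c)):

```python
# A tiny ADD as nested tuples: ('var', name, low, high) or ('leaf', value).
# Hypothetical diagram, invented for illustration.
add = ('var', 'x',
       ('var', 'y', ('leaf', 0.5), ('leaf', 0.2)),   # low branch: x = 0
       ('leaf', 1.0))                                # high branch: x = 1

def evaluate(node, assignment):
    """Follow the root-to-terminal path selected by the variable assignment."""
    while node[0] == 'var':
        _, name, low, high = node
        node = high if assignment[name] else low
    return node[1]
```

Note that variables not on the selected path, such as y when x is true here, are never consulted.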

A Model Checking Perspective
We briefly analyze the de-facto standard approach to symbolic probabilistic model checking of finite-horizon reachability probabilities. It is an adaptation of qualitative model checking, in which we track the (backward) reachable states. This set can be thought of as a mapping from states to a Boolean indicating whether a target state can be reached. We generalize this mapping to a function that maps every state s to the probability that we reach T within i steps, denoted Pr_M(s |= ♦≤i T). First, it is convenient to construct a transition relation in which the target states have been made absorbing, i.e., we define a matrix with A(s, s′) = P(s, s′) if s ∉ T and A(s, s′) = [s = s′] otherwise. The following Bellman equations characterize the aforementioned mapping:

Pr_M(s |= ♦≤0 T) = [s ∈ T],    Pr_M(s |= ♦≤i+1 T) = Σ_{s′} A(s, s′) · Pr_M(s′ |= ♦≤i T).

The main aspect model checkers take from these equations is that, to compute the h-step reachability from state s, one only needs to combine the (h−1)-step reachability from each state s′ and the transition probabilities A(s, s′). We define a vector T⃗ with T⃗(s) = [s ∈ T]. The algorithm then iteratively computes and stores the i-step reachability for i = 0 to i = h, e.g., by computing A³ · T⃗ as A · (A · (A · T⃗)). This reasoning is thus inherently backwards and implicitly marginalizes out paths. In particular, rather than storing the i-step paths that lead to the target, one only stores a vector x = Aⁱ · T⃗ that records for every state s the sum over all i-long paths from s. Explicit representations of the matrix A and the vector x require memory at least on the order of |S|.⁶ To overcome this limitation, symbolic probabilistic model checking stores both A and Aⁱ · T⃗ as ADDs, considering the matrix as a function from a tuple ⟨s, s′⟩ to A(s, s′), and x as a function from s to x(s) [2].

Example 3. Reconsider the MC in Fig. 2(a). The h-bounded reachability probability Pr_M(♦≤h {⟨1, 0⟩}) can be computed as reflected in Fig. 3(a). The ADD for P is shown in Fig. 2(c). The ADD for x when h = 2 is shown in Fig. 3.
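The iterative scheme x ← A · x can be sketched directly (a hypothetical 3-state chain with invented probabilities; state 2 is the target, made absorbing):

```python
# Transition matrix of a hypothetical 3-state MC; state 2 is the target.
P = [
    [0.5, 0.5, 0.0],
    [0.2, 0.0, 0.8],
    [0.0, 0.0, 1.0],
]
T = {2}
n = len(P)

# A: the matrix with target states made absorbing (unit row for s in T).
A = [[(1.0 if s2 == s else 0.0) if s in T else P[s][s2] for s2 in range(n)]
     for s in range(n)]

# x(s) = Pr(s |= <>^{<=i} T), starting from x = [s in T] and iterating x <- A.x.
x = [1.0 if s in T else 0.0 for s in range(n)]
h = 3
for _ in range(h):
    x = [sum(A[s][s2] * x[s2] for s2 in range(n)) for s in range(n)]
```

After h iterations, x[s] holds the h-bounded reachability probability from s; the memory footprint stays on the order of |S| regardless of how many paths exist.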
The performance of symbolic probabilistic model checking is directly governed by the sizes of these two ADDs. The size of an ADD is bounded from below by the number of its leaves. In qualitative model checking, both ADDs are in fact BDDs, with two leaves. However, for the ADD representing A, this lower bound is given by the number of distinct probabilities in the transition matrix. In the running example, we have seen that a small program P may have an underlying MC [[P]] with an exponential state space S and equally many distinct transition probabilities. Symbolic probabilistic model checking also scales badly on some models where A has a concise encoding but x has too many distinct entries.⁷ Therefore, model checkers may store x partially explicitly [50].
The insights above are not new. Symbolic probabilistic model checking has advanced [47] to create small representations of both A and x. In competitions, Storm often applies a bisimulation-to-explicit method that extracts an explicit representation of the bisimulation quotient [27,38]. Finally, game-based abstraction [45,34] can be seen as a predicate abstraction technique at the ADD level. However, these methods do not change the computation of the finite-horizon reachability probabilities and thus do not overcome the inherent weaknesses of the iterative approach in combination with an ADD-based representation.

⁶ Excluding, e.g., partial exploration or sampling, which typically are not exact.
⁷ For an interesting example of this, see the "Queue" example in Section 6.

A Probabilistic Inference Perspective
We present four key insights into probabilistic inference. (1) Section 4.1 shows how probabilistic inference takes the classical definition as summing over the set of paths, and turns this definition into an algorithm. In particular, these paths may be stored in a computation tree.
(2) Section 4.2 frames finite-horizon reachability as a weighted model counting (WMC) query. (3) Section 4.3 connects this reduction to point (1) by showing that a BDD that represents this WMC query is bisimilar to the computation tree, assuming that the out-degree of every state in the MC is two. (4) Section 4.4 describes and compares the computational benefits of the BDD representation.
In particular, we clarify that enforcing an out-degree of two is a key ingredient to overcoming one of the weaknesses of symbolic probabilistic model checking: the number of different probabilities in the underlying MC.

Operational perspective
The following perspective frames (an aspect of) probabilistic inference as a model transformation. By definition, the set of all paths -- each annotated with its transition probabilities -- suffices to extract the reachability probability. These sets of paths may be represented in a computation tree (which is itself an MC).
Example 4. We continue from Ex. 1. We put all paths of length three in a computation tree in Fig. 4(a) (cf. the caption for state identifiers). The three paths that reach the target are highlighted in red. This tree-shaped MC is highly redundant; we may compress it to the MC in Fig. 4(b).
The computation tree CT(M, h) contains (up to renaming) the same paths to the target as the original MC. Notice that after h transitions, all paths are in a sink state, and thus we can drop the step bound from the property and consider either finite or indefinite horizons. The latter considers all paths that eventually reach the target. We denote the probability mass of these paths with Pr_M(s |= ♦T) and refer to [7] for formal details. Then, we may compute bounded reachability probabilities in the original MC by analysing unbounded reachability in the CT:

Pr_M(♦≤h T) = Pr_{CT(M,h)}(♦T).

The nodes in the CT have a natural topological ordering. The unbounded reachability probability is then computed (efficiently in the CT's size) using dynamic programming (i.e., topological value iteration) on the Bellman equation for s ∉ T:

Pr(s |= ♦T) = Σ_{s′} P(s, s′) · Pr(s′ |= ♦T).

For pMCs, the right-hand side naturally is a factorised form of the solution function f that maps parameter values to the induced reachability probability [23,35,25]. For bounded reachability (or acyclic pMCs), this function amounts to a sum over all paths, with every path reflected by a term of a polynomial, i.e., the sum is a polynomial. In sum-of-terms representation, the polynomial can be exponential in the number of parameters [5].
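Topological value iteration on a computation tree amounts to a single memoized pass over the nodes. A minimal sketch on a hypothetical acyclic MC (invented for illustration, not the tree of Fig. 4):

```python
from functools import lru_cache

# A hypothetical acyclic MC (a small computation tree): succ[s] = [(child, prob)].
succ = {
    'r': [('a', 0.5), ('b', 0.5)],
    'a': [('t1', 0.8), ('d1', 0.2)],
    'b': [('t2', 0.3), ('d2', 0.7)],
}
targets = {'t1', 't2'}

@lru_cache(maxsize=None)
def reach(s):
    """Unbounded reachability on an acyclic MC: each node is solved exactly once."""
    if s in targets:
        return 1.0
    if s not in succ:                  # non-target sink
        return 0.0
    return sum(p * reach(s2) for s2, p in succ[s])
```

Memoization makes the recursion visit each node once, so the running time is linear in the size of the tree.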
For computational efficiency, we need a smaller representation of the CT. As we only consider reachability of T , we may simplify [44] the notion of (weak) bisimulation [6] (in the formulation of [41]) to the following definition.

Logical perspective
The previous section defined weakly bisimilar chains and showed computational advantages, but did not present an algorithm. In this section we frame the finite horizon reachability probability as a logical query known as weighted model counting (WMC). In the next section we will show how this logical perspective yields an algorithm for constructing bisimilar MCs.
Weighted model counting is well known as an effective reduction for probabilistic inference [59,17]. Let ϕ be a logical sentence over variables C. The weight function W_C : C → R≥0 assigns a weight to each logical variable. A total variable assignment η : C → B has weight weight(η) = Π_{c : η(c)=⊤} W_C(c) · Π_{c : η(c)=⊥} (1 − W_C(c)). Then the weighted model count of ϕ given W_C is WMC(ϕ, W_C) = Σ_{η |= ϕ} weight(η). Formally, we desire to compute a reachability query as a WMC query in the following sense: we seek a sentence ϕ^C_{M,h} and weights W_C such that Pr_M(♦≤h T) = WMC(ϕ^C_{M,h}, W_C). Consider initially the simplified case when the MC M is binary: every state has at most two successors. In this case, producing (ϕ^C_{M,h}, W_C) is straightforward: Example 5. Consider the MC in Fig. 2(a), and note that it is binary. We introduce logical variables called state/step coins C = {c_{s,i} | s ∈ S, i < h}, one for every state and step. Assignments to these coins denote choices of transitions at particular times: if the chain is in state s at step i, then it takes the transition to the lexicographically first successor of s if c_{s,i} is true, and otherwise takes the transition to the lexicographically second successor. To construct the predicate ϕ^C_{M,3}, we need a logical sentence over the coins whose models encode the accepting paths (red paths) in the CT in Fig. 4(a).
We start in state s = ⟨0, 0⟩ (using state labels from the caption of Fig. 4). We order states as s = ⟨0, 0⟩ < t = ⟨0, 1⟩ < u = ⟨1, 0⟩ < v = ⟨1, 1⟩. Then c_{s,0} is true if the chain transitions into state s at time 0 and false if it transitions to state t at time 0. So, one path from s to the target node ⟨1, 0⟩ is given by the logical sentence (c_{s,0} ∧ ¬c_{s,1} ∧ c_{t,2}). The full predicate ϕ^C_{M,3} is the disjunction of the sentences for the three accepting paths. Each model of this sentence is a single path to the target. This predicate ϕ^C_{M,h} can clearly be constructed by considering all possible paths through the chain, but later on we show how to build it more efficiently. Finally, we fix W_C: the weight of each coin c_{s,i} is directly given by the transition probability from s to its lexicographically first successor. When the MC is not binary, it suffices to limit the out-degree of the MC to at most two by adding auxiliary states, hence binarizing all transitions, cf. Appendix A.
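The coin encoding can be checked by brute force: enumerate all assignments to the state/step coins, weight each as in the definition of WMC, and keep those whose induced path reaches the target. A sketch on a hypothetical binary chain (states and weights invented for illustration):

```python
import itertools

# Hypothetical binary MC: succ[s] = (first, second); the coin c_{s,i} being true
# means "move to the lexicographically first successor of s at step i".
succ = {'s': ('t', 's'), 't': ('u', 's'), 'u': ('u', 'u')}
W = {'s': 0.5, 't': 0.8, 'u': 1.0}     # W(c_{s,i}) = Pr(first successor of s)
target, h, init = 'u', 2, 's'

coins = [(s, i) for i in range(h) for s in sorted(succ)]
wmc = 0.0
for bits in itertools.product([False, True], repeat=len(coins)):
    eta = dict(zip(coins, bits))
    # weight(eta): product of W(c) over true coins and (1 - W(c)) over false ones
    weight = 1.0
    for (s, _), b in eta.items():
        weight *= W[s] if b else 1 - W[s]
    # eta models phi iff the induced path hits the target within h steps
    cur, hit = init, init == target
    for i in range(h):
        if hit:
            break
        cur = succ[cur][0] if eta[(cur, i)] else succ[cur][1]
        hit = cur == target
    if hit:
        wmc += weight
```

Coins not on the induced path marginalize out (their weights sum to one), so the brute-force sum matches the path-based probability, here 0.5 · 0.8.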

Connecting the Operational and the Logical Perspective
Now that we have reduced bounded reachability to weighted model counting, we reach a natural question: how do we perform WMC? Various approaches to performing WMC have been explored; a prominent approach is to compile the logical function into a binary decision diagram (BDD), which supports fast weighted model counting [22]. In this paper, we investigate a BDD-driven approach for two reasons: (i) BDDs admit straightforward support for parametric models; (ii) BDDs provide a direct connection between the logical and operational perspectives. To start, observe that the graph of the BDD, together with the weights, can be interpreted as an MC: Definition 3. Let ϕ_X be a propositional formula over variables X and <_X an order on X. We write BDD_MC(ϕ_X, <_X, W) for the MC obtained by interpreting the ordered BDD of ϕ_X under <_X as an MC, where a node labeled x moves to its 1-successor with probability W(x) and to its 0-successor with probability 1 − W(x). These BDDs are intimately related to the computation trees discussed before. For a binary MC M, the tree CT(M, h) is binary and can be considered as a (not necessarily reduced) BDD. More formally, let us construct BDD_MC(ϕ^C_{M,h}, <_C, W). We fix a total order on states. Then we fix state/step coins C = {c_{s,i} | s ∈ S, i < h} and the weights as in Example 5. Finally, let <_C be an order on C such that i < j implies c_{s,i} <_C c_{s,j}. Then BDD_MC(ϕ^C_{M,h}, <_C, W) is bisimilar to CT(M, h). In the spirit of Idea 1, we thus aim to construct BDD_MC(ϕ^C_{M,h}, <_C, W), a representation as outlined in Idea 2, efficiently. Indeed, the BDD (as MC) in Fig. 4(c) is bisimilar to the MC in Fig. 4(b).
Idea 3: Represent a bisimilar version of the computation tree using a BDD.

The Algorithmic Benefits of BDD Construction
Thus far we have described how to construct a binarized MC bisimilar to the CT. Here, we argue that this construction has algorithmic benefits by filling in two details. First, the binarized representation is an important ingredient for compact BDDs. Second, we show how to choose a variable ordering that ensures that the BDDs grow linearly in the horizon. In sum:
Idea 4: WMC encodings of binarized Markov chains may increase compression of computation trees.
To see the benefits of binarized transitions, we return to the factory example in Section 1. Figure 5(a) gives a bisimilar computation tree for the 3-factory, h = 1 example. However, in this tree, the states are unfactorized: each node in the tree is a joint configuration of factories. This tree has 8 transitions (one for each possible joint state transition) with 8 distinct probabilities. On the other hand, the bisimilar computation tree in Figure 1(d) has binarized transitions: each node corresponds to a single factory's state at a particular time step, and each transition describes an update to only a single factory. This binarization enables the exploitation of new structure: in this case, the independence of the factories leads to smaller BDDs; this structure is otherwise lost when considering only joint configurations of factories.
Recall that the size of the ADD representation of the transition function is bounded from below by the number of distinct probabilities in the underlying MC: in this case, this is visualized by the number of distinct outgoing edge probabilities over all nodes in the unfactorized computation tree. Thus, a good binarization can have a drastically positive effect on performance. For the running example, rather than 2^n distinct transition probabilities (with n factories), the system now has only 4 · n distinct transition probabilities!

Causal orderings. Next, we explore some of the engineering choices Rubicon makes to exploit the sequential structure of an MC when constructing the BDD for a WMC query. First, note that the transition matrix P(s, s′) implicitly encodes a distribution over state transition functions, S → S. To encode P as a BDD, we must encode each transition using logical variables, similar to the situation in Sec. 4.2. In the case of binary transitions this is again easy; in the case of non-binary transitions, we again introduce additional logical variables [28,59,40,17]. The resulting logical function has the form f_P : C × S → S, mapping an assignment of the coins C and a current state to a successor state. Whereas the computation tree follows a fixed (temporal) order of states, BDDs can represent the same function (and the same weighted model count) using an arbitrary order. Note that the BDD's size and structure depend drastically both on the construction of the propositional formula and on the order of the variables in that encoding. We can bound the size of the BDD by enforcing a variable order based on the temporal structure of the original MC. Specifically, given h coin collections, C⃗ = C × … × C, one can generate a function f describing the h-length paths via repeated application of f_P. Let ψ denote an indicator for the reachability property as a function over paths.
We call predicates formed by composition with f_P, i.e., ϕ = ψ ∘ f_P, causal encodings, and orderings on the coins c_{i,t} ∈ C⃗ that are lexicographically sorted in time, t_1 < t_2 ⟹ c_{i,t_1} < c_{j,t_2}, causal orderings. Importantly, causally ordered/encoded BDDs grow linearly in the horizon h [63, Corollary 1]. More precisely, let ϕ^{C⃗}_{M,h} be causally encoded with |C⃗| = h · m. The causally ordered BDD for ϕ^{C⃗}_{M,h} has at most h · |S × S_ψ| · m · 2^m nodes, where |S_ψ| = 2 for reachability properties. However, while the worst-case growth is linear in the horizon, constructing that BDD may induce a super-linear cost in its size, e.g., function composition using BDDs is super-linear! Figure 5(b) shows the motivating factory example with 2 factories and h = 2. The variables are causally ordered: the factories in time step 1 occur before the factories in time step 2. For n factories, a fixed number f(n) of nodes is added to the BDD upon each iteration, guaranteeing growth on the order O(f(n) · h). Note the factorization that occurs: the BDD exhibits node sharing (node c_2^(2) is reused), which yields additional computational benefits.
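The linear-in-horizon behavior can be illustrated by comparing the raw computation tree with the layered DAG obtained by merging nodes that agree on the current state (a toy stand-in for the sharing a causally ordered BDD performs; the 2-state chain below is hypothetical):

```python
# A hypothetical 2-state chain; every state has out-degree two (binary).
P = {'a': [('a', 0.5), ('b', 0.5)], 'b': [('a', 0.3), ('b', 0.7)]}

def ct_stats(h):
    """Node counts: raw computation tree vs. layered DAG merged by state."""
    tree_nodes, frontier = 1, 1        # unshared tree: one node per path prefix
    dag_nodes = 1                      # shared DAG: one node per (step, state)
    reach = {'a'}
    for _ in range(h):
        frontier *= 2                  # binary chain: two children per node
        tree_nodes += frontier
        reach = {s2 for s in reach for s2, _ in P[s]}
        dag_nodes += len(reach)
    return tree_nodes, dag_nodes
```

For h = 10 the tree already has 2047 nodes while the merged DAG has 21: the tree grows exponentially in the horizon, the shared representation linearly.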
Summary and remaining steps. The operational view highlights that we want to compute a transformation of the original input MC M. The logical view presents an approach to do so efficiently: By computing a BDD that stores a predicate describing all paths that reach the target, and interpreting and evaluating the (graph of the) BDD as an MC. In the following section, we discuss the two steps that we follow to create the BDD:

Rubicon
We present Rubicon which follows the two steps outlined above. For exposition, we first describe a translation of monolithic Prism programs to Dice programs and then extend this translation to admit modular programs. Technical steps and extensions are deferred to Appendix B.
Dice Preliminaries. We give a brief description of Dice, a probabilistic programming language (PPL) introduced in [40]. A PPL is a programming language augmented with a primitive notion of random choice: for instance, in Dice, a Bernoulli random variable is introduced by the syntax flip 0.5. The syntax of Dice is similar to that of the programming language OCaml: local variables are introduced by the syntax let x = e1 in e2, where e1 and e2 are expressions, i.e., sub-programs. Dice supports procedures, bounded integers, bounded loops, and standard control flow via if-statements.
One goal of a PPL is to perform probabilistic inference: compute the probability that the program returns a particular value. Inference on the tiny Dice program let x = flip 0.1 in x would yield that true is returned with probability 0.1. The Dice compiler performs probabilistic inference via weighted model counting and BDD compilation. In doing so, it accomplishes the non-trivial tasks of: (i) choosing a logical encoding for probabilistic programs; (ii) establishing good variable orderings; (iii) efficiently manipulating and constructing BDDs; and (iv) performing WMC. For details, we refer the reader to [40].
Rubicon uses Dice to effectively construct a BDD and perform WMC on a Dice program that reflects a description of some computation tree. This implementation exploits the structure that was described in Sec. 4.4: in particular, the BDD generated in Figure 5(b) is exactly the BDD that will be generated by Dice from the output of Rubicon. The variable ordering used by Dice is given by the order in which program variables are introduced, and Rubicon's translation was designed with this variable ordering in mind.
Transpiling Prism to Dice We present the core translation routine implemented in Rubicon. We note that the ultimate performance of Rubicon is heavily dependent on the quality of this translation. We evaluate the performance in the next section.
The Prism specification language consists of one or more reactive modules (partially synchronized state machines) that may interact with each other. Our example in Fig. 1(b) illustrates fully synchronized state machines. While Prism programs containing multiple modules can be flattened into a single monolithic program, this yields an exponential blow-up: if one flattens the n modules in Fig. 1(b) into a single module, the resulting program has 2^n updates per command. This motivates our direct translation of Prism programs containing multiple modules.
Monolithic Prism programs. We explain most ideas on Prism programs that consist of a single "monolithic" module before addressing the modular translation at the end of the subsection. A module has a set of bounded variables, and the valuations of these variables span the state space of the underlying MC. Its transitions are described by guarded commands of the form:

[act] guard → p_1 : update_1 + … + p_n : update_n

The action name act is only relevant in the modular case and can be ignored for now. The guard is a Boolean expression over the module's variables. If the guard evaluates to true for some state (a valuation), then the module evolves into one of the n successor states by updating its variables. An update is chosen according to the probability distribution given by the expressions p_1, …, p_n. In every state enabling the guard, the evaluations of p_1, …, p_n must sum to one. A set of guards overlaps if they all evaluate to true on a given state. The semantics of overlapping guards in the monolithic setting is to first uniformly select an enabled guard and then apply the corresponding stochastic transition. Finally, a self-loop is implicitly added to states without an enabled guard. Example 6. We present our translation primarily through example. In Fig. 6(a), we give a Prism program for an MC. The program contains two variables x and y, where x is either zero or one, and y is between zero and two. There are thus 6 different states. We denote states as tuples of the x- and y-values. We depict the MC in Fig. 6(b). From state ⟨0, 0⟩, (only) the first guard is enabled and thus there are two transitions, each with probability one half: one in which x becomes one, and one in which y is increased by one. Finally, no guard is enabled in state ⟨1, 1⟩, resulting in an implicit self-loop.
Translation. All Dice programs consist of two parts: a main routine, which is run by default when the program starts, and function declarations that declare auxiliary functions. We first define the auxiliary functions. For simplicity let us temporarily assume that no guards overlap and that probabilities are constants, i.e., not state-dependent.
The main idea in the translation is to construct a Dice function step that, given the current state, outputs the next state. Because a monolithic Prism program is almost a sequential program, in its most basic version the step function is straightforward to construct using built-in Dice language primitives: we simply build a large if-else block corresponding to each command. This block iteratively considers each command's guard until it finds one that is satisfied. To perform the corresponding update, we flip a coin (based on the probabilities corresponding to the updates) to determine which update to perform. If no command is enabled, we return the same state, in accordance with the implicit self-loop. Fig. 6(d) shows the program blocks for the Prism program from Fig. 6(a) with target state [[ x = 0, y = 2 ]]. There are two other important auxiliary functions. The init function simply returns the initial state by translating the initialization statements from Prism, and the hit function checks whether the current state is a target state, which is obtained from the property. Now we outline the main routine, given for this example in Figure 6(c). This function first initializes the state. Then, it calls step 2 times, checking on each iteration using hit whether the target state is reached. Finally, we return whether we have been in a target state. The probability to return true corresponds to the reachability probability on the underlying MC specified by the Prism program.

Overlapping guards. Prism allows multiple commands to be enabled in the same state, with the semantics of uniformly at random choosing one of the enabled commands to evaluate. Dice has no primitive notion of this construct. (One cannot simply condition on selecting an enabled guard, as this redistributes probability mass over all paths and not only over paths with the same prefix.) We illustrate the translation in Fig. 7(a) and Fig. 7(b). It determines which guards aEn, bEn, cEn are enabled. Then, we randomly select one of the enabled commands, i.e., we uniformly at random select a true bit from a given tuple of bits.
We store the index of that bit and use it to execute the corresponding command.
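To make the quantity computed by the main routine concrete: unrolling step for h iterations corresponds to a weighted sum over all horizon-length paths that hit the target. A minimal Python sketch on a hypothetical three-state chain follows; the enumeration here is exponential in h, whereas Dice compiles the path set into a BDD so that shared prefixes and suffixes are reused.

```python
from fractions import Fraction

# Toy MC (hypothetical): states 0..2, target state 2.
MC = {
    0: {0: Fraction(1, 2), 1: Fraction(1, 2)},
    1: {0: Fraction(1, 3), 2: Fraction(2, 3)},
    2: {2: Fraction(1)},
}
TARGET = {2}

def reach(state, h):
    """Weighted sum over all paths of length <= h that hit TARGET:
    the quantity the translated Dice program computes by aggregating
    paths (here via naive enumeration)."""
    if state in TARGET:
        return Fraction(1)
    if h == 0:
        return Fraction(0)
    return sum(p * reach(succ, h - 1) for succ, p in MC[state].items())

print(reach(0, 2))   # only the path 0 -> 1 -> 2 hits: 1/2 * 2/3 = 1/3
```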
Modular Prism Programs. For modular Prism programs, the action names at the front of Prism commands are important. In each module, there is a set of action names available. An action is enabled if each module that contains this action name has (at least) one command with this action whose guard is satisfied. Commands with an empty action are assumed to have a globally unique action name, so in that case the action is enabled iff the guard is enabled. Intuitively, once an action is selected, we randomly select a command per module in all modules containing this action name. Our approach resembles that for overlapping guards described above. See Fig. 8 for an intuitive example. To automate this, the updates require more care, see Appendix B for details.
Implementation. Rubicon is implemented on top of Storm's Python API and translates Prism to Dice fully automatically. Rubicon supports all MCs in the Prism benchmark suite and a large set of benchmarks from the Prism website and the QVBS [37], with the notes that we require a single initial state and ignore reward declarations. Furthermore, we currently do not support the hide/restrict process-algebraic compositions and some integer operations.

Empirical Comparisons
We compare and contrast the performance of Storm against Rubicon to empirically demonstrate the following strengths and weaknesses:

Explicit Model Checking (Storm) represents the MC explicitly in a sparse matrix format. The approach suffers from the state space explosion, but has been engineered to scale to models with many states. Besides the state space size, the sparseness of the transition matrix is essential for performance.

Symbolic Model Checking (Storm) represents the transition matrix and the reachability probabilities as an ADD. This method is strongest when the transition matrix and state vector have structure that enables a small ADD representation, such as symmetry and sparsity.

Rubicon represents the set of paths through the MC as a (logical) BDD. This method excels when the state space has structure that enables a compact BDD representation, such as conditional independence, and hence scales well on examples with many (asymmetric) parallel processes or queries that admit a compact representation.
The sources, benchmarks and binaries are archived. 13 There is no clear-cut model checking technique that is superior to others (see QCOMP [13]). We demonstrate that, while Rubicon is not competitive on some commonly used benchmarks [53], it improves a modern model checking portfolio approach on a significant set of benchmarks. Below we provide several natural models on which Rubicon is superior to one or both competing methods. We also evaluated Rubicon on standard benchmarks, highlighting that Rubicon is applicable to models from the literature. We see that Rubicon is effective on Herman (elaborated below), has mixed results on BRP (see Appendix C) and is currently not competitive on some other standard benchmarks (NAND, EGL, LeaderSync). While not exhaustive, our selected benchmarks highlight specific strengths and weaknesses of Rubicon. Finally, a particular benefit of Rubicon is fast sampling of parametric chains, which we demonstrate on Herman and our factory example.

Scaling Experiments
In this section, we describe several scaling experiments (Figure 9), each designed to highlight a specific strength or weakness.
Weather Factories. First, Figure 9(a) describes a generalization of the motivating example from Sec. 1. In this model, the probability that each factory is on strike depends on a common random event: whether or not it is raining. (In Figure 9, an "(R)" in a caption denotes random parameters.) The
rain on each day is dependent on the previous day's weather. We plot runtime for an increasing number of factories for h=10. Both Storm engines eventually fail due to the state explosion and the number of distinct probabilities in the MC.
Rubicon is orders of magnitude faster in comparison, highlighting that it does not depend on complete independence among the factories. Figure 9(b) shows a more challenging instance where the weather includes wind which, each day, affects whether or not the sun will shine, which in turn affects strike probability.
Herman. Herman is based on a distributed protocol [39] that has been well-studied [54,1] and is one of the standard benchmarks in probabilistic model checking. Rather than computing the expected number of steps to 'stabilization', we consider the step-bounded probability of stabilization. Usually, all participants in the protocol flip a coin with the same bias. The model is then highly symmetric, and hence amenable to symbolic representation with ADDs. Figures 9(c) and 9(e) show how the methods scale on Herman examples with 13 and 17 parallel processes. We observe that the explicit approach scales very efficiently in the number of iterations but has a much higher up-front model-construction cost, and hence can be slower for fewer iterations.
To study what happens when the coin biases vary over the protocol participants, we made a version of the Herman protocol in which each participant's bias is randomly chosen. This ruins the symmetry and causes the ADD-based approaches to scale significantly worse (Figures 9(d), 9(f), and 9(g)); we see that the symbolic ADD-based approaches completely fail on Herman 17 and Herman 19 (a curve terminating denotes a memory error). Rubicon and the explicit approach are unaffected by varying parameters.
Queues. The Queues model has K queues of capacity Q where, at every step, tasks arrive with a particular probability. Three queues are of type 1, the others of type 2. We ask for the probability that all queues of type 1 and at least one queue of type 2 are full within h steps. Contrary to the previous models, the ADD representation of the transition matrix is small. Figure 9(h) shows the relative scaling on this model with K = 8 and Q = 3. We observe that ADDs quickly fail due to the inability to concisely represent the probability vector x from Sec. 3. Rubicon outperforms explicit model checking until h = 10.

Sampling Parametric Markov Chains
We evaluate performance for the pMC sampling problem outlined in Sec. 2. Table 1 gives, for four models, the time to construct the BDD and to perform WMC, as well as the time to construct an ADD in Storm and to perform model checking with this ADD. Finally, we show the time for Storm to compute the solution function of the pMC (with the explicit representation). The pMC sampling in Storm, both symbolic and explicit, computes the reachability probabilities with concrete probabilities. Rubicon, in contrast, constructs a 'parametric' BDD once, amortizing this cost over repeated, efficient evaluations. The 'parametric' BDD may be thought of as a solution function, as discussed in Sec. 4.1. Storm cannot compute these solution functions as efficiently. We observe in Tab. 1 that fast parametric sampling is realized in Rubicon: for instance, after a 40s up-front compilation of the factories example with 15 factories, we have a solution function in factorized form, and drawing a sample costs an order of magnitude less time. Hence, sampling and the computation of solution functions of pMCs is a major strength of Rubicon.
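The amortization pattern can be illustrated as follows. This is only a stand-in for Rubicon's parametric BDD: we use a hypothetical factories-style query (probability that at least one of n factories strikes, each independently with its own parameter) whose solution function happens to factorize into a product.

```python
import random

def compile_solution_function(n):
    """Hypothetical stand-in for the 'parametric BDD': one up-front
    compilation yields a closure that is cheap to re-evaluate per
    parameter sample."""
    def solution(params):          # params: n strike probabilities
        assert len(params) == n
        none = 1.0
        for p in params:           # factorized form: product over factories
            none *= (1.0 - p)
        return 1.0 - none
    return solution

f = compile_solution_function(3)               # compile once ...
sample = [random.random() for _ in range(3)]
prob = f(sample)                               # ... then sample cheaply
assert 0.0 <= prob <= 1.0
print(f([0.5, 0.5, 0.5]))                      # 1 - 0.5**3 = 0.875
```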

Discussion, Related Work, and Conclusion
We have demonstrated that the probabilistic inference approach to probabilistic model checking can improve scalability on an important class of problems. Another benefit of the approach is the sampling of pMCs. These are used to evaluate, e.g., the robustness of systems [1], or to synthesise POMDP controllers [42]. Many state-of-the-art approaches [18,25,20] require the evaluation of various instantiated MCs, and Rubicon is well-suited to this setting. More generally, support of inference techniques opens the door to a variety of algorithms for additional queries, e.g., computing conditional probabilities [3,8].
An important limitation of probabilistic inference is that only finitely many paths can be stored. For infinite horizon properties in cyclic models, an infinite set of arbitrarily long paths would be required. However, as standard in probabilistic model checking, we may soundly approximate infinite horizons. Additionally, the inference algorithm in Dice does not support a notion of non-determinism. It thus can only be used to evaluate MCs, not Markov decision processes. However, [63] illustrates that this is not a conceptual limitation. Finally, we remark that Rubicon achieves its performance with a straightforward translation. We are optimistic that this is a first step towards supporting a larger class of models by improving the transpilation process for specific problems.
Related work The tight connection with inference has been recently investigated via the use of model checking for Bayesian networks, the prime model in probabilistic inference [57]. Recently, this has been extended to consider parameter synthesis methods from the verification community [58]. Bayesian networks can be described as probabilistic programs [11] and their operational semantics coincides with MCs [32]. Our work complements these insights by studying how symbolic model checking can be sped up by probabilistic inference.
The path-based perspective is tightly connected to factored state spaces. Factored state spaces are often represented as (bipartite) dynamic Bayesian networks (DBNs). ADD-based model checking for DBNs has been investigated in [26], with mixed results. That investigation focuses on using ADDs for factored state space representations; we investigate using BDDs representing paths. Other approaches have also investigated a path-based view: the symbolic encoding in [29] annotates propositional sub-formulae with probabilities, an idea closer to ours. The underlying process implicitly constructs an (uncompressed) computation tree, leading to an exponential blow-up. Likewise, an explicit construction of a computation tree without factorization is considered in [64]. Compression by grouping paths has been investigated in two approximate approaches: [56] discretises probabilities and encodes the problem as a satisfiability problem with quantifiers and bit-vectors. This idea has been extended [62] to a PAC algorithm via purely propositional encodings and (approximate) model counting [15]. Finally, factorisation benefits from symmetries, which can also be exploited using symmetry reduction [51]. We highlight that the latter is not applicable to the example in Fig. 1(d).
There are many techniques for exact probabilistic inference in various forms of probabilistic modeling, including probabilistic graphical models [55,21]. The semantics of graphical models make it difficult to transpile Prism programs, since commonly used operations are lacking. Recently, probabilistic programming languages have been developed that are more amenable to transpilation [14,24,31,61,30]. We target Dice due to the technical development in Section 4, which enabled us to design and explain our experiments. Closest to Dice is ProbLog [28], which is also a PPL that performs inference via WMC; ProbLog has different semantics from Dice that make the translation less straightforward. The paper [63] uses an encoding similar to Dice's for inferring specifications based on observed traces. ADDs and variants have been considered for probabilistic inference [16,19,60], which is similar to the process commonly used for probabilistic model checking. The planning community has developed its own, largely disjoint, set of methods [46]. Some ideas from learning have been applied in a model checking context [12].
Conclusion We present Rubicon, bringing probabilistic AI to the probabilistic model checking community. Our results show that Rubicon can outperform probabilistic model checkers on some interesting examples, and that this is not a coincidence but rather the result of a significantly different perspective.

A Binary MCs
Any (parametric) Markov chain with out-degree more than two can be translated into a Markov chain with an out-degree of at most two. This operation is standard and is, e.g., exploited in [42]. We exemplify one possible construction in Fig. 10. Notice that this construction requires increasing the horizon.
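One such construction can be sketched in Python: an n-ary distribution is replaced by a chain of binary choices with renormalized branch probabilities. This is a generic sketch, not necessarily the construction of Fig. 10.

```python
from fractions import Fraction

def binarize(dist):
    """Turn a distribution over n successors into a chain of binary
    choices: at step i, branch to successor i with the renormalized
    residual probability, otherwise continue down the chain (the
    intermediate states lengthen the horizon)."""
    items, tree, rest = list(dist.items()), [], Fraction(1)
    for succ, p in items[:-1]:
        tree.append((succ, p / rest))         # conditional branch probability
        rest -= p
    tree.append((items[-1][0], Fraction(1)))  # last successor is certain
    return tree

def flatten(tree):
    """Recover the original distribution from the binary chain."""
    out, rest = {}, Fraction(1)
    for succ, q in tree:
        out[succ] = rest * q
        rest *= (1 - q)
    return out

# Round-trip check on a three-way branch:
d = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
assert flatten(binarize(d)) == d
```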

B Details on Rubicon: Prism to Dice
In this section, we assume some familiarity with the Prism semantics. We refer to the Prism website for details.

B.1 Extensions to Monolithic Translation
State-dependent probabilities. Dice currently does not support expressions that evaluate to rationals; thus, probabilities must be constants. Our translation therefore expands expressions by considering all values of the variables that occur in them. We illustrate this in Fig. 11(a) and Fig. 11(b). Prism programs may contain expressions like x/y, with x and y both ranging from, say, 0 to 10, which do not necessarily evaluate to probabilities. The language only requires the outcomes to be valid probabilities in reachable states.
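The expansion can be sketched as follows (illustrative Python only: guards are plain strings rather than Prism syntax, and the probability expression y/3 is hypothetical):

```python
from fractions import Fraction

def expand(guard_src, prob_expr, var, domain):
    """Expand a command whose update probability depends on `var` into
    per-valuation commands with constant probabilities. Valuations where
    the expression is not a valid probability are skipped; they must be
    unreachable in the MC for the program to be well-formed."""
    commands = []
    for v in domain:
        p = prob_expr(v)
        if not (0 <= p <= 1):
            continue
        commands.append((f"({guard_src}) & ({var}={v})", p))
    return commands

# Hypothetical command with state-dependent probability y/3, y in 0..3:
cmds = expand("x=0", lambda y: Fraction(y, 3), "y", range(4))
for guard, p in cmds:
    print(guard, p)
```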
Init statements. The declarative way of specifying initial states, which gives an initial state as the solution of a predicate, is supported using Storm's API. Notice that currently we only support MCs with a unique initial state.

B.2 Modular Translation
For modular Prism programs, the action names at the front of Prism commands are important. In each module, there is a set of action names available. An action is enabled if each module that contains this action name has (at least) one command with this action whose guard is satisfied. Commands with an empty action are assumed to have a globally unique action name, so in that case the action is enabled iff the guard is enabled. Intuitively, once an action is selected, we randomly select a command per module in all modules containing this action name. We then independently and randomly, according to the probability distribution over updates, select an update, and execute these updates in parallel. All reads are done before all writes. We remark that variables can be read from and written to from any module. Data races lead to undefined behavior, i.e., any linearization of updates is valid in Prism.
Modular translation without overlapping guards within action and module. Here, we assume guards do not overlap within an action and module; see the next paragraph for the general case. The Dice program first evaluates all guards to determine which actions are enabled. Then Dice randomly selects one of the enabled actions. Once the action is fixed, we need to select the associated commands and updates. While similar to the vanilla case, we now run a series of updates rather than a single update. More precisely, once the action is fixed, we iterate over the modules and flip the coins to select the updates for each command. We use the outcomes of these coin flips to incrementally construct the next state. We remark that the latter is not completely trivial, as for different actions, different modules may be assigning an update to a particular variable.

Modular translation with overlapping guards within action and module. When a module has multiple commands with the same action, the semantics of Prism programs requires resolving actions uniformly at random on a global level, among all enabled combinations of enabled commands. Thus, in our example, if there were two enabled commands with action a in module m1 in a given state, then action a would get double weight when determining which action to select. Once we have computed the right weight for every action, we can continue as before, where now in every module we must first decide which command to take.

B.3 Discussion on sampling and other properties
Sampling and symbolic distributions. As discussed before, Dice may quickly evaluate models with a range of distributions. Technically, we support Prism programs with symbolic probabilities (parameters), and allow probabilities to be expressions over these symbols. We collect all commands that depend on these parameters and replace them by symbolic transitions. We then (separately) translate each assignment to these parameters into a concrete instantiation of the symbolic distributions.
Extended finite-horizon properties. Instead of returning the distribution over a predicate indicating whether a target state has been visited, the Dice program can return distributions over (bounded) quantities. In the finite-horizon case, expected cumulative rewards (which assign to every finite path a bounded quantity rather than true or false) can thus be supported straightforwardly. Rather than simple reachability, the target can be described by an automaton; the translation merely needs to update the hit function (and make it a stateful function). Dice has native and efficient support for conditioning, which allows conditioning on finite-horizon events, e.g., to condition on a prefix, or to condition that within the first H_1 steps a particular state must have been visited. Combinations of these constructions with indefinite-horizon properties are left for future work.
Indefinite horizon. The following approximation is inspired by ideas like interval iteration [33]. Naturally, the probability mass for the bounded horizon is a lower bound on the indefinite-horizon probability. To also obtain an upper bound, we use the following equality [7]: Pr(♦T) + Pr(♦(□¬T)) = 1, which states that eventually we reach a target state, or we reach a state from which it is impossible to reach a target state, denoted □¬T ('globally not T'). By setting the states satisfying □¬T as the bad states Bad, we can approximate Pr(♦Bad) with a bounded-horizon probability, obtaining Pr(♦≤h T) ≤ Pr(♦T) ≤ 1 − Pr(♦≤h Bad).
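The bracketing can be illustrated on a hypothetical four-state chain (Python sketch; bounded value iteration stands in for the bounded-horizon Dice computation):

```python
from fractions import Fraction

# Toy MC (hypothetical): from state 0 we may reach target 2 or get
# absorbed in bad state 3, from which 2 is unreachable.
CHAIN = {
    0: {1: Fraction(1, 2), 3: Fraction(1, 2)},
    1: {0: Fraction(1, 2), 2: Fraction(1, 2)},
    2: {2: Fraction(1)},
    3: {3: Fraction(1)},
}

def bounded_reach(targets, state, h):
    """Pr(reach `targets` within h steps) by bounded value iteration."""
    if state in targets:
        return Fraction(1)
    if h == 0:
        return Fraction(0)
    return sum(p * bounded_reach(targets, s, h - 1)
               for s, p in CHAIN[state].items())

h = 6
lower = bounded_reach({2}, 0, h)        # Pr(reach T within h steps)
upper = 1 - bounded_reach({3}, 0, h)    # 1 - Pr(reach Bad within h steps)
assert lower <= upper                   # brackets Pr(eventually reach T)
print(lower, upper)
```

Here the exact indefinite-horizon probability is 1/3, and the bounds tighten around it as h grows.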
To generate a Dice program, we compute with Storm a BDD that expresses the states in Bad [7]. We translate this BDD into a sequence of if-then-else statements, with one statement per node.

B.4 Technical details
Invalid inputs. The semantics of Prism programs assume that variable bounds are adhered to. However, Rubicon does not enforce this.
Overlapping guards. To avoid constantly running into the overlapping-guards case, we run a Satisfiability-Modulo-Theories (SMT) solver [10] that checks whether commands have overlapping guards. This analysis may be refined, e.g., to take into account the states for which guards actually overlap.
selectFrom. selectFrom is not a native function in Dice but rather encoded as in Figure B.3. We first count the number of set bits, then randomly select an offset C, count until we find the C'th set bit, and return its index.
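A Python rendering of this encoding (in Dice this is a deterministic circuit over a fresh uniform choice; here we use the standard library's random module for brevity):

```python
import random

def select_from(bits, rnd=random):
    """Given a tuple of bits, uniformly pick one of the set bits and
    return its index (-1 if no bit is set). Mirrors the encoding above:
    count the set bits, draw a random offset C, scan for the C'th set bit."""
    n = sum(bits)                      # number of set bits
    if n == 0:
        return -1
    c = rnd.randrange(n)               # uniform offset among set bits
    for i, b in enumerate(bits):
        if b:
            if c == 0:
                return i
            c -= 1
    raise AssertionError("unreachable")

assert select_from((0, 1, 0)) == 1     # only one enabled guard
assert select_from((0, 0, 0)) == -1    # no enabled guard
assert select_from((1, 0, 1)) in (0, 2)
```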
Bitwidth and domains. Notice that the translation requires the lower bounds of all variables to be 0. Dice types integers by their bitwidth. This potentially leads to typing errors when variables with different bitwidths occur within an expression. We therefore use the bitwidth of the largest domain for all variables; static analysis could potentially refine this. We do not explicitly check whether variables remain in their domain, as the behavior upon violating variable bounds is undefined in the Prism semantics.
Further technical concerns. We have seen problems with expressing that exactly one of a set of predicates φ_1, ..., φ_k should be true. In Prism programs, this is often expressed as (φ_1 ? 1 : 0) + ... + (φ_k ? 1 : 0) = 1, which is awkward due to the aforementioned typing problem. We alleviate this specific problem by extending the Prism dialect that Storm accepts with predicates like ExactlyOneOf(φ_1, ..., φ_k).

C Additional Experiments
See Listing 1.2 for the "Queues" Prism model.