PrIC3: Property Directed Reachability for MDPs

IC3 has been a leap forward in symbolic model checking. This paper proposes PrIC3 (pronounced pricy-three), a conservative extension of IC3 to symbolic model checking of MDPs. Our main focus is to develop the theory underlying PrIC3. Alongside, we present a first implementation of PrIC3 including the key ingredients from IC3 such as generalization, repushing, and propagation.


Introduction
IC3. Also known as property-directed reachability (PDR) [23], IC3 [13] is a symbolic approach for verifying finite transition systems (TSs) against safety properties like "bad states are unreachable". It combines bounded model checking (BMC) [12] and inductive invariant generation. Put shortly, IC3 either proves that a set B of bad states is unreachable by finding a set of non-B states closed under reachability-called an inductive invariant-or refutes reachability of B by a counterexample path reaching B. Rather than unrolling the transition relation (as in BMC), IC3 attempts to incrementally strengthen the invariant "no state in B is reachable" into an inductive one. In addition, it applies aggressive abstraction to the explored state space, so-called generalization [36]. These aspects together with the enormous advances in modern SAT solvers have led to IC3's success. IC3 has been extended [27,38] and adapted to software verification [19,44]. This paper develops a quantitative IC3 framework for probabilistic models.
MDPs. Markov decision processes (MDPs) extend TSs with discrete probabilistic choices. They are central in planning, AI as well as in modeling randomized distributed algorithms. A key question in verifying MDPs is quantitative reachability: "is the (maximal) probability to reach B at most λ? ". Quantitative reachability [5,6] reduces to solving linear programs (LPs). Various tools support MDP model checking, e.g., Prism [43], Storm [22], modest [34], and EPMC [31]. The LPs are mostly solved using (variants of) value iteration [8,28,35,51]. Symbolic BDD-based MDP model checking originated two decades ago [4] and is rather successful. Towards IC3 for MDPs. Despite the success of BDD-based symbolic methods in tools like Prism, IC3 has not penetrated probabilistic model checking yet. The success of IC3 and the importance of quantitative reachability in probabilistic model checking raises the question whether and how IC3 can be adapted-not just utilized-to reason about quantitative reachability in MDPs. This paper addresses the challenges of answering this question. It extends IC3 in several dimensions to overcome these hurdles, making PrIC3-to our knowledge-the first IC3 framework for quantitative reachability in MDPs 1 . Notably, PrIC3 is conservative: For a threshold λ = 0, PrIC3 solves the same qualitative problem and behaves (almost) the same as standard IC3. Our main contribution is developing the theory underlying PrIC3, which is accompanied by a proof-of-concept implementation.
Challenge 1 (Leaving the Boolean domain). IC3 iteratively computes frames, which are over-approximations of sets of states that can reach B in a bounded number of steps. For MDPs, Boolean reachability becomes a quantitative reachability probability. This requires a shift: frames become real-valued functions rather than sets of states. Thus, there are infinitely many possible frames-even for finite-state MDPs-just as for infinite-state software [19,44] and hybrid systems [54]. Additionally, whereas in TSs a state reachable within k steps remains reachable on increasing k, the reachability probability in MDPs may increase. This complicates ensuring termination of an IC3 algorithm for MDPs.
Challenge 2 (Counterexamples = single paths). For TSs, a single cycle-free path 2 to B suffices to refute that "B is not reachable". This is not true in the probabilistic setting [32]. Instead, proving that the probability of reaching B exceeds the threshold λ requires a set of possibly cyclic paths-e.g., represented as a sub-MDP [15]-whose probability mass exceeds λ. Handling sets of paths as counterexamples in the context of IC3 is new.

Challenge 3 (Strengthening).
This key IC3 technique intuitively turns a proof obligation of type (i) "state s is unreachable from the initial state s I " into type (ii) "s's predecessors are unreachable from s I ". A first issue is that in the quantitative setting, the standard characterization of reachability probabilities in MDPs (the Bellman equations) inherently reverses the direction of reasoning (cf. "reverse" IC3 [53]): Hence, strengthening turns (i) "s cannot reach B" into (ii) "s's successors cannot reach B".
A much more challenging issue, however, is that in the quantitative setting obligations of type (i) read "s is reachable with at most probability δ". However, the strengthened type (ii) obligation must then read: "the weighted sum over the reachability probabilities of the successors of s is at most δ". In general, there are infinitely many possible choices of subobligations for the successors of s in order to satisfy the original obligation, because-grossly simplified-there are infinitely many possibilities for a and b to satisfy weighted sums such as 1 3 a + 2 3 b ≤ δ. While we only need one choice of subobligations, picking a good one is approximately as hard as solving the entire problem altogether. We hence require a heuristic, which is guided by a user-provided oracle. Challenge 4 (Generalization). "One of the key components of IC3 is [inductive] generalization" [13]. Generalization [36] abstracts single states. It makes IC3 scale, but is not essential for correctness. To facilitate generalization, systems should be encoded symbolically, i.e., integer-valued program variables describe states. Frames thus map variables to probabilities. A first aspect is how to effectively present them to an SMT-solver. Conceptually, we use uninterpreted functions and universal quantifiers (encoding program behavior) together with linear real arithmetic to encode the weighted sums occurring when reasoning about probabilities. A second aspect is more fundamental: Abstractly, IC3's generalization guesses an unreachable set of states. We, however, need to guess this set and a probability for each state. To be effective, these guesses should moreover eventually yield an inductive frame, which is often highly nonlinear. We propose three SMT-guided interpolation variants for guessing these maps.
Structure of this Paper. We develop PrIC3 gradually: We explain the underlying rationale in Sect. 3. We also describe the core of PrIC3-called PrIC3 H -which resembles closely the main loop of standard IC3, but uses adapted frames and termination criteria (Challenge 1). In line with Challenge 3, PrIC3 H is parameterized by a heuristic H which is applied whenever we need to select one out of infinitely many probabilities. No requirements on the quality of H are imposed. PrIC3 H is sound and always terminates: If it returns true, then the maximal reachability probability is bounded by λ. Without additional assumptions about H, PrIC3 H is incomplete: on returning false, it is unknown whether the returned sub-MDP is indeed a counterexample (Challenge 2). Section 4 details strengthening (Challenge 3). Section 5 presents a sound and complete algorithm PrIC3 on top of PrIC3 H . Section 6 presents a prototype, discusses our chosen heuristics, and addresses Challenge 4. Section 7 shows some encouraging experiments, but also illustrates need for further progress.
Related Work. Just like IC3 has been a symbiosis of different approaches, PrIC3 has been inspired by several existing techniques from the verification of probabilistic systems.
BMC. Adaptions of BMC to Markov chains (MCs) with a dedicated treatment of cycles have been pursued in [57]. The encoding in [24] annotates sub-formulae with probabilities. The integrated SAT solving process implicitly unrolls all paths leading to an exponential blow-up. In [52], this is circumvented by grouping paths, discretizing them, and using an encoding with quantifiers and bit-vectors, but without numerical values. Recently, [56] extends this idea to a PAC algorithm by purely propositional encodings and (approximate) model counting [17]. These approaches focus on MCs and are not mature yet.
Invariant Synthesis. Quantitative loop invariants are key in analyzing probabilistic programs whose operational semantics are (possibly infinite) MDPs [26]. A quantitative invariant I maps states to probabilities. I is shown to be an invariant by comparing I to the result of applying the MDP's Bellman operator to I. Existing approaches for invariant synthesis are, e.g., based on weakest preexpectations [33,39,40,42,46], template-based constraint solving [25], notions of martingales [3,9,16,55], and solving recurrence relations [10]. All but the last technique require user guidance.
Abstraction. To combat state-space explosion, abstraction is often employed. CEGAR for MDPs [37] deals with explicit sets of paths as counterexamples. Game-based abstraction [30,41] and partial exploration [14] exploit that not all paths have to be explored to prove bounds on reachability probabilities.

Statistical Methods and (deep) Reinforcement
Learning. Finally, an avenue that avoids storing a (complete) model are simulation-based approaches (statistical model checking [2]) and variants of reinforcement learning, possibly with neural networks. For MDPs, these approaches yield weak statistical guarantees [20], but may provide good oracles.

Problem Statement
Our aim is to prove that the maximal probability of reaching a set B of bad states from the initial state s I of a Markov decision process M is at most some threshold λ. Below, we give a formal description of our problem. We refer to [7,50] for a thorough introduction.
Definition 1 (MDPs). A Markov decision process ( MDP) is a tuple M = (S, s I , Act, P ), where S is a finite set of states, s I ∈ S is the initial state, Act is a finite set of actions, and P : S × Act × S → [0, 1] is a transition probability function. For state s, let Act (s) = {a ∈ Act | ∃s ∈ S : P (s, a, s ) > 0} be the enabled actions at s. For all states s ∈ S, we require |Act (s) | ≥ 1 and s ∈S P (s, a, s ) = 1. For this paper, we fix an MDP M = (S, s I , Act, P ), a set of bad states B ⊆ S, and a threshold λ ∈ [0, 1]. The maximal 3 (unbounded) reachability probability to eventually reach a state in B from a state s is denoted by Pr max (s |= ♦B). We characterize Pr max (s |= ♦B) using the so-called Bellman operator. Let M N denote the set of functions from N to M . Anticipating IC3 terminology, we call a function F ∈ [0, 1] S a frame. We denote by F [s] the evaluation of frame F for state s.
We write Φ a for Φ {a} , Φ for Φ Act , and call Φ simply the Bellman operator.
For every state s, the maximal reachability probability Pr max (s |= ♦B) is then given by the least fixed point of the Bellman operator Φ. That is, where the underlying partial order on frames is a complete lattice with ordering In terms of the Bellman operator, our formal problem statement reads as follows: Given an MDP M with initial state s I , a set B of bad states, and a threshold λ ∈ [0, 1], prove or refute that Whenever Pr max (s I |= ♦B) ≤ λ indeed holds, we say that the MDP M is safe (with respect to the set of bad states B and threshold λ); otherwise, we call it unsafe. 3 The Core PrIC3 Algorithm The purpose of PrIC3 is to prove or refute that the maximal probability to reach a bad state in B from the initial state s I of the MDP M is at most λ. In this section, we explain the rationale underlying PrIC3. Moreover, we describe the core of PrIC3-called PrIC3 H -which bears close resemblance to the main loop of standard IC3 for TSs. Because of the inherent direction of the Bellman operator, we build PrIC3 on reverse IC3 [53], cf. Challenge 3. Reversing constitutes a shift from reasoning along the direction initial-to-bad to bad-to-initial. While this shift is mostly inessential to the fundamentals underlying IC3, the reverse direction is unswayable in the probabilistic setting. Whenever we draw a connection to standard IC3, we thus generally mean reverse IC3.

Inductive Frames
IC3 for TSs operates on (quali tative) frames representing sets of states of the TS at hand. A frame F can hence be thought of as a mapping 4 from states to {0, 1}. In PrIC3 for MDPs, we need to move from a Boolean to a quantitative regime. Hence, a (quanti tative) frame is a mapping from states to probabilities in [0, 1].
For a given TS, consider the frame transformer T that adds to a given input frame F all bad states in B and all predecessors of the states contained in F . The rationale of standard (reverse) IC3 is to find a frame F ∈ {0, 1} S such that (i) the initial state s I does not belong to F and (ii) applying T takes us down in the partial order on frames, i.e., Intuitively, (i) postulates the hypothesis that s I cannot reach B and (ii) expresses that F is closed under adding bad states and taking predecessors, thus affirming the hypothesis. Analogously, the rationale of PrIC3 is to find a frame F ∈ [0, 1] S such that (i) F postulates that the probability of s I to reach B is at most the threshold λ and (ii) applying the Bellman operator Φ to F takes us down in the partial order on frames, i.e., Frames satisfying the above conditions are called inductive invariants in IC3. We adopt this terminology. By Park's Lemma [48], which in our setting reads If no inductive invariant exists, then standard IC3 will find a counterexample: a path from the initial state s I to a bad state in B, which serves as a witness to refute. Analogously, PrIC3 will find a counterexample, but of a different kind: Since single paths are insufficient as counterexamples in the probabilistic realm (Challenge 2), PrIC3 will instead find a subsystem of states of the MDP witnessing Pr max (s I |= ♦B) > λ.

The PrIC3 Invariants
Analogously to standard IC3, PrIC3 aims to find the inductive invariant by maintaining a sequence of frames imates the maximal probability of reaching B from s within at most i steps. This i-step-bounded reachability probability Pr max s |= ♦ ≤i B can be characterized using the Bellman operator: Φ (0) is the 0-step probability; it is 1 for every s ∈ B and 0 otherwise. For any i ≥ 0, we have where 0, the frame that maps every state to 0, is the least frame of the underlying complete lattice. For a finite MDP, the unbounded reachability probability is then given by the limit where ( * ) is a consequence of the well-known Kleene fixed point theorem [45].
. . will never explicitly be known to PrIC3. Instead, PrIC3 will ensure the above frame-wise overapproximation property implicitly by enforcing the so-called PrIC3 invariants on the frame sequence F 0 , F 1 , F 2 , . . .. Apart from allowing for a threshold 0 ≤ λ ≤ 1 on the maximal reachability probability, these invariants coincide with the standard IC3 invariants (where λ = 0 is fixed). Formally: Definition 3 (PrIC3 Invariants). Frames F 0 , . . . , F k , for k ≥ 0, satisfy the PrIC3 invariants, a fact we will denote by PrIC3Inv (F 0 , . . . , F k ), if all of the following hold: The PrIC3 invariants enforce the above picture: The chain property , the frames F 0 , . . . , F k in effect bound the maximal step-bounded reachability probability of every state: In particular, Lemma 1 together with frame-safety ensures that the maximal step-bounded reachability probability of the initial state s I to reach B is at most the threshold λ.
As for proving that the unbounded reachability probability is also at most λ, it suffices to find two consecutive frames, say F i and F i+1 , that coincide:

Operationalizing the PrIC3 Invariants for Proving Safety
Lemma 2 gives us a clear angle of attack for proving an MDP safe: Repeatedly add and refine frames approximating step-bounded reachability probabilities for more and more steps while enforcing the PrIC3 invariants (cf. Definition 3.2) until two consecutive frames coincide. Analogously to standard IC3, this approach is taken by the core loop PrIC3 H depicted in Algorithm 1; differences to the main loop of IC3 (cf. [23,Fig. 5]) are highlighted in red. A particular difference is that PrIC3 H is parameterized by a heuristic H for finding suitable probabilities (see Challenge 3). Since the precise choice of H is irrelevant for the soundness of PrIC3 H , we defer a detailed discussion of suitable heuristics to Sect. 4.
if ¬success then returnfalse, subsystem; As input, PrIC3 H takes an MDP M = (S, s I , Act, P ), a set B ⊆ S of bad states, and a threshold λ ∈ [0, 1]. Since the input is never changed, we assume it to be globally available, also to subroutines. As output, PrIC3 H returns true if two consecutive frames become equal. We hence say that PrIC3 H is sound if it only returns true if M is safe.
We will formalize soundness using Hoare triples. For precondition φ, postcondition ψ, and program P , the triple φ P ψ is valid (for partial correctness) if, whenever program P starts in a state satisfying precondition φ and terminates in some state s , then s satisfies postcondition ψ. Soundness of PrIC3 H then means validity of the triple Let us briefly go through the individual steps of PrIC3 H in Algorithm 1 and convince ourselves that it is indeed sound. After that, we discuss why PrIC3 H terminates and what happens if it is unable to prove safety by finding two equal consecutive frames.
How PrIC3 H works. Recall that PrIC3 H maintains a sequence of frames If Strengthen H returns true, then a new frame F k+1 = 1 is created in l. 5 Fig. 1. Assume that action b has been removed. Then, for every state, exactly one action is enabled, i.e., we consider a Markov chain. Figure 2 depicts the frame sequences computed by PrIC3 H (for a reasonable H) on that Markov chain for two thresholds: 5 /9 = Pr max (s 0 |= ♦B) and 9 /10. In particular, notice that proving the coarser bound of 9 /10 requires fewer frames than proving the exact bound of 5 /9.

Strengthening in PrIC3 H
When the main loop of PrIC3 H has created a new frame F k = 1 in its previous iteration, this frame may violate frame-safety (Definition 3.3) because of F k [s I ] = 1 ≤ λ. The task of Strengthen H is to restore the PrIC3 invariants on all frames F 0 , . . . , F k . To this end, our first obligation is to lower the value in frame i = k for state s = s I to δ = λ ∈ [0, 1]. We denote such an obligation by (i, s, δ). Observe that implicitly δ = 0 in the qualitative case, i.e., when proving unreachability. An obligation (i, s, δ) is resolved by updating the values assigned to state s in all frames F 1 , . . . , F i to at most δ. That is, for all j ≤ i, we set F j [s] to the minimum  1, s1, δ1) , . . . , (i − 1, sn, δn) , (i, s, δ) Fig. 6]) are highlighted in red. Intuitively, Strengthen H attempts to recursively resolve all obligations until either both frame-safety and relative inductivity are restored for all frames or it detects a potential counterexample justifying why it is unable to do so. We first consider an execution where the latter does not arise: Example 3. We zoom in on Example 2: Prior to the second iteration, we have created the following three frames assigning values to the states s 0 , s 5 : F 0 = (0, 0, 0, 0, 1), F 1 = ( 5 /9, 1, 1, 1, 1, 1), and F 2 = 1.
Strengthen H is Sound. Let us briefly discuss why Algorithm 2 meets the specification of a sound implemenation of Strengthen H (Definition 4): First, we observe that Algorithm 2 alters the frames-and thus potentially invalidates the PrIC3 invariants-only in l. 11 by resolving an obligation Indeed, resolving obligation (i, s, δ) in l. 11 lowers the values assigned to state s to at most δ without invalidating the PrIC3 invariants: Lemma 3. Let (i, s, δ) be an obligation and F 0 , . . . , F i , for i > 0, be frames with Crucially, the precondition of Definition 4 guarantees that all PrIC3 invariants except frame safety hold initially. Since these invariants are never invalidated due to Lemma 3, Algorithm 2 is a sound implementation of Strengthen H if it restores frame safety whenever it returns true, i.e., once it leaves the loop with an empty obligation queue Q (ll. 12-13). Now, an obligation (i, s, δ) is only popped from Q in l. 3. As (i, s, δ) is added to Q upon reaching l. 9, the size of Q can only ever be reduced (without returning false) by resolving (i, s, δ) in l. 11. Hence, Algorithm 2 does not return true unless it restored frame safety by resolving, amongst all other obligations, the initial obligation (k, s I , λ). Consequently: We remark that, analogously to standard IC3, resolving an obligation in l. 11 may be accompanied by generalization. That is, we attempt to update the values of multiple states at once. Generalization is, however, highly non-trivial in a probabilistic setting. We discuss three possible approaches to generalization in Sect. 6 1, s , ). Consequently, the total number of obligations spawned by Algorithm 2 is bounded. Since Algorithm 2 terminates if all obligations have been resolved (l. 12) and each of its loop iterations either returns false, spawns obligations, or resolves an obligation, we conclude: Strengthen H (F 0 , . . . , F k ) i, s, 0).
Strengthen H returns false. There are two cases in which Strengthen H fails to restore the PrIC3 invariants and returns false. The first case (the left disjunct of l. 4) is that we encounter an obligation for frame F 0 . Resolving such an obligation would inevitably violate initiality; analogously to standard IC3, we thus return false.
Strengthen H Cannot Prove Unsafety. If standard IC3 returns false, it proves unsafety by constructing a counterexample, i.e., a single path from the initial state to a bad state. If PrIC3 returns false, there are two possible reasons: Either the MDP is indeed unsafe, or the heuristic H at some point selected probabilities in a way such that Strengthen H is unable to restore the PrIC3 invariants (even though the MDP might in fact be safe). Strengthen H thus only returns a potential counterexample which either proves unsafety or indicates that our heuristic was inappropriate.
Counterexamples in our case consist of subsystems rather than a single path (see Challenge 2 and Sect. 5). Strengthen H hence returns the set Q.touched() of all states that eventually appeared in the obligation queue. This set is a conservative approximation, and optimizations as in [1] may be beneficial. Furthermore, in the qualitative case, our potential counterexample subsumes the counterexamples constructed by standard IC3: Recovery Statement 4. Let H 0 be the adequate heuristic mapping every state to 0. For qual. reachability (λ = 0), if success = false is returned by Strengthen H0 (F 0 , . . . , F k ), then Q.touched() contains a path from the initial to a bad state.

Dealing with Potential Counterexamples
Recall that our core algorithm PrIC3 H is incomplete for a fixed heuristic H: It cannot give a conclusive answer whenever it finds a potential counterexample for two possible reasons: Either the heuristic H turned out to be inappropriate or the MDP is indeed unsafe. The idea to overcome the former is to call PrIC3 H finitely often in an outer loop that generates new heuristics until we find an appropriate one: If PrIC3 H still does not report safety of the MDP, then it is indeed unsafe. We do not blindly generate new heuristics, but use the potential counterexamples returned by PrIC3 H to refine the previous one.
Let consider the procedure PrIC3 in Algorithm 3 which wraps our core algorithm PrIC3 H in more detail: First, we create an oracle Ω: S → [0, 1] which (roughly) estimates the probability of reaching B for every state. A perfect oracle would yield precise maximal reachability probabilites, i.e., Ω(s) = Pr max (s |= ♦B) for every state s. We construct oracles by user-supplied methods (highlighted in blue). Examples of implementations of all user-supplied methods in Algorithm 3 are discussed in Sect. 7.
Assuming the oracle is good, but not perfect, we construct an adequate heuristic H selecting probabilities based on the oracle 7 for all successors of a given state: There are various options. The simplest is to pass-through the oracle values. A version that is more robust against noise in the oracle is discussed in Sect. 6. We then invoke PrIC3 H . If PrIC3 H reports safety, the MDP is indeed safe by the soundness of PrIC3 H .
Check Refutation. If PrIC3 H does not report safety, it reports a subsystem that hints to a potential counterexample. Formally, this subsystem is a subMDP of states that were 'visited' during the invocation of Strengthen H . Definition 6 (subMDP). Let M = (S, s I , Act, P ) be an MDP and let S ⊆ S with s I ∈ S . We call M S = (S , s I , Act, P ) the subMDP induced by M and S , where for all s, s ∈ S and all a ∈ Act, we have P (s, a, s ) = P (s, a, s ).
A subMDP M S may be substochastic where missing probability mass never reaches a bad state. Definition 1 is thus relaxed: For all states s ∈ S we require that s ∈S P (s, a, s ) ≤ 1.If the subsystem is unsafe, we can conclude that the original MDP M is also safe.

Lemma 6. If M is a subMDP of M and M is unsafe, then M is also unsafe.
The role of CheckRefutation is to establish whether the subsystem is indeed a true counterexample or a spurious one. Formally, CheckRefutation should ensure: Again, PrIC3 is backward compatible in the sense that a single fixed heuristic is always sufficient when reasoning about reachability (λ = 0).

Recovery Statement 5.
For qualitative reachability (λ = 0) and the heuristic H 0 from Recovery Statement 4, PrIC3 invokes its core PrIC3 H exactly once.
This statement is true, as PrIC3 H returns either safe or a subsystem containing a path from the initial state to a bad state. In the latter case, CheckRefutation detects that the subsystem is indeed a counterexample which cannot be spurious in the qualitative setting.
We remark that the procedure CheckRefutation invoked in l. 5 is a classical fallback; it runs an (alternative) model checking algorithm, e.g., solving the set of Bellman equations, for the subsystem. In the worst case, i.e., for S = S, we thus solve exactly our problem statement. Empirically (Table 1) we observe that for reasonable oracles the procedure CheckRefutation is invoked on significantly smaller subMDPs. However, in the worst case the subMDP must include all paths of the original MDP, and then thus coincides.
Refine Oracle. Whenever we have neither proven the MDP safe nor unsafe, we refine the oracle to prevent generating the same subsystem in the next invocation of PrIC3 H . To ensure termination, oracles should only be refined finitely often. That is, we need some progress measure. The set touched overapproximates all counterexamples encountered in some invocation of PrIC3 H and we propose to use its size as the progress measure. While there are several possibilities to update touched through the user-defined procedure Enlarge (l. 6), every implementation should hence satisfy true touched ← Enlarge(touched, ) |touched | > |touched| . Consequently, after finitely many iterations, the oracle is refined with respect to all states. In this case, we may as well rely on solving the characteristic LP problem: Weaker assumptions on Refine are possible, but are beyond the scope of this paper. Moreover, the above lemma does not rely on the abstract concept that heuristic H provides suitable probabilities after finitely many refinements. 8

Practical PrIC3
So far, we gave a conceptual view on PrIC3, but now take a more practical stance. We detail important features of effective implementations of PrIC3 (based on our empirical evaluation). We first describe an implementation without generalization, and then provide a prototypical extension that allows for three variants of generalization.

A Concrete PrIC3 Instance Without Generalization
Input. We describe MDPs using the Prism guarded command language 9 , exemplified in Fig. 3  Encoding. We encode frames as logical formulae. Updating frames then corresponds to adding conjuncts, and checking for relative inductivity is a satisfiability call. Our encoding is as follows: States are assignments to the program variables, i.e., States = Z m . We use various uninterpreted functions, to whom we give semantics using appropriate constraints. Frames 10 are represented by uninterpreted functions Frame : Likewise, the Bellman operator is an uninterpreted function Phi : Among the appropriate constraints, we ensure that variables are within their range, bound the values for the frames, and enforce Phi (s) = 1 for s ∈ B. We encode the guarded commands as exemplified by this encoding of the first command in Fig. 3: In our implementation, we optimize the encoding. We avoid the uninterpreted functions by applying an adapted Ackerman reduction. We avoid universal quantifiers, by first observing that we always ask whether a single state is not inductive, and then unfolding the guarded commands in the constraints that describe a frame. That encoding grows linear in the size of the maximal out-degree of the MDP, and is in the quantifier-free fragment of linear arithmetic (QFLRIA).
Heuristic. We select probabilities δ i by solving the following optimization problem, with variables x i , range(x i ) ∈ [0, 1], for states s i ∈ Succs(s, a) and oracle Ω 11 .
The constraint ensures that, if the values x i correspond to the actual reachability probabilities from s i , then the reachability from state s is exactly δ. A constraint stating that δ ≥ . . . would also be sound, but we choose equality as it preserves room between the actual probability and the threshold we want to show. Finally, the objective function aims to preserve the ratio between the suggested probabilities.
Repushing and Breaking Cycles. Repushing [23] is an essential ingredient of both standard IC3 and PrIC3. Intuitively, we avoid opening new frames and spawning obligations that can be deduced from current information. Since repushing generates further obligations in the current frame, its implementation requires that the detection of Zeno-behavior has to be moved from PrIC3 H into the Strengthen H procedure. Therefore, we track the histories of the obligations in the queue. Furthermore, once we detect a cycle we first try to adapt the heuristic H locally to overcome this cyclic behavior instead of immediately giving up. This local adaption reduces the number of PrIC3 H invocations.
Extended Queue. In contrast to standard IC3, the obligation queue might contain entries that vary only in their δ entry. In particular, if the MDP is not a tree, it may occur that the queue contains both (i, s, δ) and (i, s, δ ) with δ > δ . Then, (i, s, δ ) can be safely pruned from the queue. Similarly, after handling (i, s, δ), if some fresh obligation (i, s, δ > δ) is pushed to the queue, it can be substituted with (i, s, δ). To efficiently operationalize these observations, we keep an additional mapping which remains intact over multiple invocations of Strengthen H . We furthermore employed some optimizations for Q.touched() aiming to track potential counterexamples better. After refining the heuristic, one may want to reuse frames or the obligation queue, but empirically this leads to performance degradation as the values in the frames are inconsistent with behavior suggested by the heuristic. Generalization in IC3 aims to update a set G (including s) of states in a frame rather than a single one without invalidating relative inductivity. In our setting, we thus consider a function p G : G → [0, 1] with p G (s) ≤ δ that assigns (possibly different) probabilities to all states in G. Updating a frame then amounts to adding the constraint Standard IC3 generalizes by iteratively "dropping" a variable, say v. The set G then consists of all states that do not differ from the fixed state s except for the value of v. 12 We take the same approach by iteratively dropping program variables. Hence, p G effectively becomes a mapping from the value s[v] to a probability. We experimented with four types of functions p G that we describe for Markov chains. The ideas are briefly outlined below; details are beyond the scope of this paper.
Constant p G . Setting all s ∈ G to δ is straightforward but empirically not helpful.
Linear Interpolation. We use a linear function p G that interpolates two points. The first point (s [v], δ) is obtained from the obligation (i, s, δ). For a second point, consider the following: Let Com be the unique 13 command active at state s. Among all states in G that are enabled in the guard of Com, we take the state s in which s [v] is maximal 14 . The second point for interpolation is then (s [v], Φ (F i−1 ) [s ]). If the relative inductivity fails for p G we do not generalize with p G , but may attempt to find other functions.
Polynomial Interpolation. Rather than linearly interpolating between two points, we may interpolate using more than two points. In order to properly fit these points, we can use a higher-degree polynomial. We select these points using counterexamples to generalization (CTGs): We start as above with linear interpolation. However, if p G is not relative inductive, the SMT solver yields a model with state s ∈ G and probability δ , with s violating relative inductivity, i.e., is then a further interpolation point, and we repeat.
Technically, when generalizing using nonlinear constraints, we use real-valued arithmetic with a branch-and-bound-style approach to ensure integer values.
Hybrid Interpolation. In polynomial interpolation, we generate high-degree polynomials and add them to the encoding of the frame. In subsequent invocations, reasoning efficiency is drastically harmed by these high-degree polynomials. Instead, we soundly approximate p G by a piecewise linear function, and use these constraints in the frame.

Experiments
We assess how PrIC3 may contribute to the state of the art in probabilistic model checking. We do some early empirical evaluation showing that PrIC3 is feasible. We see ample room for further improvements of the prototype.
Implementation. We implemented a prototype 15 of PrIC3 based on Sect. 6.1 in Python. The input is represented using efficient data structures provided by the model checker Storm. We use an incremental instance of Z3 [47] for each frame, as suggested in [23]. A solver for each frame is important to reduce the time spent on pushing the large frame-encodings. The optimization problem in the heuristic is also solved using Z3. All previously discussed generalizations (none, linear, polynomial, hybrid) are supported.
Oracle and Refinement. We support the (pre)computation of four different types of oracles for the initialization step in Algorithm 3: (1) A perfect oracle solving exactly the Bellman equations. Such an oracle is unrealistic, but interesting from a conceptual point. (2) Relative frequencies by recording all visited states during simulation. This idea is a naïve simplification of Q-learning. (3) Model checking with decision diagrams (DDs) and few value iterations. Often, a DD representation of a model can be computed fast, and the challenge is in executing sufficient value iterations. We investigate whether doing few value iterations yields a valuable oracle (and covers states close to bad states). (4) Solving a (pessimistic) LP from BFS partial exploration. States that are not expanded are assumed bad. Roughly, this yields oracles covering states close to the initial states.
To implement Refine (cf. Algorithm 3, l. 7), we create an LP for the subMDP induced by the touched states. For states whose successors are not in the touched states, we add a transition to B labeled with the oracle value as probability. The solution of the resulting LP updates the entries corresponding to the touched states.
For Enlarge (cf. Algorithm 3, l. 6), we take the union of the subsystem and the touched states. If this does not change the set of touched states, we also add its successors.
Setup. We evaluate the run time and memory consumption of our prototype of PrIC3. We choose a combination of models from the literature (BRP [21], ZeroConf [18]) and some structurally straightforward variants of grids (chain, double chain; see [11,Appendix A]). Since our prototype lacks the sophisticated preprocessing applied by many state-of-the-art model checkers, it is more sensitive to the precise encoding of a model, e.g., the number of commands. To account for this, we generated new encodings for all models. All experiments were conducted on an single core of an Intel® Xeon® Platinum 8160 processor. We use a 15 min time-limit and report TO otherwise. Memory is limited to 8GB; we report MO if it is exceeded. Apart from the oracle, all parameters of our prototype remain fixed over all experiments. To give an impression of the run times, we compare our prototype with both the explicit (Storm sparse ) and DD-based (Storm dd ) engine of the model checker Storm 1.4, which compared favourably in QComp [29].
Results. In Table 1, we present the run times for various invocations of our prototype and Oracle 4 16 . In particular, we give the model name and the number of (non-trivial) states in the particular instance, and the (estimated) actual probability to reach B. For each model, we consider multiple thresholds λ. The next 8 columns report on the four variants of PrIC3 with varying generalization schemes. Besides the scheme with the run times, we report for each scheme the number of states of the largest (last) subsystem that CheckRefutation in Algorithm 3, l. 5 was invoked upon (column |sub|). The last two columns report on the run times for Storm that we provide for comparison. In each row, we mark with purple MDPs that are unsafe, i.e., PrIC3 refutes these MDPs for the given threshold λ. We highlight the best configurations of PrIC3.

Discussion.
Our experiments give a mixed picture on the performance of our implementation of PrIC3. On the one hand, Storm significantly outperforms PrIC3 on most models. On the other hand, PrIC3 is capable of reasoning about huge, yet simple, models with up to 10 12 states that Storm is unable to analyze within the time and memory limits. There is more empirical evidence that PrIC3 may complement the state-of-the-art: First, the size of thresholds matters. Our benchmarks show that-at least without generalization-more "wiggle room" between the precise maximal reachability probability and the threshold generally leads to a better performance. PrIC3 may thus prove bounds for large models where a precise quantitative reachability analysis is out of scope.
Second, PrIC3 enjoys the benefits of bounded model checking. In some cases, e.g., ZeroConf for λ = 0.45, PrIC3 refutes very fast as it does not need to build the whole model.
Third, if PrIC3 proves the safety of the system, it does so without relying on checking large subsystems in the CheckRefutation step.
Fourth, generalization is crucial. Without generalization, PrIC3 is unable to prove safety for any of the considered models with more than 10 3 states. With generalization, however, it can prove safety for very large systems and thresholds close to the exact reachability probability. For example, it proved safety of the Chain benchmark with 10 12 states for a threshold of 0.4 which differs from the exact reachability probability by 0.006. Fifth, there is no best generalization. There is no clear winner out of the considered generalization approaches. Linear generalization always performs worse than the other ones. In fact, it performs worse than no generalization at all. The hybrid approach, however, occasionally has the edge over the polynomial approach. This indicates that more research is required to find suitable generalizations.
In [11, Appendix A], we also compare the additional three types of oracles (1-3). We observed that only few oracle refinements are needed to prove safety; for small models at most one refinement was sufficient. However, this does not hold if the given MDP is unsafe. DoubleChain with λ = 0.15, for example, and Oracle 2 requires 25 refinements.

Conclusion
We have presented PrIC3-the first truly probabilistic, yet conservative, extension of IC3 to quantitative reachability in MDPs. Our theoretical development is accompanied by a prototypical implementation and experiments. We believe there is ample space for improvements including an in-depth investigation of suitable oracles and generalizations.