
1 Introduction

Reasoning about reachability probabilities is a foundational task in the analysis of randomized systems. Such systems are (possibly infinite-state) Markov chains, which are typically described as probabilistic programs – imperative programs that may sample from probability distributions. We contribute a method for proving bounds on quantitative properties of probabilistic programs, which finds inductive invariants on source-code level by inductive synthesis. We discuss each of these ingredients below, present our approach with a running example in Sect. 2, and defer a detailed discussion of related work to Sect. 8.

1) Quantitative Reachability Properties. We aim to verify properties such as “is the probability of reaching an error at most 1%?” More generally, our technique proves bounds on the expected value of a probabilistic program terminating in designated states (see Sect. 2.1). Various verification problems are ultimately solved by bounding quantitative reachability properties (cf.  [7, 47]). Further examples of such problems include “does a program terminate with finite expected runtime?” and “is the expected sum of program variables x and y at least one?”

2) Inductive Invariants. An inductive invariant is a certificate that witnesses a certain quantitative reachability property. Quantitative (and qualitative) reachability are typically captured as least fixed points (cf. [7, 47, 52]). For upper bounds, this characterization makes it natural to search for a prefixed point – the inductive invariant – that, by standard fixed point theory [56], is greater than or equal to the least fixed point. Our invariants assign every state a quantity. If the initial state is assigned a quantity below the desired threshold, then the invariant certifies that the property in question holds. We detail quantitative inductive invariants in Sect. 2.2; we adapt our method to lower bound reasoning in Sect. 6.

3) Source-Code Level. We consider probabilistic programs over (potentially unbounded) integer variables that conceptually extend while-programs with coin flips, see e.g. Fig. 2 (footnote 1). We exploit the program structure to reason about infinite-state (and large finite-state) programs: We never construct a Markov chain but find symbolic inductive invariants (mapping from program states to nonnegative reals) on source-code level. We particularly discover inductive invariants that are piecewise linear, as they can often be verified efficiently.

Fig. 1. Our CEGIS framework for synthesizing quantitative inductive invariants.

4) Inductive Synthesis. Our approach to finding invariants, as sketched in Fig. 1, is inspired by inductive synthesis [4]: The inner loop (shaded box) is provided with a template T which may generate an infinite set \(\langle T \rangle \) of instances. We then synthesize a template instance I that is an inductive invariant witnessing quantitative reachability, or determine that no such instance exists. We search for such instances in a counterexample-guided inductive synthesis (CEGIS) loop: The synthesizer constructs a candidate. (A tailored variant of) an off-the-shelf verifier either (i) decides that the candidate is a suitable inductive invariant or (ii) reports a counterexample state \(s\) back to the synthesizer. Upon termination (guaranteed for finite-state programs), the inner loop has either found an inductive invariant or the solver reports that the template \(T\) does not admit an inductive invariant.

Contributions. We show that inductive synthesis for verifying quantitative reachability properties by finding inductive invariants on source-code level is feasible: Our approach is sound for arbitrary probabilistic programs, and complete for finite-state programs. We implemented our simple yet powerful technique. The results are promising: Our CEGIS loop is sufficiently fast to support large templates and finds inductive invariants for various probabilistic programs and properties. It can prove, amongst others, upper and lower bounds on reachability probabilities and universal positive almost-sure termination [42]. Our implementation is competitive with three state-of-the-art tools – Storm [39], Absynth [50], and Exist [9] – on subsets of their benchmarks fitting our framework.

Applicability and Limitations. We consider programs with possibly unbounded nonnegative integer-valued variables and arbitrary affine expressions in quantitative specifications. As for other synthesis-based approaches, there are unrealizable cases – loops for which no piecewise linear invariant exists. But, if there is an invariant, our CEGIS loop often finds it within a few iterations.

2 Overview

Fig. 2. Model for the bounded retransmission protocol (BRP).

We illustrate our approach using the bounded retransmission protocol (BRP) – a standard probabilistic model checking benchmark [28, 38] – modeled by the probabilistic program in Fig. 2. The model attempts to transmit 8 million packets (footnote 2) over a lossy channel, where each packet is lost with probability \(0.1\%\); if a packet is lost, we retry sending it; if any packet is lost in 10 consecutive sending attempts (\(\textit{fail} = 10\)), the entire transmission fails; if all packets have been transmitted successfully (\(\textit{sent} = 8\,000\,000\)), the transmission succeeds.

2.1 Reachability Probabilities and Loops

We aim to reason about the transmission-failure probability of BRP, i.e. the probability that the loop terminates in a target state \(t\) with \(t(\textit{fail}) = 10\) when started in initial program state \(s_0\) with \(s_0(\textit{fail}) = s_0(\textit{sent}) = 0\). One approach to determine this probability is to (i) construct an explicit-state Markov chain (MC) per Fig. 2, (ii) derive its Bellman operator \(\varPhi \) [52], (iii) compute its least fixed point \(\textsf {{lfp}}~\varPhi \) (a vector containing for each state the probability to reach t), e.g. using value iteration (cf. [7, Thm 10.15]), and finally (iv) evaluate \(\textsf {{lfp}}~\varPhi \) at \(s_0\).

The explicit-state MC of BRP has ca. 80 million states. We avoid building such large state spaces by computing a symbolic representation of \(\varPhi \) from the program. More formally, let \(S\) be the set of all states, \(\texttt{loop}\) the entire loop (ll. 2–3 in Fig. 2), \(\texttt{body}\) the \(\texttt{loop}\)’s body (l. 3), and \(\llbracket {\texttt{body}} \rrbracket (s)(s')\) the probability of reaching state \(s'\) by executing \(\texttt{body}\) once on state \(s\). Then the least fixed point of the \(\texttt{loop}\)’s Bellman operator \(\varPhi :\bigl (S\rightarrow \mathbb {R}_{\ge 0}^\infty \bigr ) \rightarrow \bigl (S\rightarrow \mathbb {R}_{\ge 0}^\infty \bigr )\), defined by

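$$\begin{aligned} \varPhi (I)(s) ~{}={}~\begin{cases} 1, &{} \text {if } s(\textit{fail}) = 10, \\ 0, &{} \text {if } s(\textit{fail}) \ne 10 \text { and } \lnot \bigl (s(\textit{sent})< 8\,000\,000 \wedge s(\textit{fail})< 10\bigr ), \\ \sum _{s' \in S} \llbracket {\texttt{body}} \rrbracket (s)(s') \cdot I(s'), &{} \text {if } s(\textit{sent})< 8\,000\,000 \wedge s(\textit{fail}) < 10, \end{cases} \end{aligned}$$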

captures the transmission-failure probability for the entire execution of \(\texttt{loop}\) and for any initial state, that is, \((\textsf {{lfp}}~\varPhi )(s)\) is the probability of terminating in a target state when executing \(\texttt{loop}\) on \(s\) (even if \(\texttt{loop}\) would not terminate almost-surely). Intuitively, \(\varPhi (I)(s)\) maps to 1 if \(\texttt{loop}\) has terminated meeting the target condition (transmission failure); and to 0 if \(\texttt{loop}\) has terminated otherwise (transmission success). If \(\texttt{loop}\) is still running (i.e. it has neither failed nor succeeded yet), then \(\varPhi (I)(s)\) maps to the expected value of I after executing \(\texttt{body}\) on state \(s\).
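To make the operator \(\varPhi\) and value iteration concrete, the following minimal sketch applies them to a scaled-down BRP instance. The instance (2 packets, 2 retries, loss probability 1/10) and all names are assumptions chosen for illustration; the loop body is inferred from the textual description of the protocol, not taken from Fig. 2.

```python
from fractions import Fraction

# Hypothetical toy instance (assumption): 2 packets, 2 retries, loss probability 1/10.
PACKETS, MAX_FAIL, LOSS = 2, 2, Fraction(1, 10)
STATES = [(sent, fail) for sent in range(PACKETS + 1) for fail in range(MAX_FAIL + 1)]


def phi(I):
    """Apply the Bellman operator once to I (a dict mapping states to values)."""
    out = {}
    for sent, fail in STATES:
        if fail == MAX_FAIL:        # terminated in a target state (transmission failure)
            out[(sent, fail)] = Fraction(1)
        elif sent == PACKETS:       # terminated otherwise (transmission success)
            out[(sent, fail)] = Fraction(0)
        else:                       # still running: expected value of I after one body execution
            out[(sent, fail)] = (1 - LOSS) * I[(sent + 1, 0)] + LOSS * I[(sent, fail + 1)]
    return out


def lfp_phi():
    """Value iteration from the constant-zero expectation.

    The toy model is acyclic, so the iteration reaches the exact fixed point
    after finitely many steps; in general it only converges in the limit.
    """
    current = {s: Fraction(0) for s in STATES}
    while True:
        nxt = phi(current)
        if nxt == current:
            return current
        current = nxt


if __name__ == "__main__":
    print(lfp_phi()[(0, 0)])  # failure probability from the initial state
```

For the full model this explicit table would have roughly 80 million entries, which is what the symbolic treatment avoids.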

2.2 Quantitative Inductive Invariants

Reachability probabilities are generally not computable for infinite-state probabilistic programs [43]. Even for finite-state programs the state-space explosion may prevent us from computing reachability probabilities exactly. However, it often suffices to know that the reachability probability is bounded from above by some threshold \(\lambda \). For BRP, we hence aim to prove that \((\textsf {{lfp}}~\varPhi )(s_0) \le \lambda \).

We attack the above task by means of (quantitative) inductive invariants: a candidate for an inductive invariant is a mapping \(I:S\rightarrow \mathbb {R}_{\ge 0}^\infty \). Intuitively, such a candidate I is inductive if the following holds: when assuming that \(I(s)\) is (an over-approximation of) the probability to reach a target state upon termination of \(\texttt{loop}\) on s, then the probability to reach a target state after performing one more guarded loop iteration, i.e. executing \({\texttt {if}}\,\left( \, {\textit{sent} < {\ldots }} \,\right) \,\left\{ \, {{\texttt{body}}{\,;}~ {\texttt{loop}}} \,\right\} \) on \(s\), must be at most \(I(s)\). Formally, I is an inductive invariant (footnote 3) if

$$\begin{aligned} \forall s:\quad \varPhi (I)(s) ~{}\le {}~I(s) \qquad \text {which implies}\qquad \forall s:\quad \bigl (\textsf {{lfp}}~\varPhi \bigr ) (s) ~{}\le {}~I(s) \end{aligned}$$

by Park induction [51]. Hence, \(I(s)\) bounds for each initial state \(s\) the exact reachability probability from above. If we are able to find an inductive I that is below \(\lambda \) for the initial state \(s_0\) with \(\textit{fail} = \textit{sent} = 0\), i.e. \(I(s_0) \le \lambda \), then we have indeed proven the upper bound \(\lambda \) on the transmission-failure probability of our BRP model. In a nutshell, our goal can be phrased as follows:

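Goal: Find an inductive invariant \(I\), i.e. an \(I\) with \(\varPhi (I)(s) \le I(s)\) for all \(s\), such that \(I(s_0) \le \lambda \).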

2.3 Our CEGIS Framework for Synthesizing Inductive Invariants

While finding a safe inductive invariant I is challenging, checking whether a given candidate I is indeed inductive is easier: it is decidable for certain infinite-state programs (cf. [14, Sect. 7.2]), it may not require an explicit exploration of the whole state space, and it can be done efficiently for piecewise linear I. Hence, techniques that generate decent candidate expressions fast and then check their inductivity could enable the automatic verification of probabilistic programs with gigantic and even infinite state spaces.

In this paper, we test this hypothesis by developing the CEGIS framework depicted in Fig. 1 for incrementally synthesizing inductive invariants. A template generator generates parametrized templates for inductive invariants. The inner loop (shaded box in Fig. 1) then tries to solve for appropriate template-parameter instantiations. If it succeeds, an inductive invariant has been synthesized. Otherwise, the template provably cannot be instantiated into an inductive invariant. The inner loop then reports that back to the template generator (possibly with some hint on why it failed, see [12, Appx. D]) and asks for a refined template.

For our running example, we start with the template

$$\begin{aligned} T ~{}={}~\left[ {\textit{fail}<10 \wedge \textit{sent} < 8\,000\,000} \right] \cdot (\alpha \cdot \textit{sent} + \beta \cdot \textit{fail} + \gamma ) ~{}+{}~\left[ {\textit{fail} =10} \right] , \end{aligned}$$
(1)

where we use Iverson brackets for indicators, i.e. \(\left[ {\varphi } \right] (s) = 1\) if \(s \models \varphi \) and 0 otherwise. T contains two kinds of variables: integer program variables \(\textit{fail}, \textit{sent} \) and \(\mathbb {Q} \)-valued parameters \(\alpha , \beta , \gamma \). While the template is nonlinear, substituting \(\alpha , \beta , \gamma \) with concrete values yields piecewise linear candidate invariants I. We ensure that those I are piecewise linear to render the repeated inductivity checks efficient. We construct only so-called natural templates T with \(\varPhi \) in mind, e.g. we want to construct only I such that \(I(s) = 1\) when \(s(\textit{fail}) = 10\).

Our inner CEGIS loop checks whether there exists an assignment from these template variables to concrete values such that the resulting piecewise linear expression is an inductive invariant. Concretely, we try to determine whether there exist values for \(\alpha , \beta , \gamma \) such that \(T(\alpha , \beta ,\gamma )\) is inductive. For that, we first guess values for \(\alpha , \beta , \gamma \), say all 0’s, and ask a verifier whether the instantiated (and now piecewise linear) template \(I = T(0,0,0)\) is indeed inductive. In our example, the verifier determines that I is not inductive: a counterexample is \(s(\textit{fail})=9\), \(s(\textit{sent})=7999999\). Intuitively, the probability to reach the target after one more loop iteration exceeds the value in I for this state, that is, \(\varPhi (I)(s) = 0.001 > 0 = I(s)\). From this counterexample, our synthesizer learns

$$\begin{aligned} \varPhi (T)(s) ~{}={}~0.001 \smash {~{}{\mathop {\le }\limits ^{!}}{}~} \alpha \cdot 7999999 + \beta \cdot 9 + \gamma ~{}={}~T(s)~. \end{aligned}$$

Observe that this learned lemma is linear in \(\alpha , \beta , \gamma \). The synthesizer will now keep “guessing” assignments to the parameters which are consistent with the learned lemmas until either no such parameter assignment exists anymore, or until it produces an inductive invariant \(I = T(\ldots )\). In our running example, assuming \(\lambda = 0.9\), after 6 lemmas, our synthesizer finds the inductive invariant I

$$\begin{aligned} \left[ {\textit{fail}<10 \wedge \textit{sent} < 8 \cdot 10^6} \right] \cdot (-\tfrac{9}{8\cdot 10^7} \cdot \textit{sent} + \tfrac{79\,991}{72\cdot 10^7} \cdot \textit{fail} + \tfrac{9}{10}) + \left[ {\textit{fail} =10} \right] \end{aligned}$$
(2)

where indeed \(I(s_0) \le \lambda \) holds. For a tighter threshold \(\lambda \), such simple templates do not suffice. For example, it is impossible to instantiate this template to an inductive invariant for \(\lambda = 0.8\), even though 0.8 is an upper bound on the actual reachability probability. We therefore support more general templates of the form

$$\begin{aligned} T ~{}={}~\sum _{\smash {i}} \left[ {B_i} \right] \cdot \left( \alpha _i \cdot \textit{sent} + \beta _i \cdot \textit{fail} + \gamma _i \right) ~{}+{}~\left[ {\textit{fail} =10} \right] ~, \end{aligned}$$

where the \(B_i\) are (restricted) predicates over program and template variables which partition the state space. In particular, we allow for a template such as

$$\begin{aligned} \begin{aligned} T ~{}={}~&\left[ {\textit{fail}< 10 \wedge \textit{sent}< \delta } \right] \cdot \left( \alpha _1 \cdot \textit{sent} + \beta _1 \cdot \textit{fail} + \gamma _1 \right) ~{}+{}~\\&\left[ {\textit{fail} < 10 \wedge \textit{sent} \ge \delta } \right] \cdot \left( \alpha _2 \cdot \textit{sent} + \beta _2 \cdot \textit{fail} + \gamma _2 \right) ~{}+{}~\left[ {\textit{fail} =10} \right] \end{aligned} \end{aligned}$$
(3)

However, such templates are challenging for the CEGIS loop. Thus, we additionally consider templates where the \(B_i\)’s range only over program variables, e.g.

$$\begin{aligned}&\left[ {\textit{fail}< 10 \wedge \textit{sent}< 4\,000\,000} \right] \cdot (\ldots ) ~{}+{}~\left[ {\textit{fail} < 10 \wedge \textit{sent} \ge 4\,000\,000} \right] \cdot (\ldots ) ~{}+{}~\ldots \end{aligned}$$

Our partition refinement algorithms automatically produce these templates, without the need for user interaction.
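Returning to the invariant in Eq. (2), its inductivity can be spot-checked with exact rational arithmetic. The sketch below is only a sanity check on sampled states, not a proof; the loop body (success with probability 0.999 resets \(\textit{fail}\) and increments \(\textit{sent}\), loss with probability 0.001 increments \(\textit{fail}\)) is an assumption inferred from the protocol description, and all function names are hypothetical.

```python
from fractions import Fraction

N, R = 8_000_000, 10
# Parameter values of Eq. (2).
ALPHA = Fraction(-9, 8 * 10**7)
BETA = Fraction(79_991, 72 * 10**7)
GAMMA = Fraction(9, 10)


def inv(sent, fail, params=(ALPHA, BETA, GAMMA)):
    """Instance of the template in Eq. (1) for the given parameter values."""
    a, b, c = params
    if fail == R:
        return Fraction(1)
    if fail < R and sent < N:
        return a * sent + b * fail + c
    return Fraction(0)


def phi(sent, fail, params=(ALPHA, BETA, GAMMA)):
    """Phi(I)(s) for the assumed loop body."""
    if fail == R:
        return Fraction(1)
    if sent >= N:
        return Fraction(0)
    return (Fraction(999, 1000) * inv(sent + 1, 0, params)
            + Fraction(1, 1000) * inv(sent, fail + 1, params))


# Sampled states, including the boundary cases sent = N - 1 and fail = R - 1.
SAMPLE = [(sent, fail) for sent in (0, 1, 4_000_000, N - 2, N - 1) for fail in range(R)]
violations = [s for s in SAMPLE if phi(*s) > inv(*s)]

if __name__ == "__main__":
    print("violations on sample:", violations)
    print("I(s0) =", inv(0, 0))
    # The all-zero instance T(0,0,0) is refuted by the counterexample from the text.
    print("Phi(T(0,0,0)) at counterexample:", phi(N - 1, R - 1, (0, 0, 0)))
```

On the counterexample state the instance of Eq. (2) is tight, i.e. both sides of the inductivity inequality coincide there.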

Finally, we highlight that we may use our approach for more general questions. For BRP, suppose we want to verify an upper bound \(\lambda = 0.05\) on the probability of failing to transmit all packets for an infinite set of models (also called a family) with varying upper bounds on packets \(1 \le P \le 8\,000\,000\) and retransmissions \(R \ge 5\). This infinite set of models is described by the loop shown in Fig. 3a. Our approach fully automatically synthesizes the following inductive invariant \(I \):

$$\begin{aligned}&\left[ \begin{array}{l} \textit{fail}< R ~{}\wedge {}~\textit{sent}<P~{}\wedge {}~P<8\,000\,000 ~{}\wedge {}~R\ge 5 \\[1ex] ~{}\wedge {}~R > 1+\textit{fail} ~{}\wedge {}~\tfrac{13067990199}{5280132671650}\cdot \textit{fail} \le \tfrac{5278689867}{211205306866000} \end{array} \right] \cdot \left( \begin{array}{l} \tfrac{-19}{3820000040}\cdot \textit{sent} \\[1ex] {}+ \tfrac{19}{3820000040} \cdot P \\[1ex] {}+ \tfrac{19500001}{1910000020} \end{array}\right) \\[-1ex] ~{}+{}~&\ldots ~ \text {(7 additional summands omitted)} \end{aligned}$$
Fig. 3. A bounded retransmission protocol family and piece of a matching invariant.

The first summand of \(I \) is plotted in Fig. 3b. Since \(I \) overapproximates the probability of failing to transmit all packets for every state, \(I \) may be used to infer additional information about the reachability probabilities.

3 Formal Problem Statement

Before we state the precise invariant synthesis problem that we aim to solve, we summarize the essential concepts underlying our formalization.

Probabilistic Loops. We consider single probabilistic loops \({\texttt {while}}\left( \, {\varphi } \,\right) \left\{ \, {C} \,\right\} \) whose loop guard \(\varphi \) and (loop-free) body \(C\) adhere to the grammar

figure e

where \(z \in \mathbb {Z} \) is a constant and x is from an arbitrary finite set \(\textsf{Vars} \) of \(\mathbb {N} \)-valued program variables. Program states in \(S = \{ s \mid s:\textsf{Vars} \rightarrow \mathbb {N} \}\) map variables to natural numbers (footnote 4). All statements are standard (cf. [47]). \(\left\{ \, {C_1} \,\right\} \mathrel {\left[ \,p\,\right] }\left\{ \, {C_2} \,\right\} \) is a probabilistic choice which executes \(C_1\) with probability \(p \in [0,1] \cap \mathbb {Q} \) and \(C_2\) with probability \(1-p\). Fig. 2 (ll. 2–3) is an example of a probabilistic loop.

Expectations. In Sect. 2, we considered whether final states meet some target condition by assigning 0 or 1 to each final state. The assignment can be generalized to more general quantities in \(\mathbb {R}_{\ge 0}^\infty \). We call such assignments f expectations [47] (think: random variable) and collect them in the set \(\mathbb {E}\), i.e.

$$\begin{aligned} \mathbb {E}~{}={}~\bigl \{ f \mid f:S\rightarrow \mathbb {R}_{\ge 0}^\infty \bigr \} , \qquad \text {where}\qquad f \preceq g \quad \text {iff}\quad \forall s \in S:\quad f(s) \le g(s)~. \end{aligned}$$

\(\preceq \) is a partial order on \(\mathbb {E}\) – necessary to sensibly speak about least fixed points.

Characteristic Functions. The expected behavior of a probabilistic loop for an expectation f is captured by an expectation transformer (namely the \(\varPhi :\mathbb {E}\rightarrow \mathbb {E}\) of Sect. 2), called the loop’s characteristic function. To focus on invariant synthesis, we abstract from the details (footnote 5) of constructing characteristic functions from probabilistic loops; our framework only requires the following key property:

Proposition 1 (Characteristic Functions)

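For every loop \({\texttt {while}}\left( \, {\varphi } \,\right) \left\{ \, {C} \,\right\} \) and expectation f, there exists a monotone function \(\varPhi _{f}:\mathbb {E}\rightarrow \mathbb {E}\) such that

$$\begin{aligned} s \not \models \varphi \qquad \text {implies}\qquad \varPhi _{f}(I)(s) ~{}={}~f(s) \quad \text {for all } I \in \mathbb {E}~, \end{aligned}$$

and the least fixed point of \(\varPhi _{f}\), denoted \(\textsf {{lfp}}~\varPhi _{f}\), satisfies

$$\begin{aligned} \bigl (\textsf {{lfp}}~\varPhi _{f}\bigr )(s) ~{}={}~\text {expected value of } f \text { after executing } {\texttt {while}}\left( \, {\varphi } \,\right) \left\{ \, {C} \,\right\} \text { on } s~. \end{aligned}$$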

Example 1

In our running example from Sect. 2.1, we chose as f the expression \(\left[ {\textit{fail} = 10} \right] \), which evaluates to 1 in every state s where \(\textit{fail} = 10\) and to 0 otherwise. The characteristic function \(\varPhi _{f}(I)\) of the loop in Fig. 2 is

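$$\begin{aligned} \varPhi _{f}(I) ~{}={}~\left[ {\lnot \varphi } \right] \cdot f ~{}+{}~\left[ {\varphi } \right] \cdot \bigl ( 0.999 \cdot I[\textit{fail}/0][\textit{sent}/\textit{sent}+1] ~{}+{}~0.001 \cdot I[\textit{fail}/\textit{fail}+1] \bigr )~, \end{aligned}$$

where \(\varphi = \textit{sent}< 8\,000\,000 \wedge \textit{fail} < 10\) is the loop guard and \(I[x/e]\) denotes the (syntactic) substitution of variable x by expression \(e\) in expectation I – the latter is used to model the effect of assignments as in standard Hoare logic.    \(\lhd \)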

Inductive Invariants. For a probabilistic loop \({\texttt {while}}\left( \, {\varphi } \,\right) \left\{ \, {C} \,\right\} \), and pre- and postexpectations \(g,f \in \mathbb {E}\), we aim to verify \(\textsf {{lfp}}~\varPhi _{f} \preceq g\), i.e. that the expected value of f after termination of the loop is bounded from above by g. We discuss how to adapt our approach to expected runtimes and lower bounds in Sect. 6. Intuitively, f assigns a quantity to all target states reached upon termination, and g assigns to all initial states a desired bound on the expected value of f after termination of the loop. By choosing \(g(s) = \infty \) for certain s, we can make \(s\) so-to-speak “irrelevant”. An \(I \in \mathbb {E}\) is an inductive invariant proving \(\textsf {{lfp}}~\varPhi _{f} \preceq g\) iff \(\varPhi _{f}(I) \preceq I\) and \(I \preceq g\). Continuing our example, Eq. (2) shows an inductive invariant proving that \( \textsf {{lfp}}~\varPhi _{f} \preceq g {:}{=}[\textit{fail} =0 \wedge \textit{sent} =0]\cdot 0.9 + [\lnot (\textit{fail} =0 \wedge \textit{sent} =0)]\cdot \infty \).

Our framework employs syntactic fragments of expectations on which the check \(\varPhi _{f}(I) \preceq I\) can be done symbolically by an SMT solver. As illustrated in Fig. 1, we use templates to further narrow down the invariant search space.

Templates. Let \(\textsf{TVars}= \{\alpha , \beta , \ldots \}\) be a countably infinite set of \(\mathbb {Q} \)-valued template variables. A template valuation is a function \(\mathfrak {I}:\textsf{TVars}\rightarrow \mathbb {Q} \) that assigns to each template variable a rational number. We will use the same expressions as in our programs except that we admit both rationals and template variables as coefficients. Formally, arithmetic and Boolean expressions \(E\) and \(B\) adhere to

$$\begin{aligned} E\quad {}&\longrightarrow {}\quad r ~{}\mid {}~x ~{}\mid {}~r \cdot x ~{}\mid {}~E+ E\qquad \quad \, B\quad {}\longrightarrow {}\quad E< E~{}\mid {}~~ \lnot B~{}\mid {}~~ B\wedge B~, \end{aligned}$$

where \(x \in \textsf{Vars} \) and \(r \in \mathbb {Q} \cup \textsf{TVars}\). The set \(\textsf{TExp}\) of templates then consists of all

$$\begin{aligned} T~{}={}~\left[ {B_1} \right] \cdot E_1 + \ldots + \left[ {B_n} \right] \cdot E_n~, \end{aligned}$$

for \(n \ge 1\), where the Boolean expressions \(B_i\) partition the state space, i.e. for all template valuations \(\mathfrak {I}\) and all states \(s\), there is exactly one \(B_i\) such that \(\mathfrak {I}, s\models B_i\). \(T\) is a fixed-partition template if additionally no \(B_i\) contains a template variable.

Notice that templates are generally not linear (over \(\textsf{Vars} \cup \textsf{TVars}\)). Sect. 2 gives several examples of templates, e.g. Eq. (1).

Template Instances. We denote by \(T\left[ \mathfrak {I}\right] \) the instance of template \(T\) under \(\mathfrak {I}\), i.e. the expression obtained from substituting every template variable \(\alpha \) in \(T\) by its valuation \(\mathfrak {I}(\alpha )\). For example, the expression in Eq. (2) is an instance of the template in Eq. (1). The set of all instances of template \(T\) is defined as \(\langle T \rangle = \{ T\left[ \mathfrak {I}\right] \mid \mathfrak {I}:\textsf{TVars}\rightarrow \mathbb {Q} \}\). We chose the shape of templates on purpose: To evaluate an instance \(T\left[ \mathfrak {I}\right] \) of a template \(T\) in a state \(s\), it suffices to find the unique Boolean expression \(B_i\) with \(\mathfrak {I}, s\models B_i\) and then evaluate the single linear arithmetic expression \(E_i\left[ \mathfrak {I}\right] \) in \(s\). For fixed-partition templates, the selection of the right \(B_i\) does not even depend on the template valuation \(\mathfrak {I}\).
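The evaluation scheme just described (select the unique matching \(B_i\), then evaluate the single linear expression \(E_i\left[ \mathfrak {I}\right] \)) can be sketched as follows. The data representation is an assumption made for illustration only: a template is a list of pieces, guards are Python predicates, and coefficients are either rationals or template-variable names. The explicit zero piece is added here so that the guards partition the state space.

```python
from fractions import Fraction

N, R = 8_000_000, 10

# A piece is (guard, coeffs, const). guard(state, valuation) -> bool.
# coeffs maps a program variable to a coefficient; a coefficient or constant is
# either a number or the name of a template variable (hypothetical encoding).
TEMPLATE_1 = [
    (lambda s, v: s["fail"] < R and s["sent"] < N, {"sent": "alpha", "fail": "beta"}, "gamma"),
    (lambda s, v: s["fail"] == R, {}, 1),
    # Implicit zero piece of Eq. (1), made explicit to obtain a partition.
    (lambda s, v: s["fail"] != R and not (s["fail"] < R and s["sent"] < N), {}, 0),
]


def resolve(r, valuation):
    """Replace a template variable by its value; pass numbers through."""
    return valuation[r] if isinstance(r, str) else Fraction(r)


def evaluate(template, valuation, state):
    """Evaluate the template instance under `valuation` in `state`."""
    matching = [(coeffs, const) for guard, coeffs, const in template if guard(state, valuation)]
    assert len(matching) == 1, "guards must partition the state space"
    coeffs, const = matching[0]
    return sum((resolve(r, valuation) * state[x] for x, r in coeffs.items()),
               resolve(const, valuation))


# Valuation yielding the instance in Eq. (2).
VALUATION_2 = {
    "alpha": Fraction(-9, 8 * 10**7),
    "beta": Fraction(79_991, 72 * 10**7),
    "gamma": Fraction(9, 10),
}

if __name__ == "__main__":
    print(evaluate(TEMPLATE_1, VALUATION_2, {"sent": 0, "fail": 0}))
```

Since none of the guards above inspects the valuation, this is a fixed-partition template in the sense defined earlier.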

Piecewise Linear Expectations. Some template instances \(T\left[ \mathfrak {I}\right] \) do not represent expectations, i.e. they are not of type \(S\rightarrow \mathbb {R}_{\ge 0}^\infty \), as they may evaluate to negative numbers. Template instances \(T\left[ \mathfrak {I}\right] \) that do represent expectations are piecewise linear; we collect such well-defined instances in the set \(\textsf{LinExp} \). Formally,

Definition 1

(\(\boldsymbol{\textsf{LinExp}}\)). The set \(\textsf{LinExp} \) of (piecewise) linear expectations is \( \textsf{LinExp} ~{}={}~\{ T\left[ \mathfrak {I}\right] \mid T\in \textsf{TExp}~~\text {and}~~ \mathfrak {I}:\textsf{TVars}\rightarrow \mathbb {Q} ~~\text {and}~~ \forall s\in S:T\left[ \mathfrak {I}\right] (s) \ge 0 \} \).

We identify well-defined instances of templates in \(\textsf{LinExp} \) with the expectation in \(\mathbb {E}\) that they represent, e.g. when writing the inductivity check \(\varPhi _{f}(T\left[ \mathfrak {I}\right] ) \preceq T\left[ \mathfrak {I}\right] \).

Natural Templates. As suggested in Sect. 2.3, it makes sense to focus only on so-called natural templates. Those are templates that even have a chance of becoming inductive, as they take the loop guard \(\varphi \) and postexpectation f into account. Formally, a template \(T\) is natural (w.r.t. \(\varphi \) and f) if \(T\) is of the form

$$\begin{aligned} T~~{}={}~~ \underbrace{\left[ {\lnot \varphi \wedge B_1} \right] \cdot E_1 + \ldots + \left[ {\lnot \varphi \wedge B_n} \right] \cdot E_n}_{\text {must be equivalent to } \left[ {\lnot \varphi } \right] \cdot f} ~{}+{}~\left[ {B'_1} \right] \cdot E'_1 + \ldots + \left[ {B'_m} \right] \cdot E'_m~. \end{aligned}$$

We collect all natural templates in the set \(\textsf{TnExp}\).

Formal Problem Statement. Throughout this paper, we fix an ambient single loop \({\texttt {while}}\left( \, {\varphi } \,\right) \left\{ \, {C} \,\right\} \), a postexpectation \(f \in \textsf{LinExp} \), and a preexpectation \(g \in \textsf{LinExp} \) (footnote 6) such that \(\textsf {{lfp}}~\varPhi _{f} \preceq g\) (footnote 7). The set \(\textsf{AdmInv}\) of admissible invariants (i.e. those expectations that are both inductive and safe) is then given by

$$\begin{aligned} \textsf{AdmInv}~{}={}~\bigl \{ \underbrace{I \in \textsf{LinExp}}_{\text {well-definedness}} ~{}\mid {}~\underbrace{\varPhi _{f}(I) \preceq I}_{\text {inductivity}} ~~\text {and}~~ \underbrace{I \preceq g}_{\text {safety}} \bigr \} ~, \end{aligned}$$

where the underbraces summarize the tasks for a verifier to decide whether a template instance I is an admissible inductive invariant. We require \(\textsf {{lfp}}~\varPhi _{f} \preceq g\), so that \(\textsf{AdmInv}\) is not vacuously empty due to an unsafe bound g.

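Formal problem statement: Given a natural template \(T\in \textsf{TnExp}\), find an instance \(I \in \langle T \rangle \cap \textsf{AdmInv}\) or determine that there is no such \(I\).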

Notice that \(\textsf{AdmInv}\) might be empty, even for safe g’s, because generally one might need more complex invariants than piecewise linear ones [16]. However, there always exists an inductive invariant in \(\textsf{LinExp} \) if a loop can reach only finitely many states (footnote 8). We call a loop \({\texttt {while}}\left( \, {\varphi } \,\right) \left\{ \, {C} \,\right\} \) finite-state, if only finitely many states satisfy the loop guard \(\varphi \), i.e. if \(S_\varphi = \{ s \in S\mid s \models \varphi \}\) is finite.

Syntactic Characteristic Functions. We work with linear expectations \(I,f \in \textsf{LinExp} \), so that we can check inductivity (\(\varPhi _{f}(I) \preceq I\)) symbolically (via SMT) without state space construction. In particular, we can construct a syntactic counterpart \(\varPsi _{f}\) to \(\varPhi _{f}\) that operates on templates. Intuitively, whether we evaluate \(\varPsi _{f}\) on a (syntactic) template \(T\) and then instantiate the result with a valuation \(\mathfrak {I}\), or we evaluate \(\varPhi _{f}\) on the (semantic) expectation \(T\left[ \mathfrak {I}\right] \) emerging from instantiating \(T\) with \(\mathfrak {I}\) – the results will coincide if \(T\left[ \mathfrak {I}\right] \) is well-defined. Formally:

Proposition 2

Given \({\texttt {while}}\left( \, {\varphi } \,\right) \left\{ \, {C} \,\right\} \) and \(f \in \textsf{LinExp} \), one can effectively compute a mapping \(\varPsi _{f} :\textsf{TExp}\rightarrow \textsf{TExp}\), such that for all \(T\) and \(\mathfrak {I}\)

$$ T\left[ \mathfrak {I}\right] ~{}\in {}~\textsf{LinExp} ~\quad \text {implies}\quad ~\varPsi _{f}(T)\left[ \mathfrak {I}\right] ~{}={}~\varPhi _{f}\bigl ( T\left[ \mathfrak {I}\right] \bigr )~. $$

Moreover, \(\varPsi _{f}\) maps fixed-partition templates to fixed-partition templates.

In Ex. 1, we have already constructed such a \(\varPsi _{f}\) to represent \(\varPhi _{f}\). The general construction is inspired by [14], but treats template variables as constants.

4 One-Shot Solver

One could address the template instantiation problem from Sect. 3 in one shot: encode it as an SMT query, ask a solver for a model, and infer from the model an admissible invariant. While this approach is infeasible in practice (as it involves quantification over \(S_\varphi \)), it inspires the CEGIS loop in Fig. 1.

Regarding the encoding, given a template \(T\), we need a formula over \(\textsf{TVars}\) that is satisfiable if and only if there exists a template valuation \(\mathfrak {I}\) such that \(T\left[ \mathfrak {I}\right] \) is an admissible invariant, i.e. \(T\left[ \mathfrak {I}\right] \in \textsf{AdmInv}\). To get rid of program variables in templates, we denote by \(T(s)\) the expression over \(\textsf{TVars}\) in which all program variables \(x \in \textsf{Vars} \) have been substituted by \(s(x)\).

Intuitively, we then encode that, for every state \(s\), the expression \(T(s)\) satisfies the three conditions of admissible invariants, i.e. well-definedness, inductivity, and safety. In particular, we use Prop. 2 to compute a template \(\varPsi _{f}(T)\) that represents the application of the characteristic function \(\varPhi _{f}\) to a candidate invariant, i.e. \(\varPhi _{f}(T\left[ \mathfrak {I}\right] )\) – a necessity for encoding inductivity.

Formally, we denote by \(\textsf{Sat}(\phi )\) the set of all models of a first-order formula \(\phi \) (with a fixed underlying structure), i.e. \(\textsf{Sat}(\phi ) = \{ \mathfrak {I}\mid \mathfrak {I}\models \phi \}\). Then:

Theorem 1

For every natural template \(T\in \textsf{TnExp}\) and \(f, g \in \textsf{LinExp} \), we have

figure n

Notice that, for fixed-partition templates, the above encoding is particularly simple: \(T(s)\) and \(\varPsi _{f}(T)(s)\) are equivalent to single linear arithmetic expressions over \(\textsf{TVars}\); \(g(s)\) is either a single expression or \(\infty \) – in the latter case, we get an equisatisfiable formula by dropping the always-satisfied constraint \(T(s) \le g(s)\).
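As an illustration of the fixed-partition case, the sketch below computes \(T(s)\) and \(\varPsi _{f}(T)(s)\) for the template of Eq. (1) as linear forms over \((\alpha , \beta , \gamma )\) and derives the inductivity constraint for a given state. The representation as coefficient tuples, the function names, and the loop body (success probability 0.999, loss probability 0.001) are assumptions made for this example.

```python
from fractions import Fraction

N, R = 8_000_000, 10
SUCCESS, LOSS = Fraction(999, 1000), Fraction(1, 1000)


def t_at(sent, fail):
    """T(s) for the template of Eq. (1) as (c_alpha, c_beta, c_gamma, const)."""
    if fail == R:
        return (0, 0, 0, 1)
    if fail < R and sent < N:
        return (sent, fail, 1, 0)
    return (0, 0, 0, 0)


def psi_t_at(sent, fail):
    """Psi_f(T)(s) as a linear form over (alpha, beta, gamma), for f = [fail = 10]."""
    if not (sent < N and fail < R):       # guard violated: result is f(s)
        return (0, 0, 0, 1 if fail == R else 0)
    ok = t_at(sent + 1, 0)                # packet transmitted: fail := 0; sent := sent + 1
    lost = t_at(sent, fail + 1)           # packet lost: fail := fail + 1
    return tuple(SUCCESS * x + LOSS * y for x, y in zip(ok, lost))


def inductivity_lemma(sent, fail):
    """Coefficients (a, b, c, d) of the linear constraint
    a*alpha + b*beta + c*gamma + d <= 0, which encodes Psi_f(T)(s) <= T(s)."""
    return tuple(p - t for p, t in zip(psi_t_at(sent, fail), t_at(sent, fail)))


if __name__ == "__main__":
    # Counterexample state from Sect. 2.3: fail = 9, sent = 7999999.
    print(inductivity_lemma(7_999_999, 9))
```

For the counterexample state of Sect. 2.3, the resulting constraint reads \(0.001 - 7999999 \cdot \alpha - 9 \cdot \beta - \gamma \le 0\), which is the lemma learned there.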

For general templates, one can exploit the partitioning to break it down into multiple inequalities, i.e. every inequality becomes a conjunction over implications of linear inequalities over the template variables \(\textsf{TVars}\).

Example 2

Reconsider template \(T\) in Eq. (3) and assume a state \(s\) with \(s(\textit{fail}) = 5\) and \(s(\textit{sent}) = 2\). Then, we encode the well-definedness, \(T(s) \ge 0\), as

$$ \big (5<10 \wedge 2< \delta \Rightarrow \alpha _1 \cdot 2 + \beta _1 \cdot 5 + \gamma _1 \ge 0\big ) \wedge \big ( 5<10 \wedge 2 \ge \delta \Rightarrow \alpha _2 \cdot 2 + \beta _2 \cdot 5 + \gamma _2 \ge 0\big ) $$

where the trivially satisfiable conjunct \(5 = 10 \Rightarrow \textsf{true}\) encoding the last summand, i.e. \(\left[ {\textit{fail} = 10} \right] \), has been dropped.    \(\lhd \)

The query in Thm. 1 involves (non-linear) mixed real and integer arithmetic with quantifiers – a theory that is undecidable in general. However, for finite-state loops and natural templates, one can replace the universal quantifier \(\forall s\) by a finite conjunction \(\bigwedge _{s\in S_\varphi }\) to obtain a (decidable) QF_LRA formula.

Theorem 2

The problem is decidable for finite-state loops and \(T\in \textsf{TnExp}\). If  \(T\) is fixed-partition, it is decidable via linear programming.

5 Constructing an Efficient CEGIS Loop

We now present a CEGIS loop (see inner loop of Fig. 1) in which a synthesizer and a verifier attempt to incrementally solve our problem statement (cf. Sect. 3).

5.1 The Verifier

We assume a verifier for checking \(I \in \textsf{AdmInv}\). For CEGIS, it is important to get some feedback whenever \(I \not \in \textsf{AdmInv}\). To this end, we define:

Definition 2

For a state \(s\in S\), the set \(\textsf{AdmInv}(s)\) of \(s\)-admissible invariants is

figure q

For a subset \(S' \subseteq S\) of states, we define \(\textsf{AdmInv}(S') = \bigcap _{s \in S'} \textsf{AdmInv}(s)\).

Clearly, if \(I \not \in \textsf{AdmInv}\), then \(I \notin \textsf{AdmInv}(s)\) for some \(s \in S\), i.e. state s is a counterexample to well-definedness, inductivity, or safety of I. We denote the set of all such counterexamples (to the claim \(I \in \textsf{AdmInv}\)) by \(\textsf{CounterEx}_I \). We assume an effective (baseline) verifier for detecting counterexamples:

Definition 3

A verifier is any function \(\textsf{Verify}:\textsf{LinExp} \rightarrow \{\textsf{true}\} \cup S\) such that

  1. \(\textsf{Verify}(I )=\textsf{true}\) if and only if \(I \in \textsf{AdmInv}\), and

  2. \(\textsf{Verify}(I )=s\) implies \(s \in \textsf{CounterEx}_I \).

Proposition 3

([14]). There exist effective verifiers.

For example, one can implement an SMT-backed verifier using an encoding analogous to Thm. 1, where every model is a counterexample \(s \in \textsf{CounterEx}_I \):

figure r

5.2 The Counterexample-Guided Inductive Synthesizer

A synthesizer must generate from a given template \(T\) instances \(I \in \langle T \rangle \) which can be passed to a verifier for checking admissibility. To make an informed guess, our synthesizers can take a finite set of witnesses \(S' \subseteq S\) into account:

Definition 4

Let \(\textsf{FinStates}\) be the set of finite sets of states. A synthesizer for template \(T\in \textsf{TnExp}\) is any function \(\textsf{Synt}_T:\textsf{FinStates} \rightarrow \langle T \rangle \cup \{ \textsf{false}\} \) such that

  1. if \(\textsf{Synt}_T(S') = I\), then \(I \in \langle T \rangle \cap \textsf{AdmInv}(S')\), and

  2. \(\textsf{Synt}_T(S') = \textsf{false}\) if and only if \(\langle T \rangle \cap \textsf{AdmInv}(S') = \emptyset \).

To build a synthesizer \(\textsf{Synt}_T(S')\) for finite sets of states \(S' \subseteq S\), we proceed analogously to one-shot solving for finite-state loops (Thm. 2), i.e. we exploit

figure s

That is, our synthesizer may return any model \(\mathfrak {I}\) of the above constraint system; it can be implemented as one SMT query. In particular, one can efficiently find such an \(\mathfrak {I}\) for fixed-partition templates via linear programming.

Theorem 3 (Synthesizer Completeness)

For finite-state loops and natural templates \(T\in \textsf{TnExp}\), we have or \(\langle T \rangle \cap \textsf{AdmInv}= \emptyset \).

Using the synthesizer and the verifier in concert is then straightforward; see Alg. 1. We incrementally ask our synthesizer to provide a candidate invariant I that is s-admissible for all states \(s \in S'\). Unless the synthesizer returns \(\textsf{false}\), we ask the verifier whether I is admissible. If yes, we return I; otherwise, we get a counterexample s and add it to \(S'\) before synthesizing the next candidate.

figure u
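The interplay of Alg. 1 can be sketched in a few lines of Python. The sketch is illustrative only and is not the implementation of cegispro2: the toy loop (a bounded symmetric random walk), the bound `N`, the threshold 1/5, the coefficient grid, and all function names are assumptions made for this example. The brute-force grid search merely stands in for the SMT or LP query of the synthesizer, and the explicit-state scan stands in for an SMT-backed verifier. We further assume that s-admissibility amounts to three pointwise conditions, namely well-definedness \(0 \le I(s)\), inductivity \(\varPhi _{f}(I)(s) \le I(s)\), and safety, where safety is only checked against the threshold at the initial state.

```python
from fractions import Fraction
from itertools import product

N = 10                                   # hypothetical bound of the toy random walk
STATES = range(N + 1)                    # finite state space: x in {0, ..., N}
INIT, THRESHOLD = 1, Fraction(1, 5)      # claim: Pr(reach x = N from x = 1) <= 1/5

def guard(x):
    """Loop guard phi: 0 < x < N."""
    return 0 < x < N

def f(x):
    """Post-expectation [x = N]."""
    return Fraction(int(x == N))

def instance(a, b):
    """Template instance [phi] * (a*x + b) + [not phi] * f."""
    return lambda x: a * x + b if guard(x) else f(x)

def phi_f(I, x):
    """Characteristic function of: while (phi) { {x := x+1} [1/2] {x := x-1} }."""
    if not guard(x):
        return f(x)
    return Fraction(1, 2) * I(x + 1) + Fraction(1, 2) * I(x - 1)

def admissible_at(I, x):
    """Assumed pointwise conditions: well-definedness, inductivity, safety."""
    safe = I(x) <= THRESHOLD if x == INIT else True
    return I(x) >= 0 and phi_f(I, x) <= I(x) and safe

def verify(I):
    """Explicit-state verifier: None if I is admissible, else a counterexample state."""
    for x in STATES:
        if not admissible_at(I, x):
            return x
    return None

GRID = [Fraction(k, 20) for k in range(-20, 21)]   # candidate coefficients a, b

def synthesize(witnesses):
    """Brute-force stand-in for the synthesizer query over the witness set."""
    for a, b in product(GRID, repeat=2):
        if all(admissible_at(instance(a, b), x) for x in witnesses):
            return a, b
    return None                          # plays the role of `false`

def cegis():
    """Alternate synthesizer and verifier until success or template exhaustion."""
    witnesses = []
    while True:
        params = synthesize(witnesses)
        if params is None:
            return None, witnesses       # no grid instance is admissible
        cex = verify(instance(*params))
        if cex is None:
            return params, witnesses     # candidate is an admissible invariant
        witnesses.append(cex)            # refine with the new counterexample

params, witnesses = cegis()
```

Every counterexample is a state at which the current candidate fails, whereas each candidate is admissible on all previously collected witnesses. Hence no state is added twice, and on this finite toy state space the loop stops after at most `N + 1` refinements.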

Remark 1

Without further restrictions, the verifier of Def. 3 may go into a counterexample enumeration spiral. In [12, Appx. C], we therefore discuss additional constraints that make this verifier act more cooperatively.    \(\lhd \)

6 Generalization to Termination and Lower Bounds

We extend our approach to (i) proving universal positive almost-sure termination (UPAST) – termination in finite expected runtime on all inputs, see [42, Sect. 6] – by synthesizing piecewise linear upper bounds on expected runtimes, and to (ii) verifying lower bounds on possibly unbounded expected values.

UPAST. We leverage Kaminski et al.’s weakest-precondition-style calculus for reasoning about expected runtimes [44, 45]:

Proposition 4

For every loop \({\texttt {while}}\left( \, {\varphi } \,\right) \left\{ \, {C} \,\right\} \), the monotone function

$$\begin{aligned} \varTheta :\quad \mathbb {E}\rightarrow \mathbb {E}, \qquad \varTheta (I)(s) ~{}={}~1 + \varPhi _{0}(I)(s)~, \end{aligned}$$

obtained from \(\varPhi _{0}\) (cf. Prob. 1) satisfies

$$\begin{aligned} \bigl (\textsf {{lfp}}~\varTheta \bigr )(s) ~{}={}~\begin{array}{l}\mathrm{``}\text {expected number of loop guard evaluations}\\ \text {when executing}\ {\texttt {while}}\left( \, {\varphi } \,\right) \left\{ \, {C} \,\right\} \text { on } s"~.\end{array} \end{aligned}$$

All properties of \(\varPhi _{0}\) relevant to our approach carry over to \(\varTheta \), thus enabling the synthesis of inductive invariants \(I \in \textsf{LinExp} \) satisfying \(0 \preceq I \) and \(\varTheta (I ) \preceq I \). Such \(I \) upper-bound the expected number of loop iterations [44] and, since expectations in \(\textsf{LinExp} \) never evaluate to infinity, \(I \) witnesses UPAST of the \({\texttt {while}}\)-loop.
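As a small worked example, consider the geometric loop `while (c = 1) { {c := 0} [1/2] {c := 1} }`, which is our own illustration and not one of the benchmarks. Assuming that \(\varPhi _{0}(I)(s)\) equals the expected value of \(I\) after one iteration if the guard holds in \(s\) and 0 otherwise, the sketch below checks that the candidate \(I\) with \(I(0) = 1\) and \(I(1) = 3\) satisfies \(\varTheta (I ) \preceq I \), and that the Kleene iterates of \(\varTheta \) starting from 0 stay below this candidate. All names in the code are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
STATES = (0, 1)                          # values of the single variable c

def theta(I):
    """Theta(I)(c) = 1 + [c = 1] * (1/2 * I(0) + 1/2 * I(1))."""
    return {c: 1 + (HALF * I[0] + HALF * I[1] if c == 1 else 0) for c in STATES}

def leq(I, J):
    """Pointwise order on expectations over the two states."""
    return all(I[c] <= J[c] for c in STATES)

# Candidate invariant: one guard evaluation from c = 0, three from c = 1.
candidate = {0: Fraction(1), 1: Fraction(3)}
is_inductive = leq(theta(candidate), candidate)

# Kleene iteration from the least element approximates lfp Theta from below.
iterate = {c: Fraction(0) for c in STATES}
for _ in range(20):
    iterate = theta(iterate)
```

Here the candidate is even a fixed point of \(\varTheta \), so it coincides with the least fixed point; in general, an inductive invariant only has to be an upper bound.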

Lower Bounds. Consider the problem of verifying a lower bound \(g \preceq \textsf {{lfp}}~\varPhi _{f}\) for some loop \(C' = {\texttt {while}}\left( \, {\varphi } \,\right) \left\{ \, {C} \,\right\} \). It is straightforward to modify our CEGIS approach for synthesizing sub-invariants, i.e. \(I \in \textsf{LinExp} \) with \(I \preceq \varPhi _{f}(I )\). However, Hark et al. [36] showed that sub-invariants do not necessarily lower-bound \(\textsf {{lfp}}~\varPhi _{f}\); they hence proposed a more involved yet sound induction rule for lower bounds:

Theorem 4

(Adapted from Hark et al. [36]). Let \(T\) be a natural template and \(I \in \langle T \rangle \). If  \(0\preceq I \), \(I \preceq \varPhi _{f}(I )\), and \(C'\) is UPAST, then

figure v

Akin to Prob. 2, given \(T\in \textsf{TnExp}\), we can compute \(T' \in \textsf{TnExp}\) s.t. for all \(\mathfrak {I}\),

figure w

which facilitates the extension of our verifier and synthesizer (see Sect. 5) for encoding and checking conditional difference boundedness. Hence, we can employ our CEGIS framework for verifying \(g \preceq \textsf {{lfp}}~\varPhi _{f}\) by (i) proving UPAST of \(C'\) as demonstrated above and (ii) synthesizing a c.d.b. sub-invariant \(I \) with \(g \preceq I \).
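To see why sub-invariants alone do not suffice, a standard illustration (not taken from our benchmarks) is the non-terminating loop \({\texttt {while}}\left( \, {\textsf{true}} \,\right) \left\{ \, {\texttt {skip}} \,\right\} \). Assuming the usual characteristic function \(\varPhi _{f}(I ) = [\lnot \varphi ] \cdot f + [\varphi ] \cdot \textsf{wp}[C](I )\), we obtain for every post-expectation \(f\):

$$\begin{aligned} \varPhi _{f}(I ) ~{}={}~[\textsf{false}] \cdot f + [\textsf{true}] \cdot I ~{}={}~I ~, \qquad \text {hence} \qquad \textsf {{lfp}}~\varPhi _{f} ~{}={}~0~. \end{aligned}$$

The constant expectation \(I = 1\) satisfies \(0 \preceq I \) and \(I \preceq \varPhi _{f}(I )\), yet \(I \not \preceq \textsf {{lfp}}~\varPhi _{f}\). Since this loop is not UPAST, the premises of Thm. 4 exclude it.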

7 Empirical Evaluation

We have implemented a prototype of our techniques called cegispro2 (Footnote 9): CEGIS for PRObabilistic PROgrams. The tool is written in Python using pySMT [34] with Z3 [49] as the backend for SMT solving. cegispro2 proves upper or lower bounds on expected outcomes of a probabilistic program by synthesizing quantitative inductive invariants. We investigate the applicability and scalability of our approach with a focus on the expressiveness of piecewise linear invariants. Moreover, we compare with three state-of-the-art tools – Storm [39], Absynth [50], and Exist [9] – on subsets of their benchmarks fitting into our framework.

Template Refinement. We start with a fixed-partition template \(T_1\) constructed automatically from the syntactic structure of the given loop (i.e. the loop guard and branches in the loop body, see e.g. Eq. (1)). If we learn that \(T_1\) admits no admissible invariant, we generate a refined template \(T_2\), and so on, until we find a template \(T_{i}\) with \(\langle T_i \rangle \cap \textsf{AdmInv}\ne \emptyset \) or realize that no further refinement is possible. We implemented three strategies for template refinement (including one producing non-fixed-partition templates); see [12, Appx. D] for details.

figure 4

Fig. 4. Performance of cegispro2 vs. state-of-the-art tools on three verification tasks (time in seconds, log-scaled; MO = 8 GB). Markers above the solid line depict benchmarks where cegispro2 is faster (in different orders of magnitude marked by the dashed lines).

Finite-State Programs. Fig. 4a depicts experiments on verifying upper bounds on expected values of finite-state programs. For each benchmark, i.e. program and property with increasingly sharper bounds, we evaluate cegispro2 on all template-refinement strategies (cf.  [12, Appx. D]). We compare explicit- and symbolic-state engines of the probabilistic model checker Storm 1.6.3 [39] with exact arithmetic. Storm implements LP-based model checking (as in Sect. 4) but employs more efficient methods in its default configuration. Fig. 4a depicts the runtime of the best configuration. See detailed configurations in [12, Appx. E.1].

Results. (i) Our CEGIS approach synthesizes inductive invariants for a variety of programs. We mostly find syntactically small invariants with a small number of counterexamples compared to the state-space size (cf.  [12, Tab. 2]). This indicates that piecewise linear inductive invariants can be sufficiently expressive for the verification of finite-state programs. The overall performance of cegispro2 depends highly on the sharpness of the given thresholds. (ii) Our approach can outperform state-of-the-art explicit- and symbolic-state model checking techniques and can scale to huge state spaces. There are also simple programs where our method fails to find an inductive invariant (gridbig) or finds inductive invariants only for rather simple properties while requiring many counterexamples (gridsmall). Whether we need more sophisticated template refinements or whether these programs are not amenable to piecewise linear expectations is left for future work. (iii) There is no clear winner between the two fixed-partition template-refinement strategies (cf.  [12, Tab. 2]). We further observe that the non-fixed-partition refinement is not competitive as significantly more time is spent in the synthesizer to solve formulae with Boolean structures. We thus conclude that searching for good fixed-partition templates in a separate outer loop (cf.  Fig. 1) pays off.

Proving UPAST. Fig. 4b depicts experiments on proving UPAST of (possibly infinite-state) programs taken from [50] (restricted to \(\mathbb {N} \)-valued, linear programs with flattened nested loops). We compare to the LP-based tool Absynth [50] for computing upper bounds on expected runtimes. These benchmarks do not require template refinements. More details are given in [12, Appx. E.2].

Results. cegispro2 can prove UPAST of various infinite-state programs from the literature using very few counterexamples. Absynth mostly outperforms cegispro2 (Footnote 10), which is to be expected as Absynth is tailored to the computation of expected runtimes. Remarkably, the runtime bounds synthesized by cegispro2 are often as tight as the bounds synthesized by Absynth (cf. [12, Tab. 3]).

Verifying Lower Bounds. Fig. 4c depicts experiments aiming to verify lower bounds on expected values of (possibly infinite-state) programs taken from [9]. We compare to Exist [9] (Footnote 11), which combines CEGIS with sampling- and ML-based techniques. However, Exist synthesizes sub-invariants only, which might be unsound for proving lower bounds (cf. Sect. 6). Thus, for a fair comparison, Fig. 4c depicts experiments where both Exist and cegispro2 synthesize sub-invariants only, whereas in Fig. 4d, we compare cegispro2 that finds sub-invariants only with cegispro2 that additionally proves UPAST and c.d.b., thus obtaining sound lower bounds as per Thm. 4. No benchmark requires template refinements.

Results. cegispro2 is capable of verifying quantitative lower bounds and outperforms Exist (on 30/32 benchmarks) for synthesizing sub-invariants. Additionally proving UPAST and c.d.b. naturally requires more time. A manual inspection reveals that, for most TO/MO cases in Fig. 4d, there is no c.d.b. sub-invariant. One soundness check times out, since we could not prove UPAST for that benchmark.

8 Related Work

We discuss related work in invariant synthesis, probabilistic model checking, and symbolic inference. ICE [33] is a template-based, counterexample-guided technique for learning invariants. Further inductive synthesis approaches are surveyed in [4, 29].

Quantitative Invariant Synthesis. Apart from the discussed method [9], constraint solving-based approaches [26, 30, 46] aim to synthesize quantitative invariants for proving lower bounds over \(\mathbb {R}\)-valued program variables – arguably a simplification as it allows solvers to use (decidable) real arithmetic. In particular, [26] also obtains linear constraints from counterexamples ensuring certain validity conditions on candidate invariants. Apart from various technical differences, we identify three conceptual differences: (i) we support piecewise expectations which have been shown sufficiently expressive for verifying quantitative reachability properties; (ii) we focus on the integration of fast verifiers over efficiently decidable theories; and (iii) we do not need to assume termination or boundedness of expectations.

Various martingale-based approaches, such as [2, 19, 23, 24, 31, 32, 48], aim to synthesize quantitative invariants over \(\mathbb {R}\)-valued variables, see [55] for a recent survey. Most of these approaches yield invariants for proving almost-sure termination or bounding expected runtimes. \(\varepsilon \)-decreasing supermartingales [19, 20] and nonnegative repulsing supermartingales [55] can upper-bound arbitrary reachability probabilities. In contrast, we synthesize invariants for proving upper- or lower bounds for more general quantities, i.e. expectations. [10] can prove bounds on expected values via symbolic reasoning and Doob’s decomposition, which, however, requires user-supplied invariants and hints. [1] employs a CEGIS loop to train a neural network dedicated to learning a ranking supermartingale witnessing UPAST of (possibly continuous) probabilistic programs. They also use counterexamples provided by SMT solvers to guide the learning process.

The recurrence solving-based approach in [11] synthesizes nonlinear invariants encoding (higher-order) moments of program variables. However, the underlying algebraic techniques are confined to the sub-class of prob-solvable loops.

Probabilistic Model Checking. Symbolic probabilistic model checking focusses mostly on algebraic decision diagrams [3, 6], representing the transition relation symbolically and using equation solving or value iteration [8, 37, 53] on that representation. PrIC3 [15] finds quantitative invariants by iteratively overapproximating k-step reachability. Alternative CEGIS approaches synthesize Markov chains [18] and probabilistic programs [5] that satisfy reachability properties.

Symbolic Inference. Probabilistic inference – in the finite-horizon case – employs weighted model counting via either decision diagrams annotated with probabilities as in Dice  [40, 41] or approximate versions by SAT/SMT-solvers [17, 21, 22, 27, 54]. PSI  [35] determines symbolic representations of exact distributions. Prodigy  [25] decides whether a probabilistic loop agrees with an (invariant) specification.