1 Introduction

There exist numerous tools for complexity analysis of (non-probabilistic) programs, e.g., [2, 3, 4, 5, 6, 10, 11, 15, 16, 18, 19, 24, 25, 28, 30, 32]. Our tool KoAT infers upper runtime and size bounds for (non-probabilistic) integer programs in a modular way by analyzing subprograms separately and lifting the obtained results to global bounds on the whole program [10]. Recently, we developed several improvements of KoAT [18, 24, 25] and showed that incorporating control-flow refinement (CFR) [13, 14] increases the power of automated complexity analysis significantly [18].

There are also several approaches for complexity analysis of probabilistic programs, e.g., [1, 7, 9, 21, 22, 23, 27, 29, 31, 34]. In particular, we adapted KoAT’s approach for runtime and size bounds and introduced a modular framework for automated complexity analysis of probabilistic integer programs in [27]. However, the improvements of KoAT from [18, 24, 25] had not yet been adapted to the probabilistic setting. In particular, we are not aware of any existing technique that combines CFR with complexity analysis of probabilistic programs.

Thus, in this paper, we develop a novel CFR technique for probabilistic programs which can be used as a black box by any complexity analysis tool. Moreover, to reduce the overhead caused by CFR, we integrated CFR natively into KoAT and call it on-demand in a modular way. Our experiments show that CFR increases the power of KoAT for complexity analysis of probabilistic programs substantially.

The idea of CFR is to gain information on the values of program variables and to sort out infeasible program paths. For example, consider the probabilistic while-loop (1). Here, we flip a (fair) coin and either set x to 0 or do nothing.

$$\begin{aligned} {\textbf {while }} \;x > 0\; {\textbf { do }} \; x \leftarrow 0 \; \oplus _{\nicefrac {1}{2}} \; \texttt {noop} \; {\textbf { end}} \end{aligned}$$
(1)

The update \(x \leftarrow 0\) occurs inside the loop, but after setting x to 0, the loop cannot be executed again. To simplify the analysis, CFR “unrolls” the loop, resulting in (2).

$$\begin{aligned} & {\textbf {while }} x > 0 {\textbf { do }} \; \texttt {break} \; \oplus _{\nicefrac {1}{2}} \; \texttt {noop} \; {\textbf { end}}\nonumber \\ & {\textbf {if }} x > 0 {\textbf { then }} x\leftarrow 0 {\textbf { end}} \end{aligned}$$
(2)

Here, x is updated in a separate, non-probabilistic if-statement and the loop does not change variables. Thus, we sorted out paths where \(x \leftarrow 0\) was executed repeatedly. Now, techniques for probabilistic programs can be used for the while-loop. The rest of the program can be analyzed by techniques for non-probabilistic programs. In particular, this is important if (1) is part of a larger program.

We present the necessary preliminaries in Sect. 2. In Sect. 3, we introduce our new control-flow refinement technique and show how to combine it with automated complexity analysis of probabilistic programs. We conclude in Sect. 4 with an experimental evaluation of our tool KoAT. We refer to [26] for further details on probabilistic programs and for the soundness proof of our CFR technique.

2 Preliminaries

Let \(\mathcal {V}\) be a set of variables. An atom is an inequation \(p_1 < p_2\) for polynomials \(p_1,p_2\in \mathbb {Z}[\mathcal {V}]\), and the set of all atoms is denoted by \(\mathcal {A}(\mathcal {V})\). A constraint is a (possibly empty) conjunction of atoms, and \(\mathcal {C}(\mathcal {V})\) denotes the set of all constraints. In addition to “<”, we also use “\(\ge \)”, “\(=\)”, etc., which can be simulated by constraints (e.g., \(p_1 \ge p_2\) is equivalent to \(p_2 < p_1 + 1\) for integers).

For probabilistic integer programs (PIPs), as in [27] we use a formalism based on transitions, which also allows us to represent while-programs like (1) easily. A PIP is a tuple \((\mathcal{P}\mathcal{V},\mathcal {L},\ell _0,\mathcal{G}\mathcal{T})\) with a finite set of program variables \(\mathcal{P}\mathcal{V}\subseteq \mathcal {V}\), a finite set of locations \(\mathcal {L}\), a fixed initial location \(\ell _0 \in \mathcal {L}\), and a finite set of general transitions \(\mathcal{G}\mathcal{T}\). A general transition \(g\in \mathcal{G}\mathcal{T}\) is a finite set of transitions which share the same start location \(\ell _g\) and the same guard \(\varphi _g\). A transition is a 5-tuple \((\ell ,\varphi ,p,\eta ,\ell ')\) with a start location \(\ell \in \mathcal {L}\), target location \(\ell '\in \mathcal {L}\setminus \lbrace \ell _0 \rbrace \), guard \(\varphi \in \mathcal {C}(\mathcal {V})\), probability \(p\in [0,1]\), and update \(\eta : \mathcal{P}\mathcal{V}\rightarrow \mathbb {Z}[\mathcal {V}]\). The probabilities of all transitions in a general transition add up to 1. We always require that general transitions are pairwise disjoint and let \(\mathcal {T}= \biguplus _{g\in \mathcal{G}\mathcal{T}} g\) denote the set of all transitions. PIPs may have non-deterministic branching, i.e., the guards of several transitions can be satisfied. Moreover, we also allow non-deterministic (temporary) variables \(\mathcal {V}\setminus \mathcal{P}\mathcal{V}\). To simplify the presentation, we do not consider transitions with individual costs and updates which use probability distributions, but the approach can easily be extended accordingly. From now on, we fix a PIP \(\mathcal {P}= (\mathcal{P}\mathcal{V},\mathcal {L},\ell _0,\mathcal{G}\mathcal{T})\).
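To make this formalism concrete, the following minimal sketch shows how PIPs could be represented as data. It is written in Python purely for illustration (KoAT itself is implemented in OCaml, see Sect. 4), and all names are ours; guards and update polynomials are kept symbolic.

from dataclasses import dataclass

# Illustrative representation of PIPs (not KoAT's actual data structures).

@dataclass
class Transition:
    source: str      # start location l
    guard: str       # guard phi in C(V), kept symbolic here
    prob: float      # probability p in [0,1]
    update: dict     # update eta: program variable -> polynomial over V
    target: str      # target location l' (never the initial location)

@dataclass
class GeneralTransition:
    transitions: tuple   # finite set of transitions with the same start location and guard

    def __post_init__(self):
        # all transitions of a general transition share start location and guard,
        # and their probabilities add up to 1
        assert len({(t.source, t.guard) for t in self.transitions}) == 1
        assert abs(sum(t.prob for t in self.transitions) - 1.0) < 1e-9

@dataclass
class PIP:
    program_vars: set          # PV
    locations: set             # L
    initial: str               # l_0
    general_transitions: list  # GT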

Example 1

The PIP in Fig. 1 has \(\mathcal{P}\mathcal{V}= \{x,y\}\), \(\mathcal {L}= \{\ell _0, \ell _1, \ell _2 \}\), and four general transitions \(\{t_0\}\), \(\{t_{1a}, t_{1b}\}\), \(\{t_2\}\), \(\{t_3\}\). The transition \(t_0\) starts at the initial location \(\ell _0\) and sets x to a non-deterministic positive value \(u \in \mathcal {V}\setminus \mathcal{P}\mathcal{V}\), while y is unchanged. (In Fig. 1, we omitted unchanged updates like \(\eta (y) = y\), the guard \( \texttt {true}\), and the probability \(p = 1\) to ease readability.) For singleton general transitions, we often use transitions and general transitions interchangeably. Here, only \(t_{1a}\) and \(t_{1b}\) form a non-singleton general transition, which corresponds to the program (1). Such (probabilistic) transitions are denoted by dashed arrows in Fig. 1. We extended (1) by a loop of \(t_2\) and \(t_3\) which is only executed if \(y > 0 \wedge x = 0\) (due to \(t_2\)’s guard) and decreases y by 1 in each iteration (via \(t_3\)’s update).
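With the illustrative Python representation sketched above, the PIP of Fig. 1 could be written down roughly as follows. Guards and updates are kept as symbolic strings, and the guard u > 0 of \(t_0\) is our reading of “non-deterministic positive value”; the exact encoding is an assumption, since Fig. 1 is only shown as a graph.

# The PIP of Fig. 1 in the illustrative representation from above.
t0  = Transition("l0", "u > 0",            1.0, {"x": "u"},     "l1")  # x := u for a temporary u > 0
t1a = Transition("l1", "x > 0",            0.5, {},             "l1")  # noop branch of the coin flip
t1b = Transition("l1", "x > 0",            0.5, {"x": "0"},     "l1")  # sets x to 0
t2  = Transition("l1", "y > 0 and x == 0", 1.0, {},             "l2")
t3  = Transition("l2", "true",             1.0, {"y": "y - 1"}, "l1")  # decreases y by 1

fig1 = PIP(
    program_vars={"x", "y"},
    locations={"l0", "l1", "l2"},
    initial="l0",
    general_transitions=[
        GeneralTransition((t0,)),
        GeneralTransition((t1a, t1b)),  # the only non-singleton general transition
        GeneralTransition((t2,)),
        GeneralTransition((t3,)),
    ],
)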

Fig. 1. A Probabilistic Integer Program

A state is a function \(\sigma : \mathcal {V}\rightarrow \mathbb {Z}\), \(\varSigma \) denotes the set of all states, and a configuration is a pair of a location and a state. To extend finite sequences of configurations to infinite ones, we introduce a special location \(\ell _\bot \) (indicating termination) and a special transition \(t_\bot \) (and its general transition \(g_\bot = \lbrace t_\bot \rbrace \)) to reach the configurations of a run after termination. Let \(\mathcal {L}_\bot = \mathcal {L}\uplus \lbrace \ell _\bot \rbrace \), \(\mathcal {T}_\bot = \mathcal {T}\uplus \lbrace t_\bot \rbrace \), \(\mathcal{G}\mathcal{T}_\bot = \mathcal{G}\mathcal{T}\uplus \lbrace g_\bot \rbrace \), and let \(\textsf{Conf}= (\mathcal {L}_\bot \times \varSigma )\) denote the set of all configurations. A path has the form \(c_0\rightarrow _{t_1} \dots \rightarrow _{t_n} c_n\) for \(c_0,\dots ,c_n\in \textsf{Conf}\) and \(t_1,\dots ,t_n\in \mathcal {T}_\bot \) for an \(n\in \mathbb {N}\), and a run is an infinite path \(c_0\rightarrow _{t_1} c_1 \rightarrow _{t_2} \cdots \). Let \(\textsf{Path}\) and \(\textsf{Run}\) denote the sets of all paths and all runs, respectively.

We use Markovian schedulers \(\mathfrak {S}: \textsf{Conf}\rightarrow \mathcal{G}\mathcal{T}_\bot \times \varSigma \) to resolve all non-determinism. For \(c = (\ell ,\sigma ) \in \textsf{Conf}\), a scheduler \(\mathfrak {S}\) yields a pair \(\mathfrak {S}(c) = (g,\tilde{\sigma })\) where g is the next general transition to be taken (with \(\ell = \ell _g\)) and \(\tilde{\sigma }\) chooses values for the temporary variables such that \(\tilde{\sigma } \models \varphi _g\) and \(\sigma (v) = \tilde{\sigma }(v)\) for all \(v \in \mathcal{P}\mathcal{V}\). If \(\mathcal{G}\mathcal{T}\) contains no such g, we obtain \(\mathfrak {S}(c) = (g_\bot ,\sigma )\). For the formal definition of Markovian schedulers, we refer to [26].
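As an illustration (this is not the formal definition from [26]), one concrete Markovian scheduler for the PIP of Fig. 1 can be sketched as follows; it always picks u = 5 at \(\ell _0\) and otherwise chooses the unique enabled general transition. The names g0, …, g3 are ours for the four general transitions \(\{t_0\}\), \(\{t_{1a},t_{1b}\}\), \(\{t_2\}\), \(\{t_3\}\).

# One concrete Markovian scheduler for the PIP of Fig. 1 (illustrative sketch).
G_BOT = "g_bot"   # special general transition taken after termination

def scheduler(location, state):
    """Maps a configuration (location, state) to a pair (general transition, extended state).
    The extended state fixes the temporary variable u, agrees with the given state
    on the program variables, and must satisfy the guard of the chosen transition."""
    x, y = state["x"], state["y"]
    if location == "l0":
        return "g0", {**state, "u": 5}   # resolve the non-deterministic choice of u
    if location == "l1" and x > 0:
        return "g1", dict(state)         # the probabilistic coin flip {t1a, t1b}
    if location == "l1" and y > 0 and x == 0:
        return "g2", dict(state)         # enter the y-loop via t2
    if location == "l2":
        return "g3", dict(state)         # t3
    return G_BOT, dict(state)            # no guard is satisfied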

For every \(\mathfrak {S}\) and \(\sigma _0\in \varSigma \), we define a probability mass function \(pr_{\mathfrak {S},\sigma _0}\). For all \(c\in \textsf{Conf}\), \(pr_{\mathfrak {S},\sigma _0}(c)\) is the probability that a run with scheduler \(\mathfrak {S}\) and the initial state \(\sigma _0\) starts in c. So \(pr_{\mathfrak {S},\sigma _0}(c) = 1\) if \(c = (\ell _0,\sigma _0)\) and \(pr_{\mathfrak {S},\sigma _0}(c) = 0\) otherwise.

For all \(c, c' \in \textsf{Conf}\) and \(t \in \mathcal {T}_\bot \), let \(pr_{\mathfrak {S}}(c \rightarrow _{t} c')\) be the probability that one goes from c to \(c'\) via the transition t when using the scheduler \(\mathfrak {S}\) (see [26] for the formal definition of \(pr_{\mathfrak {S}}\)). Then for any path \(f = (c_0\rightarrow _{t_1} \dots \rightarrow _{t_n} c_n)\in \textsf{Path}\), let \(pr_{\mathfrak {S},\sigma _0}(f) = pr_{\mathfrak {S},\sigma _0}(c_0)\cdot pr_{\mathfrak {S}}(c_0 \rightarrow _{t_1} c_1)\cdot \ldots \cdot pr_{\mathfrak {S}}(c_{n-1}\rightarrow _{t_n}c_n)\). Here, all paths f which are not “admissible” (e.g., because guards are not fulfilled or transitions start or end in the wrong locations) have probability \(pr_{\mathfrak {S},\sigma _0}(f) = 0\).
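For example, consider the PIP of Fig. 1, a scheduler \(\mathfrak {S}\) that chooses \(u = 1\) at \(\ell _0\), and the path f that applies \(t_0\) and then immediately leaves the loop via \(t_{1b}\), i.e., \(f = ((\ell _0,\sigma _0)\rightarrow _{t_0} (\ell _1,\sigma _1) \rightarrow _{t_{1b}} (\ell _1,\sigma _2))\), where \(\sigma _1\) and \(\sigma _2\) are the corresponding successor states (with \(\sigma _1(x) = 1\) and \(\sigma _2(x) = 0\)). Then, glossing over the formal definition of \(pr_{\mathfrak {S}}\), we get

$$\begin{aligned} pr_{\mathfrak {S},\sigma _0}(f) & = pr_{\mathfrak {S},\sigma _0}((\ell _0,\sigma _0))\cdot pr_{\mathfrak {S}}((\ell _0,\sigma _0) \rightarrow _{t_0} (\ell _1,\sigma _1))\cdot pr_{\mathfrak {S}}((\ell _1,\sigma _1) \rightarrow _{t_{1b}} (\ell _1,\sigma _2)) \\ & = 1 \cdot 1 \cdot \nicefrac {1}{2} \;=\; \nicefrac {1}{2}. \end{aligned}$$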

The semantics of PIPs can be defined via a corresponding probability space, obtained by a standard cylinder construction. Let \(\mathbb {P}_{\mathfrak {S},\sigma _0}\) denote the probability measure which lifts \(pr_{\mathfrak {S},\sigma _0}\) to cylinder sets: For any \(f\in \textsf{Path}\), we have \(pr_{\mathfrak {S},\sigma _0}(f) = \mathbb {P}_{\mathfrak {S},\sigma _0}(\text {Pre}_f)\) for the set \(\text {Pre}_f\) of all infinite runs with prefix f. So \(\mathbb {P}_{\mathfrak {S},\sigma _0}(\varTheta )\) is the probability that a run from \(\varTheta \subseteq \textsf{Run}\) is obtained when using the scheduler \(\mathfrak {S}\) and starting in \(\sigma _0\). Let \(\mathbb {E}_{\mathfrak {S},\sigma _0}\) denote the associated expected value operator. So for any random variable \(X: \textsf{Run}\rightarrow \overline{\mathbb {N}}= \mathbb {N}\cup \lbrace \infty \rbrace \), we have \(\mathbb {E}_{\mathfrak {S},\sigma _0}(X) = \sum _{n\in \overline{\mathbb {N}}}\; n\cdot \mathbb {P}_{\mathfrak {S},\sigma _0}(X = n)\). For a detailed construction, see [26].

Definition 2

(Expected Runtime). For \(g \in \mathcal{G}\mathcal{T}\), \(\mathcal {R}_g: \textsf{Run}\rightarrow \overline{\mathbb {N}}\) is a random variable with \(\mathcal {R}_g(c_0\rightarrow _{t_1} c_1 \rightarrow _{t_2} \cdots ) = |\lbrace i\in \mathbb {N}\mid t_i \in g \rbrace |\), i.e., \(\mathcal {R}_g(\vartheta )\) is the number of times that a transition from g was applied in the run \(\vartheta \in \textsf{Run}\). Moreover, the random variable \(\mathcal {R}: \textsf{Run}\rightarrow \overline{\mathbb {N}}\) denotes the number of transitions that were executed before termination, i.e., for all \(\vartheta \in \textsf{Run}\) we have \(\mathcal {R}(\vartheta ) = \sum _{g\in \mathcal{G}\mathcal{T}} \mathcal {R}_g(\vartheta )\). For a scheduler \(\mathfrak {S}\) and \(\sigma _0 \in \varSigma \), the expected runtime of g is \(\mathbb {E}_{\mathfrak {S},\sigma _0}(\mathcal {R}_g)\) and the expected runtime of the program is \(\mathcal {R}_{\mathfrak {S},\sigma _0}= \mathbb {E}_{\mathfrak {S},\sigma _0}(\mathcal {R})\).
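For example, consider the general transition \(g_1 = \lbrace t_{1a},t_{1b} \rbrace \) of Fig. 1. Once \(t_0\) has set x to a positive value, \(g_1\) is executed exactly k times iff the coin flip chooses \(t_{1a}\) in the first \(k-1\) executions and \(t_{1b}\) in the k-th one, which happens with probability \((\nicefrac {1}{2})^{k}\). Hence, independently of the scheduler,

$$\begin{aligned} \mathbb {E}_{\mathfrak {S},\sigma _0}(\mathcal {R}_{g_1}) = \sum _{k\ge 1} k\cdot \mathbb {P}_{\mathfrak {S},\sigma _0}(\mathcal {R}_{g_1} = k) = \sum _{k\ge 1} k\cdot (\nicefrac {1}{2})^{k} = 2, \end{aligned}$$

which matches the expected runtime bound inferred for the refined program in Example 6.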

The goal of complexity analysis for a PIP is to compute a bound on its expected runtime complexity. The set of bounds \(\mathcal {B}\) consists of all functions \(\varSigma \rightarrow \mathbb {R}_{\ge 0}\).

Definition 3

(Expected Runtime Bound and Complexity [27]). The function \({\mathcal{R}\mathcal{B}}:\mathcal{G}\mathcal{T}\rightarrow \mathcal {B}\) is an expected runtime bound if \(({\mathcal{R}\mathcal{B}}(g))(\sigma _0) \ge \sup _\mathfrak {S}\mathbb {E}_{\mathfrak {S},\sigma _0}(\mathcal {R}_g)\) for all \(\sigma _0\in \varSigma \) and all \(g\in \mathcal{G}\mathcal{T}\). Then \(\sum _{g\in \mathcal{G}\mathcal{T}} {\mathcal{R}\mathcal{B}}(g)\) is a bound on the expected runtime complexity of the whole program, i.e., \(\sum _{g\in \mathcal{G}\mathcal{T}} (({\mathcal{R}\mathcal{B}}(g))(\sigma _0)) \ge \sup _\mathfrak {S}\mathcal {R}_{\mathfrak {S},\sigma _0}\) for all \(\sigma _0\in \varSigma \).

3 Control-Flow Refinement for PIPs

We now introduce our novel CFR algorithm for probabilistic integer programs, based on the partial evaluation technique for non-probabilistic programs from [13, 14, 18]. In particular, our algorithm coincides with the classical CFR technique when the program is non-probabilistic. The goal of CFR is to transform a program \(\mathcal {P}\) into a program \(\mathcal {P}'\) which is “easier” to analyze. Thm. 4 shows the soundness of our approach, i.e., that \(\mathcal {P}\) and \(\mathcal {P}'\) have the same expected runtime complexity.

Our CFR technique considers “abstract” evaluations which operate on sets of states. These sets are characterized by conjunctions \(\tau \) of constraints from \(\mathcal {C}(\mathcal{P}\mathcal{V})\), i.e., \(\tau \) stands for all states \(\sigma \in \varSigma \) with \(\sigma \models \tau \). We now label locations \(\ell \) by formulas \(\tau \) which describe (a superset of) those states \(\sigma \) which can occur in \(\ell \), i.e., where a configuration \((\ell ,\sigma )\) is reachable from some initial configuration \((\ell _0,\sigma _0)\). We begin by labeling every location with the constraint \( \texttt {true}\). Then we add new copies of the locations with refined labels \(\tau \) by considering how the updates of transitions affect the constraints of their start locations and their guards. The labeled locations become the new locations in the refined program.

Since a location might be reachable by different paths, we may construct several variants \(\langle \ell ,\tau _1\rangle ,\dots ,\langle \ell ,\tau _n\rangle \) of the same original location \(\ell \). Thus, the formulas \(\tau \) are not necessarily invariants that hold for all evaluations reaching a location \(\ell \). Instead, we perform a case analysis and split up a location \(\ell \) according to the different sets of states in which \(\ell \) may be reached. Our approach ensures that a labeled location \(\langle \ell ,\tau \rangle \) can only be reached by configurations \((\ell , \sigma )\) where \(\sigma \models \tau \).

We apply CFR only on-demand on a (sub)set of transitions \(\mathcal {S}\subseteq \mathcal {T}\) (thus, CFR can be performed in a modular way for different subsets \(\mathcal {S}\)). In practice, we choose \(\mathcal {S}\) heuristically and use CFR only on transitions where our currently inferred runtime bounds are “not yet good enough”. Then, for \(\mathcal {P}= (\mathcal{P}\mathcal{V},\mathcal {L},\ell _0,\mathcal{G}\mathcal{T})\), the result of the CFR algorithm is the program \(\mathcal {P}' = (\mathcal{P}\mathcal{V},\mathcal {L}',\langle \ell _0, \texttt {true}\rangle ,\mathcal{G}\mathcal{T}')\) where \(\mathcal {L}'\) and \(\mathcal{G}\mathcal{T}'\) are the smallest sets satisfying the properties (3), (4), and (5) below.

First, (3) requires that for every \(\ell \in \mathcal {L}\), the “original” location \(\langle \ell , \texttt {true}\rangle \) is in \(\mathcal {L}'\). At these locations, we do not yet have any information on the possible states:

$$\begin{aligned} \forall \; \ell \in \mathcal {L}.\; \langle \ell , \texttt {true}\rangle \in \mathcal {L}' \end{aligned}$$
(3)

If we already introduced a location \(\langle \ell ,\tau \rangle \in \mathcal {L}'\) and there is a transition \((\ell ,\varphi ,p,\eta ,\ell ') \in \mathcal {S}\), then (4) requires that we also add the location \(\langle \ell ',\tau _{\varphi ,\eta ,\ell '}\rangle \) to \(\mathcal {L}'\). The formula \(\tau _{\varphi ,\eta ,\ell '}\) over-approximates the set of states that can result from states that satisfy \(\tau \) and the guard \(\varphi \) of the transition when applying the update \(\eta \). More precisely, \(\tau _{\varphi ,\eta ,\ell '}\) has to satisfy \((\tau \wedge \varphi )\models \eta (\tau _{\varphi ,\eta ,\ell '})\). For example, if \(\tau = (x = 0)\), \(\varphi = \texttt {true}\), and \(\eta (x) = x-1\), then we might have \(\tau _{\varphi ,\eta ,\ell '} = (x=-1)\).

To ensure that every \(\ell '\in \mathcal {L}\) only gives rise to finitely many new labeled locations \(\langle \ell ', \tau _{\varphi ,\eta ,\ell '} \rangle \), we perform property-based abstraction: For every location \(\ell '\), we use a finite so-called abstraction layer \(\alpha _{\ell '} \subset \lbrace p_1 \sim p_2\mid p_1,p_2\in \mathbb {Z}[\mathcal{P}\mathcal{V}]\text { and }\sim \;\in \lbrace <,\le ,= \rbrace \rbrace \) (see [14] for heuristics to compute \(\alpha _{\ell '}\)). Then we require that \(\tau _{\varphi ,\eta ,\ell '}\) is a conjunction of constraints from \(\alpha _{\ell '}\) (i.e., \(\tau _{\varphi ,\eta ,\ell '} \subseteq \alpha _{\ell '}\) when identifying a set of constraints with its conjunction). This guarantees termination of our CFR algorithm, since for every location \(\ell '\) there are only finitely many possible labels.

$$\begin{aligned} \nonumber & \forall \; \langle \ell ,\tau \rangle \in \mathcal {L}'.\;\forall \; (\ell ,\varphi ,p,\eta ,\ell ')\in \mathcal {S}.\; \langle \ell ',\tau _{\varphi ,\eta ,\ell '}\rangle \in \mathcal {L}' \\ & \qquad \qquad \qquad \qquad \quad \,\, \text {where } \tau _{\varphi ,\eta ,\ell '} = \lbrace \psi \in \alpha _{\ell '}\mid (\tau \wedge \varphi )\models \eta (\psi ) \rbrace \end{aligned}$$
(4)
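The label \(\tau _{\varphi ,\eta ,\ell '}\) of (4) can be computed by one entailment check per candidate \(\psi \in \alpha _{\ell '}\), since \((\tau \wedge \varphi )\models \eta (\psi )\) holds iff \(\tau \wedge \varphi \wedge \lnot \eta (\psi )\) is unsatisfiable. The following sketch uses Python and z3py purely for illustration (KoAT accesses Z3 via its OCaml bindings, see Sect. 4); it reproduces the computation of \(\tau _{x>0,\eta ,\ell _1}\) for \(t_{1b}\) from Example 5 below.

# Sketch: computing tau_{phi,eta,l'} from (4) via Z3 entailment checks (illustrative only).
from z3 import Int, IntVal, BoolVal, And, Not, Solver, substitute, unsat

x = Int("x")

def apply_update(psi, update):
    """eta(psi): replace every updated program variable in psi by its update polynomial."""
    return substitute(psi, *[(v, e) for v, e in update.items()])

def label(tau, phi, update, alpha_target):
    """Return all psi in the abstraction layer alpha_{l'} with (tau /\ phi) |= eta(psi)."""
    result = []
    for psi in alpha_target:
        solver = Solver()
        solver.add(And(tau, phi), Not(apply_update(psi, update)))
        if solver.check() == unsat:   # the entailment holds
            result.append(psi)
    return result

# Example 5, transition t_1b: tau = true, phi = (x > 0), eta(x) = 0, alpha_{l1} = {x = 0}.
print(label(BoolVal(True), x > 0, {x: IntVal(0)}, [x == 0]))   # -> [x == 0]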

Finally, we have to ensure that \(\mathcal{G}\mathcal{T}'\) contains all “necessary” (general) transitions. To this end, we consider all \(g\in \mathcal{G}\mathcal{T}\). The transitions \((\ell ,\varphi ,p,\eta ,\ell ')\) in \(g \cap \mathcal {S}\) now have to connect the appropriately labeled locations. Thus, for all labeled variants \(\langle \ell ,\tau \rangle \in \mathcal {L}'\), we add the transition \((\langle \ell ,\tau \rangle ,\tau \wedge \varphi ,p,\eta ,\langle \ell ', \tau _{\varphi ,\eta ,\ell '}\rangle )\). In contrast, the transitions \((\ell ,\varphi ,p,\eta ,\ell ')\) in \(g \setminus \mathcal {S}\) only reach the location where \(\ell '\) is labeled with \( \texttt {true}\), i.e., here we add the transition \((\langle \ell ,\tau \rangle ,\tau \wedge \varphi ,p,\eta ,\langle \ell ', \texttt {true}\rangle )\).

$$\begin{aligned} & \forall \; \langle \ell ,\tau \rangle \in \mathcal {L}'.\;\forall \; g\in \mathcal{G}\mathcal{T}.\nonumber \\ & \left( \lbrace (\langle \ell ,\tau \rangle ,\tau \wedge \varphi ,p,\eta ,\langle \ell ', \tau _{\varphi ,\eta ,\ell '}\rangle ) \mid (\ell ,\varphi ,p,\eta ,\ell ')\in g \cap \mathcal {S} \rbrace \;\cup \right. \nonumber \\ & \;\left. \;\lbrace (\langle \ell ,\tau \rangle ,\tau \wedge \varphi ,p,\eta ,\langle \ell ', \texttt {true}\rangle ) \mid (\ell ,\varphi ,p,\eta ,\ell ')\in g \setminus \mathcal {S} \rbrace \right) \qquad \in \mathcal{G}\mathcal{T}' \end{aligned}$$
(5)

\(\mathcal {L}'\) and \(\bigcup _{g\in \mathcal{G}\mathcal{T}'}g\) are finite due to the property-based abstraction, as there are only finitely many possible labels for each location. Hence, repeatedly “unrolling” transitions by (5) terminates and yields the (unique) least fixpoint. Moreover, (5) yields proper general transitions, i.e., their probabilities still add up to 1. In practice, we remove transitions with unsatisfiable guards and locations that are not reachable from \(\langle \ell _0, \texttt {true}\rangle \). Thm. 4 shows the soundness of our approach (see [26] for its proof).

Theorem 4

(Soundness of CFR for PIPs). Let \(\mathcal {P}'\!=\!(\mathcal{P}\mathcal{V},\mathcal {L}',\langle \ell _0, \texttt {true}\rangle ,\mathcal{G}\mathcal{T}')\) be the PIP such that \(\mathcal {L}'\) and \(\mathcal{G}\mathcal{T}'\) are the smallest sets satisfying (3), (4), and (5). Let \(\mathcal {R}_{\mathfrak {S},\sigma _0}^\mathcal {P}\) and \(\mathcal {R}_{\mathfrak {S},\sigma _0}^{\mathcal {P}'}\) be the expected runtimes of \(\mathcal {P}\) and \(\mathcal {P}'\), respectively. Then for all \(\sigma _0 \in \varSigma \) we have \(\sup _{\mathfrak {S}}\mathcal {R}_{\mathfrak {S},\sigma _0}^\mathcal {P}= \sup _{\mathfrak {S}}\mathcal {R}_{\mathfrak {S},\sigma _0}^{\mathcal {P}'}\).

Fig. 2. Result of Control-Flow Refinement with \(\mathcal {S}= \lbrace t_{1a},t_{1b},t_2,t_3 \rbrace \)

CFR Algorithm and its Runtime: To implement the fixpoint construction of Thm. 4 (i.e., to compute the PIP \(\mathcal {P}'\)), our algorithm starts by introducing all “original” locations \(\langle \ell , \texttt {true}\rangle \) for \(\ell \in \mathcal {L}\) according to (3). Then it iterates over all labeled locations \(\langle \ell ,\tau \rangle \) and all transitions \(t \in \mathcal {T}\). If the start location of t is \(\ell \), then the algorithm extends \(\mathcal{G}\mathcal{T}'\) by a new transition according to (5). Moreover, it also adds the corresponding labeled target location to \(\mathcal {L}'\) (as in (4)), if \(\mathcal {L}'\) did not contain this labeled location yet. Afterwards, we mark \(\langle \ell ,\tau \rangle \) as finished and proceed with a previously computed labeled location that is not marked yet. So our implementation iteratively “unrolls” transitions by (5) until no new labeled locations are obtained (this yields the least fixpoint mentioned above). In particular, unrolling steps with transitions from \(\mathcal {T}\setminus \mathcal {S}\) never create new labeled locations (their targets are always labeled with \( \texttt {true}\)) and thus do not trigger further unrolling.
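A compact sketch of this worklist-based construction, again in Python on the illustrative PIP representation from Sect. 2, is shown below. It omits the removal of unsatisfiable transitions and unreachable locations, and it assumes a function label that computes \(\tau _{\varphi ,\eta ,\ell '}\) as in (4) (e.g., via the Z3-based sketch above) and returns it in some canonical, hashable form, with "true" for the empty conjunction.

# Illustrative sketch of the worklist-based fixpoint construction of P'
# (omits removal of unsatisfiable transitions and unreachable locations).

def labeled(l, tau):
    """The labeled location <l, tau>."""
    return f"<{l}, {tau}>"

def partial_evaluation(pip, S, alpha, label):
    """pip: the PIP to refine, S: the transitions to refine (a subset of T),
    alpha: maps each location to its abstraction layer, label: computes (4)."""
    TRUE = "true"
    new_locations = {labeled(l, TRUE) for l in pip.locations}     # property (3)
    new_gts = []
    todo = [(l, TRUE) for l in pip.locations]
    done = set()
    while todo:
        l, tau = todo.pop()
        if (l, tau) in done:
            continue
        done.add((l, tau))
        for gt in pip.general_transitions:
            if gt.transitions[0].source != l:        # gt does not start in l
                continue
            new_trans = []
            for t in gt.transitions:
                # property (4): compute the label of the target location
                tau2 = label(tau, t.guard, t.update, alpha[t.target]) if t in S else TRUE
                if (t.target, tau2) not in done:
                    new_locations.add(labeled(t.target, tau2))
                    todo.append((t.target, tau2))
                # property (5): redirect t and strengthen its guard by tau
                new_trans.append(Transition(labeled(l, tau), f"({tau}) and ({t.guard})",
                                            t.prob, t.update, labeled(t.target, tau2)))
            new_gts.append(GeneralTransition(tuple(new_trans)))
    return PIP(pip.program_vars, new_locations, labeled(pip.initial, TRUE), new_gts)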

To over-approximate the runtime of this algorithm, note that for every location \(\ell \in \mathcal {L}\), there can be at most \(2^{|\alpha _\ell |}\) many labeled locations of the form \(\langle \ell ,\tau \rangle \). So if \(\mathcal {L}= \lbrace \ell _0,\dots ,\ell _n \rbrace \), then the overall number of labeled locations is at most \(2^{|\alpha _{\ell _0}|} + \ldots + 2^{|\alpha _{\ell _n}|}\). Hence, the algorithm performs at most \(|\mathcal {T}| \cdot (2^{|\alpha _{\ell _0}|} + \ldots + 2^{|\alpha _{\ell _n}|})\) unrolling steps.
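For instance, for the PIP of Fig. 1 (with \(|\mathcal {T}| = 5\)) and the abstraction layers used in Example 5 below (\(\alpha _{\ell _0} = \varnothing \) and \(\alpha _{\ell _1} = \alpha _{\ell _2} = \lbrace x = 0 \rbrace \)), this gives at most

$$\begin{aligned} |\mathcal {T}| \cdot (2^{|\alpha _{\ell _0}|} + 2^{|\alpha _{\ell _1}|} + 2^{|\alpha _{\ell _2}|}) = 5 \cdot (2^{0} + 2^{1} + 2^{1}) = 25 \end{aligned}$$

unrolling steps; the actual construction in Example 5 performs far fewer, since most labeled locations are never created.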

Example 5

For the PIP in Fig. 1 and \(\mathcal {S}= \lbrace t_{1a},t_{1b},t_2,t_3 \rbrace \), by (3) we start with \(\mathcal {L}'=\{\langle \ell _i, \texttt {true}\rangle \mid i\in \lbrace 0,1,2 \rbrace \}\). We abbreviate \(\langle \ell _i, \texttt {true}\rangle \) by \(\ell _i\) in the final result of the CFR algorithm in Fig. 2. As \(t_0 \in \lbrace t_0 \rbrace \setminus \mathcal {S}\), by (5) \(t_0\) is redirected such that it starts at \(\langle \ell _0, \texttt {true}\rangle \) and ends in \(\langle \ell _1, \texttt {true}\rangle \), resulting in \(t_0'\). We always use primes to indicate the correspondence between new and original transitions.

Next, we consider \(\lbrace t_{1a},t_{1b} \rbrace \subseteq \mathcal {S}\) with the guard \(\varphi = (x > 0)\) and start location \(\langle \ell _1, \texttt {true}\rangle \). We first handle \(t_{1a}\), which has the update \(\eta = { \textsf {id}}\). We use the abstraction layers \(\alpha _{\ell _0} = \varnothing \) and \(\alpha _{\ell _1} = \alpha _{\ell _2} = \lbrace x = 0 \rbrace \). Thus, we have to find all \(\psi \in \alpha _{\ell _1} = \lbrace x = 0 \rbrace \) such that \(( \texttt {true}\wedge x>0) \models \eta (\psi )\). Hence, \(\tau _{x>0, { \textsf {id}}, \ell _1}\) is the empty conjunction \( \texttt {true}\), as no \(\psi \) from \(\alpha _{\ell _1}\) satisfies this property. We obtain

$$\begin{aligned} t_{1a}': & \; (\langle \ell _1, \texttt {true}\rangle ,x > 0,\nicefrac {1}{2},{ \textsf {id}},\langle \ell _1, \texttt {true}\rangle ). \end{aligned}$$

In contrast, \(t_{1b}\) has the update \(\eta (x) = 0\). To determine \(\tau _{x>0, \eta , \ell _1}\), again we have to find all \(\psi \in \alpha _{\ell _1} = \lbrace x = 0 \rbrace \) such that \(( \texttt {true}\wedge x>0) \models \eta (\psi )\). Here, we get \(\tau _{x>0, \eta , \ell _1} = (x = 0)\). Thus, by (4) we create the location \(\langle \ell _1,x = 0\rangle \) and obtain

$$\begin{aligned} t_{1b}': & \; (\langle \ell _1, \texttt {true}\rangle ,x > 0,\nicefrac {1}{2},\eta (x) = 0,\langle \ell _1,x = 0\rangle ). \end{aligned}$$

As \(t_{1a}\) and \(t_{1b}\) form one general transition, by (5) we obtain \(\{ t_{1a}', t_{1b}' \} \in \mathcal{G}\mathcal{T}'\).

Now, we consider transitions resulting from \(\lbrace t_{1a},t_{1b} \rbrace \) with the start location \(\langle \ell _1,x = 0\rangle \). However, \(\tau = (x=0)\) and the guard \(\varphi = (x > 0)\) are conflicting, i.e., the transitions would have an unsatisfiable guard \(\tau \wedge \varphi \) and are thus omitted.

Next, we consider transitions resulting from \(t_2\) with \(\langle \ell _1, \texttt {true}\rangle \) or \(\langle \ell _1,x = 0\rangle \) as their start location. Here, we obtain two (general) transitions \(\{t_2'\}, \{t_2''\} \in \mathcal{G}\mathcal{T}'\):

$$\begin{aligned} t_{2}' :\;& (\langle \ell _1,x = 0\rangle , y > 0 \wedge x = 0,1,{ \textsf {id}},\langle \ell _2,x = 0\rangle ) \\ t_{2}'' :\;& (\langle \ell _1, \texttt {true}\rangle , y > 0 \wedge x = 0,1,{ \textsf {id}},\langle \ell _2,x=0\rangle ) \end{aligned}$$

However, \(t_2''\) can be ignored since \(x = 0\) contradicts the invariant \(x > 0\) at \(\langle \ell _1, \texttt {true}\rangle \). KoAT uses Apron [20] to infer invariants like \(x > 0\) automatically. Finally, \(t_3\) leads to the transition \(t_{3}': (\langle \ell _2,x = 0\rangle ,x=0,1,\eta (y) = y - 1,\langle \ell _1,x = 0\rangle )\). Thus, after removing the unreachable location \(\langle \ell _2, \texttt {true}\rangle \), we obtain \(\mathcal {L}' = \lbrace \langle \ell _i, \texttt {true}\rangle \mid i\in \lbrace 0,1 \rbrace \rbrace \cup \lbrace \langle \ell _i,x=0\rangle \mid i\in \lbrace 1,2 \rbrace \rbrace \).

KoAT infers a bound \({\mathcal{R}\mathcal{B}}(g)\) for each \(g \in \mathcal{G}\mathcal{T}\) individually (thus, non-probabilistic program parts can be analyzed by classical techniques). Then \(\sum _{g\in \mathcal{G}\mathcal{T}}{\mathcal{R}\mathcal{B}}(g)\) is a bound on the expected runtime complexity of the whole program, see Definition 3.

Example 6

We now infer a bound on the expected runtime complexity of the PIP in Fig. 2. Transition \(t_0'\) is not on a cycle and can thus be evaluated at most once. So \({\mathcal{R}\mathcal{B}}(\lbrace t_0' \rbrace ) = 1\) is an (expected) runtime bound for the general transition \(\lbrace t_0' \rbrace \).

For the general transition \(\lbrace t_{1a}',t_{1b}' \rbrace \), KoAT infers the expected runtime bound 2 via probabilistic linear ranking functions (PLRFs, see e.g., [27]). More precisely, KoAT finds the constant PLRF \(\lbrace \ell _1 \mapsto 2, \langle \ell _1,x = 0\rangle \mapsto 0 \rbrace \). In contrast, in the original program of Fig. 1, \(\lbrace t_{1a}, t_{1b} \rbrace \) is not decreasing w.r.t. any constant PLRF, because \(t_{1a}\) and \(t_{1b}\) have the same target location. So here, every PLRF where \(\lbrace t_{1a}, t_{1b} \rbrace \) decreases in expectation depends on x. However, such PLRFs do not yield a finite runtime bound in the end, as \(t_0\) instantiates x with the non-deterministic value u. Therefore, KoAT fails on the program of Fig. 1 without using CFR.
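Intuitively, the constant PLRF above certifies the bound 2 as follows: \(\lbrace t_{1a}', t_{1b}' \rbrace \) is only enabled at \(\langle \ell _1, \texttt {true}\rangle \), where the PLRF has value 2, and each execution decreases this value in expectation by

$$\begin{aligned} 2 - \left( \nicefrac {1}{2}\cdot 2 + \nicefrac {1}{2}\cdot 0\right) = 1, \end{aligned}$$

since with probability \(\nicefrac {1}{2}\) one stays at \(\langle \ell _1, \texttt {true}\rangle \) (via \(t_{1a}'\)) and with probability \(\nicefrac {1}{2}\) one moves to \(\langle \ell _1,x = 0\rangle \) with value 0 (via \(t_{1b}'\)). Together with non-negativity, the usual ranking argument (see [27] for the precise conditions on PLRFs) yields 2 as a bound on the expected number of executions of \(\lbrace t_{1a}', t_{1b}' \rbrace \).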

For the program of Fig. 2, KoAT infers \({\mathcal{R}\mathcal{B}}(\lbrace t_2' \rbrace ) ={\mathcal{R}\mathcal{B}}(\lbrace t_3' \rbrace ) =y\). By adding all runtime bounds, we obtain the bound \(3+2\cdot y\) on the expected runtime complexity of the program in Fig. 2 and thus by Theorem 4 also of the program in Fig. 1.
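As an informal sanity check (not part of the formal development), one can simulate the PIP of Fig. 1 under a scheduler that fixes some \(u > 0\) and average the number of executed transitions; for \(y = 3\), the empirical average should be close to \(3 + 2\cdot 3 = 9\). A hypothetical Python sketch:

# Informal sanity check: simulate Fig. 1 and average the number of executed transitions.
import random

def run_fig1(y0, u=7):
    x, y, steps = 0, y0, 0
    x = u; steps += 1                 # t0: l0 -> l1, x := u with u > 0
    while x > 0:                      # general transition {t1a, t1b}
        steps += 1
        if random.random() < 0.5:     # t1b: x := 0 (otherwise t1a: noop)
            x = 0
    while y > 0 and x == 0:           # loop of t2 and t3
        steps += 1                    # t2: l1 -> l2
        y -= 1; steps += 1            # t3: l2 -> l1, y := y - 1
    return steps

samples = [run_fig1(y0=3) for _ in range(100_000)]
print(sum(samples) / len(samples))    # expected to be close to 3 + 2*3 = 9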

4 Implementation, Evaluation, and Conclusion

We presented a novel control-flow refinement technique for probabilistic programs and proved that it does not modify the program’s expected runtime complexity. This allows us to combine CFR with approaches for complexity analysis of probabilistic programs. Compared to its variant for non-probabilistic programs, the soundness proof of Theorem 4 for probabilistic programs is considerably more involved.

Up to now, our complexity analyzer KoAT used the tool iRankFinder [13] for CFR of non-probabilistic programs [18]. To demonstrate the benefits of CFR for complexity analysis of probabilistic programs, we now replaced the call to iRankFinder in KoAT by a native implementation of our new CFR algorithm. KoAT is written in OCaml and uses Z3 [12] for SMT solving, Apron [20] to generate invariants, and the Parma Polyhedra Library [8] for computations with polyhedra.

Table 1. Evaluation of CFR on Probabilistic Programs

We used all 75 probabilistic benchmarks from [27, 29] and added 15 new benchmarks, including our leading example and problems adapted from the Termination Problem Data Base [33], e.g., a probabilistic version of McCarthy’s 91 function. Our benchmarks also contain examples where CFR is useful even if it cannot separate probabilistic from non-probabilistic program parts (as it does in our leading example).

Table 1 shows the results of our experiments. We compared the configuration of KoAT with CFR (“KoAT + CFR”) against KoAT without CFR. Moreover, as in [27], we also compared with the other main recent tools for inferring upper bounds on the expected runtimes of probabilistic integer programs (Absynth [29] and eco-imp [7]). As in the Termination Competition [17], we used a timeout of 5 min per example. The first entry in every cell is the number of benchmarks for which the tool inferred the respective bound. In brackets, we give the corresponding number when only regarding our new examples. For example, KoAT + CFR finds a finite expected runtime bound for 84 of the 90 examples. A linear expected bound (i.e., in \(\mathcal {O}(n)\)) is found for 56 of these 84 examples, where 12 of these benchmarks are from our new set. \(\mathrm {AVG(s)}\) is the average runtime in seconds on all benchmarks and \(\mathrm {AVG^+(s)}\) is the average runtime on all successful runs.

The experiments show that, similar to its benefits for non-probabilistic programs [18], CFR also increases the power of automated complexity analysis for probabilistic programs substantially, although the runtime of the analyzer may become longer since CFR increases the size of the program. The experiments also indicate that no comparable CFR technique is available in the other complexity analyzers. Thus, we conjecture that other tools for complexity or termination analysis of PIPs would also benefit from the integration of our CFR technique.

KoAT’s source code, a binary, and a Docker image are available at:

The website also explains how to use our CFR implementation separately (without the rest of KoAT), so that other tools can access it as a black box. Moreover, the website provides a web interface to run KoAT directly online, as well as details on our experiments, including our benchmark collection.