
1 Introduction

Probabilistic Programs. Classic imperative programs extended with random-value generators give rise to probabilistic programs. Probabilistic programs provide the appropriate framework to model applications ranging from randomized algorithms [17, 38], to stochastic network protocols [5, 34], to robot planning [30, 33], etc. Nondeterminism plays a crucial role in modeling, for example, to capture behaviors over which there is no control, or to support abstraction. Nondeterministic probabilistic programs are thus crucial in a wide range of problems, and their formal analysis has been studied across disciplines such as probability theory and statistics [18, 28, 32, 39, 42], formal methods [5, 34], artificial intelligence [30, 31], and programming languages [10, 19, 21, 43].

Basic Termination Questions. Besides safety properties, the most basic property for the analysis of programs is liveness. The most basic and widely used notion of liveness for programs is termination. In the absence of probability (i.e., for nonprobabilistic programs), the synthesis of ranking functions and the proof of termination are equivalent [22], and numerous approaches exist for the synthesis of ranking functions for nonprobabilistic programs [8, 13, 40, 48]. The most basic extension of the termination question to probabilistic programs is the almost-sure termination question, which asks whether a program terminates with probability 1. Another fundamental question is finite termination (a.k.a. positive almost-sure termination [7, 21]), which asks whether the expected termination time is finite. The next interesting question is the concentration-bound computation problem, which asks to compute a bound M such that the probability that the termination time exceeds M decreases exponentially.

Previous Results. We discuss the relevant previous results for termination analysis of probabilistic programs.

  • Probabilistic Programs. First, quantitative invariants were introduced to establish termination of discrete probabilistic programs with demonic nondeterminism [35, 36]. This was extended in [10] to ranking supermartingales, resulting in a sound (but not complete) approach to prove almost-sure termination of probabilistic programs without nondeterminism but with integer- and real-valued random variables from distributions such as uniform, Gaussian, and Poisson. For probabilistic programs with countable state-space and without nondeterminism, Lyapunov ranking functions provide a sound and complete method for proving finite termination [7, 23]. Another sound method is to explore bounded termination with exponential decrease of probabilities [37] through abstract interpretation [15]. For probabilistic programs with nondeterminism, a sound and complete characterization of finite termination through ranking supermartingales is obtained in [21]. Ranking supermartingales thus provide a very powerful approach for termination analysis of probabilistic programs.

  • Ranking Functions/Supermartingales Synthesis. Synthesis of linear ranking functions/ranking supermartingales has been studied extensively in [10, 12, 13, 40]. In the context of probabilistic programs, the algorithmic synthesis of linear ranking supermartingales has been studied for probabilistic programs (cf. [10]) and for probabilistic programs with nondeterminism (cf. our previous result [12]). The major technique adopted in these results is Farkas’ Lemma [20], which serves as a complete reasoning method for linear inequalities. Beyond linear ranking functions, polynomial ranking functions have also been considered. Heuristic synthesis methods for polynomial ranking functions are studied in [4, 9]: Babic et al. [4] checked termination of deterministic polynomial programs by detecting divergence of program variables, and Bradley et al. [9] extended this to nondeterministic programs through an analysis of finite differences over transitions. More general methods for deterministic polynomial programs are given in [14, 47], where Cousot [14] uses Lagrangian relaxation and Shen et al. [47] use Putinar’s Positivstellensatz [41]. A complete method for synthesizing polynomial ranking functions for nondeterministic programs is studied by Yang et al. [50], through root classification/real root isolation of semi-algebraic systems and quantifier elimination.

To summarize, while many different approaches have been studied, the algorithmic synthesis of ranking supermartingales for probabilistic programs has been limited to linear ranking supermartingales (cf. [10, 12]). Hence there is no algorithmic approach that handles nonlinear ranking supermartingales, even for probabilistic programs without nondeterminism.

Our Contributions. Our contributions are as follows:

  1.

    Polynomial Ranking Supermartingales. First, we extend the notion of linear ranking supermartingales (LRSM) to polynomial ranking supermartingales (pRSM). We show (by a straightforward extension of the LRSM argument) that a pRSM implies both almost-sure and finite termination.

  2.

    Positivstellensätze. Second, we conduct a detailed investigation of the application of Positivstellensätze (German, roughly “positive-locus theorems”; results on positive polynomials over semialgebraic sets) (cf. Sect. 5.1) to the synthesis of pRSMs over nondeterministic probabilistic programs. To the best of our knowledge, this is the first result that demonstrates the synthesis of a polynomial subclass of ranking supermartingales through Positivstellensätze.

  3.

    New Approach for Non-probabilistic Programs. Our results also extend existing results for nonprobabilistic programs. We present the first result that uses Schmüdgen’s Positivstellensatz [45] and Handelman’s Theorem [25] to synthesize polynomial ranking-functions for nonprobabilistic programs.

  4.

    Efficient Approach. The previous complete method [50] suffers from high computational complexity due to the use of quantifier elimination. In contrast, our approach (sound but not complete) is efficient since the synthesis can be accomplished through linear or semi-definite programming, which can mostly be solved in polynomial time in the problem size [24]. In particular, our approach does not require quantifier elimination, and works for nondeterministic probabilistic programs.

  5.

    Experimental Results. We demonstrate the effectiveness of our approach on several classical examples. We show that on classical examples, such as Gambler’s Ruin, and Random Walk, our approach can synthesize a pRSM efficiently. For these examples, LRSMs do not exist, and many of them cannot be analysed efficiently by previous approaches.

In summary, while Farkas’ Lemma and Motzkin’s Transposition Theorem are standard techniques for synthesizing linear ranking functions and linear ranking supermartingales, they are not sufficient for synthesizing polynomial ranking supermartingales. To address this problem, we study the use of Positivstellensätze, for the first time, to synthesize polynomial ranking supermartingales for probabilistic programs (for some of the theorems, even for the first time for nonprobabilistic programs), and show how they can be used for efficient termination analysis of programs. Due to space restrictions, some technical details are available only in the full version [11].

2 Probabilistic Programs

2.1 Basic Notations and Concepts

For a set A, we denote by |A| the cardinality of A. We denote by \(\mathbb {N}\), \(\mathbb {N}_0\), \(\mathbb {Z}\), and \(\mathbb {R}\) the sets of all positive integers, non-negative integers, integers, and real numbers, respectively. We use boldface notation for vectors, e.g. \({{\varvec{x}}}\), \({{\varvec{y}}}\), etc., and we denote an i-th component of a vector \({{\varvec{x}}}\) by \({{\varvec{x}}}[i]\).

Polynomial Predicates. Let X be a finite set of variables endowed with a fixed linear order under which we have \(X=\{x_1,\dots ,x_{|X|}\}\). We denote the set of real-coefficient polynomials by \({\mathfrak {R}}{\left[ x_1,\dots , x_{|X|}\right] }\) or \({\mathfrak {R}}{\left[ X\right] }\). A polynomial constraint over X is a logical formula of the form \({g_1}{\bowtie }{g_2}\), where \(g_1,g_2\) are polynomials over X and \(\bowtie \in \{<,\le ,>,\ge \}\). A propositional polynomial predicate over X is a propositional formula all of whose atomic propositions are either true, false or polynomial constraints over X. The validity of the satisfaction assertion \({{\varvec{x}}}\models \phi \) between a vector \({{\varvec{x}}}\in \mathbb {R}^{|X|}\) (interpreted in the way that the value for \(x_j\) \((1\le j\le |X|)\) is \({{\varvec{x}}}[j]\)) and a propositional polynomial predicate \(\phi \) is defined in the standard way w.r.t. polynomial evaluation and the usual semantics for logical connectives. The satisfaction set of a propositional polynomial predicate \(\phi \) is defined as \({\!\!}\llbracket {\phi }\rrbracket {\!\!}:=\{{{\varvec{x}}}\in \mathbb {R}^{|X|}\mid {{\varvec{x}}}\models \phi \}\). For more on polynomials (e.g., polynomial evaluation and arithmetic over polynomials), we refer to the textbook [29, Chapter 3].
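As an illustration of the definitions above, propositional polynomial predicates can be represented programmatically. The following Python sketch (our own illustration, not part of the paper) evaluates the satisfaction assertion \({{\varvec{x}}}\models \phi \) for a predicate built from polynomial constraints; all names here are hypothetical.

```python
# A minimal sketch: polynomial constraints as Python callables over a
# valuation vector x, combined by the usual logical connectives.

def constraint(g1, g2, op):
    """Polynomial constraint g1 ⋈ g2 with ⋈ in {<, <=, >, >=}."""
    ops = {'<':  lambda a, b: a < b,
           '<=': lambda a, b: a <= b,
           '>':  lambda a, b: a > b,
           '>=': lambda a, b: a >= b}
    return lambda x: ops[op](g1(x), g2(x))

def conj(*preds):
    return lambda x: all(p(x) for p in preds)

def disj(*preds):
    return lambda x: any(p(x) for p in preds)

# Example predicate over X = {x1}: (x1 - 1 >= 0) and (10 - x1 >= 0),
# whose satisfaction set [[phi]] is the interval [1, 10].
phi = conj(constraint(lambda x: x[0] - 1.0, lambda x: 0.0, '>='),
           constraint(lambda x: 10.0 - x[0], lambda x: 0.0, '>='))

print(phi([5.0]))   # x = 5 lies in [[phi]]
print(phi([0.5]))   # x = 0.5 does not
```

The satisfaction set \(\llbracket \phi \rrbracket \) is then exactly the set of vectors on which `phi` returns `True`.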

Probability Space. A probability space is a triple \((\varOmega ,\mathcal {F},\mathbb {P})\), where \(\varOmega \) is a non-empty set (so-called sample space), \(\mathcal {F}\) is a \(\sigma \) -algebra over \(\varOmega \) (i.e., a collection of subsets of \(\varOmega \) that contains the empty set \(\emptyset \) and is closed under complementation and countable union), and \(\mathbb {P}\) is a probability measure on \(\mathcal {F}\), i.e., a function \(\mathbb {P}:\mathcal {F}\rightarrow [0,1]\) such that (i) \(\mathbb {P}(\varOmega )=1\) and (ii) for all set-sequences \(A_1,A_2,\dots \in \mathcal {F}\) that are pairwise-disjoint (i.e., \(A_i \cap A_j = \emptyset \) whenever \(i\ne j\)) it holds that \(\sum _{i=1}^{\infty }\mathbb {P}(A_i)=\mathbb {P}\left( \bigcup _{i=1}^{\infty } A_i\right) \).

Random Variables and Filtrations. A random variable X in a probability space \((\varOmega ,\mathcal {F},\mathbb {P})\) is an \(\mathcal {F}\)-measurable function \(X:\varOmega \rightarrow \mathbb {R}\cup \{-\infty ,+\infty \}\), i.e., a function satisfying the condition that for all \(d\in \mathbb {R}\cup \{+\infty , -\infty \}\), the set \(\{\omega \in \varOmega \mid X(\omega )\le d\}\) belongs to \(\mathcal {F}\). The expected value of a random variable X, denoted by \(\mathbb {E}(X)\), is defined as the Lebesgue integral of X with respect to \(\mathbb {P}\), i.e., \(\mathbb {E}(X):=\int X\,\mathrm {d}\mathbb {P}\); the precise definition of the Lebesgue integral is somewhat technical and is omitted here (cf. [6, Chapter 5] for a formal definition). A filtration of a probability space \((\varOmega ,\mathcal {F},\mathbb {P})\) is an infinite sequence \(\{\mathcal {F}_n \}_{n\in \mathbb {N}_0}\) of \(\sigma \)-algebras over \(\varOmega \) such that \(\mathcal {F}_n \subseteq \mathcal {F}_{n+1} \subseteq \mathcal {F}\) for all \(n\in \mathbb {N}_0\).

2.2 Probabilistic Programs

The Syntax. The class of probabilistic programs we consider encompasses basic programming mechanisms such as assignment statement (indicated by ‘:=’), while-loop, if-branch, basic probabilistic mechanisms such as probabilistic branch (indicated by ‘prob’) and random sampling, and demonic nondeterminism indicated by ‘\(\star \)’. Variables (or identifiers) of a probabilistic program are of real type, i.e., values of the variables are real numbers; moreover, variables are classified into program and sampling variables, where program variables receive their values through assignment statements and sampling variables receive theirs through random sampling. We consider that each sampling variable r is bounded, i.e., associated with a one-dimensional cumulative distribution function \(\Upsilon _r\) and a non-empty bounded interval \(\mathrm {supp}_{r}\) such that any random variable z which respects \(\Upsilon _r\) satisfies that z lies in the bounded interval with probability 1. Due to space restrictions, details (e.g., grammar) are relegated to the full version [11]. An example probabilistic program is illustrated in Example 1.

Example 1

Consider the running example depicted in Fig. 1, where r is a sampling variable with the two-point distribution \(\{1\mapsto 0.5,-1\mapsto 0.5\}\), i.e., the probabilities of taking the values 1 and \(-1\) are both 0.5. The probabilistic program models a scenario of Gambler’s Ruin where the gambler has initial money x and repeats gambling until he wins more than 10 or loses all his money. The result of a gamble is nondeterministic: either win 1 with probability 0.5 (the nondeterministic branch), or lose with probability 0.51 (the probabilistic branch). The numbers 1–7 on the left are the program counters of the program, where 1 is the initial program counter and 7 is the terminal program counter.

Fig. 1. Running example: Gambler’s Ruin

Fig. 2. The CFG of the running example

The Semantics. We use control flow graphs to capture the semantics of probabilistic programs, which we define below.

Definition 1

(Control Flow Graph). A control flow graph (CFG) is a tuple \(\mathcal {G}=( L ,\bot ,(X,R),\mapsto )\) with the following components:

  • \( L \) is a finite set of labels partitioned into four pairwise-disjoint subsets \( L _\mathrm {d}\), \( L _\mathrm {p}, L _\mathrm {c}\) and \( L _\mathrm {a}\) of demonic, probabilistic, conditional-branching (branching for short) and assignment labels, resp.; and \(\bot \) is a special label not in L called the terminal label;

  • \(X\) and \(R\) are disjoint finite sets of real-valued program and sampling variables respectively;

  • \(\mapsto \) is a transition relation in which every member (called transition) is a tuple of the form \((\ell ,\alpha ,\ell ')\) for which \(\ell \) (resp. \(\ell '\)) is the source label (resp. target label) in \( L \) and \(\alpha \) is either a real number in (0, 1) if \(\ell \in L _\mathrm {p}\), or \(\star \) if \(\ell \in L _\mathrm {d}\), or a propositional polynomial predicate if \(\ell \in L _\mathrm {c}\), or an update function \(f:\mathbb {R}^{|X|}\times \mathbb {R}^{|R|}\rightarrow \mathbb {R}^{|X|}\) if \(\ell \in L _\mathrm {a}\).

W.l.o.g, we assume that \( L \subseteq \mathbb {N}_0\). Intuitively, labels in \( L _\mathrm {d}\) correspond to demonic statements indicated by ‘\(\star \)’; labels in \( L _\mathrm {p}\) correspond to probabilistic-branching statements indicated by ‘prob’; labels in \( L _\mathrm {c}\) correspond to conditional-branching statements indicated by some propositional polynomial predicate; labels in \( L _\mathrm {a}\) correspond to assignments indicated by ‘\(:=\)’; and the terminal label \(\bot \) denotes the termination of a program. The transition relation \(\mapsto \) specifies the transitions between labels together with the additional information specific to different types of labels. The update functions are interpreted as follows: we first fix two linear orders on \(X\) and \(R\) so that \(X= \{x_1,\dots ,x_{|X|}\}\) and \(R= \{r_1,\dots ,r_{|R|}\}\), interpreting each vector \({{\varvec{x}}}\in \mathbb {R}^{|X|}\) (resp. \({{\varvec{r}}}\in \mathbb {R}^{|R|}\)) as a valuation of program (resp. sampling) variables in the sense that the value of \(x_j\) (resp. \(r_j\)) is \({{\varvec{x}}}[j]\) (resp. \({{\varvec{r}}}[j]\)); then each update function f is interpreted as a function which transforms a valuation \({{\varvec{x}}}\in \mathbb {R}^{|X|}\) before the execution of an assignment statement into \(f({{\varvec{x}}},{{\varvec{r}}})\) after the execution of the assignment statement, where \({{\varvec{r}}}\) is the valuation on \(R\) obtained from a sampling before the execution of the assignment statement.

It is intuitively clear that any probabilistic program can be naturally transformed into a CFG. Informally, each label represents a program location in an execution of a probabilistic program for which the statement of the program location is the next to be executed (see Fig. 2).

In the rest of the section, we fix a probabilistic program P with the set \(X= \{x_1,\dots ,x_{|X|}\}\) of program variables and the set \(R= \{r_1,\dots ,r_{|R|}\}\) of sampling variables, and let \(\mathcal {G}=( L ,\bot ,(X,R),\mapsto )\) be its associated CFG. We also fix \(\ell _0\) to be the label corresponding to the first statement to be executed in P, and \({{\varvec{x}}}_0\) to be the initial valuation of the program variables.

Configurations, Paths, and Runs. A configuration (for P) is a tuple \((\ell ,{{\varvec{x}}})\) where \(\ell \in L \cup \{\bot \}\) and \({{\varvec{x}}}\in \mathbb {R}^{|X|}\). A finite path (of P) is a finite sequence of configurations \((\ell _0,{{\varvec{x}}}_0),\cdots ,(\ell _k,{{\varvec{x}}}_k)\) such that for all \(0 \le i < k\), either (i) \(\ell _{i+1}=\ell _i=\bot \) and \({{\varvec{x}}}_i={{\varvec{x}}}_{i+1}\) (i.e., the program terminates); or (ii) there exist \((\ell _i,\alpha ,\ell _{i+1})\in \mapsto \) and \({{\varvec{r}}}\in \{{{\varvec{r}}}'\mid \forall r\in R.\ {{\varvec{r}}}'(r)\in \mathrm {supp}_{r}\}\) such that one of the following conditions holds: (a) \(\ell _i\in L _\mathrm {p}\cup L _\mathrm {d}\) and \({{\varvec{x}}}_i={{\varvec{x}}}_{i+1}\) (probabilistic or demonic transitions), (b) \(\ell _i\in L _\mathrm {c}\), \({{\varvec{x}}}_i={{\varvec{x}}}_{i+1}\) and \({{\varvec{x}}}_i\models \alpha \) (conditional-branch transitions), (c) \(\ell _i\in L _\mathrm {a}\) and \({{\varvec{x}}}_{i+1}=\alpha ({{\varvec{x}}}_i,{{\varvec{r}}})\) (assignment transitions). A run (of P) is an infinite sequence of configurations all of whose finite prefixes are finite paths of P. A configuration \((\ell ,{{\varvec{x}}})\) is reachable from the initial configuration \((\ell _0,{{\varvec{x}}}_0)\) if there exists a finite path \((\ell _0,{{\varvec{x}}}_0),\cdots ,(\ell _k,{{\varvec{x}}}_k)\) such that \((\ell ,{{\varvec{x}}})=(\ell _k,{{\varvec{x}}}_k)\).

The probabilistic feature of P can be captured by constructing a suitable probability measure over the set of all its runs. However, before this can be done, nondeterminism in P needs to be resolved by some scheduler.

Definition 2

(Scheduler). A scheduler (for P) is a function which assigns to every finite path \((\ell _0,{{\varvec{x}}}_0),\dots ,(\ell _k,{{\varvec{x}}}_k)\) with \(\ell _k\in L _\mathrm {d}\) a transition in \(\mapsto \) with source label \(\ell _k\).

The behaviour of P under a scheduler \(\sigma \) is standard: at each step, P first samples a real number for each sampling variable and then evolves to the next step according to its CFG or the scheduler choice. In this way, the scheduler and random choices/samplings produce a run over P. Moreover, each scheduler \(\sigma \) induces a unique probability measure \(\mathbb {P}^{\sigma }\) over the runs of P. In the sequel, we will use \(\mathbb {E}^{\sigma }(\cdot )\) to denote the expected values of random variables under \(\mathbb {P}^{\sigma }\).
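The behaviour under a scheduler can be made concrete with a small interpreter. The following Python sketch simulates runs of a hypothetical toy CFG (not the paper’s Fig. 2): conditional labels follow the satisfied guard, demonic labels query the scheduler, and assignment labels draw the bounded sampling variables. As a simplification of Definition 2, the scheduler here returns the chosen successor label rather than a transition.

```python
import random

# A hypothetical toy CFG for "while x >= 1: x := x - 1 [] x := x + r"
# with a demonic choice between the two assignments; 'bot' is terminal.
# Sampling variable r takes +1 / -1 with probability 0.5 each (bounded).
CFG = {
    1: ('cond',    [(lambda x: x[0] >= 1, 2), (lambda x: x[0] < 1, 'bot')]),
    2: ('demonic', [3, 4]),
    3: ('assign',  (lambda x, r: [x[0] - 1], 1)),
    4: ('assign',  (lambda x, r: [x[0] + r[0]], 1)),
}

def sample_r():
    return [random.choice([1.0, -1.0])]

def run(x0, scheduler, max_steps=10**6):
    """Simulate one run; return the termination time (number of steps)."""
    label, x, path = 1, list(x0), []
    for n in range(max_steps):
        if label == 'bot':
            return n
        path.append((label, tuple(x)))
        kind, data = CFG[label]
        if kind == 'cond':
            label = next(tgt for guard, tgt in data if guard(x))
        elif kind == 'demonic':
            label = scheduler(path)        # scheduler resolves the choice
        elif kind == 'assign':
            f, tgt = data
            x, label = f(x, sample_r()), tgt
    return max_steps

random.seed(0)
# A scheduler that always picks the first successor (the decrement).
t = run([3.0], scheduler=lambda path: 3)
print(t)
```

Under this particular scheduler each loop iteration costs three steps (guard, demonic choice, assignment), so the run from \(x=3\) terminates deterministically.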

Random Variables and Filtrations over Runs. We define the following (vectors of) random variables on the set of runs of P: \(\{\theta ^P_n\}_{n\in \mathbb {N}_0},~\{\overline{{{\varvec{x}}}}^P_{n}\}_{n\in \mathbb {N}_0}\) and \(\{\overline{{{\varvec{r}}}}^P_{n}\}_{n\in \mathbb {N}_0}\): each \(\theta ^P_n\) is the random variable representing the (integer-valued) label at the n-th step; each \(\overline{{{\varvec{x}}}}^P_{n}\) is the vector of random variables such that each \(\overline{{{\varvec{x}}}}^P_{n}[i]\) is the random variable representing the value of the program variable \(x_i\) at the n-th step; and each \(\overline{{{\varvec{r}}}}^P_{n}[i]\) is the random variable representing the sampled value of the sampling variable \(r_i\) at the n-th step. The filtration \(\{\mathcal {H}^P_n\}_{n\in \mathbb {N}_0}\) is defined such that each \(\sigma \)-algebra \(\mathcal {H}^P_n\) is the smallest \(\sigma \)-algebra that makes all random variables in \(\{\theta ^P_k\}_{0\le k\le n}\) and \(\{\overline{{{\varvec{x}}}}^P_{k}\}_{0\le k\le n}\) measurable. We will omit the superscript P in all the notations above if it is clear from the context.

Remark 1

Under the condition that each sampling variable is bounded, using an inductive argument it follows that each \(\overline{{{\varvec{x}}}}_{n}\) is a vector of bounded random variables. Thus \(\mathbb {E}^\sigma ({|}{\overline{{{\varvec{x}}}}_n[i]}{|})\) exists for each random variable \(\overline{{{\varvec{x}}}}_n[i]\).

Below we define the notion of polynomial invariants which logically captures all reachable configurations. A polynomial invariant may be obtained through abstract interpretation [15].

Definition 3

(Polynomial Invariant). A polynomial invariant (for P) is a function \(I\) assigning a propositional polynomial predicate over \(X\) to every label in \(\mathcal {G}\) such that for all configurations \((\ell ,{{\varvec{x}}})\) reachable from \((\ell _0,{{\varvec{x}}}_0)\) in \(\mathcal {G}\), it holds that \({{\varvec{x}}}\models I(\ell )\).

3 Termination over Probabilistic Programs

In this section, we first define the notions of almost-sure/finite termination and concentration bounds over probabilistic programs, and then describe the computational problems studied in this paper. Below we fix a probabilistic program P with its associated CFG \(\mathcal {G}=( L ,\bot ,(X,R),\mapsto )\) and an initial configuration \((\ell _0,{{\varvec{x}}}_0)\) for P.

Definition 4

(Termination [7, 12, 21]). A run \(\omega =\{(\ell _n,{{\varvec{x}}}_n)\}_{n\in \mathbb {N}_0}\) over P is terminating if \(\ell _n=\bot \) for some \(n\in \mathbb {N}_0\). The termination time of P is a random variable \(T_P\) such that for each run \(\omega =\{(\ell _n,{{\varvec{x}}}_n)\}_{n\in \mathbb {N}_0}\), \(T_P(\omega )\) is the least number n such that \(\ell _n=\bot \) if such n exists, and \(\infty \) otherwise. The program P is said to be almost-sure terminating (resp. finitely terminating) if \(\mathbb {P}^\sigma (T_P<\infty )=1\) (resp. \(\mathbb {E}^\sigma (T_P)<\infty \)) for all schedulers \(\sigma \) (for P).

Note that \(\mathbb {E}^\sigma (T_P)<\infty \) implies that \(\mathbb {P}^\sigma (T_P<\infty )=1\), but the converse does not necessarily hold (see [10, Example 5] for an example). To measure the expected values of the termination time under all (demonic) schedulers, we further define the quantity \(\mathsf {ET}(P):=\sup _{\sigma }\mathbb {E}^{\sigma }(T_P)\).

Definition 5

(Concentration on Termination Time [12, 37]). A concentration bound for P is a non-negative integer M such that there exist real constants \(c_1\ge 0\) and \(c_2>0\) with \(\mathbb {P}(T_P>N)\le c_1\cdot e^{-c_2\cdot N}\) for all \(N \ge M\).

Informally, a concentration bound characterizes exponential decrease of probability values of non-termination beyond the bound. On one hand, it can be used to give an upper bound on probability of non-termination beyond a large step; and on the other hand, it leads to an algorithm that approximates \(\mathsf {ET}(P)\) (cf. [12, Theorem 5]).
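For intuition, consider a loop (our own illustration, not from the paper) that exits with probability 1/2 at each iteration: the termination time is geometric with \(\mathbb {P}(T_P>N)=2^{-N}\), so \(M=0\), \(c_1=1\) and \(c_2=\ln 2\) witness a concentration bound. A Monte Carlo sketch:

```python
import math, random

# Geometric termination: stay in the loop with probability 1/2 per step,
# so P(T > N) = 2**(-N) = 1 * exp(-ln(2) * N).
def termination_time():
    n = 1
    while random.random() < 0.5:   # continue with probability 1/2
        n += 1
    return n

random.seed(42)
runs = [termination_time() for _ in range(200000)]
for N in (2, 5, 8):
    empirical = sum(t > N for t in runs) / len(runs)
    bound = 1.0 * math.exp(-math.log(2.0) * N)   # c1 = 1, c2 = ln 2
    print(N, round(empirical, 4), round(bound, 4))
    assert abs(empirical - bound) < 0.01         # agrees up to sampling noise
```

Here the bound is tight (it holds with equality), so the empirical tail matches \(c_1 e^{-c_2 N}\) up to sampling noise.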

In this paper, we consider the algorithmic analysis of the following problems:

  • Input: a probabilistic program P, a polynomial invariant \(I\) for P and an initial configuration \((\ell _0,{{\varvec{x}}}_0)\) for P;

  • Output (Almost-Sure/Finite Termination): “\(\text{ yes }\)” if the algorithm finds that P is almost-surely/finitely terminating and “\(\text{ fail }\)” otherwise;

  • Output (Concentration on Termination): a concentration bound if the algorithm finds one and “\(\text{ fail }\)” otherwise.

4 Polynomial Ranking-Supermartingale

In this section, we develop the notion of polynomial ranking-supermartingale which is an extension of linear ranking-supermartingale [10, 12]. We fix a probabilistic program P, a polynomial invariant I for P and an initial configuration \((\ell _0,{{\varvec{x}}}_0)\) for P. Let \(\mathcal {G}=( L ,\bot ,(X,R),\mapsto )\) be the associated CFG of P, with \(X= \{x_1,\dots ,x_{|X|}\}\) and \(R= \{r_1,\dots ,r_{|R|}\}\). We first present the general notion of ranking supermartingale, and then define polynomial ranking supermartingale.

Definition 6

(Ranking Supermartingale [12, 21]). A discrete-time stochastic process \(\{X_n\}_{n\in \mathbb {N}_0}\) w.r.t a filtration \(\{\mathcal {F}_n\}_{n\in \mathbb {N}_0}\) is a ranking supermartingale (RSM) if there exist \(K<0\) and \(\epsilon >0\) such that for all \(n\in \mathbb {N}_0\), we have \(\mathbb {E}(|X_n|)<\infty \) and it holds almost surely (with probability 1) that \(X_n\ge K\) and \(\mathbb {E}(X_{n+1}\mid \mathcal {F}_n)\le X_n-\epsilon \cdot \mathbf {1}_{X_n\ge 0}\), where \(\mathbb {E}(X_{n+1}\mid \mathcal {F}_n)\) is the conditional expectation of \(X_{n+1}\) given \(\mathcal {F}_n\) (cf. [49, Chapter 9]).
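Definition 6 can be illustrated on a simple stopped biased walk (our own example, not the paper’s): \(X_{n+1}=X_n-1\) with probability 3/4 and \(X_n+1\) with probability 1/4 while \(X_n\ge 0\), frozen once \(X_n<0\). Then \(X_n\ge K=-1\) and \(\mathbb {E}(X_{n+1}\mid \mathcal {F}_n)=X_n-0.5\cdot \mathbf {1}_{X_n\ge 0}\), so the process is an RSM with \(\epsilon =0.5\). A Monte Carlo check of the conditional-expectation inequality:

```python
import random

# Biased walk, frozen below 0: for x >= 0 the one-step conditional
# expectation is x - 0.5, so the RSM inequality holds with eps = 0.5.
def step(x):
    if x < 0:
        return x                      # frozen after "termination"
    return x - 1 if random.random() < 0.75 else x + 1

random.seed(1)
for x in [0, 3, 7, -1]:
    est = sum(step(x) for _ in range(100000)) / 100000.0
    target = x - (0.5 if x >= 0 else 0.0)
    print(x, round(est, 3), target)
    assert abs(est - target) < 0.02
```

The estimate of \(\mathbb {E}(X_{n+1}\mid X_n=x)\) matches \(x-0.5\cdot \mathbf {1}_{x\ge 0}\) up to sampling noise, confirming the RSM conditions for this toy process.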

Informally, a polynomial ranking-supermartingale over P is a polynomial instantiation of an RSM through a function \(\eta :( L \cup \{\bot \})\times \mathbb {R}^{|X|}\rightarrow \mathbb {R}\) such that each \(\eta (\ell ,\cdot )\) (for all \(\ell \in L \cup \{\bot \}\)) is essentially a polynomial function over \(X\). Given such a function \(\eta \), the intuition is to have conditions that make the stochastic process \(X_n=\eta (\theta _n,\overline{{{\varvec{x}}}}_n)\) an RSM. To ensure this, we consider the conditional expectation \(\mathbb {E}^\sigma \left( X_{n+1}\mid \mathcal {H}_n\right) \); this is captured by an extension of pre-expectation [10, 12] from the linear to the polynomial case. Below we define \( L _{\bot }:= L \cup \{\bot \}\). For a function \(g:\mathbb {R}^{|X|}\times \mathbb {R}^{|R|}\rightarrow \mathbb {R}\), we let \(\mathbb {E}_R(g,\cdot ):\mathbb {R}^{|X|}\rightarrow \mathbb {R}\) be the function such that each \(\mathbb {E}_R(g,{{\varvec{x}}})\) is the expected value \(\mathbb {E}(g({{\varvec{x}}},\hat{{{\varvec{r}}}}))\), where \(\hat{{{\varvec{r}}}}\) is any vector of independent random variables such that each \(\hat{{{\varvec{r}}}}[i]\) is a random variable that respects the cumulative distribution function \(\Upsilon _{r_i}\).

Definition 7

(Pre-Expectation). Let \(\eta : L _\bot \times \mathbb {R}^{|X|}\rightarrow \mathbb {R}\) be a function such that each \(\eta (\ell ,\cdot )\) (for all \(\ell \in L _\bot \)) is a polynomial function over \(X\). The function \(\mathrm {pre}_\eta : L _\bot \times \mathbb {R}^{|X|}\rightarrow \mathbb {R}\) is defined by:

  • \(\mathrm {pre}_\eta (\ell ,{{\varvec{x}}}):=\sum _{(\ell ,z,\ell ')\in \mapsto } z\cdot \eta \left( \ell ',{{\varvec{x}}}\right) \) if \(\ell \in L _\mathrm {p}\) (probabilistic transitions);

  • \(\mathrm {pre}_\eta (\ell ,{{\varvec{x}}}):=\max _{(\ell ,\star ,\ell ')\in \mapsto }\eta (\ell ',{{\varvec{x}}})\) if \(\ell \in L _\mathrm {d}\) (nondeterministic transitions);

  • \(\mathrm {pre}_\eta (\ell ,{{\varvec{x}}}):=\eta (\ell ',{{\varvec{x}}})\) if \(\ell \in L _\mathrm {c}\) and \((\ell ,\phi ,\ell ')\) is the only transition in \(\mapsto \) such that \({{\varvec{x}}}\models \phi \) (conditional transitions);

  • \(\mathrm {pre}_\eta (\ell ,{{\varvec{x}}}):=\mathbb {E}_{R}\left( g,{{\varvec{x}}}\right) \) if \(\ell \in L _{\mathrm {a}}\), where g is the function such that \(g({{\varvec{x}}},{{\varvec{r}}})=\eta \left( \ell ',f({{\varvec{x}}},{{\varvec{r}}})\right) \) and \((\ell ,f,\ell ')\) is the only transition in \(\mapsto \) (assignment transitions); and

  • \(\mathrm {pre}_\eta (\ell ,{{\varvec{x}}}):=\eta (\ell ,{{\varvec{x}}})\) if \(\ell =\bot \) (terminal location).

The following lemma establishes the relationship between pre-expectation and conditional expectation.

Lemma 1

Let \(\eta : L _\bot \times \mathbb {R}^{|X|}\rightarrow \mathbb {R}\) be a function such that each \(\eta (\ell ,\cdot )\) (for all \(\ell \in L _\bot \)) is a polynomial function over \(X\), and \(\sigma \) be any scheduler. Let the stochastic process \(\{X_n\}_{n\in \mathbb {N}_0}\) be defined by: \(X_{n}:=\eta (\theta _{n},\overline{{{\varvec{x}}}}_{n})\). Then for all \(n\in \mathbb {N}_0\), we have \(\mathbb {E}^{\sigma }(X_{n+1}\mid \mathcal {H}_n)\le \mathrm {pre}_\eta (\theta _{n},\overline{{{\varvec{x}}}}_{n})\).

Example 2

Consider the running example in Example 1 with CFG in Fig. 2. Let \(\eta \) be the function specified in the second and fifth column of Table 1, where \(g(x):=(x-1)(10-x)\). Then \(\mathrm {pre}_\eta \) is given in the third and sixth column of Table 1. Note that the case for \(i=2\) is obtained from \(\mathrm {pre}_\eta (2, x)=\max \{g(x)+9.6,g(x)+9.6\}\), and the case for \(i=3\) is from \(\mathrm {pre}_\eta (3, x)=\mathbb {E}_R(h, x)\), where h is the function \(h(y,r)= g(y)-(2y-11)r-r^2+10\).

Table 1. \(\eta \) and \(\mathrm {pre}_\eta \) for Example 1 and Fig. 2
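The pre-expectation at label 3 in Example 2 can be verified numerically: with the two-point distribution for r we have \(\mathbb {E}(r)=0\) and \(\mathbb {E}(r^2)=1\), so expanding h gives \(\mathbb {E}_R(h,x)=g(x)-(2x-11)\cdot 0-1+10=g(x)+9\). A small Python check (using only the functions stated in Example 2):

```python
# Checking Example 2's pre-expectation at label 3 for the two-point
# distribution r in {1, -1} with probability 0.5 each.
def g(x):
    return (x - 1.0) * (10.0 - x)

def h(y, r):
    return g(y) - (2.0 * y - 11.0) * r - r * r + 10.0

def pre_at_3(x):
    return 0.5 * h(x, 1.0) + 0.5 * h(x, -1.0)   # expectation over r

for x in [1.0, 4.5, 10.0]:
    print(x, pre_at_3(x), g(x) + 9.0)
    assert abs(pre_at_3(x) - (g(x) + 9.0)) < 1e-9
```

The averaged value agrees with the closed form \(g(x)+9\) at every test point, since the terms linear in r cancel and \(r^2=1\) on both branches.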

We now define the notion of polynomial ranking-supermartingale. The intuition is that we encode the RSM-difference condition as a logical formula, treat zero as the threshold between terminal and non-terminal labels, and use the invariant I to over-approximate the set of reachable configurations at each label. Below for each \(\ell \in L _\mathrm {c}\), we define \(\mathsf {PP}(\ell )\) to be the propositional polynomial predicate \(\bigvee _{(\ell ,\phi ,\ell ')\in \mapsto , \ell '\ne \bot }\phi \); and for \(\ell \in L \backslash L _\mathrm {c}\), we let \(\mathsf {PP}(\ell ):=\text{ true }\).

Definition 8

(Polynomial Ranking-Supermartingale). A d-degree polynomial ranking-supermartingale map (in short, d-pRSM) w.r.t. (P, I) is a function \(\eta : L _\bot \times \mathbb {R}^{|X|}\rightarrow \mathbb {R}\) satisfying that there exist \(\epsilon >0\) and \(K\le -\epsilon \) such that for all \(\ell \in L _\bot \) and all \({{\varvec{x}}}\in \mathbb {R}^{|X|}\), the conditions (C1-C4) hold:

  • C1: the function \(\eta (\ell ,\cdot ):\mathbb {R}^{|X|}\rightarrow \mathbb {R}\) is a polynomial over \(X\) of degree at most d;

  • C2: if \(\ell \ne \bot \) and \({{\varvec{x}}}\models I(\ell )\), then \(\eta (\ell ,{{\varvec{x}}})\ge 0\);

  • C3: if \(\ell =\bot \), then \(\eta (\ell , {{\varvec{x}}})=K\);

  • C4: if \(\ell \ne \bot \) and \({{\varvec{x}}}\models I(\ell )\wedge \mathsf {PP}(\ell )\), then \(\mathrm {pre}_\eta (\ell ,{{\varvec{x}}})\le \eta (\ell ,{{\varvec{x}}})-\epsilon \).
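Conditions C2-C4 can be checked mechanically for a candidate \(\eta \). The following Python sketch grid-checks them for a hypothetical program (not the paper’s running example): while \(1\le x\le 10\) do \(x:=x+r\) od, with r taking values \(\pm 1\) with probability 0.5 each; the candidate \(\eta \), \(\epsilon \) and K below are our own choices.

```python
# Hypothetical 2-pRSM check: labels l1 (guard), l2 (assignment), bot.
# Candidate: eta(l1, x) = g(x) + 2, eta(l2, x) = g(x) + 1.5, eta(bot) = K,
# with g(x) = x * (11 - x), eps = 0.5, K = -0.5 (so K <= -eps holds).
EPS, K = 0.5, -0.5

def g(x):
    return x * (11.0 - x)

def eta(label, x):
    return {'l1': g(x) + 2.0, 'l2': g(x) + 1.5, 'bot': K}[label]

def pre(label, x):
    if label == 'l1':                    # conditional: guard holds on [1, 10]
        return eta('l2', x)
    if label == 'l2':                    # assignment: expectation over r
        return 0.5 * eta('l1', x + 1.0) + 0.5 * eta('l1', x - 1.0)

grid = [i / 10.0 for i in range(0, 111)]     # x in [0, 11], step 0.1
for x in grid:
    assert eta('l1', x) >= 0.0               # C2 at l1 (invariant [0, 11])
    if 1.0 <= x <= 10.0:                     # invariant at l2, and PP(l1)
        assert eta('l2', x) >= 0.0                          # C2 at l2
        assert pre('l1', x) <= eta('l1', x) - EPS + 1e-9    # C4 at l1
        assert pre('l2', x) <= eta('l2', x) - EPS + 1e-9    # C4 at l2
print("C2-C4 hold on the grid")
```

The key step is the assignment label: \(\mathbb {E}(g(x+r))=g(x)-1\) since the linear terms in r cancel, so the pre-expectation drops by at least \(\epsilon =0.5\) per transition.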

Note that C2 and C3 together separate non-termination and termination by the threshold 0, and C4 is the RSM difference condition which is intuitively related to the \(\epsilon \) difference in the RSM definition (cf. Definition 6). By generalizing our previous proofs in [12] (from LRSM to pRSM), we establish the soundness of pRSMs w.r.t both almost-sure and finite termination.

Theorem 1

If there exists a d-pRSM \(\eta \) w.r.t. (P, I) with constants \(\epsilon ,K\) (cf. Definition 8), then P is a.s. terminating and \(\mathsf {ET}(P)\le \mathsf {UB}(P):=\frac{\eta (\ell _0,{{\varvec{x}}}_0)-K}{\epsilon }\).
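The bound of Theorem 1 is directly computable. Taking \(\epsilon =0.2\) and \(K=-0.2\) as in Example 3 below, and the hypothesized value \(\eta (\ell _0,x)=(x-1)(10-x)+10\) (which back-solves from the bound \(5(x_0-1)(10-x_0)+51\) stated there; Table 1 itself is not reproduced in this excerpt), a quick consistency check:

```python
# UB(P) = (eta(l0, x0) - K) / eps, instantiated with Example 3's constants
# and the hypothesized eta(l0, x) = (x - 1)*(10 - x) + 10.
EPS, K = 0.2, -0.2

def eta0(x):
    return (x - 1.0) * (10.0 - x) + 10.0

def ub(x0):
    return (eta0(x0) - K) / EPS

for x0 in [1.0, 5.0, 10.0]:
    stated = 5.0 * (x0 - 1.0) * (10.0 - x0) + 51.0
    print(x0, ub(x0), stated)
    assert abs(ub(x0) - stated) < 1e-9
```

The computed \(\mathsf {UB}(P)\) agrees with the closed-form bound of Example 3 at every test point.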

Example 3

Consider the running example (cf. Example 1) and the function \(\eta \) given in Example 2. Assuming that the initial valuation satisfies \(1\le x\wedge x\le 10\), we assign the trivial invariant I such that \(I(1)=0\le x\wedge x\le 11\), \(I(j)=1\le x\wedge x\le 10\) for \(2\le j\le 6\) and \(I(7)=x<1\vee x>10\). It is straightforward to verify that \(\eta \) is a 2-pRSM with \(\epsilon =0.2\) and \(K=-0.2\) (cf. Definition 8 for \(\epsilon , K\)). Hence by Theorem 1, the program in Example 1 terminates almost-surely under any scheduler and its expected termination time is at most \(5\cdot (x_0-1)\cdot (10-x_0)+51\), given the initial value \(x_0\).

Remark 2

The running example (cf. Example 1) does not admit a linear (i.e. 1-) pRSM since \(\mathbb {E}_R(r)=0\) at label 3. This shows that linear pRSMs may fail to exist even for simple affine programs like Example 1, which motivates the study of pRSMs even in the affine setting.

Remark 3

The non-strict inequality symbol ‘\(\ge \)’ in C2 can be replaced by its strict counterpart ‘>’, since \(\eta +c\) (for \(c>0\)) remains a pRSM whenever \(\eta \) is a pRSM and K (in C3) is sufficiently small. (By definition, \(\mathrm {pre}_{\eta +c}=\mathrm {pre}_\eta +c\).) Similarly, the non-strict inequality symbol ‘\(\le \)’ in C4 can be replaced by ‘<’, since a pRSM \(\eta \) and the constant K (for C3) can be scaled by a constant factor (e.g. 1.1) so that the inequalities become strict. Moreover, one may assume that \(K=-1\) and \(\epsilon =1\) in Definition 8: one first scales a pRSM with constants \(\epsilon , K\) by a positive scalar to ensure \(\epsilon =1\), and then safely sets \(K=-1\) due to C2.

Theorem 1 answers the questions of almost-sure and finite termination in a unified fashion. Generalizing our approach in [12], we show that by restricting a pRSM to have bounded difference, we also obtain concentration results.

Definition 9

(Difference-Bounded pRSM). A d-pRSM \(\eta \) is difference-bounded w.r.t a non-empty interval \([a,b]\subseteq \mathbb {R}\) if the following conditions hold:

  • for all \(\ell \in L _\mathrm {d}\cup L _\mathrm {p}\) and \((\ell ,\alpha ,\ell ')\in \mapsto \), and for all \({{\varvec{x}}}\in {\!\!}\llbracket {I(\ell )}\rrbracket {\!\!}\), it holds that \(a\le \eta (\ell ',{{\varvec{x}}})-\eta (\ell ,{{\varvec{x}}})\le b\);

  • for all \(\ell \in L _\mathrm {c}\) and \((\ell ,\phi ,\ell ')\in \mapsto \), and for all \({{\varvec{x}}}\in {\!\!}\llbracket {I(\ell )\wedge \phi }\rrbracket {\!\!}\), it holds that \(a\le \eta (\ell ',{{\varvec{x}}})-\eta (\ell ,{{\varvec{x}}})\le b\);

  • for all \(\ell \in L _\mathrm {a}\) and \((\ell ,f,\ell ')\in \mapsto \), for all \({{\varvec{x}}}\in {\!\!}\llbracket {I(\ell )}\rrbracket {\!\!}\) and for all \({{\varvec{r}}}\in \{{{\varvec{r}}}'\mid \forall r\in R.\ {{\varvec{r}}}'[r]\in \mathrm {Supp}_r\}\), it holds that \(a\le \eta (\ell ',f({{\varvec{x}}},{{\varvec{r}}}))-\eta (\ell ,{{\varvec{x}}})\le b\).

Note that if a d-pRSM \(\eta \) with constants \(\epsilon ,K\) (cf. Definition 8) is difference-bounded w.r.t \([a,b]\), then from definition \(a\le -\epsilon \); one can further assume that \(-\epsilon \le b\) since otherwise one can reset \(\epsilon :=-b\). By definition, the stochastic process \(X_n:=\eta (\theta _n, \overline{{{\varvec{x}}}}_n)\) defined through a difference-bounded pRSM w.r.t \([a,b]\) satisfies that \(a\le X_{n+1}-X_n\le b\); then using Hoeffding’s Inequality [12, 26], we establish a concentration bound.

Theorem 2

Let \(\eta \) be a difference-bounded d-pRSM w.r.t \([a,b]\) with constants \(\epsilon \) and K. For all \(n\in \mathbb {N}\), if \(\epsilon (n-1)>\eta (\ell _0,{{\varvec{x}}}_0)\), then \(\mathbb {P}(T_P > n)\le e^{-\frac{2(\epsilon (n-1)-\eta (\ell _0,{{\varvec{x}}}_0))^2}{(n-1)(b-a)^2}}\).

From Theorem 2, a difference-bounded d-pRSM \(\eta \) yields the concentration bound \(\frac{\eta (\ell _0,{{\varvec{x}}}_0)}{\epsilon }+2\): for every n above this value, the side condition \(\epsilon (n-1)>\eta (\ell _0,{{\varvec{x}}}_0)\) holds, so \(\mathbb {P}(T_P>n)\) decreases exponentially in n.

Example 4

Consider again our running example in Example 1 with the invariant given in Example 3. Let \(\eta \) be the function illustrated in Table 1. One can verify that the interval \([-10.2, 8.6]\) satisfies the conditions specified in Definition 9 for \(\eta \), as the following hold:

  • for all \(x\in [1,10]\), \(\eta (2,x)-\eta (1,x)=-0.2\);

  • for all \(x\in [0,1)\cup (10,11]\), \(-10.2\le \eta (7,x)-\eta (1,x)\le -0.2\);

  • for all \(x\in [1,10]\) and \(i\in \{3,4\}\), \(\eta (i,x)-\eta (2,x)=-0.2\);

  • for all \(x\in [1,10]\) and \(i\in \{5,6\}\), \(-9.4\le \eta (i,x)-\eta (4,x)\le 8.6\);

  • for all \(x\in [1,10]\), \(\eta (1,x-1)-\eta (5,x)=-0.2\);

  • for all \(x\in [1,10]\), \(\eta (1,x+1)-\eta (6,x)=-0.2\);

  • for all \(x\in [1,10]\) and \(r\in \{-1,1\}\), \(-9.6\le \eta (1,x+r)-\eta (3,x)\le 8.4\).

Then by Theorem 2, assuming that the program has initial value \(x_0=5\), one can deduce that \(\mathbb {P}\left( T_P>50000\right) \le e^{-\frac{2\cdot (0.2\cdot 49999-30)^2}{49999\cdot 18.8^2}}\approx 1.3016\cdot 10^{-5}\).
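The closed-form bound of Theorem 2 is easy to evaluate mechanically. The following sketch recomputes the number reported above, with \(\epsilon =0.2\), \(\eta (\ell _0,x_0)=30\), \([a,b]=[-10.2,8.6]\) (so \(b-a=18.8\)) and \(n=50000\):

```python
# Numerical check of the concentration bound in Example 4 (Theorem 2).
from math import exp

def tail_bound(n, eps, eta0, a, b):
    """Hoeffding-style bound on P(T_P > n) from Theorem 2."""
    assert eps * (n - 1) > eta0          # side condition of Theorem 2
    return exp(-2 * (eps * (n - 1) - eta0) ** 2 / ((n - 1) * (b - a) ** 2))

p = tail_bound(50000, 0.2, 30.0, -10.2, 8.6)
assert abs(p - 1.3016e-5) < 1e-8         # matches the value in Example 4
```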

We end this section with a result stating that the existence of a (difference-bounded) d-pRSM can be decided (using quantifier elimination).

Theorem 3

For any fixed natural number \(d\in \mathbb {N}\), the problem whether a (difference-bounded) d-pRSM w.r.t an input pair (PI) exists is decidable.

5 The Synthesis Algorithm

In this section, we present an efficient algorithmic approach for solving the almost-sure/finite termination and concentration questions through the synthesis of pRSMs. Instead of computationally expensive quantifier elimination (cf. Theorem 3), we use Positivstellensatz’s, which yield a sound but not complete approach. Note that by Theorem 1, the existence of a pRSM implies both almost-sure and finite termination of a probabilistic program.

The General Framework. To synthesize a pRSM, the algorithm first sets up a polynomial template with unknown coefficients. Next, the algorithm finds values for the unknown coefficients, for \(\epsilon ,K\) (cf. Definition 8) and for \([a,b]\) (cf. Definition 9), so that C2–C4 in Definition 8 and the concentration conditions in Definition 9 are satisfied. Note that from Definition 7, each \(\mathrm {pre}_\eta (\ell ,\cdot )\) is a (piecewise) polynomial over \(X\) whose coefficients are linear combinations of the unknown coefficients from the polynomial template. Instead of using quantifier elimination (cf. e.g. [50] or Theorem 3), we use Positivstellensatz’s [44]. We observe that each universally-quantified formula described in C2, C4 and Definition 9 can be decomposed (through the disjunctive normal form of propositional polynomial predicates, or the transformation of \(\max \) in Definition 7 into two conjunctive clauses) into a conjunction of formulae of the following pattern (\(\dag \))

$$\begin{aligned} \forall {{\varvec{x}}}\in \mathbb {R}^{|X|}. \left[ \left( g_1({{\varvec{x}}})\ge 0\wedge \dots \wedge g_m({{\varvec{x}}})\ge 0\right) \rightarrow g({{\varvec{x}}})> 0\right] \qquad (\dag ) \end{aligned}$$

where each \(g_i\) is a polynomial with constant coefficients and g is one with unknown coefficients from the polynomial template. In the pattern, we over-approximate any possible ‘\(g_j({{\varvec{x}}})>0\)’ by ‘\(g_j({{\varvec{x}}})\ge 0\)’. By Remark 3, the difference between ‘\(g({{\varvec{x}}})> 0\)’ and ‘\(g({{\varvec{x}}})\ge 0\)’ does not matter.

Example 5

Consider again the program in Example 1 with its CFG, and the invariant specified in Example 3. The instances of the pattern for termination of this program are listed as follows, where each instance is represented by a pair \((\varGamma ,g)\) in which \(\varGamma \) and g correspond to \(\{g_1,\dots ,g_m\}\) and g, respectively, as described in (\(\dag \)).

  • (C4, label 1) \((\{x-1,10-x,x,11-x\}, \eta (1,x)-\eta (2,x)-\epsilon )\);

  • (C4, label 2) \((\{x-1,10-x\}, \eta (2,x)-\eta (3,x)-\epsilon )\) and \((\{x-1,10-x\}, \eta (2,x)-\eta (4,x)-\epsilon )\);

  • (C4, label 3) \((\{x-1,10-x\}, \eta (3,x)-\mathbb {E}_R((y,r)\mapsto \eta (1,y+r), x)-\epsilon )\);

  • (C4, label 4) \((\{x-1,10-x\}, \eta (4,x)-0.51\eta (5,x)-0.49\eta (6,x)-\epsilon )\);

  • (C4, label 5) \((\{x-1,10-x\}, \eta (5,x)-\eta (1, x-1)-\epsilon )\);

  • (C4, label 6) \((\{x-1,10-x\}, \eta (6,x)-\eta (1, x+1)-\epsilon )\);

  • (C2) \((\{x,11-x\}, \eta (1,x))\) and \((\{x-1,10-x\}, \eta (j,x))\) for \(2\le j\le 6\).

In the next part, we show how such patterns can be solved via Positivstellensatz’s.

5.1 Positivstellensatz’s

We fix a linearly-ordered finite set X of variables and a finite set \(\varGamma =\{g_1,\dots ,g_m\}\subseteq {\mathfrak {R}}{\left[ X\right] }\) of polynomials. Let \({\!\!}\llbracket {\varGamma }\rrbracket {\!\!}\) be the set of all vectors \({{\varvec{x}}}\in \mathbb {R}^{|X|}\) satisfying the propositional polynomial predicate \(\bigwedge _{i=1}^m g_i\ge 0\). We first define pre-orderings and sums of squares as follows.

Definition 10

(Sums of Squares). Define \(\varTheta \) to be the set of sums-of-squares, i.e.,

$$\begin{aligned} \varTheta :=\left\{ \sum _{i=1}^k h^2_i \mid k\in \mathbb {N} \text{ and } h_{1},\dots ,h_k\in {\mathfrak {R}}{\left[ X\right] }\right\} ~. \end{aligned}$$

Definition 11

(Preordering). The preordering generated by \(\varGamma \) is defined by:

$$\begin{aligned} \text{ PO }(\varGamma ):=\left\{ \sum _{w\in \{0,1\}^m} h_w\cdot \prod _{i=1}^{m} g_i^{w_i}\mid \forall w.\ h_w\in \varTheta \right\} ~. \end{aligned}$$

Remark 4

It is well-known that a real-coefficient polynomial g of degree 2d is a sum of squares iff there exists a k-dimensional positive semi-definite real square matrix Q such that \(g={{\varvec{y}}}^\mathrm {T} Q{{\varvec{y}}}\), where k is the number of monomials of degree no greater than d and \({{\varvec{y}}}\) is the column vector of all such monomials (cf. [27, Corollary 7.2.9]). This implies that the problem whether a given polynomial (with real coefficients) is a sum of squares can be solved by semi-definite programming [24].
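A minimal illustration of the Gram-matrix characterization in Remark 4, where the matrix Q is hand-picked rather than found by semi-definite programming: for \(g(x)=x^4+2x^3+3x^2+2x+1=(x^2+x+1)^2\) and \({{\varvec{y}}}=(1,x,x^2)\), the rank-one matrix \(Q=vv^\mathrm {T}\) with \(v=(1,1,1)\) satisfies \(g={{\varvec{y}}}^\mathrm {T} Q{{\varvec{y}}}\) and is positive semi-definite.

```python
import numpy as np

# g(x) = x^4 + 2x^3 + 3x^2 + 2x + 1 = (x^2 + x + 1)^2 is a sum of squares.
# With y = (1, x, x^2), the Gram matrix Q = v v^T for v = (1, 1, 1)
# satisfies g = y^T Q y.
Q = np.ones((3, 3))

# Q is PSD iff all eigenvalues are >= 0 (up to numerical noise).
assert np.linalg.eigvalsh(Q).min() >= -1e-9

# Recover the coefficients of y^T Q y: the coefficient of x^k is the
# sum of Q[i, j] over all i + j == k (since y_i = x^i).
coeffs = [sum(Q[i, j] for i in range(3) for j in range(3) if i + j == k)
          for k in range(5)]
assert coeffs == [1.0, 2.0, 3.0, 2.0, 1.0]   # 1 + 2x + 3x^2 + 2x^3 + x^4
```

In the synthesis algorithm the entries of Q are unknowns of a semi-definite program; here we merely verify a given certificate.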

Now we present the first Positivstellensatz, called Schmüdgen’s Positivstellensatz.

Theorem 4

(Schmüdgen’s Positivstellensatz [45]). Let \(g\in {\mathfrak {R}}{\left[ X\right] }\). If the set \({\!\!}\llbracket {\varGamma }\rrbracket {\!\!}\) is compact and \(g({{\varvec{x}}})>0\) for all \({{\varvec{x}}}\in {\!\!}\llbracket {\varGamma }\rrbracket {\!\!}\), then \(g\in \text{ PO }(\varGamma )\).

From Schmüdgen’s Positivstellensatz, any polynomial g which is positive on \({\!\!}\llbracket {\varGamma }\rrbracket {\!\!}\) can be represented by

$$\begin{aligned} (\ddag ): g=\sum _{w\in \{0,1\}^m}h_w\cdot g_w~, \end{aligned}$$

where \(g_w:=\prod _{i=1}^m g_i^{w_i}\) and \(h_w\in \varTheta \) for each \(w\in \{0,1\}^m\). To apply Schmüdgen’s Positivstellensatz, the degrees of those \(h_w\)’s are restricted to be no greater than a fixed natural number. Then from Remark 4 and by equating the coefficients of the same monomials between the two polynomials, Eq. (\(\ddag \)) results in a system of linear equalities that involves coefficients of g and variables (grouped as \(2^m\) square matrices) under semi-definite constraints.

Example 6

Consider that \(X=\{x\}\) and \(\varGamma =\{1-x,1+x\}\). Choose the maximal degree for sums of squares to be 2. Then from Remark 4, the form of Eq. (\(\ddag \)) can be written as:

$$\begin{aligned} g=\sum _{i=1}^4 \left[ \begin{pmatrix} 1&x \end{pmatrix}\cdot \begin{pmatrix} a_{i,1,1} &{} a_{i,1,2} \\ a_{i,2,1} &{} a_{i,2,2} \end{pmatrix}\cdot \begin{pmatrix} 1 \\ x\end{pmatrix}\right] \cdot u_i \end{aligned}$$

where \(u_1=1\), \(u_2=1-x\), \(u_3=1+x\), \(u_4=1-x^2\) and each matrix \((a_{i,j,k})_{2\times 2}\) (\(1\le i\le 4\)) is a matrix of variables constrained to be positive semi-definite.

Theorem 4 can be further refined by a weaker version of Putinar’s Positivstellensatz.

Theorem 5

(Putinar’s Positivstellensatz [41]). Let \(g\in {\mathfrak {R}}{\left[ X\right] }\). If (i) there exists some \(g_i\in \varGamma \) such that the set \(\{{{\varvec{x}}}\in \mathbb {R}^{|X|}\mid g_i({{\varvec{x}}})\ge 0\}\) is compact and (ii) \(g({{\varvec{x}}})>0\) for all \({{\varvec{x}}}\in {\!\!}\llbracket {\varGamma }\rrbracket {\!\!}\), then

$$\begin{aligned} (\S )\qquad g=h_0+\sum _{i=1}^m h_i\cdot g_i \end{aligned}$$

for some sums of squares \(h_0,\dots ,h_m\in \varTheta \).

Similarly to Eq. (\(\ddag \)), Eq. (\(\S \)) results in a system of linear equalities that involves the variables for the synthesis of a pRSM and matrices of variables under semi-definite constraints, provided that an upper bound on the degrees of the sums of squares is enforced.

Example 7

Consider that \(X=\{x\}\) and \(\varGamma =\{1-x^2, 0.5-x\}\). Choose the maximal degree for sums of squares to be 2. Then the form of Eq. (\(\S \)) can be written as:

$$\begin{aligned} g=\sum _{i=1}^3 \left[ \begin{pmatrix} 1&x \end{pmatrix}\cdot \begin{pmatrix} a_{i,1,1} &{} a_{i,1,2} \\ a_{i,2,1} &{} a_{i,2,2} \end{pmatrix}\cdot \begin{pmatrix} 1 \\ x\end{pmatrix}\right] \cdot u_i \end{aligned}$$

where \(u_1=1\), \(u_2=1-x^2\), \(u_3=0.5-x\) and each matrix \((a_{i,j,k})_{2\times 2}\) (\(1\le i\le 3\)) is a matrix of variables constrained to be positive semi-definite.
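To make the form (\(\S \)) concrete, the following sketch verifies a hand-picked Putinar certificate for the hypothetical polynomial \(g=1.5-x\), which is strictly positive on \({\!\!}\llbracket {\varGamma }\rrbracket {\!\!}=[-1,0.5]\) for the \(\varGamma \) of Example 7. In the algorithm, the sums of squares \(h_i\) would instead be found by semi-definite programming.

```python
# Polynomials as coefficient lists [c0, c1, ...] meaning c0 + c1*x + ...
def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

g1 = [1, 0, -1]    # 1 - x^2   (Gamma of Example 7)
g2 = [0.5, -1]     # 0.5 - x
g  = [1.5, -1]     # g = 1.5 - x,  strictly positive on [[Gamma]] = [-1, 0.5]

# Hand-picked certificate in the form (S): h0 = 1^2, h1 = 0, h2 = 1^2.
h0, h1, h2 = [1], [0], [1]
rhs = add(add(h0, mul(h1, g1)), mul(h2, g2))
assert trim(rhs) == g              # g = h0 + h1*(1 - x^2) + h2*(0.5 - x)
```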

In the following, we introduce a Positivstellensatz, namely Handelman’s Theorem, for the case where \(\varGamma \) consists of only linear (degree-1) polynomials. For Handelman’s Theorem, we further assume that \({\!\!}\llbracket {\varGamma }\rrbracket {\!\!}\) is non-empty. (Note that whether a system of linear inequalities has a solution is decidable in PTIME [46].)

Definition 12

(Monoid). The monoid of \(\varGamma \) is defined by:

$$\begin{aligned} \text{ Monoid }(\varGamma ):=\left\{ \prod _{i=1}^k h_i \mid k\in \mathbb {N}_0 \text{ and } h_1,\dots ,h_k\in \varGamma \right\} ~~. \end{aligned}$$

Theorem 6

(Handelman’s Theorem [25]). Let \(g\in {\mathfrak {R}}{\left[ X\right] }\) be a polynomial such that \(g({{\varvec{x}}})>0\) for all \({{\varvec{x}}}\in {\!\!}\llbracket {\varGamma }\rrbracket {\!\!}\). If \({\!\!}\llbracket {\varGamma }\rrbracket {\!\!}\) is compact, then

$$\begin{aligned} (\#)\qquad g=\sum _{i=1}^d a_i\cdot u_i \end{aligned}$$

for some \(d\in \mathbb {N}\), real numbers \(a_1,\dots ,a_d\ge 0\) and \(u_1,\dots ,u_d\in \text{ Monoid }(\varGamma )\).

To apply Handelman’s Theorem, we fix a natural number that bounds the number of multiplicands allowed to form an element of \(\text{ Monoid }(\varGamma )\); then Eq. (\(\#\)) results in a system of linear equalities involving \(a_1,\dots ,a_d\). Unlike the previous Positivstellensatz’s, the form of Handelman’s Theorem allows us to construct a system of linear equalities free from semi-definite constraints.

Example 8

Consider that \(X=\{x\}\) and \(\varGamma =\{1-x,1+x\}\). Fix the maximal number of multiplicands in an element of \(\text{ Monoid }(\varGamma )\) to be 2. Then the form of Eq. (\(\#\)) can be rewritten as \(g=\sum _{i=1}^6 a_i\cdot u_i\) where \(u_1=1\), \(u_2=1-x\), \(u_3=1+x\), \(u_4=1-x^2\), \(u_5=1-2x+x^2\), \(u_6=1+2x+x^2\) and each \(a_i\) (\(1\le i\le 6\)) is subject to be a non-negative real number.
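Analogously, a Handelman certificate (\(\#\)) can be checked by elementary coefficient comparison. In the synthesis algorithm the non-negative \(a_i\) are unknowns of a linear program; here we verify a hand-picked certificate for the hypothetical polynomial \(g=2-x\), which is strictly positive on \({\!\!}\llbracket {\varGamma }\rrbracket {\!\!}=[-1,1]\), using the six monoid elements of Example 8.

```python
# u_1..u_6 from Example 8, as coefficient lists [c0, c1, c2] for c0+c1*x+c2*x^2.
U = [
    [1, 0, 0],    # u1 = 1
    [1, -1, 0],   # u2 = 1 - x
    [1, 1, 0],    # u3 = 1 + x
    [1, 0, -1],   # u4 = 1 - x^2
    [1, -2, 1],   # u5 = 1 - 2x + x^2
    [1, 2, 1],    # u6 = 1 + 2x + x^2
]
a = [1, 1, 0, 0, 0, 0]           # hand-picked non-negative multipliers
assert all(ai >= 0 for ai in a)

# sum_i a_i * u_i should equal g = 2 - x  (1*1 + 1*(1 - x) = 2 - x).
combo = [sum(ai * ui[k] for ai, ui in zip(a, U)) for k in range(3)]
assert combo == [2, -1, 0]       # i.e. 2 - x
```

Checking such a certificate is a system of linear equalities over the \(a_i\), which is exactly what the LP in the algorithm searches over.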

5.2 The Algorithm for pRSM Synthesis

Based on the Positivstellensatz’s introduced in the previous part, we present our algorithm for synthesis of pRSMs. Below, we fix an input probabilistic program P, an input polynomial invariant \(I\) and an input initial configuration \((\ell _0,{{\varvec{x}}}_0)\) for P. Let \(\mathcal {G}=( L ,\bot ,(X,R),\mapsto )\) be the associated CFG of P.

Description of the Algorithm PRSMSynth. We present a succinct description of the key steps of the algorithm.

  1. Template \(\eta \) for a pRSM. The algorithm fixes a natural number d as the maximal degree for a pRSM, constructs \(\mathcal {M}_d\) as the set of all monomials over X of degree no greater than d, and sets up a template d-pRSM \(\eta \) such that each \(\eta (\ell ,\cdot )\) is the polynomial \(\sum _{h\in \mathcal {M}_d} a_{h,\ell }\cdot h\) where each \(a_{h,\ell }\) is a (distinct) scalar variable (cf. C1).

  2. Bound for Sums of Squares and Monoid Multiplicands. The algorithm fixes a natural number k as the maximal degree for a sum of squares (cf. Schmüdgen’s and Putinar’s Positivstellensatz) or as the maximal number of multiplicands in a monoid element (cf. Handelman’s Theorem).

  3. RSM-Difference and Terminating-Negativity. From Remark 3, the algorithm fixes \(\epsilon \) to be 1 (cf. condition C4) and K to be \(-1\) (cf. condition C3).

  4. Computation of the pre-expectation \(\mathrm {pre}_\eta \). With \(\epsilon ,K\) fixed to be resp. \(1,-1\) in the previous step, the algorithm computes \(\mathrm {pre}_\eta \) by Definition 7; all of its involved coefficients are linear combinations of the \(a_{h,\ell }\)’s.

  5. Pattern Extraction. The algorithm extracts instances conforming to pattern (\(\dag \)) from C2, C4 and the formulae presented in Definition 9, and translates them into systems of linear equalities over the variables \(a_{h,\ell }\), \(\epsilon \), K, together with extra matrices of variables assumed to be positive semi-definite (cf. Schmüdgen’s and Putinar’s Positivstellensatz) or scalar variables assumed to be non-negative (cf. Handelman’s Theorem), through Eqs. (\(\ddag \)), (\(\S \)) and (\(\#\)).

  6. Solution via Semi-definite or Linear Programming. The algorithm calls a semi-definite programming solver (for Schmüdgen’s and Putinar’s Positivstellensatz) or a linear programming solver (for Handelman’s Theorem) to check feasibility or to optimize \(\mathsf {UB}(P)\) (cf. Theorem 1 for the upper bound on \(\mathsf {ET}(P)\)) over the variables \(a_{h,\ell }\) and the extra matrix/scalar variables from Eqs. (\(\ddag \)), (\(\S \)) and (\(\#\)). Note that feasibility implies the existence of a (difference-bounded) d-pRSM; the existence of a d-pRSM in turn implies finite termination, and the existence of a difference-bounded d-pRSM implies a concentration bound through Theorem 2.
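To illustrate steps 1–6 end to end, the following sketch carries out a miniature, hand-worked instance of the pipeline on a hypothetical toy program (not one of the paper’s examples), assuming integer-valued x. The constraint instances and the optimal coefficients are derived by hand and merely verified in code; in the actual algorithm they would come from Handelman’s Theorem and an LP solver.

```python
import random

# Toy program (hypothetical, integer-valued x):
#     while x >= 1:  x := x - 1 with prob 3/4, otherwise x := x + 1
# Template (step 1): eta(x) = c0 + c1*x;  step 3 fixes eps = 1, K = -1.
#
# Hand-derived constraint instances (steps 4-5), with eta(terminal) = K = -1:
#   C4 for x >= 2 : E[eta(x')] = eta(x) - 0.5*c1 <= eta(x) - 1  <=>  c1 >= 2
#   C4 for x = 1  : 0.75*(-1) + 0.25*(c0 + 2*c1) <= c0 + c1 - 1
#                   <=>  0.75*c0 + 0.5*c1 >= 0.25
#   C2 on  x >= 1 : c0 + c1*x >= 0, via Handelman with Gamma = {x - 1}
#                   <=>  c1 >= 0  and  c0 + c1 >= 0
def constraints_ok(c0, c1):
    return (c1 >= 2
            and 0.75 * c0 + 0.5 * c1 >= 0.25
            and c1 >= 0 and c0 + c1 >= 0)

# Step 6 would minimize eta(x0) by LP; here the optimum is checked by hand.
c0, c1 = -1.0, 2.0
assert constraints_ok(c0, c1)

x0 = 5
ub = (c0 + c1 * x0 - (-1)) / 1.0   # UB(P) = (eta(x0) - K) / eps  (Theorem 1)
assert ub == 10.0

# Monte Carlo check: the empirical mean number of loop iterations should
# respect UB(P); the true expectation is exactly 2*x0 = 10 here.
random.seed(1)
def run(x):
    t = 0
    while x >= 1:
        x += -1 if random.random() < 0.75 else 1
        t += 1
    return t

mean = sum(run(x0) for _ in range(40000)) / 40000
assert 9.7 < mean < 10.3
```

For this toy program the synthesized bound is tight, since the template decreases in expectation by exactly \(\epsilon =1\) per iteration.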

The soundness of our algorithm is as follows.

Theorem 7

(Soundness). Any function \(\eta \) synthesized through the algorithm PRSMSynth is a valid pRSM.

Remark 5

(Efficiency). It is well-known that for semi-definite programs, with a positive real number R bounding the Frobenius norm of any feasible solution, an approximate solution up to precision \(\epsilon \) can be computed in time polynomial in the size of the semi-definite program (with rational numbers encoded in binary), \(\log R\) and \(\log \epsilon ^{-1}\) [24]. Thus, our sound approach presents an efficient method for the analysis of many probabilistic programs. Moreover, when each propositional polynomial predicate in the probabilistic program involves only linear polynomials, the sound form of Handelman’s Theorem can be applied, resulting in feasibility checking of systems of linear inequalities rather than semi-definite constraints. By polynomial-time algorithms for solving systems of linear inequalities [46], our approach runs in polynomial time (and is thus efficient) over such programs.

Remark 6

(Semi-Completeness). Consider probabilistic programs of the following form: \(\mathbf{while}~\phi ~\mathbf{do}~\mathbf{if}~\star ~\mathbf{then}~P_1~\mathbf{else}~P_2~\mathbf{od}\), where \(P_1,P_2\) are single assignments, \({\!\!}\llbracket {\phi }\rrbracket {\!\!}\) is compact, and the invariant assigns to each label a propositional polynomial predicate in disjunctive normal form that involves no strict inequality (i.e., no ‘<’ or ‘>’). On such inputs, our approach is semi-complete in the sense that by raising the upper bounds for the degree of a sum of squares and the number of multiplicands in a monoid element, the algorithm PRSMSynth will eventually find a pRSM if one exists. This is because Theorems 4 to 6 are “semi-complete” when \({\!\!}\llbracket {\varGamma }\rrbracket {\!\!}\) is compact, as the terminal label can be handled separately through \(\mathsf {PP}(\cdot )\) so that only compact \(\varGamma \)’s are formed for the Positivstellensatz’s, and the difference between strict and non-strict inequalities does not matter (cf. Remark 3).

6 Experimental Results

In this section, we present experimental results for our algorithm, obtained through the semi-definite programming tool SOSTOOLS [3] (which uses SeDuMi [1]) and the linear programming tool CPLEX [2]. Due to space constraints, the detailed descriptions of the input probabilistic programs are in [11].

Experimental Setup. We consider six classical examples of probabilistic programs that exhibit distinct types of non-linear behaviours. Our examples are: Logistic Map, adopted from [14], which was previously handled by Lagrangian relaxation and semi-definite programming, whereas our approach uses linear programming; Decay, which models a sequence of points converging stochastically to the origin; Random Walk, which models a random walk within a bounded region defined through non-linear curves; Gambler’s Ruin, our running example (Example 1); Gambler’s Ruin Variant, a variant of Example 1; and Nested Loop, a nested loop with stochastic increments. Except for Gambler’s Ruin Variant and Nested Loop, our approach is semi-complete for all the examples (cf. Remark 6). In all the examples the invariants are straightforward and were manually integrated with the input. Since SOSTOOLS produces only numerical results, we modify “\(\eta (\ell ,{{\varvec{x}}})\ge 0\)” in C2 to “\(\eta (\ell ,{{\varvec{x}}})\ge 1\)” for Putinar’s or Schmüdgen’s Positivstellensatz and check whether the maximal numerical error of all equalities added to SOSTOOLS is sufficiently small over a bounded region. In our examples, the bounded region is \(\{(x,y)\mid x^2+y^2\le 2\}\) and the maximal numerical error should not exceed 1. Note that 1 is also our fixed \(\epsilon \) in C4, and by Remark 3, the modification of C2 is not restrictive. Alternatively, one may use Sylvester’s Criterion (cf. [27, Theorem 7.2.5]) to check membership in sums of squares by checking whether a square matrix is positive semi-definite.

Experimental Results. In Table 2, we present the experimental results, where ‘Method’ indicates whether we use Handelman’s Theorem, Putinar’s Positivstellensatz or Schmüdgen’s Positivstellensatz to synthesize the pRSM, ‘SOSTOOLS/CPLEX’ gives the running time of SOSTOOLS/CPLEX in seconds, ‘error’ is the maximal numerical error of the equality constraints added to SOSTOOLS (when instantiated with the solutions), and \(\eta (\ell _0,\cdot )\) is the polynomial at the initial label in the synthesized pRSM. The synthesized pRSMs (in the last column) refer to the variables of the program. All numbers except errors are rounded to \(10^{-4}\). For all the examples, our translations to the optimization problems are linear. We report the running times of the optimization tools and the synthesized pRSMs. The experimental results were obtained on an Intel Core i7-2600 machine with 3.4 GHz and 16 GB of RAM.

Table 2. Experimental results

For all the examples we consider except Logistic Map, almost-sure termination cannot be established by previous approaches. For the Logistic Map example, our reduction is to linear programming, whereas existing approaches [14, 47] reduce to semi-definite programming.

7 Conclusion and Future Work

In this paper, we extended linear ranking supermartingales (LRSMs) for probabilistic programs, proposed in [10, 12], to polynomial ranking supermartingales (pRSMs) for nondeterministic probabilistic programs. We developed the notion of (difference-bounded) pRSMs and proved that they are sound for almost-sure and finite termination, as well as for concentration bounds (Theorems 1 and 2). We then developed an efficient (sound but not complete) algorithm for synthesizing pRSMs through Positivstellensatz’s (cf. Sect. 5.1), proved its soundness (Theorem 7), and argued its semi-completeness (Remark 6) over an important class of programs. Finally, our experiments demonstrate the effectiveness of our synthesis approach over various classical probabilistic programs for which LRSMs do not exist (cf. Example 1 and Remark 2). Directions for future work are to explore (a) better methods for the numerical problems related to semi-definite programming, and (b) other forms of RSMs for more general classes of probabilistic programs.