Abstract
This paper presents a new framework to synthesize lower bounds on the worst-case cost for nondeterministic integer loops. As in previous approaches, the analysis searches for a metering function that under-approximates the number of loop iterations. The key novelty of our framework is the specialization of loops, which is achieved by restricting their enabled transitions to a subset of the inputs, combined with the narrowing of their transition scopes. Specialization allows us to find metering functions for complex loops that could not be handled before, or to be more precise than previous approaches. Technically, it is performed (1) by using quasi-invariants while searching for the metering function, (2) by strengthening the loop guards, and (3) by narrowing the space of nondeterministic choices. We also propose a MaxSMT encoding that takes advantage of soft constraints to force the solver to look for more accurate solutions. We show our accuracy gains on benchmarks extracted from the 2020 Termination and Complexity Competition by comparing our results to those obtained by the system.
This work was funded partially by the Spanish MCIU, AEI and FEDER (EU) project RTI2018-094403-B-C31, by the CM project S2018/TCS-4314 co-funded by EIE Funds of the European Union, and by the UCM CT42/18-CT43/18 grant.
1 Introduction
One of the most important problems in program analysis is to automatically, and accurately, bound the cost of a program's executions. The first automated analysis was developed in the 70s [24] for a strict functional language and, since then, a plethora of techniques has been introduced to handle the peculiarities of the different programming languages (see, e.g., for Integer programs [5], for Java-like languages [2, 19], for concurrent and distributed languages [16], for probabilistic programs [15, 18], etc.) and to increase their accuracy (see, e.g., [10, 14, 21, 22]). The vast majority of these techniques have focused on inferring upper bounds on the worst-case cost, since the assurance that no execution of the program will exceed the inferred amount of resources (e.g., time, memory, etc.) has crucial applications in safety-critical contexts. On the other hand, lower bounds on the best-case cost characterize the minimal cost of any program execution and are useful in task parallelization (see, e.g., [3, 9, 10]). There is a third type of important bounds, which are the focus of this work: lower bounds on the worst-case cost, which bound the worst-case cost from below. Their main application is that, together with upper bounds on the worst-case cost, they allow us to infer tighter worst-case cost bounds (when they coincide, the inferred cost is exact), which can be crucial in safety-critical contexts. Besides, lower bounds on the worst-case cost give us families of inputs that lead to an expensive cost, which can be used to detect performance bugs. In what follows, we use the acronyms LB\(^w\) and LB\(^b\) to refer to worst-case and best-case lower bounds, respectively.
State-of-the-Art in \(LB^{w}.\) An important difference between LB\(^w\) and LB\(^b\) is that, while the best case must consider all program runs, LB\(^w\) holds for (usually infinite) families of the most expensive program executions. This is why the techniques applicable to LB\(^b\) inference (e.g., [3, 9, 10]) are not useful for LB\(^w\) in general, since they would provide too inaccurate (low) results. The state of the art in LB\(^w\) inference is [12, 13] (implemented in the system), which introduces a variation of ranking functions, called metering functions, to underestimate the number of iterations of simple loops, i.e., loops without branchings or nested loops. The core of this method is a simplification technique that allows treating general loops (with branchings and nested loops) by using so-called acceleration, which replaces a transition representing one loop iteration by another rule that collects the effect of applying several consecutive iterations of the original rule. Asymptotic lower bounds are then deduced from the resulting simplified programs using a special-purpose calculus and an SMT encoding.
Motivation. Our work is motivated by the limitations of state-of-the-art methods when, by treating each simple loop separately, an LB\(^w\) bound cannot be found or is too imprecise. For example, consider the interleaved loop in Fig. 1, which is a simplification of the benchmark from the Termination and Complexity competition. Its transition system appears to the right (the transition system is like a control-flow graph (CFG) in which the transitions \(\tau \) are labeled with the applicability conditions and with the updates for the variables; primed variables denote the updated values). In every iteration x or y can decrease by one, and these behaviors can interleave. The worst case is obtained, for instance, when x is decreased to 0 (x iterations) and then y is decreased to 0 (y iterations), resulting in \(x+y\) iterations, or when y is first decreased to 1 and then x to \(-1\), etc. The approach in [12, 13] accelerates \(\tau _1\) and \(\tau _4\) independently, resulting in accelerated versions \(\tau ^a_1 = x \ge 1 \wedge y > 0 \wedge x' = -1 \wedge y' = y\) with cost \(x+1\) and \(\tau ^a_4 = x \ge 0 \wedge y \ge 0 \wedge x'=x \wedge y' = 0\) with cost y. Applying one accelerated version implies that the other accelerated version cannot be applied afterwards, because of the final values of the variables. Thus, the overall knowledge extracted from the loop is that it can iterate \(x+1\) or y times, whereas the precise LB\(^w\) is \(x+y\) iterations. Our challenge for inferring a more precise LB\(^w\) is to devise a method that can handle all loop transitions simultaneously, as disconnecting them leads to a loss of semantics that cannot be recovered by acceleration.
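As a sanity check of the \(x+y\) worst case, the schedule just described (first \(\tau _1\) until \(x=0\), then \(\tau _4\) until \(y=0\)) can be simulated directly. The following Python sketch is purely illustrative and is not part of the analysis itself:

```python
# Simulate one worst-case schedule of the loop in Fig. 1:
# apply tau_1 while x > 0 (so x stays nonnegative and tau_4 remains
# enabled afterwards), then tau_4 until y = 0, then exit (y <= 0).

def worst_case_schedule(x, y):
    iterations = 0
    while x > 0 and y > 0:   # tau_1: x >= 0 /\ y > 0, x' = x - 1
        x -= 1
        iterations += 1
    while x >= 0 and y > 0:  # tau_4: x >= 0 /\ y > 0, y' = y - 1
        y -= 1
        iterations += 1
    return iterations        # an exit transition (x < 0 or y <= 0) now fires
```

For \(x \ge 0\) and \(y \ge 1\), this schedule performs exactly \(x+y\) loop iterations, matching the precise LB\(^w\) discussed above.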
Non-Termination and \(LB^{w}.\) Our work is inspired by [17], which introduces the powerful concept of quasi-invariant to find witnesses for non-termination. A quasi-invariant is an invariant that does not necessarily hold on initialization, and can be found as in template-based verification [23]. Intuitively, when there is a loop in the program that can be mapped to a quasi-invariant that forbids executing any of the outgoing transitions of the loop, then the program is non-terminating. This paper lifts this powerful use of quasi-invariants and MaxSMT in non-termination analysis to the more difficult problem of LB\(^w\) inference. Non-termination and LB\(^w\) are indeed related properties: in both cases we need to find witnesses, respectively, for non-termination of the loop and for executing at least a certain number of iterations. For LB\(^w\), we additionally need to provide such an underestimation of the number of iterations and search for LB\(^w\) behaviors that occur for a class of inputs rather than for a single input instantiation (since the LB\(^w\) for a single input is a concrete (i.e., constant) cost, rather than the parametric LB\(^w\) function we are searching for). Instead, for non-termination, it is enough to find a single non-terminating input instantiation.
Our Approach. A fundamental idea of our approach is to specialize loops in order to guide the search for metering functions of complex loops, avoiding the inaccuracy introduced by disconnecting them into simple loops. To this purpose, we propose specializing loops by combining the addition of constraints to their transitions with the restriction of the valid states by means of quasi-invariants. For instance, for the loop in Fig. 1, our approach automatically narrows \(\tau _1\) by adding \(x>0\) (so that x is decreased until \(x=0\)) and \(\tau _4\) by adding \(x\le 0\) (so that \(\tau _4\) can only be applied when \(x=0\)). This specialized loop has lost many of the possible interleavings of the original loop but keeps the worst-case execution of \(x+y\) iterations. These specialized guards do not guarantee that the loop executes \(x+y\) iterations from every possible state, as the loop finishes immediately for \(x<0\) or \(y\le 0\); thus our approach also infers the quasi-invariant \(x\ge 0 \wedge x \le y\). Combining the specialized guards and the quasi-invariant, we can ensure that, when the loop is reached in a valid state according to the quasi-invariant, \(x+y\) is a lower bound on its number of iterations, i.e., its cost. Using quasi-invariants that include all (invariant) inequalities syntactically appearing in loop transitions might work for loops with a single path. However, in the general case, the specialized guards usually lead to essential quasi-invariants that do not appear in the original loop. The specialization achieved by adding constraints could also be applied in the context of non-termination to increase the accuracy of [17], where only quasi-invariants were used. Therefore, we argue that our work avoids the precision loss caused by the simplification in [12, 13] and, besides, introduces a loop specialization technique that can also be applied to gain precision in non-termination analysis [17].
Contributions. Briefly, our main theoretical and practical contributions are:

1.
In Sect. 3 we introduce several semantic specializations of loops that enable the inference of local metering functions for complex loops by: (1) restricting the input space by means of automatically generated quasi-invariants, (2) narrowing transition guards, and (3) narrowing nondeterministic choices.

2.
We propose a template-based method in Sect. 4 to automate our technique, which is effectively implemented by means of a MaxSMT encoding. Whereas the use of templates is not new [6], our encoding has several novel aspects that are devised to produce better lower bounds, e.g., the addition of (soft) constraints that force the solver to look for larger lower-bound functions.

3.
We implement our approach in the system and evaluate it on benchmarks from the Integer Transition Systems category of the 2020 Termination and Complexity Competition (see Sect. 5). Our experimental results, when compared to the existing system [12], are promising: they show further accuracy on challenging examples that contain complex loops.
2 Background
This section introduces some notation on the program representation and recalls the notion of LB\(^w\) that we aim to infer.
2.1 Program Representation
Our technique is applicable to sequential nondeterministic programs with integer variables and commands whose updates can be expressed in linear (integer) arithmetic. We assume that the nondeterminism originates from nondeterministic assignments of the form “x:=nondet();”, where x is a program variable and nondet() can be represented by a fresh nondeterministic variable u. This assumption allows us to also cover nondeterministic branching, e.g., “if (*){..} else {..}” as it can be expressed by introducing a nondeterministic variable u and rewriting the code as “u:=nondet(); if (u\(\ge \) 0){..} else {..}”.
Our programs are represented using transition systems, in particular using the formalization of [17] that simplifies the presentation of some formal aspects of our work. A transition system (abbrev. TS) is a tuple \(\mathcal{S} =\langle {\bar{x},\bar{u},\mathcal{L},\mathcal{T},\varTheta } \rangle \), where \(\bar{x}\) is a tuple of n integer program variables, \(\bar{u}\) is a tuple of integer (nondeterministic) variables, \(\mathcal{L} \) is a set of locations, \(\mathcal{T} \) is a set of transitions, and \(\varTheta \) is a formula that defines the valid input and is specified by a conjunction of linear constraints of the form \(\bar{a}\cdot \bar{x}+b \diamond 0\) where \(\diamond \in \{>,<,=,\ge ,\le \}\). A transition is of the form \((\ell ,\ell ',\mathcal{R}) \in \mathcal{T} \) such that \(\ell ,\ell '\in \mathcal{L} \), and \(\mathcal{R} \) is a formula over \(\bar{x}\), \(\bar{u}\) and \(\bar{x}'\) that is specified by a conjunction of linear constraints of the form \(\bar{a}\cdot \bar{x}+\bar{b}\cdot \bar{u}+\bar{c}\cdot \bar{x}'+d \diamond 0\) where \(\diamond \in \{>,<,=,\ge ,\le \}\), and primed variables \(\bar{x}'\) represent the values of the unprimed corresponding variables after the transition. We sometimes write \(\mathcal{R} \) as \(\mathcal{R} (\bar{x},\bar{u},\bar{x}')\), use \(\mathcal{R} (\bar{x})\) to refer to the constraints that involve only variables \(\bar{x}\) (i.e., the guard), and use \(\mathcal{R} (\bar{x},\bar{u})\) to refer to the constraints that involve only variables \(\bar{u}\) and (possibly) \(\bar{x}\). W.l.o.g., we may assume that constraints involving primed variables are of the form \(x_i'=\bar{a}\cdot \bar{x}+\bar{b}\cdot \bar{u}+c\). This is because nondeterminism can be moved to \(\mathcal{R} (\bar{x},\bar{u})\) – if a primed variable \(x_i'\) appears in any expression that is not of this form, we replace \(x_i'\) by a fresh nondeterministic variable \(u_i\) in such expressions and add the equality \(x_i'=u_i\). 
We require that, for any \(\bar{x}\) satisfying \(\mathcal{R} (\bar{x})\), there are \(\bar{u}\) satisfying \(\mathcal{R} (\bar{x},\bar{u})\); formally:
\(\forall \bar{x}.~\mathcal{R} (\bar{x}) \rightarrow \exists \bar{u}.~\mathcal{R} (\bar{x},\bar{u})\)   (1)
This guarantees that for any state \(\bar{x}\) satisfying the condition, there are values for the nondeterministic variables \(\bar{u}\) such that we can make progress. A transition that does not satisfy this condition is called invalid. Note that (1) does not refer to \(\bar{x}'\) since they are set in a deterministic way, once the values of \(\bar{x}\) and \(\bar{u}\) are fixed. W.l.o.g., we assume that all coefficients and free constants, in all linear constraints, are integer; and that there is a single initial location \(\ell _0\in \mathcal{L} \) with no incoming transitions, and a single final location \(\ell _e\) with no outgoing transitions.
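In practice, condition (1) would be discharged symbolically (e.g., with an SMT solver); the following self-contained Python sketch instead checks it by brute force over a bounded sample of integer states. All names here are ours, introduced only for illustration:

```python
# Bounded-domain check of the progress condition (1):
# every sampled state x satisfying the guard R(x) must admit some
# value u satisfying R(x, u). A real implementation would prove the
# implication with an SMT solver rather than by enumeration.

def satisfies_condition_1(guard, rel_xu, xs, us):
    """True iff every sampled x with guard(x) has some u with rel_xu(x, u)."""
    return all(any(rel_xu(x, u) for u in us) for x in xs if guard(x))

# A valid transition: guard x > 0, and R(x, u) constrains 1 <= u <= 2
# (x is decreased nondeterministically by 1 or 2).
valid = satisfies_condition_1(
    guard=lambda x: x > 0,
    rel_xu=lambda x, u: 1 <= u <= 2,
    xs=range(-5, 20),
    us=range(-5, 20),
)

# An invalid transition: guard x > 0, but R(x, u) demands u >= x and
# u <= 0, which no u can satisfy when x > 0, so no progress is possible.
invalid = satisfies_condition_1(
    guard=lambda x: x > 0,
    rel_xu=lambda x, u: u >= x and u <= 0,
    xs=range(-5, 20),
    us=range(-20, 20),
)
```

Here `valid` is `True` and `invalid` is `False`, matching the notion of invalid transition defined above.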
Example 1
The TS graphically presented in Fig. 1 is expressed as follows, considering that all inputs are valid (\(\varTheta = true \)):
\(\begin{array}{ll} \mathcal{S} \equiv \langle &{} \{x, y\}, \emptyset , \{\ell _0, \ell _1, \ell _e \}, \\ &{}\{ (\ell _0, \ell _1, x' = x \wedge y' = y), \\ &{} \,\,\, (\ell _1, \ell _1, x \ge 0 \wedge y> 0 \wedge x' = x-1 \wedge y' = y),\\ &{} \,\,\, (\ell _1, \ell _e, x < 0 \wedge x' = x \wedge y' = y),\\ &{} \,\,\, (\ell _1, \ell _e, y \le 0 \wedge x' = x \wedge y' = y),\\ &{} \,\,\, (\ell _1, \ell _1, x \ge 0 \wedge y > 0 \wedge x' = x \wedge y' = y - 1) \}, true \rangle \\ \end{array} \)
A configuration \(C \) is a pair \((\ell ,\sigma )\) where \(\ell \in \mathcal{L} \) and \(\sigma :\bar{x}\mapsto \mathbb Z \) is a mapping representing a state. We abuse notation and use \(\sigma \) to refer to \(\wedge _{i=1}^nx_i=\sigma (x_i)\), and also write \(\sigma '\) for the assignment obtained from \(\sigma \) by renaming the variables to primed variables. There is a transition from \((\ell ,\sigma _1)\) to \((\ell ',\sigma _2)\) iff there is \((\ell ,\ell ',\mathcal{R})\in \mathcal{T} \) such that \(\exists \bar{u}.\sigma _1\wedge \sigma _2' \models \mathcal{R} \). A (valid) trace t is a (possibly infinite) sequence of configurations \( (\ell _0,\sigma _0),(\ell _1,\sigma _1),\ldots \) such that \(\sigma _0\models \varTheta \), and for each i there is a transition from \((\ell _i,\sigma _i)\) to \((\ell _{i+1},\sigma _{i+1})\). Traces that are infinite or end in a configuration with location \(\ell _e\) are called complete. A configuration \((\ell ,\sigma )\), where \(\ell \ne \ell _e\), is blocking iff no outgoing transition is enabled, i.e.:
\(\sigma \not \models \mathcal{R} (\bar{x})\) for every transition \((\ell ,\ell ',\mathcal{R})\in \mathcal{T} \)   (2)
A TS is nonblocking if no trace includes a blocking configuration. We assume that the TS under consideration is nonblocking, and thus any trace is a prefix of a complete one. Throughout the paper, we represent a TS as a CFG, and analyze its strongly connected components (SCC) one by one. An SCC is said to be trivial if it has no edge.
2.2 LowerBounds
For simplicity, we assume that an execution step (a transition) costs 1. Under this assumption, the cost of a trace t is simply its length \( len (t)\), where the length of an infinite trace is \(\infty \). In what follows, the set of all configurations is denoted by \(\mathcal{C} \), the set of all valid complete traces (using a transition system \(\mathcal{S} \)) when starting from configuration \(C \in \mathcal{C} \) is denoted by \( Traces _{\mathcal{S}}(C) \), and \(\mathbb R_{\ge 0} = \{k \in \mathbb R \mid k\ge 0\}\cup \{\infty \}\). For a nonempty set \(M \subseteq \mathbb R_{\ge 0} \), \( sup ~M\) is the least upper bound of M and \( inf ~M\) is the greatest lower bound of M. The worst-case cost of an initial configuration \(C \) is the cost of the most expensive complete trace starting from \(C \), and the best-case cost is the cost of the least expensive one.
Definition 1 (worst and bestcase cost)
Let \(\mathcal{S} \) be a TS. Its worstcase cost function \(wc_\mathcal{S}:\mathcal{C} \rightarrow \mathbb R_{\ge 0} \) is \(wc_\mathcal{S} (C)= sup ~\{ len (t) \mid t\in Traces _{\mathcal{S}}(C) \}\) and its bestcase cost function \(bc_\mathcal{S}:\mathcal{C} \rightarrow \mathbb R_{\ge 0} \) is \(bc_\mathcal{S} (C)= inf ~\{ len (t) \mid t \in Traces _{\mathcal{S}}(C) \}\).
Clearly, \(wc_\mathcal{S} \) and \(bc_\mathcal{S} \) are not computable. Our goal in this paper is to automatically find a lower-bound function \(\rho :\mathbb Z ^n \rightarrow \mathbb R_{\ge 0} \) such that for any initial configuration \(C =(\ell _0,\sigma )\) we have \(wc_\mathcal{S} (C) \ge \rho (\sigma (\bar{x}))\), i.e., it is an LB\(^w\). An LB\(^b\) would be a function \(\rho :\mathbb Z ^n \rightarrow \mathbb R_{\ge 0} \) that ensures \(bc_\mathcal{S} (C) \ge \rho (\sigma (\bar{x}))\) for any initial configuration \(C =(\ell _0,\sigma )\). In what follows, for a function \(\rho (\bar{x})\), we let \({\parallel }\rho (\bar{x}){\parallel } =\lceil \max (0,\rho (\bar{x})) \rceil \), which maps all negative valuations of \(\rho \) to zero.
Example 2
Consider the TS \(\mathcal{S} =\langle {\{x\},\{u\}, \{\ell _0, \ell _1, \ell _e\},\mathcal{T}, true } \rangle \) with transitions:
\( \begin{array}{ll} \mathcal{T} \equiv \{ &{} \tau _1 = (\ell _0, \ell _1, x \ge 0 ), \\ &{} \tau _2 = (\ell _1, \ell _1, x > 0 \wedge x' = x-u \wedge u \ge 1 \wedge u \le 2), \\ &{} \tau _3 =(\ell _1, \ell _e, x \le 0 \wedge x'=x) ~\} \\ \end{array} \)
\(\mathcal{S} \) contains a loop at \(\ell _1\) where variable x is nondeterministically decreased by 1 or 2. From any initial configuration \(C _0 = (\ell _0, \sigma _0)\), the longest possible complete trace decreases x by 1 in every iteration with \(\tau _2\), therefore \(wc_\mathcal{S} (C _0) =\,{\parallel }\sigma _0(x){\parallel } +2\) because of the \({\parallel }\sigma _0(x){\parallel } \) iterations in \(\ell _1\) plus the cost of \(\tau _1\) and \(\tau _3\). The most precise lower bound for \(wc_\mathcal{S} \) is \(\rho (x) = \,{\parallel }x{\parallel } +2\), although \(\rho (x) = \,{\parallel }x{\parallel } \) or \(\rho (x) = \,{\parallel }x  2{\parallel } \) are also valid lower bounds. The shortest complete trace from \(C _0\) decreases x by 2 in every iteration, so \(bc_\mathcal{S} (C _0) = {\parallel } \frac{\sigma _0(x)}{2}{\parallel } +2\). There are several valid lower bounds for \(bc_\mathcal{S} (C _0)\) like \(\rho (x) = \,{\parallel }\frac{x}{2}{\parallel } +2\), \(\rho (x) = \,{\parallel } \frac{x}{2} {\parallel } \), or \(\rho (x) = 2\).
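For small inputs, Definition 1 can be illustrated by enumerating all complete traces of this TS exhaustively. The following Python sketch does so (function names are ours, for illustration only):

```python
# Enumerate the costs of all complete traces of the TS in Example 2,
# starting from (l0, x). Every transition costs 1, and at l1 the
# nondeterministic variable u ranges over {1, 2}.

def trace_costs(x):
    """Costs of all complete traces from (l0, x) with x >= 0."""
    def from_l1(x):
        if x <= 0:
            yield 1                      # tau_3: exit, cost 1
        else:
            for u in (1, 2):             # tau_2: x' = x - u, cost 1
                for rest in from_l1(x - u):
                    yield 1 + rest
    return [1 + c for c in from_l1(x)]   # tau_1 first, cost 1

costs = trace_costs(5)
# max(costs) realizes wc (always u = 1): 5 + 2 = 7
# min(costs) realizes bc (always u = 2): ceil(5/2) + 2 = 5
```

The maximum and minimum of `costs` coincide with \(wc_\mathcal{S} (C _0)\) and \(bc_\mathcal{S} (C _0)\) as computed in the example.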
3 Local LowerBound Functions
Focus on Local Bounds. Existing techniques and tools for cost analysis (e.g., [1, 12]) work by inferring local (iteration) bounds for those parts of the TS that correspond to loops, and then combining these bounds by propagating them “backwards” to the entry point in order to obtain a global bound. For example, suppose that our program consists of the following two loops:
where the second loop makes x iterations (considering the value of x just before executing the loop), and the first loop makes z iterations and increments x by z in each iteration. We are interested in inferring a global function that describes the total number of iterations of both loops in terms of the input values \(x_0\) and \(z_0\). While both loops have linear complexity locally, i.e., iteration bounds z and x, the second one has quadratic complexity w.r.t. the initial values. This can be inferred automatically from the local bounds z and x by inferring how the value of x changes in the first loop, and then rewriting x in terms of the initial values as \(e=x_0+\frac{z_0\cdot (z_0-1)}{2}\) (e.g., by solving the corresponding recurrence relations). Now the global cost is e plus the cost of the first loop, \(z_0\). Rewriting the loop bound x as above is done by propagating it backwards to the entry point, and there are several techniques in the literature for this purpose that can be directly adopted in our setting to produce global bounds. These techniques can infer global bounds for nested loops as well, given the iteration bounds of each loop. Thus, we focus on inferring local lower bounds on the number of iterations that non-nested loops (more precisely, parts of the TS that correspond to loops) can make, and assume that they can be rewritten to global bounds by adopting the existing techniques of [1, 12] (our implementation could indeed be used as a black box providing local lower bounds to these tools). Namely, we aim at inferring, for each non-nested loop, a function \({\parallel }\rho (\bar{x}){\parallel } =\lceil \max (0,\rho (\bar{x})) \rceil \) that is a (local) LB\(^w\) on its number of iterations, i.e., whenever the loop is reached with values \(\bar{v}\) for the variables \(\bar{x}\), it is possible to make at least \({\parallel }\rho (\bar{v}){\parallel } \) iterations.
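The exact listing of the two loops is not reproduced here; the following Python sketch is one reading consistent with the description above (z is decremented before its value is added to x), which yields precisely \(e=x_0+\frac{z_0\cdot (z_0-1)}{2}\):

```python
# One possible reading of the two-loop program described in the text:
# the first loop runs z0 times, adding the (already decremented) value
# of z to x; the second loop then runs x times. The global iteration
# count is z0 + e, with e = x0 + z0*(z0 - 1)/2. Illustrative only.

def total_iterations(x0, z0):
    x, z, count = x0, z0, 0
    while z > 0:        # first loop: z0 iterations, x grows
        z -= 1
        x += z
        count += 1
    while x > 0:        # second loop: e iterations
        x -= 1
        count += 1
    return count

def closed_form(x0, z0):
    e = x0 + z0 * (z0 - 1) // 2   # value of x after the first loop
    return z0 + e                 # global number of iterations
```

For nonnegative inputs, `total_iterations` and `closed_form` agree, matching the global bound obtained by propagating the local bound x backwards.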
Loops and TSs. For ease of presentation, we first consider a special case of TSs in which all locations, except the initial and exit ones, define loops; Sect. 3.6 explains how the techniques can be used in the general case. In particular, we consider that each nontrivial SCC consists of a single location \(\ell \) and at least one transition, and we call it loop \(\ell \). Transitions from \(\ell \) to \(\ell \) are called loop transitions and their guards loop guards, and transitions from \(\ell \) to \(\ell '\ne \ell \) are called exit transitions. The number of iterations of a loop \(\ell \) in a trace t is defined as the number of transitions from \(\ell \) to \(\ell \), which we also refer to as the cost of loop \(\ell \) (since we assume that the cost of each transition is 1, see Sect. 2.2). The notions of best-case and worst-case cost in Definition 1 naturally extend to the cost of a loop \(\ell \), i.e., we can ask what the best-case and worst-case number of iterations of a given loop is.
Overview of the Section. The overall idea of our approach is to specialize each loop \(\ell \), by restricting the initial values and/or adding constraints to its transitions, so that it becomes possible to obtain a metering function for the specialized loop. A function that is an LB\(^b\) of the specialized loop is by definition an LB\(^w\) of loop \(\ell \), as it does not necessarily hold for all execution traces but rather for the class of restricted ones. Technically, inferring an LB\(^b\) of a (specialized) loop is done by inferring a metering function \(\rho \) [13], such that whenever the (specialized) loop is reached with a state \(\sigma \), it is guaranteed to make at least \({\parallel }\rho (\sigma (\bar{x})){\parallel } \) iterations. Besides, specialization is done in such a way that the TS obtained by putting all specialized loops together is nonblocking, i.e., there is an execution that is either nonterminating or reaches the exit location, and thus the cost of this execution is, roughly, the sum of the costs of all (specialized) loops that are traversed. The rest of this section is organized as follows. In Sect. 3.1 we generalize the basic definition of metering function for simple loops from [12] to general types of loops and explore its limitations. Then, in the following three sections, we explain how to overcome these limitations by means of the following specializations: using quasi-invariants to narrow the set of input values (Sect. 3.2); narrowing loop guards to make loop transitions mutually exclusive and force some execution order between them (Sect. 3.3); and narrowing the space of nondeterministic choices to force longer executions (Sect. 3.4). Finally, Sect. 3.5 states the conditions to be satisfied when specializing loops in order to guarantee that the TS obtained by putting all specialized loops together is nonblocking.
3.1 Metering Functions
Metering functions were introduced in [13] as a tool for inferring a lower bound on the number of iterations that a given loop can make. The definition is analogous to that of a (linear) ranking function, which is often used to infer upper bounds on the number of iterations. The definition as given in [13] considers a loop with a single transition, and assumes that the exit condition is the negation of its guard. We start by generalizing it to our notion of loop.
Definition 2 (Metering function)
We say that a function \(\rho _\ell \) is a metering function for a loop \(\ell \in \mathcal{L} \), if the following conditions are satisfied:
\(\mathcal{R} \rightarrow \rho _\ell (\bar{x})-\rho _\ell (\bar{x}') \le 1\) for every loop transition \((\ell ,\ell ,\mathcal{R})\in \mathcal{T} \)   (3)
\(\mathcal{R} (\bar{x}) \rightarrow \rho _\ell (\bar{x}) \le 0\) for every exit transition \((\ell ,\ell ',\mathcal{R})\in \mathcal{T} \) with \(\ell '\ne \ell \)   (4)
Intuitively, Condition (3) requires \(\rho _\ell \) to decrease by at most 1 in each iteration, and Condition (4) requires \(\rho _\ell \) to be nonpositive when leaving the loop.
Assuming \((\ell ,\sigma )\) is a reachable configuration in \(\mathcal{S} \), it is easy to see that loop \(\ell \) will make at least \({\parallel }\rho _\ell (\sigma (\bar{x})){\parallel } \) iterations when starting from \((\ell ,\sigma )\). We require \((\ell ,\sigma )\) to be reachable in \(\mathcal{S} \) since we are interested only in nonblocking executions. Typically, we are interested in linear metering functions, i.e., of the form \(\rho _\ell (\bar{x})=\bar{a}\cdot \bar{x}+a_0\), since they are easier to infer and cover most loops in practice. Nonlinear lower-bound functions are obtained when rewriting these local linear lower bounds in terms of the initial input at location \(\ell _0\) (see the beginning of Sect. 3) and by composing nested loops (see Sect. 3.6).
Example 3 (Metering function)
Consider the following loop on location \(\ell _1\) that decreases x (\(\tau _1\)) until it takes nonpositive values and exits to \(\ell _2\) (\(\tau _2\)):
\( \begin{array}{l} \tau _1 = (\ell _1, \ell _1, x \ge 0 \wedge x' = x-1)\\ \tau _2 = (\ell _1, \ell _2, x < 0 \wedge x'=x) \\ \end{array} \)
The function \(\rho _{\ell _1}(x) = x+1\) is a valid metering function because it decreases by exactly 1 in \(\tau _1\) and is nonpositive when \(\tau _2\) is applicable (\(x<0 \rightarrow x+1 \le 0\), Condition (4) of Definition 2). The function \(\rho '_{\ell _1}(x) = \frac{x}{2}\) is also metering because its value decreases by less than 1 when applying \(\tau _1\) (\(\frac{x}{2} - \frac{x-1}{2} = \frac{1}{2}\le 1\)) and is nonpositive in \(\tau _2\). Even a function such as \(\rho ''_{\ell _1}(x) = 0\) is trivially metering, as it satisfies (3) and (4). Although all of them are valid metering functions, \(\rho _{\ell _1}(x)\) is preferable as it is more accurate (i.e., larger) and thus captures more precisely the number of iterations of the loop. Note that functions like \(\rho ^*_{\ell _1}(x) = 2x\) or \(\rho ^{**}_{\ell _1}(x) = x+5\) are not metering because they do not satisfy (3) (since \(2x - 2(x-1) = 2 \not \le 1 \) for \(\rho ^*_{\ell _1}\)) or (4) (since \(x<0 \not \rightarrow x+5 \le 0\) for \(\rho ^{**}_{\ell _1}\)).
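The checks of Example 3 can be replayed mechanically. The following Python sketch tests conditions (3) and (4) on a finite range of integer states (a real implementation would prove the implications symbolically with an SMT solver; the function name is ours):

```python
# Sampled check of the metering conditions for the loop of Example 3:
#   tau_1 (loop): guard x >= 0, update x' = x - 1
#   tau_2 (exit): guard x < 0
# Condition (3): rho decreases by at most 1 per iteration.
# Condition (4): rho is nonpositive whenever the exit guard holds.

def is_metering(rho, xs=range(-20, 21)):
    dec_ok  = all(rho(x) - rho(x - 1) <= 1 for x in xs if x >= 0)  # (3)
    exit_ok = all(rho(x) <= 0 for x in xs if x < 0)                # (4)
    return dec_ok and exit_ok

checks = {
    "x+1": is_metering(lambda x: x + 1),   # metering, most precise
    "x/2": is_metering(lambda x: x / 2),   # metering, less precise
    "0":   is_metering(lambda x: 0),       # trivially metering
    "2x":  is_metering(lambda x: 2 * x),   # violates (3)
    "x+5": is_metering(lambda x: x + 5),   # violates (4)
}
```

The results match the discussion above: the first three candidates pass, while \(2x\) and \(x+5\) fail.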
3.2 Narrowing the Set of Input Values Using QuasiInvariants
Metering functions typically exist for loops with simple loop guards. However, when guards involve more than one inequality, metering functions usually do not exist in a simple (linear) form. This is because such loops often include several exit transitions with unrelated conditions, where each one corresponds to the negation of an inequality of the guard. It is then unlikely that a nontrivial (linear) function satisfies (4) for all exit transitions. This is illustrated in the next example.
Example 4
Consider the following loop that iterates on \(\ell _1\) if \(x\ge 0 \wedge y>0\), and exits when \(x<0\) or \(y \le 0\):
\( \begin{array}{l} \tau _1 = (\ell _1, \ell _1, x \ge 0 \wedge y> 0 \wedge x' = x-1 \wedge y' = y)\\ \tau _2 = (\ell _1, \ell _2, x <0 \wedge x'=x \wedge y'=y) \\ \tau _3 = (\ell _1, \ell _2, y \le 0 \wedge x'=x \wedge y'=y) \\ \end{array} \)
Intuitively, this loop executes \(x+1\) transitions, but \(\rho _{\ell _1}(x,y) = x+1\) is not a valid metering function because it does not satisfy (4) for \(\tau _3\): \(y \le 0 \not \rightarrow x+1 \le 0\). Moreover, no other function depending on x (e.g., \(\frac{x}{2}\), \(x-2\), etc.) will be a valid metering function, as it is impossible to prove (4) for \(\tau _3\) from the information \(y \le 0\) on its guard alone. The only valid metering function for this loop is the trivial one \(\rho _{\ell _1}(x,y) = c\) with \(c\le 0\), which does not provide any information about the number of iterations of the loop.
Our proposal to overcome the imprecision discussed above is to consider only a subset of the input values such that conditions (3,4) hold in the context of the corresponding reachable states. For example, the reachable states might exclude some of the exit transitions, i.e., it is guaranteed that they are never used, and then (4) is not required to hold for them. A metering function in this context is an LB\(^b\) of the loop when starting from that specific input, and thus it is an LB\(^w\) (i.e., not necessarily best-case) of the loop when the input values are not restricted.
Technically, our analysis materializes the above idea by relying on quasi-invariants [17]. A quasi-invariant for a loop \(\ell \) is a formula \(\mathcal{Q} _\ell \) over \(\bar{x}\) such that:
\(\mathcal{Q} _\ell (\bar{x}) \wedge \mathcal{R} \rightarrow \mathcal{Q} _\ell (\bar{x}')\) for every loop transition \((\ell ,\ell ,\mathcal{R})\in \mathcal{T} \)   (5)
\(\exists \bar{x}.~\mathcal{Q} _\ell (\bar{x})\)   (6)
Intuitively, \(\mathcal{Q} _\ell \) is similar to an inductive invariant but without requiring it to hold on the initial states, i.e., once \(\mathcal{Q} _\ell \) holds, it will hold during all subsequent visits to \(\ell \). This also means that, for executions that start in states within \(\mathcal{Q} _\ell \), it is guaranteed that \(\mathcal{Q} _\ell \) is an over-approximation of the reachable states. Condition (6) is used to avoid quasi-invariants that are false. Given a quasi-invariant \(\mathcal{Q} _\ell \) for \(\ell \), we say that \(\rho _\ell \) is a metering function for \(\ell \) if the following holds:
\(\mathcal{Q} _\ell (\bar{x}) \wedge \mathcal{R} \rightarrow \rho _\ell (\bar{x})-\rho _\ell (\bar{x}') \le 1\) for every loop transition \((\ell ,\ell ,\mathcal{R})\in \mathcal{T} \)   (7)
\(\mathcal{Q} _\ell (\bar{x}) \wedge \mathcal{R} (\bar{x}) \rightarrow \rho _\ell (\bar{x}) \le 0\) for every exit transition \((\ell ,\ell ',\mathcal{R})\in \mathcal{T} \) with \(\ell '\ne \ell \)   (8)
Intuitively, these conditions state that (3,4) hold in the context of the states induced by \(\mathcal{Q} _\ell \). Assuming that \((\ell ,\sigma )\) is reachable in \(\mathcal{S} \) and that \(\sigma \models \mathcal{Q} _\ell \), loop \(\ell \) will make at least \({\parallel }\rho _\ell (\sigma (\bar{x})){\parallel } \) iterations in any execution that starts in \((\ell ,\sigma )\).
Example 5
Recall that the loop in Example 4 only admitted trivial metering functions because of the exit transition \(\tau _3\). It is easy to see that \(\mathcal{Q} _{\ell _1}\equiv x < y\) verifies (5,6), because y is not modified in \(\tau _1\) and x decreases, and thus it is a quasiinvariant. In the context of \(\mathcal{Q} _{\ell _1}\), function \(\rho _{\ell _1}(x,y) = x+1\) is metering because when taking \(\tau _3\) the value of x is guaranteed to be negative, i.e., \(\tau _3\) satisfies (8) because \(x < y \wedge y \le 0 \rightarrow x+1 \le 0\). Notice that \(\rho _{\ell _1}(x,y) = x+1\) will still be a valid metering function considering other quasiinvariants of the form \(\mathcal{Q} '_{\ell _1} \equiv y>c\) with \(c\ge 0\), as they would completely disable transition \(\tau _3\).
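The reasoning in Example 5 can be checked on a finite sample of integer states. The following Python sketch tests the inductiveness of \(\mathcal{Q} _{\ell _1}\equiv x < y\) and the metering conditions in its context (an SMT solver would discharge these implications symbolically; variable names are ours):

```python
# Sampled check for Example 5, loop of Example 4:
#   tau_1 (loop): guard x >= 0 /\ y > 0, update x' = x - 1, y' = y
#   tau_2 (exit): guard x < 0;  tau_3 (exit): guard y <= 0
# Candidate quasi-invariant Q(x, y): x < y; candidate rho: x + 1.

R = range(-10, 11)
states = [(x, y) for x in R for y in R]

# Inductiveness (5): Q /\ guard(tau_1) implies Q after the update.
inductive = all(x - 1 < y
                for (x, y) in states
                if x < y and x >= 0 and y > 0)

# Condition (8) on tau_3: Q /\ y <= 0 implies x + 1 <= 0.
exit_tau3 = all(x + 1 <= 0 for (x, y) in states if x < y and y <= 0)

# Condition (8) on tau_2: Q /\ x < 0 implies x + 1 <= 0 (integers).
exit_tau2 = all(x + 1 <= 0 for (x, y) in states if x < y and x < 0)
```

All three flags come out true on the sampled states, in line with the argument that \(\rho _{\ell _1}(x,y) = x+1\) is metering in the context of \(\mathcal{Q} _{\ell _1}\).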
3.3 Narrowing Guards
The loops that we have considered so far consist of a single loop transition, which makes it easier to find a metering function. This is because there is only one way to modify the program variables (with some degree of nondeterminism induced by the nondeterministic variables). However, when we allow several loop transitions, we can have loops for which a nontrivial metering function does not exist even when narrowing the set of input values.
Example 6
Consider the extension of the loop in Example 4 with a new transition \(\tau _4\) that decrements y (it corresponds to the example in Sect. 1):
\( \begin{array}{l} \tau _1 = (\ell _1, \ell _1, x \ge 0 \wedge y> 0 \wedge x' = x-1 \wedge y' = y)\\ \tau _4 = (\ell _1, \ell _1, x \ge 0 \wedge y > 0 \wedge x' = x \wedge y' = y-1)\\ \tau _2 = (\ell _1, \ell _2, x <0 \wedge x'=x \wedge y'=y) \\ \tau _3 = (\ell _1, \ell _2, y \le 0 \wedge x'=x \wedge y'=y) \\ \end{array} \)
The most precise LB\(^w\) of this loop is \({\parallel }\rho _{\ell _1}(x,y){\parallel } \) where \(\rho _{\ell _1}(x,y)=x+y\). As mentioned, this corresponds, e.g., to an execution that uses \(\tau _1\) until \(x=0\), i.e., x times, and then \(\tau _4\) until \(y=0\), i.e., y times. It is easy to see that if we start from a state that satisfies \(x \ge 0\wedge x \le y\), then it will be satisfied during the particular execution that we just described. Moreover, assuming that \(\mathcal{Q} _{\ell _1}\equiv x \ge 0\wedge x \le y\) is a quasi-invariant, it is easy to show that, together with \(\rho _{\ell _1}\), we can verify (7,8), and thus \(\rho _{\ell _1}\) is a metering function. Unfortunately, however, \(\mathcal{Q} _{\ell _1}\) is not a quasi-invariant, since the above loop admits executions other than the one described (e.g., decreasing y to 1 first and then x to 0).
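The failure of quasi-invariance is easy to confirm on a concrete state (a small check of our own, not part of the paper; `Q` is our name for the candidate):

```python
# For the loop of Example 6, Q(x, y) = (x >= 0 and x <= y) is NOT a
# quasi-invariant: transition tau4 (x' = x, y' = y - 1) can leave Q.

def Q(x, y):
    return x >= 0 and x <= y

x, y = 2, 2                      # state satisfying Q and tau4's guard
assert Q(x, y) and x >= 0 and y > 0
assert not Q(x, y - 1)           # after tau4, x <= y is violated
```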
Our idea to overcome this imprecision is to narrow the set of states for which loop transitions are enabled, i.e., to strengthen loop guards with additional inequalities. This, in principle, reduces the number of possible executions, and thus makes it more likely to find a metering function (or a better quasi-invariant), because it now has to be valid for fewer executions. For example, this might force an execution order between the different paths, or even disable some transitions by narrowing their guard to false. Again, a metering function for the specialized loop is not a valid LB\(^b\) of the original loop, but it is a valid LB\(^w\), which is what we are interested in. Next, we state the requirements that such a narrowing should satisfy. The choice of a narrowing that leads to longer executions is discussed in Sect. 4.
A guard narrowing for a loop transition \(\tau \in \mathcal{T} \) is a formula \(\mathcal{G} _{\tau }(\bar{x})\) over variables \(\bar{x}\). A specialization of a loop is obtained simply by adding these formulas to the corresponding transitions. Conditions (5)–(8) can be specialized to hold only for executions that use the specialized loop as follows. Suppose that for a loop \(\ell \in \mathcal{L} \) we are given a narrowing \(\mathcal{G} _\tau \) for each loop transition \(\tau \); then \(\mathcal{Q} _\ell \) and \(\rho _\ell \) are a quasi-invariant and a metering function, resp., for the corresponding specialized loop if the following conditions hold
Conditions (9,10) guarantee that \(\mathcal{Q} _\ell \) is a nonempty quasi-invariant for the specialized loop, and conditions (11,12) guarantee that \(\rho _\ell \) is a metering function for the specialized loop in the context of \(\mathcal{Q} _\ell \). However, in this case, the function \(\rho _\ell \) induces a lower bound on the number of iterations only if the specialized loop is nonblocking for states in \(\mathcal{Q} _\ell \). This is illustrated in the following example.
Example 7
Consider the loop from Example 3 where we have specialized the guard of \(\tau _1\) by adding \(x \ge 5\):
\( \begin{array}{l} \tau _1 = (\ell _1, \ell _1, x \ge 0 \wedge x \ge 5 \wedge x' = x-1) ~~~~~~ \tau _2 = (\ell _1, \ell _2, x <0 \wedge x'=x) \\ \end{array} \)
With this specialized guard and considering \(\mathcal{Q} _{\ell _1}\equiv true \), the metering function \(\rho _{\ell _1}(x) = x+1\) still satisfies (11,12), and \(\mathcal{Q} _{\ell _1}\) trivially satisfies (9,10). However, \(\rho _{\ell _1}\) is not a valid measure of the number of transitions executed, because the loop gets blocked whenever x takes values \(0 \le x < 5\), and thus it will never execute \(x+1\) transitions.
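The blocking behavior can be observed by simulating the specialized loop directly (a sketch of ours; `run` is a hypothetical helper that returns the number of iterations performed and how the run ends):

```python
# The specialized loop of Example 7 blocks for 0 <= x < 5, so rho(x) = x + 1
# iterations are never performed from such states.

def run(x):
    steps = 0
    while True:
        if x >= 0 and x >= 5:    # specialized tau1: x' = x - 1
            x, steps = x - 1, steps + 1
        elif x < 0:              # exit transition tau2
            return steps, "exit"
        else:                    # no transition enabled: the loop is blocked
            return steps, "blocked"

assert run(3) == (0, "blocked")  # rho(3) = 4 iterations were promised
assert run(7) == (3, "blocked")  # runs at x = 7, 6, 5, then blocks at x = 4
```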
To guarantee that the specialized loop is nonblocking for states in \(\mathcal{Q} _\ell \), it is enough to require the following condition to hold
Intuitively, it states that from any state in \(\mathcal{Q} _\ell \) we can make progress, either by making a loop iteration or exiting the loop. Assuming that \((\ell ,\sigma )\) is reachable in \(\mathcal{S} \) and that \(\sigma \models \mathcal{Q} _\ell \), the specialized loop \(\ell \) will make at least \({\parallel }\rho _\ell (\sigma (\bar{x})){\parallel } \) iterations in any execution that starts in \((\ell ,\sigma )\). This also means that the original loop can make at least \({\parallel }\rho _\ell (\sigma (\bar{x})){\parallel } \) iterations in any execution that starts in \((\ell ,\sigma )\).
Example 8
In Example 6, we have seen that if \(\mathcal{Q} _{\ell _1} \equiv x \le y \wedge x \ge 0\) were a quasi-invariant, then the function \(\rho _{\ell _1}(x,y) = x+y\) would be a metering function. We can make \(\mathcal{Q} _{\ell _1}\) a quasi-invariant by specializing the guards of transitions \(\tau _1\) and \(\tau _4\) to force the following execution with \(x+y\) iterations: first use \(\tau _1\) until \(x=0\) (x iterations) and then use \(\tau _4\) until \(y=0\) (y iterations). This behavior can be forced by taking \(\mathcal{G} _{\tau _1} \equiv x>0\) and \(\mathcal{G} _{\tau _4} \equiv x \le 0\). With \(\mathcal{G} _{\tau _1}\) we ensure that x stops decreasing when \(x=0\), and with \(\mathcal{G} _{\tau _4}\) we ensure that \(\tau _4\) is used only when \(x=0\). Now, \(\mathcal{Q} _{\ell _1} \equiv x \le y \wedge x \ge 0\) and \(\rho _{\ell _1}(x,y) = x+y\) are a valid quasi-invariant and metering function, resp. Function \(\rho _{\ell _1}\) decreases by exactly 1 in \(\tau _1\) and \(\tau _4\), is trivially nonpositive in \(\tau _2\) because that transition is in fact disabled (\(x\ge 0\) from \(\mathcal{Q} _{\ell _1}\) and \(x < 0\) from the guard), and is nonpositive in \(\tau _3\) (\(x \le y \wedge y \le 0 \rightarrow x+y \le 0\)). Regarding \(\mathcal{Q} _{\ell _1}\), it verifies (9,10) and, more importantly, the loop in \(\ell _1\) is nonblocking w.r.t. \(\mathcal{Q} _{\ell _1}\), \(\mathcal{G} _{\tau _1}\), and \(\mathcal{G} _{\tau _4}\), i.e., Condition (13) holds.
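Since the narrowed guards make the loop deterministic, the claim that \(\rho _{\ell _1}\) is exact can be checked by simulation (our own sketch; the helper `run` is hypothetical):

```python
# Deterministic simulation of the specialized loop of Example 8:
#   tau1 is narrowed with x > 0 and tau4 with x <= 0, so from any state with
#   0 <= x <= y the loop performs exactly x + y iterations.

def run(x, y):
    steps = 0
    while True:
        if x >= 0 and y > 0 and x > 0:       # tau1 with G_tau1 = (x > 0)
            x, steps = x - 1, steps + 1
        elif x >= 0 and y > 0 and x <= 0:    # tau4 with G_tau4 = (x <= 0)
            y, steps = y - 1, steps + 1
        else:                                # exit via tau2 or tau3
            return steps

for x in range(0, 8):
    for y in range(x, 8):                    # states in the quasi-invariant
        assert run(x, y) == x + y            # rho(x, y) = x + y is exact
```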
3.4 Narrowing Nondeterministic Choices
Loop transitions that involve nondeterministic variables might give rise to executions of different lengths when starting from the same input values. Since we are interested in LB\(^w\), we are clearly searching for the longer executions. However, since our approach is based on inferring LB\(^b\), we have to take all executions into account, which might result in less precise, or even trivial, LB\(^w\).
Example 9
Consider a modification of the loop in Example 6 in which the variable x in \(\tau _1\) is decreased by a nondeterministic positive quantity u:
\( \begin{array}{l} \tau _1 = (\ell _1, \ell _1, x \ge 0 \wedge y > 0 \wedge x' = x-u \wedge u \ge 1 \wedge y' = y)\\ \end{array} \)
The effect of this nondeterministic variable u is that \(\tau _1\) can be applied x times if we always take \(u=1\), \(\lceil \frac{x}{2}\rceil \) times if we always take \(u = 2\), or even only once if we take \(u > x\). As a consequence, \(\rho _{\ell _1}(x,y) = x+y\) is no longer a valid metering function, because x can decrease by more than 1 in \(\tau _1\). Moreover, \(\mathcal{Q} _{\ell _1} \equiv x \le y \wedge x \ge 0\) is not a quasi-invariant anymore, since \(x' = x-u \wedge u \ge 1\) does not entail \(x' \ge 0\). In fact, no metering function involving x will be valid in \(\tau _1\), because x can decrease by any positive amount.
To handle this complex situation, we propose narrowing the space of nondeterministic choices, so that metering functions have to be valid w.r.t. fewer executions and are thus more likely to be found and to be more precise. Next we state the requirements that such a narrowing should satisfy. The choice of a narrowing that leads to longer executions is discussed in Sect. 4.
A nondeterministic-variables narrowing for a loop transition \(\tau \in \mathcal{T} \) is a formula \(\mathcal{U} _{\tau }(\bar{x},\bar{u})\), over variables \(\bar{x}\) and \(\bar{u}\), that is added to \(\tau \) to restrict the choices for variables \(\bar{u}\). A specialized loop is now obtained by adding both \(\mathcal{G} _\tau \) and \(\mathcal{U} _\tau \) to the corresponding transitions. Suppose that for a loop \(\ell \in \mathcal{L} \), in addition to \(\mathcal{G} _\tau \), we are also given \(\mathcal{U} _\tau \) for each of its loop transitions \(\tau \). For \(\mathcal{Q} _\ell \) and \(\rho _\ell \) to be a quasi-invariant and a metering function for the specialized loop \(\ell \), we require conditions (9)–(13) to hold, but after adding \(\mathcal{U} _\tau \) to the left-hand side of the implications in (9) and (11). Besides, unlike the narrowing of guards, the narrowing of nondeterministic choices might make a transition invalid, i.e., not satisfy Condition (1), in which case \({\parallel }\rho _\ell (\bar{x}){\parallel } \) cannot be used as a lower bound on the number of iterations. To guarantee that specialized transitions are valid we require, in addition, the following condition to hold
This condition is basically (1) taking into account the inequalities introduced by the corresponding narrowings. Assuming that \((\ell ,\sigma )\) is reachable in \(\mathcal{S} \) and that \(\sigma \models \mathcal{Q} _\ell \), the specialized loop \(\ell \) will make at least \({\parallel }\rho _\ell (\sigma (\bar{x})){\parallel } \) iterations in any execution that starts in \((\ell ,\sigma )\), which also means, as before, that the original loop can make at least \({\parallel }\rho _\ell (\sigma (\bar{x})){\parallel } \) iterations in any execution that starts in \((\ell ,\sigma )\).
Example 10
To solve the problems shown in Example 9, we need to narrow the nondeterministic variable u to take bounded values that reflect the worst-case execution of the loop. Concretely, we take \(\mathcal{U} _{\tau _1} \equiv u \le 1\), which combined with \(u \ge 1\) entails \(u = 1\), so that x decreases by exactly 1 in \(\tau _1\). Considering the narrowing \(\mathcal{U} _{\tau _1}\), the resulting loop is equivalent to the one presented in Example 8, so we can obtain the precise metering function \(\rho _{\ell _1}(x,y) = x+y\) with the quasi-invariant \(\mathcal{Q} _{\ell _1}\equiv x \le y \wedge x \ge 0\). Note that (14) holds for \(\tau _1\) because \(u=1\) makes the consequent true for every value of x and y: \( \begin{array}{l} \forall \bar{x}\exists \bar{u}.~ (x \le y \wedge x \ge 0) \wedge (x \ge 0 \wedge y> 0) \wedge x > 0 \rightarrow u \ge 1 \wedge u \le 1 \end{array} \)
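The effect of the narrowing can be seen by simulating both scheduler behaviors (our own sketch; `run` and `pick_u` are hypothetical helpers, with `pick_u` playing the role of the nondeterministic choice of u):

```python
# Simulation of the loop of Example 9 with guard narrowings from Example 8,
# where tau1 sets x' = x - u with u >= 1. An adversarial scheduler can pick
# u > x and cut the run short; the narrowing U_tau1 = (u <= 1) forces u = 1
# and recovers the x + y iteration worst case.

def run(x, y, pick_u):
    steps = 0
    while True:
        if x >= 0 and y > 0 and x > 0:       # narrowed tau1: x' = x - u
            x, steps = x - pick_u(x), steps + 1
        elif x >= 0 and y > 0 and x <= 0:    # narrowed tau4: y' = y - 1
            y, steps = y - 1, steps + 1
        else:                                # exit via tau2 or tau3
            return steps

assert run(5, 5, lambda x: x + 1) == 1   # adversarial u: only 1 iteration
assert run(5, 5, lambda x: 1) == 10      # U_tau1 forces u = 1: x + y steps
```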
3.5 Ensuring the Feasibility of the Specialized Loops
In order to enable the propagation of the local lower bounds back to the input location (as discussed at the beginning of Sect. 3), we have to ensure that there is actually an execution that starts in \(\ell _0\) and passes through the specialized loop. In other words, we have to guarantee that, when putting all specialized loops together, they still form a nonblocking TS for some set of input values. We achieve this by requiring that the quasi-invariants of the preceding loops ensure that the considered quasi-invariant for this loop also holds on initialization (i.e., it is an invariant for the considered context). Technically, we require, in addition to (9)–(14), the following conditions to hold for each loop \(\ell \):
Condition (15) means that transitions entering loop \(\ell \), strengthened with the quasi-invariant of the preceding location \(\ell '\), must lead to states within the quasi-invariant \(\mathcal{Q} _\ell \). Condition (16) guarantees that \(\mathcal{Q} _{\ell _0}\) defines valid input values, i.e., values within the initial condition \(\varTheta \).
Theorem 1 (soundness)
Given \(\mathcal{Q} _\ell \) for each non-exit location \(\ell \in \mathcal{L} \), narrowings \(\mathcal{G} _\tau \) and \(\mathcal{U} _\tau \) for each loop transition \(\tau \in \mathcal{T} \), and a function \(\rho _{\ell }\) for each loop location \(\ell \), such that (9)–(16) are satisfied, the following holds:

1.
The TS \(\mathcal{S} '\) obtained from \(\mathcal{S} \) by adding \(\mathcal{G} _\tau \) and \(\mathcal{U} _\tau \) to the corresponding transitions, and changing the initial condition to \(\mathcal{Q} _{\ell _0}\), is nonblocking.

2.
For any complete trace t of \(\mathcal{S} '\), if \(C =(\ell ,\sigma )\) is a configuration in t, then t includes at least \({\parallel }\rho _{\ell }(\sigma (\bar{x})){\parallel } \) visits to \(\ell \) after \(C \) (i.e., \({\parallel }\rho _{\ell }(\bar{x}){\parallel } \) is a lower-bound function on the number of iterations of the loop defined by location \(\ell \)).
The proof of this soundness result is straightforward: it follows as a sequence of facts from the definitions of conditions (9)–(16) given in this section.
We note that when there is an unbounded overlap between the guards of the loop transitions and the guards of exit transitions, it is likely that a nontrivial metering function does not exist because it must be nonpositive on the overlapping states. To overcome this limitation, instead of using the exit transitions in (12), we can use ones that correspond to the negation of the guards of loop transitions, and thus it is ensured that they do not overlap. However, we should require (13) to hold for the original exit transitions as well in order to ensure that the nonblocking property holds. Another way to overcome this limitation is to simply strengthen the exit transitions by the negation of the guards.
As a final comment, we note that we need not assume that the TS \(\mathcal{S} \) that we start with is nonblocking (even though we have done so in Sect. 2.1 for clarity). This is because our formalization above finds a subset of \(\mathcal{S} \) (\(\mathcal{S} '\) in Theorem 1) that is nonblocking, which is enough to ensure the feasibility of the local lower bounds. This is useful not only for enlarging the set of TSs that we accept as input, but also because it allows us to start the analysis from any subset of \(\mathcal{S} \) that includes a path from \(\ell _0\) to the exit location. For example, this can be used to remove trivial execution paths from \(\mathcal{S} \), or to concentrate on those that include more sequences of loops (since we are interested in LB\(^w\)).
3.6 Handling General TSs
So far we have considered a special case of TSs in which all locations, except the entry and exit ones, are multipath loops. Next we explain how to handle the general case. It is easy to see that we can allow locations that correspond to trivial SCCs. These correspond to paths that connect loops and might include branching as well. For such locations, there is no need to infer metering functions or apply any specialization; we only need to assign them quasi-invariants that satisfy (15) to guarantee that the overall specialized TS is nonblocking.
The more elaborate case is when the TS includes nontrivial SCCs that do not form a multipath loop. In such a case, if an SCC has a single cut-point, we can unfold its edges and transform it into a multipath loop following the techniques of [1]. It is important to note that when merging two transitions, the cost of the new one is the sum of their costs. In this case the number of iterations is still a lower bound on the cost of the loop; however, we might get a better one by multiplying it by the minimal cost of its transitions.
If an SCC cannot be transformed into a multipath loop by unfolding its transitions, then it might correspond to a nested loop. In such a case, we can recover the nesting structure and consider the nested loops as separate TSs that are “called” from the outer one, using loop extraction techniques [25]. Each inner loop is then analyzed separately and replaced (in the original TS, where it is “called”) by a single edge whose cost is the inner loop's lower bound, and the outer loop is then analyzed taking that cost into account. Besides, to guarantee that the specialized program corresponds to a valid execution, we require the quasi-invariant of the inner loop to hold in the context of the quasi-invariant of the outer loop. This approach is rather standard in cost analysis of structured programs [1, 3, 12].
Another issue is how to compose the (local) lower bounds of the specialized loops into a global lower bound. For this, we can rely on the techniques of [1, 3] that rewrite the local lower bounds in terms of the input values by relying on invariant generation and recurrence-relation solving.
4 Inference Using MaxSMT
This section presents how metering functions and narrowings can be inferred automatically using MaxSMT, namely how to automatically infer all \(\mathcal{G} _\tau \), \(\mathcal{U} _\tau \), \(\mathcal{Q} _\ell \), and \(\rho _\ell \) such that (9)–(16) are satisfied. We do it in a modular way, i.e., we seek \(\mathcal{G} _\tau \), \(\mathcal{U} _\tau \), \(\mathcal{Q} _\ell \), and \(\rho _\ell \) for one loop at a time, following a (reversed) topological order of the SCCs, as we describe next. Recall that (16) is required only for loops connected directly to \(\ell _0\), and w.l.o.g. we assume there is only one such loop.
4.1 A Template-Based Verification Approach
We first show how the template-based approach of [6, 17] can be used to find \(\mathcal{G} _\tau \), \(\mathcal{U} _\tau \), and \(\mathcal{Q} _\ell \) by representing them as template constraint systems, i.e., each is a conjunction of linear constraints whose coefficients and constants are unknowns. Also, \(\rho _\ell \) is represented as a linear template function \(\bar{a}\cdot \bar{x}+a_0\) where \((a_0,\bar{a})\) are unknowns. Then, the problem is to find concrete values for the unknowns such that all formulas generated by (9)–(16) are satisfied:

Each \(\forall \)-formula generated by (9)–(16), except those of (14) that we handle below, can be viewed as an \(\exists \forall \) problem where the \(\exists \) is over the unknowns of the templates and the \(\forall \) is over (some of) the program variables. It is well-known that solving such an \(\exists \forall \) problem, i.e., finding values for the unknowns, can be done by translating it into a corresponding \(\exists \) problem over the existentially quantified variables (i.e., the unknowns) using Farkas’ lemma [20], which can then be solved using an off-the-shelf SMT solver.

To handle (14) we follow [17], and eliminate \(\exists \bar{u}\) using the Skolemization \(u_i=\bar{a}\cdot \bar{x}+a_0\), where \((a_0,\bar{a})\) are fresh unknowns (different for each \(u_i\)). This allows handling it using Farkas’ lemma as well. In addition, however, when solving the corresponding \(\exists \) problem we require all \((a_0,\bar{a})\) to be integer. This is because the domain of the program variables is the integers, and picking integer values for all \((a_0,\bar{a})\) guarantees that the values of any \(x_i'\) that depends on \(\bar{u}\) will be integer as well (see Footnote 1).
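The essence of the Farkas-based translation can be illustrated on a toy instance of our own (the numbers and variable layout here are ours, not from the paper): an implication between linear constraints holds when the conclusion is a nonnegative combination of the premises, and those multipliers are exactly what the \(\exists \) problem searches for.

```python
# Farkas-style certificate for:  (x >= 0) /\ (y >= x)  ->  (x + y >= 0).
# Each constraint is written as a . (x, y) + b >= 0; the implication holds
# if the target is a nonnegative combination (lam) of the premises.

premises = [((1, 0), 0),    # x >= 0       i.e.  1*x + 0*y + 0 >= 0
            ((-1, 1), 0)]   # y - x >= 0
target   = ((1, 1), 0)      # x + y >= 0
lam      = [2, 1]           # the certificate (nonnegative multipliers)

# Check the certificate symbolically: sum_i lam_i * premise_i covers target.
coef = tuple(sum(l * a[j] for l, (a, _) in zip(lam, premises)) for j in (0, 1))
const = sum(l * b for l, (_, b) in zip(lam, premises))
assert all(l >= 0 for l in lam)
assert coef == target[0] and const <= target[1]

# Spot-check the implication itself on a grid of integer points.
for x in range(-5, 6):
    for y in range(-5, 6):
        if x >= 0 and y >= x:
            assert x + y >= 0
```

In the actual encoding, the premise coefficients themselves contain template unknowns, which is what makes the resulting \(\exists \) problem nonlinear (as noted at the end of this section).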
The size of templates for \(\mathcal{G} _\tau \), \(\mathcal{U} _\tau \), and \(\mathcal{Q} _\ell \), i.e., the number of inequalities, is crucial for precision and performance. The larger the size is, the more likely that we get a solution if one exists, but also the worse the performance is (as the corresponding SMT problem will include more constraints and variables). In practice, one typically starts with templates of size 1, and iteratively increases it by 1 when failing to find values for the unknowns, until a solution is found or the bound on the size is reached.
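For intuition, the template search can be mimicked very crudely, with brute-force enumeration standing in for the SMT solver (a sketch of ours, not the tool's implementation; `ok` and the grid `G` are hypothetical helpers): for the single-transition loop of Example 4, we enumerate size-1 templates \(a x + b y + c \ge 0\) with small integer coefficients and keep those that act as a quasi-invariant making \(\rho (x,y) = x+1\) a metering function.

```python
# Crude stand-in for the template-based search on the loop of Example 4:
#   tau1: x >= 0 /\ y > 0 -> x' = x-1, y' = y;  exits: x < 0 and y <= 0.
# We enumerate templates Q(x, y) = (a*x + b*y + c >= 0) with small integer
# coefficients and keep those making rho(x, y) = x + 1 a metering function.

from itertools import product

G = range(-8, 9)                                   # grid of integer states

def ok(a, b, c):
    Q = lambda x, y: a * x + b * y + c >= 0
    if not any(Q(x, y) for x in G for y in G):     # Q must be non-empty
        return False
    for x in G:
        for y in G:
            if not Q(x, y):
                continue
            if x >= 0 and y > 0 and not Q(x - 1, y):
                return False                       # not quasi-invariant
            if (x < 0 or y <= 0) and not x + 1 <= 0:
                return False                       # rho positive on an exit
    return True

sols = [t for t in product(range(-1, 2), repeat=3) if ok(*t)]
assert (-1, 1, -1) in sols      # Q = (x < y), as found in Example 5
assert (0, 1, -1) in sols       # Q = (y >= 1), which disables exit tau3
```

The real procedure replaces this enumeration by a Farkas-based SMT query, but the structure is the same: grow the template only when no valuation of the current one satisfies all conditions.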
Alternatively, we can use the approach of [17] to construct \(\mathcal{G} _\tau \), \(\mathcal{U} _\tau \), and \(\mathcal{Q} _\ell \) incrementally. This starts with templates of size 1, but instead of requiring all of (9)–(16) to hold, the conditions generated by (12) are marked as soft constraints (i.e., we accept solutions in which they do not hold), and MaxSMT is used to get a solution that satisfies as many of these soft conditions as possible. If all are satisfied, we are done; if not, we use the current solution to instantiate the templates, then add another template inequality to each of them and repeat the process. This means that at any given moment, each template includes at most one inequality with unknowns. Finally, to guarantee progress from one iteration to the next, soft conditions that hold at some iteration are required to hold at the next one, i.e., they become hard.
The use of (12) as a soft constraint is based on the observation [12] that when seeking a metering function, the problematic part is often to guarantee that it is negative on exit transitions, which is normally achieved by adding quasi-invariants that are incrementally inferred. By requiring (12) to be soft, we handle more exit transitions as the quasi-invariant gets stronger, until all are covered.
4.2 Better Quality Solutions
The precision can also be affected by the quality of the solution picked by the SMT solver for the corresponding \(\exists \) problem. Since there might be many metering functions that satisfy (9)–(16), we are interested in narrowing the search space of the SMT solver in order to find more accurate ones, i.e., ones that lead to longer executions. Next we present some techniques for this purpose.
Enabling More Loop Transitions. We are interested in guard narrowings that keep as many loop transitions as possible, since such narrowings are more likely to generate longer executions. This can be done by requiring the following to hold
We also use MaxSMT to require a solution that satisfies as many disjuncts as possible, thus eliminating fewer loop transitions (if \(\mathcal{Q} _\ell (\bar{x}) \wedge \mathcal{R} (\bar{x}) \wedge \mathcal{G} _\tau (\bar{x})\) is false for a transition \(\tau \), then it is actually disabled). Note that this condition can be used instead of (10), which requires the quasi-invariant to be nonempty.
Larger Metering Functions. We are interested in metering functions that lead to longer executions. One way to achieve this is to require metering functions to be ranking as well, i.e., in addition to (11) we require the following to hold
These new conditions are added as soft constraints, and we use MaxSMT to ask for a solution that satisfies as many conditions as possible.
Unbounded Metering Functions. We are interested in metering functions that do not have an upper bound, since otherwise they lead to constant lower-bound functions. For example, for a loop with a transition \(x\ge 0\wedge x'=x-1\), we want to avoid quasi-invariants like \(x\le 5\), which would make the metering function x bounded by 5. For this, we rely on the following lemma.
Lemma 1
A function \(\rho (\bar{x})=\bar{a}\cdot \bar{x}+a_0\) is unbounded over a polyhedron \(\mathcal {P}\), iff \(\bar{a}\cdot \bar{y}\) is positive on at least one ray \(\bar{y}\) of the recession cone of \(\mathcal {P}\).
It is known that, for a polyhedron \(\mathcal {P}\) given in constraint representation, its recession cone \(\mathtt {cone(\mathcal {P})}\) is the set specified by the constraints of \(\mathcal {P}\) after removing all free constants. Now we can use the above lemma to require that the metering function \(\rho _\ell (\bar{x})=\bar{a}\cdot \bar{x}+a_0\) be unbounded in the quasi-invariant \(\mathcal{Q} _\ell \) by requiring the following condition to hold
where \(\mathtt {cone(\mathcal{Q} _\ell )}\) is obtained from the template of \(\mathcal{Q} _\ell \) by removing all (unknowns corresponding to) free constants, i.e., it is the recession cone of \(\mathcal{Q} _\ell \).
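Lemma 1 can be illustrated on one-variable polyhedra (a hypothetical check of ours; in the actual encoding the cone is built from the template's unknowns, not from concrete constraints):

```python
# Lemma 1 on two tiny 1-D polyhedra, each given as constraints a*x <= b.
# The recession cone drops the free constants (a*x <= 0), and rho(x) = x is
# unbounded over P iff some ray y of the cone satisfies 1*y > 0.

def rays(cone_coeffs):
    # Candidate integer ray directions of {x : a*x <= 0 for all a}.
    return [y for y in (-1, 1) if all(a * y <= 0 for a in cone_coeffs)]

# P1 = {x <= 5}: cone is {x <= 0}, only ray -1, so rho(x) = x is bounded,
# matching the problematic quasi-invariant x <= 5 discussed above.
assert all(y <= 0 for y in rays([1]))

# P2 = {-x <= 0} (i.e. x >= 0): ray +1 exists, so rho(x) = x is unbounded.
assert any(y > 0 for y in rays([-1]))
```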
Note that all encodings discussed in this section generate nonlinear SMT problems, because they either correspond to \(\exists \forall \) problems that include templates on the lefthand side of implications, or to \(\exists \) problems over templates that include both program variables and unknowns.
Finally, it is important to note that the optimizations described above provide theoretical guarantees of obtaining better lower bounds: the one that adds (18,19) leads to a bound that corresponds exactly to the worst-case execution (of the specialized program), and the one that uses (20) is essential to avoid constant bounds.
5 Implementation and Experimental Evaluation
We have implemented a LOwerBound synthesizER, named LoBer, which can be used from an online web interface at http://costa.fdi.ucm.es/lober. LoBer is built as a pipeline with the following processes: (1) it first reads a KoAT file [5] and generates a corresponding set of multipath loops, by extracting parts of the TS that correspond to loops [25], applying unfolding, and inferring loop summaries to be used in the calling context of nested loops, as explained in Sect. 3.6; (2) it then encodes in SMT the conditions (9)–(13) defined throughout the paper, for each loop separately, by using template generation, a process that involves several nontrivial implementations using Farkas’ lemma (this part is implemented in Java and uses Z3 [8] for simple (linear) satisfiability checks when producing the MaxSMT encoding); (3) the problem is solved using the SMT solver Barcelogic [4], as it allows us to use nonlinear arithmetic and MaxSMT capabilities in order to assert soft conditions and implement the solutions described in Sect. 4; (4) in order to guarantee the correctness of our system's results, we have added to the pipeline an additional checker that proves, using Z3, that the obtained metering functions and quasi-invariants verify conditions (9)–(13). To empirically evaluate our approach, we have used benchmarks from the Termination Problem Data Base (TPDB), namely those from the category Complexity_ITS, which contains Integer Transition Systems. We have removed nonterminating TSs and terminating TSs whose cost is unbounded (i.e., the cost depends on some nondeterministic variables and can be arbitrarily high) or nonlinear, because they are outside the scope of our approach. In total, we have considered a set of 473 multipath loops, from which we have excluded 13 that were nonlinear. Analyzing these 473 programs took 199 min, an average of approximately 25 s per program. For 255 of them, it took less than 1 s.
Table 1 illustrates our results and compares them to those obtained by the LoAT system [12, 13], which also outputs a pair \((\rho ,\mathcal{Q})\) of a lower-bound function \(\rho \) and initial conditions \(\mathcal{Q} \) on the input for which \(\rho \) is a valid lower bound. In order to automatically compare the results obtained by the two systems, we have implemented a comparator that first expresses costs as functions \(f: \mathbb {N} \rightarrow \mathbb {R}_{\ge 0}\) over a single variable n and then checks which function is greater. To obtain this unary cost function from the results (\(\rho \), \(\mathcal{Q} \)), we use convex-polyhedra manipulation libraries to maximize the obtained cost \(\rho \) w.r.t. \(\mathcal{Q} \wedge \overline{n \le x_i \le n}\), where \(x_i\) are the TS variables, and express the maximized expression in terms of n. Therefore, f(n) represents the maximum cost when the variables are bounded by \(x_i \le n\) and satisfy the corresponding initial condition \(\mathcal{Q} \), a notion very similar to the runtime complexity used in [12, 13]. Once we have both unary linear costs \(f_1(n) = k_1n + d_1\) and \(f_2(n) = k_2n + d_2\), we compare them on \(n \ge 0\) by inspecting \(k_1\) and \(k_2\).
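The final comparison step is then elementary (a sketch of the comparator's last stage; the function name and result encoding are ours):

```python
# Compare two unary linear cost bounds f1(n) = k1*n + d1 and f2(n) = k2*n + d2
# for n >= 0: the slope decides, and the constant term breaks ties.

def compare(k1, d1, k2, d2):
    """Return '>', '<' or '=' for f1 vs f2 as n grows (n >= 0)."""
    if k1 != k2:
        return '>' if k1 > k2 else '<'
    if d1 != d2:
        return '>' if d1 > d2 else '<'
    return '='

assert compare(2, 0, 1, 100) == '>'   # 2n eventually dominates n + 100
assert compare(1, 3, 1, 3) == '='
assert compare(0, 7, 1, 0) == '<'     # constant vs linear
```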
Each row of the table contains the number of loops for which both tools obtain the same result (\(=\)), the number of loops where LoBer is better than LoAT (>), and the number of loops where LoAT is better than LoBer (<). The subcategories are obtained directly from the name of the innermost folder, except when such a folder contains too few examples, in which case we merge them all into a folder in the parent directory. The total number of loops considered in each subcategory appears in column Total. Brockschmidt_16 and Hark_20 have their first row empty, as all their results are contained in their subcategories. Globally, both tools behave the same on 412 programs (column “\(=\)”), obtaining equivalent linear lower bounds in 376 of them and a constant lower bound in the remaining ones. Our tool achieves better accuracy on 37 programs (column “>”), while LoAT is more precise on 11 programs (column “<”). Let us discuss the two sets of programs on which the tools differ. As regards the 37 examples for which we get better results, LoAT crashes in 4 cases and can only find a constant lower bound in 1 example, while our tool is able to find a path of linear length by introducing the necessary quasi-invariants. For the remaining 32 loops, both tools get a linear bound, but LoBer finds one that leads to an unboundedly longer execution: 18 of these loops correspond to cases that have implicit relations between the different execution paths (like our running examples) and require semantic reasoning; for the remaining 14, we get a better set of quasi-invariants. The following techniques have been needed to get these results in the 37 better cases (note that (i) is not mutually exclusive with the others):

(i)
1 needs narrowing of nondeterministic choices,

(ii)
5 need neither quasi-invariants nor guard narrowing,

(iii)
14 need quasi-invariants only,

(iv)
18 need both quasi-invariants and guard narrowing (in 3 of them this is only used to disable transitions).
Therefore, this shows experimentally the relevance of all components of our framework and its practical applicability, thanks to the good performance of the MaxSMT solver on nonlinear arithmetic problems. Over the whole set of programs, we can solve 308 examples without quasi-invariants and 444 without guard narrowing. The intersection of these two sets contains 298 examples (63% of the programs), which leaves 175 programs that need some of the proposed techniques to be solved.
As regards the 11 examples for which we get worse results than LoAT, we have two situations: (1) In 6 cases, the SMT solver is not able to find a solution. We noticed that too many quasi-invariants were required, which made the SMT problem too hard. To improve our results, we could start, as a preprocessing step, from a quasi-invariant that includes all invariant inequalities that syntactically appear in the loop transitions, similar to what is done by LoAT when inferring what they call a conditional metering function [12]. This is left for future experimentation. (2) In the other 5 cases, our tool finds a linear bound but with a worse set of quasi-invariants, so that LoAT's bound leads to unboundedly longer executions. We are investigating whether this can be improved by adding new soft constraints that guide the solver towards these better solutions. Finally, let us mention that, for the 13 problems for which LoAT gives a nonlinear bound and which have been excluded from our benchmarks as justified above, we get a linear bound for the 12 that have a polynomial bound (of degree 2 or more), and a constant bound for the one that has a logarithmic lower bound. This is the best we can obtain, as our approach focuses on the inference of precise local linear bounds, since they constitute the most common type of loops.
All in all, we argue that our experimental results are promising: we triple the number of benchmarks for which more accurate results are obtained (37 vs. 11) and, besides, many of those examples correspond to complex loops that lead to worse results when disconnecting transitions. Moreover, we see room for further improvement, as most examples on which LoAT outperforms us could be handled as accurately with better quasi-invariants (which are somewhat of a black-box component in our framework). Syntactic strategies that use invariant inequalities appearing in the transitions, like those used in LoAT, would help, as would further improvements in SMT nonlinear arithmetic.
Application Domains. The accuracy gains obtained by LoBer have applications in several domains in which knowing the precise cost can be fundamental. This is the case for predicting the gas usage [26] of executing smart contracts, where gas cost amounts to monetary fees. The caller of a transaction needs to include a gas limit to run it. Setting the gas limit too low can end in an “out of gas” exception, and setting it too high can end in a “not enough eth (money)” error. Therefore, a tighter prediction is needed to be safe on both sides. Also, when the UB is equal to the LB, we have an exact estimation, e.g., we would know precisely the runtime or memory consumption of the most costly executions. This can be crucial in safety-critical applications, and has been used as well to detect potential vulnerabilities such as denial-of-service attacks. In https://apps.dtic.mil/sti/pdfs/AD1097796.pdf, vulnerabilities are detected in situations in which the two bounds do not coincide. For instance, in password-verification programs, if the UB and LB differ due to a difference in the delays associated with how many characters of the guessed password are correct, this is identified as a potential attack.
6 Related Work and Conclusions
We have proposed a novel approach to synthesize precise lower bounds for nondeterministic integer programs. The main novelties are the use of loop specialization to facilitate the task of finding a (precise) metering function and the Max-SMT encoding to find larger (better) solutions. Our work is related to two lines of research: (1) non-termination analysis and (2) LB inference. In both kinds of analysis, one aims at finding classes of inputs for which the program features a non-terminating behavior (1) or a cost-expensive behavior (2). Therefore, techniques developed for non-termination might provide a good basis for developing a LB analysis. In this sense, our work exploits ideas from the Max-SMT approach to non-termination in [17]. The main idea borrowed from [17] is the use of quasi-invariants to specialize loops towards the desired behavior: in our case, towards the search for a metering function; in theirs, towards the search for a non-termination proof. However, there are fundamental differences, since we have proposed other new forms of loop specialization (see a more detailed comparison in Sect. 1) and have been able to adapt the use of Max-SMT to accurately solve our problem (i.e., find larger bounds). As mentioned in Sect. 1, our loop specialization technique can be used to gain precision in non-termination analysis [17]. For instance, in this loop: " " no sub-SCC (considering only one of the transitions) is non-terminating, and no quasi-invariant can be found to ensure we stay in the loop (when considering both transitions); hence, it cannot be handled by [17]. Instead, if we narrow the transitions by adding \(y \ge x\) to the if-condition (and hence \(x > y\) to the else), we can prove that \(x \ge 0 \wedge y \ge 0 \wedge x + y = 1\) is a quasi-invariant, which allows us to prove non-termination in the way of [17] (as we will stay in the loop forever).
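To make the narrowing argument concrete, the sketch below uses a hypothetical pair of narrowed transitions (the paper's actual loop body is elided above, so these updates are our own illustration): transition A fires when y >= x, transition B when x > y, and from any state satisfying x >= 0, y >= 0, x + y = 1 the quasi-invariant is preserved and one transition is always enabled, so the loop never exits.

```python
def step(state):
    """One iteration of a hypothetical narrowed loop:
    transition A (enabled when y >= x): (x, y) -> (x + 1, y - 1)
    transition B (enabled when x > y):  (x, y) -> (x - 1, y + 1)
    Each transition alone terminates (it eventually disables itself),
    but their interleaving under the quasi-invariant does not."""
    x, y = state
    return (x + 1, y - 1) if y >= x else (x - 1, y + 1)

def quasi_invariant(state):
    x, y = state
    return x >= 0 and y >= 0 and x + y == 1

# From (0, 1) the quasi-invariant is preserved at every step (here we
# check a long finite prefix), which is the non-termination argument of [17].
state = (0, 1)
always_inside = True
for _ in range(1000):
    state = step(state)
    always_inside = always_inside and quasi_invariant(state)
```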
As regards LB inference, the current state-of-the-art is the work by Frohn et al. [12, 13], which introduces the notions of metering function and acceleration. Our work indeed tries to recover the semantic loss in [12, 13] due to defining metering functions for simple loops and combining them in a later stage using acceleration. Technically, we only share with this work the basic definition of metering function in Sect. 3.1. Indeed, the definition in conditions (3) and (4) already generalizes the one in [12, 13], since it is not restricted to simple loops. This definition is improved in the following sections with several loop specializations. While [12, 13] relies on pure SMT to solve the problem, we propose to gain precision using Max-SMT. We believe that similar ideas could be adapted by [12, 13]. Due to the different technical approaches underlying both frameworks, their accuracy and efficiency must be compared experimentally w.r.t. the LoAT system that implements the ideas in [12, 13]. We argue that the results in Sect. 5 justify the important gains of using our new framework and prove experimentally that the fact that we do not lose semantic relations in the search for metering functions is key to infer LBs for challenging cases in which [12, 13] fails. Originally, the LoAT system [12, 13] only accelerated simple loops by using metering functions, so the overall precision of the lower bound relied on obtaining valid and precise metering functions. However, the framework in [12, 13] is independent of the acceleration technique applied. In order to increase the number of simple loops that can be accelerated, Frohn [11] proposes a calculus to combine different conditional acceleration techniques (monotonic increase/decrease, eventual increase/decrease, and metering functions).
These conditional acceleration techniques assume that all the iterations of the loop verify some condition \(\varphi \); the calculus applies the techniques in order and extracts those conditions \(\varphi \) from fragments of the loop guard. Although more precise and powerful, the combined acceleration calculus considers only simple loops, so it does not solve the precision loss when the loop cost involves several interleaved transitions. Moreover, the techniques in [11] are integrated into LoAT, so the experimental evaluation in Sect. 5 compares our approach to the framework in [12, 13] extended with several techniques to accelerate loops (not only metering functions).
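As a concrete instance of acceleration in its simplest (unconditional) form, the loop `while x < n: x = x + 1` increases `x` monotonically by 1, so its iteration count admits the closed form `max(n - x, 0)`. The toy sketch below (our own illustration, not LoAT's machinery) checks the accelerated count against direct simulation:

```python
def simulate(x, n):
    """Run the loop  while x < n: x += 1  and count its iterations."""
    count = 0
    while x < n:
        x += 1
        count += 1
    return count

def accelerated(x, n):
    """Closed-form iteration count obtained by accelerating the loop:
    x increases monotonically by 1, so the loop runs max(n - x, 0) times."""
    return max(n - x, 0)

# The accelerated formula agrees with the simulation on a grid of inputs.
results = [(simulate(x, n), accelerated(x, n))
           for x in range(-3, 5) for n in range(-3, 5)]
```

Conditional acceleration generalizes this by first assuming a condition extracted from the guard and only then deriving such a closed form.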
Finally, our approach bears similarities to the CTL* verification for ITSs in [7], as both extend the transition guards of the original ITS. The difference is that in [7] the added constraints only contain newly created prophecy variables and the transitions to modify are detected directly using graph algorithms, whereas our SMT-based approach adds constraints only over existing variables so as to satisfy the properties that characterize a good metering function. Additionally, the two approaches differ both in their goal (CTL* verification vs. inference of lower bounds) and in the technologies applied (CTL* model checkers vs. Max-SMT solvers).
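The properties that characterize a good metering function, referred to throughout this section, can be checked by brute force on a toy instance: for a simple loop, each guarded step may decrease the candidate f by at most 1, and f must be non-positive outside the guard; a valid f then lower-bounds the number of iterations. The loop and candidates below are our own illustration, not the paper's benchmarks:

```python
def is_metering(f, guard, step, states):
    """Check the two metering conditions on a finite state sample:
    (1) if guard(s) holds, then f(s) - f(step(s)) <= 1, and
    (2) if guard(s) fails, then f(s) <= 0.
    A valid f(s0) lower-bounds the iterations of the loop from s0."""
    for s in states:
        if guard(s):
            if f(s) - f(step(s)) > 1:
                return False
        elif f(s) > 0:
            return False
    return True

# Toy loop: while x > 0: x = x - 1, with candidate f(x) = x.
states = range(-3, 10)
ok = is_metering(lambda x: x, lambda x: x > 0, lambda x: x - 1, states)
# f(x) = 2*x over-counts: it drops by 2 per step, violating condition (1).
bad = is_metering(lambda x: 2 * x, lambda x: x > 0, lambda x: x - 1, states)
```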
Notes
 1.
Because we assumed that constraints involving primed variables are of the form \(x_i'=\bar{a}\cdot \bar{x}+\bar{b}\cdot \bar{u}+c\).
References
Albert, E., Arenas, P., Genaim, S., Puebla, G.: Automatic inference of upper bounds for recurrence relations in cost analysis. In: Alpuente, M., Vidal, G. (eds.) SAS 2008. LNCS, vol. 5079, pp. 221–237. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69166-2_15
Albert, E., Arenas, P., Genaim, S., Puebla, G., Zanardini, D.: Cost analysis of Java bytecode. In: De Nicola, R. (ed.) ESOP 2007. LNCS, vol. 4421, pp. 157–172. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71316-6_12
Albert, E., Genaim, S., Masud, A.N.: On the inference of resource usage upper and lower bounds. ACM Trans. Comput. Logic 14(3), 1–35 (2013)
Bofill, M., Nieuwenhuis, R., Oliveras, A., Rodríguez-Carbonell, E., Rubio, A.: The Barcelogic SMT solver. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 294–298. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70545-1_27
Brockschmidt, M., et al.: Analyzing runtime and size complexity of integer programs. ACM Trans. Program. Lang. Syst. 38(4), 1–50 (2016)
Colón, M.A., Sankaranarayanan, S., Sipma, H.B.: Linear invariant generation using non-linear constraint solving. In: Hunt, W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 420–432. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45069-6_39
Cook, B., Khlaaf, H., Piterman, N.: On automation of CTL* verification for infinite-state systems. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 13–29. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_2
de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
Debray, S.K., López-García, P., Hermenegildo, M., Lin, N.W.: Lower bound cost estimation for logic programs. In: Logic Programming. The MIT Press (1997)
Flores-Montoya, A.: Upper and lower amortized cost bounds of programs expressed as cost relations. In: Fitzgerald, J., Heitmeyer, C., Gnesi, S., Philippou, A. (eds.) FM 2016. LNCS, vol. 9995, pp. 254–273. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48989-6_16
Frohn, F.: A calculus for modular loop acceleration. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2020), pp. 58–76. Springer International Publishing (2020)
Frohn, F., Naaf, M., Brockschmidt, M., Giesl, J.: Inferring lower runtime bounds for integer programs. ACM Trans. Program. Lang. Syst. 42(3), 1–50 (2020)
Frohn, F., Naaf, M., Hensel, J., Brockschmidt, M., Giesl, J.: Lower runtime bounds for integer programs. In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS (LNAI), vol. 9706, pp. 550–567. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40229-1_37
Gulwani, S., Mehra, K.K., Chilimbi, T.: SPEED: precise and efficient static estimation of program computational complexity. In: Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009. ACM Press (2009)
Hark, M., Kaminski, B.L., Giesl, J., Katoen, J.-P.: Aiming low is harder: induction for lower bounds in probabilistic program verification. Proc. ACM Program. Lang. 4(POPL), 1–28 (2020)
Hoffmann, J., Shao, Z.: Automatic static cost analysis for parallel programs. In: Vitek, J. (ed.) ESOP 2015. LNCS, vol. 9032, pp. 132–157. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46669-8_6
Larraz, D., Nimkar, K., Oliveras, A., Rodríguez-Carbonell, E., Rubio, A.: Proving non-termination using Max-SMT. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 779–796. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_52
Ngo, V.C., Carbonneaux, Q., Hoffmann, J.: Bounded expectations: resource analysis for probabilistic programs. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). ACM (2018)
Otto, C., Brockschmidt, M., von Essen, C., Giesl, J.: Automated termination analysis of Java Bytecode by term rewriting. In: Lynch, C. (ed.) RTA 2010. LIPIcs, vol. 6, pp. 259–276. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2010)
Schrijver, A.: Theory of Linear and Integer Programming. Wiley, Chichester (1986)
Sinn, M., Zuleger, F., Veith, H.: A simple and scalable static analysis for bound analysis and amortized complexity analysis. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 745–761. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_50
Sinn, M., Zuleger, F., Veith, H.: Complexity and resource bound analysis of imperative programs using difference constraints. J. Autom. Reasoning 59(1), 3–45 (2017). https://doi.org/10.1007/s10817-016-9402-4
Srivastava, S., Gulwani, S.: Program verification using templates over predicate abstraction. ACM SIGPLAN Not. 44(6), 223–234 (2009)
Wegbreit, B.: Mechanical program analysis. Commun. ACM 18(9), 528–539 (1975)
Wei, T., Mao, J., Zou, W., Chen, Y.: A new algorithm for identifying loops in decompilation. In: Nielson, H.R., Filé, G. (eds.) SAS 2007. LNCS, vol. 4634, pp. 170–183. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74061-2_11
Wood, G.: Ethereum: a secure decentralised generalised transaction ledger (2014)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Copyright information
© 2021 The Author(s)
Cite this paper
Albert, E., Genaim, S., Martin-Martin, E., Merayo, A., Rubio, A. (2021). Lower-Bound Synthesis Using Loop Specialization and Max-SMT. In: Silva, A., Leino, K.R.M. (eds.) Computer Aided Verification. CAV 2021. Lecture Notes in Computer Science, vol. 12760. Springer, Cham. https://doi.org/10.1007/978-3-030-81688-9_40