Complexity and Resource Bound Analysis of Imperative Programs Using Difference Constraints

Difference constraints have been used for termination analysis in the literature, where they denote relational inequalities of the form x′ ≤ y + c, and describe that the value of x in the current state is at most the value of y in the previous state plus some constant c ∈ Z. We believe that difference constraints are also a good choice for complexity and resource bound analysis because the complexity of imperative programs typically arises from counter increments and resets, which can be modeled naturally by difference constraints. In this article we propose a bound analysis based on difference constraints. We make the following contributions: (1) Our analysis handles bound analysis problems of high practical relevance which current approaches cannot handle: we extend the range of bound analysis to a class of challenging but natural loop iteration patterns which typically appear in parsing and string-matching routines. (2) We advocate the idea of using bound analysis to infer invariants: our soundness-proven algorithm obtains invariants through bound analysis; the inferred invariants are in turn used for obtaining bounds. Our bound analysis therefore does not rely on external techniques for invariant generation. (3) We demonstrate that difference constraints are a suitable abstract program model for automatic complexity and resource bound analysis: we provide efficient abstraction techniques for obtaining difference constraint programs from imperative code.
(4) We report on a thorough experimental comparison of state-of-the-art bound analysis tools: we set up a tool comparison on (a) a large benchmark of real-world C code, (b) a benchmark built of examples taken from the bound analysis literature and (c) a benchmark of challenging iteration patterns which we found in real source code. (5) Our analysis is more scalable than existing approaches: we discuss how we achieve scalability. Electronic supplementary material: The online version of this article (doi:10.1007/s10817-016-9402-4) contains supplementary material, which is available to authorized users.


Introduction
Automated program analysis for inferring program complexity and resource bounds is a very active area of research. Amongst others, approaches have been developed for analyzing functional programs [16], C# [15], C [2,7,29,35], Java [1] and Integer Transition Systems [6,10]. Below we sketch applications in the areas of verification and program understanding. For additional motivation we refer the reader to the cited papers.
Verification In many applications such as embedded systems there is a hard constraint on the availability of resources such as CPU time, memory, bandwidth, etc. It is an important part of functional correctness that programs stay within their given resource limits. As a concrete example we mention that considerable effort has been invested to analyze the worst case execution time (WCET) of hard real-time systems [33]. Another application domain is security, where the goal is to derive a bound on how much secret information is leaked in order to decide whether this leakage is acceptable [31].
Static Profiling and Program Understanding Standard profilers report numbers such as how often certain program locations are visited and how much time is spent inside certain functions; however, no information is provided on how these numbers are related to the program input. Recently, new profiling approaches have been proposed that apply curve-fitting techniques for deriving a cost function, which relates size measures on the program input to the measured program performance [9,34]. We believe that automated complexity and resource bound analysis lends itself naturally as a static profiling technique, because it provides the user with a symbolic expression that relates the program performance to the program input. In the same way, complexity and resource bound analysis can be used to explore unfamiliar code or to annotate library functions by their performance characteristics; we note that a substantial number of performance bugs can be attributed to a "wrong understanding of API performance features" [22].
As a final remark we discuss the relationship to termination analysis, which has been intensively studied in the last decade in the computer-aided verification community: complexity and resource bound analysis can be understood as a quantitative variant of termination analysis, where not only a qualitative "yes" answer is provided, but also a symbolic upper bound on the run-time of the program.
Difference constraints (DCs) have been introduced by Ben-Amram for termination analysis in [4], where they denote relational inequalities of the form x ≤ y + c, and describe that the value of x in the current state is at most the value of y in the previous state plus some constant c ∈ Z. We call a program whose transitions are given by a set of difference constraints a difference constraint program (DCP).
We advocate the use of DCs for program complexity and resource bound analysis. Our key insight is that DCs provide a natural abstraction of the standard manipulations of counters in imperative programs: counter increments and decrements, i.e., x := x + c, resp. resets, i.e., x := y, can be modeled by the DCs x ≤ x + c resp. x ≤ y (see Sect. 6 on program abstraction). The approach we discuss in this article exploits the expressive strength of DCs and distinguishes between counter resets and counter increments in the reasoning. In contrast, previous approaches [1,2,6,10,15,29,35] to bound analysis are not able to track increments and resets on the same level of precision and therefore often fail to infer tight bounds for a class of nested loop constructs which we identified during our experiments on real-world code (demonstrated by our experimental evaluation in Sect. 8.3). In this article we make the following contributions:
1. Our analysis handles bound analysis problems of high practical relevance which current approaches cannot handle: we extend the range of bound analysis to a class of challenging but natural loop iteration patterns which typically appear in parsing and string-matching routines, as we discuss in Sect. 2. At the same time our analysis is general and can handle most of the bound analysis problems which are discussed in the literature. Both claims are supported by our experiments.
2. We advocate the idea of using bound analysis to infer invariants: we state a clear and concise formulation of invariant analysis by bound analysis on the basis of our abstract program model. Our soundness-proven algorithm (Sect. 3) obtains invariants through bound analysis; the inferred invariants are in turn used for obtaining bounds. Our bound analysis therefore does not rely on external techniques for invariant generation.
3.
We demonstrate that difference constraints are a suitable abstract program model for automatic complexity and resource bound analysis: we develop appropriate techniques for abstracting imperative programs to DCPs in Sect. 6.
4. We report on a thorough experimental comparison of state-of-the-art bound analysis tools (Sect. 8): we set up a tool comparison on (a) a large benchmark of real-world C code (Sect. 8.1), (b) a benchmark built of examples taken from the bound analysis literature (Sect. 8.2) and (c) a benchmark of challenging iteration patterns which we found in real source code (Sect. 8.3).
5. We have designed our analysis with the goal of scalability: our experiments demonstrate that our implementation outperforms the state of the art with respect to scalability. We give a detailed discussion on how we achieve scalability in Sect. 10.
This article is an extension of the conference version presented at FMCAD 2015 [27]. Besides making the material more accessible through additional explanations and discussions, it adds the following contributions: (1) a discussion on the instrumentation of our analysis for resource bound analysis (Sect.

Motivation and Related Work
Example xnuSimple stated in Fig. 1 is representative for a class of loops that we found in parsing and string-matching routines during our experiments. In these loops the inner loop iterates over disjoint partitions of an array or string, where the partition sizes are determined by the program logic of the outer loop. For an illustration of this iteration scheme see Example xnu in Fig. 9 (Sect. 7), which contains a snippet of the source code after which we have modeled Example xnuSimple.

Fig. 1 Example xnuSimple and the abstracted DCP of xnuSimple:

void xnuSimple(uint n) {
  int x = n;
  int r = 0;
l1: while (x > 0) {
      x = x - 1;
      r = r + 1;
l2:   if (*) {
        int p = r;
l3:     while (p > 0)
          p--;
        r = 0;
      }
l4: }
}

Complexity: TB(τ6) + TB(τ3) = n + n = 2n

Example xnuSimple has the linear complexity 2n (we define complexity here as the total number of loop iterations; for alternative definitions see the discussion in Sect. 2.2), because the inner loop as well as the outer loop can be iterated at most n times (as argued in the next paragraph). In the following, we give an overview of how our approach infers the linear complexity for Example xnuSimple: 1. Program Abstraction: We abstract the program to a DCP over N as shown in Fig. 1. The abstract variable [x] represents the program expression max(x, 0). We discuss our algorithm for abstracting imperative programs to DCPs based on symbolic execution in Sect. 6. 2. Finding Local Bounds: We identify [x] as a local bound for the transitions τ1, τ2, τ4, τ5, τ6. 3. Bound Analysis: Our algorithm (stated in Sect. 3) computes transition bounds, i.e., (symbolic) upper bounds on the number of times program transitions can be executed, and variable bounds, i.e., (symbolic) upper bounds on variable values. For both types of bounds, the main idea of our algorithm is to reason about how much and how often the value of the local bound resp. the variable value may increase during a program run.
Our algorithm is based on a mutual recursion between variable bound analysis ("how much", function VB(v)) and transition bound analysis ("how often", function TB(τ)). Next, we give an intuition of how our algorithm computes transition bounds for τ ∈ {τ1, τ2, τ4, τ5, τ6}.

Fig. 2 Example twoSCCs and the abstracted DCP of twoSCCs.

Invariants and Bound Analysis
We motivate the need for invariants in bound analysis and sketch how our algorithm infers invariants by bound analysis. Consider Example twoSCCs in Fig. 2. It is easy to infer x as a bound for the possible number of iterations of the loop at l3. However, in order to obtain a bound in terms of the function parameters, the difficulty lies in finding an invariant of the form x ≤ expr(n, m1, m2), where expr(n, m1, m2) denotes an expression over the function parameters n, m1, m2. We show how our algorithm obtains the invariant x ≤ max(m1, m2) + 2n by means of bound analysis: our algorithm computes a transition bound for the loop at l3 (with the single transition τ5); note that [n] = n, [m1] = m1 and [m2] = m2 because n, m1, m2 have type unsigned. We point out the mutual recursion between TB and VB: TB(τ5) has called VB(x), which in turn called TB(τ3). We highlight that the variable bound VB(x) (corresponding to the invariant x ≤ max(m1, m2) + 2n) has been established during the computation of TB(τ5).
We call the kind of invariants that our algorithm infers upper bound invariants (Definition 6). We compare our reasoning to classical invariant analysis in Sect. 2.3.

Resource Bound Analysis
We briefly discuss how resource bound analysis can be naturally formulated within our framework. We introduce a fresh variable c and add the initialization c = 0 to the beginning of the program under scrutiny. We add an increment/decrement c = c + k at every program location where a resource of cost k is consumed (k is positive) or freed (k is negative). Resource bound analysis is then equivalent to computing an upper bound on the value of the variable c: we run our algorithm VB(c) to compute a symbolic upper bound for c.
In the same way we can encode related bound analysis problems: reachability bounds [15] (visits to a single location), visits to multiple transitions, loop bounds or complexity analysis.
For each of these bound analysis problems one can add a counter increment at the program locations of interest.
We illustrate the suggested encoding on the problem of computing loop bounds: for a given loop we add increments of the counter variable c to every back edge of the loop. Calling V B(c) then returns the sum of the transition bounds of all back edges of the loop. This example also illustrates how transition bounds are used for computing variable bounds in our approach.

Related Work
Termination In [4] it is shown that termination of DCPs is undecidable in general but decidable for the natural syntactic subclass of fan-in free DCPs (see Definition 12), which is the class of DCPs we use in this article. It is an open question for future work whether there is a complete algorithm for bound analysis of fan-in free DCPs.
Bound Analysis In [35] a bound analysis based on so-called size-change constraints x ◁ y is proposed, where ◁ ∈ {<, ≤}. Size-change constraints form a strict syntactic subclass of DCs. However, termination is decidable even for size-change programs that are not fan-in free, and a complete algorithm for deciding the complexity of size-change programs has been developed [8]. For reasoning about inner loops, [35] computes disjunctive loop summaries; such summaries are not computed by the approach discussed in this work.
In [29] a bound analysis based on constraints of the form x ≤ x + c is proposed, where c is either an integer or a symbolic constant. Because the constraints in [29,35] cannot model both increments and resets, the resulting bound analyses cannot infer the linear complexity of Example xnuSimple and need to rely on external techniques for invariant analysis.
The COSTA project (e.g. [1]) obtains recurrence relations from so-called cost equations using invariant analysis based on the polyhedra abstract domain and approaches from the literature for synthesizing linear ranking functions. Closed-form solutions for the obtained recurrence relations are inferred by means of computer algebra.
The technique discussed in [10] is based on the COSTA approach and is formulated in terms of cost equations. Further, [10] is inspired by the counter-instrumentation-based approach of [14] and applies the techniques of [3,25] for inferring linear ranking functions. The technique of [10] achieves high precision of the inferred bounds by means of control-flow refinement (see also Ref. [11]).
The technique discussed in [2] over-approximates the reachable states by abstract interpretation based on the polyhedra abstract domain. This information is used for generating a linear constraint problem from which a multi-dimensional linear ranking function is obtained. A bound on the number of values which can be taken by the ranking function is then obtained from the previously computed approximation of the reachable states. Importantly, the number of dimensions of the ranking function determines the degree of the bound polynomial. The approach of [2] therefore aims at inferring a ranking function with a minimal number of dimensions and thus depends on a minimal solution to the linear constraint problem which is obtained by linear optimization (Technique [2] instruments the LP-solver with an objective function).
The technique discussed in [6] applies approaches from the literature for synthesizing ranking functions thereby inferring bounds on the number of times the execution of isolated program parts can be repeated. These bounds, called time bounds, are then used to compute bounds on the absolute value of variables, so-called variable size bounds. Additional information is inferred through abstract interpretation based on the octagon abstract domain. An overall complexity bound is deduced by alternating between time bound and variable size bound analysis. In each alternation bounds for larger program parts are obtained based on the previously computed information.
Amortized Complexity Analysis We note that inferring the linear complexity 2n for Example xnuSimple, even though the inner loop can already be iterated n times within one iteration of the outer loop, is an instance of amortized complexity analysis [32]: the cost of executing the inner loop, averaged over all n iterations of the outer loop, is 1. Most previous approaches [1,6,10,15,29,35] can establish only a quadratic bound for Example xnuSimple. A typical reasoning which fails to establish the linear complexity of Example xnuSimple is as follows: (1) the outer loop can be iterated at most n times, (2) the inner loop can be iterated at most n times within one iteration of the outer loop (because the inner loop has a local loop bound p and p ≤ n is an invariant), (3) the loop bound n^2 is obtained from (1) and (2) by multiplication. The recent paper [7] discusses an interesting alternative for amortized complexity analysis of imperative programs: a system of linear inequalities is derived using Hoare-style proof rules. Solutions to the system represent valid linear resource bounds. Since bound analysis typically does not aim at an arbitrary bound but tries to infer a tight bound, Ref. [7] uses linear optimization (an LP-solver instrumented by an objective function) in order to obtain a minimal solution to the problem. Interestingly, Ref. [7] is able to compute the linear bound for l3 of Example xnuSimple but fails to deduce the bound for the original source code (discussed in Sect. 7). Moreover, Ref. [7] is restricted to linear bounds, while our approach derives bounds which are polynomial (see, e.g., the results in Table 12) and which contain the maximum operator (e.g., Example twoSCCs). We compare our implementation to the implementation of Ref. [7] in Sect. 8.

Invariants and Bound Analysis
The powerful idea of expressing locally computed bounds in terms of the function parameters by alternating between bound analysis and variable upper bound analysis has previously been applied in [6,12,28]. Since Refs. [12,28] do not give a general algorithm but deal with specific cases, we focus our discussion on [6] and highlight some important differences. The technique discussed in [6] computes upper bound invariants only for the absolute values of variables; in many cases, this does not allow one to distinguish between variable increments and decrements: consider the program foo(int x, int y) {while(y > 0) {x--; y--;} while(x > 0) x--;}. The algorithm described in [6] infers the bound |x| + |y| for the second loop, whereas our analysis infers the bound max(x, 0). The approach of [6] depends on global invariant analysis: e.g., given a decrement x := x - 1, the technique of [6] needs to check whether x ≥ 0 holds. If x ≥ 0 cannot be ensured, the decrement can actually increment the absolute value of x, and will thus be interpreted as |x| = |x| + 1. This can either lead to gross over-approximations or to failure of bound computation if the increment of |x| cannot be bounded. Since our approach does not track the absolute value but the value itself, it is not concerned with this problem. The technique discussed in [6] does not support amortized analysis: e.g., it fails to compute the linear bounds for Example xnuSimple (Fig. 1), Example xnu (Fig. 9) and other examples we discuss in this article (see also the results in Sect. 8.3). On the other hand, Ref. [6] can infer bounds for functions with multiple recursive calls, which is not supported by the analysis we present in this article.

Comparison to Invariant Analysis
We contrast our previously discussed approach for computing a bound for the loop at l3 of Example xnuSimple with classical invariant analysis: assume that we have added a counter c which counts the number of inner loop iterations (i.e., c is initialized to 0 and incremented in the inner loop). For inferring c ≤ n through invariant analysis, the invariant c + x + r ≤ n is needed for the outer loop, and the invariant c + x + p ≤ n for the inner loop. Both relate three variables and cannot be expressed as (parametrized) octagons (e.g., [26]). Further, the expressions c + x + r and c + x + p do not appear in the program, which is challenging for template-based approaches to invariant analysis.
We now contrast our variable bound analysis (function VB) with classical invariant analysis: reconsider Example twoSCCs in Fig. 2. We have discussed how our algorithm obtains the invariant x ≤ max(m1, m2) + 2n by means of bound analysis in the course of computing a bound for the loop at l3. Note that the invariant x ≤ max(m1, m2) + 2n cannot be computed by standard abstract domains such as octagon or polyhedra: these domains are convex and cannot express non-convex relations such as the maximum. The most precise approximation of x in the polyhedra domain is x ≤ m1 + m2 + 2n. Unfortunately, it is well known that the polyhedra abstract domain does not scale to larger programs and needs to rely on heuristics for termination. Standard abstract domains such as octagon or polyhedra propagate information forward until a fixed point is reached, greedily computing all invariants expressible in the abstract domain at every location of the program. In contrast, our method VB(x) infers the invariant x ≤ max(m1, m2) + 2n by modular reasoning: local information about the program (i.e., local bounds and increments/resets of variables) is combined into a global program property. Moreover, our variable and transition bound analysis is demand-driven: our algorithm performs only those recursive calls that are indeed needed to derive the desired bound. We believe that our analysis complements existing techniques for invariant analysis and will find applications outside of bound analysis.

Program Model and Algorithm
In this section we present our algorithm for computing worst-case upper bounds on the number of executions of a given transition (transition bound) and on the value of a given program expression (variable bound and upper bound invariant). Definition 1 (Program) Let Σ be a set of states. A program over Σ is a directed labeled graph P = (L, T, lb, le), where L is a finite set of locations, lb ∈ L is the entry location, le ∈ L is the exit location and T ⊆ L × 2^(Σ×Σ) × L is a finite set of transitions. We write l1 −λ→ l2 to denote a transition (l1, λ, l2) ∈ T. We call λ ∈ 2^(Σ×Σ) a transition relation. Note that a run of P = (L, T, lb, le) starts at location lb. Further note that we call an edge l1 −λ→ l2 ∈ T of the program a transition, whereas λ is its transition relation. In the following we will refer to transitions by τ and to transition relations by λ.
Transition bounds are at the core of our analysis: we infer bounds on the number of loop iterations, on computational complexity, on resource consumption, etc., by computing bounds on the number of times that one or several transitions can be executed. Before we formally define our notion of a transition bound we have to introduce some notation.
Let ρ = σ0 → σ1 → · · · be a run of P. By ♯(τ, ρ) we denote the number of times that τ occurs on ρ.
In the following, we denote by '∞' a value s.t. a < ∞ for all a ∈ Z (infinity).
Definition 3 (Transition Bound) Let ρ be a run of P and let τ ∈ T. A value b ∈ N0 ∪ {∞} is a bound for τ on ρ iff τ appears not more than b times on ρ. A function b : Σ → N0 ∪ {∞} is a bound for τ iff for all runs ρ of P it holds that b(σ0) is a bound for τ on ρ, where σ0 denotes the initial state of ρ.
Given a program transition τ , our bound algorithm (which we define below) computes a bound for τ . If possible, the bound computed by our algorithm should be precise or tight, in particular the trivial bound Σ → ∞ is (most often) of no value to us.

Definition 4 (Precise Transition Bound)
Let P(L, T, lb, le) be a program over states Σ. Let τ ∈ T. Informally, a transition bound b : Σ → N0 ∪ {∞} for τ is precise if for every initial state σ0 the value b(σ0) can actually be reached on some run. Note that there is exactly one precise transition bound.

Definition 5 (Tight Transition Bound)
Let P(L, T, lb, le) be a program over states Σ. Let τ ∈ T. Informally, a transition bound b is tight if it is in the same asymptotic class as the precise transition bound. For the special case Σ = N we have the following: let f : N → N denote the precise transition bound for τ, and let g : N → N be some transition bound for τ. Trivially f ∈ O(g) (f does not grow faster than g). Now, g is tight if also f ∈ Ω(g) (f does not grow asymptotically slower than g). With f ∈ O(g) and f ∈ Ω(g) we have that f ∈ Θ(g). The same can be formulated for general state sets Σ by mapping Σ to the natural numbers.
We discussed in Sect. 2.1 that, in the course of computing transition bounds, our analysis computes invariants of a special shape; we call these invariants upper bound invariants (Definition 6). We now formally define the notion of a local bound that we motivated in Sect. 2. Let ρ = σ0 → σ1 → · · · be a run of P and let e : Σ → Z be a norm. By ↓(e, ρ) we denote the number of times that the value of e decreases on ρ, i.e., ↓(e, ρ) = |{i | e(σi) > e(σi+1)}|.

Definition 8 (Norm) Let Σ be a set of states.
A norm e : Σ → Z over Σ is a function that maps the states to the integers. Definition 9 (Local Bound) Let P(L , T, l b , l e ) be a program over Σ. Let τ ∈ T . Let e : Σ → N be a norm that takes values in the natural numbers.
Let ρ be a run of P. We say that e is a local bound for τ on ρ if it holds that ♯(τ, ρ) ≤ ↓(e, ρ). We call e a local bound for τ if e is a local bound for τ on all runs of P.
Discussion A natural-number-valued norm e is a local bound for τ on a run ρ if τ appears not more often on ρ than the number of times the value of e decreases. I.e., a local bound e for τ limits the number of executions of τ on a run ρ as long as certain program parts (those where e increases) are not executed. We argue in Sect. 9 that in our analysis local bounds play the role of potential functions in classical amortized complexity analysis [32]. We discuss how we obtain local bounds in Sect. 4.

Difference Constraint Programs
As discussed in the introduction, we base our algorithm on the abstract program model of difference constraint programs, which we now formally define in Definition 12. We discuss in Sect. 6 how we abstract a given program to a DCP.
Definition 10 (Variables, Symbolic Constants, Atoms) By V we denote a finite set of variables. By C we denote a finite set of symbolic constants. A = V ∪ C is the set of atoms.
Notation We often write x ≤ y as a shorthand for the difference constraint x ≤ y + 0.
A DCP over A is a directed labeled graph ΔP = (L, E, lb, le), where L is a finite set of vertices, lb ∈ L and le ∈ L and E ⊆ L × 2^DC(A) × L is a finite set of edges. We write l1 −u→ l2 to denote an edge (l1, u, l2) ∈ E labeled by a set of difference constraints u ∈ 2^DC(A). We use the notation l1 → l2 to denote an edge that is labeled by the empty set of difference constraints. ΔP is fan-in free if for every edge l1 −u→ l2 ∈ E every variable x ∈ V appears at most once as the left-hand side of a difference constraint x ≤ y + c ∈ u. Example Figure 10b shows a fan-in free DCP.
A DCP ΔP = (L, E, lb, le) is interpreted as the program over the set of states Val_A with locations L, entry location lb, exit location le and the transition relations given by the edge labels. Discussion A DCP is a program (Definition 1) whose transition relations are solely specified by conjunctions of difference constraints. Note that variables in difference constraint programs take values only over the natural numbers. Further note that we refer to the syntactic representation of the transition relation in form of a set of difference constraints by u, whereas by ⟦u⟧ we refer to the transition relation itself.
We say that a variable x is used at l if x ∈ use(l), and that x is defined at l if x ∈ def(l). ΔP is well-defined iff lb has no incoming edges and for all l ∈ L it holds that use(l) ⊆ def(l).
Discussion A DCP ΔP is well-defined if l b has no incoming edges and for all v ∈ V it holds that v is defined at all locations at which v is used (symbolic constants are always defined). Note that for well-defined programs we in particular require use(l b ) ⊆ def(l b ). Because l b has no incoming edges we have def(l b ) = C. Thus only symbolic constants can be used at l b .
Throughout this work we will only consider DCPs that are fan-in free and well-defined.
Let ΔP(L, E, lb, le) be a DCP over A. Our bound algorithm, which we start to develop in the next section, computes a bound for a given transition τ ∈ E in form of an expression over A which involves the operators +, ×, /, min, max and the floor function ⌊·⌋. However, note that the norms, which are treated as atoms (elements of A) in the abstraction, can involve arbitrary operators (see Sect. 6). Our bound algorithm, which we define next, computes a special case of an upper bound invariant which we call a variable bound.

Definition 15 (Expressions over A) By Expr(A) we denote the set of expressions over the atoms A formed using the operators +, ×, /, min, max and the floor function.
Let variable x of the abstract program represent the expression expr of the concrete program. Note that by computing a variable bound for x in the abstract program, we compute an upper bound invariant for expr in the concrete program.

Algorithm
Our bound algorithm computes a bound for a given transition τ ∈ E based on a mapping ζ : E → Expr(A) (called local bound mapping) which assigns to each transition τ ∈ E either (1) a bound for τ in form of an expression over the symbolic constants (i.e., ζ(τ) ∈ Expr(C)) or (2) a local bound for τ in form of a variable (i.e., ζ(τ) ∈ V). Note that V ∩ Expr(C) = ∅. In Case (1) our algorithm (Definition 19) returns TB(τ) = ζ(τ). In Case (2) a transition bound TB(τ) ∈ Expr(C) is computed by inferring how often and by how much the local bound ζ(τ) ∈ V of τ may increase during a program run.
Let ρ be a run of ΔP. We call a function ζ : E → Expr(A) a local bound mapping for ρ if for all τ ∈ E it holds that either ζ(τ) ∈ Expr(C) is a bound for τ on ρ, or ζ(τ) ∈ V is a local bound for τ on ρ. We say that ζ is a local bound mapping for ΔP if ζ is a local bound mapping for all runs of ΔP.
Further, our bound algorithm is based on a syntactic distinction between two kinds of updates that modify the value of a given variable v ∈ V: we identify transitions which increment v and transitions which reset v.

Definition 18 (Resets and Increments)
We define the resets R(v) and increments I(v) of v as follows:

R(v) = {(τ, a, c) | τ = l1 −u→ l2 ∈ E, v ≤ a + c ∈ u, a ≠ v}
I(v) = {(τ, c) | τ = l1 −u→ l2 ∈ E, v ≤ v + c ∈ u, c ≥ 1}

I.e., we have that (τ, a, c) ∈ R(v) if variable v is reset to a value smaller or equal to a + c when executing the transition τ. Accordingly we have (τ, c) ∈ I(v) if variable v is incremented by a value smaller or equal to c when executing the transition τ.
Our algorithm in Definition 19 is built on a mutual recursion between the two functions TB and VB, which, together with the auxiliary function Incr, can be summarized as follows (for v ∈ V, a ∈ A, τ ∈ E):

Incr(v) = Σ_{(τ,c) ∈ I(v)} TB(τ) × c
VB(a) = a, if a ∈ C
VB(v) = Incr(v) + max_{(t,a,c) ∈ R(v)} (VB(a) + c), if v ∈ V
TB(τ) = ζ(τ), if ζ(τ) ∈ Expr(C)
TB(τ) = Incr(ζ(τ)) + Σ_{(t,a,c) ∈ R(ζ(τ))} TB(t) × max(VB(a) + c, 0), if ζ(τ) ∈ V

Discussion We first explain the subroutine Incr(v): with (τ, c) ∈ I(v) we have that a single execution of τ increments the value of v by not more than c. Incr(v) multiplies the bound for τ with the increment c in order to summarize the total amount by which v may be incremented over all executions of τ. Incr(v) thus computes a bound on the total amount by which the value of v may be incremented during a program run. The function VB(v) computes a variable bound for v: after executing a transition τ with (τ, a, c) ∈ R(v), the value of v is bounded by VB(a) + c. As long as v is not reset, its value cannot increase by more than Incr(v).
The function TB(τ) computes a transition bound for τ based on the following reasoning: (1) the total amount by which the local bound ζ(τ) of transition τ can be incremented is bounded by Incr(ζ(τ)). (2) We consider a reset (t, a, c) ∈ R(ζ(τ)); in the worst case, a single execution of t resets the local bound ζ(τ) to VB(a) + c, adding max(VB(a) + c, 0) to the potential number of executions of τ; in total, the TB(t) possible executions of t add up to TB(t) × max(VB(a) + c, 0) potential executions of τ. Example We want to infer a bound for the loop at l3 in Fig. 2. We thus compute a transition bound for τ5 (the single back edge of the loop at l3). See Table 1 for details on the computation. We get TB(τ5) = max([m1], [m2]) + [n] × 2. Thus max(m1, m2) + 2n is a bound for the loop at l3 (m1, m2 and n have type unsigned).
Termination Our algorithm does not terminate iff the recursive calls cycle, i.e., if a call to TB(τ) resp. VB(v) (indirectly) leads to a recursive call to TB(τ) resp. VB(v). This can be detected easily; in this case we return the expression '∞'.
We distinguish three cases of cyclic computation: Case (1) occurs iff there is a cycle in the reset graph (Definition 20 in Sect. 3.3) of ΔP. In Sect. 3.4 we discuss a preprocessing step that ensures the absence of cycles in the reset graph, and thus the absence of Case (1), by renaming the program variables appropriately. Case (2) occurs iff there is a transition τ1 with local bound x that increases the local bound y of a transition τ2 which in turn increases x. We conclude that absence of Case (2) is ensured if for all strongly connected components (SCCs) of ΔP we can find an ordering τ1, . . . , τn of the transitions of the SCC such that the local bound of transition τi is not increased on any transition τj with j ≥ i. Note that the existence of such an ordering for each SCC of ΔP proves termination of ΔP: it allows us to directly compose a termination proof in form of a lexicographic ranking function by ordering the respective local transition bounds accordingly.
An example for Case (3) is given in Fig. 3a. Let τ1 be the transition on which y is reset to a. Let τ2 be the single transition of the inner loop. Assume we want to compute a loop bound for the inner loop, i.e., a transition bound for τ2 with local bound y. This triggers a variable bound computation for a because y is reset to a. Since a is incremented on τ2, the variable bound computation for a will in turn trigger a transition bound computation for τ2. Note, however, that the loop bound for the inner loop is exponential (2^n). We consider exponential loop bounds very rare; we did not encounter an exponential loop bound during our experiments.
Complexity Our algorithm can be implemented efficiently (polynomial in the number of variables and transitions of the abstract program) using caches (dynamic programming): we set ζ(τ) = TB(τ) after having computed TB(τ). Accordingly, we introduce a cache to store the result of a VB-computation. When VB(v) is called we first check if the result is already in the cache before performing the computation. The computed bound expressions, however, can be of exponential size: consider the DCP ΔP = ({lb, l}, {τ0, τ1, . . . , τn}, lb, le) over variables {x1, x2, . . . , xn} and constants {m1, m2, . . . , mn} shown in Fig. 3b. The bound our algorithm computes for Fig. 3b is of size exponential in n. However, the example is artificial. In our experience the computed bound expressions can, in practice, be reduced to human-readable size by applying basic rules of arithmetic.
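As an illustration, the mutual recursion of TB, VB and Incr, together with the caching and cycle detection described above, could be sketched as follows. This is a simplified, concretized sketch with our own names and encodings: symbolic constants are replaced by sample integer values so that bounds evaluate to numbers, whereas the actual analysis computes symbolic expressions.

```python
from math import inf

class BoundAnalysis:
    """Sketch of TB/VB/Incr (in the spirit of Definition 19).
    resets maps v -> [(tau, a, c)] (R(v) of Definition 18),
    incrs maps v -> [(tau, c)] (I(v)), zeta maps tau -> local bound
    (a variable name, or an int for a constant bound), and consts maps
    symbolic constants to sample integer values (our simplification)."""

    def __init__(self, resets, incrs, zeta, consts):
        self.R, self.I, self.zeta, self.consts = resets, incrs, zeta, consts
        self.cache, self.on_stack = {}, set()

    def incr(self, v):
        # total amount by which v can be incremented over a whole run
        return sum(self.tb(t) * c for (t, c) in self.I.get(v, []))

    def vb(self, a):
        if a in self.consts:            # symbolic constants bound themselves
            return self.consts[a]
        return self._memo(('vb', a), lambda: self.incr(a) + max(
            (self.vb(b) + c for (_t, b, c) in self.R.get(a, [])), default=0))

    def tb(self, tau):
        if isinstance(self.zeta[tau], int):   # constant local bound
            return self.zeta[tau]
        v = self.zeta[tau]
        return self._memo(('tb', tau), lambda: self.incr(v) + sum(
            self.tb(t) * max(self.vb(a) + c, 0)
            for (t, a, c) in self.R.get(v, [])))

    def _memo(self, key, thunk):
        # caching (dynamic programming); a cyclic recursive call yields inf
        if key in self.cache:
            return self.cache[key]
        if key in self.on_stack:
            return inf
        self.on_stack.add(key)
        result = thunk()
        self.on_stack.discard(key)
        self.cache[key] = result
        return result

# toy DCP (ours): t0 runs once, resetting x to n and y to 0; the outer
# loop t1 has local bound x; t2 has local bound y, incremented on t1
analysis = BoundAnalysis(
    resets={'x': [('t0', 'n', 0)], 'y': [('t0', 'zero', 0)]},
    incrs={'y': [('t1', 1)]},
    zeta={'t0': 1, 't1': 'x', 't2': 'y'},
    consts={'n': 10, 'zero': 0})
```

On this toy program the increments of y are charged to the executions of t1, so t2 obtains the same (amortized) bound as the outer loop rather than a quadratic one.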

In the following we describe two straightforward improvements of the algorithm stated in Definition 19.
Improvement I Let v ∈ V be a local bound for τ, i.e., for all runs ρ of ΔP we have that ♯(τ, ρ) ≤ ↓(v, ρ). Let c ∈ N. Let ↓(v, c, ρ) denote the number of times that the value of v decreases by at least c on a run ρ of ΔP (this refines Definition 7). If for all runs ρ of ΔP we have that ♯(τ, ρ) ≤ ↓(v, c, ρ) (refining Definition 9), then TB(τ)/c is a bound for τ. See Sect. 4 on how we determine relevant constraints. More details on the discussed improvement are given in [30].
Improvement II Let τ1, τ2 ∈ E be two transitions with the same local bound, i.e., ζ(τ1) = ζ(τ2). If τ1 and τ2 cannot be executed without decreasing the common local bound ζ(τ1) twice, once for τ1 and once for τ2 (e.g., τ2 and τ5 in xnuSimple, Fig. 1), we have that TB(τ1) is a bound on the number of times that τ1 and τ2 together can be executed on any run of ΔP. We exploit this observation: assume some v ∈ V is incremented by c1 on τ1 and by c2 on τ2. For computing Incr(v) we then only add TB(τ1) × max(c1, c2) rather than TB(τ1) × c1 + TB(τ2) × c2. This idea can be generalized to multiple transitions. Further details on the discussed improvement are given in [30].

Reasoning Based on Reset Chains
Consider Fig. 4. The precise bound for the loop at l3 is n: initially r has value n, and after we have iterated the loop at l3, r is set to 0. Thus the loop can only be executed within at most one iteration of the outer loop. However, our algorithm from Definition 19 infers a quadratic bound for the loop at l3, as shown in Table 2. We thus get n² (n has type unsigned) as bound for the loop at l3 in the concrete program.
Our algorithm from Definition 19 does not take into account that r is reset to 0 after executing the loop at l 3 . In the following we discuss an extension of our algorithm which overcomes this imprecision by taking the context under which a transition is executed into account: we say that a transition τ 2 is executed under context τ 1 if transition τ 1 was executed before the current execution of τ 2 and after the previous execution of τ 2 (if any).
As an example, consider Fig. 4b, the abstraction of Fig. 4a. We have that τ 2 is always executed either under context τ 0 or under context τ 4 . When executing τ 2 under context τ 0 , p is set to n. But when executing τ 2 under context τ 4 , p is set to 0. Moreover, τ 2 can only be executed once under context τ 0 because τ 0 is executed only once.
We define the notion of a reset graph as a means to reason systematically about the context under which resets can be executed.
Definition 20 (Reset Chain Graph) Let ΔP(L, E, lb, le) be a DCP over A. The reset chain graph or reset graph of ΔP is the directed graph G with node set A which has an edge from a to v labeled by (τ, c) iff (τ, a, c) ∈ R(v). We say that a path κ = a_n −τ_n,c_n→ a_{n−1} → · · · −τ_1,c_1→ a_0 in G is a reset chain from a_n to a_0. Let n ≥ i ≥ j ≥ 0. By κ[i, j] we denote the sub-path of κ that starts at a_i and ends at a_j. We define in(κ) = a_n, c(κ) = Σ_{i=1}^{n} c_i, trn(κ) = {τ_n, τ_{n−1}, . . . , τ_1}, and atm(κ) = {a_{n−1}, . . . , a_0}. κ is sound if for all 1 ≤ i < n it holds that a_i is reset on all paths from the target location of τ_1 to the source location of τ_i in ΔP. κ is optimal if κ is sound and there is no sound reset chain κ′ of length n + 1 s.t. κ′[n, 0] = κ. Let v ∈ V; by ℛ(v) we denote the set of optimal reset chains ending in v.
Example Figure 4c shows the reset graph of Fig. 4b.
We elaborate on the notions sound and optimal below. Let us first state a basic intuition on how we employ reset chains to enhance the precision of our reasoning: for a given reset (τ, a, c) ∈ R(v), the reset graph determines which atom flows into variable v under which context; for an example, consider Fig. 4b and its reset graph in Fig. 4c. Note that the reset graph does not represent increments of variables. We discuss how we handle increments in Sect. 3.3.1.
Let v ∈ V. Given a reset chain κ of length n that ends at v, we say that (trn(κ), in(κ), c(κ)) is a reset of v with context of length n − 1. I.e., R(v) from Definition 18 is the set of context-free resets of v (context of length 0), because (trn(κ), in(κ), c(κ)) ∈ R(v) iff κ ends at v and has length 1. Our previously defined algorithm from Definition 19 uses only context-free resets; we say that it reasons context-free. For reasoning with context, we substitute the term TB(t) × max(VB(a) + c, 0) in Definition 19 by the term TB(trn(κ)) × max(VB(in(κ)) + c(κ), 0). Note that we can compute a bound on the number of times that a sequence τ1, τ2, . . . , τn of transitions may occur on a run by computing min_{1≤i≤n} TB(τi).
We now discuss how our algorithm infers the linear bound for τ 3 of Fig. 4 when applying the described modification to Definition 19: the reset graph of Fig. 4b is shown in Fig. 4c.
There are three reset chains κ1, κ2, κ3 ending in [p]. However, κ3 is a sub-path of κ1 and κ2. Note that κ1 and κ2 are sound by Definition 20 because [r] is reset on all paths from the target location l3 of τ2 to the source location l2 of τ2 in Fig. 4b (namely on τ4). κ1 and κ2 are both optimal because they are sound and of maximal length (we discuss the notions sound and optimal next). Thus ℛ([p]) = {κ1, κ2}. Basing our analysis on ℛ([p]) rather than R([p]), our approach reasons as shown in Table 3. We get TB(τ3) = [n], i.e., we get the bound n (n has type unsigned) for the loop at l3 in the concrete program (Fig. 4a).
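The enumeration of reset chains could be sketched as follows; this is a hypothetical encoding of ours (atoms as strings, incoming reset edges per variable) and it assumes the reset graph is acyclic, as guaranteed by the preprocessing of Sect. 3.4.

```python
def reset_chains(v, edges):
    # edges[v] lists the incoming reset-graph edges of v as (a, tau, c),
    # one entry per reset (tau, a, c) in R(v); the graph must be acyclic,
    # otherwise the recursion diverges (cf. Sect. 3.4)
    chains = []
    for (a, tau, c) in edges.get(v, []):
        last = [(a, tau, c, v)]
        chains.append(last)                      # chain of length 1
        for prefix in reset_chains(a, edges):    # all longer chains
            chains.append(prefix + last)
    return chains

def in_(kappa):  return kappa[0][0]                       # in(kappa)
def c_(kappa):   return sum(c for (_, _, c, _) in kappa)  # c(kappa)
def trn(kappa):  return [t for (_, t, _, _) in kappa]     # trn(kappa)

# reset graph in the style of Fig. 4b/4c: the constants n and 0 flow
# into r (on t0 resp. t4), and r flows into p (on t2)
edges = {'r': [('n', 't0', 0), ('0', 't4', 0)],
         'p': [('r', 't2', 0)]}
```

On this graph the sketch yields exactly the three chains ending in p discussed above: the length-1 chain r → p and the two length-2 chains n → r → p and 0 → r → p.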
Sound and Optimal Reset Paths A given reset chain a_n −τ_n,c_n→ a_{n−1} → · · · → a_0 is sound if in between any two executions of τ1 all atoms on the path (but not necessarily a_n where the path starts and a_0 where it ends) are reset: assume that r in Fig. 4a were not reset after executing the inner loop. Then we could repeat the reset of p to r without resetting r to 0, and the inner loop would have a quadratic loop bound. For the abstract program the described modification replaces the constraint [r] ≤ [0] on τ4 in Fig. 4b by [r] ≤ [r]; then n is not a valid transition bound for τ3, because r is not reset to 0 between two executions of the inner loop. The optimal reset chains are the sound reset chains with maximal context, i.e., those reset chains that are sound and cannot be extended without becoming unsound.
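The soundness condition ("a is reset on all paths from one location to another") is a plain reachability question and could be checked along the following lines (a sketch with our own encoding of transitions):

```python
def reachable(src, dst, succ):
    # BFS over a successor map {location: [location, ...]}
    seen, work = {src}, [src]
    while work:
        n = work.pop()
        if n == dst:
            return True
        for m in succ.get(n, []):
            if m not in seen:
                seen.add(m)
                work.append(m)
    return False

def always_reset_between(a, start, end, transitions, resets):
    # "a is reset on all paths from start to end" holds iff end is not
    # reachable from start once every transition resetting a is removed
    succ = {}
    for name, (l1, l2) in transitions.items():
        if name not in resets.get(a, set()):
            succ.setdefault(l1, []).append(l2)
    return not reachable(start, end, succ)
```

In the Fig. 4 situation, the only way from l3 back to l2 is the transition τ4 which resets r, so the condition holds; adding an alternative path that skips the reset makes it fail.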

Algorithm Based on Reset Chain Forests
In the presence of cycles in the reset graph we get infinitely many reset chains. Let us for now assume that the given program has a reset forest, i.e., that the sub-graph of the reset graph restricted to the nodes in V is a forest (Definition 20). Then the complete reset graph is also acyclic because A = V ∪ C and the nodes in C cannot have incoming edges (Definition 20).
We generalize Incr to sets of atoms by Incr({a1, a2, . . . , an}) = Σ_{1≤i≤n} Incr(ai), with Incr(∅) = 0.

Discussion and Example We have discussed above why we replace the term TB(t) × max(VB(a) + c, 0) from Definition 19 by the term TB(trn(κ)) × max(VB(in(κ)) + c(κ), 0). We further discuss the term Incr(⋃_{κ∈ℛ(ζ(τ))} atm(κ)), which replaces the term Incr(ζ(τ)) from Definition 19: consider Example xnuSimple in Fig. 1. Note that r may be incremented on τ1 between the reset of r to 0 on τ0 resp. τ4 and the reset of p to r on τ2. The term Incr(⋃_{κ∈ℛ(ζ(τ))} atm(κ)) takes care of such increments, which may increase the value that finally flows into ζ(τ) (in the example p) when the last transition on κ (in the example τ2) is executed. In Table 4 the details of the bound computation are given. We get TB(τ3) = [n], i.e., we have the bound n for the loop at l3 in the concrete program (Fig. 1a; n has type unsigned).
Soundness Definition 21 for DCPs with a reset forest is a special case of Definition 23 for DCPs with a reset DAG. We prove soundness of Definition 23 in the Electronic Supplementary Material.
Complexity The nodes of a reset forest are the variables and constants of the abstract program (the elements of A). Since the number of paths of a forest is polynomial in the number of nodes, the run time of our algorithm remains polynomial.
The first row counts the number of iterations of the outer loop, the second row shows the transitions that are executed, and the last two rows track the values of r resp. p. The execution switches between two iteration schemes of the outer loop: an odd iteration increments r twice (by executing τ2 twice) and afterwards assigns r to p by executing τ5. We can then execute τ6 two times. Afterwards the value of r is "saved" in p for the next (even) iteration of the outer loop before r is set to 0 on τ1. Therefore τ6 can again be executed two times in the next, even iteration even though r is not incremented in that iteration.
Consider the abstracted DCP in Fig. 5b and its reset graph in Fig. 5c. We have that κ2 and κ3 are two reset chains ending in [p] (see Fig. 5c). Observe that both are sound, i.e., [r] is reset between any two executions of τ7 resp. τ5. However, [r] is not necessarily reset between the execution of τ5 and τ7; therefore the accumulated value 2 of r is used twice to increase the local bound [p] of τ6.
I.e., since there are two paths from [r ] to [ p] in the reset graph (Fig. 5c) we have to count the increments of [r ] twice: once for κ 2 and once for κ 3 . Definition 22 distinguishes between nodes that have a single resp. multiple path(s) to a given variable in the reset graph. This is used in Definition 23 for a sound handling of the latter case.
Example Table 6 shows the computation of Definition 23 for Fig. 5b.

Complexity A DAG can have exponentially many paths in the number of nodes. Thus there can be exponentially many reset chains in ℛ(v) (exponential in the number of variables and constants of the abstract program, i.e., the norms generated during the abstraction process, see Sect. 6). However, in our experiments the enumeration of (optimal) reset chains did not affect performance. (See also our discussion on scalability in Sect. 10.1.)

Preprocessing: Transforming a Reset Graph into a Reset DAG
Consider the DCP shown in Fig. 6a. It has a cyclic reset graph, shown in Fig. 6b. In the following we describe an algorithm which transforms Fig. 6a into Fig. 6d by renaming the program variables. Figure 6d has an acyclic reset graph (a reset DAG).
where u′ is obtained from u by generating the constraint ς(x, l2) ≤ ς(y, l1) + c from each constraint x ≤ y + c ∈ u.
Examples Figure 6d is obtained from Fig. 6a by applying the described transformation using the mapping ς(x, l1) = ς(y, l2) = z.

Soundness Soundness of the described variable renaming is obvious if there are no two different variables v1 and v2 that are renamed to the same fresh variable at some location l. This is the case if each location l ∈ L appears at most once in each SCC of the variable flow graph, i.e., if there is no SCC S in the variable flow graph of the program such that there is a location l ∈ L and variables v1, v2 ∈ V with v1 ≠ v2 and (l, v1) ∈ S and (l, v2) ∈ S. In the literature, a program with this property is called stratifiable (e.g., [5]). A fan-in free DCP that is not stratifiable can be transformed into a stratifiable and fan-in free DCP by introducing appropriate case distinctions into the control flow of the program. Details are given in [30]. In the worst case, however, this transformation can cause an exponential blow-up of the number of transitions in the program (the size of the control flow graph).
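The renaming itself is mechanical; a minimal sketch (with our own encoding of transitions and constraints) could look like this:

```python
def rename_dcp(transitions, sigma):
    # transitions: list of (l1, u, l2) where u is a list of difference
    # constraints (x, y, c) denoting x' <= y + c; sigma maps a pair
    # (variable, location) to its fresh name (identity where undefined).
    # The target variable x is renamed at the target location l2, the
    # source variable y at the source location l1.
    renamed = []
    for (l1, u, l2) in transitions:
        u2 = [(sigma.get((x, l2), x), sigma.get((y, l1), y), c)
              for (x, y, c) in u]
        renamed.append((l1, u2, l2))
    return renamed
```

For instance, with the hypothetical mapping ς(x, l2) = z and ς(y, l1) = w, the constraint x ≤ y + 3 on an edge from l1 to l2 becomes z ≤ w + 3.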

Finding Local Bounds
In this section we describe our algorithm for finding local bounds.
Intuition Let τ = l1 −u→ l2 ∈ E and v ∈ V. Clearly, v is a local bound for τ if v decreases when executing τ, i.e., if v ≤ v + c ∈ u for some c < 0. Moreover, v is a local bound for τ if every time τ is executed also some other transition t ∈ E is executed and v is a local bound for t. This is, e.g., the case if t is always executed either before each execution of τ or after each execution of τ.
Algorithm The above intuition can be turned into a simple three-step algorithm. Let ΔP(L, E, lb, le) be a DCP. (1) We set ζ(τ) = 1 for all transitions τ that do not belong to a strongly connected component (SCC) of ΔP. (2) For v ∈ V let ξ(v) ⊆ E denote the set of transitions on which v decreases; we set ζ(τ) = v for all τ ∈ ξ(v).
(3) Let v ∈ V and τ ∈ E. Assume τ was not yet assigned a local bound by (1) or (2). We set ζ(τ) = v if τ does not belong to a strongly connected component (SCC) of the directed graph (L, E′) where E′ = E \ ξ(v) (the control flow graph of ΔP without the transitions in ξ(v)). If there are several candidates v1, v2 for a given transition τ, then τ is assigned either v1 or v2 nondeterministically. An alternative way of handling this case is as follows: we generate two local bound mappings, ζ1 and ζ2, where ζ1(τ) = v1 and ζ2(τ) = v2. This way we can systematically enumerate all possible choices; we then apply our bound algorithm once based on ζ1, once based on ζ2, etc., and finally take the minimum over all computed bounds. In our implementation, however, we follow the aforementioned greedy approach based on non-deterministic choice.

Discussion on Soundness Soundness of Steps (1) and (2) is obvious. We discuss soundness of Step (3): let τ ∈ E. If τ does not belong to an SCC of (L, E \ ξ(v)) we have that some transition in ξ(v) (which decreases v) has to be executed in between any two executions of τ. It remains to ensure that there is a decrease of v also for the last execution of τ: for special cases this is unfortunately not the case. Consider Fig. 8b (Sect. 5). The above stated algorithm sets ζ(τ1) = [x]. However, [x] is not a local bound for τ1 of Fig. 8b because there is no decrease of [x] for the last execution of τ1 (before executing τ3).

It is straightforward to ensure soundness of the algorithm: adding an edge from le to lb forces the algorithm to take the last execution of a transition into account. I.e., we set E′ = (E ∪ {le −∅→ lb}) \ ξ(v). Now our algorithm fails to find a local bound for τ1 of Fig. 8b, which is sound. We discuss how we handle the example in Fig. 8 in Sect. 4.1.
Complexity Steps (1) and (2) can be implemented in linear time.
Step (3): for each v ∈ V we need to compute the SCCs of (L , E\ξ(v)). It is well known that SCCs can be computed in linear time (linear in the number of edges and nodes). Since we need to perform one SCC computation per variable, Step (3) is quadratic.
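The three steps, including the le → lb correction from the soundness discussion, could be sketched as follows. The encoding is ours; for brevity we test whether a transition lies on a cycle via reachability (its target reaching its source) instead of an explicit SCC computation, which is equivalent for this purpose.

```python
def reaches(start, goal, succ):
    # BFS over a successor map {location: [location, ...]}
    seen, work = {start}, [start]
    while work:
        n = work.pop()
        if n == goal:
            return True
        for m in succ.get(n, []):
            if m not in seen:
                seen.add(m)
                work.append(m)
    return False

def succ_map(transitions, skip=frozenset(), extra=()):
    succ = {}
    for name, (l1, l2) in transitions.items():
        if name not in skip:
            succ.setdefault(l1, []).append(l2)
    for (l1, l2) in extra:
        succ.setdefault(l1, []).append(l2)
    return succ

def find_local_bounds(transitions, decr, lb, le):
    # transitions: {name: (source, target)}; decr: {v: xi(v)}, the set
    # of transitions on which v decreases; lb/le: begin and end location
    zeta, full = {}, succ_map(transitions)
    # step (1): transitions that lie on no cycle get the constant bound 1
    for t, (l1, l2) in transitions.items():
        if not reaches(l2, l1, full):
            zeta[t] = 1
    # step (2): a variable that decreases on t is a local bound for t
    for v, xs in decr.items():
        for t in xs:
            zeta.setdefault(t, v)
    # step (3): v bounds t if removing xi(v) (and adding the edge
    # le -> lb, cf. the soundness fix) leaves t on no cycle
    for t, (l1, l2) in transitions.items():
        if t in zeta:
            continue
        for v, xs in decr.items():
            pruned = succ_map(transitions, skip=xs, extra=[(le, lb)])
            if not reaches(l2, l1, pruned):
                zeta[t] = v
                break
    return zeta
```

On a simple loop lb → l1 → l2 → l1 → le where only the back edge decreases x, the forward edge of the loop inherits the local bound x via step (3).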

Generalizing Local Bounds to Sets of Local Bounds
Consider the example in Fig. 7. Fig. 7b shows the DCP obtained by abstraction (Sect. 6) from the program in Fig. 7a. We have that x is a local bound for τ1 and y is a local bound for τ2. However, it is not straightforward to find a local bound for τ3: in order to form a local bound for τ3 we need to combine x and y into a linear combination, e.g., 2x + y. It is unclear how to automatically come up with such expressions.
In the following we discuss a simple generalization of our algorithm by which we avoid an explicit composition of local bounds.
Example For Fig. 7b we have that ζ with ζ(τ1) = {[x]}, ζ(τ2) = {[y]} and ζ(τ3) = {[x], [y]} is a local bound set mapping.
We generalize the transition bound algorithm T B to local bound set mappings by summing up over all expr ∈ ζ(τ ). We exemplify the generalization by extending Definition 19.
Example For Fig. 7 we get TB(τ3) = 2n; details are shown in Table 7 ([n] = n because n has type unsigned).

Inferring a Local Bound Set Mapping
The algorithm for finding local bounds can be easily extended for finding local bound sets: steps (1) and (2) remain unchanged.

Note that Step (3) is parametrized in the number k ∈ N of variables considered. For obvious reasons it is preferable to find local bound sets of minimal size. Given a transition τ, we therefore first try to find a local bound set of size k = 1 for τ and increment k only if the search fails. With a fixed limit for k the complexity of our procedure for finding local bounds remains polynomial. In our experience, limiting k to 3 is sufficient in practice.

(Table 7: computation of TB(τ3) for Fig. 7b by Definition 26.)

Consider Fig. 8a. The loop (resp. its back-edge) can be executed n times; the skip instruction (a placeholder for some code of interest), however, can be executed n + 1 times. Consider the abstraction shown in Fig. 8b. Our algorithm for finding local bounds, as we discussed it so far, fails to find a local bound (set) for τ1 (modeling the skip instruction). We extend the algorithm as follows: we set ξ(1) = {τ ∈ E | τ is not part of any SCC}. I.e., for Fig. 8b we set ξ(1) = {τ0, τ3}. We add 1 ∈ Expr(A) to the set of "variables" considered in Step (3).

Combined Bound Algorithm
(Fig. 8 a Example with a "break"-statement, b DCP obtained by abstraction.)

We have developed our algorithm for computing transition bounds and variable bounds step by step over the previous sections. Definition 27 states the combined algorithm, where TB({τ1, τ2, . . . , τn}) = min_{1≤i≤n} TB(τi) and Incr({a1, a2, . . . , an}) = Σ_{1≤i≤n} Incr(ai) (we set Incr(∅) = 0). We introduced and discussed the terms from which Definition 27 is composed in Sects. 3 and 4.1.
Soundness Soundness of Definition 27 results from Theorem 2 (proven in the Appendix) and the discussion in Sect. 4.1. Note that Definition 27 is only sound for DCPs that have a reset DAG. We have described in Sect. 3.4 how to transform a given DCP into a DCP with a reset DAG.

Program Abstraction
In the following we discuss how we abstract a given program to a DCP. Our abstraction algorithm proceeds in two steps: we first abstract a given concrete program to a DCP with integer semantics, in a second step we then further abstract the integer-DCP to a DCP over the natural numbers (as defined in Definition 12).

Abstraction I: DCPs with Integer Semantics
We extend our abstract program model from Definition 12 to the non-well-founded domain Z by adding guards to the transitions of the program.

Syntax of DCPs with guards
The edges E of a DCP with guards ΔP_G(L, E, lb, le) are a subset of L × 2^V × 2^DC(A) × L. I.e., an edge of a DCP with guards is of the form l1 −g,u→ l2 with a guard g ⊆ V and an update u ⊆ DC(A). Example See Fig. 10a in Sect. 7.1 for an example. We abstract a program P = (L, T, lb, le) to a DCP with guards ΔP_G = (L, E, lb, le) as follows:

Semantics of DCPs with guards We extend the range of the valuations Val from the natural numbers to the integers; a transition can only be taken if all variables in its guard evaluate to positive values.
1. Choosing an initial set of Norms We aim at creating a suitable abstract program for bound analysis. In our non-recursive setting complexity evolves from iterating loops. Therefore we search for expressions which limit the number of loop iterations. We consider conditions of form a > b resp. a ≥ b found in loop headers or on loop-paths if they involve loop counter variables, i.e., variables which are incremented and/or decremented inside the loop. Such conditions are likely to limit the consecutive execution of single or multiple loop-paths. From each condition of form a > b we create the integer expression a − b, from each condition of form a ≥ b we create the integer expression a + 1 − b. These expressions form our initial set of norms N. Note that on those transitions on which a > b holds, a − b > 0 must hold, whereas with a ≥ b we have a + 1 − b > 0.
In ΔP_G we interpret a norm e ∈ N from our initial set of norms N as a variable, i.e., we have e ∈ V for all e ∈ N.

2. Abstracting Transitions For each transition l1 −λ→ l2 ∈ T we generate a set u_λ of difference constraints: initially we set u_λ = ∅ for all transitions l1 −λ→ l2 ∈ T. We repeat the following construction until the set of norms N becomes stable: for each e1 ∈ N and for each l1 −λ→ l2 ∈ T such that all variables in e1 are defined at l2, we check whether there is a difference constraint of form e1 ≤ e2 + c with e2 ∈ N and c ∈ Z in u_λ. If not, we derive a difference constraint e1 ≤ e2 + c as follows: we symbolically execute λ for deriving e1′ from e1; e.g., let e1 = x + y and assume x is assigned x + 1 on l1 −λ→ l2 while y stays unchanged. We get e1′ = x + 1 + y through symbolic execution. In order to keep the number of norms low, we first try (a) to find a norm e2 ∈ N and c ∈ Z s.t. e1 ≤ e2 + c is invariant on l1 −λ→ l2 (see Definition 28). If we succeed we add the predicate e1 ≤ e2 + c to u_λ. E.g., for e1 = x + y and e1′ = x + 1 + y we get the transition invariant (x + y)′ ≤ (x + y) + 1 and will thus add e1 ≤ e1 + 1 to u_λ. In general, we find a norm e2 and a constant c by separating the constant parts in the expression e1′ using associativity and commutativity, thereby forming an expression e3 over variables and program parameters and an integer constant c. E.g., given e1′ = 5 + z we set e3 = z and c = 5. We then search for a norm e2 ∈ N with e2 = e3, where the check on equality is performed modulo associativity and commutativity. (b) If (a) fails, i.e., no such e2 ∈ N exists, we add e3 to N and derive the predicate e1 ≤ e3 + c. In ΔP_G we interpret e3 as an atom, i.e., e3 ∈ A. We interpret e3 as a symbolic constant, i.e., e3 ∈ C, only if e3 is purely built over the program's input parameters and constants. Note that this step increases the number of norms.

3. Inferring Guards For each norm e ∈ N and each transition l1 −λ→ l2 ∈ T we check whether e > 0 is guaranteed to hold when executing λ. If so, we add e to g_λ.
We use an SMT solver to perform this check. E.g., let e = x + y and assume that l1 −λ→ l2 is guarded by the conditions x ≥ 0 and y > x. An SMT solver supporting linear arithmetic proves that x ≥ 0 ∧ y > x implies x + y > 0.
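The symbolic-execution part of step 2 (deriving e1′ and separating its constant part) could be sketched as follows; the encoding of norms as coefficient dictionaries and of updates as pairs (source variable, constant) is our own simplification, restricted to updates of form x := y + c.

```python
# a norm is a linear expression over variables represented as a dict of
# coefficients, e.g. x + y -> {'x': 1, 'y': 1}; an update maps a
# variable x to a pair (src, c), modelling the assignment x := src + c

def sym_exec(norm, update):
    # symbolically execute the update on the norm and separate the
    # constant part, yielding the variable expression e3 and constant c
    terms, const = {}, 0
    for var, coeff in norm.items():
        src, c = update.get(var, (var, 0))  # untouched variables persist
        terms[src] = terms.get(src, 0) + coeff
        const += coeff * c
    return terms, const

def abstract_norm(norm, update, norms):
    # case (a): derive e1 <= e2 + c for an existing norm e2 (equality of
    # the variable parts); case (b): otherwise register e3 as a new norm
    e3, c = sym_exec(norm, update)
    for name, e2 in norms.items():
        if e2 == e3:
            return ('constraint', name, c)
    fresh = 'e%d' % len(norms)
    norms[fresh] = e3
    return ('new_norm', fresh, c)
```

For the norm x + y and the update x := x + 1 this reproduces the constraint (x + y) ≤ (x + y) + 1 from the example above (case (a)); an update such as x := z + 5 instead triggers case (b) and adds the new norm z + y.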

Inferring Guards
Note that SMT reasoning is applied only locally to single transitions to check if an expression is greater than 0 on that transition.

Propagation of Guards
We improve the precision of our abstraction by propagating guards: a guard that holds on a transition ending in a location l3 can be propagated to those outgoing transitions of l3 on which it remains valid.

Well-defined and Fan-in free The DCPs generated by our algorithm are always fan-in free by construction: for each transition we get at most one predicate e ≤ e2 + c for each e ∈ N because we check whether there is already a predicate for e before a predicate is inferred resp. added. We ensure well-definedness of our abstraction by a final clean-up: we iterate over all l ∈ L and check if use(l) ⊆ def(l) holds. If this check fails we remove all difference constraints x ≤ y + c with y ∈ use(l) \ def(l) from all outgoing edges of l. We repeat this iteration until well-definedness is established, i.e., until use(l) ⊆ def(l) holds for all l ∈ L.

Termination We have to ensure the termination of our abstraction procedure, since case (b) in step "2. Abstracting Transitions" triggers a recursive abstraction for the newly added norm. Note that we can always stop the abstraction process at any point and still obtain a sound abstraction of the original program. We therefore ensure termination of the abstraction algorithm by limiting the length of the chain of recursive abstraction steps that is triggered by entering case (2.b).

Non-linear Iterations
We can handle counter updates such as x = 2x or x = x/2 as follows: (1) we add the expression log x to our set of norms; (2) we derive the corresponding difference constraint over the new norm, e.g., (log x)′ ≤ (log x) − 1 for the update x = x/2.
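The derived constraints can be checked numerically; in the sketch below we concretize the norm log x as floor(log2 x) (our choice of base; the text leaves the base unspecified, and any fixed base works analogously):

```python
from math import floor, log2

def log_norm(x):
    # the norm log x, concretized as floor(log2 x) for positive x
    return floor(log2(x)) if x >= 1 else 0

# the derived difference constraints hold for halving and doubling:
#   x = x/2 yields (log x)' <= (log x) - 1   (for x >= 2)
#   x = 2x  yields (log x)' <= (log x) + 1
for x in range(2, 1024):
    assert log_norm(x // 2) <= log_norm(x) - 1
    assert log_norm(2 * x) <= log_norm(x) + 1
```

Thus a loop halving x in each iteration decreases the norm by at least 1 per iteration, which yields the expected logarithmic bound.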

Data Structures
In previous publications [13,24] it has been described how to abstract programs with data structures to pure integer programs by making use of appropriate norms such as the length of a list or the number of elements in a tree. In our implementation we follow these approaches using a light-weight abstraction based on optimistic aliasing assumptions (see [30] for details). Once the program is transformed to an integer program, our abstraction algorithm is applied as described above for obtaining a difference constraint program.

Abstraction II: From the Integers to the Natural Numbers
We now discuss how we abstract a DCP with guards ΔP_G over the integers to a DCP ΔP over the natural numbers as defined in Definition 12: for each difference constraint x ≤ y + c of a transition τ of ΔP_G we check whether the corresponding constraint [x] ≤ [y] + c over the natural numbers (where [x] denotes max(x, 0)) is also invariant for τ. We add the corresponding predicates to ΔP.

A Complete Example
Example xnu in Fig. 9a contains a snippet of the source code on which we have modeled Example xnuSimple in Fig. 1. The full version of Example xnu can be found in the SPEC CPU2006 benchmark,1 in function XNU of 456.hmmer/src/masks.c. The outer loop in Example xnu partitions the interval [0, len] into disjoint sub-intervals [beg, end]. The inner loop iterates over the sub-intervals. Therefore the inner loop has an overall linear iteration count. Example xnu is a natural example for amortized complexity: though a single visit to the inner loop can cost len (if beg = 0 and end = len), several visits together also cannot cost more than len, since each visit iterates over a disjoint sub-interval. We therefore have: the amortized cost of a visit to the inner loop, i.e., the cost of executing the inner loop within an iteration of the outer loop averaged over all len iterations of the outer loop, is 1. Here we refer by cost to the number of consecutive back jumps in the inner loop. But in general, any resource consumption inside the inner loop can, in total, only be repeated up to max(len, 0) times. Together with the loop bound max(len, 0) of the outer loop, our observation yields an overall complexity of 2 × max(len, 0).
Our experimental results (Sect. 8.3) demonstrate that state-of-the-art bound analyses fail to infer tight bounds for Example xnu and similar problems.

Abstraction
We give a formal representation of the concrete program semantics of Example xnu in form of a labeled transition system (LTS) shown in Fig. 9b. Each edge in the LTS is labeled by a formula which encodes the transition relation. Consider, e.g., the edge from l1 to l2 (τ1) labeled by the formula i < l ∧ b′ = b ∧ e′ = e ∧ i′ = i + 1 ∧ l′ = l. This formula induces the transition relation of τ1. We now discuss how our abstraction algorithm from Sect. 6 abstracts Example xnu to the DCP shown in Fig. 10b. Recall abstraction Step I discussed in Sect. 6.1.

1. Choosing an initial set of Norms From the loop conditions i < l and k < e we obtain initial norms as described in Sect. 6.1.

2. Abstracting Transitions We check how e − b changes on the transitions τ0, τ1, τ2a, τ2b, τ3a, τ3b, τ3c, τ4, τ5, τ6; on τ3a, τ3b, τ4, τ6 it stays unchanged. We have processed all norms in N.

3. Inferring Guards We add the guard [l − i] to τ1 in ΔP_G because l − i is a guard of τ1 in P (due to the condition i < l); we add the guard [e − k] to τ4 in ΔP_G because e − k is a guard of τ4 in P (due to the condition k < e).

4. The resulting DCP with guards is shown in Fig. 10a.

Applying abstraction Step II discussed in Sect. 6.2 gives us the DCP shown in Fig. 10b. In the depiction of the abstraction we use p, q, r, . . . as shorthands for the norms.

Bound Computation
In Fig. 11 the reset graph of Fig. 10b is shown. Table 8 shows how our bound algorithm from Sect. 3 infers the linear bound max(l, 0) for the inner loop at l 4 of Example xnu by computing T B(τ 4 ) = [l] on the abstraction shown in Fig. 10b. Recall that the abstract variable [l] represents the expression max(l, 0) in the concrete program.
Note that the DCP in Fig. 10b has a reset forest. Therefore atm1(κ) = atm(κ) for all reset paths κ of Fig. 10b, as discussed in Sect. 3.3.2. The computation traces of Definitions 21 and 23 are thus equivalent for Fig. 10b (Table 8 shows the computation of TB(τ4) for Example xnu (Fig. 9) by Definition 21 resp. Definition 23).

Experiments
Implementation The presented analysis defines the core ideas and techniques of our implementation loopus. A complete description of the implemented techniques, including a path-sensitive extension of our bound algorithm, is given in [30]. Our implementation is open-source and available at [18]. loopus reads in the LLVM [23] intermediate representation and performs an intra-procedural analysis. It is capable of computing bounds for loops as well as analyzing the complexity of non-recursive functions.
In the following we discuss three experimental setups and tool comparisons. Our first experiment, which we discuss in Sect. 8.1, is performed on a benchmark of open-source C programs. For our second experiment (Sect. 8.2) we assembled a benchmark of challenging programs from the literature on automatic bound analysis. The third experiment was performed on a set of interesting loop iteration patterns that we found in real source code.

Evaluation on Real World C Code
Experimental Setup We base our experiment on the program and compiler optimization benchmark Collective Benchmark [17] (cBench), which contains a total of 1027 different C files (after removing code duplicates) with 211,892 lines of code. We set up the first comparison of complexity analysis tools on real-world code. For comparing our tool (loopus'15) we chose the three most promising tools from recent publications: the tool KoAT implementing the approach of [6], the tool CoFloCo implementing [10], and our own earlier implementation loopus'14 [29]. Note that we compared against the most recent versions of KoAT and CoFloCo (download 01/23/15).2 We were not able to evaluate Rank (implementing [2]) and C4B (implementing [7]) on our benchmark because both tools support only a limited subset of C. The experiments were performed on a Linux system with an Intel dual-core 3.2 GHz processor and 16 GB memory. The task was to perform a complexity analysis on function level. We used the following experimental setup: 1. We compiled all 1027 C files in the benchmark into the LLVM intermediate representation using clang. 2. We extracted all 1751 functions which contain at least one loop using the tool llvm-extract (which comes with the LLVM tool suite). Extracting the functions to single files guarantees an intra-procedural setting for all tools. 3. We used the tool llvm2kittel [20] to translate the 1751 LLVM modules into 1751 text files in the integer transition system (ITS) format that is read in by KoAT. 4. We used the transformation described in [10] to translate the ITS format of KoAT into the cost equations representation that is read in by CoFloCo. This last step is necessary because there exists no direct way for translating C or the LLVM intermediate representation into the CoFloCo input format. 5.
We decided to exclude the 91 recursive functions from the benchmark set because we were not able to run CoFloCo on these examples (the transformation tool does not support recursion), KoAT was not successful on any of them, and loopus does not support recursion. In total our example set thus comprises 1659 functions.
Evaluation Table 9 shows the results of all four tools on our benchmark using a time out of 60 s (Table 10 shows the results on the subset of those functions on which no tool timed out). The first column shows the number of functions which were successfully bounded by the respective tool, the last column shows the number of time outs; on the remaining examples (not shown in the table) the respective tool did not time out but was also not able to compute a bound. The column Time shows the total time used by the respective tool on the benchmark. We conclude that our implementation is both more scalable and, on real C code, more successful than implementations of other state-of-the-art approaches. However, while the experiment clearly demonstrates that our implementation outperforms the competitors with respect to scalability, it does not allow us to compare the strengths of the different bound analyses conclusively: we observed that llvm2kittel, the only tool available for translating C code resp. the LLVM intermediate representation into the ITS format of KoAT, loses information that is kept by our analysis. As a result, it is unclear if a failure to compute a bound is due to different analysis strength or due to information loss during translation (we have not seen such an information loss for our second and third experiment, on which we report in Sects. 8.2 and 8.3, where the considered benchmarks consist of rather small, pure integer programs for which llvm2kittel works well). We hope that our experiment motivates the development of better tools for bound analysis of real-world code and drives the research towards solving realistic complexity analysis problems.
We want to add that, in our experience, working with C programs instead of integer transition systems is very helpful for developing and debugging a complexity analysis tool: looking at C code, we can use our own intuition as programmers about the expected complexity of the analyzed code and compare it to the complexity reported by the tool.
Pointers and Shapes Even loopus'15 computed bounds for only about half of the functions in the benchmark. Studying the benchmark code we concluded that for many functions pointer alias and/or shape analysis is needed for inferring functional complexity. In our experimental comparison such information was not available to the tools. Using optimistic (but unsound) assumptions on pointer aliasing and heap layout, our tool loopus'15 was able to compute the complexity for a total of 1185 out of the 1659 functions in the benchmark, using 28 min total time. A discussion of our optimistic pointer aliasing and heap layout assumptions and of the reasons for failure can be found in [30].
The benchmark and more details on our experimental results can be found at [18], where our tool is also offered for download.

Evaluation on Examples from the Literature
In order to evaluate the precision of our approach on a number of known challenges to bound analysis, we performed a tool comparison on 110 examples from the literature. Our example set comprises those examples from the tool evaluation in [6,29] that were available as imperative code (C or pseudo code, in total 89 examples), and additionally the examples used for the evaluation of Ref. [7] (15 examples) as well as the running examples of Ref. [27] (6 examples). We added the tools Rank (implementing [2]) and C4B (implementing [7]) to the comparison, because we were able to formulate the examples over the restricted C subset that is supported by these two tools (this was not possible for our experiment on real-world code).
The results of our evaluation are shown in Table 11. Our two tools loopus'15 and loopus'14 compute the highest number of linear bounds and are also significantly faster than the other tools, in particular KoAT and CoFloCo. On the other hand, KoAT computes the highest number of bounds in total (4 more than loopus). CoFloCo computes, in total, 1 bound more than our tool. The comparatively low number of bounds computed by C4B is also due to the fact that the approach implemented in C4B is limited to linear bounds.
In summary, our second evaluation shows that our approach is not only successful on the class of problems on which we focused in this article, but also solves many other bound analysis problems from the literature. Note that, in contrast to our first evaluation, our second benchmark contains small examples from academia (1293 LOC, on average 12 lines per file). On these examples our implementation is comparable in strength to the implementations of other state-of-the-art approaches to bound analysis. Given that our tool is a prototype implementation, there is room for improvement; concrete suggestions are discussed in [30]. More details on the results computed by each tool can be found at [19].

Evaluation on Challenging Iteration Patterns from Real Code
Scanning through two C-code benchmarks (one of them cBench), we collected 23 challenging loop iteration patterns from real source code; Example xnu (discussed in Sect. 7) is a natural example for the described behaviour. The complete benchmark is available at [19]. For each pattern we link its origin in the header of the respective file. Note that for some patterns we found several instances. Table 12 states the results that were obtained by loopus'15, loopus'14, CoFloCo, KoAT, Rank and C4B: '✓' denotes that the bound computed by the respective tool is tight (in the same asymptotic class as the precise bound, see Definition 5), 'O(n^x)' denotes that the respective tool did not infer a tight bound but a bound in the asymptotic class O(n^x), '✗' denotes that no bound was inferred, 'TO' denotes that the tool timed out (the time out limit was 20 min; a longer time out did not yield additional results), and a blank entry denotes that we were not able to translate the example into the input format of the tool. For each file we annotate its asymptotic complexity (an asymptotic bound on the total number of loop iterations, determined manually) behind its file name in Table 12. The experiment demonstrates that our bound analysis complements the state-of-the-art by inferring tight bounds for a class of real-world loop iterations on which existing techniques mostly fail or obtain coarse over-approximations.
Technical remarks (1) We included the time needed by the tool Aspic (a preprocessor for Rank which performs invariant generation) in the time of the bound analysis performed by Rank. (2) Rank reported an unsound bound and an error message for the examples s_SFD_process.c, load_mems.c and SingleLinkCluster.c. On these examples we therefore assessed Rank's return value as fail ('✗').

Amortized Complexity Analysis
In the following we discuss how our approach relates to amortized complexity analysis as introduced by Tarjan in his influential paper [32]. We recall Tarjan's idea of using potential functions for amortized analysis in Sect. 9. In Sect. 9.1 we explain how our approach can be viewed as an instantiation of amortized analysis via potential functions.
Amortized Analysis using Potential Functions Amortized complexity analysis [32] aims at inferring the worst-case average cost over a sequence of calls to an operation or function rather than the worst-case cost of a single call. In (resource) bound analysis the difference between the single worst-case cost and the amortized cost is relevant, e.g., if a function f is called inside a loop: assume the loop bound is n and the single worst-case cost of a call to f is also n. The cost of a single call to f amortized over all n calls might, however, be lower than n, e.g., 2. In this case the total worst-case cost of iterating the loop is 2n rather than n^2. Note that in our non-recursive setting function calls can always be inlined. The amortized analysis problem thus boils down to the problem of inferring the cost of executing an inner loop averaged over all executions of the outer loop.
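This averaging effect can be illustrated with a small sketch; the loop below is a hypothetical toy example (not taken from our benchmarks). A single execution of the inner loop may take several steps, but the inner loop's total number of iterations over the whole run is bounded by n, because the counter j is incremented only once per outer iteration:

```python
# Hypothetical toy loop: the inner loop's total iteration count over the
# whole run is at most n (amortized cost <= 1 per outer iteration), even
# though a single inner-loop execution may take several steps.
def amortized_pattern(n):
    inner_total = 0
    j = 0
    for i in range(n):
        j += 1                 # the outer loop "pays" one unit into the counter
        if i % 3 == 2:         # occasionally the inner loop drains the counter
            while j > 0:
                j -= 1         # each inner step consumes one paid unit
                inner_total += 1
    return inner_total         # always <= n
```

For example, `amortized_pattern(30)` performs 30 inner iterations in total, even though individual drains take 3 steps each.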
Tarjan [32] motivates amortized complexity analysis on the example of a program which executes n stack operations StackOp. Each StackOp operation consists of a push instruction, adding an element to the stack, followed by a pop instruction, removing an arbitrary number of elements from the stack. Initially the stack is empty. The cost of a single push is 1 and the cost of a single pop is the number of elements removed from the stack. Tarjan points out that the worst-case cost of a single pop is n: the nth pop instruction may pop n elements (cost n) from the stack, if the previous pop instructions did not remove any elements from the stack; i.e., the worst-case cost of a single StackOp operation is n + 1. Nevertheless all n operations StackOp cannot cost more than 2n in total, since we cannot remove more elements from the stack than have been added to the stack, and thus the overall cost of the pop instructions is bounded by the total number of push instructions (n by assumption). The amortized cost of StackOp, i.e., the cost of StackOp averaged over the sequence of all n operations, is therefore 2.
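Tarjan's example can be simulated directly. The sketch below is illustrative code (the pop size is randomized, which is our choice, not Tarjan's); it checks that the total cost of n StackOp operations never exceeds 2n:

```python
import random

# Illustrative simulation: n StackOp operations, each a push (cost 1)
# followed by a pop of an arbitrary number of elements (cost = number of
# elements removed). Total cost is always <= 2n, since the total number of
# popped elements cannot exceed the total number of pushed elements (n).
def run_stack_ops(n, seed=0):
    rng = random.Random(seed)
    stack_size = 0
    total_cost = 0
    for _ in range(n):
        stack_size += 1                   # push: cost 1
        total_cost += 1
        k = rng.randint(0, stack_size)    # pop k elements: cost k
        stack_size -= k
        total_cost += k
    return total_cost
```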
Potential Function As a means to reason about the amortized cost of an operation or a sequence of operations, Tarjan introduces the notion of a potential function. A potential function is a function Φ : Σ → Z from the program states to the integers. Let C_op(σ) denote the cost of executing operation op at program state σ ∈ Σ. Let Φ be a potential function. Tarjan defines the amortized cost of op as

a_op(σ) = C_op(σ) + Φ(σ') − Φ(σ),

where σ denotes the program state before and σ' denotes the program state after executing op. I.e., the amortized cost is the cost plus the net change (increase resp. decrease) in the value of the potential. Consider a sequence of n operations, let op_i denote the ith operation in the sequence. Let σ_i denote the program state before executing operation op_i; σ_{i+1} is the program state after executing op_i. In general, the total cost of executing all n operations is:

Σ_{i=1}^{n} C_{op_i}(σ_i) = Σ_{i=1}^{n} a_{op_i}(σ_i) + Φ(σ_1) − Φ(σ_{n+1}).

That is, the total time of the operations equals the sum of their amortized times plus the net decrease in potential from the initial to the final configuration. [...] In most cases of interest, the initial potential is zero and the potential is always non-negative. In such a situation the total amortized time is an upper bound on the total time [32]. I.e., if Φ(σ_i) ≥ 0 for all i and Φ(σ_1) = 0 then

Σ_{i=1}^{n} C_{op_i}(σ_i) ≤ Σ_{i=1}^{n} a_{op_i}(σ_i).     (2)

Reconsider Tarjan's previously discussed example of a sequence of n executions of operation StackOp. Let j denote the stack size, i.e., σ_i(j) is the size of the stack in program state σ_i. The cost of executing StackOp in program state σ_i is C_StackOp(σ_i) = 1 + (σ_i(j) + 1 − σ_{i+1}(j)) (1 is the cost of the push operation, σ_i(j) + 1 − σ_{i+1}(j) is the cost of the pop operation). Tarjan proposes to use the stack size j as a potential function, i.e., we choose Φ(σ_i) = σ_i(j). We have a_StackOp(σ_i) = C_StackOp(σ_i) + Φ(σ_{i+1}) − Φ(σ_i) = 2. With (2) we get: Σ_{i=1}^{n} C_StackOp(σ_i) ≤ 2n.
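Both the accounting identity and the constant amortized cost of StackOp can be checked mechanically. The following sketch is illustrative (our own encoding, with Φ taken as the stack size j, as Tarjan proposes); it verifies on a random run that every StackOp has amortized cost 2 and that the total cost equals the total amortized cost plus the net decrease in potential:

```python
import random

# Illustrative check of Tarjan's accounting, with Phi(sigma) = stack size j:
#   amortized cost:  a = C + Phi(after) - Phi(before)
#   identity:        sum(C) = sum(a) + Phi(initial) - Phi(final)
def check_potential_accounting(n, seed=1):
    rng = random.Random(seed)
    size = 0                                  # Phi = stack size; initially 0
    phi0 = size
    total_cost = 0
    total_amortized = 0
    for _ in range(n):
        before = size
        size += 1                             # push (cost 1)
        k = rng.randint(0, size)
        size -= k                             # pop k elements (cost k)
        cost = 1 + k
        amortized = cost + size - before      # a = C + Phi(after) - Phi(before)
        assert amortized == 2                 # StackOp's amortized cost is always 2
        total_cost += cost
        total_amortized += amortized
    assert total_cost == total_amortized + phi0 - size
    return total_cost                         # <= 2n by inequality (2)
```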

Amortized Analysis in our Algorithm
Example tarjan in Fig. 12 is a model of Tarjan's motivating example (discussed above): variable j models the stack size. The push instruction is modeled by increasing the stack size j by 1. The pop instruction is modeled by decreasing the stack size. Consider the LTS of Example tarjan in Fig. 12b: transition τ1 models the push instruction, increasing the stack size j by 1, and a sequence of transitions τ2 models the pop instruction, decreasing the stack by an arbitrary number of elements. A complete run ρ of Example tarjan can be decomposed into the initial transition τ0 and a number of sub-runs ρ[i_k, i_{k+1}], where each sub-run consists of a single transition τ1 (push) followed by a sequence of transitions τ2 (pop), followed by a single execution of transition τ3. Each sub-run ρ[i_k, i_{k+1}] models Tarjan's StackOp operation. We thus have that the amortized cost of a sub-run ρ[i_k, i_{k+1}] is 2. Given that τ1 cannot be executed more than n times and each ρ[i_k, i_{k+1}] contains exactly one τ1, we get that the overall cost of executing Example tarjan is bounded by n × 2 = 2n. In the following we argue that our transition bound algorithm TB is an instantiation of amortized analysis using potential functions. We base our discussion on the concrete semantics of Example tarjan given by the LTS in Fig. 12b. Note, however, that our algorithm runs on the abstracted DCP in Fig. 12c, where the same reasoning applies. Suppose we want to compute the transition bound of transition τ2 in order to compute the total cost of the pop instructions. Let ρ = (σ_0, l_0) →_{λ_0} (σ_1, l_1) →_{λ_1} · · · be a run of Example tarjan. Let len(ρ) denote the length of ρ (i.e., the total number of transitions on ρ). We define the cost of executing τ2 in program state σ_i as C_τ2(σ_i) = 1 and the cost of executing τ1 and τ3 as C_τ1(σ_i) = C_τ3(σ_i) = 0, since we are only interested in τ2. We have that the total cost Σ_{i=0}^{len(ρ)−1} C_{ρ(i)}(σ_i) equals the number of executions of τ2 on ρ, where ρ(i) denotes the (i+1)th transition l_i →_{u_i} l_{i+1} on ρ.
Our algorithm reduces the question "how often can τ2 be executed?" to the question "how often can the local bound j of τ2 be increased on τ1?". This reasoning uses the local bound j of τ2 as a potential function, as we show next. Choosing Φ(σ) = σ(j), we get the following amortized costs for executing τ1, τ2 and τ3 respectively: a_τ1(σ_i) = C_τ1(σ_i) + Φ(σ_{i+1}) − Φ(σ_i) = 0 + 1 = 1 (τ1 increments j), a_τ2(σ_i) = 1 + (−1) = 0 (τ2 decrements j), and a_τ3(σ_i) = 0 + 0 = 0 (τ3 does not modify j). With σ_i(j) ≥ 0, σ_1(j) = 0 and (2) we have that the total cost of ρ, i.e., the number of executions of τ2 on ρ, is bounded by the number of executions of τ1 on ρ. We point out that choosing the local bound j of τ2 as potential function causes the amortized cost of executing τ2 to be 0 and reduces the question how often τ2 can be executed to the question how often the potential j can be incremented on τ1. Since τ1 can be executed at most n times on ρ, one obtains the upper bound n for the total cost of the pop instructions.
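Under the assumptions stated above (τ1 increments j by 1 with cost 0, each τ2 decrements j by 1 with cost 1, τ3 leaves j unchanged), a small simulation illustrates the resulting bound. The loop structure below is our simplified rendering of Fig. 12, not the actual DCP syntax:

```python
import random

# Simplified rendering of Example tarjan (Fig. 12): tau1 increments j
# (push), each tau2 decrements j by 1 (one pop step, the only transition we
# charge), tau3 leaves j unchanged. The number of tau2 executions is bounded
# by the number of tau1 executions, which is at most n.
def run_tarjan_dcp(n, seed=2):
    rng = random.Random(seed)
    j = 0
    tau1_count = 0
    tau2_count = 0
    for _ in range(n):                        # tau1 fires at most n times
        j += 1                                # tau1: j' = j + 1, cost 0
        tau1_count += 1
        while j > 0 and rng.random() < 0.5:
            j -= 1                            # tau2: j' = j - 1, cost 1
            tau2_count += 1
        # tau3: exit the inner loop, j unchanged, cost 0
    assert tau2_count <= tau1_count <= n      # the potential argument
    return tau2_count
```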

Conclusion
We presented a new approach to (resource) bound analysis. Our approach complements existing approaches in several aspects, as discussed in Sect. 2.3. Our analysis handles bound analysis problems of high practical relevance which current approaches cannot handle: current techniques [6,7,10,29] fail on Example xnu and similar problems. We have argued that such problems occur naturally, e.g., in parsing and string matching routines. During our experiments on real-world source code, we found 23 different iteration patterns that pose a challenge for similar reasons as Example xnu: in these patterns, the worst-case cost of the inner loop averaged over the iterations of the outer loop is lower than the worst-case cost of a single inner loop execution. Our implementation obtains tight bounds for 21 out of these 23 iteration patterns (Sect. 8.3).
Our algorithm (Sect. 3) obtains invariants by means of bound analysis and does not rely on external techniques for invariant generation. This is in contrast to current bound analysis techniques (see discussion on related work in Sect. 2). We have compared our algorithm to classical invariant analysis and argued that we can efficiently compute invariants which are difficult to obtain by standard abstract domains such as octagon or polyhedra (Sect. 2). We have demonstrated that the limited form of invariants (upper bound invariants) that our algorithm obtains is sufficient for the bound analysis of a large class of real-world programs.
We have demonstrated that difference constraints are a suitable abstract program model for automatic complexity and resource bound analysis. Despite their syntactic simplicity, difference constraints are expressive enough to model the complexity-related aspects of many imperative programs. In particular, difference constraints allow us to model amortized complexity problems such as the bound analysis challenge posed by Example xnu (discussed in Sect. 7). We developed appropriate techniques for abstracting imperative programs to DCPs (Sect. 6): we described how to extract norms (integer-valued expressions over the program state) from imperative programs and showed how to use these norms as variables in DCPs.
Our approach deals with many of the challenges that bound analysis is known to be confronted with: in Sect. 8.2 we compared our tool on a benchmark of challenging problems from publications on bound analysis. The results show that our prototype implementation can handle most of these problems. Here, our implementation, while comparable in strength to other implementations of state-of-the-art bound analysis techniques, performs the task significantly faster than the competitors. The results obtained by our prototype tool could be further enhanced by extending our implementation with additional techniques discussed in [30].
We stress that our approach is more scalable than existing approaches. We presented empirical evidence of the good performance characteristics of our analysis by a large experiment and tool comparison on real source code in Sect. 8.1. We discuss the main technical reasons for the scalability of our analysis in Sect. 10.1.
We think that the abstract program model of difference constraint programs is worth further investigation: given that difference constraints can model standard counter manipulations (counter increments, decrements and resets), further research on the complexity analysis of difference constraint programs is of high value. We consider DCPs to be a very suitable program model for studying the principal challenges of automated complexity and resource bound analysis for imperative programs.

Discussion on the Scalability of Our Analysis
In the following we state what we consider to be the main technical reasons that make our analysis scale: First of all, we achieve scalability by local reasoning: note that our abstraction procedure relies on purely local information, i.e., information that is available on single program transitions. In particular, we do not apply global invariant analysis. Further, the sets I(v) and R(v), by which our main algorithm is parametrized, are built by categorizing the difference constraints on single (abstract) program transitions based on simple syntactic criteria. Our algorithm for computing the local bound mapping ζ (Sect. 4) is polynomial even in the generalized case (Sect. 4.1).
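As an illustration of this purely syntactic, per-transition categorization, the following sketch classifies difference constraints x' ≤ y + c into increments I(v) and resets R(v). The tuple encoding and the constant-zero variable `zero` are hypothetical simplifications for this sketch, not our tool's actual data structures:

```python
from collections import defaultdict

# Hypothetical encoding: a DCP transition maps to a list of difference
# constraints (x, y, c) meaning x' <= y + c. Classification is purely
# syntactic and local to each transition; no global invariant analysis runs.
def classify(transitions):
    increments = defaultdict(list)  # I(v): (transition, c) with v' <= v + c, c > 0
    resets = defaultdict(list)      # R(v): (transition, y, c) with v' <= y + c, y != v
    for t, constraints in transitions.items():
        for (x, y, c) in constraints:
            if y == x and c > 0:
                increments[x].append((t, c))
            elif y != x:
                resets[x].append((t, y, c))
    return increments, resets

# tau1: j' <= j + 1 is an increment of j; tau0: j' <= zero + 0 resets j
# (here `zero` is a hypothetical constant-zero variable).
incs, rsts = classify({
    "tau0": [("j", "zero", 0)],
    "tau1": [("j", "j", 1)],
    "tau2": [("j", "j", -1)],   # a decrement: neither an increment nor a reset
})
assert incs["j"] == [("tau1", 1)]
assert rsts["j"] == [("tau0", "zero", 0)]
```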
We use bound analysis to infer bounds on variable values (variable bounds). Unlike classical invariant analysis this approach is demand-driven and does not perform a fixed point iteration (see discussion in Sect. 2.3).
Note that the only general purpose reasoner we employ is an SMT solver, and the SMT solver is only employed in the program abstraction phase. In terms of size, the problems we feed to the SMT solver are small, namely simple linear arithmetic formulas composed of the arithmetic of single transitions. Our approach queries the SMT solver only for yes/no answers; no optimal solution (e.g., minimum or minimal unsatisfiable core) is required.
Our basic bound algorithm (Definition 19) runs in polynomial time. The reasoning based on reset chains (Definition 23), however, has exponential worst-case complexity, resulting from the potentially exponential number of paths in the program (exponential in the number of program transitions). We did not experience this to be an issue in practice, because the simplicity of our abstract program model allows us to take straightforward engineering measures: program slicing significantly reduces the number of paths in the program; further, merging of similar paths can be applied (details are given in [28]).