Loop Summarization with Rational Vector Addition Systems (extended version)

This paper presents a technique for computing numerical loop summaries. The method works by first synthesizing a rational vector addition system with resets (Q-VASR) that simulates the action of an input loop, and then using the (polytime computable) reachability relation of Q-VASRs to over-approximate the behavior of the loop. The key technical problem solved in this paper is to synthesize a Q-VASR that is a best abstraction of a loop in the sense that (1) it simulates the loop and (2) it is simulated by any other Q-VASR that simulates the loop. As a result, our loop summarization scheme has predictable precision. We implement the summarization algorithm and show experimentally that it is precise and performant.


Introduction
Modern software verification techniques employ a number of heuristics for reasoning about loops. While these heuristics are often effective, they are unpredictable. For example, an abstract interpreter may fail to find the most precise invariant expressible in the language of its abstract domain due to imprecise widening; similarly, a software-model checker might fail to terminate because it generates interpolants that are insufficiently general. This paper presents a loop summarization technique that is powerful, in the sense that it generates expressive loop invariants, and predictable, in the sense that we can make theoretical guarantees about invariant quality.
The key idea behind our technique is to leverage reachability results of vector addition systems (VAS) for invariant generation. Vector addition systems are a class of infinite-state transition systems with decidable reachability; VAS are classically used as a model of parallel systems [10]. We consider a variation of VAS, rational VAS with resets (Q-VASR), wherein there is a finite number of rational-typed variables and a finite set of transitions that simultaneously update each variable in the system by either adding a constant value or (re)setting the variable to a constant value. Our interest in Q-VASRs stems from the fact that there is a (polytime) procedure to compute a linear arithmetic formula that represents a Q-VASR's reachability relation [8].
Since the reachability relation of a Q-VASR is computable, the dynamics of Q-VASRs can be analyzed without relying on heuristic techniques. However, there is a gap between Q-VASRs and the loops that we are interested in summarizing. The latter typically use a rich set of operations (memory manipulation, conditionals, non-constant increments, non-linear arithmetic, etc.) and cannot be analyzed precisely. We bridge the gap with a procedure that, for any loop, synthesizes a Q-VASR that simulates the loop. The reachability relation of the Q-VASR can then be used to over-approximate the behavior of the loop. Moreover, we prove that if a loop is expressed in linear rational arithmetic (LRA), our procedure synthesizes a best Q-VASR abstraction, in the sense that it is simulated by any other Q-VASR that simulates the loop. That is, the procedure does not make arbitrary heuristic choices, but rather synthesizes a best approximation of the loop in the language of Q-VASRs.
Q-VASRs over-approximate multi-path loops by treating the choice between paths as non-deterministic. We show that we can recover some conditional control flow information and inter-path control dependencies by partitioning the states of the loop, and encoding this partitioning by extending Q-VASR with control states (Q-VASR with states, Q-VASRS). We give a procedure for synthesizing a Q-VASRS that simulates a given loop; we may then use the reachability relation of the Q-VASRS to summarize the loop. We prove that, for a fixed program-state partition, this procedure computes best Q-VASRS abstractions for LRA formulas. Additionally, we give a state-partitioning algorithm that yields a monotone loop summarization procedure (more accurate information about loop bodies results in more accurate loop summaries).
Finally, we note that our analysis techniques extend to complex control structures, such as nested loops, by employing summarization compositionally (i.e., "bottom-up"). For example, our analysis summarizes a nested loop by first summarizing its inner loops, and then uses the summaries to analyze the outer loop. As a result of compositionality, our analysis can be applied to partial programs, is easy to parallelize, and has the potential to scale to large code bases.
The main contributions of the paper are as follows:
- We present a procedure to synthesize Q-VASR abstractions of transition formulas. For transition formulas in linear rational arithmetic, this Q-VASR is a best abstraction.
- We present a technique for improving the precision of our analysis by using Q-VASR with states to capture loop control structure.
- We implement the proposed invariant generation technique and show that its ability to verify user assertions is comparable to software model checkers, while providing theoretical guarantees of termination and invariant quality.

Outline
This section illustrates the high-level structure of our invariant generation scheme. The goal is to compute a transition formula that summarizes the behavior of a given program. A transition formula is a formula over a set of program variables Var along with primed copies Var′, representing the state of the program before and after executing a computation (respectively). For any given program P, a transition formula can be computed by recursion on syntax, summarizing each loop as the over-approximate transitive closure of the guarded loop body, where (−)⋆ is a function that computes an over-approximation of the transitive closure of a transition formula. The contribution of this paper is a method for computing this (−)⋆ operation, which is based on first over-approximating the input transition formula by a Q-VASR, and then computing the (exact) reachability relation of the Q-VASR.
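To make the role of (−)⋆ concrete, transition formulas and their transitive closure can be modeled extensionally. The following Python sketch is our own illustration (not the paper's implementation): it represents a transition formula as a finite relation over a tiny state space and computes (−)⋆ as an exact reflexive-transitive closure, which the paper's method over-approximates symbolically.

```python
# Illustrative sketch (not the paper's implementation): a transition
# formula is modeled extensionally as a set of (pre, post) state pairs
# over a tiny finite state space, and (-)* is computed as an exact
# reflexive-transitive closure.
def compose(F, G):
    """Relational composition: take a step of F, then a step of G."""
    return {(u, w) for (u, v) in F for (v2, w) in G if v == v2}

def star(F, states):
    """Reflexive-transitive closure of F over the given state space."""
    closure = {(s, s) for s in states}      # reflexive part
    frontier = set(closure)
    while True:
        new = compose(frontier, F) - closure
        if not new:
            return closure
        closure |= new
        frontier = new

# Loop body x' = x + 1 under the guard x < 4, on states {0, ..., 4}.
states = set(range(5))
body = {(x, x + 1) for x in range(4)}
summary = star(body, states)
assert (0, 3) in summary and (3, 0) not in summary
```

The symbolic analogue replaces the finite relation with a formula and the fixpoint iteration with the Q-VASR reachability relation.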
We illustrate the analysis on the integer model of a persistent queue data structure pictured in Figure 1. The example consists of two operations (enqueue and dequeue), as well as a test harness (harness) that non-deterministically executes enqueue and dequeue operations. The queue achieves O(1) amortized time enqueue and dequeue by implementing the queue as two lists, front and back (whose lengths are modeled as front len and back len, respectively); the sequence of elements in the queue is the front list followed by the reverse of the back list. We will show that the queue operates in O(1) amortized time by finding a summary for harness that implies a linear bound on mem ops (the number of memory operations in the computation) in terms of nb ops (the total number of enqueue/dequeue operations executed in some sequence of operations).
To analyze the queue, we proceed compositionally, in "bottom-up" fashion (i.e., starting from deeply-nested code and working our way back up to a summary for harness). There are two loops of interest, one in dequeue and one in harness. Since the dequeue loop is nested inside the harness loop, dequeue is analyzed first. We first compute a transition formula that represents one execution of the body of the loop. Observe that each variable in the loop is incremented by a constant value. As a result, the loop update can be captured faithfully by a vector addition system: this loop body formula is simulated (via a simulation relation ∼deq) by a four-dimensional vector addition system Vdeq. A formula representing the reachability relation of a vector addition system can be computed in polytime; for the case of Vdeq, we obtain a formula representing k steps of the Q-VASR. To capture information about the pre-condition (post-condition) of the loop, we may project out the primed variables to obtain back len > 0, and similarly project out the unprimed variables to obtain back len′ ≥ 0. Finally, combining the Q-VASR update formula, the simulation relation ∼deq, and the pre/post-condition, we obtain an approximation of the dequeue loop's behavior.

Using this summary for the dequeue loop, we may proceed to compute a transition formula for the body of the harness loop (omitted for brevity). Just as with the dequeue loop, we analyze the harness loop by computing a Q-VASR, Vhar, that simulates it (via ∼har). Unlike the dequeue loop, we do not get an exact characterization of the dynamics of each changed variable. In particular, in the slow dequeue path through the loop, the values of front len, back len, and mem ops change by a variable amount. The variable back len is set to 0, so its behavior can be captured by a reset.
The dynamics of front len and mem ops individually cannot be captured by a Q-VASR, but (using our dequeue summary) we can observe that the sum front len + back len is decremented by 1, and the sum mem ops + 3·back len is incremented by 2.
We compute the following formula that captures the reachability relation of Vhar (taking k1 steps of enqueue, k2 steps of dequeue fast, and k3 steps of dequeue slow):

  mem ops′ + 3·back len′ = mem ops + 3·back len + 4k1 + 2k2 + 2k3
∧ front len′ + back len′ = front len + back len

Using this update formula (along with pre/post-condition formulas), we obtain a summary for the harness loop (omitted for brevity). Using this summary we can prove some interesting features of the data structure (supposing that we start in a state where all variables are zero): mem ops is at most 4 times nb ops (i.e., enqueue and dequeue use O(1) amortized memory operations), and size is the sum of front len and back len.

Background
We now take a moment to define what a transition system is, the transition systems of interest in this paper (transition formulas, Q-VASR, Q-VASRS), and the notation used throughout the paper.
For a transition relation →, we use → * to denote its reflexive, transitive closure.
For a vector a, we use diagonal(a) to denote the diagonal matrix with a on the diagonal. For two vectors a and b of the same dimension d, we use a · b = a1b1 + · · · + adbd to denote their inner product. For any natural number i, we use ei to denote the standard basis vector in the ith direction (i.e., the vector consisting of all zeros except the ith entry, which is 1), where the dimension of ei is understood from context. We use In to denote the n × n identity matrix, or simply I if n is understood from context.
For any natural number pair, n, m ∈ N, matrix A ∈ Q^{n×m}, and set R ⊆ {1, ..., n}, define ΠR(A) to be the |R| × m submatrix of A obtained by deleting the rows not in R (i.e., if we enumerate R in order as i1, ..., in′ then ΠR(A) is the matrix whose jth row is the ij-th row of A). Observe that for any A and R, ΠR(A) = ΠR(In)A.

Definition 1. An n-transition formula is a ∃LRA (or ∃LIA) formula whose free variables range over x1, ..., xn and x′1, ..., x′n. The free variables designate the state before and after a transition, and ∃LIA (∃LRA) denotes the existential fragment of linear integer (rational) arithmetic.
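The row-selection operator ΠR can be illustrated directly. The sketch below is our own addition (not from the paper); it also checks the identity ΠR(A) = ΠR(I)A, which holds because deleting rows commutes with right multiplication.

```python
# Sketch (our own illustration) of the row-selection operator: Pi_R(A)
# keeps only the rows of A with index in R, enumerated in increasing
# order (1-based, to match the text).
def pi(R, A):
    return [A[i - 1] for i in sorted(R)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4], [5, 6]]
R = {1, 3}
assert pi(R, A) == [[1, 2], [5, 6]]
# Row selection commutes with right multiplication: Pi_R(A) = Pi_R(I) A.
assert pi(R, A) == matmul(pi(R, identity(3)), A)
```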
The syntax of ∃LIA/∃LRA formulas is given by the following grammar:

t ::= x | c | t + t | c · t
F ::= t = t | t < t | t ≤ t | F ∧ F | F ∨ F | ∃x. F

where x is a variable symbol and c is a rational number. Observe that (without loss of generality) we assume that formulas are free of negation.
An n-transition formula F defines a transition system (SF, →F), where SF = Q^n and u →F v iff F is satisfied when each xi is interpreted as ui and each x′i as vi.

Definition 2. A d-dimensional rational vector addition system with resets (Q-VASR) is a finite set V ⊆ {0, 1}^d × Q^d of transformers. Each transformer (a, b) ∈ V consists of a binary reset vector a and a rational addition vector b, both of dimension d. V defines a transition system (Q^d, →V), where u →V v iff v = a * u + b for some (a, b) ∈ V (writing a * u for the componentwise product).

Definition 3. A d-dimensional rational vector addition system with resets and states (Q-VASRS), V = (Q, E), is a finite set of states, Q, together with a finite set of edges, E ⊆ Q × {0, 1}^d × Q^d × Q. V defines a transition system (Q × Q^d, →V), where (q1, u) →V (q2, v) iff there is an edge (q1, a, b, q2) ∈ E with v = a * u + b.

Theorem 1 ([8]). The reachability relation of a Q-VASRS is definable in Presburger arithmetic.
Q-VASRs are a special case of Q-VASRSs with a single state and so this theorem applies to Q-VASRs as well.
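As a concrete reading of the transformer semantics, a Q-VASR step v = a * u + b can be sketched as follows (illustrative code we add, not from the paper; ai = 0 resets dimension i and ai = 1 preserves it).

```python
from fractions import Fraction

# Sketch of the transformer semantics: applying (a, b) to state u yields
# a * u + b componentwise, where a is a binary reset vector (a_i = 0
# resets dimension i, a_i = 1 preserves it) and b is a rational addition
# vector.
def step(u, transformer):
    a, b = transformer
    return tuple(ai * ui + bi for ai, ui, bi in zip(a, u, b))

# Reset dimension 0 to 1; increment dimension 1 by 2.
t = ((0, 1), (Fraction(1), Fraction(2)))
assert step((Fraction(5), Fraction(7)), t) == (1, 9)
```

A Q-VASR is then a finite set of such transformers, one of which is chosen non-deterministically at each step; a Q-VASRS additionally constrains the choice by control states.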

Approximating loops with vector addition systems
In this section, we describe a method for over-approximating the transitive closure of a transition formula using a Q-VASR. This procedure immediately extends to computing summaries for programs using the method outlined in Section 1.1.
The core algorithmic problem that we answer in this section is: given a transition formula, how can we synthesize a best abstraction of that formula's dynamics as a Q-VASR? We begin by formalizing the problem: in particular, we define what it means for a Q-VASR to simulate a transition formula and what it means for an abstraction to be "best."

Definition 4. Let A = (Q^n, →A) and B = (Q^m, →B) be transition systems operating over rational vector spaces. A linear simulation from A to B is a linear transformation S : Q^n → Q^m such that for all u, v ∈ Q^n with u →A v, we have Su →B Sv; we write A ⊑_S B when S is a linear simulation from A to B.

For a matrix S and a Q-VASR V, γ(S, V) is a transition formula representing the transitions that V simulates under the transformation S. The key property of simulations is that if F ⊑_S V, then every transition sequence of F maps under S to a transition sequence of V, so the reachability relation of V over-approximates the transitive closure of F.

Our task is to synthesize a linear transformation S and a Q-VASR V such that F ⊑_S V. We call a pair (S, V), consisting of a rational matrix S ∈ Q^{d×n} and a d-dimensional Q-VASR V, a Q-VASR abstraction; we say that n is the concrete dimension of (S, V) and d is the abstract dimension. We say that (S, V) is a Q-VASR abstraction of F if F ⊑_S V. A transition formula has many Q-VASR abstractions, and so we are interested in comparing their precision. We define a preorder ⊑ on Q-VASR abstractions, where (S, V) ⊑ (S̃, Ṽ) iff there exists a linear transformation R ∈ Q^{e×d} such that V ⊑_R Ṽ and RS = S̃ (d and e are the abstract dimensions of (S, V) and (S̃, Ṽ), respectively).
Thus, our problem can be stated as follows: given a transition formula F, synthesize a best Q-VASR abstraction of F, i.e., a Q-VASR abstraction of F that is ⊑ every other Q-VASR abstraction of F. A solution to this problem is given in Algorithm 1.
Algorithm 1 follows the familiar pattern of an AllSat-style loop. We begin with an empty Q-VASR abstraction ((I, ∅)), and iteratively build the abstraction up to over-approximate all possible behaviors of F. The formula Γ maintains the set of transitions that are allowed by F but not simulated by the current Q-VASR abstraction. Each abstraction round proceeds as follows. First, we sample a model M of Γ (i.e., a transition that is allowed by F but not simulated by (S, V)). We then generalize that transition to a set of transitions by using M to select a cube C of the DNF of F that contains M. We then compute a Q-VASR abstraction α(C) of C, using the procedure described in Section 3.1. We combine this Q-VASR abstraction with the current one ((S, V) ⊔ α(C)) by computing a least upper bound in the ⊑ order, using the procedure described in Section 3.2. Finally, we block any transition in C from being sampled again by conjoining ¬γ(S, V) to Γ. The loop terminates when Γ is unsatisfiable, in which case we have that F ⊑_S V.
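The control flow of this AllSat-style loop can be mirrored in a toy setting. The sketch below is a heavily simplified illustration of our own (hypothetical stand-in names, not the paper's implementation): a "formula" is a finite set of transitions, cubes are trivial singletons, and the abstraction operators are passed in as parameters; the real algorithm manipulates SMT formulas instead.

```python
# Toy sketch of Algorithm 1's AllSat-style structure (our own heavily
# simplified illustration, not the paper's implementation): a "formula"
# is a finite set of transitions, a sampled model generalizes to a
# (here trivial) cube, alpha abstracts a cube, join computes a least
# upper bound of abstractions, and gamma yields the transitions the
# current abstraction simulates.
def abstract_vasr(F, alpha, join, gamma):
    abstraction = None
    remaining = set(F)                   # models of Gamma: allowed by F,
    while remaining:                     # not yet simulated
        m = next(iter(remaining))        # sample a model of Gamma
        cube = {m}                       # generalize the model to a cube
        a = alpha(cube)                  # abstract the cube
        abstraction = a if abstraction is None else join(abstraction, a)
        remaining = set(F) - gamma(abstraction)   # block covered cubes
    return abstraction

# Degenerate instantiation: abstractions are just sets of transitions.
result = abstract_vasr({1, 2, 3}, frozenset, lambda a, b: a | b, set)
assert result == frozenset({1, 2, 3})
```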
Theorem 2. Given an n-transition formula F (in ∃LIA or ∃LRA), Algorithm 1 computes a Q-VASR abstraction of F . If F is in ∃LRA, Algorithm 1 computes a best Q-VASR abstraction of F .

Abstracting conjunctive transition formulas
In this section, we show how to compute a Q-VASR abstraction for a consistent conjunctive formula. The intuition is that, since ∃LRA is a convex theory, the best Q-VASR abstraction consists of a single transition. (For ∃LIA formulas, our procedure produces a Q-VASR abstraction that is not guaranteed to be best, precisely because ∃LIA is not convex.)
Let C be a consistent, conjunctive formula. Observe that the set R = {⟨s, c⟩ : C |= s · x′ = c}, which represents linear combinations of variables that are reset across C, forms a vector space. Similarly, the set representing linear combinations of variables that are incremented across C, I = {⟨s, c⟩ : C |= s · x′ = s · x + c}, forms a vector space. We compute bases for both R and I. As an example, consider a conjunctive formula C over x1, x2, x3 (containing a Skolem constant z) whose reset space has basis {⟨(1 0 0), 1⟩} (representing that x1 is reset to 1) and whose increment space has basis {⟨(1 0 0), 2⟩, ⟨(0 1 −1), −1⟩} (representing that x1 increases by 2 and x2 − x3 decreases by 1). A best abstraction of C is thus a three-dimensional Q-VASR whose simulation matrix has rows (1 0 0), (1 0 0), and (0 1 −1), with a single transformer that resets the first dimension to 1, increments the second by 2, and decrements the third by 1. In particular, notice that since the variable x1 is both incremented and reset, it is represented by two different dimensions in α(C).
Proposition 1. For any consistent, conjunctive transition formula C, α(C) is a Q-VASR abstraction of C. If C is expressed in ∃LRA, α(C) is best.
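When the loop body happens to be given as a deterministic affine update x′ = Ax + c, the reset and increment spaces reduce to left-nullspace computations: s · x′ equals the constant s · c exactly when sA = 0, and s · x′ = s · x + s · c exactly when s(A − I) = 0. The following sketch is our own illustration on our own example update (not the paper's implementation, which handles arbitrary consistent conjunctive formulas); it computes both spaces for x1′ = x1 + 2, x2′ = 0, x3′ = x2 + x3 − 1.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form over the rationals; returns (R, pivots)."""
    M = [row[:] for row in M]; pivots = []; r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [x - M[i][c] * y for x, y in zip(M[i], M[r])]
        pivots.append(c); r += 1
    return M, pivots

def nullspace(M):
    """Basis of {v : M v = 0}, one vector per free column."""
    R, pivots = rref(M)
    n = len(M[0]); basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n; v[f] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -R[r][f]
        basis.append(v)
    return basis

def left_nullspace(M):
    """Basis of {s : s M = 0}: the nullspace of the transpose."""
    return nullspace([list(col) for col in zip(*M)])

# Affine update x' = A x + c for: x1' = x1 + 2, x2' = 0, x3' = x2 + x3 - 1
A = [[Fraction(1), 0, 0], [0, Fraction(0), 0], [0, Fraction(1), Fraction(1)]]
c = [Fraction(2), Fraction(0), Fraction(-1)]
resets = left_nullspace(A)                         # s with s.A = 0
AmI = [[A[i][j] - (1 if i == j else 0) for j in range(3)] for i in range(3)]
increments = left_nullspace(AmI)                   # s with s.(A - I) = 0
assert resets == [[0, 1, 0]]                       # x2 is reset (to s.c = 0)
assert increments == [[1, 0, 0], [0, 1, 1]]        # x1 += 2; (x2+x3) += -1
```

Here (0 1 0) in the reset space says x2 is reset to the constant s · c = 0, while (1 0 0) and (0 1 1) in the increment space say x1 increases by 2 and x2 + x3 decreases by 1, mirroring the structure of the example above.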

Least upper bound
In this section, we show how to compute least upper bounds with respect to the ⊑ order. Supposing that (S1, V1) ⊑ (S, V) and (S2, V2) ⊑ (S, V), there must exist linear simulations R1 and R2 such that V1 ⊑_{R1} V, V2 ⊑_{R2} V, and R1S1 = S = R2S2. The intuition behind our approach is that we will compute R1 and R2, and derive the Q-VASR V as the union of the image of V1 under R1 and the image of V2 under R2. Computation of R1 and R2 relies on (1) the constraints on R1 and R2 induced by the expected equation R1S1 = R2S2, and (2) the fact that if Ri is a linear simulation from Vi to any other Q-VASR, then Ri must satisfy a certain structural property. This property is called coherence, as defined in the following.
Definition 5. Let V be a d-dimensional Q-VASR. We say that dimensions i and j of V are coherent if every transition of V that resets i also resets j and vice versa; we write i ≡V j to denote that i and j are coherent dimensions of V. Observe that ≡V forms an equivalence relation on {1, ..., d}. We refer to the equivalence classes of ≡V as coherence classes.
A row vector r ∈ Q^d is coherent with respect to V if and only if for all j, k ∈ {1, ..., d}, rj ≠ 0 and rk ≠ 0 implies j ≡V k. Equivalently, r is coherent if there is some coherence class C ∈ {1, ..., d}/≡V and some row vector r̃ ∈ Q^{|C|} such that r = r̃ΠC(I). If r is non-zero then the coherence class C is uniquely determined; in this case we use cohV(r) to denote C.
A matrix R ∈ Q^{e×d} is coherent with respect to V if and only if each of its rows is coherent with respect to V.

Lemma 1. Let V be a d-dimensional Q-VASR, let V̂ be an e-dimensional Q-VASR, and let R ∈ Q^{e×d} be a linear transformation such that V ⊑_R V̂. Then R is coherent with respect to V.
For a d-dimensional Q-VASR V and a linear transformation S ∈ Q^{e×d} that is coherent with respect to V and has no zero rows, there is a unique e-dimensional Q-VASR V̂ such that V ⊑_S V̂ and V̂ is minimal in the inclusion order; we use image(V, S) to denote this Q-VASR. More explicitly, image(V, S) = {(S ⊠ a, Sb) : (a, b) ∈ V}, where S ⊠ a is the reset vector a translated along S, defined as the e-dimensional vector with (S ⊠ a)i = aj for some arbitrary representative j ∈ cohV(Si). The intuition behind S ⊠ a is that each row i of S corresponds to a unique coherence class cohV(Si) of V, and either all the dimensions in cohV(Si) are reset (in which case we take (S ⊠ a)i = 0) or none of them are (in which case we take (S ⊠ a)i = 1). Observe that for any u ∈ Q^d, S(a * u) = (S ⊠ a) * (Su). The following lemma gives an intuitive characterization of image.
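The coherence classes of a Q-VASR and the image operation admit a direct computational reading. The sketch below is our own illustration with assumed helper names: it groups dimensions by their reset pattern, computes image(V, S) as {(S ⊠ a, Sb)}, and checks the identity S(a * u) = (S ⊠ a) * (Su) on a sample state.

```python
# Sketch (our own illustration) of coherence classes and image(V, S).
# Dimensions i and j are coherent iff every transformer resets both or
# neither -- i.e., iff columns i and j of the reset vectors agree.
def coherence_classes(V):
    d = len(V[0][0])
    classes = {}
    for i in range(d):
        key = tuple(a[i] for (a, _) in V)   # reset pattern of dimension i
        classes.setdefault(key, []).append(i)
    return list(classes.values())

def box(S, a):
    """S ⊠ a: entry i is a_j for a representative j with S[i][j] != 0."""
    return [a[next(k for k, x in enumerate(row) if x != 0)] for row in S]

def image(V, S):
    """image(V, S) = {(S ⊠ a, S b) : (a, b) in V}."""
    return [(box(S, a), [sum(si * bi for si, bi in zip(row, b)) for row in S])
            for (a, b) in V]

# Two transformers over 3 dimensions; dimensions 1 and 2 are coherent.
V = [((0, 1, 1), (1, 2, -1)), ((0, 1, 1), (3, 0, 5))]
assert coherence_classes(V) == [[0], [1, 2]]

# A coherent S (each row supported on a single coherence class).
S = [[2, 0, 0], [0, 1, -1]]
assert image(V, S)[0] == ([0, 1], [2, 3])

# Check the identity S(a * u) = (S ⊠ a) * (S u) on a sample state u.
u = (5, 7, 9)
for (a, b) in V:
    au = [ai * ui for ai, ui in zip(a, u)]
    Sau = [sum(si * xi for si, xi in zip(row, au)) for row in S]
    Su = [sum(si * ui for si, ui in zip(row, u)) for row in S]
    assert Sau == [ri * yi for ri, yi in zip(box(S, a), Su)]
```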
Algorithm 2:
input : Normal Q-VASR abstractions (S1, V1) and (S2, V2) of equal concrete dimension
output: Least upper bound (w.r.t. ⊑) of (S1, V1) and (S2, V2)
1 Let d1, d2 denote the abstract dimensions of (S1, V1) and (S2, V2);
2 S, R1, R2 ← empty matrices;

Before describing our least upper bound algorithm, we must define a technical condition that is both assumed and preserved by the procedure:

Definition 6. A Q-VASR abstraction (S, V) is normal if there is no non-zero vector z that is coherent with respect to V such that zS = 0 (i.e., the rows of S that correspond to any coherence class of V are linearly independent).
Intuitively, a Q-VASR abstraction that is not normal contains information that is either inconsistent or redundant.
We now present Algorithm 2, our algorithm for computing least upper bounds of Q-VASR abstractions. Let (S1, V1) and (S2, V2) be normal Q-VASR abstractions of equal concrete dimension. Our goal is to find matrices R1 and R2 such that (1) R1S1 = R2S2, (2) R1 is coherent with respect to V1, and (3) R2 is coherent with respect to V2. We find the best such R1 and R2 iteratively. For each pair of coherence classes C1 of V1 and C2 of V2, we compute matrices U1 and U2 such that (i) U1S1 = U2S2, (ii) the non-zero entries of each row of U1 are confined to the dimensions in C1, (iii) the non-zero entries of each row of U2 are confined to the dimensions in C2, and (iv) U1/U2 are maximal, in the sense that the rows of U1S1 = U2S2 form a basis of a vector space that contains the rowspace of any matrix E = E1S1 = E2S2 such that E1 and E2 satisfy (i)-(iii). We form R1 and R2 simply by collecting all such U1 and U2. Properties (1)-(3) together ensure that the Q-VASR abstraction (S, V), where S = R1S1 = R2S2 and V = image(V1, R1) ∪ image(V2, R2), is an upper bound on (S1, V1) and (S2, V2). The fact that R1 and R2 are constructed from matrices that satisfy the maximality condition (iv) ensures that this upper bound is least.

Control Flow and Q-VASRS
In this section, we give a method for improving the precision of our loop summarization technique by using Q-VASRS (Q-VASR extended with control states). While Q-VASRs over-approximate control flow using non-determinism, Q-VASRSs can encode patterns such as oscillating and multi-phase loops. Section 5 demonstrates that the ability to analyze such patterns greatly increases the accuracy of loop summaries for some loops. We begin with an example that demonstrates the precision gained by Q-VASRS. The loop in Figure 2a oscillates between (1) incrementing variable i by 1 and (2) incrementing both variables i and x by 1. Suppose that we wish to prove that, starting with the configuration x = 0 ∧ i = 1, the loop maintains the invariant that 2x ≤ i. The (best) Q-VASR abstraction of the loop, pictured in Figure 2b, over-approximates the control flow of the loop by treating the conditional branch in the loop as a non-deterministic branch. This over-approximation may violate the invariant 2x ≤ i by repeatedly executing the path where both variables are incremented. On the other hand, the Q-VASRS abstraction of the loop, pictured in Figure 2c, captures the fact that the loop must oscillate between the two paths. The loop summary obtained from the reachability relation of this Q-VASRS is powerful enough to prove that the invariant 2x ≤ i holds (under the precondition x = 0 ∧ i = 1).
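The precision gap in this example can be checked concretely. The following small simulation is our own sketch (not from the paper): it runs the oscillating loop of Figure 2a and checks that 2x ≤ i is maintained, then shows that a Q-VASR-style non-deterministic abstraction that always takes the incrementing path violates the invariant.

```python
# Sketch: the oscillating loop of Figure 2a alternates between
# path A (i += 1) and path B (i += 1; x += 1).  Starting from
# x = 0, i = 1, the invariant 2x <= i holds along the real loop,
# but not under the Q-VASR over-approximation, which may take
# path B repeatedly.
def run_oscillating(steps):
    x, i = 0, 1
    for k in range(steps):
        if k % 2 == 0:
            i += 1              # path A
        else:
            i += 1; x += 1      # path B
        assert 2 * x <= i       # invariant preserved by the real loop
    return x, i

run_oscillating(100)

# The Q-VASR abstraction treats the branch non-deterministically;
# always choosing path B breaks the invariant after two steps.
x, i = 0, 1
i += 1; x += 1                  # path B
i += 1; x += 1                  # path B again
assert not (2 * x <= i)         # 2*2 = 4 > 3
```

The Q-VASRS abstraction rules out the violating run by forcing the two paths to alternate via its control states.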

Technical details
Definition 7. An n-predicate Q-VASRS is a Q-VASRS, V = (P, E), such that each control state is a predicate over the variables x1, ..., xn and the predicates in P are pairwise inconsistent (for all p ≠ q ∈ P, p ∧ q is unsatisfiable).
We extend linear simulations to n-predicate Q-VASRSs as follows.
- Let F be an n-transition formula and let V = (P, E) be an n-predicate Q-VASRS of dimension m. We say that a linear transformation S : Q^n → Q^m is a linear simulation from F to V if for all u, v ∈ Q^n such that u →F v, there exist (unique) p, q ∈ P such that u |= p, v |= q, and (p, Su) →V (q, Sv).
- Let V = (P, E) and Ṽ = (P̃, Ẽ) be n-predicate Q-VASRSs of dimensions d and e, respectively. We say that a linear transformation S : Q^d → Q^e is a linear simulation from V to Ṽ if for all p, q ∈ P and for all u, v ∈ Q^d such that (p, u) →V (q, v), there exist p̃, q̃ ∈ P̃ such that p |= p̃, q |= q̃, and (p̃, Su) →Ṽ (q̃, Sv).
Observe that if V = (P, E) has a linear simulation to Ṽ = (P̃, Ẽ), then P must be finer than P̃, in the sense that for each p ∈ P there is a (unique) p̃ ∈ P̃ such that p |= p̃.
We define a Q-VASRS abstraction to be a pair (S, V) consisting of a rational matrix S ∈ Q^{d×n} and an n-predicate Q-VASRS of dimension d. We extend the simulation preorder ⊑ to Q-VASRS abstractions in the natural way. Extending the definition of "best" abstractions requires more care, since we can always find a "better" Q-VASRS abstraction (strictly smaller in the ⊑ order) by using a finer state partition. However, if we consider only n-predicate Q-VASRSs that share the same set of control states, then best abstractions do exist and can be computed using Algorithm 3.

Algorithm 3: abstract-VASRS(F, P )
input : Transition formula F, set of pairwise-disjoint predicates P such that u →F v implies u |= P and v |= P
output: Best Q-VASRS abstraction of F with control states P
1 For all p, q ∈ P, let (Sp,q, Vp,q) ← abstract-VASR(p ∧ F ∧ q′);
2 (S, V) ← least upper bound of all (Sp,q, Vp,q);
3 For all p, q ∈ P, let Rp,q ← the simulation matrix from (Sp,q, Vp,q) to (S, V);
4 E ← {(p, a, b, q) : p, q ∈ P, (a, b) ∈ image(Vp,q, Rp,q)};
5 return (S, (P, E))

Algorithm 3 works as follows. First, for each pair of formulas p, q ∈ P, compute a best Q-VASR abstraction of the formula p ∧ F ∧ q′ (where q′ denotes q with unprimed variables replaced by primed ones) and call it (Sp,q, Vp,q). (Sp,q, Vp,q) over-approximates the transitions of F that begin in a program state satisfying p and end in a program state satisfying q. Second, we compute the least upper bound of all Q-VASR abstractions (Sp,q, Vp,q) to get a best Q-VASR abstraction (S, V) for F. Computing the least upper bound has the effect of reconciling the Q-VASR abstractions corresponding to different edges of the Q-VASRS, but does not maintain the provenance of the Q-VASR transitions (i.e., which transformers correspond to which edges). To reconstruct provenance, we compute the linear simulation Rp,q from (Sp,q, Vp,q) to (S, V), and take the edges from p to q to be the image of Vp,q under Rp,q.
Proposition 3. Given an n-transition formula F and control states P, Algorithm 3 computes the best n-predicate Q-VASRS abstraction of F with control states P.
We now describe Algorithm 4, which uses Q-VASRS to over-approximate the transitive closure of transition formulas. Towards our goal of predictable program analysis, we desire our analysis to be monotone, in the sense that if F and G are transition formulas such that F entails G, then the over-approximate transitive closure of F entails the over-approximate transitive closure of G.
The key property we desire in a procedure for generating control-state predicates is monotonicity: if F |= G, then the control states of F should be at least as fine as the control states of G. We can achieve this by taking the set of control states P of F to be the set of topologically connected regions of ∃x′.F (lines 1-4). Unfortunately, this set of predicates fails the contract of abstract-VASRS, because there may exist a transition u →F v such that v does not satisfy any predicate in P. As a result, (S, V) = abstract-VASRS(F, P) does not necessarily approximate F; however, it does over-approximate F ∧ P′. An over-approximation of the transitive closure of F can easily be obtained from cℓ(S, V) (the over-approximation of the transitive closure of F ∧ P′ obtained from the Q-VASRS abstraction (S, V)) by sequentially composing with the identity relation or F (line 6).

Precision improvements
The abstract-VASRS algorithm uses predicates to infer the control structure of a Q-VASRS, but after computing the Q-VASRS abstraction, iter-VASRS makes no further use of the predicates (i.e., the predicates are irrelevant in the computation of cℓ(S, V)). Predicates can be used to improve iter-VASRS as follows. The reachability relation of a Q-VASRS is expressed by a formula that uses auxiliary variables to represent the states at which the computation begins and ends [8]. These variables can be used to encode that the pre-state of the transitive closure must satisfy the predicate corresponding to the begin state and the post-state must satisfy the predicate corresponding to the end state. As an example, consider Figure 2 and suppose that we wish to prove the invariant x ≤ 2i under the pre-condition i = 0 ∧ x = 0. While this invariant holds, we cannot prove it because there is a counterexample if the computation begins in a state where i % 2 == 1. By applying the above improvement, we can prove that the computation must begin in a state where i % 2 == 0, and the invariant is verified.

Evaluation
The goal of our evaluation is to answer the following questions: is Q-VASR expressive enough to summarize interesting loops; does the Q-VASRS technique improve precision over Q-VASR; and is the technique performant? We implemented our loop summarization procedure and the compositional whole-program summarization technique described in Section 1.1. We ran our implementation on a suite of 149 benchmarks, drawn from the C4B [2] and HOLA [4] suites, as well as the safe, integer-only benchmarks in the loops category of SV-Comp 2016 [17]. We ran each benchmark with a time-out of 5 minutes, and recorded how many benchmarks were proved safe by our Q-VASR-based technique and our Q-VASRS-based technique. For context, we also compare with CRA [12] (a related loop summarization technique), as well as SeaHorn [7] and UltimateAutomizer [9] (state-of-the-art software model checkers). The results are shown in Figure 3.
The number of assertions proved correct using Q-VASR is comparable to both SeaHorn and UltimateAutomizer, demonstrating that Q-VASR can indeed model interesting loop phenomena. Q-VASRS-based summarization significantly improves precision, proving the correctness of 91% of the assertions in the suite, more than any other tool.
Q-VASR-based summarization is the most performant of all the compared techniques, followed by CRA and Q-VASRS. SeaHorn and UltimateAutomizer employ abstraction-refinement loops, and so take significantly longer to run the test suite.

Related work
Compositional analysis Our analysis follows the same high-level structure as compositional recurrence analysis (CRA) [5,12]. Our analysis differs from CRA in the way that it summarizes loops: we compute loop summaries by over-approximating loops by vector addition systems and computing reachability relations, whereas CRA computes loop summaries by extracting recurrence relations and computing closed forms. The advantage of our approach is that we can use Q-VASR to accurately model multi-path loops and can make theoretical guarantees about the precision of our analysis; the advantage of CRA is its ability to generate non-linear invariants.
Vector addition systems Our invariant generation method builds upon Haase and Halfon's polytime procedure for computing the reachability relation of integer vector addition systems with states and resets [8]. Generalization from the integer case to the rational case is straightforward. Continuous Petri nets [3] are a related generalization of vector addition systems, where time is taken to be continuous (Q-VASR, in contrast, have rational state spaces but discrete time). Reachability for continuous Petri nets is computable in polytime [6] and transitive closure is definable in linear arithmetic [1].
Sinn et al. present a technique for resource bound analysis which is related to our loop summarization procedure [16]. Sinn et al.'s method is based on computing a lossy vector addition system with states that simulates a piece of code, proving termination of the VASS, and then extracting resource bounds from the ranking function. Our method differs in several respects. First, Sinn et al. model programs using vector addition systems with states over the natural numbers, which enables them to use termination bounds for VASS to compute upper bounds on resource usage. We use VASS with resets over the rationals, which (in contrast to VASS) have a Presburger-definable reachability relation, enabling us to summarize loops. Moreover, Sinn et al.'s method for extracting VASS models of programs is heuristic, whereas our method gives precision guarantees.

Symbolic abstraction
The main contribution of this paper is a technique for synthesizing the best abstraction of a transition formula expressible in the language of Q-VASR (with or without states). This is closely related to the symbolic abstraction problem, which computes the best abstraction of a formula within an abstract domain. The problem of computing best abstractions has been undertaken for finite-height abstract domains [15], template constraint matrices (including intervals and octagons) [13], and polyhedra [19,5]. Our best abstraction result differs in that (1) it is for a disjunctive domain and (2) the notion of "best" is based on simulation rather than the typical order-theoretic framework.
First, we show that there is a single transition (a, b) ∈ V that simulates I (i.e., I ⊑_S {(a, b)}). This follows essentially from the fact that linear rational arithmetic is a convex theory; for completeness, we make an explicit argument. By the well-ordering principle, it is sufficient to prove that if T = {(a1, b1), ..., (an, bn)} is a Q-VASR such that I ⊑_S T but I ⋢_S U for any proper subset U ⊂ T, then we must have n = 1. For a contradiction, suppose n > 1, and let U1 = {(a1, b1)} and U2 = {(a2, b2), ..., (an, bn)}. Since I ⋢_S U1, there is a transition u1 →I v1 such that S(u1) ↛U1 S(v1). Since I ⋢_S U2, there is a transition u2 →I v2 such that S(u2) ↛U2 S(v2). Geometrically, I forms a convex polyhedron to which the points (u1, v1) and (u2, v2) belong. By convexity, every point on the line segment from (u1, v1) to (u2, v2) belongs to I; that is, for all k ∈ [0, 1] we have (ku1 + (1 − k)u2) →I (kv1 + (1 − k)v2). Since there are infinitely many transitions along the line segment and each one must have a corresponding transition in T that simulates it, there must exist some i ∈ {1, ..., n} such that the set Ai of transitions that are simulated by transition (ai, bi) contains at least two points on the line segment. Since Ai is an affine space and contains at least two points on the line segment, it must contain all points on the entire line that connects (u1, v1) and (u2, v2) (and in particular the points (u1, v1) and (u2, v2) themselves). Since S(u1) →(ai,bi) S(v1) and (by construction) S(u1) ↛U1 S(v1), we cannot have i = 1. Since S(u2) →(ai,bi) S(v2) and (by construction) S(u2) ↛U2 S(v2), we also cannot have i > 1, a contradiction.
Next, we construct a matrix R ∈ Q^{d'×d} with RS = S' and V ⊩_R V'. Recall that α(I) is defined to be (S, {(a, b)}), where {⟨a1, c1⟩, ..., ⟨am, cm⟩} is a basis for the vector space V_r = {⟨a, c⟩ : I ⊨ a · x' = c} and {⟨a_{m+1}, c_{m+1}⟩, ..., ⟨a_d, c_d⟩} is a basis for the vector space V_i = {⟨a, c⟩ : I ⊨ a · x' = a · x + c}. We form the ith row of the matrix R, R_i, as follows. Suppose that a'_i = 0 (the case a'_i = 1 is similar). Since I ⊩_S' {(a', b')}, we have I ⊨ S'_i · x' = b'_i, and thus we may conclude that ⟨S'_i, b'_i⟩ ∈ V_r. It follows that there exist unique r1, ..., rm ∈ Q such that r1⟨S1, b1⟩ + ··· + rm⟨Sm, bm⟩ = ⟨S'_i, b'_i⟩ (using S_j to denote the jth row of S). We take R_i = [r1 ... rm 0 ... 0], and observe that R_i S = S'_i.

Lemma 1. Let V be a d-dimensional Q-VASR, let V' be an e-dimensional Q-VASR, and let R ∈ Q^{e×d} be a linear transformation such that V ⊩_R V'. Then R is coherent with respect to V.
Proof. Suppose u →_V v. Then there exists a transformer (a, b) ∈ V such that v = a * u + b. It follows that Sv = S(a * u + b) = S(a * u) + Sb = (S ⊠ a) * (Su) + Sb, where the last step uses the coherence of S with respect to V. Since (S ⊠ a, Sb) ∈ image(V, S), we have Su →_image(V,S) Sv. The other direction is symmetric.
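The identity used in this proof, S(a * u + b) = (S ⊠ a) * (Su) + Sb for coherent S, can be checked numerically. A toy instance (our own matrices, not from the paper): each row of S below is supported on coordinates that share one a-value, and (S ⊠ a)_i is that shared value.

```python
import numpy as np

# Numeric sketch of the identity in the proof (our own toy instance):
# if each row of S is supported on coordinates sharing the same a-value
# (S is "coherent" with a), then S(a*u + b) = (S ⊠ a)*(S u) + S b,
# where (S ⊠ a)_i is the shared a-value for row i.
a = np.array([0.0, 0.0, 1.0, 1.0])    # coords 1,2 reset; coords 3,4 increment
b = np.array([5.0, -1.0, 2.0, 3.0])
S = np.array([[1.0, 2.0, 0.0,  0.0],  # supported on {1,2}: a-value 0
              [0.0, 0.0, 1.0, -1.0]]) # supported on {3,4}: a-value 1
S_box_a = np.array([0.0, 1.0])        # S ⊠ a

u = np.array([7.0, 11.0, 13.0, 17.0])
lhs = S @ (a * u + b)
rhs = S_box_a * (S @ u) + S @ b
assert np.allclose(lhs, rhs)
print("S(a*u+b) == (S ⊠ a)*(Su) + Sb")
```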
Theorem 3. Let V be a d-dimensional Q-VASR and let R ∈ Q^{e×d} be a matrix that is coherent with respect to V. For any i, j ∈ {1, ..., |R|} such that R_i ≠ 0 and R_j ≠ 0, [...]

Proof. We must prove (1) that there is a linear simulation from (S1, V1) to (S3, V3); (2) that there is a linear simulation from (S2, V2) to (S3, V3); (3) that (S3, V3) is normal; and (4) that for any Q-VASR abstraction (S', V') that satisfies (1) and (2), there is a linear simulation from (S3, V3) to (S', V'). Algorithm 2 constructs matrices R1 and R2 such that (1) and (2) are essentially given for free. We now show that (S3, V3) is normal. Normality follows directly from the way that S3 is constructed. Essentially, S3 is constructed by combining a basis for the intersection of rowspace(Π_C1(S1)) with rowspace(Π_C2(S2)) for each equivalence class C1 of (S1, V1) and each equivalence class C2 of (S2, V2). Each basis is in its own equivalence class of (S3, V3) (Lemma 3), and therefore (S3, V3) is normal.
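The central linear-algebra step in this construction is computing a basis of an intersection of rowspaces. A small numeric sketch (our own toy matrices, not from the paper): a row vector r lies in both rowspaces iff r = x·S1 = y·S2 for some x, y, i.e. the stacked coefficient vector (x | y) lies in the nullspace of [S1ᵀ | −S2ᵀ].

```python
import numpy as np

# Toy instance of the rowspace-intersection step behind building S3:
# rowspace(S1) ∩ rowspace(S2) via the nullspace of [S1^T | -S2^T].
S1 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]])   # rowspace: span{e1, e2}
S2 = np.array([[1.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])   # rowspace: span{e1 + e2, e3}

M = np.hstack([S1.T, -S2.T])       # 3 x 4
_, sing, vt = np.linalg.svd(M)
rank = int((sing > 1e-9).sum())
null_basis = vt[rank:]             # rows spanning the nullspace of M
S3 = null_basis[:, :2] @ S1        # basis of the rowspace intersection

# The intersection is the line spanned by e1 + e2 = (1, 1, 0).
r = S3[0] / S3[0][0]               # normalize the (single) basis vector
assert np.allclose(r, [1.0, 1.0, 0.0])
print("intersection basis:", r)
```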
(4) is the only remaining proof obligation. The intuition is that any (S', V') fulfilling (1) and (2) must have coherent simulations from both (S1, V1) and (S2, V2). Each row of these coherent simulations must lie in the intersection of rowspace(Π_C1(S1)) with rowspace(Π_C2(S2)) for some equivalence class C1 of (S1, V1) and some equivalence class C2 of (S2, V2). Since the rows of S3 form a basis for all such intersections, there is a matrix R such that RS3 = S', and since V3 is minimal in the sense that it is defined as the union of the images of V1 and V2, R is a linear simulation. We now prove this formally.
Suppose that (S', V') satisfies (1) and (2). We first construct a linear transformation R3 such that R3 S3 = S', and then we show that V3 ⊩_R3 V'. Since (S', V') satisfies (1) and (2), there exist linear transformations R'1 and R'2 with R'1 S1 = S', R'2 S2 = S', V1 ⊩_R'1 V', and V2 ⊩_R'2 V'. We construct R3 by rows, with the ith row R3_i constructed to satisfy (i) R3_i S3 = S'_i, (ii) (R3 R1)_i = (R'1)_i, and (iii) (R3 R2)_i = (R'2)_i, as follows. Distinguish two cases:
1. Case S'_i = 0. Then R3_i = 0. Since R'1 S1 = S', we have (R'1)_i S1 = (R'1 S1)_i = S'_i = 0. Since (S1, V1) is normal (by assumption) and R'1 (and therefore (R'1)_i) is coherent with respect to V1 (by Lemma 1), we have (R'1)_i = 0, and thus (ii) (R3 R1)_i = (R'1)_i because both sides are 0. Property (iii) follows from a symmetric argument.
2. Case S'_i ≠ 0. By logic similar to the previous case, one can see that (R'1)_i ≠ 0 and (R'2)_i ≠ 0. Therefore, coh_V1((R'1)_i) = C1 and coh_V2((R'2)_i) = C2 for some unique equivalence class C1 of V1 and C2 of V2. That is to say, there exists some q ≠ 0 such that qB[C1, C2] = S'_i. Take [C1, C2] to be the dimensions of S3 such that {(S3)_j : j ∈ [C1, C2]} is a basis of rowspace(B[C1, C2]) and [C1, C2] is an equivalence class of ≡_V3. There must exist some q' ≠ 0 such that q' S3 = S'_i and coh_V3(q') = [C1, C2]. Define the ith row of R3 to be q'. Clearly, (i) holds. Since R3_i is coherent with respect to (S3, V3), R3_i is also coherent with respect to (R1 S1, image(V1, R1)). Since S'_i ≠ 0, R3_i R1 ≠ 0. Since R3_i is a linear combination of the coherence class [C1, C2] of (S3, V3), coh_V1(R3_i R1) = C1. Therefore, there exist some s, t ∈ Q^{|C1|} such that (R3 R1)_i = s Π_C1(I) and (R'1)_i = t Π_C1(I). Since R3 S3 = S' = R'1 S1 and R1 S1 = S3, we have R3 R1 S1 = R'1 S1. As such, s Π_C1(I) S1 = t Π_C1(I) S1. Since (S1, V1) is normal, Π_C1(I) S1 is right invertible. Therefore, s = t.
Multiplying both sides by Π_C1(I) gives s Π_C1(I) = t Π_C1(I), and so we have (ii) (R3 R1)_i = (R'1)_i. A symmetric argument shows (iii) (R3 R2)_i = (R'2)_i.

Finally, we must prove that V3 ⊩_R3 V'. Suppose that u →_V3 v, and let (a', b') ∈ V3 be such that v = a' * u + b'. By construction of V3, there is some i ∈ {1, 2} and some (a, b) ∈ Vi such that (a', b') = (Ri ⊠ a, Ri b). Since Vi ⊩_R'i V', we must have (R'i ⊠ a, R'i b) ∈ V', and therefore (R3 ⊠ a', R3 b') ∈ V'; by Theorem 2, we have R3 u →_V' R3 v.

Theorem 2. Given a transition formula F (in ∃LIA or ∃LRA), Algorithm 1 computes a Q-VASR abstraction of F. If F is in ∃LRA, Algorithm 1 computes a best Q-VASR abstraction of F.

Proof. We break this proof into three steps. We first show that the Q-VASR abstraction (S, V) output by Algorithm 1 on input F is a Q-VASR abstraction of F. We next show that (S, V) is a best abstraction of F. We conclude with a proof of termination for the algorithm.
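For orientation, the overall shape of Algorithm 1 referenced by this theorem is, on our reading, an abstraction-refinement loop of roughly the following form (a pseudocode sketch; the exact bookkeeping of S and V follows the paper's Algorithm 1, which we do not reproduce verbatim):

```
abstract(F):
    V := ⊥                          # least Q-VASR abstraction (S, V)
    Γ := F
    while Γ is satisfiable:
        M := a model of Γ           # a transition of F not yet simulated by V
        C := a cube of the DNF of F such that M ⊨ C
        V := V ⊔ α(C)               # join in the best abstraction of the cube
        Γ := F ∧ ¬γ(V)              # exclude transitions V already simulates
    return V
```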