Diffy: Inductive Reasoning of Array Programs using Difference Invariants

We present a novel verification technique to prove interesting properties of a class of array programs with a symbolic parameter N denoting the size of arrays. The technique relies on constructing two slightly different versions of the same program. It infers difference relations between the corresponding variables at key control points of the joint control-flow graph of the two program versions. The desired post-condition is then proved by inducting on the program parameter N, wherein the difference invariants are crucially used in the inductive step. This contrasts with classical techniques that rely on finding potentially complex loop invariants for each loop in the program. Our synergistic combination of inductive reasoning and finding simple difference invariants helps prove properties of programs that cannot be proved even by the winner of the Arrays sub-category of SV-COMP 2021. We have implemented a prototype tool called Diffy to demonstrate these ideas. We present results comparing the performance of Diffy with that of state-of-the-art tools.


Introduction
Software used in a wide range of applications uses arrays to store and update data, often using loops to read and write arrays. Verifying correctness properties of such array programs is important, yet challenging. A variety of techniques have been proposed in the literature to address this problem, including inference of quantified loop invariants [20]. However, it is often difficult to automatically infer such invariants, especially when programs have loops that are sequentially composed and/or nested within each other, and have complex control flows. This has spurred recent interest in mathematical induction-based techniques for verifying parametric properties of array manipulating programs [12,42,11,44]. While induction-based techniques are efficient and quite powerful, their Achilles heel is the automation of the inductive argument. Indeed, this often becomes the limiting step in applications of induction-based techniques. Automating the induction step and expanding the class of array manipulating programs to which induction-based techniques can be applied form the primary motivation for our work. Rather than being a stand-alone technique, we envisage our work being used as part of a portfolio of techniques in a modern program verification tool.
We propose a novel and practically efficient induction-based technique that advances the state-of-the-art in automating the inductive step when reasoning about array manipulating programs. This allows us to automatically verify interesting properties of a large class of array manipulating programs that are beyond the reach of state-of-the-art induction-based techniques, viz. [12,42]. The work that comes closest to ours is Vajra [12], which is part of the portfolio of techniques in VeriAbs [1], the winner of SV-COMP 2021 in the Arrays Reach sub-category. Our work addresses several key limitations of the technique implemented in Vajra, thereby making it possible to analyze a much larger class of array manipulating programs than can be done by VeriAbs. Significantly, this includes programs with nested loops that have hitherto been beyond the reach of automated techniques that use mathematical induction [12,42,44].
A key innovation in our approach is the construction of two slightly different versions of a given program that have identical control flow structures but slightly different data operations. We automatically identify simple relations, called difference invariants, between corresponding variables in the two versions of a program at key control flow points. Interestingly, these relations often turn out to be significantly simpler than inductive invariants required to prove the property directly. This is not entirely surprising, since the difference invariants depend less on what individual statements in the programs are doing, and more on the difference between what they are doing in the two versions of the program. We show how the two versions of a given program can be automatically constructed, and how differences in individual statements can be analyzed to infer simple difference invariants. Finally, we show how these difference invariants can be used to simplify the reasoning in the inductive step of our technique.
We consider programs with (possibly nested) loops manipulating arrays, where the size of each array is a symbolic integer parameter N (> 0). We verify (a sub-class of) quantified and quantifier-free properties that may depend on the symbolic parameter N. Like in [12], we view the verification problem as one of proving the validity of a parameterized Hoare triple {ϕ(N)} P_N {ψ(N)} for all values of N (> 0), where arrays are of size N in the program P_N, and N is a free variable in ϕ(·) and ψ(·).
To illustrate the kind of programs that are amenable to our technique, consider the program shown in Fig. 1(a), adapted from an SV-COMP benchmark. This program has a couple of sequentially composed loops that update arrays and scalars. The scalars S and F are initialized to 0 and 1 respectively before the first loop starts iterating. Subsequently, the first loop computes a recurrence in variable S and initializes elements of the array B to 1 if the corresponding elements of array A have non-negative values, and to 0 otherwise. The outermost branch condition in the body of the second loop evaluates to true only if the program parameter N and the variable S have the same value. The value of F is reset based on some conditions depending on corresponding entries of arrays A and B. The pre-condition of this program is true; the post-condition asserts that F is never reset in the second loop. State-of-the-art techniques find it difficult to prove the assertion in this program. Specifically, Vajra [12] is unable to prove the property, since it cannot reason about the branch condition (in the second loop) whose value depends on the program parameter N. VeriAbs [1], which employs a sequence of techniques such as loop shrinking, loop pruning, and inductive reasoning using [12], is also unable to verify the assertion shown in this program. Indeed, the loops in this program cannot be merged as the final value of S computed by the first loop is required in the second loop; hence loop shrinking does not help. Also, loop pruning does not work due to the complex dependencies in the program and the fact that the exact value of the recurrence variable S is required to verify the program. Subsequent abstractions and techniques applied by VeriAbs from its portfolio are also unable to verify the given post-condition. VIAP [42] translates the program to a quantified first-order logic formula in the theory of equality and uninterpreted functions [32]. It applies a sequence of tactics to simplify and prove the generated formula. These tactics include computing closed forms of recurrences, induction over array indices and the like to prove the property. However, its sequence of tactics is unable to verify this example within our time limit of 1 minute.
Benchmarks with nested loops are a long-standing challenge for most verifiers. Consider the program shown in Fig. 1(b) with a nested loop in addition to sequentially composed loops. The first loop initializes entries in array A to 0. The second loop aggregates a constant value in the scalar S. The third loop is a nested loop that updates array A based on the value of S. The entries of A are updated in the inner as well as the outer loop. The property asserts that on termination, each array element equals twice the value of the parameter N.
While the inductive reasoning of Vajra and the tactics in VIAP do not support nested loops, the sequence of techniques used by VeriAbs is also unable to prove the given post-condition in this program. In sharp contrast, our prototype tool Diffy is able to verify the assertions in both these programs automatically within a few seconds. This illustrates the power of the inductive technique proposed in this paper.
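To make the shape of such a nested-loop benchmark concrete, the sketch below encodes a hypothetical program that matches the prose description of Fig. 1(b). The exact statements of the benchmark are not reproduced in the text, so the loop bodies, the constant aggregated in S and the helper names (`run_program`, `post_condition`, `bounded_check`) are assumptions. The bounded check only tests the post-condition for small N; it is not a proof for all N, which is what the inductive technique provides.

```python
def run_program(N):
    """Hypothetical program in the spirit of Fig. 1(b); not the actual benchmark."""
    A = [None] * N
    # Loop 1: initialise every entry of A to 0.
    for i in range(N):
        A[i] = 0
    # Loop 2: aggregate a constant into the scalar S (S ends up equal to N).
    S = 0
    for i in range(N):
        S = S + 1
    # Loop 3: nested loop; A is updated in the inner and in the outer loop.
    for i in range(N):
        for j in range(N):
            A[i] = A[i] + 1      # inner-loop update
        A[i] = A[i] + S          # outer-loop update, depends on S
    return A


def post_condition(A, N):
    """psi(N): every array element equals twice the parameter N."""
    return all(A[i] == 2 * N for i in range(N))


def bounded_check(max_n):
    """Test the post-condition for 1 <= N <= max_n (a sanity check, not a proof)."""
    return all(post_condition(run_program(N), N) for N in range(1, max_n + 1))
```

Calling `bounded_check(20)` exercises the triple for the first twenty values of N only.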
The technical contributions of the paper can be summarized as follows:
- We present a novel technique based on mathematical induction to prove interesting properties of a class of programs that manipulate arrays. The crucial inductive step in our technique uses difference invariants from two slightly different versions of the same program, and differs significantly from other induction-based techniques proposed in the literature [12,42,11,44].
- We describe algorithms to transform the input program for use in our inductive verification technique. We also present techniques to infer simple difference invariants from the two slightly different program versions, and to complete the inductive step using these difference invariants.
- We describe a prototype tool Diffy that implements our algorithms.
- We compare Diffy vis-a-vis state-of-the-art tools for verification of C programs that manipulate arrays on a large set of benchmarks. We demonstrate that Diffy significantly outperforms the winners of SV-COMP 2019, 2020 and 2021 in the Array Reach sub-category.

Overview and Relation to Earlier Work
In this section, we provide an overview of the main ideas underlying our technique. We also highlight how our technique differs from [12], which comes closest to our work. To keep the exposition simple, we consider the program P_N, shown in the first column of Fig. 2, where N is a symbolic parameter denoting the sizes of arrays a and b. We assume that we are given a parameterized pre-condition ϕ(N), and our goal is to establish the parameterized post-condition ψ(N), for all N > 0. In [12,44], techniques based on mathematical induction (on N) were proposed to solve this class of problems. As with any induction-based technique, these approaches consist of three steps: a base case, in which the Hoare triple is checked for small values of N; an inductive hypothesis, which assumes that {ϕ(N − 1)} P_{N−1} {ψ(N − 1)} holds; and an inductive step, which uses the hypothesis to establish {ϕ(N)} P_N {ψ(N)}.
Like in [12,44], our technique uses induction on N to prove the Hoare triple {ϕ(N)} P_N {ψ(N)} for all N > 0. Hence, our base case and inductive hypothesis are the same as those in [12,44]. However, our reasoning in the crucial inductive step is significantly different from that in [12,44], and this is where our primary contribution lies. As we show later, not only does this allow a much larger class of programs to be efficiently verified compared to [12,44], it also permits reasoning about classes of programs with nested loops, which are beyond the reach of [12,44]. Since the work of [12] significantly generalizes that of [44], henceforth, we only refer to [12] when talking of earlier work that uses induction on N.
In order to better understand our contribution and its difference vis-a-vis the work of [12], a quick recap of the inductive step used in [12] is essential. The inductive step in [12] crucially relies on finding a "difference program" ∂P_N and a "difference pre-condition" ∂ϕ(N) such that: (i) P_N is semantically equivalent to P_{N−1}; ∂P_N, where ';' denotes sequential composition of programs, (ii) ϕ(N) ⇒ ϕ(N − 1) ∧ ∂ϕ(N), and (iii) no variable/array element in ∂ϕ(N) is modified by P_{N−1}. As shown in [12], once ∂P_N and ∂ϕ(N) satisfying these conditions are obtained, the problem of proving {ϕ(N)} P_N {ψ(N)} can be reduced to that of proving {ψ(N − 1) ∧ ∂ϕ(N)} ∂P_N {ψ(N)}. This approach can be very effective if (i) ∂P_N is "simpler" (e.g. has fewer loops or strictly less deeply nested loops) than P_N and can be computed efficiently, and (ii) a formula ∂ϕ(N) satisfying the conditions mentioned above exists and can be computed efficiently. The requirement of P_N being semantically equivalent to P_{N−1}; ∂P_N is a very stringent one, and finding such a program ∂P_N is non-trivial in general. In fact, the authors of [12] simply provide a set of syntax-guided conditionally sound heuristics for computing ∂P_N. Unfortunately, when these conditions are violated (we have found many simple programs where they are violated), there are no known algorithmic techniques to generate ∂P_N in a sound manner. Even if a program ∂P_N were to be found in an ad-hoc manner, it may be as "complex" as P_N itself. This makes the approach of [12] ineffective for analyzing such programs. As an example, the fourth column of Fig. 2 shows P_{N−1} followed by one possible ∂P_N that ensures P_N (shown in the first column of the same figure) is semantically equivalent to P_{N−1}; ∂P_N. Notice that ∂P_N in this example has two sequentially composed loops, just like P_N. In addition, the assignment statement in the body of the second loop uses a more complex expression than that present in the corresponding loop of P_N. Proving {ψ(N − 1) ∧ ∂ϕ(N)} ∂P_N {ψ(N)} may therefore not be any simpler (perhaps even more difficult) than proving {ϕ(N)} P_N {ψ(N)}.
In addition to the difficulty of computing ∂P_N, it may be impossible to find a formula ∂ϕ(N) such that ϕ(N) ⇒ ϕ(N − 1) ∧ ∂ϕ(N), as required by [12]. This can happen even for fairly routine pre-conditions, such as ϕ(N) ≡ ⋀_{i=0}^{N−1} (A[i] = N). Notice that there is no ∂ϕ(N) that satisfies ϕ(N) ⇒ ϕ(N − 1) ∧ ∂ϕ(N) in this case. In such cases, the technique of [12] cannot be used at all, even if P_N, ϕ(N) and ψ(N) are such that there exists a trivial proof of {ϕ(N)} P_N {ψ(N)}.
The inductive step proposed in this paper largely mitigates the above problems, thereby making it possible to efficiently reason about a much larger class of programs than that possible using the technique of [12]. Our inductive step proceeds as follows. Given P_N, we first algorithmically construct two programs Q_{N−1} and peel(P_N), such that P_N is semantically equivalent to Q_{N−1}; peel(P_N). Intuitively, Q_{N−1} is the same as P_N, but with all loop bounds that depend on N now modified to depend on N − 1 instead. Note that this is different from P_{N−1}, which is obtained by replacing all uses (not just in loop bounds) of N in P_N by N − 1. As we will see, this simple difference makes the generation of peel(P_N) significantly simpler than generation of ∂P_N, as in [12]. While generating Q_{N−1} and peel(P_N) may sound similar to generating P_{N−1} and ∂P_N [12], there are fundamental differences between the two approaches. First, as noted above, P_{N−1} is semantically different from Q_{N−1}. Similarly, peel(P_N) is also semantically different from ∂P_N. Second, we provide an algorithm for generating Q_{N−1} and peel(P_N) that works for a significantly larger class of programs than that for which the technique of [12] works. Specifically, our algorithm works for all programs amenable to the technique of [12], and also for programs that violate the restrictions imposed by the grammar and conditional heuristics in [12]. For example, we can algorithmically generate Q_{N−1} and peel(P_N) even for a class of programs with arbitrarily nested loops, a program feature explicitly disallowed by the grammar in [12]. Third, we guarantee that peel(P_N) is "simpler" than P_N in the sense that the maximum nesting depth of loops in peel(P_N) is strictly less than that in P_N. Thus, if P_N has no nested loops (all programs amenable to analysis by [12] belong to this class), peel(P_N) is guaranteed to be loop-free. As demonstrated by the fourth column of Fig. 2, no such guarantees can be given for ∂P_N generated by the technique of [12]. This is a significant difference, since it greatly simplifies the analysis of peel(P_N) vis-a-vis that of ∂P_N.
We had mentioned earlier that some pre-conditions ϕ(N) do not admit any ∂ϕ(N) such that ϕ(N) ⇒ ϕ(N − 1) ∧ ∂ϕ(N). It is, however, often easy to compute formulas ϕ′(N − 1) and ∆ϕ′(N) in such cases such that ϕ(N) ⇒ ϕ′(N − 1) ∧ ∆ϕ′(N), and the variables/array elements in ∆ϕ′(N) are not modified by either P_{N−1} or Q_{N−1}. We assume the availability of such a ϕ′(N − 1) and ∆ϕ′(N) for the given ϕ(N). This significantly relaxes the requirement on pre-conditions and allows a much larger class of Hoare triples to be proved using our technique vis-a-vis that of [12].
The third column of Fig. 2 shows Q_{N−1} and peel(P_N) generated by our algorithm for the program P_N in the first column of the figure. It is illustrative to compare these with P_{N−1} and ∂P_N shown in the fourth column of Fig. 2. Notice that Q_{N−1} has the same control flow structure as P_{N−1}, but is not semantically equivalent to P_{N−1}. In fact, Q_{N−1} and P_{N−1} may be viewed as closely related versions of the same program. Let V_Q and V_P denote the set of variables of Q_{N−1} and P_{N−1} respectively. We assume V_Q is disjoint from V_P, and analyze the joint execution of Q_{N−1} starting from a state satisfying the pre-condition ϕ′(N − 1), and P_{N−1} starting from a state satisfying ϕ(N − 1). The purpose of this analysis is to compute a difference predicate D(V_Q, V_P, N − 1) that relates corresponding variables in Q_{N−1} and P_{N−1} at the end of their joint execution. The above problem is reminiscent of (yet, different from) translation validation [40,49,48,4,46,17,24], and indeed, our calculation of D(V_Q, V_P, N − 1) is motivated by techniques from the translation validation literature. An important finding of our study is that corresponding variables in Q_{N−1} and P_{N−1} are often related by simple expressions on N, regardless of the complexity of P_N, ϕ(N) or ψ(N). Indeed, in all our experiments, we did not need to go beyond quadratic expressions on N to compute D(V_Q, V_P, N − 1).
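The following sketch illustrates what a difference predicate can look like on a small example. The two-loop program used here is a hypothetical stand-in in the spirit of Fig. 2 (the figure itself is not reproduced in the text); the function names and the closed-form differences are assumptions worked out by hand for this stand-in, not output of Diffy.

```python
def run_P(n):
    """P_n: every use of the parameter equals n, so run_P(N - 1) is P_{N-1}."""
    x = 0
    for i in range(n):
        x = x + n * n
    b = [x + j for j in range(n)]
    return x, b


def run_Q(N):
    """Q_{N-1}: loop bounds use N - 1, all other uses of N are kept.
    The use of x in the second loop is replaced by x + N*N (peel substitution)."""
    x = 0
    for i in range(N - 1):
        x = x + N * N
    b = [(x + N * N) + j for j in range(N - 1)]
    return x, b


def difference_invariant_holds(N):
    """Check D(V_Q, V_P, N - 1) for this example:
         x_Q - x_P       = (N - 1) * (2N - 1)
         b_Q[j] - b_P[j] = 3N^2 - 3N + 1   for 0 <= j < N - 1
    Both differences are quadratic in N."""
    x_p, b_p = run_P(N - 1)
    x_q, b_q = run_Q(N)
    dx = (N - 1) * (2 * N - 1)
    db = 3 * N * N - 3 * N + 1
    return x_q - x_p == dx and all(b_q[j] - b_p[j] == db for j in range(N - 1))
```

Although x itself grows cubically in N in both versions, the differences stay quadratic, which is consistent with the observation above that simple expressions on N tend to suffice.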
Once the steps described above are completed, we have ∆ϕ′(N), peel(P_N) and D(V_Q, V_P, N − 1). It can now be shown that if the inductive hypothesis, i.e. {ϕ(N − 1)} P_{N−1} {ψ(N − 1)}, holds, then proving {ϕ(N)} P_N {ψ(N)} reduces to proving {∆ϕ′(N) ∧ ψ′(N − 1)} peel(P_N) {ψ(N)}, where ψ′(N − 1) ≡ ∃V_P (ψ(N − 1) ∧ D(V_Q, V_P, N − 1)). A few points are worth emphasizing here. First, if D(V_Q, V_P, N − 1) is obtained as a set of equalities, the existential quantifier in the formula ψ′(N − 1) can often be eliminated simply by substitution. We can also use quantifier elimination capabilities of modern SMT solvers, viz. Z3 [39], to eliminate the quantifier, if needed. Second, recall that unlike ∂P_N generated by the technique of [12], peel(P_N) is guaranteed to be "simpler" than P_N, and is indeed loop-free if P_N has no nested loops. Therefore, proving {∆ϕ′(N) ∧ ψ′(N − 1)} peel(P_N) {ψ(N)} is typically significantly simpler than proving {ψ(N − 1) ∧ ∂ϕ(N)} ∂P_N {ψ(N)}. Finally, it may happen that the pre-condition in {∆ϕ′(N) ∧ ψ′(N − 1)} peel(P_N) {ψ(N)} is not strong enough to yield a proof of the Hoare triple. In such cases, we need to strengthen the existing pre-condition by a formula, say ξ′(N − 1), such that the strengthened pre-condition implies the weakest pre-condition of ψ(N) under peel(P_N). Having a simple structure for peel(P_N) (e.g., loop-free for the entire class of programs for which [12] works) makes it significantly easier to compute the weakest pre-condition. Note that ξ′(N − 1) is defined over the variables in V_Q. In order to ensure that the inductive proof goes through, we need to strengthen the post-condition of the original program by a formula ξ(N) such that ξ(N − 1) ∧ D(V_Q, V_P, N − 1) ⇒ ξ′(N − 1). Computing ξ(N − 1) requires a special form of logical abduction that ensures that ξ(N − 1) refers only to variables in V_P. However, if D(V_Q, V_P, N − 1) is given as a set of equalities (as is often the case), ξ(N − 1) can be computed from ξ′(N − 1) simply by substitution. This process of strengthening the pre-condition and post-condition may need to iterate a few times until a fixed point is reached, similar to what happens in the inductive step of [12]. Note that the fixed point iterations may not always converge (verification is undecidable in general). However, in our experiments, convergence always happened within a few iterations. If ξ′(N − 1) denotes the formula obtained on reaching the fixed point, the final Hoare triple to be proved is {ξ′(N − 1) ∧ ∆ϕ′(N) ∧ ψ′(N − 1)} peel(P_N) {ξ(N) ∧ ψ(N)}. Having a simple (often loop-free) peel(P_N) significantly simplifies the above process.
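The inductive step described in this section can be summarized as a proof rule. The block below is a sketch of our reading of the overview, written without the strengthening formulas ξ and ξ′; the layout of the rule and the name \mathsf{peel} are presentational assumptions, and the side conditions are the ones stated in the text (the decomposition of P_N, the split of the pre-condition, and D being a valid difference predicate for the joint execution).

```latex
% Side conditions (assumed to be established separately):
%   (1) P_N is semantically equivalent to Q_{N-1}; peel(P_N)
%   (2) \varphi(N) \Rightarrow \varphi'(N-1) \wedge \Delta\varphi'(N)
%   (3) D(V_Q, V_P, N-1) relates the final states of Q_{N-1} (started in
%       \varphi'(N-1)) and P_{N-1} (started in \varphi(N-1))
\[
\frac{
  \{\varphi(N-1)\}\; P_{N-1}\; \{\psi(N-1)\}
  \qquad
  \{\Delta\varphi'(N) \wedge \psi'(N-1)\}\; \mathsf{peel}(P_N)\; \{\psi(N)\}
}{
  \{\varphi(N)\}\; P_N\; \{\psi(N)\}
}
\]
\[
\text{where}\quad
\psi'(N-1) \;\equiv\; \exists V_P\, \bigl(\psi(N-1) \wedge D(V_Q, V_P, N-1)\bigr).
\]
```

The first premise is the inductive hypothesis; only the second premise has to be discharged in the inductive step, and it mentions the original program only through peel(P_N).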
We conclude this section by giving an overview of how Q_{N−1} and peel(P_N) are computed for the program P_N shown in the first column of Fig. 2. The second column of this figure shows the program obtained from P_N by peeling the last iteration of each loop of the program. Clearly, the programs in the first and second columns are semantically equivalent. Since there are no nested loops in P_N, the peels (shown in solid boxes) in the second column are loop-free program fragments. For each such peel, we identify variables/array elements modified in the peel and used in subsequent non-peeled parts of the program. For example, the variable x is modified in the peel of the first loop and used in the body of the second loop, as shown by the arrow in the second column of Fig. 2. We replace all such uses (if needed, transitively) by expressions on the right-hand side of assignments in the peel until no variable/array element modified in the peel is used in any subsequent non-peeled part of the program. Thus, the use of x in the body of the second loop is replaced by the expression x + N*N in the third column of Fig. 2. The peeled iteration of the first loop can now be moved to the end of the program, since the variables modified in this peel are no longer used in any subsequent non-peeled part of the program. Repeating the above steps for the peeled iteration of the second loop, we get the program shown in the third column of Fig. 2. This effectively gives a transformed program that can be divided into two parts: (i) a program Q_{N−1} that differs from P_N only in that all loops are truncated to iterate N − 1 (instead of N) times, and (ii) a program peel(P_N) that is obtained by concatenating the peels of loops in P_N in the same order in which the loops appeared in P_N. It is not hard to see that P_N, shown in the first column of Fig. 2, is semantically equivalent to Q_{N−1}; peel(P_N). Notice that the construction of Q_{N−1} and peel(P_N) was fairly straightforward, and did not require any complex reasoning. In sharp contrast, construction of ∂P_N, as shown in the bottom half of the fourth column of Fig. 2, requires non-trivial reasoning, and produces a program with two sequentially composed loops.
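The peel-and-move construction can be tried out on a small example. The program below is a hypothetical stand-in for the first column of Fig. 2 (the figure is not reproduced in the text); only the substitution of x by x + N*N is taken from the description above, and the names `run_P` and `run_Q_then_peel` are assumptions.

```python
def run_P(N):
    """Original program P_N: two sequentially composed loops."""
    x = 0
    b = [0] * N
    for i in range(N):
        x = x + N * N
    for j in range(N):
        b[j] = x + j
    return x, b


def run_Q_then_peel(N):
    """Q_{N-1} followed by peel(P_N)."""
    x = 0
    b = [0] * N
    # --- Q_{N-1}: both loops truncated to N - 1 iterations ---
    for i in range(N - 1):
        x = x + N * N
    for j in range(N - 1):
        # The second loop used x after the peeled iteration of the first loop;
        # that use is replaced by the right-hand side of the peel, x + N*N.
        b[j] = (x + N * N) + j
    # --- peel(P_N): loop-free, peels in the original loop order ---
    x = x + N * N            # last iteration of the first loop
    b[N - 1] = x + (N - 1)   # last iteration of the second loop
    return x, b
```

Because the first peel is moved past the truncated second loop, the second loop must not read the stale x; the substituted expression is what makes the move sound in this example.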

Preliminaries and Notation
We consider programs generated by the grammar shown below:
Formally, we consider a program P_N to be a tuple (𝒱, ℒ, 𝒜, PB, N), where 𝒱 is a set of scalar variables, ℒ ⊆ 𝒱 is a set of scalar loop counter variables, 𝒜 is a set of array variables, PB is the program body, and N is a special symbol denoting a positive integer parameter of the program. In the grammar shown above, we assume that A ∈ 𝒜, v ∈ 𝒱 \ ℒ, ℓ ∈ ℒ and c ∈ ℤ. We also assume that each loop L has a unique loop counter variable ℓ that is initialized at the beginning of L and is incremented by 1 at the end of each iteration. We assume that the assignments in the body of L do not update ℓ. For each loop L with termination condition ℓ < UB, we require that UB is an expression in terms of N, variables in ℒ representing loop counters of loops that nest L, and constants as shown in the grammar. Our grammar allows a large class of programs (with nested loops) to be analyzed using our technique, including programs that are beyond the reach of state-of-the-art tools like [1,12,42].
We verify Hoare triples of the form {ϕ(N)} P_N {ψ(N)}, where the formulas ϕ(N) and ψ(N) are either universally quantified formulas of the form ∀I (α(I, N) ⇒ β(𝒜, 𝒱, I, N)) or quantifier-free formulas of the form η(𝒜, 𝒱, N). In these formulas, I is a sequence of array index variables, α is a quantifier-free formula in the theory of arithmetic over integers, and β and η are quantifier-free formulas in the combined theory of arrays and arithmetic over integers. Our technique can also verify a restricted set of existentially quantified post-conditions. We give a few illustrative examples in the Appendix.
For technical reasons, we rename all scalar and array variables in the program in a pre-processing step as follows. We rename each scalar variable using the well-known Static Single Assignment (SSA) [43] technique, such that the variable is written at (at most) one location in the program. We also rename arrays in the program such that each loop updates its own version of an array and multiple writes to an array element within the same loop are performed on different versions of that array. For this purpose, we use techniques for array SSA [30] renaming studied earlier in the context of compilers. In the subsequent exposition, we assume that scalar and array variables in the program are already SSA renamed, and that all array and scalar variables referred to in the pre- and post-conditions are also expressed in terms of SSA renamed arrays and scalars.
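As a reminder of what scalar SSA renaming does, the sketch below renames a straight-line sequence of assignments so that every variable is written at most once. It is a deliberately reduced illustration: it handles neither loops nor branches (so no φ-functions are needed) nor arrays, and the statement encoding and the name `ssa_rename` are assumptions rather than part of the tool.

```python
def ssa_rename(stmts):
    """Rename straight-line assignments into SSA form.

    stmts is a list of (target, operands) pairs, where operands is the list
    of variable names read by the right-hand side. Names that are read
    before being written (inputs such as N) are left unversioned.
    """
    version = {}   # latest version number of each written variable
    renamed = []

    def current(name):
        # Use the latest version if the variable has been written already.
        return f"{name}_{version[name]}" if name in version else name

    for target, operands in stmts:
        new_operands = [current(v) for v in operands]   # reads see old version
        version[target] = version.get(target, 0) + 1    # write creates new one
        renamed.append((f"{target}_{version[target]}", new_operands))
    return renamed
```

For instance, the sequence x = 0; x = x + N; y = x becomes x_1 = 0; x_2 = x_1 + N; y_1 = x_2, so each later use names exactly one definition.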

Verification using Difference Invariants
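The key steps in the application of our technique, as discussed in Section 2, are
A1: Generation of Q_{N−1} and peel(P_N) from a given P_N.
A2: Generation of ϕ′(N − 1) and ∆ϕ′(N) from a given ϕ(N).
A3: Generation of the difference invariant D(V_Q, V_P, N − 1), given ϕ(N − 1), ϕ′(N − 1), Q_{N−1} and P_{N−1}.
A4: Proving {∆ϕ′(N) ∧ ψ′(N − 1)} peel(P_N) {ψ(N)}, possibly by generation of ξ′(N − 1) and ξ(N) to strengthen the pre- and post-conditions, respectively.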
We now discuss techniques for solving each of these sub-problems.

Generating Q_{N−1} and peel(P_N)
The procedure illustrated in Fig. 2 (going from the first column to the third column) is fairly straightforward if none of the loops have any nested loops within them. It is easy to extend this to arbitrary sequential compositions of non-nested loops. Having all variables and arrays in SSA-renamed forms makes it particularly easy to carry out the substitution exemplified by the arrow shown in the second column of Fig. 2. Hence, we do not discuss any further the generation of Q_{N−1} and peel(P_N) when all loops are non-nested.

Fig. 3. A Generic Nested Loop
The case of nested loops is, however, challenging and requires additional discussion. Before we present an algorithm for handling this case, we discuss the intuition using an abstract example. Consider a pair of nested loops, L_1 and L_2, as shown in Fig. 3. Suppose that B1 and B3 are loop-free code fragments in the body of L_1 that precede and succeed the nested loop L_2. Suppose further that the loop body, B2, of L_2 is loop-free. To focus on the key aspects of computing peels of nested loops, we make two simplifying assumptions: (i) no scalar variable or array element modified in B2 is used subsequently (including transitively) in either B3 or B1, and (ii) every scalar variable or array element that is modified in B1 and used subsequently in B2, is not modified again in either B1, B2 or B3. Note that these assumptions are made primarily to simplify the exposition. For a detailed discussion on how our technique can be used even with some relaxations of these assumptions, the reader is referred to [13]. The peel of the abstract loops L_1 and L_2 is as shown in Fig. 4. The first loop in the peel includes the last iteration of L_2 in each of the N − 1 iterations of L_1, that was missed in Q_{N−1}. The subsequent code includes the last iteration of L_1 that was missed in Q_{N−1}.

Fig. 4. Peel of the Nested Loop
Formally, we use the notation L_1(N) to denote a loop L_1 that has no nested loops within it, and its loop counter, say ℓ_1, increases from 0 to an upper bound that is given by an expression in N. Similarly, we use L_1(N, L_2(N)) to denote a loop L_1 that has another loop L_2 nested within it. The loop counter ℓ_1 of L_1 increases from 0 to an upper bound expression in N, while the loop counter ℓ_2 of L_2 increases from 0 to an upper bound expression in ℓ_1 and N. Using this notation, L_1(N, L_2(N, L_3(N))) represents three nested loops, and so on. Notice that the upper bound expression for a nested loop can depend not only on N but also on the loop counters of other loops nesting it. For notational clarity, we also use LPeel(L_i, a, b) to denote the peel of loop L_i consisting of all iterations of L_i where the value of ℓ_i ranges from a to b-1, both inclusive. Note that if b-a is a constant, this corresponds to the concatenation of (b-a) peels of L_i.
We now show how to implement the transformation from the first column to the second column of Fig. 2 for a nested loop L_1(N, L_2(N)). The first step is to truncate all loops to use N − 1 instead of N in the upper bound expressions. Using the notation introduced above, this gives the loop L_1(N-1, L_2(N-1)). Note that all uses of N other than in loop upper bound expressions stay unchanged as we go from L_1(N, L_2(N)) to L_1(N-1, L_2(N-1)). We now ask: Which are the loop iterations of L_1(N, L_2(N)) that have been missed (or skipped) in going to L_1(N-1, L_2(N-1))? Let the upper bound expression of L_1 in L_1(N, L_2(N)) be U_{L_1}(N), and that of L_2 be U_{L_2}(ℓ_1, N). It is not hard to see that in every iteration ℓ_1 of L_1 that is retained in L_1(N-1, L_2(N-1)), the iterations of L_2 corresponding to ℓ_2 ∈ {U_{L_2}(ℓ_1, N − 1), . . . , U_{L_2}(ℓ_1, N) − 1} have been missed. In addition, all iterations of L_1 corresponding to ℓ_1 ∈ {U_{L_1}(N − 1), . . . , U_{L_1}(N) − 1} have also been missed. This implies that the "peel" of L_1(N, L_2(N)) must include all the above missed iterations. This peel therefore is the program fragment shown in Fig. 5.
Notice that if U_{L_2}(ℓ_1, N) − U_{L_2}(ℓ_1, N − 1) is a constant (as is the case if U_{L_2}(ℓ_1, N) is any linear function of ℓ_1 and N), then the peel does not have any loop with nesting depth 2. Hence, the maximum nesting depth of loops in the peel is strictly less than that in L_1(N, L_2(N)), yielding a peel that is "simpler" than the original program. This argument can be easily generalized to loops with arbitrarily large nesting depths. The peel of L_1(N, L_2(N, L_3(N))) is as shown in Fig. 6.
As an illustrative example, let us consider the program in Fig. 7(a), and suppose we wish to compute the peel of this program containing nested loops. In this case, the upper bounds of the loops are U_{L_1}(N) = U_{L_2}(N) = N. The peel is shown in Fig. 7(b) and consists of two sequentially composed non-nested loops. The first loop takes into account the missed iterations of the inner loop (a single iteration in this example) that are executed in P_N but are missed in Q_{N−1}. The second loop takes into account the missed iterations of the outer loop in Q_{N−1} compared to P_N.
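The missed-iteration argument for a doubly nested loop can be checked on small inputs with the sketch below. The loop body, the dictionary used to model the array, and the names `run_original` and `run_q_then_peel` are assumptions chosen so that the simplifying assumptions on B1, B2 and B3 hold trivially (B1 and B3 are empty); the upper bounds are passed in as hypothetical functions U1 and U2.

```python
def body(A, l1, l2, N):
    # Hypothetical inner body B2: accumulates into A[l1]. It uses N as a
    # value, so this use stays N (not N - 1) in the truncated program.
    A[l1] = A.get(l1, 0) + l2 + N


def run_original(N, U1, U2):
    """L1(N, L2(N)) with upper bounds U1(N) and U2(l1, N)."""
    A = {}
    for l1 in range(U1(N)):
        for l2 in range(U2(l1, N)):
            body(A, l1, l2, N)
    return A


def run_q_then_peel(N, U1, U2):
    A = {}
    # Q_{N-1}: only the loop bounds are evaluated at N - 1.
    for l1 in range(U1(N - 1)):
        for l2 in range(U2(l1, N - 1)):
            body(A, l1, l2, N)
    # Peel, part 1: inner iterations missed in every retained outer iteration.
    # When U2(l1, N) - U2(l1, N - 1) is a constant, the inner range below is a
    # fixed number of copies of the body, i.e. not a genuine nested loop.
    for l1 in range(U1(N - 1)):
        for l2 in range(U2(l1, N - 1), U2(l1, N)):
            body(A, l1, l2, N)
    # Peel, part 2: outer iterations missed altogether (a constant number of
    # them for linear U1), each running the full inner loop.
    for l1 in range(U1(N - 1), U1(N)):
        for l2 in range(U2(l1, N)):
            body(A, l1, l2, N)
    return A
```

With U1(N) = U2(l1, N) = N this mirrors the situation described for Fig. 7: one missed inner iteration per retained outer iteration, followed by the single missed outer iteration.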
Generalizing the above intuition, Algorithm 1 presents function GenQandPeel for computing Q_{N−1} and peel(P_N) for a given P_N that has sequentially composed loops with potentially nested loops. Due to the grammar of our programs [...]

Algorithm 1 GenQandPeel(P_N: program)
1: Let sequentially composed loops in P_N be in the order L_1, L_2, . . ., L_m;
2: for each loop L_i ∈ TopLevelLoops(P_N) do
[...]
12: if L has subloops then
13: t ← nesting depth of inner-most nested loop in L;
[...]
16: for each subloop SL_j in L_i at nesting depth k do ▷ Ordered SL_1, SL_2, . . ., SL_j
[...]

Note that lines 4 and 5 of Algorithm 1 implement the substitution represented by the arrow in the second column of Fig. 2. This is necessary in order to move the peel of a loop to the end of the program. If either of the loops L_i or L_j uses array elements as indices into other arrays, then it can be difficult to identify what expression to use in Q_{L_i} for the substitution. However, such scenarios are observed less often, and hence, they hardly impact the effectiveness of the technique on programs seen in practice. The peel R_{L_j}, from which the expression to be substituted in Q_{L_i} has to be taken, may itself have a loop. In such cases, it can be significantly more challenging to identify what expression to use in Q_{L_i}. We use several optimizations to transform the peeled loop before trying to identify such an expression. If the modified values in the peel can be summarized as closed form expressions, then we can replace the loop in the peel with its summary. For example, consider the peeled loop for (ℓ_1 = 0; ℓ_1 < N; ℓ_1++) { S = S + 1; }. This loop is summarized as S = S + N; before it can be moved across subsequent code. If the variables modified in the peel of a nested loop are not used later, then the peel can be trivially moved. In many cases, the loop in the peel can also be substituted with its conservative over-approximation. We have implemented some of these optimizations in our tool and are able to verify several benchmarks with sequentially composed nested loops. It may not always be possible to move the peel of a nested loop across subsequent loops, but we have observed that these optimizations suffice for many programs seen in practice.
Theorem 1. Let Q_{N−1} and peel(P_N) be generated by application of function GenQandPeel from Algorithm 1 on program P_N. Then P_N is semantically equivalent to Q_{N−1}; peel(P_N).
Then, the max nesting depth of loops in peel(P N ) is strictly less than that in P N .
Proof. Let $U_{L_k}(\ell_1, \ldots, \ell_{k-1}, N)$ be the upper bound expression of a loop $L_k$ at nesting depth $k$.
i.e. a constant. Now, recalling the discussion in Section 4.1, we see that N )) simply results in concatenating a constant number of peels of the loop L k . Hence, the maximum nesting depth of loops in N )) is strictly less than the maximum nesting depth of loops in L k . Suppose loop L with nested loops (having maximum nesting depth t) is passed as the argument of function GenQandPeelForLoop (see Algorithm 1). In line 15 of function GenQandPeelForLoop, we iterate over all loops at nesting depth 2 and above within L. Let L k be a loop at nesting depth k, where 2 ≤ k ≤ t. Clearly, L k can have at most t − k nested levels of loops within it. Therefore, when LPeel is invoked on such a loop, the maximum nesting depth of loops in the peel generated for L k can be at most t − k − 1. From lines 18 and 19 of function GenQandPeelForLoop, we also know that this LPeel can itself appear at nesting depth k of the overall peel R L . Hence, the maximum nesting depth of loops in R L can be t − k − 1 + k, i.e. t − 1. This is strictly less than the maximum nesting depth of loops in L.

Corollary 1.
If $P_N$ has no nested loops, then peel($P_N$) is loop-free.
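Theorem 1 and Corollary 1 can be sanity-checked on a small example. The sketch below uses a hypothetical two-loop program that is not taken from the paper's benchmarks; the names `run_P`, `run_Q`, `run_peel` and `equivalent_on` are illustrative assumptions, and the hand-written $Q_{N-1}$ and peel only mimic what GenQandPeel might produce for this program. It compares $P_N$ with $Q_{N-1}$; peel($P_N$) on concrete inputs, which is testing rather than a proof.

```python
# Hypothetical program P_N (illustration only, not from the paper):
#   S = 0
#   for i in 0..N-1: S = S + 1
#   for i in 0..N-1: a[i] = a[i] + S

def run_P(n, a):
    """Execute P_n on a copy of `a` (length n); return the final (S, a)."""
    a = list(a)
    s = 0
    for _ in range(n):
        s = s + 1
    for i in range(n):
        a[i] = a[i] + s
    return s, a

def run_Q(n_minus_1, a):
    """Execute Q_{N-1}: both loops run N-1 times. The use of S in the
    second loop is rewritten to S + 1, which accounts for the peel of the
    first loop that has been moved to the end of the program."""
    a = list(a)
    s = 0
    for _ in range(n_minus_1):
        s = s + 1
    for i in range(n_minus_1):
        a[i] = a[i] + (s + 1)
    return s, a

def run_peel(n, s, a):
    """Loop-free peel(P_N): the last iteration of each loop, in order."""
    a = list(a)
    s = s + 1                    # peel of the first loop
    a[n - 1] = a[n - 1] + s      # peel of the second loop
    return s, a

def equivalent_on(n, a):
    """Compare P_N with Q_{N-1}; peel(P_N) on one concrete input."""
    s_q, a_q = run_Q(n - 1, a)
    return run_P(n, a) == run_peel(n, s_q, a_q)
```

Since the hypothetical program has no nested loops, its peel is straight-line code, as Corollary 1 states.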

Generating $\varphi'(N-1)$ and $\Delta\varphi'(N)$
Given $\varphi(N)$, we check if it is of the form $\bigwedge_{i=0}^{N-1} \rho_i$ (or $\bigvee_{i=0}^{N-1} \rho_i$), where $\rho_i$ is a formula on the $i^{th}$ elements of one or more arrays, and scalars used in $P_N$. If so, we infer $\varphi'(N-1)$ to be $\bigwedge_{i=0}^{N-2} \rho_i$ (resp. $\bigvee_{i=0}^{N-2} \rho_i$) and $\Delta\varphi'(N)$ to be $\rho_{N-1}$ (assuming variables/array elements in $\rho_{N-1}$ are not modified by $Q_{N-1}$). Note that all uses of $N$ in $\rho_i$ are retained as is (i.e. not changed to $N-1$) in $\varphi'(N-1)$. In general, when deriving $\varphi'(N-1)$, we do not replace any use of $N$ in $\varphi(N)$ by $N-1$ unless it is the limit of an iterated conjunct/disjunct as discussed above. Specifically, if $\varphi(N)$ doesn't contain an iterated conjunct/disjunct as above, then we consider $\varphi'(N-1)$ to be the same as $\varphi(N)$ and $\Delta\varphi'(N)$ to be True. Thus, our generation of $\varphi'(N-1)$ and $\Delta\varphi'(N)$ differs from that of [12]. As discussed earlier, this makes it possible to reason about a much larger class of pre-conditions than that admissible by the technique of [12].
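To make the split concrete, the sketch below uses a hypothetical pre-condition in which every conjunct says that the i-th array element equals N. The function names and the choice of conjunct are assumptions made for illustration only. The point to observe is that only the limit of the conjunction shrinks, while the use of N inside each conjunct is kept.

```python
def rho(i, n, a):
    """Hypothetical conjunct rho_i: the i-th element of `a` equals N."""
    return a[i] == n

def phi(n, a):
    """phi(N): conjunction of rho_i for i in [0, N)."""
    return all(rho(i, n, a) for i in range(n))

def phi_prime(n, a):
    """phi'(N-1): the limit of the conjunction becomes N-1, but the use
    of N inside rho_i is retained as is."""
    return all(rho(i, n, a) for i in range(n - 1))

def delta_phi_prime(n, a):
    """Delta phi'(N): the conjunct that was split off, rho_{N-1}."""
    return rho(n - 1, n, a)
```

On any array of length N, `phi(n, a)` agrees with `phi_prime(n, a) and delta_phi_prime(n, a)`, which is the decomposition used later in the inductive step.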

Inferring Inductive Difference Invariants
Once we have $P_{N-1}$, $Q_{N-1}$, $\varphi(N-1)$ and $\varphi'(N-1)$, we infer difference invariants. We construct the standard cross-product of programs $Q_{N-1}$ and $P_{N-1}$, denoted as $Q_{N-1} \times P_{N-1}$, and infer difference invariants at key control points. Note that $P_{N-1}$ and $Q_{N-1}$ are guaranteed to have synchronized iterations of corresponding loops (both are obtained by restricting the upper bounds of all loops to use $N-1$ instead of $N$). However, the conditional statements within the loop body may not be synchronized. Thus, whenever we can infer that the corresponding conditions are equivalent, we synchronize the branches of the conditional statement. Otherwise, we consider all four possibilities of the branch conditions. It can be seen that the net effect of the cross-product is executing the programs $P_{N-1}$ and $Q_{N-1}$ one after the other.
We run a dataflow analysis pass over the constructed product graph to infer difference invariants at loop head, loop exit and at each branch condition. The only dataflow values of interest are differences between corresponding variables in $Q_{N-1}$ and $P_{N-1}$. Indeed, since the structure and variables of $Q_{N-1}$ and $P_{N-1}$ are similar, we can create the correspondence map between the variables. We start the difference invariant generation by considering relations between corresponding variables/array elements appearing in pre-conditions of the two programs. We apply static analysis that can track equality expressions (including disjunctions over equality expressions) over variables as we traverse the program. These equality expressions are our difference invariants.
We observed in our experiments that most of the inferred equality expressions are simple expressions of $N$ (at most quadratic in $N$). This is not totally surprising, and similar observations have also been independently made in [24,15,4]. Note that the difference invariants may not always be equalities. We can easily extend our analysis to learn inequalities using interval domains in static analysis. We can also use a library of expressions to infer difference invariants using a guess-and-check framework. Moreover, guessing difference invariants can be easy, as in many cases the difference expressions may be independent of the program constructs. For example, the equality expression $v = v'$, where $v \in P_{N-1}$ and $v' \in Q_{N-1}$, does not depend on any other variable from the two programs.
For the example in Fig. 2, the difference invariant at the head of the first loop of , where $x, a \in V_P$ and $x', a' \in V_Q$. Given this, we easily get

Note that the difference invariants and their computation are agnostic of the given post-condition. Hence, our technique does not need to re-run this analysis for proving a different post-condition for the same program.
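The guess-and-check idea mentioned above can be illustrated with a small sketch. It runs hand-written versions of $P_{N-1}$ and $Q_{N-1}$ for a hypothetical two-loop program in lockstep and checks a guessed difference invariant at each loop head. The program, the names `product_run` and `candidate_holds`, and the guessed invariant are all assumptions for illustration; the tool itself uses static analysis rather than concrete runs.

```python
# Hypothetical program P_N (illustration only):
#   S = 0; for i < N: S = S + 1; for i < N: a[i] = a[i] + S
# Q_{N-1} uses S + 1 in the second loop, and both programs run to N-1.

def product_run(n, a):
    """Run P_{N-1} and Q_{N-1} in lockstep on the same input and collect
    the differences (Q value minus P value) observed along the way."""
    ap, aq = list(a), list(a)
    sp = sq = 0
    diffs = []
    for _ in range(n - 1):            # first loop, synchronized
        diffs.append(("S", sq - sp))
        sp, sq = sp + 1, sq + 1
    for i in range(n - 1):            # second loop, synchronized
        diffs.append(("S", sq - sp))
        ap[i] = ap[i] + sp
        aq[i] = aq[i] + (sq + 1)
        diffs.append(("a", aq[i] - ap[i]))
    return diffs

def candidate_holds(n, a):
    """Guessed difference invariant: S' - S = 0 and a'[i] - a[i] = 1
    for every element written so far."""
    expected = {"S": 0, "a": 1}
    return all(d == expected[name] for name, d in product_run(n, a))
```

For this program the differences are constants, so the check does not depend on the post-condition at all, in line with the remark that the analysis need not be re-run for a different post-condition.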

Verification using Inductive Difference Invariants
We present our method Diffy for verification of programs using inductive difference invariants in Algorithm 2. It takes a Hoare triple $\{\varphi(N)\}\ P_N\ \{\psi(N)\}$ as input, where $\varphi(N)$ and $\psi(N)$ are pre- and post-condition formulas. We check the base case in line 1 to verify the Hoare triple for $N = 1$. If this check fails, we report a counterexample. Subsequently, we compute $Q_{N-1}$ and peel($P_N$) as described in section 4.1 using the function GenQandPeel from Algorithm 1. At line 5, we compute the formulas $\varphi'(N-1)$ and $\Delta\varphi'(N)$ as described in section 4.2. For automation, we analyze the quantifiers appearing in $\varphi(N)$ and modify the quantifier ranges such that the conditions in section 4.2 hold. We infer difference invariants $D(V_Q, V_P, N-1)$ on line 6 using the method described in section 4.3, wherein $V_Q$ and $V_P$ are sets of variables from $Q_{N-1}$ and $P_{N-1}$ respectively. At line 7, we compute $\psi'(N-1)$ by eliminating the variables $V_P$ of $P_{N-1}$ from $\psi(N-1) \wedge D(V_Q, V_P, N-1)$. At line 8, we check the inductive step of our analysis. If the inductive step succeeds, then we conclude that the assertion holds. Otherwise, we try to iteratively strengthen both the pre- and post-condition of peel($P_N$) simultaneously by invoking Strengthen.
The function Strengthen first initializes the formula $\chi(N)$ with $\psi(N)$ and the formulas $\xi(N)$ and $\xi'(N-1)$ to True. To strengthen the pre-condition of peel($P_N$), we infer a formula $\chi'(N-1)$ using Dijkstra's weakest pre-condition computation of $\chi(N)$ over peel($P_N$) in line 17. It may happen that we are unable to infer such a formula. In such a case, if the program peel($P_N$) has loops then we recursively invoke Diffy at line 20 to further simplify the program. Otherwise, we abandon the verification effort (line 22). We use quantifier elimination to infer $\chi(N-1)$ from $\chi'(N-1)$ and $D(V_Q, V_P, N-1)$ at line 7. The inferred pre-conditions $\chi(N)$ and $\chi'(N-1)$ are accumulated in $\xi(N)$ and $\xi'(N-1)$, which strengthen the post-conditions of $P_N$ and $Q_{N-1}$ respectively in lines 24-25. We again check the base case for the inferred formulas in $\xi(N)$ at line 26. If the check fails, we abandon the verification attempt at line 27. If the base case succeeds, we then proceed to the inductive step. When the inductive step succeeds, we conclude that the assertion is verified. Otherwise, we continue in the loop and try to infer more pre-conditions until we run out of time.
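The overall shape of the check (base case, then inductive step over the peel) can be sketched on a hypothetical example. Everything below is an assumption made for illustration: the program, the pre-condition (all elements are 0), the post-condition (S = N and all elements equal N), the difference invariant (S' = S and a'[i] = a[i] + 1), and the assumed form of the inductive step, namely that running peel($P_N$) from a state satisfying $\psi'(N-1) \wedge \Delta\varphi'(N)$ must establish $\psi(N)$. The tool discharges such checks symbolically with an SMT solver; this sketch only tests concrete values of N and omits Strengthen.

```python
# Hypothetical program P_N (illustration only):
#   S = 0; for i < N: S = S + 1; for i < N: a[i] = a[i] + S

def run_P(n, a):
    """Execute P_n on a copy of `a`; return the final (S, a)."""
    a = list(a)
    s = 0
    for _ in range(n):
        s = s + 1
    for i in range(n):
        a[i] = a[i] + s
    return s, a

def peel(n, s, a):
    """Loop-free peel(P_N): last iteration of each loop, in order."""
    a = list(a)
    s = s + 1
    a[n - 1] = a[n - 1] + s
    return s, a

def psi(n, s, a):
    """Assumed post-condition psi(N): S = N and every a[i] = N."""
    return s == n and all(a[i] == n for i in range(n))

def base_case():
    """Check the triple for N = 1 on the only state allowed by phi(1)."""
    s, a = run_P(1, [0])
    return psi(1, s, a)

def psi_prime_state(n):
    """State described by psi'(N-1) and Delta phi'(N): psi(N-1) gives
    S = N-1 and a[i] = N-1 in P_{N-1}; the assumed difference invariant
    turns this into S' = N-1 and a'[i] = N; Delta phi'(N) adds a[N-1] = 0."""
    return n - 1, [n] * (n - 1) + [0]

def inductive_step(n):
    """Run the peel from the psi'-state and check psi(N)."""
    s, a = psi_prime_state(n)
    s, a = peel(n, s, a)
    return psi(n, s, a)

def diffy_bounded(max_n):
    """Bounded stand-in for the top-level check."""
    if not base_case():
        return "counterexample at N = 1"
    if all(inductive_step(n) for n in range(2, max_n + 1)):
        return "inductive step holds on all tested N"
    return "inconclusive"
```

Calling `diffy_bounded(10)` exercises the base case and the inductive step for N from 2 to 10. A failed base case corresponds to the counterexample report described above, and a failed inductive step is where strengthening would begin.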

Experimental Evaluation
We have instantiated our technique in a prototype tool called Diffy. It is written in C++ and is built using the LLVM(v6.0.0) [31] compiler. We use the SMT solver Z3(v4.8.7) [39] for proving Hoare triples of loop-free programs. Diffy and the supporting data to replicate the experiments are openly available at [14].
Setup. All experiments were performed on a machine with an Intel i7-6500U CPU running at 2.5 GHz, 16GB RAM, and the Ubuntu 18.04.5 LTS operating system. We have compared the results obtained from Diffy with Vajra(v1.0) [12], VIAP(v1.1) [42] and VeriAbs(v1.4.1-12) [1]. We chose Vajra because it also employs inductive reasoning for proving array programs, and we verify the benchmarks in its test-suite. We compared with VeriAbs as it is the winner of the arrays sub-category in SV-COMP 2020 [6] and 2021 [7]. VeriAbs applies a sequence of techniques from its portfolio to verify array programs. We compared with VIAP, which was the winner in the arrays sub-category in SV-COMP 2019 [5]. VIAP also employs a sequence of tactics, implemented for proving a variety of array programs. Diffy does not use multiple techniques; however, we chose to compare it with these portfolio verifiers to show that it performs well on a class of programs and can be a part of their portfolio. All tools take C programs in the SV-COMP format as input. A timeout of 60 seconds was set for each tool. A summary of the results is presented in Table 1.
Benchmarks. We have evaluated Diffy on a set of 303 array benchmarks, comprising the entire test-suite of [12], enhanced with challenging benchmarks to test the efficacy of our approach. These benchmarks take a symbolic parameter $N$ which specifies the size of each array. Assertions are (in-)equalities over array elements, scalars and (non-)linear polynomial terms over $N$. We have divided both the safe and unsafe benchmarks into three categories. Benchmarks in C1 category have standard array operations such as min, max, init, copy, compare

Analysis. Diffy verified 151 safe benchmarks, compared to 110 verified by Vajra as well as VeriAbs and 20 verified by VIAP. Diffy was unable to verify 6 safe benchmarks. In 3 cases, the SMT solver timed out while trying to prove the induction step since the formulated query had a modulus operation, and in 3 cases it was unable to compute the predicates needed to prove the assertions. Vajra was unable to verify 47 programs from categories C2 and C3. These are programs with nested loops, branch conditions affected by $N$, and cases where it could not compute the difference program. The sequence of techniques employed by VeriAbs ran out of time on 47 programs while trying to prove the given assertion. VeriAbs proved 2 benchmarks in category C2 and 3 benchmarks in category C3 where Diffy was inconclusive or timed out. VeriAbs spends a considerable amount of time on different techniques in its portfolio before it resorts to Vajra, and hence it could not verify 14 programs that Vajra was able to prove efficiently. VIAP was inconclusive on 24 programs which had nested loops or constructs that could not be handled by the tool. It ran out of time on 113 benchmarks as the initial tactics in its sequence took up the allotted time but could not verify the benchmarks. Diffy was able to verify all programs that VIAP and Vajra were able to verify within the specified time limit.
The cactus plot in Figure 8(a) shows the performance of each tool on all safe benchmarks. Diffy was able to prove most of the programs within three seconds. The cactus plot in Figure 9(a) shows the performance of each tool on safe benchmarks in C1 category. Vajra and Diffy perform equally well in the C1 category. This is due to the fact that both tools perform efficient inductive

Figure 10(a) shows the performance of each tool on safe benchmarks in the combined categories C2 and C3, which are difficult for Vajra as most of these programs are not within its scope. Diffy outperforms all other tools in categories C2 and C3. VeriAbs was an order of magnitude slower than Diffy on the programs it was able to verify. VeriAbs spends a significant amount of time in trying techniques from its portfolio, including Vajra, before one of them succeeds in verifying the assertion or takes up the entire time allotted to it. VIAP took 70 seconds more on average as compared to Diffy to verify the given benchmark. VIAP also spends a large portion of time in trying different tactics implemented in the tool and solving the recurrence relations in programs.
Our technique reports property violations when the base case of the analysis fails for small fixed values of N . While the focus of our work is on proving assertions, we report results on unsafe versions of the safe benchmarks from our test-suite. Diffy was able to detect a property violation in 142 unsafe programs and was inconclusive on 4 benchmarks. Vajra detected violations in 115 programs and was inconclusive on 31 programs. VeriAbs reported 125 programs as unsafe and ran out of time on 21 programs. VIAP reported property violation in 120 programs, was inconclusive on 23 programs and timed out on 3 programs.
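The counts reported above can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the numbers given in the text; the per-tool grouping into solved and unsolved, and the derived subtotals of 157 safe and 146 unsafe benchmarks, are not stated in the text and are computed here as a consistency check.

```python
# (solved, unsolved) per tool, copied from the reported counts.
# "unsolved" merges inconclusive runs and timeouts.
SAFE = {
    "Diffy": (151, 6),
    "Vajra": (110, 47),
    "VeriAbs": (110, 47),
    "VIAP": (20, 24 + 113),
}
UNSAFE = {
    "Diffy": (142, 4),
    "Vajra": (115, 31),
    "VeriAbs": (125, 21),
    "VIAP": (120, 23 + 3),
}

def totals(table):
    """Number of benchmarks each tool was run on, per the reported counts."""
    return {tool: solved + unsolved for tool, (solved, unsolved) in table.items()}

safe_total = set(totals(SAFE).values())      # one value if counts agree
unsafe_total = set(totals(UNSAFE).values())  # one value if counts agree
```

Every tool's counts add up to the same number of safe and unsafe benchmarks, and the two subtotals together give the 303 benchmarks mentioned in the setup.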
The cactus plot in Figure 8  tools and on more benchmarks from the test-suite. Figure 9(b) and Figure 10(b) give a finer glimpse of the performance of these tools on the categories that we have defined. In the C1 category, Diffy and Vajra have comparable performance and Diffy disproves the same number of benchmarks as Vajra and VIAP. In C2 and C3 categories, we are able to detect property violations in more benchmarks than other tools in less time.
To observe any changes in the performance of these tools, we also ran them with an increased timeout of 100 seconds. Performance remains unchanged for Diffy, Vajra and VeriAbs on both safe and unsafe benchmarks, and for VIAP on unsafe benchmarks. VIAP was able to additionally verify 89 safe programs in categories C1 and C2 with the increased time limit.

Related Work
Techniques based on Induction. Our work is related to several efforts that apply inductive reasoning to verify properties of array programs. Our work subsumes the full-program induction technique in [12] that works by inducting on the entire program via a program parameter N . We propose a principled method for computation and use of difference invariants, instead of computing difference programs which is more challenging. An approach to construct safety proofs by automatically synthesizing squeezing functions that shrink program traces is proposed in [27]. Such functions are not easy to synthesize, whereas difference invariants are relatively easy to infer. In [11], the post-condition is inductively established by identifying a tiling relation between the loop counter and array indices used in the program. Our technique can verify programs from [11], when supplied with the tiling relation. [44] identifies recurrent program fragments for induction using the loop counter. They require restrictive data dependencies, called commutativity of statements, to move peeled iterations across subsequent loops. Unfortunately, these restrictions are not satisfied by a large class of programs in practice, where our technique succeeds.
Difference Computation. Computing differences of program expressions has been studied for incremental computation of expensive expressions [41,35], optimizing programs with arrays [34], and checking data-structure invariants [45]. These differences are not always well suited for verifying properties, in contrast with the difference invariants which enable inductive reasoning in our case.
Logic-based reasoning. In [21], trace logic that implicitly captures inductive loop invariants is described. They use theorem provers to introduce and prove lemmas at arbitrary time points in the program, whereas we infer and prove lemmas at key control points during the inductive step using SMT solvers. VIAP [42] translates the program to a quantified first-order logic formula using the scheme proposed in [32]. It uses a portfolio of tactics to simplify and prove the generated formulas. Dedicated solvers for recurrences are used, whereas our technique adapts induction for handling recurrences.
Invariant Generation. Several techniques generate invariants for array programs. QUIC3 [25], FreqHorn [19] and [9] infer universally quantified invariants over arrays for Constrained Horn Clauses (CHCs). Template-based techniques [23,47,8] search for inductive quantified invariants by instantiating parameters of a fixed set of templates. We generate relational invariants, which are often easier to infer compared to inductive quantified invariants for each loop.
Abstraction-based Techniques. Counterexample-guided abstraction refinement using prophecy variables for programs with arrays is proposed in [36]. VeriAbs [1] uses a portfolio of techniques, specifically to identify loops that can be soundly abstracted by a bounded number of iterations. Vaphor [38] transforms array programs to array-free Horn formulas to track a bounded number of array cells. Booster [3] combines lazy abstraction based interpolation [2] and acceleration [10,28] for array programs. Abstractions in [16,18,22,26,29,33,37] implicitly or explicitly partition the range of array indices to infer and prove facts on array segments. In contrast, our method does not rely on abstractions.

Conclusion
We presented a novel verification technique that combines generation of difference invariants and inductive reasoning. These invariants relate corresponding variables and arrays from two versions of a program and are easy to infer and prove. They facilitate inductive reasoning by assisting in the inductive step. We have instantiated these techniques in our prototype Diffy. Experiments show that Diffy outperforms the tools that won the Arrays sub-category in SV-COMP 2019, 2020 and 2021. Investigating the use of synthesis techniques for automatic generation of difference invariants to verify properties of array manipulating programs is a part of future work.