Mind the Gap: Bit-vector Interpolation recast over Linear Integer Arithmetic

Much of an interpolation engine for bit-vector (BV) arithmetic can be constructed by observing that BV arithmetic can be modeled with linear integer arithmetic (LIA). Two BV formulae can thus be translated into two LIA formulae and then an interpolation engine for LIA used to derive an interpolant, albeit one expressed in LIA. The construction is completed by back-translating the LIA interpolant into a BV formula whose models coincide with those of the LIA interpolant. This paper develops a back-translation algorithm showing, for the first time, how back-translation can be universally applied, whatever the LIA interpolant. This avoids the need for deriving a BV interpolant by bit-blasting the BV formulae, as a backup process when back-translation fails. The new back-translation process relies on a novel geometric technique, called gapping, the correctness and practicality of which are demonstrated.


Introduction
Given two formulae A and B which are inconsistent, an interpolant for the ordered pair A, B is a formula I over the variables common to both A and B which is a relaxation of A that is still inconsistent with B. For example, when working over the theory of linear inequalities, if A = (x = y + 1) ∧ (y = 0) and B = (x = z + 2) ∧ (1 ≤ z) then interpolants for A, B are I 1 = (x = 1), I 2 = (x ≤ 1) and I 3 = (x < 3), ordering by increasing generality. The intuition behind I 1 , I 2 and I 3 is that they are abstractions of A which concisely explain the inconsistency between A and B. Interpolation has attracted growing attention over the last decade [26], because of the crucial role it plays in model checking in lazy [18] predicate abstraction [15] and lazy abstraction with interpolants [25], as exemplified in BLAST [5] and IMPACT [25] respectively. In lazy predicate abstraction [25], interpolation is used to synthesise predicates which describe program state. Predicates are added, on demand, to explain why a path through a program cannot reach an error state. In lazy abstraction with interpolants [25], program state is described with unrestricted formulae, rather than merely using predicates, and interpolation is applied to relax sequences of formulae that describe the states down paths which do not error. Interpolation simplify these formulae but increasing the likelihood of covering, again accelerating path exploration. In effect, interpolation is the key abstraction mechanism.
Context As solvers for richer theories have evolved so have interpolation engines for these theories, with a notable flurry of activity around one decade ago [10,11,19,20,23,24,30]. However, progress on the important theory of bit-vectors (BV) has been surprisingly slow, the two key works [2,16] taking opposing approaches. One takes advantage of existing interpolation engines [16] and the another develops a bespoke interpolation engine around lazy reduction [2], which supports bit-vector operations by expanding them, on demand, to Presburger arithmetic [2]. This paper develops the former approach, aiming to use an LIA solver as is.
The central problem in bit-vector interpolation is to construct an interpolant which is compact (one might even say beautiful [1]). Although a pair of inconsistent BV formulae can always be bit-blasted (unfolded) into a pair of inconsistent propositional formulae, it is not always obvious how the resulting propositional interpolant can be folded back into a compact bit-vector (BV) formula to derive a BV interpolant. Interpolation engines over linear integer arithmetic (LIA) have thus been repurposed for BV interpolation [16]. First, operations on bit-vectors are reformulated as LIA formulae. An interpolant over LIA is then reinterpreted as a candidate interpolant for a pair of BV formulae. Because of wrap-around, LIA does not necessarily align with BV arithmetic, hence the LIA interpolant is adopted as a BV interpolant only if it passes a (unsatisfiability) check over bit-vectors. This checks that the interpolant relaxes the first BV formula of the pair and yet is still inconsistent with the second. If the candidate fails the check, then the two BV formulae are bit-blasted to recover a propositional interpolant, albeit one which looses the high-level structure of bit-vectors, and therefore is not compact. This approach is promising: it exploits robust off-the-shelf LIA interpolation [17] yet is compromised by the quality of the interpolants which follow from bit-blasting.
Contribution This paper plugs this gap, addressing the issue of interpolant quality by developing a new, principled encoding LIA formulae into BV formulae which does not enlarge the bit-width of the BV formulae. This ensures that the interpolant is still drawn from the language used to define BV formulae. We show that a naïve encoding of an LIA inequality as a BV inequality can give a formula which has a completely different meaning from LIA inequality: the BV inequality can have solutions not admitted by the LIA inequality and vice versa. Moreover, we illustrate how a straightforward encoding of a single LIA inequality can require many BV inequalities, which compromises the quality of a BV interpolant. We therefore propose a technique, which we call gapping, which adds range constraints to LIA inequality which reduces the LIA inequality into two or three LIA systems the solutions of which are amenable to compact BV representation. The term gapping reflects a geometric interpretation of this transformation which introduces a gap 4 between the solutions of the two LIA systems. We demonstrate the value of this approach with a BV interpolation engine which side-steps bit-blasting (and the complexity of providing bit-level circuits for arithmetic) and show that the approach usually gives a modest slowdown relative to LIA. We also prove the validity of the BV encoding, and the correctness of the reductions the encoding relies on, though the proofs themselves are omitted here for brevity. To summarise, the contributions of this paper are as follows: -We show how interpolating theorem provers for LIA can be used to interpolate BV formulae, without recourse to bit-blasting. -We develop a rigorous theory which explains gapping and proves that the resulting BV interpolant has exactly the same set of models as the LIA interpolant of the two BV formulae. -We provide evaluation, within the established [6] framework of lazy abstraction with interpolants [25] which demonstrates the practicality of the approach for BV interpolation.
Use case Since BV formulae are converted to LIA one might wonder why one cannot work with LIA throughout and avoid BV interpolation all together. First, such an approach would not fit with a layered approach to interpolation [17] where one uses one lightweight theory (eg. uninterpreted functors) and then, if necessary, a more complicated one (eg. LIA) to construct a BV interpolant. BV formulae provide a uniform way expressing interpolants, no matter how they are derived. Second, computing LIA interpolants is complex and it is not surprising that these engines contain subtle 5 bugs. Translating a LIA interpolant back into a BV formula enables interpolants to be validated using a BV solver [3,7,27], using the reference (BV) semantics of a program. Moreover, validation need make no assumption on the correctness of a translation between theories. Validation can be performed on-the-fly, as the unwinding tree [25] is constructed, or by translating the complete, stable unwinding tree into its BV counterpart. The BV version can then be validated as a form of post-processing, akin to post-fixpoint validation in abstract interpretation [4,14].
Road map This paper is structured as follows: Section 2 gives the intuition behind boxing and gapping whereas Section 3 argues for the correctness of the approach. Section 4 presents the experimental work. Section 5 presents the related work and section 6 concludes.

Boxing and Gapping in Pictures
Given a linear inequality , we seek to find a bit-vector formula f such that f BV = LIA where f BV and LIA are respectively the sets of solutions (models) of f and in the linear integer arithmetic (LIA) and bit-vector (BV) semantics. Ideally f should be compact where we measure size by the number of binary logical connectives in f . This section gives the intuition behind two techniques, boxing and gapping, and demonstrate how they are used together to construct such an f ; the sequel provides a more formal development.
To illustrate boxing and gapping, first consider the set of solutions to the inequality x + y ≤ 3, when interpreted with both the LIA semantics and BV semantics. Figure 1(a) gives the LIA solutions in blue and the BV solutions in red over the non-negative integer grid {(x, y) | 0 ≤ x < 8 ∧ 0 ≤ y < 8} using a modulo of 8 for bit-vectors. The solution sets differ on, for instance, (5, 6) since (5 + 6) (mod 8) = 3 ≤ 3 but 5 + 6 = 11 ≤ 3. It does not generally follow that f LIA ⊆ f BV as Figure 1 Enumeration A naive approach to finding a formula f such that f BV = LIA is to enumerate all solutions of LIA to then summarise them in a single BV formula. Figure 1(a) illustrates the 4 + 3 + 2 + 1 = 10 LIA solutions for = (x + y ≤ 3) which are summarised in the following BV formula: This formula has 9 binary disjuncts and 10 binary conjuncts, hence 19 logical connectives in total. A more compact formulation is to cover the blue triangular region of Figure 1(a) with columns as realised with the following BV formula: Only non-negative solutions on the grid are considered so there is no need to additionally assert 0 ≤ y. This formula has 3 binary disjuncts and 4 binary conjuncts giving and 7 connectives in total.
Boxing Observe from Figure 1(a) that the extra solutions of x + y ≤ 3 BV over x + y ≤ 3 LIA stem from overflow. Overflow can be avoided by constraining BV solutions with x ≤ 3 and y ≤ 3 which amounts to placing a box (in general a hyper-rectangle) around the LIA solutions, as illustrated in Figure 1(b). This tactic, henceforth called boxing, leads to the following formula: which requires 2 binary conjuncts.
Gapping Figure 1(c) illustrates that in general boxing cannot be applied in isolation because a box around the LIA solutions would not eliminate any extraneous BV solutions. Boxing is successful for Figure 1(b) because of the absence of solutions (a gap) between the LIA solutions inside the box and the BV solutions outside the box. No such gap exists for the box of Figure 1(c). Yet boxing can still be applied by decomposing the inequality x + y ≤ 7 into two inequalities both of which are amenable to boxing. The construction is based on Recall that boxing alone allows the LIA solutions of x + y ≤ 3 to be expressed as a BV formula of 2 binary connectives. Thus consider the compound formula = (0 ≤ x + y − 4 ∧ x + y − 4 ≤ 3) whose LIA solutions are illustrated in Figure 1(d). Observe that the BV solutions of can be covered with two rectangles without including the extraneous 6 BV solutions in top right. Then such that f 3 BV = x + y ≤ 7 LIA . This tactic of artificially introducing a gap, henceforth called gapping, is equally applicable for larger grids too. For instance, working over a modulo of 32 x + y ≤ 31 LIA = f 4 BV where

Formal correctness of boxing and gapping
In what follows we consider LIA and BV formulae over an ordered set of variables {x 1 , . . . , x d } for some d > 1. We consider bit-vectors of fixed width w > 1 and interpret LIA and BV formulae over the product space M d where M = {0, 1, 2, . . . , m − 1} and m = 2 w as follows: Furthermore, the LIA semantics can be lifted from inequalities to LIA formulae by: In the sequel, N denotes the set of (strictly) positive integers, R the set of real numbers, and R ≥0 the set of non-negative real numbers. We extend the floor and ceiling function for the sequences in R d in a component-wise manner:

Boxing
Boxing is founded on the following result and its corollary in which sets of solutions to inequalities which describe hyper-rectangles are pinched, above and below, by inclusions to systems of inequalities with positive, unary coefficients: Then: The corollary leads to two types of box constraint: one for LIA and the other, reducing boxing, for BV. Boxing formulae are purely conceptual and are used to reason about correctness; reduced boxing formulae are deployed within BV interpolants.
Given m and b ∈ N, it is always possible to find a unique L ∈ N which satisfies Definition 2 by putting L = 2b m + 1.
One might expect that the cardinality of I d ((d − 1)(L + 1)) becomes large as d or L grow large. Yet d is the number of variables occurring in the LIA interpolant, which is typically small. Furthermore, when L is large, the values of p are also large, so that many terms become equivalent because of the min operation in equation (2) of Definition 2. Thus the number of terms required to define box BV (c; b) does not grow excessively large in practice.
The following proposition asserts that the boxing and reduced boxing formulae share the same solution set when interpreted with, respectively, the LIA and BV semantics.
The following lemma shows that the solution sets for boxing grow monotonically as the constant of the inequality is relaxed.
The following results explains how to augment an inequality with a box so as to align its BV semantics with its LIA semantics.
Theorem 1 (boxing without gapping). Let c ∈ N d and b ∈ N. If b < m/2 then Fig. 2. Gapping and boxing for x + 2y ≤ 5 Observe that the result requires b < m/2. In this circumstance L = 2b/m + 1 = 1 and number of logical connectives in box BV (c; b) is determined by the cardinality of the set I d ((d − 1)(L + 1)) = I d (2(d − 1)), which is given below:
Example 4. Yet the LIA inequality x + 2y ≤ 5 can be decomposed as follows: Figures 2(b, c) illustrates boxing for x + 2y ≤ 3 and 0 ≤ x + 2y − 4 ≤ 1 where: Observe from Figure 2(c) that and moreover 0 mod 8 = 0 ≤ (x + 2y − 4) mod 8 for all (x, y) ∈ M 2 thus The general rule of the separation of the given inequality and the boxing is shown in this theorem: Corollary 2 (boxing and gapping with simplification).

Boxing, Gapping and Flipping
To handle inequalities which have indeterminates with negative coefficients, boxing and gapping are augmented with a third technique, which we have informally named flipping. Flipping transforms an inequality into a syntactic form which is amenable to boxing and gapping by reflecting the solutions of the inequality. To detail the transformation, we assume without loss of generality, that an inequality takes the syntactic form c Given an inequality with negative coefficients, we derive a new inequality whose solutions coincide with the flipped solutions of the given inequality. This transformation is then lifted to formulae as follows: The overall strategy involves applying boxing and gapping to an inequality derived by the flipping function F x − . The validity of this strategy is based on the following proposition: A complete strategy for handling inequalities with negative coefficients is justified by the following corollary. The strategy entails flipping an LIA inequality, deriving a BV formula by boxing and gapping, and then flipping the BV formula. Fig. 4(a). Then Fig. 3(a) shows 7x + 3y − 21 ≤ −4 LIA = 7x + 3y ≤ 17 LIA and so building on exam- , F y (φ 1 ) and F y (φ 2 ) are given in Fig. 4(b), (c) and (d) respectively. Finally, to illustrate the handling of boxing, recall box BV (c; 11) from example 6 and box BV (c; 11) Finally observe and that the disjunct (x ≤ 1 ∧ (−y + 7 ≤ 1)) is actually redundant.

Boxing, Gapping, Flipping and Demoding
Griggio [16] gives a procedure for encoding machine arithmetic in LIA, illustrating that the resulting LIA interpolants can include inequalities such as −x 2 +x 3 −256 −x 2 /256 ≤ 255 [16,Example 5]. Relaxing inequalities to include ceiling (or floor) functions can reduce the size of interpolants whilst simplifying their derivation [17]. These more general forms of interpolant include inequalities of the form c · x + n c · x/n ≤ b [9] or c · x + n c · x/n ≤ b [17], though for our purposes it is sufficient to consider c · x + n 2 n c · x/2 n ≤ b or c · x + n 2 n c · x/2 n ≤ b, where the divisors are powers of 2, stemming from the way they model wrap-around in machine arithmetic. To extend boxing to these generalised interpolants we extend the LIA and BV semantics two new types of atomic constraint (though the definitions are almost vacuous): The following proposition shows generalised LIA interpolants are not an obstacle to boxing. These inequalities are handled through a transformation scheme which exploits the property that if n ≤ w then (c · x mod 2 n ) mod m = c · x mod 2 n . We informally call this transformation tactic demoding, because like gapping and flipping, it is designed to increase the general applicability of boxing.
If y does not occur in x then Inequalities such as c · x + n 2 n c · x/2 n ≤ b can be handled similarly. For completeness, we note that expansion can be applied for non-powers of 2: Proposition 4. Suppose n > 0. Then

Experiments
To evaluate the performance of boxing we implemented a model checker based on the lazy abstraction (IMPACT) [25] algorithm. The model checker is implemented in Python 3.7.2 and uses MathSAT5 [9] for satisfiability checking and interpolation over LIA. The model checker parses a subset of the C language, but is rich enough to handle 312 benchmarks drawn from [2,12]. The model checker was instantiated in one of three ways to use: (1) LIA interpolation [17]; (2) BV interpolation by covering the solutions of an LIA interpolate with columns (recall f 2 of section 2); and (3) BV interpolation by covering the solutions of an LIA interpolate using boxing, gapping and flipping. Experiments were performed using an Amazon Web Service EC2 c3.xlarge cloud architecture of 14 EC2 Computing Units [31] each equipped with 4 cores and 7.5 GB of RAM. The timeout for each run of IMPACT was set to 600 seconds. All arithmetic is idealised in configuration (1) taking no account of integer overflow and underflow. This is not, in general, safe. In configurations (2) and (3) the model checker interprets machine arithmetic and bit operations using the LIA encoding of BV operations outlined in [16,Fig 1]. This is safe but complicates the LIA formulae, often substantially. One would expect this to enlarge the interpolants, even before boxing and gapping are deployed. We would also expect (1) to be substantially faster than (2) and (3). Due to differences in the semantics of arithmetic, we might also see differences in the number of programs proved to be safe or found to be unsafe. The experiments quantify these predictions. To discuss the experiments, (2) will be referred to as the naive encoding, even though it improves on complete enumeration (recall f 1 of section 2). Table 4.1 summarises the outcomes of running IMPACT on all 312 programs, using the three different instances of interpolation, categorised as to whether the run proved safety (safe rows) or found a counterexample (unsafe rows). The Solved column of the left-hand table gives the total of the programs there were either shown to be safe or unsafe within 600 seconds. Time is the mean execution of a run (for all those programs which did not timeout). Size is mean total number of atomic constraints in all interpolants encountered over a run (for those programs which did not timeout). We observe that more programs can be analysed to completion with LIA than with BV, as one would expect, but that BV (boxing) improves on BV (naive), the speedup being significant when proving safety. The right-hand table compares a terminating run of LIA to a terminating run of BV (boxing). For 17 of these 142 runs, LIA (incorrectly) verified the program to be safe whereas BV found a counter-example. Unexpectantly for trex03 trueunreach-call.i.annot.c from [12], LIA found a counter-example but BV verified safety. This program contains three integers, x1, x2 and x3, which can become negative in the idealised arithmetic employed in LIA, triggering an assertion. But x1, x2 and x3 are actually unsigned.

Runtime for Naive encoding and Boxing
The scatter plot of Figure 5 compares the runtime of the naive encoding against that of boxing and its allied techniques of gapping and flipping. The scatter plot excludes timeouts and depicts 151 pairs of runs. Almost all points are under the dotted line, indicating the boxing significantly improves performance. The line graph plots the ratio of the execution times, from which we observe that boxing does not accelerate the verification for almost half of the runs, but does speed it up between 2-and 256-fold for the other half.

Interpolant Size for Naive encoding and Boxing
The line graph on Figure 6 compares the relative size of interpolants for boxing versus the naive encoding. Size is the sum of the sizes of all the interpolants generated during a run, where the size of an interpolant is itself defined as the number of atomic constraints that occur within it. We observe that for most problems the size ratio is around one, but a second peak occurs at 1/32, giving an overall size reduction. The scatter plot explores how interpolant size correlates with runtime, showing how the relative size of interpolants varies with relative runtimes. We observe that reducing the size of interpolants improves runtime, and that two peaks of the line graph manifesting themselves as two clusters of points in the scatter plot. Total size of interpolants of BV with boxing relative to BV with naive encoding

Related work
The problem of reasoning about machine arithmetic and wrapping arises not only in model checking, but abstract interpretation too, where solvers are augmented with support for relaxing abstractions by join rather than interpolation. Despite the long-standing work [3,7,27] in deciding BV theories, there has been scant work on BV interpolation. Although not focussing on BV interpolation, an early work on deriving work-level interpolants [23] uses bit-vectors to interpolate equality logic. This logic supports equations of the form x = y and x = c where x and y are variables and c is drawn from a finite set of symbols C. Bit-vectors with width log 2 (|C|) are used to bit-blast equations [29] so that formulae are encoded entirely propositionally. Then a propositional resolution proof of the inconsistence of two formulae is lifted to the work-level.
Seminal work by Griggio [16] advocated encoding BV formulae in theories of increasing complexity. The pair of BV formulae are encoded in a theory whose interpolation engine is used to find an interpolant in that theory. The interpolant is then reinterpreted as a BV formula and tested to see if it is still an interpolant the pair of BV formulae. The approach resorts to bit-blasting if no simpler theory can find an interpolant, at the cost of losing world-level information. By way of contrast, Backeman et al. [2] propose a calculus over a core language, which supports interpolation and is rich enough to describe BV formulae, even making use of Groebner bases to express polynomial equality relationships. Since interpolation is performed within their core language, they do not aim to derive a BV interpolant, and therefore their work is orthogonal to ours. Yet if Backeman's procedure returns an interpolant in their core language and it could be interpreted as an LIA formula, which would seem likely for many cases, then our work could convert the LIA formula back to BV.
Further afield, polynomial algorithms for interpolation have developed for systems of linear congruence equations [19, section 4], conjunctions of linear Diophantine equations and disequations [19, section 6], and systems of mixed integer linear equations [19, section 7]. This comprehensive study stops short of using LIA to interpolate BV formula, mentioning the problem as future work.
Abstract domains have been proposed for tracking linear modulo relationships where the module is a power of 2 [13,22,28]. These domains, which are essentially specialist solvers, express more than linear equalities [21], while enabling the domain operations to be realised using machine arithmetic. Surprisingly, systems of linear inequalities can be reinterpreted to model machine arithmetic by just changing the concretisation function [32] and the handling of guards [32].

Concluding Discussion
To repurpose efficient LIA interpolation engines to BV, we have shown how to systematically construct a BV formula so its solutions are exactly those of an LIA interpolant. Since an LIA interpolant summarises the reason for a conflict between two LIA formula, we seek to retain its compact structure by introducing no more than simple boxes around the LIA solutions which block extraneous BV solutions. When this encoding tactic, called boxing, is not applicable, gapping is used to decompose an LIA inequality into two or more inequalities which are amenable to boxing. We show how the size of the resulting BV interpolants are smaller than BV interpolants constructed by merely partitioning the LIA solutions into columns, and demonstrate how boxing and gapping improves the runtime of an interpolation-based model-checker. We instantiate a model-checker with LIA and BV to compare their performance, and conclude that with this encoding BV interpolation is feasible. Because of wrap-around, BV is substantially more complicated than LIA for interpolation, yet BV is no more than twice as slow as LIA for over half the benchmarks. Furthermore, the resulting BV interpolants can be validated, independent of LIA, just using a BV solver.