
1 Introduction

Bit-precise reasoning as provided by Satisfiability Modulo Theories (SMT) for the theory of fixed-size bit-vectors is a key requirement for many applications in computer-aided verification. The dominant, state-of-the-art approach for solving bit-vector formulas is a technique called bit-blasting [24], an eager reduction of bit-vector constraints to a propositional satisfiability problem (SAT). Bit-blasting is usually combined with aggressive simplifications of the input constraints prior to the actual reduction step. Even though this eager reduction may come at the cost of significantly increasing the formula size, it is surprisingly efficient in practice—mainly due to the fact that state-of-the-art SAT solvers are usually able to efficiently deal with complex formulas over millions of variables. This size increase, however, is a potential bottleneck and the main reason why bit-blasting does not generally scale well for large bit-widths. This is especially true in the presence of arithmetic operators, which translate to large and complex Boolean circuits on the bit-level. In practice, this scaling issue can already occur with bit-widths as low as 32 bits, and it is especially severe for applications that reason over considerably larger bit-widths due to the nature of their domain, e.g., 256 bits in the context of smart contract verification [15].

In this paper, we propose a novel abstraction-refinement framework for the theory of fixed-size bit-vectors that significantly improves the scalability of bit-blasting on increasing bit-widths. Rather than providing an alternative to bit-blasting, our approach is explicitly aimed at improving its performance via an abstraction-refinement scheme based on the counterexample-guided abstraction refinement (CEGAR) paradigm [16]. Constructs and operators that are potentially expensive when translated to the bit-level are abstracted with fresh uninterpreted functions (UF), which corresponds to over-approximating the original problem and translates to significantly smaller circuits on the bit-level. When an abstraction is unsatisfiable, so is the original problem. However, when it is satisfiable and inconsistent with the true semantics of the abstracted operators, it must be refined with lemmas to rule out spurious counterexamples. We iteratively repeat the abstraction-refinement process until all abstractions are consistent, and only fall back to bit-blasting an abstracted term as a last resort, when it cannot be further refined. Thus, the main challenge is finding refinement lemmas that, ideally, make it possible to avoid bit-blasting abstracted terms entirely. To this end, this paper makes the following contributions:

  • We present a modular and configurable CEGAR-style abstraction-refinement framework for the theory of fixed-size bit-vectors, based on bit-blasting.

  • We provide a set of refinement lemmas for a restricted but sufficient set of arithmetic bit-vector operators (bvmul, bvudiv, bvurem). This set of lemmas consists of a set of basic, hand-crafted lemmas (encoding core properties of abstracted operators) and a set of lemmas synthesized via abduction.

  • We provide a lemma scoring scheme and an abduction-based framework for synthesizing lemmas, utilizing the syntax-restricted abduction reasoning capabilities of the SMT solver cvc5 [7].

  • We extend the open-source SMT solver Bitwuzla [29] with our approach and show that it significantly improves performance on a wide range of benchmarks, including industrial benchmarks from smart contract verification.

Related Work. Developing scalable approaches for solving bit-vector formulas with large bit-widths is a long-standing challenge. Previous efforts to tackle this challenge can be mainly divided into two categories: alternative approaches to bit-blasting that primarily rely on word-level reasoning, and techniques based on bit-blasting that try to reduce the size of the original problem on the bit-level.

Alternative approaches to bit-blasting include: translations to linear integer arithmetic [11] and non-linear integer arithmetic (in combination with CEGAR-style handling of bit-wise operators) [36]; layered CDCL(\(T\))-style approaches that rely on encoding fragments of the input problem into other theories before resorting to bit-blasting [13, 21]; instances of the model-constructing satisfiability (mcSAT) calculus [20, 35], a generalization of propositional conflict-driven clause learning (CDCL) to SMT; and incomplete techniques such as local search [19, 28, 30], which are only able to determine satisfiability. All of these approaches are generally not competitive with bit-blasting.

Techniques based on bit-blasting that aim at mitigating the impact of increasing bit-widths on the bit-level are mainly based on some form of under-approximation. Bryant et al. [14] proposed a combination of under-approximation via restricting the value range of input variables with over-approximation of the unsat core of the under-approximated problem. This over-approximation consists of two strategies: eliminating if-then-else (ite) operations, and abstracting bit-vector multiplication \(x \cdot y\) with a partially interpreted function of the form \(\lambda x. \lambda y.\, ite(x \approx 0 \vee y \approx 0, 0, ite(x \approx 1, y, ite(y \approx 1, x, f(x,y))))\) where \(f(x,y)\) is a fresh uninterpreted function. An early version of Boolector [12] implemented a refined version of the above under-approximation strategy in [14]. More recently, in the context of quantified bit-vector reasoning, Jonás et al. proposed an abstraction-based approach that reduces the size of the input problem via interpreting bits as don’t care bits [22], and an under-approximation-based framework based on bit-width reduction [23] similar to [14].

2 Preliminaries

We assume and briefly review the usual notions and terminology of many-sorted first-order logic with equality (see, e.g., [18, 25]). Let \(S\) be a set of sort symbols, and let \(\varSigma \) be a signature containing a set \(\varSigma ^s \!\subseteq S \) of sort symbols and a set \(\varSigma ^f\) of function symbols \(f^{\sigma _1 \cdots \sigma _n \sigma }\) with arity \(n \ge 0\) and \(\sigma _1, ..., \sigma _n, \sigma \in \varSigma ^s \). We usually omit the superscript from function symbols and refer to 0-arity function symbols as constants. We assume that \(\varSigma \) includes a designated sort \(\textsf{Bool}\), values \(\top \) (true) and \(\bot \) (false) of sort \(\textsf{Bool}\), Boolean connectives \(\{\wedge , \lnot \}\) defined as usual, equality and disequality symbols \(\{\approx , \not \approx \}\) of sort \(\sigma \times \sigma \rightarrow \textsf{Bool} \) for every \(\sigma \in \varSigma ^s \), and an if-then-else operator ite of sort \(\textsf{Bool} \times \sigma \times \sigma \rightarrow \sigma \) for every \(\sigma \in \varSigma ^s \).

Let \({\mathcal {I}}\) be a \(\varSigma \) -interpretation that maps each \(\sigma \in \varSigma ^s \) to a non-empty set \(\sigma ^{\mathcal {I}} \) (the domain of \({\mathcal {I}}\)), with \(\textsf{Bool} ^{\mathcal {I}} = \{ \top , \bot \}\); and each \(f^{\sigma _1 \cdots \sigma _n \sigma } \in \varSigma ^f \) to a total function \(f^{\mathcal {I}} \!\!: \sigma _1^{\mathcal {I}} \times ... \times \sigma _n^{\mathcal {I}} \rightarrow \sigma ^{\mathcal {I}} \) if \(n > 0\), and to an element in \(\sigma ^{\mathcal {I}} \) if \({n = 0}\). The interpretation of Boolean connectives, Boolean values, equality symbols and ite symbols is fixed and standard. We use the usual inductive definition of the satisfiability relation \(\models \) between \(\varSigma \)-interpretations and \(\varSigma \)-formulas. We write \({\varphi } [x_1,\ldots ,x_n]\) to denote a \(\varSigma \)-formula \({\varphi }\) defined over (a subset of) symbols \(\{x_1,\ldots ,x_n\}\). We further use \({\varphi } [x_1\!\mapsto \!a_1,\ldots ,x_n\!\mapsto \!a_n]\) for the formula obtained from \({\varphi }\) by simultaneously replacing each occurrence of \(x_i\) with \(a_i\).

A theory is a pair \((\varSigma ,{I})\) where \(\varSigma \) is some signature, and \({I} \) is a class of \(\varSigma \)-interpretations. A \(\varSigma \)-formula is \(T\) -satisfiable (resp. \(T\) -unsatisfiable) if it is satisfied by some (resp. no) interpretation in \({I}\); it is \(T\) -valid if it is satisfied by all interpretations in \({I}\). We assume the usual definition of well-sorted terms, literals, and formulas, and call \(\varSigma \)-formulas \(T \)-formulas and \(\varSigma \)-literals \(T \)-literals.

We focus on the theory of fixed-size bit-vectors \(T _{ BV } \) as defined by the SMT-LIB 2 standard [8]. The theory of fixed-size bit-vectors \(T _{ BV }\) is defined as the pair \((\varSigma _{BV}, {I} _{BV})\). Signature \(\varSigma _{BV}\) includes a unique sort \(\sigma _{[w]} \) for each bit-width w, function symbols overloaded for every \(\sigma _{[w]} \), and all bit-vector values of sort \(\sigma _{[w]} \) for each w. The non-empty class of \(\varSigma _{BV}\)-interpretations \({I} _{BV}\) (the models of \(T _{ BV }\)) interpret sort and function symbols as specified in SMT-LIB 2.

Without loss of generality, we consider \(\varSigma _{BV}\) to contain a restricted, arbitrary set of bit-vector operators as listed in Table 1. This set is complete in the sense that it suffices to express all bit-vector operators defined in SMT-LIB 2. We further use logical connectives \(\{\vee , \Rightarrow , \Leftrightarrow \}\) and bit-vector operator \(-\) for subtraction and negation as shorthand when convenient. In the context of this paper it is important to note that both bit-vector subtraction and negation are expressed in terms of bit-vector addition.

We denote a \(\varSigma _{BV}\)-term (or bit-vector term) x of width w as \(x_{[w]}\) when we want to specify its bit-width explicitly, and will omit w from the notation when it is clear from the context. The width of a bit-vector term is given by function \(\kappa \), e.g., \(\kappa (x_{[w]}) = w\). We refer to the bit at index i of \(x_{[w]} \) as x[i] and represent a bit-vector value \(v_{[w]}\) as a bit-string of 0s and 1s, with the most significant bit (MSB) as the left-most bit \(v[ msb ]\) at index \( msb = w-1\), and the least significant bit (LSB) as the right-most bit \(v[ lsb ]\) at index \( lsb = 0\). To simplify the notation, we will sometimes represent a value \(v_{[w]}\) as a natural number in \(\{0, \ldots , 2^{w}-1\}\).

Table 1. Set of considered bit-vector operators.

3 Abstraction-Refinement Framework

Our abstraction-refinement framework is integrated into an SMT solver as a CEGAR procedure that combines an abstraction module with the theory solver that is responsible for reasoning about \(T _{ BV }\)-formulas (the bit-vector solver). Since our main goal is to improve the scalability of bit-blasting, we assume that the bit-vector solver implements bit-blasting as its main strategy. For simplicity, we further assume that bit-blasting is its only strategy. However, this is not a requirement. Our abstraction-refinement technique can be combined with any complete technique for determining the satisfiability of \(T _{ BV }\)-formulas that produces models for satisfiable formulas.

Algorithm 1 shows the main abstraction-refinement procedure of our approach. Given a set of bit-vector constraints \(\mathcal {A}\), the abstraction module (AM) first generates an abstraction \(\mathcal {A'}\) of \(\mathcal {A}\) (AM::abstract) by replacing abstracted terms with fresh constants. This abstraction is then iteratively refined with lemmas \(\mathcal {L}\), starting from an empty set. First, the bit-vector solver is queried for a satisfiability result of the current abstraction \(\mathcal {A'}\) and a model \(\mathcal {M}\) of \(\mathcal {A'}\) if it is satisfiable (\(T _{ BV }\)::solve). If \(\mathcal {A'}\) is unsatisfiable, the procedure concludes with unsat. If \(\mathcal {A'}\) is satisfiable, the abstraction module checks the consistency of \(\mathcal {M}\) for all abstracted terms with respect to their true semantics (AM::check) as follows. Starting from an empty set of refinement lemmas \(\mathcal {L}\), for each abstracted term, function AM::check determines if the model value of its abstraction is consistent. If it is inconsistent, we add a refinement lemma to \(\mathcal {L}\) that rules out the inconsistency. When the model values of all abstracted terms have been checked for consistency, AM::check returns the set of refinement lemmas \(\mathcal {L}\), which extends abstraction \(\mathcal {A'}\) in the next iteration. If model \(\mathcal {M}\) is consistent for all abstracted terms (i.e., \(\mathcal {L} = \emptyset \)), the procedure concludes with sat.
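The loop described above can be illustrated with a minimal Python sketch. It is not the Bitwuzla implementation: `solve` is a brute-force stand-in for \(T _{ BV }\)::solve over 4-bit values, `check` is a simplified stand-in for AM::check that only produces value instantiation lemmas, and the single abstracted term is assumed to be \(x \cdot s\) with abstraction constant t. All function and variable names are hypothetical.

```python
from itertools import product

W = 4                      # toy bit-width, small enough for brute force
MASK = (1 << W) - 1

def solve(constraints):
    """Stand-in for T_BV::solve: first model over (x, s, t), or None."""
    for x, s, t in product(range(1 << W), repeat=3):
        model = {"x": x, "s": s, "t": t}
        if all(c(model) for c in constraints):
            return model
    return None

def check(model):
    """Stand-in for AM::check: lemmas ruling out an inconsistent value of t."""
    x, s, t = model["x"], model["s"], model["t"]
    expected = (x * s) & MASK
    if t == expected:
        return []          # t agrees with the true semantics of x * s
    # (x = vx and s = vs) => t = vx * vs, a value instantiation lemma
    return [lambda m, x=x, s=s, e=expected:
            not (m["x"] == x and m["s"] == s) or m["t"] == e]

def cegar(abstraction):
    """Refine the abstraction until it is unsat or its model is consistent."""
    constraints = list(abstraction)
    rounds = 0
    while True:
        model = solve(constraints)
        if model is None:
            return "unsat", None, rounds
        lemmas = check(model)
        if not lemmas:
            return "sat", model, rounds
        constraints.extend(lemmas)
        rounds += 1

# Abstraction of: x * s = 6 and x > 1 and s > 1 (x * s replaced by t)
status, model, rounds = cegar([lambda m: m["t"] == 6,
                               lambda m: m["x"] > 1,
                               lambda m: m["s"] > 1])
```

In this toy run, the first model assigns x = 2, s = 2, t = 6, which is spurious; one lemma later the loop returns a consistent model. An abstraction such as "t = 1 and x is even" is refuted only after all spurious pairs have been ruled out one by one, which hints at why stronger lemmas than value instantiation are desirable.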

Algorithm 1. Abstraction-refinement loop around the \(T _{ BV }\)-solver.

Note that conceptually, our term abstractions are uninterpreted functions that map bit-vector arguments to a term of bit-vector sort, e.g., \( mul _{32}(x, s)\) of sort \(\sigma _{[32]} \times \sigma _{[32]} \rightarrow \sigma _{[32]} \) as abstraction of a bit-vector multiplication \(x_{[32]} \cdot s_{[32]} \). When combining bit-vector theory reasoning with UF theory reasoning, from the point of view of the bit-vector solver, these UF are seen as fresh bit-vector constants. However, by construction, our procedure ensures that term abstractions are refined until consistency. Thus, when the UF theory solver is invoked after the bit-vector theory solver, additional UF theory reasoning is not required. Hence, introducing uninterpreted functions is redundant: it is sufficient to introduce a fresh constant of the same bit-vector sort as the abstracted term, e.g., \(mul_{[32]} ^{x,s}\) for \(x_{[32]} \cdot s_{[32]} \). This allows the integration of our approach into any SMT solver that supports bit-vector reasoning, even when UF reasoning is not supported. Preliminary experiments showed that in the context of integrating our techniques in the SMT solver Bitwuzla, using UF as abstractions and scheduling the UF theory solver prior to our abstraction-refinement loop introduced redundant overhead and negatively impacted performance. Our approach, however, allows one to freely choose between introducing UF and fresh bit-vector constants, depending on what is more beneficial for a specific solver architecture.

One of the main tasks of the abstraction module is consistency checking of satisfying assignments of the current abstraction, and refining the abstraction in case of inconsistency. This refinement is driven by a pre-defined refinement scheme for each abstracted operator. A refinement scheme is a four-tiered set of lemmas that is checked tier-wise, in ascending order, during consistency checking. We describe the refinement scheme for each operator and their tiers in more detail in Sect. 4.

4 Refinement Schemes

We define four-tiered refinement schemes for bit-vector operators \({\diamond \in \{\cdot , \div , \bmod \}}\), with tiers 1–2 as the main and predefined sets of refinement lemmas that describe properties of the abstracted operators in the usual bit-vector semantics (notably, with respect to overflow semantics). The first tier consists of hand-crafted lemmas that mostly encode basic properties (described in more detail in Sect. 4.1), while the second tier consists entirely of lemmas that were synthesized via our abduction-based lemma synthesis framework (see Sect. 4.3).

The third tier is not pre-defined but encodes so-called value instantiation lemmas to rule out the current inconsistent model value as a limited fallback strategy before we have to, as the fourth and final tier, resort to bit-blasting. For example, for \(x_{[32]} \cdot s_{[32]} \) with \(\mathcal {M} = \{x = 3, s = 6, mul_{[32]} ^{x,s} = 1\}\), we add \({(x \approx 3\,\wedge \,s \approx 6)}\,\Rightarrow \,mul_{[32]} ^{x,s} \approx 18 \) as value instantiation lemma. Value instantiation lemmas are only added if none of the lemmas in previous tiers were violated. We further limit the number of value instantiation lemmas that are added for an abstracted term since they each only rule out a single spurious model value of the term abstraction (see Sect. 5). Lemmas in tiers 1–2 do not necessarily fully capture all properties of an abstracted operator, and thus, inconsistent assignments may remain uncovered. When this is the case and the number of value instantiation lemmas to add is exhausted, we add a so-called bit-blasting lemma, e.g., \(mul_{[32]} ^{x,s} \approx x \cdot s \), which enforces bit-blasting of the abstracted term.
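The tier order can be sketched as follows for a 32-bit multiplication. This is an illustrative simplification: the two predefined lemmas are reconstructed from the prose in Sect. 4.1 (parity and trailing zeros) and stand in for all of tiers 1–2, and the names `refine`, `PREDEFINED` and `budget` are hypothetical.

```python
W = 32
MASK = (1 << W) - 1

def parity_ok(x, s, t):
    # t is odd exactly when both x and s are odd
    return (t & 1) == (x & s & 1)

def trailing_zeros_ok(x, s, t):
    # (-s | s) & t = t, and the commutative case for x
    return ((-s | s) & t & MASK) == t and ((-x | x) & t & MASK) == t

# Stand-in for the predefined lemmas of tiers 1-2 (assumed subset).
PREDEFINED = [("parity", parity_ok), ("trailing-zeros", trailing_zeros_ok)]

def refine(x, s, t, budget):
    """Pick the refinement for model values (x, s, t) of the abstraction of x * s.

    budget is the remaining number of value instantiation lemmas for this term.
    Returns None if the model value t is consistent.
    """
    expected = (x * s) & MASK
    if t == expected:
        return None
    violated = [name for name, ok in PREDEFINED if not ok(x, s, t)]
    if violated:                       # tiers 1-2
        return ("predefined", violated)
    if budget > 0:                     # tier 3: (x = vx and s = vs) => t = vx * vs
        return ("value-instantiation", (x, s, expected))
    return ("bit-blast", None)         # tier 4: t = x * s
```

Under these assumed lemmas, the model value 1 from the example above already violates both the parity and the trailing-zeros lemma, so a value instantiation lemma would only be produced for a model value such as t = 2 that passes both.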

Note that of the considered arithmetic operators, addition is the only one we do not abstract. Even though addition is more expensive when bit-blasting compared to bit-wise operators, it is considerably cheaper than the operators we abstract. Preliminary experiments showed that the trade-off between abstracting the addition operator (which also occurs in our lemmas) versus bit-blasting addition terms suggests that it is more beneficial to not abstract addition.

Table 2 lists all lemmas of tiers 1–2 for all three operators, with hand-crafted lemmas marked with an asterisk. We use x for the left-hand operand, s for the right-hand operand, and t for the constant introduced to abstract \(x \diamond s\). We further indicate with a subscript on the lemma ID if there is a restriction on the bit-widths for which the lemma is correct (see Sect. 4.4). Note that while our abstraction approach does not generally restrict the bit-width of operators to abstract, lemmas that are incorrect for certain bit-widths must be removed from the lemma sets when terms of that size are abstracted. In practice, we only abstract terms of bit-width 32 and above (see Sect. 5) and thus these restrictions are not applicable. Further, note that in practice we consider both commutative cases (when applicable) while Table 2 only gives one. In the following, we describe our set of hand-crafted lemmas, our lemma scoring scheme and how we derive lemmas via abduction reasoning in more detail.

Table 2. Lemmas for terms \(x_{[w]} \diamond s_{[w]}\) with \(\diamond \in \{\cdot ,\div ,\bmod \} \). We use t for the constant introduced to abstract \(x \diamond s\), hand-crafted lemmas are marked with \(*\), and \(i \in [0, w-1]\). Lemma ID subscripts indicate bit-width restrictions for correctness.

4.1 Hand-Crafted Lemmas

For each refinement scheme, our set of hand-crafted lemmas mostly contains lemmas that cover basic properties of the abstracted operators (e.g., when one of its operands is a special value). We also include lemmas that describe more elaborate properties based on invertibility conditions [31], i.e., conditions that exactly describe when operand x of operator \(\diamond \) has a solution in literal \(x \diamond s \approx y\). More formally, an invertibility condition IC for a literal \({\varphi } [x, s, y]\) is a formula defined over s and y such that \(\exists x.\,{\varphi } \Leftrightarrow IC\). In the following, we summarize the properties encoded by each hand-crafted lemma.

Multiplication. Lemmas 1–2 capture the fact that multiplication by a power of 2 (and its arithmetic negation) can be described as a left shift operation. Lemma 3 states that the result of the multiplication must have at least as many trailing zeros in its binary representation as one of its arguments and is derived from the invertibility condition \( {(- s \mid s) \mathrel { \& } y \approx y}\) for \(x \cdot s \approx y\). The left-to-right direction of \(\exists x.\,{\varphi } \Leftrightarrow IC\) gives us (after Skolemization) the implication \( {x \cdot s \approx y \Rightarrow (- s \mid s) \mathrel { \& } y \approx y}\), of which lemma 3 is the right-hand side. Lemma 4 is a parity lemma that states that the result of a multiplication \(x \cdot s\) must be odd if both x and s are odd, and even otherwise. Note that properties related to multiplication by special values 1, \(-1\) and 0 are subsumed by lemmas 1, 2 and 3, respectively. Further note that [31] also provides invertibility conditions for literals defined over disequality and inequalities. We only consider invertibility conditions for literals \(x \diamond s \approx y\) as this allows us to instantiate y in the corresponding lemma with term abstraction t. For literals over predicates other than equality, e.g., \(x \diamond s <_u y\), a good strategy for instantiating y in the resulting lemma is not obvious and left to future work.

Division. Lemma 1 states that unsigned division by a power of 2 can be described as a logical right shift operation. Lemmas 2–3 cover special cases: division by itself and division by 0 (the latter is a defined case in SMT-LIB). Lemma 4 states that zero divided by a non-zero value is zero. Lemma 5 captures a natural property of division by a non-zero value: its result is never greater than its left-hand argument. Lemma 6 describes the property that division by \({\sim }\, \!0\) (the maximum unsigned value) yields zero if the dividend is less than \({\sim }\, \!0\). Note that for division, we do not utilize the corresponding invertibility conditions from [31] since they introduce new division terms that may not yet appear in the input constraints, which may lead to non-termination of the abstraction procedure.

Remainder. Lemma 1 exploits the fact that unsigned division by a power of 2 can be described as a logical right shift operation: the resulting remainder corresponds to the value of the bits that are shifted out. Lemma 2 states that a division by a non-zero divisor yields a remainder that cannot be greater than the divisor. Lemmas 3–5 cover special cases: when one of the operands is zero, and division by itself. Lemma 6 captures the fact that a division with a dividend that is less than the divisor yields the dividend as the remainder. Lemma 7 is derived from invertibility condition \({\sim }\, \!- s \ge _u y\) for \(x \bmod s \approx y\) from [31] in a similar manner as the lemma derived from the invertibility condition for multiplication.
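The properties described for division and remainder can be checked exhaustively at a small bit-width. The sketch below assumes the SMT-LIB semantics for a zero divisor (unsigned division yields the all-ones value, remainder yields the dividend) and encodes the properties as stated in the prose; the exact lemma formulas of Table 2 may differ, and the helper names are hypothetical.

```python
W = 4
N = 1 << W
MAX = N - 1                      # the value ~0

def bvudiv(x, s):
    return MAX if s == 0 else x // s   # SMT-LIB: division by zero is all ones

def bvurem(x, s):
    return x if s == 0 else x % s      # SMT-LIB: remainder by zero is the dividend

def check_properties():
    for x in range(N):
        for s in range(N):
            q, r = bvudiv(x, s), bvurem(x, s)
            for i in range(W):
                if s == 1 << i:
                    assert q == x >> i          # division as logical right shift
                    assert r == x & (s - 1)     # remainder: the bits shifted out
            if s == 0:
                assert q == MAX and r == x      # zero divisor
            else:
                assert q <= x                   # quotient bounded by the dividend
                assert r < s                    # remainder bounded by the divisor
                if x == 0:
                    assert q == 0 and r == 0    # zero dividend
            if x == s:
                assert r == 0                   # remainder of division by itself
                if x != 0:
                    assert q == 1               # quotient of division by itself
            if s == MAX and x < MAX:
                assert q == 0                   # division by ~0
            if x < s:
                assert r == x                   # dividend smaller than divisor
            assert (~(-s) & MAX) >= r           # invertibility condition ~(-s) >=_u t
    return True
```

Note that the quotient bound is not strict: dividing by 1 returns the dividend itself.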

Powers of Two Lemmas. The powers of two lemmas for multiplication (lemmas 1–2), division (lemma 1), and remainder (lemma 1) use \(2^i\) to denote a specific power of two. They do not symbolically encode whether a term s represents a power of two since this would require counting the number of trailing zero bits i. Instead, if the current model value of s is a power of two, we instantiate the corresponding lemma with this value. In the worst case, this will add \(\kappa (s) \) instantiations of the lemma if all powers of two for bit-width \(\kappa (s) \) are enumerated. However, this is rarely the case and the lemmas are cheap in terms of bit-blasting.
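A minimal sketch of this model-driven instantiation for multiplication is given below. The lemma shape \(s \approx 2^i \Rightarrow t \approx x \ll i\) is an assumption reconstructed from the prose, and the function names are hypothetical.

```python
def pow2_exponent(model_s, width):
    """Return i if the model value of s equals 2^i for some i < width, else None."""
    if 0 < model_s < (1 << width) and (model_s & (model_s - 1)) == 0:
        return model_s.bit_length() - 1
    return None

def mul_pow2_lemma(i, width):
    """Instance for a fixed i, as a predicate over (x, s, t): s = 2^i => t = x << i."""
    mask = (1 << width) - 1
    return lambda x, s, t: s != (1 << i) or t == ((x << i) & mask)

# The lemma is instantiated only when the current model value of s is a power of two.
i = pow2_exponent(8, 32)
lemma = mul_pow2_lemma(i, 32) if i is not None else None
```

Each instance mentions a single constant \(2^i\), so no symbolic counting of trailing zeros is needed.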

4.2 Lemma Scoring Scheme

Compiling a set of lemmas to describe properties of an abstracted operator \(\diamond \) requires careful consideration of several key aspects: (i) lemmas for \(\diamond \) should not introduce new terms that will be abstracted (introducing new terms with \(\diamond \) may lead to non-termination of the abstraction procedure and introducing terms with abstracted operators other than \(\diamond \) may yield potentially expensive abstractions in case they have to be bit-blasted); (ii) lemmas should minimize introducing new terms with potentially expensive operators that are not abstracted (e.g., bit-vector addition); and (iii) possible candidate lemmas should be filtered based on their quality to avoid adding redundant (subsumed) lemmas and to ensure that included lemmas maximize the number of spurious models to rule out.

The former two impose syntax restrictions (see Sect. 4.3), and for the purpose of addressing (iii), we define a scoring scheme that measures the quality of a candidate lemma for operator \(\diamond \) as follows.

Definition 1

(Lemma Score). Let \(x \diamond s\) be the term to abstract, and let t be the constant abstracting \(x \diamond s\) such that \(x \diamond s \approx t\). Given a lemma \({\ell } [x,s,t]\) defined over \(\{x, s, t\}\) such that \(x \diamond s \approx t \,\Rightarrow \,{\ell } \). We define \(\textsc {Score} ({\ell }, w)\), the score of \({\ell }\) for a given bit-width w, as the number of triplets \((v^x, v^s, v^t)\) of bit-vector values of bit-width w where \({\ell } [x\!\mapsto \!v^x, s\!\mapsto \!v^s, t\!\mapsto \!v^t]\) evaluates to \(\top \).

For a term \(x_{[4]} \diamond s_{[4]} \), the worst possible lemma score is the number of all possible combinations of triplets (\(2^4\times 2^4\times 2^4 = 4096\)), and the best possible score is the number of possible combinations of x and s (\(2^4\times 2^4=256\)). Thus, the difference between the worst and best possible lemma score for any \(x \diamond s\) is the number of incorrect triplets, i.e., triplets for which \(v^x \diamond v^s \mathrel {\not \approx } v^t\). Since lemmas over-approximate literals \(x \diamond s \approx t\), their score is a measure for the degree of over-approximation: a lower score indicates higher quality of a lemma as a higher number of incorrect triplets is ruled out.

For our hand-crafted lemmas for multiplication from Sect. 4.1, for bit-width 4 we compute as scores: {1: 2416, 2: 2791, 3: 1961, 4: 2048}. This indicates that they, individually, rule out 34–55% of incorrect triplets. Further, lemma 3, the lemma derived via the invertibility condition for multiplication over equality, is the strongest lemma of the four. Similarly, our hand-crafted lemmas for division and remainder rule out 6–50% of incorrect triplets for bit-width 4, with lemma 5 the strongest lemma for division, and lemma 7, the lemma derived from an invertibility condition, the strongest for remainder.
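Definition 1 can be evaluated by brute force at bit-width 4. The sketch below uses formulas for lemmas 1, 3 and 4 that are reconstructed from the prose of Sect. 4.1, each with both commutative cases; these formulas are assumptions, since Table 2 is not reproduced here. Lemma 2 is omitted because its exact form cannot be recovered from the prose alone.

```python
from itertools import product

W = 4
N = 1 << W
MASK = N - 1

def pow2(x, s, t):
    # assumed lemma 1: s = 2^i => t = x << i, plus the commutative case
    for i in range(W):
        if s == 1 << i and t != ((x << i) & MASK):
            return False
        if x == 1 << i and t != ((s << i) & MASK):
            return False
    return True

def trailing_zeros(x, s, t):
    # assumed lemma 3: (-s | s) & t = t, plus the commutative case
    return ((-s | s) & t & MASK) == t and ((-x | x) & t & MASK) == t

def parity(x, s, t):
    # assumed lemma 4: t is odd exactly when both x and s are odd
    return (t & 1) == (x & s & 1)

def score(lemma):
    """Number of triplets (x, s, t) for which the lemma evaluates to true."""
    return sum(1 for x, s, t in product(range(N), repeat=3) if lemma(x, s, t))

def ruled_out(lemma_score):
    """Fraction of the incorrect triplets that a lemma with this score excludes."""
    worst, best = N ** 3, N ** 2
    return (worst - lemma_score) / (worst - best)
```

Under these assumptions, the counts agree with the scores reported above for lemmas 1, 3 and 4 (2416, 1961 and 2048), and `ruled_out` relates each score to the 3840 incorrect triplets.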

Individual lemma scores are a valuable measure of quality for a single lemma. However, triplet coverage for individual lemmas may intersect. Thus, when considered as a set, in a refinement scheme, it is necessary to define a measure for the quality of sets of lemmas to determine if extending the set with additional lemmas improves the number of incorrect triplets that are ruled out.

Definition 2

(Score of Lemma Set). Let \(\mathcal {L}\) be a set of lemmas such that for each \({\ell } [x,s,t]\in \mathcal {L} \), \(x \diamond s \approx t \,\Rightarrow \,{\ell } \). We define \(\textsc {Score} (\mathcal {L},w)\), the score of \(\mathcal {L}\) for a given bit-width w, as the number of triplets \((v^x, v^s, v^t)\) of bit-vector values of bit-width w where \(\bigwedge \limits _{{\ell } \in \mathcal {L}}{\ell } [x\!\mapsto \!v^x, s\!\mapsto \!v^s, t\!\mapsto \!v^t]\) evaluates to \(\top \).

For example, for \(x_{[4]} \cdot s_{[4]} \), the score of the set of hand-crafted lemmas is 704, which indicates that it already rules out 88% of the incorrect triplets. Similarly, for division and remainder, for bit-width 4 the sets of hand-crafted lemmas rule out 71% and 91% of incorrect triplets. Note that extending a set of lemmas \(\mathcal {L}\) with a lemma \({{\ell } \not \in \mathcal {L}}\) can improve but not worsen its score. If \({\ell }\) is subsumed by \(\mathcal {L}\), \(\textsc {Score} (\mathcal {L}, w)\) remains unchanged. While our sets of hand-crafted lemmas from Sect. 4.1 already rule out a large number of incorrect triplets, their score also indicates that a considerable number of incorrect triplets is still not covered. We thus, in the following, propose an automated framework for synthesizing lemmas with respect to our sets of hand-crafted lemmas via abductive reasoning.
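The effect of extending a lemma set can be observed with a brute-force version of Definition 2. The sketch assumes reconstructed formulas for the parity and trailing-zeros lemmas of multiplication and adds a deliberately subsumed lemma (`even_if_s_even`, a hypothetical name) for comparison; it does not reproduce the full hand-crafted set, so it is not expected to yield the score 704.

```python
from itertools import product

W = 4
N = 1 << W
MASK = N - 1

def parity(x, s, t):
    return (t & 1) == (x & s & 1)

def trailing_zeros(x, s, t):
    return ((-s | s) & t & MASK) == t and ((-x | x) & t & MASK) == t

def even_if_s_even(x, s, t):
    # weaker consequence of the parity lemma, hence subsumed by it
    return (s & 1) == 1 or (t & 1) == 0

def score_set(lemmas):
    """Number of triplets (x, s, t) that satisfy every lemma in the set."""
    return sum(1 for v in product(range(N), repeat=3)
               if all(lemma(*v) for lemma in lemmas))

base = score_set([parity])
with_subsumed = score_set([parity, even_if_s_even])
with_trailing = score_set([parity, trailing_zeros])
```

Adding the subsumed lemma leaves the score unchanged, while adding the trailing-zeros lemma lowers it, matching the observation that an extension can improve but not worsen the score.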

4.3 Synthesizing Lemmas via Abduction

The lemmas from Sect. 4.1 describe basic properties of the abstracted operators and are hand-crafted but strong, as indicated by their score. However, a considerable number of incorrect triplets is still uncovered for each set. Further, manually crafting lemmas that are effective with respect to an already existing set is challenging for arithmetic bit-vector operators, mainly due to overflow semantics. In this section, we propose an automated way to synthesize lemmas with respect to our sets of hand-crafted lemmas via syntax-restricted abductive reasoning [34] and focus on synthesizing lemmas for bit-vector operators \(\{\cdot ,\div ,\bmod \}\). Our approach, however, can easily be generalized to other operators and theories.

Since we are over-approximating literals \(x \diamond s \approx t\), we are trying to find lemmas \({\ell } [x,s,t]\) such that \({(x \diamond s) \approx t \,\Rightarrow \,{\ell } }\). Further, as mentioned in Sect. 4.2, we require that \({\ell }\) does not contain specific operators (the set of abstracted operators, including \(\diamond \) itself) and that the number of occurrences of more expensive operators (such as bit-vector addition) is limited. The best possible over-approximation of operator \(\diamond \) would exactly describe the semantics of \(\diamond \) without including \(\diamond \), which seems unattainable under the given constraints. The worst possible over-approximation, on the other hand, is the formula \(\top \). We are thus looking for simple but non-trivial lemmas that improve the scores of our initial, hand-crafted lemma sets. We formulate this problem as an instance of the general abduction problem, which is defined as follows.

Definition 3

( \(T _{ BV }\) -Abduct). Given two quantifier-free \(T _{ BV }\)-formulas A and B, a \(T _{ BV }\)-abduct is a quantifier-free formula C such that \(A\,\wedge \,C \,\Rightarrow \,B\) is \(T _{ BV }\)-valid, and \(A\,\wedge \,C\) is \(T _{ BV }\)-satisfiable.

Definition 4

(Non-trivial Lemma). Given a \(T _{ BV }\)-literal \({\varphi }\) as \({x \diamond s \approx t}\), a \({\varphi }\) -lemma \({\ell } [x,s,t]\) is a quantifier-free \(T _{ BV }\)-formula defined over \(\{x, s, t\}\) such that \({\varphi } \,\Rightarrow \,{\ell } \) is \(T _{ BV }\)-valid. Lemma \({\ell }\) is non-trivial if it is not \(T _{ BV }\)-valid.

Finding a non-trivial lemma \({\ell }\) for a given literal \({\varphi }\) amounts to finding an abduct \(\lnot {\ell } \) of the formulas \(\top \) and \(\lnot {\varphi } \).

Lemma 1

Let \({\varphi }\) be a \(T _{ BV }\)-literal as above. \(T _{ BV }\)-formula \({\ell } \) is a non-trivial \({\varphi }\)-lemma if and only if \(\lnot {\ell } \) is a \(T _{ BV }\)-abduct of the formulas \(\top \) and \(\lnot {\varphi } \).

Proof

Suppose \(\lnot {\ell } \) is a \(T _{ BV }\)-abduct of \(\top \) and \(\lnot {\varphi } \). In particular, \(\top \,\wedge \,\lnot {\ell } \,\Rightarrow \,\lnot {\varphi } \), and therefore \({\varphi } \,\Rightarrow \,{\ell } \), and thus \({\ell } \) is a \({\varphi }\)-lemma. And since, by Definition 3, \(\top \,\wedge \,\lnot {\ell } \) is \(T _{ BV }\)-satisfiable, we get that \({\ell } \) is not \(T _{ BV }\)-valid. For the converse, suppose \({\ell } \) is a non-trivial \({\varphi }\)-lemma. Then, \({\varphi } \,\Rightarrow \,{\ell } \) is \(T _{ BV }\)-valid. In particular, \(\top \,\wedge \,\lnot {\ell } \,\Rightarrow \,\lnot {\varphi } \) is \(T _{ BV }\)-valid. Further, since \({\ell } \) is not \(T _{ BV }\)-valid, \(\top \,\wedge \,\lnot {\ell } \) is \(T _{ BV }\)-satisfiable.    \(\square \)
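The equivalence can be sanity-checked by exhaustive enumeration at a small bit-width. The sketch below fixes the width to 3 and \({\varphi }\) to \(x \cdot s \approx t\); validity and satisfiability are decided by enumerating all triplets, and all names are hypothetical.

```python
from itertools import product

W = 3
MASK = (1 << W) - 1
TRIPLES = list(product(range(1 << W), repeat=3))

def valid(f):
    return all(f(*v) for v in TRIPLES)

def satisfiable(f):
    return any(f(*v) for v in TRIPLES)

def phi(x, s, t):
    return ((x * s) & MASK) == t

def top(x, s, t):
    return True

def is_abduct(a, c, b):
    """Definition 3: (A and C) => B is valid, and (A and C) is satisfiable."""
    return (valid(lambda *v: not (a(*v) and c(*v)) or b(*v))
            and satisfiable(lambda *v: a(*v) and c(*v)))

def is_nontrivial_lemma(lemma):
    """Definition 4: phi => lemma is valid, and lemma itself is not valid."""
    return valid(lambda *v: not phi(*v) or lemma(*v)) and not valid(lemma)

def equivalence_holds(lemma):
    """Both sides of Lemma 1 agree for this candidate."""
    negated = lambda *v: not lemma(*v)
    not_phi = lambda *v: not phi(*v)
    return is_nontrivial_lemma(lemma) == is_abduct(top, negated, not_phi)

parity = lambda x, s, t: (t & 1) == (x & s & 1)   # non-trivial lemma
wrong = lambda x, s, t: t == x                    # not implied by phi
```

The three candidates cover the interesting cases: a non-trivial lemma, the trivial lemma \(\top \) (whose negation is unsatisfiable), and a formula that is not implied by \({\varphi }\).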

Since we require certain syntactic restrictions for \({\varphi }\)-lemmas, we base our lemma synthesis framework on the syntax-restricted abductive reasoning framework of [34] as implemented in the SMT solver cvc5 [7]. This abduction framework is based on Syntax-Guided Synthesis (SyGuS) [6] and thus guided by a user-defined grammar. Note that, alternatively, our lemma synthesis problem could be directly expressed as a SyGuS problem. However, non-triviality of lemmas requires the introduction of quantifiers in the specification of the formula to synthesize, whereas this quantification is implicit in the abduction formulation.

Our goal is to automatically extend a set of \({\varphi }\)-lemmas \(\mathcal {L}\) (may be empty) for a given literal \({\varphi }\) (as defined above) with a set of lemmas \({\mathsf {\Gamma }}\) such that each lemma \({\ell } \in {\mathsf {\Gamma }} \) improves the score of \(\mathcal {L}\). Algorithm 2 shows the main procedure of our abduction-based lemma synthesis approach. Function SynthLem takes as input a literal \({\varphi }\), the bit-width w for which \({\varphi }\) is defined, a set of initial lemmas \({\mathcal {I}}\), a set \(\mathcal {G}\) of grammars that define syntax restrictions for lemma construction, and a limit \({\textsf{n}}\) on the number of lemmas to synthesize for each grammar. The procedure constructs and returns a set of \({\varphi }\)-lemmas \(\mathcal {L}\) such that \({\mathcal {I}} \subseteq \mathcal {L} \) and \({{\mathcal {I}} \subset \mathcal {L} \,\Rightarrow \,\textsc {Score} (\mathcal {L}, w) < \textsc {Score} ({\mathcal {I}},w)}\) as follows. The resulting set of lemmas \(\mathcal {L}\) is initialized with the given set of initial lemmas \({\mathcal {I}}\) (in our case our hand-crafted lemmas). Then, for each grammar \({\mathsf {\gamma }} \in \mathcal {G} \), in lines 5–9, first a set of at most \({\textsf{n}}\) lemmas \({\mathsf {\Gamma }}\) is generated via abductive reasoning (\(\textsc {GetAbduct}\)). From this set, in lines 10–13, \(\mathcal {L}\) is extended only with those lemmas \({\ell }\) that improve the score of \(\mathcal {L}\). Lemmas are synthesized via an incremental abduction engine \(\textsc {GetAbduct}\) (in our case cvc5) by iteratively asking for \({\textsf{n}}\) new \(T _{ BV }\)-abducts of formulas \(\top \) and \(\lnot {\varphi } \), constructed from the operators in grammar \({\mathsf {\gamma }}\). Function \(\textsc {GetAbduct}\) returns \(\bot \) if no more abducts are found (line 7), either because the search terminated or a resource limit was reached.
Note that we used \({\textsf{n}} =100\) and a time limit of 100 s per call to \(\textsc {GetAbduct}\). Both limits were found to be a good middle ground between generating sufficiently many lemmas while not overwhelming the solver with too many abduction queries.

Algorithm 2. Synthesizing lemmas. Function SynthLem assumes the availability of an abduction reasoner \(\textsc {GetAbduct}\). Function Score computes the score of a set of lemmas w.r.t. a given bit-width w as in Definition 2.

In the context of synthesizing lemmas for \(T _{ BV }\) operators, the search for lemmas via abduction is limited to formulas where the bit-width of \(T _{ BV }\)-terms is explicitly given. Consequently, the \(T _{ BV }\)-abducts determined via \(\textsc {GetAbduct}\) (and thus the resulting lemmas) are only guaranteed to be correct for this specific bit-width. Further, abductive reasoning for theory \(T _{ BV }\) as in [34] is based on a \(T _{ BV }\)-solver with the same limitations our abstraction-based approach aims to address: it relies on bit-blasting and thus does not scale well for increasing bit-widths. We thus chose a bit-width of 4 for x, s and t as a reasonable compromise to not overwhelm the abduction engine while avoiding the generation of lemmas that are specific to very small bit-widths. To minimize the risk of including bit-width specific lemmas in the set of synthesized lemmas \(\mathcal {L}\), in function SynthLem, before adding lemma \({\ell }\) to \(\mathcal {L}\), we introduce an additional step where we verify the correctness of \({\ell }\) for bit-widths 4–10. And finally, before incorporating synthesized lemmas in our refinement schemes, we verify each lemma up to a certain, large bit-width (see Sect. 4.4). Note that while the additional verification step during synthesis encountered lemmas that were only valid for bit-width 4, no lemmas that passed this verification step failed verification for larger bit-widths. Further note that bit-vector multiplication is commutative. As an optimization we thus add the corresponding symmetric cases of hand-crafted lemmas to the set of initial lemmas \({\mathcal {I}}\) when applicable.
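The synthesis loop described above, together with the additional verification step, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper: `get_abduct`, `score`, and `is_valid` are hypothetical callables standing in for the abduction engine, the Score function of Definition 2, and the check for bit-widths 4–10. The toy score at the end is an illustrative stand-in and is not Definition 2.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Set

Lemma = Hashable  # placeholder type; real lemmas would be T_BV formulas


def synth_lem(
    initial: Iterable[Lemma],
    grammars: Iterable[str],
    n: int,
    w: int,
    get_abduct: Callable[[str], Optional[Lemma]],
    score: Callable[[Set[Lemma], int], int],
    is_valid: Callable[[Lemma], bool] = lambda lemma: True,
) -> Set[Lemma]:
    """Extend `initial` with abducts that strictly lower the score."""
    lemmas: Set[Lemma] = set(initial)
    for grammar in grammars:
        # Collect at most n candidates; None plays the role of "bottom"
        # (search exhausted or resource limit reached).
        candidates: List[Lemma] = []
        while len(candidates) < n:
            abduct = get_abduct(grammar)
            if abduct is None:
                break
            candidates.append(abduct)
        # Keep only verified candidates that improve (lower) the score.
        for lemma in candidates:
            if not is_valid(lemma):
                continue
            if score(lemmas | {lemma}, w) < score(lemmas, w):
                lemmas.add(lemma)
    return lemmas


# Toy stand-ins (assumptions for illustration only): a "lemma" is the
# set of spurious points it rules out, and the score counts the points
# that no lemma rules out.
UNIVERSE = frozenset(range(10))


def toy_score(lemmas: Set[Lemma], w: int) -> int:
    ruled_out = set().union(*lemmas)
    return len(UNIVERSE - ruled_out)


def make_toy_engine(streams):
    """Return an incremental engine that yields one abduct per call."""
    iterators = {g: iter(s) for g, s in streams.items()}
    return lambda grammar: next(iterators[grammar], None)
```

Passing the engine and the score as parameters keeps the loop independent of a particular solver, which mirrors the text's assumption that an abduction reasoner is simply available.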

Our abduction-based lemma synthesis procedure requires the definition of a set of grammars \(\mathcal {G}\) to describe syntax restrictions for constructing lemmas. Since the search space for SyGuS-based abduction heavily depends on such an input grammar, we opted for diversification via a set of grammars rather than a single, larger grammar. Set \(\mathcal {G}\) consists of the grammars \({\mathsf {\gamma }} _0\) to \({\mathsf {\gamma }} _6\) defined via a common grammar \({\mathsf {\gamma }} _c = \{ x, s, t, \approx , \not \approx , <_u, \le _u, 0, 1 \}\) as follows:

$$ \begin{aligned} {\mathsf {\gamma }} _0 &= {\mathsf {\gamma }} _c \cup \{{\sim }\,, \mathrel { \& },\mid , \oplus \} & {\mathsf {\gamma }} _4 &= {\mathsf {\gamma }} _3 \cup \{ \oplus \} \\ {\mathsf {\gamma }} _1 &= {\mathsf {\gamma }} _c \cup \{-, {\sim }\,, \mathrel { \& }, \mid \} & {\mathsf {\gamma }} _5 &= {\mathsf {\gamma }} _4 \cup \{ + \} \\ {\mathsf {\gamma }} _2 &= {\mathsf {\gamma }} _1 \cup \{ \oplus \} & {\mathsf {\gamma }} _6 &= {\mathsf {\gamma }} _c \cup \{-, +, - _+, \mathop {<\!\!<}, \mathop {>\!\!>} \} \\ {\mathsf {\gamma }} _3 &= {\mathsf {\gamma }} _1 \cup \{ \mathop {<\!\!<}, \mathop {>\!\!>} \} {} & {} \end{aligned}$$

Note that in grammars \({\mathsf {\gamma }} _0\) to \({\mathsf {\gamma }} _6\) above, we use symbol ‘\(-\) ’ for negation and ‘\(- _+\)’ for subtraction to ensure that they are distinguishable. Further note that we include bit-vector addition (and operators such as subtraction and negation that can be rewritten as addition) even though it is an arithmetic operation and thus one of the more expensive operators when bit-blasting. Preliminary experiments showed that including addition, negation and subtraction in some of the grammars is beneficial for finding useful lemmas.

Extending our set of hand-crafted lemmas from Sect. 4.1 with the lemmas synthesized via abduction as given in Table 2 improves the score for multiplication from 704 to 490, which corresponds to ruling out 94% of incorrect triplets for our final set of tier 1 and tier 2 lemmas. Similarly, the score for division improves from 1366 to 394 (96% coverage of incorrect triplets), and the score for remainder improves from 616 to 400 (96% coverage of incorrect triplets).

Finally, it is important to note that we synthesized lemmas via abduction in an offline manner, as opposed to during the solving process. That is, after automatically generating the lemmas, they were incorporated into the solver together with the hand-crafted lemmas. Thus, the set of incorporated tier 1 and tier 2 lemmas is fixed and independent from the input problem.

4.4 Lemma Verification

We verified the correctness of lemmas \({\ell }\) from Table 2 for bit-widths from 1–256 by checking for literal \(x \diamond s \approx t\) if formula \(x \diamond s \approx t \,\wedge \,\lnot {\ell } \) is \(T\)-unsatisfiable. Given that the lemmas based on powers of two are well-known and universally valid properties of the corresponding bit-vector operators, we omit the additional 131,584 benchmarks required to check each instance of these lemmas up to bit-width 256. For the remaining lemmas, we generated 16,896 benchmarks and used the SMT solvers Bitwuzla [29], cvc5 [7], Yices [17], and Z3 [27] for verification. We ran these verification tasks on a cluster of 22 machines with Intel(R) Xeon(R) Gold 6348 CPUs. For each solver and benchmark pair, we used a CPU time limit of 8 h and a memory limit of 8 GB. For a given bit-width, we consider a lemma to be correct if at least one solver determined unsat, and as incorrect if at least one solver determined sat. Overall, all solver-benchmark pairs required 1,112 days of CPU time. We did not encounter any disagreements between solvers and were able to complete all verification tasks, with Yices individually solving 96.49%, Bitwuzla 96.47%, cvc5 96.29%, and Z3 95.05% of all tasks.
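For very small bit-widths, the same validity check can be illustrated by exhaustive enumeration instead of an SMT query: a lemma is valid for width w exactly when no pair (x, s) with t computed by the operator violates it. The following is a minimal sketch under that assumption. The two example lemmas are our own illustrations and are not taken from Table 2, and the helper names are hypothetical.

```python
from itertools import product
from typing import Callable


def bvmul(x: int, s: int, w: int) -> int:
    """Bit-vector multiplication modulo 2^w."""
    return (x * s) & ((1 << w) - 1)


def holds_for_width(
    lemma: Callable[[int, int, int, int], bool],
    op: Callable[[int, int, int], int],
    w: int,
) -> bool:
    """Return True iff `op(x, s) = t and not lemma` has no solution at width w."""
    for x, s in product(range(1 << w), repeat=2):
        t = op(x, s, w)
        if not lemma(x, s, t, w):
            return False  # counterexample found: lemma invalid at width w
    return True


def lsb_lemma(x: int, s: int, t: int, w: int) -> bool:
    """The lowest bit of a product is the AND of the operands' lowest bits."""
    return (t & 1) == ((x & 1) & (s & 1))


def width_specific_lemma(x: int, s: int, t: int, w: int) -> bool:
    """t <=_u x: holds for width 1 only (e.g. fails for x=1, s=2 at width 2)."""
    return t <= x
```

The second lemma shows why checking a single bit-width is not enough: it passes at width 1 and fails from width 2 onwards. Enumeration is only feasible for a handful of bits, which is why the experiments above rely on SMT solvers.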

We were able to verify the correctness of all hand-crafted lemmas for bit-widths 1–256, and of all synthesized lemmas for bit-widths 3–256. Synthesized lemmas are correct by construction for bit-width 4, which is confirmed by this experiment. However, some of the synthesized lemmas do not hold for very small bit-widths, as indicated by the bit-width restrictions given in Table 2. As mentioned above, if terms of such a restricted size are abstracted, these lemmas must not be considered for refinement. However, in the context of integrating our abstraction approach into Bitwuzla, all lemmas are applicable since we only abstract terms of size 32 and above (see Sect. 5).

Verification of the correctness of our lemmas up to bit-width 256 establishes sufficient confidence in their correctness for bit-widths larger than 256. We leave the task of formally proving their correctness for all bit-widths to future work. A recent technique for reasoning over bit-vectors with parametric bit-width based on a reduction to the quantified combination of the theories of uninterpreted functions and non-linear arithmetic was proposed in [32]. However, preliminary experiments showed that, except for a small number of lemmas, verification of our lemmas using this technique is not feasible.

5 Integration

We extended the state-of-the-art SMT solver Bitwuzla  [29] with our proposed framework. Bitwuzla supports quantified and quantifier-free bit-vector reasoning in combination with arrays, floating-point arithmetic and uninterpreted functions and was the best performing solver across supported logics in the SMT competition in 2023 [5]. Further, Bitwuzla reduces floating-point arithmetic to the theory of bit-vectors, which allows us to also apply our approach to floating-point arithmetic problems that do not involve bit-vector constraints.

Bitwuzla implements a lazy, CEGAR-based SMT paradigm called lemmas on demand [10, 26], but with a bit-vector abstraction (and thus a \(T _{ BV }\)-solver) instead of a propositional abstraction at its core. In this bit-vector abstraction, non-\(T _{ BV }\)-atoms are abstracted as Boolean constants and non-\(T _{ BV }\)-terms are abstracted as bit-vector constants. These abstracted terms are then handled by the corresponding theory solvers. This architecture allows an easy and seamless integration of our abstraction module. The interaction between the \(T _{ BV }\)-solver of Bitwuzla and our abstraction module AM is implemented as shown in Algorithm 3. Prior to sending assertions to the \(T _{ BV }\)-solver, the abstraction module processes each assertion and introduces abstractions for all relevant bit-vector terms. After the \(T _{ BV }\)-solver determines that the set of abstracted assertions is satisfiable, the abstraction module checks if all abstracted bit-vector terms are consistent and adds refinement lemmas when needed.

Note that the order in which the theory solvers and the abstraction module are called is not arbitrary. The \(T _{ FP }\)-solver word-blasts floating-point constraints to \(T _{ BV }\) and, thus, introduces new bit-vector terms. Hence, the abstraction module is called after the \(T _{ FP }\)-solver to ensure that for pure \(T _{ FP }\)-formulas, the \(T _{ FP }\)-solver first generates word-blasting lemmas so that the abstraction module has bit-vector terms to abstract. For the arrays (\(T _{A}\)) and UF (\(T _{ UF }\)) theory solvers and the quantifiers module (\(T _{Q}\)), on the other hand, we have to ensure that the bit-vector abstraction is consistent before checking the theory axioms based on the current bit-vector abstraction model \(\mathcal {M}\). In preliminary experiments, the abstraction module was called after the \(T _{A}\)- and \(T _{ UF }\)-solvers, which resulted in a degraded performance for problems involving these theories. This was a consequence of the \(T _{A}\)- and \(T _{ UF }\)-solvers generating substantially more lemmas due to an inconsistent bit-vector abstraction. Similarly, when quantifiers are involved, the quantifiers module is called last to ensure that the bit-vector abstraction of all ground terms and formulas is consistent.

figure e

As an additional extension, we also implemented a more coarse-grained abstraction approach that abstracts assertions as fresh Boolean constants. This is not a novel technique and has been proposed in earlier literature [24]. However, it can be easily implemented in our proposed abstraction framework with a simple refinement scheme for assertions. The goal of this refinement scheme is to incrementally add assertions as refinements that evaluate to \(\bot \) under the current model of the bit-vector abstraction. This is combined with our main approach of term abstraction in an interleaved manner by limiting the number of assertion refinements added per refinement iteration. When adding assertions as refinement, the abstraction module abstracts all relevant bit-vector terms occurring in these assertions, and before new assertions are added, it ensures that the current set of term abstractions is consistent. Only when all currently abstracted terms are consistent, more assertions may be added as refinement. The termination criteria are the same as with term abstraction only. If all of the remaining assertions evaluate to \(\top \) under the current model, we conclude with sat. If a subset of the added assertions is already unsatisfiable, we found an unsat core and conclude with unsat.
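The assertion refinement scheme can be sketched as a lazy loop that only enforces assertions violated by the current model. This is a simplified illustration, not Bitwuzla's implementation: the "solver" is a hypothetical brute-force search over a single integer variable, assertions are Python predicates, and the interleaving with term abstraction is omitted.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Assertion = Callable[[int], bool]


def brute_force_solve(active: Sequence[Assertion], domain: range) -> Optional[int]:
    """Return a value satisfying all active assertions, or None if none exists."""
    for value in domain:
        if all(assertion(value) for assertion in active):
            return value
    return None


def solve_lazily(
    assertions: Sequence[Assertion], domain: range, limit: int = 100
) -> Tuple[str, int]:
    """Add at most `limit` violated assertions per refinement iteration.

    Returns the result and the number of assertions that were enforced.
    """
    added = [False] * len(assertions)
    active: List[Assertion] = []
    while True:
        model = brute_force_solve(active, domain)
        if model is None:
            # The enforced subset is already unsatisfiable.
            return "unsat", len(active)
        violated = [
            i for i, assertion in enumerate(assertions)
            if not added[i] and not assertion(model)
        ]
        if not violated:
            # All remaining assertions evaluate to true under the model.
            return "sat", len(active)
        for i in violated[:limit]:
            added[i] = True
            active.append(assertions[i])
```

In this sketch the loop terminates because every iteration either returns or enforces at least one new assertion. When it answers unsat, the enforced subset plays the role of the unsat core mentioned above.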

Configuration. The number of assertion refinements per iteration is configurable and set to 100 refinements per iteration. Similarly, the minimum bit-width of terms defined over {\(\cdot \), \(\div \), \(\bmod \) } that we abstract is configurable and limited to terms of size 32 and above. Further, since value instantiation lemmas only rule out one spurious model, our implementation limits the number of value instantiations per abstraction t based on its bit-width to \(\kappa (t)/8\) instantiations. For example, for an abstracted term t of bit-width 32, we add at most four value instantiations before we add a bit-blasting lemma as final refinement for t.
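The configuration values above can be summarized in a short sketch. The constant and function names are hypothetical, and rounding \(\kappa (t)/8\) down via integer division is an assumption, since the text does not state how widths that are not multiples of 8 are handled.

```python
# Values taken from the configuration described above; names are illustrative.
ASSERTION_REFINEMENTS_PER_ITERATION = 100
MIN_ABSTRACTION_WIDTH = 32


def is_abstracted(bit_width: int) -> bool:
    """Terms over multiplication, division and remainder are abstracted from width 32 on."""
    return bit_width >= MIN_ABSTRACTION_WIDTH


def max_value_instantiations(bit_width: int) -> int:
    """Limit of value instantiations per abstracted term: kappa(t) / 8.

    Integer division is an assumption made for this sketch.
    """
    return bit_width // 8
```

For a 256-bit term, as in the smart contract benchmarks, this limit evaluates to 32 value instantiations before the bit-blasting lemma is added.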

6 Evaluation

We evaluate the performance of our bit-vector abstraction approach as integrated in Bitwuzla on five different benchmark sets: certora (1,988 benchmarks), ethereum (3,173 benchmarks), syrew (15,000 benchmarks), ff (1,224 benchmarks), and smtlib (155,269 benchmarks). Benchmark sets certora and ethereum are industrial benchmarks that arise from smart contract verification applications [15], provided by Certora [1] and the Ethereum Foundation [3]. The certora set consists of SMT queries generated by the Certora Prover [2] and is split into sets certora1 and certora2. The ethereum set contains benchmarks generated by hevm [4], a symbolic execution engine for the Ethereum virtual machine. Benchmarks in these sets are specifically encoded over bit-vectors of size 256, in combination with arrays, uninterpreted functions, and quantifiers.

Benchmark set syrew serves as a more controlled and balanced set to specifically evaluate the effectiveness of our abstraction approach for each abstracted operator. We generated three sets of equivalence checks, each only involving one of the abstracted operators. For that purpose, we enumerated \(T _{ BV }\)-terms and \(T _{ BV }\)-formulas that are equivalent for bit-width 4 with the SyGuS-solver of cvc5. For each set, we enumerated 500 equivalence checks using as SyGuS grammar \( \{0, 1, x, s, t, \approx , \mathrel {\not \approx }, <_u, \le _u, {\sim }\,, \mathrel { \& }, \mathop {<\!\!<}, \mathop {>\!\!>} \}\), extended with only one of \(\{\cdot ,\div ,\bmod \}\). The resulting 1,500 benchmarks were then instantiated for bit-widths \(2^{i}\) with \(i \in [4,13]\) yielding 15,000 benchmarks in total, the majority unsatisfiable.
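The size of the syrew set follows directly from these numbers, as the following small calculation shows. The variable names are illustrative, and the SMT-LIB operator names are used only as labels.

```python
# Bit-widths 2^i for i in [4, 13], i.e. 16 up to 8192.
bit_widths = [2 ** i for i in range(4, 14)]

# One set of 500 equivalence checks per abstracted operator.
operators = ["bvmul", "bvudiv", "bvurem"]
checks_per_operator = 500

base_benchmarks = checks_per_operator * len(operators)
total_benchmarks = base_benchmarks * len(bit_widths)
```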

The ff benchmark set originates from [33] and consists of translation validation problems of zero-knowledge proof compilers in two sets: an encoding in the theory of finite fields \(T _{ FF }\) and a translation to \(T _{ BV }\) that exclusively uses arithmetic bit-vector operators \(\{+, \cdot , \bmod \}\) over bit-vectors of size 510.

Benchmark set smtlib contains all non-incremental benchmarks of all logics in the SMT-LIB [9] benchmark library supported by Bitwuzla. This includes all quantified and quantifier-free logics involving the theories of bit-vectors, arrays, floating-point arithmetic and uninterpreted functions (24 in total). Note that this also includes floating-point arithmetic logics that do not involve the theory of bit-vectors since Bitwuzla word-blasts floating-point terms to bit-vector terms.

We implemented our novel term abstraction technique in our main configuration Abstr-t. We additionally distinguish two configurations that enable assertion abstraction as described in Sect. 5: configuration Abstr-a, which enables assertion abstraction only, and configuration Abstr-ta, which enables both term and assertion abstraction. We evaluate these configurations against Bitwuzla version 0.3.2, cvc5 version 1.1.0, and Z3 version 4.12.4 (in their default configuration, using bit-blasting for \(T _{ BV }\)). Both cvc5 and Z3 are industrial-strength SMT solvers that support a wide range of theories, including the theories supported by Bitwuzla. We further compare against cvc5-ib, a configuration of cvc5 that reduces bit-vector problems to non-linear integer arithmetic problems via int-blasting [36]. Note that on the ff benchmark set, we evaluate these configurations only on the \(T _{ BV }\) subset, and additionally compare against a dedicated \(T _{ FF }\)-solver implementation of cvc5 (cvc5-ff) on the \(T _{ FF }\) subset.

We ran all experiments on a cluster of 25 machines with Intel(R) Xeon E5-2620 v4 CPUs. For each solver and benchmark pair, we allocated one CPU core and 8 GB of memory with a time limit of 1200 s. If a solver terminated with an error or ran into the memory limit on a specific benchmark, we counted its runtime on that benchmark as 1200 s as a penalty.

Table 3. Number of solved benchmarks (Solved), timeouts (TO), memory outs (MO), penalized runtime (T), memory usage of all benchmarks (M), and runtime Tc on commonly solved benchmarks, grouped by benchmark set and solvers. Note that the number (x/y) for each benchmark set indicates the number of commonly solved instances x and the total number of benchmarks y in the set.

Table 3 summarizes the results for each solver grouped by benchmark set and ordered by number of solved benchmarks. Overall, Abstr-t significantly outperforms all other bit-blasting solvers and the int-blasting solver cvc5-ib on all benchmark sets. Our abstraction approach considerably reduces the memory usage across all sets, solving more benchmarks with a lower number of memory outs. Only on the certora sets, cvc5-ib has a smaller memory footprint, which is due to the more memory-efficient translation of bit-vector to integer arithmetic.

The certora set is divided into the certora1 and certora2 subsets, which correspond to the use of two different encodings arising from the same application. Both sets rely on 256-bit bit-vectors and uninterpreted functions and make heavy use of arithmetic operators. Set certora1 is proprietary and is sampled from a different, more diverse set of smart contracts than certora2. It uses an older, less optimized encoding that involves quantifiers and overflow predicates, while certora2 does not rely on quantifiers and was successfully optimized for existing bit-blasting solvers, which struggled on the older encoding. This can be seen in Table 3, where the best non-abstraction-based bit-blasting configuration (Bitwuzla) solves only 13% of certora1 but 74% of certora2. Benchmarks in the certora1 set usually contain a large number of assertions (15k on average, up to 100k) and are thus good candidates for evaluating assertion abstraction in combination with term abstraction. Benchmarks in the certora2 set, on the other hand, usually contain a significantly smaller number of assertions (less than 1k per benchmark). Hence, on the certora benchmark sets, in addition to configuration Abstr-t, we also evaluate the two configurations Abstr-a and Abstr-ta that enable assertion abstraction. On both sets, Abstr-t considerably improves over bit-blasting. On the certora1 set, Abstr-a outperforms Abstr-t, and combining assertion and term abstraction (Abstr-ta) significantly outperforms either, both in terms of solved benchmarks and memory usage. We observed that in the majority of cases where Abstr-ta improves over Abstr-t, the benchmark is unsatisfiable and the size of the unsatisfiable core is only a small fraction of the overall number of assertions. On the certora2 set, however, Abstr-a is less effective since these benchmarks contain a significantly smaller number of assertions.
Configuration Abstr-ta still improves over Abstr-t in terms of overall memory usage.

Note that for the benchmark sets ethereum, ff and syrew, enabling assertion abstraction was not applicable for a majority of the benchmarks due to the low number of assertions (less than 100 per benchmark). On benchmark set smtlib, the effects of assertion abstraction were overall inconclusive. Thus, due to space constraints, for the remaining sets, we exclude configurations Abstr-a and Abstr-ta from the evaluation.

On the ethereum set, both Abstr-t and Bitwuzla solve all benchmarks. However, Abstr-t is more than 40% faster and requires 60% less memory. On the 3,138 commonly solved benchmarks, Abstr-t is the fastest solver, closely followed by cvc5-ib. Both outperform the other bit-blasting solvers. Note that on this benchmark set, cvc5 and cvc5-ib returned with errors due to unsupported cases of equality over constant arrays on 14 and 12 benchmarks, respectively.

On the syrew set, Abstr-t significantly outperforms all other solvers and is more than \(3\times \) faster with a \(5\times \) lower memory usage compared to the second best solver Bitwuzla. On the commonly solved 5,528 benchmarks, Abstr-t is 12–90\(\times \) faster than the competition. The int-blasting configuration cvc5-ib comes in last, mainly due to the occurrence of bit-wise operations. Bit-wise operators do not have a direct translation to integers and require cvc5-ib to resort to abstraction schemes, which is more expensive than the direct translation via bit-blasting.

On the ff benchmark set, as expected, the native finite field solver cvc5-ff solves the most benchmarks overall. However, Abstr-t significantly improves over bit-blasting (Bitwuzla) and int-blasting (cvc5-ib) with the least number of memory outs overall. Surprisingly, Abstr-t is able to solve 36 benchmarks that cvc5-ff cannot. None of the other solvers solves benchmarks that cvc5-ff cannot.

On the smtlib set, Abstr-t improves over Bitwuzla in 10 out of the 24 logics in terms of number of solved benchmarks, with 6 of them being floating-point arithmetic logics. Most notably, Abstr-t was able to improve the number of solved instances X and runtime in percent Y on commonly solved instances (X, Y%) over Bitwuzla in logics FP (\(+5\), \(-16\%\)), BVFP (0, \(-45\%\)), QF_ABVFP (\(+1\), \(-33\%\)), QF_ABVFPLRA (0, \(-23\%\)), QF_BVFP (\(+1\), \(-45\%\)), QF_BVFPLRA (\(+9\), \(-46\%\)), QF_FP (\(+23\), \(-13\%\)), and QF_FPLRA (\(+1\), \(-7\%\)).

The only significant loss (13 benchmarks) is in the QF_BV logic, which is also the only logic where Abstr-t is significantly slower (33%) on commonly solved instances compared to Bitwuzla. This slowdown can be primarily attributed to the two benchmark families Sage2 and uclid. On these two families, on the commonly solved instances, Abstr-t is slower by 40% and 4,100%, respectively. This slowdown is unexpected and needs further investigation. Nevertheless, in logic QF_BV, Abstr-t is able to solve more unsatisfiable benchmarks with fewer memory outs compared to Bitwuzla and outperforms cvc5, cvc5-ib and Z3 by a significant margin (more than 1,400 solved benchmarks).

Table 4. Number of overall abstracted terms and abstraction refinements on solved benchmarks grouped by abstracted operator and refinement tier (1: hand-crafted, 2: abduction, 3: value instantiation, 4: bit-blasting).

We further performed an analysis of term abstractions and abstraction refinements for all benchmarks solved by Abstr-t in all benchmark sets. Table 4 summarizes our findings, grouped by refinement tier and abstracted operator. Overall, Abstr-t abstracted 367,101 multiplication terms, 55,461 unsigned division terms, and 62,328 unsigned remainder terms. Out of these, only 134,525 (37%) multiplications, 7,024 (13%) divisions, and 1,326 (2%) remainders were bit-blasted as last resort via adding tier-4 lemmas. For the remaining 63%/87%/98% of multiplication/division/remainder terms, refinement with tier 1–3 lemmas only was sufficient to solve the benchmarks. Out of the solved benchmarks where Abstr-t abstracted any bit-vector terms, 80% were solved without bit-blasting any of the abstracted terms. For the remaining 20% of solved benchmarks, 78% of abstracted terms were bit-blasted.
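The bit-blasted shares quoted above can be recomputed from the raw counts. The short calculation below uses only the numbers given in the text; the dictionary layout and names are illustrative.

```python
# (abstracted terms, terms bit-blasted via tier-4 lemmas) per operator.
counts = {
    "multiplication": (367_101, 134_525),
    "division": (55_461, 7_024),
    "remainder": (62_328, 1_326),
}

# Percentage of abstracted terms that were bit-blasted as last resort.
bit_blasted_pct = {
    op: round(100 * blasted / abstracted)
    for op, (abstracted, blasted) in counts.items()
}

# Percentage of abstracted terms handled by tier 1-3 lemmas only.
refined_only_pct = {op: 100 - pct for op, pct in bit_blasted_pct.items()}
```

Rounded to whole percentages, this reproduces the 37%/13%/2% and 63%/87%/98% figures reported above.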

For the benchmarks solved with abstraction, Abstr-t required on average 37 refinement iterations (median 4). Further, all lemmas except bvudiv lemma 21 and bvurem lemma 11 from Table 2 were used for solving these instances. Tier-1/2/3/4 lemmas were used in 76%/27%/30%/20% of solved instances.

We further evaluated the usefulness of the abduction-based lemmas (tier 2) by disabling these lemmas on the syrew benchmark set. Without these lemmas, Abstr-t solves 336 fewer benchmarks, has \(2\times \) more memory outs, and is 23% slower on commonly solved instances while consuming 61% more memory. Without tier-3 lemmas, the number of solved instances for benchmark sets certora1/certora2/syrew/ff/smtlib changes by −12%/−1%/−1%/−6%/+0.01%.

The artifact of this paper is archived and available in the Zenodo open-access repository at https://zenodo.org/record/10913320.

7 Conclusion

We have presented a novel abstraction-refinement approach to improve the scalability of bit-blasting arithmetic terms with large bit-widths. We have introduced a lemma scoring scheme and an abduction-based framework for synthesizing refinement lemmas, which we include in our four-tiered refinement schemes. We have extended the state-of-the-art SMT solver Bitwuzla with our techniques and showed that this significantly improves solver performance on a diverse set of benchmarks coming from a variety of applications, including smart contract verification and zero-knowledge proofs. Combining existing under-approximation techniques with our approach is an interesting direction for future work.