CoqQFBV: A Scalable Certified SMT Quantifier-Free Bit-Vector Solver

Abstract. We present a certified SMT QF BV solver CoqQFBV built from a verified bit blasting algorithm, Kissat, and the verified SAT certificate checker GratChk. Our verified bit blasting algorithm supports the full QF BV logic of SMT-LIB; it is specified and formally verified in the proof assistant Coq. We compare CoqQFBV with CVC4, Bitwuzla, and Boolector on benchmarks from the QF BV division of the single query track in the 2020 SMT Competition, and on real-world cryptographic program verification problems. CoqQFBV surprisingly solves more program verification problems with certification than the 2020 SMT QF BV division winner Bitwuzla without certification.


Introduction
Satisfiability Modulo Theories (SMT) solvers for the Quantifier-Free Bit-Vector (QF BV) logic have been used to verify programs with bit-level accuracy [9,10]. In such applications, a program verification problem is reformulated as an SMT QF BV query. An SMT QF BV solver is then invoked to compute a query result. The query result in turn decides the answer to the program verification problem. For cryptographic assembly programs, a missing carry or borrow flag will result in incorrect computation. Bit-accurate verification is thus necessary for cryptographic programs. SMT QF BV solvers in fact have been employed to verify such programs [8,25]. These solvers nonetheless are very complex programs with possibly unknown bugs [7,18]. Since bugs in SMT QF BV solvers may induce incorrect query results, program verification cannot be taken without a grain of salt when SMT QF BV solvers are employed.
In order to check SMT QF BV query results independently, SMT QF BV solvers can generate certificates to validate their answers. In the LFSC certificates [23,14], for instance, an SMT QF BV query result is certified by correct bit blasting and Boolean Satisfiability (SAT) solving. Such certificates demonstrate that the SMT QF BV query is reduced to a Boolean SAT query correctly and the corresponding SAT query is solved correctly. Although one can certify SAT query results with certificates from SAT solvers [24], it is not always easy to certify correct bit blasting due to complex arithmetic operations in SMT QF BV queries. Developing correct and efficient checkers for SMT QF BV certificates can be very challenging. Indeed, an LFSC certificate checker based on the proof assistant Coq has been developed to improve confidence [12]. Yet the Coq-based certificate checker does not fully support arithmetic operations and thus cannot certify results of SMT QF BV queries with complicated arithmetic operations. Consequently, the correctness of cryptographic programs still relies on the correctness of SMT QF BV solvers or their unverified certificate checkers.
In this paper, we take a more direct approach to ensure the correctness of SMT QF BV query results. Instead of certifying correct bit blasting for every SMT QF BV query, we specify a bit blasting algorithm and prove its correctness in the proof assistant Coq. In order to formalize the correctness of our bit blasting algorithm, we develop a formal bit-vector theory in Coq. Naturally, the formal theory has to support all arithmetic functions (addition, subtraction, multiplication, division, and remainder) for both signed and unsigned representations as needed in SMT-LIB [3]. Based on our new bit-vector theory, we give a formal semantics for SMT QF BV queries in Coq. Our semantics follows the SMT-LIB semantics carefully. Particularly, division and remainder are total arithmetic operations even when the divisor is zero. Using our Coq bit-vector theory and semantics, we prove that our bit blasting algorithm always returns a corresponding Boolean formula correctly on any SMT QF BV query. Since our algorithm has been formally verified, bit blasting is always correct and need not be certified. Through the OCaml program extracted from our verified bit blasting algorithm, a corresponding SAT query is obtained for each SMT QF BV query and sent to a SAT solver. A SAT certificate checker suffices to validate SAT query results and hence the correctness of answers to SMT QF BV queries. Since neither complicated SMT QF BV solvers nor their certificate checkers are trusted, our work can improve the confidence of SMT QF BV query results.
To our knowledge, our bit-vector theory is the first Coq formalization designed for bit blasting queries from the QF BV logic of SMT-LIB. Our semantics is the first Coq formalization for full SMT QF BV queries. We are not aware of any verified bit blasting algorithm or program for full SMT QF BV queries of SMT-LIB at the time of writing.
Even if the correctness of its results could be ensured, our certified SMT QF BV solver CoqQFBV would not be very useful if it were extremely inefficient. In order to evaluate its performance, we run CoqQFBV on benchmarks from the QF BV division of the single query track in the 2020 SMT Competition. With the same memory and time limits as in the competition, our solver successfully finishes 88.72% of the 6861 queries with certification. In comparison, CVC4 with its certificate checker solves 55.97% with certification, and the division winner Bitwuzla solves 98.22% of the benchmarks without certification. Our certified solver outperforms CVC4 with certification significantly. Generating and checking certificates make our certified solver finish about 10% fewer queries than the division winner. The price of accuracy is perhaps not unacceptable for the benchmarks in the competition. To further evaluate CoqQFBV, the certified solver is used to verify linear arithmetic assembly programs from various cryptography libraries such as OpenSSL [30]. CoqQFBV gives certified answers to 96.88% of the 96 SMT QF BV queries from real-world cryptographic program verification. CVC4 with its certificate checker certifies 19.79%. Among efficient SMT QF BV solvers without certification, Boolector is able to solve 100% and Bitwuzla solves 91.67% of the queries. Intriguingly, our certified SMT QF BV solver outperforms the 2020 division winner Bitwuzla on queries from real-world verification problems. Our certified solver is probably useful for real-world verification problems.
Related Work. As mentioned, generating and checking SMT certificates are challenging. There are few efforts to develop SMT QF BV certificate checkers, let alone verified ones. CVC4 is able to produce unsatisfiability certificates for QF BV queries, and is also equipped with an (unverified) certificate checker [14]. SMTCoq [12] checks certificates from the SMT solvers veriT and CVC4. It supports fragments of several logics including the QF BV logic. Moreover, its correctness is formally proved in Coq. However, the QF BV logic is not fully supported by SMTCoq. Z3 also supports certificate generation for the QF BV logic [19]. The proofs can be reconstructed, and thus checked, within the proof assistants HOL4 and Isabelle [6]. But the lack of details in Z3's generated certificates makes proof reconstruction particularly challenging.
Taking an approach similar to ours, GL is a framework for bit blasting finitely bounded ACL2 theorems into SAT queries [28]. Its bit blasting algorithm is formally verified in ACL2. Though it is not designed for SMT-LIB, most of the operations defined in the QF BV logic are supported, with exceptions such as division and concatenation. A bit blasting algorithm is defined and verified in HOL4 as well [13]. Neither [28] nor [13] aims to develop a scalable SMT QF BV solver. CoqQFBV accepts SMT-LIB inputs with full support for the QF BV logic while adopting performance optimizations such as caches.
In Isabelle and HOL4, one can use the bit-vector libraries to conform to SMT-LIB operations; see [17] for example. Within Coq, coq-bits is a formalization of logical and arithmetic operations on bit vectors [15]. The library provides the mapping between bit-vector operations and abstract number operations. Unlike our theory, it does not support division, remainder, or signed operations. Why3 [11] provides a bit-vector theory which is formalized in Coq too. It defines division by zero differently from SMT-LIB. Moreover, its operations are defined in terms of integer operations. Our new bit-vector theory instead defines bit-vector operations through bit manipulation. It is more suitable for the correctness proof of bit blasting algorithms.
The paper is organized as follows. After the introduction, an overview is given in Section 2. Section 3 reviews preliminaries. Our formal bit-vector theory is presented in Section 4. It is followed by the formal semantics of SMT QF BV queries (Section 5). The correctness of our bit blasting algorithm is established in Section 6. Section 7 outlines the construction of our certified SMT QF BV solver. Experiments are presented in Section 8. Section 9 concludes our presentation.

Methodology Overview
Given an SMT QF BV query, a bit blasting algorithm computes a Boolean formula such that the SMT QF BV query is satisfiable if and only if the Boolean formula is satisfiable. The QF BV logic contains arithmetic operations for bit vectors. Computing an equi-satisfiable Boolean formula for an arbitrary SMT QF BV query can be very complicated and susceptible to errors. Our goal is to construct a correct bit blasting program for every SMT QF BV query. The correctness of the program moreover is verified by the proof assistant Coq to minimize gaps or even errors in hand-written proofs.
Our construction is based on a new formal bit-vector theory coq-nbits (Section 4). In coq-nbits, we define bit vectors and their functions on top of the Coq data type for Boolean sequences. In order to support the QF BV logic of SMT-LIB fully, five arithmetic bit-vector functions (addition, subtraction, multiplication, division, and remainder) are defined in our formal theory. To establish the correctness of our definitions, formal proofs are provided to relate bit-vector functions with their arithmetic counterparts. For instance, we show the number represented by the output of the bit-vector negation function is indeed the arithmetic negation of the number represented by the input bit vector.
Using our coq-nbits theory, we then give a formal semantics for SMT QF BV queries as defined in SMT-LIB (Section 5). In our formalization, a QF BV predicate denotes a Boolean value; and a QF BV expression denotes a bit vector. An SMT QF BV query is formalized as a Boolean combination of QF BV predicates on QF BV expressions over QF BV variables and bit-vector constants. In order to demonstrate the correctness of our formal semantics for SMT QF BV queries, formal proofs are provided to show that our formal semantics coincides with the one defined in SMT-LIB.
Our bit blasting algorithm is given in Coq (Section 6). It extends Tseitin transformation for Boolean formulae to SMT QF BV queries. More precisely, a QF BV predicate is transformed to a literal with a Boolean formula; a QF BV expression is transformed to a literal sequence with a Boolean formula. Using our formalization of SMT QF BV queries, the correctness of the bit blasting algorithm is established in Coq by mutual induction. To improve efficiency, our bit blasting algorithm is further optimized with more economical transformations and a cache. The optimized bit blasting algorithm is also verified with formal Coq proofs.
Our formally verified bit blasting algorithm is written in the Coq specification language. It is not yet a program that can be compiled into executable binary code. Using the code extraction mechanism in Coq, an OCaml program is extracted from our verified bit blasting algorithm. The OCaml program takes expressions in our formal SMT QF BV query syntax as inputs and returns expressions in our formal syntax for Boolean formulae as outputs. SAT solvers can be employed to decide satisfiability of output Boolean formulae. Their certificates can be validated by SAT certificate checkers independently (Section 7).

Preliminaries
Let $v$ be a Boolean variable with values ff and tt. A literal is of the form $v$ or $\neg v$. A clause is a disjunction $l_0 \lor l_1 \lor \cdots \lor l_k$ of literals $l_0, l_1, \ldots, l_k$. A Boolean formula in conjunctive normal form (CNF) is a conjunction $c_0 \land c_1 \land \cdots \land c_m$ of clauses $c_0, c_1, \ldots, c_m$. A SAT query is a Boolean CNF formula. An environment maps Boolean variables to their values. Given a SAT query, the Boolean satisfiability problem is to decide if the query evaluates to tt on some environment.
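The definitions above can be illustrated with a small Python sketch. The encoding is an assumption made for illustration only: literals are nonzero integers in the DIMACS style, and the function names are hypothetical stand-ins rather than part of CoqQFBV. The satisfiability check enumerates all environments, which is exponential and is not how a SAT solver such as Kissat works.

```python
from itertools import product

# A literal is a nonzero int: v stands for variable v, -v for its negation.
# A clause is a list of literals; a CNF formula is a list of clauses.

def eval_lit(lit, env):
    """Value of a literal under env, a dict from variable to bool."""
    return env[lit] if lit > 0 else not env[-lit]

def eval_cnf(cnf, env):
    """A CNF formula is true iff every clause contains a true literal."""
    return all(any(eval_lit(l, env) for l in clause) for clause in cnf)

def satisfiable(cnf):
    """Brute-force search over all environments (demonstration only)."""
    variables = sorted({abs(l) for clause in cnf for l in clause})
    for values in product([False, True], repeat=len(variables)):
        if eval_cnf(cnf, dict(zip(variables, values))):
            return True
    return False

print(satisfiable([[1, 2], [-1, 2], [-2, 3]]))  # True, e.g. v2 = v3 = tt
print(satisfiable([[1], [-1]]))                 # False: v1 and not v1
```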
A bit vector of width $w$ is written as $\#b\,b_{w-1} b_{w-2} \cdots b_0$ where each bit $b_i$ is 0 or 1. In the unsigned representation, the bit vector $\#b\,b_{w-1} b_{w-2} \cdots b_0$ denotes the natural number (non-negative integer) $\sum_{0 \le i < w} b_i 2^i$; in the two's complement (signed) representation, it denotes the integer $\sum_{0 \le i < w-1} b_i 2^i - 2^{w-1} b_{w-1}$. For instance, #b1010 denotes 10 and −6 in the unsigned and two's complement representations respectively. We use $\mathit{bv2nat}(bv)$ for the natural number denoted by the bit vector $bv$ in the unsigned representation; and $\mathit{nat2bv}(w, i)$ stands for the bit vector of width $w$ representing the natural number $i$ modulo $2^w$.
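As a quick check of the two representations, the following Python sketch models bit vectors as strings of binary digits with the most significant bit first, as in the #b notation. The functions bv2nat and nat2bv follow the text; bv2int is a hypothetical name introduced here for the two's complement value.

```python
def bv2nat(bits):
    """Unsigned value of a bit string such as '1010' (most significant bit first)."""
    return int(bits, 2) if bits else 0

def nat2bv(w, i):
    """Bit string of width w representing i modulo 2**w."""
    return format(i % (1 << w), "0{}b".format(w)) if w > 0 else ""

def bv2int(bits):
    """Two's complement value: low w-1 bits minus 2**(w-1) times the sign bit."""
    w = len(bits)
    if w == 0:
        return 0
    return bv2nat(bits[1:]) - (1 << (w - 1)) * int(bits[0])

print(bv2nat("1010"), bv2int("1010"))  # 10 -6
print(nat2bv(4, 26))                   # 1010, since 26 mod 16 = 10
```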
Let $bv = \#b\,b_{w-1} b_{w-2} \cdots b_0$ and $cv = \#b\,c_{u-1} c_{u-2} \cdots c_0$ be bit vectors of widths $w$ and $u$ respectively. The following QF BV operations are defined in the QF BV logic of SMT-LIB: $\mathit{concat}\ bv\ cv \triangleq \#b\,b_{w-1} b_{w-2} \cdots b_0 c_{u-1} c_{u-2} \cdots c_0$ is the concatenation of $bv$ and $cv$; $\mathit{extract}\ i\ j\ bv \triangleq \#b\,b_i b_{i-1} \cdots b_j$ extracts bits from $bv$ where $0 \le j \le i < w$; $\mathit{bvnot}\ bv$, $\mathit{bvand}\ bv\ cv$, and $\mathit{bvor}\ bv\ cv$ are the bitwise complement, and, and or operations respectively.
Additionally, $\mathit{bvneg}\ bv \triangleq \mathit{nat2bv}(w, 2^w - \mathit{bv2nat}(bv))$ is the arithmetic negation operation; $\mathit{bvadd}\ bv\ cv \triangleq \mathit{nat2bv}(w, \mathit{bv2nat}(bv) + \mathit{bv2nat}(cv))$ is the arithmetic addition operation; and $\mathit{bvmul}\ bv\ cv \triangleq \mathit{nat2bv}(w, \mathit{bv2nat}(bv) \times \mathit{bv2nat}(cv))$ is the arithmetic multiplication operation. The arithmetic division and remainder operations are defined by cases: $\mathit{bvudiv}\ bv\ cv$ is $\#b11 \cdots 1$ of width $w$ if $\mathit{bv2nat}(cv) = 0$ and $\mathit{nat2bv}(w, \mathit{bv2nat}(bv) \div \mathit{bv2nat}(cv))$ otherwise; $\mathit{bvurem}\ bv\ cv$ is $bv$ if $\mathit{bv2nat}(cv) = 0$ and $\mathit{nat2bv}(w, \mathit{bv2nat}(bv) \bmod \mathit{bv2nat}(cv))$ otherwise. Note that the arithmetic division and remainder operations are defined even when the divisor represents the number zero. Finally, $\mathit{bvshl}\ bv\ cv \triangleq \mathit{nat2bv}(w, \mathit{bv2nat}(bv) \times 2^{\mathit{bv2nat}(cv)})$ shifts the bit vector $bv$ to the left by $\mathit{bv2nat}(cv)$ bits; $\mathit{bvlshr}\ bv\ cv \triangleq \mathit{nat2bv}(w, \mathit{bv2nat}(bv) \div 2^{\mathit{bv2nat}(cv)})$ shifts the bit vector $bv$ to the right by $\mathit{bv2nat}(cv)$ bits.
In addition to bit-vector operations, the QF BV logic of SMT-LIB defines QF BV predicates on bit vectors. The predicate $\mathit{bveq}\ bv\ cv$ is true when the bit vectors $bv$ and $cv$ are equal; $\mathit{bvult}\ bv\ cv$ is true if $\mathit{bv2nat}(bv) < \mathit{bv2nat}(cv)$. In the QF BV logic of SMT-LIB, both operands of binary operations and predicates must have the same width. Overall, seventeen bit-vector operations and predicates are defined in the QF BV logic of SMT-LIB. Particularly, arithmetic division and remainder operations with operands in both unsigned and two's complement signed representations are defined in SMT-LIB.
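The arithmetic operations can be prototyped directly on natural numbers. The Python sketch below is an illustration under a simplifying assumption: a bit vector of width w is identified with its value bv2nat(bv) in the range 0 to 2**w − 1, so nat2bv(w, ·) becomes reduction modulo 2**w. The division and remainder cases for a zero divisor follow the SMT-LIB convention described in the text.

```python
def bvneg(w, a):
    return (2**w - a) % 2**w

def bvadd(w, a, b):
    return (a + b) % 2**w

def bvmul(w, a, b):
    return (a * b) % 2**w

def bvudiv(w, a, b):
    # Total operation: dividing by zero yields the all-ones vector.
    return 2**w - 1 if b == 0 else a // b

def bvurem(w, a, b):
    # Total operation: the remainder by zero is the dividend itself.
    return a if b == 0 else a % b

def bvshl(w, a, b):
    return (a * 2**b) % 2**w

def bvlshr(w, a, b):
    return a // 2**b

def bvult(a, b):
    return a < b

w = 4
print(bvadd(w, 10, 7), bvneg(w, 6))          # 1 10
print(bvudiv(w, 10, 0), bvurem(w, 10, 0))    # 15 10
print(bvshl(w, 3, 3), bvlshr(w, 12, 2))      # 8 3
```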
A QF BV variable denotes a bit vector. A QF BV expression is constructed from QF BV operations over QF BV variables and bit vectors. An SMT QF BV query is a Boolean combination of QF BV predicates on QF BV expressions. Let stores be mappings from QF BV variables to bit vectors. Given an SMT QF BV query, the satisfiability modulo QF BV theory problem is to decide if the query evaluates to tt on some store.

Bit-Vector Theory
We present our formal Coq bit-vector theory coq-nbits in this section. The coq-nbits theory supports bit vectors in both unsigned and two's complement signed representations. In coq-nbits, a bit vector is represented by a Boolean sequence of the data type bits in the least significant bit-first order.
Here the data type bits is the type seq bool of Boolean sequences, where bool and seq are the Coq data types for Boolean values (false and true) and sequences respectively. For instance, the bit vector #b100 is represented by [:: false; false; true] in coq-nbits.
Coq functions defined for sequences are applicable to bit vectors. Particularly, size bv computes the width of the bit vector bv and bv ++ cv is the concatenation of the bit vectors bv and cv. It is also straightforward to define auxiliary bit-vector functions. For example, zeros n returns the bit vector of n false's; ones n returns the bit vector of n true's; extract i j bv returns the sub-sequence of the bit vector bv with indices from j to i where $0 \le j \le i <$ size bv.
Consider the conversion of a natural number n to a bit vector of width w. The conversion function first checks the width w. If the width is zero, it returns the empty bit vector. Otherwise, the function returns the bit vector with the least significant bit N.odd n and the remaining w − 1 bits representing n divided by two. Observe that two Coq formalizations of natural numbers are used. The nat theory uses the unary representation suitable for inductive proofs; N uses the succinct binary representation. Two properties of the conversion functions are proved in Coq (Lemma 1). The first property shows that bit vectors can be converted to natural numbers and back to themselves. The second property shows that natural numbers can be converted to bit vectors with sufficient widths and back to themselves.
To see how they are used to prove properties about bit-vector functions in coq-nbits, consider the successor bit-vector function succB. If the input is the empty bit vector, the function returns the empty bit vector. Otherwise, succB checks the least significant bit of the input bit vector. If the bit is true, the function computes the successor of the remaining bits and prepends false as the least significant bit. If the least significant bit of the input is false, the function simply changes the least significant bit to true and copies the remaining bits.
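The conversion between natural numbers and least-significant-bit-first Boolean sequences described above can be sketched in Python. This is an illustration, not the Coq development: Python lists of bools stand in for the type bits, to_nat is the name used elsewhere in the text, and from_nat is an assumed name for the conversion in the other direction. The two checks at the end mirror the round-trip properties.

```python
def from_nat(w, n):
    """Width-w bit vector (LSB first) for n: test oddness, then halve."""
    if w == 0:
        return []
    return [n % 2 == 1] + from_nat(w - 1, n // 2)

def to_nat(bv):
    """Natural number denoted by an LSB-first list of bools."""
    if not bv:
        return 0
    return int(bv[0]) + 2 * to_nat(bv[1:])

print(from_nat(3, 4))   # [False, False, True], i.e. #b100

# Round trip 1: a bit vector survives conversion to a number and back.
samples = [[], [True], [False, True, True]]
print(all(from_nat(len(bv), to_nat(bv)) == bv for bv in samples))

# Round trip 2: a number survives when the width is sufficient (n < 2**w).
print(all(to_nat(from_nat(4, n)) == n for n in range(16)))
```

With an insufficient width the second round trip yields the number modulo 2**w, for example to_nat(from_nat(2, 5)) evaluates to 1.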
Using the conversion functions, Lemma 2 relates the bit-vector successor to the arithmetic successor. It says that succB bv does compute the bit vector representing the arithmetic successor of the natural number represented by the bit vector bv. Observe that the bit-vector successor function is correct when the input bit vector is empty. It is also correct when there is overflow. Indeed, both sides are zeros of width size bv when overflow occurs.
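A small Python sketch can make the successor function and its relation to arithmetic concrete. It follows the case analysis given in the text on LSB-first lists of bools. The exhaustive check encodes our reading of the lemma, namely that succB bv equals the width-preserving conversion of to_nat bv + 1; the helper from_nat and the checker lemma2_holds are assumed names, and testing small widths is of course no substitute for the Coq proof.

```python
from itertools import product

def to_nat(bv):
    """Natural number denoted by an LSB-first list of bools."""
    return sum(int(b) << i for i, b in enumerate(bv))

def from_nat(w, n):
    """LSB-first bit vector of width w for n modulo 2**w (assumed helper name)."""
    return [bool((n >> i) & 1) for i in range(w)]

def succB(bv):
    """Bit-vector successor, following the case analysis in the text."""
    if not bv:
        return []                        # empty input, empty output
    if bv[0]:
        return [False] + succB(bv[1:])   # 1 + 1 = 0, carry into the rest
    return [True] + bv[1:]               # 0 + 1 = 1, the rest is copied

def lemma2_holds(max_width):
    """Check succB bv == from_nat (size bv) (to_nat bv + 1) up to max_width."""
    return all(
        succB(list(bv)) == from_nat(len(bv), to_nat(bv) + 1)
        for w in range(max_width + 1)
        for bv in product([False, True], repeat=w)
    )

print(succB([True, True]))   # [False, False]: overflow wraps to zeros
print(lemma2_holds(5))       # True
```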
Other arithmetic bit-vector functions are defined and proved in coq-nbits similarly. Specifically, the arithmetic negation negB, addition addB, subtraction subB, unsigned multiplication mulB, unsigned division divB, and unsigned remainder remB functions are supported by coq-nbits. We give properties to relate the arithmetic functions for bit vectors and natural numbers.
These properties are collected in Lemma 3. Let bv, cv be bit vectors of width w. Lemma 3 shows that the natural number represented by the bit vector addB bv cv is equal to the modular sum of the natural numbers represented by bv and cv. Similarly, the natural number represented by mulB bv cv is equal to the modular product of the natural numbers represented by bv and cv. The division and remainder functions in coq-nbits follow the SMT-LIB semantics. Specifically, the quotient of any bit vector divided by zero is equal to the bit vector of all true's; the remainder of a bit vector divided by zero is the bit vector itself. For non-zero divisors, the division and remainder functions behave as expected. The natural number represented by the bit vector divB bv cv is the quotient of the number represented by bv divided by the number represented by cv; and the bit vector remB bv cv represents the remainder of the number represented by bv divided by the number represented by cv. Last but not least, the logical left (shlB) and right (shrB) shifts correspond to multiplication and division by powers of two respectively.
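To illustrate what it means to define arithmetic by bit manipulation and then relate it to numbers, the Python sketch below implements addition with a ripple-carry adder and multiplication by shift-and-add on LSB-first lists of bools, and compares them with modular arithmetic for every pair of width-3 vectors. The circuit structure is an assumption chosen for simplicity; the actual coq-nbits definitions may differ.

```python
from itertools import product

def to_nat(bv):
    return sum(int(b) << i for i, b in enumerate(bv))

def full_adder(a, b, c):
    """One-bit full adder on bools: returns (sum bit, carry out)."""
    return a ^ b ^ c, (a and b) or (c and (a ^ b))

def addB(bv, cv):
    """Ripple-carry addition of equal-width vectors; the final carry is dropped."""
    out, carry = [], False
    for a, b in zip(bv, cv):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out

def mulB(bv, cv):
    """Shift-and-add multiplication truncated to the operand width."""
    w = len(bv)
    acc = [False] * w
    for i, bit in enumerate(cv):
        if bit:
            # Shifting left by i prepends i zeros in LSB-first order.
            acc = addB(acc, ([False] * i + bv)[:w])
    return acc

def check(w):
    """Compare addB and mulB with modular sum and product on all inputs."""
    vectors = [list(v) for v in product([False, True], repeat=w)]
    return all(
        to_nat(addB(x, y)) == (to_nat(x) + to_nat(y)) % 2**w
        and to_nat(mulB(x, y)) == (to_nat(x) * to_nat(y)) % 2**w
        for x in vectors for y in vectors
    )

print(check(3))   # True
```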
coq-nbits also provides comparison predicates. In addition to the equality predicate == inherited from Boolean sequences, ltB bv cv and leB bv cv compare the natural numbers represented by the bit vectors bv and cv . Properties about comparison predicates have also been proved in Coq.
These properties are collected in Lemma 4. In addition to arithmetic functions and predicates in the unsigned representation, our formal bit-vector theory moreover defines arithmetic functions and predicates for bit vectors in two's complement representation. For the signed representation, bit vectors are converted to integers by the to_Z function. Arithmetic bit-vector functions and predicates in the signed representation are related to arithmetic integer functions and predicates in Lemma 5.
In Lemma 5, sext n bv extends the bit vector bv by n bits with the sign bit of bv, msb bv returns the sign bit of bv, and dropmsb bv drops the sign bit of bv. quot and rem are the quotient and remainder functions for Coq integers. Consider, for instance, the signed division function sdivB bv cv in coq-nbits (Lemma 5 (4)). If the dividend bv is of width greater than 1, the widths of bv and the divisor cv are equal, and bv is not of the form #b100 · · · 0 or cv is not of the form #b11 · · · 1, then the bit vector sdivB bv cv represents the quotient of the integers represented by bv and cv. The condition may appear counter-intuitive. To see why it is necessary, consider bv = #b100 · · · 0 and cv = #b11 · · · 1, both of width $w$. bv and cv thus represent the integers $-2^{w-1}$ and $-1$ respectively. Their quotient $2^{w-1}$ however cannot be represented by bit vectors of width $w$ in two's complement representation. The corner input case is hence excluded. The corner case is also excluded from the arithmetic negation function (Lemma 5 (1)).
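The excluded corner case can be found mechanically. The Python sketch below is an illustration that works on unsigned bit patterns of width w. It enumerates every dividend and non-zero divisor, interprets both in two's complement, and reports the pairs whose integer quotient, truncated toward zero as Coq's Z.quot does, does not fit in w bits. All function names here are hypothetical.

```python
def to_Z(w, n):
    """Integer denoted in two's complement by the width-w pattern with unsigned value n."""
    return n - (1 << w) if n >= (1 << (w - 1)) else n

def quot(a, b):
    """Integer quotient truncated toward zero, for b != 0."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def representable(w, z):
    """Can z be written in two's complement with w bits?"""
    return -(1 << (w - 1)) <= z < (1 << (w - 1))

def overflowing_divisions(w):
    """Pairs (bv, cv) of bit patterns, cv nonzero, whose signed quotient overflows."""
    return [
        (a, b)
        for a in range(1 << w)
        for b in range(1, 1 << w)
        if not representable(w, quot(to_Z(w, a), to_Z(w, b)))
    ]

print(overflowing_divisions(4))   # [(8, 15)]: #b1000 / #b1111, i.e. -8 / -1
```

For width 4 the only offending pair is the one singled out in the text.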
The coq-nbits theory has several important differences from the prior Coq formalization in [15]. Our formal bit-vector theory supports both unsigned and two's complement signed representations. It also provides the arithmetic division and remainder functions. Since these features are needed in the QF BV logic of SMT-LIB, they are essential to the formalization of SMT QF BV queries. Such important features unfortunately are lacking in the prior formalization. Another notable difference lies in the numeric representations used in the theory developments. Since integers are needed for the QF BV logic, coq-nbits naturally uses binary representations for integers and natural numbers in Coq. The prior formalization on the other hand is mainly based on the unary natural number representation but provides conversion to positive integers in the binary representation.

Theory for SMT QF BV Queries
Using coq-nbits, we formalize SMT QF BV queries. Our formalization consists of two parts: a syntactic representation for SMT QF BV queries in Coq inductive types and a formal semantics in our bit-vector theory coq-nbits.

Syntax of SMT QF BV Queries
An SMT QF BV query is a Coq term of the data type bexp. It can be one of the constants Bfalse or Btrue, a negation Bnot, a conjunction Band, or a disjunction Bor of queries; it can also be a QF BV predicate on QF BV expressions, such as the equality Bbveq or the unsigned less-than Bbvult.
A Coq term of the data type exp represents a QF BV expression. It can be a QF BV variable Evar vid with a variable identifier vid : var, a bit-vector constant Econst bv with bv : bits, a bitwise-not operation Ebvnot $e_0$, a bitwise-and operation Ebvand $e_0$ $e_1$, a bitwise-or operation Ebvor $e_0$ $e_1$, a logical left-shift operation Ebvshl $e_0$ $e_1$, or a logical right-shift operation Ebvlshr $e_0$ $e_1$. For arithmetic operations, there are Ebvneg $e_0$ for negation, Ebvadd $e_0$ $e_1$ for addition, Ebvmul $e_0$ $e_1$ for multiplication, Ebvudiv $e_0$ $e_1$ for unsigned division, and Ebvurem $e_0$ $e_1$ for unsigned remainder with $e_0, e_1$ : exp. Finally, the extraction Eextract $i$ $j$ $e_0$ and the concatenation Econcat $e_0$ $e_1$ operations have the data type exp with $i, j$ : nat and $e_0, e_1$ : exp.

Semantics of SMT QF BV Queries
In our Coq formalization, an SMT QF BV query is interpreted on stores. A store is a mapping from QF BV variables to bits. Let σ be a store. The interpretation of be : bexp on σ is a Boolean value; the interpretation of e : exp on σ is a bit vector. These interpretations are given by the semantic functions eval_bexp and eval_exp. An SMT QF BV query denotes a value in the Coq data type bool. Bfalse and Btrue denote false and true respectively. Boolean negation, conjunction, and disjunction correspond to ~~, &&, and || in bool respectively. For QF BV predicates, the bit-vector equality Bbveq is interpreted by the equality == for Boolean sequences. The coq-nbits function ltB is used to interpret Bbvult.
A QF BV expression denotes a bit vector. For basic cases, QF BV variables are interpreted by corresponding bit vectors in the store σ through the store access function Store.acc; bit-vector constants are interpreted by themselves. Bitwise logical operations Ebvnot, Ebvand, and Ebvor are interpreted by corresponding coq-nbits functions invB, andB, and orB respectively. For logical shift operations, the offset $e_1$ is first converted to a natural number through to_nat (eval_exp $e_1$ σ) and then passed to the corresponding logical shift functions shlB or shrB in coq-nbits. QF BV arithmetic operations are interpreted by corresponding coq-nbits arithmetic functions as expected. Finally, the extraction Eextract and concatenation Econcat operations are interpreted by extract and ++ in coq-nbits respectively.
In an SMT QF BV query, a QF BV variable designates a bit vector of a certain width. An SMT QF BV query is hence associated with a signature Σ mapping QF BV variables to their respective widths. A store σ conforms to a signature Σ if the interpretation of each QF BV variable on σ has the same width as specified in Σ. Given an SMT QF BV query be : bexp with its signature Σ, be is satisfiable if there is a store σ conforming to Σ such that eval_bexp be σ = true.
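The shape of the two semantic functions and of the satisfiability definition can be sketched for a small fragment. The Python code below is an illustration only: queries and expressions are nested tuples with hypothetical tags, bit vectors are LSB-first lists of bools, and only a handful of constructors are covered. Satisfiability is decided by enumerating every store that conforms to the signature, which is feasible only for tiny widths.

```python
from itertools import product

def to_nat(bv):
    return sum(int(b) << i for i, b in enumerate(bv))

def from_nat(w, n):
    return [bool((n >> i) & 1) for i in range(w)]

def eval_exp(e, store):
    """Bit vector denoted by an expression on a store."""
    tag = e[0]
    if tag == "var":
        return store[e[1]]
    if tag == "const":
        return e[1]
    if tag == "bvnot":
        return [not b for b in eval_exp(e[1], store)]
    a, b = eval_exp(e[1], store), eval_exp(e[2], store)
    if tag == "bvand":
        return [x and y for x, y in zip(a, b)]
    if tag == "bvadd":
        return from_nat(len(a), to_nat(a) + to_nat(b))
    raise ValueError("unknown expression: " + tag)

def eval_bexp(be, store):
    """Boolean value denoted by a query on a store."""
    tag = be[0]
    if tag == "true":
        return True
    if tag == "false":
        return False
    if tag == "not":
        return not eval_bexp(be[1], store)
    if tag == "and":
        return eval_bexp(be[1], store) and eval_bexp(be[2], store)
    if tag == "or":
        return eval_bexp(be[1], store) or eval_bexp(be[2], store)
    if tag == "bveq":
        return eval_exp(be[1], store) == eval_exp(be[2], store)
    if tag == "bvult":
        return to_nat(eval_exp(be[1], store)) < to_nat(eval_exp(be[2], store))
    raise ValueError("unknown query: " + tag)

def conforming_stores(sig):
    """All stores conforming to sig, a dict from variable name to width."""
    names = sorted(sig)
    for values in product(*[range(1 << sig[n]) for n in names]):
        yield {n: from_nat(sig[n], v) for n, v in zip(names, values)}

def satisfiable(be, sig):
    return any(eval_bexp(be, s) for s in conforming_stores(sig))

x, y = ("var", "x"), ("var", "y")
query = ("and", ("bveq", ("bvadd", x, y), ("const", [False] * 3)), ("bvult", x, y))
print(satisfiable(query, {"x": 3, "y": 3}))   # True, e.g. x = 1 and y = 7
```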

Derived QF BV Operations and Predicates
In the QF BV logic of SMT-LIB, a number of QF BV operations and predicates are derived from a small set of core operations and predicates. Consider the signed less-than predicate bvslt bv cv. To compare two bit vectors of width $w$ in two's complement representation, the sign bits are checked. If bv is negative but cv is non-negative, bvslt bv cv is true. Otherwise, the signed predicate checks that both operands have the same sign and compares the operands using the unsigned comparison predicate. Interestingly, the arithmetic subtraction operation is actually a derived operation in SMT-LIB: $\mathit{bvsub}\ bv\ cv \triangleq \mathit{bvadd}\ bv\ (\mathit{bvneg}\ cv)$. The arithmetic operation is defined to be the bit-vector sum of the minuend and the negation of the subtrahend. It is not, for instance, defined as $\mathit{nat2bv}(w, \mathit{bv2nat}(bv) - \mathit{bv2nat}(cv))$ because $\mathit{bv2nat}(bv) - \mathit{bv2nat}(cv)$ may not be a natural number.
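Both derived definitions can be tried out on unsigned values. In the Python sketch below, which is an illustration with bit vectors identified with their unsigned values, bvslt is built from the sign bits and bvult exactly as described, and bvsub is built from bvadd and bvneg. The check compares them with integer comparison in two's complement and with subtraction modulo 2**w for all inputs of a given width; the helper names to_Z and check are assumptions.

```python
def msb(w, a):
    """Sign bit of the width-w pattern with unsigned value a."""
    return (a >> (w - 1)) & 1

def bvult(a, b):
    return a < b

def bvslt(w, a, b):
    """Signed less-than derived from sign bits and unsigned comparison."""
    if msb(w, a) == 1 and msb(w, b) == 0:
        return True
    return msb(w, a) == msb(w, b) and bvult(a, b)

def bvneg(w, a):
    return (2**w - a) % 2**w

def bvadd(w, a, b):
    return (a + b) % 2**w

def bvsub(w, a, b):
    """Subtraction derived as the sum of the minuend and the negated subtrahend."""
    return bvadd(w, a, bvneg(w, b))

def to_Z(w, a):
    return a - 2**w if a >= 2**(w - 1) else a

def check(w):
    values = range(2**w)
    return all(
        bvslt(w, a, b) == (to_Z(w, a) < to_Z(w, b))
        and bvsub(w, a, b) == (a - b) % 2**w
        for a in values for b in values
    )

print(bvslt(4, 0b1010, 0b0011))   # True: -6 < 3
print(check(4))                   # True
```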
For derived operations and predicates, there is a subtle yet important difference between our formal semantics and the one defined in SMT-LIB. In our formal bit-vector theory coq-nbits, most functions and predicates are defined directly. Particularly, the arithmetic subtraction function subB is defined by one-bit subtractors in coq-nbits. Our formal semantics for the QF BV arithmetic operation bvsub therefore is defined by the corresponding bit-vector function subB. Since our formal semantics does not define bvsub by bvadd and bvneg, it could differ from that of SMT-LIB. In order to build a certified solver for the QF BV logic of SMT-LIB, it is necessary to establish semantic equivalences between both semantic definitions for all derived QF BV operations and predicates.
To justify our formal semantics, we show the semantics of our definitions and those of SMT-LIB indeed denote the same bit-vector functions or predicates. Consider again the subtraction operation. Recall the semantics of the arithmetic operations bvadd and bvneg are defined by the bit-vector functions addB and negB respectively. The next lemma is useful to show the semantic equivalence:
Lemma 6. $\forall\, bv\ cv,\ \mathit{size}\ bv = \mathit{size}\ cv \Longrightarrow \mathit{subB}\ bv\ cv = \mathit{addB}\ bv\ (\mathit{negB}\ cv)$.
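Lemma 6 can be exercised on small widths with a Python sketch. The circuits below are assumptions made for illustration: subB chains one-bit subtractors with a borrow, addB is a ripple-carry adder, and negB complements its input and takes the successor. Whether coq-nbits uses exactly these circuits is not claimed here. The exhaustive comparison only tests the statement for a few widths and does not replace the Coq proof.

```python
from itertools import product

def addB(bv, cv):
    """Ripple-carry addition on LSB-first lists of bools."""
    out, carry = [], False
    for a, b in zip(bv, cv):
        out.append(a ^ b ^ carry)
        carry = (a and b) or (carry and (a ^ b))
    return out

def succB(bv):
    if not bv:
        return []
    return [False] + succB(bv[1:]) if bv[0] else [True] + bv[1:]

def negB(bv):
    """Two's complement negation: complement every bit, then add one."""
    return succB([not b for b in bv])

def full_subtractor(a, b, borrow):
    """One-bit subtractor: returns (difference bit, borrow out)."""
    diff = a ^ b ^ borrow
    borrow_out = ((not a) and b) or ((not (a ^ b)) and borrow)
    return diff, borrow_out

def subB(bv, cv):
    """Subtraction by chaining one-bit subtractors; the final borrow is dropped."""
    out, borrow = [], False
    for a, b in zip(bv, cv):
        d, borrow = full_subtractor(a, b, borrow)
        out.append(d)
    return out

def lemma6_holds(max_width):
    """Check subB bv cv == addB bv (negB cv) for all equal-width inputs."""
    return all(
        subB(list(bv), list(cv)) == addB(list(bv), negB(list(cv)))
        for w in range(max_width + 1)
        for bv in product([False, True], repeat=w)
        for cv in product([False, True], repeat=w)
    )

print(lemma6_holds(4))   # True
```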
For all derived QF BV operations and predicates, we give Coq proofs for the equivalence between our formal semantics and that of SMT-LIB. Particularly, semantics of all QF BV arithmetic operations and predicates over two's complement representation are equivalent to those in SMT-LIB. Our formal semantics for QF BV queries is thus certified to be equivalent to that of SMT-LIB.

Certified Bit Blasting
Recall that a SAT query is a Boolean CNF formula. Given an SMT QF BV query, a bit blasting algorithm computes a SAT query that is satisfiable if and only if the given SMT QF BV query is satisfiable. Although it is the standard technique for solving SMT QF BV queries, bit blasting can be very complex due to arithmetic operations and various optimizations. Bit blasting algorithms therefore can be tedious to construct and thus prone to errors. We verify a bit blasting algorithm for SMT QF BV queries using our Coq formalization.
Let us start with a simple formalization of Boolean CNF formulae. In our formalization, a clause is represented by a sequence of literals; a CNF formula in turn is represented by a sequence of clauses. Let bvar be the data type for Boolean variables; the Coq data types for literals, clauses, and CNF formulae are built on top of it. Define an environment to be a mapping from bvar to bool. Given a literal $\ell$, a CNF formula $f$, and an environment $\mathcal{E}$, it is straightforward to define the semantic functions eval_lit $\ell$ $\mathcal{E}$ : bool and eval_cnf $f$ $\mathcal{E}$ : bool.
A SAT query $f$ is satisfiable if there is an environment $\mathcal{E}$ such that eval_cnf $f$ $\mathcal{E}$ = true.
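To illustrate how our Coq proof works, consider Tseitin transformation for the logical negation operation. For a literal $a$, the transformation introduces a fresh literal $r$ with the clauses $(r \lor a) \land (\neg r \lor \neg a)$, which force $r$ to be the negation of $a$ in every environment satisfying them. The idea is generalized to QF BV operations naturally. For each QF BV operation, we construct a literal sequence $\mathit{ls}_r$ and a Boolean CNF formula $\mathit{cnf}$. If $\mathit{cnf}$ evaluates to true on an environment $\mathcal{E}$, the interpretation of $\mathit{ls}_r$ on $\mathcal{E}$ needs to reflect the semantics of the QF BV operation. For instance, a Coq proof is given for the QF BV addition operation:
Lemma 8. $\forall\, \mathcal{E}\ \mathit{ls}_r\ \mathit{cnf}\ \mathit{ls}_0\ \mathit{ls}_1,\ (\mathit{ls}_r, \mathit{cnf}) = \mathit{bit\_blast\_Ebvadd}\ \mathit{ls}_0\ \mathit{ls}_1 \Longrightarrow \mathit{eval\_cnf}\ \mathit{cnf}\ \mathcal{E} = \mathit{true} \Longrightarrow \mathit{eval\_lits}\ \mathit{ls}_r\ \mathcal{E} = \mathit{addB}\ (\mathit{eval\_lits}\ \mathit{ls}_0\ \mathcal{E})\ (\mathit{eval\_lits}\ \mathit{ls}_1\ \mathcal{E})$.
Given two literal sequences $\mathit{ls}_0$ and $\mathit{ls}_1$, bit_blast_Ebvadd $\mathit{ls}_0$ $\mathit{ls}_1$ returns a literal sequence $\mathit{ls}_r$ and a CNF formula $\mathit{cnf}$. If $\mathit{cnf}$ evaluates to true on an environment $\mathcal{E}$, then the interpretation of the literal sequence $\mathit{ls}_r$ on $\mathcal{E}$ is indeed the bit-vector sum of the interpretations of $\mathit{ls}_0$ and $\mathit{ls}_1$ on $\mathcal{E}$. Bit blasting algorithms for other QF BV operations are given and shown to reflect the semantics of corresponding functions defined in the bit-vector theory coq-nbits. Particularly, our bit blasting algorithms for arithmetic division and remainder correctly reflect corresponding arithmetic bit-vector functions in coq-nbits.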
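The property stated for addition can be tested on a toy bit blaster. The Python sketch below is an illustration under several assumptions: literals are DIMACS-style integers, each gate is encoded by a plain truth-table expansion rather than an optimized clause set, the adder is a ripple-carry circuit, and the names gate, bit_blast_add, and check_add are hypothetical. The check enumerates every environment, keeps those satisfying the generated CNF formula, and compares the value of the result literals with the modular sum of the operands.

```python
from itertools import product

def gate(inputs, out, fn):
    """Clauses forcing literal out to equal fn(*inputs) (truth-table encoding)."""
    clauses = []
    for vals in product([False, True], repeat=len(inputs)):
        guard = [-v if val else v for v, val in zip(inputs, vals)]
        clauses.append(guard + [out if fn(*vals) else -out])
    return clauses

def bit_blast_add(ls0, ls1, fresh):
    """Ripple-carry adder: returns (result literals, CNF, next fresh variable)."""
    carry, fresh = fresh, fresh + 1
    cnf, result = [[-carry]], []          # the carry-in is forced to ff
    for a, b in zip(ls0, ls1):
        s, cout, fresh = fresh, fresh + 1, fresh + 2
        cnf += gate([a, b, carry], s, lambda x, y, z: x ^ y ^ z)
        cnf += gate([a, b, carry], cout, lambda x, y, z: (x + y + z) >= 2)
        result.append(s)
        carry = cout
    return result, cnf, fresh

def eval_lit(l, env):
    return env[l] if l > 0 else not env[-l]

def eval_lits(ls, env):
    return [eval_lit(l, env) for l in ls]

def eval_cnf(cnf, env):
    return all(any(eval_lit(l, env) for l in c) for c in cnf)

def to_nat(bits):
    return sum(int(b) << i for i, b in enumerate(bits))

def check_add(w):
    """Return (number of satisfying environments, whether all reflect addition)."""
    ls0 = list(range(1, w + 1))
    ls1 = list(range(w + 1, 2 * w + 1))
    ls_r, cnf, fresh = bit_blast_add(ls0, ls1, 2 * w + 1)
    models, ok = 0, True
    for vals in product([False, True], repeat=fresh - 1):
        env = dict(enumerate(vals, start=1))
        if eval_cnf(cnf, env):
            models += 1
            total = to_nat(eval_lits(ls0, env)) + to_nat(eval_lits(ls1, env))
            ok = ok and to_nat(eval_lits(ls_r, env)) == total % 2**w
    return models, ok

print(gate([1], 2, lambda x: not x))   # [[1, 2], [-1, -2]]: Tseitin clauses for negation
print(check_add(2))                    # (16, True)
```

The count of 16 satisfying environments for width 2 shows that the formula is not vacuously unsatisfiable: there is exactly one satisfying environment per choice of the two operands.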
Recall that the semantics for SMT QF BV queries is defined over stores for QF BV variables. In order to prove the correctness of bit blasting algorithms, one has to relate stores for QF BV variables with environments for Boolean variables. The relation is made explicit through literal correspondences. A literal correspondence π is a mapping from QF BV variables to sequences of literals. For each QF BV variable $v$, the literal sequence $\pi(v)$ is meant to interpret $v$ on environments for Boolean variables. More formally, let eval_lits $\mathit{ls}$ $\mathcal{E}$ : bits be the bit vector for the literal sequence $\mathit{ls}$ interpreted on the environment $\mathcal{E}$. The bit vector eval_lits $\pi(v)$ $\mathcal{E}$ is hence the interpretation of the QF BV variable $v$ on the environment $\mathcal{E}$. Let σ be a store and π a literal correspondence. An environment $\mathcal{E}$ is consistent with σ through π if the bit vectors eval_lits $\pi(v)$ $\mathcal{E}$ and Store.acc $v$ σ are equal for every QF BV variable $v$ in σ. Thus, an environment is consistent with a store if their interpretations of variables coincide.
It is now straightforward to give our bit blasting algorithm for SMT QF BV queries. For each QF BV expression, our algorithm first computes literals and CNF formulae for operands recursively. It then invokes an auxiliary bit blasting algorithm to construct result literals and a CNF formula for the QF BV operation. The literal correspondence is also updated when literals are allocated for QF BV variables. Finally, the result literals and the updated literal correspondence are returned along with the concatenation of all CNF formulae.
The following Coq theorem establishes the connection between the output literals and the input SMT QF BV query or expression of the algorithm. Theorem 1. Let be : bexp be an SMT QF BV query with the signature Σ be , e : exp a QF BV expression with the signature Σ e , and π 0 the empty literal correspondence.
Let be be an SMT QF BV query with the signature Σ_be, and let r and cnf be the literal and the CNF formula returned by bit_blast_bexp respectively. Consider any store conforming to Σ_be and any environment consistent with the store. If the environment evaluates the formula cnf to true, Theorem 1 says that the literal r and the SMT QF BV query be evaluate to the same Boolean value on the environment and the store respectively. In other words, the algorithm bit_blast_bexp is a generalized Tseitin transformation for SMT QF BV queries. In particular, all QF BV arithmetic operations (addition, subtraction, multiplication, division, and remainder in the unsigned and two's complement representations) are transformed to CNF formulae with formal proofs of correctness in Coq.
Corollary 1 gives the formal proof of correctness for our bit blasting algorithm bit_blast_bexp. Let be be an arbitrary SMT QF BV query, and let r and cnf be the literal and the CNF formula returned by the algorithm. The corollary shows that the query be is satisfiable if and only if the SAT query r ∧ cnf is satisfiable. An equi-satisfiable SAT query is thus obtained from the bit blasting algorithm on every input SMT QF BV query, with a formal proof of correctness.
Recall that several QF BV operations and predicates are derived from a small number of operations and predicates in SMT-LIB. A naïve bit blasting algorithm could expand derived operations or predicates, and then perform bit blasting on a small set of operations and predicates. Such an algorithm would have a simpler proof of correctness but would generate more intermediate literals and clauses. For instance, the naïve algorithm for bvsub would perform bit blasting on bvneg followed by bvadd, with intermediate literals and clauses for the negation. Our bit blasting algorithm for bvsub, on the other hand, directly reflects our semantics defined by the bit-vector function subB. Intermediate literals and clauses for the negation are not needed. Our bit blasting algorithm hence transforms bvsub more economically than the naïve algorithm.
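The saving can be illustrated with a hedged Python sketch that counts the auxiliary variables of two subtractor encodings. The source does not spell out its construction for bvsub, so the direct encoding below (a single adder computing x + ~y + 1, with the complement obtained for free by negating literals) is only one assumed way to avoid the intermediate negation. The class Circuit and the functions sub_naive and sub_direct are hypothetical names.

```python
class Circuit:
    """Tiny Tseitin-style circuit builder; variable 1 is the constant true."""
    def __init__(self, width):
        self.cnf, self.defs = [[1]], []
        self.xs = list(range(2, 2 + width))              # literals of operand x
        self.ys = list(range(2 + width, 2 + 2 * width))  # literals of operand y
        self.n = 1 + 2 * width                            # last allocated variable

    def gate(self, kind, a, b):
        """Allocate z and add clauses for z <-> (a kind b)."""
        self.n += 1
        z = self.n
        if kind == "and":
            self.cnf += [[-z, a], [-z, b], [z, -a, -b]]
        elif kind == "or":
            self.cnf += [[z, -a], [z, -b], [-z, a, b]]
        else:  # xor
            self.cnf += [[-z, a, b], [-z, -a, -b], [z, -a, b], [z, a, -b]]
        self.defs.append((z, kind, a, b))
        return z

    def add(self, xs, ys, carry):
        """Ripple-carry adder with an explicit carry-in literal (LSB first)."""
        out = []
        for a, b in zip(xs, ys):
            t = self.gate("xor", a, b)
            out.append(self.gate("xor", t, carry))
            carry = self.gate("or", self.gate("and", a, b), self.gate("and", t, carry))
        return out

    def value(self, x, y, lits):
        """Evaluate gates in allocation order and read lits as an unsigned integer."""
        env = {1: True}
        for i, (vx, vy) in enumerate(zip(self.xs, self.ys)):
            env[vx], env[vy] = bool((x >> i) & 1), bool((y >> i) & 1)
        val = lambda l: env[abs(l)] if l > 0 else not env[abs(l)]
        for z, kind, a, b in self.defs:
            p, q = val(a), val(b)
            env[z] = (p and q) if kind == "and" else (p or q) if kind == "or" else (p != q)
        assert all(any(val(l) for l in c) for c in self.cnf)  # env satisfies the CNF
        return sum(1 << i for i, l in enumerate(lits) if val(l))

def sub_naive(c):
    """bvsub as bvadd x (bvneg y), where bvneg y = ~y + 1 needs its own adder."""
    one = [1] + [-1] * (len(c.ys) - 1)
    neg = c.add([-y for y in c.ys], one, -1)
    return c.add(c.xs, neg, -1)

def sub_direct(c):
    """bvsub as x + ~y + 1 in a single adder with the carry-in set to true."""
    return c.add(c.xs, [-y for y in c.ys], 1)

W = 4
naive, direct = Circuit(W), Circuit(W)
r_naive, r_direct = sub_naive(naive), sub_direct(direct)
```

Both encodings compute the same function, but in this sketch the naïve one allocates a second adder's worth of variables and clauses for the negation.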
To improve our bit blasting algorithm further, a cache for QF BV expressions and predicates is added. In large queries, QF BV expressions and predicates can occur a number of times. If a QF BV expression has several occurrences, our basic bit blasting algorithm will generate result literals and CNF formulae for each occurrence. Consider the SMT QF BV query
(and (bvslt #b1000 (bvadd x y)) (bvslt (bvadd x y) #b0111)).
The query checks whether the sum of the QF BV variables x and y can be in a proper range. Since the Boolean predicate and has two operands, our basic algorithm invokes the auxiliary bit blasting algorithm for the two comparison predicates. It in turn blasts the same expression bvadd x y twice. Repeated bit blasting on the same expression or predicate is redundant. A hash function can detect repeated QF BV expressions and predicates easily. When an expression or a predicate recurs, the previously computed literals are returned from a cache together with the empty CNF formula. More importantly, we give a formal Coq proof of Corollary 1 for the bit blasting algorithm with a cache.
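The effect of the cache on the example query can be sketched in Python. This is an illustrative model under several assumptions, not the extracted OCaml program: expressions are nested tuples, a Python dictionary keyed by the expression plays the role of the hash-based cache, clauses are accumulated in one list (so a cache hit simply adds nothing), and the gate encodings are ordinary Tseitin gates. The class Blaster and its methods are hypothetical names.

```python
class Blaster:
    def __init__(self, use_cache=True):
        self.n = 1              # variable 1 is reserved for the constant true
        self.cnf = [[1]]
        self.defs = []          # gate definitions, kept only for evaluation
        self.pi = {}            # literal correspondence: variable name -> literals
        self.cache = {}         # expression -> previously computed literals
        self.use_cache = use_cache

    def fresh(self):
        self.n += 1
        return self.n

    def gate(self, kind, a, b):
        z = self.fresh()
        if kind == "and":
            self.cnf += [[-z, a], [-z, b], [z, -a, -b]]
        elif kind == "or":
            self.cnf += [[z, -a], [z, -b], [-z, a, b]]
        else:  # xor
            self.cnf += [[-z, a, b], [-z, -a, -b], [z, -a, b], [z, a, -b]]
        self.defs.append((z, kind, a, b))
        return z

    def blast(self, e):
        if self.use_cache and e in self.cache:
            return self.cache[e]            # cache hit: no new literals or clauses
        op = e[0]
        if op == "var":                     # ("var", name, width)
            if e[1] not in self.pi:
                self.pi[e[1]] = [self.fresh() for _ in range(e[2])]
            r = self.pi[e[1]]
        elif op == "const":                 # ("const", value, width)
            r = [1 if (e[1] >> i) & 1 else -1 for i in range(e[2])]
        elif op == "bvadd":
            xs, ys = self.blast(e[1]), self.blast(e[2])
            r, c = [], -1                   # carry-in is constant false
            for a, b in zip(xs, ys):
                t = self.gate("xor", a, b)
                r.append(self.gate("xor", t, c))
                c = self.gate("or", self.gate("and", a, b), self.gate("and", t, c))
        elif op == "bvslt":
            xs, ys = self.blast(e[1]), self.blast(e[2])
            # Signed comparison is unsigned comparison with both sign bits flipped.
            xs, ys = xs[:-1] + [-xs[-1]], ys[:-1] + [-ys[-1]]
            r = -1
            for a, b in zip(xs, ys):        # ripple from LSB to MSB
                eq = -self.gate("xor", a, b)
                r = self.gate("or", self.gate("and", -a, b), self.gate("and", eq, r))
        else:                               # ("and", p, q) on predicates
            r = self.gate("and", self.blast(e[1]), self.blast(e[2]))
        self.cache[e] = r
        return r

    def evaluate(self, store, lit):
        """Value of lit on the environment induced by an integer-valued store."""
        env = {1: True}
        for name, ls in self.pi.items():
            for i, v in enumerate(ls):
                env[v] = bool((store[name] >> i) & 1)
        val = lambda l: env[abs(l)] if l > 0 else not env[abs(l)]
        for z, kind, a, b in self.defs:
            p, q = val(a), val(b)
            env[z] = (p and q) if kind == "and" else (p or q) if kind == "or" else (p != q)
        assert all(any(val(l) for l in c) for c in self.cnf)
        return val(lit)

x, y = ("var", "x", 4), ("var", "y", 4)
s = ("bvadd", x, y)
query = ("and", ("bvslt", ("const", 0b1000, 4), s), ("bvslt", s, ("const", 0b0111, 4)))

plain, cached = Blaster(use_cache=False), Blaster(use_cache=True)
r_plain, r_cached = plain.blast(query), cached.blast(query)
```

In this sketch the cached run allocates 20 fewer variables than the uncached run, which is exactly the cost of the second 4-bit adder, while both result literals agree with the query on every store.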

A Certified SMT QF BV Solver
We have so far built a formally verified bit blasting algorithm for SMT QF BV queries. Using the code extraction mechanism in Coq, an OCaml program corresponding to the verified bit blasting algorithm is obtained. Using a SAT solver and a SAT certificate checker, a certified SMT QF BV solver can be constructed. Figure 1 gives the flow of our certified solver. In the figure, the extracted OCaml program takes an OCaml expression be of the type bexp as an input (Section 5). The verified program performs bit blasting on the SMT QF BV query and returns an OCaml expression cnf of the type lit list list representing a SAT query (Section 6). Precisely, an OCaml term of the type lit represents a literal. The OCaml type lit list corresponds to the data type for clauses; and the type lit list list corresponds to the data type for CNF formulae. The expression cnf is sent to a SAT solver to check satisfiability. If the SAT solver reports SAT, the SMT QF BV query represented by be is satisfiable. Otherwise, the SAT solver reports UNSAT with a certificate. The certificate is sent to a SAT certificate checker for validation. If it is validated, the SMT QF BV query be is unsatisfiable with certification.

Experiments
In order to evaluate the performance of our verified OCaml bit blasting program, we instantiate our SMT QF BV solver CoqQFBV based on Figure 1 as follows. We write an OCaml parser to translate a text file in the SMT-LIB format to an SMT QF BV query in our formal syntax. The query is sent to the verified OCaml program for bit blasting. We then add an OCaml program to transform the output SAT query to a text file in the DIMACS format. The 2020 SAT Competition winner Kissat [5] is used to check the satisfiability of the SAT query. If the SAT solver reports UNSAT with a certificate in the DRAT format [31], the certificate is sent to the verified certificate checker GratChk [16] for validation. Certificate checkers for SAT solvers use much simpler algorithms than certificate checkers for SMT solvers. They are hence easier to build and prove correct. The correctness of GratChk is in fact verified by the proof assistant Isabelle [22]. We need not trust the certificate checker either.
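The transformation from a SAT query to DIMACS text can be sketched as follows. The sketch is in Python rather than OCaml and assumes that literals are already nonzero integers, with a negative integer denoting a negated variable; the function name to_dimacs is hypothetical and is not taken from CoqQFBV.

```python
def to_dimacs(cnf):
    """Render a CNF formula (a list of clauses, each a list of integer literals)
    in the DIMACS CNF format: a problem line followed by 0-terminated clauses."""
    num_vars = max((abs(l) for clause in cnf for l in clause), default=0)
    lines = [f"p cnf {num_vars} {len(cnf)}"]
    for clause in cnf:
        lines.append(" ".join(str(l) for l in list(clause) + [0]))
    return "\n".join(lines) + "\n"

example = to_dimacs([[1, -2], [2, 3], [-1]])
```

The resulting text is what a DIMACS-reading SAT solver such as Kissat consumes; the list-of-lists input mirrors the lit list list type of the extracted program.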
We ran two experiments to evaluate our certified SMT QF BV solver. The first experiment is the QF BV division of the single query track in the 2020 SMT Competition [2]. The second experiment consists of verification problems from various assembly implementations for linear field arithmetic in cryptography libraries such as OpenSSL [30], RELIC [1], and BLST [29]. We compare CoqQFBV against three SMT QF BV solvers: CVC4 [4] with an LFSC certificate checker [27], the 2020 SMT QF BV division winner Bitwuzla [20], and the 2019 SMT QF BV division winner Boolector [21]. Bitwuzla and Boolector are designed for efficiency without certification. CVC4 provides an LFSC certificate checker implemented in C [26]. The certificate checker can validate certificates from different theories but is itself not verified. All experiments were run on a Linux machine with a 3.20GHz CPU and 1TB memory.

SMT QFBV Competition
The first experiment runs our certified solver CoqQFBV on tasks from the QF BV division of the 2020 SMT Competition. We set a 60GB memory limit and a 20-minute timeout for each task, as in the competition. A task solves a single SMT-LIB file sequentially. The SMT QF BV division contains 6861 files in the SMT-LIB format. All files are marked with unsat, sat, or unknown, indicating the expected query results. To save running time, we ran 10 tasks concurrently. The experimental results are summarized in Table 1.
In the table, the column N_SC indicates the number of tasks solved with certification. O_SC is the number of timeouts. E_SC shows the number of tasks left unsolved due to tool errors. T_SC is the average time for solved tasks. CoqQFBV solves 6087 tasks (88.72%) with certification, and CVC4 with its certificate checker solves 3840 tasks (55.97%) with certification. We observe three stack overflow errors during bit blasting in CoqQFBV. These errors are induced by deep recursion. Among the 328 errors from CVC4, 249 are segmentation faults raised by the LFSC certificate checker.
The same table also compares against efficient but uncertified solvers. To evaluate the overhead from certificate checking, the two certified solvers CoqQFBV [...]
CoqQFBV solves about 10% fewer tasks with certification than the 2020 track winner Bitwuzla without certification. It also performs significantly better than CVC4 with a general SMT certificate checker. Table 2 compares the four solvers on tasks grouped by the three expected query results. Among the 4238 unsat tasks, CoqQFBV and CVC4 give certified answers to 3838 (90.56%) and 1762 (41.58%) of them respectively. The column P_SU gives the average size of certificates. The efficient solvers Bitwuzla and Boolector give 4188 (98.82%) and 4180 (98.63%) uncertified answers respectively.
Among the 2553 sat tasks, Bitwuzla and Boolector finish 2524 (98.86%) and 2516 (98.55%) of them respectively. CoqQFBV and CVC4 solve only 2242 (87.82%) and 2078 (81.39%) sat tasks respectively. For the 70 tasks marked unknown, Bitwuzla and Boolector respectively answer 27 (38.57%) and 23 (32.86%) of them without certification. Our certified SMT QF BV solver finds two sat and five unsat tasks. Answers to the five unsat tasks are all certified. CVC4 with its certificate checker fails to solve any unknown task. For the benchmarks from the 2020 SMT QF BV division, our certified solver CoqQFBV appears to be more scalable than CVC4 with its general SMT certificate checker.
Table 3 further decomposes the time spent on different components in CoqQFBV. The column T_BB gives the average time for our verified OCaml bit blasting program; T_SAT gives the average time used by the SAT solver Kissat; and T_Cert contains the average time for the certificate checker GratChk. For the tasks in the QF BV division, the times for SAT solving and certificate checking are comparable. In comparison, the OCaml bit blasting program takes an unexpectedly large amount of time and hence can still be improved.

Linear Field Arithmetic in Cryptography
In this section, we evaluate our certified SMT QF BV solver on benchmarks from real-world assembly implementations in various cryptography libraries such as OpenSSL [30], RELIC [1], and BLST [29]. In elliptic curve cryptography, arithmetic operations over large finite fields are needed. A field element is typically represented by hundreds of bits. A field arithmetic operation takes two field elements and returns a field element as the result. In the signature scheme Ed25519 used in OpenSSH, for instance, a field element belongs to the residue system modulo the prime number 2^255 − 19. The field sum of two field elements is obtained by the arithmetic sum modulo 2^255 − 19. Commodity processors however do not natively support arithmetic instructions with operands of hundreds of bits. Field arithmetic has to be implemented by 32- or 64-bit instructions.
The functional specification of the field addition used in Ed25519 is stated over the program x25519_fe64_add and its variables r_0, r_1, r_2, r_3, a_0, a_1, a_2, a_3, b_0, b_1, b_2, b_3. Let a_i, b_i, r_i be 64-bit variables (registers) for 0 ≤ i ≤ 3. The specification says that the output field element represented by the r_i's computed by the program x25519_fe64_add is the field arithmetic sum of the input elements represented by the a_i's and b_i's. In finite field arithmetic programs, overflow or underflow in assembly instructions leads to incorrect results, and bit-accurate program verification is required.
We obtain 46 implementations and generate 96 SMT QF BV queries from verification conditions in order to evaluate our certified solver in this experiment. Table 4 shows the verification results with the same memory and time limits as in the 2020 SMT Competition. All SMT QF BV queries are expected to be unsatisfiable. Boolector successfully solves all queries (100%) without certification. The 2020 QF BV track winner Bitwuzla finishes 88 queries (91.67%) without certification. Surprisingly, CoqQFBV gives certified answers to 93 queries (96.88%).
The verified SAT certificate checker GratChk used in CoqQFBV successfully validates all certificates for the real-world cryptographic program verification problems. In comparison, CVC4 solves 46 queries (47.92%) but certifies only 19 (19.79%). The CVC4 certificate checker raises segmentation faults on the 27 (= 46 − 19) solved but uncertified queries. These certificates are perhaps too complicated to be validated by the unverified LFSC certificate checker. For the SMT QF BV queries from real-world program verification problems, our certified solver CoqQFBV seems to perform slightly better than the efficient but uncertified SMT QF BV solver Bitwuzla. Our certified solver is probably scalable enough for certain bit-accurate program verification problems.
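To make the bit-accuracy requirement of this section concrete, the following Python sketch models a field addition modulo 2^255 − 19 on four 64-bit limbs. It is a simplified illustration, not the OpenSSL assembly: the functions to_limbs, from_limbs, add_limbs, and fe_add are hypothetical, and the reduction strategy (folding the carry-out back using 2^256 ≡ 38 modulo 2^255 − 19) is an assumption chosen because it keeps the result in four limbs. The keep_carry flag simulates the kind of missing carry that a bit-accurate verifier must catch.

```python
P = 2**255 - 19
MASK = 2**64 - 1

def to_limbs(n):
    """Split an integer below 2^256 into four 64-bit limbs, least significant first."""
    return [(n >> (64 * i)) & MASK for i in range(4)]

def from_limbs(limbs):
    return sum(l << (64 * i) for i, l in enumerate(limbs))

def add_limbs(a, b, keep_carry=True):
    """Limb-wise addition; returns (result limbs, carry-out).
    With keep_carry=False the carry between limbs is dropped, which is a bug."""
    r, carry = [], 0
    for x, y in zip(a, b):
        t = x + y + carry
        r.append(t & MASK)
        carry = (t >> 64) if keep_carry else 0
    return r, carry

def fe_add(a, b):
    """Sum of two limb-encoded values, congruent to a + b modulo P.
    The result is not fully reduced; it only fits in four limbs."""
    r, carry = add_limbs(a, b)
    # 2^256 = 2 * (2^255 - 19) + 38, so a carry-out is worth 38 modulo P.
    r, carry = add_limbs(r, to_limbs(38 * carry))
    # The first fold can itself overflow; a second fold always suffices.
    r, carry = add_limbs(r, to_limbs(38 * carry))
    assert carry == 0
    return r

a, b = P - 1, 2**200 + 5
total = from_limbs(fe_add(to_limbs(a), to_limbs(b)))

# A dropped carry silently produces a wrong sum: (2^64 - 1) + 1 becomes 0.
buggy, _ = add_limbs(to_limbs(2**64 - 1), to_limbs(1), keep_carry=False)
```

The value total is congruent to a + b modulo P, whereas buggy decodes to 0 instead of 2^64. Verification conditions for real implementations express the same congruence over the actual instruction sequence, which is why the resulting queries are large.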

Conclusion
We combine algorithm design with interactive theorem proving to build a scalable certified SMT QF BV solver CoqQFBV in this work. Our certified solver employs a verified OCaml bit blasting program and the verified certificate checker GratChk to improve the confidence in SMT QF BV query results. Experiments on the QF BV division of the 2020 SMT Competition and real-world cryptographic program verification suggest that CoqQFBV is useful.
For future work, we plan to specify and verify more heuristics to further optimize CoqQFBV. Particularly, cryptographic program verification requires more sophisticated range checks. More verified bit blasting algorithms for such checks will undoubtedly improve the confidence of bit-accurate program verification.