Flexible Proof Production in an Industrial-Strength SMT Solver ⋆

Abstract. Proof production for SMT solvers is paramount to ensure their correctness independently from implementations, which are often prohibitively difficult to verify. Historically, however, SMT proof production has struggled with performance and coverage issues, resulting in the disabling of many crucial solving techniques and in coarse-grained (and thus hard to check) proofs. We present a flexible proof-production architecture designed to handle the complexity of versatile, industrial-strength SMT solvers and show how we leverage it to produce detailed proofs, including for components previously unsupported by any solver. The architecture allows proofs to be produced modularly, lazily, and with numerous safeguards for correctness. This architecture has been implemented in the state-of-the-art SMT solver cvc5. We evaluate its proofs for SMT-LIB benchmarks and show that the new architecture produces better coverage than previous approaches, has acceptable performance overhead, and supports detailed proofs for most solving components.


Introduction
SMT solvers [9] are widely used as backbones of formal methods tools in a variety of applications, often safety-critical ones. These tools rely on the solver's correctness to guarantee the validity of their results such as, for instance, that an access policy does not inadvertently give access to sensitive data [4]. However, SMT solvers, particularly industrial-strength ones, are often extremely complex pieces of engineering. This makes it hard to ensure that implementation issues do not affect results. As the industrial use of SMT solvers increases, it is paramount to be able to convince non-experts of the trustworthiness of their results.
⋆ This work was partially supported by the Office of Naval Research (Contract No. 68335-17-C-0558), a gift from Amazon Web Services, and by NSF-BSF grant numbers 2110397 (NSF) and 2020704 (BSF).
A solution is to decouple confidence from the implementation by coupling results with machine-checkable certificates of their correctness. For SMT solvers,
this amounts to providing proofs of unsatisfiability. The main challenges are justifying a combination of theory-specific algorithms while keeping the solver performant and providing enough details to allow scalable proof checking, i.e., checking that is fundamentally simpler than solving. Moreover, while proof production is well understood for propositional reasoning and common theories, that is not the case for more expressive theories, such as the theory of strings, or for more advanced solver operations such as formula preprocessing. We present a new, flexible proof-production architecture for versatile, industrial-strength SMT solvers and discuss its integration into the cvc5 solver [5]. The architecture (Section 2) aims to facilitate the implementation effort via modular proof production and internal proof checking, so that more critical components can be enabled when generating proofs. We provide some details on the core proof calculus and how proofs are produced (Section 3), in particular how we support eager and lazy proof production with built-in proof reconstruction (Section 3.2). This feature is particularly important for substitution and rewriting techniques, facilitating the instrumentation of notoriously challenging functionalities, such as simplification under global assumptions [6, Section 6.1] and string solving [40,46,48], to produce detailed proofs. Finally, we describe (Section 5) how the architecture is leveraged to produce detailed proofs for most of the theory reasoning, critical preprocessing, and underlying SAT solving of cvc5. We evaluate proof production in cvc5 (Section 6) by measuring the proof overhead and the proof quality over an extensive set of benchmarks from SMT-LIB [8].
In summary, our contributions are a flexible proof-producing architecture for state-of-the-art SMT solvers, its implementation in cvc5, the production of detailed proofs for simplification under global assumptions and the full theory of strings, and initial experimental evidence that proof-production overhead is acceptable and detailed proofs can be generated for a majority of the problems.
Preliminaries. We assume the usual notions and terminology of many-sorted first-order logic with equality (≈) [29]. We consider signatures Σ all containing the distinguished Boolean sort Bool. We adopt the usual definitions of well-sorted Σ-terms, with literals and formulas as terms of sort Bool, and Σ-interpretations. A Σ-theory is a pair T = (Σ, I) where I, the models of T, is a class of Σ-interpretations closed under variable reassignment. A Σ-formula φ is T-valid (resp., T-unsatisfiable) if it is satisfied by all (resp., no) interpretations in I. Two Σ-terms s and t of the same sort are T-equivalent if s ≈ t is T-valid. We write a⃗ to denote a tuple (a_1, ..., a_n) of elements, with n ≥ 0. Depending on context, we will abuse this notation and also denote the set of the tuple's elements or, in case of formulas, their conjunction. Similarly, for term tuples s⃗, t⃗ of the same length and sort, we will write s⃗ ≈ t⃗ to denote the conjunction of equalities between their respective elements.

Proof-production Architecture
Our proof-production architecture is intertwined with the CDCL(T) architecture [43], as shown in Figure 1. Proofs are produced and stored modularly by each solving component, which also checks they meet the expected proof structure for that component, as described below. Proofs are combined only when needed, via post-processing.
The pre-processor receives an input formula φ and simplifies it in a variety of ways into formulas ϕ_1, ..., ϕ_n. For each ϕ_i, the pre-processor stores a proof P : φ → ϕ_i justifying its derivation from φ.
The propositional engine receives the preprocessed formulas, and its clausifier converts them into a conjunctive normal form C_1 ∧ ··· ∧ C_l. A proof P : ψ → C_i is stored for each clause C_i, where ψ is a preprocessed formula. Note that several clauses may derive from each formula. Corresponding propositional clauses C^p_1, ..., C^p_l, where first-order atoms are abstracted as Boolean variables, are sent to the SAT solver, which checks their joint satisfiability.
The propositional engine enters a loop with the theory engine, which considers a set of literals asserted by the SAT solver (corresponding to a model of the propositional clauses) and verifies its satisfiability modulo a combination of theories T. If the set is T-unsatisfiable, a lemma L is sent to the propositional engine together with its proof P : L. Note that since lemmas are T-valid, their proofs have no assumptions. The propositional engine stores these proofs and clausifies the lemmas, keeping the respective clausification proofs in the clausifier. The clausified and abstracted lemmas are sent to the SAT solver to block the current model and cause the assertion of a different set of literals, if possible. If no new set is asserted, then all the clauses C_1, ..., C_m generated until then are jointly unsatisfiable, and the SAT solver yields a proof P : C_1 ∧ ··· ∧ C_m → ⊥. Note that the proof is in terms of the first-order clauses, as are the derivation rules that conclude ⊥ from them.
The post-processor of the propositional engine connects the assumptions of the SAT solver proof with the clausifier proofs, building a proof P : ϕ_1 ∧ ··· ∧ ϕ_n → ⊥. Since theory lemmas are T-valid, the resulting proof only has preprocessed formulas as assumptions. The final proof is built by the SMT solver's post-processor combining this proof with the preprocessing proofs P : φ → ϕ_i. The resulting proof P : φ → ⊥ justifies the T-unsatisfiability of the input formula.

The Internal Proof Calculus
In this section, we specify how proofs are represented in the internal calculus of cvc5. We also provide some low-level details on how proofs are constructed and managed in our implementation.
The proof rules of the internal calculus are similar to rules in other calculi for ground first-order formulas, except that they are made a little more operational by optionally having argument terms and side conditions. Each rule has the form

$$ r\ \ \frac{\varphi_1 \ \cdots\ \varphi_n \ \mid\ t_1, \ldots, t_m}{\psi}\ \ \text{if } C $$

with identifier r, premises φ_1, ..., φ_n, arguments t_1, ..., t_m, conclusion ψ, and side condition C. The argument terms are used to construct the conclusion from the premises and can be used in the side condition together with the premises.

Proof Checkers and Proofs
The semantics of each proof rule r is provided operationally in terms of a proof-rule checker for r. This is a procedure that takes as input a list of argument terms t⃗ and a list of premises φ⃗ for r. It returns fail if the input is malformed, i.e., it does not match the rule's arguments and premises or does not satisfy the side condition. Otherwise, it returns a conclusion formula ψ expressing the result of applying the rule. All proof rules of the internal calculus have an associated proof-rule checker. We say that a proof rule proves a formula ψ, from given arguments and premises, if its checker returns ψ.
cvc5 has an internal proof checker built modularly out of the individual proof-rule checkers. This checker is meant mostly for internal debugging during development, to help guarantee that the constructed proofs are correct. The expectation is that users will rely instead on third-party tools to check the proof certificates emitted by the solver.
A proof object is constructed internally using a data structure that we will describe abstractly here and call a proof node. This is a triple (r, N⃗, t⃗) consisting of a rule identifier r; a sequence N⃗ of proof nodes, its children; and a sequence t⃗ of terms, its arguments. The relationships between proof nodes and their children induce a directed graph over proof nodes, with edges from proof nodes to their children. We call a single-root graph rooted at node N a proof. A proof P is well-formed if it is finite, acyclic, and there is a total mapping Ψ from the nodes of P to formulas such that, for each node N = (r, (N_1, ..., N_m), t⃗), Ψ(N) is the formula returned by the proof checker for rule r when given premises Ψ(N_1), ..., Ψ(N_m) and arguments t⃗. For a well-formed proof P with root N and mapping Ψ, the conclusion of P is the formula Ψ(N); a subproof of P is any proof rooted at a descendant of N in P. We will identify a well-formed proof with its root node.
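The interplay between proof-rule checkers and proof nodes can be sketched in Python as follows. This is a minimal illustration under stated assumptions: terms are encoded as strings or tuples, an equality s ≈ t as ("=", s, t), None plays the role of fail, and only simplified versions of assume, refl, symm, and trans are covered. The names ProofNode, CHECKERS, and conclusion are hypothetical and do not reflect cvc5's implementation.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed encoding for this sketch: terms are strings or tuples
# (operator, arg_1, ..., arg_n); an equality s ≈ t is ("=", s, t).
def eq(s, t):
    return ("=", s, t)

def is_eq(formula):
    return isinstance(formula, tuple) and len(formula) == 3 and formula[0] == "="

# A proof-rule checker maps (premises, arguments) to a conclusion,
# or to None, which plays the role of "fail".
def check_assume(premises, args):
    return args[0] if not premises and len(args) == 1 else None

def check_refl(premises, args):
    return eq(args[0], args[0]) if not premises and len(args) == 1 else None

def check_symm(premises, args):
    if len(premises) != 1 or args or not is_eq(premises[0]):
        return None
    _, s, t = premises[0]
    return eq(t, s)

def check_trans(premises, args):
    if len(premises) < 2 or args or not all(is_eq(p) for p in premises):
        return None
    # Side condition: the right-hand side of each equality is the
    # left-hand side of the next one.
    if any(p[2] != q[1] for p, q in zip(premises, premises[1:])):
        return None
    return eq(premises[0][1], premises[-1][2])

CHECKERS = {"assume": check_assume, "refl": check_refl,
            "symm": check_symm, "trans": check_trans}

@dataclass(frozen=True)
class ProofNode:
    rule: str
    children: Tuple["ProofNode", ...] = ()
    args: tuple = ()

def conclusion(node):
    """Compute Ψ(node) bottom-up; None means the proof is not well-formed."""
    premises = [conclusion(child) for child in node.children]
    checker = CHECKERS.get(node.rule)
    if checker is None or any(p is None for p in premises):
        return None
    return checker(premises, node.args)

ab = ProofNode("assume", args=(eq("a", "b"),))
cb = ProofNode("assume", args=(eq("c", "b"),))
good = ProofNode("trans", (ab, ProofNode("symm", (cb,))))
bad = ProofNode("trans", (ab, cb))

print(conclusion(good))  # ('=', 'a', 'c')
print(conclusion(bad))   # None
```

Because the nodes are immutable and built from already existing children, every proof constructed this way is finite and acyclic, so well-formedness reduces to every checker returning a conclusion.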

Core Proof Rules
In total, the internal calculus of cvc5 consists of 155 proof rules, which cover all reasoning performed by the SMT solver, including theory-specific rules, rules for Boolean reasoning, and others. In the remainder of this section, we describe the core rules of the internal calculus, which are used throughout the system, and are illustrated in Figure 2.
Proof rules for equality. Many theory solvers in cvc5 perform theory-specific reasoning on top of basic equational reasoning. The latter is captured by the proof rules eq res, refl, symm, trans, and cong. The first rule is used to prove a formula ψ from a formula φ that was proved equivalent to ψ. The rest are the standard rules for computing the congruence closure of a set of term equalities.
Proof rules for rewriting, substitution and witness forms. A single coarse-grained rule, sr, is used for tracking justifications for core utilities in the SMT solver such as rewriting and substitution. This rule, together with other non-core rules with side conditions (omitted for brevity), allows the generation of coarse-grained proofs that trust the correctness of complex side conditions. Those conditions involve rewriting and substitution operations performed by cvc5 during solving. More fine-grained proofs can be constructed from coarse-grained ones by justifying the various rewriting and substitution steps in terms of simpler proof rules. This is done with the aid of the equality rules mentioned above and the additional core rules atom rewrite and witness. To describe atom rewrite, witness, and sr, we first need to introduce some definitions and notations.
A rewriter R is a function over terms that preserves equivalence in the background theory T, i.e., returns a term t↓_R that is T-equivalent to its input t. We call t↓_R the rewritten form of t with respect to R. Currently, cvc5 uses a handful of specialized rewriters for various purposes, such as evaluating constant terms, preprocessing input formulas, and normalizing terms during solving. Each individual rewrite step executed by a rewriter R is justified in fine-grained proofs by an application of the rule atom rewrite, which takes as argument both (an identifier for) R and the term s the rewrite was applied to. Note that the rule's soundness requires that the rewrite step be equivalence preserving.
A (term) substitution σ is a finite sequence (t_1 → s_1, ..., t_n → s_n) of oriented pairs of terms of the same sort. A substitution method S is a function that takes a term r and a substitution σ and returns a new term that is the result of applying σ to r, according to some strategy. We write S(r, σ) to denote the resulting term. We distinguish three kinds of substitution methods for σ: simultaneous, which returns the term obtained by simultaneously replacing every occurrence of term t_i in r with s_i, for i = 1, ..., n; sequential, which splits σ into n substitutions (t_1 → s_1), ..., (t_n → s_n) and applies them in sequence to r using the simultaneous strategy above; and fixed-point, which, starting with r, repeatedly applies σ with the simultaneous strategy until no further subterm replacements are possible. For example, consider the application S(y, (x → u, y → f(z), z → g(x))). The steps the substitution method takes in computing its result are the following:
simultaneous: y ⇝ f(z)
sequential: y ⇝ y ⇝ f(z) ⇝ f(g(x))
fixed-point: y ⇝ f(z) ⇝ f(g(x)) ⇝ f(g(u))
In cvc5, we use a substitution derivation method D to derive a contextual substitution (t_1 → s_1, ..., t_n → s_n) from a collection φ⃗ of derived formulas. The substitution essentially orients a selection of term equalities t_i ≈ s_i entailed by φ⃗ and, as such, can be applied soundly to formulas derived from φ⃗. We write D(φ⃗) to denote the substitution computed by D from φ⃗.
Finally, cvc5 often introduces fresh variables, or Skolem variables, which are implicitly globally existentially quantified. This happens as a consequence of Skolemization of existential variables, lifting of if-then-else terms, and some kinds of flattening. Each Skolem variable k is associated with a term k↑ of the same sort containing no Skolem variables, called its witness term. This global map from Skolem variables to their witness terms allows cvc5 to detect when two Skolem variables can be equated, as a consequence of their respective witness terms becoming equivalent in the current context [47].
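The three substitution methods can be made concrete with a short Python sketch. It assumes a toy term encoding (strings for variables and constants, tuples for function applications) and hypothetical function names; it illustrates the definitions above and is not how cvc5 implements substitutions.

```python
# Terms are strings or tuples (operator, arg_1, ..., arg_n); a substitution
# is a list of (t_i, s_i) pairs. This encoding is assumed for the sketch.
def apply_simultaneous(term, sigma):
    mapping = dict(sigma)
    def go(t):
        if t in mapping:
            return mapping[t]   # the replacement itself is not traversed
        if isinstance(t, tuple):
            return (t[0],) + tuple(go(arg) for arg in t[1:])
        return t
    return go(term)

def apply_sequential(term, sigma):
    # Apply the single-pair substitutions one after the other.
    for pair in sigma:
        term = apply_simultaneous(term, [pair])
    return term

def apply_fixed_point(term, sigma, max_rounds=1000):
    # Re-apply the whole substitution until nothing changes.
    for _ in range(max_rounds):
        result = apply_simultaneous(term, sigma)
        if result == term:
            return term
        term = result
    raise RuntimeError("no fixed point reached (cyclic substitution?)")

sigma = [("x", "u"), ("y", ("f", "z")), ("z", ("g", "x"))]
print(apply_simultaneous("y", sigma))  # ('f', 'z')
print(apply_sequential("y", sigma))    # ('f', ('g', 'x'))
print(apply_fixed_point("y", sigma))   # ('f', ('g', 'u'))
```

The round limit in apply_fixed_point is a safeguard added for the sketch, since a cyclic substitution such as (x → f(x)) has no fixed point.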
Witness terms can also be used to eliminate Skolem variables at proof output time. We write t↑ to denote the witness form of term t, which is obtained by replacing every Skolem variable in t by its witness term. For example, if k_1 and k_2 are Skolem variables with associated witness terms ite(x ≈ z, y, z) and y − z, respectively, and φ is the formula ite(x ≈ k_2, k_1 ≈ y, k_1 ≈ z), the witness form φ↑ of φ is the formula ite(x ≈ y − z, ite(x ≈ z, y, z) ≈ y, ite(x ≈ z, y, z) ≈ z). When a Skolem variable k appears in a proof, the witness proof rule is used to explicitly constrain its value to be the same as that of the term k↑ it abstracts.
We can now explain the sr proof rule, which is parameterized by a substitution method S, a rewriter R, and substitution derivation method D. The rule is used to transform the proof of a formula φ into one of a formula ψ provided that the two formulas are equal up to rewriting under a substitution derived from the premises φ⃗. Note that this rule is quite general because its conclusion ψ, which is provided as an argument, can be any formula that satisfies the side condition.
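Computing a witness form amounts to a single traversal that replaces Skolem variables by their witness terms. The sketch below assumes a toy encoding (terms as strings or tuples, the Skolem map as a Python dictionary) and a hypothetical helper name witness_form; it reproduces the example with k_1 and k_2 above.

```python
def witness_form(term, witness):
    """Replace every Skolem variable in term by its witness term.

    One pass suffices because witness terms contain no Skolem variables.
    """
    if isinstance(term, tuple):
        return (term[0],) + tuple(witness_form(a, witness) for a in term[1:])
    return witness.get(term, term)

# Skolem variables k1, k2 and their witness terms (from the example above).
witness = {
    "k1": ("ite", ("=", "x", "z"), "y", "z"),
    "k2": ("-", "y", "z"),
}
phi = ("ite", ("=", "x", "k2"), ("=", "k1", "y"), ("=", "k1", "z"))
expected = ("ite",
            ("=", "x", ("-", "y", "z")),
            ("=", ("ite", ("=", "x", "z"), "y", "z"), "y"),
            ("=", ("ite", ("=", "x", "z"), "y", "z"), "z"))
print(witness_form(phi, witness) == expected)  # True
```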
Proof rules for scoped reasoning. Two of the core proof rules, assume and scope, enable local reasoning. Together they achieve the effect of the ⇒-introduction rule of Natural Deduction. However, separating the local assumption functionality in assume provides more flexibility. That rule has no premises and introduces a local assumption φ provided as an argument. The scope rule is used to close the scope of the local assumptions φ_1, ..., φ_n made to prove a formula φ, inferring the formula φ_1 ∧ ··· ∧ φ_n ⇒ φ.
We say that φ is a free assumption in proof P if P has a node (assume, (), φ) that is not a subproof of a scope node with φ as one of its arguments. A proof is closed if it has no free assumptions, and open otherwise.
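The definition of free assumptions translates directly into a recursive traversal that remembers which formulas are discharged by enclosing scope nodes. The following sketch assumes a simplified proof-node type and represents formulas as plain strings; the names ProofNode, free_assumptions, and is_closed are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class ProofNode:
    rule: str
    children: Tuple["ProofNode", ...] = ()
    args: tuple = ()

def free_assumptions(node, scoped=frozenset()):
    """Return the assumptions of node not discharged by an enclosing scope."""
    if node.rule == "assume":
        phi = node.args[0]
        return set() if phi in scoped else {phi}
    if node.rule == "scope":
        # The arguments of a scope node are the assumptions it closes.
        scoped = scoped | frozenset(node.args)
    result = set()
    for child in node.children:
        result |= free_assumptions(child, scoped)
    return result

def is_closed(node):
    return not free_assumptions(node)

p = ProofNode("trans", (ProofNode("assume", args=("a = b",)),
                        ProofNode("assume", args=("b = c",))))
partly = ProofNode("scope", (p,), ("a = b",))
closed = ProofNode("scope", (p,), ("a = b", "b = c"))
print(free_assumptions(partly))         # {'b = c'}
print(is_closed(p), is_closed(closed))  # False True
```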
Soundness. All proof rules other than assume are sound with respect to the background theory T in the following sense: if a rule proves a formula ψ from premises φ⃗, every model of T that satisfies φ⃗, and assigns the same values to Skolem variables and their respective witness terms, satisfies ψ as well. Based on this and a simple structural induction argument, one can show that well-formed closed proofs have T-valid conclusions. In contrast, open proofs have conclusions that are T-valid only under assumptions. More precisely, in general, if φ⃗ are all the free assumptions of a well-formed proof P with conclusion ψ and k⃗ are all the Skolem variables introduced in P, then k⃗ ≈ k⃗↑ ∧ φ⃗ ⇒ ψ is T-valid.

Constructing Proof Nodes
We have implemented a library of proof generators that encapsulates common patterns for constructing proof nodes. We assume a method getProof that takes a proof generator g and a formula φ as input and returns a proof node with conclusion φ based on the information in g. During solving, cvc5 uses a combination of eager and lazy proof generation. In general terms, eager proof generation involves constructing proof nodes for inference steps at the time those steps are taken during solving. Eager proof generation may be required if the computation state pertinent to that inference cannot be easily recovered later. In contrast, lazy proof generation occurs for inferred formulas associated with proof generators that can do internal bookkeeping to be able to construct proof nodes for the formula after solving is completed. Depending on the formula, different kinds of proof generators are used. For brevity, we only describe in detail (see Section 3.2) the proof generator most relevant to the core calculus, the term-conversion proof generator, targeted for substitution and rewriting proofs.
Algorithm 1 Proof generation for term-conversion generators, rewrite-once policy. B is a lazy proof builder, R a map from terms to their converted form, and c_pre, c_post are sets of pairs of equalities and the proof generators justifying them.

Proof Reconstruction for Substitution and Rewriting
Once it determines that the input formulas φ 1 , . . . , φ n are jointly unsatisfiable, the SMT solver has a reference to a proof node P that concludes ⊥ from the free assumptions φ 1 , . . . , φ n . After the post-processor is run, the (closed) proof (scope, P ′ , (φ 1 , . . . , φ n )) is then generated as the final proof for the user, where P ′ is the result of optionally expanding coarse-grained steps (in particular, applications of the rule sr) in P into fine-grained ones. To do so, we require the following algorithm for generating term-conversion proofs.
In particular, we focus on equalities t ≈ s whose proof can be justified by a set of steps that replace subterms of t until it is syntactically equal to s. We assume these steps are provided to a term-conversion proof generator. Formally, a term-conversion proof generator g is a pair of sets c_pre and c_post. The set c_pre (resp., c_post) contains pairs of the form (t ≈ s, g_t,s) indicating that t should be replaced by s in a preorder (resp., postorder) traversal of the terms that g processes, where g_t,s is a proof generator that can prove the equality t ≈ s. We require that neither c_pre nor c_post contain multiple entries of the form (t ≈ s_1, g_1) and (t ≈ s_2, g_2) for distinct (s_1, g_1) and (s_2, g_2).
The procedure for generating proofs from a term-conversion proof generator g is given in Algorithm 1. When asked to prove an equality t_1 ≈ t_2, getProof traverses the structure of t_1 and applies steps from the sets c_pre and c_post from g. The traversal is performed by the auxiliary procedure getTermConv, which relies on two data structures. The first is a lazy proof builder B that stores the intermediate steps in the overall proof of t_1 ≈ t_2. The proof builder is given these steps either via addStep, as a concrete triple with the proof rule, a list of premise formulas, and a list of argument terms, or as a lazy step via addLazyStep, with a formula and a reference to another generator that can prove that formula. The second data structure is a mapping R from terms to terms that is updated (using array syntax in the pseudo-code) as the converted form of terms is computed by getTermConv. Each subterm s of t_1 is traversed only once by getTermConv by checking whether R already contains the converted form of s. When that is not the case, s is first preorder processed. If c_pre contains an entry indicating that s rewrites to s′, this rewrite step is added to the lazy proof builder and the converted form R[s] of s is set to s′. Otherwise, the immediate subterms of s, if any, are traversed and then s is postorder processed. The converted form of s is set to some term r of the form f(R[s_1], ..., R[s_n]), considering how its immediate subterms were converted. Note that B will contain steps for s⃗ ≈ R[s⃗]. Thus, the equality s ≈ r can be proven by congruence for function f with these premises if s ≠ r, and by reflexivity otherwise. Furthermore, if c_post indicates that r rewrites to r′, then this step is added to the lazy proof builder; a transitivity step is added to prove s ≈ r′ from s ≈ r and r ≈ r′; and the converted form R[s] is set to r′.
Example. Consider the term t = f(b)+f(a) < f(a−0)+f(b) and a term-conversion proof generator g whose set c_pre contains entries for f(b)+f(a) and for a−0 ≈ a, and whose set c_post equates the converted form of t to ⊥, the latter justified by a generator g^Arith_1. The generators in these entries provide proofs based on arithmetic reasoning.
Invoking getProof(g, t ≈ ⊥) initiates the traversal with getTermConv(t, c_pre, c_post, ∅, ∅). Since t is not in the conversion map, it is preorder processed. However, as it does not occur in c_pre, nothing is done and its subterms are traversed. The subterm f(b)+f(a) does occur in c_pre, so R is updated with its converted form and the respective lazy step is added to B. The subterms of f(b)+f(a) are not traversed, therefore the next term to be traversed is f(a−0)+f(b). Since it does not occur in c_pre, its subterm f(a−0) is traversed, which analogously leads to the traversal of a−0. As a−0 does occur in c_pre, both R and B are updated accordingly and the processing of its parent f(a−0) resumes. A congruence step added to B justifies its conversion to f(a) being added to R. Since f(b) is not converted, a further congruence step justifies the conversion of f(a−0)+f(b) to f(a)+f(b). Finally, the processing returns to the initial term t, which has been converted to R[f(b)+f(a)] < f(a)+f(b). Since this term is equated to ⊥ in c_post, justified by g^Arith_1, the respective lazy step is added to B, as well as a transitivity step to connect t to ⊥ through its converted form. At this point, the execution terminates with R[f(b)+f(a) < f(a−0)+f(b)] = ⊥, as expected. A proof for t ≈ ⊥ can then be extracted from B: its root is the transitivity step, whose premises are the congruence proof for the converted form of t and the lazy step taken from c_post.
We use several extensions to the procedures in Algorithm 1. Notice that this procedure follows the policy that terms on the right-hand side of conversion steps (equalities from c_pre and c_post) are not traversed further. The procedure getTermConv is used by term-conversion proof generators that have the rewrite-once policy. A similar procedure which additionally traverses those terms is used by term-conversion proof generators that have a rewrite-to-fixpoint policy.
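The traversal described above, under the rewrite-once policy, can be sketched in Python as follows. This is an illustration under stated assumptions rather than cvc5's implementation: terms are strings or tuples, c_pre and c_post are dictionaries from a term to a pair (converted term, generator label), the lazy proof builder B is a dictionary from proven equalities to steps, and generators are represented only by labels. In the example, the converted form f(a)+f(b) of f(b)+f(a) and the labels g_arith_0 and g_arith_1 are assumed for the sake of a runnable instance.

```python
def eq(s, t):
    return ("=", s, t)

def get_term_conv(s, c_pre, c_post, R, B):
    """Fill R[s] with the converted form of s; record justifications in B."""
    if s in R:                        # every subterm is processed only once
        return
    if s in c_pre:                    # preorder step: subterms are not traversed
        s2, gen = c_pre[s]
        B.setdefault(eq(s, s2), ("lazy", gen))
        R[s] = s2
        return
    children = s[1:] if isinstance(s, tuple) else ()
    for child in children:
        get_term_conv(child, c_pre, c_post, R, B)
    # Rebuild s from the converted forms of its immediate subterms.
    r = (s[0],) + tuple(R[c] for c in children) if children else s
    if r != s:
        B.setdefault(eq(s, r), ("cong", [eq(c, R[c]) for c in children]))
    else:
        B.setdefault(eq(s, s), ("refl", []))
    if r in c_post:                   # postorder step on the rebuilt term
        r2, gen = c_post[r]
        B.setdefault(eq(r, r2), ("lazy", gen))
        # setdefault keeps an existing step, so no step can depend on itself.
        B.setdefault(eq(s, r2), ("trans", [eq(s, r), eq(r, r2)]))
        r = r2
    R[s] = r

def extract(B, formula):
    """Build a proof tree (rule, children, conclusion) from the stored steps."""
    entry = B[formula]
    if entry[0] == "lazy":
        return ("lazy", entry[1], formula)
    rule, premises = entry
    return (rule, [extract(B, p) for p in premises], formula)

def get_proof(c_pre, c_post, t1, t2):
    R, B = {}, {}
    get_term_conv(t1, c_pre, c_post, R, B)
    if R[t1] != t2:
        return None                   # fail: t1 does not convert to t2
    return extract(B, eq(t1, t2))

fa, fb = ("f", "a"), ("f", "b")
t = ("<", ("+", fb, fa), ("+", ("f", ("-", "a", "0")), fb))
c_pre = {("+", fb, fa): (("+", fa, fb), "g_arith_0"),   # assumed entry
         ("-", "a", "0"): ("a", "g_arith_0")}
c_post = {("<", ("+", fa, fb), ("+", fa, fb)): ("false", "g_arith_1")}
proof = get_proof(c_pre, c_post, t, "false")
print(proof[0], [child[0] for child in proof[1]])  # trans ['cong', 'lazy']
```

A rewrite-to-fixpoint variant would additionally call get_term_conv on the right-hand sides s2 and r2 and chain the resulting equalities with further transitivity steps.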
We now show how the term-conversion proof generator can be used for reconstructing fine-grained proofs from coarse-grained ones. In particular we focus on proofs P_ψ1 of the form (sr, (Q_ψ0, Q⃗_φ⃗), (S, R, D, ψ_1)). Recall from Figure 2 that the proof rule sr concludes a formula ψ_1 that can be shown equivalent to the formula ψ_0 proven by Q_ψ0 based on a substitution derived from the conclusions φ⃗ of the nodes Q⃗_φ⃗. A proof like P_ψ1 above can be transformed to one that involves (atomic) theory rewrites and equality rules only. We show this transformation in two phases. In the first phase, the proof is expanded to:
(eq res, (Q_ψ0, (trans, (R_0, (symm, R_1)))))
with
R_i = (trans, ((subs, Q⃗_φ⃗, (S, D, ψ_i)), (rewrite, (), (R, S(ψ_i, D(φ⃗))))))
for i ∈ {0, 1}, where φ⃗ are the conclusions of Q⃗_φ⃗, and subs and rewrite are auxiliary proof rules used for further expansion in the second phase. We describe them next.
Substitution Steps. Let P_t≈s be the subproof (subs, Q⃗_φ⃗, (S, D, t)) of R_i above, with t = ψ_i, proving t ≈ s with s = S(t, D(φ⃗)) and D(φ⃗) = (t_1 → s_1, ..., t_n → s_n). Substitution steps can be expanded to fine-grained proofs using a term-conversion proof generator. First, for each j = 1, ..., n, we construct a proof of t_j ≈ s_j, which involves simple transformations on the proofs of φ⃗. Suppose we store all of these in an eager proof generator g. If S is a simultaneous or fixed-point substitution, we then build a single term-conversion proof generator C, which, recall, is modeled as a pair of sets (c_pre, c_post). We add (t_j ≈ s_j, g) to c_pre for all j. We use the rewrite-once policy for C if S is a simultaneous substitution, and the rewrite-fixed-point policy for C otherwise. We then replace the proof P_t≈s by getProof(C, t ≈ s), which runs the procedure in Algorithm 1. Otherwise, if S is a sequential substitution, we construct a term-conversion generator C_j for each j, initializing it so that its c_pre set contains the single rewrite step (t_j ≈ s_j, g) and uses a rewrite-once policy. We then replace the proof P_t≈s by (trans, (P_1, ..., P_n)) where, for j = 1, ..., n, P_j is generated by getProof(C_j, r_{j−1} ≈ r_j), with r_0 = t, r_j the result of applying the first j steps of the substitution D(φ⃗) to t, and r_n = s.
Rewrite Steps. Let P be the proof node (rewrite, (), (R, t)), which proves the equality t ≈ t↑↓_R. During reconstruction, we replace P with a proof involving only fine-grained rules, depending on the rewrite method R. For example, if R is the core rewriter, we run the rewriter again on t in proof tracking mode. Normally, the core rewriter performs a term traversal and applies atomic rewrites to completion. In proof tracking mode, it also returns two lists, for pre- and post-rewrites, of steps (t_1 ≈ s_1, g), ..., (t_n ≈ s_n, g) where g is a proof generator that returns (atom rewrite, (), (R, t_i)) for all equalities t_i ≈ s_i. Furthermore, for each Skolem k that is a subterm of t, we construct the rewrite steps (k ≈ k↑, g′) where g′ is a proof generator that returns (witness, (), (k)) for equalities k ≈ k↑. We add these rewrite proof steps to a term-conversion generator C with rewrite-fixed-point policy, and replace P by getProof(C, t ≈ t↑↓_R).

SMT Proofs
Here we briefly describe each component shown in Section 2 and how it produces proofs with the infrastructure from Sections 3 and 3.2.

Preprocessing Proofs
The pre-processor transforms an input formula φ into a list of formulas to be given to the core solver. It applies a sequence of preprocessing passes. A pass may replace a formula φ i with another one ϕ i , in which case it is responsible for providing a proof of φ i ≈ ϕ i . It may also append a new formula ϕ to the list, in which case it is responsible for providing a proof for it. We use a (lazy) proof generator that tracks these proofs, maintaining the invariant that a proof can be provided for all (preprocessed) formulas when requested. We have instrumented proof production for the most common preprocessing passes, relying heavily on the sr rule to model transformations such as expansion of function definitions and, with witness forms, Skolemization and if-then-else elimination [6].
Simplification under global assumptions. cvc5 aggressively learns literals that hold globally by performing Boolean constraint propagation over the input formula. When a learned literal corresponds to a variable elimination (e.g., x ≈ 5 corresponds to x → 5) or a constant propagation (e.g., P(x) corresponds to P(x) → ⊤), we apply the corresponding (term) substitution to the input. This application is justified via sr, while the derivation of the globally learned literals is justified via clausification and resolution proofs, as explained in Section 5.3.
The key features of our architecture that make it feasible to produce proofs for this simplification are the automatic reconstruction of sr steps and the ability to customize the strategy for substitution application during reconstruction, as detailed in Section 3.2. When a new variable elimination x → t is learned, old ones need to be normalized to eliminate any occurrences of x in their right-hand sides. Computing the appropriate simultaneous substitution for all eliminations requires quadratically many traversals over those terms. We have observed that the size of substitutions generated by this preprocessing pass can be very large (with thousands of entries), which makes this computation prohibitively expensive. Using the fixed-point strategy, however, the reconstruction for the sr steps can apply the substitution efficiently and its complexity depends on how many applications are necessary to reach a fix-point, which is often low in practice.

Theory Proofs
The theory engine produces lemmas, as disjunctions of literals, from an individual theory or a combination of them. In the first case, the lemma's proof is provided directly by the corresponding theory solver. In the second case, a theory solver may produce a lemma ψ containing a literal ℓ derived by some other theory solver from literals ⃗ ℓ. A lemma over the combined theory is generated by replacing ℓ in ψ by ⃗ ℓ. This regression process, which is similar to the computation of explanations during solving, is repeated until the lemma contains only input literals. The proof of the final lemma then uses rules like sr to combine the proofs of the intermediate literals derived locally in various theories and their replacement by input literals in the final lemma.

Equality and Uninterpreted Function (EUF) Proofs
The EUF solver can be easily instrumented to produce proofs [31,42] with equality rules (see Figure 2). In cvc5, term equivalences are also derived via rewriting in some other theory T : when a function from T has all of its arguments inferred to be congruent to T -values, it may be rewritten into a T -value itself, and this equivalence asserted. Such equivalences are justified via sr steps. Since generating equality proofs incurs minimal overhead [42] and rewriting proofs are reconstructed lazily, EUF proofs are generated during solving and stored in an eager proof generator.

Extensional Arrays and Datatypes Proofs
While these two theories differ significantly, they both combine equality reasoning with rules for handling their particular operators. For arrays, these are rules for select, store, and array extensionality (see [36, Sec. 5]). For datatypes, they are rules reflecting the properties of constructors and selectors, as well as acyclicity. The justifications for lemmas are also generated eagerly and stored in an eager proof generator.

Bit-Vector Proofs
The bit-vector solver applies bit-blasting to reduce bit-vector problems to equisatisfiable propositional problems. Thus, its lemmas amount to the rewriting of the bit-vector literals into Boolean formulas, which will be solved and proved by the propositional engine. The bit-vector lemmas are proven lazily, analogous to sr steps, with the difference that the reconstruction uses the bitblaster in the bit-vector solver instead of the rewriter.

Arithmetic Proofs
The linear arithmetic solver is based on the simplex algorithm [24], and each of its lemmas is the negation of an unsatisfiable conjunction of inequalities. Farkas' lemma [30,49] guarantees that there exists a linear combination of these inequalities equivalent to ⊥. The coefficients of the combination are computed during solving with minimal overhead [38], and the equivalence is proven with an sr step. To allow the rewriter to prove this equivalence, the bounds of the inequalities are scaled by constants and summed during reconstruction. Integer reasoning is proved through rules for branching and integer bound tightening, recorded eagerly.
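The role of the Farkas coefficients can be illustrated with a small check that scales and sums inequalities and tests whether the result is the contradiction 0 ≤ c with c < 0. The sketch below is restricted to non-strict inequalities over exact rationals; the encoding of an inequality as a pair (coefficient map, bound) and the names combine and is_refutation are assumptions made for this example, not cvc5's reconstruction code.

```python
from fractions import Fraction

def combine(inequalities, coefficients):
    """Scale each inequality (sum of a*var <= bound) and add them up."""
    if len(inequalities) != len(coefficients):
        raise ValueError("one coefficient per inequality is required")
    lhs, bound = {}, Fraction(0)
    for (terms, b), c in zip(inequalities, coefficients):
        if c < 0:
            raise ValueError("coefficients must be nonnegative")
        for var, a in terms.items():
            lhs[var] = lhs.get(var, Fraction(0)) + Fraction(c) * a
        bound += Fraction(c) * b
    # Drop variables whose coefficients cancelled out.
    return {v: a for v, a in lhs.items() if a != 0}, bound

def is_refutation(inequalities, coefficients):
    lhs, bound = combine(inequalities, coefficients)
    return not lhs and bound < 0      # the sum reads 0 <= bound with bound < 0

conflict = [({"x": 1, "y": -1}, -1),   # x - y <= -1
            ({"x": -2, "y": 2}, 1)]    # -2x + 2y <= 1
print(is_refutation(conflict, [2, 1]))  # True
print(is_refutation(conflict, [1, 1]))  # False
```

With coefficients 2 and 1 the variables cancel and the bounds sum to −1, so the combination is 0 ≤ −1; with coefficients 1 and 1 the variables do not cancel, so the combination refutes nothing.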
Non-linear arithmetic lemmas are generated from incremental linearization [16] or cylindrical algebraic coverings [1]. The former can be proven via propositional and basic arithmetic rules, with only a few, such as the tangent plane lemma, needing a dedicated proof rule. The latter requires two complex rules that are not inherently simpler than solving, albeit not as complex as those for regular CAD-based theory solvers [2]. We point out that checking these rules would require a significant portion of CAD-related theory, whose proper formalization is still an open, if actively researched, problem [18,25,34,41,53].
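For intuition on why the tangent plane lemma needs little more than basic arithmetic to justify, one common non-strict formulation for the product x·y at a point (a, b) is given below, together with the identity it rests on. This is a sketch of the standard lemma from incremental linearization; the exact shape of the corresponding cvc5 proof rule may differ.

```latex
% Tangent plane lemma for x*y at the point (a, b), where a and b are constants.
\begin{align*}
x \cdot y - (b \cdot x + a \cdot y - a \cdot b) &= (x - a)(y - b), \\
(x \le a \land y \ge b) \lor (x \ge a \land y \le b) &\;\Rightarrow\; x \cdot y \le b \cdot x + a \cdot y - a \cdot b, \\
(x \le a \land y \le b) \lor (x \ge a \land y \ge b) &\;\Rightarrow\; x \cdot y \ge b \cdot x + a \cdot y - a \cdot b.
\end{align*}
```

Both implications follow from the sign of the product (x − a)(y − b) in the respective regions.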

Quantifier Proofs
Quantified formulas not Skolemized during pre-processing are handled via instantiation, which produces theory lemmas of the form (∀x⃗ φ) ⇒ φσ, where σ is a grounding substitution. An instantiation rule proves them independently of how the substitution was actually derived, since any well-typed one suffices for soundness.
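The shape of an instantiation lemma can be sketched as follows: given a universally quantified formula and a substitution for its bound variables, build the implication from the quantified formula to the instantiated body. The term encoding and the functions `substitute` and `instantiation_lemma` are assumptions of this example. It also assumes a quantifier-free body and omits type checking, which a real instantiation rule would need to enforce.

```python
def substitute(term, sigma):
    """Apply `sigma` (variable name -> ground term) to a quantifier-free term."""
    if isinstance(term, str):
        return sigma.get(term, term)
    head, *args = term
    return (head, *[substitute(a, sigma) for a in args])

def instantiation_lemma(forall, sigma):
    """Build (forall xs. body) => body[sigma]; sigma must map exactly the bound variables."""
    tag, variables, body = forall
    if tag != "forall" or set(variables) != set(sigma):
        raise ValueError("substitution must map exactly the bound variables")
    return ("=>", forall, substitute(body, sigma))

# forall x. f(x) >= x, instantiated with x -> g(c)
axiom = ("forall", ("x",), (">=", ("f", "x"), "x"))
lemma = instantiation_lemma(axiom, {"x": ("g", "c")})
print(lemma[2])  # ('>=', ('f', ('g', 'c')), ('g', 'c'))
```

Note that the check never asks where the substitution came from, matching the observation that the instantiation heuristic plays no role in soundness.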

String Proofs
The strings solver applies a layered approach, distinguishing between core [40] and extended operators [48]. The core operators consist of (dis)equalities between string concatenations and length constraints. Reasoning over them is proved by a combination of equality and linear integer arithmetic proofs, as well as specific string rules. The extended operators are reduced to core ones via formulas with bounded quantifiers. The reductions are proven with rules defining each extended function's semantics, and sr steps justifying the reductions. Finally, regular membership constraints are handled by string rules that unfold occurrences of the Kleene star operator and split up regular expression concatenations into different parts. Overall, the proofs for the strings theory solver encompass not only string-specific reasoning but also equality, linear integer arithmetic, and quantifier reasoning, as well as substitution and rewriting.
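The unfolding of Kleene star and the splitting of concatenations can be illustrated semantically with a small membership test: s ∈ r* holds if s is empty or if s splits into a non-empty prefix in r and a suffix in r*, and s ∈ r₁·r₂ holds if s splits into a prefix in r₁ and a suffix in r₂. The regular expression encoding and the function `matches` are assumptions of this sketch; it enumerates splits on concrete strings and is not how the cvc5 strings solver or its proof rules operate.

```python
def matches(s, r):
    """Decide membership of the concrete string `s` in the regular expression `r`.

    `r` is ("lit", w), ("concat", r1, r2), or ("star", r1).
    """
    kind = r[0]
    if kind == "lit":
        return s == r[1]
    if kind == "concat":
        # Split s into a prefix in r1 and a suffix in r2.
        return any(matches(s[:i], r[1]) and matches(s[i:], r[2])
                   for i in range(len(s) + 1))
    if kind == "star":
        # Unfold once: either s is empty, or a non-empty prefix is in r1
        # and the (strictly shorter) remainder is again in r1*.
        return s == "" or any(matches(s[:i], r[1]) and matches(s[i:], r)
                              for i in range(1, len(s) + 1))
    raise ValueError(f"unknown regular expression: {kind}")

regex = ("concat", ("star", ("lit", "ab")), ("lit", "c"))  # (ab)*c
print(matches("ababc", regex))  # True
print(matches("abac", regex))   # False
```

Requiring a non-empty prefix in the star case is what guarantees that each unfolding makes progress.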

Unsupported
The theory solvers for the theories of floating-point arithmetic, sequences, sets and relations, and separation logic are currently not proof-producing in cvc5. These are relatively new or non-standard theories in SMT and have not been our focus, but we intend to produce proofs for them in the future.

Propositional Proofs
Propositional proofs justify both the conversion of preprocessed input formulas and theory lemmas into conjunctive normal form (CNF) and the derivation of ⊥ from the resulting clauses. CNF proofs are a combination of Boolean transformations and introductions of Boolean formulas representing the definition of Tseytin variables, used to ensure that the CNF conversion is polynomial. The clausifier uses a lazy proof builder which stores the clausification steps eagerly, with the preprocessed input formulas as assumptions, and the theory lemmas as lazy steps, with associated proof generators. For Boolean reasoning, cvc5 uses a version of MiniSat [27] instrumented to produce resolution proofs. It uses a lazy proof builder to record resolution steps for learned clauses as they are derived (see [7,Chap 1] for more details) and to lazily build a refutation with only the resolution steps necessary for deriving ⊥. The resolution rule, however, is ground first-order resolution, since the proofs are in terms of the first-order clauses rather than their propositional abstractions.
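The two ingredients above, Tseytin definitions and resolution over clauses whose atoms are first-order literals, can be sketched in a few lines. Literals are either an atom (a string such as "x = y") or a pair ("not", atom), and clauses are frozensets of literals. The helper names `negate`, `resolve`, and `tseytin_and` are hypothetical and do not correspond to cvc5's or MiniSat's interfaces.

```python
def negate(lit):
    """Negate a literal, where a literal is an atom or ("not", atom)."""
    return lit[1] if isinstance(lit, tuple) and lit[0] == "not" else ("not", lit)

def resolve(c1, c2, pivot):
    """Resolve c1 (containing `pivot`) with c2 (containing its negation)."""
    if pivot not in c1 or negate(pivot) not in c2:
        raise ValueError("pivot does not occur with opposite polarities")
    return (c1 - {pivot}) | (c2 - {negate(pivot)})

def tseytin_and(t, a, b):
    """Clauses defining the fresh variable t as equivalent to (a and b)."""
    return [
        frozenset({negate(t), a}),
        frozenset({negate(t), b}),
        frozenset({t, negate(a), negate(b)}),
    ]

# Input: t holds, where t abbreviates (x = y and f(x) = f(y)), and x != y holds.
defs = tseytin_and("t", "x = y", "f(x) = f(y)")
step1 = resolve(frozenset({"t"}), defs[0], "t")                # clause {x = y}
step2 = resolve(step1, frozenset({negate("x = y")}), "x = y")  # empty clause
print(step2 == frozenset())  # True
```

Only two of the recorded clauses are needed for this refutation, which mirrors the benefit of building the final proof lazily from the steps that actually lead to ⊥.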

Evaluation
In this section, we discuss an initial evaluation of our implementation in cvc5 of the proof-production architecture presented in this paper. In the following, we denote different configurations of cvc5 by cvc plus some suffixes. A configuration using variable and clause elimination in the SAT solver [26], symmetry breaking [23] in the EUF solver, and black-box SAT solving in the bit-vector (BV) solver, is denoted by the suffix o. These techniques are currently incompatible with the proof production architecture. Other cvc5 techniques for which we do not yet support fine-grained proofs, however, are active and have their inferences registered in the proofs as trusted steps. A configuration that includes simplification under global assumptions is denoted by s; one that includes producing proofs by p; and one that additionally reconstructs proofs by r. The default configuration of cvc5 is cvc+os.
We split our evaluation into measuring the proof-production cost as well as the performance impact of making key techniques proof-producing; the proof reconstruction overhead; and the coverage of the proof production. We also comment on how cvc5's proofs compare with CVC4's proofs. Note that the internal proof checking described in Section 3, which was invaluable for a correct implementation, is disabled for evaluating performance. Experiments ran on a cluster with Intel Xeon E5-2620 v4 CPUs, with a limit of 300s and 8GB of RAM for each solver and benchmark pair. We consider 162,060 unsatisfiable problems from SMT-LIB [8], across all logics except those with floating-point arithmetic, as determined by cvc5 [5,Sec. 4]. We split them into 38,732 problems with the BV theory (the BVs set) and 123,328 problems without (the non-BVs set).

Proof production cost
The cost of proof production is summarized in Table 1 and Figures 3a to 3d. The impact of running without o is negligible overall in non-BVs, but steep for BVs, both in terms of solving time and number of problems solved, as evidenced by the table and Figure 3b, respectively. This is expected given the effectiveness of combining bit-blasting with black-box SAT solvers. The overhead of p is similar for both sets, although more pronounced in BVs. While the total time is around double that of cvc+s, Figure 3c shows a finer distribution, with most problems having a less significant overhead. Moreover, the total number of problems solved is quite similar, as shown in Figures 3a and 3b, particularly for non-BVs. The difference in overhead due to p between the BVs and non-BVs sets can be attributed to the cost of managing large proofs, which are more common in BVs. This stems from the well-known blowup in problem size incurred by bit-blasting, which is reflected in the proofs.
The cost of generating fine-grained steps for the sr rule and for the similarly reconstructed theory-specific steps mentioned in Section 5 varies again between the two sets, but more starkly. While for non-BVs the overall solving time and number of problems solved are very similar between cvc+sp and cvc+spr, for the BVs set cvc+spr is significantly slower overall. This difference again arises mainly because of the increased proof sizes. Nevertheless, r leads to only a small increase in unsolved problems in BVs, as shown in Figure 3b.
The importance of being able to produce proofs for simplification under global assumptions is made clear by Figure 3a: the impact of disabling s is virtually the same as that of adding p; moreover, cvc+spr significantly outperforms cvc+pr. In Figure 3b the difference is less pronounced but still noticeable.

Proof coverage
When using techniques that are not yet fully proof-producing, but still active, cvc5 inserts trusted steps in the proof. These are usually steps whose checking is not inherently simpler than solving. They effectively represent holes in the proof, but are still useful for users who avail themselves of powerful proof-checking techniques. Trusted steps are commonly used when integrating SMT solvers into proof assistants [11,28,51].
The percentage of cvc+spr proofs without trusted steps is 92% for BVs and 80% for non-BVs. That is to say, out of 145,683 proofs, 120,473 of them are fully fine-grained proofs. The vast majority of the trusted steps in the remaining proofs are due to theory-specific preprocessing passes that are not yet fully proof-producing. In non-BVs, the occurrence of trusted steps is heavily dependent on the specific SMT-LIB logic, as expected. Common offenders are logics with datatypes, with trusted steps for acyclicity checks, and quantified logics, with trusted steps for certain α-equivalence eliminations. In non-linear real arithmetic logics, all cylindrical algebraic coverings proofs are built with trusted steps (see Section 5.2), but we note this is the state of the art for CAD-based proofs. As for non-linear integer arithmetic logics, our proof support is still in its early stages, so a significant portion of their theory lemmas are trusted steps.
We stress the extent of our coverage for string proofs, which were previously unsupported by any SMT solver. In the string logics without length constraints, 100% of the proofs are fully fine-grained. This rate goes down to 80% in the logics with length. For the remaining 20%, the overwhelming majority of the trusted steps are for theory-specific preprocessing or some particular string or linear arithmetic inference within the proof of a theory lemma.

Comparison with CVC4 Proofs
We compare the proof coverage of cvc5 versus CVC4. The cvc5 proof production replaces CVC4's [32,36], which was incomplete and monolithic. CVC4 did not produce proofs at all for strings, substitutions, rewriting, preprocessing, quantifiers, datatypes, or non-linear arithmetic. In particular, simplification over global assumptions had to be disabled when producing proofs. In fragments supported by both systems, CVC4's proofs are at most as detailed as cvc5's. The only superior aspect of CVC4's proof production was to support proofs from external SAT solvers [45] used in the BV solver, which are very significant for solving performance, as shown above. Integrating this feature into cvc5 is left as future work, but we note that there is no limitation in the proof architecture that would prevent it. We also point out that cvc5 produces resolution proofs for the bit-blasted BV constraints, which can be checked in polynomial time, whereas external SAT solvers produce DRAT proofs [33] (or reconstructions of them via other tools [19,20,37,39]), which can take exponential time to check. So there is a significant trade-off to be considered.

Related work
Two significant proof-producing state-of-the-art SMT solvers are z3 [22] and veriT [14]. Both can have their proofs successfully reconstructed in proof assistants [3,12,13,51]. They can produce detailed proofs for the propositional and theory reasoning in EUF and linear arithmetic, as well as for quantifiers. However, z3's proofs are coarse-grained for preprocessing and rewriting, and for bit-vector reasoning, which complicates proof checking. Moreover, to the best of our knowledge, z3 does not produce proofs for its other theories. In contrast, veriT can produce fine-grained proofs for preprocessing and rewriting [6], which has led to a better integration with Isabelle/HOL [51]. However, it does so eagerly, which requires a tight integration between the preprocessing and the proof-production code. In addition, it does not support simplification under global assumptions when producing proofs, which significantly impacts its performance. Other proof-producing SMT solvers are MathSAT5 [17] and SMTInterpol [15]. They produce resolution proofs and theory proofs for EUF, linear arithmetic, and, in SMTInterpol's case, array theories. Their proofs are tailored towards unsatisfiable core and interpolant generation, rather than external certification. Moreover, they do not seem to provide proofs for preprocessing, clausification or rewriting.
While cvc5 is possibly the only proof-producing solver for the full theory of strings, CertiStr [35] is a certified solver for the fragment with concatenation and regular expressions. It is automatically generated from Isabelle/HOL [44] but is significantly less performant than cvc5, although a proper comparison would need to account for proof-checking time in cvc5's case.

Conclusion and future work
We presented and evaluated a flexible proof-production architecture, showing it is capable of producing proofs with varying levels of granularity in a scalable manner for a state-of-the-art and industrial-strength SMT solver like cvc5.
Since there is currently no standard proof format for SMT solvers, our architecture is designed to support multiple proof formats via a final post-processing transformation that converts internal proofs accordingly. We are developing backends for the LFSC [52] proof checker and the proof assistants Lean 4 [21], Isabelle/HOL [44], and Coq [10], the latter two via the Alethe proof format [50]. Since using these tools requires mechanizing the respective target proof calculi in their languages, another benefit, besides external checking, is to decouple confidence in the soundness of the proof calculi from the internal cvc5 proof calculus.
A considerable challenge for SMT proofs is the plethora of rewrite rules used by the solvers, which are specific for each theory and vary in complexity. In particular, string rewrites can be very involved [46] and hard to check. We are also developing an SMT-LIB-based DSL for specifying rewrite rules, to be used during proof reconstruction to decompose rewrite steps in terms of them, thus providing more fine-grained proofs for rewriting.
Finally, we plan to incorporate into the proof-production architecture the unsupported theories and features mentioned in Sections 5.2 and 6, particularly those relevant for solving performance that currently either leave holes in proofs, such as theory pre-processing or non-linear arithmetic reasoning, or that have to be disabled, such as the use of external SAT solvers in the BV theory.