Abstract
We present a fast and reliable reconstruction of proofs generated by the SMT solver veriT in Isabelle. The fine-grained proof format makes the reconstruction simple and efficient. For typical proof steps, such as arithmetic reasoning and skolemization, our reconstruction can avoid expensive search. By skipping proof steps that are irrelevant for Isabelle, the performance of proof checking is improved. Our method increases the success rate of Sledgehammer by halving the failure rate and reduces the checking time by 13%. We provide a detailed evaluation of the reconstruction time for each rule. The runtime is influenced by both simple rules that appear very often and common complex rules.
1 Introduction
Proof assistants are used in verification and formal mathematics to provide trustworthy, machine-checkable formal proofs of theorems. Proof automation reduces the burden of finding proofs and allows proof assistant users to focus on the core of their arguments instead of technical details. A successful approach implemented by “hammers,” like Sledgehammer for Isabelle [15], is to heuristically select facts from the background; use an external automatic theorem prover, such as a satisfiability modulo theories (SMT) solver [12], to filter facts needed to discharge the goal; and to use the filtered facts to find a trusted proof.
Isabelle does not accept proofs that do not go through the assistant’s inference kernel. Hence, Sledgehammer attempts to find the fastest internal method that can recreate the proof (preplay). This is often a call of the smt tactic, which runs an SMT solver, parses the proof, and reconstructs it through the kernel. This reconstruction allows the usage of external provers. The smt tactic was originally developed for the SMT solver Z3 [18, 34].
The SMT solver CVC4 [10] is one of the best solvers on Sledgehammer-generated problems [14], but currently does not produce proofs for problems with quantifiers. To reconstruct its proofs, Sledgehammer mostly uses the smt tactic based on Z3. However, since CVC4 uses more elaborate quantifier instantiation techniques, many problems provable for CVC4 are unprovable for Z3. Therefore, Sledgehammer regularly fails to find a trusted proof and the user has to write the proofs manually. veriT [19] (Sect. 2) supports these techniques and we extend the smt tactic to reconstruct its proofs. With the new reconstruction (Sect. 3), more smt calls are successful. Hence, less manual labor is required from users.
The runtime of the smt method depends on the runtime of the reconstruction and the solver. To simplify the reconstruction, we do not treat veriT as a black box anymore, but extend it to produce more detailed proofs that are easier to reconstruct. We use detailed rules for simplifications with a combination of propositional, arithmetic, and quantifier reasoning. Similarly, we add additional information to avoid search, e.g., for linear arithmetic and for term normalization. Our reconstruction method uses the newly provided information, but it also has a step skipping mode that combines some steps (Sect. 4).
A very early prototype of the extension was used to validate the fine-grained proof format itself [7, Sect. 6.2, second paragraph]. We also published some details of the reconstruction method and the rules [25] before adapting veriT to ease reconstruction. Here, we focus on the new features.
We optimize the performance further by tuning the search performed by veriT. Multiple options influence the execution time of an SMT solver. To fine-tune veriT’s search procedure, we select four different combinations of options, or strategies, by generating typical problems and selecting options with complementary performance on these problems. We extend Sledgehammer to compare these four selected strategies and suggest the fastest to the user. We then evaluate the reconstruction with Sledgehammer on a large benchmark set. Our new tactic halves the failure rate. We also study the time required to reconstruct each rule. Many simple rules occur often, showing the importance of step skipping (Sect. 5).
Finally, we discuss related work (Sect. 6). Compared to the prototype [25], the smt tactic is now thoroughly tested. We fixed all issues revealed during development and improved the performance of the reconstruction method. The work presented here is integrated into Isabelle version 2021; i.e., since this version Sledgehammer can also suggest veriT, without user interaction. To simplify future reconstruction efforts, we document the proof format and all rules used by veriT. The resulting reference manual is part of the veriT documentation [40].
2 veriT and Proofs
The SMT solver veriT is an open source solver based on the CDCL(\(\mathcal {T}\)) calculus. In proof-production mode, it supports the theories of uninterpreted functions with equality, linear real and integer arithmetic, and quantifiers. To support quantifiers, veriT uses quantifier instantiation and extensive preprocessing.
veriT’s proof syntax is an extension of SMT-LIB [11] which uses S-expressions and prefix notation. The proofs are refutation proofs, i.e., proofs of \(\bot \). A proof is an indexed list of steps. Each step has a conclusion clause (cl ..) and is annotated with a rule, a list of premises, and some rule-dependent arguments. veriT distinguishes 90 rules [40]. Subproofs are the key feature of the proof format. They introduce an additional context. Contexts are used to reason about binders, e.g., preprocessing steps like transformation under quantifiers.
The conclusions of rules with contexts are always equalities. The context models a substitution into the free variables of the term on the left-hand side of the equality. Consider the following proof fragment that renames the variable name x to vr, as done during preprocessing:
The assume command repeats input assertions or states local assumptions. In this fragment the assumption a0 is not used. Subproofs start with the anchor command that introduces a context. Semantically, the context is a shorthand for a lambda abstraction of the free variable and an application of the substituted term. Here the context is \(x\mapsto \mathrm {vr}\) and the step t1 means \((\lambda x.\;x)\;\mathrm {vr} = \mathrm {vr}\). The step is proven by congruence (rule cong). Then congruence is applied again (step t2) to prove that \((\lambda x.\;f\;x)\;\mathrm {vr} = f\;\mathrm {vr}\) and step t3 concludes the renaming.
During proof search each module of veriT appends steps onto a list. Once the proof is completed, veriT performs some cleanup before printing the proof. First, a pruning phase removes branches of the proof not connected to the root \(\bot \). Second, a merge phase removes duplicated steps. The final pass prepares the data structures for the optional term sharing via name annotations.
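The pruning and merge phases described above can be illustrated with a minimal Python sketch. It is not veriT's implementation: the `Step` record, the string representation of clauses, and the assumption that steps are listed premises-first are all hypothetical choices made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical proof step: identifier, rule, conclusion clause, premise ids."""
    ident: str
    rule: str
    clause: tuple[str, ...]
    premises: tuple[str, ...] = ()


def prune(steps: list[Step], root: str) -> list[Step]:
    """Keep only the steps reachable from the root (the step concluding falsity)."""
    by_id = {s.ident: s for s in steps}
    reachable: set[str] = set()
    stack = [root]
    while stack:
        ident = stack.pop()
        if ident in reachable:
            continue
        reachable.add(ident)
        stack.extend(by_id[ident].premises)
    return [s for s in steps if s.ident in reachable]


def merge(steps: list[Step]) -> list[Step]:
    """Collapse steps that share rule, clause, and (already renamed) premises.

    Assumes the input lists every step after its premises.
    """
    rename: dict[str, str] = {}
    seen: dict[tuple, str] = {}
    result: list[Step] = []
    for s in steps:
        premises = tuple(rename.get(p, p) for p in s.premises)
        key = (s.rule, s.clause, premises)
        if key in seen:
            # Duplicate: later references point to the first occurrence.
            rename[s.ident] = seen[key]
            continue
        seen[key] = s.ident
        result.append(Step(s.ident, s.rule, s.clause, premises))
    return result
```

Pruning first keeps the merge pass from spending time on steps that would be discarded anyway, which matches the order given in the text.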
3 Overview of the veriT-Powered smt Tactic
Isabelle is a generic proof assistant based on an intuitionistic logic framework, Pure, and is almost always only used parameterized with a logic. In this work we use only Isabelle/HOL, the parameterization of Isabelle with higher-order logic with rank-1 (top-level) polymorphism. Isabelle adheres to the LCF [26] tradition. Its kernel supports only a small number of inferences. Tactics are programs that prove a goal by using only the kernel for inferences. The LCF tradition also means that external tools, like SMT solvers, are not trusted.
Nevertheless, external tools are successfully used. They provide relevant facts or a detailed proof. The Sledgehammer tool implements the former and passes the filtered facts to trusted tactics during preplay. The smt tactic implements the latter approach. The provided proof is checked by Isabelle. We focus on the smt tactic, but we also extended Sledgehammer to suggest our new tactic.
The smt tactic translates the current goal to the SMT-LIB format [11], runs an SMT solver, parses the proof, and replays it through Isabelle’s kernel. To choose the smt tactic the user applies (smt (z3)) to use Z3 and (smt (verit)) to use veriT. We will refer to them as zsmt and vsmt. The proof formats of Z3 and veriT are so different that separate reconstruction modules are needed. The vsmt tactic performs four steps:

1. It negates the proof goal to have a refutation proof and also encodes the goal into first-order logic. The encoding eliminates lambda functions. To do so, it replaces each lambda function with a new function and creates \({\text {app}}\) operators corresponding to function application. Then veriT is called to find a proof.
2. It parses the proof found by veriT (if one is found) and encodes it as a directed acyclic graph with \(\bot \) as the only conclusion.
3. It converts the SMT-LIB terms to typed Isabelle terms and also reverses the encoding used to convert higher-order into first-order terms.
4. It traverses the proof graph, checks that all input assertions match their Isabelle counterpart and then reconstructs the proof step by step using the kernel’s primitives.
4 Tuning the Reconstruction
To improve the speed of the reconstruction method, we create small and well-defined rules for preprocessing simplifications (Sect. 4.1). Previously, veriT implicitly normalized every step; e.g., repeated literals were immediately deleted. It now produces proofs for this transformation (Sect. 4.2). Finally, the linear-arithmetic steps contain coefficients which allow Isabelle to reconstruct the step without relying on its limited arithmetic automation (Sect. 4.3). On the Isabelle side, the reconstruction module selectively decodes the first-order encoding (Sect. 4.4). To improve the performance of the reconstruction, it skips some steps (Sect. 4.5).
4.1 Preprocessing Rules
During preprocessing SMT solvers perform simplifications on the operator level which are often akin to simple calculations; e.g., \(a \times 0 \times f(x)\) is replaced by \(0\).
To capture such simplifications, we create a list of 17 new rules: one rule per arithmetic operator, one to replace boolean operators such as XOR with their definition, and one to replace \(n\)-ary operator applications with binary applications. This is a compromise: having one rule for every possible simplification would create a longer proof. Since preprocessing uses structural recursion, the implementation simply picks the right rule in each leaf case. The example above now produces a prod_simplify step with the conclusion \(a \times 0 \times f(x) = 0\). Previously, a single step of the connect_equiv rule collected all those simplifications and no list of simplifications performed by this rule existed. The reconstruction relied on an experimentally created list of tactics to be fast enough.
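As an illustration of how structural recursion can emit one named step per simplification, here is a minimal Python sketch. The tuple-based term representation and the `simplify` function are hypothetical; only the rule name prod_simplify comes from the text, and the sketch handles only multiplication by zero.

```python
def simplify(term, steps):
    """Simplify bottom-up, appending one (rule, lhs, rhs) step per rewrite.

    Terms are hypothetical nested tuples such as ("*", "a", 0, ("f", "x"));
    atoms are strings or numbers.
    """
    if not isinstance(term, tuple):
        return term
    op, *args = term
    args = [simplify(a, steps) for a in args]
    rebuilt = (op, *args)
    if op == "*" and any(a == 0 for a in args):
        # Leaf case for products: record the equality lhs = 0 under its rule.
        steps.append(("prod_simplify", rebuilt, 0))
        return 0
    return rebuilt
```

Because each recorded step names its rule, a checker can dispatch to a tactic specialized for that rule instead of trying a list of general-purpose tactics.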
On the Isabelle side, the reconstruction is fast, because we can direct the search instead of trying automated tactics that can also work on other parts of the formula. For example, the simplifier handles the numeral manipulations of the prod_simplify rule and we restrict it to only use arithmetic lemmas. [2]
Moreover, since we know the performed transformations, we can ignore some parts of the terms by generalizing, i.e., replacing them by constants [18]. Because generalized terms are smaller, the search is more directed and we are less likely to hit the search-depth limitation of Isabelle’s auto tactic as before. Overall, the reconstruction is more robust and easier to debug.
4.2 Implicit Steps
To simplify reconstruction, we avoid any implicit normal form of conclusions. For example, a rule concluding \(t \vee P\) for any formula \(t\) can be used to prove \(P\vee P\). In such cases veriT automatically normalizes the conclusion \(P\vee P\) to \(P\). Without a proof of the normalization, the reconstruction has to handle such cases.
We add new proof rules for the normalization and extend veriT to use them. Instead of keeping only the normalized step, both the original and the normalized step appear in the proof. For the example above, we have the step \(P\vee P\) and the normalized \(P\). To remove a double negation \(\lnot \lnot t\) we introduce the tautology \(\lnot \lnot \lnot t \vee t\) and resolve it with the original clause. Our changes do not affect any other part of veriT. The solver now also prunes steps concluding \(\top \).
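The two normalizations mentioned above can be sketched in Python. This is an illustration under assumptions, not veriT's code: literals are hypothetical nested tuples, clauses are tuples of literals, and `resolve` removes every occurrence of the pivot.

```python
def neg(t):
    """Build the negation of a term."""
    return ("not", t)


def resolve(left, right, pivot):
    """Binary resolution: drop `pivot` from `left` and its negation from `right`."""
    assert pivot in left and neg(pivot) in right
    rest_left = [lit for lit in left if lit != pivot]
    rest_right = [lit for lit in right if lit != neg(pivot)]
    return tuple(rest_left + rest_right)


def remove_double_negation(clause, t):
    """Resolve `clause` (containing not not t) with the tautology (not not not t) or t."""
    tautology = (neg(neg(neg(t))), t)
    return resolve(clause, tautology, neg(neg(t)))


def contract(clause):
    """Delete repeated literals, keeping the first occurrence of each."""
    return tuple(dict.fromkeys(clause))
```

For example, `remove_double_negation(("q", neg(neg("p"))), "p")` yields the clause `("q", "p")`, and `contract(("p", "p"))` yields `("p",)`.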
On the Isabelle side, the reconstruction becomes more regular with fewer special cases and is more reliable. The reconstruction method can directly reconstruct rules. To deal with the normalization, the reconstruction used to first generate the conclusion of the theorem and then run the simplifier to match the normalized conclusion. This could not deal with tautologies.
We also improve the proof reconstruction of quantifier instantiation steps. One of the instantiation schemes, conflicting instances [8, 36], only works on clausified terms. We introduce an explicit quantified-clausification rule qnt_cnf issued before instantiating. While this rule is not detailed, knowing when clausification is needed improves reconstruction, because it avoids clausifying unconditionally. The clausification is also shared between instantiations of the same term.
4.3 Arithmetic Reasoning
We use a proof witness to handle linear arithmetic. When the propositional model is unsatisfiable in the theory of linear real arithmetic, the solver creates la_generic steps. The conclusion is a tautological clause of linear inequalities and equations and the justification of the step is a list of coefficients so that the linear combination is a trivially contradictory inequality after simplification (e.g., \(0 \ge 1\)). Farkas’ lemma guarantees the existence of such coefficients for reals. Most SMT solvers, including veriT, use the simplex method [21] to handle linear arithmetic. It calculates the coefficients during normal operation.
The real arithmetic solver also strengthens inequalities on integer variables before adding them to the simplex method. For example, if \(x\) is an integer the inequality \(2x < 3\) becomes \(x \le 1\). The corresponding justification is the rational coefficient \(1/2\). The reconstruction must replay this strengthening.
The complete linear arithmetic proof step \(1<x \vee 2x < 3\) looks like
The reconstruction of an la_generic step in Isabelle starts with the goal \(\bigvee _i \lnot c_i\) where each \(c_i\) is either an equality or an inequality. The reconstruction method first generalizes over the non-arithmetic parts. Then it transforms the lemma into the equivalent formulation \(c_1 \Rightarrow \dots \Rightarrow c_n\Rightarrow \bot \) and removes all negations (e.g., by replacing \(\lnot a \le b\) with \(b < a\)).
Next, the reconstruction method multiplies the equation by the corresponding coefficient. For example, for integers, the equation \(A< B\), and the coefficient \(p/q\) (with \(p>0\) and \(q > 0\)), it strengthens the equation and multiplies by \(p\) to get
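A formulation consistent with the surrounding explanation is sketched here. The use of integer division \(\mathop{div}\) and the exact placement of the factor \(p\) are assumptions, not a quotation of the rule:

```latex
\[
  p \times (A \mathop{div} q)
  + (\mathrm{if}\; B \mathop{mod} q = 0 \;\mathrm{then}\; 1 \;\mathrm{else}\; 0)
  \;\le\; p \times (B \mathop{div} q)
\]
```

This inequality holds for integers \(A < B\), \(p \ge 1\), and \(q > 0\): division by \(q\) is monotone, so \(A \mathop{div} q \le B \mathop{div} q\), and when \(q\) divides \(B\) the strict inequality \(A < B\) forces \(A \mathop{div} q + 1 \le B \mathop{div} q\).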
The if-then-else term \((\mathrm {if}\;B\mathop {mod}q = 0\;\mathrm {then}\;1\;\mathrm {else}\;0)\) corresponds to the strengthening. If \(B\mathop {mod}q = 0\), the result is an equation of the form \(A' +1\le B'\), i.e., \(A' < B'\). No strengthening is required for the corresponding theorem over reals.
Finally, we can combine all the equations by summing them while being careful with the equalities that can appear. We simplify the resulting (in)equality using Isabelle’s simplifier to derive \(\bot \).
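The final summation step can be checked mechanically. The following Python sketch covers only the real-arithmetic case without strengthening; the constraint encoding (a coefficient dictionary, a strictness flag, and a right-hand side) and both function names are assumptions made for the illustration.

```python
from fractions import Fraction


def combine(constraints, coefficients):
    """Sum coefficient * constraint.

    Each constraint is (lhs, strict, rhs) and means sum(lhs[v] * v) <= rhs,
    or < rhs when strict is True. Coefficients must be positive.
    """
    total_lhs = {}
    total_rhs = Fraction(0)
    strict = False
    for (lhs, is_strict, rhs), c in zip(constraints, coefficients):
        assert c > 0
        for var, a in lhs.items():
            total_lhs[var] = total_lhs.get(var, Fraction(0)) + c * a
        total_rhs += c * rhs
        strict = strict or is_strict
    return total_lhs, strict, total_rhs


def is_trivially_false(lhs, strict, rhs):
    """True if all variables cancel and the remaining 0 <= rhs (or 0 < rhs) is false."""
    if any(a != 0 for a in lhs.values()):
        return False
    return rhs <= 0 if strict else rhs < 0


# Negated literals of the clause 1 < x or 2x < 3: x <= 1 and -2x <= -3.
constraints = [
    ({"x": Fraction(1)}, False, Fraction(1)),
    ({"x": Fraction(-2)}, False, Fraction(-3)),
]
certificate = [Fraction(1), Fraction(1, 2)]
```

With the coefficients 1 and 1/2 the variable \(x\) cancels and the sum is \(0 \le -1/2\), so `is_trivially_false(*combine(constraints, certificate))` holds. Checking a given certificate requires only multiplication and addition, whereas finding the coefficients requires search.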
To replay linear arithmetic steps, Isabelle can also use the tactic linarith as used for Z3 proofs. It searches the coefficients necessary to verify the lemma. The reconstruction used it previously [25], but the tactic can only find integer coefficients and fails if strengthening is required. Now the rule is a mechanically checkable certificate.
4.4 Selective Decoding of the First-order Encoding
Next, we consider an example of a rule that shows the interplay of the higher-order encoding and the reconstruction. To express function application, the encoding introduces the first-order function \({\text {app}}\) and constants for encoded functions. The proof rule eq_congruent expresses congruence on a first-order function: \((t_1 \mathbin \ne u_1) \mathrel \vee \dots \mathrel \vee (t_n \mathbin \ne u_n) \mathrel \vee f(t_1, \dots , t_n) \mathbin = f(u_1, \dots , u_n)\). With the encoding it can conclude \(f \ne f' \vee x \ne x' \vee {\text {app}}(f, x) \mathbin = {\text {app}}(f', x')\). If the reconstruction unfolds the entire encoding, it builds the term \(f \mathbin \ne f' \vee x \mathbin \ne x' \vee f x \mathbin = f' x'\). It then identifies the functions and the function arguments and uses rewriting to prove that if \(f=f'\) and \(x = x'\), then \(f x = f' x'\).
However, Isabelle \(\beta \)-reduces all terms implicitly, changing the term structure. Assume \(f := \lambda x.\; x = a\) and \(f' := \lambda x.\; a = x\). After unfolding all constructs that encode higher-order terms and after \(\beta \)-reduction, we get \((\lambda x.\; x = a) \ne (\lambda x.\; a = x)\vee (x \ne x') \vee (x =a) \mathbin = (a = x')\). The reconstruction method cannot identify the functions and function arguments anymore.
Instead, the reconstruction method does not unfold the encoding including \({\text {app}}\). This eliminates the need for a special case to detect lambda functions. Such a case was used in the previous prototype, but the code was very involved and hard to test (such steps are rarely used).
4.5 Skipping Steps
The increased number of steps in the fine-grained proof format slows down reconstruction. For example, consider skolemization from \(\exists x.\; P\;x\). The proof from Z3 uses one step. veriT uses eight steps: first renaming it to \((\exists x.\; P\;x) = (\exists v.\; P\;v)\) (with a subproof of at least 2 steps), then concluding the renaming to get \((\exists v.\; P\;v)\) (two steps), then \( (\exists v.\; P\;v) = P\;(\epsilon v.\;P\;v)\) (with a subproof of at least 2 steps), and finally \(P\; (\epsilon v.\;P\;v)\) (two steps).
To reduce the number of steps, our reconstruction skips two kinds of steps. First, it replaces every usage of the or rule by its only premise. Second, it skips the renaming of bound variables. The proof format treats \(\forall x.\;P\;x\) and \(\forall y.\;P\;y\) as two different terms and requires a detailed proof of the conversion. Isabelle, however, uses De Bruijn indices and variable names are irrelevant. Hence, we replace steps of the form \((\forall x.\;P\;x)\Leftrightarrow (\forall y.\;P\;y)\) by a single application of reflexivity. Since veriT canonizes all variable names, this eliminates many steps.
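The reason variable renaming becomes trivial with De Bruijn indices can be seen in a small Python sketch. The tuple-based term representation and the function names are hypothetical and do not reproduce Isabelle's term datatype.

```python
BINDERS = {"forall", "exists", "lambda"}


def to_de_bruijn(term, bound=()):
    """Replace bound variable names by their distance to the binding binder.

    Terms are hypothetical nested tuples: ("forall", "x", body) for binders,
    (head, arg, ...) for applications, and plain strings for names.
    """
    if isinstance(term, str):
        # The innermost binder is at index 0 because names are prepended.
        return ("bound", bound.index(term)) if term in bound else ("free", term)
    head, *rest = term
    if head in BINDERS:
        name, body = rest
        return (head, to_de_bruijn(body, (name,) + bound))
    return tuple(to_de_bruijn(t, bound) for t in term)


def alpha_equivalent(s, t):
    """Two terms are equal up to renaming iff their nameless forms coincide."""
    return to_de_bruijn(s) == to_de_bruijn(t)
```

In this representation `("forall", "x", ("P", "x"))` and `("forall", "y", ("P", "y"))` map to the same nameless term, so an equivalence between them is an instance of reflexivity.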
We can also simplify the idiom “equiv_pos2; th_resolution”. veriT generates it for each skolemization and variable renaming. Step skipping replaces it by a single step which we replay using a specialized theorem.
On proofs with quantifiers, step skipping can remove more than half of the steps: only four steps remain in the skolemization example above (two of which are simply reflexivity). However, with step skipping the smt method is not an independent checker that confirms the validity of every single step in a proof.
5 Evaluation
During development we routinely tested our proof reconstruction to find bugs. As a side effect, we produced SMT-LIB files corresponding to the calls. We measure the performance of veriT with various options on them and select five different strategies (Sect. 5.1). We also evaluate the distribution of the tactics used by Sledgehammer for preplay (Sect. 5.2), and the impact of the rules (Sect. 5.3).
We performed the strategy selection on a computer with two Intel Xeon Gold 6130 CPUs (32 cores, 64 threads) and 192 GiB of RAM. We performed Isabelle experiments with Isabelle version 2021 on a computer with two AMD EPYC 7702 CPUs (128 cores, 256 threads) and 2 TiB of RAM.
5.1 Strategies
veriT exposes a wide range of options to fine-tune the proof search. In order to find good combinations of options (strategies), we generate problems with Sledgehammer and use them to fine-tune veriT’s search behavior. Generating problems also makes it possible to test and debug our reconstruction.
We test the reconstruction by using Isabelle’s Mirabelle tool. It reads theories and automatically runs Sledgehammer [14] on all proof steps. Sledgehammer calls various automatic provers (here the SMT solvers CVC4, veriT, and Z3 and the superposition prover E [38]) to filter facts and chooses the fastest tactic that can prove the goal. The tactic smt is used as a last resort.
To generate problems for tuning veriT, we use the theories from HOL-Library (an extended standard library containing various developments) and from the formalizations of Green’s theorem [2, 3], the Prime Number Theorem [23], and the KBO ordering [13]. We call Mirabelle with only veriT as a fact filter. This produces SMT files for representative problems Isabelle users want to solve and a series of calls to vsmt. For failing vsmt calls three cases are possible: veriT does not find a proof, reconstruction times out, or reconstruction fails with an error. We solved all reconstruction failures in the test theories.
To find good strategies, we determine which problems are solved by several combinations of options within a two-second timeout. We then choose the strategy which solves the most benchmarks and three strategies which together solve the most benchmarks. For comparison, we also keep the default strategy.
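One way to read this selection procedure is sketched below in Python. The function name, the input format, and the decision to choose the three complementary strategies by exhaustive search among the strategies other than the single best one are assumptions; the text does not specify these details.

```python
from itertools import combinations


def select_strategies(solved, extra=3):
    """Pick the best single strategy and `extra` strategies with maximal joint coverage.

    `solved` maps a strategy name to the set of problems it solves within
    the timeout. Names are sorted first so that ties are broken
    deterministically.
    """
    names = sorted(solved)
    best = max(names, key=lambda s: len(solved[s]))
    others = [s for s in names if s != best]

    def coverage(combo):
        return len(set().union(*(solved[s] for s in combo)))

    complementary = max(combinations(others, extra), key=coverage)
    return best, complementary
```

Exhaustive search over triples is feasible only for a modest number of candidate option combinations; a greedy cover would be a cheaper alternative at the cost of optimality.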
The strategies are shown in Table 1 and mostly differ in the instantiation schemes. The strategy del_insts uses instance deletion [6] and uses a breadth-first algorithm to find conflicting instances. All other strategies rely on extended trigger inference [29]. The strategy ccfv_SIG uses a different indexing method for instantiation. It also restricts enumerative instantiation [35], because the options index-sorts and index-fresh-sorts are not used. The strategy ccfv_insts increases some thresholds. Finally, the strategy best uses a subset of the options used by the other strategies. Sledgehammer uses best for fact filtering.
We have also considered using a scheduler in Isabelle as used in the SMT competition. The advantage is that we do not need to select the strategy on the Isabelle side. However, it would make vsmt unreliable. A problem solved by only one strategy just before the end of its time slice can become unprovable on slower hardware. Issues with zsmt timeouts have been reported on the Isabelle mailing list, e.g., due to an antivirus delaying the startup [27].
5.2 Improvements of Sledgehammer Results
To measure the performance of the vsmt tactic, we ran Mirabelle on the full HOL-Library, the theory Prime Distribution Elementary (PDE) [22], an executable resolution prover (RP) [37], and the Simplex algorithm [30]. We extended Sledgehammer’s proof preplay to try all veriT strategies and added instrumentation for the time of all tried tactics. Sledgehammer and automatic provers are mostly nondeterministic programs. To reduce the variance between the different Mirabelle runs, we use the deterministic MePo fact filter [33] instead of the better performing MaSh [28] that uses machine learning (and depends on previous runs) and underuse the hardware to minimize contention. We use the default timeouts of 30 seconds for the fact filtering and one second for the proof preplay. This is similar to the Judgment Day experiments [17]. The raw results are available [1].
Success Rate. Users are not interested in which tactics are used to prove a goal, but in how often Sledgehammer succeeds. There are three possible outcomes: (i) a successfully preplayed proof, (ii) a proof hint that failed to be preplayed (usually because of a timeout), or (iii) no proof. We define the success rate as the proportion of outcome (i) over the total number of Sledgehammer calls.
Table 2 gathers the results of running Sledgehammer on all unique goals and analyzing its outcome using different preplay configurations where only zsmt (the baseline) or both vsmt and zsmt are enabled. Any useful preplay tactic should increase the success rate (SR) by preplaying new proof hints provided by the fact-filter prover, reducing the preplay failure rate (PF).
Let us consider, e.g., the results when using CVC4 as fact-filter prover. The success rate of the baseline on the HOL-Library is 54.5% and its preplay failure rate is 1.5%. This means that CVC4 found a proof for \(54.5\% + 1.5\% = 56\%\) of the goals, but that Isabelle’s proof methods failed to preplay many of them. In such cases, Sledgehammer gives a proof hint to the user, who has to manually find a functioning proof. By enabling vsmt, the failure rate decreases by two thirds, from 1.5% to 0.5%, which directly increases the success rate by 1 percentage point: new cases where the burden of the proof is moved from the user to the proof assistant. The failure rate is reduced in similar proportions for PNT (63%), RP (63%), and Simplex (56%). For these formalizations, this improvement translates to a smaller increase of the success rate, because the baseline failure rate was smaller to begin with. This confirms that the instantiation technique conflicting instances [8, 36] is important for CVC4.
When using veriT or Z3 as fact-filter prover, a failure rate of zero could be expected, since the same SMT solvers are used for both fact filtering and preplaying. The observed failure rate can partly be explained by the much smaller timeout for preplay (1 second) than for fact filtering (30 seconds).
Overall, these results show that our proof reconstruction enables Sledgehammer to successfully preplay more proofs. With vsmt enabled, the weighted average failure rate decreases as follows: for CVC4, from 1.3% to 0.4%; for E, from 1.5% to 1.2%; for veriT, from 1.0% to 0.3%; and for Z3, from 0.7% to 0.3%. For the user, this means that the availability of vsmt as a proof preplay tactic increases the number of goals that can be fully automatically proved.
Saved time. Table 3 shows a different view on the same results. Instead of the raw success rate, it shows the time that is spent reconstructing proofs. Using the baseline configuration, preplaying all formalizations takes a total of \(250.1 + 33.4 + 37.2 + 42.8= 363.5\) seconds. When enabling vsmt, some calls to zsmt are replaced by faster vsmt calls and the reconstruction time decreases by 13% to \(212.6 + 28.4 + 34.4 + 41.6=317\) seconds. Note that the per-formalization improvement varies considerably: 15% for HOL-Library, 15% for PNT, 7.5% for RP, and 4.0% for Simplex.
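The overall figure can be recomputed from the totals quoted above. The short Python sketch below uses only those numbers; the variable and function names are illustrative.

```python
# Preplay times in seconds, in the order the totals are quoted in the text.
baseline = [250.1, 33.4, 37.2, 42.8]
with_vsmt = [212.6, 28.4, 34.4, 41.6]


def relative_saving(before, after):
    """Fraction of the total time saved when going from `before` to `after`."""
    return (sum(before) - sum(after)) / sum(before)


saving_percent = round(100 * relative_saving(baseline, with_vsmt))
```

The totals are 363.5 and 317.0 seconds, a difference of 46.5 seconds, which is about 12.8% of the baseline and rounds to the reported 13%.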
For the user, this means that enabling vsmt as a proof preplay tactic may significantly reduce the verification time of their formalizations.
Impact of the Strategies. We have also studied what happens if we remove a single veriT strategy from Sledgehammer (Table 4). The most important one is best, as it solves the highest number of problems. In contrast, default is nearly entirely covered by the other strategies. ccfv_SIG and del_insts are faster than Z3 on a similar number of goals, but the latter solves more unique goals and therefore saves more time. Each strategy has some uniquely solved problems that cannot be reconstructed using any other. The results are similar for the other theories used in Table 3.
5.3 Speed of Reconstruction
To better understand what the key rules of our reconstruction are, we recorded the time used to reconstruct each rule and the time required by the solver over all calls attempted by Sledgehammer, including the ones not selected. The reconstruction ratio (reconstruction time over search time) shows how much slower reconstructing a proof is than finding it. For 25% of the proofs, Z3’s concise format is better and the reconstruction is faster than proof finding (first quartile: 0.9 for vsmt vs. 0.1 for zsmt). The 99th percentile of the proofs (18.6 vs. 27.2) shows that veriT’s detailed proof format reduces the number of slow proofs. The reconstruction is slower than finding proofs on average for both solvers.
Fig. 1 shows the distribution of the time spent on some rules. We remove the slowest and fastest 5% of the applications, because garbage collection can trigger at any moment and even trivial rules can be slow. Fig. 2 gives the sum of all reconstruction times over all proofs. We call parsing the time required to parse and convert the veriT proof into Isabelle terms.
Overall, there are two kinds of rules: (1) direct application of a sequence of theorems (e.g., equiv_pos2 corresponds to the theorem \(\lnot (a \Leftrightarrow b) \vee \lnot a \vee b\)) and (2) calls to full-blown tactics, like qnt_cnf (Sect. 4.2).
First, direct applications of theorems are usually fast, but they occur so often that the cumulative time is substantial. For example, cong only needs to unfold assumptions and apply reflexivity and symmetry of equality. However, it appears so often, and sometimes on large terms, that it is an important rule.
Second, rules which require full-blown tactics are the slowest rules. For qnt_cnf (CNF under quantifiers, see Sect. 4.2), we have not written a specialized tactic, but rely on Isabelle’s tableau-based blast tactic. This rule is rather slow, but is rarely used. It is similar to the rule la_generic: it is slow on average, but searching the coefficients takes even more time.
We can also see that the time required to check the simplification steps that were formerly combined into the connect_equiv rule is not significant anymore.
We have performed the same experiments with the reconstruction of the SMT solver Z3. In contrast to veriT, we do not have the amount of time required for parsing. The results are shown in Figs. 3 and 4. The rule distribution is very different. The nnf-neg and nnf-pos rules are the slowest rules and take a huge amount of time in the worst case. However, the coarser quantifier instantiation step is on average faster than the one produced by veriT. We suspect that reconstruction is faster because the rule, which is only an implication without choice terms, is easier to check (no equality reordering).
6 Related Work
The SMT solvers CVC4 [10], Z3 [34], and veriT [19] produce proofs. CVC4 does not record quantifier reasoning in the proof, and Z3 uses some macro rules. Proofs from SMT solvers have also been used to find unsatisfiability cores [20] and interpolants [32]. They are also useful to debug the solver itself, since unsound steps often point to the origin of bugs. Our work also relates to systems like Dedukti [5] that focus on translating proof steps, not on replaying them.
Proof reconstruction has been implemented in various systems, including CVC4 proofs in HOL Light [31], Z3 in HOL4 and Isabelle/HOL [18], and veriT [4] and CVC4 [24] in Coq. Only veriT produces detailed proofs for preprocessing and skolemization. SMTCoq [4, 24] currently supports veriT’s version 1 of the proof output which has different rules, does not support detailed skolemization rules, and is implemented in the 2016 version of veriT, which has worse performance. SMTCoq also supports bit vectors and arrays.
The reconstruction of Z3 proofs in HOL4 and Isabelle/HOL is one of the most advanced and well tested. It is regularly used by Isabelle users. The Z3 proof reconstruction succeeds in more than 90% of Sledgehammer benchmarks [14, Section 9] and is efficient (an older version of Z3 was used). Performance numbers are reported [16, 18] not only for problems generated by proof assistants (including Isabelle), but also for preexisting SMT-LIB files from the SMT-LIB library.
The performance study by Böhme [16, Sect. 3.4] uses version 2.15 of Z3, whereas we use version 4.4.0 which currently ships with Isabelle. Since version 2.15, the proof format changed slightly (e.g., th-lemma-arith was introduced), fulfilling some of the wishes expressed by Böhme and Weber [18] to simplify reconstruction. Surprisingly, the nnf rules do not appear among the five rules that used the most runtime. Instead, the th-lemma and rewrite rules were the slowest. Similarly to veriT, the cong rule was among the most used (without accounting for the most time), but it does not appear in our Z3 tests.
CVC4 follows a different philosophy compared to veriT and Z3: it produces proofs in a logical framework with side conditions [39]. The output can contain programs to check certain rules. The proof format is flexible in some aspects and restrictive in others. Currently CVC4 does not generate proofs for quantifiers.
7 Conclusion
We presented an efficient reconstruction of proofs generated by a modern SMT solver in an interactive theorem prover. Our improvements address reconstruction challenges for proof steps of typical inferences performed by SMT solvers.
By studying the time required to replay each rule, we were able to compare the reconstruction for two different proof formats with different design directions. The very detailed proof format of veriT makes the reconstruction easier to implement and allows for more specialization of the tactics. On slow proofs, the ratio of time to reconstruct and time to find a proof is better for our more detailed format. Integrating our reconstruction in Isabelle halves the number of failures from Sledgehammer and nicely completes the existing reconstruction method with Z3.
Our work is integrated into Isabelle version 2021. Sledgehammer suggests the veriT-based reconstruction if it is the fastest tactic that finds the proof, so users profit without any action required on their side. We plan to improve the reconstruction of the slowest rules and remove inconsistencies in the proof format. The developers of the SMT solver CVC4 are currently rewriting the proof generation and plan to support a similar proof format. We hope to be able to reuse the current reconstruction code by only adding support for CVC4-specific rules. Generating and reconstructing proofs from the veriT version with higher-order logic [9] could also improve the usefulness of veriT on Isabelle problems. The current proof rules [40] should accommodate the more expressive logic.
References
Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant. Zenodo (Apr 2021). https://doi.org/10.5281/zenodo.4727349
Abdulaziz, M., Paulson, L.C.: An Isabelle/HOL formalisation of Green’s theorem. Archive of Formal Proofs (Jan 2018), https://isaafp.org/entries/Green.html, formal proof development
Abdulaziz, M., Paulson, L.C.: An Isabelle/HOL Formalisation of Green’s Theorem. Journal of Automated Reasoning 63(3), 763–786 (Nov 2018). https://doi.org/10.1007/s108170189495z
Armand, M., Faure, G., Grégoire, B., Keller, C., Théry, L., Werner, B.: A modular integration of SAT/SMT solvers to Coq through proof witnesses. In: Jouannaud, J.-P., Shao, Z. (eds.) CPP 2011. LNCS, vol. 7086, pp. 135–150. Springer, Berlin Heidelberg (2011). https://doi.org/10.1007/9783642253799_12
Assaf, A., Burel, G., Cauderlier, R., Delahaye, D., Dowek, G., Dubois, C., Gilbert, F., Halmagrand, P., Hermant, O., Saillard, R.: Expressing theories in the \(\lambda \pi \)-calculus modulo theory and in the Dedukti system. In: TYPES: Types for Proofs and Programs. Novi Sad, Serbia (May 2016)
Barbosa, H.: Efficient instantiation techniques in SMT (work in progress). vol. 1635, pp. 1–10. CEUR-WS.org (Jul 2016), http://ceurws.org/Vol1635/#paper01
Barbosa, H., Blanchette, J.C., Fleury, M., Fontaine, P.: Scalable fine-grained proofs for formula processing. Journal of Automated Reasoning (Jan 2019). https://doi.org/10.1007/s1081701809502y
Barbosa, H., Fontaine, P., Reynolds, A.: Congruence closure with free variables. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 214–230. Springer, Berlin Heidelberg (2017). https://doi.org/10.1007/9783662545805_13
Barbosa, H., Reynolds, A., Ouraoui, D.E., Tinelli, C., Barrett, C.W.: Extending SMT solvers to higher-order logic. In: Fontaine, P. (ed.) CADE 27. LNCS, vol. 11716, pp. 35–54. Springer International Publishing (2019). https://doi.org/10.1007/9783030294366_3
Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A., Tinelli, C.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 171–177. Springer, Berlin Heidelberg (2011). https://doi.org/10.1007/9783642221101_14
Barrett, C., Fontaine, P., Tinelli, C.: The SMT-LIB Standard: Version 2.6. Tech. rep., Department of Computer Science, The University of Iowa (2017), available at www.SMT-LIB.org
Barrett, C.W., Tinelli, C.: Satisfiability modulo theories. In: Clarke, E.M., Henzinger, T.A., Veith, H., Bloem, R. (eds.) Handbook of Model Checking, pp. 305–343. Springer International Publishing, Cham (2018). https://doi.org/10.1007/9783319105758_11
Becker, H., Blanchette, J.C., Waldmann, U., Wand, D.: Formalization of Knuth–Bendix orders for lambda-free higher-order terms. Archive of Formal Proofs (Nov 2016), https://isaafp.org/entries/Lambda_Free_KBOs.html, formal proof development
Blanchette, J.C., Böhme, S., Fleury, M., Smolka, S.J., Steckermeier, A.: Semi-intelligible Isar Proofs from Machine-Generated Proofs. Journal of Automated Reasoning 56(2), 155–200 (2015). https://doi.org/10.1007/s1081701593353
Blanchette, J.C., Böhme, S., Paulson, L.C.: Extending Sledgehammer with SMT solvers. In: Bjørner, N., Sofronie-Stokkermans, V. (eds.) CADE 23. LNCS, vol. 6803, pp. 116–130. Springer, Berlin Heidelberg (2011). https://doi.org/10.1007/9783642224386_11
Böhme, S.: Proving Theorems of Higher-Order Logic with SMT Solvers. Ph.D. thesis, Technische Universität München (2012), http://mediatum.ub.tum.de/node?id=1084525
Böhme, S., Nipkow, T.: Sledgehammer: Judgement day. In: Giesl, J., Hähnle, R. (eds.) IJCAR 2010. pp. 107–121. Springer, Berlin Heidelberg (2010). https://doi.org/10.1007/9783642142031_9
Böhme, S., Weber, T.: Fast LCF-style proof reconstruction for Z3. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 179–194. Springer, Berlin Heidelberg (2010). https://doi.org/10.1007/9783642140525_14
Bouton, T., de Oliveira, D.C.B., Déharbe, D., Fontaine, P.: veriT: An open, trustable and efficient SMT-solver. In: Schmidt, R.A. (ed.) CADE 22. LNCS, vol. 5663, pp. 151–156. Springer, Berlin Heidelberg (2009). https://doi.org/10.1007/9783642029592_12
Déharbe, D., Fontaine, P., Guyot, Y., Voisin, L.: SMT solvers for Rodin. In: Derrick, J., Fitzgerald, J.A., Gnesi, S., Khurshid, S., Leuschel, M., Reeves, S., Riccobene, E. (eds.) ABZ 2012. LNCS, vol. 7316, pp. 194–207. Springer, Berlin Heidelberg (Jun 2012). https://doi.org/10.1007/9783642308857_14
Dutertre, B., de Moura, L.: Integrating simplex with DPLL(T). Tech. rep., SRI International (May 2006), http://www.csl.sri.com/users/bruno/publis/sricsl0601.pdf
Eberl, M.: Elementary facts about the distribution of primes. Archive of Formal Proofs (Feb 2019), https://isaafp.org/entries/Prime_Distribution_Elementary.html, formal proof development
Eberl, M., Paulson, L.C.: The prime number theorem. Archive of Formal Proofs (Sep 2018), https://isaafp.org/entries/Prime_Number_Theorem.html, formal proof development
Ekici, B., Mebsout, A., Tinelli, C., Keller, C., Katz, G., Reynolds, A., Barrett, C.W.: SMTCoq: A plugin for integrating SMT solvers into Coq. In: Majumdar, R., Kuncak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 126–133. Springer International Publishing (2017). https://doi.org/10.1007/9783319633909_7
Fleury, M., Schurr, H.: Reconstructing veriT proofs in Isabelle/HOL. In: Reis, G., Barbosa, H. (eds.) PxTP 2019. EPTCS, vol. 301, pp. 36–50 (2019). https://doi.org/10.4204/EPTCS.301.6
Gordon, M.J., Milner, R., Wadsworth, C.P.: Edinburgh LCF. LNCS, vol. 78. Springer, Heidelberg (1979). https://doi.org/10.1007/3540097244
Immler, F.: Re: [isabelle] Isabelle 2019RC2 sporadic smt failures. Email (May 2019), https://lists.cam.ac.uk/pipermail/clisabelleusers/2019May/msg00130.html
Kühlwein, D., Blanchette, J.C., Kaliszyk, C., Urban, J.: MaSh: Machine learning for Sledgehammer. In: ITP. LNCS, vol. 7998, pp. 35–50. Springer (2013)
Leino, K.R.M., Pit-Claudel, C.: Trigger selection strategies to stabilize program verifiers. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9779, pp. 361–381. Springer International Publishing (2016). https://doi.org/10.1007/9783319415284_20
Marić, F., Spasić, M., Thiemann, R.: An incremental simplex algorithm with unsatisfiable core generation. Archive of Formal Proofs (Aug 2018), https://isaafp.org/entries/Simplex.html, formal proof development
McLaughlin, S., Barrett, C., Ge, Y.: Cooperating theorem provers: A case study combining HOL-Light and CVC Lite. Electronic Notes in Theoretical Computer Science 144(2), 43–51 (2006). https://doi.org/10.1016/j.entcs.2005.12.005
McMillan, K.L.: Interpolants from Z3 proofs. In: FMCAD 2011. pp. 19–27. FMCAD Inc, Austin, Texas (2011)
Meng, J., Paulson, L.C.: Lightweight relevance filtering for machine-generated resolution problems. J. Appl. Log. 7(1), 41–57 (2009)
de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Berlin Heidelberg (2008). https://doi.org/10.1007/9783540788003_24
Reynolds, A., Barbosa, H., Fontaine, P.: Revisiting enumerative instantiation. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10806, pp. 112–131. Springer International Publishing (2018). https://doi.org/10.1007/9783319899633_7
Reynolds, A., Tinelli, C., de Moura, L.: Finding conflicting instances of quantified formulas in SMT. In: FMCAD 2014. pp. 195–202. IEEE (2014). https://doi.org/10.1109/FMCAD.2014.6987613
Schlichtkrull, A., Blanchette, J.C., Traytel, D., Waldmann, U.: Formalization of Bachmair and Ganzinger’s ordered resolution prover. Archive of Formal Proofs (Jan 2018), https://isaafp.org/entries/Ordered_Resolution_Prover.html, formal proof development
Schulz, S.: E - a brainiac theorem prover. AI Communications 15(2–3), 111–126 (2002), http://content.iospress.com/articles/aicommunications/aic260
Stump, A., Oe, D., Reynolds, A., Hadarean, L., Tinelli, C.: SMT proof checking using a logical framework. Formal Methods in System Design 42(1), 91–118 (2013). https://doi.org/10.1007/s1070301201633
The veriT Team and Contributors: Proofonomicon: A reference of the veriT proof format. Software Documentation (2021), https://www.veritsolver.org/documentation/proofonomicon.pdf, last accessed: April 2021
Acknowledgment
We would like to thank Haniel Barbosa for his support with the implementation in veriT. We also thank Haniel Barbosa, Jasmin Blanchette, Pascal Fontaine, Daniela Kaufmann, Petar Vukmirović, and the anonymous reviewers for many fruitful discussions and for suggesting many textual improvements. The first and third authors have received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreements No. 713999, Matryoshka, and No. 830927, Concordia). The second author is supported by the LIT AI Lab funded by the State of Upper Austria. The experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2021 The Author(s)
About this paper
Cite this paper
Schurr, H.-J., Fleury, M., Desharnais, M. (2021). Reliable Reconstruction of Fine-grained Proofs in a Proof Assistant. In: Platzer, A., Sutcliffe, G. (eds) Automated Deduction – CADE 28. CADE 2021. Lecture Notes in Computer Science, vol 12699. Springer, Cham. https://doi.org/10.1007/9783030798765_26
DOI: https://doi.org/10.1007/9783030798765_26
Publisher Name: Springer, Cham
Print ISBN: 9783030798758
Online ISBN: 9783030798765