A Verified SAT Solver Framework with Learn, Forget, Restart, and Incrementality
Abstract
We developed a formal framework for conflict-driven clause learning (CDCL) using the Isabelle/HOL proof assistant. Through a chain of refinements, an abstract CDCL calculus is connected first to a more concrete calculus, then to a SAT solver expressed in a functional programming language, and finally to a SAT solver in an imperative language, with total correctness guarantees. The framework offers a convenient way to prove metatheorems and experiment with variants, including the Davis–Putnam–Logemann–Loveland (DPLL) calculus. The imperative program relies on the two-watched-literal data structure and other optimizations found in modern solvers. We used Isabelle’s Refinement Framework to automate the most tedious refinement steps. The most noteworthy aspects of our work are the inclusion of rules for forget, restart, and incremental solving and the application of stepwise refinement.
Keywords
SAT solvers, CDCL, DPLL, Proof assistants, Isabelle/HOL
1 Introduction
Researchers in automated reasoning spend a substantial portion of their work time developing logical calculi and proving metatheorems about them. These proofs are typically carried out with pen and paper, which is error-prone and can be tedious. Today’s proof assistants are easier to use than their predecessors and can help reduce the amount of tedious work, so it makes sense to use them for this kind of research.
In this spirit, we started an effort, called IsaFoL (Isabelle Formalization of Logic) [4], that aims at developing libraries and a methodology for formalizing modern research in the field, using the Isabelle/HOL proof assistant [45, 46]. Our initial emphasis is on established results about propositional and first-order logic. In particular, we are formalizing large parts of Weidenbach’s forthcoming textbook, tentatively called Automated Reasoning: The Art of Generic Problem Solving. Our inspiration for formalizing logic is the IsaFoR (Isabelle Formalization of Rewriting) project [55], which focuses on term rewriting.
The objective of formalization work is not to eliminate paper proofs, but to complement them with rich formal companions. Formalizations help catch mistakes, whether superficial or deep, in specifications and theorems; they make it easy to experiment with changes or variants of concepts; and they help clarify concepts left vague on paper.
This article presents our formalization of CDCL (conflict-driven clause learning) based on Weidenbach’s textbook, derived as a refinement of Nieuwenhuis, Oliveras, and Tinelli’s abstract presentation of CDCL [43]. It is the algorithm implemented in modern propositional satisfiability (SAT) solvers. We start with a family of formalized abstract DPLL (Davis–Putnam–Logemann–Loveland) [17] and CDCL [3, 6, 40, 42] transition systems from Nieuwenhuis et al. (Sect. 3). Some of the calculi include rules for learning and forgetting clauses and for restarting the search. All calculi are proved sound and complete, as well as terminating under a reasonable strategy.
The abstract CDCL calculus is refined into the more concrete calculus presented in the textbook and recently published [57] (Sect. 4). The latter specifies a criterion for learning clauses representing first unique implication points [6, Chapter 3], with the guarantee that learned clauses are not redundant and hence derived at most once. The correctness results (soundness, completeness, termination) are inherited from the abstract calculus. The calculus also supports incremental solving.
The concrete calculus is refined further to obtain a verified, but very naive, functional program extracted using Isabelle’s code generator (Sect. 5). The final refinement step derives an imperative SAT solver implementation with efficient data structures, including the well-known two-watched-literal optimization (Sect. 6).

Develop a basic library of formalized results and a methodology aimed at researchers who want to experiment with calculi.

Study and connect the members of the CDCL family, including newer extensions.

Check the proofs in Weidenbach’s textbook and provide a formal companion to the book.

Assess the suitability of Isabelle/HOL for formalizing logical calculi.

Isar [58] is a textual proof format inspired by the pioneering Mizar system [41]. It makes it possible to write structured, readable proofs—a requisite for any formalization that aims at clarifying an informal proof.

Sledgehammer [7, 48] integrates superposition provers and SMT (satisfiability modulo theories) solvers in Isabelle to discharge proof obligations. The SMT solvers, and one of the superposition provers [56], are built around a SAT solver, resulting in a situation where SAT solvers are employed to prove their own metatheory.

Locales [2, 25] parameterize theories over operations and assumptions, encouraging a modular style. They are useful to express hierarchies of concepts and to reduce the number of parameters and assumptions that must be threaded through a formal development.

The Refinement Framework [30] can be used to express refinements from abstract data structures and algorithms to concrete, optimized implementations. This allows us to reason about simple algebraic objects and yet obtain efficient programs. The Sepref tool [31] builds on the Refinement Framework to derive an imperative program, which can be extracted to Standard ML and other programming languages. For example, Isabelle’s algebraic lists can be refined to mutable arrays in ML.
2 Isabelle/HOL
Isabelle [45, 46] is a generic proof assistant that supports several object logics. The metalogic is an intuitionistic fragment of higher-order logic (HOL) [15]. The types are built from type variables 'a, 'b, … and n-ary type constructors, normally written in postfix notation (e.g., 'a list). The infix type constructor 'a ⇒ 'b is interpreted as the (total) function space from 'a to 'b. Function applications are written in a curried style without parentheses (e.g., f x y). Anonymous functions \(x \mapsto t_x\) are written \(\lambda x.\; t_x\). The notation \(t \mathbin{::} \tau \) indicates that term t has type \(\tau \). Propositions are terms of type prop, a type with at least two values. Symbols belonging to the signature (e.g., f) are uniformly called constants, even if they are functions or predicates. No syntactic distinction is enforced between terms and formulas. The metalogical operators are universal quantification ⋀ :: ('a ⇒ prop) ⇒ prop, implication ⟹ :: prop ⇒ prop ⇒ prop, and equality ≡ :: 'a ⇒ 'a ⇒ prop. The notation \({\bigwedge }x.\; p_x\) abbreviates \({\bigwedge }\;(\lambda x.\; p_x)\) and similarly for other binder notations.
Isabelle/HOL is the instantiation of Isabelle with HOL, an object logic for classical HOL extended with rank-1 (top-level) polymorphism and Haskell-style type classes. It axiomatizes a type bool of Booleans as well as its own set of logical symbols (\(\forall \), \(\exists \), False, True, \(\lnot \), \(\wedge \), \(\vee \), \(\longrightarrow \), \(\longleftrightarrow \), \(=\)). The object logic is embedded in the metalogic via a constant Trueprop :: bool ⇒ prop, which is normally not printed. In practice, the distinction between the two logical levels is important operationally but not semantically.
Isabelle adheres to the tradition initiated in the 1970s by the LCF system [22]: All inferences are derived by a small trusted kernel; types and functions are defined rather than axiomatized to guard against inconsistencies. High-level specification mechanisms let us define important classes of types and functions, notably inductive datatypes, inductive predicates, and recursive functions. Internally, the system synthesizes appropriate low-level definitions and derives the user specifications via primitive inferences.
Isabelle developments are organized as collections of theory files that build on one another. Each file consists of definitions, lemmas, and proofs expressed in Isar [58], Isabelle’s input language. Isar proofs are expressed either as a sequence of tactics that manipulate the proof state directly or in a declarative, natural-deduction format inspired by Mizar. Our formalization almost exclusively employs the more readable declarative style.
2.1 Sledgehammer
The Sledgehammer subsystem [7, 48] integrates automatic theorem provers in Isabelle/HOL, including CVC4, E, LEO-II, Satallax, SPASS, Vampire, veriT, and Z3. Upon invocation, it heuristically selects relevant lemmas from the thousands available in loaded libraries, translates them along with the current proof obligation to SMT-LIB or TPTP, and invokes the automatic provers. In case of success, the machine-generated proof is translated to an Isar proof that can be inserted into the formal development, so that the external provers do not need to be trusted.
2.2 Isar
In Isar proofs, intermediate properties are introduced using have and proved using a tactic such as simp or auto. Proof blocks (proof \(\;\ldots \;\) qed) can be nested. The advantage of Isar proofs over one-line metis proofs is that we can follow and understand the steps. However, for lemmas about multisets and other background theories, we are usually content if we can obtain a proof automatically and carry on with formalizing the more interesting foreground theory.
2.3 Locales
2.4 Refinement Framework

The ‘do’ construct is a convenient Haskell-inspired syntax for expressing monadic computations (here, on the nondeterminism monad).

The WHILE\(_\mathrm {T}\) combinator takes a condition, a loop body, and a start value. In our example, the loop’s state is a pair. The T subscript in the combinator’s name indicates that the loop must not diverge. Totality is necessary for code generation.

The ASSERT statement takes an assertion that must always be true when the statement is executed.

The xs ! i operation returns the \((i + 1)\)st element of xs, and xs[i := y] replaces the \((i + 1)\)st element by y.
The Sepref tool automates the transition from the nondeterminism monad to the heap monad. It keeps track of the values that are destroyed and ensures that they are not used later in the program. Given a suitable source program, it can generate the target program and prove the corresponding refinement lemma automatically. The main difficulty is that some low-level operations have side conditions, which we must explicitly discharge by adding assertions at the right points in the source program to guide Sepref.
The ML idiom \(\texttt {(fn () => \ldots ) ()}\) is inserted to delay the evaluation of the body, so that the side effects occur in the intended order.
3 Abstract CDCL
The abstract CDCL calculus by Nieuwenhuis et al. [43] forms the first layer of our refinement chain. The formalization relies on basic Isabelle libraries for lists and multisets and on custom libraries for propositional logic. Properties such as partial correctness and termination (given a suitable strategy) are inherited by subsequent layers.
3.1 Propositional Logic
The simpler calculi do not use the clause annotation on propagated literals; they instantiate the annotation type with unit, a singleton type whose unique value is (). Informally, we write A, \(\lnot \,A\), and \(L^\dag \) for positive, negative, and decision literals, and we write \(L^C\) (with C the annotating clause) or simply L (if the annotation type is unit or if the clause C is irrelevant) for propagated literals. The unary minus operator is used to negate a literal, with \(-(\lnot \,A) = A\).
As is customary in the literature [1, 57], clauses are represented by multisets, ignoring the order of literals but not repetitions. A clause is a (finite) multiset of literals. Clauses are often stored in sets or multisets of clauses. To ease reading, we write clauses using logical symbols (e.g., \(\bot \), L, and \(C \vee D\) for \(\emptyset \), \(\{L\}\), and \(C \uplus D\)). Given a clause C, we write \(\lnot \,C\) for the formula that corresponds to the clause’s negation.
Given a set or multiset I of literals, \(I \vDash C\) is true if and only if C and I share a literal. This is lifted to sets and multisets of clauses or formulas: \(I \vDash N\) if and only if \(I \vDash C\) for every \(C \in N\). A set or multiset is satisfiable if there exists a consistent set or multiset of literals I such that \(I \vDash N\). Finally, \(N \vDash N'\) denotes entailment: every model of N is also a model of \(N'\). These notations are also extended to formulas.
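The satisfaction relation just defined can be illustrated with a small Python sketch. It is not part of the Isabelle formalization; the encoding is an assumption made for illustration only: literals are nonzero integers (negation is arithmetic minus), a clause is a `Counter` used as a multiset, and the function names are hypothetical.

```python
from collections import Counter

def satisfies_clause(interp, clause):
    """I |= C holds iff the clause C and the literal set I share a literal."""
    return any(lit in interp for lit in clause)

def satisfies_all(interp, clauses):
    """I |= N holds iff I satisfies every clause of N."""
    return all(satisfies_clause(interp, c) for c in clauses)

def is_consistent(interp):
    """A set of literals is consistent if it never contains both L and -L."""
    return all(-lit not in interp for lit in interp)

# Clauses are multisets, so a repeated literal such as -2 is preserved.
I = {1, -2}
N = [Counter([1, 2]), Counter([-2, 3]), Counter([-1, -2, -2])]
print(is_consistent(I), satisfies_all(I, N))
```

Iterating over a `Counter` visits each distinct literal once, which is enough here because satisfaction does not depend on multiplicities.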
3.2 DPLL with Backjumping
Nieuwenhuis et al. present CDCL as a set of transition rules on states. A state is a pair \((M, N)\), where M is the trail and N is the multiset of clauses to satisfy. In a slight abuse of terminology, we will refer to the multiset of clauses as the “clause set.” The trail is a list of annotated literals that represents the partial model under construction. The empty list is written \(\epsilon \). Somewhat nonstandardly, but in accordance with Isabelle conventions for lists, the trail grows on the left: Adding a literal L to M results in the new trail \(L \cdot M\), where \(\cdot \) has type 'a ⇒ 'a list ⇒ 'a list. The concatenation of two lists is written \(M \mathbin {@} M'\). To lighten the notation, we often build lists from elements and other lists by simple juxtaposition, writing \(M L M'\) for \(M \mathbin {@} L \cdot M'\).

Propagate \((M, N) \Longrightarrow (L M, N)\)
if N contains a clause \(C\vee L\) such that \(M \vDash \lnot \, C\) and L is undefined in M (i.e., neither \(M \vDash L\) nor \(M \vDash -L\))

Decide \((M, N) \Longrightarrow (L^\dag M, N)\)
if the atom of L occurs in N and is undefined in M

Backjump \((M'L^\dag M, N) \Longrightarrow (L' M, N)\)
if N contains a conflicting clause C (i.e., \(M'L^\dag M\vDash \lnot \, C\)) and there exists a clause \(C'\vee L'\) such that \(N\vDash C'\vee L'\), \(M \vDash \lnot \, C'\), and \(L'\) is undefined in M but occurs in N or in \(M'L^\dag \)
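To make the side conditions concrete, here is a minimal Python sketch of the first two rules above (extending the trail with a literal forced by a clause, and extending it with a decision on an undefined literal). It is an illustration under stated assumptions, not the formalized calculus: literals are nonzero integers, a clause is a tuple, the trail is a list of `(literal, is_decision)` pairs that grows on the left, and the function names are hypothetical. The calculus is nondeterministic; the sketch simply returns the first applicable instance.

```python
def defined_literals(trail):
    """Set of literals currently on the trail."""
    return {lit for lit, _ in trail}

def is_undefined(trail, lit):
    """A literal is undefined if neither it nor its negation is on the trail."""
    lits = defined_literals(trail)
    return lit not in lits and -lit not in lits

def propagate(trail, clauses):
    """Return the trail extended with one propagated literal, or None."""
    lits = defined_literals(trail)
    for clause in clauses:
        for i, lit in enumerate(clause):
            rest = clause[:i] + clause[i + 1:]  # the clause is rest ∨ lit
            if is_undefined(trail, lit) and all(-o in lits for o in rest):
                return [(lit, False)] + trail   # new literal goes on the left
    return None

def decide(trail, clauses):
    """Return the trail extended with one decision literal, or None."""
    for clause in clauses:
        for lit in clause:
            if is_undefined(trail, lit):
                return [(lit, True)] + trail
    return None

N = [(1,), (-1, 2), (-2, 3, 4)]
M = propagate([], N)   # forces 1
M = propagate(M, N)    # forces 2
M = decide(M, N)       # nothing is forced any more, so decide 3
print(M)
```

Removing exactly one occurrence of the candidate literal (rather than all of them) mirrors the multiset reading of \(C \vee L\).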
Following a common idiom, the DPLL-with-backjumping calculus is distributed over two locales: The first locale defines the calculus; the second locale extends it with an assumption expressing a structural invariant over the calculus that is instantiated when proving concrete properties later. This cannot be achieved with a single locale, because definitions may not precede assumptions.
Theorem 1
(Termination [20]) The transition relation of the DPLL-with-backjumping calculus is well founded.
(1) there exists an index \(i \le n, n'\) such that \([\nu \, M'_0,\, \cdots ,\, \nu \, M'_{i-1}] = [\nu \, M_0,\, \cdots ,\, \nu \, M_{i-1}]\) and \(\nu \,M'_i < \nu \,M_i\); or
(2) \([\nu \, M_0,\, \cdots ,\, \nu \, M_{n}]\) is a strict prefix of \([\nu \, M'_0,\, \cdots ,\, \nu \, M'_{n'}]\).
A final state is a state from which no transitions are possible. Given a relation \(\Longrightarrow \), we write \(\Longrightarrow ^{!}\) for the right-restriction of its reflexive transitive closure to final states (i.e., \(S_0 \Longrightarrow ^{!} S\) if and only if \(S_0 \Longrightarrow ^{*} S\) and S is a final state).
Theorem 2
(Partial Correctness [20]) If \((\epsilon , N) \Longrightarrow ^{!} (M, N)\), then N is satisfiable if and only if \(M\vDash N.\)
We first prove structural invariants on arbitrary states \((M', N)\) reachable from \((\epsilon , N)\), namely: (1) each variable occurs at most once in \(M'\); (2) if \(M' = M_2 L M_1\) where L is propagated, then \(M_1, N \vDash L\). From these invariants, together with the constraint that \((M, N)\) is a final state, it is easy to prove the theorem.
3.3 Classical DPLL

Backtrack \((M'L^\dag M, N) \Longrightarrow (-L \cdot M, N)\)
if N contains a conflicting clause and \(M'\) contains no decision literals
Lemma 3
(Backtracking [20]) The Backtrack rule is a special case of the Backjump rule.
The Backjump rule depends on two clauses: a conflict clause C and a clause \(C'\vee L'\) that justifies the propagation of \(L'\!.\) The conflict clause is specified by Backtrack. As for \(C'\vee L'\), given a trail \(M'L^\dag M\) decomposable as \(M_nL^\dag M_{n-1}L_{n-1}^\dag \cdots M_1 L_1^\dag M_0\) where \(M_0,\cdots ,M_n\) contain no decision literals, we can take \(C' = -L_1\vee \cdots \vee -L_{n-1}\).
If a conflict cannot be resolved by backtracking, we would like to have the option of stopping even if some variables are undefined. A state \((M, N)\) is conclusive if \(M \vDash N\) or if N contains a conflicting clause and M contains no decision literals. For the classical DPLL calculus, all final states are conclusive, but not all conclusive states are final.
Theorem 4
(Partial Correctness [20])
If \((\epsilon , N) \Longrightarrow ^{*} (M, N)\) and \((M, N)\) is a conclusive state, N is satisfiable if and only if \(M\vDash N\).
The theorem does not require stopping at the first conclusive state. In an implementation, testing \(M\vDash N\) can be expensive, so a solver might fail to notice that a state is conclusive and continue for some time. In the worst case, it will stop in a final state—which is guaranteed to exist by Theorem 1. In practice, instead of testing whether \(M\vDash N\), implementations typically apply the rules until every literal is set. When N is satisfiable, this produces a total model.
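The practice described above (apply the rules until every literal is set, and stop with "unsatisfiable" when a conflict occurs without any decision on the trail) can be sketched as a small Python loop. This is an illustrative sketch, not the verified solver: it assumes integer literals, clauses as tuples, a trail of `(literal, is_decision)` pairs growing on the left, and the usual chronological backtracking that flips the most recent decision. The name `dpll` is hypothetical.

```python
def dpll(clauses):
    """Return a satisfying set of literals, or None if the clauses are unsatisfiable."""
    trail = []  # newest entry first; each entry is (literal, is_decision)

    def lits():
        return {l for l, _ in trail}

    def has_conflict():
        m = lits()
        return any(all(-l in m for l in c) for c in clauses)

    def unit_literal():
        m = lits()
        for c in clauses:
            for i, l in enumerate(c):
                rest = c[:i] + c[i + 1:]
                if l not in m and -l not in m and all(-o in m for o in rest):
                    return l
        return None

    def undefined_literal():
        m = lits()
        for c in clauses:
            for l in c:
                if l not in m and -l not in m:
                    return l
        return None

    while True:
        if has_conflict():
            # Backtrack: drop literals above the latest decision, then flip it.
            while trail and not trail[0][1]:
                trail.pop(0)
            if not trail:
                return None  # conflict without decisions: conclusive, unsatisfiable
            decision, _ = trail.pop(0)
            trail.insert(0, (-decision, False))
        elif (l := unit_literal()) is not None:
            trail.insert(0, (l, False))   # propagate
        elif (l := undefined_literal()) is not None:
            trail.insert(0, (l, True))    # decide
        else:
            return lits()                 # final state: every literal is set

print(dpll([(1, 2), (-1, 2), (-2, 3)]))
print(dpll([(1, 2), (1, -2), (-1, 2), (-1, -2)]))
```

The loop never tests \(M \vDash N\) explicitly; it relies on reaching a final state, where the absence of a conflict together with a total assignment implies that every clause is satisfied.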
3.4 The CDCL Calculus

Learn \((M, N) \Longrightarrow (M, N \uplus \{C\})\) if \(N\vDash C\) and each atom of C is in N or M

Forget \((M, N \uplus \{C\}) \Longrightarrow (M, N)\) if \(N\vDash C\)
We call the resulting calculus abstract CDCL. In general, it does not terminate, because it is possible to learn and forget the same clause infinitely often. But for some instantiations of the parameters with suitable restrictions on Learn and Forget, the calculus always terminates.
Theorem 5
(Termination [20])
Let \(\mathcal {C}\) be an instance of the abstract CDCL calculus (i.e., every transition of \(\mathcal {C}\) is a transition of abstract CDCL). If \(\mathcal {C}\) admits no infinite chains consisting exclusively of Learn and Forget transitions, then \(\mathcal {C}\) is well founded.

Backjump_Learn \((M'L^\dag M, N) \Longrightarrow (L' M, N \uplus \{C' \vee L'\})\)
if \(C'\), L, \(L'\), M, \(M'\), N satisfy Backjump’s side conditions
3.5 Restarts
A working strategy is to gradually increase the number of transitions between successive restarts. This is formalized via a locale parameterized by a base calculus and an unbounded function f on natural numbers. Nieuwenhuis et al. require f to be strictly increasing, but unboundedness is sufficient.
We instantiated the base-calculus parameter with an abstract CDCL calculus and f with the Luby sequence (\(1, 1, 2, 1, 1, 2, 4, \cdots \)) [35], with the restriction that no clause containing duplicate literals is ever learned, thereby bounding the number of learnable clauses and hence the number of transitions taken by the base calculus.
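For readers unfamiliar with the Luby sequence, the following Python sketch computes its elements using the standard recursive characterization: the element at position \(2^k - 1\) is \(2^{k-1}\), and every other position repeats an earlier part of the sequence. The function name and the 1-based indexing are choices made for this illustration, not taken from the formalization.

```python
def luby(i):
    """Return the i-th element (1-indexed) of the Luby sequence."""
    k = 1
    while (1 << k) - 1 < i:        # smallest k with i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)        # positions 1, 3, 7, 15, ... yield 1, 2, 4, 8, ...
    return luby(i - (1 << (k - 1)) + 1)  # otherwise replay the earlier prefix

print([luby(i) for i in range(1, 16)])
```

The sequence is not increasing, but it is unbounded because position \(2^k - 1\) carries the value \(2^{k-1}\); this is the property the restart locale needs.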
Figure 1a summarizes the syntactic dependencies between the calculi reviewed in this section: an arrow between two calculi indicates that one is defined in terms of the other. Figure 1b presents the refinements between the calculi: an arrow indicates that we proved that one calculus refines the other, or some stronger result, either by locale embedding (sublocale) or by simulating one calculus’s behavior in terms of the other.
4 A Refined CDCL Towards an Implementation
The abstract CDCL calculus captures the essence of modern SAT solvers without imposing a policy on when to apply specific rules. In particular, the Backjump rule depends on a clause \(C' \vee L'\) to justify the propagation of a literal, but does not specify a procedure for coming up with this clause. For his textbook, Weidenbach developed a calculus that is more specific in this respect, and closer to existing solver implementations, while keeping many aspects unspecified [57]. This calculus, which we call Weidenbach’s CDCL, is also formalized in Isabelle and connected to abstract CDCL.
4.1 The New DPLL Calculus

Propagate \((M, N) \Longrightarrow (L M, N)\) if \(C\vee L \in N \uplus U\), \(M \vDash \lnot \, C\), and L is undefined in M

Decide \((M, N) \Longrightarrow (L^\dag M, N)\) if L is undefined in M and occurs in N

Backtrack \((M'L^\dag M, N) \Longrightarrow (-L \cdot M, N)\)
if N contains a conflicting clause and \(M'\) contains no decision literals
The termination and partial correctness proofs given by Weidenbach depart from those of Nieuwenhuis et al. We also formalized them:
Theorem 6
(Termination [20]) The transition relation of Weidenbach’s DPLL calculus is well founded.
Theorem 7
(Partial Correctness [20]) If \((\epsilon , N) \Longrightarrow ^{*} (M, N)\) and \((M, N)\) is a conclusive state, N is satisfiable if and only if \(M\vDash N.\)
The proof is analogous to the proof of Theorem 2. Some lemmas are shared between both proofs. Moreover, we can link Weidenbach’s DPLL calculus with the version we derived from DPLL with backjumping in Sect. 3.3:
Theorem 8
(DPLL [20]) For all states S that satisfy basic structural invariants, \(S \Longrightarrow S'\) in Weidenbach’s DPLL calculus if and only if \(S \Longrightarrow S'\) in the classical DPLL calculus of Sect. 3.3.
This provides another way to establish Theorems 6 and 7. Conversely, the simple measure that appears in the above termination proof can also be used to establish the termination of the more general DPLL-with-backjumping calculus (Theorem 1).
4.2 The New CDCL Calculus
Weidenbach’s CDCL calculus operates on states \((M, N, U, D)\), where M is the trail; N and U are the sets of initial and learned clauses, respectively; and D is a conflict clause, or the distinguished clause \(\top \) if no conflict has been detected.
In the trail M, each decision literal L is marked as such (\(L^\dag \)), and each propagated literal L is annotated with the clause C that caused its propagation (\(L^C\)). The level of a literal L in M is the number of decision literals to the right of the atom of L in M, or 0 if the atom is undefined. The level of a clause is the highest level of any of its literals, with 0 for \(\bot \), and the level of a state is the maximum level (i.e., the number of decision literals). The calculus assumes that N contains no clauses with duplicate literals and never produces clauses containing duplicates.
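The three notions of level can be sketched in a few lines of Python. The encoding is an assumption made for illustration: literals are nonzero integers, the trail is a list of `(literal, is_decision)` pairs with the newest entry on the left, and the function names are hypothetical. The sketch also assumes that a decision literal counts toward its own level, which is what makes the level of a state equal to the number of decisions.

```python
def literal_level(trail, lit):
    """Number of decisions at or to the right of lit's atom; 0 if the atom is undefined."""
    for i, (l, _) in enumerate(trail):
        if abs(l) == abs(lit):
            return sum(1 for _, is_decision in trail[i:] if is_decision)
    return 0

def clause_level(trail, clause):
    """Highest level of any literal of the clause; 0 for the empty clause."""
    return max((literal_level(trail, l) for l in clause), default=0)

def state_level(trail):
    """Level of the state, i.e., the number of decision literals on the trail."""
    return sum(1 for _, is_decision in trail if is_decision)

# Oldest entry on the right: propagate 5, decide 1, propagate 2, decide 3, propagate -4.
M = [(-4, False), (3, True), (2, False), (1, True), (5, False)]
print(literal_level(M, 4), clause_level(M, (-2, 5)), state_level(M))
```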

Propagate \((M, N, U, \top ) \Longrightarrow (L^{C\vee L} M, N, U, \top )\)
if \(C\vee L \in N \uplus U\), \(M \vDash \lnot \, C\), and L is undefined in M

Decide \((M, N, U, \top ) \Longrightarrow (L^\dag M, N, U, \top )\) if L is undefined in M and occurs in N

Conflict \((M, N, U, \top ) \Longrightarrow (M, N, U, D)\) if \(D \in N \uplus U\) and \(M \vDash \lnot \, D\)

Restart \((M, N, U, \top ) \Longrightarrow (\epsilon , N, U, \top )\) if \(M \not \vDash N\)

Forget \((M, N, U \uplus \{C\}, \top ) \Longrightarrow (M, N, U, \top )\) if \(M \not \vDash N\) and M contains no literal \(L^C\)

Skip \((L^C M, N, U, D) \Longrightarrow (M, N, U, D)\) if \(D \notin \{\bot ,\top \}\) and \(-L\) does not occur in D

Resolve \((L^{C\vee L} M, N, U, D \vee -L) \Longrightarrow (M, N, U, C \cup D)\)
if D has the same level as the current state

Backtrack \((M' K^\dag M, N, U, D \vee L) \Longrightarrow (L^{D\vee L} M, N, U \uplus \{D \vee L\}, \top )\)
if L has the level of the current state, D has a lower level, and K and D have the same level
In Resolve, \(C \cup D\) is the same as \(C \vee D\) (i.e., \(C \uplus D\)), except that it keeps only one copy of the literals that belong to both C and D. When performing propagations and processing conflict clauses, the calculus relies on the invariant that clauses never contain duplicate literals. Several other structural invariants hold on all states reachable from an initial state, including the following: The clause annotating a propagated literal of the trail is a member of \(N \uplus U.\) Some of the invariants were not mentioned in the textbook (e.g., whenever \(L^C\) occurs in the trail, L is a literal of C). Formalization helped develop a better understanding of the data structure and clarify the book.
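The difference between \(C \cup D\) and \(C \uplus D\) on multisets can be seen with Python's `collections.Counter`, whose `|` operator takes the maximum multiplicity of each element and whose `+` operator adds multiplicities. This is only an illustration of the two operations on duplicate-free clauses; the integer encoding of literals is an assumption of the sketch.

```python
from collections import Counter

C = Counter([1, -2])   # clause 1 ∨ ¬2
D = Counter([-2, 3])   # clause ¬2 ∨ 3

union = C | D          # C ∪ D: one copy of the shared literal -2
msum = C + D           # C ⊎ D: the shared literal -2 occurs twice

print(sorted(union.elements()))
print(sorted(msum.elements()))
```

For duplicate-free C and D, the union is again duplicate-free, which is why it preserves the invariant mentioned above.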
4.3 A Reasonable Strategy
To prove correctness, we assume a reasonable strategy: Propagate and Conflict are preferred over Decide; Restart and Forget are not applied. (We will lift the restriction on Restart and Forget in Sect. 4.5.) The resulting calculus refines Weidenbach’s CDCL with the assumption that derivations are produced by a reasonable strategy. This assumption is enough to ensure that the calculus can backjump after detecting a nontrivial conflict clause other than \(\bot \). The crucial invariant is the existence of a literal with the highest level in any conflict, so that Backtrack can be applied. The textbook suggests preferring Conflict to Propagate and Propagate to the other rules. While this makes sense in an implementation, it is not needed for any of our metatheoretical results.
Theorem 9
(Partial Correctness [20]) If \((\epsilon , N, \emptyset , \top ) \Longrightarrow ^{!} (M, N, U, D)\) under a reasonable strategy and N contains no clauses with duplicate literals, \((M, N, U, D)\) is a conclusive state.
Once a conflict clause has been stored in the state, the clause is first reduced by a chain of Skip and Resolve transitions. Then, there are two scenarios: (1) the conflict is solved by a Backtrack, at which point the calculus may resume propagating and deciding literals; (2) the reduced conflict is \(\bot \), meaning that N is unsatisfiable. In other words, for unsatisfiable clause sets, the calculus generates a resolution refutation.
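The reduction of a stored conflict clause can be sketched in Python as a loop over the two reduction rules of Sect. 4.2: drop the topmost propagated literal if its negation does not occur in the conflict, or resolve the conflict with the clause that propagated it when the rest of the conflict still has the level of the current state. The encoding is an assumption of this sketch, not the formalization: literals are nonzero integers, clauses are duplicate-free and stored as tuples or sets, each trail entry is `(literal, reason)` with `reason = None` for decisions, and the function names are hypothetical.

```python
def level(trail, lit):
    """Number of decisions at or to the right of lit's atom in the trail (0 if undefined)."""
    for i, (l, _) in enumerate(trail):
        if abs(l) == abs(lit):
            return sum(1 for _, reason in trail[i:] if reason is None)
    return 0

def reduce_conflict(trail, conflict):
    """Apply the two reduction rules exhaustively; return the reduced (trail, conflict)."""
    trail, d = list(trail), set(conflict)
    while trail and d:
        lit, reason = trail[0]
        if reason is None:
            break                      # a decision is on top: no reduction rule applies
        if -lit not in d:
            trail.pop(0)               # literal irrelevant to the conflict: drop it
            continue
        rest = d - {-lit}
        state_level = sum(1 for _, r in trail if r is None)
        rest_level = max((level(trail, l) for l in rest), default=0)
        if rest_level != state_level:
            break                      # -lit is the only literal of maximal level
        d = (set(reason) - {lit}) | rest   # resolve on lit, merging duplicates
        trail.pop(0)
    return trail, d

# Decide 1, propagate 2 from (-1 ∨ 2), decide 3, propagate 4 and 5, conflict (-4 ∨ -5).
M = [(5, (-4, -2, 5)), (4, (-3, 4)), (3, None), (2, (-1, 2)), (1, None)]
print(reduce_conflict(M, (-4, -5)))
```

In the example the loop stops with the conflict \(\lnot 4 \vee \lnot 2\), in which \(\lnot 4\) is the only literal of the highest level, so a backjump is possible; on a trail without decisions, the same loop reduces the conflict to the empty clause.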
Weidenbach’s CDCL calculus is designed to have respectable complexity bounds. One of the reasons for this is that the same clause cannot be learned twice:
Theorem 10
(No Relearning [20])
If we have \((\epsilon , N, \emptyset , \top ) \Longrightarrow ^{*} (M, N, U, D)\) under a reasonable strategy, then no Backtrack transition is possible from the latter state causing the addition of a clause from \(N \uplus U\) to U.
The textbook’s proof reads as follows:
Proof By contradiction. Assume CDCL learns the same clause twice, i.e., it reaches a state \((M, N, U, D \vee L)\) where Backtrack is applicable and \(D \vee L \in N \uplus U.\) More precisely, the state has the form \((K_n \cdots K_2 K_1^\dag M_2 K^\dag M_1, N, U, D \vee L)\) where the \(K_i\), \(i>1\) are propagated literals that do not occur complemented in D, as otherwise D cannot be of level i. Furthermore, one of the \(K_i\) is the complement of L. But now, because \(D \vee L\) is false in \(K_n \cdots K_2 K_1^\dag M_2 K^\dag M_1\) and \(D \vee L \in N \uplus U\), instead of deciding \(K_1^\dag \) the literal L should be propagated by a reasonable strategy. A contradiction. Note that none of the \(K_i\) can be annotated with \(D \vee L\). \(\square \)
Many details are missing. To find the contradiction, we must show that there exists a state in the derivation with the trail \(M_2K^\dag M_1\), and such that \(D\vee L \in N \uplus U.\) The textbook does not explain why such a state is guaranteed to exist. Moreover, inductive reasoning is hidden under the ellipsis notation (\(K_n\cdots K_2\)). Such a high-level proof might be suitable for humans, but the details are needed in Isabelle, and Sledgehammer alone cannot fill in such large gaps, especially if induction is needed. The first version of the formal proof was over 700 lines long and is among the most difficult proofs we carried out.
We later refactored the proof. Following the book, each transition of the strategy-restricted calculus was normalized by applying Propagate and Conflict exhaustively. For example, we defined a normalized Decide transition so that \(S \Longrightarrow U\) if Propagate and Conflict cannot be applied to S and a Decide step from S to T followed by an exhaustive chain of Propagate and Conflict steps from T to U exists for some state T. However, normalization is not necessary. It is simpler to define the transition as a plain Decide step, with the same condition on S as before. This change shortened the proof by about 200 lines. In a subsequent refactoring, we further departed from the book: We proved the invariant that all propagations have been performed before deciding a new literal. The core argument (“the literal L should be propagated by a reasonable strategy”) remains the same, but we do not have to reason about past transitions to argue about the existence of an earlier state. The invariant also makes it possible to generalize the statement of Theorem 10: We can start from any state that satisfies the invariant, not only from an initial state. The final version of the proof is 250 lines long.
Using Theorem 10 and assuming that only backjumping has a cost, we get a complexity of \(\mathrm {O}(3^V)\), where V is the number of different propositional variables. If Conflict is always preferred over Propagate, the learned clause is never redundant in the sense of ordered resolution [57], yielding a complexity bound of \(\mathrm {O}(2^V)\). We have not formalized this yet.
In the textbook, and in our formalization, Theorem 10 is also used to establish the termination of Weidenbach’s CDCL under a reasonable strategy. However, the argument for the termination of abstract CDCL also applies to Weidenbach’s CDCL irrespective of the strategy, a stronger result. To lift this result, we must show that Weidenbach’s CDCL refines abstract CDCL.
4.4 Connection with Abstract CDCL
It is interesting to show that Weidenbach’s CDCL refines abstract CDCL, to establish beyond doubt that Weidenbach’s calculus is a CDCL calculus and to lift the termination proof and any other general results about abstract CDCL. The states are easy to connect: We interpret a tuple \((M, N, U, C)\) of Weidenbach’s calculus as an abstract pair \((M, N \uplus U)\), ignoring C.
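The state interpretation is simple enough to write down directly. The Python sketch below is an illustration under assumed encodings (clauses as sorted tuples of integer literals, clause collections as `Counter` multisets); the function name `to_abstract` is hypothetical.

```python
from collections import Counter

def to_abstract(state):
    """Map a 4-tuple (M, N, U, C) to the pair (M, N ⊎ U), ignoring the conflict C."""
    trail, initial, learned, _conflict = state
    return trail, initial + learned   # Counter addition is multiset sum

N = Counter([(1, 2), (-1, 3)])
U = Counter([(1, 2)])                 # a learned copy of an initial clause
M = [(1, None)]
print(to_abstract((M, N, U, None)))
```

Because the sum is a multiset sum, a clause occurring both in N and in U occurs twice in the abstract clause set.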
The main difficulty is to relate the low-level conflict-related rules of Weidenbach’s CDCL to their high-level counterparts. Our solution is to introduce an intermediate calculus that combines consecutive low-level transitions into a single transition. This calculus refines both Weidenbach’s CDCL and abstract CDCL and is sufficiently similar to Weidenbach’s CDCL so that we can transfer termination and other properties from abstract CDCL to Weidenbach’s CDCL through it.
4.5 A Strategy with Restart and Forget
4.6 Incremental Solving
SMT solvers combine a SAT solver with theory solvers (e.g., for uninterpreted functions and linear arithmetic). The main loop runs the SAT solver on a clause set. If the SAT solver answers “unsatisfiable,” the SMT solver is done; otherwise, the main loop asks the theory solvers to provide further, theory-motivated clauses to exclude the current candidate model and force the SAT solver to search for another one. This design crucially relies on incremental SAT solving: The possibility of adding new clauses to the clause set C of a conclusive satisfiable state and of continuing from there.

Open image in new window \(_{\,C}\) Open image in new window
if \(M \not \vDash \lnot \, C\) and Open image in new window

Open image in new window \(_{\,C}\) Open image in new window
if \(L M \vDash \lnot \, C\), \(L \in C\), \(M'\) contains no literal of C, and
Theorem 11
(Partial Correctness [20, Open image in new window ]) If state Open image in new window is conclusive and Open image in new window , then Open image in new window is conclusive.
The key is to prove that the structural invariants that hold for Open image in new window still hold after adding the new clause to the state. Then the proof is easy because we can reuse the invariants we have already proved about Open image in new window .
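The SMT main loop described at the start of this subsection can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: literals are nonzero integers (a negative number denotes a negated atom), the names `brute_force_sat`, `smt_loop`, and `theory_check` are hypothetical, and the brute-force search merely stands in for a CDCL solver. In particular, the stand-in solves from scratch after each added clause, whereas the calculus continues from the conclusive state.

```python
from itertools import product


def brute_force_sat(clauses, atoms):
    """Return a model (dict atom -> bool) or None; a stand-in for a real CDCL solver."""
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        # A clause holds if at least one of its literals is true in the model.
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return model
    return None


def smt_loop(clauses, atoms, theory_check):
    """Alternate between SAT solving and theory checks.

    theory_check(model) is assumed to return None if the theory accepts the
    model, or a new clause that excludes it otherwise.
    """
    clauses = [list(c) for c in clauses]
    while True:
        model = brute_force_sat(clauses, atoms)
        if model is None:
            return None              # unsatisfiable: the loop is done
        lemma = theory_check(model)
        if lemma is None:
            return model             # the theory accepts the candidate model
        clauses.append(lemma)        # incremental step: add a clause and continue


# Hypothetical theory: atom 2 can never be true.
def reject_atom_2(model):
    return [-2] if model[2] else None


result_sat = smt_loop([[1, 2]], [1, 2], reject_atom_2)
result_unsat = smt_loop([[2]], [1, 2], reject_atom_2)
```

The loop terminates only if each returned clause really excludes the current candidate model, which is the obligation placed on the theory solvers.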
5 A Naive Functional Implementation of CDCL
Sections 3 and 4 presented variants of DPLL and CDCL as parameterized transition systems, formalized using locales and inductive predicates. We now present a deterministic SAT solver that implements Open image in new window , expressed as a functional program in Isabelle.
When implementing a calculus, we must make many decisions regarding the data structures and the order of rule applications. Our functional SAT solver is very naive and does not feature any optimizations beyond those already present in the Open image in new window calculus; in Sect. 6, we will refine the calculus further to capture the two-watched-literal optimization and present an imperative implementation relying on mutable data structures.
To work around this, we restrict the input by introducing a subset type that contains a strong enough structural invariant, including the duplicate-freedom of all the lists in the data structure. With the invariant in place, it is easy to show that the call graph is included in the Open image in new window calculus, allowing us to reuse its termination argument. The partial correctness theorem can then be lifted, meaning that the SAT solver is a decision procedure for propositional logic.
The final step is to extract running code. Using Isabelle’s code generator [23], we can translate the program to Haskell, OCaml, Scala, or Standard ML. The resulting program is syntactically analogous to the source program in Isabelle, including its dependencies, and uses the target language’s facilities for datatypes and recursive functions with pattern matching. Invariants on subset types are ignored; when invoking the solver from outside Isabelle, the caller is responsible for ensuring that the input satisfies the invariant. The entire program is about 520 lines long in Standard ML. It is not efficient, due to its extensive reliance on lists, but it satisfies the need for a proof of concept.
6 An Imperative Implementation of CDCL
As an impure functional language, Standard ML provides assignment and mutable arrays. We use these features to derive an imperative SAT solver that is much more efficient than the functional implementation. We start by integrating the two-watched-literal optimization into Open image in new window . Then we refine the calculus to apply rules deterministically, and we generate code that uses arrays to represent clauses and clause sets.
The resulting SAT solver is orders of magnitude faster than the naive functional implementation described in the previous section. However, it is one to two orders of magnitude slower than DPT 2.0 [21], the fastest imperative OCaml solver we know of, because it does not implement restarts or any sophisticated heuristics for learned clause minimization. We expect that many missing heuristics will be straightforward to implement. Due to inefficient memory handling, our solver is not competitive with state-of-the-art solvers.
6.1 The Two-Watched-Literal Scheme

(\(\alpha \)) A watched literal may be false only if all the unwatched literals are false.
 1. If some of the unwatched literals are not false, we restore the invariant by updating the clause: We start watching one of the nonfalse unwatched literals instead of \(L\).
 2. Otherwise, we consider the clause’s other watched literal:
  2.1. If it is not set, we can propagate it.
  2.2. If it is false, we have found a conflict.
  2.3. If it is true, there is nothing to do.
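The case distinction above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the formalized procedure: literals are nonzero integers, a clause is a list whose first two positions hold the watched literals, `true_lits` is the set of literals currently true on the trail, and the names `value` and `update_clause` are hypothetical.

```python
def value(lit, true_lits):
    """Return True, False, or None (not set) for a literal under the current trail."""
    if lit in true_lits:
        return True
    if -lit in true_lits:
        return False
    return None


def update_clause(clause, false_lit, true_lits):
    """Visit a clause whose watched literal false_lit has just become false.

    Assumes len(clause) >= 2 and that false_lit is at position 0 or 1.
    Returns a pair (action, literal).
    """
    i = 0 if clause[0] == false_lit else 1
    other = clause[1 - i]
    # Step 1: look for an unwatched literal that is not false and watch it instead.
    for j in range(2, len(clause)):
        if value(clause[j], true_lits) is not False:
            clause[i], clause[j] = clause[j], clause[i]
            return ("rewatch", clause[i])
    # Step 2: all unwatched literals are false; inspect the other watched literal.
    other_value = value(other, true_lits)
    if other_value is None:
        return ("propagate", other)   # step 2.1
    if other_value is False:
        return ("conflict", None)     # step 2.2
    return ("nothing", None)          # step 2.3


c1 = [-1, 2, 3]
r1 = update_clause(c1, -1, {1})
r2 = update_clause([-1, 2], -1, {1})
r3 = update_clause([-1, 2, 3], -1, {1, -2, -3})
r4 = update_clause([-1, 2, 3], -1, {1, 2, -3})
```

Swapping the new watched literal into one of the first two positions keeps the convention that watched literals come first, so no separate bookkeeping is needed inside the clause.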

(\(\beta \)) A watched literal may be false only if the other watched literal is true or all the unwatched literals are false.
 1. We start with an empty trail and an arbitrary choice of watched literals (Fig. 3a).
 2. We decide to make A true. The trail becomes \(A^\dag \). In clauses 2 and 3, we exchange \(\lnot \,A\) with another literal to restore the invariant (Fig. 3b).
 3. We propagate B from clause 4. The trail becomes Open image in new window . In clause 1, we exchange \(\lnot \,B\) with A to restore the invariant (Fig. 3c).
 4. From clauses 2 and 3, we find out that we can propagate \(\lnot \,C\) and C. We choose C. The trail becomes Open image in new window . Clause 2 is in conflict. The decision made in step 2 was wrong, so we backtrack.
 1. We start with an empty trail and the same watched literals as before (Fig. 3a).
 2. We decide to make A true. The trail becomes \(A^\dag \).
 3. We propagate B from clause 4. The trail becomes Open image in new window .
 4. We propagate C from clause 3. The trail becomes Open image in new window . Clause 2 is in conflict. The decision made in step 2 was wrong, so we backtrack.

(\(\gamma \)) If there are no pending updates for the clause and no conflict is being processed, invariant (\(\beta \)) holds.
6.2 The CDCL Calculus with Watched Literals

M is the trail;

N is the initial nonunit clause set in 2WL format;

U is the learned nonunit clause set in 2WL format;

D is a conflict clause or \(\top \);

Open image in new window is the initial unit clause set;

Open image in new window is the learned unit clause set;

Open image in new window is a multiset of literal–clause pairs (L, C) indicating that clause C must be updated with respect to literal L;

Q is a set of literals for which further updates are pending.

Open image in new window Open image in new window
if Open image in new window , \(L'\) is not set in M, and Open image in new window

Open image in new window Open image in new window
if Open image in new window , \(L' \in M\), and Open image in new window

Open image in new window Open image in new window
if Open image in new window , and \(N'\) and \(U'\) are obtained from N and U by replacing Open image in new window with Open image in new window

Open image in new window Open image in new window
if Open image in new window and \(L' \in M\)

Open image in new window Open image in new window
if L is not defined in M and appears in N

Open image in new window Open image in new window
if \(D \ne \bot \) and L satisfies the conditions on Open image in new window

Open image in new window Open image in new window
Open image in new window if L satisfies the conditions on Open image in new window
Theorem 12
(Invariant [20, cdcl_twl_stgy_twl_struct_invs]) If state Open image in new window satisfies invariant (\(\gamma \)) and Open image in new window , then T satisfies invariant (\(\gamma \)).
Open image in new window refines Open image in new window in the following sense:
Theorem 13
(Refinement [20, full_cdcl_twl_stgy_cdcl\(_W\)_stgy]) Let Open image in new window be a state that satisfies invariant (\(\gamma \)). If Open image in new window , then Open image in new window
Open image in new window refines Open image in new window ’s end-to-end behavior and produces final states that are also final states for Open image in new window . We can apply Theorem 9 to establish partial correctness.
6.3 Derivation of an Executable List-Based Program
The next step is to refine the calculus with watched literals to an executable program. The state is a tuple Open image in new window , where Open image in new window is a list (instead of a set) of clauses containing first n initial nonunit clauses followed by the learned nonunit clauses, where clauses are represented as lists of literals starting with the watched ones; M uses indices in Open image in new window to represent clause annotations; and Open image in new window uses indices in Open image in new window to represent clauses. The D, Open image in new window , Open image in new window , and Q components are as before.
The main loop is called Open image in new window . Although it imposes an order on rule applications, it is not fully deterministic—for example, it does not specify which literal to choose in Open image in new window . The following theorem connects it to the Open image in new window calculus:
Theorem 14
The state returned by the program is final for Open image in new window , which means by Theorem 13 that it is also final for Open image in new window . We conclude that the program is a partially correct implementation of Open image in new window . In addition, since the specification always specifies a non Open image in new window result, the program always terminates normally.
In a further refinement step not presented here, we extend the state with watch lists that map each literal to the clauses in which it is watched, instead of recalculating them each time. The watch lists are modeled by a function Open image in new window such that Open image in new window and are updated when required.
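As a rough illustration of what such watch lists contain, the following Python sketch computes them from scratch for a list of clauses. It assumes, as in the list-based representation above, that each clause is a list of integer literals starting with its two watched literals and that clauses are identified by their index; the name `build_watch_lists` is hypothetical.

```python
def build_watch_lists(clauses):
    """Map each literal to the indices of the clauses in which it is watched.

    Assumes the watched literals of a clause are its first two elements.
    """
    watches = {}
    for index, clause in enumerate(clauses):
        for lit in clause[:2]:
            watches.setdefault(lit, []).append(index)
    return watches


example_watches = build_watch_lists([[1, -2, 3], [-2, 4], [3, 1, -2]])
```

A solver would maintain this map incrementally whenever a clause changes one of its watched literals, rather than rebuilding it as done here.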
6.4 Generation of Imperative Code
To be complete in a practical sense, an executable SAT solver must first initialize the 2WL data structure, run the Open image in new window calculus, and return “satisfiable” (with a model) or “unsatisfiable,” depending on whether a conflict has been found. The initialization step is necessary not only to run the program on actual problems but also to ensure that it is possible to create a 2WL state that satisfies invariant (\(\gamma \)) for any input.
 1. If C is a unit clause L:
  1.1. Add L to the state’s Open image in new window component.
  1.2. If \(\lnot \,L\) is in the trail, set the state’s D component to L and stop the procedure.
  1.3. Otherwise, add L to the state’s M and Q components, unless this has already been done.
 2. Otherwise, add C to Open image in new window . Its first two literals are watched.
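The initialization procedure can be sketched in Python as follows. The sketch rests on several assumptions: literals are nonzero integers, clauses are nonempty lists, the state is a plain dictionary whose keys `M`, `N`, `NP`, `D`, and `Q` loosely mirror the components named in the text (`NP` is a hypothetical name for the initial unit clause component), and step 1.2 is read as testing whether the negation of L is already on the trail.

```python
def init_state(clauses):
    """Build an initial state from a list of nonempty clauses (lists of int literals)."""
    state = {"M": [], "N": [], "NP": [], "D": None, "Q": []}
    for clause in clauses:
        if len(clause) == 1:
            lit = clause[0]
            state["NP"].append([lit])          # step 1.1: record the unit clause
            if -lit in state["M"]:
                state["D"] = [lit]             # step 1.2: conflict, stop
                break
            if lit not in state["M"]:
                state["M"].append(lit)         # step 1.3: put L on the trail
                state["Q"].append(lit)         # ... and schedule its updates
        else:
            # Step 2: a nonunit clause; its first two literals are the watched ones.
            state["N"].append(list(clause))
    return state


s = init_state([[1], [2, 3], [-1], [4, 5]])
```

In this example the unit clause `[-1]` contradicts the earlier unit clause `[1]`, so the procedure stops with a conflict before the last clause is added.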
Before we can generate imperative code, we must first eliminate the remaining nondeterminism, notably the choice of literal in Open image in new window . We implement the variable-move-to-front heuristic [5]. During initialization, we create a list containing all the literals. This list is used to initialize the doubly linked list needed by the heuristic. We also extract the maximal atom in the list to allocate the list of the polarity-checking optimization (Sect. 6.5) with the correct length.
Second, we must specify the data structures to use in the generated code. Lists of clauses are refined to resizable arrays of nonresizable arrays. The dynamic aspect is required for adding learned clauses. Within a clause, only the order of literals needs to change. We had to formalize the data structure ourselves; for technical reasons, the resizable arrays from the Imperative Collection Framework [29, 31] cannot contain arrays. We were able to reuse some of the theorems proved on the separation logic level.
We used Sepref to refine the code of the SAT solver, including initialization. We restrict the type of the atoms Open image in new window to natural numbers Open image in new window . In our first version, we also used (unbounded) natural numbers to represent literals in the generated code: The literals Open image in new window and Open image in new window are encoded by the numbers \(2\cdot i\) and \(2\cdot i +1\), respectively. However, the extraction of an atom from a literal (the integer division by 2) was inefficient in Standard ML. Therefore, we changed our representation to 32-bit unsigned integers (so only \(2^{31}\) atoms are allowed). The extraction of atoms now becomes a bit shift.
The end-to-end refinement theorem, relating a semantic satisfiability check on the input problem ( Open image in new window that returns Open image in new window if unsatisfiable) to the Imperative HOL heap code ( Open image in new window ), is stated below, where the Open image in new window relation refines a multiset of multisets of literals to a list of lists of 32-bit unsigned integers, and the Open image in new window relation refines the model that is returned as a list of literals.
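The encoding of literals as numbers can be illustrated by a short Python sketch. It assumes that the positive literal of atom i is mapped to \(2\cdot i\) and the negative one to \(2\cdot i +1\); the function names are hypothetical, and the explicit range check only mimics the \(2^{31}\) bound, since Python integers are unbounded.

```python
MAX_ATOMS = 2 ** 31  # bound imposed by a 32-bit unsigned representation


def encode(atom, positive):
    """Encode a literal as 2*atom (positive) or 2*atom + 1 (negative)."""
    if not 0 <= atom < MAX_ATOMS:
        raise ValueError("atom does not fit into the 32-bit literal encoding")
    return 2 * atom + (0 if positive else 1)


def atom_of(code):
    """Extract the atom with a right shift instead of an integer division by 2."""
    return code >> 1


def is_positive(code):
    """The lowest bit carries the sign of the literal."""
    return code & 1 == 0


def negate(code):
    """Flipping the lowest bit switches between a literal and its negation."""
    return code ^ 1


pos3 = encode(3, True)
neg3 = encode(3, False)
```

With this layout, both atom extraction and negation are single bit operations, which is what makes the fixed-width representation attractive.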
Theorem 15
(End-to-End Correctness [20, Open image in new window ])
6.5 Fast Polarity Checking
The imperative code described in the previous subsection suffers from a crippling inefficiency: The solver often needs to compute the polarity of a literal, and currently this is achieved by traversing the trail M, which may be very large. In practice, solvers employ a map from atoms to their current polarity.
Using stepwise refinement, we integrate this optimization into the imperative data structure used for the trail. This refinement step is isolated from the rest of the development, which only relies on its final result: a more efficient implementation of the trail and its operations. As Lammich observed elsewhere [32], this kind of modularity is invaluable when designing complex data structures.
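The idea behind the optimization can be sketched in Python as follows. This is only an illustration under stated assumptions, not the refined Isabelle data structure: literals are nonzero integers, atoms range from 1 to a known maximal atom, and the class name `Trail` and its methods are hypothetical. The trail keeps its list of literals and, alongside it, an array indexed by atom that records the current polarity.

```python
class Trail:
    """A trail paired with an atom-indexed polarity array for constant-time lookups."""

    def __init__(self, max_atom):
        self.lits = []
        # polarity[a] is True, False, or None (not set) for atom a.
        self.polarity = [None] * (max_atom + 1)

    def push(self, lit):
        """Add a literal; the caller guarantees its atom is unset and within bounds."""
        self.lits.append(lit)
        self.polarity[abs(lit)] = lit > 0

    def pop(self):
        """Remove the most recent literal and unset its atom."""
        lit = self.lits.pop()
        self.polarity[abs(lit)] = None
        return lit

    def value(self, lit):
        """Polarity of a literal via one array access."""
        p = self.polarity[abs(lit)]
        return None if p is None else p == (lit > 0)

    def value_by_traversal(self, lit):
        """Reference implementation that scans the whole trail."""
        if lit in self.lits:
            return True
        if -lit in self.lits:
            return False
        return None


trail = Trail(max_atom=4)
trail.push(1)
trail.push(-3)
```

Both lookup functions agree on every literal, but `value` does not depend on the length of the trail.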

The relation Open image in new window refines a literal with natural number atoms by a literal encoded as a 32-bit unsigned integer.

The relation Open image in new window refines the trail data structure to use an array of polarities (instead of a list) and annotated literals of type Open image in new window , using the 32-bit representation of literals. The clause indices of type Open image in new window remain unbounded unsigned integers.
7 Discussion and Related Work
Our formalization of the DPLL and CDCL calculi consists of about 28 000 lines of Isabelle text. The work was done over a period of 10 months almost entirely by Fleury, who also taught himself Isabelle during that time. It covers nearly all of the metatheoretical material of Sections 2.6 to 2.11 of Open image in new window and Section 2 of Nieuwenhuis et al., including normal form transformations and ground unordered resolution [19]. The refinement to an imperative program is about 20 000 lines long and took about 6 months to perform.
It is difficult to quantify the cost of formalization as opposed to paper proofs. For a sketchy argument, formalization may take an arbitrarily long time; indeed, Weidenbach’s eightline proof of Theorem 10 initially took 700 lines of Isabelle. In contrast, given a very detailed paper proof, one can sometimes obtain a formalization in less time than it took to write the paper proof [60]. A frequent hurdle to formalization is the lack of suitable libraries. We spent considerable time adding definitions, lemmas, and automation hints to Isabelle’s multiset library, and the refinement to resizable arrays of arrays required an elaborate setup, but otherwise we did not need any special libraries. We also found that organizing the proof at a high level, especially locale engineering, is more challenging, and perhaps even more time consuming, than discharging proof obligations.
One of our initial motivations for using locales, besides the ease with which it lets us express relationships between calculi, was that it allows abstracting over the concrete representation of the state. However, we discovered that this is often too restrictive, because some data structures need sophisticated invariants, which we must establish at the abstract level. We found ourselves having to modify the base locale each time we attempted to refine the data structure, an extremely tedious endeavor.
In contrast, the Refinement Framework, with its focus on functions, allows us to exploit local assumptions. Consider the Open image in new window function (Sect. 3.2), which adds a literal to the trail. Whenever the function is called, the literal is not already set and appears in the clauses. The polarity-checking optimization (Sect. 6.5) relies on the latter property to avoid checking bounds when updating the atom-to-polarity map. With the Refinement Framework, there are enough assumptions in the context to establish the property. With a locale, we would have to restrict the specification of Open image in new window to handle only those cases where the literal is in the set of clauses, leading to changes in the locale definition itself and to all its uses, well beyond the polarity-checking code.
While refining to the heap monad, we discovered several issues with our program. We had forgotten several assertions (especially array bound checks) and sometimes mixed up the Open image in new window and Open image in new window annotations, resulting in large, hard-to-interpret proof obligations. Sepref is a very useful tool, but it provides few safeguards or hints when something goes wrong. Moreover, the Isabelle/jEdit user interface can be unbearably slow at displaying large proof obligations.
Given the varied level of formality of the proofs in the draft of Open image in new window , it is unlikely that Fleury will ever catch up with Weidenbach. But the insights arising from formalization have already enriched the textbook in many ways. For the calculi described in this paper, the main issues were that fundamental invariants were omitted and some proofs may have been too sketchy to be accessible to the book’s intended audience. We also found a major mistake in an extension of CDCL using the branch-and-bound principle: Given a weight function, the calculus aims at finding a model of minimal weight. In the course of formalization, Fleury came up with a counterexample that invalidates the main correctness theorem, whose proof confused partial and total models.
For discharging proof obligations, we relied heavily on Sledgehammer, including its facility for generating detailed Isar proofs [10] and the SMT-based smt tactic [13]. We found the SMT solver CVC4 particularly useful, corroborating earlier empirical evaluations [50]. In contrast, the counterexample generators Nitpick and Quickcheck [8] were seldom useful. We often discovered flawed conjectures by observing Sledgehammer fail to solve an easy-looking problem. As one example among many, we lost perhaps one hour working from the hypothesis that converting a set to a multiset and back is the identity. Because Isabelle’s multisets are finite, the property does not hold for infinite sets A; yet Nitpick and Quickcheck fail to find a counterexample, because they try only finite values for A (and Quickcheck cannot cope with underspecification anyway).
At the calculus level, we followed Nieuwenhuis et al. (Sect. 3) and Weidenbach (Sect. 4), but other accounts exist. In particular, Krstić and Goel [28] present a calculus that lies between Open image in new window and Open image in new window on a scale from abstract to concrete. Unlike Nieuwenhuis et al., they have a concrete Open image in new window rule. On the other hand, whereas Weidenbach only allows resolving the conflict ( Open image in new window ) with the clause that was used to propagate a literal, Krstić and Goel allow any clause that could have caused the propagation (rule Open image in new window ). Another difference is that their Open image in new window and Open image in new window rules must explicitly check that no clause is learned twice (cf. Theorem 10).
Formalizing metatheoretical results about logic in a proof assistant is an enticing, if somewhat self-referential, prospect. Shankar’s proof of Gödel’s first incompleteness theorem [52], Harrison’s formalization of basic first-order model theory [24], and Margetson and Ridge’s formalized completeness and cut elimination theorems [36] are some of the landmark results in this area. Recently, SAT solvers have been formalized in proof assistants. Marić [37, 38] verified a CDCL-based SAT solver in Isabelle/HOL, including two watched literals, as a purely functional program. The solver is monolithic, which complicates extensions. In addition, he formalized the abstract CDCL calculus by Nieuwenhuis et al. and, together with Janičić [37, 39], the more concrete calculus by Krstić and Goel [28]. Marić’s methodology is quite different from ours, without the use of refinements, inductive predicates, locales, or even Sledgehammer.
In his Ph.D. thesis, Lescuyer [34] presents the formalization of the CDCL calculus and the core of an SMT solver in Coq. He also developed a reflexive DPLL-based SAT solver for Coq, which can be used as a tactic in the proof assistant. Another formalization of a CDCL-based SAT solver, including termination but excluding two watched literals, is by Shankar and Vaucher in PVS [53]. Most of this work was done by Vaucher during a two-month internship, an impressive achievement. Finally, Oe et al. [47] verified an imperative and fairly efficient CDCL-based SAT solver, expressed using the Guru language for verified programming. Optimized data structures are used, including for two watched literals and conflict analysis. However, termination is not guaranteed, and model soundness is achieved through a runtime check and not proved.
8 Conclusion
The advantages of computer-checked metatheory are well known from programming language research, where papers are often accompanied by formalizations and proof assistants are used in the classroom [44, 49]. This article, like its predecessors and relatives [9, 12, 51], reported on some steps we have taken to apply these methods to automated reasoning. Compared with other application areas of proof assistants, the proof obligations are manageable, and little background theory is required.
We presented a formal framework for DPLL and CDCL in Isabelle/HOL, covering the ground between an abstract calculus and a verified imperative SAT solver. Our framework paves the way for further formalization of metatheoretical results. We intend to keep following Open image in new window , including its generalization of ordered ground resolution with CDCL, culminating with a formalization of the full superposition calculus and extensions. Thereby, we aim at demonstrating that interactive theorem proving is mature enough to be of use to practitioners in automated reasoning, and we hope to help them by developing the necessary libraries and methodology.
The CDCL algorithm, and its implementation in highly efficient SAT solvers, is one of the jewels of computer science. To quote Knuth [26, p. iv], “The story of satisfiability is the tale of a triumph of software engineering blended with rich doses of beautiful mathematics.” What fascinates us about CDCL is not only how or how well it works, but also why it works so well. Knuth’s remark is accurate, but it is not the whole story.
Acknowledgements
Open access funding was provided by the Max Planck Society. Stephan Merz made this work possible in the first place. Dmitriy Traytel remotely co-supervised Fleury’s M.Sc. thesis and provided copious advice on using Isabelle. Andrei Popescu gave us his permission to reuse, in a slightly adapted form, the succinct description of locales he co-wrote on a different occasion [9]. Simon Cruanes, Anders Schlichtkrull, Mark Summerfield, Dmitriy Traytel, and the reviewers suggested many textual improvements. The work has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 713999, Matryoshka).
References
 1.Bachmair, L., Ganzinger, H.: Resolution theorem proving. In: Robinson, A., Voronkov, A. (eds.) Handbook of Automated Reasoning, vol. I, pp. 19–99. Elsevier, Amsterdam (2001)
 2.Ballarin, C.: Locales: a module system for mathematical theories. J. Autom. Reason. 52(2), 123–153 (2014)
 3.Bayardo Jr., R.J., Schrag, R.: Using CSP lookback techniques to solve exceptionally hard SAT instances. In: Freuder, E.C. (ed.) CP96. LNCS, vol. 1118, pp. 46–60. Springer, Berlin (1996)
 4.Becker, H., Blanchette, J.C., Fleury, M., From, A.H., Jensen, A.B., Lammich, P., Larsen, J.B., Michaelis, J., Nipkow, T., Popescu, A., Schlichtkrull, A., Tourret, S., Traytel, D., Villadsen, J.: IsaFoL: Isabelle Formalization of Logic. https://bitbucket.org/isafol/isafol/. Accessed 13 Feb 2018
 5.Biere, A., Fröhlich, A.: Evaluating CDCL variable scoring schemes. In: Heule, M., Weaver, S. (eds.) SAT 2015. LNCS, vol. 5584, pp. 237–243. Springer, Berlin (2015)
 6.Biere, A., Heule, M., van Maaren, H., Walsh, T. (eds.): Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications, vol. 185. IOS Press, Amsterdam (2009)
 7.Blanchette, J.C., Böhme, S., Paulson, L.C.: Extending Sledgehammer with SMT solvers. J. Autom. Reason. 51(1), 109–128 (2013)
 8.Blanchette, J.C., Bulwahn, L., Nipkow, T.: Automatic proof and disproof in Isabelle/HOL. In: Tinelli, C., Sofronie-Stokkermans, V. (eds.) FroCoS 2011. LNCS, vol. 6989, pp. 12–27. Springer, Berlin (2011)
 9.Blanchette, J.C., Popescu, A.: Mechanizing the metatheory of Sledgehammer. In: Fontaine, P., Ringeissen, C., Schmidt, R.A. (eds.) FroCoS 2013. LNCS, vol. 8152, pp. 245–260. Springer, Berlin (2013)
 10.Blanchette, J.C., Böhme, S., Fleury, M., Smolka, S.J., Steckermeier, A.: Semi-intelligible Isar proofs from machine-generated proofs. J. Autom. Reason. 56(2), 155–200 (2016)
 11.Blanchette, J.C., Fleury, M., Weidenbach, C.: A verified SAT solver framework with learn, forget, restart, and incrementality. In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS, vol. 9706, pp. 25–44. Springer, Berlin (2016)
 12.Blanchette, J.C., Popescu, A., Traytel, D.: Soundness and completeness proofs by coinductive methods. J. Autom. Reason. 58(1), 149–179 (2017)
 13.Böhme, S., Weber, T.: Fast LCF-style proof reconstruction for Z3. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 179–194. Springer, Berlin (2010)
 14.Bulwahn, L., Krauss, A., Haftmann, F., Erkök, L., Matthews, J.: Imperative functional programming with Isabelle/HOL. In: Mohamed, O.A., Muñoz, C.A., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 134–149. Springer, Berlin (2008)
 15.Church, A.: A formulation of the simple theory of types. J. Symb. Log. 5(2), 56–68 (1940)
 16.Cruz-Filipe, L., Heule, M.J.H., Hunt Jr., W.A., Kaufmann, M., Schneider-Kamp, P.: Efficient certified RAT verification. In: de Moura, L. (ed.) CADE-26. LNCS, vol. 10395, pp. 220–236. Springer, Berlin (2017)
 17.Davis, M., Logemann, G., Loveland, D.W.: A machine program for theorem-proving. Commun. ACM 5(7), 394–397 (1962)
 18.Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) SAT 2003. LNCS, vol. 2919, pp. 502–518. Springer, Berlin (2003)
 19.Fleury, M.: Formalisation of Ground Inference Systems in a Proof Assistant. M.Sc. thesis, École normale supérieure de Rennes (2015). https://www.mpiinf.mpg.de/fileadmin/inf/rg1/Documents/fleury_master_thesis.pdf. Accessed 13 Feb 2018
 20.Fleury, M., Blanchette, J.C.: Formalization of Weidenbach’s Automated Reasoning—The Art of Generic Problem Solving (2017). https://bitbucket.org/isafol/isafol/src/master/Weidenbach_Book/README.md, Formal proof development. Accessed 13 Feb 2018
 21.Goel, A., Grundy, J.: Decision Procedure Toolkit. http://dpt.sourceforge.net/. Accessed 13 Feb 2018
 22.Gordon, M.J.C., Milner, R., Wadsworth, C.P.: Edinburgh LCF: A Mechanised Logic of Computation, LNCS, vol. 78. Springer, Berlin (1979)
 23.Haftmann, F., Nipkow, T.: Code generation via higher-order rewrite systems. In: Blume, M., Kobayashi, N., Vidal, G. (eds.) FLOPS 2010. LNCS, vol. 6009, pp. 103–117. Springer, Berlin (2010)
 24.Harrison, J.: Formalizing basic first order model theory. In: Grundy, J., Newey, M. (eds.) TPHOLs ’98. LNCS, vol. 1479, pp. 153–170. Springer, Berlin (1998)
 25.Kammüller, F., Wenzel, M., Paulson, L.C.: Locales - a sectioning concept for Isabelle. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Théry, L. (eds.) TPHOLs ’99. LNCS, vol. 1690, pp. 149–166. Springer, Berlin (1999)
 26.Knuth, D.E.: The Art of Computer Programming, vol. 4, Fascicle 6: Satisfiability. Addison-Wesley, Boston (2015)
 27.Krauss, A.: Partial recursive functions in higher-order logic. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006. LNCS, vol. 4130, pp. 589–603. Springer, Berlin (2006)
 28.Krstić, S., Goel, A.: Architecting solvers for SAT modulo theories: Nelson-Oppen with DPLL. In: Konev, B., Wolter, F. (eds.) FroCoS 2007. LNCS, vol. 4720, pp. 1–27. Springer, Berlin (2007)
 29.Lammich, P.: The Imperative Refinement Framework. Archive of Formal Proofs 2016. http://isaafp.org/entries/Refine_Imperative_HOL.shtml, Formal proof development. Accessed 13 Feb 2018
 30.Lammich, P.: Automatic data refinement. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 84–99. Springer, Berlin (2013)
 31.Lammich, P.: Refinement to imperative/HOL. In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 253–269. Springer, Berlin (2015)
 32.Lammich, P.: Refinement based verification of imperative data structures. In: Avigad, J., Chlipala, A. (eds.) CPP 2016, pp. 27–36. ACM, New York (2016)
 33.Lammich, P.: Efficient verified (UN)SAT certificate checking. In: de Moura, L. (ed.) CADE-26. LNCS, vol. 10395, pp. 237–254. Springer, Berlin (2017)
 34.Lescuyer, S.: Formalizing and implementing a reflexive tactic for automated deduction in Coq. Ph.D. thesis, Université Paris-Sud (2011)
 35.Luby, M., Sinclair, A., Zuckerman, D.: Optimal speedup of Las Vegas algorithms. Inf. Process. Lett. 47(4), 173–180 (1993)
 36.Margetson, J., Ridge, T.: Completeness theorem. Archive of Formal Proofs 2004. http://isaafp.org/entries/Completeness.shtml, Formal proof development. Accessed 13 Feb 2018
 37.Marić, F.: Formal verification of modern SAT solvers. Archive of Formal Proofs 2008. http://isaafp.org/entries/SATSolverVerification.shtml, Formal proof development. Accessed 13 Feb 2018
 38.Marić, F.: Formal verification of a modern SAT solver by shallow embedding into Isabelle/HOL. Theor. Comput. Sci. 411(50), 4333–4356 (2010)
 39.Marić, F., Janičić, P.: Formalization of abstract state transition systems for SAT. Log. Methods Comput. Sci. 7(3) (2011). https://doi.org/10.2168/LMCS7(3:19)2011
 40.MarquesSilva, J.P., Sakallah, K.A.: GRASP–a new search algorithm for satisfiability. In: ICCAD ’96, pp. 220–227. IEEE Computer Society Press, Silver Spring (1996)Google Scholar
 41.Matuszewski, R., Rudnicki, P.: Mizar: the first 30 years. Mech. Math. Appl. 4(1), 3–24 (2005)Google Scholar
 42.Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: Engineering an efficient SAT solver. In: DAC 2001, pp. 530–535. ACM, New York (2001)Google Scholar
 43.Nieuwenhuis, R., Oliveras, A., Tinelli, C.: Solving SAT and SAT modulo theories: from an abstract Davis–Putnam–Logemann–Loveland procedure to DPLL(T). J. ACM 53(6), 937–977 (2006)MathSciNetCrossRefzbMATHGoogle Scholar
 44.Nipkow, T.: Teaching semantics with a proof assistant: no more LSD trip proofs. In: Kuncak, V., Rybalchenko, A. (eds.) VMCAI 2012. LNCS, vol. 7148, pp. 24–38. Springer, Berlin (2012)Google Scholar
 45.Nipkow, T., Klein, G.: Concrete Semantics: With Isabelle/HOL. Springer, Berlin (2014)CrossRefzbMATHGoogle Scholar
 46.Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL: A Proof Assistant for HigherOrder Logic, LNCS, vol. 2283. Springer, Berlin (2002)zbMATHGoogle Scholar
 47.Oe, D., Stump, A., Oliver, C., Clancy, K.: versat: a verified modern SAT solver. In: Kuncak, V., Rybalchenko, A. (eds.) VMCAI 2012, LNCS, vol. 7148, pp. 363–378. Springer, Berlin (2012)Google Scholar
 48. Paulson, L.C., Blanchette, J.C.: Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers. In: Sutcliffe, G., Schulz, S., Ternovska, E. (eds.) IWIL-2010. EPiC, vol. 2, pp. 1–11. EasyChair (2012)
 49. Pierce, B.C.: Lambda, the ultimate TA: using a proof assistant to teach programming language foundations. In: Hutton, G., Tolmach, A.P. (eds.) ICFP 2009, pp. 121–122. ACM, New York (2009)
 50. Reynolds, A., Tinelli, C., de Moura, L.: Finding conflicting instances of quantified formulas in SMT. In: Claessen, K., Kuncak, V. (eds.) FMCAD 2014, pp. 195–202. IEEE Computer Society Press, Silver Spring (2014)
 51. Schlichtkrull, A.: Formalization of the resolution calculus for first-order logic. In: Blanchette, J.C., Merz, S. (eds.) ITP 2016. LNCS, vol. 9807, pp. 341–357. Springer, Berlin (2016)
 52. Shankar, N.: Metamathematics, Machines, and Gödel’s Proof, Cambridge Tracts in Theoretical Computer Science, vol. 38. Cambridge University Press, Cambridge (1994)
 53. Shankar, N., Vaucher, M.: The mechanical verification of a DPLL-based satisfiability solver. Electr. Notes Theor. Comput. Sci. 269, 3–17 (2011)
 54. Sörensson, N., Biere, A.: Minimizing learned clauses. In: Kullmann, O. (ed.) SAT 2009. LNCS, vol. 9340, pp. 237–243. Springer, Berlin (2009)
 55. Sternagel, C., Thiemann, R.: An Isabelle/HOL formalization of rewriting for certified termination analysis. http://cl-informatik.uibk.ac.at/software/ceta/. Accessed 13 Feb 2018
 56. Voronkov, A.: AVATAR: the architecture for first-order theorem provers. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 696–710. Springer, Berlin (2014)
 57. Weidenbach, C.: Automated reasoning building blocks. In: Meyer, R., Platzer, A., Wehrheim, H. (eds.) Correct System Design: Symposium in Honor of Ernst-Rüdiger Olderog on the Occasion of His 60th Birthday. LNCS, vol. 9360, pp. 172–188. Springer, Berlin (2015)
 58. Wenzel, M.: Isabelle/Isar—a generic framework for human-readable proof documents. In: Matuszewski, R., Zalewska, A. (eds.) From Insight to Proof: Festschrift in Honour of Andrzej Trybulec, Studies in Logic, Grammar, and Rhetoric, vol. 10(23). University of Białystok (2007)
 59. Wirth, N.: Program development by stepwise refinement. Commun. ACM 14(4), 221 (1971)
 60. Woodcock, J., Banach, R.: The verification grand challenge. J. Univers. Comput. Sci. 13(5), 661–668 (2007)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.