A Flexible Proof Format for SAT Solver-Elaborator Communication

We introduce FRAT, a new proof format for unsatisfiable SAT problems, and its associated toolchain. Compared to DRAT, the FRAT format allows solvers to include more information in proofs to reduce the computational cost of subsequent elaboration to LRAT. The format is easy to parse forward and backward, and it is extensible to future proof methods. The provision of optional proof steps allows SAT solver developers to balance implementation effort against elaboration time, with little to no overhead on solver time. We benchmark our FRAT toolchain against a comparable DRAT toolchain and confirm >84% median reduction in elaboration time and >94% median decrease in peak memory usage.


Introduction
The Boolean satisfiability problem is the problem of determining, for a given Boolean formula consisting of Boolean variables and connectives, whether there exists a variable assignment under which the formula evaluates to true. Boolean satisfiability (SAT) is interesting in part because there are surprisingly diverse types of problems that can be encoded as Boolean formulas and solved efficiently by checking their satisfiability. SAT solvers, programs that automatically solve SAT problems, have been successfully applied to a wide range of areas, including hardware verification [BCC + 99], planning [KS96], and combinatorics [HKM16].
The performance of SAT solvers has taken great strides in recent years, and modern solvers can often solve problems involving millions of variables and clauses, which would have been unthinkable a mere 20 years ago [Knu15]. But this improvement comes at the cost of a significant increase in the code complexity of SAT solvers, which makes it difficult either to take their correctness on faith or to certify their program correctness directly. As a result, the ability of SAT solvers to produce independently verifiable certificates has become a pressing necessity. Since there is an obvious certificate format (the satisfying Boolean assignment) for satisfiable problems, the real challenge in proof-producing SAT solving lies in devising a compact proof format for unsatisfiable problems, and in developing a toolchain that efficiently produces and verifies it.
The current de facto standard proof format for unsatisfiable SAT problems is DRAT [HHW13]. The format, like its predecessor DRUP, was designed with a strong focus on quick adoption by the community, emphasizing easy proof emission, practically zero overhead, and reasonable validation speed [HHW14]. The DRAT format has been the only proof format supported in SAT Competitions and Races since 2014, as entrants lost interest in the alternatives.
DRAT is a clausal proof format [GN03], which means that a DRAT proof consists of a sequence of instructions for adding and deleting clauses. It is helpful to think of a DRAT proof as a program for modifying the 'active multiset' of clauses: the initial active multiset is the multiset of clauses of the input problem, and this multiset grows and shrinks over time as the program is executed step by step. The invariant maintained throughout program execution is that if the initial active multiset is satisfiable, then so is the active multiset at any later point in time. This invariant holds trivially in the beginning and after a deletion; it is also preserved by addition steps, whether by RUP or RAT, which we explain shortly. The last step of a DRAT proof is the addition of the empty clause, which ensures the unsatisfiability of the final active multiset, and hence that of the initial active multiset, i.e. the input problem.
Every addition step in DRAT is either a reverse unit propagation (RUP) step [GN03] or a resolution asymmetric tautology (RAT) step [JHB12]. A clause C has the property AT (asymmetric tautology) with respect to a formula F if F, ¬C ⊢₁ ⊥, which is to say, there is a proof of the empty clause by unit propagation using F and the negated literals in C. A RUP step that adds C to the active multiset F is valid if C has property AT with respect to F. A clause l ∨ C has property RAT with respect to F if for every clause ¬l ∨ D ∈ F, the clause C ∨ D has property AT with respect to F. In this case, l ∨ C need not be logically entailed by F, but F and F ∧ (l ∨ C) are equisatisfiable, and a RAT step will add l ∨ C to the active multiset if it has property RAT with respect to F. (See [HHW13] for more about the justification for this proof system.)

DRAT has a number of advantages over formats based on more traditional proof calculi, such as resolution or analytic tableaux. For SAT solvers, DRAT proofs are easy to emit because CNF clauses are the native data structures that the solvers store and manipulate internally. Whenever a solver obtains a new clause, the clause can simply be streamed out to a proof file without any further modification. DRAT proofs are also more compact than resolution proofs, as the latter can become infeasibly large for some classes of SAT problems [Hak85].
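The AT check described above can be sketched in a few lines. The following is a minimal illustration (not the toolchain's implementation); representing clauses as lists of DIMACS-style signed integers is an assumption of this sketch:

```python
# Minimal sketch of the AT/RUP check: assume the negations of C's literals
# as unit facts, then propagate units from F until some clause is falsified.
def is_rup(formula, clause):
    assignment = {-lit for lit in clause}   # negated literals of C
    changed = True
    while changed:
        changed = False
        for c in formula:
            if any(lit in assignment for lit in c):
                continue                    # clause already satisfied
            pending = [lit for lit in c if -lit not in assignment]
            if not pending:
                return True                 # conflict: C has property AT
            if len(pending) == 1:
                assignment.add(pending[0])  # unit propagation
                changed = True
    return False

# e.g. (x1 ∨ x2) is RUP w.r.t. {(x1 ∨ x2 ∨ ¬x3), (x3 ∨ x2), (x1 ∨ ¬x2)}:
# assuming ¬x1, ¬x2, the first clause yields ¬x3 and the second is falsified.
```

Note that this is the "blind" propagation loop: it repeatedly scans the entire formula, which is exactly the cost that LRAT-style hints avoid.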
There is, however, room for further improvement in the DRAT format, due to the information loss incurred by DRAT proofs. Consider, for instance, the SAT problem and proofs shown in Figure 1. The left column is the input problem in the DIMACS format, the center column is its DRAT proof, and the right column is the equivalent proof in the LRAT [CFHH + 17] format, which can be thought of as an enriched version of DRAT with more information. The numbers before the first zero on lines without a "d" represent literals: positive numbers denote positive literals, while negative numbers denote negative literals. The first clause of the input formula is (x 1 ∨ x 2 ∨ ¬x 3 ), or equivalently 1 2 -3 0 in DIMACS.
The first lines of both the DRAT and LRAT proofs are RUP steps for adding the clause (x 1 ∨ x 2 ), written 1 2 0. When an LRAT checker verifies this step, it is informed of the IDs of the active clauses (the trailing numbers 1 6 3) relevant for unit propagation, in the exact order they should be used. Therefore, the LRAT checker only has to visit the first, sixth, and third clauses and confirm that, starting from the negations of the new clause's literals as units, these clauses yield new unit literals on the variables x 3 and x 4 , and finally ⊥. In contrast, a DRAT checker verifying the same step must add the negated literals ¬x 1 , ¬x 2 to the active multiset (in this case, the eight initial clauses) and carry out a blind unit propagation with the whole resulting multiset until reaching contradiction. This omission of RUP information in DRAT proofs introduces significant overheads in proof verification.

[Figure 1: an example SAT problem in DIMACS format, with its DRAT and LRAT proofs.]
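A hint-guided check of this kind can be sketched as follows (a simplified illustration; the clause IDs and clauses below are made up, not those of Figure 1):

```python
# Sketch of LRAT-style hint checking: visit only the hinted clauses, in the
# given order; each must become unit under the accumulated assignment, and
# the final hint must be falsified, yielding the conflict.
def check_hinted_rup(clauses_by_id, new_clause, hints):
    assignment = {-lit for lit in new_clause}
    for hint in hints:
        c = clauses_by_id[hint]
        pending = [lit for lit in c if -lit not in assignment]
        if not pending:
            return True             # conflict reached: the step checks out
        if len(pending) != 1:
            return False            # hinted clause is not unit: reject
        assignment.add(pending[0])  # propagate the new unit literal
    return False                    # hints exhausted without a conflict

clauses = {1: [1, 2, -3], 6: [3, 4], 3: [-4, 3]}
# Adding (x1 ∨ x2) with hints 1 6 3 touches only those three clauses.
```

Compared to the blind check, the cost here is linear in the size of the hinted clauses rather than in the size of the whole active multiset.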
Although the exact figures vary from problem to problem, checking a DRAT proof typically takes approximately twice as long as solving the original problem, whereas the verification time for an LRAT proof is negligible compared to its solution time. This additional cost of checking DRAT proofs also represents a lost opportunity: when a SAT solver emits a RUP step, it knows exactly how the new clause was obtained, and this knowledge can (in theory) be turned into an LRAT-style RUP annotation, which can cut down verification costs significantly if conveyed to the verifier. For the DRAT format, a design choice was made not to include such information, since demanding explicit proofs for all steps turned out to be impractical. Although it is theoretically possible to always glean the correct RUP annotation from the solver state, computing this information can be intricate and costly for some types of inferences (e.g. conflict-clause minimization [SB09]), making it harder to support proof logging [VG09]. Reducing such overheads is particularly important for solving satisfiable formulas, as proofs are superfluous for them and the penalty for maintaining such proofs should be minimized. We should note, however, that proof elaboration need not be an all-or-nothing business; if it is infeasible to demand 100% elaborated proofs, we can still ask solvers to fill in as many gaps as is convenient for them, which would still be a considerable improvement over handling all of it on the verifier side.
Inclusion of final clauses is another potential area for improvement over the DRAT format. A DRAT proof typically includes many addition steps that do not ultimately contribute to the derivation of the empty clause. This is unavoidable in the proof emission phase, since a SAT solver cannot know in advance whether a given clause will ultimately be useful, and must stream out the clause before it can find out. All such steps, however, should be dropped in the post-processing phase in order to compress proofs and speed up verification. The most straightforward way of doing this is processing the proof in reverse order [GN03]: when processing a clause C k+1 , identify all the clauses used to derive C k+1 , mark them as 'used', and move on to clause C k . For each clause, process it if it is marked as used, and skip it otherwise. The only caveat of this method is that the postprocessor needs to know which clauses were present at the very end of the proof, since otherwise there is no way to identify which clauses were used to derive the empty clause. Although it is possible to enumerate the final clauses by a preliminary forward pass through a DRAT proof, this is clearly unnecessary work, since SAT solvers know exactly which clauses are present at the end, and it is desirable to put this information in the proof in the first place.
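The backward marking pass described here can be sketched as follows (a simplified illustration; the step representation, pairing each step ID with the IDs of the steps used to derive it, is an assumption of this sketch):

```python
# Sketch of backward trimming: walk the proof in reverse, keep only steps
# whose IDs are already known to be needed, and mark their antecedents.
def trim(steps, empty_id):
    """steps: list of (step_id, antecedent_ids) pairs, in proof order."""
    needed = {empty_id}       # start from the empty-clause step
    kept = []
    for sid, antecedents in reversed(steps):
        if sid in needed:
            kept.append(sid)
            needed.update(antecedents)
    return list(reversed(kept))

# Steps 2 and 4 never feed into the empty clause (step 5), so they are dropped:
# trim([(1, []), (2, []), (3, [1]), (4, [2]), (5, [3, 1])], 5) == [1, 3, 5]
```

The sketch takes the "steps used to derive C" as given; in a DRAT setting these antecedents must themselves be recovered by unit propagation, which is what makes elaboration expensive.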
In this paper, we introduce FRAT, a new proof format designed to address the above issues by allowing more fine-grained communication between SAT solvers and elaborators. Using a diverse range of benchmarks with nontrivial CNF instances, we also demonstrate that the FRAT format achieves its design goals with significant performance improvements.
A quick note on versions: the previous version of this paper [BCH21] was presented at TACAS 2021. The main changes from the previous version are the extension of FRAT to accommodate PR proof steps (Section 2.2) and new benchmarks for unannotated FRAT proofs (Section 5.3) and DPR proofs (Sections 5.4 and 5.5).

The FRAT format
The design of the FRAT format is largely based on DRAT, with which it shares many similarities. The main differences between FRAT and DRAT are: (1) optional annotation of RUP steps, (2) inclusion of final clauses, and (3) identification of clauses by unique IDs. We've already explained the rationale for (1) and (2); (3) is necessary for concise references to clauses in deletions and RUP step annotations. More specifically, a FRAT proof consists of the following seven types of proof steps:

o: An original step; a clause from the input file. The purpose of these lines is to name the clauses from the input with identifiers; they are not required to come in the same order as the file, they are not required to be numbered in order, and not all clauses in the input need appear here. The proof may also progress (with a and d steps) before all o steps are added.

a, l: An addition step, and an optional LRAT-style unit propagation proof of the step. The proof, if provided, is a sequence of clauses in the current formula in the order that they become unit. For solver flexibility, they are allowed to come out of order, but the elaborator is optimized for the case where they are correctly ordered. For a RAT or PR step, the negative numbers in the proof refer to the clauses in the active set that contain the negated pivot literal, followed by the unit propagation proof of the resolvent. See [CFHH + 17] for more details on the LRAT checking algorithm. For PR steps (see Section 2.2), the list of literals encodes both the clause C and the witness ω. These are subject to the constraint C[0] = ω[0], so a step has a witness iff C[0] appears again later in the clause.

d: A deletion step for deleting the clause with the given ID from the formula. The literals given must match the literals in the corresponding addition step up to permutation.

r: A relocation step. The syntax is r ids 0, where ids has the form s 0 , t 0 , . . . , s k , t k and must consist of an even number of clause IDs. It indicates that the active clause with ID s i is re-labeled and now has ID t i , for each 0 ≤ i ≤ k. (This is used for solvers that use pointer identity for clauses, but also move clauses during garbage collection.)

f: A finalization step. These steps must come at the end of a proof (after all a, d, o steps), and provide the list of all active clauses at the end of the proof. The clauses may come in any order, but every clause that has been added and not deleted must be present, and for the proof to establish unsatisfiability an f step for the empty clause must exist. (For best results, clauses should be finalized in roughly reverse order of when they were added.)

c: A text comment, not a real step. These are preserved by the toolchain so they can be used to mark or classify proof steps for debugging purposes, but they are removed in LRAT output because external verifiers generally don't have support for comment lines.

[Figure 2 (excerpt):
... 1 -2 -4 0
a 9 -3 -4 0   l 5 1 8 0
a 10 -4 0     l 9 3 2 8 0
a 11 3 0
a 12 -2 0
a 13 1 0      l 12 11 1 0
a 14 0        l 13 12 10 7 0
To illustrate that proofs are optional, the proofs of steps 11 and 12 have been omitted in this example. The steps must still be legal RAT steps, but the elaborator will derive the proof rather than the solver.]
Note that our modified version of CaDiCaL also outputs steps of the form t todo id 0, an outdated comment syntax now replaced by the more expressive c steps. The t steps are used to collect statistics on code paths that produce a steps without proofs. See Section 3 for how this information is used. Figure 2 shows the corresponding problem and proofs for the FRAT toolchain. Notice how the FRAT proof is more verbose than its DRAT counterpart and includes all the hints for addition steps, which are reused in the subsequent LRAT proof.

2.0.1. Binary FRAT. The files shown in Figure 2 are in the text version of the FRAT format, but for efficiency reasons solvers may also wish to use a binary encoding. The binary FRAT format is exactly the same in structure, but the integers are encoded using the same variable-length integer encoding used in binary DRAT [HB15]. Unsigned numbers are encoded in 7-bit little endian, with the high bit set on each byte except the last. That is, an unsigned number with 7-bit digits a 0 , . . . , a k (least significant first) is encoded as the bytes a 0 + 128, . . . , a k−1 + 128, a k . Signed numbers are encoded by mapping n ≥ 0 to f (n) := 2n and −n (with n > 0) to f (n) := 2n + 1, and then using the unsigned encoding. (Incidentally, the mapping f is not surjective, as it misses 1. But it is used by other formats so we have decided not to change it.) Comments using the c step syntax are preserved even in the binary format; they are encoded as the character c followed by a null-terminated string with the contents of the comment. For example, the text step

a 9 -3 -4 0 l 5 1 8 0

is encoded in binary as the bytes

61 09 07 09 00 6C 0A 02 10 00

2.1. Flexibility and extensibility. The purpose of the FRAT format is for solvers to be able to quickly write down what they are doing while they are doing it, with the elaborator stage "picking up the pieces" and preparing the proof for consumption by simpler mechanisms such as certified LRAT checkers. As such, it is important that we are able to concisely represent all manner of proof methods used by modern SAT solvers.
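As a sanity check of the variable-length integer encoding described above, here is a small sketch (not part of the toolchain) that reproduces the binary form of the example step, assuming, as the byte example suggests, that IDs are encoded unsigned while literals and hints are encoded signed:

```python
# Variable-length integer encoding shared with binary DRAT: unsigned numbers
# in 7-bit little-endian groups, high bit set on every byte but the last;
# signed n >= 0 maps to 2n, and -n (with n > 0) maps to 2n + 1.
def encode_unsigned(n):
    out = []
    while True:
        n, b = n >> 7, n & 0x7F
        out.append(b | 0x80 if n else b)  # set continuation bit if more bytes
        if not n:
            return bytes(out)

def encode_signed(n):
    return encode_unsigned(2 * n if n >= 0 else 2 * (-n) + 1)

# The example step "a 9 -3 -4 0  l 5 1 8 0":
step = b"a" + encode_unsigned(9) + b"".join(encode_signed(x) for x in (-3, -4, 0))
step += b"l" + b"".join(encode_signed(x) for x in (5, 1, 8, 0))
print(step.hex(" "))  # 61 09 07 09 00 6c 0a 02 10 00
```

Note also that encode_signed(-48) yields the single byte 0x61, the ASCII code of a, which is the ambiguity in binary LRAT discussed below.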
The high-level syntax of a FRAT file is quite simple: a sequence of "segments", each of which begins with a character, followed by zero or more nonzero numbers, followed by a 0. In the binary version, each segment similarly begins with a printable character, followed by zero or more nonzero bytes, followed by a zero byte. (Note that continuation bytes in an unsigned number encoding are always nonzero.) This means that it is possible to jump into a FRAT file and find segment boundaries by searching for a nearby zero byte. This is in contrast to binary LRAT, in which add steps are encoded as a id literal * 0 (± id ) * 0, because a random zero byte could either be the end of a segment or the middle of an add step. Since 0x61, the ASCII representation of a, is also a valid step ID (encoding the signed number −48), in a sequence such as (a nonzero * 0) * , the literals and the steps cannot be locally disambiguated.
The local disambiguation property is important for our FRAT elaborator, because it means that we can efficiently parse FRAT files generated by solvers backward, reading the segments in reverse order so that we can perform backward checking in a single pass. The terminating '.' in c comments is a concession to ensure the backward parsing property; without this, sequences like c (a nonzero * 0) * would require excessive lookahead (reading from the right) to determine that in fact all the add steps are commented out.
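The backward scan for segment boundaries can be sketched as follows (a minimal illustration over a raw byte string, not the elaborator's buffered implementation):

```python
# Because every binary segment ends with a zero byte and contains no other
# zero bytes, scanning backward for zeros recovers the segments in reverse.
def segments_reversed(data):
    segs, end = [], len(data)
    pos = len(data) - 1            # data[-1] is the last segment's terminator
    while pos > 0:
        pos -= 1
        if data[pos] == 0:         # terminator of the previous segment
            segs.append(data[pos + 1:end])
            end = pos + 1
    segs.append(data[:end])
    return segs

# Two binary segments, "a 9 -3 -4 0" and "l 5 1 8 0", recovered in reverse:
frat = b"a\x09\x07\x09\x00l\x0a\x02\x10\x00"
# segments_reversed(frat) == [b"l\x0a\x02\x10\x00", b"a\x09\x07\x09\x00"]
```

This is what allows the first elaboration pass to stream the proof from the end of the file without ever loading it whole.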
DRAT is based on adding clauses that are RAT with respect to the active formula. It is quite versatile and sufficient for most common cases, covering CDCL steps, hyper-resolution, unit propagation, blocked clause elimination, and many other techniques. However, we recognize that some methods can be too difficult or expensive to translate into this proof system. In this work we define only seven segment characters (a, c, d, f, l, o, r), which suffice to cover methods used by SAT solvers targeting DRAT. However, the format is forward-compatible with new kinds of proof steps, which can be indicated with different characters.
For example, CryptoMiniSat [SNC09] is a SAT solver that also supports XOR clause extraction and reasoning, and can derive new XOR clauses using proof techniques such as Gaussian elimination. Encoding this in DRAT is quite complicated: the XOR clauses must be Tseitin-transformed into CNF, and Gaussian elimination requires a long resolution proof. Participants in SAT competitions therefore turn this reasoning method off, as producing the DRAT proofs is either too difficult or the performance gains are canceled out by the overhead.
FRAT resolves this impasse by allowing the solver to express itself with minimal encoding overhead. A hypothetical extension to FRAT would add new segment characters to allow adding and deleting XOR clauses, and a new proof method for proof by linear algebra on these clauses. The FRAT elaborator would be extended to support the new step kinds, and it could either perform the expensive translation into DRAT at that stage (only doing the work when it is known to be needed for the final proof), or it could pass the new methods on to some XLRAT backend format that understands these steps natively. Since the extension is backward-compatible, it can be done without impacting any other FRAT-producing solvers.
2.2. Extensibility case study: PR proof steps. As a demonstration of its extensibility, we extended the FRAT format to support the propagation redundancy (PR) steps [HKB17] used by the DPR format and the DPR-trim tool.
For consistency with DPR proofs supported by DPR-trim, we encode PR steps using the a command, as shown in Figure 3. As written, this CFG is ambiguous, since we cannot tell the difference between regular steps of the form a id literal * 0 and PR steps of the form a id literal * literal * 0. However, regular a steps have the constraint that they must not repeat any literals, while PR steps a(i, C, ω) satisfy C[0] = ω[0], so to distinguish them we search for an occurrence of C[0] later in the clause.
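The disambiguation rule can be sketched as follows (a minimal illustration; the flat literal-list representation is an assumption of this sketch):

```python
# Split a step's literal list into clause C and witness ω: a witness is
# present iff the first literal occurs again later in the list (C[0] = ω[0]).
def split_pr(lits):
    if lits and lits[0] in lits[1:]:
        cut = lits[1:].index(lits[0]) + 1
        return lits[:cut], lits[cut:]  # clause C, witness ω
    return lits, []                    # regular step: no witness

# split_pr([1, 2, 3])     -> ([1, 2, 3], [])      regular addition
# split_pr([1, 2, 1, -3]) -> ([1, 2], [1, -3])    PR step with witness
```

The rule is well-defined precisely because regular a steps may not repeat literals, so a repeated first literal can only mark the start of a witness.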
As we shall see in Section 5.4, this extension works very well in practice with no adverse effects on performance, providing further evidence of the extensibility of the FRAT format to new types of inferences.

FRAT-producing solvers
The FRAT proof format is designed to allow conversion of DRAT-producing solvers into FRAT-producing solvers at minimal cost, both in terms of implementation effort and impact on runtime efficiency. In order to show the feasibility of such conversions, we chose two popular SAT solvers, CaDiCaL and MiniSat, to modify as case studies. The solvers were chosen to demonstrate two different aspects of feasibility: since MiniSat forms the basis of the majority of modern SAT solvers, an implementation using MiniSat shows that the format is widely applicable, and provides code which developers can easily incorporate into a large number of existing solvers. CaDiCaL, on the other hand, is a modern solver which employs a wide range of sophisticated optimizations. A successful conversion of CaDiCaL shows that the technology is scalable, and is not limited to toy examples.
As mentioned in Section 2, the main solver modifications required for FRAT production are the inclusion of clause IDs, finalization steps, and LRAT proof traces. The provision of IDs requires some non-trivial modification, as many solvers, including CaDiCaL and MiniSat, do not natively keep track of clause IDs, and DRAT proofs use literal lists up to permutation for clause identity. In CaDiCaL, we added IDs to all clauses, leading to 8 bytes of overhead per clause. Additionally, unit clauses are tracked separately, and ensuring proper ID tracking for unit clauses resulted in some added code complexity. In MiniSat, we achieved zero-byte overhead by using the pointer value of clauses as their ID, with unit clauses having computed IDs based on the literal. This requires the use of relocation steps during garbage collection. The output of finalization steps requires identifying the active set from the solver state, which can be subtle depending on the solver architecture, but is otherwise a trivial task assuming knowledge of the solver.
LRAT trace production is the heart of the work, and requires the solver to justify each addition step. This modification is relatively easy to apply to MiniSat, as it only adds clauses in a few places, and it already tracks the "reasons" for each literal in the current assignment, which makes the proof trace straightforward. In contrast, CaDiCaL has over 30 ways to add clauses; in addition to the main CDCL loop, there are various in-processing and optimization passes that can create new clauses.
To accommodate this complexity, we leverage the flexibility of the FRAT format, which allows optional hints: by focusing on the most common clause addition steps, we reap the majority of the runtime advantage with only a few changes. The FRAT elaborator falls back on the standard elaboration-by-unit-propagation when proofs are not provided, so future work can add more proofs to CaDiCaL without any changes to the toolchain.
To maximize the efficacy of the modification, we used a simple method to find places to add proofs. In the first pass, we added support for clause ID tracking and finalization, and changed the output format to FRAT syntax. Since CaDiCaL was already producing DRAT proofs, we could easily identify the addition and removal steps and replace them with a and d steps. Once this is done, CaDiCaL produces valid FRAT files which can pass through the elaborator and yield LRAT results, but it will be quite slow, since the FRAT elaborator is essentially acting as a less-optimized version of DRAT-trim at this point.
We then find all code paths that lead to an a step being emitted, and add an extra call to output a step of the form t todo id 0, where todo id is some unique identifier of this position in the code.These steps are ignored by the FRAT elaborator, but we can use them to identify the hottest code paths by running the solver on benchmarks and counting how many t steps of each kind appear.
The basic idea is that elaborating a step that has a proof is much faster than elaborating a step that doesn't, and the distribution of code paths leading to add steps is highly skewed, so adding proofs to the top 3 or 4 paths already decreases the elaboration time by over 70%. At the time of writing, about one third of CaDiCaL code paths are covered, and the median elaboration time is about 15% that of DRAT-trim (see Section 5). (This is despite the fact that our elaborator could stand to improve on low-level optimizations, and runs about twice as slowly as DRAT-trim when no proofs are provided.)

Elaboration
The main tasks of the FRAT-to-LRAT elaborator are provision of missing RUP step hints, elimination of irrelevant clause additions, and re-labeling of clauses with new IDs. These tasks are performed in two separate 'passes' over files, writing and reading directly to disk (so the entire proof is never in memory at once). In the first pass, the elaborator reads the FRAT file and produces a temporary file (which by default is stored on disk, but can also be configured to be stored in memory). The temporary file is essentially the original FRAT file with the steps put in reverse order, while satisfying the following additional conditions:

• All a steps have annotations.
• Every clause introduced by an o, a, or r step ultimately contributes to the proof of ⊥. (Note that we consider an r step as using an old clause with the old ID and introducing a new clause with the new ID.)
• There are no f steps.

Algorithm 1 shows the pseudocode of the first pass, Elaborate(cert). Here, cert is the FRAT proof obtained from the SAT solver, and the pass works by iterating over its steps in reverse order, producing the temporary file revcert. The map F maintains the active formula as a map with unique IDs for each clause (double inserts and removes to F are always error conditions), and the effect of each step is replayed backwards to reconstruct the solver's state at the point each step was produced.

[Algorithm 1: First pass (elaboration): FRAT to elaborated reversed FRAT.]

[Algorithm 2: Second pass (renumbering): elaborated reversed FRAT to LRAT.]

• All d or f clauses are immediately inserted into F, but (with the exception of the empty clause) are marked as not necessarily required for the proof, and the d step is deferred until just before its first use (or rather, just after the last use).
• ProvePR(F, C, ω?), not given here, checks that C has property RAT with respect to F, and produces a step list in LRAT format (where positive numbers are clause references in a unit propagation proof, and negative numbers are used in RAT steps, indicating the clauses to resolve against). When a witness ω is provided (not empty), it instead checks that C is PR with respect to F and the witness ω.
• CheckHint(F, C, ω?, steps) does the same thing, but it has been given a candidate proof, steps. It checks that steps is a valid proof, and if so, returns it; the steps in the unit propagation proof may be out of order (in which case they are reordered to LRAT conformity), and if the given proof is not salvageable, it falls back on ProvePR(F, C, ω?) to construct the proof.
In the second pass, Renumber(F orig , revcert) reads the input DIMACS file and the temporary file from the first pass, and produces the final result in LRAT format. Not much checking happens in this pass, but we ensure that the o steps in the FRAT file actually appear (up to permutation) in the input. The state that is maintained in this pass is a list of all active clause IDs, and the corresponding list of LRAT IDs (in which original steps are always numbered sequentially in the file, and add/delete steps use a monotonic counter that is incremented on each addition step).
The resulting LRAT file can then be verified by any of the verified LRAT checkers [WHH13], or by the built-in LRAT checker included in our FRAT toolchain, with one caveat: if the input FRAT file includes the PR steps described in Section 2.2, the output will be an LPR file, which must be checked using an LPR checker like cake_lpr [THM21]. This usually happens when the FRAT file was generated by converting from a DPR file. To avoid this, run the FRAT toolchain's from-pr subcommand on the original DPR file, which produces a PR-free FRAT output by translating PR steps into a sequence of RAT steps. Then the output can be elaborated into the LRAT format.
The 2-pass algorithm is used in order to optimize memory usage. The result of the first pass is streamed out so that the intermediate elaboration result does not have to be stored in memory all at once. And since the steps of the temporary file are in reverse order, at least one more pass over it is needed in any case (even if the labels did not need renumbering).

5.1. Testing environment.
All tests in this section were performed on Amazon EC2 r5a.xlarge instances, running Ubuntu Server 20.04 LTS on 2.5 GHz AMD EPYC 7000 processors with 32 GB RAM and 512 GB SSD.

[Figure: the number of instances whose FRAT (resp. DRAT) proofs could be elaborated into LRAT proofs within the given time limit.]
5.2. FRAT vs. DRAT. We performed benchmarks comparing our FRAT toolchain (modified CaDiCaL + FRAT-to-LRAT elaborator written in Rust) against the DRAT toolchain (standard CaDiCaL + DRAT-trim) and measured their execution times, output file sizes, and peak memory usages while solving SAT instances in the DIMACS format and producing their LRAT proofs. The instances used in the benchmark were chosen by selecting all 97 instances for which default-mode CaDiCaL returned 'UNSAT' in the 2019 SAT Race results. One of these instances was excluded because DRAT-trim exhausted the available 32 GB memory and failed during elaboration. Although this instance was not used for the comparisons below, we note that it offers further evidence of the FRAT toolchain's efficient use of memory, since the FRAT-to-LRAT elaboration of this instance succeeded on the same system. The remaining 96 instances were used for performance comparison of the two toolchains.

Figures 5 and 6 show the time and memory measurements from the benchmark. We can see from Figure 5 that the FRAT toolchain is significantly faster than the DRAT toolchain. Although the modified CaDiCaL tends to be slightly (6%) slower than standard CaDiCaL, that overhead is more than compensated by a median 84% decrease in elaboration time (the sums over all instances are 1700.47 s for the DRAT toolchain vs. 381.70 s for the FRAT toolchain, so the average is down by 77%). If we include the time of the respective solvers, the median execution time of the FRAT + modified CaDiCaL toolchain is 53.6% that of the DRAT + CaDiCaL toolchain. The difference in the toolchains' time budgets is clear: the DRAT toolchain spends 42% of its time in solving and 58% in elaboration, while FRAT spends 85% on solving and only 15% on elaboration.

[Figure 6: the number of instances whose LRAT proofs could be generated by the FRAT (resp. DRAT) toolchain within the given peak memory usage limit.]
Figure 6 shows a dramatic difference in peak memory usage between the FRAT and DRAT toolchains. On median, the FRAT toolchain used only 5.4% as much peak memory as DRAT. (The average is 318.62 MB, which is 11.98% of the DRAT toolchain's 2659.07 MB, but this is dominated by the really large instances. The maximum memory usage was 2.99 GB for FRAT and 21.5 GB for DRAT, but one instance exhausted the available 32 GB in DRAT and is not included in this figure.) This result is in agreement with our initial expectations: the FRAT toolchain's 2-pass elaboration method allows it to limit the number of clauses held in memory to the size of the active set used by the solver, whereas the DRAT toolchain loads all clauses in a DRAT file into memory at once during elaboration. This difference suggests that the FRAT toolchain can be used to verify instances that would be unverifiable with the DRAT toolchain due to requiring more memory than the system limit.
There were no noticeable differences in the sizes or verification times of the LRAT proofs produced by the two toolchains. On average, LRAT proofs produced by the FRAT toolchain were 1.873% smaller and 3.314% faster to check than those from the DRAT toolchain.
One minor downside of the FRAT toolchain is that it requires the storage of a temporary file during elaboration, but we do not expect this to be a problem in practice, since the temporary file is typically much smaller than either the FRAT or LRAT file. In our test cases, the average temporary file size was 28.68% and 47.60% that of the FRAT and LRAT files, respectively. In addition, users can run the elaborator with the -m option to bypass temporary files and write the temporary data to memory instead, which further improves performance but foregoes the memory conservation that comes with 2-pass elaboration.
The modified CaDiCaL used for the benchmark is a prototype, and some of its weaknesses show in the data. The general pattern we observed is that on problems for which the predicted CaDiCaL code paths were taken, the generated files have a large number of hints and the elaboration time is negligible (the "FRAT elab" line in Fig. 5); but on problems that make use of the more unusual inprocessing operations, many steps with no hints are passed to the elaborator, and performance becomes comparable to DRAT-trim. For solver developers, this means that there is a very direct relationship between proof annotation effort and mean solution + elaboration time.

5 One instance was omitted from the LRAT verification time comparison due to what appears to be a bug in the standard LRAT checker included in DRAT-trim. Detailed information regarding this instance can be found at https://github.com/digama0/frat/blob/lmcs/benchmarks/README.md.
5.3. FRAT0 vs. DRAT. The performance improvement of the FRAT toolchain relative to the DRAT toolchain comes from two main sources: (1) the optional hints and (2) the 2-pass elaboration model. Since the two optimizations are independent of each other, it is interesting to investigate how each of them affects elaboration performance in isolation.
In particular, we want to know how the FRAT toolchain performs when applying 2-pass elaboration to completely unannotated FRAT proofs (which, for convenience, we refer to as 'FRAT0'), since this is a realistic scenario for developers who find it difficult to modify a solver for hint generation but want to use the FRAT format for its memory efficiency advantage, or to interface with other FRAT-based tools.
In order to perform this new test, we first had to update the elaborator with new optimizations. The original version of the elaborator used for the tests in Section 5.2 was designed under the assumption that most steps would be annotated, falling back to naive searches and becoming prohibitively slow whenever a significant proportion of steps came without hints. This problem was fixed by porting the relevant optimization techniques from DRAT-trim to the elaborator, putting their elaboration speeds (roughly) on par. Using this new version of the elaborator, we performed another benchmark comparing the elaboration time and memory usage of FRAT0 and DRAT files. The files used for the benchmark were identical to those of Section 5.2, except that all hints were removed from the FRAT files using the strip-frat subcommand of the elaborator. The results are shown in Figures 7 and 8.
Although the elaboration speed advantage is gone without proof hints, FRAT0 still trails DRAT closely thanks to the new optimizations. In return for the slightly increased elaboration time, we get steep cuts in memory usage, which remains virtually unchanged from the FRAT test results. The data shows that the FRAT toolchain can confer significant advantages whenever memory usage is the bottleneck, even if no major modifications can be made to the SAT solver in use.

5.4. FRAT elaborator vs. DPR-trim. In addition to offering performance improvements over the DRAT format and toolchain, the flexibility of the FRAT format allows it to also serve as an alternative to the more recent and expressive DPR format. Since the DPR format is much newer than DRAT, it was difficult to do a direct comparison with two solvers producing equivalent proofs side by side. Instead, we performed a proxy comparison by taking the example DPR proofs from the pr-proofs repository6 and producing their FRAT equivalents with the DPR-to-FRAT converter included in the FRAT elaborator. The elaboration test results using these instances are shown in Figures 9 and 10.
As in the FRAT vs. DRAT benchmarks, the elaboration of FRAT proofs requires significantly less memory than that of DPR, which is expected given the 2-pass elaboration model. What is more surprising is the FRAT elaborator's significant speedup over DPR-trim: unlike in the FRAT vs. DRAT case, the FRAT proofs in this benchmark were generated from the DPR proofs and do not contain any additional information, so the faster speed can be entirely attributed to the elaborator itself. At any rate, the results show that FRAT can be a highly viable alternative to DPR, at least relative to the current version of DPR-trim. In fact, replacing a DPR toolchain with a FRAT one does not even require any solver modification: the FRAT elaborator's DPR-to-FRAT conversion takes negligible execution time and memory, so we can take any DPR-producing solver and plug it into the converter to enjoy almost the same performance improvements as using a natively FRAT-producing solver.
5.5. FRAT elaborator vs. PR2DRAT + DPR-trim. In some cases, it may be desirable to produce LRAT proof output instead of LPR, as the former is supported by a wider range of existing tools. The standard DPR toolchain for doing this is PR2DRAT + DPR-trim, where the former converts DPR input into DRAT, and the latter elaborates DRAT into LRAT (note that DPR is a superset of DRAT, so DPR-trim can elaborate DRAT proofs just like DRAT-trim). Alternatively, we can use FRAT as the intermediate format between DPR and LRAT, using the converter included in the FRAT elaborator for DPR-to-FRAT conversion. We benchmarked the performance of the two elaboration methods using the 19 example instances from the PR2DRAT repository7; the results are shown in Figures 11 and 12. Again, the results follow the pattern observed in the other benchmarks. Memory usage shows a dramatic improvement of a couple of orders of magnitude, while the elaboration speedup is less pronounced but still consistent across the board. This shows that the FRAT format and toolchain are useful not only as a DPR replacement, but also for complementing DPR.

Related work
As already mentioned, the FRAT format is most closely related to the DRAT format [Heu16], which it seeks to replace as an intermediate output format for SAT solvers. It also depends on the LRAT format and related tools [CFHH + 17], as the FRAT toolchain targets LRAT as the final output format. The GRAT format [Lam17] and toolchain also aim to improve the elaboration of SAT unsatisfiability proofs, but take a different approach from that of FRAT: they retain DRAT as the intermediate format, but use parallel processing and target a new final format with more information than LRAT in order to improve overall performance. GRAT also comes with its own verified checker [Lam19].
Specifying and verifying the program correctness of SAT solvers (sometimes called the autarkic method, as opposed to the proof-producing skeptical method) is a radically different approach to ensuring the correctness of SAT solvers. There have been various efforts to verify nontrivial SAT solvers [Mar10, SV11, OSOC12, Fle19, FBL18]. Although these solvers have become significantly faster, they cannot compete with the (unverified) state-of-the-art solvers. It is also difficult to maintain and modify certified solvers. Nevertheless, proving the correctness of nontrivial SAT solvers can provide new insights about key invariants underlying the techniques used [FBL18].
Generally speaking, devising proof formats for automated reasoning tools and augmenting the tools with proof output capability is an active research area. Notable examples outside SAT solving include the LFSC format for SMT solving [SOR + 13] and the TSTP format for classical first-order ATPs [SZS04]. In particular, the recent work on the veriT SMT solver [BBFF19] is motivated by rationales similar to those for the FRAT toolchain; the key insight is that a proof production pipeline is often easier to optimize on the solver side than on the elaborator side, as the former has direct access to many types of useful information.

Conclusion
The test results show that the FRAT format and toolchain made significant performance gains relative to their DRAT equivalents in both elaboration time and memory usage. We take this as confirmation of our initial conjectures that (1) there is a large amount of useful and easily extracted information in SAT solvers that is left untapped by DRAT proofs, and (2) streaming verification is the key to verifying very large proofs that cannot be held in memory at once.
The practical ramification is that, provided solvers produce well-annotated FRAT proofs, the elaborator is no longer a bottleneck in the pipeline. Typically, when DRAT-trim hangs, it does so either by taking excessive time, or by attempting to read an entire proof file at once and exhausting memory (the so-called "uncheckable" proofs that can be produced but not verified). But FRAT-to-LRAT elaboration is typically faster than FRAT production, and the memory consumption of the FRAT-to-LRAT elaborator at any given point is proportional to the memory used by the solver at the same point in the proof. Since LRAT verification is already efficient, the only remaining limiting factor is essentially the time and memory usage of the solver itself.
In addition to performance, the other main consideration in the design of the FRAT format and toolchain was flexibility of use and extension. The encoding of FRAT files allows them to be read and parsed both backward and forward, and the format can be modified to include more advanced inferences, as we have seen in the example of PR steps. The optional l steps allow SAT solvers to decide precisely when they will provide explicit proofs, thereby promoting a workable compromise between implementation complexity and runtime efficiency. SAT solver developers can begin using the format by producing the most bare-bones FRAT0 proofs with no annotations (essentially DRAT proofs with metadata for original/final clauses) and gradually work toward providing more complete hints. We hope that this combination of efficiency and flexibility will motivate performance-minded SAT solver developers to adopt the format and support more robust proof production, which is presently only an afterthought in most SAT solvers.
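As an illustration of such a bare-bones proof, the fragment below sketches what a FRAT0 file might look like for a tiny unsatisfiable formula (the concrete clauses and IDs are our own toy example, following the step syntax of Figure 3: o for original clauses, a for additions carrying no l annotations, d for deletions, f for the clauses still active at the end):

```
o 1 1 0        original clause (1)
o 2 -1 2 0     original clause (-1 v 2)
o 3 -2 0       original clause (-2)
a 4 2 0        learned (2); no l hint, so the elaborator derives the proof
d 2 -1 2 0     clause 2 is no longer needed
a 5 0          the empty clause: unsatisfiability established
f 1 1 0        final section: every clause still active is listed
f 3 -2 0
f 4 2 0
f 5 0
```

Adding hints later amounts to appending l annotations to the a steps (e.g., a 4 2 0 l 1 2 0), which is exactly the gradual path from FRAT0 toward fully annotated FRAT described above.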

Figure 1:
Figure 1: DRAT and LRAT proofs of a SAT problem. Whitespace and alignment are not significant; we have aligned lines of the DRAT proof with the corresponding LRAT lines (d steps in LRAT may correspond to multiple DRAT d steps).

Figure 2:
Figure 2: FRAT and LRAT proofs of a SAT problem. To illustrate that proofs are optional, we have omitted the proofs of steps 11 and 12 in this example. The steps must still be legal RAT steps, but the elaborator will derive the proof rather than the solver.

Figure 3:
Figure 3: Context-free grammar for the FRAT format. (The additions in red are to support PR proofs; see Section 2.2.)

Figure 4:
Figure 4: Comparison of binary and text formats for a step. Note that the step ID 9 uses the unsigned encoding, but literals and LRAT-style proof steps use the signed encoding.

Figure 5:
Figure 5: FRAT vs. DRAT time comparison. Each datapoint of 'FRAT total' (resp. 'DRAT total') shows the number of instances whose LRAT proofs could be generated by the FRAT (resp. DRAT) toolchain within the given time limit. Each datapoint of 'FRAT elab' (resp. 'DRAT elab') shows the number of instances whose FRAT (resp. DRAT) proofs could be elaborated into LRAT proofs within the given time limit.

Figure 6:
Figure 6: FRAT vs. DRAT peak memory usage comparison. Each datapoint of 'FRAT' (resp. 'DRAT') shows the number of instances whose LRAT proofs could be generated by the FRAT (resp. DRAT) toolchain within the given peak memory usage limit.

Figure 7:
Figure 7: FRAT0 vs. DRAT time comparison. Each datapoint of 'FRAT0' (resp. 'DRAT') shows the number of instances whose FRAT0 (resp. DRAT) proofs could be elaborated into LRAT proofs within the given time limit.

Figure 8:
Figure 8: FRAT0 vs. DRAT peak memory usage comparison. Each datapoint of 'FRAT0' (resp. 'DRAT') shows the number of instances whose FRAT0 (resp. DRAT) proofs could be elaborated into LRAT proofs within the given peak memory usage limit.

Figure 9:
Figure 9: FRAT elaborator vs. DPR-trim time comparison. Each datapoint of 'FRAT elab' (resp. 'DPR-trim') shows the number of instances whose FRAT (resp. DPR) proofs could be elaborated into LPR proofs within the given time limit. Note that the x-axis scale is logarithmic, and the lowest datapoints of the FRAT graph are not shown because their execution times were measured as 0 seconds at the given measurement resolution (0.01 seconds).

Figure 10:
Figure 10: FRAT elaborator vs. DPR-trim peak memory usage comparison. Each datapoint of 'FRAT elab' (resp. 'DPR-trim') shows the number of instances whose FRAT (resp. DPR) proofs could be elaborated into LPR proofs within the given peak memory usage limit.

Figure 11:
Figure 11: FRAT elaborator vs. PR2DRAT + DPR-trim time comparison.

Figure 12:
Figure 12: FRAT elaborator vs. PR2DRAT + DPR-trim peak memory usage comparison. Each datapoint of 'FRAT elab' (resp. 'PR2DRAT + DPR-trim') shows the number of instances whose DPR proofs could be elaborated into LRAT proofs by the FRAT elaborator (resp. PR2DRAT and DPR-trim) within the given peak memory usage limit.