1 Introduction

The general area of correct-by-construction code generation is venerable, going back at least to Dijkstra’s work in the 1960s [5]. Typically, solutions achieve only a strict subset of the desiderata of generality, automation, and performance of synthesized code. This paper presents the final piece of a pipeline that sits at the sweet spot of all three, enabling semiautomatic refinement of high-level specifications into efficient low-level code in a proof-generating manner. Our initial specification language is the rich, higher-order logic of Coq, and we support a high degree of automation through domain-specific refinement strategies, which in turn enable targeted optimization strategies for extracting efficient low-level code. In order to take advantage of these opportunities, we have built an extensible compilation framework that can be updated to handle new compilation strategies without sacrificing soundness. Our pipeline is foundational: it produces a fully linked assembly program represented as a Coq term with a proof that it meets the original high-level specification.

Fig. 1. The full pipeline, with this work’s contributions highlighted; stick figures indicate user-supplied components.

Our complete toolchain uses Fiat [4] to refine high-level specifications of abstract data types (ADTs) into nondeterministic functional programs depending on external data structures (expressed in a shallowly embedded Gallina DSL), then soundly extracts these programs to an imperative intermediate language (Facade) using a novel proof-generating extraction procedure. The resulting programs are then translated into the Cito [29] language by a newly written compiler, backed by a nontrivial soundness argument bridging two styles of operational semantics. A traditional verified compiler produces efficient Bedrock assembly [3] from the Cito level, which we soundly link against hand-verified implementations of the required data structures. Beyond exploring a new technique for sound extraction of shallowly embedded DSLs (EDSLs), this work bridges the last remaining gap (extraction) to present the first mechanically certified automatic translation pipeline from declarative specifications to efficient assembly programs, as shown in Fig. 1.

In the original Fiat system, specifications were highly nondeterministic programs, and final implementations were fully deterministic programs obtained by repeatedly refining the specification, eventually committing to a single possible result. As a consequence, the generated code committed to a particular deterministic (and pure) implementation of external ADTs and functions that it relied on, reducing flexibility, optimization opportunities, and overall performance. Additionally, the final step in previous work using Fiat was to extract this code directly to OCaml, using Coq’s popular but unverified extraction mechanism. Unfortunately, this meant that correctness of the compiled executable depended not only on the correctness of Coq’s kernel but also on that of the extraction mechanism and of the OCaml compiler and runtime system. These two dependencies significantly decreased the confidence that users could place in programs synthesized by Fiat, and more generally in all programs extracted from Gallina code.

Our work overcomes these issues via a novel approach to extraction that is both extensible and correct and produces efficient, stateful low-level code from nondeterministic functional sources. The process runs within Coq, produces assembly code instead of OCaml code, and supports linking with handwritten or separately compiled verified assembly code.

Instead of refining specifications down to a fully deterministic Gallina program, as the original Fiat system did, we allow Fiat’s final output to incorporate nondeterminism. These choices are resolved at a later stage by interpreting the nondeterminism as a postcondition specification in Hoare logic and linking against assembly code proven to meet that specification. Nondeterminism at runtime, which is not normally present in Gallina programs, is essential to support code derivation with flexible use of efficient low-level data structures. For example, if we represent a database with a type of binary trees that does not enjoy canonical representations, the same data may admit multiple concrete representations, each corresponding to a different ordering of results for an operation enumerating all database records.
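To make this concrete, here is a minimal Gallina sketch of such a nondeterministic computation, modeling computations as predicates over their permissible results; the names (Comp, Pick, enumerate) are illustrative rather than taken from the Fiat development:

    Require Import Coq.Lists.List Coq.Sorting.Permutation.

    (* Fiat-style nondeterministic computations: a Comp A is the set of
       results that the computation may produce. *)
    Definition Comp (A : Type) := A -> Prop.
    Definition Pick {A} (P : A -> Prop) : Comp A := P.

    (* Enumerating the records of a database in some unspecified order:
       any permutation of the underlying collection is a legal result, so
       two runs backed by different tree layouts may disagree. *)
    Definition enumerate {A} (records : list A) : Comp (list A) :=
      Pick (fun l => Permutation records l).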

Unlike certified compilers like CompCert [13] or CakeML [9], we do not implement our translator in the proof assistant’s logic and prove it sound once and for all. Instead, we use proof-generating extraction: we phrase the translation problem in a novel sequent-calculus-style formulation that allows us to apply all of Coq’s usual scriptable proof automation. The primary reason is that we want to make our compiler extensible by not committing to a specific compilation strategy: in our system, programmers can teach the compiler about new verified low-level data structures and code-generation strategies by introducing new lemmas explaining how to map a Gallina term to a particular imperative program. Our automation then builds a (deeply embedded) syntax tree by repeatedly applying lemmas until the nondeterministic functional program is fully compiled. The many advantages of this approach (extensibility, ease of development, flexibility, performance, and ease of verification) do come at a cost, however: compilation is slower, care is needed to make the compiler robust to small variations in input syntax, and the extensible nature of the compiler makes it hard to characterize the supported source language precisely.

To summarize the benefits of our approach:

  • It is lightweight: it does not require reifying the entirety of Gallina into a deeply embedded language before compiling. Instead, we use Coq’s tactic language to drive compilation.

  • It is extensible: each part of the compilation logic is expressed as a derivation rule, proved as an arbitrarily complex Coq theorem. Users can assemble a customized compiler by supplying their own compilation lemmas to extend the source language or improve the generated code.

  • It is well-suited to compiling EDSLs: we support nondeterminism in input programs (standard extraction requires deterministic code).

  • It allows us to link against axiomatically specified foreign functions and data structures, implemented and verified separately.

  • It compiles to a relatively bare language with explicit memory management.

To demonstrate the applicability of this approach, Sect. 6 presents a set of microbenchmarks of Fiat programs manipulating variables, conditions, and nested lists of machine words, as well as a more realistic example of SQL-like programs similar to those of the original Fiat paper. These benchmarks start from high-level specifications of database queries and pass automatically through our pipeline to closed assembly programs, complete with full-functional-correctness specifications and proofs in separation logic. Source code and compilation instructions for the framework and benchmarks are available online at https://github.com/mit-plv/fiat/tree/ijcar2020.

2 A Brief Outline of Our Approach

We begin with an example of the pipeline in action. Below are an SQL-style query finding all titles by a given author and the corresponding Fiat-generated implementation:

[Listing omitted: the SQL-style query and its Fiat-generated implementation.]

The generated code relies on a Fiat module that is not an executable implementation of the required functionality; rather, it specifies certain methods nondeterministically, implying that the enumeration method returns the expected rows in some undetermined order. The order may even be different for every call, as might arise, for instance, with data structures like splay trees that adjust their layout even during logically read-only operations.
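The listing itself is not reproduced here, but the following hedged Gallina reconstruction shows the general shape of such a method; all names (Comp, bind, Book, find_titles) are ours, not the development’s:

    Require Import Coq.Lists.List Coq.Strings.String Coq.Sorting.Permutation.

    Definition Comp (A : Type) := A -> Prop.
    Definition ret {A} (x : A) : Comp A := fun y => y = x.
    Definition bind {A B} (c : Comp A) (k : A -> Comp B) : Comp B :=
      fun b => exists a, c a /\ k a b.

    Record Book := { author : string; title : string }.

    (* The rows matching the author come back in *some* order (any
       permutation of the matching sublist is allowed); their titles
       are then returned in that order. *)
    Definition find_titles (db : list Book) (a : string) : Comp (list string) :=
      bind (fun rows =>
              Permutation (filter (fun b => String.eqb (author b) a) db) rows)
           (fun rows => ret (map title rows)).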

Such nondeterministic programs are the starting point for our new refinement phases. The ultimate output of the pipeline is a library of assembly code in the Bedrock framework [3], obtained by extracting to a new language, Facade, built as a layer on top of the Cito C-like language [29], and then compiling to Bedrock.

[Listing omitted: the generated low-level code for the running example.]

The output for our running example might look like the code in the listing above. Note that this code works directly with pointers to heap-allocated mutable objects, handling all memory management by itself, including for intermediate values. The generic interface has been replaced with calls to a concrete module providing binary search trees of tuples, and the enumeration call has become an imperative loop. We implement and verify this module in Bedrock assembly, and then we link code and proofs to obtain a binary and an end-to-end theorem guaranteeing full functional correctness of assembly libraries, for code generated automatically from high-level specifications.

The heart of our contribution is spanning the gap from nondeterministic functional programs (written in Gallina) to imperative low-level programs (written in Facade) using an extensible, proof-generating framework. We phrase this derivation problem as one of finding a proof of a Hoare triple, where the precondition and postcondition are known, but the Facade program itself must be derived during the proof. The central goal from our running example looks as follows, with a metavariable standing for the overall Facade program that we seek and with the specification unfolded (Subsect. 4.3 defines these triples precisely).

[Displayed synthesis goal omitted.]

The actual implementation is found by applying lemmas that decompose this goal into smaller, similar goals representing subexpressions of the final program. These lemmas form a tree of deduction steps, produced automatically by a syntax-directed compilation script written in Coq’s Ltac tactic language. Crucially, the derivation implemented by this script can include any adequately phrased lemma, allowing new implementation strategies. Composed with the automation that comes before and after this stage, we obtain a fully automated, proof-generating pipeline from specifications to libraries of assembly code.

Fig. 2. A few rules of our synthesizing compiler.

3 An Example of Proof-Producing Extraction

We begin by illustrating the compilation process on the example Fiat program from Sect. 2. We synthesize a Facade program p according to the following specification, which we summarize as follows:

  • p, when started in an initial state containing the arguments, must be safe (it must not violate function preconditions, access undefined variables, leak memory, etc.).

  • p, when started in a proper initial state, must reach (if it terminates) a state in which the output variable holds one of the values allowed by the nondeterministic program shown above.

Instantiating this specification with our example, we need to find a program p such that

[Displayed synthesis goal omitted.]

We use our first compilation lemma (a few examples are shown in Fig. 2) to connect the semantics of Fiat’s bind operation (the sequencing operator of monads [27]) to our synthesis triples, which yields the following synthesis goal:

[Displayed synthesis goal omitted.]

In this step, we have broken down the assignment of a Fiat-level bind into assignments to two variables: one for the intermediate list of authors, and one for the final result. A dedicated operator separates the entries in this list of bindings of Facade variables to nondeterministic Fiat terms. The ordering of the individual bindings matters: the Fiat term assigned to the second variable depends on the particular value chosen for the first.

We then break down the search for this program into the search for two smaller programs: the first starts in the initial state and is only concerned with the assignment of the intermediate list; the second starts in a state where that list is already assigned and uses its value to construct the final result.

[Displayed synthesis goals omitted.]

At this point, a lemma connecting the meaning of the nondeterministic selection of authors to the corresponding Facade-level function tells us how to implement the first program (this is the call rule for that function). We are therefore left with only the second program to synthesize: noticing the common prefix of the starting and ending states, we apply a rule (called chomp in our development) that allows us to set aside the common prefix and focus on the tails of the pre- and post-states, transforming the problem into

[Displayed synthesis goal omitted.]

The additional mapping pictured under the arrow indicates that the initial and final states must both map the prefix variable to the same value. In this form, we can first rewrite the remaining computation as a left fold, at which point the synthesis goal matches the conclusion of the fold\(_L\) rule shown in Fig. 2c: given a program to initialize the accumulator and a program to implement the body of the fold, the Facade program defined by the loop macro obeys the specification above. This gives us two new synthesis goals, which we handle recursively, in a fashion similar to the one described above. Once these obligations have been resolved, we arrive at the desired Facade program.
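Schematically, and with all names hypothetical (the actual rule also carries variable-freshness and typing side conditions), the fold\(_L\) rule has roughly the following shape in Coq:

    Require Import Coq.Strings.String Coq.Lists.List.

    Parameter Stmt Env Telescope Ext W : Type.
    Parameter Seq : Stmt -> Stmt -> Stmt.
    Parameter LoopMacro : string -> string -> string -> Stmt -> Stmt.
    (* {{ pre }} p {{ post }} with extra bindings ext, per Subsect. 4.3: *)
    Parameter ProgOk : Env -> Stmt -> Telescope -> Telescope -> Ext -> Prop.
    (* Prepend the binding of a variable to a value onto a telescope: *)
    Parameter Bind : string -> W -> Telescope -> Telescope.

    Axiom FoldL_rule :
      forall Psi (f : W -> W -> W) (lst : list W) (init : W)
             p_init p_body v_lst v_head v_acc (pre : Telescope) ext,
        (* pre is assumed to bind v_lst to the list lst *)
        ProgOk Psi p_init pre (Bind v_acc init pre) ext ->
        (forall acc head,
            ProgOk Psi p_body (Bind v_head head (Bind v_acc acc pre))
                              (Bind v_acc (f acc head) pre) ext) ->
        ProgOk Psi (Seq p_init (LoopMacro v_lst v_head v_acc p_body))
               pre (Bind v_acc (fold_left f lst init) pre) ext.

Reading the rule bottom-up, as our synthesis does, the shape of the conclusion tells the compiler when the rule applies, and its two premises become the two new synthesis goals.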

4 Proof-Generating Extraction of Nondeterministic Functional Programs: From Fiat to Facade

4.1 The Facade Language

We start with a brief description of our newly designed target language, Facade. Facade is an Algol-like untyped imperative language operating on Facade states, which are finite maps from variable names to Facade values (either scalars, or nonnull values of heap-allocated ADTs). Syntactically, Facade includes standard programming constructs like assignments, conditionals, loops, function calls, and recursion. What distinguishes the language is its operational semantics, pictured partially in Fig. 3. First, that semantics follows that of Cito in supporting modularity by modeling calls to externally defined functions via preconditions and postconditions. Second, linearity is baked into Facade’s operational semantics, which enforce that every ADT value on the heap will be referred to by exactly one live variable (no aliasing and no leakage) to simplify reasoning about the formal connection to functional programs: if every object has at most one referent, then we can almost pretend that variables hold abstract values instead of pointers to mutable objects. In practice, we have not found this requirement overly constraining for our applications: one can automatically introduce copying when needed, or one can require the external ADTs to provide nondestructive iteration.
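As a rough guide, here is a minimal Coq sketch of Facade-style abstract syntax; the constructor names are ours, and the actual grammar (Fig. 3) covers more constructs:

    Require Import Coq.Strings.String Coq.Lists.List.

    Inductive Expr :=
    | Var   (x : string)
    | Const (w : nat)                          (* stands in for machine words *)
    | Binop (op : nat -> nat -> nat) (a b : Expr).

    Inductive Stmt :=
    | Skip
    | Seq    (a b : Stmt)
    | If     (cond : Expr) (t f : Stmt)
    | While  (cond : Expr) (body : Stmt)
    | Assign (x : string) (e : Expr)
    | Call   (ret : string) (f : string) (args : list string).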

The program semantics manipulates local-variable environments where ADTs are associated with high-level models. For instance, a finite set is modeled as a mathematical set, not as e.g. a hash table. A key parameter to the compiler soundness theorem is a separation-logic abstraction relation, connecting the domain of high-level ADT models to mutable memories of bytes. By picking different relations at the appropriate point in our pipeline, we can justify linking with different low-level implementations of high-level concepts. No part of our automated translation from Fiat to Facade need be aware of which relation is chosen, and the same result of that process can be reused for different later choices. This general approach to stateful encapsulation is largely inherited from Cito, though with Facade we have made it even easier to use.

Fig. 3. Selected syntax and operational semantics of Facade [28].

Facade’s operational semantics are defined by two predicates, \(\varPsi \vdash (\texttt {p}, \texttt {st}){\downarrow }\) and \(\varPsi \vdash (\texttt {p}, \texttt {st}) \Downarrow \texttt {st'}\), expressing respectively that the Facade program p will run safely when started in Facade state st, and that p may reach state st’ when started from st (this latter predicate essentially acts as a big-step semantics of Facade). Both predicates are parameterized over a context \(\varPsi \) mapping function names to their axiomatic specifications. The semantics is nondeterministic in the sense that there can be more than one possible st’.
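A minimal Coq sketch of the types of these two predicates, under hypothetical names, looks as follows:

    (* Both predicates take the context Ψ of axiomatic specifications. *)
    Parameter Stmt State Env : Type.

    (* Ψ ⊢ (p, st) ↓ : program p runs safely from state st. *)
    Parameter Safe : Env -> Stmt -> State -> Prop.

    (* Ψ ⊢ (p, st) ⇓ st' : p may reach st' from st (big-step and
       nondeterministic: several st' may be related to a single st). *)
    Parameter RunsTo : Env -> Stmt -> State -> State -> Prop.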

Modularity is achieved through the CallAx rule, which allows a Facade program to call a function via its specification in \(\varPsi \). A function call produces a return value and a list of output values representing the results of in-place modification of the input ADT arguments. A precondition is a predicate pre on the values assigned to the input arguments of the callee; a postcondition is a predicate post on these input values, the output values, and the return value. The semantics prescribes that a function call nondeterministically picks a list of output values and a return value satisfying post, and uses them to update the relevant variables in the caller’s postcall state (possibly deallocating them).
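The following sketch shows one plausible Coq shape for such axiomatic specifications; the field names are ours, and the development’s actual record carries more structure:

    Parameter Value : Type.  (* scalars or heap-allocated ADT values *)

    Record AxiomaticSpec := {
      (* Precondition over the values of the input arguments: *)
      PreCond  : list Value -> Prop;
      (* Postcondition pairing each input with its output value (None when
         the callee deallocated it), plus the return value: *)
      PostCond : list (Value * option Value) -> Value -> Prop
    }.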

Linearity is achieved by a set of syntactic and semantic provisions. For instance, variables currently holding ADT values cannot appear on the right-hand sides of assignments, to avoid aliasing. They also cannot appear on the left-hand sides of assignments, to avoid losing their current payloads and causing memory leaks.

We have implemented a verified translation from Facade to Cito, and from there we reuse established infrastructure to connect into the Bedrock framework for verified assembly code. The translation’s soundness proof has the flavor of justifying a new type system for an existing language, since Facade’s syntax matches that of Cito rather closely.

4.2 Fiat and Facade States

Fig. 4. Equivalence relation on Fiat and Facade states. Because Facade does not allow us to leak ADT values, we require that all bindings to ADT values in the Fiat state be reflected in the Facade state, and vice versa. For scalars, we only require that bindings in the Fiat state be present in st.

We connect Fiat’s semantics to those of Facade by introducing a notion of Fiat states, which allow us to express pre- and postconditions in a concise and homogeneous way, facilitating syntax-driven compilation. Each Fiat state describes a set of Facade states: in Facade, machine states are unordered collections of names and values; Fiat states, on the other hand, are ordered collections of bindings (sometimes called telescopes), each pairing a variable name with a set of permissible values for that variable.

For example, a telescope may describe all machine states in which one variable maps to a positive value and another maps to a pair built from that value. Each variable in a Fiat state is annotated with a function describing how to inject values of its type into and out of the concrete type used at the Facade level (e.g. a linked list may be extracted to a vector, as in our example).

Finally, to implement the aforementioned chomp rule, Fiat states are extended with an unordered map ext from names to concrete values. A full Fiat state is thus composed of a telescope st and an extra collection of bindings ext. We relate Fiat states to Facade states using the ternary predicate defined in Fig. 4, which ensures that the values assigned to variables in the Facade state are compatible with the bindings described in the Fiat state.
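A minimal Coq sketch of these definitions, with illustrative names, might look as follows:

    Require Import Coq.Strings.String.

    (* A telescope is an ordered list of bindings; later bindings may
       depend on the values chosen for earlier ones, hence the function
       in the tail. *)
    Parameter Value : Type.                  (* Facade-level values *)

    Inductive Telescope :=
    | TNil
    | TCons (var : string)
            (allowed : Value -> Prop)        (* permissible values *)
            (tail : Value -> Telescope).     (* dependent remainder *)

    (* A full Fiat state: a telescope plus the unordered map ext of
       extra bindings used by the chomp rule. *)
    Parameter ExtMap : Type.                 (* names ↦ concrete values *)
    Record FiatState := { tele : Telescope; ext : ExtMap }.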

4.3 Proof-Generating Extraction by Synthesis

Armed with this predicate, we are ready for the full definition of our synthesis triples (a Coq sketch follows the two conditions below):

  • For any initial Facade state st, if st is in relation with the Fiat precondition state extended by ext, then it is safe to run the Facade program p from state st.

  • For all initial and final Facade states st and st’, if st is in relation with the Fiat precondition state extended by ext, and if running the Facade program p starting from st may produce the Facade state st’, then st’ is in relation with the Fiat postcondition state extended by ext.
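In Coq terms, the triple can be sketched as follows; the names are ours, and Realizes stands for the relation of Fig. 4:

    Parameter Stmt State Env Telescope ExtMap : Type.
    Parameter Safe   : Env -> Stmt -> State -> Prop.
    Parameter RunsTo : Env -> Stmt -> State -> State -> Prop.
    (* st realizes a Fiat state (telescope + extra bindings): *)
    Parameter Realizes : State -> Telescope -> ExtMap -> Prop.

    Definition ProgOk (Psi : Env) (p : Stmt)
               (pre post : Telescope) (ext : ExtMap) : Prop :=
      forall st : State,
        Realizes st pre ext ->
        Safe Psi p st /\
        (forall st' : State,
            RunsTo Psi p st st' -> Realizes st' post ext).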

This definition is enough to concisely and precisely phrase the three types of lemmas required to synthesize Facade programs: properties of the triple itself, used to drive the proof search and provide the extraction architecture; connections between the triple and Fiat’s semantics, used to reduce extraction of Fiat programs to that of Gallina programs; and connections between Fiat and Facade, such as the fold\(_L\) rule of Fig. 2c (users provide additional lemmas of the latter kind to extend the scope of the compiler and broaden the range of source programs that it can handle).

With these lemmas, we can phrase certified extraction as a proof-search problem that can be automated effectively. Starting from a Fiat computation mixing Gallina code with calls to external ADTs, we generate a specification based on this triple (itself defined in terms of Facade’s operational semantics):

(1) [Displayed specification omitted: the initial synthesis triple.]

From this starting point, extraction proceeds by analyzing the shapes of the pre- and post-states to determine applicable compilation rules, which are then used to build a Facade program progressively. This stage explains why we chose strongly constrained representations for pre- and post-states: where typical verification tasks compute verification conditions from the program’s source, we compute the program from carefully formulated pre- and postconditions (proper care in designing the compilation rules and their preconditions obviates the need for backtracking).

In practice, this syntax-driven process is implemented by a collection of matching functions written in Ltac. These may fail, solve the current goal outright by applying a lemma, or produce new goals by applying a compilation lemma of the form shown in Fig. 2. Our extraction architecture is extensible: the main loop exposes hooks that users can rebind to call their own matching rules, as sketched below. Examples of such rules are provided in Sect. 6.1. Our focus is on extracting efficient code from Gallina EDSLs, so the set of rules is tailored to each domain and does not cover all possible programs (in particular, we do not support arbitrary fixpoints or pattern-matching constructs; we use custom lemmas mapping specific matches to specific code snippets or external functions). When the compiler encounters an unsupported construct, it stops and presents the user with a residual synthesis goal indicating which piece is missing, so that the user can provide the missing lemmas and tactics.
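The following schematic Ltac illustrates the hook mechanism; all tactic and lemma names are hypothetical:

    Ltac compile_builtin := fail.  (* stands in for the rules of Fig. 2 *)
    Ltac compile_custom  := fail.  (* extension hook, initially empty *)

    Ltac compile_step := first [ compile_custom | compile_builtin ].
    Ltac compile_loop := repeat compile_step.

    (* A user extension might later be installed with:
         Ltac compile_custom ::= first [ apply MyFoldRule | apply MyCallRule ].
       after which compile_loop tries the user-supplied rules first. *)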

In our experience, debugging proof search and adding support for new constructs is relatively easy, though it does require sufficient familiarity with Coq. Typically, our compiler would have two classes of users: library developers, who interactively implement support for new DSLs (developing compilation tactics requires manual labor similar to writing a domain-specific compiler); and final users, who write programs within supported DSLs and use fully automated compilation tactics.

5 The Complete Proof-Generating Pipeline

The components presented in the previous section form the final links in an automated pipeline lowering high-level specifications to certified Bedrock modules, whose correctness is guaranteed by Theorem 1.

Starting from a Fiat ADT specification (a collection of high-level method specifications, as shown in Fig. 5a), we obtain by refinement under a relation \(\approx \) a Fiat ADT implementation (a collection of nondeterministic functional programs, as shown in Fig. 5b). Each method of this implementation is assigned an operational specification (Eq. 1), from which we extract (using proof-producing synthesis, optionally augmented with user-specified lemmas and tactics) a verified Facade implementation mimpl (Sect. 4.3) that calls into a number of external functions (\(\varPsi \), Fig. 3), as shown in Fig. 5c.

Finally, we package the resulting Facade methods into a Facade module. This module imports \(\varPsi \) (i.e. it must be linked against implementations of the functions in \(\varPsi \)) and exports axiomatic specifications straightforwardly lifted from the original high-level specifications into Facade style (as demonstrated in the call rule of Fig. 3): for each high-level method specification, we export the following:

[Displayed specification omitted: the exported Facade-style axiomatic specification.]

Since we are working in an object-oriented style at the high level, our low-level code follows a convention of an extra “self” argument added to each method; this logical formulation uses distinct names for spec-level and implementation-level “self” values.

A generic proof guarantees that the operational specifications used to synthesize Facade code indeed refine the axiomatic specifications exported by our Facade module. Compiling this Facade module via our new formally verified Facade compiler produces a correct Bedrock module, completing Theorem 1:

Theorem 1

Starting from a valid refinement of a Fiat ADT specification and a set of Facade programs synthesized from each of its refined methods, we can build a certified Bedrock module whose methods satisfy the lifted axiomatic specifications.

The final Bedrock module satisfies the original, high-level Fiat specifications. It specifies its external dependencies \(\varPsi \), for which verified assembly implementations must be provided as part of the final linking phase, which happens entirely inside of Coq. After linking, we obtain a closed, executable Bedrock module, exposing an axiomatic specification directly derived from the original, high-level ADT specification. Our implementation links against verified hand-written implementations of low-level indexing structures, though it would be possible to use the output of any compiler emitting Bedrock assembly code.

Fig. 5. Different stages of a process-scheduler compilation example (see also the annotated ‘ProcessScheduler.v’ file).

6 Evaluation

6.1 Microbenchmarks

We first evaluated our pipeline by extracting a collection of twenty-six Gallina programs manipulating machine words, lists, and nested lists, with optional nondeterministic choices. Extraction takes a few seconds per program. The programs range from simple operations, such as performing basic arithmetic, allocating data structures, calling compiler intrinsics, or sampling arbitrary numbers, to more complex sequence manipulations, like reversing, filtering, reducing (e.g. reading in a number written as a list of digits in a given base), flattening, and duplicating or replacing elements. All examples, and the corresponding outputs, are included in a literate Coq file available online. These examples illustrate that our extraction engine supports a fluid, extensible source language, including subsets of Gallina and many nondeterministic Fiat programs.
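For instance, the base-conversion operation mentioned above can be written as an ordinary Gallina left fold; this hedged reconstruction (names ours) is representative of the sources that the fold\(_L\) rule extracts:

    Require Import Coq.Lists.List.
    Import ListNotations.

    (* Read a number written as a list of digits in base b, most
       significant digit first: an ordinary left fold. *)
    Definition read_number (b : nat) (digits : list nat) : nat :=
      fold_left (fun acc d => acc * b + d) digits 0.

    Example read_42 : read_number 10 [4; 2] = 42 := eq_refl.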

6.2 Relational Queries

To evaluate our full pipeline in realistic conditions, we targeted the query-structure ADT library of the Fiat paper [4] as well as an ADT modeling process scheduling inspired by Hawkins et al. [7]. This benchmark starts from high-level Fiat specifications (as shown in Fig. 5a) and outputs a closed Bedrock module, linked against a hand-verified nested-binary-tree implementation.

From Fiat specifications we derive a collection of nondeterministic Fiat programs (one per ADT method, as demonstrated in Fig. 5b), then extract each method to Facade (Fig. 5c) and compile to Bedrock. Extraction is fully automatic; it draws from the default pool of extraction lemmas (about conditionals, constants, arithmetic operations, etc.) and from bag-specific lemmas that we added to the compiler (these manually verified call rules connect the pure bag specifications used in Fiat sources to Bedrock-style specifications of mutable binary search trees via the abstraction relation).

Fig. 6. Process scheduler benchmarks.

Figure 6 presents the results of our experimental validation. We compare our own verified implementation (“Fiat”) against the corresponding SQL queries executed by SQLite 3.8.2 (using an in-memory database) and PostgreSQL 9.3.11 (“PG”). For increasingly large collections of processes, we run 20,000 Enumerate queries to locate the 10 active processes, followed by 10,000 GetCPUTime queries for arbitrary process IDs. In all cases, the data is indexed by (state, PID) to allow for constant-time Enumerate queries (the number of active processes is kept constant) and logarithmic-time GetCPUTime queries (assuming a B-tree–style index and skip-scans).

Our implementation behaves as expected: it beats SQLite and PostgreSQL by 1.5 and 2.5 orders of magnitude respectively on GetCPUTime, and competes honorably with SQLite (while beating PostgreSQL by one order of magnitude) on Enumerate. Notice the red curves in Fig. 6: without an explicit hint in the query, both database management systems missed the skip-scan opportunity and exhibited asymptotically suboptimal linear-time behavior, so we had to tweak the queries fed to PostgreSQL and SQLite to obtain good GetCPUTime performance (in contrast, the optimizer in our system can be guided explicitly by adding compiler hints in the form of extra tactics, without modifying the specifications).

Of course, our implementation does much less work than a database engine; the strength of our approach is to expose an SQL-style interface while enabling generation of specialized data-structure-manipulation code, allowing programmers to benefit from the conciseness and clarity of high-level specifications without incurring the overheads of a full-fledged DBMS.

Trusted Base. Our derivation assumes ensemble extensionality and Axiom K. Our trusted base comprises the Coq 8.4 checker [25] (\(\sim \)10 000 lines of OCaml code), the semantics of the Bedrock IL and the translator from it to x86 assembly (\(\sim \)1200 lines of Gallina code), an assembler, and wrappers for extracted methods (\(\sim \)50 lines of x86 assembly). We used Proof General [2] for development.

7 Related Work

Closely related to our work is a project by Lammich [10] that uses Isabelle/HOL to refine functional programs to an embedded imperative language that requires garbage collection. This approach has been applied to various complex algorithms, whereas our focus is on fully automatic derivation from highly regular specifications. Both approaches use some form of linearity checking to bridge the functional-imperative gap (Lammich et al. use separation logic [20] and axiomatic semantics, while we apply Facade’s lighter-weight approach: decidable syntactic checks applied after the fact, with no explicit pointer reasoning). A recent extension [11] targets LLVM directly. Crucially, the initial work only targets Imperative/HOL, and its extension does not support linking against separately verified libraries, while our pipeline allows linking low-level programs, inside of Coq, against verified libraries written in any language of the Bedrock ecosystem. Finally, we have integrated our translation into an automated proof-generating pipeline from relational specifications to executable assembly code; as far as we know, no such pipeline has been presented before.

Another closely related project by Kumar et al. [8, 17] focuses on extracting terms written in a purely functional subset of HOL4’s logic into the CakeML dialect of ML. The main differences with our pipeline are optimization opportunities, extensibility, and external linking. Indeed, while the compiler to CakeML bridges a relatively narrow gap (between two functional languages with expressive type systems and automatic memory management), our extraction procedure connects two very different languages, opening up many more opportunities for optimizations (including some related to memory management). We expose these opportunities to our users by letting them freely extend the compiler based on their domain-specific optimization knowledge.

Recent work by Protzenko et al. [19] achieves one of our stated goals (efficient extraction to low-level code, here from F* to C) but does not provide formal correctness guarantees for the extracted code (the tool, KreMLin, consists of over 15,000 lines of unverified OCaml code). Additionally, KreMLin requires source programs to be written in a style matching that of the extracted code: instead of extending the compiler with domain-specific representation choices and optimizations, users must restrict their programs to the Low* subset of F*.

One last related project is the compiler of the Cogent language [18]. Its sources are very close to Facade’s (it allows for foreign calls to axiomatically specified functions, but it does not permit iteration or recursion except through foreign function calls), and its compiler also produces low-level code without a garbage collector. Our projects differ in architecture and in spirit: Cogent is closer to a traditional verified compiler, producing consecutive embeddings of a source program (from C to a shallow embedding in Isabelle/HOL) and generating equivalence proofs connecting each of them. Cogent uses a linear type system to establish memory safety, while we favor extensibility over completeness, relying on lemmas to justify the compilation of arbitrary Gallina constructs.

We draw further inspiration from a number of other efforts:

Program Extraction. Program extraction (a facility offered by Coq and other proof assistants) is a popular way of producing executable binaries from verified code. Extractors are rather complex programs, subjected to varying degrees of scrutiny: for example, the theory behind Coq’s extraction was mechanically formalized and verified [14], but the corresponding concrete implementation itself is unverified. The recent development of CertiCoq [1], a verified compiler for Gallina, has significantly improved matters over unverified extraction, but it only supports pure Gallina programs, and it uses a fixed compilation strategy. In contrast, our pipeline ensures that nondeterministic specifications are preserved down to the generated Bedrock code and grants users fine-grained control over the compilation process.

Compiler Verification. Our compilation strategy allows Fiat programs to depend on separately compiled libraries. This contrasts with verified compilers like CakeML [9] or CompCert [13]: in the latter, correctness guarantees only extend to linking with modules written in CompCert C and compiled with the same version of the compiler. Recent work [23] generalized these guarantees to cover cross-language compilation, but these developments have not yet been used to perform functional verification of low-level programs assembled from separately verified components.

An alternative approach, recently used to verify an operating-system kernel [21], is to validate individual compiler outputs. This is particularly attractive as an extension of existing compilers, but it generally falls short when trying to verify complex optimizations, such as our high-level selection of algorithms and data structures. In the same vein, verified compilers often rely on unverified programs to solve complex problems such as register allocation, paired with verified checkers to validate solutions. In our context, the solver is the proof-producing extraction logic, and the verifier is Coq’s kernel: our pipeline produces proofs that witness the correctness of the resulting Facade code.

Extensible Compilation. Multiple research projects let users add optimizations to existing compilers. Some, like Racket [26], do not focus on verification. Others, like Rhodium [12], let users phrase and verify transformations using DSLs. Unfortunately, most of these tools are unverified and do not provide end-to-end guarantees. One recent exception is XCert [24], which lets CompCert users soundly describe program transformations in an embedded DSL. Our approach is similar insofar as we assemble DSL compilers from collections of verified rewritings.

Program Synthesis. Our approach of program generation via proofs follows in the deductive-synthesis tradition started in the 1980s [15]. We use the syntactic structure of our specialized pre- and postconditions to drive synthesis: the idea of strongly constraining the search space is inherited from the syntax-guided approach pioneered in the Sketch language [22]. That family of work uses SMT solvers where we use a proof assistant, offering more baseline automation with less fundamental flexibility.

Formal Decompilation. Instead of deriving low-level code from high-level specifications, some authors have used HOL-family proof assistants to translate unverified low-level programs (in assembly [16] or C [6]) into high-level code suitable for verification. Decompilation is an attractive approach for existing low-level code, or when compiler verification is impractical.

8 Conclusion

The extraction technique presented in this paper is a convenient and lightweight approach for generating certified extracted programs, reducing the trusted base of verified programs to little beyond a proof assistant’s kernel. We have shown our approach to be suitable for the extraction of DSLs embedded in proof assistants, using it to compile a series of microbenchmarks and to do end-to-end proof-generating derivation of assembly code from SQL-style specifications. Crucially, the latter derivations work via linking with verified implementations of assembly code that our derivation pipeline could never produce directly. To ease this transition, we developed Facade, a new language designed to facilitate reasoning about memory allocation in synthesized extracted programs. In the process, we have closed the last gap in the first automatic and mechanically certified translation pipeline from declarative specifications to assembly-language libraries, supporting user-guided optimizations and parameterization over abstract data types implemented, compiled, and verified using arbitrary languages and tools.