A Verified Generational Garbage Collector for CakeML

This paper presents the verification of a generational copying garbage collector for the CakeML runtime system. The proof is split into an algorithm proof and an implementation proof. The algorithm proof follows the structure of the informal intuition for the generational collector’s correctness, namely, a partial collection cycle in a generational collector is the same as running a full collection on part of the heap, if one views pointers to old data as non-pointers. We present a pragmatic way of dealing with ML-style mutable state, such as references and arrays, in the proofs. The development has been fully integrated into the in-logic bootstrapped CakeML compiler, which now includes command-line arguments that allow configuration of the generational collector. All proofs were carried out in the HOL4 theorem prover.


Introduction
High-level programming languages such as ML, Haskell, Java, JavaScript and Python provide an abstraction of memory which removes the burden of memory management from the application programmer. The most common way to implement this memory abstraction is to use garbage collectors in the language runtimes. The garbage collector is a routine which is invoked when the memory allocator finds that there is not enough free space to perform allocation. The collector's purpose is to produce new free space. It does so by traversing the data in memory and deleting data that is unreachable from the running application. There are two classic algorithms: mark-and-sweep collectors mark all live objects and delete the others; copying collectors copy all live objects to a new heap and then discard the old heap and its dead objects.
Since garbage collectors are an integral part of programming language implementations, their performance is essential to make the memory abstraction seem worthwhile. As a result, there have been numerous improvements to the classic algorithms mentioned above. There are variants of the classic algorithms that make them incremental (do a bit of garbage collection often), generational (run the collector only on recent data in the heap), or concurrent (run the collector as a separate thread alongside the program). This paper's topic is the verification of a generational copying collector for the CakeML compiler and runtime system [15]. The CakeML project has produced a formally verified compiler for an ML-like language called CakeML. The compiler produces binaries that include a verified language runtime, with supporting routines such as an arbitrary precision arithmetic library and a garbage collector. One of the main aims of the CakeML compiler project is to produce a verified system that is as realistic as possible. This is why we want the garbage collector to be more than just an implementation of one of the basic algorithms.
Contributions.
- To the best of our knowledge, this paper presents the first completed formal verification of a generational garbage collector. However, it seems that the CertiCoq project [1] is in the process of verifying a generational garbage collector.
- We present a pragmatic approach to dealing with mutable state, such as ML-style references and arrays, in the context of implementation and verification of a generational garbage collector. Mutable state adds a layer of complexity since generational collectors need to treat pointers from old data to new data with special care. The CertiCoq project does not include mutable data, i.e. their setting is simpler than ours in this respect.
- We describe how the generational algorithm can be verified separately from the concrete implementation. Furthermore, we show how the proof can be structured so that it follows the intuition of informal explanations of the form: a partial collection cycle in a generational collector is the same as running a full collection on part of the heap if one views pointers to old data as non-pointers.
- This paper provides more detail than any previous CakeML publication on how algorithm-level proofs can be used to write and verify concrete implementations of garbage collectors for CakeML, and how these are integrated into the full CakeML compiler and runtime. The updated in-logic bootstrapped compiler comes with new command-line arguments that allow configuration of the generational garbage collector.

Approach
In this section, we give a high-level overview of the work and our approach to it. Subsequent sections cover some (but, for lack of space, not all) of these topics in more detail.

Algorithm-Level Modelling and Verification:
- The intuition behind copying garbage collection is important for understanding this paper. Section 3.1 explains the basic Cheney copying collector algorithm. Section 3.2 continues with how the basic algorithm can be modified to run as a generational collector. It also describes how we deal with mutable state such as ML-style references and arrays.
- Section 3.3 describes how the algorithm has been modelled as HOL functions. These algorithm-level HOL functions model memory abstractly; in particular, we use HOL lists to represent heap segments. This representation neatly allows us to avoid awkward reasoning about potential overlap between memory segments. It also works well with the separation logic we use later, in Sect. 4.2, to map the abstract heaps to their concrete memory representations.
- Section 3.4 defines the main correctness property, gc_related, that any garbage collector must satisfy: for every pointer traversal that exists in the original heap from some root, there must be a similar pointer traversal possible in the new heap.
- A generational collector can run either a partial collection, which collects only some part of the heap, or a full collection of the entire heap. We show that the full collection satisfies gc_related. To show that a run of the partial collector also satisfies gc_related, we exploit a simulation argument that allows us to reuse the proofs for the full collector. Intuitively, a run of the partial collector on a heap segment h simulates a run of the full collector on a heap containing only h. Section 3.4 provides some details on this.

Implementation and Integration into the CakeML Compiler:
- The CakeML compiler goes through several intermediate languages on the way from source syntax to machine code.

Algorithm Modelling and Verification
Garbage collectors are complicated pieces of code. As such, it makes sense to separate the reasoning about algorithm correctness from the reasoning about the details of its more concrete implementations. Such a split also makes the algorithm proofs more reusable than proofs that depend on implementation details. This section focuses on the algorithm level.

Intuition for Basic Algorithm
Intuitively, a Cheney copying garbage collector copies the live elements from the current heap into a new heap. We will call the heaps old and new. In its simplest form, the algorithm keeps track of two boundaries inside the new heap. These split the new heap into three parts, which we will call h1, h2, and unused space. Throughout execution, the heap segment h1 will only contain pointers to the new heap, and heap segment h2 will only contain pointers to the old heap, i.e. pointers that are yet to be processed. The algorithm's most primitive operation is to move a pointer ptr, and the data element d that ptr points at, from the old heap to the new one. The move primitive's behaviour depends on whether d is a forward pointer or not. A forward pointer is a heap element with a special tag to distinguish it from other heap elements. Forward pointers only ever occur in the heap if the garbage collector puts them there; between collection cycles, they are neither present nor created.
If d is not a forward pointer, then d will be copied to the end of heap segment h2, consuming some of the unused space, and ptr is updated to be the address of the new location of d. A forward pointer to the new location is inserted at the old location of d, namely at the original value of ptr. We draw forward pointers as hollow boxes with dashed arrows illustrating where they point. Solid arrows that are irrelevant for the example are omitted in these diagrams. If d is already a forward pointer, the move primitive knows that this element has been moved previously; it reads the new pointer value from the forward pointer, and leaves the memory unchanged.
The algorithm starts from a state where the new heap consists of only free space. It then runs the move primitive on each pointer in the list of roots. This processing of the roots populates h2.
Once the roots have been processed, the main loop starts. The main loop picks the first heap element from h2 and applies the move primitive to each of the pointers that that heap element contains. Once the pointers have been updated, the boundary between h1 and h2 can be moved, so that the recently processed element becomes part of h1. This process is repeated until h2 becomes empty, and the new heap contains no pointers to the old heap. The old heap can then be discarded, since it only contains data that is unreachable from the roots. The next time the garbage collector runs, the previous old heap is used as the new heap.
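The Cheney procedure described above can be sketched in executable form. This is a toy Python model under our own assumptions, not the HOL definitions: addresses are list indices (so every element occupies exactly one slot), fields are tagged tuples, and the old heap is mutated in place to hold the forward pointers.

```python
from typing import List, Tuple

# Hypothetical toy representation (not CakeML's): a field is ("ptr", index) or
# ("int", n); a heap element is ("data", [fields]) or ("fwd", new_index).
# Addresses are list indices, so every element occupies exactly one slot.
Field = Tuple[str, int]


def cheney_collect(old: list, roots: List[Field]) -> Tuple[list, List[Field]]:
    """Copy everything reachable from roots out of `old`; return (new heap, new roots)."""
    new: list = []  # new[:scan] is h1, new[scan:] is h2; free space is implicit

    def move(field: Field) -> Field:
        kind, val = field
        if kind != "ptr":
            return field                  # non-pointers are left unchanged
        d = old[val]
        if d[0] == "fwd":
            return ("ptr", d[1])          # already moved: follow the forward pointer
        new.append(d)                     # copy d to the end of h2
        new_addr = len(new) - 1
        old[val] = ("fwd", new_addr)      # forward pointer at d's old location
        return ("ptr", new_addr)

    new_roots = [move(r) for r in roots]  # processing the roots populates h2
    scan = 0
    while scan < len(new):                # main loop: repeat until h2 is empty
        tag, fields = new[scan]
        new[scan] = (tag, [move(f) for f in fields])
        scan += 1                         # the processed element joins h1
    return new, new_roots
```

For instance, collecting `[("data", [("int", 1)]), ("data", [("ptr", 0), ("ptr", 0)]), ("data", [("int", 99)])]` from the single root `("ptr", 1)` drops the unreachable third element, and both fields of the copied root end up pointing at one shared copy of element 0.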

Intuition for Generational Algorithm
Generational garbage collectors attempt to run the collector only on part of the heap. The motivation is that new data tends to be short-lived while old data tends to stay live. By running the collector on new data only, one avoids copying around old data unnecessarily.
The intuition is that a partial collection focuses on a small segment of the full heap and ignores the rest, but operates as a normal full collection on this small segment.

[Diagram: three panels showing the old and new heaps. 1. Partial collection pretends that a small part is the entire heap. 2. The collector operates as normal on that part of the heap. 3. Finally, the external new segment is copied back.]

For the partial collection to work we need:
(a) the partial algorithm to treat all pointers to the outside (old data) as non-pointers, in order to avoid copying old data into its new memory region;
(b) that outside data does not point into the currently collected segment of the heap, because the partial collector should be free to move around and delete elements in the segment it is working on without looking at the heap outside.
In ML programs, most data is immutable, which means that old data cannot point at new data.However, ML programs also use references and arrays (henceforth both will be called references) that are mutable.References are usually used sparingly, but are dangerous for a generational garbage collector because they can point into the new data from old data.
Our pragmatic solution is to make sure immutable data is allocated from the bottom of the heap upwards, and references are allocated from the top downwards, i.e. the memory layout consists of immutable data at the bottom, unused space in the middle, and references at the top. The layout diagram also shows that we use a GC trigger pointer, which causes a GC invocation whenever one attempts to allocate past the GC trigger pointer. We modify the simple garbage collection algorithm described above to maintain this layout, and we make each run of the partial collection algorithm treat the references as roots that are not part of the heap. This way we can meet the two requirements (a) and (b) from above.
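To make the layout concrete, here is a hypothetical Python model of the allocator side. The class name GenHeap, the position of the trigger, and the rule that reference allocation also stops at the trigger are our assumptions; the text above only fixes the two directions of growth and the fact that allocating past the trigger invokes the GC.

```python
from typing import Optional


class GenHeap:
    """Toy word-indexed heap with the generational layout (hypothetical sketch).

    Immutable data grows upwards from 0 and references grow downwards from
    `size`. An allocation that would cross the GC trigger fails (returns None),
    which is where a real runtime would invoke the garbage collector.
    """

    def __init__(self, size: int, trigger: int) -> None:
        assert 0 <= trigger <= size
        self.size = size
        self.alloc_ptr = 0        # first free word above the immutable data
        self.refs_start = size    # lowest word currently used by references
        self.trigger = trigger    # assumed position of the GC trigger pointer

    def alloc_immutable(self, n: int) -> Optional[int]:
        if self.alloc_ptr + n > self.trigger:
            return None           # would allocate past the trigger: run the GC
        addr = self.alloc_ptr
        self.alloc_ptr += n
        return addr

    def alloc_ref(self, n: int) -> Optional[int]:
        if self.refs_start - n < self.trigger:
            return None           # assumption: references may not cross it either
        self.refs_start -= n
        return self.refs_start
```

Keeping the two kinds of data at opposite ends means every reference lives in one contiguous block, which is what lets a partial collection treat that block as a set of extra roots.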
Our approach means that references will never be collected by a partial collection.However, they will be collected when the full collection is run.
Full collections happen if there is a possibility that the partial collector might fail to free up enough space, i.e. if the amount of unused space prior to collection is less than the amount of new memory requested.Note that there is no heuristic involved here: if there is enough space for the allocation between the GC trigger pointer and the actual end of the heap, then a partial collection is performed.

Formalisation
The algorithm-level formalisation represents heaps abstractly as lists, where each element is of type heap_element. The definition of heap_element is intentionally somewhat abstract, with type variables. We use this flexibility to verify the partial collector for our generational version, in the next section.
Addresses are of type heap_address and can be either an actual pointer with some data attached, or a non-pointer Data. A heap element can be unused space, a forward pointer, or actual data.
Each heap element carries its concrete length, i.e. how many machine words the eventual memory representation will hold. The length function, el_length, returns the element's length field l plus one because we do not allow heap elements of length zero.
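Readers unfamiliar with HOL may find a Python transliteration of the two datatypes helpful. HOL's type variables become plain `object` fields, and the constructor field names and their order (`ptr`, `data`, `length`, `fields`) are our assumptions, since the HOL definitions are not reproduced here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Pointer:
    ptr: int        # word offset from the start of the relevant heap
    data: object    # information attached to the pointer


@dataclass(frozen=True)
class Data:
    value: object   # a non-pointer


HeapAddress = Union[Pointer, Data]


@dataclass(frozen=True)
class Unused:
    length: int     # unused space covering length + 1 words


@dataclass(frozen=True)
class ForwardPointer:
    ptr: int        # new location of the moved element
    data: object
    length: int     # length field of the element that was moved


@dataclass(frozen=True)
class DataElement:
    fields: Tuple[HeapAddress, ...]  # addresses stored in the element
    length: int
    data: object                     # e.g. a constructor tag


HeapElement = Union[Unused, ForwardPointer, DataElement]


def el_length(e: HeapElement) -> int:
    # The stored length is one less than the real size, so no element has size 0.
    return e.length + 1
```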
The natural number (type num in HOL) in Pointer values is an offset from the start of the relevant heap. We define a lookup function heap_lookup that fetches the content of address a from a heap xs.
The generational garbage collector has two main routines: gen_gc_full, which runs a collection on the entire heap including the references, and gen_gc_partial, which runs only on part of the heap, treating the references as extra roots. Both use the record type gc_state to represent the heaps. In a state s, the old heap is in s.heap, and the new heap comprises the following fields: s.h1 and s.h2 are the heap segments h1 and h2 from before, s.n is the length of the unused space, and s.r2, s.r1 are for references what s.h1 and s.h2 are for immutable data. Finally, s.ok is a boolean representing whether s is a well-formed state that has been arrived at through a well-behaved execution. It has no impact on the behaviour of the garbage collector; its only use is in proofs, where it serves as a convenient trick to propagate invariants downwards in refinement proofs.
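Because addresses are word offsets and each element spans el_length words, a lookup can walk the list while subtracting element lengths. The Python sketch below shows one plausible heap_lookup along these lines, together with a gc_state-like record. `Elem` is a hypothetical stand-in for any heap element; the exact HOL definitions are not shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Elem:
    """Hypothetical stand-in for any heap element; it occupies length + 1 words."""
    kind: str         # "Unused", "ForwardPointer" or "DataElement"
    length: int


def el_length(e: Elem) -> int:
    return e.length + 1


def heap_lookup(a: int, xs: List[Elem]) -> Optional[Elem]:
    """Return the element that starts exactly at word offset a, or None."""
    for x in xs:
        if a == 0:
            return x
        if a < el_length(x):
            return None   # a points into the middle of x
        a -= el_length(x)
    return None           # a is past the end of the heap


@dataclass
class GcState:
    heap: List[Elem]                              # the old heap
    h1: List[Elem] = field(default_factory=list)  # processed immutable data
    h2: List[Elem] = field(default_factory=list)  # copied, not yet processed
    n: int = 0                                    # length of the unused space
    r2: List[Elem] = field(default_factory=list)  # plays the role of h1 for references
    r1: List[Elem] = field(default_factory=list)  # plays the role of h2 for references
    ok: bool = True                               # proof-only well-formedness flag
```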
Figure 1 shows the HOL function implementing the move primitive for the partial generational algorithm. It follows the informal description in the section above: it does nothing when applied to a non-pointer, or to a pointer that points outside the current generation. When applied to a pointer to a forward pointer, it follows the forward pointer but leaves the heap unchanged. When applied to a pointer to some data element d, it inserts d at the end of h2, decrements the amount of unused space by the length of d, and inserts at the old location of d a forward pointer to its new location. When applied to an invalid pointer (i.e. to an invalid heap location, or to a location containing unused space), it does nothing except set the ok field of the resulting state to false; we prove later that this never happens. The HOL function gen_gc_full_move implements the move primitive for the full generational collection; its definition is elided for space reasons. It is similar to gen_gc_partial_move, but differs in two main ways. First, it does not consider generation boundaries. Second, in order to maintain the memory layout, it must distinguish between pointers to references and pointers to immutable data, allocating references at the end of the new heap's unused space and immutable data at the beginning. Note that gen_gc_partial_move does not need to consider pointers to references, since generations are entirely contained in the immutable part of the heap.
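The case analysis of gen_gc_partial_move can be sketched as follows. This is our own Python rendering, not the HOL function of Fig. 1. The helper `heap_replace`, the state field `a` (the address at which the next copied element lands), and the `Conf` boundaries `gen_start` and `refs_start` are assumptions, with the boundaries borrowed from the to_gen_heap_address definition.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pointer:
    ptr: int
    data: object

@dataclass(frozen=True)
class Data:
    value: object

@dataclass(frozen=True)
class ForwardPointer:
    ptr: int
    data: object
    length: int

@dataclass(frozen=True)
class DataElement:
    fields: tuple
    length: int
    data: object

def el_length(e) -> int:
    return e.length + 1

def heap_lookup(a: int, xs: list):
    for x in xs:
        if a == 0:
            return x
        if a < el_length(x):
            return None
        a -= el_length(x)
    return None

def heap_replace(a: int, y, xs: list) -> list:
    """Hypothetical helper: replace the element starting at offset a by y."""
    out, off = [], 0
    for x in xs:
        out.append(y if off == a else x)
        off += el_length(x)
    return out

@dataclass(frozen=True)
class Conf:
    gen_start: int    # first word of the current generation
    refs_start: int   # first word of the reference area

@dataclass(frozen=True)
class State:
    heap: tuple       # old heap
    h2: tuple         # copied but not yet processed elements
    n: int            # unused space, in words
    a: int            # assumed: address of the next copied element
    ok: bool = True

def gen_gc_partial_move(conf: Conf, s: State, x):
    if isinstance(x, Data):
        return x, s                                  # non-pointer: do nothing
    if x.ptr < conf.gen_start or conf.refs_start <= x.ptr:
        return x, s                                  # outside the current generation
    e = heap_lookup(x.ptr, list(s.heap))
    if isinstance(e, ForwardPointer):
        return Pointer(e.ptr, x.data), s             # follow it, heap unchanged
    if isinstance(e, DataElement):
        fwd = ForwardPointer(s.a, x.data, e.length)  # same length as e
        s1 = replace(s, heap=tuple(heap_replace(x.ptr, fwd, list(s.heap))),
                     h2=s.h2 + (e,), n=s.n - el_length(e), a=s.a + el_length(e))
        return Pointer(s.a, x.data), s1
    return x, replace(s, ok=False)                  # invalid pointer
```

Returning a fresh state instead of mutating mirrors the functional style of the HOL definition, which makes it easy to compare states before and after a move.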
The algorithms for an entire collection cycle consist of several HOL functions in a similar style; the functions implementing the move primitive are the most interesting of these.The main responsibility of the others is to apply the move primitive to relevant roots and heap elements, following the informal explanations in previous sections.

Verification
For each collector (gen_gc_full and gen_gc_partial), we prove that it does not lose any live elements. We formalise this notion with the gc_related predicate: if a collector can produce heap2 from heap1, there must be a map f such that gc_related f heap1 heap2. The intuition is that if there was a heap element at address a in heap1 that was retained by the collector, the same heap element resides at address f a in heap2.
The conjuncts of the definition state, respectively: that f must be an injective map into the set of valid addresses in heap2; that its domain must be a subset of the valid addresses into heap1; and that for every data element d at address a ∈ domain f, every address reachable from d is also in the domain of f, and f a points to a data element that is exactly d with all its pointers updated according to f. Separately, we require that the roots are in domain f.
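For finite heaps, gc_related is decidable, so a small checker is possible. The sketch below follows the three conjuncts described above. Representing f as a Python dict and rewriting pointers with an addr_apply-style helper are our choices. The separate requirement that the roots lie in the domain of f is not checked here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Pointer:
    ptr: int
    data: object


@dataclass(frozen=True)
class Data:
    value: object


@dataclass(frozen=True)
class DataElement:
    fields: tuple
    length: int
    data: object


def heap_lookup(a, xs):
    for x in xs:
        if a == 0:
            return x
        if a < x.length + 1:
            return None
        a -= x.length + 1
    return None


def addr_apply(f: Dict[int, int], x):
    return Pointer(f[x.ptr], x.data) if isinstance(x, Pointer) else x


def gc_related(f: Dict[int, int], heap1: list, heap2: list) -> bool:
    def is_data(a, heap):
        return isinstance(heap_lookup(a, heap), DataElement)

    # f is injective and maps into the data elements of heap2
    if len(set(f.values())) != len(f) or not all(is_data(b, heap2) for b in f.values()):
        return False
    # the domain of f consists of data elements of heap1
    if not all(is_data(a, heap1) for a in f):
        return False
    for a, b in f.items():
        d = heap_lookup(a, heap1)
        # everything d points to must also be retained (checked before rewriting)
        if any(isinstance(x, Pointer) and x.ptr not in f for x in d.fields):
            return False
        # f a holds exactly d, with its pointers updated according to f
        moved = DataElement(tuple(addr_apply(f, x) for x in d.fields), d.length, d.data)
        if heap_lookup(b, heap2) != moved:
            return False
    return True
```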
Proving a gc_related correctness result for gen_gc_full is a substantial task that requires a non-trivial invariant, similar to the one we presented in earlier work [10]. We do not give further details of the proof of the main correctness theorem in this paper; for such proofs, see [10]. The theorem can be read as saying: if all roots are pointers to data elements in the heap (abbreviated roots_ok), if the heap has length conf.limit, and if all pointers in the heap are valid non-forward pointers back into the heap (abbreviated heap_ok), then a call to gen_gc_full results in a state that is gc_related via a mapping f whose domain includes the roots (and hence, by definition of gc_related, all live elements).
The more interesting part is the verification of gen_gc_partial, which we conduct by drawing a formal analogy between how gen_gc_full operates and how gen_gc_partial operates on a small piece of the heap. The proof is structured in two steps:
1. We first prove a simulation result: running gen_gc_partial is the same as running gen_gc_full on a state that has been modified to pretend that part of the heap is not there and that the references are extra roots.
2. We then show a gc_related result for gen_gc_partial by carrying over the same result for gen_gc_full via the simulation result.
For the simulation result, we instantiate the type variables in the gen_gc_full algorithm so that we can embed pointers into Data blocks. The idea is that encoding pointers to locations outside the current generation as Data causes gen_gc_full to treat them as non-pointers, mimicking the fact that gen_gc_partial does not collect there.
The type we use for this purpose is (α, β) data_sort = Protected α | Real β: the translation to_gen_heap_address wraps pointers outside the current generation as Data (Protected ...) and rebases the remaining pointers to the start of the generation. Similar to_gen functions, elided here, encode the roots, heap, state and configuration for a run of gen_gc_partial into those for a run of gen_gc_full. We prove that for every execution of gen_gc_partial starting from an ok state, and the corresponding execution of gen_gc_full starting from the encoding of the same state through the to_gen functions, encoding the results of the former with to_gen yields precisely the results of the latter. Initially, we attempted the gc_related proof for gen_gc_partial via the obvious route of manually adapting all loop invariants and proofs for gen_gc_full into invariants and proofs for gen_gc_partial. This soon turned out to be overly cumbersome; hence we switched to the current approach, which seemed more expedient and more interesting. As a result, the proofs for gen_gc_partial are more concerned with syntactic properties of the encoding than with semantic properties of the collector as such. The syntactic arguments are occasionally quite tedious, but we believe this approach still leads to more understandable and less repetitive proofs.
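The address encoding used in the simulation argument can be sketched in Python. `to_gen_heap_address` follows the definition in the text: pointers below `conf.gen_start` or at or above `conf.refs_start` become `Data (Protected ...)`, and the rest are rebased by `gen_start` with their data wrapped in `Real`. The inverse `from_gen_heap_address` and the `GenConf` container are hypothetical additions. They are included only to check that the encoding loses no information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    ptr: int
    data: object


@dataclass(frozen=True)
class Data:
    value: object


@dataclass(frozen=True)
class Protected:      # a pointer that gen_gc_full must treat as a non-pointer
    value: object


@dataclass(frozen=True)
class Real:           # genuine data, passed through unchanged
    value: object


@dataclass(frozen=True)
class GenConf:        # hypothetical container for the two generation boundaries
    gen_start: int
    refs_start: int


def to_gen_heap_address(conf: GenConf, x):
    if isinstance(x, Data):
        return Data(Real(x.value))
    if x.ptr < conf.gen_start or conf.refs_start <= x.ptr:
        return Data(Protected(x))                         # outside the generation
    return Pointer(x.ptr - conf.gen_start, Real(x.data))  # rebase into the pretend-heap


def from_gen_heap_address(conf: GenConf, y):
    """Hypothetical inverse of to_gen_heap_address."""
    if isinstance(y, Pointer):
        return Pointer(y.ptr + conf.gen_start, y.data.value)
    inner = y.value
    return inner.value if isinstance(inner, Protected) else Data(inner.value)
```

The Protected pointers are carried through the full collector untouched, because to gen_gc_full they are ordinary Data.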
Finally, note that gc_related is the same correctness property that we use for the previous copying collector; this makes it straightforward to prove that the top-level correctness theorem of the CakeML compiler remains true if we swap out the garbage collector.

Combining the Partial and Full Collectors
An implementation that uses the generational collector will mostly run the partial collector and occasionally the full one. At the algorithm level, we define a combined collector and leave it up to the implementation to decide when a partial collection is to be run. The choice is made visible to the implementation through a boolean input, do_partial, to the combined function. The combined function produces a valid heap regardless of the value of do_partial.
Our CakeML implementation (next section) runs a partial collection if the allocation will succeed even if the collector does not manage to free up any space, i.e. if there is already enough space on the other side of the GC trigger pointer before the GC starts (Sect. 3.2).
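The decision rule and the combined entry point fit in a few lines. In this hypothetical sketch, we assume the usable free space ends where the reference area starts (`refs_start`), and the function names are ours. Because the rule needs only a subtraction and a comparison, the choice costs nothing at collection time.

```python
from typing import Callable, TypeVar

H = TypeVar("H")


def should_run_partial(gc_trigger: int, refs_start: int, request: int) -> bool:
    """Partial collection is chosen only if the request already fits beyond the
    GC trigger, so allocation succeeds even if the partial GC frees nothing.
    Assumption: usable space ends where the reference area starts."""
    return refs_start - gc_trigger >= request


def combined_gc(do_partial: bool, partial: Callable[[H], H],
                full: Callable[[H], H], heap: H) -> H:
    # Both branches produce a valid heap; do_partial only selects which one runs.
    return partial(heap) if do_partial else full(heap)
```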

Implementation and Integration into CakeML Compiler
The concept of garbage collection is introduced in the CakeML compiler at the point where a language with unbounded memory (DataLang) is compiled into a language with a concrete finite memory (WordLang).Here the garbage collector's role is to automate memory deallocation and to implement the illusion of an unbounded memory.
This section sketches how the collector algorithm's types get instantiated, how the data refinement is specified, and how an implementation of the garbage collector algorithm is verified.

Instantiating the Algorithm's Types
The language which comes immediately prior to the introduction of the garbage collector, DataLang, stores values of type v in its variables.
DataLang gets compiled into a language called WordLang where memory is finite and variables are of type word_loc. A word_loc is either a machine word Word w, or a code location Loc l1 l2.
In what follows, we show through an example how an instance of v is represented. We would have liked to provide more detail, but the definitions involved are simply too verbose to be included here. Our running example is a DataLang Block with tag 3 whose two fields are the integers 5 and 80000000000000. The relation v_inv specifies how values of type v relate to the heap addresses and heaps that the garbage collection algorithms operate on. Consider the Number case of the definition of v_inv. If integer i is small enough to fit into a tagged machine word, then the heap address x must be Data that carries the value of the small integer, and there is no requirement on the heap. If integer i is too large to fit into a machine word, then the heap address must be a Pointer to a heap location containing the data for the bignum representing integer i. In the definition of v_inv, f is a finite map that specifies how semantic location values for reference pointers (RefPtr) are to be represented as addresses.
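As an illustration of the Number case, the Python sketch below chooses between the Data and Pointer representations on a 32-bit machine. The number of reserved tag bits (2) and the bignum layout (a sign flag plus little-endian 32-bit limbs) are assumptions made for illustration. They are not CakeML's actual encoding.

```python
from typing import List, Tuple


def bignum_words(i: int, word_bits: int) -> Tuple[bool, List[int]]:
    """Hypothetical bignum layout: sign flag plus little-endian magnitude limbs."""
    mag, limbs = abs(i), []
    while mag:
        limbs.append(mag & ((1 << word_bits) - 1))
        mag >>= word_bits
    return i < 0, limbs


def number_repr(i: int, word_bits: int = 32, tag_bits: int = 2):
    """Sketch of v_inv's Number case for `word_bits`-bit machine words.

    Assumption: `tag_bits` low bits are reserved, so a small integer must fit
    in the remaining signed bits to be stored directly in the address as Data.
    """
    limit = 1 << (word_bits - tag_bits - 1)
    if -limit <= i < limit:
        return ("Data", i)                           # no requirement on the heap
    return ("Pointer", bignum_words(i, word_bits))   # heap element holding the bignum
```

Under these assumptions, 5 from the running example is stored directly, while 80000000000000 needs two 32-bit limbs on the heap.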
The Block case below shows how constructors and tuples, Blocks, are represented.
When v_inv is expanded for our running example, we get the following constraint on the heap: the address x must be a pointer to a DataElement which contains Data representing integer 5, and a pointer to some memory location which contains the machine words representing bignum 80000000000000.
Here we assume that the architecture has 32-bit machine words. The first Pointer carries information, ptr_bits conf 3 2, about the length, 2, and tag, 3, of the Block that it points to. Such information is used to speed up pattern matching: if the information fits into the lower bits of the pointer, then the pattern matcher does not need to follow the pointer to know whether there is a match. The following is an instantiation of heap that satisfies the constraint set out by v_inv for representing our running example. The garbage collector moves heap elements and changes their addresses; however, it only transforms heaps in a way that respects gc_related. We prove that v_inv properties can be transported from one heap to another if they are gc_related. In other words, execution of a garbage collector does not interfere with this data representation.
Here addr_apply f (Pointer x d) = Pointer (f x) d.
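The pattern-matching shortcut given by ptr_bits conf 3 2 on the running example's first Pointer can be illustrated with a hypothetical bit layout. The field widths and the use of bit 0 as an "information present" flag are our assumptions, since the exact output of ptr_bits is not specified here.

```python
from typing import Optional


def ptr_bits(tag: int, length: int, tag_bits: int = 3, len_bits: int = 2) -> int:
    """Hypothetical encoding of a Block's tag and length into spare pointer bits.

    Bit 0 flags that information is present; the next len_bits hold the length
    and the following tag_bits hold the tag. Returns 0 (no information) when
    either value does not fit.
    """
    if tag >= (1 << tag_bits) or length >= (1 << len_bits):
        return 0
    return 1 | (length << 1) | (tag << (1 + len_bits))


def quick_match(bits: int, tag: int, length: int, len_bits: int = 2) -> Optional[bool]:
    """Decide a constructor pattern from the pointer alone; None means 'follow it'."""
    if bits & 1 == 0:
        return None
    stored_len = (bits >> 1) & ((1 << len_bits) - 1)
    stored_tag = bits >> (1 + len_bits)
    return stored_len == length and stored_tag == tag
```

The fallback value 0 keeps the scheme safe: blocks with large tags or lengths still match correctly, just by following the pointer.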

Data Refinement down to Concrete Memory
The relation provided by v_inv only gets us halfway down to WordLang's memory representation. In WordLang, values are of type word_loc, and memory is modelled as a function, α word → α word_loc, together with an address domain set. We use separation-logic formulas to specify how lists of heap elements are represented in memory. We define separating conjunction *, and use fun2set to turn the memory function m and its domain set dm into something we can write separation logic assertions about. Using these, we define word_heap a heap conf to assert that a heap element list heap is in memory, starting at address a; word_el asserts the same thing about individual heap elements. Figure 2 shows an expansion of the word_heap assertion applied to our running example.
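Separating conjunction has a direct executable reading on finite memories: (p * q) holds of a set of address-value pairs if the set splits into two disjoint parts satisfying p and q respectively. The Python sketch below implements fun2set, *, and a word_heap-style assertion by brute-force splitting, which is exponential and therefore only suitable for tiny examples. Giving each heap element directly as its list of words, and assuming 4-byte words, are our simplifications; the real word_el encoding depends on conf.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Heaplet = FrozenSet[Tuple[int, object]]   # set of (address, contents) pairs
Assertion = Callable[[Heaplet], bool]
BYTES_PER_WORD = 4                        # assumption: 32-bit words, byte addresses


def fun2set(m: Dict[int, object], dm: Iterable[int]) -> Heaplet:
    return frozenset((a, m[a]) for a in dm)


def emp(s: Heaplet) -> bool:
    return not s


def star(p: Assertion, q: Assertion) -> Assertion:
    """(p * q) s holds iff s splits into disjoint u and s - u with p u and q (s - u)."""
    def pq(s: Heaplet) -> bool:
        items = list(s)
        for k in range(len(items) + 1):
            for part in combinations(items, k):
                u = frozenset(part)
                if p(u) and q(s - u):
                    return True
        return False
    return pq


def word_el(a: int, words: List[object]) -> Assertion:
    # Hypothetical: a heap element is given directly as its list of machine words.
    footprint = frozenset((a + BYTES_PER_WORD * i, w) for i, w in enumerate(words))
    return lambda s: s == footprint


def word_heap(a: int, heap: List[List[object]]) -> Assertion:
    if not heap:
        return emp
    rest = word_heap(a + BYTES_PER_WORD * len(heap[0]), heap[1:])
    return star(word_el(a, heap[0]), rest)
```

For example, `word_heap(0, [[11, 22], [33]])` holds of `fun2set({0: 11, 4: 22, 8: 33}, {0, 4, 8})`. It fails if the domain contains any extra address, because `*` requires the whole footprint to be accounted for.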

Implementing the Garbage Collector
The garbage collector is used in the WordLang semantics as a function that the semantics of Alloc applies to memory when the allocation primitive runs out of memory.At this level, the garbage collector is essentially a function from a list of roots and a concrete memory to a new list of roots and concrete memory.
To implement the new garbage collector, we define a HOL function at the level of a concrete memory, and prove that it correctly mimics the operations performed by the algorithm-level implementation from Sect. 3. The following is an excerpt of the theorem relating gen_gc_partial_move with its refinement word_gen_gc_partial_move. It states that the concrete memory is kept faithful to the algorithm's operations over the heaps. We prove similar theorems about the other components of the garbage collectors.

Discussion of Related Work
Anand et al. [1] report that the CertiCoq project has a "high-performance generational garbage collector" and a project is underway to verify this using Verifiable C in Coq. Their setting is simpler than ours in that their programs are purely functional, i.e. they can avoid dealing with the added complexity of mutable state. The text also suggests that their garbage collector is specific to a fixed data representation. In contrast, the CakeML compiler allows a highly configurable data representation, which is likely to become more configurable in the future. The CakeML compiler generates a new garbage collector implementation for each configuration of the data representation.
CakeML's original non-generational copying collector has its origin in the verified collector described in Myreen [10].The same verified algorithm was used for a verified Lisp implementation [11] which in turn was used underneath the proved-to-be-sound Milawa prover [2].These Lisp and ML implementations are amongst the very few systems that use verified garbage collectors as mere components of much larger verified implementations.Verve OS [16] and Ironclad Apps [7] are verified stacks that use verified garbage collectors internally.
Numerous abstract garbage collector algorithms have been mechanically verified before. However, most of these only verify correctness of an algorithm-level implementation and only consider mark-and-sweep algorithms. Noteworthy exceptions include Hawblitzel and Petrank [8] and McCreight [9]; recent work by Gammie et al. [4] is also particularly impressive.
Hawblitzel and Petrank [8] show that performant verified x86 code for simple mark-and-sweep and Cheney copying collectors can be developed using the Boogie verification condition generator and the Z3 automated theorem prover.
Their method requires the user to write extensive annotations in the code to be verified. These annotations are automatically checked by the tools. Their collector implementations are realistic enough to show good results on off-the-shelf C# benchmarks. This required them to support complicated features such as interior pointers, which CakeML's collector does not support. We decided not to support interior pointers in CakeML because they are not strictly needed and they would make the inner loop of the collector a bit more complicated, which would probably cause the inner loop to run a little slower.
McCreight [9] verifies copying and incremental collectors implemented in MIPS-like assembly. The development is done in Coq and casts the verification efforts in a common framework based on ADTs that all the collectors refine.
Gammie et al. [4] verify a detailed model of a state-of-the-art concurrent collector in Isabelle/HOL, with respect to an x86-TSO memory model.
Pavlovic et al. [13] focus on an earlier step, namely the synthesis of concurrent collection algorithms from abstract specifications. The algorithms thus obtained are at a similar level of abstraction to the algorithm-level implementation we start from. The specifications are cast in lattice-theoretic terms, so, e.g., computing the set of live nodes is a fixpoint iteration over a function that follows pointers from an element. A main contribution is an adaptation of the classic fixpoint theorems to a setting where the monotone function under consideration may change, which can be thought of as representing interference by mutators.
This paper started by listing incremental, generational, and concurrent as variations on the basic garbage collection algorithms. There have been prior verifications of incremental algorithms (e.g. [6, 9, 12, 14]) and concurrent ones (e.g. [3, 4, 5, 13]), but we believe that this paper is the first to report on a successful verification of a generational garbage collector.

Summary
This paper describes how a generational copying garbage collector has been proved correct and integrated into the verified CakeML compiler. The algorithm-level part of the proof is structured to follow the usual informal argument for a generational collector's correctness: a partial collection is the same as running a full collection on part of the heap if pointers to old data are treated as non-pointers. To the best of our knowledge, this paper is the first to report on a completed formal verification of a generational garbage collector.
What We Did Not Do.
The current implementation lacks support for (a) nested nursery generations, and (b) the ability to switch garbage collector mode (e.g. from non-generational to generational, or to adjust the size of the nursery) midway through execution of the application program. We expect both extensions to fit within the approach taken in this paper and neither to require modification of the algorithm-level proofs. For (a), one would keep track of multiple nursery starting points in the immutable part of the heap. These parts are left untouched by collections of the inner nursery generations. For (b), one could run a full generational collection to introduce the special heap layout when necessary. This is possible since the correctness theorem for gen_gc_full does not assume that the references are at the top end of the heap when it starts.
[Diagram: current memory layout, from bottom to top: immutable data, unused space, references. Labels: GC trigger; start of nursery gen.; relevant part for the next partial collection; used as extra roots by partial collections.]

Fig. 1. The algorithm implementation of the move primitive for gen_gc_partial.
(α, β) data_sort = Protected α | Real β

and the translation from gen_gc_partial's pointers to pointers on the pretend-heap used by gen_gc_full in the simulation argument is:

  to_gen_heap_address conf (Data a) = Data (Real a)
  to_gen_heap_address conf (Pointer ptr a) =
    if ptr < conf.gen_start then Data (Protected (Pointer ptr a))
    else if conf.refs_start ≤ ptr then Data (Protected (Pointer ptr a))
    else Pointer (ptr − conf.gen_start) (Real a)

Fig. 2. Running example expanded to concrete memory assertion.

  gen_gc_partial_move gc_conf s x = (x1, s1) ∧
  word_gen_gc_partial_move conf (word_addr conf x, ...) = (w, ...) ∧ ... ∧
  (word_heap a s.heap conf * word_heap p s.h2 conf * ...) (fun2set (m, dm))
  ⇒
  w = word_addr conf x1 ∧ ... ∧
  (word_heap a s1.heap conf * word_heap p1 s1.h2 conf * ...) (fun2set (m1, dm))

For the verification of the DataLang to WordLang compiler, we also specify how each instantiation of the algorithm-level heap types maps into WordLang ...

The heap is the region of memory where heap elements are allocated and which is to be garbage collected. A heap element is the unit of memory allocation. A heap element can contain pointers to other heap elements. The collection of all program-visible variables is called the roots.
The garbage collector is introduced gradually in the intermediate languages DataLang (abstract data), WordLang (machine words, concrete memory, but abstract stack) and StackLang (more concrete stack).
- The verification of the compiler phase from DataLang to WordLang specifies how abstract values of DataLang are mapped to instantiations of the heap types that the algorithm-level garbage collection operates over (Sect. 4.1). We prove that gc_related implies that, from DataLang's point of view, nothing changes when a garbage collector is run.

  roots_ok roots heap ∧ heap_ok heap conf.limit ⇒ ∃state f. ...