Incremental λ-Calculus in Cache-Transfer Style: Static Memoization by Program Transformation

Abstract. Incremental computation requires propagating changes and reusing intermediate results of base computations. Derivatives, as produced by static differentiation [7], propagate changes but do not reuse intermediate results, leading to wasteful recomputation. As a solution, we introduce conversion to Cache-Transfer Style (CTS), an additional program transformation producing purely incremental functional programs that create and maintain nested tuples of intermediate results. To prove CTS conversion correct, we extend the correctness proof of static differentiation from STLC to untyped λ-calculus via step-indexed logical relations, and prove the additional transformation sound via simulation theorems. To show that ILC-based languages can improve performance relative to from-scratch recomputation, and that CTS conversion can extend its applicability, we perform an initial performance case study. We provide derivatives of primitives for operations on collections and incrementalize selected example programs using those primitives, confirming expected asymptotic speedups.


Introduction
After computing a base output from some base input, we often need to produce updated outputs corresponding to updated inputs. Instead of rerunning the same base program on the updated input, incremental computation transforms the input change to an output change, potentially reducing asymptotic time complexity and significantly improving efficiency, especially for computations running on large data sets.
Incremental λ-Calculus (ILC) is a recently introduced framework [7] for higher-order incremental computation. ILC represents changes from a base value v1 to an updated value v2 as a first-class change value dv. Since functions are first-class values, change values include function changes.
ILC also statically transforms base programs to incremental programs, or derivatives: functions mapping input changes to output changes. Incremental language designers can then provide their language with (higher-order) primitives, together with their derivatives, that efficiently encapsulate incrementalizable computation skeletons (such as tree-shaped folds), and ILC will incrementalize higher-order programs written in terms of these primitives.
Alas, ILC efficiently incrementalizes only self-maintainable computations [7, Sec. 4.3], that is, computations whose output changes can be computed using only input changes, but not the inputs themselves [11]. Few computations are self-maintainable: for instance, mapping self-maintainable functions over a sequence is self-maintainable, but dividing numbers is not! Cai et al. [7] already hint at this problem; we elaborate on it in Sec. 2.1. In this paper, we extend ILC to non-self-maintainable computations. To this end, we must enable derivatives to reuse intermediate results created by the base computation.
Many incrementalization approaches rely on some form of dynamic memoization to remember intermediate results: they typically use hashtables to memoize function results, or dynamic dependence graphs [1] to remember the computation trace. However, looking up intermediate results in such dynamic data structures has a time cost, and typical general-purpose optimizers cannot predict the results of memoization lookups. Besides, reasoning on dynamic dependence graphs and computation traces is often complex. Instead, ILC aims to produce purely functional programs that are suitable for further optimization and equational reasoning.
To that end, we eschew standard dynamic memoization in favor of static memoization: we transform programs to cache-transfer style (CTS), following ideas from Liu and Teitelbaum [20]. CTS functions output caches of intermediate results along with their primary results. Caches are just nested tuples whose structure is derived from code, and accessing them does not involve looking up keys depending on inputs. On the contrary, intermediate results can be fetched from these tuples using statically known locations. We also extend differentiation to produce CTS derivatives, which can extract from caches any intermediate results they need and which are responsible for updating the caches for the next computation step. This approach was inspired and pioneered by Liu and Teitelbaum [20] for untyped first-order functional languages; we integrate it with ILC and extend it to higher-order languages.
The correctness proof of static differentiation in CTS is quite challenging. First, it requires showing a forward simulation relation between two triples of reduction traces (the first triple consisting of the source base evaluation, the source updated evaluation and the source derivative evaluation; the second of the corresponding CTS-translated evaluations). Dealing with six distinct evaluation environments at the same time was error-prone on paper, so we conducted the proof using Coq [26]. Second, the simulation relation must track not only values but also caches, which are only partially updated in the middle of the evaluation of derivatives. Finally, we study the translation for an untyped λ-calculus, while previous ILC correctness proofs were restricted to the simply-typed λ-calculus. Hence, we define which changes are valid via a logical relation and show its fundamental property. Being in an untyped setting, our logical relations are step-indexed rather than indexed by types. We study an untyped language with the intention of making our results applicable to the erasure of typed languages. Formalizing a type-preserving translation is left for future work, because giving a type to CTS programs is challenging, as we shall explain.
In addition to the correctness proof, we present preliminary experimental results from three case studies. We obtain efficient incremental programs even on non self-maintainable functions.
Our contributions are presented as follows. First, we summarize ILC and illustrate the need to extend it to remember intermediate results via CTS (Sec. 2). Second, in our mechanized formalization (Sec. 3), we give a novel proof of correctness for ILC differentiation for untyped λ-calculus, based on step-indexed logical relations (Sec. 3.4). Third, building on top of ILC differentiation, we show how to transform untyped higher-order programs to CTS (Sec. 3.5), and we show that CTS functions and derivatives correctly simulate their non-CTS counterparts (Sec. 3.7). Finally, in our case studies (Sec. 4), we compare the performance of the generated code to the base programs. Sec. 4.4 discusses limitations and future work, Sec. 5 discusses related work, and Sec. 6 concludes.

ILC and CTS Primer
In this section we exemplify ILC by applying it to an average function, show why the resulting incremental program is asymptotically inefficient, and use CTS conversion and differentiation to incrementalize our example efficiently, speeding it up asymptotically (as confirmed by benchmarks in Sec. 4.1). Further examples in Sec. 4 apply CTS to higher-order programs and suggest that CTS enables efficiently incrementalizing some core database primitives such as joins.

Incrementalizing average via ILC
Our example computes the average of a bag of numbers. After computing the base output y 1 of the average function on the base input bag xs 1 , we want to update the output in response to a stream of updates to the input bag. For simplicity, we assume we have two updated inputs xs 2 and xs 3 and want to compute two updated outputs y 2 and y 3 , so the overall program can be described in Haskell as:

  average :: Bag Z → Z
  average xs = let s = sum xs; n = length xs; r = div s n in r

  average3 = let y1 = average xs1; y2 = average xs2; y3 = average xs3
             in (y1, y2, y3)

Throughout the paper, we contrast base vs. updated inputs and outputs; similarly, we have base and updated values, computations, and so on.
We want to compute the updated outputs y 2 and y 3 in average 3 faster using ILC. For that, we assume that we receive not only updated inputs xs 2 and xs 3 but also input change dxs 1 from xs 1 to xs 2 and input change dxs 2 from xs 2 to xs 3 . An input change dx from x 1 to x 2 describes the changes from base input x 1 to updated input x 2 , so that x 2 can be computed via the update operator ⊕ as x 1 ⊕ dx .
We use ILC to automatically transform average to its derivative daverage :: Bag Z → ∆(Bag Z) → ∆Z. A derivative is guaranteed to map input changes to output changes. In this example, this means that dy 1 = daverage xs 1 dxs 1 is a change from base output y 1 = average xs 1 to updated output y 2 = average xs 2 , hence y 2 = y 1 ⊕ dy 1 .
Thanks to daverage's correctness, we can avoid expensive calls to average on updated inputs and use daverage instead: this lets us rewrite average3 to incrementalAverage3, which computes y2 and y3 from output changes. As shown in previous work [10], the derivative df of a function f is the nil change of f.
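To make the rewrite concrete, here is a runnable sketch under a deliberately simplified change model, not the paper's actual change structures: bags are modeled as lists, a bag change is a list of insertions, and an integer change is a delta.

```haskell
-- Illustrative model (our simplification, not the paper's generated code):
-- bags are lists, bag changes are insertions, integer changes are deltas.
type Bag a = [a]

average :: Bag Int -> Int
average xs = let s = sum xs; n = length xs; r = div s n in r

-- Derivative of average, specialized to this change model.
daverage :: Bag Int -> Bag Int -> Int
daverage xs dxs =
  let s = sum xs; ds = sum dxs           -- ds = dsum xs dxs
      n = length xs; dn = length dxs     -- dn = dlength xs dxs
  in div (s + ds) (n + dn) - div s n     -- dr = ddiv s ds n dn

-- incrementalAverage3: updated outputs come from output changes,
-- not from re-running average on the updated inputs.
incrementalAverage3 :: Bag Int -> Bag Int -> Bag Int -> (Int, Int, Int)
incrementalAverage3 xs1 dxs1 dxs2 =
  let y1  = average xs1
      dy1 = daverage xs1 dxs1
      y2  = y1 + dy1                     -- y2 = y1 ⊕ dy1
      xs2 = xs1 ++ dxs1                  -- xs2 = xs1 ⊕ dxs1
      dy2 = daverage xs2 dxs2
      y3  = y2 + dy2                     -- y3 = y2 ⊕ dy2
  in (y1, y2, y3)
```

As the next section explains, this version is still inefficient: daverage recomputes s and n from the full bag on every change.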

Self-maintainability and efficiency of derivatives
Alas, derivatives are efficient only if they are self-maintainable, and daverage is not! So, incrementalAverage3 is no faster than average3. Let us inspect the code generated by differentiating average:

  daverage :: Bag Z → ∆(Bag Z) → ∆Z
  daverage xs dxs = let s = sum xs; ds = dsum xs dxs;
                        n = length xs; dn = dlength xs dxs;
                        r = div s n; dr = ddiv s ds n dn
                    in dr

Since average combines sum, length and div, its derivative daverage combines those functions and their derivatives accordingly. It recomputes s, n and r just like average, but r is not used anywhere, so its recomputation could be avoided. On the other hand, the expensive intermediate results s and n are passed to ddiv. If ddiv did not actually use its inputs s and n, their computation could be avoided too. Cai et al. [7] call a derivative self-maintainable if it does not inspect its base inputs, but only its change inputs; derivatives produced by ILC are efficient only if they do not compute inputs of self-maintainable derivatives. Cai et al. [7] leave efficient support for non-self-maintainable derivatives for future work. But ddiv is not self-maintainable: it computes the difference between the updated and the original result, and to do so it uses its base inputs a and b. Hence a derivative calling it, such as daverage, will not be efficient.
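Under the same simplified model of integer changes as deltas, ddiv can be sketched as follows (the body is our reconstruction of the behavior just described):

```haskell
-- ddiv a da b db: the change from (div a b) to (div (a ⊕ da) (b ⊕ db)),
-- with integer changes modeled as deltas. It inspects its base inputs
-- a and b, so it is not self-maintainable.
ddiv :: Int -> Int -> Int -> Int -> Int
ddiv a da b db = div (a + da) (b + db) - div a b
```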
But not all is lost: executing daverage xs dxs will compute exactly the same s and n as executing average xs, so to avoid recomputation we must simply save and reuse them. Hence, we CTS-convert each function f to a CTS function fC and a CTS derivative dfC: the CTS function fC produces, together with its final result, a cache containing intermediate results, which the caller must pass to the CTS derivative dfC. CTS-converting our example produces code which requires no wasteful recomputation.
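The CTS-converted code can be sketched in the same simplified change model; the cache representation and bodies below are our reconstruction of the shape described in the text (the caches of sum, length and div are empty here, so we elide them).

```haskell
type Bag a = [a]

-- Cache of averageC: the intermediate results s, n and r.
-- Subcall caches (for sum, length, div) are empty and elided.
type AverageC = (Int, Int, Int)

-- CTS function: returns the result together with its cache.
averageC :: Bag Int -> (Int, AverageC)
averageC xs =
  let s = sum xs; n = length xs; r = div s n
  in (r, (s, n, r))

-- CTS derivative: fetches s and n from the cache instead of recomputing
-- them from xs, and returns the updated cache for the next change.
daverageC :: Bag Int -> Bag Int -> AverageC -> (Int, AverageC)
daverageC _xs dxs (s, n, _r) =
  let ds = sum dxs; dn = length dxs
      s' = s + ds; n' = n + dn
      r' = div s' n'
  in (r' - div s n, (s', n', r'))
```

Note that daverageC now touches the input bag only through its change dxs; the expensive traversals of the full bag happen once, in averageC.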
For CTS-converted functions, the cache type FC is a tuple of intermediate results and caches of subcalls. For primitive functions like div , the cache type DivC could contain information needed for efficient computation of output changes. In the case of div , no additional information is needed. The definition of divC uses div and produces an empty cache, and the definition of ddivC follows the earlier definition for ddiv , except that we now pass along an empty cache.
Finally, we can rewrite average3 to incrementally compute y2 and y3. Since functions of the same type translate to functions of different types, the translation does not preserve well-typedness in a higher-order language in general, but it works well in our case studies (Sec. 4); Sec. 4.1 shows how to map such functions.

Formalization
(Fig. 1: syntax of terms and closed values av, including nil changes for primitives and replacement changes !av; value environments aE ::= • Empty | aE; x = av Value binding; and step indexes j, k, n ∈ N.)

We now formalize CTS-differentiation for an untyped Turing-complete λ-calculus, and formally prove it sound with respect to differentiation. We also give a novel proof of correctness for differentiation itself, since we cannot simply adapt Cai et al. [7]'s proof to the new syntax: our language is untyped and Turing-complete, while Cai et al. [7]'s proof assumed a strongly normalizing simply-typed λ-calculus and relied on its naive set-theoretic denotational semantics. Our entire formalization is mechanized using Coq [26]. For reasons of space, some details are deferred to the appendix.
Transformations We introduce and prove sound three term transformations, namely differentiation, CTS translation and CTS differentiation, that take a function to its corresponding (non-CTS) derivative, CTS function and CTS derivative. Each CTS function produces a base output and a cache from a base input, while each CTS derivative produces an output change and an updated cache from an input, an input change and a base cache.
Proof technique To show soundness, we prove that CTS functions and derivatives simulate, respectively, non-CTS functions and derivatives. In turn, we formalize (non-CTS) differentiation as well, and we prove differentiation sound with respect to non-incremental evaluation. Overall, this shows that CTS functions and derivatives are sound relative to non-incremental evaluation. Our presentation proceeds in the converse order: first, we present differentiation, formulated as a variant of Cai et al. [7]'s definition; then, we study CTS differentiation.

By using logical relations, we significantly simplify the setup of Cai et al. [7]. To handle an untyped language, we employ step-indexed logical relations. Besides, we conduct our development with big-step operational semantics, because that choice simplifies the correctness proof for CTS conversion.
Structure of the formalization Sec. 3.1 introduces the syntax of the language λ L we consider in this development, and introduces its four sublanguages λ AL , λ IAL , λ CAL and λ ICAL . Sec. 3.2 presents the syntax and the semantics of λ AL , the source language for our transformations. Sec. 3.3 defines differentiation and its target language λ IAL , and Sec. 3.4 proves differentiation correct. Sec. 3.5 defines CTS conversion, comprising CTS translation and CTS differentiation, and their target languages λ CAL and λ ICAL . Sec. 3.6 presents the semantics of λ CAL . Finally, Sec. 3.7 proves CTS conversion correct.
Notations We write X for a sequence X1, . . . , Xm of X's of some unspecified length m.

Syntax for λ L
A superlanguage To simplify our transformations, we require input programs to have been lambda-lifted [15] and converted to A'-normal form (A'NF). Lambda-lifted programs are convenient because they let us avoid a specific treatment of free variables in the transformations. A'NF is a minor variant of ANF [24] in which every result is bound to a variable before use; unlike ANF, we also bind the result of the tail call. Thus, every result can be stored in a cache by CTS conversion and reused later (as described in Sec. 2). This requirement is not onerous: lambda-lifting and ANF conversion are routine in compilers for functional languages. Most examples we show are in this form.
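To illustrate (names are ours), here is a small function before and after A'NF conversion; note that, unlike in plain ANF, even the result in tail position is let-bound:

```haskell
-- Nested-expression style, before conversion.
averageOrig :: [Int] -> Int
averageOrig xs = div (sum xs) (length xs)

-- A'NF: every intermediate result is bound to a variable before use,
-- including the final result r, so CTS conversion can cache all of them.
averageANF :: [Int] -> Int
averageANF xs =
  let s = sum xs
      n = length xs
      r = div s n   -- the tail call's result is also bound
  in r
```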
In contrast, our transformations' outputs are lambda-lifted but not in A'NF. For instance, we restrict base functions to take exactly one argument, a base input. As shown in Sec. 2.1, CTS functions instead take two arguments (a base input and a cache), and CTS derivatives take three (an input, an input change, and a cache). We could normalize transformation outputs to inhabit the source language and follow the same invariants, but this would complicate our proofs for little benefit. Hence, we do not require transformation outputs to satisfy the same invariants, and instead describe them through separate grammars.
As a result of this design choice, we consider languages for base programs, derivatives, CTS programs and CTS derivatives. In our Coq mechanization, we formalize those as four separate languages, saving us many proof steps to check the validity of required structural invariants. For simplicity, in the paper we define a single language called λ L (for λ-Lifted). This language satisfies invariants common to all these languages (including some of the A'NF invariants). Then, we define λ L sublanguages. We do not formalize the semantics of λ L itself, preferring to describe it informally; we only formalize the semantics of its sublanguages.
Syntax for terms The λL language is a relatively conventional lambda-lifted λ-calculus with a limited form of pattern matching on tuples. The syntax for terms and values is presented in Fig. 1. We separate terms and values into two distinct syntactic classes because we use big-step operational semantics. Our let-bindings are non-recursive as usual, and support shadowing. Terms cannot contain λ-expressions directly, but only refer to closures through the environment, and similarly for literals and primitives; we elaborate on this in Sec. 3.2. We do not introduce case expressions, only bindings that destructure tuples, both in let-bindings and in the λ-expressions of closures. Our semantics does not assign meaning to match failures, but pattern matching is only used in generated programs, and our correctness proofs ensure that the matches always succeed. We allow tuples to contain terms of the form x ⊕ dx, which update base values x with changes dx, because A'NF-converting these updates is not necessary for the transformations. We often inspect the result of a function call "f x", which is not a valid term in our syntax; hence, we write "@(f, x)" as syntactic sugar for "let y = f x in y" with y chosen fresh.
Syntax for closed values A closed value is either a closure, a tuple of values, a literal, a primitive, a nil change for a primitive, or a replacement change. A closure is a pair of an evaluation environment E and a λ-abstraction closed with respect to E. The set of available literals is left abstract; it may contain usual first-order literals like integers. We also leave abstract the primitives p, like if-then-else or projections of tuple components. Each primitive p comes with a nil change, which is its derivative as explained in Sec. 2. A change value can also represent a replacement by some closed value av. Replacement changes are not produced by static differentiation but are useful for clients of derivatives: we include them in the formalization to make sure that they are not incompatible with our system. As usual, environments E map variables to closed values.
Sublanguages of λ L The source language for all our transformations is a sublanguage of λ L named λ AL , where A stands for A'NF. To each transformation we associate a target language, which matches the transformation image. The target language for CTS conversion is named λ CAL , where "C" stands for CTS. The target languages of differentiation and CTS differentiation are called, respectively, λ IAL and λ ICAL , where the "I" stands for incremental.

The source language λ AL
We show the syntax of λAL in Fig. 2. As said above, λAL is a sublanguage of λL denoting lambda-lifted base terms in A'NF. Without loss of generality, we assume that all bound variables in λAL programs and closures are distinct. Fig. 2 also gives the syntax of the target language λIAL, tailored to the output of differentiation; we assume that in λIAL the same let binds both y and dy and that α-renaming preserves this invariant, and we define the base environment dE1 and the updated environment dE2 of a change environment dE. The step-indexed big-step semantics (Fig. 3) for base terms is defined by the judgment E ⊢ t ⇓n v (where n can be omitted), pronounced "Under environment E, base term t evaluates to closed value v in n steps." Intuitively, our step-indexes count the number of "nodes" of a big-step derivation. As the rules are relatively standard, we defer their explanation to Appendix B.
Expressiveness A closure in the base environment can be used to represent a top-level definition. Since environment entries can point to primitives, we need no syntax to directly represent calls of primitives in the syntax of base terms. To encode in our syntax a program with top-level definitions and a term to be evaluated representing the entry point, one can produce a term t representing the entry point together with an environment E containing as values any top-level definitions, primitives and literals used in the program. Semi-formally, given an environment E0 mentioning the needed primitives and literals, and a list of top-level function definitions D = (f = λx. t) defined in terms of E0, we can produce a base environment E = L(D), with L mapping each definition to a corresponding closure.
Correspondingly, we extend all our term transformations to values and environments to transform such encoded top-level definitions.
Our mechanization can encode n-ary functions "λ(x1, x2, . . . , xn). t" through unary functions that accept tuples; we encode partial application using a curry primitive such that, essentially, curry f x y = f (x, y); suspended partial applications are represented as closures. This encoding does not support currying efficiently; we discuss this limitation further in Sec. 4.4.
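The encoding can be sketched directly in Haskell (curry2 is our stand-in for the curry primitive):

```haskell
-- An n-ary function is encoded as a unary function over a tuple.
addPair :: (Int, Int) -> Int
addPair (x, y) = x + y

-- curry f x y = f (x, y): partial applications become closures.
curry2 :: ((a, b) -> c) -> a -> b -> c
curry2 f x y = f (x, y)

-- A suspended partial application, represented as a closure.
inc :: Int -> Int
inc = curry2 addPair 1
```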
Control operators, like recursion combinators or branching, can be introduced as primitive operations as well. If the branching condition changes, expressing the output change in general requires replacement changes. Similarly to branching we can add tagged unions.
To check the assertions of the last two paragraphs, the Coq development contains the definition of a curry primitive as well as a primitive for a fixpoint combinator, allowing general recursion and recursive data structures as well.

Static differentiation from λ AL to λ IAL
Previous work [7] defines static differentiation for simply-typed λ-calculus terms. Fig. 2 transposes differentiation as a transformation from λ AL to λ IAL and defines λ IAL 's syntax.
Differentiating a base term t produces a change term Dι(t), its derivative. Differentiating the final result variable x produces its change variable dx. Differentiation copies each binding of an intermediate result y to the output and adds a new binding for its change dy. If y is bound to a tuple (x), then dy will be bound to the change tuple (dx). If y is bound to a function application "f x", then dy will be bound to the application of the function change df to the input x and its change dx. We explain differentiation of environments Dι(E) later in this section.
Evaluating D ι (t) recomputes all intermediate results computed by t. This recomputation will be avoided through cache-transfer style in Sec. 3.5. A comparison with the original static differentiation [7] can be found in Appendix A.
Semantics for λIAL We move on to define how λIAL change terms evaluate to change values, starting with the necessary definitions and operations on change values. Closed change values dv are particular λL values av: a closure change, a tuple change, a literal change, a replacement change, or a primitive nil change. A closure change is a closure containing a change environment dE and a λ-abstraction expecting a value and a change value as arguments, which evaluates a change term into an output change value. An evaluation environment dE follows the same structure as let-bindings of change terms: it binds variables to closed values, and each variable x is immediately followed by a binding for its associated change variable dx. As with let-bindings of change terms, α-renamings in an environment dE must rename dx into dy if x is renamed into y. We define the update operator ⊕ to update a value with a change; this operator is a partial function written "v ⊕ dv". Replacement changes can update all values (literals, tuples, primitives and closures), while tuple changes can only update tuples, literal changes can only update literals, primitive nil changes can only update primitives, and closure changes can only update closures. A replacement change !v′ overrides the current value v with the new value v′. On literals, ⊕ is defined via some interpretation function δ⊕, which takes a literal and a literal change to produce an updated literal. Change update for a closure ignores dt instead of computing something like dE[t ⊕ dt]. This may seem surprising, but we only need ⊕ to behave well for valid changes (as shown by Theorem 3.1): for valid closure changes, dt must anyway behave similarly to Dι(t), which Cai et al. [7] show to be a nil change. Hence, t ⊕ Dι(t) and t ⊕ dt both behave like t, so ⊕ can ignore dt and consider only environment updates. This definition also avoids having to modify terms at runtime, which would be difficult to implement safely.
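As an illustration, the case analysis of ⊕ just described can be modeled on a miniature value universe; the constructor names are ours, the closure case is omitted, and δ⊕ is instantiated to addition:

```haskell
-- A miniature universe of closed values and changes, mirroring ⊕'s cases.
data Value = Lit Int | Tup [Value] | Prim String
  deriving (Eq, Show)

data Change = Replace Value   -- !v': updates any value
            | DLit Int        -- literal change; δ⊕ is (+) here
            | DTup [Change]   -- tuple change, applied pointwise
            | PrimNil         -- nil change for a primitive
  deriving (Eq, Show)

-- ⊕ is partial: Nothing models the cases where it is undefined.
oplus :: Value -> Change -> Maybe Value
oplus _ (Replace v')      = Just v'
oplus (Lit l) (DLit dl)   = Just (Lit (l + dl))
oplus (Tup vs) (DTup dvs)
  | length vs == length dvs
  = Tup <$> sequence (zipWith oplus vs dvs)
oplus (Prim p) PrimNil    = Just (Prim p)
oplus _ _                 = Nothing
```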
We could also implement f ⊕ df as a function that invokes both f and df on its argument, as done by Cai et al. [7], but we believe that would be less efficient when ⊕ is used at runtime. As we discuss in Sec. 3.4, we restrict validity to avoid this runtime overhead.
Having given these definitions, we show in Fig. 4 a step-indexed big-step semantics for change terms, defined through the judgment dE ⊢ dt ⇓n dv (where n can be omitted). This judgment is pronounced "Under the environment dE, the change term dt evaluates to the closed change value dv in n steps." Rules [SDVar] and [SDTuple] are unsurprising. To evaluate function calls in let-bindings "let y = f x, dy = df x dx in dt" we have three rules, depending on the shape of dE(df). These rules all recompute the value vy of y in the original environment, but they compute the change dy to y differently. If dE(df) replaces the value of f, [SDReplaceCall] recomputes vy = f x from scratch in the new environment and binds dy to !vy when evaluating the let body. If dE(df) is the nil change for a primitive p, [SDPrimitiveNil] computes dy by running p's derivative through the function ∆p(−). If dE(df) is a closure change, [SDClosureChange] invokes it normally to compute the change dvy. As we show, if the closure change is valid, its body behaves like f's derivative, hence incrementalizes f correctly.
Closure changes with non-nil environment changes represent partial applications of derivatives to non-nil changes; for instance, if f takes a pair and dx is a non-nil change, dcurry f df x dx constructs a closure change containing dx, using the derivative of curry mentioned in Sec. 3.2. In general, such closure changes do not arise from the rules we show, only from derivatives of primitives.

A new soundness proof for static differentiation
In this section, we show that static differentiation is sound (Theorem 3.3): Eq. (1) holds whenever da is a valid change from a1 to a2 (as defined later). One might want to prove this equation assuming only that a1 ⊕ da = a2, but this is false in general: a direct proof by induction on terms fails in the case for application, ultimately because f1 ⊕ df = f2 and a1 ⊕ da = a2 do not imply f1 a1 ⊕ df a1 da = f2 a2.
As usual, this can be fixed by introducing a logical relation. We call ours validity: a function change is valid if it turns valid input changes into valid output changes.
Static differentiation is only sound on input changes that are valid. Cai et al. [7] show this for a strongly normalizing simply-typed λ-calculus using denotational semantics. Using an operational semantics, we generalize this result to an untyped and Turing-complete language, so we must turn to a step-indexed logical relation [4,3].
Alert readers might wonder why we use a big-step semantics for a Turing-complete language. We indeed focus on incrementalizing computations that terminate on both old and new inputs. Our choice follows Acar et al. [3], whose approach we compare with ours in Sec. 5.
Validity as a step-indexed logical relation We say that "dv is a valid change from v1 to v2, up to k steps" and write dv ⊨k v1 → v2 to mean that dv is a change from v1 to v2 and that dv is a valid description of the differences between v1 and v2, with validity tested with up to k steps. This relation approximates validity: if a change dv is valid at all approximations, it is simply valid (between v1 and v2), and we then write dv ⊨ v1 → v2 (omitting the step-index k). We similarly omit step-indexes k from other step-indexed relations when they hold for all k.
To justify this intuition of validity, we show that a valid change from v1 to v2 indeed goes from v1 to v2 (Theorem 3.1), and that a change valid up to k steps is also valid up to fewer steps (Lemma 3.2).

Theorem 3.1 (⊕ agrees with validity). If dv ⊨ v1 → v2, then v1 ⊕ dv = v2.

Crucially, Theorem 3.1 enables (a) computing v2 from a valid change and its source, and (b) showing Eq. (1) through validity. As discussed, ⊕ ignores changes to closure bodies to be faster, which is only sound if those changes are nil; to ensure Theorem 3.1 still holds, validity on closure changes must be adapted accordingly to forbid non-nil changes to closure bodies. This choice, while unusual, does not affect our results: if input changes do not modify closure bodies, intermediate changes will not modify closure bodies either. Logical relation experts might regard this as a domain-specific invariant we add to our relation. Alternatives are discussed by Giarrusso [10, App. C].
As usual with step-indexing, validity is defined by well-founded induction over naturals ordered by <; to show well-foundedness we observe that evaluation always takes at least one step.
Validity for values, terms and environments is formally defined by cases in Fig. 5. First, a literal change dℓ is a valid change from literal ℓ to the updated literal ℓ ⊕ dℓ = δ⊕(ℓ, dℓ). Since the function δ⊕ is partial, the relation only holds for literal changes dℓ that are valid changes for ℓ. Second, a replacement change !v2 is always a valid change from any value v1 to v2. Third, a primitive nil change is a valid change between any primitive and itself. Fourth, a tuple change is valid up to step n if each of its components is valid up to any step strictly less than n. Fifth, we define validity for closure changes. Roughly speaking, a closure change is valid if (i) its environment change dE is valid for the original closure environment E1 and for the new closure environment E2; and (ii) when applied to related values, the closure bodies are related by dt, as defined by the auxiliary judgment (dE ⊢ dt) ⊨n (E1 ⊢ t1) → (E2 ⊢ t2) for validity between terms under related environments (defined in Appendix C). As usual with step-indexed logical relations, in the definition of this judgment about terms, the number k of steps required to evaluate the term t1 is subtracted from the number of steps n available to relate the outcomes of the term evaluations.
Soundness of differentiation We can state a soundness theorem for differentiation without mentioning step-indexes; thanks to this theorem, we can compute the updated result v2 not by rerunning a computation, but by updating the base result v1 with the result change dv that we compute through a derivative on the input change. A corollary shows Eq. (1).

Theorem 3.3 (Soundness of differentiation in λAL). If dE is a valid change environment from base environment E1 to updated environment E2, that is dE ⊨ E1 → E2, and if t converges both in the base and updated environment, that is E1 ⊢ t ⇓ v1 and E2 ⊢ t ⇓ v2, then there exists a change dv such that dE ⊢ Dι(t) ⇓ dv and v1 ⊕ dv = v2.

To prove it, we must first show that derivatives map input changes valid up to k steps to output changes valid up to k steps, that is, the fundamental property of our step-indexed logical relation.

(Figure: translation of terms M = Tt(t′); cache of a term C = C(t); syntax definitions for the target languages λCAL and λICAL.)

Terms of λCAL again follow λ-lifted A'NF, like λAL, except that a let-binding for a function application "f x" now binds an extra cache identifier c y fx besides the output y. Cache identifiers have non-standard syntax: a cache identifier can be seen as a triple referring to the value identifiers f, x and y. Hence, α-renaming one of these three identifiers must refresh the cache identifier accordingly. Result terms explicitly return the cache C through the syntax (x, C). Caches are encoded through nested tuples, but they are in fact a tree-like data structure isomorphic to an execution trace: this trace contains both immediate values and the execution traces of nested function calls.

CTS conversion
The syntax for λ ICAL matches the image of the CTS derivative and witnesses the CTS discipline followed by the derivatives: to determine dy, the derivative of f evaluated at point x with change dx expects the cache produced by evaluating y in the base term. The derivative returns the updated cache which contains the intermediate results that would be gathered by the evaluation of f (x ⊕ dx ). The result term of every change term returns the computed change and a cache update dC , where each value identifier x of the input cache is updated with its corresponding change dx .
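The CTS discipline can be illustrated with a minimal Haskell sketch. The names fC and dfC, the additive change representation, and the cache layout are illustrative assumptions, not the code generated by the transformation:

```haskell
-- Toy CTS pairing for f x = x * x, with Int changes as additive
-- deltas (v2 = v1 + dv). Illustrative names, not generated code.
f :: Int -> Int
f x = x * x

-- CTS function: returns the result together with a cache of
-- intermediate values (here, just the input).
fC :: Int -> (Int, Int)
fC x = (x * x, x)

-- CTS derivative: consumes the input change and the base cache,
-- and returns the output change with the updated cache.
dfC :: Int -> Int -> (Int, Int)
dfC dx x = ((x + dx) * (x + dx) - x * x, x + dx)

main :: IO ()
main = print (dfC 1 (snd (fC 5)))
```

The derivative never re-runs f on the full input: it reads the cached intermediate and produces both the output change and the cache expected by the next change application.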

(Figure: CTS conversion and differentiation: differentiation of terms dM = D t(t); cache update of a term dC = U (t); change terms.)

These translations use two auxiliary functions: C (t), which computes the cache term of a λ AL term t, and U (t), which computes the cache update of t's derivative.
CTS translation on terms, T t(t ′), accepts as inputs a global term t and a subterm t ′ of t. In tail position (t ′ = x ), the translation generates code to return both the result x and the cache C (t) of the global term t. When the transformation visits let-bindings, it outputs extra bindings for caches c y fx on function calls and visits the let-body.
Similarly to T t(t ′), CTS derivation D t(t ′) accepts a global term t and a subterm t ′ of t. In tail position, the translation returns both the result change dx and the cache update U (t). On let-bindings, it outputs bindings not for y but for dy; it outputs extra bindings for c y fx as in the previous case, and visits the let-body.
To handle function definitions, we transform the base environment E through T (E) and T (D ι (E)) (translations of environments are done pointwise, see Appendix D). Since D ι (E) includes E, we describe T (D ι (E)) to also cover T (E). Overall, T (D ι (E)) CTS-converts each source closure f = E[λx . t] to a CTS-translated function, with body T t (t), and to the CTS derivative df of f . This CTS derivative pattern matches on its input cache using cache pattern C (t). That way, we make sure that the shape of the cache expected by df is consistent with the shape of the cache produced by f . The body of derivative df is computed by CTS-deriving f 's body via D t (t).

Semantics of λ CAL and λ ICAL
An evaluation environment F of λ CAL contains both values and cache values. Values V resemble λ AL values v , cache values V c match cache terms C and change values dV match λ IAL change values dv . Evaluation environments dF for change terms must also bind change values, so functions in change closures take not just a base input x and an input change dx , like in λ IAL , but also an input cache C. By abuse of notation, we reuse the same syntax C to both deconstruct and construct caches.
Base terms of the language are evaluated using a conventional big-step semantics, consisting of two judgments. Judgment "F M ⇓ (V, V c )" is read "Under evaluation environment F , base term M evaluates to value V and cache V c ". The semantics follows the one of λ AL ; since terms include extra code to produce and carry caches along the computation, the semantics evaluates that code as well. For space reasons, we defer semantic rules to the appendix. Auxiliary judgment "F C ⇓ V c " evaluates cache terms into cache values: It traverses a cache term and looks up the environment for the values to be cached.
Change terms of λ ICAL are also evaluated using a big-step semantics, which resembles the semantics of λ IAL and λ CAL . Unlike in those semantics, cache updates (dC , x ⊕ dx ) are evaluated using the ⊕ operator (overloaded on λ CAL values and λ ICAL changes). For lack of space, its rules are deferred to Appendix E. This semantics relies on three judgments. Judgment "dF dM ⇓ (dV , V c )" is read "Under change environment dF , change term dM evaluates to change value dV and updated cache V c ". The first auxiliary judgment "dF dC ⇓ V c " defines evaluation of cache update terms. The final auxiliary judgment "V c ∼ C → dF " describes a limited form of pattern matching used by CTS derivatives: namely, how a cache pattern C matches a cache value V c to produce a change environment dF .

Soundness of CTS conversion
The proof is based on a simulation in lock-step, but two subtle points emerge. First, we must relate λ AL environments that do not contain caches, with λ CAL environments that do. Second, while evaluating CTS derivatives, the evaluation environment mixes caches from the base computation and updated caches computed by the derivatives. Theorem 3.7 follows because differentiation is sound (Theorem 3.3) and evaluation commutes with CTS conversion; this last point requires two lemmas. First, CTS translation of base terms commutes with our semantics:

Lemma 3.5 (Commutation for base evaluations).
For all …

Second, we need a corresponding lemma for CTS translation of differentiation results: intuitively, evaluating a derivative and CTS-translating the resulting change value must give the same result as evaluating the CTS derivative. To formalize this, we must specify which environments are used for evaluation, and this requires two technicalities.
Assume derivative D ι (t) evaluates correctly in some environment dE . Evaluating CTS derivative D t (t) requires cache values from the base computation, but they are not in T (dE )! Therefore, we must introduce a judgment to complete a CTS-translated environment with the appropriate caches (see Appendix F).
Next, consider evaluating a change term of the form dM = C[dM ′], where C is a standard single-hole change-term context, that is, for λ ICAL , a sequence of let-bindings. When evaluating dM , we eventually evaluate dM ′ in a change environment dF updated by C: the change environment dF contains both the updated caches coming from the evaluation of C and the caches coming from the base computation (which will be updated by the evaluation of dM ′). Again, a new judgment, given in Appendix F, is required to model this process.
With these two judgments, we can state the second key lemma: the commutation between evaluation of derivatives and evaluation of CTS derivatives. We give here an informal version; the formal version can be found in Appendix F.

Lemma 3.6 (Commutation for derivatives evaluation).
If the evaluation of D ι (t) leads to an environment dE 0 when it reaches the differentiated context D ι (C), where t = C[t ′]; and if the CTS conversion of t ′ under this environment, completed with base (resp. changed) caches, evaluates to a base value T (v ) (resp. a changed value T (v ′)) and a base cache value V c (resp. an updated cache value V ′ c ); then, under an environment containing the caches already updated by the evaluation of D ι (C) and the base caches still to be updated, the CTS derivative of t ′ evaluates to T (dv ) such that v ⊕ dv = v ′, together with the updated cache V ′ c .
Finally, we can state soundness of CTS differentiation. This theorem says that CTS derivatives not only produce valid changes for incrementalization but that they also correctly consume and update caches.

Theorem 3.7 (Soundness of CTS differentiation).
If the following hypotheses hold: … then there exist dv , V c , V ′ c and F 0 such that: …

Incrementalization case studies
In this section, we investigate two questions: whether our transformations can target a typed language like Haskell, and whether automatically transformed programs can perform well. We implement primitives on sequences, bags and maps in Haskell by hand. The input terms in all case studies are written in a deep embedding of λ AL into Haskell. The transformations generate Haskell code that uses our primitives and their derivatives. We run the transformations on three case studies: a computation of the average value of a bag of integers, a nested loop over two sequences, and a more involved example inspired by Koch et al. [17]'s work on incrementalizing database queries. For each case study, we check that results are consistent between from-scratch recomputation and incremental evaluation; we measure the execution time of from-scratch recomputation and of incremental computation, as well as the space consumption of caches. We obtain efficient incremental programs, that is, ones for which incremental computation is faster than from-scratch recomputation. The measurements indicate that we do get the expected asymptotic improvement: incremental computation outperforms from-scratch recomputation by a linear factor, while the cache grows by a similar linear factor.
Our benchmarks were compiled by GHC 8.2.2 and run on a 2.20GHz hexa-core Intel(R) Xeon(R) CPU E5-2420 v2 with 32GB of RAM running Ubuntu 14.04. We use the criterion [21] benchmarking library.

Averaging bags of integers
Sec. 2.1 motivates our transformation with a running example of computing the average over a bag of integers. We represent bags as maps from elements to (possibly negative) multiplicities. Earlier work [7,17] represents bag changes as bags of removed and added elements. We use a different representation of bag changes that takes advantage of changes to elements, and we provide primitives on bags and their derivatives. The CTS variant of map, which we call mapC , takes a function fC in CTS and a bag as, and produces a bag and a cache. The cache stores, for each invocation of fC and therefore for each distinct element in as, the result of fC of type b and the cache of type c.
Inspired by Rossberg et al. [23], all higher-order functions (and typically also their caches) are parametric over the cache types of their function arguments. Here, functions mapC and dmapC and cache type MapC are parametric over the cache type c of fC and dfC :

MapC a b c = Map a (b, c)
mapC :: (a → (b, c)) → Bag a → (Bag b, MapC a b c)
dmapC :: (a → (b, c) …

We wrote the length and sum functions used in our benchmarks in terms of primitives map and foldGroup and had their CTS functions and CTS derivatives generated automatically. We evaluate whether we can produce an updated result with daverageC (shown in Sec. 2.1) faster than by from-scratch recomputation with average. We expect the speedup of daverageC to depend on the size n of the input bag. We fix the input bag of size n as the bag containing the numbers from 1 to n. We define a change that inserts the integer 1 into the bag. To measure the execution time of from-scratch recomputation, we apply average to the input bag updated with the change. To measure the execution time of the CTS function averageC , we apply averageC to the input bag updated with the change. To measure the execution time of the CTS derivative daverageC , we apply daverageC to the input bag, the change and the cache produced by averageC when applied to the input bag. In all three cases we ensure that all results and caches are fully forced, so as not to hide any computational cost behind laziness.
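To make the per-distinct-element cache concrete, here is a simplified sketch of mapC, with bags as multiplicity maps. This is not the paper's library: the derivative below only covers a bag change under a nil function change, and the names are ours:

```haskell
import qualified Data.Map as M

-- Bags as maps from elements to (possibly negative) multiplicities;
-- caches map each distinct element to its (result, cache) pair.
type Bag a = M.Map a Int
type MapC a b c = M.Map a (b, c)

mapC :: (Ord a, Ord b) => (a -> (b, c)) -> Bag a -> (Bag b, MapC a b c)
mapC fC as =
  let cache = M.fromList [ (a, fC a) | a <- M.keys as ]
      bag   = M.fromListWith (+)
                [ (fst (cache M.! a), n) | (a, n) <- M.toList as ]
  in (bag, cache)

-- Sketch of the derivative for a *nil* function change: the bag
-- change das carries delta multiplicities; fC is only re-run on
-- elements absent from the cache, so work is O(|das|).
dmapC :: (Ord a, Ord b)
      => (a -> (b, c)) -> Bag a -> MapC a b c -> (Bag b, MapC a b c)
dmapC fC das cache =
  let cache' = foldr (\a m -> if M.member a m then m
                              else M.insert a (fC a) m)
                     cache (M.keys das)
      dbag   = M.fromListWith (+)
                 [ (fst (cache' M.! a), n) | (a, n) <- M.toList das ]
  in (dbag, cache')
```

For instance, mapping the (hypothetical) CTS function \a -> (a + 1, ()) over the bag {1 ↦ 2, 2 ↦ 1} yields the bag {2 ↦ 2, 3 ↦ 1} plus a cache with one entry per distinct element.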
The plot in Fig. 8a shows execution time versus the size n of the base input. To produce the base result and cache, the CTS-transformed function averageC takes longer than the original average function takes to produce just the result. Producing the updated result incrementally is slower than from-scratch recomputation for small input sizes but, because of the difference in time complexity, becomes faster as the input size grows. The size of the cache grows linearly with the size of the input, which is not optimal for this example; we leave optimizing the space usage of examples like this to future work.
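The asymptotic difference measured above can be seen in a minimal reconstruction of the average example. Here bags are simplified to lists, the change is a list of inserted elements, and the derivative returns the updated value instead of a change; averageC and daverageC are illustrative reconstructions, not the generated code:

```haskell
type AvgCache = (Int, Int)   -- cached (sum, length)

average :: [Int] -> Double
average xs = fromIntegral (sum xs) / fromIntegral (length xs)

-- CTS function: result plus cache of intermediates.
averageC :: [Int] -> (Double, AvgCache)
averageC xs =
  let s = sum xs
      n = length xs
  in (fromIntegral s / fromIntegral n, (s, n))

-- CTS derivative: O(|change|) instead of O(|bag|), because sum and
-- length are read from the cache rather than recomputed.
daverageC :: [Int] -> AvgCache -> (Double, AvgCache)
daverageC inserted (s, n) =
  let s' = s + sum inserted
      n' = n + length inserted
  in (fromIntegral s' / fromIntegral n', (s', n'))
```

Inserting 1 into the bag [1..n] touches only the cached sum and length, which is exactly the linear speedup (and linear cache) the benchmark observes.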

Nested loops over two sequences
Next, we consider CTS differentiation on a higher-order example. To incrementalize this example efficiently, we have to enable detecting nil function changes at runtime by representing function changes as closures that can be inspected by incremental programs. Our example here is the Cartesian product of two sequences computed in terms of functions map and concat.
cartesianProduct :: Sequence a → Sequence b → Sequence (a, b)
cartesianProduct xs ys = concatMap (λx → map (λy → (x , y)) ys) xs

concatMap :: (a → Sequence b) → Sequence a → Sequence b
concatMap f xs = concat (map f xs)

We implemented incremental sequences and related primitives following Firsov and Jeltsch [9]: our change operations and first-order operations (such as concat) reuse their implementation. On the other hand, we must extend higher-order operations such as map to handle non-nil function changes and caching. A correct and efficient CTS derivative dmapC has to work differently depending on whether the given function change is nil or not: for a non-nil function change it has to traverse the input sequence; for a nil function change it has to avoid that.
Cai et al. [7] use static analysis to conservatively approximate nil function changes as changes to terms that are closed in the original program. But in this example the function argument (λy → (x , y)) to map in cartesianProduct is not a closed term. It is, however, crucial for the asymptotic improvement that we avoid looping over the inner sequence when the change to the free variable x in the change environment is 0 x .
To enable runtime nil change detection, we apply closure conversion to the original program and explicitly construct closures and changes to closures. While the only valid change for closed functions is their nil change, for closures we can have non-nil function changes. A function change df , represented as a closure change, is nil exactly when all changes it closes over are nil.
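The payoff of runtime nil detection can be shown with a toy representation of function changes. The real library represents closure changes carrying environment changes; this sketch collapses that to two constructors and shows only why inspecting the change at runtime matters:

```haskell
-- Toy function changes: either nil or a replacement function.
-- (The actual representation distinguishes closed functions from
-- closures and carries environment changes.)
data FunChange a b = NilChange | Replace (a -> b)

-- Incremental map over a list-backed sequence: a nil function change
-- reuses the cached output in O(1); a non-nil change must traverse.
dmap :: (a -> b) -> FunChange a b -> [a] -> [b] -> [b]
dmap _ NilChange   _  cachedOut = cachedOut
dmap _ (Replace g) xs _         = map g xs
```

In cartesianProduct, when the change to the free variable x is nil, the inner map receives a nil function change and the whole inner traversal is skipped, which is where the asymptotic improvement comes from.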
We represent closed functions and closures as variants of the same type. Correspondingly, we represent changes to a closed function and changes to a closure as variants of the same type of function changes. We inspect this representation at runtime to find out whether a function change is a nil change.

We use the same benchmark setup as in the benchmark for the average computation on bags. The input of size n is a pair of sequences (xs, ys); each sequence initially contains the integers from 1 to n. Updating the result in reaction to a change dxs to the outer sequence xs takes less time than updating the result in reaction to a change dys to the inner sequence ys: while a change to the outer sequence xs results in an easily located change in the output sequence, a change to the inner sequence ys results in a change that needs much more calculation to find the elements it affects. We benchmark changes to the outer sequence xs and the inner sequence ys separately, where the change to one sequence is the insertion of a single integer 1 at position 1 and the change to the other one is the nil change. Fig. 9 shows execution time versus input size. In this example, again, preparing the cache takes longer than from-scratch recomputation alone. The speedup of incremental computation over from-scratch recomputation increases with the size of the base input sequences because of the difference in time complexity. Eventually we do get speedups for both kinds of changes (to the inner and to the outer sequence), but for changes to the outer sequence we get a speedup

Indexed joins of two bags
Our goal is to show that we can compose primitive functions into larger and more complex programs and apply CTS differentiation to get a fast incremental program. We use an example inspired by the DBToaster literature [17]. In this example we have a bag of orders and a bag of line items. An order is a pair of an order key and an exchange rate. A line item is a pair of an order key and a price. We build an index mapping each order key to the sum of all exchange rates of the orders with this key, and an index from each order key to the sum of the prices of all line items with this key. We then merge the two maps by key, multiplying corresponding sums of exchange rates and sums of prices. We compute the total price of the orders and line items as the sum of those products. Unlike DBToaster, we assume our program has already been transformed to explicitly use indexes, as above. Because our indexes are maps, we implemented a change structure, CTS primitives and their CTS derivatives for maps.
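As a reference point, the from-scratch version of this query can be sketched as follows. The names index and totalPrice are ours; the CTS-transformed version additionally caches both intermediate indexes so that only the entry for a changed key is recomputed:

```haskell
import qualified Data.Map as M

type OrderKey = Int

-- Index a bag of (key, value) pairs by key, summing grouped values.
index :: [(OrderKey, Int)] -> M.Map OrderKey Int
index = M.fromListWith (+)

-- Merge the two indexes by key, multiplying the sum of exchange
-- rates with the sum of prices, then total the products.
totalPrice :: [(OrderKey, Int)] -> [(OrderKey, Int)] -> Int
totalPrice orders lineItems =
  sum (M.elems (M.intersectionWith (*) (index orders) (index lineItems)))
```

For example, with orders [(1,2),(2,3)] and line items [(1,5),(1,7),(2,1)], the indexes are {1 ↦ 2, 2 ↦ 3} and {1 ↦ 12, 2 ↦ 1}, so the total is 2·12 + 3·1 = 27.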
To build the indexes, we use a groupBy function built from primitive functions foldMapGroup on bags and singleton for bags and maps respectively. The CTS function groupByC and the CTS derivative dgroupByC are automatically generated. While computing the indexes with groupBy is self-maintainable, merging them is not. We need to cache and incrementally update the intermediately created indexes to avoid recomputing them.
We evaluate the performance in the same way we did in the other case studies. The input of size n is a pair of bags where both contain the pairs (i , i ) for i between 1 and n. The change is an insertion of the order (1, 1) into the orders bag. For sufficiently large inputs, our CTS derivative of the original program produces updated results much faster than from scratch recomputation, again because of a difference in time complexity as indicated by Fig. 8b. The size of the cache grows linearly with the size of the input in this example. This is unavoidable, because we need to keep the indexes.

Limitations and future work
Typing of CTS programs Functions of the same type f 1 , f 2 :: A → B can be transformed to CTS functions f 1 :: A → (B , C 1 ) and f 2 :: A → (B , C 2 ) with different cache types C 1 and C 2 , since cache types depend on the implementation. This heterogeneous typing of translated functions poses difficult typing issues, e.g. what is the translated type of a list of functions of type A → B ? We cannot hide cache types behind existential quantifiers, because they would be too abstract for derivatives, which only work on very specific cache types. We can fix this problem, at some runtime cost, by using a single type Cache defined as a tagged union of all cache types; alternatively, more sophisticated type systems, such as first-class translucent sums, open existentials or Typed Adapton's refinement types [12], might be able to track cache types properly.
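The tagged-union workaround can be sketched as follows; all names and cache shapes are illustrative. Each derivative pattern-matches on the one tag it expects, which is the runtime check (and overhead) mentioned above:

```haskell
-- One Cache type covering all cache shapes in the (toy) program.
data Cache = CacheF1 Int          -- cache shape of f1
           | CacheF2 (Int, Int)   -- cache shape of f2
  deriving (Eq, Show)

-- f1 x = x + 1, caching its input.
f1C :: Int -> (Int, Cache)
f1C x = (x + 1, CacheF1 x)

-- The derivative accepts only its own cache shape; any other tag is
-- a bug in the caller, detected at runtime rather than by the types.
df1C :: Int -> Cache -> (Int, Cache)
df1C dx (CacheF1 x) = (dx, CacheF1 (x + dx))
df1C _  _           = error "df1C: cache of the wrong shape"
```

With this encoding, f1C and f2C share the common type A → (B, Cache), so a list of CTS functions is typeable, at the price of the dynamic tag check in every derivative.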
In any case, we believe that these machineries would add a lot of complexity without helping much with the proof of correctness. Indeed, the simulation relation is handier here because it maintains a global invariant about whole evaluations (typically, the consistency of cache types between base computations and derivatives), rather than many local invariants about values, as types would.
One might wonder why caches could not be totally hidden from the programmer by embedding them in the derivatives themselves; or in other words, why we did not simply translate functions of type A → B into functions of type A → B × (∆A → ∆B). We tried this as well; but unlike automatic differentiation, we must remember and update caches according to input changes (especially when receiving a sequence of such changes as in Sec. 2.1). Returning the updated cache to the caller works; we tried closing over the caches in the derivative, but this ultimately fails (because we could receive function changes to the original function, but those would need access to such caches).
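The need to return caches to the caller shows up as soon as several changes arrive in sequence: applying them is a fold that threads the current cache into each derivative call. A minimal sketch, reusing a toy sum function with list-append changes (names are ours):

```haskell
-- CTS sum over a list, caching the running sum.
sumC :: [Int] -> (Int, Int)
sumC xs = let s = sum xs in (s, s)

-- Derivative: the change is a list of inserted elements; the cache is
-- consumed and the *updated* cache is handed back to the caller.
dsumC :: [Int] -> Int -> (Int, Int)
dsumC inserted s = let s' = s + sum inserted in (s' - s, s')

-- Apply a sequence of changes, threading the cache step by step.
applyAll :: [Int] -> [[Int]] -> Int
applyAll xs dss =
  let (y0, c0)       = sumC xs
      step (y, c) ds = let (dy, c') = dsumC ds c in (y + dy, c')
  in fst (foldl step (y0, c0) dss)
```

Had dsumC closed over its cache instead of returning it, the second change application would silently reuse the stale cache; threading it explicitly keeps each step consistent with the previous one.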
Comprehensive performance evaluation This paper focuses on theory; we leave benchmarking against other implementations of incremental computation to future work. The examples in our case studies were rather simple (except perhaps for the indexed join). Nevertheless, the results were encouraging and we expect them to carry over to more complex examples, though not to all programs. A comparison with other work should also compare the space usage of auxiliary data structures, in our case the caches.
Cache pruning via absence analysis To reduce memory usage and runtime overhead, it should be possible to automatically remove from transformed programs any caches or cache fragments that are not used (directly or indirectly) to compute outputs. Liu [19] performs this transformation on CTS programs by using absence analysis, which was later extended to higher-order languages by Sergey et al. [25]. In lazy languages, absence analysis removes thunks that are not needed to compute the output. We conjecture that the analysis could remove unused caches or inputs, if it is extended to not treat caches as part of the output.
Unary vs n-ary abstraction We only show our transformation correct for unary functions and tuples. But many languages provide efficient support for applying curried functions such as div :: Z → Z → Z. Naively transforming such a curried function to CTS would produce a function divC of type Z → (Z → (Z, DivC 2 ), DivC 1 ) with DivC 1 = (), which adds excessive overhead. In Sec. 2 and our evaluation we use curried functions and never need this naive encoding, but only because we always invoke functions of known arity.
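The overhead of the naive encoding is visible in a direct Haskell transcription. The type synonyms and the choice of caching both inputs are illustrative, not generated code:

```haskell
-- Naive CTS encoding of the curried div: the outer application yields
-- a useless unit cache (DivC1 = ()); only the saturated application
-- yields the real cache (here, simply both inputs).
type DivC1 = ()
type DivC2 = (Int, Int)

divC :: Int -> (Int -> (Int, DivC2), DivC1)
divC a = (\b -> (a `div` b, (a, b)), ())
```

Every partial application now allocates a pair carrying an empty cache, which is exactly the excessive overhead the naive encoding adds; invoking functions at known arity sidesteps it.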

Related work
Cache-transfer-style Liu [19]'s work has been the fundamental inspiration for ours, but her approach has no correctness proof and is restricted to a first-order untyped language. Moreover, while the idea of cache-transfer-style is similar, it is unclear whether her approach to incrementalization would extend to higher-order programs. Firsov and Jeltsch [9] also approach incrementalization by code transformation, but their approach does not deal with changes to functions. Instead of transforming functions written in terms of primitives, they provide combinators to write CTS functions and derivatives together. On the other hand, they extend their approach to support mutable caches, while restricting to immutable ones, as we do, might lead to a logarithmic slowdown.
Finite differencing Incremental computation on collections or databases by finite differencing has a long tradition [22,6]. The most recent and impressive line of work is the one on DBToaster [16,17], which is a highly efficient approach to incrementalize queries over bags by combining iterated finite differencing with other program transformations. They show asymptotic speedups both in theory and through experimental evaluations. Changes are only allowed for datatypes that form groups (such as bags or certain maps), but not for instance for lists or sets. Similar ideas were recently extended to higher-order and nested computation [18], though only for datatypes that can be turned into groups. Koch et al. [18] emphasize that iterated differentiation is necessary to obtain efficient derivatives; however, ANF conversion and remembering intermediate results appear to address the same problem, similarly to the field of automatic differentiation [27].
Logical relations To study correctness of incremental programs we use a logical relation among base values v 1 , updated values v 2 and changes dv . To define a logical relation for an untyped λ-calculus we use a step-indexed logical relation, following Appel and McAllester [5], Ahmed [4]; in particular, our definitions are closest to the ones by Acar et al. [3], who also work with an untyped language, big-step semantics and (a different form of) incremental computation. However, they do not consider first-class changes. Technically, we use environments rather than substitution, and index our big-step semantics differently.

Dynamic incrementalization
The approaches to incremental computation with the widest applicability are in the family of self-adjusting computation [1,2], including its descendant Adapton [14]. These approaches incrementalize programs by combining memoization and change propagation: after creating a trace of base computations, updated inputs are compared with old ones in O(1) to find corresponding outputs, which are updated to account for input modifications. Compared to self-adjusting computation, Adapton only updates results that are demanded. As usual, incrementalization is not efficient on arbitrary programs, but only on programs designed so that input changes produce small changes to the computation trace; refinement type systems have been designed to assist in this task [8,12]. To identify matching inputs, Nominal Adapton [13] replaces input comparisons by pointer equality with first-class labels, enabling more reuse.

Conclusion
We have presented a program transformation which turns a functional program into its derivative and efficiently shares redundant computations between them thanks to a statically computed cache. This work has been mechanized in Coq, and the corresponding proof is available online, together with the case-study material. Although our first practical case studies show promising results, this paper focused on putting CTS differentiation on solid theoretical ground. For the moment, we have only scratched the surface of the incrementalization opportunities opened by CTS primitives and their CTS derivatives: in our opinion, exploring the design space of cache data structures will lead to interesting new results in purely functional incremental programming.

A Comparison with original static differentiation
Cai et al. [7]'s static differentiation for λ-terms is defined instead as follows: Even though the first two cases of their differentiation map into the two cases of our differentiation variant, one may ask where the third case is realized now. Actually, this third case occurs while we transform the base environment E. Indeed, we assume that the closures of the environment of the source program have been extended with a nil change or derivative, as defined by D ι (E). The change value D ι (v ) represents a nil change for v ; if v is a closure, its nil change is also its derivative [7]. Computing D ι (v ) requires a meta-level function nil which computes a nil change for every literal . The existence of such a function is generally assumed in previous work [7], as it is always valid to use the replacement change ! . Yet, since this triggers recomputation, it can be more efficient to pick nil changes appropriate to the domain and to the definition of ⊕.
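The requirement on the meta-level nil function can be stated concretely. A minimal sketch, assuming integer literals with additive changes (so ⊕ = (+)); the names nilInt and oplus are ours:

```haskell
-- A nil change for a literal v must satisfy: v `oplus` nilInt v == v.
-- For integers under additive changes, 0 is such a change, and it is
-- cheaper than the always-valid replacement change !v, which would
-- force downstream recomputation.
nilInt :: Int -> Int
nilInt _ = 0

oplus :: Int -> Int -> Int
oplus = (+)
```

Other domains pick other nil changes, matched to their own definition of ⊕, which is exactly the efficiency consideration discussed above.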

B Description of the semantic rules for λ AL
Rule [SVar] looks variable x up in environment E. The other rules evaluate a let-binding "let y = . . . in t" in environment E: each rule computes y's new value v y (taking m steps) and evaluates the body t in n steps to a value v , using environment E extended by binding y to v y . The overall let-binding evaluates to v in m + n + 1 steps. The rules differ in how they compute the value of y.
[STuple] looks each variable in x up in E to evaluate tuple (x ) (in m = 0 steps). [SPrimitiveCall] evaluates function calls where variable f is bound in E to a primitive p, evaluated as specified by a function δ p (-) from closed values to closed values. To evaluate such a primitive call, this rule applies δ p (-) to x 's value (in m = 0 steps). [SClosureCall] evaluates function calls where variable f is bound in E to closure E f [λx . t f ]: this rule evaluates the closure body t f in m steps, using the closure environment E f extended with the value of x in E.

D CTS translation of environments
Translation of value environments F = T (E)

E Target language semantics
(Figure: evaluation of base terms F M ⇓ (V, Vc); evaluation of caches F C ⇓ Vc; evaluation of change terms dF dM ⇓ (dV , Vc); binding of caches Vc ∼ C → dF .)

The rules for the evaluation of λ CAL change terms are similar to the evaluation rules of λ AL change terms, except that (i) caches are carried along the computation and updated at return sites; (ii) base values are not recomputed, since they are now extracted from the caches. Rule [TDResult] returns the final change value of a computation, as well as an updated cache resulting from the evaluation of the cache update term dC . Rule [TDTuple] resembles its counterpart in the source language, but the tuple for y is not built, as it has already been pushed into the environment by the cache. As for λ AL , there are three rules to deal with let-bindings, depending on the shape of the change bound to df in the environment: -If df is bound to a replacement, the rule [TDReplaceCall] applies. In that case, we reevaluate the function call in the updated environment dF 2 (defined similarly as in the source language). This evaluation leads to a new value V ′ which replaces the original one, as well as an updated cache for c y fx . -If df is bound to a nil change and f is bound to primitive p, the rule [TDPrimitiveNil] applies. The derivative of p is invoked with the value of x , its change value and the cache of the original call to p. The semantics of p's derivative is given by the builtin function ∆ p (-), as in the source language.
-If df is bound to a closure change and f is bound to a closure, the rule [TDClosureChange] applies. The body of the closure change is evaluated under the closure change environment extended with the value of the formal argument x , its change dx , and the environment resulting from the binding of the original cache value to the variables occurring in the cache C. This evaluation leads to a change and an updated cache, bound in the environment to continue with the evaluation of the rest of the term.

F Soundness proof of CTS conversion
This section gives complementary technical details about the homonymous section of the paper. The proof requires a judgment relating source change environments to the CTS change environments that appear during the evaluation of λ CAL change terms under a context. Assuming that dF 0 is a change environment with no cache, this judgment is written dF dF 0 ↑ C → k dF k and is read "Under the change environment dF , the cache-less change environment dF 0 is completed into the change environment dF k with the caches produced by C in the base computation if k = 1 or in the updated computation if k = 2".
The rules for this judgment are given in Fig. 12. [CompleteNil] is a straightforward base case. [CompleteTuple] does not introduce any cache, since only function calls produce caches. [CompleteWithBaseCall] (resp. [CompleteWithUpdatedCall]) computes a cache V c by evaluating @(f , x ) under the base (resp. updated) environment dF 1 (resp. dF 2 ). In [CompleteWithUpdatedCall], one might wonder why the recursive call which evaluates the remaining context C is not performed under an environment where y = V ′. Actually, this is consistent with the fact that in the CTS derivatives base values are only updated at the end of the computation, when the updated cache is returned. This is necessary since derivative calls in C still need values from the base computation, not the ones from the updated computation. Thanks to this judgment, we can state the following key technical lemma, whose proof represents a large part of our Coq formalization.
Lemma F.1 (Evaluation of derivatives commutes with evaluation of CTS derivatives).
Let t = C[t ′] be a term and dE be a change environment. If the following hypotheses hold: … The first hypotheses concern the base (resp. updated) computations. The hypothesis (5) (resp. (6)) evaluates t under the base (resp. updated) change environment equipped with caches, to get the cached values for the whole base (resp. updated) computation. Finally, the hypothesis (7) extracts from V c only the base cached values that are needed to evaluate D t (t ′): the operation V ↑|C| c removes the |C| first elements from the cache V c . Indeed, these cached values have already been updated by the evaluation of C and are present in dF 2 . The conclusion of this lemma states that T (dE ) (the CTS version of dE ), extended with F 0 , the not-yet-updated cache values from the base computation, and with the updated change environment dF 2 , is a sound environment to evaluate the CTS derivative to the CTS version of dv and to the updated cache V ′ c .