# A Fistful of Dollars: Formalizing Asymptotic Complexity Claims via Deductive Program Verification

## Abstract

We present a framework for simultaneously verifying the functional correctness and the worst-case asymptotic time complexity of higher-order imperative programs. We build on top of Separation Logic with Time Credits, embedded in an interactive proof assistant. We formalize the *O* notation, which is key to enabling modular specifications and proofs. We cover the subtleties of the multivariate case, where the complexity of a program fragment depends on multiple parameters. We propose a way of integrating complexity bounds into specifications, present lemmas and tactics that support a natural reasoning style, and illustrate their use with a collection of examples.

## 1 Introduction

A program or program component whose functional correctness has been verified might nevertheless still contain complexity bugs: that is, its performance, in some scenarios, could be much poorer than expected.

Indeed, many program verification tools only guarantee partial correctness, that is, do not even guarantee termination, so a verified program could run forever. Some program verification tools do enforce termination, but usually do not allow establishing an explicit complexity bound. Tools for automatic complexity inference can produce complexity bounds, but usually have limited expressive power.

In practice, many complexity bugs are revealed by testing. Some have also been detected during ordinary program verification, as shown by Filliâtre and Letouzey [14], who find a violation of the balancing invariant in a widely-distributed implementation of binary search trees. Nevertheless, none of these techniques can guarantee, with a high degree of assurance, the absence of complexity bugs in software.

*i*, *j*) takes linear time. It would be embarrassing if such faulty code were deployed, as it would aggravate benevolent users and possibly allow malicious users to mount denial-of-service attacks.

As illustrated above, complexity bugs can affect execution time, but could also concern space (including heap space, stack space, and disk space) or other resources, such as the network, energy, and so on. In this paper, for simplicity, we focus on execution time only. That said, much of our work is independent of which resource is considered. We expect that our techniques could be adapted to verify asymptotic bounds on the use of other non-renewable resources, such as the network.

We work with a simple model of program execution, where certain operations, such as calling a function or entering a loop body, cost one unit of time, and every other operation costs nothing. Although this model is very remote from physical running time, it is independent of the compiler, operating system, and hardware [18, 24] and still allows establishing asymptotic time complexity bounds, and therefore, detecting complexity bugs—situations where a program is asymptotically slower than it should be.

In prior work [11], the second and third authors present a method for verifying that a program satisfies a specification that includes an explicit bound on the program’s worst-case, amortized time complexity. They use Separation Logic with Time Credits, a simple extension of Separation Logic [23] where the assertion \(\$1\) represents a permission to perform one step of computation, and is consumed when exercised. The assertion \(\$n\) is a separating conjunction of *n* such time credits. Separation Logic with Time Credits is implemented in the second author’s interactive verification framework, CFML [9, 10], which is embedded in the Coq proof assistant.

Using CFML, the second and third authors verify the correctness and time complexity of an OCaml implementation of the Union-Find data structure [11]. However, their specifications involve *concrete* cost functions: for instance, the precondition of the function *find* indicates that calling *find* requires and consumes \(\$(2\alpha (n)+4)\), where *n* is the current number of elements in the data structure, and where \(\alpha \) denotes an inverse of Ackermann’s function. We would prefer the specification to give the *asymptotic* complexity bound \(O(\alpha (n))\), which means that, for *some* function \(f\in O(\alpha (n))\), calling *find* requires and consumes \(\$f(n)\). This is the purpose of this paper.

We argue that the use of asymptotic bounds, such as \(O(\alpha (n))\), is necessary for (verified or unverified) complexity analysis to be applicable at scale. At a superficial level, it reduces clutter in specifications and proofs: *O*(*mn*) is more compact and readable than \(3mn + 2n \log n + 5n + 3m + 2\). At a deeper level, it is crucial for stating modular specifications, which hide the details of a particular implementation. Exposing the fact that *find* costs \(2\alpha (n)+4\) is undesirable: if a tiny modification of the Union-Find module changes this cost to \(2\alpha (n)+5\), then all direct and indirect clients of the Union-Find module must be updated, which is intolerable. Furthermore, sometimes, the constant factors are unknown anyway. Applying the Master Theorem [12] to a recurrence equation only yields an order of growth, not a concrete bound. Finally, for most practical purposes, no critical information is lost when concrete bounds such as \(2\alpha (n)+4\) are replaced with asymptotic bounds such as \(O(\alpha (n))\). Indeed, the number of computation steps that take place at the source level is related to physical time only up to a hardware- and compiler-dependent constant factor. The use of asymptotic complexity in the analysis of algorithms, initially advocated by Hopcroft and by Tarjan, has been widely successful and is nowadays standard practice.

One must be aware of several limitations of our approach. First, it is not a worst-case execution time (WCET) analysis: it does not yield bounds on actual physical execution time. Second, it is not fully automated. We place emphasis on expressiveness, as opposed to automation. Our vision is that verifying the functional correctness *and* time complexity of a program, at the same time, should not involve much more effort than verifying correctness alone. Third, we control only the growth of the cost as the parameters grow large. A loop that counts up from 0 to \(2^{60}\) has complexity *O*(1), even though it typically won’t terminate in a lifetime. Although this is admittedly a potential problem, traditional program verification falls prey to analogous pitfalls: for instance, a program that attempts to allocate and initialize an array of size (say) \(2^{48}\) can be proved correct, even though, on contemporary desktop hardware, it will typically fail by lack of memory. We believe that there is value in our approach in spite of these limitations.

Reasoning and working with asymptotic complexity bounds is not as simple as one might hope. As demonstrated by several examples in Sect. 2, typical paper proofs using the \(O\) notation rely on informal reasoning principles which can easily be abused to prove a contradiction. Of course, using a proof assistant steers us clear of this danger, but implies that our proofs cannot be quite as simple and perhaps cannot have quite the same structure as their paper counterparts.

A key issue that we run against is the handling of existential quantifiers. According to what was said earlier, the specification of a sorting algorithm, say *mergesort*, should be, roughly: “there exists a cost function \(f\in O(\lambda n.n\log n)\) such that *mergesort* is content with \(\$f(n)\), where *n* is the length of the input list.” Therefore, the very first step in a naïve proof of *mergesort* must be to exhibit a witness for *f*, that is, a concrete cost function. An appropriate witness might be \(\lambda n.2n\log n\), or \(\lambda n.n\log n+3\), who knows? This information is not available up front, at the very *beginning* of the proof; it becomes available only *during* the proof, as we examine the code of *mergesort*, step by step. It is not reasonable to expect the human user to guess such a witness. Instead, it seems desirable to *delay* the production of the witness and to *gradually* construct a cost expression as the proof progresses. In the case of a nonrecursive function, such as *insertionsort*, the cost expression, once fully synthesized, yields the desired witness. In the case of a recursive function, such as *mergesort*, the cost expression yields the body of a recurrence equation, whose solution is the desired witness.

1. We formalize \(O\) as a binary *domination* relation between functions of type \(A \rightarrow \mathbb {Z}\), where the type *A* is chosen by the user. Functions of several variables are covered by instantiating *A* with a product type. We contend that, in order to define what it means for \(a \in A\) to “grow large”, or “tend towards infinity”, the type *A* must be equipped with a filter [6], that is, a quantifier \(\mathbb {U}a.P\). (Eberl [13] does so as well.) We propose a library of lemmas and tactics that can prove nonnegativeness, monotonicity, and domination assertions (Sect. 3).

2. We propose a standard style of writing specifications, in the setting of the CFML program verification framework, so that they integrate asymptotic time complexity claims (Sect. 4). We define a predicate which imposes this style and incorporates a few important technical decisions, such as the fact that every cost function must be nonnegative and nondecreasing.

3. We propose a methodology, supported by a collection of Coq tactics, to prove such specifications (Sect. 5). Our tactics, which heavily rely on Coq metavariables, help gradually synthesize cost expressions for straight-line code and conditionals, and help construct the recurrence equations involved in the analysis of recursive functions, while delaying their resolution.

4. We present several classic examples of complexity analyses (Sect. 6), including: a simple loop in \(O(n.2^n)\), nested loops in \(O(n^3)\) and *O*(*nm*), binary search in \(O(\log n)\), and Union-Find in \(O(\alpha (n))\).

Our code can be found online in the form of two standalone Coq libraries and a self-contained archive [16].

## 2 Challenges in Reasoning with the \(O\) Notation

*O*(1)”, “that code has asymptotic complexity *O*(*n*)”, and so on. Yet, these assertions are too informal: they do not have sufficiently precise meaning, and can be easily abused to produce flawed paper proofs.

A striking example appears in Fig. 2, which shows how one might “prove” that a recursive function has complexity *O*(1), whereas its actual cost is *O*(*n*). The flawed proof exploits the (valid) relation \(O(1) + O(1) = O(1)\), which means that a sequence of two constant-time code fragments is itself a constant-time code fragment. The flaw lies in the fact that the \(O\) notation hides an existential quantification, which is inadvertently swapped with the universal quantification over the parameter *n*. Indeed, the claim is that “there exists a constant *c* such that, for every *n*, the function applied to *n* runs in at most *c* computation steps”. However, the proposed proof by induction establishes a much weaker result, to wit: “for every *n*, there exists a constant *c* such that the function applied to *n* runs in at most *c* steps”. This result is certainly true, yet does not entail the claim.
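The quantifier swap can be made concrete with a small experiment. The sketch below is not the code of Fig. 2; it assumes a hypothetical recursive function `count_down` that makes one recursive call per unit of its argument, and it counts one step per call, in the spirit of the cost model used in this paper.

```python
def count_down(n, counter):
    """Hypothetical recursive function (an assumption, not the code of Fig. 2)."""
    counter[0] += 1              # entering the function body costs one step
    if n > 0:
        count_down(n - 1, counter)


def steps(n):
    """Number of steps consumed by a call on argument n."""
    counter = [0]
    count_down(n, counter)
    return counter[0]


# "For every n, there exists c": the bound c = n + 1 works, but it depends on n.
assert all(steps(n) <= n + 1 for n in range(50))

# "There exists c such that, for every n": any fixed candidate c is exceeded at n = c.
for c in (1, 10, 100):
    assert steps(c) > c
```

The first assertion is the weak statement that the flawed induction actually establishes; the loop shows why no constant bound can exist for this function.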

An example of a different nature appears in Fig. 3. There, the auxiliary function takes two integer arguments *n* and *m* and involves two nested loops, over the intervals [1, *n*] and [1, *m*]. Its asymptotic complexity is \(O(n + nm)\), which, *under the hypothesis that m is large enough*, can be simplified to *O*(*nm*). The reasoning, thus far, is correct. The flaw lies in our attempt to substitute 0 for *m* in the bound *O*(*nm*). Because this bound is valid only for sufficiently large *m*, it does not make sense to substitute a specific value for *m*. In other words, from the fact that “the auxiliary function costs *O*(*nm*) when *n* and *m* are sufficiently large”, one *cannot* deduce anything about the cost of a call where *m* is 0. To repair this proof, one must take a step back and prove that the auxiliary function has asymptotic complexity \(O(n + nm)\) *for sufficiently large n and for every m.* This fact *can* be instantiated with \(m=0\), allowing one to correctly conclude that such a call costs *O*(*n*). We come back to this example in Sect. 3.3.
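The gap between \(O(n + nm)\) and *O*(*nm*) at \(m = 0\) can be observed by counting loop iterations directly. The following is a minimal sketch, assuming that each entry into a loop body costs one step; the name `nested_loops_cost` is hypothetical and the code is a Python stand-in, not the OCaml code of Fig. 3.

```python
def nested_loops_cost(n, m):
    """Count loop-body entries of two nested loops over [1, n] and [1, m]."""
    ticks = 0
    for _ in range(1, n + 1):
        ticks += 1                  # entering the outer loop body
        for _ in range(1, m + 1):
            ticks += 1              # entering the inner loop body
    return ticks


# The exact count is n + n*m.
assert all(nested_loops_cost(n, m) == n + n * m
           for n in range(8) for m in range(8))

# When m >= 1, the count is within a constant factor of n*m ...
assert all(nested_loops_cost(n, m) <= 2 * n * m
           for n in range(8) for m in range(1, 8))

# ... but at m = 0 the count is n, while n*m is 0: no constant factor can help.
assert nested_loops_cost(5, 0) == 5
```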

*f*(*m*, *n*, *i*) represents the true cost of the *i*-th loop iteration and *g*(*m*, *n*, *i*) represents an asymptotic bound on *f*(*m*, *n*, *i*): *f*(*m*, *n*, 0) is \(2^n\) and *f*(*m*, *n*, *i*) when \(i > 0\) is *ni*, while *g*(*m*, *n*, *i*) is just *ni*. The left-hand side of the above implication holds, but the right-hand side does not, as \(2^n + \sum _{i=1}^{m-1}\,ni\) is \(O(2^n + nm^2)\), not \(O(nm^2)\). The Summation lemma presented later on in this paper (Lemma 8) rules out the problem by adding the requirement that *f* be a nondecreasing function of the loop index *i*. We discuss in depth later on (Sect. 4.5) why cost functions should and can be monotonic.
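The counter-example can be checked numerically. The sketch below uses the functions *f* and *g* exactly as described in the text and assumes, as the sum \(2^n + \sum _{i=1}^{m-1}\,ni\) suggests, that the loop index ranges over \(0, \ldots , m-1\). The helper names `total` and `ratio` are introduced here for illustration only.

```python
def f(m, n, i):
    """True cost of the i-th iteration: 2^n at i = 0, and n*i afterwards."""
    return 2 ** n if i == 0 else n * i


def g(m, n, i):
    """Candidate asymptotic bound on the cost of the i-th iteration."""
    return n * i


def total(h, m, n):
    """Sum of h(m, n, i) for i ranging over 0 .. m-1 (assumed loop range)."""
    return sum(h(m, n, i) for i in range(m))


# Pointwise, f and g agree as soon as i >= 1, so f is dominated by g.
assert all(f(m, n, i) == g(m, n, i)
           for m in range(1, 8) for n in range(1, 8) for i in range(1, m))

# The sum of g stays below n * m^2 ...
assert all(total(g, k, k) <= k ** 3 for k in range(1, 60))


# ... but the sum of f, divided by n * m^2, keeps growing along n = m = k.
def ratio(k):
    return total(f, k, k) / k ** 3


assert ratio(10) < ratio(20) < ratio(40)
```

Note that *f* is not nondecreasing in *i*: it drops from \(2^n\) at \(i = 0\) to *n* at \(i = 1\), which is precisely what the monotonicity requirement forbids.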

The examples that we have presented show that the informal reasoning style of paper proofs, where the \(O\) notation is used in a loose manner, is unsound. One cannot hope, in a formal setting, to faithfully mimic this reasoning style. In this paper, we do assign \(O\) specifications to functions, because we believe that this style is elegant, modular and scalable. However, during the analysis of a function body, we abandon the \(O\) notation. We first synthesize a cost expression for the function body, then check that this expression is indeed dominated by the asymptotic bound that appears in the specification.

## 3 Formalizing the \(O\) Notation

### 3.1 Domination

In many textbooks, the fact that *f* is bounded above by *g* asymptotically, up to constant factor, is written “\(f = O(g)\)” or “\(f \in O(g)\)”. However, the former notation is quite inappropriate, as it is clear that “\(f = O(g)\)” cannot be literally understood as an equality. Indeed, if it truly were an equality, then, by symmetry and transitivity, \(f_1=O(g)\) and \(f_2=O(g)\) would imply \(f_1=f_2\). The latter notation makes much better sense: *O*(*g*) is then understood as a set of functions. This approach has in fact been used in formalizations of the \(O\) notation [3]. Yet, in this paper, we prefer to think directly in terms of a *domination* preorder between functions. Thus, instead of “\(f \in O(g)\)”, we write \(f \preceq g\).

Although the \(O\) notation is often defined in the literature only in the special case of functions whose domain is \(\mathbb {N}\), \(\mathbb {Z}\) or \(\mathbb {R}\), we must define domination in the general case of functions whose domain is an arbitrary type *A*. By later instantiating *A* with a product type, such as \(\mathbb {Z}^k\), we get a definition of domination that covers the multivariate case. Thus, let us fix a type *A*, and let *f* and *g* inhabit the function type \(A \rightarrow \mathbb {Z}\).^{1}

*A*, it turns out, is not quite enough. In addition, the type *A* must be equipped with a *filter* [6]. To see why that is the case, let us work towards the definition of domination. As is standard, we wish to build a notion of “growing large enough” into the definition of domination. That is, instead of requiring a relation of the form \(|f(x)| \le c \; |g(x)|\) to be “everywhere true”, we require it to be “ultimately true”, that is, “true when *x* is large enough”.^{2} Thus, \(f \preceq g\) should mean, roughly:

“up to a constant factor, ultimately, \(|f|\) is bounded above by \(|g|\).”

That is, somewhat more formally:

“for some *c*, for every sufficiently large *x*, \(|f(x)| \le c \, |g(x)|\)”

In mathematical notation, we would like to write: \( \exists c.\; \mathbb {U}x.\; |f(x)| \le c \, |g(x)| \). For such a formula to make sense, we must define the meaning of the formula \(\mathbb {U}x.P\), where *x* inhabits the type *A*. This is the reason why the type *A* must be equipped with a filter \(\mathbb {U}\), which intuitively should be thought of as a quantifier, whose meaning is “ultimately”. Let us briefly defer the definition of a filter (Sect. 3.2) and sum up what has been explained so far:

## Definition 1

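**(Domination).** Let *A* be a filtered type, that is, a type *A* equipped with a filter \(\mathbb {U}_{A}\). The relation \(\preceq _{A}\) on \(A \rightarrow \mathbb {Z}\) is defined as follows: \(f \,\preceq _{A}\, g \;\equiv \; \exists c.\; \mathbb {U}_{A}\, x.\; |f(x)| \le c \, |g(x)|\).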
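To build intuition for the shape \(\exists c.\; \mathbb {U}x.\; |f(x)| \le c \, |g(x)|\), one can test a candidate constant and a candidate threshold on a finite window of integers. The sketch below is illustrative only: it assumes the order filter on \(\mathbb {Z}\), the helper name `dominated_on_window` is hypothetical, and a finite check is evidence, not a proof.

```python
def dominated_on_window(f, g, c, x0, upto):
    """Check |f(x)| <= c * |g(x)| for every integer x in [x0, upto)."""
    return all(abs(f(x)) <= c * abs(g(x)) for x in range(x0, upto))


def linear(x):
    return 3 * x + 5


def identity(x):
    return x


def square(x):
    return x * x


# 3x + 5 <= 4x holds exactly when x >= 5: the bound is "ultimately true".
assert dominated_on_window(linear, identity, c=4, x0=5, upto=1000)

# The same inequality is not "everywhere true": it fails at x = 0.
assert not dominated_on_window(linear, identity, c=4, x0=0, upto=1000)

# x^2 is not dominated by x: a candidate constant such as 100 fails once x > 100.
assert not dominated_on_window(square, identity, c=100, x0=0, upto=1000)
```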

### 3.2 Filters

Whereas \(\forall x.P\) means that *P* holds of *every* *x*, and \(\exists x.P\) means that *P* holds of *some* *x*, the formula \(\mathbb {U}x.P\) should be taken to mean that *P* holds of every *sufficiently large* *x*, that is, *P* *ultimately* holds.

The formula \(\mathbb {U}x.P\) is short for \(\mathbb {U}\;(\lambda x.P)\). If *x* ranges over some type *A*, then \(\mathbb {U}\) must have type \(\mathcal {P}(\mathcal {P}(A))\), where \(\mathcal {P}(A)\) is short for \(A\rightarrow \text {Prop}\). To stress this better, although Bourbaki [6] states that a filter is “a set of subsets of *A*”, it is crucial to note that \(\mathcal {P}(\mathcal {P}(A))\) is the type of a quantifier in higher-order logic.

## Definition 2

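**(Filter).** A *filter* [6] on a type *A* is an object \(\mathbb {U}\) of type \(\mathcal {P}(\mathcal {P}(A))\) that enjoys the following four properties, where \(\mathbb {U}x. P\) is short for \(\mathbb {U}\; (\lambda x.P)\):

- (1) \((\forall x.\; P_1 \Rightarrow P_2) \Rightarrow \mathbb {U}x.P_1 \Rightarrow \mathbb {U}x.P_2\)
- (2a) \(\mathbb {U}x.P_1 \wedge \mathbb {U}x.P_2 \Rightarrow \mathbb {U}x.(P_1 \wedge P_2)\)
- (2b) \(\mathbb {U}x.\text {True}\)
- (3) \(\mathbb {U}x.P \Rightarrow \exists x.P\)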

Properties (1)–(3) are intended to ensure that the intuitive reading of \(\mathbb {U}x.P\) as: “for sufficiently large *x*, *P* holds” makes sense. Property (1) states that if \(P_1\) implies \(P_2\) and if \(P_1\) holds when *x* is large enough, then \(P_2\), too, should hold when *x* is large enough. Properties (2a) and (2b), together, state that if each of \(P_1, \ldots , P_k\) independently holds when *x* is large enough, then \(P_1, \ldots , P_k\) should simultaneously hold when *x* is large enough. Properties (1) and (2b) together imply \(\forall x.P \Rightarrow \mathbb {U}x.P\). Property (3) states that if *P* holds when *x* is large enough, then *P* should hold of some *x*. In classical logic, it would be equivalent to \(\lnot (\mathbb {U}x.\text {False})\).
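On a finite carrier, properties (1), (2a), (2b) and (3) can be checked by brute-force enumeration of all predicates. The sketch below is a toy model, not part of the Coq library: predicates are represented as Python `frozenset`s, a candidate filter is a function from predicates to booleans, and all names are hypothetical.

```python
from itertools import combinations


def subsets(universe):
    """All predicates on a finite carrier, represented as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_filter(U, universe):
    """Check properties (1), (2a), (2b) and (3) by enumeration."""
    universe = frozenset(universe)
    preds = subsets(universe)
    covariant = all(U(p2) for p1 in preds for p2 in preds
                    if p1 <= p2 and U(p1))                    # property (1)
    binary_meet = all(U(p1 & p2) for p1 in preds for p2 in preds
                      if U(p1) and U(p2))                     # property (2a)
    contains_true = U(universe)                               # property (2b)
    nonempty = all(len(p) > 0 for p in preds if U(p))         # property (3)
    return covariant and binary_meet and contains_true and nonempty


T = frozenset({0, 1, 2})


def universal(p):
    """'Everywhere true': the predicate holds of every element."""
    return p == T


def order(p):
    """'Ultimately true': the predicate holds of every x above some x0."""
    return any(all(x in p for x in T if x >= x0) for x0 in T)


def never(p):
    """Rejects every predicate, so it violates (2b)."""
    return False


def always(p):
    """Accepts every predicate, including the empty one, so it violates (3)."""
    return True


assert is_filter(universal, T)
assert is_filter(order, T)
assert not is_filter(never, T)
assert not is_filter(always, T)
```

On a finite ordered carrier the order filter degenerates to "holds at the maximum element"; the infinite case is where the notion becomes interesting.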

In the following, we let the metavariable *A* stand for a *filtered type*, that is, a pair of a carrier type and a filter on this type. By abuse of notation, we also write *A* for the carrier type. (In Coq, this is permitted by an implicit projection.) We write \(\mathbb {U}_{A}\) for the filter.

### 3.3 Examples of Filters

When \(\mathbb {U}\) is a *universal filter*, \(\mathbb {U}x.Q(x)\) is (by definition) equivalent to \(\forall x.Q(x)\). Thus, a predicate *Q* is “ultimately true” if and only if it is “everywhere true”. In other words, the universal quantifier is a filter.

## Definition 3

**(Universal filter).** Let *T* be a nonempty type. Then \(\lambda Q.\forall x.Q(x)\) is a filter on *T*.

When \(\mathbb {U}\) is the *order filter* associated with the ordering \(\le \), the formula \(\mathbb {U}x.Q(x)\) means that, when *x* becomes sufficiently large with respect to \(\le \), the property *Q*(*x*) becomes true.

## Definition 4

**(Order filter).** Let \((T, \mathord \le )\) be a nonempty ordered type, such that every two elements have an upper bound. Then \(\lambda Q.\exists x_0.\forall x \ge x_0.\; Q(x)\) is a filter on *T*.

The order filter associated with the ordered type \((\mathbb {Z}, \le )\) is the most natural filter on the type \(\mathbb {Z}\). Equipping the type \(\mathbb {Z}\) with this filter yields a filtered type, which, by abuse of notation, we also write \(\mathbb {Z}\). Thus, the formula \(\mathbb {U}_{\mathbb {Z}}\, x.Q(x)\) means that *Q*(*x*) becomes true “as *x* tends towards infinity”.

*product filter* is the most natural construction, although there are others:

## Definition 5

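**(Product filter).** Let \(A_1\) and \(A_2\) be filtered types. Then \(\lambda Q.\; \exists Q_1, Q_2.\; (\mathbb {U}_{A_1}\,x_1.\; Q_1) \wedge (\mathbb {U}_{A_2}\,x_2.\; Q_2) \wedge (\forall x_1, x_2.\; Q_1(x_1) \wedge Q_2(x_2) \Rightarrow Q (x_1, x_2))\) is a filter on the product type \(A_1\times A_2\).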

To understand this definition, it is useful to consider the special case where \(A_1\) and \(A_2\) are both \(\mathbb {Z}\). Then, for \(i\in \{1,2\}\), the formula \(\mathbb {U}_{A_i}\,x_i.\; Q_i\) means that the predicate \(Q_i\) contains an infinite interval of the form \([a_i, \infty )\). Thus, the formula \(\forall x_1, x_2.\; Q_1(x_1) \wedge Q_2(x_2) \Rightarrow Q (x_1, x_2)\) requires the predicate *Q* to contain the infinite rectangle \([a_1, \infty ) \times [a_2, \infty )\). Thus, a predicate *Q* on \(\mathbb {Z}^2\) is “ultimately true” w.r.t. the product filter if and only if it is “true on some infinite rectangle”. In Bourbaki’s terminology [6, Chap. 1, Sect. 6.7], the infinite rectangles form a *basis* of the product filter.

We view the product filter as the default filter on the product type \(A_1\times A_2\). Whenever we refer to \(A_1\times A_2\) in a setting where a filtered type is expected, the product filter is intended.

We stress that there are several filters on \(\mathbb {Z}\), including the universal filter and the order filter, and therefore several filters on \(\mathbb {Z}^k\). Therefore, it does not make sense to use the \(O\) notation without specifying which filter one considers. Consider again the auxiliary function in Fig. 3 (Sect. 2). One can prove that it has complexity \(O(nm+n)\) with respect to the standard filter on \(\mathbb {Z}^2\). With respect to *this filter*, this complexity bound is equivalent to *O*(*mn*), as the functions \(\lambda (m,n).mn+n\) and \(\lambda (m,n).mn\) dominate each other. Unfortunately, this *does not allow* deducing anything about the complexity of a call where *m* is 0, since the bound *O*(*mn*) holds only when *n* and *m* grow large. An alternate approach is to prove that the auxiliary function has complexity \(O(nm+n)\) with respect to a stronger filter, namely the product of the standard filter on \(\mathbb {Z}\) and the universal filter on \(\mathbb {Z}\). With respect to *that filter*, the functions \(\lambda (m,n).mn+n\) and \(\lambda (m,n).mn\) are *not* equivalent. This bound *does allow* instantiating *m* with 0 and deducing that such a call has complexity *O*(*n*).
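The dependence on the filter can be illustrated on a finite window. The sketch below assumes that *n* carries the order filter in both cases, while *m* carries either the order filter (so that "ultimately" means an infinite rectangle) or the universal filter (so that "ultimately" means a strip covering every *m*). The helper names and the window size are hypothetical, and a finite check is only suggestive.

```python
def holds_everywhere(pred, m_values, n_values):
    """Check a predicate on every point of a finite grid."""
    return all(pred(m, n) for m in m_values for n in n_values)


def bounded_by(c):
    """The candidate inequality |mn + n| <= c * |mn| for a fixed constant c."""
    return lambda m, n: abs(m * n + n) <= c * abs(m * n)


WINDOW = 50

# Product of two order filters: the inequality holds on the rectangle
# [1, oo) x [0, oo), truncated here to a finite window, with c = 2.
rectangle_ok = holds_everywhere(bounded_by(2), range(1, WINDOW), range(0, WINDOW))


def strip_ok(c, n0):
    """Universal filter on m, order filter on n: the region covers every m."""
    return holds_everywhere(bounded_by(c),
                            range(-WINDOW, WINDOW), range(n0, n0 + WINDOW))


# Every such strip contains points with m = 0 and n >= 1, where n <= c * 0 fails.
strips_fail = not any(strip_ok(c, n0)
                      for c in (1, 10, 1000) for n0 in (0, 1, 100))

assert rectangle_ok
assert strips_fail
```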

### 3.4 Properties of Domination

Many properties of the domination relation can be established with respect to an arbitrary filtered type *A*. Here are two example lemmas; there are many more. As before, *f* and *g* range over \(A \rightarrow \mathbb {Z}\). The operators \(f + g\), \(\max (f, g)\) and *f*.*g* denote pointwise sum, maximum, and product, respectively.

## Lemma 6

**(Sum and Max Are Alike).** Assume *f* and *g* are ultimately nonnegative, that is, \(\mathbb {U}_{A}\, x.\; f(x) \ge 0\) and \(\mathbb {U}_{A}\, x.\; g(x) \ge 0\) hold. Then, we have \(\max (f,g) \,\preceq _{A}\, f + g\) and \(f + g \,\preceq _{A}\, \max (f,g)\).
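The reasoning behind this lemma can be sketched as a pointwise inequality; what follows is an informal paper argument, not the Coq proof. At any point *x* where both \(f(x) \ge 0\) and \(g(x) \ge 0\) hold:

```latex
\[
  \max(f(x), g(x)) \;\le\; f(x) + g(x) \;\le\; 2\,\max(f(x), g(x)).
\]
% Since all three quantities are nonnegative at such a point, this reads:
\[
  |\max(f,g)(x)| \le 1 \cdot |(f+g)(x)|
  \qquad\text{and}\qquad
  |(f+g)(x)| \le 2 \cdot |\max(f,g)(x)|.
\]
```

By property (2a), the two nonnegativity hypotheses hold simultaneously for sufficiently large *x*, and by property (1) so do the two inequalities, which gives the two domination claims with constants 1 and 2.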

## Lemma 7

**(Multiplication).** \(f_1 \,\preceq _{A}\, g_1\) and \(f_2 \,\preceq _{A}\, g_2\) imply \(f_1.f_2 \,\preceq _{A}\, g_1.g_2\).
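An informal justification, given here as a sketch and not as the Coq proof: let \(c_1\) and \(c_2\) be the constants provided by the two hypotheses. At any point *x* where both \(|f_1(x)| \le c_1\,|g_1(x)|\) and \(|f_2(x)| \le c_2\,|g_2(x)|\) hold, all four quantities are nonnegative, so the inequalities can be multiplied:

```latex
\[
  |f_1(x)\,f_2(x)| \;=\; |f_1(x)|\,|f_2(x)|
  \;\le\; \bigl(c_1\,|g_1(x)|\bigr)\,\bigl(c_2\,|g_2(x)|\bigr)
  \;=\; (c_1 c_2)\,|g_1(x)\,g_2(x)|.
\]
```

By property (2a), both hypotheses hold simultaneously for sufficiently large *x*, and property (1) transfers the product inequality, so \(c_1 c_2\) is a suitable witness for the conclusion.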

Lemma 7 corresponds to Howell’s Property 5 [19]. Whereas Howell states this property on \(\mathbb {N}^k\), our lemma is polymorphic in the type *A*. As noted by Howell, this lemma is useful when the cost of a loop body is independent of the loop index. In the case where the cost of the *i*-th iteration may depend on the loop index *i*, the following, more complex lemma is typically used instead:

## Lemma 8

**(Summation).** Let *f*, *g* range over \(A \rightarrow \mathbb {Z}\rightarrow \mathbb {Z}\). Let \(i_0\in \mathbb {Z}\). Assume the following three properties:

1. \(\mathbb {U}_{A}\, a.\; \forall i \ge i_0.\; f(a)(i) \ge 0\).

2. \(\mathbb {U}_{A}\, a.\; \forall i \ge i_0.\; g(a)(i) \ge 0\).

3. for every *a*, the function \(\lambda i.f(a)(i)\) is nondecreasing on the interval \([i_0, \infty )\).

Lemma 8 uses the product filter on \(A\times \mathbb {Z}\) in its hypothesis and conclusion. It corresponds to Howell’s property 2 [19]. The variable *i* represents the loop index, while the variable *a* collectively represents all other variables in scope, so the type *A* is usually instantiated with a tuple type (an example appears in Sect. 6).

An important property is the fact that function composition is compatible, in a certain sense, with domination. This allows transforming the parameters under which an asymptotic analysis is carried out (examples appear in Sect. 6). Due to space limitations, we refer the reader to the Coq library for details [16].

### 3.5 Tactics

Our formalization of filters and domination forms a stand-alone Coq library [16]. In addition to many lemmas about these notions, the library proposes automated tactics that can prove nonnegativeness, monotonicity, and domination goals. These tactics currently support functions built out of variables, constants, sums and maxima, products, powers, logarithms. Extending their coverage is ongoing work. This library is not tied to our application to the complexity analysis of programs. It could have other applications in mathematics.

## 4 Specifications with Asymptotic Complexity Claims

In this section, we first present our existing approach to verified time complexity analysis. This approach, proposed by the second and third authors [11], does not use the \(O\) notation: instead, it involves explicit cost functions. We then discuss how to extend this approach with support for asymptotic complexity claims. We find that, even once domination (Sect. 3) is well-understood, there remain nontrivial questions as to the style in which program specifications should be written. We propose one style which works well on small examples and which we believe should scale well to larger ones.

### 4.1 CFML with Time Credits for Cost Analysis

CFML [9, 10] is a system that supports the interactive verification of OCaml programs, using higher-order Separation Logic, inside Coq. It is composed of a trusted standalone tool and a Coq library. The CFML tool transforms a piece of OCaml code into a *characteristic formula*, a Coq formula that describes the semantics of the code. The characteristic formula is then exploited, inside Coq, to state that the code satisfies a certain specification (a Separation Logic triple) and to interactively prove this statement. The CFML library provides a set of Coq tactics that implement the reasoning rules of Separation Logic.

*l*.^{3} The precondition \(\${(|l|+1)}\) asserts that this call requires \(|l|+1\) credits. This triple is proved in a variant of Separation Logic where every function call and every loop iteration consumes one credit. Thus, the above specification guarantees that the execution of the call involves no more than \(|l|+1\) function calls or loop iterations. Our previous paper [11, Definition 2] gives a precise definition of the meaning of triples.

As argued in prior work [11, Sect. 2.7], bounding the number of function calls and loop iterations is equivalent, up to a constant factor, to bounding the number of reduction steps of the program. Assuming that the OCaml compiler is complexity-preserving, this is equivalent, up to a constant factor, to bounding the number of instructions executed by the compiled code. Finally, assuming that the machine executes one instruction in bounded time, this is equivalent, up to a constant factor, to bounding the execution time of the compiled code. Thus, the above specification guarantees that the function runs in linear time.

Instead of understanding Separation Logic with Time Credits as a variant of Separation Logic, one can equivalently view it as standard Separation Logic, applied to an instrumented program, where a dedicated credit-consuming instruction has been inserted at the beginning of every function body and loop body. The proof of the program is carried out under an axiom which imposes the consumption of one time credit at every such instruction. This instruction has no runtime effect: it is just a way of marking where credits must be consumed.

### 4.2 A Modularity Challenge

The above specification guarantees that the function runs in linear time, but does not allow predicting how much real time is consumed by a call to it. Thus, this specification is already rather abstract. Yet, it is still too precise. Indeed, we believe that it would not be wise for a list library to publish a specification of this function whose precondition requires exactly \(|l|+1\) credits, because there are implementations that do not meet this specification. For example, the tail-recursive implementation found in the OCaml standard library, which in practice is more efficient than the naïve implementation shown above, involves exactly \(|l|+2\) function calls, therefore requires \(|l|+2\) credits. By advertising a specification where \(|l|+1\) credits suffice, one makes too strong a guarantee, and rules out the more efficient implementation.
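The difference between \(|l|+1\) and \(|l|+2\) calls can be reproduced with an instrumented sketch. The code below is a Python transliteration written for illustration, assuming that the function under discussion computes the length of a list; it is neither the OCaml code from the paper nor the code of the OCaml standard library. The `tick` callback is a hypothetical stand-in for the consumption of one credit at the start of each function body. List slicing is used for brevity and is not charged, since only function calls are counted in this model.

```python
def naive_length(xs, tick):
    """Naive recursion: one call per element, plus one for the empty list."""
    tick()
    if not xs:
        return 0
    return 1 + naive_length(xs[1:], tick)


def tailrec_length(xs, tick):
    """Accumulator-based variant: a wrapper call, then one call per step."""
    def aux(acc, rest):
        tick()
        if not rest:
            return acc
        return aux(acc + 1, rest[1:])

    tick()                       # the call to the wrapper itself
    return aux(0, xs)


def count_calls(length_fn, xs):
    """Run length_fn on xs and return (result, number of function calls)."""
    calls = [0]

    def tick():
        calls[0] += 1

    result = length_fn(xs, tick)
    return result, calls[0]


for size in range(20):
    xs = list(range(size))
    assert count_calls(naive_length, xs) == (size, size + 1)
    assert count_calls(tailrec_length, xs) == (size, size + 2)
```

Both variants return the same result, yet only the first fits within a budget of \(|l|+1\) credits, which is why a specification with a concrete cost ties clients to one implementation.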

After initially publishing a specification that requires \(\${(|l|+1)}\), one could of course still switch to the more efficient implementation and update the published specification so as to require \(\${(|l|+2)}\) instead of \(\${(|l|+1)}\). However, that would in turn require updating the specification and proof of every (direct and indirect) client of the list library, which is intolerable.

*l*, that is, the cost is \(a \cdot |l|+b\), for some constants *a* and *b*:

This is a better specification, in the sense that it is more modular. The naïve implementation shown earlier and the efficient implementation in OCaml’s standard library both satisfy this specification, so one is free to choose one or the other, without any impact on the clients of the list library. In fact, any reasonable implementation of this function should have linear time complexity and therefore should satisfy this specification.

*concrete cost* of the function, as opposed to the *asymptotic bound*, represented here by the function \(\lambda n.\,n\). This specification asserts that there exists a concrete cost function \( cost \), which is dominated by \(\lambda n.\,n\), such that \( cost (|l|)\) credits suffice to justify the execution of a call on the list *l*. Thus, \( cost (|l|)\) is an upper bound on the actual number of credit-consuming instructions that are executed at runtime.

The above specification informally means that the function has time complexity *O*(*n*) where the parameter *n* represents |*l*|, that is, the length of the list *l*. The fact that *n* represents |*l*| is expressed by applying \( cost \) to |*l*| in the precondition. The fact that this analysis is valid when *n* grows large enough is expressed by using the standard filter on \(\mathbb {Z}\) in the assertion \( cost \,\preceq _{\mathbb {Z}}\, \lambda n.\,n\).
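As a worked instance, any affine cost \(\lambda n.\, a\,n + b\) is dominated by \(\lambda n.\,n\) with respect to the standard filter on \(\mathbb {Z}\). The derivation below is an informal sketch; the constants *a* and *b* are arbitrary integers.

```latex
\[
  |a\,n + b| \;\le\; |a|\,n + |b| \;\le\; (|a| + |b|)\,n
  \qquad \text{for every } n \ge 1 .
\]
% Witness for the existential: c = |a| + |b|.
% "Sufficiently large" here means n >= 1.
```

In particular, the concrete costs \(n + 1\) and \(n + 2\) discussed above both satisfy \( cost \,\preceq _{\mathbb {Z}}\, \lambda n.\,n\), with constants 2 and 3 respectively.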

In general, it is up to the user to choose what the parameters of the cost analysis should be, what these parameters represent, and which filter on these parameters should be used. The example of the Bellman-Ford algorithm (Sect. 6) illustrates this.

### 4.3 A Record for Specifications

This record type is defined in Fig. 5. The first three fields in this record type correspond to what has been explained so far. The first field asserts the existence of a concrete cost function from *A* to \(\mathbb {Z}\), where *A* is a user-specified filtered type. The second field asserts that a certain user-specified property is satisfied; it is typically a Separation Logic triple whose precondition refers to the concrete cost function. The third field asserts that the concrete cost function is dominated by the user-specified asymptotic bound. The need for the last two fields is explained further on (Sects. 4.4 and 4.5).

*O*(

*n*); and the Separation Logic triple, which describes the behavior of Open image in new window , and refers to the concrete cost function Open image in new window .

One key technical point is that Open image in new window is a strong existential, whose witness can be referred to via to the first projection, Open image in new window . For instance, the concrete cost function associated with Open image in new window can be referred to as Open image in new window . Thus, at a call site of the form Open image in new window , the number of required credits is Open image in new window .

In the next subsections, we explain why, in the definition of Open image in new window , we require the concrete cost function to be nonnegative and monotonic. These are design decisions; although these properties may not be strictly necessary, we find that enforcing them greatly simplifies things in practice.
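To make the shape of such a record more tangible, here is a rough Python analogue. It is an illustration under stated assumptions, not the Coq definition of Fig. 5: the class name `CostSpec`, the explicit witnesses `c` and `n0`, and the finite-range checks are all hypothetical, and the Separation Logic triple (the second field) is omitted because it has no executable counterpart here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CostSpec:
    """Hypothetical, finite-sample analogue of a specification record."""
    cost: Callable[[int], int]   # concrete cost function (the existential witness)
    bound: Callable[[int], int]  # user-specified asymptotic bound
    c: int                       # assumed constant factor for domination
    n0: int                      # assumed threshold for domination

    def dominated(self, upto: int) -> bool:
        # |cost(n)| <= c * |bound(n)| for every sampled n >= n0
        return all(abs(self.cost(n)) <= self.c * abs(self.bound(n))
                   for n in range(self.n0, upto + 1))

    def nonneg(self, lo: int, hi: int) -> bool:
        # cost is nonnegative on the sampled range (Sect. 4.4)
        return all(self.cost(n) >= 0 for n in range(lo, hi + 1))

    def monotonic(self, lo: int, hi: int) -> bool:
        # cost is nondecreasing on the sampled range (Sect. 4.5)
        return all(self.cost(n) <= self.cost(n + 1) for n in range(lo, hi))


# A made-up linear cost function, wrapped so that it is nonnegative everywhere.
spec = CostSpec(cost=lambda n: 2 * max(0, n) + 3, bound=lambda n: n, c=5, n0=1)

# At a call site on a list of length 4, the credits required are spec.cost(4).
credits_needed = spec.cost(4)
```

Reading `spec.cost` plays the role of the first projection: a client refers to the concrete cost function without knowing its definition, and relies only on the recorded properties.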

### 4.4 Why Cost Functions Must Be Nonnegative

There are several common occasions where one is faced with the obligation of proving that a cost expression is nonnegative. These proof obligations arise from several sources.

One source is the Separation Logic axiom for splitting credits, whose statement is \(\${(m + n)} = \${m} \,\star \,\${n}\), subject to the side conditions \(m \ge 0\) and \(n \ge 0\). Without these side conditions, out of \(\${0}\), one would be able to create \(\${1}\,\star \,\${(-1)}\). Because our logic is affine, one could then discard \(\${(-1)}\), keeping just \(\${1}\). In short, an unrestricted splitting axiom would allow creating credits out of thin air.^{4} Another source of proof obligations is the Summation lemma (Lemma 8), which requires the functions at hand to be (ultimately) nonnegative.

Now, suppose one is faced with the obligation of proving that a cost expression, obtained by projecting the cost function out of a specification lemma and applying it to some argument, is nonnegative. Because such a lemma is an existential package (a specification record), this is impossible, unless this information has been recorded up front within the record. This is the reason why the nonnegativity field in Fig. 5 is needed.

For simplicity, we require cost functions to be nonnegative everywhere, as opposed to within a certain domain. This requirement is stronger than necessary, but simplifies things, and can easily be met in practice by wrapping cost functions within “\(\max (0,-)\)”. Our Coq tactics automatically insert “\(\max (0,-)\)” wrappers where necessary, making this issue mostly transparent to the user. In the following, for brevity, we write \(c^+\) for \(\max (0,c)\), where \(c\in \mathbb {Z}\).
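The side conditions on the splitting axiom and the \(\max (0,-)\) wrapper can be mimicked in a few lines of Python. This is a toy model only: the function names `pos` and `split_credits` are hypothetical, and credits are modeled as plain integers rather than Separation Logic assertions.

```python
from typing import Tuple


def pos(c: int) -> int:
    """The wrapper written c^+ in the text, i.e. max(0, c)."""
    return max(0, c)


def split_credits(total: int, m: int, n: int) -> Tuple[int, int]:
    """Split `total` credits into parts m and n, enforcing m >= 0 and n >= 0."""
    if m + n != total:
        raise ValueError("m + n must equal the number of credits being split")
    if m < 0 or n < 0:
        raise ValueError("side condition violated: both parts must be nonnegative")
    return m, n


# A legitimate split of 5 credits.
print(split_credits(5, 2, 3))

# Wrapping a possibly negative cost expression makes nonnegativity obvious.
print(pos(-3), pos(4))
```

Without the second check, `split_credits(0, 1, -1)` would succeed, and discarding the negative part would leave one credit created from nothing, which is the unsoundness described above.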

### 4.5 Why Cost Functions Must Be Monotonic

One key reason why cost functions should be monotonic has to do with the “avoidance problem”. When the cost of a code fragment depends on a local variable *x*, can this cost be reformulated (and possibly approximated) in such a way that the dependency is removed? Indeed, a cost expression that makes sense outside the scope of *x* is ultimately required.

The problematic cost expression is typically of the form *E*[|*x*|], where |*x*| represents some notion of the “size” of the data structure denoted by *x*, and *E* is an arithmetic context, that is, an arithmetic expression with a hole. Furthermore, an upper bound on |*x*| is typically available. This upper bound can be exploited if the context *E* is monotonic, i.e., if \(x \le y\) implies \(E[x] \le E[y]\). Because the hole in *E* can appear as an actual argument to an abstract cost function, we must record the fact that this cost function is monotonic.

*O*(|*l*|).

In a formal setting, though, the problem is not so simple. Assume that we have two specification lemmas, one for the filtering function and one for the function that is applied to its result, which describe the behavior of these OCaml functions and guarantee that they have linear-time complexity. For brevity, let us write just *f* for the concrete cost function of the filtering function and *g* for that of the function applied to its result. Also, at the mathematical level, let us write \(l\mathord \downarrow \) for the sublist of the positive elements of the list *l*. It is easy enough to check that the cost of the expression that first computes \(l'\) from *l* and then processes \(l'\) is \(1+f(|l|) + g(|l'|)\). The problem, now, is to *find an upper bound* for this cost *that does not depend on* \(l'\), a local variable, and to verify that this upper bound, *expressed as a function of* |*l*|, is dominated by \(\lambda n.\,n\). Indeed, this is required in order to establish a specification, in the record form of Sect. 4.3, for the enclosing function.

1. Within the scope of \(l'\), the equality \(l' = l\mathord \downarrow \) is available, as it follows from the postcondition of the filtering function. Thus, within this scope, \(1+f(|l|) + g(|l'|)\) is provably equal to \( let \;l'=l\mathord \downarrow \; in \;1+f(|l|) + g(|l'|)\), that is, \(1+f(|l|)+g(|l\mathord \downarrow |)\). This remark may seem promising, as this cost expression does not depend on \(l'\). Unfortunately, this approach falls short, because this cost expression cannot be expressed as the application of a closed function \( cost \) to |*l*|. Indeed, the length of the filtered list, \(|l\mathord \downarrow |\), is not a function of the length of *l*. In short, substituting local variables away in a cost expression does not always lead to a usable cost function.
2. Within the scope of \(l'\), the inequality \(|l'| \le |l|\) is available, as it follows from \(l' = l\mathord \downarrow \). Thus, inequality (A) can be proved, provided we take:

   $$\begin{aligned} cost = \lambda n.\,\max _{0\le n'\le n} \; 1 + f(n) + g(n') \end{aligned}$$

   Furthermore, for this definition of \( cost \), the domination assertion (B) holds as well. The proof relies on the fact that the functions *g* and \(\hat{g}\), where \(\hat{g}\) is \(\lambda n.\,\max _{0\le n'\le n} \; g(n')\) [19], dominate each other. Although this approach seems viable, and does not require the function *g* to be monotonic, it is a bit more complicated than we would like.
3. Let us now assume that the function *g* is monotonic, that is, nondecreasing. As before, within the scope of \(l'\), the inequality \(|l'| \le |l|\) is available. Thus, the cost expression \(1+f(|l|) + g(|l'|)\) is bounded by \(1 + f(|l|) + g(|l|)\). Therefore, inequalities (A) and (B) are satisfied, provided we take:

   $$\begin{aligned} cost = \lambda n.\,1 + f(n) + g(n) \end{aligned}$$

We believe that approach 3 is the simplest and most intuitive, because it allows us to easily eliminate \(l'\), without giving rise to a complicated cost function, and without the need for a running maximum.
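The difference between approaches 1 and 3 can be observed on a small executable model. In the sketch below, the cost functions `f` and `g` are made-up linear functions standing in for the abstract costs of the two library calls; only the shape \(1+f(|l|)+g(|l'|)\) comes from the text.

```python
def f(n: int) -> int:
    """Hypothetical cost of building the sublist of positive elements."""
    return 2 * max(0, n) + 1


def g(n: int) -> int:
    """Hypothetical, monotonic cost of processing that sublist."""
    return max(0, n) + 1


def cost_in_scope(l: list) -> int:
    """The cost expression that still mentions the local variable l'."""
    l_prime = [x for x in l if x > 0]
    return 1 + f(len(l)) + g(len(l_prime))


def cost(n: int) -> int:
    """Approach 3: bound g(|l'|) by g(|l|), using the monotonicity of g."""
    return 1 + f(n) + g(n)


l1 = [1, 2, 3]
l2 = [-1, -2, -3]

# Approach 1 fails: two lists of the same length have different in-scope costs,
# so the in-scope cost is not a function of the length alone.
print(cost_in_scope(l1), cost_in_scope(l2))

# Approach 3 succeeds: a closed function of the length bounds both.
print(cost(len(l1)))
```

Both lists have length 3, yet their in-scope costs differ, whereas `cost(3)` bounds each of them and depends on the length alone.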

However, this approach requires that the cost function *g*, which is extracted from a specification lemma, be monotonic. This explains why we build a monotonicity condition into the definition of the specification record (Fig. 5, last line). Another motivation for doing so is the fact that some lemmas (such as Lemma 8, which allows reasoning about the asymptotic cost of an inner loop) also have monotonicity hypotheses.

The reader may be worried that, in practice, there might exist concrete cost functions that are not monotonic. This may be the case, in particular, of a cost function *f* that is obtained as the solution of a recurrence equation. Fortunately, in the common case of functions of \(\mathbb {Z}\) to \(\mathbb {Z}\), the “running maximum” function \(\hat{f}\) can always be used in place of *f*: indeed, it is monotonic and has the same asymptotic behavior as *f*. Thus, we see that both approaches 2 and 3 above involve running maxima in some places, but their use seems less frequent with approach 3.
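As an illustration of the running maximum, the following sketch builds \(\hat{f}\) from a function *f* on nonnegative integers. The helper name `running_max` and the sample function are hypothetical, and the restriction to \(n \ge 0\) is a simplifying assumption of this sketch.

```python
from typing import Callable


def running_max(f: Callable[[int], int]) -> Callable[[int], int]:
    """Return f_hat, where f_hat(n) is the maximum of f(0), ..., f(n), for n >= 0."""
    def f_hat(n: int) -> int:
        return max(f(k) for k in range(0, n + 1))
    return f_hat


def f(n: int) -> int:
    """A made-up cost function that is linear in growth but not monotonic."""
    return n + (3 if n % 2 == 0 else 0)


f_hat = running_max(f)
print([f(n) for n in range(5)])      # the original function oscillates
print([f_hat(n) for n in range(5)])  # its running maximum is nondecreasing
```

Here *f* oscillates between \(n\) and \(n+3\), while \(\hat{f}\) is nondecreasing and stays between *f* and \(\lambda n.\,n+3\), so both have the same linear asymptotic behavior.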

## 5 Interactive Proofs of Asymptotic Complexity Claims

To prove a specification lemma, such as those mentioned in Sects. 4.3 and 4.4, one must construct a specification record. By the definition of this record type (Fig. 5), this means that one must exhibit a concrete cost function \( cost \) and prove a number of properties of this function, including the fact that, when supplied with \(\${( cost \;\ldots )}\), the code runs correctly and the fact that \( cost \) is dominated by the desired asymptotic bound.

Thus, the very first step in a naïve proof attempt would be to *guess* an appropriate cost function for the code at hand. However, such an approach would be painful, error-prone, and brittle. It seems much preferable, if possible, to enlist the machine’s help in *synthesizing* a cost function *at the same time as we step through the code*—which we have to do anyway, as we must build a Separation Logic proof of the correctness of this code.

Suppose we wish to establish that a simple recursive function, which takes an integer argument *n*, runs in linear time. As argued at the beginning of the paper (Sect. 2, Fig. 2), it does not make sense to attempt a proof by induction on *n* that “a call with argument *n* runs in time *O*(*n*)”. Instead, in a formal framework, we must exhibit a concrete cost function \( cost \) such that \( cost (n)\) credits justify a call with argument *n* and \( cost \) grows linearly, that is, \( cost \,\preceq _{\mathbb {Z}}\, \lambda n.\,n\).

Let us assume that a specification lemma for the auxiliary function called by this code has been established already, so the number of credits required by a call to this auxiliary function is a constant, given by the concrete cost function of this lemma. In the following, we write \(G\) as a shorthand for this constant.

Because this example is very simple, it is reasonably easy to manually come up with an appropriate cost function for this code. One valid guess is \(\lambda n. \; 1 + \varSigma _{i=2}^n (1+G)\). Another valid guess, obtained via a simplification step, is \(\lambda n. \; 1 + (1+G)(n - 1)^+\). Another witness, obtained via an approximation step, is \(\lambda n. \; 1 + (1+G)n^+\). As the reader can see, there is in fact a spectrum of valid witnesses, ranging from verbose, low-level to compact, high-level mathematical expressions. Also, it should be evident that, as the code grows larger, it can become very difficult to guess a valid concrete cost function.
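The three witnesses can be compared on a small executable model. The sketch below assumes a recursion shaped like the cost expression given in Sect. 5.2, namely one credit per call, plus \(G\) credits and a recursive call on \(n-1\) when \(n > 1\). The value chosen for `G` and all function names are hypothetical.

```python
G = 3  # hypothetical constant cost of one call to the auxiliary function


def credits_spent(n: int) -> int:
    """Credits consumed by the assumed recursion on argument n."""
    spent = 1                          # one credit for entering the function
    if n > 1:
        spent += G                     # the auxiliary call
        spent += credits_spent(n - 1)  # the recursive call
    return spent


def pos(c: int) -> int:
    """The wrapper written c^+ in the text."""
    return max(0, c)


def low_level(n: int) -> int:
    """1 + sum over i from 2 to n of (1 + G)."""
    return 1 + sum(1 + G for _ in range(2, n + 1))


def simplified(n: int) -> int:
    """1 + (1 + G) * (n - 1)^+."""
    return 1 + (1 + G) * pos(n - 1)


def approximated(n: int) -> int:
    """1 + (1 + G) * n^+."""
    return 1 + (1 + G) * pos(n)


for n in (0, 1, 2, 10):
    print(n, credits_spent(n), low_level(n), simplified(n), approximated(n))
```

On this model, the first two witnesses coincide with the credits actually spent, whereas the third over-approximates them by \(1+G\) as soon as \(n \ge 1\).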

This gives rise to two questions. Among the valid cost functions, which one is preferable? Which ones can be systematically constructed, without guessing?

Among the valid cost functions, there is a tradeoff. At one extreme, a low-level cost function has exactly the same syntactic structure as the code, so it is easy to prove that it is an upper bound for the actual cost of the code, but a lot of work may be involved in proving that it is dominated by the desired asymptotic bound. At the other extreme, a high-level cost function can be essentially identical to the desired asymptotic bound, up to explicit multiplicative and additive constants, so the desired domination assertion is trivial, but a lot of accounting work may be involved in proving that this function represents enough credits to execute the code. Thus, by choosing a cost function, we shift some of the burden of the proof from one subgoal to another. From this point of view, no cost function seems inherently preferable to another.

From the point of view of systematic construction, however, the answer is more clear-cut. It seems fairly clear that it is possible to systematically build a cost function whose syntactic structure is the same as the syntactic structure of the code. This idea goes at least as far back as Wegbreit’s work [26]. Coming up with a compact, high-level expression of the cost, on the other hand, seems to require human insight.

To provide as much machine assistance as possible, our system mechanically synthesizes a low-level cost expression for a piece of OCaml code. This is done transparently, at the same time as the user constructs a proof of the code in Separation Logic. Furthermore, we take advantage of the fact that we are using an interactive proof assistant: we allow the user to guide the synthesis process. For instance, the user controls how a local variable should be eliminated, how the cost of a conditional construct should be approximated (i.e., by a conditional or by a maximum), and how recurrence equations should be solved. In the following, we present this semi-interactive synthesis process. We first consider straight-line (nonrecursive) code (Sect. 5.1), then recursive functions (Sect. 5.2).

### 5.1 Synthesizing Cost Expressions for Straight-Line Code

To this end, we use specialized variants of the reasoning rules, whose premises and conclusions take the form \(\{\$\,{n} \star H\}\,(e)\,\{Q\}\). Furthermore, to simplify the nonnegativeness side conditions that must be proved while reasoning, we make all cost expressions obviously nonnegative by wrapping them in \(\max (0, -)\). Recall that \(c^+\) stands for \(\max (0,c)\), where \(c\in \mathbb {Z}\). Our reasoning rules work with triples of the form \(\{\$\,{c^+} \star H\}\,(e)\,\{Q\}\). They are shown in Fig. 6.

Because we wish to *synthesize* a cost expression, our Coq tactics maintain the following invariant: whenever the goal is \(\{\$\,{c^+} \star H\}\,(e)\,\{Q\}\), the cost *c* is *uninstantiated*, that is, it is represented in Coq by a metavariable, a placeholder. This metavariable is instantiated when the goal is proved by applying one of the reasoning rules. Such an application produces new subgoals, whose preconditions contain new metavariables. As this process is repeated, a cost expression is incrementally constructed.

The cost-weakening rule is a special case of the consequence rule of Separation Logic. It is typically used once at the root of the proof: even though the initial goal \(\{\$\,{c_1} \star H\}\,(e)\,\{Q\}\) may not satisfy our invariant, because it lacks a \(-^+\) wrapper and because \(c_1\) is not necessarily a metavariable, this rule gives rise to a subgoal \(\{\$\,{c_2^+} \star H\}\,(e)\,\{Q\}\) that satisfies it. Indeed, when this rule is applied, a fresh metavariable \(c_2\) is generated. The rule can also be explicitly applied by the user when desired. It is typically used just before leaving the scope of a local variable *x* to approximate a cost expression \(c_2^+\) that depends on *x* with an expression \(c_1\) that does not refer to *x*.

The rule for sequences is a special case of the rule for let bindings. It states that the cost of a sequence is the sum of the costs of its subexpressions. When this rule is applied to a goal of the form \(\{\$\,{c^+} \star H\}\,(e)\,\{Q\}\), where *c* is a metavariable, two new metavariables \(c_1\) and \(c_2\) are introduced, and *c* is instantiated with \(c_1^+ + c_2^+\).

The rule for let bindings is similar to the rule for sequences, but involves an additional subtlety: the cost \(c_2\) must not refer to the local variable *x*. Naturally, Coq enforces this condition: any attempt to instantiate the metavariable \(c_2\) with an expression where *x* occurs fails. In such a situation, it is up to the user to apply the cost-weakening rule so as to avoid this dependency. The example of Sect. 4.5 illustrates this issue.

The rule for values states that values, in our model, have zero cost. The symbol \(\mathrel {\Vdash }\) denotes entailment between Separation Logic assertions.

The rule for conditionals states that the cost of an OCaml conditional expression is a mathematical conditional expression. Although this may seem obvious, one subtlety lurks here. Using the cost-weakening rule, the cost expression \( if \;b\; then \;c_1\; else \;c_2\) can be approximated by \(\max (c_1,c_2)\). Such an approximation can be beneficial, as it leads to a simpler cost expression, or harmful, as it causes a loss of information. In particular, when carried out in the body of a recursive function, it can lead to an unsatisfiable recurrence equation. We let the user decide whether this approximation should be performed.

One further rule handles the credit-consuming instruction that the CFML tool inserts at the beginning of every function and loop body (Sect. 4.1). This instruction costs one credit.

The rule for for loops states that the cost of a for loop is the sum, over all values of the index *i*, of the cost of the *i*-th iteration of the body. In practice, it is typically used in conjunction with the cost-weakening rule, which allows the user to simplify and approximate the iterated sum \(\varSigma _{a\le i<b} \; c(i)^+\). In particular, if the synthesized cost *c*(*i*) happens to not depend on *i*, or can be approximated so as to not depend on *i*, then this iterated sum can be expressed in the form \(c(b-a)^+\). A variant of the rule for for loops, not shown, covers this common case. There is in principle no need for a primitive treatment of loops, as loops can be encoded in terms of higher-order recursive functions, and our program logic can express the specifications of these combinators. Nevertheless, in practice, primitive support for loops is convenient.
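The syntax-directed flavor of these rules can be imitated by a small cost evaluator over a toy program representation. This is an illustrative sketch and not the Coq tactics themselves: it computes a number for a given environment instead of building a symbolic expression through metavariables, and the class names (`Val`, `OneCredit`, `Seq`, `If`, `For`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Env = Dict[str, int]


def pos(c: int) -> int:
    """The wrapper written c^+ in the text."""
    return max(0, c)


@dataclass
class Val:
    """A value: zero cost."""


@dataclass
class OneCredit:
    """The credit-consuming instruction: costs one credit."""


@dataclass
class Seq:
    first: object
    second: object


@dataclass
class If:
    cond: Callable[[Env], bool]
    then: object
    orelse: object


@dataclass
class For:
    var: str
    lo: Callable[[Env], int]  # inclusive lower bound a
    hi: Callable[[Env], int]  # exclusive upper bound b
    body: object


def cost(e: object, env: Env) -> int:
    """Evaluate the cost of e, following the structure of the code."""
    if isinstance(e, Val):
        return 0
    if isinstance(e, OneCredit):
        return 1
    if isinstance(e, Seq):   # c1^+ + c2^+
        return pos(cost(e.first, env)) + pos(cost(e.second, env))
    if isinstance(e, If):    # a mathematical conditional
        return cost(e.then, env) if e.cond(env) else cost(e.orelse, env)
    if isinstance(e, For):   # sum over a <= i < b of c(i)^+
        return sum(pos(cost(e.body, {**env, e.var: i}))
                   for i in range(e.lo(env), e.hi(env)))
    raise TypeError("unknown construct: %r" % (e,))


# Two dependent nested loops: the inner bound depends on the outer index.
inner = For("j", lambda env: 1, lambda env: env["i"] + 1, OneCredit())
nested = For("i", lambda env: 1, lambda env: env["n"] + 1,
             Seq(OneCredit(), inner))
print(cost(nested, {"n": 4}))
```

The evaluator has exactly one case per construct, which mirrors the fact that the low-level cost expression has the same syntactic structure as the code.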

### 5.2 Synthesizing and Solving Recurrence Equations

*S*(*f*) is the Separation Logic triple that we wish to establish, where *f* stands for an as-yet-unknown cost function. Following common informal practice, we would like to do this in two steps. First, from the code, derive a “recurrence equation” *E*(*f*), which in fact is usually not an equation, but a constraint (or a conjunction of constraints) bearing on *f*. Second, prove that this recurrence equation admits a solution that is dominated by the desired asymptotic cost function *g*. This approach can be formally viewed as an application of the following tautology:

$$\begin{aligned} \forall E.\;\; (\forall f.\,E(f) \rightarrow S(f)) \;\rightarrow \; (\exists f.\,E(f) \wedge f \,\preceq _{}\, g) \;\rightarrow \; (\exists f.\,S(f) \wedge f \,\preceq _{}\, g) \end{aligned}$$

Its conclusion states that the triple holds for some cost function *f* that is dominated by *g*. In Coq, applying this tautology gives rise to a new metavariable *E*, as the recurrence equation is initially unknown, and two subgoals.

During the proof of the first subgoal, \(\forall f.\,E(f) \rightarrow S(f)\), the cost function *f* is abstract (universally quantified), but we are allowed to assume *E*(*f*), where *E* is initially a metavariable. So, should the need arise to prove that *f* satisfies a certain property, this can be done just by instantiating *E*. In the example of the recursive OCaml function of Sect. 5, we prove *S*(*f*) by induction over *n*, under the hypothesis \(n \ge 0\). Thus, we assume that the cost of the recursive call, with argument \(n-1\), is \(f(n-1)\), and must prove that the cost of the call with argument *n* is *f*(*n*). We synthesize the cost of the function body as explained earlier (Sect. 5.1) and find that this cost is \(1 + if \;n \le 1\; then \;0\; else \;(G+ f(n-1))\). We apply the cost-weakening rule of Sect. 5.1 and find that our proof is complete, provided we are able to prove the following inequation:

$$\begin{aligned} 1 + if \;n \le 1\; then \;0\; else \;(G+ f(n-1)) \;\le \; f(n) \end{aligned}$$

We achieve this by instantiating *E* as follows:

$$\begin{aligned} E \;:=\; \lambda f.\;\forall n.\;\; n \ge 0 \;\rightarrow \; 1 + if \;n \le 1\; then \;0\; else \;(G+ f(n-1)) \;\le \; f(n) \end{aligned}$$

We then turn to the second subgoal, \(\exists f.E(f) \wedge f \,\preceq _{}\, g\). The metavariable *E* is now instantiated. The goal is to solve the recurrence and analyze the asymptotic growth of the chosen solution. There are at least three approaches to solving such a recurrence.

First, one can guess a closed form that satisfies the recurrence. For example, the function \(f := \lambda n. \; 1 + (1+G)n^+\) satisfies *E*(*f*) above. But, as argued earlier, guessing is in general difficult and tedious.

Second, one can invoke Cormen *et al.*’s Master Theorem [12] or the more general Akra-Bazzi theorem [1, 21]. Unfortunately, at present, these theorems are not available in Coq, although an Isabelle/HOL formalization exists [13].

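Third, one can follow Cormen *et al.*’s substitution method [12, Sect. 4]. The idea is to guess a parameterized *shape* for the solution; substitute this shape into the goal; gather a set of constraints that the parameters must satisfy for the goal to hold; finally, show that these constraints are indeed satisfiable. In the above example, as we expect the code to have linear time complexity, we propose that the solution *f* should have the shape \(\lambda n.(an^+ + b)\), where *a* and *b* are parameters, about which we wish to gradually accumulate a set of constraints. From a formal point of view, this amounts to applying the following tautology:

$$\begin{aligned} \forall C.\;\; (\forall ab.\; C(a,b) \rightarrow E(\lambda n.(an^+ + b)) \wedge \lambda n.(an^+ + b) \,\preceq _{}\, g) \;\rightarrow \; (\exists ab.\,C(a,b)) \;\rightarrow \; (\exists f.\,E(f) \wedge f \,\preceq _{}\, g) \end{aligned}$$

This application again yields two subgoals. During the proof of the first subgoal, *C* is a metavariable and can be instantiated as desired (possibly in several steps), allowing us to gather a conjunction of constraints bearing on *a* and *b*. During the proof of the second subgoal, *C* is fixed and we must check that it is satisfiable. In our example, where *g* is \(\lambda n.\,n\), the domination conjunct of the first subgoal holds for any *a* and *b*, and, once *E* is unfolded, the remaining conjunct reads:

$$\begin{aligned} \forall n.\;\; n \ge 0 \;\rightarrow \; 1 + if \;n \le 1\; then \;0\; else \;(G+ a(n-1)^+ + b) \;\le \; an^+ + b \end{aligned}$$

Distinguishing the cases \(n = 0\), \(n = 1\), and \(n \ge 2\) shows that this property holds provided \(1 \le b\), \(1 \le a + b\), and \(1 + G \le a\). We therefore prove this subgoal by instantiating *C* with \(\lambda (a,b).(1 \le b \wedge 1 \le a + b \wedge 1 + G \le a)\).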

There remains to check the second subgoal, that is, \(\exists ab.C(a,b)\). This is easy; we pick, for instance, \(a := 1 + G\) and \(b := 1\). This concludes our use of Cormen *et al.*’s substitution method.
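The outcome of the substitution method can be sanity-checked numerically. The sketch below tests, on a finite range, the constraint that the synthesized cost \(1 + if \;n \le 1\; then \;0\; else \;(G+ f(n-1))\) is at most \(f(n)\) for every \(n \ge 0\), with *f* of the shape \(\lambda n.(an^+ + b)\). The value of `G`, the range bounds, and the function names are arbitrary choices made for this illustration.

```python
G = 3  # hypothetical constant cost of the auxiliary call


def pos(c: int) -> int:
    """The wrapper written c^+ in the text."""
    return max(0, c)


def shape(a: int, b: int):
    """The parameterized shape: n maps to a * n^+ + b."""
    return lambda n: a * pos(n) + b


def satisfies_recurrence(f, upto: int = 200) -> bool:
    """Check 1 + (0 if n <= 1 else G + f(n - 1)) <= f(n) for 0 <= n <= upto."""
    return all(1 + (0 if n <= 1 else G + f(n - 1)) <= f(n)
               for n in range(0, upto + 1))


def constraint(a: int, b: int) -> bool:
    """The gathered constraint: 1 <= b, 1 <= a + b, and 1 + G <= a."""
    return 1 <= b and 1 <= a + b and 1 + G <= a


# The parameters picked in the text.
a, b = 1 + G, 1
print(constraint(a, b), satisfies_recurrence(shape(a, b)))

# A slightly smaller slope violates the constraint and the recurrence.
print(constraint(G, 1), satisfies_recurrence(shape(G, 1)))
```

On a small grid of integer parameters, the constraint and the recurrence check agree exactly, which is consistent with the case analysis on \(n = 0\), \(n = 1\), and \(n \ge 2\).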

In summary, by exploiting Coq’s metavariables, we are able to set up our proofs in a style that closely follows the traditional paper style. During a first phase, as we analyze the code, we synthesize a cost function and (if the code is recursive) a recurrence equation. During a second phase, we guess the shape of a solution, and, as we analyze the recurrence equation, we synthesize a constraint on the parameters of the shape. During a last phase, we check that this constraint is satisfiable. In practice, instead of explicitly building and applying tautologies as above, we use the first author’s procrastination library [16], which provides facilities for introducing new parameters, gradually gathering constraints on these parameters, and eventually checking that these constraints are satisfiable.

## 6 Examples

**Binary Search.** We prove that binary search has time complexity \(O{(\log {n})}\), where \(n = j-i\) denotes the width of the search interval [*i*, *j*). The code is as in Fig. 1, except that the flaw on the last line is fixed. As outlined earlier (Sect. 5), we synthesize the following recurrence equation on the cost function *f*:

**Dependent Nested Loops.** Many algorithms involve dependent nested for loops, that is, nested loops where the bounds of the inner loop depend on the outer loop index, as in the following simplified example:

For this code, the cost function \(\lambda n. \; \sum _{i=1}^{n} (1 + \sum _{j=1}^{i} 1)\) is synthesized. There remains to prove that it is dominated by \(\lambda {n.}{n^2}\). We could recognize and prove that this function is equal to \(\lambda n.\frac{n(n+3)}{2}\), which clearly is dominated by \(\lambda {n.}{n^2}\). This works because this example is trivial, but, in general, computing explicit closed forms for summations is challenging, if at all feasible.

A higher-level approach is to exploit the fact that, if *f* is monotonic, then \(\sum _{i=1}^n f(i)\) is at most \(n \cdot f(n)\). Applying this lemma twice, we find that the above cost function is at most \(\lambda n.\sum _{i=1}^{n} (1 + i)\), which is at most \(\lambda n.n(1+n)\), which is dominated by \(\lambda n.n^2\). This simple-minded approach, which does not require the Summation lemma (Lemma 8), is often applicable. The next example illustrates a situation where the Summation lemma is required.
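Both routes can be checked numerically on the synthesized cost function. The sketch below compares the nested sum with the closed form \(n(n+3)/2\) and with the coarser bound \(n(1+n)\) obtained from the monotonicity lemma. The function names are hypothetical, and the checks cover a finite range only.

```python
def synthesized(n: int) -> int:
    """Sum over i from 1 to n of (1 + sum over j from 1 to i of 1)."""
    return sum(1 + sum(1 for _ in range(1, i + 1)) for i in range(1, n + 1))


def closed_form(n: int) -> int:
    """n * (n + 3) / 2, which is always an integer."""
    return n * (n + 3) // 2


def sum_at_most_n_times_last(f, n: int) -> bool:
    """For monotonic f: the sum of f(1), ..., f(n) is at most n * f(n)."""
    return sum(f(i) for i in range(1, n + 1)) <= n * f(n)


for n in (0, 1, 5, 10):
    print(n, synthesized(n), closed_form(n), n * (1 + n))
```

The coarser bound loses a constant factor compared with the closed form, which is harmless since only domination by \(\lambda n.n^2\) is required.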

**A Loop Whose Body Has Exponential Cost.** In the following simple example, the loop body is just a function call:

Thus, the cost of the loop body is not known exactly. Instead, let us assume that a specification for the auxiliary function has been proved and that its cost is \(O(2^i)\), that is, the corresponding domination assertion holds. We then wish to prove that the cost of the whole loop is also \(O(2^n)\).

For this loop, a cost function, namely an iterated sum of the cost of the loop body, is automatically synthesized. We have an asymptotic bound for the cost of the loop body. The side conditions of the Summation lemma (Lemma 8) are met: in particular, the function involved is monotonic. The lemma yields an asymptotic bound for the synthesized sum. Finally, we have \(\lambda n. \; \sum _{i=0}^{n} 2^i = \lambda n. \; 2^{n+1} - 1 \,\preceq _{\mathbb {Z}}\, \lambda n. \; 2^n\).
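The final domination step can be spelled out as a one-line calculation. It exhibits the constant factor 2 explicitly, on the assumption that domination allows an arbitrary constant factor, as in the usual definition of \(O\).

```latex
\sum_{i=0}^{n} 2^i \;=\; 2^{n+1} - 1 \;\le\; 2 \cdot 2^n
\qquad \text{for all } n \ge 0
```

Hence \(\lambda n. \; 2^{n+1} - 1\) is dominated by \(\lambda n. \; 2^n\), even though it is never bounded by \(2^n\) itself when \(n \ge 1\).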

**The Bellman-Ford Algorithm.** We verify the asymptotic complexity of an implementation of the Bellman-Ford algorithm, which computes shortest paths in a weighted graph with *n* vertices and *m* edges. The algorithm involves an outer loop that is repeated \(n-1\) times and an inner loop that iterates over all *m* edges. The specification asserts that the asymptotic complexity is *O*(*nm*).

By exploiting the fact that a graph without duplicate edges must satisfy \(m \le n^2\), we prove that the complexity of the algorithm, viewed as a function of *n*, is \(O(n^3)\).

To prove that the former specification implies the latter, one instantiates *m* with \(n^2\), that is, one exploits a composition lemma (Sect. 3.4). In practice, we publish both specifications and let clients use whichever one is more convenient.
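The two bounds can be observed on a textbook version of the algorithm, instrumented with a step counter. This sketch is not the verified OCaml implementation: it is a plain Python transcription of the loop structure described above, without negative-cycle detection, and the names `bellman_ford` and `steps` are hypothetical.

```python
import math
from typing import List, Tuple


def bellman_ford(n: int, edges: List[Tuple[int, int, int]], source: int):
    """Return shortest distances from source and the number of inner-loop steps."""
    dist = [math.inf] * n
    dist[source] = 0
    steps = 0
    for _ in range(n - 1):          # outer loop: repeated n - 1 times
        for (u, v, w) in edges:     # inner loop: iterates over all m edges
            steps += 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist, steps


# A small graph with n = 4 vertices and m = 4 edges (source, target, weight).
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
dist, steps = bellman_ford(4, edges, 0)
print(dist, steps)
```

The counter equals \((n-1) \cdot m\), and, since this graph has no duplicate edges, it is also bounded by \(n^3\).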

**Union-Find.** Charguéraud and Pottier [11] use Separation Logic with Time Credits to verify the correctness and time complexity of a Union-Find implementation. For instance, they prove that the (amortized) concrete cost of one of its operations is \(2\alpha (n)+4\), where *n* is the number of elements. With a few lines of proof, we derive a specification where the cost of this operation is expressed in the form \(O(\alpha (n))\).

Union-Find is a mutable data structure, whose state is described by an abstract predicate. In particular, one parameter of this predicate represents the domain of the data structure, that is, the set of all elements created so far. Thus, its cardinal corresponds to *n*. This case study illustrates a situation where the cost of an operation depends on the current state of a mutable data structure.

## 7 Related Work

Our work builds on top of Separation Logic [23] with Time Credits [2], which was first implemented in a verification tool and exploited by the second and third authors [11]. We refer the reader to their paper for a survey of the related work in the general area of formal reasoning about program complexity, including approaches based on deductive program verification and approaches based on automatic complexity analysis. In this section, we restrict our attention to informal and formal treatments of the \(O\) notation.

The \(O\) notation and its siblings are documented in several textbooks [7, 15, 20]. Out of these, only Howell [19, 20] draws attention to the subtleties of the multivariate case. He shows that one cannot take for granted that the properties of the \(O\) notation, which in the univariate case are well-known, remain valid in the multivariate case. He states several properties which, at first sight, seem natural and desirable, then proceeds to show that they are inconsistent, so no definition of the \(O\) notation can satisfy them all. He then proposes a candidate notion of domination between functions whose domain is \(\mathbb {N}^k\). His notation, \(f\in \hat{O}(g)\), is defined as the conjunction of \(f\in O(g)\) and \(\hat{f}\in O(\hat{g})\), where the function \(\hat{f}\) is a “running maximum” of the function *f*, and is by construction monotonic. He shows that this notion satisfies all the desired properties, provided some of them are restricted by additional side conditions, such as monotonicity requirements.

In this work, we go slightly further than Howell, in that we consider functions whose domain is an arbitrary filtered type *A*, rather than necessarily \(\mathbb {N}^k\). We give a standard definition of \(O\) and verify all of Howell’s properties, again restricted with certain side conditions. We find that we do not need \(\hat{O}\), which is fortunate, as it seems difficult to define \(\hat{f}\) in the general case where *f* is a function of domain *A*. The monotonicity requirements that we impose are not exactly the same as Howell’s, but we believe that the details of these administrative conditions do not matter much, as all of the functions that we manipulate in practice are everywhere nonnegative and monotonic.

Avigad and Donnelly [3] formalize the \(O\) notation in Isabelle/HOL. They consider functions of type \(A \rightarrow B\), where *A* is arbitrary and *B* is an ordered ring. Their definition of “\(f = O(g)\)” requires \(|f(x)| \le c|g(x)|\) for every *x*, as opposed to “when *x* is large enough”. Thus, they get away without equipping the type *A* with a filter. The price to pay is an overly restrictive notion of domination, except in the case where *A* is \(\mathbb {N}\), where both \(\forall x\) and \(\mathbb {U}x\) yield the same notion of domination—this is Brassard and Bratley’s “threshold rule” [7]. Avigad and Donnelly suggest defining “\(f = O(g) \text { eventually}\)” as an abbreviation for \(\exists f', (f' = O(g) \wedge \mathbb {U}x.f(x) = f'(x))\). In our eyes, this is less elegant than parameterizing *O* with a filter in the first place.

Eberl [13] formalizes the Akra-Bazzi method [1, 21], a generalization of the well-known Master Theorem [12], in Isabelle/HOL. He creates a library of Landau symbols specifically for this purpose. Although his paper does not mention filters, his library in fact relies on filters, whose definition appears in Isabelle’s Complex library. Eberl’s definition of the \(O\) symbol is identical to ours. That said, because he is concerned with functions of type \(\mathbb {N} \rightarrow \mathbb {R}\) or \(\mathbb {R} \rightarrow \mathbb {R}\), he does not define product filters, and does not prove any lemmas about domination in the multivariate case. Eberl sets up a decision procedure for domination goals, like \(x\in O(x^3)\), as well as a procedure that can simplify, say, \(O(x^3+x^2)\) to \(O(x^3)\).

TiML [25] is a functional programming language where types carry time complexity annotations. Its type-checker generates proof obligations that are discharged by an SMT solver. The core type system, whose metatheory is formalized in Coq, employs concrete cost functions. The TiML implementation allows associating an \(O\) specification with each toplevel function. An unverified component recognizes certain classes of recurrence equations and automatically applies the Master Theorem. For instance, *mergesort* is recognized to be \(O(mn\log n)\), where *n* is the input size and *m* is the cost of a comparison. The meaning of the \(O\) notation in the multivariate case is not spelled out; in particular, which filter is meant is not specified.

Boldo *et al.* [4] use Coq to verify the correctness of a C program which implements a numerical scheme for the resolution of the one-dimensional acoustic wave equation. They define an ad hoc notion of “uniform \(O\)” for functions of type \(\mathbb {R}^2 \rightarrow \mathbb {R}\), which we believe can in fact be viewed as an instance of our generic definition of domination, at an appropriate product filter. Subsequent work on the Coquelicot library for real analysis [5] includes general definitions of filters, limits, little-*o* and asymptotic equivalence. A few definitions and lemmas in Coquelicot are identical to ours, but the focus in Coquelicot is on various filters on \(\mathbb {R}\), whereas we are more interested in filters on \(\mathbb {Z}^k\).

The tools RAML [17] and Pastis [8] perform fully automated amortized time complexity analysis of OCaml programs. They can be understood in terms of Separation Logic with Time Credits, under the constraint that the number of credits that exist at each program point must be expressed as a polynomial over the variables in scope at this point. The a priori unknown coefficients of this polynomial are determined by an LP solver. Pastis produces a proof certificate that can be checked by Coq, so the trusted computing base of this approach is about the same as ours. RAML and Pastis offer much stronger automation than our approach, but have weaker expressive power. It would be very interesting to offer access to a Pastis-like automated system within our interactive system.

## Footnotes

- 1. At this time, we require the codomain of *f* and *g* to be \(\mathbb {Z}\). Following Avigad and Donnelly [3], we could allow it to be an arbitrary nondegenerate ordered ring. We have not yet needed this generalization.
- 2. When *A* is \(\mathbb {N}\), provided *g*(*x*) is never zero, requiring the inequality to be “everywhere true” is in fact the same as requiring it to be “ultimately true”. Outside of this special case, however, requiring the inequality to hold everywhere is usually too strong.
- 3. The square brackets denote a pure Separation Logic assertion. |*l*| denotes the length of the Coq list *l*. CFML transparently reflects OCaml integers as Coq relative integers and OCaml lists as Coq lists.
- 4. Another approach would be to define \(\${n}\) only for \(n\in \mathbb {N}\), in which case an unrestricted axiom would be sound. However, as we use \(\mathbb {Z}\) everywhere, that would be inconvenient. A more promising idea is to view \(\${n}\) as linear (as opposed to affine) when *n* is negative. Then, \(\${(-1)}\) cannot be discarded, so unrestricted splitting is sound.
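
The claim in footnote 2 can be checked with a short derivation, sketched here assuming the standard filter on \(\mathbb {N}\) (a property holds ultimately when it holds for all \(n \ge N\), for some \(N\)) and integer-valued *f* and *g*, as in footnote 1. The names \(c\), \(N\) and \(c'\) are introduced for this illustration only. Suppose \(|f(n)| \le c\,|g(n)|\) for all \(n \ge N\), and define:

\[
c' \;=\; \max\Bigl(c,\ \max_{n < N} |f(n)|\Bigr)
\]

where the inner maximum is taken to be \(0\) when \(N = 0\). Because \(g(n)\) is a nonzero integer, \(|g(n)| \ge 1\), hence:

\[
n < N:\quad |f(n)| \le c' \le c'\,|g(n)|
\qquad\qquad
n \ge N:\quad |f(n)| \le c\,|g(n)| \le c'\,|g(n)|
\]

So the inequality holds everywhere with the constant \(c'\). The converse direction is immediate, since a property that holds everywhere holds ultimately.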

## References

- 1. Akra, M.A., Bazzi, L.: On the solution of linear recurrence equations. Comput. Optim. Appl. **10**(2), 195–210 (1998). https://doi.org/10.1023/A:1018373005182
- 2. Atkey, R.: Amortised resource analysis with separation logic. Log. Methods Comput. Sci. **7**(2:17) (2011). http://bentnib.org/amortised-sep-logic-journal.pdf
- 3. Avigad, J., Donnelly, K.: Formalizing *O* notation in Isabelle/HOL. In: Basin, D., Rusinowitch, M. (eds.) IJCAR 2004. LNCS (LNAI), vol. 3097, pp. 357–371. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-25984-8_27
- 4. Boldo, S., Clément, F., Filliâtre, J.C., Mayero, M., Melquiond, G., Weis, P.: Wave equation numerical resolution: a comprehensive mechanized proof of a C program. J. Autom. Reason. **50**(4), 423–456 (2013). https://hal.inria.fr/hal-00649240
- 5. Boldo, S., Lelay, C., Melquiond, G.: Coquelicot: a user-friendly library of real analysis for Coq. Math. Comput. Sci. **9**(1), 41–62 (2015). https://hal.inria.fr/hal-00860648
- 6. Bourbaki, N.: General Topology, Chapters 1–4. Springer, Heidelberg (1995). https://doi.org/10.1007/978-3-642-61701-0
- 7. Brassard, G., Bratley, P.: Fundamentals of Algorithmics. Prentice Hall, Upper Saddle River (1996)
- 8. Carbonneaux, Q., Hoffmann, J., Reps, T., Shao, Z.: Automated resource analysis with Coq proof objects. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017, Part II. LNCS, vol. 10427, pp. 64–85. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_4
- 9. Charguéraud, A.: Characteristic formulae for the verification of imperative programs. In: International Conference on Functional Programming (ICFP), pp. 418–430, September 2011. http://www.chargueraud.org/research/2011/cfml/main.pdf
- 10. Charguéraud, A.: The CFML tool and library (2016). http://www.chargueraud.org/softs/cfml/
- 11. Charguéraud, A., Pottier, F.: Verifying the correctness and amortized complexity of a union-find implementation in separation logic with time credits. J. Autom. Reason. September 2017. http://gallium.inria.fr/~fpottier/publis/chargueraud-pottier-uf-sltc.pdf
- 12. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press (2009). http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=11866
- 13. Eberl, M.: Proving divide and conquer complexities in Isabelle/HOL. J. Autom. Reason. **58**(4), 483–508 (2017). https://www21.in.tum.de/~Eeberlm/divide_and_conquer_isabelle.pdf
- 14. Filliâtre, J.-C., Letouzey, P.: Functors for proofs and programs. In: Schmidt, D. (ed.) ESOP 2004. LNCS, vol. 2986, pp. 370–384. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24725-8_26
- 15. Graham, R.L., Knuth, D.E., Patashnik, O.: Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley (1994). http://www-cs-faculty.stanford.edu/~knuth/gkp.html
- 16. Guéneau, A., Charguéraud, A., Pottier, F.: Electronic appendix, January 2018. http://gallium.inria.fr/~agueneau/bigO/
- 17. Hoffmann, J., Das, A., Weng, S.: Towards automatic resource bound analysis for OCaml. In: Principles of Programming Languages (POPL), pp. 359–373, January 2017. http://www.cs.cmu.edu/~janh/papers/HoffmannDW17.pdf
- 18. Hopcroft, J.E.: Computer science: the emergence of a discipline. Commun. ACM **30**(3), 198–202 (1987). http://doi.acm.org/10.1145/214748.214750
- 19. Howell, R.R.: On asymptotic notation with multiple variables. Technical report 2007–4, Kansas State University, January 2008. http://people.cs.ksu.edu/~rhowell/asymptotic.pdf
- 20. Howell, R.R.: Algorithms: a top-down approach, July 2012, draft. http://people.cs.ksu.edu/~rhowell/algorithms-text/text/
- 21. Leighton, T.: Notes on better master theorems for divide-and-conquer recurrences (1996). http://courses.csail.mit.edu/6.046/spring04/handouts/akrabazzi.pdf
- 22. Pilkiewicz, A., Pottier, F.: The essence of monotonic state. In: Types in Language Design and Implementation (TLDI), January 2011. http://gallium.inria.fr/~fpottier/publis/pilkiewicz-pottier-monotonicity.pdf
- 23. Reynolds, J.C.: Separation logic: a logic for shared mutable data structures. In: Logic in Computer Science (LICS), pp. 55–74 (2002). http://www.cs.cmu.edu/~jcr/seplogic.pdf
- 24. Tarjan, R.E.: Algorithm design. Commun. ACM **30**(3), 204–212 (1987). http://doi.acm.org/10.1145/214748.214752
- 25. Wang, P., Wang, D., Chlipala, A.: TiML: a functional language for practical complexity analysis with invariants. Proc. ACM Program. Lang. **1**(OOPSLA), 79:1–79:26 (2017). http://adam.chlipala.net/papers/TimlOOPSLA17/TimlOOPSLA17.pdf
- 26. Wegbreit, B.: Mechanical program analysis. Commun. ACM **18**(9), 528–539 (1975). http://doi.acm.org/10.1145/361002.361016

## Copyright information

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.