Complexity Analysis of Self-adjusting Data Structures

Being able to argue about the performance of self-adjusting data structures such as splay trees has been a main objective since Sleator and Tarjan introduced the notion of amortised complexity. Analysing these data structures requires sophisticated potential functions, which typically contain logarithmic expressions. Possibly for these reasons, and despite the recent progress in automated resource analysis, they have so far eluded automation. In this paper, we report on the first fully-automated amortised complexity analysis of self-adjusting data structures. Following earlier work, our analysis is based on potential function templates with unknown coefficients. We make the following contributions: 1) We encode the search for concrete potential function coefficients as an optimisation problem over a suitable constraint system. Our target function steers the search towards coefficients that minimise the inferred amortised complexity. 2) Automation is achieved by using a linear constraint system in conjunction with suitable lemmata schemes that encapsulate the required non-linear facts about the logarithm. We discuss the choices that achieve a scalable analysis. 3) We present our tool ATLAS and report on experimental results for splay trees, splay heaps and pairing heaps. We completely automatically infer complexity estimates that match previous results (obtained by sophisticated pen-and-paper proofs), and in some cases even infer better complexity estimates than previously published.


Introduction
Amortised analysis, as pioneered by Sleator and Tarjan [DR85; RE85], is a method for the worst-case cost analysis of data structures. The innovation of amortised analysis is to not only consider the cost of performing a single operation on a data structure, but the cost of performing a sequence of operations. The methodology of amortised analysis allows one to assign a low (e.g. constant or logarithmic) amortised cost to a data structure operation even though the worst-case cost of a single operation might be high (e.g. linear, polynomial or worse).
The setup of amortised analysis guarantees that for a sequence of data structure operations the worst-case cost is indeed the number of data structure operations (i.e. the length of the given sequence) times the amortised cost. In this way, amortised analysis provides a methodology for worst-case cost analysis.
Notably, the cost analysis of self-adjusting data structures, such as splay trees, has been a motivating example already in the initial proposal of amortised analysis [DR85; RE85].
These data structures have the behaviour that a single data structure operation might be expensive (i.e. linear in the size of the tree) but the cost is guaranteed to "average out" in a sequence of data structure operations (i.e. logarithmic in the size of the tree).
Analysing these data structures requires sophisticated potential functions, which typically contain logarithmic expressions. Possibly for these reasons, and despite the recent progress in automated complexity analysis, they have so far eluded automation.
In this thesis, we present the first fully automated amortised cost analysis of self-adjusting data structures, that is, of splay trees, splay heaps and pairing heaps, for which only results obtained manually with pen and paper, or by using interactive theorem provers, have been reported. We propose, for the first time, a type system that supports logarithmic potential functions (and at the same time enables a multivariate cost analysis).
Our analysis is couched in a simple first-order functional language just sufficiently rich to provide a full definition of our motivating examples (initially splay trees, but also splay heaps and pairing heaps). We employ a big-step semantics, following similar approaches in the literature. We note that this decision only supports the resource analysis of terminating programs. However, it is straightforward to provide a partial big-step semantics [JM10a] or a small-step semantics [MS20] to overcome this limitation. Furthermore, our type system is geared towards runtime as the computation cost (i.e. we assign a unit cost to each function call and zero cost to every other evaluation step). Again, it would not be difficult to provide a parametric type system that supports other cost models. We consider both issues as complementary to our main agenda.
Our type system has been designed with the goal of automation. As in previous work on type-based amortised analysis, the type system infers constraints on the unknown coefficients of template potential functions in a syntax-directed way from the program under analysis. Suitable coefficients can then be found automatically by solving the collected constraints with a suitable constraint solver (i.e. an SMT solver that supports the theory of linear arithmetic). The derivation of constraints is straightforward for all syntactic constructs of our programming language. However, our automated analysis also requires a weakening rule, which supports the comparison of different potential functions. As our potential functions are logarithmic, we cannot directly encode the comparison between logarithmic expressions within the theory of linear arithmetic. Here we propose several ideas for linearising the required comparison of logarithmic expressions. The obtained linear constraints can then be added to the constraint system. Our proposed linearisation makes use of 1. mathematical facts about the logarithm, 2. facts inferred from the program under analysis about the arguments of the logarithmic expressions (we call both these facts expert knowledge), and 3. Farkas' Lemma (Lemma 5) for turning the universally-quantified premise of the weakening rule into an existentially-quantified statement that can be added to the constraint system.
Our work takes the seminal study of Schoenmakers [Sch92; BS93] as a starting point, who for the first time formulated self-adjusting data structures in a functional setting and analysed the amortised cost of the obtained functional data structures. An important precursor of our work is the recent effort by Nipkow et al. [TN15; NB19], who have verified the amortised cost of these data structures with the interactive theorem prover Isabelle/HOL, which allows for a semi-automated verification (most of the calculations need some manual intervention though).
Achieving full automation required substantial implementation effort, as the structural rules need to be applied carefully, as we learned during our experiments, in order to avoid a size explosion of the generated constraint system. We evaluate and discuss our design choices that lead to a scalable implementation.

State of the Art and Related Work
To the best of our knowledge, the established type-and-effect system for the analysis of logarithmic amortised complexity is novel, and the automated resource analysis of self-adjusting data structures such as splay trees is unparalleled in the literature.
The automated cost analysis of imperative, functional and object-oriented programs, as well as of more abstract programming paradigms such as term rewriting systems and logic programming, is an active area of research. For imperative programs, a line of work infers cost bounds from lexicographic ranking functions using arguments that implicitly achieve an amortised analysis [SZV14; SZV15; SZV17; Fie+18] (for details we refer the reader to [SZV17]). The connection between ranking functions and amortised analysis has also been discussed in the context of term rewriting systems [MG14]. Proposals that incorporate amortised analysis within the recurrence relations approach to cost analysis have been discussed in [AG12; AF17]. To the best of our knowledge, none of the cited approaches is able to conduct a worst-case cost analysis for self-adjusting binary search trees such as splay trees. One notable exception is [TN15], where the correct amortised analysis of splay trees [DR85; RE85] and other data structures is certified in Isabelle/HOL with some tactic support. However, it is not clear if the approach can be further automated.
This work [SZV14; SZV15; SZV17] has led to the development of the tool LOOPUS, which performs an amortised analysis for a class of programs that cannot be handled by related tools from the literature. Interestingly, LOOPUS infers worst-case costs from lexicographic ranking functions using arguments that implicitly achieve an amortised analysis (for details we refer the reader to [SZV17]). Another line of work has targeted the resource bound analysis of imperative and object-oriented programs through the extraction of recurrence relations from the program under analysis, whose closed-form solutions then allow one to infer upper bounds on resource usage [Alb+08; EA+11; AG12; AF17]. Amortised analysis with recurrence relations has been discussed for the tools COSTA [AG12] and CoFloCo [AF17]. Amortised analysis has also been employed in the resource analysis of rewriting [MS20] and of non-strict functional programs, in particular if lazy evaluation is considered, see [SJ+17].
Sublinear bounds are typically not in the focus of these tools, but can be inferred by some of them. In the recurrence-relations-based approach to cost analysis [Alb+08; EA+11], refinements of linear ranking functions are combined with criteria for divide-and-conquer patterns; this allows their tool PUBS to recognise logarithmic bounds for some problems, but examples such as mergesort or splaying are beyond the scope of this approach. Logarithmic and exponential terms are integrated into the synthesis of ranking functions in [CFG17], making use of an insightful adaption of Farkas' and Handelman's lemmata. The approach is able to handle examples such as mergesort, but again is not suitable for handling self-adjusting data structures. A type-based approach to cost analysis for an ML-like language is presented in [WWC17], which uses the Master Theorem to handle divide-and-conquer-like recurrences. Very recently, support for the Master Theorem was also integrated into the analysis of rewriting systems [SG20], extending [MG16] on the modular resource analysis of rewriting to so-called logically constrained rewriting systems [FKN17]. The resulting approach also supports the fully automated analysis of mergesort.
We also mention the quest for abstract program models whose resource bound analysis problem is decidable, and for which the obtainable resource bounds can be precisely characterised. We list here the size-change abstraction, whose worst-case complexity has been completely characterised as polynomial (with rational coefficients) [CDZ14; Zul15], vector-addition systems [Brá+18; Zul20], for which polynomial complexity can be decided, and LOOP programs [BH19], for which multivariate polynomial bounds can be computed. We are not aware of similar results for program models that induce logarithmic bounds.

Contributions
Summarising, we make the following contributions: • We propose a new class of template potential functions suitable for logarithmic amortised analysis; these potential functions in particular include a variant of Schoenmakers' potential (a key building block for the analysis of the splay function) and logarithmic expressions. Based on these template potential functions, we present a type system for potential-based resource analysis capable of expressing logarithmic amortised costs, and prove its soundness.
• We encode the search for concrete potential function coefficients as an optimisation problem over a suitable constraint system. Our target function steers the search towards coefficients that minimise the inferred amortised complexity. Our approach does not rely on manual annotations; it is a "push button" automation.
• We give the details of our implementation that enable an automated analysis. The main challenge consists in automating the calculations about the logarithmic potential functions. We achieve automation by using Farkas' Lemma (Lemma 5) for the linear part of the calculations, and isolate monotonicity and a simple inequality between logarithmic expressions as the necessary non-linear facts that need to be added to the linear reasoning.
• We present our tool ATLAS and report on experimental results for splay trees, splay heaps and pairing heaps. We completely automatically infer complexity estimates that match previous results (obtained by sophisticated pen-and-paper proofs), and in some cases even infer better complexity estimates than previously published.
• We report on experimental results for three self-adjusting data structures, that is, splay trees, splay heaps and pairing heaps, and automatically infer logarithmic amortised cost for their operations.
The theory presented in this thesis (mainly in Chapter 3) was developed in collaboration with Martin Hofmann, David Obwaller, Georg Moser, and Florian Zuleger. The software tool ATLAS (subject of Chapter 5) was developed solely by the author of this thesis, and can be considered the main contribution of the thesis.
To some extent, the theory and the tool implementing it were developed in parallel, with a synergistic effect. The implementation advanced the theory by pointing out problematic cases and by prompting the right questions. Chapter 4 gives some insight into the intersection of the two parts.
To provide sufficient context, this thesis also combines the contents of the following two publications, fruits of the collaborative effort mentioned above, in the style of an extended version.

New Results for Amortised Complexity Analysis of Self-Adjusting Data Structures
We either improve the best known complexity bounds or provide new (alternative) proofs for known complexity bounds. In Table 1.1 we state the complexity bounds computed by ATLAS next to results from the literature. We match or improve the results from [Sch92; BS93; NB19]. To the best of our knowledge, the bounds for splay trees and splay heaps represent the state of the art. We improve the bound for the delete function of splay trees and all bounds for the splay heap functions. For pairing heaps, Iacono [Iac00; IY16] has proven (using a more involved potential function) that insert and merge have constant amortised complexity, while the other data structure operations continue to have logarithmic amortised complexity; while we leave an automated analysis based on Iacono's potential function for future work, we note that his coefficients in the logarithmic terms are large, and that therefore the small coefficients in Table 1.1 are still of interest. We will detail below that we used a simpler potential function than [Sch92; BS93; NB19] to obtain our results. Hence, also the new proofs of the confirmed complexity bounds can be considered a contribution.

A New Approach for the Complexity Analysis of Data Structures
Establishing the prior results in Table 1.1 required considerable effort. Schoenmakers studied in his PhD thesis [Sch92] the best amortised complexity bounds that can be obtained using a parametrised potential function Φ_{a,b}(t), where t is a binary tree, defined by Φ_{a,b}(leaf) = 0 and Φ_{a,b}((l, d, r)) = Φ_{a,b}(l) + b·log_a(|l| + |r|) + Φ_{a,b}(r), for real-valued parameters a, b > 0. Carrying out a sophisticated optimisation with pen and paper, he concluded that the best bounds are obtained by setting a = ∛4 and b = 1/3 for splay trees, and by setting a = √2 and b = 1/2 for pairing heaps (splay heaps were proposed only some years later by Okasaki in [CO99]). Brinkop and Nipkow verify their complexity results for splay trees in the theorem prover Isabelle/HOL [NB19]. They note that manipulating the expressions corresponding to log(|t|) could only partly be automated. For splay heaps, there is, to the best of our knowledge, no previous attempt to optimise the obtained complexity bounds, which might explain why our optimising analysis was able to improve all bounds. For pairing heaps, Brinkop and Nipkow did not use the optimal parameters reported by Schoenmakers, probably in order to avoid reasoning about polynomial inequalities, which explains the worse complexity bounds. In contrast to the discussed approaches, we were able to verify and improve the previous results fully automatically. Our approach uses a variation of Schoenmakers' potential function, where we roughly fix a = 2 and leave b as a parameter for the optimisation phase (see Section 2.1 for more details). Despite these choices, our approach was able to derive bounds that match and improve the previous results, which came as a surprise to us. Looking back at our experiments and interpreting the obtained results, we recognise that we might have been in luck with the particular choice of the potential function (because we can obtain the previous results despite fixing a = 2). However, we would not have expected that an automated analysis is able to match and improve all previously reported coefficients, which shows the power of the optimisation phase.
We believe that our results suggest a new approach for the complexity analysis of data structures. So far, self-adjusting data structures had to be analysed manually. This is possibly due to the use of sophisticated potential functions, which may contain logarithmic expressions. Both features are challenging for automated reasoning. Our results suggest the following alternative (see Sections 2.1 and 4.1 for more details): 1. Fix a parametrised potential function; 2. derive a (linear) constraint system over the function parameters from the AST of the program; 3. capture the required non-linear reasoning in lemmata, and use Farkas' Lemma (Lemma 5) to integrate the application of these lemmata into the constraint system (in our case two lemmata, one about an arithmetic property and one about the monotonicity of the logarithm, were sufficient for all of our benchmarks); and finally 4. find values for the parameters by an (optimising) constraint solver.
We believe that our approach will carry over to other data structures: one needs to adapt the potential functions and add suitable lemmata, but the overall setup will be the same. We compare the proposed methodology to program synthesis by sketching [Sol09], where the synthesis engineer communicates her main insights to the synthesis engine (in our case the potential functions plus suitable lemmata), and a constraint solver then fills in the details.

Outline
The rest of this thesis is organised as follows: In Chapter 2, to set the stage, we review the key concepts underlying type-based amortised analysis and present our ideas for their extension (Sections 2.1.1 and 2.1.2, respectively). We also present a necessarily simple but at the same time sufficiently complex programming language (Section 2.2) to be used in the later chapters, and spell out the motivating example of splay trees in this programming language (Section 2.3).
Chapter 3 presents a type system for logarithmic amortised resource analysis. We first discuss resource functions in Section 3.1, then present the type system and its rules in Section 3.2, and finally apply it to analyse two programs in Section 3.3.
Chapter 4 bridges between the theory (Chapter 3) and its implementation (Chapter 5): a number of challenges at this intersection needed to be solved. We group them as follows: In Section 4.1 we address the translation of non-linear properties of the logarithm into a workable linear constraint system, as well as clarify the role of Farkas' Lemma. The steps required to go from type checking to type inference are covered in Section 4.2.
The implementation of the tool is described in Chapter 5. We report experimental results for splay trees, splay heaps and pairing heaps in Section 5.3.
We conclude in Chapter 6.

The Physicist's Method of Amortised Analysis
Before we elaborate any further, we revisit the seminal work [DR85; RE85] introducing amortised analysis, since it is foundational for this thesis. Originally, amortised analysis was presented from two points of view, called the banker's view and the physicist's view, respectively. In this section we focus on the latter, since it is the approach taken in this thesis.
In the physicist's view, amortised analysis revolves around the notion of a potential function. A potential function Φ maps any data structure configuration D to a number. We call Φ(D) the potential of D.
The idea is to use this concept of potential to reason about the amortised cost of an operation performed on the data structure. The notion of amortised cost relates to the actual cost as follows: For an operation o with actual cost c(o, D) on a data structure D, the amortised cost of o is a(o, D) := c(o, D) + Φ(o(D)) − Φ(D), i.e. the sum of the actual cost of performing o on D and the potential of the result, minus the potential of D.
The analogy at work is that of physical objects conserving potential energy in classical mechanics: Moving an object higher up (compared to some reference height) requires work, and will increase its gravitational potential energy. This energy is converted back to kinetic energy when the object moves back down. Operating on data structures is thus analogous to moving an object up or down in space, depending on the characteristics of the potential function and the operation.

Another key aspect of amortised analysis is that it considers sequences of operations. We address this next. We use • to denote the composition of operations; amortised cost generalises to composed operations in a simple way. Further, we use exponentiation to denote the repeated composition of an operation with itself, setting o⁰(D) = D for all o and D.
With the convention that the empty composition is the identity, which does not incur any cost, we can express this recurrence as a sum.
Note that two of the three terms in the sum telescope. By exploiting this, we arrive at a more direct form that only mentions the potential before applying any operation, Φ(D), and the potential after all operations have been applied, Φ(o_n • ··· • o_1(D)).
To use amortised cost as an upper bound for the actual cost, we impose two restrictions on Φ. Firstly, Φ(D_0) = 0, which is to say that the initial potential is zero. This first condition is also intuitive in the sense that an empty data structure stores no data and therefore no "fuel" for computation. Secondly, ∀D. Φ(D) ⩾ 0, which avoids "borrowing" potential when there is none left. This gives an upper bound for the actual cost of the repeated application of operations, as spelled out below. With this machinery, cost analysis is reframed as the task of choosing Φ in such a way that the difference between amortised cost and actual cost is minimal.
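The displays elided above can be reconstructed as follows (a sketch in the notation of this section, writing D_0 = D and D_i = o_i(D_{i−1}) for the intermediate states):

\begin{align*}
a(o_i, D_{i-1}) &= c(o_i, D_{i-1}) + \Phi(D_i) - \Phi(D_{i-1}) ,\\
\sum_{i=1}^{n} a(o_i, D_{i-1}) &= \sum_{i=1}^{n} c(o_i, D_{i-1}) + \Phi(D_n) - \Phi(D_0) ,
\end{align*}

so that, with Φ(D_0) = 0 and Φ(D_n) ⩾ 0,

\[
\sum_{i=1}^{n} c(o_i, D_{i-1}) \;\le\; \sum_{i=1}^{n} a(o_i, D_{i-1}) .
\]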
Note that amortised analysis has a compositional character: even though we analyse each operation only once, we arrive at a cost for a sequence of operations. Amortised analysis allows one to assign an amortised cost that is logarithmic in the size of the input data structure to an operation, even though the operation analysed in isolation would yield a higher, e.g. linear, worst-case cost.

The above notion of amortised cost for sequences of operations does not adequately capture operations that take two or more data structures as input (e.g. merging/union, difference). For a generalisation from sequences to trees of operations, refer to [NB19, Sec. 3].
Considering statically typed functional programming languages, operations on the data structure D are functions o : D → D. With partial application and polymorphism, examples would be insert : α → D → D and delete : α → D → D. The approach taken in this thesis is to encode the potential Φ in the types. In later chapters, we will define annotations Q which characterise Φ, and thus we get types of the form o : D|Q → D|Q'.

Preliminaries
In this chapter, we briefly present the state of the art that our approach builds on. The goal is to highlight the similarities between existing polynomial amortised analyses and our logarithmic analysis, and to mark points of departure, such as cost-free typing. We also present our programming language.

Setting the Stage
Our analysis infers, for a function f under analysis, annotations that ensure an inequality of the form Φ(v) ⩾ c_f(v) + Φ(f(v)) holds for all inputs v. This allows one to read off an upper bound on the amortised cost of f. Two ingredients are central: 1) Automation is achieved by a type-and-effect system that uses template potential functions, i.e. functions of a fixed shape with indeterminate coefficients. Here, the key challenge is to identify templates that are suitable for logarithmic analysis and that are closed under the basic operations of the considered programming language. 2) In addition to the actual amortised analysis with costs, we employ a cost-free analysis as a subroutine, setting the amortised and actual costs of all functions to zero. This enables a size analysis of sorts, because the inequality Φ(v) ⩾ Φ(f(v)) bounds the potential Φ(f(v)) of the result in terms of the potential Φ(v) of the argument. The size analysis we conduct allows lifting the analysis of a subprogram to a larger context, which is crucial for achieving a compositional analysis.

Polynomial Amortised Analysis
Suppose that we have types A, B, C, ... representing sets of values. We write ⟦A⟧ for the set of values represented by type A. Types may be constructed from base types such as Booleans and integers, denoted by Base, and by type formers such as list, tree, product, sum, etc. By introducing product types, one can regard functions with several arguments as unary functions, which allows for technically smooth formalisations, see [JM10b; JM10a; JH11]; the analyses in the cited papers are called univariate, as the set of basic potential functions BF(A × B) of a product type is given directly. In the later multivariate versions of automated amortised analysis [JKM11; JKM12a; MG15] one takes a more fine-grained approach to products, as reconstructed below. Thus, a basic potential function for a product type is obtained as the multiplication of basic potential functions of its constituents.
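The elided definition of the multivariate basic potential functions plausibly reads as follows (our notation for the reconstruction):

\[
\mathrm{BF}(A \times B) \;:=\; \{\, p \cdot q \mid p \in \mathrm{BF}(A),\; q \in \mathrm{BF}(B) \,\} .
\]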

Automation
The idea behind this setup is that the basic potential functions BF(A) are suitably chosen and fixed by the analysis designer; the coefficients q(p) for p ∈ BF(A), however, are left indeterminate and will (automatically) be fixed during the analysis. For this, constraints over the unknown coefficients are collected in a syntax-directed way from the function under analysis and then solved by a suitable constraint solver. The type-and-effect system formalises this collection of constraints as typing rules, where for each construct of the considered programming language a typing rule is given that corresponds to constraints over the coefficients of the annotated types. Expressing the quest for suitable type annotations as a type-and-effect system allows one to compose typing judgements in a syntax-oriented way without the need for fixing additional intermediate results, which is often required by competing approaches. This syntax-directed approach to amortised analysis has been demonstrated to work well for datatypes like lists or trees, whose basic potential functions are polynomials over the length of a list resp. the number of nodes of a tree. One of the reasons why this works well is, e.g., that functional programming languages typically include dedicated syntax for list construction and that polynomials are closed under addition by one (i.e. if p(n) is a polynomial, so is p(n + 1)), supporting the formulation of a suitable typing rule for list construction.

Logarithmic Amortised Analysis
We now motivate the design choices of our type-and-effect system. The main objective of our approach is the automated analysis of data structures such as splay trees, which have logarithmic amortised cost. The amortised analysis of splay trees is tricky and requires choosing an adequate potential function: our work makes use of a variant of Schoenmakers' potential, rk(t) for a tree t, see [BS93; TN15], defined inductively by rk(leaf) := 1 and rk((l, d, r)) := rk(l) + log(|l|) + log(|r|) + rk(r), where l, r are the left resp. right child of the tree (l, d, r), |t| denotes the number of leaves of a tree t, and d is some data element that is ignored by the potential function. Besides Schoenmakers' potential we need to add further basic potential functions to our analysis. This is motivated as follows: Similar to the polynomial amortised analysis discussed above, we want the basic potential functions to be able to express the construction of a tree. E.g., let us consider the function node, which constructs the tree (l, d, r) from some trees l, r and some data element d, and let us assume a constant cost c(l, r) = 1 for the function node. A type annotation for node is given by the potential Φ(l, r) = rk(l) + rk(r) + log(|l|) + log(|r|) + 1, i.e. the potential Φ(l, r) suffices to pay for the cost of executing node and the potential rk(node(l, r)) of the result (the correctness of this annotation can be established directly from the definition of Schoenmakers' potential). As mentioned above, the logarithmic expressions in Φ(l, r), i.e. log(|l|) + log(|r|) + 1, specify the amortised cost of the operation.
We see that in order to express the potential Φ(l, r) we also need the basic potential functions log(|t|) for a tree t. In fact, we will choose the slightly richer set of basic potential functions p_(a,b)(t) := log(a·|t| + b), where a, b ∈ N and t is a tree. We note that by setting a = 0 and b = 2 this choice allows us to represent the constant function, with p_(0,2)(t) = 1 for all trees t. We further note that this choice of potential functions is sufficiently rich to express that p_(a,b)(t) = p_(a,a+b)(t') for trees t, t' with |t| = |t'| + 1, which is needed for precisely expressing the change of potential when a tree is extended by one node. Further, we define basic potential functions for products of trees by setting p_(a_1,...,a_m,b)(t_1, ..., t_m) := log(a_1·|t_1| + ··· + a_m·|t_m| + b), where a_1, ..., a_m, b ∈ N and t_1, ..., t_m is a tuple of trees. This is sufficiently rich to state equalities such as p_(a_1,a_2,b)(t, t) = p_(a_1+a_2,b)(t), which supports the formulation of a sharing rule, which in turn is needed for supporting the let-construct in functional programming; see [JKM11; JKM12a; MG15] for a more detailed exposition on the sharing rule and the let-construct.

Cost-Free Semantics

Polynomial Amortised Analysis
We begin by reviewing the cost-free semantics underlying previous work [JM10a; JH11; JKM11; JKM12a] on polynomial amortised analysis. Assume that we want to analyse the composed function call f(g(x), y) using already established analysis results for g(x) and f(z, y). Suppose we have already established, for all x, y, the inequalities (2.1)-(2.3) sketched below, where, as in the multivariate case above, the index i is arbitrary; equations (2.1) and (2.3) assume cost, while equation (2.2) is cost-free. Then we can conclude, for all x, y, that the combined potential suffices to pay for the cost c_g(x) of computing g(x), the cost c_f(g(x), y) of computing f(g(x), y), and the potential of the result f(g(x), y). We note that the correctness of this inference hinges on the fact that we can multiply equation (2.2) with q_i(y) for i = 1, ..., k, using the monotonicity of the multiplication operation (note that potential functions are non-negative). We highlight that the multiplication argument works well with the cost-free semantics, and enables lifting the resource analysis of g(x) and f(z, y) to the composed function call f(g(x), y).
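The elided inequalities can be reconstructed along the following lines (a sketch; the potentials Φ, Φ_1, Φ_2 and the basic potentials p_i, p'_i, q_i are schematic names of our choosing):

\begin{align}
\Phi(x) &\ge c_g(x) + \Phi_1(g(x)) \tag{2.1}\\
p_i(x) &\ge p'_i(g(x)) \qquad \text{for } i = 1, \dots, k \quad\text{(cost-free)} \tag{2.2}\\
\Phi_1(z) + \sum_{i=1}^{k} p'_i(z)\, q_i(y) &\ge c_f(z, y) + \Phi_2(f(z, y)) \tag{2.3}
\end{align}

Multiplying (2.2) by q_i(y) ⩾ 0, summing over i, and adding (2.1) and (2.3) instantiated with z = g(x) yields

\[
\Phi(x) + \sum_{i=1}^{k} p_i(x)\, q_i(y) \;\ge\; c_g(x) + c_f(g(x), y) + \Phi_2(f(g(x), y)) .
\]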

Logarithmic Amortised Analysis
Similar to the polynomial case, we want to analyse the composed function call f(g(x), y) using already established analysis results for g(x) and f(z, y). However, now we extend the class of potential functions to sublinear functions. Assume that we have already established the inequalities (2.4)-(2.6) sketched below, where equations (2.4) and (2.6) assume cost, while equation (2.5) is cost-free. Equations (2.4) and (2.5) represent the result of an analysis of g(x) (note that these equations do not contain the parameters that will, however, be needed for the analysis of f(g(x), y)), and equation (2.6) the result of an analysis of f(z, y). Then we can conclude, for all x, y, that the combined potential suffices to pay for the cost c_g(x) of computing g(x), the cost c_f(g(x), y) of computing f(g(x), y), and the potential of the result f(g(x), y). Here, we crucially use the monotonicity of the logarithm function, as formalised in Lemma 2. This reasoning allows us to lift isolated analyses of the functions g(x) and f(z, y) to the composed function call f(g(x), y); this is what is required for a compositional analysis! Example 1. We now illustrate the compositional reasoning on an example. We reconsider the function node(l, d, r), which takes two trees l, r and some data element d and returns the tree (l, d, r); assume that analysis results as above have already been established. Kindly note that the above example appears in similar form as part of the analysis of the splay function described in Section 3.3.
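A sketch of how the elided inequalities (2.4)-(2.6) plausibly read, in the notation of this section (the size bound in (2.5) is the cost-free ingredient; a, a', b stand for the parameters mentioned above):

\begin{align}
\Phi(x) &\ge c_g(x) + \Phi_1(g(x)) \tag{2.4}\\
|x| &\ge |g(x)| \qquad\text{(cost-free size bound)} \tag{2.5}\\
\Phi_1(z) + \log(a|z| + a'|y| + b) &\ge c_f(z, y) + \Phi_2(f(z, y)) \tag{2.6}
\end{align}

By the monotonicity of log (Lemma 2), (2.5) yields log(a|g(x)| + a'|y| + b) ⩽ log(a|x| + a'|y| + b), so instantiating z = g(x) in (2.6) and adding (2.4) gives

\[
\Phi(x) + \log(a|x| + a'|y| + b) \;\ge\; c_g(x) + c_f(g(x), y) + \Phi_2(f(g(x), y)) .
\]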

A Necessarily Simple and Sufficiently Complex Programming Language
In this section we introduce a first-order programming language that will be used throughout the later chapters. It is designed to be as simple and comfortable as possible, while still allowing us to define all operations on splay trees (presented as defined in [TN15] below, and analysed in detail in Section 3.3.2), which are the primary motivating example.

Syntax
Consider the following grammar in a BNF-like style that defines expressions e; a reconstruction is sketched below. Note that the first production corresponds to a function definition, while f is to be substituted by the name of a function definition and x, x_1, ..., x_k are to be substituted by variable names.
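The grammar display itself was lost in typesetting; the following BNF-style sketch reconstructs its likely shape from the constructs used in the semantics, typing rules and examples of the later chapters (the nonterminal names are our assumptions):

    def ::= f(x_1, ..., x_k) = e                           (function definition)
    e   ::= x | true | false | n                           (variables, constants)
          | leaf | (x_1, x_2, x_3)                         (tree constructors)
          | x_1 < x_2 | x_1 == x_2                         (essential comparisons)
          | if x then e_1 else e_2
          | match x with | leaf -> e_1 | (x_0, x_1, x_2) -> e_2
          | let x = e_1 in e_2
          | f(x_1, ..., x_k)                               (function application)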
We skip the standard definition of integer constants n ∈ Z as well as variable declarations, cf. [BP02].
In the definition of the syntax above and the semantics and typing rules below, expressions are given in let-normal-form for simplicity. On the other hand, exemplary code will not be presented in let-normal-form, for readability. A translation to let-normal-form is described in Chapter 5.

Semantics
To make the presentation more succinct, we assume only the following types: Boolean values Bool = {true, false}, an abstract base type Base (abbrev. B), product types, and binary trees Tree (abbrev. T), whose internal nodes are labelled with elements d : Base. We use lower-case Greek letters for the denotation of types. Elements t : Tree are defined by the grammar t ::= leaf | (t_1, d, t_2), which fixes notation.
Furthermore, we omit binary operators, and only define essential comparisons. For our analysis, these are unimportant, as long as we assume that no actual costs are emitted.
A typing context is a mapping from variables V to types. Type contexts are denoted by upper-case Greek letters (usually Γ, Δ). A program P consists of a signature F together with a set of function definitions of the form f(x_1, ..., x_k) = e, where the x_i are variables and e an expression. A substitution (or environment) σ is a mapping from variables to values that respects types. Substitutions are denoted as sets of assignments: σ = {x_1 ↦ v_1, ..., x_k ↦ v_k}. We denote the (disjoint) union of substitutions σ and τ as σ ⊎ τ. We employ a simple cost-sensitive big-step semantics based on eager evaluation, whose rules are given in Figure 2.1. The judgement σ ⊢ e ⇒^ℓ v means that under environment σ, expression e is evaluated to value v in exactly ℓ steps. Here, only applications of the rule for function calls emit (unit) costs.
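As an illustration of the cost model, the rule for function application plausibly takes the following shape (a reconstruction; Figure 2.1 is authoritative):

\[
\frac{f(x_1, \dots, x_k) = e \in \mathrm{P} \qquad \{x_1 \mapsto \sigma(x_1), \dots, x_k \mapsto \sigma(x_k)\} \vdash e \Rightarrow^{\ell} v}
     {\sigma \vdash f(x_1, \dots, x_k) \Rightarrow^{\ell + 1} v}
\]

All other rules pass on the costs of their premises unchanged.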

Motivating Example: Splay Trees
Splay trees have been introduced by Sleator and Tarjan [DR85; RE85] as self-adjusting binary search trees with strictly increasing inorder traversal. There is no explicit balancing condition. All operations rely on a tree-rotating operation dubbed splaying: splay a t is performed by rotating element a to the root of tree t while keeping the inorder traversal intact. If a is not contained in t, then the last element found before a leaf is rotated to the root. The complete definition is given in Figure 2.2. Based on splaying, searching is performed by splaying with the sought element and comparing it to the root of the result. Similarly, the definitions of insertion and deletion depend on splaying. As an example, the definitions of insert and delete are given in Figures 2.3 and 2.4, respectively. See also [TN15] for full algorithmic, formally verified descriptions.
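For readers without Figure 2.2 at hand, the following Haskell sketch reproduces the usual functional formulation of splaying, following Nipkow and Brinkop [NB19] (an illustration only; the thesis' own definition is written in the first-order language of Section 2.2):

    data Tree d = Leaf | Node (Tree d) d (Tree d)

    -- Rotate the element a (or the last element found before a leaf)
    -- to the root, keeping the inorder traversal intact.
    splay :: Ord d => d -> Tree d -> Tree d
    splay _ Leaf = Leaf
    splay a t@(Node l b r) = case compare a b of
      EQ -> t
      LT -> case l of
        Leaf -> t                                      -- a not in t: b stays at the root
        Node ll c lr -> case compare a c of
          EQ -> Node ll c (Node lr b r)                -- zig
          LT -> case ll of
            Leaf -> Node ll c (Node lr b r)            -- zig: last element before a leaf
            _    -> case splay a ll of                 -- zig-zig: splay in ll, rotate twice
                      Node u v w -> Node u v (Node w c (Node lr b r))
                      Leaf       -> t                  -- unreachable: splaying a node yields a node
          GT -> case lr of
            Leaf -> Node ll c (Node lr b r)            -- zig
            _    -> case splay a lr of                 -- zig-zag
                      Node u v w -> Node (Node ll c u) v (Node w b r)
                      Leaf       -> t                  -- unreachable
      GT -> case r of
        Leaf -> t
        Node rl c rr -> case compare a c of
          EQ -> Node (Node l b rl) c rr                -- zag
          GT -> case rr of
            Leaf -> Node (Node l b rl) c rr            -- zag
            _    -> case splay a rr of                 -- zag-zag
                      Node u v w -> Node (Node (Node l b rl) c u) v w
                      Leaf       -> t                  -- unreachable
          LT -> case rl of
            Leaf -> Node (Node l b rl) c rr            -- zag
            _    -> case splay a rl of                 -- zag-zig
                      Node u v w -> Node (Node l b u) v (Node w c rr)
                      Leaf       -> t                  -- unreachable

The zig-zig branch matches the case analysed in Section 3.3: for t = ((bl, b, br), c, cr) with a < b < c and splay a bl = (al, a', ar), the result is (al, a', (ar, b, (br, c, cr))).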
All basic operations can be performed in O(log n) amortised runtime. The logarithmic amortised complexity is crucially achieved by the local rotations of subtrees in the definition of splay. An amortised cost analysis of splaying has been provided, for example, by Sleator and Tarjan [DR85]. (In the rules of Figure 2.1, [x ↦ v]σ denotes the update of the environment σ such that ([x ↦ v]σ)(x) = v and the value of all other variables remains unchanged; furthermore, in the second match rule, the environment is extended with bindings for the components of the matched tree.)

A Type System for the Analysis of Logarithmic Amortised Complexity
In this chapter, we present a type system designed for logarithmic amortised worst-case complexity. It equips the programming language defined in Section 2.2 with a standard Hindley-Milner type inference, enriched by type annotations that capture potential.
We will first introduce resource functions, which measure the potential of a tree-based data structure, and then proceed with presenting the type system in Section 3.2. To conclude the chapter, we will apply it to our running example of splay trees in Section 3.3.

Resource Functions
In this section, we detail the basic potential functions employed and clarify the definition of the potentials used.
Only trees are assigned non-zero potential. This is not a severe restriction, as potentials for basic datatypes would only become essential if the construction of such types emitted actual costs, which is not the case in our context. Moreover, note that lists can be conceived as trees of a particular shape. The potential Φ(t) of a tree t is given as a non-negative linear combination of basic functions, which essentially amount to "sums of logs", see Schoenmakers [BS93]. It suffices to specify the basic functions for the type of trees T. As already mentioned in Section 2.1, the rank rk(t) of a tree t is defined inductively by rk(leaf) := 1 and rk((l, d, r)) := rk(l) + log(|l|) + log(|r|) + rk(r). We set log(x) := log_2(max{x, 1}), that is, the (binary) logarithm function is defined for all numbers. This is merely a technicality, introduced to ease the presentation, as it simplifies the statement of subsequent definitions. In the following, we denote the modified logarithm function simply as log. Furthermore, recall that |t| denotes the number of leaves of the tree t. The definition of "rank" is inspired by the definition of potential in [BS93; TN15], but subtly changed to suit it to our context. Definition 2. The basic potential functions of Tree, denoted BF, are • rk(t), and • p_(a,b)(t) := log(a·|t| + b) for a, b ∈ N. Note that the constant function 1 is representable: p_(0,2)(t) = log(2) = 1. Following the recipe of the high-level description in Section 2.1, potentials, or more generally resource functions, become definable as linear combinations of basic potential functions.
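A Haskell sketch (our illustration; the formal definitions above are authoritative) making the resource functions concrete: size computes |t|, the number of leaves; log2' is the truncated binary logarithm; rk is the rank, with the base case rk(leaf) = 1 as above; and p a b is the basic potential p_(a,b)(t) = log(a·|t| + b):

    data Tree d = Leaf | Node (Tree d) d (Tree d)

    size :: Tree d -> Integer        -- |t|, the number of leaves
    size Leaf         = 1
    size (Node l _ r) = size l + size r

    log2' :: Integer -> Double       -- log(x) = log_2(max{x, 1})
    log2' x = logBase 2 (fromIntegral (max x 1))

    rk :: Tree d -> Double           -- variant of Schoenmakers' potential
    rk Leaf         = 1
    rk (Node l _ r) = rk l + log2' (size l) + log2' (size r) + rk r

    p :: Integer -> Integer -> Tree d -> Double   -- p_(a,b)(t) = log(a*|t| + b)
    p a b t = log2' (a * size t + b)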
Definition 3. A resource function f : T → R⁺₀ is a non-negative linear combination of basic potential functions, that is, f(t) = q_*·rk(t) + Σ_{(a,b) ∈ N×N} q_(a,b)·p_(a,b)(t). An annotation Q = [q_*, (q_(a,b))_{(a,b) ∈ N×N}], with q_*, q_(a,b) ∈ Q⁺₀ and all but finitely many of the coefficients q_*, q_(a,b) equal to 0, represents a (finite) linear combination of basic potential functions, that is, a resource function. The empty annotation, that is, the annotation where all coefficients are set to zero, is denoted by ∅.
Remark 2. We use the convention that the sequence elements of resource annotations are denoted by the lower-case letter of the annotation, potentially with corresponding sub- or superscripts.

Analysis of Products of Trees
We now lift the basic potential functions p_(a,b) of a single tree to products of trees. As discussed in Section 2.1, we define the potential functions p_(a_1,...,a_m,b)(t_1, ..., t_m) for a sequence of trees t_1, ..., t_m by setting p_(a_1,...,a_m,b)(t_1, ..., t_m) := log(a_1·|t_1| + ··· + a_m·|t_m| + b), where a_1, ..., a_m, b ∈ N. Equipped with this definition, we generalise annotations to sequences of trees. An annotation for a sequence of length m is a sequence Q = [(q_i)_{1⩽i⩽m}, (q_(a_1,...,a_m,b))_{a_i,b ∈ N}], again vanishing almost everywhere. Note that an annotation of length 1 is simply an annotation as defined above, where the coefficient q_1 is set to equal the coefficient q_*. Based on this, the potential of a sequence of trees t_1, ..., t_m is defined as spelled out below. Note that for an empty sequence of trees, we have Φ(ε | Q) = Σ_{b ∈ N} q_b·log(b). Note that the rank function rk(t) amounts to the sum of the logarithms of the sizes of the subtrees of t. In particular, if the tree t simplifies to a list of length n, then rk(t) = (n + 1) + Σ_{i=1}^{n} log(i). Moreover, as Σ_{i=1}^{n} log(i) ∈ Θ(n log n), the above defined potential functions are sufficiently rich to express linear combinations of sub- and super-linear functions.
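The elided definition of the potential of an annotated sequence plausibly reads as follows (a reconstruction in the notation just introduced):

\[
\Phi(t_1, \dots, t_m \mid Q) \;:=\; \sum_{i=1}^{m} q_i \cdot \mathrm{rk}(t_i) \;+\; \sum_{a_1, \dots, a_m, b \,\in\, \mathbb{N}} q_{(a_1, \dots, a_m, b)} \cdot p_{(a_1, \dots, a_m, b)}(t_1, \dots, t_m) .
\]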
This amounts to a potential of the arguments of rk(t) + 3·log(|t|) + 1, while for the result we consider only its rank; that is, the annotation expresses 3·log(|t|) + 1 as the logarithmic cost of splaying. The correctness of the induced logarithmic amortised cost for the zig-zig case of splaying is verified in Section 3.3, and is also automatically verified by our prototype. □ Suppose Φ(t_1, ..., t_m, u_1, u_2 | Q) denotes the potential of an annotated sequence of length m + 2. Suppose further u_1 = u_2 and we want to share the value, that is, the corresponding function argument appears multiple times in the body of the function definition. Then we make use of the operator ⋎(Q) that adapts the potential suitably. The operator is also called the sharing operator, in analogy to [JKM12a, Lemma 6.6].

A Type System for Logarithmic Amortised Resource Analysis
In this section, we present the central contribution of this thesis. We delineate a novel type-and-effect system incorporating a potential-based amortised resource analysis capable of expressing logarithmic amortised costs. Soundness of the approach is established in Theorem 3.
Remark 5. In principle, the type system can be parameterised in the resource metric (see e.g. [JKM12a]). In this thesis, we focus on amortised worst-case runtime complexity, symbolically measured through the number of function applications. It is straightforward to generalise this type system to other monotone cost models. W.r.t. non-monotone costs, like e.g. heap usage, we expect the type system can also readily be adapted, but this is outside the scope of the thesis.
We consider the typing rules in turn; recall the convention that the sequence elements of annotations are denoted by the lower-case letter of the annotation. Further, note that sequence elements which do not occur in any constraint are set to zero. The variable rule (var) types a variable of unspecified type. As no actual costs are required, the annotation is unchanged. Similarly, no resources are lost through the use of control operators. Hence the definition of the rules (cmp) and (ite) is straightforward.
As exemplary constructor rules, we have the rule (leaf) for the empty tree and the rule (node) for the node constructor. Both rules define suitable constraints on the resource annotations to guarantee that the potential of the values is correctly represented. The application rule (app) represents the application of a function defined in P. Each application emits actual cost 1, which is reflected in the subtraction of 1. In its simplest form, that is, for the factor k = 0, the rule allows one to directly read off the required annotation from the set of signatures F. For arbitrary k ∈ Q⁺₀, the rule allows one to combine some signature with cost with a cost-free signature. We note that Remark 4 would in fact allow us to add any positive linear combination of cost-free signatures; however, for performance reasons we refrain from doing so.
In the pattern matching rule (match), the potential freed through the destruction of the tree constructor is added to the annotation used in the right premise of the rule. Note that the length of this annotation equals m + 2, where m equals the number of tree types in the type context Γ.
The constraints expressed in the typing rules (let : T) and (let : gen) guarantee that the potential provided through the annotation Q is distributed among the calls to e_1 and e_2; that is, this rule takes care of function composition. The numbers m, k, respectively, denote the number of tree types in Γ, Δ. Due to the sharing rule, discussed in a moment, we can assume w.l.o.g. that each variable in e_1 and e_2 occurs at most once.
First, consider the rule (let : gen), that is, the expression e_1 evaluates to a value of arbitrary type α ≠ Tree. In this case the resulting value cannot carry any potential. This is indicated through the empty annotation ∅ in the typing judgement for e_1. Similarly, in the judgement for the expression e_2, all available potential prior to the execution of e_2 stems from the potential embodied in the type context Δ w.r.t. its annotation. This is enforced by the corresponding constraints. Suppose some mixed coefficient referring to trees from both Γ and Δ were non-zero. Then the corresponding potential shared between the contexts Γ and Δ w.r.t. Q would be discarded by the rule, as there is no possibility to attach this potential to the result type α.
Second, consider the more involved rule (let : T). To explain this rule, we momentarily assume that no potential is shared, that is, all mixed coefficients referring to trees from both contexts are zero. In this sub-case the rule can be simplified as follows: the potential in Γ, Δ (w.r.t. the annotation Q) is distributed for the typing of the expressions e_1, e_2, respectively, which is governed by the constraints on the annotations. The simplified rule is obtained because the assumption that no shared potential exists makes almost all constraints vacuous. In particular, the cost-free derivation for e_1 is not required.
At last, the type system makes use of structural rules, like the sharing rule (share) and the weakening rules (w : var) and (w). The sharing rule employs the sharing operator, defined in Lemma 1. Note that the variables introduced in the assumption of the typing rule are fresh variables that do not occur in Γ. Similarly, the rule (shift) allows one to shift the potential before and after evaluation of the expression by a constant c.
Note that the weakening rules embody changes in the potential of the type context of the expressions considered. This amounts to a comparison of logarithmic expressions, in principle a non-trivial task that cannot be directly represented as constraints in the type system. Instead, the rule (w) employs symbolic potential expressions for these comparisons, replacing actual values for trees by variables. Let Γ denote a type context containing the type declarations x_1 : T, ..., x_m : T. Before we state and prove the soundness of the presented type-and-effect system, we establish the following auxiliary result, employed in the correct assessment of the transfer of potential in the case of function composition, see Figure 3.1. See also the high-level description provided in Section 2.1.
Now we consider some n ⩾ 1. Combining (3.1) and (3.2), we obtain the required inequality, where we use that all quantities involved are ⩾ 1. By taking the logarithm on both sides of the inequality we obtain the claim. □ Finally, we obtain the following soundness result, which roughly states that if a program P terminates, then the difference in potential has paid for its execution costs.¹ Theorem 3 (Soundness Theorem). Let P be well-typed and let σ be a substitution. Suppose Γ|Q ⊢ e : α|Q' and σ ⊢ e ⇒^ℓ v. Then Φ(σ(Γ)|Q) − Φ(v|Q') ⩾ ℓ. Proof. The proof embodies the high-level description given in Section 2.1. It proceeds by main induction on Π : σ ⊢ e ⇒^ℓ v and by side induction on Ξ : Γ|Q ⊢ e : α|Q', where the latter is employed in the context of the weakening rules. We consider only a few cases of interest. For example, for a case not covered: the variable rule (var) types a variable of unspecified type. As no actual costs are required, the annotation is unchanged and the theorem follows trivially. ¹ As stated, soundness assumes termination of P, but our analysis is not restricted to terminating programs. In order to avoid this assumption, the soundness theorem would have to be formulated w.r.t. a partial big-step or a small-step semantics, see [JM10a; MS20]. We consider this outside the scope of this thesis.

Case. Suppose Π has the following form: … Case. Consider the first (match) rule, where Π ends as follows: … W.l.o.g. we may assume that Ξ ends with the related application of the (match) rule. Now, consider the second (match) rule, that is, Π ends in an evaluation of match x with | leaf -> e_1 | (x_0, x_1, x_2) -> e_2 ⇒ v. As above, we may assume that Ξ ends with the related application of the (match) rule. In this subcase, the assumption on Π yields σ(x) = (t_1, d, t_2). By definition and the constraints given in the rule, we obtain the claim. Case. Consider the (let) rule, that is, Π ends in an application of the (let) rule with ℓ = ℓ_1 + ℓ_2. First, we consider the sub-case where the type of e_1 is an arbitrary type but not of type Tree, i.e. we assume that Ξ ends in an application of the (let : gen) rule. Second, we consider the more involved sub-case, where e_1 is of type Tree. Thus, w.l.o.g., Ξ ends in an application of the (let : T) rule.

Remark 6. We note that the basic resource functions can be generalised to additionally represent linear functions in the size of the arguments. The above soundness theorem is not affected by this generalisation.
In the next section, we exemplify the use of the proposed type-and-effect system (see Figure 3.1) on the motivating example.

Example Analysis
In this section we apply the proposed type-and-effect system to obtain an analysis of the amortised cost of the zig-zig case of splaying, for type annotations that are fixed a priori. As a preparatory step, also to emphasise the need for the cost-free semantics, we make precise the informal account of compositional reasoning given in Section 2.1.2.
We emphasise that the involved (let) rule, employed in step (∗), cannot be avoided. In particular, the additional cost-free derivation (3.5) is essential. Observe the annotations marked in red in the calculation above. Eventually these amount to a shared potential employed in step (∗). The cost-free semantics allows us to exploit this shared potential, which otherwise would have to be discarded.
To wit, assume momentarily that the rule (let) would not make use of cost-free reasoning, similar to the simplified (let) rule that we used in the explanations on page 26. Then the shared potential represented by the coefficient (1,1,1,0,0,…) would have to be discarded.

Splay Trees
In this subsection, we exemplify the use of the type system presented in the last section on the function splay, where the expression e is the definition of splay given in Figure 2.2 and the annotations Q and Q' are reconstructed below. Remark that the amortised cost of splaying is represented by the coefficients q_(1,0) = 3 and q_(0,2) = 1, expressing in sum 3·log(|t|) + 1. Note, further, that the coefficients q_* = 1 and q'_* = 1 represent Schoenmakers' potential, that is, rk(t) and rk(splay a t), respectively.
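The elided annotations plausibly read as follows (a reconstruction consistent with the coefficients named above and with the potential rk(t) + 3·log(|t|) + 1 quoted in Section 3.1):

\[
\mathrm{splay} : \mathsf{B} \times \mathsf{T}\,|\,Q \to \mathsf{T}\,|\,Q' ,
\qquad
\Phi(t \mid Q) = \mathrm{rk}(t) + 3\log(|t|) + 1 ,
\qquad
\Phi(t' \mid Q') = \mathrm{rk}(t') ,
\]

i.e. q_* = 1, q_(1,0) = 3, q_(0,2) = 1 and q'_* = 1, with all other coefficients zero.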
We restrict our attention to the zig-zig case, which amounts to t = ((bl, b, br), c, cr) together with the recursive call splay a bl = (al, a', ar) and the side condition a < b < c. Thus splay a t yields (al, a', (ar, b, (br, c, cr))). Recall that a need not occur in t; in this case, the last element found before a leaf is rotated to the root. Our prototype checks the correctness of these annotations automatically. Below, we show a simplified derivation of (3.6), where we focus only on a particular path in the derivation tree, suited to the considered zig-zig case of the definition of splaying. Omission of premises or rules is indicated by double lines in the inference step. Again we make crucial use of the cost-free semantics in this derivation. The corresponding inference step is marked with (∗) and the employed shared potentials are marked in red. To ease presentation we take the liberty of removing the types of tree variables from the context. We abbreviate by Γ and Δ the contexts of base-type variables (a : B, b : B, c : B, etc.). In addition to the original signature of splaying, B × T|Q → T|Q', we use the following annotations, induced by constraints in the type system, see Figure 3.1. As in Section 3.3.1, we mark annotations that require cost-free derivations in the (let : T) rule in red.
As indicated, the cost-free derivation also requires the use of the full version of the rule (let : T), as marked by (∗). In particular, the informal argument on the size of the argument and the result of splaying is built into the type system. We use the following annotations: …

The type system presented in Chapter 3 is a sound theoretical basis. In Section 2.1.1 we mentioned how inferring coefficients enables an automated analysis. However, the development of a tool that performs a fully automated analysis comes with additional challenges.
In this chapter, we want to highlight these challenges and how they were met.

Linearisation and Expert Knowledge
In the context of the presented type system (see Figure 3.1), an obvious challenge is the requirement to compare potentials symbolically (see Section 3.2) rather than to compare annotations directly. This is in contrast to results on resource analysis for constant amortised costs, see e.g. [SJ+09; SJ+10; JKM12a; JAS17; SJ+17].
Comparison between logarithmic expressions constitutes a first major challenge, as such a comparison cannot be directly encoded as a linear constraint problem.
To achieve such a linearisation, we make use of the following: (i) a subtle and surprisingly effective variant of Schoenmakers' potential (not covered below, refer to Section 3.1); (ii) mathematical facts about the logarithm function, like Lemma 4 below, referred to as expert knowledge; and finally (iii) Farkas' Lemma (Lemma 5) for turning the universally-quantified premise of the weakening rule into an existentially-quantified statement that can be added to the constraint system. Furthermore, the presence of logarithmic basic functions seems to necessitate the embodiment of non-linear arithmetic. In particular, we need to make use of basic laws of the log function, such as the following one; a variant of the fact below has already been observed by Okasaki, see [CO99]. Lemma 4. For all x, y ⩾ 1 we have log(x) + log(y) ⩽ 2·log(x + y) − 2.
Proof. For x, y ⩾ 1 we have (x + y)² ⩾ 4xy, and from the monotonicity of log we conclude log(xy) ⩽ log((x + y)²/4). By elementary laws of log we obtain 2·log(x + y) − 2 for the right-hand side, from which the lemma follows as log(xy) = log(x) + log(y). □ We remark that our automated analysis shows that Lemma 4 is not only crucial in the analysis of splaying, but also for the other data structures we have investigated.
A refined and efficient approach targeting linear constraints is achievable as follows. All logarithmic terms, that is, terms of the form log(·), are replaced by new variables, focusing on finitely many of them. For the latter, we exploit the condition that in resource annotations only finitely many coefficients are non-zero. Consider the inequality (4.3) as a prototypical example. The validity of such a constraint ought to incorporate the monotonicity of log.

Below we discuss a general method for the derivation of inequalities such as (4.3), based on the affine form of Farkas' Lemma. First, we state the variant of Farkas' Lemma that we use in this thesis, see [Sch99]. Note that x and λ denote column vectors of suitable length.
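The statement of Lemma 5 did not survive extraction; the affine form used here plausibly reads as follows (with A a rational m×n matrix, b and λ of length m, c and x of length n, d a scalar, and the system Ax ⩽ b, x ⩾ 0 assumed solvable):

\[
\forall x \ge 0.\; (Ax \le b \implies c^{\mathsf{T}} x \le d)
\qquad\Longleftrightarrow\qquad
\exists \lambda \ge 0.\; c^{\mathsf{T}} \le \lambda^{\mathsf{T}} A \,\wedge\, \lambda^{\mathsf{T}} b \le d ,
\]

where the left-hand side is (4.4) and the right-hand side is (4.5), matching the proof below.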
Proof. It is easy to see that from (4.5) we obtain (4.4). Assume (4.5), and assume further that Ax ⩽ b for some column vector x ⩾ 0. Then we have c^T x ⩽ λ^T A x ⩽ λ^T b ⩽ d. Note that for this direction the assumption that Ax ⩽ b, x ⩾ 0 is solvable is not required.
With respect to the opposite direction, we assume (4.4). By assumption, Ax ⩽ b, x ⩾ 0 is solvable. Hence, maximisation of c^T x under the side conditions Ax ⩽ b, x ⩾ 0 is feasible. Let δ denote the maximal value. Due to (4.4), we have δ ⩽ d. Now, consider the dual asymmetric linear program of minimising λ^T b under the side conditions λ^T A ⩾ c^T and λ ⩾ 0. Due to the Duality Theorem, the dual problem is also solvable, with the same optimal value λ^T b = δ ⩽ d.
According to the above discussion, we can represent the inequality Φ(Γ|Q) ⩽ Φ(Γ|P) in the form u^T x + c ⩽ v^T x + d, where x is a finite vector of variables representing the basic potential functions, u and v are column vectors representing the unknown coefficients of the non-constant potential functions, and c and d are the coefficients of the constant potential functions. We assume the expert knowledge is given by the constraints Ax ⩽ b, x ⩾ 0. We now want to derive conditions on u, v, c and d such that we can guarantee (4.6): u^T x + c ⩽ v^T x + d for all x ⩾ 0 with Ax ⩽ b. By Farkas' Lemma it is sufficient to find coefficients λ ⩾ 0 such that (4.7): u − v ⩽ A^T λ and λ^T b ⩽ d − c. Hence, we can ensure (4.6) by (4.7), using the new unknowns λ.
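As a small worked instance (our own illustration, not taken from the benchmarks): suppose the only expert fact is an instance of the monotonicity of log, encoded over the log-variables y = (y_1, y_2) as y_1 ⩽ y_2 (say y_1 = log(|t_1|) and y_2 = log(|t_2|) with |t_1| ⩽ |t_2| known), and the weakening obligation is q·y_1 + c ⩽ p·y_2 + d for all admissible y ⩾ 0. In the notation above,

\[
A = \begin{pmatrix} 1 & -1 \end{pmatrix}, \qquad b = 0, \qquad u = (q, 0)^{\mathsf{T}}, \qquad v = (0, p)^{\mathsf{T}} ,
\]

and (4.7) asks for a λ ⩾ 0 with q ⩽ λ, −p ⩽ −λ and λ·0 ⩽ d − c, i.e. q ⩽ λ ⩽ p and c ⩽ d. The purely linear side conditions q ⩽ p and c ⩽ d are thus added to the constraint system, eliminating the universally quantified y.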
Thus, the validity of constraints incorporating the monotonicity of log becomes expressible in a systematic way. Further, the symbolic comparisons enforced by the weakening rule can be discharged systematically and become expressible as existential constraint satisfaction problems. Note that the incorporation of Farkas' Lemma in the above form subsumes the well-known practice of coefficient comparison for the synthesis of polynomial interpretations [EC+05], ranking functions [AA04] or resource annotations in the context of constant amortised costs [JKM12a].
The incorporation of Farkas' Lemma with suitable expert knowledge is already essential for type checking, whenever (w) needs to be discharged.
ATLAS incorporates two facts into the expert knowledge: Lemma 4 and the monotonicity of the logarithm (see Chapter 5). We found these two facts to be sufficient for handling our benchmarks, i.e. expert knowledge inferred from the program under analysis (item 2. in the enumeration of Section 1) was not needed. (We note, though, that we have experimented with adding a dedicated size analysis, which interestingly increased the solver performance, despite generating a larger constraint system.)
The above variant of Farkas' Lemma lies at the heart of this effective transformation of the comparisons demanded by (w) into a linear constraint problem. It allows the incorporation of expert knowledge through the constraints Ax ⩽ b, x ⩾ 0; the expert knowledge thus formalised is a clear point of departure for additional information.
In the next section, we briefly detail our implementation of the established logarithmic amortised resource analysis, based on the observations in this section.

Type Inference
Second, we reckon with the extent to which these ideas are sufficient for type checking, and detail further challenges towards fully automated type inference and the main design choices in ATLAS to overcome these challenges. Finally, we indicate how our tool solves the gathered constraints induced by the type system for the motivating example, and we remark on the challenges posed by our benchmark code base for splay heaps and pairing heaps.
Further, we automate the application of structural rules such as sharing and weakening. We emphasise that it is not sufficient to include all weakening steps in the axioms of the typing rules. This is in contrast to the situation in earlier work by Hofmann et al., e.g. [MS03; JKM11; JKM12b; HR13; JAS17; SJ+17], which could rely on so-called algorithmic typing rules.
Concretely, our results came about through (i) a novel optimisation layer; (ii) a careful control of the structural rules; (iii) the generalisation of user-defined proof tactics into an overall strategy for type inference; and (iv) the provision of an automated amortised analysis in the sense of Sleator and Tarjan.
In the sequel of the section, we discuss these stepping stones towards full automation in more detail.

Structural Rules
We observed that an unchecked application of the structural rules, that is, of the sharing and the weakening rule, quickly leads to an explosion of the size of the constraint system and thus to de-facto unsolvable problems. To wit, an earlier version of our implementation ran continuously for a week without being able to infer a type for the complete definition of the function splay.
The type-and-effect system proposed by Hofmann et al. is in principle linear, that is, variables occur at most once in the function body. For example, this is employed in the definition of the (let)-rule, see Section 2.1. However, a sharing rule is admissible, which allows us to treat multiple occurrences of variables: occurrences of non-linear variables are suitably renamed apart and the carried potential is shared among the variants. The number of variables strongly influences the size of the constraint problem. Hence, eager application of the sharing rule proved infeasible. Instead, we restricted its application to individual program paths. For the considered benchmark examples, this removed the need for sharing altogether.
With respect to weakening, however, a refinement of the employed weakening steps proved essential. That is, we make use of different levels of granularity in this automation, ranging from a simple coefficient comparison (indicated in the tactics as w) to the full endowment of the methodology discussed above (w{mono l2xy}); see the detailed discussion in Chapter 5.
The structural rules can in principle be applied at every AST node of the program under analysis. However, they introduce additional variables and constraints, and for performance reasons it is better to apply them sparingly. For the sharing rule we proceed as follows: we recall that the sharing rule allows us to assume that the type system is linear. In particular, we can assume that every variable occurs exactly once in the type context, which is exploited in the definition of the let rules. However, such an eager application of the sharing rule would directly lead to a size explosion in the number of constraints, as the generation of each fresh variable requires the generation of exponentially many annotations. Hence, we apply sharing only when strictly necessary. In this way the typing context can be kept small. Similar to the sharing rule (share), variable weakening (w : var) is employed only when required, which in turn reduces the number of constraints generated.

For the weakening rule, we employ our novel methods for symbolically comparing logarithmic expressions, which we discussed in Section 4.1. Because of our use of Farkas' Lemma, weakening introduces new unknown coefficients. For performance reasons, we need to control the size of the resulting constraint system and rely on the user to insert applications of the weakening rule. We note that the weakening rule may need to be applied in the middle of a type derivation; see for example the typing derivation for our motivating example in Figure 3.2. This contrasts with the literature, where the weakening rule can typically be incorporated into the axioms of the type system and thus dispensed with. Perhaps a similar approach is possible in the context of logarithmic amortised resource analysis; for now, we have not been able to verify this.

Proof Tactics
In combination with our optimisation framework, tactics allow us to significantly improve type annotations. To wit, ATLAS can be invoked with user-defined resource annotations for the function splay, representing its "standard" amortised complexity (e.g. copied from Okasaki's book [CO99]), and an easily definable tactic, for example the tactic for the zig-zig case depicted in Figure 4.1. ATLAS then automatically derives the improved bound reported above. Still, for full automation, tactics are clearly not sufficient. In order to obtain type inference in general, we developed a generalisation of all the tactics that proved useful on our benchmarks and incorporated this proof search strategy into the type inference algorithm. Using this, the aforementioned (unsuccessful) week-long quest for a type inference of splaying can now be successfully completed (with the best known result) in minutes.

Automated Amortised Analysis
In Section 2.1, we provided a high-level introduction to the potential method and remarked that Sleator and Tarjan's original formulation is re-obtained if the corresponding potential functions are defined such that $a(o) = c(o) + \Phi(D') - \Phi(D)$; see page 11. Formally, this can be achieved by careful control of the annotated signatures of the functions studied. We now discuss how we can extract amortised complexities in the sense of Sleator and Tarjan from our approach. Suppose we are interested in an amortised analysis of splay heaps. Then it suffices to equate the right-hand sides of the annotated signatures of the splay heap functions. That is, we set del_min: B × T|$Q_1$ → T|$Q'$, insert: B × T|$Q_2$ → T|$Q'$ and partition: T|$Q_3$ → T|$Q'$ for some unknown resource annotations $Q_1$, $Q_2$, $Q_3$, $Q'$. Note that we use the same result annotation $Q'$ for all signatures. We can then obtain a potential function in the sense of Sleator and Tarjan from the annotation $Q'$, and deduce $Q_i - Q'$ as an upper bound on the amortised complexity of the respective function. In Chapter 5, we discuss how to automatically optimise $Q_i - Q'$ in order to minimise the amortised complexity bound. Thus, by soundness, we can derive the amortised cost of pairing heaps, in the original sense of Sleator and Tarjan, in exactly the same way, by adding the constraint that the type annotations of the results of all functions defined over pairing heaps are equal. This automatic minimisation is the second major contribution of our work.

Our results suggest a new approach to the complexity analysis of data structures. On the one hand, we obtain novel insights into the automated worst-case runtime complexity analysis of involved programs. On the other hand, we provide a proof of concept of a computer-aided analysis of the amortised complexity of data structures that so far have only been analysed manually. For example, our approach allows the automated verification of certificates in program code stating the (expected) amortised complexity. Most often such comments only reflect the expectation of the programmer, but have not been verified in any way, let alone in a formally verifiable one.
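In symbols, the extraction works as follows; $c_f$ denotes the actual cost and $a_f$ the amortised cost of an operation $f$, and the derivation below is a sketch of the reasoning via the soundness theorem:

% take \Phi(\cdot \mid Q') as the Sleator-Tarjan potential; for
% f : T|Q_i -> T|Q', soundness gives c_f(t) <= \Phi(t|Q_i) - \Phi(f t|Q'), hence
\[ a_f(t) \;=\; c_f(t) + \Phi(f\,t \mid Q') - \Phi(t \mid Q')
   \;\leq\; \Phi(t \mid Q_i) - \Phi(t \mid Q') \]

That is, the amortised cost is bounded by the potential induced by the annotation difference $Q_i - Q'$ on the input.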

CHAPTER 5
Implementation

Based on the principal approach delineated in Section 2.1, we have implemented the logarithmic amortised resource analysis detailed above. In this chapter, we briefly indicate the corresponding design choices and heuristics used.
Our tool ATLAS implements the type system partly presented in Figure 3.1. Its core is the syntax-directed application of typing rules.
It operates in three phases:

1.) Preprocessing: ATLAS parses and normalises the input program;
2.) Generation of the Constraint System: ATLAS extracts constraints from the normalised program according to the typing rules (see Figure 3.1); and
3.) Solving: the derived constraint system is handed to an optimising constraint solver and the solver output is converted into a type annotation.
In terms of overall resource requirements, the bottleneck of the system is phase three. Preprocessing is both simple and fast. While the code implementing constraint generation might be complex, its execution is fast. All of the underlying complexity is shifted into the third phase. On modern machines with multiple gibibytes of main memory, ATLAS is constrained by the CPU, and not by the available memory.
In the remainder of this section, we first detail these phases of ATLAS. We then go into more detail on the second phase. Finally, we elaborate on the optimisation function, which is the key enabler of type inference.

Preprocessing
The parser used in the first phase is generated with ANTLR, and the transformation of the syntax is implemented in Java. The preprocessing performs two tasks: (i) transformation of the input program into let-normal-form, which is the form of program input required by our type system.
(ii) The unsharing conversion creates explicit copies of variables that are used multiple times. Making multiple uses of a variable explicit is required by the (let)-rule of the type system.
In order to satisfy the requirement of the (let)-rule, it is actually sufficient to track variable usage at the level of program paths. It turns out that in our benchmarks variables are only used multiple times in different branches of an if-statement, for which no unsharing conversion is needed. Hence, we do not discuss the unsharing conversion further in this thesis and refer the interested reader to related approaches [JH11; JKM12b; JKM12a; JAS17] for more details.

Let-Normal-Form Conversion
Since programs are not usually written in the restricted syntax demanded by the type system, the input program is first converted to let-normal-form. The conversion is performed recursively and rewrites composed expressions into simple expressions, where each operator is only applied to a variable or a constant. This conversion is achieved by introducing additional let-constructs. We exemplify let-normal-form conversion on a code snippet in Figure 5.1.

Unsharing
The unsharing operation introduces an explicit share node before a let-expression whenever its subexpressions have a shared variable. It introduces fresh variables and renames occurrences of the shared variable. Note that this is a departure from the type system and makes the application of the (share) rule syntax-directed. We give an example of unsharing in Figure 5.2. We note that unsharing is only required when variables can be used multiple times on the same program path. However, in none of our benchmarks are variables shared, so this step is of no relevance for the presented results.

Generation of the Constraint System
After preprocessing, we apply the typing rules. Each rule application generates a set of constraints, which are collected over multiple passes over the syntax tree. Importantly, the application of all typing rules, except for the weakening rule, which we discuss in further detail below, is syntax-directed: each node of the AST of the input program dictates which typing rule is to be applied. The weakening rule could in principle be applied at each AST node, giving the constraint solver more freedom to find a solution. This degree of freedom needs to be controlled by the tool designer. In addition, recall that the suggested implementation of the weakening rule (see Section 4.1) is parameterised by the expert knowledge fed into the weakening rule. In our experiments we noticed that the weakening rule has to be applied sparingly in order to avoid an explosion of the resulting constraint system.
We summarise the degrees of freedom available to the tool designer, which can be specified as parameters to ATLAS at source level.
1.) The selected template potential functions, i.e. the family of indices $(\vec{a}, b)$ for which coefficients $q_{(\vec{a},b)}$ are generated (coefficients that are not explicitly generated are set to zero).
2.) The number of annotated signatures (with costs and without costs) for each function.
We now discuss our choices for the aforementioned degrees of freedom.

Potential Function Templates
For each node in the AST of the considered input program where variables $t_1, \dots, t_m$ of type Tree are currently in context, we create the coefficients $q_1, \dots, q_m$ for the rank functions and the coefficients $q_{(\vec{a},b)}$ for the logarithmic terms, where $a_i \in \{0,1\}$ and $b \in \{0,2\}$. This choice turned out to be sufficient in our experiments.
Our potential-based method employs linear combinations of basic potential functions BF; see Definition 2. In order to fix the cardinality of the set of resource functions to be considered, we restrict the coefficients of the potential functions $(a_1, \dots, a_m, b)$. For the non-constant part, we demand that $a_i \in \{0,1\}$, while the coefficient $b$, representing the constant part, is restricted to $\{0,1,2\}$. This restriction to a relatively small set of basic potential functions suitably controls the number of constraints generated for each inference rule in the type system.
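For concreteness, the resulting potential template can be written as follows (our reconstruction, following the notation of Definition 2, with $|t|$ denoting the size of tree $t$ and $\mathrm{rk}$ the rank function):

\[ \Phi(\Gamma \mid Q) \;=\; \sum_{i=1}^{m} q_i \cdot \mathrm{rk}(t_i)
   \;+\; \sum_{(\vec{a},b)} q_{(\vec{a},b)} \cdot \log\bigl(a_1 |t_1| + \dots + a_m |t_m| + b\bigr) \]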

Number of Function Signatures
We fix the number of annotations for each function to: (i) one indeterminate type annotation, representing a function call with costs; (ii) one indeterminate cost-free type annotation, representing a zero-cost call; and (iii) one fixed cost-free annotation, namely the empty annotation, which does not carry any potential. These restrictions were sufficient to handle our benchmarks. A larger, potentially infinite set of type annotations is conceivable, as long as well-typedness is respected; see Definition 7. As noted in the context of the analysis of constant amortised complexity, an enlarged set of type annotations may even be required to handle non-tail-recursive programs, see [JKM12a; JAS17].

Sparse Expert Knowledge Matrix
We observe for both kinds of constraints that the matrix $A$ is sparse. We exploit this in our implementation and only store non-zero coefficients.
Parametrisation of Weakening. Each application of the weakening rule is parametrised by the matrix $A$. In our tool, we instantiate $A$ with either the constraints for (i) monotonicity, referred to as w{mono}; (ii) Lemma 4 (w{l2xy}); (iii) both (w{mono l2xy}); or (iv) none of the constraints (w).
In the last case, Farkas' Lemma is not needed: without expert knowledge, (4.7) degenerates to a point-wise comparison of the coefficients $q_{(\vec{a},b)}$, which can be implemented more directly. Each time we apply weakening, we need to choose how to instantiate the matrix $A$. Our experiments demonstrate that we need to apply monotonicity and Lemma 4 sparingly in order to avoid blowing up the constraint system.

Automated Mode. For automation, we extracted common patterns from the tactics we developed manually: weakening with mode w{mono} is applied before (var) and (leaf), and w{mono l2xy} is applied only before (app). Further, for AST subtrees that construct trees, i.e. which only consist of (node), (var) and (leaf) rule applications, we apply w{mono} for each inner node, and w{l2xy} for each outermost node. In all other cases, no weakening is applied. This approach is sufficient to cover all benchmarks, with further improvements possible; a sketch of the selection policy follows below.
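The following self-contained Java fragment sketches this selection policy; the AST model (Rule, AstNode) and all names are hypothetical placeholders, not ATLAS's actual API:

// Hypothetical sketch of the automated weakening policy described above.
enum Rule { VAR, LEAF, APP, NODE, OTHER }

enum WeakeningMode { NONE, MONO, L2XY, MONO_L2XY }

record AstNode(Rule rule, boolean inTreeConstruction, boolean outermostConstructor) {}

final class WeakeningPolicy {
    static WeakeningMode modeBefore(AstNode n) {
        // Subtrees that only construct trees, i.e. consist of (node), (var)
        // and (leaf) applications: w{mono} at each inner node, w{l2xy} at
        // the outermost node.
        if (n.inTreeConstruction()) {
            return n.outermostConstructor() ? WeakeningMode.L2XY : WeakeningMode.MONO;
        }
        switch (n.rule()) {
            case VAR:
            case LEAF: return WeakeningMode.MONO;       // w{mono} before (var), (leaf)
            case APP:  return WeakeningMode.MONO_L2XY;  // w{mono l2xy} before (app)
            default:   return WeakeningMode.NONE;       // no weakening elsewhere
        }
    }
}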

Solving
For solving the generated constraint system, we rely on the Z3 SMT solver, see [MB08]. We employ Z3's Java bindings, load Z3 as a shared library, and exchange constraints for solutions. ATLAS forwards user-supplied configuration to Z3, which allows for flexible tuning of solver parameters. We also record Z3's statistics, most importantly memory usage. During the implementation of ATLAS, Z3's feature to extract unsatisfiable cores has proven valuable: it supplied us with many counterexamples, often directly pinpointing bugs in our implementation. The tool exports constraint systems in SMT-LIB format to the file system. This way, solutions can be cross-checked by re-computing them with other SMT solvers that support minimisation, such as OptiMathSAT [ST15].
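A minimal sketch of this round-trip, assuming only the standard Z3 Java API (the single constraint shown is a stand-in, not an actual ATLAS constraint):

import com.microsoft.z3.*;
import java.nio.file.*;

// Solve a toy constraint system and export it in SMT-LIB format,
// so other solvers (e.g. OptiMathSAT) can re-check the result.
public final class SolvingSketch {
    public static void main(String[] args) throws Exception {
        try (Context ctx = new Context()) {
            Solver solver = ctx.mkSolver();
            RealExpr q = ctx.mkRealConst("q");
            solver.add(ctx.mkGe(q, ctx.mkReal(0)));   // stand-in constraint
            if (solver.check() == Status.SATISFIABLE) {
                Model model = solver.getModel();      // solution -> type annotation
                System.out.println("q = " + model.eval(q, false));
            }
            // Solver.toString() renders the asserted constraints in SMT-LIB.
            Files.writeString(Path.of("constraints.smt2"), solver.toString());
        }
    }
}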

Optimisation
Given an annotated function $f : T_1 \times \dots \times T_n|Q \to T|Q'$, we want to find values for the coefficients of the resource annotations $Q$ and $Q'$ that minimise $\Phi(\Gamma|Q) - \Phi(\Gamma|Q')$, since this difference is an upper bound on the amortised cost of $f$; see Section 4.2.4. However, as with weakening, we cannot directly express such a minimisation, and again resort to linearisation: we choose an optimisation function that directly maps from $Q$ and $Q'$ to $\mathbb{Q}$. Our optimisation function combines four measures, three of which involve a difference between coefficients of $Q$ and $Q'$, and a fourth one that only involves coefficients from $Q$, in order to minimise the absolute values of the discovered coefficients. We first present these measures for the special case of $|\Gamma| = 1$.
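A plausible shape of such a target function, directly formalising this description, is shown below; the weights $w_1, \dots, w_4$ and the exact grouping of coefficients are illustrative assumptions on our part, not the precise definition used by ATLAS:

% three difference measures between Q and Q', plus one magnitude measure on Q:
\[ \mathrm{target}(Q, Q') \;=\;
     w_1 \sum_{i} (q_i - q'_i)
   + w_2 \sum_{(\vec{a},b)} \bigl(q_{(\vec{a},b)} - q'_{(\vec{a},b)}\bigr)
   + w_3 \,\bigl(q_{(\vec{0},2)} - q'_{(\vec{0},2)}\bigr)
   + w_4 \sum_{(\vec{a},b)} q_{(\vec{a},b)} \]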

Evaluation
We first describe the benchmark functions employed to evaluate ATLAS and then detail this experimental evaluation, already depicted in Table 1.1.

Splay Trees
Introduced by Sleator and Tarjan [DR85; RE85], splay trees are self-adjusting binary search trees with strictly increasing in-order traversal, but without an explicit balancing condition. Searching is performed by splaying with the sought element and comparing it to the root of the result. Similarly, insertion and deletion are based on splaying. Above, we used the zig-zig case of splaying, depicted in Figure 2.2, as our motivating code example. While the pen-and-paper analysis of this case is the most involved, type inference for this case alone did not directly yield the desired automation of the complete definition. Rather, full automation required substantial implementation effort, as detailed in Chapter 5. As already emphasised, it came as a surprise to us that our tool ATLAS is able to match and partly improve upon the sophisticated optimisations performed by Schoenmakers [Sch92; BS93]. This seems to be evidence of the versatility of the employed potential functions. Further, we leverage the sophistication of our optimisation layer in conjunction with the current power of state-of-the-art constraint solvers, like Z3 [MB08].

Splay Heaps
To overcome deficiencies of splay trees when implemented functionally, Okasaki introduced splay heaps. Splay heaps are defined similarly to splay trees and their (manual) amortised cost analysis follows a pattern similar to that for splay trees. Due to the similarity in the definitions of splay heaps and splay trees, extending our experimental results in this direction did not pose any problems. Notably, however, ATLAS fully automatically derives the best known complexity bounds (or slight improvements thereof) on the amortised complexity of the functions studied. We also remark that typical assumptions made in pen-and-paper proofs are automatically discharged by our approach. To wit, Schoenmakers [Sch92; BS93] as well as Nipkow and Brinkop [NB19] make use of the (obvious) fact that the size of the resulting tree or heap equals the size of the input. As discussed, this information is captured through a cost-free derivation; see Section 2.1.

Pairing Heaps
These are another implementation of heaps, represented as binary trees subject to the invariant that a pairing heap is either a leaf, or its right child is a leaf. The left child can be viewed as a list of pairing heaps. Schoenmakers and Nipkow et al. provide (semi-)manual analyses of pairing heaps, which ATLAS can verify or even improve fully automatically. We note that we analyse a single function merge_pairs, whereas [NB19] breaks down the analysis and studies two functions pass_1 and pass_2 with merge_pairs = pass_2 ∘ pass_1. All definitions can be found at [Lor21b].

Experimental Results
Our main results have already been stated in Table 1.1 of Section 1.2. Table 5.1a compares the "naive automation" with our actual automation ("automated mode"); see Section 5.1. Within the latter, we distinguish between a "selective" and a "full" mode. The "selective" mode is as described in Section 5.1.2. The "full" mode employs weakening for the same rule applications as the "selective" mode, but always with option w{mono l2xy}. The same applies to the "full" manual mode. The naive automation does not support the selection of expert knowledge; thus the "selective" option is not available, denoted "n/a". Timeouts are denoted by "t/o". As depicted in the table, the naive automation does not terminate within 24h for the core operations of the three considered data structures, whereas the improved automated mode produces optimised results within minutes. In Table 5.1b, we compare the (improved) automated mode with the manual mode, and report on the sizes of the resulting constraint systems and on the resources required to produce the same results. Observe that even though our automated mode achieves reasonable solving times, there is still a significant gap between the manually crafted tactics and the automated mode, which invites future work.

Conclusion
We have presented an amortised resource analysis using the potential method. Potential functions take the shape of "sums of logarithms". The method is rendered in a type-and-effect system. Our type system has been carefully designed with the goal of automation, crucially invoking Farkas' Lemma for the linear part of the calculations and adding necessary facts about the logarithm.
Our contribution is novel in the sense that this is the first approach to the automation of logarithmic amortised complexity analysis. In particular, our system automatically infers competitive results for the logarithmic amortised cost of multiple operations on self-adjusting data structures such as splay trees, splay heaps and pairing heaps.
As our potential functions are logarithmic, we cannot directly encode the comparison between logarithmic expressions within the theory of linear arithmetic. This, however, is vital for e.g. expressing Schoenmakers' and Nipkow's (manual) analyses [BS93; TN15] in our type-and-effect system. In order to overcome this algorithmic challenge, we proposed several ideas for the linearisation of the induced constraint satisfaction problem.
These efforts can be readily extended by expanding upon the expert knowledge currently employed, e.g. via the incorporation of the results of a static analysis performed in a pre-processing step.
Immediate future work is concerned with replacing the "sum of logarithms" potential functions in order to analyse skew heaps and Fibonacci heaps [Sch92]. In particular, the potential function for skew heaps, which counts "right heavy" nodes, is interesting, because this function is used as a building block by Iacono in his improved analysis of pairing heaps [Iac00; IY16]. Further, we envision extending our analysis to related probabilistic settings, such as the analysis of priority queues [GM98] and skip lists [Pug90].