Automated Expected Amortised Cost Analysis of Probabilistic Data Structures

In this paper, we present the first fully-automated expected amortised cost analysis of self-adjusting data structures, that is, of randomised splay trees, randomised splay heaps and randomised meldable heaps, which so far have been analysed only (semi-)manually in the literature. Our analysis is stated as a type-and-effect system for a first-order functional programming language with support for sampling over discrete distributions, non-deterministic choice and a ticking operator. The latter allows for the specification of fine-grained cost models. We state two soundness theorems based on two different, but strongly related, typing rules of ticking, which account differently for the cost of non-terminating computations. Finally, we provide a prototype implementation able to fully automatically analyse the aforementioned case studies.


Introduction
Probabilistic variants of well-known computational models such as automata, Turing machines or the λ-calculus have been studied since the early days of computer science (see [15,16,23] for early references). One of the main reasons for considering probabilistic models is that they often allow for the design of more efficient algorithms than their deterministic counterparts (see e.g. [6,21,23]). Another avenue for the design of efficient algorithms has been opened up by Sleator and Tarjan [31,33] with their introduction of the notion of amortised complexity.
Here, the cost of a single data structure operation is not analysed in isolation but as part of a sequence of data structure operations. This allows for the design of algorithms where the cost of an expensive operation is averaged out over multiple operations, resulting in a good overall worst-case cost. Both methodologies, probabilistic programming and amortised complexity, can be combined for the design of even more efficient algorithms, as for example in randomised splay trees [1], where a rotation in the splaying operation is only performed with some probability (which improves the overall performance by skipping some rotations while still guaranteeing that enough rotations are performed).
In this paper, we present the first fully-automated expected amortised cost analysis of probabilistic data structures, that is, of randomised splay trees, randomised splay heaps, randomised meldable heaps and a randomised analysis of a binary search tree. These data structures have so far been analysed only (semi-)manually in the literature. Our analysis is based on a novel type-and-effect system, which constitutes a generalisation of the type system studied in [13,17] to the non-deterministic and probabilistic setting, as well as an extension of the type system introduced in [34] to sublinear bounds and non-determinism. We provide a prototype implementation that is able to fully automatically analyse the case studies mentioned above. We summarise here the main contributions of our article: (i) We consider a first-order functional programming language with support for sampling over discrete distributions, non-deterministic choice and a ticking operator, which allows for the specification of fine-grained cost models. (ii) We introduce compact small-step as well as big-step semantics for our programming language. These semantics are equivalent wrt. the obtained normal forms (i.e., the resulting probability distributions) but differ wrt. the cost assigned to non-terminating computations. (iii) Based on [13,17], we develop a novel type-and-effect system that strictly generalises the prior approaches from the literature. (iv) We state two soundness theorems (see Section 5.3) based on two different, but strongly related, typing rules of ticking. The two soundness theorems are stated wrt.
the small-step resp. big-step semantics because these semantics precisely correspond to the respective ticking rule. The more restrictive ticking rule can be used to establish (positive) almost sure termination (AST), while the more permissive ticking rule supports the analysis of a larger set of programs (which can be very useful in case termination is not required or can be established by other means); in fact, the more permissive ticking rule is essential for the precise cost analysis of randomised splay trees. We note that the two ticking rules and corresponding soundness theorems do not depend on the details of the type-and-effect system, and we believe that they will be of independent interest (e.g., when adapting the framework of this paper to other benchmarks and cost functions). (v) Our prototype implementation ATLAS strictly extends an earlier version discussed in [17]; all our earlier evaluation results can be replicated (and sometimes improved).
With our implementation and the obtained experimental results we make two contributions to the complexity analysis of data structures: 1. We automatically infer bounds on the expected amortised cost, which could previously only be obtained by sophisticated pen-and-paper proofs. In particular, we verify that the amortised costs of randomised variants of self-adjusting data structures improve upon their non-randomised variants. In Table 1 we state the expected cost of the randomised data structures and their deterministic counterparts; the benchmarks are detailed in Section 2. 2. We establish a novel approach to the expected cost analysis of data structures.
Table 1: Expected Amortised Cost of Randomised Data Structures. We also state the deterministic counterparts considered in [17] for comparison.

While the detailed study of Albers et al. in [1] requires a sophisticated pen-and-paper analysis, our approach allows us to fully automatically compare the effect of different rotation probabilities on the expected cost (see Table 2 of Section 6).
Related Work. The generalisation of the model of computation and the study of the expected resource usage of probabilistic programs has recently received increased attention (see e.g. [2,4,5,7,9,10,14,19,20,22,25,34,35]). We focus on related work concerned with the automation of expected cost analysis of deterministic or non-deterministic, probabilistic programs, imperative or functional. (A probabilistic program is called non-deterministic if it additionally makes use of non-deterministic choice.) In recent years the automation of expected cost analysis of probabilistic data structures or programs has gained momentum, cf. [2-5,20,22,25,34,35]. Notably, the Absynth prototype by [25] implements Kaminski's ert-calculus, cf. [14], for reasoning about expected costs. Avanzini et al. [5] introduce the tool eco-imp, which generalises the Absynth prototype and provides a modular, and thus more efficient and scalable, alternative for non-deterministic, probabilistic programs. In comparison to these works, we base our analysis on a dedicated type system fine-tuned to express sublinear bounds; further, our prototype implementation ATLAS derives bounds on the expected amortised costs. Neither is supported by Absynth or eco-imp. Martingale-based techniques have been implemented, e.g., by Peixin Wang et al. [35]. Related results have been reported by Moosbrugger et al. [22]. Meyer et al. [20] provide an extension of the KoAT tool, generalising the concept of alternating size and runtime analysis to probabilistic programs. Again, these innovative tools are not suited to the benchmarks considered in our work. With respect to probabilistic functional programs, Di Wang et al. [34] provided the only prior expected cost analysis of (deterministic) probabilistic programs; this work is most closely related to our contributions. Indeed, our typing rule (ite : coin) stems from [34] and the soundness proof wrt.
the big-step semantics is conceptually similar. Nevertheless, our contributions strictly generalise their results. First, our core language is based on a simpler semantics, giving rise to cleaner formulations of our soundness theorems. Second, our type-and-effect system provides two different typing rules for ticking, a fact we capitalise on to increase the strength of our prototype implementation. Finally, our amortised analysis allows for logarithmic potential functions.
A bulk of research concentrates on specific forms of martingales or Lyapunov ranking functions. All these works, however, are somewhat orthogonal to our contributions, as foremost termination (i.e. AST or PAST) is studied, rather than computational complexity. Still, these approaches can be partially adapted to a variety of quantitative program properties, see [32] for an overview, but are incomparable in strength to the results established here.
Structure. In the next section, we provide a bird's eye view on our approach. Sections 3 and 4 detail the core probabilistic language employed, as well as its small- and big-step semantics. In Section 5 we introduce the novel type-and-effect system and state soundness of the system wrt. the respective semantics. In Section 6 we present evaluation results of our prototype implementation ATLAS. Finally, we conclude in Section 7.

Overview of Our Approach and Results
In this section, we first sketch our approach on an introductory example and then detail the benchmarks and results depicted in Table 1 in the Introduction.

Introductory Example
Consider the definition of the function descend, depicted in Figure 1 (there, coin 1/2 denotes a coin toss with probability p = 1/2; this probability is the default and may be omitted). The expected amortised complexity of descend is log2(|t|), where |t| denotes the size of a tree (defined as the number of leaves of the tree). Our analysis is set up in terms of template potential functions with unknown coefficients, which will be instantiated by our analysis. Following [13,17], our potential functions are composed of two types of resource functions, which can express logarithmic amortised cost: For a sequence of n trees t_1, ..., t_n and coefficients a_1, ..., a_n, b, the resource function p_(a1,...,an,b)(t_1, ..., t_n) := log2(a_1·|t_1| + ... + a_n·|t_n| + b) denotes the logarithm of a linear combination of the sizes of the trees. The resource function rk(t), which is a variant of Schoenmakers' potential, cf. [26,29,30], is inductively defined as (i) rk(leaf) := 1; (ii) rk(node l a r) := rk(l) + log2(|l|) + log2(|r|) + rk(r). (We note that rk(t) is not needed for the analysis of descend but is needed for more involved benchmarks, e.g. randomised splay trees.) With these resource functions at hand, our analysis introduces the coefficients q*, q(1,0), q(0,2), q′*, q′(1,0), q′(0,2) and employs the following Ansatz:

q*·rk(t) + q(1,0)·p_(1,0)(t) + q(0,2)·p_(0,2)(t) ⩾ c_descend(t) + E[q′*·rk(descend t) + q′(1,0)·p_(1,0)(descend t) + q′(0,2)·p_(0,2)(descend t)]

Here, c_descend(t) denotes the expected cost of executing descend on tree t, where the cost is given by the ticks as indicated in the source code (each tick accounts for a recursive call). The result of our analysis will be an instantiation of the coefficients, returning q(1,0) = 1 and zero for all other coefficients, which allows us to directly read off the logarithmic bound log2(|t|) of descend.
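To fix intuitions, the behaviour of descend and its tick-based cost model can be sketched in Python (a hypothetical transliteration of Figure 1, not the paper's formal language; trees are tuples, and each recursive call is ticked with cost 1):

```python
import random

# Binary trees: None is a leaf; internal nodes are (left, label, right).
LEAF = None
def node(l, a, r): return (l, a, r)

def size(t):
    """|t| = number of leaves of t."""
    if t is LEAF:
        return 1
    l, _, r = t
    return size(l) + size(r)

def descend(t, p=0.5):
    """Descend from the root to a leaf, going left or right on a
    biased coin; returns (result tree, cost), where every recursive
    call is ticked with cost 1 as in Figure 1."""
    if t is LEAF:
        return LEAF, 0
    l, a, r = t
    if random.random() < p:
        xl, c = descend(l, p)   # tick: one unit for the recursive call
        return node(xl, a, r), c + 1
    else:
        xr, c = descend(r, p)
        return node(l, a, xr), c + 1
```

On a perfect tree of size 2^k, every run costs exactly k = log2(|t|) ticks, matching the bound inferred by the analysis.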
Our analysis is formulated as a type-and-effect system, introducing the above template potential functions for every subexpression of the program under analysis. The typing rules of our system give rise to a constraint system over the unknown coefficients that captures the relationship between the potential functions of the subexpressions of the program. Solving the constraint system then gives a valid instantiation of the potential function coefficients. Our type-and-effect system constitutes a generalisation of the type system studied in [13,17] to the non-deterministic and probabilistic setting, as well as an extension of the type system introduced in [34] to sublinear bounds and non-determinism.
In the following, we survey our type-and-effect system by means of the example descend. A partial type derivation is given in Figure 2. For brevity, type judgements and the type rules are presented in a simplified form. In particular, we restrict our attention to tree types, denoted as T. This omission is inessential to the actual complexity analysis. For the full set of rules see the Appendix. We now discuss this type derivation step by step.
Let e denote the body of the function definition of descend, cf. Figure 1. Our automated analysis infers an annotated type by verifying that the type judgement t : T|Q ⊢ e : T|Q′ is derivable. Types are decorated with annotations Q := [q*, q(1,0), q(0,2)] and Q′ := [q′*, q′(1,0), q′(0,2)], employed to express the potential carried by the arguments to descend and its results. Annotations fix the coefficients of the resource functions in the corresponding potential functions. By our soundness theorems (see Section 5.3), such a typing guarantees that the expected amortised cost of descend is bounded by the expectation (wrt. the distribution of values in the limit) of the difference between Φ(t : T|Q) and Φ(descend t : T|Q′). Because e is a match expression, the rule (match) is applied; we only state a restricted form here, relating the annotated context t : T|Q of the conclusion to the annotated context l : T, r : T|Q1 in which the node case is typed (the general rule can be found in the Appendix). Here e1 denotes the subexpression of e that corresponds to the node case of match. Apart from the annotations Q, Q1 and Q′, the rule (match) constitutes a standard type rule for pattern matching. With regard to the annotations Q and Q1, (match) ensures the correct distribution of potential by inducing constraints between their coefficients; these constraints are immediately justified by recalling the definitions of the resource functions rk and p_(a1,...,an,b). The next rule is a structural rule, representing a weakening step that rewrites the annotations of the variable context:

  l : T, r : T|Q2 ⊢ e1 : T|Q′
  --------------------------- (w)
  l : T, r : T|Q1 ⊢ e1 : T|Q′

The rule (w) allows a suitable adaptation of the coefficients based on the following inequality, which holds for any substitution σ of variables by values: Φ(σ; l : T, r : T|Q1) ⩾ Φ(σ; l : T, r : T|Q2). In our prototype implementation this comparison is performed symbolically. We use Farkas' Lemma in conjunction with two facts about the logarithm to
Fig. 3: Partial meld function of Randomised Meldable Heaps (the else branch, omitted for brevity, is symmetric to the depicted case).

linearise this symbolic comparison, namely the monotonicity of the logarithm and the fact that 2 + log2(x) + log2(y) ⩽ 2·log2(x + y) for all x, y ⩾ 1. For example, Farkas' Lemma in conjunction with the latter fact gives rise to constraints such as q1_(0,0,2) + 2f ⩾ q2_(0,0,2) for some fresh rational coefficient f ⩾ 0 introduced by Farkas' Lemma. After having generated the constraint system for descend, the solver is free to instantiate f as needed. In fact, in order to discover the bound log2(|t|) for descend, the solver will need to instantiate f = 1/2, corresponding to the inequality 1 + 1/2·log2(x) + 1/2·log2(y) ⩽ log2(x + y). So far, the rules did not refer to sampling and are unchanged from their (non-probabilistic) counterparts introduced in [13,17]. The next rule, however, formalises a coin toss, biased with probability p. Our general rule (ite : coin) is depicted in Figure 12 and is inspired by a similar rule for coin tosses that has recently been proposed in the literature, cf. [34]. This rule specialises as follows to our introductory example:

  l : T, r : T|Q3 ⊢ let xl = (descend l)✓ in node xl a r : T|Q′    l : T, r : T|Q4 ⊢ e3 : T|Q′
  -------------------------------------------------------------------------------------------- (ite : coin)
  l : T, r : T|Q2 ⊢ if coin 1/2 then e2 else e3 : T|Q′

Here e2 and e3, respectively, denote the subexpressions of the conditional; in addition, the crucial condition Q2 = 1/2·Q3 + 1/2·Q4 is imposed. This condition, expressing that the corresponding annotations are subject to the probability of the coin toss, gives rise to constraints such as q2_(1,0,0) = 1/2·q3_(1,0,0) + 1/2·q4_(1,0,0).
In the following, we will only consider one alternative of the coin toss and proceed as in the partial type derivation depicted in Figure 2 (i.e. we state the then-branch and omit the symmetric else-branch). Thus, next we apply the rule for the let expression. This rule is the most involved typing rule in the system proposed in [13,17]; for our leading example, however, it suffices to consider a simplified variant, (let : tree). Focusing on the annotations, the rule (let : tree) suitably distributes the potential assigned to the variable context, embodied in the annotation Q3, to the recursive call within the let expression (via annotation Q4) and the construction of the resulting tree (via annotation Q7). The distribution of potential is facilitated by generating constraints that can roughly be stated as two "equalities", that is, (i) "Q3 = Q4 + D" and (ii) "Q7 = D + Q6". Equality (i) states that the input potential is split into some potential Q4 used for typing (descend l)✓ and some remainder potential D (which, however, is not constructed explicitly and only serves as a placeholder for potential that will be passed on). Equality (ii) states that the potential Q7 used for typing node xl a r equals the remainder potential D plus the leftover potential Q6 from the typing of (descend l)✓. The (tick : now) rule then ensures that costs are properly accounted for by generating constraints for Q4 = Q5 + 1. Finally, the type derivation ends with the application rule, denoted as (app), which verifies that the recursive call is well-typed wrt. the (annotated) signature of the function descend : T|Q → T|Q′, i.e.
the rule enforces that Q5 = Q and Q6 = Q′. We illustrate (a subset of) the constraints induced by (let : tree), (tick : now) and (app):

  q3_(1,0,0) = q4_(1,0)    q3_(0,1,0) = q7_(0,1,0)    q6_1 = q7_1    q4_(1,0) = q5_(1,0)    q4_(0,2) = q5_(0,2) + 1    q5_(1,0) = q_(1,0)

where (i) the constraints involving the annotations Q3, Q4, Q6 and Q7 stem from the constraints of the rule (let : tree); (ii) the constraints involving Q4, Q5, Q and Q′ stem from the constraints of the rules (tick : now) and (app). For example, q3_(1,0,0) = q4_(1,0) and q3_(0,1,0) = q7_(0,1,0) distribute the part of the logarithmic potential represented by Q3 to Q4 and Q7; q6_1 = q7_1 expresses that the rank of the result of evaluating the recursive call can be employed in the construction of the resulting tree node xl a r; q4_(1,0) = q5_(1,0) and q4_(0,2) = q5_(0,2) + 1 relate the logarithmic resp. constant potential according to the tick rule, where the addition of one accounts for the cost embodied by the tick rule; q5_(1,0) = q(1,0) stipulates that the potential at the recursive call site must match the function type.

Fig. 5: insert function of a Binary Search Tree with randomised comparison (a comparison a < d is assumed to hold with probability 1/2).
Our prototype implementation ATLAS collects all these constraints and solves them fully automatically. Following [13,17], our implementation in fact searches for a solution that minimises the resulting complexity bound. For the descend function, our implementation finds a solution that sets q(1,0) to 1, and all other coefficients to zero. Thus, the logarithmic bound log2(|t|) follows.
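The inferred bound can be sanity-checked empirically. The following Monte Carlo sketch (illustrative Python, not part of ATLAS; all names are our own) estimates the expected number of ticks of descend and compares it against log2(|t|):

```python
import math
import random

LEAF = None  # a leaf; internal nodes are (left, label, right) triples
def node(l, a, r): return (l, a, r)

def size(t):
    """|t| = number of leaves of t."""
    if t is LEAF:
        return 1
    l, _, r = t
    return size(l) + size(r)

def descend_cost(t):
    """Ticks of one run of descend on t: one tick per recursive
    call, descending left or right on a fair coin."""
    c = 0
    while t is not LEAF:
        l, _, r = t
        t = l if random.random() < 0.5 else r
        c += 1
    return c

def mean_cost(t, runs=20000):
    """Monte Carlo estimate of the expected cost c_descend(t)."""
    return sum(descend_cost(t) for _ in range(runs)) / runs
```

For a perfect tree the estimate coincides with log2(|t|) exactly; for skewed trees it stays strictly below the bound, reflecting that log2(|t|) is an upper bound on the expected cost.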

Overview of Benchmarks and Results
Randomised Meldable Heaps. Gambin et al. [12] proposed meldable heaps as a simple priority-queue data structure that is guaranteed to have expected logarithmic cost for all operations. All operations can be implemented in terms of the meld function, which takes two heaps and returns a single heap as a result. The partial source code of meld is given in Figure 3 (the full source code of all examples can be found in the Appendix). Our tool ATLAS fully automatically infers the bound log2(|h1|) + log2(|h2|) on the expected cost of meld.
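A minimal Python sketch of the melding strategy may help fix intuitions (a hypothetical transliteration of Figure 3; the heap with the larger root is recursively melded into a uniformly chosen child of the other heap):

```python
import random

LEAF = None  # a leaf; internal nodes are (left, label, right) triples
def node(l, a, r): return (l, a, r)

def meld(h1, h2):
    """Meld two min-heaps: keep the smaller root, then recursively
    meld the other heap into a uniformly chosen child."""
    if h1 is LEAF:
        return h2
    if h2 is LEAF:
        return h1
    if h2[1] < h1[1]:
        h1, h2 = h2, h1              # ensure h1 carries the smaller root
    l, a, r = h1
    if random.random() < 0.5:        # coin toss picks the child to meld into
        return node(meld(l, h2), a, r)
    else:
        return node(l, a, meld(r, h2))

def labels(h):
    """All labels stored in the heap, in-order."""
    if h is LEAF:
        return []
    l, a, r = h
    return labels(l) + [a] + labels(r)

def is_min_heap(h):
    """Check the min-heap property locally at every node."""
    if h is LEAF:
        return True
    l, a, r = h
    for c in (l, r):
        if c is not LEAF and c[1] < a:
            return False
    return is_min_heap(l) and is_min_heap(r)
```

All priority-queue operations (insert, delete-min) then reduce to meld, e.g. insertion melds a singleton heap into the existing one.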
Randomised Splay Trees. Albers et al. [1] proposed randomised splay trees as a variation of deterministic splay trees [31] with better expected runtime complexity (the same computational complexity in O-notation but with smaller constants). Related results have been obtained by Fürer [11]. The proposal is based on the observation that it is not necessary to rotate the tree in every (recursive) splaying operation; rather, it suffices to perform rotations with some fixed positive probability in order to reap the asymptotic benefits of self-adjusting search trees. The theoretical analysis of randomised splay trees [1] starts by refining the cost model of [31], which simply counts the number of rotations, into one that accounts for recursive calls with a cost of c and for rotations with a cost of d. We present a snippet of a functional implementation of randomised splay trees in Figure 4. We note that in this code snippet we have set c = d = 1/2; this choice is arbitrary. We have chosen these costs in order to be able to compare the resulting amortised costs to the deterministic setting of [17], where the combined cost of the recursive call and rotation is set to 1. Our analysis requires fixed costs c and d, but these constants can be chosen by the user; for example, one can set c = 1 and d = 2.75, corresponding to the costs observed during the experiments in [1]. Likewise, the probability of the coin toss has been arbitrarily set to p = 1/2 but could be set differently by the user. (We remark that, to the best of our knowledge, no theoretical analysis has been conducted for such alternative choices of these parameters.) Our analysis is able to fully automatically infer an amortised complexity bound of 9/8·log2(|t|) for splay (with c, d and p fixed as above), which improves on the complexity bound of 3/2·log2(|t|) for the deterministic version of splay as reported in [17], confirming that randomisation indeed improves the expected runtime. We remark on how the amortised complexity bound of 9/8·log2
(|t|) for splay is computed by our analysis. Our tool ATLAS computes an annotated type for splay that corresponds to the inequality 3/4·rk(t) + 9/8·log2(|t|) + 3/4 ⩾ c_splay(t) + 3/4·rk(splay t) + 3/4. By setting φ(t) := 3/4·rk(t) + 3/4 as potential function in the sense of Tarjan and Sleator [31,33], the above inequality allows us to directly read off an upper bound on the amortised complexity a_splay(t) of splay (we recall that the amortised complexity in the sense of Tarjan and Sleator is defined as the sum of the actual costs plus the output potential minus the input potential): a_splay(t) = c_splay(t) + φ(splay t) − φ(t) ⩽ 9/8·log2(|t|).

Probabilistic Analysis of Binary Search Trees. We present a probabilistic analysis of a deterministic binary search tree, which offers the usual contains, insert, and delete operations, where delete uses delete_max, given in Figure 6, as a subroutine (the source code of the missing operations is given in the Appendix). We assume that the elements inserted, deleted and searched for are equally distributed; hence, we conduct a probabilistic analysis by replacing every comparison with a coin toss of probability one half. We will refer to the resulting data structure as Coin Search Tree in our benchmarks. The source code of insert is given in Figure 5. Our tool ATLAS infers a logarithmic expected amortised cost for all operations; for insert and delete_max, the inferred annotated types yield an expected amortised cost of 1/2·log2(|t|) for both functions.
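The coin-toss variant of insert can be sketched as follows (a hypothetical Python rendering of the insert function of Figure 5; each comparison a < d is replaced by a fair coin, and ticks are returned as an explicit cost):

```python
import random

LEAF = None  # a leaf; internal nodes are (left, label, right) triples
def node(l, a, r): return (l, a, r)

def internal_nodes(t):
    """Number of internal nodes, i.e. stored elements."""
    if t is LEAF:
        return 0
    l, _, r = t
    return internal_nodes(l) + 1 + internal_nodes(r)

def insert(d, t):
    """Insert d into t, flipping a fair coin in place of each
    comparison; returns the new tree together with the tick count
    (one tick per recursive call)."""
    if t is LEAF:
        return node(LEAF, d, LEAF), 0
    l, a, r = t
    if random.random() < 0.5:        # models the comparison "a < d"
        nl, c = insert(d, l)
        return node(nl, a, r), c + 1
    else:
        nr, c = insert(d, r)
        return node(l, a, nr), c + 1
```

Under this model the inserted keys take a uniformly random root-to-leaf path, which is what makes the expected amortised cost logarithmic.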

Probabilistic Functional Language
Preliminaries. Let R⁺₀ denote the non-negative reals and R⁺∞₀ their extension by ∞. We are only concerned with discrete distributions and drop "discrete" in the following. Let A be a countable set and let D(A) denote the set of (sub)distributions µ over A, whose support supp(µ) := {a ∈ A | µ(a) ≠ 0} is countable. Distributions are denoted by Greek letters. For µ ∈ D(A), we may write µ = {a_i^{p_i}}_{i∈I}, assigning probabilities p_i to a_i ∈ A for every i ∈ I, where I is a suitably chosen index set. We set |µ| := Σ_{i∈I} p_i. If the support is finite, we simply write µ = {a_1^{p_1}, ..., a_n^{p_n}}. The expected value of a function f : A → R⁺∞₀ wrt. µ is E_µ(f) := Σ_{i∈I} p_i · f(a_i).

Fig. 7: A Core Probabilistic (First-Order) Programming Language

Syntax. In Figure 7, we detail the syntax of our core probabilistic (first-order) programming language. With the exception of ticks, expressions are given in let-normal form to simplify the presentation of the operational semantics and the typing rules. In order to ease readability, we make use of mild syntactic sugaring in the presentation of actual code (as we already did above).
To make the presentation more succinct, we assume only the following types: a set of base types B, such as Booleans Bool = {true, false}, integers Int, or rationals Rat; product types; and binary trees T, whose internal nodes are labelled with elements b : B, where B denotes an arbitrary base type. Values are either of base types, trees or pairs of values. We use lower-case Greek letters (from the beginning of the alphabet) for the denotation of types. Elements t : T are defined by the following grammar, which fixes notation: t ::= leaf | node t1 b t2. The size of a tree is the number of leaves: |leaf| := 1, |node t a u| := |t| + |u|.
We skip the standard definition of integer constants n ∈ Z as well as variable declarations, cf. [27]. Furthermore, we omit binary operators, with the exception of essential comparisons. As mentioned, to represent sampling we make use of a dedicated if-then-else expression, whose guard evaluates to true depending on a coin toss with fixed probability. Non-deterministic choice is similarly rendered via an if-then-else expression. Moreover, we make use of ticking, denoted by an operator •^{✓a/b} to annotate costs, where a, b are optional and default to one. Following Avanzini et al. [2], we represent ticking •^✓ as an operation, rather than in let-normal form as in [34]. This allows us to define a big-step semantics that only accumulates the cost of terminating expressions. The set of all expressions is denoted E.
A typing context is a mapping from variables V to types. Type contexts are denoted by upper-case Greek letters, and the empty context is denoted ε. A program P consists of a signature F together with a set of function definitions of the form f(x1, ..., xn) = e. When considering some expression e that includes function calls, we will always assume that these function calls are defined by some program P. A substitution (or environment) σ is a mapping from variables to values that respects types. Substitutions are denoted as sets of assignments: σ = {x1 ↦ v1, ..., xn ↦ vn}. We write dom(σ) to denote the domain of σ.

Operational Semantics
Small-Step Semantics. The small-step semantics is formalised as a (weighted) non-deterministic, probabilistic abstract reduction system [4,8] over M(E). In this way, (expected) cost, non-determinism and probabilistic sampling are taken care of. Informally, a probabilistic abstract reduction system is a transition system whose reducts are chosen from a probability distribution. A reduction wrt. such a system is then given by a stochastic process [8], or equivalently, as a reduction relation over multidistributions [4], which arise naturally in the context of non-determinism (we refer the reader to [4] for an example that illustrates the advantage of multidistributions in the presence of non-determinism). More precisely, multidistributions are countable multisets {a_i^{p_i}}_{i∈I} of pairs p_i : a_i of probabilities 0 < p_i ⩽ 1 and objects a_i ∈ A with Σ_{i∈I} p_i ⩽ 1. (For ease of presentation, we do not distinguish notationally between sets and multisets.) Multidistributions over objects A are denoted by M(A). For a multidistribution µ ∈ M(A), the induced distribution in D(A) is defined in the obvious way by summing up the probabilities of equal objects.
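This collapse from multidistributions to distributions can be made concrete (an illustrative Python fragment of our own; multidistributions are represented as lists of (probability, object) pairs, which directly permits the duplicate entries a multiset allows):

```python
from collections import defaultdict

def induced(mu):
    """Collapse a multidistribution (a multiset of weighted objects,
    given as a list of (p, a) pairs with total mass <= 1) into the
    induced subdistribution by summing probabilities of equal objects."""
    d = defaultdict(float)
    for p, a in mu:
        d[a] += p
    return dict(d)

def mass(mu):
    """|mu| = total probability mass of the multidistribution."""
    return sum(p for p, _ in mu)
```

Note that duplicates in the multidistribution, e.g. two copies of the same reduct reached along different coin tosses, merge into a single point of the induced distribution.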
Following [5], we equip transitions with (positive) weights, amounting to the cost of the transition. Formally, a (weighted) Probabilistic Abstract Reduction System (PARS) on a countable set A is a ternary relation · →_· · ⊆ A × R⁺₀ × D(A). An evaluation e →^∞_c µ consists of a reduction sequence µ₀ →_{c₀} µ₁ →_{c₁} µ₂ →_{c₂} ... starting from e, with total cost c := Σ_{n⩾0} c_n and µ = lim_{n→∞} µ_n↾V, where µ_n↾V denotes the restriction of the distribution induced by the multidistribution µ_n to a (sub-)distribution over values. Note that the µ_n↾V form a CPO wrt. the pointwise ordering, cf. [36]. Hence, the fixed point µ = lim_{n→∞} µ_n↾V exists. We also write e →^∞ µ in case the cost of the evaluation is not important.
(Positive) Almost Sure Termination. A program P is almost surely terminating (AST) if for any substitution σ and any evaluation eσ →^∞ µ, we have that µ forms a full distribution. For the definition of positive almost sure termination we assume that every statement of P is enclosed in a ticking operation with cost one; we note that such a cost models the length of the computation. We say P is positively almost surely terminating (PAST) if for any substitution σ and any evaluation eσ →^∞_c µ, we have c < ∞. It is well known that PAST implies AST, cf. [8].
Big-Step Semantics. We now define the aforementioned big-step semantics. We first define approximate judgements σ ⊢ⁿ_c e ⇒ µ, see Figure 10, which say that in derivation trees of depth up to n the expression e evaluates to a subdistribution µ over values with cost c. We now consider the cost cₙ and subdistribution µₙ in σ ⊢ⁿ_{cₙ} e ⇒ µₙ for n → ∞. Note that the subdistributions µₙ form a CPO wrt. the pointwise ordering, cf. [36]. Hence, there exists a fixed point µ = lim_{n→∞} µₙ. Moreover, we set c = lim_{n→∞} cₙ (note that either cₙ converges to some real c ∈ R⁺∞₀ or we have c = ∞). The big-step judgements σ ⊢_c e ⇒ µ are then defined as these limits.

Type-and-Effect System for Expected Cost Analysis

Resource Functions
In Section 2, we introduced a variant of Schoenmakers' potential function, denoted as rk(t), and the additional potential functions p_(a1,...,an,b), denoting log2 of a linear combination of tree sizes. We demand non-negative coefficients for well-definedness of the latter; log2 denotes the logarithm to base 2. Throughout the paper we stipulate log2(0) := 0 in order to avoid case distinctions. Note that the constant function 1 is representable: 1 = λt. log2(0·|t| + 2) = p_(0,2). We are now ready to state the resource annotation of a sequence of trees.

Definition 1. A resource annotation, or simply annotation, of length m is a sequence Q = [q_1, ..., q_m] ∪ [(q_(a1,...,am,b))_{a_i,b∈N}], vanishing almost everywhere. The length of Q is denoted |Q|. The empty annotation, that is, the annotation where all coefficients are set to zero, is denoted as ∅. Let t_1, ..., t_m be a sequence of trees. Then, the potential of t_1, ..., t_m wrt. Q is given by

Φ(t_1, ..., t_m | Q) := Σ_{i=1}^{m} q_i·rk(t_i) + Σ_{a_1,...,a_m,b∈N} q_(a1,...,am,b)·p_(a1,...,am,b)(t_1, ..., t_m).

In case of an annotation of length 1, we sometimes write q* instead of q_1. We may also write Φ(v : α|Q) for the potential of a value of type α annotated with Q. Both notations were already used above. Note that only values of tree type are assigned a potential. We use the convention that the sequence elements of resource annotations are denoted by the lower-case letter of the annotation, potentially with corresponding sub- or superscripts.
Example 1. Let t be a tree. To model its potential as log2(|t|) according to Definition 1, we simply set q(1,0) := 1 and thus obtain Φ(t|Q) = log2(|t|), which describes the potential associated to the input tree t of our leading example descend above.
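These definitions translate directly into code. The following sketch (illustrative Python, with helper names of our own choosing) computes rk, p_(a,b) and Φ for annotations over a single tree:

```python
import math

LEAF = None  # a leaf; internal nodes are (left, label, right) triples
def node(l, a, r): return (l, a, r)

def size(t):
    """|t| = number of leaves of t."""
    if t is LEAF:
        return 1
    l, _, r = t
    return size(l) + size(r)

def log2_(x):
    """log2 with the paper's convention log2(0) := 0."""
    return 0.0 if x <= 0 else math.log2(x)

def rk(t):
    """Variant of Schoenmakers' potential:
    rk(leaf) = 1, rk(node l a r) = rk(l) + log2|l| + log2|r| + rk(r)."""
    if t is LEAF:
        return 1.0
    l, _, r = t
    return rk(l) + log2_(size(l)) + log2_(size(r)) + rk(r)

def p_ab(a, b, t):
    """p_(a,b)(t) = log2(a*|t| + b)."""
    return log2_(a * size(t) + b)

def phi(t, q_star, q):
    """Phi(t|Q) = q_star*rk(t) + sum of q_(a,b)*p_(a,b)(t), where the
    annotation Q is given as the scalar q_star plus a dict {(a,b): coeff}."""
    return q_star * rk(t) + sum(c * p_ab(a, b, t) for (a, b), c in q.items())
```

With the annotation of Example 1 (q(1,0) = 1, all other coefficients zero), phi returns exactly log2(|t|).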
Let σ be a substitution, let Γ denote a typing context and let x_1 : T, ..., x_n : T denote all tree-typed variables in Γ. A resource annotation for Γ, or simply annotation, is an annotation for the sequence of trees x_1σ, ..., x_nσ. We define the potential of the annotated context Γ|Q wrt. a substitution σ as Φ(σ; Γ|Q) := Φ(x_1σ, ..., x_nσ | Q).
Fig. 12: Conditional expression that models tossing a coin.
Definition 2. An annotated signature F maps functions f to sets of pairs of an annotated type for the arguments and an annotated type for the result. We suppose f takes n arguments, of which m are trees; m ⩽ n by definition.
Similarly, the return type may be a product type. In this case, we demand that at most one of its components β_i is a tree type. Instead of the set notation, we sometimes succinctly write f : α|Q → β|Q′, where α, β denote the respective product types. It is tacitly understood that the above syntactic restrictions on the lengths of the annotations Q, Q′ are fulfilled. For every function f, we also consider its cost-free variant, from which all ticks have been removed. We collect the cost-free signatures of all functions in the set F_cf.
Example 2. Consider the function descend depicted in Figure 2. Its signature is formally represented as T|Q → T|Q′, where Q = [q*, q(1,0), q(0,2)] and Q′ = [q′*, q′(1,0), q′(0,2)]. We leave it to the reader to specify the coefficients in Q, Q′ so that the rule (app) as depicted in Section 2 can indeed be employed to type the recursive call of descend.

Typing Rules
The non-probabilistic part of the type system is given in Figs. B.1 and B.2. In contrast to the type system employed in [13,17], the cost model is not fixed but controlled by the ticking operator. Hence, the corresponding application rule (app) has been adapted. Costing of evaluation is now handled by a dedicated ticking operator, cf. Figure 11. In Figure 12, we give the rule (ite : coin) responsible for typing probabilistic conditionals. We remark that in the core type system, that is, the type system given by Figs. B.1 and B.2, an expression e is irreducible (e ↛) iff e is a value.

Soundness Theorems
A program P is called well-typed if for any definition f(x_1, ..., x_n) = e ∈ P and any annotated signature f : α|Q → β|Q′ ∈ F(f), we have x_1 : α_1, ..., x_n : α_n |Q ⊢ e : β|Q′.

Corollary 1. Let P be a well-typed program such that ticking accounts for all evaluation steps. Suppose Γ|Q ⊢ e : α|Q′. Then e is positive almost surely terminating (and thus in particular almost surely terminating).
Theorem 3 (Soundness Theorem for (tick : defer)). Let P be well-typed. Suppose Γ|Q ⊢ e : α|Q′ and σ ⊢_c e ⇒ µ. Then, we have Φ(σ; Γ|Q) ⩾ c + E_µ(λv. Φ(v|Q′)).

We comment on the trade-offs between Theorems 2 and 3. As stated in Corollary 1, the benefit of Theorem 2 is that when every recursive call is accounted for by a tick, a type derivation implies the termination of the program under analysis. The same does not hold for Theorem 3. However, Theorem 3 allows one to type more programs than Theorem 2, because the (tick : defer) rule is more permissive than (tick : now). This proves very useful in case termination is not required (or can be established by other means).
We exemplify this difference on the foo function, see Figure 13. Theorem 3 supports the derivation of the type rk(t) + log_2(|t|) + 1 ⩾ rk(foo t) + 1, while Theorem 2 does not. This is due to the fact that potential can be "borrowed" with Theorem 3. To wit, from the potential rk(t) + log_2(|t|) + 1 for foo one can derive the potential rk(l′) + rk(r′) for the intermediate context after both let-expressions (note there is no +1 in this context, because the +1 has been used to pay for the ticks around the recursive calls). Afterwards one can restore the +1 by weakening rk(l′) + rk(r′) to rk(foo t) + 1 (using in addition that rk(t) ⩾ 1 for all trees t). On the other hand, we cannot "borrow" with Theorem 2, because the rule (tick : now) forces us to pay the +1 for the recursive call immediately (but there is not enough potential to pay for this). In the same way, the application of the rule (tick : defer) and Theorem 3 is essential to establish the logarithmic amortised cost of randomised splay trees. (We note that the termination of foo as well as of splay is easy to establish by other means: it suffices to observe that recursive calls are on sub-trees of the input tree.)

Table 2: Coefficients q such that q · log_2(|t|) is a bound on the expected amortised complexity of splay, depending on the probability p of a rotation and the cost c of a recursive call, where the cost of a rotation is 1 − c. Coefficients are additionally presented in decimal representation to ease comparison.
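The bookkeeping behind this "borrowing" can be written out as a chain of inequalities (our own summary of the argument above, assuming foo t returns a node built from the results l′, r′ of the two ticked recursive calls):

```latex
\[
  \underbrace{rk(t) + \log_2(|t|) + 1}_{\text{initial potential}}
  \;\geq\; \underbrace{rk(l') + rk(r')}_{\text{after both lets}}
  \;\geq\; \underbrace{rk(\mathit{foo}\;t) + 1}_{\text{after weakening}}
\]
```

The first step spends the +1 (together with the log_2(|t|) part) on the ticks of the recursive calls; the second step is the weakening, which restores the +1 using rk(t) ⩾ 1.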

Implementation and Evaluation
Implementation. Our prototype ATLAS is an extension of the tool described in [17]. In particular, we rely on the preprocessing steps and the implementation of the weakening rule as reported in [17] (which makes use of Farkas' Lemma in conjunction with selected mathematical facts about the logarithm). We only use the fully-automated mode reported in [17]. We have adapted the generation of the constraint system to the rules presented in this paper. We rely on Z3 [24] for solving the generated constraints. We use the optimisation heuristics of [17] for steering the solver towards solutions that minimise the resulting expected amortised complexity of the function under analysis.
Evaluation. We present results for the benchmarks described in Section 2 (plus a randomised version of splay heaps; the source code can be found in the Appendix) in Table 1. Table 3 details the computation time of our evaluations. To the best of our knowledge, this is the first time that an expected amortised cost could be inferred for these data structures. By comparing the costs of the operations of randomised splay trees and heaps to the costs of their deterministic versions (see Table 1), one can see that the randomised variants have equal or lower complexity in all cases (as noted in Section 2, we have set the costs of the recursive call and the rotation to 1/2, such that in the deterministic case, which corresponds to a coin toss with p = 1, these costs always add up to one). Clearly, setting the cost of the recursion to the same value as the cost of the rotation need not reflect the relation of the actual costs. A more accurate estimation of the relation of these two costs will likely require careful experimentation with data structure implementations, which we consider orthogonal to our work. Instead, we report that our analysis is readily adapted to different costs and different coin-toss probabilities. We present an evaluation for different values of p, recursion cost c and rotation cost 1 − c in Table 2. The memory usage according to Z3's "max memory" statistic was 7129 MiB per instance. The total runtime was 1 h 45 min, with an average of 11 min 39 s and a median of 2 min 33 s. Two instances took longer (36 min and 49 min).
Deterministic benchmarks. For comparison, we have also evaluated our tool ATLAS on the benchmarks of [17]. All results could be reproduced by our implementation. In fact, for the function SplayHeap.insert it yields an improvement of 1/4 · log_2(|h|). Moreover, we have generalised the resource functions p_(a_1,...,a_n,b) to also allow negative values for b (under the condition that Σ_i a_i + b ⩾ 1), and our generalised (let : tree) rule can take advantage of these generalised resource functions (see Fig. B.1 for a statement of the rule, and the proof of its soundness as part of the proof of Theorem 3).

Conclusion
In this paper, we present the first fully-automated expected amortised cost analysis of self-adjusting data structures, that is, of randomised splay trees, randomised splay heaps and randomised meldable heaps, which so far have only been (semi-)manually analysed in the literature. In future work, we envision extending our analysis to related probabilistic settings such as skip lists [28] and randomised binary search trees [18]. We note that adapting the framework developed in this paper to new benchmarks will likely require identifying new potential functions and extending the type-and-effect system with typing rules for these potential functions. Further, on more theoretical grounds, we want to clarify the connection of the expected amortised cost analysis proposed here with Kaminski's ert-calculus, cf. [14], and study whether the expected cost transformer is conceivable as a potential function.

A Benchmark: Probabilistic Analysis of Binary Search Trees
We present a probabilistic analysis of a deterministic binary search tree, which offers the usual contains, insert, and delete operations, where delete uses delete_max as a subroutine (the source code of all operations is given in Fig. C.4). We assume that the elements inserted, deleted and searched for are uniformly distributed; hence, we conduct a probabilistic analysis by replacing every comparison with a coin toss of probability one half. We will refer to the resulting data structure as Coin Search Tree in our benchmarks. Our tool ATLAS infers a logarithmic expected amortised cost for all operations; e.g., for insert and delete_max we obtain an expected amortised cost of 1/2 · log_2(|t|) for both functions.

Let Γ be a variable context, Q, Q′ annotations and let e be an expression. The typing rule (let : tree) makes use of the cost-free typing judgement Γ|Q ⊢_cf^ndt e : α|Q′ that differs from the standard cost-free typing relation Γ|Q ⊢ e : α|Q′ insofar as all probabilistic choices in e are replaced by non-deterministic choices. We call the expression e′ obtained from e through this adaptation the non-deterministic version of e.
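The coin-toss cost model of the Coin Search Tree described above can be illustrated executably. The following Python sketch (our own illustration; the actual benchmark code is given in Fig. C.4, and the function name insert_coin is ours) replaces every comparison in BST insertion by a fair coin toss and counts one cost unit per recursive call:

```python
import random

def insert_coin(t, x, rng):
    # BST insert where every comparison "x < d" is replaced by a fair
    # coin toss; returns the new tree and the number of recursive calls.
    if t is None:
        return (None, x, None), 0
    l, d, r = t
    if rng.random() < 0.5:                 # coin toss instead of "x < d"
        l2, cost = insert_coin(l, x, rng)
        return (l2, d, r), cost + 1
    r2, cost = insert_coin(r, x, rng)
    return (l, d, r2), cost + 1

rng = random.Random(42)
t, total = None, 0
for i in range(1023):                      # build a 1023-node tree
    t, c = insert_coin(t, i, rng)
    total += c
avg = total / 1023                         # average cost per insertion
```

Random descent directions keep the tree balanced in expectation, so the measured average cost grows logarithmically in the tree size; the 1/2 · log_2(|t|) bound inferred by ATLAS refers to the paper's tick-based cost model rather than to this raw call count.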

B.2 Soundness Theorems
The proof of the soundness theorems makes use of the following lemma, whose proof can be found in [13].
Lemma 2. Assume Σ_i q_i log_2(a_i) ⩾ q log_2(b) for some rational numbers a_i, b > 0 and Σ_i q_i ⩾ q. Then, Σ_i q_i log_2(a_i + c) ⩾ q log_2(b + c) for all c ⩾ 1.
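Reading the premises as Σ_i q_i log_2(a_i) ⩾ q log_2(b) and Σ_i q_i ⩾ q, the lemma can be spot-checked numerically. The following Python harness (our own, not from [13]) samples instances in which both premises hold by construction and checks the conclusion:

```python
import math
import random

rng = random.Random(0)
for _ in range(1000):
    qs = [rng.uniform(0.0, 5.0) for _ in range(3)]
    a = [rng.uniform(0.1, 10.0) for _ in range(3)]
    q = sum(qs) * rng.uniform(0.1, 1.0)     # ensures sum(qs) >= q
    # choose b so that sum_i q_i*log2(a_i) >= q*log2(b) holds with slack
    b = 2 ** (sum(qi * math.log2(ai) for qi, ai in zip(qs, a)) / q) \
        * rng.uniform(0.5, 1.0)
    c = rng.uniform(1.0, 10.0)
    lhs = sum(qi * math.log2(ai + c) for qi, ai in zip(qs, a))
    rhs = q * math.log2(b + c)
    assert lhs >= rhs - 1e-9                # the conclusion of Lemma 2
```

No violation is found, as expected: the conclusion follows from superadditivity of the weighted geometric mean together with c ⩾ 1.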
Theorem 2 (Soundness Theorem for (tick : now)). Let P be well-typed. Suppose Γ|Q ⊢ e : α|Q′ and eσ −→^n_c µ. Then, Φ(σ; Γ|Q) ⩾ c + E_µ(λv. Φ(v|Q′)).

Proof. We first deal with the case that Π ends in a structural rule, cf. Figure B.2.

Case. Suppose the last rule in Π is of the following form.

Case. Let Π end in the following weakening rule application (w).
By the SIH, we have the corresponding inequality for the premise. The claim then follows from the assumption of the (w) rule.

Case. (share) and (w : var) can be dealt with in the same way; we refer the reader to [13] for the details.
We now assume that Π ends in a syntax-directed rule, cf. Figure B.1, and proceed by a case distinction on eσ, respectively on the first step of eσ −→^n_c µ.

Case. First we assume eσ is a value. By definition of −→ we have µ = {v} and c = 0. There are several subcases to consider, e.g. eσ = node t b u, eσ = leaf, eσ = (e_1, e_2), etc. For these cases we can essentially proceed as in the non-probabilistic setting, cf. [13].
for at most one i, α_i = T, and q_i = q′_*, q_(a,c) = q′_(a,c).
By definition and the constraints incorporated in (pair), we obtain the claim.

Case. Consider a match expression and suppose further xσ = node u b v. Because Π ends with a syntax-directed rule, Π must in fact end with an application of the (match) rule.
By definition and the constraints given in the rule, we obtain the required inequality. By the MIH, we have the corresponding claim for the extended context Γ, x_1 : T, x_2 : B, x_3 : T, from which the case follows directly.
Case. Consider e = let x = e_1 in e_2.
In order to prove the claim for eσ −→^n_c µ, we need to split the n-step derivation into an n_1-step and an n_2-step derivation for e_1 and e_2 with n_1 + n_2 + 1 = n, where the one extra step accounts for substituting the value to which e_1 has evaluated into e_2.
However, we cannot consider only one such split, because evaluating e_1 to a normal form will in general take a different number of steps depending on the probabilistic choices encountered in the derivation. Hence, we consider all possible splits.
For this, we consider e_1σ −→^i_{c_i} ν_i for all 0 ⩽ i ⩽ n. We recall that the ν_i are pointwise ordered on values, i.e. we have ν_i↾V ⩽ ν_j↾V for i ⩽ j. Hence, we can define ξ_i := ν_i↾V − ν_{i−1}↾V for all 0 < i ⩽ n. Note that for w_i^{p_i} ∈ ξ_i, the probability that e_1σ evaluates to the value w_i in i steps is exactly p_i.
Let w be some value to which e_1σ has evaluated in i steps. We then note that let x = w in e_2 −→^1_0 {e_2[x → w]}. Thus, we can apply the MIH to e_2σ[x → w] and obtain the corresponding claim ( ‡) for suitably defined distributions µ_{w,i} and costs c_{w,i}. We now apply the SIH to e_1σ −→^n_{c_1} ν, which yields the claim ( †). By the definition of the ξ_i we have ν↾V = Σ_{i=1}^n ξ_i. We then consider eσ −→^n_c µ. We observe that µ↾V = Σ_{i=1}^n Σ_{w^{p_i} ∈ ξ_i} p_i · µ_{w,i}↾V and c = c_1 + Σ_{i=1}^n Σ_{w^{p_i} ∈ ξ_i} p_i · c_{w,i} for distributions µ_{w,i} and costs c_{w,i} defined as above. Further, we will establish a key inequality (⋆) below. We finally calculate, using ( †), ( ‡) and (⋆), the desired inequality; in the first step we employ property (⋆) together with an observation implied by ( †). It remains to establish (⋆). For this, we proceed by a case distinction on whether e_1 is of tree type, i.e., whether Π ends in an application of the (let : tree)- or of the (let : base)-rule. We treat the simpler case first and consider that e_1 is not of tree type. Then Π ends in an application of the (let : base)-rule.
We note that (⋆) follows directly from the constraints in the (let : base) rule together with the fact that ν is a (sub-)distribution.
Finally, we suppose that e_1 is of tree type. Then the type derivation Π ends in an application of the (let : tree)-rule, where we elide all arithmetic constraints for readability. By definition and due to the constraints expressed in the typing rule, we obtain an inequality for each w ∈ supp(ν↾V). Because this inequality holds for any w ∈ supp(ν↾V), we can deduce a bound of the form Σ r_(b,d,e) · log_2(b|u| + d|w| + e), using that ν is a (sub-)distribution, i.e., that Σ_{w∈supp(ν↾V)} ν(w) ⩽ 1. We now note that (⋆) follows directly from the above inequality and the constraints in (let : tree), together with the fact that ν is a (sub-)distribution.
Case. Let e be a ticking statement and let the last rule in Π be (tick : now).

Case. Let e be a probabilistic branching statement, that is, eσ = if coin a/b then e_1 else e_2, and let the last rule in Π be (ite : coin). By definition, {eσ} evolves into the weighted combination of the two branches, from which we conclude the case.
Case. We consider the application rules (app) and (app : cf), and restrict our argument to the former, as the proof for the cost-free variant is similar but simpler. We consider the costed typing judgement. Let f(x_1, ..., x_k) = e ∈ P. As P is well-typed, we have Γ|P ⊢ e : β|P′ and Γ|Q ⊢_cf e : β|Q′ by assumption. Further, the claim follows by definition of {eσ}.

Theorem 3 (Soundness Theorem for (tick : defer)). Let P be well-typed. Suppose Γ|Q ⊢ e : α|Q′ and σ ⊢_c e ⇒ µ. Then, we have Φ(σ; Γ|Q) ⩾ c + E_µ(λv. Φ(v|Q′)).

Proof. The setup (and most of the cases) of this proof follow the proof of Theorem 2: it suffices to prove, for every n ⩾ 0, that σ ⊢_c e ⇒_n µ implies Φ(σ; Γ|Q) ⩾ c + E_µ(λv. Φ(v|Q′)). We proceed by main induction on n, which we will call the main induction hypothesis (MIH), and by side induction on the length of the type derivation Π of Γ|Q ⊢ e : α|Q′, which we will call the side induction hypothesis (SIH).
For the majority of the cases, the arguments can easily be adapted from those employed in the proof of Theorem 2. Thus, we only consider a restricted set of cases that may be of independent interest.
We now consider σ ⊢_c e ⇒_{n+1} µ for some n ⩾ 0 and the type derivation Π of Γ|Q ⊢ e : α|Q′. The cases where Π ends in a structural rule, cf. Figure B.2, can be dealt with in the same way as in the proof of Theorem 2.
We now assume that Π ends in a syntax-directed rule, cf. Figure B.1.

Case. Consider e = let x = e_1 in e_2. Let w be some value. We apply the MIH to σ[x → w] ⊢_{c_w} e_2 ⇒_n µ_w for suitably defined distributions µ_w and costs c_w, which yields the claim ( ‡). We further apply the MIH to σ ⊢_{c_1} e_1 ⇒_n ν, i.e. we obtain the claim ( †). Further, we will establish (⋆) below. We finally calculate, using ( †), ( ‡) and (⋆), the desired inequality, where we use for the last equality that µ = Σ_{w∈supp(ν)} ν(w) · µ_w and c = c_1 + Σ_{w∈supp(ν)} ν(w) · c_w according to the definition of the big-step semantics.
We finally note that (⋆) can be established in the same way as in the proof of Theorem 2 (for both the (let : base)- and the (let : tree)-rule cases).
Case. Let e be a probabilistic branching statement, that is, eσ = if coin a/b then e_1 else e_2, and let the last rule in Π be (ite : coin). By definition, there exist distributions µ_1 and µ_2 such that σ ⊢_{c_1} e_1 ⇒_n µ_1 and σ ⊢_{c_2} e_2 ⇒_n µ_2. By the MIH, we obtain Φ(σ; Γ|Q) ⩾ c + E_µ(λv. Φ(v|Q′)) for the two branches, from which we conclude the case. Here, we exploit the definition of Q′ − a/b and the definition of |µ| in the second-to-last line, and the definition of expectations in the last line.

C Function Definitions
Below, we use a notation for ticks that is easier to type with standard keyboard layouts, i.e. the tilde symbol followed by the cost and the subexpression, ∼a/b e, instead of a tick mark with the cost in the superscript, e^{✓a/b}.

Fig. 1: descend function

(ii) rk(node l d r) := rk(l) + log_2(|l|) + log_2(|r|) + rk(r), where l, r are the left resp. right child of the tree node l d r, and d is some data element that is ignored by the resource function. (We note that rk(t) is not needed for the analysis of descend, but is needed for more involved benchmarks, e.g. randomised splay trees.) With these resource functions at hand, our analysis introduces the coefficients q_*, q_(1,0), q_(0,2), q′_*, q′_(1,0), q′_(0,2) and employs the following Ansatz:
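Independently of the Ansatz, the expected cost of descend satisfies the recurrence E[leaf] = 0 and E[node l d r] = 1 + (E[l] + E[r])/2, assuming descend ticks cost 1 per call and recurses on a uniformly chosen child. The following Python sketch (our own check; data fields are omitted, so a node is just a pair (l, r), and |t| counts leaves) verifies the bound E[cost] ⩽ log_2(|t|) on random trees:

```python
import math
import random

def leaves(t):
    # |t|: number of leaves; a leaf is None, a node is a pair (l, r).
    return 1 if t is None else leaves(t[0]) + leaves(t[1])

def expected_cost(t):
    # descend ticks once per call and recurses left or right with
    # probability 1/2 each:  E[leaf] = 0,  E[node l r] = 1 + (E[l]+E[r])/2
    if t is None:
        return 0.0
    return 1.0 + (expected_cost(t[0]) + expected_cost(t[1])) / 2

def random_tree(rng, n):
    # a random binary tree with n leaves
    if n == 1:
        return None
    k = rng.randint(1, n - 1)
    return (random_tree(rng, k), random_tree(rng, n - k))

rng = random.Random(1)
for _ in range(100):
    t = random_tree(rng, rng.randint(1, 64))
    assert expected_cost(t) <= math.log2(leaves(t)) + 1e-9
```

The bound is an instance of the entropy inequality: a random descent reaches the leaf at depth d with probability 2^{−d}, and the entropy Σ 2^{−d}·d of this distribution is at most log_2 of the number of leaves.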

Fig. 6: delete_max function of a Coin Search Tree with one rotation

(… how to choose the best value of p for given costs c and d.) Our analysis is able to fully automatically infer an amortised complexity bound of 9/8 · log_2(|t|) for splay (with c, d and p fixed as above), which improves on the complexity bound of 3/2 · log_2(|t|) for the deterministic version of splay as reported in [17], confirming that randomisation indeed improves the expected runtime. We remark on how the amortised complexity bound of 9/8 · log_2(|t|) for splay is computed by our analysis. Our tool ATLAS computes an annotated type for splay that corresponds to the inequality 3/4 · rk(t) + 9/8 · log_2(|t|) + 3/4 ⩾ c_splay(t) + 3/4 · rk(splay t) + 3/4. By setting ϕ(t) := 3/4 · rk(t) as potential function in the sense of Tarjan and Sleator [31,33], the above inequality allows us to directly read off an upper bound on the amortised complexity a_splay(t) of splay (we recall that the amortised complexity in the sense of Tarjan and Sleator is defined as the actual cost plus the output potential minus the input potential): a_splay(t) = c_splay(t) + ϕ(splay t) − ϕ(t) ⩽ 9/8 · log_2(|t|).
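For completeness, we recall the telescoping argument that makes this "read off" sound: for an operation sequence t_0, t_1, ..., t_n with actual costs c_i and amortised costs a_i := c_i + ϕ(t_i) − ϕ(t_{i−1}), summing gives

```latex
\[
  \sum_{i=1}^{n} c_i \;=\; \sum_{i=1}^{n} a_i \;+\; \phi(t_0) \;-\; \phi(t_n)
  \;\leq\; \phi(t_0) \;+\; \sum_{i=1}^{n} a_i ,
\]
```

provided ϕ ⩾ 0, so the total actual cost of any operation sequence is bounded by the initial potential plus the sum of the amortised bounds.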

Fig. 8: One-Step Reduction Rules

A program P consists of function definitions of the form f x_1 ... x_n = e_f, where the x_i are variables and e_f is an expression. When considering some expression e that includes function calls, we will always assume that these function calls are defined by some program P. A substitution (or environment) σ is a mapping from variables to values that respects types. Substitutions are denoted as sets of assignments: σ = {x_1 → t_1, ..., x_n → t_n}. We write dom(σ) to denote the domain of σ.
For a ∈ A, a rule a →_c {b^{µ(b)}}_{b∈A} indicates that a reduces to b with probability µ(b) and cost c ∈ R⁺₀. Note that every right-hand side of a PARS is supposed to be a full distribution, i.e. the probabilities in µ sum up to 1. Given objects a and b, a rule a →_c {b¹} will be written a →_c b for brevity.
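As a concrete instance of these definitions (our own sketch, not from the paper), consider the geometric PARS a →_1 {a^{1/2}, b^{1/2}}, where b is a normal form: each step costs 1 and the expected number of steps is Σ_{k⩾1} k · 2^{−k} = 2. The expected cost can be approximated by unfolding the rules to a fixed depth:

```python
def expected_cost(rules, start, fuel=60):
    # rules maps an object to (cost, [(successor, probability), ...]),
    # or to None if the object is a normal form.  Unfolds the PARS for
    # `fuel` steps, giving a lower bound on the expected total cost.
    if fuel == 0 or rules(start) is None:
        return 0.0
    cost, succs = rules(start)
    return cost + sum(p * expected_cost(rules, b, fuel - 1) for b, p in succs)

# geometric PARS:  a -1-> { a^(1/2), b^(1/2) },  b is a normal form
geo = lambda o: (1.0, [("a", 0.5), ("b", 0.5)]) if o == "a" else None
print(expected_cost(geo, "a"))   # -> 2.0 (the exact expected cost is 2)
```

Truncating at depth 60 loses only 2^{−59} of the expected cost here; in general such truncations are lower bounds, which is why the paper works with sub-distributions.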

Fig. 9: Probabilistic Reduction Rules on Distributions of Expressions

We lift the one-step reduction relation → given in Figure 8 to a (non-deterministic) PARS over multidistributions. As above, we sometimes identify the Dirac distribution {e¹} with e. Evaluation contexts are formed by let expressions, as in the following grammar: C ::= □ | let x = C in e. We denote with C[e] the result of substituting the empty context □ with the expression e. Contexts are exploited to lift the one-step reduction to a ternary weighted reduction relation −→ ⊆ M(E) × R⁺∞₀ × M(E), cf. Figure 9. (In (Conv), ⊎ refers to the usual notion of multiset union.) The relation −→ constitutes the operational (small-step) semantics of our simple probabilistic functional language. Thus µ −→_c ν states that the sub-multidistribution of objects µ evolves to a sub-multidistribution of reducts ν in one step, with an expected cost of c. Note that since → is non-deterministic, so is the reduction relation −→. We now define the evaluation of an expression e ∈ E wrt. the small-step relation. σ[x → w] denotes the update of the environment σ such that σ[x → w](x) = w and the values of all other variables remain unchanged. For function application we set σ′ := {y_1 → x_1σ, ..., y_k → x_kσ}. In the rules covering match we set σ′′ := σ ⊎ {x_0 → t, x_1 → a, x_2 → u} and σ′′′ := σ ⊎ {x_0 → t, x_2 → u} for trees and tuples respectively.

B.1 Type System: Non-Probabilistic Part

The non-probabilistic and structural typing rules are given in Figures B.1 and B.2.
Here we assume f x_1 ... x_k = e ∈ P, σ a substitution respecting the signature of f, and w a value.

Table 3: Number of assertions, solving time and maximum memory usage (in mebibytes) for the combined analysis of functions per module. The number of functions and lines of code is given for comparison.