Calculating Datastructures

Abstract. Where do datastructures come from? This paper explores how to systematically derive implementations of one-sided flexible arrays from a simple reference implementation. Using the dependently typed programming language Agda, each calculation constructs an isomorphic, yet more efficient, datastructure using only a handful of laws relating types and arithmetic. Although these calculations do not generally produce novel datastructures, they do give insight into how certain datastructures arise and how different implementations are related.


1 Introduction
There is a rich field of program calculation, deriving a program systematically from its specification. In this paper, we explore a slightly different problem: showing how efficient datastructures can be derived from an inefficient reference implementation. In particular, we consider how to derive implementations of one-sided flexible arrays, which offer efficient indexing without being limited to storing only a fixed number of elements. Although we do not claim to invent new datastructures by means of our calculations, we can demystify the definitions of familiar datastructures, providing a constructive rationalization identifying the key design choices that are made.
In contrast to program calculation, relating a program and its specification, the calculation of datastructures requires relating two different types. As it turns out, we show how to calculate efficient implementations that are isomorphic to our reference implementation. These calculations rely exclusively on familiar laws of types and arithmetic. Indeed, we have formalised these calculations in the dependently typed programming language and proof assistant Agda. While we present our derivations in quite some detail, we occasionally refer to the accompanying source code for a more complete account; while not provable in the current type theory underlying Agda, we occasionally assume the axiom of functional extensionality.
After defining the interface of flexible arrays (Section 2), we will define the Peano natural numbers (Section 3), leading to the first functional reference implementation of flexible arrays (Section 4). Starting from this reference implementation, we compute an isomorphic, yet inefficient, datastructure (Section 5).
By shifting to a more efficient (binary) number representation (Section 6), we can define a similar reference implementation (Section 7). Using this second reference implementation, we once again compute an isomorphic datastructure (Section 8), but in this case several alternative choices exist (Sections 9 & 10).

2 One-sided Flexible Arrays
Consider the following interface for one-sided flexible arrays:

  N        : Set
  Array    : N → Set → Set
  lookup   : Array n elem → ({i : N | i < n} → elem)
  tabulate : ({i : N | i < n} → elem) → Array n elem
  nil      : Array 0 elem
  cons     : elem → Array n elem → Array (1 + n) elem
  head     : Array (1 + n) elem → elem
  tail     : Array (1 + n) elem → Array n elem

An array of type Array n elem stores n elements of type elem, for some natural number n. For the moment, we leave the type of natural numbers abstract.
In what follows, we explore different implementations of arrays by varying the implementation of the natural number type.
We require flexible arrays to be isomorphic to functions from some finite set of indices to the elem type. The lookup function witnesses one direction of the isomorphism, tabulate the other.
In what follows, we refer to functions with a finite domain, that is, functions of the form {i : N | i < n} → elem, as finite maps.
In contrast to traditional fixed-size arrays, one-sided flexible arrays can be extended at the front using the cons operation. Non-empty arrays can be shrunk with tail, discarding the first element. The following properties specify the interplay of indexing and the other operations that modify the size of the array.
  lookup (cons x xs) 0        ≡ x             (2a)
  lookup (cons x xs) (1 + i)  ≡ lookup xs i   (2b)
  head xs                     ≡ lookup xs 0   (2c)
  lookup (tail xs) i          ≡ lookup xs (1 + i)   (2d)

To define any implementation of this interface, we first need to settle on the implementation of the natural number type. The most obvious choice is, of course, Peano's representation.
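The laws above can be checked on a tiny executable model. The following Python sketch (our own transcription, not the paper's Agda) represents an array of size n as a function on the indices 0 .. n-1, anticipating the reference implementation of Section 4, and uses laws (2a)-(2d) directly as definitions.

```python
# A hypothetical Python model of the flexible-array interface:
# an array of size n is a function from {0, ..., n-1} to elements.

def nil():
    def a(i):
        raise IndexError("empty array has no indices")
    return a

def cons(x, xs):
    # Laws (2a) and (2b) serve directly as the definition.
    return lambda i: x if i == 0 else xs(i - 1)

def head(xs):        # law (2c)
    return xs(0)

def tail(xs):        # law (2d)
    return lambda i: xs(i + 1)

def lookup(xs, i):   # arrays *are* finite maps in this model
    return xs(i)

xs = cons(1, cons(2, nil()))
assert lookup(cons(0, xs), 0) == 0                   # (2a)
assert lookup(cons(0, xs), 1 + 0) == lookup(xs, 0)   # (2b)
assert head(xs) == lookup(xs, 0)                     # (2c)
assert lookup(tail(xs), 0) == lookup(xs, 1 + 0)      # (2d)
```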

3 Peano Numerals
To calculate an implementation of flexible arrays we proceed in two steps. First, we fix an indexing scheme by defining a type of natural numbers below some fixed upper bound. Such an indexing scheme fixes the domain of our finite maps, {i | i < n}. Next, we calculate a more efficient representation of finite maps, yielding a datastructure rather than a function. This section details the ideas underlying the first step using the simplest representation of natural numbers; in Section 5, we explore the second step.

3.1 Number Type
The datatype of Peano numerals describes the set of natural numbers as the least set containing zero that is closed under a successor operation.
  data Peano : Set where
    zero : Peano
    succ : Peano → Peano

We use variable names such as k, m, and n to range over Peano numerals and use Arabic numerals to denote Peano constants, writing 3 rather than succ (succ (succ zero)).
The operations doubling and incrementing natural numbers, needed in Section 6.1, illustrate how to define functions (by induction) in Agda.
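For readers without Agda at hand, the inductive definitions can be mimicked in Python (an illustrative sketch with our own encoding; the paper defines them in Agda):

```python
# Peano numerals as nested tuples: zero is None, succ n is ("succ", n).
ZERO = None

def succ(n):
    return ("succ", n)

def double(n):
    # By induction on n:
    #   double zero     = zero
    #   double (succ n) = succ (succ (double n))
    if n is ZERO:
        return ZERO
    _, m = n
    return succ(succ(double(m)))

def to_int(n):
    # Meaning function, for testing only.
    return 0 if n is ZERO else 1 + to_int(n[1])

three = succ(succ(succ(ZERO)))
assert to_int(double(three)) == 6
```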

3.2 Index Type
Having fixed the number type, we move on to define the type of valid indices in an Array of size n. Here we have several alternatives, each with its own advantages and disadvantages. The most obvious transcription of {i | i < n} uses a dependent pair or Σ type to combine a natural number and a proof that it is within bounds.

(A note on reading Agda code: the single most important rule is that only a space separates lexemes. For example, +1 is a single lexeme denoting the successor function, whereas n + 1 is a sequence of three lexemes. In general, an Agda identifier consists of an arbitrary sequence of non-whitespace Unicode characters. There are only a few exceptions to this rule: for example, parentheses, ( and ), and curly braces, { and }, must not form part of a name, for obvious reasons.)
Here < denotes the strict ordering on the naturals. While the definition is fairly straightforward, it is somewhat cumbersome to use in practice, as any computation on indices involves manipulations of proofs. Before discussing alternative definitions, let us first explore some properties of the Index type.
The formulas link arithmetic on numbers to operations on types, with the number 0 corresponding to the empty set (written ⊥ in Agda), 1 to a singleton set (written ⊤), addition to disjoint union (written ⊎), multiplication to cartesian product (written ×), and, finally, exponentiation to the function space. We refer to these laws as index transformations. For example, the sum rule Index-+ is witnessed by a mapping between indices for a pair of arrays and indices for a single array, as suggested below. More instructively, the product rule, Index-*, is witnessed by a mapping between indices for a two-dimensional array and indices for a one-dimensional array. In general, there is not a single canonical witness for an index transformation.
The diagram on the left above exemplifies what is known as row-major order, but there is also column-major order, shown on the right. For now, we choose to ignore these specifics. However, when we start calculating datastructures the choice of isomorphism becomes tremendously important, a point we return to in Section 6.2.
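As an executable illustration (our own Python sketch, not the paper's Agda), the sum rule Index-+ can be witnessed by the "append" mapping, and the product rule Index-* by row-major order:

```python
# Witness for Index-+ : Index m ⊎ Index n ≅ Index (m + n),
# using the "append" layout: left indices first, then right indices.
def inject_plus(m, n, side, i):
    return i if side == "left" else m + i

def split_plus(m, n, k):
    return ("left", k) if k < m else ("right", k - m)

# Witness for Index-* : Index m × Index n ≅ Index (m * n),
# using row-major order.
def inject_times(m, n, i, j):
    return i * n + j

def split_times(m, n, k):
    return divmod(k, n)

# Round trips on small examples.
for k in range(5 + 3):
    side, i = split_plus(5, 3, k)
    assert inject_plus(5, 3, side, i) == k
for k in range(4 * 3):
    i, j = split_times(4, 3, k)
    assert inject_times(4, 3, i, j) == k
```

Column-major order corresponds to swapping the roles of i and j in the product witness; both are legitimate witnesses of the same isomorphism.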
Remark 1 (Categorical background). To provide some background, the type function Index is the object part of a functor from the preorder of natural numbers to the category of finite sets and total functions. (This is why the type is also known as Fin or FinSet.) The action of the functor on arrows embeds Index m into Index n, provided m ≤ n. In fact, the isomorphisms demonstrate that Index is simultaneously a strong monoidal functor of type (N, 0, +) → (Set, ⊥, ⊎) and a strong monoidal functor of type (N, 1, ·) → (Set, ⊤, ×).
Returning to the issue of defining the Index type in Agda, we can use the isomorphisms above to determine the index set by pattern matching on the natural number n.
The zero rule Index-0 determines that the Index (0) type is uninhabited, whereas the rules for one and addition, Index-1 and Index-+, determine that Index (succ n) contains one more element than Index n. For reasons of readability, we turn the definition of Index into idiomatic Agda, replacing the type function by an inductively defined indexed datatype.
  data Index : Peano → Set where
    izero : Index (succ n)
    isucc : Index n → Index (succ n)

There are no constructors for Index zero, corresponding to the first equation of Index, and two constructors for Index (succ n), corresponding to Index's second equation. The constructor names are almost identical to those of Peano. This is intentional: we want the constructors of Index to look and behave like their namesakes. The only difference is that the former carry vital type information about upper bounds. Reassuringly, all three definitions of index sets are equivalent. The straightforward, but rather technical proofs can be found in the accompanying material.
To illustrate working with indices, let us implement some index transformations that are needed later in Section 6.2.
The second operation combines doubling and increment. The obvious definition, i ·2+1 = isucc (i ·2+0), does not work, as the expression on the right-hand side has type Index (n ·2 +1) and not Index (n ·2), as required. On plain naturals we can separate doubling and increment; here we need to combine the operations to be able to establish precise upper bounds. We cannot expect Agda to automatically replicate the hand-written proof: in general, since Index combines data and proof, index transformations require more work than their vanilla counterparts on naturals. Now that we have a precise understanding of the domain of our finite maps, we can start calculating an implementation of the interface specified in Section 2.
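The bounds bookkeeping can be simulated in Python by pairing each index with its upper bound (an illustrative sketch of ours; the Agda version carries the bound in the type instead):

```python
# An index is modelled as a pair (i, n) with the invariant i < n.
def mk_index(i, n):
    assert 0 <= i < n, "index out of bounds"
    return (i, n)

def double_plus0(ix):
    # i ·2+0 : maps Index n to Index (n ·2).
    i, n = ix
    return mk_index(2 * i, 2 * n)

def double_plus1(ix):
    # i ·2+1 : maps Index n to Index (n ·2).  Note that the bound is
    # n ·2, not n ·2 + 1: this is why doubling and increment are fused
    # into one operation rather than composed from isucc and ·2+0.
    i, n = ix
    return mk_index(2 * i + 1, 2 * n)

assert double_plus1(mk_index(2, 3)) == (5, 6)
```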

4 Functions as Datastructures
The simplest implementation of the Array type identifies arrays and finite maps:

  Array : Peano → Set → Set
  Array n elem = Index n → elem

In this particular case, lookup and tabulate are manifest identities, rather than isomorphisms.
  lookup : Array n elem → (Index n → elem)
  lookup a = a

  tabulate : (Index n → elem) → Array n elem
  tabulate f = f

To complete this implementation, however, we still need to define the remaining operations: nil, cons, head, and tail. The empty array nil is the unique function from the empty set, defined below using an absurd pattern, written (). For the other functions, the specification serves as the implementation; for example, (2a) and (2b) form the definition of cons, given that lookup is implemented as the identity function.
  nil : Array zero elem
  nil ()

  cons : elem → Array n elem → Array (succ n) elem
  cons x xs (izero)   = x
  cons x xs (isucc i) = xs i

  head : Array (succ n) elem → elem
  head xs = xs izero

  tail : Array (succ n) elem → Array n elem
  tail xs i = xs (isucc i)

The proofs that the implementation satisfies the specification are consequently trivial: all the specified equivalences hold by definition.
While the implementation is exceptionally simple, it is also exceptionally slow: the running time of lookup xs i is determined not only by the index i but also by the number of operations used to build the array xs. For example, even though tail (cons x xs) is extensionally equal to xs, each lookup takes two additional steps, as the index is first incremented by tail only to be subsequently decremented by cons. In other words, the run-time behaviour of lookup is sensitive to how the array has been constructed! To avoid this problem, we turn functions into datastructures, using the implementation above as our starting point.

5 Lists, also Known as Vectors

Do you remember the laws of exponents from secondary school?
Quite amazingly, these equalities can be re-interpreted as isomorphisms between types, where B^A is the type of functions from A to B.
  law-of-exponents-⊥ : (⊥ → X) ∼= ⊤

If we apply these isomorphisms from left to right, perhaps repeatedly, we can systematically eliminate function spaces. This process might be called defunctionalization or trieification, except that the former term is already taken.
Some background is perhaps not amiss. A trie is also known as a digital search tree. In a conventional search tree, the search path is determined on the fly by comparing a given key against the signposts stored in the inner nodes.
By contrast, in a digital search tree, the key is the search path. This idea is, of course, not limited to searching. The point of this paper is that it applies equally well to indexing: the index or position of an element within an array is a path into the datastructure that represents the array.
Remark 2 (Categorical background). A lot more can be said about trieification: Tab with Tab A = A → X is a contravariant functor, part of an adjoint situation, sending left adjoints (initial objects ⊥, coproducts ⊎, initial algebras) to right adjoints (terminal objects ⊤, products ×, final coalgebras) [2,13].
That's enough words for the moment; calculemus! To trieify our type of finite maps,

  trieify : ∀ elem → ∀ n → (Index n → elem) ∼= Array n elem

we proceed by induction on the size of the array n. For the derivation, we use Wim Feijen's proof format, which features explicit justifications for the calculational steps, written between angle brackets. For example, the first rewrite below is justified by an isomorphism between function spaces: dom ∼=→∼= cod applies the isomorphism dom to the domain of its function argument and cod to its codomain. Note that, in contrast to traditional pencil-and-paper proofs, the justification is an Agda term, and the Agda type-checker verifies that this term serves indeed as appropriate evidence for the step.
    Array zero elem

The calculation suggests a defining equation, Array zero elem = ⊤, which expresses that there is exactly one array of size 0, namely the empty array. For non-empty arrays, the calculation is almost just as straightforward.
  trieify elem (succ n) = proof
      elem × Array n elem
    ∼= ⟨ use-as-definition-of Array-succ ⟩
      Array (succ n) elem

The final isomorphism expresses that an array of size 1 + n consists of an element followed by an array of size n. If we name the constructors appropriately, we obtain the familiar datatype of lists, indexed by length. This indexed type is also known as Vector.

  variable elem : Set

  data Array : Peano → Set → Set where
    nil  : Array zero elem
    cons : elem → Array n elem → Array (succ n) elem

Observe that the constructors of the interface, nil and cons, are now implemented by the constructors of the datatype.
If we extract the two components of the trieify isomorphism, we obtain the following definitions of lookup and tabulate.

  lookup : Array n elem → (Index n → elem)
  lookup (cons x xs) (izero)   = x
  lookup (cons x xs) (isucc i) = lookup xs i

  tabulate : (Index n → elem) → Array n elem
  tabulate {zero}   fm = nil
  tabulate {succ n} fm = cons (fm izero) (tabulate (λ i → fm (isucc i)))

Like trieify, both lookup and tabulate are defined by induction on the size. In the case of lookup, the size information remains implicit, as Agda is able to recreate it from the explicit argument, namely the argument list. For tabulate, no such information is available. Hence, we need to match on the implicit argument explicitly in its definition. (Unfortunately, Agda's extraction process is only semi-automatic, so we do not trust the resulting code. The proofs that lookup and tabulate are inverses are, however, entirely straightforward and can be found in the accompanying material.)
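In Python (a sketch with our own encoding of vectors as plain lists), the extracted definitions correspond to:

```python
# Length-indexed lists modelled as plain Python lists; an index is an int.
def lookup(xs, i):
    # lookup (cons x xs) izero = x ; lookup (cons x xs) (isucc i) = lookup xs i
    if i == 0:
        return xs[0]
    return lookup(xs[1:], i - 1)

def tabulate(n, fm):
    # tabulate {zero}   fm = nil
    # tabulate {succ n} fm = cons (fm izero) (tabulate (λ i → fm (isucc i)))
    if n == 0:
        return []
    return [fm(0)] + tabulate(n - 1, lambda i: fm(i + 1))

# lookup and tabulate are mutually inverse.
assert tabulate(4, lambda i: i * i) == [0, 1, 4, 9]
assert lookup(tabulate(4, lambda i: i * i), 3) == 9
```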
Here we have prefixed the operations of the reference implementation of Section 4 by FM, short for 'finite map'. Recalling that FM.lookup and FM.tabulate are both the identity function, Property (3e) means that lookup is indeed extensionally equal to the implementation using finite maps. Conversely, as (3f) shows, tabulate is the right inverse of lookup, which is unique.
The vector implementation of arrays does not suffer from history-sensitivity (tail (cons x xs) is now definitionally equal to xs), but thanks to the ivory-tower number type, it is still too slow to be useful in practice. The cure is pretty obvious: we replace unary numbers by binary numbers, albeit with a twist.

6 Leibniz Numerals
Instead of working with Peano naturals, we could choose a different implementation of the natural number type. In this section, we explore one such implementation: Leibniz numerals, a binary number system in which every Peano natural number has a unique representation.

6.1 Number Type
A Leibniz numeral is given by a sequence of digits, with the most significant digit on the left. A digit is either 1 or 2.
To assign a meaning to a Leibniz numeral, we map it to a Peano numeral.
For example, N 0b 1 1 2 1 normalizes to 17. The meaning function makes it crystal clear that we implement a base-two positional number system, except that the digits 1 and 2 are used, rather than 0 and 1.
Thanks to this twist we avoid the problem of leading zeros: every natural number enjoys a unique representation; the Leibniz number system is non-redundant. Moreover, the meaning function establishes a one-to-one correspondence between the two number systems: Leibniz ∼= Peano. Speaking of number conversion, the other direction of the isomorphism can be easily implemented using the pseudo-constructors zero and succ.

  zero : Leibniz
  zero = 0b

  succ : Leibniz → Leibniz
  succ (0b)  = 0b 1
  succ (n 1) = n 2
  succ (n 2) = (succ n) 1   -- carry

The binary increment exhibits the typical recursion pattern: the least significant digit is incremented, unless it is maximal, in which case a carry is propagated to the left. Using the meaning function it is straightforward to show that the implementation is correct.
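A quick way to get a feel for the 1-2 system is to transcribe it into Python (our own sketch; digit lists stand in for the Agda datatype):

```python
# Leibniz numerals as digit lists, most significant digit first;
# each digit is 1 or 2, and the empty list represents zero.
def meaning(ds):
    # Fold the digits in base two.
    n = 0
    for d in ds:
        n = n * 2 + d
    return n

def succ(ds):
    # Increment the last digit, unless it is maximal (2),
    # in which case a carry is propagated to the left.
    if not ds:
        return [1]
    if ds[-1] == 1:
        return ds[:-1] + [2]
    return succ(ds[:-1]) + [1]   # carry

assert meaning([1, 1, 2, 1]) == 17   # the paper's example: 0b 1 1 2 1
assert meaning(succ([1, 1, 2, 1])) == 18

# Non-redundancy: counting up never produces a duplicate representation.
n, seen = [], set()
for _ in range(100):
    seen.add(tuple(n))
    n = succ(n)
assert len(seen) == 100
```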
The prefix pseudo indicates that the operations zero and succ are not full-fledged constructors: we cannot use them in patterns on the left-hand side of definitions. To compensate for this, we additionally offer a Peano view [18,27].

  view : (n : Leibniz) → Peano-View n
  view (0b)  = as-zero
  view (n 1) with view n
  ... | as-zero   = as-succ 0b
  ... | as-succ m = as-succ (m 2)   -- borrow
  view (n 2) = as-succ (n 1)

In a sense, view combines two functions: the test for zero and the predecessor function, again following the typical recursion pattern: the least significant digit is decremented, unless it is minimal, in which case we borrow one from the left.
The semantics of such a view is defined by a mapping into the Peano numerals. The correctness criterion asserts that view does not change the value of its argument.

Remark 3 (Agda). You may wonder why the type Peano-View is indexed by a Leibniz numeral. Why not simply define:

  data Peano-View : Set where   -- too simple-minded
    as-zero : Peano-View
    as-succ : Leibniz → Peano-View

In contrast to the simple, unindexed datatype above, our indexed view type keeps track of the to-be-viewed value, which turns out to be vital for correctness proofs: if view n yields as-succ m, then we know that n definitionally equals succ m. The constructors of the unindexed datatype do not maintain this important piece of information, so the subsequent proofs do not go through.
As an intermediate summary, Leibniz numerals serve as a drop-in replacement for Peano numerals: the pseudo-constructors replace zero and succ on the right-hand side of equations; the view allows us to additionally replace them in patterns on the left-hand side.

6.2 Index Type
Of course, we would like to use binary numbers for indices as well. Therefore, we need to adapt the type of positions that specifies the domain of our finite maps. The following derivation is based on the index transformations, but, as we have noted in Section 3, there are, in general, several options for the witnesses of these transformations. In other words, we have to make some design decisions!
In particular, since we use a binary, positional number system, we need to inject life into the doubling isomorphism:

  Index (n ·2) ∼= Index n ⊎ Index n

There are two canonical choices, one based on appending and a second one based on zipping or interleaving. Zipping maps elements of the first summand to even indices and elements of the second to odd indices.
Its inverse amounts to division by 2, with the remainder specifying the summand of the disjoint union.
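The two witnesses can be compared concretely (a Python sketch of ours, not the paper's code):

```python
# Two witnesses for Index (n ·2) ≅ Index n ⊎ Index n.

def append_in(n, side, i):
    # Appending: the first summand occupies 0..n-1, the second n..2n-1.
    return i if side == 0 else n + i

def zip_in(n, side, i):
    # Zipping: the first summand goes to even indices, the second to odd.
    # (n is unused here; kept for symmetry with append_in.)
    return 2 * i + side

def zip_out(n, k):
    # Inverse of zipping: divide by 2; the remainder picks the summand.
    return k % 2, k // 2

for k in range(10):
    side, i = zip_out(5, k)
    assert zip_in(5, side, i) == k
```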
Given these prerequisites, the calculation of

  re-index : ∀ n → Peano.Index (N n) ∼= Leibniz.Index n

proceeds by induction on the structure of Leibniz numerals. The base case is immediate, and the calculation for the first inductive case also works without any surprises.
We plug in the definition of Peano.Index, apply the doubling isomorphism based on zipping, and finally invoke the induction hypothesis. The derivation for the final case follows exactly the same pattern, except that we unfold the definition of Leibniz.Index:

      ⊤ ⊎ ⊤ ⊎ Leibniz.Index n ⊎ Leibniz.Index n
    ∼= ⟨ use-as-definition-of Index-2 ⟩
      Leibniz.Index (n 2)

As usual, we introduce names for the summands of the disjoint unions, obtaining the following index type for Leibniz numerals.
  data Index : Leibniz → Set where
    0₁ :           Index (n 1)   -- ⊤
    1₁ : Index n → Index (n 1)   -- Index n
    2₁ : Index n → Index (n 1)   -- Index n
    0₂ :           Index (n 2)   -- ⊤
    1₂ :           Index (n 2)   -- ⊤
    2₂ : Index n → Index (n 2)   -- Index n
    3₂ : Index n → Index (n 2)   -- Index n

A couple of remarks are in order. The index attached to the constructor names indicates the least significant digit of the upper bound. The constructors 0₁ and 0₂ say: operationally we are alike, both representing the zeroth index; however, we carry important type information, as 0₁ lives below an odd upper bound, whereas 0₂ is below an even bound. The definition of the index set is perhaps not quite what we expected, as it amalgamates two different number systems: the by now familiar 1-2 system and a variant that employs 0 and 1 as leading digits and 2 and 3 for non-leading digits.

Remark 4. As an aside, the 2-3 number system is also non-redundant. In general, any binary system that uses the digits 0, . . . , a in the leading position and a + 1 and a + 2 for the other positions enjoys the property that every natural number has a unique representation.
To make the semantics of these indices precise, we extract the witness for the reverse direction of the re-indexing isomorphism (using iz and is as shorthands for izero and isucc on Peano indices).
Just as we saw in Section 3, the expressions n ·2+1 and is (n ·2+0) are quite different, as they live below different upper bounds: if j : Index a, then j ·2+1 : Index (a ·2), whereas is (j ·2+0) : Index (a ·2 +1). These types carry just enough information to avoid the infamous index-out-of-bounds errors. While the definitions of Index and I may seem quite bulky at first glance, they encode an essential invariant of the indices involved.
The same remark applies to the definitions of izero and isucc. The successor function maps an odd number to an even number, and vice versa, correspondingly incrementing the upper bounds. Consequently, arguments and results alternate between the two number systems. This is why isucc (i 2₂) yields (isucc i) 1₁, rather than i 3₂. The recursion pattern is interesting: if the argument is below an odd bound, isucc returns immediately; a recursive call is only made for indices that live below an even upper bound. We return to this observation in Section 8.
Using the meaning function we can establish the correctness of izero and isucc.
The correctness lemma izero-correct covers the first operation. Both equations relate expressions of different types: for example, I (isucc i) has type Peano.Index (N (succ n)), whereas is (I i) has type Peano.Index (N n + 1). Fortunately, succ-correct tells us that both types are propositionally equal. You may have noticed that this section replicates the structure of Section 6.1.
It remains to define an appropriate view on Leibniz indices.
  data Index-View : Index (succ n) → Set where
    as-izero : Index-View {n} izero
    as-isucc : (i : Index n) → Index-View (isucc i)

The type Index-View is implicitly parametrized by a Leibniz numeral n and explicitly parametrized by a Leibniz index of type Index (succ n). The definition of the view function merits careful study.
Recall that the Peano view combines the test for zero and the predecessor function. The same is true of iview, except that arguments and results additionally alternate between the two number systems.
Finally, given the semantics of view patterns we can assert that iview does not change the value of its argument.

7 Functions as Datastructures

To showcase the use of our new gadgets, we adapt the implementation of Section 4 to binary indices, setting Array n elem = Index n → elem.
  nil : Array 0b elem
  nil ()

  cons : elem → Array n elem → Array (succ n) elem
  cons x xs i with iview i
  ... | as-izero   = x
  ... | as-isucc j = xs j

  head : Array (succ n) elem → elem
  head xs = xs izero

  tail : Array (succ n) elem → Array n elem
  tail xs i = xs (isucc i)

As in the Peano case, functions as datastructures serve as our reference implementation for datastructures based on Leibniz numerals. With this specification in place, we can now try to discover a corresponding datastructure.

8 One-two Trees
Turning to the heart of the matter, let us trieify the type of finite maps based on binary indices.

  trieify : ∀ elem → ∀ n → (Index n → elem) ∼= Array n elem

The strategy should be clear: as in Section 5, we eliminate the type of finite maps using the laws of exponents. The base case is identical to the one for lists.
  trieify elem (0b) = proof

The calculation for the inductive cases follows the same rhythm (we unfold the definition of Index and apply the laws of exponents), except that we additionally invoke the induction hypothesis.
  trieify elem (n 1) = proof
      elem × Array n elem × Array n elem
    ∼= ⟨ use-as-definition-of Array-1 ⟩
      Array (n 1) elem

The final step in the isomorphism above expresses that an array of size n ·2 +1 consists of an element followed by two arrays of size n. The isomorphism for arrays of size n ·2 +2 follows a similar pattern.
  trieify elem (n 2) = proof
      elem × elem × Array n elem × Array n elem
    ∼= ⟨ use-as-definition-of Array-2 ⟩
      Array (n 2) elem

An array of size n ·2 +2 consists of two elements followed by two arrays of size n. All in all, we obtain the following datatype. Its elements are called one-two trees, for want of a better name.
  variable elem : Set

  data Array : Leibniz → Set → Set where
    Leaf  : Array 0b elem
    Node₁ : elem → Array n elem → Array n elem → Array (n 1) elem
    Node₂ : elem → elem → Array n elem → Array n elem → Array (n 2) elem

As an aside, Agda, like Haskell, prefers curried data constructors over uncurried ones. The following equivalent definition that uses pairs shows more clearly that one-two trees are modelled after the 1-2 number system,

  data Array : Leibniz → Set → Set where
    Leaf  : Array 0b elem
    Node₁ : elem¹ → (Array n elem)² → Array (n 1) elem
    Node₂ : elem² → (Array n elem)² → Array (n 2) elem

where A¹ = A and A² = A × A.
Turning to the operations on one-two trees, we first extract the witnesses of the trieify isomorphism, obtaining human-readable definitions of lookup and tabulate.
  tabulate : (Index n → elem) → Array n elem

For example, tabulate {0b 1 1 2 1} id yields the tree depicted below. A couple of remarks are in order. By definition, one-two trees are not only size-balanced, they are also height-balanced: the height corresponds to the length of the binary representation of the size. The binary decomposition of the size fully determines the shape of the tree; all the nodes on one level have the same shape; the digits determine this shape from bottom to top. In the example above, 0b 1 1 2 1 implies that the nodes on the third level (from bottom to top) are two-nodes, whereas the other nodes are one-nodes. There are 2³ nodes on the bottom level, witnessing the weight of the most significant digit.
Turning to the size-changing operations, cons is based on the binary increment. Recall that succ alternates between odd and even numbers. Accordingly, cons alternates between one- and two-nodes.

  nil : Array zero elem
  nil = Leaf

  cons : elem → Array n elem → Array (succ n) elem
  cons x₀ (Leaf)            = Node₁ x₀ Leaf Leaf
  cons x₀ (Node₁ x₁ l r)    = Node₂ x₀ x₁ l r
  cons x₀ (Node₂ x₁ x₂ l r) = Node₁ x₀ (cons x₁ l) (cons x₂ r)

A one-node is turned into a two-node. Dually, a two-node becomes a one-node; the two surplus elements are pushed into the two sub-trees. Observe that the recursion pattern of succ dictates the recursion pattern of cons, that is, whether we stop or recurse. The definition of isucc dictates the layout of the data. For example, the first component of Node₁ becomes the second component of Node₂.
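The node-flipping behaviour of cons is easy to observe in an executable model (our own Python transcription of the tree type; sizes are not statically checked here):

```python
# One-two trees as tuples:
#   ("leaf",), ("node1", x0, l, r), ("node2", x0, x1, l, r).
LEAF = ("leaf",)

def cons(x0, t):
    if t[0] == "leaf":
        return ("node1", x0, LEAF, LEAF)
    if t[0] == "node1":                 # a one-node becomes a two-node
        _, x1, l, r = t
        return ("node2", x0, x1, l, r)
    _, x1, x2, l, r = t                 # a two-node overflows: push the
    return ("node1", x0, cons(x1, l), cons(x2, r))  # surplus into the sub-trees

def size(t):
    # Read the 1-2 representation of the size off the node shapes.
    if t[0] == "leaf":
        return 0
    if t[0] == "node1":
        return size(t[2]) * 2 + 1
    return size(t[3]) * 2 + 2

t = LEAF
for x in range(5):
    t = cons(x, t)
assert size(t) == 5
```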
You may want to view a two-node as a small buffer. Consing an element to a one-node allocates the buffer; consing a further element causes the buffer to overflow.
It is also possible to derive the implementation of cons from the specification (2a)-(2b). However, as the argument is based on positions and the Index type comprises seven constructors, the calculations are rather lengthy, but wholly unsurprising, so they have been relegated to Appendix A. Figure 1 shows a succession of one-two trees obtained by consing 4, 3, 2, 1, and 0 (in this order) to tabulate {0b 1 2 2} (λ i → i + 5). In every second step, cons touches only the root node. However, every once in a while the entire tree is rewritten, corresponding to a cascading carry. In Figure 1, this happens in the final step, when a tree of size 0b 2 2 2 is turned into a tree of size 0b 1 1 1 1.
Consequently, the worst-case running time of cons is Θ(n). However, like the binary increment, consing shows a more favourable behaviour if a sequence of operations is taken into account: cons runs in Θ(log n) amortized time. This is less favourable than succ, which runs in constant amortized time. The reason is simple: for each carry succ makes one recursive call, whereas cons features two calls so a carry propagation takes time proportional to the weight of the digit.
We return to this point in Section 10.
The operations head and tail basically undo the changes of cons.

  head : Array (succ n) elem → elem
  head {0b}  (Node₁ x₀ l r)    = x₀
  head {n 1} (Node₂ x₀ x₁ l r) = x₀
  head {n 2} (Node₁ x₀ l r)    = x₀

  tail : Array (succ n) elem → Array n elem
  tail {0b}  (Node₁ x₀ l r)    = Leaf
  tail {n 1} (Node₂ x₀ x₁ l r) = Node₁ x₁ l r
  tail {n 2} (Node₁ x₀ l r)    = Node₂ (head l) (head r) (tail l) (tail r)

As an attractive alternative to these operations we also introduce a list view, analogous to the Peano view on binary numbers.

  list-view {n = 0b}  (Leaf)         = as-nil
  list-view {n = n 1} (Node₁ x₀ l r) with view n | list-view l | list-view r
  ... | as-zero   | as-nil        | as-nil        = as-cons x₀ r
  ... | as-succ m | as-cons x₁ l  | as-cons x₂ r  = as-cons x₀ (Node₂ x₁ x₂ l r)
  list-view {n = n 2} (Node₂ x₀ x₁ l r) = as-cons x₀ (Node₁ x₁ l r)

The first and the last equation are straightforward; in particular, removing an element from a two-node yields a one-node. Following this logic, removing an element from a one-node gives a zero-node, except that our datatype does not feature this node type. Consequently, we need to borrow data from the sub-trees.
To this end, a view is simultaneously placed on the size and the two sub-trees, a typical usage pattern.
The implementation of arrays using one-two trees can be shown correct with respect to our reference implementation of Section 7. The steps are completely analogous to the line of action in Section 5. The details are elided for reasons of space.

Remark 5 (Haskell). It is instructive to translate the Agda code into a language such as Haskell that does not support dependent types. The datatype definition is almost the same, except that the size index is dropped.
  data Array :: Type → Type where
    Leaf  :: Array elem
    Node₁ :: elem → Array elem → Array elem → Array elem
    Node₂ :: elem → elem → Array elem → Array elem → Array elem

If we represent the elements of the index type by plain integers, then we need to translate the index patterns. This can be done in a fairly straightforward manner using guards and integer division.
A more sophisticated alternative is to replace each constructor of Index by a pattern synonym [23].
As the interface considered in this paper is rather narrow, there is no need to maintain the size of trees at run-time. However, if size information is needed, it can be computed on the fly,

  size :: Array elem → Integer
  size (Leaf)            = 0
  size (Node₁ x₁ l r)    = size l * 2 + 1
  size (Node₂ x₁ x₂ l r) = size l * 2 + 2

in logarithmic time.

9 Braun Trees

The derivation of the Leibniz index type in Section 6.2 and its associated trie type in Section 8 are entirely straightforward. Too straightforward, perhaps?
This section and the next highlight the decision points and investigate alternative designs.

Index Type, Revisited
The index type for Peano numerals enjoys an appealing property: its constructors look and behave like their Peano namesakes, as indicated by the isomorphisms. The same cannot be said of Leibniz indices. The indices below an even bound are based on 2-3 binary numbers, rather than the 1-2 system we started with.
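As a reminder, the 1-2 (bijective binary) number system can be modelled in plain Haskell; the constructor names below are our own. Ob denotes 0, D1 n denotes 2n + 1, and D2 n denotes 2n + 2, so every natural number has exactly one representation.

```haskell
-- A plain model of Leibniz (1-2 binary) numbers; names hypothetical.
data Leibniz = Ob | D1 Leibniz | D2 Leibniz

-- The value denoted by a Leibniz numeral.
value :: Leibniz -> Integer
value Ob     = 0
value (D1 n) = 2 * value n + 1
value (D2 n) = 2 * value n + 2

-- The successor mirrors the binary increment: a final 1 becomes a 2,
-- and a final 2 becomes a 1 with a carry into the remaining digits.
succL :: Leibniz -> Leibniz
succL Ob     = D1 Ob
succL (D1 n) = D2 n
succL (D2 n) = D1 (succL n)
```

Iterating succL from Ob enumerates 0, 1, 2, … without ever producing two numerals for the same value, which is the non-redundancy the calculations rely on.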
Can we re-work the isomorphism on the right so that it has the same shape as the one on the left, with three constructors instead of four?
Let's calculate, revisiting the second inductive case.
  re-index (n 2) = proof
    Peano.Index ⟦n 2⟧
  ≅⟨ … ⟩
    𝟙 ⊎ 𝟙 ⊎ (Peano.Index ⟦n⟧ ⊎ Peano.Index ⟦n⟧)

At this point, we have applied the induction hypothesis in the original derivation.
An alternative is to first join the second and third summands of the disjoint union, applying the re-indexing law Index-succ backwards, from right to left.
  data Index : Leibniz → Set where
    0b₁ : Index (n 1)
    _1₁ : Index n → Index (n 1)
    _2₁ : Index n → Index (n 1)
    0b₂ : Index (n 2)
    _1₂ : Index (succ n) → Index (n 2)
    _2₂ : Index n → Index (n 2)

The datatype features two identical sets of constructors, one for indices below an odd upper bound and a second for indices below an even upper bound.
Having changed the index type, we need to adapt the operations on indices.
If we ignore the subscripts, the first three clauses of the successor function are identical to the last three clauses. Operationally, the constructors _1₁ and _1₂ are treated in exactly the same way. This is precisely what we have hoped for! (At the risk of dwelling on the obvious, even though the definition of isucc seems repetitive, it is not: the proofs relating the indices to their upper bounds are quite different.)

Trie Type, Revisited
The new index type gives rise to a new trie type. We only need to adapt the trieification for the second inductive case, n 2. As in the original derivation, the steps are entirely straightforward; nothing surprising here.
    elem × Array (succ n) elem × Array n elem
  ≅⟨ use-as-definition-of Array-2 ⟩
    Array (n 2) elem

  data Array : Leibniz → Set → Set where
    Leaf  : Array 0b elem
    Node₁ : elem → Array n elem → Array n elem → Array (n 1) elem
    Node₂ : elem → Array (succ n) elem → Array n elem → Array (n 2) elem

A moment's reflection reveals that we have rediscovered Braun trees. Recall that the size of a Braun tree determines its shape: a Braun tree of odd size (n 1) consists of two sub-trees of the same size; in a Braun tree of even, non-zero size (n 2) the left sub-tree is one element larger. As an aside, the property that the size determines the shape is shared by all our implementations of flexible arrays. It is a consequence of the fact that the container types are based on non-redundant number systems.
Similar to one-two threes, erm, trees, Braun trees feature two constructors for non-empty trees. However, in contrast to one-two trees, indexing is the same for both constructors. This becomes apparent if we extract the witnesses from the trieify isomorphism.

  lookup : Array n elem → (Index n → elem)
  lookup (Node₁ x₀ l r) (0b₁)  = x₀
  lookup (Node₁ x₀ l r) (i 1₁) = lookup l i
  lookup (Node₁ x₀ l r) (i 2₁) = lookup r i
  lookup (Node₂ x₀ l r) (0b₂)  = x₀
  lookup (Node₂ x₀ l r) (i 1₂) = lookup l i
  lookup (Node₂ x₀ l r) (i 2₂) = lookup r i

We make the same observation as for the successor function: if we ignore the subscripts, the first three clauses are identical to the last three clauses. In other words, the same indexing scheme applies to both varieties of inner nodes. The definition of tabulate is similarly repetitive.

  tabulate : (Index n → elem) → Array n elem

For example, the call tabulate {0b 1 1 2 1} id yields the Braun tree shown below. Here we observe the effect of the re-indexing isomorphism based on zipping or interleaving, see Section 6.2. Elements at odd positions are located in the left sub-tree, elements at even, non-zero positions in the right sub-tree.
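Dropping the size index, the uniform Braun indexing scheme can be sketched in Haskell. The names Braun, blookup, and bcons are our own; the guard arithmetic transcribes the index constructors, and bcons shows the sub-tree swap discussed below.

```haskell
-- Braun trees without the size index: one node shape serves both
-- the odd and the even case (names hypothetical).
data Braun elem = Tip | Node elem (Braun elem) (Braun elem)

-- Uniform indexing: position 0 is the root, odd positions are in the
-- left sub-tree, even (non-zero) positions in the right sub-tree.
blookup :: Braun elem -> Integer -> elem
blookup (Node x l r) i
  | i == 0    = x
  | odd i     = blookup l ((i - 1) `div` 2)
  | otherwise = blookup r ((i - 2) `div` 2)
blookup Tip _ = error "index out of bounds"

-- cons re-indexes: old odd positions become even and vice versa,
-- hence the recursive call on the right sub-tree and the swap.
bcons :: elem -> Braun elem -> Braun elem
bcons x Tip          = Node x Tip Tip
bcons x (Node y l r) = Node x (bcons y r) l
```

Building a tree with bcons and reading it back with blookup recovers the elements in insertion order, confirming that the two definitions implement the same indexing scheme.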
There is, however, one problem with this definition. The Node₂ constructor has two sub-trees: one of size n, the other of size succ n. As a result, the implicit Leibniz number passed to the tabulate function is not obviously decreasing: one call is passed n, the other succ n. While the latter represents a smaller natural number, the recursive call is not structurally smaller. Consequently, Agda rejects this definition as it stands. There is a reasonably straightforward argument that guarantees termination: even though the recursion is not structural, it is well-founded, as each recursive call is performed on a structurally smaller Peano number. In the remainder of this section, we will ignore such termination issues.
As before, the indexing scheme determines the implementation of the size-changing operations.

  nil : Array zero elem
  nil = Leaf

  cons : elem → Array n elem → Array (succ n) elem
  cons x₀ (Leaf)         = Node₁ x₀ Leaf Leaf
  cons x₀ (Node₁ x₁ l r) = Node₂ x₀ (cons x₁ r) l
  cons x₀ (Node₂ x₁ l r) = Node₁ x₀ (cons x₁ r) l

  head : Array (succ n) elem → elem
  head {0b}  (Node₁ x₀ l r) = x₀
  head {n 1} (Node₂ x₀ l r) = x₀
  head {n 2} (Node₁ x₀ l r) = x₀

  tail : Array (succ n) elem → Array n elem
  tail {0b}  (Node₁ x₀ l r) = Leaf
  tail {n 1} (Node₂ x₀ l r) = Node₁ (head l) r (tail l)
  tail {n 2} (Node₁ x₀ l r) = Node₂ (head l) r (tail l)

Consider the definition of cons. Both recursive calls of cons are applied to the right sub-tree, additionally swapping left and right sub-trees. Of course! Adding an element to the front requires re-indexing: elements that were at even positions before are now located at odd positions, and vice versa.
One-two trees may be characterized as lazy Braun trees: the first cons operation is, in a sense, delayed, with the data stored in a two-node. The next cons forces the delayed call, issuing two recursive calls. By contrast, cons for Braun trees recurses immediately, but makes only a single call. However, after two steps, swapping the sub-trees twice, the net effect is the same. The strategy, lazy or eager, determines the performance: cons for Braun trees has a worst-case running time of Θ(log n), whereas cons for one-two trees achieves the logarithmic time bound only in an amortized sense.
In this paper, we focus on one-sided flexible arrays. Braun trees are actually more flexible (pun intended) as they also support extension at the rear: they implement two-sided flexible arrays. Through the lens of the interface, cons and snoc⁷ are perfectly symmetric. However, due to the layout of the data, their implementations for Braun trees are quite different. If an element is attached to the front, the positions, odd or even, of the original elements change: sub-trees must be swapped. By contrast, if an element is added to the rear, nothing changes: the sub-trees must stay in place. Depending on the size of the array, the position of the new element is located either in the left or in the right sub-tree, see Figure 2.
Staring at the sequence of trees, one can trace the position of the last element, the number 14. Turning to the implementation of snoc, the code is pretty straightforward since the data constructors carry the required size information: if the original size is odd, the element is added to the left sub-tree; if the size is even, it is added to the right sub-tree.

  snoc : Array n elem → elem → Array (succ n) elem
  snoc (Leaf)         xₙ = Node₁ xₙ Leaf Leaf
  snoc (Node₁ x₀ l r) xₙ = Node₂ x₀ (snoc l xₙ) r
  snoc (Node₂ x₀ l r) xₙ = Node₁ x₀ l (snoc r xₙ)

As the relative order of x₀, l, r, and xₙ must not be changed, each equation is actually forced upon us! (The power of dependent types is particularly tangible if the code is developed interactively.) The implementation of snoc shows that the cases for Node₁ and Node₂ are not necessarily equal! This has consequences when porting the code to non-dependently typed languages such as Haskell: depending on the interface, explicit size information may or may not be necessary.

⁷ It is customary to call the extension at the rear snoc, which is cons written backwards.

Remark 6 (Haskell). Continuing the discussion of Remark 5, let us again translate the Agda code into Haskell. The implementation of snoc has demonstrated that we cannot simply identify Node₁ and Node₂. For a Haskell implementation there are at least three options: we identify the constructors Node₁ and Node₂ but maintain explicit size information, either locally in each node or globally for the entire tree; or we identify the two constructors and recreate the size information on the fly, which can be done in Θ(log² n) time [20]; or we faithfully copy the Agda code at the cost of some code duplication. The duplication of code can, however, be ameliorated using or-patterns.

If we rather arbitrarily select the second option,

  data Array :: Type → Type where
    Leaf :: Array elem
    Node :: elem → Array elem → Array elem → Array elem

the implementation of lookup is short and sweet, whereas the definition of snoc is more involved.

  snoc :: Array a → a → Array a
  snoc xs xₙ = put xs (size xs)
    where put (Leaf)       n = Node xₙ Leaf Leaf
          put (Node a l r) n
            | n `mod` 2 ≡ 1 = Node a (put l ((n - 1) `div` 2)) r
            | n `mod` 2 ≡ 0 = Node a l (put r ((n - 2) `div` 2))

Unfortunately, the running time of snoc degrades to Θ(log² n), as it is dominated by the initial call to size. The culprit is easy to identify: the cons operation makes two recursive calls for each carry (eagerly or lazily), whereas incr makes do with only one. There are two recursive calls because we introduced two recursive sub-trees when we invoked (an instance of) the sum law during trieification, see Section 8, where A × 2 ≅ A ⊎ A and A² = A × A. The isomorphism states that a finite map whose domain has an even size can be represented by two maps whose domains have half the size. If you know the laws of exponents by heart, then you may realize that this is not the only option. Alternatively, we could replace the finite map by a single map that yields pairs. The formal property is a combination of the product rule, also known as currying, law-of-exponents-×, and the sum rule: law-of-exponents-sq. Building on this isomorphism, the trieification of the original index set of Section 6.2 proceeds as follows.
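The isomorphism behind law-of-exponents-sq can be made tangible in plain Haskell, with Bool standing in for a two-element index set; the witness names toPairs and fromPairs are our own. A finite map whose domain has even size can be read either as a pair of half-size maps or as a single half-size map yielding pairs:

```haskell
-- One direction of the isomorphism (Bool, a) -> c  ≅  a -> (c, c):
-- replace a map over a doubled domain by a map into pairs.
toPairs :: ((Bool, a) -> c) -> (a -> (c, c))
toPairs f a = (f (False, a), f (True, a))

-- The other direction: recover the map over the doubled domain.
fromPairs :: (a -> (c, c)) -> ((Bool, a) -> c)
fromPairs g (False, a) = fst (g a)
fromPairs g (True,  a) = snd (g a)
```

The two functions are mutually inverse (up to extensionality), which is exactly the freedom exploited when trading a pair of sub-trees for a sub-tree of pairs.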
All in all, we obtain the following datatype, which is known as the type of binary random-access lists.
Two remarks are in order. First, our binary numbers are written with the most significant digit on the left. For the array types above, we have reversed the order of bits, as this corresponds to the predominant, left-to-right reading order.
Second, our final implementation of arrays is a so-called nested datatype [3], where the element type changes at each level. Indeed, a random-access list can be seen as a standard list, except that it contains an element, a pair of elements, a pair of pairs of elements, and so forth. Nested datatypes are also known as non-uniform datatypes [16] or non-regular datatypes [22].
The rest is probably routine by now. As usual, we extract the witnesses of the trieify isomorphism, lookup and tabulate.
  lookup : Array n elem → (Index n → elem)

The call tabulate {0b 1 1 2 1} id yields the random-access list shown below. Now the elements appear sequentially from left to right. But wait! Isn't our indexing scheme based on interleaving rather than appending, as set out in Section 6.2? This is probably worth a closer look. Let us define a variant of lookup that takes its two arguments in reverse order and works on the primed variants of our arrays, defined on the previous page.

  access : Index n → Array n elem → elem
  access i t = lookup t i

If we compare the implementation of access for one-two trees

  access (i 1₁) (Node₁ x₀ xs) = access i (proj₁ xs)

to the one for random-access lists,

  access (i 1₁) (One x₀ xs) = proj₁ (access i xs)

we make an interesting observation. The two projection functions are composed in a different order: access i · proj₁ versus proj₁ · access i. Of course! This reflects the change in the organisation of data: we have replaced a pair of sub-trees by a sub-tree of pairs. In more detail, the k-th tree of a random-access list corresponds to the k-th level of a one-two tree. As the access order is reversed, the corresponding sequences are bit-reversal permutations of each other. Consider, for example, the lowest level: 9 13 11 15 10 14 12 16 (one-two tree) is the bit-reversal permutation of 9 10 11 12 13 14 15 16 (random-access list).
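The bit-reversal claim is easy to check with a small helper that reverses the k low-order bits of an index; the function bitRev is our own illustration, not part of the development.

```haskell
-- Reverse the k low-order bits of a non-negative number: the lowest
-- bit of the input becomes the highest of the k-bit result.
bitRev :: Int -> Integer -> Integer
bitRev 0 _ = 0
bitRev k n = (n `mod` 2) * 2 ^ (k - 1) + bitRev (k - 1) (n `div` 2)
```

Mapping the positions 0..7 of the lowest level through bitRev 3 and offsetting by 9 turns the random-access-list order 9 10 11 12 13 14 15 16 into the one-two-tree order 9 13 11 15 10 14 12 16, as claimed above.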
It is time to reap the harvest. Since our new datastructure is list-like, cons makes do with one recursive call.
  cons : elem → Array n elem → Array (succ n) elem
  cons x₀ (Nil)          = One x₀ Nil
  cons x₀ (One x₁ xs)    = Two x₀ x₁ xs
  cons x₀ (Two x₁ x₂ xs) = One x₀ (cons (x₁ , x₂) xs)

The implementation is truly modelled after the binary increment. This entails, in particular, that cons runs in constant amortized time. Figure 3 shows cons in action, mirroring Figures 1 and 2. The drawings nicely reflect that a 2 of weight 2ᵏ is equivalent to a 1 of weight 2ᵏ⁺¹, see the first and third diagrams.
If we flip the equations for cons, we obtain implementations of head and tail.

  head : Array (succ n) elem → elem
  head {0b}  (One x₀ xs)    = x₀
  head {n 1} (Two x₀ x₁ xs) = x₀
  head {n 2} (One x₀ xs)    = x₀

  tail : Array (succ n) elem → Array n elem
  tail {0b}  (One x₀ xs)    = Nil
  tail {n 1} (Two x₀ x₁ xs) = One x₁ xs
  tail {n 2} (One x₀ xs)    = Two (proj₁ (head xs)) (proj₂ (head xs)) (tail xs)

Observe that we need to make the implicit size arguments explicit, so that Agda is able to distinguish between a singleton array, first equation, and an array that contains at least three elements, third equation. We leave the definition of a suitable list view as the obligatory exercise to the reader (the solution can be found in the accompanying material). Haskell supports both the definition of nested datatypes,

  data Array :: Type → Type where
    Nil :: Array elem
    One :: elem → Array (elem, elem) → Array elem
    Two :: elem → elem → Array (elem, elem) → Array elem

and the definition of recursive functions over them.
Note that the definition is not typable in a standard Hindley-Milner system, as the recursive call has type Array (elem, elem) → (Integer → (elem, elem)), which is a substitution instance of the declared type. The target language must support polymorphic recursion [19]. As typability in this system is undecidable [8], Haskell requires the programmer to provide an explicit type signature.
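A minimal, self-contained Haskell sketch of the nested datatype together with two polymorphically recursive functions illustrates the point; the names consRA and size are ours, and neither type signature can be omitted, since each recursive call is at the instance Array (elem, elem).

```haskell
-- A binary random-access list as a nested datatype: the element
-- type doubles at each level.
data Array elem
  = Nil
  | One elem (Array (elem, elem))
  | Two elem elem (Array (elem, elem))

-- Polymorphic recursion: the recursive call is at type
-- Array (elem, elem) -> Integer, so the signature is mandatory.
size :: Array elem -> Integer
size Nil          = 0
size (One _ xs)   = 1 + 2 * size xs
size (Two _ _ xs) = 2 + 2 * size xs

-- cons modelled after the binary increment; the carry packs two
-- elements into a pair and recurses at the doubled element type.
consRA :: elem -> Array elem -> Array elem
consRA x Nil          = One x Nil
consRA x (One y xs)   = Two x y xs
consRA x (Two y z xs) = One x (consRA (y, z) xs)
```

Deleting either signature makes GHC reject the program, which is precisely the restriction imposed by undecidable type inference for polymorphic recursion.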
Random-access lists outperform one-two trees and Braun trees. But all that glitters is not gold: unlike their rival implementations, random-access lists do not support a snoc operation, extending the end of an array. For this added flexibility, we could symmetrize the design. The point of departure is a slightly weird number system that features two least significant digits, one at the front and another one at the rear. Inventing some syntax, 1 2 0 1 1, for example, represents 1 + 2 · (2 + 2 · 0 + 1) + 1 = 8. If we trieify a suitable index type based on this number system, we obtain so-called finger trees [14,4]. But that's a story to be told elsewhere.

11 Related work

We are, of course, not the first to observe the connection between number systems and purely functional datastructures. This observation can be traced as far back as early work by Okasaki [21] and Hinze [9,11]. Indeed, Okasaki writes: "data structures that can be cast as numerical representations are surprisingly common, but only rarely is the connection to a number system noted explicitly." This paper tries to provide a framework for making this connection explicit.
Nor are we the first to propose such a framework. McBride's work on ornaments describes how to embellish a datatype with additional information, and even how to transport functions over one datatype to work on its ornamented extension. Typical examples include showing how lists arise from decorating natural numbers with additional information, and how vectors arise from indexing lists with their length. Ko and Gibbons [15] have shown how these ideas can be applied to describe how binomial heaps arise as ornaments on binary numbers. Similarly, binary random-access lists can be implemented systematically in Agda by indexing with a slight variation of the binary numbers used in this paper [25]. Instead of using the Leibniz numbers presented here, this construction uses a more traditional 'list of bits' to represent binary numbers. The resulting representation is no longer unique, leading to many different representations of zero, and of the empty binary random-access list accordingly. Without such a unique representation, the isomorphisms described in this paper do not hold.
The datastructures described in this paper are instances of so-called Naperian functors [7], more commonly known as representable functors. By design, each of our datastructures is isomorphic to a functor of the form P → A, for some (fixed) type of positions P. Indeed, this is the key lookup-tabulate isomorphism that we use to calculate the different datastructures throughout this paper. Gibbons's work on Naperian functors was driven by describing APL and enforcing size invariants in multiple dimensions. Although quad trees built from binary numbers are briefly mentioned, the range of datastructures that can be calculated using positions built from binary numbers remains largely unexplored.
Nor are we the first to explore type isomorphisms. Di Cosmo gives an overview of the field in his survey article [5]; Hinze and James have previously shown how to adapt an equational reasoning style to type isomorphisms, using a few principles from category theory. Recent work on homotopy type theory [26], where isomorphic types are guaranteed to be equivalent, might facilitate some of the derivations done in this paper, especially when establishing that operations such as cons are respected across isomorphic implementations [17].
There is a great deal of literature on datatype-generic tries [10,12,13]. These tries exploit the same laws of exponentiation that we have used in this paper.
Typically, these tries are used for memoising computations, trading time for space, whereas this paper uses the same laws in a novel context: the derivation of datastructures. Datatype-generic tries have appeared in the context of dependent types when recognising languages [1,6] and memoising computations [24], but their usage is novel in this context.

12 Conclusion

A Deriving Operations
The definition of cons for one-two trees can be systematically inferred using the abstraction function, lookup, which maps one-two trees to finite maps. The types dictate that cons applied to a one-node returns a two-node, so we need to solve the equation cons x₀ (Node₁ x₁ l r) ≡ Node₂ x′₀ x′₁ l′ r′ in the unknowns x′₀, x′₁, l′, and r′. To determine the components of the two-node, we conduct a case analysis on the indices, the arguments of the finite maps.
The index i 2₂, for example, determines the third component, the left sub-tree.
The derivation works towards a situation where we can apply the specification of cons (2b).
The equation for two-nodes, cons x₀ (Node₂ x₁ x₂ l r) ≡ Node₁ x′₀ l′ r′, is more interesting to solve. Consider the index i 1₁ that determines the second component, the left sub-tree of the one-node. We need to conduct a further case distinction on i. Otherwise, Agda is not able to figure out the predecessor of i 1₁, that is, that the equation isucc i′ ≡ i 1₁ uniquely determines i′ for a given i. For the case analysis we use a combined Peano view, on binary numbers and on binary indices, dealing with the inductive case first.