Linear Numeral Systems

We investigate numeral systems in the lambda calculus, specifically in the linear lambda calculus, where terms cannot be copied or erased. Our interest is threefold: representing numbers in the linear calculus; finding constant time arithmetic operations, when possible, for successor, addition and predecessor; and finally, efficiently encoding subtraction, an operation that is problematic in many numeral systems. This paper defines systems that address these points, and in addition provides a characterisation of linear numeral systems.


Introduction
This paper is about numeral systems in the linear λ-calculus. By a numeral system we mean a mathematical notation (in our case linear λ-terms) for representing both numbers and arithmetic operations. By linear we mean that the λ-terms do not copy or erase arguments. For the purpose of this paper, the arithmetic operations that we are interested in are restricted to: successor, addition, predecessor, subtraction and a test for zero.
The λ-calculus is a model of computation that can represent both programs and data. Representing numbers is an essential step towards proving that the calculus is adequate for capturing all computable functions. When the λ-calculus was introduced, Church [2] gave what are now known as Church numerals, the best-known numeral system. This representation supports all the usual arithmetic operations (successor, addition, predecessor, etc.), but some operations are more efficient (in terms of β-reductions) than others. Specifically for Church numerals, the predecessor operation is costly; we give a summary later to show that other numeral systems can encode this operation more efficiently, but at the cost of a less efficient way of computing one of the other operations. Here we are interested in the linear λ-calculus. This is a proper subset of the usual calculus, where explicit copying and erasing (by using variables in the body of a term more than once or not at all, respectively) are not permitted. All functions must use their arguments exactly once, and as a consequence the calculus has extremely weak expressive power: all functions normalise in linear time. In the linear case, β-reduction is a constant time operation, so counting reductions gives a cost metric. It is important to note that if a function is definable in the linear λ-calculus it will be constant or linear time. This makes the linear λ-calculus an interesting framework for encoding arithmetic. However, it comes as no surprise that there is no notion of adequacy for linear numeral systems (the ability to represent all computable functions). The reason for this is that iteration and recursion are not linearly definable. This makes the definition of addition and subtraction a difficult problem, which we address in this paper.

Ian Mackie (i.mackie@sussex.ac.uk), University of Sussex, Brighton, UK
We explore the limits of this system. Once we have defined linear numerals and the arithmetic functions that are linearly definable, it is then possible to add non-linear functions to compute the remaining functions as usual. Our interest here is therefore:
- to define suitable representations (data-types) of numerals in the linear λ-calculus;
- and to define linear functions over these representations to compute:
  - successor, addition and predecessor that are constant time;
  - subtraction and a test for zero that are as efficient as possible.
It turns out that for all the systems we define, successor, addition and predecessor must in fact all be constant time (we cannot define anything less efficient). The test for zero and subtraction both require erasing, and this is where some cost is introduced. However, they are both linear time in the worst case. Indeed, erasing is one of the main issues that we must deal with in a linear setting. A test for zero returns a Boolean result (true or false), and consequently the number must be erased. A similar situation arises with subtraction, where part of a number must be erased. Consequently, neither of these operations can be constant time in a linear framework.
The main contributions in this paper are:
- a series of linear numeral systems, leading to a characterisation of linear numerals;
- constant time successor, predecessor and addition operators;
- subtraction that has cost min(m, n);
- a test for zero that is linear in the size of the term.

Related Work
Since the λ-calculus was introduced, there has been a great wealth of work on representing data, and specifically numeral systems. The λI-calculus [1], where each abstraction λx.t requires x to occur free in t, was used, and several systems have been adapted to work in this way.
There are many numeral systems in the literature (and many more used as exercises in λ-calculus courses). Apart from the best-known system of Church already mentioned, we find in the literature a system by Scott (first reported in [3]), and some so-called unusual ones by Wadsworth [10] (also developed by Böhm). With an emphasis on constant time operations, there is the work of Parigot and Rozière [8], which, however, does not focus on linearity, but is the work closest to ours in spirit.
From a very different perspective, and a focus on compact representation, there are binary representations [7] that give logarithmic size numbers. However, as we show later, these are not linearly definable. There is also the work of Böhm and Dezani (see for example [9]) where numbers are represented by non-terminating λ-terms, but again these are not possible in the linear λ-calculus.
Rezus [9] and Barendregt [1] contain several other systems that we will not enumerate here, where one of the main concerns is adequacy. Linear systems are never going to be adequate, but for us the focus is on time efficient representations. We provide a selection of systems. Some are closely based on existing systems (for example those defined by Rezus [9]), adapted to a linear setting, and some other ones are new.

Overview
The rest of this paper is structured as follows. We first recall some background concepts for the λ-calculus and numeral systems. In Sect. 3 we discuss ways of representing numbers generally, so that they can be applied to our setting. In Sect. 4 we define some linear numeral systems, specifically one of the main contributions of the paper: a linear system that allows constant time operations as well as subtraction. In Sect. 5 we give a characterisation of linear numeral systems. We discuss these systems and conclude in Sect. 6.

Background
We assume familiarity with standard λ-calculus concepts, including the notions of free variables (FV), substitution (t[u/x]), and β- and η-reduction. We refer the reader to [1,5] for additional background on these. To fix the syntax we start with a definition.

Definition 1 (Linear terms)
Assume an infinite set X of variables denoted by x, y, z, ..., then the set L of linear λ-terms is the least set satisfying:
- x ∈ L, for every x ∈ X;
- if t ∈ L and x ∈ FV(t), then λx.t ∈ L;
- if t ∈ L, u ∈ L and FV(t) ∩ FV(u) = ∅, then tu ∈ L.
In this definition, the free variable constraints ensure linearity of terms by construction. Application associates to the left and abstraction binds as far to the right as possible. Consequently we will economise on parentheses whenever possible. Examples of linear terms include I = λx.x, B = λxyz.x(yz), C = λxyz.xzy, and function composition t•u = Btu. Notable non-linear terms are S = λxyz.xz(yz) and K = λxy.x, which incorporate explicit copying and erasing of arguments respectively.
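The three clauses of Definition 1 can be checked mechanically. The following Python sketch is only an illustration: the tuple encoding of terms and the helper names (v, lam, app, free_vars, is_linear) are our own assumptions, not notation from this paper. It confirms that I, B and C are linear while S and K are not.

```python
from collections import Counter

# Illustrative term encoding: ("var", x) | ("lam", x, body) | ("app", fun, arg)
def v(x):
    return ("var", x)

def lam(*args):
    """lam("x", "y", body) builds λx y.body (abstraction binds to the right)."""
    *names, body = args
    for name in reversed(names):
        body = ("lam", name, body)
    return body

def app(f, *args):
    """app(f, a, b) builds (f a) b (application associates to the left)."""
    for a in args:
        f = ("app", f, a)
    return f

def free_vars(t):
    """Multiset of free-variable occurrences in t."""
    if t[0] == "var":
        return Counter([t[1]])
    if t[0] == "lam":
        fv = free_vars(t[2])
        fv.pop(t[1], None)
        return fv
    return free_vars(t[1]) + free_vars(t[2])

def is_linear(t):
    """Check the three clauses of Definition 1."""
    if t[0] == "var":
        return True
    if t[0] == "lam":
        # λx.t requires x ∈ FV(t); linearity of t makes that occurrence unique
        return is_linear(t[2]) and free_vars(t[2])[t[1]] == 1
    f, a = t[1], t[2]
    # tu requires FV(t) ∩ FV(u) = ∅
    disjoint = not (free_vars(f).keys() & free_vars(a).keys())
    return is_linear(f) and is_linear(a) and disjoint

I = lam("x", v("x"))
B = lam("x", "y", "z", app(v("x"), app(v("y"), v("z"))))
C = lam("x", "y", "z", app(v("x"), v("z"), v("y")))
S = lam("x", "y", "z", app(v("x"), v("z"), app(v("y"), v("z"))))
K = lam("x", "y", v("x"))

for name, term in [("I", I), ("B", B), ("C", C), ("S", S), ("K", K)]:
    print(name, is_linear(term))
```

Composition of closed linear terms stays linear, for instance is_linear(app(B, C, I)) holds.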
The λI-calculus [1] is an intermediate calculus that does not allow erasing, but does allow copying (this is analogous to relevance logic), and we could define an affine calculus that allows erasing but not copying. Thus the linear λ-calculus is the most constrained calculus with respect to resource usage, and is related to linear logic.
Every linear term is typeable, and an alternative way to write the variable constraints of Definition 1 is the following typing system, where ⊸ is linear implication (functions that use their arguments exactly once):
x : A ⊢ x : A
if Γ, x : A ⊢ t : B, then Γ ⊢ λx.t : A ⊸ B
if Γ ⊢ t : A ⊸ B and Δ ⊢ u : A, then Γ, Δ ⊢ tu : B
In this system the contexts Γ and Δ, which must be disjoint, are used to guarantee the linearity constraints. Types will not play much role in the following, but it is worth remembering that all linear terms are typeable.

Occasionally, it will be illuminating to see linear terms drawn as graphs. We define G(·) inductively over the structure of a linear term t. The general structure is given by a diagram, where FV(t) = {x1, ..., xn}. The three diagrams given in Fig. 1 show the translation, respectively, of a variable G(x), which is an edge; an abstraction G(λx.u), which introduces a new node λ to bind the unique occurrence of the variable x (which we assume is left-most); and finally an application G(uv), which introduces a new node @. To give the general idea of the graphs, we give two examples of closed terms, G((λx.x)(λx.x)) and G(λxy.yx), which are shown in Fig. 2.
Definition 2 (Reduction) There are two reduction rules:
Beta: (λx.t)u → t[u/x]
Eta: λx.tx → t
Linear β-reduction is a constant time operation (there are implementations where substitution has no cost). The number of β-reductions is therefore a reasonable measure of the cost of a computation. η-reduction does not need a side-condition, as there cannot be another occurrence of x in t. Linear terms are stable under β- and η-reduction: if t ∈ L and t → u, then u ∈ L. We occasionally mention η-reduction, but all reductions will be β-reductions unless explicitly labelled otherwise. The reflexive, transitive closure is denoted →*, and we write t →^n u meaning that t reduces to u in exactly n steps.
If no reduction can take place in a term, then the term is a normal form. Reduction in the linear λ-calculus is strongly normalising and confluent. Consequently, all terms have a normal form (and therefore also a head normal form). Moreover, all terms are (simply) typeable, and reduction preserves types.
We can now show a number of properties of terms and reduction that will be of use later. First, it is useful to characterise terms in the following way.
Lemma 1
1. Each term is a variable, an abstraction term or an application term.
2. Each application term is of the form t1 ... tn, where n ≥ 2 and t1 is not an application term.
3. Each abstraction term is of the form λx1 ... xn.t, where n ≥ 1 and t is not an abstraction term.
As an almost direct consequence of Lemma 1, we have the following result about the shape of normal forms. This is a standard result, but it is important for later results, so we provide some additional details here.
Lemma 2 Every closed linear term t in normal form has the shape λx1 ... xn.xi t1 ... tm, where n ≥ 1, 1 ≤ i ≤ n and m ≥ 0.
Proof Since t is closed it cannot be a variable. Neither can it be an application term, because if it were, then t1 in t1 ... tn would have to be an abstraction (it cannot be a variable, since t is closed, nor an application, by Lemma 1), and thus there would be a redex, contradicting that t is a normal form. Thus t must be an abstraction term, say λx1 ... xn.u, n ≥ 1. Since u cannot be an abstraction, it must either be a variable, in which case it must be one of the xi, 1 ≤ i ≤ n, or an application term v1 ... vm, m ≥ 2. Now v1 must be a variable, which again must be one of the xi, 1 ≤ i ≤ n, because if it were an abstraction the term would not be a normal form. Thus the only possibilities are instances of the above.
A convenient metric on linear terms is the size of the term.

Definition 3 (Size of linear term)
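We define |·| over the structure of terms:
|x| = 1
|λx.t| = |t| + 1
|tu| = |t| + |u| + 1
We can now glean some insight into the structure of linear terms, and monitor the structure (and consequently the size) of a term under reduction. The following properties are key to our ideas and have a strong bearing on the representation of numeral systems and on the operations that can be encoded.
Lemma 3 Let t be a linear term.
1. If t contains k applications and has i free variables, then |t| = 3k + 2 − i. In particular, a closed linear term has size 3k + 2.
2. If x ∈ FV(t), then |t[u/x]| = |t| + |u| − 1.
3. If t → u, then |u| = |t| − 3.
Proof 1. By induction on t. If t is a variable, then k = 0, i = 1 and |t| = 1. If t is an application uv, where u has k1 applications and FV(u) = {x1, ..., xm}, and v has k2 applications and FV(v) = {y1, ..., yn} (these sets are disjoint by linearity, so i = m + n), then by the induction hypothesis twice, we have |uv| = (3k1 + 2 − m) + (3k2 + 2 − n) + 1, which we can reorganise as 3(k1 + k2 + 1) + 2 − (m + n) as required. Finally, if t is an abstraction, say λx.u, where FV(u) = {x, x1, ..., xi}, then by the induction hypothesis we have |λx.u| = (3k + 2 − (i + 1)) + 1, which simplifies to 3k + 2 − i as required. The result follows because when t is closed, i = 0. 2. The unique occurrence of x in t has size 1, and it is replaced by u. 3. Each β-reduction (λx.t)u → t[u/x] removes an application, an abstraction and a variable: by part 2, |t[u/x]| = |t| + |u| − 1 = |(λx.t)u| − 3.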
As a corollary, reduction is easily seen to be strongly normalising. This result tells us something very important about representing numbers in the linear λ-calculus, and gives us some hints and a start in the process of characterising linear numeral systems that we will return to later in the paper. In particular, it tells us that any encoded number must be proportional in size to the number it represents, and that operations such as addition must be constant in time. Thus compact representations, such as binary numerals, cannot be defined as each successor cannot be represented by a constant operation. We will return to this later in the paper.
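Lemma 3 can also be checked on examples. The sketch below uses an illustrative tuple encoding of terms and a leftmost-outermost reducer that gives binders fresh names at each β-step (both are our own assumptions, not the paper's implementation). It checks |t| = 3k + 2 − i on a few closed and open terms, and records the size at every step of reducing B C I, where each step should remove exactly 3.

```python
import itertools

# Illustrative term encoding: ("var", x) | ("lam", x, body) | ("app", fun, arg)
_fresh = itertools.count()

def v(x):
    return ("var", x)

def lam(*args):
    """lam("x", "y", body) builds λx y.body."""
    *names, body = args
    for name in reversed(names):
        body = ("lam", name, body)
    return body

def app(f, *args):
    """app(f, a, b) builds (f a) b."""
    for a in args:
        f = ("app", f, a)
    return f

def rename(t, env):
    """Give every binder in t a fresh name, so substitution cannot capture."""
    if t[0] == "var":
        return ("var", env.get(t[1], t[1]))
    if t[0] == "lam":
        new = f"#{next(_fresh)}"
        return ("lam", new, rename(t[2], {**env, t[1]: new}))
    return ("app", rename(t[1], env), rename(t[2], env))

def subst(t, x, u):
    """t[u/x]; safe because the binders of t were freshened by rename()."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def step(t):
    """One leftmost-outermost beta step, or None if t is in normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return subst(rename(f[2], {}), f[1], a)
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def size(t):
    """Definition 3."""
    if t[0] == "var":
        return 1
    if t[0] == "lam":
        return size(t[2]) + 1
    return size(t[1]) + size(t[2]) + 1

def apps(t):
    """Number of applications k in t."""
    if t[0] == "var":
        return 0
    if t[0] == "lam":
        return apps(t[2])
    return apps(t[1]) + apps(t[2]) + 1

def free(t):
    """Set of free variables of t."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free(t[2]) - {t[1]}
    return free(t[1]) | free(t[2])

def sizes_along_reduction(t):
    """Size of t and of each term on its leftmost-outermost reduction."""
    sizes = [size(t)]
    while (t := step(t)) is not None:
        sizes.append(size(t))
    return sizes

I = lam("x", v("x"))
B = lam("x", "y", "z", app(v("x"), app(v("y"), v("z"))))
C = lam("x", "y", "z", app(v("x"), v("z"), v("y")))
examples = [I, B, C, app(B, C, I), lam("x", app(v("x"), v("y")))]

for t in examples:
    print(size(t), 3 * apps(t) + 2 - len(free(t)))
print(sizes_along_reduction(app(B, C, I)))
```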
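We recall now two standard notations for repeated applications: f^n x denotes f(f(⋯(f x)⋯)) with n occurrences of f (so f^0 x = x), and t u^{∼n} denotes t u ⋯ u with n occurrences of u (so t u^{∼0} = t).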
An important structure that will be used throughout the paper is the following representation of a list of terms, suitably adapted to the linear λ-calculus: ⟨t1, ..., tn⟩ = λz.z t1 ... tn, where z does not occur free in any ti.
In particular, we note that the empty sequence is ⟨⟩ = λz.z, and the singleton is ⟨t⟩ = λz.zt. Function composition (t • u = Btu) gives concatenation of sequences in four β-reductions:
⟨t1, ..., tn⟩ • ⟨u1, ..., um⟩ →^4 ⟨u1, ..., um, t1, ..., tn⟩.
If we were interested in preserving the order of the elements in the sequence, then we could define a variant of this using B′ = λxyz.y(xz) rather than B. However, we will see that the use we make of this result always joins sequences of the same elements, so the results are equivalent.
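The four-step concatenation, and the order reversal it causes, can be observed with a small normal-order reducer. This is a sketch under our own assumptions: terms are Python tuples, the reducer is leftmost-outermost with fresh renaming of binders, the sequence elements are the free variables a, b, c so that their order is visible, and B_prime stands for the order-preserving variant λxyz.y(xz).

```python
import itertools

# Illustrative term encoding: ("var", x) | ("lam", x, body) | ("app", fun, arg)
_fresh = itertools.count()

def v(x):
    return ("var", x)

def lam(*args):
    """lam("x", "y", body) builds λx y.body."""
    *names, body = args
    for name in reversed(names):
        body = ("lam", name, body)
    return body

def app(f, *args):
    """app(f, a, b) builds (f a) b."""
    for a in args:
        f = ("app", f, a)
    return f

def rename(t, env):
    """Give every binder in t a fresh name, so substitution cannot capture."""
    if t[0] == "var":
        return ("var", env.get(t[1], t[1]))
    if t[0] == "lam":
        new = f"#{next(_fresh)}"
        return ("lam", new, rename(t[2], {**env, t[1]: new}))
    return ("app", rename(t[1], env), rename(t[2], env))

def subst(t, x, u):
    """t[u/x]; safe because the binders of t were freshened by rename()."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def step(t):
    """One leftmost-outermost beta step, or None if t is in normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return subst(rename(f[2], {}), f[1], a)
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def normalize(t):
    """Return (normal form, number of beta steps); terminates on linear terms."""
    steps = 0
    while (r := step(t)) is not None:
        t, steps = r, steps + 1
    return t, steps

def alpha_eq(s, t):
    """Equality up to renaming of bound variables (via de Bruijn indices)."""
    def db(t, ctx):
        if t[0] == "var":
            return ("bv", ctx.index(t[1])) if t[1] in ctx else ("fv", t[1])
        if t[0] == "lam":
            return ("lam", db(t[2], (t[1],) + ctx))
        return ("app", db(t[1], ctx), db(t[2], ctx))
    return db(s, ()) == db(t, ())

def seq(*items):
    """⟨t1, ..., tn⟩ = λz. z t1 ... tn (z must not occur free in the items)."""
    return lam("z", app(v("z"), *items))

B = lam("x", "y", "z", app(v("x"), app(v("y"), v("z"))))        # λxyz.x(yz)
B_prime = lam("x", "y", "z", app(v("y"), app(v("x"), v("z"))))  # λxyz.y(xz)

a, b, c = v("a"), v("b"), v("c")
joined, n = normalize(app(B, seq(a, b), seq(c)))          # ⟨a, b⟩ • ⟨c⟩
joined2, n2 = normalize(app(B_prime, seq(a, b), seq(c)))
print(n, alpha_eq(joined, seq(c, a, b)))    # order of the blocks reversed
print(n2, alpha_eq(joined2, seq(a, b, c)))  # order preserved
```

Both versions take four β-steps whatever the lengths of the sequences; only the order of the two blocks differs.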
The following is a standard definition of a numeral system that we have adapted for the linear case. We restrict to the case where each numeral is a term in normal form, as the linear λ-calculus is terminating.
Definition 6 (Linear numeral system) A numeral system is a sequence of λ-terms d = d0, d1, ..., such that
1. each dn is a closed term in normal form;
2. there exist terms S (representing successor) and Z (representing the test for zero), and two different terms T and F (representing the Booleans true and false), that satisfy S dn →* d_{n+1}, Z d0 →* T and Z d_{n+1} →* F.
Furthermore, we call a numeral system linear when:
3. each dn is a closed linear term;
4. the terms S, Z, T and F are linear;
5. addition, predecessor and subtraction are all linearly definable.
The first two points in the above definition are enough to give an adequate numeral system in the full λ-calculus, but we need iteration or recursion to build the arithmetic functions. Because neither iteration nor recursion is linearly definable, it is not possible to get an adequate linear system. Therefore we have chosen to include in the definition of a linear numeral system the requirement to support linear addition, predecessor and subtraction. As a consequence, several systems that we put forward in this paper are not linear numeral systems according to this definition, and we call these systems 'candidates'. Thus, in some candidate systems we present, the numbers may be represented by linear terms, but the arithmetic functions would be non-linear terms.
To summarise, in this paper we are interested in the following arithmetic operations: successor, addition, predecessor, subtraction and test for zero, because these are linearly definable.

Representing Numbers
We can represent numbers as algebraic terms (i.e., first order syntax). But the introduction of binders (i.e., higher order) gives some new ideas. We briefly recall some of these ideas in this section, and then review some well-known numeral systems in the (non-linear) λ-calculus.

Numbers
The standard inductive data type for numbers requires a zero (0) and a successor (S). We can then write numbers as 0, S(0), S(S(0)), etc., and we can write recursive definitions over this data type. For example, addition and subtraction can be defined as:
add 0 n = n
add (S m) n = S (add m n)
sub m 0 = m
sub (S m) (S n) = sub m n
where sub is a partial function. Typically, there is more than one way to write such functions; for example, the second case of addition can also be defined as add (S m) n = add m (S n). Other alternatives can do recursion on the second argument, etc. All these variants give the same answer, but some require (a lot) more computation steps than others. With this inductive data type, add has linear complexity: it depends on the first argument. So for large n, add 0 n is a very different computation from add n 0. Consequently, it is difficult to know the effect of commutativity of operations. Subtraction is also a linear function: it depends on the number being subtracted. However, the definition is more complicated, pattern-matching on both arguments, and it only works when the second number is no larger than the first (because we do not have negative numbers). In this notation, we can write a predecessor function and a test for zero both as constant time functions.
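For contrast with the λ-encodings that follow, here is a small Python sketch of the inductive definitions above, instrumented to count how many defining equations are applied. The encoding of numbers as nested tuples and this cost measure are our own illustrative choices.

```python
# Unary numbers as an inductive data type: ("0",) or ("S", n)
ZERO = ("0",)

def succ(n):
    return ("S", n)

def from_int(k):
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n

def to_int(n):
    k = 0
    while n[0] == "S":
        n, k = n[1], k + 1
    return k

def add(m, n):
    """add 0 n = n;  add (S m) n = S (add m n).  Returns (result, equations used)."""
    if m[0] == "0":
        return n, 1
    r, cost = add(m[1], n)
    return succ(r), cost + 1

def sub(m, n):
    """sub m 0 = m;  sub (S m) (S n) = sub m n.  Partial: undefined when m < n."""
    if n[0] == "0":
        return m, 1
    if m[0] == "0":
        raise ValueError("sub m n is undefined when m < n")
    r, cost = sub(m[1], n[1])
    return r, cost + 1

print(add(from_int(0), from_int(50))[1], add(from_int(50), from_int(0))[1])
```

The two additions compute the same number, but the cost follows the first argument only, which is the asymmetry discussed above.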
The reason that add is so inefficient is that we are essentially concatenating two lists of successors. It is well known that we can do concatenation much better than this: constant time append operations can be achieved if we just keep a pointer to the start and end of each list. This idea is easy to implement in languages with binders, in our case λ-terms. Specifically, we can represent numbers as λx.S^n x, so 0 = λx.x, 1 = λx.Sx, 2 = λx.S(Sx), etc. In this representation, there is a constant time addition operation (four β-reductions), which is just composition: m • n →^4 m + n. Equally important, n • m →^4 n + m also, so commutativity does not change the cost of computation. So if we can find a representation of S as a λ-term, then we have an efficient numeral system. In the λ-calculus, a λ can be thought of as a constructor and an application as a destructor (dually, we can also think of application as the constructor, and the λ as the destructor). Thus we have essentially a function/constructor system. We see later that Church numerals work in this way, so they have a constant time addition, but Scott numerals do not, because they correspond more closely to the inductive data type given at the start of this section. Thus, the complexity of the addition operation depends crucially on the chosen representation.
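A minimal Python sketch of this idea, assuming a free constructor S and numerals represented as closures λx.S^n x. Python evaluates eagerly, so this only illustrates the shape of the encoding (addition is composition, and composing in either order gives the same numeral); it does not model the four β-step cost.

```python
def S(x):
    """A free successor constructor, as in the inductive data type."""
    return ("S", x)

def numeral(n):
    """λx. S^n x as a Python closure: the argument x marks the end of the chain."""
    def chain(x):
        for _ in range(n):
            x = S(x)
        return x
    return chain

def compose(t, u):
    """t • u = λz. t (u z): addition is just composition."""
    return lambda z: t(u(z))

def to_int(term):
    """Count the S constructors in front of the base value."""
    k = 0
    while isinstance(term, tuple) and term[0] == "S":
        term, k = term[1], k + 1
    return k

five = compose(numeral(2), numeral(3))
print(to_int(five("0")), five("0") == compose(numeral(3), numeral(2))("0"))
```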
What about the predecessor and subtraction functions? These bring additional issues. Consider first the predecessor: pred 0 = 0, pred (Sn) = n. With this definition, we can show pred(Sx) = x, but we cannot simplify S(pred x). This is a problem for constructor systems, and also an issue for us. From these examples we conclude that constant time functions should be possible in some cases.
If addition is like concatenation of lists, then subtraction is like splitting a list. Consequently part of the list will need to be erased, which is not straightforward in a linear system.
There are other ways to represent numbers as λ-terms, for example binary representations (see for example [7]). However, the results of this paper show that these representations are not possible in the linear λ-calculus, and we discuss this topic in more detail in Sect. 5. We next review four well-known numeral systems in the λ-calculus. None of these are linear but we will refer to them later for inspiration.
We assume for the rest of this section the usual (non-linear) λ-calculus, and standard Booleans: T = λx y.x, F = λx y.y. We write succ, add, pred, sub, and zero to represent the successor, addition, predecessor, subtraction and test for zero.

Church Numerals
Church numerals [2] encode numbers with repeated application: n = λx f. f^n x.
Each number is an iterator: one binder (f) represents the iterated term, and the other (x) is a place-holder at the end of the list of applications. Consequently, Church numerals can be seen as lists with a pointer to the end of the list, and thus we can predict that an efficient addition should be possible in this representation, as discussed in Sect. 3.1. The arithmetic operators can be defined by:
There are a number of alternative definitions of these operations. For example, the successor function can also be defined as succ = λnxf.n(fx)f. The difference between the two is where we add the extra iteration: at the front or at the end of the list.
An alternative addition is add = λmn.mn succ. This is a lot more expensive, as it iterates the successor function m times. The addition given above is constant time, however, because we have access to the end of the list: the applications of n are plugged into the place-holder at the end of m, giving λx f. f^m(f^n x). We depict this, showing just the applications for the numerals, and the concatenation by a dotted line. This operation is commutative, so add a b = add b a, and moreover the result is obtained with the same number of reductions.
There are other alternatives for predecessor, for instance using pairs, but we will need iteration or recursion for this one. Church numerals are interesting because a wealth of functions can be defined without using recursion, since they have iteration built in. They offer a compact representation, and operations like addition can be efficient (constant time). But on the down side, predecessor, and therefore subtraction, are very expensive.
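A small sketch of Church numerals as Python closures, in the argument order λx f. f^n x used here. The main definitions are not reproduced above, so succ (λnxf.f(nxf)), the constant-time add (λmnxf.m(nxf)f) and zero (λn.nT(λx.F)) below are our assumptions of standard definitions for this argument order; succ_alt and add_iter are the alternatives quoted in the text.

```python
def church(n):
    """Build λx f. f^n x as a curried Python closure (base x first, then f)."""
    def num(x):
        def iterate(f):
            acc = x
            for _ in range(n):
                acc = f(acc)
            return acc
        return iterate
    return num

def to_int(n):
    """Read a numeral back by iterating +1 from 0."""
    return n(0)(lambda k: k + 1)

T = lambda x: lambda y: x
F = lambda x: lambda y: y

# Assumed main successor: λn x f. f (n x f), adding f at the outer end
succ = lambda n: lambda x: lambda f: f(n(x)(f))
# Variant from the text: λn x f. n (f x) f, adding f at the inner end
succ_alt = lambda n: lambda x: lambda f: n(f(x))(f)
# Assumed constant-time addition: λm n x f. m (n x f) f
add = lambda m: lambda n: lambda x: lambda f: m(n(x)(f))(f)
# Alternative from the text: λm n. m n succ, which applies succ m times
add_iter = lambda m: lambda n: m(n)(succ)
# Assumed zero test: λn. n T (λx. F)
zero = lambda n: n(T)(lambda _: F)

print(to_int(add(church(2))(church(3))), to_int(add_iter(church(2))(church(3))))
```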

Wadsworth Systems/Böhm Systems
Wadsworth [10] presented a selection of (so-called unusual) numeral systems, based on a chain of λ-abstractions followed by a permutation of the variables. Some of these systems were also developed by Böhm. In one of these systems, n is represented as a permutation of n + 2 variables: n = λx1 x2 ... x_{n+2}.x1 x2 ... xn x_{n+2} x_{n+1}. Interesting for us is that this representation is linear. However, we lose linearity with some of the operations. This representation has a very efficient predecessor, but unfortunately the addition operation requires a recursive function (and hence a fixed point combinator). We also remark that we need two copies of the number for the zero test to trigger an η-collapse: this is how all the abstractions can be erased. Subtraction also needs to be defined as a recursive function.

Standard Numerals
Barendregt [1] introduced the standard numerals, which represent numbers using pairs. Their operations are all very simple, especially the test for zero and subtraction. But addition needs general recursion, and thus becomes an expensive operation.
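As an illustration, here is a Python sketch of the standard numerals using the definitions as we recall them from Barendregt's book (assumed here, since the formulas are not reproduced above): 0 = I, n + 1 = ⟨F, n⟩ with ⟨M, N⟩ = λz.zMN, succ = λx.⟨F, x⟩, pred = λx.xF and zero = λx.xT. The recursive add uses Python recursion as a stand-in for a fixed point combinator, which is exactly the machinery a linear system lacks.

```python
I = lambda x: x
T = lambda x: lambda y: x
F = lambda x: lambda y: y

def pair(a, b):
    """⟨a, b⟩ = λz. z a b."""
    return lambda z: z(a)(b)

ZERO = I                       # 0 = I
succ = lambda n: pair(F, n)    # succ = λx.⟨F, x⟩
pred = lambda n: n(F)          # pred = λx.x F
is_zero = lambda n: n(T)       # zero = λx.x T

def to_bool(b):
    return b(True)(False)

def from_int(k):
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n

def to_int(n):
    k = 0
    while not to_bool(is_zero(n)):
        n, k = pred(n), k + 1
    return k

def add(m, n):
    """Needs general recursion; Python recursion stands in for a fixed point."""
    return n if to_bool(is_zero(m)) else succ(add(pred(m), n))

print(to_int(add(from_int(2), from_int(3))))
```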

Scott Numerals
Scott numerals [3], like standard numerals, have an efficient predecessor operation. In addition, although the terms are not linear, they do not duplicate variables.
Scott numerals are given by the following idea: d0 = λxy.x, d1 = λxy.y(λxy.x), d2 = λxy.y(λxy.y(λxy.x)), ..., so each number is represented in a similar way to an inductive data type. In this representation one can define operations:
Successor and predecessor are simple constant time operations, but there is no simple addition or subtraction. To encode the latter, we need iteration or recursion, which are not part of a linear system. Alternatively, for addition, we would need a pointer to the end of the list, which again is not part of this definition.
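Similarly, a Python sketch of Scott numerals with d0 = λxy.x and d_{n+1} = λxy.y dn. The operations below are our assumption of the usual definitions (succ = λnxy.yn, pred = λn.n d0 I, zero = λn.nT(λm.F)), since the list above is not reproduced; add again needs recursion, here borrowed from Python.

```python
I = lambda x: x
T = lambda x: lambda y: x
F = lambda x: lambda y: y

d0 = lambda x: lambda y: x                  # d0 = λxy.x
succ = lambda n: lambda x: lambda y: y(n)   # assumed: λn x y. y n
pred = lambda n: n(d0)(I)                   # assumed: λn. n d0 I (so pred 0 = 0)
zero = lambda n: n(T)(lambda _: F)          # assumed: λn. n T (λm. F)

def from_int(k):
    n = d0
    for _ in range(k):
        n = succ(n)
    return n

def to_int(n):
    k = 0
    while zero(n) is F:
        n, k = pred(n), k + 1
    return k

def add(m, n):
    """No pointer to the end of the list, so addition recurses on m."""
    return n if zero(m) is T else succ(add(pred(m), n))

print(to_int(add(from_int(2), from_int(3))))
```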

Summary
All of the above systems are adequate for representing all computable functions. Each system has operations that are efficient; for example, addition is a constant time operation using Church numerals, and predecessor is a constant time operation for Scott numerals. However, no system is efficient for all operations. Specifically:
- Each numeral system presented has a good feature, for example constant time addition (Church), constant time predecessor (Scott, Standard), efficient subtraction (Standard).
- Each system presented also has a bad feature: Scott numerals need a recursive function to encode addition, and the predecessor in Church numerals (and therefore subtraction) is a very complicated and expensive operation.
- Many of the systems cannot encode an operation without the need for recursion machinery. Church numerals have the advantage here, as they have a built-in iterator.
- No system presented is linear, and no system has subtraction.
Our aim is therefore to understand how these numeral systems manage certain operations, so that we can pick the best features of each where possible.

Linear Numeral Systems
We now limit ourselves to the linear λ-calculus, and give several possible representations of numbers. Linear numeral systems are not new: some of the earliest systems were linear, because of the popularity of the λI-calculus. However, the novelty here is that we want to characterise linear systems, and in particular look at efficient ways of computing the test for zero, and also subtraction. Before we can define any, we need some ideas to simulate erasing and to represent Booleans.

Linear Erasing and Booleans.
Copying can never be simulated in the linear λ-calculus because of Lemma 3, Part 3: reduction reduces the size of the term, so there is no way to build a term of arbitrary size. However, it is possible to simulate erasing. There are two ideas that we will make use of to simulate erasing in the linear λ-calculus. The first is inspired by the solvability result for the λ-calculus [1], and is based on consuming data. The second is based on pairing an answer together with the unwanted (garbage) part of the data. We first explain these in more detail and then prove the associated properties.
1. Erasing by consuming. A linear β-reduction erases a λ, an application and a variable, so the size of the term reduces by 3, as shown previously. If we organise our data accordingly, we can use this idea to erase terms. For a trivial example, consider (λx.x)(λy.y). This term reduces to λy.y, where λx.x has been consumed (erased). The challenge therefore is to see if we can do this for any term t: can we find a context C[ ] such that C[t] →* I, thus erasing t?
2. Erasing by pairing with garbage. We can simulate the non-linear term K = λxy.x by first defining the linear term K′ = λxy.⟨x, y⟩. When we apply this to two arguments, it reduces to a pair: K′tu →^2 ⟨t, u⟩. We can think of this as the result, t, together with the garbage that has yet to be erased, u. We can extend this idea so that we can accumulate all the garbage until the end of the computation, and the answer we require is the first component. This approach has previously been studied by Klop [6].
The second approach has some drawbacks. It requires that our functions return pairs. For example, if n > 0, then the test for zero may be like this: zero n = ⟨F, n⟩, where F is the result we want, and n is the number that we need to erase. When we want to use the result of this computation, we need to extract the first component of the pair, and carry around the unwanted part, potentially adding to it with further garbage. This approach is therefore possible, but not ideal. We will use it temporarily in defining systems, but try to find ways to eliminate the extra collected garbage. We will not declare that a system has a linearly definable test for zero if this is the only way it can be encoded.
The first approach requires a lot of reduction steps, but can be justified if we consider that this is doing nothing other than explicit garbage collection. Erasing by consuming is based on the following result, which is a variant of solvability. It states that a simple applicative context of identity functions will always cause a closed linear term to reduce to the identity function I . This gives an important observation that linear terms are solvable by linear contexts:

Lemma 5 (Erasing) Let t be a closed linear term. There exists a number n such that t I^{∼n} →* I. Moreover, there is a least such number n.
Proof Since t is terminating, we assume without loss of generality that t is a normal form. We proceed by induction on the size of t. By Lemma 2, the closed linear term t has a normal form of the shape: λx 1 . . . x n .x i t 1 . . . t m . If n = 1 and m = 0 (so t = λx.x) then there is nothing to do, otherwise we consider two cases: head redex we get a term that is strictly smaller than t. We can normalise it and conclude by induction. 2. Otherwise, n = 1 and m > 0 and we can construct t I , which reduces in one step to I t 1 . . . t m . If we now reduce the head redex we have a smaller term again. We can normalise it and then conclude by induction.
We give an example. Let t = λx.x I I I I; then this term can be erased by building t I^{∼1}, i.e., by using the context C[ ] = [ ]I. It is then easy to see that C[t] = (λx.x I I I I)I →* I. We will say there is a β-collapse for a term t if there is a context C[ ] that causes the term t to reduce to the identity. Once we have the context, since the size of the term is known, we know how many steps are needed to normalise it.
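The least n of Lemma 5 can be found by brute-force search with a normal-order reducer. In this sketch the tuple encoding, the leftmost-outermost strategy with fresh renaming of binders, and the search bound are our own assumptions. The example term λx.xIIII needs one I, while B and C need two and three respectively.

```python
import itertools

# Illustrative term encoding: ("var", x) | ("lam", x, body) | ("app", fun, arg)
_fresh = itertools.count()

def v(x):
    return ("var", x)

def lam(*args):
    """lam("x", "y", body) builds λx y.body."""
    *names, body = args
    for name in reversed(names):
        body = ("lam", name, body)
    return body

def app(f, *args):
    """app(f, a, b) builds (f a) b."""
    for a in args:
        f = ("app", f, a)
    return f

def rename(t, env):
    """Give every binder in t a fresh name, so substitution cannot capture."""
    if t[0] == "var":
        return ("var", env.get(t[1], t[1]))
    if t[0] == "lam":
        new = f"#{next(_fresh)}"
        return ("lam", new, rename(t[2], {**env, t[1]: new}))
    return ("app", rename(t[1], env), rename(t[2], env))

def subst(t, x, u):
    """t[u/x]; safe because the binders of t were freshened by rename()."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def step(t):
    """One leftmost-outermost beta step, or None if t is in normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return subst(rename(f[2], {}), f[1], a)
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def normalize(t):
    """Return (normal form, number of beta steps); terminates on linear terms."""
    steps = 0
    while (r := step(t)) is not None:
        t, steps = r, steps + 1
    return t, steps

def alpha_eq(s, t):
    """Equality up to renaming of bound variables (via de Bruijn indices)."""
    def db(t, ctx):
        if t[0] == "var":
            return ("bv", ctx.index(t[1])) if t[1] in ctx else ("fv", t[1])
        if t[0] == "lam":
            return ("lam", db(t[2], (t[1],) + ctx))
        return ("app", db(t[1], ctx), db(t[2], ctx))
    return db(s, ()) == db(t, ())

I = lam("x", v("x"))
B = lam("x", "y", "z", app(v("x"), app(v("y"), v("z"))))
C = lam("x", "y", "z", app(v("x"), v("z"), v("y")))

def least_erasing(t, bound=20):
    """Least n such that t I^n ->* I (Lemma 5), searched up to `bound`."""
    for n in range(bound + 1):
        nf, _ = normalize(app(t, *[I] * n))
        if alpha_eq(nf, I):
            return n
    return None

example = lam("x", app(v("x"), I, I, I, I))  # λx.x I I I I
for name, t in [("example", example), ("I", I), ("B", B), ("C", C)]:
    print(name, least_erasing(t))
```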
Using this idea to cause a β-collapse, we can define some data types. For example, inspired by the encoding of Booleans in the λI -calculus [1], we can define linear Booleans.

Definition 7 (Linear Booleans)
For n ≥ 0, we define Bn = (Tn, Fn), where Tn = λxy.y I^{∼n} x and Fn = λx.x I^{∼n}.
We remark that Fn = λx.x I^{∼n} =η λxy.x I^{∼n} y. Using Lemma 5, we have:

Lemma 6 For linear λ-terms t and u, and for some n ≥ 0, if t I^{∼n} →* I and u I^{∼n} →* I, then for all m ≥ n: Tm t u →* t and Fm t u →* u.
Thus, if we have enough I's, we can erase the part of the conditional that is not needed in the result. We now have all we need to start defining linear numeral systems.
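A sketch of the linear Booleans of Definition 7, checked with an illustrative normal-order reducer (our own encoding and strategy). B is erased by two I's and C by three, so from m = 3 onwards the conditional keeps one branch and consumes the other, as Lemma 6 predicts.

```python
import itertools

# Illustrative term encoding: ("var", x) | ("lam", x, body) | ("app", fun, arg)
_fresh = itertools.count()

def v(x):
    return ("var", x)

def lam(*args):
    """lam("x", "y", body) builds λx y.body."""
    *names, body = args
    for name in reversed(names):
        body = ("lam", name, body)
    return body

def app(f, *args):
    """app(f, a, b) builds (f a) b."""
    for a in args:
        f = ("app", f, a)
    return f

def rename(t, env):
    """Give every binder in t a fresh name, so substitution cannot capture."""
    if t[0] == "var":
        return ("var", env.get(t[1], t[1]))
    if t[0] == "lam":
        new = f"#{next(_fresh)}"
        return ("lam", new, rename(t[2], {**env, t[1]: new}))
    return ("app", rename(t[1], env), rename(t[2], env))

def subst(t, x, u):
    """t[u/x]; safe because the binders of t were freshened by rename()."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def step(t):
    """One leftmost-outermost beta step, or None if t is in normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return subst(rename(f[2], {}), f[1], a)
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def normalize(t):
    """Return (normal form, number of beta steps); terminates on linear terms."""
    steps = 0
    while (r := step(t)) is not None:
        t, steps = r, steps + 1
    return t, steps

def alpha_eq(s, t):
    """Equality up to renaming of bound variables (via de Bruijn indices)."""
    def db(t, ctx):
        if t[0] == "var":
            return ("bv", ctx.index(t[1])) if t[1] in ctx else ("fv", t[1])
        if t[0] == "lam":
            return ("lam", db(t[2], (t[1],) + ctx))
        return ("app", db(t[1], ctx), db(t[2], ctx))
    return db(s, ()) == db(t, ())

I = lam("x", v("x"))
B = lam("x", "y", "z", app(v("x"), app(v("y"), v("z"))))
C = lam("x", "y", "z", app(v("x"), v("z"), v("y")))

def T(n):
    """T_n = λx y. y I^n x: the second argument is consumed, the first kept."""
    return lam("x", "y", app(v("y"), *[I] * n, v("x")))

def F(n):
    """F_n = λx. x I^n: the first argument is consumed, the second kept."""
    return lam("x", app(v("x"), *[I] * n))

for m in (3, 4, 5):
    keeps_c = alpha_eq(normalize(app(T(m), C, B))[0], C)
    keeps_b = alpha_eq(normalize(app(F(m), C, B))[0], B)
    print(m, keeps_c, keeps_b)
```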

Numbers
We begin by presenting two numeral systems. The first one is based on Church sequences. Some of the ideas for this system are due to Böhm, and studied in detail in Rezus [9]; the main difference is that we need all the operations to be defined as linear terms. The second system is a variant of Scott numerals where we have access to the end of the list. We examine the shortcomings of each system, then conclude the section by introducing a third system which is a combination of the first two, where we are able to define all the operations.
In the following we write S for successor, A for addition, P for predecessor, M for minus (subtraction), and Z for test-for-zero. We also annotate the names of the functions with underlining and over-lining to distinguish between the first two systems, for example S̲ and S̄.

First Candidate
Our first linear numeral system candidate uses linear sequences, where each element of the sequence is I: n = ⟨I, ..., I⟩ = λz.z I^{∼n}, with n copies of I, so 0 = λz.z, 1 = λz.zI, 2 = λz.zII, and so on. It is easy to see that each numeral is represented by a linear term. We can define the following linear arithmetic operations:
Definition 8 Successor, addition, predecessor and test for zero are defined as follows:
There are some minor variants possible for some of these operations, for instance S = λxy.x(yI) and A = λxyz.y(xz). However, they have no impact whatsoever on the efficiency of the system (the same number of reductions is needed in each case). The key observation is that a number is represented by a chain of applications with access to the first and last element, which is why there are two variants for successor: adding to the left- or the right-hand side of this chain of applications, in a similar way as shown for Church numerals in Sect. 3.2.
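Using the variant operations quoted above (S = λxy.x(yI) and A = λxyz.y(xz)), the following sketch checks the first candidate with an illustrative leftmost-outermost reducer: successor takes 2 β-steps and addition 4, independently of the numerals, while erasing n by applying it to one I takes n + 1 steps. The helper names and term encoding are our own; P and Z are omitted because their definitions are not reproduced above.

```python
import itertools

# Illustrative term encoding: ("var", x) | ("lam", x, body) | ("app", fun, arg)
_fresh = itertools.count()

def v(x):
    return ("var", x)

def lam(*args):
    """lam("x", "y", body) builds λx y.body."""
    *names, body = args
    for name in reversed(names):
        body = ("lam", name, body)
    return body

def app(f, *args):
    """app(f, a, b) builds (f a) b."""
    for a in args:
        f = ("app", f, a)
    return f

def rename(t, env):
    """Give every binder in t a fresh name, so substitution cannot capture."""
    if t[0] == "var":
        return ("var", env.get(t[1], t[1]))
    if t[0] == "lam":
        new = f"#{next(_fresh)}"
        return ("lam", new, rename(t[2], {**env, t[1]: new}))
    return ("app", rename(t[1], env), rename(t[2], env))

def subst(t, x, u):
    """t[u/x]; safe because the binders of t were freshened by rename()."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def step(t):
    """One leftmost-outermost beta step, or None if t is in normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return subst(rename(f[2], {}), f[1], a)
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def normalize(t):
    """Return (normal form, number of beta steps); terminates on linear terms."""
    steps = 0
    while (r := step(t)) is not None:
        t, steps = r, steps + 1
    return t, steps

def alpha_eq(s, t):
    """Equality up to renaming of bound variables (via de Bruijn indices)."""
    def db(t, ctx):
        if t[0] == "var":
            return ("bv", ctx.index(t[1])) if t[1] in ctx else ("fv", t[1])
        if t[0] == "lam":
            return ("lam", db(t[2], (t[1],) + ctx))
        return ("app", db(t[1], ctx), db(t[2], ctx))
    return db(s, ()) == db(t, ())

I = lam("x", v("x"))

def num(n):
    """First candidate: n = ⟨I, ..., I⟩ = λz. z I^n."""
    return lam("z", app(v("z"), *[I] * n))

S = lam("x", "y", app(v("x"), app(v("y"), I)))              # λxy.x(yI)
A = lam("x", "y", "z", app(v("y"), app(v("x"), v("z"))))    # λxyz.y(xz)

for n in range(4):
    nf, k = normalize(app(S, num(n)))
    print(f"S {n}: {k} steps, correct = {alpha_eq(nf, num(n + 1))}")
    nf, k = normalize(app(A, num(n), num(2)))
    print(f"A {n} 2: {k} steps, correct = {alpha_eq(nf, num(n + 2))}")
    nf, k = normalize(app(num(n), I))   # erasing n with a single I
    print(f"{n} I: {k} steps, collapses to I = {alpha_eq(nf, I)}")
```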

Proposition 1 The arithmetic functions compute the expected results:
Proof We check three of these.
One nice aspect of this system is the test for zero: any number will collapse to the identity function when applied to just one I, that is, n I →* I, so the erasing of the number is easily simulated.

Corollary 1
1. S, A and P are all constant time operations.
2. Z is not a constant time operation (there exists no constant time Z, since the number must be erased).
Although this system meets many of our requirements, it fails on one aspect: subtraction. The situation is not recoverable because of Lemma 5:
Proposition 2 There is no linear term M such that M m n →* m − n.
Proof Assume that there is a term M; then by Lemma 2 this term must have the normal form:
Interestingly, there is a linear term Mn such that m • Mn →* m − n. This is given by M0 = λx.x and M_{n+1} = λyx.x(Mn y). This has a cost linear in n, and it is one of the very best subtractions for numeral systems. However, it is not so useful here since, as shown in the previous result, we cannot construct the term Mn from n with a linear function. We will come back to this result later in the third attempt at building a linear numeral system.
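The cost claim for Mn can be checked in the same illustrative setting (our own reducer and encoding; B is the composition combinator, so m • Mn is B m Mn). With leftmost-outermost reduction, m • Mn reaches m − n in 3n + 4 steps whenever n ≤ m, i.e., linearly in n and independently of m.

```python
import itertools

# Illustrative term encoding: ("var", x) | ("lam", x, body) | ("app", fun, arg)
_fresh = itertools.count()

def v(x):
    return ("var", x)

def lam(*args):
    """lam("x", "y", body) builds λx y.body."""
    *names, body = args
    for name in reversed(names):
        body = ("lam", name, body)
    return body

def app(f, *args):
    """app(f, a, b) builds (f a) b."""
    for a in args:
        f = ("app", f, a)
    return f

def rename(t, env):
    """Give every binder in t a fresh name, so substitution cannot capture."""
    if t[0] == "var":
        return ("var", env.get(t[1], t[1]))
    if t[0] == "lam":
        new = f"#{next(_fresh)}"
        return ("lam", new, rename(t[2], {**env, t[1]: new}))
    return ("app", rename(t[1], env), rename(t[2], env))

def subst(t, x, u):
    """t[u/x]; safe because the binders of t were freshened by rename()."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def step(t):
    """One leftmost-outermost beta step, or None if t is in normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return subst(rename(f[2], {}), f[1], a)
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def normalize(t):
    """Return (normal form, number of beta steps); terminates on linear terms."""
    steps = 0
    while (r := step(t)) is not None:
        t, steps = r, steps + 1
    return t, steps

def alpha_eq(s, t):
    """Equality up to renaming of bound variables (via de Bruijn indices)."""
    def db(t, ctx):
        if t[0] == "var":
            return ("bv", ctx.index(t[1])) if t[1] in ctx else ("fv", t[1])
        if t[0] == "lam":
            return ("lam", db(t[2], (t[1],) + ctx))
        return ("app", db(t[1], ctx), db(t[2], ctx))
    return db(s, ()) == db(t, ())

I = lam("x", v("x"))
B = lam("x", "y", "z", app(v("x"), app(v("y"), v("z"))))

def num(n):
    """First candidate: n = λz. z I^n."""
    return lam("z", app(v("z"), *[I] * n))

def M(n):
    """M_0 = λx.x and M_{n+1} = λy x. x (M_n y)."""
    term = lam("x", v("x"))
    for _ in range(n):
        term = lam("y", "x", app(v("x"), app(term, v("y"))))
    return term

for m, n in [(5, 0), (5, 2), (5, 5), (9, 3)]:
    nf, k = normalize(app(B, num(m), M(n)))   # m • M_n
    print(f"{m} • M_{n}: {k} steps, equals {m - n}: {alpha_eq(nf, num(m - n))}")
```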

Remark 1
All the linear systems that we define can also be expressed using the linear combinators B = λxyz.x(yz), C = λxyz.xzy and I. For example, the system above can be written as 0 = I, 1 = C I I, 2 = C(C I I)I, …, n + 1 = C n I. Each of the linear operators can then be written using these combinators; for example, S = C(BC(B I))I.
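Remark 1 can be checked in the closure model, using the standard definitions B = λxyz.x(yz) and C = λxyz.xzy. Counter is a hypothetical, non-linear read-back helper used only to observe the numerals.

```python
# First candidate written with the linear combinators B, C and I.
I = lambda x: x                                   # I = λx.x
B = lambda x: lambda y: lambda z: x(y(z))         # B = λxyz.x(yz)
C = lambda x: lambda y: lambda z: x(z)(y)         # C = λxyz.xzy

S = C(B(C)(B(I)))(I)                              # S = C(BC(BI))I


def numeral(n):
    """0 = I and n + 1 = C n I."""
    term = I
    for _ in range(n):
        term = C(term)(I)
    return term


class Counter:
    """Hypothetical read-back helper: counts the arguments it is applied to."""

    def __init__(self, k=0):
        self.k = k

    def __call__(self, _arg):
        return Counter(self.k + 1)


def decode(term):
    return term(Counter()).k


print([decode(numeral(n)) for n in range(4)])  # [0, 1, 2, 3]
print(decode(S(numeral(3))))                  # 4
```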

Second Candidate
Our second attempt to build a linear numeral system is based on alternating abstractions and applications. This system builds terms in a style similar to Scott numerals, but keeps a pointer to the end of the term. This way of representing numbers was also found by Parigot and Rozière [8], although linearity was not a concern in their case, so the operations are different. However, there is no way to erase a number in this system, so the test for zero has to be simulated using pairs, as discussed in Sect. 4.1. We will harness this idea in the next system to allow numbers (and pairs) to be erased.
We define 0 = λy.y, 1 = λyx.x y, 2 = λyx.x(λx.x y), …, so that the numeral n consists of an abstraction λy followed by n nested copies of λx.x(−), ending in y. We can see how the system works using a diagram of 2 = λyx.x(λx.x y): each successor is represented by an abstraction, a variable and an application, λx.x −, where − is the rest of the number represented as a list. Just as in the previous linear system, we have access to the first and last element of the list, so we can add to either end of this list of successors. Each numeral is represented by a linear term, and the following operations are also linear:

Definition 9 Successor, addition, predecessor, subtraction and test for zero can be defined as follows:
S = λnyx.x(ny)
A = λmny.m(ny)
P = λny.n y I
M = λmny.n I (m y)
where T₂ = λxy.y I I x and F₂ = λxy.x I I y are the linear Booleans as defined previously.
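The operations of Definition 9 other than Z transcribe directly into Python closures, together with the variant successor λxy.x(λx.x y). In this sketch END stands in for the free variable y and decode strips one λx.x(−) layer at a time; both are hypothetical read-back aids, not part of the system. M assumes m ≥ n, as subtraction on the natural numbers requires.

```python
# Second candidate: n = λy x. x (λx. x ( ... (λx. x y) ... )), with n layers.
I = lambda x: x

S = lambda n: lambda y: lambda x: x(n(y))        # S = λnyx.x(ny), adds at the front
S_end = lambda n: lambda y: n(lambda x: x(y))    # variant λxy.x(λx.x y), adds at the end
A = lambda m: lambda n: lambda y: m(n(y))        # A = λmny.m(ny)
P = lambda n: lambda y: n(y)(I)                  # P = λny.n y I
M = lambda m: lambda n: lambda y: n(I)(m(y))     # M = λmny.n I (m y), for m >= n


def numeral(n):
    term = I                                      # 0 = λy.y
    for _ in range(n):
        term = S(term)
    return term


END = object()  # hypothetical marker standing for the free variable y


def decode(term):
    """Read-back only: count the λx.x(...) layers of (term END)."""
    layer, count = term(END), 0
    while layer is not END:
        layer = layer(I)  # (λx.x R) I -> I R -> R strips one layer
        count += 1
    return count


print(decode(M(numeral(7))(numeral(3))))  # 4
```

In M, each layer of n I meets one layer of m y, so the two numbers are unwound together until n runs out, leaving the remaining m − n layers.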
A possible variant of successor is S = λxy.x(λx.x y). The test for zero Z in this system does not fully meet our requirements, since the result is a pair. It allows us to simulate the test for zero, and if we had enough I's we would be able to consume the number. The system does, however, meet all the other requirements; in particular, we now have subtraction. Again, the situation is not recoverable:

Proposition 4
There is no linear term Z such that Z n →* T₂ or F₂ (according to whether n is zero).
Proof Wadsworth [10] has shown that in any numeral system the test for zero has a specific form:
where i ≥ 1, k ≥ 0, and the principal variable is the first bound variable. So Z n →* λx₂ …
Since n has the form λyx.x(λx.x(⋯(λx.x y)⋯)), it has n + 1 abstractions, so if k < n, part of n remains untouched. Since k is fixed by Z while n cannot be known in advance, Z n will not reduce to the representation of a Boolean.

Third Candidate
Checking carefully, we note that in the previous two systems the predecessor of each system coincides with the successor of the other (almost, because we need to use one of the variants given). In fact, these two systems could be part of one system that represents both the positive and the negative numbers. Let us arbitrarily decide that the numerals of one system are negative and those of the other are positive; then: …
Thus the successor of one system, which is also the predecessor of the other, moves us up the integers, while the other successor/predecessor pair moves us down. However, neither test for zero works for these integers: each system's Z works fine for its own half, but neither works for both. We can nevertheless define a uniform test for zero, which returns a pair:

However, representing the integers is not the direction we want to go, and there are additional problems that would need to be solved (related to the problems with predecessor indicated in Sect. 3). There is, though, a much more useful way to combine these two systems. The linchpin for the third system comes from the following property, which is a key result of this paper:

… Now k • k →* λz.k(kz), and so by the induction hypothesis, λz.k(kz) →* I, as required. 2. The other two cases are a consequence of the following three results, which can be shown in the same way as the inductive step above.
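Under our reading of the garbled statement above, writing m • n for λz.m(n z) with m a first-system numeral and n a second-system numeral, the key property says that k • k →* I, that m • n →* m − n in the first system when m ≥ n, and that m • n →* n − m in the second system when n > m. The closure sketch below checks these three cases; compose, decode1 and decode2 are hypothetical helpers, and the decoders are not linear.

```python
# Composing a first-system numeral with a second-system numeral.
I = lambda x: x


def num1(m):
    """First system: λx.x I ... I (m copies of I)."""
    term = I
    for _ in range(m):
        term = (lambda prev: lambda x: prev(x)(I))(term)
    return term


def num2(n):
    """Second system: λy x. x ( ... (λx. x y) ... ) with n layers."""
    term = I
    for _ in range(n):
        term = (lambda prev: lambda y: lambda x: x(prev(y)))(term)
    return term


def compose(f, g):
    """Our reading of f • g as λz. f (g z)."""
    return lambda z: f(g(z))


class Counter:
    """Read-back helper for the first system: counts the arguments it receives."""

    def __init__(self, k=0):
        self.k = k

    def __call__(self, _arg):
        return Counter(self.k + 1)


END = object()  # read-back marker for the second system


def decode1(term):
    return term(Counter()).k


def decode2(term):
    layer, count = term(END), 0
    while layer is not END:
        layer, count = layer(I), count + 1
    return count


marker = object()
print(compose(num1(5), num2(5))(marker) is marker)  # True: 5 • 5 behaves as I
print(decode1(compose(num1(7), num2(3))))          # 4, a first-system numeral
print(decode2(compose(num1(3), num2(7))))          # 4, a second-system numeral
```

Each I of m strips one layer of n, so the work is proportional to min(m, n), and whatever remains is already a numeral of the appropriate system.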
Proposition 5 gives a way to erase fully in the second system but, more importantly, it gives a way to compute subtraction in time proportional to min(m, n). We harness this idea to build a third numeral system that combines the previous two, by building pairs of numbers. Representing integers using pairs is not a new idea: a number n is a pair (a, b), which we interpret as a − b. It is then possible to define operations over these pairs, for example (a, b) +ₚ (c, d) = (a + c, b + d) and (a, b) −ₚ (c, d) = (a + d, b + c). Thus we can add two pairs (+ₚ) using addition on numbers, and we can also subtract pairs (−ₚ) using addition. We will make use of a variant of this idea and build the pair (n, −n), where each number is represented in a different numeral system. The operators for the first two systems, together with Proposition 5, then allow all the functions to be computed. We remark, however, that storing the number twice introduces some redundancy into the system. It is also worth pointing out that Wadsworth's system needed two copies of the number for the zero test. The same is true for this system, but since we are in a linear framework we need to keep hold of the second number, and no η-collapse is needed.
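The pair encoding of integers described above is the standard construction and can be illustrated with ordinary Python integers (this is not a λ-term encoding): both operations use only addition on the components.

```python
# Integers as pairs (a, b) of naturals, read as a - b.
def add_p(p, q):
    """(a, b) +p (c, d) = (a + c, b + d)."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)


def sub_p(p, q):
    """(a, b) -p (c, d) = (a + d, b + c): subtraction needs only addition."""
    (a, b), (c, d) = p, q
    return (a + d, b + c)


def value(p):
    """Interpret the pair (a, b) as the integer a - b."""
    a, b = p
    return a - b


x, y = (5, 1), (2, 7)  # x represents 4, y represents -5
print(value(add_p(x, y)), value(sub_p(x, y)))  # -1 9
```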
We now build a fully linear numeral system, which is the main contribution of this paper. We define 0 = ⟨I, I⟩, 1 = ⟨λyx.x y, λx.x I⟩, 2 = ⟨λyx.x(λx.x y), λx.x I I⟩, …; that is, each numeral pairs its representation in the second system with its representation in the first system.

Note that we use Z from the second system here, and we now have enough I's to fully erase the terms from the first system.
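A sketch of the third system's numerals in the closure model, assuming the standard linear pair ⟨a, b⟩ = λz.z a b. The functions succ3 and erase3 are hypothetical illustrations rather than the operations of the paper: succ3 applies the second- and first-system successors componentwise, and erase3 composes the two components so that, by the composition property above, the whole numeral collapses to I. The selectors first and second discard a component and serve only for read-back.

```python
# Third system: the numeral n is the pair <n in system 2, n in system 1>.
I = lambda x: x
pair = lambda a: lambda b: lambda z: z(a)(b)      # <a, b> = λz.z a b

S1 = lambda n: lambda x: n(x)(I)                  # first-system successor λnx.n x I
S2 = lambda n: lambda y: lambda x: x(n(y))        # second-system successor λnyx.x(ny)


def numeral3(n):
    """0 = <I, I>, 1 = <λyx.x y, λx.x I>, 2 = <λyx.x(λx.x y), λx.x I I>, ..."""
    a, b = I, I
    for _ in range(n):
        a, b = S2(a), S1(b)
    return pair(a)(b)


# Hypothetical linear operations on pairs (illustrations only).
succ3 = lambda p: p(lambda a: lambda b: pair(S2(a))(S1(b)))
erase3 = lambda p: p(lambda a: lambda b: lambda z: b(a(z)))  # first-system b • second-system a

# Read-back helpers (not linear).
first = lambda p: p(lambda a: lambda b: a)
second = lambda p: p(lambda a: lambda b: b)


class Counter:
    """Counts the arguments it receives (first-system read-back)."""

    def __init__(self, k=0):
        self.k = k

    def __call__(self, _arg):
        return Counter(self.k + 1)


END = object()  # second-system read-back marker


def decode1(term):
    return term(Counter()).k


def decode2(term):
    layer, count = term(END), 0
    while layer is not END:
        layer, count = layer(I), count + 1
    return count


p = succ3(numeral3(2))
print(decode2(first(p)), decode1(second(p)))  # 3 3
print(erase3(numeral3(4))(END) is END)        # True: the numeral collapses to I
```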

Characterising Linear Numeral Systems
There are many other candidates for a linear numeral system. One of the easiest is given by numerals of the form n = λx.x(λx.x(⋯(λx.x I)⋯)), with n occurrences of λx.x(−). Each numeral is represented by a linear term, and this corresponds directly to the inductive definition of numbers given earlier in the paper. (Diagram: the building blocks 0 and S.) Thus, we have the same successor and zero as the second candidate, but constructed differently. In this representation, the following operations are also linear:

Definition 11
We define successor, predecessor, subtraction and test for zero as follows:

We remark that successor, predecessor and subtraction are beautifully simple, and we can encode the test for zero with pairs as before. However, this time there is no addition.

Proposition 7
There is no linear term A such that A m n →* m + n.
The reason for this is the same as why there is no constant-time addition for the algebraic data type for numbers given earlier, or for the concatenation of linked lists: in all cases, we need to traverse the first number (list) to join it to the second one. Of course, not being able to do addition is a real failure of this system, but addition can be defined outside the linear framework (for instance, by using an encoding of recursion, as for Scott numerals).
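The linked-list analogy can be made concrete with a small Python sketch (our illustration, not a λ-term encoding): appending to a cons list must walk the first list, while a representation that keeps access to the end, here a "difference list" modelled as a function from a tail to a list, appends by composition in constant time, much as A = λmny.m(ny) does in the second candidate.

```python
# Appending cons lists needs a traversal; keeping access to the end does not.
def to_cons(items):
    """Nested pairs (head, tail) ending in None."""
    result = None
    for item in reversed(items):
        result = (item, result)
    return result


def to_pylist(cons):
    out = []
    while cons is not None:
        head, cons = cons
        out.append(head)
    return out


def cons_append(xs, ys):
    """Rebuild xs in front of ys; returns (joined list, cells of xs visited)."""
    heads, visited = [], 0
    while xs is not None:
        head, xs = xs
        heads.append(head)
        visited += 1
    for head in reversed(heads):
        ys = (head, ys)
    return ys, visited


def to_diff(items):
    """A 'difference list': a function that puts the items in front of any tail."""
    def prepend(tail):
        for item in reversed(items):
            tail = (item, tail)
        return tail
    return prepend


def diff_append(f, g):
    """Constant-time append by composition, in the spirit of λmny.m(ny)."""
    return lambda tail: f(g(tail))


joined, visited = cons_append(to_cons([1, 2, 3]), to_cons([4, 5]))
print(to_pylist(joined), visited)                                         # [1, 2, 3, 4, 5] 3
print(to_pylist(diff_append(to_diff([1, 2, 3]), to_diff([4, 5]))(None)))  # [1, 2, 3, 4, 5]
```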
We have now collected a number of "failed" linear numeral systems, and we can ask which systems have addition, which have predecessor, which have subtraction, and so on, and then try to understand when each of these arises.
We begin by observing that in a closed linear λ-term:
- each λ binds one occurrence of a variable;
- each application has disjoint free variables;
- each variable is bound by a λ.
Therefore, each "successor" that we build in a linear system must be built from equal numbers of variables, abstractions and applications, and at least one of each is needed each time. For example, if we want to add an application, then we need to introduce a new variable, and this variable can only be added if we also add an abstraction to bind it. We cannot create a numeral system that is not balanced in this way. Thus, we immediately find a characterisation of the size of linear terms, as seen in Lemma 3, Part 1. As a consequence, we can understand the size of linear numerals:
- Zero is represented by a closed linear term of size c, where c = 2 + 3k for some k ≥ 0. We have k = 0 for the first, second and fourth candidates, where zero is just λx.x, which has size 2; for the third system, we used k = 2.
- The size of a numeral n is therefore 2 + 3k + 3nk′, for some k ≥ 0 and k′ ≥ 1.
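The size claims can be checked mechanically. This sketch (our own helper code) represents λ-terms as tuples, counts abstraction, application and variable nodes, checks closed linearity, and confirms the sizes 2 + 3n for the first and second candidates and 8 + 6n = 2 + 3·2 + 3n·2 for the third, taking pairs to be ⟨a, b⟩ = λz.z a b (an assumption about the pair encoding).

```python
# Sizes of linear numerals, counting abstraction, application and variable nodes.
from collections import Counter


def var(x): return ("var", x)
def lam(x, body): return ("lam", x, body)
def app(f, a): return ("app", f, a)


def size(t):
    """|x| = 1, |λx.M| = 1 + |M|, |M N| = 1 + |M| + |N|."""
    if t[0] == "var":
        return 1
    if t[0] == "lam":
        return 1 + size(t[2])
    return 1 + size(t[1]) + size(t[2])


def free_uses(t):
    """Multiset of free variables; ValueError if some λ does not bind exactly one use."""
    if t[0] == "var":
        return Counter([t[1]])
    if t[0] == "lam":
        uses = free_uses(t[2])
        if uses[t[1]] != 1:
            raise ValueError(f"λ{t[1]} binds {uses[t[1]]} occurrences")
        del uses[t[1]]
        return uses
    return free_uses(t[1]) + free_uses(t[2])


def is_closed_linear(t):
    try:
        return not free_uses(t)
    except ValueError:
        return False


I = lam("x", var("x"))


def sys1(n):  # first candidate: λx.x I ... I
    body = var("x")
    for _ in range(n):
        body = app(body, I)
    return lam("x", body)


def sys2(n):  # second candidate: λy x. x ( ... (λx. x y) ... )
    body = var("y")
    for _ in range(n):
        body = lam("x", app(var("x"), body))
    return lam("y", body)


def sys3(n):  # third candidate, assuming <a, b> = λz.z a b
    return lam("z", app(app(var("z"), sys2(n)), sys1(n)))


for n in range(4):
    print(n, size(sys1(n)), size(sys2(n)), size(sys3(n)))  # n, 2 + 3n, 2 + 3n, 8 + 6n
```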
Consider the smallest systems, where k = 0 and k′ = 1, that is, zero is λx.x and each successor adds one abstraction, one application and one variable (these are the values for the first, second and fourth systems presented above). We now ask which linear systems can be built in this way. We have found diagrams useful for this purpose, and we demonstrate the ideas through them. First, we remark that there are essentially two different structures that we have used, depending on whether or not we have access to the end of the list. These are represented by the two diagrams in Fig. 3, where 0 and S represent the building blocks, defined using linear λ-terms, used previously. The left-hand side corresponds to the standard inductive definition of numbers, and hence to the fourth system, and the right-hand side corresponds to the first, second and third systems, where we have access to the end of the list. Systems that use the left-hand approach will not have a linearly definable addition, whereas those on the right will. Choices of S then give different systems. Since we are assuming that k′ = 1, each successor is built out of an abstraction, an application and a variable. We next consider all the ways in which these can be put together; there are essentially four. The first two are used in the first and second systems respectively: For the first one, there is no choice: the abstraction must be the identity function, as there are no other variables to bind. The second one, however, has more choices, which we mention later. The next two ways of combining an abstraction and an application are less useful, as we cannot allow terms to be η- or β-redexes.
There are, however, additional choices. In the two right-most configurations above, where two edges connect the abstraction and application nodes, we can split one of these edges to make alternative constructions. There is still not much choice: we cannot have a list of abstractions, as there is no way to erase. We may try to build systems like the following, for example:
(Diagrams: two such configurations built from abstraction and application nodes.)
Some of these are useful if we pair them with the first system, which can erase the abstraction, but none of them can lead to a linear numeral system. By exhausting all possibilities, we establish that there is no linear numeral system with k = 0 and k′ = 1; thus the first one has characteristic values k = 0 and k′ = 2.
To summarise, the following table shows the four candidates for a linear numeral system and indicates which operations are linearly definable (all are λ-definable if we remove the linearity constraint): ✓ indicates that the operation is linearly definable, × that it is not, and ½ that we needed to use pairs to encode erasing (which we therefore consider a partial solution). The following proposition summarises the points above.

Proposition 8 For any linear numeral system:
- The size of the term representing the number n grows linearly with n. Specifically, n is represented by a term of size nk + c.
- Successor, addition and predecessor will be constant time operations.
- Test for zero and subtraction will be linear time operations.
Proof
- Let c be the size of the representation of zero. Each successor must add a term of the same size, say k, which gives the result immediately.
- The successor of a number n is computed from the application S n, where |S n| = |S| + |n| + 1. Since each β-reduction reduces the size of a linear term by exactly 3, the extra structure needed to represent n + 1 must come from the term S: if S n reaches n + 1 in r steps, then |S| + |n| + 1 − 3r = |n| + k, so r = (|S| + 1 − k)/3 does not depend on n. Thus not only must this be a constant number of β-reductions, but we also know the size of the term S. The same reasoning applies to the addition and predecessor operations.
- Erasing a term can never be a constant time operation, since each β-reduction removes only three nodes; it is therefore linear, as every linear term normalises in linear time. Subtraction and test for zero both need to erase part of a number or a whole number.
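The size argument in this proof can be checked on small examples. The following sketch is our own helper code: a leftmost-outermost reducer for closed linear terms that asserts each β-step removes exactly three nodes and counts the steps. For the second candidate's S, A and P from Definition 9 the counts stay fixed as n grows, while erasing a first-candidate numeral with n I takes n + 1 steps. Because linear β-steps never copy a subterm, renaming all binders apart once up front is enough to make naive substitution capture-free.

```python
# Normal-order reducer for closed linear λ-terms: counts β-steps and checks
# that every step shrinks the term by exactly three nodes (λ, @, one variable).
import itertools


def var(x): return ("var", x)
def lam(x, body): return ("lam", x, body)
def app(f, a): return ("app", f, a)


def size(t):
    if t[0] == "var":
        return 1
    if t[0] == "lam":
        return 1 + size(t[2])
    return 1 + size(t[1]) + size(t[2])


def uniquify(t, env, fresh):
    """Rename every binder apart; linear β-steps never copy, so names stay distinct."""
    if t[0] == "var":
        return var(env.get(t[1], t[1]))
    if t[0] == "lam":
        new = f"v{next(fresh)}"
        return lam(new, uniquify(t[2], {**env, t[1]: new}, fresh))
    return app(uniquify(t[1], env, fresh), uniquify(t[2], env, fresh))


def subst(t, x, n):
    """t[n/x]; no capture is possible because all binders are distinct."""
    if t[0] == "var":
        return n if t[1] == x else t
    if t[0] == "lam":
        return lam(t[1], subst(t[2], x, n))
    return app(subst(t[1], x, n), subst(t[2], x, n))


def step(t):
    """One leftmost-outermost β-step, or None if t is already normal."""
    if t[0] == "app" and t[1][0] == "lam":
        return subst(t[1][2], t[1][1], t[2])
    if t[0] == "lam":
        body = step(t[2])
        return None if body is None else lam(t[1], body)
    if t[0] == "app":
        f = step(t[1])
        if f is not None:
            return app(f, t[2])
        a = step(t[2])
        return None if a is None else app(t[1], a)
    return None


def normalise(t):
    """Return (normal form, number of β-steps)."""
    t, steps = uniquify(t, {}, itertools.count()), 0
    while (nxt := step(t)) is not None:
        assert size(t) - size(nxt) == 3
        t, steps = nxt, steps + 1
    return t, steps


def nameless(t, env=()):
    """De Bruijn form, to compare terms up to renaming of bound variables."""
    if t[0] == "var":
        return ("v", env.index(t[1])) if t[1] in env else ("f", t[1])
    if t[0] == "lam":
        return ("l", nameless(t[2], (t[1],) + env))
    return ("a", nameless(t[1], env), nameless(t[2], env))


I = lam("x", var("x"))


def num1(n):  # first candidate: λx.x I ... I
    body = var("x")
    for _ in range(n):
        body = app(body, I)
    return lam("x", body)


def num2(n):  # second candidate: λy x. x ( ... (λx. x y) ... )
    body = var("y")
    for _ in range(n):
        body = lam("x", app(var("x"), body))
    return lam("y", body)


S2 = lam("n", lam("y", lam("x", app(var("x"), app(var("n"), var("y"))))))  # λnyx.x(ny)
A2 = lam("m", lam("n", lam("y", app(var("m"), app(var("n"), var("y"))))))  # λmny.m(ny)
P2 = lam("n", lam("y", app(app(var("n"), var("y")), I)))                     # λny.n y I

for n in (1, 5, 20):
    steps = [normalise(t)[1] for t in (app(S2, num2(n)), app(app(A2, num2(n)), num2(n)),
                                       app(P2, num2(n)), app(num1(n), I))]
    print(n, steps)  # e.g. 5 [2, 4, 4, 6]: S, A, P constant; erasing n I takes n + 1
```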
Because of this characterisation, there cannot be a more compact representation of numbers in the linear λ-calculus. In particular, we cannot represent a linear version of binary numerals (see, for example, [7]).

Conclusion
One of the main results in the λ-calculus is that numerals can be defined, together with successor, predecessor and test for zero functions, and that the system is adequate to represent all computable functions. The numeral systems presented in this paper can also be understood as numeral systems in the usual λ-calculus, and replacing the linear Booleans by the standard ones immediately eliminates the inefficiency issues associated with tests for zero. In this way, all the systems defined here are adequate. Building numeral systems in a constrained calculus has given us insight into numeral systems generally. Interestingly, the linearity constraints limit our choices but also ensure that the operations are efficient: for instance, it is not possible to define a non-constant time addition.
We have shown that the linear λ-calculus can be used to represent numbers in an efficient way. Specifically, successor, predecessor and addition are all constant time functions, and moreover they have to be constant time. Subtraction, frequently omitted in numeral systems and especially cumbersome in Church-style representations, can also be done efficiently. The graphical representation of linear terms has been useful for identifying new representations.
One aspect of this work, and indeed one of the reasons for investigating linear numeral systems in the first place, was to find a more efficient representation of predecessor and subtraction operations for System F [4] by potentially finding new ways to represent the data. All linear terms are typeable, so all the numbers and operations that we have presented in this paper also have a type. However, each number has a different type, so recursive types are needed, and this work therefore does not offer any improvement for System F. We leave this aspect for further investigation.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.