1 Introduction

It is well known that the length of a \(\beta \)-reduction sequence of a simply typed \(\lambda \)-term can be extremely long. Beckmann [1] showed that, for any \(k \ge 0\),

$$ \max \{ \beta (t) \mid t \text { is a simply typed } \lambda \text {-term of order } k \text { and size } n \} = {\mathbf {exp}_{k}(\varTheta (n))} $$

where \(\beta (t)\) is the maximum length of the \(\beta \)-reduction sequences of the term t, and \({\mathbf {exp}_{k}(x)}\) is defined by: \({\mathbf {exp}_{0}(x)} \triangleq x\) and \({\mathbf {exp}_{k+1}(x)} \triangleq 2^{{\mathbf {exp}_{k}(x)}}\). Indeed, the following order-\(k\) term [1]:

$$ ( Twice _k)^n Twice _{k-1} \cdots Twice _2 (\lambda x.a\,x\,x) c, $$

where \( Twice _j\) is the twice function \(\lambda f^{\sigma _{j-1}}.\lambda x^{\sigma _{j-2}}.f(f\,x)\) (with \(\sigma _j\) being the order-\(j\) type defined by: \(\sigma _0=\mathtt {o}\) and \(\sigma _j=\sigma _{j-1}\rightarrow \sigma _{j-1}\)), has a \(\beta \)-reduction sequence of length \({\mathbf {exp}_{k}(\varOmega (n))}\).
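To make the growth concrete, the following minimal sketch computes \({\mathbf {exp}_{k}(x)}\) and mimics the iterated twice function with untyped Python closures. The names `exp_tower`, `iterate`, `twice` and `succ` are ours, the type annotations \(\sigma _j\) are dropped, and the sketch counts applications of a successor function rather than \(\beta \)-reduction steps, so it only illustrates the tower-like growth.

```python
def exp_tower(k, x):
    """exp_0(x) = x and exp_{k+1}(x) = 2 ** exp_k(x)."""
    for _ in range(k):
        x = 2 ** x
    return x


def iterate(f, n):
    """Return the map t -> f(f(...f(t)...)) with n applications of f."""
    def run(t):
        for _ in range(n):
            t = f(t)
        return t
    return run


twice = lambda f: lambda x: f(f(x))  # untyped analogue of Twice_j
succ = lambda m: m + 1

# Twice^n succ applies succ 2**n times.
one_level = iterate(twice, 3)(succ)(0)
# Twice^n Twice succ applies succ 2**(2**n) times.
two_levels = iterate(twice, 3)(twice)(succ)(0)
```

With \(n = 3\), `one_level` agrees with `exp_tower(1, 3)` and `two_levels` with `exp_tower(2, 3)`; each additional level of `twice` adds one exponential.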

Although the worst-case length of the longest \(\beta \)-reduction sequence is well understood, as described above, little is known about the average case: how often does one encounter a term having a very long \(\beta \)-reduction sequence? In other words, suppose we pick a simply-typed \(\lambda \)-term \(t\) of order \(k\) and size \(n\) at random; what is the probability that \(t\) has a \(\beta \)-reduction sequence longer than a certain bound, such as \({\mathbf {exp}_{k}(cn)}\) (where \(c\) is some constant)? One may expect that, although there exists a term (like the one above) whose reduction sequence is as long as \({\mathbf {exp}_{k}(\varOmega (n))}\), such a term is rarely encountered.

In the present paper, we provide a partial answer to the above question, by showing that almost every simply typed \(\lambda \)-term of order k has a reduction sequence as long as \((k-2)\)-fold exponential in the term size, under a certain assumption. More precisely, we shall show:

$$ \lim _{n \rightarrow \infty } \frac{\#\!\left( \{ [t]_\alpha \in \varLambda _n^\alpha (k,\iota ,\xi ) \mid \beta (t) \ge {\mathbf {exp}_{k-2}(n)} \}\right) }{\#\!\left( \varLambda _n^\alpha (k,\iota ,\xi )\right) } = 1 $$

where \(\varLambda _n^\alpha (k,\iota ,\xi )\) is the set of (\(\alpha \)-equivalence classes \([-]_\alpha \) of) simply-typed \(\lambda \)-terms such that the term size is n, the order is up to k, the (internal) arity is up to \(\iota \ge k\) and the number of variable names is up to \(\xi \) (see the next section for the precise definition).

To obtain the above result, we use techniques inspired by the quantitative analysis of untyped \(\lambda \)-terms [2,3,4]. For example, David et al. [2] have shown that almost all untyped \(\lambda \)-terms are strongly normalizing, whereas the opposite holds in the corresponding combinatory logic. A more sophisticated analysis is, however, required in our case, both because we consider only well-typed terms and because we reason about the length of a reduction sequence rather than a qualitative property such as strong normalization.

This work is part of our long-term project on the quantitative analysis of the complexity of higher-order model checking [5, 6]. Higher-order model checking asks whether the (possibly infinite) tree generated by a ground-type term of the \(\lambda \)Y-calculus (or a higher-order recursion scheme) satisfies a given regular property; the problem is known to be \(k\)-EXPTIME complete for order-\(k\) terms [6]. Despite the huge worst-case complexity, practical model checkers [7,8,9] have been built, which run fast for many typical inputs and have been successfully applied to automated verification of functional programs [10,11,12,13]. The project aims to provide a theoretical justification for this phenomenon by studying how many inputs actually suffer from the worst-case complexity. Since the problem appears to be hard due to recursion, as an intermediate step towards the goal we aimed to analyze the variant of the problem considered by Terui [14]: given a term of the simply-typed \(\lambda \)-calculus (without recursion) of type Bool, decide whether it evaluates to true or false (where Booleans are Church-encoded; see [14] for the precise definition). Terui has shown that even this problem is \(k\)-EXPTIME complete for order-\((2k+2)\) terms. If, contrary to the result of the present paper, the upper bound on the lengths of \(\beta \)-reduction sequences were small for almost every term, then we could have concluded that the decision problem above is easily solvable for most inputs. The result of the present paper does not, however, necessarily provide a negative answer to the question above, because one need not apply \(\beta \)-reductions to solve Terui’s decision problem.

The present work may also shed some light on other problems on typed \(\lambda \)-calculi with exponential or higher worst-case complexity. For example, despite the DEXPTIME-completeness of ML typability [15, 16], it is often said that the exponential behavior is rarely seen in practice. That claim is, however, based only on empirical studies. Our technique may be used to provide a theoretical justification (or possibly a refutation).

The rest of this paper is organized as follows. Section 2 states our main result formally. Section 3 analyzes the asymptotic behavior of the number of typed \(\lambda \)-terms of a given size. Section 4 proves the main result. Section 5 discusses related work, and Sect. 6 concludes the paper. Due to space restrictions, we omit formal proofs and give only sketches; see the full version [17] for details.

2 Main Result

In this section we give the precise statement of our main theorem. We denote the cardinality of a set S by \(\#\!\left( S\right) \), and the domain and image of a function \(f\) by \(\mathop {\mathrm {Dom}}(f)\) and \(\mathrm {Im}\left( f\right) \), respectively.

The set of (simple) types, ranged over by \(\tau \) and \(\sigma \), is given by: \(\tau {::}= \mathtt {o}\mid \sigma \rightarrow \tau \). Let V be a countably infinite set, which is ranged over by \(x, x_1, x_2,\) etc. The set of \(\lambda \) -terms (or terms), ranged over by t, is defined by:

$$\begin{aligned} t {:}{:}= x \mid \lambda \overline{x}^\tau \!. t \mid t \, t\qquad \qquad \overline{x}{:}{:}= x \mid {*}\end{aligned}$$

We call elements of \(V \cup \{{*}\}\) variables; \(V \cup \{{*}\}\) is ranged over by \(\overline{x}, \overline{x}_1, \overline{x}_2,\) etc. We call the special variable \({*}\) an unused variable. We sometimes omit type annotations and just write \(\lambda \overline{x}.t\) for \(\lambda \overline{x}^\tau \!. t\).

Terms of our syntax can be translated to usual \(\lambda \)-terms by regarding elements in \(V\cup \{{*}\}\) as usual variables. We define the notions of free variables, closed terms, and \(\alpha \)-equivalence \(\sim _{\alpha }\) through this identification. The \(\alpha \)-equivalence class of a term t is written as \([t]_\alpha \). In this paper, we do not consider a term as an \(\alpha \)-equivalence class, and we always use \([-]_\alpha \) explicitly. For a term \(t\), we write \(\mathbf {FV}(t)\) for the set of all the free variables of \(t\).

For a term t, we define the set \(\mathbf {V}(t)\) of variables (except \({*}\)) in t by:

$$ \mathbf {V}(x) \triangleq \{x\} \quad \mathbf {V}(\lambda x^\tau \!.t) \triangleq \{x\} \cup \mathbf {V}(t) \quad \mathbf {V}(\lambda {*}^\tau \!.t) \triangleq \mathbf {V}(t) \quad \mathbf {V}(t_1 t_2) \triangleq \mathbf {V}(t_1) \cup \mathbf {V}(t_2). $$

Note that neither \(\mathbf {V}(t)\) nor even \(\#\!\left( \mathbf {V}(t)\right) \) is preserved by \(\alpha \)-equivalence. For example, \(t= \lambda x_1. (\lambda x_2. x_2)(\lambda x_3. x_1)\) and \(t'=\lambda x_1. (\lambda x_1. x_1)(\lambda {*}. x_1)\) are \(\alpha \)-equivalent, but \(\#\!\left( \mathbf {V}(t)\right) =3\) and \(\#\!\left( \mathbf {V}(t')\right) =1\).
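The example above can be checked mechanically. The following sketch encodes terms as small Python dataclasses (type annotations on binders are omitted, and the class and function names are our own choice) and computes \(\mathbf {V}(t)\) by the four defining clauses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str      # a variable name, or "*" for the unused variable
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def variables(t):
    """V(t): the variable names occurring in t, excluding "*"."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        bound = set() if t.var == "*" else {t.var}
        return bound | variables(t.body)
    return variables(t.fun) | variables(t.arg)


# t  = λx1. (λx2. x2) (λx3. x1)
t = Lam("x1", App(Lam("x2", Var("x2")), Lam("x3", Var("x1"))))
# t' = λx1. (λx1. x1) (λ*. x1)
t_prime = Lam("x1", App(Lam("x1", Var("x1")), Lam("*", Var("x1"))))
```

Here `variables(t)` has three elements and `variables(t_prime)` has one, although the two terms are \(\alpha \)-equivalent.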

A type environment \(\varGamma \) is a finite set of type bindings of the form \(x: \tau \) such that if \((x:\tau ) ,(x:\tau ') \in \varGamma \) then \(\tau = \tau '\); sometimes we regard an environment also as a function. Note that \(({*}:\tau )\) cannot belong to a type environment; we do not need any type assumption for \({*}\) since it does not occur in terms. We give the typing rules as follows:

$$ \begin{aligned} \frac{}{x:\tau \vdash x : \tau } \qquad \frac{ \varGamma _1 \vdash t_1 : \sigma {\rightarrow }\tau \qquad \varGamma _2 \vdash t_2 : \sigma }{\varGamma _1 \cup \varGamma _2 \vdash t_1 t_2 : \tau } \\[10pt] \frac{ \varGamma ' \vdash t:\tau \qquad \varGamma '=\varGamma \text { or }\varGamma ' = \varGamma \cup \{\overline{x}:\sigma \} \qquad \overline{x}\notin \mathop {\mathrm {Dom}}(\varGamma ) }{\varGamma \vdash \lambda \overline{x}^{\sigma }\!.t:\sigma {\rightarrow }\tau } \end{aligned} $$

The above typing rules are equivalent to the usual ones for closed terms, and if \(\varGamma \vdash t :\tau \) is derivable, then the derivation is unique. Moreover, if \(\varGamma \vdash t :\tau \) then \(\mathop {\mathrm {Dom}}(\varGamma )=\mathbf {FV}(t)\). Below we consider only well-typed \(\lambda \)-terms. A pair \(\langle \varGamma ; \tau \rangle \) of \(\varGamma \) and \(\tau \) is called a typing. We use \(\theta \) as a metavariable for typings. When \(\varGamma \vdash t: \tau \) is derived, we call \(\langle \varGamma ; \tau \rangle \) a typing of a term t, and call t an inhabitant of \(\langle \varGamma ; \tau \rangle \) or a \(\langle \varGamma ; \tau \rangle \) -term.
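The typing rules can be read as a recursive procedure. The sketch below is one possible reading, under our own encoding: a type \(\tau _1 \rightarrow \cdots \rightarrow \tau _n \rightarrow \mathtt {o}\) is the tuple of its argument types, `scope` records the annotations of the enclosing binders (or the assumed types of free variables), and `typing` returns the pair \(\langle \varGamma ; \tau \rangle \) with \(\varGamma \) restricted to the free variables of the term.

```python
from dataclasses import dataclass

O = ()  # base type o; t1 -> ... -> tn -> o is encoded as (t1, ..., tn)


def arrow(sigma, tau):
    """Encode the type sigma -> tau."""
    return (sigma,) + tau


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str      # a variable name, or "*" for the unused variable
    ty: tuple     # type annotation of the bound variable
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def typing(term, scope=None):
    """Return (gamma, tau) with Dom(gamma) = FV(term); raise TypeError if ill-typed."""
    scope = {} if scope is None else scope
    if isinstance(term, Var):
        if term.name not in scope:
            raise TypeError(f"unbound variable {term.name}")
        return {term.name: scope[term.name]}, scope[term.name]
    if isinstance(term, Lam):
        inner = dict(scope)
        if term.var != "*":
            inner[term.var] = term.ty
        gamma, tau = typing(term.body, inner)
        gamma.pop(term.var, None)  # the binder is not free in the abstraction
        return gamma, arrow(term.ty, tau)
    fun_gamma, fun_ty = typing(term.fun, scope)
    arg_gamma, arg_ty = typing(term.arg, scope)
    if not fun_ty or fun_ty[0] != arg_ty:
        raise TypeError("argument type mismatch")
    return {**fun_gamma, **arg_gamma}, fun_ty[1:]


# Twice_2 = λf^{o→o}. λx^{o}. f (f x)
twice2 = Lam("f", (O,), Lam("x", O, App(Var("f"), App(Var("f"), Var("x")))))
```

For the closed term `twice2` the returned environment is empty and the type is `((O,), O)`, that is, \((\mathtt {o}\rightarrow \mathtt {o})\rightarrow \mathtt {o}\rightarrow \mathtt {o}\).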

Definition 1

(size, order and internal arity of a term). The size of a term t, written |t|, is defined by:

$$\begin{aligned} |\overline{x}| \triangleq 1 \;\;\;\;\;\; |\lambda \overline{x}^\tau \!. t| \triangleq |t| + 1 \;\;\;\;\;\; |t_1 t_2| \triangleq |t_1| + |t_2| + 1. \end{aligned}$$

The order and internal arity of a type \(\tau \), written \(\mathtt {ord}(\tau )\) and \(\mathtt {iar}(\tau )\), are defined respectively by:

$$ \begin{aligned} \mathtt {ord}(\mathtt {o})&\triangleq 0 \qquad \mathtt {iar}(\mathtt {o}) \triangleq 0\\ \mathtt {ord}(\tau _1 \rightarrow \cdots \rightarrow \tau _n \rightarrow \mathtt {o})&\triangleq \max \{ \mathtt {ord}(\tau _i) + 1 \mid 1 \le i \le n \}&(n\ge 1) \\ \mathtt {iar}(\tau _1 \rightarrow \cdots \rightarrow \tau _n \rightarrow \mathtt {o})&\triangleq \max (\{n\} \cup \{ \mathtt {iar}(\tau _i) \mid 1 \le i \le n \})&(n\ge 1). \end{aligned} $$

For a \(\langle \varGamma ; \tau \rangle \)-term t, we define the order and internal arity of \(\varGamma \vdash t: \tau \) written \(\mathtt {ord}(\varGamma \vdash t:\tau )\) and \(\mathtt {iar}(\varGamma \vdash t:\tau )\) by:

$$ \begin{aligned}&\mathtt {ord}(\varGamma \vdash t:\tau ) \triangleq \max \{ \mathtt {ord}(\tau ') \mid (\varGamma ' \vdash t' : \tau ') \text { occurs in } \varDelta \} \\ {}&\mathtt {iar}(\varGamma \vdash t:\tau ) \triangleq \max \{ \mathtt {iar}(\tau ') \mid (\varGamma ' \vdash t' : \tau ') \text { occurs in } \varDelta \} \end{aligned} $$

where \(\varDelta \) is the (unique) derivation tree for \(\varGamma \vdash t: \tau \).

Note that the notions of size, order, internal arity, and \(\beta (t)\) (defined in the introduction) are well-defined with respect to \(\alpha \)-equivalence.
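As a quick sanity check of the type-level part of Definition 1, the following sketch computes \(\mathtt {ord}\) and \(\mathtt {iar}\). The encoding is an assumption of ours: a type \(\tau _1 \rightarrow \cdots \rightarrow \tau _n \rightarrow \mathtt {o}\) is represented by the tuple \((\tau _1, \ldots , \tau _n)\), so that \(\mathtt {o}\) is the empty tuple.

```python
O = ()  # base type o


def order(ty):
    """ord(t1 -> ... -> tn -> o) = max(ord(ti) + 1), and ord(o) = 0."""
    return max((order(arg) + 1 for arg in ty), default=0)


def internal_arity(ty):
    """iar(t1 -> ... -> tn -> o) = max({n} ∪ {iar(ti)}), and iar(o) = 0."""
    return max([len(ty)] + [internal_arity(arg) for arg in ty])


def sigma(j):
    """The order-j type of the introduction: sigma_0 = o, sigma_j = sigma_{j-1} -> sigma_{j-1}."""
    if j == 0:
        return O
    previous = sigma(j - 1)
    return (previous,) + previous
```

For instance, `order(sigma(3))` and `internal_arity(sigma(3))` are both 3, while \(\mathtt {o}\rightarrow \mathtt {o}\rightarrow \mathtt {o}\), encoded as `(O, O)`, has order 1 and internal arity 2.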

Definition 2

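(terms with bounds on types and variables). Let \(\delta ,\iota ,\xi \ge 0\) and \(n \ge 1\) be integers. We denote by \(\mathrm {Types}(\delta ,\iota )\) the set of types \(\{ \tau \mid \mathtt {ord}(\tau ) \le \delta , \mathtt {iar}(\tau ) \le \iota \}\). For \(\varGamma \) and \(\tau \) we define:

$$\begin{aligned} \varLambda ^\alpha (\langle \varGamma ; \tau \rangle ,\delta ,\iota ,\xi )&\triangleq \{ [t]_\alpha \mid \varGamma \vdash t : \tau ,\ \mathtt {ord}(\varGamma \vdash t:\tau ) \le \delta ,\ \mathtt {iar}(\varGamma \vdash t:\tau ) \le \iota ,\ \#\!\left( \mathbf {V}(t)\right) \le \xi \} \\ \varLambda _n^\alpha (\langle \varGamma ; \tau \rangle ,\delta ,\iota ,\xi )&\triangleq \{ [t]_\alpha \in \varLambda ^\alpha (\langle \varGamma ; \tau \rangle ,\delta ,\iota ,\xi ) \mid |t| = n \}. \end{aligned}$$

Also we define: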

$$\begin{aligned} \varLambda _n^\alpha (\delta ,\iota ,\xi ) \triangleq \, \bigcup _{\tau \in \mathrm {Types}(\delta ,\iota )}\, \varLambda _n^\alpha (\langle \emptyset ; \tau \rangle ,\delta ,\iota ,\xi ) \qquad \varLambda ^\alpha (\delta ,\iota ,\xi ) \triangleq \bigcup _{n \ge 1}\varLambda _n^\alpha (\delta ,\iota ,\xi ). \end{aligned}$$

Our main result is the following theorem, which will be proved in Sect. 4.

Theorem 1

Let \(\delta , \iota , \xi \ge 2\) be integers and let \(k = \min \{\delta , \iota \}\). Then,

$$ \lim _{n \rightarrow \infty } \frac{\#\!\left( \{ [t]_\alpha \in \varLambda _n^\alpha (\delta ,\iota ,\xi ) \mid \beta (t) \ge {\mathbf {exp}_{k-2}(n)} \}\right) }{\#\!\left( \varLambda _n^\alpha (\delta ,\iota ,\xi )\right) } = 1. $$

Remark 1

Note that in the above theorem, the order \(\delta \), the internal arity \(\iota \) and the number \(\xi \) of variables are bounded above by constants, independently of the term size \(n\). It is debatable whether this assumption is reasonable, and a slight change of the assumption may change the result, as is the case for strong normalization of untyped \(\lambda \)-terms [2, 4]. When \(\lambda \)-terms are viewed as models of functional programs, our rationale behind the assumption is as follows. The assumption that the size of types (hence also the order and the internal arity) is fixed is sometimes made in the context of type-based program analysis [18]. The assumption on the number of variables comes from the observation that a large program usually consists of a large number of small functions, and that the number of variables is bounded by the size of each function.

3 Analysis of \(\varLambda _n^\alpha (\delta ,\iota ,\xi )\)

To prove our main theorem, we first analyze some formal language theoretic structure and properties of \(\varLambda ^\alpha (\delta ,\iota ,\xi )\): in Sect. 3.1, we construct a regular tree grammar such that there is a size preserving bijection between its tree language and \(\varLambda ^\alpha (\delta ,\iota ,\xi )\); in Sect. 3.2, we show that the grammar has two important properties: irreducibility and aperiodicity. Thanks to those properties, we can obtain a simple asymptotic formula for \(\#\!\left( \varLambda _n^\alpha (\delta ,\iota ,\xi )\right) \) using analytic combinatorics [19]. The irreducibility and aperiodicity properties will also be used in Sect. 4 for adjusting the size and typing of a \(\lambda \)-term.

3.1 \(\varLambda ^\alpha (\delta ,\iota ,\xi )\) as a Regular Tree Language

We first recall some basic definitions for regular tree grammars. A ranked alphabet \(\varSigma \) is a mapping from a finite set of symbols to the set of natural numbers. For a symbol \(a \in \mathop {\mathrm {Dom}}(\varSigma )\), we call \(\varSigma (a)\) the rank of a. A \(\varSigma \) -tree is a tree composed from symbols in \(\varSigma \) according to their ranks: (i) a is a \(\varSigma \)-tree if \(\varSigma (a) = 0\), (ii) \(a(T_1, \cdots , T_{\varSigma (a)})\) is a \(\varSigma \)-tree if \(\varSigma (a) \ge 1\) and \(T_i\) is a \(\varSigma \)-tree for each \(i \in \{1,\ldots , \varSigma (a)\}\). We use the meta-variable \(T\) for trees. The size of \(T\), written as \(|T|\), is the number of nodes and leaves of \(T\). We denote the set of all \(\varSigma \)-trees by \(\mathcal{T}_{\varSigma }\).

A regular tree grammar is a triple \({\mathcal {G}}= (\varSigma , \mathcal{N}, {\mathcal {R}})\) where (i) \(\varSigma \) is a ranked alphabet; (ii) \(\mathcal{N}\) is a finite set of non-terminals; (iii) \({\mathcal {R}}\) is a finite set of rewriting rules of the form \(N \longrightarrow a(N_1, \cdots , N_{\varSigma (a)})\) where \(a \in \mathop {\mathrm {Dom}}(\varSigma )\), \(N \in \mathcal{N}\) and \(N_i \in \mathcal{N}\) for every \(i \in \{1,\ldots , \varSigma (a)\}\). A \((\varSigma \cup \mathcal{N})\) -tree \(T\) is a tree composed from symbols in \(\varSigma \cup \mathcal{N}\) according to their ranks where the rank of every symbol in \(\mathcal{N}\) is zero (thus non-terminals appear only in leaves of \(T\)). For a tree grammar \({\mathcal {G}}= (\varSigma , \mathcal{N}, {\mathcal {R}})\) and a non-terminal \(N \in \mathcal{N}\), the language \(\mathcal{L}\left( {\mathcal {G}},N\right) \) of N is defined by \(\mathcal{L}\left( {\mathcal {G}},N\right) \triangleq \{ T\in \mathcal{T}_{\varSigma } \mid N \longrightarrow ^*_{\mathcal {G}}T\}\) where \(\longrightarrow ^*_{\mathcal {G}}\) denotes the reflexive and transitive closure of the rewriting relation \(\longrightarrow _{\mathcal {G}}\). We also define \(\mathcal{L}_n\left( {\mathcal {G}},N\right) \triangleq \{ T\in \mathcal{T}_{\varSigma } \mid N \longrightarrow ^* T, |T| = n \}\). We often omit \({\mathcal {G}}\) and write \(N \longrightarrow ^* N'\), \(\mathcal{L}\left( N\right) \), and \(\mathcal{L}_n\left( N\right) \) for \(N \longrightarrow ^*_{\mathcal {G}}N'\), \(\mathcal{L}\left( {\mathcal {G}},N\right) \), and \(\mathcal{L}_n\left( {\mathcal {G}},N\right) \) respectively, if \({\mathcal {G}}\) is clear from the context. We say that \(N'\) is reachable from \(N\) if there exists a \((\varSigma \cup \mathcal{N})\)-tree \(T\) such that \(N \longrightarrow ^* T\) and \(T\) contains \(N'\) as a leaf. 
A grammar \({\mathcal {G}}\) is unambiguous if, for every pair of a non-terminal \(N\) and a tree \(T\), there exists at most one leftmost reduction sequence from \(N\) to \(T\).

Definition 3

(grammar of \(\varLambda ^\alpha (\delta ,\iota ,\xi )\) ). Let \(\delta ,\iota ,\xi \ge 0\) be integers and \(X_\xi = \{x_1, \cdots , x_\xi \}\) be a subset of V. The regular tree grammar \({\mathcal {G}}{(\delta ,\iota ,\xi )}\) is defined as \((\varSigma {(\delta ,\iota ,\xi )}, {\mathcal {N}}{(\delta ,\iota ,\xi )}, {\mathcal {R}}{(\delta ,\iota ,\xi )})\) where:

$$\begin{aligned} \varSigma {(\delta ,\iota ,\xi )} \triangleq&\ \{ x \mapsto 0 \mid x \in X_\xi \} \cup \{@ \mapsto 2\}\\ \cup&\ \{ \lambda \overline{x}^\tau \mapsto 1 \mid \overline{x}\in \{{*}\} \cup X_\xi ,\ \tau \in \mathrm {Types}(\delta -1,\iota ) \}\\ {\mathcal {N}}{(\delta ,\iota ,\xi )} \triangleq&\ \{N_{\langle \varGamma ; \tau \rangle } \mid \tau \in \mathrm {Types}(\delta ,\iota ), \mathop {\mathrm {Dom}}(\varGamma ) \subseteq X_\xi ,\ \mathrm {Im}\left( \varGamma \right) \subseteq \mathrm {Types}(\delta -1,\iota ), \\&\ \qquad \qquad \varGamma \vdash t: \tau \text { for some } t\ \} \\ {\mathcal {R}}{(\delta ,\iota ,\xi )} \triangleq&\ \{N_{\langle \{x_i:\tau \}; \tau \rangle } \longrightarrow x_i \} \cup \{ N_{\langle \varGamma ; \sigma {\rightarrow }\tau \rangle } \longrightarrow \lambda {*}^{\sigma } (N_{\langle \varGamma ; \tau \rangle }) \}\\ \cup&\ \{ N_{\langle \varGamma ; \sigma {\rightarrow }\tau \rangle } \longrightarrow \lambda x_i^{\sigma } (N_{\langle \varGamma \cup \{x_i:\sigma \}; \tau \rangle }) \mid i = \min \{ j \ge 1 \mid x_j \notin \mathop {\mathrm {Dom}}(\varGamma )\}, \\&\ \,\,\, \#\!\left( \varGamma \right) < \xi \} \cup \{N_{\langle \varGamma ; \tau \rangle } \longrightarrow @(N_{\langle \varGamma _1; \sigma {\rightarrow }\tau \rangle }, N_{\langle \varGamma _2; \sigma \rangle }) \mid \varGamma = \varGamma _1 \cup \varGamma _2\} \end{aligned}$$

Here, the special symbol \(@ \in \mathop {\mathrm {Dom}}(\varSigma {(\delta ,\iota ,\xi )})\) corresponds to application. For technical convenience, the above definition excludes from \({\mathcal {N}}{(\delta ,\iota ,\xi )}\) typings which have no inhabitant. Note that \(\varSigma {(\delta ,\iota ,\xi )}\), \({\mathcal {N}}{(\delta ,\iota ,\xi )}\) and \({\mathcal {R}}{(\delta ,\iota ,\xi )}\) are finite. To see the finiteness of \({\mathcal {N}}{(\delta ,\iota ,\xi )}\), notice that \(X_\xi \) and \(\mathrm {Types}(\delta -1,\iota )\) are finite, hence so is \(\{\varGamma \mid \mathop {\mathrm {Dom}}(\varGamma ) \subseteq X_\xi ,\ \mathrm {Im}\left( \varGamma \right) \subseteq \mathrm {Types}(\delta -1,\iota )\}\). The finiteness of \({\mathcal {R}}{(\delta ,\iota ,\xi )}\) follows immediately from that of \({\mathcal {N}}{(\delta ,\iota ,\xi )}\).
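The finiteness of \(\mathrm {Types}(\delta ,\iota )\) can be made tangible by enumerating it. The sketch below assumes our own encoding of a type \(\tau _1 \rightarrow \cdots \rightarrow \tau _n \rightarrow \mathtt {o}\) as the tuple \((\tau _1, \ldots , \tau _n)\) and uses the observation that such a type lies in \(\mathrm {Types}(\delta ,\iota )\) exactly when \(n \le \iota \) and every \(\tau _i\) lies in \(\mathrm {Types}(\delta -1,\iota )\).

```python
from itertools import product


def types(delta, iota):
    """Enumerate Types(delta, iota); the base type o is the empty tuple."""
    if delta < 0:
        return []          # no type has negative order
    if delta == 0 or iota == 0:
        return [()]        # only the base type o
    args = types(delta - 1, iota)
    result = [()]
    for n in range(1, iota + 1):
        # every way of choosing n argument types of order at most delta - 1
        result.extend(product(args, repeat=n))
    return result
```

For example, `types(1, 1)` contains just \(\mathtt {o}\) and \(\mathtt {o}\rightarrow \mathtt {o}\), and `types(2, 2)` has 13 elements; the set grows quickly with \(\delta \) but stays finite.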

Example 1

Let us consider the case where \(\delta = \iota = \xi = 1\). The grammar \({\mathcal {G}}{(1,1,1)}\) consists of the following.

$$\begin{aligned} \varSigma {(1,1,1)} \!=&\{ x_1, @, \lambda x_1^\mathtt {o}, \lambda {*}^\mathtt {o}\} \qquad {\mathcal {N}}{(1,1,1)} = \{ N_{\langle \emptyset ; \mathtt {o}{\rightarrow }\mathtt {o} \rangle }, N_{\langle \{x_1: \mathtt {o}\}; \mathtt {o} \rangle }, N_{\langle \{x_1: \mathtt {o}\}; \mathtt {o}{\rightarrow }\mathtt {o} \rangle } \}\\ {\mathcal {R}}{(1,1,1)} \!=\!&{\left\{ \begin{array}{ll} N_{\langle \emptyset ; \mathtt {o}{\rightarrow }\mathtt {o} \rangle } \longrightarrow \lambda x_1^\mathtt {o}(N_{\langle \{x_1:\mathtt {o}\}; \mathtt {o} \rangle })\\ N_{\langle \{x_1: \mathtt {o}\}; \mathtt {o} \rangle } \!\longrightarrow x_1 \!\!\mid \!\! @(\!N_{\langle \{x_1: \mathtt {o}\}; {\mathtt {o}{\rightarrow }\mathtt {o}} \rangle }\!,N_{\langle \{x_1: \mathtt {o}\}; \mathtt {o} \rangle }\!) \!\!\mid \!\! @(\!N_{\langle \emptyset ; \mathtt {o}{\rightarrow }\mathtt {o} \rangle }\!,N_{\langle \{x_1: \mathtt {o}\}; \mathtt {o} \rangle }\!) \\ N_{\langle \{x_1: \mathtt {o}\}; \mathtt {o}{\rightarrow }\mathtt {o} \rangle } \longrightarrow \lambda {*}^\mathtt {o}(N_{\langle \{x_1: \mathtt {o}\}; \mathtt {o} \rangle }). \end{array}\right. } \end{aligned}$$
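The rules of Example 1 are small enough to count the generated trees by size. In the following sketch the shorthand names `A`, `B`, `C` for the three non-terminals are ours, and the counting relies on each tree having a unique derivation, so that counting derivations is the same as counting trees.

```python
from functools import lru_cache

# Rules of G(1,1,1), with hypothetical shorthand names:
#   A = N<∅; o→o>,  B = N<{x1:o}; o>,  C = N<{x1:o}; o→o>
RULES = {
    "A": [("lam x1", ("B",))],
    "B": [("x1", ()), ("@", ("C", "B")), ("@", ("A", "B"))],
    "C": [("lam *", ("B",))],
}


@lru_cache(maxsize=None)
def count(nonterminal, n):
    """Number of trees of size n derivable from the non-terminal."""
    # Each rule contributes its root symbol (size 1) plus its children.
    return sum(split(children, n - 1) for _symbol, children in RULES[nonterminal])


@lru_cache(maxsize=None)
def split(children, budget):
    """Number of ways to derive the children with total size equal to budget."""
    if not children:
        return 1 if budget == 0 else 0
    head, rest = children[0], children[1:]
    # The head takes size k; every remaining child needs size at least 1.
    return sum(count(head, k) * split(rest, budget - k)
               for k in range(1, budget - len(rest) + 1))
```

For sizes 1 to 8 the counts for `A` are 0, 1, 0, 0, 2, 0, 0, 8. The two trees of size 5 correspond to \(\lambda x_1.(\lambda x_1.x_1)\,x_1\) and \(\lambda x_1.(\lambda {*}.x_1)\,x_1\). In this small instance the counts are non-zero only for sizes congruent to 2 modulo 3.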

There is the obvious embedding \(e^{(\delta ,\iota ,\xi )}\) (\(e\) for short) from trees in \(\mathcal{T}_{\varSigma {(\delta ,\iota ,\xi )}}\) into \(\lambda \)-terms. For \(N_{\langle \varGamma ; \tau \rangle } \in {\mathcal {N}}{(\delta ,\iota ,\xi )}\) we define

$$ \pi ^{(\delta ,\iota ,\xi )}_{\langle \varGamma ; \tau \rangle } \triangleq [-]_\alpha \circ e: \mathcal{L}\left( N_{\langle \varGamma ; \tau \rangle }\right) \rightarrow \varLambda ^\alpha (\langle \varGamma ; \tau \rangle ,\delta ,\iota ,\xi ). $$

We sometimes omit the superscript and/or the subscript.

Proposition 1

For \(\delta ,\iota ,\xi \ge 0\), \(\pi _{\langle \varGamma ; \tau \rangle }\) is a size-preserving bijection, and \({\mathcal {G}}{(\delta ,\iota ,\xi )}\) is unambiguous.

The former part of Proposition 1 says that \({\mathcal {G}}{(\delta ,\iota ,\xi )}\) gives a complete representation system of the \(\alpha \)-equivalence classes. For \([t]_\alpha \in \varLambda ^\alpha (\langle \varGamma ; \tau \rangle ,\delta ,\iota ,\xi )\), we define \(\nu _{\langle \varGamma ; \tau \rangle }^{(\delta ,\iota ,\xi )}([t]_\alpha )\) (or \(\nu ([t]_\alpha )\) for short) as \(e^{(\delta ,\iota ,\xi )}\circ \left( \pi ^{(\delta ,\iota ,\xi )}_{\langle \varGamma ; \tau \rangle }\right) ^{-1}([t]_\alpha )\). The function \(\nu \) normalizes variable names. For example, \(t = \lambda x. x(\lambda y. \lambda z. z)\) is normalized to \(\nu ([t]_\alpha ) = \lambda x_1. x_1 (\lambda {*}. \lambda x_1. x_1)\).
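The renaming performed by \(\nu \) can be sketched as a recursive function on closed terms. The code below is our reading of the rule choice in Definition 3, with type annotations omitted: a binder whose variable does not occur free in its body becomes \({*}\), and any other binder receives the name \(x_i\) with the smallest index \(i\) not already used by a free variable of the abstraction. The names `normalize` and `free_vars` are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str      # a variable name, or "*" for the unused variable
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def free_vars(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)


def normalize(t, env=None):
    """Rename bound variables; env maps free source names to normalized names."""
    env = {} if env is None else env
    if isinstance(t, Var):
        return Var(env[t.name])
    if isinstance(t, App):
        return App(normalize(t.fun, env), normalize(t.arg, env))
    if t.var == "*" or t.var not in free_vars(t.body):
        return Lam("*", normalize(t.body, env))
    # Names already taken by the free variables of this abstraction.
    taken = {env[y] for y in free_vars(t)}
    i = 1
    while f"x{i}" in taken:
        i += 1
    return Lam(f"x{i}", normalize(t.body, {**env, t.var: f"x{i}"}))


# t = λx. x (λy. λz. z)
t = Lam("x", App(Var("x"), Lam("y", Lam("z", Var("z")))))
```

On this input `normalize(t)` yields the encoding of \(\lambda x_1. x_1 (\lambda {*}. \lambda x_1. x_1)\), in agreement with the example above.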

For technical reasons, we restrict the grammar \({\mathcal {G}}{(\delta ,\iota ,\xi )}\) to \({\mathcal {G}}^{\emptyset }{(\delta ,\iota ,\xi )}\), which contains only non-terminals reachable from \(N_{\langle \emptyset ; \sigma \rangle }\) for some \(\sigma \) (see the full version [17] for details).

$$ \begin{aligned} {\mathcal {N}}^{\emptyset }{(\delta ,\iota ,\xi )}&\triangleq \{N_{\theta } \in {\mathcal {N}}{(\delta ,\iota ,\xi )} \mid N_{\theta } \text { is reachable from some }N_{\langle \emptyset ; \sigma \rangle } \in {\mathcal {N}}{(\delta ,\iota ,\xi )}\} \\ {\mathcal {R}}^{\emptyset }{(\delta ,\iota ,\xi )}&\triangleq \{N_{\theta } \longrightarrow T \in {\mathcal {R}}{(\delta ,\iota ,\xi )} \mid N_{\theta } \in {\mathcal {N}}^{\emptyset }{(\delta ,\iota ,\xi )}\} \\ {\mathcal {G}}^{\emptyset }{(\delta ,\iota ,\xi )}&\triangleq (\varSigma {(\delta ,\iota ,\xi )}, {\mathcal {N}}^{\emptyset }{(\delta ,\iota ,\xi )}, {\mathcal {R}}^{\emptyset }{(\delta ,\iota ,\xi )}). \end{aligned} $$

For \(N_{\theta } \in {\mathcal {N}}^{\emptyset }{(\delta ,\iota ,\xi )}\), clearly \(\mathcal{L}\left( {\mathcal {G}}^{\emptyset }{(\delta ,\iota ,\xi )},N_{\theta }\right) = \mathcal{L}\left( {\mathcal {G}}{(\delta ,\iota ,\xi )},N_{\theta }\right) \). Through the bijection \(\pi \), we can show that, for any \(N_{\langle \varGamma ; \tau \rangle } \in {\mathcal {N}}{(\delta ,\iota ,\xi )}\), \(N_{\langle \varGamma ; \tau \rangle }\) also belongs to \({\mathcal {N}}^{\emptyset }{(\delta ,\iota ,\xi )}\) if and only if there exists a term in \(\varLambda ^\alpha (\delta ,\iota ,\xi )\) whose derivation contains a type judgment of the form \(\varGamma \vdash t: \tau \).

3.2 Irreducibility and Aperiodicity

We discuss two important properties of the grammar \({\mathcal {G}}^{\emptyset }{(\delta ,\iota ,\xi )}\) where \(\delta ,\iota ,\xi \ge 2\): irreducibility and aperiodicity [19].

Definition 4

(irreducibility and aperiodicity). Let \({\mathcal {G}}= (\varSigma , \mathcal{N}, {\mathcal {R}})\) be a regular tree grammar. We say that \({\mathcal {G}}\) is:

  • non-linear if \({\mathcal {R}}\) contains at least one rule of the form \(N \longrightarrow a(N_1, \cdots , N_{\varSigma (a)})\) with \(\varSigma (a) \ge 2\),

  • strongly connected if for any pair of non-terminals \(N_1, N_2 \in \mathcal{N}\), \(N_1\) is reachable from \(N_2\),

  • irreducible if \({\mathcal {G}}\) is both non-linear and strongly connected,

  • aperiodic if for any non-terminal \(N \in \mathcal{N}\) there exists an integer \(m > 0\) such that \(\#\!\left( \mathcal{L}_n\left( N\right) \right) > 0\) for any \(n > m\).
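The first three conditions of Definition 4 are directly checkable on a finite grammar. The sketch below uses a hypothetical dictionary encoding of rules (each non-terminal maps to a list of pairs of a symbol and its child non-terminals) and tests non-linearity and strong connectivity; it does not attempt to decide aperiodicity.

```python
def is_non_linear(rules):
    """Some rule has a right-hand side symbol of rank at least 2."""
    return any(len(children) >= 2
               for alternatives in rules.values()
               for _symbol, children in alternatives)


def reachable_from(rules, start):
    """Non-terminals occurring as a leaf of some tree derivable from start."""
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for _symbol, children in rules[current]:
            for child in children:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
    return seen


def is_strongly_connected(rules):
    return all(reachable_from(rules, n) == set(rules) for n in rules)


def is_irreducible(rules):
    return is_non_linear(rules) and is_strongly_connected(rules)


# A small grammar with three mutually reachable non-terminals (names assumed).
EXAMPLE = {
    "A": [("lam x1", ["B"])],
    "B": [("x1", []), ("@", ["C", "B"]), ("@", ["A", "B"])],
    "C": [("lam *", ["B"])],
}
```

Here `is_irreducible(EXAMPLE)` holds, whereas a grammar whose rules all have rank at most 1, or one containing a non-terminal from which another cannot be reached, fails the check.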

Proposition 2

\({\mathcal {G}}^{\emptyset }{(\delta ,\iota ,\xi )}\) is irreducible and aperiodic for any \(\delta ,\iota , \xi \ge 2\).

The following theorem is a minor modification of Theorem VII.5 in [19], which states the asymptotic behavior of an irreducible and aperiodic context-free specification (see the full version [17] for details). Below, \(\sim \) means the asymptotic equality, i.e., \(f(n) \sim g(n) \; \Longleftrightarrow \; \lim _{n \rightarrow \infty } f(n)/g(n) = 1.\)

Theorem 2

([19]). Let \({\mathcal {G}}= (\varSigma , \mathcal{N}, {\mathcal {R}})\) be an unambiguous, irreducible and aperiodic regular tree grammar. Then there exists a constant \(\gamma ({\mathcal {G}}) >1\) such that, for any non-terminal \(N \in \mathcal{N}\), there exists a constant \(C_{N}({\mathcal {G}}) > 0\) such that

$$ \#\!\left( \mathcal{L}_n\left( N\right) \right) \sim C_{N}({\mathcal {G}}) \gamma ({\mathcal {G}})^n n^{-3/2}. $$

As a corollary of Proposition 2 and Theorem 2 above, we obtain:

$$\begin{aligned} \#\!\left( \varLambda _n^\alpha (\delta ,\iota ,\xi )\right) \sim C \gamma ^n n^{-3/2} \end{aligned} \tag{1}$$

where \(C>0\) and \(\gamma > 1\) are some real constants determined by \(\delta ,\,\iota ,\,\xi \ge 2\). For proving our main theorem, we use a variation of the formula (1) above, stated as Lemma 1 later.

4 Proof of the Main Theorem

We give a proof of Theorem 1 in this section. In the rest of the paper, we denote by \(\log ^{(2)}(n)\) the 2-fold logarithm: \(\log ^{(2)}(n) \triangleq \log \log n\). All logarithms are base 2. The outline of the proof is as follows. We prepare a family \((t_n)_{n \in \mathbb {N}}\) of \(\lambda \)-terms such that \(t_n\) is of size \(\varOmega (\log ^{(2)}(n))\) and has a \(\beta \)-reduction sequence of length \({\mathbf {exp}_{k}(\varOmega (|t_n|))}\), i.e., \({\mathbf {exp}_{k-2}(\varOmega (n))}\). Then we show that almost every \(\lambda \)-term of size \(n\) contains \(t_n\) as a subterm. The latter is shown by adapting (a parameterized version of) the infinite monkey theorem for words to simply-typed \(\lambda \)-terms.

To clarify the idea, let us first recall the infinite monkey theorem for words. Let A be an alphabet, i.e., a finite non-empty set of symbols. For a word \(w = a_1\cdots a_n\), we write \(|w| = n\) for the length of w. As usual, we denote by \(A^n\) the set of all words of length n over A, and by \(A^*\) the set of all finite words over A: \(A^* = \bigcup _{n \ge 0} A^n\). For two words \(w, w' \in A^*\), we say \(w'\) is a subword of w and write \(w' \sqsubseteq w\) if \(w = w_1 w' w_2\) for some words \(w_1, w_2 \in A^*\). The infinite monkey theorem states that, for any word \(w \in A^*\), the probability that a randomly chosen word of length n contains w as a subword tends to one as n tends to infinity.

To prove our main theorem, we need to extend the above infinite monkey theorem to the following parameterized version, and then further extend it to simply-typed \(\lambda \)-terms instead of words. We give a proof of the following proposition, because it will clarify the overall structure of the proof of the main theorem.

Proposition 3

(parameterized infinite monkey theorem). Let A be an alphabet and \((w_n)_n\) be a family of words over A such that \(|w_n| = \lceil \log ^{(2)}(n) \rceil \). Then, we have:

$$ \lim _{n \rightarrow \infty } \frac{\#\!\left( \{ w \in A^n \mid w_n \sqsubseteq w\}\right) }{\#\!\left( A^n\right) } = 1. $$

Proof

Let \(p(n)\) be \(1 - \#\!\left( \{ w \in A^n \mid w_n \sqsubseteq w \}\right) /\#\!\left( A^n\right) \), i.e., the probability that a word of length \(n\) does not contain \(w_n\). We write \(s(n)\) for \(\lceil \log ^{(2)}(n) \rceil \) and \(c(n)\) for \(\lfloor n/s(n) \rfloor \). Given a word \(w = a_1\cdots a_n\in A^n\), let us partition it into subwords of length \(s(n)\) as follows.

$$ w = \underbrace{a_1 \cdots a_{s(n)}}_{\text {1-st subword}} \cdots \underbrace{a_{(c(n)-1) s(n)+1} \cdots a_{c(n)s(n)}}_{c(n)\text {-th subword}} a_{c(n) s(n)+1} \cdots a_{n} $$

Then,

$$ \begin{array}{l} p(n) \le ``\text {the probability that none of the }c(n)\text { subwords above is }w_{n}'' \\ \quad = \left( \frac{\#\!\left( A^{s(n)} \setminus \{w_n\}\right) }{\#\!\left( A^{s(n)}\right) } \right) ^{c(n)} = \left( \frac{\#\!\left( A^{s(n)}\right) - 1}{\#\!\left( A^{s(n)}\right) } \right) ^{c(n)} = \left( 1-\frac{1}{\#\!\left( A\right) ^{s(n)}}\right) ^{c(n)}. \end{array} $$

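Since \(\left( 1-\frac{1}{\#\!\left( A\right) ^{s(n)}}\right) ^{c(n)} = \left( 1-\frac{1}{\#\!\left( A\right) ^{\lceil \log ^{(2)}(n) \rceil }}\right) ^{\lfloor n/\lceil \log ^{(2)}(n) \rceil \rfloor }\) tends to zero as \(n\) tends to infinity (see the full version [17]), we have the required result. Intuitively, the right-hand side is at most \(e^{-c(n)/\#\!\left( A\right) ^{s(n)}}\), and \(\#\!\left( A\right) ^{s(n)} \le \#\!\left( A\right) \cdot (\log n)^{\log \#\!\left( A\right) }\) is only polylogarithmic in \(n\), whereas \(c(n)\) grows almost linearly.   \(\square \)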
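The bound in the proof can also be explored numerically. The following sketch evaluates \(\left( 1-1/\#\!\left( A\right) ^{s(n)}\right) ^{c(n)}\) and, for very small \(n\), compares it with the exact probability obtained by brute-force enumeration. The function names are ours, and \(n \ge 3\) is assumed so that \(s(n) \ge 1\).

```python
import math
from itertools import product


def s(n):
    """s(n) = ceil(log log n) with base-2 logarithms; assumes n >= 3."""
    return math.ceil(math.log2(math.log2(n)))


def c(n):
    """c(n) = floor(n / s(n)), the number of full blocks of length s(n)."""
    return n // s(n)


def miss_bound(alphabet_size, n):
    """Upper bound on the probability that no block equals w_n."""
    return (1 - alphabet_size ** -s(n)) ** c(n)


def exact_miss_probability(alphabet, n, w):
    """Exact fraction of words of length n that do not contain w (brute force)."""
    words = ("".join(letters) for letters in product(alphabet, repeat=n))
    misses = sum(1 for word in words if w not in word)
    return misses / len(alphabet) ** n
```

For the alphabet \(\{a, b\}\) and \(n = 8\) we have \(s(n) = 2\); with \(w_n = ab\) the exact value `exact_miss_probability("ab", 8, "ab")` is \(9/256\), below `miss_bound(2, 8)`, which is \((3/4)^4\). The bound is far from tight, since it only inspects disjoint blocks, but it is all the proof needs.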

To prove an analogous result for simply-typed \(\lambda \)-terms, we consider below subcontexts of a given term instead of subwords of a given word. To consider “contexts up to \(\alpha \)-equivalence”, in Sect. 4.1 we introduce the set \(\mathcal{U}^{\nu }_{n}(\delta ,\iota ,\xi )\) of “normalized” contexts (of size \(n\) and with the restriction by \(\delta \), \(\iota \) and \(\xi \)), where \(\mathcal{U}^{\nu }_{s(n)}(\delta ,\iota ,\xi )\) corresponds to \(A^{s(n)}\) above, and give an upper bound on \(\#\!\left( \mathcal{U}^{\nu }_{n}(\delta ,\iota ,\xi )\right) \). A key property used in the above proof was that any word of length \(n\) can be partitioned into sufficiently many subwords of length \(\lceil \log ^{(2)}(n) \rceil \). Section 4.2 below shows an analogous result that any term of size \(n\) can be decomposed into sufficiently many subcontexts of a given size. Section 4.3 constructs a family of contexts \( Expl _{n}^{k}\) (called “explosive contexts”) that have very long reduction sequences; \(( Expl _{n}^{k})_n\) corresponds to \((w_n)_n\) above. Finally, Sect. 4.4 proves the main theorem using an argument similar to (but more involved than) the one used in the proof above.

4.1 Normalized Contexts

We first introduce some basic definitions of contexts, and then we define the notion of a normalized context, which is a context normalized by the function \(\nu \) given in Sect. 3.1.

The set of contexts, ranged over by \(C\), is defined by

$$ C {::}= [\,]\mid x \mid \lambda \overline{x}^\tau \!. C \mid CC $$

The size of \(C\), written \(|C|\), is defined by:

$$ |[\,]| \triangleq 0 \qquad |x| \triangleq 1 \qquad |\lambda \overline{x}^\tau \!. C| \triangleq |C| + 1 \qquad |C_1C_2| \triangleq |C_1| + |C_2| +1. $$

We call a context \(C\) an \(n\) -context (and define \(\mathtt {hn}(C)\triangleq n\)) if \(C\) contains \(n\) occurrences of \([\,]\). We use the metavariable \(S\) for 1-contexts. A 0/1-context is a term \(t\) or a 1-context \(S\) and we use the metavariable \(u\) to denote 0/1-contexts. The holes in \(C\) occur as leaves and we write \([\,]_i\) for the \(i\)-th hole, which is counted in the left-to-right order.

For \(C\), \(C_1\), ..., \(C_{\mathtt {hn}(C)}\), we write \(C[C_1]\dots [C_{\mathtt {hn}(C)}]\) for the context obtained by replacing \([\,]_i\) in \(C\) with \(C_i\) for each \(i \le \mathtt {hn}(C)\). For \(C\) and \(C'\), we write \(C[C']_i\) for the context obtained by replacing the \(i\)-th hole \([\,]_i\) in \(C\) with \(C'\). As usual, these substitutions may capture variables; e.g., \((\lambda x.[\,])[x]\) is \(\lambda x.x\). We say that \(C\) is a subcontext of \(C'\) and write \(C \preceq C'{} \) if there exist \(C''\), \(1 \le i \le \mathtt {hn}(C'')\) and \(C_1,\cdots ,C_{\mathtt {hn}(C)}\) such that \(C' = C''[C[C_1]\cdots [C_{\mathtt {hn}(C)}]]_i\).
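The definitions of \(|C|\), \(\mathtt {hn}(C)\) and hole filling can be sketched in a few lines of Python. This is only an illustrative model under our own assumptions: the names `Hole`, `Var`, `Lam`, `App`, `size`, `hn` and `fill` are hypothetical and not from the paper, and the type annotations \(\tau \) on binders are omitted.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Hole:
    """The hole [ ]."""


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    binder: str   # a variable name, or "*" for an unused binder
    body: "Ctx"


@dataclass(frozen=True)
class App:
    fun: "Ctx"
    arg: "Ctx"


Ctx = Union[Hole, Var, Lam, App]


def size(c: Ctx) -> int:
    """|C|: a hole counts 0, a variable 1, each binder and application adds 1."""
    if isinstance(c, Hole):
        return 0
    if isinstance(c, Var):
        return 1
    if isinstance(c, Lam):
        return size(c.body) + 1
    return size(c.fun) + size(c.arg) + 1


def hn(c: Ctx) -> int:
    """hn(C): the number of hole occurrences in C."""
    if isinstance(c, Hole):
        return 1
    if isinstance(c, Var):
        return 0
    if isinstance(c, Lam):
        return hn(c.body)
    return hn(c.fun) + hn(c.arg)


def fill(c: Ctx, fillers: List[Ctx]) -> Ctx:
    """C[C_1]...[C_hn(C)]: fill holes left to right; variables may be captured."""
    if len(fillers) != hn(c):
        raise ValueError("need exactly one filler per hole")
    remaining = iter(fillers)

    def go(d: Ctx) -> Ctx:
        if isinstance(d, Hole):
            return next(remaining)
        if isinstance(d, Var):
            return d
        if isinstance(d, Lam):
            return Lam(d.binder, go(d.body))
        fun = go(d.fun)   # left subtree first, so holes are numbered left to right
        arg = go(d.arg)
        return App(fun, arg)

    return go(c)


# The capturing example from the text: (\x.[ ])[x] is \x.x.
captured = fill(Lam("x", Hole()), [Var("x")])
```

Because no renaming is performed in `fill`, the filler `x` ends up bound by the enclosing binder, which is exactly the capturing behaviour described above.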

The set of context typings, ranged over by \(\kappa \), is defined by: \(\kappa {:}{:}= \theta _1\cdots \theta _k{\Rightarrow }\theta \) where \(k \in \mathbb {N}\) and \(\theta _i\) is a typing of the form \(\langle \varGamma _i; \tau _i \rangle {} \) for each \(1 \le i \le k\) (recall that we use \(\theta \) as a metavariable for typings). A \(\langle \varGamma _1; \tau _1 \rangle \cdots \langle \varGamma _k; \tau _k \rangle {\Rightarrow }\langle \varGamma ; \tau \rangle \)-context is a \(k\)-context \(C\) such that \(\varGamma \vdash C:\tau \) is derivable from \(\varGamma _i\vdash [\,]_i:\tau _i\). We identify a context typing \({\Rightarrow }\theta \) with the typing \(\theta \), and call a \(\theta \)-context also a \(\theta \) -term.

We now turn to the definition of normalized contexts. First we consider contexts in terms of the grammar \({\mathcal {G}}^{\emptyset }{(\delta ,\iota ,\xi )}\) given in Sect. 3.1. Let \(\delta ,\iota ,\xi \ge 0\). For \(\kappa =\theta _1\cdots \theta _n{\Rightarrow }\theta \) such that \(N_{\theta _1},\dots ,N_{\theta _n},N_{\theta } \in {\mathcal {N}}{(\delta ,\iota ,\xi )}\), a (\(\kappa \)-)context-tree is a tree \(\widehat{T}\) in \(\mathcal{T}_{\varSigma {(\delta ,\iota ,\xi )}\cup {\mathcal {N}}{(\delta ,\iota ,\xi )}}\) such that there exists a reduction \(N_{\theta } \longrightarrow ^{*} \widehat{T}\) and the occurrences of non-terminals in \(\widehat{T}\) (in the left-to-right order) are exactly \(N_{\theta _1},\dots ,N_{\theta _n}\). We use \(\widehat{T}\) as a metavariable for context-trees. We write \(\mathcal{L}\left( \kappa ,\delta ,\iota ,\xi \right) \) for the set of all \(\kappa \)-context-trees. For a \(\theta _1\cdots \theta _n{\Rightarrow }\theta \)-context-tree \(\widehat{T}\) and \(\theta ^i_1\cdots \theta ^i_{k_i}{\Rightarrow }\theta _i\)-context-trees \(\widehat{T}_i\) \((i=1,\dots ,n)\), we define the substitution \( \widehat{T}[\widehat{T}_1]\cdots [\widehat{T}_n] \) as the \(\theta ^1_1\cdots \theta ^1_{k_1}\cdots \theta ^n_1\cdots \theta ^n_{k_n}{\Rightarrow }\theta \)-context-tree obtained by replacing \(N_{\theta _i}\) in \(\widehat{T}\) with \(\widehat{T}_i\) for each \(i\).

The set \(\mathcal{C}^{\nu }(\kappa ,\delta ,\iota ,\xi )\) of normalized \(\kappa \) -contexts is defined by:

$$ \mathcal{C}^{\nu }(\kappa ,\delta ,\iota ,\xi ) \triangleq e^{(\delta ,\iota ,\xi )}_{\kappa }(\mathcal{L}\left( \kappa ,\delta ,\iota ,\xi \right) ) $$

where \(e^{(\delta ,\iota ,\xi )}_{\kappa }{} \) is the obvious embedding from \(\kappa \)-context-trees to \(\kappa \)-contexts that preserves the substitution (i.e., \(e^{(\delta ,\iota ,\xi )}_{\kappa }(T[T']) = e^{(\delta ,\iota ,\xi )}_{\kappa }(T)[e^{(\delta ,\iota ,\xi )}_{\kappa }(T')]\)). Further, the sets \(\mathcal{U}^{\nu }(\delta ,\iota ,\xi )\) and \( \mathcal{U}^{\nu }_{n}(\delta ,\iota ,\xi )\) of normalized 0/1-contexts are defined by:

$$\begin{aligned} \mathcal{U}^{\nu }(\delta ,\iota ,\xi )&\triangleq \Big ({\bigcup _{N_{\theta } \in {\mathcal {N}}^{\emptyset }{(\delta ,\iota ,\xi )}}} \mathcal{C}^{\nu }(\theta ,\delta ,\iota ,\xi )\Big ) \ \bigcup \ \Big ({\bigcup _{N_{\theta },N_{\theta '} \in {\mathcal {N}}^{\emptyset }{(\delta ,\iota ,\xi )}}} \mathcal{C}^{\nu }(\theta {\Rightarrow }\theta ',\delta ,\iota ,\xi )\Big ) \\ \mathcal{U}^{\nu }_{n}(\delta ,\iota ,\xi )&\triangleq \{ u \in \mathcal{U}^{\nu }(\delta ,\iota ,\xi ) \mid |u| = n\}. \end{aligned}$$

In our proof of the main theorem, the set \(\mathcal{U}^{\nu }_{s(n)}(\delta ,\iota ,\xi )\) plays a role corresponding to \(A^{s(n)}\) in the word case explained above. Note that in the word case we calculated the limit of some upper bound of \(p(n)\); similarly, in our proof, we only need an upper bound of \(\#\!\left( \mathcal{U}^{\nu }_{n}(\delta ,\iota ,\xi )\right) {} \), which is given as follows.

Lemma 1

(upper bound of \(\#\!\left( \mathcal{U}^{\nu }_{n}(\delta ,\iota ,\xi )\right) {} \) ). For any \(\delta , \iota , \xi \ge 2\), there exists some constant \(\overline{\gamma }(\delta ,\iota ,\xi ) > 1\) such that \( \#\!\left( \mathcal{U}^{\nu }_{n}(\delta ,\iota ,\xi )\right) = \mathrm {O}(\overline{\gamma }(\delta ,\iota ,\xi )^n).\)

Proof Sketch

Given an unambiguous, irreducible and aperiodic regular tree grammar, adding a new terminal of the form \(a_N\) and a new rule of the form \(N \longrightarrow a_N\) for each non-terminal \(N\) does not change the unambiguity, irreducibility and aperiodicity. Let \(\overline{{\mathcal {G}}}^{\emptyset }{(\delta ,\iota ,\xi )}\) be the grammar obtained by applying this transformation to \({\mathcal {G}}^{\emptyset }{(\delta ,\iota ,\xi )}{} \). We can regard a tree of \(\overline{{\mathcal {G}}}^{\emptyset }{(\delta ,\iota ,\xi )}\) as a normalized context, with \(a_{N_{\theta }}\) considered a hole with typing \(\theta \). Then, clearly we have

$$ \#\!\left( \mathcal{U}^{\nu }_{n}(\delta ,\iota ,\xi )\right) \le \#\!\left( \cup _{N\in {\mathcal {N}}^{\emptyset }{(\delta ,\iota ,\xi )}} \mathcal{L}_n\left( \overline{{\mathcal {G}}}^{\emptyset }{(\delta ,\iota ,\xi )}, N\right) \right) . $$

Thus the required result follows from Theorem 2.    \(\square \)

4.2 Decomposition

As explained at the beginning of this section, to prove the parameterized infinite monkey theorem for terms, we need to decompose a \(\lambda \)-term into sufficiently many subcontexts of the term. Thus, in this subsection, we will define a decomposition function \(\widehat{\varPhi }_m\) (where \(m\) is a parameter) that decomposes a term \(t\) into (i) a (sufficiently long) sequence \(P\) of 0/1-subcontexts of \(t\) such that every component \(u\) of \(P\) satisfies \(|u| \ge m\), and (ii) a “second-order” context \(E\) (defined later), which is the remainder after extracting \(P\) from \(t\). Figure 1 illustrates how a term is decomposed by \(\widehat{\varPhi }_{3}\). Here, the symbols \({\![\![ \, \, \!]\!]}\) in the second-order context on the right-hand side represent the original positions of the subcontexts \((\lambda y.[\,])x\), \(\lambda z. \lambda {*}.z\), and \((\lambda {*}.y)\lambda z.z\).

Fig. 1. Example of a decomposition

In order to define \(\widehat{\varPhi }_m\), let us give a precise definition of second-order contexts. The set of second-order contexts, ranged over by \(E\), is defined by:

$$ E {:}{:}= \![\![ \,\, \!]\!]^{\theta _1\cdots \theta _k{\Rightarrow }\theta }_{n}[E_1]\cdots [E_k] \ (n\in \mathbb {N}) \mid x \mid \lambda \overline{x}^\tau \!.E \mid E_1E_2. $$

Intuitively, the second-order context is an expression having holes of the form \([\![ \,\, \!]\!]^{\kappa }_n\) (called second-order holes). In the second-order context \([\![ \,\, \!]\!]^{\theta _1\cdots \theta _k{\Rightarrow }\theta }_{n}[E_1]\cdots [E_k]\), \([\![ \,\, \!]\!]^{\theta _1\cdots \theta _k{\Rightarrow }\theta }_{n}\) should be filled with a \(\theta _1\cdots \theta _k{\Rightarrow }\theta \)-context of size \(n\), yielding a term whose typing is \(\theta \). We use the metavariable \(P\) for sequences of contexts. For a sequence of contexts \(P=C_1\cdot C_2\cdots C_\ell \) and \(i\le \ell \), we write \(\#\!\left( P\right) {} \) for the length \(\ell \), and \(P {\centerdot } i\) for the \(i\)-th component \(C_i\).

We define \(|\![\![ \,\, \!]\!]^{\kappa }_n| \triangleq n\). We write \(\mathtt {shn}(E)\) for the number of the second-order holes in \(E\). For \(i \le \mathtt {shn}(E)\), we write \(E {\centerdot } i\) for the \(i\)-th second-order hole (counted in the depth-first left-to-right pre-order). For a context \(C\) and a second-order hole \([\![ \,\, \!]\!]^{\kappa }_n\), we write \(C : \![\![ \,\, \!]\!]^{\kappa }_n\) if \(C\) is a \(\kappa \)-context of size \(n\). For \(E\) and \(P = C_1\cdot C_2\cdot \cdots C_{\mathtt {shn}(E)}{} \), we write \(P : E{} \) if \(C_i : E {\centerdot } i{} \) for each \(i \le \mathtt {shn}(E)\). We distinguish between second-order contexts with different annotations; for example, \([\![ \,\, \!]\!]^{\langle \{x:\mathtt {o}\}; \mathtt {o} \rangle {\Rightarrow }\langle \{x:\mathtt {o}\}; \mathtt {o} \rangle }_{0}[x]\), \([\![ \,\, \!]\!]^{\langle \{x:\mathtt {o}\}; \mathtt {o} \rangle {\Rightarrow }\langle \{x:\mathtt {o}\}; \mathtt {o} \rangle }_{2}[x]\) and \([\![ \,\, \!]\!]^{\langle \{x:\mathtt {o}\rightarrow \mathtt {o}\}; \mathtt {o}\rightarrow \mathtt {o} \rangle {\Rightarrow }\langle \{x:\mathtt {o}\rightarrow \mathtt {o}\}; \mathtt {o}\rightarrow \mathtt {o} \rangle }_2[x]\) are different from each other. Note that every term can be regarded as a second-order context \(E\) such that \(\mathtt {shn}(E) = 0\).

The bracket \([-]\) in a second-order context is just a syntactical representation rather than the substitution operation of contexts. Given \(E\) and \(C\) such that \(\mathtt {shn}(E) \ge 1\) and \(C : E {\centerdot } 1\), we write \(E{\![\![ C \!]\!]}\) for the second-order context obtained by replacing the leftmost second-order hole of \(E\) (i.e., \(E {\centerdot } 1\)) with \(C\) (and by interpreting the syntactical bracket \([-]\) as the substitution operation). For example, we have: \(\left( (\lambda x. \![\![ \,\, \!]\!][x][x])\![\![ \,\, \!]\!]\right) {\![\![ \lambda y.y[\,][\,] \!]\!]} = \left( \lambda x.(\lambda y.y[\,][\,])[x][x]\right) \![\![ \,\, \!]\!]= (\lambda x. \lambda y. yxx)\![\![ \,\, \!]\!].\) Below we actually consider only second-order contexts whose second-order holes are of the form \([\![ \,\, \!]\!]_n^{\theta }\) or \([\![ \,\, \!]\!]_n^{\theta ' {\Rightarrow }\theta }\).

We are now ready to define the decomposition function \(\widehat{\varPhi }_m\). We first prepare an auxiliary function \(\varPhi _{m} (t) = (E,u,P)\) such that (i) \(u\) is an auxiliary 0/1-subcontext, (ii) \(E{\![\![ u\cdot P \!]\!]} = t\), and (iii) the size of each context in \(P\) is between \(m\) and \(2m-1\). It is defined by induction on the size of \(t\) as follows:

If \(|t|<m\), then \(\varPhi _{m} (t) \triangleq (\![\![ \,\, \!]\!], t, \epsilon )\).

If \(|t|\ge m\), then:

Above, we have omitted the context-typing/size annotations of second-order holes for simplicity (see the full version [17] for details). The decomposition function \(\widehat{\varPhi }_m\) is then defined by \(\widehat{\varPhi }_m(t) \triangleq (E{\![\![ u \!]\!]}, P)\) where \((E,u,P) = \varPhi _{m} (t)\).

In the rest of this subsection, we show key properties of \(\widehat{\varPhi }_m\). We say that a 0/1-context \(u\) is good for \(m\) if \(u\) is either (i) a \(\lambda \)-abstraction where \(|u| = m\); or (ii) an application \(u_1u_2\) where \(|u_j| < m\) for each \(j=1,2\). By the definition of \(\widehat{\varPhi }_m(t) = (E, P)\), every component \(u\) of \(P\) is good for \(m\).
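The goodness condition is easy to state as a predicate. The following Python sketch is only an illustration under our own assumptions: the constructor and function names (`Hole`, `Var`, `Lam`, `App`, `size`, `is_good`) are hypothetical, and types are erased. It checks the two clauses of the definition literally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    """The hole [ ] of a 1-context."""


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    binder: str   # a variable name, or "*" for an unused binder
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def size(u) -> int:
    """|u| as defined in Sect. 4.1 (a hole has size 0)."""
    if isinstance(u, Hole):
        return 0
    if isinstance(u, Var):
        return 1
    if isinstance(u, Lam):
        return size(u.body) + 1
    return size(u.fun) + size(u.arg) + 1


def is_good(u, m: int) -> bool:
    """u is good for m: (i) an abstraction of size exactly m, or
    (ii) an application whose two immediate components both have size < m."""
    if isinstance(u, Lam):
        return size(u) == m
    if isinstance(u, App):
        return size(u.fun) < m and size(u.arg) < m
    return False


# The three subcontexts named in the description of Fig. 1 (decomposition with m = 3).
u1 = App(Lam("y", Hole()), Var("x"))                 # (\y.[ ]) x
u2 = Lam("z", Lam("*", Var("z")))                    # \z.\*.z
u3 = App(Lam("*", Var("y")), Lam("z", Var("z")))     # (\*.y) \z.z
```

Under this reading, all three subcontexts are good for \(m = 3\), and their sizes (3, 3 and 5) lie between \(m\) and \(2m-1\).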

For \(m \ge 2,\,E\) and \(1 \le i \le \mathtt {shn}(E)\), we define \(\widehat{U}_{E {\centerdot } i}^{m}(\delta ,\iota ,\xi )\), \(\varLambda _{E}^{m}(\delta ,\iota ,\xi )\), and \(\mathcal {B}_{n}^{m}(\delta ,\iota ,\xi )\) by:

$$\begin{aligned} \widehat{U}_{E {\centerdot } i}^{m}(\delta ,\iota ,\xi )\triangleq&\ \{ u \in \mathcal{U}^{\nu }(\delta ,\iota ,\xi ) \mid u : E {\centerdot } i \text { and } u \text { is good{} for } m \}\\ \varLambda _{E}^{m}(\delta ,\iota ,\xi )\triangleq&\ \{ [E{\![\![ u_1 \cdots u_{\mathtt {shn}(E)} \!]\!]}]_\alpha \mid u_i \in \widehat{U}_{E {\centerdot } i}^{m}(\delta ,\iota ,\xi )\text { for } 1 \le i \le \mathtt {shn}(E) \}\\ \mathcal {B}_{n}^{m}(\delta ,\iota ,\xi )\triangleq&\ \{ E \mid (E,P) = \widehat{\varPhi }_m(\nu ([t]_\alpha )) \text { for some } [t]_\alpha \in \varLambda _n^\alpha (\delta ,\iota ,\xi ) \}. \end{aligned}$$

Intuitively, \(\widehat{U}_{E {\centerdot } i}^{m}(\delta ,\iota ,\xi )\) is the set of good contexts that can fill \(E {\centerdot } i\), \(\varLambda _{E}^{m}(\delta ,\iota ,\xi )\) is the set of terms obtained by filling the second-order holes of \(E\) with good contexts, and \(\mathcal {B}_{n}^{m}(\delta ,\iota ,\xi )\) is the set of second-order contexts that can be obtained by decomposing a term of size \(n\). The following lemma states the key properties of \(\widehat{\varPhi }_m\).

Lemma 2

(decomposition). Let \(\delta ,\iota ,\xi \ge 0\) and \(2 \le m \le n\).

  1. \(\varLambda _n^\alpha (\delta ,\iota ,\xi )\) is the disjoint union of \(\varLambda _{E}^{m}(\delta ,\iota ,\xi )\)’s, i.e., \(\varLambda _n^\alpha (\delta ,\iota ,\xi ) = \biguplus _{E\in \mathcal {B}_{n}^{m}(\delta ,\iota ,\xi )} \varLambda _{E}^{m}(\delta ,\iota ,\xi )\). Moreover, \(\widehat{\varPhi }_m(E{\![\![ P \!]\!]}) = (E,P)\) holds for any \(P \in \prod _{1 \le i \le \mathtt {shn}(E)} \widehat{U}_{E {\centerdot } i}^{m}(\delta ,\iota ,\xi )\).

  2. \(m \le |E {\centerdot } i| < 2m\) \((1\le i \le \mathtt {shn}(E))\) for every \(E \in \mathcal {B}_{n}^{m}(\delta ,\iota ,\xi )\).

  3. \(\mathtt {shn}(E) \ge n/4m\) for every \(E \in \mathcal {B}_{n}^{m}(\delta ,\iota ,\xi )\).

The second and third properties say that \(\widehat{\varPhi }_m\) decomposes each term into sufficiently many contexts of appropriate size.

4.3 Explosive Context

Here, we show that each \(\widehat{U}_{E {\centerdot } i}^{m}(\delta ,\iota ,\xi )\) contains at least one context that has a very long reduction sequence. To this end, we first prepare a special context \( Expl _{m}^{k}\) that has a long reduction sequence, and then show that at least one element of \(\widehat{U}_{E {\centerdot } i}^{m}(\delta ,\iota ,\xi )\) contains such an explosive context as a subcontext.

We define a “duplicating term” \( Dup \triangleq \lambda x^\mathtt {o}. (\lambda x^\mathtt {o}. \lambda {*}^\mathtt {o}. x)xx\), and \( Id \triangleq \lambda x^\mathtt {o}. x\). For two terms \(t\), \(t'\) and an integer \(n \ge 1\), we define the “\(n\)-fold application” operation \(\mathop {{\uparrow }^{n}}\) by \(t\mathop {{\uparrow }^{0}}t' \triangleq t'\) and \(t\mathop {{\uparrow }^{n}}t' \triangleq t (t\mathop {{\uparrow }^{n-1}}t')\). For an integer \(k \ge 2\), we define an order-\(k\) term

$$ \overline{2}_{k} \triangleq \lambda f^{\tau (k) {\rightarrow }\tau (k)}. \lambda x^{\tau (k)}. f (f x) $$

where \(\tau (i)\) is defined by \(\tau (2) \triangleq \mathtt {o}\) and \(\tau (i+1) \triangleq \tau (i) {\rightarrow }\tau (i)\).
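As a small worked instance of these definitions (our own unfolding, added only for illustration), the case \(n = 2\) of the \(n\)-fold application gives

$$ t\mathop {{\uparrow }^{2}}t' = t\,(t\mathop {{\uparrow }^{1}}t') = t\,(t\,(t\mathop {{\uparrow }^{0}}t')) = t\,(t\,t'), $$

so \(t\mathop {{\uparrow }^{n}}t'\) applies \(t\) to \(t'\) exactly \(n\) times. For the types, unfolding \(\tau (k+1) = \tau (k) {\rightarrow }\tau (k)\) shows

$$ \overline{2}_{k} : (\tau (k) {\rightarrow }\tau (k)) {\rightarrow }\tau (k) {\rightarrow }\tau (k) = \tau (k+1) {\rightarrow }\tau (k+1), \qquad \overline{2}_{k-1} : \tau (k) {\rightarrow }\tau (k) = \tau (k+1) \quad (k \ge 3), $$

hence \(\overline{2}_{k}\,\overline{2}_{k-1} : \tau (k+1)\), and more generally \(\overline{2}_{k}\mathop {{\uparrow }^{m}}\overline{2}_{k-1} : \tau (k+1)\) for every \(m \ge 0\).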

Definition 5

(explosive context). Let \(m \ge 1\) and \(k \ge 2\) be integers and let

$$ t \triangleq \nu \left( \lambda x^\mathtt {o}.\bigl ( (\overline{2}_{k}\mathop {{\uparrow }^{m}}\overline{2}_{k-1})\overline{2}_{k-2} \cdots \overline{2}_{2} \, Dup ( Id \, x^{\dagger }) \bigr )\right) $$

where \(x^{\dagger }\) is just the variable \(x\), marked with \(\dagger \) so that we can refer to this occurrence. We define the explosive context \( Expl _{m}^{k}\) (of \(m\)-fold and order \(k\)) as the 1-context obtained by replacing the “normalized” variable \({x_1}^{\dagger }\) in \(t\) with \([\,]\).

We state key properties of \( Expl _{m}^{k}\) below. The proof of Item 5 is the same as that in [1]. The other items follow by straightforward calculation.

Lemma 3

(explosive).

  1. \(\emptyset \vdash Expl _{m}^{k}[x_1] : \mathtt {o}{\rightarrow }\mathtt {o}\) is derivable.

  2. \(| Expl _{m}^{k}| = 8m + 8k - 3\).

  3. \(\mathtt {ord}( Expl _{m}^{k}[x_1]) = k\), \(\mathtt {iar}( Expl _{m}^{k}[x_1]) = k\) and \(\#\!\left( \smash {\mathbf {V}( Expl _{m}^{k})}\right) =2\).

  4. \( Expl _{m}^{k} \in \mathcal{U}^{\nu }(\delta ,\iota ,\xi )\) if \(\delta \), \(\iota \ge k\) and \(\xi \ge 2\).

  5. If a term \(t\) satisfies \( Expl _{m}^{k} \preceq t\), then \(\beta (t) \ge {\mathbf {exp}_{k}(m)}\) holds.
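The size formula in Item 2 can be checked mechanically on the untyped skeleton of \( Expl _{m}^{k}\). The Python sketch below is an illustration under our own assumptions: all names (`Hole`, `Var`, `Lam`, `App`, `size`, `two`, `expl_skeleton`) are hypothetical, type annotations are erased, and the renaming performed by \(\nu \) is ignored since it does not affect the size. We also restrict to \(k \ge 3\), so that every \(\overline{2}_{j}\) occurring in Definition 5 has \(j \ge 2\).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    """The hole [ ] that replaces the marked occurrence of x."""


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    binder: str   # a variable name, or "*" for an unused binder
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def size(c) -> int:
    """|C| as in Sect. 4.1: hole 0, variable 1, binder +1, application +1."""
    if isinstance(c, Hole):
        return 0
    if isinstance(c, Var):
        return 1
    if isinstance(c, Lam):
        return size(c.body) + 1
    return size(c.fun) + size(c.arg) + 1


def two():
    """Untyped skeleton of \\f.\\x. f (f x); identical for every order j."""
    return Lam("f", Lam("x", App(Var("f"), App(Var("f"), Var("x")))))


# Dup = \x.(\x.\*.x) x x   and   Id = \x.x
DUP = Lam("x", App(App(Lam("x", Lam("*", Var("x"))), Var("x")), Var("x")))
ID = Lam("x", Var("x"))


def expl_skeleton(m: int, k: int):
    """Skeleton of Expl_m^k: \\x.((2_k ^m 2_{k-1}) 2_{k-2} ... 2_2 Dup (Id [ ]))."""
    if m < 1 or k < 3:
        raise ValueError("this sketch assumes m >= 1 and k >= 3")
    body = two()                       # 2_{k-1}
    for _ in range(m):                 # m-fold application of 2_k
        body = App(two(), body)
    for _ in range(k - 3):             # arguments 2_{k-2}, ..., 2_2
        body = App(body, two())
    body = App(body, DUP)
    body = App(body, App(ID, Hole()))  # Id applied to the hole
    return Lam("x", body)


sizes_agree = all(
    size(expl_skeleton(m, k)) == 8 * m + 8 * k - 3
    for m in range(1, 6)
    for k in range(3, 8)
)
```

The count can also be followed by hand: each copy of the skeleton of \(\overline{2}_{j}\) has size 7 and each application adds 1, which accounts for the \(8m + 8k\) part, while \( Dup \), \( Id \), the hole and the outer binder contribute the constant.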

We show that at least one element of \(\widehat{U}_{E {\centerdot } i}^{m}(\delta ,\iota ,\xi )\) contains \( Expl _{m'}^{k}\) as a subcontext.

Lemma 4

Let \(\delta , \iota , \xi \ge 2\) be integers and \(k=\min \{\delta ,\iota \}\). There exist integers \(b,c \ge 2\) such that, for any \(n\ge 1\), \(m' \ge b\), \(E \in \mathcal {B}_{n}^{cm'}(\delta ,\iota ,\xi ){} \) and \(i \in \{1,\dots ,\mathtt {shn}(E)\}\), \(\widehat{U}_{E {\centerdot } i}^{cm'}(\delta ,\iota ,\xi ){} \) contains \(u'\) such that \( Expl _{m'}^{k} \preceq u'{} \).

Proof Sketch

We pick \(u'' \in \widehat{U}_{E {\centerdot } i}^{cm'}(\delta ,\iota ,\xi )\) and construct \(u'\) by replacing some subcontext \(u_0\) of \(u''\) with a 0/1-context of the form \(S^{\circ }[ Expl _{m'}^{k}[u^{\circ }]]\). Here \(S^{\circ }\) and \(u^{\circ }\) adjust the context typing and size of \( Expl _{m'}^{k}\) and these can be obtained by using Proposition 2. The subcontext \(u_0\) is chosen so that the goodness of \(u''\) is preserved by this replacement.   \(\square \)

4.4 Proof Sketch of Theorem 1

We are now ready to prove the main theorem; see the full version [17] for details. For readability, we omit the parameters \((\delta ,\iota ,\xi )\), and write \(\varLambda _n^\alpha ,\mathcal{U}_n^{\nu },\varLambda _{E}^{m},\widehat{U}_{E {\centerdot } i}^{m}\) and \(\mathcal {B}_{n}^{m}\) for \(\varLambda _n^\alpha (\delta ,\iota ,\xi ),\mathcal{U}^{\nu }_{n}(\delta ,\iota ,\xi ),\varLambda _{E}^{m}(\delta ,\iota ,\xi ),\widehat{U}_{E {\centerdot } i}^{m}(\delta ,\iota ,\xi )\) and \(\mathcal {B}_{n}^{m}(\delta ,\iota ,\xi )\) respectively.

Let \(\overline{p}(n)\) be the probability that a randomly chosen normalized term \(t\) in \(\varLambda _n^\alpha \) does not contain \( Expl _{\lceil \log ^{(2)}(n) \rceil }^{k}\) as a subcontext. By Item 5 of Lemma 3, it suffices to show \(\lim _{n\rightarrow \infty }\overline{p}(n)=0\). Let \(b\) and \(c\) be the constants in Lemma 4 and let \(n \ge 2^{2^{b}}\), \(m' = \lceil \log ^{(2)}(n)\rceil \) and \(m = cm'\). Then \(m' \ge \log ^{(2)}(n) \ge b\).

By Lemma 2, \(\varLambda _n^\alpha \) can be represented as the disjoint union \(\uplus _{E \in \mathcal {B}_{n}^{m}} \varLambda _{E}^{m}\). Let \(\overline{\varLambda }_{E}^{m}\) be the subset of \(\varLambda _{E}^{m}\) consisting of the terms that do not contain \( Expl _{m'}^{k}\) as a subcontext. By Lemma 4, each of \(\widehat{U}_{E {\centerdot } i}^{m}\) contains at least one element that has \( Expl _{m'}^{k}\) as a subcontext. Furthermore, since \(m\le |E {\centerdot } i|<2m\), we have \(\#\!\left( \widehat{U}_{E {\centerdot } i}^{m}\right) \le \#\!\left( \mathcal{U}_{2m+d}^{\nu }\right) \) for some constant \(d\) (see the full version [17]). Thus, we have

$$\begin{aligned} \frac{\#\!\left( \overline{\varLambda }_{E}^{m}\right) }{\#\!\left( \varLambda _{E}^{m}\right) } \le&\prod _{1 \le i\le \mathtt {shn}(E)} \left( 1-\frac{1}{\#\!\left( \widehat{U}_{E {\centerdot } i}^{m}\right) }\right) \le \left( 1-\frac{1}{\#\!\left( \mathcal{U}_{2m+d}^{\nu }\right) }\right) ^{\mathtt {shn}(E)}\\ \le&\left( 1-\frac{1}{\#\!\left( \mathcal{U}_{2m+d}^{\nu }\right) }\right) ^{\frac{n}{4m}} \quad (\because \text {Item~3 of Lemma~2}). \end{aligned}$$

Let \(q(n)\) be the rightmost expression. Then we have

$$\begin{aligned} \overline{p}(n) =&\frac{\sum _{E \in \mathcal {B}_{n}^{m}} \#\!\left( \overline{\varLambda }_{E}^{m}\right) }{\sum _{E \in \mathcal {B}_{n}^{m}} \#\!\left( \varLambda _{E}^{m}\right) } \le \frac{\sum _{E \in \mathcal {B}_{n}^{m}} \left( q(n) \#\!\left( \varLambda _{E}^{m}\right) \right) }{\sum _{E\in \mathcal {B}_{n}^{m}} \#\!\left( \varLambda _{E}^{m}\right) }\\ =&\frac{q(n)\sum _{E\in \mathcal {B}_{n}^{m}} \#\!\left( \varLambda _{E}^{m}\right) }{\sum _{E\in \mathcal {B}_{n}^{m}} \#\!\left( \varLambda _{E}^{m}\right) }=q(n) \le \left( 1-\frac{1}{c'\overline{\gamma }(\delta ,\iota ,\xi )^{2m}}\right) ^{\frac{n}{4m}} \quad (\because \text {Lemma~1}) \end{aligned}$$

for sufficiently large n. Finally, we can conclude that

$$\begin{aligned} \overline{p}(n) \le \left( 1-\frac{1}{c'\overline{\gamma }(\delta ,\iota ,\xi )^{2c\lceil \log ^{(2)}(n)\rceil }}\right) ^{\frac{n}{4c\lceil \log ^{(2)}(n)\rceil }} \longrightarrow 0 \quad (\text {as } n \longrightarrow \infty ) \end{aligned}$$

(see the full version [17] for the last convergence) as required.   \(\square \)
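One possible way to see the last convergence is sketched below; the full version [17] may argue differently. We assume that \(\log ^{(2)}\) denotes the twice-iterated base-2 logarithm, that \(c' > 0\), and we abbreviate \(\gamma = \overline{\gamma }(\delta ,\iota ,\xi ) > 1\) and \(m = c\lceil \log ^{(2)}(n)\rceil \). For \(n\) large enough that \(c'\gamma ^{2m} \ge 1\), the inequality \(1 - x \le e^{-x}\) gives

$$\begin{aligned} \left( 1-\frac{1}{c'\gamma ^{2m}}\right) ^{\frac{n}{4m}} \le \exp \left( -\frac{n}{4m\,c'\,\gamma ^{2m}}\right) , \qquad \gamma ^{2m} \le \gamma ^{2c(\log ^{(2)}(n)+1)} = \gamma ^{2c}\,(\log n)^{2c\log \gamma }. \end{aligned}$$

Hence the denominator \(4m\,c'\,\gamma ^{2m}\) is bounded by a polynomial in \(\log n\), so \(n/(4m\,c'\,\gamma ^{2m}) \rightarrow \infty \) and the right-hand side of the first inequality tends to 0. With a different base for the logarithm, the same argument goes through with a different constant exponent on \(\log n\).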

5 Related Work

As mentioned in Sect. 1, there are several pieces of work on probabilistic properties of untyped \(\lambda \)-terms [2,3,4]. David et al. [2] have shown that almost all untyped \(\lambda \)-terms are strongly normalizing, whereas the result is opposite for terms expressed in SK combinators.

Their former result implies that untyped \(\lambda \)-terms do not satisfy the infinite monkey theorem, i.e., for any term \(t\), the probability that a randomly chosen term of size \(n\) contains \(t\) as a subterm tends to zero. Bendkowski et al. [4] proved that almost all terms in de Bruijn representation are not strongly normalizing, by taking the size of an index \(i\) to be \(i+1\) instead of the constant 1. The discrepancies among those results suggest that this kind of probabilistic property is quite fragile and depends on the definition of the syntax and the size of terms. Thus, the setting of our paper, especially the assumption on the boundedness of internal arities and the number of variables, is a matter of debate, and it would be interesting to study how the result changes under different assumptions.

We are not aware of similar studies on typed \(\lambda \)-terms. In fact, in their paper about combinatorial aspects of \(\lambda \)-terms, Grygiel and Lescanne [3] pointed out that the combinatorial study of typed \(\lambda \)-terms is difficult, due to the lack of a (simple) recursive definition of typed terms. In the present paper, we have avoided the difficulty by making the assumption on the boundedness of internal arities and the number of variables (which is, as mentioned above, subject to debate).

In a larger context, our work may be viewed as an instance of the studies of average-case complexity ([20], Chap. 10), which discusses “typical-case feasibility”. We are not aware of much work on the average-case complexity of problems with hyper-exponential complexity.

6 Conclusion

We have shown that almost every simply-typed \(\lambda \)-term of order \(k\) has a \(\beta \)-reduction sequence as long as \((k-2)\)-fold exponential in the term size, under a certain assumption. To our knowledge, this is the first result of this kind for typed \(\lambda \)-terms. Many questions are left for future work, such as (i) whether our assumption (on the boundedness of arities and the number of variables) is reasonable, and how the result changes under different assumptions, (ii) whether our result is optimal (e.g., whether almost every term has a \(k\)-fold exponentially long reduction sequence), and (iii) whether similar results hold for Terui’s decision problems [14] and/or the higher-order model checking problem [6].