1 Introduction

Biased random walks in inhomogeneous environments are a natural setting to witness trapping phenomena. In the case of supercritical Galton–Watson trees with leaves (see [6, 22, 28]) or the supercritical percolation cluster on \(\mathbb Z ^d\) (see [17]), for example, it has been observed that dead-ends found in the environment can, for suitably strong biases, create a sub-ballistic regime that is characteristic of trapping. More specifically, for both of these models, the distribution of the time spent in individual traps has polynomial tail decay, and this places them in the same universality class as the one-dimensional heavy-tailed trapping models considered in [33]. Indeed, although the case of a deterministically biased random walk on a Galton–Watson tree with leaves is slightly complicated by a certain lattice effect, which means it can not be rescaled properly [6], in the case of randomly biased random walks on such structures, it was shown in [22] (see also the related article [8]) that precisely the same limiting behaviour as the one-dimensional models of [15] and [33] occurs. Moreover, there is evidence presented in [17] that suggests the biased random walk on a supercritical percolation cluster also has the same limiting behaviour. The universality class that connects these models was previously investigated in [3, 4] and [5], and is characterised by limiting stable subordinators and aging properties.

The aim of this paper is to investigate biased random walks on critical structures. To this end, we choose to study the biased random walk on a critical Galton–Watson tree conditioned to survive. With the underlying environment having radically different properties from its supercritical counterpart, we would expect different limiting behaviour, with more extreme trapping phenomena, to arise. It is further natural to believe that some of the properties of the biased random walk on the incipient infinite cluster for critical percolation on \(\mathbb Z ^d\), at least in high dimensions, would be similar to the ones proved in our context, as is observed to be the case for the unbiased random walk (compare, for instance, the results of [1] and [25]). Nevertheless, our current understanding of the geometry of this object is not sufficient to extend our results easily, and so we do not pursue this inquiry here. In particular, we anticipate that, as indicated by physicists in [2], for percolation close to criticality there is likely to be an additional trapping mechanism that occurs due to spatial considerations, which means that, even without taking the effect of dead-ends into account, it is more likely for the biased random walk to be found in certain regions of individual paths than others (see [9] for a preliminary study in this direction).

Our main model—the biased random walk on critical Galton–Watson trees conditioned to survive—is presented in the next section, along with a summary of the results we are able to prove for it. This is followed in Sect. 1.2 with an introduction to a one-dimensional trapping model in which the trapping time distributions have slowly-varying tails. This latter model, which is of interest in its own right, is of particular relevance for us, as it allows us to comprehensively characterise the universality class into which the Galton–Watson trees we consider fall. Furthermore, the arguments we apply for the one-dimensional model provide a useful template for the more complicated tree framework.

1.1 Biased random walk on critical Galton–Watson trees

Before presenting the Galton–Watson tree framework, we recall some classical results for sums of random variables whose distribution has a slowly-varying tail. Let \((X_i)_{i=1}^\infty \) be independent random variables, with distributional tail \(\bar{F}(u)=1-F(u)=\mathbf{P}(X_i>u)\) satisfying: \(\bar{F}(0)=1,\,\bar{F}(u)>0\) for all \(u>0\),

$$\begin{aligned} \lim _{u\rightarrow \infty }\frac{\bar{F}(uv)}{\bar{F}(u)}=1, \end{aligned}$$
(1.1)

for any \(v>0\), and \(\bar{F}(u)\rightarrow 0\) as \(u\rightarrow \infty \). A typical example is when the distribution in question decays logarithmically slowly, such as

$$\begin{aligned} \bar{F}(u)\sim \frac{1}{(\ln u)^{\gamma }}, \end{aligned}$$
(1.2)

for some \(\gamma >0\), where throughout the article \(f\sim g\) will mean \(f(x)/g(x)\rightarrow 1\) as \(x\rightarrow \infty \). A first scaling result for sums of the form \(\sum _{i=1}^nX_i\) was obtained in [11], and this was subsequently extended by [23] to a functional result. In particular, in [23] it was established that if \(L(x):=1/\bar{F}(x)\), then

$$\begin{aligned} \left(\frac{1}{n}L\left(\sum _{i=1}^{nt}X_i\right)\right)_{t\ge 0}\rightarrow \left(m(t)\right)_{t\ge 0} \end{aligned}$$
(1.3)

in distribution with respect to the Skorohod \(J_1\) topology (as an aid to the reader, we provide in the appendix a definition of the Skorohod \(J_1\) and \(M_1\) topologies, the latter of which is applied in several subsequent results), where \(m=(m(t))_{t\ge 0}\) is an extremal process. To define \(m\) more precisely, suppose that \((\xi (t))_{t\ge 0}\) is the symmetric Cauchy process, i.e., the Lévy process with Lévy measure given by \(\mu ((x,\infty ))=x^{-1}\) for \(x>0\), and then set

$$\begin{aligned} m(t)=\max _{0<s\le t}\Delta \xi (s), \end{aligned}$$

where \(\Delta \xi (s)=\xi (s)-\xi (s^-)\). (Observe that \((m(t))_{t\ge 0}\) is thus the maximum process of the Poisson point process with intensity measure \(x^{-2}dxdt\).) We will prove that, in addition to appearing in the limit at (1.3), this extremal process arises in the scaling limits of a biased random walk on a critical Galton–Watson tree and, as is described in the next section, a one-dimensional directed trap model whose holding times have a slowly-varying mean.
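
For later reference, we note the one-dimensional marginals of \(m\): since the number of points of this Poisson process falling in \((x,\infty )\times (0,t]\) is Poisson with mean \(t\int _x^\infty y^{-2}dy=t/x\), we have

$$\begin{aligned} \mathbb P \left(m(t)\le x\right)=e^{-t/x},\quad x,t>0. \end{aligned}$$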

We continue by introducing some relevant branching process and random walk notation, following the presentation of [10]. Let \(Z\) be a critical \((\mathbf{E}Z=1)\) offspring distribution in the domain of attraction of a stable law with index \(\alpha \in (1,2]\), by which we mean that there exists a sequence \(a_n\uparrow \infty \) such that

$$\begin{aligned} \frac{Z[n]-n}{a_n}\mathop {\rightarrow }\limits ^{d} X, \end{aligned}$$
(1.4)

where \(Z[n]\) is the sum of \(n\) i.i.d. copies of \(Z\) and \(\mathbf{E}(e^{-\lambda X})=e^{\lambda ^\alpha }\) for \(\lambda \ge 0\). Note that, by results of [16, Chapters XIII and XVII], this is equivalent to the probability generating function of \(Z\) satisfying

$$\begin{aligned} f(s):=\mathbf{E}(s^Z)=\sum _{k=0}^\infty p_ks^k=s+(1-s)^{\alpha }L(1-s),\quad \forall s\in (0,1), \end{aligned}$$
(1.5)

where \(L(x)\) is slowly varying as \(x\rightarrow 0^+\), together with the non-triviality condition \(\mathbf{P}(Z=1)\ne 1\). We point out that the condition \(\mathbf{E}(Z^2)<\infty \) is sufficient for the previous statements to hold with \(\alpha =2\).
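
Indeed, when \(\mathbf{E}(Z^2)<\infty \), a second-order expansion of \(f\) about \(s=1\) (using \(f(1)=1\), \(f^{\prime }(1)=\mathbf{E}Z=1\) and \(f^{\prime \prime }(1)=\mathbf{E}(Z(Z-1))\)) gives

$$\begin{aligned} f(s)=s+(1-s)^2\left(\frac{\mathbf{E}\left(Z(Z-1)\right)}{2}+o(1)\right),\quad s\uparrow 1, \end{aligned}$$

so that (1.5) holds with \(\alpha =2\) and \(L(x)\rightarrow \mathbf{E}(Z(Z-1))/2\) as \(x\rightarrow 0^+\), a limit that is strictly positive by criticality and the non-triviality condition.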

Denote by \((Z_n)_{n\ge 0}\) the corresponding Galton–Watson process, started from \(Z_0=1\). It has been established in [30, Lemma 2] that if \(q_n:=\mathbf{P}(Z_n>0)\), then

$$\begin{aligned} q_n^{\alpha -1}L(q_n)\sim \frac{1}{(\alpha -1)n}, \end{aligned}$$
(1.6)

as \(n\rightarrow \infty \), where \(L\) is the function appearing in (1.5). It is also well known that the branching process \((Z_n)_{n\ge 0}\) can be obtained as the generation size process of a Galton–Watson tree, \(\mathcal T \) say, with offspring distribution \(Z\). In particular, to construct the random rooted graph tree \(\mathcal T \), start with a single ancestor (or root), and then suppose that individuals in a given generation have offspring independently of the past and each other according to the distribution of \(Z\); see [26, Section 3] for details. The vertex set of \(\mathcal T \) is the entire collection of individuals, edges are the parent-offspring bonds, and \(Z_n\) is the number of individuals in the \(n\)th generation of \(\mathcal T \). From (1.6), it is clear that \(\mathcal T \) will be a finite graph \(\mathbf{P}\)-a.s. However, in [24], Kesten showed that it is possible to make sense of conditioning \(\mathcal T \) to survive or ‘grow to infinity’. More specifically, there exists a unique (in law) random infinite rooted locally-finite graph tree \(\mathcal T ^*\) that satisfies, for any \(n\in \mathbb Z _+\),

$$\begin{aligned} \mathbf{E}\left(\phi (\mathcal T ^*|_n)\right)=\lim _{m\rightarrow \infty }\mathbf{E}\left(\phi (\mathcal T |_n)|Z_{m+n}>0\right)\!, \end{aligned}$$

where \(\phi \) is a bounded function on finite rooted graph trees of \(n\) generations, and \(\mathcal T |_n,\,\mathcal T ^*|_n\) are the first \(n\) generations of \(\mathcal T ,\,\mathcal T ^*\) respectively. We will write \(d_\mathcal{T ^*}\) to represent the shortest path graph distance on \(\mathcal T ^*\).

Given a particular realisation of \(\mathcal T ^*\), we will denote by \(X=((X_n)_{n\ge 0}, P_x^\mathcal{T ^*},x\in \mathcal T ^*)\) the discrete-time biased random walk on \(\mathcal T ^*\), and define this as follows. First, fix a bias parameter \(\beta >1\), and assign to each edge connecting a vertex \(x\) in generation \(k\) to a vertex \(y\) in generation \(k+1\) a conductance \(c(x,y):=\beta ^k=:c(y,x)\). The transition probabilities of \(X\) are then determined by

$$\begin{aligned} P^\mathcal{T ^*}(x,y):=\frac{c(x,y)}{\sum _{y^{\prime }\sim x}c(x,y^{\prime })},\quad \forall x\sim y, \end{aligned}$$

where the notation \(x\sim y\) means that \(x\) and \(y\) are connected by an edge in \(\mathcal T ^*\). Thus, when at a vertex \(x\) that is not equal to the root of \(\mathcal T ^*\), jumping to a neighbouring vertex further away from the root than \(x\) is \(\beta \) times as likely as jumping towards the root. Using the usual terminology for random walks in random environments, we will say that \(P_x^\mathcal{T ^*}\) is the quenched law of the biased random walk on \(\mathcal T ^*\) started from \(x\). Moreover, we introduce the annealed law for the process started from \(\rho \), the root of the tree \(\mathcal T ^*\), by setting

$$\begin{aligned} \mathbb P _\rho (\cdot ):=\int P_\rho ^\mathcal{T ^*}(\cdot ) \mathrm{d}\mathbf{P}. \end{aligned}$$
(1.7)

It will be this law under which we investigate the rate at which the process \(X\), which we call the biased random walk on a critical Galton–Watson tree conditioned to survive, escapes from the root.
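
To make the transition probabilities concrete, suppose that \(x\) is a vertex in generation \(k\ge 1\) with \(j\ge 1\) children. The edge to its parent then has conductance \(\beta ^{k-1}\) and each edge to a child has conductance \(\beta ^{k}\), so that

$$\begin{aligned} P^\mathcal{T ^*}(x,y)=\frac{\beta ^{k}}{\beta ^{k-1}+j\beta ^{k}}=\frac{\beta }{1+j\beta }\quad \text{for each child}\,y,\qquad P^\mathcal{T ^*}(x,x^{\prime })=\frac{1}{1+j\beta }, \end{aligned}$$

where \(x^{\prime }\) is the parent of \(x\); each of the \(j\) steps away from the root is therefore \(\beta \) times as likely as the step towards it, as claimed above.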

The main result we prove for the process \(X\) concerns the time it takes to progress along the backbone. To be more specific, as is described in more detail in Sect. 3.1, \(\mathbf{P}\)-a.s. the tree \(\mathcal T ^*\) admits a unique backbone, that is, a semi-infinite path starting from the root, \(\{\rho =\rho _0,\rho _1,\rho _2,\dots \}\) say. We define \((\Delta _n)_{n\ge 0}\) by setting

$$\begin{aligned} \Delta _n:=\inf \left\{ m\ge 0:\,X_m=\rho _n\right\} \end{aligned}$$
(1.8)

to be the first time the process \(X\) reaches level \(n\) along this path. For this process, we are able to prove the following functional limit theorem.

Theorem 1.1

Let \(\alpha \in (1,2]\). As \(n\rightarrow \infty \), the laws of the processes

$$\begin{aligned} \left(\frac{(\alpha -1)\ln _+ \Delta _{nt}}{n\ln \beta }\right)_{t\ge 0} \end{aligned}$$

under \(\mathbb P _\rho \) converge weakly with respect to the Skorohod \(J_1\) topology on \(D([0,\infty ),\mathbb R )\) to the law of \((m(t))_{t\ge 0}\).

It is interesting to observe that this result is extremely explicit compared to its supercritical counterparts. Indeed, not only does the lattice effect that was the source of somewhat complicated behaviour in [6] not occur in the critical setting, but the above scaling limit also clearly describes the \(\beta \)-dependence of the relevant slowdown effect. Note that, unlike in the supercritical case where there is a ballistic phase, this slowdown effect occurs for any non-trivial bias parameter, i.e. for any \(\beta >1\). Furthermore, we remark that the dependence on \(\alpha \) is natural: as \(\alpha \) decreases and the leaves get thicker (in the sense that the tree’s Hausdorff dimension \(\alpha /(\alpha -1)\) increases, see [12, 21]), the biased random walk moves more slowly away from its start point.
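
Recalling that \(\mathbb P (m(t)\le x)=e^{-t/x}\) and that \(m\) is almost-surely continuous at any fixed time, we also record the one-dimensional marginal of Theorem 1.1: for each fixed \(t,x>0\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb P _\rho \left(\frac{(\alpha -1)\ln _+ \Delta _{nt}}{n\ln \beta }\le x\right)=e^{-t/x}. \end{aligned}$$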

As suggested by comparing Theorem 1.1 with (1.3), the critical Galton–Watson tree case is closely linked with a sum of independent and identically-distributed random variables where \(\bar{F}\) is asymptotically equivalent to \(\ln \beta /((\alpha -1)\ln x)\). Although the logarithmic rate of decay is relatively easy to guess, finding the correct constant is slightly subtle, particularly for \(\alpha \ne 2\). This is because, unlike in the supercritical case and the critical case with \(\alpha =2\), when \(\alpha \ne 2\) it can happen that there are multiple deep traps emanating from a single backbone vertex. As a result, we have to take special care over which of these have actually been visited when determining the time spent there, meaning that the random variable which actually has the \(\ln \beta /((\alpha -1)\ln x)\) tail behaviour is not environment measurable (see Lemma 3.11). To highlight the importance of this consideration, which is also relevant albeit in a simpler way for \(\alpha =2\), in Theorem 3.14 we show that the constant that appears differs by a factor \(\alpha \) when \(\Delta _n\) is replaced by its quenched mean \(E^\mathcal{T ^*}_\rho \Delta _n\).

Theorem 1.1 readily implies the following corollary for the projection, \((\pi (X_m))_{m\ge 0}\), of the process \((X_m)_{m\ge 0}\) onto the backbone (roughly, \(\pi (X_m)\) is the backbone vertex from which the trap containing \(X_m\) emanates; see Sect. 3.2 for a precise definition). To state this, we define the right-continuous inverse \((m^{-1}(t))_{t\ge 0}\) of \((m(t))_{t\ge 0}\) by setting

$$\begin{aligned} m^{-1}(t):=\inf \left\{ s\ge 0:\,m(s)>t\right\} \!. \end{aligned}$$
(1.9)

Corollary 1.2

Let \(\alpha \in (1,2]\). As \(n\rightarrow \infty \), the laws of the processes

$$\begin{aligned} \left(\frac{ d_\mathcal{T ^*}\left(\rho ,\pi (X_{e^{nt}})\right)\ln \beta }{(\alpha -1) n}\right)_{t\ge 0} \end{aligned}$$

under \(\mathbb P _\rho \) converge weakly with respect to the Skorohod \(M_1\) topology on \(D([0,\infty ),\mathbb R )\) to the law of \((m^{-1}(t))_{t\ge 0}\).
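
In particular, since \(\mathbb P (m^{-1}(t)>s)=\mathbb P (m(s)\le t)=e^{-s/t}\), the \(t=1\) marginal of Corollary 1.2 implies that \(d_\mathcal{T ^*}\left(\rho ,\pi (X_{e^{n}})\right)\ln \beta /((\alpha -1)n)\) converges in distribution under \(\mathbb P _\rho \) to a mean one exponential random variable; compare Remark 1.7(i) below for the analogous statement in the directed trap model.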

Remark 1.3

Since the height of the leaves in which the random walk can be found at time \(e^{n}\) (see the localisation result of Lemma 4.5) will typically be of order \(n\), some further argument will be necessary to deduce a limit result for the graph distance \(d_\mathcal{T ^*}(\rho ,X_n)\) itself.

Another characteristic property that we are able to show is that the random walk also exhibits extremal aging.

Theorem 1.4

Let \(\alpha \in (1,2]\). For any \(0<a<b\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb P _\rho \left(\pi (X_{e^{an}})=\pi (X_{e^{bn}})\right)=\frac{a}{b}. \end{aligned}$$

Although regular aging has previously been observed for random walks in random environments in the sub-ballistic regime on \(\mathbb Z \) (see [14]), as far as we know, this is the first example of a random walk in random environment for which extremal aging has been proved. As already hinted at, this kind of behaviour, as well as that demonstrated in Theorem 1.1 and Corollary 1.2, places the biased random walk on a critical Galton–Watson tree conditioned to survive in a different universality class to that of the supercritical structures discussed previously. Instead, the universality class of the critical Galton–Watson trees contains the spin glass models considered in [7] and [20], as well as the trap models with slowly-varying tails that we introduce in the next section.

1.2 One-dimensional directed trap model with slowly-varying tails

In this section, we describe the one-dimensional trap model with which we wish to compare our main model, and the results we are able to prove for it. To start with a formal definition, let \(\tau =(\tau _x)_{x\in \mathbb Z }\) be a family of independent and identically-distributed strictly positive (and finite) random variables whose distribution has a slowly-varying tail, in the sense described by (1.1), built on a probability space with measure \(\mathbf{P}\); the sequence \(\tau =(\tau _x)_{x\in \mathbb Z }\) will represent the trap environment. For a fixed bias parameter \(\beta >1\), the directed trap model is then the continuous-time Markov process \(X=(X_t)_{t\ge 0}\) with state space \(\mathbb Z \), given by \(X_0=0\) and with jump rates

$$\begin{aligned} c(x,y):=\!\left\{ \begin{array}{ll} \left(\frac{\beta }{\beta +1}\right)\tau _x^{-1},&\quad \text{ if}\,y=x+1, \\ \left(\frac{1}{\beta +1}\right)\tau _x^{-1},&\quad \text{ if}\,y=x-1, \end{array}\right. \end{aligned}$$

and \(c(x,y)=0\) otherwise. To be more explicit, for a particular realisation of \(\tau \) we will write \(P^{\tau }_x\) for the law of the Markov chain with the above transition rates, started from \(x\); similarly to describing \(P_x^\mathcal{T ^*}\) in the previous section, we call this the quenched law for the directed trap model. The corresponding annealed law \(\mathbb P _x\) is obtained by integrating out the environment similarly to (1.7), i.e.

$$\begin{aligned} \mathbb P _x(\cdot ):=\int P^\tau _x(\cdot )\mathrm{d}\mathbf{P}. \end{aligned}$$

In studying the rate of escape of the above directed trap model, it is our initial aim to determine the rate of growth of

$$\begin{aligned} \Delta _n:=\inf \{t\ge 0:X_t=n\}, \end{aligned}$$

that is, the hitting times of level \(n\) by the process \(X\). The following theorem contains our main conclusion in this direction. As in the statement at (1.3), we define \(L(x)=1/\bar{F}(x)\).

Theorem 1.5

As \(n\rightarrow \infty \), the laws of the processes

$$\begin{aligned} \left(\frac{1}{n}L\left(\Delta _{nt}\right)\right)_{t\ge 0} \end{aligned}$$

under \(\mathbb P _0\) converge weakly with respect to the Skorohod \(J_1\) topology on \(D([0,\infty ),\mathbb R )\) to the law of the extremal process \((m(t))_{t\ge 0}\).

Similarly to [23, Remark 2.4], we note that the proof of the above result may be significantly simplified in the case when \(\bar{F}\) decays logarithmically. The reason for this is that, in the logarithmic case, the hitting time \(\Delta _{n}\) is very well-approximated by the maximum holding time within the first \(n\) vertices, and so the functional scaling limit for \((\Delta _n)_{n\ge 0}\) can be readily obtained from a simple study of the maximum holding time process. For general slowly varying functions, the same approximation does not provide tight enough control on \(\Delta _{n}\) to apply this argument, and so a more sophisticated approach is required.
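
By way of illustration only (nothing in this sketch is used in the arguments), the following Python code simulates the directed trap model with the logarithmically decaying tail of (1.2), for which \(L(x)=(\ln x)^{\gamma }\), and records the hitting time \(\Delta _n\). The sampler for \(\tau \), the function names and the parameter values are our own illustrative choices.

```python
import math
import random

def log_tau(gamma):
    """Return ln(tau) for a trap depth tau with P(tau > u) = (ln u)^(-gamma)
    for u >= e; equivalently, ln(tau) is Pareto(gamma) on [1, infinity)."""
    return (1.0 - random.random()) ** (-1.0 / gamma)

def logaddexp(a, b):
    """Numerically stable ln(e^a + e^b); the trap depths here are far too
    large to add in ordinary floating point, so times are kept in log-space."""
    if a < b:
        a, b = b, a
    return a + math.log1p(math.exp(b - a))

def log_hitting_time(n, beta=2.0, gamma=1.0):
    """Simulate the directed trap model started at 0 and return ln(Delta_n),
    where Delta_n is the first time the walk hits site n."""
    ltau = {}                      # ln(tau_x), sampled lazily and reused
    p_right = beta / (beta + 1.0)
    x, log_t = 0, float("-inf")
    while x < n:
        if x not in ltau:
            ltau[x] = log_tau(gamma)
        # the total jump rate at x is 1/tau_x, so each holding time is tau_x * Exp(1)
        log_hold = ltau[x] + math.log(random.expovariate(1.0))
        log_t = logaddexp(log_t, log_hold)
        x += 1 if random.random() < p_right else -1
    return log_t

if __name__ == "__main__":
    random.seed(0)
    for n in (10, 100, 1000):
        # for gamma = 1 we have L(x) = ln x, so Theorem 1.5 predicts that
        # ln(Delta_n) / n converges in distribution to m(1)
        print(n, log_hitting_time(n) / n)
```

Working in log-space is essential in such a simulation: the deep traps that dominate \(\Delta _n\) have depths of order \(e^{cn}\), far beyond ordinary floating-point range.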

As a simple corollary of Theorem 1.5, it is also possible to obtain a scaling result for the process \(X\) itself. The definition of \(m^{-1}\) should be recalled from (1.9). We similarly define the right-continuous inverse \(\bar{F}^{-1}\) of \(\bar{F}\), only with \(>\) replaced by \(<\).

Corollary 1.6

As \(n\rightarrow \infty \), the laws of the processes

$$\begin{aligned} \left(\frac{1}{n}X_{\bar{F}^{-1}(1/nt)}\right)_{t\ge 0} \end{aligned}$$

under \(\mathbb P _0\) converge weakly with respect to the Skorohod \(M_1\) topology on \(D([0,\infty ),\mathbb R )\) to the law of \((m^{-1}(t))_{t\ge 0}\).

Remark 1.7

(i) Although the preceding corollary does look somewhat awkward, it becomes much clearer for concrete choices of \(\bar{F}\). For example, if \(\bar{F}\) has the form described at (1.2), then the above result concerns the distributional limit of

$$\begin{aligned} \left(\frac{1}{n}X_{e^{(nt)^{1/\gamma }}}\right)_{t\ge 0}. \end{aligned}$$

Moreover, it can be deduced from the above result that, as \(t\rightarrow \infty \), the random variable \(\bar{F}(t)X_t\) converges in distribution under \(\mathbb P _0\) to \(m^{-1}(1)\), which is easily checked to have a mean one exponential distribution (indeed, \(\mathbb P (m^{-1}(1)>s)=\mathbb P (m(s)\le 1)=e^{-s}\) for \(s\ge 0\)).

(ii) In a number of places in the proofs of Theorem 1.5 and Corollary 1.6, we are slightly cavalier about assuming that \(\bar{F}(\bar{F}^{-1}(x))=x\) for \(x\in (0,1)\). This is, of course, only true in general when \(\bar{F}\) is continuous. In the case when this condition is not satisfied, however, we can easily overcome the difficulties that arise by replacing \(\bar{F}\) with any non-increasing continuous function \(\bar{G}\) that satisfies \(\bar{G}(0)=1\) and \(\bar{G}(u)\sim \bar{F}(u)\) as \(u\rightarrow \infty \). For example, one could define such a \(\bar{G}\) by setting \(\bar{G}(u):=(\frac{1}{u}\int _0^u L(v)dv)^{-1}\); that this choice satisfies \(\bar{G}(u)\sim \bar{F}(u)\) follows from the standard fact that \(\int _0^u L(v)dv\sim uL(u)\) for slowly-varying \(L\).

The extremal aging result we are able to prove in this setting is as follows.

Theorem 1.8

For any \(0<a<b\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb P _0\left(X_{\bar{F}^{-1}(1/na)}=X_{\bar{F}^{-1}(1/nb)}\right)=\frac{a}{b}. \end{aligned}$$

Remark 1.9

Note that if \(\bar{F}\) (and the functions \(\bar{F}_n\) introduced below at (2.6)) fail to be continuous and eventually strictly decreasing, then a minor modification to the proof of the above result (cf. Remark 1.7(ii)) is needed.

1.3 Article outline and notes

The remainder of the article is organised as follows. In Sect. 2, we study the one-dimensional trap model introduced in Sect. 1.2 above, proving Theorem 1.5 and Corollary 1.6. In Sect. 3, we then adapt the relevant techniques to derive Theorem 1.1 and Corollary 1.2 for the Galton–Watson tree model. The arguments of both these sections depend on the extension of the limit at (1.3) that is proved in Sect. 5. Before this, in Sect. 4, we derive the extremal aging results of Theorems 1.4 and 1.8. Finally, as noted earlier, the appendix recalls some basic facts concerning Skorohod space.

We finish the introduction with some notes about the conventions used in this article. Firstly, there are two widely used versions of the geometric distribution with a given parameter, one with support \(0,1,2,\dots \) and one with support \(1,2,3,\dots \). In the course of this work, we will use both, and hope that, even without explanation, it is clear from the context which version applies when. Secondly, there are many instances when for brevity we use a continuous variable where a discrete argument is required; in such places, \(x\), say, should be read as \(\lfloor x\rfloor \). Finally, we recall that \(f\sim g\) will mean \(f(x)/g(x)\rightarrow 1\) as \(x\rightarrow \infty \).

2 Directed trap model with slowly-varying tails

This section is devoted to the proof of Theorem 1.5 and Corollary 1.6. To this end, we start by deriving some slight adaptations of results from [33] regarding the trap environment. First, define a level \(n\) critical depth for traps of the environment by setting

$$\begin{aligned} g(n):=\bar{F}^{-1}(n^{-1}\ln n). \end{aligned}$$
(2.1)

We will say that there are deep traps at the sites \(\mathcal D :=\{x\in \mathbb Z :\,\tau _x > g(n)\}\), and consider the following events: for \(n\in \mathbb N ,\,T\in (0,\infty )\),

$$\begin{aligned} \mathcal E _1(n,T)&:= \left\{ \min _{\begin{matrix} x_1,x_2\in \mathcal D \cap [1,nT]:\\ x_1\ne x_2 \end{matrix}}|x_1-x_2|> n^\kappa \right\} ,\\ \mathcal E _2(n)&:= \left\{ \mathcal D \cap [-(\ln n)^{1+\gamma },0]=\emptyset \right\} , \end{aligned}$$

where \(\kappa , \gamma \in (0,1)\) are fixed. The event \(\mathcal E _1(n,T)\) requires that the distance between any two deep traps in the interval \([1,nT]\) is large, and the event \(\mathcal E _2(n)\) will help to ensure that the time the process \(X\) spends outside of the strictly positive integers is negligible.
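
Since \(\bar{F}(g(n))\le n^{-1}\ln n\), the expected number of deep traps in \([1,nT]\) satisfies

$$\begin{aligned} \mathbf{E}\#\left(\mathcal D \cap [1,nT]\right)=\lfloor nT\rfloor \bar{F}\left(g(n)\right)\le T\ln n; \end{aligned}$$

in other words, deep traps are sparse on the spatial scale \(n\), which is precisely what the two events above quantify.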

Lemma 2.1

Fix \(T\in (0,\infty )\). As \(n\rightarrow \infty \), the \(\mathbf{P}\)-probabilities of the events \(\mathcal E _1(n,T)\) and \(\mathcal E _2(n)\) converge to one.

Proof

To check the result for \(\mathcal E _1(n,T)\), we simply observe that

$$\begin{aligned} \mathbf{P}\left(\mathcal E _1(n,T)^c\right)&\le \sum _{\begin{matrix} \{x_1,x_2\}\subseteq [1,nT]:\\ 0<|x_1-x_2|\le n^\kappa \end{matrix}}\mathbf{P}\left(\tau _{x_1},\tau _{x_2}> g(n)\right)\\&\le Tn^{1+\kappa }\bar{F}\left(g(n)\right)^2\le \frac{T(\ln n)^2}{n^{1-\kappa }} \rightarrow 0. \end{aligned}$$

Similarly, we have that \(\mathbf{P}(\mathcal E _2(n)^c)\le n^{-1}(1+(\ln n)^{1+\gamma })\ln n\), which also converges to 0. \(\square \)

We continue by introducing the embedded discrete-time random walk associated with \(X\) and some of its properties, which will be useful throughout the remainder of the section. In particular, first let \(S(0)=0\) and \(S(n)\) be the time of the \(n\)th jump of \(X\); this is the clock process corresponding to \(X\). The embedded discrete-time random walk is then the process \(Y=(Y_n)_{n\ge 0}\) defined by setting \(Y_n:=X_{S(n)}\). Clearly \(Y\) is a biased random walk on \(\mathbb Z \) under \(P^\tau _0\) for \(\mathbf{P}\)-a.e. realisation of \(\tau \), and thus satisfies, \({P}^\tau _0\)-a.s.,

$$\begin{aligned} \frac{Y_n}{n}\rightarrow \frac{\beta -1}{\beta +1}>0. \end{aligned}$$

Whilst this result already tells us that the embedded random walk \(Y\) drifts off to \(+\infty \) and that the time it takes to hit level \(n\), that is,

$$\begin{aligned} \Delta ^Y_n:=\inf \{k\ge 0: Y_k=n\}, \end{aligned}$$

is finite for each \(n,\,P^\tau _0\)-a.s., we further require that it does not backtrack too much, in the sense that, for each \(T\in (0,\infty )\),

$$\begin{aligned} \mathcal E _3(n,T):=\left\{ \min _{0\le i<j\le \Delta ^Y_{nT}}(Y_j-Y_i)>-(\ln n)^{1+\gamma }\right\} \end{aligned}$$

occurs with high probability. This is the content of the following lemma, which is essentially contained in [33, Lemma 3].

Lemma 2.2

Fix \(T\in (0,\infty )\). As \(n\rightarrow \infty \), the \(\mathbb P _0\)-probability of the event \(\mathcal E _3(n,T)\) converges to one.

Let us now introduce the total time the biased random walk \(X\) spends at a site \(x\in \mathbb Z \),

$$\begin{aligned} T_x:=\int _0^\infty \mathbf{1}_{\{X_t=x\}}dt. \end{aligned}$$

To study this, first observe that the clock process \(S=(S(n))_{n\ge 0}\) can be written

$$\begin{aligned} S(n)=\sum _{i=0}^{n-1}\tau _{Y_i}\mathbf{e}_i, \end{aligned}$$

where \((\mathbf{e}_i)_{i\ge 0}\) is an independent sequence of mean one exponential random variables under \(P^\tau _0\), independent of \(Y\). Moreover, for \(x\in \mathbb Z \), let \(G(x)=\#\{n\ge 0:Y_n=x\}\) be the total number of visits of the embedded random walk \(Y\) to \(x\). By applying the fact that \(Y\) is a random walk with a strictly positive bias, we have that if \(x\ge 0\), then \(G(x)\) has the geometric distribution with parameter \(p = (\beta -1)/(\beta +1)\) (again for \(\mathbf{P}\)-a.e. realisation of \(\tau \)); indeed, after each visit to \(x\), the walk never returns precisely when it first steps to \(x+1\) and then never hits \(x\) again, an event of probability \(\frac{\beta }{\beta +1}\cdot \frac{\beta -1}{\beta }=\frac{\beta -1}{\beta +1}\). It follows that \(T_x\) is equal in distribution under \(\mathbb P _0\) to the random variable

$$\begin{aligned} \tau _x\sum _{i=1}^{G(x)}\mathbf{e}_i, \end{aligned}$$
(2.2)

which is almost-surely finite. We will use this characterisation of the distribution of \(T_x\) to check that the time spent by \(X\) in traps that are not deep is asymptotically negligible, in the sense described by the following event: for \(n\in \mathbb N ,\,T\in (0,\infty )\),

$$\begin{aligned} \mathcal E _4(n,T):=\left\{ \sum _{i=0}^{\Delta _{nT}^Y-1}\tau _{Y_i}\mathbf{e}_i\mathbf{1}_{\{\tau _{Y_i}\le g(n)\}}<\bar{F}^{-1}(n^{-1}(\ln n)^{1/2})\right\}\!. \end{aligned}$$

In particular, by similar arguments to [33, Lemma 4], we deduce the following.

Lemma 2.3

Fix \(T\in (0,\infty )\). As \(n\rightarrow \infty \), the \(\mathbb P _0\)-probability of the event \(\mathcal E _4(n,T)\) converges to one.

Proof

We start by checking that

$$\begin{aligned} \mathbf{E}(\tau _0\mathbf{1}_{\{ \tau _0\le g(n)\}} )= o(n^{-1}\bar{F}^{-1}(n^{-1}(\ln n)^{1/2})). \end{aligned}$$
(2.3)

To this end, let \(\rho ,\varepsilon \in (0,1)\), and observe that

$$\begin{aligned} \mathbf{E}(\tau _0\mathbf{1}_{\{\tau _0\le g(n)\}} )&\le g(n)\sum _{j=0}^{\infty }\rho ^j\mathbf{P}(\tau _0>\rho ^{j+1}g(n))\\&\le c_1g(n)\sum _{j=0}^{\infty }\rho ^{j-\varepsilon (j+1)}\bar{F}(g(n))\\&\le \frac{c_2g(n)\ln n}{n}, \end{aligned}$$

where the second inequality is an application of the representation theorem for slowly varying functions ([29, Theorem 1.2], for example), which implies that, for any \(\varepsilon >0\), there exists a constant \(c_3\in (0,\infty )\) such that

$$\begin{aligned} \frac{\bar{F}(v)}{\bar{F}(u)}\le c_3\left(\frac{u}{v}\right)^\varepsilon , \end{aligned}$$
(2.4)

for all \(0<v\le u\). Again applying (2.4), we have that \(g(n)\le c_4 \bar{F}^{-1}(n^{-1}(\ln n)^{1/2}) (\ln n)^{-1/(2\varepsilon )}\). Hence, if \(\varepsilon \) is chosen small enough, then (2.3) holds as desired.

To proceed, note that, on \(\mathcal E _3(n,T)\), we have that

$$\begin{aligned} \sum _{i=0}^{\Delta _{nT}^Y-1}\tau _{Y_i}\mathbf{e}_i\mathbf{1}_{\{\tau _{Y_i}\le g(n)\}} \le \sum _{x=-(\ln n)^{1+\gamma }}^{nT-1}T_x\mathbf{1}_{\{\tau _{x}\le g(n)\}}. \end{aligned}$$

Consequently, because \(E^\tau _0T_x=\tau _xE^\tau _0 G(x) \le \frac{\beta +1}{\beta -1}\tau _x\), it follows that

$$\begin{aligned} E^\tau _0\left(\sum _{i=0}^{\Delta _{nT}^Y-1}\tau _{Y_i}\mathbf{e}_i\mathbf{1}_{\{\tau _{Y_i}\le g(n)\}}\mathbf{1}_\mathcal{E _3(n,T)}\right)\le \frac{\beta +1}{\beta -1}\sum _{x=-(\ln n)^{1+\gamma }}^{nT-1}\tau _x\mathbf{1}_{\{\tau _x\le g(n)\}}. \end{aligned}$$

Combining this bound with (2.3) and using Markov’s inequality yields

$$\begin{aligned}&\mathbb P _0\left(\mathcal E _3(n,T)\cap \mathcal E _4(n,T)^c\right) \le \frac{\beta +1}{(\beta -1)\bar{F}^{-1}(n^{-1}(\ln n)^{1/2})}\mathbf{E}\\&\quad \times \left(\sum _{x=-(\ln n)^{1+\gamma }}^{nT-1}\tau _x\mathbf{1}_{\{\tau _x\le g(n)\}}\right)=o(1). \end{aligned}$$

On recalling the conclusion of Lemma 2.2, this completes the proof. \(\square \)

As a consequence of the previous result, to deduce a scaling limit for the sequence \((\Delta _n)_{n\ge 0}\), it will suffice to study sums of the form \(\sum _{x=1}^{n}T_x\mathbf{1}_{\{\tau _x> g(n)\}}\). In fact, the backtracking result of Lemma 2.2 will further allow us to replace \(T_x\) in this expression by

$$\begin{aligned} \tilde{T}_x:=\int _{\Delta _x}^{\Delta _{x,(\ln n)^{1+\gamma }}}\mathbf{1}_{\{X_t=x\}}dt, \end{aligned}$$
(2.5)

where \(\Delta _{x,(\ln n)^{1+\gamma }}\) is the first time after \(\Delta _x\) that \(X\) leaves the interval \([x-(\ln n)^{1+\gamma },x+(\ln n)^{1+\gamma }]\). This is particularly useful because, by applying the fact that deep traps are separated by a distance that is polynomial in \(n\) (see Lemma 2.1), it will be possible to decouple the random variables \((\tilde{T}_x\mathbf{1}_{\{\tau _x> g(n)\}})_{x\ge 1}\) in such a way as to enable us to deduce functional scaling results for their sums from those for independent sums proved in Sect. 5. Before commencing this program in Lemma 2.5, however, we derive a preliminary lemma that suitably describes the asymptotic behaviour of the distributional tail

$$\begin{aligned} \bar{F}_n(u):=\mathbb P _0\left(\tilde{T}_x\mathbf{1}_{\{\tau _x> g(n)\}}>u\right)\!. \end{aligned}$$
(2.6)

(Clearly, the definition of \(\bar{F}_n\) is independent of the particular \(x\ge 1\) considered.)

Lemma 2.4

For every \(\varepsilon >0\), there exists a constant \(c\) such that, for any \(u\ge c(g(n)\vee 1)\),

$$\begin{aligned} (1-\varepsilon )\bar{F}_n(u)\le \bar{F}(u)\le (1+\varepsilon )\bar{F}_n(u). \end{aligned}$$

Proof

For \(x\ge 1\), let \(\tilde{G}(x)\) be the total number of visits of the embedded random walk \(Y\) to \(x\) up until the first time after \(\Delta _x^Y\) that it leaves the interval \([x-(\ln n)^{1+\gamma },x+(\ln n)^{1+\gamma }]\). Then, similarly to (2.2), we have that \(\tilde{T}_x\) is distributed as \(\tau _x\sum _{i=1}^{\tilde{G}(x)}\mathbf{e}_i\). Hence, setting \(\Gamma :=\sum _{i=1}^{\tilde{G}(x)}\mathbf{e}_i\), we can use the independence of \(\Gamma \) and \(\tau _x\) under \(\mathbb P _0\) to write

$$\begin{aligned} \bar{F}_n(u)&= \mathbb P _0\left(\tau _x\Gamma >u,\,\tau _x>g(n)\right)\\&= \int _{0}^{u/g(n)}\bar{F}\left(uv^{-1}\right)\mathbb P _0\left(\Gamma \in dv\right)+\int _{u/g(n)}^\infty \bar{F}\left(g(n)\right)\mathbb P _0\left(\Gamma \in dv\right)\!. \end{aligned}$$

It follows that

$$\begin{aligned} \left|\frac{\bar{F}_n(u)}{\bar{F}(u)}-1\right|&\le \int _{0}^{\infty }\left|\frac{\bar{F}\left(uv^{-1}\right)}{\bar{F}(u)}-1\right|\mathbb P _0\left(\Gamma \in dv\right)\nonumber \\&+\left(\frac{\bar{F}\left(g(n)\right)}{\bar{F}(u)}+1\right)\mathbb P _0\left(\Gamma \ge u/g(n)\right)\!. \end{aligned}$$
(2.7)

The first term on the right-hand side of (2.7) is independent of \(n\), and so it will be enough for our purposes to show that it converges to 0 as \(u\rightarrow \infty \). To do this, first note that, by the monotonicity of \(\bar{F}\), (1.1) holds uniformly for \(v\in [v_0,v_1]\) for any \(0<v_0\le v_1<\infty \). Hence, the \(\limsup \) as \(u\rightarrow \infty \) of the term of interest is bounded above by

$$\begin{aligned}&\mathbb P _0\left(\Gamma \not \in [v_0,v_1]\right)+\limsup _{u\rightarrow \infty }\int _{0}^{v_0} \frac{\bar{F}(uv^{-1})}{\bar{F}(u)}\mathbb P _0\left(\Gamma \in dv\right)\nonumber \\&\quad +\limsup _{u\rightarrow \infty }\int _{v_1}^\infty \frac{\bar{F}(uv^{-1})}{\bar{F}(u)}\mathbb P _0\left(\Gamma \in dv\right)\!, \end{aligned}$$

for any \(0<v_0\le v_1<\infty \). Now, if \(v_0<1\), then \(\bar{F}(uv^{-1})\le \bar{F}(u)\) for all \(v\in [0,v_0]\), and so the first limsup is bounded above by \(\mathbb P _0(\Gamma \le v_0)\). Furthermore, if \(v_1\) is chosen to be no less than 1, then we can apply the bound at (2.4) to estimate \(\bar{F}(uv^{-1})/\bar{F}(u)\) by \(cv^\varepsilon \) for \(v\ge v_1\). Thus

$$\begin{aligned} \limsup _{u\rightarrow \infty } \int _{0}^{\infty }\left|\frac{\bar{F}\left(uv^{-1}\right)}{\bar{F}(u)}-1\right|\mathbb P _0\left(\Gamma \in dv\right)\le 2\mathbb P _0\left(\Gamma \not \in [v_0,v_1]\right)+c\int _{v_1}^\infty v^\varepsilon \mathbb P _0\left(\Gamma \in dv\right)\!. \end{aligned}$$

Since \(\mathbb E _0(\Gamma ^\varepsilon )\le \mathbb E _0(1+\Gamma )= 1+\mathbb E _0(\tilde{G}(x))\mathbb E _0(\mathbf{e}_1)<\infty \), by taking \(v_0\) arbitrarily small and \(v_1\) arbitrarily large, the upper bound here can be made arbitrarily small, meaning that

$$\begin{aligned} \lim _{u\rightarrow \infty } \int _{0}^{\infty }\left|\frac{\bar{F}\left(uv^{-1}\right)}{\bar{F}(u)}-1\right|\mathbb P _0\left(\Gamma \in dv\right)=0, \end{aligned}$$

as desired.

For the second term on the right-hand side of (2.7), we apply (2.4) and Markov’s inequality to deduce that, if \(u\ge g(n)\), then

$$\begin{aligned} \left(\frac{\bar{F}\left(g(n)\right)}{\bar{F}(u)}+1\right)\mathbb P _0\left(\Gamma \ge u/g(n)\right)\le \left(c \left(\frac{u}{g(n)}\right)^\varepsilon +1\right)\frac{g(n)\mathbb E _0\Gamma }{u}, \end{aligned}$$

where \(\varepsilon \in (0,1)\) is fixed. Thus, since the above bound is small whenever \(u/g(n)\) is large (it was already noted in the previous paragraph that \(\Gamma \) has a finite first moment), the proof is complete. \(\square \)

Lemma 2.5

As \(n\rightarrow \infty \), the laws of the processes

$$\begin{aligned} \left(\frac{1}{n}L\left(\sum _{x=1}^{nt}\tilde{T}_x\mathbf{1}_{\{\tau _x> g(n)\}}\right)\right)_{t\ge 0} \end{aligned}$$

under \(\mathbb P _0\) converge weakly with respect to the Skorohod \(J_1\) topology on \(D([0,\infty ),\mathbb R )\) to the law of \((m(t))_{t\ge 0}\).

Proof

First, fix \(T\in (0,\infty )\) and suppose \((f_x)_{x\ge 1}\) is a collection of bounded, continuous functions on \(\mathbb R \). We then have that

$$\begin{aligned}&\mathbb{E }_0\left(\mathbf{1}_{\mathcal{E }_1(n,T)}\prod _{x=1}^{nT} f_x\left(\tilde{T}_x\mathbf{1}_{\{\tau _x> g(n)\}}\right)\right)\\&\quad =\sum _{B}\mathbb{E }_0\left(\mathbf{1}_{\{\mathcal{D }\cap [1,nT]=B\}} \prod _{x=1}^{nT} f_x\left(\tilde{T}_x\mathbf{1}_{\{\tau _x> g(n)\}}\right)\right)\\&\quad =\sum _B\mathbb{E }_0\left(\prod _{x\in B}f_x\left(\tilde{T}_x\mathbf{1}_{\{\tau _x>g(n)\}}\right)\mathbf{1}_{\{\tau _x>g(n)\}}\prod _{x\in [1,nT]\backslash B}f_x(0)\mathbf{1}_{\{\tau _x\le g(n)\}}\right)\!, \end{aligned}$$

where the sums are over subsets \(B\subseteq [1,nT]\) such that if \(x_1,x_2\in B\) and \(x_1\ne x_2\), then \(|x_1-x_2|> n^\kappa \). By applying the independence of traps at different sites and the disjointness of the intervals \(([x-(\ln n)^{1+\gamma },x+(\ln n)^{1+\gamma }])_{x\in B}\) for the relevant choices of \(B\), the above sum can be rewritten as

$$\begin{aligned} \sum _B\prod _{x\in B}\mathbb E _0\left(f_x\left(\tilde{T}_x\mathbf{1}_{\{\tau _x>g(n)\}}\right)\mathbf{1}_{\{\tau _x>g(n)\}}\right) \prod _{x\in [1,nT]\backslash B}\mathbb E _0\left(f_x(0)\mathbf{1}_{\{\tau _x\le g(n)\}}\right)\!. \end{aligned}$$

In particular, it follows that

$$\begin{aligned} \mathbb E _0\left(\mathbf{1}_\mathcal{E _1(n,T)}\prod _{x=1}^{nT} f_x\left(\tilde{T}_x\mathbf{1}_{\{\tau _x> g(n)\}}\right)\right) =\mathbb E _0\left(\mathbf{1}_\mathcal{E ^{\prime }_1(n,T)}\prod _{x=1}^{nT} f_x\left(\tilde{T}^{\prime }_x\mathbf{1}_{\{\tau ^{\prime }_x> g(n)\}}\right)\right)\!, \end{aligned}$$

where we suppose that, under \(\mathbb P _0\), the pairs of random variables \((\tilde{T}^{\prime }_x,\tau ^{\prime }_x),\,{x\ge 1}\), are independent and identically-distributed as \((\tilde{T}_1,\tau _1)\), and the event \(\mathcal E ^{\prime }_1(n,T)\) is defined analogously to \(\mathcal E _1(n,T)\) from these random variables. Consequently, under \(\mathbb P _0\), the laws of \((\tilde{T}_x\mathbf{1}_{\{\tau _x> g(n)\}})_{x=1}^{nT}\) conditional on \(\mathcal E _1(n,T)\) and \((\tilde{T}^{\prime }_x\mathbf{1}_{\{\tau ^{\prime }_x> g(n)\}})_{x=1}^{nT}\) conditional on \(\mathcal E ^{\prime }_1(n,T)\) are identical.

By applying the conclusion of the previous paragraph, we obtain that, for any bounded function \(H:D([0,T],\mathbb R )\rightarrow \mathbb R \) that is continuous with respect to the Skorohod \(J_1\) topology,

$$\begin{aligned}&\left|\mathbb E _0\left[H\left(\left(\frac{1}{n}L\left(\sum _{x=1}^{nt}\tilde{T}_x\mathbf{1}_{\{\tau _x> g(n)\}}\right)\right)_{t\in [0,T]}\right)\right]\right.\\&\quad \left.-\mathbb E _0\left[H\left(\left(\frac{1}{n}L\left(\sum _{x=1}^{nt}\tilde{T}^{\prime }_x\mathbf{1}_{\{\tau ^{\prime }_x> g(n)\}}\right)\right)_{t\in [0,T]}\right)\right]\right| \end{aligned}$$

is bounded above by \(2\Vert H\Vert _\infty \mathbf{P}\left(\mathcal E _1(n,T)^c\right)\). Since Lemma 2.1 tells us that this upper bound converges to 0 as \(n\rightarrow \infty \), to complete the proof it will thus suffice to establish the result with \((\tilde{T}_x,\tau _x)_{x\ge 1}\) replaced by \((\tilde{T}^{\prime }_x,\tau ^{\prime }_x)_{x\ge 1}\). However, because we are assuming that \((\tilde{T}^{\prime }_x,\tau ^{\prime }_x)_{x\ge 1}\) are independent, the tail asymptotics proved in Lemma 2.4 allow us to derive the relevant scaling limit for the sums involving \((\tilde{T}^{\prime }_x,\tau ^{\prime }_x)_{x\ge 1}\) by a simple application of Theorem 5.1 (with \(h_1(n)=\ln n\) and \(h_2(n)=0\)). \(\square \)

We are now in a position to prove Theorem 1.5 by showing that the rescaled sums considered in the previous lemma suitably well approximate the sequence \((\Delta _n)_{n\ge 1}\).

Proof of Theorem 1.5

Fix \(T\in (0,\infty )\) and observe that, on \(\mathcal E _2(n)\cap \mathcal E _3(n,T)\cap \mathcal E _4(n,T)\), we have that

$$\begin{aligned}&\sum _{x=1}^{nt-(\ln n)^{1+\gamma }}\tilde{T}_x\mathbf{1}_{\{\tau _x>g(n)\}}\le \Delta _{nt}\le \sum _{x=1}^{nt}\tilde{T}_x\mathbf{1}_{\{\tau _x>g(n)\}}+\bar{F}^{-1}(n^{-1}(\ln n)^{1/2}),\nonumber \\&\quad \forall t\in [0,T]. \end{aligned}$$
(2.8)

By reparameterising the time-scales in the obvious way, it is clear that

$$\begin{aligned} d_{J_1}\left(\left(\frac{1}{n}L\left(\sum _{x=1}^{nt-(\ln n)^{1+\gamma }}\tilde{T}_x\mathbf{1}_{\{\tau _x>g(n)\}}\right)\right)_{t\in [0,T]}, \ \left(\frac{1}{n}L\left(\sum _{x=1}^{nt}\tilde{T}_x\mathbf{1}_{\{\tau _x>g(n)\}}\right)\right)_{t\in [0,T]}\right),\nonumber \\ \end{aligned}$$
(2.9)

where \(d_{J_1}\) is the Skorohod \(J_1\) distance on \(D([0,T],\mathbb R )\) (as defined in the appendix at (6.1)), is bounded above by

$$\begin{aligned} \frac{(\ln n)^{1+\gamma }}{n}+\frac{1}{n}L\left(\sum _{x=1}^{nT}\tilde{T}_x\mathbf{1}_{\{\tau _x>g(n)\}}\right)-\frac{1}{n}L\left(\sum _{x=1}^{n(T-\varepsilon )}\tilde{T}_x\mathbf{1}_{\{\tau _x>g(n)\}}\right), \end{aligned}$$

for any \(\varepsilon \in (0,T)\) and all sufficiently large \(n\). (Note that the first term above relates to the distortion of the time scale needed to compare the two processes.) By Lemma 2.5, this bound converges in distribution under \(\mathbb P _0\) to \(m(T)-m(T-\varepsilon )\), which converges to 0 in probability in the limit as \(\varepsilon \rightarrow 0\). It readily follows that, as \(n\rightarrow \infty \), the expression at (2.9) converges to 0 in probability. Hence, the theorem will follow from Lemmas 2.1, 2.2, 2.3 and 2.5 if we can show that

$$\begin{aligned} \sup _{x\in [0,\Xi ]}\frac{1}{n}\left|L\left(x+\bar{F}^{-1}(n^{-1}(\ln n)^{1/2})\right)-L(x)\right|\rightarrow 0, \end{aligned}$$

in \(\mathbb P _0\) probability, where \(\Xi :=\sum _{x=1}^{nT}\tilde{T}_x\mathbf{1}_{\{\tau _x>g(n)\}}\). To check this, we start by noting that Lemma 2.5 implies, for any \(\lambda >0\),

$$\begin{aligned} \mathbb P _0\left(\Xi \le \bar{F}^{-1}(1/n\lambda )\right)= \mathbb P _0\left(n^{-1}L(\Xi )\le \lambda \right)\rightarrow \mathbb P _0(m(T)\le \lambda ). \end{aligned}$$

By choosing \(\lambda \) suitably large, the limiting probability can be made arbitrarily close to 1. Thus the problem reduces to showing that, for any \(\lambda \in (0,\infty )\),

$$\begin{aligned} \sup _{x\in [0,\bar{F}^{-1}(1/n\lambda )]}\frac{1}{n}\left|L\left(x+\bar{F}^{-1}(n^{-1}(\ln n)^{1/2})\right)-L(x)\right|\rightarrow 0. \end{aligned}$$

Let \(\varepsilon \in (0,\lambda )\). Then, since \(\bar{F}^{-1}(n^{-1}(\ln n)^{1/2})\le \bar{F}^{-1}(1/n\varepsilon )\) for large enough \(n\), we have that

$$\begin{aligned}&\sup _{x\in [\bar{F}^{-1}(1/n\varepsilon ),\bar{F}^{-1}(1/n\lambda )]}\frac{1}{n}\left|L\left(x+\bar{F}^{-1}(n^{-1}(\ln n)^{1/2})\right)-L(x)\right|\\&\quad \le \frac{1}{n}L\left(\bar{F}^{-1}(1/n\lambda )\right)\sup _{x\ge \bar{F}^{-1}(1/n\varepsilon )}\left|\frac{L(x+\bar{F}^{-1}(n^{-1}(\ln n)^{1/2}))}{L(x)}-1\right|\\&\quad \le \lambda \sup _{x\ge \bar{F}^{-1}(1/n\varepsilon )}\left|\frac{L(2x)}{L(x)}-1\right|, \end{aligned}$$

which converges to 0 as \(n\rightarrow \infty \) by (1.1). Moreover, we also have that

$$\begin{aligned}&\sup _{x\in [0,\bar{F}^{-1}(1/n\varepsilon )]}\frac{1}{n}\left|L\left(x+\bar{F}^{-1}(n^{-1}(\ln n)^{1/2})\right)-L(x)\right|\le \frac{1}{n}L(2\bar{F}^{-1}(1/n\varepsilon )) \\&\quad \sim \frac{1}{n}L(\bar{F}^{-1}(1/n\varepsilon )), \end{aligned}$$

where the asymptotic equivalence is an application of (1.1). In particular, since the right-hand side above is equal to \(\varepsilon \), which can be chosen arbitrarily small, the result follows. \(\square \)

From this, the proof of Corollary 1.6 is relatively straightforward.

Proof of Corollary 1.6

Define \(X^*=(X^*_t)_{t\ge 0}\) to be the running supremum of \(X\), i.e. \(X_t^*:=\max _{s\le t}X_s\). Since \(X_t^*\ge n\) if and only if \(\Delta _n\le t\), we obtain that \((X^*_t+1)_{t\ge 0}\) is the inverse of \((\Delta _n)_{n\ge 0}\) in the sense described at (1.9). Thus, because the inverse map is continuous with respect to the Skorohod \(M_1\) topology (at least on the subset of functions \(f\in D([0,\infty ),\mathbb R )\) that satisfy \(\limsup _{t\rightarrow \infty } f(t)=\infty \), see [31]), it is immediate from Theorem 1.5 that, as \(n\rightarrow \infty \), the laws of the processes

$$\begin{aligned} \left(\frac{1}{n}X^*_{\bar{F}^{-1}(1/nt)}\right)_{t\ge 0} \end{aligned}$$

under \(\mathbb P _0\) converge weakly with respect to the Skorohod \(M_1\) topology on \(D([0,\infty ),\mathbb R )\) to the law of \((m^{-1}(t))_{t\ge 0}\). Thus, to complete the proof, it will suffice to demonstrate that, for any \(T\in (0,\infty )\),

$$\begin{aligned} \sup _{t\in [0,T]}\frac{1}{n}\left|X^*_{\bar{F}^{-1}(1/nt)}-X_{\bar{F}^{-1}(1/nt)}\right|\rightarrow 0 \end{aligned}$$

in \(\mathbb P _0\)-probability as \(n\rightarrow \infty \). To do this, we first fix \(T\in (0,\infty )\) and set \(N:=nT\ln (nT)\). Theorem 1.5 then implies that \(\mathbb P _0\left(\Delta _{N}\ge \bar{F}^{-1}(1/nT)\right)\rightarrow 1\) as \(n\rightarrow \infty \). Moreover, on the set \(\{\Delta _{N}\ge \bar{F}^{-1}(1/nT)\}\), it is the case that

$$\begin{aligned} \sup _{t\in [0,T]}\left|X^*_{\bar{F}^{-1}(1/nt)}-X_{\bar{F}^{-1}(1/nt)}\right|\le \sup _{k\le \Delta ^Y_{N} }(Y^*_k-Y_k), \end{aligned}$$

where \(Y^*\) is the running supremum of \(Y\). Hence

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\mathbb P _0\left(\sup _{t\in [0,T]}\frac{1}{n}\left|X^*_{\bar{F}^{-1}(1/nt)}-X_{\bar{F}^{-1}(1/nt)}\right|>\varepsilon \right)\\&\quad \le \limsup _{n\rightarrow \infty }\mathbb P _0\left(\frac{1}{n}\sup _{k\le \Delta ^Y_{N} }(Y^*_k-Y_k)>\varepsilon \right)\\&\quad \le \limsup _{n\rightarrow \infty }\mathbb P _0\left(n^{-1}(\ln N)^{1+\gamma }>\varepsilon ,\mathcal E _3(N,1)\right)\\&\quad =0, \end{aligned}$$

where we have applied the fact that \(\mathbb P _0(\mathcal E _3(N,1)^c)\rightarrow 0\), which is the conclusion of Lemma 2.2, and also that \(n^{-1}(\ln N)^{1+\gamma }\rightarrow 0\), which is clear from the definition of \(N\). \(\square \)

3 Biased random walk on critical Galton–Watson trees

In this section, we explain how techniques similar to those of the previous section can be used to deduce the corresponding asymptotics for a biased random walk on a critical Galton–Watson tree conditioned to survive. Prior to proving our main results (Theorem 1.1 and Corollary 1.2), however, we proceed in the next two subsections to derive certain properties regarding the structure of the tree \(\mathcal T ^*\) and deduce some preliminary random walk estimates, respectively. These results establish information in the present setting that is broadly analogous to that contained in Lemmas 2.1–2.4 for the directed trap model.

3.1 Structure of the infinite tree

A key tool throughout this study is the spinal decomposition of \(\mathcal T ^*\) that appears as [24, Lemma 2.2], and which can be described as follows. First, \(\mathbf{P}\)-a.e. realisation of \(\mathcal T ^*\) admits a unique non-intersecting infinite path starting at the root. Conditional on this ‘backbone’, the numbers of children of the vertices on the backbone are independent, each distributed as a size-biased random variable \(\tilde{Z}\), which satisfies

$$\begin{aligned} \mathbf{P}\left(\tilde{Z}=k\right)=k\mathbf{P}(Z=k),\quad k\ge 1. \end{aligned}$$
(3.1)

Moreover, conditional on the backbone and the number of children of each backbone element, the trees descending from the children of backbone vertices that are not on the backbone are independent copies of the original critical branching process \(\mathcal T \). To fix notation and terminology for this decomposition, we will henceforth suppose that \(\mathcal T ^*\) has been built by starting with a semi-infinite path, \(\{\rho =\rho _0,\rho _1,\rho _2,\dots \}\)—this will form the backbone of \(\mathcal T ^*\). Then, after selecting \((\tilde{Z}_i)_{i\ge 0}\) independently with distribution equal to that of \(\tilde{Z}\), to each backbone vertex \(\rho _i\), we attach a collection of ‘buds’ \(\rho _{ij},\,j=1,\dots ,\tilde{Z}_i-1\). Finally, we grow from each bud \(\rho _{ij}\) a ‘leaf’ \(\mathcal T _{ij}\), that is, a Galton–Watson tree with initial ancestor \(\rho _{ij}\) and offspring distribution \(Z\). See Fig. 1 for a graphical representation of these definitions.

Fig. 1 Decomposition of \(\mathcal T ^*\)
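
To illustrate this construction concretely, the following Python sketch (again ours, purely indicative and used nowhere in the arguments) samples a finite portion of \(\mathcal T ^*\) via the spinal decomposition just described, taking for definiteness a critical geometric offspring law; the function names and the nested-list representation of trees are our own choices, and any other critical offspring law satisfying (1.4) could be substituted.

```python
import random

def offspring():
    """A concrete critical offspring law, chosen purely for illustration:
    Z ~ Geometric(1/2) on {0, 1, 2, ...}, so E(Z) = 1 and E(Z^2) < infinity."""
    k = 0
    while random.random() < 0.5:
        k += 1
    return k

def size_biased_offspring():
    """The size-biased law P(Z~ = k) = k P(Z = k) of (3.1); for the geometric
    law above this is exactly 1 plus the sum of two independent copies of Z."""
    return 1 + offspring() + offspring()

def gw_tree(depth):
    """An unconditioned Galton-Watson tree with the offspring law above,
    truncated after `depth` further generations; each vertex is represented
    by the list of its children."""
    if depth == 0:
        return []
    return [gw_tree(depth - 1) for _ in range(offspring())]

def kesten_tree(n):
    """A finite portion of T*, built via the spinal decomposition: backbone
    vertices rho_0, ..., rho_{n-1}, each carrying Z~ - 1 buds, with an
    independent (truncated) Galton-Watson leaf grown from every bud.
    Returned as a list whose i-th entry is the list of leaves at rho_i."""
    return [[gw_tree(n - i - 1) for _ in range(size_biased_offspring() - 1)]
            for i in range(n)]

if __name__ == "__main__":
    random.seed(1)
    tree = kesten_tree(10)
    print([len(leaves) for leaves in tree])   # number of buds at rho_0, ..., rho_9
```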

With this picture, it is clear how we can view \(\mathcal T ^*\) as an essentially one-dimensional trap model with the backbone playing the role of \(\mathbb Z \) in the previous section. Rather than having an exponential holding time at each vertex \(\rho _i\), however, we have a random variable representing the time it takes \(X\) to leave the tree \(\mathcal T _{i}:=\{\rho _i\}\cup (\cup _{j=1,\dots ,\tilde{Z}_i-1}\mathcal T _{ij})\) starting from \(\rho _i\). As will be made precise later, key to determining whether this time is likely to be large or not are the heights of the leaves connected to \(\rho _i\). For this reason, the rest of this section will be taken up with an investigation into the big, or perhaps more accurately tall, leaves of \(\mathcal T ^*\).

More concretely, we start by introducing a sequence of critical heights \((h_n)_{n\ge 1}\) by setting \(h_n:=n(\ln n)^{-1}\) (roughly, \(\beta ^{h_n}\) will play the role that the \(g(n)\) introduced at (2.1) did in the previous section), and define, for each \(i\ge 0\),

$$\begin{aligned} N_n(i):=\#\left\{ 1\le j\le \tilde{Z}_i-1:\,h(\mathcal T _{ij})\ge h_n\right\} \!, \end{aligned}$$

where \(h(\mathcal T _{ij})\) is the height of the tree \(\mathcal T _{ij}\), so that \(N_n(i)\) counts the number of big leaves emanating from the backbone vertex \(\rho _i\). The random variables in the collection \((N_n(i))_{i\ge 0}\) are independent and identically-distributed. Moreover, it is possible to describe the asymptotic probability that one of these random variables is equal to zero, i.e. there is no big leaf at the relevant site.

Lemma 3.1

Let \(\alpha \in (1,2]\). As \(n\rightarrow \infty \), we have that

$$\begin{aligned} \mathbf{P}\left(N_n(0)=0\right)\sim 1-\frac{\alpha }{(\alpha -1)h_n}. \end{aligned}$$

Proof

By conditioning on the number of buds attached to the root, we have

$$\begin{aligned} \mathbf{P}\left(N_n(0)=0\right)=\mathbf{E}\left(\left(1-q_{h_n}\right)^{\tilde{Z}-1}\right)\!, \end{aligned}$$

where, as introduced above (1.6), \(q_k\) is the probability that an unconditioned branching process with offspring distribution \(Z\) survives for at least \(k\) generations. By the size-biasing of (3.1), this can be rewritten as

$$\begin{aligned} \mathbf{P}\left(N_n(0)=0\right)=\mathbf{E}\left(Z\left(1-q_{h_n}\right)^{{Z}-1}\right)=f^{\prime }\left(1-q_{h_n}\right)\!, \end{aligned}$$

where \(f^{\prime }\) is the derivative of the generating function \(f\), as defined at (1.5). Now, by [30, (2.1)], it holds that \(f^{\prime }(1-x)\sim 1-\alpha x^{\alpha -1}L(x)\) as \(x\rightarrow 0^+\), and so

$$\begin{aligned} \mathbf{P}\left(N_n(0)=0\right)\sim 1- \alpha q_{h_n}^{\alpha -1}L(q_{h_n}). \end{aligned}$$

From this, the proof is completed by recalling the tail decay at (1.6). \(\square \)
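
The calculation above also shows that \(\mathbf{P}(N_n(0)\ge 1)\sim \alpha /((\alpha -1)h_n)\), which, combined with the choice \(h_n=n(\ln n)^{-1}\), gives

$$\begin{aligned} \mathbf{P}\left(N_n(0)\ge 1\right)\sim \frac{\alpha \ln n}{(\alpha -1)n}, \end{aligned}$$

so that the expected number of backbone vertices amongst \(\rho _0,\dots ,\rho _{nT-1}\) from which at least one big leaf emanates grows like \(\alpha T\ln n/(\alpha -1)\); in particular, big leaves are sparse along the backbone, mirroring the sparseness of deep traps in the directed trap model.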

It will be important for our future arguments that the sites from which big leaves emanate are not too close together, and that there are no big traps close to \(\rho \). The final lemma of this section demonstrates that the sequence of critical heights we have chosen achieves this.

Lemma 3.2

Let \(\alpha \in (1,2],\,T\in (0,\infty )\) and \(\varepsilon \in (0,1)\). As \(n\rightarrow \infty \),

$$\begin{aligned} \mathbf{P}\left(\sum _{i=m}^{m+n^\varepsilon }\mathbf{1}_{\{N_n(i)\ge 1\}}\ge 2 \text{ for} \text{ some}\,m\in \{0,1,\dots ,Tn-n^\varepsilon \}\right)\rightarrow 0 \end{aligned}$$

and also

$$\begin{aligned} \mathbf{P}\left(\sum _{i=0}^{n^\varepsilon }\mathbf{1}_{\{N_n(i)\ge 1\}}\ge 1\right)\rightarrow 0. \end{aligned}$$

Proof

This is essentially the same as Lemma 2.1. \(\square \)

3.2 Initial random walk estimates

This section collects together some preliminary results for the biased random walk \((X_m)_{m\ge 0}\) on \(\mathcal T ^*\), regarding in particular: the amount of backtracking performed by the embedded biased random walk on the backbone; the amount of time \(X\) spends in small leaves; the amount of time \(X\) spends close to the base of big leaves; and tail estimates for the amount of time \(X\) spends deep within big leaves.

To begin with, we introduce \(Y=(Y_n)_{n\ge 0}\) to represent the jump process of \(\pi (X)\), where \(\pi :\mathcal T ^*\rightarrow \{\rho _0,\rho _1,\dots \}\) is the projection onto the backbone, i.e. \(\pi (x)=\rho _i\) for \(x\in \mathcal T _i\). More precisely, set \(S(0)=0\),

$$\begin{aligned} S(n)=\inf \left\{ m> S(n-1):\pi (X_{m})\ne \pi (X_{m-1})\right\} ,\quad \forall n\ge 1, \end{aligned}$$

and then define \(Y_n:=X_{S(n)}\). From this construction, it is clear that, under either the quenched or annealed law, \(Y\) is simply a biased random walk on the semi-infinite line graph \(\{\rho _0,\rho _1,\dots \}\), and so, as in the previous section, we can control the amount it backtracks. In particular, if we let

$$\begin{aligned} \Delta _n^Y:=\inf \left\{ m\ge 0:\,Y_m=\rho _n\right\} \end{aligned}$$

be the first time that the embedded random walk \(Y\) reaches level \(n\) along the backbone, then we have the following result, which is simply a restatement of Lemma 2.2. We recall that \(d_\mathcal{T ^*}\) is the shortest path graph distance on \(\mathcal T ^*\).

Lemma 3.3

Let \(\alpha \in (1,2],\,T\in (0,\infty )\) and \(\gamma >0\). As \(n\rightarrow \infty \),

$$\begin{aligned} \mathbb P _\rho \left(\min _{0\le i<j\le \Delta ^Y_{nT}}\left(d_\mathcal{T ^*}(\rho _0,Y_j)-d_\mathcal{T ^*}(\rho _0,Y_i)\right)\le -(\ln n)^{1+\gamma }\right)\rightarrow 0. \end{aligned}$$

Our next goal is to show that the time the biased random walk \(X\) spends outside of the big leaves of \(\mathcal T ^*\) is unimportant, where we define the set of vertices in big leaves to be

$$\begin{aligned} \mathcal B :=\left\{ x\in \mathcal T _{ij}:\,i\ge 0,\, 1\le j\le {\tilde{Z}_i-1},\,h(\mathcal T _{ij})\ge h_n\right\} \!. \end{aligned}$$

Key to doing this is the following equality, which is obtained by applying standard results for weighted random walks on graphs (cf. [6, Lemma 3.1]):

$$\begin{aligned} E^\mathcal{T ^*}_{\rho _{ij}}\tau _{\rho _i}=1+\beta ^{-i}\sum _{\begin{matrix} x,y\in \mathcal T _{ij}:\\ x\sim y \end{matrix}}c(x,y), \end{aligned}$$
(3.2)

where for a vertex \(x\in \mathcal T ^*\), we define \(\tau _{x}:=\inf \{m\ge 0:\,X_m=x\}\). For the statement of the next lemma, which is approximately analogous to Lemma 2.3, we recall the definition of \((\Delta _n)_{n\ge 0}\) from (1.8).
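
We also note in passing that the sum in (3.2) runs over ordered pairs \(x,y\), so that each edge of \(\mathcal T _{ij}\) contributes twice. As a quick check, if \(\mathcal T _{ij}\) consists of \(\rho _{ij}\) together with a single child, then (3.2) gives

$$\begin{aligned} E^\mathcal{T ^*}_{\rho _{ij}}\tau _{\rho _i}=1+\beta ^{-i}\cdot 2\beta ^{i+1}=1+2\beta , \end{aligned}$$

which agrees with the first-step decomposition \(E^\mathcal{T ^*}_{\rho _{ij}}\tau _{\rho _i}=\frac{1}{1+\beta }\cdot 1+\frac{\beta }{1+\beta }\left(2+E^\mathcal{T ^*}_{\rho _{ij}}\tau _{\rho _i}\right)\).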

Lemma 3.4

Let \(\alpha \in (1,2],\,T\in (0,\infty )\) and \(\varepsilon >0\). As \(n\rightarrow \infty \),

$$\begin{aligned} \mathbb P _\rho \left(\sum _{m\le \Delta _{nT}}\mathbf{1}_{\{X_m\not \in \mathcal B \}}\ge \beta ^{h_n(1+\varepsilon )} \right)\rightarrow 0. \end{aligned}$$

Proof

We start by estimating the quenched expectation of the time \(X\) spends in a particular small leaf before reaching level \(nT\) along the backbone. Thus, suppose we have a leaf \(\mathcal T _{ij}\) such that \(i<nT\) and \(h(\mathcal T _{ij})<h_n\). Starting from the vertex \(\rho _i\), the probability of hitting \(\rho _{ij}\) before \(\rho _{nT}\) can be computed exactly, by elementary means, as

$$\begin{aligned} P^\mathcal{T ^*}_{\rho _i}\left(\tau _{\rho _{ij}}<\tau _{\rho _{nT}}\right)=\frac{1+\beta ^{-1}+\dots +\beta ^{i+1-nT}}{1+(1+\beta ^{-1}+\dots +\beta ^{i+1-nT})}\le \frac{1}{2-\beta ^{-1}}. \end{aligned}$$

This means that the number of separate visits \(X\) makes to \(\mathcal T _{ij}\) is stochastically dominated by a geometric random variable with parameter \(1-(2-\beta ^{-1})^{-1}\), and so its mean is bounded above by \(\beta /(\beta -1)\). Moreover, the equality at (3.2) and our assumption on \(h(\mathcal T _{ij})\) imply that, on each visit to \(\mathcal T _{ij}\), the amount of time \(X\) spends there is bounded above by

$$\begin{aligned} E^\mathcal{T ^*}_{\rho _{ij}}\tau _{\rho _i}\le 1+2\beta ^{h_n}\#\mathcal T _{ij}, \end{aligned}$$

where \(\#\mathcal T _{ij}\) is the total number of vertices in \(\mathcal T _{ij}\). Hence

$$\begin{aligned} E^\mathcal{T ^*}_{\rho }\left(\sum _{m\le \Delta _{nT}}\mathbf{1}_{\{X_m\in \mathcal T _{ij}\}}\right)\le \frac{\beta }{\beta -1}\left(1+2\beta ^{h_n}\#\mathcal T _{ij}\right)\!. \end{aligned}$$
(3.3)

As for estimating the time spent at a vertex \(\rho _i\), where \(0<i<nT\), we start by noting that the total number of returns to \(\rho _i\) is a geometric random variable. Moreover, its parameter \(P_{\rho _i}^\mathcal{T ^*}(\tau _{\rho _i}^+=\infty )\), where \(\tau _{\rho _i}^+:=\inf \{m>0:X_m=\rho _i\}\) is the first return time to \(\rho _i\), can easily be bounded below by the probability that \(X\) jumps from \(\rho _i\) to \(\rho _{i+1}\) on its first step times the probability that a biased random walk on \(\mathbb Z \) never hits the vertex to the left of its starting point. Since the first of these quantities is given by \(\beta /(\beta \tilde{Z}_i+1)\) and the second is equal to \(1-\beta ^{-1}\), it follows that

$$\begin{aligned} E_{\rho }^\mathcal{T ^*}\left(\sum _{m\le \Delta _{nT}} \mathbf{1}_{\{X_m=\rho _i\}}\right)\le c_{\beta } \tilde{Z}_i. \end{aligned}$$
(3.4)

A similar argument applies for \(i=0\).

Piecing together the estimates at (3.3) and (3.4), we thus obtain

$$\begin{aligned} E^\mathcal{T ^*}_{\rho }\left(\sum _{m\le \Delta _{nT}}\mathbf{1}_{\{X_m\not \in \mathcal B \}}\right)\le c_\beta \beta ^{h_n}\sum _{i=0}^{nT-1}\tilde{Z}_i\left[1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right], \end{aligned}$$
(3.5)

where \(c_\beta \) is a constant depending only on \(\beta \). Now, to bound the summands, we consider the following probabilistic upper bound

$$\begin{aligned}&\mathbf{P}\left(\tilde{Z}_i\left[1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right]\ge k\right)\le \mathbf{P}\left(\tilde{Z}_i\ge k^{1/2}\right)\nonumber \\&\qquad +\,\mathbf{P}\left(\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\ge k^{1/2}-1\right). \end{aligned}$$
(3.6)

For the first of these terms, we apply the size-biasing of (3.1) and Markov’s inequality to deduce

$$\begin{aligned} \mathbf{P}\left(\tilde{Z}_i\ge k^{1/2}\right)\le \frac{\mathbf{E}(Z^{1+\alpha ^{\prime }})}{k^{\alpha ^{\prime }/2}}. \end{aligned}$$
(3.7)

Since the expectation in (3.7) is finite for any \(\alpha ^{\prime }\in (0,\alpha -1)\) (see [19, Section 35], for example), we fix an \(\alpha ^{\prime }\) in this range to obtain a polynomial bound for the relevant probability. For the second term of (3.6), we first condition on \(\tilde{Z}_i\) to obtain

$$\begin{aligned} \mathbf{P}\left(\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\ge k^{1/2}-1\right)&= 1-\mathbf{E}\left(\left(1-\mathbf{P}\left(\#\mathcal T \ge k^{1/2}-1\right)\right)^{\tilde{Z}-1}\right)\\&= 1-\mathbf{E}\left(Z\left(1-\mathbf{P}\left(\#\mathcal T \ge k^{1/2}-1\right)\right)^{{Z}-1}\right)\\&= 1-f^{\prime }\left(1-\mathbf{P}\left(\#\mathcal T \ge k^{1/2}-1\right)\right)\!. \end{aligned}$$

From the proof of Lemma 3.1, we know that \(f^{\prime }(1-x)\sim 1-\alpha x^{\alpha -1}L(x)\) as \(x\rightarrow 0^+\), and so

$$\begin{aligned}&\mathbf{P}\left(\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\ge k^{1/2}-1\right)\sim \alpha \mathbf{P}\left(\#\mathcal T \ge k^{1/2}-1\right)^{\alpha -1}\nonumber \\&\quad L\left(\mathbf{P}\left(\#\mathcal T \ge k^{1/2}-1\right)\right), \end{aligned}$$
(3.8)

as \(k\rightarrow \infty \). To establish a bound for \(\mathbf{P}(\#\mathcal T \ge k)\) that decays polynomially quickly, first note that \(\mathbf{P}(\#\mathcal T = k)=k^{-1}\mathbf{P}(S_k=-1)\), where \((S_k)_{k\ge 0}\) is a random walk on \(\mathbb Z \) with step distribution \(Z-1\) (see [13]). Moreover, by the local limit theorem of [19, Section 50], it is the case that \(\mathbf{P}(S_k=-1)\sim ca_k^{-1}\), where \(a_k\) are the constants appearing in (1.4). Since \(a_k\sim k^{1/\alpha }\ell (k)\) for some slowly varying function \(\ell \) (see [19, Section 35], for example), it follows that if \(\alpha ^{\prime }\in (0,1/\alpha )\), then there exists a constant \(c\) such that \(\mathbf{P}(\#\mathcal T \ge k)\le ck^{-\alpha ^{\prime }}\). Combining this estimate with (3.6), (3.7) and (3.8), we obtain that there exist constants \(c\) and \(\delta >0\) such that

$$\begin{aligned} \mathbf{P}\left(\tilde{Z}_i\left[1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right]\ge k\right)\le ck^{-\delta }. \end{aligned}$$
(3.9)

Consequently, recalling (3.5),

$$\begin{aligned}&\mathbb P _\rho \left(\sum _{m\le \Delta _{nT}}\mathbf{1}_{\{X_m\not \in \mathcal B \}}\ge \beta ^{h_n(1+\varepsilon )}\right)\\&\quad \le \mathbb P _\rho \left(\sum _{m\le \Delta _{nT}}\mathbf{1}_{\{X_m\not \in \mathcal B \}}\ge n E^\mathcal{T ^*}_{\rho }\left(\sum _{m\le \Delta _{nT}}\mathbf{1}_{\{X_m\not \in \mathcal B \}}\right)\right)\\&\qquad +\mathbf{P}\left(E^\mathcal{T ^*}_{\rho }\left(\sum _{m\le \Delta _{nT}}\mathbf{1}_{\{X_m\not \in \mathcal B \}}\right)\ge n^{-1}\beta ^{h_n(1+\varepsilon )}\right)\\&\quad \le n^{-1}+ \mathbf{P}\left(\max _{i=0,\ldots ,{nT-1}} \tilde{Z}_i\left[1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right]\ge \frac{1}{c_\beta n^2T}\beta ^{\varepsilon h_n}\right)\\&\quad \le n^{-1}+ nT\mathbf{P}\left(\tilde{Z}_i\left[1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right]\ge \frac{1}{c_\beta n^2 T}\beta ^{\varepsilon h_n}\right)\\&\quad \le n^{-1}+cn^{1+2\delta }\beta ^{-\varepsilon \delta h_n}, \end{aligned}$$

and this converges to 0 as \(n\rightarrow \infty \). \(\square \)

The result above means that, in establishing the distributional convergence of \(\Delta _n\), we only have to consider the time the random walk \(X\) spends in big leaves. In fact, as we will now show, the time spent close to the backbone in big leaves is also negligible. To this end, let us start by introducing some notation and formalising some terminology. First, we will write \(y_{ij}\) for the deepest vertex in \(\mathcal T _{ij}\); that is, the vertex that maximises the distance from the root \(\rho _{ij}\). So that this notion is well-defined, if there is more than one vertex at the deepest level of \(\mathcal T _{ij}\), we choose \(y_{ij}\) to be the first in the usual lexicographical ordering of \(\mathcal T _{ij}\), assuming that the offspring of each vertex have been labelled according to birth order. If the tree \(\mathcal T _{ij}\) has height greater than or equal to \(h_n\), then for a fixed \(\delta \in (0,1)\) it is possible to define a unique vertex on the path from \(\rho _{ij}\) to \(y_{ij}\) at level \(h_n^\delta \) in \(\mathcal T _{ij}\). We shall denote this vertex \(x_{ij}\) and call it the ‘entrance’ to the leaf \(\mathcal T _{ij}\). When we say that the leaf \(\mathcal T _{ij}\) has been visited deeply, we will mean that \(X\) has hit \(x_{ij}\). Moreover, by the ‘time spent in the lower parts of the big leaves emanating from \(\rho _i\)’, we will mean

$$\begin{aligned} t_i^{\prime }=\sum _{j=1}^{\tilde{Z}_i-1}\mathbf{1}_{\{h(\mathcal T _{ij})\ge h_n\}}\sum _{m=0}^{\infty }\mathbf{1}_{\{X_m\in \mathcal T _{ij}\backslash \mathcal T _{ij}(x_{ij})\}}, \end{aligned}$$
(3.10)

where \(\mathcal T _{ij}(x_{ij})\) is the part of the tree \(\mathcal T _{ij}\) descending from the entrance \(x_{ij}\).

To control the random variables \((t_i^{\prime })_{i\ge 0}\) (which are identically-distributed apart from \(i=0\)), we need to consider the structure of the trees \(\mathcal T _{ij}^{\prime }:=\mathcal T _{ij}\backslash \mathcal T _{ij}(x_{ij})\), and for this, the construction of a Galton–Watson tree conditioned on its height given in [18] is helpful. In particular, in Sect. 2 of that article, the following algorithm is described. First, let \((\xi _n,\zeta _n),\,n\ge 0\), be a sequence of independent pairs of random variables, with distribution given by

$$\begin{aligned} \mathbf{P}\left(\xi _{n+1}=j,\zeta _{n+1}=k\right)=c_np_k\left(1-q_n\right)^{j-1}\left(1-q_{n+1}\right)^{k-j}, \end{aligned}$$

(recall that \(q_n=\mathbf{P}(Z_n>0)\) is the probability that the unconditioned branching process survives for at least \(n\) generations) for \(1\le j\le k\), where

$$\begin{aligned} c_n:=\frac{\mathbf{P}\left(h(\mathcal T )=n\right)}{\mathbf{P}\left(h(\mathcal T )=n+1\right)}. \end{aligned}$$
(3.11)

Then, let \(\tilde{\mathcal{T }}_0\) be a Galton–Watson tree of height \(0\), i.e. consisting solely of a root vertex, and, to construct \(\tilde{\mathcal{T }}_{n+1},\,n\ge 0\):

  • let the first generation size of \(\tilde{\mathcal{T }}_{n+1}\) be \(\zeta _{n+1}\),

  • let \(\tilde{\mathcal{T }}_n\) be the subtree founded by the \(\xi _{n+1}\)th first generation particle of \(\tilde{\mathcal{T }}_{n+1}\),

  • attach independent Galton–Watson trees conditioned on having height strictly less than \(n\) to the \(\xi _{n+1}-1\) siblings to the left of the distinguished first generation particle,

  • attach independent Galton–Watson trees conditioned on height strictly less than \(n+1\) to the \(\zeta _{n+1}-\xi _{n+1}\) siblings to the right of the distinguished first generation particle.

It is shown in [18] that the tree \(\tilde{\mathcal{T }}_n\) that results from this procedure has the same probabilistic structure as \(\mathcal T \) conditioned to have height exactly equal to \(n\). Before considering the implications of this result for the times \((t_i^{\prime })_{i\ge 0}\), we derive the asymptotics of the constants \((c_n)_{n\ge 1}\) in our setting.
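To make the recursive construction above concrete, the following is a small simulation sketch; it is our own illustration rather than anything taken from [18], and it assumes the critical geometric offspring law \(p_k=2^{-(k+1)}\), \(k\ge 0\), for which \(f(s)=1/(2-s)\) and \(q_m=1/(m+1)\) exactly. The sketch checks by Monte Carlo that the first generation size \(\zeta _n\) prescribed by the construction has the same mean as the root degree of a Galton–Watson tree conditioned, by rejection sampling, to have height exactly \(n\). The truncation level \(K\), the chosen offspring law and all function names are illustrative assumptions.

```python
import random

K = 80          # truncation of the offspring support in exact summations
N_HEIGHT = 3    # height to condition on
TRIALS = 5000   # Monte Carlo sample size for the rejection sampler


def p(k):
    """Critical geometric offspring law p_k = 2^{-(k+1)}, k >= 0 (mean one)."""
    return 2.0 ** (-(k + 1))


def q(m):
    """q_m = P(h(T) >= m); equals 1/(m+1) exactly for this offspring law."""
    return 1.0 / (m + 1)


def c(m):
    """c_m = P(h(T) = m) / P(h(T) = m+1), as in (3.11)."""
    return (q(m) - q(m + 1)) / (q(m + 1) - q(m + 2))


def zeta_marginal(n):
    """Truncated marginal pmf of zeta_n, the first generation size of the
    tree conditioned to have height exactly n (n >= 1)."""
    pmf = []
    for k in range(1, K + 1):
        s = sum((1 - q(n - 1)) ** (j - 1) * (1 - q(n)) ** (k - j)
                for j in range(1, k + 1))
        pmf.append(c(n - 1) * p(k) * s)
    return pmf


def offspring():
    """Sample from the geometric offspring law."""
    k = 0
    while random.random() < 0.5:
        k += 1
    return k


def root_degree_and_height(cap):
    """Root degree and height of an unconditioned Galton-Watson tree,
    aborting (and reporting height cap + 1) once the height exceeds cap."""
    z = offspring()
    generation, height = z, 0
    while generation > 0:
        height += 1
        if height > cap:
            return z, cap + 1
        generation = sum(offspring() for _ in range(generation))
    return z, height


def conditioned_root_degree(n):
    """Rejection sampling: root degree given that the height equals n."""
    while True:
        z, h = root_degree_and_height(n)
        if h == n:
            return z


if __name__ == "__main__":
    pmf = zeta_marginal(N_HEIGHT)
    spine_mean = sum((k + 1) * w for k, w in enumerate(pmf)) / sum(pmf)
    emp_mean = sum(conditioned_root_degree(N_HEIGHT)
                   for _ in range(TRIALS)) / TRIALS
    print("mean of zeta_n from the spine construction:", round(spine_mean, 3))
    print("empirical mean root degree given h(T) = n :", round(emp_mean, 3))
```

With a few thousand trials the two printed means should agree to within Monte Carlo error. In this example the constants \(c_m\) are available in closed form, a fact we revisit after the following lemma, which identifies their asymptotics in general.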

Lemma 3.5

Let \(\alpha \in (1,2]\). The constants \((c_n)_{n\ge 1}\), as defined at (3.11), satisfy

$$\begin{aligned} c_n\sim 1+\frac{\alpha }{(\alpha -1)n}, \end{aligned}$$

as \(n\rightarrow \infty \).

Proof

First note that \(\mathbf{P}(h(\mathcal T )=n)=q_n-q_{n+1}\). Moreover, if \(f^{(n)}\) is the \(n\)-fold iteration of the generating function \(f\), then we can write \(q_n=1-f^{(n)}(0)\). It follows that

$$\begin{aligned} c_n&= \frac{f^{(n+1)}(0)-f^{(n)}(0)}{f^{(n+2)}(0)-f^{ (n+1)}(0)}=\frac{f(1-q_n)-1+q_n}{f(1-q_{n+1})-1+q_{n+1}}=\frac{q_n^\alpha L(q_n)}{q_{n+1}^\alpha L(q_{n+1})}\\&= \frac{q_n}{q_{n+1}}\times \frac{q_n^{\alpha -1} L(q_n)}{q_{n+1}^{\alpha -1} L(q_{n+1})}, \end{aligned}$$

where we have applied (1.5) to deduce the third equality. Now, by (1.6), the second term on the right-hand side satisfies

$$\begin{aligned} \frac{q_n^{\alpha -1} L(q_n)}{q_{n+1}^{\alpha -1} L(q_{n+1})}\sim \frac{n+1}{n}=1+\frac{1}{n}. \end{aligned}$$
(3.12)

For the first term, again applying (1.5) and (1.6), it is the case that

$$\begin{aligned} \frac{q_{n}}{q_{n+1}}=\frac{q_n}{1-f(1-q_n)}=\frac{1}{1-q_n^{\alpha -1}L(q_n)}\sim 1+\frac{1}{(\alpha -1)n}. \end{aligned}$$
(3.13)

Multiplying the right-hand sides of (3.12) and (3.13) yields the result. \(\square \)
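As a quick sanity check of Lemma 3.5 (an illustrative example of ours, not part of the original argument), consider the critical geometric offspring law \(p_k=2^{-(k+1)}\), for which \(f(s)=1/(2-s)\), so that \(f(1-q)-1+q=q^2/(1+q)\); thus \(\alpha =2\), \(L(q)=1/(1+q)\) and \(q_n=1/(n+1)\) exactly. In this case

$$\begin{aligned} c_n=\frac{q_n-q_{n+1}}{q_{n+1}-q_{n+2}}=\frac{(n+2)(n+3)}{(n+1)(n+2)}=1+\frac{2}{n+1}, \end{aligned}$$

which matches the asymptotics \(1+\alpha /((\alpha -1)n)=1+2/n\) given by the lemma.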

Lemma 3.6

Let \(\alpha \in (1,2]\) and \(T\in (0,\infty )\). As \(n\rightarrow \infty \),

$$\begin{aligned} \mathbb P _\rho \left(\sum _{i=0}^{nT-1}t_i^{\prime }\ge \beta ^{h_n}\right)\rightarrow 0. \end{aligned}$$

Proof

Our first aim will be to show that

$$\begin{aligned} \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })>x\,|\,h(\mathcal T _{ij})\!\ge \!h_n\right) \!\le \!h_n^\delta \left(1-\left(\inf _{h\ge h_n-h_n^\delta }c_{h}\right)f^{\prime }\left(1-q_{x-h_n^\delta }\right)\right),\qquad \end{aligned}$$
(3.14)

for \(h_n^\delta <x<h_n\). Fix an \(x\) in this range, and suppose for the moment that \(h(\mathcal T _{ij})=h\ge h_n\), so that \(x_{ij}\) is defined. Denote the path from \(\rho _{ij}\) to \(x_{ij}\) by \(\rho _{ij}=w_0,w_1,\dots ,w_{h_n^\delta }=x_{ij}\). Now, remove the edges \(\{w_{l-1},w_{l}\},\,l=1,\dots , h_n^\delta \) from \(\mathcal T _{ij}^{\prime }\), and denote by \(\mathcal T _{ijl}\) the connected component containing \(w_l\), so that \(\mathcal T _{ij}^{\prime }\) (minus the relevant edges) is the disjoint union of \(\mathcal T _{ijl}\) over \(l=0,\dots ,h_n^\delta -1\). From the procedure for constructing a Galton–Watson tree conditioned on its height described before Lemma 3.5, we deduce

$$\begin{aligned} \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })>x\,|\,h(\mathcal T _{ij})=h\right)&= \mathbf{P}\left(\max _{l=0,\dots ,h_n^\delta -1}\left(h(\mathcal T _{ijl})+l\right)>x\,|\,h(\mathcal T _{ij})=h\right)\\&\le \mathbf{P}\left(\max _{l=0,\dots ,h_n^\delta -1} h(\mathcal T _{ijl})>x+1-h_n^\delta \,|\,h(\mathcal T _{ij})=h\right)\\&= 1-\prod _{l=0}^{h_n^\delta -1}\left[1\!-\!\mathbf{P}\left(h(\mathcal T _{ijl})>x\!+\!1\!-\!h_n^\delta \,|\,h(\mathcal T _{ij})\!=\!h\right)\right]. \end{aligned}$$

Moreover, if we suppose that \(\mathcal T _{ij}\) conditioned on its height being equal to \(h\) has been built from the random variables \((\xi _n,\zeta _n),\,n\ge 0\), then we can write

$$\begin{aligned}&\mathbf{P}\left(h(\mathcal T _{ijl})<x+1-h_n^\delta \,|\,h(\mathcal T _{ij})=h\right)\\&\quad = \mathbf{E}\left(\mathbf{P}\left(h(\mathcal T _{ijl})<x+1-h_n^\delta \,|\,h(\mathcal T _{ij})=h, \xi _{h-l},\zeta _{h-l}\right)\right)\\&\quad =\mathbf{E}\left(\mathbf{P}\left(h(\mathcal T )<x-h_n^\delta \,|\,h(\mathcal T )<h-l-1\right)^{\xi _{h-l}-1}\right.\\&\qquad \times \left. \mathbf{P}\left(h(\mathcal T )<x-h_n^\delta \,|\,h(\mathcal T )<h-l\right)^{\zeta _{h-l}-\xi _{h-l}}\right)\\&\quad =\sum _{k=1}^\infty \sum _{j=1}^k c_{h-l-1}p_k\mathbf{P}\left(h(\mathcal T )<h-l-1\right)^{j-1}\mathbf{P}\left(h(\mathcal T )<x-h_n^\delta \,|\,h(\mathcal T )<h-l-1\right)^{j-1}\\&\qquad \times \mathbf{P}\left(h(\mathcal T )<h-l\right)^{k-j}\mathbf{P}\left(h(\mathcal T )<x-h_n^\delta \,|\,h(\mathcal T )<h-l\right)^{k-j}\\&\quad =\sum _{k=1}^\infty \sum _{j=1}^k c_{h-l-1}p_k\mathbf{P}\left(h(\mathcal T )<x-h_n^\delta \right)^{j-1}\mathbf{P}\left(h(\mathcal T )<x-h_n^\delta \right)^{k-j}\\&\quad =\sum _{k=1}^\infty c_{h-l-1} k p_k\mathbf{P}\left(h(\mathcal T )<x-h_n^\delta \right)^{k-1}\\&\quad =c_{h-l-1}f^{\prime }\left(1-q_{x-h_n^\delta }\right)\!. \end{aligned}$$

Thus, combining these deductions, we obtain

$$\begin{aligned} \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })>x\,|\,h(\mathcal T _{ij})=h\right)&\le 1-\left(\left(\inf _{h^{\prime }\ge h_n-h_n^\delta }c_{h^{\prime }}\right)f^{\prime }\left(1-q_{x-h_n^\delta }\right)\right)^{h_n^\delta }\\&\le h_n^\delta \left(1-\inf _{h^{\prime }\ge h_n-h_n^\delta }c_{h^{\prime }}f^{\prime }\left(1-q_{x-h_n^\delta }\right)\right), \end{aligned}$$

and, since this bound is independent of \(h\ge h_n\), the bound at (3.14) follows.

Now, by arguing similarly to (3.5), it is possible to check that

$$\begin{aligned} E_{\rho }^\mathcal{T ^*}\left(\sum _{i=0}^{nT-1}t_i^{\prime }\right)\le c_\beta \sum _{i=0}^{nT-1}\sum _{j=1}^{\tilde{Z}_i-1}\mathbf{1}_{\{h(\mathcal T _{ij})\ge h_n\}}\beta ^{h(\mathcal T _{ij}^{\prime })}\#\mathcal T _{ij}, \end{aligned}$$

where \(c_\beta \) is a constant depending only upon \(\beta \). Thus, following the end of the proof of Lemma 3.4,

$$\begin{aligned}&\mathbb P _\rho \left(\sum _{i=0}^{nT-1}t_i^{\prime }\ge \beta ^{h_n}\right)\\&\quad \le n^{-1}+nT\mathbf{P}\left(T c_\beta \sum _{j=1}^{\tilde{Z}_i-1}\mathbf{1}_{\{h(\mathcal T _{ij})\ge h_n\}}\beta ^{h(\mathcal T _{ij}^{\prime })}\#\mathcal T _{ij}\ge n^{-2}\beta ^{h_n}\right)\\&\quad \le n^{-1}+nT\mathbf{P}\left(T c_\beta \tilde{Z}_i\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\ge n^{-2}\beta ^{h_n/2}\right)\\&\qquad +nT\mathbf{P}\left(\max _{\begin{matrix} j=1,\dots ,\tilde{Z}_i-1:\\ h(\mathcal T _{ij})\ge h_n \end{matrix}}{h(\mathcal T _{ij}^{\prime })}\ge {h_n/2}\right). \end{aligned}$$

Clearly the first term decays to zero, and, by applying (3.9), so does the second term. To deal with the third term, observe that, under the convention that \(h(\mathcal T _{ij}^{\prime })=0\) for \(j\) such that \(h(\mathcal T _{ij})<h_n\),

$$\begin{aligned} \mathbf{P}\left(\max _{\begin{matrix} j=1,\dots ,\tilde{Z}_i-1:\\ h(\mathcal T _{ij})\ge h_n \end{matrix}}{h(\mathcal T _{ij}^{\prime })}\ge {h_n/2}\right)&= 1-\mathbf{E}\left(\left(1-\mathbf{P}\left(h(\mathcal T _{ij}^{\prime })\ge h_n/2\right)\right)^{\tilde{Z}-1}\right)\\&= 1-f^{\prime }\left(1-\mathbf{P}\left(h(\mathcal T _{ij}^{\prime })\ge h_n/2\right)\right)\\&\sim \alpha \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })\ge h_n/2\right)^{\alpha -1}L\left(\mathbf{P}\left(h(\mathcal T _{ij}^{\prime })\ge h_n/2\right)\right)\\&\sim \alpha q_{h_n}^{\alpha -1} \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })>h_n/2\,|\,h(\mathcal T _{ij})\ge h_n\right)^{\alpha -1}\\&\,L\left(q_{h_n} \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })>h_n/2\,|\,h(\mathcal T _{ij})\ge h_n\right)\right)\!, \end{aligned}$$

where we have used that \(f^{\prime }(1-x)\sim 1-\alpha x^{\alpha -1}L(x)\) as \(x\rightarrow 0^+\), which we first recalled in the proof of Lemma 3.1, and (1.6) again. Since the representation theorem for slowly varying functions ([29, Theorem 1.2], for example) implies that, for any \(\varepsilon >0\),

$$\begin{aligned}&L\left(q_{h_n} \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })>h_n/2\,|\,h(\mathcal T _{ij})\ge h_n\right)\right)\\&\quad \le \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })>h_n/2\,|\,h(\mathcal T _{ij})\ge h_n\right)^{-\varepsilon } L(q_{h_n}), \end{aligned}$$

for large \(n\), it follows that \(\mathbf{P}(\max _{{j=1,\dots ,\tilde{Z}_i-1:\,h(\mathcal T _{ij})\ge h_n}}{h(\mathcal T _{ij}^{\prime })}\ge {h_n/2})\) is asymptotically less than

$$\begin{aligned}&\alpha \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })>h_n/2\,|\,h(\mathcal T _{ij})\ge h_n\right)^{\alpha -1-\varepsilon }q_{h_n}^{\alpha -1}L(q_{h_n})\nonumber \\&\quad \sim \frac{\alpha \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })>h_n/2\,|\,h(\mathcal T _{ij})\ge h_n\right)^{\alpha -1-\varepsilon }}{(\alpha -1)h_n}. \end{aligned}$$

Finally, setting \(x=h_n/2\) in (3.14) and applying Lemma 3.5 yields

$$\begin{aligned} \mathbf{P}\left(h(\mathcal T _{ij}^{\prime })>h_n/2\,|\,h(\mathcal T _{ij})\ge h_n\right)&\le h_n^\delta \left(1-\inf _{h\ge h_n-h_n^\delta }c_hf^{\prime }\left(1-q_{ 2^{-1}h_n-h_n^\delta }\right)\right)\\&\sim \alpha h_n^\delta q_{2^{-1} h_n-h_n^\delta }^{\alpha -1}L\left(q_{2^{-1} h_n-h_n^\delta }\right)\\&\sim c h_n^{\delta -1}, \end{aligned}$$

for a suitable choice of constant \(c\), and so, by adjusting \(c\) as necessary, we obtain that, for large \(n\),

$$\begin{aligned} nT\mathbf{P}\left(\max _{\begin{matrix} j=1,\dots ,\tilde{Z}_i-1:\\ h(\mathcal T _{ij})\ge h_n \end{matrix}}{h(\mathcal T _{ij}^{\prime })}\ge {h_n/2}\right)\le cnh_n^{-1} h_n^{(\delta -1)(\alpha -1-\varepsilon )}. \end{aligned}$$

Since this upper bound converges to 0 for any \(\varepsilon <\alpha -1\), this completes the proof.

\(\square \)

In deriving tail asymptotics for the time \(X\) spends in the big leaves emanating from a particular backbone vertex, it will be useful to have information about the set of big leaves that the biased random walk visits deeply before it escapes along the backbone, and the next two lemmas provide this. For their statement, we define the index set of big leaves emanating from \(\rho _i\) by

$$\begin{aligned} B_i:=\left\{ j=1,\dots ,\tilde{Z}_i-1:\,h(\mathcal T _{ij})\ge h_n\right\} \end{aligned}$$

and the subset of those that are visited deeply by \(X\) before it escapes a certain distance along the backbone by

$$\begin{aligned} V_i:=\left\{ j\in B_i:\,\tau _{x_{ij}}<\tau _{z_i}\right\} \!, \end{aligned}$$

where \(z_i:=\rho _{i+1+h_n^\delta }\).

Lemma 3.7

Let \(\alpha \in (1,2]\) and \(i\ge 0\). For any \(A\subseteq B_i\), we have

$$\begin{aligned} P^\mathcal{T ^*}_{\rho _i}\left(V_i=A\right)=\frac{1}{1+\#B_i}\genfrac(){0.0pt}{}{\#B_i}{\#A}^{-1}, \end{aligned}$$

where \(\#B_i\) and \(\#A\) represent the cardinalities of \(B_i\) and \(A\), respectively.

Proof

The lemma readily follows from the symmetry of the situation, which implies that, starting from \(\rho _i\), the biased random walk \(X\) is equally likely to visit any one of \(x_{ij},\,j\in B_i\) and \(z_i\) first. \(\square \)
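For instance, if \(\#B_i=2\), then Lemma 3.7 gives

$$\begin{aligned} P^\mathcal{T ^*}_{\rho _i}\left(V_i=\emptyset \right)=P^\mathcal{T ^*}_{\rho _i}\left(V_i=B_i\right)=\frac{1}{3}\quad \text{and}\quad P^\mathcal{T ^*}_{\rho _i}\left(V_i=\{j\}\right)=\frac{1}{6}\quad \text{for }j\in B_i, \end{aligned}$$

and these probabilities indeed sum to one: the position of \(z_i\) in the order in which the three targets are first hit is uniform, and, given that position, the set of entrances hit beforehand is a uniformly chosen subset of the appropriate size.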

Although the above lemma might seem simple, it allows us to deduce the distributional tail behaviour of the greatest height of a big leaf at a particular backbone vertex visited by the biased random walk \(X\). Note that we continue to use the notation \(q_n=\mathbf{P}(Z_n>0)\).

Lemma 3.8

Let \(\alpha \in (1,2]\) and \(i\ge 0\). For \(x\ge h_n\),

$$\begin{aligned} \mathbb P _\rho \left(\max _{j\in V_i}h(\mathcal T _{ij})\ge x\right)=q_x^{\alpha -1}L\left(q_{x}\right)\!. \end{aligned}$$

Proof

Let \(x\ge h_n\). By definition, we have that

$$\begin{aligned} \mathbb P _\rho \left(\max _{j\in V_i}h(\mathcal T _{ij})< x\right)= \mathbf{E}\left({E}_\rho ^\mathcal{T ^*}\left(\prod _{j\in V_i}\mathbf{1}_{\{h(\mathcal T _{ij})<x\}}\right)\right), \end{aligned}$$

and decomposing the inner expectation over the possible values of \(V_i\) yields

$$\begin{aligned} {E}_\rho ^\mathcal{T ^*}\left(\prod _{j\in V_i}\mathbf{1}_{\{h(\mathcal T _{ij})<x\}}\right) =\sum _{A\subseteq B_i}{E}_\rho ^\mathcal{T ^*}\left(\mathbf{1}_{\{V_i=A\}}\prod _{j\in A}\mathbf{1}_{\{h(\mathcal T _{ij})<x\}}\right)\!. \end{aligned}$$

Since \(\prod _{j\in A}\mathbf{1}_{\{h(\mathcal T _{ij})<x\}}\) is a measurable function of \(\mathcal T ^*\), this can be rewritten as

$$\begin{aligned} {E}_\rho ^\mathcal{T ^*}\left(\prod _{j\in V_i}\mathbf{1}_{\{h(\mathcal T _{ij})<x\}}\right)&= \sum _{A\subseteq B_i} {P}_\rho ^\mathcal{T ^*}\left(V_i=A\right)\prod _{j\in A}\mathbf{1}_{\{h(\mathcal T _{ij})<x\}}\\&= \sum _{A\subseteq B_i} \frac{1}{\#B_i+1}\genfrac(){0.0pt}{}{\#B_i}{\#A}^{-1}\prod _{j\in A}\mathbf{1}_{\{h(\mathcal T _{ij})<x\}}, \end{aligned}$$

where the second equality is an application of Lemma 3.7. Now, since

$$\begin{aligned} \mathbf{E}\left(\prod _{j\in A}\mathbf{1}_{\{h(\mathcal T _{ij})<x\}}\,|\,B_i\right)=\mathbf{P}\left(h(\mathcal T )< x\,|\,h(\mathcal T )\ge h_n\right)^{\#A}= \left(1-\frac{q_x}{q_{h_n}}\right)^{\#A}, \end{aligned}$$

for every \(A\subseteq B_i\), it follows that

$$\begin{aligned} \mathbb P _\rho \left(\max _{j\in V_i}h(\mathcal T _{ij})< x\right)&= \mathbf{E}\left(\sum _{A\subseteq B_i} \frac{1}{\#B_i+1}\genfrac(){0.0pt}{}{\#B_i}{\#A}^{-1} \mathbf{E}\left(\prod _{j\in A}\mathbf{1}_{\{h(\mathcal T _{ij})<x\}}\,|\,B_i\right)\right)\\&= \mathbf{E}\left(\sum _{A\subseteq B_i} \frac{1}{\#B_i+1}\genfrac(){0.0pt}{}{\#B_i}{\#A}^{-1} \left(1-\frac{q_x}{q_{h_n}}\right)^{\#A}\right)\\&= \mathbf{E}\left(\frac{1}{\#B_i+1}\sum _{l=0}^{\#B_i}\left(1-\frac{q_x}{q_{h_n}}\right)^{l}\right)\\&= \mathbf{E}\left(\frac{q_{h_n}}{(\#B_i+1)q_x}\left(1-\left(1-\frac{q_x}{q_{h_n}}\right)^{\#B_i+1}\right)\right). \end{aligned}$$

To continue, observe that, conditional on \(\tilde{Z}_i,\,\#B_i\) is binomially distributed with parameters \(\tilde{Z}_i-1\) and \(q_{h_n}\). Consequently, the probability we are trying to compute is equal to

$$\begin{aligned} \mathbf{E}\left(\sum _{l=0}^{\tilde{Z}-1}\genfrac(){0.0pt}{}{\tilde{Z}-1}{l}q_{h_n}^{l}\left(1-q_{h_n}\right)^{\tilde{Z}-1-l}\frac{q_{h_n}}{(l+1)q_x}\left(1-\left(1-\frac{q_x}{q_{h_n}}\right)^{l+1}\right)\right).\qquad \end{aligned}$$
(3.15)

We break this into two terms. Firstly,

$$\begin{aligned}&\mathbf{E}\left(\sum _{l=0}^{\tilde{Z}-1}\genfrac(){0.0pt}{}{\tilde{Z}-1}{l}q_{h_n}^{l}\left(1-q_{h_n}\right)^{\tilde{Z}-1-l}\frac{q_{h_n}}{(l+1)q_x}\right)\nonumber \\&\quad =q_x^{-1}\mathbf{E}\left(\sum _{l=0}^{{Z}-1}\genfrac(){0.0pt}{}{Z}{l+1}q_{h_n}^{l+1}\left(1-q_{h_n}\right)^{{Z}-1-l}\mathbf{1}_{\{Z\ge 1\}}\right)\nonumber \\&\quad =q_x^{-1}\mathbf{E}\left(1-\left(1-q_{h_n}\right)^Z\right)\nonumber \\&\quad =q_x^{-1}\left(1-f\left(1-q_{h_n}\right)\right)\!. \end{aligned}$$
(3.16)

Secondly,

$$\begin{aligned}&\mathbf{E}\left(\sum _{l=0}^{\tilde{Z}-1}\genfrac(){0.0pt}{}{\tilde{Z}-1}{l}q_{h_n}^{l}\left(1-q_{h_n}\right)^{\tilde{Z}-1-l}\frac{q_{h_n}}{(l+1)q_x}\left(1-\frac{q_x}{q_{h_n}}\right)^{l+1}\right)\nonumber \\&\quad =q_x^{-1}\mathbf{E}\left(\sum _{l=0}^{{Z}-1}\genfrac(){0.0pt}{}{Z}{l+1}\left(q_{h_n}-q_x\right)^{l+1}\left(1-q_{h_n}\right)^{{Z}-1-l}\mathbf{1}_{\{Z\ge 1\}}\right)\nonumber \\&\quad =q_x^{-1}\left(f\left(1-q_x\right)-f\left(1-q_{h_n}\right)\right)\!. \end{aligned}$$
(3.17)

Since taking the difference between (3.16) and (3.17) gives us (3.15), we have thus proved that

$$\begin{aligned} \mathbb P _\rho \left(\max _{j\in V_i}h(\mathcal T _{ij})< x\right)&= q_x^{-1}\left(1-f\left(1-q_x\right)\right)\\&= q_x^{-1}\left(q_x-q_x^\alpha L\left(q_x\right)\right)\\&= 1-q_x^{\alpha -1}L\left(q_{x}\right)\!, \end{aligned}$$

where the second equality is a consequence of (1.5), and the lemma follows. \(\square \)
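Combined with (1.6), the identity of Lemma 3.8 takes the simple asymptotic form that will be used repeatedly in what follows: for \(x\ge h_n\),

$$\begin{aligned} \mathbb P _\rho \left(\max _{j\in V_i}h(\mathcal T _{ij})\ge x\right)=q_x^{\alpha -1}L\left(q_{x}\right)\sim \frac{1}{(\alpha -1)x}, \end{aligned}$$

as \(x\rightarrow \infty \).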

With these preparations in place, we are now ready to study the asymptotic tail behaviour of

$$\begin{aligned} t_i:=\sum _{j=1}^{\tilde{Z}_i-1}\mathbf{1}_{\{h(\mathcal T _{ij})\ge h_n\}}\sum _{m=0}^{\tau _{z_i}}\mathbf{1}_{\{X_m\in \mathcal T _{ij}(x_{ij})\}}, \end{aligned}$$

which can be interpreted as the length of time \(X\) spends deep inside leaves emanating from \(\rho _i\) before escaping along the backbone. The next lemma gives an upper tail bound for this random variable.

Lemma 3.9

Let \(\alpha \in (1,2]\) and \(\varepsilon >0\). There exists a constant \(c_{\beta ,\varepsilon }\) such that, for any \(i\ge 0\) and \(x\) satisfying \(\ln x\ge c_{\beta ,\varepsilon }h_n\),

$$\begin{aligned} \mathbb P _\rho \left(t_i\ge x\right)\le \frac{(1+\varepsilon )\ln \beta }{(\alpha -1)\ln x}. \end{aligned}$$

Proof

First note that, by applying the commute time identity for random walks (for example, [27, Proposition 10.6]), we have that

$$\begin{aligned} E^{\{\rho _i\}\cup \mathcal T _{ij}}_{x_{ij}}\tau _{\rho _i}+E^{\{\rho _i\}\cup \mathcal T _{ij}}_{\rho _i}\tau _{x_{ij}}=\left(\beta ^{-i}+\dots +\beta ^{-(i+h_n^\delta )}\right)\left(2\beta ^{i}+\sum _{\begin{matrix} x,y\in \mathcal T _{ij}:\\ x\sim y \end{matrix}}c(x,y)\right), \end{aligned}$$

where \(E^{\{\rho _i\}\cup \mathcal T _{ij}}_\cdot \) refers to the random walk on the tree \(\mathcal T _{ij}\) extended by adding the vertex \(\rho _i\) and the edge \(\{\rho _i,\rho _{ij}\}\). Since \(E^{\{\rho _i\}\cup \mathcal T _{ij}}_{x_{ij}}\tau _{\rho _i}=E^\mathcal{T ^*}_{x_{ij}}\tau _{\rho _i}\), it follows that

$$\begin{aligned} E^\mathcal{T ^*}_{x_{ij}}\tau _{\rho _i}\le \frac{\beta }{\beta -1}\left(2+2\beta ^{h(\mathcal T _{ij})}\#\mathcal T _{ij}\right)\!. \end{aligned}$$

Thus, since the random walk \(X\) spends no time in \(\mathcal T _{ij}(x_{ij})\) if \(j\not \in V_i\), we can bound the quenched expectation of \(t_i\) conditional on \(V_i\) as follows:

$$\begin{aligned} E^\mathcal{T ^*}_\rho \left( t_i\,|\,V_i\right)&= \sum _{j=1}^{\tilde{Z}_i-1}\sum _{m=0}^{\infty }P^\mathcal{T ^*}_\rho \left(X_m\in \mathcal T _{ij}(x_{ij}),\,m\le \tau _{z_i}\,|\,V_i\right)\nonumber \\&\le \sum _{j\in V_i} E^\mathcal{T ^*}_{x_{ij}}\left(\tau _{\rho _i}\right) E^\mathcal{T ^*}_{\rho }\left(\upsilon _{ij}\,|\,V_i\right)\!, \end{aligned}$$
(3.18)
$$\begin{aligned}&\le c_{\beta }\sum _{j\in V_i}\left(1+\#\mathcal T _{ij}\right)\beta ^{h(\mathcal T _{ij})}E^\mathcal{T ^*}_{\rho }\left(\upsilon _{ij}\,|\,V_i\right)\!, \end{aligned}$$
(3.19)

where \(\upsilon _{ij}\) is the number of passages \(X\) makes from \(\rho _i\) to \(x_{ij}\) before it hits \(z_i\), and the inequality at (3.18) is obtained by an application of the strong Markov property (that holds with respect to the unconditioned law). Now, \(\upsilon _{ij}\) is clearly bounded above by the total number of visits to \(\rho _i,\,N(\rho _i)\) say, and, by symmetry, this latter random variable satisfies \(E^\mathcal{T ^*}_{\rho }(N(\rho _i)\,|\,V_i) =E^\mathcal{T ^*}_{\rho }(N(\rho _i)\,|\,\#V_i)\). Consequently, we deduce that

$$\begin{aligned} E^\mathcal{T ^*}_{\rho }\left(\upsilon _{ij}\,|\,V_i\right)&\le \sum _{k=0}^{\#B_i}\mathbf{1}_{\{\#V_i=k\}}E^\mathcal{T ^*}_{\rho }\left(N(\rho _i)\,|\,\#V_i=k\right)\\&\le \sum _{k=0}^{\#B_i}\mathbf{1}_{\{\#V_i=k\}}\frac{E^\mathcal{T ^*}_{\rho }\left(N(\rho _i)\right)}{P^\mathcal{T ^*}_{\rho }\left(\#V_i=k\right)}\\&\le c_\beta \sum _{k=0}^{\#B_i}\mathbf{1}_{\{\#V_i=k\}}(\#B_i+1)\tilde{Z}_i\\&\le c_\beta \tilde{Z}_i^2, \end{aligned}$$

where we have applied Lemma 3.7 and the argument at (3.4) to deduce \(P^\mathcal{T ^*}_{\rho }(\#V_i=k)^{-1}=\#B_i+1\) and \(E^\mathcal{T ^*}_{\rho }(N(\rho _i))\le c_{\beta }\tilde{Z}_i\), respectively. Applying the above bound in combination with (3.19) yields

$$\begin{aligned} E^\mathcal{T ^*}_\rho \left( t_i\,|\,V_i\right)\le c_\beta \tilde{Z}_i^3 \left(1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right)\beta ^{\max \nolimits _{j\in V_i}h(\mathcal T _{ij})}. \end{aligned}$$

Thus, for \(\eta \in (0,\frac{1}{2})\), we can conclude

$$\begin{aligned} \mathbb P _\rho \left(t_i\ge x\right)&\le \mathbb P _\rho \left(t_i\ge x^\eta E^\mathcal{T ^*}_\rho \left( t_i\,|\,V_i\right)\right)+\mathbb P _\rho \left(E^\mathcal{T ^*}_\rho \left( t_i\,|\,V_i\right)\ge x^{1-\eta } \right)\\&\le x^{-\eta }+\mathbb P _\rho \left(c_\beta \tilde{Z}_i^3 \left(1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right)\beta ^{\max \nolimits _{j\in V_i}h(\mathcal T _{ij})}\ge x^{1-\eta }\right)\\&\le x^{-\eta }+\mathbf{P}\left(c_\beta \tilde{Z}_i^3 \left(1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right)^3\ge x^\eta \right)\\&\quad +\mathbb P _\rho \left(\beta ^{\max \nolimits _{j\in V_i}h(\mathcal T _{ij})}\ge x^{1-2\eta }\right)\\&\le x^{-\eta }+c_\beta x^{-\eta \delta /3}+q_{(1-2\eta )\ln x/\ln \beta }^{\alpha -1}L\left(q_{(1-2\eta )\ln x/\ln \beta }\right) \end{aligned}$$

for \((1-2\eta )\ln x/\ln \beta \ge h_n\), where the value of \(c_\beta \) has been updated from above and the constant \(\delta \) is the one appearing in (3.9). We have also applied Lemma 3.8 in obtaining the final bound. Finally, (1.6) allows us to deduce from this that, as long as \((1-2\eta )\ln x/\ln \beta \) is sufficiently large, it holds that

$$\begin{aligned} \mathbb P _\rho \left(t_i\ge x\right)\le \frac{(1+\eta )\ln \beta }{(1-2\eta )(\alpha -1)\ln x}. \end{aligned}$$

The result follows. \(\square \)

We can also prove a lower bound for the distributional tail of \(t_i\) that matches the upper bound proved above. Similarly to a proof strategy followed in [6], a key step in doing this is obtaining a concentration result to show that the time spent in a leaf visited deeply by the process \(X\) will be on the same scale as its expectation.

Lemma 3.10

Let \(\alpha \in (1,2]\) and \(\varepsilon >0\). There exist constants \(n_0\) and \(c_{\beta ,\varepsilon }\) such that, for any \(i\ge 0,\,n\ge n_0\) and \(x\) satisfying \(c_{\beta ,\varepsilon }h_n \le \ln x\le n^2\),

$$\begin{aligned} \mathbb P _\rho \left(t_i\ge x\right)\ge \frac{(1-\varepsilon )\ln \beta }{(\alpha -1)\ln x}. \end{aligned}$$

Proof

Our first goal is to derive an estimate on the lower tail of the time that \(X\) spends in a big leaf \(\mathcal T _{ij}\) before hitting \(\rho _i\), given that it starts at the entrance vertex \(x_{ij}\). To this end, we start by noting that under \(P^\mathcal{T ^*}_{x_{ij}}\) and conditional on the number of returns that the random walk \(X\) makes to \(\mathcal T _{ij}(x_{ij})\) before hitting \(\rho _i\), i.e.

$$\begin{aligned} \upsilon ^{\prime }_{ij}:=\#\left\{ m\le \tau _{\rho _i}:\,X_{m-1}=x_{ij}^{\prime },\,X_m=x_{ij}\right\} \!, \end{aligned}$$

where \(x_{ij}^{\prime }\) denotes the parent of \(x_{ij}\), the random variable \(\Sigma :=\sum _{m=0}^{\tau _{\rho _i}}\mathbf{1}_{\{X_m\in \mathcal T _{ij}(x_{ij})\}}\) is distributed as the sum of \(\upsilon ^{\prime }_{ij}+1\) independent copies of a random variable whose law is equal to that of \(\tau _{x_{ij}^{\prime }}\) under \(P^\mathcal{T ^*}_{x_{ij}}\). (This is a simple application of the strong Markov property.) In particular, we have that

$$\begin{aligned} E_{x_{ij}}^\mathcal{T ^*}\left(\Sigma \,|\,\upsilon ^{\prime }_{ij}\right)= \left(1+\upsilon ^{\prime }_{ij}\right)E_{x_{ij}}^\mathcal{T ^*}\tau _{x_{ij}^{\prime }}, \end{aligned}$$

and also

$$\begin{aligned} \mathrm{Var}_{x_{ij}}^\mathcal{T ^*}\left(\Sigma \,|\,\upsilon ^{\prime }_{ij}\right)=\left(1+\upsilon ^{\prime }_{ij}\right) \mathrm{Var}_{x_{ij}}^\mathcal{T ^*}\left(\tau _{x_{ij}^{\prime }}\right)\!. \end{aligned}$$

To control the right-hand sides of these formulas, we will apply the following moment bounds:

$$\begin{aligned} E_{x_{ij}}^\mathcal{T ^*}\tau _{x_{ij}^{\prime }}=1+\beta ^{-(i+h_n^\delta )}\sum _{\begin{matrix} x,y\in \mathcal T _{ij}(x_{ij}):\\ x\sim y \end{matrix}}c(x,y)\ge \beta ^{h(\mathcal T _{ij})-h_n^\delta } \end{aligned}$$

and

$$\begin{aligned} E_{x_{ij}}^\mathcal{T ^*}\left(\tau _{x_{ij}^{\prime }}^2\right)\le \frac{2\beta }{\beta -1}E_{x_{ij}}^\mathcal{T ^*}\left(\tau _{x_{ij}^{\prime }}\right)^2, \end{aligned}$$

where the first moment lower bound is obtained by applying a formula similar to (3.2), and the second moment upper bound is an adaptation of a result derived in the proof of [6, Lemma 9.1]. As for the distribution of \(\upsilon ^{\prime }_{ij}\) under \(P^\mathcal{T ^*}_{x_{ij}}\), it is clear this is geometric, with parameter given by

$$\begin{aligned} P_{x_{ij}^{\prime }}^\mathcal{T ^*}\left(\tau _{\rho _i}<\tau _{x_{ij}}\right)=\frac{\left(1+\beta ^{-1}+\dots +\beta ^{-h_n^\delta +1}\right)^{-1}}{\beta ^{h_n^\delta }+\left(1+\beta ^{-1}+\dots +\beta ^{-h_n^\delta +1}\right)^{-1}}=\frac{\beta ^{h_n^\delta }-\beta ^{h_n^\delta -1}}{\beta ^{2h_n^\delta }-\beta ^{h_n^\delta -1}}, \end{aligned}$$

from which it follows that

$$\begin{aligned} E_{x_{ij}}^\mathcal{T ^*}\left(\upsilon ^{\prime }_{ij}+1\right)=\frac{\beta ^{h_n^\delta }-\beta ^{-1}}{1-\beta ^{-1}}\ge \beta ^{h_n^\delta }-\beta ^{-1}\ge \frac{\beta ^{h_n^\delta }}{2}, \end{aligned}$$

for \(n\ge n_0\), where \(n_0\) is a deterministic constant. Putting the above observations together, we deduce that, for \(n\ge n_0\) and \(\varepsilon >0\),

$$\begin{aligned}&P^\mathcal{T ^*}_{x_{ij}}\left(\Sigma \le \frac{\varepsilon }{4}\beta ^{h(\mathcal T _{ij})}\right)\\&\quad \le P^\mathcal{T ^*}_{x_{ij}}\left(\Sigma \le \frac{\varepsilon }{4}\beta ^{h(\mathcal T _{ij})},\,\upsilon ^{\prime }_{ij}+1\ge \varepsilon E^\mathcal{T ^*}_{x_{ij}} (\upsilon ^{\prime }_{ij}+1)\right)\\&\qquad + P^\mathcal{T ^*}_{x_{ij}}\left(\upsilon ^{\prime }_{ij}+1<\varepsilon E^\mathcal{T ^*}_{x_{ij}} (\upsilon ^{\prime }_{ij}+1)\right)\\&\quad \le P^\mathcal{T ^*}_{x_{ij}}\left(\Sigma \le \frac{1}{2}\left(\upsilon ^{\prime }_{ij}+1\right)E_{x_{ij}}^\mathcal{T ^*}\tau _{x_{ij}^{\prime }}\right)+1-\left(1-\frac{\beta ^{h_n^{\delta }}-\beta ^{h_n^{\delta }-1}}{\beta ^{2h_n^{\delta }}-\beta ^{h_n^{\delta }-1}}\right)^{\varepsilon E^\mathcal{T ^*}_{x_{ij}} (\upsilon ^{\prime }_{ij}+1)} \\&\quad \le P^\mathcal{T ^*}_{x_{ij}}\left(\left| \Sigma -E_{x_{ij}}^\mathcal{T ^*}\left(\Sigma \,|\,\upsilon ^{\prime }_{ij}\right)\right|\ge \frac{1}{2}E_{x_{ij}}^\mathcal{T ^*}\left(\Sigma \,|\,\upsilon ^{\prime }_{ij}\right)\right)+\varepsilon \\&\quad \le E^\mathcal{T ^*}_{x_{ij}}\left(\frac{4\mathrm{Var}_{x_{ij}}^\mathcal{T ^*}\left(\Sigma \,|\,\upsilon ^{\prime }_{ij}\right)}{E_{x_{ij}}^\mathcal{T ^*}\left(\Sigma \,|\,\upsilon ^{\prime }_{ij}\right)^2}\right)+\varepsilon \\&\quad \le \frac{8\beta }{\beta -1}E^\mathcal{T ^*}_{x_{ij}}\left(\frac{1}{\upsilon ^{\prime }_{ij}+1}\right)+\varepsilon \\&\quad \le c h_n^\delta \beta ^{-h_n^\delta }+\varepsilon , \end{aligned}$$

where \(c\) is a constant depending only on \(\beta \) and \(n_0\) (and not \(\varepsilon \)).
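The final inequality of the preceding display rests on an elementary computation for geometric random variables, which we record here: if \(G\) satisfies \(P(G=k)=(1-p)^{k}p\) for \(k\ge 0\), then

$$\begin{aligned} \sum _{k\ge 0}\frac{(1-p)^{k}p}{k+1}=\frac{p}{1-p}\ln \left(\frac{1}{p}\right)\!, \end{aligned}$$

and applying this with the success parameter of \(\upsilon ^{\prime }_{ij}\) computed above, which is of order \(\beta ^{-h_n^\delta }\), yields a bound of order \(h_n^\delta \beta ^{-h_n^\delta }\).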

Now, if we suppose \(j_0\in V_i\) is such that \(h(\mathcal T _{ij_0})=\max _{j\in V_i}h(\mathcal T _{ij})\), then

$$\begin{aligned} P^\mathcal{T ^*}_\rho \left(t_i\le \frac{\varepsilon }{4}\beta ^{\max \nolimits _{j\in V_i}h(\mathcal T _{ij})}\,|\,V_i\right)\le P^\mathcal{T ^*}_\rho \left(\tau _{x_{ij_0}\rightarrow \rho _i}\le \frac{\varepsilon }{4}\beta ^{h(\mathcal T _{ij_0})}\,|\,V_i\right)\!, \end{aligned}$$

where \(\tau _{x_{ij_0}\rightarrow \rho _i}\) is the amount of time \(X\) spends in \(\mathcal T _{ij_0}(x_{ij_0})\) before \(\inf \{m\ge \tau _{x_{ij_0}}:\,X_m=\rho _i\}\). Applying a strong Markov argument for the unconditioned law (cf. (3.18)) yields that the law of \(\tau _{x_{ij_0}\rightarrow \rho _i}\) under \(P^\mathcal{T ^*}_{\rho }(\cdot |\,V_i)\) is the same as that of \(\Sigma \) (as defined above with \(j=j_0\)) under \(P^\mathcal{T ^*}_{x_{ij_0}}\), and thus the result of the previous paragraph implies that, for \(n\ge n_0\) and \(\varepsilon >0\),

$$\begin{aligned} P^\mathcal{T ^*}_\rho \left(t_i\le \frac{\varepsilon }{4}\beta ^{\max \nolimits _{j\in V_i}h(\mathcal T _{ij})}\,|\,V_i\right)\le c h_n^\delta \beta ^{-h_n^\delta }+\varepsilon . \end{aligned}$$

Taking expectations with respect to \(P^\mathcal{T ^*}_\rho \) and \(\mathbf{P}\) establishes that the same is true when \(P^\mathcal{T ^*}_{\rho }(\cdot |\,V_i)\) is replaced by the annealed law \(\mathbb P _\rho \). Consequently, for any \(n\ge n_0,\,\varepsilon >0\) and \(\ln x\ge h_n\ln \beta \),

$$\begin{aligned} \mathbb P _\rho \left(t_i\le x\right)&\le \mathbb P _\rho \left(t_i\le \frac{\varepsilon }{4}\beta ^{\max \nolimits _{j\in V_i}h(\mathcal T _{ij})}\right)+\mathbb P _\rho \left(\max _{j\in V_i}h(\mathcal T _{ij})\le \frac{\ln (4x/\varepsilon )}{\ln \beta }\right)\\&\le c h_n^\delta \beta ^{-h_n^\delta }+\varepsilon + 1-q_{\frac{\ln (4x/\varepsilon )}{\ln \beta }}^{\alpha -1}L\left(q_{\frac{\ln (4x/\varepsilon )}{\ln \beta }}\right), \end{aligned}$$

where we have applied Lemma 3.8 to deduce the second inequality. Finally, fix \(\eta >0\). If we set \(\varepsilon =1/(\ln x)^2\), then the second term is bounded above by \(\eta /\ln x\) for any \(\ln x\ge \eta ^{-1}\). With this choice of \(\varepsilon \), by (1.6), the fourth term is bounded above by \(-(1-\eta )\ln (\beta )/(\alpha -1)\ln x\), uniformly over \(\ln x\ge x_0\), for suitably large \(x_0=x_0(\eta )\). Moreover, it holds that, \(c h_n^\delta \beta ^{-h_n^\delta }=o(n^{-2})=o(1/\ln x)\), uniformly over \(\ln x\le n^2\), and this completes the proof. \(\square \)

Finally for this section, we establish that the same distributional tail behaviour holds for the random variables

$$\begin{aligned} \tilde{t}_i:=\sum _{j=1}^{\tilde{Z}_i-1}\mathbf{1}_{\{h(\mathcal T _{ij})\ge h_n\}}\sum _{m=\Delta _i}^{\Delta _{i,(\ln n)^{1+\gamma }}}\mathbf{1}_{\{X_m\in \mathcal T _{ij}(x_{ij})\}}, \end{aligned}$$
(3.20)

where \(\Delta _{i,(\ln n)^{1+\gamma }}\) is the first time after \(\Delta _i\) that the process \(X\) hits a backbone vertex outside of the interval \(\{\rho _{i-(\ln n)^{1+\gamma }},\dots ,\rho _{i+(\ln n)^{1+\gamma }}\}\). Given the backtracking result of Lemma 3.3, with high probability it is the case that \(\tilde{t}_i\) will be identical to the \(t_i\) for all relevant indices \(i\). However, the advantage of the sequence \((\tilde{t}_i)\) over \((t_i)\) is that, similarly to the sequence of random variables \((\tilde{T}_x)\) introduced for the directed trap model at (2.5), at least when the traps are suitably well-spaced, it is possible to decouple the elements of \((\tilde{t}_i)\) in such a way as to be able to usefully compare them with an independent sequence.

Lemma 3.11

Let \(\alpha \in (1,2]\) and \(\varepsilon >0\). There exist constants \(n_0\) and \(c_{\beta ,\varepsilon }\) such that, for any \(i\ge 0,\,n\ge n_0\) and \(x\) satisfying \(c_{\beta ,\varepsilon }h_n \le \ln x\le n^2\),

$$\begin{aligned} \frac{(1-\varepsilon )\ln \beta }{(\alpha -1)\ln x}\le \mathbb P _\rho \left(\tilde{t}_i\ge x\right)\le \frac{(1+\varepsilon )\ln \beta }{(\alpha -1)\ln x}. \end{aligned}$$

Proof

If the process \(X\) does not hit \(\rho _{i-1-(\ln n)^{1+\gamma }}\) again after having hit \(\rho _i\), and does not hit \(\rho _{i}\) again after having hit \(\rho _{i+1+(\ln n)^{1+\gamma }}\), then \(\tilde{t}_i\) is equal to \(t_i\). Hence,

$$\begin{aligned} \mathbb P _\rho \left(t_i\ne \tilde{t}_i\right)\le 2\mathbf{E}\left( P_{\rho _{1+(\ln n)^{1+\gamma }}}^\mathcal{T ^*}(\tau _{\rho }<\infty )\right)\!. \end{aligned}$$

An elementary calculation for the biased random walk on a line shows that the right-hand side here is equal to \(2\beta ^{-1-(\ln n)^{1+\gamma }}=o(n^{-2})\). Applying this fact, it is easy to deduce the result from Lemmas 3.9 and 3.10. \(\square \)
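For the reader's convenience, we spell out the elementary calculation used above (a standard gambler's-ruin computation; under the model's conductance conventions the dangling leaves do not affect the hitting probabilities of backbone vertices, so the walk observed on the backbone is simply the \(\beta \)-biased walk): for any \(d\ge 1\),

$$\begin{aligned} P_{\rho _{d}}^\mathcal{T ^*}\left(\tau _{\rho }<\infty \right)=\beta ^{-d},\qquad \text{so that}\qquad 2\mathbf{E}\left( P_{\rho _{1+(\ln n)^{1+\gamma }}}^\mathcal{T ^*}(\tau _{\rho }<\infty )\right)=2\beta ^{-1-(\ln n)^{1+\gamma }}=o(n^{-2}). \end{aligned}$$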

3.3 Proof of main result for critical Galton–Watson trees

The purpose of this section is to complete the proof of our main results for biased random walks on critical Galton–Watson trees (Theorem 1.1 and Corollary 1.2).

Proof of Theorem 1.1

We start the proof by claiming that the conclusion of the theorem holds when the hitting time sequence \((\Delta _{n})_{n\ge 0}\) is replaced by \((\sum _{i=0}^{n-1}\tilde{t}_i)_{n\ge 0}\). By imitating the proof of Lemma 2.5 with Lemma 3.2 in place of Lemma 2.1, to verify that this is indeed the case, it will be enough to prove the same result for \((\sum _{i=0}^{n-1}\tilde{t}_i^{\prime })_{n\ge 0}\), where \((\tilde{t}_i^{\prime })_{i\ge 0}\) is an independent sequence such that \(\tilde{t}_i^{\prime }\sim \tilde{t}_{1+(\ln n)^{1+\gamma }}\) for each \(i\). (Note that, because the elements of the sequence \((\tilde{t}_i)_{i\ge 0}\) are only identically-distributed for \(i\ge 1+(\ln n)^{1+\gamma }\), we do not take \(\tilde{t}_i^{\prime }\sim \tilde{t}_i\) for each \(i\). By applying the second part of Lemma 3.2, which shows that with high probability there will be no big leaves in the interval close to \(\rho \), it is easy to adapt the argument of Lemma 2.5 to overcome this issue.) Since the tail asymptotics of Lemma 3.11 mean that the relevant functional scaling limit for \((\sum _{i=0}^{n-1}\tilde{t}^{\prime }_i)_{n\ge 0}\) is an immediate application of Theorem 5.1 (with \(h_1(n)=\ln n\) and \(h_2(n)=n^{-1}\)), our claim holds as desired.

Now, fix \(T\in (0,\infty )\). By Lemmas 3.3, 3.4 and 3.6, with probability converging to one we have that, for every \(t\in [0,T]\),

$$\begin{aligned} \sum _{i=0}^{nt-1-(\ln n)^{1+\gamma }}\tilde{t}_i\le \Delta _{nt}\le \sum _{i=0}^{nt-1}\tilde{t}_i+2\beta ^{2h_n}. \end{aligned}$$

By repeating the proof of Theorem 1.5 exactly with the particular choice \(L(x):=\log _+x\), this, in conjunction with the conclusion of the previous paragraph, yields the result. \(\square \)

Proof of Corollary 1.2

Since the proof is identical to that of Corollary 1.6, with \(\bar{F}(x)\) being taken to be a distributional tail that is asymptotically equivalent to \(\ln \beta /(\alpha -1)\ln x\), we omit it. \(\square \)

3.4 Growth rate of quenched mean hitting times

The purpose of this section is to compare the growth rate of \(E^\mathcal{T ^*}_\rho \Delta _n\), that is, the quenched expectations of the hitting times \(\Delta _n\), with the growth rate of \(\Delta _n\) that was established in the previous section. Interestingly, in the result corresponding to Theorem 1.1 (see Theorem 3.14 below), an extra factor of \(\alpha \) appears, meaning that the sequence of quenched expectations grows more quickly than the hitting times themselves. This is primarily due to the fact that the quenched expectation \(E^\mathcal{T ^*}_\rho \Delta _n\) feels all the big leaves at a particular backbone vertex, whereas the hitting time \(\Delta _n\) only feels the big leaves that are deeply visited by \(X\). Indeed, the extra \(\alpha \) is most easily understood by comparing the following lemma, which describes the height of the biggest leaf at a particular backbone vertex, with Lemma 3.8, which concerns only deeply visited big leaves.

Lemma 3.12

Let \(\alpha \in (1,2]\). For any \(i\ge 0\),

$$\begin{aligned} \mathbf{P}\left(\max _{j=1,\dots ,\tilde{Z}_i-1}h(\mathcal T _{ij})\ge x\right)\sim \alpha q_x^{\alpha -1}L\left(q_{x}\right), \end{aligned}$$

as \(x\rightarrow \infty \).

Proof

Conditioning on \(\tilde{Z}_i\), we obtain

$$\begin{aligned} \mathbf{P}\left(\max _{j=1,\dots ,\tilde{Z}_i-1}h(\mathcal T _{ij})< x\right)&= \mathbf{E}\left(\mathbf{P}\left(h(\mathcal T )< x\right)^{\tilde{Z}-1}\right)\\&= \mathbf{E}\left(Z\left(1-q_x\right)^{{Z}-1}\right)\\&= f^{\prime }\left(1-q_x\right), \end{aligned}$$

where we have once again applied the size-biasing of (3.1) to obtain the second equality. Since we know from the proof of Lemma 3.1 that \(f^{\prime }(1-x)\sim 1-\alpha x^{\alpha -1}L(x)\) as \(x\rightarrow 0^+\), the proof is complete. \(\square \)

In studying the quenched expectation of hitting times, we no longer need an argument that is so sophisticated as to consider the time spent in the individual leaves \(\mathcal T _{ij}\) (which were defined after (3.1)). Instead, we will be concerned only with understanding the expected length of time the biased random walk \(X\) spends inside sets of the form \(\mathcal T _{i}=\{\rho _i\}\cup (\cup _{j=1,\dots ,\tilde{Z}_i-1}\mathcal T _{ij})\). To this end, we introduce a stopping time

$$\begin{aligned} \sigma _i:=\inf \{n\ge 0:X_n\not \in \mathcal T _{i}\}. \end{aligned}$$

The expected time spent by \(X\) inside \(\mathcal T _{i}\) on a single visit is thus given by \(\mathbf{E}_{\rho _i}^\mathcal{T ^*}\sigma _{i}\). Similarly to (3.2), we have that

$$\begin{aligned} \mathbf{E}_{\rho _i}^\mathcal{T ^*}\sigma _{i}=\left\{ \begin{array}{ll} 1+\frac{1}{\beta ^{i-1}(1+\beta )}\sum _{{x,y\in \mathcal T _{i},x\sim y}}c(x,y),&\quad \text{ if}\,i\ge 1, \\ 1+\sum \nolimits _{{x,y\in \mathcal T _{0},x\sim y}}c(x,y),&\quad \text{ if}\,i=0, \end{array}\right. \end{aligned}$$
(3.21)

and this allows us to obtain the following distributional asymptotics.

Lemma 3.13

Let \(\alpha \in (1,2]\). For any \(i\ge 0\),

$$\begin{aligned} \mathbf{P}\left(\mathbf{E}_{\rho _i}^\mathcal{T ^*}\sigma _i \ge x\right)\sim \frac{\alpha \ln \beta }{(\alpha -1)\ln x}, \end{aligned}$$

as \(x\rightarrow \infty \).

Proof

If \(i\ge 1\), then from (3.21) we are easily able to deduce that

$$\begin{aligned} \frac{\beta ^{h(\mathcal T _i)}}{1+\beta }\le \mathbf{E}_{\rho _i}^\mathcal{T ^*}\sigma _i\le 1+\frac{2\beta ^{1+h(\mathcal T _i)}\#\mathcal T _i}{1+\beta }, \end{aligned}$$
(3.22)

where \(h(\mathcal T _i)=\mathbf{1}_{\{\tilde{Z}_i> 1\}}+\max _{j=1,\dots ,\tilde{Z}_i-1}h(\mathcal T _{ij})\) is the height of \(\mathcal T _{i}\). Hence, for any \(\eta \in (0,1)\),

$$\begin{aligned}&\mathbf{P}\left(\mathbf{E}_{\rho _i}^\mathcal{T ^*}\sigma _i \ge x\right)\\&\quad \le \mathbf{P}\left(\beta ^{\max \nolimits _{j=1,\dots ,\tilde{Z}_i-1}h(\mathcal T _{ij})}\ge (x-1)^{1-\eta }\right) +\mathbf{P}\left(2\beta ^2\#\mathcal T _{i}\ge (\beta +1) (x-1)^\eta \right)\\&\quad \le \mathbf{P}\left(\max \nolimits _{j=1,\dots ,\tilde{Z}_i-1}h(\mathcal T _{ij})\ge \frac{(1-\eta )\ln (x-1)}{\ln \beta }\right)+cx^{-\delta \eta },\\&\quad \sim \frac{\alpha \ln \beta }{(\alpha -1)(1-\eta )\ln x}, \end{aligned}$$

where we have applied (3.9) to deduce the second inequality for suitable constants \(c\) and \(\delta >0\), and Lemma 3.12 and (1.6) to obtain the asymptotic equivalence. Since (3.22) in conjunction with Lemma 3.12 and (1.6) also implies that

$$\begin{aligned} \mathbf{P}\left(\mathbf{E}_{\rho _i}^\mathcal{T ^*}\sigma _i \ge x\right)\ge \mathbf{P}\left(\max _{j=1,\dots ,\tilde{Z}_i-1}h(\mathcal T _{ij})\ge \frac{\ln x +\ln (1+\beta )}{\ln \beta }\right)\sim \frac{\alpha \ln \beta }{(\alpha -1)\ln x}, \end{aligned}$$

the result follows in this case. The argument for \(i=0\) is similar. \(\square \)

We are now ready to prove the main result of this section.

Theorem 3.14

Let \(\alpha \in (1,2]\). As \(n\rightarrow \infty \), the laws of the processes

$$\begin{aligned} \left(\frac{(\alpha -1)\ln _+ E^\mathcal{T ^*}_\rho \Delta _{nt}}{n \alpha \ln \beta }\right)_{t\ge 0} \end{aligned}$$

under \(\mathbf{P}\) converge weakly with respect to the Skorohod \(J_1\) topology on \(D([0,\infty ),\mathbb R )\) to the law of \((m(t))_{t\ge 0}\).

Proof

The embedded random walk \(Y\) on the backbone visits each site \(\rho _i,\,i\ge 1\), a geometric number of times with parameter \((\beta -1)/(\beta +1)\) in total, and visits \(\rho =\rho _0\) a geometric number of times with parameter \((\beta -1)/\beta \). Moreover, before visiting \(\rho _n,\,Y\) has to visit each element of \(\{\rho _0,\dots ,\rho _{n-1}\}\) at least once. This and the definition of \(\Delta _n\) imply that

$$\begin{aligned} \sum _{i=0}^{n-1}\mathbf{E}_{\rho _i}^\mathcal{T ^*}\sigma _i\le \mathbf{E}_\rho ^\mathcal{T ^*}\Delta _n\le \frac{\beta +1}{\beta -1}\sum _{i=0}^{n-1}\mathbf{E}_{\rho _i}^\mathcal{T ^*}\sigma _i. \end{aligned}$$
(3.23)

Now, the random variables \(\mathbf{E}_{\rho _i}^\mathcal{T ^*}\sigma _i\) in these sums are independent and have slowly varying tails, as described by Lemma 3.13. Thus the result is a simple consequence of [23, Theorem 2.1] (or Theorem 5.1 below). \(\square \)

Remark 3.15

For comparison, recall the directed trap model of Sect. 2, but, so as to avoid having to consider the time that the biased random walk \(X\) spends at negative integers, replace \(\mathbb Z \) by the half-line \(\mathbb Z _+\). As in Theorem 1.5, we have that \((n^{-1}L(\Delta _{nt}))_{t\ge 0}\) converges in distribution under the annealed law \(\mathbb P _0\) to \((m(t))_{t\ge 0}\). For the corresponding quenched expectation, similarly to (3.23), we have that \(\sum _{i=0}^{n-1}\tau _i\le E^\tau _0(\Delta _n)\le \frac{\beta +1}{\beta -1}\sum _{i=0}^{n-1}\tau _i\). Thus, again applying [23, Theorem 2.1] (or Theorem 5.1 below), it is possible to check that \((n^{-1}L(E^\tau _0(\Delta _{nt})))_{t\ge 0}\) converges in distribution under \(\mathbf{P}\) to \((m(t))_{t\ge 0}\). In particular, in contrast to the critical Galton–Watson tree case, the asymptotic behaviours of \(E^\tau _0(\Delta _n)\) and \(\Delta _n\) are identical. This is because, although certain big leaves will be avoided by certain realisations of the biased random walker in the tree setting, the geometry of the graph \(\mathbb Z _+\) forces \(X\), when travelling from \(0\) to \(n\), to visit all the traps in between on every realisation.

4 Extremal aging

In this section, we will prove Theorem 1.4 and Theorem 1.8, which state that the biased random walk on the critical Galton–Watson tree conditioned to survive and the one-dimensional trap model, respectively, experience extremal aging. The phenomenon we describe for these models is similar to what happens in the trapping models considered by Onur Gun in his PhD thesis [20] and to results observed for spin glasses in [7].

4.1 Extremal aging for the one-dimensional trap model

We start by considering the one-dimensional trap model introduced in Sect. 1.2, with the goal of this section being to prove Theorem 1.8. The reason for proving this result before its counterpart for trees is that the simpler argument it requires will be instructive when it comes to tackling the more challenging tree case in the subsequent section.

Key to proving Theorem 1.8 is establishing that \(X\) localises at the closest trap to 0 of a sufficient depth. To describe this precisely, as we do in Lemma 4.2 below, we first introduce the notation

$$\begin{aligned} l(u):=\min \left\{ x\ge 0:\,\tau _x\ge \bar{F}^{-1}\left(\frac{1}{u}\right)\right\} . \end{aligned}$$

From the independent and identically-distributed nature of the environment, we readily deduce the following preliminary lemma.

Lemma 4.1

For any \(0<a<b\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbf{P}\left(l(an)=l(bn)\right)=\frac{a}{b}. \end{aligned}$$
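The computation behind Lemma 4.1 is short, and we sketch it here for completeness, assuming for simplicity that \(\bar{F}\) is continuous and strictly decreasing, so that \(\bar{F}(\bar{F}^{-1}(y))=y\). Write \(t_a:=\bar{F}^{-1}(1/an)\) and \(t_b:=\bar{F}^{-1}(1/bn)\), and note that \(t_a\le t_b\), so that \(l(an)=l(bn)\) precisely when the first site \(x\) with \(\tau _x\ge t_a\) in fact satisfies \(\tau _x\ge t_b\). By the independence of the \(\tau _x\),

$$\begin{aligned} \mathbf{P}\left(l(an)=l(bn)\right)=\sum _{m\ge 0}\mathbf{P}\left(\tau _0< t_a\right)^{m}\mathbf{P}\left(\tau _0\ge t_b\right)=\frac{\mathbf{P}\left(\tau _0\ge t_b\right)}{\mathbf{P}\left(\tau _0\ge t_a\right)}=\frac{1/bn}{1/an}=\frac{a}{b}; \end{aligned}$$

without the simplifying assumption on \(\bar{F}\), the same argument gives the limit claimed in the lemma.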

We now establish the relevant localisation result for \(X\).

Lemma 4.2

For any \(a>0\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb P _0\left(X_{\bar{F}^{-1}(1/an)}=l(an)\right)=1. \end{aligned}$$

Proof

Our first aim is to show that \(X\) hits \(l(an)\) before time \({\bar{F}^{-1}(1/an)}\) with high probability. Clearly, for any \(T>0\), we have that

$$\begin{aligned} \mathbf{P}\left(l(an)>nT\right)&= \mathbf{P}\left(\tau _x<\bar{F}^{-1}\left(\frac{1}{an}\right):\,x=0,1,\dots ,nT\right)\\&= \left(1-\frac{1}{an}\right)^{nT+1}\rightarrow e^{-T/a}, \end{aligned}$$

as \(n\rightarrow \infty \). Moreover, by Lemma 4.1, for \(\varepsilon \in (0,1)\) it holds that

$$\begin{aligned}&\mathbf{P}\left(\tau _x\ge \bar{F}^{-1}\left(\frac{1}{an(1-\varepsilon )}\right) \text{ for} \text{ some}\,x=0,\dots ,l(an)-1\right)\\&\quad =\mathbf{P}\left(l(an)\ne l(an(1-\varepsilon ))\right)\rightarrow \varepsilon , \end{aligned}$$

as \(n\rightarrow \infty \). Recalling the notation \(\tilde{T}_x\) introduced at (2.5) and applying these two results in conjunction with the bound at (2.8) yields

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\mathbb P _0\left(\Delta _{l(an)}> \sum _{x=1}^{nT}\tilde{T}_x\mathbf{1}_{\{\tau _x\le \bar{F}^{-1}(1/an(1-\varepsilon ))\}}+\bar{F}^{-1}\left(n^{-1}(\ln n)^{1/2}\right)\right)\\&\quad \le \varepsilon +e^{-T/a}. \end{aligned}$$

We know that \(\bar{F}^{-1}(n^{-1}(\ln n)^{1/2})\le \bar{F}^{-1}(1/\varepsilon n)\) for large enough \(n\); Markov’s inequality thus implies

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\mathbb P _0\left(\Delta _{l(an)}>\bar{F}^{-1}\left(\frac{1}{an}\right)\right)\\&\quad \le \limsup _{n\rightarrow \infty }\frac{1}{\bar{F}^{-1}(1/an)}\left[ \mathbb E _0\left(\sum _{x=1}^{nT}\tilde{T}_x\mathbf{1}_{\{\tau _x\le \bar{F}^{-1}(1/an(1-\varepsilon ))\}}\right)+ \bar{F}^{-1}\left(\frac{1}{\varepsilon n}\right)\right]\\&\qquad +\varepsilon + e^{-T/a}\\&\quad \le \limsup _{n\rightarrow \infty }\frac{1}{\bar{F}^{-1}(1/an)}\left[c_{\beta }nT \mathbf{E}\left(\tau _0\mathbf{1}_{\{\tau _0\le \bar{F}^{-1}(1/an(1-\varepsilon ))\}}\right)+ \bar{F}^{-1}\left(\frac{1}{\varepsilon n}\right)\right]\\&\qquad +\,\varepsilon \,+ e^{-T/a}, \end{aligned}$$

where \(c_\beta \) is a constant depending only on \(\beta \). By proceeding as in the proof of Lemma 2.3 with \(g(n)\) replaced by \(\bar{F}^{-1}(1/an(1-\varepsilon ))\), it is possible to check that

$$\begin{aligned} \mathbf{E}\left(\tau _0\mathbf{1}_{\{\tau _0\le \bar{F}^{-1}(1/an(1-\varepsilon ))\}}\right)\le \frac{c_1 \bar{F}^{-1}(1/an(1-\varepsilon ))}{an(1-\varepsilon )}, \end{aligned}$$

and so

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\mathbb P _0\left(\Delta _{l(an)}>\bar{F}^{-1}\left(\frac{1}{an}\right)\right)\\&\quad \le \limsup _{n\rightarrow \infty }\frac{c_2T\bar{F}^{-1}(1/an(1-\varepsilon ))+ a\bar{F}^{-1}(1/\varepsilon n)}{a(1-\varepsilon )\bar{F}^{-1}(1/an)}+\varepsilon + e^{-T/a}. \end{aligned}$$

Were \(\limsup _{n\rightarrow \infty } \bar{F}^{-1}(1/an(1-\varepsilon ))/\bar{F}^{-1}(1/an)>0\), then there would exist a subsequence \((n_i)_{i\ge 0}\) and constant \(c>0\) such that \(\bar{F}^{-1}(1/an_i(1-\varepsilon ))\ge c \bar{F}^{-1}(1/an_i)\). Applying the decreasing function \(\bar{F}\) to both sides and then the slowly-varying property (1.1) yields that \(a\le a(1-\varepsilon )\), which is clearly a contradiction. Hence \(\lim _{n\rightarrow \infty } \bar{F}^{-1}(1/an(1-\varepsilon ))/\bar{F}^{-1}(1/an)=0\). Similarly, one has that \(\lim _{n\rightarrow \infty } \bar{F}^{-1}(1/\varepsilon n)/\bar{F}^{-1}(1/an)=0\) for any \(\varepsilon <a\). Thus, letting \(T\rightarrow \infty \) and \(\varepsilon \rightarrow 0\), the above estimate yields

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb P _0\left(\Delta _{l(an)}>\bar{F}^{-1}\left(\frac{1}{an}\right)\right)=0, \end{aligned}$$

as desired.

Now, if \(X_{\bar{F}^{-1}(1/an)}\ne l(an)\), then either \(X\) does not hit \(l(an)\) before time \(\bar{F}^{-1}(1/an)\), or it does hit \(l(an)\) and spends less time than \(\bar{F}^{-1}(1/an)\) there before moving to any other vertex. By the conclusion of the previous paragraph, the former event has probability 0 asymptotically, and so

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\mathbb P _0\left(X_{\bar{F}^{-1}(1/an)}\ne l(an)\right)\\&\quad \le \limsup _{n\rightarrow \infty }\mathbf{E}\left({P}^\tau _{l(an)} \left(\inf \{t:X_t\ne l(an)\}\le \bar{F}^{-1}(1/an)\right)\right). \end{aligned}$$

Since \(\inf \{t:X_t\ne l(an)\}\) is exponential with mean \(\tau _{l(an)}\) under \({P}^\tau _{l(an)}\), for any \(\varepsilon >0\) the right-hand side here is bounded above by

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\mathbf{E}\left(1\wedge \frac{\bar{F}^{-1}(1/an)}{\tau _{l(an)}}\right)\\&\quad \le \limsup _{n\rightarrow \infty }\left[ \mathbf{P}\left(\tau _{l(an)}<\bar{F}^{-1}(1/an(1+\varepsilon )) \right)+ \frac{\bar{F}^{-1}(1/an)}{{\bar{F}^{-1}(1/an(1+\varepsilon ))}}\right]. \end{aligned}$$

The probability in the previous expression is equal to \(\mathbf{P}(l(an)\ne l(an(1+\varepsilon )))\), and, by Lemma 4.1, this is asymptotically bounded above by \(\varepsilon \). Similarly to an observation made in the previous paragraph, we also have that \(\lim _{n\rightarrow \infty } \bar{F}^{-1}(1/an)/\bar{F}^{-1}(1/an(1+\varepsilon ))=0\), and thus we have established

$$\begin{aligned} \limsup _{n\rightarrow \infty }\mathbb P _0\left(X_{\bar{F}^{-1}(1/an)}\ne l(an)\right)\le \varepsilon . \end{aligned}$$

Since \(\varepsilon \) was arbitrary, this completes the proof. \(\square \)

Combining Lemmas 4.1 and 4.2, we readily obtain Theorem 1.8.

4.2 Extremal aging for the critical Galton–Watson tree model

We now return to the setting of Sect. 1.1, so as to prove Theorem 1.4. Similarly to the strategy of the previous section, we will show that the biased random walk on a critical Galton–Watson tree localises in the first suitably big leaf it visits deeply. To describe this, we introduce the notation:

$$\begin{aligned} l(x):=\min \left\{ i\ge 0:\,\max _{j\in V_i}h(\mathcal T _{ij})\ge x/\ln \beta \right\} \!. \end{aligned}$$

Whilst the form of the following lemma is similar to that of Lemma 4.1, we note that its proof is more involved. This is because, unlike the holding time means \(\tau _x\) used to define \(l\) there, the random variables \(\max _{j\in V_i}h(\mathcal T _{ij})\) are not environment measurable or independent.

Lemma 4.3

Let \(\alpha \in (1,2]\). For any \(0<a<b\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb P _{\rho }\left(l(an)=l(bn)\right)=\frac{a}{b}. \end{aligned}$$

Proof

First, define

$$\begin{aligned} \tilde{V}_i:=\left\{ j\in B_i:\,\tau _{x_{ij}}<\Delta _{i,(\ln n)^{(1+\gamma )}}\right\} \!, \end{aligned}$$

to be the set of big leaves visited by \(X\) before the stopping time \(\Delta _{i,(\ln n)^{(1+\gamma )}}\) that was introduced at (3.20). Set \(\tilde{H}_i:=\max _{j\in \tilde{V}_i}h(\mathcal T _{ij})\) if \(\tilde{V}_i\ne \emptyset \), and \(\tilde{H}_i=0\) otherwise; observe that if \(\tilde{H}_i>0\), then it is necessarily also the case that \(\tilde{H}_i\ge h_n\). Moreover, for \(T\in (0,\infty )\) and \(\varepsilon \in (0,1)\), let

$$\begin{aligned} \mathcal E _1(n):=\left\{ \sum _{i=m}^{m+n^\varepsilon }\mathbf{1}_{\{N_n(i)\ge 1\}}\le 1:\,m=0,1,\dots ,Tn-n^\varepsilon \right\} \cap \left\{ \sum _{i=0}^{n^\varepsilon }\mathbf{1}_{\{N_n(i)\ge 1\}}=0\right\} \!. \end{aligned}$$

By proceeding as in the proof of Lemma 2.5, it is possible to show that, under \(\mathbb P _\rho \), the random variables \((\tilde{H}_i,N_n(i))_{i=0}^{nT}\) conditional on \(\mathcal E _1(n)\) have the same joint distribution as \((\tilde{H}^{\prime }_i,N^{\prime }_n(i))_{i=0}^{nT}\) conditional on \(\mathcal E _1^{\prime }(n)\), where \((\tilde{H}^{\prime }_i,N^{\prime }_n(i))_{i\ge 0}\) are independent copies of the pair of random variables \((\tilde{H}_{1+(\ln n)^{1+\gamma }},N_n(1+(\ln n)^{1+\gamma }))\) and \(\mathcal E _1^{\prime }(n)\) is defined analogously to \(\mathcal E _1(n)\) with the \(N_n(i)\)s replaced by \(N_n^{\prime }(i)\)s. Consequently, if we set

$$\begin{aligned} \tilde{l}(x):=\min \left\{ i\ge 0:\,\tilde{H}_i\ge x/\ln \beta \right\} \!, \end{aligned}$$

and define \(\tilde{l}^{\prime }(x)\) similarly from the random variables \(\tilde{H}_i^{\prime }\), then

$$\begin{aligned}&\left|\mathbb P _\rho \left(\tilde{l}(an)= \tilde{l}(bn)\right)- \mathbb P _\rho \left(\tilde{l}^{\prime }(an)= \tilde{l}^{\prime }(bn)\right)\right|\nonumber \\&\quad \le \left|\mathbb P _\rho \left(\tilde{l}(an)= \tilde{l}(bn),\,\tilde{l}(an)\le nT,\,\mathcal E _1(n)\right)\right.\nonumber \\&\left.\qquad - \mathbb P _\rho \left(\tilde{l}^{\prime }(an)= \tilde{l}^{\prime }(bn),\,\tilde{l}^{\prime }(an)\le nT,\,\mathcal E _1^{\prime }(n)\right)\right|\nonumber \\&\qquad +\left|\mathbb P _\rho \left(\tilde{l}(an)> nT,\,\mathcal E _1(n)\right)-\mathbb P _\rho \left(\tilde{l}^{\prime }(an)> nT,\,\mathcal E _1^{\prime }(n)\right)\right|\nonumber \\&\qquad +2\mathbb P _\rho \left(\tilde{l}^{\prime }(an)> nT,\,\mathcal E _1^{\prime }(n)\right)+2\mathbb P _\rho (\mathcal E _1(n)^c)\nonumber \\&\quad \le 2\mathbb P _\rho \left(\tilde{l}^{\prime }(an)> nT\right)+2\mathbf{P}(\mathcal E _1(n)^c), \end{aligned}$$
(4.1)

where we have applied the fact that \(\{\tilde{l}(an)= \tilde{l}(bn),\,\tilde{l}(an)\le nT\}\) and \(\{\tilde{l}(an)> nT\}\) are both events measurable with respect to \((\tilde{H}_i)_{i=0}^{nT}\). Now, similarly to the observation made in the proof of Lemma 3.11, if the process \(X\) does not hit \(\rho _{i-1-(\ln n)^{1+\gamma }}\) again after having hit \(\rho _i\), and does not hit \(\rho _{i}\) again after having hit \(\rho _{i+1+(\ln n)^{1+\gamma }}\) (an event which has probability \(1-o(n^{-2})\), uniformly in \(i\)), then \(\tilde{H}_i\) is equal to \(\max _{j\in V_i}h(\mathcal T _{ij})\). Hence, applying (1.6) and Lemma 3.8, we obtain that, for any \(x,\varepsilon >0\),

$$\begin{aligned} \left|\mathbb P _\rho \left(\tilde{H}_i\ge xn\right)- \frac{1}{(\alpha -1)xn}\right|\le \frac{\varepsilon }{n} \end{aligned}$$

for large \(n\) (uniformly in \(i\)), and clearly the same bound holds when \(\tilde{H}_i\) is replaced by \(\tilde{H}^{\prime }_i\). Applying the independence of the random variables \((\tilde{H}^{\prime }_i)_{i\ge 0}\), it follows that

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb P _\rho \left(\tilde{l}^{\prime }(an)= \tilde{l}^{\prime }(bn)\right)=\frac{a}{b}, \end{aligned}$$

and also

$$\begin{aligned} \mathbb P _\rho \left(\tilde{l}^{\prime }(an)> nT\right)&= \mathbb P _\rho \left(\tilde{H}_{1+(\ln n)^{1+\gamma }}<\frac{an}{\ln \beta } \right)^{nT+1}\\&\le \left(1-\frac{\ln \beta -\varepsilon a}{(\alpha -1)an}\right)^{nT+1}\\&\sim e^{- T(\ln \beta -\varepsilon a)/(\alpha -1)a}. \end{aligned}$$
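The ratio \(a/b\) appearing in the penultimate display arises from the following brief computation (a sketch): writing \(p_{x,n}:=\mathbb P _\rho (\tilde{H}^{\prime }_0\ge xn/\ln \beta )\), and noting that, since \(a<b\), the event \(\{\tilde{l}^{\prime }(an)= \tilde{l}^{\prime }(bn)\}\) occurs precisely when the first index \(i\) with \(\tilde{H}^{\prime }_i\ge an/\ln \beta \) in fact satisfies \(\tilde{H}^{\prime }_i\ge bn/\ln \beta \), the independence of the \(\tilde{H}^{\prime }_i\) yields

$$\begin{aligned} \mathbb P _\rho \left(\tilde{l}^{\prime }(an)= \tilde{l}^{\prime }(bn)\right)=\sum _{i\ge 0}\left(1-p_{a,n}\right)^{i}p_{b,n}=\frac{p_{b,n}}{p_{a,n}}, \end{aligned}$$

and, since the tail estimate of the previous paragraph shows that \(np_{x,n}\) can be made arbitrarily close to \(\ln \beta /((\alpha -1)x)\) for \(n\) large, the right-hand side converges to \(a/b\).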

Combining these results with Lemma 3.2, which implies that \(\mathbf{P}(\mathcal E _1(n)^c)\rightarrow 0\), and the estimate at (4.1), then letting \(T\rightarrow \infty \), yields

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb P _{\rho }\left(\tilde{l}(an)=\tilde{l}(bn)\right)=\frac{a}{b}. \end{aligned}$$
(4.2)

Now, suppose that \(\mathcal E _2(n)\) is the event that the embedded random walk \(Y\) on the backbone does not backtrack more than \((\ln n)^{1+\gamma }\) before hitting \(\rho _{n(T+1)}\); by Lemma 3.3, \(\mathbb P _{\rho }(\mathcal E _2(n))\rightarrow 1\). Moreover, on the event \(\mathcal E _2(n)\), we have that \(\tilde{H}_i=\max _{j\in V_i}h(\mathcal T _{ij})\) for \(i\le n(T+1)-1-h_n^\delta \). In particular, for large enough \(n\), if \(\mathcal E _2(n)\) holds and also \(\tilde{l}(an)\le nT\), then it must be the case that \(l(an)=\tilde{l}(an)\). Hence, for large \(n\),

$$\begin{aligned} \mathbb P _\rho \left(l(an)\ne \tilde{l}(an)\right) \le \mathbb P _\rho \left(\tilde{l}(an)> nT\right)+\mathbb P _\rho \left(\mathcal E _2(n)^c\right)\!. \end{aligned}$$

Similarly to above, we have that the first term here can be bounded above by

$$\begin{aligned}&\left|\mathbb P _\rho \left(\tilde{l}(an)> nT,\,\mathcal E _1(n)\right)-\mathbb P _\rho \left(\tilde{l}^{\prime }(an)> nT,\,\mathcal E _1^{\prime }(n)\right)\right|\\&\quad +\mathbb P _\rho \left(\tilde{l}^{\prime }(an)> nT\right)+\mathbf{P}(\mathcal E _1(n)^c), \end{aligned}$$

the limsup as \(n\rightarrow \infty \) of which can be made arbitrarily small by choosing \(T\) suitably large. Hence

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb P _\rho \left(l(an)\ne \tilde{l}(an)\right)=0. \end{aligned}$$

The lemma follows by applying this in conjunction with (4.2). \(\square \)

Before proceeding to prove the analogue of Lemma 4.2 in the tree setting (see Lemma 4.5 below), we prove a preliminary estimate which rules out the possibility that any of the relevant leaves has a height close to a given level on the appropriate scale.

Lemma 4.4

Let \(\alpha \in (1,2]\). For any \(a, T\in (0,\infty )\),

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty }\mathbf{P}\left(\min _{i=0,1,\dots ,nT}\min _{j=1,\dots ,\tilde{Z}_i-1}\left|h(\mathcal T _{ij}) -an\right|\le a n\varepsilon \right)=0. \end{aligned}$$

Proof

First observe that

$$\begin{aligned}&\mathbf{P}\left(\min _{j=1,\dots ,\tilde{Z}_i-1}\left|h(\mathcal T _{ij}) -an\right|\le a n\varepsilon \right) =1-\mathbf{E}\left(\left(1-q_{an(1-\varepsilon )}+q_{an(1+\varepsilon )}\right)^{\tilde{Z}-1}\right)\\&\quad =1-f^{\prime }\left(1-q_{an(1-\varepsilon )}+q_{an(1+\varepsilon )}\right)\\&\quad \sim \alpha \left(q_{an(1-\varepsilon )}-q_{an(1+\varepsilon )}\right)^{\alpha -1}L\left(q_{an(1-\varepsilon )}-q_{an(1+\varepsilon )}\right)\!, \end{aligned}$$

where we again apply [30, (2.1)] to deduce the asymptotic equality. Now, from (1.6), one can check that

$$\begin{aligned} q_{an(1-\varepsilon )}-q_{an(1+\varepsilon )}\sim \frac{2\varepsilon q_{an}}{\alpha -1}, \end{aligned}$$

which yields

$$\begin{aligned} \mathbf{P}\left(\min _{j=1,\dots ,\tilde{Z}_i-1}\left|h(\mathcal T _{ij}) -an\right|\le a n\varepsilon \right)\sim \alpha \left(\frac{2\varepsilon q_{an}}{\alpha -1}\right)^{\alpha -1}L\left(\frac{2\varepsilon q_{an}}{\alpha -1}\right)\sim \frac{c_\alpha \varepsilon ^{\alpha -1}}{an}, \end{aligned}$$

where \(c_\alpha \) is a constant depending only on \(\alpha \). The lemma readily follows by applying a union bound over \(i=0,1,\dots ,nT\) and recalling that \(\alpha >1\), so that \(\varepsilon ^{\alpha -1}\rightarrow 0\) as \(\varepsilon \rightarrow 0\). \(\square \)
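We remark that the asymptotic relation for \(q_{an(1-\varepsilon )}-q_{an(1+\varepsilon )}\) used in the above proof can be understood as follows (a sketch, assuming that (1.6) yields the regular variation \(q_{\lambda x}/q_{x}\rightarrow \lambda ^{-1/(\alpha -1)}\) as \(x\rightarrow \infty \) for each fixed \(\lambda >0\)): under this assumption,

$$\begin{aligned} q_{an(1-\varepsilon )}-q_{an(1+\varepsilon )}\sim q_{an}\left((1-\varepsilon )^{-\frac{1}{\alpha -1}}-(1+\varepsilon )^{-\frac{1}{\alpha -1}}\right)=q_{an}\left(\frac{2\varepsilon }{\alpha -1}+O(\varepsilon ^2)\right)\!, \end{aligned}$$

which agrees with the expression applied above once the higher-order terms in \(\varepsilon \) are neglected.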

Lemma 4.5

Let \(\alpha \in (1,2]\). For any \(a>0\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb P _\rho \left(\pi (X_{e^{an}})=\rho _{l(an)}\right)=1. \end{aligned}$$

Proof

Fix \(\varepsilon >0\), and let \(i_0,j_0\) be indices such that \(x_{i_0j_0}\) is the first entrance to a big leaf with height greater than or equal to \(an(1+\varepsilon )/\ln \beta \) visited by \(X\) (the relevant terminology was introduced just above (3.10)). If \(i_1:=l(an(1+\varepsilon ))\le nT\), if \(j_1\in V_{i_1}\) is such that \(h(\mathcal T _{i_1j_1})\ge an(1+\varepsilon )/\ln \beta \), and if \(n\) is suitably large, then it must hold that

$$\begin{aligned} \tau _{x_{i_0j_0}}\le \tau _{x_{i_1j_1}}< \tau _{z_{i_1}}\le \Delta _{n(T+1)}, \end{aligned}$$

where we recall that \(z_i:=\rho _{i+1+h_n^\delta }\) and note that the second inequality follows from the definition of \(V_{i}\). In particular, for large \(n\),

$$\begin{aligned}&\mathbb P _\rho \left(\tau _{x_{i_0j_0}}> \Delta _{n(T+1)}\right) \le \mathbb P _\rho \left(l(an(1+\varepsilon ))>nT\right)\\&\quad \le \mathbb P _\rho \left(l(an(1+\varepsilon ))\ne \tilde{l}(an(1+\varepsilon ))\right)+\mathbb P _\rho \left(\tilde{l}(an(1+\varepsilon ))>nT\right)\!, \end{aligned}$$

where \(\tilde{l}(an)\) was defined in the proof of Lemma 4.3, and the upper bound here converges to 0 as \(n\) and then \(T\) tend to infinity. Consequently, by applying Lemmas 3.3, 3.4, 3.6 similarly to the proof of Theorem 1.1, as well as Lemma 4.4, we obtain that if

$$\begin{aligned} \Theta :=\sum _{i=0}^{n(T+1)-1}\sum _{j= 1}^{\tilde{Z}_i-1}\mathbf{1}_{\{h_n\le h(\mathcal T _{ij})\le an(1-\varepsilon )/\ln \beta \}}\sum _{m=\Delta _i}^{\Delta _{i,(\ln n)^{1+\gamma }}}\mathbf{1}_{\{X_m\in \mathcal T _{ij}(x_{ij})\}}, \end{aligned}$$

then

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\limsup _{T\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb P _\rho \left(\tau _{x_{i_0j_0}}>\Theta +2\beta ^{2h_n}\right)=0. \end{aligned}$$

Now, by proceeding similarly to the proof of Lemma 3.4, we have that

$$\begin{aligned} E_{\rho }^\mathcal{T ^*}\Theta&\le \frac{\beta }{\beta -1} \sum _{i=0}^{n(T+1)-1} \tilde{Z}_i\left(1+2e^{an(1-\varepsilon )}\max _{j=1,\dots ,\tilde{Z}_i-1}{\#\mathcal T _{ij}}\right)\,\\&\le \frac{2\beta n(T+1)e^{an(1-\varepsilon )}}{\beta -1}\max _{i=0,\dots ,n(T+1)-1}\tilde{Z}_i\left[1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right]. \end{aligned}$$

Combining these observations yields

$$\begin{aligned}&\limsup _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty }\mathbb P _\rho \left(\tau _{x_{i_0j_0}}>e^{an}\right)\\&\quad \le \limsup _{\varepsilon \rightarrow 0}\limsup _{T\rightarrow \infty } \limsup _{n\rightarrow \infty }\mathbb P _\rho \left(\Theta >2^{-1}e^{an}\right)\\&\quad \le \limsup _{\varepsilon \rightarrow 0}\limsup _{T\rightarrow \infty } \limsup _{n\rightarrow \infty }\left[\mathbb P _\rho \left(\Theta >n E_{\rho }^\mathcal{T ^*}\Theta \right)+\mathbb P _\rho \left(E_{\rho }^\mathcal{T ^*}\Theta >2^{-1}n^{-1}e^{an}\right)\right]\\&\quad \le \limsup _{\varepsilon \rightarrow 0}\limsup _{T\rightarrow \infty } \limsup _{n\rightarrow \infty }\mathbf{P}\\&\qquad \times \left(\max _{i=0,\dots ,n(T+1)-1}\tilde{Z}_i\left[1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right]>\frac{(\beta -1)e^{an\varepsilon }}{4\beta n^2(T+1)}\right)\\&\quad \le \limsup _{\varepsilon \rightarrow 0} \limsup _{T\rightarrow \infty } \limsup _{n\rightarrow \infty }n(T+1)\mathbf{P}\nonumber \\&\qquad \times \left(\tilde{Z}_i\left[1+\max _{j=1,\dots ,\tilde{Z}_i-1}\#\mathcal T _{ij}\right]>\frac{(\beta -1)e^{an\varepsilon }}{4\beta n^2(T+1)}\right). \end{aligned}$$

Applying (3.9), and noting that the conditional form of Markov's inequality implies \(\mathbb P _\rho (\Theta >n E_{\rho }^\mathcal{T ^*}\Theta )\le n^{-1}\rightarrow 0\) (which justifies discarding this term in the third inequality above), we thus obtain

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty }\mathbb P _\rho \left(\tau _{x_{i_0j_0}}>e^{an}\right)=0. \end{aligned}$$
(4.3)

We now check that \(i_0\), as defined in the previous paragraph, is, with high probability, equal to \(l(an)\). We continue to use the notation \(i_1=l(an(1+\varepsilon ))\). Firstly, observe that if \(i_0<i_1\), then the definition of \(l(an(1+\varepsilon ))\) implies that \(j_0\not \in V_{i_0}\), i.e. \(\tau _{z_{i_0}}\le \tau _{x_{i_0j_0}}\). Hence the process \(X\) must backtrack a distance \(1+h_n^\delta \) along the backbone before hitting \(x_{i_0j_0}\). This implies that, for any \(T\in (0,\infty )\)

$$\begin{aligned} \mathbb P _\rho \left(i_0<i_1\right)&\le \mathbb P _\rho \left(\tau _{x_{i_0j_0}}>\Delta _{n(T+1)}\right)\\&\quad + \mathbb P _\rho \left(\min _{0\le i<j\le \Delta ^Y_{n(T+1)}}\left(d_\mathcal{T ^*}(\rho ,Y_j)-d_\mathcal{T ^*}(\rho ,Y_i)\right)\le -h_n^\delta \right), \end{aligned}$$

where \(Y\) is the jump process on the backbone introduced in Sect. 3.2. By Lemma 3.3, the second term here vanishes as \(n\) tends to infinity, and it was already noted above that the first term converges to 0 as \(n\) and then \(T\) tend to infinity. Secondly, suppose \(i_0>i_1\). Since by construction \(\tau _{x_{i_0j_0}}<\tau _{x_{i_1j_1}}<\tau _{z_{i_1}}\) for any \(j_1\in V_{i_1}\) with \(h(\mathcal T _{i_1j_1})\ge an(1+\varepsilon )/\ln \beta \) (and such a \(j_1\) must necessarily exist), we must also have that \(i_0<i_1+1+h_n^\delta \). In particular, in this situation, there exist two distinct backbone vertices from which big traps emanate within a distance \(h_n^\delta \) of each other, and so

$$\begin{aligned}&\mathbb P _\rho \left(i_0>i_1\right) \quad \le \mathbb P _\rho \left(\tau _{x_{i_0j_0}}>\Delta _{n(T+1)}\right)\\&\quad + \mathbf{P}\left(\sum _{i=m}^{m+h_n^\delta }\mathbf{1}_{\{N_n(i)\ge 1\}}\ge 2 \text{ for} \text{ some}\,m\in \{0,1,\dots ,(T+1)n-n^\varepsilon \}\right). \end{aligned}$$

As \(n\) tends to infinity, the second term converges to 0 by Lemma 3.2, and we deal with the first term in the same way as before, thus confirming that \(\mathbb P _\rho (i_0\ne i_1)\) converges to 0 as \(n\) tends to infinity. We deduce from this that

$$\begin{aligned} \limsup _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty }\mathbb P _\rho \left(i_0\ne l(an)\right)&\le \limsup _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty } \left(\mathbb P _\rho \left(i_0\ne i_1\right)+\mathbb P _\rho \left(i_1\ne l(an)\right)\right)\\&= \limsup _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty } \mathbb P _\rho \left(l(an)\ne l(an(1+\varepsilon ))\right)\!. \end{aligned}$$

Lemma 4.3 can be applied to establish that the final expression here is equal to 0, as desired.

In conjunction with (4.3), the conclusion of the previous paragraph implies that

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\mathbb P _\rho \left(\pi (X_{e^{an}})\ne \rho _{l(an)}\right)\\&\quad = \limsup _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty } \mathbb P _\rho \left(\pi (X_{e^{an}})\ne \rho _{i_0}\right)\\&\quad \le \limsup _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty }\left[\mathbb P _\rho \left(\tau _{x_{i_0j_0}}> e^{an}\right)+\mathbb P _\rho \left(\tau _{x_{i_0j_0}}\le e^{an},\,\pi (X_{e^{an}})\ne \rho _{i_0}\right)\right]\\&\quad \le \limsup _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty }\mathbf{E}\left(P^\mathcal{T ^*}_{x_{i_0j_0}}\left(\inf \{m:\,\pi (X_m)\ne \rho _{i_0}\}\le e^{an}\right)\right)\\&\quad \le \limsup _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty }\mathbf{E}\left(P^\mathcal{T ^*}_{x_{i_0j_0}}\left( \tau _{\rho _{i_0}}\le e^{an}\right)\right)\!. \end{aligned}$$

Recalling from above (3.10) the definition of \(y_{ij}\) (the deepest vertex of the trap \(\mathcal T _{ij}\)), it is straightforward to show that

$$\begin{aligned} P^\mathcal{T ^*}_{x_{i_0j_0}}\left(\tau _{y_{i_0j_0}}>\tau _{\rho _{i_0}}\right)\le c_1\beta ^{-h_{n}^{\delta }}, \end{aligned}$$

for some constant \(c_1\) depending only on \(\beta \); indeed, this is nothing more than a computation for a biased random walk on \(\mathbb Z \). Furthermore, another simple calculation for biased random walk on the line yields

$$\begin{aligned} P_{y_{i_0j_0}}^\mathcal{T ^*}\left(\tau _{y_{i_0j_0}}^+>\tau _{\rho _{i_0}}\right)\le c_2 e^{-a(1+\varepsilon )n}, \end{aligned}$$

where \(\tau _{y_{i_0j_0}}^+\) is the time of the first return to \(y_{i_0j_0}\), so

$$\begin{aligned} P_{y_{i_0j_0}}^\mathcal{T ^*}\left(\tau _{\rho _{i_0}}\le e^{an}\right) \le \sum _{k=0}^{e^{an}/2}P_{y_{i_0j_0}}^\mathcal{T ^*}\left(\tau _{y_{i_0j_0}}^+\le \tau _{\rho _{i_0}}\right)^k P_{y_{i_0j_0}}^\mathcal{T ^*}\left(\tau _{y_{i_0j_0}}^+> \tau _{\rho _{i_0}}\right)\le c_3e^{-a\varepsilon n}. \end{aligned}$$
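Both of the last two estimates are instances of the classical gambler's ruin computation referred to above. As a sketch of the underlying one-dimensional bound (with the bias \(\beta \) pointing away from the backbone, so that on a path the walk steps deeper with probability \(\beta /(1+\beta )\)), for a nearest-neighbour random walk on \(\{0,1,\dots ,N\}\) which steps towards \(N\) with probability \(\beta /(1+\beta )\) and towards \(0\) with probability \(1/(1+\beta )\), we have

$$\begin{aligned} P_k\left(\tau _0<\tau _N\right)=\frac{\beta ^{-k}-\beta ^{-N}}{1-\beta ^{-N}}\le \frac{\beta ^{-k}}{1-\beta ^{-1}},\quad k=1,\dots ,N-1; \end{aligned}$$

the exponents \(h_n^\delta \) and \(a(1+\varepsilon )n=(an(1+\varepsilon )/\ln \beta )\ln \beta \) appearing in the two bounds above reflect the corresponding distances in the trap.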

Consequently, since the strong Markov property (applied at the hitting time of \(y_{i_0j_0}\)) yields \(P^\mathcal{T ^*}_{x_{i_0j_0}}(\tau _{\rho _{i_0}}\le e^{an})\le P^\mathcal{T ^*}_{x_{i_0j_0}}(\tau _{y_{i_0j_0}}>\tau _{\rho _{i_0}})+P^\mathcal{T ^*}_{y_{i_0j_0}}(\tau _{\rho _{i_0}}\le e^{an})\), we obtain

$$\begin{aligned} \limsup _{n\rightarrow \infty }\mathbb P _\rho \left(\pi (X_{e^{an}})\ne \rho _{l(an)}\right)\le \limsup _{\varepsilon \rightarrow 0} \limsup _{n\rightarrow \infty }\left( c_1\beta ^{-h_{n}^{\delta }}+ c_3e^{-a\varepsilon n}\right)=0, \end{aligned}$$

which completes the proof. \(\square \)

Putting Lemmas 4.3 and 4.5 together, we obtain Theorem 1.4.

5 A limit theorem for sums of independent random variables with slowly varying tail probability

In this section, we derive the limit theorem for sums of independent random variables with slowly varying tail probability that was applied in the proofs of Lemma 2.5 and Theorem 1.1. The result we prove here is a generalisation of [23, Theorem 2.1].

Let \((X_{i,j})_{i,j\in \mathbb N }\) be non-negative random variables such that for each \(n\ge 1\), the elements of the collection \((X_{n,j})_{j\in \mathbb N }\) are independent and have common distribution function \(F_n\). Moreover, suppose \(F\) is a distribution function such that \(\bar{F}(x):=1-F(x)\) is slowly varying and \(\bar{F}(x)>0\) for all \(x>0\). Similarly writing \(\bar{F}_n(x):=1-F_n(x)\), the main assumption of this section is that for each \(\varepsilon >0\), there exist constants \(c_1,c_2\) such that

$$\begin{aligned} (1-\varepsilon )\bar{F}_n(x)\le \bar{F}(x)\le (1+\varepsilon )\bar{F}_n(x),\quad \forall x\in [c_1(g_1(n)\vee 1),c_2g_2(n)], \end{aligned}$$
(5.1)

where \(g_i(n):=\bar{F}^{-1}(n^{-1}h_i(n)),\,i=1,2\), with \(h_1: \mathbb N \rightarrow (0,\infty )\) a non-decreasing, divergent function satisfying \(\lim _{n\rightarrow \infty }h_1(n)/n=0\), and \(h_2: \mathbb N \rightarrow [0,\infty )\) a non-increasing function satisfying \(\lim _{n\rightarrow \infty }h_2(n)=0\). (Note that necessarily \(\lim _{n\rightarrow \infty }g_i(n)=\infty \) for \(i=1,2\).) Defining a function \(L\) by setting \(L(x):=1/\bar{F}(x)\), we then have the following scaling result for sums of the form

$$\begin{aligned} S_m^{n}:=\sum _{j=1}^mX_{n,j}. \end{aligned}$$

Theorem 5.1

Assume that (5.1) holds. As \(n\rightarrow \infty \),

$$\begin{aligned} \left(\frac{1}{n} L\left(S_{nt}^n\right)\right)_{t\ge 0}\rightarrow \left(m(t)\right)_{t\ge 0} \end{aligned}$$
(5.2)

in distribution with respect to the Skorohod \(J_1\) topology on \(D([0,\infty ),\mathbb R )\).

Remark 5.2

(i) Note that, similarly to Remark 1.9, if \(\bar{F}_n\) and \(\bar{F}\) are not both continuous and eventually strictly decreasing, then a minor modification of the proof of the above result is needed (cf. Remark 1.7(ii)).

(ii) The same conclusion holds if on the left-hand side of (5.2) we replace \(L\) by \(L_n(x)=1/\bar{F}_n(x)\).
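As a simple illustration of Theorem 5.1 (included only for orientation, and not needed in what follows), consider the i.i.d. case \(F_n=F\) for every \(n\), with \(\bar{F}(x)=1/\ln x\) for \(x\ge e\) and \(\bar{F}(x)=1\) otherwise; such a random variable can be realised as \(e^{1/U}\) with \(U\) uniform on \((0,1)\). Then \(\bar{F}\) is slowly varying, \(L(x)=\ln (x\vee e)\), and (5.1) holds trivially (with, for example, \(h_1(n)=\ln n\) and \(h_2(n)=1/\ln n\)), so that (5.2) states that

$$\begin{aligned} \left(\frac{1}{n}\ln \left(S_{nt}^n\vee e\right)\right)_{t\ge 0}\rightarrow \left(m(t)\right)_{t\ge 0}, \end{aligned}$$

that is, on this scale the logarithm of the sum behaves like that of its largest term (cf. the description of \((m(t))_{t\ge 0}\) given below).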

With the aim of proving the above result, it is helpful to introduce a one-sided stable process \((\eta (t))_{t\ge 0}\) of index \(1/2\), i.e., one with Lévy measure given by \(\mu ((x,\infty ))=x^{-1/2}\) for \(x>0\). We will write \(F_*(x)=P(\eta (1)\le x)\) for the distribution function of \(\eta (1)\). Briefly, the connection with \((m(t))_{t\ge 0}\) is that \(m(t)=(\max _{0\le s\le t}\Delta \eta (s))^{1/2}\) (as processes), where we recall that \(\Delta \eta (s)=\eta (s)-\eta (s^-)\). Moreover, if we set

$$\begin{aligned} \eta _{n,i}&= \eta (i/n)-\eta ((i-1)/n),\quad \forall i\ge 1,\\ m_n(t)&= \left(\max _{i\le nt}\eta _{n,i}\right)^{1/2}\mathbf{1}_{[1/n,\infty )}(t), \end{aligned}$$

then \(m_n\rightarrow m\) almost-surely in the Skorohod \(J_1\) topology. Indeed, since \(\eta _n(t):=\eta (\lfloor nt\rfloor /n)\rightarrow \eta (t)\) in the Skorohod \(J_1\) topology, by the continuous mapping theorem

$$\begin{aligned} \max _{i\le nt}\eta _{n,i}=\max _{0\le s\le t}\Delta \eta _n(s)\rightarrow \max _{0\le s\le t}\Delta \eta (s) \end{aligned}$$
(5.3)

almost-surely as a process in the same topology.
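For orientation, we also record the marginal distributions of the limit process, which follow directly from the above description: since the number of jumps of \(\eta \) of size greater than \(y^2\) occurring in the time interval \([0,t]\) is Poisson distributed with mean \(t\mu ((y^2,\infty ))=t/y\), we have

$$\begin{aligned} P\left(m(t)\le y\right)=P\left(\max _{0\le s\le t}\Delta \eta (s)\le y^2\right)=e^{-t/y},\quad \forall y>0. \end{aligned}$$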

We are now ready to present the key lemma needed to establish Theorem 5.1. (This corresponds to [23, Lemmas 2.2 and 2.3].) In its statement, we use the notation \(\phi _n(x):=F_n^{-1}(F_*(x))\), and we also define \(\phi (x):=F^{-1}(F_*(x))\) for its proof.

Lemma 5.3

Under (5.1), we have the following.

  1. (i)

    For every \(\lambda >0\) and \(T>0\), as \(n\rightarrow \infty \),

    $$\begin{aligned} \sup _{0\le x\le T}\left|\frac{1}{n} L\left(\lambda \phi _n\left(n^2x^2\right)\right)-x\right|\rightarrow 0. \end{aligned}$$
    (5.4)
  2. (ii)

    For each \(\delta >0,\,T>\delta \), there exist random constants \(K_1,K_2>0\) and \(n_0\) such that, for every \(t\in [\delta ,T]\) and \(n\ge n_0\),

    $$\begin{aligned} K_1\phi _n\left(n^2m_n(t)^2\right)\le \sum _{i\le nt}\phi _n\left(n^2\eta _{n,i}\right)\le K_2\phi _n\left(n^2m_n(t)^2\right)\!, \end{aligned}$$
    (5.5)

    almost-surely.

Proof

We first give some preliminary computations. Rewriting (5.1), we have

$$\begin{aligned} \frac{F(x)-\varepsilon }{1-\varepsilon }\le F_n(x)\le \frac{F(x)+\varepsilon }{1+\varepsilon },\quad \forall x\in [c_1(g_1(n)\vee 1),c_2g_2(n)]. \end{aligned}$$

Setting

$$\begin{aligned} z=F_*^{-1}\left(F(x)\right),\,\kappa _1(\varepsilon , z)=F_*^{-1}\left(\frac{F_*(z)-\varepsilon }{1-\varepsilon }\right),\, \kappa _2(\varepsilon , z)=F_*^{-1}\left(\frac{F_*(z)+\varepsilon }{1+\varepsilon }\right), \end{aligned}$$

it follows that

$$\begin{aligned} \phi _n(\kappa _1(\varepsilon , z))\le \phi (z)=x\le \phi _n(\kappa _2(\varepsilon , z)), \end{aligned}$$
(5.6)

for \(F(c_1g_1(n))<F_*(z)=F(x)<F(c_2g_2(n))\). Since

$$\begin{aligned} z^{-1/2}\sim 1-F_*(z)<1-F(c_1g_1(n))=\bar{F}(c_1g_1(n))\sim \bar{F}(g_1(n))=n^{-1}h_1(n), \end{aligned}$$

and similarly \(z^{-1/2}\sim 1-F_*(z)>n^{-1}h_2(n)\), for suitably large \(n\), the inequality at (5.6) holds for all \(c_1^{\prime }n^2/h_1(n)^2<z<c_2^{\prime }n^2/h_2(n)^2\). Now, since \(F_*^{-1}(x)\sim (1-x)^{-2}\) for \(x\rightarrow 1^-\), we have

$$\begin{aligned} \kappa _1(\varepsilon , n^2x^2)&= F_*^{-1}\left(\frac{F_*(n^2x^2)-\varepsilon }{1-\varepsilon }\right)\\&\sim \left(1-\frac{F_*(n^2x^2)-\varepsilon }{1-\varepsilon }\right)^{-2}\\&= \frac{(1-\varepsilon )^2}{\left(1-F_*(n^2x^2)\right)^2}\\&\sim (1-\varepsilon )^2n^2x^2, \end{aligned}$$

so that \(\kappa _1(\varepsilon , n^2x^2)\ge (1-\varepsilon )^3n^2x^2\) for large \(n\). Similarly, \(\kappa _2(\varepsilon , n^2x^2)\le (1+\varepsilon )^3n^2x^2\) for large \(n\). Since \(\phi _n, h_1\) are non-decreasing and \(h_2\) is non-increasing, by (5.6) we conclude

$$\begin{aligned}&\phi _n\left((1-\varepsilon )^3n^2x^2\right)\le \phi \left(n^2x^2\right)\le \phi _n\left((1+\varepsilon )^3n^2x^2\right),\nonumber \\&\quad \forall n> h_1^{-1}(c_1^{\prime \prime }/x)\vee h_2^{-1}(c_2^{\prime \prime }/x). \end{aligned}$$
(5.7)

Now let us prove (i). By the definition of \(\phi \) and the fact that \(1-F_*(x)\sim x^{-1/2}\), we have, as in the proof of [23, Lemma 2.2], that \(\lim _{x\rightarrow \infty }L(\lambda \phi (x^2))/x=1\), from which it follows that \(\lim _{n\rightarrow \infty }L(\lambda \phi (n^2x^2))/n=x\) for \(\lambda >0\). Noting that \(L(\lambda \phi (n^2x^2))/n\) is monotone in \(x\) and the limiting function is continuous, this convergence is uniform in \(x\) on each finite interval. By (5.7),

$$\begin{aligned} n^{-1}L\left(\lambda \phi _n\left((1-\varepsilon )^3n^2x^2\right)\right)\le n^{-1}L\left(\lambda \phi (n^2x^2)\right)\le n^{-1}L\left(\lambda \phi _n\left((1+\varepsilon )^3n^2x^2\right)\right), \end{aligned}$$

for \(n>h_1^{-1}(c_1^{\prime \prime }/x)\vee h_2^{-1}(c_2^{\prime \prime }/x)\), and thus we obtain (5.4).
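For the reader's convenience, we sketch the first step of the above argument (under the continuity and strict monotonicity assumptions implicit here, cf. Remark 5.2(i), so that \(\bar{F}(F^{-1}(u))=1-u\)): by the definition of \(\phi \) and the fact that \(1-F_*(y)\sim y^{-1/2}\),

$$\begin{aligned} L\left(\phi (x^2)\right)=\frac{1}{\bar{F}\left(F^{-1}\left(F_*(x^2)\right)\right)}=\frac{1}{1-F_*(x^2)}\sim x, \end{aligned}$$

and, since \(L=1/\bar{F}\) is slowly varying and \(\phi (x^2)\rightarrow \infty \), the same asymptotics hold with \(\phi (x^2)\) replaced by \(\lambda \phi (x^2)\) for any fixed \(\lambda >0\).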

We next prove (ii). First, by taking \(K_1=1\) the lower bound of (5.5) is clear. So we will prove the upper bound. As in the proof of [23, Lemma 2.2], noting that \(\phi ^{-1}\) is slowly varying and using the representation theorem of [29, Theorem 1.2], we have \(\phi ^{-1}(x)=c(x)\exp (\int _1^x \varepsilon (t)/tdt)\) where \(c(x)\rightarrow c>0\) and \(\varepsilon (x)\rightarrow 0\) as \(x\rightarrow \infty \). Thus, \(\phi (x)\) may be expressed as

$$\begin{aligned} \phi (x)=\exp \left(\int _1^{q(x)x}\frac{\tilde{\varepsilon }(t)}{t}dt\right) \end{aligned}$$

where \(\tilde{\varepsilon }(x)\rightarrow \infty \) and \(q(x)\rightarrow 1/c\) as \(x\rightarrow \infty \). Using this and (5.7), we have, for all \(a>0\) and \(M>2\),

$$\begin{aligned} n^2\frac{\phi _n(n^2a)}{\phi _n(2n^2a)}&\le n^2\frac{\phi \left(\frac{n^2a}{(1-\varepsilon )^{3}}\right)}{\phi \left(\frac{2n^2a}{(1+\varepsilon )^3}\right)}\\&= n^2\exp \left(-\int _{q\left(\frac{n^2a}{(1-\varepsilon )^{3}}\right)\frac{n^2a}{(1-\varepsilon )^{3}}} ^{q\left(\frac{2n^2a}{(1+\varepsilon )^3}\right)\frac{2n^2a}{(1+\varepsilon )^3}} \frac{\tilde{\varepsilon }(t)}{t}dt\right)\\&\le n^2\exp (-M\log (an^2))\\&= a^{-M}n^{-2(M-1)}, \end{aligned}$$

when \(n> h_1^{-1}(c_1^{\prime \prime \prime }/\sqrt{a})\vee h_2^{-1}(c_2^{\prime \prime \prime }/\sqrt{a})\vee (c_Ma^{-1/2})\) for some \(c_M\) depending on \(M\). Thus

$$\begin{aligned} \lim _{n\rightarrow \infty }n^2\frac{\phi _n(n^2a)}{\phi _n(2n^2a)}=0,\quad \forall a>0. \end{aligned}$$
(5.8)

Given this, the rest is a minor modification of the proof of [23, Lemma 2.3]. Let \(a=m(\delta )^2/3>0\). Since \(m_n\rightarrow m\) almost-surely (as discussed around (5.3)), there exists a random \(n_1\ge 1\) such that

$$\begin{aligned} m_n(t)^2\ge m_n(\delta )^2\ge 2a,\quad \forall t\ge \delta ,\,n\ge n_1. \end{aligned}$$
(5.9)

Let \(\mathcal A _{1,i}=\{\eta _{n,i}<n^{-2}\},\,\mathcal A _{2,i}=\{n^{-2}\le \eta _{n,i}\le a\}\) and \(\mathcal A _{3,i}=\{a< \eta _{n,i}\}\), and define

$$\begin{aligned} S^{n,k}_m=\sum _{i\le m}\phi _n(n^2\eta _{n,i})\mathbf{1}_\mathcal{A _{k,i}},\quad k=1,2,3. \end{aligned}$$

Since \(\phi _n\) is non-decreasing, \(S^{n,1}_{nt}\le nT\phi _n(1)\) for \(t\le T\). Further, for \(n\ge a^{-1/2}\), we have that \(n\phi _n(1)/\phi _n(2n^2a)\,\le n\phi _n(n^2a)/\phi _n(2n^2a)\), which goes to \(0\) as \(n\rightarrow \infty \) by (5.8). Thus, there exists a random \(n_2\ge 1\) such that

$$\begin{aligned} S^{n,1}_{nt}\le \phi _n(2n^2a)\le \phi _n(n^2m_n(t)^2),\quad \forall \delta \le t\le T,\,n\ge n_2,\qquad \end{aligned}$$
(5.10)

where the last inequality is due to (5.9). Next, by (5.8), there exists a random \(n_3\ge 1\) such that

$$\begin{aligned} 0\le \phi _n(n^2x)\le \phi _n(n^2a)n^2x\le \phi _n(2n^2a)x,\quad \forall n^{-2}\le x\le a, \,n\ge n_3. \end{aligned}$$

Thus, for \(\delta \le t\le T\) and \(n\ge n_3(\omega )\), we have

$$\begin{aligned} S^{n,2}_{nt}\le \phi _n(2n^2a)\sum _{i\le nt}\eta _{n,i}=\phi _n(2n^2a)\eta (\lfloor nt\rfloor /n) \le \phi _n(n^2m_n(t)^2)\eta (T), \end{aligned}$$
(5.11)

where the last inequality is due to (5.9). Now, noting that there are only finitely many \(t\in [0,T]\) such that \(\Delta \eta (t)>a\), there exists a random \(K_3>0\) such that \(\sum _{i\le nT}\mathbf{1}_\mathcal{A _{3,i}}\le K_3\) for large \(n\), almost-surely. Using this and the definition of \(m_n(t)\), there exists a random \(n_4\ge 1\) such that the following holds:

$$\begin{aligned} S^{n,3}_{nt}\le K_3\max _{i\le nt}\phi _n(n^2\eta _{n,i})=K_3\phi _n(n^2m_n(t)^2),\quad \forall \delta \le t\le T,\,n\ge n_4.\qquad \end{aligned}$$
(5.12)

Combining (5.10), (5.11) and (5.12), we obtain

$$\begin{aligned} \sum _{i\le nt}\phi _n(n^2\eta _{n,i})\le K_2\phi _n(n^2m_n(t)^2),\quad \forall \delta \le t\le T,\,n\ge n_2\vee n_3\vee n_4=:n_0, \end{aligned}$$

where \(K_2=1+\eta (T)+K_3\). Thus we have obtained the upper bound of (ii). \(\square \)

Proof of Theorem 5.1

Given the above lemma, the proof of Theorem 5.1 is essentially the same as that of [23, Theorem 2.1], and so we only sketch it briefly. Let

$$\begin{aligned} \zeta _n^{(n)}(t)=\frac{1}{n} L\left(\sum _{i\le nt}\phi _n\left(n^2\eta _{n,i}\right)\right)\!. \end{aligned}$$

Then, by definition, \((\zeta _n^{(n)}(t))_{t\ge 0}\) is equal in law to \((\frac{1}{n} L(S_{nt}^n))_{t\ge 0}\). Further, as discussed around (5.3), \(m_n\rightarrow m\) almost-surely with respect to the Skorohod \(J_1\) topology. So, in order to complete the proof, it suffices to prove the following: for \(T>0\),

$$\begin{aligned} \sup _{0\le t\le T}\left|\zeta _n^{(n)}(t)-m_n(t)\right|\rightarrow 0, \end{aligned}$$
(5.13)

almost-surely. Firstly,

$$\begin{aligned} \sup _{\delta \le t\le T}\left|\zeta _n^{(n)}(t)-m_n(t)\right|&\le \sup _{0\le t\le T}\max _{i=1,2}\left|n^{-1}L\left(K_i\phi _n(n^2m_n(t)^2)\right)-m_n(t)\right|\\&\le \sup _{0\le x\le m_n(T)}\max _{i=1,2}\left|n^{-1}L\left(K_i\phi _n(n^2x^2)\right)-x\right|\!,\\ \end{aligned}$$

where Lemma 5.3(ii) is used in the first inequality. This bound converges to 0 as \(n\rightarrow \infty \) by Lemma 5.3(i). Secondly, using the monotonicity of \(\zeta _n^{(n)}\) and \(m_n\) (and Lemma 5.3 again), we have

$$\begin{aligned}&\lim _{\delta \rightarrow 0}\limsup _{n\rightarrow \infty }\sup _{0\le t\le \delta }\left|\zeta _n^{(n)}(t)-m_n(t)\right|\le \lim _{\delta \rightarrow 0}\limsup _{n\rightarrow \infty }(\zeta _n^{(n)}(\delta )+m_n(\delta ))\\&\quad =\lim _{\delta \rightarrow 0}2m(\delta )=0, \end{aligned}$$

almost-surely. We thus obtain (5.13). \(\square \)