Biased random walk on critical Galton–Watson trees conditioned to survive

We consider the biased random walk on a critical Galton–Watson tree conditioned to survive, and confirm that this model with trapping belongs to the same universality class as certain one-dimensional trapping models with slowly-varying tails. Indeed, in each of these two settings, we establish closely-related functional limit theorems involving an extremal process, and also demonstrate that extremal aging occurs.

one-dimensional heavy-tailed trapping models considered in [33]. Indeed, although the case of a deterministically biased random walk on a Galton-Watson tree with leaves is slightly complicated by a certain lattice effect, which means it cannot be rescaled properly [6], in the case of randomly biased random walks on such structures, it was shown in [22] (see also the related article [8]) that precisely the same limiting behaviour as the one-dimensional models of [15] and [33] occurs. Moreover, there is evidence presented in [17] that suggests the biased random walk on a supercritical percolation cluster also has the same limiting behaviour. The universality class that connects these models was previously investigated in [3,4] and [5], and is characterised by limiting stable subordinators and aging properties.
The aim of this paper is to investigate biased random walks on critical structures. To this end, we choose to study the biased random walk on a critical Galton-Watson tree conditioned to survive. With the underlying environment having radically different properties from its supercritical counterpart, we would expect different limiting behaviour, with more extreme trapping phenomena, to arise. It is further natural to believe that some of the properties of the biased random walk on the incipient infinite cluster for critical percolation on Z d , at least in high dimensions, would be similar to the ones proved in our context, as is observed to be the case for the unbiased random walk (compare, for instance, the results of [1] and [25]). Nevertheless, our current understanding of the geometry of this object is not sufficient to extend our results easily, and so we do not pursue this inquiry here. In particular, we anticipate that, as indicated by physicists in [2], for percolation close to criticality there is likely to be an additional trapping mechanism that occurs due to spatial considerations, which means that, even without taking the effect of dead-ends into account, it is more likely for the biased random walk to be found in certain regions of individual paths than others (see [9] for a preliminary study in this direction).
Our main model-the biased random walk on critical Galton-Watson trees conditioned to survive-is presented in the next section, along with a summary of the results we are able to prove for it. This is followed in Sect. 1.2 with an introduction to a one-dimensional trapping model in which the trapping time distributions have slowly-varying tails. This latter model, which is of interest in its own right, is of particular relevance for us, as it allows us to comprehensively characterise the universality class into which the Galton-Watson trees we consider fall. Furthermore, the arguments we apply for the one-dimensional model provide a useful template for the more complicated tree framework.

Biased random walk on critical Galton-Watson trees
Before presenting the Galton-Watson tree framework, we recall some classical results for sums of random variables whose distribution has a slowly-varying tail. Let $(X_i)_{i=1}^\infty$ be independent random variables, with distributional tail $\bar F(u) = 1 - F(u) = P(X_i > u)$ satisfying: $\bar F(0) = 1$, $\bar F(u) > 0$ for all $u > 0$,
$$\lim_{u\to\infty} \frac{\bar F(uv)}{\bar F(u)} = 1 \qquad (1.1)$$
for any $v > 0$, and $\bar F(u) \to 0$ as $u \to \infty$. A typical example is when the distribution in question decays logarithmically slowly, such as
$$\bar F(u) \sim \frac{1}{(\ln u)^\gamma}, \qquad (1.2)$$
for some $\gamma > 0$, where throughout the article $f \sim g$ will mean $f(x)/g(x) \to 1$ as $x \to \infty$. A first scaling result for sums of the form $\sum_{i=1}^n X_i$ was obtained in [11], and this was subsequently extended by [23] to a functional result. In particular, in [23] it was established that if $L(x) := 1/\bar F(x)$, then
$$\left( \frac{L\left(\sum_{i=1}^{\lfloor nt \rfloor} X_i\right)}{n} \right)_{t\ge0} \to (m(t))_{t\ge0} \qquad (1.3)$$
in distribution with respect to the Skorohod $J_1$ topology (as an aid to the reader, we provide in the appendix a definition of the Skorohod $J_1$ and $M_1$ topologies, the latter of which is applied in several subsequent results), where $m = (m(t))_{t\ge0}$ is an extremal process. To define $m$ more precisely, suppose that $(\xi(t))_{t\ge0}$ is the symmetric Cauchy process, i.e., the Lévy process with Lévy measure given by $\mu((x,\infty)) = x^{-1}/2$ for $x > 0$, and then set
$$m(t) := \sup_{s \le t} |\Delta\xi(s)|,$$
where $\Delta\xi(s) = \xi(s) - \xi(s-)$. (Observe that $(m(t))_{t\ge0}$ is thus the maximum process of the Poisson point process with intensity measure $x^{-2}\,dx\,dt$.) We will prove that, in addition to appearing in the limit at (1.3), this extremal process arises in the scaling limits of a biased random walk on a critical Galton-Watson tree and, as is described in the next section, a one-dimensional directed trap model whose holding times have slowly-varying tails. We continue by introducing some relevant branching process and random walk notation, following the presentation of [10].
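The mechanism behind the extremal-process limit (1.3) is that, for a slowly-varying tail, the sum of $n$ i.i.d. terms is dominated by its single largest term. The following sketch (our own illustration, not from the paper) checks this numerically for the concrete tail $\bar F(u) = 1/\ln u$, the $\gamma = 1$ case of (1.2); the inversion formula $X = e^{1/U}$ for $U$ uniform on $(0,1)$ is an assumption of the sketch.

```python
import math
import random

# An illustration (our own, not from the paper) of why an extremal process
# appears in (1.3): for a slowly-varying tail, the sum of n i.i.d. terms is
# dominated by its single largest term. We assume the concrete tail
# F-bar(u) = 1/ln(u) (the gamma = 1 case of (1.2)); by inversion, if U is
# uniform on (0,1) then X = exp(1/U) has this tail for u > e. We work with
# log X = 1/U throughout to avoid floating-point overflow.
random.seed(1)
n = 10_000
log_samples = [1.0 / random.random() for _ in range(n)]  # log X_i = 1/U_i

log_max = max(log_samples)
# log-sum-exp: a stable evaluation of ln(X_1 + ... + X_n)
log_sum = log_max + math.log(sum(math.exp(l - log_max) for l in log_samples))

ratio = log_sum / log_max
print(ratio)  # very close to 1: the maximum dominates the sum
```

Since $\ln(\sum_i X_i)/\ln(\max_i X_i) \approx 1$, rescaling the logarithm of the sum is effectively rescaling the logarithm of the running maximum, which is why an extremal process, rather than a subordinator, appears in the limit.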
Let $Z$ be a critical ($EZ = 1$) offspring distribution in the domain of attraction of a stable law with index $\alpha \in (1, 2]$, by which we mean that there exists a sequence $a_n \uparrow \infty$ such that
$$\frac{Z^{[n]} - n}{a_n} \xrightarrow{d} X, \qquad (1.4)$$
where $Z^{[n]}$ is the sum of $n$ i.i.d. copies of $Z$ and $E(e^{-\lambda X}) = e^{-\lambda^\alpha}$ for $\lambda \ge 0$. Note that, by results of [16, Chapters XIII and XVII], this is equivalent to the probability generating function of $Z$ satisfying
$$f(s) := E(s^Z) = \sum_{k=0}^\infty p_k s^k = s + (1-s)^\alpha L(1-s), \qquad \forall s \in (0,1), \qquad (1.5)$$
where $L(x)$ is slowly varying as $x \to 0^+$, and the non-triviality condition $P(Z = 1) \neq 1$ holding. We point out that the condition $E(Z^2) < \infty$ is sufficient for the previous statements to hold with $\alpha = 2$. Denote by $(Z_n)_{n\ge0}$ the corresponding Galton-Watson process, started from $Z_0 = 1$. It has been established in [30, Lemma 2] that if $q_n := P(Z_n > 0)$, then
$$q_n^{\alpha-1} L(q_n) \sim \frac{1}{(\alpha-1)n} \qquad (1.6)$$
as $n \to \infty$, where $L$ is the function appearing in (1.5). It is also well known that the branching process $(Z_n)_{n\ge0}$ can be obtained as the generation size process of a Galton-Watson tree, $T$ say, with offspring distribution $Z$. In particular, to construct the random rooted graph tree $T$, start with a single ancestor (or root), and then suppose that individuals in a given generation have offspring independently of the past and each other according to the distribution of $Z$; see [26, Section 3] for details. The vertex set of $T$ is the entire collection of individuals, edges are the parent-offspring bonds, and $Z_n$ is the number of individuals in the $n$th generation of $T$. From (1.6), it is clear that $T$ will be a finite graph $P$-a.s. However, in [24], Kesten showed that it is possible to make sense of conditioning $T$ to survive or 'grow to infinity'. More specifically, there exists a unique (in law) random infinite rooted locally-finite graph tree $T^*$ that satisfies, for any $n \in \mathbb{Z}_+$,
$$E\left(\phi(T^*|_n)\right) = \lim_{m\to\infty} E\left(\phi(T|_n) \,\middle|\, Z_{n+m} > 0\right),$$
where $\phi$ is a bounded function on finite rooted graph trees of $n$ generations, and $T|_n$, $T^*|_n$ are the first $n$ generations of $T$, $T^*$ respectively.
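The polynomial decay of the survival probability can be seen in a quick simulation. The sketch below is our own illustration, with the offspring law an assumption: Geometric(1/2) on $\{0,1,2,\dots\}$ is critical with variance $\sigma^2 = 2$, so $E(Z^2) < \infty$ and the $\alpha = 2$ case applies; Kolmogorov's classical estimate then gives $q_n \sim 2/(\sigma^2 n) = 1/n$.

```python
import random

# A simulation sketch (our own illustration; the offspring law is an
# assumption): Geometric(1/2) on {0,1,2,...} is critical with E[Z] = 1 and
# variance 2, so the alpha = 2 case applies and Kolmogorov's estimate gives
# q_n = P(Z_n > 0) ~ 2/(sigma^2 n) = 1/n.
random.seed(7)

def offspring():
    k = 0
    while random.random() < 0.5:  # P(Z = k) = 2^{-(k+1)}, mean one
        k += 1
    return k

def survives(generations):
    z = 1
    for _ in range(generations):
        z = sum(offspring() for _ in range(z))
        if z == 0:
            return False
    return True

trials = 20_000
q20 = sum(survives(20) for _ in range(trials)) / trials
print(q20)  # should be roughly 1/20 = 0.05
```

This is the finiteness phenomenon noted above: a critical tree dies out almost surely, with survival to generation $n$ costing probability of order $1/n$, which is what makes Kesten's conditioned tree $T^*$ a non-trivial object.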
We will write $d_{T^*}$ to represent the shortest path graph distance on $T^*$. Given a particular realisation of $T^*$, we will denote by $X = ((X_n)_{n\ge0}, P^{T^*}_x, x \in T^*)$ the discrete-time biased random walk on $T^*$, defined as follows. First, fix a bias parameter $\beta > 1$, and assign to each edge connecting a vertex $x$ in generation $k$ to a vertex $y$ in generation $k+1$ a conductance $c(x,y) := \beta^k =: c(y,x)$. The transition probabilities of $X$ are then determined by
$$P^{T^*}(X_{n+1} = y \mid X_n = x) = \frac{c(x,y)}{\sum_{z \sim x} c(x,z)}, \qquad \forall x \sim y,$$
where the notation $x \sim y$ means that $x$ and $y$ are connected by an edge in $T^*$. Thus, when at a vertex $x$ that is not equal to the root of $T^*$, jumping to a neighbouring vertex further away from the root than $x$ is $\beta$ times as likely as jumping towards the root. Using the usual terminology for random walks in random environments, we will say that $P^{T^*}_x$ is the quenched law of the biased random walk on $T^*$ started from $x$. Moreover, we introduce the annealed law for the process started from $\rho$, the root of the tree $T^*$, by setting
$$\mathbb{P}_\rho(\cdot) := \int P^{T^*}_\rho(\cdot)\,dP. \qquad (1.7)$$
It will be this law under which we investigate the rate at which the process $X$, which we call the biased random walk on a critical Galton-Watson tree conditioned to survive, escapes from the root. The main result we prove for the process $X$ concerns the time it takes to progress along the backbone. To be more specific, as is described in more detail in Sect. 3.1, $P$-a.s. the tree $T^*$ admits a unique backbone, that is, a semi-infinite path starting from the root, $\{\rho = \rho_0, \rho_1, \rho_2, \dots\}$ say. We define $(\Delta_n)_{n\ge0}$ by setting
$$\Delta_n := \inf\{m \ge 0 : X_m = \rho_n\}, \qquad (1.8)$$
the first time the process $X$ reaches level $n$ along this path. For this process, we are able to prove the following functional limit theorem.

Theorem 1.1 Let $\alpha \in (1,2]$. As $n \to \infty$, the laws of the processes
$$\left( \frac{(\alpha-1)\ln \Delta_{\lfloor nt \rfloor}}{n \ln \beta} \right)_{t\ge0}$$
under $\mathbb{P}_\rho$ converge weakly with respect to the Skorohod $J_1$ topology on $D([0,\infty),\mathbb{R})$ to the law of $(m(t))_{t\ge0}$.
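The conductance construction above can be checked directly. The following minimal sketch (our own illustration, not from the paper) computes the one-step transition probabilities at a non-root vertex and verifies that every forward step is $\beta$ times as likely as the backward step.

```python
# A minimal sketch (our own illustration, not from the paper) of the
# conductance construction: the edge between generation k and k+1 carries
# conductance beta^k, and from a vertex the walk picks an incident edge
# with probability proportional to its conductance. For a non-root vertex
# in generation k with `num_children` children, each forward step is then
# beta times as likely as the single backward step.
def transition_probs(beta, k, num_children):
    parent_c = beta ** (k - 1)   # conductance of the edge back to the parent
    child_c = beta ** k          # conductance of each edge to a child
    total = parent_c + num_children * child_c
    return parent_c / total, child_c / total  # (to parent, to each child)

p_parent, p_child = transition_probs(beta=2.0, k=3, num_children=4)
print(p_child / p_parent)  # the forward/backward ratio equals beta = 2.0
```

Note that the per-edge bias ratio is $\beta$ regardless of the generation $k$, since the common factor $\beta^{k-1}$ cancels; only the number of children at the vertex changes the absolute probabilities.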
It is interesting to observe that this result is extremely explicit compared to its supercritical counterparts. Indeed, notwithstanding the fact that the lattice effect that was the source of somewhat complicated behaviour in [6] does not occur in the critical setting, the above scaling limit clearly describes the β-dependence of the relevant slowdown effect. Note that, unlike in the supercritical case, where there is a ballistic phase, this slowdown effect occurs for any non-trivial bias parameter, i.e. for any β > 1. Furthermore, we remark that the dependence on α is natural: as α decreases and the leaves get thicker (in the sense that the tree's Hausdorff dimension α/(α − 1) increases, see [12,21]), the biased random walk moves more slowly away from its start point.
As suggested by comparing Theorem 1.1 with (1.3), the critical Galton-Watson tree case is closely linked with a sum of independent and identically-distributed random variables for which $\bar F(x)$ is asymptotically equivalent to $\ln\beta/((\alpha-1)\ln x)$. Although the logarithmic rate of decay is relatively easy to guess, finding the correct constant is slightly subtle, particularly for $\alpha \neq 2$. This is because, unlike in the supercritical case and the critical case with $\alpha = 2$, when $\alpha < 2$ it can happen that there are multiple deep traps emanating from a single backbone vertex. As a result, we have to take special care over which of these have actually been visited when determining the time spent there, meaning that the random variable which actually has the $\ln\beta/((\alpha-1)\ln x)$ tail behaviour is not environment measurable (see Lemma 3.11). To highlight the importance of this consideration, which is also relevant, albeit in a simpler way, for $\alpha = 2$, in Theorem 3.14 we show that the constant that appears differs by a factor $\alpha$ when $\Delta_n$ is replaced by its quenched mean $E^{T^*}_\rho \Delta_n$. Theorem 1.1 readily implies the following corollary for the projection, $(\pi(X_m))_{m\ge0}$, of the process $(X_m)_{m\ge0}$ onto the backbone (roughly, $\pi(X_m)$ is the vertex on the backbone from which the trap that $X_m$ is located in emanates, see Sect. 3.2 for a precise definition). To state this, we define the right-continuous inverse $(m^{-1}(t))_{t\ge0}$ of $(m(t))_{t\ge0}$ by setting
$$m^{-1}(t) := \inf\{s \ge 0 : m(s) > t\}. \qquad (1.9)$$

Corollary 1.2 Let $\alpha \in (1,2]$. As $n \to \infty$, the laws of the processes
$$\left( \frac{d_{T^*}\big(\rho, \pi(X_{\lfloor \beta^{nt/(\alpha-1)} \rfloor})\big)}{n} \right)_{t\ge0}$$
under $\mathbb{P}_\rho$ converge weakly with respect to the Skorohod $M_1$ topology on $D([0,\infty),\mathbb{R})$ to the law of $(m^{-1}(t))_{t\ge0}$.

Remark 1.3 Since the height of the leaves in which the random walk can be found at time $e^n$ (see the localisation result of Lemma 4.5) will typically be of order $n$, some further argument will be necessary to deduce a limit result for the graph distance of $X$ itself from the root.

Another characteristic property that we are able to show is that the random walk also exhibits extremal aging.
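The right-continuous inverse at (1.9) can be made concrete for step functions, which is the relevant case since the limiting extremal process is piecewise constant. The following sketch (our own illustration) represents a nondecreasing càdlàg function by its jumps and computes $m^{-1}(t) = \inf\{s \ge 0 : m(s) > t\}$.

```python
# A concrete illustration (our own) of the right-continuous inverse at (1.9):
# for a nondecreasing cadlag step function m, m^{-1}(t) = inf{ s >= 0 : m(s) > t }.
# We represent m by its jumps as a sorted list of (time, new_value) pairs,
# taking m = 0 before the first jump (an assumption of this sketch).
def right_cont_inverse(jumps, t):
    # jumps: [(s_1, v_1), (s_2, v_2), ...] with s_i and v_i increasing;
    # m(s) = v_i for s in [s_i, s_{i+1}). Returns None if m never exceeds t.
    for s, v in jumps:
        if v > t:
            return s
    return None

m_jumps = [(0.5, 1.0), (2.0, 3.0), (4.5, 7.0)]
print(right_cont_inverse(m_jumps, 0.0))  # 0.5 (m first exceeds 0 at its first jump)
print(right_cont_inverse(m_jumps, 1.0))  # 2.0 (the inverse jumps over flat stretches)
print(right_cont_inverse(m_jumps, 6.9))  # 4.5
```

Observe that the inverse of a step function is again a step function, with the flat stretches and jumps interchanged; this is why the $M_1$ topology, which tolerates such jump behaviour, appears in the corollaries involving $m^{-1}$.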
Although regular aging has previously been observed for random walks in random environments in the sub-ballistic regime on Z (see [14]), as far as we know, this is the first example of a random walk in random environment for which extremal aging has been proved. As already hinted at, this kind of behaviour, as well as that demonstrated in Theorem 1.1 and Corollary 1.2, places the biased random walk on a critical Galton-Watson tree conditioned to survive in a different universality class to that of the supercritical structures discussed previously. Instead, the class containing critical Galton-Watson trees also includes the spin glass models considered in [7] and [20], and the trap models with slowly-varying tails we introduce in the next section.

One-dimensional directed trap model with slowly-varying tails
In this section, we describe the one-dimensional trap model with which we wish to compare our main model, and the results we are able to prove for it. To start with a formal definition, let $\tau = (\tau_x)_{x\in\mathbb{Z}}$ be a family of independent and identically-distributed strictly positive (and finite) random variables whose distribution has a slowly-varying tail, in the sense described by (1.1), built on a probability space with measure $P$; the sequence $\tau = (\tau_x)_{x\in\mathbb{Z}}$ will represent the trap environment. For a fixed bias parameter $\beta > 1$, the directed trap model is then the continuous-time Markov process $X = (X_t)_{t\ge0}$ with state space $\mathbb{Z}$, given by $X_0 = 0$ and with jump rates
$$c(x, x+1) := \frac{\beta}{(1+\beta)\tau_x}, \qquad c(x, x-1) := \frac{1}{(1+\beta)\tau_x},$$
and $c(x,y) = 0$ otherwise. To be more explicit, for a particular realisation of $\tau$ we will write $P^\tau_x$ for the law of the Markov chain with the above transition rates, started from $x$; similarly to describing $P^{T^*}_x$ in the previous section, we call this the quenched law for the directed trap model. The corresponding annealed law $\mathbb{P}_x$ is obtained by integrating out the environment similarly to (1.7), i.e.
$$\mathbb{P}_x(\cdot) := \int P^\tau_x(\cdot)\,dP.$$
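The directed trap model is straightforward to simulate via its embedded walk. The sketch below is our own, with two labeled assumptions: the jump rates are taken to give mean holding time $\tau_x$ at site $x$ and right-step probability $\beta/(1+\beta)$, and the environment tail is the concrete choice $P(\tau > u) = 1/\ln u$ from (1.2) with $\gamma = 1$.

```python
import math
import random

# A simulation sketch (our own, not the paper's code). Assumptions: the jump
# rates give the walk a mean holding time tau_x at site x and a right-step
# probability beta/(1+beta); the environment has tail P(tau > u) = 1/ln(u)
# (the gamma = 1 case of (1.2)), sampled by inversion as tau = exp(1/U).
random.seed(11)
beta = 3.0
p_right = beta / (1.0 + beta)

tau = {}  # the environment, sampled lazily site by site
def tau_at(x):
    if x not in tau:
        # capped at exp(700) to avoid float overflow on extreme draws
        tau[x] = math.exp(min(1.0 / random.random(), 700.0))
    return tau[x]

def hitting_time(n):
    # Accumulate holding times along the embedded walk until level n is hit.
    x, t, rights, steps = 0, 0.0, 0, 0
    while x < n:
        t += tau_at(x) * random.expovariate(1.0)  # Exp(mean tau_x) holding time
        step = 1 if random.random() < p_right else -1
        rights += (step == 1)
        steps += 1
        x += step
    return t, rights / steps

t_n, right_freq = hitting_time(50)
print(right_freq)  # empirical right-step frequency, close to 0.75
```

Even in such a short run, the hitting time is typically dwarfed by the contribution of the single deepest trap visited, which is the phenomenon the results of this section quantify.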
In studying the rate of escape of the above directed trap model, it is our initial aim to determine the rate of growth of
$$\Delta_n := \inf\{t \ge 0 : X_t = n\},$$
that is, the hitting times of level $n$ by the process $X$. The following theorem contains our main conclusion in this direction. As in the statement at (1.3), we define $L(x) = 1/\bar F(x)$.

Theorem 1.5 As $n \to \infty$, the laws of the processes
$$\left( \frac{L(\Delta_{\lfloor nt \rfloor})}{n} \right)_{t\ge0}$$
under $\mathbb{P}_0$ converge weakly with respect to the Skorohod $J_1$ topology on $D([0,\infty),\mathbb{R})$ to the law of $(m(t))_{t\ge0}$.

Similarly to [23, Remark 2.4], we note that the proof of the above result may be significantly simplified in the case when $\bar F$ decays logarithmically. The reason for this is that, in the logarithmic case, the hitting time $\Delta_n$ is very well-approximated by the maximum holding time within the first $n$ vertices, and so the functional scaling limit for $(\Delta_n)_{n\ge0}$ can be readily obtained from a simple study of the maximum holding time process. For general slowly-varying functions, the same approximation does not provide tight enough control on $\Delta_n$ to apply this argument, and so a more sophisticated approach is required.
As a simple corollary of Theorem 1.5, it is also possible to obtain a scaling result for the process $X$ itself. The definition of $m^{-1}$ should be recalled from (1.9). We similarly define the right-continuous inverse $\bar F^{-1}$ of $\bar F$, only with $>$ replaced by $<$.

Corollary 1.6 As $n \to \infty$, the laws of the processes
$$\left( \frac{X_{\bar F^{-1}((nt)^{-1})}}{n} \right)_{t\ge0}$$
under $\mathbb{P}_0$ converge weakly with respect to the Skorohod $M_1$ topology on $D([0,\infty),\mathbb{R})$ to the law of $(m^{-1}(t))_{t\ge0}$.

Remark 1.7 (i) Although the preceding corollary does look somewhat awkward, it becomes much clearer for concrete choices of $\bar F$. For example, if $\bar F$ has the form described at (1.2), then the above result concerns the distributional limit of
$$\left( \frac{X_{\exp((nt)^{1/\gamma})}}{n} \right)_{t\ge0}.$$
Moreover, it can be deduced from the above result that, as $t \to \infty$, the random variable $\bar F(t) X_t$ converges in distribution under $\mathbb{P}_0$ to $m^{-1}(1)$, which is easily checked to have a mean one exponential distribution.
(ii) In a number of places in the proofs of Theorem 1.5 and Corollary 1.6, we are slightly cavalier in assuming that $\bar F(\bar F^{-1}(x)) = x$ for $x \in (0,1)$. This is, of course, only true in general when $\bar F$ is continuous. In the case when this condition is not satisfied, however, we can easily overcome the difficulties that arise by replacing $\bar F$ with any non-increasing continuous function $\bar G$ that satisfies $\bar G(0) = 1$ and $\bar G(u) \sim \bar F(u)$ as $u \to \infty$; for example, one could define such a $\bar G$ by setting $\bar G(u)$ to be a suitable interpolation of $\bar F$.

The extremal aging result we are able to prove in this setting is as follows.
Remark 1.9 Note that if $\bar F$ (and the functions $\bar F_n$ introduced below at (2.6)) are not continuous and eventually strictly decreasing, a minor modification to the proof of the above result (cf. Remark 1.7(ii)) is needed.

Article outline and notes
The remainder of the article is organised as follows. In Sect. 2, we study the one-dimensional trap model introduced in Sect. 1.2 above, proving Theorem 1.5 and Corollary 1.6. In Sect. 3, we then adapt the relevant techniques to derive Theorem 1.1 and Corollary 1.2 for the Galton-Watson tree model. The arguments of both these sections depend on the extension of the limit at (1.3) that is proved in Sect. 5. Before this, in Sect. 4, we derive the extremal aging results of Theorems 1.4 and 1.8. Finally, as noted earlier, the appendix recalls some basic facts concerning Skorohod space. We finish the introduction with some notes about the conventions used in this article. Firstly, there are two widely used versions of the geometric distribution with a given parameter, one with support $0, 1, 2, \dots$ and one with support $1, 2, 3, \dots$. In the course of this work, we will use both, and hope that, even without explanation, it is clear from the context which version applies when. Secondly, there are many instances when for brevity we use a continuous variable where a discrete argument is required; in such places $x$, say, should be read as $\lfloor x \rfloor$. Finally, we recall that $f \sim g$ will mean $f(x)/g(x) \to 1$ as $x \to \infty$.

Directed trap model with slowly-varying tails
This section is devoted to the proof of Theorem 1.5 and Corollary 1.6. To this end, we start by deriving some slight adaptations of results from [33] regarding the trap environment. First, define a level $n$ critical depth for traps of the environment by setting
$$g(n) := \bar F^{-1}(n^{-1}\ln n). \qquad (2.1)$$
We will say that there are deep traps at the sites $D := \{x \in \mathbb{Z} : \tau_x > g(n)\}$, and consider the events $E_1(n,T)$ and $E_2(n)$, defined for $n \in \mathbb{N}$, $T \in (0,\infty)$, where $\kappa, \gamma \in (0,1)$ are fixed. The event $E_1(n,T)$ requires that the distance between any two deep traps in the interval $[1, nT]$ is greater than $n^\kappa$, and the event $E_2(n)$ will help to ensure that the time the process $X$ spends outside of the strictly positive integers is negligible.
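The critical depth (2.1) can be made explicit for a concrete tail. The following sketch is our own illustration, assuming $\bar F(u) = 1/\ln u$ (the $\gamma = 1$ case of (1.2)), for which $\bar F^{-1}(v) = e^{1/v}$ and hence $g(n) = e^{n/\ln n}$; a site is then deep with probability $\bar F(g(n)) = \ln(n)/n$, giving about $\ln n$ deep traps in $[1, n]$.

```python
import math
import random

# A sketch (our own) of the critical depth (2.1) for the concrete tail
# F-bar(u) = 1/ln(u) (gamma = 1 in (1.2)): then F-bar^{-1}(v) = exp(1/v),
# so g(n) = exp(n/ln n). A site is a deep trap with probability
# F-bar(g(n)) = ln(n)/n, so we expect about ln(n) deep traps in [1, n].
def g(n):
    return math.exp(n / math.log(n))

random.seed(3)
n = 5_000
threshold_log = n / math.log(n)  # tau_x > g(n) iff log tau_x = 1/U_x > n/ln n
deep = [x for x in range(1, n + 1) if 1.0 / random.random() > threshold_log]
print(len(deep))  # expected number of deep traps is ln(5000), about 8.5
```

With only of order $\ln n$ deep sites scattered over $[1, n]$, typical gaps between them are of order $n/\ln n$, which makes the polynomial separation demanded by $E_1(n,T)$ plausible.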
Lemma 2.1 For any $T \in (0,\infty)$, $\lim_{n\to\infty} P(E_1(n,T) \cap E_2(n)) = 1$.

Proof To check the result for $E_1(n,T)$, we simply observe that the probability of its complement converges to 0. Similarly, we have that $P(E_2(n)^c) \le n^{-1}(1 + (\ln n)^{1+\gamma})\ln n$, which also converges to 0.
We continue by introducing the embedded discrete-time random walk associated with $X$ and some of its properties, which will be useful throughout the remainder of the section. In particular, first let $S(0) = 0$ and $S(n)$ be the time of the $n$th jump of $X$; this is the clock process corresponding to $X$. The embedded discrete-time random walk is then the process $Y = (Y_n)_{n\ge0}$ defined by setting $Y_n := X_{S(n)}$. Clearly $Y$ is a biased random walk on $\mathbb{Z}$ under $P^\tau_0$ for $P$-a.e. realisation of $\tau$, and thus satisfies, $P^\tau_0$-a.s.,
$$\lim_{n\to\infty} \frac{Y_n}{n} = \frac{\beta - 1}{\beta + 1} > 0.$$
Whilst this result already tells us that the embedded random walk $Y$ drifts off to $+\infty$ and that the time it takes to hit level $n$, that is, $\Delta^Y_n := \inf\{m \ge 0 : Y_m = n\}$, is finite for each $n$, $P^\tau_0$-a.s., we further require that it does not backtrack too much, in the sense that, for each $T \in (0,\infty)$, the event that the backtracking before $\Delta^Y_{nT}$ is bounded by $(\ln n)^{1+\gamma}$ occurs with high probability. This is the content of the following lemma, which is essentially contained in [33, Lemma 3].
Let us now introduce the total time the biased random walk $X$ spends at a site $x \in \mathbb{Z}$,
$$T_x := \int_0^\infty \mathbf{1}_{\{X_t = x\}}\,dt.$$
To study this, first observe that the clock process $S = (S(n))_{n\ge0}$ can be written
$$S(n) = \sum_{i=0}^{n-1} \tau_{Y_i} e_i,$$
where $(e_i)_{i\ge0}$ is an independent sequence of mean one exponential random variables under $P^\tau_0$, independent of $Y$. Moreover, for $x \in \mathbb{Z}$, let $G(x) = \#\{n \ge 0 : Y_n = x\}$ be the total number of visits of the embedded random walk $Y$ to $x$. By applying the fact that $Y$ is a random walk with a strictly positive bias, we have that if $x \ge 0$, then $G(x)$ has the geometric distribution with parameter $p = (\beta-1)/(\beta+1)$ (again for $P$-a.e. realisation of $\tau$). It follows that $T_x$ is equal in distribution under $\mathbb{P}_0$ to the random variable
$$\tau_x \sum_{i=1}^{G(x)} e_i, \qquad (2.2)$$
which is almost-surely finite. We will use this characterisation of the distribution of $T_x$ to check that the time spent by $X$ in traps that are not deep is asymptotically negligible, in the sense described by the event $E_3(n,T)$, defined for $n \in \mathbb{N}$, $T \in (0,\infty)$. In particular, by similar arguments to [33, Lemma 4], we deduce the following.
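The geometric law of the visit count can be checked by simulation. The sketch below is our own illustration: for $\beta = 2$, the mean number of visits to the starting site should be $1/p = (\beta+1)/(\beta-1) = 3$; each run is truncated once the walk reaches level $+50$, an assumption that makes further returns to the origin negligible.

```python
import random

# A simulation sketch (our own, not the paper's code): for the beta-biased
# embedded walk on Z, the total number of visits G(0) to the starting site
# is geometric (support {1, 2, ...}) with parameter p = (beta-1)/(beta+1),
# so its mean is 1/p = (beta+1)/(beta-1); for beta = 2 this is 3. Each walk
# is truncated at level +50, after which returns to 0 are negligible.
random.seed(5)
beta = 2.0
p_right = beta / (1.0 + beta)

def visits_to_origin():
    x, visits = 0, 1  # the start itself counts as the first visit
    while x < 50:
        x += 1 if random.random() < p_right else -1
        visits += (x == 0)
    return visits

trials = 2_000
mean_visits = sum(visits_to_origin() for _ in range(trials)) / trials
print(mean_visits)  # should be close to (beta+1)/(beta-1) = 3
```

Combined with the representation (2.2), this shows $T_x$ is a geometric sum of exponentials scaled by $\tau_x$, so its tail is governed by that of $\tau_x$ up to slowly-varying corrections.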
To proceed, note that, on $E_3(n,T)$, we have a bound on the total time spent in shallow traps. Combining this bound with (2.3) and using Markov's inequality yields the required estimate. On recalling the conclusion of Lemma 2.2, this completes the proof.
As a consequence of the previous result, to deduce a scaling limit for the sequence $(\Delta_n)_{n\ge0}$, it will suffice to study sums of the form $\sum_{x=1}^n T_x \mathbf{1}_{\{\tau_x > g(n)\}}$. In fact, the backtracking result of Lemma 2.2 will further allow us to replace $T_x$ in this expression by $\tilde T_x$, the time spent at $x$ before the embedded random walk leaves the interval $[x - (\ln n)^{1+\gamma}, x + (\ln n)^{1+\gamma}]$. This is particularly useful because, by applying the fact that deep traps are separated by a distance that is polynomial in $n$ (see Lemma 2.1), it will be possible to decouple the random variables $(\tilde T_x \mathbf{1}_{\{\tau_x > g(n)\}})_{x\ge1}$ in such a way that enables us to deduce functional scaling results for their sums from those for independent sums proved in Sect. 5. Before commencing this program in Lemma 2.5, however, we derive a preliminary lemma that suitably describes the asymptotic behaviour of the distributional tail $\bar F_n$ of $\tilde T_x \mathbf{1}_{\{\tau_x > g(n)\}}$, introduced at (2.6). (Clearly, the definition of $\bar F_n$ is independent of the particular $x \ge 1$ considered.)

Lemma 2.4 For every $\varepsilon > 0$, there exists a constant $c$ such that, for any $u \ge c(g(n) \vee 1)$,
$$(1-\varepsilon)\bar F(u) \le \bar F_n(u) \le (1+\varepsilon)\bar F(u).$$
Proof For $x \ge 1$, let $\tilde G(x)$ be the total number of visits of the embedded random walk $Y$ to $x$ up until the first time after first hitting $x$ that it leaves the interval $[x - (\ln n)^{1+\gamma}, x + (\ln n)^{1+\gamma}]$. Then, similarly to (2.2), we have that $\tilde T_x$ is distributed as $\tau_x \sum_{i=1}^{\tilde G(x)} e_i$, and so we can use the independence of this sum and $\tau_x$ under $\mathbb{P}_0$ to decompose $\bar F_n(u)$ into the two terms of (2.7). The first term on the right-hand side of (2.7) is independent of $n$, and so it will be enough for our purposes to show that it converges to 0 as $u \to \infty$. To do this, first note that, by the monotonicity of $\bar F$, the lim sup as $u \to \infty$ of the term of interest is bounded above by two lim sups, the first of which is bounded above by $\mathbb{P}_0(\sum_{i=1}^{\tilde G(1)} e_i \le v_0)$. Furthermore, if $v_1$ is chosen to be no less than 1, then we can apply the bound at (2.4) to estimate $\bar F$.
For the second term on the right-hand side of (2.7), we apply (2.4) and Markov's inequality to deduce a bound that holds whenever $u \ge g(n)$, where $\varepsilon \in (0,1)$ is fixed. Thus, since this bound is small whenever $u/g(n)$ is large (it was already noted in the previous paragraph that the relevant sum of exponentials has a finite first moment), the proof is complete.

Lemma 2.5
As $n \to \infty$, the laws of the processes $\big(n^{-1} L\big(\sum_{x=1}^{\lfloor nt\rfloor} \tilde T_x \mathbf{1}_{\{\tau_x > g(n)\}}\big)\big)_{t\ge0}$ under $\mathbb{P}_0$ converge weakly with respect to the Skorohod $J_1$ topology to the law of $(m(t))_{t\ge0}$.

Proof First, fix $T \in (0,\infty)$ and suppose $(f_x)_{x\ge1}$ is a collection of bounded, continuous functions on $\mathbb{R}$. We then have an expansion of the relevant expectation in which the sums are over subsets $B \subseteq [1, nT]$ such that if $x_1, x_2 \in B$ and $x_1 \neq x_2$, then $|x_1 - x_2| > n^\kappa$. By applying the independence of traps at different sites and the disjointness of the intervals $([x - (\ln n)^{1+\gamma}, x + (\ln n)^{1+\gamma}])_{x\in B}$ for the relevant choices of $B$, the above sum can be rewritten in product form. In particular, it follows that the same expression is obtained when we suppose that, under $\mathbb{P}_0$, the pairs of random variables $(\hat T_x, \hat\tau_x)$, $x \ge 1$, are independent and identically-distributed as $(\tilde T_1, \tau_1)$, and the event $\hat E_1(n,T)$ is defined analogously to $E_1(n,T)$ from these random variables. Consequently, under $\mathbb{P}_0$, the laws of the two families of rescaled sums agree on $E_1(n,T)$ and $\hat E_1(n,T)$ respectively. By applying the conclusion of the previous paragraph, we obtain that, for any bounded function $H : D([0,T],\mathbb{R}) \to \mathbb{R}$ that is continuous with respect to the Skorohod $J_1$ topology, the difference of the corresponding expectations is bounded above by a multiple of $P(E_1(n,T)^c) + P(\hat E_1(n,T)^c)$. Since Lemma 2.1 tells us that this upper bound converges to 0 as $n \to \infty$, to complete the proof it will thus suffice to establish the result with $(\hat T_x, \hat\tau_x)_{x\ge1}$ in place of $(\tilde T_x, \tau_x)_{x\ge1}$. However, because we are assuming that $(\hat T_x, \hat\tau_x)_{x\ge1}$ are independent, the tail asymptotics proved in Lemma 2.4 allow us to derive the relevant scaling limit for the sums involving $(\hat T_x, \hat\tau_x)_{x\ge1}$ by a simple application of Theorem 5.1 (with $h_1(n) = \ln n$ and $h_2(n) = 0$).
We are now in a position to prove Theorem 1.5 by showing that the rescaled sums considered in the previous lemma suitably well approximate the sequence $(\Delta_n)_{n\ge1}$.
Proof of Theorem 1.5 Fix $T \in (0,\infty)$ and observe that, on the relevant events, the comparison at (2.8) holds. By reparameterising the time-scales in the obvious way, it is clear that the distance
$$d_{J_1}\left(\text{between the two rescaled processes}\right),$$
where $d_{J_1}$ is the Skorohod $J_1$ distance on $D([0,T],\mathbb{R})$ (as defined in the appendix at (6.1)), is bounded above by the expression at (2.9) for large $n$. (Note that the first term of that expression relates to the distortion of the time scale needed to compare the two processes.) By Lemma 2.5, this bound converges in distribution under $\mathbb{P}_0$ to $m(T) - m(T-\varepsilon)$. Now, in the limit as $\varepsilon \to 0$, $m(T) - m(T-\varepsilon)$ converges to 0 in probability. It readily follows that, as $n \to \infty$, so does the expression at (2.9). Hence, the theorem will follow from the lemmas of this section once the remaining approximation is checked.
To check this, we start by noting that Lemma 2.5 implies, for any $\lambda > 0$, a lower bound on the relevant limiting probability. By choosing $\lambda$ suitably large, the limiting probability can be made arbitrarily close to 1. Thus the problem reduces to showing that, for any $\lambda \in (0,\infty)$, a certain probability converges to 0 as $n \to \infty$, which follows from (1.1). Moreover, we also have a further bound, where the asymptotic equivalence is an application of (1.1). In particular, since the right-hand side above is equal to $\varepsilon$, which can be chosen arbitrarily small, the result follows.
From this, the proof of Corollary 1.6 is relatively straightforward.
Proof of Corollary 1.6 Define $X^* = (X^*_t)_{t\ge0}$ to be the running supremum of $X$, i.e. $X^*_t := \max_{s\le t} X_s$. Since $X^*_t \ge n$ if and only if $\Delta_n \le t$, we obtain that $(X^*_t + 1)_{t\ge0}$ is the inverse of $(\Delta_n)_{n\ge0}$ in the sense described at (1.9). Thus, because the inverse map is continuous with respect to the Skorohod $M_1$ topology (at least on the subset of functions $f \in D([0,\infty),\mathbb{R})$ that satisfy $\limsup_{t\to\infty} f(t) = \infty$, see [31]), it is immediate from Theorem 1.5 that, as $n \to \infty$, the laws of the rescaled processes built from $X^*$ under $\mathbb{P}_0$ converge weakly with respect to the Skorohod $M_1$ topology on $D([0,\infty),\mathbb{R})$ to the law of $(m^{-1}(t))_{t\ge0}$. Thus, to complete the proof, it will suffice to demonstrate that, for any $T \in (0,\infty)$, the supremum distance between the rescaled versions of $X$ and $X^*$ converges to 0 in $\mathbb{P}_0$-probability as $n \to \infty$. To do this, we first fix $T \in (0,\infty)$ and set $N := nT\ln(nT)$. Theorem 1.5 then implies a bound in which $Y^*$, the running supremum of $Y$, appears. Hence the required lim sup bound follows, where we have applied the fact that $\mathbb{P}_0(E_3(N,1)^c) \to 0$, which is the conclusion of Lemma 2.2, and also that $n^{-1}(\ln N)^{1+\gamma} \to 0$, which is clear from the definition of $N$.

Biased random walk on critical Galton-Watson trees
In this section, we explain how techniques similar to those of the previous section can be used to deduce the corresponding asymptotics for a biased random walk on a critical Galton-Watson tree conditioned to survive. Prior to proving our main results (Theorem 1.1 and Corollary 1.2), however, we proceed in the next two subsections to derive certain properties regarding the structure of the tree T * and deduce some preliminary simple random walk estimates, respectively. These results establish information in the present setting that is broadly analogous to that contained in Lemmas 2.1-2.4 for the directed trap model.

Structure of the infinite tree
A key tool throughout this study is the spinal decomposition of $T^*$ that appears as [24, Lemma 2.2], and which can be described as follows. First, $P$-a.e. realisation of $T^*$ admits a unique non-intersecting infinite path starting at the root. Conditional on this 'backbone', the numbers of children of vertices on the backbone are independent, each distributed as a size-biased random variable $\tilde Z$, which satisfies
$$P(\tilde Z = k) = k p_k, \qquad k \ge 1 \qquad (3.1)$$
(recall that $EZ = 1$). Moreover, conditional on the backbone and the number of children of each backbone element, the trees descending from the children of backbone vertices that are not on the backbone are independent copies of the original critical branching process $T$.
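The size-biased law is easy to sample for a concrete offspring distribution. The sketch below is our own illustration, assuming the mean-one geometric choice $p_k = 2^{-(k+1)}$, for which $E(Z^2) = 3$ and hence the size-biased mean is $E\tilde Z = E(Z^2) = 3$.

```python
import random

# A sketch (our own illustration) of the size-biased law: for a critical
# offspring distribution, P(Z-tilde = k) is proportional to k * p_k (and,
# since E[Z] = 1, the proportionality constant is 1). We assume the concrete
# choice p_k = 2^{-(k+1)}, a mean-one geometric law with E[Z^2] = 3, so the
# size-biased mean is E[Z-tilde] = E[Z^2] = 3.
random.seed(13)
K = 60  # truncation level; the neglected weight beyond K is below 2**-60
pk = [2.0 ** -(k + 1) for k in range(K)]
weights = [k * pk[k] for k in range(K)]  # size-biased weights k * p_k

samples = random.choices(range(K), weights=weights, k=5_000)
mean_sb = sum(samples) / len(samples)
print(mean_sb)  # should be close to 3
```

Note that size-biasing removes the value 0 entirely (the weight $0 \cdot p_0$ vanishes), which reflects the fact that a backbone vertex must have at least one child, namely its backbone successor.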
To fix notation and terminology for this decomposition, we will henceforth suppose that $T^*$ has been built by starting with a semi-infinite path, $\{\rho = \rho_0, \rho_1, \rho_2, \dots\}$; this will form the backbone of $T^*$. Then, after selecting $(\tilde Z_i)_{i\ge0}$ independently with distribution equal to that of $\tilde Z$, to each backbone vertex $\rho_i$ we attach a collection of 'buds' $\rho_{ij}$, $j = 1, \dots, \tilde Z_i - 1$. Finally, we grow from each bud $\rho_{ij}$ a 'leaf' $T_{ij}$, that is, a Galton-Watson tree with initial ancestor $\rho_{ij}$ and offspring distribution $Z$. See Fig. 1 for a graphical representation of these definitions. With this picture, it is clear how we can view $T^*$ as an essentially one-dimensional trap model, with the backbone playing the role of $\mathbb{Z}$ in the previous section. Rather than having an exponential holding time at each vertex $\rho_i$, however, we have a random variable representing the time it takes $X$ to leave the tree $T_i := \{\rho_i\} \cup (\cup_{j=1,\dots,\tilde Z_i - 1} T_{ij})$ starting from $\rho_i$. As will be made precise later, key to determining whether this time is likely to be large or not are the heights of the leaves connected to $\rho_i$. For this reason, the rest of this section will be taken up with an investigation into the big, or perhaps more accurately tall, leaves of $T^*$.
More concretely, we start by introducing a sequence of critical heights $(h_n)_{n\ge1}$ by setting $h_n := n(\ln n)^{-1}$ (roughly, $\beta^{h_n}$ will play the role that the $g(n)$ introduced at (2.1) did in the previous section), and define, for each $i \ge 0$,
$$N_n(i) := \#\left\{1 \le j \le \tilde Z_i - 1 : h(T_{ij}) \ge h_n\right\},$$
where $h(T_{ij})$ is the height of the tree $T_{ij}$, so that $N_n(i)$ counts the number of big leaves emanating from the backbone vertex $\rho_i$. The random variables in the collection $(N_n(i))_{i\ge0}$ are independent and identically-distributed. Moreover, it is possible to describe the asymptotic probability that one of these random variables is equal to zero, i.e. that there is no big leaf at the relevant site.
Proof By conditioning on the number of buds attached to the root, we can express $P(N_n(0) = 0)$ in terms of $q_k$, where, as introduced above (1.6), $q_k$ is the probability that an unconditioned branching process with offspring distribution $Z$ survives for at least $k$ generations. By the size-biasing of (3.1), this can be rewritten in terms of $f'$, the derivative of the generating function $f$, as defined at (1.5). From this, the proof is completed by recalling the tail decay at (1.6).
It will be important for our future arguments that the sites from which big leaves emanate are not too close together, and that there are no big traps close to ρ. The final lemma of this section demonstrates that the sequence of critical heights we have chosen achieves this.
Proof This is essentially the same as the proof of Lemma 2.1.

Initial random walk estimates
This section collects together some preliminary results for the biased random walk (X m ) m≥0 on T * , regarding in particular: the amount of backtracking performed by the embedded biased random walk on the backbone; the amount of time X spends in small leaves; the amount of time X spends close to the base of big leaves; and tail estimates for the amount of time X spends deep within big leaves.
To begin with, we introduce $Y = (Y_n)_{n\ge0}$ to represent the jump process of $\pi(X)$; that is, we let $S(n)$ be the time of the $n$th jump of $\pi(X)$ and define $Y_n := \pi(X_{S(n)})$. From this construction, it is clear that, under either the quenched or annealed law, $Y$ is simply a biased random walk on the semi-infinite line graph $\{\rho_0, \rho_1, \dots\}$, and so, as in the previous section, we can control the amount it backtracks. In particular, if we let $\Delta^Y_n := \inf\{m \ge 0 : Y_m = \rho_n\}$ be the first time that the embedded random walk $Y$ reaches level $n$ along the backbone, then we have the following result, which is simply a restatement of Lemma 2.2. We recall that $d_{T^*}$ is the shortest path graph distance on $T^*$.
Our next goal is to show that the time the biased random walk $X$ spends outside of the big leaves of $T^*$ is unimportant, where we define the set of vertices in big leaves to be the union of the leaves $T_{ij}$ satisfying $h(T_{ij}) \ge h_n$. Key to doing this is an expression for quenched expected occupation times, which is obtained by applying standard results for weighted random walks on graphs (cf. [6, Lemma 3.1]), where for a vertex $x \in T^*$ we define $\tau_x := \inf\{m \ge 0 : X_m = x\}$. For the statement of the next lemma, which is approximately analogous to Lemma 2.3, we recall the definition of $(\Delta_n)_{n\ge0}$ from (1.8).

Lemma 3.4
Let α ∈ (1, 2], T ∈ (0, ∞) and ε > 0. As n → ∞, Proof We start by estimating the quenched expectation of the time X spends in a particular small leaf before reaching level nT along the backbone. Thus, suppose we have a leaf T i j such that i < nT and h(T i j ) < h n . Starting from the vertex ρ i , the probability of hitting ρ i j before ρ nT can be computed exactly, by elementary means, as This means that the number of separate visits X makes to T i j is stochastically dominated by a geometric random variable with parameter 1−(2−β −1 ) −1 , and so its mean is bounded above by β/(β − 1). Moreover, the equality at (3.2) and our assumption on h(T i j ) imply that, on each visit to T i j , the amount of time X spends there is bounded above by where #T i j is the total number of vertices in T i j . Hence As for estimating the time spent at a vertex ρ i , where 0 < i < nT , we start by noting that the total number of returns to ρ i is a geometric random variable. Moreover, its parameter P T * ρ i (τ + ρ i = ∞), where τ + ρ i := inf{m > 0 : X m = ρ i } is the first return time to ρ i , can easily be bounded below by the probability that X jumps from ρ i to ρ i+1 on its first step times the probability that a biased random walk on Z never hits the vertex to the left of its starting point. Since the first of these quantities is given by β/(βZ i + 1) and the second is equal to 1 − β −1 , it follows that

(3.4)
A similar argument applies for i = 0.
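The 1 − β −1 escape probability used above can be checked numerically. The sketch below is ours (the function name and the truncation level N are illustrative, not from the paper): it iterates the harmonic equations for the probability that a β-biased walk on Z started at 0 hits level N before level −1, which tends to 1 − β −1 as N → ∞.

```python
def never_backtrack_prob(beta, N=60, sweeps=100000, tol=1e-12):
    """Approximate P(beta-biased walk started at 0 hits N before -1).

    The walk steps right with probability beta/(beta+1) and left with
    probability 1/(beta+1).  As N -> infinity this hitting probability
    tends to P(never hit -1) = 1 - 1/beta.
    """
    p = beta / (beta + 1.0)
    q = 1.0 - p
    # h[k] = P(hit N before -1 | start at state k-1); states -1,...,N
    h = [0.0] * (N + 2)
    h[N + 1] = 1.0
    for _ in range(sweeps):
        new = h[:]
        for k in range(1, N + 1):
            new[k] = p * h[k + 1] + q * h[k - 1]
        if max(abs(a - b) for a, b in zip(new, h)) < tol:
            h = new
            break
        h = new
    return h[1]  # starting state 0
```

For β = 2 the value agrees with 1/2 to well within the stated tolerance, matching the bound used for the return-time parameter above.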

Piecing together the estimates at (3.3) and (3.4), we thus obtain
where c β is a constant depending only on β. Now, to bound the summands, we consider the following probabilistic upper bound For the first of these terms, we apply the size-biasing of (3.1) and Markov's inequality to deduce Since the expectation in (3.7) is finite for any α′ ∈ (0, α − 1) (see [19, Section 35], for example), we fix an α′ in this range to obtain a polynomial bound for the relevant probability. For the second term of (3.6), we first condition on Z i to obtain From the proof of Lemma 3.1, we know that as k → ∞. To establish a bound for P(#T ≥ k) that decays polynomially quickly, first note that P(#T = k) = k −1 P(S k = −1), where (S k ) k≥0 is a random walk on Z with step distribution Z − 1 (see [13]). Moreover, by the local limit theorem of [19, Section 50], it is the case that P(S k = −1) ∼ c a k −1 , where the a k are the constants appearing in (1.4). Since a k ∼ k 1/α ℓ(k) for some slowly varying function ℓ (see [19, Section 35], for example), it follows that if α′′ ∈ (0, 1/α), then there exists a constant c such that P(#T ≥ k) ≤ ck −α′′ . Combining this estimate with (3.6), (3.7) and (3.8), we obtain that there exist constants c and δ > 0 such that (3.9) Consequently, recalling (3.5), and this converges to 0 as n → ∞.
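The identity P(#T = k) = k −1 P(S k = −1) used in the proof above can be verified exactly for a small offspring law. The following sketch is ours (the critical binary offspring distribution is just a convenient test case, not the paper's setting): it computes the total-progeny distribution both by direct recursion over the tree and via the random-walk identity.

```python
def progeny_pmf_direct(p, nmax):
    """P(#T = k), k <= nmax, for a Galton-Watson tree with offspring pmf p
    (a dict {value: probability}), via #T = 1 + (progeny of root's children)."""
    pmf = [0.0] * (nmax + 1)
    for _ in range(nmax + 1):  # each sweep accounts for one more generation
        new = [0.0] * (nmax + 1)
        for z, pz in p.items():
            conv = [1.0] + [0.0] * nmax  # progeny of 0 subtrees: mass at 0
            for _ in range(z):           # convolve z independent subtrees
                nxt = [0.0] * (nmax + 1)
                for a in range(nmax + 1):
                    if conv[a]:
                        for b in range(nmax + 1 - a):
                            nxt[a + b] += conv[a] * pmf[b]
                conv = nxt
            for s in range(nmax):
                new[s + 1] += pz * conv[s]
        pmf = new
    return pmf


def progeny_pmf_dwass(p, nmax):
    """The same pmf via P(#T = k) = (1/k) P(S_k = -1), where S_k is a
    random walk whose steps are distributed as Z - 1."""
    step = {z - 1: pz for z, pz in p.items()}
    dist = {0: 1.0}
    out = [0.0] * (nmax + 1)
    for k in range(1, nmax + 1):
        nxt = {}
        for s, ps in dist.items():
            for d, pd in step.items():
                nxt[s + d] = nxt.get(s + d, 0.0) + ps * pd
        dist = nxt
        out[k] = dist.get(-1, 0.0) / k
    return out
```

For the critical binary law P(Z = 0) = P(Z = 2) = 1/2, both routes give P(#T = 1) = 1/2, P(#T = 3) = 1/8 and P(#T = 5) = 1/16.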
The result above means that, in establishing the distributional convergence of n , we only have to consider the time the random walk X spends in big leaves. In fact, as we will now show, the time spent close to the backbone in big leaves is also negligible. To this end, let us start by introducing some notation and formalising some terminology. First, we will write y i j for the deepest vertex in T i j ; that is, the vertex that maximises the distance from the root ρ i j . So that this notion is well-defined, if there is more than one vertex at the deepest level of T i j , we choose y i j to be the first in the usual lexicographical ordering of T i j , assuming that the offspring of each vertex have been labelled according to birth order. If the tree T i j has height greater than or equal to h n , then for a fixed δ ∈ (0, 1) it is possible to define a unique vertex on the path from ρ i j to y i j at level h δ n in T i j . We shall denote this vertex x i j and call it the 'entrance' to the leaf T i j . When we say that the leaf T i j has been visited deeply, we will mean that X has hit x i j . Moreover, by the 'time spent in the lower part of the big leaves emanating from ρ i ', we will mean where T i j (x i j ) is the part of the tree T i j descending from the entrance x i j .
To control the random variables (t i ) i≥0 (which are identically-distributed apart from i = 0), we need to consider the structure of the trees T i j := T i j \T i j (x i j ), and for this, the construction of a Galton-Watson tree conditioned on its height given in [18] is helpful. In particular, in Sect. 2 of that article, the following algorithm is described. First, let (ξ n , ζ n ), n ≥ 0, be a sequence of independent pairs of random variables, with distribution given by (recall that q n = P(Z n > 0) is the probability that the unconditioned branching process survives for at least n generations) for 1 ≤ j ≤ k, where .
Then, let T̃ 0 be a Galton-Watson tree of height 0, i.e. consisting solely of a root vertex, and, to construct T̃ n+1 , n ≥ 0:
• let the first generation size of T̃ n+1 be ζ n+1 ,
• let T̃ n be the subtree founded by the ξ n+1 th first generation particle of T̃ n+1 ,
• attach independent Galton-Watson trees conditioned on having height strictly less than n to the ξ n+1 − 1 siblings to the left of the distinguished first generation particle,
• attach independent Galton-Watson trees conditioned on having height strictly less than n + 1 to the ζ n+1 − ξ n+1 siblings to the right of the distinguished first generation particle.
It is shown in [18] that the tree T̃ n that results from this procedure has the same probabilistic structure as T conditioned to have height exactly equal to n. Before considering the implications of this result for the times (t i ) i≥0 , we derive the asymptotics of the constants (c n ) n≥1 in our setting.
Proof First note that P(h(T ) = n) = q n − q n+1 . Moreover, if f (n) is the n-fold iteration of the generating function f , then we can write q n = 1 − f (n) (0). It follows that where we have applied (1.5) to deduce the third equality. Now, by (1.6), the second term on the right-hand side satisfies For the first term, again applying (1.5) and (1.6), it is the case that Multiplying the right-hand sides of (3.12) and (3.13) yields the result.
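The quantities q n = 1 − f (n) (0) appearing in this proof are easy to compute by iterating the generating function; P(h(T ) = n) is then q n − q n+1 . The sketch below is ours, with the critical geometric offspring law f (s) = 1/(2 − s), for which q n = 1/(n + 1) in closed form, serving only as an illustrative test case.

```python
def survival_probs(f, nmax):
    """q_n = P(Z_n > 0) = 1 - f^(n)(0): survival probabilities of a
    Galton-Watson process with offspring generating function f."""
    s = 0.0
    out = []
    for _ in range(nmax + 1):
        out.append(1.0 - s)  # out[n] = 1 - f^(n)(0) = q_n
        s = f(s)             # iterate the generating function once more
    return out
```

With the geometric test case, the computed q n match 1/(n + 1) to floating-point accuracy.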
Lemma 3.6 Let α ∈ (1, 2] and T ∈ (0, ∞). As n → ∞, Proof Our first aim will be to show that for h δ n < x < h n . Fix an x in this range, and suppose for the moment that h(T i j ) = h ≥ h n , so that x i j is defined. Denote the path from ρ i j to x i j by ρ i j = w 0 , w 1 , . . . , w h δ n = x i j . Now, remove the edges {w l−1 , w l }, l = 1, . . . , h δ n from T i j , and denote by T i jl the connected component containing w l , so that T i j (minus the relevant edges) is the disjoint union of T i jl over l = 0, . . . , h δ n − 1. From the procedure for constructing a Galton-Watson tree conditioned on its height described before Lemma 3.5, we deduce Moreover, if we suppose that T i j conditioned on its height being equal to h has been built from the random variables (ξ n , ζ n ), n ≥ 0, then we can write Thus, combining these deductions, we obtain and, since this bound is independent of h ≥ h n , the bound at (3.14) follows. Now, by arguing similarly to (3.5), it is possible to check that where c β is a constant depending only upon β. Thus, following the end of the proof of Lemma 3.4, Clearly the first term decays to zero, and, by applying (3.9), so does the second term.
To deal with the third term, observe that, under the convention that h( where we have used that f ′ (1 − x) ∼ 1 − αx α−1 L(x) as x → 0 + , which we first recalled in the proof of Lemma 3.1, and (1.6) again. Since the representation theorem for slowly varying functions ([29, Theorem 1.2], for example) implies that, for any ε > 0, for large n, it follows that P(max j=1,...,Z i −1: h(T i j )≥h n h(T i j ) ≥ h n /2) is asymptotically less than Finally, setting x = h n /2 in (3.14) and applying Lemma 3.5 yields for a suitable choice of constant c, and so, by adjusting c as necessary, we obtain that, for large n, Since this upper bound converges to 0 for any ε < α − 1, this completes the proof.
In deriving tail asymptotics for the time X spends in the big leaves emanating from a particular backbone vertex, it will be useful to have information about the set of big leaves that the biased random walk visits deeply before it escapes along the backbone, and the next two lemmas provide this. For their statement, we define the index set of big leaves emanating from ρ i by B i := { j = 1, . . . , Z i − 1 : h(T i j ) ≥ h n } and the subset of those that are visited deeply by X before it escapes a certain distance along the backbone by Proof The lemma readily follows from the symmetry of the situation, which implies that, starting from ρ i , the biased random walk X is equally likely to visit any one of x i j , j ∈ B i and z i first.
Although the above lemma might seem simple, it allows us to deduce the distributional tail behaviour of the greatest height of a big leaf at a particular backbone vertex visited by the biased random walk X . Note that we continue to use the notation q n = P(Z n > 0).

Lemma 3.8 Let α ∈ (1, 2] and i ≥ 0. For x ≥ h n ,
Proof Let x ≥ h n . By definition, we have that and decomposing the inner expectation over the possible values of V i yields Since j∈A 1 {h(T i j )<x} is a measurable function of T * , this can be rewritten as where the second equality is an application of Lemma 3.7. Now, since To continue, observe that, conditional on Z i , # B i is binomially distributed with parameters Z i − 1 and q h n . Consequently, the probability we are trying to compute is equal to We break this into two terms. Firstly, Secondly, Since taking the difference between (3.16) and (3.17) gives us (3.15), we have thus proved that where the second equality is a consequence of (1.5), and the lemma follows.
With these preparations in place, we are now ready to study the asymptotic tail behaviour of which can be interpreted as the length of time X spends deep inside leaves emanating from ρ i before escaping along the backbone. The next lemma gives an upper tail bound for this random variable. Lemma 3.9 Let α ∈ (1, 2] and ε > 0. There exists a constant c β,ε such that, for any i ≥ 0 and x satisfying ln x ≥ c β,ε h n , Proof First note that, by applying the commute time identity for random walks (for example, [27, Proposition 10.6]), we have that refers to the random walk on the tree T i j extended by adding the vertex ρ i and the edge {ρ i , ρ i j }. Since E Thus, since the random walk X spends no time in T i j (x i j ) if j ∉ V i , we can bound the quenched expectation of t i conditional on V i as follows: where υ i j is the number of passages X makes from ρ i to x i j before it hits z i , and the inequality at (3.18) is obtained by an application of the strong Markov property (that holds with respect to the unconditioned law). Now, υ i j is clearly bounded above by the total number of visits to ρ i , N (ρ i ) say, and, by symmetry, this latter random variable where we have applied Lemma 3.7 and the argument at (3.4) to deduce P T * ρ (#V i = k) = (# B i + 1) −1 and E T * ρ (N (ρ i )) ≤ c βZi , respectively. Applying the above bound in combination with (3.19) yields Thus, for η ∈ (0, 1/2), we can conclude where the value of c β has been updated from above and the constant δ is the one appearing in (3.9). We have also applied Lemma 3.8 in obtaining the final bound. Finally, (1.6) allows us to deduce from this that, as long as (1 − 2η) ln x/ ln β is sufficiently large, it holds that The result follows.
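The commute time identity invoked at the start of the proof above states that, for a weighted graph, E a τ b + E b τ a equals twice the total edge conductance multiplied by the effective resistance between a and b. The sketch below is ours (the conductance values are arbitrary illustrations): it checks the identity on a weighted path by computing expected hitting times from the linear system t i = 1 + Σ j P(i, j)t j .

```python
def expected_hitting_time(cond, a, b):
    """E_a[time to hit b] for the random walk on a path whose edge
    conductances are cond[i] on edge {i, i+1}; the walk jumps along an
    incident edge with probability proportional to its conductance.
    Solves t_i = 1 + sum_j P(i,j) t_j, t_b = 0, by Gauss-Jordan elimination."""
    n = len(cond)            # vertices 0..n
    size = n + 1
    A = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for i in range(size):
        if i == b:
            A[i][i] = 1.0    # enforces t_b = 0
            continue
        left = cond[i - 1] if i > 0 else 0.0
        right = cond[i] if i < n else 0.0
        tot = left + right
        A[i][i] = 1.0
        if i > 0:
            A[i][i - 1] = -left / tot
        if i < n:
            A[i][i + 1] = -right / tot
        rhs[i] = 1.0
    for col in range(size):  # Gauss-Jordan with partial pivoting
        piv = max(range(col, size), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(size):
            if r != col and A[r][col] != 0.0:
                fac = A[r][col] / A[col][col]
                for cc in range(col, size):
                    A[r][cc] -= fac * A[col][cc]
                rhs[r] -= fac * rhs[col]
    return rhs[a] / A[a][a]
```

On a path, the effective resistance between the endpoints is the sum of the edge resistances, so the commute time should equal 2 (Σ e c(e)) (Σ e 1/c(e)).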
We can also prove a lower bound for the distributional tail of t i that matches the upper bound proved above. Similarly to a proof strategy followed in [6], a key step in doing this is obtaining a concentration result to show that the time spent in a leaf visited deeply by the process X will be on the same scale as its expectation.

Lemma 3.10 Let α ∈ (1, 2] and ε > 0. There exist constants n 0 and c β,ε such that, for any i ≥ 0, n ≥ n 0 and x satisfying c β,ε h n ≤ ln x ≤ n 2 ,
Proof Our first goal is to derive an estimate on the lower tail of the time that X spends in a big leaf T i j before hitting ρ i , given that it starts at the entrance vertex x i j . To this end, we start by noting that under P T * x i j and conditional on the number of returns that the random walk X makes to T i j (x i j ) before hitting ρ i , i.e.
where x̄ i j denotes the parent of x i j , the random variable in question is a sum of independent copies of a random variable whose law is equal to that of τ x̄ i j under P T * x i j . (This is a simple application of the strong Markov property.) In particular, we have that and also To control the right-hand sides of these quantities, we will apply the following moment bounds: where the first moment lower bound is obtained by applying a formula similar to (3.2), and the second moment upper bound is an adaptation of a result derived in the proof of [6, Lemma 9.1]. As for the distribution of υ i j under P T * x i j , it is clear this is geometric, with parameter given by from which it follows that for n ≥ n 0 , where n 0 is a deterministic constant. Putting the above observations together, we deduce that, for n ≥ n 0 and ε > 0, where c is a constant depending only on β and n 0 (and not ε).
Applying a strong Markov argument for the unconditioned law (cf. (3.18)) yields that the law of τ x i j 0 →ρ i under P T * ρ (·| V i ) is the same as that of (as defined above with j = j 0 ) under P T * x i j 0 , and thus the result of the previous paragraph implies that, for n ≥ n 0 and ε > 0, Taking expectations with respect to P T * ρ and P establishes that the same is true when P T * ρ (·| V i ) is replaced by the annealed law P ρ . Consequently, for any n ≥ n 0 , ε > 0 and ln x ≥ h n ln β, where we have applied Lemma 3.8 to deduce the second inequality. Finally, fix η > 0. If we set ε = 1/(ln x) 2 , then the second term is bounded above by η/ ln x for any ln x ≥ η −1 . With this choice of ε, by (1.6), the fourth term is bounded above by , uniformly over ln x ≤ n 2 , and this completes the proof.
Finally for this section, we establish that the same distributional tail behaviour holds for the random variables (3.20) where i,(ln n) 1+γ is the first time after hitting ρ i that the process X hits a backbone vertex outside of the interval {ρ i−(ln n) 1+γ , . . . , ρ i+(ln n) 1+γ }. Given the backtracking result of Lemma 3.3, with high probability it is the case that t̃ i will be identical to t i for all relevant indices i. However, the advantage of the sequence (t̃ i ) over (t i ) is that, similarly to the sequence of random variables (T x ) introduced for the directed trap model at (2.5), at least when the traps are suitably well-spaced, it is possible to decouple the elements of (t̃ i ) in such a way as to be able to usefully compare them with an independent sequence.

Lemma 3.11
Let α ∈ (1, 2] and ε > 0. There exist constants n 0 and c β,ε such that, for any i ≥ 0, n ≥ n 0 and x satisfying c β,ε h n ≤ ln x ≤ n 2 , Proof If the process X does not hit ρ i−1−(ln n) 1+γ again after having hit ρ i , and does not hit ρ i again after having hit ρ i+1+(ln n) 1+γ , then t̃ i is equal to t i . Hence, An elementary calculation for the biased random walk on a line shows that the right-hand side here is equal to β 1−(ln n) 1+γ = o(n −2 ). Applying this fact, it is easy to deduce the result from Lemmas 3.9 and 3.10.
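The elementary calculation cited here can be sketched as follows (our reconstruction via the gambler's-ruin formula, not the paper's own display): a β-biased walk ever moves k steps to the left of its starting point with probability β −k , so each of the two backtracking events above has probability at most β −(ln n) 1+γ , and hence

```latex
\[
\beta^{1-(\ln n)^{1+\gamma}}
  = \beta \, \exp\!\bigl( -(\ln n)^{1+\gamma} \ln \beta \bigr)
  = \beta \, n^{-(\ln n)^{\gamma} \ln \beta}
  = o\bigl(n^{-2}\bigr),
\]
```

since (ln n) γ ln β → ∞, so that the exponent of n eventually exceeds any fixed power.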

Proof of main result for critical Galton-Watson trees
The purpose of this section is to complete the proof of our main results for biased random walks on critical Galton-Watson trees (Theorem 1.1 and Corollary 1.2).
Proof of Theorem 1.1 We start the proof by claiming that the conclusion of the theorem holds when the hitting time sequence ( n ) n≥0 is replaced by (Σ i=0 n−1 t̃ i ) n≥0 . By imitating the proof of Lemma 2.5 with Lemma 3.2 in place of Lemma 2.1, to verify that this is indeed the case, it will be enough to prove the same result for (Σ i=0 n−1 t̂ i ) n≥0 , where (t̂ i ) i≥0 is an independent sequence such that t̂ i ∼ t̃ 1+(ln n) 1+γ for each i. (Note that, because the elements of the sequence (t̃ i ) i≥0 are only identically-distributed for i ≥ 1 + (ln n) 1+γ , we do not take t̂ i ∼ t̃ i for each i. By applying the second part of Lemma 3.2, which shows that with high probability there will be no big leaves in the interval close to ρ, it is easy to adapt the argument of Lemma 2.5 to overcome this issue.) Since the tail asymptotics of Lemma 3.11 mean that the relevant functional scaling limit for (Σ i=0 n−1 t̂ i ) n≥0 is an immediate application of Theorem 5.1 (with h 1 (n) = ln n and h 2 (n) = n −1 ), our claim holds as desired. Now, fix T ∈ (0, ∞). By Lemmas 3.3, 3.4 and 3.6, with probability converging to one we have that, for every t ∈ [0, T ], By repeating the proof of Theorem 1.5 exactly with the particular choice L(x) := log + x, this, in conjunction with the conclusion of the previous paragraph, yields the result.
Proof of Corollary 1.2 Since the proof is identical to that of Corollary 1.6, with F̄(x) being taken to be a tail distribution function that is asymptotically equivalent to ln β/((α − 1) ln x), we omit it.

Growth rate of quenched mean hitting times
The purpose of this section is to compare the growth rate of E T * ρ n , that is, the quenched expectations of the hitting times n , with the growth rate of n that was established in the previous section. Interestingly, in the result corresponding to Theorem 1.1 (see Theorem 3.14 below), an extra factor of α appears, meaning that the sequence of quenched expectations grows more quickly than the hitting times themselves. This is primarily due to the fact that the quenched expectation E T * ρ n feels all the big leaves at a particular backbone vertex, whereas the hitting time n only feels the big leaves that are deeply visited by X . Indeed, the extra α is most easily understood by comparing the following lemma, which describes the height of the biggest leaf at a particular backbone vertex, with Lemma 3.8, which concerns only deeply visited big leaves.

Lemma 3.12 Let α ∈ (1, 2]. For any i ≥ 0,
Proof Conditioning on Z i , we obtain where we have once again applied the size-biasing of (3.1) to obtain the second equality. Since we know from the proof of Lemma 3.
In studying the quenched expectation of hitting times, we no longer need an argument that is so sophisticated as to consider the time spent in the individual leaves T i j (which were defined after (3.1)). Instead, we will be concerned only with understanding the expected length of time the biased random walk X spends inside sets of the form T i = {ρ i } ∪ (∪ j=1,...,Z i −1 T i j ). To this end, we introduce a stopping time The expected time spent by X inside T i on a single visit is thus given by E T * ρ i σ i . Similarly to (3.2), we have that and this allows us to obtain the following distributional asymptotics.

Lemma 3.13 Let α ∈ (1, 2]. For any i ≥ 0,
Proof If i ≥ 1, then from (3.21) we are easily able to deduce that where h(T i ) is the height of T i . Hence, for any η ∈ (0, 1), where we have applied (3.9) to deduce the second inequality for suitable constants c and δ > 0, and Lemma 3.12 and (1.6) to obtain the asymptotic equivalence. Since (3.22) in conjunction with Lemma 3.12 and (1.6) also implies that the result follows in this case. The argument for i = 0 is similar.
We are now ready to prove the main result of this section.
Proof The embedded random walk on the backbone Y visits each site ρ i , i ≥ 1, a geometric number of times with parameter (β − 1)/(β + 1), and ρ = ρ 0 a geometric number of times with parameter (β − 1)/β. Moreover, before visiting ρ n , Y has to visit each element of {ρ 0 , . . . , ρ n−1 } at least once. This and the definition of n imply that Now, the random variables E T * ρ i σ i in these sums are independent and have slowly varying tails, as described by Lemma 3.13. Thus the result is a simple consequence of [23, Theorem 2.1] (or Theorem 5.1 below).
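The geometric visit counts used in this proof can be checked numerically: each visit of the backbone walk to an interior site is its last with probability tending to (β − 1)/(β + 1), so the total number of visits is geometric with that parameter and mean (β + 1)/(β − 1). The sketch below is ours (the truncation parameter gap stands in for the distance to the target level and is illustrative).

```python
def mean_visits_interior(beta, gap=60, sweeps=100000, tol=1e-12):
    """Expected number of visits the beta-biased backbone walk makes to an
    interior backbone site before first reaching a site `gap` steps to its
    right.  Excursions into the finite stretch on the left return almost
    surely, so the per-visit escape probability is p times the chance that,
    from the right neighbour, the walk reaches level `gap` before returning;
    this tends to (beta-1)/(beta+1) as gap grows."""
    p = beta / (beta + 1.0)
    q = 1.0 - p
    # h[k] = P(hit level gap before level 0 | start at level k)
    h = [0.0] * (gap + 1)
    h[gap] = 1.0
    for _ in range(sweeps):
        new = h[:]
        for k in range(1, gap):
            new[k] = p * h[k + 1] + q * h[k - 1]
        if max(abs(a - b) for a, b in zip(new, h)) < tol:
            h = new
            break
        h = new
    escape = p * h[1]        # step right, then never come back
    return 1.0 / escape      # mean of the geometric visit count
```

For β = 2 this returns a value within 10⁻⁶ of (β + 1)/(β − 1) = 3.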

Remark 3.15
For comparison, recall the directed trap model of Sect. 2, but, so as to avoid having to consider the time that the biased random walk X spends at negative integers, replace Z by the half-line Z + . As in Theorem 1.5, we have that (n −1 L( nt )) t≥0 converges in distribution under the annealed law P 0 to (m(t)) t≥0 . For the corresponding quenched expectation, similarly to (3.23), we have that Thus, again applying [23, Theorem 2.1] (or Theorem 5.1 below), it is possible to check that (n −1 L(E τ 0 ( nt ))) t≥0 converges in distribution under P to (m(t)) t≥0 . In particular, in contrast to the critical Galton-Watson tree case, the asymptotic behaviour of E τ 0 ( n ) and n is identical. This is because, although certain big leaves will be avoided by certain realisations of the biased random walk in the tree setting, the geometry of the graph Z + forces X , when travelling from 0 to n, to visit all the traps in between on every realisation.

Extremal aging
In this section, we will prove Theorem 1.4 and Theorem 1.8, which state that the biased random walk on the critical Galton-Watson tree conditioned to survive and the one-dimensional trap model, respectively, experience extremal aging. The phenomenon we describe for these models is similar to what happens in the trapping models considered by Onur Gun in his PhD thesis [20] and to results observed for spin glasses in [7].

Extremal aging for the one-dimensional trap model
We start by considering the one-dimensional trap model introduced in Sect. 1.2, with the goal of this section being to prove Theorem 1.8. The reason for proving this result before its counterpart for trees is that the simpler argument it requires will be instructive when it comes to tackling the more challenging tree case in the subsequent section.
Key to proving Theorem 1.8 is establishing that X localises at the closest trap to 0 of a sufficient depth. To describe this precisely, as we do in Lemma 4.2 below, we first introduce the notation From the independent and identically-distributed nature of the environment, we readily deduce the following preliminary lemma. We now establish the relevant localisation result for X . Proof Our first aim is to show that X hits l(an) before time F̄ −1 (1/an) with high probability. Clearly, for any T > 0, we have that where c β is a constant depending only on β. By proceeding as in the proof of Lemma 2.3 with g(n) replaced by F̄ −1 (1/an(1 − ε)), it is possible to check that and so lim sup Since inf{t : X t = l(an)} is exponential with mean τ l(an) under P τ l(an) , for any ε > 0 the right-hand side here is bounded above by (1/an(1 + ε)).
Since ε was arbitrary, this completes the proof.

Extremal aging for the critical Galton-Watson tree model
We now return to the setting of Sect. 1.1, so as to prove Theorem 1.4. Similarly to the strategy of the previous section, we will show that the biased random walk on a critical Galton-Watson tree localises in the first suitably big leaf it visits deeply. To describe this, we introduce the notation: Whilst the form of the following lemma is similar to that of Lemma 4.1, we note that its proof is more involved. This is because, unlike the holding time means τ x used to define l there, the random variables max j∈V i h(T i j ) are not environment measurable or independent. Proof First, define Ṽ i to be the set of big leaves visited by X before the stopping time i,(ln n) 1+γ that was introduced at (3.20). Set H̃ i := max j∈Ṽ i h(T i j ) if Ṽ i ≠ ∅, and H̃ i = 0 otherwise; observe that if H̃ i > 0, then it is necessarily also the case that H̃ i ≥ h n . Moreover, for T ∈ (0, ∞) and ε ∈ (0, 1), let By proceeding as in the proof of Lemma 2.5, it is possible to show that, under P ρ , the random variables (H̃ i , N n (i)) 0≤i≤nT conditional on E 1 (n) have the same joint distribution as (Ĥ i , N̂ n (i)) 0≤i≤nT conditional on Ê 1 (n), where (Ĥ i , N̂ n (i)) i≥0 are independent copies of the pair of random variables (H̃ 1+(ln n) 1+γ , N n (1 + (ln n) 1+γ )) and Ê 1 (n) is defined analogously to E 1 (n) with the N n (i)s replaced by N̂ n (i)s. Consequently, if we set and define l̂(x) similarly from the random variables Ĥ i , then P ρ (l̃(an) = l̃(bn)) − P ρ (l̂(an) = l̂(bn)) where we have applied the fact that {l̃(an) = l̃(bn), l̃(an) ≤ nT } and {l̃(an) > nT } are both (H̃ i ) 0≤i≤nT measurable events. Now, similarly to the observation made in the proof of Lemma 3.11, if the process X does not hit ρ i−1−(ln n) 1+γ again after having hit ρ i , and does not hit ρ i again after having hit ρ i+1+(ln n) 1+γ (an event which has probability greater than 1 − o(n −2 ) uniformly in i), then H̃ i is equal to max j∈V i h(T i j ).
Hence, applying (1.6) and Lemma 3.8, we obtain that, for any x, ε > 0, for large n (uniformly in i), and clearly the same bound holds when H̃ i is replaced by Ĥ i . Applying the independence of the random variables (Ĥ i ) i≥0 , it follows that and also Combining these results with Lemma 3.2, which implies that P(E 1 (n) c ) → 0, and the estimate at (4.1), then letting T → ∞, yields Now, suppose that E 2 (n) is the event that the embedded random walk on the backbone Y does not backtrack more than (ln n) 1+γ before hitting ρ n(T +1) ; by Lemma 3.3, P ρ (E 2 (n)) → 1. Moreover, on the event E 2 (n), we have that H̃ i = max j∈V i h(T i j ) for i ≤ n(T + 1) − 1 − h δ n . In particular, for large enough n, if E 2 (n) holds and also l̃(an) ≤ nT , then it must be the case that l(an) = l̃(an). Hence, for large n, Similarly to above, we have that the first term here can be bounded above by the limsup as n → ∞ of which can be made arbitrarily small by choosing T suitably large. Hence lim n→∞ P ρ (l(an) ≠ l̃(an)) = 0.
The lemma follows by applying this in conjunction with (4.2).
Before proceeding to prove the analogue of Lemma 4.2 in the tree setting (see Lemma 4.5 below), we prove a preliminary estimate which rules out the possibility that any leaves have heights close to any particular level on the appropriate scale.
where c α is a constant depending only on α. The lemma readily follows. Proof Fix ε > 0, and let i 0 , j 0 be indices such that x i 0 j 0 is the first entrance to a big leaf with height greater than or equal to an(1 + ε)/ ln β visited by X (the relevant terminology was introduced just above (3.10)). If i 1 := l(an(1 + ε)) ≤ nT , j 1 ∈ V i 1 is such that h(T i 1 j 1 ) ≥ an(1 + ε)/ ln β and n is suitably large, then it must hold that where we recall that z i := ρ i+1+h δ n and note that the second inequality follows from the definition of V i . In particular, for large n, P ρ (τ x i 0 j 0 > n(T + 1)) ≤ P ρ (l(an(1 + ε)) > nT ) ≤ P ρ (l(an(1 + ε)) ≠ l̃(an(1 + ε))) + P ρ (l̃(an(1 + ε)) > nT ), where l̃(an) was defined in the proof of Lemma 4.3, and the upper bound here converges to 0 as n and then T tend to infinity. Consequently, by applying Lemmas 3.
Combining these observations yields We now check that i 0 , as defined in the previous paragraph, is, with high probability, equal to l(an). We continue to use the notation i 1 = l(an(1 + ε)). Firstly, observe that if i 0 < i 1 , then the definition of l(an(1 + ε)) implies that j 0 ∉ V i 0 , i.e. τ z i 0 ≤ τ x i 0 j 0 . Hence the process X must backtrack a distance 1 + h δ n along the backbone before hitting x i 0 j 0 . This implies that, for any T ∈ (0, ∞) where Y is the jump process on the backbone introduced in Sect. 3.2. By Lemma 3.3, the second term here vanishes as n tends to infinity, and it was already noted above that the first term converges to 0 as n and then T tend to infinity. Secondly, suppose i 0 > i 1 . Since by construction τ x i 0 j 0 < τ x i 1 j 1 < τ z i 1 for any j 1 ∈ V i 1 with h(T i 1 j 1 ) ≥ an(1 + ε)/ ln β (and such a j 1 must necessarily exist), we always also have that i 0 < i 1 + 1 + h δ n . In particular, in this situation, there exist two distinct backbone vertices from which big traps emanate within a distance h δ n of each other, and so As n tends to infinity, the second term converges to 0 by Lemma 3.2, and we deal with the first term in the same way as before, thus confirming that P ρ (i 0 ≠ i 1 ) converges to 0 as n tends to infinity. We deduce from this that lim sup Recalling from above (3.10) the definition of y i j (the deepest vertex of trap T i j ), it is straightforward to show that for some constant c 1 depending only on β; indeed, this is nothing more than a computation for a biased random walk on Z. Furthermore, another simple calculation for biased random walk on the line yields where τ + y i 0 j 0 is the time of the first return to y i 0 j 0 , so Consequently, lim sup n→∞ P ρ (π(X e an ) ≠ ρ l(an) ) ≤ lim sup ε→0 lim sup n→∞ (c 1 β −h δ n + c 3 e −aεn ) = 0, which completes the proof.

A limit theorem for sums of independent random variables with slowly varying tail probability
In this section, we derive the limit theorem for sums of independent random variables with slowly varying tail probability that was applied in the proofs of Lemma 2.5 and Theorem 1.1. The result we prove here is a generalisation of [23, Theorem 2.1].
Let (X i, j ) i, j∈N be non-negative random variables such that for each n ≥ 1, the elements of the collection (X n, j ) j∈N are independent and have common distribution function F n . Moreover, suppose F is a distribution function such that F̄(x) := 1 − F(x) is slowly varying and F̄(x) > 0 for all x > 0. Similarly writing F̄ n (x) := 1 − F n (x), the main assumption of this section is that for each ε > 0, there exist constants c 1 , c 2 such that → (m(t)) t≥0 (5.2) in distribution with respect to the Skorohod J 1 topology on D([0, ∞), R).
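The mechanism behind the extremal limit here is that, when F̄ is slowly varying, the sum S n is indistinguishable from its largest summand on the scale of L = 1/F̄. A toy illustration (ours, with the concrete choice F̄(x) = 1/ ln x for x > e, so that L(x) = ln x): since max i ln X i ≤ ln S n ≤ ln n + max i ln X i , the quantity L(S n ) is governed by the maximum alone, which is why an extremal process appears in the limit.

```python
import math
import random

def log_scale_comparison(n, seed=2024):
    """Sample n copies of X with tail bar-F(x) = 1/ln(x) for x > e
    (so L(x) = ln x), i.e. X = exp(1/U) with U uniform on (0,1], and
    compare L(S_n) = ln(S_n) with L(max_i X_i) = max_i ln(X_i).
    Everything is kept on the log scale to avoid overflow."""
    rng = random.Random(seed)
    logs = [1.0 / (1.0 - rng.random()) for _ in range(n)]  # ln X_i = 1/U_i
    m = max(logs)
    # ln(sum_i X_i) computed as a max-stabilised log-sum-exp
    lse = m + math.log(sum(math.exp(x - m) for x in logs))
    return m, lse
```

The two inequalities in the lead-in hold deterministically for every sample, so on the scale of ln the sum and the maximum agree up to an additive ln n.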
Remark 5.2 (i) Note that, similarly to Remark 1.9, if F̄ n and F̄ are not continuous and eventually strictly decreasing, a minor modification to the proof of the above result (cf. Remark 1.7(ii)) is needed.
(ii) The same conclusion holds if on the left-hand side of (5.2) we replace L by L n (x) = 1/F̄ n (x).
Thus, for δ ≤ t ≤ T and n ≥ n 3 (ω), we have S n,2 nt ≤ φ n (2n 2 a) i≤nt η n,i = φ n (2n 2 a)η( nt /n) ≤ φ n (n 2 m n (t) 2 )η(T ), (5.11) where the last inequality is due to (5.9). Now, noting that there are only finitely many t ∈ [0, T ] such that η(t) > a, there exists a random K 3 > 0 such that i≤nT 1 A 3,i ≤ K 3 for large n, almost-surely. Using this and the definition of m n (t), there exists a random n 4 ≥ 1 such that the following holds: S n,3 nt ≤ K 3 max i≤nt φ n (n 2 η n,i ) = K 3 φ n (n 2 m n (t) 2 ), ∀δ ≤ t ≤ T, n ≥ n 4 . (5.12) Combining (5.10), (5.11) and (5.12), we obtain i≤nt φ n (n 2 η n,i ) ≤ K 2 φ n (n 2 m n (t) 2 ), ∀δ ≤ t ≤ T, n ≥ n 2 ∨ n 3 ∨ n 4 =: n 0 , where K 2 = 1 + η(T ) + K 3 . Thus we have obtained the upper bound of (ii). Then, by definition, (ζ (n) n (t)) t≥0 is equal in law to ( 1 n L(S n nt )) t≥0 . Further, as discussed around (5.3), m n → m almost-surely with respect to the Skorohod J 1 topology. So, in order to complete the proof, it suffices to prove the following: for T > 0, To extend the above notions to D([0, ∞), R), we characterise convergence in the Skorohod J 1 (or M 1 ) topology on this space by saying f n → f if and only if f n → f with respect to the Skorohod J 1 (or M 1 ) topology on D([0, T ], R) for every T that is a continuity point of f . (These topologies can also be described by metrics; see [32, Section 3], for example.) In particular, to establish weak convergence of a random sequence (X n ) n≥1 to X with respect to the Skorohod J 1 (or M 1 ) topology on D([0, ∞), R), we require that (X n ) n≥1 converges weakly to X with respect to the Skorohod J 1 (or M 1 ) topology on D([0, T ], R) for every time T at which X is almost-surely continuous. Note that, since we only ever consider the limits (m(t)) t≥0 and (m −1 (t)) t≥0 , which are both continuous at each fixed T with probability 1, in our setting we are always required to check that the relevant weak convergence of processes holds in D([0, T ], R) for every time T .