Escape regimes of biased random walks on Galton–Watson trees

We study biased random walk on subcritical and supercritical Galton–Watson trees conditioned to survive in the transient, sub-ballistic regime. By considering offspring laws with infinite variance, we extend previously known results for the walk on the supercritical tree and observe new trapping phenomena for the walk on the subcritical tree which, in this case, always yield sub-ballisticity. This is contrary to the walk on the supercritical tree which always has some ballistic phase.


Introduction
In this paper, we investigate biased random walks on subcritical and supercritical Galton–Watson trees. These are a natural setting for studying trapping phenomena as dead-ends, caused by leaves in the trees, slow the walk. These models can be used to approach more difficult problems concerning biased random walks on percolation clusters (as studied in [11,13,24]) and random walk in random environment (see for example [18,25]). For a recent review of trapping phenomena and random walk in random environment we direct the reader to [3], which covers recent developments in a range of models of directionally transient and reversible random walks on underlying graphs such as supercritical GW-trees and supercritical percolation clusters.

Adam Bowditch, a.bowditch@warwick.ac.uk, University of Warwick, Coventry, UK
For supercritical GW-trees with leaves, it has been shown in [21] that, for a suitably large bias away from the root, the dead-ends in the environment create a sub-ballistic regime. In this case, it has further been observed in [4] that, if the offspring distribution has finite variance, then the walker follows a polynomial escape regime but cannot be rescaled properly due to a certain lattice effect. (In [5,15] it is shown that, in a related model where the conductance along each edge is chosen randomly according to a distribution satisfying a certain non-lattice assumption, the tail of the trapping time obeys a pure power law and the rescaled walk converges in distribution.) Here we show that, when the offspring law has finite variance, the walk on the subcritical GW-tree conditioned to survive experiences similar trapping behaviour to the walk on the supercritical GW-tree shown in [4]. However, the main focus of the article concerns offspring laws belonging to the domain of attraction of some stable law with index α ∈ (1,2). In this setting, although the distribution of time spent in individual traps has polynomial tail decay in both cases, the exponent varies with α in the subcritical case and not in the supercritical case. This results in a polynomial escape of the walk which is always sub-ballistic in the subcritical case, unlike the supercritical case which always has some ballistic phase.
We now describe the model of a biased random walk on a subcritical GW-tree conditioned to survive, which will be the main focus of the article. Let f(s) := Σ_{k≥0} p_k s^k denote the probability generating function of the offspring law of a GW-process with mean μ > 0 and variance σ² > 0 (possibly infinite) and let Z_n denote the nth generation size of a process with this law started from a single individual, i.e. Z_0 = 1. Such a process gives rise to a random tree where individuals in the process are represented by vertices and undirected edges connect individuals with their offspring.
For a fixed tree T let ρ denote its root, Z^T_n the size of the nth generation, ←x the parent of x ∈ T, c(x) the set of children of x, d_x := |c(x)| the out-degree of x, d(x, y) the graph distance between vertices x and y, |x| := d(ρ, x) the graph distance between x and the root, and T_x the descendant tree of x. A β-biased random walk on a fixed, rooted tree T is a random walk (X_n)_{n≥0} on T which is β-times more likely to make a transition to a given child of the current vertex than to the parent (which are the only options). That is, the random walk is the Markov chain started from X_0 = z defined by the transition probabilities

P^T(x, ←x) = 1/(1 + β d_x) and P^T(x, y) = β/(1 + β d_x) for each y ∈ c(x)

when x ≠ ρ, with the walk moving to a uniformly chosen child when at the root. We use P_ρ(·) := ∫ P^T_ρ(·) P(dT) for the annealed law obtained by averaging the quenched law P^T_ρ over a law P on random trees with a fixed root ρ. In general we will drop the superscript T and subscript ρ when it is clear to which tree we are referring and the walk starts at the root.
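To make the transition rule concrete, here is a minimal sketch of the quenched kernel at a non-root vertex; the function name and the `parent`/`children` labels are our own illustration, not notation from the paper.

```python
# Hypothetical sketch: one-step transition distribution of a beta-biased walk
# at a non-root vertex x with given parent and children. The walk is
# beta-times more likely to move to each child than to the parent.

def transition_probs(beta, parent, children):
    d = len(children)              # out-degree d_x
    z = 1 + beta * d               # normalising constant
    probs = {parent: 1 / z}        # P(x, parent) = 1/(1 + beta d_x)
    for c in children:
        probs[c] = beta / z        # P(x, child) = beta/(1 + beta d_x)
    return probs

p = transition_probs(2.0, "parent", ["c1", "c2", "c3"])
```

With β = 2 and out-degree 3 the normalising constant is 1 + 2·3 = 7, so the parent receives mass 1/7 and each child 2/7.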
We will mainly be interested in GW-trees T which survive, that is H(T ) := sup{n ≥ 0 : Z n > 0} = ∞. It is classical (e.g. [2]) that when μ > 1 there is some strictly positive probability 1 − q that H(T ) = ∞ whereas when μ ≤ 1 we have that H(T ) is almost surely finite. However, it has been shown in [17] that there is some well defined probability measure P over f -GW trees conditioned to survive for infinitely many generations which arises as a limit of probability measures over f -GW trees conditioned to survive at least n generations. Henceforth, we assume P is this law and X n is a random walk on an f -GW-tree conditioned to survive.
The main object of interest is |X_n|, that is, how the distance from the root changes over time. Since the typical finite branches in the tree are small and the walk does not backtrack too far, we shall see that |X_n| has a strong inverse relationship with the first hitting times

Δ_n := inf{m ≥ 0 : X_m ∈ Y, |X_m| = n}

of levels along the backbone Y := {x ∈ T : H(T_x) = ∞}, so for much of the paper we will consider these instead. It will be convenient to consider the walk as a trapping model. To this end we define the underlying walk (Y_k)_{k≥0} by Y_k := X_{η_k}, where η_0 := 0 and η_k := inf{m > η_{k−1} : X_m, X_{m−1} ∈ Y} for k ≥ 1.
When X_n is a walk on an f-GW tree conditioned to survive for f supercritical (μ > 1), it has been shown in [21] that ν(β) := lim_{n→∞} |X_n|/n exists P-a.s. and is positive if and only if μ^{-1} < β < f'(q)^{-1}, in which case we call the walk ballistic. Furthermore, although no explicit expression for the speed ν is known, a description of the invariant distribution of the environment seen from the particle is used in [1] to give an expression for the speed in terms of an annealed expectation. This expression coincides with the speed of the walk on a certain regular tree where each vertex has some number of children m_β; in particular, it can be seen that m_β ≤ μ, therefore the randomness of the tree slows the walk. If β ≤ μ^{-1} then the walk is recurrent because the average drift of Y acts towards the root. When β ≥ f'(q)^{-1} the walker expects to spend an infinite amount of time in the finite trees which hang off Y (see Fig. 5 in Sect. 10), thus causing a slowing effect which results in the walk being sub-ballistic. In this case, the correct scaling for some non-trivial limit is n^γ, where γ will be defined later in (1.1). In particular, it has been shown in [4] that, when σ² < ∞, the laws of |X_n| n^{-γ} are tight and, although |X_n| n^{-γ} doesn't converge in distribution, we have that Δ_n n^{-1/γ} converges in distribution under P along certain subsequences to some infinitely divisible law. In Sect. 10 we extend this result by relaxing the condition that the offspring law has finite variance, instead requiring only that it belongs to the domain of attraction of some stable law of index α > 1.
Recall that the offspring law of the process is given by P(ξ = k) := p_k; we then define the size-biased distribution by the probabilities P(ξ* = k) := k p_k μ^{-1}. It can be seen (e.g. [16]) that the subcritical (μ < 1) GW-tree conditioned to survive coincides with the following construction: starting with a single special vertex, at each generation let every normal vertex give birth to normal vertices according to independent copies of the original offspring distribution, and every special vertex give birth to vertices according to independent copies of the size-biased distribution, one of which is chosen uniformly at random to be special. Unlike the supercritical tree, which has infinitely many infinite paths, the backbone of the subcritical tree conditioned to survive consists of a unique semi-infinite path from the initial vertex ρ. We call the vertices not on Y which are children of vertices on Y buds, and the finite trees rooted at the buds traps (see Fig. 2 in Sect. 3). In this paper we consider walks with positive bias, therefore the walk is transient and only returns to the starting vertex ρ finitely often. Moreover, we are interested in the case where the trapping times are heavy tailed; since the traps are i.i.d., the walk then closely resembles a one dimensional directed trap model as studied in [26].
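As a quick numerical illustration of the size-biased law driving this construction, the sketch below (the geometric offspring law and truncation level are our own choices, not from the paper) checks that P(ξ* = k) = k p_k μ^{-1} is indeed a probability distribution.

```python
# Illustrative check (example distribution assumed, not from the paper):
# for a subcritical geometric offspring law p_k = (1-q) q^k, the size-biased
# weights k p_k / mu sum to 1. The support is truncated at K; the neglected
# tail mass is astronomically small.

q = 0.4                                    # mu = q/(1-q) = 2/3 < 1: subcritical
K = 200                                    # truncation level
p = [(1 - q) * q**k for k in range(K)]
mu = sum(k * pk for k, pk in enumerate(p))
size_biased = [k * pk / mu for k, pk in enumerate(p)]
total = sum(size_biased)
```

Note that `size_biased[0]` is 0, consistent with ξ* ≥ 1: a special (surviving) vertex always has at least one child.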
Briefly, the phenomena that can occur in the subcritical case are as follows. When E[ξ log^+(ξ)] < ∞ and μ < 1 there exists a limiting speed ν(β) such that |X_n|/n converges almost surely to ν(β) under P; moreover, the walk is ballistic (ν(β) > 0) if and only if 1 < β < μ^{-1} and σ² < ∞. This essentially follows from the argument used in [21] (to show the corresponding result on the supercritical tree) together with the fact that, by (2.1) and (5.2), the conditions given are precisely the assumptions needed for the expected time spent in a branch to be finite (see [8] or [9] for further detail). The sub-ballistic regime has four distinct phases. When β ≤ 1 the walk is recurrent and we are not concerned with this case here. When 1 < β < μ^{-1} and σ² = ∞ the expected time spent in a trap is finite and the slowing of the walk is due to the large number of buds. When βμ > 1 and σ² < ∞, the expected time spent in a subcritical GW-tree forming a trap is infinite because the strong bias forces the walk deep into traps and long sequences of movements against the bias are required to escape. In the final case for the subcritical tree (βμ > 1, σ² = ∞) slowing effects are caused by both the strong bias and the large number of buds. Figure 1 is the phase diagram, relative to β and μ, for the almost sure limit of log(|X_n|)/log(n) (which is the leading order polynomial exponent in the scaling of |X_n|), where the offspring law has stability index α (which is 2 when σ² < ∞) and we define

γ := log(f'(q)^{-1})/log(β) in the supercritical case and γ := log(μ^{-1})/log(β) in the subcritical case, (1.1)

where we note that f'(q) and μ are the mean numbers of offspring of vertices in traps of the supercritical and subcritical trees respectively. Strictly, f'(q) isn't a function of μ, therefore the line β = f'(q)^{-1} is not well defined; Fig. 1 shows the particular case when the offspring distribution belongs to the geometric family. It is always the case that f'(q) < 1, therefore some such region always exists; however, the parametrisation depends on the family of distributions.
When the offspring law has finite variance, the limiting behaviour of |X_n| on the supercritical and subcritical trees is very similar. Both have a regime with linear scaling (which is, in fact, almost sure convergence of |X_n|/n) and a regime with polynomial scaling caused by the same phenomenon of deep traps (which results in |X_n| n^{-γ} not converging). When the offspring law has infinite variance, the bud distribution of the subcritical tree has infinite mean, which causes an extra slowing effect that isn't seen in the supercritical tree. This accounts for the different exponents observed in the two models, as shown in Fig. 1. The walk on the critical (μ = 1) tree experiences a similar trapping mechanism to the subcritical tree; however, the slowing is more extreme and belongs to a different universality class which has been shown in [10] to yield a logarithmic escape rate.

Statement of main theorems and proof outline
In this section we introduce the three sub-ballistic regimes in the subcritical case and the one further regime for the infinite variance supercritical case that we consider here. We then state the main theorems of the paper.
The subcritical tree has bud distribution ξ* − 1, where P(ξ* = k) = k p_k μ^{-1}, which yields the following important property relating the size-biased and offspring distributions: for any non-negative function ϕ,

E[ϕ(ξ*)] = μ^{-1} E[ξ ϕ(ξ)]. (2.1)

Choosing ϕ to be the identity, we see that the size-biased distribution has finite mean if and only if the variance of the offspring distribution is finite. This causes a phase transition for the walk that isn't seen in the supercritical tree. The reason for this is that, in the corresponding decomposition of the supercritical tree, we have subcritical GW-trees as leaves but the number of buds is exponentially tilted and therefore retains its moment properties.
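The size-biased identity E[ϕ(ξ*)] = μ^{-1} E[ξ ϕ(ξ)] can be verified exactly on a toy offspring law; the three-point distribution and the choice ϕ(x) = x² below are our own example, not from the paper.

```python
# Finite-support sanity check of E[phi(xi*)] = E[xi * phi(xi)] / mu,
# with an assumed toy offspring law and phi(x) = x**2.

p = {0: 0.5, 1: 0.3, 2: 0.2}               # offspring law with mu = 0.7 < 1
mu = sum(k * pk for k, pk in p.items())
phi = lambda x: x**2

lhs = sum(phi(k) * (k * pk / mu) for k, pk in p.items())   # E[phi(xi*)]
rhs = sum(k * phi(k) * pk for k, pk in p.items()) / mu     # E[xi phi(xi)] / mu
```

Both sides evaluate to (0.3 + 1.6)/0.7, confirming the identity on this example.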
If the offspring law belongs to the domain of attraction of some stable law of index α ∈ (1, 2), then taking ϕ(x) = x·1_{{x ≤ t}} shows that the size-biased distribution belongs to the domain of attraction of some stable law with index α − 1 and allows us to obtain properties of the scaling sequences (see for example [12, IX.8]).
The first case we consider is when βμ < 1 but σ 2 = ∞; we refer to this as the infinite variance, finite excursion case: Definition 1 (IVFE) The offspring distribution has mean μ satisfying 1 < β < μ −1 and belongs to the domain of attraction of a stable law of index α ∈ (1, 2).
Under this assumption we let L vary slowly at ∞ such that

P(ξ ≥ x) ∼ x^{-α} L(x) (2.2)

as x → ∞, and choose (a_n)_{n≥1} to be some scaling sequence for the size-biased law such that for any x > 0, as n → ∞, we have P(ξ* ≥ x a_n) ∼ x^{-(α−1)} n^{-1}. Moreover, for some slowly varying function L̃ we have that a_n = n^{1/(α−1)} L̃(n). In this case the slowing is caused by the number of excursions into traps. Since β is small (i.e. less than μ^{-1}), the expected time spent in a trap is finite. The number of excursions the walk takes into a branch is of the same order as the number of buds; since the size-biased law has infinite mean there are a large number of buds and therefore a large number of excursions. The main result for IVFE is Theorem 1, which reflects that Δ_n scales similarly to the sum of independent copies of ξ*. Theorem 1 For IVFE, the laws of the process (Δ_{nt}/a_n)_{t≥0} converge weakly as n → ∞ under P, with respect to the Skorohod J_1 topology on D([0, ∞), R), to the law of an (α − 1)-stable subordinator R_t with Laplace transform E[exp(−λR_t)] = exp(−t C_{α,β,μ} λ^{α−1}), where C_{α,β,μ} is a constant which we shall determine during the proof (see (9.1)).
As for IVFE, in IVIE we let L vary slowly at ∞ such that (2.2) holds, and let (a_n)_{n≥1} be some scaling sequence for the size-biased law such that for any x > 0, as n → ∞, we have P(ξ* ≥ x a_n) ∼ x^{-(α−1)} n^{-1}. It then follows that a_n = n^{1/(α−1)} L̃(n) for some slowly varying function L̃. In FVIE and IVIE the slowing is caused by excursions into deep traps, because the walk is required to make long sequences of movements against the bias in order to escape. We shall see that only the depth H (and not the foliage) is important to the scaling. By comparison with the model in which we strip all of the branch except the unique self-avoiding path to the deepest point, we see that, by transience, the walk reaches the deepest point with positive probability and then takes a geometric number of short excursions with escape probability of the order β^{-H}. In particular, this means that the expected time spent in a branch of height H will cluster around β^H.
Intuitively, the main reason we observe different scalings in these two cases is the way the number of buds affects the height of the branch. The height of a GW-tree is approximately geometric; in particular, the tallest of n independent trees will typically be close to log(n)/log(μ^{-1}). In FVIE the number of buds has finite mean, therefore we see order n buds by level n, hence the tallest will have height close to log(n)/log(μ^{-1}). In IVIE the number of buds has infinite mean but belongs to the domain of attraction of some stable law. In particular, the number of buds seen by level n is equal in distribution to the sum of n independent copies of ξ* − 1 (which scales with a_n). It therefore follows that, in IVIE, the tallest tree up to level n will have height close to log(a_n)/log(μ^{-1}). Since only the deepest trees are significant and the time spent in a large branch clusters around β^H, we see that the natural scaling is β^{log(n)/log(μ^{-1})} = n^{1/γ} in FVIE and β^{log(a_n)/log(μ^{-1})} = a_n^{1/γ} in IVIE.
Since H is approximately geometric, β^H won't belong to the domain of attraction of any stable law. For this reason, as in [4], we only see convergence along specific increasing subsequences: n_l(t) := tμ^{-l} for t > 0 in FVIE, and n_l(t) such that a_{n_l(t)} ∼ tμ^{-l} for IVIE. Such a sequence exists for any t > 0 since, choosing n_l(t) := sup{m ≥ 0 : a_m < tμ^{-l}}, we have a_{n_l} < tμ^{-l} ≤ a_{n_l+1} and therefore 1 ≥ a_{n_l}/(tμ^{-l}) ≥ a_{n_l}/a_{n_l+1} → 1.
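The sandwich argument for n_l(t) can be illustrated numerically; in this sketch we take a_m to be exactly m^{1/(α−1)}, dropping the slowly varying factor, which is our own simplification for illustration.

```python
# Illustration of n_l(t) = sup{m >= 0 : a_m < t mu^{-l}} and the sandwich
# a_{n_l} < t mu^{-l} <= a_{n_l + 1}, with a_m taken as m^{1/(alpha-1)}
# (slowly varying factor ignored; an assumption for this sketch).

alpha, mu, t, l = 1.5, 0.5, 1.0, 20
a = lambda m: m ** (1 / (alpha - 1))      # here a_m = m**2

target = t * mu ** (-l)                   # t mu^{-l} = 2**20 = 1048576
n_l = 0
while a(n_l + 1) < target:                # largest m with a_m < target
    n_l += 1
```

Here n_l = 1023, since 1023² = 1046529 < 2²⁰ while 1024² = 2²⁰, so the ratio a_{n_l}/(tμ^{-l}) is already close to 1.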
Recalling (1.1), the main results for FVIE and IVIE are Theorems 2 and 3, which reflect slowing due to deep excursions.

Theorem 2
In FVIE, for any t > 0 we have that, as l → ∞, Δ_{n_l(t)} n_l(t)^{-1/γ} converges in distribution under P to an infinitely divisible law. We write r_n to be a_n in IVFE, n^{1/γ} in FVIE and a_n^{1/γ} in IVIE; then, letting r̄_n := max{m ≥ 0 : r_m ≤ n}, we will also prove Theorem 4. This shows that, although the laws of |X_n|/r̄_n don't converge in general (for FVIE and IVIE), the suitably scaled sequence is tight and we can determine the leading order polynomial exponent explicitly.
The final case we consider is an extension of a result of [4] for the walk on the supercritical tree. The argument used for the infinite variance case is broadly the same as in the finite variance case but needs some additional technical input. This is provided by three lemmas which we put aside until Sect. 10. For the same reason as in FVIE, we only see convergence along specific subsequences n_l(t) := t f'(q)^{-l} for t > 0.
Theorem 5 (Infinite variance supercritical case) Suppose the offspring law belongs to the domain of attraction of some stable law of index α ∈ (1, 2), has mean μ > 1, and the bias satisfies β > f'(q)^{-1}. Then Δ_{n_l(t)} n_l(t)^{-1/γ} converges in distribution under P as l → ∞ to R_t, where R_t is a random variable with an infinitely divisible law whose parameters are given in [4]. Moreover, the laws of (Δ_n n^{-1/γ})_{n≥0} and (|X_n| n^{-γ})_{n≥0} under P are tight on (0, ∞), and, P-a.s., log(|X_n|)/log(n) → γ.
The proofs of Theorems 1, 2 and 3 follow a similar structure to the corresponding proof of [4] which, for the walk on the supercritical tree, only considers the case in which the variance of the offspring distribution is finite. Because the variance is now infinite, the proofs of Theorems 1 and 3 become more technical in some places, specifically with regard to the number of traps in a large branch. The proof can be broken down into a sequence of stages which investigate different aspects of the walk and the tree. This is ideal for extending the result to the supercritical tree, because many of these behavioural properties are very similar for the walk on the subcritical tree due to the similarity of the traps.
In all cases it will be important to decompose large branches. In Sect. 3 we derive the distribution of the number of deep traps in any deep branch. This is only important for FVIE and IVIE, since there the depth of the branch plays a key role in decomposing the time spent in large branches. In Sect. 4 we determine conditions for labelling a branch as large in each of the regimes, chosen so that large branches are sufficiently far apart that, with high probability, the underlying walk won't backtrack from one large branch to the previous one. In Sect. 5 we justify this choice of label by showing that the time spent outside these large branches is negligible. From this we then have that Δ_n can be approximated by a sum of i.i.d. random variables whose distribution depends on n. In Sect. 6 we consider only IVFE and show that, under a suitable scaling, these variables converge in distribution, which allows us to show the convergence of their sum. Similarly, in Sect. 7 we show that the random variables, suitably scaled, converge in distribution for FVIE and IVIE. We then show convergence of their sum in Sect. 8. In Sect. 9 we prove Theorem 4, which is standard given Theorems 1, 2 and 3. Finally, in Sect. 10, we prove three short lemmas which extend the main result of [4] to prove Theorem 5. We require a lot of notation, much of which is very similar; a glossary follows Sect. 10 which includes most of the notation used repeatedly throughout.

Number of traps
In this section we show asymptotics for the probability that the height of a branch is large and use it to determine the distribution over the number of large traps in a large branch. Unless stated otherwise we assume μ < 1.
In the construction of the subcritical GW-tree conditioned to survive T described in the introduction, the special vertices form the infinite backbone Y = {ρ_0, ρ_1, . . .} consisting of all vertices with an infinite line of descent, where ρ_i is the backbone vertex in generation i. Each vertex ρ_i on the backbone is connected to buds ρ_{i,j} for j = 1, . . . , d_{ρ_i} − 1 (which are the normal vertices that are offspring of special vertices in the construction). Each of these is then the root of an f-GW tree T_{ρ_{i,j}}. We call each T_{ρ_{i,j}} a trap, and the collection from a single backbone vertex (combined with the backbone vertex) T^{*−}_{ρ_i} a branch. Figure 2 shows an example of the first five generations of a tree T. The solid line represents the backbone and the two dotted ellipses identify a sample branch and trap. The dashed ellipse indicates the children of ρ_1 which, since ρ_1 is on the backbone, are of a number distributed according to the size-biased law. It will be helpful throughout to work on a dummy branch which is equal in distribution to T^{*−}_{ρ_i} for any i; thus we define the following random tree.

Definition 4 (Dummy branch)
Define T^{*−} to be a random tree rooted at ρ with first generation vertices ρ_1, . . . , ρ_{ξ*−1} which are the roots of independent f-GW-trees (T^•_j)_{j=1}^{ξ*−1}, where ξ* is a size-biased random variable independent of the rest of the tree. Define T^• to be a dummy f-GW-tree.
The structure of the large traps will have an important role in determining the convergence of the scaled process. In this section we determine the distribution over  the number of deep traps rooted at backbone vertices with at least one deep trap. We will show that there is only a single deep trap at any backbone vertex when the offspring law has finite variance whereas, when the offspring law belongs to the domain of attraction of a stable law with index α < 2 we have that the number of deep traps converges in distribution to a certain heavy tailed law.
A fundamental result for branching processes (see, for example, [20]) is that, for μ < 1 and Z_n an f-GW process, the sequence P(Z_n > 0)/μ^n is decreasing; moreover, E[ξ log(ξ)] < ∞ if and only if the limit of P(Z_n > 0)μ^{-n} as n → ∞ exists and is strictly positive. This condition holds under any of our hypotheses, thus for this paper we will always make this assumption and let c_μ be the constant such that

P(Z_n > 0) ∼ c_μ μ^n as n → ∞. (3.1)

Recall that H(T) denotes the height of a tree T rooted at ρ. Denote

s_m := P(H(T^•) ≤ m − 1) (3.2)

to be the probability that a given trap is of height at most m − 1 (although in general we shall write s for convenience). Write N(m) := Σ_{j=1}^{ξ*−1} 1_{{H(T^•_j) ≥ m}} to be the number of traps of height at least m in the dummy branch; we are then interested in the limit as m → ∞ of P(N(m) = l | N(m) ≥ 1) for l ≥ 1. Recall that f is the p.g.f. of the offspring distribution and write f^{(k)} for its kth derivative. In particular, Lemma 3.1 shows that, when σ² < ∞, with high probability there will only be a single deep trap in any deep branch.
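The monotonicity of P(Z_n > 0)/μ^n and its convergence to c_μ can be checked numerically via the iteration f_n(0) = P(Z_n = 0); the sketch below uses a geometric offspring law, our own choice for concreteness, for which the limit happens to be c_μ = 1 − μ.

```python
# Numerical illustration (geometric offspring law assumed) that
# P(Z_n > 0)/mu^n is decreasing and converges to a constant c_mu, using
# P(Z_n = 0) = f_n(0) with f_n = f o f_{n-1}.

q = 0.4                                    # p_k = (1-q) q^k, mu = q/(1-q) = 2/3
mu = q / (1 - q)
f = lambda s: (1 - q) / (1 - q * s)        # p.g.f. of the geometric law

ratios = []
s = 0.0
for n in range(1, 40):
    s = f(s)                               # s = f_n(0) = P(Z_n = 0)
    ratios.append((1 - s) / mu ** n)       # P(Z_n > 0) / mu^n
```

For this family the extinction probabilities are explicit, P(Z_n > 0) = μ^n(1 − μ)/(1 − μ^{n+1}), so the ratios decrease from 0.6 towards c_μ = 1/3.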
By monotonicity in s, each summand in the denominator is increasing in s for s ∈ (0, 1) and, by L'Hôpital's rule, 1 − s^{k−1} ∼ (k − 1)(1 − s) as s → 1; therefore, by monotone convergence, the denominator in the final term of (3.6) converges to the same limit.
In order to determine the correct threshold for labelling a branch as large we will need to know the asymptotic form of P(N (m) ≥ 1). Corollary 3.2 gives this for the finite variance case.
The result then follows from the definitions of c_μ in (3.1) and s in (3.2).
We now consider the case when σ² = ∞ but ξ belongs to the domain of attraction of a stable law of index α ∈ (1, 2). The following lemma, concerning the form of the probability generating function of the offspring distribution, will be fundamental in determining the distribution of the number of large traps rooted at a given backbone vertex; in its statement, L varies slowly at ∞. The case μ = 1 appears in [7]; the proof of Lemma 3.3 is a simple extension of this, hence it is omitted.
When μ < 1 it follows that there exists a function L_1, varying slowly at ∞, governing the expansion of f near 1, where (μ)_l := Π_{j=0}^{l−1}(μ − j) is the Pochhammer symbol. Write L_2(x) := L_1(x^{-1}), which is slowly varying at 0. Using Theorem 2 of [19], we see that x g'(x) ∼ α g(x) as x → 0. Moreover, using an inductive argument in the proof of that result, it is straightforward to show that x g^{(l+1)}(x) ∼ (α − l) g^{(l)}(x) as x → 0 for all l ∈ N. Proposition 3.4 will be useful for determining the number of large traps in a large branch, but equally important is the asymptotic relation (3.9), which gives the tail behaviour of the height of a branch T^{*−}. By the assumption on ξ that (2.2) holds, we obtain the corresponding tail estimate as t → ∞. Using (3.2), (3.5), (3.7), (3.9) and (3.11), we then obtain (3.12).

Large branches are far apart
In this section we introduce the conditions for a branch to be large. This will differ in each of the cases however, since many of the proofs will generalise to all three cases, we will use the same notation for some aspects.
In IVFE we will have that the slowing is caused by the large number of traps. In particular, we will be able to show that the time spent outside branches with a large number of buds is negligible.

Definition 5 (IVFE large branch)
For ε ∈ (0, 1) write l_{n,ε} := a_{n^{1−ε}} and l^+_{n,ε} := a_{n^{1+ε}}; then we have that P(ξ* ≥ l_{n,ε}) ∼ n^{−(1−ε)}. We will call a branch large if the number of buds is at least l_{n,ε} and write D^{(n)} := {x ∈ Y : d_x > l_{n,ε}} to be the collection of backbone vertices which are the roots of large branches.
In FVIE we will have that the slowing is caused by excursions into deep traps.
Definition 6 (FVIE large branch)
For ε ∈ (0, 1) write h_{n,ε} := log(n^{1−ε})/log(μ^{−1}). We will call a branch large if there exists a trap within it of height at least h_{n,ε} and write D^{(n)} := {x ∈ Y : H(T^{*−}_x) > h_{n,ε}} to be the collection of backbone vertices which are the roots of large branches. By a large trap we mean any trap of height at least h_{n,ε}.
In IVIE we will have that the slowing is caused by a combination of the slowing effects of the other two cases. The height and number of buds in branches have a strong link which we show more precisely later; this allows us to label branches as large based on height which will be necessary when decomposing the time spent in large branches.

Definition 7 (IVIE large branch)
For ε ∈ (0, 1) write h_{n,ε} := log(a_{n^{1−ε}})/log(μ^{−1}) and h^+_{n,ε} := log(a_{n^{1+ε}})/log(μ^{−1}); then by (3.12), for C_D := (2 − α) c_μ^{α−1}, we obtain the corresponding asymptotic for the probability that a branch contains a deep trap. We will call a branch large if there exists a trap of height at least h_{n,ε} and write D^{(n)} := {x ∈ Y : H(T^{*−}_x) > h_{n,ε}} to be the collection of backbone vertices which are the roots of large branches. By a large trap we mean any trap of height at least h_{n,ε}.
We want to show that, asymptotically, the large branches are sufficiently far apart to ignore any correlation and therefore approximate n by the sum of i.i.d. random variables representing the time spent in a large branch. Much of this is very similar to [4] so we only give brief details.
Write D^{(n)}_m := {x ∈ D^{(n)} : |x| ≤ m} for the roots of large branches before level m, and let q_n := P(ρ ∈ D^{(n)}) be the probability that a branch is large; we then consider the event that the number of large branches by level nt doesn't differ too much from its expected value. Notice that in all three cases q_n is of the order n^{−(1−ε)}, thus we expect to see nq_n ≈ Cn^ε large branches by level n. We want to show that all of the large branches are sufficiently far apart that the walk doesn't backtrack from one to another. For t > 0 and κ ∈ (0, 1 − 2ε), write D(n, t) for the event that all large branches up to level nt are of distance at least n^κ apart and the root of the tree is not the root of a large branch. A union bound shows that P(D(n, t)^c) → 0 as n → ∞ uniformly over t in compact sets. We want to show that, with high probability, once the walk reaches a large branch it never backtracks to the previous one. For t > 0, consider the event that the walk never backtracks distance C log(n) before time Δ^Y_n, where Δ^Y_n := min{m ≥ 0 : Y_m = ρ_n}. For x ∈ T write τ^+_x := inf{n > 0 : X_n = x} to be the first return time of x. Comparison with a simple random walk on Z shows that, for k ≥ 1, the backtracking probability is P_{ρ_k}(τ^+_{ρ_{k−1}} < ∞) = β^{−1}; hence, using the strong Markov property and a union bound, the probability of ever backtracking distance C log(n) is negligible for C sufficiently large. Combining this with D(n, t), we have that with high probability the walk never backtracks from one large branch to a previous one.
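The comparison with a walk on Z can be made explicit by a finite gambler's-ruin computation; the interval construction and closed form below are our own check of the β^{-1} backtracking probability, not the paper's proof.

```python
# Gambler's-ruin check that a beta-biased walk on Z started at 1 hits 0 with
# probability 1/beta: on {0, ..., M} with up-probability beta/(1+beta), the
# absorption probability at 0 from state i is (r^i - r^M)/(1 - r^M) with
# r = 1/beta, which tends to r^i as M grows.

beta = 2.0
r = 1 / beta

def hit_zero_before_M(M, i=1):
    return (r**i - r**M) / (1 - r**M)

approx = hit_zero_before_M(60)            # essentially P(ever step back one level)
```

Consequently, backtracking a distance C log(n) has probability β^{-C log(n)} = n^{-C log(β)}, which a union bound makes summably small for C large.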

Time is spent in large branches
In this section we show that the time spent up to time n outside large branches is negligible. Combined with Sect. 4 this allows us to approximate n by the sum of i.i.d. random variables. We begin with some general results concerning the number of excursions into traps and the expected time spent in a trap of height at most m.
Recall that ρ i, j are the buds connected to the backbone vertex ρ i , that c(ρ i ) := {ρ i, j } j ∪ {ρ i+1 } is the collection of all offspring of ρ i and d ρ i = |c(ρ i )| is the number of offspring. We write W i, j := |{m ≥ 0 : X m−1 = ρ i , X m = ρ i, j }| to be the number of excursions into the jth trap of the ith branch where we set W i, j := 0 if ρ i, j doesn't exist in the tree. Lemma 5.1 shows that, conditional on the number of buds, the number of excursions follows a geometric law.
Moreover, under this law, (W_{i,j})_{j∈A} have a negative multinomial distribution with one failure until termination, where, with k = |A|, the probability that from ρ_i the walk escapes is p_0 = (β − 1)/((k + 1)β − 1) and the probability that the next excursion is into the jth trap is p_j = β/((k + 1)β − 1).
Proof From ρ_{i,j} the walk must return to ρ_i before escaping; therefore, since P_{ρ_{i,j}}(τ^+_{ρ_i} < ∞) = 1, any traps not in the set we consider can be ignored and it suffices to assume that A = {1, . . . , k}. By comparison with a biased random walk on Z, the probability of never entering a trap in the branch T^{*−}_{ρ_i} is (β − 1)/((k + 1)β − 1). Each excursion ends with the walker at ρ_i, thus the walk takes a geometric number of excursions into traps with escape probability (β − 1)/((k + 1)β − 1). The second statement then follows from the fact that the walker has equal probability of entering any of the traps.
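The arithmetic of Lemma 5.1 can be sanity-checked directly: with k traps the geometric escape probability gives mean number of excursions kβ/(β − 1). The helper names below are our own.

```python
# Sanity check of the Lemma 5.1 arithmetic: escape probability
# p = (beta-1)/((k+1)beta - 1), so the geometric number of excursions into
# traps (failures before escape) has mean (1-p)/p = k*beta/(beta-1).

def escape_prob(beta, k):
    return (beta - 1) / ((k + 1) * beta - 1)

def mean_excursions(beta, k):
    p = escape_prob(beta, k)
    return (1 - p) / p

beta, k = 2.0, 3
m = mean_excursions(beta, k)              # k*beta/(beta-1) = 6 for these values
```

Note the consistency check k = 0: the escape probability is then 1, i.e. with no traps the walk escapes immediately.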
For a fixed tree T with nth generation size Z^T_n, where Z^T_1 > 0, it is classical (e.g. [22]) that the expected duration of an excursion from the root can be written explicitly in terms of the weighted generation sizes β^n Z^T_n; we refer to this expression as (5.1). Let T^← be an f-GW tree rooted at ←ρ conditioned to have a single first generation vertex, which we label ρ. Notice that this has the same distribution as an f-GW-tree T^• to which we append a single ancestor of the root. For any m ≥ 1 we have that P(H(T^←) ≤ m) ≥ p_0; therefore, from (5.1), for some constant C we obtain the bound (5.2) on the expected time spent in a trap of height at most m. Recall that Δ^Y_n is the first hitting time of ρ_n for the underlying walk Y, and write A_3(n) for the event that level n is reached by time C_1 n by the walk on the backbone. Standard large deviation estimates yield that lim_{n→∞} P(A_3(n)^c) = 0. For the remainder of this section we mainly consider the case in which ξ belongs to the domain of attraction of a stable law of index α ∈ (1, 2). The case in which the offspring law has finite variance proceeds similarly; since the corresponding estimates are much simpler, we omit the proofs.
In IVIE and IVFE, for t > 0, let A_4(n, t) be the event that there are at most log(n) a_n buds by level nt. The variables d_{ρ_k} are i.i.d. with the law of ξ*; therefore the laws of the sums of the d_{ρ_k} scaled by a_n^{-1} are tight, and lim_{n→∞} P(A_4(n, t)^c) = 0. Let A_5(n, t) be the event that any trap is entered at most C_2 log(n) times. By Lemma 5.1 the number of entrances into ρ_{i,j} has the law of a geometric random variable of parameter p = (β − 1)/(2β − 1); hence, using a union bound, P(A_5(n, t)^c) is bounded by a term involving a slowly varying function L, so the bound converges to 0 for C_2 large and lim_{n→∞} P(A_5(n, t)^c) = 0. Propositions 5.2, 5.3 and 5.4 show that any time spent outside large traps is negligible. In FVIE and IVIE we only consider the large traps in large branches. Recall that D^{(n)} is the set of roots of large branches, and write K(n) for the set of vertices in large traps. In IVFE we require the entire large branch and instead write K(n) for the set of vertices in large branches. In either case we write χ_{t,n} := |{1 ≤ i ≤ nt : X_{i−1}, X_i ∈ K(n)}| to be the time spent up to nt in large traps.

Proposition 5.2
In IVIE, for any t, ε > 0 we have that as n → ∞:

Proof On A_4(n, t) there are at most a_n log(n) traps by level nt. We can order these traps, so write T^{(l,k)} to be the duration of the kth excursion into the lth trap and ρ(l) to be the root of this trap (that is, the unique bud in the trap). Here we consider an excursion to start from the bud and end at the last hitting time of the bud before returning to the backbone. Recall that on A_3(n) the walk Y reaches level n by time C_1 n and on A_5(n) no trap up to level n is entered more than C_2 log(n) times. Using the estimates on A_3, A_4 and A_5 we have that:

Since a_n^{1/γ} ≫ a_n log(n)^2, for n sufficiently large we have that, using Markov's inequality and (5.2) with m = h_{n,ε}, the second term can be bounded above by:

Combining constants and slowly varying functions into a single function L_{t,ε}, such that for any ε̃ > 0 we have L_{t,ε}(n) ≤ n^{ε̃} for n sufficiently large, we then have that:

which converges to 0 since α, 1/γ > 1. Using A_3, A_5 and the form of A_4 for FVIE, the technique used to prove Proposition 5.2 extends straightforwardly to prove Proposition 5.3, therefore we omit the proof.

Proposition 5.3 In FVIE, for any t, ε > 0 we have that as n → ∞:
Similarly, we can show a corresponding result for IVFE.
Proof We begin by bounding the total number of traps in small branches. Recall from Definition 5 that l_{n,ε} ≤ a_{n^{1−ε}}. Let c ∈ (0, 2 − α); then, by Markov's inequality and the truncated first moment asymptotic as x → ∞ for some constant C (see for example [12], IX.8), for n large:

where L_t(n) varies slowly at ∞. This converges to 0 as n → ∞. We can order the traps in small branches and write T^{(l,k)} to be the duration of the kth excursion in the lth trap not in a large branch, where we consider an excursion to start and end at the backbone. Using A_3 and A_5 to bound the time taken by Y to reach level nt and the number of entrances into traps up to level nt, we have for n suitably large:

Using Markov's inequality on the final term yields:

for some L_{t,ε} varying slowly at ∞. This converges to 0 as n → ∞, hence the result holds.
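The Feller-type truncated-moment estimate invoked above can be illustrated numerically (this is our own sketch, not the paper's computation): for a power-law tail P(ξ > x) ≍ x^{−α} with α ∈ (1, 2), the truncated second moment E[ξ² 1_{ξ≤x}] grows like x^{2−α}, so doubling the truncation level multiplies it by roughly 2^{2−α}.

```python
import numpy as np

# Discrete power-law pmf p_k proportional to k^-(alpha+1), so the tail
# P(xi > x) decays like x^-alpha with alpha in (1,2).
alpha = 1.5
N = 200_000
k = np.arange(1, N + 1, dtype=np.float64)
pk = k ** -(alpha + 1)
pk /= pk.sum()

def trunc_second_moment(x: int) -> float:
    """E[xi^2 1_{xi <= x}], computed exactly from the pmf."""
    return float((k[:x] ** 2 * pk[:x]).sum())

x = 100_000
ratio = trunc_second_moment(2 * x) / trunc_second_moment(x)
print(ratio, 2 ** (2 - alpha))  # the two values nearly agree
```

The regular-variation index 2 − α of the truncated moment is exactly what produces the slowly varying corrections L_t appearing in the bounds above.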
Recall that we write r_n to be a_n in IVFE, n^{1/γ} in FVIE and a_n^{1/γ} in IVIE. Since Δ_{nt} − χ_{t,n} is non-negative and non-decreasing in t, we have that sup_{0≤t≤T} |Δ_{nt} − χ_{t,n}| = |Δ_{nT} − χ_{T,n}|, therefore Corollary 5.5 follows from Propositions 5.2, 5.3 and 5.4.
Write χ^i_n to be the total time spent in large traps of the ith large branch, where ρ^+_i is the element of D^{(n)} which is ith closest to ρ. Notice that, whereas χ_{t,n} only accumulates time up to reaching ρ_{nt}, each χ^i_n may have contributions at arbitrarily large times. Recall that A^{(0)}_2(n, t) is the event that the walk never backtracks distance C log(n) along the backbone from a backbone vertex up to level nt. On this event, on D(n, t), the J_1 distance between the two sums in the above expression can be bounded above by C log(n). The random variables χ̃^i_n are independent copies (under P) of times spent in large branches; moreover, on D(n, t), ρ ∉ D^{(n)}, therefore they are identically distributed. Let Ẽ denote expectation on the enlarged space.

Lemma 5.6
In each of IVFE, FVIE and IVIE:

Proof By definition of d_{J_1}, the distance in statement 1 is equal to:

For m ∈ N let λ_n(m/n) := |D^{(n)}_m|(nq_n)^{−1}, then define λ_n(t) by the usual linear interpolation. It follows that |D^{(n)}_{nt}| = λ_n(t)nq_n and the above expression can be bounded above by:

Let A^{(i)}_2(n, t) be the analogue of A^{(0)}_2(n, t) for the ith copy, and let D(n, t) be the event that ρ is not the root of a large branch, on each of the first ntq_n copies the walk never backtracks distance C log(n), and large branches are of distance at least n^κ apart.
which converges to 0 as n → ∞ for C large, by the same argument as (4.3); moreover P(D(n, T)^c) → 0.
Using Corollary 5.5 and Lemma 5.6, in order to show the convergence of Δ_{nt}/r_n it suffices to show the convergence of the scaled sum of independent random variables χ_{t,n}/r_n.

Excursion times in dense branches
In this section we only consider IVFE. The main tool will be Theorem 6, which is Theorem 10.2 in [4], and is itself a consequence of Theorem IV.6 in [23].
then the following statements are equivalent:

In our case, n(t) will be the number of large branches up to level nt and {R_k} will be independent copies of the time spent in a large branch.
Since we are now working with i.i.d. random variables, we simplify notation by considering the dummy branch T^*_− defined in Definition 4, which has root ρ and whose first generation vertices have the multinomial distribution determined in Lemma 5.1; that is, W_j represents the number of excursions into the jth trap of T^*_−. For the biased random walk X_n on T^*_− started from ρ, let T_{j,k} denote the duration of the kth excursion in the jth trap, where we recall that in IVFE the excursion starts and ends at the root ρ. We then have that the resulting total excursion time is equal in distribution under P(·|ξ* > l_{n,ε}) to χ̃^i_n under P for any i. For K ≥ l_{n,ε} − l_{n,0} write L_K := l_{n,0} + K, then denote P_K(·) := P(·|ξ* − 1 = L_K), with the analogous conditioning for the environment law. We now proceed to show that ζ^{(n)} under P_K converges in distribution to some random variable Z_∞ whose distribution doesn't depend on K.
We start by showing that the excursion times T_{j,k} don't differ greatly from E_{T^*_−}[T_{j,k}]. In order to do this we require moment bounds on T_{j,k}; however, since E[ξ²] = ∞, we don't have finite variance of the excursion times and thus require a more subtle treatment. Recall that for a tree T we denote by Z^T_n the size of the nth generation. Excursion times are first return times τ^+_ρ conditioned on the first step, therefore pruning buds and using (5.1) we have that the expected excursion time is:

Using that P(Z^{T_•}_n > 0) ∼ c_μ μ^n (from (3.1)) we see that for n large there are no traps of height greater than C log(n) for some constant C; thus for our purposes it will suffice to study sup_n Z^{T_•}_n β^n.

Lemma 6.1 Let Z_n be a subcritical Galton–Watson process with mean μ and offspring law ξ. Then:

where the inequality follows by convexity of f(x) = x^{1+ε}. From this it follows that for ε ∈ (0, α − 1):

Fix λ = (μ/β)^{1/2}; then μ < λ and for ε > 0 sufficiently small λβ^{1+ε} < 1. By dominated convergence E[ξ^{1+ε}] < λ for all ε small. In particular, β^{1+ε}E[ξ^{1+ε}] < 1 for suitably small ε and therefore (Z_n β^n)^{1+ε} is a supermartingale.
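The two ingredients of the supermartingale argument, the convexity bound E[(ξ_1 + · · · + ξ_z)^{1+ε}] ≤ z^{1+ε} E[ξ^{1+ε}] and the criterion β^{1+ε}E[ξ^{1+ε}] < 1, can be checked by exact enumeration on a toy offspring law. The sketch below uses illustrative numbers of our own choosing (an offspring law on {0, 1, 2} and a bias β = 1.3), not parameters from the paper.

```python
import itertools

# Hypothetical subcritical offspring law xi on {0,1,2}; illustrative values.
support = [0, 1, 2]
probs = [0.5, 0.3, 0.2]
mu = sum(s * p for s, p in zip(support, probs))           # mean = 0.7
beta, eps = 1.3, 0.1                                       # bias and small eps
m_eps = sum((s ** (1 + eps)) * p for s, p in zip(support, probs))

# Supermartingale criterion from Lemma 6.1: beta^(1+eps) E[xi^(1+eps)] < 1.
assert beta ** (1 + eps) * m_eps < 1

# Convexity step: E[(xi_1+...+xi_z)^(1+eps)] <= z^(1+eps) E[xi^(1+eps)],
# verified by exact enumeration over all outcomes for small z.
for z in range(1, 5):
    lhs = 0.0
    for combo in itertools.product(range(len(support)), repeat=z):
        p, s = 1.0, 0
        for i in combo:
            p *= probs[i]
            s += support[i]
        lhs += p * s ** (1 + eps)
    assert lhs <= z ** (1 + eps) * m_eps + 1e-12
print("criterion and convexity bound verified")
```

Given the criterion, E[(Z_{n+1}β^{n+1})^{1+ε} | Z_n] ≤ (Z_n β^n)^{1+ε} · β^{1+ε}E[ξ^{1+ε}] ≤ (Z_n β^n)^{1+ε}, which is the supermartingale property used with Doob's inequality in Lemma 6.2.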

Lemma 6.2
In IVFE, we can choose ε > 0 such that for any t > 0 there exists a constant C_t such that:

Proof Consider the event that none of the first m trees has height greater than C log(m); the probability of its complement is
≤ Cm^{−c} for m sufficiently large. By Lemma 6.1 we have that (Z_k β^k)^{1+ε} is a supermartingale for ε > 0 sufficiently small, where Z_n is the process associated to T_•; therefore by Doob's supermartingale inequality:

Using the expression (6.4) for the expected excursion time, it follows that:

In particular, for some slowly varying function L, consider the event that no trap is of height greater than C log(m) and the expected excursion time in any trap is at most m^{1−κ}; for m sufficiently large by (6.5) this holds with high probability. Consider then the event that no trap is of height greater than C log(m), entered more than C log(m) times, or has expected excursion time greater than m^{1−κ}. By a union bound and the geometric distribution of W_j from Lemma 5.1:

for some slowly varying function L. Here the first inequality comes from Chebyshev's inequality and the second holds due to (6.6). Since > 0 we can choose ε ∈ (0, /2); then:

In particular, this holds for m = L_K ≥ a_{n^{1−ε}}; thus, since α < 2, sup_{K ≥ −(a_n − l_{n,ε})} of the above is bounded above by C_t n^{−2ε} for n large whenever ε < (2 − α)/(α − 1).
Using this we can now show that the average time spent in a trap indeed converges to its expectation.

Lemma 6.3
In IVFE, we can find ε > 0 such that for sufficiently large n we have that:

Proof We continue using the notation defined in Lemma 6.2 and also define the event that the jth trap isn't tall, isn't entered many times, and doesn't have large expected excursion time.
we have that the summand on the right-hand side doesn't have zero mean, thus we perform the splitting:

By Chebyshev's inequality and the tail bound:

for some slowly varying function L. By (6.7), the second term is equal to:

The final term can be written as:

which converges to 0 as m → ∞ by dominated convergence since, by (5.2), E[T_{1,1}] < ∞. The statement then holds by setting m = L_K.
Recall from (6.3) that, under P_K, ζ^{(n)} is the average time spent in a trap of a branch with ξ* − 1 = L_K buds. From Lemmas 6.2 and 6.3 we have that as n → ∞:

Corollary 6.4
In IVFE, we can find ε > 0 such that for sufficiently large n we have that sup_{K ≥ −(a_n − l_{n,ε})}:

Proof By Lemma 5.1 the sum of the W_j has a geometric law. In particular, for some constant C independent of K:

It therefore follows that the laws of ζ^{(n)} converge under P_K to an exponential law. In particular, using Lemmas 6.2 and 6.3 with the above bound, we have the result since L_K ≥ l_{n,ε} ≫ n^ε.

Corollary 6.5 In IVFE, for any
Lemma 6.6 shows that the product of an exponential random variable with a heavy-tailed random variable has a similar tail to the heavy-tailed variable.

Lemma 6.6 Let X ∼ Exp(θ) and let ξ be an independent variable which belongs to the domain of attraction of a stable law of index α ∈ (0, 2). Then P(Xξ > x) ∼ θ^{−α}Γ(α + 1)P(ξ > x) as x → ∞.
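Lemma 6.6 is an instance of Breiman's lemma, P(Xξ > x) ∼ E[X^α]P(ξ > x), with E[X^α] = θ^{−α}Γ(α + 1) for X ∼ Exp(θ). For the pure Pareto case P(ξ > u) = u^{−α}, u ≥ 1, conditioning on ξ and substituting v = x/u gives the exact identity P(Xξ > x) = αx^{−α} ∫_0^x e^{−θv} v^{α−1} dv, which the sketch below (our own check, not from the paper) evaluates numerically and compares with the asymptotic constant.

```python
import math

def tail_product(theta: float, alpha: float, x: float, n: int = 200_000) -> float:
    """P(X*xi > x) for X ~ Exp(theta) independent of Pareto xi with
    P(xi > u) = u^(-alpha), u >= 1.  Uses the exact integral representation
    alpha * x^(-alpha) * int_0^x e^(-theta v) v^(alpha-1) dv, evaluated by a
    composite trapezoid rule (the integrand is negligible beyond v ~ 50/theta)."""
    V = min(x, 50.0 / theta)
    h = V / n
    total = 0.0
    for i in range(n + 1):
        v = i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(-theta * v) * v ** (alpha - 1)
    return alpha * x ** (-alpha) * total * h

theta, alpha, x = 2.0, 1.5, 1000.0
lhs = tail_product(theta, alpha, x)
rhs = theta ** (-alpha) * math.gamma(alpha + 1) * x ** (-alpha)  # E[X^alpha] P(xi > x)
print(lhs / rhs)  # close to 1 for large x
```

As x → ∞ the integral tends to Γ(α)/θ^α, recovering the constant θ^{−α}Γ(α + 1) in the lemma.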
By the definition of a_n we have that P_>(ξ* − 1 ≥ a_n) = P(ξ* ≥ a_n)/P(ξ* ≥ a_{n^{1−ε}}) ∼ n^{−ε}. (6.10) Conditional on the number of buds ξ*, the number of excursions W_j into the jth trap is independent of the excursion times T_{j,k}, and both the number of excursions and the excursion times have finite mean, hence E_>[(χ̃_n/a_n)1_{ξ*−1<a_n}] = a_n^{−1}

where the asymptotic holds as n → ∞ by (5.4). In particular, by combining this with (6.10) in (6.9), we have that M_λ n Var_{P_>}((χ̃_n/a_n)1_{χ̃_n/a_n ≤ τ}) ≤ C(τ² + τ) for some constant C depending on λ; hence, as τ → 0+, we indeed have convergence to 0 and therefore the first condition holds.
We now move on to the Lévy spectral function L_λ. Clearly for x < 0 we have that L_λ(x) = 0, since χ̃_n is a positive random variable. It therefore suffices to consider x > 0. By Corollary 6.4 the scaled time spent in a large trap ζ^{(n)} (from (6.3)) converges in distribution to an exponential random variable Z_∞ with parameter θ (which is independent of K); therefore, since M_λ:

where the final asymptotic holds by Lemma 6.6 and because:

which converges to 0 as n → ∞ since l_{n,ε} = a_{n^{1−ε}} (and therefore a_n/l_{n,ε} ≫ n^ε).
It now suffices to show that n^ε(P_>(χ^∞_n/a_n > x) − P_>(χ̃_n/a_n > x)) converges to 0 as n → ∞. To do this we condition on the number of buds. We consider positive and negative K separately. For K ≥ 0 we have that:

By (6.10), n^ε P_>(ξ* − 1 ≥ a_n) converges as n → ∞; hence, using Corollary 6.4, (6.11) converges to 0. For K ≤ 0, by Corollary 6.4 we have that:

For some constant C we have that P(ξ* − 1 ≥ l_{n,ε}) ∼ Cn^{−(1−ε)}, thus by (5.4):

In particular, since r(n) = o(1), we indeed have that this converges to zero and thus we have the required convergence for L_λ.
Finally, we consider the drift term d_λ. Since ∫_{0<x≤τ} x dL_λ(x) < ∞ we have that:

We want to show that d_λ = ∫_0^∞ x/(1 + x²) dL_λ(x), thus we need to show that the other terms cancel. By definition of P_> we have that:

By Lemma 6.6, (ξ* − 1)Z_∞ belongs to the domain of attraction of a stable law of index α − 1 and satisfies the scaling properties of ξ* (up to a constant factor). Therefore, using that a_n ≫ l_{n,ε}, we have that:

Using the form of the Lévy spectral function we have that:

thus it remains to show that n^ε(E_>[(χ^∞_n/a_n)1_{χ^∞_n/a_n ≤ τ}] − E_>[(χ̃_n/a_n)1_{χ̃_n/a_n ≤ τ}]) converges to 0. Similarly to the previous parts, we condition on ξ* − 1 = L_K and consider the sums over K positive and negative separately. For K ≤ 0:

By definition of l_{n,ε} and properties of stable laws, n^ε E[((ξ* − 1)/a_n)1_{ξ* ≤ a_n}]/P(ξ* ≥ l_{n,ε}) converges to some constant as n → ∞. By Corollary 6.5 we therefore have that this converges to 0. Similarly for K ≥ 0 we have that:

We have that n^ε P(ξ* ≥ l_{n,0})/P(ξ* ≥ l_{n,ε}) converges to some constant as n → ∞. The result then follows by Corollary 6.5.
This shows the convergence result of Theorem 1 in the sense of finite dimensional distributions. In Sect. 9 we prove a tightness result which concludes the proof.

Excursion times in deep branches
In this section we decompose the time spent in large branches. In FVIE this is very similar to the decomposition used in [4] and we won't consider the argument in great detail. The decomposition required in IVIE, however, requires greater delicacy. In Lemmas 7.1, 7.2 and Proposition 7.3 we use a construction, from [14], of a GW-tree conditioned on its height to show that the time spent in deep traps essentially consists of a geometric number of excursions from the deepest point in the trap to itself. That is, as in [4], excursions which don't reach the deepest point are negligible, as is the time taken for the walk to reach the deepest point from the root of the trap, and the time taken to return to the root from the deepest point when this happens before returning to the deepest point.
In the remainder of the section we show that, conditional on the exact height H of the branch, the time spent in the branch scaled by β^H converges in distribution along the given subsequences. In Lemma 7.5 we determine an important asymptotic relation for the distribution of the number of buds conditional on the height of the branch. In Lemmas 7.6–7.9 we provide various bounds which allow us, in Proposition 7.10, to show that the excursion time in a large branch is close to the random variable Z^n_∞ (defined in (7.23)), which removes some of the dependency on n.
The main result of the section is Proposition 7.14 which shows that the scaled time spent in a large branch converges in distribution along the given subsequences. As a prelude to this we prove Lemmas 7.11-7.13 which show that we can reintroduce small traps into the branch and that the height of a trap is sufficiently close to a geometric random variable. We then conclude the section by showing that the scaled excursion times can be dominated by some random variable with a certain moment property which will be important in Sect. 8.
Recall that T_• is an f-GW-tree and H(T_•) is its height; then, following the notation of [4], we denote by (φ_{n+1}, ψ_{n+1})_{n≥0} a sequence of i.i.d. pairs with joint law:

for k = 1, 2, . . . and j = 1, . . . , k. Under this law ψ_{n+1} has the law of the degree of the root of a GW-tree conditioned to be of height n + 1, and φ_{n+1} has the law of the index of the first root-offspring whose subtree has height exactly n. We then construct a sequence of trees recursively as follows. Set T^≺_0 = {δ}, then:
1. Let the first generation of T^≺_{n+1} be of size ψ_{n+1}.
2. Attach T^≺_n to the φ_{n+1}th first generation vertex of T^≺_{n+1}.
3. Attach f-GW-trees conditioned to have height at most n − 1 to the first φ_{n+1} − 1 vertices of the first generation of T^≺_{n+1}.
4. Attach f-GW-trees conditioned to have height at most n to the remaining ψ_{n+1} − φ_{n+1} first generation vertices of T^≺_{n+1}.
Under this construction T^≺_{n+1} has the distribution of an f-GW-tree conditioned to have height exactly n + 1. Write δ_0 = δ for the deepest point of the tree and, for n = 1, 2, . . ., write δ_n for the ancestor of δ at distance n. The sequence δ_0, δ_1, . . . forms a 'spine' from the deepest point to the root of the tree. We denote by T^≺ the tree asymptotically attained. By a subtrap of T^≺ we mean some vertex x on the spine together with a descendant y off the spine and all of the descendants of y. This is itself a tree with root x, and we write S_x for the collection of subtraps rooted at x. We denote by S_{n,j,1} the jth subtrap conditioned to have height at most n − 1 attached to δ_n, and by S_{n,j,2} the jth subtrap conditioned to have height at most n attached to δ_n.
Recall that d(x, y) denotes the graph distance between vertices x and y; then, for k = 1, 2, let

λ_{n,j,k} := 2 Σ_{x ∈ S_{n,j,k}\{δ_n}} β^{d(x,δ_n)}

denote the weight of S_{n,j,k} under the invariant measure associated to the conductance model with conductances β^{i+1} between levels i and i + 1, the roots of S_{n,j,k} (spinal vertices) denoting level 0. We then write λ_n to denote the total weight of the subtraps of δ_n; the corresponding weighted sum is then the expected time R_∞ taken for a walk on T^≺ started from δ to return to δ.
Using that conditioning the height of a GW-tree T_• to be small only decreases the expected generation sizes, and that μβ > 1, by (5.1) we have, for some constant c and with Z_k the generation sizes of T_•:

Summing over j in (7.1) shows that P(ψ_{n+1} = k) = P(Z_1 = k|H(T_•) = n + 1). Recalling that s_n = P(H(T_•) < n):

By (3.1), 1 − s_{n+1} ∼ cμ^n for some positive constant c. Let ε > 0 be such that 1 − ε − μ(1 + ε) > 0; then for n large we have that (1 − ε)cμ^n ≤ 1 − s_{n+1} ≤ (1 + ε)cμ^n. Therefore, for some positive constant C:

In particular, when σ² < ∞, there exists some constant c such that:

where the final inequality comes from the fact that (1 − s_k)(1 − s)^{−1} is increasing in s and converges to k for any k ≥ 1. It therefore follows that E[λ_n] ≤ C(βμ)^n, so indeed
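The tail asymptotic 1 − s_{n+1} ∼ cμ^n used above can be checked directly on an explicit subcritical example (our own illustration, not from the paper). For geometric offspring p_k = (1 − q)q^k with mean μ = q/(1 − q) < 1, the generating function is f(s) = (1 − q)/(1 − qs) and t_n := 1 − s_n = P(H(T_•) ≥ n) satisfies the exact recursion t_{n+1} = q t_n/(1 − q + q t_n), so t_{n+1}/t_n → μ and t_n/μ^n → c_μ > 0.

```python
# Iterate the survival-probability recursion for a subcritical GW tree with
# geometric offspring; track the ratio t_{n+1}/t_n and the scaled value t_n/mu^n.
q = 0.4
mu = q / (1 - q)        # mean offspring = 2/3 < 1 (subcritical)
t = 1.0                 # t_0 = P(H >= 0) = 1
ratios, scaled = [], []
for n in range(60):
    t = q * t / (1 - q + q * t)      # t_{n+1} = 1 - f(1 - t_n), in closed form
    ratios.append(t / (scaled[-1] * mu ** n if scaled else 1.0) if False else t)
    scaled.append(t / mu ** (n + 1))
# recompute the successive ratios cleanly
ratios = [scaled[i] * mu ** (i + 1) / (scaled[i - 1] * mu ** i) for i in range(1, 60)]
print(ratios[-1], scaled[-1])  # ratio -> mu, scaled value stabilises at c_mu
```

Working with t_n directly (rather than s_n = 1 − t_n) avoids the catastrophic cancellation that occurs once s_n is numerically close to 1.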

When ξ has infinite variance but belongs to the domain of attraction of a stable law, by (3.9) as n → ∞ we have that E[ψ_{n+1}] ∼ cμ^{n(α−2)}L_2(μ^n). Combining this with (7.3) and (7.4), and using (7.2) with C chosen sufficiently large, we have that the expected time taken for a walk started from the deepest point in a trap (of height H) to return to the deepest point is bounded above by E[R_∞] < ∞ independently of its height. Recall that τ^+_x is the first return time to x. The following lemma gives the probabilities of reaching the deepest point in a trap, escaping the trap from the deepest point, and the transition probabilities for the walk in the trap conditional on reaching the deepest point before escaping. The proof is straightforward by comparison with the biased walk on Z with nearest-neighbour edges, so we omit it.

Lemma 7.2 For any tree T of height H + 1 (with H ≥ 1), root ρ and deepest vertex δ we have that
the first is the probability of reaching the deepest point without escaping, and the second is the probability of escaping from the deepest point before returning. Moreover, the third is the probability that the walk restricted to the spine, conditioned on reaching δ before returning to ρ, moves towards δ.
Since the first two probabilities depend on the tree only through its height, we write p_1(H) for the probability that the walk, started from the bud, reaches the deepest vertex in the tree before returning to the root, and p_2(H) for the probability of escaping from the tree. For the remainder of the section we consider only the case in which the offspring distribution belongs to the domain of attraction of some stable law of index α ∈ (1, 2). The first aim is to prove Proposition 7.3, which shows that the time spent on excursions in deep traps essentially consists of a geometric number of excursions from the deepest point to itself. We will then conclude with Corollary 7.4, which is an adaptation for FVIE and whose proof we omit.
Recall that ρ^+_i is the root of the ith large branch and χ̃^i_n is the time spent in this branch by the walk X. Write T^{(i,j,k)}_n for the duration of the kth excursion into T^+_{i,j}, and let T^{*(i,j,k)}_n (7.9) be this duration without the first passage to the deepest point and the final passage from the deepest point to the exit, so that the latter quantities sum to the time spent in the ith large trap without the first passage to and last passage from δ^{(i,j)} on each excursion. We want to show that the difference between this and χ̃^i_n is negligible. In particular, recalling that D^{(n)}_n is the collection of large branches by level n, we will show that for all t > 0 as n → ∞: (7.10)

Let A_6(n) be the event that there are no h^+_{n,ε}-branches by level n. Using a union bound and (3.12) we have that P(A_6(n)^c) → 0. (7.11) Let A_7(n) be the event that all large branches up to level n of the backbone have fewer than n^{2ε/(α−1)} large traps. Conditional on the number of buds, the number of large traps in the branch follows a binomial distribution, therefore:

By (4.1), P(H(T^*_−) ≥ h_{n,ε}) ≥ Cn^{−(1−ε)} for n large and some constant C; hence by (3.11) the first term decays faster than n^{−ε}. Using a Chernoff bound, the second term has a stretched exponential decay. Therefore, by Lemma 4.1 and a union bound, as n → ∞:

Recall that d_x := |c(x)| is the number of children of x in the tree, and define A_8(n) to be the event that there are fewer than n^{3ε/(α−1)²} subtraps on the spine in any large trap. For Z_n the generation sizes associated to the GW-tree T_•, P(Z_1 ≥ n|H(T_•) ≥ m) is non-decreasing in m; therefore the number of offspring of a vertex on the spine of a trap can be stochastically dominated by the size biased distribution. Using this and Lemma 4.1 with the bounds on A_6 and A_7, we then have that for some slowly varying function L:

where (ξ*_k)_{k≥1} are independent variables with the size biased law; thus P(A_8(n)^c) → 0 as n → ∞.

Proposition 7.3 In IVIE, for any t, ε > 0 we have that as n → ∞:
then using the bounds on A_i for i = 1, . . . , 8 it follows that P(A(n)^c) → 0 as n → ∞. In particular, on A_1(n) (from (4.2)) we have that |D^{(n)}_n| ≤ Cn^ε, and on A_7(n) (from (7.11)) we have that N_i ≤ n^{2ε/(α−1)} for all i; therefore by Markov's inequality:

where we recall that T^{*(i,j,k)}_n ≤ T^{(i,j,k)}_n for all i, j, k.
The number of excursions W_{i,j} is independent of the excursion times and has the marginal distribution of a geometric random variable with parameter:

For a given excursion, either the walk reaches the deepest point δ^{(1,1)} before returning to the root ρ^+_{1,1} or it doesn't. In the first case the difference T^{(1,1,1)}_n − T^{*(1,1,1)}_n is the time taken to reach δ^{(1,1)} conditional on the walker reaching δ^{(1,1)} before ρ^+_{1,1}, added to the time taken to reach ρ^+_{1,1} from δ^{(1,1)} conditional on reaching ρ^+_{1,1} before returning to δ^{(1,1)}. In the second case the difference is the time taken to return to the root given that the walker returns to the root without reaching δ^{(1,1)}. (7.13) We want to show that each of the terms in (7.13) can be bounded appropriately. This follows similarly to Lemmas 8.2 and 8.3 of [4], so we only sketch the details. Conditional on the event that the walk returns to the root of the trap before reaching the deepest point we have that:
1. the transition probabilities of the walk in subtraps are unchanged,
2. from any vertex on the spine, the walk is more likely to move towards the root than to any vertex in the subtrap,
3. from any vertex on the spine, excluding the root and deepest point, the probability of moving towards the root is at least β times that of moving towards the deepest point.
Property 3 shows that the probability of escaping the trap from any vertex on the spine is at least the probability p_∞ of a regeneration for the β-biased random walk on Z. From this, the number of visits to any spinal vertex can be stochastically dominated by a geometric random variable with parameter p_∞. Similarly, using property 2, the number of visits to any subtrap can be stochastically dominated by a geometric random variable with parameter p_∞/2.
Using a union bound with A_1, A_7, A_8 and (3.1), with high probability there are no subtraps of height greater than h_{n,ε}. In particular, by (5.2) and property 1, the expected time in any subtrap can be bounded above by C(βμ)^{h_{n,ε}} for some constant C. From this it follows that:

for some constant C and slowly varying function L.
A symmetric argument shows that the same bound can be achieved for the first term in (7.13). It then follows that the second term in (7.12) can be bounded above by C_t L_1(n)n^{−1/(α−1)+ε̃}, where ε̃ can be made arbitrarily small by choosing ε sufficiently small.
A straightforward adaptation of Proposition 8.1 of [4] (similar to the previous calculation) shows Corollary 7.4 which is the corresponding result for FVIE.

Corollary 7.4 In FVIE, for any t, ε > 0 we have that as n → ∞:
By Proposition 7.3 and Corollary 7.4, in FVIE and IVIE almost all of the time up to the walk reaching level n is spent on excursions from the deepest points of deep traps. The aim of the remainder of the section is to prove Proposition 7.14, which shows that the time spent on the excursions from the deepest point in a single large branch (suitably scaled) converges in distribution along the given subsequences. To ease notation, for the remainder of the section we work with a dummy branch T^*, so that the time χ̃^{i*}_n has the distribution of a sum of excursion times from the deepest points of T^*.
Recall from Definition 4 that T^*_− is a dummy branch with root ρ and buds ρ_1, . . . , ρ_{ξ*−1}, each of which is the root of an f-GW-tree T_{•j} with height H_j := H(T_{•j}). We now define a pruned version of this branch which only contains traps of height at least h_{n,ε}. Write W_j for the total number of excursions into T^+_j and B_j for the number of excursions which reach the deepest point δ_j.

Definition 8 (Pruned dummy branch) Let
For each k ≤ B_j we define G_{j,k} to be the number of returns to δ_j on the kth excursion which reaches δ_j.
For l = 1, . . . , G_{j,k} let R_{j,k,l} denote the duration of the lth excursion from δ_j to itself on the kth excursion into T^+_j which reaches δ_j.
The height of the branch and the total number of traps in the branch have a strong relationship; Lemma 7.5 gives the exact form of this relationship in the limit as n → ∞. Recall from (3.1) that c_μ is the positive constant such that P(H(T) ≥ n) ∼ c_μ μ^n as n → ∞, then write:

Lemma 7.5 In IVIE, under P_K the sequence of random variables (ξ* − 1)/b^K_n converges in distribution to some random variable ξ satisfying:

We know the asymptotic form of P(H(T_•) ≤ H^K_n) from (3.1), thus we need to consider the distribution of ξ* − 1 conditioned on ξ* − 1 ≥ tb^K_n. By the tail formula for ξ* − 1 following Definition 3 we have, for r ≥ 1, as n → ∞:

We therefore have that, conditional on ξ* − 1 ≥ tb^K_n, the sequence (ξ* − 1)/(tb^K_n) converges in distribution to a variable Y with tail P(Y ≥ r) = r^{−(α−1)} ∧ 1. Using the form of b^K_n we then have that:
It therefore follows that:

Repeating with H^K_n replaced by H, and combining this with (7.17) in (7.16), we have that:

Notice that under P the pruned dummy branch T^* is the single vertex ρ with high probability; however, under P_K there is at least one trap. By Lemma 5.1, conditional on N, (W_j)^N_{j=1} have a joint negative multinomial distribution. Moreover, W_j and B_j are coupled so that B_j is binomially distributed with W_j trials and success probability p_1(H^+_j). The number G_{j,k} of returns to δ_j is geometrically distributed with failure probability p_2(H^+_j). It follows that each χ̃^{i*}_n is equal in distribution to:

Define the scaled excursion time in large traps of a large branch as:

then we will show that ζ^{(n)} converges in distribution under P_K along the subsequences n_l(t). Lemma 7.6 gives an upper bound on the number of large traps in a branch conditioned on its height.

Proof Conditioned on the height of the branch and the number of buds, at least one trap attains the maximum height while all others have the distribution of heights of GW-trees conditioned on their maximum height, therefore:

By Lemma 7.5, P_K(ξ* − 1 ≥ log(n)b^K_n) converges to 0 as n → ∞. Conditioned on having ξ* − 1 = log(n)b^K_n buds, N is binomially distributed with log(n)b^K_n trials and success probability P(H(T_•) ≥ h_{n,ε}) ≤ Cμ^{h_{n,ε}} by (3.1). Since, for some slowly varying function L,

E[Bin(log(n)b^K_n, Cμ^{h_{n,ε}})] ≤ Cμ^K log(n) a_n/a_{n^{1−ε}} ≤ L(n)μ^K n^{ε/(α−1)},

a Chernoff bound shows that the final term in (7.19) converges to 0.
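The tail index α − 1 appearing in Lemma 7.5 comes from size-biasing: if ξ* is (as for the tree conditioned to survive) a size-biased version of a variable ξ with tail index α ∈ (1, 2), then ξ* has tail index α − 1, so P(ξ* > 2x)/P(ξ* > x) → 2^{−(α−1)}. The sketch below (our own numerical illustration, not from the paper) checks this on a truncated discrete power law.

```python
import numpy as np

alpha = 1.5
N = 3_000_000                        # truncation level for the tail sums
k = np.arange(1, N + 1, dtype=np.float64)
pk = k ** -(alpha + 1)
pk /= pk.sum()                       # pmf of xi, tail ~ C x^-alpha, finite mean
size_biased = k * pk / (k * pk).sum()  # pmf of xi*: P(xi* = k) proportional to k P(xi = k)

def tail(pmf: np.ndarray, x: int) -> float:
    """P(> x), computed from the (truncated) pmf."""
    return float(pmf[x:].sum())

x = 2000
ratio = tail(size_biased, 2 * x) / tail(size_biased, x)
print(ratio, 2 ** -(alpha - 1))      # tail-doubling ratio matches index alpha-1
```

Losing one power in the tail is exactly why the trapping-time exponent in the subcritical case varies with α, as discussed in the introduction.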
For ε̃ > 0 write:

Recall from (7.7) that p_2(H) is the probability that a walk started from the deepest point of a tree of height H reaches the root before returning to the deepest point. Since the G_{j,k} are independent geometric random variables, there exist independent exponential random variables e_{j,k} such that:

By (7.7) we then have that
therefore, since H^+_j ≥ h_{n,ε}, for any ε̃ > 0 there exists n large such that P_K(A_9(n)) = 1 for any K ∈ Z.
Recall from (7.7) and Definition 8 that G_{j,k} is geometrically distributed with failure probability p_2(H^+_j). Then, using convergence of scaled geometric variables to exponential variables (see the proof of part (3) of Proposition 9.1 in [4]), there exists a constant C̃ such that for any ε̃ > 0 there exists n large such that:

By Definition 8 we have that B_j ≤ W_j. Moreover, N ≤ n^{(ε+ε̃)/(α−1)} with high probability for any ε̃ > 0 by Lemma 7.6, and W_j ≤ C log(n) for all j by the bound on the event A_5(n)^c (from (5.3)). Therefore, writing:

a union bound gives us that P(A_10(n)^c) → 0 as n → ∞.
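The coupling of the geometric variables G_{j,k} with exponential variables e_{j,k} rests on the elementary fact that a geometric random variable with small parameter p, scaled by p, is close in distribution to a standard exponential: P(pG > x) = (1 − p)^{⌈x/p⌉} → e^{−x}. The sketch below (our own check, not from the paper) bounds the discrepancy uniformly over a grid of x values.

```python
import math

def scaled_geometric_tail(p: float, x: float) -> float:
    """P(pG > x) for G with P(G >= m) = (1-p)^m (number of failures before
    the first success)."""
    return (1 - p) ** math.ceil(x / p)

p = 1e-4
worst = max(abs(scaled_geometric_tail(p, x) - math.exp(-x))
            for x in [0.1, 0.5, 1.0, 2.0, 5.0])
print(worst)  # uniformly small for small p
```

Here p plays the role of the escape probability p_2(H^+_j), which tends to 0 as the trap height H^+_j grows, so the number of returns to the deepest point, suitably scaled, is asymptotically exponential.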
By comparison with the biased random walk on Z we have that p_1(H^+_j) ≥ p_∞ = 1 − β^{−1}, therefore we can define a random variable B^∞_j ∼ Bin(B_j, p_∞/p_1(H^+_j)). It then follows that B_j ≥ B^∞_j ∼ Bin(W_j, p_∞) and:

Since the marginal distribution of W_1 doesn't depend on n, using (7.21), the bound on N from Lemma 7.6 and the coupling between B_1 and B^∞_1, we have that:

which decays to 0 as n → ∞. By choosing ε > 0 sufficiently small we can choose κ in the range ε(1/γ + 1/(α − 1)) < κ < min{2(α − 1), 1/γ}; then write A_12(n) to be the event that there are no large traps with expected squared excursion time too large.

Lemma 7.7
In IVIE, for any K ∈ Z, as n → ∞ we have that P_K(A_12(n)^c) → 0.
Proof Recall from (7.10) that, for ε > 0, A_6(n) is the event that all large branches are shorter than h^+_{n,ε}, and since N ≤ n^{(ε+ε̃)/(α−1)} with high probability we have that:
A straightforward argument using conductances (see the proof of Lemma 9.1 in [4]) gives:

where π is the invariant measure scaled so that π(δ_1) = 1 and d denotes the graph distance. We then have that:

where the final inequality follows by (7.5). If β^{1/2}μ^{α−1−ε} ≤ 1 then, by Markov's inequality, P_K(A_12(n)^c) → 0 as n → ∞ since κ < 1/γ. Otherwise, by Markov's inequality:

for some slowly varying function L. In particular, since κ < 2(α − 1), we can choose ε and ε̃ sufficiently small so that this converges to 0 as n → ∞. Write A_13(n) to be the event that on each excursion that reaches the deepest point of a large trap, the total excursion time before leaving the trap is approximately the product of the number of excursions and the expected excursion time.

Lemma 7.8
In IVIE, for any K ∈ Z, as n → ∞ we have that P_K(A_13(n)^c) → 0.
Proof With high probability no trap is visited more than C log(n) times, by (5.3), and N ≤ n^{(ε+ε̃)/(α−1)} by Lemma 7.6. Any excursion has length at least 2, hence E[R_{1,1,1}] ≥ 2. Therefore, by Lemma 7.7 and Chebyshev's inequality:

It then follows, since G_{1,1} ∼ Geo(p_2(H^+_1)) (where, from (7.7), p_2(H) is the probability that a walk started from the deepest point of a trap of height H reaches the root before returning), that:

for some slowly varying function L. In particular:

which converges to zero by the choice of κ > ε(1/γ + 1/(α − 1)).

Lemma 7.9
In IVIE, for any K ∈ Z, as n → ∞ we have that:

Proof A straightforward computation similar to that in Proposition 9.1 of [4] yields, for some constant c and n sufficiently large, for all j = 1, . . . , N:

where λ_k are the weights of the extension of T^+_j. Recall that N ≤ n^{(ε+ε̃)/(α−1)} with high probability by Lemma 7.6; therefore by (7.5) and Markov's inequality:

Since ε and ε̃ can be chosen arbitrarily small, we indeed have the desired result.
Define Z^n_∞ in terms of the exponential variables e_{j,k}: (7.23)

whose distribution depends on n only through N and (H^+_j − H)^N_{j=1}. Recalling the definition of ζ^{(n)} in (7.18), since the e_{j,k} are the exponential random variables defining the G_{j,k} and the random variable N is the same in both equations, we have that ζ^{(n)} and Z^n_∞ are defined on the same probability space.

Proposition 7.10
In IVIE, for any K ∈ Z and ε̃ > 0:

Proof Using the bounds on A_11, A_13 and A_14 from (7.22) and Lemmas 7.8 and 7.9 respectively, there exists some function g : R → R such that lim_{ε̃→0+} g(ε̃) = 0 and, for sufficiently large n (independently of K):

It therefore suffices to show that (Z^n_∞)_{n≥0} are tight under P_K. Write:

The variables E[R^j_∞], B^∞_j and e_{j,k} are independent, don't depend on K and have finite mean (by Lemma 7.1, the geometric distribution of W_j and the exponential distribution of e_{j,k}), therefore, uniformly over K:

We can then write:

The distribution of S_j is independent of the height of the trap. The number of large traps N is dominated by the total number of traps ξ* − 1 in the branch; thus, reintroducing small traps:

where we recall that, under P_K, (H_j)^{ξ*−1}_{j=1} are distributed as the heights of independent f-GW-trees conditioned so that the largest is of height H^K_n, and the (S_j) have the law of S_1. By Lemma 7.5 we have that lim_{t→∞} lim sup_{n→∞} P_K(ξ* − 1 ≥ b^K_n log(t)) = 0, therefore it remains to bound the first term in (7.26). Write ℓ := inf{r ≥ 1 : H_r = H^K_n} for the index of the first trap whose height equals the maximum in the branch. Conditional on trap j being the first in the branch which attains the maximum height, the heights of the remaining traps are independent and either at most the height of the largest (for higher indices than j) or strictly shorter (for lower indices than j). In particular, this means that:

The distribution of S_1 is independent of n, therefore lim_{t→∞} P(S_1 ≥ log(t)) = 0. Conditional on ℓ = 1, the (H_j)_{j≥2} are independent, therefore by Markov's inequality we have that:
For large enough n we have that P(H_1 ≤ H^K_n) ≥ 1/2, from which the required bound follows for some constant C, and the result is proved. We now prove three technical lemmas which will be important in the proof of Proposition 7.14, the main result of this section. The first shows that we can reintroduce the small traps into Z^n_∞. The reason for doing this is that we no longer need to condition on the heights of the traps being at least the critical level, which will simplify later calculations. In particular, we can replace N with ξ* − 1 (i.e. the total number of traps in the branch), which we understand under P_K by Lemma 7.5.

Lemma 7.11
For all ε̃ > 0 and any K ∈ Z, as n → ∞:

Proof First, notice that each term in the sum is non-negative, therefore introducing extra terms only increases the probability. By Lemma 7.5, for any ε̃ > 0, we have that P_K(ξ* − 1 ≥ a_{n^{1+ε̃}}) → 0 as n → ∞. By Definitions 7 and 8 we have that β^{H^K_n} ≤ β^K a_n^{1/γ}, so Markov's inequality and (7.27) apply. Recall from Definition 7 that h_{n,ε} ≤ log(a_{n^{1−ε}})/log(μ^{−1}). Using the form of a_n following Definition 3, there exists a slowly varying function L such that a_{n^{1+ε̃}}(βμ)^{h_{n,ε}} a_n^{−1/γ} ≤ L(n) n^{(ε̃ + ε − ε/γ)/(α−1)}, which converges to 0 on choosing ε̃ < ε(1/γ − 1).
The second lemma leading to Proposition 7.14 shows that the height of an f-GW-tree is sufficiently close to a geometric random variable. To ease notation, let S = S_1 (see (7.24)), let H = H_1 ∼ H(T•) be distributed as the height of an f-GW-tree and let G ∼ Geo(μ), independently of each other.
Lemma 7.12
The quantity defined in (7.28) converges to zero as b → ∞.
Proof Since there exists a constant c such that P(H ≥ t) ≤ cP(G ≥ t) uniformly over t, the corresponding comparison holds. Let ε̃ > 0 and choose δ > 0 such that ∫_0^δ e^{−x}x^{−γ} dx < ε̃C_θ; then, since the integrals are positive and c_μ ≤ 1, we obtain (7.29). For x > δ, by independence of S and H, the corresponding probability multiplied by b converges to 0 as b → ∞ by dominated convergence. The same holds with H replaced by G, therefore combining this with (7.29) we have that the quantity (7.28) is bounded above by ε̃ times a constant. Since S is independent of G and H, the supremum in the above expression can be bounded, which completes the proof.
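The geometric-tail comparison behind Lemma 7.12 can be illustrated numerically. The sketch below is not part of the proof: it assumes a Poisson(μ) offspring law (chosen only so that the p.g.f. is explicit) and iterates the generating function to show that P(H ≥ n)/μ^n stabilises, i.e. the height of a subcritical GW-tree has a Geometric(μ)-type tail.

```python
# Tail of the height H of a subcritical GW tree vs. a Geometric(mu) tail.
# Illustrative sketch: offspring law Poisson(mu) with mu < 1 is an assumption.
# P(H >= n) = 1 - f_n(0), where f_n is the n-fold iterate of the p.g.f. f;
# Kolmogorov's estimate gives P(H >= n) ~ c * mu^n, a geometric-type tail.
import math

mu = 0.6
f = lambda s: math.exp(mu * (s - 1.0))  # p.g.f. of Poisson(mu)

def height_tail(n):
    """P(H >= n): probability the tree survives to generation n."""
    s = 0.0
    for _ in range(n):
        s = f(s)
    return 1.0 - s

# since 1 - f(s) <= mu * (1 - s), the ratios below are non-increasing,
# and they stabilise at a positive constant: P(H >= n) = Theta(mu^n)
ratios = [height_tail(n) / mu**n for n in range(10, 30)]
```

The stabilising ratio is exactly the constant c appearing in P(H ≥ t) ≤ cP(G ≥ t) type bounds.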
In the final lemma preceding Proposition 7.14 we show that the Laplace transform ϕ_K can be written in terms of the distributions of S, H and ξ*.

Lemma 7.13
In IVIE:

Proof Conditional on ℓ, the random variables (H_j)_{j≥1} are independent. By conditioning on ξ* and applying Bayes' rule we obtain (7.33). Combining (7.31), (7.32) and (7.33) we can then write ϕ_K(λ) in the stated form, and using (7.31) once more this simplifies further.

The next proposition shows that, under P_K, the scaled time spent in a large branch ζ^(n) (from (7.18)) converges in distribution along subsequences n_l where a_{n_l(t)} ∼ tμ^{−l}.

Proposition 7.14
In IVIE, under P_K we have that Z^{n_l}_∞ converges in distribution (as l → ∞) to some random variable Z_∞.
Proof By Lemmas 7.11 and 7.13, it now suffices to show convergence of the expression in (7.35); using the relationship (7.15) between b^K_n and H^K_n, it can be rewritten up to 1 + o(1) factors.
By Lemma 7.5 we know that (ξ* − 1)/b^K_n converges in distribution to a random variable with exponential moments, therefore we want to show a similar expression for the numerator in (7.35). Notice that E[e^{−λSβ^{H−H^K_n}}] converges to 1 deterministically; in particular, this yields (7.37) up to a 1 + o(1) factor. By summing over the possible values of H and using independence of S and H, recalling G ∼ Geo(μ) independently of S, writing ϕ_{SG}(λ) for the Laplace transform of Sβ^G and using the relationship (7.15) between b^K_n and H^K_n, we have that (7.37) can be written as (7.38). It remains to deal with E[e^{−λSβ^{H−H^K_n}}]^{b^K_n}. To ease notation, write b := b^K_n = c_μ^{−1}μ^{−H^K_n} and θ = λc, where the final equality holds by Lemma 7.12. Since S and G are independent, writing J and I for the resulting integral expressions, we have from (7.25) that I(z) ≤ E[S^γ] < ∞, since γ < 1 by (1.1). Moreover, J(z) = J(z + m log(β)) and I(z) = I(z + m log(β)) for all z ∈ R, m ∈ Z. Substituting this back into (7.39): for t > 0, along sequences n_l(t) such that a_{n_l(t)} ∼ tμ^{−l}, we have that log((b^K_n)^{1/γ}) = C + l log(β) + o(1); therefore, since I is bounded, along the subsequences n_l(t) the expression converges, for some constant C_{μ,β} depending on the distribution of S. Furthermore, the same argument gives the corresponding statement up to a 1 + o(1) factor.
By boundedness, continuity and Lemma 7.5 we therefore have that ϕ K (λ) converges along the given subsequences which proves the result.
In order to prove the convergence result for sums of i.i.d. variables we shall require that ζ^(n) can be dominated (independently of K ≥ h_{n,ε} − h_{n,0}) by some random variable Z_sup such that E[Z_sup^{1−ε}] < ∞ for all ε > 0; this is the content of Lemma 7.15.

Proof For each t ≥ 0 we have that P(ξ* − 1 ≥ tb^K_n) ∼ Ct^{−(α−1)}P(H = H^K_n) as n → ∞. Since P(H = H^K_n) does not depend on t, we can choose a constant c such that, for t ≥ 1 and n sufficiently large, P(ξ* − 1 ≥ tb^K_n | H = H^K_n) ≤ c_1e^{−c_2t} for some constants c_1, c_2. It follows that there exists some random variable ξ_sup which is independent of H, has an exponential tail and satisfies ξ_sup b^K_n ≥ ξ* − 1 on the event {H = H^K_n} for n suitably large (independently of K). Recall that the total number of excursions W_j into a trap exceeds the number B_j which reach the deepest point, and we write G_{j,k} to denote the number of excursions from the deepest point. The length of these excursions can be dominated by excursions R^{j,k,l}_∞ from the deepest points of the infinite traps T^≺_i, which are identically distributed under P; under P_K the resulting bound is independent of H_j. Since W_1 has a geometric distribution (independently of n) we have that E[W_1] < ∞, and Lemma 7.1 applies. Using the geometric bounds on the tail of H from (3.1) and that P(H ≥ j | H ≤ H^K_n) ≤ P(H ≥ j), we have that P_K(X^n(m) ≥ t) ≤ C(βμ)^{H^K_n}/t; thus there exists some sequence of random variables X^n_sup ⪰ X^n(m) for any m such that P_K(X^n_sup ≥ t) = 1 ∧ C(βμ)^{H^K_n}t^{−1}. In particular, X^n_sup ⪰ X^n(ξ_sup b^K_n) under P_K, where ξ_sup has finite first moment since P(ξ_sup ≥ t) = c_1e^{−c_2t} ∧ 1. It follows that there exists X_sup ⪰ X^n_sup for any n such that P(X_sup ≥ t) = 1 ∧ Ct^{−1}. Since E_K[Y^n] is bounded independently of K and n, by Markov's inequality there exists Y_sup ⪰ Y^n for all n such that P(Y_sup ≥ t) = 1 ∧ Ct^{−1}.
It therefore follows that ζ^(n) under P_K is stochastically dominated by X_sup + Y_sup under P, hence X_sup + Y_sup has finite moments up to 1 − ε for all ε > 0.

Convergence along subsequences
In this section we prove the main theorems concerning convergence to infinitely divisible laws in FVIE and IVIE. Both cases follow the proof from [4]; in FVIE the result follows directly whereas in IVIE adjustments need to be made to deal with slowly varying functions.
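Before turning to the proofs, the kind of limit involved can be seen in a toy computation. The following seeded Monte Carlo sketch is purely illustrative and is not the paper's setting: the tail index a stands in for γ_α ∈ (0, 1), and the Pareto law is an assumed stand-in for the trapping times. Sums of such variables scaled by n^{1/a} remain of constant order, which is the signature of convergence to a stable subordinator.

```python
# Monte Carlo sketch (illustrative assumptions throughout): if P(Z > x) = x^{-a}
# with 0 < a < 1 (a plays the role of gamma_alpha), then S_n = Z_1 + ... + Z_n
# scaled by n^{1/a} is tight and nondegenerate -- no law of large numbers,
# but convergence in distribution to a stable subordinator marginal.
import random
import statistics

random.seed(0)
a = 0.7

def pareto():
    # inverse-transform sample: P(Z > x) = x^{-a} for x >= 1
    return (1.0 - random.random()) ** (-1.0 / a)

meds = []
for n in (2_000, 16_000):
    scaled = [sum(pareto() for _ in range(n)) / n ** (1.0 / a) for _ in range(100)]
    meds.append(statistics.median(scaled))
# the two medians are of the same order: S_n / n^{1/a} neither vanishes nor blows up
```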
Recall that we want to show convergence of Δ_n/a_n along sequences n_l(t); however, by Corollary 5.5 and Lemma 5.6 it suffices to consider

Proof of Theorem 3 (IVIE)
In IVIE write γ_α := (α − 1)log(μ^{−1})/log(β) = (α − 1)γ. By (3.12) the tail of a branch is regularly varying with this index, for a known constant C_{μ,α}. Due to the slowly varying term we cannot apply Theorem 7 directly; however, Theorem 7 is proved using Theorem 6. It will therefore suffice to show convergence of the drift, variance and Lévy spectral function in this case.
Recall that we consider subsequences n_l(t) such that a_{n_l(t)} ∼ tμ^{−l}. From Propositions 7.10 and 7.14 we then have that, for any K ∈ Z, the laws of ζ^{l,K}_i converge to the law of Z_∞ as l → ∞. Let (Z^(i)_∞)_{i≥1} be an independent sequence of variables with this law and denote F_∞(x) := P(Z_∞ > x). By Lemma 7.15 there exists Z_sup dominating the ζ^{l,K}_i, and the required estimate holds for all continuity points x. Both terms converge to 0 as l → ∞ by the tail formula of a branch (3.12), the fact that Z_sup has no atom at ∞ and that β^{h_{n_l,ε/2}−l} → 0, which follows from l ∼ h_{n,0}. For (2), recall the definition of F_l. If x > 0 is a continuity point of L_λ then λxβ^{−K} is a continuity point of F_∞, hence for any K ∈ Z the sum converges termwise as l → ∞. We need to exchange the sum and the limit; we do this using dominated convergence.
Since γ_α < 1 we can choose ε > 0 such that γ_α + ε < 1 and ε < γ_α. By (8.1), for l sufficiently large, since Z_sup has moments up to γ_α + ε, the summands with y = λx are dominated by a summable bound, which is finite; by the choice of ε the exchange follows. Moreover, for x < 0 the corresponding probability is 0, which gives (2). For (3), the integral ∫_0^τ x dL_λ is well defined; we therefore want to show convergence of the truncated means, and by definition this proves Theorem 3.

Tightness
We conclude the results for the walk on the subcritical tree with Theorem 4, which is a tightness result for the process and a convergence result for the scaling exponent. We only prove the result in IVIE since the proof is standard (similar to that of Theorem 1.1 of [4]) and the other cases follow by the same method; however, we state the proof more generally. Recall that r_n is a_n in IVFE, n^{1/γ} in FVIE, a_n^{1/γ} in IVIE, and its inverse is r̄_n := max{m ≥ 0 : r_m ≤ n}.
Proof of Theorem 4 in IVIE For statement 1 we show that lim_{t→∞} lim sup_{n→∞} P(Δ_n/r_n ∉ [t^{−1}, t]) = 0. Let l be such that a_{n_l(1)} ≤ a_n < a_{n_{l+1}(1)}; then by monotonicity of Δ_n we can compare Δ_n/a_n^{1/γ} with the subsequential limits. The distribution of R_1 is continuous by Theorem III.2 of [23] since lim_{x→0} L(x) = −∞ (where R_t denotes the limiting distribution); therefore, since the sequence (a_{n_{l+1}(1)}/a_{n_l(1)})^{1/γ} can be bounded above by some constant c, statement 1 follows. For statement 2 we want to show that lim_{t→∞} lim sup_{n→∞} P(|X_n|/r̄_n ∉ [t^{−1}, t]) = 0. To do this we compare |X_n| with Δ_n. In order to deal with the depth X_n reaches into the traps we use a bound on the height of a trap, valid for any ε > 0. By (3.12) we have that (tr̄_n − r̄_n)P(H(T*−) ≥ εr̄_n) → 0 as n → ∞, and we use the definition of r̄_n. Since a^{1/γ}_{r̄_n+1}/a^{1/γ}_{tr̄_n−r̄_n} converges to t^{−1/γ_α} as n → ∞, by continuity of the distribution of R_1 and statement 1 we have that lim_{t→∞} lim sup_{n→∞} P(|X_n|/r̄_n > t) = 0.
It remains to show that lim_{t→∞} lim sup_{n→∞} P(|X_n|/r̄_n < t^{−1}) = 0. We need to bound how far the walker backtracks after reaching a new furthest point in order to compare |X_n| with Δ_n. Let υ_0 := 0 and for j ≥ 1 define the jth regeneration time accordingly. The regeneration times (υ_i − υ_{i−1}), υ_1 and the heights of branches H(T*−_{ρ_i}) have exponential moments for all i, therefore a union bound applies for any ε > 0. Then, since a^{1/γ}_{r̄_n}/a^{1/γ}_{2t^{−1}r̄_n} → (t/2)^{1/γ_α} as n → ∞, by continuity of the distribution of R_1 and statement 1 we indeed have that lim_{t→∞} lim sup_{n→∞} P(|X_n|/r̄_n < t^{−1}) = 0. For the final statement, since r̄_n = n^{γ(α−1)}L̃(n) for some slowly varying function L̃, we have log(r̄_n)/log(n) → γ(α − 1) as n → ∞; thus it suffices to show that the remaining limit is equal to 0. By Fatou we can bound the second term above by lim_{t→∞} lim inf_{n→∞} P(|X_n|/r̄_n ≤ t^{−1}), which is equal to 0 by tightness of (|X_n|/r̄_n)_{n≥0}.
For the first term, let κ_n be the last regeneration time of Y before time n. Since the gaps κ_{n+1} − κ_n have exponential moments, we have that P(lim sup_{n→∞}(κ_{n+1} − κ_n) ≥ εr̄_n) = 0; hence the bound follows, where the second inequality is by Fatou's lemma. The result follows by tightness of (|X_n|/r̄_n)_{n≥0}.
Theorem 1 follows from Theorem 4, Proposition 6.7 and Corollary 5.6 with Êλ = t since nq_n ∼ n^ε. More specifically, since R_{d_t,0,L_t} is the infinitely divisible law with the stated characteristic exponent, a simple change of variables shows that the laws of the process (Δ_{nt}/a_n)_{t≥0} converge weakly as n → ∞ under P, with respect to the Skorohod J_1 topology on D([0, ∞), R), to the law of the stable subordinator with characteristic function determined by L_1(x). A straightforward calculation then shows that the Laplace transform is of the form (9.1).

Supercritical tree
As discussed in the introduction, the structures of the supercritical and subcritical trees are very similar in that they consist of some backbone structure Y with subcritical GW-trees as leaves. The main differences are as follows:
• On the subcritical tree the backbone was a single infinite line of descent, represented by the solid line in Fig. 2 of Sect. 3. On the supercritical tree the backbone is itself a random tree, represented by the solid line in Fig. 5. In particular, it is a GW-tree without deaths whose law is determined by the generating function g(s) := (f(q + (1 − q)s) − q)/(1 − q), where f is the generating function of the original offspring law and q is the extinction probability.
• Each backbone vertex has additional children which we call buds. On the subcritical tree, the number of buds had a size-biased law independent of the position on the backbone. On the supercritical tree, the distribution of the number of buds is more complicated since it depends on the backbone. Importantly, the expected number of buds can be bounded above by μ(1 − q)^{−1} independently of higher moments of the offspring law, which isn't the case for the subcritical tree.
• In the subcritical case, the GW-trees forming the traps have the law of the original (unconditioned) offspring law. In the supercritical case, the law is defined by the p.g.f. h(s) := f(qs)/q, which has mean f′(q).
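These generating-function identities are easy to verify numerically. The sketch below is an illustration only: the Geometric(p) offspring law is an assumption chosen because the extinction probability is explicit. It checks that g(0) = 0 (the backbone has no deaths), that g′(1) = f′(1) = μ, and that the trap law h is subcritical with mean f′(q).

```python
# Harris decomposition of a supercritical GW tree, sketched for a concrete
# offspring law (Geometric(p) on {0,1,...}; an illustrative assumption).
# f : p.g.f. of the offspring law, mean mu = (1-p)/p > 1
# q : extinction probability, smallest root of f(s) = s in [0,1]
# g : backbone p.g.f., g(s) = (f(q + (1-q)s) - q)/(1-q)
# h : trap p.g.f., h(s) = f(q*s)/q, with mean f'(q) < 1
p = 0.3
f = lambda s: p / (1.0 - (1.0 - p) * s)

q = 0.0
for _ in range(2000):
    q = f(q)  # fixed-point iteration converges to the extinction probability

g = lambda s: (f(q + (1.0 - q) * s) - q) / (1.0 - q)
h = lambda s: f(q * s) / q

eps = 1e-7  # numerical one-sided derivatives at 1
mu = (f(1.0) - f(1.0 - eps)) / eps             # f'(1) = (1-p)/p
trap_mean = (h(1.0) - h(1.0 - eps)) / eps      # h'(1) = f'(q): subcritical
backbone_mean = (g(1.0) - g(1.0 - eps)) / eps  # g'(1) = f'(1) = mu

# g(0) = 0: the backbone is a GW tree *without deaths*, as stated in the text.
```

For Geometric(p) offspring one also has q = p/(1 − p) in closed form, which the fixed-point iteration recovers.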
Let T denote the supercritical tree conditioned to survive, T• the unconditioned tree and T− the tree conditioned to die out. Write Z_n, Z•_n, Z−_n for the sizes of their nth generations respectively and V_n, V•_n for the number of vertices in the nth generation of the backbone (for T and T•). As in the subcritical case we denote by T*− a dummy branch formed by a backbone vertex, its buds and the associated traps. In Fig. 5, the dashed lines represent the finite structures comprised of the buds and leaves. It will be convenient to refer to the traps at a site, so for x ∈ Y let L_x denote the collection of traps adjacent to x; for example, in Fig. 5, L_ρ consists of the two trees rooted at y, z. We then write T*−_x for the branch at x. Recall that Theorem 5 states that if the offspring law belongs to the domain of attraction of some stable law of index α ∈ (1, 2), has mean μ > 1 and the derivative of the generating function at the extinction probability satisfies β > f′(q)^{−1}, then Δ_{n_l(t)} n_l(t)^{−1/γ} → R_t in distribution as l → ∞ under P, where γ is as given in (1.1), n_l(t) = tf′(q)^{−l} for t > 0 and R_t is a random variable with an infinitely divisible law whose parameters are given in [4]. Moreover, the laws of (Δ_n n^{−1/γ})_{n≥0} and (|X_n|n^{−γ})_{n≥0} under P are tight on (0, ∞) and, P-a.s., lim_{n→∞} log|X_n|/log(n) = γ.
In [4] it is shown that this holds when μ > 1, E[ξ²] < ∞ and β > f′(q)^{−1}. In order to extend this result to prove Theorem 5 it will suffice to prove Lemmas 10.1, 10.2 and 10.3, which we defer to the end of the section.
In Lemma 10.1 we show that P(H(T*−) > n) ∼ C*f′(q)^n for some constant C*. This is the same as when E[ξ²] < ∞ for the supercritical tree, unlike for the subcritical tree where the exponent changes depending on the stability index. This is because the first moment of the bud distribution plays the fundamental role, and the change from finite to infinite variance affects this for the subcritical tree but not for the supercritical tree. Lemma 10.1 is an extension of Lemma 6.1 of [4], which is proved using a Taylor expansion of f around 1 up to second moments. We cannot take this approach because f″(1) = ∞; instead we use the form of the generating function determined in Lemma 3.3. The expression is important because, as in FVIE, the expected time spent in a large branch of height H(T*−) is approximately cβ^{H(T*−)} for some constant c.
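The β^H cost of a deep branch can be seen in a simplified computation. In the sketch below the trap is replaced by a path 0, 1, …, H, and the transition probabilities (step deeper with probability β/(1 + β)) are an assumption of the sketch, not the paper's exact walk; solving the hitting-time recursion t_i = (1 + β) + βt_{i+1}, t_H = 1, shows the expected exit time grows like β^H.

```python
# Why a trap of depth H costs ~ beta^H: expected time for a beta-biased walk
# to exit a one-dimensional trap (a path 0,...,H standing in for a trap of
# height H).  At 1..H-1 the walk steps deeper w.p. beta/(1+beta) and back
# w.p. 1/(1+beta); at H it must step back.  With t_i the expected time to go
# from i to i-1: t_H = 1 and t_i = (1+beta) + beta * t_{i+1}.
beta = 2.0

def exit_time(H):
    """Expected time to reach 0 from 1 in a trap of depth H."""
    t = 1.0  # t_H: from the bottom the walk steps back in one unit of time
    for _ in range(H - 1):
        t = (1.0 + beta) + beta * t  # recursion gives t_i from t_{i+1}
    return t  # t_1

growth = [exit_time(H + 1) / exit_time(H) for H in range(5, 15)]
# growth ratios tend to beta: the escape time is Theta(beta^H)
```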
Lemma 10.2 shows that, with high probability, no large branch contains more than one large trap. This is important because the number of large traps would affect the escape probability: if there are many large traps in a branch then it is likely that the root has many offspring on the backbone, since some geometric number of the offspring lie on the backbone. The analogue of this in [4] is proved using the bound f′(1) − f′(1 − ε) ≤ Cε, which follows because f″(1) < ∞. Similarly to Lemma 10.1, we use a more precise form of f in order to obtain a similar bound. Lemma 10.3 shows that no branch visited by level n is too large. This is important for the tightness result since we need to bound the deviation of the walk from the furthest point reached along the backbone. Its proof follows quite straightforwardly from Lemma 10.1.
To explain why these are needed, we recall the argument which follows a similar structure to the proof of Theorem 2. As was the case for the walk on the subcritical tree, the first part of the argument involves showing that, asymptotically, the time spent outside large branches is negligible. This follows by the same techniques as for the subcritical tree.
One of the major difficulties with the walk on the supercritical tree is determining the distribution over the number of entrances into a large branch. The height of the branch from a backbone vertex x will be correlated with the number of children x has on the backbone. This affects the escape probability and therefore the number of excursions into the branch. It can be shown that the number of excursions into the first large trap converges in distribution to some non-trivial random variable W ∞ . In particular, it is shown in [4] that W ∞ can be stochastically dominated by a geometric random variable and that there is some constant c W > 0 such that P(W ∞ > 0) ≥ c W .
Similarly to Sect. 4, it can be shown that asymptotically the large branches are independent in the sense that with high probability the walk won't reach one large branch and then return to a previously visited large branch. Using Lemmas 10.1 and 10.2 (among other results) it can then be shown that n can be approximated by the sum of i.i.d. random variables.
The remainder of the proof of the first part of Theorem 5 involves decomposing the time spent in large branches, showing that the suitably scaled excursion times converge in distribution, proving the convergence results for sums of i.i.d. variables and concluding with standard tightness results similar to Sect. 9. Since P(Z−_1 = k) = p_k q^{k−1}, the subcritical GW law of the traps has exponential moments. This means that these final parts of the proof follow by the results proven in [4] since, by Lemma 10.1, the scaling is the same as when E[ξ²] < ∞.
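The exponential moments of the trap law can be checked directly from P(Z−_1 = k) = p_k q^{k−1}. The sketch below takes Poisson(2) offspring as an illustrative assumption: the tilted mass function sums to f(q)/q = 1 and its moment generating function f(qe^θ)/q is finite.

```python
# Offspring law of the tree conditioned to die out: P(Z_1^- = k) = p_k q^{k-1}.
# Sketch with Poisson(mu) offspring, mu > 1 (an illustrative assumption).
import math

mu = 2.0
p = lambda k: math.exp(-mu) * mu**k / math.factorial(k)  # Poisson(mu) p.m.f.
f = lambda s: math.exp(mu * (s - 1.0))                   # its p.g.f.

q = 0.0
for _ in range(1000):
    q = f(q)  # extinction probability: smallest root of f(s) = s

# p_k q^{k-1} is a probability mass function: it sums to f(q)/q = 1
# (truncating at k = 100 since the remaining terms are negligible)
total = sum(p(k) * q ** (k - 1) for k in range(100))

# ... and it has exponential moments: E[e^{theta Z}] = f(q e^theta)/q,
# which for Poisson offspring is finite for every theta
theta = 0.5
mgf = f(q * math.exp(theta)) / q
```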
Tightness of (Δ_n n^{−1/γ})_{n≥0} and (|X_n|n^{−γ})_{n≥0} and almost sure convergence of log(|X_n|)/log(n) then follow by the proof of Theorem 1.1 of [4] (with one slight adjustment), which is similar to the proof of Theorem 4. In order to bound the maximum distance between the walker's current position and the last regeneration point we use a bound on the maximum height of a trap seen up to time Δ^Y_n. In [4] it is shown that the probability that a trap of height at least 4log(n)/log(f′(q)^{−1}) is seen is at most of order n^{−2}, by using finite variance of the offspring distribution to bound the variance of the number of traps in a branch. In Lemma 10.3 we prove this using Lemma 10.1 instead. The corresponding bound holds for any t, s > 0. Furthermore, writing t_n := s_nq + 1 − q, we have that 1 − t_n = q(1 − s_n). By Taylor's theorem there exists z ∈ [s_nq, q] such that f(s_nq) = q + qf′(q)(s_n − 1) + f″(z)q²(s_n − 1)²/2. Since q < 1, f″(z) exists for all z ≤ q and is bounded above by f″(q) < ∞. By Lemma 3.3 the stated expansion holds for a slowly varying function L, which is the desired result.
Recall that Δ^Y_n is the first hitting time of level n of the backbone by Y, and that L_x is the collection of traps adjacent to x; for ε > 0 define the corresponding event, where t_{h_{n,ε}} = qs_{h_{n,ε}} + 1 − q. By Taylor's theorem there exists some constant c such that 1 − t_{h_{n,ε}} ∼ qc_μ f′(q)^{h_{n,ε}} ≤ cn^{−(1−ε)}; therefore, since α > 1, we can choose ε > 0 small enough (depending on α) such that P(B(n)^c) = o(1).
Let D(n) := {max_{j≤Δ^Y_n} H(T*−_{Y_j}) ≤ 4log(n)/log(f′(q)^{−1})} be the event that all branches seen before reaching level n are of height at most 4log(n)/log(f′(q)^{−1}).

List of notation
p_1(H) Probability that the walk reaches δ before ρ
p_2(H) Probability of escaping the tree started at δ
P^T, P, P Quenched, annealed and environment laws
P_K, P_K Annealed and environment laws conditioned on the number of buds or the height
L Slowly varying function satisfying (2.2)
a_n Scaling sequence for ξ* so that P(ξ* ≥ xa_n) ∼ n^{−1}x^{−(α−1)}
L̃ Slowly varying function satisfying a_n = n^{1/(α−1)}L̃(n)
n_l(t) Subsequence for convergence
r_n, r̄_n Appropriate scaling of Δ_n in each case and its inverse
l_{n,ε}, l^+_{n,ε} Critical number of buds in IVFE
h_{n,ε}, h^+_{n,ε}