Near critical preferential attachment networks have small giant components

Preferential attachment networks with power law exponent $\tau>3$ are known to exhibit a phase transition. There is a value $\rho_c>0$ such that, for small edge densities $\rho\leq \rho_c$, every component of the graph comprises an asymptotically vanishing proportion of vertices, while for large edge densities $\rho>\rho_c$ there is a unique giant component comprising an asymptotically positive proportion of vertices. In this paper we study the decay in the size of the giant component as the critical edge density is approached from above. We show that the size decays very rapidly, like $\exp(-c/ \sqrt{\rho-\rho_c})$ for an explicit constant $c>0$ depending on the model implementation. This result is in contrast to the behaviour of the class of rank-one models of scale-free networks, including the configuration model, where the decay is polynomial. Our proofs rely on the local neighbourhood approximations of [Dereich, Mörters, 2013] and recent progress in the theory of branching random walks [Gantert, Hu, Shi, 2011].

1 Introduction and main results

Introduction
Sparse random graph models typically undergo a phase transition in their connectivity behaviour depending on the mean number of edges per vertex. A typical case is the Erdős-Rényi graph with $n$ vertices and $m = m(n)$ edges and asymptotic edge density $\rho = \lim m(n)/n$. There exists a critical density $\rho_c = \frac12$ such that if the edge density satisfies $\rho \le \rho_c$ the largest component in the graph comprises a vanishing proportion of vertices, whereas for $\rho > \rho_c$ this proportion $\theta(\rho)$ is strictly positive. The behaviour of $\theta(\rho)$ as $\rho \downarrow \rho_c$ is characterised by an exponent $\beta$ defined by
\[ \theta(\rho) = (\rho - \rho_c)^{\beta + o(1)} \quad\text{as } \rho \downarrow \rho_c, \]
which is $\beta = 1$ in the Erdős-Rényi case. A natural extension of the Erdős-Rényi model is the configuration model, which allows one to construct random graphs with $n$ vertices and a given degree sequence $m_1, \ldots, m_n$. Of particular interest is the case when the degree sequence is heavy-tailed,
\[ \frac1n \#\{1 \le j \le n : m_j = k\} \xrightarrow{n\to\infty} \mu(k) = k^{-\tau + o(1)} \quad\text{as } k \to \infty, \]
where the parameter $\tau > 2$ is the power-law exponent. The connectivity behaviour of the Erdős-Rényi model persists with $\beta = 1$ for the configuration model with $\tau > 4$, and with $\beta = 1/(\tau - 3)$ if $3 < \tau < 4$, see Cohen et al. [4]. The paper by Cohen et al. is based on an informal approximation of local neighbourhoods in the graph by Galton-Watson trees and thereby extends to a wide range of scale-free network models, where similar approximations hold.
Cohen et al. [4] claim their result for scale-free networks in general without specifying a model. This reflects the belief that the behaviour observed in the configuration model extends to all natural scale-free network models, including the class of preferential attachment networks. In the present paper, however, we show for the first time that preferential attachment networks behave in a qualitatively different way from that predicted in [4]. In fact, for all $\tau > 3$ the size of the giant component decays exponentially as one approaches the critical edge density. More precisely, we show that the relative size of the giant component in preferential attachment networks is
\[ \exp\Big(-\frac{c+o(1)}{\sqrt{\rho-\rho_c}}\Big) \quad\text{as } \rho\downarrow\rho_c, \]
where $c$ is an explicit constant depending on the way in which the edge density is controlled. This demonstrates once again that preferential attachment networks belong to a different universality class than the configuration model and other models based on rank-one connection probabilities.
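To make the contrast between the two universality classes concrete, the following minimal sketch compares a polynomial decay $(\rho-\rho_c)^\beta$ with the stretched-exponential decay $\exp(-c/\sqrt{\rho-\rho_c})$. The function names and the value $c=1$ are illustrative assumptions only; the actual constant depends on the model.

```python
import math


def polynomial_giant(eps: float, beta: float = 1.0) -> float:
    """Rank-one type prediction: theta is of order eps**beta, eps = rho - rho_c."""
    return eps ** beta


def stretched_exponential_giant(eps: float, c: float = 1.0) -> float:
    """Preferential attachment type decay exp(-c / sqrt(eps)).

    c = 1.0 is a placeholder, not the constant from the theorems.
    """
    return math.exp(-c / math.sqrt(eps))


# Compare the two decay laws as the distance eps to criticality shrinks.
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    poly = polynomial_giant(eps)
    stretched = stretched_exponential_giant(eps)
    print(f"eps={eps:.0e}  polynomial={poly:.3e}  stretched exponential={stretched:.3e}")
```

Already at moderate distances from criticality the stretched-exponential law is many orders of magnitude below any fixed power of $\rho-\rho_c$.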
The underlying phenomenon of the 'small giant component' or 'slow emergence of the giant' was first discovered and discussed by Oliver Riordan in the seminal paper [16], and in collaboration with Bollobás and Janson in [2] and [3]. Riordan [16] finds that for the original Barabási-Albert model subject to Bernoulli percolation with retention parameter $p$, one has where $m \ge 2$ is the outdegree of all vertices. As this model corresponds to the critical case $\tau = 3$, this is not at odds with the results of Cohen et al. [4]. The merit of our work is to extend the phenomenon of slow emergence to a regime where it is most surprising because it defies the predictions in [4].
In his proof, Riordan exploits the fact that there are local approximations of the network by multitype branching processes whose survival probability can be studied analytically by looking at the associated Laplace operators. These branching processes are more complex than those used in [4].
In the present paper the local approximation is by a branching random walk with killing. Instead of the analytical techniques used by Riordan, which we could not apply effectively in our case due to the higher complexity of the approximating process, we use geometric properties of killed branching random walks. Studying their survival probabilities requires sophisticated techniques from the theory of branching random walks, which became available only in the past few years, see in particular [8]. Our results are therefore pleasingly probabilistic and highly timely.
be a sequence of attachment rules with $\gamma_t := \inf\{ f_t(k) : k \in \mathbb{N}_0 \} < \frac12$ for all $t \ge 0$. We denote by $\rho_t$ and $\alpha^*_t$ the spectral radius and its minimizer corresponding to the score operator for the unpercolated branching random walk derived from attachment rule $f_t$.
Theorem 1.2. Let $(f_t)_{t\ge 0}$ be a pointwise decreasing sequence of attachment rules with $\gamma_t < \frac12$ for all $t \ge 0$ and pointwise limit $f$. Suppose that $\theta(1, f_t) > 0$ for all $t$ and $\theta(1, f) = 0$. Then where $\alpha^*$ and $\rho''(\alpha^*)$ are derived from $f$.
The existence of $\rho$ and $\alpha^*$ corresponding to $f$ is proved in Proposition 2.3. There we will also see that $\lim_{t\to\infty} \rho_t(\alpha^*_t) = 1$. The following corollary exemplifies Theorem 1.2 for linear attachment rules. We denote
\[ \beta_c(\gamma) = \frac{(\frac12 - \gamma)^2}{1 - \gamma} \quad\text{and}\quad \gamma_c(\beta) = \frac12\Big(1 - \beta - \sqrt{\beta^2 + 2\beta}\Big). \]
\[ \lim_{\gamma\downarrow\gamma_c(\beta)} \sqrt{\gamma - \gamma_c(\beta)}\, \log \theta(1, \gamma\,\cdot + \beta) = -\frac{\pi}{2(\beta^2 + 2\beta)^{1/4}}. \]
Remark 1.4. Two cases in our phase diagram are covered in the work of Riordan [16]. The first of these cases corresponds to an approximation from the right of the point $\beta = 0$, $\gamma = \frac12$, which is equivalent to the original Barabási-Albert, or LCD, model. Note that our results refer to the subcritical case $\gamma < \frac12$ and the critical case $\gamma = \frac12$ is not included. The second is the case $\beta = \frac14$, $\gamma = 0$, the Dubins model, in which there is no preferential attachment and our results are consistent with those of Riordan [16].
Corollary 1.3 allows a quantitative comparison of the decay of the giant component for different models. The smaller $\gamma$ (or the larger $\tau$), the slower is the decay. The LCD model, or equivalent models with $\gamma = \frac12$, have faster decay of the size of the giant component than preferential attachment networks with attachment rules satisfying $\gamma < \frac12$.
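The critical curve for linear attachment rules $f(k) = \gamma k + \beta$ can be checked numerically. The sketch below assumes the reading $\beta_c(\gamma) = (\frac12-\gamma)^2/(1-\gamma)$ and $\gamma_c(\beta) = \frac12(1-\beta-\sqrt{\beta^2+2\beta})$ of the two definitions preceding Remark 1.4; the function names are illustrative. It confirms that the two functions are inverse to each other and recovers the two boundary cases mentioned in the remark.

```python
import math


def beta_c(gamma: float) -> float:
    """Critical additive constant for slope gamma in [0, 1/2)."""
    if not 0.0 <= gamma < 0.5:
        raise ValueError("gamma must lie in [0, 1/2)")
    return (0.5 - gamma) ** 2 / (1.0 - gamma)


def gamma_c(beta: float) -> float:
    """Critical slope for additive constant beta >= 0."""
    if beta < 0.0:
        raise ValueError("beta must be nonnegative")
    return 0.5 * (1.0 - beta - math.sqrt(beta * beta + 2.0 * beta))


# The two parametrisations describe the same curve.
for gamma in (0.0, 0.1, 0.2, 0.3, 0.4):
    beta = beta_c(gamma)
    print(f"gamma={gamma:.1f}  beta_c={beta:.6f}  gamma_c(beta_c)={gamma_c(beta):.6f}")

# Boundary cases from Remark 1.4: the Dubins model and the LCD limit.
print("Dubins model:", beta_c(0.0), gamma_c(0.25))
print("LCD limit:", gamma_c(0.0))
```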
Throughout, we will use the following notation: For every integer $k \in \mathbb{N}$, we write $[k] = \{1, \ldots, k\}$.
The remainder of the paper is structured as follows: We prove Theorems 1.1 and 1.2 simultaneously. In Section 2 we collect several auxiliary results that we need later on. In particular, in Section 2.1 we recall the relevant results from [6], which relate the size of the giant component to the survival probability of a multitype branching random walk with killing. In Section 2.2 we derive the main tool to analyse these branching random walks, a version of the well-known many-to-one lemma. In Section 2.
We introduce a pure jump Markov process with generator
\[ Lg(k) := f(k)\,\big(g(k+1) - g(k)\big). \]
This defines an increasing, integer-valued process, which jumps from $k$ to $k + 1$ after an exponential waiting time with mean $1/f(k)$, independently of the previous jumps. Under the probability $P$ we denote by $(Z_t : t \ge 0)$ the process started in zero, by $(\hat Z_t : t \ge 0)$ the process started in one, and by $(Z^{(\tau)}_t : t \ge 0)$ the process started in zero conditioned to have a point at $\tau \ge 0$. This process is used to define a multitype branching random walk with type space $T := [0, \infty) \cup \{\ell\}$, where $\ell$ is a non-numerical symbol for 'left'. A particle in location $x \in \mathbb{R}$ and of type $\tau \in T$ produces offspring to its left whose displacements have the same distribution as the points of the Poisson point process with intensity measure The type of an offspring on the left equals the distance to its parent.
The distribution of the offspring to the right depends on the type of the particle. When the particle is of type $\ell$, the relative positions of its right offspring follow the same distribution as the jump times of $(Z_t : t \ge 0)$. When the particle is of type $\tau \ge 0$, the displacements follow the same distribution as the jump times of $(Z^{(\tau)}_t - \mathbf{1}_{[\tau,\infty)}(t) : t \ge 0)$. All offspring on the right are of type $\ell$.
The offspring to the right do not form a Poisson point process. The more particles are born, the higher the rate of new particles arriving. Moreover, the total number of particles produced is infinite without accumulation point. For a particle of type $\ell$, the expected distance between the particle and its $k$th offspring on the right equals
\[ \sum_{j=0}^{k-1} \frac{1}{f(j)}. \]
Since $\lim_{j\to\infty} f(j)/j = \gamma$, this distance behaves asymptotically like $\gamma^{-1} \log(k)$ when $\gamma > 0$ and like $k$ when $\gamma = 0$.
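The growth of the expected distance to the $k$th right offspring can be illustrated numerically. The sketch below assumes a linear attachment rule $f(j) = \gamma j + \beta$ and a parent of type 'left', so that the jump process starts in zero and the waiting time in state $j$ has mean $1/f(j)$; the function name and the parameter values are illustrative.

```python
import math


def expected_distance(k: int, gamma: float, beta: float) -> float:
    """Sum of the mean waiting times 1/f(j), j = 0, ..., k-1, for f(j) = gamma*j + beta."""
    return sum(1.0 / (gamma * j + beta) for j in range(k))


# Case gamma > 0: the distance grows like log(k) / gamma.
gamma, beta = 0.25, 0.5
for k in (10**2, 10**4, 10**6):
    ratio = expected_distance(k, gamma, beta) / (math.log(k) / gamma)
    print(f"k={k:>7}  ratio to log(k)/gamma = {ratio:.4f}")

# Case gamma = 0: constant rate beta, so the distance is exactly k / beta.
print(expected_distance(10, 0.0, 0.5))
```

The ratio in the first loop approaches one only slowly, since the correction to $\gamma^{-1}\log k$ is of constant order.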
We call the described process the idealized branching random walk (IBRW), in accordance with [6]. Dereich and Mörters [6] show that the genealogical tree of the IBRW is related to the local neighbourhood of a vertex in $G_n$. To obtain a branching process approximation to $(G_n(p) : n \in \mathbb{N})$, we define the percolated IBRW by associating to every offspring in the IBRW an independent Bernoulli($p$) random variable. If the random variable is zero, we delete the offspring together with its descendants. Otherwise, the offspring is retained in the percolated IBRW. When the percolated IBRW is started with one particle in location $x$ and type $\tau$, we write $P^p_{(x,\tau)}$ for its distribution and $E^p_{(x,\tau)}$ for the corresponding integral operator.
The percolated IBRW can be interpreted as a labelled tree $\Gamma$ where every node represents a particle and is connected to its children and (apart from the root) to its parent. The vertices are identified with finite sequences of natural numbers $x = j_1 \ldots j_k$, including the empty sequence $\emptyset$, which denotes the root. We concatenate sequences $x = i_1 \ldots i_k$ and $y = j_1 \ldots j_m$ to form the sequence $xy = i_1 \ldots i_k j_1 \ldots j_m$. When, for $j \in \mathbb{N}$, $xj$ is a vertex in $\Gamma$, then $x, x1, \ldots, x(j-1)$ are also vertices in the tree and $x =: p(xj)$ is the parent of $xj$. The length $|x| = k$ is the generation of $x$. For $|x| \ge k$, we write $p^k(x)$ for the $k$-fold composition of $p(\cdot)$. The ancestor of $x$ in generation $k$ is denoted by $x_k$, i.e. $x_k := p^{n-k}(x)$ when $|x| = n$. In particular, $x_0$ always denotes the root. With every vertex $x$ we associate two quantities: $S(x)$ is the location of the particle on the real line and $\tau(x)$ denotes its type.
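The Ulam-Harris labelling described above translates directly into operations on tuples. The following sketch is an illustration only; representing vertices as Python tuples and the helper names are assumptions, not part of the construction in [6].

```python
from typing import Tuple

Vertex = Tuple[int, ...]  # the empty tuple () plays the role of the root


def parent(x: Vertex) -> Vertex:
    """Return p(x), the parent of a non-root vertex x."""
    if not x:
        raise ValueError("the root has no parent")
    return x[:-1]


def generation(x: Vertex) -> int:
    """Return |x|, the generation of x."""
    return len(x)


def ancestor(x: Vertex, k: int) -> Vertex:
    """Return x_k = p^{n-k}(x), the ancestor of x in generation k, where n = |x|."""
    if not 0 <= k <= len(x):
        raise ValueError("k must satisfy 0 <= k <= |x|")
    return x[:k]


def concat(x: Vertex, y: Vertex) -> Vertex:
    """Return the concatenated sequence xy."""
    return x + y


x = (2, 1, 3)  # third child of the first child of the second child of the root
print(parent(x), generation(x), ancestor(x, 0), ancestor(x, 2), concat(x, (4,)))
```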
To obtain a branching process approximation to the local neighbourhood of a vertex in (G n (p) : n ∈ N), we consider the percolated IBRW with a killing barrier at zero. That is, every particle with location on the nonnegative half-line is deleted together with its descendants. Dereich and Mörters prove the following identification.
Theorem 2.1 (Dereich and Mörters [6]). For all $p \in [0, 1]$ and attachment rules $f$, $\theta(p, f)$ equals the survival probability of the percolated IBRW with a killing barrier at zero, started with one particle of type $\ell$ whose location is given by $-E$, where $E$ is an exponential random variable with mean one.
Next we collect some spectral properties that will be used in the analysis of the IBRW.
Denote by $C(T)$ the Banach space of bounded, continuous functions on $T$ equipped with the supremum norm $\|\cdot\|$. For $\alpha \in (0, 1)$, we consider the score operator The spectral radius of $A^p_\alpha$ is denoted by $\rho_p(\alpha)$. The dependence of $A^p_\alpha$ on the attachment rule $f$ is suppressed in the notation, but it will always be clear from the context which $f$ is considered. Since $A^p_\alpha = p A_\alpha$ by definition, it suffices to analyse $A_\alpha$. We write $\mathbf{1}$ for the constant function with value 1 and let $I := (\gamma, 1 - \gamma)$ for $\gamma < \frac12$ and $I = \emptyset$ for $\gamma \ge \frac12$.
Lemma 2.2. Let $p \in (0, 1]$. If $\gamma \ge \frac12$, then $A^p_\alpha \mathbf{1}(0) = \infty$. If $\gamma < \frac12$, then the following holds:
(i) $\rho_p(\alpha)$ is finite for all $\alpha \in I$, $\rho_p(\alpha) = p\rho(\alpha)$ and $\rho(\alpha) \to \infty$ for $\alpha \to \partial I$.
(ii) There exists a unique positive eigenfunction $v_\alpha$ of $A^p_\alpha$ corresponding to $\rho_p(\alpha)$ with $\|v_\alpha\| = 1$. Moreover, $v_\alpha$ does not depend on the retention probability $p$ and $\min_{\tau \in T} v_\alpha(\tau) > 0$.
(iii) The function $\rho$ is twice differentiable on $I$ with The function $\rho$ is strictly convex on $I$ and there exists a unique minimizer $\alpha^* \in I$.
Proof. By (2.2), it suffices to prove Lemma 2.2 in the case $p = 1$. For that case, it was shown in [6, Lemma 3.1] that $A_\alpha \mathbf{1}(0) < \infty$ is equivalent to $A_\alpha$ being a strongly positive, compact operator with $A_\alpha g \in C(T)$ for all $g \in C(T)$. Moreover, it is proved that for all $g \in C(T)$, Here, for example, $A_\alpha g$ means that for all $\tau \ge 0$, . The values of $a$ and $c$ are identified as (cf. proof of Proposition 1.10 in [6]) We analyse the convergence properties of $a(\alpha)$.
To prove (v), consider $\tau \ge 0$. Let $\hat Z^{(\tau)}_t = Z^{(\tau)}_t - \mathbf{1}_{[\tau,\infty)}(t)$ for any $\tau \in [0, \infty)$. Then, by the definition of the eigenfunction, where we used in the inequality that the distribution of the positions to the left of the origin does not depend on the initial type, and for the second expectation we used the monotonicity in types proved in [6]. The upper bound holds by a similar argument.
The next proposition collects some of the consequences for the spectral radius if we consider converging attachment rules.
and let $\rho(\alpha)$ be the spectral radius of the operator associated to the branching process with attachment rule $f$, let $\alpha^*$ be its unique minimizer and let $v_\alpha$ be the corresponding eigenfunction with $\|v_\alpha\| = 1$. Define the same quantities with index $t$ when referring to the branching process associated to $f_t$, where we set
(i) $\rho(\alpha) = \lim_{t\to\infty} \rho_t(\alpha)$ for all $\alpha \in (\gamma, 1 - \gamma)$.
(iv) Moreover, as $t \to \infty$,
Proof. Since $(f_t)_{t\ge 0}$ is a pointwise decreasing sequence of positive functions, the pointwise limit $f$ exists. As an infimum of concave functions, $f \colon \mathbb{N}_0 \to [0, \infty)$ is concave. The increments might in general only satisfy $f(k + 1) - f(k) \le 1$, but the strict inequality is not needed for the analysis of the branching process and the corresponding operators.
The assumption $\rho_t(\alpha^*_t) > 1$ implies that there exists a giant component for all $t \ge 0$. Hence, $a_{f_t}(1/2) + a_{f_t}(1/2)\, c_{f_t}(1/2) > 1$ for all $t \ge 0$, by Proposition 1.10 in [6]. Here $a_{f_t}$ and $c_{f_t}$ are the functions given in (2.4). By monotone convergence, $a_f(1/2) + a_f(1/2)\, c_f(1/2) \ge 1$ and, in particular, $f(0) > 0$. Hence, $f$ is an attachment rule. From now on we add a subscript $t$ to all quantities corresponding to $f_t$ and no subscript for quantities corresponding to $f$.
The assumption $\gamma_t < \frac12$ is needed to ensure that the operator $A_{t,\alpha}$ is well defined for some $\alpha$. We have for all $t \le s$ In particular, $(\gamma_t)_{t\ge 0}$ is a non-increasing sequence. Let $I = (\gamma, 1 - \gamma)$ and $I_t = (\gamma_t, 1 - \gamma_t)$ for $t \ge 0$. Then $I_t \subseteq I$ for all $t \ge 0$ and we write $\rho_t(\alpha) = \infty$ whenever $\alpha \notin I_t$.
Let $\alpha \in I$. We use the monotonicity of the branching process in the attachment rule to derive In particular, for every $\alpha \in I$ there exists $t \ge 0$ such that $\rho_t(\alpha) < \infty$. Hence, In particular, we can consider the family $\rho$, $(\rho_t)_{t \ge t_0}$ of uniformly continuous functions on $\hat I$.
In the next step we argue that $\alpha^*_t \to \alpha^*$ as $t \to \infty$. Suppose this is not the case. Then there exists a $\delta > 0$ and a subsequence $t_n \uparrow \infty$ such that $|\alpha^*_{t_n} - \alpha^*| \ge \delta$ for all $n \in \mathbb{N}$. Since $\rho$ is strictly convex with unique minimizer $\alpha^*$, we have In particular, Since the term on the left-hand side converges to 1, this is a contradiction and the convergence $\alpha^*_t \to \alpha^*$ as $t \to \infty$ is established.
The fact that $\alpha^*_t$ converges to $\alpha^*$ and $f_t$ converges to $f$ implies that $A_{t,\alpha^*_t}$ converges to $A_{\alpha^*}$ because of the uniform continuity of the operator in $\alpha \in \hat I$. Since the eigenspaces of $\rho(\alpha^*)$ and $\rho_t(\alpha^*_t)$ are one dimensional, one can argue along the lines of Note 3 on Chapter II in [11, pages 568-569] is uniformly bounded in $t$ from zero and infinity. With the observed convergences and this uniform bound, (2.3) now implies that also $\rho''_t(\alpha^*_t) \to \rho''(\alpha^*)$, as required.

The many-to-one Lemma
We first continue the analysis of the IBRW. The following lemma is based on a spine construction which is known as Lyons' change of measure [13]. Recall the Ulam-Harris notation from Section 2.1.

Lemma 2.4 (Many-to-one).
For all $\alpha \in I$ there exists a probability measure $P^\alpha$ on some measurable space, and a Markov process $((S_n, \tau_n) : n \in \mathbb{N}_0; P^\alpha)$ with state space $\mathbb{R} \times T$, such that for all $n \in \mathbb{N}$, (2.5) Note that it is easy to check that the distribution of $((S_n, \tau_n) : n \in \mathbb{N}_0; P^\alpha)$ does not depend on the percolation parameter $p$.
Proof. Given a labelled tree $(\Gamma, L)$, with $L(x) = (S(x), \tau(x))$, we can distinguish an ancestral line $\xi = (\xi_1, \xi_2, \ldots)$ which we call the spine. In the space of labelled trees, we denote by $\mathcal{F}_n$ the $\sigma$-field generated by the first $n$ generations, $\mathcal{F}_n = \sigma((x, L(x)) : |x| \le n)$. The analogue in the space of trees with spines is denoted by $\mathcal{F}^*_n$. For every $(s_0, \tau_0) \in \mathbb{R} \times T$, the distribution of the IBRW started in $(s_0, \tau_0)$ can be interpreted as a distribution $P^p_{(s_0,\tau_0)}$ on the set of labelled trees. We extend this measure to the space of labelled trees with spines. Since $(s_0, \tau_0)$ and $p$ will remain fixed throughout the proof, we omit them from the notation and write $P = P^p_{(s_0,\tau_0)}$ for brevity. Note that every $\mathcal{F}^*_n$-measurable function $g$ can be written as for $\mathcal{F}_n$-measurable functions $g_x$ (see page 24 in [17]). We define $P^*_n$ to be the (non-probability) measure on $\mathcal{F}^*_n$ such that for all nonnegative $\mathcal{F}^*_n$-measurable functions $g$,
We now construct a new branching random walk under a new probability measure $P^\alpha$. The root has again label $L(\emptyset) = (S(\emptyset), \tau(\emptyset)) = (s_0, \tau_0)$. A particle $\xi_n$ on the spine in generation $n$ with label $(S(\xi_n), \tau(\xi_n))$ produces new offspring with distribution for all atomic measures $\mu$ on $\mathbb{R} \times T$. Here $L_\sigma$ denotes the offspring distribution for a particle of type $\sigma \in T$ in the original process. The new spine particle $\xi_{n+1}$ in generation $n + 1$ is chosen from the offspring of $\xi_n$ by choosing an offspring $x$ with probability proportional to Off the spine the new branching random walk behaves exactly as the original one. Then In particular, for all $F$ : We define $(S_n, \tau_n) := (S(\xi_n), \tau(\xi_n))$ and $P^\alpha_{(s_0,\tau_0)} = P^\alpha$. The Markov property follows from the definition of the process.
Since the offspring distribution of the spine is absolutely continuous with respect to the offspring distribution of the original process and the type of the original process is a function of the locations of its ancestors and the particle itself, the proof is complete.
Next we also need a higher-dimensional version of the many-to-one lemma that includes the number of offspring of the particles on the spine.
Lemma 2.5. For all $\alpha \in I$, $(s_0, \tau_0) \in \mathbb{R} \times T$ there exists a probability measure $P^\alpha_{(s_0,\tau_0)} = P^{\alpha,p}_{(s_0,\tau_0)}$ and a Markov process Moreover, for any measurable $A \subset \mathbb{R}$ and any measurable $F$, Note that unlike in Lemma 2.4 the distribution of $((S_k, \tau_k, \nu_{k-1}))_{k\in\mathbb{N}}$ does depend on $p$, since we are considering a non-linear function of the point process describing the position of the offspring.
Proof. Consider the IBRW with spine $\xi$ under the measure $P^\alpha$ as constructed in the proof of Lemma 2.4. Then, define $(S_n, \tau_n, \nu_{n-1}) := (S(\xi_n), \tau(\xi_n), \nu_{\xi_{n-1}})$ and the first statement follows since we know the explicit Radon-Nikodym density of $P^\alpha$ with respect to $P^*_n$. The second statement is a consequence of the Markov property of the IBRW with spine combined with a suitable choice of test function $F$.
Lemma 2.6 (Moments). Let $\alpha \in I$ and $((S_n, \tau_n) : n \in \mathbb{N}_0)$ be the Markov process from Lemma 2.4.

Asymptotic moment estimates
In the proofs of Theorems 1.1 and 1.2 we will need estimates for moments of the Markov chains defined in Lemmas 2.4 and 2.5. Suppose that $(f_{t_n})$ is a sequence of decreasing attachment rules such that $f_{t_n} \downarrow f$ pointwise for an attachment rule $f$. Further, suppose that $\gamma_n := \lim_{k\to\infty} f_{t_n}(k)/k < \frac12$ for all $n \in \mathbb{N}$, so that by Proposition 2.3 also $\gamma = \lim_{k\to\infty} f(k)/k < \frac12$. Let $\alpha^*_n$, resp. $\alpha^*$, be the unique minimizer of the spectral radius $\rho_n$, resp. $\rho$, of the operator corresponding to the attachment rule $f_{t_n}$, resp. $f$. In the setting of Theorem 1.1 take $f_{t_n} = f$ and write $\rho_n = \rho_{p_n} = p_n \rho(\cdot)$.
The Markov chain from Lemma 2.4 corresponding to attachment rule $f_{t_n}$ is denoted by (
Proof. We first consider the case $S^{(n)}_1 \ge 0$. Note that where we used that $\rho_n(\alpha_n) > 1$, the monotonicity in types for $v_{\alpha_n}$ by Lemma 2.2 and the uniform boundedness of the quotient $v_{\alpha_n}(0)/v_{\alpha_n}(\ell)$ in $n$ by Proposition 2.3. Hence, it suffices to show that for the right choice of $\eta$ the expectation on the right hand side remains bounded in $n$. Now, by Lemma 2.2, we have that $\eta := \frac14(\alpha^* - \gamma) > 0$. By Proposition 2.3, we have that $\gamma_n \downarrow \gamma$ and $\alpha^*_n \to \alpha^*$. Then, we can choose $n_0$ sufficiently large such that for all $n \ge n_0$ we have $\gamma_n < \gamma + \eta$ and $\alpha_n \ge \alpha^* - \eta$.
Furthermore, we denote by $\hat Z^{(n_0)}$ the jump process that jumps from $k$ to $k + 1$ with rate $f_{t_{n_0}}(k)$ started in 1. Then by comparison with a Yule process with constant branching rate, we can find a constant Finally, we obtain from the construction of the IBRW and using (2.6) that for $n \ge n_0$ which is finite by choice of $\eta$.
The fact that the supremum over all $n$ is finite follows from the same argument if we redefine $\eta$ as
For the case $S^{(n)}_1 \le 0$, it suffices to prove in a second step that there exists $\eta > 0$ such that Since the children to the left of a particle in the IBRW form a Poisson process and their distribution does not depend on the type of their ancestor, we have by construction where $X^{(f)}$ is the pure birth process with jump rates given by $f$.
k . By Lemma 2.2, n , > 0. Using that f t k (k) ≤ k + 1 and a comparison to a Yule process, we have that there exists C n > 0 such that By Proposition 2.3, we can find n 0 such that for all n ≥ n 0 , Furthermore, for n ≥ n 0 , we can use the monotonicity of f tn to deduce that which completes the second step and thus the proof of the lemma.
be the Markov chain defined in Lemma 2.5 either for attachment rule $f_{t_n}$ and percolation parameter 1 or for fixed attachment rule $f$ (with $\gamma < \frac12$) and percolation parameter $p_n$, where $p_n \downarrow \rho(\alpha^*)^{-1}$. For any sequence $(M_n)_{n\in\mathbb{N}}$ such that $M_n \to \infty$, there exist constants $C > 0$ and $\gamma > 0$ such that, for all $n$,
Proof. To unify notation define $p_n = 1$ in the varying $f$ case and $f_{t_n} = f$ in the percolation case. By the extended many-to-one formula, Lemma 2.5, we have that We can use that $\rho_n(\alpha^*_n) > 1$, the monotonicity in types and in $p_n$, Lemma 2.2, and that $C := \sup_{n\in\mathbb{N}} v_{\alpha_n}(0)/v_{\alpha_n}(\ell) < \infty$ by Proposition 2.3, in order to bound the above by (2.7) For the first term in (2.7), we note that Therefore, if we let $(\hat Z^{f_{t_1}}_t)_{t\ge 0}$ be the jump process jumping from $k$ to $k + 1$ at rate $f_{t_1}(k)$ started in 1, we can conclude that by construction of the IBRW for some constants $C(f_{t_1})$ and $\tilde\gamma > 0$, where the latter bound follows by comparison with a Yule process, whose second moments grow at most exponentially.
For the second term on the right hand side in (2.7), the first expectation is bounded uniformly in n by (the second part of) Lemma 2.7 and the second expectation can be bounded by the second moment, so that the first part of the argument applies.
For the final term in (2.7), we use that the particles to the left form a Poisson process, so that we can use a standard identity for Poisson processes, see e.g. [12, Equation (4.26)], to deduce that However, as in the proof of Lemma 2.7, the right hand side is bounded uniformly in $n$.
Corollary 2.9. In the setting of Lemma 2.8, we have for any sequence $N = N_n \to \infty$ and with $R_n = e^{N^{1/4}}$, $M_n = N^{1/5}$ that there exist $\tilde C, \tilde\gamma > 0$ such that
\[ \le e^{-\eta M_n} + \frac{1}{R_n}\, C e^{\gamma M_n} = e^{-\eta N^{1/5}} + C e^{\gamma N^{1/5} - N^{1/4}}, \]
by our choice of $M_n$ and $R_n$. Therefore, the statement of the corollary follows by choosing $\tilde\gamma$ and $\tilde C$ appropriately.
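The bound $e^{-\eta N^{1/5}} + C e^{\gamma N^{1/5} - N^{1/4}}$ tends to zero because $N^{1/4}$ eventually dominates $\gamma N^{1/5}$, which happens once $N^{1/20} > \gamma$. A minimal numerical sketch follows; the values $\eta = C = \gamma = 1$ are placeholders chosen for illustration and are not the constants of Lemma 2.8.

```python
import math


def corollary_bound(N: float, eta: float = 1.0, C: float = 1.0, gamma: float = 1.0) -> float:
    """Evaluate exp(-eta*N^(1/5)) + C*exp(gamma*N^(1/5) - N^(1/4)).

    eta, C and gamma are hypothetical constants used only for illustration.
    """
    M = N ** 0.2          # M_n = N^(1/5)
    log_R = N ** 0.25     # log R_n = N^(1/4)
    return math.exp(-eta * M) + C * math.exp(gamma * M - log_R)


for N in (1e2, 1e4, 1e6, 1e8):
    print(f"N={N:.0e}  bound={corollary_bound(N):.3e}")
```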

Mogulskii's theorem
The main technical tool in the proof of our main result is the following large deviation result due to Mogulskii in its original form. We state it here in a version adapted to Markov chains as a generalisation of the version for random walks found in [8].
Theorem 2.10 ([14], [8]). Let $T$ be a nonempty set. We assume the following:
(ii) $(a_n)_{n\in\mathbb{N}}$ and $(k_n)_{n\in\mathbb{N}}$ are positive sequences with $a_n, k_n \to \infty$ and $a_n^2/k_n \to 0$ as $n \to \infty$.
(iii) For all $c_1, c_2, c_3, c_4 \in \mathbb{R}$, $A > 0$ and for $r_n := A a_n^2$, uniformly in $\tau \in T$, where $\sigma^2 > 0$ is a constant independent of $A$ and $(W_t : t \ge 0; P)$ is a standard Brownian motion.
We will now show that we can apply Mogulskii's results to the Markov chains from Lemmas 2.4 and 2.5. We will treat both the setting of Theorem 1.1 and 1.2 at the same time and so continue using the notation introduced at the beginning of Section 2.3.
To this end, we first recall Donsker's theorem for martingale difference arrays (see for example Theorem 7.7.3 in [7] or Theorem 18.2 in [1]). The theorem is usually stated for $r_n = n$, but it is straightforward to generalize the statement to the following:
Proposition 2.11. Let $(r_n)_{n\in\mathbb{N}} \in \mathbb{N}^{\mathbb{N}}$ be a sequence with $r_n \uparrow \infty$ as $n \to \infty$. For every $n \in \mathbb{N}$, let $(\xi^n_i : 1 \le i \le r_n)$ be a family of random variables and denote $\mathcal{F}^n_i = \sigma(\xi^n_1, \ldots, \xi^n_i)$ for all $i \le r_n$. Assume that Then the linear interpolation of $(\sum_{i \le m} \xi^n_i : m \le r_n)$ converges weakly to a standard Brownian motion on $[0, 1]$.
We use Donsker's theorem as follows.
Lemma 2.12. Let $A > 0$, let $(a_n)_{n\in\mathbb{N}}$ be a positive sequence with $\lim_{n\to\infty} a_n = \infty$ and write $r_n = A a_n^2$ for each $n \in \mathbb{N}$. Moreover, let $(S^{(n)}_i, \tau^{(n)}_i)_{i\in\mathbb{N}_0}$ be the Markov chain introduced in Lemma 2.4 for attachment rule $f_{t_n}$. For all $c_1, c_2, c_3, c_4 \in \mathbb{R}$, as $n \to \infty$, uniformly in $\tau \in T$, where $(W_t : t \ge 0; P)$ is a standard Brownian motion.
Proof. The first step is to show that the conditions of Proposition 2.11 are satisfied by the random variables by Lemma 2.6 and $\rho'_n(\alpha^*_n) = 0$ as $\alpha^*_n$ minimizes $\rho_n$. Moreover, as $n \to \infty$, since $r_n \to \infty$ and $\rho''_n(\alpha^*_n)/\rho_n(\alpha^*_n) \to \sigma^2$ by definition. Thus, Condition (ii) of Proposition 2.11 is satisfied. Finally, let $\eta$ be as in Lemma 2.7, then where $C$ is a constant such that $x^2 \le C e^{\eta|x|}$ for all $x \in \mathbb{R}$. Then, by Lemma 2.7 the exponential moment is bounded uniformly in $n$, so that the right hand side converges to zero as required.
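The constant $C$ with $x^2 \le C e^{\eta|x|}$ used in the last step can be made explicit. The function $x \mapsto x^2 e^{-\eta x}$ on $[0,\infty)$ is maximal at $x = 2/\eta$, so the smallest admissible constant is $C = 4/(e\eta)^2$. The sketch below checks this on a grid; the value $\eta = 0.5$ and the function name are illustrative assumptions.

```python
import math


def smallest_constant(eta: float) -> float:
    """Smallest C with x**2 <= C * exp(eta * |x|) for all real x, namely 4 / (e * eta)**2."""
    if eta <= 0.0:
        raise ValueError("eta must be positive")
    return 4.0 / (math.e * eta) ** 2


eta = 0.5  # illustrative value only
C = smallest_constant(eta)

# Verify the inequality on a grid, with a small relative tolerance for rounding.
grid = [i / 100.0 for i in range(-5000, 5001)]
holds = all(x * x <= C * math.exp(eta * abs(x)) * (1.0 + 1e-12) for x in grid)
print(f"C = {C:.6f}, inequality holds on grid: {holds}")

# At the maximiser x = 2 / eta both sides agree up to rounding.
x_star = 2.0 / eta
print(x_star ** 2, C * math.exp(eta * x_star))
```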
To apply Theorem 2.10 we have to check that the convergence of the distribution functions is uniform in the start type. This is guaranteed by the monotonicity of the IBRW in the start type (which was proven in [6, Remark 2.6]), which entails a monotonicity of $(S^{(n)}_i)$ by the many-to-one lemma, and by the fact that the limit is independent of the start type.
$(\tilde S^{(n)}_k, \tilde\tau^{(n)}_k, \tilde\nu^{(n)}_{k-1})_{k\in\mathbb{N}_0}$ with filtration $(\mathcal{F}^{(n)}_k)_{k\in\mathbb{N}_0}$ and transitions given for any measurable $F$ by where $(S^{(n)}_1, \tau^{(n)}_1, \nu^{(n)}_0)$ is the first step of the Markov chain defined in Lemma 2.5 associated either to the IBRW with attachment rule $f_{t_n}$ and $p = 1$ or with attachment rule $f$ and percolation parameter $p_n$. Let $A > 0$, write $r_n = A a_n^2$. For all $c_1, c_2, c_3, c_4 \in \mathbb{R}$, as $n \to \infty$, uniformly in $\tau \in T$, where $(W_t : t \ge 0; P)$ is a standard Brownian motion.
Proof. We show that we can replace $\tilde S^{(n)}_i$ by $S^{(n)}_i$ in the above probability up to an error that converges to 0 uniformly in $\tau$ and then invoke Lemma 2.12. Define the event $E_n := \{\nu^{(n)}_0 \le R_n,\ S^{(n)}_1 - S^{(n)}_0 \le M_n\}$ and note that by Corollary 2.9, there exist constants $\tilde C, \tilde\gamma > 0$ such that We estimate for any $c_i$ where we assume that $n$ is sufficiently large and we used (2.11) together with $\frac{1}{1-x} \le 1 + 2x$ for $x \in [0, \frac12]$. Iterating this estimate yields and we note that by the choice of $r_n$ the error converges to 0.
For a lower bound, we estimate Iterating and using the bound (2.11) gives where the error converges to 0 by choice of $r_n$. Then, combining (2.12) and (2.13) together with Lemma 2.12, we can deduce the statement of the lemma.

Proofs: Upper bound
In this section, we fix the start type $\ell$ of the IBRW. Recall that in the killed IBRW every particle $x$ with $S(x) > 0$ is deleted together with its descendants. We denote its survival probability by $\zeta$, that is, for $s_0 \le 0$, $\zeta_{s_0}(p, f) = P^p_{(s_0,\ell)}(\text{killed IBRW survives})$.

Proof. By definition,
For the first summand we use the Markov inequality and Lemma 2.4 to derive where we used in the last step that by Lemma 2.
Proof. Using Lemma 3.1, that j → I(j) is decreasing, ρ(α) ≥ 1 and that j → e −αbj is increasing to obtain For the proof of Theorem 1.1, let (p n ) n∈N be a sequence of retention probabilities with p n ↓ p c . For Theorem 1.2, let (t n ) n∈N be a sequence of parameters with t n ↑ ∞. We write, ρ n (·) = ρ pn (·) = p n ρ(·) for Theorem 1. Moreover, we denote by v n the eigenfunction for ρ n (α * n ) from Lemma 2.2. The Markov chain from Lemma 2.4 corresponding to α = α * n and retention parameter p n or attachment rule f tn is denoted by ((S (n) i , τ (n) i ) : i ∈ N; P), i.e. P = P α * n (s0, ) . One easily checks that in the setup of Theorem 1.1 the distribution of the Markov chain does not depend on n.
Notice that the specific choice of parameters implies that By Lemma 2.12 we can apply Theorem 2.10 with a n = (lk n ) 1/3 , n = lk n and g 2 (t) = a −1/2 (C + δ) l N , and then take δ 0 to derive lim sup Moreover, as n → ∞, 1 2 n (lk n ) Proof of the upper bounds in Theorems 1.1 and 1.2. Since θ(p, f ) is non-decreasing in retention probability p and attachment rule f , it suffices to consider the asymptotic behaviour along a discrete subsequence. As before, for Theorem 1.1 we take any discrete sequence of retention probabilities p n ↓ p c and for Theorem 1.2 we take any discrete sequence t n ↑ ∞. We make use of the notation introduced in and before Lemma 3.3. In particular, we fix N ∈ N, a > 0 and b > 0 and let k n := (a/ n ) Applying Lemma 3.2 with α = α * n , we obtain for every N ∈ N, a > 0, b > 0, C > 0 log sup Recall that the choice of parameters implies that Taking N to infinity, we deduce lim sup Now, we consider any C < π 2 σ 2 2 and we choose b such that b < π 2 σ 2 2 − C.

Proofs: Lower bound
The general strategy of the lower bound is to identify a subtree of the IBRW that has the same distribution as a Galton-Watson tree. For this carefully chosen subtree we then lower bound the survival probability, which in return gives the required lower bound on the survival probability of the IBRW. In Section 4.1 we collect some general facts about Galton-Watson trees, which we will then use in Section 4.2 to carry out the proof of the lower bound.

Galton-Watson lemmas
In order to show the lower bound we construct a Galton-Watson tree, whose particles are a subfamily of the killed branching random walk. To estimate the survival probability of this Galton-Watson tree, we will use the following general lemma due to [8] and we also recall the proof for completeness.
Lemma 4.1. Let GW be a Galton-Watson tree and denote by $X$ the number of children in the first generation and by $q$ its extinction probability. Then, for all $r \le \min\{\frac18, q\}$,
Proof. Denote by $s \mapsto g(s) = E[s^X]$ the generating function of GW. Then, for every $r \in [0, q]$, where we used that $g'(s)$ is increasing, and $g'(s) \le 1$ for all $s \in [0, q]$. We continue by estimating where we used for the second summand that $u \mapsto u e^{-ru}$ is decreasing on Then, for all $r \in (0, \frac18]$, $\frac{1}{1-r} \le 2$ and $\frac{1}{1-r}\, r^{-2} e^{-1/r} \le r$. Combining (4.2) and (4.3), we deduce Rearranging (4.1) and (4.4), we conclude that for all $r \le \min\{\frac18, q\}$, as required.
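The two elementary inequalities used in the proof for $r \in (0, \frac18]$, namely $\frac{1}{1-r} \le 2$ and $\frac{1}{1-r} r^{-2} e^{-1/r} \le r$, can be confirmed numerically. The following is a small sanity check on a grid rather than a proof; the helper name is an illustrative assumption.

```python
import math


def lemma_terms(r: float) -> tuple:
    """Return (1/(1-r), (1/(1-r)) * r**-2 * exp(-1/r)) for r in (0, 1)."""
    a = 1.0 / (1.0 - r)
    b = a * r ** -2 * math.exp(-1.0 / r)
    return a, b


# Grid of 1000 points covering (0, 1/8], with right endpoint exactly 1/8.
grid = [i / 8000.0 for i in range(1, 1001)]
first_ok = all(lemma_terms(r)[0] <= 2.0 for r in grid)
second_ok = all(lemma_terms(r)[1] <= r for r in grid)
print(first_ok, second_ok)
print(lemma_terms(0.125))
```

For very small $r$ the factor $e^{-1/r}$ underflows to zero in floating point, which is harmless here since the inequality only becomes easier.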
We will also need an estimate which guarantees that a supercritical Galton-Watson process grows exponentially fast on survival with large probability.
Lemma 4.2. For all $\theta_1 > 1 > \theta_2 > 0$ there exists $\delta > 0$ such that for any Galton-Watson process $(X_n : n \in \mathbb{N})$ with $X_0 = 1$ whose mean offspring number $m$, offspring distribution $(p_k)_{k\in\mathbb{N}_0}$, and extinction probability $q$ satisfy $q, p_1 < \delta$ and $m > 1/\delta$, we have, for sufficiently large $n$,
Proof. Denote by $g$ the generating function of $(X_n : n \in \mathbb{N})$. By pruning the tree, i.e. removing all finite subtrees, we obtain a tree which on survival of the original process equals a Galton-Watson process $(\tilde X_n : n \in \mathbb{N})$ with $\tilde p_0 = 0$, $\tilde p_1 = g'(q) \le p_1 + \frac{2q}{(1-q)^2}$, and the same mean as the original process. Hence $\delta > 0$ can be chosen such that the pruned process has arbitrarily small $\tilde p_1$ and arbitrarily large mean.
Choosing δ > 0 so small that EX 1 > θ 2 1 and √p 1 < θ 2 , for sufficiently large n, the denominator is larger than 1/2 and the numerator is smaller than 1 8 θ n 2 , so that Note that if we condition X n on extinction in distribution it is equal to a Galton-Watson process X * n with mean f (q) =p 1 . Therefore, by Markov's inequality by the same assumptions on δ as above for n large. We can also assume that δ is sufficiently small, so that the extinction probability q is less than 1/2. Hence, combining the above estimates, which completes the proof.
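The pruning step in the proof of Lemma 4.2 can be checked numerically. The sketch below assumes that the quantity written $f(q)$ in the proof is the derivative $g'(q)$ of the generating function at the extinction probability; the example offspring law and the helper names are hypothetical. It builds the offspring law of the pruned tree and compares its first weight with $g'(q)$, with the bound $p_1 + 2q/(1-q)^2$, and its mean with the original mean.

```python
from math import comb


def extinction_probability(pmf, tol=1e-15, max_iter=100_000):
    """Smallest fixed point of g(s) = sum_k pmf[k] * s**k, found by iterating from 0."""
    q = 0.0
    for _ in range(max_iter):
        nxt = sum(p * q**k for k, p in enumerate(pmf))
        if abs(nxt - q) < tol:
            return nxt
        q = nxt
    return q


def pruned_offspring_law(pmf):
    """Offspring law of the surviving tree after removing all finite subtrees.

    A particle with j children keeps each child independently with
    probability 1 - q (that child's subtree is infinite), conditioned on
    the particle itself having an infinite line of descent.
    """
    q = extinction_probability(pmf)
    pruned = [0.0] * len(pmf)
    for j, p_j in enumerate(pmf):
        for k in range(1, j + 1):
            pruned[k] += p_j * comb(j, k) * (1 - q) ** (k - 1) * q ** (j - k)
    return q, pruned


# Hypothetical supercritical offspring law on {0, 1, 2, 3, 4}.
pmf = [0.05, 0.05, 0.20, 0.30, 0.40]
q, pruned = pruned_offspring_law(pmf)

g_prime_q = sum(k * p * q ** (k - 1) for k, p in enumerate(pmf) if k >= 1)
bound = pmf[1] + 2 * q / (1 - q) ** 2
mean = sum(k * p for k, p in enumerate(pmf))
pruned_mean = sum(k * p for k, p in enumerate(pruned))

print(f"q = {q:.6f}, pruned p1 = {pruned[1]:.6f}, g'(q) = {g_prime_q:.6f}, bound = {bound:.6f}")
print(f"mean = {mean:.6f}, pruned mean = {pruned_mean:.6f}")
```

For small $q$ and $p_1$ the pruned law puts little mass on a single child, which is the regime that the choice of $\delta$ in the lemma enforces.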

The lower bound
Throughout, we use the same notation as in the upper bound. For the proof of Theorem 1.1, let $(p_n)_{n \in \mathbb{N}}$ be a sequence of retention probabilities with $p_n \downarrow p_c$. For Theorem 1.2, let $(t_n)_{n \in \mathbb{N}}$ be a sequence of parameters with $t_n \uparrow \infty$. We denote by $S^{(n)}$ the positions either in the percolated IBRW or in the IBRW with attachment rule $f_{t_n}$. If the context is clear, we omit the superscript. We also write
$$\rho_n(\cdot) = \begin{cases} p_n \rho(\cdot) & \text{for Theorem 1.1}, \\ \rho_{t_n}(\cdot) & \text{for Theorem 1.2}, \end{cases} \qquad \text{and} \qquad \alpha^*_n := \begin{cases} \alpha^* & \text{for Theorem 1.1}, \\ \alpha^*_{t_n} & \text{for Theorem 1.2}. \end{cases}$$
Given any starting point $s \ge 0$ and initial type $\tau$, we write $P = P_{(s,\tau)} = P^{p_n}_{(s,\tau)}$ in the percolation case and $P = P_{(s,\tau)} = P^{1}_{(s,\tau)}$ in the case of Theorem 1.2. In view of Lemma 4.1, we now choose a Galton-Watson tree $GW_n$ as a subtree of the killed IBRW in the following way, where we denote by $X^{(n)}$ the number of children in the first generation of $GW_n$:
(a) $P(X^{(n)} \ne 0)$ ≈ the survival probability of the Galton-Watson process. That is, when there are offspring, the process usually survives.
(b) $P(X^{(n)} \ne 0)$ is close to the survival probability of the killed IBRW. That is, the chosen subpopulation is a good approximation of the BRW, and the first inequality in (4.5) is a good estimate.
The Galton-Watson tree is obtained by a coarse-graining procedure, which we now describe. It involves positive parameters $b, \lambda, \theta, M$, which will be chosen carefully at a later stage of the proof. We group together the first $N + o(N)$ generations in the IBRW to form the first generation in $GW_n$. It turns out that we have to choose $N = N_n$ such that $N_n = (b/\varepsilon_n)^{3/2}$, and for the first $N$ steps we only choose particles whose positions are in the interval
To be precise, let $L(N) = N + N^{1/3}$ and
Then we define $C_n$ to be the particles in the first generation of $GW_n$. We include the last $N^{1/3}$ generations to make sure that if we survive until time $N$, then we will have many particles by time $L(N)$.
We iterate the procedure, i.e. the children of $y \in C_n$ will be
Moreover, if we assume that
$$M < \theta b, \tag{4.7}$$
then all particle positions satisfy $S(x_i) - S(x_0) \le 0$ for all $i \le |x|$ and $x \in C_n$, so that $GW_n$ is indeed a subset of the killed branching random walk.

Coupling with a Galton-Watson process
To control the contribution of the last $N^{1/3}$ steps of the branching random walk, we use the following coupling: We can couple the IBRW with a modified IBRW, where in generations $kL(N)+N, \dots, kL(N)+L(N)-1$, for $k \in \mathbb{N}_0$, the particles place their offspring according to the following rules relative to their own position:
(i) to the right, the positions of the offspring follow the jump times of the birth process $(Z^{(f)}_t)$ started in 0, which jumps from $k$ to $k+1$ at rate $f(k)$.
(ii) to the left, the positions of the offspring are given by a Poisson point process with intensity where $(Z^{(f)}_t)$ is the birth process with jump rate $f$ started in 0.
(iii) Also, in these generations types do not play a role, so we define all particles to have the same type as $S(\emptyset)$ in the original process.
Now, we define the Galton-Watson tree $\widetilde{GW}_n$ similarly to above in terms of the modified IBRW, where we again denote by $\tilde C_n$ the individuals in the first generation. Since $(f_{t_k})$ is decreasing, we can couple the processes such that if $\widetilde{GW}_n$ survives then $GW_n$ survives, and also such that $|\tilde C_n| \le |C_n|$.
For the lower bound on $P(C_n \ne \emptyset)$ it turns out that it is enough to control the probability of the set $C^{(\mathrm{asym})}_n$ being non-empty. Note that the set $C^{(\mathrm{asym})}_n$ is the same for both the original process and the modification; however, in the modified process the next generations $N+1, \dots, L(N)$ have the distribution of a single-type branching random walk. In particular, the numbers of particles in successive generations form a Galton-Watson process, which we denote by $(X_k)_{k \in \mathbb{N}_0} = (X_k(M))_{k \in \mathbb{N}_0}$. Moreover, we denote by $q = q_M$ its extinction probability when started with a single particle. Note that $q$ does not depend on $n$, and we will use that $\lim_{M \to \infty} q_M = 0$ and $\lim_{M \to \infty} E X_1(M) = \infty$.
By the Markov property, the survival of the two subsets of the killed branching random walk are related by
The next result is the key step in the overall lower bound, where we bound the probability that $C^{(\mathrm{asym})}_n$ is non-empty.
Proof. By construction the probability of the event $C^{(\mathrm{asym})}_n \ne \emptyset$ depends neither on the position of the initial particle nor on its type, so we can assume that $S^{(n)}(\emptyset) = 0$ and $\tau(\emptyset) = 0$.
For the first part of the proof, we will omit the indices n, whenever the context is clear and we are not dealing with asymptotic statements. In particular, we will write N = N n , S = S (n) , α = α n , etc. Also, in this proof ρ = ρ pn in the percolation case and ρ = ρ tn if the attachment rule is varying.
The first step is to carefully select the relevant particles in $C^{(\mathrm{asym})}_n$. For $R_n = e^{N^{1/4}}$ and $M_n = N^{1/5}$ and an individual $x$, we write ∆S(
Recall that $\nu_x = \sum_{y : y^- = x} \delta_{\Delta S(y)}$.
For any |x| = N , letν The Paley-Zygmund inequality yields . (4.9) The remaining proof proceeds as follows: in the first step we find an easier upper bound on E[Y 2 n ], which we can estimate using the many-to-one lemma in the second step. In Step 3, we find a lower bound on E[Y n ], which we will then combine with the other steps to obtain our claim.
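For orientation, the second-moment bound invoked here is, in its standard form for a non-negative random variable $Y$, the Paley-Zygmund inequality $P(Y > 0) \ge (E Y)^2 / E[Y^2]$. The Python sketch below verifies it exactly for a binomial count; the choice of distribution and the function names are illustrative assumptions and are unrelated to the specific variable $Y_n$ of the proof.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Exact weights P(Y = k), k = 0..n, for Y ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def paley_zygmund(pmf):
    """Return (P(Y > 0), (E Y)^2 / E[Y^2]) for Y with P(Y = k) = pmf[k]."""
    first_moment = sum(k * w for k, w in enumerate(pmf))
    second_moment = sum(k * k * w for k, w in enumerate(pmf))
    prob_positive = sum(w for k, w in enumerate(pmf) if k > 0)
    return prob_positive, first_moment**2 / second_moment


# Hypothetical example: 10 independent trials, each succeeding with probability 1/100.
pmf = binomial_pmf(10, Fraction(1, 100))
prob_positive, lower_bound = paley_zygmund(pmf)
print(float(prob_positive), float(lower_bound))
```

The bound is sharp up to constants when successes are rare and nearly independent, which is why the proof needs the second moment to be comparable to the square of the first moment.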
Step 1: We split the sum over $y$ according to the last time $j \in \{0, \dots, N\}$ that the ancestors of $y$ agree with those of $x$ to obtain Conditioning on the first $j+1$ generations and using the independence of the branching process and the fact that on {ν xj ≤ R n } we only have to consider at most $R_n$ relevant siblings of $x_{j+1}$, we obtain the upper bound where h N,n = 1 and for j ≤ N − 2, h j,n := sup s0∈Ij,n,τ0∈T In particular, we have shown that (4.10)
Step 2: Upper bound on $h_{j,n}$. First, note that since the left and right endpoints of $I_{k,n}$ lie to the left of the corresponding endpoints of $I_{j,n}$ for $k \ge j$, we can apply the monotonicity in the initial types to deduce that h j,n ≤ sup By the monotonicity in types, Lemma 2.2, we have that v α (τ 0 )/v α (τ N ) ≤ v α (0)/v α ( ) and further by We would now like to apply Mogulskii's theorem to estimate the probability, but we need a bound that holds uniformly in $j$ (since we will be summing over $j$) and uniformly in $u$. We thus approximate the sum over $j$ by a finite sum and also split the interval $[-\lambda N^{1/3}, 0]$ into smaller subintervals. So fix $\kappa \in \mathbb{N}$ and define K := K n := N/κ .
Step 3: where we also used that by the monotonicity of types, see Lemma 2.2, v α (0) ≥ v α (s) for any s ∈ T .
Define the Markov chain (S k ,τ k ,ν k−1 ) k∈N with associated filtration (F k ) and transitions given for any measurable F by Then, we can continue the previous display as (4.14) Note that We estimate the last term in (4.14) by deducing from Corollary 2.9 that there existC,γ > 0 such that inf τ ∈T Step 4: Combining the estimates.
Finally, we can let θ 1 ↓ θ to obtain the claim of the proposition.
Proof of Theorems 1.1 and 1.2 -lower bounds. By the same argument as at the end of the proof of the upper bound, we have completed the proofs of Theorems 1.1 and 1.2 if we can show that if ζ n is either θ(p n , f ) or θ(1, f tn ), then lim n→∞ √ n log ζ n ≥ − π 2 σ 2 2 α * . (4.17) In our above construction, we first of all choose the constant M large enough such that the Galton-Watson process (X k (M )) k∈N0 satisfies the assumptions of Lemma 4.2 for θ 1 = 2 and θ 2 = 1/2. Also, we can assume that its survival probability satisfies 1 − q M ≥ 1/2. Let θ be such that α * θ < 1. Then, choose b large enough such that θb > M , so that in particular GW n is a subset of the killed IBRW, cf. (4.7). Additionally, we require that b > π 2 σ 2 (1 − α * θ) 2α * log 2 2 .
(4.18) By Proposition 4.3 we obtain that for any λ > Now, by (4.18) we have that so we can additionally assume that λ is small enough such that 2λα * < log 2.
Then, if we define $r := \tfrac{1}{16} P(C^{(\mathrm{asym})}_n \ne \emptyset)$, we obtain by (4.19) that
$$r^{-2}\, 2^{-N^{1/3}} \le e^{N^{1/3}(2\lambda\alpha^* - \log 2 + o(1))} \to 0.$$
We will use the following general fact (see [8, Fact 4.2], but also [9, Lemma 5.2]): let $X_1, \dots, X_k$ be independent non-negative random variables and suppose $F : (0, \infty) \to [0, \infty)$ is non-increasing, then (4.20)
We will eventually apply Lemma 4.1 to $GW_n$ and thus first estimate $P(1 \le |C_n| \le r^{-2})$. We write S for the positions in the modified branching random walk used in the definition of $GW_n$. For all sufficiently large $n$, by using (4.20) in the second step, we obtain
Finally, we can bound the survival probability of the killed IBRW from below by the survival probability of $GW_n$, where we recall that the distribution of $GW_n$ does not depend on the initial position and the initial type. Also, we deduce from the upper bound shown in Section 3 that necessarily $P(GW_n \ne \emptyset) \to 0$, so that the assumption $r \le \min\{\tfrac18, P(|GW_n| < \infty)\}$ is satisfied for large $n$ and we can apply Lemma 4.1 to obtain
$$\zeta_n \ge P(GW_n \ne \emptyset) \ge P(C_n \ne \emptyset) - 2r^{-2}\, P(1 \le |C_n| \le r^{-2}) - 2r$$
Hence, we get from (4.19) that $\lim_{n \to \infty} N_n^{-1/3} \log \zeta_n \ge -\lambda\alpha^*$.

Proofs in the linear case
In this section, we show how to deduce Corollary 1.3 from our general result, Theorem 1.2.
Now the choice of parameters implies that for n sufficiently large P 0,τ0 (E n ) ≥ min{p 1 N,n , p 2 N,n }