Near Critical Preferential Attachment Networks have Small Giant Components

Preferential attachment networks with power law exponent τ > 3 are known to exhibit a phase transition. There is a value ρ_c > 0 such that, for small edge densities ρ ≤ ρ_c, every component of the graph comprises an asymptotically vanishing proportion of vertices, while for large edge densities ρ > ρ_c there is a unique giant component comprising an asymptotically positive proportion of vertices. In this paper we study the decay in the size of the giant component as the critical edge density is approached from above. We show that the size decays very rapidly, like $\exp(-c/\sqrt{\rho-\rho_{\mathrm{c}}})$ for an explicit constant c > 0 depending on the model implementation. This result is in contrast to the behaviour of the class of rank-one models of scale-free networks, including the configuration model, where the decay is polynomial. Our proofs rely on the local neighbourhood approximations of Dereich and Mörters (Ann Probab 41(1):329–384, 2013) and recent progress in the theory of branching random walks (Gantert et al. in Ann Inst Henri Poincaré Probab Stat 47(1):111–129, 2011).


Introduction
Sparse random graph models typically undergo a phase transition in their connectivity behaviour depending on the mean number of edges per vertex. A typical case is the Erdős-Rényi graph with n vertices and m = m(n) edges and asymptotic edge density ρ = lim m(n)/n. There exists a critical density ρ_c = 1/2 such that if the edge density satisfies ρ ≤ ρ_c the largest component in the graph comprises a vanishing proportion of vertices, whereas for ρ > ρ_c this proportion θ(ρ) is strictly positive. The behaviour of θ(ρ) as ρ ↓ ρ_c is characterised by an exponent β defined by
$$\theta(\rho) = (\rho-\rho_{\mathrm{c}})^{\beta+o(1)} \quad \text{as } \rho \downarrow \rho_{\mathrm{c}},$$
which is β = 1 in the Erdős-Rényi case. A natural extension of the Erdős-Rényi model is the configuration model, which allows one to construct random graphs with n vertices and a given degree sequence m_1, …, m_n. Of particular interest is the case when the degree sequence is heavy-tailed, that is, the proportion of vertices of degree k decays like a power k^{−τ} of k, where the parameter τ > 2 is the power-law exponent. The connectivity behaviour of the Erdős-Rényi model persists with β = 1 for the configuration model with τ > 4, and with β = 1/(τ − 3) if 3 < τ < 4, see Cohen et al. [5]. The paper by Cohen et al. is based on an informal approximation of local neighbourhoods in the graph by Galton-Watson trees and thereby extends to a wide range of scale-free network models, where similar approximations hold.
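The exponent β = 1 in the Erdős-Rényi case can be checked numerically. The sketch below is an illustration and not part of the paper's argument: it assumes the classical fixed-point equation θ = 1 − exp(−cθ) for the giant component fraction, parametrised by the mean degree c (critical value c = 1), and the function names are hypothetical.

```python
import math

def er_giant_fraction(mean_degree, iterations=200):
    """Positive root of theta = 1 - exp(-c * theta), found by bisection."""
    c = mean_degree
    if c <= 1.0:
        return 0.0  # no giant component at or below the critical mean degree
    g = lambda theta: theta - 1.0 + math.exp(-c * theta)
    # g is convex with g(0) = 0, g(lo) < 0 < g(1), so the root lies in (lo, 1)
    lo, hi = (c - 1.0) / c**2, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def estimate_beta(eps_large=1e-2, eps_small=1e-3):
    """Slope of log(theta) against log(c - 1) between two near-critical points."""
    t_large = er_giant_fraction(1.0 + eps_large)
    t_small = er_giant_fraction(1.0 + eps_small)
    return math.log(t_large / t_small) / math.log(eps_large / eps_small)
```

Calling `estimate_beta()` with smaller and smaller gaps gives slopes approaching one, which is the linear emergence of the giant component that the preferential attachment results are contrasted with.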
Cohen et al. [5] claim their result for scale-free networks in general without specifying a model. This reflects the belief that the behaviour observed in the configuration model extends to all natural scale-free network models including the class of preferential attachment networks. In the present paper, however, we show for the first time that preferential attachment networks behave in a qualitatively different way from what is predicted in [5]. In fact, for all τ > 3 the size of the giant component decays exponentially as one approaches the critical edge density. More precisely, we show that the relative size of the giant component in preferential attachment networks is
$$\theta(\rho) = \exp\Big(-\frac{c+o(1)}{\sqrt{\rho-\rho_{\mathrm{c}}}}\Big) \quad \text{as } \rho \downarrow \rho_{\mathrm{c}},$$
where c is an explicit constant depending on the way in which the edge density is controlled. This demonstrates once again that preferential attachment networks belong to a different universality class than the configuration model and other models based on rank-one connection probabilities. The underlying phenomenon of the 'small giant component' or 'slow emergence of the giant' was first discovered and discussed by Oliver Riordan in the seminal paper [17], and in collaboration with Bollobás and Janson in [3,4]. Riordan [17] obtains an explicit asymptotic formula for the size of the giant component in the original Barabási-Albert model subject to Bernoulli percolation with retention parameter p, where m ≥ 2 is the outdegree of all vertices. As this model corresponds to the critical case τ = 3 this is not at odds with the results of Cohen et al. [5]. The merit of our work is to extend the phenomenon of slow emergence to a regime where it is most surprising because it defies the predictions in [5].
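To get a feeling for how much faster the decay exp(−c/√(ρ − ρ_c)) is than any polynomial decay (ρ − ρ_c)^β, the following sketch compares the two on a logarithmic scale. It drops all o(1) corrections, and the values c = 1 and β = 1 are placeholders chosen for illustration only, not constants from the paper.

```python
import math

def log_theta_small_giant(eps, c):
    """log of exp(-c / sqrt(eps)), with eps = rho - rho_c and o(1) terms dropped."""
    return -c / math.sqrt(eps)

def log_theta_power_law(eps, beta):
    """log of eps**beta, the polynomial decay of rank-one models, o(1) terms dropped."""
    return beta * math.log(eps)

def comparison_table(c=1.0, beta=1.0, exponents=range(1, 7)):
    """Rows (eps, log size with slow emergence, log size with polynomial decay)."""
    rows = []
    for k in exponents:
        eps = 10.0 ** (-k)
        rows.append((eps, log_theta_small_giant(eps, c), log_theta_power_law(eps, beta)))
    return rows
```

With these placeholder constants the first log-size is already below the second at ρ − ρ_c = 0.1, and the gap widens rapidly as the critical density is approached.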
In his proof, Riordan exploits the fact that there are local approximations of the network by multitype branching processes whose survival probability can be studied analytically by looking at the associated Laplace operators. These branching processes are more complex than those used in [5].
In the present paper local approximation is by a branching random walk with killing. Instead of the analytical techniques used by Riordan, which we could not apply effectively in our case due to the higher complexity of the approximating process, we use geometric properties of killed branching random walks. Studying their survival probabilities requires sophisticated techniques from the theory of branching random walks, which became available only in the past few years, see [1,9]. Our results are based on the techniques developed in [9] and are therefore pleasingly probabilistic and highly timely.

Statement of Results
We start by describing the preferential attachment network introduced in [6], which gives scale-free networks with arbitrary power law exponent τ > 2 by variation of a parameter.
A concave function f : N_0 → (0, ∞) is called an attachment rule if f(0) ≤ 1 and
$$\Delta f(k) := f(k+1) - f(k) < 1 \quad \text{for all } k \in \mathbb{N}_0.$$
The maximal increment is denoted by γ⁺ := sup{Δf(k) : k ∈ N_0}. By concavity, f is non-decreasing, γ⁺ = f(1) − f(0) and the limit γ := lim_{k→∞} f(k)/k exists and equals γ = inf{Δf(k) : k ∈ N_0}. Given an attachment rule f, we define a growing sequence (G_n : n ∈ N) of random graphs as follows:
• Start with the graph G_1 given by one vertex labelled 1 and no edges;
• Given the graph G_n, we construct G_{n+1} from G_n by adding a new vertex labelled n + 1 and, for each m ≤ n independently, inserting the directed edge (n + 1, m) with probability f(indegree of m at time n)/n.
Formally, we are dealing with a sequence of directed graphs but all edges point from the younger to the older vertex. Hence, directions can be recreated from the undirected, labelled graph. For all structural questions, particularly regarding connectivity and the length of shortest paths, we regard (G n : n ∈ N) as an undirected network. It is shown in [6] that when γ > 0 the networks have a degree distribution which is a power law with exponent τ = 1 + 1/γ . We are also interested in the percolated version of the network (G n : n ∈ N). For p ∈ [0, 1], we write G n ( p) for the graph obtained from G n by deleting each edge with probability 1 − p independent of all other edges.
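The construction of G_n and of its percolated version G_n(p) can be simulated directly. The following is a minimal sketch, assuming a quadratic-time loop is acceptable; the helper names (`linear_rule`, `grow_network`, `percolate`, `largest_component_fraction`) are hypothetical and not taken from [6] or [7].

```python
import random

def linear_rule(gamma, beta):
    """Hypothetical helper: the linear attachment rule f(k) = gamma * k + beta."""
    return lambda k: gamma * k + beta

def grow_network(n, f, rng):
    """Edge list of G_n; every edge is stored as (younger, older)."""
    indegree = [0] * (n + 1)  # vertices are labelled 1..n, index 0 is unused
    edges = []
    for size in range(1, n):  # build G_{size+1} from G_size
        new = size + 1
        for old in range(1, new):
            # indegree[old] still holds its value in G_size when 'old' is inspected
            if rng.random() < f(indegree[old]) / size:
                edges.append((new, old))
                indegree[old] += 1
    return edges

def percolate(edges, p, rng):
    """Keep each edge independently with probability p."""
    return [edge for edge in edges if rng.random() < p]

def largest_component_fraction(n, edges):
    """Relative size of the largest component of the undirected graph."""
    parent = list(range(n + 1))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    sizes = {}
    for v in range(1, n + 1):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n
```

For example, `largest_component_fraction(n, percolate(grow_network(n, linear_rule(0.25, 0.5), rng), p, rng))` gives a finite-n proxy for the relative size of the largest component; close to criticality such proxies converge slowly and should be read with care.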
Let (G_n : n ∈ N) be a sequence of (random or deterministic) graphs, where G_n has n vertices. For n ∈ N, we denote by |C_n| the size of the largest component in G_n. The network (G_n : n ∈ N) has a giant component if there exists a constant θ > 0 such that
$$\frac{|C_n|}{n} \to \theta \quad \text{as } n \to \infty,$$
where the convergence holds in probability. The limit θ is called the size of the giant component.
Dereich and Mörters showed in [7, Theorem 1.6] that when γ ≥ 1/2, or equivalently 2 < τ ≤ 3, then (G_n(p) : n ∈ N) has a giant component for all p ∈ (0, 1]. When γ < 1/2, or equivalently τ > 3, then there exists a critical percolation parameter p_c > 0 such that (G_n(p) : n ∈ N) has a giant component if and only if p > p_c. We denote the relative size of the giant component in (G_n(p) : n ∈ N) by θ(p, f) and omit p or f from the notation when the percolation parameter or the attachment rule is fixed.
We are interested in the decay in the size of the giant component as we approach p_c from above in the case p_c > 0, or equivalently γ < 1/2. It was shown in Lemma 3.3 of [7] that the critical retention probability for the network (G_n : n ∈ N) is given by
$$p_{\mathrm{c}} = \frac{1}{\rho(\alpha^*)} = \frac{1}{\min_{\alpha} \rho(\alpha)},$$
where ρ(·) is the spectral radius of the score operator, a function on a nonempty open interval, which we describe explicitly in Sect. 2, and α* is the minimizer of ρ.
Our first result shows the exponential decay of the size of the giant component of the percolated network, when the retention parameter approaches the critical value.
An alternative way of reducing the edge density and thereby destroying the giant component is to alter the attachment rule instead of percolating the network. For a linear attachment rule f(k) = γk + β with γ < 1/2, Dereich and Mörters [7] show that there exists a giant component if and only if
$$\beta > \beta_{\mathrm{c}}(\gamma) := \frac{(1/2-\gamma)^2}{1-\gamma}.$$
Therefore, one could fix γ and decrease β to β_c. Another idea would be, for a given β, to decrease γ until β = β_c(γ). To analyse the behaviour of the size of the giant component under this procedure, let (f_t)_{t≥0} be a sequence of attachment rules with γ_t := inf{f_t(k + 1) − f_t(k) : k ∈ N_0} < 1/2 for all t ≥ 0. We denote by ρ_t and α*_t the spectral radius and its minimizer corresponding to the score operator for the unpercolated branching random walk derived from attachment rule f_t.

Theorem 1.2 Let (f_t)_{t≥0} be a pointwise decreasing sequence of attachment rules with γ_t < 1/2 for all t ≥ 0 and pointwise limit f. Suppose that θ(1, f_t) > 0 for all t and θ(1, f) = 0. Then where α* and ρ''(α*) are derived from f.
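For linear attachment rules f(k) = γk + β the phase boundary can be evaluated in a few lines. The sketch below rests on two assumptions that should be checked against [7]: that the threshold has the closed form β_c(γ) = (1/2 − γ)²/(1 − γ) for 0 ≤ γ < 1/2, and that the power law exponent is τ = 1 + 1/γ as stated for the degree distribution. The function names are hypothetical.

```python
def power_law_exponent(gamma):
    """tau = 1 + 1/gamma for an attachment rule with asymptotic slope gamma > 0."""
    if gamma <= 0.0:
        raise ValueError("the power law exponent requires gamma > 0")
    return 1.0 + 1.0 / gamma

def critical_beta(gamma):
    """Assumed form of the threshold beta_c(gamma) for 0 <= gamma < 1/2."""
    if not 0.0 <= gamma < 0.5:
        raise ValueError("the threshold is only used for 0 <= gamma < 1/2")
    return (0.5 - gamma) ** 2 / (1.0 - gamma)

def has_giant_component(gamma, beta):
    """Giant component of the unpercolated network with f(k) = gamma * k + beta."""
    return gamma >= 0.5 or beta > critical_beta(gamma)
```

Under this assumed form, `critical_beta(0.0)` returns 0.25 and the threshold tends to 0 as γ approaches 1/2, which matches the two boundary points (β, γ) = (1/4, 0) and (0, 1/2) singled out in the discussion of the phase diagram.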
The existence of ρ and α* corresponding to f is proved in Proposition 2.3. There we will also see that lim_{t→∞} ρ_t(α*_t) = 1. The following corollary exemplifies Theorem 1.2 for linear attachment rules, see also Fig. 1. We denote

Remark 1.4 Two cases in our phase diagram are covered in the work of Riordan [17]. The first of these cases corresponds to an approximation from the right of the point β = 0, γ = 1/2, which is equivalent to the original Barabási-Albert, or LCD, model. Note that our results refer to the subcritical case γ < 1/2 and the critical case γ = 1/2 is not included. The second is the case β = 1/4, γ = 0, the Dubins model, in which there is no preferential attachment and our results are consistent with those of Riordan [17]. Corollary 1.3 allows a quantitative comparison of the decay of the giant component for different models. The smaller γ (or the larger τ), the slower is the decay. The LCD model, or equivalent models with γ = 1/2, have faster decay of the size of the giant component than preferential attachment networks with attachment rules satisfying γ < 1/2.

Throughout, we will use the following notation: For every integer k ∈ N, we write

The remainder of the paper is structured as follows: We prove Theorems 1.1 and 1.2 simultaneously. In Sect. 2 we collect several auxiliary results that we need later on. In particular, in Sect. 2.1 we recall the relevant results from [7], which relate the size of the giant component to the survival probability of a multitype branching random walk with killing. In Sect. 2.2 we derive the main tool to analyse these branching random walks, a version of the well-known many-to-one lemma. In Sect. 2.3 we collect various moment estimates, while in Sect. 2.4 we state a large deviation result originally due to Mogulskii in a suitable adaptation to our setting.
The actual proofs of Theorems 1.1 and 1.2 are split into an upper bound carried out in Sect. 3 and a lower bound in Sect. 4. Our proof shows that the survival probability of the branching random walk is well approximated by the probability that the branching random walk follows a carefully chosen strategy up to a large generation N . Here, the choice of N and the particular strategy depend on how close to criticality the model is. With the help of the many-to-one lemma, these events for the branching random walk can be translated into events expressed in terms of a Markov chain. Then, the latter can be analysed using large deviation techniques. The upper bound uses this strategy together with a first moment method, while the lower bound is slightly more involved: we use a coarse graining strategy, whereby we first identify a suitable subset of the multitype branching random walk by grouping together particles across every N generations following a particular strategy. Then, we treat each of these groups as one generation of a standard Galton-Watson process and analyse its survival probability by estimating the probability that one generation survives.
In Sect. 5 we show how to derive Corollary 1.3 from Theorem 1.2. Finally, in the Appendix we prove the large deviation result, Theorem 2.10.
As mentioned above, our proof follows a similar strategy to that in [9]. However, we are considering a very different mechanism: first of all, we are dealing with a multitype branching random walk that additionally has infinitely many offspring in each step. Furthermore, in our case we do not shift the killing boundary, but we allow the parameters of the model to vary such that the model approaches criticality, which requires very different moment estimates.

The Approximating Branching Process
We introduce a pure jump Markov process with generator
$$L g(k) := f(k)\,\big(g(k+1) - g(k)\big).$$
This defines an increasing, integer-valued process, which jumps from k to k + 1 after an exponential waiting time with mean 1/f(k), independently of the previous jumps. Under the probability P we denote by (Z_t : t ≥ 0) the process started in zero, by (Ẑ_t : t ≥ 0) the process started in one, and by (Z^{(τ)}_t : t ≥ 0) the process started in zero conditioned to have a point at τ ≥ 0.
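The jump process is easy to sample, because its successive waiting times are independent exponentials with means 1/f(k). The sketch below is illustrative only: the function names are hypothetical, and the closed form for the expected k-th jump time is simply the sum of the mean waiting times.

```python
import random

def jump_times(f, count, rng, start=0):
    """First 'count' jump times of the process started in state 'start'."""
    times, t, state = [], 0.0, start
    for _ in range(count):
        t += rng.expovariate(f(state))  # exponential waiting time with mean 1 / f(state)
        state += 1
        times.append(t)
    return times

def expected_jump_time(f, k, start=0):
    """Expected time of the k-th jump: the sum of the k mean waiting times."""
    return sum(1.0 / f(j) for j in range(start, start + k))
```

With `start=0` this samples the jump times of (Z_t) and with `start=1` those of (Ẑ_t); the conditioned process (Z^{(τ)}_t) is not covered by this sketch.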
This process is used to define a multitype branching random walk with type space T := [0, ∞) ∪ {ℓ}, where ℓ is a non-numerical symbol for 'left'. A particle in location x ∈ R and of type τ ∈ T produces offspring to its left whose displacements have the same distribution as the points of a Poisson point process on the negative half-line, whose intensity measure is given in [7]. The type of an offspring on the left equals the distance to its parent.
The distribution of the offspring to the right depends on the type of the particle. When the particle is of type ℓ, then the relative positions of its right offspring follow the same distribution as the jump times of (Z_t : t ≥ 0). When the particle is of type τ ≥ 0, then the displacements follow the same distribution as the jump times of (Z^{(τ)}_t − 1_{[τ,∞)}(t) : t ≥ 0). All offspring on the right are of type ℓ.
The offspring to the right do not form a Poisson point process. The more particles are born, the higher the rate of new particles arriving. Moreover, the total number of particles produced is infinite without accumulation point. The expected distance between a particle of type ℓ and its kth offspring on the right equals $\sum_{j=0}^{k-1} 1/f(j)$; this distance behaves asymptotically like γ^{−1} log(k) when γ > 0 and like k when γ = 0. We call the described process the idealized branching random walk (IBRW) in accordance with [7]. Dereich and Mörters [7] show that the genealogical tree of the IBRW is related to the local neighbourhood of a vertex in G_n.

To obtain a branching process approximation to (G_n(p) : n ∈ N), we define the percolated IBRW by associating to every offspring in the IBRW an independent Bernoulli(p) random variable. If the random variable is zero, we delete the offspring together with its descendants. Otherwise, the offspring is retained in the percolated IBRW. When the percolated IBRW is started with one particle in location x and type τ, then we write P^p_{(x,τ)} for its distribution and E^p_{(x,τ)} for the corresponding integral operator.

The percolated IBRW can be interpreted as a labelled tree where every node represents a particle and is connected to its children and (apart from the root) to its parent. The vertices are identified with finite sequences of natural numbers x = j_1 … j_k, including the empty sequence ∅ which denotes the root. We concatenate sequences x = i_1 … i_k and y = j_1 … j_m to form the sequence xy = i_1 … i_k j_1 … j_m. When, for j ∈ N, xj is a vertex in the tree, then x, x1, …, x(j − 1) are also vertices in the tree and x =: p(xj) is the parent of xj. The length |x| = k is the generation of x. For |x| ≥ k, we write p^k(x) for the k-fold composition of p(·). The ancestor of x in generation k is denoted by x_k, i.e. x_k := p^{n−k}(x) when |x| = n. In particular, x_0 always denotes the root.
To every vertex we associate two functions, S and τ . Here S(x) is the location of the particle on the real line and τ (x) denotes its type.
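The labelling of particles by finite sequences of natural numbers maps naturally onto tuples. The following sketch is a hypothetical illustration of that notation (root, children, parent map p, generation |x|, ancestors x_k and concatenation); it does not simulate the branching random walk itself.

```python
ROOT = ()  # the empty sequence, denoting the root

def child(x, j):
    """Label xj of the j-th child of x, for j >= 1."""
    if j < 1:
        raise ValueError("children are numbered from 1")
    return x + (j,)

def parent(x):
    """The map p: removes the last entry of a non-root label."""
    if x == ROOT:
        raise ValueError("the root has no parent")
    return x[:-1]

def generation(x):
    """The length |x| of the label."""
    return len(x)

def ancestor(x, k):
    """Ancestor x_k of x in generation k, i.e. p applied |x| - k times."""
    if not 0 <= k <= len(x):
        raise ValueError("k must lie between 0 and |x|")
    return x[:k]

def concatenate(x, y):
    """The sequence xy obtained by appending y to x."""
    return x + y
```

The functions S and τ can then be stored as dictionaries keyed by these tuples, for instance `location[(2, 1)]` and `particle_type[(2, 1)]`.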
To obtain a branching process approximation to the local neighbourhood of a vertex in (G_n(p) : n ∈ N), we consider the percolated IBRW with a killing barrier at zero. That is, every particle with location on the nonnegative half-line is deleted together with its descendants. Dereich and Mörters prove the following identification.

Theorem 2.1 (Dereich and Mörters [7]) For all p ∈ [0, 1] and attachment rules f, θ(p, f) equals the survival probability of the percolated IBRW with a killing barrier at zero, started with one particle of type ℓ whose location is given by −E, where E is an exponential random variable with mean one.

Next we collect some spectral properties that will be used in the analysis of the IBRW. Denote by C(T) the Banach space of bounded, continuous functions on T equipped with the supremum norm ‖·‖. For α ∈ (0, 1), we consider the score operator A^p_α of the percolated IBRW. The dependence on the attachment rule f is suppressed in the notation, but it will always be clear from the context which f is considered. Since A^p_α = p A_α by definition, where A_α := A^1_α, it suffices to analyse A_α. We write 1 for the constant function with value 1 and let I := (γ, 1 − γ) for γ < 1/2 and I = ∅ for γ ≥ 1/2.
Proof By (2.2), it suffices to prove Lemma 2.2 in the case p = 1. For that case, it was shown in [7, Lemma 3.1] that A α 1(0) < ∞ is equivalent to A α being a strongly positive, compact operator with A α g ∈ C(T ) for all g ∈ C(T ). Moreover, it is proved that for all g ∈ C(T ), Here, for example, A α g means that for all τ ≥ 0, . The values of a and c are identified as (cf. proof of Proposition 1.10 in [7]) We analyse the convergence properties of a(α).
for some C > 0 and a null sequence (δ_k)_{k∈N}. In particular, there exists a C > 0 such that Hence, a(α) = ∞ for all α ≤ γ. On the other hand, by Cauchy's condensation test, a(α) < ∞ for all α > γ. Hence, A_α 1(0) < ∞ if and only if α ∈ (γ, 1 − γ) =: I. From now on we assume that I ≠ ∅, i.e. γ < 1/2. For α ∈ I, ρ(α) is finite and since ρ(α) ≥ a(α) + a(1 − α), we see that ρ(α) → ∞ for α → ∂I. The existence and uniqueness of the eigenfunction v_α : T → (0, ∞) follows from the Krein-Rutman theorem (see Theorem 3.1.3 in [16]). The fact that A_α is a strictly positive operator implies min_{τ∈T} v_α(τ) > 0. Since ρ(α) is an isolated eigenvalue with one-dimensional eigenspace, one can argue along the lines of Chapter II §3 (in particular Remark 2.4) and Theorem II.§5.5.4 of [12] that ρ is twice differentiable and the derivative can be represented as in (2.3). In particular, ρ''(α) > 0 for all α ∈ I, hence, ρ is strictly convex on I and there exists a unique minimizer α* ∈ I. Then, by the definition of the eigenfunction, where we used in the inequality that the distribution of the positions to the left of the origin does not depend on the initial type and for the second expectation we used the monotonicity in types proved in [7]. The upper bound holds by a similar argument.
The next proposition collects some of the consequences for the spectral radius if we consider converging attachment rules. .
and let ρ(α) be the spectral radius of the operator associated to the branching process with attachment rule f , α * be its unique minimizers and let v α be the corresponding eigenfunction with v α = 1. Define the same quantities with index t when referring to the branching process associated to f t , where we set Proof Since ( f t ) t≥0 is a pointwise decreasing sequence of positive functions exists. As an infimum of concave functions, f : 1, but the strict inequality is not needed for the analysis of the branching process and the corresponding operators.
The assumption ρ t (α * t ) > 1 implies that there exists a giant component for all t ≥ 0. Hence, by Proposition 1.10 in [7]. Here a f t and c f t are the functions given in (2.4). By monotone convergence, Hence, f is an attachment rule. From now on we add a subscript t to all quantities corresponding to f t and no subscript for quantities corresponding to f .
The assumption γ_t < 1/2 is needed to make the operator A_{t,α} exist for some α. We have for all t ≤ s In particular, (γ_t)_{t≥0} is a non-increasing sequence.
for t ≥ 0. Then I_t ⊆ I for all t ≥ 0 and we write ρ_t(α) = ∞ whenever α ∉ I_t. Let α ∈ I. We use the monotonicity of the branching process in the attachment rule to derive In particular, for every α ∈ I there exists t ≥ 0 such that ρ_t(α) < ∞. Hence, Convergence γ_t → γ implies the existence of a t_0 > 0 such that Î ⊆ I_t for all t ≥ t_0. In particular, we can consider the family ρ, (ρ_t)_{t≥t_0} of uniformly continuous functions on Î.
In the next step we argue that α * t t→∞ −→ α * . Notice that by assumption Hence, ρ(α * ) = 1. Suppose that α * t t→∞ −→ α * does not hold. Then there exists a δ > 0 and a subsequence t n ↑ ∞ such that |α * t n − α * | ≥ δ for all n ∈ N. Since ρ is strictly convex with unique minimizer α * , we have In particular, Since the term on the left-hand side converges to 1, this is a contradiction and the convergence The fact that α * t converges to α * and f t converges to f implies that A t,α * t converges to A α * because of the uniform continuity of the operator in α ∈Î. Since the eigenspaces of ρ(α * ) and ρ t (α * t ) are one dimensional, one can argue along the lines of Note 3 on Chapter II in [12, pp. 568-569]

uniformly bounded in t from zero and infinity. With the observed convergences and this uniform bound (2.3) now implies that also
as required.

The Many-to-One Lemma
We first continue the analysis of the IBRW. The following lemma is based on a spine construction which is known as Lyons' change of measure [14]. Recall the Ulam-Harris notation from Sect. 2.1.

Lemma 2.4 ( Many-to-one)
For all α ∈ I there exists a probability measure P α on some measurable space, and a Markov process (2.5) Note that it is easy to check that the distribution of ((S n , τ n ) : n ∈ N 0 ; P α ) does not depend on the percolation parameter p.
Proof Given a labelled tree ( , L), with L(x) = (S(x), τ (x)), we can distinguish an ancestral line ξ = (ξ 1 , ξ 2 , . . .) which we call spine. In the space of labelled trees, we denote by F n the σ -field generated by the first n generations, F n = σ ((x, L(x)) : |x| ≤ n). The analogue in the space of trees with spines is denoted by F * n . For every (s 0 , τ 0 ) ∈ R × T , the distribution of the IBRW started in (s 0 , τ 0 ) can be interpreted as a distribution P p (s 0 ,τ 0 ) on the set of labelled trees. We extend this measure to the space of labelled trees with spines. Since (s 0 , τ 0 ) and p will remain fixed throughout the proof, we omit it from the notation and write P = P p (s 0 ,τ 0 ) for brevity. Note that every F * n -measurable function g can be written as for F n -measurable functions g x (see page 24 in [18]). We define P * n to be the (non-probability) measure on F * n such that for all nonnegative F * n -measurable functions g, We now construct a new branching random walk under a new probability measure P α . The root has again label L(∅) = (S(∅), τ (∅)) = (s 0 , τ 0 ). A particle ξ n on the spine in generation n with label (S(ξ n ), τ (ξ n )) produces new offspring with distribution for all atomic measures μ on R × T . Here L σ denotes the offspring distribution for a particle of type σ ∈ T in the original process. The new spine particle ξ n+1 in generation n + 1 is chosen from the offspring of ξ n by choosing an offspring x with probability proportional to Off the spine the new branching random walk behaves exactly as the original one. Then In particular, for all F : We define (S n , τ n ) := (S(ξ n ), τ (ξ n )) and P α (s 0 ,τ 0 ) = P α . The Markov property follows from the definition of the process. 
Since the offspring distribution of the spine is absolutely continuous with respect to the offspring distribution of the original process and the type of the original process is a function of the locations of its ancestors and the particle itself, the proof is complete.
Next we also need a higher-dimensional version of the many-to-one lemma that includes the number of offspring of the particles on the spine. For any x = (x 0 , . . . , x n ) in the branching process, define the point measure on R with δ s a Dirac mass in s, which describes the positions of offspring of x relative to the position of x. Denote by M p (R) the point measures on R.
Moreover, for any measurable A ⊂ R and any measurable F, Note that unlike in Lemma 2.4 the distribution of (S k , τ k , ν k−1 ) k∈N 0 does depend on p, since we are considering a non-linear function of the point process describing the position of the offspring.
Proof Consider the IBRW with spine ξ under the measure P α as constructed in the proof of Lemma 2.4. Then, define (S n , τ n , ν n−1 ) := (S(ξ n ), τ (ξ n ), ν ξ n−1 ) and the first statement follows since we know the explicit Radon-Nikodym density of P α with respect to P * n . The second statement is a consequence of the Markov property of the IBRW with spine combined with a suitable choice of test function F.

Asymptotic Moment Estimates
In the proofs of Theorems 1.1 and 1.2 we will need estimates for moments of the Markov chains defined in Lemmas 2.4 and 2.5. Suppose that (f_{t_n})_{n∈N} is a decreasing sequence of attachment rules such that f_{t_n} ↓ f pointwise for an attachment rule f. Further, suppose that γ_n := lim_{k→∞} f_{t_n}(k)/k < 1/2 for all n ∈ N, so that by Proposition 2.3 also γ = lim_{k→∞} f(k)/k < 1/2. Let α*_n, resp. α*, be the unique minimizer of the spectral radius ρ_n, resp. ρ, of the operator corresponding to the attachment rule f_{t_n}, resp. f. In the setting of Theorem 1.1 take f_{t_n} = f and write ρ_n = ρ^{p_n} = p_n ρ(·).
The Markov chain from Lemma 2.4 corresponding to attachment rule f t n is denoted by

Lemma 2.7
There exists η > 0 such that Proof We first consider the case S (n) where we used that ρ n (α n ) > 1, the monotonicity in types for v α n by Lemma 2.2 and the uniform boundedness of the quotient v αn (0) v αn ( ) in n by Proposition 2.3. Hence, it suffices to show that for the right choice of η the expectation on the right hand side remains bounded in n. Now, by Lemma 2.2, we have that η := 1 4 (α * − γ ) > 0. By Proposition 2.3, we have that γ n ↓ γ and α * n → α * . Then, we can choose n 0 sufficiently large such that for all n ≥ n 0 we have γ n < γ + η and α n ≥ α * − η.
Furthermore, we denote byZ (n 0 ) the jump process that jumps from k to k + 1 with rate f t n 0 (k) started in 1. Then by comparison with a Yule process with constant branching rate, we can find a constant C(n 0 ) > 0 such that Finally, we obtain from the construction of IBRW and using (2.6) that for n ≥ n 0 which is finite by choice of η.
The fact that the supremum over all n is finite follows from the same argument if we redefine η as 1 4 For the case that S (n) 1 ≤ 0, it suffices to prove in a second step, that there exists η > 0 such that Since the children to the left of a particle in the IBRW form a Poisson process and their distribution is not depending on the type of their ancestor we have by construction where X ( f ) is the pure birth process with jump rates given by f .
Using that f t k (k) ≤ k + 1 and a comparison to a Yule process, we have that there exists C n > 0 such that By Proposition 2.3, we can find n 0 such that for all n ≥ n 0 , |α n − α * | < , and γ ≤ γ n ≤ γ + .
Define η := min{ 3 8 , n , n ≤ n 0 }. Then, we have for n ≤ n 0 , Furthermore, for n ≥ n 0 , we can use the monotonicity of f t n to deduce that which completes the second step and thus the proof of the lemma.

Lemma 2.8 Let (S (n)
k , τ (n) k , ν (n) k−1 ) be the Markov chain defined in Lemma 2.5 either for attachment rule f t n and percolation parameter 1 or for fixed attachment rule f (with γ < 1 2 ) and percolation parameter p n , where p n ↓ ρ(α * ) −1 . For any sequence (M n ) n∈N such that M n → ∞, there exist constants C > 0 andγ > 0 such that, for all n, Proof To unify notation define p n = 1 in the varying f case and f t n = f in the percolation case. By the extended many-to-one formula, Lemma 2.5, we have that We can use that ρ n (α * n ) > 1, the monotonicity in types and in p n , Lemma 2.2, and that C := sup n∈N v αn (0) v αn ( ) < ∞ by Proposition 2.3, in order to bound the above by (2.7) For the first term in (2.7), we note that Therefore, if we let (Ẑ f t 1 ) t≥0 be the jump process jumping from k to k + 1 at rate f t 1 (k) started in 1, we can conclude that by construction of the IBRW for some constants C( f t 1 ) andγ > 0, where the latter bound follows by comparison with a Yule process, whose second moments grow at most exponentially. For the second term on the right hand side in (2.7), the first expectation is bounded uniformly in n by (the second part of) Lemma 2.7 and the second expectation can be bounded by the second moment, so that the first part of the argument applies.
For the final term in (2.7), we use that the particles to the left form a Poisson process, so that we can use a standard identity for Poisson processes, see e.g. [13,Eq. (4.26)], to deduce that However, as in the proof of Lemma 2.7 the right hand side is bounded uniformly in n.

Corollary 2.9
In the setting of Lemma 2.8, we have for any sequence N_n → ∞ and with R_n = exp(N_n^{1/4}), M_n = N_n^{1/5} that there exist C̃, γ̃ > 0 such that

Proof By Lemma 2.7, there exists η > 0 such that by our choice of M_n and R_n. Therefore, the statement of the corollary follows by choosing γ̃ and C̃ appropriately.

Mogulskii's Theorem
The main technical tool in the proof of our main result is the following large deviation result due to Mogulskii in its original form. We state it here in a version adapted to Markov chains as a generalisation of the version for random walks found in [9].
is a standard Brownian motion.
Let g 1 < g 2 be two continuous functions on [0, 1] with g 1 (0) ≤ 0 ≤ g 2 (0) and denote Then, for all τ 0 ∈ T , The proof of Theorem 2.10 is postponed to Appendix. We will now show that we can apply Mogulskii's results to the Markov chains from Lemmas 2.4 and 2.5. We will treat both the setting of Theorems 1.1 and 1.2 at the same time and so continue using the notation introduced at the beginning of Sect. 2.3.
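A simple special case gives some intuition for estimates of Mogulskii type. The sketch below does not implement Theorem 2.10. It assumes the classical setting of a simple symmetric random walk (variance one) confined to a strip of constant half-width L, where the survival probability over n steps decays like exp(−π²n/(8L²)) to leading exponential order for large L, and it computes the exact decay rate by dynamic programming. The function names are hypothetical.

```python
import math

def strip_survival(half_width, steps):
    """P(|S_i| < half_width for all i <= steps) for simple random walk from 0."""
    L = half_width
    # mass[i] is the probability of sitting at position i - L without having
    # touched -L or +L; the boundary entries 0 and 2L always stay at zero
    mass = [0.0] * (2 * L + 1)
    mass[L] = 1.0
    for _ in range(steps):
        new = [0.0] * (2 * L + 1)
        for i in range(1, 2 * L):
            new[i] = 0.5 * (mass[i - 1] + mass[i + 1])
        mass = new
    return sum(mass)

def decay_rate(half_width, steps=2000):
    """Per-step exponential decay rate, read off from a two-step ratio."""
    p_now = strip_survival(half_width, steps)
    p_later = strip_survival(half_width, steps + 2)
    return -0.5 * math.log(p_later / p_now)
```

For half-width 10 the exact rate is −log cos(π/20), which differs from the Brownian prediction π²/800 by less than one percent. The two-step ratio is used because the walk is periodic with period two.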
To this end, we first recall Donsker's theorem for martingale difference arrays (see for example Theorem 7.7.3 in [8] or Theorem 18.2 in [2]). The theorem is usually stated for r n = n, but it is straightforward to generalize the statement to the following: Proposition 2.11 Let (r n ) n∈N ∈ N N be a sequence with r n ↑ ∞ as n → ∞. For every n ∈ N, let (ξ n i : 1 ≤ i ≤ r n ) be a family of random variables and denote F n i = σ (ξ n 1 , . . . , ξ n i ) for all i ≤ r n . Assume that Then the linear interpolation of ( i≤m ξ n i : m ≤ r n ) converges weakly to a standard Brownian motion on [0, 1].
We use Donsker's theorem as follows.

Lemma 2.12
Let A > 0, (a n ) n∈N be a positive sequence with lim n→∞ a n = ∞ and write r n = Aa 2 n for each n ∈ N. Moreover, let

be the Markov chain introduced in Lemma 2.4 for attachment rule f t n . For all c
as n → ∞, uniformly in τ ∈ T , where (W t : t ≥ 0; P) is a standard Brownian motion.
Proof The first step is to show that the conditions of Proposition 2.11 are satisfied by the random variables by Lemma 2.6 and $\rho_n'(\alpha_n^*) = 0$ as $\alpha_n^*$ minimizes $\rho_n$. Moreover, as $n \to \infty$, since $r_n \to \infty$ and $\rho_n''(\alpha_n^*)/\rho_n(\alpha_n^*) \to \sigma^2$ by definition. Thus, Condition (ii) of Proposition 2.11 is satisfied. Finally, let $\eta$ be as in Lemma 2.7, then where $C$ is a constant such that $x^2 \le C e^{\eta |x|}$ for all $x \in \mathbb{R}$. Then, by Lemma 2.7 the exponential moment is bounded uniformly in $n$, so that the right-hand side converges to zero as required.
To apply Theorem 2.10 we have to check that the convergence of the distribution functions is uniform in the start type. This is guaranteed by the monotonicity of the IBRW in the start type (which was proven in [7, Remark 2.6]) which entails a monotonicity of (S (n) i ) by the many-to-one lemma, and by the fact that the limit is independent of the start type.

Lemma 2.13
Let N n be a positive sequence with lim n→∞ N n = ∞ and set a n = N 1/3

and transitions given for any measurable F by
where (S (n) 1 , τ (n) 1 , ν (n) 0 ) is the first step of the Markov chain defined in Lemma 2.5 associated either to the IBRW with attachment rule f t n and p = 1 or with attachment rule f and percolation parameter p n . Let A > 0, write r n = Aa 2 n . For all c 1 , c 2 , c 3 , c 4 ∈ R, as n → ∞, uniformly in τ ∈ T , where (W t : t ≥ 0; P) is a standard Brownian motion.
Proof We show that we can replace the modified chain by $(S^{(n)}_i)$ in the above probability up to an error that converges to 0 uniformly in $\tau$, and then invoke Lemma 2.12. Define the event and note that by Corollary 2.9, there exist constants $\tilde C, \tilde\gamma > 0$ such that We estimate for any $c_i$ where we assume that $n$ is sufficiently large and we used (2.11) together with $\frac{1}{1-x} \le 1 + 2x$ for $x \in [0, \tfrac{1}{2}]$. Iterating this estimate yields (2.12) and we note that by the choice of $r_n$ the error converges to 0. For a lower bound, we estimate Iterating and using the bound (2.11) gives where the error converges to 0 by the choice of $r_n$. Then, combining (2.12) and (2.13) with Lemma 2.12, we can deduce the statement of the lemma.

Proofs: Upper Bound
In this section, we fix the start type of the IBRW. Recall that in the killed IBRW every particle x with S(x) > 0 is deleted together with its descendants. We denote its survival probability by ζ , that is, for s 0 ≤ 0,

Lemma 3.1
For all α ∈ I, s 0 ≤ 0, n ∈ N and b 1 , . . . , b n ≥ 0, Proof By definition, For the first summand we use the Markov inequality and Lemma 2.4 to derive where we used in the last step that by Lemma 2.
Proof Using Lemma 3.1, the fact that $j \mapsto I(j)$ is decreasing, that $\rho(\alpha) \ge 1$ and that $j \mapsto e^{-\alpha b_j}$ is increasing, we obtain

For the proof of Theorem 1.1, let $(p_n)_{n \in \mathbb{N}}$ be a sequence of retention probabilities with $p_n \downarrow p_c$. For Theorem 1.2, let $(t_n)_{n \in \mathbb{N}}$ be a sequence of parameters with $t_n \uparrow \infty$. We write, Moreover, we denote by $v_n$ the eigenfunction for $\rho_n(\alpha_n^*)$ from Lemma 2.2. The Markov chain from Lemma 2.4 corresponding to $\alpha = \alpha_n^*$ and retention parameter $p_n$ or attachment rule $f_{t_n}$ is denoted by $((S^{(n)}_i, \tau^{(n)}_i) : i \in \mathbb{N}; P)$, i.e. P = P α * n (s 0 , ) . One easily checks that in the setup of Theorem 1.1 the distribution of the Markov chain does not depend on $n$.

Lemma 3.3 Let
Notice that the specific choice of parameters implies that By Lemma 2.12 we can apply Theorem 2.10 with a n = (lk n ) 1/3 , n = lk n and g 2 (t) = a −1/2 (C + δ) l N , and then take δ ↓ 0 to derive lim sup Moreover, as n → ∞, 1 2 n (lk n )

Proof of the upper bounds in Theorems 1.1 and 1.2 Since θ( p, f ) is non-decreasing in retention probability p and attachment rule f , it suffices to consider the asymptotic behaviour along a discrete subsequence. As before, for Theorem 1.1 we take any discrete sequence of retention probabilities p n ↓ p c and for Theorem 1.2 we take any discrete sequence t n ↑ ∞. We make use of the notation introduced in and before Lemma 3.3. In particular, we fix N ∈ N, a > 0 and b > 0 and let k n := (a/ n ) Applying Lemma 3.2 with α = α * n , we obtain for every N ∈ N,

log k n − α * n (b n (l+1)k n + s 0 ) + (l + 1)k n log ρ n (α * n ) + log sup −C/ √ n ≤s 0 ≤0 Recall that the choice of parameters implies that 3 2 n k n → a 3 2 N , and √ n log k n → 0 as n → ∞.
Hence, by Lemma 3.3, lim sup Taking N to infinity, we deduce lim sup Now, we consider any C < π 2 σ 2 2 and we choose b such that For these choices, we can take the parameter a sufficiently large to see that the first term in the above maximisation can be ignored, while in the second one the supremum is achieved at x = 0. This gives the bound lim sup n→∞ 1 2 Now, we can let b ↑ π 2 σ 2 2 − C to see that for any 0 < C < π 2 σ 2 2 , lim sup n→∞ 1 2 n log sup To complete the proof take δ = 1 and note that δ ≤ α * πσ √ 2 < πσ √ 2 . Then, we obtain from (3.5) using the monotonicity of s ↦ ζ n s , where we used that α * < 1. Thus, we can deduce by first taking the limit n → ∞ and then This immediately implies Theorem 1.2 since σ 2 = ρ''(α * ). For Theorem 1.1, p c = 1/ρ(α * ) implies n = log( p n ρ(α * )) = log( p n / p c ) = log 1 + so that Theorem 1.1 follows since σ 2 = ρ''(α * ) p c .

Proofs: Lower Bound
The general strategy of the lower bound is to identify a subtree of the IBRW that has the same distribution as a Galton-Watson tree. For this carefully chosen subtree we then lower bound the survival probability, which in return gives the required lower bound on the survival probability of the IBRW. In Sect. 4.1 we collect some general facts about Galton-Watson trees, which we will then use in Sect. 4.2 to carry out the proof of the lower bound.

Galton-Watson Lemmas
In order to show the lower bound we construct a Galton-Watson tree, whose particles are a subfamily of the killed branching random walk. To estimate the survival probability of this Galton-Watson tree, we will use the following general lemma due to [9] and we also recall the proof for completeness.

Lemma 4.1
Let GW be a Galton-Watson tree and denote by $X$ the number of children in the first generation and by $q := P(|GW| < \infty)$ its extinction probability. Then, for all $r \le \min\{\tfrac{1}{8}, q\}$,

Proof Denote by $q$ the extinction probability of GW and by $s \mapsto g(s) = E[s^X]$ the generating function of GW. Then, for every $r \in [0, q]$, where we used that $g'(s)$ is increasing, and $g'(s) \le 1$ for all $s \in [0, q]$. We continue by estimating where we used for the second summand that $u \mapsto u e^{-ru}$ is decreasing on $[r^{-1}, \infty)$ and Then, for all $r \in (0, \tfrac{1}{8}]$, Combining (4.2) and (4.3), we deduce Rearranging (4.1) and (4.4), we conclude that for all $r \le \min\{\tfrac{1}{8}, q\}$, as required.
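The extinction probability $q$ in Lemma 4.1 is the smallest fixed point of the generating function $g$ in $[0, 1]$, and it can be approximated by iterating $g$ from 0. The following is a minimal sketch, assuming an offspring law with finite support given as a list of probabilities; the function names and the example law are illustrative and do not come from the paper.

```python
def generating_function(pmf, s):
    """g(s) = E[s^X] for an offspring law pmf[k] = P(X = k)."""
    return sum(p * s ** k for k, p in enumerate(pmf))


def extinction_probability(pmf, tol=1e-14, max_iter=100000):
    """Smallest fixed point of g in [0, 1], found by iterating q -> g(q).

    Starting from q = 0, the iterates g(0), g(g(0)), ... are the
    probabilities of extinction by generation 1, 2, ..., so they increase
    to the extinction probability.
    """
    q = 0.0
    for _ in range(max_iter):
        nxt = generating_function(pmf, q)
        if abs(nxt - q) < tol:
            return nxt
        q = nxt
    return q


if __name__ == "__main__":
    # Supercritical example: mean 0.25 + 2 * 0.5 = 1.25, fixed points 0.5 and 1.
    print(extinction_probability([0.25, 0.25, 0.5]))
    # Subcritical example: mean 0.5, the process dies out almost surely.
    print(extinction_probability([0.5, 0.5]))
```

In the supercritical example the iteration contracts at rate $g'(q) = 0.75 < 1$ near the fixed point, which is the same quantity that controls the bound $g'(s) \le 1$ on $[0, q]$ used in the proof.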
We will also need an estimate which guarantees that a supercritical Galton-Watson process grows exponentially fast on survival with large probability.

Lemma 4.2
For all $\theta_1 > 1 > \theta_2 > 0$ there exists $\delta > 0$ such that for any Galton-Watson process $(X_n : n \in \mathbb{N})$ with $X_0 = 1$ whose mean offspring number $m$, offspring distribution $(p_k)_{k \in \mathbb{N}_0}$, and extinction probability $q$ satisfy $q, p_1 < \delta$ and $m > 1/\delta$, we have, for sufficiently large $n$,

Proof Denote by $g$ the generating function of $X_1$. By pruning the tree, i.e. removing all finite subtrees, we obtain a tree which on survival of the original process equals a Galton-Watson process $(\tilde X_n : n \in \mathbb{N})$ with $\tilde p_0 = 0$, $\tilde p_1 = g'(q) \le p_1 + \frac{2q}{(1-q)^2}$, and the same mean as the original process. Hence $\delta > 0$ can be chosen such that the pruned process has arbitrarily small $\tilde p_1$ and arbitrarily large mean.
We first show the statement for the pruned process. For an individual $v$, we denote by $\tilde X_n(v)$ the number of its offspring in generation $n$ of the pruned Galton-Watson process. Let $v_0, v_1, \dots, v_n$ be the first individuals according to the Ulam-Harris labelling in generations 0 up to $n$. Then, we can bound where $\tilde X^{(i)}_n$ is defined as follows: if $v_{i-1}$ has two or more offspring we set $\tilde X^{(i)}_n = \tilde X_n(\tilde v_i)$, where $\tilde v_i$ is the second offspring of $v_{i-1}$, and if $v_{i-1}$ has only one offspring, we set $\tilde X^{(i)}_n = 0$. In particular, the $\tilde X^{(i)}_n$ are independent and have distribution $P(\tilde X^{(i)}_n = 0) = \tilde p_1$ and $P(\tilde X^{(i)}_n = k \mid \tilde X^{(i)}_n > 0) = P(\tilde X_{n-i} = k)$, $k \in \mathbb{N}$. Then, we can calculate We now choose $\delta > 0$ so small that $E \tilde X_1 > \theta_1^2$ and $\tilde p_1 < \theta_2^2/8$. In particular, for sufficiently large $n$ we have $P(\tilde X_{n/2} \le \theta_1^n) \le \theta_2^2/8$. Hence, we obtain for $n$ sufficiently large Note that if we condition $X_n$ on extinction, then in distribution it is equal to a Galton-Watson process $X^*_n$ with mean $g'(q) = \tilde p_1$. Therefore, by Markov's inequality by the same assumptions on $\delta$ as above for $n$ large. We can also assume that $\delta$ is sufficiently small, so that the extinction probability $q$ is less than $1/2$. Hence, combining the above estimates, which completes the proof.
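The pruning step in the proof of Lemma 4.2 can be checked numerically on a small example. The following is a minimal sketch, assuming an offspring law with finite support and a known extinction probability $q$; it uses the standard fact that, on survival, the pruned tree is a Galton-Watson tree with generating function $(g(q + (1-q)s) - q)/(1-q)$. The function names are hypothetical.

```python
import math


def generating_function_derivative(pmf, s):
    """g'(s) for an offspring law pmf[k] = P(X = k)."""
    return sum(k * p * s ** (k - 1) for k, p in enumerate(pmf) if k >= 1)


def pruned_offspring_law(pmf, q):
    """Offspring law of the pruned tree (all finite subtrees removed).

    A child is kept with probability 1 - q, independently of the others,
    and the parent is conditioned to keep at least one child. Expanding
    (g(q + (1 - q) s) - q) / (1 - q) in powers of s gives, for j >= 1,
    sum over k >= j of pmf[k] * C(k, j) * q^(k - j) * (1 - q)^(j - 1).
    """
    survive = 1.0 - q
    pruned = [0.0]  # the pruned process has no childless individuals
    for j in range(1, len(pmf)):
        total = 0.0
        for k in range(j, len(pmf)):
            total += pmf[k] * math.comb(k, j) * q ** (k - j) * survive ** (j - 1)
        pruned.append(total)
    return pruned


if __name__ == "__main__":
    pmf = [0.25, 0.25, 0.5]  # extinction probability 0.5, mean 1.25
    q = 0.5
    pruned = pruned_offspring_law(pmf, q)
    print(pruned)                                   # pruned offspring law
    print(generating_function_derivative(pmf, q))   # equals pruned[1]
    print(pmf[1] + 2 * q / (1 - q) ** 2)            # upper bound from the proof
```

In this example the pruned law puts mass 0.75 on one child and 0.25 on two children, so its mean is again 1.25, in line with the statement that pruning preserves the mean.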

The Lower Bound
Throughout, we use the same notation as in the upper bound: For the proof of Theorem 1.1, let $(p_n)_{n \in \mathbb{N}}$ be a sequence of retention probabilities with $p_n \downarrow p_c$. For Theorem 1.2, let $(t_n)_{n \in \mathbb{N}}$ be a sequence of parameters with $t_n \uparrow \infty$. We denote by $S^{(n)}$ the positions either in the percolated IBRW or in the IBRW with attachment rule $f_{t_n}$. If the context is clear, we will omit the superscript. Also, we write
$$\rho_n(\cdot) = \begin{cases} p_n \rho(\cdot) & \text{for Theorem 1.1,} \\ \rho_{t_n}(\cdot) & \text{for Theorem 1.2,} \end{cases} \qquad \text{and} \qquad \alpha_n^* := \begin{cases} \alpha^* & \text{for Theorem 1.1,} \\ \alpha^*_{t_n} & \text{for Theorem 1.2.} \end{cases}$$
Moreover, we denote by v n the eigenfunction for ρ n (α * n ) from Lemma 2.2.
Given any starting point $s \ge 0$ and initial type $\tau$, we will write $P = P_{(s,\tau)} = P^{p_n}_{(s,\tau)}$ in the percolation case and $P = P_{(s,\tau)} = P^{1}_{(s,\tau)}$ in the case of Theorem 1.2. In view of Lemma 4.1, we will now choose a Galton-Watson tree $GW_n$ as a subtree of the killed IBRW in the following way, where we denote by $X^{(n)}$ the number of children in the first generation of $GW_n$:
(a) $P(X^{(n)} \neq 0) \approx$ the survival probability of the Galton-Watson process. That is, when there are offspring, then the process usually survives.
(b) $P(X^{(n)} \neq 0)$ is close to the survival probability of the killed IBRW. That means that we choose the subpopulation as a good approximation of the BRW and that the first inequality in (4.5) is a good estimate.
(c) $P(1 \le X^{(n)} \le r^{-2})$ has to be small to beat $r^2$.
The Galton-Watson tree is obtained by a coarse-graining procedure, which we now describe. It involves positive parameters b, λ, θ, M which will be chosen carefully at a later stage of the proof. We group together the first N + o(N ) generations in the IBRW to form the first generation in GW n . It turns out that we have to choose N = N n such that and for the first N steps we only choose particles whose positions are in the interval To be precise, let L(N ) = N + N 1/3 and Then we define C n to be the particles in the first generation of GW n . We include the last N 1/3 generations to make sure that if we survive until time N , then we will have many particles by time L (N ).
We iterate the procedure, i.e. the children of $y \in C_n$ will be and $\bigcup_{y \in C_n} C_n(y)$ will form the individuals of the second generation of $GW_n$, and we continue in a similar way. Note that by the construction of $I_{i,n}$, we only include children in the second generation of the original IBRW that are to the left of the position of their ancestor. In particular, their distribution does not depend on the type of the parent. Moreover, all the conditions on the spatial positions are relative to $S(\emptyset)$. Therefore, the distribution of $C_n$ depends neither on the type of the root $\emptyset$ nor on the initial position $S(\emptyset)$. Similarly, the distribution of $C_n(y)$ depends neither on the type nor on the position of $y$. Hence, the numbers of individuals in the different generations really do form a Galton-Watson process.
Moreover, if we assume that M < θb, (4.7) then we have that all particle positions satisfy S(x i ) − S(x 0 ) ≤ 0 for all i ≤ |x| and x ∈ C n and GW n is really a subset of the killed branching random walk.

Coupling with a Galton-Watson process
To control the contribution of the last $N^{1/3}$ steps of the branching random walk, we use the following coupling: We can couple the IBRW with a modified IBRW, where in generations $kL(N) + N, \dots, kL(N) + L(N) - 1$, for $k \in \mathbb{N}_0$, the particles place their offspring according to the following rules relative to their own position:
(i) to the right, the positions of the offspring follow the jump times of the birth process $Z^{(f)}_t$ started in 0, which jumps from $k$ to $k + 1$ at rate $f(k)$.
(ii) to the left, the positions of the offspring are given by a Poisson point process with intensity is the birth process with jump rate $f$ started in 0.
(iii) Also, in these generations types do not play a role, so we define all particles to have the same type as $S(\emptyset)$ in the original process.
Now, we define a second Galton-Watson tree in the same way as $GW_n$ above, but in terms of the modified IBRW, and we again consider the individuals in its first generation. Since $(f_{t_k})$ is decreasing, we can couple the processes such that if the modified tree survives then $GW_n$ survives, and such that the first generation of the modified tree has at most $|C_n|$ individuals.
For the lower bound on $P(C_n \neq \emptyset)$ it turns out that it is enough to control the probability of the set being non-empty. Note that for both the original process and the modification, the set $C^{(\mathrm{asym})}_n$ is the same; however, in the modified process, the next generations $N + 1, \dots, L(N)$ have the distribution of a single-type branching random walk. In particular, the numbers of particles in each generation form a Galton-Watson process, which we will denote by $(X_k)_{k \in \mathbb{N}_0} = (X_k(M))_{k \in \mathbb{N}_0}$. Moreover, we will denote by $q = q_M$ its extinction probability when started with a single particle. Note that $q$ does not depend on $n$ and we will use that by increasing $M$ we have that $\lim_{M \to \infty} q_M = 0$ and also $\lim_{M \to \infty} E X_1(M) = \infty$.
By the Markov property, the survival of the two subsets of the killed branching random walk are related by The next result is the key step in the overall lower bound, where we will bound the probability that $C^{(\mathrm{asym})}_n$ is non-empty.

Proof By construction the probability of the event $C^{(\mathrm{asym})}_n \neq \emptyset$ depends neither on the position of the initial particle nor on its type, so we can assume that $S^{(n)}(\emptyset) = 0$ and $\tau(\emptyset) = 0$.
For the first part of the proof, we will omit the indices n, whenever the context is clear and we are not dealing with asymptotic statements. In particular, we will write N = N n , S = S (n) , α = α n , etc. Also, in this proof ρ = ρ p n in the percolation case and ρ = ρ t n if the attachment rule is varying.
The first step is to carefully select the relevant particles in $C^{(\mathrm{asym})}_n$. For $R_n = e^{N^{1/4}}$ and $M_n = N^{1/5}$ and an individual $x$, we write Recall that For any $|x| = N$, let ν The Paley-Zygmund inequality yields (4.9) The remaining proof proceeds as follows: in the first step we find an easier upper bound on $E[Y_n^2]$, which we can estimate using the many-to-one lemma in the second step. In Step 3, we find a lower bound on $E[Y_n]$, which we will then combine with the other steps to obtain our claim.
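The second moment argument rests on the Paley-Zygmund inequality in its standard form $P(Y > 0) \ge E[Y]^2 / E[Y^2]$ for a non-negative random variable $Y$. The following is a minimal sketch, assuming a toy discrete distribution in place of the counting variable $Y_n$ from the proof; the function names and the example values are illustrative only.

```python
def paley_zygmund_lower_bound(values, probs):
    """Return E[Y]^2 / E[Y^2] for a discrete non-negative random variable.

    values[i] is attained with probability probs[i]. By the Cauchy-Schwarz
    inequality applied to Y = Y * 1{Y > 0}, this ratio is a lower bound
    on P(Y > 0).
    """
    first_moment = sum(v * p for v, p in zip(values, probs))
    second_moment = sum(v * v * p for v, p in zip(values, probs))
    return first_moment ** 2 / second_moment


def positive_probability(values, probs):
    """Exact P(Y > 0) for the same discrete distribution."""
    return sum(p for v, p in zip(values, probs) if v > 0)


if __name__ == "__main__":
    values = [0, 1, 4]
    probs = [0.5, 0.3, 0.2]
    print(paley_zygmund_lower_bound(values, probs))  # lower bound
    print(positive_probability(values, probs))       # exact probability
```

The bound is sharp when $Y$ is constant on the event $\{Y > 0\}$, which is why the proof aims to show that $E[Y_n^2]$ is not much larger than $E[Y_n]^2$ on the exponential scale.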
Step 1: We split the sum over y according to the last time j ∈ {0, . . . , N } that the ancestors of y agree with x to obtain Conditioning on the j + 1 first generations and using the independence of the branching process and the fact that on {ν x j ≤ R n }, we only have to consider at most R n relevant siblings of x j+1 , we obtain the upper bound where h N ,n = 1 and for j ≤ N − 2, h j,n := sup In particular, we have shown that (4.10)

Step 2: Upper bound on h j,n . First, we note that, since the left, resp. right, end point of I k,n is to the left of the left, resp. right, end point of I j,n for k ≥ j, we can apply the monotonicity in the initial types to deduce that h j,n ≤ sup We would now like to apply Mogulskii's theorem to estimate the probability, but we need a bound that works uniformly for all j (since we will be summing over j) and uniformly in u.
We thus approximate the sum over j by a finite sum and also split up the interval [−λN 1/3 , 0] into smaller subintervals. So fix κ ∈ N and define K := K n := N /κ .
Combining the upper bound on N j=1 h j,n with the lower bound on E[Y n ] as well as the fact that R n = e N 1/4 , we obtain from the second moment bound (4.10), Finally, we can let θ 1 ↓ θ to obtain the claim of the proposition. In our above construction, we first of all choose the constant M large enough such that the Galton-Watson process (X k (M)) k∈N 0 satisfies the assumptions of Lemma 4.2 for θ 1 = 2 and θ 2 = 1/2. Also, we can assume that its survival probability satisfies 1 − q M ≥ 1/2. Let θ be such that α * θ < 1. Then, choose b large enough such that θ b > M, so that in particular GW n is a subset of the killed IBRW, cf. (4.7). Additionally, we require that b > π 2 σ 2 1 − α * θ 2α * log 2 2 .
Then, if we define $r := \frac{1}{16} P(C^{(\mathrm{asym})}_n \neq \emptyset)$, we obtain by (4.19) that $r^{-2} 2^{-N^{1/3}} \le e^{N^{1/3}(2\lambda\alpha^* - \log 2 + o(1))} \to 0$. We will use the following general fact (see [9, Fact 4.2], but also [10, Lemma 5.2]): let $X_1, \dots, X_k$ be independent non-negative random variables and suppose $F : (0, \infty) \to [0, \infty)$ is non-increasing, then (4.20) We will eventually apply Lemma 4.1 to $GW_n$ and thus first estimate $P(1 \le |C_n| \le r^{-2})$. We write $S$ for the positions in the modified branching random walk used in the definition of $GW_n$. For all sufficiently large $n$, by using (4.20) in the second step, we obtain where we used Lemma 4.2 for the last inequality.

Proofs in the Linear Case
In this section, we show how to deduce Corollary 1.3 from our general result, Theorem 1.2.
Proof of Corollary 1.3 In the proof of Proposition 1.3 in [7] it was shown that for linear attachment functions, the spectral radius of A α equals the largest eigenvalue of This eigenvalue is given by In order to apply Theorem 1.2, we need to determine ρ''(α * ). To this end, we write (α) for the large square bracket in (5.1) and Then ρ (α) = − 1 2 ϕ (α) ϕ(α) 2 (β(1 − 2γ ) + (α)) We need the second derivative only for α = α * = 1/2. Since after applying the product rule, any term multiplied by (2α − 1) vanishes, we obtain For the corresponding derivative with respect to γ we use (5.6) to derive

Since g 1 and g 2 are continuous functions, taking A → ∞ yields lim sup n→∞ a 2 n k n log P (0,τ 0 ) (E n ) ≤ − π 2 σ 2 2 Now we can take δ → 0 to establish the claim.
Choose N = k n r n , m N = k n and m k = kr n for 0 ≤ k ≤ N − 1. Writing y k = g( m k k n ) for 1 ≤ k ≤ N , the Markov property implies P 0,τ 0 (E n ) ≥ p 1,n (0, τ 0 ) ×