Cluster tails for critical power-law inhomogeneous random graphs

Recently, the scaling limit of cluster sizes was obtained for critical rank-1 inhomogeneous random graphs whose degrees have finite variance but infinite third moment (see previous work by Bhamidi, van der Hofstad and van Leeuwaarden). It was proved that when the degrees obey a power law with exponent in the interval (3, 4), the sequence of clusters, ordered in decreasing size and scaled appropriately, converges as n → ∞ to a sequence of decreasing non-degenerate random variables. Here, we study the tails of the limit of the rescaled largest cluster, i.e., the probability that the scaling limit of the largest cluster takes a large value u, as a function of u. This extends a related result of Pittel for the Erd\H{o}s-R\'enyi random graph to the setting of rank-1 inhomogeneous random graphs with infinite third moment degrees. We make use of delicate large deviations and weak convergence arguments.

Theorem 1.1 says that the ordered connected components in the critical Erdős-Rényi random graph are described by the ordered excursions of the reflected version of (W^λ_t)_{t≥0}. The strict inequalities between the scaling limits of the ordered clusters follow from the local limit theorem proved in [23], see also [25,29]. Pittel [31, Eq. (1.12)] derived an exact formula for the distribution function of the limiting variable γ_1(λ) (of the largest component), from which various asymptotic results were obtained, including

P(γ_1(λ) > u) = (9π/8)^{-1/2} u^{-3/2} e^{-u(u-2λ)²/8} (1 + o(1)), u → ∞. (1.2)

As pointed out in [32,33], the constant √(9π/8) was mistakenly reported in [31, Eq. (1.12)] as √(2π) due to a small oversight in the derivation. The result in (1.2) gives sharp asymptotics for the largest component in the critical Erdős-Rényi graph. It was rederived and extended in [33] using the original approach in [31]. Another generalization of (1.2) was obtained in [24] by studying the excursions of the scaling limit of the exploration process that is used to describe the limits in Theorem 1.1. In this paper, we follow a similar path, but now for a class of inhomogeneous random graphs and its scaling limit, and extend (1.2) to this setting.
Several recent works have studied inhomogeneity in random graphs and how it changes the critical nature. In our model, the vertices have a weight associated to them, and the weight of a vertex moderates its degree. Therefore, by choosing these weights appropriately, we can generate random graphs with highly variable degrees. For our class of random graphs, it is shown in [22, Theorem 1.1] that when the weights do not vary too much, the critical behavior is similar to the one in the Erdős-Rényi random graph. See in particular the recent works [6,34], where it was shown that if the degrees have finite third moment, then the scaling limit for the largest critical components in the critical window are essentially the same (up to a trivial rescaling that we explain in more detail below) as for the Erdős-Rényi random graph in Theorem 1.1.
When the degrees have infinite third moment, instead, it was shown in [22, Theorem 1.2] that the sizes of the largest critical clusters are quite different. In [7], scaling limits were obtained for the sizes of the largest components at criticality for rank-1 inhomogeneous random graphs with power-law degrees with power-law exponent τ ∈ (3, 4). For τ ∈ (3, 4), the degrees have finite variance but infinite third moment. It was shown that the sizes of the largest components, rescaled by n^{-(τ-2)/(τ-1)}, converge to hitting times of a thinned Lévy process. The latter is a special case of the general multiplicative coalescents studied by Aldous and Limic in [2] and [3]. We next discuss these results in more detail.

Inhomogeneous Random Graphs
In our random graph model, vertices have weights, and the edges are independent, with edge probabilities being approximately equal to the rescaled product of the weights of the two end vertices of the edge. While there are many different versions of such random graphs (see below), it will be convenient for us to work with the so-called Poissonian random graph or Norros-Reittu model [30]. To define the model, we consider the vertex set [n] := {1, 2, . . . , n} and suppose each vertex is assigned a weight, vertex i having weight w_i. Now, attach an edge between vertices i and j with probability

p_ij = 1 - e^{-w_i w_j/ℓ_n}, (1.3)

where ℓ_n = Σ_{i∈[n]} w_i denotes the total weight. Different edges are independent. In this model, the average degree of vertex i is close to w_i, thus incorporating inhomogeneity in the model. There are many adaptations of this model, for which equivalent results hold. Indeed, the model considered here is a special case of the so-called rank-1 inhomogeneous random graph introduced in great generality by Bollobás et al. [11]. It is asymptotically equivalent to many related models, such as the random graph with prescribed expected degrees or Chung-Lu model, where instead

p_ij = min(w_i w_j/ℓ_n, 1), (1.4)

which has been studied intensively by Chung and Lu (see [13][14][15][16][17]). A further adaptation is the generalized random graph introduced by Britton et al. [12], for which

p_ij = w_i w_j/(ℓ_n + w_i w_j). (1.5)

See Janson [26] for conditions under which these random graphs are asymptotically equivalent, meaning that all events have asymptotically equal probabilities. As discussed in more detail in [22, Sect. 1.3], these conditions apply in the setting studied in this paper. Therefore, all results proved here also hold for these related rank-1 models. We refer the interested reader to [22, Sect. 1.3] for more details.
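As an illustration, the edge mechanism in (1.3) can be simulated directly. The sketch below (the helper name and the naive O(n²) loop are ours, not from the paper) draws one realization of the Poissonian random graph for a given weight sequence:

```python
import math
import random

def norros_reittu_graph(weights, rng):
    """Sample one realization of the Poissonian (Norros-Reittu) random graph:
    edges are independent, and edge {i, j} is present with probability
    p_ij = 1 - exp(-w_i * w_j / l_n), where l_n is the total weight."""
    n = len(weights)
    l_n = sum(weights)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = 1.0 - math.exp(-weights[i] * weights[j] / l_n)
            if rng.random() < p_ij:
                edges.add((i, j))
    return edges

w = [3.0, 2.0, 1.5, 1.0, 1.0]      # toy weight sequence, largest first
g = norros_reittu_graph(w, random.Random(42))
```

With a fixed seed the sample is reproducible; the quadratic loop is only meant for small toy instances.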
Having specified the edge probabilities as functions of the vertex weights w = (w_i)_{i∈[n]} in (1.3), we now explain how we choose the vertex weights. Let the weight sequence w = (w_i)_{i∈[n]} be defined by

w_i = [1 - F]^{-1}(i/n), (1.6)

where F is a distribution function on [0, ∞) for which we assume that there exist τ ∈ (3, 4) and 0 < c_F < ∞ such that

1 - F(x) = c_F x^{-(τ-1)}(1 + o(1)), x → ∞, (1.7)

and where [1 - F]^{-1} is the generalized inverse function of 1 - F defined, for u ∈ (0, 1), by

[1 - F]^{-1}(u) = inf{x ≥ 0 : 1 - F(x) ≤ u}.

By convention, we set [1 - F]^{-1}(1) = 0. Note that our inhomogeneity is chosen in such a way that the vertex weights i ↦ w_i are decreasing, with w_1 being the largest vertex weight. For the setting in (1.3) and (1.6), by [11, Theorem 3.13], the number of vertices with degree k, which we denote by N_k, satisfies

N_k/n -P→ E[e^{-W} W^k/k!],

where -P→ denotes convergence in probability, and where W has distribution function F appearing in (1.6). We recognize the limiting distribution as a so-called mixed Poisson distribution with mixing distribution F, i.e., conditionally on W = w, the distribution is Poisson with mean w. As discussed in more detail in [22], since a Poisson random variable with large parameter w is closely concentrated around its mean w, the tail behavior of the degrees in our random graph is close to that of the distribution F. As a result, when (1.7) holds, and with D_n the degree of a uniformly chosen vertex in [n], lim sup_{n→∞} E[D_n^a] < ∞ when a < τ - 1 and lim sup_{n→∞} E[D_n^a] = ∞ when a ≥ τ - 1. In particular, the degree of a uniformly chosen vertex in [n] has finite second, but infinite third, moment when (1.7) holds with τ ∈ (3, 4).
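For concreteness, with the pure power law 1 - F(x) = c_F x^{-(τ-1)} the generalized inverse, and hence the weight sequence in (1.6), is fully explicit. The sketch below (function name ours) generates such weights and exhibits the decreasing power-law behavior:

```python
def pareto_weights(n, tau, c_F=1.0):
    """w_i = [1 - F]^{-1}(i/n) for the pure power law
    1 - F(x) = c_F * x^{-(tau-1)}, whose generalized inverse is
    [1 - F]^{-1}(u) = (c_F / u)^{1/(tau-1)} for u in (0, 1]."""
    alpha = 1.0 / (tau - 1.0)
    return [(c_F * n / i) ** alpha for i in range(1, n + 1)]

tau, n = 3.5, 10000
w = pareto_weights(n, tau)
# the weights are decreasing, w_1 = n^alpha is the largest, and the weight
# of vertex i decays like i^{-alpha} with alpha = 1/(tau - 1)
```

In particular w_1 is of order n^{1/(τ-1)}, matching the statement below that the largest weight is of order n^α.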
Under the key assumption in (1.7),

[1 - F]^{-1}(u) = c_F^{1/(τ-1)} u^{-1/(τ-1)}(1 + o(1)), u ↓ 0, (1.10)

and the third moment of the degrees tends to infinity, i.e., with W ∼ F, we have E[W³] = ∞. Define

ν = E[W²]/E[W], (1.11)

so that, again by (1.7), ν < ∞. Then, by [11, Theorem 3.1] (see also [11, Sect. 16.4] for a detailed discussion of rank-1 inhomogeneous random graphs, of which our random graph is an example), when ν > 1, there is one giant component of size proportional to n, while all other components are of smaller size o(n), and when ν ≤ 1, the largest connected component contains a proportion of vertices that converges to zero in probability. Thus, the critical value of the model is ν = 1. The main goal of this paper is to investigate what happens close to the critical point, i.e., when ν = 1.
With the definition of the weights in (1.6) and for F such that ν = 1, we write G_n^0(w) for the graph constructed with the probabilities in (1.3), while, for any fixed λ ∈ R, we write G_n^λ(w) when we use the weight sequence

w(λ) = (1 + λn^{-(τ-3)/(τ-1)}) w. (1.12)

We shall assume that n is so large that 1 + λn^{-(τ-3)/(τ-1)} ≥ 0, so that w_i(λ) ≥ 0 for all i ∈ [n]. When τ > 4, so that E[W³] < ∞, it was shown in [6,22,34] that the scaling limit of the random graphs studied here is (apart from a trivial rescaling of time and λ) equal to the scaling limit of the ordered connected components in the Erdős-Rényi random graph in Theorem 1.1. The rescaling of time and λ is due to the variance of the step distribution of the cluster exploration process being unequal to 1 (see Sect. 1.2 below for more details on what we mean by 'cluster exploration'). For the Erdős-Rényi random graph, the step distribution is that of a Poisson random variable with parameter 1, minus one. When τ ∈ (3, 4) the situation is entirely different, as discussed next.
Throughout this paper, we make use of the following standard notation. We let -d→ denote convergence in distribution, and -P→ convergence in probability. For a sequence of random variables (X_n)_{n≥1}, we write X_n = o_P(b_n) when |X_n|/b_n -P→ 0 as n → ∞. For a non-negative function n ↦ g(n), we write f(n) = O(g(n)) when |f(n)|/g(n) is uniformly bounded, and f(n) = o(g(n)) when lim_{n→∞} f(n)/g(n) = 0. Furthermore, we write f(n) = Θ(g(n)) if f(n) = O(g(n)) and g(n) = O(f(n)). Finally, we abbreviate

ℓ_n = Σ_{i∈[n]} w_i. (1.13)
In order to further specify the scaling limit (γ_i(λ))_{i≥1}, we need to introduce a continuous-time process (S_t)_{t≥0}, referred to as a thinned Lévy process and defined in (1.15)-(1.18) below; here ζ denotes the constant given in [7, (2.18)]. The process (S_t)_{t≥0} starts out positive. It can be positive or negative, and we will be interested in the first hitting time of zero of (S_t)_{t≥0}. Further, we use the notation I_i(t) = 1{T_i ≤ t}, where (T_i)_{i≥2} are independent exponential random variables with mean i^{1/(τ-1)}/a. The term thinned Lévy process refers to the fact that I_i(t) counts only the first point of a Poisson process N_i(t); if I_i(t) is replaced by N_i(t) in this representation, then the corresponding process is a Lévy process. In (S_t)_{t≥0}, only the first point in each of these Poisson processes is counted, so we can think of the Poisson processes as being thinned. See below for more details on the interpretation of (S_t)_{t≥0}. Let H_1(0) denote the first hitting time of 0 of the process (S_t)_{t≥0}, i.e.,

H_1(0) = inf{t ≥ 0 : S_t = 0}. (1.20)

Let us informally describe how the process (S_t)_{t≥0} arises through a cluster exploration, and how it is linked to H_1^a(0) in (1.20) as well as to (γ_i(λ))_{i≥1} in (1.14). In Theorem 1.3, we explore the connected component of vertex 1 one vertex at a time in a breadth-first way, and keep track of the number of active vertices, which are vertices that have been found to be in C(1), but whose neighbors have not yet been inspected for membership of C(1). Let S_k^{(n)} be the number of active vertices after k steps, so that S_0^{(n)} = 1. Obviously, |C(1)| = inf{k : S_k^{(n)} = 0}, since we are done with the exploration of a cluster when there are no unexplored vertices left, and we explore one vertex at a time. By construction, S_1^{(n)} is the number of neighbors of vertex 1, which can be seen to be close to w_1 ≈ bn^α. Thus, the exploration process can be expected to be of order n^α, and we will rescale S_k^{(n)} by a factor n^{-α}.
As explained in more detail in [7], and for the edge probabilities in (1.3), the exploration can be performed rather effectively in terms of a marked branching process with mixed Poisson offspring distribution. Here, an unexplored vertex v in the branching process first draws a mark M_v, for which P(M_v = i) = w_i/ℓ_n, and after this, it draws a Poisson number of children with mean w_{M_v}. The connection to the cluster exploration in the graph is obtained by thinning all vertices whose mark has appeared earlier. Here, we can think of the mark M_v = i as indicating that the vertex v in the branching process is mapped to vertex i in the graph.
The largest weights correspond to the small values of i ∈ [n]. The amount of time it takes us to draw a mark corresponding to vertex i is of the order ℓ_n/w_i, which is of order n^ρ i^α/a, which suggests that (n^{-α} S_{tn^ρ}^{(n)})_{t≥0} converges in distribution to some process (S_t)_{t≥0}. Further, the first time that a mark i ≥ 2 is chosen, S_k^{(n)} makes a jump of order n^α b/i^α when i is found to be in C(1). This informally explains the process in (1.15)-(1.18), while (1.19) explains that when the exploration process hits zero, the cluster is fully explored. Turning this into a formal proof was one of the main steps in [7].
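The exploration walk S_k^{(n)} just described is straightforward to implement on a finite graph. The sketch below (our own minimal version, run on a hard-coded toy adjacency list rather than the random graph itself) records the number of active vertices after each step and recovers |C(1)| as the first hitting time of zero:

```python
def explore_cluster(adj, start):
    """Breadth-first exploration: S[k] = number of active (found but not yet
    explored) vertices after k explored vertices; the cluster size is the
    first k with S[k] = 0."""
    S = [1]                      # S_0 = 1: only `start` is active
    active = [start]             # FIFO queue of active vertices
    found = {start}
    while active:
        v = active.pop(0)        # explore one vertex per step
        for u in adj.get(v, ()):
            if u not in found:
                found.add(u)
                active.append(u)
        S.append(len(active))
    return S, len(found)

# toy graph: a path 0-1-2 plus an isolated vertex 3 (assumed adjacency,
# not drawn from the random graph model)
adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
S, size = explore_cluster(adj, 0)
# S = [1, 1, 1, 0]; the cluster of vertex 0 has size 3
```

Note that the cluster size equals the index of the first zero of the walk, matching |C(1)| = inf{k : S_k^{(n)} = 0}.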
The above description does not yet describe the scaling limit (γ_i(λ))_{i≥1} in (1.14). For this, we note that after the exploration of C(1), we need to explore the clusters of the (high-weight) vertices that are not part of C(1). We do this by taking the vertex with the largest weight that is not in C(1), which in the scaling limit corresponds to the smallest i for which I_i(H_1(0)) = 0, and start exploring the cluster of that vertex. This is again done by using processes similar to (S_t)_{t≥0}, but changes arise due to the depletion-of-points effect. Indeed, since C(1) is fully explored, in later explorations those vertices cannot arise again. We refrain from describing this in more detail, as it is not needed in this paper. We repeat this procedure, and explore the connected components of unexplored vertices of the highest weights one by one. After performing these explorations infinitely often, we obtain (γ_i(λ))_{i≥1} as the ordered vector of hitting times of zero of these cluster exploration processes. Some more details are given in Sect. 2.4.
By scaling, H_1^a(0)/a for given a, b, c has the same distribution as the hitting time H_1(0) obtained by taking a = b = 1 and c' = c/(ab) = (λ + ζ)/(ab). We shall therefore reparametrize to a = b = 1 and let (S_t)_{t≥0} be given by (1.21), where we set β̄ as in (1.22), and where I_i(t) is defined as in (1.17)-(1.18) with a replaced by a = 1. This scaling is convenient, as it reduces the clutter in our notation.

Main Results
In this section we state our main results. Recall γ_1(λ) from (1.14). Our Main Theorem establishes a generalization of Pittel's result in (1.2) to our rank-1 inhomogeneous random graph with power-law exponent τ ∈ (3, 4):

Main Theorem (Tail behavior scaling limit for τ ∈ (3, 4)). When u → ∞, there exist I > 0, A > 0 and κ_ij ∈ R such that

P(γ_1(λ) > au) = A u^{-(τ-1)/2} exp(-I u^{τ-1} + Σ_{i+j≥1} κ_ij u^{τ-1-i(τ-2)-j(τ-3)}) (1 + o(1)). (1.24)

The constants I, A and κ_ij are specified in Sect. 2. By scaling, these constants only depend on a, b through c' = c/(ab) = (λ + ζ)/(ab); any other dependence disappears, since the law of H_1(0) only depends on c'. Since τ ∈ (3, 4), the sum over i, j such that i + j ≥ 1 is in fact finite, as we can ignore all terms for which τ - 1 - i(τ-2) - j(τ-3) < 0. We also see that the Main Theorem connects up nicely with Pittel's result in (1.2), which arises for τ = 4: for example, the exponent of u in the exponential equals 3 for τ = 4, and the exponent in the power of u in the prefactor equals 3/2, as in (1.2). That these powers depend sensitively on τ is a manifestation of the importance of the inhomogeneity, which we will see throughout this paper.
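The finiteness of the correction sum is easy to check numerically: enumerating the pairs (i, j) with i + j ≥ 1 and a positive exponent τ - 1 - i(τ-2) - j(τ-3) (helper name ours) produces only finitely many terms for any τ ∈ (3, 4), and at τ = 4 exactly the exponents 2, 1, 1 of Pittel's regime:

```python
def positive_exponents(tau, max_ij=10):
    """Exponents tau - 1 - i*(tau-2) - j*(tau-3) of u in the correction
    terms; only pairs (i, j) with i + j >= 1 and a positive exponent
    contribute, and all others may be ignored."""
    out = {}
    for i in range(max_ij + 1):
        for j in range(max_ij + 1):
            if i + j >= 1:
                e = (tau - 1) - i * (tau - 2) - j * (tau - 3)
                if e > 0:
                    out[(i, j)] = e
    return out

exps = positive_exponents(3.5)     # six contributing pairs for tau = 3.5
exps4 = positive_exponents(4.0)
# at tau = 4 the surviving pairs are (0,1), (1,0), (0,2), with exponents
# 2, 1, 1, i.e. the terms kappa_01 u^2 + (kappa_10 + kappa_02) u
```

This matches the remark below that substituting τ = 4 recovers the exponents appearing in (1.2).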
Aside from the Main Theorem, we prove two further theorems about the structure of the largest connected component when it is large. The first concerns the probability that H_1^a(0) > u for u > 0 large, where H_1^a(0) is the weak limit of n^{-ρ}|C(1)| identified in Theorem 1.3. This is achieved by investigating the hitting time H_1(0) of 0 of the process (S_t)_{t≥0} in (1.21).

Theorem 1.4 (Tail behavior scaling limit cluster vertex 1 for τ ∈ (3, 4)). When u → ∞, there exist I > 0 independent of β̄, A = A(β̄) > 0 and κ_ij(β̄) ∈ R such that

P(H_1(0) > u) = A u^{-(τ-1)/2} exp(-I u^{τ-1} + Σ_{i+j≥1} κ_ij u^{τ-1-i(τ-2)-j(τ-3)}) (1 + o(1)). (1.25)

The constants I, A and κ_ij are equal to those in the Main Theorem. Comparing the Main Theorem and Theorem 1.4, we see that P(H_1(0) > u) = P(γ_1(λ) > au) · (1 + o(1)).
This has the interpretation that vertex 1, which is the vertex with the largest weight in our rank-1 inhomogeneous random graph, is with overwhelming probability in the largest connected component when this largest connected component is quite large. We can even go one step further and study the optimal trajectory that the process t ↦ S_t takes in order to achieve the unlikely event that H_1(0) > u when u is large. In order to describe this trajectory, we need to introduce some further notation. In the proof, it will be crucial to tilt the distribution, i.e., to investigate the measure P̃ with Radon-Nikodym derivative e^{θuS_u}/E[e^{θuS_u}], for some appropriately chosen θ. The selection of an appropriate θ for the thinned Lévy process (S_t)_{t≥0} is quite subtle, and has been the main topic of our paper [1]. The main results from [1] are reported in Sect. 2, and will play an important role in the present analysis. We refer to below (2.13) for the definition of θ* that appears in the description of the optimal trajectory that is identified in the following theorem (Fig. 1: Brownian motion on a parabola). Our Main Theorem follows by combining Theorems 1.4 and 1.5, and showing that, for u large, the probability that vertex 1 lies in the largest cluster is overwhelmingly large. This argument is performed in detail in Sect. 2.4. Note that substituting τ = 4 into (1.24) yields A u^{-3/2} e^{-Iu³ + κ_01 u² + (κ_10 + κ_02)u}(1 + o(1)), which agrees with the result of Pittel in (1.2). This suggests a smooth transition from the case τ ∈ (3, 4) to the case τ > 4. We next further explore this relation.
Consider the process (W_t^λ)_{t≥0} = (W_t + λt - t²/2)_{t≥0}, with (W_t)_{t≥0} a standard Wiener process, as mentioned in Theorem 1.1. We now apply the technique of exponential change of measure to this process. First note that the moment generating function of W_u^λ can be computed as

log φ(u; ϑ) ≡ log E[e^{ϑuW_u^λ}] = ϑu(λu - ½u²) + ½ϑ²u³, (1.28)

and let θ_u* be the solution of θ_u* = arg min_ϑ log φ(u; ϑ), which is given by

θ_u* = ½ - λ/u = (u - 2λ)/(2u).

By Markov's inequality, P(W_u^λ > 0) ≤ φ(u; θ_u*). The main term is

φ(u; θ_u*) = e^{-u(u-2λ)²/8}.

Noting that this exponent coincides with the one in (1.2), we see that this upper bound agrees to leading order with the result of Pittel in (1.2). In order to derive the full asymptotics in (1.2), one can define the measure P̃ via

dP̃/dP = e^{θ_u* uW_u^λ}/φ(u; θ_u*), (1.33)

write P(W_u^λ > 0) = φ(u; θ_u*) Ẽ[e^{-θ_u* uW_u^λ} 1{W_u^λ > 0}], and then deduce the asymptotics of the latter expectation in full detail. Our analysis will be based on this intuition, now applied to a more involved, so-called thinned Lévy, stochastic process.
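The computation of θ_u* and of the resulting bound can be verified numerically. The snippet below (function names ours) minimizes ϑ ↦ log φ(u; ϑ) for the Gaussian variable W_u^λ and confirms that the optimal exponent is -u(u-2λ)²/8:

```python
import math

def log_phi(u, lam, theta):
    """log E[exp(theta*u*W^lam_u)] for W^lam_u = W_u + lam*u - u**2/2,
    a Gaussian with mean lam*u - u**2/2 and variance u."""
    mean = lam * u - u ** 2 / 2
    return theta * u * mean + (theta * u) ** 2 * u / 2

def theta_star_u(u, lam):
    """Minimiser of theta -> log_phi(u, lam, theta): theta*_u = 1/2 - lam/u."""
    return 0.5 - lam / u

u, lam = 10.0, 1.0
ts = theta_star_u(u, lam)
bound = log_phi(u, lam, ts)
# Chernoff bound: log P(W^lam_u > 0) <= bound = -u*(u - 2*lam)**2/8 = -80
```

Perturbing ϑ away from θ_u* strictly increases log φ, confirming the minimizer.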

Overview of the Proofs
In this section, we give an overview of the proofs of Theorems 1.4 and 1.5. The point of departure for our proofs is the conjecture that P(H_1(0) > u) ≈ P(S_u > 0) for large u. The event {H_1(0) > u} obviously implies {S_u > 0}, but because of the strong downward drift of the process (S_t)_{t≥0}, it seems plausible that both events are roughly equivalent.
In [1], a detailed study was presented of the large deviations behavior of the process (S_t)_{t≥0}. Using exponential tilting of measure, the following two theorems were proved.

Theorem 2.1 (Exact asymptotics tail S_u [1, Theorem 1.1]). There exist I, D > 0 and κ_ij ∈ R such that, as u → ∞,

P(S_u > 0) = D u^{-(τ-1)/2} exp(-I u^{τ-1} + Σ_{i+j≥1} κ_ij u^{τ-1-i(τ-2)-j(τ-3)}) (1 + o(1)). (2.1)

In [1] it is explained that specific challenges arise in the identification of a tilted measure due to the power-law nature of (S_t)_{t≥0}. General principles prescribe that the tilt should follow from a variational problem, but in the case of (S_t)_{t≥0} this involves a Riemann sum that is hard to control. In [1] this Riemann sum is approximated by its limiting integral, and it is proved that the tilt that follows from the corresponding approximate variational problem is sufficient to establish the large deviations results in Theorems 2.1 and 2.2. Details about this tilted measure are presented in Sect. 2.1.
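The Riemann-sum-to-integral approximation underlying the tilt can be illustrated on a toy summand (our choice g(x) = e^{-x}/√x, which, like the summand in the variational problem, is integrable at 0 and at ∞ despite blowing up at the origin): the left-point Riemann sum with mesh 1/u approaches the integral as u grows.

```python
import math

def riemann_sum(g, u, n_terms=200000):
    """Left-point Riemann sum (1/u) * sum_{i>=1} g(i/u), which approximates
    the integral of g over (0, infinity) as the mesh 1/u tends to 0."""
    return sum(g(i / u) for i in range(1, n_terms + 1)) / u

# toy summand, integrable at 0 and at infinity despite the blow-up at 0
g = lambda x: math.exp(-x) / math.sqrt(x)
integral = math.sqrt(math.pi)              # Gamma(1/2)

err_small = abs(riemann_sum(g, 100.0) - integral)
err_large = abs(riemann_sum(g, 1000.0) - integral)
# the error decays like |zeta(1/2)|/sqrt(u), an Euler-Maclaurin correction
```

The slow u^{-1/2} decay of the error is exactly the kind of zeta-function correction term that the Euler-Maclaurin analysis in Sect. 2.1 has to control.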
It is clear that Theorems 2.1 and 2.2 for the event {S_u > 0} are the counterparts of Theorems 1.4 and 1.5 for {H_1(0) > u}. Let us now sketch how we make the conjecture that P(H_1(0) > u) ≈ P(S_u > 0) for large u formal. We show that P(H_1(0) > u) has the same asymptotic behavior as P(S_u > 0) in (2.1), with the same constants except for the constant D. Despite the similarity of this result, the proof method we shall use is entirely different from the exponential tilting in [1]. In order to establish the asymptotics for P(H_1(0) > u), we establish sample path large deviations, not conditioned on the event {S_u > 0}, but on the event {H_1(0) > u}. This is much harder, since we have to investigate the probability that S_t > 0 for all t ∈ [0, u]. However, this is also more important, as only the hitting times H_1(0) give us asymptotics of the limiting cluster sizes. In order to prove these strong sample-path properties, we first prove that, under the tilted measure, S_t is close to its expected value for a finite, but large, number of t's, followed by a proof that the path cannot deviate much in the small time intervals between these times. Now here is our strategy for the proofs. We extend the conjecture P(H_1(0) > u) ≈ P(S_u > 0) by a conjectured sample path behavior that says that, under the tilted measure, the typical sample path of (S_t)_{t≥0} that leads to the event {S_u > 0} remains positive and hence implies {H_1(0) > u}. To be more specific, we divide this likely sample path into three parts: the early part, the middle part, and the end part. Our proof consists of treating each of these parts separately.
We shall prove consecutively that with high probability the process: (i) Does not cross zero in the initial part of the trajectory ('no early hits'); (ii) Is high up in the state space in the middle part of the trajectory, while experiencing small fluctuations, and therefore does not hit zero ('no middle ground'); (iii) Is forced to remain positive until the very end.
In the last step, we have to be very careful, and it is in this step that it will turn out that the constant D arising in the asymptotics of P(S_u > 0) in (2.1) is different from the constant A arising in the asymptotics of P(H_1(0) > u) in (1.25). This is due to the fact that even when S_u > 0, the path could dip below zero right before time u, and does so with non-vanishing probability. The proof reveals that, when it does, it will do so in the time interval [u - Tu^{-(τ-2)}, u] for some large T.
We next summarize the technique of exponential tilting developed in [1] for the thinned Lévy process (S_t)_{t≥0} with τ ∈ (3, 4), which allows us to give more details about how we shall establish the conjectured sample path behavior for each of the three parts described above.

Tilting and Properties of the Tilted Process
All results presented in this subsection are proved in [1].
Exponential tilting. Parts of this section are taken almost verbatim from [1]. We use the notion of exponential tilting of measure in order to rewrite P(S_u > 0), where the tilt parameter ϑ is chosen later on. For every event E, define the measure P_ϑ, with corresponding expectation E_ϑ, by means of the equality

P_ϑ(E) = E[e^{ϑuS_u} 1_E]/φ(u; ϑ),

with normalizing constant φ(u; ϑ) given by

φ(u; ϑ) = E[e^{ϑuS_u}].

In terms of this notation, we are interested in

P(S_u > 0) = φ(u; ϑ) E_ϑ[e^{-ϑuS_u} 1{S_u > 0}]. (2.4)

We now explain in more detail how to choose a good ϑ. The independence of the indicator processes yields an explicit expression for log φ(u; ϑ) as a sum over i ≥ 2, with summands of the form f(i; ϑ). The function x ↦ f(x; ϑ) is integrable at x = 0 and at x = ∞, so the above sum can be approximated by the corresponding integral, up to an error term u ↦ e_ϑ(u), where α = 1/(τ-1) and the Riemann zeta function ζ(·) is defined as

ζ(s) = Σ_{n≥1} n^{-s}, Re(s) > 1,

extended by analytic continuation, where Re(s) denotes the real part of s ∈ C. Equation (2.11) follows from Euler-Maclaurin summation [21, p. 333]. The error term in (2.10) converges to 0 uniformly for ϑ in compact sets bounded away from zero. As a result,

log φ(u; ϑ) = u^{τ-1} Λ(ϑ)(1 + o(1)), (2.12)

where Λ(ϑ) denotes the limiting integral. Let θ_u* be the solution of the minimization problem in (2.13), i.e., θ_u* = arg min_ϑ log φ(u; ϑ). Moreover, let θ* be the value of ϑ where ϑ ↦ Λ(ϑ) is minimal. It follows easily that I ≡ -Λ(θ*) > 0 and that θ* is unique. In [1, Lemma 3.6], we have seen that θ_u* = θ* + o(1). Further, θ* > 0 by [1, Lemma 3.5]. Set φ(u) = φ(u; θ_u*). The asymptotics of φ(u) are given in (2.14).

Properties of the process under the tilted measure
In what follows, take ϑ = θ_u*, and let P = P_{θ_u*} with corresponding expectation E = E_{θ_u*}. Abbreviate θ = θ_u*. Under this new measure, the rare event of S_u being positive becomes quite likely. To describe these results, let us introduce some notation. Recall the definition in (1.26), for p ∈ [0, 1], where we take ϑ = θ*, which turns out to be the limit of θ_u* as u → ∞ (see, e.g., [1, Lemma 3.6]). The asymptotic mean of the process p ↦ S_{pu}, conditionally on S_u > 0, can be described with the help of the function p ↦ I_E(p), cf. Theorem 2.2. One easily checks that

I_E(0) = 0 and I_E(1) = 0, (2.16)

the latter by the definition of θ*. We will also need some consequences of the asymptotic properties of E[S_t]. This is stated in the following corollary:

Proof. Part (a) for t ∈ [ε, εu] with ε > 0 sufficiently small follows from Lemma 2.4(a), together with the facts that I_E(0) = 0, I_E'(0) > 0, and that 1 + t + t|θ* - θ_u*|u^{τ-3} = o(tu^{τ-3}). The fact that I_E'(0) > 0 also implies that c can be taken to be strictly positive. For t ∈ [εu, u/2], Part (a) follows from the fact that I_E(p) > 0 for all p ∈ [ε, 1/2]. Part (b) follows as Part (a), now using Lemma 2.4(b) together with the fact that I_E(1) = 0. Part (c) follows from Lemma 2.4(a), by subtracting the two terms; the error term is negligible by Part (a) of this corollary. Part (d) follows again from Lemma 2.4(a) by subtracting the two terms; the error term is now negligible by Part (b) of this corollary and Lemma 2.4(a). □

The next lemma gives asymptotic properties of the variance of S_t. Define, for p ∈ [0, 1], the function I_V(p) as in (2.23). Again, it is not hard to check the corresponding boundary behavior. The next result bounds the Laplace transform of the pair (S_t, S_u). Combining Proposition 2.7 with uE[S_u] = o(1) (see [1, Lemma 4.1]) shows that u^{-(τ-3)/2} S_u converges to a normal distribution with mean 0 and variance I_V(1).
Moreover, as we see below, the density of S_u close to zero behaves like (2π I_V(1))^{-1/2} u^{-(τ-3)/2}. There are three more results from [1] that will be used in this paper. The first is a description of the distribution of the indicator processes (I_i(t))_{t≥0} under the measure P. Since the indicator processes (I_i(t))_{t≥0} are independent under the original measure, this independence persists under the measure P:

Lemma 2.9 (Indicator processes under the tilted measure [1, Lemma 4.2]). Under the measure P, the distribution of the indicator processes (I_i(t))_{t≥0} is that of independent indicator processes.

The second lemma describes what happens to the variances for small p or for p close to 1. Consequently, there exist 0 < c < c̄ < ∞ such that the variance is bounded between the corresponding values for every p ∈ [0, ε], with ε > 0 sufficiently small. We finally rely on a corollary that allows us to compute sums that we will encounter frequently.

No Early Hits and Middle Ground
In this section, we prove that the tilted process is unlikely to hit 0 until a time that is very close to u. We start by investigating the early hits.
No early hits. In this step, we prove that it is unlikely that the process hits zero early on, i.e., in the first time interval [0, ε] for some ε > 0 sufficiently small. In its statement, we write 0 ∈ S_{[0,t]} for the event that S_s = 0 for some s ∈ [0, t], so that P(0 ∈ S_{[0,ε]}) is the probability of an early hit of zero.
The proof of Lemma 2.12 follows from a straightforward application of the FKG inequality for independent random variables (see [19], or [20, Theorem 2.4, p. 34]). The standard versions of the FKG inequality hold for independent indicator random variables, whereas in our case we need it for independent exponentials. It is not hard to prove that the FKG inequality we need holds, by an approximation argument.
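In the simplest setting of independent Bernoulli coordinates, the negative correlation between an increasing and a decreasing event can be verified by exact enumeration (all names below are ours; this toy stands in for the exponential variables used in the actual proof):

```python
from itertools import product

def prob(event, p=0.5, n=3):
    """Exact probability of an event (a predicate on {0,1}^n) under
    independent Bernoulli(p) coordinates, by full enumeration."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        weight = 1.0
        for xi in x:
            weight *= p if xi else (1 - p)
        if event(x):
            total += weight
    return total

A = lambda x: sum(x) >= 2            # increasing event: more 1's only helps
B = lambda x: x[0] == 0              # decreasing event
pAB = prob(lambda x: A(x) and B(x))
# FKG: an increasing and a decreasing event are negatively correlated,
# i.e. P(A and B) <= P(A) * P(B)
```

Here P(A) = P(B) = 1/2 while P(A ∩ B) = 1/8 ≤ 1/4, as the inequality predicts.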
Proof. We note that the process (S_t)_{t≥0} is a deterministic function of the exponential random variables (T_i)_{i≥2} (recall (1.15), (1.17) and (1.18)). Now, the event {0 ∈ S_{[0,ε]}} is increasing in terms of the random variables (T_i)_{i≥2} (use that S_t only has positive jumps), while the event {S_u > 0} is decreasing (for a definition, interchange the roles of t_i and t_i' in the definition of an increasing event), so that the FKG inequality implies that these events are negatively correlated. We conclude the proof by noting that the resulting product bound is of the required order. □

The key to our proof of Theorem 1.4 will be to show that P(H_1(0) > u) = Θ(P(S_u > 0)), so that Lemma 2.12 and the known asymptotics of P(S_u > 0) imply that it is unlikely to have an early hit of zero.
No middle ground. By (2.4) (recall that φ(u) = φ(u; θ) with θ = θ_u*), Lemma 2.12 and Theorem 2.1, the contribution of paths with an early hit of zero is negligible. For M > 0 arbitrarily fixed, we split the remaining expectation into two parts. As a result, we arrive at (2.39). We continue to prove that the dominant contribution to the expectation on the right-hand side of (2.39) originates from paths that remain positive until time u - t, for t = Tu^{-(τ-2)} with T > 0 arbitrarily fixed.

Proposition 2.13 (No middle ground). For every ε, M > 0 fixed and u sufficiently large, the middle part of the trajectory gives a negligible contribution.

We prove Proposition 2.13 in Sect. 3. By (2.39) and Proposition 2.13, we arrive at (2.41). Since ε, M and T are arbitrary, it now suffices to identify the asymptotics of the expectation appearing on the right-hand side of (2.41).

Remaining Positive Near the End
To prove Theorem 1.4, by Proposition 2.3 and Eq. (2.41), it suffices to identify the asymptotics of the expectation in (2.41) with t = Tu^{-(τ-2)}, where T > 0 is fixed. In the above expectation, we see two terms. The term e^{-θuS_u} forces S_u to be small, more precisely of order 1/u, while the indicator 1{S_{[u-Tu^{-(τ-2)},u]} > 0} forces the path to remain positive until time u. We now study these two effects. We start by highlighting the ideas behind the analysis of the process on [u - Tu^{-(τ-2)}, u]. Comparing Theorem 1.4 to Theorem 2.1, we see that they are identical, except for the precise constant, which is A in Theorem 1.4 and D > A in Theorem 2.1. This difference is due to the fact that, conditionally on S_u > 0, the process has a probability of not hitting zero in the interval [u - Tu^{-(τ-2)}, u] that is strictly positive and bounded away from zero. In order to analyse this probability, we identify the scaling limit of the process (uS_{u-tu^{-(τ-2)}} - uS_u)_{t≥0} as u → ∞, conditionally on uS_u = v, and relate it to a certain Lévy process. The ratio A/D is closely related to the probability that this limiting process is bounded below by -v, integrated over v. Let us now give the details.
In order to investigate the probability that S_{[u-Tu^{-(τ-2)},u]} > 0, we proceed as follows. Let J(u) denote the set of indices j for which T_j ≤ u. We condition on the set J(u). Note that S_u is measurable with respect to J(u). We now rewrite S_{u-t} in a convenient form; for this, recall (1.21) and write S_{u-t} as a sum over the indices in J(u). We aim to use dominated convergence on the resulting integral, and we start by proving pointwise convergence. By Proposition 2.8, the density of S_u close to zero is under control. This leads us to study, for all v > 0, the conditional law given uS_u = v. The main result for the near-end regime is the following proposition, which proves that g_u(v) converges pointwise.
Proposition 2.14 (Weak conditional convergence of time-reversed process). (a) As u → ∞, conditionally on uS_u = v, the time-reversed and rescaled process converges in distribution as in (2.53), where (-L_t)_{t≥0} is a Lévy process with no positive jumps, with Laplace transform given in (2.54) and characteristic measure given in (2.55).

Proposition 2.14 is proved in Sect. 5, and determines the precise constant A from (1.25), as we now explain in more detail.
We proceed by investigating some properties of the supremum of the Lévy process from (2.53) that we need later on. Note in particular that the distribution of L_s in (2.54) does not depend on v. With a slight abuse of notation, the probability law describing the limiting process (L_s)_{s≥0} shall also be denoted by P.

Completion of the Proofs
A similar problem was encountered in [1, Proof of Theorem 1.1], which is restated here as Theorem 2.1, apart from the fact that there the function g_{u,t}(v) was absent.
We wish to use bounded convergence; for this, we note that the integrand is uniformly bounded. Recall that D is the constant appearing in Theorem 2.1. Since D = B/θ by [1, (7.4)] and P(M ≤ v) < 1 for every v, we also immediately obtain that A ∈ (0, D). This identifies the constant A and completes the proof of Theorem 1.4. □
Path properties: Proof of Theorem 1.5. We bound the probability of a deviating path as in (2.67). By Theorems 2.1 and 1.4, the ratio of probabilities in (2.67) converges to D/A ∈ (0, ∞), while, by Theorem 2.2, the conditional probability converges to 0. This completes the proof of Theorem 1.5. □

Completion of the Proof of the Main Theorem
We finally complete the proof of the scaling of the critical clusters in the Main Theorem, using Theorem 1.4 and recalling (1.22). For this, we go back to the random graph setting, and start with some introduction. The process $(S_t)_{t \ge 0}$ in (1.21) arises when exploring a cluster in the Norros-Reittu random graph with weights $w(\lambda)$ defined in (1.6) and (1.12), as described informally in Sect. 1.2; recall also Theorem 1.3. Here $S_t$ denotes the scaling limit of $n^{-1/(\tau-1)} = n^{-\alpha}$ times the number of vertices found at time $tn^{\rho}$. The key idea is that each time a vertex, say $j \in [n]$, is being explored, the edge to the vertex $i$ with the $i$th largest weight is present with probability close to $(1 + \lambda n^{-(\tau-3)/(\tau-1)}) w_i w_j / \ell_n$, with $\ell_n$ the total weight. As it turns out (see e.g., [6, Lemma 1.3]), the vertices are found in a size-biased reordered way, meaning that the $k$th vertex found is $v_{(k)}$. Thus, the average weight of the $k$th vertex found is such that the graph is close to critical (as made more precise in [7]). By (2.68), together with (1.10) and (1.6), the probability that at the $k$th exploration we find the vertex $i$ with the $i$th largest weight is close to $w_i / \ell_n$. Further, the probability that vertex $i$ is not found in the time interval $[0, t] n^{\rho}$ is close to $e^{-t a i^{-1/(\tau-1)}} = P(I_i(t) = 0)$. It is not hard to see that these events are weakly dependent, so that the scaling limits of the times at which the high-weight vertices are found are close to independent exponential random variables with rate $a i^{-1/(\tau-1)}$. This explains the random variables arising in (1.21). The restriction to $i \ge 2$ in (1.21) arises since we explore the cluster of vertex 1. The cluster is fully explored when there are no more active vertices waiting to be explored. This corresponds to $S_t = 0$ for the first time, which occurs at time $H_1(0)$ and explains the result in Theorem 1.3. Recall the informal description in Sect. 1.2 here as well.
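The exploration limit described above can be illustrated numerically. The following sketch (our own illustration, not from the paper) simulates a truncated version of the compensated sum $S_t \approx \sum_{i=2}^{N} c_i (\mathbb{1}_{\{T_i \le t\}} - c_i t)$ with $c_i = i^{-1/(\tau-1)}$ and $T_i$ independent exponentials with rate $c_i$; the truncation level $N$ and the omission of the drift/$\lambda$ terms of (1.21) are simplifying assumptions made purely for illustration.

```python
import numpy as np

# Truncated simulation of S_t = sum_{i=2}^{N} c_i (1{T_i <= t} - c_i t),
# where c_i = i^{-1/(tau-1)} and T_i ~ Exp(rate c_i), i.e. mean 1/c_i.
# N finite and the missing drift/lambda terms are simplifying assumptions.
def simulate_S(tau=3.5, N=10_000, t_grid=None, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    if t_grid is None:
        t_grid = np.linspace(0.0, 2.0, 201)
    i = np.arange(2, N + 1)
    c = i ** (-1.0 / (tau - 1.0))
    T = rng.exponential(scale=1.0 / c)            # T_i has rate c_i
    jumps = (T[None, :] <= t_grid[:, None])       # indicators 1{T_i <= t}
    S = (c * jumps).sum(axis=1) - (c ** 2).sum() * t_grid  # compensated sum
    return t_grid, S
```

Plotting a realization shows the jump-plus-negative-drift structure of the exploration limit; the first hitting time of zero plays the role of $H_1(0)$.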
Next, we claim that when a particularly large cluster is found then, since the weight $w_1$ is the largest of all weights, the maximal cluster is whp the cluster of vertex 1. This explains why the asymptotics in the Main Theorem for the maximal cluster are identical to the asymptotics in Theorem 1.4 for the cluster of vertex 1. To make this heuristic precise, we show in this section that it is indeed unlikely for an unusually large cluster to be found that does not contain vertex 1. For this, we introduce the exploration process of the cluster of vertex $i$ for general $i \ge 1$.
(2.71) (see [7, Remark 3.9], and recall $a$, $b$, $c$ from above (1.21)). In the above formula, we slightly abuse notation to now set $I_i(0) = 1$ for the process $(S^{(i)}_t)_{t \ge 0}$, since vertex $i$ is almost surely in the cluster of vertex $i$. Since $(S^{(i)}_t)_{t \ge 0}$ describes the scaling limit of the exploration process of the cluster of vertex $i \ge 1$, while $I_j(t)$ has the interpretation of the indicator that vertex $j$ is found in the exploration before time $t$, this is a reasonable definition. Again recall the informal description of the exploration process in Sect. 1.2.
We continue by showing that it is highly unlikely that the cluster of vertex $i$ is large while vertex 1 is not in it. This provides us with the appropriate background to complete the proof of the Main Theorem.
We start with the lower bound. By construction, $\gamma_1(\lambda) \ge a \cdot H_1(0)$ (see [7, Theorems 1.1 and 2.1], and recall that $\mathcal{C}_{(i)}$ denotes the $i$th-largest connected component). Here we use the fact that there are with high probability only finitely many clusters that are larger than $\varepsilon n^{\rho}$ (as proved in [7, Theorem 1.6]). By the weak convergence of $n^{-\rho}|\mathcal{C}_{(i)}|$, it holds that $\lim_{n \to \infty} P(n^{-\rho}|\mathcal{C}_{(i)}| \ge au) = P(H_i(0) > u)$ for all $i \ge 1$, so that we arrive at (2.77). The first term is the main term, and we next prove that $\sum_{i \ge 2} P(H_i(0) > u) = o(P(H_1(0) > u))$.
For this, we rewrite, on the event $\{I_j(u) = 0 \ \forall j \in [i-1]\}$, and using that $c_1 \ge c_i$ for every $i \ge 1$, the probability in question as in (2.79). In the last equality, we use that, conditionally on $I_j(u) = 0$ for all $j \in [i] \setminus \{1\}$, the conditional probability is decreasing (recall the notions used in the proof of Lemma 2.12) in the random variables $(T_i)_{i \ge 2}$, while the event $\{S^{(1)}_{[0,u]} > 0\}$ is increasing. Thus, the FKG-inequality applies, and we can identify the limit. This completes the proof of the Main Theorem.
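For reference, the version of the FKG inequality invoked here can be sketched as follows (a standard formulation, stated under the assumption that it applies to the independent variables $(T_i)_{i \ge 2}$):

```latex
% For independent random variables (T_i)_{i >= 2}, an event A that is
% decreasing and an event B that is increasing in (T_i)_{i >= 2} satisfy
P(A \cap B) \le P(A)\,P(B),
\qquad\text{equivalently}\qquad
P(B \mid A) \le P(B).
```

This is why replacing the decreasing conditioning event by the unconditional law can only increase the probability of the increasing event $\{S^{(1)}_{[0,u]} > 0\}$.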

No Middle Ground: Proof of Proposition 2.13
In this section, we show that the probability to hit zero in the time interval $[\varepsilon, u - Tu^{-(\tau-2)}]$, where $T$ is a constant, becomes negligible as $T \to \infty$.
The strategy of proof is as follows. We start in Proposition 3.2 by investigating the value of $S_t$ at some discrete times $(t_k)_{k \ge 1}$ in $[0, u]$, and show that with high probability $S_t$ does not deviate far from its mean at these times. Next, in Proposition 3.3, we show that it is unlikely for the process $(S_t)_{t \ge 0}$ to make a substantial deviation in the interval $[t_k, t_{k+1}]$ from its value at $t_k$.
We start with a preparatory lemma that will allow us to give bounds on the asymptotic parameters appearing in the upcoming proofs. Its first estimate is (3.1); moreover, for all $|\lambda| \le \delta u$ with $\delta > 0$ sufficiently small, there exists $K > 0$ such that (3.2) holds.

Proof. We use the second moment method. With Lemma 2.9 we compute the mean, the latter by an explicit computation using that $c_i = i^{-1/(\tau-1)}$. Further, we bound the variance with Corollary 2.11. The Chebychev inequality now proves (3.1). For (3.2), we again compute the mean; for $|\lambda| \le \delta u$, and again using Corollary 2.11, we obtain the required estimate. Further, we bound the variance, and again the claim follows from the Chebychev inequality.
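In sketch form, the second moment method used in this proof combines the mean and variance computations with the Chebychev inequality (a generic display; the identification of $X$ with the relevant sum, such as $\sum_{i \ge 2} c_i^2 I_i(u)$ appearing later in this section, is our reading of the argument):

```latex
P\big(|X - \mathbb{E}[X]| \ge x\big) \le \frac{\operatorname{Var}(X)}{x^2},
\qquad
X = \sum_{i \ge 2} c_i^2\, I_i(u),
\quad
\operatorname{Var}(X) = \sum_{i \ge 2} c_i^4 \operatorname{Var}\big(I_i(u)\big),
```

where the formula for $\operatorname{Var}(X)$ uses the independence of the indicator processes from Lemma 2.9.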
We continue by showing that the probability for $S_t$ to deviate far from its mean at some discrete times in the time interval $[\varepsilon, u - Tu^{-(\tau-2)}]$ is small when $T$ is large enough:

Proposition 3.2 (Probability to deviate far from mean at discrete times). Let $\eta > 0$ and $\delta_u = u^{-(\tau-2)}$. For any $\varepsilon > 0$ and $M > 0$, the bound (3.9) holds, where we recall the definition of $o_T(1)$ from Proposition 2.13.
The proof for $t \in [u - \varepsilon, u - Tu^{-(\tau-2)}]$ is the hardest, and is split into three steps. We start by rewriting the event of interest. We define $s = u - t$ and investigate $S_{u-s}$ in what follows, so that now $s \in [Tu^{-(\tau-2)}, \varepsilon]$.
Recall the definition of $Q_u(s)$ in (2.45) and (3.21). We condition on $\mathcal{J}(u)$ from (2.43), and note that $S_u$ is measurable w.r.t. $\mathcal{J}(u)$, to obtain (3.22).
This is the starting point of our analysis. We split the event into two parts, replacing $\eta$ by $\eta/2$ in each, and conclude using the union bound that (3.24) holds. We will bound both contributions separately, and start by setting the stage: we compute the relevant conditional expectation, in which we abbreviate the conditional probability that appears as $p_{i,u}(s)$. It turns out that both contributions in (3.24) can be expressed in terms of $p_{i,u}(s)$, and we continue our analysis by studying this quantity in more detail.
Proof for $t \in [u - \varepsilon, u - Tu^{-(\tau-2)}]$: Analysis of $p_{i,u}(s)$. We next analyse the conditional probability $p_{i,u}(s)$ (recall (1.23), (2.29) and (2.43)). Using the distribution of $T_i$ formulated in Lemma 2.9, we obtain, for any $s \in [0, u]$, the identity (3.29). We start by bounding $p_{i,u}(s)$ for $s \in [0, \varepsilon]$, as in (3.30); moreover, a complementary bound holds for $u$ sufficiently large.

Proof for $t \in [u - \varepsilon, u - Tu^{-(\tau-2)}]$: Completion of the first term in (3.24). For the first term in (3.24), we use Markov's inequality and recall (3.25). The summands are conditionally independent given $\mathcal{J}(u)$, and identically 0 when $I_i(u) = 0$. By the second bound in (3.30) and Corollary 2.11, the first term is bounded. By (3.1) in Lemma 3.1, we may assume that $\sum_{i=2}^{\infty} c_i^2 I_i(u) \le K u^{\tau-3}$, since the complement has a probability that is $o(u^{-(\tau-1)/2})$. Then, in a similar way, using the first bound in (3.30), the second term is bounded as well. Since $s \ge Tu^{-(\tau-2)}$, the resulting bound can be simplified. We conclude using (3.32), also using that $P(S_u \in [0, M/u]) = O(u^{-(\tau-1)/2})$ by Proposition 2.8. This bound is true for any $s \in [Tu^{-(\tau-2)}, \varepsilon]$. Taking $s = s_k = k u^{-(\tau-2)}$ and summing over $k \ge T$ yields the claim when we take $T = T(\eta)$ sufficiently large, as required.
Proof for $t \in [u - \varepsilon, u - Tu^{-(\tau-2)}]$: Completion of the second term in (3.24). For the second term in (3.24), we compute using (3.18) and (3.26), together with (3.29). For both resulting terms, involving $X$ and $Y(s)$, we use the Chebychev inequality. For $X$, as $E[X] = 0$, this leads to (3.48). We use Lemma 2.9 to see that
$$P(i \in \mathcal{J}(u)) = \frac{1 - e^{-c_i u}}{1 - e^{-c_i u} + e^{-c_i u(1+\theta)}},$$
so that $P(i \in \mathcal{J}(u)) \le C P(T_i \le u)$, since $1 - e^{-x} + e^{-x(1+\theta)}$ is uniformly bounded from below away from 0 for all $x \ge 0$. We use this together with Corollary 2.11 to arrive at (3.50), as required.
For the term involving $Y(s)$, we start by using the union bound, and then the Chebychev inequality with $E[Y(s_k)] = 0$, where we use Corollary 2.11 in the last step. Substituting this into (3.52) and (3.53), we arrive at the required bound.

We now know that with high probability the process does not deviate much from its mean when observed at the discrete times $k\delta_u \in [\varepsilon, u - T\delta_u]$. We continue by showing that this actually holds with high probability on the whole interval $[\varepsilon, u - T\delta_u]$. We complete the preparations for the proof of Proposition 2.13 by proving that it is unlikely for the process to deviate far from the mean for all times $t \in [\varepsilon, u - T\delta_u]$ simultaneously. Here we take $\lambda = \delta u$ with $\delta > 0$ sufficiently small and $K \ge 1$ as in Lemma 3.1. We first give a bound on $P(E^c_u \cap \{S_u \in [0, M/u]\})$. We apply (3.1) in Lemma 3.1 to obtain (3.58), which is contained in the error term in (3.56). Further, (3.2) in Lemma 3.1 gives a further bound. Combined with Proposition 3.2, this ensures the required control. As a result, we are left to control the fluctuations of the process on any interval $I_k = [k\delta_u, (k+1)\delta_u]$, for which we use Boole's inequality. We split the analysis into four cases, depending on whether $t_k \le u/2$ or not, and on whether the deviation is upward or downward, which we refer to as 'large upper' and 'large lower' deviations, respectively. In all four cases, we take advantage of the following observations concerning the law of our indicator processes under $P$.
By (1.21), we can decompose the increments of the process: for fixed $k$ (respectively $t_k$) and $t \ge 0$, we introduce the corresponding increment processes, leading to (3.65). Let $(T^k_i)_{i \ge 2}$ be a sequence of independent exponential random variables with mean $1/c_i$ (under $P$) that are independent of $\mathcal{F}_{t_k}$, the $\sigma$-algebra generated by the process up to time $t_k$, and let $(B^k_i(t; t_k, u))_{0 \le t \le \delta_u}$ be a sequence of processes that are independent in $i$ (and also independent of all randomness so far), non-decreasing, taking values in $\{0, 1\}$, with a prescribed success probability at time $t \in [0, \delta_u]$. We can therefore assume without loss of generality that, under $P$, these processes realize the conditional law of the increments.

For lower deviations (see Parts 2 and 4 below), we will use (3.69) as a lower bound in (3.62). For upper deviations (see Parts 1 and 3 below), we require an upper bound instead. In a first step, we replace $B^k_i(t)$ by $\mathbb{1}_{\{T^k_i \le t\}}$ and show that the resulting error is sufficiently small in case $t_k \le u/2$ (see Part 1). Indeed, we consider the error term $B^+_{t_k,t}$ defined in (3.70) and its truncated version for $t \in I_k = [0, \delta_u]$ defined in (3.71). Then, we obtain the required comparison by Corollary 2.5(a), (b) and Lemma 2.4(d). Further, in Part 5 below, we will prove the bound (3.73).

Part 1: The case $t_k \le u/2$ and a large upper deviation. We start by bounding the probability that there exists a $t \in I_k$ with a large upper deviation. Using that $E[S_{t_k + t}] = E[S_{t_k}](1 + o(1))$ for any $t \in I_k$ by Corollary 2.5(c), we arrive at the bound (3.74), which can be stochastically dominated in terms of Poisson processes with rates $c_i$. As a result, with (3.73), we may work with the process $(R_t)_{t \ge 0}$. Since $(R_t)_{t \ge 0}$ is a finite-variance Lévy process, it is well-concentrated. In more detail, for $\lambda \in \mathbb{R}$, we define the exponential martingale in (3.77). Then, for every $\lambda \ge 0$, using that $\phi(\lambda) \ge 0$ and by Doob's inequality, we obtain an exponential bound. We apply this inequality to $x = \frac{\eta}{2} E[S_{t_k}]$, $t = \delta_u$ and $\lambda = 1$, and Corollary 2.5(a) implies that $E[S_{t_k}] \ge c t_k u^{\tau-3} = ck/u$ for $t_k = k\delta_u \in [\varepsilon, u/2]$. The resulting bound is small even when summed over $k$ as above.
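The exponential-martingale step used in Part 1 can be sketched as follows (a standard Chernoff-Doob bound; here $\phi$ denotes the Laplace exponent of $(R_t)_{t \ge 0}$, so that $M^{\lambda}_t = e^{\lambda R_t - t\phi(\lambda)}$ is a martingale):

```latex
P\Big(\sup_{t \le \delta_u} R_t \ge x\Big)
\le P\Big(\sup_{t \le \delta_u} M^{\lambda}_t \ge e^{\lambda x - \delta_u \phi(\lambda)}\Big)
\le e^{-\lambda x + \delta_u \phi(\lambda)},
\qquad \lambda \ge 0,
```

where the first inequality uses $\phi(\lambda) \ge 0$ (so that $\lambda R_t - t\phi(\lambda) \ge \lambda x - \delta_u\phi(\lambda)$ on the event $\{R_t \ge x\}$ with $t \le \delta_u$), and the second is Doob's maximal inequality applied to the non-negative martingale $M^{\lambda}_t$, whose expectation equals 1.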
Part 2: The case $t_k \le u/2$ and a large lower deviation. We continue with bounding the probability that there exists a $t \le \delta_u$ and $\varepsilon \le t_k \le u/2$ such that $S_{t_k + t} - E[S_{t_k + t}] \le -10\eta E[S_{t_k + t}]$, which is slightly more involved. Here we can use that $B^+_{t_k,t} \ge 0$. Again using that $E[S_{t_k + t}] = E[S_{t_k}](1 + o(1))$ for any $t \le \delta_u$ by Corollary 2.5(c), we arrive at the bound (3.80). Further, using (3.62) and (3.69), we obtain a decomposition in which, conditionally on $(I_i(t_k))_{i \ge 2}$, the process $(R_t)_{t \ge 0}$ is a Lévy process similar to the Lévy process investigated in Part 1 above, and $D_t$ is the contribution due to those $i$ for which $N_i(t) \ge 2$.
We deal with the two terms one by one (recalling that we have already dealt with $B^-_{t_k,t}$ above (3.73)), starting with $(R_t)_{t \ge 0}$. As in the previous part, we show that the corresponding probability is small enough even when summed over $k$ such that $t_k \in [\varepsilon, u/2]$. This again follows by Doob's inequality and an exponential bound valid for any $\lambda \ge 0$, with $\mathcal{F}_{t_k}$ the $\sigma$-algebra generated by the process up to time $t_k$. We then follow the same steps as in Part 1, using that $0 \le \phi(-2) \le \text{const}$. We continue by bounding $D_t$: since the process $t \mapsto D_t$ is non-decreasing, it suffices to bound $D_{\delta_u}$, for which we use the Markov inequality. Summing this over $k$ such that $t_k = k\delta_u \in [\varepsilon, u/2]$ yields the required bound. Collecting terms completes Part 2.

Part 3: The case t k ≥ u/2 and a large upper deviation
This proof is more subtle. We fix $k$ such that $t_k \in [u/2, u - T\delta_u]$ and condition on $\mathcal{F}_{t_k}$, which is the $\sigma$-field generated by $(S_t)_{t \le t_k}$ (recall (3.57)), using that $E[S_{t_k + t}] = E[S_{t_k}](1 + o(1))$ for any $t \le \delta_u$ by Corollary 2.5(d). Using (3.72) similarly to (3.81), we arrive at a bound in terms of the process $R_t$ from Part 2, (3.81). Conditionally on $\mathcal{F}_{t_k}$, the process $(R_t)_{t \ge 0}$ is a Lévy process, and we use the exponential bound, recalling Eqs. (3.86) and (3.87). Since $e^{\lambda c_i} - 1 - \lambda c_i \ge 0$ for every $\lambda \in \mathbb{R}$, and by (3.92), we have that $\phi(\lambda) \le K\lambda^2 u^{\tau-4}$, so that we can further bound, choosing $\lambda = \delta u$ and $t = \delta_u = u^{-(\tau-2)}$, as in (3.97). We take $x = \frac{\eta}{2} E[S_{t_k}]$ and apply Corollary 2.5(b) and Lemma 2.4(d), leading to (3.98). Summing over $k$ with $t_k = k\delta_u \in [u/2, u - T\delta_u]$ and $\delta_u = u^{-(\tau-2)}$, using Proposition 2.8 and $E[S_{t_k}] \le c(u - t_k)u^{\tau-3}$ by Corollary 2.5(b) and Lemma 2.4(d), yields the required upper bound (recall also (3.92) and the definition of $E_u$ from (3.57)).
Part 4: The case $t_k \ge u/2$ and a large lower deviation. We again start from (3.81), and note that $B^+_{t_k,t} \ge 0$, and that the bound on $D_t$ proved in Part 2 and that on $B^-_{t_k,t}$ proved around (3.73) still apply, now using Corollary 2.5(b) and Lemma 2.4(d) together with (3.89). The exponential martingale bound for $R_t$ performed in Part 3 can easily be adapted to deal with a large lower deviation as well. We omit further details.

Part 5: The error term
Recall the definition of $B^+_{t_k,t}$ in (3.71), and the bound that we need to prove in (3.73). We write $B^+_{t_k,t}$ as a sum of two contributions. For the first, we use the first moment method, together with (3.65) and Corollary 2.11, to obtain the required estimate; as a result, using Markov's inequality, this contribution is as required.
We continue with $B^{+,1}_{t_k,t}$, which we bound using again the fact that $t \mapsto B^k_i(t)$ is non-decreasing. Thus, we can write, using (3.72) and that $E[S_{t_{k+1}}] = E[S_{t_k}](1 + o(1))$ by Corollary 2.5(c), a bound of the desired form. By the analysis in Parts 1-4, as well as (3.101), we know that the required bounds hold (with a possibly different value of $\eta$ for the last term). Indeed, for the bound on $B^-_{t_k,\delta_u}$, see the argument below (3.72); the last term is bounded in terms of Lévy processes in each of the different parts. This completes the proof of Proposition 2.13.

Conditional Expectations Given uS u = v
A major difficulty in the proof of Proposition 2.14 is the fact that, while the summands in the definition of $Q_u(t)$ in (2.45) are independent, this property is lost once we condition on $S_u$. The following lemma, stated in (4.1), allows us to deal with such conditional expectations; here i denotes the imaginary unit.
For $G((S_s)_{s \ge 0}) = 1$, (4.1) is just the usual Fourier inversion theorem applied to the (continuous) random variable $S_u$. The expectation $E[G((S_s)_{s \ge 0}) e^{ikS_u}]$ factorizes when $G((S_s)_{s \ge 0})$ is of product form in the underlying random variables $(I_i(s))_{s \ge 0}$. In our applications, $E[G((S_s)_{s \ge 0}) \mid S_u = w]$ will be close to constant in $w$. Then, in order to compute its asymptotics, it suffices to check that the computation in the proof of Proposition 2.8 is hardly affected by the presence of $G((S_s)_{s \ge 0})$.
Proof. Define the measure $P_G$ by (4.2). Under the measure $P_G$, the random variable $S_u$ is again continuous, since $0 < E[G((S_s)_{s \ge 0})] < \infty$. Let $f^G_{S_u}$ denote the density of $S_u$ under the measure $P_G$. Then we obtain, by the Fourier inversion theorem applied to $P_G$, the identity (4.3). Substituting (4.2) into both sides of (4.3) and multiplying through by $E[G((S_s)_{s \ge 0})]$ proves the claim.
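The Fourier-inversion identity behind Lemma 4.1, namely $E[G \mid S = w]\, f_S(w) = \frac{1}{2\pi}\int e^{-ikw}\, E[G\, e^{ikS}]\, \mathrm{d}k$, can be checked numerically on a toy example (our own choice, not the paper's setting): take $S \sim N(0,1)$ and $G = S^2$, for which $E[S^2 e^{ikS}] = (1 - k^2)e^{-k^2/2}$ in closed form, so the inversion should recover $w^2 f_S(w)$.

```python
import numpy as np

# Numerically invert E[S^2 e^{ikS}] = (1 - k^2) e^{-k^2/2} (S standard normal)
# and compare with E[S^2 | S = w] f_S(w) = w^2 * f_S(w).
def conditional_times_density(w, kmax=12.0, n=20001):
    k = np.linspace(-kmax, kmax, n)
    dk = k[1] - k[0]
    integrand = np.exp(-1j * k * w) * (1.0 - k**2) * np.exp(-(k**2) / 2.0)
    return float((integrand.sum() * dk).real) / (2.0 * np.pi)

w = 1.0
lhs = conditional_times_density(w)                          # inversion formula
rhs = w**2 * np.exp(-(w**2) / 2.0) / np.sqrt(2.0 * np.pi)   # w^2 * f_S(w)
```

The quadrature converges very fast here because the integrand is smooth and decays like a Gaussian, so a plain Riemann sum over $[-12, 12]$ already matches to high precision.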
Let $P_v$ denote $P$ conditionally on $uS_u = v$, so that Lemma 4.1 applies. In many cases, it proves convenient to rewrite the resulting expressions using the fact that the random variables $(T_i)_{i \in \mathcal{J}(u)}$ are, conditionally on $\mathcal{J}(u)$, independent. In the following lemma, stated in (4.9), we investigate the effect on $P(i \in \mathcal{J}(u))$ of conditioning on $S_u = w$.

Proof. By Lemma 4.1 (for the second term, use $G \equiv 1$), we obtain (4.10). Recall Lemma 2.9: under the measure $P$, the indicator processes $(I_j(t))_{t \ge 0}$ are independent. Define $S^{(j)}_u = S_u - c_j(I_j(u) - c_j u)$. By (1.21) and (2.43), the random variables $I_j(u)$ and $S^{(j)}_u$ are independent under $P$. This yields (4.11). Next, we claim that there exist constants $C_1, C_2$ such that (4.12) holds for all $j \ge 2$. Indeed, for $S^{(j)}_u$ replaced by $S_u$, the result was derived in the proof of Proposition 2.8 in [1]. To prove the same for $S^{(j)}_u$ with $j \ge 2$ arbitrary, we follow the approach in [1] and obtain, for $\frac{k}{2\pi} u^{-(\tau-3)/2-1} \le 1/8$, the required bound. Substituting (4.12) into (4.11), and combining the result with (4.16), yields the claim.
Proof. The bound by 1 is obvious. The bound by $Cc_i u$ follows once we recall (2.30) and observe that, for $c_j \le 1/u$, $P(T_j \le u) = P(j \in \mathcal{J}(u)) \le C(\tau) c_j u$. Now use Lemma 4.2(i).

The Near-End Ground: Proof of Proposition 2.14
In this section, we prove Proposition 2.14. The proof is divided into several key parts. In Sect. 5.1, we show convergence of the mean process A u in Proposition 2.14(a). In Sect. 5.2, we prove the convergence of B u in Proposition 2.14(b).

Convergence of the Mean Process A u
Recall the definition of $A_u$ from (2.50). By (4.8), we obtain a representation with an error term $E_u(t)$ that is bounded uniformly in $t \le T$. Since $\sum_{j \ge 2} c_j^3 < \infty$ and $u^{-2(\tau-2)+1} = u^{5-2\tau} = o(1)$, the first term vanishes. Further, by Corollary 4.3 with $w = v/u$, the second term is $o_{P_v}(1)$ as well (5.4).
In the above proof, we see that it is useful to split a sum over $j \in \mathcal{J}(u)$ into the indices $j \in \mathcal{J}(u)$ with $c_j > 1/u$ and those with $c_j \le 1/u$. Then we use upper bounds similar to the ones in Corollary 4.3 to bound the resulting sums. We will follow this strategy often below.
We further rewrite (5.2) into a more convenient form. Note that $0 \le q_j(u) \le 1$ for $u$ large. Below, we will frequently rely on the bounds (5.6) and, using (2.30) for $t = u$, (5.7). By (5.3), to prove the claim of Proposition 2.14(a), it is enough to show the convergence of $\kappa_u$. For this, we compute the Laplace transform of $\kappa_u$ under the measure $P_v$ using Lemma 4.1 and a change of variables; for $a \ge 0$, this yields (5.9). By Proposition 2.8, for each $v > 0$, $u^{(\tau-3)/2} f_{S_u}(v/u) \to B$. We aim to use dominated convergence on the integral appearing in (5.9), for which we have to prove (a) pointwise convergence for each $k \in \mathbb{R}$; and (b) a uniform bound that is integrable. We start by proving pointwise convergence. To compute $E[e^{-a\kappa_u} e^{iku^{-(\tau-3)/2} S_u}]$, recall the definition of $S_u$ from (1.21) and that the indicator processes $I_j(t) = \mathbb{1}_{\{T_j \le t\}}$ are independent under the measure $P$ (cf. Lemma 2.9). The remainder of the proof proceeds in three steps.
Step 1: Asymptotic factorization. We start by proving the factorization (5.12). To this end, we first use the expansion (5.13), where we abbreviate $q \equiv -a q_j(u) \le 0$. To bound the error term for each $j$, we write $e^{iku^{-(\tau-3)/2} c_j} = 1 + (e^{iku^{-(\tau-3)/2} c_j} - 1)$ and use the triangle inequality on each summand. We can bound $|e^q - e^{q P(T_j \le u)}| \le |q| e^{(q \vee 0)}$, which gives a bound $|q| e^{(q \vee 0)} |k| u^{-(\tau-3)/2} c_j P(T_j \le u)$ on the last line of (5.16).
To bound the first line of (5.16), we apply the error bound $|e^{-x} - 1 + x| \le |x|^2$, valid for all $x \ge 0$, to all the exponential functions in it, to obtain
$$\big|1 - P(T_j \le u) + e^q P(T_j \le u) - e^{q P(T_j \le u)}\big| \le C q^2 P(T_j \le u). \qquad (5.18)$$
Together, this leads us to (5.19). To prove (5.12), by (5.14) and (5.19) it is enough to show that the sum over $j \ge 2$ of these error terms is $o(1)$. Consider the sum over $j$ with $c_j > 1/u$ first. By (5.6), this sum is $o(1)$, where we use that $\sum_j c_j^3 < \infty$ and $\tau > 3$ in the last step. For $c_j \le 1/u$, by (5.6), we similarly get $o(1)$ (5.21). This completes the proof that the error terms sum to $o(1)$, and thus of the claim in (5.12).
Step 2: Identification of the limit. Here we use that the integrand in the last line of (5.22) is continuous and integrable over $(0, \infty)$. Substituting $-x^{-\alpha} = z$ gives the representation (2.52) for $\kappa$.
Step 3: Completion of the proof. By Proposition 2.7, the remaining factor converges as well. Therefore, Steps 1-2 and (5.23) complete the proof of pointwise convergence in Lemma 5.1.
To show that the dominated convergence theorem can be applied, it remains to show that the integrand in (5.9) has an integrable dominating function.

Proof. By the definition of $S_u$ from (1.21) and the independence in Lemma 2.9, the expectation factorizes. We can rewrite each factor using that $q_j(u) \ge 0$, and then use $\log(1 + x) \le x$ for $x \ge -1$. This yields the desired bound with an overall error term that is negligible (using that $\sup_j q_j(u)$ is arbitrarily small for $u$ large enough), where we rely on the bounds (5.6) and (5.7).

Completion of the proof of Proposition 2.14(a). By the dominated convergence theorem, Lemmas 5.1 and 5.2 complete the proof of Proposition 2.14(a).

Convergence of the Process B u
In this section, we investigate the convergence of the $B_u$ process and prove Proposition 2.14(b). Since the limit is a random process, this part is more involved than the previous section. We first note that the summand processes are, conditionally on $\mathcal{J}(u)$, independent. Thus, $(B_u(tu^{-(\tau-2)}))_{t \ge 0}$ is, conditionally on $\mathcal{J}(u)$, a sum of (conditionally) independent processes having zero mean. We make crucial use of this observation, as well as the technique in Lemma 4.1, to compute expectations of various functionals of the process $(B_u(tu^{-(\tau-2)}))_{t \ge 0}$. In order to prove the stated convergence in distribution, we follow the usual path of first proving weak convergence of the one-dimensional marginals, followed by the weak convergence of all finite-dimensional distributions, and complete the proof by showing tightness. We now discuss each of these steps in more detail.

Convergence of the One-Dimensional Marginal of B u
We start by computing the one-dimensional marginal of $B_u(tu^{-(\tau-2)})$ (recall (5.35)) and show that it is consistent with the claimed Lévy process limit. We achieve this by computing the Laplace transform (5.36) and proving that it converges to the Laplace transform of the claimed Lévy process limit at time $t$. The main result in this section is the following proposition:

Proposition 5.3 (One-time marginal of $B_u(tu^{-(\tau-2)})$). There exists a measure such that, for every $v, a > 0$ fixed and as $u \to \infty$, the Laplace transform (5.36) converges to that of a Lévy process $(L_s)_{s \ge 0}$ with non-negative jumps and this characteristic measure. Therefore, the one-dimensional marginals of the process $(B_u(su^{-(\tau-2)}))_{s \ge 0}$ converge to those of $(L_s)_{s \ge 0}$.
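For orientation, for a centered Lévy process with only non-negative jumps and jump (characteristic) measure $\Pi$, the Laplace transform takes the standard Lévy-Khintchine form (a generic display; the precise measure and centering are those appearing in Proposition 5.3, which we do not restate here):

```latex
\mathbb{E}\big[e^{-a L_s}\big]
= \exp\Big( s \int_0^{\infty} \big(e^{-a z} - 1 + a z\big)\, \Pi(\mathrm{d}z) \Big),
\qquad a \ge 0,
```

which is well-defined whenever $\int_0^{\infty} (1 \wedge z^2)\,\Pi(\mathrm{d}z) < \infty$ together with a first-moment condition on the jumps of size larger than 1; both conditions hold here by the exponential decay of the measure at infinity.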
The remainder of this section is devoted to the proof of Proposition 5.3. As for $A_u$, we use Lemma 4.1 and a change of variables to rewrite the Laplace transform, using (4.8). We again wish to use dominated convergence on the integral in (5.39). We proceed along the lines of the proof of the convergence of the mean process $A_u$: basically, in the proof below, we replace $-a\kappa_u$ in (5.9) (recall the definition of $\kappa_u$ and $q_j(u)$ from (5.8) and (5.5)) by $\sum_{j \in \mathcal{J}(u)} r^u_{j,t}$, where $r^u_{j,t}$ is defined in (5.42). In what follows, we frequently make use of the bounds (5.43) and (5.44).

Proof. The first factor on the left-hand side of (5.45) converges to 1. We identify the limit of the expectation in the following steps, which mimic the pointwise convergence proof in Lemma 5.1. It will be convenient to split the asymptotic factorization in Step 1 of that proof into two parts, denoted by Steps 1(a) and 1(b). We start by showing that we can simplify $\psi_J(a)$:

Step 1(a): Simplification of $\psi_J(a)$. As a first step towards the identification of the pointwise limit, we show that we can simplify the expectation in (5.45) as in (5.46). Using the first line of (5.13) and applying the error bound $|e^x - (1 + x)| \le |x|^2$ for $|x| \le 1$ to the differences $|a_j - b_j|$, the error of the approximation can be bounded as in (5.48). Next, we use that $1 - x \le e^{-x}$ for $x \ge 0$ to obtain a further bound. For $t \le T$ with $T > 0$ fixed, we further have by (5.43) that $e^{ac_j u p^u_{j,t}} \le C(a, T)$. Together with $e^{-x} - 1 + x \ge 0$ for $x \ge 0$, we obtain (5.51). This yields an upper bound for (5.46) (recall (5.48)), which is equivalent to bounding (5.57) appropriately; here we use that $\log(1 + x) \le x$ for $x \ge 0$. Next, we bound $P(T_i \le u) \le C c_i u$, so that, for $c_i \le 1/u$, we have $C(a)(c_i u)^2 t u^{-(\tau-1)} \le C(a, T) u^{-(\tau-1)} \le \log(2)$ for $u$ large enough. Hence we can use that $e^x - 1 \le 2x$ for $0 \le x \le \log(2)$, and thus get a further upper bound to (5.57) of order $C(a, T)$ times a vanishing sum; the last inequality follows from (5.31). This completes the proof of (5.46).
Step 1(b): Asymptotic factorization. We next show the factorization (5.59). To prove (5.59), we note that, by the definition of $r^u_{j,t}$ in (5.42), the expectation takes a product form. As in the calculation of the Laplace transform of $A_u$ in (5.14), we now apply (5.13). Note that here we cannot apply the second bound of (5.13), as $\sup_j(|a_j| \vee |b_j|)$ is not bounded by 1 (recall that $r^u_{j,t} \ge 0$). Instead, we get (5.61). We proceed to prove that the first and the third product are bounded by constants. Indeed, we can bound the third product using (5.7) by (5.63); as $r^u_{j,t}$ is uniformly bounded for $u$ large enough, the first product is again bounded by (5.63). Hence, it suffices to bound the middle part of (5.61); that is, it remains to show (5.65) for $u = u(k)$ large enough. The bound on $r^u_{j,t}$ in (5.44) is equal to $C(a, T)$ times the bound on $q_j(u)$ in (5.6). The remaining calculations for $A_u$ in (5.19)-(5.21) therefore carry over directly, so that (5.65) follows.
Step 2: The limit of $E[\sum_{j \in \mathcal{J}(u)} r^u_{j,t}]$. In this step, we identify the limit of $E[\sum_{j \in \mathcal{J}(u)} r^u_{j,t}]$. For this, we use the definition of $r^u_{j,t}$ in (5.42), that of $p^u_{j,t}$ in (5.41), and (2.30) with $t = u$. To conclude, we have to check that the limiting object is a measure on $(-\infty, 0)$ that integrates $1 \wedge z^2$. Indeed, close to 0, $z^2$ times the measure behaves like $(\tau - 1)|z|^{-(\tau-3)}\,\mathrm{d}z$, which is integrable at 0, while for $|z| \to \infty$, the measure behaves like $e^{-|z|}(\tau - 1)|z|^{-(\tau-1)}\,\mathrm{d}z$, whose integral (in fact, each of whose moments) is finite.
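The two integrability checks can be made explicit (a short verification under the stated asymptotics of the measure, writing $z > 0$ for the jump magnitude):

```latex
\int_0^1 z^2\,(\tau - 1)\, z^{-(\tau - 1)}\,\mathrm{d}z
= (\tau - 1)\int_0^1 z^{3 - \tau}\,\mathrm{d}z
= \frac{\tau - 1}{4 - \tau} < \infty
\qquad (3 < \tau < 4),
```

while at infinity $\int_1^{\infty} e^{-z}(\tau - 1) z^{-(\tau - 1)}\,\mathrm{d}z < \infty$ due to the exponential factor.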

Convergence of the Finite-Dimensional Distributions of B u
In this section, the convergence of the one-dimensional marginals of the process $(B_u(tu^{-(\tau-2)}))_{t \ge 0}$ is extended to convergence of its finite-dimensional distributions. In the same way as above, it can be shown that, for $0 < t_1 < \cdots < t_n$, the increments $\big(B_u(t_i u^{-(\tau-2)}) - B_u(t_{i-1} u^{-(\tau-2)})\big)_{i=1}^n$ (where, by convention, $t_0 = 0$) converge in distribution, under $P_v$, to independent Lévy random variables with the correct distribution.
In what follows, we only outline some minor changes in the proof. Instead of (5.40), we fix $n \in \mathbb{N}$, $a \in (\mathbb{R}_+)^n$ and $0 = t_0 < t_1 < \cdots < t_n \le T$, and consider the joint Laplace transform of the increments, where we use that, by definition, Lévy processes have independent and stationary increments. This completes the proof of convergence of the finite-dimensional distributions of $(B_u(tu^{-(\tau-2)}))_{t \ge 0}$.

Tightness of B u
We next turn to tightness of the process $(B_u(tu^{-(\tau-2)}))_{t \ge 0}$. For this, we use the following tightness criterion:

Proposition 5.6 (Tightness criterion [8, Theorem 15.6 and the comment following it]). The sequence $\{X_n\}$ is tight in $D([0, T], \mathbb{R}^d)$ if the limiting process $X$ has a.s. no discontinuity at $t = T$ and there exist constants $C > 0$, $r > 0$ and $a > 1$ such that
$$E\big[|X_n(t_2) - X_n(t_1)|^r \, |X_n(t_3) - X_n(t_2)|^r\big] \le C (t_3 - t_1)^a$$
for $0 \le t_1 < t_2 < t_3 \le T$ and for all $n$.

We show tightness of $V^{(u)}(t)$ given $uS_u = v$. In what follows, we therefore bound the corresponding conditional moment. By the conditional independence of the processes given $\mathcal{J}(u)$ (recall the comment preceding (4.8)), and as we subtract their respective expectations, we obtain (5.91). For the first sum, note that $(c_i u)^4 u^{-2(\tau-1)} = c_i^4 u^{-2(\tau-3)}$, so that its sum is $o(1)$, as $\sum_i c_i^3 < \infty$ and $\tau > 3$. For the second sum in (5.91), the sum over $i$ with $c_i > 1/u$ is clearly bounded, since it is bounded by a Riemann approximation to a finite integral, which converges to a constant as $u \to \infty$. For the contribution due to $c_i \le 1/u$, we bound its expectation directly, as required.

Completion of the Proof of Proposition 2.14(b)
The convergence of the finite-dimensional distributions together with tightness yields the convergence in distribution of $(B_u(tu^{-(\tau-2)}))_{t \ge 0}$, which completes the proof of Proposition 2.14(b).