Long paths in first passage percolation on the complete graph II. Global branching dynamics

We study the random geometry of first passage percolation on the complete graph equipped with independent and identically distributed edge weights, continuing the program initiated by Bhamidi and van der Hofstad [6]. We describe our results in terms of a sequence of parameters $(s_n)_{n\geq 1}$ that quantifies the extreme-value behavior of small weights, and that describes different universality classes for first passage percolation on the complete graph. We consider both $n$-independent as well as $n$-dependent edge weights. The simplest example consists of edge weights of the form $E^{s_n}$, where $E$ is an exponential random variable with mean 1. In this paper, we focus on the case where $s_n\rightarrow \infty$ with $s_n=o(n^{1/3})$. Under mild regularity conditions, we identify the scaling limit of the weight of the smallest-weight path between two uniform vertices, and we prove that the number of edges in this path obeys a central limit theorem with asymptotic mean $s_n\log{(n/s_n^3)}$ and variance $s_n^2\log{(n/s_n^3)}$. This settles a conjecture of Bhamidi and van der Hofstad [6]. The proof relies on a decomposition of the smallest-weight tree into an initial part following invasion percolation dynamics, and a main part following branching process dynamics. The initial part has been studied in [14]; the current article focuses on the global branching dynamics.


Model and Summary of Results
In this paper, we study first passage percolation on the complete graph equipped with independent and identically distributed positive and continuous edge weights. In contrast to earlier work [11,12,16,20,27], we consider the case where the extreme values of the edge weights are highly separated.
We start by introducing first passage percolation (FPP). Given a graph $G = (V(G), E(G))$, let $(Y_e^{(G)})_{e \in E(G)}$ denote a collection of positive edge weights. Thinking of $Y_e^{(G)}$ as the cost of crossing an edge $e$, we can define a metric on $V(G)$ by setting
$$d_{G, Y^{(G)}}(i, j) = \inf_{\pi \colon i \to j} \sum_{e \in \pi} Y_e^{(G)}, \qquad (1.1)$$
where the infimum is over all paths $\pi$ in $G$ that join $i$ to $j$, and $Y^{(G)}$ represents the edge weights $(Y_e^{(G)})_{e \in E(G)}$. We will always assume that the infimum in (1.1) is attained uniquely, by some (finite) path $\pi_{i,j}$. We are interested in the situation where the edge weights $Y_e^{(G)}$ are random, so that $d_{G, Y^{(G)}}$ is a random metric. In particular, when the graph $G$ is very large, with $|V(G)| = n$ say, we wish to understand the scaling behavior of the following quantities for fixed $i, j \in V(G)$:
(a) The distance $W_n = d_{G, Y^{(G)}}(i, j)$, the total edge cost of the optimal path $\pi_{i,j}$;
(b) The hopcount $H_n$, the number of edges in the optimal path $\pi_{i,j}$;
(c) The topological structure, the shape of the random neighborhood of a point.
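To make the quantities $W_n$ and $H_n$ concrete, the following is a minimal simulation sketch: Dijkstra's algorithm on $K_n$ with i.i.d. edge weights. The function name `fpp_complete_graph` and its interface are illustrative choices of ours, not anything from the paper.

```python
import heapq
import random

def fpp_complete_graph(n, sample_weight, src=0, dst=1, seed=None):
    """Dijkstra on K_n with i.i.d. edge weights drawn from sample_weight(rng).

    Returns (W_n, H_n): the weight of, and the number of edges in, the
    smallest-weight path between src and dst.
    """
    rng = random.Random(seed)
    weights = {}  # sample each edge weight lazily, once per edge

    def w(i, j):
        key = (min(i, j), max(i, j))
        if key not in weights:
            weights[key] = sample_weight(rng)
        return weights[key]

    dist = {src: 0.0}
    hops = {src: 0}
    done = set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            return dist[dst], hops[dst]
        for v in range(n):
            if v == u or v in done:
                continue
            nd = d + w(u, v)
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                hops[v] = hops[u] + 1
                heapq.heappush(heap, (nd, v))
    raise RuntimeError("complete graph should be connected")

# Edge weights E^s with E exponential of mean 1 (the paper's running example).
W, H = fpp_complete_graph(100, lambda rng: rng.expovariate(1.0) ** 3, seed=42)
```

With a fixed seed the run is reproducible, so $W_n$ and $H_n$ can be compared across weight distributions on the same graph size.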
In this paper, we consider FPP on the complete graph, which acts as a mean-field model for FPP on finite graphs. In [11], the question was raised of what the universality classes for this model are. We bring the discussion substantially further by describing a way to distinguish several universality classes and by identifying the limiting behavior of first passage percolation in one of these classes. The cost regime introduced in (1.1) uses the information from all edges along the path and is known as the weak disorder regime. By contrast, in the strong disorder regime the cost of a path $\pi$ is given by $\max_{e \in \pi} Y_e^{(G)}$. We establish a firm connection between the weak and strong disorder regimes in first passage percolation. Interestingly, this connection also establishes a strong relation to invasion percolation (IP) on the Poisson-weighted infinite tree (PWIT), which is the local limit of IP on the complete graph, and which also arises in the context of the minimal spanning tree on the complete graph (see e.g. [1]).
Our main interest is in the case $G = K_n$, the complete graph on $n$ vertices $V(K_n) = [n] := \{1, \dots, n\}$, equipped with independent and identically distributed (i.i.d.) edge weights $(Y_e^{(K_n)})_{e \in E(K_n)}$. We write $Y$ for a random variable with $Y \stackrel{d}{=} Y_e^{(K_n)}$, and assume that the distribution function $F_Y$ of $Y$ is continuous. For definiteness, we study the optimal path $\pi_{1,2}$ between vertices 1 and 2. First, we introduce some general notation:
Notation. All limits in this paper are taken as $n$ tends to infinity unless stated otherwise. A sequence of events $(A_n)_n$ happens with high probability (whp) if $P(A_n) \to 1$. For random variables $(X_n)_n$, $X$, we write $X_n \stackrel{d}{\to} X$, $X_n \stackrel{P}{\to} X$ and $X_n \stackrel{a.s.}{\to} X$ to denote convergence in distribution, in probability and almost surely, respectively. For real-valued sequences $(a_n)_n$, $(b_n)_n$, we write $a_n = O(b_n)$ if the sequence $(a_n/b_n)_n$ is bounded; $a_n = o(b_n)$ if $a_n/b_n \to 0$; $a_n = \Theta(b_n)$ if the sequences $(a_n/b_n)_n$ and $(b_n/a_n)_n$ are both bounded; and $a_n \sim b_n$ if $a_n/b_n \to 1$. Similarly, for sequences $(X_n)_n$, $(Y_n)_n$ of random variables, we write $X_n = O_P(Y_n)$ if the sequence $(X_n/Y_n)_n$ is tight; $X_n = o_P(Y_n)$ if $X_n/Y_n \stackrel{P}{\to} 0$; and $X_n = \Theta_P(Y_n)$ if the sequences $(X_n/Y_n)_n$ and $(Y_n/X_n)_n$ are both tight. We denote by $\lfloor x \rfloor$ the greatest integer not exceeding $x$. Moreover, $E$ denotes an exponentially distributed random variable with mean 1. We often need to refer to results from [21], and we will write, e.g., [Part I, Lemma 2.18] for [21, Lemma 2.18].
For a brief overview of notation particular to this paper, see p. 81.

First Passage Percolation with Regularly-Varying Edge Weights
In this paper, we will consider edge-weight distributions with a heavy tail near 0, in the sense that the distribution function $F_Y(y)$ decays slowly to 0 as $y \downarrow 0$. It will prove more convenient to express this notion in terms of the inverse $F_Y^{-1}(u)$, since we can write
$$Y \stackrel{d}{=} F_Y^{-1}(U), \qquad (1.2)$$
where $U$ is uniformly distributed on $[0, 1]$. Expressed in terms of $F_Y^{-1}$, saying that the edge-weight distribution is heavy-tailed near 0 means that $F_Y^{-1}(u)$ decays rapidly to 0 as $u \downarrow 0$. We will quantify this notion in terms of the logarithmic derivative of $F_Y^{-1}$, which will become large as $u \downarrow 0$.
In this section, we will assume that
$$u \frac{d}{du} \log F_Y^{-1}(u) = u^{-\alpha} L(1/u), \qquad (1.3)$$
where $\alpha \ge 0$ and $t \mapsto L(t)$ is slowly varying as $t \to \infty$. That is, for all $a > 0$, $\lim_{t \to \infty} L(at)/L(t) = 1$. In other words, we assume that $u \mapsto u \frac{d}{du} \log F_Y^{-1}(u)$ is regularly varying as $u \downarrow 0$. Recall that a function $\tilde{L} \colon (0, \infty) \to (0, \infty)$ is called regularly varying as $u \downarrow 0$ if $\lim_{u \downarrow 0} \tilde{L}(au)/\tilde{L}(u)$ is finite but nonzero for all $a > 0$.
Define a sequence $(s_n)_n$ by setting $u = 1/n$ in the logarithmic derivative from (1.3):
$$s_n = u \frac{d}{du} \log F_Y^{-1}(u) \Big|_{u = 1/n} = n^{\alpha} L(n). \qquad (1.4)$$
The asymptotics of the sequence $(s_n)_n$ quantify how heavy-tailed the edge-weight distribution is. For instance, an identically constant sequence, say $s_n = s$, corresponds to a pure power law $F_Y(y) = y^{1/s}$, $F_Y^{-1}(u) = u^s$; larger values of $s$ correspond to heavier-tailed distributions. In this paper, we are interested in the regime where $s_n \to \infty$, which corresponds to a very heavy-tailed distribution function $F_Y(y)$ that decays to 0 slower than any power of $y$ as $y \downarrow 0$.
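As a quick numerical sanity check of this definition (a sketch, assuming $s_n$ is given by the logarithmic derivative of $F_Y^{-1}$ evaluated at $u = 1/n$), one can differentiate numerically: for the pure power law $F_Y^{-1}(u) = u^s$ the logarithmic derivative is identically $s$, and for $Y = E^s$, i.e. $F_Y^{-1}(u) = (-\log(1-u))^s$, it tends to $s$ as $u \downarrow 0$.

```python
import math

def log_deriv(F_inv, u, h=1e-8):
    """Numerical u * d/du log F_inv(u), via central differences."""
    return u * (math.log(F_inv(u + h)) - math.log(F_inv(u - h))) / (2 * h)

s = 2.5
power_law = lambda u: u ** s                    # F_Y^{-1}(u) = u^s
exp_power = lambda u: (-math.log(1 - u)) ** s   # Y = E^s

u = 1e-4  # plays the role of 1/n
print(log_deriv(power_law, u))  # close to s = 2.5
print(log_deriv(exp_power, u))  # also close to s for small u
```

The helper `log_deriv` is ours; it simply makes the definition of $s_n$ computable for any inverse distribution function.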
To describe our scaling results, define
$$u_n(x) = F_Y^{-1}(x/n), \qquad (1.5)$$
so that
$$P\Big( \min_{i \in [n]} Y_i \le u_n(x) \Big) = 1 - (1 - x/n)^n \to 1 - e^{-x}. \qquad (1.6)$$
In view of (1.6), the family $(u_n(x))_{x \in (0,\infty)}$ are the characteristic values for $\min_{i \in [n]} Y_i$. See [22] for a detailed discussion of extreme value theory.

Theorem 1.1. Let the edge weights $(Y_e^{(K_n)})_{e \in E(K_n)}$ follow an $n$-independent distribution $F_Y$ that satisfies (1.3). If the sequence $(s_n)_n$ from (1.4) satisfies $s_n/\log\log n \to \infty$ and $s_n = o(n^{1/3})$, then there exist sequences $(\lambda_n)_n$ and $(\varphi_n)_n$ with $\varphi_n/s_n \to 1$, $\lambda_n u_n(1) \to e^{-\gamma}$, where $\gamma$ is Euler's constant, such that
$$n F_Y\Big( W_n - \frac{1}{\lambda_n} \log(n/s_n^3) \Big) \stackrel{d}{\to} M^{(1)} \vee M^{(2)}, \qquad (1.7)$$
$$\frac{H_n - \varphi_n \log(n/s_n^3)}{\sqrt{s_n^2 \log(n/s_n^3)}} \stackrel{d}{\to} Z. \qquad (1.8)$$
Here $Z$ is standard normal, and $M^{(1)}$, $M^{(2)}$ are i.i.d. random variables for which $P(M^{(j)} \le x)$ is the survival probability of a Poisson Galton-Watson branching process with mean $x$.
Let us discuss the result in Theorem 1.1 in more detail. Under the hypotheses of Theorem 1.1, $u_n(x)$ varies heavily in $x$, in the sense that $u_n(x + \delta)/u_n(x) \to \infty$ for every $x, \delta > 0$. Consequently, the extreme values are widely separated, which is characteristic of the strong disorder regime.
We see in (1.7) that $W_n - \frac{1}{\lambda_n} \log(n/s_n^3) \approx u_n(M^{(1)} \vee M^{(2)})$, which means that the weight of the smallest-weight path has a deterministic part $\frac{1}{\lambda_n} \log(n/s_n^3)$, while its random fluctuations are of the same order of magnitude as some of the typical values for the minimal edge weight adjacent to vertices 1 and 2. For $j \in \{1, 2\}$, one can think of $M^{(j)}$ as the time needed to "escape" from the local neighborhood of vertex $j$. The sequences $(\lambda_n)_n$ and $(\varphi_n)_n$ will be identified in (3.17)-(3.18), subject to slightly stronger assumptions.
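The distribution of $M^{(j)}$ is easy to evaluate numerically: the survival probability $\bar{q}(x)$ of a Poisson Galton-Watson process with mean $x$ is the largest solution of $\bar{q} = 1 - e^{-x\bar{q}}$ (zero for $x \le 1$), which can be found by fixed-point iteration. The sketch below, with helper names of our choosing, computes the distribution functions of $M^{(j)}$ and of $M^{(1)} \vee M^{(2)}$.

```python
import math

def pgw_survival(x, tol=1e-12):
    """Survival probability of a Poisson(x) Galton-Watson branching process,
    i.e. the largest solution q of q = 1 - exp(-x*q); zero for x <= 1."""
    if x <= 1.0:
        return 0.0
    q = 1.0  # iterate downwards from 1; the map is monotone
    for _ in range(10_000):
        q_new = 1.0 - math.exp(-x * q)
        if abs(q_new - q) < tol:
            break
        q = q_new
    return q

# P(M <= x) is the survival probability; M^{(1)} v M^{(2)} has c.d.f. its square.
cdf_M = pgw_survival
cdf_max = lambda x: pgw_survival(x) ** 2
```

In particular the distribution of $M^{(j)}$ has an atom at 0 of size corresponding to subcritical means, matching the interpretation of $M^{(j)}$ as an escape time that is only positive once the local exploration is supercritical.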
The optimal paths in Theorem 1.1 are long paths because the asymptotic mean of the path length $H_n$ in (1.8) is of larger order than $\log n$, the path length that arises in many random graph contexts. See Sect. 2.2 for a comprehensive literature overview. The following example collects some edge-weight distributions that are covered by Theorem 1.1:

Example 1.2 (a) Let $a, \gamma > 0$. Take $Y_e^{(K_n)} \stackrel{d}{=} U^{a (\log(1/U))^{\gamma - 1}}$, for which $\log F_Y^{-1}(u) = -a (\log(1/u))^{\gamma}$ and $s_n = a \gamma (\log n)^{\gamma - 1}$. The hypotheses of Theorem 1.1 are satisfied whenever $\gamma > 1$.
(b) Let $a, \gamma > 0$. Take $Y_e^{(K_n)} \stackrel{d}{=} U^{a (\log(1 + \log(1/U)))^{\gamma}}$, for which $\log F_Y^{-1}(u) = a \log u \, (\log(1 + \log(1/u)))^{\gamma}$ and
$$s_n = a (\log(1 + \log n))^{\gamma} + a \gamma \frac{\log n}{1 + \log n} (\log(1 + \log n))^{\gamma - 1}. \qquad (1.10)$$
We note that $s_n \sim a (\log\log n)^{\gamma}$ as $n \to \infty$. The hypotheses of Theorem 1.1 are satisfied whenever $\gamma > 1$. We shall see, however, that the conclusions of Theorem 1.1 also hold when $0 < \gamma \le 1$; see Sect. 2.1 and Lemma 4.8.
Notice that every sequence $(s_n)_n$ of the form $s_n = n^{\alpha} L(n)$, for $\alpha \ge 0$ and $L$ slowly varying at infinity, can be obtained from a distribution by taking $\log F_Y^{-1}(u) = \int u^{-1-\alpha} L(1/u) \, du$, i.e., the indefinite integral of the function $u \mapsto u^{-1-\alpha} L(1/u)$. In Sect. 2.1 we will weaken the requirement $s_n/\log\log n \to \infty$ to the requirement $s_n \to \infty$, subject to an additional regularity assumption.

First Passage Percolation with n-Dependent Edge Weights
In Theorem 1.1, we started with a fixed edge-weight distribution and extracted a specific sequence (s n ) n . For an essentially arbitrary distribution [subject to the relatively modest regular variation assumption in (1.3)], its FPP properties are fully encoded, at least for the purposes of the conclusions of Theorem 1.1, by the scaling properties of this sequence (s n ) n . Thus, Theorem 1.1 shows the common behaviour of a universality class of edge-weight distributions, and shows that this universality class is described in terms of a sequence of real numbers (s n ) n and its scaling behaviour.
In this section, we reverse this setup. We take as input a sequence $(s_n)_n$ and consider the $n$-dependent edge-weight distribution
$$Y_e^{(K_n)} \stackrel{d}{=} E^{s_n}, \qquad (1.12)$$
where $E$ is exponentially distributed with mean 1. (For legibility, our notation will not indicate the implicit dependence of $Y_e^{(K_n)}$ on $n$.) Theorem 1.3 identifies the scaling behavior of $W_n$ and $H_n$ for these edge weights; its conclusions (1.13) and (1.14) are the analogues of (1.7) and (1.8), where $Z$ is standard normal and $M^{(1)}$, $M^{(2)}$ are i.i.d. random variables for which $P(M^{(j)} \le x)$ is the survival probability of a Poisson Galton-Watson branching process with mean $x$.
We note that Theorem 1.3 resolves a conjecture in [11]. This problem is closely related to the problem of strong disorder on the complete graph, and has attracted considerable attention in the physics literature [18,23,33]. The convergence in (1.13) was proved in [Part I, Theorem 1.5 (a)] without the subtraction of the term $\frac{s_n}{n}(1 + 1/s_n)^{-s_n} \log(n/s_n^3)$ in the argument, and under the stronger assumption that $s_n/\log\log n \to \infty$. The edge-weight distribution in Theorem 1.3 allows for a simpler intuitive explanation of (1.7), while the convergence (1.14) verifies the heuristics for the strong disorder regime in [11, Sect. 1.4]. See Remark 4.6 for a discussion of the relation between these two results. As mentioned in Sect. 1.1, strong disorder here refers to the fact that when $s_n \to \infty$ the values of the random weights $E_e^{s_n}$ depend strongly on the disorder $(E_e)_{e \in E(G)}$, making small values increasingly more, and large values increasingly less, favorable. Mathematically, the elementary limit
$$\lim_{s \to \infty} \Big( \sum_{i=1}^k x_i^s \Big)^{1/s} = \max_{i \in [k]} x_i$$
expresses the convergence of the $\ell^s$ norm towards the $\ell^\infty$ norm and establishes a relationship between the weak disorder regime and the strong disorder regime of FPP.
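This $\ell^s \to \ell^\infty$ convergence is easy to observe numerically; the snippet below is a plain illustration of the limit, nothing more.

```python
# The s-norm (sum of s-th powers, raised to 1/s) decreases to the maximum
# entry as s grows.
x = [0.3, 1.7, 2.0, 0.9]
for s in [1, 2, 8, 32, 128]:
    print(s, sum(xi ** s for xi in x) ** (1 / s))
# The printed values decrease towards max(x) = 2.0.
```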
Remarkably, a similar argument actually also applies to Theorem 1.1, exemplifying that these settings are in the same universality class. Indeed, Theorem 1.3 shows that n-dependent distributions can be understood in the same framework as the n-independent distributions in Example 1.2. We next explain this comparison and generalize our results further by explaining the universal picture behind them.

The Universal Picture
In Sect. 2.1, we generalize the results in Theorems 1.1 and 1.3 to a larger class of edge weights, and provide a common language that allows us to prove these results in one go. Having reached this higher level of abstraction, in Sect. 2.2 we will embed the results achieved here in the wider picture of universality classes of FPP, and provide conjectures and results on how to describe all universality classes and what the scaling behaviour of each of them might be. Links to the relevant literature and existing results are provided. For a short guide to notation, see p. 81.

Description of the Class of Edge Weights to Which Our Results Apply
In this section, we describe a general framework containing both Theorem 1.1 and Theorem 1.3. This framework, which is in terms of i.i.d. exponential random variables, determines the precise conditions that the edge weights need to satisfy for the results in Theorems 1.1 and 1.3 to apply. Interestingly, due to the parametrization in terms of exponential random variables, this general framework also provides a clear link between the near-critical Erdős-Rényi random graph and our first passage percolation problem, where the lower extremes of the edge-weight distribution are highly separated. Finally and conveniently, this framework allows us to prove these theorems simultaneously. In particular, both the $n$-independent edge weights in Theorem 1.1 and the $n$-dependent ones in Theorem 1.3 are key examples of the class of edge weights that we will study in this paper.
For fixed $n$, the edge weights $(Y_e^{(K_n)})_{e \in E(K_n)}$ are independent for different $e$. However, there is no requirement that they are independent over $n$, and in fact in Sect. 5 we will produce $Y_e^{(K_n)}$ using a fixed source of randomness not depending on $n$. Therefore, it will be useful to describe the randomness on the edge weights $((Y_e^{(K_n)})_{e \in E(K_n)} \colon n \in \mathbb{N})$ uniformly across the sequence. It will be most useful to give this description in terms of exponential random variables. Fix independent exponential mean 1 variables $(X_e^{(K_n)})_{e \in E(K_n)}$, and define
$$Y_e^{(K_n)} = g(X_e^{(K_n)}), \qquad (2.1)$$
where $g \colon (0, \infty) \to (0, \infty)$ is a strictly increasing function. The relation between $g$ and the distribution function $F_Y$ is given by
$$g(x) = F_Y^{-1}(1 - e^{-x}). \qquad (2.2)$$
We define
$$f_n(x) = g(x/n). \qquad (2.3)$$
Because of this convenient relation between the edge weights $Y_e^{(K_n)}$ and exponential random variables, we will express our hypotheses about the distribution of the edge weights in terms of conditions on the functions $f_n(x)$ as $n \to \infty$. Since the minimum of $n$ independent exponential variables with mean 1 is exponential with mean $1/n$, we have
$$\min_{i \in [n]} Y_i \stackrel{d}{=} g(E/n) = f_n(E), \qquad (2.4)$$
$$\log f_n(E) \approx \log f_n(1) + s_n \log E. \qquad (2.5)$$
Thus, (2.4)-(2.5) show that the parameter $s_n$ measures the relative sensitivity of $\min_{i \in [n]} Y_i$ to fluctuations in the variable $E$. In general, we will have $f_n(x) \approx f_n(1) x^{s_n}$ if $x$ is appropriately close to 1 and $s_n \approx f_n'(1)/f_n(1)$. These observations motivate the following conditions on the functions $(f_n)_n$, which we will use to relate the distributions of the edge weights $Y_e^{(K_n)}$, $n \in \mathbb{N}$, to a sequence $(s_n)_n$:
Even though we will rely on Condition 2.1 when $s_n \to \infty$ and $s_n = o(n^{1/3})$, we strongly believe that the scaling of the sequence $(s_n)_n$ actually characterises the universality classes, in the sense that the behaviour of $H_n$ and $W_n$ is similar for edge weights for $(s_n)_n$ with similar scaling behaviour, and different for sequences that have different scaling. We elaborate on this in Sect. 2.2.1, where we identify eight different universality classes and the expected and/or proved results in them.
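The parametrization can be sketched in code: under the relation $g(x) = F_Y^{-1}(1 - e^{-x})$ (which holds because $1 - e^{-X}$ is uniform on $[0,1]$ when $X$ is exponential with mean 1), applying $g$ to a single exponential source yields edge weights with distribution $F_Y$. The helper names below are illustrative choices of ours.

```python
import math
import random

def g_from_F_inv(F_inv):
    """g(x) = F_Y^{-1}(1 - e^{-x}): if X is exponential with mean 1,
    then g(X) has distribution function F_Y."""
    return lambda x: F_inv(1.0 - math.exp(-x))

# Pure power law F_Y(y) = y^{1/s}, i.e. F_Y^{-1}(u) = u^s, with s = 2.
s = 2.0
g = g_from_F_inv(lambda u: u ** s)

rng = random.Random(0)
xs = [rng.expovariate(1.0) for _ in range(200_000)]
ys = [g(x) for x in xs]

# Empirical check: P(Y <= y0) should be close to y0^{1/s}.
y0 = 0.25
emp = sum(y <= y0 for y in ys) / len(ys)
print(emp, y0 ** (1 / s))  # both near 0.5
```

Because the exponential source $X_e$ does not depend on $n$, the same draw can be reused for every $n$ through $f_n(x) = g(x/n)$, which is the coupling across the sequence described above.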

Condition 2.3 (Density bound for large weights)
(a) For all $R > 1$, there exist $\varepsilon > 0$ and $n_0 \in \mathbb{N}$ such that (2.8) holds for every $1 \le x \le R$ and $n \ge n_0$. (b) For all $C > 1$, there exist $\varepsilon > 0$ and $n_0 \in \mathbb{N}$ such that (2.8) holds for every $n \ge n_0$ and every $x \ge 1$ satisfying $f_n(x) \le C f_n(1) \log n$.
Notice that Condition 2.1 implies that $f_n(1) \sim u_n(1)$ [recall the definition of $u_n(x)$ in (1.5)] whenever $s_n = o(n)$. Indeed, by (2.3) we can write $u_n(1) = f_n(x_n^{1/s_n})$ for $x_n = (-n \log(1 - 1/n))^{s_n}$. Since $s_n = o(n)$, we have $x_n = 1 + o(1)$, and the monotonicity of $f_n$ implies that $f_n(x_n^{1/s_n})/f_n(1) \to 1$. We remark also that (1.6) remains valid if $u_n(x)$ is replaced by $f_n(x)$.
We are now in a position to state our main theorem:

Theorem 2.4. Assume Conditions 2.1-2.3, with $s_n \to \infty$ and $s_n = o(n^{1/3})$. Then there exist sequences $(\lambda_n)_n$ and $(\varphi_n)_n$ with $\varphi_n/s_n \to 1$ such that
$$f_n^{-1}\Big( W_n - \frac{1}{\lambda_n} \log(n/s_n^3) \Big) \stackrel{d}{\to} M^{(1)} \vee M^{(2)}, \qquad (2.9)$$
$$\frac{H_n - \varphi_n \log(n/s_n^3)}{\sqrt{s_n^2 \log(n/s_n^3)}} \stackrel{d}{\to} Z, \qquad (2.10)$$
where $Z$ is standard normal, and $M^{(1)}$, $M^{(2)}$ are i.i.d. random variables for which $P(M^{(j)} \le x)$ is the survival probability of a Poisson Galton-Watson branching process with mean $x$.

For the edge weights $Y_e^{(K_n)} \stackrel{d}{=} E^{s_n}$ from Theorem 1.3, (2.6)-(2.8) hold identically with $\varepsilon_0 = \varepsilon = 1$, and we explicitly compute $\lambda_n = \frac{n}{s_n}(1 + 1/s_n)^{s_n}$ and $\varphi_n = s_n$ in Example 6.1. We will prove in Lemma 4.8 that the distributions in Theorem 1.1 satisfy the assumptions of Theorem 2.4. The convergence (1.7) in Theorem 1.1 is equivalent to (2.9) in Theorem 2.4 by the observation that, for any non-negative random variables $(T_n)_n$ and $M$,
$$n F_Y(T_n) \to M \quad \text{if and only if} \quad f_n^{-1}(T_n) \to M,$$
where the convergence is in distribution, in probability or almost surely; see e.g. [Part I, Lemma 5.5] for an example.
The following example describes a generalization of Theorem 1.3:

Example 2.5
Let $(s_n)_n$ be a positive sequence with $s_n \to \infty$, $s_n = o(n^{1/3})$. Let $Z$ be a positive, continuous random variable with distribution function $G$ such that $G'(z)$ exists and is continuous at $z = 0$, with $G'(0) > 0$, and take $Y_e^{(K_n)} \stackrel{d}{=} Z^{s_n}$. For instance, we can take $Z$ to be uniformly distributed on an interval $(0, b)$, for any $b > 0$. We give a proof of this assertion in Lemma 4.8.

Condition 2.3 can be strengthened to the following condition, which will be equivalent for our purposes:

Condition 2.6 (Extended density bound) There exist $\varepsilon_0 > 0$ and $n_0 \in \mathbb{N}$ such that (2.8) holds with $\varepsilon = \varepsilon_0$ for every $n \ge n_0$ and every $x \ge 1$.

Henceforth, except where otherwise noted, we will assume Conditions 2.1, 2.2 and 2.6. We will reserve the notation $\varepsilon_0$, $\delta_0$ for some fixed choice of the constants in Conditions 2.2 and 2.6, with $\varepsilon_0$ chosen small enough to satisfy both conditions.

Discussion of Our Results
In this section we discuss our results and state open problems.

The Universality Class in Terms of s n
In Sect. 2.1, we described an edge-weight universality class in terms of $s_n$. In this paper, we investigate the case where $s_n \to \infty$ with $s_n = o(n^{1/3})$. We conjecture that all universality classes can be described in terms of the scaling behaviour of the sequence $(s_n)_n$, and below we identify the eight universality classes that describe the different scaling behaviours. These eight cases are defined by how fast $s_n \to 0$ (this gives rise to four cases), by the case where $s_n$ converges to a positive and finite constant, and by how $s_n \to \infty$ (giving rise to three cases, including the one that is studied in this paper). We believe that this paper represents a major step forward in this direction, in that it describes the scaling behaviour in a large regime of $(s_n)_n$ sequences. We next describe the eight scaling regimes of $s_n$ and the results proved and/or predicted for them. We conjecture that these eight cases describe all universality classes for FPP on the complete graph, and it would be of interest to make this universal picture complete.

The regime $s_n \to 0$. In view of (2.5), for $s_n \to 0$ the first passage percolation problem approximates the graph metric, where the approximation is stronger the faster $s_n$ tends to zero. We distinguish four different scaling regimes according to how fast $s_n \to 0$:

(i) When $s_n \log n$ converges, the problem is closely related to the graph metric (cf. [20, Sect. 1.2]) and was investigated in [16], where it is proved that $H_n$ is concentrated on at most two values. For the case of $n$-dependent edge weights $Y_e^{(K_n)} \stackrel{d}{=} E^{s_n}$, it was observed in [20] that when $s_n \log n$ converges to $\gamma$ fast enough, the methods of [16] can be imitated and the concentration result for the hopcount continues to hold.

(ii) When $s_n \log n \to \infty$ but $s_n^2 \log n \to 0$, the variance factor $s_n^2 \log(n/s_n^3)$ from the central limit theorem (CLT) in (2.10) tends to zero. Since $H_n$ is integer-valued, it follows that (2.10) must fail in this case.
First order asymptotics are investigated in [20], where it is shown that $H_n/(s_n \log n) \stackrel{P}{\to} 1$ and $W_n/(u_n(1) s_n \log n) \stackrel{P}{\to} e$. It is tempting to conjecture that there exists an integer $k_n \approx s_n \log n$ such that $H_n \in \{k_n, k_n + 1\}$ whp.

(iii) The regime where $s_n \log n \to \infty$ but $s_n^2 \log n \to \gamma \in (0, \infty)$ corresponds to a critical window between the one- or two-point concentration conjectured in (ii) and the CLT scaling conjectured in (iv). It is natural to expect that $H_n - \varphi_n \log n$ is tight for an appropriately chosen sequence $\varphi_n \sim s_n$, although the distribution of $H_n - \varphi_n \log n$ might only have subsequential limits because of integer effects. Moreover, we would expect these subsequential limits in distribution to match with (ii) and (iv) in the limits $\gamma \to 0$ and $\gamma \to \infty$, respectively.

(iv) When $s_n \to 0$ and $s_n^2 \log n \to \infty$, we conjecture that the CLT for the hopcount in Theorem 2.4 remains true, and that $u_n(1)^{-1} W_n - \frac{1}{\lambda_n} \log n$ converges to a Gumbel distribution for suitable sequences $(\lambda_n)_n$ and $(u_n(1))_n$. Unlike in the fixed $s$ case, we expect no martingale limit terms to appear in the limiting distribution.
The fixed $s$ regime. The fixed $s$ regime was investigated in [11] in the case where $Y \stackrel{d}{=} E^s$; it describes the boundary case between $s_n \to 0$ and $s_n \to \infty$. We conjecture that, for other random variables for which Condition 2.1 is valid for some $s \in (0, \infty)$, the CLT for the hopcount remains valid, while there exist a random variable $V$ and a constant $\lambda(s)$, depending on the distribution, such that $u_n(1)^{-1} W_n - \frac{1}{\lambda(s)} \log n \stackrel{d}{\to} V$. In general, $V$ will not be equal to $(M^{(1)} \vee M^{(2)})^s$; see for example [11]. Instead, it is described by a sum of different terms involving Gumbel distributions and the martingale limit of a certain continuous-time branching process depending on the distribution. Our proof is inspired by the methods developed in [11]. The CLT for $H_n$ in the fixed $s$ regime can be recovered from our proofs; in fact, the reasoning in that case simplifies considerably compared to our more general setup.
The results in [11] match up nicely with ours. Indeed, in [11] it was shown that $u_n(1)^{-1} W_n - \frac{1}{\lambda(s)} \log n$ converges in distribution, where $\lambda(s) = (1 + 1/s)^s$, the limit being expressed in terms of a Gumbel variable $\Lambda_{1,2}$, with $P(\Lambda_{1,2} \le x) = e^{-e^{-x}}$, and $L_s^{(1)}$, $L_s^{(2)}$, two independent copies of the random variable $L_s$ with $E(L_s) = 1$ solving the distributional equation (2.14), in which $(L_{s,i})_{i \ge 1}$ are i.i.d. copies of $L_s$ and $(E_i)_{i \ge 1}$ are i.i.d. exponentials with mean 1. We claim that the right-hand side of (2.13) converges to $M^{(1)} \vee M^{(2)}$ as $s \to \infty$, where $M^{(1)}$, $M^{(2)}$ are as in Theorem 2.4. This is equivalent to a statement about the limit $M$ of the corresponding one-source quantities, and using (2.14) we deduce that $M$ solves the equation (2.16). The unique solution to (2.16) is the random variable with $P(M \le x)$ being the survival probability of a Poisson Galton-Watson process with mean $x$, so that $M \stackrel{d}{=} M^{(1)}$.

The regime $s_n \to \infty$. The regime $s_n \to \infty$ can be further separated into three cases.
(i) Firstly, the case where $s_n \to \infty$ with $s_n/n^{1/3} \to 0$ is the main topic of this paper.

(ii) Secondly, the regime where $s_n/n^{1/3} \to \gamma \in (0, \infty)$ corresponds to the critical window between the minimal spanning tree case discussed below and the case (i) studied here. It is natural to expect (see also Theorems 1.1 and 1.3) that $H_n/n^{1/3}$ converges to a nontrivial limit that depends sensitively on $\gamma$, and that, as $\gamma \to 0$ or $\gamma \to \infty$, matches up with the cases (i) and (iii), respectively.

(iii) Finally, the regime $s_n/n^{1/3} \to \infty$. Several of our methods do not extend to this case; indeed, we conjecture that the CLT in Theorem 2.4 ceases to hold in this regime. Our proof clearly suggests that first passage percolation (FPP) on the complete graph is closely approximated by invasion percolation (IP) on the Poisson-weighted infinite tree (PWIT), studied in [2], whenever $s_n \to \infty$; see also [21]. It is tempting to predict that $H_n/n^{1/3}$ converges to the same limit as the graph distance between two vertices for the minimal spanning tree on the complete graph, as identified in [3].

First Passage Percolation on Random Graphs
FPP on random graphs has attracted considerable attention in the past years, and our research was strongly inspired by these studies. In [17], the authors show that for the configuration model with finite-variance degrees (and related graphs), and for edge weights with a continuous distribution not depending on $n$, there exists only a single universality class. Indeed, if we define $W_n$ and $H_n$ to be the weight of, and the number of edges in, the smallest-weight path between two uniform vertices in the graph, then there exist positive, finite constants $\alpha$, $\beta$, $\lambda$ and sequences $(\alpha_n)_n$, $(\lambda_n)_n$, with $\alpha_n \to \alpha$, $\lambda_n \to \lambda$, such that $W_n - (1/\lambda_n) \log n$ converges in distribution, while $H_n$ satisfies a CLT with asymptotic mean $\alpha_n \log n$ and asymptotic variance $\beta \log n$.
Related results appear for exponential edge weights on the Erdős-Rényi random graph in [15], for certain inhomogeneous random graphs in [28], and for the small-world model in [30]. The diameter of the weighted graph is studied in [6], and relations to competition on $r$-regular graphs are examined in [7]. Finally, the smallest-weight paths with the most edges from a single source, or between any pair of vertices in the graph, are investigated in [5].
We conjecture that our results are closely related to FPP on random graphs with infinite-variance degrees. Such graphs, sometimes called scale-free random graphs, have been suggested in the networking community as appropriate models for various real-world networks. See [8,31] for extended surveys of real-world networks, and [19,24,32] for more details on random graph models of such real-world networks. FPP on infinite-variance random graphs with exponential weights was first studied in [13,14], of which the case of finite-mean degrees studied in [14] is most relevant for our discussion here. There, it was shown that a linear transformation of $W_n$ converges in distribution, while $H_n$ satisfies a CLT with asymptotic mean and variance $\alpha \log n$, where $\alpha$ is a simple function of the power-law exponent of the degree distribution of the configuration model. Since the configuration model with infinite-variance degrees whp contains a complete graph of size a positive power of $n$, it can be expected that the universality classes on these random graphs are closely related to those on the complete graph $K_n$. In particular, the strong universality result for finite-variance random graphs fails in this setting: for the weight distribution $1 + E$, where $E$ is an exponential random variable, the hopcount $H_n$ is of order $\log\log n$ (as for the graph distance [26]), rather than of order $\log n$ as it is for exponential weights. See [9] for two examples proving that strong universality indeed fails in the infinite-variance setting, and [4,10] for further results. The area has attracted substantial attention through the work of Komjáthy and collaborators; see also [25,29] for recent work in geometric contexts.

Extremal Functionals for FPP on the Complete Graph
Many more fine results are known for FPP on the complete graph with exponential edge weights. In [27], the weak limits of the rescaled path weight and of the flooding are determined, where the flooding is the maximal smallest-weight distance between a source and all other vertices in the graph. In [12], the same is done for the diameter of the graph. It would be of interest to investigate the weak limits of the flooding and diameter in our setting.

Detailed Results, Overview and Classes of Edge Weights
In this section, we provide an overview of the proof of our main results.
This section is organised as follows. In Sect. 3.1, we explain how FPP clusters can be described in terms of an appropriate exploration process, both from one and from two sources. In Sect. 3.2, we discuss how this exploration process can be coupled to first passage percolation on the Poisson-weighted infinite tree (PWIT). In Sect. 3.3, we interpret the FPP dynamics on the PWIT as a continuous-time branching process, and study one- and two-vertex characteristics associated with it. The two-vertex characteristics are needed since we explore from two sources. Due to the near-critical behavior of the branching processes involved, the FPP clusters may grow at rather different speeds, and we need to make sure that their sizes are comparable. This is achieved by freezing the fastest growing one, which is explained in detail in Sect. 3.4, where we study both the times at which freezing happens and the sizes of the FPP clusters at the freezing times. There, we also investigate the collision times between the two exploration processes, which correspond to (near-)shortest paths between the two sources. In Sect. 3.5, we couple FPP on the complete graph from two sources to a continuous-time branching process from which we can retrieve the FPP clusters by a thinning procedure. In Sect. 3.6, we use the explicit distribution of the collision edge (whether thinned or not) to derive its scaling properties, both for the time at which it occurs and for the generations of the vertices it consists of. Finally, in Sect. 3.7, we show that the first point of the Cox process that describes the collision edge is with high probability unthinned, and we complete the proof of our main results.

FPP Exploration Processes
To understand smallest-weight paths in the complete graph, we study the first passage exploration process from one or two sources. Recall from (1.1) that $d_{K_n, Y^{(K_n)}}(i, j)$ denotes the total cost of the optimal path $\pi_{i,j}$ between vertices $i$ and $j$.

One-Source Exploration Process
For a vertex $j \in V(K_n)$, let the one-source smallest-weight tree $SWT_t^{(j)}$ be the connected subgraph of $K_n$ defined by
$$V(SWT_t^{(j)}) = \{ i \in [n] \colon d_{K_n, Y^{(K_n)}}(j, i) \le t \}, \quad E(SWT_t^{(j)}) = \{ e \colon e \in \pi_{j,i} \text{ for some } i \in V(SWT_t^{(j)}) \}. \qquad (3.1)$$
Note that $SWT_t^{(j)}$ is indeed a tree: if two optimal paths $\pi_{j,k}$, $\pi_{j,k'}$ pass through a common vertex $i$, both paths must contain $\pi_{j,i}$ since the minimizers of (1.1) are unique. Moreover, by construction, FPP distances from the source vertex $j$ can be recovered from arrival times in the process $(SWT_t^{(j)})_{t \ge 0}$: the arrival time of a vertex $i$ satisfies $\inf\{t \ge 0 \colon i \in SWT_t^{(j)}\} = d_{K_n, Y^{(K_n)}}(j, i)$. To visualize the process $(SWT_t^{(j)})_{t \ge 0}$, think of the edge weight $Y_e^{(K_n)}$ as the time required for fluid to flow across the edge $e$. Place a source of fluid at $j$ and allow it to spread through the graph. Then $V(SWT_t^{(j)})$ is precisely the set of vertices that have been wetted by time $t$, while $E(SWT_t^{(j)})$ is the set of edges along which, at any time up to $t$, fluid has flowed from a wet vertex to a previously dry vertex. Equivalently, an edge is added to $SWT_t^{(j)}$ whenever it becomes completely wet, with the additional rule that an edge is not added if it would create a cycle.
Because fluid begins to flow across an edge only after one of its endpoints has been wetted, the age of a vertex (the length of time that the vertex has been wet) determines how far fluid has traveled along the adjoining edges. Given $SWT_t^{(j)}$, the future of the exploration process will therefore be influenced by the current ages of vertices in $SWT_t^{(j)}$, and the nature of this effect depends on the probability law of the edge weights $(Y_e^{(K_n)})_e$. In the sequel, for a subgraph $\mathcal{G} = (V(\mathcal{G}), E(\mathcal{G}))$ of $K_n$, we write $\mathcal{G}$ instead of $V(\mathcal{G})$ for the vertex set when there is no risk of ambiguity.
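The exploration can be sketched as a Dijkstra-like growth process that records arrival times and parent edges; this is an illustrative simulation of ours, not the paper's construction. The rule "skip an edge that would create a cycle" corresponds to discarding heap entries whose endpoint has already been wetted.

```python
import heapq
import random

def smallest_weight_tree(n, sample_weight, src=0, seed=None):
    """Grow the one-source smallest-weight tree SWT_t^{(src)} on K_n.

    Returns (arrival, parent): arrival[i] is the FPP distance from src to i,
    and {i, parent[i]} are the tree edges, in order of wetting.
    """
    rng = random.Random(seed)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = sample_weight(rng)

    arrival = {src: 0.0}
    parent = {}
    heap = [(0.0, src, None)]
    while len(arrival) < n:
        d, u, p = heapq.heappop(heap)
        if u in arrival and p is not None:
            continue  # u already wet; adding this edge would create a cycle
        if p is not None:
            arrival[u] = d
            parent[u] = p
        for v in range(n):
            if v != u and v not in arrival:
                heapq.heappush(heap, (d + w[u][v], v, u))
    return arrival, parent

arrival, parent = smallest_weight_tree(60, lambda rng: rng.expovariate(1.0) ** 2, seed=7)
```

Since every non-source vertex gets exactly one parent, the edge set has $n - 1$ elements, and each child arrives strictly later than its parent, matching the tree property and the age ordering described above.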

Two-Source Exploration Process
Consider now two vertices in $K_n$, which for simplicity we take to be vertices 1 and 2. The two-source smallest-weight tree $SWT_t^{(1,2)}$ is the subgraph of $K_n$ defined by (3.4): it is the union, over all vertices $i$ within FPP distance $t$ of vertex 1 or vertex 2, of the optimal path $\pi_{1,i}$ or $\pi_{2,i}$, whichever has smaller weight.
Because the edge-weight distribution has no atoms, no two optimal paths have the same weight. It follows that, a.s., $SWT_t^{(1,2)}$ is the union of two vertex-disjoint trees for all $t$. (To see this, suppose vertex $i$ is closer to vertex $j$ than to vertex $j'$, where $\{j, j'\} = \{1, 2\}$. Then, given another vertex $i'$ and a path $\pi$ passing from $i'$ to $i$ to $j'$, there must be a strictly shorter path passing from $i'$ to $i$ to $j$.) We note that
$$SWT_t^{(1,2)} \subseteq SWT_t^{(1)} \cup SWT_t^{(2)}, \qquad (3.5)$$
with strict containment for sufficiently large $t$.
To visualize the process $(SWT^{(1,2)}_t)_{t\geq 0}$, place sources of fluid at vertices 1 and 2 and allow both fluids to spread through the graph. Then, as before, $V(SWT^{(1,2)}_t)$ is precisely the set of vertices that have been wetted by time $t$, while $E(SWT^{(1,2)}_t)$ is the set of edges along which, at any time up to $t$, fluid has flowed from a wet vertex to a previously dry vertex. Equivalently, an edge is added to $SWT^{(1,2)}_t$ whenever it becomes completely wet, with the additional rules that an edge is not added if it would create a cycle or if it would connect the two connected components of $SWT^{(1,2)}_t$. From the process $SWT^{(1,2)}_t$, we can partially recover FPP distances. Denote by $T^{SWT^{(1,2)}}_i$ the arrival time of a vertex $i \in [n]$. Then (3.6) holds for $j \in \{1, 2\}$. More generally, observing the process $(SWT^{(1,2)}_t)_{t\geq 0}$ allows us to recover the edge weights $Y^{(K_n)}_e$ for all $e \in \cup_{t\geq 0} E(SWT^{(1,2)}_t)$. However, in contrast to the one-source case, the FPP distance $W_n = d_{K_n, Y^{(K_n)}}(1,2)$ cannot be determined by observing the process $(SWT^{(1,2)}_t)_{t\geq 0}$ alone.
Indeed, if vertices $i_1, i_2$ satisfy $i_1 \in SWT^{(1,2;1)}_t$ and $i_2 \in SWT^{(1,2;2)}_t$ for some $t$, then by construction the edge $\{i_1, i_2\}$ between them will never be added to $SWT^{(1,2)}$, and there is no arrival time from which to determine the edge weight $Y^{(K_n)}_{\{i_1,i_2\}}$. The optimal weight $W_n$ is the minimum value in (3.7), which is uniquely attained a.s. by our assumptions on the edge weights. In the fluid flow description above, $T^{SWT}_{\mathrm{coll}}$ is the time when the fluid from vertex 1 and the fluid from vertex 2 first collide, and this collision takes place inside the collision edge. Note that since fluid flows at rate 1 from both sides simultaneously, the optimal weight is given by $W_n = 2T^{SWT}_{\mathrm{coll}}$, and the optimal path is formed from the unique path in $SWT^{(1,2;1)}$ to one endpoint of the collision edge, the collision edge itself, and the unique path in $SWT^{(1,2;2)}$ from its other endpoint. We will not use Proposition 3.2 and the formula (3.7), which are a special case of Lemma 3.20 and Theorem 3.22. These generalizations deal with a freezing procedure that we will explain below. Note that the conditioning in (3.8) reflects the information about $Y_{\{i_1,i_2\}}$ gained by knowing that $i_1$ and $i_2$ belong to different connected components of $SWT^{(1,2)}_t$: during the period of time when one vertex was explored but not the other, the fluid must not have had time to flow from the earlier-explored vertex to the later-explored vertex.

Coupling FPP on K n to FPP on the Poisson-Weighted Infinite Tree
In this section, we state results that couple FPP on K n to FPP on the Poisson-weighted infinite tree (PWIT). We start by explaining the key idea, coupling of order statistics of exponentials to Poisson processes, in Sect. 3

Order Statistics of Exponentials and Poisson Processes
To study the smallest-weight tree from a vertex, say vertex 1, let us consider the time until the first vertex is added. By construction, $\min_{i\in[n]\setminus\{1\}} Y^{(K_n)}_{\{1,i\}}$ is distributed as a function of an exponential random variable $E$ of mean 1. We next extend this to describe the distribution of the order statistics of the weights $Y^{(K_n)}_{\{1,i\}} = f_n(E_i)$, where $E_2, \ldots, E_n$ are independent exponential random variables with rate $1/n$. We can realize $E_i$ as the first point of a Poisson point process $P^{(i)}$ with rate $1/n$, with points $X^{(i)}_1 < X^{(i)}_2 < \cdots$, chosen independently for different $i = 2, \ldots, n$. We can also form the Poisson point process $P^{(1)}$ with rate $1/n$, corresponding to $i = 1$, although this Poisson point process is not needed to produce an edge weight. To each point of $P^{(i)}$, associate the mark $i$. Now amalgamate all $n$ Poisson point processes to form a single Poisson point process of intensity 1, with points $X_1 < X_2 < \cdots$. Each point $X_k$ has an associated mark $M_k$. By properties of Poisson point processes, given the points $X_1 < X_2 < \cdots$, the marks $M_k$ are chosen uniformly at random from $[n]$, different marks being independent.
To complete the construction of the edge weights $Y^{(K_n)}_{\{1,i\}}$, we need to recover the first points $X^{(i)}_1$, for all $i = 2, \ldots, n$, from the amalgamated points $X_1 < X_2 < \cdots$. Thus we will thin a point $X_k$ when $M_k = 1$ (since $i = 1$ is not used to form an edge weight) or when $M_k = M_{k'}$ for some $k' < k$ (since such a point is not the first point of its corresponding Poisson point process). Then the unthinned points are precisely the first points $X^{(i)}_1$, $i = 2, \ldots, n$. In the next step, we extend this result to the smallest-weight tree $SWT^{(1)}$ using a relation to FPP on the Poisson-weighted infinite tree.
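The amalgamation-and-thinning step is straightforward to simulate. In the sketch below (function name ours), we generate the rate-1 amalgamated process with i.i.d. uniform marks and keep exactly the points that survive the two thinning rules; the kept point with mark $i$ recovers the first point $X^{(i)}_1$:

```python
import random

def first_points_via_thinning(n, rng):
    """Recover the first points X^{(i)}_1, i = 2..n, from the amalgamated
    rate-1 Poisson process with i.i.d. uniform marks on [n].

    A point X_k is thinned when its mark M_k equals 1 (no edge weight is
    formed from i = 1) or when the mark has appeared before (such a point
    is not the first point of its process P^{(i)}).
    """
    t = 0.0
    seen = set()
    first = {}  # mark i -> first point X^{(i)}_1
    while len(first) < n - 1:
        t += rng.expovariate(1.0)       # rate-1 inter-arrival time
        mark = rng.randrange(1, n + 1)  # uniform mark in [n]
        if mark != 1 and mark not in seen:
            first[mark] = t             # kept: first point of P^{(mark)}
        seen.add(mark)
    return first
```

The kept points appear in increasing order of time, one per mark $i \in \{2, \ldots, n\}$, exactly as the thinning rule prescribes.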

The Poisson-Weighted Infinite Tree
The Poisson-weighted infinite tree is an infinite edge-weighted tree in which every vertex has infinitely many (ordered) children. Before giving the definitions, we recall the Ulam-Harris notation for describing trees. Define the tree $T^{(1)}$ as follows. The vertices of $T^{(1)}$ are given by finite sequences of natural numbers headed by the symbol $\varnothing_1$, which we write as $\varnothing_1 j_1 j_2 \cdots j_k$. The sequence $\varnothing_1$ denotes the root vertex of $T^{(1)}$. We concatenate sequences $v = \varnothing_1 i_1 \cdots i_k$ and $w = \varnothing_1 j_1 \cdots j_m$ to form the sequence $vw = \varnothing_1 i_1 \cdots i_k j_1 \cdots j_m$ of length $|vw| = |v| + |w| = k + m$. Identifying a natural number $j$ with the corresponding sequence of length 1, the $j$th child of a vertex $v$ is $vj$, and we say that $v$ is the parent of $vj$. Write $p(v)$ for the (unique) parent of $v \neq \varnothing_1$, and $p^k(v)$ for the ancestor $k$ generations before, $k \leq |v|$.

We can place an edge (which we could consider to be directed) between every $v \neq \varnothing_1$ and its parent; this turns $T^{(1)}$ into a tree with root $\varnothing_1$. With a slight abuse of notation, we will use $T^{(1)}$ to mean both the set of vertices and the associated graph, with the edges given implicitly according to the above discussion, and we will extend this convention to any subset $\tau \subset T^{(1)}$. We also write $\partial\tau = \{v \notin \tau : p(v) \in \tau\}$ for the set of children one generation away from $\tau$.

To describe the PWIT formally, we associate weights to the edges of $T^{(1)}$. By construction, we can index these edge weights by non-root vertices, writing the weights as $X = (X_v)_{v \neq \varnothing_1}$, where the weight $X_v$ is associated to the edge between $v$ and its parent $p(v)$. We make the convention that $X_{v0} = 0$.

Definition 3.3 (Poisson-weighted infinite tree)
The Poisson-weighted infinite tree (PWIT) is the random tree $(T^{(1)}, X)$ for which $X_{vk} - X_{v(k-1)}$ is exponentially distributed with mean 1, independently for each $v \in T^{(1)}$ and each $k \in \mathbb{N}$. Equivalently, the weights $(X_{v1}, X_{v2}, \ldots)$ are the (ordered) points of a Poisson point process of intensity 1 on $(0, \infty)$, independently for each $v$.
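For simulation purposes, the PWIT can be generated lazily. A small sketch (the tuple encoding and names are our own) represents vertices in Ulam-Harris style and draws the children's edge weights of any vertex as partial sums of independent Exp(1) variables:

```python
import random

# Ulam-Harris coding: a vertex of T^(1) is a tuple ("root1", j1, ..., jk);
# its parent drops the last coordinate, its j-th child appends j.
ROOT = ("root1",)

def parent(v):
    """Parent p(v) of a non-root vertex v."""
    return v[:-1]

def child(v, j):
    """The j-th child vj of vertex v."""
    return v + (j,)

def pwit_child_weights(rng, k):
    """Edge weights X_{v1} < X_{v2} < ... < X_{vk} of the first k children
    of a vertex: partial sums of i.i.d. Exp(1) increments, i.e. the first
    k points of a rate-1 Poisson process on (0, oo)."""
    weights, total = [], 0.0
    for _ in range(k):
        total += rng.expovariate(1.0)
        weights.append(total)
    return weights
```

Since only finitely many vertices are ever explored in finite time, drawing each vertex's children on demand suffices to simulate FPP or invasion percolation on the PWIT exactly.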
Motivated by (3.9), we study FPP on $T^{(1)}$ with the edge weights $(f_n(X_v))_v$ defined in (3.10), together with the associated FPP exploration process $BP^{(1)}$.

Definition 3.4 (First passage percolation on the Poisson-weighted infinite tree) For FPP on
Note that the FPP edge weights $(f_n(X_{vk}))_{k\in\mathbb{N}}$ are themselves the points of a Poisson point process on $(0, \infty)$, independently for each $v \in T^{(1)}$. The intensity measure of this Poisson point process, which we denote by $\mu_n$, is the image of Lebesgue measure on $(0, \infty)$ under $f_n$. Since $f_n$ is strictly increasing by assumption, $\mu_n$ has no atoms and we may abbreviate $\mu_n((a,b])$ as $\mu_n(a,b)$ for simplicity. Thus $\mu_n$ is characterized by $\int h \, d\mu_n = \int_0^\infty h(f_n(x)) \, dx$ for any measurable function $h : [0,\infty) \to [0,\infty)$. Clearly, and as suggested by the notation, the FPP exploration process $BP^{(1)}$ is a continuous-time branching process:
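For the running example $f_n(x) = x^{s_n}$ from the abstract, the points of this Poisson process can be sampled by applying $f_n$ to a rate-1 Poisson process; in that case $\mu_n(0, b) = f_n^{-1}(b) = b^{1/s_n}$. A short sketch under this assumed choice of $f_n$ (names ours):

```python
import random

def fpp_weight_points(rng, s_n, k):
    """First k points of the Poisson process of FPP edge weights at a PWIT
    vertex, for the running example f_n(x) = x^{s_n}: apply f_n to the
    points of a rate-1 Poisson process.  The image measure mu_n then
    satisfies mu_n(0, b) = f_n^{-1}(b) = b^{1/s_n}."""
    pts, total = [], 0.0
    for _ in range(k):
        total += rng.expovariate(1.0)  # next point of the rate-1 process
        pts.append(total ** s_n)       # mapped forward by f_n
    return pts
```

Monotonicity of $f_n$ guarantees that the mapped points remain ordered, so no re-sorting is needed.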

Coupling One-Source Exploration to the PWIT
As in the analysis of the weights of the edges containing vertex 1, we now introduce a thinning procedure that allows us to couple $BP^{(1)}$ and $SWT^{(1)}$. Define $M_{\varnothing_1} = 1$, and to each other vertex $v \in T^{(1)}\setminus\{\varnothing_1\}$ associate a mark $M_v$ chosen independently and uniformly from $[n]$.
This definition also appears as [Part I, Definition 2.8]. As explained there, this definition is not circular, since whether or not a vertex $v$ is thinned can be assessed recursively in terms of earlier-born vertices. Write $\widetilde{BP}^{(1)}_t$ for the subgraph of $BP^{(1)}_t$ consisting of unthinned vertices.

Definition 3.7 Given a subset τ ⊂ T (1) and marks
Note that if the marks $(M_v)_{v\in\tau}$ are distinct, then $\pi_M(\tau)$ and $\tau$ are isomorphic graphs.
The following theorem, taken from [Part I, Theorem 2.10] and proved in [Part I, Sect. 3.3], establishes a close connection between FPP on $K_n$ and FPP on the PWIT with edge weights $(f_n(X_v))_{v\in\tau}$: Theorem 3.8 (Coupling to FPP on PWIT, one source) The law of $(SWT^{(1)}_t)_{t\geq 0}$ is the same as the law of $(\pi_M(\widetilde{BP}^{(1)}_t))_{t\geq 0}$.

Theorem 3.8 is based on an explicit coupling between the edge weights $(Y^{(K_n)}_e)_e$ on $K_n$ and $(X_v)_v$ on $T^{(1)}$. We will describe a related coupling in Sect. 3.5. A general form of those couplings is given in Sect. 5.
As in Proposition 3.5, the two-source FPP exploration process on $T^{(1,2)}$ with edge weights $(f_n(X_v))_v$ starting from $\varnothing_1$ and $\varnothing_2$ is equivalent to the union $BP = BP^{(1)} \cup BP^{(2)}$ of two CTBPs. (In the fluid-flow formulation, the additional rule that an edge is not explored if it would join the connected components containing the two sources does not apply.) Note that this two-CTBP thinning rule is applied simultaneously across both trees: for instance, a vertex $v \in T^{(1)}$ can be thinned due to an unthinned vertex $w \in T^{(2)}$. Henceforth we will be concerned with the two-CTBP version of thinning. Write $\widetilde{BP}_t$ for the subgraph of $BP_t = BP^{(1)}_t \cup BP^{(2)}_t$ consisting of unthinned vertices. The following theorem is a special case of Theorem 3.26: Theorem 3.10 (Coupling to FPP on PWIT, two sources) The law of $(SWT^{(1,2)}_t)_{t\geq 0}$ is the same as the law of $(\pi_M(\widetilde{BP}_t))_{t\geq 0}$.
We will not use Theorem 3.10, but instead rely on its generalization Theorem 3.26, since, in our setting, $BP^{(1)}$ and $BP^{(2)}$ can grow at rather different speeds. We will counteract this imbalance by an appropriate freezing procedure, as explained in more detail later on. Theorem 3.26 generalizes Theorem 3.10 to include this freezing.
We next state an equality in law for the collision time and collision edge. As a preliminary step, note that (3.8) can be rewritten in terms of the measure $\mu_n$ as in (3.12). By a Cox process with random intensity measure $Z$ (with respect to a $\sigma$-algebra $\mathscr{F}$) we mean a random point measure $\mathcal{P}$ such that $Z$ is $\mathscr{F}$-measurable and, conditionally on $\mathscr{F}$, $\mathcal{P}$ has the distribution of a Poisson point process with intensity measure $Z$. For notational convenience, given a sequence of intensity measures $Z_n$ on $\mathbb{R} \times \mathcal{X}$, for some measurable space $\mathcal{X}$, we write $Z_{n,t}$ for the measures on $\mathcal{X}$ defined by restricting $Z_n$ to times up to $t$. Thus (3.12) states that $T^{SWT}_{\mathrm{coll}}$ has the law of the first point of a Cox process on $\mathbb{R}$, where the intensity measure is given by a sum over $SWT^{(1,2;1)} \times SWT^{(1,2;2)}$ (see also [11, Proposition 2.3]). Using Theorem 3.10, we can lift this equality in law to apply to the collision time and collision edge, with the Cox process taken with respect to the $\sigma$-algebra generated by $BP$ and $(M_v)_{v\in T^{(1,2)}}$.

FPP on the PWIT as a CTBP
In this section, we relate FPP on the PWIT to a continuous-time branching process (CTBP). In Sect. 3.3.1, we investigate the exploration from one vertex and describe it in terms of one-vertex characteristics. In Sect. 3.3.2, we extend this to the exploration from two vertices and relate it to two-vertex characteristics of CTBPs, which will be crucial for analysing smallest-weight paths in FPP on $K_n$ explored from two sources.

FPP on the PWIT as a CTBP: One-Vertex Characteristics
In this section, we analyze the CTBP $BP^{(1)}$ introduced in Sect. 3.2. Notice that $(BP^{(1)}_t)_{t\geq 0}$ depends on $n$ through its offspring distribution. We therefore have to understand the coupled double asymptotics of $n$ and $t$ tending to infinity simultaneously.
Recall that we write $|v|$ for the generation of $v$ (i.e., its graph distance from the root in the genealogical tree). To count particles in $BP^{(1)}_t$, we use a non-random characteristic $\chi$. Following [11], define the generation-weighted vertex characteristic as in (3.15). We make the convention stated there whenever (3.16) has a unique solution. The parameters $\lambda_n$ and $\phi_n$ in Theorem 2.4 are given in terms of this solution, and the asymptotics of $\lambda_n$ and $\phi_n$ stated in Theorem 2.4 are the content of Lemma 3.12. In the following theorem, we investigate the asymptotics of such generation-weighted one-vertex characteristics: Theorem 3.13 (Asymptotics of one-vertex characteristics) Given $\varepsilon > 0$ and a compact subset $A \subset (0, 2)$, there is a constant $K < \infty$ such that, for all $n$ sufficiently large, the stated bounds hold uniformly for $a, b \in A$, for $\chi$ and $\eta$ bounded, non-negative, non-decreasing functions, and uniformly over $u, t \geq 0$. Corollary 3.14 (Asymptotics of mean and variance of the population size) The population size $|BP^{(1)}_t|$ satisfies $\mathbb{E}(|BP^{(1)}_t|) \sim s_n e^{\lambda_n(1)t}$ and $\operatorname{Var}(|BP^{(1)}_t|) \sim s_n^3 e^{2\lambda_n(1)t}/\log 2$ in the limit as $\lambda_n(1)t \to \infty$, $n \to \infty$. Theorem 3.13 is proved in Sect. 6.4. Generally, we will be interested in characteristics $\chi = \chi_n$ for which $\chi_n(\lambda_n(1)^{-1}\,\cdot\,)$ converges as $n \to \infty$, so that the integral in (3.20) acts as a limiting value. In particular, Corollary 3.14 is the special case $\chi = \mathbb{1}_{[0,\infty)}$, $a = 1$.
Since $s_n \to \infty$, Theorem 3.13 and Corollary 3.14 show that the variance of $\bar z^{\chi}_t(a^{1/s_n})$ is large compared to the square of the mean, by a factor of order $s_n$. This suggests that $|BP^{(1)}_t|$ is typically of order 1 when $\lambda_n(1)t$ is of order 1 [i.e., when $t$ is of order $f_n(1)$; see Lemma 3.12], but has probability of order $1/s_n$ of being of size of order $s_n^2$. See also [Part I, Proposition 2.17], which confirms this behavior.

FPP on the PWIT as a CTBP: Two-Vertex Characteristics
Theorem 3.11 expresses the collision time $T^{SWT}_{\mathrm{coll}}$ as the first point of a Cox process whose cumulative intensity is given by a double sum over two branching processes. To study such intensities, we introduce generation-weighted two-vertex characteristics. Let $\chi$ be a non-random, non-negative function on $[0,\infty)^2$, and recall that $T_v = \inf\{t \geq 0 : v \in BP_t\}$ denotes the birth time of vertex $v$ and $|v|$ the generation of $v$. The generation-weighted two-vertex characteristic is defined in (3.22), and so on; we make the convention in (3.23). In (3.13), the cumulative intensity $Z^{SWT}_{n,t}$ can be expressed in terms of a two-vertex characteristic: if we define $\chi_n$ as in (3.24), then the total cumulative intensity is given by (3.25). We will use the parameters $a_1, a_2$ to compute moment generating functions corresponding to $Z^{SWT}_{n,t}$. The characteristic $\chi_n$ will prove difficult to control directly, because its values fluctuate significantly in size. We therefore introduce a truncated version: fix $K > 0$, let $\mu^{(K)}_n$ denote the corresponding truncated measure, and again write $\mu^{(K)}_n((a,b]) = \mu^{(K)}_n(a,b)$ to shorten notation. For convenience, we will always assume that $n$ is large enough that $s_n \geq K$. By analogy with (3.24), define the truncated characteristic $\chi^{(K)}_n$. By construction, the total mass of $\mu^{(K)}_n$ is $2K/s_n$, so that $s_n\chi^{(K)}_n$ is uniformly bounded.
The following results identify the asymptotic behavior of $\bar z^{\chi^{(K)}_n}_{t_1,t_2}(\vec a)$ and show that, for $K \to \infty$, the contribution due to $\chi_n - \chi^{(K)}_n$ becomes negligible. These results are formulated in Theorem 3.15, which investigates the truncated two-vertex characteristic, and Theorem 3.16, which studies the effect of truncation: Theorem 3.15 (Convergence of truncated two-vertex characteristic) For every $\varepsilon > 0$ and every compact subset $A \subset (0, 2)$, there exists a constant $K_0 < \infty$ such that for every $K \geq K_0$ there are constants $K' < \infty$ and $n_0 \in \mathbb{N}$ such that (3.28) holds for all $n \geq n_0$ and all $a_1, a_2 \in A$, where $\zeta : (0, \infty) \to \mathbb{R}$ is the continuous function defined in (3.29). The exponents in Theorem 3.15 can be understood as follows. By Theorem 3.13, the first and second moments of a bounded one-vertex characteristic are of order $s_n$ and $s_n^3$, respectively. Therefore, for two-vertex characteristics, one can expect orders $s_n^2$ and $s_n^6$. Since $\chi^{(K)}_n = \frac{1}{s_n}(s_n\chi^{(K)}_n)$ appears once in the first and twice in the second moment, we arrive at $s_n$ and $s_n^4$, respectively.
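The bookkeeping behind these exponents can be recorded explicitly, using that $s_n\chi^{(K)}_n$ is uniformly bounded:

```latex
% One-vertex characteristics (Theorem 3.13): first moment \asymp s_n,
% second moment \asymp s_n^3; hence for two-vertex characteristics one
% expects \asymp s_n^2 and \asymp s_n^6.  Writing
% \chi^{(K)}_n = s_n^{-1}\,(s_n\chi^{(K)}_n) with s_n\chi^{(K)}_n = O(1)
% contributes one factor s_n^{-1} per occurrence of \chi^{(K)}_n:
s_n^2 \cdot s_n^{-1} = s_n \quad\text{(first moment)},
\qquad
s_n^6 \cdot \bigl(s_n^{-1}\bigr)^2 = s_n^4 \quad\text{(second moment)}.
```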

CTBP Growth and the Need for Freezing: Medium Time Scales
Theorem 3.11 shows how to analyze the weight $W_n$ and hopcount $H_n$ in terms of a Cox process driven by two ($n$-dependent) branching processes. In this section, we describe how this analysis works when the branching processes grow normally (i.e., exponentially with a fixed prefactor). In the CTBP scaling results from Sect. 3.3, we have seen that the class of edge weights we consider gives rise to a more complicated scaling, with $n$-dependent prefactors that diverge to infinity. As we will explain, this causes a direct analysis to break down, and we define an appropriate freezing mechanism to overcome this obstacle. In Sect. 3.4.1, we first explain what we mean by freezing and why we need it, and in Sect. 3.4.2 we explain how FPP from two sources can be frozen, and later unfrozen, in such a way that CTBP asymptotics can be used and collision times between the two FPP clusters can be analyzed.

Frozen FPP Exploration Process
Under reasonable hypotheses, a fixed CTBP grows exponentially under all measures of size. More precisely, there will be a single constant $\lambda$ such that $m^\eta_t(1) \sim A_\eta e^{\lambda t}$ and $M^{\eta,\eta}_{t,t}(1,1) \sim B_\eta e^{2\lambda t}$ for constants $A_\eta, B_\eta$, over a wide class of one-vertex characteristics $\eta : [0,\infty) \to \mathbb{R}$, including the choice $\eta = 1$ that encodes the population size. Similarly, if $\chi$ is a two-vertex characteristic, we can expect that $m^{\chi}_{t,t}$ grows like $e^{2\lambda t}$. In view of (3.25), we would then expect the first point of the Cox process from Theorem 3.11 to appear at times $t$ for which $e^{2\lambda t} \approx n$. For such $t$, we have $e^{\lambda t} \approx \sqrt{n}$, so that each branching process has of order $\sqrt{n}$ individuals and a typical individual is of order $\log n$ generations away from the root.
If these asymptotics hold, then a typical vertex $v$ alive at such times $t$ is unthinned whp. Indeed, for each of the $\approx \log n$ ancestors of $v$, there are at most $\approx \sqrt{n}$ other vertices that might have the same mark. Each pair of vertices has probability $1/n$ of having the same mark, leading to an upper bound of $\approx \sqrt{n}\log n / n$ on the probability that $v$ is thinned. In particular, the first point of the Cox process is whp also the first unthinned point of $\mathcal{P}^{SWT}_n$. Using Theorem 3.11, it is therefore possible to derive asymptotics of $W_n$ and $H_n$ by analysing the first point of $\mathcal{P}^{SWT}_n$, which in turn can be done by a first- and second-moment analysis of the intensity measure $Z^{SWT}_{n,t}$. In the setting of this paper, however, this analysis breaks down. The branching processes now themselves depend on $n$, and their behaviour becomes irregular when $n$ is large. One-vertex characteristics now satisfy the asymptotics of Theorem 3.13, and the mismatch of prefactors, $s_n$ versus $s_n^3$, suggests that the branching process has probability of order $1/s_n$ of growing to size $s_n^2$ in a time of order $1/\lambda_n(1) \approx f_n(1)$, and that this unlikely event is important to the long-run growth of the branching process.
We can balance the mismatched first and second moments by aggregating $s_n$ independent copies of the branching process. The sum of $s_n$ independent copies of $z^\eta_t(1)$ will have mean of order $A_\eta s_n^2 e^{\lambda_n(1)t}$ and second moment of order $B_\eta s_n^4 e^{2\lambda_n(1)t}$, where now the second moment is of the order of the square of the mean. [With proper attention to correlations, it is also possible to show that the two-vertex characteristics $z^{\chi_n}_{t,t}(1)$, summed over two groups of $s_n$ independent branching processes each, will have mean of order $C s_n^3 e^{2\lambda_n(1)t}$ and second moment of order $D s_n^6 e^{4\lambda_n(1)t}$.] This balancing makes a first- and second-moment analysis possible. To achieve the same effect starting from two branching processes, we wait until each branching process is large enough that it produces of order $s_n$ new children in time of order 1. Then the collection of all individuals born after that time (and their descendants) will again have balanced first and second moments. However, as we will see, the time when each branching process becomes large enough is highly variable. In particular, by the time the slower-growing of the two branching processes is large enough, the faster-growing branching process will have become much too large. For this reason, we will need to freeze the faster-growing branching process to allow the other to catch up. In the following sections we explain how freezing affects the FPP exploration process, the coupling to the PWIT, the Cox process representation for the optimal path, and the effect of thinning. We now first explain precisely how we freeze our two branching processes.
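The balancing step is a one-line variance computation. Writing $Z_t$ for the sum of $s_n$ i.i.d. copies of $z^\eta_t(1)$ and using the single-copy asymptotics from Corollary 3.14:

```latex
% Single copy:  E[z^{\eta}_t(1)]   \sim A_\eta\, s_n   \mathrm{e}^{\lambda_n(1) t},
%               Var(z^{\eta}_t(1)) \sim B_\eta\, s_n^3 \mathrm{e}^{2\lambda_n(1) t}.
% Sum of s_n independent copies:
\mathbb{E}[Z_t] \sim A_\eta\, s_n^2\, \mathrm{e}^{\lambda_n(1) t},
\qquad
\operatorname{Var}(Z_t) = s_n \operatorname{Var}\!\bigl(z^{\eta}_t(1)\bigr)
\sim B_\eta\, s_n^4\, \mathrm{e}^{2\lambda_n(1) t}
\asymp \mathbb{E}[Z_t]^2 ,
```

so for the aggregated process the variance and the squared mean are of the same order, which is exactly what a first- and second-moment analysis requires.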
The choice of the freezing times $T^{(j)}_{\mathrm{fr}}$ must attain two goals. First, we must ensure that, at the collision time $T_{\mathrm{coll}}$, the two branching processes with freezing are of comparable size (see Theorem 3.33 and the discussion following it). Second, we must ensure that, after the freezing times, the branching processes grow predictably, with the relatively steady exponential growth typical of supercritical branching processes (in spite of Theorem 3.13 and Corollary 3.14, where the mismatch between mean and variance shows that a branching process started from a single initial individual has highly variable growth).
It has been argued in [Part I, Sect. 2.5] that the crossover to typical branching process behavior occurs when we begin to discover "lucky" vertices that have a large number of descendants (of order $s_n^2$) in a time of order $f_n(1)$. From [Part I, Theorem 2.15], we can see that this crossover coincides approximately with several other milestones: for instance, around the same time, the branching process also reaches height of order $s_n$ and total size of order $s_n^2$. For our purposes, it will be most important to control moments of the branching process after unfreezing, which will involve exponentially discounting future births at rate $\lambda_n(1)$.
These considerations lead to the following definition of the freezing times: Definition 3.17 (Freezing) Define, for $j = 1, 2$, the freezing times as in (3.31), and the unfreezing time $T_{\mathrm{unfr}} = T^{(1)}_{\mathrm{fr}} \vee T^{(2)}_{\mathrm{fr}}$; the frozen cluster is given by (3.32). The quantity appearing in (3.31) measures the expected number of future offspring of the vertices in $BP^{(j)}_t$, exponentially time-discounted at rate $\lambda_n(1)$. In Definition 3.17, this expected discounted number (summed over all $v \in BP^{(j)}_t$) is required to exceed $s_n$. This is the correct choice of scaling, because each newly born vertex has probability of order $1/s_n$ of being "lucky", i.e., of having of order $s_n^2$ descendants in time $f_n(1)$ (see [Part I, Definition 2.14 and Proposition 2.17]), and the scaling $s_n$ in Definition 3.17 ensures that such a lucky vertex will be born in time $O_{\mathbb P}(f_n(1))$ after unfreezing.
Recall that $M^{(1)}, M^{(2)}$ are i.i.d. random variables for which $\mathbb P(M^{(j)} \leq x)$ is the survival probability of a Poisson Galton-Watson branching process with mean $x$. The asymptotics of the freezing times $T^{(j)}_{\mathrm{fr}}$ and the frozen cluster $B_{\mathrm{fr}}$ are given in Theorem 3.18 (Properties of the freezing times and frozen cluster). We expect, but do not prove, that the bounds in parts (b) and (c) are of the correct order, i.e., that the volume is of order $s_n^2$ and the diameter of order $s_n$, in probability. The proof of Theorem 3.18 is based on [Part I, Theorem 2.15] and is given in Sect. 9.5.
Since $M^{(1)} \neq M^{(2)}$ a.s., Theorem 3.18 (a) and the scaling properties of $f_n$ confirm that the two CTBPs $BP^{(1)}$ and $BP^{(2)}$ require substantially different times to grow large enough. Theorem 3.18 (b) and (c) will allow us to ignore the contributions coming from the frozen cluster in the proof of Theorem 2.4. For instance, part (c) shows that heights within the frozen cluster are negligible in the central limit theorem scaling of (2.10).
From the proof of Theorem 2.4, we will see that (3.33) holds. [The presence of a logarithm in (3.33) reflects the fact that, after $T_{\mathrm{unfr}}$, the branching processes grow exponentially.] The effects of the three terms in (3.33) can be combined using the following lemma, stated in terms of $(M^{(1)}, M^{(2)})$; Lemma 3.19 is proved in Sect. 4. Theorem 3.18 (a) and Lemma 3.19 identify the limiting distribution of $f_n^{-1}(T^{(1)}_{\mathrm{fr}} + T^{(2)}_{\mathrm{fr}})$ in terms of $M^{(1)}, M^{(2)}$. Because $M^{(1)}, M^{(2)} > 1$ a.s., we will be able to ignore the term $O_{\mathbb P}(f_n(1))$, and the scaling of $W_n$ in Theorem 2.4 will follow.

FPP Exploration Process from Two Sources with Freezing and Collisions
Let $T^{(j)}_{\mathrm{fr}}$ be the freezing times defined in Definition 3.17, which are stopping times with respect to the filtration induced by $BP$. In words, we run the two branching processes $BP = BP^{(1)} \cup BP^{(2)}$ normally until the first freezing time $T^{(1)}_{\mathrm{fr}} \wedge T^{(2)}_{\mathrm{fr}}$, when one of the two branching processes has become large enough. Then we freeze the larger CTBP and allow the smaller one to evolve normally until it, too, is large enough, at the unfreezing time $T_{\mathrm{unfr}} = T^{(1)}_{\mathrm{fr}} \vee T^{(2)}_{\mathrm{fr}}$. At this time, both CTBPs resume their usual evolution. The processes $R_j(t)$ are the on-off processes that encode this behaviour: $R_j(t)$ increases at constant rate 1, except on the interval between $T^{(1)}_{\mathrm{fr}} \wedge T^{(2)}_{\mathrm{fr}}$ and $T_{\mathrm{unfr}}$, where one of the two processes is constant. In the fluid flow picture, $R_j(t)$ represents the distance traveled by fluid from vertex $j \in \{1, 2\}$. We call the process $(B_t)_{t\geq 0}$ the two-source branching process with freezing.
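Given the freezing times, the on-off processes $R_j$ have a simple explicit form, transcribed below (the function name is ours):

```python
def R(t, t_fr_own, t_fr_other):
    """On-off process R_j(t): the distance traveled by fluid from source j
    by time t.  R_j increases at rate 1, except on [T^(j)_fr, T_unfr],
    where the earlier-frozen process is constant; for the later-freezing
    process this interval is empty, so it never pauses."""
    t_unfr = max(t_fr_own, t_fr_other)  # unfreezing time
    if t <= t_fr_own:
        return t
    if t <= t_unfr:
        return t_fr_own                 # frozen: fluid from j is paused
    return t_fr_own + (t - t_unfr)      # both processes resume at rate 1
```

Note that $R_1(t) + R_2(t)$ increases at rate 2 before freezing and after unfreezing, and at rate 1 in between, which is how the freezing window enters the weight decomposition.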
As with $T^{(1,2)}$, we can consider $B_t$ to be the union of two trees by placing an edge between each non-root vertex $v \notin \{\varnothing_1, \varnothing_2\}$ and its parent. We next define the two-source FPP exploration process with freezing on $K_n$, which we will denote $(S_t)_{t\geq 0}$. Intuitively, $S_t$ is the analogue of $SWT^{(1,2)}_t$ under the assumption that fluid from vertex $j$ has flowed a distance $R_j(t)$ by time $t$. As with $SWT^{(1,2)}$, fluid from one vertex blocks fluid from the other vertex, so that $S_t$ will consist of two vertex-disjoint trees for all $t$. However, because fluid from one vertex may be frozen while still blocking fluid from the other vertex, it will no longer be possible to directly specify the vertex set $V(S_t)$ as in (3.4). Instead we will define $S = (S_t)_{t\geq 0}$ inductively using $R_j^{-1}$, in such a way that at every time $t \geq 0$, $S_t = S^{(1)}_t \cup S^{(2)}_t$ is the disjoint union of two trees $S^{(1)}_t$ and $S^{(2)}_t$ with roots 1 and 2, respectively. At time $t = 0$, let $S_0$ be the subgraph of $K_n$ with vertex set $\{1, 2\}$ and no edges. Suppose inductively that we have constructed $(S_t)_{0\leq t\leq\tau_{k-1}}$ up to the time $\tau_{k-1}$ at which the $(k-1)$st vertex (not including the vertices 1 and 2) was added, for $1 \leq k \leq n - 2$, with the convention $\tau_0 = 0$. Consider the set $\partial S_{\tau_{k-1}}$ of edges $e$ joining a previously explored vertex to a new vertex; at the time $\tau_k$ defined in (3.39), we add the edge $e_k$ that attains the minimum in (3.39). Our assumptions on the edge weights $Y^{(K_n)}_e$ and the processes $R_1, R_2$ will imply that this minimum, and in addition the minimum with edges between $S^{(1)}$ and $S^{(2)}$ included, are uniquely attained a.s. We set $S_t = S_{\tau_{k-1}}$ for $\tau_{k-1} \leq t < \tau_k$, and we define $S_{\tau_k}$ to be the graph obtained by adjoining $e_k$ to $S_{\tau_{k-1}}$.
In the case $R_1(t) = R_2(t) = t$, $S$ coincides with the two-source smallest-weight tree $SWT^{(1,2)}$. In general, because the processes $R_1, R_2$ increase at variable speeds, the relationship between $S$, the arrival times $T^{S}_i$, and the FPP distances is more involved. However, we have the following analogue of (3.7): Lemma 3.20 (Minimal-weight representation) The weight of the optimal path $\pi_{1,2}$ from vertex 1 to vertex 2 is given by

(3.41)
and the minimum is attained uniquely a.s.
The conclusion of Lemma 3.20 is easily seen when $R_1(t) = t$, $R_2(t) = 0$ (in which case $S^{(1)}$ is the same as $SWT^{(1)}$ with vertex 2 removed) or when $R_1(t) = R_2(t) = t$ (in which case $S$ reduces to $SWT^{(1,2)}$). The proof of Lemma 3.20 in general requires some care and is given in Sect. 5.3. The equality in (3.41) will be the basis of our analysis of $W_n$.

Definition 3.21 The collision time $T_{\mathrm{coll}}$ is defined by (3.42).
The collision edge is the edge between the vertices $I_1 \in S^{(1)}$ and $I_2 \in S^{(2)}$ that attain the minimum in (3.41). We denote by $H(I_1), H(I_2)$ the graph distance between 1 and $I_1$ in $S^{(1)}$ and between 2 and $I_2$ in $S^{(2)}$, respectively.

Theorem 3.22 (Exploration process at the collision time)
The following statements hold almost surely: the endpoints $I_1, I_2$ of the collision edge are explored before time $T_{\mathrm{coll}}$; the optimal path $\pi_{1,2}$ from vertex 1 to vertex 2 is the union of the unique path in $S^{(1)}_{T_{\mathrm{coll}}}$ from 1 to $I_1$, the collision edge $\{I_1, I_2\}$, and the unique path in $S^{(2)}_{T_{\mathrm{coll}}}$ from $I_2$ to 2; and the weight and hopcount satisfy (3.43). Theorem 3.22 is proved in Sect. 5.3. The first equality in (3.43) is a simple consequence of the continuity of $t \mapsto R_j(t)$ and the definition of $T_{\mathrm{coll}}$ in Definition 3.21.

Remark 3.23
The values of the process $(S_t)_{t\geq 0}$ depend implicitly on the choice of the processes $R_1, R_2$, which we have defined in terms of $B$. Similarly, the value of $(T_{\mathrm{coll}}, I_1, I_2)$ depends on the on-off processes $R_1(t), R_2(t)$, as well as on the edge weights $Y^{(K_n)}_e$. In particular, the law of $(T_{\mathrm{coll}}, I_1, I_2)$ will depend on the relationship between $B$ and the edge weights $(Y^{(K_n)}_e)_e$, which we will specify in Sect. 3.5. However, regardless of the laws of $S$ and of $(T_{\mathrm{coll}}, I_1, I_2)$, the laws of $W_n$ and $H_n$ are the same.

Coupling FPP on K n from Two Sources to a CTBP
In this section, we revisit the coupling of FPP on $K_n$ from two sources to a CTBP, which we initialized in Sect. 3.2, using the tools in Sect. 3.3 on vertex characteristics, as well as the freezing, unfreezing and collisions discussed in Sect. 3.4. In Sect. 3.5.1, we give the final conclusions of the coupling including freezing, and in Sect. 3.5.2, we relate the law of the weight of the smallest-weight path and its number of edges to a certain Cox process of collisions.

The Final Coupling Including Freezing
Similarly to Theorem 3.8, we next couple the FPP process $S$ on $K_n$ and the FPP process $B$ on $T^{(1,2)}$. To this end, we introduce a thinning procedure for $B$. The difference between Definition 3.24 and its closely related cousin, Definition 3.9, is that Definition 3.24 includes freezing, which, as explained in Sect. 3.4, is crucial for our analysis. As with Definition 3.9, this definition is not circular, as vertices are investigated in their order of appearance. Write $\widetilde B_t$ for the subgraph of $B_t$ consisting of unthinned vertices.
From here onwards, we will work on a probability space that contains the PWITs together with their marks, as well as a family of independent exponential random variables $E_e$, $e \in E(K_\infty)$, with mean 1, independent of the PWITs and the marks.
On this probability space, we can construct the FPP edge weights on $K_n$ as follows. Let $T_B(i)$ be the first time that a vertex with mark $i$ appears in $B$, and denote the corresponding vertex by $V(i) \in T^{(1,2)}$. Note that $T_B(i)$ is finite for all $i$ almost surely, since the FPP exploration process eventually explores every edge. For every edge $\{i, i'\} \in E(K_n)$, we define the edge weight as in (3.46). The following proposition states that the random variables in (3.46) can be used to produce the correct edge weights on $K_n$ for our FPP problem: Proposition 3.25 shows that, subject to (3.46), these edge weights have the desired distribution. We next explain the relationship between $B$ and $S$. Recall the subgraph $\pi_M(\tau)$ of $K_n$ introduced in Definition 3.7, which we extend to the case where $\tau \subset T^{(1,2)}$.
The proof is given in Sect. 5.4. Even though Theorem 3.26 is closely related to Theorem 3.10, it will be the version in Theorem 3.26 that we rely upon in our technical proofs. We now discuss its importance in more detail.
Theorem 3.26 achieves two goals. First, it relates the exploration process B, defined in terms of two infinite underlying trees, to the smallest-weight tree process S, defined in terms of a single finite graph. Because thinning gives an explicit coupling between these two objects, we will be able to control its effect, even when the total number of thinned vertices is relatively large. Consequently we will be able to study the FPP problem by analyzing a corresponding problem expressed in terms of B (see Theorems 3.27 and 3.28) and showing that whp thinning does not affect our conclusions (see Theorem 3.33).
Second, Theorem 3.26 allows us to relate FPP on the complete graph ($n$-independent dynamics run on an $n$-dependent weighted graph) to an exploration defined in terms of a pair of Poisson-weighted infinite trees ($n$-dependent dynamics run on an $n$-independent weighted graph). By analyzing the dynamics of $B$ when $n$ and $s_n$ are large, we obtain a fruitful dual picture: when the number of explored vertices is large, we find a dynamic rescaled branching process approximation that is essentially independent of $n$; when the number of explored vertices is small, we make use of a static approximation by invasion percolation found in [21]. In fact, under our scaling assumptions, FPP on the PWIT is closely related to invasion percolation (IP) on the PWIT, which is defined as follows. Set $IP^{(1)}(0)$ to be the subgraph consisting of $\varnothing_1$ only. For $k \in \mathbb{N}$, form $IP^{(1)}(k)$ inductively by adjoining to $IP^{(1)}(k-1)$ the boundary vertex $v \in \partial IP^{(1)}(k-1)$ of minimal edge weight. We note that, since we consider only the relative ordering of the various edge weights, we can use either the PWIT edge weights $(X_v)_v$ or the FPP edge weights $(f_n(X_v))_v$.
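Despite every vertex having infinitely many children, $IP^{(1)}$ can be simulated exactly with a heap by generating children lazily: a sibling heavier than a still-uninvaded vertex can never be the boundary minimum, so the next sibling of $v$ need only be pushed once $v$ itself is invaded. A sketch (names ours):

```python
import heapq
import random

def invasion_percolation(rng, steps):
    """Run IP^(1) on the PWIT for `steps` steps, returning the invaded
    edge weights in invasion order.  The heap holds the currently known
    boundary; siblings of each vertex (successive points of the parent's
    rate-1 Poisson process) are generated one at a time."""
    next_id = 1
    heap = [(rng.expovariate(1.0), 0)]  # first child of the root
    invaded = []
    for _ in range(steps):
        w, v = heapq.heappop(heap)      # boundary vertex of minimal weight
        invaded.append(w)
        # next sibling of v: next point of the parent's Poisson process
        heapq.heappush(heap, (w + rng.expovariate(1.0), next_id))
        next_id += 1
        # first child of v: first point of v's own Poisson process
        heapq.heappush(heap, (rng.expovariate(1.0), next_id))
        next_id += 1
    return invaded
```

The sequence of invaded weights is not monotone, but its running maximum stabilizes, in line with the remark below that $IP^{(1)}(\infty)$ is a strict subgraph of $T^{(1)}$.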
Write IP^{(1)}(∞) = ⋃_{k=1}^∞ IP^{(1)}(k) for the limiting subgraph. We remark that IP^{(1)}(∞) is a strict subgraph of T^{(1)} a.s. (in contrast to FPP, which eventually explores every edge). Indeed, define M^{(1)} = sup{X_v : v ∈ IP^{(1)}(∞)}, the largest edge weight explored by invasion percolation on T^{(1)}. Consequently, (2.9) in Theorem 2.4 can be read as a decomposition of the weight W_n of the smallest-weight path into a deterministic part (1/λ_n) log(n/s_n^3) coming from the branching process dynamics, and the weight f_n(M^{(1)} ∨ M^{(2)}) of the largest edge explored by invasion percolation starting from two sources.
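The inductive definition of IP^{(1)} above can be simulated lazily: since the child weights of each vertex form the points of a rate-1 Poisson process, revealed in increasing order, it suffices to keep on a heap, for each explored vertex, only its smallest not-yet-revealed child. The following sketch is our own illustration (names and parameters are not from the paper) of IP on a single PWIT.

```python
import heapq
import random

def ip_on_pwit(steps, seed=0):
    """Lazily run invasion percolation on a Poisson-weighted infinite
    tree: each vertex's child edge weights are the points of a rate-1
    Poisson process, i.e. cumulative sums of Exp(1) variables, revealed
    in increasing order.  At every step the boundary vertex of minimal
    weight is invaded.  Only each parent's smallest unrevealed child
    needs to sit on the heap: heavier siblings cannot be minimal yet."""
    rng = random.Random(seed)
    # heap entries: (edge weight, unique id, parent id); the root has id -1
    heap = [(rng.expovariate(1.0), 0, -1)]   # first child of the root
    next_id = 1
    invaded_weights = []
    while len(invaded_weights) < steps:
        w, v, p = heapq.heappop(heap)
        invaded_weights.append(w)
        # reveal v's first child (first Poisson point above 0)
        heapq.heappush(heap, (rng.expovariate(1.0), next_id, v))
        next_id += 1
        # reveal v's next sibling (next Poisson point of the parent)
        heapq.heappush(heap, (w + rng.expovariate(1.0), next_id, p))
        next_id += 1
    return invaded_weights
```

Consistent with IP^{(1)}(∞) being a strict subtree, the invaded weights remain bounded: retaining only edges of weight at most λ yields a Poisson(λ) branching process, so weights far above the critical value 1 are essentially never invaded after the initial steps.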

A Cox Process for the Collisions
Similarly to Theorem 3.11, we can relate the collision time and collision edge to a Cox process driven by B. To state this result, we will use a slightly different coupling between B and K n than the one just described. More precisely, we alter the definition (3.36) of B and work with a copy having the same law. This alteration only affects the thinned part B \ B, and the pointwise relationships in (3.46) and Theorem 3.26 continue to apply. As discussed in Remark 3.23, this change affects the law of (S, T coll , I 1 , I 2 ) but not of W n and H n . The full details can be found in Sect. 5.5.

Theorem 3.27 (A Cox process for the collision edges with freezing) Let P_n be a Cox process with intensity measure given by (3.49). Let (T^{(P_n)}_coll, V^{(1)}_coll, V^{(2)}_coll) denote the first point of P_n for which V^{(1)}_coll and V^{(2)}_coll are unthinned. Then, for a suitable coupling, the law of (T^{(P_n)}_coll, V^{(1)}_coll, V^{(2)}_coll), together with the induced quantities, is the same as the joint law of the collision time T_coll; the optimal weight W_n; the smallest-weight tree S_{T_coll} at time T_coll; and the endpoints I_1, I_2 of the collision edge. In particular, the hopcount H_n has the same distribution as |V^{(1)}_coll| + |V^{(2)}_coll| + 1, where |v| denotes the height of the vertex v.
Theorem 3.27 is the version of Theorem 3.11 that includes freezing. Sketch of the proof By Lemma 3.20, Theorem 3.22 and the fact that , the conditional law of such birth times is Poisson with intensity given by (3.48).
To complete the proof, it remains to ensure that knowledge does not reveal any other information about the birth times used to determine the connecting edge weights Y (Kn ) {i 1 ,i 2 } . We will accomplish this by redefining B and R 1 , R 2 so that they use a conditionally independent copy of those birth times. See Sect. 5.5 for more details and the full proof of Theorem 3.27.
Theorem 3.27 means that we can study first passage percolation on K n by studying a CTBP problem and then controlling the effect of thinning. In fact, with Theorem 3.27 in hand, we will no longer need to refer to K n at all.
The remainder of the proof of Theorem 2.4 consists in applying Theorems 3.26 and 3.27. In Sect. 3.6, we first study the properties of the collision edges ignoring the thinning, and in Sect. 3.7, we show that whp the first collision edge is not thinned, which concludes the proof.

The Collision Edge and Its Properties: Long Time Scales
Theorem 3.27 expresses the collision edge in terms of the first unthinned point of P_n. We begin by stating the asymptotic behavior of the first point (whether thinned or not) of P_n: Theorem 3.28 (The first point of the Cox process) Let P_n be the Cox process in Theorem 3.27, and let (T_first, V^{(1)}_first, V^{(2)}_first) denote its first point. Then the pair of recentred and rescaled heights of (V^{(1)}_first, V^{(2)}_first) converges in distribution to a pair of independent normal random variables of mean 0 and variance 1/2, and is asymptotically independent of T_first and of B.
The proof of Theorem 3.28, presented at the end of the current section, is based on a general convergence result for Cox processes, which we now describe. Consider a sequence of Cox processes (P*_n)_n on R × R^2 with random intensity measures (Z*_n)_n, with respect to σ-fields (F_n)_n. We will write P*_{n,t} for the restriction of P*_n to times at most t, and let A_{n,k} be the event that T*_{n,j} ∉ {±∞} and |P*_{n,T*_{n,j}}| = j, for j = 1, …, k. That is, A_{n,k} is the event that the points of P*_n with the k smallest t-values are uniquely defined. On A_{n,k}, let X_{n,k} denote the unique point for which P*_n({T*_{n,k}} × {X_{n,k}}) = 1, and otherwise set X_{n,k} = †, an isolated cemetery point.
The following theorem gives a sufficient condition for the first points of such a Cox process to converge towards independent realizations of a probability measure Q. To state it, we write R̂(ξ) = ∫ e^{⟨ξ,x⟩} dR(x) for the moment generating function of a measure R on R^d, evaluated at ξ ∈ R^d.

(3.57)
all this with probability at least 1 − ε for n sufficiently large; and (b) for each ε > 0, there exists t̄ such that … . Then the random sequence (X_{n,j})_{j=1}^∞ converges in distribution to an i.i.d. random sequence with marginal law Q. Theorem 3.29 is proved in Sect. 8. To apply Theorem 3.29, we will rescale and recentre both time and the heights of vertices. Furthermore, we will remove the effect of the frozen cluster B_fr. Define P*_n to be the image of P_n under the mapping

Theorem 3.31 (Our collision Cox process is nice)
The point measures (P*_n)_n are Cox processes and satisfy the hypotheses of Theorem 3.29, where Q is the law of a pair of independent N(0, 1/2) random variables, q(t*) = e^{2t*}, and F_n is the σ-field generated by the frozen cluster B_fr.
We prove Theorem 3.31 in Sect. 9.3. All the vertices relevant to P * n are born after the unfreezing time T unfr , and therefore appear according to certain CTBPs. Theorem 3.31 will therefore be proved by a first and second moment analysis of the two-vertex characteristics from Sect. 3.3.2.
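As a toy illustration of why the first point of such a Cox process has a non-degenerate limit, consider a Cox process on R whose random intensity has density A e^{2t} dt, mimicking the deterministic part q(t*) = e^{2t*} modulated by a random driving variable A (here a hypothetical stand-in for the F_n-measurable contribution of the frozen cluster, not the process of Theorem 3.31). Conditionally on A the process is Poisson, so its first point can be sampled by inversion.

```python
import math
import random

def first_cox_point(rng):
    """Sample (T, A): A is the random driving variable, and T is the
    first point of a Cox process on R with intensity A * exp(2t) dt.
    The cumulative intensity up to t is A * exp(2t) / 2, so
    P(T > t | A) = exp(-A * exp(2t) / 2), and inversion gives
    T = 0.5 * log(-2 log U / A) for U uniform on (0, 1)."""
    A = 0.1 + rng.expovariate(1.0)   # hypothetical random driver, bounded away from 0
    U = rng.random()
    T = 0.5 * math.log(-2.0 * math.log(U) / A)
    return T, A
```

Multiplying A by a constant c merely translates T by −(log c)/2: the random environment recentres the first point but does not spread it out, which is the heuristic behind the asymptotic independence in Theorem 3.28.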
To use Theorem 3.31 in the proof of Theorem 3.28, we will show that the first point (T*_first, H*_1, H*_2) of P*_n and the first point (T_first, V^{(1)}_first, V^{(2)}_first) of P_n are whp related as in (3.59)–(3.60). This will follow from part (b) of the following lemma, which we will prove in Sects. 9.2 and 9.4: Lemma 3.32 Let K < ∞ and t̄ = T_unfr + λ_n(1)^{-1}(½ log(n/s_n^3) + K). Then (a) |B_t̄| = O_P(√(n s_n)); and (b) … . Assuming Lemma 3.32 (b) and Theorem 3.31, we can now prove Theorem 3.28: Proof of Theorem 3.28 By construction, the first point … . Theorems 3.29 and 3.31 imply that T*_first = O_P(1), so that T̄ = T_unfr + λ_n(1)^{-1}(½ log(n/s_n^3) + O_P(1)) by (3.60). We may therefore apply Lemma 3.32 (b) to conclude that P_n([0, T̄] × B^{(1)}_fr × B^{(2)}_fr) = 0 whp. In particular, whp, (T̄, V̄_1, V̄_2) equals the first point (T_first, V^{(1)}_first, V^{(2)}_first) of P_n, and the claim then follows from the convergence of the first point of P*_n. In Theorem 3.28, the heights are to be rescaled as in (3.52) rather than (3.59).
However, these differ only by the term p_unfr(V^{(j)}_first)/(s_n √(log(n/s_n^3))). By Theorem 3.18 (c) and the construction of B_fr, this term is o_P(1). Finally, the asymptotic independence statements follow from those in Theorem 5.3, and (3.51) follows from the tightness of T*_first.

Thinning and Completion of the Proof
In this section, we show that the first point of the Cox process is whp unthinned, and conclude our main results: Theorem 3.33 (First point of Cox process is whp unthinned) Let P_n be the Cox process in Theorem 3.27, and let (T_first, V^{(1)}_first, V^{(2)}_first) denote its first point. Then V^{(1)}_first and V^{(2)}_first are whp unthinned. Consequently, whp, T^{(P_n)}_coll = T_first. Proof According to Definition 3.24, the vertex V^{(j)}_first, j ∈ {1, 2}, will be thinned if and only if some non-root vertex w born earlier carries the same mark. We obtain an upper bound by dropping the requirement that w should be unthinned and relaxing the condition T_B(w) < T_B(v_0) to T_B(w) ≤ T_first and w ≠ v_0. Each such pair of vertices (v_0, w) has conditional probability 1/n of having the same mark, so a union bound applies, and the resulting upper bound is o_P(1) since n/s_n^3 → ∞.
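The union bound in this proof is a birthday-problem estimate: each of roughly |B| earlier vertices independently carries the same mark as the given vertex with probability 1/n. A quick Monte Carlo sketch (the parameters are illustrative, not the paper's) confirms that the probability that the last of k uniformly marked vertices repeats an earlier mark is dominated by the union bound (k−1)/n.

```python
import random

def estimate_thinning_prob(n, k, trials, seed=0):
    """Assign i.i.d. uniform marks from {0, ..., n-1} to k vertices and
    estimate the probability that the k-th vertex repeats an earlier
    mark (the event that it would be thinned).  The exact value is
    1 - (1 - 1/n)**(k-1), which the union bound (k-1)/n dominates."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        earlier = {rng.randrange(n) for _ in range(k - 1)}
        if rng.randrange(n) in earlier:
            hits += 1
    return hits / trials
```

With k of order √(n s_n) vertices (Lemma 3.32 (a)) the same computation gives a collision probability for a fixed vertex of order √(s_n/n), which vanishes precisely because n/s_n^3 → ∞.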
Note that other choices of R_1(t), R_2(t) would make Theorem 3.33 false. For the first point of P_n to appear, the intensity measure Z_{n,t}, which is given by 1/n times a sum over B^{(1)} × B^{(2)}, must grow to order 1. This is in contrast to n-independent edge distributions such as the E^s edge weights considered in [11], where the exploration must proceed simultaneously from both endpoints with R_1(t) = R_2(t) = t.
In the heavy-tailed case that we consider, even the symmetric choice R_1(t) = R_2(t) = t is in effect unbalanced. Indeed, at the earlier of the two freezing times, t = min{T^{(1)}_fr, T^{(2)}_fr}, the faster-growing cluster has reached size O_P(s_n^2), whereas min{T^{(1)}_fr, T^{(2)}_fr} ≈ f_n(min{M^{(1)}, M^{(2)}}) < f_n(max{M^{(1)}, M^{(2)}}) (3.63) (see Theorem 3.18) implies that the slower-growing cluster has not yet explored the unique edge of weight max{M^{(1)}, M^{(2)}} and therefore has size O_P(1). This is a crucial reason for introducing the freezing procedure of Sect.
Organisation of this paper. In the remainder of the paper, we follow the outline given in Sects. 2.1–3.7 and give the details and proofs omitted there. We have already indicated how these results combine to prove our main theorems, and where each of them is proved; we therefore now restrict attention to the relation with the local behavior described in the companion paper [21]. While [21] focusses on the local behaviour of FPP on K_n, in this paper we extend the analysis to the global behaviour. We will rely on some results from [21], but in a highly localized way: in particular, in Sect. 5 we rely on coupling results from [21].

Growth and Density Bounds for f_n and μ_n
In this section, we explore the key implications of Conditions 2.1–2.2 and 2.6 on f_n and on the intensity measure μ_n.
Proof Divide (2.7) or (2.12) by x and integrate between x and x′ to obtain log f_n(x′) − log f_n(x) ≥ ε_0 s_n (log x′ − log x) whenever 1 − δ_0 ≤ x ≤ x′ and n ≥ n_0, as claimed.
We call Condition 2.6 a density bound because it implies the following lemma, which will also be useful in the study of two-vertex characteristics in Sect. 7:  Taking K sufficiently large proves the claim.
(4.5) By Lemma 3.12, λ_n f_n(1) converges to a finite constant. Since f_n is strictly increasing, its inverse is well defined. Writing x_n = f_n^{-1}(y) and using Conditions 2.1 and 2.6, we get

By Condition 2.1, f_n(1 + x/s_n)/f_n(1) → e^x for every fixed x, and it follows with Taylor's theorem that
We are now in the position to prove Lemma 3.19:
We conclude the section with a remark on the connection between Theorem 2.4 above and [Part I, Theorem 2.1], which states that, if s_n/log log n → ∞, then (4.6) holds. Indeed, (log(n/s_n^3))^{1/(ε_0 s_n)} = (1+o(1)) exp((1/(ε_0 s_n)) log log(n/s_n^3)) = 1+o(1). (4.7) Hence, Lemma 3.19 gives that (2.9) and (4.6) agree if s_n/log log n → ∞. Proof. The proof is virtually identical to the proof of [Part I, Lemma 5.4], and we indicate only those parts of the proof that differ. We prove Condition 2.1 for f_n and the sequences defined in (4.9). We compute

Analysis of Specific Edge-Weight Distributions
(4.10) Noting that x f_n′(x)/f_n(x) is the derivative of log f_n(x) with respect to q = log x, we find (4.11) after the substitution θ = s_n q. As n → ∞, we have s_n → ∞ by assumption, so that the last integrand converges to 1 pointwise. For each fixed x, the convergence is uniform over θ by properties of slowly varying functions, so that f_n(x^{1/s_n})/f_n(1) → e^{log x} = x, as required. Proof For Example 1.2, it is readily verified that the regular-variation condition (1.3) holds. By Lemma 4.7 it remains to show that Condition 2.3 (b) holds when s_n/log log n ↛ ∞, i.e., for Example 1.2 (b) with 0 < γ ≤ 1. It suffices to find a sequence x_n with g(x_n/n)/(g(1/n) log n) → ∞ such that s_n^{-1} x g′(x)/g(x) → 1 uniformly over 1/n ≤ x ≤ x_n/n; in fact we will take x_n = log n.
In particular, we see that x g′(x)/g(x) is a slowly-varying function of q = −log(1 − e^{−x}); say x g′(x)/g(x) = h(q), where q ↦ h(q) is slowly varying as x ↓ 0, q → ∞. Since … uniformly on 1/n ≤ x ≤ (log n)/n, and consequently s_n^{-1} x g′(x)/g(x) = 1 + o(1) uniformly on 1/n ≤ x ≤ (log n)/n by properties of slowly varying functions, as required.
On the other hand, x g′(x)/g(x) → ∞ as x ↓ 0 implies in particular that x g′(x)/g(x) ≥ 2 for x sufficiently small, so that, for n sufficiently large, … and we have shown that g(x_n/n)/(g(1/n) log n) → ∞, as required. For Example 2.5, we compute f_n(x) = G^{-1}(1 − e^{−x/n})^{s_n} and … . Write a = G′(0) > 0. Since Z is positive-valued, we have G(0) = 0 and therefore u/G^{-1}(u) → a as u ↓ 0, whereas (G^{-1})′(u) → 1/a as u ↓ 0. This implies that the quantity in (4.16) tends to 1 whenever x/n → 0, which allows us to conclude Conditions 2.2 and 2.3 (a). Similarly, from s_n → ∞ we can infer that x f_n′(x)/f_n(x) ≥ 2, uniformly over x ≤ log n, for n sufficiently large, whence f_n(log n)/(f_n(1) log n) → ∞ and Condition 2.3 (b) holds. Finally, as in (4.11), we find that, comparing with (4.16), the integrand converges to 1 as n → ∞, uniformly over θ for any fixed x, and Condition 2.1 follows.

Equivalence of Conditions: Proof of Lemma 2.7
The proof of Lemma 2.7 is based on the observation that if the functions f_n, f̃_n agree on the interval [0, x̄] then, for the FPP problems with edge weights f_n(n X^{(K_n)}_e) and f̃_n(n X^{(K_n)}_e), respectively, the optimal paths and their corresponding edge weights are identical whenever either optimal path has weight less than f_n(x̄) = f̃_n(x̄).

Proof of Lemma 2.7
Let δ_0 be the constant from Condition 2.2, let R > 1, and define x_{n,R} and f_{n,R} so that f_{n,R} agrees with f_n on [0, x_{n,R}] and is constant on [x_{n,R}, ∞). By construction, the sequence (f_{n,R})_n satisfies Condition 2.6 for any fixed R > 1. Furthermore, given any x > 0, R > 1 implies that x^{1/s_n} ≤ R ≤ x_{n,R} for n sufficiently large, and it follows that f_{n,R} satisfies Condition 2.1. Since x_{n,R} ≥ R > 1, it follows that Condition 2.2 holds for (f_{n,R})_n , too.
Let μ_{n,R} and λ_{n,R} denote the analogues of μ_n and λ_n when f_n is replaced by f_{n,R}, and let λ_{n,R} = λ_{n,R}(1) and φ_{n,R} = λ′_{n,R}(1)/λ_{n,R}(1) denote the corresponding parameters [see (3.11) and (3.16)–(3.18)]. Let W_{n,R}, H_{n,R} denote the weight and hopcount, respectively, associated to the FPP problem on K_n with edge weights f_{n,R}(n X^{(K_n)}_e). Abbreviate w_{n,R} = W_{n,R} − log(n/s_n^3)/λ_{n,R} and h_{n,R} = (H_{n,R} − φ_{n,R} log(n/s_n^3))/√(s_n^2 log(n/s_n^3)). By assumption, Theorem 2.4 holds assuming Conditions 2.1, 2.2 and 2.6; therefore, it applies to f_{n,R}. Using Theorem 2.4, we conclude that for any k ∈ N we may find n_0^{(R,k)} such that (4.19a)–(4.19c) hold whenever n ≥ n_0^{(R,k)}, and for definiteness we take n_0^{(R,k)} minimal with these properties. Indeed, using the continuity of M^{(1)} ∨ M^{(2)} and Z, the uniform convergence in (4.19a) follows from the pointwise convergence at a finite grid ((x_i, y_i))_i depending on k and the monotonicity of the distribution functions. For (4.19c), use the inequality a + b ≤ 2(a ∨ b), Lemma 3.12, and note that 2 f_{n,R}(R − 1) ≤ f_{n,R}(R) for n sufficiently large by Lemma 4.1. Set R_n = 2 ∨ max{k ∈ N : n ≥ n_0^{(k,k)}} ∧ n, and λ̃_n = λ_{n,R_n}, φ̃_n = φ_{n,R_n}. (4.20) Since n_0^{(k,k)} is finite for each k ∈ N, it follows that R_n → ∞. Moreover, as soon as n ≥ n_0^{(2,2)}, we have n ≥ n_0^{(R_n,R_n)}, so that (4.19a)–(4.19c) hold with (R, k) = (R_n, R_n). By construction, f_{n,R}(1) = f_n(1), and we conclude in particular that φ̃_n/s_n → 1 and λ̃_n f_n(1) → e^{−γ}.
Given two functions f_n and f̃_n, we can couple the corresponding FPP problems by choosing edge weights f_n(n X^{(K_n)}_e) and f̃_n(n X^{(K_n)}_e), respectively. Let x̄ > 0. On the event {W_n ≤ f_n(x̄)}, the optimal path π_{1,2} uses only edges of weight at most f_n(x̄). If f_n and f̃_n agree on the interval [0, x̄], then the edges along that path have the same weights in the two FPP problems, and we deduce that W_n = W̃_n and H_n = H̃_n, where W̃_n and H̃_n are the weight and the hopcount of the optimal path in the problem corresponding to f̃_n.
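This coupling can be checked numerically on a small complete graph: run Dijkstra twice on the same exponential weights, once with f and once with a truncated f̃ that agrees with f below a cut-off, and observe that weight and hopcount coincide whenever the optimal weight is below f(cut-off). The sketch below is our own illustration with f(x) = x^s, as for the E^s-type weights; all names and parameter values are ours.

```python
import heapq
import random

def shortest_path(n, w):
    """Dijkstra on K_n from vertex 0 to vertex 1 with weight function
    w(u, v); returns (total weight, hopcount) of the optimal path."""
    best = {0: (0.0, 0)}
    heap = [(0.0, 0, 0)]
    done = set()
    while heap:
        d, hops, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == 1:
            return d, hops
        for v in range(n):
            if v == u or v in done:
                continue
            nd = d + w(u, v)
            if v not in best or nd < best[v][0]:
                best[v] = (nd, hops + 1)
                heapq.heappush(heap, (nd, hops + 1, v))

def coupled_fpp(n=30, s=2.0, cut=10.0, seed=3):
    """Couple two FPP problems through one family of exponential edge
    variables: weights f(n X_e) versus f_trunc(n X_e), where f_trunc
    agrees with f on [0, cut] and is constant beyond."""
    rng = random.Random(seed)
    x = {}
    for u in range(n):
        for v in range(u + 1, n):
            x[(u, v)] = n * rng.expovariate(1.0)
    edge = lambda u, v: x[(min(u, v), max(u, v))]
    f = lambda t: t ** s
    f_trunc = lambda t: min(t, cut) ** s
    res = shortest_path(n, lambda u, v: f(edge(u, v)))
    res_trunc = shortest_path(n, lambda u, v: f_trunc(edge(u, v)))
    return res, res_trunc, f(cut)
```

If the optimal weight W satisfies W ≤ f(cut), every edge on the optimal path has argument below the cut-off, every truncated edge costs at least f(cut) ≥ W, and the two problems return identical weight and hopcount, mirroring the argument above.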
Consequently, on the event {W_{n,R_n} ≤ f_{n,R_n}(x_{n,R_n})}, we have W_n = W_{n,R_n} and H_n = H_{n,R_n}. By (4.19a), it remains to show that this event occurs whp. Since R_n → ∞, we conclude from (4.19a) that W_{n,R_n} ≤ f_{n,R_n}(R_n − 1) + log(n/s_n^3)/λ_{n,R_n} whp. But from the definition of x_{n,R_n} and f_{n,R_n} it follows that f_{n,R_n}(x_{n,R_n}) ≥ f_{n,R_n}(R_n) ∨ 4e^γ f_{n,R_n}(1) log n, so (4.19c) completes the proof.

Coupling K n and the PWIT
In Theorem 3.26, we indicated that two random processes, the first passage exploration processes S and B on K_n and T^{(1,2)}, respectively, can be coupled. In this section we explain how this coupling arises as a special case of a general family of couplings between K_n, understood as a random edge-weighted graph with i.i.d. exponential edge weights, and the PWIT. We rely on some results from the companion paper [21], in particular [Part I, Theorem 3.4, Lemma 3.6 and Proposition 3.7].

Exploration Processes and the Definition of the Coupling
As in Sect. 3.5, we define M_{∅_j} = j for j = 1, 2, and to each v ∈ T^{(1,2)} \ {∅_1, ∅_2} we associate a mark M_v chosen uniformly and independently from [n]. We next define what an exploration process is: Definition 5.1 (Exploration process on two PWITs) Let F_0 be a σ-field containing all null sets, and let (T^{(1,2)}, X) be independent of F_0. We call a sequence E = (E_k)_{k∈N_0} of subsets of T^{(1,2)} an exploration process if, with probability 1, E_0 = {∅_1, ∅_2} and, for every k ∈ N, either E_k = E_{k−1} or else E_k is formed by adjoining to E_{k−1} a previously unexplored child v_k ∈ ∂E_{k−1}, where the choice of v_k depends only on the weights X_w and marks M_w for vertices w ∈ E_{k−1} ∪ ∂E_{k−1} and on events in F_0.
Examples of exploration processes are given by FPP and IP on T^{(1,2)}. For FPP, as defined in Definition 3.4, it is necessary to convert to discrete time by observing the branching process at those moments when a new vertex is added. The standard IP on T^{(1,2)} is defined as follows. Set IP(0) = {∅_1, ∅_2}. For k ∈ N, form IP(k) inductively by adjoining to IP(k − 1) the boundary vertex v ∈ ∂IP(k − 1) of minimal weight. An exploration process is also obtained, however, when we specify at each step (in any suitably measurable way) whether to perform an invasion step in T^{(1)} or in T^{(2)}.
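Definition 5.1 can be made concrete in code: the sketch below (our own illustrative implementation, with children again revealed lazily) runs an exploration on the union of two PWITs, where a caller-supplied rule decides at each step in which tree to perform an invasion step.

```python
import heapq
import random

def explore_two_pwits(steps, choose_tree, seed=0):
    """Exploration process on the union of two PWITs in the spirit of
    Definition 5.1: at step k, choose_tree(k) picks the tree j in {1, 2},
    and the boundary vertex of minimal weight in that tree is adjoined.
    Each vertex's children are revealed lazily as cumulative sums of
    Exp(1) weights, one sibling at a time."""
    rng = random.Random(seed)
    heaps = {1: [(rng.expovariate(1.0), 0, None)],
             2: [(rng.expovariate(1.0), 1, None)]}
    next_id = 2
    explored = {1: [], 2: []}
    for k in range(steps):
        j = choose_tree(k)
        w, v, p = heapq.heappop(heaps[j])
        explored[j].append(w)
        # reveal v's first child and v's next sibling in tree j
        heapq.heappush(heaps[j], (rng.expovariate(1.0), next_id, v))
        next_id += 1
        heapq.heappush(heaps[j], (w + rng.expovariate(1.0), next_id, p))
        next_id += 1
    return explored
```

Taking choose_tree(k) = 1 for all k recovers IP^{(1)} on a single tree, while alternating between 1 and 2 gives a two-source exploration; any measurable choice yields an exploration process in the sense above.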
For k ∈ N, let F k be the σ -field generated by F 0 together with the weights X w and marks M w for vertices w ∈ E k−1 ∪ ∂E k−1 . Note that the requirement on the choice of v k in Definition 5.1 can be expressed as the requirement that E is (F k ) k -adapted.
For v ∈ T^{(1,2)}, define the exploration time of v as the first k for which v ∈ E_k. For i ∈ [n], let N(i) denote the stopping time at which i first appears as a mark in the unthinned exploration process. Note that, on the event {N(i) < ∞}, E_k contains a unique vertex in T^{(1,2)} whose mark is i, for any k ≥ N(i); call that vertex V(i). On this event, and for an edge {i, i′} ∈ E(K_n), we define the edge weight X^{(K_n)}_{{i,i′}} as in (5.4), where (E_e)_{e∈E(K_n)} are exponential variables with mean 1, independent of each other and of (X_v)_v.

Theorem 5.3 If E is an exploration process on the union T^{(1,2)} of two PWITs, then the edge weights X^{(K_n)}_e defined in (5.4) are exponential with mean 1, independently for each e ∈ E(K_n).
The idea underlying Theorem 5.3 is that each variable (1/n) X(i, i′) is exponentially distributed conditionally on the past up to the moment N(i) when it may be used to set the value of X^{(K_n)}_{{i,i′}}. Theorem 5.3 restates [Part I, Theorem 3.4] and is proved in that paper.

Minimal-Rule Exploration Processes
An important class of exploration processes, including both FPP and IP, consists of those determined by a minimal rule in the following sense:

Definition 5.4 A minimal rule for an exploration process E consists, for each k, of a pair (S_k, ≺_k), where S_k is a (possibly empty) subset of the boundary vertices of E_{k−1} and ≺_k is a strict total ordering of the elements of S_k (if any) such that the implication (5.5) holds. An exploration process is determined by the minimal rule if E_k is formed by adjoining to E_{k−1} the ≺_k-minimal element of S_k whenever S_k is nonempty. In words, in every step k there is a set S_k of boundary vertices from which we can select the next vertex to explore. The content of (5.5) is that, whenever a vertex w ∈ S_k is available for selection, all siblings of w with the same mark but smaller weight are also available for selection and are preferred over w.
For FPP without freezing on T^{(1,2)} with edge weights f_n(X_v), we take v ≺_k w if and only if T_v < T_w [recall (3.10)] and take S_k = ∂E_{k−1}. For IP on T^{(1,2)}, we have v ≺_k w if and only if X_v < X_w; the choice of the subset S_k can be used to enforce, for instance, whether the kth step is taken in T^{(1)} or in T^{(2)}.
Recall the subtree Ẽ_k of unthinned vertices from Definition 5.2 and the subgraph π_M(Ẽ_k) from Definition 3.7. That is, π_M(Ẽ_k) is the union of two trees with roots 1 and 2, respectively, and for v ∈ Ẽ_k \ {∅_1, ∅_2}, π_M(Ẽ_k) contains the vertices M_v and M_{p(v)} and the edge {M_v, M_{p(v)}}. The following lemma shows that, for an exploration process determined by a minimal rule, unthinned vertices must have the form V(i, i′):

Lemma 5.5 Suppose E is an exploration process determined by a minimal rule
See the proof of [Part I, Lemma 3.6]. If E is an exploration process determined by a minimal rule, then we define the corresponding quantities, where e_j = {i_j, i′_j} with i_j ∈ π_M(Ẽ_{k−1}) and i′_j ∉ π_M(Ẽ_{k−1}), as in (5.7).

Proposition 5.6 (Thinned minimal rule) Suppose E is an exploration process determined by a minimal rule (S
Then, under the edge-weight coupling (5.4), the edge weights of π M ( E k ) are determined by and generally k that is minimal with respect to ≺ k .
Proposition 5.6 asserts that the subgraph π_M(Ẽ_k) of K_n, equipped with the edge weights (X^{(K_n)}_e)_{e∈E(π_M(Ẽ_k))}, is isomorphic as an edge-weighted graph to the subgraph Ẽ_k of T^{(1,2)}, equipped with the edge weights ((1/n) X_v)_{v∈Ẽ_k\{∅_1,∅_2}}. Furthermore, the subgraphs π_M(Ẽ_k) can be grown by an inductive rule. Thus the induced subgraphs (π_M(Ẽ_k))_{k=0}^∞ themselves form a minimal-rule exploration process on K_n, with a minimal rule derived from that of E, with the caveat that ≺_k may depend on edge weights from E_{k−1} \ Ẽ_{k−1} as well. See the proof of [Part I, Proposition 3.7] for the proof of Proposition 5.6.

FPP and the Two Smallest-Weight Trees: Proof of Theorem 3.22
In this section, we discuss the relationship between S and FPP distances, and we prove Lemma 3.20 and Theorem 3.22.

Proof of Lemma 3.20 Define S̄ = (S̄_u)_{u≥0} as follows.
To describe the discrete-time evolution of S̄ = (S̄_u)_{u≥0}, denote by τ̄_{k−1} the time at which the (k−1)th vertex (not including vertices 1 and 2) was added to S̄. At time u = 0, S̄_0 is equal to S_0, and contains vertices 1 and 2 and no edges. Having constructed the process until time τ̄_{k−1}, at time τ̄_k … by construction, so that we may rewrite (5.12) as (5.14). Comparing (5.14) with (3.39), S and S̄ will evolve in the same way until the time τ̄ when S̄ first accepts an edge between S̄^{(1)} and S̄^{(2)}. In particular, the minimization problem in (5.14) will be the same as in (3.40), and the minimizer will be a.s. unique, as long as τ̄_k ≤ τ̄. Therefore we can choose J, J′ with {J, J′} = {1, 2} and I ∈ S̄^{(J)}_{τ̄−}, I′ ∈ S̄^{(J′)}_{τ̄−} such that, at time τ̄, the edge between I and I′ is adjoined to S̄^{(J)}. [In other words, j = J and e = {I, I′} is the minimizer in (5.14).]
Because the minimizer in (5.14) is unique, no vertex is added to S̄ at time τ̄. In particular, T_S(i) < τ̄ for every i ∈ S̄_τ̄. Since S^{(j)}_t and S̄^{(j)}_t agree for t < τ̄, the arrival times before τ̄ must coincide. Recalling (5.13), consider the optimal path from vertex J to vertex I. Since I′ is adjoined to S̄^{(J)} at time τ̄, it follows from (5.13) that … . Applying (5.16) to the path from J to I, to I′, to J′, we see in particular that both sides of (3.41) are bounded above by R_1(τ̄) + R_2(τ̄). The bound (5.18) will allow us to exclude vertices that arrive after time τ̄. To this end, we will show that (a) if a path π from vertex 1 to vertex 2 contains a vertex not belonging to S̄_τ̄, then the weight of π is greater than R_1(τ̄) + R_2(τ̄); and (b) if i_1 ∈ S^{(1)} \ S^{(1)}_τ̄ or i_2 ∈ S^{(2)} \ S^{(2)}_τ̄, then the corresponding term exceeds this bound as well. For (a), suppose that π contains a vertex i ∉ S̄_τ̄. Since the vertex sets of S_τ̄ and S̄_τ̄ coincide, it follows that i ∉ S̄_τ̄, and by right continuity i ∉ S̄_t for some t > τ̄. Since R_1 + R_2 is strictly increasing, (5.13) shows that the weight of π is at least … . For (b), suppose for specificity that i_1 ∈ S^{(1)} \ S^{(1)}_τ̄. If in addition i_2 ∈ S^{(2)} \ S^{(2)}_τ̄, then T_S(i_1), T_S(i_2) > τ̄ and the strict monotonicity of R_1 + R_2 gives the desired result. We may therefore suppose i_2 ∈ S^{(2)}_τ̄. Since S^{(1)} and S^{(2)} are disjoint, we must have i_1 ∉ S^{(2)}_τ̄, so that d_{K_n,Y^{(K_n)}}(2, i_1) ≥ R_2(τ̄). In particular, by considering the optimal path from 2 to i_2 together with the edge from i_2 to i_1, we conclude that d_{K_n,Y^{(K_n)}}(2, i_1) ≤ … . By the assumption i_2 ∈ S^{(2)}_τ̄, we may rewrite this as … . To complete the proof, consider a path π from vertex 1 to vertex 2. By statement (a) and (5.18), π must contain only vertices from S̄_τ̄ if it is to be optimal. Since 1 ∈ S^{(1)}_τ̄ but 2 ∈ S^{(2)}_τ̄, it follows that π must contain an edge between some pair of vertices i_1 ∈ S^{(1)}_τ̄ and i_2 ∈ S^{(2)}_τ̄.
The minimum possible weight of such a path is d_{K_n,Y^{(K_n)}}(1, i_1) + Y^{(K_n)}_{{i_1,i_2}} + d_{K_n,Y^{(K_n)}}(i_2, 2), which agrees with the corresponding term in (3.41) by (5.16). Therefore (3.41) is verified if the minimum is taken only over i_1 ∈ S^{(1)}_τ̄, i_2 ∈ S^{(2)}_τ̄. But statement (b) and (5.18) show that the remaining terms must be strictly greater.

Proof of Theorem 3.22
Since R_1 + R_2 is strictly increasing, the relation W_n = R_1(T_coll) + R_2(T_coll) is a reformulation of Definition 3.21.
Recall the time τ̄ from the proof of Lemma 3.20. We showed there that the minimizer of (3.41) must come from vertices i_1 ∈ S^{(1)}_τ̄, i_2 ∈ S^{(2)}_τ̄. In particular, (5.16) expresses W_n as the weight of the path formed as the union of the optimal path from 1 to I_1; the edge from I_1 to I_2; and the optimal path from I_2 to 2. In particular, π_{1,2} is the same as this path. Since T_S(I_j) < τ̄ and S^{(j)}_t = SWT^{(j)}_{R_j(t)} for t < τ̄, it follows that the optimal paths from j to I_j coincide with the unique paths in S^{(j)}_τ̄ between these vertices. The relation H_n = H(I_1) + H(I_2) + 1 follows by counting the edges in these subpaths.
It remains to show that T_S(I_j) < T_coll. Define t_1 … . Recalling (3.39), we see that t_1 is the time at which the edge from I_1 to I_2 would be adjoined to S^{(1)}, provided that I_2 has not already been added to S at some point strictly before time t_1. By construction, I_2 is added to S^{(2)}, not S^{(1)}, so it must be that T_S(I_2) < t_1. [Equality is not possible because of our assumption that the minimizers of (3.39) are unique.] Aiming for a contradiction, suppose that T_S(I_2) ≥ T_coll; then … . This is a contradiction, since t_1 > T_S(I_2) ≥ T_coll. Similarly we must have T_S(I_1) < T_coll. This shows that the unique paths in S^{(j)}_τ̄ from j to I_j are actually paths in S^{(j)}_{T_coll}, as claimed.

B and S as Exploration Processes: Proof of Theorem 3.26
Before proving Theorem 3.26, we show that the discrete-time analogue of B is an exploration process determined by a minimal rule:

Lemma 5.7
Let v_k denote the kth vertex added to B, excluding the root vertices ∅_1, ∅_2, and set E_k = {∅_1, ∅_2, v_1, …, v_k}. Then E is an exploration process determined by a minimal rule.
Proof Consider the kth step and define τ^{(j)}_next(k), the next birth time of a vertex in T^{(j)} in the absence of freezing, and let v^{(j)}_k denote the a.s. unique vertex attaining the minimum in (5.21). Recalling the definition of the filtration F_k, we see that τ^{(j)}_next(k) and v^{(j)}_k are F_k-measurable. The variable T^{(j)}_fr is not F_k-measurable. However, the event {T^{(j)}_fr < τ^{(j)}_next(k)} is F_k-measurable. To see this, define τ^{(j)}_fr(k) by (5.22), so that τ^{(j)}_fr(k) is F_k-measurable, and abbreviate τ_unfr(k) = τ^{(1)}_fr(k) ∨ τ^{(2)}_fr(k). We will use τ^{(j)}_fr(k) as an approximation to T^{(j)}_fr based on the information available in F_k. By analogy with (3.35) and (3.37), we also define

(5.23)
We note the following: the sum in (5.22) agrees with the sum in the definition (3.31) of T^{(j)}_fr whenever t < τ^{(j)}_next(k).
be nonempty. The minimizers of R −1 j,k (t) and R −1 j (t) over all pairs (t, j) with t ∈ I ( j) k and j ∈ {1, 2} agree and return the same value. Statement (i) follows by induction and (3.36). For (ii), note that the sums in (3.31) and . But for any t and any t < min by definition, so the second part of (i) completes the proof. Statement (iii) follows from (ii): if one of the sums in (3.31) or (5.22) exceeds s n before time τ ( j) next (k), then so does the other, and the first times where they do so are the same. In particular, , 2}, then R −1 j,k and R −1 j agree everywhere according to (iii). Finally, consider the case that τ ( j) fr (k) ≥ τ ( j) next (k) and τ ( j ) fr (k) < τ ( j ) next (k) for j, j = {1, 2}. Then T ( j) fr ≥ τ ( j) next (k) and, therefore, Hence, in all three cases, the functions agree on the relevant domain and we have proved (iv). Set can be infinite when τ ( j) fr (k) < ∞ and τ ( j ) fr (k) = ∞, where j, j = {1, 2}, but in this case R −1 j ,k (t) must be finite for all t. Furthermore R −1 j,k is strictly increasing whenever it is finite. Recalling that the times T v are formed from variables with continuous distributions, it follows that the minimizing pair , j ∈ {1, 2}. (Once again, this minimizing pair is well defined a.s., reflecting the fact that, a.s., B never adds more than one vertex at a time. . Moreover, since R −1 j and R −1 j,k are both strictly increasing (when finite), both minimizations can be restricted to the set (v ( j) k , j) : j = 1, 2 , where v ( j) k is the minimizer in (5.21).

Proof of Proposition 3.25 By Theorem 5.3, the edge weights X^{(K_n)}_e associated via (5.4) to the exploration process in Lemma 5.7 are independent exponential random variables with mean 1. Recalling (2.1)–(2.2), the corresponding edge weights Y^{(K_n)}_e = g(X^{(K_n)}_e) are independent with distribution function F_Y. To complete the proof, it suffices to observe that N(i) < N(i′) if and only if T_B(i) < T_B(i′), and that T_B(i) is finite for all i ∈ [n] since the FPP process eventually explores every edge in T^{(1,2)}. Hence definitions (3.46) and (5.4) agree.
Proof of Theorem 3.26 We resume the notation from the proof of Lemma 5.7. Create the edge weights on K_n according to (5.4). Denote by τ′_{k′−1} the time at which the (k′−1)th vertex (not including the vertices 1 and 2) was added to S [see (3.39)]. As in the proof of Theorem 3.8, both B and S are increasing jump processes, and π_M(B_0) = S_0. By an inductive argument, we can suppose that k, k′ ∈ N are such that Ẽ_k ≠ Ẽ_{k−1} and π_M(Ẽ_{k−1}) = S_{τ′_{k′−1}}. The proof will be complete if we can prove that (a) the edge e_{k′} adjoined to S_{τ′_{k′−1}} to form S_{τ′_{k′}} is the same as the edge added to π_M(Ẽ_{k−1}), and (b) both edges are added at the same time. According to (5.9), (2.1) and (2.3), the edge weights along this path are … . In addition, let i′ ∉ S_{τ′_{k′−1}} and write e = {i, i′}. By (5.10), Y^{(K_n)}_e = g((1/n) X_{V(i,i′)}). Thus the expression on the right-hand side of (3.39) reduces to … . The edge e_{k′} minimizes this expression. By the induction hypothesis, statement (iv) in the proof of Lemma 5.7 and monotonicity, these two minimization problems have the same minimizers, proving (a), and return the same value, which completes the proof of (b).

Coupling and Cox Processes: Proof of Theorem 3.27
In this section we explain the modified coupling and give the proof of Theorem 3.27.

Proof of Theorem 3.27
The edge-weight coupling (3.46) selects the edge weights Y^{(K_n)}_{{i_1,i_2}} based on values f_n(X_w) for which p(w) ∈ B and {M_{p(w)}, M_w} = {i_1, i_2}. Under the present definition of B [see (3.36)], such vertices w are eventually explored, and consequently the values f_n(X_w) can be recovered by observing (B_t)_{t≥0}. On the other hand, in the context of Theorem 3.27, we want the values f_n(X_w) to behave as a Cox process [with respect to B, R_1, R_2, (M_v)_{v∈B}]. For this reason, we will modify the definition of B so that it does not explore vertices w of this kind, and we will replace the contribution of those vertices using an additional source of randomness. As always, we have two independent PWITs (T^{(j)}, X^{(j)}), j ∈ {1, 2}, the marks M_v, v ∈ T^{(1,2)}, and a family of independent exponential random variables E_e, e ∈ E(K_∞), with mean 1, independent of the PWITs and the marks. In addition, from each vertex v we initialise an independent PWIT with vertices (v, w′), edge weights X_{(v,w′)} [such that (X_{(v,w′_k)})_{k=1}^∞ forms a Poisson point process with rate 1 on (0, ∞)] and marks M_{(v,w′)} uniform on [n], all independent of each other and of the original variables X_v, M_v.
First consider (B, B, R 1 , R 2 ) as normally constructed, without using the auxiliary PWITs with vertices (v, w′). Fix i, i′ ∈ [n], i ≠ i′, and suppose for definiteness that T B (i) < T B (i′).
[If instead T B (i ) < T B (i), interchange the roles of i and i in the following discussion.] According to (3.46) or (5.4), the edge weight X (Kn ) {i,i } is set to be 1 n X (i, i ), where X (i, i ) is the first point in the Poisson point process of intensity 1/n. Now condition on V (i) and V (i ) belonging to different trees, say . For this to happen, the children of V (i) having mark i must not have been explored by time , so we can reformulate this by saying that the Poisson point process , we can rewrite this as the condition that However, the condition gives no information about points of larger value. It follows that, conditionally on t > T B (i ) so as to use the replacement edge weights X v,w , but continue to use the original edge weights X w in the edge-weight coupling (5.4). [Formally, modify the minimal rule from Lemma 5.7 so that vertices are ineligible for exploring once they have been replaced, but add to the sum in (5.22) the contribution from any replacement vertices from the auxiliary PWITs that would have been explored at time t.] These replacements do not change the law of B, R 1 , R 2 , (M v ) v∈B , or the edge weights X (Kn ) e . The pointwise equality between π M ( B) and S is unaffected: the replaced vertices are thinned and therefore do not affect B. Finally, the evolution of B for t > T B (i 2 ) now gives no additional information about the edge weights In particular, conditionally on B, R 1 , R 2 , (M v ) v∈B and the event that V (i) ∈ T (J ) , V (i ) ∈ T (J ) for some choice of J , J = {1, 2}, the law of P (i,i ) is that of a Poisson point process with intensity measure 1/n on ( f −1 n ( R V (i),V (i ) ), ∞). Furthermore, the Poisson point processes corresponding to different i, i will be conditionally independent.
We can now give an explicit construction of P n . We begin by defining P n on the subspace given by unthinned pairs of vertices: (2) , (5.25) In the notation above, P n [0,∞)×{V (i 1 )}×{V (i 2 )} is the image of P (i j ,i j ) under the mapping ). (5.26) In particular, by the remarks above, P n [0,∞)× B (1) × B (2) has the conditional law of a Poisson point process conditionally on B, R 1 , R 2 , (M v ) v∈B . To compute its intensity measure, note that the mapping x → y = f n (x) sends 1/n times Lebesgue measure on . It follows that the further mapping y → (R 1 + R 2 ) −1 (T V (i 1 ) + y + T V (i 2 ) ) leads to the intensity measure specified by (3.48), where we have again used the relation is a Cox process of the correct intensity.
Finally, we may extend P n to be a Cox process on [0, ∞) × T (1) × T (2) with the specified intensity, by defining P n [0,∞)×{v 1 }×{v 2 } using an independent source of randomness for any pair of vertices v 1 , v 2 for which v 1 ∈ B (1) \ B (1) or v 2 ∈ B (2) \ B (2) . Note that the details of this extension are unimportant since such pairs (v 1 , v 2 ) are not considered in the definition of (T (Pn ) coll , V (1) coll , V (2) coll ). Observe that under this construction of P n and under the edge-weight coupling (3.46),
The value X (i J , i J ′ ) coincides with the first point of P (i J ,i J ′ ) , and applying the increasing mapping (5.26) it follows that the first point of P n [0,∞)×{V (i 1 )}×{V (i 2 )} has time coordinate Using Lemma 3.20, the strict monotonicity of R 1 + R 2 , and the relation = min is the result of minimizing (5.29) over all choices of i 1 , i 2 , and I 1 , I 2 are the corresponding minimizers. On the other hand, T (P n ) coll is the result of minimizing the first point of P n [0,∞)×{v 1 }×{v 2 } over all choices of unthinned vertices v 1 ∈ T (1) , v 2 ∈ T (2) , and V (1) coll , V (2) coll are the corresponding minimizers. Every such pair (v 1 , v 2 ) can be written as v j = V (i j ) for some i j ∈ S ( j) , j = 1, 2, and in fact i j = M v j in this correspondence. Hence these two minimization problems are equivalent, their unique minimizers coincide, and we have proved (5.27). The remaining statements in Theorem 3.27 follow from (5.27) and the relations W n = . In the remainder of the paper, we will be concerned only with the equality in law from Theorem 3.27. We can therefore continue to define B as in (3.36), ignoring the modified construction given in the proof of Theorem 3.27. The edge-weight coupling (3.46) between T (1,2) and K n , and indeed the edge weights on K n generally, will play no further direct role in the analysis.

Branching Processes and Random Walks
In this section, we prove Theorem 3.13 by continuing the analysis of the branching process B P (1) introduced in Sect. 3. In Sect. 6.1 we identify a random walk which facilitates moment computations of the one-vertex characteristics. Section 6.2 contains precise results about the scaling behavior of the random walk and the parameter λ n (a). The results are proved in Sect. 6.3. Section 6.4 identifies the asymptotics of the first two moments of one-vertex characteristics. Having understood these, we investigate two-vertex characteristics in Sect. 7 and prove Theorem 3.15.

Continuous-Time Branching Processes and Random Walks
Recall that B P (1) = (B P (1) t ) t≥0 denotes a CTBP with original ancestor ∅ 1 . Using Ulam–Harris notation, the children of the root are the vertices v with p(v) = ∅ 1 , and their birth times (T v ) p(v)=∅ 1 form a Poisson point process with intensity μ n . For v ∈ T (1) , write B P (v) for the branching process of descendants of such a v, re-rooted and time-shifted to start at t = 0. Formally, In particular, B P (1) = B P (∅ 1 ) , and the processes (B P (v) ) p(v)=∅ 1 are independent of each other and of (T v ) p(v)=∅ 1 . We may express this compactly by saying that the sum of point ) forms a Poisson point process with intensity dμ n ⊗ dP(B P (1) ∈ ·), where P(B P (1) ∈ ·) is the law of the entire branching process. Recalling the definition of the one-vertex characteristic from (3.14), we deduce that where Q is a Poisson point process with some intensity μ (and assuming the integrals exist). Since ν a is a probability measure by construction, this recursion can be solved in terms of a random walk: (6.7) From (6.5), we obtain similarly where S j = ∑ j i=1 D i now has distribution P ab (D i ∈ ·) = ν ab (·). Note that for a random variable D with law ν a , for every measurable h ≥ 0, E a (h(D)) = ∫ h(y) a e −λ n (a)y dμ n (y). (6.9) Moreover, let ν * a denote the size-biasing of ν a , i.e., dν * a (y) = y dν a (y) / ∫ y dν a (y), so that E a (h(D * )) = E a (D h(D)) / E a (D) (6.10) for measurable h ≥ 0. Here and in all of the following we assume that D and D * have laws ν a and ν * a , respectively, under E a . Let U be uniform on [0, 1], and let (D i ) i≥1 be independent with law ν a and independent of U and D * . Besides the random walk (S j ) j from (6.7)–(6.8), it is useful to study the random walk (S * j ) j with S * 0 = U D * and S * j = S * j−1 + D j for all j ≥ 1. (6.11)
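The size-biasing identity (6.10) can be sanity-checked in the simplest case: for D ~ Exp(1), the size-biased law has density y e^{−y}, i.e. Gamma(2, 1). A minimal Monte Carlo sketch in Python (the Exp(1) step law is an illustrative stand-in, not the paper's actual ν_a):

```python
import random

random.seed(1)
N = 200_000

# D ~ Exp(1); its size-biased version D* has density y * exp(-y), i.e. Gamma(2, 1).
d = [random.expovariate(1.0) for _ in range(N)]

# E(h(D*)) = E(D h(D)) / E(D); with h(y) = y this gives E(D*) = E(D^2)/E(D) = 2.
h = lambda y: y
est = sum(x * h(x) for x in d) / sum(d)

# Direct sampling from Gamma(2, 1) = sum of two independent Exp(1) variables.
direct = sum(random.expovariate(1.0) + random.expovariate(1.0) for _ in range(N)) / N

print(est, direct)  # both close to 2
```

The same weighted-average trick estimates E(h(D*)) for any h without ever sampling D* explicitly.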

Random Walk Convergence
In this section, we investigate asymptotics of the random walks in (6.7)-(6.8) and (6.11).
Recall that the branching process B P (1) is derived from the intensity measure μ n , where for h : In particular, all the quantities z, m, M, ν a , D i , P a and E a as well as the random walks (S j ) j and (S * j ) j depend implicitly on n. We will give sufficient conditions for the sequence of walks (S j ) j , (S * j ) j to have scaling limits; in all cases the scaling limit is the Gamma process. This is the content of Theorem 6.3 below.
As a motivation, we first look at the key example Y (K n ) e d = E s n : here f n (x) = f n (1)x s n and f −1 n (x) = n x 1/s n . (6.13) One easily checks that for all a > 0, β > 0, f n (1)λ n (a 1/s n ) = a Γ(1 + 1/s n ) s n , s n λ n (a 1/s n )E a 1/s n (D) = 1, s n λ n (a 1/s n ) 2 E a 1/s n (D 2 ) = 1 + 1/s n , a μ n λ n (a 1/s n )β s n = 1/β. (6.14) Notice that Γ(1 + 1/s n ) s n → e −γ for n → ∞. Theorem 6.3 will show that in general the same identities hold asymptotically under our conditions on f n . In fact, we will prove Theorem 6.3 under weaker assumptions on f n : Conditions 2.2 and 2.6 together imply Condition 6.2: we may set δ n = δ 0 , with ε 0 chosen as for Conditions 2.2 and 2.6, and replacing (x, x′) in Lemma 4.1 by (1, x 1/s n ) or (x 1/s n , 1) verifies the inequalities in Condition 6.2. lim n→∞ a μ n λ n (a 1/s n )β s n = 1/β, (6.18) where γ is Euler's constant; (b) under E a 1/s n , the process (λ n (a 1/s n )S s n t ) t≥0 converges in distribution (in the Skorohod topology on compact subsets) to a Gamma process (Γ t ) t≥0 , i.e., the Lévy process such that Γ t has the Gamma(t, 1) distribution; (c) under E a 1/s n , the variable λ n (a 1/s n )D * converges in distribution to an exponential random variable E with mean 1, and the process (λ n (a 1/s n )S * s n t ) t≥0 converges in distribution to the sum (U E + Γ t ) t≥0 , where U is Uniform on [0, 1] and U , E, (Γ t ) t≥0 are independent.
Moreover, given a compact subset A ⊂ (0, ∞), all of the convergences occur uniformly for a ∈ A and, for (6.18), for β ∈ A. Theorem 6.3 will be proved in Sect. 6.3. We stress that the proof of Theorem 6.3 uses only Conditions 2.1 and 6.2 and the relevant definitions, but no other results stated so far.
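In the key example (6.13), the identities (6.14) are consistent with λ_n(ã)D_i having a Gamma(1/s_n, 1) distribution, in which case λ_n(ã)S_{s_n t} is exactly Gamma(t, 1)-distributed at lattice times, matching the Gamma-process limit in part (b). A simulation sketch in Python (the Gamma step law is our reading of (6.14) and should be treated as an assumption of this sketch):

```python
import random

random.seed(2)
s_n, t, reps = 100, 2.0, 5000

def rescaled_walk_value(s_n, t):
    # Assume lambda_n(a~) * D_i ~ Gamma(1/s_n, 1); this is consistent with (6.14):
    # s_n * lambda * E(D) = 1 and s_n * lambda^2 * E(D^2) = 1 + 1/s_n.
    k = int(s_n * t)
    return sum(random.gammavariate(1.0 / s_n, 1.0) for _ in range(k))

samples = [rescaled_walk_value(s_n, t) for _ in range(reps)]
mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / reps
print(mean, var)  # both close to t = 2, as for the Gamma(t, 1) marginal
```

Since a sum of s_n·t independent Gamma(1/s_n, 1) variables is Gamma(t, 1), the convergence in part (b) is exact at lattice times in this example; for general f_n it holds only asymptotically.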

Convergence of Random Walks: Proof of Theorem 6.3
For notational convenience, we make the abbreviations x̃ = x 1/s n , ã = a 1/s n , b̃ = b 1/s n , etc., (6.19) which we will use extensively in this section and in Sects. 7 and 9.
Proof of Theorem 6.3 We begin by proving (6.20). Recalling (6.12), we have μ . For x ≥ 1, Condition 6.2 implies that the integrand is bounded by e −δx ε 0 for some δ > 0. Dominated convergence therefore completes the proof of (6.20). It is easy to see that the proof of (6.25), and hence (6.20), holds uniformly in a ∈ A, where A ⊆ (0, ∞) is a fixed but arbitrary compact set.
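The convergence Γ(1 + 1/s_n)^{s_n} → e^{−γ} noted for the key example follows from the expansion log Γ(1 + x) = −γx + O(x²) near x = 0; a quick numeric check in Python:

```python
import math

EULER_GAMMA = 0.5772156649015329

# Gamma(1 + 1/s)^s = exp(s * lgamma(1 + 1/s)) -> e^{-gamma} as s -> infinity,
# since log Gamma(1 + x) = -gamma * x + O(x^2) near x = 0.
for s in (10, 1000, 100_000):
    print(s, math.exp(s * math.lgamma(1.0 + 1.0 / s)))

print("limit:", math.exp(-EULER_GAMMA))  # e^{-gamma} ~ 0.5615
```

The error at scale s is of order 1/s, visible in the printed sequence.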
The following lemma is a consequence of Theorem 6.3 and will be used in the proof of the second moment statements of Theorem 3.13: Lemma 6.6 Assume the hypotheses of Theorem 6.3, and let A ⊂ (0, 2) be compact. For any measurable, bounded function h ≥ 0, set (6.32) There are constants K < ∞ and n 0 ∈ N independent of h such that (h) ≤ K s n h ∞ for all n ≥ n 0 and a, b ∈ A. Moreover, for any ε > 0 there are constants K < ∞, n 0 ∈ N independent of h such that for all a, b ∈ A, n ≥ n 0 , (6.33) Note that log(1/a + 1/b) is positive and bounded away from 0 by our assumption on A.

Means of One-Vertex Characteristics: Proof of Theorem 3.13
In this section, we prove Theorem 3.13. Further, we set the stage for the proofs of Theorems 3.15 and 3.16 in Sect. 7. Recall from (6.7) that Thus, m χ t (a) can be understood as the expected integral of e −λ n (a)t χ(t) against the counting measure on the random set {t − S j : j ∈ N 0 }. When t → ∞, this measure approaches its stationary equivalent, which is the counting measure on the point-stationary set {t − S * j : j ∈ N 0 } (see [34]). Since the expected value of this stationary measure is a multiple of the Lebesgue measure, m χ t (a) will approach (as t → ∞) the same multiple of ∫ ∞ 0 e −λ n (a)t χ(t) dt. In the following we make this more precise. We begin with general estimates that apply to any CTBP with any intensity measure μ, and we will write simply λ(a) for the parameter defined by the analogue of (3.16). Similarly, all other notation introduced for B P (1) will be used for a general CTBP.
Proposition 6.7 Let (S * j ) j be the random walk defined in (6.11). Let χ be a non-negative characteristic. Then, for all a, t > 0, The first equality is (6.7); the second follows because the set t − S * j : j ∈ N 0 is point-stationary in the sense of [34]. Alternatively, the equality may be verified by taking Laplace transforms with respect to t.
In (6.7) and (6.38), we may collapse the tail of the sum into a single value of m χ u (a). Namely, if J is a stopping time for (S j ) j or (S * j ) j , respectively, then by the strong Markov property (6.39) The following lemmas provide bounds on m χ t (a). Lemma 6.8 Let (S j ) j be as in (6.7) and (S * j ) j as in (6.38), and suppose that (S j ) j and (S * j ) j are independent. Let ε > 0 and set J = inf{ j : S j − S * j ≤ ε}. Then, for a, t > 0, χ t (a). Apply (6.38) and replace the limit of integration by ∞ to obtain the result. Lemma 6.10 Let χ be a non-negative, non-decreasing characteristic such that ∫ ∞ 0 e −λ(a)u χ(u) du < ∞, and fix a, K > 0. Then, for all t > 0, ∑ ∞ j=0 e −λ(a)(t−S j ) χ(t − S j ) is square-integrable under E a and, abbreviating C a, (6.42) The same bound holds with (S j ) j replaced by (S * j ) j . Proof Since χ is non-decreasing, ∫ ∞ 0 e −λ(a)u χ(u) du < ∞ implies that e −λ(a)u χ(u) must be bounded. Hence ∫ ∞ 0 e −2λ(a)u χ(u) 2 du < ∞ also. Applying Lemma 6.9 to χ and χ 2 , we deduce (6.43) Another application of Lemma 6.9 gives (6.42). Finally, replacing (S j ) j by (S * j ) j is equivalent to replacing t by t − U D * . Since the upper bound in (6.42) does not depend on t, the result follows.
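In the memoryless special case the point-stationarity behind (6.38) is explicit: for Exp(1) steps, D* is Gamma(2, 1), the delay U D* is again Exp(1), and the points {S*_j} form a rate-1 Poisson process, so E Σ_j e^{−(t−S*_j)} χ(t − S*_j) = (1/E(D)) ∫_0^t e^{−u} χ(u) du. A Monte Carlo sketch in Python with χ ≡ 1 and λ(a) = 1 (illustrative choices, not the paper's general ν_a):

```python
import math
import random

random.seed(3)
t, reps = 3.0, 200_000

total = 0.0
for _ in range(reps):
    # S*_0 = U * D* with D* ~ Gamma(2, 1); for Exp(1) steps this is the
    # equilibrium delay, which is again Exp(1) in distribution.
    s = random.random() * (random.expovariate(1.0) + random.expovariate(1.0))
    while s <= t:
        total += math.exp(-(t - s))   # e^{-lambda(a)(t - S*_j)} with chi = 1
        s += random.expovariate(1.0)  # i.i.d. Exp(1) steps

estimate = total / reps
exact = 1.0 - math.exp(-t)  # (1/E(D)) * integral_0^t e^{-u} du, with E(D) = 1
print(estimate, exact)
```

For general step laws the identity holds only in expectation via point-stationarity; the Poisson case just makes it checkable in closed form.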
We now specialise to the offspring distribution μ n and apply the convergence results of Theorem 6.3 to prove Theorem 3.13: Proof of Theorem 3.13 By Lemma 6.9, for all a, t > 0, , (6.44) and, by Theorem 6.3, P ã (λ n (ã)S * 0 ≤ 1) → P(U E ≤ 1) and s n λ n (ã)E ã (D) → 1 uniformly in a ∈ A. Hence the uniform bound for m χ t (ã) follows. By the same reasoning, Lemma 6.10 yields a constant C < ∞ such that , an estimate that will be needed shortly.
For (3.20), fix ε > 0. Apply Lemma 6.8 with ε and a replaced by ε̃ = λ n (ã) −1 ε and ã, with the stopping time J n = inf{ j : S j − S * j ≤ ε̃} = inf{ j : λ n (ã)(S j − S * j ) ≤ ε}. By (6.15), we may choose K large enough that ∫ ∞ t−ε̃ λ n (ã)e −λ n (ã)z χ(z) dz ≤ χ ∞ ε whenever λ n (1)t ≥ K . By (6.16), it follows that the first terms in the upper and lower bounds of Lemma 6.8 are s n ∫ ∞ 0 λ n (ã)e −λ n (ã)z χ(z) dz + O(ε)s n χ ∞ . Therefore it is enough to show that the error term E ã is also O(ε)s n χ ∞ for λ n (1)t sufficiently large, uniformly in a ∈ A (the same proof will work for the term with S j ).
To prove this, observe that the variables (J n /s n , λ n (ã)S * J n ) n∈N,a∈A are tight. Indeed, the rescaled processes (λ n (ã)S s n t ) t≥0 , (λ n (ã)S * s n t ) t≥0 converge by Theorem 6.3 to independent Gamma processes (with different initial conditions). These limiting processes approach each other to within ε/2 at some random but finite time, and tightness follows.
Thus we may choose C large enough that the event A = {J n ≤ C s n } ∪ {λ n (ã)S * J n ≤ C} satisfies P ã (A c ) ≤ ε 2 . Using the Cauchy–Schwarz inequality and (6.45), By (6.15), the right-hand side is at most εs n χ ∞ , uniformly over a ∈ A, if λ n (1)t ≥ K with K sufficiently large. This completes the proof of (3.20). We turn to the estimates for M χ,η t,u (ã, b̃). In view of (6.8), apply Lemma 6.6 to h(y) = m
Regarding t 1 as fixed, this expresses the one-vertex characteristic s n χ (K ) n (t 1 , ·) as the difference of two uniformly bounded, non-negative, non-decreasing functions.

Proof of Theorem 3.15
Interchanging the roles of t 1 and t 2 in (7.2) and using Proposition 7.1, we can write ρ t 2 ,ã 2 in (7.5) as the difference of two bounded, non-negative, non-decreasing functions and Theorem 3.13 yields that 1 s nm is bounded. To show (3.28), Proposition 7.1 allows us to replace ρ t 2 ,ã 2 in (7.5) by I (λ n (ã 2 ·)), making an error of at most εs n . Since I can be written as the difference of two bounded, non-negative, non-decreasing functions, Theorem 3.13, (7.10) and the fact that ζ(r ) → ζ(a 2 /a 1 ) uniformly, yield the claim.  ( a, b) = E a 1 b 1 ,a 2 where now (S (1) j ) j and (S (2) j ) j are independent random walks and (S (i) j ) j has step distribution ν a i b i , i = 1, 2. Applying Lemma 6.6 twice and using the results from the first part of the proof, we obtain the desired conclusions.

The Effect of Truncation: Proof of Theorem 3.16
In this section, we control the effect of truncation and prove Theorem 3.16, by showing that the remainder χ n − χ (K ) n has a negligible first moment. We will write χ n = dμ n (y) y , where The same is true for χ (K ) n and μ (K ) n , so that, by (6.12) and the substitution x̃ = x 1/s n , We must therefore show that I 0 , I 1 , I 2 are uniformly bounded and can be made small by making K and λ n (1)[t 1 ∧ t 2 ] large. To this end, we will bound the two-vertex mean m y t (1) in terms of one-vertex means. Abbreviate Proof Theorem 3.13 and η (q) ∞ = 1 imply (7.18a). For (7.18d), use the representation (6.7) and note that, starting from the first index J for which η (2q) (t − S J ) = 0 (if one exists), the total number of indices j for which η (2q) (t − S j ) = 0 is stochastically bounded by the waiting time (starting from J ) until the second step where Y j > q. Then P 1 (D j ≤ q) ≤ f −1 n (q) proves (7.18d). For (7.18b)–(7.18c), we employ a size-biasing argument on the jump sizes D i . For i ≤ j, write S j,i = ∑ 1≤k≤ j,k≠i D k . We can therefore rewrite (6.7) (noting that the term j = 0 vanishes) as Finally, to prove (7.18b) we now reverse the argument that led to (7.21) by reintroducing a term D i . By (6.12), (1). Continuing from (7.21), we estimate (7.24), where in the last inequality we have used that As in the proof of Theorem 3.13, we use Lemma 6.9, Theorem 6.3 and the definition of ρ to obtain m By Condition 6.2, we have f −1 n (u) ≤ (u/ f n (1)) 1/ε 0 s n for u ≥ f n (1). Changing variables and using that ε 0 s n ≥ 1 for large n, we obtain according to (6.15). Hence m δ 0 )). The other factors in (7.24) are O(1/ f n (1)) because of (6.15), Condition 2.1, and the assumption t ≥ f n (1 − 1/s n ). This completes the proof of (7.18b).
Proof of Theorem 3.16 We will show that each of the terms I 0 , I 1 , I 2 in (7.13) is uniformly bounded, and furthermore can be made arbitrarily small by taking K large enough (for I 1 and I 2 ) and λ n (1)[t 1 ∧ t 2 ] large enough (for I 0 ). We begin with the term I 2 [i.e., x ≥ (1 + K /s n ) s n ]. Lemma 7.2, (7.18a) and (6.15) give (7.28); it follows that I 2 is uniformly bounded and can be made arbitrarily small by taking K , and hence (1 + K /s n ) s n , large enough, uniformly over t 1 , t 2 .
For I 1 , we again start by estimating m y t (1) where y = f n (x) with x̃ ∈ [1 − δ 0 , 1 − K /s n ]. Suppose for definiteness, and without loss of generality by symmetry, that t 1 ≤ t 2 . Split the first integral from Lemma 7.2 into three intervals, of which the first starts at 0 and the last is [t 1 − 4y, t 1 ] (noting that the integrand vanishes for r > t 1 ), and denote the corresponding summands by θ 11 n (y), θ 12 n (y) and θ 13 n (y). The second summand in Lemma 7.2 is called θ 14 n (y). The corresponding parts of I 1 are denoted by I 11 , . . . , I 14 .

First Points of Cox Processes: Proof of Theorem 3.29
Let X denote a topological space equipped with its Borel σ-field and let (P n ) n≥1 be a sequence of Cox processes on R × X with random intensity measures (Z n ) n≥1 . That is, there exist σ-fields F n such that Z n is F n -measurable and, conditionally on F n , P n is a Poisson point process with (random) intensity Z n . For instance, Theorem 3.27 expresses the first passage distance and hopcount in terms of the first point of a Cox process. In this section, we determine sufficient conditions under which the limiting distribution of the first points of P n can be identified from the intensity measures at fixed times t. This section is organised as follows. We start in Sect. 8.1 with preparations concerning weak convergence of Cox processes. In Sect. 8.2, we use these results to prove Theorem 3.29.

Preparations: Weak Convergence of Cox Processes
We will write P n,t for the measure defined by P n,t (·) = P n ((−∞, t] × ·), and given a partition t 0 < · · · < t N we abbreviate P n,i = P n,t i − P n,t i−1 ; similarly for Z n,t , Z n,i . Write |μ| for the total mass of a measure μ. Define and let A n,k be the event that T n, j ∉ {±∞} and P n,T n, j = j, for j = 1, . . . , k. That is, A n,k is the event that the points of supp P n with the k smallest t-values are uniquely defined. On A n,k , let X n,k denote the unique point for which P n ({T n,k } × {X n,k }) = 1, and otherwise set X n,k = †, an isolated cemetery point.
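As a toy instance of a Cox process and its first point: if the intensity is Λ times Lebesgue measure on [0, ∞) with a random rate Λ ~ Exp(1) (a hypothetical choice for illustration only), then the first-point time satisfies P(T₁ > t) = E(e^{−Λt}) = 1/(1 + t). A short Python check:

```python
import random

random.seed(4)
reps, t = 200_000, 1.0

count = 0
for _ in range(reps):
    lam = random.expovariate(1.0)  # random intensity Lambda ~ Exp(1)
    # Conditionally on Lambda, the process is Poisson, so T_1 | Lambda ~ Exp(Lambda).
    t1 = random.expovariate(lam) if lam > 0 else float("inf")
    if t1 > t:
        count += 1

tail = count / reps
print(tail, 1.0 / (1.0 + t))  # P(T_1 > 1) = E(e^{-Lambda}) = 1/2
```

Conditioning on the intensity and then sampling a Poisson process is exactly the two-stage structure assumed of (P_n, Z_n, F_n) above.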
We will impose the following conditions on the intensity measures (Z n ) n , expressed in terms of a probability measure Q on X and a family H of measurable functions h : X → R.

Condition 8.1 (Regularity of Cox process intensities)
(a) For any t ∈ R and for any h ∈ H, We make the convention that any function h on X is extended to X ∪ { †} by h( †) = 0.

Then
(a) the random sequence (X n, j ) ∞ j=1 converges in distribution [with respect to the product topology on (X ∪ {†}) N ] to a random sequence (X j ) ∞ j=1 , where the X j are independent with law Q; (b) the sequence (X n, j ) ∞ j=1 is asymptotically independent of F n ; (c) the collection {(T n, j ) k j=1 : n ∈ N} of random vectors is tight; and (d) (T j ) ∞ j=1 and (X j ) ∞ j=1 are independent. Proof of Theorem 8.3 assuming Proposition 8.2 Because of the product topology, it suffices to consider finite sequences (T n, j , X n, j ) k j=1 for a fixed k ∈ N. Applying (8.6) with g j (t) = 1 gives the convergence of (X n, j ) k j=1 . The independence of (T j ) k j=1 and (X j ) k j=1 follows from the product form of (8.6), and the asymptotic independence of X j from F n follows because of the conditional expectations in (8.6).
We first prove the following lemma. Given t 0 < · · · < t N , write B n,k for the event that there exist (random) integers 1 ≤ I 1 < · · · < I k ≤ N with P n,I j = 1 for j = 1, . . . , k and P n,t I k = k. (That is, B n,k is the event that each of the first k points of P n is the unique point in some interval (t i−1 , t i ]. In particular B n,k ⊂ A n,k .) Lemma 8.4 Assume Conditions 8.1 (b)–(c). Then, given ε > 0 and k ∈ N, there exist an interval [t, t] and a partition t = t 0 < · · · < t N = t of [t, t] such that lim inf n→∞ P(B n,k ) ≥ 1 − ε. In particular, P(A n,k ) → 1.
Proof Given a partition t = t 0 < · · · < t N = t, the complement B c n,k is the event that P n contains a point in (−∞, t), fewer than k points in (−∞, t], or more than one point in some interval (t i−1 , t i ]. By Conditions 8.1 (b)–(c), we may choose t, t such that the first two events each have probability at most ε/3 for n large. Since P(P n,i ≥ 2 | Z n ) = 1 − e −|Z n,i | (1 + |Z n,i |) ≤ |Z n,i | 2 , Condition 8.1 (c) gives a partition of [t, t] such that the third event also has probability at most ε/3 for n large.
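The Poisson estimate used above, P(Poisson(z) ≥ 2) = 1 − e^{−z}(1 + z) ≤ z², can be verified numerically:

```python
import math

def p_two_or_more(z):
    # P(Poisson(z) >= 2) = 1 - P(0 points) - P(1 point) = 1 - e^{-z}(1 + z)
    return 1.0 - math.exp(-z) * (1.0 + z)

# Check the bound 0 <= P(Poisson(z) >= 2) <= z^2 on a grid of rates z in (0, 10].
for z in (0.01 * k for k in range(1, 1001)):
    assert 0.0 <= p_two_or_more(z) <= z * z

print(p_two_or_more(0.1), 0.1 ** 2)
```

The bound holds for all z ≥ 0 since the derivative of 1 − e^{−z}(1 + z) is z e^{−z} ≤ 2z, the derivative of z².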

Proof of Proposition 8.2
Fix any ε > 0 and bounded, continuous functions g 1 , . . . , g k . Choose t 0 < · · · < t N as in Lemma 8.4. By taking a refinement, we may assume that g j (t) − g j (t i ) ≤ ε for each t ∈ (t i−1 , t i and each i, j. Define ψ(t) = t i if t i−1 < t ≤ t i and ψ(t) = t N otherwise, and setg j = g j • ψ. Partitioning according to the integers I j , 1 B n,k k j=1g j (T n, j )h j (X n, j ) = i 1 B n,k 1 where the sum is over i ∈ N k with 1 ≤ i 1 < · · · < i k ≤ N , and we write I = (I 1 , . . . , I k The right-hand side of (8.9) is bounded (since g j , h j , and | Z n,i j |e −| Z n,i j | are bounded) and, by Condition 8.1 (a), converges to 0 in probability, and hence also in expectation. By the choice of the partition, g j (T n, j ) − g j (T n, j ) ≤ ε on B n,k and lim sup n→∞ P(B c n,k ) ≤ ε. Now let (F n ) n be a uniformly bounded sequence of R-valued random variables such that F n is F n -measurable. Since all the functions involved are bounded, there exists C < ∞ such that When X = R d , another natural family is H = h( x) = e ξ · x : ξ ∈ R d . However, these functions are not bounded, so it is necessary to modify the argument of Proposition 8.2 and Theorem 8.3. Recall from (3.54) that we writeR for the moment generating function of a measure R on R d . Proof Fix any ε > 0, k ∈ N, g 1 , . . . , g k bounded, continuous functions, and choose t 0 < · · · < t N as in Lemma 8.4. By taking a refinement, we may assume that t i − t i−1 ≤ ε. Let C n be the event thatẐ n,t i ( ξ 0 ) ≤ Z n,t i Q ( ξ 0 ) + ε for each i = 1, . . . , N and for each ξ 0 ∈ {δ/ √ d, −δ/ √ d} d . By Condition 8.1 (a), P(C n ) → 1. Let X j be independent random variables with law Q, and defineX n, j = X n, j on B n,k ∩ C n andX n, j = X j otherwise. Recall the notations ψ(t),g j (t) from the proof of Proposition 8.2 and setT n, j = ψ(T n, j ). Set h j ( x) = e ξ j · x for ξ j ∞ ≤ δ/ √ d. 
By the argument of the previous proof, this time using that theX n, j have law Q on (B n,k ∩ C n ) c , we find E ⎛ ⎝ k j=1 g j (T n, j )h j (X n, j ) − k j=1 g j (T n, j )Q( ξ j ) F n ⎞ ⎠ = 1 C n i e − Z n,t i k ⎡ ⎣ k j=1 g j (t i j ) Z n,i j ( ξ j ) − k j=1 g j (t i j ) Z n,i j Q ( ξ j ) ⎤ ⎦ . (8.11) By Condition 8.1 (a), the right-hand side of (8.11) converges to 0 in probability. Moreover, by the bound e ξ j · x ≤ ξ 0 ∈{±δ/ √ d} d e ξ 0 · x and the choice of C n , it is bounded as well. Hence we may repeat the argument from the proof of Theorem 8.3 to find that (T n, j ,X n, j ) j satisfy the desired conclusions. But by construction, lim inf n→∞ P(X n, j =X n, j , |T n, j −T n, j | ≤ ε) ≥ 1 − ε. Since ε > 0 was arbitrary, it follows that X n, j and T n, j themselves have the same convergence properties.
Finally, let ε > 0 and a compact interval [t, t] be given. Expanding the interval if necessary, we may assume that t, t are dyadic rationals, and, decreasing ε if necessary, we may assume that t ≤ t(ε). Since q is continuous and non-decreasing, we may choose a partition t 0 < · · · < t N of [t, t] consisting of dyadic rationals such that (8.20) [it is enough to choose the partition finely enough that max i (q(t i ) − q(t i−1 )) ≤ q(t)/4z 0 (ε) 2 q(t)], and bound The latter sum is o P (1) by (8.19) with ξ = 0, and the remaining term is at most ε/2 on the event |Z * n,t(ε) | ≤ z 0 (ε) . This event has probability at least 1 − ε − o(1) by (8.15), which completes the proof.

Moment Estimates and the Cluster After Unfreezing
While most other sections did not rely on the companion paper [21], in this section we do rely on it heavily. In particular, we make use of [21, Lemma 6.4].
In this section we study B t for t ≥ T unfr , when the cluster resumes its CTBP behaviour. We will use moment methods to prove Lemma 3.32 and Theorem 3.31, completing the proof of our results.
This section is organised as follows. We start in Sect. 9.1 with some preparations concerning frozen intensity measures. In Sect. 9.2, we investigate the volume of the cluster after unfreezing and prove Lemma 3.32 (a). In Sect. 9.3 we use second moment methods on the collision edges to prove Theorem 3.31. In Sect. 9.4 we show that the collision edge whp does not originate from the frozen cluster, but rather from one of its descendants, to prove Lemma 3.32 (b). Finally, in Sect. 9.5, we study the freezing time and frozen cluster and use this to prove Theorem 3.18.
(9.6) According to Lemma 4.4, for any ε > 0 we can choose some K < ∞ such that, after possibly increasing n 0 , the right-hand side of (9.6) is bounded from above by B ( j) fr ε /s n . Since B ( j) fr = O P (s 2 n ) by Theorem 3.18 (b), the proof is complete.

A First Moment Estimate: Proof of Lemma 3.32 (a)
In this section we show how to express B t \ B fr , t ≥ T unfr , as a suitable union of branching processes. This representation leads to a simple proof of Lemma 3.32 (a). We will also use it in Sect. 9.3 to prove Theorem 3.31. Consider the immediate children v ∈ ∂B fr of individuals in the frozen cluster B fr . Then, for t ≥ 0, where B P (v) denotes the branching process of descendants of v, re-rooted and time-shifted as in (6.1). Furthermore, conditionally on B fr , the children v ∈ ∂B fr appear according to a Cox process. Formally, the point measures so that, by Theorem 3.13 and Lemma 9.1, there exists a K < ∞ such that for sufficiently large n, 1 {t≤t } K s n e −λ n (1)t dμ ( j) n,fr (t) ≤ K e K n s 3 n s n (s n +1). (9.10) Markov's inequality completes the proof.

Second Moment Estimates: Proof of Theorem 3.31
In this section we prove that P * n satisfies the assumptions of Theorem 3.29. Namely, we will split μ n = μ (K ) n + (μ n − μ (K ) n ) into the truncated measure and a remainder, as in Sect. 7. This induces a splitting of the intensity measure Z n into a truncated part Z (K ) n and a remainder, and hypothesis (a) will be verified using the estimates for the two-vertex characteristics χ (K ) n and χ n − χ (K ) n in Theorems 3.15 and 3.16. The remaining hypothesis (b) will be proved using a separate argument. Throughout the proof, the times t and t * are related as in (3.60), and we recall from (3.54) that, for a measure Q on R d , we write Q̂ for its moment generating function.