The multiplicative coalescent, inhomogeneous continuum random trees, and new universality classes for critical random graphs

One major open conjecture in the area of critical random graphs, formulated by statistical physicists and supported by a large amount of numerical evidence over the last decade [23, 24, 28, 63], is as follows: for a wide array of random graph models with degree exponent $\tau\in (3,4)$, distances between typical points, both within maximal components in the critical regime and on the minimal spanning tree of the giant component in the supercritical regime, scale like $n^{(\tau-3)/(\tau-1)}$. In this paper we study the metric space structure of maximal components of the multiplicative coalescent, in the regime where the sizes converge to excursions of L\'evy processes "without replacement" [10], yielding a completely new class of limiting random metric spaces. A by-product of the analysis yields the continuum scaling limit of one fundamental class of random graph models with degree exponent $\tau\in (3,4)$, where edges are rescaled by $n^{-(\tau-3)/(\tau-1)}$, yielding the first rigorous proof of the above conjecture. The limits in this case are compact "tree-like" random fractals with finite fractal dimensions and with a dense collection of hubs (infinite-degree vertices), a finite number of which are identified with leaves to form shortcuts. In a special case, we show that the Minkowski dimension of the limiting spaces equals $(\tau-2)/(\tau-3)$ a.s., in stark contrast to the Erd\H{o}s-R\'{e}nyi scaling limit, whose Minkowski dimension is 2 a.s. It is generally believed that dynamic versions of a number of fundamental random graph models, as one moves from the barely subcritical to the critical regime, can be approximated by the multiplicative coalescent. In work in progress, the general theory developed in this paper is used to prove analogous limit results for other random graph models with degree exponent $\tau\in (3,4)$.


INTRODUCTION AND RESULTS
In the last two decades many results regarding scaling limits of large discrete random objects to continuum analogs have been proved. Examples range from Aldous's continuum random tree [7,8,51], Schramm-Loewner evolution and critical planar systems [61], to what is most closely related to this paper: scaling limits of maximal components in the critical regime for random graphs as well as the minimal spanning tree on the giant component in the supercritical regime [3-5].
Motivated by empirical observations on real-world networks, in the last decade researchers from a wide array of fields, including computer science, the social sciences and statistical physics, have proposed a large number of random graph models to explain various functionals of real-world systems, including power-law degree distributions and small-world scaling of distances between nodes in the network [6,21,32,33,35,44,55,56]. Many of these models have a parameter t related to the edge density and a model-dependent critical point t_c. Writing n for the number of vertices in the network, if t < t_c then the maximal connected component C_1(n) has size that is negligible compared to n, while if t > t_c one has a giant component C_1(n) ∼ f(t)n for some positive model-dependent function f(t) > 0 for t > t_c. The t = t_c regime is often referred to as the critical regime. Just as the study of the classical critical Erdős-Rényi random graph spurred enormous activity in probabilistic combinatorics in the 90s [9,21,47,52,53], the study of these new random graph models and of new phenomena such as explosive percolation [2,60] has motivated a concerted effort to understand their critical regime.
In this context, for more than a decade [23,24,28,62], one of the fundamental open conjectures in this area (loosely stated) is as follows. Consider distances between typical points in the maximal component in the critical regime, or on the minimal spanning tree of the giant component in the supercritical regime. (a) If the random graph model has an asymptotic degree distribution with finite third moments, then distances scale like n^{1/3}. (b) If the random graph model has a limiting degree distribution (p_k)_{k≥1} with tail p_k ∼ C/k^τ for τ ∈ (3,4), then distances scale like n^{(τ−3)/(τ−1)}.

Contributions of this paper:
Since we will need to set up some notation before getting to the main results, let us give a general overview of the contributions of this paper: (i) General theory: The fundamental aim of the paper is to develop a general theory that can be used to prove (b) in the conjecture above for a wide class of random graphs and, in particular, to derive a new class of continuum scaling limits. To do so, we consider the multiplicative coalescent with entrance boundary in the space l_0 as in [10] (see (1.11) below). Viewing the maximal components as measured metric spaces (using graph distance and vertex weights), we show that these components, with edges and associated measures properly rescaled, converge to continuum random objects in the Gromov-weak sense. The limit objects are obtained via appropriate tilts and vertex identifications of inhomogeneous continuum random trees; untilted versions of the same objects have been used to describe the entrance boundary of the additive coalescent [13]. These random objects are "tree-like" but with a dense collection of "hubs" (corresponding to infinite-degree vertices).
(ii) Proof techniques: The standard technique in proving such results is to study height processes of certain spanning trees of the components and to show that these processes converge to limiting excursions that code the limiting random real trees. In our context, the convergence of height processes of the corresponding approximating p-trees is not known. In [11], the height processes of p-trees were shown to converge to limiting excursions in certain regimes, but these results are not applicable to our situation. Because of this, we develop new techniques relying on first showing convergence in Gromov-weak topology via a careful analysis of the tree spanning a finite collection of "typical" points in random "tilted" p-trees. In one fundamental class of random graph models, we then extend Gromov-weak convergence to Gromov-Hausdorff-Prokhorov convergence by proving a global lower mass-bound. (iii) Special case: As an example of the general theory, we study the special case of the Norros-Reittu model [57] (which in the regime of interest has been proven [46] to be equivalent to the Chung-Lu model [30] and the rank-one random graph [22]). In this case, we show that the limiting spaces are compact. We also show that the box-counting or Minkowski dimension equals (τ − 2)/(τ − 3) a.s.
In work in progress [19], we use the general theory in this paper to analyze another fundamental random graph model, the configuration model with degree distribution with exponent τ ∈ (3,4), and derive the continuum analogs of the maximal components of this model. We defer a more detailed discussion of related work and the relevance of the current study to Section 3.
Organization of the paper: A reasonable amount of notation regarding the entrance boundary of the multiplicative coalescent is required to describe the main results (Theorems 1.8 and 1.9). To ease the reader into the paper, we start in Section 1.1 with the special case of the Norros-Reittu model, and in Theorem 1.2 describe what the main results imply for this model. Then in Section 1.2 we define the multiplicative coalescent as well as the class of entrance boundaries of importance for the paper, and then describe the two main results. The results use two notions of convergence of metric spaces; these are given a precise formulation in Section 2.1. Section 2.2 describes an important class of random trees called p-trees and the corresponding inhomogeneous continuum random trees that arise as scaling limits of these objects. These are then used in Section 2.3 to give a precise description of the scaling limits of maximal components. We discuss the relevance of the main results, relate them to existing work and give an overview of the proofs in Section 3. The proofs of the main results are contained in Sections 4-7.
Notation: Throughout this paper we make use of the following standard notation. We let →^d denote convergence in distribution, and →^P convergence in probability. For a sequence of random variables (X_n)_{n≥1}, we write X_n = o_P(b_n) when |X_n|/b_n →^P 0 as n → ∞. For a nonnegative function n ↦ g(n), we write f(n) = O(g(n)) when |f(n)|/g(n) is uniformly bounded, and f(n) = o(g(n)) when lim_{n→∞} f(n)/g(n) = 0. Furthermore, we write f(n) = Θ(g(n)) if f(n) = O(g(n)) and g(n) = O(f(n)). We say that a sequence of events (E_n)_{n≥1} occurs with high probability (whp) when P(E_n) → 1.

Model formulation.
We start by describing a particular class of random graph models called the Poissonian random graph or the Norros-Reittu model [22,57], sometimes also referred to as the rank-one random graph model [22]. In the regime of interest for this paper, as shown in [46], this model is equivalent to the Chung-Lu model [29-32] and the Britton-Deijfen-Martin-Löf model [25]. Start with vertex set [n] := {1, 2, . . . , n} and suppose each vertex i ∈ [n] has a weight w_i ≥ 0 attached to it; intuitively this measures the propensity or attractiveness of this vertex in the formation of links. Writing w = (w_1, . . . , w_n), place an edge between i and j, independently for each pair i ≠ j ∈ [n], with probability

q_{ij} = q_{ij}(w) := 1 − exp(−w_i w_j / ℓ_n),   (1.1)

where ℓ_n is the total weight, ℓ_n := Σ_{i∈[n]} w_i. To complete the formulation, we need to specify how these vertex weights are chosen. Essentially we want the empirical distribution of weights n^{−1} Σ_{i∈[n]} δ_{w_i} to converge to a fixed pre-specified distribution F as n → ∞. There are a number of ways to do this, but for this paper the following choice turns out to be convenient for a clear statement of the results: let (w_i)_{i∈[n]} be constructed by setting w_i := [1 − F]^{−1}(i/n), where [1 − F]^{−1} denotes the generalized inverse. We will use W for a random variable with distribution F, and NR_n(w) to denote the corresponding random graph.
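To make the construction concrete, here is a minimal simulation sketch of NR_n(w) (not part of the paper's formal development); the function name, the choice τ = 3.5, ι = 1 and n = 200, and the quantile formula for an exact power law are all illustrative assumptions.

```python
import math
import random

def norros_reittu(w, seed=None):
    """Sample NR_n(w): connect i and j independently with
    probability q_ij = 1 - exp(-w_i * w_j / ell_n)."""
    rng = random.Random(seed)
    n = len(w)
    ell_n = sum(w)                                   # total weight
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < 1.0 - math.exp(-w[i] * w[j] / ell_n):
                edges.append((i, j))
    return edges

# Weights from the exact power law F(x) = 1 - (iota/x)^(tau-1), x > iota,
# via the quantile construction w_i = [1 - F]^{-1}(i/n) = iota * (n/i)^{1/(tau-1)}.
tau, iota, n = 3.5, 1.0, 200
w = [iota * (n / i) ** (1.0 / (tau - 1)) for i in range(1, n + 1)]
edges = norros_reittu(w, seed=42)
```

Note that vertex 1 receives the largest weight under this construction, so the heaviest vertices act as the hubs of the graph.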
1.1.2. Motivation and known results. As described in the introduction, one impetus for the formulation of a wide array of network models is to capture the heterogeneous and heavy-tailed nature of the degree distribution of empirical networks. Write N_k for the number of vertices with degree k in NR_n(w). Under the assumptions in the previous section, one can show [22, Theorem 3.13] that n^{−1} N_k converges in probability to the mixed Poisson probability P(Poi(W) = k), where W ∼ F. In particular, the degree distribution also has tail exponent τ. More important in the context of this paper is the connectivity threshold. For i ≥ 1, write C_i for the i-th largest connected component and let |C_i| denote its number of vertices. Now define the parameter

ν := E(W²)/E(W),   (1.5)

and note that ν < ∞ by (1.3). The parameter ν determines the phase transition of the model, with critical point ν = 1. The main aim of this paper is to understand the critical regime ν = 1, where also |C_1|/n →^P 0. In this setting, there are different universality classes depending on the vertex weights. In the Erdős-Rényi or weakly inhomogeneous universality class, critical clusters have size of order n^{2/3} and their metric space structure was discovered by Addario-Berry, Broutin and Goldschmidt [4]. Interestingly, when E(W³) < ∞, component sizes still scale like n^{2/3} [16], while assuming finite (6+ε)-moments the metric space structure of rank-one inhomogeneous random graphs is (apart from a trivial rescaling of size and time) the same [20]. However, in the strongly inhomogeneous regime where E(W³) = ∞, the scaling limits of critical clusters are dramatically different, in the sense that their sizes are given by n^{(τ−2)/(τ−1)}, where τ is the degree power-law exponent given by (1.3) [17,41]. In this paper we focus on their metric space structure, obtained after rescaling edges by n^{−(τ−3)/(τ−1)} and taking the limit as n → ∞. We show that the limiting metric space is compact and its Minkowski dimension equals (τ−2)/(τ−3), whereas the Erdős-Rényi scaling limit has Minkowski dimension 2.
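For the exact power law F(x) = 1 − (ι/x)^{τ−1}, x > ι, the moments E[W^k] = ι^k (τ−1)/(τ−1−k) (for k < τ−1) are available in closed form, so one can check directly which scale ι puts the model at criticality ν = 1. The following sketch works this arithmetic out; the function names and the value τ = 3.5 are ours, not the paper's.

```python
def moments_power_law(tau, iota):
    """First two moments of W with F(x) = 1 - (iota/x)^(tau-1), x > iota:
    E[W^k] = iota^k * (tau - 1) / (tau - 1 - k), valid for k < tau - 1."""
    ew = iota * (tau - 1) / (tau - 2)
    ew2 = iota ** 2 * (tau - 1) / (tau - 3)
    return ew, ew2

def nu(tau, iota):
    """Criticality parameter nu = E[W^2]/E[W] = iota * (tau - 2)/(tau - 3)."""
    ew, ew2 = moments_power_law(tau, iota)
    return ew2 / ew

tau = 3.5
iota_crit = (tau - 3) / (tau - 2)   # the scale that makes nu = 1
# For tau in (3, 4) the third moment E[W^3] = iota^3 (tau-1)/(tau-4) diverges,
# so this choice of F lies in the strongly inhomogeneous regime of the text.
```

With τ = 3.5 one gets ι_crit = 1/3, and ν(τ, ι_crit) = 1 as expected.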
Write NR_n(w(λ)) for the corresponding random graph and let C_i(λ) denote the corresponding i-th largest component. This critical scaling window was first identified and studied in [41], where it was shown that for every fixed λ ∈ R, both |C_1|/n^{(τ−2)/(τ−1)} and n^{(τ−2)/(τ−1)}/|C_1| are tight. The full distributional asymptotics of the component sizes were derived in [17], where it was shown that (|C_i(λ)|/n^{(τ−2)/(τ−1)})_{i≥1} converges in distribution, in the product topology on R^N, to (Z_i(λ))_{i≥1}, where (Z_i(λ) : i ≥ 1) are excursions away from zero of a special stochastic process described in more detail in Section 1.2.

Our results.
We make the following convention: for any metric measure space (S, d, µ) and a > 0, aS denotes the metric measure space (S, a d, µ), i.e., the space in which all distances are scaled by a and the measure remains unchanged.
Consider the random graph NR_n(w(λ)) and view each connected component C as a connected metric space via the usual graph distance, where each edge has length one. Further, we can view each connected component C as a metric measure space by assigning weight w_i/(Σ_{j∈C} w_j) to vertex i ∈ C; note that this normalization yields a probability measure on each connected component. Let S denote the space of (equivalence classes of) compact metric measure spaces. Note that distributions F that are exact power laws, i.e., of the form F(x) = 1 − (ι/x)^{τ−1} for x > ι and some τ ∈ (3,4), satisfy Assumption 1.1. The main result of this section is as follows: Theorem 1.2 (Scaling limits with degree exponent τ ∈ (3,4)). Fix λ ∈ R and consider the critical Norros-Reittu model NR_n(w(λ)), i.e., assume that ν = 1 where ν is as in (1.5). Assume that the limiting distribution F satisfies Assumption 1.1.
Then there exists an appropriate limiting sequence of random compact metric measure spaces M^nr_∞(λ) := (M^nr_i(λ))_{i≥1} such that the components in the critical regime satisfy, as n → ∞,

(n^{−(τ−3)/(τ−1)} C_i(λ))_{i≥1} →^d (M^nr_i(λ))_{i≥1}.   (1.7)

Here convergence is with respect to the product topology on S^N induced by the Gromov-Hausdorff-Prokhorov metric on each coordinate S. For each i ≥ 1, the limiting metric spaces have the following properties: (a) M^nr_i(λ) is a random compact metric measure space obtained by taking a random real tree T_i(λ) and identifying a random (finite) number of pairs of points (thus creating shortcuts).

1.2. Connectivity asymptotics for the multiplicative coalescent. In this section we consider a slightly more general setting than in Section 1.1. The motivation is as follows: recall that for the rank-one model, two vertices were connected essentially with probability proportional to the product of their weights. For probabilists, this connectivity pattern is quite reminiscent of the famous multiplicative coalescent [9,10,15]. Whilst interesting in its own right, its fundamental importance in the context of random graphs is as follows: a wide array of random graph models can be constructed in a dynamic fashion where, as time progresses, new edges are created between pre-existing clusters. Even though the merging dynamics between connected components tend to be quite different from those specified by the multiplicative coalescent, the mergers from the barely subcritical regime through the critical scaling window can be approximated by the multiplicative coalescent. This idea was exploited in [18] to prove universality of scaling limits in the critical regime for several random graph models. Thus components at criticality of a wide array of random graph models can be thought of as consisting of two major parts: (a) "Blobs", namely components formed in the barely subcritical regime.
(b) Edges formed between such blobs as the system proceeds from the barely subcritical regime through the critical scaling window.
The results below (in particular Theorem 1.8) specify how to handle the second aspect. In a companion paper we show how one can use macroscopic averaging of distances within blobs in random graph models such as the configuration model to show that these models also have the same scaling limit in the critical regime as in Theorem 1.2, in the setting where degrees obey power laws with exponent τ ∈ (3,4). Further, it will follow from Theorem 1.8 that the convergence in (1.7) holds with respect to the product topology induced by the Gromov-weak topology on each coordinate. Therefore, Theorem 1.2 can be partially recovered from the more general Theorem 1.8, at the expense of working with a weaker topology.
Before stating the result we need to define the multiplicative coalescent. The natural domain of this Markov process is the space

l²_↓ := {x = (x_1, x_2, . . .) : x_1 ≥ x_2 ≥ · · · ≥ 0, Σ_i x_i² < ∞}.   (1.9)

We will work in the simpler setup where the Markov process starts with a finite number of clusters, i.e., the process starts with x ∈ l²_↓ such that there exists n < ∞ with x_i = 0 for i > n. Write l²_↓(n) for the collection of such vectors. The Markov process (X(t))_{t≥0} with initial state X(0) = x evolves as follows: writing X(t) = (X_i(t))_{i≥1}, for i ≠ j, clusters i and j merge at rate X_i(t) · X_j(t) into a single cluster of size X_i(t) + X_j(t).
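The dynamics just described can be simulated directly for a finite starting vector; the following is a minimal event-driven sketch (our own illustrative code, with ten unit masses and t_end = 0.2 chosen arbitrarily).

```python
import random

def multiplicative_coalescent(x, t_end, seed=None):
    """Run the multiplicative coalescent started from the finite mass
    vector x: each pair of blocks (i, j) merges at rate x_i * x_j.
    Returns the block masses at time t_end, in decreasing order."""
    rng = random.Random(seed)
    blocks = list(x)
    s = 0.0
    while len(blocks) > 1:
        pairs = [(i, j) for i in range(len(blocks))
                 for j in range(i + 1, len(blocks))]
        weights = [blocks[i] * blocks[j] for i, j in pairs]
        s += rng.expovariate(sum(weights))      # waiting time for the next merger
        if s > t_end:
            break                               # no further merger before t_end
        i, j = rng.choices(pairs, weights=weights)[0]
        merged = blocks[i] + blocks[j]
        blocks = [b for k, b in enumerate(blocks) if k not in (i, j)]
        blocks.append(merged)
    return sorted(blocks, reverse=True)

# ten unit-mass blocks observed at time 0.2 (illustrative parameters)
final = multiplicative_coalescent([1.0] * 10, t_end=0.2, seed=1)
```

Total mass is conserved by every merger, and (in the spirit of Lemma 1.5 below) the law of the masses at time t can equivalently be sampled from a random graph with independent edges.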
Note that for any fixed time t > 0, it is easy to find the distribution of the masses X(t) via the following random graph. Definition 1.4 (Random graph G_n(x, t)). Consider the vertex set [n] := {1, 2, . . . , n} and assign weight x_i to vertex i. Now connect each pair of vertices i, j with i ≠ j, independently, with probability

q_{ij} := 1 − exp(−t x_i x_j).   (1.10)

Call this random graph G_n(x, t). For a connected component C ⊆ G_n(x, t), let mass(C) := Σ_{i∈C} x_i, and let (C_i(t))_{i≥1} denote the connected components arranged in decreasing order of their masses.
The following is obvious from the definition of the multiplicative coalescent: Lemma 1.5. For each fixed t ≥ 0, the masses of the multiplicative coalescent at time t, started from a finite number of initial clusters with masses x, satisfy (X_i(t))_{i≥1} =^d (mass(C_i(t)))_{i≥1}. Analogous to (1.9), consider the two spaces

l³_↓ := {c = (c_1, c_2, . . .) : c_1 ≥ c_2 ≥ · · · ≥ 0, Σ_i c_i³ < ∞},  l_0 := l³_↓ \ l²_↓.   (1.11)

These spaces turn out to be crucial in describing the entrance boundary of the eternal multiplicative coalescent in [10]. In the context of this paper, we are interested in studying scaling limits of the connected components of the random graph G_n(x, t) when the (suitably normalized) asymptotics of the weight vector x are described by a vector c ∈ l_0. For a vector y, let σ_r(y) := Σ_i y_i^r. We will make the following assumptions about the weight vector x := x(n) used to form the graph G_n(x, t); these place the associated graph in a particular entrance boundary of the associated eternal multiplicative coalescent [10, Proposition 7]. Assumption 1.6. For each n ≥ 1, let x^(n) = (x_i^(n) : 1 ≤ i ≤ n) be an initial finite-length vector belonging to l²_↓(n). Suppose that as n → ∞ there exists c ∈ l_0 such that

σ_3(x^(n))/(σ_2(x^(n)))³ → Σ_j c_j³,   (1.12)
x_j^(n)/σ_2(x^(n)) → c_j for j ≥ 1, and   (1.13)
σ_2(x^(n)) → 0.   (1.14)

Now let (ξ_j : j ≥ 1) be a sequence of independent exponential random variables, where ξ_j has rate c_j for each j ≥ 1. For fixed λ ∈ R, consider the process

V^c_λ(s) := λs + Σ_{j≥1} (c_j 1{ξ_j ≤ s} − c_j² s), s ≥ 0.   (1.15)

It turns out that this process is well defined precisely when c ∈ l³_↓ [10]. Consider the process "reflected at zero",

Ṽ^c_λ(s) := V^c_λ(s) − min_{0≤u≤s} V^c_λ(u),   (1.16)

and the excursions of Ṽ^c_λ(·) from zero. Aldous and Limic [10] showed that the lengths of these excursions are a.s. in l²_↓ precisely when c ∈ l_0, and thus can be arranged in decreasing order; write

Z(λ) := (Z_1(λ), Z_2(λ), . . .)   (1.18)

for this ordered sequence of excursion lengths. Then, under Assumption 1.6 and with t_n := λ + 1/σ_2(x^(n)), the masses of the connected components of the graph G_n(x, t_n) satisfy (mass(C_i(t_n)))_{i≥1} →^d Z(λ) with respect to the topology of l²_↓, where Z(λ) is as in (1.18). Now consider the connected components of G_n(x, t_n), and as before view each component C as a connected metric space via the usual graph distance, where each edge has length one.
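The reflected process and its excursion lengths can be approximated by truncating c to finitely many coordinates, since between jumps the (truncated) process moves at the constant negative drift λ − Σ_j c_j². The sketch below is our own illustration of this picture (the truncation level, the choice c_j = j^{−1/(τ−1)} with τ = 3.5, and all function names are assumptions, not from the paper).

```python
import random

def excursion_lengths(c, lam, t_max, seed=None):
    """Simulate a finite truncation of V(s) = lam*s + sum_j (c_j 1{xi_j <= s} - c_j^2 s),
    with xi_j ~ Exp(rate c_j), reflect it at its running minimum, and return the
    lengths of its excursions above zero, in decreasing order."""
    rng = random.Random(seed)
    drift = lam - sum(cj * cj for cj in c)
    if drift >= 0:
        raise ValueError("truncation too small: need sum c_j^2 > lam")
    jumps = sorted((rng.expovariate(cj), cj) for cj in c)
    jumps = [(tj, cj) for tj, cj in jumps if tj <= t_max]
    lengths, v, t, start = [], 0.0, 0.0, None   # v = reflected value
    for tj, cj in jumps:
        if v > 0.0:
            hit = t + v / (-drift)              # time the reflected process returns to 0
            if hit < tj:
                lengths.append(hit - start)     # excursion closes before the next jump
                v, start = 0.0, None
            else:
                v += drift * (tj - t)
        if v == 0.0:
            start = tj                          # a jump from 0 opens a new excursion
        v += cj
        t = tj
    if start is not None:
        lengths.append(t + v / (-drift) - start)   # close the final excursion
    return sorted(lengths, reverse=True)

# illustrative truncation: c_j = j^{-1/(tau-1)} with tau = 3.5, so c is in l3 but not l2
c = [j ** (-0.4) for j in range(1, 101)]
lengths = excursion_lengths(c, lam=1.0, t_max=10.0, seed=3)
```

The decreasingly ordered output plays the role of (a truncated stand-in for) Z(λ).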
Further, view each component C as a measured metric space by assigning mass x_i/mass(C) to each vertex i ∈ C. Let S_* denote the space of (equivalence classes of) measured metric spaces equipped with the Gromov-weak topology (see Section 2.1.2 for the definition), and view the resulting sequence of (suitably rescaled) components M_n(λ) := (M_{n,i}(λ) : i ≥ 1) as a random element of S_*^N. Our next result concerns the Gromov-weak convergence of M_n(λ).
Here weak convergence is on S_*^N, which is equipped with the natural product topology induced by the Gromov-weak topology on each coordinate S_*. Remark 1. A full description of the limit objects is given in Section 2.3. The limit objects use tilted versions of inhomogeneous continuum random trees, and checking compactness even of the original versions at this level of generality turns out to be quite intractable. However, as the next theorem shows, in the special case of relevance to the rank-one model one can prove much more.
For the special sequence c = c(α, τ) of relevance to the rank-one model, Theorem 1.9 shows that the spaces M^c_i(λ) are compact with Minkowski dimension (τ−2)/(τ−3) a.s. Consequently, the Hausdorff dimension satisfies the bound dim_h(M^c_i(λ)) ≤ (τ−2)/(τ−3) a.s. Remark 2. Since we are dealing with equivalence classes of metric spaces (see Sections 2.1.1 and 2.1.2), Theorem 1.9 should be understood as claiming the existence of representative spaces M^c_i(λ) that are compact and satisfy the stated conditions on the fractal dimensions. We will only work with these representative spaces throughout this paper.

DEFINITIONS AND LIMIT OBJECTS
2.1. Convergence of metric spaces. Proper notions of convergence of (measured) metric spaces are one of the central themes of this paper. Here we define the two topologies used in the statement of our results, mainly following [1,26,38,39].
Fix two metric spaces (X_1, d_1) and (X_2, d_2). A correspondence C between X_1 and X_2 is a measurable subset of X_1 × X_2 such that for every x_1 ∈ X_1 there exists at least one x_2 ∈ X_2 with (x_1, x_2) ∈ C, and vice versa. The distortion of C is dis(C) := sup{|d_1(x_1, x_1') − d_2(x_2, x_2')| : (x_1, x_2), (x_1', x_2') ∈ C}. The Gromov-Hausdorff distance between the two metric spaces (X_1, d_1) and (X_2, d_2) is defined as

d_GH(X_1, X_2) := (1/2) inf{dis(C) : C is a correspondence between X_1 and X_2}.
Suppose (X_1, d_1) and (X_2, d_2) are two metric spaces with distinguished points p_1 ∈ X_1 and p_2 ∈ X_2. Then the pointed Gromov-Hausdorff distance between X_1 := (X_1, d_1, p_1) and X_2 := (X_2, d_2, p_2) is given by

d_GH^pt(X_1, X_2) := (1/2) inf{dis(C) : C is a correspondence between X_1 and X_2 with (p_1, p_2) ∈ C}.   (2.1)

We will need a metric that also keeps track of associated measures on the corresponding spaces. A compact measured metric space (X, d, µ) is a compact metric space (X, d) with an associated probability measure µ on the Borel sigma-algebra B(X). Given two compact measured metric spaces (X_1, d_1, µ_1) and (X_2, d_2, µ_2) and a measure π on the product space X_1 × X_2, the discrepancy of π with respect to µ_1 and µ_2 is defined as

D(π; µ_1, µ_2) := ||π_1 − µ_1|| + ||π_2 − µ_2||,

where π_1, π_2 are the marginals of π and || · || denotes the total variation distance between probability measures. Then the Gromov-Hausdorff-Prokhorov distance between X_1 and X_2 is defined as

d_GHP(X_1, X_2) := inf{max((1/2) dis(C), D(π; µ_1, µ_2), π(C^c))},   (2.2)

where the infimum is taken over all correspondences C and measures π on X_1 × X_2. Similarly to (2.1), we can define a pointed Gromov-Hausdorff-Prokhorov distance d_GHP^pt between two metric measure spaces X_1 and X_2 having two distinguished points p_1 and p_2 respectively, by taking the infimum in (2.2) over all correspondences C and measures π on X_1 × X_2 such that (p_1, p_2) ∈ C.
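For finite metric spaces the infimum over correspondences can be computed by brute force, which makes the definition concrete; this sketch is purely illustrative (exponential in the number of points, so only tiny spaces are feasible) and is not how distances are handled in the paper.

```python
from itertools import combinations, product

def gh_distance(d1, d2):
    """Brute-force Gromov-Hausdorff distance between two finite metric
    spaces given by distance matrices d1, d2: half the smallest distortion
    dis(C) = max |d1(x, x') - d2(y, y')| over all correspondences C."""
    n1, n2 = len(d1), len(d2)
    cells = list(product(range(n1), range(n2)))
    best = float("inf")
    # enumerate every subset of X1 x X2 and keep those that cover both sides
    for r in range(max(n1, n2), len(cells) + 1):
        for C in combinations(cells, r):
            if ({x for x, _ in C} == set(range(n1))
                    and {y for _, y in C} == set(range(n2))):
                dis = max(abs(d1[x][xp] - d2[y][yp])
                          for (x, y) in C for (xp, yp) in C)
                best = min(best, dis)
    return best / 2.0
```

For example, two two-point spaces with interpoint distances 2 and 4 are at Gromov-Hausdorff distance 1.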
Write S for the collection of all measured compact metric spaces (X, d, µ). The function d_GHP is a pseudometric on S, and defines an equivalence relation X ∼ Y ⇔ d_GHP(X, Y) = 0 on S. Let S̄ := S/∼ be the space of isometry-equivalence classes of measured compact metric spaces, and d̄_GHP the induced metric. Then by [1], (S̄, d̄_GHP) is a complete separable metric space. To ease notation, we will continue to use (S, d_GHP) instead of (S̄, d̄_GHP), and X = (X, d, µ) to denote both the metric space and the corresponding equivalence class.

Gromov-weak topology.
Here we mainly follow [38]. Introduce an equivalence relation on the space of complete and separable metric spaces that are equipped with a probability measure on the associated Borel σ-algebra by declaring two such spaces (X 1 , d 1 , µ 1 ) and (X 2 , d 2 , µ 2 ) to be equivalent when there exists an isometry ψ : support(µ 1 ) → support(µ 2 ) such that µ 2 = ψ * µ 1 := µ 1 • ψ −1 , i.e., the push-forward of µ 1 under ψ is µ 2 . Write S * for the associated space of equivalence classes. As before, we will often ease notation by not distinguishing between a metric space and its equivalence class.
Fix m ≥ 2 and a complete separable metric space (X, d). Given a collection of points x_1, . . . , x_m ∈ X, let D(x_1, . . . , x_m) := (d(x_i, x_j))_{1≤i,j≤m} denote the symmetric matrix of pairwise distances between them. A function Φ : S_* → R is called a polynomial of degree m if there exists a bounded continuous function φ : R_+^{m×m} → R such that

Φ((X, d, µ)) := ∫ φ(D(x_1, . . . , x_m)) µ^{⊗m}(dx_1, . . . , dx_m).

Here µ^{⊗m} is the m-fold product measure of µ. Let Π denote the space of all polynomials on S_*. Definition 2.1 (Gromov-weak topology). A sequence (X_n, d_n, µ_n)_{n≥1} ∈ S_* is said to converge to (X, d, µ) ∈ S_* in the Gromov-weak topology if and only if Φ((X_n, d_n, µ_n)) → Φ((X, d, µ)) for all Φ ∈ Π.
In [38,Theorem 1] it is shown that S * is a Polish space under the Gromov-weak topology.
It is also shown that, in fact, this topology can be completely metrized using the so-called Gromov-Prokhorov metric.
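A polynomial is simply the expectation of a bounded function of the distance matrix of i.i.d. samples from µ, so on a finite space it can be evaluated exactly and checked against a Monte Carlo estimate. The sketch below (our own illustration, with an assumed three-point path space and φ(t) = e^{−t}) makes this concrete for degree m = 2.

```python
import math
import random

def polynomial_exact(d, mu, phi):
    """Degree-2 polynomial Phi(X, d, mu) = sum_{x1,x2} mu(x1) mu(x2) phi(d(x1,x2))
    on a finite metric measure space (distance matrix d, probability weights mu)."""
    n = len(mu)
    return sum(mu[i] * mu[j] * phi(d[i][j]) for i in range(n) for j in range(n))

def polynomial_mc(d, mu, phi, samples=20000, seed=0):
    """Monte Carlo version: average phi over pairs sampled i.i.d. from mu."""
    rng = random.Random(seed)
    idx = range(len(mu))
    total = 0.0
    for _ in range(samples):
        i, = rng.choices(idx, weights=mu)
        j, = rng.choices(idx, weights=mu)
        total += phi(d[i][j])
    return total / samples

# path metric on three points, uniform measure, phi(t) = exp(-t)
d = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
mu = [1.0 / 3.0] * 3
phi = lambda t: math.exp(-t)
```

Gromov-weak convergence asks exactly that all such sampled-distance-matrix expectations converge.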

Spaces of trees with edge lengths, leaf weights and root-to-leaf measures.
In the proofs of the main results we need the following two spaces built on top of the space of discrete trees. The first space, T_{IJ}, was formulated in [12,13], where it was used to study trees spanning a finite number of random points sampled from an inhomogeneous continuum random tree (as described in the next section); we use the same notation in this paper. A tree t ∈ T_{IJ} can be viewed as being composed of two parts: (1) shape(t), describing the shape of the tree (including the labels of leaves and hubs) but ignoring edge lengths. The set T^shape_{IJ} of all possible shapes is obviously finite for fixed I, J.
(2) The edge lengths l(t) := (l_e : e ∈ t). Consider the product topology on T_{IJ} consisting of the discrete topology on T^shape_{IJ} and the product topology on R^m, where m is the number of edges of t. We will need a slightly more general space, T*_{IJ}. Along with the attributes above of trees in T_{IJ}, the trees in this space have two additional properties. Let L(t) := {1+, . . . , J+} denote the collection of non-root leaves in t. Then every leaf v ∈ L(t) carries the following attributes: a leaf weight and a root-to-leaf measure. In addition to the topology on T_{IJ}, the space T*_{IJ} with these two additional attributes inherits the product topology on R^J owing to the leaf weights, and (d_GHP^pt)^J owing to the root-to-leaf measures.
For consistency, we add to the spaces T I J and T * I J a conventional state ∂. Its use will be clear later on.

Random p-trees and inhomogeneous continuum random trees (ICRTs).
For fixed m ≥ 1, write T_m and T_m^ord for the collections of all rooted trees with vertex set [m] and of rooted ordered trees with vertex set [m], respectively. Here we view a rooted tree as directed, with the root being the original progenitor and each edge directed from child to parent. An ordered rooted tree is a tree in which the children of each individual are assigned an order (describing, for example, orientation in a planar embedding, say right to left, or some notion of age, say oldest to youngest).
In this section, we define a family of random tree models called p-trees [27,59], and their corresponding limits, the so-called inhomogeneous continuum random trees, which play a key role both in describing the limit metric spaces and in the proofs. Fix m ≥ 1 and a probability mass function p = (p_1, p_2, . . . , p_m) with p_i > 0 for all i ∈ [m]. A p-tree is a random tree in T_m with the following law: for a fixed t ∈ T_m and v ∈ t, write d_v(t) for the number of children of v in the tree t. Then the law of the p-tree, denoted by P_tree, is given by

P_tree(t) = P_tree(t; p) := Π_{v∈[m]} p_v^{d_v(t)}, t ∈ T_m.

Generating a random p-tree T ∼ P_tree and then assigning a uniform random order to the children of every vertex v ∈ T gives a random element of T_m^ord with law P_ord(·; p) given by

P_ord(t) = P_ord(t; p) := Π_{v∈[m]} p_v^{d_v(t)}/(d_v(t)!), t ∈ T_m^ord.

Obviously a p-tree can be constructed by first generating an ordered p-tree with the above distribution and then forgetting the order.
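That Π_v p_v^{d_v(t)} defines a probability distribution over rooted labeled trees (a generalized Cayley formula: the weights sum to (Σ_v p_v)^{m−1} = 1) can be checked exhaustively for small m by enumerating trees via Prüfer sequences. The sketch below is our own verification harness, not part of the paper.

```python
import heapq
import math
from itertools import product

def prufer_to_edges(seq, m):
    """Decode a Prufer sequence into the edge list of a labeled tree on {0,...,m-1}."""
    degree = [1] * m
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(m) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

def children_counts(edges, root, m):
    """Number of children of each vertex when the tree is rooted at `root`."""
    adj = [[] for _ in range(m)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    d = [0] * m
    stack, seen = [root], {root}
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                d[u] += 1
                stack.append(v)
    return d

# sum of prod_v p_v^{d_v(t)} over all m^{m-1} rooted labeled trees should be 1
m = 4
p = [0.4, 0.3, 0.2, 0.1]
total, count = 0.0, 0
for seq in product(range(m), repeat=m - 2):
    edges = prufer_to_edges(list(seq), m)
    for root in range(m):
        d = children_counts(edges, root, m)
        total += math.prod(p[v] ** d[v] for v in range(m))
        count += 1
```

Here count = 4³ = 64 rooted trees, and total comes out to 1 up to floating-point error.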
In a series of papers [11-13] it was shown that p-trees, under various assumptions, converge to inhomogeneous continuum random trees, which we now describe. Recall the space l²_↓ in (1.9). Consider the subset Θ ⊂ l²_↓ given by Θ := {θ ∈ l²_↓ : Σ_i θ_i = ∞}. Now recall from [37,51] that a real tree is a metric space (T, d) that satisfies the following for every pair a, b ∈ T: (a) there is a unique isometric map f_{a,b} : [0, d(a, b)] → T with f_{a,b}(0) = a and f_{a,b}(d(a, b)) = b; (b) for every continuous injective map q : [0, 1] → T with q(0) = a and q(1) = b, one has q([0, 1]) = f_{a,b}([0, d(a, b)]).

Construction of the ICRT:
Given θ ∈ Θ, we will now define the inhomogeneous continuum random tree T^θ_{(∞)}, mainly following the notation in [13]. Assume that we are working on a probability space (Ω, F, P_θ) rich enough to support the following: (a) For each i ≥ 1, let P_i := (ξ_{i,1}, ξ_{i,2}, . . .) be a rate-θ_i Poisson process, independent across i. The first point ξ_{i,1} of each process is special and is called a joinpoint, whilst the remaining points ξ_{i,j} with j ≥ 2 will be called i-cutpoints [13]. (b) Independently of the above, let U = (U^{(i)}_j : j ≥ 1, i ≥ 1) be a collection of i.i.d. uniform (0,1) random variables. These are not required to construct the tree but will be used to define a certain function on it.
The random real tree (with marked vertices) T^θ_{(∞)} is then constructed as follows: (i) Arrange the cutpoints {ξ_{i,j} : i ≥ 1, j ≥ 2} in increasing order as 0 < η_1 < η_2 < · · ·; the assumption Σ_i θ_i² < ∞ implies that this is possible. For every cutpoint η_k = ξ_{i,j}, let η*_k := ξ_{i,1} be the corresponding joinpoint. (ii) Next, build the tree inductively: start with the branch [0, η_1] and, inductively, assuming step k has been completed, attach the branch (η_k, η_{k+1}] at the joinpoint η*_k corresponding to η_k. Write T^θ_0 for the tree obtained once all the branches [0, η_1], {(η_k, η_{k+1}] : k ≥ 1} have been used. Note that for every i ≥ 1 the joinpoint ξ_{i,1} corresponds to a vertex of infinite degree; label this vertex i. The ICRT T^θ_{(∞)} is the completion of the marked metric tree T^θ_0. As argued in [13, Section 2], this is a real tree as defined above, which can be viewed as rooted at the vertex corresponding to zero. We call the vertex corresponding to joinpoint ξ_{i,1} hub i. Since Σ_i θ_i = ∞, one can check that hubs are almost everywhere dense in T^θ_{(∞)}.
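A truncated version of this line-breaking construction (finitely many hub rates, a finite time horizon) can be simulated directly; the following sketch is our own illustration, with θ_i = 1/i truncated at 30 terms and t_max = 5 as arbitrary choices. The key structural fact it exhibits is that each joinpoint precedes its cutpoints, so every new branch is attached inside the part of the tree already built.

```python
import random

def icrt_line_breaking(theta, t_max, seed=None):
    """Truncated line-breaking sketch of the ICRT: for each i, sample a
    rate-theta_i Poisson process on [0, t_max]; its first point is the
    joinpoint of hub i and its later points are cutpoints.  The half-line
    is cut at the ordered cutpoints and each branch is attached at the
    joinpoint corresponding to the cutpoint that opened it."""
    rng = random.Random(seed)
    cutpoints = []                      # (cutpoint position, joinpoint of same process)
    for th in theta:
        t, pts = 0.0, []
        while True:
            t += rng.expovariate(th)
            if t > t_max:
                break
            pts.append(t)
        if len(pts) >= 2:
            for c in pts[1:]:
                cutpoints.append((c, pts[0]))
    cutpoints.sort()
    branches = []                       # (start, end, attachment position)
    prev, attach = 0.0, 0.0             # first branch [0, eta_1] grows from the root
    for eta, join in cutpoints:
        branches.append((prev, eta, attach))
        prev, attach = eta, join
    branches.append((prev, t_max, attach))
    return branches

# illustrative truncation of theta: sum theta_i^2 finite, partial sums of theta_i growing
branches = icrt_line_breaking([1.0 / i for i in range(1, 31)], t_max=5.0, seed=7)
```

Each branch attaches at a position strictly inside the previously built length, which is exactly why the inductive construction is well defined.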

Remark 3.
The uniform random variables (U^{(i)}_j : j ≥ 1, i ≥ 1) give rise to a natural ordering on T^θ_{(∞)} (or a planar embedding of T^θ_{(∞)}) as follows. For i ≥ 1, let (T^{(i)}_j : j ≥ 1) be the collection of subtrees hanging off the i-th hub. Associate U^{(i)}_j with the subtree T^{(i)}_j, and think of T^{(i)}_{j_1} as appearing "to the right of" T^{(i)}_{j_2} whenever U^{(i)}_{j_1} < U^{(i)}_{j_2}. This is the natural ordering on T^θ_{(∞)} when it is viewed as a limit of ordered p-trees. We can think of the pair (T^θ_{(∞)}, U) as the ordered ICRT. Reduced tree r^{(∞)}_{IJ}: Fix I ≥ 0 and J ≥ 1. Let η_0 = 0 and for j ≥ 0 call the vertex corresponding to η_j the j-th sampled leaf, labeled j+ to differentiate it from hub j. The subtree of T^θ_{(∞)} spanning the sampled leaves 0+, . . . , J+ together with the first I hubs can be viewed as a random element of T_{IJ}; write r^{(∞)}_{IJ} for this reduced tree. Mass measure: For every vertex v ∈ T^θ_{(∞)}, define the degree of v to be the number of connected components of T^θ_{(∞)} \ {v}. Vertices of degree one are called leaves of T^θ_{(∞)}, and all other vertices form the skeleton of the tree. Let L(T^θ_{(∞)}) denote the set of leaves of T^θ_{(∞)}. In [13] it was shown that one can associate with T^θ_{(∞)} a natural probability measure µ, called the mass measure, satisfying µ(L(T^θ_{(∞)})) = 1. Root-to-vertex path measures: Now, using the collection of uniform random variables above, we will define a function G^{(∞)} on the tree as well as a collection of measures on paths emanating from the root. Recall that the hubs in T^θ_{(∞)} have infinite degrees. Let (T^{(i)}_j : j ≥ 1) be the collection of subtrees of hub i in T^θ_{(∞)} (labeled in some fashion). For each y ∈ T^θ_{(∞)}, let

G^{(∞)}(y) := Σ_{i≥1} Σ_{j≥1} θ_i U^{(i)}_j 1{y ∈ T^{(i)}_j}.   (2.7)

We will show in our proof that G^{(∞)}(y) is finite for almost every realization of T^θ_{(∞)} and for µ-almost every y ∈ T^θ_{(∞)} (see Lemma 4.9 and Theorem 4.15 below). For y ∈ T^θ_{(∞)}, let [ρ, y] denote the path from the root ρ to y. For every y, define a probability measure Q^{(∞)}_y on [ρ, y] by

Q^{(∞)}_y({v}) := θ_i U^{(i)}_j / G^{(∞)}(y), if v is the i-th hub and y ∈ T^{(i)}_j for some j.   (2.8)

Thus, this probability measure is concentrated on the hubs on the path from y to the root.

Remark 4.
Note that both $G^{(\infty)}(\cdot)$ and $Q^{(\infty)}_y(\cdot)$ depend on the realization of the pair $(\mathcal{T}^{(\infty)}_\theta, U)$, but we suppress this dependence to avoid cumbersome notation.

Random tree $R^{(\infty)}_{IJ}$: Recall the tree $r^{(\infty)}_{IJ}$ above, and recall that $\eta_j$ is the vertex in the tree $\mathcal{T}^{(\infty)}_\theta$ corresponding to leaf $j+$ for $1\leq j\leq J$. To each of these $J$ leaves, associate the value $G^{(\infty)}(\eta_j)$, and associate the probability measure $Q^{(\infty)}_{\eta_j}$ to the path $[0+, j+]$. This tree is a random element of the space $\mathbf{T}^*_{IJ}$ (see Section 2.1.3), which we denote by $R^{(\infty)}_{IJ}$.

Continuum limits of components.
The aim of this section is to give an explicit description of the limiting (random) metric spaces in Theorem 1.8. We start by constructing a specific tilted version of the ICRT in Section 2.3.1. Then in Section 2.3.2 we describe the limits of maximal components.
2.3.1. Tilted ICRTs and vertex identification. Let $(\Omega, \mathcal{F}, \mathbb{P}_\theta)$ and $\mathcal{T}^{(\infty)}_\theta$ be as in Section 2.2, and let $\gamma > 0$ be a constant. Informally, the construction goes as follows: we first tilt the distribution of the original ICRT $\mathcal{T}^{(\infty)}_\theta$ using the functional $L^{(\infty)}$ to get a tilted tree $\mathcal{T}^{(\infty)}_{\theta,\star}$. We then generate a random but finite number $N^{(\infty)}$ of pairs of points $(x_k, y_k) : 1\leq k\leq N^{(\infty)}$. The final metric space is obtained by creating "shortcuts", identifying the points $x_k$ and $y_k$. Formally, the construction proceeds in four steps: (a) Tilted ICRT: Define $\mathbb{P}^\star_\theta$ on $\Omega$ by $d\mathbb{P}^\star_\theta/d\mathbb{P}_\theta = L^{(\infty)}/\mathbb{E}_\theta[L^{(\infty)}]$.
The expectation in the denominator is with respect to the original measure $\mathbb{P}_\theta$; in our proof we will show that this quantity is finite. Write $(\mathcal{T}^{(\infty)}_{\theta,\star}, \mu^\star)$ and $U^\star = (U^{(i),\star}_j : i, j\geq 1)$ for the tree and the mass measure on it, and the associated random variables, under this change of measure. (b) Poisson number of identification points: Conditionally on $((\mathcal{T}^{(\infty)}_{\theta,\star}, \mu^\star), U^\star)$, generate $N^{(\infty)}$ having a Poisson$(\Lambda_{(\infty)})$ distribution, where $\Lambda_{(\infty)} := \gamma\int G^{(\infty)}(y)\,\mu^\star(dy)$. Here, $(T^{(i),\star}_j : j\geq 1)$ denotes the collection of subtrees of hub $i$ in $\mathcal{T}^{(\infty)}_{\theta,\star}$. (As mentioned in Remark 4, $G^{(\infty)}(\cdot)$ depends on the realization of the ordered ICRT; $U^{(i),\star}_j$ appears in the expression above since the function $G^{(\infty)}$ acts on $y\in\mathcal{T}^{(\infty)}_{\theta,\star}$, for which the associated order is described by $U^\star$.) (c) "First" endpoints (of shortcuts): Conditionally on (a) and (b), sample $x_k$ from $\mathcal{T}^{(\infty)}_{\theta,\star}$ with density proportional to $G^{(\infty)}(x)\mu^\star(dx)$, independently for $1\leq k\leq N^{(\infty)}$.

(d) "Second" endpoints (of shortcuts) and identification:
Having chosen $x_k$, choose $y_k$ from the path $[\rho, x_k]$ joining the root $\rho$ and $x_k$ according to the probability measure $Q^{(\infty)}_{x_k}$. Identify $x_k$ and $y_k$, i.e., form the quotient space by introducing the equivalence relations $x_k\sim y_k$ for $1\leq k\leq N^{(\infty)}$.

Definition 2.2.
Fix $\gamma\geq 0$ and $\theta\in\Theta$ as in (2.6). Let $G_\infty(\theta, \gamma)$ be the metric measure space constructed via the four steps above, equipped with the measure inherited from the mass measure on $\mathcal{T}^{(\infty)}_{\theta,\star}$.
In our proofs, we will always think of the leaf end (of a shortcut or a surplus edge) as the first endpoint, and the second endpoint will be selected from the skeleton.
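Step (d) of the construction is a metric quotient: identifying $x_k$ with $y_k$ amounts to adding a zero-length shortcut and recomputing shortest paths. On a finite discretization this can be made concrete with a generic all-pairs shortest-path pass (a sketch of the general idea; the function and data layout are ours, not the paper's).

```python
def quotient_metric(d, pairs):
    """Quotient (pseudo)metric after identifying x ~ y for each pair:
    equivalent to adding a zero-length shortcut edge and recomputing
    shortest paths (Floyd-Warshall on the finite space)."""
    n = len(d)
    q = [row[:] for row in d]
    for x, y in pairs:
        q[x][y] = q[y][x] = 0.0          # identification = zero-length edge
    for k in range(n):                    # Floyd-Warshall relaxation
        for i in range(n):
            for j in range(n):
                if q[i][k] + q[k][j] < q[i][j]:
                    q[i][j] = q[i][k] + q[k][j]
    return q

# Four points on a unit-spaced line; identifying the two endpoints
# creates a "cycle", shortening distances through the shortcut.
d = [[float(abs(i - j)) for j in range(4)] for i in range(4)]
q = quotient_metric(d, [(0, 3)])
```

Identified points end up at distance zero, so the result is a pseudometric on the original point set; passing to equivalence classes gives the quotient metric space.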

Limits of the components.
Fix $\lambda\in\mathbb{R}$ and $\mathbf{c}\in l^0$ as in (1.11), and consider the setting of Theorem 1.8. We will need two main objects: (a) The process $\tilde{V}^{\mathbf{c}}_\lambda(\cdot)$ in (1.16). Recall that the excursions of this process from zero can be arranged in increasing order of lengths as $\mathcal{Z}(\lambda)$. Abusing notation, let $\Xi^{(i)} = (c_j : \xi_j\in\mathcal{Z}_i)$ denote the point process of jumps of the process $\tilde{V}^{\mathbf{c}}_\lambda(\cdot)$ corresponding to the excursion $\mathcal{Z}_i(\lambda)$. (b) The actual lengths of these excursions, $(Z_i(\lambda) : i\geq 1)$, as in (1.18).
From these objects, for each fixed $i\geq 1$, define the random variable $\bar\gamma^{(i)}$ and the point process $\bar\theta^{(i)}$ as in (2.10). Our proof (see Proposition 5.1) will imply that $\bar\theta^{(i)}\in\Theta$ as in (2.6) a.s. Generate the random metric measure spaces $M^{\mathbf{c}}_i(\lambda) := G_\infty(\bar\theta^{(i)}, \bar\gamma^{(i)})$, where $G_\infty(\bar\theta, \bar\gamma)$ is as described in Section 2.3.1, and the metric spaces are conditionally independent across $i$ given the driving parameters in (2.10). Let $\mathbf{M}^{\mathbf{c}}_\infty(\lambda) := (M^{\mathbf{c}}_i(\lambda) : i\geq 1)$. This is the limiting collection of metric spaces in Theorem 1.8.
To describe the sequence of spaces $\mathbf{M}^{\mathrm{nr}}_\infty(\lambda)$ appearing in Theorem 1.2, define the corresponding driving parameters, where $W$ is a random variable with distribution $F$ as in (1.3); $\mathbf{M}^{\mathrm{nr}}_\infty(\lambda)$ is then constructed from these parameters as above.

DISCUSSION
We describe the two major motivations for developing the general theory of this paper in Sections 3.1 and 3.2. In Sections 3.3 and 3.4, we include a brief discussion of ICRTs and give an overview of the order in which the proofs are carried out.

Universality and domains of attraction of critical random graph models.
One natural question the reader might ask at this point: why the general theory of Section 1.2, why not just stick to the rank-one random graph model as in Section 1.1? As described in the introduction, the aim of this paper is the development of general theory applicable to a wide array of models. What does one mean by this? It turns out that many different random graph models can be constructed in a dynamic fashion as a graph-valued process $\{G_n(t) : t\geq 0\}$, where edges are added as time advances, thus resulting in mergers of components as $t\uparrow t_c$. In this construction, there is a critical time $t_c$ (model-dependent) such that the giant component emerges after time $t_c$. Now for most random graph models (including the configuration model), the dynamics of mergers of components starting at time zero do not look like the multiplicative coalescent. However, if one zooms in at the critical time $t_c$, then for many models there exists $\varepsilon_n\downarrow 0$ such that on the interval $[t_c-\varepsilon_n, t_c+\varepsilon_n]$, mergers of components can be approximated by the multiplicative coalescent. Here $t_c-\varepsilon_n$ often corresponds to the barely subcritical regime of the random graph. Thus, if one has good control over component functionals at the barely subcritical time $t_c-\varepsilon_n$, and in particular if one can show that component sizes, appropriately normalized, satisfy Assumption 1.6, then one can use Theorem 1.8 to derive convergence of the maximal components at the critical time $t_c$. Note that one does not expect component sizes at time $t_c-\varepsilon_n$ to satisfy the assumptions of the Norros-Reittu model in (1.4). Rather, in most cases, at time $t_c-\varepsilon_n$ the expected size of the component of a randomly selected vertex $V_n$ scales like $n^{\delta_1}$ while the maximal component scales like $n^{\delta_2}$ (ignoring logarithmic corrections), where $\delta_1 < \delta_2$ are related to various scaling exponents of the system.
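The merger dynamics referenced above are those of the multiplicative coalescent: components with masses $x_i$ and $x_j$ merge at rate $x_i x_j$. A minimal Gillespie-style simulation of these dynamics (illustrative only; names and interface are ours):

```python
import random

def multiplicative_coalescent(masses, t_max, seed=0):
    """Components with masses x_i, x_j merge at rate x_i * x_j;
    simulate the jump chain up to time t_max."""
    rng = random.Random(seed)
    x = list(masses)
    t = 0.0
    while len(x) > 1:
        pairs = [(i, j) for i in range(len(x)) for j in range(i + 1, len(x))]
        wts = [x[i] * x[j] for i, j in pairs]
        t += rng.expovariate(sum(wts))   # waiting time ~ Exp(total rate)
        if t > t_max:
            break
        i, j = rng.choices(pairs, weights=wts)[0]  # pick {i, j} prop. to x_i x_j
        x[i] += x[j]
        x.pop(j)
    return sorted(x, reverse=True)
```

The $O(n^2)$ enumeration of pairs keeps the rate bookkeeping exact; total mass is conserved at every merger, and the sorted output mirrors the convention of listing components in decreasing order of size.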
In work in progress [19], Theorem 1.9, coupled with delicate estimates of various scaling exponents for the configuration model in the barely subcritical regime, is used to prove analogous results for the configuration model with degree exponent $\tau\in(3,4)$. Sizes of maximal components in the critical regime, including the heavy-tailed regime, were previously analyzed for this model in [48]. Further, as was done in [18], where a number of sufficient conditions for belonging to the domain of attraction of the critical Erdős-Rényi scaling limits were derived, we hope to derive similar general conditions for a random graph model to belong to the same domain of attraction as the rank-one model with $\tau\in(3,4)$ established in this paper.

3.2.
Minimal spanning tree on inhomogeneous random graphs. As described in the introduction, a second major motivation for the technical analysis in this paper is the minimal spanning tree. To fix ideas, consider the Norros-Reittu model in the supercritical regime (the parameter $\nu$ in (1.5) satisfies $\nu > 1$). To each edge attach a random edge weight, i.i.d. across edges, drawn from a continuous distribution. Consider the minimal spanning tree (MST) of the giant component. A large amount of simulation-based evidence from statistical physics [23, 24, 28, 62] suggests that when the degree exponent $\tau\in(3,4)$, distances in this object scale like $n^{(\tau-3)/(\tau-1)}$, the same distance scaling shown in this paper for the maximal components in the critical regime (Theorem 1.2). This is not a coincidence. As has been shown in a series of fundamental papers [3, 4, 5] for the complete graph and the supercritical Erdős-Rényi random graph, a major ingredient in the analysis of the MST problem is the scaling of maximal components in the critical regime, which then provides crucial input for the scaling limit of the MST. To date we have no rigorous results for the scaling of the MST on any "inhomogeneous" random graph model. This paper provides the first step in answering this question in the heavy-tailed regime. Further, this program should enable one to analyze the MST for random graph models other than the rank-one model which belong to the same "domain of attraction" in the critical regime.

[Figure caption: The reason behind this choice of $\theta_i$ is explained in Section 8. On the right, an approximation of a Brownian CRT (using a uniform random tree on the same number of vertices). Vertex sizes are proportional to the degree of the vertex.]
3.3. Inhomogeneous continuum random trees. As is evident from Section 2.2, ICRTs play a major role in the description of our limiting objects. Despite a lot of work on these objects in the last decade [11, 13, 27], a number of questions regarding these continuum objects are still open, ranging from sufficient conditions for compactness to the dependence of the fractal properties of these objects on the driving parameter $\theta$. Our proof shows that in some special cases, ICRTs are compact metric spaces when $\theta$ is sampled according to an appropriate size-biased distribution. This can be seen as an annealed result on the compactness of the ICRT. Whether compactness holds for non-random sequences $\theta\in\Theta$ has been an open problem for more than a decade [11]. Similar questions hold for its fractal dimensions. See Section 8 for a more detailed account of these problems.

3.4. Overview of the proof.
In Section 4, we study the random graph $G_n(\mathbf{x}, t)$ as in Definition 1.4. We start with the simple observation that, conditional on the vertex sets of the components of $G_n(\mathbf{x}, t)$, a fixed component $\mathcal{C}$ has the same distribution as $G_n(\mathbf{x}, t)$ conditioned on being connected. This section studies asymptotics for such distributions assuming specific regularity properties of the vertex weights in the component in the large network limit, showing Gromov-weak convergence of the associated graph under proper normalization of edge lengths and vertex weights. Section 5 uses the size-biased exploration of the process $G_n(\mathbf{x}, t)$ [9] to show that the maximal connected components satisfy the hypotheses required in Section 4. Section 6 studies the special entrance boundary in (1.19), proving both compactness of the limiting objects as well as strengthening the convergence in the Gromov-weak topology to convergence in $d_{\mathrm{GHP}}$. In Section 7, we derive the box-counting (Minkowski) dimension. In Section 8, we conclude by describing a number of open problems.

PROOFS: ASYMPTOTICS CONDITIONAL ON BEING CONNECTED
The aim of this section is to study large connected components of $G_n(\mathbf{x}, t)$, assuming the vertex weights satisfy a few regularity properties. 4.1. Tilted $\mathbf{p}$-trees and connected components of $G(\mathbf{x}, t)$. Recall the random graph $G(\mathbf{x}, t)$ from Definition 1.4. Here, for any $t\geq 0$, $(\mathcal{C}_i(t) : i\geq 1)$ denotes the components in decreasing order of their mass sizes. In this section we describe results from [20], which give a method of constructing connected components of $G(\mathbf{x}, t)$ conditional on the vertex sets of the components. This construction involves tilted versions of $\mathbf{p}$-trees, introduced in Section 2.2. Since these trees are parametrized by a driving probability mass function (pmf) $\mathbf{p}$, it will be easy to parametrize various random graph constructions in terms of pmfs as opposed to vertex weights $\mathbf{x}$. Proposition 4.1 relates vertex weights to pmfs.
Fix $n\geq 1$ and $V\subset[n]$, and write $\mathbb{G}^{\mathrm{con}}_V$ for the space of all simple connected graphs with vertex set $V$. For fixed $a > 0$ and probability mass function $\mathbf{p}$ on $V$, define the distribution $\mathbb{P}_{\mathrm{con}}(\cdot\,; \mathbf{p}, a, V)$ on $\mathbb{G}^{\mathrm{con}}_V$ as in (4.2), where $Z(\mathbf{p}, a)$ is the normalizing constant. Let $\mathcal{V}^{(i)}$ be the vertex set of $\mathcal{C}_i(t)$ for $i\geq 1$, and note that $(\mathcal{V}^{(i)} : i\geq 1)$ is a random finite partition of the full vertex set $[n]$. The following result is obvious from the construction of $G(\mathbf{x}, t)$: the random graph $G(\mathbf{x}, t)$ can be generated in two stages. (i) Stage I: generate the partition of the vertices into different components, i.e., generate $(\mathcal{V}^{(i)} : i\geq 1)$. (ii) Stage II: conditional on the partition, generate the internal structure of each component following the law $\mathbb{P}_{\mathrm{con}}(\cdot\,; \mathbf{p}^{(i)}, a^{(i)}, \mathcal{V}^{(i)})$, independently across different components.
Let us now describe an algorithm to generate such connected components using the distribution (4.2). To ease notation, let $V = [m]$ for some $m\geq 1$, fix a probability mass function $\mathbf{p}$ on $[m]$ and a constant $a > 0$, and write $\mathbb{P}_{\mathrm{con}}(\cdot) := \mathbb{P}_{\mathrm{con}}(\cdot\,; \mathbf{p}, a, [m])$ on $\mathbb{G}^{\mathrm{con}}_m := \mathbb{G}^{\mathrm{con}}_{[m]}$. We first need to set up some notation before describing this result.
Depth-first exploration of ordered trees. Recall that we write $\mathbb{T}^{\mathrm{ord}}_m$ for the space of ordered (or planar) trees with vertex set $[m]$. Given a tree $\mathbf{t}\in\mathbb{T}^{\mathrm{ord}}_m$, one can use the associated order to explore the tree in a depth-first manner. More precisely, we start with $v(1)$ being the root of $\mathbf{t}$. At each stage $1\leq i\leq m$, we keep track of three types of vertices: the set of active vertices $\mathcal{A}(i)$, the set of explored vertices $\mathcal{O}(i)$, and the set of unexplored vertices $\mathcal{U}(i)$. The set of active vertices will in fact be viewed as a vertical stack (not just a set), with $\mathcal{A}(i)$ representing the state of this stack at the end of step $i$. Initialize the process with the root on the stack. Write $\mathcal{P}(\mathbf{t})$ for the set of pairs of vertices $\{u, v\}$ such that $u, v\in\mathcal{A}(i)$ for some $1\leq i\leq m$; namely, both vertices are active at some common step but have not yet been explored. Using terminology from [4], call this collection the set of permitted edges. Write $E(\mathbf{t})$ for the edge set of $\mathbf{t}$, and define the tilting function $L: \mathbb{T}^{\mathrm{ord}}_m\to\mathbb{R}_+$. Recall the (ordered) $\mathbf{p}$-tree distribution from (2.5); using $L(\cdot)$ to tilt this distribution results in the tilted $\mathbf{p}$-tree distribution. For future reference, we fix notation for the various objects required in the proof below. Proposition 4.3 states, informally: starting from a tree with the tilted distribution, add each permitted edge with the probability prescribed in (4.1), independently across permitted edges. Then the resulting random graph has distribution $\mathbb{P}_{\mathrm{con}}$ on $\mathbb{G}^{\mathrm{con}}_m$, i.e., it has the same distribution as $\tilde{G}_m(\mathbf{p}, a)$.
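The depth-first bookkeeping above can be sketched directly (the function name and input layout are ours): `children[v]` lists the children of `v` in planar left-to-right order, the active stack plays the role of $\mathcal{A}(i)$, and the recorded pairs are the permitted edges, i.e., pairs of vertices that are simultaneously active at some step.

```python
def dfs_permitted_edges(children):
    """Depth-first exploration of an ordered tree (root = 0).
    Returns the DFS order v(1), v(2), ... and the set of permitted
    edges: pairs {u, v} simultaneously on the active stack A(i)."""
    stack = [0]                          # active vertices; top = end of list
    order, permitted = [], set()
    while stack:
        v = stack.pop()                  # explore the next vertex
        order.append(v)
        for c in reversed(children[v]):  # push children; leftmost ends on top
            stack.append(c)
        for a in stack:                  # record currently co-active pairs
            for b in stack:
                if a < b:
                    permitted.add((a, b))
    return order, permitted
```

For the small tree with root 0, children 1 and 2, and 3 a child of 1, the exploration order is 0, 1, 3, 2 and the permitted edges are $\{1,2\}$ and $\{2,3\}$; note that tree edges are never permitted, since a parent is explored before its children become active.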

Convergence of connected components under weight assumptions.
The aim of this section is to prove Gromov-weak convergence of the connected graph $\tilde{G}_m(\mathbf{p}, a)$ under regularity conditions on $a$ and $\mathbf{p}$ as $m\to\infty$. We assume throughout that the index set $[m]$ has been ordered so that $p_1\geq p_2\geq\cdots\geq p_m > 0$, and let $\sigma(\mathbf{p}) := \big(\sum_{i\in[m]} p_i^2\big)^{1/2}$.

Assumption 4.4.
As $m\to\infty$, the following hold: $\sigma(\mathbf{p})\to 0$, $p_i/\sigma(\mathbf{p})\to\theta_i$ for every fixed $i\geq 1$, and $a\sigma(\mathbf{p})\to\gamma$. The following theorem is the main result of this section.
Theorem 4.5. Under Assumption 4.4, $\sigma(\mathbf{p})\,\tilde{G}_m(\mathbf{p}, a)$ converges in distribution as $m\to\infty$ to $G_\infty(\theta, \gamma)$, where $G_\infty(\theta, \gamma)$ is the random metric space defined in Definition 2.2 and the convergence is in the Gromov-weak topology on metric spaces.
The rest of this section proves this result. Throughout, we assume that $\tilde{G}_m(\mathbf{p}, a)$ has been constructed using Proposition 4.3.

Two constructions of p-trees:
Exploration process and the birthday construction. We start by describing an explicit construction of the (untilted) $\mathbf{p}$-tree $T^{\mathbf{p}}_m$, first developed in [11]; at the end of this section we describe a second construction, used later in the paper. Exploration process construction: The first construction sets up a map $\psi_{\mathbf{p}}: [0,1]^m\to\mathbb{T}^{\mathrm{ord}}_m$ as follows. Let $\mathbf{u} := (u_v : v\in[m])$ be a collection of distinct points in $(0,1)$. Set $v^*$ to be the root of the tree $\psi_{\mathbf{p}}(\mathbf{u})$, and define the associated function $F^{\mathrm{exc},\mathbf{p}}$. Then $F^{\mathrm{exc},\mathbf{p}}(1-) = 0$ and $F^{\mathrm{exc},\mathbf{p}}(s) > 0$ for $s\in[0,1)$. Extend the definition of $F^{\mathrm{exc},\mathbf{p}}$ to $s\in[0,1]$ by setting $F^{\mathrm{exc},\mathbf{p}}(1) = 0$. We use $F^{\mathrm{exc},\mathbf{p}}$ to construct a depth-first search of an ordered tree whose exploration is encoded by this function; this in turn defines the tree $\psi_{\mathbf{p}}(\mathbf{u})$. As before, in this construction we carry along a set of explored vertices $\mathcal{O}(i)$, active vertices $\mathcal{A}(i)$, and unexplored vertices $\mathcal{U}(i) = [m]\setminus(\mathcal{A}(i)\cup\mathcal{O}(i))$, for $0\leq i\leq m$. We view $\mathcal{A}(i)$ as the state of a vertical stack $\mathcal{A}$ after the $i$th step in the depth-first search. At step $i$, the vertices $u(j)$, $1\leq j\leq k$, found in the relevant interval are ordered in the sequence in which they are found. Update the stack $\mathcal{A}$ as follows: (i) remove the current vertex $v(i)$ from the top of the stack; (ii) push $u(j)$, $1\leq j\leq k$, to the top of $\mathcal{A}$ sequentially (so that $u(k)$ is on top of the stack at the end). Let $\mathcal{A}(i)$ be the state of the stack after the above operations. See Figure 4.1 for a pictorial description of this construction.
The tree is obtained from the parent-child relations found in this exploration, using the order prescribed by the exploration to make it an ordered tree; the fact that this procedure actually produces a tree, i.e., that $\psi_{\mathbf{p}}(\mathbf{X})$ is well defined, is proved in [11]. For future reference, coupled with the above construction, define the process $S(i)$, where $a$ is the scaling constant in (4.1). Birthday construction: We now describe a second construction of $\mathbf{p}$-trees, first formulated in [27]. We urge the reader to skim this portion and return to it once she has reached Section 4.5. Let $\mathbf{Y} := (Y_0, Y_1, \ldots)$ be an infinite sequence of i.i.d. random variables with distribution $\mathbf{p}$.
Let $R_0 = 0$ and for $l\geq 1$, let $R_l$ denote the $l$th repeat time, i.e., the $l$th time $j$ at which $Y_j$ takes a value already seen in $(Y_0, \ldots, Y_{j-1})$. Now consider the directed graph $T(\mathbf{Y})$ formed via the edges $(Y_{j-1}, Y_j)$ for those $j$ that are not repeat times. It is easy to check that this gives a tree, which we view as rooted at $Y_0$. Intuitively, the process of constructing the tree is as follows: the tree "grows" via the addition of new vertices sampled using $\mathbf{p}$ until it stumbles across a "repeat" (a vertex already found), at which point it goes back to the first occurrence of this repeat and starts growing from that position. The following striking result was shown in [27].
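The birthday construction is straightforward to simulate (the function name and output layout are ours): sample $Y_0, Y_1, \ldots$ i.i.d. from $\mathbf{p}$, and attach each first occurrence $Y_j$ to the previously sampled value $Y_{j-1}$; a repeat automatically redirects growth back to that vertex's earlier position.

```python
import random

def birthday_tree(p, seed=0):
    """Birthday construction of a p-tree on [m]: sample Y_0, Y_1, ...
    i.i.d. from p; each first occurrence Y_j is attached to the
    previously sampled value Y_{j-1}. Returns (root, parent pointers)."""
    rng = random.Random(seed)
    m = len(p)
    parent, seen = {}, set()
    prev, root = None, None
    while len(seen) < m:                 # stop once every vertex has appeared
        y = rng.choices(range(m), weights=p)[0]
        if y not in seen:                # first occurrence: a new tree vertex
            seen.add(y)
            if prev is None:
                root = y                 # Y_0 is the root
            else:
                parent[y] = prev
        prev = y                         # a repeat sends growth back here
    return root, parent
```

Each non-root vertex points to a vertex discovered strictly earlier, so following parent pointers always terminates at the root, and the resulting graph has exactly $m-1$ edges.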

Theorem 4.7 ([27, Lemma 1 and Theorem 2]). The random tree $T(\mathbf{Y})$, viewed as an object in $\mathbb{T}_m$, has the $\mathbf{p}$-tree distribution (2.4), and is independent of the sequence $(Y_{R_1-1}, Y_{R_2-1}, \ldots)$, which is i.i.d. with distribution $\mathbf{p}$.
Remark 5. The independence between the sequence $Y_{R_1-1}, Y_{R_2-1}, \ldots$ and the constructed $\mathbf{p}$-tree $T(\mathbf{Y})$ is truly remarkable. In particular, suppose $\mathcal{S}$ is a $\mathbf{p}$-tree with distribution as in (2.4) and, for fixed $r\geq 1$, let $\tilde{Y}_1, \tilde{Y}_2, \ldots, \tilde{Y}_r$ be i.i.d. with distribution $\mathbf{p}$. Write $\mathcal{S}_r\subset\mathcal{S}$ for the tree spanned by these vertices and the root. Let $T^B_r\subset T(\mathbf{Y})$ denote the subtree with vertex set $\{Y_0, Y_1, \ldots, Y_{R_r-1}\}$, namely the tree constructed in the first $R_r$ steps. Here $B$ is a mnemonic for "birthday tree" and also serves to distinguish this construction from a generic random tree model with $r$ vertices. Then the above result (formalized as [27, Corollary 3]) implies that the pairs $(\mathcal{S}_r, \mathcal{S})$ and $(T^B_r, T(\mathbf{Y}))$ can be constructed jointly on the same probability space so as to coincide. We use this fact often in Section 4.5.

4.3.
Uniform integrability of the tilt. The first use of the above construction of the $\mathbf{p}$-tree is to prove Proposition 4.8 below. In its proof, the first inequality uses $(e^x - 1)/x\leq e^x$ for $x > 0$, and the second inequality follows from the fact that $\mathbf{t}$ is a tree, so that for each $(i,j)\in E(\mathbf{t})$ with $i$ the parent of $j$, we have $p_i p_j\leq p_1 p_j$. By Assumption 4.4, we have $ap_1\to\gamma\theta_1$; in particular, there is a constant $C > 0$ such that the resulting bound holds for all $m\geq 1$ and $\mathbf{t}\in\mathbb{T}^{\mathrm{ord}}_m$. Now recall the functions $A_m$ and $\bar{A}_m := aA_m$ from (4.6). Using the equivalent characterization of the permitted edge set from (4.3) and comparing it with (4.6), together with the definition of $F^{\mathrm{exc},\mathbf{p}}$ and (4.6), one obtains the required comparison. By Assumption 4.4(ii) and (4.10), for any $s\geq 0$ there exists $K = K(s) < \infty$ such that the corresponding exponential moment bound holds. The following lemma then completes the proof of Proposition 4.8. ■ Lemma 4.9. There exists a constant $c > 0$ such that the stated tail bound holds for every $m\geq 1$ and $x\geq e$. Proof: Let $\mathbf{X} = (X_v : v\in[m])$ be the collection of uniform random variables used to construct $F^{\mathbf{p}}$, and write $\mathbb{Q}[0,1]$ for the set of rationals in $[0,1]$. The quantity of interest splits into two terms, $R_1(m)$ and $R_2(m)$, which we bound in turn. We start by analyzing $R_1(m)$. For fixed $q\in\mathbb{Q}[0,1]$, define the associated collection of $m$ functions. If we can show that the supremum $\kappa$ in (4.14) is finite, then standard concentration inequalities for maxima in empirical processes [49, Theorem 1.1(b)] imply the existence of a constant $c_1 > 0$ such that the desired tail bound holds for all $m\geq 1$ and $x > 0$. Let us now prove (4.14); in fact, we will show the stronger result (4.15). Let $X_{(1)} < X_{(2)} < \cdots < X_{(m)}$ denote the order statistics of $\mathbf{X}$, and let $\pi$ denote the corresponding permutation of $[m]$, namely $X_{(i)} = X_{\pi(i)}$.
We first analyze $R_{11}(m)$. The first term is controlled by the DKW inequality [54], and the second by simply expanding the square. Note that since $\pi$ is a uniform random permutation of the vertex set $[m]$, for any fixed $i\geq 1$ we also have the identity (4.16). Now, constructing $\pi$ by sequentially sampling without replacement from $[m]$, let $\mathcal{F}_k$ denote the $\sigma$-field generated by $(\pi(1), \pi(2), \ldots, \pi(k))$ for $0\leq k\leq m-1$. Let $M_0 = 0$ and consider the sequence defined in (4.17). It is easy to check that $\{M_k : 0\leq k\leq m-1\}$ is a martingale with respect to the filtration $\{\mathcal{F}_k : 0\leq k\leq m-1\}$. Then (4.17) and Doob's $L^2$-maximal inequality yield the required bound. Using (4.16) with $i = m/2$ then gives $\mathbb{E}(R_{12}(m))\leq 16$ for all $m\geq 1$. Thus we have shown that $\sup_{m\geq 1}\max(\mathbb{E}(R_{11}(m)), \mathbb{E}(R_{12}(m))) < \infty$. This proves (4.14) and thus (4.15).
To complete the proof of the lemma, we need a tail bound on $R_2(m)$ appearing in (4.13). As before, using [49], it is enough to show $\sup_{m\geq 1}\mathbb{E}(R_2(m)) < \infty$; this follows from (4.14) together with Assumption 4.4. ■ 4.4. Another construction of $\tilde{G}_m(\mathbf{p}, a)$ and a modification. In this section, we start by giving a more explicit description of the algorithm in Proposition 4.3, via adding permitted edges to a tilted $\mathbf{p}$-tree. We first set up some notation. As a matter of convention, we view ordered rooted trees via their planar embedding, using the associated ordering to determine the relative locations of siblings of an individual; we think of the leftmost sibling as the "oldest". Further, in a depth-first exploration, we explore the tree from left to right. Now, given a planar rooted tree $\mathbf{t}\in\mathbb{T}_m$, let $\rho$ denote the root, and for every vertex $v$ let $\mathcal{P}(v, \mathbf{t})$ denote the set of endpoints of all permitted edges emanating from $v$. We use the set $\{(s_j, t_j) : 1\leq j\leq N^{(m)}\}$ to generate pairs of points $\{(L_j, R_j) : 1\leq j\leq N^{(m)}\}$ in the tree that will be joined to form the surplus edges. (iii) "First" endpoints: Fix $j$ and suppose $s_j\in(y^*(i-1), y^*(i)]$ for some $i\geq 1$, where $y^*(i)$ is as given right above (4.19). Then the first endpoint of the surplus edge corresponding to $(s_j, t_j)$ is $L_j := v(i)$. (iv) "Second" endpoints: Note that on the interval $(y^*(i-1), y^*(i)]$, the function $\bar{A}^{(m)}$ has constant height $aG^{(m)}(v(i))$. We view this height as being partitioned into subintervals of length $ap_u$ for each $u\in\mathcal{P}(v(i), T^{\mathbf{p},\star}_m)$, the collection of endpoints of permitted edges emanating from $L_j$. (Assume that this partitioning is done according to some preassigned rule, e.g., using the order of the vertices in $\mathcal{P}(v(i), T^{\mathbf{p},\star}_m)$.) Suppose $t_j$ belongs to the subinterval corresponding to $u$; then the second endpoint is $R_j = u$. Form an edge between $L_j$ and $R_j$.
(v) In this construction, it is possible that more than one surplus edge is created between a pair of vertices. Remove any multiple surplus edges.
(iv′) Conditional on $T^{\mathbf{p},\star}_m$, $N^{(m)} = k$, and the first endpoints $(L_j : 1\leq j\leq k)$, generate the second endpoints in an i.i.d. fashion, where conditionally on $L_j = v$, the probability distribution of $R_j$ is given by (4.22). Identify $L_j$ and $R_j$ for $1\leq j\leq k$. Thus, instead of adding an edge between $L_j$ and one of the right children on the path $[\rho, L_j]$ as in Lemma 4.10(c), we identify it with the parent of this vertex, which lies on $[\rho, L_j]$. Also, we do not remove multiple surplus edges. This construction turns out to be easier to work with. $G^{\mathrm{mod}}_m(\mathbf{p}, a)$ is viewed as a metric measure space via the graph distance, where a vertex $v$ has mass $\sum p_u$, the sum taken over all $u\in[m]$ that have been identified with $v$. Intuitively, it is clear that $\sigma(\mathbf{p})\tilde{G}_m(\mathbf{p}, a)$ and $\sigma(\mathbf{p})G^{\mathrm{mod}}_m(\mathbf{p}, a)$ are "close". This is formalized in Lemma 4.12.

Remark 6.
At this point, we urge the reader to go back to Section 2.3.1 and recall the four steps in the construction of the limit metric space $G_\infty(\theta, \gamma)$, and note the similarities to the construction above. In particular, we note the following: (a) For the finite-$m$ object, the measure is tilted using the functional $L(\cdot)$ (the additional term in (4.8) can be ignored, as we will see in Lemma 4.14), and the number of shortcut points selected, namely $N^{(m)}$, has a Poisson distribution with mean $a\,\mathbb{E}(G^{(m)}(V_1)\mid T^{\mathbf{p},\star}_m)$. Here $V_1$ has distribution $\mathbf{p}$. (b) For the limit object, we tilt the measure using the functional $L^{(\infty)}$, and the number of shortcuts, namely $N^{(\infty)}$, follows a Poisson distribution with mean $\gamma\,\mathbb{E}(G^{(\infty)}(V_1)\mid\mathcal{T}^{(\infty)}_{\theta,\star}, U^\star)$. Here $V_1$ is distributed according to the mass measure $\mu^\star$ on $\mathcal{T}^{(\infty)}_{\theta,\star}$. As a brief warm-up for the calculations in the next section, we now prove a simple lemma on tightness of the number of surplus edges; we will prove distributional convergence of this quantity in the next section.
Here $v\in\mathcal{P}(u, \mathbf{t})$, and $c$ is some universal positive constant. Since $\sigma(\mathbf{p})\to 0$ and $a\sigma(\mathbf{p})\to\gamma$, we get $\mathbb{P}(F^c)\to 0$, as desired. To simplify notation, we write $\Phi(X)$ instead of $\Phi(X, d, \mu)$. To prove Theorem 4.5, we need to show that for every fixed $\ell\geq 1$ and functions $\phi$ and $\Phi$ as above, the corresponding expectations converge, where points are sampled according to $\mathbf{p}$ in $\tilde{G}_m(\mathbf{p}, a)$ and according to the measure inherited from the mass measure in $G_\infty(\theta, \gamma)$. Now recall the explicit five-step construction of $\tilde{G}_m(\mathbf{p}, a)$ in Section 4.4, starting from the tilted $\mathbf{p}$-tree $T^{\mathbf{p},\star}_m$ and the Poisson number of surplus edges $N^{(m)}$. Fix $K\geq 1$; using Lemma 4.11, we can choose $K$ large (independent of $m$) to make the bound on the right arbitrarily small. Further, in view of Lemma 4.12, we can work with $G^{\mathrm{mod}}_m(\mathbf{p}, a)$ instead of $\tilde{G}_m(\mathbf{p}, a)$. Hence it suffices to prove the convergence in (4.23) for every fixed $k\geq 0$. To analyze this term, we first need to set up some notation.
Note that both the finite-$m$ and the limit objects are obtained by starting with a tree (a discrete tree for finite $m$ and a real tree in the limit) and sampling a random number of pairs to create "shortcuts". Recall the space $\mathbf{T}^*_{IJ}$ from Section 2.1.3. Fix $k\geq 0$ and let $\mathbf{t}$ be an element of $\mathbf{T}^*_{I,(k+\ell)}$ for some $I\geq 0$ ("$I$" will not play a role in the definition below). Write $\rho$ for the root and denote the leaves by $\mathbf{x}_{k,k+\ell} := (x_1, x_2, \ldots, x_k, x_{k+1}, \ldots, x_{k+\ell})$.
Also recall that for each $i$, there is a probability measure $\nu_{\mathbf{t},i}(\cdot)$ on the path $[\rho, x_i]$ for $1\leq i\leq k+\ell$. For $1\leq i\leq k$, sample $y_i$ according to the distribution $\nu_{\mathbf{t},i}(\cdot)$, independently for different $i$, and connect $x_i$ and $y_i$. Let $\mathbf{t}'$ denote the (random) graph thus obtained, and let $d_{\mathbf{t}'}$ denote the graph distance on $\mathbf{t}'$. Define the function $g^{(k)}_\phi: \mathbf{T}^*_{I,(k+\ell)}\to\mathbb{R}$ as in (4.24). We drop the superscript and simply write $V_i$, etc., when there is no scope for confusion. Note that $T^{\mathbf{p},\star}_m(\mathbf{V}_{k,k+\ell}) = \partial$ whenever $V_1, \ldots, V_k, V_{k+1}, \ldots, V_{k+\ell}$ are not all distinct, or one of them is an ancestor of another vertex in $\mathbf{V}_{k,k+\ell}$; in either of these two cases, the subtree spanned by the root and $\mathbf{V}_{k,k+\ell}$ has fewer than $k+\ell$ leaves (see (4.12) and the discussion below (4.19)). Using (4.25), we can rewrite the relevant expectation in terms of $\mathbf{V}_{k,k+\ell} = (V_1, V_2, \ldots, V_{k+\ell})$, where the $V_i$ are i.i.d. with distribution $\mathbf{p}$. Since $T^{\mathbf{p},\star}_m$ is sampled according to a tilted $\mathbf{p}$-tree distribution, combining (4.26) and (4.27) yields (4.28), where $C_m$ is the normalizing constant of the tilt. ■ Write $\mathbb{E}_\theta$ for expectation conditional on $\mathcal{T}^{(\infty)}_\theta$ and the random variables $U^{(i)}_j$ that encode the order on $\mathcal{T}^{(\infty)}_\theta$, and note that $\mathbb{E}\big[\Phi(G_\infty(\theta, \gamma))\,1\!\!1\{N^{(\infty)} = k\}\big]$ has an expression similar to (4.28); indeed, this follows from the construction of $G_\infty(\theta, \gamma)$ given in Section 2.3.1. The proof of this theorem is accomplished via the following two theorems, for which we need to set up some notation. Fix $I\geq 0$ and $J\geq 1$. We assume that $T^{\mathbf{p}}_m$ has been constructed via the birthday construction (see Section 4.2.1); this construction gives rise to an unordered $\mathbf{p}$-tree, from which an ordered $\mathbf{p}$-tree is obtained. Now recall the tree $R^{(\infty)}_{IJ}$ defined in Section 2.2 using the limit ICRT $\mathcal{T}^{(\infty)}_\theta$. The main ingredients in the proof of Theorem 4.15 are the following two theorems; the second involves the function $g^{(k)}_\phi$ on $\mathbf{T}^*_{I,(k+\ell)}$ as in (4.24).
This is made precise in the following simple lemma, whose proof we leave to the reader. We will apply this lemma to the random variables arising in Theorem 4.15. That is, we set $X_{(m),1}$ and $X_{(\infty),1}$ to be the first coordinates, and similarly define $X_{(m),2}$ and $X_{(\infty),2}$ to be the second coordinates, in the display (4.31). To define $X^r_{(m)}$, we proceed as follows. For $m = \infty$, sample as above $J_r$ points using the mass measure $\mu$ from $\mathcal{T}^{(\infty)}_\theta$ and define the analogous quantities, where $\mathrm{Var}_{\mathbf{p}}$, defined analogously to $\mathbb{E}_{\mathbf{p}}$, is the conditional variance operator. From the argument given below (4.11), it follows that $\|G^{(m)}\|_\infty\leq\|F^{\mathrm{exc},\mathbf{p}}\|_\infty$; hence Lemma 4.9 implies that $\sup_m C^{(m)} < \infty$. This verifies (i) of the lemma. Let us now verify condition (ii) of the lemma: written out explicitly, we have to show the stated convergence for each fixed $r\geq 1$. Similarly, modify the "second endpoint" measure in (4.22) to keep track only of ancestors with labels at most $R$, namely the measure $Q^{(m),R}_v$. Assuming this proposition, we now complete the proof of Theorem 4.17. Note that for any fixed bounded continuous function $f$ on $\mathbf{T}^*_{IJ}$ and any truncation level $R\geq 1$, we have the corresponding decomposition. Since $\sigma(\mathbf{q})q_{\max}\to 0$ as $m\to\infty$, this completes the proof. Once this ordering has been defined, we can construct the function $G^{(m)}(\cdot)$ as in (4.18).
In this case we can write this function explicitly in terms of the associated uniform random variables, via (4.36). Similarly, the root-to-leaf measure $Q^{(m),R}_v$ (recall (4.33)) can also be expressed in terms of this function. Now using (4.36), for every fixed hub $i\leq R$, $j\leq J$, and a.e. sample point $\omega$, one of the two stated alternatives holds. Thus, it is enough to show that given $\varepsilon > 0$, we can find $R = R(\varepsilon) < \infty$ such that $\mathbb{P}(E^{(2)}_R > \varepsilon) < \varepsilon$. To this end, first choose $K_\varepsilon$ large enough so that $\mathbb{P}(\eta_J > K_\varepsilon) < \varepsilon/2$, and then choose $R_\varepsilon$ large enough so that the corresponding bound holds. In the resulting estimate, the first term in the second inequality follows from the choice of $K_\varepsilon$, while the second term comes from the stick-breaking construction of $\mathcal{T}^{(\infty)}_\theta$ using the countable collection of Poisson point processes. This completes the proof. Since $\mathbf{T}^*_{IJ}$ carries the product topology on these coordinates, it is enough to show the required estimate in Proposition 4.20(c) for functions of the stated product form. Here $\mathbf{t}\in\mathbf{T}_{IJ}$, the $a_j\in\mathbb{R}$ are associated leaf values, the $M_j$ are the paths from the root to leaf $j$ with associated probability measures, and $f$, $g_j$ and $h_j$ are bounded uniformly continuous functions on the spaces $\mathbf{T}_{IJ}$, $\mathbb{R}$ and $\mathcal{S}$ (measured compact metric spaces), respectively. To simplify notation, we simply write this as $f(\mathbf{t})$.
Since the $V_j$'s have been sampled in an i.i.d. fashion from $\mathbf{p}$, it is enough to show the claim for any two bounded uniformly continuous functions $h$ and $g$ on $\mathbb{R}$ and $\mathcal{S}$, respectively. Writing $\pi_1$ and $\pi_2$ for the marginals of $\pi$, we obtain the bound (4.42), using the above choice of the correspondence $C$ and of the measure $\pi$. Now suppose we show (4.40). Using parts (a) and (b) of Proposition 4.20, the relevant quantity is strictly positive; then, using the bound in (4.42) and the uniform continuity of $h$, we see that (4.41) is true. Hence it is enough to prove (4.40).
Recall from Section 4.2.1 the construction of $V^{(m)}_1$ and the tree simultaneously via the birthday construction, where $V^{(m)}_1$ is obtained as the value before the first repeat time, namely $Y_{R_1-1}$. Fix $\varepsilon > 0$. By [27, Theorem 4], under Assumption 4.4 we may choose $K_\varepsilon$ large so that the first repeat time satisfies $\mathbb{P}(R_1 > K_\varepsilon/\sigma(\mathbf{p})) < \varepsilon$ for all $m\geq 1$. Next, by uniform continuity of $g$, choose $\delta\in(0,1)$ such that $|g(x) - g(y)| < \varepsilon$ if $|x-y| < \delta$. Finally, choose $R$ large so that the corresponding bound holds for all $m$. First, by the choice of $K_\varepsilon$ and the boundedness of $g$, we obtain (4.43), and a similar inequality holds if we replace the functional $G^{(m)}$ by $G^{(m)}_R$. Next, the remaining term is a tricky object, for which we need a tractable upper bound. Recall that $T^B_1$ denotes the birthday tree in (4.7) constructed by time $R_1$. For every vertex $i\in T^B_1$, let $J(i)$ be the first child of $i$ in the birthday construction (the first new, i.e., previously un-sampled, vertex sampled immediately after a prior sampling of $i$); this is empty if $i$ is a leaf in the eventual full tree $T^{\mathbf{p}}_m$. Recall that $i\rightsquigarrow j$ denotes the event that $j$ is a child of $i$ in $T^{\mathbf{p}}_m$. This leads to the bound (4.45). For $i\neq j\in[m]$, define the event $E_{ij} := \{i$ appears before time $K_\varepsilon/\sigma(\mathbf{p})$, $i\rightsquigarrow j$, $j = J(i)\}$. For $E_{ij}$ to happen, the following must occur in the birthday construction: (a) there is an $r_1$ with $0\leq r_1\leq K_\varepsilon/\sigma(\mathbf{p})$ such that up to time $r_1$, neither $i$ nor $j$ has been sampled; (b) at time $r_1+1$, vertex $i$ is sampled; (c) there is an $r_2\geq 0$ such that in the samples at times $[r_1+1, r_1+1+r_2]$, $j$ does not appear; (d) at time $r_1+r_2+2$, vertex $i$ is sampled again; (e) in the next time step $r_1+r_2+3$, vertex $j$ is sampled. The resulting probability bound, used in (4.45), completes the estimate.

Proof of Theorem 4.18:
We now prove continuity of the function $g^{(k)}_\phi$ on the space $T^*_{I,(k+\ell)}$; in fact, we will give a quantitative estimate. Since we are assuming the discrete topology on the coordinate corresponding to the shape, without loss of generality we work with two trees $t, \bar t \in T^*_{I,(k+\ell)}$ having the same shape. We need to distinguish the labels of the root and the leaves in the two trees; so write $0^+$ (respectively $\bar 0^+$) for the root of $t$ (respectively $\bar t$), and write $j^+ : 1 \le j \le k+\ell$ (respectively $\bar j^+ : 1 \le j \le k+\ell$) for the collection of leaves in $t$ (respectively $\bar t$). Finally, let $\nu_j$ be the corresponding probability measure on the path $M_j := [0^+, j^+]$ for $1 \le j \le k$, and analogously let $\bar\nu_j$ be the probability measure on $\bar M_j := [\bar 0^+, \bar j^+]$. View these paths as pointed measured metric spaces, pointed at the roots $0^+$ and $\bar 0^+$ respectively. Now let $\varepsilon_j$ be as in (4.47), where $l_e(\cdot)$ denotes the length of the edge $e$ and we have used the fact that both trees have the same shape. Write $\mathrm{ht}(t)$ for the height of the tree $t$ (not in graph distance, but in terms of the maximal distance from the root when edge lengths are incorporated). The following proposition completes the proof of Theorem 4.18. Proposition 4.25. For two trees $t, \bar t \in T^*_{I,(k+\ell)}$ having the same shape, and with $\varepsilon_j$ as in (4.47), the stated quantitative bound holds. Proof: For each $j \le k$, choose a correspondence $C_j$ and a measure $\pi_j$ on the product space $[0^+, j^+] \times [\bar 0^+, \bar j^+]$ such that the following conditions are met: (a) $(0^+, \bar 0^+) \in C_j$; (b) the distortion satisfies $\mathrm{dis}(C_j) < 3\varepsilon_j$; (c) the measure of the complement satisfies $\pi_j(C_j^c) < 2\varepsilon_j$; (d) and finally $\|\nu_j - p_*\pi_j\| + \|\bar\nu_j - \bar p_*\pi_j\| < 2\varepsilon_j$, (4.48) where $p_*\pi_j$ and $\bar p_*\pi_j$ are the marginals of $\pi_j$. Now sample $(X_j, \bar X_j) \sim \pi_j$ from $[0^+, j^+] \times [\bar 0^+, \bar j^+]$, independently for $1 \le j \le k$. By (4.48), we can couple $(X_j, \bar X_j)$ with two random variables $\tilde X_j, \tilde{\bar X}_j$ (again independently for $1 \le j \le k$) such that $\tilde X_j \sim \nu_j$, $\tilde{\bar X}_j \sim \bar\nu_j$, and further $\mathbb{P}(X_j \ne \tilde X_j) + \mathbb{P}(\bar X_j \ne \tilde{\bar X}_j) < 2\varepsilon_j$.
(4.49) Using conditions (b) and (c), we obtain (4.50), where $d_t$ is the metric on the tree $t$ that incorporates the edge lengths. Now write $E$ for the "good event" on which the coupled samples agree; it follows from (4.49) and (4.50) that $E$ has high probability. We now create "shortcuts" by gluing the leaves to the corresponding sampled points. Let $S$ (resp. $\bar S$) be the (random) metric space obtained by identifying the leaf $j^+$ (resp. $\bar j^+$) with $X_j$ (resp. $\bar X_j$) in $t$ (resp. $\bar t$) for $1 \le j \le k$, and write $d_S$ (resp. $d_{\bar S}$) for the induced metric. Then, by definition, $g^{(k)}_\phi(\bar t)$ can be expressed in terms of $\bar S$, and an analogous expression holds for $g^{(k)}_\phi(t)$ in terms of $S$. Consider the map from $t$ to $\bar t$ which takes every vertex to the corresponding vertex and maps points on each edge to points on the corresponding edge by linear interpolation (using the edge lengths). Consider $a \in [0^+, j^+]$ for some $j \le k$, and let $\bar a \in [\bar 0^+, \bar j^+]$ be the corresponding point in $\bar t$. Then, on the set $E$, the distance between corresponding points is controlled as in (4.53). Now consider a shortest path in $S$ connecting $(k+i_1)^+$ and $(k+i_2)^+$. We can travel between the corresponding identified leaves in $\bar S$ by taking the same route, i.e., by traversing the corresponding edges and taking the corresponding shortcuts in the same order. We make the following observations: (i) the difference between the distances traversed while crossing the edge $e$ is at most $|l_e(t) - l_e(\bar t)|$; (ii) by (4.53), on the set $E$, taking a "shortcut" contributes at most $3\varepsilon_j + \sum_e |l_e(t) - l_e(\bar t)|$ to the difference between the distances traversed. Since we take at most $k$ shortcuts, on the set $E$ we immediately get the claimed bound. By symmetry, a similar inequality holds if we interchange the roles of $S$ and $\bar S$. Combining these observations yields the result. ■
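The distortion bound $\mathrm{dis}(C_j) < 3\varepsilon_j$ in condition (b) can be made concrete for finite metric spaces. The sketch below (our own illustration, not the paper's construction) computes the distortion of a correspondence between two three-point path spaces whose edge lengths differ by $\varepsilon$: the identity correspondence has distortion $2\varepsilon$, coming from the pair of endpoints.

```python
import itertools

def distortion(C, d1, d2):
    """Distortion of a correspondence C between finite metric spaces
    (X, d1) and (Y, d2): the supremum over pairs of corresponding points
    of |d1(x, x') - d2(y, y')|."""
    return max(abs(d1[x][xp] - d2[y][yp])
               for (x, y), (xp, yp) in itertools.product(C, repeat=2))

# two paths 0-1-2 with edge lengths (1, 1) and (1 + eps, 1 + eps)
eps = 0.1
d_t    = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
d_tbar = [[0, 1 + eps, 2 + 2 * eps],
          [1 + eps, 0, 1 + eps],
          [2 + 2 * eps, 1 + eps, 0]]
C = [(0, 0), (1, 1), (2, 2)]               # identity correspondence
dis = distortion(C, d_t, d_tbar)
```

Here the worst pair is the two endpoints, giving $|2 - (2+2\varepsilon)| = 2\varepsilon$; this is the kind of edge-length bookkeeping that drives the estimates (i) and (ii) above.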

PROOFS: CONVERGENCE IN GROMOV-WEAK TOPOLOGY
Recall from Proposition 4.1 that, conditional on the partition of the vertices $\{\mathcal{V}^{(i)} : i \ge 1\}$ into connected components, the actual structure of the components of $G(x,t)$ can be generated independently as the connected graph $\bar G_{|\mathcal{V}^{(i)}|}(a^{(i)}_n, p^{(i)}_n)$, where $a^{(i)}_n, p^{(i)}_n$ are as in Proposition 4.1 and, given $m, p, a$, $\bar G_m(a,p)$ is the connected random graph model studied in the previous section. For Theorem 1.8, the time scale $t = t_n$ of interest in the expression for $a^{(i)}_n$ is $t_n := \lambda + 1/\sigma_2(x^{(n)})$ for fixed $\lambda \in \mathbb{R}$. Let $N(\mathbb{R}_+)$ denote the space of counting measures on $\mathbb{R}_+$ equipped with the vague topology. Define $\Upsilon^{(i)}_n := (p_v/\sigma(p) : v \in \mathcal{V}^{(i)})$ and view $(a^{(i)}_n \sigma(p^{(i)}_n), \Upsilon^{(i)}_n)$ as a random element of $S := \mathbb{R}_+ \times N(\mathbb{R}_+)$ (equipped with the product topology). Finally, recall the definitions of $\bar\gamma^{(i)}, \theta^{(i)}$ from (2.10), and write these out explicitly. Proof of Theorem 1.8: We prove the theorem assuming Proposition 5.1. By an application of the Skorokhod representation theorem, we may assume that we are working on a probability space where the convergence in Proposition 5.1 happens almost surely. In particular, on this space, Assumption 4.4 is satisfied almost surely by $p^{(i)}_n$ for any fixed $i \ge 1$. Now an application of Theorem 4.5 completes the proof. ■

Verification of weight assumptions in maximal components.
Here we give the proof of Proposition 5.1. To ease notation, we assume throughout that $\lambda = 0$; the general case follows in an identical fashion, but this assumption simplifies notation. We write $V^c$ instead of $V^c_0$ for the process in (1.15) with $\lambda = 0$, and simply write $\mathcal{C}_i$ for $\mathcal{C}_i([\sigma_2(x^{(n)})]^{-1})$. We start by describing an exploration scheme (developed in [9]) which simultaneously constructs the graph $G_n(x,t)$ and a "breadth-first" walk. This scheme was carefully analyzed in [10] to prove Theorem 1.7.
For every ordered pair $(u,v)$, let $\eta_{u,v}$ be an exponential random variable with rate $t x_v$ (independent across ordered pairs). There is a simple relation between these random variables and the connection probabilities of $G_n(x,t)$ given by (1.10), namely $q_{uv} := \mathbb{P}(\eta_{u,v} < x_u)$. The first vertex $v(1)$ is chosen by size-biased sampling, namely with probability proportional to the vertex weights $x$. When possible we suppress dependence on $n$ to ease notation. Now let $\mathscr{D}(v(1)) := \{v : \eta_{v(1),v} \le x_{v(1)}\}$ denote the collection of "children" of $v(1)$, and note that by (5.2) this generates the right connection probabilities in $G_n(x,t)$. Think of the associated $\eta_{v(1),v}$ values (for vertices connected to $v(1)$) as "birth-times" of these connections in the interval $[0, x_{v(1)}]$, and label the corresponding vertices $v(2), v(3), \ldots$ in the order of their birth-times. Associate with this construction a breadth-first walk started from $v(1)$.
At this "time" we explore the unexplored neighbors of $v(i)$. By this time, $|\mathcal{U}(i)| := i - 1 + |\mathcal{A}(i)|$ vertices have either been explored or are active. The walk is again updated accordingly. After finishing a component (which happens when $\mathcal{A}(i) = \emptyset$ for some $i \ge 2$), choose the next vertex to explore in a size-biased manner from the set of unexplored vertices. If no unexplored vertices remain, then we have finished constructing the partition of the graph into connected components. Now note the following important property of this exploration: the size of the component of $v(i)$, namely $\sum_{l=i}^{j} x_{v(l)}$ (where $v(i), \ldots, v(j)$ are the vertices of this component), is essentially the length of the excursion of the walk beyond past minima.
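The exploration just described is straightforward to simulate. The following sketch (our own illustration of the scheme from [9]; names are not the paper's) grows components in size-biased breadth-first order: vertex $u$ attaches to the vertex $v$ currently being explored when an independent $\mathrm{Exp}(t x_u)$ clock $\eta_{v,u}$ rings before time $x_v$, matching the connection probabilities above.

```python
import random

def explore_components(x, t, rng):
    """Size-biased breadth-first exploration of G(x, t).  The next root is
    chosen with probability proportional to its weight; u becomes a child
    of the explored vertex v when an Exp(t * x_u) variable is below x_v."""
    n = len(x)
    unexplored = set(range(n))
    components = []
    while unexplored:
        pool = list(unexplored)
        root = rng.choices(pool, weights=[x[v] for v in pool])[0]
        unexplored.remove(root)
        comp, queue = [root], [root]
        while queue:
            v = queue.pop(0)               # breadth-first order
            newly = [u for u in list(unexplored)
                     if rng.expovariate(t * x[u]) < x[v]]
            for u in newly:
                unexplored.remove(u)
                queue.append(u)
                comp.append(u)
        components.append(comp)
    return components

rng = random.Random(0)
x = [1.0] * 30
comps = explore_components(x, t=0.01, rng=rng)
```

The list of component vertex sets produced this way is a partition of $[n]$, and the order in which weights are consumed is exactly the size-biased order whose partial sums drive the breadth-first walk.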
As a starting point for proving Theorem 1.7, Aldous and Limic [10] show the following result. Their result is more general (incorporating the presence of a "Brownian component"), but we state it as applied to our setting. Using this result, Aldous and Limic [10] show that the maximal excursions beyond past minima of $\bar Z_n$ converge to the maximal excursions beyond past minima of $V^c_\lambda$, namely the excursion lengths of the reflected process $\bar V^c_\lambda$ (see (1.16)) away from zero. A consequence of the proof of Theorem 1.7 in [10] using Proposition 5.2 is the following result: Lemma 5.3. Fix $K \ge 1$ and let $E_n(K)$ be the time required for the above construction to explore the maximal $K$ components $\{\mathcal{C}_i : 1 \le i \le K\}$. Then $\{E_n(K) : n \ge 1\}$ is tight.
In other words, for any fixed $K \ge 1$, the maximal-length excursions of $\bar V^c$ are found in finite time. Thus, even though the total weight of vertices $\sigma_1(x^{(n)}) \to \infty$, when exploring the graph in size-biased fashion under Assumption 1.6, one needs only a finite amount of "time" to find the maximal components; here time is measured in terms of the weight of the vertices already explored. Now define $S_{n,2}(t)$ and $R^\varepsilon_n(t)$ as follows: $S_{n,2}(t)$ is the normalized sum of squares of the weights of vertices explored by time $t$, and $R^\varepsilon_n(t)$ is the normalized sum of these squares where we only retain explored vertices with weight at most $\varepsilon\sigma_2$. Using the same set of exponential random variables $\{\xi_j : j \ge 1\}$ that arose in the definition of the process $V^c$ in (1.15), define the corresponding limiting process. The same proof techniques as in [10] now imply the following result; since the ideas essentially follow [10], we only sketch the proof.
Proof: Fix $K \ge 1$, and for each $i \ge 1$, let $\xi^{(n)}_i$ denote the time at which vertex $i$ is added to the collection of active vertices. Now consider the $(K+1)$-dimensional stochastic process $\mathbf{Y}^K_n(s)$ whose first coordinate is $\bar Z_n(s)$. In the proof of Proposition 5.2, Aldous and Limic showed that $\mathbf{Y}^K_n \xrightarrow{d} \mathbf{Y}^K_\infty$ for every fixed $K \ge 1$. Thus, to complete the proof it is enough to show, for every fixed $A > 0$ and $\eta > 0$, $\limsup_{\varepsilon \to 0} \limsup_{n \to \infty} \mathbb{P}(R^\varepsilon_n(A) > \eta) = 0$. Now, as described on [10, Page 17], we can couple $(\xi^{(n)}_1, \xi^{(n)}_2, \ldots, \xi^{(n)}_n)$ with a sequence of independent exponential random variables $(\tilde\xi^{(n)}_1, \tilde\xi^{(n)}_2, \ldots, \tilde\xi^{(n)}_n)$, with $\tilde\xi^{(n)}_j$ having rate $x_j/\sigma_2$, such that $\tilde\xi^{(n)}_j \le \xi^{(n)}_j$ for all $j$. Then it is enough to show $\limsup_{\varepsilon \to 0} \limsup_{n \to \infty} \mathbb{E}(R^\varepsilon_n(A)) = 0$, which follows from a direct computation; we have used both (1.12) and (1.13) in the last convergence assertion. Thus, first letting $n \to \infty$ and then $\varepsilon \to 0$ completes the proof. ■ We can now complete the proof of Proposition 5.1. First, to prove (5.1) it is enough to show that for any two rationals $r < s$, $\sum_j c_j \mathbb{1}\{r \le \xi_j \le s\} = \infty$ almost surely, where the $\xi_j$ are the associated exponential rate-$c_j$ random variables. This, however, is trivially true since $\sum_j c_j^2 = \infty$. To prove the other assertions, define, for $i \ge 1$, the point processes $\Xi^{(i)}_n := \{x_u/\sigma_2 : u \in \mathcal{C}_i\}$, namely the rescaled vertex weights in the $i$th maximal component. Analogously define $\Xi^{(i)}_\infty := \{c_v : v \in \mathcal{Z}_i\}$, namely the collection of jumps in the $i$th largest excursion of $\bar V^c$. Also write the normalized sum of squares of vertex weights in a component, and define the corresponding limiting quantities. We view these as random elements of $\bar S^\infty$ where $\bar S := \mathbb{R}^2 \times N(\mathbb{R})$. Lemma 5.3 and Lemma 5.4 now imply the corresponding joint convergence. Expressing the functionals that arise in Proposition 5.1 in terms of the vertex weights in maximal components completes the proof. The proof of $\mathbf{P}_n \xrightarrow{d} \mathbf{P}_\infty$ is similar. ■

Gromov-weak convergence in Theorem 1.2.
That the convergence in (1.7) holds with respect to the Gromov-weak topology is an easy consequence of Theorem 1.8. Indeed, with a suitable choice of the weight sequence $x = x^{(n)} := (x_i : i \in [n])$, we can write $NR_n(w(\lambda))$ as the model $G(x, t_n)$. A direct computation shows that $x^{(n)}$ satisfies Assumption 1.6 with the entrance boundary $c^{nr}$ defined in (2.11). A further computation, involving the constant $\zeta$ defined in (2.12), shows that $t_n - (\sigma_2(x^{(n)}))^{-1} \to t^{nr}_\lambda$ as $n \to \infty$, where $t^{nr}_\lambda$ is as in (2.12). Since $n^{(\tau-3)/(\tau-1)}\sigma_2(x^{(n)}) \to \mathbb{E}W$, we conclude that $M^{nr}_\infty(\lambda)$ defined in (2.13) is the Gromov-weak limit of $n^{-(\tau-3)/(\tau-1)} M^{nr}_n(\lambda)$, where $M^{nr}_n(\lambda)$ is as in (1.6). Remark 7. Theorem 1.8 is stated for a fixed $\lambda \in \mathbb{R}$, but in the argument just given we have to work with a sequence, namely $t_n - (\sigma_2(x^{(n)}))^{-1}$, converging to $t^{nr}_\lambda$. This, however, makes no difference: the proof of [10, Proposition 9] can be imitated to prove the same result in the setup where we have a sequence converging to $t$ instead of a fixed $t$, and no new idea is involved. (In [10, Lemma 27], Aldous and Limic prove a similar result for the multiplicative coalescent; they do not, however, explicitly state the convergence of the associated process under the same assumption.)

PROOFS: CONVERGENCE IN GROMOV-HAUSDORFF-PROKHOROV TOPOLOGY
In this section, we improve the Gromov-weak convergence in Theorem 1.2 to Gromov-Hausdorff-Prokhorov convergence. To do so, we rely on [14, Theorem 6.1], which gives a criterion for convergence in the Gromov-Hausdorff-weak topology. We do not give the definition of the Gromov-Hausdorff-weak topology and instead refer the reader to [14, Definition 5.1]. Convergence in the Gromov-Hausdorff-weak topology implies convergence in the Gromov-Hausdorff-Prokhorov topology when the metric measure spaces involved have full support (i.e., the support of the measure is the entire metric space). This is true in our situation: indeed, it is a trivial fact that $\mathcal{C}_i(\lambda)$ has full support, and the mass measure on an inhomogeneous continuum random tree has full support, which implies that the same is true for $M^{nr}_i(\lambda)$. Applying [14, Theorem 6.1] to our situation, it is enough to prove the following lemma: Lemma 6.1 (Global lower mass-bound). Let $\mathcal{C}_i(\lambda)$ be the $i$th largest component of $NR_n(w(\lambda))$. For each $i \ge 1$, $v \in [n]$ and $\delta > 0$, let $B(v,\delta)$ denote the intrinsic ball (in $NR_n(w(\lambda))$) of radius $\delta n^{(\tau-3)/(\tau-1)}$ around $v$; then the stated global lower mass-bound holds. Before moving on to the proof of Lemma 6.1, we state a result that essentially says that instead of looking at the largest components, we can work with the components of high-weight vertices. This observation will be used to prove the global lower mass-bound. The purpose of this section is to prove a strong result (Proposition 6.3 below) that gives control over the number of intrinsic balls of radius $\varepsilon n^\eta$ needed to cover the largest components. This acts as a crucial ingredient in the proof of Lemma 6.1, as well as in the proof of the bound on the upper box-counting dimension. We now prove Proposition 6.3. The proof consists of four steps. In the first step, we reduce the proof to the study of the height of mixed-Poisson branching processes.
In the second step, we ensure that we can take $\lambda = 0$, while in the third step, we study the survival probability of such critical infinite-variance branching processes. In the fourth and final step, we prove the claim. In particular, $\mathrm{diam}(\mathcal{C}^{\mathrm{res}}(i)) \ge \varepsilon n^\eta$ for this $i$. Now the random graph $NR_n(w(\lambda))$ restricted to $[n] \setminus [i-1]$ is the Norros-Reittu random graph $NR_n(w^{(i)}(\lambda))$, where $w^{(i)}$ is the corresponding restricted weight sequence and $\ell^{(i)}_n = \sum_{k=i}^{n} w_k$; indeed, this follows from a simple observation. Write $W^{(i)}_n(\lambda)$ for a random variable whose distribution is given by $(n-i+1)^{-1}\sum_{j=i}^{n} \delta_{w^{(i)}_j(\lambda)}$, and for any non-negative random variable $X$ with $\mathbb{E}X > 0$, let $X^\bullet$ be the random variable having the size-biased distribution, i.e., $\mathbb{P}(X^\bullet \le x) = \mathbb{E}[X\mathbb{1}\{X \le x\}]/\mathbb{E}X$. We will use the following comparison with a mixed-Poisson branching process: Lemma 6.4 (Domination by a mixed-Poisson branching process). Fix $i \in [n]$ and consider $NR_n(w^{(i)}(\lambda))$. Then there exists a coupling of $\mathcal{C}^{\mathrm{res}}(i)$ and a branching process, in which the root has a $\mathrm{Poi}(w^{(i)}_i(\lambda))$ offspring distribution while every other vertex has a $\mathrm{Poi}((W^{(i)}_n(\lambda))^\bullet)$ offspring distribution, such that in the breadth-first exploration of $\mathcal{C}^{\mathrm{res}}(i)$ starting from $i$, each vertex $v \in \mathcal{C}^{\mathrm{res}}(i)$ has at most as many children as in the branching process.
When $\mathrm{ht}(\mathcal{T}^{(i)}_n(\lambda)) > \varepsilon n^\eta/2$, at least one of the subtrees of the root must have height at least $\varepsilon n^\eta/2$. Combining this observation with (6.4) and (6.5), we obtain the corresponding bound, where $\tilde{\mathcal{T}}^{(i)}_n(\lambda)$ is a branching process tree in which every vertex has a $\mathrm{Poi}((W^{(i)}_n(\lambda))^\bullet)$ offspring distribution.
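The dominating branching process of Lemma 6.4 is easy to simulate. In the sketch below (our own, with a Knuth-style Poisson sampler since the standard library lacks one), every individual receives a $\mathrm{Poi}(W^\bullet)$ number of children, where $W^\bullet$ is a size-biased pick from the empirical weight distribution; for subcritical weights the height is finite almost surely.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler; fine for moderate lam."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def bp_height(weights, rng, cap=100_000):
    """Height of a branching process where each individual gets a
    Poi(W_bullet) number of children, W_bullet size-biased from weights;
    exploration truncated once `cap` individuals have been produced."""
    height, generation, total = 0, 1, 1
    while generation and total < cap:
        children = 0
        for _ in range(generation):
            w = rng.choices(weights, weights=weights)[0]   # size-biased pick
            children += poisson(w, rng)
        total += children
        if children:
            height += 1
        generation = children
    return height

rng = random.Random(0)
w = [0.5] * 50                 # W_bullet = 0.5, mean offspring 0.5 (subcritical)
heights = [bp_height(w, rng) for _ in range(300)]
```

With mean offspring $1/2$, the height distribution has geometric tails, which is the kind of height control the first step of the proof reduces to.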
We make the convention of writing $\mathcal{T}^{(i)}_n$, $W^{(i)}_n$, etc., instead of $\mathcal{T}^{(i)}_n(0)$, $W^{(i)}_n(0)$, etc. The survival probability of mixed-Poisson branching processes. We would like to compare our mixed-Poisson branching process with one whose offspring distribution is independent of $n$. For this, we rely on the following two lemmas: Lemma 6.5 (Mixed-Poisson branching processes of different parameters). Let $\mathcal{T}^{(i)}_n$ and $\mathcal{T}^{(i)}_n(\lambda)$ be as above, and assume further that $\lambda \ge 0$. Then the stated comparison holds for each $k \ge 1$. Proof: We follow [43, Proof of Lemma 3.4(1)]. Writing $\delta = 1 + \lambda n^{-\eta}$, we note that we can obtain $\mathcal{T}^{(i)}_n$ as a subtree of $\mathcal{T}^{(i)}_n(\lambda)$ by killing every child independently with probability $1 - \delta^{-1}$. Write $A$ for the event that $\mathrm{ht}(\mathcal{T}^{(i)}_n(\lambda)) \ge k$ and no vertex on the leftmost path of length $k$ starting from the root in $\mathcal{T}^{(i)}_n(\lambda)$ is killed. Indeed, the probability of the leftmost path surviving is precisely $1/\delta^k$. To finish the proof, note that $A$ implies $\mathrm{ht}(\mathcal{T}^{(i)}_n) \ge k$. Repeated application of the above yields a simple comparison inequality for sequences $\{a_n\}_{n \ge 1}$ and $\{b_n\}_{n \ge 1}$ of positive numbers satisfying the analogous recursive relation. Recall that $\iota$ denotes the leftmost point of the support of $F$, and note that from (1.2) it follows that $\int_{w_j}^{\infty} f = j/n$ for $j = 1, 2, \ldots, n$ (note also that $w_n = \iota$). Define the function $h_n : [\iota, w_1) \to (\iota, \infty)$ by $\int_{y}^{h_n(y)} f = 1/n$. This immediately implies $f(h_n(y))\, h_n'(y) = f(y)$. (6.8) Let $g_n : [\iota, w_1) \to (0, \infty)$ be given by the corresponding display. A direct computation and an application of (6.8) yield $g_n'(y) \le 0$ on $(\iota, w_1)$: since $u f(u)$ is non-increasing on $[\iota, \infty)$ under Assumption 1.1, we conclude that $g_n(\cdot)$ is non-increasing on $[\iota, w_1)$. By right continuity, we can define $g_n(w_1) = w_1/\big(\int_{w_1}^{\infty} u f(u)\, du\big)$. Since $w_n \le w_{n-1} \le \cdots \le w_1$, we conclude that $g_n(w_1) \le g_n(w_2) \le \cdots \le g_n(w_n)$. Clearly $h_n(w_j) = w_{j-1}$ for $j = 2, \ldots, n$.
Thus, an application of (6.7) gives the claimed comparison, which is equivalent to the assertion of the lemma. This concludes the proof. ■ We continue by studying the survival probability of mixed-Poisson branching processes with infinite-variance offspring distribution. By the Otter-Dwass formula, which describes the distribution of the total progeny of a branching process (see [36] for the special case when the branching process starts with a single individual, [58] for the more general case, and [42] for a simple proof based on induction), we have $\mathbb{P}(|\mathcal{T}| = k) = \frac{1}{k}\,\mathbb{P}\big(\sum_{i=1}^{k} X_i = k-1\big)$, where the $X_i$ are i.i.d. random variables distributed as $W^\bullet$. By [41, Proposition 2.7], the relevant local limit estimate holds in our situation. Take $k = m^{(\tau-2)/(\tau-3)}$ in the second inequality in (6.9) to obtain the corresponding bound, where $|\mathcal{T}|$ denotes the total number of vertices in $\mathcal{T}$. We condition on the size $|\mathcal{T}|$; by [50, Theorem 4], there exists a $\kappa > 1$ such that the stated estimate holds uniformly for $u \ge 1$. Combining this with (6.10) gives the claim, as required. ■ Proof of Proposition 6.3: The claimed chain of inequalities holds, with the final bound of order $n^{1/(\tau-1)}$, where the second inequality is a consequence of Lemma 6.6 and the last step follows from Lemma 6.7.
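For Poisson offspring the Otter-Dwass formula used above can be checked in closed form: if the offspring distribution is $\mathrm{Poi}(\mu)$, then $\mathbb{P}(|\mathcal{T}| = k) = \mathbb{P}(\mathrm{Poi}(k\mu) = k-1)/k$, which is the Borel($\mu$) distribution, and for $\mu < 1$ its total mass is $1$. The sketch below (ours, computed in log-space to avoid overflow) verifies this numerically.

```python
import math

def borel_pmf(mu, k):
    """P(|T| = k) for a Galton-Watson tree with Poi(mu) offspring, via
    Otter-Dwass: P(S_k = k - 1) / k with S_k ~ Poi(k * mu), i.e. the
    Borel(mu) pmf  e^{-mu k} (mu k)^{k-1} / k!  (computed in log-space)."""
    log_p = -mu * k + (k - 1) * math.log(mu * k) - math.lgamma(k + 1)
    return math.exp(log_p)

mass = sum(borel_pmf(0.5, k) for k in range(1, 200))
```

The exponential tail of this pmf for $\mu < 1$ is what degrades to a polynomial tail in the critical infinite-variance regime treated here, which is why the local limit estimate of [41, Proposition 2.7] is needed.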
Substituting the estimate (6.11) into (6.6) leads to the desired bound; here, we have used the second inequality in (6.13). Combining this estimate with (6.12) and the first inequality in (6.13), we end up with the claimed inequality. Note that $\varepsilon N^\eta = \varepsilon^{-\delta\eta}$. A little more work after plugging this into (6.14) leads to (6.2). ■

Proof of global lower-mass bound.
In this section, we complete the proof of Lemma 6.1. We start with some preliminaries. Recall the definitions of $\eta$ and $\rho$ from (6.1), and recall that for $v \in [n]$, $B(v,\delta)$ denotes the intrinsic ball (in $NR_n(w(\lambda))$) around $v$ of radius $\delta n^\eta$. We will use the following bound on the weight of balls: Lemma 6.9 (Weights of balls around high-weight vertices cannot be too small). For every $\varepsilon > 0$ and $i \ge 1$, there exist $n_{i,\varepsilon}$ large and $\delta_{i,\varepsilon} > 0$ such that the stated bound holds for all $n \ge n_{i,\varepsilon}$ and $\delta \in (0, \delta_{i,\varepsilon}]$. Proof: We rely on a cluster exploration used in [17], which we describe next. We denote by $(Z_l(i))_{l \ge 0}$ the exploration process of $\mathcal{C}(i)$, the cluster containing $i$, in the breadth-first search started from $i$, where $Z_0(i) = 1$ and $Z_1(i)$ denotes the number of potential neighbors of the initial vertex $i$. The variable $Z_l(i)$ has the interpretation of the number of potential neighbors of the first $l$ explored potential vertices in the cluster whose neighbors have not yet been explored. We explore by taking one vertex off the 'stack' of size $Z_l(i)$, drawing its mark and checking whether it is a real vertex, followed by drawing its number of potential neighbors. Thus, we set $Z_0(i) = 1$, $Z_1(i) = \mathrm{Poi}(w_i)$, and note that, for $l \ge 2$, $Z_l(i)$ satisfies the recursion relation $Z_l(i) = Z_{l-1}(i) + X_l - 1$, where $X_l$ denotes the number of potential neighbors of the $l$th potential vertex that is explored, and $X_1 = X_1(i) = \mathrm{Poi}(w_i)$. More precisely, when we explore the $l$th potential vertex, we start by drawing its mark $M_l$ in an i.i.d. fashion with distribution $\mathbb{P}(M_l = m) = w_m/\ell_n$, $m \in [n]$. When we have already explored a vertex with the same mark as the one drawn, we turn the status of the vertex to be explored to inactive, the potential vertex does not become a real vertex, and we proceed with the next potential vertex.
When, instead, it receives a mark that we have not yet seen, the potential vertex becomes a real vertex, its mark $M_l \in [n]$ indicating the vertex in $[n]$ to which the $l$th explored vertex corresponds, so that $M_l \in \mathcal{C}(i)$. We then draw $X_l = \mathrm{Poi}(w_{M_l})$, the number of potential vertices incident to the real vertex $M_l$. Again, upon exploration, these potential vertices might become real vertices, and this occurs precisely when their marks correspond to vertices in $[n]$ that have not yet appeared in the cluster exploration. We call this procedure of drawing a mark for a potential vertex, to investigate whether it corresponds to a real vertex, a vertex check. Let $Z^{(n)}_t(i) = n^{-1/(\tau-1)} Z_{tn^\rho}(i)$ for $t > 0$. Then, by imitating the techniques used in the proof of [17, Theorem 2.4], we obtain the distributional convergence of $Z^{(n)}(i)$ to the limiting process $S(i)$ described below. ([17, Theorem 2.4] states the result for $i = 1$; however, the exact same proof goes through for any $i \ge 2$.) The limiting process $(S_t(i))_{t > 0}$ is defined as follows. We let $(\mathcal{I}_i(t))_{i \ge 1}$ denote independent increasing indicator processes defined via independent exponential random variables $\mathrm{Exp}(a i^{-1/(\tau-1)})$, $i \ge 1$, with rates $a i^{-1/(\tau-1)}$. Then, for all $t \ge 0$, we define $(S_t)_{t \ge 0}$ in terms of these indicator processes, where $c = \lambda + \zeta - ab$ and $\zeta$ is as in (2.12). We call $(S_t)_{t \ge 0}$ a thinned Lévy process.
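The mark-based cluster exploration described above can be sketched directly. In the following illustration (ours; the Poisson sampler is again Knuth's), each real vertex spawns a $\mathrm{Poi}(w_v)$ stack of potential neighbours; a potential vertex draws an i.i.d. mark with $\mathbb{P}(M = m) \propto w_m$ and becomes real only if its mark is new, which is exactly the "vertex check".

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler; fine for moderate lam."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def explore_cluster(w, i, rng):
    """Norros-Reittu cluster exploration started from vertex i: each real
    vertex v contributes Poi(w_v) potential neighbours to the stack; a
    potential vertex with an already-seen mark is discarded (thinning)."""
    n = len(w)
    cluster = {i}
    stack = poisson(w[i], rng)             # potential neighbours of the root
    while stack:
        stack -= 1
        mark = rng.choices(range(n), weights=w)[0]   # i.i.d. mark, P ~ w_m
        if mark not in cluster:            # vertex check: new mark -> real
            cluster.add(mark)
            stack += poisson(w[mark], rng)
    return cluster

rng = random.Random(2)
w = [0.6] * 40                              # subcritical: mean offspring < 1
cluster = explore_cluster(w, 0, rng)
```

The discarded multiple hits are precisely the $(D_t)$ correction subtracted from the Lévy part $(R_t)$ in the thinned Lévy process limit.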
Hence, for all $n \ge n_{\varepsilon_1,\varepsilon_2}$, the stated bound holds, where the second inequality is a consequence of Lemma 6.9. Next, on the relevant event, the stated estimate holds for any $v \in \mathcal{C}(i)$. Further, by [17, Theorem 1.4], $n^{-\rho}\sum_{j \in \mathcal{C}(i)} w_j$ converges in distribution to a positive random variable. Hence, there exists $\xi^{(i)}_{\varepsilon_2} > 0$ such that the required lower bound holds. The result follows upon combining (6.25), (6.27) and (6.28). ■ We are now ready for the proof of Lemma 6.1. Proof of Lemma 6.1: Using Proposition 6.2, for any $i \ge 1$ and $\varepsilon > 0$, we can choose $K$ such that (6.29) holds. By Lemma 6.10, we can choose $\xi > 0$ and an integer $\bar n$ such that (6.30) holds for all $n \ge \bar n$ and $k \in [K]$. Combining (6.29) and (6.30) then yields the desired tightness. ■

PROOFS: FRACTAL DIMENSION
In this section, we prove the assertions about the box-counting dimension. Throughout this section, $C, C'$ denote universal constants whose values may change from line to line.
We first prove a similar result for the component $\mathcal{C}(j)$ of $j$. Consider $\mathcal{C}(1)$ and, as usual, view $\mathcal{C}(1)$ as a metric measure space via the graph distance, assigning appropriate mass to each vertex of $\mathcal{C}(1)$. Note that, conditional on its vertex set, $\mathcal{C}(1)$ has the same distribution as the graph $\bar G_m(p, a)$ where $a = (1 + \lambda n^{-\eta})\big(\sum_{j \in \mathcal{C}(1)} w_j\big)^2/\ell_n$. Using [17, Proposition 3.7] and [17, Lemma 3.1], it is easy to verify that the conditions in Assumption 4.4 hold with this choice of $a$ and $p$. Thus, by Theorem 4.5, $n^{-\eta}\mathcal{C}(1)$ converges in the Gromov-weak topology to a limiting space that we denote by $\mathcal{M}(1)$. Further, the sequence $\{n^{-\eta}\mathcal{C}(1)\}_{n \ge 1}$ satisfies the global lower mass-bound property by Lemma 6.10. Since the convergence in (7.1) holds with respect to the Gromov-Hausdorff topology, a corresponding covering estimate holds for every $x, \varepsilon > 0$. Fix an arbitrary $\delta > 0$ and, for any $\varepsilon > 0$, define the relevant quantities. Let $E_n$ be the event defined in (6.3). Clearly, on this event the covering bounds apply, where $I_q(\cdot)$ and $H_{S(1)}(\cdot)$ are as defined around (6.18). Combining (7.5), (7.6), (7.7) and (7.8), we conclude that, for any $u_\varepsilon > 0$, the corresponding bound holds. Now the $I_q(u_\varepsilon)$ are i.i.d. Bernoulli random variables, with success probability $p_q$ involving the constant $a$ from (6.16). Choose $s > 0$ small enough that $e^s - 1 \le 2s$. Clearly $\mathbb{E}\exp\big(sI_q(u_\varepsilon)\big) = 1 + p_q(e^s - 1) \le \exp\big(p_q(e^s - 1)\big) \le \exp(2sp_q)$.
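The elementary moment-generating-function chain just used — $\mathbb{E}\exp(sI_q) = 1 + p_q(e^s-1) \le \exp(p_q(e^s-1)) \le \exp(2sp_q)$, valid whenever $e^s - 1 \le 2s$ — can be checked numerically over a grid (our sketch; the cutoff $s \le 1.25$ is a range on which $e^s - 1 \le 2s$ still holds):

```python
import math

def mgf_bernoulli(p, s):
    """E[exp(s * I)] for a Bernoulli(p) indicator I."""
    return 1 + p * (math.exp(s) - 1)

# verify 1 + p(e^s - 1) <= exp(p(e^s - 1)) <= exp(2 s p) on a grid;
# the second inequality needs e^s - 1 <= 2s, true for 0 < s <= 1.25
chain_holds = all(
    mgf_bernoulli(p, s)
    <= math.exp(p * (math.exp(s) - 1))
    <= math.exp(2 * s * p)
    for p in [i / 10 for i in range(11)]
    for s in [j / 100 for j in range(1, 126)]
)
```

The first inequality is just $1 + x \le e^x$; only the second uses the smallness of $s$, which is what allows the exponential Chernoff bound on the sum of the $I_q(u_\varepsilon)$.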
The rest of this section is devoted to the proof of Proposition 7.2. As in Section 7.1 and for simplicity, we work with $j = 1$; the proof is similar for any $j \ge 2$. Before starting the proof, we collect some preliminaries. The proof below relies on two asymptotic bounds on $|\mathcal{C}(1)|$. For this, we use $\limsup_n \mathbb{P}\big(n^{-\rho}|\mathcal{C}(1)| \le s\big) = \mathbb{P}\big(H_{S(1)}(0) \le s\big)$, (7.14) where $H_{S(1)}(\cdot)$ is defined around (6.18). Our main result on the lower tail of the distribution of $H_{S(1)}(0)$ is the following lemma. We split the process into two parts $(R_t)_{t \ge 0}$ and $(D_t)_{t \ge 0}$ where, abbreviating $d_j = a/j^{1/(\tau-1)}$, the $(N_j(t))_{t \ge 0}$ are independent rate-$d_j$ Poisson processes. Thus, $(R_t)_{t \ge 0}$ is a Lévy process, while $(D_t)_{t \ge 0}$ subtracts the multiple hits. When $b > 0$ and $t \le s$ with $s$ small, and using that $D_s$ is non-decreasing, the stated bound follows, where we have used (6.13) in the second step to lower bound $\sum_{i \le K} w_i^2$. We write $\mathcal{C}(A) = \bigcup_{a \in A} \mathcal{C}(a)$. Then, for $j \le K$, we bound $w(\mathcal{C}(j)) \le w(\mathcal{C}([K]))$. This completes the proof. ■ By (7.14) and Lemma 7.3, $\limsup_n \mathbb{P}\big(|\mathcal{C}(1)| \le \varepsilon^{\delta h/2} n^\rho\big) \le C \varepsilon^{\delta h/2}$.
Analysis of $X^{(n)}_2$. We next give an upper bound on $\mathbb{P}(X^{(n)}_2(\varepsilon) \ge x_\varepsilon)$. We start with Markov's inequality: $\mathbb{P}\big(X^{(n)}_2(\varepsilon) \ge x_\varepsilon\big) \le x_\varepsilon^{-1}\, \mathbb{E}\big[X^{(n)}_2(\varepsilon)\big]$. The relevant events, the last being (iii) $\{k \in \mathcal{C}(1)\}$, occur disjointly, where $\mathrm{dist}(i,j)$ denotes the graph distance in the random graph $NR_n(w)$. There are two cases, depending on whether $k > N(\varepsilon)$ or $k \le N(\varepsilon)$. When $k \le N(\varepsilon)$, we can ignore the event $\{\mathrm{dist}(j,k) \ge 2\varepsilon n^\eta\}$. This gives, for $2 \le i \le N(\varepsilon)$, a bound in terms of disjoint occurrence, where, for two increasing events $A, B$, we write $A \circ B$ for the event that $A$ and $B$ occur disjointly. By the BK inequality, we obtain the bound $\mathbb{P}\big(\mathrm{dist}(i,k) \ge 4\varepsilon n^\eta\big)\, \mathbb{P}\big(k \in \mathcal{C}(1)\big)$.
By sandwiching $\varepsilon$ between $1/(k-1)^p$ and $1/k^p$, we obtain the bound $\dim(\mathcal{M}(1)) \le \pi$ a.s. ■ Proof of (1.8) and (1.20): Proposition 7.2, combined with an argument identical to the one given right after the proof of Proposition 7.1, yields the lower bound $\dim(M^{nr}_i(\lambda)) \ge \pi$ a.s. Then (1.8) follows once we combine this lower bound with (7.11), and (1.20) follows as a consequence of (2.13). ■

OPEN PROBLEMS
In Theorem 1.8, we have considered a general entrance boundary $c \in l_0$. To study specific properties of the limit objects, we focused mainly on the special case $c = c(\alpha, \tau)$ as in (1.19), in which case we have shown compactness and identified the box-counting dimension in Theorem 1.9. An important problem in this context is to establish necessary and sufficient conditions on $c$ that ensure compactness of the limiting spaces.
Another motivation for pursuing this problem comes from the following simple corollary of Theorem 1.9: for any $i \ge 1$, consider the sequence $\theta^{(i)}$ as in (2.10); then $\mathcal{T}_{\theta^{(i)}}(\infty)$ is almost surely compact. Similarly, compactness of $\mathcal{M}(1)$ (as defined in (7.1)) implies compactness of the associated ICRT $\mathcal{T}_\theta(\infty)$, where $\theta = (\theta_i : i \ge 1)$ is given by a prescription in terms of the quantities $q_k$. These can be thought of as "annealed results," since $\theta^{(i)}$ and $\theta$ are random. No result is known in this direction without a prior distribution on $\theta$; i.e., sufficient conditions on non-random $\theta \in \Theta$ that ensure compactness of the tree $\mathcal{T}_\theta(\infty)$ are not known. In [11, Section 7], Aldous, Miermont and Pitman conjecture that boundedness of $\mathcal{T}_\theta(\infty)$ for $\theta \in \Theta$ is equivalent to $\int_1^{\infty} (\psi_\theta(u))^{-1}\, du < \infty$, where $\psi_\theta$, in our situation, takes an explicit form. This conjecture, however, remains open to date. Our proof technique demonstrates a method of proving such annealed results via approximation by random graphs. Thus, a classification of those $c \in l_0$ for which the spaces $M^c_i(\lambda)$ are compact would lead to a broad class of prior distributions on $\theta$ for which $\mathcal{T}_\theta(\infty)$ is compact. The integral criterion in turn implies that both the Hausdorff dimension and the packing dimension of a $\psi_\theta$-Lévy tree equal $(\tau-2)/(\tau-3)$ a.s. (see [34, 40]). Using the analogy between ICRTs and Lévy trees as in [11, Section 7], it is natural to expect that the same is true for $\mathcal{T}_\theta(\infty)$ and hence for $\mathcal{M}(1)$. This is the heuristic behind Conjecture 1.3.