1 Introduction and results

In the last two decades, many results have been proved regarding scaling limits of large discrete random objects to their continuum analogs. Examples range from Aldous’s continuum random tree [7, 8, 51], Schramm-Loewner evolution and critical planar systems [61], to what is most closely related to this paper: scaling limits of maximal components in the critical regime for random graphs as well as the minimal spanning tree on the giant component in the supercritical regime [3,4,5].

Motivated by empirical observations on real-world networks, in the last decade researchers from a wide array of fields including computer science, the social sciences and statistical physics have proposed a large number of random graph models to explain various functionals of real-world systems, including power-law degree distributions and small-world scaling of distances between nodes in the network [6, 21, 32, 33, 35, 44, 55, 56]. Many of these models have a parameter t related to the edge density and a model-dependent critical point \(t_c\). Writing n for the number of vertices in the network, if \(t< t_c\) then the maximal connected component \(\mathscr {C}_1(n)\) has size negligible compared to n, while if \(t> t_c\) one has a giant component \(|\mathscr {C}_1(n)|\sim f(t) n\) for some positive model-dependent function f(t). The “\(t=t_c\)” regime is often referred to as the critical regime. Just as the study of the classical critical Erdős-Rényi random graph spurred enormous activity in probabilistic combinatorics in the 90s [9, 21, 47, 52, 53], new phenomena such as explosive percolation [2, 60] have motivated a concerted effort to understand the critical regime of these new random graph models.

In this context, for more than a decade [23, 24, 28, 62], one of the fundamental open conjectures in this area (loosely stated) has been as follows. Distances between typical points in the maximal component in the critical regime, as well as in the minimal spanning tree on the giant component in the supercritical regime, scale as follows:

  1. (a)

    If the random graph model has an asymptotic degree distribution with finite third moment, then distances scale like \(n^{1/3}\).

  2. (b)

    If the random graph model has a limiting degree distribution \(\left\{ p_k\right\} _{k\ge 1}\) with tail \(p_k\sim C/k^{\tau }\) for \(\tau \in (3,4)\), then distances scale like \(n^{(\tau -3)/(\tau -1)}\).

Contributions of this paper Since we will need to set up some notation before getting to the main results, let us give a general overview of the contributions of this paper:

  1. (i)

    General theory The fundamental aim of the paper is to develop a general theory one can use to prove (b) in the conjecture above for a wide class of random graphs and, in particular, to derive a new class of continuum scaling limits. To do so, we consider the multiplicative coalescent with entrance boundary in the space \(l_0\) as in [10] [see (1.11) below]. Viewing the maximal components as measured metric spaces (using graph distance and vertex weights), we show that these components, with edges and associated measures properly rescaled, converge to continuum random objects in the Gromov-weak sense. The resulting objects are obtained via appropriate tilts and vertex identifications of inhomogeneous continuum random trees; untilted versions of the same objects have been used to describe the entrance boundary of the additive coalescent [13]. These limits are “tree-like” but with a dense collection of “hubs” (corresponding to infinite-degree vertices).

  2. (ii)

    Proof techniques The standard technique in proving such results is to study height processes of certain spanning trees of the components and to show that these processes converge to limiting excursions that code the limiting random real trees. In our context, the convergence of height processes of the corresponding approximating \(\mathbf {p}\)-trees is not known. In [11], the height processes of \(\mathbf {p}\)-trees were shown to converge to limiting excursions in certain regimes, but these results are not applicable to our situation. Because of this, we develop new techniques relying on first showing convergence in Gromov-weak topology via a careful analysis of the tree spanning a finite collection of “typical” points in random “tilted” \(\mathbf {p}\)-trees. In one fundamental class of random graph models, we then extend Gromov-weak convergence to Gromov-Hausdorff-Prokhorov convergence by proving a global lower mass-bound.

  3. (iii)

    Special case As an example of the general theory, we study the special case of the Norros-Reittu model [57] (which in the regime of interest has been proven [46] to be equivalent to the Chung-Lu model [30] and the rank-one random graph [22]). In this case, we show that the limiting spaces are compact. We also show that the box-counting or Minkowski dimension equals \((\tau -2)/(\tau -3)\) a.s.

In work in progress [19], we use the general theory in this paper to analyze another fundamental random graph model, the configuration model with degree distribution with exponent \(\tau \in (3,4)\), and derive the continuum analogs of the maximal components of this model. We defer a more detailed discussion of related work and the relevance of the current study to Sect. 3.

Organization of the paper A reasonable amount of notation regarding the entrance boundary of the multiplicative coalescent is required to describe the main results (Theorems 1.8, 1.9). To ease the reader into the paper, we start in Sect. 1.1 with the special case of the Norros-Reittu model and in Theorem 1.2 describe what the main results imply for this model. Then in Sect. 1.2 we define the multiplicative coalescent as well as the class of entrance boundaries of importance for the paper and then describe the two main results. The results use two notions of convergence of metric spaces; these are given a precise formulation in Sect. 2.1. Section 2.2 describes an important class of random trees called \(\mathbf {p}\)-trees and the corresponding inhomogeneous continuum random trees that arise as scaling limits of these objects. These are then used in Sect. 2.3 to give a precise description of the scaling limits of maximal components. We discuss the relevance of the main results, relate these to existing work and give an overview of the proof in Sect. 3. The proofs of the main results are contained in Sects. 4–7.

Notation Throughout this paper, we make use of the following standard notation. We let \(\mathop {\longrightarrow }\limits ^{d}\) denote convergence in distribution, and \(\mathop {\longrightarrow }\limits ^{\mathrm {P}}\) convergence in probability. For a sequence of random variables \((X_n)_{n\ge 1}\), we write \(X_n=o_{\mathrm {P}}(b_n)\) when \(|X_n|/b_n\mathop {\longrightarrow }\limits ^{\mathrm {P}}0\) as \(n\rightarrow \infty \). For a non-negative function \(n\mapsto g(n)\), we write \(f(n)=O(g(n))\) when \(|f(n)|/g(n)\) is uniformly bounded, and \(f(n)=o(g(n))\) when \(\lim _{n\rightarrow \infty } f(n)/g(n)=0\). Furthermore, we write \(f(n)=\Theta (g(n))\) if \(f(n)=O(g(n))\) and \(g(n)=O(f(n))\). We say that a sequence of events \(({\mathscr {E}}_n)_{n\ge 1}\) occurs with high probability (whp) when \({{\mathrm{\mathbb {P}}}}({{\mathscr {E}}}_n)\rightarrow 1\).

1.1 Rank-one random graph

1.1.1 Model formulation

We start by describing a particular class of random graph models called the Poissonian random graph or the Norros-Reittu model [22, 57], sometimes also referred to as the rank-one random graph model [22]. In the regime of interest for this paper, as shown in [46], this model is equivalent to the Chung-Lu model [29,30,31,32] and the Britton-Deijfen-Martin-Löf model [25]. Start with vertex set \([n]:=\left\{ 1,2,\ldots , n\right\} \) and suppose each vertex \(i\in [n]\) has a weight \(w_i\ge 0\) attached to it; intuitively this measures the propensity or attractiveness of this vertex in the formation of links. Writing \(\varvec{w}=(w_1,\ldots , w_n)\), place an edge between i and j independently for each \(i\ne j\in [n]\) with probability

$$\begin{aligned} q_{ij}=q_{ij}(\varvec{w}):= 1-\exp (-w_i w_j/\ell _n), \end{aligned}$$
(1.1)

where \(\ell _n\) is the total weight given by

$$\begin{aligned} \ell _n:= \sum _{i\in [n]} w_i. \end{aligned}$$

To complete the formulation, we need to specify how these vertex weights are chosen. Essentially we want the empirical distribution of weights \(n^{-1} \sum _{i\in [n]} \delta \left\{ w_i\right\} \) to converge to a fixed pre-specified distribution F as \(n\rightarrow \infty \). There are a number of ways to do this, but for this paper the following choice turns out to be convenient for a clear statement of the results. Let \((w_i)_{i\in [n]}\) be constructed by

$$\begin{aligned} w_i:= [1-F]^{-1}(i/n), \quad i\in [n], \end{aligned}$$
(1.2)

where F is a cumulative distribution function on \([0,\infty )\) and \([1-F]^{-1}\) is the generalized inverse

$$\begin{aligned}{}[1-F]^{-1}(u):= \inf \left\{ s:[1-F](s)\le u\right\} . \end{aligned}$$

We assume there exists \(\tau \in (3,4)\) and \(c_{F} > 0\) such that

$$\begin{aligned} \lim _{x\rightarrow \infty } x^{\tau -1} [1-F(x)] = c_{F}. \end{aligned}$$
(1.3)

We write W for a random variable with distribution F, and denote the corresponding random graph by \(\mathrm{NR}_n(\varvec{w})\).
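The model is straightforward to simulate. The following is a minimal Python sketch (our own illustrative code, not from any reference implementation) which builds the weights via (1.2) for an exact power-law F as in Assumption 1.1 below, samples the edges via (1.1), and reports the size of the largest component. For this choice of F one can check that \(\nu =\iota (\tau -2)/(\tau -3)\), so \(\iota =(\tau -3)/(\tau -2)\) corresponds to the critical case \(\nu =1\).

```python
import numpy as np

def nr_graph(n, tau=3.5, iota=None, rng=None):
    """Sample NR_n(w) for the exact power law F(x) = 1 - (iota/x)^(tau-1),
    x > iota, so that [1 - F]^{-1}(u) = iota * u^{-1/(tau-1)} as in (1.2)."""
    rng = np.random.default_rng(rng)
    if iota is None:
        iota = (tau - 3) / (tau - 2)            # makes nu = 1 (critical case)
    w = iota * (np.arange(1, n + 1) / n) ** (-1.0 / (tau - 1))
    ell = w.sum()                               # total weight l_n
    # edge {i, j} present independently with prob 1 - exp(-w_i w_j / l_n), eq. (1.1)
    prob = 1.0 - np.exp(-np.outer(w, w) / ell)
    adj = np.triu(rng.random((n, n)) < prob, k=1)
    return adj | adj.T, w

def largest_component(adj):
    """Vertices of the largest connected component, via depth-first search."""
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    best = []
    for s in range(n):
        if seen[s]:
            continue
        comp, stack = [], [s]
        seen[s] = True
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in np.flatnonzero(adj[v] & ~seen):
                seen[u] = True
                stack.append(u)
        if len(comp) > len(best):
            best = comp
    return best

adj, w = nr_graph(n=2000, rng=1)
print(len(largest_component(adj)))   # compare with n^{(tau-2)/(tau-1)} = 2000^{0.6}
```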

1.1.2 Motivation and known results

As described in the introduction, one impetus for the formulation of a wide array of network models is to capture the heterogeneous and heavy-tailed nature of the degree distribution of empirical networks. Write \(N_k\) for the number of vertices with degree k in \(\mathrm{NR}_n(\varvec{w})\). Under the assumptions in the previous section, one can show [22, Theorem 3.13] that

$$\begin{aligned} \frac{N_k}{n} \mathop {\longrightarrow }\limits ^{\mathrm {P}}{{\mathrm{\mathbb {E}}}}\left( {\mathrm {e}}^{-W } \frac{W^k}{k!}\right) , \quad k\ge 0, \end{aligned}$$
(1.4)

where \(W\sim F\). In particular, the degree distribution also has tail exponent \(\tau \). More important in the context of this paper is the connectivity threshold. For \(i\ge 1\) write \(\mathscr {C}_i\) for the ith largest connected component and let \(|\mathscr {C}_i|\) denote its number of vertices. Now define the parameter

$$\begin{aligned} \nu := \frac{{{\mathrm{\mathbb {E}}}}(W^2)}{{{\mathrm{\mathbb {E}}}}(W)}, \end{aligned}$$
(1.5)

and note that \(\nu <\infty \) by (1.3). Then by [22, Theorem 3.1 and Sect. 16.4], we have the following criterion for the phase transition for the largest component.

  1. (a)

    Supercritical regime If \(\nu >1\), then there exists \(\rho \in (0,1) \) such that \(|\mathscr {C}_1|/n\mathop {\longrightarrow }\limits ^{\mathrm {P}}\rho \) whilst \(|\mathscr {C}_2|/n\mathop {\longrightarrow }\limits ^{\mathrm {P}}0\);

  2. (b)

    Subcritical regime If \(\nu <1\), then \(|\mathscr {C}_1|/n\mathop {\longrightarrow }\limits ^{\mathrm {P}}0\).

The main aim of this paper is to understand the critical regime \(\nu =1\), where also \(|\mathscr {C}_1|/n\mathop {\longrightarrow }\limits ^{\mathrm {P}}0\). In this setting, there are different universality classes depending on the vertex weights. In the Erdős-Rényi or weakly inhomogeneous universality class, critical clusters have size of order \(n^{2/3}\) and their metric space structure was discovered by Addario-Berry, Broutin and Goldschmidt [4]. Interestingly, when \({{\mathrm{\mathbb {E}}}}(W^3)<\infty \), component sizes still scale like \(n^{2/3}\) [16], and assuming finite \((6+\varepsilon )\)-moments the metric space structure of rank-one inhomogeneous random graphs is (apart from a trivial rescaling of size and time) the same [20]. However, in the strongly inhomogeneous regime where \({{\mathrm{\mathbb {E}}}}(W^3)=\infty \), the scaling limits of critical clusters are dramatically different in the sense that their sizes are given by \(n^{(\tau -2)/(\tau -1)}\), where \(\tau \) is the degree power-law exponent in (1.3) [17, 41]. In this paper, we focus on their metric space structure, obtained after rescaling edges by \(n^{-(\tau -3)/(\tau -1)}\) and taking the limit as \(n\rightarrow \infty .\) We show that this limiting metric space is compact and its Minkowski dimension equals \((\tau -2)/(\tau -3)\), whereas the Erdős-Rényi scaling limit has Minkowski dimension 2.

In this paper, we analyze the entire critical scaling window. Let \(\varvec{w}\) denote the weight sequence as in (1.2) and fix \(\lambda \in \mathbb {R}\). Now consider the weight sequence \(\varvec{w}(\lambda ):=(w_i(\lambda ))_{i\in [n]}\) defined by

$$\begin{aligned} \varvec{w}(\lambda ):= \left( 1+\frac{\lambda }{n^{(\tau -3)/(\tau -1)}}\right) \varvec{w}. \end{aligned}$$

Write \(\mathrm{NR}_n(\varvec{w}(\lambda ))\) for the corresponding random graph and let \(\mathscr {C}_i(\lambda )\) denote the corresponding ith largest component. Then this critical scaling window was first identified and studied in [41], where it was shown that for every fixed \(\lambda \in \mathbb {R}\), \(|\mathscr {C}_1(\lambda )|/n^{(\tau -2)/(\tau -1)}\) as well as \(n^{(\tau -2)/(\tau -1)}/|\mathscr {C}_1(\lambda )|\) are tight. The entire distributional asymptotics of component sizes were derived in [17] where it was shown that in the product topology on \(\mathbb {R}^{\mathbb {N}}\),

$$\begin{aligned} \left( \frac{|\mathscr {C}_i(\lambda )|}{n^{(\tau -2)/(\tau -1)}}:i\ge 1\right) \mathop {\longrightarrow }\limits ^{d}(Z_i(\lambda ):i\ge 1), \end{aligned}$$

where \((Z_i(\lambda ):i\ge 1)\) are excursions away from zero of a special stochastic process described in more detail in Sect. 1.2.

1.1.3 Our results

We make the following convention:

For any metric measure space \((\mathbb {S}, d, \mu )\) and \(a>0\), \(a\mathbb {S}\) denotes the metric measure space \((\mathbb {S}, ad, \mu )\), i.e., the space where the distance is scaled by a and the measure remains unchanged.

Consider the random graph \(\mathrm{NR}_n(\varvec{w}(\lambda ))\) and view each connected component \(\mathscr {C}\) as a connected metric space via the usual graph distance where each edge has length one. Further, we can view each connected component \(\mathscr {C}\) as a metric measure space by assigning weight \(w_i/(\sum _{j\in \mathscr {C}}w_j)\) to vertex \(i\in \mathscr {C}\). Note that the normalization yields a probability measure on each connected component. Let \(\mathscr {S}\) denote the space of (equivalence classes of) compact measured metric spaces equipped with the Gromov-Hausdorff-Prokhorov metric (see Sect. 2.1.1 for the definition). View

$$\begin{aligned} \mathbf {M}_n^{{{\mathrm{nr}}}}(\lambda ):= \big (\mathscr {C}_i(\lambda ):i\ge 1\big ) \end{aligned}$$
(1.6)

as a random element of \(\mathscr {S}^{\mathbb {N}}\).

Next recall that the lower and upper box-counting dimensions of a compact metric space \(\mathscr {M}\) are given by

$$\begin{aligned} {{\mathrm{\underline{dim}}}}(\mathscr {M}):= \liminf _{\delta \downarrow 0}\frac{\log {[\mathscr {N}(\mathscr {M},\delta )]}}{\log (1/\delta )}, \quad \text {and}\quad {{\mathrm{\overline{dim}}}}(\mathscr {M}):= \limsup _{\delta \downarrow 0}\frac{\log {[\mathscr {N}(\mathscr {M},\delta )]}}{\log (1/\delta )} \end{aligned}$$

respectively, where \(\mathscr {N}(\mathscr {M},\delta )\) is the minimal number of open balls with radius \(\delta \) required to cover \(\mathscr {M}\). Also let \(\dim _h(\mathscr {M})\) denote the Hausdorff dimension of \(\mathscr {M}\). When \({{\mathrm{\underline{dim}}}}(\mathscr {M})={{\mathrm{\overline{dim}}}}(\mathscr {M})\), the common value is called the box-counting or Minkowski dimension \(\dim (\mathscr {M})\).
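Although not used in the proofs, this dimension can be probed numerically on a large finite approximation: compute all graph distances, cover the space greedily by open \(\delta \)-balls over a range of \(\delta \), and regress \(\log \mathscr {N}\) against \(\log (1/\delta )\). A minimal sketch follows (our own illustrative code; the greedy cover only upper-bounds \(\mathscr {N}(\mathscr {M},\delta )\), so the fitted slope is a crude estimate). Applied to rescaled critical components, the slope should approach \((\tau -2)/(\tau -3)\) as n grows.

```python
import numpy as np

def cover_number(D, delta):
    """Greedy upper bound on N(M, delta) for a finite metric space with
    distance matrix D: repeatedly pick an uncovered point as a center."""
    uncovered = np.ones(D.shape[0], dtype=bool)
    count = 0
    while uncovered.any():
        c = np.flatnonzero(uncovered)[0]
        uncovered &= ~(D[c] < delta)        # its open delta-ball covers these
        count += 1
    return count

def box_dimension_estimate(D, deltas):
    """Least-squares slope of log N(delta) versus log(1/delta)."""
    x = np.log(1.0 / np.asarray(deltas))
    y = np.log([cover_number(D, d) for d in deltas])
    return np.polyfit(x, y, 1)[0]
```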

Before stating our main result, we introduce a technical condition.

Assumption 1.1

The support of the limiting distribution F (defined just before (1.2)) is given by \([\iota , \infty )\) for some \(\iota >0\). Further, F has a continuous density f on \([\iota , \infty )\) such that xf(x) is non-increasing on \([\iota , \infty )\).

Note that distributions F that are exact power laws, i.e., of the form \(F(x)=1-(\iota /x)^{\tau -1}\) for \(x>\iota \) and some \(\tau \in (3, 4)\), satisfy Assumption 1.1. The main result of this section is as follows:

Theorem 1.2

(Scaling limits with degree exponent \(\tau \in (3,4)\)) Fix \(\lambda \in \mathbb {R}\) and consider the critical Norros-Reittu model \(\mathrm{NR}_n(\varvec{w}(\lambda ))\), i.e., assume that \(\nu =1\) where \(\nu \) is as in (1.5). Assume that the limiting distribution F satisfies Assumption 1.1.

Then, there exists an appropriate limiting sequence of random compact metric measure spaces \(\mathbf {M}_\infty ^{{{\mathrm{nr}}}}(\lambda ):= (M_i^{{{\mathrm{nr}}}}(\lambda ))_{i\ge 1}\) such that the components in the critical regime satisfy

$$\begin{aligned} \frac{1}{n^{(\tau -3)/(\tau -1)}}\mathbf {M}_n^{{{\mathrm{nr}}}}(\lambda ) \mathop {\longrightarrow }\limits ^{d}\mathbf {M}_\infty ^{{{\mathrm{nr}}}}(\lambda ), \quad \text{ as } n\rightarrow \infty . \end{aligned}$$
(1.7)

Here convergence is with respect to the product topology on \(\mathscr {S}^{\mathbb {N}}\) induced by the Gromov-Hausdorff-Prokhorov metric on each coordinate \(\mathscr {S}\). For each \(i\ge 1\), the limiting metric spaces have the following properties:

  1. (a)

    \(M_i^{{{\mathrm{nr}}}}(\lambda )\) is a random compact metric measure space obtained by taking a random real tree \(\mathscr {T}_i(\lambda )\) and identifying a random (finite) number of pairs of points (thus creating shortcuts).

  2. (b)

    Call a point \(u\in \mathscr {T}_i(\lambda )\) a hub point if deleting u results in infinitely many disconnected components of \(\mathscr {T}_i(\lambda )\). Then \(\mathscr {T}_i(\lambda )\) has infinitely many hub points, which are everywhere dense on the tree \(\mathscr {T}_i(\lambda )\).

  3. (c)

    The box-counting or Minkowski dimension of \(M_i^{{{\mathrm{nr}}}}(\lambda )\) satisfies

    $$\begin{aligned} \dim (M_i^{{{\mathrm{nr}}}}(\lambda ))=\frac{\tau -2}{\tau -3} \qquad a.s. \end{aligned}$$
    (1.8)

    Consequently, the Hausdorff dimension satisfies the bound \(\dim _h(M_i^{{{\mathrm{nr}}}}(\lambda )) \le (\tau -2)/(\tau -3)\) a.s.

Conjecture 1.3

We strongly believe that both the Hausdorff dimension and the packing dimension of \(M_i^{{{\mathrm{nr}}}}(\lambda )\) equal \((\tau -2)/(\tau -3)\) a.s. See Sect. 8 for a discussion.

1.2 Connectivity asymptotics for the multiplicative coalescent

In this section we consider a slightly more general setting than in Sect. 1.1. The motivation is as follows: recall that for the rank-one model, two vertices were connected with probability essentially proportional to the product of their weights. For probabilists, this connectivity pattern is quite reminiscent of the famous multiplicative coalescent [9, 10, 15]. Whilst interesting in its own right, its fundamental importance in the context of random graphs is as follows: a wide array of random graph models can be constructed in a dynamic fashion where, as time progresses, new edges are created between pre-existing clusters. Even though the merging dynamics between connected components tend to be quite different from those specified by the multiplicative coalescent, the mergers from the barely subcritical regime through the critical scaling window can be approximated by the multiplicative coalescent. This idea was exploited in [18] to prove universality of scaling limits in the critical regime for several random graph models.

Thus components at criticality of a wide array of random graph models can be thought of as consisting of two major parts:

  1. (a)

    “Blobs” that are components formed in the barely subcritical regime.

  2. (b)

    Edges formed between such blobs as the system proceeds from the barely subcritical regime through the critical scaling window.

The results below (in particular Theorem 1.8) specify how to handle the second aspect. In a companion paper, we show how one can use macroscopic averaging of distances within blobs in random graph models such as the configuration model to show that these models have the same scaling limit in the critical regime as in Theorem 1.2, in the setting where degrees obey power laws with exponent \(\tau \in (3, 4)\). Further, it will follow from Theorem 1.8 that the convergence in (1.7) holds with respect to the product topology induced by the Gromov-weak topology on each coordinate. Therefore, Theorem 1.2 can be partially recovered from the more general Theorem 1.8, at the expense of working with a weaker topology.

Before stating the result we will need to define the multiplicative coalescent. The natural domain of this Markov process is the space

$$\begin{aligned} \ell ^2_{\downarrow }:= \Big \{\mathbf {x}= (x_1, x_2, \ldots ):x_1\ge x_2\ge \cdots \ge 0, ~ \sum _i x_i^2 <\infty \Big \}, \end{aligned}$$
(1.9)

equipped with the metric \(d(\mathbf {x}, \mathbf {y}):= \sqrt{\sum _{i\ge 1} (x_i-y_i)^2}\). We will work in the simpler setup where the Markov process starts with a finite number of clusters, i.e., the process starts with \(\mathbf {x}\in \ell ^2_{\downarrow }\) for which there exists \(n < \infty \) with \(x_i =0\) for \(i> n\). Write \(\ell ^2_{\downarrow }(n)\) for the collection of such vectors. Now the Markov process \((\mathbf {X}(t))_{t\ge 0}\) with initial state \(\mathbf {X}(0) = \mathbf {x}\) evolves as follows. Write \(\mathbf {X}(t) = (X_i(t))_{i\ge 1}\). Then for \(i\ne j\), clusters i and j merge at rate \(X_i(t)\cdot X_j(t)\) into a single cluster of size \(X_i(t)+X_j(t)\).
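These dynamics are simple to simulate directly with exponential clocks. Below is a minimal Gillespie-type sketch for a finite initial vector (our own illustrative code).

```python
import numpy as np

def multiplicative_coalescent(x, t_max, rng=None):
    """Simulate the multiplicative coalescent started from masses x up to
    time t_max; clusters i and j merge at rate x_i * x_j."""
    rng = np.random.default_rng(rng)
    masses = list(x)
    t = 0.0
    while len(masses) > 1:
        m = np.array(masses)
        total_rate = (m.sum() ** 2 - (m ** 2).sum()) / 2.0   # sum_{i<j} x_i x_j
        t += rng.exponential(1.0 / total_rate)
        if t > t_max:
            break
        # pick a pair (i, j) with probability proportional to x_i * x_j
        probs = np.outer(m, m)
        np.fill_diagonal(probs, 0.0)
        flat = probs.ravel() / probs.sum()
        k = rng.choice(len(flat), p=flat)
        i, j = divmod(k, len(m))
        masses[min(i, j)] += masses[max(i, j)]
        del masses[max(i, j)]
    return sorted(masses, reverse=True)

print(multiplicative_coalescent([1.0] * 50, t_max=0.02, rng=0))
```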

Note that for any fixed time \(t>0\), it is easy to find the distribution of masses \(\mathbf {X}(t)\) via the following random graph:

Definition 1.4

(Random graph \(\mathscr {G}_n(\mathbf {x},t)\)) Consider the vertex set \([n]:=\left\{ 1,2,\ldots , n\right\} \) and assign weight \(x_i\) to vertex i. Now connect each pair of vertices i, j with \(i\ne j\) independently with probability

$$\begin{aligned} q_{ij}:= 1-\exp (- tx_i x_j). \end{aligned}$$
(1.10)

Call this random graph \(\mathscr {G}_n(\mathbf {x},t)\). For a connected component \(\mathscr {C}\subseteq \mathscr {G}_n(\mathbf {x},t)\), let \({{\mathrm{mass}}}(\mathscr {C}):= \sum _{i\in \mathscr {C}} x_i\). Let \((\mathscr {C}_i(t))_{i\ge 1}\) denote the connected components arranged in decreasing order of their masses.
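A minimal sketch of this definition, again illustrative: sample the graph and read off the component masses in decreasing order. By Lemma 1.5 below, the output has the same distribution as the cluster masses of the multiplicative coalescent at time t started from \(\mathbf {x}\), which gives a one-shot alternative to simulating the dynamics above.

```python
import numpy as np

def gnxt_component_masses(x, t, rng=None):
    """Sample G_n(x, t): edge {i, j} present w.p. 1 - exp(-t x_i x_j), as in
    (1.10); return the masses sum_{i in C} x_i in decreasing order."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    prob = 1.0 - np.exp(-t * np.outer(x, x))
    adj = np.triu(rng.random((n, n)) < prob, k=1)
    adj = adj | adj.T
    masses, seen = [], np.zeros(n, dtype=bool)
    for s in range(n):
        if seen[s]:
            continue
        stack, mass = [s], 0.0
        seen[s] = True
        while stack:
            v = stack.pop()
            mass += x[v]
            for u in np.flatnonzero(adj[v] & ~seen):
                seen[u] = True
                stack.append(u)
        masses.append(mass)
    return sorted(masses, reverse=True)

print(gnxt_component_masses([1.0] * 50, t=0.02, rng=0))
```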

The following is obvious from the definition of the multiplicative coalescent:

Lemma 1.5

For each fixed \(t\ge 0\), the masses of the multiplicative coalescent at time t, started with a finite number of initial clusters with masses \(\mathbf {x}\), satisfy

$$\begin{aligned} (X_i(t):i\ge 1) \mathop {=}\limits ^{d} \big ({{\mathrm{mass}}}(\mathscr {C}_i(t)):i\ge 1\big ). \end{aligned}$$

Analogous to (1.9), consider the two spaces

$$\begin{aligned} \ell ^3_{\downarrow }:= \bigg \{\mathbf {c}:= (c_1, c_2,\ldots )\ :\ c_1\ge c_2\ge \cdots \ge ~ 0,~ \sum _{i\ge 1} c_i^3 < \infty \bigg \}, \quad l_0:= \ell ^3_{\downarrow }{\setminus } \ell ^2_{\downarrow }. \end{aligned}$$
(1.11)

These spaces turn out to be crucial in describing the entrance boundary of the eternal multiplicative coalescent in [10]. In the context of this paper, we are interested in studying scaling limits of connected components of the random graph \(\mathscr {G}_n(\mathbf {x},t)\) when (suitably normalized) asymptotics of the weight vector \(\mathbf {x}\) are described by a vector \(\mathbf {c}\in l_0\). Let

$$\begin{aligned} \sigma _r(\mathbf {x}):= \sum _i x_i^r, \quad 1\le r\le 3. \end{aligned}$$

We will make the following assumptions about the weight vector \(\mathbf {x}:=\mathbf {x}(n)\) used to form the graph \(\mathscr {G}_n(\mathbf {x},t)\). These place the associated graph in a particular entrance boundary of the associated eternal multiplicative coalescent [10, Proposition 7].

Assumption 1.6

For each \(n\ge 1\), let \(\mathbf {x}^{(n)} = (x_i^{(n)}: 1\le i\le n)\) be an initial finite-length vector belonging to \(\ell ^2_{\downarrow }(n)\). Suppose that as \(n\rightarrow \infty \) there exists \(\mathbf {c}\in l_0\) such that

$$\begin{aligned} \frac{\sigma _3(\mathbf {x}^{(n)})}{(\sigma _2(\mathbf {x}^{(n)}))^3}&\rightarrow \sum _j c_j^3, \end{aligned}$$
(1.12)
$$\begin{aligned} \frac{x_j^{(n)}}{\sigma _2(\mathbf {x}^{(n)})}&\rightarrow c_j\ \text { for } j\ge 1,\ \text { and} \end{aligned}$$
(1.13)
$$\begin{aligned} \sigma _2(\mathbf {x}^{(n)})&\rightarrow 0. \end{aligned}$$
(1.14)

Now let \(\left\{ \xi _j:j\ge 1\right\} \) be a sequence of independent exponential random variables where \(\xi _j\) has rate \(c_j\) for each \(j\ge 1\). For a fixed \(\lambda \in \mathbb {R}\), consider the process

$$\begin{aligned} V^{\mathbf {c}}_\lambda (s):= \lambda s + \sum _j (c_j \mathbbm {1}\left\{ \xi _j \le s\right\} - c_j^2 s), \quad s\ge 0. \end{aligned}$$
(1.15)

It turns out that this process is well defined precisely if \(\mathbf {c}\in \ell ^3_{\downarrow }\) [10]. Consider the “reflected at zero” process

$$\begin{aligned} \tilde{V}^{\mathbf {c}}_\lambda (s):= V^{\mathbf {c}}_\lambda (s) - \min _{0\le s^\prime \le s} V^{\mathbf {c}}_\lambda (s^\prime ), \end{aligned}$$
(1.16)

and the excursions of \(\tilde{V}^{\mathbf {c}}_\lambda (\cdot )\) from zero. Then Aldous and Limic [10] showed that the lengths of these excursions are a.s. in \(l^2\) precisely when \(\mathbf {c}\in l_0\), and thus can be arranged in decreasing order. Write

$$\begin{aligned} \mathscr {Z}(\lambda ):= (\mathscr {Z}_i(\lambda ):i\ge 1) \end{aligned}$$
(1.17)

for these excursions in decreasing order of their length. Let \(Z_i(\lambda ):= |\mathscr {Z}_i(\lambda )|\) denote the length of the ith largest excursion and let

$$\begin{aligned} \mathbf {Z}(\lambda ):= (Z_i(\lambda ):i\ge 1) \in \ell ^2_{\downarrow }\qquad \text{ a.s. } \end{aligned}$$
(1.18)

Then Aldous and Limic [10] proved the following result:

Theorem 1.7

([10, Proposition 7]) Fix \(\lambda \in \mathbb {R}\) and consider the time scale \(t_n:= \lambda + [\sigma _2(\mathbf {x}^{(n)})]^{-1}\). Under Assumptions (1.12), (1.13), (1.14), the masses of the connected components of the graph \(\mathscr {G}_n(\mathbf {x},t_n)\) satisfy

$$\begin{aligned} \left( {{\mathrm{mass}}}\left[ \mathscr {C}_i\left( \lambda + \frac{1}{\sigma _2(\mathbf {x}^{(n)})}\right) \right] :i\ge 1\right) \mathop {\longrightarrow }\limits ^{d}\mathbf {Z}(\lambda ), \quad \text{ as } n\rightarrow \infty , \end{aligned}$$

with respect to the topology in \(\ell ^2_{\downarrow }\), where \(\mathbf {Z}(\lambda )\) is as in (1.18).
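Both the process (1.15)–(1.16) and the excursion lengths appearing in Theorem 1.7 are easy to approximate on a grid for a truncated vector. A minimal sketch follows (illustrative only; note that for genuine \(\mathbf {c}\in l_0\) one has \(\sum _j c_j^2=\infty \), so truncating the sum is a crude approximation of the limit process).

```python
import numpy as np

def reflected_process(lam, c, s_max, n_grid=100_000, rng=None):
    """Approximate V^c_lambda of (1.15) on a grid for a finite vector c,
    then reflect it at zero as in (1.16)."""
    rng = np.random.default_rng(rng)
    s = np.linspace(0.0, s_max, n_grid)
    xi = rng.exponential(1.0 / c)                 # xi_j has rate c_j
    order = np.argsort(xi)
    cum = np.concatenate([[0.0], np.cumsum(c[order])])
    jumps = cum[np.searchsorted(xi[order], s, side="right")]
    v = (lam - (c ** 2).sum()) * s + jumps        # lam*s + sum_j (c_j 1{xi_j<=s} - c_j^2 s)
    return s, v - np.minimum.accumulate(v)

def excursion_lengths(s, v, eps=1e-12):
    """Lengths of maximal intervals on which the reflected process is positive."""
    lengths, start = [], None
    for sk, vk in zip(s, v):
        if vk > eps and start is None:
            start = sk
        elif vk <= eps and start is not None:
            lengths.append(sk - start)
            start = None
    if start is not None:
        lengths.append(s[-1] - start)
    return sorted(lengths, reverse=True)

c = 0.75 / np.arange(1, 5001) ** (1 / 2.5)        # c_i = alpha * i^{-1/(tau-1)}, tau = 3.5
s, vr = reflected_process(lam=0.0, c=c, s_max=2.0, rng=0)
print(excursion_lengths(s, vr)[:5])               # approximations to Z_1(lambda), Z_2(lambda), ...
```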

Now consider the connected components in \(\mathscr {G}_n(\mathbf {x},t)\), and as before, view each component \(\mathscr {C}\) as a connected metric space via the usual graph distance where each edge has length one. Further, view each component \(\mathscr {C}\) as a measured metric space by assigning mass \(x_i/{{\mathrm{mass}}}(\mathscr {C})\) to each vertex \(i\in \mathscr {C}\). Let \(\mathscr {S}_{*}\) denote the space of (equivalence classes of) measured metric spaces equipped with the Gromov-weak topology (see Sect. 2.1.2 for the definition) and view

$$\begin{aligned} \mathbf {M}_n(\lambda ):= \left( \mathscr {C}_i\bigg (\lambda + \frac{1}{\sigma _2(\mathbf {x}^{(n)})}\bigg ):i\ge 1\right) \end{aligned}$$

as a random element in \(\mathscr {S}_{*}^{\mathbb {N}}\). Then our next result is about Gromov-weak convergence of \(\mathbf {M}_n(\lambda )\).

Theorem 1.8

Fix \(\lambda \in \mathbb {R}\). Then under Assumption 1.6, there exists an appropriate limiting sequence of metric spaces \(\mathbf {M}_{\infty }^{\mathbf {c}}(\lambda ):= (M_i^{\mathbf {c}}(\lambda ):i\ge 1)\) such that

$$\begin{aligned} \sigma _2(\mathbf {x}^{(n)})\mathbf {M}_n(\lambda ) \mathop {\longrightarrow }\limits ^{d}\mathbf {M}_\infty ^{\mathbf {c}}(\lambda ), \quad \text{ as } n\rightarrow \infty . \end{aligned}$$

Here weak convergence is on \(\mathscr {S}_{*}^{\mathbb {N}}\) which is equipped with the natural product topology induced by the Gromov-weak topology on each coordinate \(\mathscr {S}_{*}\).

Remark 1

A full description of the limit objects is given in Sect. 2.3. The limit objects use tilted versions of inhomogeneous continuum random trees, and checking compactness even of the original (untilted) versions at this level of generality turns out to be quite intractable. However, as the next theorem shows, in the special case of relevance to the rank-one model, one can prove much more.

Consider the special sequence \(\mathbf {c}= \mathbf {c}(\alpha ,\tau ):=(c_i(\alpha ,\tau ):\ i\ge 1)\in l_0\) with \(\tau \in (3,4)\) and \(\alpha > 0\), where

$$\begin{aligned} c_i(\alpha ,\tau ):= \frac{\alpha }{i^{1/(\tau -1)}}, \quad i\ge 1. \end{aligned}$$
(1.19)

Then we have the following result about the limiting metric spaces:

Theorem 1.9

Fix \(\alpha >0\), \(\tau \in (3,4)\) and let \(\mathbf {c}= \mathbf {c}(\alpha ,\tau )\) as in (1.19). Consider the limiting metric spaces \(\mathbf {M}_\infty ^{\mathbf {c}}(\lambda ):=(M_i^{\mathbf {c}}(\lambda ):i\ge 1)\).

Then almost surely \(M_i^{\mathbf {c}}(\lambda )\) is compact for every \(i\ge 1\). Further, the Minkowski dimension of \(M_i^{\mathbf {c}}(\lambda )\) satisfies

$$\begin{aligned} \dim (M_i^{\mathbf {c}}(\lambda )) = \frac{\tau -2}{\tau -3} \quad a.s. \end{aligned}$$
(1.20)

Consequently, the Hausdorff dimension satisfies the bound \(\dim _h(M_i^{\mathbf {c}}(\lambda )) \le (\tau -2)/(\tau -3)\) a.s.

Remark 2

Since we are dealing with equivalence classes of metric spaces (see Sects. 2.1.1 and 2.1.2), Theorem 1.9 should be understood as claiming the existence of representative spaces \(M_i^{\mathbf {c}}(\lambda )\) that are compact, and satisfy the said conditions about the fractal dimensions. We will only work with these representative spaces throughout this paper.

2 Definitions and limit objects

2.1 Convergence of metric spaces

Proper notions of convergence of (measured) metric spaces are among the central themes in this paper. Here we define the two topologies used in the statement of our results. We mainly follow [1, 26, 38, 39].

2.1.1 Gromov-Hausdorff-Prokhorov metric

In this section, all metric spaces under consideration will be compact metric spaces with associated probability measures. Let us first recall the Gromov-Hausdorff distance \(d_{{{\mathrm{GH}}}}\) between metric spaces. Fix two metric spaces \((X_1,d_1)\) and \((X_2, d_2)\). For a subset \(C\subseteq X_1 \times X_2\), the distortion of C is defined as

$$\begin{aligned} {{\mathrm{dis}}}(C):= \sup \left\{ |d_1(x_1,y_1) - d_2(x_2, y_2)|: (x_1,x_2) , (y_1,y_2) \in C\right\} . \end{aligned}$$

A correspondence C between \(X_1\) and \(X_2\) is a measurable subset of \(X_1 \times X_2\) such that for every \(x_1 \in X_1\) there exists at least one \(x_2 \in X_2\) such that \((x_1,x_2) \in C\) and vice-versa. The Gromov-Hausdorff distance between the two metric spaces \((X_1,d_1)\) and \((X_2, d_2)\) is defined as

$$\begin{aligned} d_{{{\mathrm{GH}}}}(X_1, X_2) = \frac{1}{2}\inf \left\{ {{\mathrm{dis}}}(C): C \text{ is } \text{ a } \text{ correspondence } \text{ between } X_1 \text{ and } X_2\right\} . \end{aligned}$$
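For intuition, \(d_{{{\mathrm{GH}}}}\) between two tiny finite metric spaces can be computed by brute force over all correspondences. The sketch below (our own illustrative code) enumerates all subsets of \(X_1\times X_2\), so it is exponential in the number of pairs and only usable for a handful of points.

```python
import itertools
import numpy as np

def gromov_hausdorff(D1, D2):
    """Brute-force d_GH between finite metric spaces given by distance
    matrices D1 (n1 x n1) and D2 (n2 x n2). Exponential: toy sizes only."""
    n1, n2 = D1.shape[0], D2.shape[0]
    pairs = list(itertools.product(range(n1), range(n2)))
    best = np.inf
    # enumerate all subsets of X1 x X2 and keep those that are correspondences
    for mask in range(1, 1 << len(pairs)):
        C = [pairs[k] for k in range(len(pairs)) if mask >> k & 1]
        if {p for p, _ in C} != set(range(n1)) or {q for _, q in C} != set(range(n2)):
            continue
        dis = max(abs(D1[x1, y1] - D2[x2, y2]) for (x1, x2) in C for (y1, y2) in C)
        best = min(best, dis)
    return best / 2.0

# two 3-point spaces: a path of total length 2 vs an equilateral triangle of side 1
D1 = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
D2 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
print(gromov_hausdorff(D1, D2))  # 0.5
```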

Suppose \((X_1, d_1)\) and \((X_2, d_2)\) are two metric spaces with \(p_1\in X_1\) and \(p_2\in X_2\). Then the pointed Gromov-Hausdorff distance between \(\varvec{X}_1:=(X_1, d_1, p_1)\) and \(\varvec{X}_2:=(X_2, d_2, p_2)\) is given by

$$\begin{aligned}&d_{{{\mathrm{GH}}}}^{{{\mathrm{pt}}}}(\varvec{X}_1, \varvec{X}_2)\nonumber \\&= \frac{1}{2}\inf \left\{ {{\mathrm{dis}}}(C): C \text{ is } \text{ a } \text{ correspondence } \text{ between } X_1 \text{ and } X_2 \text{ and } (p_1, p_2)\in C\right\} . \end{aligned}$$
(2.1)

We will need a metric that also keeps track of associated measures on the corresponding spaces. A compact measured metric space \((X, d, \mu )\) is a compact metric space \((X, d)\) with an associated probability measure \(\mu \) on the Borel sigma algebra \(\mathscr {B}(X)\). Given two compact measured metric spaces \((X_1, d_1, \mu _1)\) and \((X_2,d_2, \mu _2)\) and a measure \(\pi \) on the product space \(X_1\times X_2\), the discrepancy of \(\pi \) with respect to \(\mu _1\) and \(\mu _2\) is defined as

$$\begin{aligned} D(\pi ;\mu _1, \mu _2):= ||\mu _1-\pi _1|| + ||\mu _2-\pi _2||, \end{aligned}$$

where \(\pi _1, \pi _2\) are the marginals of \(\pi \) and \(||\cdot ||\) denotes the total variation distance between probability measures. Then the Gromov-Hausdorff-Prokhorov distance between \(X_1\) and \(X_2\) is defined as

$$\begin{aligned} d_{{{\mathrm{GHP}}}}(X_1, X_2):= \inf \left\{ \max \left( \frac{1}{2} {{\mathrm{dis}}}(C),~D(\pi ;\mu _1,\mu _2),~\pi (C^c)\right) \right\} , \end{aligned}$$
(2.2)

where the infimum is taken over all correspondences C and measures \(\pi \) on \(X_1 \times X_2\).

Similar to (2.1), we can define a “pointed Gromov-Hausdorff-Prokhorov distance”, \(d_{{{\mathrm{GHP}}}}^{{{\mathrm{pt}}}}\) between two metric measure spaces \(X_1\) and \(X_2\) having two distinguished points \(p_1\) and \(p_2\) respectively by taking the infimum in (2.2) over all correspondences C and measures \(\pi \) on \(X_1 \times X_2\) such that \((p_1, p_2)\in C\).

Write \(\mathscr {S}\) for the collection of all measured compact metric spaces \((X,d,\mu )\). The function \(d_{{{\mathrm{GHP}}}}\) is a pseudometric on \(\mathscr {S}\), and defines an equivalence relation \(X \sim Y \Leftrightarrow d_{{{\mathrm{GHP}}}}(X,Y) = 0\) on \(\mathscr {S}\). Let \({\bar{\mathscr {S}}} := \mathscr {S}/ \sim \) be the space of isometry equivalence classes of measured compact metric spaces and \({\bar{d}}_{{{\mathrm{GHP}}}}\) the induced metric. Then by [1], \(({\bar{\mathscr {S}}}, {\bar{d}}_{{{\mathrm{GHP}}}})\) is a complete separable metric space. To ease notation, we will continue to use \((\mathscr {S}, d_{{{\mathrm{GHP}}}})\) instead of \(({\bar{\mathscr {S}}}, {\bar{d}}_{{{\mathrm{GHP}}}})\) and \(X = (X, d, \mu )\) to denote both the metric space and the corresponding equivalence class.

2.1.2 Gromov-weak topology

Here we mainly follow [38]. Introduce an equivalence relation on the space of complete and separable metric spaces that are equipped with a probability measure on the associated Borel \(\sigma \)-algebra by declaring two such spaces \((X_1, d_1, \mu _1)\) and \((X_2, d_2, \mu _2)\) to be equivalent when there exists an isometry \(\psi :\mathrm {support}(\mu _1)\rightarrow \mathrm {support}(\mu _2)\) such that \(\mu _2=\psi _{*}\mu _1:=\mu _1\circ \psi ^{-1}\), i.e., the push-forward of \(\mu _1\) under \(\psi \) is \(\mu _2\). Write \(\mathscr {S}_{*}\) for the associated space of equivalence classes. As before, we will often ease notation by not distinguishing between a metric space and its equivalence class.

Fix \(m\ge 2\) and a complete separable metric space \((X, d)\). Then given a collection of points \(\mathbf {x}:=(x_1, x_2, \ldots , x_m)\in X^m\), let \(\mathbf {D}(\mathbf {x}):= (d(x_i, x_j))_{i,j\in [m]}\) denote the symmetric matrix of pairwise distances between the collection of points. A function \(\Phi :\mathscr {S}_* \rightarrow \mathbb {R}\) is called a polynomial of degree m if there exists a bounded continuous function \(\phi :\mathbb {R}_+^{m^2}\rightarrow \mathbb {R}\) such that

$$\begin{aligned} \Phi ((X,d,\mu )):= \int \phi (\mathbf {D}(\mathbf {x})) \mu ^{\otimes m}(d(\mathbf {x})). \end{aligned}$$
(2.3)

Here \(\mu ^{\otimes m}\) is the m-fold product measure of \(\mu \). Let \(\varvec{\Pi }\) denote the space of all polynomials on \(\mathscr {S}_*\).

Definition 2.1

(Gromov-weak topology) A sequence \((X_n, d_n, \mu _n)_{n\ge 1} \in \mathscr {S}_*\) is said to converge to \((X, d, \mu ) \in \mathscr {S}_*\) in the Gromov-weak topology if and only if \(\Phi ((X_n, d_n, \mu _n))\rightarrow \Phi ((X, d, \mu ))\) for all \(\Phi \in \varvec{\Pi }\).

In [38, Theorem 1] it is shown that \(\mathscr {S}_*\) is a Polish space under the Gromov-weak topology. It is also shown that, in fact, this topology can be completely metrized using the so-called Gromov-Prokhorov metric.
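In practice, Gromov-weak convergence can be probed in simulation, since a polynomial \(\Phi \) is just an expectation of \(\phi \) applied to the distance matrix of m i.i.d. samples from \(\mu \), and hence admits a Monte Carlo estimate. A minimal sketch for a finite metric measure space, with an illustrative choice of \(\phi \) of our own:

```python
import numpy as np

def polynomial_estimate(D, mu, phi, m, n_samples=10_000, rng=None):
    """Monte Carlo estimate of Phi((X, d, mu)) = int phi(D(x)) mu^{tensor m}(dx)
    for a finite metric measure space: D is the distance matrix and mu the
    probability weights on its points."""
    rng = np.random.default_rng(rng)
    vals = np.empty(n_samples)
    for k in range(n_samples):
        idx = rng.choice(len(mu), size=m, p=mu)   # m i.i.d. mu-samples
        vals[k] = phi(D[np.ix_(idx, idx)])        # pairwise distance matrix D(x)
    return vals.mean()

# example phi: a bounded continuous function of the m x m distance matrix
phi = lambda M: np.exp(-M.sum())
D = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
print(polynomial_estimate(D, np.ones(3) / 3, phi, m=2, rng=0))
```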

2.1.3 Spaces of trees with edge lengths, leaf weights and root-to-leaf measures

In the proof of the main results we need the following two spaces built on top of the space of discrete trees. The first space \(\mathbf {T}_{IJ}\) was formulated in [12, 13] where it was used to study trees spanning a finite number of random points sampled from an inhomogeneous continuum random tree (as described in the next section). We use the same notation in this paper.

The space \(\mathbf {T}_{IJ}\): Fix \(I\ge 0\) and \(J\ge 1\). Let \(\mathbf {T}_{IJ}\) be the space of trees having the following properties:

  1. (a)

    There are exactly J leaves labeled \(1+, \ldots , J+\), and the tree is rooted at another labeled vertex \(0+\).

  2. (b)

    There may be extra labeled vertices (called hubs) with distinct labels in \(\left\{ 1,2,\ldots , I\right\} \). (It is possible that only some, and not all, labels in \(\left\{ 1,2,\ldots , I\right\} \) are used.)

  3. (c)

    Every edge e has a strictly positive edge length \(l_e\).

A tree \(\mathbf {t}\in \mathbf {T}_{IJ}\) can be viewed as being composed of two parts:

  1. (1)

    \({{\mathrm{shape}}}(\mathbf {t})\) describing the shape of the tree (including the labels of leaves and hubs) but ignoring edge lengths. The set of all possible shapes \(\mathbf {T}_{IJ}^{{{\mathrm{shape}}}}\) is obviously finite for fixed I and J.

  2. (2)

    The edge lengths \(\mathbf {l}(\mathbf {t}):= (l_e:e\in \mathbf {t})\). Consider the product topology on \(\mathbf {T}_{IJ}\) consisting of the discrete topology on \(\mathbf {T}_{IJ}^{{{\mathrm{shape}}}}\) and the product topology on \(\mathbb {R}^m\) where m is the number of edges of \(\mathbf {t}\).

The space \(\mathbf {T}_{IJ}^*\): We will need a slightly more general space. Along with the three attributes above in \(\mathbf {T}_{IJ}\), the trees in this space have the following two additional properties. Let \(\mathscr {L}(\mathbf {t}):= \left\{ 1+, \ldots , J+\right\} \) denote the collection of non-root leaves in \(\mathbf {t}\). Then every leaf \(v\in \mathscr {L}(\mathbf {t}) \) has the following attributes:

  1. (d)

    Leaf weights A strictly positive number A(v). Write \(\mathbf {A}(\mathbf {t}):=(A(v): v\in \mathscr {L}(\mathbf {t}))\).

  2. (e)

    Root-to-leaf measures A probability measure \(\nu _{\mathbf {t},v}\) on the path \([0+,v]\) connecting the root and the leaf v. Here the path is viewed as a line segment pointed at \(0+\) and has the usual Euclidean topology. Write \(\varvec{\nu }(\mathbf {t}):= (\nu _{\mathbf {t},v}: v\in \mathscr {L}(\mathbf {t}))\) for this collection of probability measures.

In addition to the topology on \(\mathbf {T}_{IJ}\), the space \(\mathbf {T}_{IJ}^*\) with these two additional attributes inherits the product topology on \(\mathbb {R}^{J}\) from the leaf weights and \((d_{{{\mathrm{GHP}}}}^{{{\mathrm{pt}}}})^J\) from the root-to-leaf measures.

For consistency, we add to the spaces \(\mathbf {T}_{IJ}\) and \(\mathbf {T}_{IJ}^*\) a conventional state \(\partial \). Its use will be clear later on.

2.2 Random \(\mathbf {p}\)-trees and inhomogeneous continuum random trees (ICRTs)

For fixed \(m \ge 1\), write \(\mathbb {T}_m\) and \(\mathbb {T}_m^{{{\mathrm{ord}}}}\) for the collection of all rooted trees with vertex set [m] and rooted ordered trees with vertex set [m] respectively. Here we will view a rooted tree as being directed with the root being the original progenitor and each edge being directed from child to parent. An ordered rooted tree is a tree where the children of each individual are assigned an order (describing, for example, orientation in a planar embedding, say right to left, or some notion of age, say oldest to youngest).

In this section, we define a family of random tree models called \(\mathbf {p}\)-trees [27, 59], and their corresponding limits, the so-called inhomogeneous continuum random trees, which play a key role in describing the limit metric spaces as well as in the proof. Fix \(m \ge 1\), and a probability mass function \(\mathbf {p}= (p_1, p_2,\ldots , p_m)\) with \(p_i > 0\) for all \(i\in [m]\). A \(\mathbf {p}\)-tree is a random tree in \(\mathbb {T}_m\), with law as follows. For any fixed \(\mathbf {t}\in \mathbb {T}_m\) and \(v\in \mathbf {t}\), write \(d_v(\mathbf {t})\) for the number of children of v in the tree \(\mathbf {t}\). Then the law of the \(\mathbf {p}\)-tree, denoted by \({{\mathrm{\mathbb {P}}}}_{\text {tree}}\), is defined as:

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\text {tree}}(\mathbf {t}) = {{\mathrm{\mathbb {P}}}}_{\text {tree}}(\mathbf {t}; \mathbf {p}) = \prod _{v\in [m]} p_v^{d_v(\mathbf {t})}, \quad \mathbf {t}\in \mathbb {T}_m. \end{aligned}$$
(2.4)

Generating a random \(\mathbf {p}\)-tree \(\mathscr {T}\sim {{\mathrm{\mathbb {P}}}}_{\text {tree}}\) and then assigning a uniform random order on the children of every vertex \(v\in \mathscr {T}\) gives a random element with law \({{\mathrm{\mathbb {P}}}}_{{{\mathrm{ord}}}}(\cdot ; \mathbf {p})\) given by

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{{{\mathrm{ord}}}}(\mathbf {t}) = {{\mathrm{\mathbb {P}}}}_{{{\mathrm{ord}}}}(\mathbf {t}; \mathbf {p}) = \prod _{v\in [m]} \frac{p_v^{d_v(\mathbf {t})}}{(d_v(\mathbf {t})) !}, \quad \mathbf {t}\in \mathbb {T}_m^{{{\mathrm{ord}}}}. \end{aligned}$$
(2.5)

Obviously a \(\mathbf {p}\)-tree can be constructed by first generating an ordered \(\mathbf {p}\)-tree with the above distribution and then forgetting about the order.
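\(\mathbf {p}\)-trees are also easy to sample. One convenient route, sketched below, is the birthday-tree construction associated with [27]: scan an i.i.d. sequence with distribution \(\mathbf {p}\), root the tree at the first value, and attach each value, on its first appearance, to the value immediately preceding it in the sequence; to the best of our understanding the resulting random tree has law (2.4), but the code below should be treated as our own illustrative version.

```python
import numpy as np

def sample_p_tree(p, rng=None):
    """Sample a rooted tree on {0, ..., m-1} via the birthday-tree
    construction: scan an i.i.d.(p) sequence; the root is the first value,
    and each value is attached, on first appearance, to the previous value."""
    rng = np.random.default_rng(rng)
    m = len(p)
    parent = {}
    prev = rng.choice(m, p=p)       # root = first sample
    root = prev
    while len(parent) < m - 1:
        v = rng.choice(m, p=p)
        if v != root and v not in parent:
            parent[v] = prev        # first appearance: edge child -> parent
        prev = v
    return root, parent             # the parent map encodes the tree

root, parent = sample_p_tree(np.array([0.4, 0.3, 0.2, 0.1]), rng=0)
print(root, parent)
```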

In a series of papers [11,12,13] it was shown that \(\mathbf {p}\)-trees, under various assumptions, converge to inhomogeneous continuum random trees that we now describe. Recall the space \(\ell ^2_{\downarrow }\) in (1.9). Consider the subset \(\Theta \subset \ell ^2_{\downarrow }\) given by

$$\begin{aligned} \Theta := \bigg \{\varvec{\theta }:=(\theta _i:i\ge 1)\in \ell ^2_{\downarrow }: \sum _{i=1}^{\infty } \theta _i =\infty ,~ \sum _{i=1}^\infty \theta _i^2 = 1\bigg \}. \end{aligned}$$
(2.6)

Now recall from [37, 51] that a real tree is a metric space \((\mathscr {T},d)\) that satisfies the following for every pair \(a,b\in \mathscr {T}\):

  1. (a)

    There is a unique isometric map \(f_{a,b}:[0,d(a,b)]\rightarrow \mathscr {T}\) such that \(f_{a,b}(0)=a,~ f_{a,b}(d(a,b)) =b\).

  2. (b)

    For any continuous one-to-one map \(g:[0,1]\rightarrow \mathscr {T}\) with \(g(0)=a\) and \(g(1)=b\), we have \(g([0,1]) = f_{a,b}([0,d(a,b)])\).

Construction of the ICRT Given \(\varvec{\theta }\in \Theta \), we will now define the inhomogeneous continuum random tree \(\mathscr {T}^{\varvec{\theta }}_{(\infty )}\). We mainly follow the notation in [13]. Assume that we are working on a probability space \((\Omega , \mathscr {F},{{\mathrm{\mathbb {P}}}}_{\varvec{\theta }})\) rich enough to support the following:

  1. (a)

    For each \(i\ge 1\), let \(\mathscr {P}_i:= (\xi _{i,1}, \xi _{i,2}, \ldots )\) be a rate \(\theta _i\) Poisson process, independent for different i. The first point \(\xi _{i,1}\) of each process is special and is called a joinpoint, whilst the remaining points \(\xi _{i,j}\) with \(j\ge 2\) will be called i-cutpoints [13].

  2. (b)

    Independent of the above, let \(\varvec{U}=(U_j^{(i)}:j\ge 1,\ i\ge 1)\) be a collection of i.i.d. uniform (0, 1) random variables. These are not required to construct the tree but will be used to define a certain function on the tree.

The random real tree (with marked vertices) \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) is then constructed as follows:

  1. (i)

    Arrange the cutpoints \(\left\{ \xi _{i,j}:i\ge 1, j\ge 2\right\} \) in increasing order as \(0< \eta _1< \eta _2 < \cdots \). The assumption that \(\sum _i \theta _i^2 <\infty \) implies that this is possible. For every cutpoint \(\eta _k=\xi _{i,j}\), let \(\eta _k^*:=\xi _{i,1}\) be the corresponding joinpoint.

  2. (ii)

    Next, build the tree inductively. Start with the branch \([0,\eta _1]\). Inductively assuming we have completed step k, attach the branch \((\eta _k, \eta _{k+1}]\) to the joinpoint \(\eta _k^*\) corresponding to \(\eta _k\).

Write \(\mathscr {T}_0^{\varvec{\theta }}\) for the corresponding tree after one has used up all the branches \([0,\eta _1], \left\{ (\eta _k, \eta _{k+1}]: k\ge 1\right\} \). Note that for every \(i\ge 1\), the joinpoint \(\xi _{i,1}\) corresponds to a vertex with infinite degree. Label this vertex i. The ICRT \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) is the completion of the marked metric tree \(\mathscr {T}^{\varvec{\theta }}_0\). As argued in [13, Section 2], this is a real tree as defined above, which can be viewed as rooted at the vertex corresponding to zero. We call the vertex corresponding to joinpoint \(\xi _{i,1}\) hub i. Since \(\sum _i \theta _i = \infty \), one can check that hubs are almost everywhere dense on \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\).
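The construction above is directly implementable for a truncated \(\varvec{\theta }\). The sketch below (illustrative only; the truncation and the finite horizon are crude approximations of the completion) generates the Poisson processes, performs the stick-breaking, and computes the distance from a stick position to the root by tracing attachment points; positions are sampled uniformly along the sticks, not from the mass measure.

```python
import numpy as np

def icrt_sticks(theta, horizon, rng=None):
    """Truncated stick-breaking data: sorted cutpoints eta_1 < eta_2 < ...
    and, for each, the joinpoint position at which its branch is attached."""
    rng = np.random.default_rng(rng)
    cuts, attach = [], []
    for rate in theta:
        t, pts = 0.0, []
        while True:
            t += rng.exponential(1.0 / rate)
            if t > horizon:
                break
            pts.append(t)
        if len(pts) >= 2:                  # first point is the joinpoint
            join = pts[0]
            for c in pts[1:]:              # remaining points are cutpoints
                cuts.append(c)
                attach.append(join)
    order = np.argsort(cuts)
    return np.array(cuts)[order], np.array(attach)[order]

def depth(s, cuts, attach):
    """Distance from stick position s to the root at position 0."""
    d = 0.0
    while True:
        k = np.searchsorted(cuts, s)       # s lies on branch k
        if k == 0:
            return d + s                   # on the root branch [0, eta_1]
        d += s - cuts[k - 1]               # walk down to the branch's base...
        s = attach[k - 1]                  # ...and jump to its joinpoint

theta = 1.0 / np.arange(1, 200) ** (1 / 2.5)
theta /= np.sqrt((theta ** 2).sum())       # enforce sum theta_i^2 = 1
cuts, attach = icrt_sticks(theta, horizon=3.0, rng=0)
samples = np.random.default_rng(1).uniform(0, cuts[-1], size=5)
print([round(depth(s, cuts, attach), 3) for s in samples])
```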

Remark 3

The uniform random variables \((U_j^{(i)}:j\ge 1,\ i\ge 1)\) give rise to a natural ordering on \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) (or a planar embedding of \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\)) as follows. For \(i\ge 1\), let \((\mathscr {T}_j^{(i)}:j\ge 1)\) be the collection of subtrees hanging off of the ith hub. Associate \(U_j^{(i)}\) with the subtree \(\mathscr {T}_j^{(i)}\), and think of \(\mathscr {T}_{j_1}^{(i)}\) appearing “to the right of” \(\mathscr {T}_{j_2}^{(i)}\) if \(U_{j_1}^{(i)}< U_{j_2}^{(i)}\). This is the natural ordering on \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) when it is being viewed as a limit of ordered \(\mathbf {p}\)-trees. We can think of the pair \((\mathscr {T}_{(\infty )}^{\varvec{\theta }}, \varvec{U})\) as the ordered ICRT.

Reduced tree \(r_{IJ}^{(\infty )}\): Fix \(I\ge 0\) and \(J\ge 1\). Now let \(\eta _0 = 0\) and for \(j\ge 0\) call vertex \(\eta _j\) the jth sampled leaf and label it \(j+\) to differentiate it from hub j. Note that the subtree of \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) spanned by \(\left\{ 0+,1+, \ldots , J+\right\} \) (namely the part of the tree constructed from the interval \([0,\eta _J]\)) is a tree in the usual sense with random edge lengths. For all hubs i, if \(i\le I\), retain its label and remove the label otherwise. This gives a random element of \(\mathbf {T}_{IJ}\) (recall the definition in Sect. 2.1.3), which we denote by \(r_{IJ}^{(\infty )}\). See Fig. 3 corresponding to the stick-breaking construction in Figs. 1 and 2.

Fig. 1

An illustration of the ICRT construction with four point processes \(\left\{ \mathscr {P}_i:1\le i\le 4\right\} \). The red points represent the joinpoints of the corresponding point processes and the blue points the corresponding cutpoints. The last line contains the union of the four point processes. See Fig. 2 for the corresponding tree.

Mass measure For every vertex \(v\in \mathscr {T}_{(\infty )}^{\varvec{\theta }}\), define the degree of v to be the number of connected components of \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}{{\setminus }}\left\{ v\right\} \). Vertices with degree one are called leaves of \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) and all other vertices form the skeleton of the tree. Let \(\mathscr {L}(\mathscr {T}_{(\infty )}^{\varvec{\theta }})\) denote the set of leaves of \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\). In [13], it was shown that one can associate to \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) a natural probability measure \(\mu \), called the mass measure, satisfying \(\mu (\mathscr {L}(\mathscr {T}_{(\infty )}^{\varvec{\theta }}))=1\).

Root-to-vertex path measures Now using the collection of uniform random variables above, we will define a function \(\mathfrak {G}_{(\infty )}\) on the tree as well as a collection of measures on paths emanating from the root. Recall that the hubs in \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) have infinite degrees. Let \((\mathscr {T}_j^{(i)}:j\ge 1)\) be the collection of subtrees of hub i in \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) (labeled in some fashion). For each \(y\in \mathscr {T}_{(\infty )}^{\varvec{\theta }}\), let

$$\begin{aligned} \mathfrak {G}_{(\infty )}(y)=\sum _{i\ge 1}\theta _{i}\left[ \sum _{j\ge 1}U_j^{(i)}\times \mathbbm {1}\left\{ y\in \mathscr {T}_j^{(i)}\right\} \right] . \end{aligned}$$
(2.7)

We will show in our proof that \(\mathfrak {G}_{(\infty )}(y)\) is finite for almost every realization of \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) and for \(\mu \)-almost every \(y\in \mathscr {T}_{(\infty )}^{\varvec{\theta }}\) (see Lemma 4.9 and Theorem 4.15 below). For \(y\in \mathscr {T}_{(\infty )}^{\varvec{\theta }}\), let \([\rho ,y]\) denote the path from the root \(\rho \) to y. For every y, define a probability measure on \([\rho ,y]\) as

$$\begin{aligned} Q_{y}^{(\infty )}(v):= \frac{\theta _i U_{j}^{(i)}}{ \mathfrak {G}_{(\infty )}(y)}, \quad \text{ if } v \text{ is } \text{ the } i\text{ th } \text{ hub } \text{ and } y\in \mathscr {T}_j^{(i)} \text{ for } \text{ some } j. \end{aligned}$$
(2.8)
Fig. 2

The tree constructed via the stick-breaking construction from Fig. 1

Fig. 3

Reduced tree \(r_{47}^{(\infty )}\) corresponding to the tree in Fig. 2

Thus, this probability measure is concentrated on the hubs on the path from y to the root.

Remark 4

Note that both \(\mathfrak {G}_{(\infty )}(\cdot )\) and \(Q_{y}^{(\infty )}(\cdot )\) depend on the realization of the pair \((\mathscr {T}_{(\infty )}^{\varvec{\theta }}, \varvec{U})\), but we suppress this dependence to avoid cumbersome notation.

Random tree \(\mathscr {R}_{IJ}^{(\infty )}\) Recall the tree \(r_{IJ}^{(\infty )}\) above, and that \(\eta _j\) is the vertex in the tree \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) corresponding to leaf \(j+\) for \(1\le j\le J\). To each of these J leaves, associate the value \(\mathfrak {G}_{(\infty )}(\eta _j)\), and associate the probability measure \(Q_{\eta _j}^{(\infty )}\) to the path \([0+, j+]\). This tree is a random element of the space \(\mathbf {T}_{IJ}^{*}\) (see Sect. 2.1.3), which we denote by \(\mathscr {R}_{IJ}^{(\infty )}\).

2.3 Continuum limits of components

The aim of this section is to give an explicit description of the limiting (random) metric spaces in Theorem 1.8. We start by constructing a specific tilted version of the ICRT in Sect. 2.3.1. Then in Sect. 2.3.2 we describe the limits of maximal components.

2.3.1 Tilted ICRTs and vertex identification

Let \((\Omega , \mathscr {F}, {{\mathrm{\mathbb {P}}}}_{\theta })\) and \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) be as in Sect. 2.2 and let \(\gamma >0\) be a constant. Informally, the construction goes as follows: We will first tilt the distribution of the original ICRT \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) using the functional

$$\begin{aligned} L_{(\infty )}(\mathscr {T}_{(\infty )}^{\varvec{\theta }}, \varvec{U}):= \exp \left( \gamma \int _{y\in \mathscr {T}_{(\infty )}^{\varvec{\theta }}} \mathfrak {G}_{(\infty )}(y)\mu (dy)\right) \end{aligned}$$
(2.9)

to get a tilted tree \(\mathscr {T}_{(\infty )}^{\varvec{\theta },\star }\). We then generate a random but finite number \(N_{(\infty )}^\star \) of pairs of points \(\left\{ (x_k, y_k):1\le k\le N_{(\infty )}^\star \right\} \). The final metric space is obtained by creating “shortcuts” by identifying the points \(x_k\) and \(y_k\). Formally the construction proceeds in four steps:

  1. (a)

    Tilted ICRT Define \({{{\mathrm{\mathbb {P}}}}}_{\theta }^\star \) on \(\Omega \) by

    $$\begin{aligned} \frac{d {{{{\mathrm{\mathbb {P}}}}}}_{\theta }^\star }{d{{{{\mathrm{\mathbb {P}}}}}}_{\theta }}=\frac{\exp \left( \gamma \int _{y\in \mathscr {T}_{(\infty )}^{\varvec{\theta }}}\mathfrak {G}_{(\infty )}(y)\mu (dy) \right) }{{{\mathrm{\mathbb {E}}}}\left[ \exp \left( \gamma \int _{x\in \mathscr {T}_{(\infty )}^{\varvec{\theta }}} \mathfrak {G}_{(\infty )}(x)\mu (dx) \right) \right] }. \end{aligned}$$

    The expectation in the denominator is with respect to the original measure \({{{\mathrm{\mathbb {P}}}}}_{\theta }\). In our proof we will show that this object is finite. Write \((\mathscr {T}_{(\infty )}^{\varvec{\theta },\star }, \mu ^\star )\) and \(\varvec{U}^{\star }=(U_j^{(i), \star }: i,j\ge 1)\) for the tree and the mass measure on it, and the associated random variables under this change of measure.

  2. (b)

    Poisson number of identification points Conditionally on \(((\mathscr {T}_{(\infty )}^{\varvec{\theta },\star }, \mu ^\star ), \varvec{U}^{\star })\), generate \(N_{(\infty )}^\star \) having a \(\mathrm {Poisson}(\Lambda _{(\infty )}^\star )\) distribution, where

    $$\begin{aligned} \Lambda _{(\infty )}^\star := \gamma \int _{y\in \mathscr {T}_{(\infty )}^{\varvec{\theta },\star }}\mathfrak {G}_{(\infty )}(y)\mu ^\star (dy) =\gamma \sum _{i\ge 1}\theta _{i}\left[ \sum _{j\ge 1}U_j^{(i), \star }\mu ^\star (\mathscr {T}_j^{(i), \star })\right] . \end{aligned}$$

    Here, \((\mathscr {T}_j^{(i), \star } : j\ge 1)\) denotes the collection of subtrees of hub i in \(\mathscr {T}_{(\infty )}^{\varvec{\theta },\star }\). (As mentioned before in Remark 4, \(\mathfrak {G}_{(\infty )}(\cdot )\) depends on the realization of the ordered ICRT. \(U_j^{(i), \star }\) appears in the expression above as the function \(\mathfrak {G}_{(\infty )}\) acts on \(y\in \mathscr {T}_{(\infty )}^{\varvec{\theta },\star }\) for which the associated order is described by \(\varvec{U}^{\star }\)).

  3. (c)

    “First” endpoints (of shortcuts) Conditionally on (a) and (b), sample \(x_k\) from \(\mathscr {T}_{(\infty )}^{\varvec{\theta },\star }\) with density proportional to \(\mathfrak {G}_{(\infty )}(x)\mu ^\star (dx)\) for \(1\le k\le N_{(\infty )}^\star \).

  4. (d)

    “Second” endpoints (of shortcuts) and identification Having chosen \(x_k\), choose \(y_k\) from the path \([\rho , x_k]\) joining the root \(\rho \) and \(x_k\) according to the probability measure \(Q_{x_k}^{(\infty )}\) as in (2.8) but with \(U_j^{(i),\star }\) replacing \(U_j^{(i)}\). (Note that \(y_k\) is always a hub on \([\rho , x_k]\)). Identify \(x_k\) and \(y_k\), i.e., form the quotient space by introducing the equivalence relation \(x_k\sim y_k\) for \(1\le k\le N_{(\infty )}^\star \).

Definition 2.2

Fix \(\gamma \ge 0\) and \(\varvec{\theta }\in \Theta \) as in (2.6). Let \(\mathscr {G}_{\infty }(\varvec{\theta },\gamma )\) be the metric measure space constructed via the four steps above equipped with the measure inherited from the mass measure on \(\mathscr {T}_{(\infty )}^{\varvec{\theta },\star }\).

In our proofs, we will always think of the leaf end (of a shortcut or a surplus edge) as the first endpoint, and the second endpoint will be selected from the skeleton.
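In simulation terms, step (a) is an exponential tilting and can be approximated by self-normalized importance resampling: draw many untilted copies, weight each by \(\exp (\Lambda )\) with \(\Lambda =\gamma \int \mathfrak {G}_{(\infty )}\, d\mu \) computed (or approximated) for that copy, and resample. In the sketch below, `sample_untilted` is a hypothetical user-supplied routine returning a tree representation together with its \(\Lambda \); it is a placeholder for an approximate ICRT sampler, not part of any library.

```python
import numpy as np

def sample_tilted(sample_untilted, n_proposals=10_000, rng=None):
    """Approximate one draw from the tilted law of step (a) by
    self-normalized importance resampling, then draw the Poisson number
    of identifications of step (b). `sample_untilted` (hypothetical) must
    return a pair (tree_data, Lambda) for one untilted draw."""
    rng = np.random.default_rng(rng)
    draws = [sample_untilted(rng) for _ in range(n_proposals)]
    lam = np.array([L for _, L in draws])
    weights = np.exp(lam - lam.max())          # stabilized exp(Lambda) weights
    k = rng.choice(n_proposals, p=weights / weights.sum())
    tree, Lambda = draws[k]
    n_identifications = rng.poisson(Lambda)    # step (b)
    return tree, n_identifications
```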

2.3.2 Limits of the components

Fix \(\lambda \in \mathbb {R}\) and \(\mathbf {c}\in l_0 \) as in (1.11) and consider the setting of Theorem 1.8. We will need two main objects:

  1. (a)

    The process \(\tilde{V}^{\mathbf {c}}_{\lambda }(\cdot )\) in (1.16). Recall that the excursions of this process from zero can be arranged in decreasing order of their lengths as \(\mathscr {Z}(\lambda )\). Let \(\Xi ^{(i)} = (c_j: \xi _j \in \mathscr {Z}_i(\lambda ))\) denote the point process of jumps of the process \(\tilde{V}^{\mathbf {c}}_{\lambda }(\cdot )\) corresponding to the excursion \(\mathscr {Z}_i(\lambda )\). Abusing notation, we will write \(\Xi ^{(i)} = (c_j: j \in \mathscr {Z}_i(\lambda ))\).

  2. (b)

    The actual lengths of these excursions \((Z_i(\lambda ):i\ge 1)\) as in (1.18).

From these objects, for each fixed \(i\ge 1\), define the random variable \(\bar{\gamma }^{(i)}\) and the point process \(\varvec{\theta }^{(i)} = (\theta _j^{(i)}:j\in \mathscr {Z}_i(\lambda ))\) as

$$\begin{aligned} \bar{\gamma }^{(i)}:= Z_i(\lambda )\sqrt{\sum _{v\in \mathscr {Z}_i(\lambda )} c_v^2}, \qquad \varvec{\theta }^{(i)}:= \left( \frac{c_j}{\sqrt{\sum _{v\in \mathscr {Z}_i(\lambda )} c_v^2}}: j\in \mathscr {Z}_i(\lambda )\right) . \end{aligned}$$
(2.10)

Our proof (see Proposition 5.1) will imply that \(\varvec{\theta }^{(i)} \in \Theta \) as in (2.6) a.s. Define

$$\begin{aligned} \Gamma _i(\lambda ):=Z_i(\lambda )\left( \sum _{v\in \mathscr {Z}_i(\lambda )}c_v^2\right) ^{-1/2}, \end{aligned}$$

and generate the random metric measure spaces

$$\begin{aligned} M_i^{\mathbf {c}}(\lambda ):= \Gamma _i(\lambda )\cdot \mathscr {G}_{\infty }(\varvec{\theta }^{(i)}, \bar{\gamma }^{(i)}), \end{aligned}$$

where \(\mathscr {G}_{\infty }(\varvec{\theta },\bar{\gamma })\) is as described in Sect. 2.3.1 and the metric spaces are conditionally independent across i given the driving parameters in (2.10). Let \(\mathbf {M}_{\infty }^{\mathbf {c}}(\lambda ) = (M_i^{\mathbf {c}}(\lambda ):i\ge 1)\). Then this is the limiting collection of metric spaces in Theorem 1.8.

To describe the sequence of spaces \(\mathbf {M}_{\infty }^{{{\mathrm{nr}}}}(\lambda )\) appearing in Theorem 1.2, define

$$\begin{aligned} \mathbf {c}^{{{\mathrm{nr}}}}&:=(c_j^{{{\mathrm{nr}}}}: j\ge 1),\quad \text { where }\quad c_j^{{{\mathrm{nr}}}}= \frac{1}{\mathbb {E}W}\left( \frac{c_F}{j}\right) ^{1/(\tau -1)}, \end{aligned}$$
(2.11)
$$\begin{aligned} \zeta&:=-\left( \frac{c_F^{2/(\tau -1)}}{{{\mathrm{\mathbb {E}}}}W}\right) \sum _{i=1}^{\infty }\left[ \int _{i-1}^i\frac{du}{u^{2/(\tau -1)}}-\frac{1}{i^{2/(\tau -1)}}\right] , \quad \text { and }\quad t^{{{\mathrm{nr}}}}_{\lambda }:=\frac{(\lambda +\zeta )}{{{\mathrm{\mathbb {E}}}}W}. \end{aligned}$$
(2.12)

Here W is a random variable with distribution F as in (1.3). Then

$$\begin{aligned} \mathbf {M}_{\infty }^{{{\mathrm{nr}}}}(\lambda )=\frac{1}{{{\mathrm{\mathbb {E}}}}W}\cdot \mathbf {M}_{\infty }^{\mathbf {c}^{{{\mathrm{nr}}}}}\left( t^{{{\mathrm{nr}}}}_{\lambda }\right) . \end{aligned}$$
(2.13)
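As a sanity check on the constants in (2.11) and (2.12), here is a short numerical sketch; the values of \(\tau \), \(c_F\) and \({{\mathrm{\mathbb {E}}}}W\) are placeholders chosen only for illustration, and the convergent series defining \(\zeta \) is simply truncated (its terms are \(O(i^{-2/(\tau -1)-1})\)).

```python
# Numerical evaluation of (2.11)-(2.12); tau, c_F and EW are illustrative
# placeholders, not values taken from the paper.
tau, c_F, EW = 3.5, 1.0, 2.0
alpha = 2.0 / (tau - 1.0)          # for tau in (3,4), alpha lies in (2/3, 1)

def c_nr(j):
    """c_j^nr of (2.11)."""
    return (c_F / j) ** (1.0 / (tau - 1.0)) / EW

def zeta_summand(i):
    # int_{i-1}^i u^{-alpha} du - i^{-alpha}; since alpha < 1 the integral
    # converges even for i = 1.
    integral = (i ** (1 - alpha) - (i - 1) ** (1 - alpha)) / (1 - alpha)
    return integral - i ** (-alpha)

zeta = -(c_F ** alpha / EW) * sum(zeta_summand(i) for i in range(1, 10**5))

def t_nr(lam):
    """t^nr_lambda of (2.12)."""
    return (lam + zeta) / EW

print(c_nr(1), c_nr(10), zeta, t_nr(0.0))
```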

3 Discussion

We describe the two major motivations for developing the general theory of this paper in Sects. 3.1 and 3.2. In Sects. 3.3 and 3.4, we include a brief discussion about ICRTs as well as give an overview of the order in which the proofs are carried out.

3.1 Universality and domains of attraction of critical random graph models

One natural question the reader might ask at this point is: why develop the general theory of Sect. 1.2 instead of just sticking to the rank-one random graph model of Sect. 1.1? As we have described in the introduction, the aim of this paper is the development of a general theory applicable to a wide array of models. What does one mean by this? It turns out that many different random graph models can be constructed in a dynamic fashion as a graph-valued process \(\left\{ \mathscr {G}_n(t): t\ge 0\right\} \) in which edges are added as time advances, resulting in mergers of components. In this construction, there is a model-dependent critical time \(t_c\) such that the giant component emerges after time \(t_c\).

Now for most random graph models (including the configuration model), the dynamics of mergers of components starting at time zero do not look like the multiplicative coalescent. However, if one zooms in at the critical time \(t_c\), then for many models there exists \(\varepsilon _n\downarrow 0\) such that on the interval \([t_c-\varepsilon _n, t_c+\varepsilon _n]\) the mergers of components can be approximated by the multiplicative coalescent. Here \(t_c -\varepsilon _n\) often corresponds to the barely subcritical regime of the random graph. Thus if one has good control over component functionals at the barely subcritical time \(t_c-\varepsilon _n\), and in particular if one can show that component sizes, appropriately normalized, satisfy Assumption 1.6, then one can use Theorem 1.8 to derive convergence of the maximal components at the critical time \(t_c\). Note that one does not expect component sizes at time \(t_c-\varepsilon _n\) to satisfy the assumptions of the Norros-Reittu model in (1.4). Rather, in most cases, at time \(t_c-\varepsilon _n\) the expected size of the component of a randomly selected vertex \(V_n\) scales like \(n^{\delta _1}\) while the maximal component scales like \(n^{\delta _2}\) (ignoring logarithmic corrections), where \(\delta _1 < \delta _2\) are related to various scaling exponents of the system. In work in progress [19], Theorem 1.9, coupled with delicate estimates of various scaling exponents for the configuration model in the barely subcritical regime, is used to prove analogous results for the configuration model with degree exponent \(\tau \in (3,4)\). Sizes of maximal components in the critical regime, including the heavy-tailed regime, were previously analyzed for this model in [48]. Further, as was done in [18], where a number of sufficient conditions for membership in the domain of attraction of the critical Erdős-Rényi scaling limits were derived, we hope to derive similar general conditions for a random graph model to belong to the same domain of attraction as the rank-one model with \(\tau \in (3,4)\) established in this paper.

Fig. 4

On the left an approximation of an ICRT (using \(\mathbf {p}\)-trees on approximately 20,000 vertices) corresponding to \(\theta _i\propto i^{-1/(\tau -2)}\) where \(\tau =3.01\). The reason behind this choice of \(\theta _i\) is explained in Sect. 8. On the right an approximation of a Brownian CRT (using a uniform random tree on the same number of vertices). Vertex sizes are proportional to the degree of the vertex

3.2 Minimal spanning tree on inhomogeneous random graphs

As described in the introduction, a second major motivation for the technical analysis in this paper is the minimal spanning tree. To fix ideas, consider the Norros-Reittu model in the supercritical regime (the parameter \(\nu > 1\) in (1.5)). To each edge attach a random edge weight, i.i.d. across edges and drawn from a continuous distribution. Consider the minimal spanning tree (MST) of the giant component. A large amount of simulation-based evidence from statistical physics [23, 24, 28, 62] suggests that when the degree exponent \(\tau \in (3,4)\), distances in this object scale like \(n^{(\tau -3)/(\tau -1)}\), the same distance scaling shown in this paper for the maximal components in the critical regime (Theorem 1.2).

This is not a coincidence. As has been shown in a series of fundamental papers [3,4,5] for the complete graph and the supercritical Erdős-Rényi random graph, a major ingredient in the analysis of the MST problem is the scaling of maximal components in the critical regime, which provides crucial input for the scaling limit of the MST. To date, there are no rigorous results on the scaling of the MST for any “inhomogeneous” random graph model. This paper provides the first step in answering this question in the heavy-tailed regime. Further, this program should enable one to analyze the MST for random graph models other than the rank-one model which belong to the same “domain of attraction” in the critical regime.

3.3 Inhomogeneous continuum random trees

As evident from Sect. 2.2, ICRTs play a major role in the description of our limiting objects. Despite a lot of work on these objects in the last decade [11, 13, 27], a number of questions regarding these continuum objects are still open, ranging from sufficient conditions for compactness to the dependence of the fractal properties of these objects on the driving parameter \(\varvec{\theta }\). Our proof shows that in some special cases, ICRTs are compact metric spaces when \(\varvec{\theta }\) is sampled according to an appropriate size-biased distribution. This can be seen as an annealed result on compactness of the ICRT. Whether compactness holds for non-random sequences \(\varvec{\theta }\in \Theta \) has been an open problem for more than a decade [11]. Similar questions hold for its fractal dimensions. See Sect. 8 for a more detailed account of these problems.

3.4 Overview of the proof

In Sect. 4, we study the random graph \(\mathscr {G}_n(\mathbf {x},t)\) as in Definition 1.4. We start with the simple observation that conditional on the vertex set of components of \(\mathscr {G}_n(\mathbf {x},t)\), a fixed component \(\mathscr {C}\) has the same distribution as \(\mathscr {G}_n(\mathbf {x},t)\) conditional on being connected. This section studies asymptotics for such distributions assuming specific regularity properties of vertex weights in the component in the large network limit, showing Gromov-weak convergence of the associated graph under proper normalization of edge lengths and vertex weights. Section 5 uses the size-biased exploration of the process \(\mathscr {G}_n(\mathbf {x},t)\) [9] to show that maximal connected components satisfy the hypothesis required in Sect. 4. Section 6 studies the special entrance boundary in (1.19) proving both compactness of the limiting objects as well as strengthening the convergence in the Gromov-weak topology to convergence in \(d_{{{\mathrm{GHP}}}}\). In Sect. 7, we derive the box-counting or Minkowski dimension. In Sect. 8, we conclude by describing a number of open problems.

4 Proofs: asymptotics conditional on being connected

The aim of this section is to study large connected components of \(\mathscr {G}_n(\mathbf {x},t)\) assuming the vertex weights satisfy a few regularity properties.

4.1 Tilted \(\mathbf {p}\)-trees and connected components of \(\mathscr {G}(\mathbf {x},t)\)

Recall the random graph \(\mathscr {G}(\mathbf {x},t)\) from Definition 1.4. Here, for any \(t\ge 0\), \((\mathscr {C}_i(t):i\ge 1)\) denotes the components in decreasing order of their masses. In this section we describe results from [20] which give a method of constructing the connected components of \(\mathscr {G}(\mathbf {x},t)\) conditional on the vertex sets of the components. This construction involves tilted versions of the \(\mathbf {p}\)-trees introduced in Sect. 2.2. Since these trees are parametrized by a driving probability mass function (pmf) \(\mathbf {p}\), it will be convenient to parametrize the various random graph constructions in terms of pmfs as opposed to vertex weights \(\mathbf {x}\). Proposition 4.1 relates vertex weights to pmfs.

Fix \(n\ge 1\) and \(\mathscr {V}\subset [n]\) and write \(\mathbb {G}_{\mathscr {V}}^{{{\mathrm{con}}}}\) for the space of all simple connected graphs with vertex set \(\mathscr {V}\). For fixed \(a > 0\), and probability mass function \(\mathbf {p}= (p_v: v \in \mathscr {V})\), define probability distributions \({{\mathrm{\mathbb {P}}}}_{{{\mathrm{con}}}}(\cdot ; \mathbf {p}, a, \mathscr {V})\) on \(\mathbb {G}_{\mathscr {V}}^{{{\mathrm{con}}}}\) as follows: Define for \(i,j \in \mathscr {V}\),

$$\begin{aligned} q_{ij}:= 1-\exp (-a p_i p_j). \end{aligned}$$
(4.1)

Then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{{{\mathrm{con}}}}(G; \mathbf {p}, a, \mathscr {V}): = \frac{1}{Z(\mathbf {p},a)} \prod _{(i,j)\in E(G)} q_{ij} \prod _{(i,j)\notin E(G)} (1-q_{ij}), \quad \text{ for } G \in \mathbb {G}_{\mathscr {V}}^{{{\mathrm{con}}}}, \end{aligned}$$
(4.2)

where \(Z(\mathbf {p},a)\) is the normalizing constant

$$\begin{aligned} Z(\mathbf {p},a) := \sum _{G \in \mathbb {G}_{\mathscr {V}}^{{{\mathrm{con}}}}}{\prod _{(i,j)\in E(G)} q_{ij} \prod _{(i,j)\notin E(G)}(1-q_{ij})}. \end{aligned}$$

Now let \(\mathscr {V}^{(i)} := V(\mathscr {C}_i(t))\) be the vertex set of \(\mathscr {C}_i(t)\) for \(i \ge 1\) and note that \(\left\{ \mathscr {V}^{(i)}:i\ge 1\right\} \) denotes a random finite partition of the full vertex set [n]. The following result is obvious from the construction of \(\mathscr {G}(\mathbf {x},t)\):

Proposition 4.1

([20, Proposition 6.1]) Conditional on the partition \(\left\{ \mathscr {V}^{(i)}:i\ge 1\right\} \) define

$$\begin{aligned} \mathbf {p}_n^{(i)} := \left( \frac{x_v}{\sum _{v \in \mathscr {V}^{(i)}}x_v } : v \in \mathscr {V}^{(i)} \right) , \quad a_n^{(i)}:= t\left( \sum _{v\in \mathscr {V}^{(i)}} x_v\right) ^2, \quad i\ge 1. \end{aligned}$$

For each fixed \(i \ge 1\), let \(G_i \in \mathbb {G}_{\mathscr {V}^{(i)}}^{{{\mathrm{con}}}}\) be a connected simple graph with vertex set \(\mathscr {V}^{(i)}\). Then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \mathscr {C}_i(t) = G_i, \;\; \forall i \ge 1\ \big |\ \left\{ \mathscr {V}^{(i)}:i\ge 1\right\} \right) = \prod _{i\ge 1} {{\mathrm{\mathbb {P}}}}_{{{\mathrm{con}}}}( G_i; \mathbf {p}_n^{(i)}, a_n^{(i)}, \mathscr {V}^{(i)}). \end{aligned}$$

Thus the random graph \(\mathscr {G}(\mathbf {x},t)\) can be generated in two stages:

  1. (i)

    Stage I Generate the partition of the vertices into different components, i.e., generate \(\left\{ \mathscr {V}^{(i)}:i\ge 1\right\} \).

  2. (ii)

    Stage II Conditional on the partition, generate the internal structure of each component following the law \({{\mathrm{\mathbb {P}}}}_{{{\mathrm{con}}}}(\cdot ; \mathbf {p}_n^{(i)}, a_n^{(i)}, \mathscr {V}^{(i)})\), independently across different components.

Let us now describe an algorithm to generate such connected components using distribution (4.2). To ease notation, let \(\mathscr {V}= [m]\) for some \(m\ge 1\) and fix a probability mass function \(\mathbf {p}\) on [m] and a constant \(a>0\) and write \({{\mathrm{\mathbb {P}}}}_{{{\mathrm{con}}}}(\cdot ):= {{\mathrm{\mathbb {P}}}}_{{{\mathrm{con}}}}(\cdot ;\mathbf {p},a,[m])\) on \(\mathbb {G}_m^{{{\mathrm{con}}}}:= \mathbb {G}_{[m]}^{{{\mathrm{con}}}}\). We will first need to set up some notation before describing this result.

Depth-first exploration of ordered trees Recall that we used \(\mathbb {T}_m^{{{\mathrm{ord}}}}\) for the space of ordered (or planar) trees with vertex set [m]. Given a tree \(\mathbf {t}\in \mathbb {T}_m^{{{\mathrm{ord}}}}\), one can use the associated order to explore the tree in a depth-first manner. More precisely, we start with v(1) being the root of \(\mathbf {t}\). At each stage \(1\le i\le m\), we keep track of three types of vertices: the set of active vertices \(\mathscr {A}(i)\), the set of explored vertices \(\mathscr {O}(i)\), and the set of unexplored vertices \(\mathscr {U}(i)\). The set of active vertices will in fact be viewed as a vertical stack (not just a set), with \(\mathscr {A}(i)\) representing the state of this stack at the beginning of step i. Initialize the process with \(\mathscr {A}(1) = \left\{ v(1)\right\} \) (the root of \(\mathbf {t}\)), \(\mathscr {O}(1) = \emptyset \) and \(\mathscr {U}(1) = [m]{\setminus }\left\{ v(1)\right\} \). At step \(i\ge 1\), we let

  1. (i)

    v(i) denote the vertex at the top of the stack \(\mathscr {A}(i)\) and let \(\mathscr {D}(i)\subset \mathscr {U}(i)\) denote the set of children of v(i). Delete v(i) from \(\mathscr {A}(i)\) and arrange the vertices of \(\mathscr {D}(i)\) from oldest to youngest at the top of the stack to form \(\mathscr {A}(i+1)\);

  2. (ii)

    \(\mathscr {O}(i+1) = \mathscr {O}(i) \cup \left\{ v(i)\right\} \);

  3. (iii)

    \(\mathscr {U}(i+1) = \mathscr {U}(i){\setminus }\mathscr {D}(i)\).

Write \(\mathfrak {P}(\mathbf {t})\) for the set of pairs of vertices \(\left\{ u,v\right\} \) such that \(u,v\in \mathscr {A}(i)\) for some \(1\le i\le m\); namely, both vertices are active at a common step but neither has yet been explored. Using terminology from [4], call this collection the set of permitted edges. Thus,

$$\begin{aligned} \mathfrak {P}(\mathbf {t}):= \left\{ (v(i),u)\ \big |\ 2\le i\le m,\ u\in \mathscr {A}(i-1){\setminus }\left\{ v(i)\right\} \right\} . \end{aligned}$$
(4.3)

Write \(E(\mathbf {t})\) for the edge set of \(\mathbf {t}\). Now define the function \(L : \mathbb {T}_m^{{{\mathrm{ord}}}} \rightarrow \mathbb {R}_+\) by

$$\begin{aligned} \displaystyle L(\mathbf {t})=\displaystyle L_{(m)}(\mathbf {t}):= \prod _{(k,\ell )\in E(\mathbf {t})} \left[ \frac{\exp (a p_k p_{\ell })- 1}{ap_k p_{\ell }} \right] \exp \left( \sum _{(k,\ell ) \in \mathfrak {P}(\mathbf {t})} a p_k p_{\ell }\right) , \quad \mathbf {t}\in \mathbb {T}_m^{{{\mathrm{ord}}}}. \end{aligned}$$
(4.4)
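Both the permitted edge set (4.3) and the tilt (4.4) are computable directly from the depth-first stack dynamics above. The following sketch (Python; the encoding of an ordered tree via ordered child lists is our own choice for illustration) evaluates \(\mathfrak {P}(\mathbf {t})\) and \(L(\mathbf {t})\) for a toy tree:

```python
import math

def permitted_edges(children, root):
    """Depth-first exploration: children[v] lists the children of v from
    oldest to youngest.  Returns the permitted edges (4.3), i.e. the pairs
    {v(i), u} with u still active when v(i) is explored."""
    stack, permitted = [root], []        # top of the stack = end of the list
    while stack:
        v = stack.pop()                  # v(i)
        permitted.extend((v, u) for u in stack)      # u active alongside v(i)
        stack.extend(reversed(children.get(v, [])))  # oldest child on top
    return permitted

def tilt_L(children, root, p, a):
    """The tilt L(t) of (4.4)."""
    prod = 1.0
    for k in children:                   # product over the tree edges
        for l in children[k]:
            x = a * p[k] * p[l]
            prod *= (math.exp(x) - 1.0) / x
    s = sum(p[k] * p[l] for (k, l) in permitted_edges(children, root))
    return prod * math.exp(a * s)

# toy ordered tree on [5]: root 1 with children (2, 3); 3 with children (4, 5)
children = {1: [2, 3], 3: [4, 5]}
p = {v: 0.2 for v in range(1, 6)}
print(permitted_edges(children, 1))      # [(2, 3), (4, 5)]
print(tilt_L(children, 1, p, a=2.0))
```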

Recall the (ordered) \(\mathbf {p}\)-tree distribution from (2.5). Using \(L(\cdot )\) to tilt this distribution results in the distribution

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{{{\mathrm{ord}}}}^\star ( \mathbf {t}) := {{\mathrm{\mathbb {P}}}}_{{{\mathrm{ord}}}}(\mathbf {t}) \cdot \frac{L(\mathbf {t})}{{{\mathrm{\mathbb {E}}}}_{{{\mathrm{ord}}}}[ L(\mathscr {T}^{\mathbf {p}}_m)]}, \quad \mathbf {t}\in \mathbb {T}_m^{{{\mathrm{ord}}}}. \end{aligned}$$
(4.5)

For future reference we fix notation for the various objects required in the proof below.

Definition 4.2

Fix \(m\ge 1\), \(a> 0\), and a probability mass function \(\mathbf {p}\) on [m]. We will write \({\tilde{\mathscr {G}}}_m(\mathbf {p}, a)\) to denote a random graph with distribution \({{\mathrm{\mathbb {P}}}}_{{{\mathrm{con}}}}(\cdot ;\mathbf {p},a,[m])\). \({\mathscr {T}}^{\mathbf {p}, \star }_m\) will denote a random planar tree with the tilted \(\mathbf {p}\)-tree distribution (4.5), and \({\mathscr {T}}^{\mathbf {p}}_m\) will denote a random tree with the original \(\mathbf {p}\)-tree distribution (2.5).

Proposition 4.3

([20, Proposition 7.4]) Fix \(m\ge 1\), a probability mass function \(\mathbf {p}\) on [m], and \(a>0\). Consider a random connected graph on [m] constructed as follows:

  1. (a)

    First generate a rooted planar random tree \({\mathscr {T}}^{\mathbf {p}, \star }_m\) with distribution \({{\mathrm{\mathbb {P}}}}_{{{\mathrm{ord}}}}^{\star }(\cdot )\) as in (4.5).

  2. (b)

    Let \(\mathfrak {P}({\mathscr {T}}^{\mathbf {p}, \star }_m)\) denote the permitted edge set of this random tree. Add each such edge \(\left\{ u,v\right\} \in \mathfrak {P}({\mathscr {T}}^{\mathbf {p}, \star }_m)\) with probability \(q_{uv}\) as in (4.1), independent across permitted edges.

Then, the resulting random graph has distribution \({{\mathrm{\mathbb {P}}}}_{{{\mathrm{con}}}}\) on \(\mathbb {G}_m^{{{\mathrm{con}}}}\), i.e., has the same distribution as \({\tilde{\mathscr {G}}}_m(\mathbf {p}, a)\).
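Proposition 4.3 suggests a two-step simulation recipe: sample a tree from the tilted law (4.5) and then add each permitted edge independently with probability \(q_{uv}\). Since exact sampling from (4.5) is not immediate, the sketch below (Python) assumes a sampler for the tilted tree is supplied, e.g. obtained by reweighting or rejection of untilted \(\mathbf {p}\)-trees using \(L\); everything else follows (4.1) and (4.3).

```python
import math, random

def permitted_edges(children, root):
    # depth-first stack dynamics; see the sketch following (4.4)
    stack, permitted = [root], []
    while stack:
        v = stack.pop()
        permitted.extend((v, u) for u in stack)
        stack.extend(reversed(children.get(v, [])))
    return permitted

def sample_G_tilde(tilted_tree_sampler, p, a, rng=random):
    """Two-step recipe of Proposition 4.3.  tilted_tree_sampler() is assumed
    to return (children, root, tree_edges) for a draw from (4.5); returns
    the edge list of the resulting connected graph."""
    children, root, tree_edges = tilted_tree_sampler()
    graph_edges = list(tree_edges)
    for (u, v) in permitted_edges(children, root):
        q_uv = 1.0 - math.exp(-a * p[u] * p[v])   # q_{uv} of (4.1)
        if rng.random() < q_uv:
            graph_edges.append((u, v))            # keep this permitted edge
    return graph_edges
```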

4.2 Convergence of connected components under weight assumptions

The aim of this section is to prove Gromov-weak convergence for the connected graph \({{\tilde{\mathscr {G}}}}_m(\mathbf {p},a)\) under regularity conditions on a and \(\mathbf {p}\) as \(m\rightarrow \infty \). We will assume that we have ordered the index set [m] so that \(p_1\ge p_2\ge \cdots \ge p_m >0\). Let

$$\begin{aligned} \sigma (\mathbf {p}):= \sqrt{\sum _i p_i^2}. \end{aligned}$$

Assumption 4.4

As \(m\rightarrow \infty \), the following hold:

  1. (i)

    \(\sigma (\mathbf {p})\rightarrow 0\) and further for each fixed \(i\ge 1\), \(p_i/\sigma (\mathbf {p})\rightarrow \theta _i\) where \(\varvec{\theta }:= (\theta _1, \theta _2, \ldots )\) is an element of \(~\Theta \) as in (2.6).

  2. (ii)

    There is a constant \(\gamma >0\) such that \(a\sigma (\mathbf {p})\rightarrow \gamma \).

The following theorem is the main result of this section.

Theorem 4.5

Consider the connected random graph \({\tilde{\mathscr {G}}}_m(\mathbf {p},a)\) viewed as a metric measure space via the graph distance where each vertex v is assigned measure \(p_v\). Under Assumption 4.4,

$$\begin{aligned} \sigma (\mathbf {p}) {\tilde{\mathscr {G}}}_m(\mathbf {p},a) \mathop {\longrightarrow }\limits ^{d}\mathscr {G}_{\infty }(\varvec{\theta },\gamma ), \end{aligned}$$

where \(\mathscr {G}_{\infty }(\varvec{\theta },\gamma )\) is the random metric space defined in Definition 2.2 and convergence is in the Gromov-weak topology on metric spaces.

The rest of this section proves this result. We will throughout assume that \({\tilde{\mathscr {G}}}_m(\mathbf {p},a)\) has been constructed using Proposition 4.3.

4.2.1 Two constructions of \(\mathbf {p}\)-trees: exploration process and the birthday construction

We start by describing an explicit construction of the (untilted) \(\mathbf {p}\)-tree \(\mathscr {T}_m^{\mathbf {p}}\) first developed in [11]. At the end of this section we describe a second construction used later in the paper.

Exploration process construction The first construction is initiated by setting up a map \(\psi _{\mathbf {p}}:[0,1]^m\rightarrow \mathbb {T}_m^{{{\mathrm{ord}}}}\) as follows. Let \(\mathbf {u}:=(u_v:v\in [m])\) be a collection of distinct points in (0, 1). Define

$$\begin{aligned} F^{\mathbf {p}}(s) := - s + \sum _{v=1}^m p_v \mathbbm {1}\left\{ u_v\le s\right\} , \qquad s\in [0,1]. \end{aligned}$$

Assume that there exists a unique point \(v^* \in [m] \) such that \(F^{\mathbf {p}}(u_{v^*}-) = \min _{s\in [0,1]} F^{\mathbf {p}}(s)\). Set \(v^*\) to be the root of the tree \(\psi _{\mathbf {p}}(\mathbf {u})\). Define \(y_i := u_i - u_{v^*}\) \( \text{ mod } 1\) for \(i \in [m]\), and

$$\begin{aligned} F^{{{\mathrm{exc}}},\mathbf {p}}(s):= F^{\mathbf {p}}( u_{v^*} + s \text{ mod } 1) - F^{\mathbf {p}}(u_{v^*}-), \qquad 0\le s < 1. \end{aligned}$$

Then \(F^{{{\mathrm{exc}}},\mathbf {p}}(1-) = 0\) and \(F^{{{\mathrm{exc}}},\mathbf {p}}(s) > 0\) for \(s \in [0,1)\). Extend the definition of \(F^{{{\mathrm{exc}}},\mathbf {p}}\) to \(s \in [0,1]\) by defining \(F^{{{\mathrm{exc}}},\mathbf {p}}(1) = 0\). We use \(F^{{{\mathrm{exc}}},\mathbf {p}}\) to drive a depth-first construction of an ordered tree; the resulting tree is \(\psi _{\mathbf {p}}(\mathbf {u})\). As before, in this construction we carry along a set of explored vertices \(\mathscr {O}(i)\), active vertices \(\mathscr {A}(i)\) and unexplored vertices \(\mathscr {U}(i) = [m]{\setminus }(\mathscr {A}(i)\cup \mathscr {O}(i))\), for \(0\le i \le m\). We view \(\mathscr {A}(i)\) as the state of a vertical stack \(\mathscr {A}\) after the ith step in the depth-first-search. Initialize with \(\mathscr {O}(0) = \emptyset \), \(\mathscr {A}(0) = \left\{ v^*\right\} \), \(\mathscr {U}(0) = [m] {\setminus }\left\{ v^*\right\} \), and define \(y^*(0) = 0\). At step \(i \in [m]\), let v(i) be the vertex on top of the stack \(\mathscr {A}(i-1)\) and define \(y^*(i) := y^*(i-1)+p_{v(i)}\). Define \(\mathscr {D}(i) := \left\{ j \in [m] :y^*(i-1)< y_j < y^*(i) \right\} \). Suppose \(\mathscr {D}(i) = \left\{ u(j) :1\le j \le k\right\} \), where we have ordered these vertices according to the sequence in which they are found in this interval, i.e.,

$$\begin{aligned} y^*(i-1)< y_{u(1)}< \cdots< y_{u(k)} < y^*(i). \end{aligned}$$

Update the stack \(\mathscr {A}\) as follows:

  1. (i)

    Delete v(i) from \(\mathscr {A}\).

  2. (ii)

    Push u(j), \(1\le j\le k\), to the top of \(\mathscr {A}\) sequentially (so that u(k) will be on the top of the stack at the end).

Let \(\mathscr {A}(i)\) be the state of the stack after the above operations. Update \(\mathscr {O}(i) := \mathscr {O}(i-1) \cup \left\{ v(i)\right\} \) and \(\mathscr {U}(i):= \mathscr {U}(i-1){\setminus }\mathscr {D}(i) \). See Fig. 5 for a pictorial description of this construction.

Fig. 5

The function \(F^{\mathbf {p}}\) and the corresponding tree \(\psi _{\mathbf {p}}\)

The tree \(\psi _{\mathbf {p}}(\mathbf {u}) \in \mathbb {T}_m^{{{\mathrm{ord}}}}\) is constructed by adding the edges \(\left\{ (v(i),v): i \in [m], v \in \mathscr {D}(i)\right\} \) and using the order prescribed in the above exploration to make the tree an ordered tree. The fact that this procedure actually produces a tree is proved in [11].
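For concreteness, here is a minimal sketch of the map \(\psi _{\mathbf {p}}\) (Python; it assumes the input points are distinct and in “general position”, which holds almost surely for the i.i.d. uniforms of Lemma 4.6 below):

```python
import random
from bisect import bisect_left, bisect_right

def psi_p(p, u):
    """The map psi_p: from weights p = {v: p_v} and distinct points
    u = {v: u_v} in (0,1), build the ordered tree encoded by F^{exc,p}.
    Returns (root, edges), edges listed in order of discovery."""
    def F_left(v):                       # F^p(u_v -)
        return -u[v] + sum(p[w] for w in p if u[w] < u[v])
    root = min(p, key=F_left)            # v*: location of the minimum
    y = {v: (u[v] - u[root]) % 1.0 for v in p}       # shifted points y_v
    verts = sorted((yv, v) for v, yv in y.items() if v != root)
    ys = [yv for yv, _ in verts]
    stack, edges, t = [root], [], 0.0
    while stack:
        v = stack.pop()                  # v(i): top of the stack
        t_new = t + p[v]                 # y*(i) = y*(i-1) + p_{v(i)}
        lo, hi = bisect_right(ys, t), bisect_left(ys, t_new)
        D = [verts[j][1] for j in range(lo, hi)]     # D(i), in y-order
        edges.extend((v, w) for w in D)
        stack.extend(D)                  # last-found vertex ends up on top
        t = t_new
    return root, edges

# Lemma 4.6: with i.i.d. uniform points this samples a p-tree
m = 8
p = {v: 1.0 / m for v in range(1, m + 1)}
u = {v: random.random() for v in p}
print(psi_p(p, u))
```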

Lemma 4.6

([11, Section 3.2]) Consider the map \(\psi _{\mathbf {p}}\). Let \(\mathbf {X}:=(X_v: v\in [m])\) be i.i.d. random variables distributed uniformly on (0, 1). Then the random tree \(\psi _{\mathbf {p}}(\mathbf {X})\) has distribution (2.5), i.e., \(\psi _{\mathbf {p}}(\mathbf {X}) \mathop {=}\limits ^{d} \mathscr {T}^{\mathbf {p}}\).

For future reference, coupled with the above construction, define \(\mathscr {S}(i):=\mathscr {A}(i-1){\setminus }\left\{ v(i)\right\} \) for \(i\in [m]\). Define the function \(A_m(\cdot )\) on [0, 1] via

$$\begin{aligned} A_m(u):= \sum _{v\in \mathscr {S}(i)} p_v, \quad \text{ for } u \in (y^*(i-1), y^*(i)], i \in [m]. \end{aligned}$$
(4.6)

Further let \({\bar{A}}_m(u) := a A_m (u)\), \(u \in [0,1]\), where a is the scaling constant in (4.1).

Birthday construction We now describe a second construction of \(\mathbf {p}\)-trees, first formulated in [27]. We urge the reader to skim this portion and return to it once she has reached Sect. 4.5. Let \(\mathbf {Y}:=(Y_0, Y_1, \ldots )\) be an infinite sequence of i.i.d. random variables with distribution \(\mathbf {p}\). Let \(R_0=0\) and for \(l\ge 1\), let \(R_l\) denote the l-th repeat time, i.e.,

$$\begin{aligned} R_l=\min \bigg \{k>R_{l-1}: Y_k\in \{Y_0,\ldots ,Y_{k-1}\}\bigg \}. \end{aligned}$$

Now consider the directed graph formed via the edges

$$\begin{aligned} \mathscr {T}(\mathbf {Y}):= \left\{ (Y_{j-1}, Y_j): Y_j\notin \left\{ Y_0, \ldots , Y_{j-1}\right\} , j\ge 1\right\} . \end{aligned}$$

It is easy to check that this gives a tree, which we view as rooted at \(Y_0\). Intuitively, the construction proceeds as follows: the tree “grows” via the addition of new vertices sampled using \(\mathbf {p}\) until it stumbles across a “repeat” (a vertex already present), at which point it goes back to the first occurrence of this repeated vertex and resumes growing from that position. The following striking result was shown in [27].

Theorem 4.7

([27, Lemma 1 and Theorem 2]) The random tree \(\mathscr {T}(\mathbf {Y})\) viewed as an object in \(\mathbb {T}_m\) is distributed as a \(\mathbf {p}\)-tree with distribution (2.4) independently of \(Y_{R_1-1}, Y_{R_2-1}, \ldots \) which are i.i.d. with distribution \(\mathbf {p}\).

Remark 5

The independence between the sequence \(Y_{R_1-1}, Y_{R_2-1}, \ldots \) and the constructed \( \mathbf {p}\) tree \(\mathscr {T}(\mathbf {Y})\) is truly remarkable. In particular, suppose \(\mathscr {S}\) is a \(\mathbf {p}\)-tree with distribution as in (2.4) and for fixed \(r\ge 1\), let \(\tilde{Y_1}, \tilde{Y_2}, \ldots \tilde{Y_r} \) be i.i.d. with distribution \(\mathbf {p}\). Write \(\mathscr {S}_r\subset \mathscr {S}\) for the tree spanned by these vertices and the root. Let \(\mathscr {T}_r^{\mathscr {B}}\subset \mathscr {T}(\mathbf {Y})\) denote the subtree with vertex set \(\left\{ Y_0, Y_1, \ldots , Y_{R_r-1}\right\} \), namely the tree constructed in the first \(R_r\) steps. Here \(\mathscr {B}\) is a mnemonic for “birthday tree” and also to distinguish this construction from a generic random tree model with r vertices. Then the above result (formalized as [27, Corollary 3]) implies that these can be jointly constructed as

$$\begin{aligned} (\tilde{Y_1}, \tilde{Y_2},\ldots , \tilde{Y_r}; \mathscr {S}_r)\mathop {=}\limits ^{d} (Y_{R_1-1}, Y_{R_2-1}, \ldots Y_{R_r -1}; \mathscr {T}_r^{\mathscr {B}}). \end{aligned}$$
(4.7)

We use this fact often in Sect. 4.5.
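The birthday construction is particularly easy to simulate: one runs the i.i.d. sequence until every vertex has appeared, after which no further edges can be added. A minimal sketch (Python):

```python
import random

def birthday_tree(p, rng=random):
    """Birthday construction: run an i.i.d. p-sequence Y_0, Y_1, ... and
    add the edge (Y_{j-1}, Y_j) whenever Y_j has not appeared before.
    Returns (root, edges); the root is Y_0."""
    vertices = list(p)
    weights = [p[v] for v in vertices]
    prev = rng.choices(vertices, weights=weights)[0]    # Y_0
    root, seen, edges = prev, {prev}, []
    while len(seen) < len(vertices):
        y = rng.choices(vertices, weights=weights)[0]   # Y_j ~ p
        if y not in seen:                               # Y_j is new:
            edges.append((prev, y))                     # edge (Y_{j-1}, Y_j)
            seen.add(y)
        prev = y                # the walk continues from Y_j in either case
    return root, edges

m = 6
print(birthday_tree({v: 1.0 / m for v in range(1, m + 1)}))
```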

4.3 Uniform integrability of the tilt

The first use of the above construction of the \(\mathbf {p}\)-tree is to prove the following:

Proposition 4.8

Fix \(s\ge 1\) and consider the tilt \(L(\cdot )\) as in (4.4). Under Assumption 4.4, there is a constant \(K:=K(s) < \infty \) such that

$$\begin{aligned} \sup _{m\ge 1}\ {{\mathrm{\mathbb {E}}}}_{{{\mathrm{ord}}}}\left( \left[ L(\mathscr {T}^{\mathbf {p}}_m)\right] ^s\right) \le K. \end{aligned}$$

In particular, the collection of random variables \(\left\{ L(\mathscr {T}^{\mathbf {p}}_m): m\ge 1\right\} \) is uniformly integrable.

Proof

Writing out the tilt \(L(\cdot )\) explicitly, we have

$$\begin{aligned} \displaystyle L(\mathbf {t}):= \prod _{(k,\ell )\in E(\mathbf {t})} \left[ \frac{\exp (a p_k p_{\ell })- 1}{ap_k p_{\ell }} \right] \exp \bigg (\sum _{(k,\ell ) \in \mathfrak {P}(\mathbf {t})} a p_k p_{\ell }\bigg ) = \mathbb {I}(\mathbf {t}) {\bar{L}}(\mathbf {t}), \end{aligned}$$
(4.8)

say, where,

$$\begin{aligned} \mathbb {I}(\mathbf {t}):= \prod _{(i,j)\in E(\mathbf {t})} \frac{\exp (a p_i p_j)- 1}{ap_ip_j} \le \exp \bigg ( a\sum _{(i,j)\in E(\mathbf {t})} p_i p_j \bigg ) \le \exp ( a p_1). \end{aligned}$$
(4.9)

Here we have used \(({\mathrm {e}}^x-1)/x \le {\mathrm {e}}^x\) for \(x >0\) for the first inequality and the second inequality follows using the fact that \(\mathbf {t}\) is a tree, so that for each \((i,j) \in E(\mathbf {t})\) such that i is the parent of j, we have \(p_ip_j \le p_1 p_j\). By Assumption 4.4, we have \(ap_1 \rightarrow \gamma \theta _1\). In particular, there is a constant \(C> 0\) such that for all \(m\ge 1\), and \(\mathbf {t}\in {\mathbb {T}}_m^{{{\mathrm{ord}}}}\),

$$\begin{aligned} \mathbb {I}(\mathbf {t})\le C\ \text { and }\ L(\mathbf {t})\le C \exp \bigg (\sum _{(k,\ell ) \in \mathfrak {P}(\mathbf {t})} a p_k p_{\ell }\bigg ). \end{aligned}$$
(4.10)

Now recall the functions \(A_m\) and \(\bar{A}_m:= aA_m\) from (4.6). Using the equivalent characterization of the permitted edge set from (4.3) and comparing this with (4.6), it is easy to check that

$$\begin{aligned} \sum _{(i,j) \in \mathfrak {P}(\mathscr {T}_m^{\mathbf {p}})} a p_i p_j = a \sum _{i \in [m]} \sum _{ j \in \mathscr {S}(i)} p_{v(i)} p_j= \int _0^1 {\bar{A}}_m(s) ds. \end{aligned}$$

Now by the definition of \(F^{{{\mathrm{exc}}},\mathbf {p}}\),

$$\begin{aligned} F^{{{\mathrm{exc}}},\mathbf {p}}(y^*(i)) = \sum _{v \in \mathscr {A}(i) } p_v, \quad \text{ for } i \in [m]. \end{aligned}$$
(4.11)

By (4.6),

$$\begin{aligned} A_m(t) = \sum _{v \in \mathscr {S}(i)} p_v = \sum _{v \in \mathscr {A}(i-1)}p_v - p_{v(i)}, \quad \text{ for } t \in (y^*(i-1), y^*(i)]. \end{aligned}$$

Thus

$$\begin{aligned} \Vert A_m\Vert _{\infty }\le \Vert F^{{{\mathrm{exc}}},\mathbf {p}}\Vert _{\infty }. \end{aligned}$$
(4.12)

By Assumption 4.4(ii) and (4.10), for any \(s\ge 0\), there exists \(K=K(s) < \infty \) such that

$$\begin{aligned} \left[ L(\mathscr {T}^{\mathbf {p}}_m)\right] ^s \le C^s \exp \left( K \frac{\Vert F^{{{\mathrm{exc}}},\mathbf {p}}\Vert _{\infty }}{\sigma (\mathbf {p})}\right) . \end{aligned}$$

Now the following lemma completes the proof of Proposition 4.8. \(\square \)

Lemma 4.9

There exists a positive constant \(c > 0\) such that for every \(m\ge 1\) and \(x\ge {\mathrm {e}}\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \Vert F^{{{\mathrm{exc}}},\mathbf {p}}\Vert _{\infty } \ge x\sigma (\mathbf {p})\right) \le \exp \big (-c x\log (\log {x})\big ). \end{aligned}$$

Proof

Write \(\mathscr {R}(m):= \Vert F^{{{\mathrm{exc}}},\mathbf {p}}\Vert _{\infty }/\sigma (\mathbf {p})\) and as before, let \(\mathbf {X}=(X_v:v\in [m])\) be the collection of uniform random variables used to construct \(F^{\mathbf {p}}\). Write \(\mathbb {Q}[0,1]\) for the set of rationals in [0, 1]. Then note that

$$\begin{aligned} \mathscr {R}(m)= \sup _{q\in \mathbb {Q}[0,1]} \frac{F^{\mathbf {p}}(q)}{\sigma (\mathbf {p})}- \inf _{q\in \mathbb {Q}[0,1]} \frac{F^{\mathbf {p}}(q)}{\sigma (\mathbf {p})}:= \mathscr {R}_1(m)+ \mathscr {R}_2(m). \end{aligned}$$
(4.13)

We start by analyzing \(\mathscr {R}_1(m)\). For fixed \(q\in \mathbb {Q}[0,1]\), define the collection of m functions

$$\begin{aligned} s_q^j(x):= \frac{p_j}{\sigma (\mathbf {p})}\left( \mathbbm {1}\left\{ x\le q\right\} -q\right) , \quad 1\le j\le m.\nonumber \end{aligned}$$

Note that for all \(j\in [m]\), \(s_q^j:[0,1]\rightarrow [-1,1]\), with \({{\mathrm{\mathbb {E}}}}(s_q^j(X_j)) =0\) and further

$$\begin{aligned} \mathscr {R}_1(m)= \sup _{q\in \mathbb {Q}[0,1]} \left( s_q^1(X_1)+\cdots + s_q^m(X_m)\right) .\nonumber \end{aligned}$$

Also note that

$$\begin{aligned} \sup _{q\in \mathbb {Q}[0,1]} {{\mathrm{Var}}}\left( s_q^1(X_1)+\cdots + s_q^m(X_m)\right) =\sup _{q\in \mathbb {Q}[0,1]} q(1-q)= \frac{1}{4}.\nonumber \end{aligned}$$

If we can show that

$$\begin{aligned} \kappa := \sup _{m\ge 1} {{\mathrm{\mathbb {E}}}}(\mathscr {R}_1(m)) < \infty , \end{aligned}$$
(4.14)

then standard concentration inequalities for the maxima in empirical processes [49, Theorem 1.1(b)] will imply the existence of a constant \(c_1>0\) such that for all \(m\ge 1\) and \(x >0\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(\mathscr {R}_1(m)\ge {{\mathrm{\mathbb {E}}}}(\mathscr {R}_1(m))+x)\le \exp \left( -\frac{x}{4} \log \left[ 1+ 2\log \left( 1+\frac{x}{2\kappa +\frac{1}{4}}\right) \right] \right) . \end{aligned}$$
(4.15)

Let us now prove (4.14). In fact we will show the stronger result:

$$\begin{aligned} \sup _{m\ge 1}{{\mathrm{\mathbb {E}}}}\left( \sup _{q\in \mathbb {Q}[0,1]} \left| \sum _{j=1}^m s_q^j(X_j)\right| \right) < \infty .\nonumber \end{aligned}$$

Let \(X_{(1)}< X_{(2)}< \cdots < X_{(m)}\) denote the order statistics of \(\mathbf {X}\) and let \(\pi \) denote the corresponding permutation of [m], namely \(X_{(i)} = X_{\pi (i)}\). Note that

$$\begin{aligned} \sup _{q\in \mathbb {Q}[0,1]} \left| \sum _{j=1}^m s_q^j(X_j)\right| := \max _{1\le i\le m} |\vartheta _i|, \quad \text{ where } \quad \vartheta _i:= \frac{-X_{(i)} +\sum _{j=1}^i p_{\pi (j)}}{\sigma (\mathbf {p})}. \end{aligned}$$

Hence

$$\begin{aligned} \max _{i\in [m]} |\vartheta _i|&\le \max _{i\in [m]}~ [\sigma (\mathbf {p})]^{-1}{\left| -X_{(i)} +\frac{i}{m}\right| } ~+~ \max _{i\in [m]} [\sigma (\mathbf {p})]^{-1}{\left| \sum _{j=1}^{i} p_{\pi (j)} -\frac{i}{m}\right| }\\&:= \mathscr {R}_{11}(m)+\mathscr {R}_{12}(m). \end{aligned}$$

We first analyze \(\mathscr {R}_{11}(m)\). By the DKW inequality [54],

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \max _{i\in [m]}~ {\left| -X_{(i)} +\frac{i}{m}\right| } \ge \sigma (\mathbf {p}) x\right) \le 2\exp \left( -2m\cdot \left( \sigma (\mathbf {p})x\right) ^2\right) \nonumber \end{aligned}$$

By the Cauchy-Schwarz inequality, \(m\sigma ^2(\mathbf {p})\ge (\sum _i p_i)^2=1\). Thus \(\sup _{m\ge 1} {{\mathrm{\mathbb {E}}}}(\mathscr {R}_{11}(m)) < \infty \). We now analyze \(\mathscr {R}_{12}(m)\). Since

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}(p_{\pi (j)})= \frac{1}{m},\quad \text {and}\quad {{\mathrm{\mathbb {E}}}}(p_{\pi (i)}p_{\pi (j)}) = \frac{\sum _{k\ne \ell \in [m]} p_k p_{\ell }}{m(m-1)}= \frac{1-\sigma ^2(\mathbf {p})}{m(m-1)}\quad \text {for } i\ne j\in [m], \end{aligned}$$

for any \(i\in [m]\) we have

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}\left( \left( \sum _{j=1}^{i} p_{\pi (j)} -\frac{i}{m}\right) ^2\right)&= \frac{i\sigma ^2(\mathbf {p})}{m} + \frac{i(i-1)}{m(m-1)}(1-\sigma ^2(\mathbf {p})) -\frac{i^2}{m^2} \le \frac{i}{m}\sigma ^2(\mathbf {p}) \end{aligned}$$
(4.16)

by simply expanding the square. Now note that since \(\pi \) is a uniform random permutation of the vertex set [m], for any fixed \(i\ge 1\) we also have

$$\begin{aligned} \sum _{j=1}^i p_{\pi (j)} - \frac{i}{m} \mathop {=}\limits ^{d} \sum _{j=0}^{i-1} p_{\pi (m-j)} - \frac{i}{m} = \left( \frac{m-i}{m} -\sum _{j=1}^{m-i} p_{\pi (j)}\right) . \end{aligned}$$

Thus

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}(\mathscr {R}_{12}(m)) \le 2 {{\mathrm{\mathbb {E}}}}\left( \max _{i\in [m/2]} [\sigma (\mathbf {p})]^{-1}{\left| \sum _{j=1}^{i} p_{\pi (j)} -\frac{i}{m}\right| }\right) . \end{aligned}$$
(4.17)

Now assuming that we construct \(\pi \) by sequentially sampling without replacement from [m], let \(\mathscr {F}_k\) denote the \(\sigma \)-field generated by \((\pi (1), \pi (2), \ldots , \pi (k))\) for \(0\le k\le m-1\). Let \(M_0 = 0\) and consider the sequence

$$\begin{aligned} M_k:= \frac{\sum _{j=1}^k p_{\pi (j)} - k/m}{m-k}, \quad 0\le k\le m-1. \end{aligned}$$

It is easy to check that \(\left\{ M_k:0\le k\le m-1\right\} \) is a martingale with respect to the filtration \(\left\{ \mathscr {F}_k: 0\le k\le m-1\right\} \). Since \(\left| \sum _{j=1}^{i} p_{\pi (j)} -i/m\right| = (m-i)|M_i|\le m|M_i|\), (4.17) and Doob’s \(\mathbb {L}^2\)-maximal inequality yield

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}(\mathscr {R}_{12}(m)) \le \frac{2m}{\sigma (\mathbf {p})} \sqrt{{{\mathrm{\mathbb {E}}}}(\left[ M_{m/2}\right] ^2)}\nonumber \end{aligned}$$

Using (4.16) with \(i=m/2\) gives \({{\mathrm{\mathbb {E}}}}([M_{m/2}]^2)\le 2\sigma ^2(\mathbf {p})/m^2\), so that \({{\mathrm{\mathbb {E}}}}(\mathscr {R}_{12}(m)) \le 2\sqrt{2}\le 16\) for all \(m\ge 1\). Thus we have shown that \(\sup _{m\ge 1} \max ({{\mathrm{\mathbb {E}}}}(\mathscr {R}_{11}(m)),{{\mathrm{\mathbb {E}}}}(\mathscr {R}_{12}(m))) < \infty \). This proves (4.14) and thus (4.15).

To complete the proof of the lemma, we need to get a tail bound on \(\mathscr {R}_2(m)\) appearing in (4.13). As before, using [49], it is enough to show \(\sup _{m\ge 1} {{\mathrm{\mathbb {E}}}}(\mathscr {R}_2(m))< \infty \). However, note that

$$\begin{aligned} \mathscr {R}_2(m)&= \max _{i\in [m]}\frac{\left| \sum _{j=1}^{i-1} p_{\pi (j)} - X_{(i)}\right| }{\sigma (\mathbf {p})} \le \mathscr {R}_1(m) + \frac{p_1}{\sigma (\mathbf {p})}. \end{aligned}$$

We now use (4.14) together with Assumption 4.4 to complete the proof. \(\square \)

4.4 Another construction of \(\tilde{\mathscr {G}}_m(\mathbf {p}, a)\) and a modification

In this section, we start by giving a more explicit description of the algorithm described in Proposition 4.3 via adding permitted edges to a tilted \(\mathbf {p}\)-tree. We first set up some notation. As a matter of convention, we will view ordered rooted trees via their planar embedding, using the associated ordering to determine the relative locations of siblings of an individual. We think of the leftmost sibling as the “oldest”. Further, in a depth-first exploration, we explore the tree from left to right. Now given a planar rooted tree \(\mathbf {t}\in \mathbb {T}_m\), let \(\rho \) denote the root and for every vertex \(v\in [m]\), let \([\rho ,v]\) denote the path connecting \(\rho \) to v in the tree. Given this path and a vertex \(i\in [\rho ,v]\), write \(\mathscr {R}\mathscr {C}(i,[\rho ,v])\) for the set of all children of i which fall to the right of \([\rho ,v]\). Thus in the depth-first exploration of the tree, when we get to v,

$$\begin{aligned} \mathfrak {P}(v,\mathbf {t}):= \cup _{i\in [\rho ,v]} \mathscr {R}\mathscr {C}(i,[\rho ,v]) \end{aligned}$$

denotes the set of endpoints of all permitted edges emanating from v. Define

$$\begin{aligned} \mathfrak {G}_{(m)}(v):= \sum _{i\in [\rho ,v]} \sum _{j\in [m]} p_j \mathbbm {1}\left\{ j\in \mathscr {R}\mathscr {C}(i,[\rho ,v])\right\} . \end{aligned}$$
(4.18)

The function \(A_m(\cdot )\) defined in (4.6) is intimately connected to \(\mathfrak {G}_{(m)}(\cdot )\). More precisely, let \((v(1), v(2), \ldots , v(m))\) denote the order in the depth-first exploration of the tree. Let \(y^*(0)=0\) and \(y^*(i) = y^*(i-1) + p_{v(i)}\). Define

$$\begin{aligned} A_{(m)}(u) = \mathfrak {G}_{(m)}(v(i)),\quad \text {for}\quad u\in (y^*(i-1), y^*(i)],\quad \text {and}\quad \bar{A}_{(m)}(\cdot ):= a A_{(m)}(\cdot ). \end{aligned}$$
(4.19)

Then the function \(A_{(m)}(\cdot )\) associated with an ordered \(\mathbf {p}\)-tree has the same distribution as the function \(A_{m}(\cdot )\) associated with the tree \(\psi _{\mathbf {p}}(\mathbf {X})\), where \(\mathbf {X}=(X_v: v\in [m])\) are i.i.d. random variables uniformly distributed on (0, 1).

Finally, define the function

$$\begin{aligned} \Lambda _{(m)}(\mathbf {t}) := a\sum _{v\in [m]} p_v \mathfrak {G}_{(m)}(v). \end{aligned}$$
(4.20)

While all of these objects depend on the tree \(\mathbf {t}\), we suppress this dependence to ease notation. Now Proposition 4.3 implies we can construct \({{\tilde{\mathscr {G}}}}_m(\mathbf {p},a)\) via the following five steps:

  1. (i)

    Tilted \(\mathbf {p}\)-tree Generate a tilted ordered \(\mathbf {p}\)-tree \(\mathscr {T}^{\mathbf {p},\star }_m\) with distribution (4.5). Now consider the (random) objects \(\mathfrak {P}(v,\mathscr {T}^{\mathbf {p},\star }_m)\) for \(v\in [m]\) and the corresponding (random) functions \(\mathfrak {G}_{(m)}(\cdot )\) on [m] and \(A_{(m)}(\cdot )\) on [0, 1].

  2. (ii)

    Poisson number of possible surplus edges Let \(\mathscr {P}\) denote a rate one Poisson process on \(\mathbb {R}_+^2\) and define

    $$\begin{aligned} \bar{A}_{(m)}\cap {\mathscr {P}}:= \left\{ (s,t)\in \mathscr {P}: s\in [0,1], t\le \bar{A}_{(m)}(s)\right\} . \end{aligned}$$
    (4.21)

    Write \(\bar{A}_{(m)}\cap {\mathscr {P}}:= \left\{ (s_j,t_j):1\le j\le N_{(m)}^\star \right\} \) where \(N_{(m)}^\star = |\bar{A}_{(m)}\cap {\mathscr {P}}|\). We will now use the set \(\left\{ (s_j, t_j):1\le j\le N_{(m)}^\star \right\} \) to generate pairs of points \(\left\{ (\mathscr {L}_j,\mathscr {R}_j): 1\le j\le N_{(m)}^\star \right\} \) in the tree that will be joined to form the surplus edges.

  3. (iii)

    “First” endpoints Fix j and suppose \(s_j \in (y^*(i-1), y^*(i)]\) for some \(i\ge 1\), where \(y^*(i)\) is as given right above (4.19). Then the first endpoint of the surplus edge corresponding to \((s_j, t_j)\) is \(\mathscr {L}_j:= v(i)\).

  4. (iv)

    “Second” endpoints Note that on the interval \((y^*(i-1), y^*(i)]\), the function \(\bar{A}_{(m)}\) has constant height \(a\mathfrak {G}_{(m)}(v(i))\). We will view this height as being partitioned into sub-intervals of length \(a p_u\), one for each \(u\in \mathfrak {P}(v(i),\mathscr {T}^{\mathbf {p},\star }_m)\), the collection of endpoints of permitted edges emanating from \(\mathscr {L}_j\). (Assume that this partitioning is done according to some preassigned rule, e.g., using the order of the vertices in \(\mathfrak {P}(v(i),\mathscr {T}^{\mathbf {p},\star }_m)\)). Suppose \(t_j\) belongs to the sub-interval corresponding to u. Then the second endpoint is \(\mathscr {R}_j = u\). Form an edge between \(\mathscr {L}_j\) and \(\mathscr {R}_j\).

  5. (v)

    In this construction, it is possible that more than one surplus edge is created between the same pair of vertices. Remove any multiple surplus edges. (A simulation sketch of steps (ii)–(iv) follows this list.)
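The following sketch implements steps (ii)–(iv) above (Python; the depth-first order, the weights, and the ordered lists of permitted endpoints \(\mathfrak {P}(v(i),\mathscr {T}^{\mathbf {p},\star }_m)\) are assumed to be given, e.g. computed as in the sketches of Sect. 4.1). It exploits the standard fact that the Poisson points under a piecewise-constant function can be generated interval by interval.

```python
import math, random

def poisson(mean, rng):
    # inverse-transform sampling of a Poisson variate (fine for small means)
    x, term = 0, math.exp(-mean)
    cum, u = term, rng.random()
    while u > cum:
        x += 1
        term *= mean / x
        cum += term
    return x

def surplus_pairs(dfs_order, p, a, permitted_from, rng=random):
    """Steps (ii)-(iv): sample the Poisson points under bar A_{(m)} and
    return the endpoint pairs (L_j, R_j).  permitted_from[v] is the ordered
    list of endpoints of permitted edges emanating from v."""
    pairs = []
    for v in dfs_order:
        ends = permitted_from.get(v, [])
        height = a * sum(p[u] for u in ends)    # bar A_{(m)} = a * G_{(m)}(v)
        # On an interval of length p_v, the points below this height number
        # Poisson(p_v * height), with i.i.d. uniform heights.
        for _ in range(poisson(p[v] * height, rng)):
            t = rng.random() * height           # uniform height t_j
            for u in ends:                      # locate the sub-interval
                t -= a * p[u]                   # of length a * p_u
                if t < 0:
                    pairs.append((v, u))        # (L_j, R_j) = (v(i), u)
                    break
    return pairs
```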

Lemma 4.10

The above construction gives a random graph with distribution \({\tilde{\mathscr {G}}}_m(\mathbf {p},a)\) as in Definition 4.2. Further, conditional on \(\mathscr {T}^{\mathbf {p},\star }_m\):

  1. (a)

    \(N_{(m)}^\star \) has Poisson distribution with mean \(\Lambda _{(m)}(\mathscr {T}_m^{\mathbf {p},\star })\) where \(\Lambda _{(m)}\) is as in (4.20).

  2. (b)

    Conditional on \(\mathscr {T}_m^{\mathbf {p},\star }\) and \(N_{(m)}^\star =k\), the first endpoints \((\mathscr {L}_j: 1\le j\le k)\) can be generated in an i.i.d. fashion by sampling from the vertex set [m] with probability distribution

    $$\begin{aligned} {\mathscr {J}}^{(m)}(v) \propto p_v \mathfrak {G}_{(m)}(v), \quad v\in [m]. \end{aligned}$$
  3. (c)

    Conditional on \(\mathscr {T}_m^{\mathbf {p},\star }\), \(N_{(m)}^\star =k\) and the first endpoints \((\mathscr {L}_j: 1\le j\le k)\), the second endpoints can be generated in an i.i.d. fashion where the probability that \(\mathscr {R}_j = u\) is proportional to \(p_u\) if u is a right child of some individual \(y\in [\rho ,\mathscr {L}_j]\).

Proof

The assertions follow from Proposition 4.3 and standard properties of Poisson processes. \(\square \)

The modified space \(\mathscr {G}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\): We construct a modified graph \(\mathscr {G}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\) as follows:

(i\(^\prime \)):

Generate a tilted ordered \(\mathbf {p}\)-tree \(\mathscr {T}_m^{\mathbf {p},\star }\) with distribution (4.5).

(ii\(^\prime \)):

Conditional on \(\mathscr {T}_m^{\mathbf {p},\star }\), generate \(N_{(m)}^\star \sim \mathsf{Poi}(\Lambda _{(m)}(\mathscr {T}_m^{\mathbf {p},\star }))\).

(iii\(^\prime \)):

Conditional on \(\mathscr {T}_m^{\mathbf {p},\star }\) and \(N_{(m)}^\star =k\), generate the first endpoints \((\mathscr {L}_j: 1\le j\le k)\) in an i.i.d. fashion by sampling from the vertex set [m] with probability distribution

$$\begin{aligned} {\mathscr {J}}^{(m)}(v) \propto p_v \mathfrak {G}_{(m)}(v), \quad v\in [m]. \end{aligned}$$
(iv\(^\prime \)):

Conditional on \(\mathscr {T}_m^{\mathbf {p},\star }\), \(N_{(m)}^\star =k\) and the first endpoints \((\mathscr {L}_j: 1\le j\le k)\), generate the second endpoints in an i.i.d. fashion where conditional on \(\mathscr {L}_j = v\), the probability distribution of \(\mathscr {R}_j\) is given by

$$\begin{aligned} Q_{v}^{(m)}(y):= {\left\{ \begin{array}{ll} \sum _{u} p_u \mathbbm {1}\left\{ u\in \mathscr {R}\mathscr {C}(y,[\rho ,v])\right\} /\mathfrak {G}_{(m)}(v) &{} \text { if } y\in [\rho ,v],\\ 0 &{} \text { otherwise }. \end{array}\right. } \end{aligned}$$
(4.22)

Identify \(\mathscr {L}_j\) and \(\mathscr {R}_j\) for \(1\le j\le k\).

Thus, instead of adding an edge between \(\mathscr {L}_j\) and one of the right children u of a vertex on the path \([\rho ,\mathscr {L}_j]\) as in Lemma 4.10(c), we identify \(\mathscr {L}_j\) with the parent of this right child, which lies on \([\rho ,\mathscr {L}_j]\). Also, we do not remove any multiple surplus edges. This construction turns out to be easier to work with. \(\mathscr {G}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\) will be viewed as a metric measure space via the graph distance, where vertex v has mass \(\sum p_u\), the sum being taken over all \(u\in [m]\) which have been identified with v. Intuitively it is clear that \(\sigma (\mathbf {p}){{\tilde{\mathscr {G}}}}_m(\mathbf {p},a)\) and \(\sigma (\mathbf {p})\mathscr {G}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\) are “close”. This is formalized in Lemma 4.12.
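The only new ingredient in the modified construction is the second-endpoint distribution (4.22). A small sketch (Python; the path \([\rho ,v]\) and the weights \(\sum _{u\in \mathscr {R}\mathscr {C}(y,[\rho ,v])}p_u\) are assumed precomputed):

```python
import random

def sample_second_endpoint_mod(path, rc_weight, rng=random):
    """Step (iv'): given the path [rho, v] (a list of vertices, root first)
    and rc_weight[y] = sum of p_u over u in RC(y, [rho, v]), sample
    y ~ Q_v^{(m)} as in (4.22)."""
    total = sum(rc_weight.get(y, 0.0) for y in path)    # = G_{(m)}(v)
    r = rng.random() * total
    for y in path:
        r -= rc_weight.get(y, 0.0)
        if r < 0:
            return y
    return path[-1]     # guard against floating-point round-off

# After sampling y, identify L_j with y; e.g. maintain a union-find over [m]
# and add the p-masses of identified vertices to obtain the vertex measure
# on G_m^mod(p, a).
```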

Remark 6

At this point we urge the reader to go back to Sect. 2.3.1 and remind themselves of the four steps in the construction of the limit metric space \(\mathscr {G}_{\infty }(\varvec{\theta },\gamma )\), and note the similarities to the construction above. In particular, we make note of the following:

  1. (a)

    For finite m, we essentially tilt the \(\mathbf {p}\)-tree distribution via the functional \({\bar{L}}(\mathscr {T}_m^{\mathbf {p}}) = \exp (a{{\mathrm{\mathbb {E}}}}[\mathfrak {G}_{(m)}(V_1)\ |\ \mathscr {T}_{m}^{\mathbf {p}}])\) (the term \(\mathbb {I}(\mathscr {T}_m^{\mathbf {p}})\) as in (4.8) can be ignored as we will see in Lemma 4.14), and the number of shortcut points selected, namely \(N_{(m)}^\star \), has a Poisson distribution with mean \(a{{\mathrm{\mathbb {E}}}}(\mathfrak {G}_{(m)}(V_1)\ |\ \mathscr {T}_{m}^{\mathbf {p},\star })\). Here \(V_1\) has distribution \(\mathbf {p}\).

  2. (b)

    For the limit object, we tilt the measure using the functional \(L_{(\infty )}(\mathscr {T}_{(\infty )}^{\varvec{\theta }}, \varvec{U}) = \exp (\gamma {{\mathrm{\mathbb {E}}}}[\mathfrak {G}_{(\infty )}(V_1)\ |\ \mathscr {T}_{(\infty )}^{\varvec{\theta }}, \varvec{U}])\), and the number of shortcuts, namely \(N_{(\infty )}^\star \), follows a Poisson distribution with mean \(\gamma {{\mathrm{\mathbb {E}}}}(\mathfrak {G}_{(\infty )}(V_1)\ |\ \mathscr {T}_{(\infty )}^{\varvec{\theta },\star }, \varvec{U}^{\star })\). Here \(V_1\) is distributed according to the mass measure \(\mu ^\star \) on \(\mathscr {T}_{(\infty )}^{\varvec{\theta },\star }\).

As a brief warm-up to the kind of calculations in the next section, we now prove a simple lemma on tightness of the number of surplus edges. We will prove distributional convergence of this object in the next section.

Lemma 4.11

Under Assumption 4.4, the sequence \(\left\{ N_{(m)}^\star :m\ge 1\right\} \) is tight, where \(N_{(m)}^\star \) is as given below (4.21).

Proof

Fix \(r > 1\). First note that conditional on \(\mathscr {T}_m^{\mathbf {p},\star } =\mathbf {t}\), \(N_{(m)}^\star \) has a Poisson distribution with mean \(\Lambda _{(m)}(\mathbf {t})\). Thus, there exists a constant \(C = C(r) \) such that

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}([N_{(m)}^\star ]^r|\mathscr {T}_m^{\mathbf {p},\star } =\mathbf {t}) \le C [\Lambda _{(m)}(\mathbf {t})]^r.\nonumber \end{aligned}$$

Further, note that the tilt \(L(\mathbf {t})\) in (4.4) satisfies

$$\begin{aligned} L(\mathbf {t}):= \mathbb {I}(\mathbf {t}) \exp \left( \sum _{(k,\ell ) \in \mathfrak {P}(\mathbf {t})} a p_k p_{\ell }\right) = \mathbb {I}(\mathbf {t}) \exp (\Lambda _{(m)}(\mathbf {t})),\nonumber \end{aligned}$$

where \(1\le \mathbb {I}(\mathbf {t}) \le C^\prime \) for a fixed constant \(C^\prime \) independent of m by (4.9). Thus, Proposition 4.8 shows that

$$\begin{aligned} \sup _{m\ge 1}\ {{\mathrm{\mathbb {E}}}}\big (\exp (\gamma \Lambda _{(m)}(\mathscr {T}^{\mathbf {p}}_m))\big ) < \infty \nonumber \end{aligned}$$

for any \(\gamma >0\). In particular,

$$\begin{aligned} \sup _{m\ge 1}\ {{\mathrm{\mathbb {E}}}}([N_{(m)}^\star ]^r)&\le \sup _{m\ge 1}\ C{{\mathrm{\mathbb {E}}}}\left( [\Lambda _{(m)}(\mathscr {T}^{\mathbf {p},\star }_m)]^r\right) =C\sup _{m\ge 1}\ \frac{{{\mathrm{\mathbb {E}}}}\left( [\Lambda _{(m)}(\mathscr {T}^{\mathbf {p}}_m)]^r L(\mathscr {T}^{\mathbf {p}}_m)\right) }{{{\mathrm{\mathbb {E}}}}(L(\mathscr {T}^{\mathbf {p}}_m))}<\infty ,\nonumber \end{aligned}$$

which proves tightness of \(\left\{ N_{(m)}^\star :m\ge 1\right\} \). \(\square \)

We conclude this section by proving a lemma which essentially says that it is enough to work with the modified space \(\mathscr {G}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\).

Lemma 4.12

Recall the five-step construction of \({\tilde{\mathscr {G}}}_m(\mathbf {p}, a)\). Construct \(\mathscr {G}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\) on the same space by coupling it with \({\tilde{\mathscr {G}}}_m(\mathbf {p}, a)\) in the obvious way. Then, under Assumption 4.4,

$$\begin{aligned} d_{{{\mathrm{GHP}}}}\left( \sigma (\mathbf {p}){{\tilde{\mathscr {G}}}}_m(\mathbf {p},a),\ \sigma (\mathbf {p})\mathscr {G}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\right) \mathop {\longrightarrow }\limits ^{\mathrm {P}}0. \end{aligned}$$

Proof

Define the event

$$\begin{aligned} F:=\left\{ N_{(m)}^\star \text { equals the number of surplus edges in }{\tilde{\mathscr {G}}}_m(\mathbf {p}, a)\right\} . \end{aligned}$$

In other words, F describes the event in which \({\tilde{\mathscr {G}}}_m(\mathbf {p}, a)\) does not have multiple surplus edges. It is easy to check that

$$\begin{aligned} d_{{{\mathrm{GHP}}}}\left( {{\tilde{\mathscr {G}}}}_m(\mathbf {p},a),\ \mathscr {G}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\right) \le N_{(m)}^\star \ \text { on the set }\ F. \end{aligned}$$

Thus, Lemma 4.11 combined with the assumption \(\sigma (\mathbf {p})\rightarrow 0\) yields the result provided we show that \({{\mathrm{\mathbb {P}}}}(F^c)\rightarrow 0\). To this end, note that

$$\begin{aligned}&{{\mathrm{\mathbb {P}}}}\left( \exists \text { multiple surplus edges between }u\text { and }v\big |\mathscr {T}^{\mathbf {p},\star }_m=\mathbf {t}\right) \\&\quad \quad \quad ={{\mathrm{\mathbb {P}}}}\left( \mathsf{Poi}(ap_u p_v)\ge 2\right) \le c(ap_u p_v)^2 \end{aligned}$$

for every \(u\in [m]\), \(v\in \mathfrak {P}(u,\mathbf {t})\), and some universal positive constant c. Hence

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( F^c\mid \mathscr {T}^{\mathbf {p},\star }_m=\mathbf {t}\right)&\le ca^2\sigma (\mathbf {p})^2\sum _{u\in [m]}p_u^2\sum _{v\in \mathfrak {P}(u,\mathbf {t})}\left( p_v/\sigma (\mathbf {p})\right) ^2\\&\le c(a\sigma (\mathbf {p}))^2\sum _{u\in [m]}p_u^2=c(a\sigma (\mathbf {p}))^2 \sigma (\mathbf {p})^2. \end{aligned}$$

Since \(\sigma (\mathbf {p})\rightarrow 0\) and \(a\sigma (\mathbf {p})\rightarrow \gamma \), \({{\mathrm{\mathbb {P}}}}(F^c)\rightarrow 0\) as desired. \(\square \)

4.5 Completing the Proof of Theorem 4.5

At this point we urge the reader to recall (a) the four steps in the construction of the limit object in Sect. 2.3, (b) the birthday construction of \(\mathbf {p}\)-trees at the end of Sect. 4.2.1, and (c) the definition in Sect. 2.1.2 of the Gromov-weak topology on the space \(\mathscr {S}_*\) of complete separable measured metric spaces. Fix \(\ell \ge 1\) and a bounded continuous function \(\phi :\mathbb {R}_+^{\ell ^2}\rightarrow \mathbb {R}\). Let \(\Phi \) be as in (2.3). To simplify notation, we will write \(\Phi (X)\) instead of \(\Phi (X,d,\mu )\). To prove Theorem 4.5, we need to show that for every fixed \(\ell \ge 1\) and functions \(\phi \) and \(\Phi \) as above,

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}\left[ \Phi \left( \sigma (\mathbf {p})\cdot {\tilde{\mathscr {G}}}_m(\mathbf {p},a)\right) \right] \rightarrow {{\mathrm{\mathbb {E}}}}\left[ \Phi \left( \mathscr {G}_{\infty }(\varvec{\theta },\gamma )\right) \right] \quad \text{ as } m\rightarrow \infty , \end{aligned}$$

where we sample \(\ell \) points according to \(\mathbf {p}\) in \({\tilde{\mathscr {G}}}_m(\mathbf {p},a)\) while we sample \(\ell \) points according to the measure on \(\mathscr {G}_{\infty }(\varvec{\theta },\gamma )\) inherited from the mass measure. Now recall the explicit five step construction of \({\tilde{\mathscr {G}}}_m(\mathbf {p},a)\) in Sect. 4.4 starting from the tilted \(\mathbf {p}\)-tree \(\mathscr {T}_m^{\mathbf {p},\star }\) and the Poisson number of surplus edges \(N_{(m)}^\star \). Fix \(K\ge 1\) and note that

$$\begin{aligned}&\left| {{\mathrm{\mathbb {E}}}}\left[ \Phi \left( \sigma (\mathbf {p}){\tilde{\mathscr {G}}}_m(\mathbf {p},a)\right) \right] - \sum _{k=0}^K {{\mathrm{\mathbb {E}}}}\left[ \Phi \left( \sigma (\mathbf {p}){\tilde{\mathscr {G}}}_m(\mathbf {p},a)\right) \mathbbm {1}\left\{ N_{(m)}^\star =k\right\} \right] \right| \\&\qquad \le ||\phi ||_{\infty } {{\mathrm{\mathbb {P}}}}(N_{(m)}^\star \ge K+1). \end{aligned}$$

Using Lemma 4.11, we can choose K large (independent of m) to make the bound on the right arbitrarily small. Further, in view of Lemma 4.12, we can work with \({\mathscr {G}}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\) instead of \({\tilde{\mathscr {G}}}_m(\mathbf {p},a)\). Hence it suffices to prove the following convergence for every fixed \(k\ge 0\):

$$\begin{aligned}&{{\mathrm{\mathbb {E}}}}\left[ \Phi \left( \sigma (\mathbf {p}){\mathscr {G}}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\right) \mathbbm {1}\left\{ N_{(m)}^\star =k\right\} \right] \nonumber \\&\quad \rightarrow {{\mathrm{\mathbb {E}}}}\left[ \Phi \left( {\mathscr {G}}_{\infty }(\varvec{\theta }, \gamma )\right) \mathbbm {1}\left\{ N_{(\infty )}^\star =k\right\} \right] \ \quad \text { as }\ m\rightarrow \infty . \end{aligned}$$
(4.23)

To analyze this term, we first need to setup some notation.

Note that both the finite-m object and the limit object are obtained by starting with a tree (a discrete tree for finite m and a real tree in the limit) and sampling a random number of pairs of points to create “shortcuts”. Recall the space \(\mathbf {T}_{IJ}^*\) in Sect. 2.1.3. Fix \(k\ge 0\) and let \(\mathbf {t}\) be an element in \(\mathbf {T}_{I,(k+\ell )}^*\) for some \(I\ge 0\); “I” will not play a role in the definition below. Write \(\rho \) for the root and denote the leaves by

$$\begin{aligned} \mathbf {x}_{k,k+\ell }:= (x_1, x_2, \ldots ,x_{k}, x_{k+1}, \ldots , x_{k+\ell }). \end{aligned}$$

Also recall that for each i, there is a probability measure \(\nu _{\mathbf {t},i }(\cdot )\) on the path \([\rho , x_i]\) for \(1\le i\le k+\ell \). For \(1\le i\le k\), sample \(y_i\) according to the distribution \(\nu _{\mathbf {t},i}(\cdot )\) independently for different i and connect \(x_i\) and \(y_i\). Let \(\mathbf {t}'\) denote the (random) tree thus obtained and let \(d_{\mathbf {t}'}\) denote the graph distance on \(\mathbf {t}'\). Define the function \(g^{(k)}_\phi :\mathbf {T}_{I,(k+\ell )}^*\rightarrow \mathbb {R}\) by

$$\begin{aligned} g^{(k)}_{\phi }(\mathbf {t}):= \left\{ \begin{array}{l} {{\mathrm{\mathbb {E}}}}\left[ \phi \left( d_{\mathbf {t}'}(x_i, x_j): k+1\le i\le k+\ell \right) \right] ,\quad \text { if }\mathbf {t}\ne \partial ,\\ 0,\text { if }\mathbf {t}=\partial . \end{array} \right. \end{aligned}$$
(4.24)

In words, we look at the expectation of \(\phi \) applied to the pairwise distances between the last \(\ell \) leaves after sampling \(y_i\) on the path \([\rho , x_i]\) for \(1\le i\le k\) and connecting \(x_i\) and \(y_i\). Note that here the expectation is only taken over the choices of \(y_i\).

Next, given \(\mathbf {t}\in \mathbb {T}_m^{{{\mathrm{ord}}}}\) and \(\varvec{v}:=(v_1, \ldots , v_{r})\) with \(v_i\in [m]\), set \(\mathbf {t}(\varvec{v})\) to be the subtree of \(\mathbf {t}\) spanning the vertices \(\varvec{v}\) and the root provided \(v_1, \ldots , v_{r}\) are all distinct and none of them is an ancestor of another vertex in \(\varvec{v}\). When this condition fails, set \(\mathbf {t}(\varvec{v})=\partial \).

Now, conditional on \(\mathscr {T}_m^{\mathbf {p},\star }\), construct a tree \(\mathscr {T}_m^{\mathbf {p},\star }({\widetilde{\mathbf {V}}}_{k,k+\ell }^{(m)})\) where

  1. (i)

    \({\widetilde{\mathbf {V}}}_{k,k+\ell }^{(m)}:= ({\bar{V}}_1^{(m)}, \ldots , {\bar{V}}_k^{(m)},V_{k+1}^{(m)},\ldots V_{k+\ell }^{(m)})\);

  2. (ii)

    \({\bar{V}}_i^{(m)}\), \(1\le i\le k\) are i.i.d. with the distribution \(\mathscr {J}^{(m)}(\cdot )\) as in Lemma 4.10(b); and

  3. (iii)

    \(V_{k+1}^{(m)}, \ldots V_{k+\ell }^{(m)}\) are i.i.d. with distribution \(\mathbf {p}\). Further, \({\bar{V}}_1^{(m)}, \ldots , {\bar{V}}_k^{(m)},V_{k+1}^{(m)},\ldots V_{k+\ell }^{(m)}\) are jointly independent.

We will drop the superscript and simply write \(V_i\), \({\bar{V}}_i\), etc., when there is no scope for confusion. Note that \(\mathscr {T}_m^{\mathbf {p},\star }({\widetilde{\mathbf {V}}}_{k,k+\ell })=\partial \) whenever \({\bar{V}}_1, \ldots , {\bar{V}}_k,V_{k+1},\ldots , V_{k+\ell }\) are not all distinct or one of them is an ancestor of another vertex in \({\widetilde{\mathbf {V}}}_{k,k+\ell }\). In either of these two cases, the subtree spanned by the root and \({\widetilde{\mathbf {V}}}_{k,k+\ell }\) will have fewer than \(k+\ell \) leaves. We adopt the convention of setting \(\mathscr {T}_m^{\mathbf {p},\star }({\widetilde{\mathbf {V}}}_{k,k+\ell })=\partial \) to make sure that we are always working with a bona fide element of \(\mathbf {T}_{I,(k+\ell )}^*\). However, this makes no difference at all since, by [27, Corollary 15],

$$\begin{aligned} \lim _m\ {{\mathrm{\mathbb {P}}}}\left( \mathscr {T}_m^{\mathbf {p}}(V_1,\ldots ,V_{k+\ell })=\partial \right) =0 \end{aligned}$$

where \(V_1,\ldots ,V_{k+\ell }\) are i.i.d. \(\mathbf {p}\) random variables. Now \(\mathscr {T}_m^{\mathbf {p},\star }\) is obtained by tilting the distribution of \(\mathscr {T}_m^{\mathbf {p}}\), where the tilt \(L(\cdot )\) is uniformly integrable (Proposition 4.8). Further, \({\bar{V}}_i\), \(1\le i\le k\) are i.i.d. with the distribution \(\mathscr {J}^{(m)}(v)\propto p_v\mathfrak {G}_{(m)}(v)\) where \(\max _v \mathfrak {G}_{(m)}(v)\) is stochastically dominated by \(\Vert F^{{{\mathrm{exc}}}, \mathbf {p}}\Vert _{\infty }\) (see (4.12) and the discussion below (4.19)). It thus follows that

$$\begin{aligned} \lim _m\ {{\mathrm{\mathbb {P}}}}\left( \mathscr {T}_m^{\mathbf {p},\star }({\widetilde{\mathbf {V}}}_{k,k+\ell })=\partial \right) =0. \end{aligned}$$
(4.25)

Using (4.25), we see that

$$\begin{aligned}&{{\mathrm{\mathbb {E}}}}\left[ \Phi \left( \sigma (\mathbf {p}){\mathscr {G}}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\right) \mathbbm {1}\left\{ N_{(m)}^\star =k\right\} \right] \nonumber \\&\quad \quad \quad ={{\mathrm{\mathbb {E}}}}\left\{ {{\mathrm{\mathbb {E}}}}_{\mathbf {p},\star }\left[ g_\phi ^{(k)}\left( \sigma (\mathbf {p})\mathscr {T}_m^{\mathbf {p},\star }({\widetilde{\mathbf {V}}}_{k,k+\ell })\right) \right] \mathbbm {1}\left\{ N_{(m)}^\star =k\right\} \right\} +o(1), \end{aligned}$$
(4.26)

where \({{\mathrm{\mathbb {E}}}}_{\mathbf {p},\star }(\cdot ):= {{\mathrm{\mathbb {E}}}}(\cdot |\mathscr {T}_{m}^{\mathbf {p},\star })\). At this point, we also define \({{\mathrm{\mathbb {E}}}}_{\mathbf {p}}(\cdot ):={{\mathrm{\mathbb {E}}}}(\cdot |\mathscr {T}_{m}^{\mathbf {p}})\) where \(\mathscr {T}_{m}^{\mathbf {p}}\) has the original ordered \(\mathbf {p}\)-tree distribution (2.5).

Now since \(\mathscr {J}^{(m)}(v)\propto p_v \mathfrak {G}_{(m)}(v)\), we see that the inner expectation in (4.26) can be simplified as

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}_{\mathbf {p},\star }\left[ g_\phi ^{(k)}\left( \sigma (\mathbf {p})\mathscr {T}_m^{\mathbf {p},\star }({\widetilde{\mathbf {V}}}_{k,k+\ell })\right) \right] = \frac{{{\mathrm{\mathbb {E}}}}_{\mathbf {p},\star }\left[ \prod _{i=1}^k \mathfrak {G}_{(m)}(V_i) g_\phi ^{(k)}\left( \sigma (\mathbf {p})\mathscr {T}_m^{\mathbf {p},\star }({\mathbf {V}}_{k,k+\ell })\right) \right] }{[{{\mathrm{\mathbb {E}}}}_{\mathbf {p},\star }(\mathfrak {G}_{(m)}(V_1))]^k}, \end{aligned}$$
(4.27)

where \(\mathbf {V}_{k, k+\ell } = (V_1, V_2, \ldots V_{k+\ell })\), and \(V_i\) are i.i.d. with distribution \(\mathbf {p}\). Since \(\mathscr {T}_m^{\mathbf {p},\star }\) is sampled according to a tilted \(\mathbf {p}\)-tree distribution, combining (4.26) and (4.27) we get the following result:

Lemma 4.13

Fix \(k\ge 0\). Define the events \(A^{\star }_{(m),k}=\{N^{\star }_{(m)}=k\}\) and \(A_{(m),k}=\{N_{(m)}=k\}\). Then

$$\begin{aligned}&{{\mathrm{\mathbb {E}}}}\left[ \Phi \left( \sigma (\mathbf {p}){\mathscr {G}}_m^{{{\mathrm{mod}}}}(\mathbf {p},a)\right) \mathbbm {1}\left\{ A^{\star }_{(m),k}\right\} \right] \nonumber \\&\quad = C_m{{\mathrm{\mathbb {E}}}}\left[ \frac{{{\mathrm{\mathbb {E}}}}_{\mathbf {p}}\left[ \left( \prod _{i=1}^k \mathfrak {G}_{(m)}(V_i)\right) g_\phi ^{(k)}\left( \sigma (\mathbf {p})\mathscr {T}_m^{\mathbf {p}}({\mathbf {V}}_{k,k+\ell })\right) \right] }{[{{\mathrm{\mathbb {E}}}}_{\mathbf {p}}(\mathfrak {G}_{(m)}(V_1))]^k} L(\mathscr {T}_{m}^{\mathbf {p}})\mathbbm {1}\left\{ A_{(m),k}\right\} \right] \nonumber \\&\qquad +o(1), \end{aligned}$$
(4.28)

where \(C_m = \left\{ {{\mathrm{\mathbb {E}}}}(L(\mathscr {T}_m^{\mathbf {p}}))\right\} ^{-1}\), and L is the tilt as in (4.4). Further, conditional on \(\mathscr {T}_m^{\mathbf {p}}\), \(N_{(m)}\) has a Poisson distribution with mean \(\Lambda _{(m)}(\mathscr {T}_m^{\mathbf {p}})= a{{\mathrm{\mathbb {E}}}}_{\mathbf {p}}(\mathfrak {G}_{(m)}(V))\) as in (4.20), where V has distribution \(\mathbf {p}\) independent of \(\mathscr {T}_m^{\mathbf {p}}\).
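For orientation, the constant \(C_m\) and the factor \(L(\mathscr {T}_m^{\mathbf {p}})\) in (4.28) simply unwind the tilted law: since \(\mathscr {T}_m^{\mathbf {p},\star }\) is distributed as \(\mathscr {T}_m^{\mathbf {p}}\) tilted by \(L\), for every bounded measurable functional F,

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}\left[ F(\mathscr {T}_m^{\mathbf {p},\star })\right] = \frac{{{\mathrm{\mathbb {E}}}}\left[ F(\mathscr {T}_m^{\mathbf {p}})L(\mathscr {T}_m^{\mathbf {p}})\right] }{{{\mathrm{\mathbb {E}}}}\left[ L(\mathscr {T}_m^{\mathbf {p}})\right] } = C_m\, {{\mathrm{\mathbb {E}}}}\left[ F(\mathscr {T}_m^{\mathbf {p}})L(\mathscr {T}_m^{\mathbf {p}})\right] , \end{aligned}$$

applied here with F equal to the inner conditional expectation in (4.26)–(4.27).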

This formula will be the starting point for the proof of (4.23). Recall from (4.8) that the tilt \(L(\cdot ) = \mathbb {I}(\cdot ) {\bar{L}}(\cdot )\), where \(\mathbb {I}(\cdot )\) has a messy form given by (4.9). We have already seen in (4.10) that under Assumption 4.4, \(\mathbb {I}(\cdot )\le C\) for a constant C and all \(m\ge 1\). The following lemma, coupled with the dominated convergence theorem, will imply that we can replace L with \({\bar{L}}\) in Lemma 4.13 and in all of the subsequent analysis:

Lemma 4.14

Under Assumption 4.4, \(\mathbb {I}(\mathscr {T}_m^{\mathbf {p}}) \mathop {\longrightarrow }\limits ^{\mathrm {P}}1\) as \(m\rightarrow \infty \).

Proof

By (4.9) we have \(1\le \mathbb {I}(\mathscr {T}_m^{\mathbf {p}}) \le \exp (a \sum _{(k,l)\in E(\mathscr {T}_m^{\mathbf {p}})} p_k p_l)\). Thus it is enough to show that \(a{{\mathrm{\mathbb {E}}}}(\sum _{(k,l)\in E(\mathscr {T}_m^{\mathbf {p}})} p_k p_l) \rightarrow 0\). Now for \(k\ne l\in [m]\), write \(\left\{ k\leadsto l\right\} \) for the event that l is a child of k in \(\mathscr {T}_m^{\mathbf {p}}\). Then standard properties of \(\mathbf {p}\)-trees [59, Section 6.2] imply that for distinct \(k, l_1, l_2\in [m]\)

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(k\leadsto l_1) = p_k, \quad {{\mathrm{\mathbb {P}}}}(k\leadsto l_1 \text{ and } k\leadsto l_2) = p_k^2. \end{aligned}$$
(4.29)

Thus

$$\begin{aligned} a{{\mathrm{\mathbb {E}}}}\left( \sum _{(k,l)\in E(\mathscr {T}_m^{\mathbf {p}})} p_k p_l\right) = a\sum _{k=1}^m p_k \sum _{l\ne k}p_l p_k \le a\sum _{k=1}^m p_k^2 = a[\sigma (\mathbf {p})]^2\rightarrow 0, \end{aligned}$$

as \(m\rightarrow \infty \) by Assumption 4.4. \(\square \)

Write \({{\mathrm{\mathbb {E}}}}_{\varvec{\theta }}\) for expectation conditional on \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) and the random variables \(U_j^{(i)}\) that encode the order on \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\), i.e.,

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}_{\varvec{\theta }}(\cdot ):={{\mathrm{\mathbb {E}}}}\left( \cdot \ \big |\ \mathscr {T}_{(\infty )}^{\varvec{\theta }},\ \varvec{U}\right) , \end{aligned}$$

and note that \({{\mathrm{\mathbb {E}}}}\left[ \Phi \left( {\mathscr {G}}_{\infty }(\varvec{\theta }, \gamma )\right) \mathbbm {1}\left\{ N_{(\infty )}^\star =k\right\} \right] \) has an expression similar to (4.28). Indeed, from the construction of \({\mathscr {G}}_{\infty }(\varvec{\theta }, \gamma )\) given in Sect. 2.3.1, it follows that

$$\begin{aligned}&{{\mathrm{\mathbb {E}}}}\left[ \Phi \left( {\mathscr {G}}_{\infty }(\varvec{\theta }, \gamma )\right) \mathbbm {1}\left\{ N_{(\infty )}^\star =k\right\} \right] \nonumber \\&\quad = C_{\infty }{{\mathrm{\mathbb {E}}}}\left[ \frac{{{\mathrm{\mathbb {E}}}}_{\varvec{\theta }}\left[ \left( \prod _{i=1}^k \mathfrak {G}_{(\infty )}(V_i^{(\infty )})\right) g_\phi ^{(k)}\left( \mathscr {T}_{(\infty )}^{\varvec{\theta }}({\mathbf {V}}_{k,k+\ell }^{(\infty )})\right) \right] }{\left[ {{\mathrm{\mathbb {E}}}}_{\varvec{\theta }}(\mathfrak {G}_{(\infty )}(V_1^{(\infty )}))\right] ^k}\right. \nonumber \\&\qquad \left. \quad L_{(\infty )}(\mathscr {T}_{(\infty )}^{\varvec{\theta }}, \varvec{U})\mathbbm {1}\left\{ N_{(\infty )} =k\right\} \right] , \end{aligned}$$
(4.30)

where (a) \(\mathfrak {G}_{(\infty )}(\cdot )\) is as defined in (2.7) (b) \(L_{(\infty )}(\mathscr {T}_{(\infty )}^{\varvec{\theta }}, \varvec{U})\) is as in (2.9), (c) \(C_{\infty }=[{{\mathrm{\mathbb {E}}}}L_{(\infty )}(\mathscr {T}_{(\infty )}^{\varvec{\theta }}, \varvec{U})]^{-1}\), (d) \(V_i^{(\infty )}\) are i.i.d. random variables sampled from \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) using the mass measure \(\mu \), (e) \({\mathbf {V}}_{k,k+\ell }^{(\infty )}=(V_1^{(\infty )},\ldots , V_{k+\ell }^{(\infty )})\), (f) \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}({\mathbf {V}}_{k,k+\ell }^{(\infty )})\) is the tree spanned by the root of \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) and \({\mathbf {V}}_{k,k+\ell }^{(\infty )}\), viewed as an element of \(\mathbf {T}_{0, k+\ell }^{*}\) by declaring the leaf values to be \(\mathfrak {G}_{(\infty )}(V_j^{(\infty )})\) and the root-to-leaf measures to be \(Q_{V_j}^{(\infty )}(\cdot )\) as in (2.8), and (g) conditional on \((\mathscr {T}_{(\infty )}^{\varvec{\theta }}, \varvec{U})\), \(N_{(\infty )}\) has a Poisson distribution with mean

$$\begin{aligned} \Lambda _{(\infty )}:=\gamma \int _{y\in \mathscr {T}_{(\infty )}^{\varvec{\theta }}} \mathfrak {G}_{(\infty )}(y)\mu (dy)=\gamma \,{{\mathrm{\mathbb {E}}}}_{\varvec{\theta }}\left[ \mathfrak {G}_{(\infty )}(V_1^{(\infty )})\right] . \end{aligned}$$

Finally, observe that \(L_{(m)}(\cdot )=\mathbb {I}_{(m)}(\cdot ){\bar{L}}_{(m)}(\cdot )\) where \(\bar{L}_{(m)}(\mathbf {t})=\exp (a{{\mathrm{\mathbb {E}}}}_{\mathbf {p}}[\mathfrak {G}_{(m)}(V_1^{(m)})])\), and recall that \(a\sigma (\mathbf {p})\rightarrow \gamma \) (Assumption 4.4) and \({L}_{(m)}(\mathscr {T}_m^{\mathbf {p}})\) is uniformly integrable (Proposition 4.8). Therefore, combining Lemmas 4.13 and 4.14 and (4.30) with Theorem 4.15 stated below yields (4.23) and thus completes the proof of Theorem 4.5.

Theorem 4.15

For each \(k\ge 0\),

$$\begin{aligned}&\left( {{\mathrm{\mathbb {E}}}}_{\mathbf {p}}\left[ \frac{\mathfrak {G}_{(m)}(V_1^{(m)})}{\sigma (\mathbf {p})}\right] , ~{{\mathrm{\mathbb {E}}}}_{\mathbf {p}}\left[ \left( \prod _{i=1}^k \frac{\mathfrak {G}_{(m)}(V_i^{(m)})}{\sigma (\mathbf {p})}\right) g_\phi ^{(k)}\left( \sigma (\mathbf {p})\mathscr {T}_m^{\mathbf {p}}({\mathbf {V}}_{k,k+\ell }^{(m)})\right) \right] \right) \nonumber \\&\quad \mathop {\longrightarrow }\limits ^{d} \left( {{\mathrm{\mathbb {E}}}}_{\varvec{\theta }}\left[ \mathfrak {G}_{(\infty )}(V_1^{(\infty )})\right] ,~ {{\mathrm{\mathbb {E}}}}_{\varvec{\theta }}\left[ \left( \prod _{i=1}^k {\mathfrak {G}_{(\infty )}(V_i^{(\infty )})}\right) g_\phi ^{(k)}\left( \mathscr {T}_{(\infty )}^{\varvec{\theta }}({\mathbf {V}}_{k,k+\ell }^{(\infty )})\right) \right] \right) . \end{aligned}$$
(4.31)

The proof of this theorem is accomplished via the following two theorems, for which we need to set up some notation. Fix \(I\ge 0\) and \(J\ge 1\). We will assume that \(\mathscr {T}_m^{\mathbf {p}}\) has been constructed via the birthday construction (see Sect. 4.2.1). This construction gives rise to an unordered \(\mathbf {p}\)-tree. To obtain an ordered \(\mathbf {p}\)-tree from this, let \(\mathscr {D}_{(m)}(i)\) denote the set of children of i in the \(\mathbf {p}\)-tree for every vertex i. Generate i.i.d. uniform random variables \(\varvec{U}_{(m)}(i):=\left\{ U_{(m),i}(v): v\in \mathscr {D}_{(m)}(i)\right\} \), independently across vertices \(i\in \mathscr {T}_m^{\mathbf {p}}\). Think of these as “ages” of the children and arrange the children from left to right in decreasing order of their ages. We can construct the function \(\mathfrak {G}_{(m)}(\cdot )\) as in (4.18) once this ordering has been defined.
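The ordering step is elementary; the following sketch (our own encoding, with the children \(\mathscr {D}_{(m)}(i)\) stored as sets) makes it concrete:

```python
import random

def order_children(children):
    """Order an unordered tree: give each child an i.i.d. Uniform[0,1]
    "age" and list the children of every vertex from left to right in
    decreasing order of age.

    children : dict mapping each vertex i to its unordered child set D(i)
    returns  : dict mapping i to the ordered list of its children
    """
    ordered = {}
    for i, kids in children.items():
        ages = {v: random.random() for v in kids}   # the variables U_{(m),i}(v)
        ordered[i] = sorted(kids, key=ages.get, reverse=True)
    return ordered
```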

Now recall that the right hand side of (4.7) tells us how to sample J i.i.d. points \((V_1^{(m)}, \ldots , V_J^{(m)})\) from the distribution \(\mathbf {p}\) and the corresponding spanning subtree \(\mathscr {T}_J^{\mathscr {B}}\) from the tree using the repeat time sequence \(\left\{ R_k^{(m)}: k\ge 1\right\} \). Thus, by the Jth repeat time \(R_J\), we would have sampled all J vertices \(V_i^{(m)} = Y_{R_{i}-1}\). View \(\mathscr {T}_J^{\mathscr {B}}\) as a tree with edge lengths and marked vertices as follows: (a) rescale every edge to have length \(\sigma (\mathbf {p})\); (b) relabel \(V_j\) as \(j+\) and the root as \(0+\); (c) mark only those vertices \(i\le I\) that occur in \(\mathscr {T}_J^{\mathscr {B}}\); (d) for all \(1\le j \le J\), set the leaf values to be \(\mathfrak {G}_{(m)}(V_j)/\sigma (\mathbf {p})\), and assign the measure \(\nu ^{(m)}_j:= Q^{(m)}_{V_j}\) as defined in (4.22) to the path connecting the root to \(V_j\), i.e., to the path \([0+, j+]\).
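The sketch below records our reading of the birthday construction referenced above (Sect. 4.2.1 and [27]); the encoding and names are ours, and the precise bookkeeping of \(\mathscr {T}_J^{\mathscr {B}}\) should be taken from there. Values are sampled i.i.d. from \(\mathbf {p}\), each newly seen value attaches by an edge to the sample immediately preceding it, and \(V_j = Y_{R_j-1}\) is the value observed just before the jth repeat.

```python
import random

def birthday_sample(p, J, rng=random.Random(0)):
    """Sample Y_0, Y_1, ... i.i.d. from p, attaching each newly seen value
    to the previous sample; stop at the J-th repeat time R_J and return
    the parent map of the partial tree together with V_j = Y_{R_j - 1}.
    """
    verts = list(range(len(p)))
    seen, parent, V = set(), {}, []
    prev = None
    while len(V) < J:
        y = rng.choices(verts, weights=p, k=1)[0]
        if y not in seen:
            seen.add(y)
            if prev is not None:
                parent[y] = prev   # new vertex attaches to the previous sample
        else:
            V.append(prev)         # repeat time R_j: record V_j = Y_{R_j - 1}
        prev = y
    return parent, V
```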

Definition 4.16

Fix \(I\ge 0, J\ge 1\) and consider the tree constructed as above. Set \(r_{IJ}^{(m)}=\mathscr {R}_{IJ}^{(m)}=\partial \) if some \(j+\) is not a leaf or if some leaf has been multiply labeled. Otherwise, write \(r_{IJ}^{(m)}\in \mathbf {T}_{IJ}\) for the tree with edge lengths and at most I labelled hubs, i.e., where we retain the information in (a)–(c) above. Write \(\mathscr {R}_{IJ}^{(m)}\in \mathbf {T}_{IJ}^*\) for the tree where we retain all of the information in (a)–(d) above, namely the leaf values \(\mathfrak {G}_{(m)}(V_j)/\sigma (\mathbf {p})\) and the root-to-leaf probability measures \(Q_{V_j}^{(m)}(\cdot )\) in addition to (a)–(c).

Now recall the tree \(\mathscr {R}_{IJ}^{(\infty )}\) defined in Sect. 2.2 using the limit ICRT \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\). The main ingredients in the proof of Theorem 4.15 are the following two theorems:

Theorem 4.17

Under Assumption 4.4, \(\mathscr {R}_{IJ}^{(m)} \mathop {\longrightarrow }\limits ^{d}\mathscr {R}_{IJ}^{(\infty )}\) as \(m\rightarrow \infty \) for every fixed \(I\ge 0\) and \(J\ge 1\). This convergence is with respect to the topology defined on \(\mathbf {T}^*_{IJ}\) in Sect. 2.1.3.

The second result we will need is as follows. Recall the function \(g_\phi ^{(k)}\) on \(\mathbf {T}_{I,(k+\ell )}^*\) as in (4.24).

Theorem 4.18

Fix \(I\ge 0\), \(k\ge 0\), \(\ell \ge 2\) and a bounded continuous function \(\phi \) on \(\mathbb {R}^{\ell ^2}\). Then the function \(g_{\phi }^{(k)}\) is continuous on \(\mathbf {T}_{I, (k+\ell )}^*\).

Proof of Theorem 4.15

Assuming Theorems 4.17 and 4.18, let us now show how they complete the proof. Getting a handle directly on the conditional expectations as required in Theorem 4.15 is a little tricky. Naturally, conditional on \(\mathscr {T}_m^{\mathbf {p}}\), repeated sampling of vertices and calculating sample averages should give a good idea of the conditional expectations (and the same for the limit object \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\)). This is made precise in the following simple lemma; its proof is a standard converging-together argument, which we sketch after its statement.

Lemma 4.19

Suppose \(\mathbf {X}^{(m)} := (X^{(m),1},X^{(m),2})\), \(m\in \left\{ 1,2,\ldots \right\} \cup \left\{ \infty \right\} \), is a collection of \(\mathbb {R}^2\)-valued random variables such that for each fixed \(r\ge 1\), there exist random variables \(\mathbf {X}_r^{(m)}:=(X_r^{(m),1} , X_r^{(m),2})\) such that the following hold:

  1. (i)

There exists a constant \(C <\infty \) such that for any \(m\in \left\{ 1,2,\ldots \right\} \cup \left\{ \infty \right\} \), \(r\ge 1\) and \(\varepsilon >0\),

    $$\begin{aligned} \max _{s=1,2}\ {{\mathrm{\mathbb {P}}}}\left( |X^{(m),s} - X_r^{(m),s}| > \varepsilon \right) \le \frac{C}{\varepsilon ^2 r}. \end{aligned}$$
  2. (ii)

    For each fixed \(r\ge 1\), \(\mathbf {X}_r^{(m)}\mathop {\longrightarrow }\limits ^{d}\mathbf {X}_r^{(\infty )}\).

Then \(\mathbf {X}^{(m)}\mathop {\longrightarrow }\limits ^{d}\mathbf {X}^{(\infty )}\).
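The proof of Lemma 4.19 is the standard “converging together” argument; we sketch it here in our own wording since the lemma is used repeatedly. For a bounded Lipschitz function \(f:\mathbb {R}^2\rightarrow \mathbb {R}\) (such functions characterize weak convergence), condition (i) gives, for every \(\varepsilon >0\) and uniformly in \(m\in \left\{ 1,2,\ldots \right\} \cup \left\{ \infty \right\} \),

$$\begin{aligned} \big |{{\mathrm{\mathbb {E}}}}f(\mathbf {X}^{(m)}) - {{\mathrm{\mathbb {E}}}}f(\mathbf {X}_r^{(m)})\big | \le \varepsilon \, {{\mathrm{Lip}}}(f) + 2\Vert f\Vert _{\infty }\, {{\mathrm{\mathbb {P}}}}\left( \Vert \mathbf {X}^{(m)} - \mathbf {X}_r^{(m)}\Vert _\infty > \varepsilon \right) \le \varepsilon \, {{\mathrm{Lip}}}(f) + \frac{4\Vert f\Vert _{\infty } C}{\varepsilon ^2 r}. \end{aligned}$$

Combining this with the triangle inequality and condition (ii), which gives \({{\mathrm{\mathbb {E}}}}f(\mathbf {X}_r^{(m)})\rightarrow {{\mathrm{\mathbb {E}}}}f(\mathbf {X}_r^{(\infty )})\) as \(m\rightarrow \infty \) for each fixed r, and then letting \(r\rightarrow \infty \) followed by \(\varepsilon \rightarrow 0\), yields \({{\mathrm{\mathbb {E}}}}f(\mathbf {X}^{(m)})\rightarrow {{\mathrm{\mathbb {E}}}}f(\mathbf {X}^{(\infty )})\), i.e., \(\mathbf {X}^{(m)}\mathop {\longrightarrow }\limits ^{d}\mathbf {X}^{(\infty )}\).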

We will apply this lemma with the random variables that arise in Theorem 4.15. That is, we set

$$\begin{aligned} X^{(m),1}:= {{\mathrm{\mathbb {E}}}}_{\mathbf {p}}\left[ \frac{\mathfrak {G}_{(m)}(V^{(m)})}{\sigma (\mathbf {p})}\right] , \quad X^{(\infty ),1}:= {{\mathrm{\mathbb {E}}}}_{\varvec{\theta }}\left[ \mathfrak {G}_{(\infty )}(V^{(\infty )})\right] , \end{aligned}$$

and similarly define \(X^{(m),2}\) and \(X^{(\infty ),2}\) to be the second coordinates in the display (4.31). To define \(\mathbf {X}_r^{(m)}\), we proceed as follows. For each fixed \(r\ge 1\), sample a collection of \(J_r:= r+(k+\ell )r\) points, all i.i.d. with distribution \(\mathbf {p}\), from \(\mathscr {T}_m^{\mathbf {p}}\), and think of them as r individual points \((V_1^{(m)}, V_2^{(m)},\ldots , V_r^{(m)})\) and r vectors of dimension \(k+\ell \), namely \(\mathbf {V}^{(m),i}_{k,k+\ell }:= (V_{i1}^{(m)},\ldots , V_{i(k+\ell )}^{(m)})\) for \(1\le i\le r\). Define

$$\begin{aligned} H_{\phi }^{(m)}(i):= \prod _{j=1}^k \frac{\mathfrak {G}_{(m)}(V_{ij}^{(m)})}{\sigma (\mathbf {p})} g_\phi ^{(k)}\left( \sigma (\mathbf {p})\mathscr {T}_m^{\mathbf {p}}({\mathbf {V}}_{k,k+\ell }^{(m),i})\right) ,\quad \text {for}\quad 1\le i\le r. \end{aligned}$$

For \(m=\infty \), sample as above \(J_r\) points using the mass measure \(\mu \) from \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) and define

$$\begin{aligned} H_{\phi }^{(\infty )}(i):= \prod _{j=1}^k {\mathfrak {G}_{(\infty )}(V_{ij}^{(\infty )})} g_\phi ^{(k)}\left( \mathscr {T}_{(\infty )}^{\varvec{\theta }}({\mathbf {V}}_{k,k+\ell }^{(\infty ),i})\right) ,\quad \text {for}\quad 1\le i\le r. \end{aligned}$$

Now define

$$\begin{aligned} X_r^{(m),1}&:= \frac{\sum _{i=1}^r\mathfrak {G}_{(m)}(V_i^{(m)})}{r\sigma (\mathbf {p})}\ \quad \text { for }\ m\in \left\{ 1, 2,\ldots \right\} ,\\ X_r^{(\infty ),1}&:= \frac{\sum _{i=1}^r\mathfrak {G}_{(\infty )}(V_i^{(\infty )})}{r}, \quad \text { and}\\ X_r^{(m),2}&:= \frac{\sum _{i=1}^r H_{\phi }^{(m)}(i)}{r}\ \quad \text { for }\ m\in \left\{ 1, 2,\ldots \right\} \cup \left\{ \infty \right\} . \end{aligned}$$

Let \(\mathbf {X}_r^{(m)}:=(X_r^{(m),1},X_r^{(m),2})\) for \(m\in \left\{ 1, 2,\ldots \right\} \cup \left\{ \infty \right\} \). To complete the proof of the theorem, we have to check the two conditions of Lemma 4.19. Let us check condition (i) of Lemma 4.19 for the first coordinate. The second coordinate can be handled in an identical fashion.

Applying Chebyshev’s inequality conditional on \(\mathscr {T}_m^{\mathbf {p}}\) and then taking expectations, we get

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(|X^{(m),1} - X_r^{(m),1}|>\varepsilon )\le (\varepsilon ^2 r)^{-1}{{\mathrm{\mathbb {E}}}}({{\mathrm{Var}}}_{\mathbf {p}}(\mathfrak {G}_{(m)}(V_1)/\sigma (\mathbf {p})))=: (\varepsilon ^2 r)^{-1} C_{(m)}, \text{ say }, \end{aligned}$$

where \({{\mathrm{Var}}}_{\mathbf {p}}\), defined analogously to \({{\mathrm{\mathbb {E}}}}_{\mathbf {p}}\), is the conditional variance operator. Obviously

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}\left( {{\mathrm{Var}}}_{\mathbf {p}}\left( \frac{\mathfrak {G}_{(m)}(V_1)}{\sigma (\mathbf {p})}\right) \right) \le {{\mathrm{Var}}}\left( \frac{\mathfrak {G}_{(m)}(V_1)}{\sigma (\mathbf {p})}\right) \le {{\mathrm{\mathbb {E}}}}\left( \left( \frac{\mathfrak {G}_{(m)}(V_1)}{\sigma (\mathbf {p})}\right) ^2\right) . \end{aligned}$$

From the argument given below (4.11), it follows that \(\Vert \mathfrak {G}_{(m)}\Vert _{\infty } \le ||F^{{{\mathrm{exc}}},\mathbf {p}}||_{\infty }\). Hence Lemma 4.9 implies that \(\sup _m\ C_{(m)} < \infty \). This verifies (i) of the lemma.

Let us now verify condition (ii) of the lemma. Writing this out explicitly, we have to show for each fixed \(r\ge 1\),

$$\begin{aligned} \left( \frac{\sum _{i=1}^r\mathfrak {G}_{(m)}(V_i^{(m)})}{r\sigma (\mathbf {p})}, \frac{\sum _{i=1}^r H_{\phi }^{(m)}(i)}{r}\right) \mathop {\longrightarrow }\limits ^{d}\left( \frac{\sum _{i=1}^r\mathfrak {G}_{(\infty )}(V_i^{(\infty )})}{r}, \frac{\sum _{i=1}^r H_{\phi }^{(\infty )}(i)}{r}\right) . \end{aligned}$$
(4.32)

To this end, for each \(m\in \left\{ 1,2,\ldots \right\} \cup \left\{ \infty \right\} \), consider the subtree spanning the \(J_r\) points \((V_i^{(m)})_{1\le i\le r}, (\mathbf {V}_{k,k+\ell }^{(m),i})_{1\le i\le r}\), viewed as an element of \(\mathbf {T}_{IJ}^*\) as in Definition 4.16. Using Theorem 4.17 and continuity of the function \(g_\phi ^{(k)}\) from Theorem 4.18, we get

$$\begin{aligned} \left( \left( \frac{\mathfrak {G}_{(m)}(V_i)}{\sigma (\mathbf {p})}\right) _{1\le i\le r}, \left( H_{\phi }^{(m)}(i)\right) _{1\le i\le r}\right) \mathop {\longrightarrow }\limits ^{d}\left( \left( {\mathfrak {G}_{(\infty )}(V_i)}\right) _{1\le i\le r}, \left( H_{\phi }^{(\infty )}(i)\right) _{1\le i\le r}\right) \end{aligned}$$

with respect to weak convergence on \(\mathbb {R}^{2r}\), which in turn implies (4.32). This completes the verification of the conditions of Lemma 4.19 and thus the proof of Theorem 4.15. \(\square \)

The rest of this section proves Theorems 4.17 and 4.18.

Proof of Theorem 4.17

The proof will rely on a truncation argument that is qualitatively similar to Lemma 4.19. Fix a truncation level \(R\ge 1\). Recall the definition of \(\mathfrak {G}_{(m)}(v)\) from (4.18), which kept track of the contribution of all right children of vertices i on the path \([\rho ,v]\). We will look at a truncated version of this object where we keep track of the potential contributions of only the first R vertices. More precisely, let

$$\begin{aligned} \mathfrak {G}_{(m)}^{R}(v):= \sum _{\begin{array}{c} i\in [\rho ,v]\\ i\le R \end{array}} \sum _{j\in [m]} p_j \mathbbm {1}\left\{ j\in \mathscr {R}\mathscr {C}(i,[\rho ,v])\right\} . \end{aligned}$$
(4.33)

Let \(\mathfrak {G}_{(\infty )}^R(\cdot )\) be the analogous modification of \(\mathfrak {G}_{(\infty )}(\cdot )\) defined in (2.7), i.e.,

$$\begin{aligned} \mathfrak {G}_{(\infty )}^R(v)=\sum _{i\le R}\theta _{i}\left[ \sum _{j\ge 1}U_j^{(i)}\times \mathbbm {1}\left\{ v\in \mathscr {T}_j^{(i)}\right\} \right] . \end{aligned}$$

Similarly modify the “second endpoint” measure in (4.22) to keep track of only ancestors with labels \(\le R\), namely

$$\begin{aligned} Q_{v}^{(m), R}(y):= {\left\{ \begin{array}{ll} \sum _{u} p_u \mathbbm {1}\left\{ u\in \mathscr {R}\mathscr {C}(y,[\rho ,v])\right\} /\mathfrak {G}_{(m)}^{R}(v), &{} \text { if } y\in [\rho ,v] \text { and } y\le R,\\ 0, &{} \text { otherwise}. \end{array}\right. } \end{aligned}$$

Note that this does not make sense if \(\mathfrak {G}_{(m)}^{R}(v) = 0\), i.e., when there is no vertex with label \(\le R\) on the path from the root to v. In this case we follow the convention of defining the measure to be the uniform probability measure on the line \([\rho ,v]\). Define \(Q_{v}^{(\infty ), R}(\cdot )\) on \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) in an analogous fashion.

Consider the tree \(r_{IJ}^{(m)}\) as in Definition 4.16, and assign to leaf \(V_j\) the truncated measure \(Q_{V_j}^{(m), R}(\cdot )\) and leaf value \(\mathfrak {G}_{(m)}^{R}(V_j)/\sigma (\mathbf {p})\) (instead of \(Q_{V_j}^{(m)}(\cdot )\) and \(\mathfrak {G}_{(m)}(V_j)/\sigma (\mathbf {p})\)). We denote the resulting object (which is an element of \(\mathbf {T}_{IJ}^*\)) by \(\mathscr {R}_{IJ}^{(m),R}\). Similarly construct \(\mathscr {R}_{IJ}^{(\infty ),R}\).

Proposition 4.20

The following hold:

  1. (a)

    For all \(R\ge 1\), \(\mathscr {R}_{IJ}^{(m), R} \mathop {\longrightarrow }\limits ^{d}\mathscr {R}_{IJ}^{(\infty ), R}\).

  2. (b)

    \(\mathscr {R}_{IJ}^{(\infty ), R} \mathop {\longrightarrow }\limits ^{d}\mathscr {R}_{IJ}^{(\infty )}\) as \(R\rightarrow \infty \).

  3. (c)

    For any bounded continuous function \(f:\mathbf {T}_{IJ}^* \rightarrow \mathbb {R}\),

    $$\begin{aligned} \limsup _{R\rightarrow \infty }\limsup _{m\rightarrow \infty }\left| {{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(m), R})) - {{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(m)}))\right| = 0. \end{aligned}$$

Assuming this proposition, we now complete the proof of Theorem 4.17. Note that for any fixed bounded continuous function f on \(\mathbf {T}_{IJ}^*\) and any truncation level \(R\ge 1\), we have

$$\begin{aligned} |{{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(\infty )})) - {{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(m)}))|&\le |{{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(\infty )})) - {{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(\infty ),R}))|\\&+|{{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(\infty ),R})) - {{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(m),R}))|\\&+|{{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(m),R})) - {{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(m)}))|. \end{aligned}$$

Now letting \(m\rightarrow \infty \) and then letting \(R\rightarrow \infty \) and using Proposition 4.20 completes the proof. \(\square \)

We next prove Proposition 4.20.

4.6 Proof of Proposition 4.20

We start with three preliminary lemmas. Recall that \(\left\{ i\leadsto j\right\} \) denotes the event that j is a child of i in \(\mathscr {T}_m^{\mathbf {p}}\).

Lemma 4.21

Under Assumption 4.4, for each fixed \(i\ge 1\),

$$\begin{aligned} \left| \frac{\sum _{j\in [m]}p_j \mathbbm {1}\left\{ i\leadsto j\right\} }{\sigma (\mathbf {p})} - \frac{p_i}{\sigma (\mathbf {p})}\right| \mathop {\longrightarrow }\limits ^{\mathrm {P}}0 \quad \text{ as } m\rightarrow \infty . \end{aligned}$$

Proof

Recall from (4.29) that for fixed i, the collection of events \(\left\{ \left\{ i\leadsto j\right\} : j\ne i\right\} \) are pairwise independent and have the same probability \(p_i\). Thus

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}\left( \frac{\sum _{j\in [m]}p_j \mathbbm {1}\left\{ i\leadsto j\right\} }{\sigma (\mathbf {p})}\right) = \frac{p_i}{\sigma (\mathbf {p})}\left( \sum _{j\ne i} p_j\right) = (1-p_i)\frac{p_i}{\sigma (\mathbf {p})}, \end{aligned}$$

and

$$\begin{aligned} {{\mathrm{Var}}}\left( \frac{\sum _{j\in [m]}p_j \mathbbm {1}\left\{ i\leadsto j\right\} }{\sigma (\mathbf {p})}\right) = \sum _{j\in [m]} \frac{p_j^2}{\sigma ^2(\mathbf {p})}{{\mathrm{Var}}}(\mathbbm {1}\left\{ i\leadsto j\right\} )\le p_i. \end{aligned}$$

This completes the proof as \(\max _{i\in [m]}p_i = p_1\rightarrow 0\) and \(p_i/\sigma (\mathbf {p})\rightarrow \theta _i\) under Assumption 4.4. \(\square \)

Lemma 4.22

Under Assumption 4.4, for each fixed \(i\ge 1\),

$$\begin{aligned} \max _{j: i\leadsto j} \frac{p_j}{\sigma (\mathbf {p})} \mathop {\longrightarrow }\limits ^{\mathrm {P}}0, \quad \text{ as } m\rightarrow \infty . \end{aligned}$$

Proof

Fix \(\varepsilon >0\) and write

$$\begin{aligned} \mathscr {N}_{\varepsilon }(m):= \left\{ j: p_j\ge \sigma (\mathbf {p})\varepsilon \right\} \ \text { and }\ n_{\varepsilon }(m)=|\mathscr {N}_{\varepsilon }(m)|. \end{aligned}$$

Note that by Assumption 4.4, for every \(\varepsilon > 0\), \(\left\{ n_\varepsilon (m):m\ge 1\right\} \) is a bounded sequence. Further, (4.29) and the union bound yield

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \max _{j: i\leadsto j} \frac{p_j}{\sigma (\mathbf {p})}> \varepsilon \right) \le \sum _{j\in \mathscr {N}_{\varepsilon }(m)} p_i = n_\varepsilon {(m)}p_i\rightarrow 0, \end{aligned}$$

as \(\max _{i\in [m]}p_i =p_1\rightarrow 0\). \(\square \)

Recall that \(\mathscr {D}_m(i)\) is the set of children of vertex i in \(\mathscr {T}_m^{\mathbf {p}}\). For later use let \(d_m(i):=|\mathscr {D}_m(i)|\) denote the number of children of i in \(\mathscr {T}_m^{\mathbf {p}}\). Note that Lemma 4.21 together with the lemma just proven gives

$$\begin{aligned} d_m(i)\mathop {\longrightarrow }\limits ^{\mathrm {P}}\infty , \quad \text{ as } m\rightarrow \infty . \end{aligned}$$
(4.34)

Lemma 4.23

For each fixed m, let \(\mathbf {q}(m):= (q_1,q_2,\ldots q_d)\) be a probability mass function with \(q_i > 0\) for all \(i\in [d]\), where \(d = d(m)\rightarrow \infty \) as \(m\uparrow \infty \). Assume further that \(q_{\max }:=\max _{i\in [d]} q_i\rightarrow 0 \) as \(m\rightarrow \infty \). Let \(\left\{ U_i^{(m)}:1\le i\le d\right\} \) be i.i.d. Uniform[0, 1] random variables and consider the function

$$\begin{aligned} W_m(t):= \sum _{i=1}^d q_i \mathbbm {1}\left\{ U_i^{(m)} \le t\right\} - t, \quad t\in [0,1]. \end{aligned}$$

Then \(\sup _{t\in [0,1]} |W_m(t)| \mathop {\longrightarrow }\limits ^{\mathrm {P}}0\) as \(m\rightarrow \infty \).

Proof

Recall the proof of Lemma 4.9 where we studied the tightness of the tilt. Then replacing \(\mathbf {p}\) in the proof by \(\mathbf {q}\), the quantity of interest is \(\sup _{t\in [0,1]} |W_m(t)| = \sigma (\mathbf {q})\mathscr {R}_1(m)\) where \(\mathscr {R}_1(m)\) is as defined in (4.3) and \(\sigma (\mathbf {q}):=\sqrt{\sum _i q_i^2}\). Now (4.14) and (4.15) imply the existence of a constant C (independent of m) such that for all m and \(x\ge e\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(\sup _{t\in [0,1]} |W_m(t)| > x\sigma (\mathbf {q}))\le \exp (-C x\log (\log {x})). \end{aligned}$$

Since \(\sigma (\mathbf {q})\le \sqrt{q_{\max }} \rightarrow 0\) as \(m\rightarrow \infty \), this completes the proof. \(\square \)
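Lemma 4.23 is easy to check numerically. The following Python sketch is an illustration of ours (the choice \(q_i\propto (i+1)^{-1/2}\) is arbitrary, subject to \(q_{\max }\rightarrow 0\)); it computes \(\sup _{t\in [0,1]}|W_m(t)|\) exactly by scanning the jump points of the piecewise-linear path.

```python
import random

def sup_abs_W(q):
    """Exact sup_{t in [0,1]} |W(t)| for W(t) = sum_i q_i 1{U_i <= t} - t.

    Between jumps W decreases with slope -1 and it jumps up by q_i at U_i,
    so |W| is maximized just before or just after a jump, or at t = 1.
    """
    jumps = sorted((random.random(), qi) for qi in q)
    best, w, t_prev = 0.0, 0.0, 0.0
    for t, qi in jumps:
        w -= t - t_prev              # linear decrease up to the jump
        best = max(best, abs(w))     # value just before the jump
        w += qi                      # jump of size q_i at U_i
        best = max(best, abs(w))     # value just after the jump
        t_prev = t
    w -= 1.0 - t_prev                # final decrease on (last jump, 1]
    return max(best, abs(w))

if __name__ == "__main__":
    for d in (10, 100, 1000, 10000):
        raw = [(i + 1) ** -0.5 for i in range(d)]
        s = sum(raw)
        print(d, sup_abs_W([x / s for x in raw]))  # shrinks as q_max -> 0
```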

We now have all the ingredients for the proof of Proposition 4.20. We prove parts (a), (b) and (c) one by one.

Proof of Proposition 4.20(a)

Recall from Definition 4.16 the tree \(r_{IJ}^{(m)}\) that contains all the edge lengths and hub information in \(\mathscr {R}_{IJ}^{(m)}\) but ignores the root-to-leaf measures and leaf values \(\mathfrak {G}_{(m)}(\cdot )\). By [27, Corollary 15] or [13, Proposition 3], for fixed \(J\ge 1\), we have

$$\begin{aligned} \left( r_{I^{\prime }J}^{(m)}: I^{\prime }\ge 0\right) \mathop {\longrightarrow }\limits ^{d}\left( r_{I^{\prime } J}^{(\infty )}: I^{\prime }\ge 0\right) \end{aligned}$$
(4.35)

with respect to the product topology on \(\prod _{I^\prime \ge 0} \mathbf {T}_{I^{\prime } J}\). Using Lemmas 4.21 and 4.22 and Skorohod embedding, we may assume that we are working on a probability space that supports a sequence of unordered \(\mathbf {p}\)-trees \(\left\{ \mathscr {T}_m^{\mathbf {p},{{\mathrm{uo}}}}:m\ge 1\right\} \), sampled vertices \(\left\{ V_j^{(m)}:1\le j\le J, m\ge 1\right\} \) using the associated sequence of probability mass functions \(\left\{ \mathbf {p}(m):m\ge 1\right\} \), an ICRT \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\), and sampled vertices \(\left\{ V_j^{(\infty )}:1\le j\le J\right\} \) using the mass measure, such that the following hold:

  1. (A)

    Convergence in (4.35) happens almost surely:

$$\begin{aligned} \left( r_{I^{\prime }J}^{(m)}: I^{\prime }\ge 0\right) \mathop {\longrightarrow }\limits ^{\mathrm {a.s.}}\left( r_{I^{\prime } J}^{(\infty )}: I^{\prime }\ge 0\right) \quad \text { as }m\rightarrow \infty \end{aligned}$$
    (4.36)

    coordinatewise, where the underlying tree corresponding to \(r_{I^{\prime }J}^{(m)}\) is spanned by the root of \(\mathscr {T}_m^{\mathbf {p},{{\mathrm{uo}}}}\) and \(V_j^{(m)},\ 1\le j\le J\).

  2. (B)

    Writing \(s_m(i):= \sum _{v\in \mathscr {D}_m(i)} p_v\) for the sum of weights of children of i in \(\mathscr {T}_m^{\mathbf {p},{{\mathrm{uo}}}}\), we have

    $$\begin{aligned} \left( \frac{s_m(i)}{\sigma (\mathbf {p})}:i\ge 1\right) \mathop {\longrightarrow }\limits ^{\mathrm {a.s.}}\left( \theta _i:i\ge 1\right) \end{aligned}$$
    (4.37)

    coordinatewise. (We can assume that this holds because of Lemma 4.21).

  3. (C)

    For fixed hub \(i\ge 1\) and \(m\ge 1\), write

    $$\begin{aligned} q_{m,i}(v):= \frac{p_v}{s_m(i)} , \qquad v\in \mathscr {D}_m(i), \qquad q_{m,i}^{\max }:=\max _{v\in \mathscr {D}_m(i)} q_{m,i}(v). \end{aligned}$$
    (4.38)

    Then we assume (using Lemma 4.22 and (4.34)) that for all \(i\ge 1\)

    $$\begin{aligned}q_{m,i}^{\max }\mathop {\longrightarrow }\limits ^{\mathrm {a.s.}}0\text { and }d_m(i)\mathop {\longrightarrow }\limits ^{\mathrm {a.s.}}\infty .\end{aligned}$$

Now, for each \(z\in [m]\) and \(i\ge 1\), if \(i\in [\rho ,z]\) (where \(\rho =\rho _m\) is the root of \(\mathscr {T}_m^{\mathbf {p},{{\mathrm{uo}}}}\)), write \(c(i;z)\in \mathscr {D}_m(i)\) for the child of i that is the ancestor of z. Next, construct a collection \(\left\{ U_{m,i}(v):m\ge 1,i\ge 1, v\in [m]\right\} \) of uniform[0, 1] random variables on the same space such that

  1. (a)

    \(\left\{ \mathscr {T}_m^{\mathbf {p},{{\mathrm{uo}}}}, U_{m,i}(v):i\ge 1, v\in [m]\right\} \) are jointly independent for each \(m\ge 1\); and

  2. (b)

    for each \(i\le R\) and \(j\le J\) for which \(i\in [\rho , V_j^{(\infty )}]\), \(U_{m,i}\left( c(i;V_j^{(m)})\right) \) is a constant sequence (in m) eventually.

As described below Theorem 4.15, we can use these uniform random variables to generate the sequence of ordered \(\mathbf {p}\)-trees \(\left\{ \mathscr {T}_m^{\mathbf {p}}\right\} \) from \(\left\{ \mathscr {T}_m^{\mathbf {p},{{\mathrm{uo}}}}\right\} \) as follows: Let \(\varvec{U}_{m,i}:=\left\{ U_{m,i}(v): v\in \mathscr {D}_{m}(i)\right\} \). Think of these as “ages” of the children and arrange the children from left to right in decreasing order of their ages.

Once this ordering has been defined, we can construct the function \(\mathfrak {G}_{(m)}(\cdot )\) as in (4.18). In this case we can write this function explicitly in terms of the associated uniform random variables as follows. Define

$$\begin{aligned} \mathbb {O}_{(m),i}(z)&:= \frac{1}{\sigma (\mathbf {p})} \mathbbm {1}\left\{ i\in [\rho ,z]\right\} \sum _{v\in \mathscr {D}_m(i)} p_v \mathbbm {1}\left\{ U_{m,i}(v)< U_{m,i}(c(i;z)) \right\} \\&= \mathbbm {1}\left\{ i\in [\rho ,z]\right\} \frac{s_m(i)}{\sigma (\mathbf {p})} \sum _{v\in \mathscr {D}_m(i)} q_{m,i}(v) \mathbbm {1}\left\{ U_{m,i}(v) < U_{m,i}(c(i;z))\right\} . \end{aligned}$$

Then

$$\begin{aligned} \left( \sigma (\mathbf {p})\right) ^{-1}\mathfrak {G}_{(m)}^{R}(z)= \sum _{i\le R} \mathbb {O}_{(m), i}(z). \end{aligned}$$
(4.39)

Similarly, the root-to-leaf measure \(Q_{v}^{(m), R}\) (recall (4.33)) can also be expressed in terms of this function.

Now using (4.36), for every fixed hub \(i\le R\), \(j\le J\), and a.e. sample point \(\omega \), one of the following two holds:

  1. (a)

    \(i\notin [\rho , V_j^{(\infty )}]\), in which case there exists \(m= m(\omega )\) such that \(i\notin [\rho , V_j^{(m)}]\) for all \(m> m(\omega )\).

  2. (b)

    \(i\in [\rho , V_j^{(\infty )}]\), in which case there exists \(m= m(\omega )\) such that \(i\in [\rho , V_j^{(m)}]\) for all \(m > m(\omega )\).

When the latter happens, using Lemma 4.23 together with (4.37) and (4.38), we get

$$\begin{aligned} \bigg |\mathbb {O}_{(m), i}(V_j^{(m)}) - \theta _i U_{m, i}\left( c(i;V_j^{(m)})\right) \bigg |\mathop {\longrightarrow }\limits ^{\mathrm {P}}0 \quad \text{ as } m\rightarrow \infty . \end{aligned}$$

By construction, \(U_{m, i}\left( c(i;V_j^{(m)})\right) \) is eventually constant in m on the event \(\left\{ i\in [\rho , V_j^{(\infty )}]\right\} \). This immediately implies convergence of the (scaled) truncated leaf values \(\mathfrak {G}_{(m)}^{R}(V_j^{(m)})/\sigma (\mathbf {p})\) [see (4.39)] for \(1\le j\le J\) and, similarly, of the truncated root-to-leaf measures \(Q_{V_j^{(m)}}^{(m), R}\), jointly with the convergence in (4.36); this yields \(\mathscr {R}_{IJ}^{(m), R} \mathop {\longrightarrow }\limits ^{d}\mathscr {R}_{IJ}^{(\infty ), R}\). \(\square \)

Proof of Proposition 4.20(b)

Recall from Sect. 2.2 that \(\mathscr {R}_{IJ}^{(\infty )}\) is obtained by applying the stick-breaking construction to \([0,\eta _J]\), and leaf \(j+\) in \(\mathscr {R}_{IJ}^{(\infty )}\) corresponds to the vertex coming from \(\eta _j\). It is easy to see from the definition of \(\mathfrak {G}_{(\infty )}^{R}\) and \(Q_{v}^{(\infty ),R}\) that it suffices to prove

$$\begin{aligned} 0\le \mathscr {E}_R^{(1)}:= \sum _{j=1}^J (\mathfrak {G}_{(\infty )}(\eta _j) - \mathfrak {G}_{(\infty )}^{R}(\eta _j)) \mathop {\longrightarrow }\limits ^{\mathrm {P}}0, \quad \text{ as } R\rightarrow \infty . \end{aligned}$$

For every hub \(i\ge 1\) and leaf \(\eta _j\), write \(\left\{ i\rightarrow \eta _j\right\} \) if \(\eta _j\) is a descendant of i (namely \(i\in [\rho ,\eta _j]\)). Then note that

$$\begin{aligned} \mathscr {E}_R^{(1)}=\sum _{j=1}^J \sum _{i=R+1}^{\infty } \sum _{k=1}^{\infty } \theta _i U_{k}^{(i)} \mathbbm {1}\left\{ \eta _j\in \mathscr {T}_k^{(i)}\right\} \le \sum _{j=1}^J \sum _{i=R+1}^\infty \theta _i \mathbbm {1}\left\{ i\rightarrow \eta _j\right\} =: \mathscr {E}_R^{(2)}. \end{aligned}$$

Thus, it is enough to show that given \(\varepsilon > 0\), we can find \(R= R(\varepsilon ) < \infty \) such that \({{\mathrm{\mathbb {P}}}}(\mathscr {E}_{R}^{(2)} > \varepsilon )<\varepsilon \). To this end, first choose \(K_{\varepsilon }\) large enough so that \({{\mathrm{\mathbb {P}}}}(\eta _J > K_\varepsilon ) < \varepsilon /2\), and then choose \(R_\varepsilon \) large enough so that

$$\begin{aligned} \frac{J K_{\varepsilon }}{\varepsilon }\sum _{R_{\varepsilon }+1}^\infty \theta _i^2 < \varepsilon /2. \end{aligned}$$

Then note that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(\mathscr {E}_{R_\varepsilon }^{(2)}> \varepsilon )&\le {{\mathrm{\mathbb {P}}}}(\eta _J> K_\varepsilon )\\&\quad +{{\mathrm{\mathbb {P}}}}\left( J\sum _{i=R_\varepsilon +1}^\infty \theta _i \mathbbm {1}\left\{ \text {hub } i \text { appears before time } K_\varepsilon \right\} > \varepsilon \right) \\&\le \frac{\varepsilon }{2}+ \frac{J}{\varepsilon }\sum _{i=R_{\varepsilon }+1}^\infty \theta _i \left( 1-\exp (-\theta _i K_\varepsilon )\right) \\&< \varepsilon \qquad \text { by the choice of } R_{\varepsilon }, \end{aligned}$$

where the first term in the second inequality follows from the choice of \(K_\varepsilon \), while the second term comes from Markov's inequality and the stick-breaking construction of \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) using the countable collection of Poisson point processes: hub i appears before time \(K_\varepsilon \) with probability \(1-\exp (-\theta _i K_\varepsilon )\). The final inequality uses \(1-\exp (-\theta _i K_\varepsilon )\le \theta _i K_\varepsilon \) together with the choice of \(R_\varepsilon \). This completes the proof. \(\square \)

Proof of Proposition 4.20(c)

Recall that the tree \(\mathscr {R}_{IJ}^{(m)}\) (and \(\mathscr {R}_{IJ}^{(m),R}\)) can be thought of as being made up of \(2J+1\) coordinates:

  1. (a)

One coordinate for the shape and edge length information along with the hub labels \(\le I\), namely \(r_{IJ}^{(m)}\) (see Definition 4.16). Note that this is the same for both \(\mathscr {R}_{IJ}^{(m)}\) and \(\mathscr {R}_{IJ}^{(m),R}\).

  2. (b)

    J coordinates for the leaf values \(\mathfrak {G}_{(m)}(V_j)/\sigma (\mathbf {p})\) (resp. \(\mathfrak {G}_{(m)}^{R}(V_j)/\sigma (\mathbf {p})\)).

  3. (c)

    J coordinates for the measured metric spaces \(\mathscr {M}_j^{(m)}:= ([\rho , V_j^{(m)}],\ Q_{V_j}^{(m)})\) (resp. \(\mathscr {M}_j^{(m), R}:= ([\rho , V_j^{(m)}],\ Q_{V_j}^{(m), R})\)).

Since \(\mathbf {T}_{IJ}^*\) assumes the product topology on these coordinates, it is enough to show the required estimate in Proposition 4.20 (c) with functions of the form

$$\begin{aligned} f(\mathbf {t}, (a_j)_{1\le j\le J}, (M_j)_{1\le j\le J}):= F(\mathbf {t}) \prod _{1\le j\le J} g_j(a_j) \prod _{1\le j\le J} h_j(M_j). \end{aligned}$$

Here \(\mathbf {t}\in \mathbf {T}_{IJ}\), \(a_j\in \mathbb {R}\) are the associated leaf values, \(M_j\) are the paths from the root to leaf j with an associated probability measure, and F, \(g_j\) and \(h_j\) are bounded uniformly continuous functions on the spaces \(\mathbf {T}_{IJ}\), \(\mathbb {R}\) and \(\mathscr {S}\) (measured compact metric spaces) respectively. To simplify notation, we will simply write this as \(f(\mathbf {t})\).

Now we can go from \(\mathscr {R}_{IJ}^{(m)}\) to \(\mathscr {R}_{IJ}^{(m), R}\) by flipping one coordinate at a time. Thus writing

$$\begin{aligned} f^{(-i)}_1(\mathbf {t}):= & {} F(\mathbf {t}) \prod _{\begin{array}{c} 1\le j\le J\\ j\ne i \end{array}} g_j(a_j) \prod _{1\le j\le J} h_j(M_j), \qquad \\ f^{(-i)}_2(\mathbf {t}):= & {} F(\mathbf {t}) \prod _{1\le j\le J} g_j(a_j) \prod _{\begin{array}{c} 1\le j\le J\\ j\ne i \end{array}} h_j(M_j), \end{aligned}$$

we get

$$\begin{aligned} |{{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(m)}))-{{\mathrm{\mathbb {E}}}}(f(\mathscr {R}_{IJ}^{(m), R}))|&\le \sum _{j=1}^J||f^{(-j)}_1||_{\infty }\ {{\mathrm{\mathbb {E}}}}\left( \left| g_j\left( \frac{\mathfrak {G}_{(m)}(V_j)}{\sigma (\mathbf {p})}\right) - g_j\left( \frac{\mathfrak {G}_{(m)}^{R}(V_j)}{\sigma (\mathbf {p})}\right) \right| \right) \\&\quad +\sum _{j=1}^J||f^{(-j)}_2||_{\infty }\ {{\mathrm{\mathbb {E}}}}(|h_j(\mathscr {M}_j^{(m)}) - h_j(\mathscr {M}_j^{(m), R}) |). \end{aligned}$$

Since the \(V_j\)’s have been sampled in an i.i.d. fashion from \(\mathbf {p}\), it is enough to show that for any two bounded uniformly continuous functions g and h on \(\mathbb {R}\) and \(\mathscr {S}\) respectively,

$$\begin{aligned} \limsup _{R\rightarrow \infty }\limsup _{m\rightarrow \infty }\ {{\mathrm{\mathbb {E}}}}\left( \left| g\left( \frac{\mathfrak {G}_{(m)}(V_1^{(m)})}{\sigma (\mathbf {p})}\right) - g\left( \frac{\mathfrak {G}_{(m)}^{R}(V_1^{(m)})}{\sigma (\mathbf {p})}\right) \right| \right) = 0, \end{aligned}$$
(4.40)

and

$$\begin{aligned} \limsup _{R\rightarrow \infty }\limsup _{m\rightarrow \infty }\ {{\mathrm{\mathbb {E}}}}\left( |h(\mathscr {M}_1^{(m)}) - h(\mathscr {M}_1^{(m), R}) |\right) = 0. \end{aligned}$$
(4.41)

Now consider the measured metric spaces \(\mathscr {M}_1^{(m)} \) and \(\mathscr {M}_1^{(m),R}\). As remarked above, they share the same metric space, namely the path \([\rho , V_1^{(m)}]\). The only difference is in the associated probability measures. Consider the natural correspondence \(C=\left\{ (x,x): x\in [\rho ,V_1^{(m)}]\right\} \) between \(\mathscr {M}_1^{(m)} \) and \(\mathscr {M}_1^{(m),R}\). Further, define a probability measure \(\pi \) on \([\rho ,V_1^{(m)}] \times [\rho , V_1^{(m)}]\) as

$$\begin{aligned} \pi (i,i):= {\left\{ \begin{array}{ll} {\sum _{u} p_u \mathbbm {1}\left\{ u\in \mathscr {R}\mathscr {C}(i,[\rho ,V_1^{(m)}])\right\} }/{\mathfrak {G}_{(m)}(V_1^{(m)})}, &{} \text { if }i\in (\rho ,V_1^{(m)}] \text { and } i\le R,\\ \left[ {\mathfrak {G}_{(m)}(V_1^{(m)}) - \mathfrak {G}_{(m)}^{R}(V_1^{(m)}) }\right] /{\mathfrak {G}_{(m)}(V_1^{(m)})}, &{} \text { if } i=\rho . \end{array}\right. } \end{aligned}$$

Writing \(\pi _1\) and \(\pi _2\) for the marginals of \(\pi \), we have, using the above choice of correspondence C and of the measure \(\pi \),

$$\begin{aligned} d_{{{\mathrm{GHP}}}}^{{{\mathrm{pt}}}}(\mathscr {M}_1^{(m)}, \mathscr {M}_1^{(m), R})\le & {} \left( || \pi _1 - Q_{V_1}^{(m)} ||+||\pi _2 - Q_{V_1}^{(m), R} || \right) \nonumber \\\le & {} \frac{2\left[ {\mathfrak {G}_{(m)}(V_1) - \mathfrak {G}_{(m)}^{R}(V_1) }\right] }{\mathfrak {G}_{(m)}(V_1)}. \end{aligned}$$
(4.42)

Now suppose we have proved (4.40). Using parts (a) and (b) of Proposition 4.20, we get \((\sigma (\mathbf {p}))^{-1}\mathfrak {G}_{(m)}(V_1^{(m)}) \mathop {\longrightarrow }\limits ^{d}\mathfrak {G}_{(\infty )}(V_1^{(\infty )}) > 0\). Then, using the bound in (4.42) and uniform continuity of h, we see that (4.41) is true. Hence it is enough to prove (4.40).

Recall from Sect. 4.2.1 the construction of \(V_1^{(m)}\) and the tree simultaneously via the birthday construction, where \(V_1^{(m)}\) is obtained as the value before the first repeat time, namely \(Y_{R_1-1}\). Fix \(\varepsilon >0\). By [27, Theorem 4], under Assumption 4.4 we may choose \(K_\varepsilon \) large so that the first repeat time satisfies \({{\mathrm{\mathbb {P}}}}(R_1 > K_\varepsilon /\sigma (\mathbf {p})) < \varepsilon \) for all \(m\ge 1\). Next, by uniform continuity of g, choose \(\delta \in (0,1)\) such that \(|g(x) -g(y)| < \varepsilon \) if \(|x-y| < \delta \). Finally choose R large so that for all m,

$$\begin{aligned} \frac{K_\varepsilon ^2}{\delta \wedge \varepsilon } \sum _{i=R+1}^m \frac{p_i^2}{\sigma ^2(\mathbf {p})} < \varepsilon . \end{aligned}$$

First, by choice of \(K_\varepsilon \) and boundedness of g,

$$\begin{aligned} \left| {{\mathrm{\mathbb {E}}}}\left[ g\left( \frac{\mathfrak {G}_{(m)}(V_1^{(m)})}{\sigma (\mathbf {p})} \right) \right] -{{\mathrm{\mathbb {E}}}}\left[ g\left( \frac{\mathfrak {G}_{(m)}(V_1^{(m)})}{\sigma (\mathbf {p})} \right) \mathbbm {1}\left\{ R_1 \le \frac{K_\varepsilon }{\sigma (\mathbf {p})}\right\} \right] \right| \le ||g||_{\infty } \varepsilon , \end{aligned}$$
(4.43)

and a similar inequality holds true if we replace the functional \(\mathfrak {G}_{(m)}\) by \(\mathfrak {G}_{(m)}^R\). Next, writing

$$\begin{aligned} \mathscr {E}_{(m)}^{(1)}(R):= \left| {{\mathrm{\mathbb {E}}}}\left[ \left\{ g\left( \frac{\mathfrak {G}_{(m)}(V_1^{(m)})}{\sigma (\mathbf {p})} \right) -g\left( \frac{\mathfrak {G}_{(m)}^{R}(V_1^{(m)})}{\sigma (\mathbf {p})} \right) \right\} \mathbbm {1}\left\{ R_1 \le \frac{K_\varepsilon }{\sigma (\mathbf {p})}\right\} \right] \right| , \end{aligned}$$

we have

$$\begin{aligned} \mathscr {E}_{(m)}^{(1)}(R) \le \varepsilon + 2||g||_{\infty } {{\mathrm{\mathbb {P}}}}\left( R_1 \le \frac{K_\varepsilon }{\sigma (\mathbf {p})}~,~ \frac{ \left( \mathfrak {G}_{(m)}(V_1^{(m)}) - \mathfrak {G}_{(m)}^R(V_1^{(m)})\right) }{\sigma (\mathbf {p})} \ge \delta \right) \end{aligned}$$
(4.44)

by our choice of \(\delta \). The difference \(\mathfrak {G}_{(m)}(V_1^{(m)}) - \mathfrak {G}_{(m)}^{R}(V_1^{(m)})\) is a tricky object for which we will need a tractable upper bound. Recall that we have used \(\mathscr {T}_1^{\mathscr {B}}\) for the birthday tree in (4.7) constructed by time \(R_1\). For every vertex \(i\in \mathscr {T}_1^{\mathscr {B}}\), let \(\mathscr {J}(i)\) be the first child of i in the birthday construction (the first new, i.e., previously unsampled vertex sampled immediately after a prior sampling of i). This set is empty if i is a leaf in the eventual full tree \(\mathscr {T}_m^{\mathbf {p}}\). Recall that \(\left\{ i\leadsto j\right\} \) was used to denote the event that j is a child of i in \(\mathscr {T}_m^{\mathbf {p}}\). Then note that

$$\begin{aligned} \mathfrak {G}_{(m)}(V_1^{(m)}) - \mathfrak {G}_{(m)}^{R}(V_1^{(m)}) \le \sum _{i\ge R+1}\mathbbm {1}\left\{ i\in \mathscr {T}_1^{\mathscr {B}}\right\} \sum _{j\in [m]}p_j\mathbbm {1}\left\{ i\leadsto j, j\ne \mathscr {J}(i)\right\} . \end{aligned}$$

Thus,

$$\begin{aligned}&{{\mathrm{\mathbb {P}}}}\Bigg (R_1 \le \frac{K_\varepsilon }{\sigma (\mathbf {p})}~,~ \frac{ \left( \mathfrak {G}_{(m)}(V_1^{(m)}) - \mathfrak {G}_{(m)}^{R}(V_1^{(m)})\right) }{\sigma (\mathbf {p})} \ge \delta \Bigg )\nonumber \\&\quad \le \frac{1}{\delta }\sum _{i=R+1}^m \sum _{j\in [m]}\frac{p_j}{\sigma (\mathbf {p})}{{\mathrm{\mathbb {P}}}}\left( i \text{ appears } \text{ before } \frac{K_\varepsilon }{\sigma (\mathbf {p})}, i\leadsto j, j\ne \mathscr {J}(i)\right) =:\mathscr {E}_{(m)}^{(2)}(R). \end{aligned}$$
(4.45)

For \(i\ne j\in [m]\), define the event \(E_{ij}:=\left\{ i \text { appears before } \frac{K_\varepsilon }{\sigma (\mathbf {p})},\ i\leadsto j,\ j\ne \mathscr {J}(i)\right\} \). For \(E_{ij}\) to happen, the following needs to occur in the birthday construction: (a) there is a \(0\le r_1\le K_\varepsilon /\sigma (\mathbf {p})\) such that up to time \(r_1\), neither i nor j has been sampled; (b) at time \(r_1+1\) vertex i is sampled; (c) there is an \(r_2\ge 0\) such that j does not appear among the samples at times \([r_1+1, r_1+1+r_2]\); (d) at time \(r_1+r_2+2\), vertex i is sampled again; (e) at the next time step \(r_1+r_2+3\), vertex j is sampled. Therefore,

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(E_{ij})\le \sum _{r_1=0}^{K_\varepsilon /\sigma (\mathbf {p})} \sum _{r_2=0}^{\infty } (1-p_i-p_j)^{r_1} p_i(1-p_j)^{r_2} p_i p_j \le p_i^2\frac{K_\varepsilon }{\sigma (\mathbf {p})}. \end{aligned}$$

Using this in (4.45) (sum the geometric series in \(r_2\) and bound each term of the \(r_1\)-sum by one), we get

$$\begin{aligned} \mathscr {E}_{(m)}^{(2)}(R)\le \frac{K_\varepsilon }{\delta }\sum _{i=R+1}^m \frac{p_i^2}{[\sigma (\mathbf {p})]^2} \le \varepsilon ,\quad \text { by our choice of } R. \end{aligned}$$
(4.46)

Combining (4.43), (4.44), (4.45) and (4.46) now gives the following lemma which completes the proof of (4.40) and thus the proof of part (c) of the proposition. \(\square \)

Lemma 4.24

Given \(\varepsilon > 0\) choose \(K_\varepsilon , \delta \) and R as above. Then, for all \(m\ge 1\),

$$\begin{aligned} \left| {{\mathrm{\mathbb {E}}}}\left[ g\left( \frac{\mathfrak {G}_{(m)}(V_1^{(m)})}{\sigma (\mathbf {p})} \right) \right] - {{\mathrm{\mathbb {E}}}}\left[ g\left( \frac{\mathfrak {G}_{(m)}^{R}(V_1^{(m)})}{\sigma (\mathbf {p})} \right) \right] \right| \le \varepsilon (4||g||_{\infty }+1). \end{aligned}$$

Proof of Theorem 4.18

We now prove continuity of the function \(g_{\phi }^{(k)}\) on the space \(\mathbf {T}_{I,(k+\ell )}^*\). In fact, we will give a quantitative estimate. Since we are assuming the discrete topology on the coordinate corresponding to the shape, without loss of generality we will work with two trees \(\mathbf {t}, {\overline{\mathbf {t}}}\in \mathbf {T}_{I,(k+\ell )}^*\) having the same shape. We need to distinguish the labels for the root and the leaves in the two trees; so write \(0+\) (respectively \(\overline{0}+\)) for the root of \(\mathbf {t}\) (respectively \({\overline{\mathbf {t}}}\)) and write \(\left\{ j+: 1\le j\le k+\ell \right\} \) (respectively \(\left\{ \overline{j}+: 1\le j\le k+\ell \right\} \)) for the collection of leaves in \(\mathbf {t}\) (respectively \({\overline{\mathbf {t}}}\)). Finally, let \(\nu _j\) be the corresponding probability measure on the path \(\mathscr {M}_j:= [0+, j+]\) for \(1\le j\le k\), and analogously let \({\overline{\nu }}_j\) be the probability measure on \({\overline{\mathscr {M}}}_j:= [\overline{0}+, \overline{j}+]\). View these paths as pointed measured metric spaces pointed at the roots \(0+\) and \({\overline{0}}+\) respectively. Now let \(\varepsilon _j:= d_{{{\mathrm{GHP}}}}^{{{\mathrm{pt}}}}(\mathscr {M}_j,{\overline{\mathscr {M}}}_j)\), where \(d_{{{\mathrm{GHP}}}}^{{{\mathrm{pt}}}}\) is the pointed Gromov-Hausdorff-Prokhorov metric defined in Sect. 2.1.

Write \(L = {\ell \atopwithdelims ()2}\). Let \(\phi :\mathbb {R}_+^L\rightarrow \mathbb {R}\) be a bounded continuous function. For \(K >0\), let \(\square (K) = [0,K]^L\), and for \(\delta >0\), define

$$\begin{aligned} {{\mathrm{osc}}}_{\phi }(\delta , K):= \sup _{\begin{array}{c} \mathbf {x},\mathbf {y}\in \square (K)\\ ||\mathbf {x}-\mathbf {y}||_{\infty } < \delta \end{array} } |\phi (\mathbf {x}) - \phi (\mathbf {y})|. \end{aligned}$$

Finally, define

$$\begin{aligned} \varepsilon := 4\sum _{j=1}^k \varepsilon _j + (k+1)\sum _{e} \big |l_e(\mathbf {t}) - l_e({\overline{\mathbf {t}}})\big |, \end{aligned}$$
(4.47)

where \(l_e(\cdot )\) denotes the length of the edge e and we have used the fact that both trees have the same shape. Write \({{\mathrm{ht}}}(\mathbf {t})\) for the height of the tree \(\mathbf {t}\) (measured not in graph distance but as the maximal distance from the root, incorporating edge lengths). The following proposition completes the proof of Theorem 4.18:

Proposition 4.25

For two trees \(\mathbf {t}, {\overline{\mathbf {t}}}\in \mathbf {T}_{I,(k+\ell )}^*\) having the same shape, and with \(\varepsilon \) as in (4.47),

$$\begin{aligned} |g_{\phi }^{(k)}(\mathbf {t}) - g_{\phi }^{(k)}({\overline{\mathbf {t}}})|\le 2\varepsilon ||\phi ||_{\infty } + {{\mathrm{osc}}}_{\phi }\bigg (\varepsilon \;,\; 2{{\mathrm{ht}}}(\mathbf {t})+2{{\mathrm{ht}}}({\overline{\mathbf {t}}})\bigg ). \end{aligned}$$

Proof

For each \(j\le k\), choose a correspondence \(C_j\) and a measure \(\pi _j\) on the product space \([0+, j+]\times [\overline{0}+, \overline{j}+]\) such that the following conditions are met: (a) \((0+,\overline{0}+)\in C_j\); (b) the distortion satisfies \({{\mathrm{dis}}}(C_j) <3\varepsilon _j\); (c) the measure of the complement satisfies \(\pi _j(C_j^c)< 2\varepsilon _j\); (d) and finally

$$\begin{aligned} ||\nu _j - p_*\pi _j|| + ||{\overline{\nu }}_j - {\overline{p}}_*\pi _j|| < 2\varepsilon _j, \end{aligned}$$
(4.48)

where \(p_*\pi _j\) and \({\overline{p}}_*\pi _j\) are the marginals of \(\pi _j\). Now sample \((X_j^{\star }, {\overline{X}}_j^{\star })\sim \pi _j\) from \([0+, j+]\times [\overline{0}+, \overline{j}+]\) independently for \(1\le j\le k\). By (4.48), we can couple \((X_j^{\star }, {\overline{X}}_j^{\star })\) with two random variables \(X_j, {\overline{X}}_j\) (again independently for \(1\le j\le k\)) such that \(X_j\sim \nu _j\) and \({\overline{X}}_j\sim {\overline{\nu }}_j\), and further

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( X_j\ne X_j^{\star }\right) +{{\mathrm{\mathbb {P}}}}\left( {\overline{X}}_j\ne {\overline{X}}_j^{\star }\right) < 2\varepsilon _j. \end{aligned}$$
(4.49)

Using conditions (b) and (c), we get

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \left| d_{\mathbf {t}}\left( 0+,X_j^{\star }\right) - d_{{\overline{\mathbf {t}}}}\left( {\overline{0}}+,\overline{X}_j^{\star }\right) \right| > 3\varepsilon _j\right) \le 2\varepsilon _j, \end{aligned}$$
(4.50)

where \(d_{\mathbf {t}}\) is the distance metric on tree \(\mathbf {t}\) which incorporates the edge lengths. Now write E for the following “good event”:

$$\begin{aligned} E:=\bigcap _{j=1}^k \left\{ X_j = X_j^{\star },\; {\overline{X}}_j= {\overline{X}}_j^{\star }, \; \left| d_{\mathbf {t}}\left( 0+,X_j^{\star }\right) - d_{{\overline{\mathbf {t}}}}\left( {\overline{0}}+,\overline{X}_j^{\star }\right) \right| \le 3\varepsilon _j \right\} . \end{aligned}$$

It follows from (4.49) and (4.50) that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(E^c)\le 4\sum _{j=1}^k\varepsilon _j. \end{aligned}$$
(4.51)

Now we are going to create “shortcuts” by gluing the leaves to the corresponding sampled points. Let \(\mathbf {S}\) (resp. \({\overline{\mathbf {S}}}\)) be the (random) metric space obtained by identifying each of the leaves \(j+\) (resp. \({\overline{j}}+\)) with \(X_j\) (resp. \({\overline{X}}_j\)) in \(\mathbf {t}\) (resp. \({\overline{\mathbf {t}}}\)) for \(1\le j\le k\) and write \(d_{\mathbf {S}}\) (resp. \(d_{{\overline{\mathbf {S}}}}\)) for the induced metric. Then by definition,

$$\begin{aligned} g_{\phi }^{(k)}(\mathbf {t})= {{\mathrm{\mathbb {E}}}}\left[ \phi \bigg (d_{\mathbf {S}}\left( (k+i_1)+,\ {(k+i_2)}+ \right) : 1\le i_1< i_2\le \ell \bigg )\right] , \end{aligned}$$

and an analogous expression holds for \(g_{\phi }^{(k)}({\overline{\mathbf {t}}})\). This gives

$$\begin{aligned}&\big |g_{\phi }^{(k)}(\mathbf {t})-g_{\phi }^{(k)}({\overline{\mathbf {t}}})\big |\le {{\mathrm{\mathbb {E}}}}\bigg (\bigg |\phi \big (d_{\mathbf {S}}\left( (k+i_1)+,\ {(k+i_2)}+ \right) : 1\le i_1< i_2\le \ell \big )\nonumber \\&\quad \quad -\phi \left( d_{{\overline{\mathbf {S}}}}\left( \overline{(k+i_1)}+,\ {\overline{(k+i_2)}}+ \right) : 1\le i_1<i_2\le \ell \right) \bigg |\bigg ). \end{aligned}$$
(4.52)

Consider the map from \(\mathbf {t}\) to \({\overline{\mathbf {t}}}\) which takes every vertex to the corresponding vertex and maps points on each edge by linear interpolation (using the edge lengths) to points on the corresponding edge. Consider \(a\in [0+, j+]\) and let \(\overline{a}\in [{\overline{0}}+, {\overline{j}}+]\) be the corresponding point in \({\overline{\mathbf {t}}}\) for some \(j\le k\). Then note that

$$\begin{aligned} \bigg |d_{\mathbf {t}}\left( a, X_j\right) -d_{{\overline{\mathbf {t}}}}\left( {\overline{a}}, {\overline{X}}_j\right) \bigg |&\le \bigg |d_{\mathbf {t}}\left( 0+, X_j\right) -d_{{\overline{\mathbf {t}}}}\left( {\overline{0}}+, \overline{X}_j\right) \bigg |\nonumber \\&\quad +\bigg |d_{\mathbf {t}}\left( 0+, a\right) -d_{{\overline{\mathbf {t}}}}\left( {\overline{0}}+, {\overline{a}}\right) \bigg |\nonumber \\&\le 3\varepsilon _j+\sum _e |l_e(\mathbf {t})-l_{e}({\overline{\mathbf {t}}})| \end{aligned}$$
(4.53)

on the set E.

Now consider a shortest path in \(\mathbf {S}\) connecting \((k+i_1)+\) and \({(k+i_2)}+\). We can go from \(\overline{(k+i_1)}+\) to \(\overline{(k+i_2)}+\) by taking the same route in \(\overline{\mathbf {S}}\), i.e., by traversing the same edges and taking the same shortcuts in the same order. We make the following observations: (i) The difference between the distances traversed while crossing the edge e is \(|l_e(\mathbf {t})-l_e({\overline{\mathbf {t}}})|\). (ii) By (4.53), on the set E, taking a “shortcut” contributes at most \(3\varepsilon _j+\sum _e |l_e(\mathbf {t})-l_{e}({\overline{\mathbf {t}}})|\) to the difference between the distances traversed. Since we have to take at most k shortcuts, we immediately get

$$\begin{aligned} d_{{\overline{\mathbf {S}}}}\left( \overline{(k+i_1)}+,\ \overline{(k+i_2)}+\right)&\le d_{\mathbf {S}}\left( {(k+i_1)}+,\ {(k+i_2)}+\right) \\&\quad +3\sum _{j=1}^k\varepsilon _j+(k+1)\sum _e |l_e(\mathbf {t})-l_{e}({\overline{\mathbf {t}}})| \end{aligned}$$

on the set E. By symmetry, a similar inequality holds if we interchange the roles of \(\mathbf {S}\) and \({\overline{\mathbf {S}}}\). This observation combined with (4.51) and (4.52) yields the result. \(\square \)

5 Proofs: convergence in Gromov-weak topology

Recall from Proposition 4.1 that, conditional on the partition \(\left\{ \mathscr {V}^{(i)}:i\ge 1\right\} \) of the vertices into connected components, the actual structure of the components of \(\mathscr {G}(\mathbf {x}, t)\) can be generated independently as the connected graphs \({\tilde{\mathscr {G}}}_{|\mathscr {V}^{(i)}|}(a_n^{(i)},\mathbf {p}_n^{(i)})\), where \(a_n^{(i)}, \mathbf {p}_n^{(i)}\) are as in Proposition 4.1 and, given \(m, \mathbf {p}, a\), \({\tilde{\mathscr {G}}}_m(a,\mathbf {p})\) is the connected random graph model studied in the previous section. For Theorem 1.8, the time scale \(t = t_n\) of interest in the expression of \(a_n^{(i)}\) is

$$\begin{aligned} t_n:= \lambda + \frac{1}{\sigma _2(\mathbf {x}^{(n)})}, \end{aligned}$$

for fixed \(\lambda \in \mathbb {R}\). Let \(\mathscr {N}(\mathbb {R}_+)\) denote the space of counting measures on \(\mathbb {R}_+\) equipped with the vague topology. Define \(\varvec{\Upsilon }_n^{(i)}:= (p_v/\sigma (\mathbf {p}), v\in \mathscr {V}^{(i)})\) and view \((a_n^{(i)}\sigma (\mathbf {p}_n^{(i)}), \varvec{\Upsilon }_n^{(i)})\) as a random element of \(\mathbb {S}:= \mathbb {R}_+\times \mathscr {N}(\mathbb {R}_+)\) (equipped with the product topology). Finally, define

$$\begin{aligned} \mathscr {P}_n:= \left( \left( a_n^{(i)}\sigma (\mathbf {p}_n^{(i)}), \varvec{\Upsilon }_n^{(i)}\right) :i\ge 1\right) \end{aligned}$$

viewed as an element of \(\mathbb {S}^{\infty }\), again equipped with the product topology induced by a single coordinate \(\mathbb {S}\). Now given an infinite vector \(\mathbf {c}\in l_0\) recall the process \(\bar{V}_\lambda ^{\mathbf {c}}(\cdot )\) as in (1.16), the corresponding excursions \(\mathscr {Z}(\lambda )\) as in (1.17) and the corresponding excursion lengths in (1.18). Finally recall the definitions of \(\bar{\gamma }^{(i)}, \varvec{\theta }^{(i)}\) from (2.10). Writing these out explicitly, define

$$\begin{aligned} \mathscr {P}_{\infty }&:= ((\bar{\gamma }^{(i)}, \varvec{\theta }^{(i)}):i\ge 1)\\&= \left( Z_i(\lambda )\sqrt{\sum _{v\in \mathscr {Z}_i(\lambda )} c_v^2}\;,\; \Bigg (\frac{c_j}{\sqrt{\sum _{v\in \mathscr {Z}_i(\lambda )} c_v^2}}: j\in \mathscr {Z}_i(\lambda )\Bigg ):i\ge 1\right) . \end{aligned}$$

Proposition 5.1

The following hold under Assumption 1.6:

  1. (i)

    For every \(i\ge 1\), \(\sigma (\mathbf {p}_n^{(i)}) \mathop {\longrightarrow }\limits ^{\mathrm {P}}0\) as \(n\rightarrow \infty \).

  2. (ii)

    \(\mathscr {P}_n\mathop {\longrightarrow }\limits ^{d}\mathscr {P}_{\infty }\) on \(\mathbb {S}^{\infty }\) as \(n\rightarrow \infty \). Further for every fixed \(i\ge 1\), almost surely,

    $$\begin{aligned} \sum _{v\in \mathscr {Z}_i(\lambda )} c_v= \infty . \end{aligned}$$
    (5.1)

Proof of Theorem 1.8

We prove the theorem assuming Proposition 5.1. By the Skorokhod representation theorem, we may assume that we are working on a probability space where the convergence in Proposition 5.1 holds almost surely. In particular, on this space, Assumption 4.4 is satisfied almost surely by \(\mathbf {p}_n^{(i)}\) for any fixed \(i\ge 1\). Now an application of Theorem 4.5 completes the proof. \(\square \)

5.1 Verification of weight assumptions in maximal components

Here we give the proof of Proposition 5.1. To ease notation, we will throughout assume \(\lambda =0\); the general case follows in an identical fashion. We will write \(V^{\mathbf {c}}\) instead of \(V^{\mathbf {c}}_0\) for the process in (1.15) with \(\lambda =0\) and simply write \(\mathscr {C}_i\) for \(\mathscr {C}_i([\sigma _2(\mathbf {x}^{(n)})]^{-1})\).

We start by describing an exploration scheme (developed in [9]) which simultaneously constructs the graph \(\mathscr {G}_n(\mathbf {x},t)\) and a “breadth first” walk. This was carefully analyzed in [10] to prove Theorem 1.7.

For every ordered pair \((u,v)\), let \(\eta _{u,v}\) be an exponential random variable with rate \(tx_v\) (independent across ordered pairs). There is a simple relation between the connection probabilities of \(\mathscr {G}_n(\mathbf {x},t)\) given by (1.10) and these random variables, namely:

$$\begin{aligned} q_{uv}:= {{\mathrm{\mathbb {P}}}}( \eta _{uv} < x_u). \end{aligned}$$
(5.2)

At each stage \(i\ge 1\), we have a collection of active vertices \(\mathscr {A}(i)\), a collection of explored vertices \(\mathscr {O}(i)\) and a collection of unexplored vertices \(\mathscr {U}(i)= [n]{\setminus }(\mathscr {A}(i)\cup \mathscr {O}(i))\).

Initialize with \(\mathscr {O}(1) = \emptyset \) and \(\mathscr {A}(1) = \left\{ v(1)\right\} \), where the first vertex v(1) is chosen by size-biased sampling, namely with probability proportional to the vertex weights \(\mathbf {x}\). When possible, we suppress dependence on n to ease notation. Now let \(\mathscr {D}(v(1)):=\left\{ v: \eta _{v(1),v}\le x_{v(1)}\right\} \) denote the collection of “children” of v(1) and note that by (5.2) this generates the right connection probabilities in \(\mathscr {G}_n(\mathbf {x},t)\). Think of the associated \(\eta _{v(1),v}\) values (for vertices connected to v(1)) as “birth-times” of these connections in the interval \([0,x_{v(1)}]\), and label the corresponding vertices as \(v(2), v(3), \ldots , v(|\mathscr {D}(v(1))|+1)\). Update the process as \(\mathscr {O}(2):= \left\{ v(1)\right\} \), \(\mathscr {A}(2):= \mathscr {D}(v(1))\) and \(\mathscr {U}(2) = \mathscr {U}(1){\setminus }\mathscr {D}(v(1))\).

Associate with this construction a breadth-first walk as follows:

$$\begin{aligned} Z_n(0):=0, \quad Z_n(u):= -u + \sum _{v} x_v\mathbbm {1}\left\{ \eta _{v(1),v}\le u\right\} , \quad 0\le u\le x_{v(1)}. \end{aligned}$$

Recursively, for \(i\ge 2\), let \(T_{i-1}:= \sum _{j=1}^{i-1} x_{v(j)}\). At this “time” we will explore the unexplored neighbors of v(i). By this time, \(i-1+|\mathscr {A}(i)|\) vertices have either been explored or are active. Let \(\mathscr {D}(v(i)):= \left\{ v\in \mathscr {U}(i): \eta _{v(i)v}\le x_{v(i)}\right\} \) and again label these as \(v(i+|\mathscr {A}(i)|), v(i+|\mathscr {A}(i)|+1), \ldots , v(i+|\mathscr {A}(i)|+|\mathscr {D}(v(i))|-1)\) in increasing order of their \(\eta _{v(i)v}\) values. Update \(\mathscr {O}(i+1) = \mathscr {O}(i)\cup \left\{ v(i)\right\} \), \(\mathscr {A}(i+1) = (\mathscr {A}(i){\setminus }\left\{ v(i)\right\} )\cup \mathscr {D}(v(i))\) and \(\mathscr {U}(i+1) = \mathscr {U}(i){\setminus }\mathscr {D}(v(i))\). Again update the walk as

$$\begin{aligned} Z(T_{i-1}+u) = Z(T_{i-1}) - u + \sum _{v\in \mathscr {D}(v(i))} x_v \mathbbm {1}\left\{ \eta _{v(i),v}\le u\right\} , \quad 0\le u\le x_{v(i)}. \end{aligned}$$

After finishing a component (which happens when \(\mathscr {A}(i) = \emptyset \) for some \(i\ge 2\)), choose the next vertex to explore in a size-biased manner from the unexplored set \(\mathscr {U}(i)\). If \(\mathscr {U}(i)=\emptyset \), then we have finished constructing the partition of the graph into the connected components.
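For concreteness, here is a minimal simulation sketch of this construction (in Python; the function name, the data structures and the recording of the walk only at the times \(T_i\) are our own illustrative choices, not part of the proofs):

```python
import random

def explore(x, t, seed=0):
    """Sketch: simultaneous size-biased construction of G_n(x, t) and its
    breadth-first walk, recorded at the times T_1, T_2, ... (one value per
    explored vertex).  Returns the components (in order of discovery) and
    the walk values."""
    rng = random.Random(seed)
    unexplored, active = set(range(len(x))), []
    components, walk, Z = [], [0.0], 0.0
    while unexplored or active:
        if not active:                # start a new component, size-biased
            pool = list(unexplored)
            u = rng.choices(pool, weights=[x[v] for v in pool])[0]
            unexplored.discard(u)
            active.append(u)
            components.append([])
        v = active.pop(0)             # FIFO queue: breadth-first order
        components[-1].append(v)
        # children of v: unexplored w with eta_{v,w} <= x_v, where
        # eta_{v,w} ~ Exp(rate t * x_w); labelled in order of birth time
        births = sorted((rng.expovariate(t * x[w]), w) for w in unexplored)
        births = [(eta, w) for eta, w in births if eta <= x[v]]
        for _, w in births:
            unexplored.discard(w)
            active.append(w)
        Z += -x[v] + sum(x[w] for _, w in births)   # walk value at T_i
        walk.append(Z)
    return components, walk
```

The excursions of `walk` beyond past minima then encode the component weights, as formalized in properties (a) and (b) below.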

Now note the following important properties of this exploration:

  1. (a)

    The ordering \(({v(1)}, {v(2)}, \ldots , {v(n)})\) is a size-biased reordering of the vertex set [n].

  2. (b)

    If we start a new component at some stage i with vertex v(i), and finish exploring the component at stage \(j\ge i\), then the walk satisfies

    $$\begin{aligned} Z(T_j) = Z(T_{i-1}) - x_{v(i)}, \quad Z(u)\ge Z(T_j) \text{ on } T_{i-1}< u < T_j. \end{aligned}$$

Thus the size \(\sum _{l=i}^{j} x_{v(l)}\) of the component of \(v(i)\) is essentially the length of the excursion of the walk beyond past minima.

As a starting point in proving Theorem 1.7, Aldous and Limic [10] show the following result. Their result is more general (incorporating the presence of a “Brownian component”) but we state their result as applied to our setting.

Proposition 5.2

([10, Proposition 9]) Consider the process \(\left\{ \bar{Z}_n(s):s\ge 0\right\} \) defined by setting \(\bar{Z}_n(s) := Z(s)/\sigma _2\). Then under Assumption 1.6 \(\bar{Z}_n\mathop {\longrightarrow }\limits ^{d}V^{\mathbf {c}}\) as \(n\rightarrow \infty \).

Using this result, Aldous and Limic [10] show that the corresponding maximal excursions beyond past minima of \(\bar{Z}_n\) also converge to the maximal excursions beyond past minima of \(V^{\mathbf {c}}_\lambda \), namely the excursion lengths of the reflected process \(\bar{V}_\lambda ^{\mathbf {c}}\) (see (1.16)) from zero. A consequence of the proof of Theorem 1.7 in [10] using Proposition 5.2 is the following result:

Lemma 5.3

Fix \(K\ge 1\) and let \(\mathscr {E}_n(K)\) be the time required for the above construction to explore the K maximal components \(\left\{ \mathscr {C}_i:1\le i\le K\right\} \). Then the sequence \(\left\{ \mathscr {E}_n(K):n\ge 1\right\} \) is tight.

In other words, for any fixed \(K\ge 1\), the maximal length excursions of \(\bar{V}^{\mathbf {c}}\) are found in finite time. Thus, even though the total vertex weight \(\sigma _1\rightarrow \infty \), when exploring the graph in a size-biased fashion under Assumption 1.6, one needs only a finite amount of “time” to find the maximal components; here time is measured in terms of the weight of the vertices already explored. Now define

$$\begin{aligned} S_{n,2}(u)= \sum _{i: T_{i}\le u} \left( \frac{x_{v(i)}}{\sigma _2}\right) ^2, \quad R_n^{\varepsilon }(u):= \sum _{i: T_i\le u} \frac{x_{v(i)}^2}{\sigma _2^2}\mathbbm {1}\left\{ x_{v(i)}< \sigma _2\varepsilon \right\} . \end{aligned}$$

Thus, \(S_{n, 2}(u)\) is the normalized sum of squares of the weights of the vertices explored by time u, and \(R_n^{\varepsilon }(u)\) is the corresponding sum in which we only retain explored vertices with weight smaller than \(\varepsilon \sigma _2\). Using the same set of exponential random variables \(\left\{ \xi _j:j\ge 1\right\} \) that arose in the definition of the process \(V^{\mathbf {c}}\) in (1.15), define a new process

$$\begin{aligned} S_{\infty ,2}(u):= \sum _{j=1}^{\infty } c_j^2 \mathbbm {1}\left\{ \xi _j\le u\right\} . \end{aligned}$$

The same proof techniques as in [10] now imply the following; since the ideas essentially follow [10], we only sketch the proof.

Lemma 5.4

Assumption 1.6 implies the joint convergence \((\bar{Z}_n(\cdot ), S_{n,2}(\cdot ))\mathop {\longrightarrow }\limits ^{d}(V^{\mathbf {c}}(\cdot ), S_{\infty ,2}(\cdot ))\) as \(n\rightarrow \infty \).

Proof

Fix \(K\ge 1\), and for each \(i\ge 1\), let \(\xi _i^{(n)}\) denote the time when vertex i is added to the collection of active vertices. Now consider the \(K+1\) dimensional stochastic process

$$\begin{aligned} \mathbf {Y}_n^K(s):=\left( \bar{Z}_n(s),\frac{x_1}{\sigma _2} \mathbbm {1}\left\{ \xi _1^{(n)}\le s\right\} , \ldots , \frac{x_K}{\sigma _2} \mathbbm {1}\left\{ \xi _K^{(n)}\le s\right\} \right) , \quad s\ge 0. \end{aligned}$$

Write

$$\begin{aligned} \mathbf {Y}_{\infty }^K(s):= (V^{\mathbf {c}}(s), c_1\mathbbm {1}\left\{ \xi _1\le s\right\} , \ldots ,c_K\mathbbm {1}\left\{ \xi _K\le s\right\} ). \end{aligned}$$

In the proof of Proposition 5.2, Aldous and Limic showed that \(\mathbf {Y}_n^K\mathop {\longrightarrow }\limits ^{d}\mathbf {Y}_{\infty }^K\) for every fixed \(K\ge 1\). Thus to complete the proof, it is enough to show, for every fixed \(A>0\) and \(\eta >0\), \(\limsup _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty } {{\mathrm{\mathbb {P}}}}(R_n^{\varepsilon }(A)> \eta )=0\). Now as described on [10, Page 17], we can couple \((\xi _1^{(n)}, \xi _2^{(n)}, \ldots , \xi _n^{(n)})\) with a sequence of independent exponential random variables \(({\tilde{\xi }}_1^{(n)}, {\tilde{\xi }}_2^{(n)}, \ldots , {\tilde{\xi }}_n^{(n)})\) with \({\tilde{\xi }}_j^{(n)}\) having rate \(x_j/\sigma _2\) such that \({\tilde{\xi }}_j^{(n)}\le \xi _j^{(n)}\). Now write

$$\begin{aligned} \tilde{R}_{n}^{\varepsilon }(t):=\sum _{j: x_j<\varepsilon \sigma _2} \frac{x_j^2}{\sigma _2^2} \mathbbm {1}\left\{ {\tilde{\xi }}_j^{(n)}\le t\right\} . \end{aligned}$$

Then it is enough to show

$$\begin{aligned} \limsup _{\varepsilon \rightarrow 0}\ \limsup _{n\rightarrow \infty }\ {{\mathrm{\mathbb {E}}}}(\tilde{R}_n^{\varepsilon }(A)) = 0, \end{aligned}$$

which is trivial since

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}\left( \tilde{R}_n^{\varepsilon }(A)\right) \le A\sum _{j: x_j\le \varepsilon \sigma _2} \left( \frac{x_j}{\sigma _2}\right) ^3 \rightarrow A\sum _{j: c_j< \varepsilon } c_j^3. \end{aligned}$$

We have used both (1.12) and (1.13) in the last convergence assertion. Thus, first letting \(n\rightarrow \infty \) and then \(\varepsilon \rightarrow 0\) completes the proof. \(\square \)

We can now complete the proof of Proposition 5.1. First, note that to prove (5.1), it is enough to show that for any two rationals \(r<s\), \(\sum _{j} c_j\mathbbm {1}\left\{ r\le \xi _j \le s\right\} = \infty \) almost surely, where \(\xi _j\) are the associated exponential rate-\(c_j\) random variables. This is indeed true: the summands are independent and bounded by \(c_1\), and since \({{\mathrm{\mathbb {P}}}}(r\le \xi _j\le s)={\mathrm {e}}^{-c_jr}-{\mathrm {e}}^{-c_js}\ge C_{r,s}\,c_j\) for all large j, the expected sum is bounded below by a constant multiple of \(\sum _j c_j^2 =\infty \); divergence then follows from Kolmogorov's three-series theorem.

To prove the other assertions, define, for \(i\ge 1\), the point processes \(\Xi _n^{(i)}:=\left\{ x_u/\sigma _2: u\in \mathscr {C}_i\right\} \), namely the rescaled vertex weights in the ith maximal component. Analogously define \(\Xi _{\infty }^{(i)} = \left\{ c_v: v\in \mathscr {Z}_i\right\} \), namely the collection of jumps in the ith largest excursion of \(\bar{V}^{\mathbf {c}}\). Let

$$\begin{aligned} s_n^{(i)} = \sum _{v\in \mathscr {C}_i} \frac{x_v^2}{\sigma _2^2},\quad \text { and }\ s_{\infty }^{(i)}:= \sum _{v\in \mathscr {Z}_i} c_v^2, \end{aligned}$$

for the normalized sum of squares of vertex weights in a component. Define

$$\begin{aligned} {\tilde{\mathscr {P}}}_n:= \left( \left( {{\mathrm{mass}}}(\mathscr {C}_i), s_n^{(i)}, \Xi _n^{(i)}\right) , i\ge 1\right) , \quad {\tilde{\mathscr {P}}}_{\infty }:= \left( \left( Z_i,s_{\infty }^{(i)}, \Xi _{\infty }^{(i)}\right) , i\ge 1\right) . \end{aligned}$$

We will view these as random elements of \({\tilde{\mathbb {S}}}^{\infty }\), where \({\tilde{\mathbb {S}}}:= \mathbb {R}^2\times \mathscr {N}(\mathbb {R})\). Lemmas 5.3 and 5.4 now imply the following:

Lemma 5.5

As \(n\rightarrow \infty \), \({\tilde{\mathscr {P}}}_n \mathop {\longrightarrow }\limits ^{d}{\tilde{\mathscr {P}}}_{\infty }\) on \({\tilde{\mathbb {S}}}^{\infty }\).

Expressing the functionals that arise in Proposition 5.1 in terms of vertex weights in maximal components completes the proof. Indeed,

$$\begin{aligned} \sigma (\mathbf {p}_n^{(i)})= \frac{\sqrt{\sum _{v\in \mathscr {C}_i} x_v^2}}{\sum _{v\in \mathscr {C}_i} x_v}=\frac{\sigma _2\sqrt{s_n^{(i)}}}{{{\mathrm{mass}}}(\mathscr {C}_i)} \rightarrow 0, \end{aligned}$$

as \(n\rightarrow \infty \). The proof of \(\mathscr {P}_n\mathop {\longrightarrow }\limits ^{d}\mathscr {P}_{\infty }\) is similar. \(\square \)

5.2 Gromov-weak convergence in Theorem 1.2

That convergence in (1.7) holds with respect to Gromov-weak topology is an easy consequence of Theorem 1.8. Indeed, setting

$$\begin{aligned} x_i= n^{-\frac{\tau -2}{\tau -1}}w_i\ \text { and }\ t_n=\frac{1}{\ell _n}\left( 1+\lambda n^{-(\tau -3)/(\tau -1)}\right) n^{\frac{2(\tau -2)}{\tau -1}}, \end{aligned}$$

we can write \(\mathrm{NR}_n(\varvec{w}(\lambda ))\) as the model \(\mathscr {G}(\varvec{x}, t_n)\), where \(\varvec{x}=\varvec{x}^{(n)}:=(x_i: i\in [n])\). A direct computation shows that \(\varvec{x}^{(n)}\) satisfies Assumption 1.6 with the entrance boundary \(\mathbf {c}^{{{\mathrm{nr}}}}\) defined in (2.11).
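To see this identification (a one-line check, assuming the exponential form of the connection probabilities in (1.1) and (1.10)), note that

$$\begin{aligned} t_n x_i x_j=\frac{1}{\ell _n}\left( 1+\lambda n^{-\frac{\tau -3}{\tau -1}}\right) n^{\frac{2(\tau -2)}{\tau -1}}\cdot n^{-\frac{\tau -2}{\tau -1}}w_i\cdot n^{-\frac{\tau -2}{\tau -1}}w_j=\left( 1+\lambda n^{-\frac{\tau -3}{\tau -1}}\right) \frac{w_iw_j}{\ell _n}, \end{aligned}$$

so the edge probabilities of the two models coincide. Note also that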

$$\begin{aligned} t_n-\frac{1}{\sigma _2(\varvec{x}^{(n)})}&=\frac{n^{\frac{2(\tau -2)}{\tau -1}}}{\sum _{i\in [n]}w_i^2}\left( \frac{1}{\ell _n}\sum _{i\in [n]}w_i^2-1\right) +\frac{\lambda }{\ell _n/n}. \end{aligned}$$

Under the assumptions of Theorem 1.2, \(\ell _n/n\rightarrow {{\mathrm{\mathbb {E}}}}W\) and \(\sum _{i}w_i^2/n\rightarrow {{\mathrm{\mathbb {E}}}}W^2={{\mathrm{\mathbb {E}}}}W\). Further, by [17, Lemma 2.2],

$$\begin{aligned} \frac{1}{\ell _n}\sum _{i\in [n]}w_i^2=1+\zeta n^{-(\tau -3)/(\tau -1)}+o(n^{-(\tau -3)/(\tau -1)}), \end{aligned}$$

where \(\zeta \) is as defined in (2.12). Combining these observations, we see that

$$\begin{aligned} t_n-(\sigma _2(\varvec{x}^{(n)}))^{-1}\rightarrow t^{{{\mathrm{nr}}}}_{\lambda }\ \text { as }n\rightarrow \infty , \end{aligned}$$

where \(t^{{{\mathrm{nr}}}}_{\lambda }\) is as in (2.12). Since \(n^{(\tau -3)/(\tau -1)}\sigma _2(\varvec{x}^{(n)})\rightarrow {{\mathrm{\mathbb {E}}}}W\), we conclude that \(\mathbf {M}_{\infty }^{{{\mathrm{nr}}}}(\lambda )\) defined in (2.13) is the Gromov-weak limit of \(n^{-(\tau -3)/(\tau -1)}\mathbf {M}_n^{{{\mathrm{nr}}}}(\lambda )\), where \(\mathbf {M}_n^{{{\mathrm{nr}}}}(\lambda )\) is as in (1.6).

Remark 7

Theorem 1.8 is stated for a fixed \(\lambda \in \mathbb {R}\), but in the argument just given, we have to work with a sequence, namely \(t_n-(\sigma _2(\varvec{x}^{(n)}))^{-1}\) converging to \(t^{{{\mathrm{nr}}}}_{\lambda }\). This, however, does not make any difference. Indeed, the proof of [10, Proposition 9] can be imitated to prove the same result in the setup where we have a sequence converging to t instead of a fixed t, and no new idea is involved here. (In [10, Lemma 27], Aldous and Limic prove a similar result for the multiplicative coalescent. They do not, however, explicitly state the convergence of the associated process under the same assumption).

6 Proofs: convergence in Gromov-Hausdorff-Prokhorov topology

In this section, we improve Gromov-weak convergence in Theorem 1.2 to Gromov-Hausdorff-Prokhorov convergence. To do so, we will rely on [14, Theorem 6.1], which gives a criterion for convergence in Gromov-Hausdorff-weak topology. We do not give the definition of Gromov-Hausdorff-weak topology and instead refer the reader to [14, Definition 5.1]. Convergence in Gromov-Hausdorff-weak topology implies convergence in Gromov-Hausdorff-Prokhorov topology when the metric measure spaces involved have full support (i.e., the support of the measure is the entire metric space), which is the case in our situation: \(\mathscr {C}_i(\lambda )\) trivially has full support, and the mass measure on an inhomogeneous continuum random tree has full support, which implies that the same is true for \(M_i^{{{\mathrm{nr}}}}(\lambda )\).

Applying [14, Theorem 6.1] to our situation, we see that it is enough to prove the following lemma:

Lemma 6.1

(Global lower mass-bound) Let \(\mathscr {C}_i(\lambda )\) be the ith largest component of \(\mathrm{NR}_n(\varvec{w}(\lambda ))\). Then the following assertion is true:

For each \(i\ge 1\), \(v\in [n]\) and \(\delta >0\), let \(B(v, \delta )\) denote the intrinsic ball (in \(\mathrm{NR}_n(\varvec{w}(\lambda ))\)) of radius \(\delta n^{(\tau -3)/(\tau -1)}\) around v and set

$$\begin{aligned} \mathfrak {m}_i^{(n)}(\delta )=\inf \left\{ n^{-\frac{\tau -2}{\tau -1}}\sum _{j\in B(v, \delta )}w_j\ \bigg |\ v\in \mathscr {C}_i(\lambda )\right\} . \end{aligned}$$

Then the sequence \(\left\{ \left( \mathfrak {m}_i^{(n)}(\delta )\right) ^{-1}\right\} _{n\ge 1}\) is tight.

Lemma 6.1 ensures compactness of the spaces \(M_i^{{{\mathrm{nr}}}}(\lambda )\) which, in turn, implies compactness of the spaces \(M_i^{\mathbf {c}}(\lambda )\) when \(\mathbf {c}=(c_1, c_2,\ldots )\) is of the form (1.19), thus proving the first assertion in Theorem 1.9.

Before moving on to the proof of Lemma 6.1, we state a result that essentially says that instead of looking at the largest components, we can work with the components of high-weight vertices. This observation will be used to prove the global lower-mass bound:

Proposition 6.2

For every \(\varepsilon >0\) and \(k\ge 1\), there exists \(K=K(\varepsilon , k, \lambda )>0\) such that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( [K]\cap \mathscr {C}_i(\lambda )=\emptyset \text { for some }1\le i\le k\right) \le \varepsilon . \end{aligned}$$

Proposition 6.2 follows trivially from [17, Theorem 1.6 (a)] and [17, Theorem 1.1].

6.1 Bound on size of \(\varepsilon n^{(\tau -3)/(\tau -1)}\)-nets for the largest components

For convenience, we set

$$\begin{aligned} \eta =(\tau -3)/(\tau -1)\quad \text { and }\quad \rho =(\tau -2)/(\tau -1). \end{aligned}$$
(6.1)

The purpose of this section is to prove a strong result (Proposition 6.3 stated below) that gives control over the number of intrinsic balls of radius \(\varepsilon n^{\eta }\) needed to cover the largest components. This acts as a crucial ingredient in the proof of Lemma 6.1 as well as the proof of the bound on the upper box-counting dimension.

Proposition 6.3

(Small diameter after removing high-weight vertices) For every \(\varepsilon , \delta >0\), and \(N=N(\varepsilon ):=\varepsilon ^{-\delta -1/\eta }\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{diam}}}\left( \mathrm{NR}_n(\varvec{w}(\lambda )){\setminus }[N]\right) >\varepsilon n^{\eta }\right) \le c_{\delta }\exp \left( -C/\varepsilon ^{\delta \eta }\right) , \end{aligned}$$
(6.2)

for all sufficiently large n, where \(c_{\delta }>0\) is a constant depending only on \(\delta \) and \(C>0\) is a universal constant. Here \(\mathrm{NR}_n(\varvec{w}(\lambda )){\setminus }[N]\) denotes the graph obtained by removing all vertices with labels in [N], together with the edges incident to them, from the graph \(\mathrm{NR}_n(\varvec{w}(\lambda ))\).

We now prove Proposition 6.3. Write

$$\begin{aligned} E_n=\{{{\mathrm{diam}}}(\mathrm{NR}_n(\varvec{w}(\lambda )){\setminus }[N]) \le \varepsilon n^{\eta }\}. \end{aligned}$$
(6.3)

The proof consists of four steps. In the first step, we reduce the proof to the study of the height of mixed-Poisson branching processes. In the second step, we ensure that we can take \(\lambda =0\), while in the third step, we study the survival probability of such critical infinite-variance branching processes. In the fourth and final step, we prove the claim.

Comparison to mixed-Poisson branching processes. Let \(\mathscr {C}_{{{\mathrm{res}}}}(i)\) be the cluster of i in the (restricted) random graph on the vertex set \([n]{\setminus }[i-1]\) with edge probabilities \(q_{k\ell }(\varvec{w}(\lambda ))\) for \(k,\ell \in [n]{\setminus } [i-1]\), where \(q_{k\ell }(\varvec{w}(\lambda ))\) is as in (1.1).

Note that the event \(E_n^c\) implies the existence of \(i>N\) such that the following happens: (a) The diameter of the component of i in \(\mathrm{NR}_n(\varvec{w}(\lambda )){\setminus }[N]\) is bigger than \(\varepsilon n^{\eta }\). (b) No \(j\in \left\{ N+1,\ldots , i-1\right\} \) belongs to the component of i in \(\mathrm{NR}_n(\varvec{w}(\lambda )){\setminus }[N]\). In particular, \({{\mathrm{diam}}}(\mathscr {C}_{{{\mathrm{res}}}}(i))\ge \varepsilon n^{\eta }\) for this i. Thus,

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(E_n^c)\le \sum _{i>N} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{diam}}}(\mathscr {C}_{{{\mathrm{res}}}}(i))>\varepsilon n^{\eta }\right) . \end{aligned}$$
(6.4)

Now the random graph \(\mathrm{NR}_n(\varvec{w}(\lambda ))\) restricted to \([n]{\setminus } [i-1]\) is the Norros-Reittu random graph \({{\mathrm{NR}}}_n(\varvec{w}^{(i)}(\lambda ))\), where \(\varvec{w}^{(i)}(\lambda )=(w_j^{(i)}(\lambda ):j\in [n])\), \(w_j^{(i)}(\lambda )=w_j(\lambda )\ell _n^{(i)}/\ell _n\) for \(j\in [n]{\setminus } [i-1]\) and \(w_j^{(i)}(\lambda )=0\) for \(j\in [i-1]\), and \(\ell _n^{(i)}=\sum _{k=i}^n w_k\). Indeed, this follows from the simple observation

$$\begin{aligned} \frac{w_k^{(i)}(\lambda )w_{\ell }^{(i)}(\lambda )}{\sum _{r=i}^n w_{r}^{(i)}(\lambda )} =\left( 1+\frac{\lambda }{n^{\eta }}\right) \frac{w_k w_{\ell }}{\ell _n}. \end{aligned}$$

Write \(W_n^{(i)}(\lambda )\) for a random variable whose distribution is given by \((n-i+1)^{-1}\sum _{j=i}^{n}\delta _{w_j^{(i)}(\lambda )}\), and for any non-negative random variable X with \({{\mathrm{\mathbb {E}}}}X>0\), let \(X^\circ \) be the random variable having the size-biased distribution given by

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(X^\circ \le x)={{\mathrm{\mathbb {E}}}}(X\mathbbm {1}_{\left\{ X\le x\right\} })/{{\mathrm{\mathbb {E}}}}X. \end{aligned}$$
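For instance, if \(W\) takes the values 1 and 3 with probability 1/2 each, then \({{\mathrm{\mathbb {E}}}}W=2\) and

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(W^\circ =1)=\frac{1\cdot \frac{1}{2}}{2}=\frac{1}{4},\qquad {{\mathrm{\mathbb {P}}}}(W^\circ =3)=\frac{3\cdot \frac{1}{2}}{2}=\frac{3}{4}; \end{aligned}$$

size-biasing shifts mass towards larger values, reflecting the fact that an exploration reaches a vertex with probability proportional to its weight.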

We will use the following comparison to a mixed-Poisson branching process:

Lemma 6.4

(Domination by a mixed-Poisson branching process) Fix \(i\in [n]\) and consider \({{\mathrm{NR}}}_n(\varvec{w}^{(i)}(\lambda ))\). Then, there exists a coupling of \(\mathscr {C}_{{{\mathrm{res}}}}(i)\) with a branching process in which the root has a \(\mathsf{Poi}(w_i^{(i)}(\lambda ))\) offspring distribution while every other vertex has a \(\mathsf{Poi}((W_n^{(i)}(\lambda ))^\circ )\) offspring distribution, such that in the breadth-first exploration of \(\mathscr {C}_{{{\mathrm{res}}}}(i)\) starting from i, each vertex \(v\in \mathscr {C}_{{{\mathrm{res}}}}(i)\) has at most as many children as the corresponding vertex in the branching process.

Proof

See [57, Proposition 3.1]. \(\square \)

It immediately follows from Lemma 6.4 that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{diam}}}(\mathscr {C}_{{{\mathrm{res}}}}(i))>\varepsilon n^{\eta }\right) \le {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}({\overline{T}_n}^{(i)}(\lambda ))>\varepsilon n^{\eta }/2\right) , \end{aligned}$$
(6.5)

where \(\overline{T}_n^{(i)}(\lambda )\) is a mixed-Poisson branching process tree whose root has a \(\mathsf{Poi}(w_i^{(i)}(\lambda ))\) offspring distribution and every other vertex has a \(\mathsf{Poi}((W_n^{(i)}(\lambda ))^\circ )\) offspring distribution. As before, \({{\mathrm{ht}}}(\mathbf {t})\) denotes the height of the tree \(\mathbf {t}\).

When \({{\mathrm{ht}}}(\overline{T}_n^{(i)}(\lambda ))>\varepsilon n^{\eta }/2\), at least one of the subtrees of the root needs to have height at least \(\varepsilon n^{\eta }/2\). Combining this observation with (6.4) and (6.5), we get

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(E_n^c)&\le \sum _{i>N} {{\mathrm{\mathbb {E}}}}\left[ \mathsf{Poi}(w_i^{(i)}(\lambda ))\right] {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}\left( T_n^{(i)}(\lambda )\right) \ge \varepsilon n^{\eta }/2\right) \nonumber \\&\le \sum _{i>N} w_i^{(i)}(\lambda ) {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}\left( T_n^{(i)}(\lambda )\right) \ge \varepsilon n^{\eta }/2\right) , \end{aligned}$$
(6.6)

where \(T_n^{(i)}(\lambda )\) is a branching process tree where every vertex has a \(\mathsf{Poi}((W_n^{(i)}(\lambda ))^\circ )\) offspring distribution.

We make the convention of writing \(T_n^{(i)}\), \(W_n^{(i)}\) etc. instead of \(T_n^{(i)}(0)\), \(W_n^{(i)}(0)\) etc. With this notation, it is easy to see that \(W_n^{(i)}(\lambda )\mathop {=}\limits ^{d}(1+\lambda n^{-\eta }) W_n^{(i)}\) and hence \((W_n^{(i)}(\lambda ))^\circ \mathop {=}\limits ^{d}(1+\lambda n^{-\eta }) (W_n^{(i)})^\circ \).

The survival probability of mixed-Poisson branching processes. We would like to compare our mixed-Poisson branching process with one whose offspring distribution does not depend on n. For this, we rely on the following two lemmas:

Lemma 6.5

(Mixed-Poisson branching processes of different parameters) Let \(T_n^{(i)}\) and \(T_n^{(i)}(\lambda )\) be as above. Assume further that \(\lambda \ge 0\). Then, for each \(k\ge 1\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}(T_n^{(i)}(\lambda ))\ge k\right) \le (1+\lambda n^{-\eta })^k \cdot {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}(T_n^{(i)})\ge k\right) . \end{aligned}$$

Proof

We follow [43, Proof of Lemma 3.4(1)]. Writing \(\delta =1+\lambda n^{-\eta }\), we note that we can obtain \(T_n^{(i)}\) as a subtree of \(T_n^{(i)}(\lambda )\) by killing every child independently with probability \(1-\delta ^{-1}\). Write \(\mathcal {A}\) for the event in which \({{\mathrm{ht}}}(T_n^{(i)}(\lambda ))\ge k\) and no vertex in the leftmost path of length k starting from the root in \(T_n^{(i)}(\lambda )\) is killed. Then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(\mathcal {A}) = \delta ^{-k} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}(T_n^{(i)}(\lambda ))\ge k\right) . \end{aligned}$$

Indeed, the probability of the leftmost path surviving is precisely \(1/\delta ^k\). To finish the proof, note that \(\mathcal {A}\) implies \({{\mathrm{ht}}}(T_n^{(i)})\ge k\), so that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}(T_n^{(i)})\ge k\right) \ge {{\mathrm{\mathbb {P}}}}(\mathcal {A})=\delta ^{-k} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}(T_n^{(i)}(\lambda ))\ge k\right) , \end{aligned}$$

which is the desired inequality. \(\square \)

Lemma 6.6

(Stochastic bound by n-independent variable) Under Assumption 1.1, the random variable \((W_n^{(i)})^\circ \) is stochastically upper bounded by \(W^{\circ }\) where \(W\sim F\), i.e., \((W_n^{(i)})^\circ \mathop {\le }\limits ^{\mathrm {st}} W^{\circ }\).

Proof

First we make the following elementary observation: if \(a_1, a_2, b_1, b_2\) are positive numbers such that

$$\begin{aligned} \frac{a_1}{b_1}\le \frac{a_2}{b_2},\ \text { then }\ \frac{a_1}{b_1}\le \frac{a_1+a_2}{b_1+b_2}\le \frac{a_2}{b_2}. \end{aligned}$$
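Both parts follow by cross-multiplication (all quantities being positive); for the left inequality, for example,

$$\begin{aligned} \frac{a_1}{b_1}\le \frac{a_1+a_2}{b_1+b_2}\iff a_1(b_1+b_2)\le b_1(a_1+a_2)\iff a_1b_2\le a_2b_1\iff \frac{a_1}{b_1}\le \frac{a_2}{b_2}. \end{aligned}$$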

Repeated application of the above will yield the following simple inequality: if \(\left\{ a_n\right\} _{n\ge 1}\) and \(\left\{ b_n\right\} _{n\ge 1}\) are sequences of positive numbers satisfying

$$\begin{aligned} \frac{a_1}{b_1}\le \frac{a_2}{b_2}\le \frac{a_3}{b_3}\le \ldots ,\ \text { then }\ \frac{a_1}{b_1}\le \frac{a_1+a_2}{b_1+b_2}\le \frac{a_1+a_2+a_3}{b_1+b_2+b_3}\le \ldots . \end{aligned}$$
(6.7)

Recall that \(\iota \) denotes the leftmost point of the support of F, and note that from (1.2) it follows that \(\int _{w_j}^{\infty }f=j/n\), \(j=1, 2, \ldots , n\) (note also that \(w_n=\iota \)). Define the function \(h_n: [\iota ,w_1)\rightarrow (\iota ,\infty )\) by \( \int _y^{h_n(y)}f=1/n. \) This immediately implies

$$\begin{aligned} f(h_n(y))h_n'(y)=f(y). \end{aligned}$$
(6.8)

Let \(g_n: [\iota ,w_1)\rightarrow (0,\infty )\) be given by

$$\begin{aligned} g_n(y)=\frac{y}{\int _y^{h_n(y)}uf(u)\ du}. \end{aligned}$$

A direct computation and an application of (6.8) yields

$$\begin{aligned} \bigg (\int _y^{h_n(y)}uf(u)\ du\bigg )^2 g_n'(y)=\int _y^{h_n(y)}uf(u)\ du-yf(y)\left( h_n(y)-y\right) . \end{aligned}$$

Since uf(u) is non-increasing on \([\iota , \infty )\) under Assumption 1.1, we conclude that \(g_n'(y)\le 0\) on \((\iota , w_1)\). Thus, \(g_n(\cdot )\) is non-increasing on \([\iota , w_1)\). Taking the limit \(y\uparrow w_1\) (note that \(h_n(y)\rightarrow \infty \) as \(y\uparrow w_1\)), we can extend \(g_n\) continuously by setting \(g_n(w_1)=w_1/(\int _{w_1}^{\infty }uf(u)\ du)\), and the monotonicity is preserved. Since \(w_n\le w_{n-1}\le \cdots \le w_1\), we conclude that \(g_n(w_1)\le g_n(w_2)\le \cdots \le g_n(w_n)\). Clearly \(h_n(w_j)=w_{j-1}\) for \(j=2,\ldots ,n\). Thus

$$\begin{aligned} \frac{w_1}{\int _{w_1}^{\infty }uf(u)\ du}\le \frac{w_2}{\int _{w_2}^{w_1}uf(u)\ du}\le \cdots \le \frac{w_n}{\int _{\iota }^{w_{n-1}}uf(u)\ du}. \end{aligned}$$

Now an application of (6.7) gives

$$\begin{aligned} \frac{w_1+w_2+\cdots +w_k}{\int _{w_k}^{\infty }uf(u)\ du}\le \frac{w_1+w_2+\cdots +w_n}{\int _{\iota }^{\infty }uf(u)\ du},\ \ \quad k=1,2,\ldots ,n, \end{aligned}$$

which is equivalent to

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\big ((W_n^{(i)})^\circ \ge w_k\big )&=\frac{w_1+w_2+\cdots +w_k}{w_1+w_2+\cdots +w_n}\le \frac{\int _{w_k}^{\infty }uf(u)\ du}{\int _{\iota }^{\infty }uf(u)\ du}\\&={{\mathrm{\mathbb {P}}}}\big (W^\circ \ge w_k\big ),\ \quad \ k=1,2,\ldots ,n. \end{aligned}$$

This concludes the proof. \(\square \)

We now study the survival probability of mixed-Poisson branching processes with infinite-variance offspring distributions:

Lemma 6.7

(Survival probability of infinite-variance MPBP) Let T denote a mixed-Poisson branching process tree with offspring distribution \(\mathsf{Poi}(W^{\circ })\). Then, there exists a constant \(c_{6.7}\) such that for all \(m\ge 1\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}({{\mathrm{ht}}}(T)\ge m)\le c_{6.7} m^{-1/(\tau -3)}. \end{aligned}$$

Proof

This is a well-known result; we sketch the proof briefly for completeness. Recall the following facts about \(W^\circ \): (a) \(\mathbb {E}[W^\circ ]=\nu =1\) and (b) as \(x\rightarrow \infty \), \({{\mathrm{\mathbb {P}}}}(W^\circ >x)=c x^{-(\tau -2)}(1+o(1))\). By the Otter-Dwass formula, which describes the distribution of the total progeny of a branching process (see [36] for the special case when the branching process starts with a single individual, [58] for the more general case, and [42] for a simple proof based on induction), we have

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(|T|=k)=\frac{1}{k} {{\mathrm{\mathbb {P}}}}\left( \sum _{i=1}^k X_i = k-1\right) , \end{aligned}$$

where \(X_i\) are i.i.d. random variables distributed as \(W^\circ \). By [41, Proposition 2.7], in our situation, \(\mathbb {P}(\sum _{i=1}^k X_i = k-1)\le ck^{-1/(\tau -2)}\), so that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(|T|= k)\le ck^{-(\tau -1)/(\tau -2)}\quad \text { and }\qquad {{\mathrm{\mathbb {P}}}}(|T|\ge k)\le ck^{-1/(\tau -2)}. \end{aligned}$$
(6.9)

Take \(k=m^{(\tau -2)/(\tau -3)}\) in the second inequality in (6.9) to get

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}({{\mathrm{ht}}}(T)\ge m)\le c m^{-1/(\tau -3)}+\mathbb {P}\left( {{\mathrm{ht}}}(T)\ge m, |T|\le m^{(\tau -2)/(\tau -3)}\right) , \end{aligned}$$

where |T| denotes the total number of vertices in T. We condition on the size |T| and write

$$\begin{aligned} \mathbb {P}\big ({{\mathrm{ht}}}(T)\ge m, |T|\le m^{(\tau -2)/(\tau -3)}\big )&=\sum _{k=1}^{m^{(\tau -2)/(\tau -3)}} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}(T)\ge m\ \big |\ |T|=k\right) {{\mathrm{\mathbb {P}}}}(|T|=k)\nonumber \\&\le c \sum _{k=1}^{m^{(\tau -2)/(\tau -3)}} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}(T)\ge m\ \big |\ |T|=k\right) k^{-\frac{\tau -1}{\tau -2}}. \end{aligned}$$
(6.10)

By [50, Theorem 4], there exists a \(\kappa >1\) such that, uniformly for \(u\ge 1\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}(T)\ge u k^{(\tau -3)/(\tau -2)}\ \big |\ |T|=k\right) \le {\mathrm {e}}^{-a u^{\kappa }}. \end{aligned}$$

Combining this with (6.10), we get

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}(T)\ge m, |T|\le m^{(\tau -2)/(\tau -3)}\right)&\le \sum _{k=1}^{m^{\frac{\tau -2}{\tau -3}}} \exp \left( -a \left( mk^{-\frac{\tau -3}{\tau -2}}\right) ^{\kappa }\right) k^{-\frac{\tau -1}{\tau -2}}\\&\quad =\Theta \left( m^{-1/(\tau -3)}\right) , \end{aligned}$$

as required. \(\square \)
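The bound in Lemma 6.7 is easy to probe numerically. The following Monte-Carlo sketch is our own illustration, not part of the proof: it assumes a Pareto form for \(W^\circ \) with tail index \(\tau -2\) and mean one, and the sampler, the population cap and all parameter values are arbitrary choices. It compares the empirical tail of the height with the predicted order \(m^{-1/(\tau -3)}\).

```python
import math
import random

def poisson(lam, rng):
    """Poisson sampler: inversion for small lam, normal approximation otherwise."""
    if lam > 30:
        return max(0, int(rng.gauss(lam, math.sqrt(lam)) + 0.5))
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def height(tau, max_gen, rng, cap=10**6):
    """Height (truncated at max_gen) of a branching-process tree in which every
    vertex has a Poi(W°) number of children; W° is taken Pareto with
    P(W° > x) = (x/x0)^{-(tau-2)} for x >= x0, x0 = (tau-3)/(tau-2),
    so that E[W°] = 1 (criticality).  The Pareto form is an assumption."""
    x0 = (tau - 3.0) / (tau - 2.0)
    gen, total = 1, 1
    for h in range(max_gen):
        nxt = sum(poisson(x0 * (1.0 - rng.random()) ** (-1.0 / (tau - 2.0)), rng)
                  for _ in range(gen))
        if nxt == 0:
            return h                 # extinction: the root sits at height 0
        total += nxt
        if total > cap:
            return max_gen           # runaway burst: declare the tree "tall"
        gen = nxt
    return max_gen

rng = random.Random(1)
tau, m, reps = 3.5, 20, 20_000
est = sum(height(tau, m, rng) >= m for _ in range(reps)) / reps
print(est, m ** (-1.0 / (tau - 3)))  # empirical tail vs. predicted order
```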

Proof of Proposition 6.3

Clearly

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}\left( T_n^{(i)}\right) \ge m\right)&\le {{\mathrm{\mathbb {E}}}}\left[ \left( W_n^{(i)}\right) ^\circ \right] {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}\left( T_n^{(i)}\right) \ge m-1\right) \\&=:\nu _n^{(i)}{{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}\left( T_n^{(i)}\right) \ge m-1\right) , \end{aligned}$$

where

$$\begin{aligned} \nu _n^{(i)}={{\mathrm{\mathbb {E}}}}\left[ \left( W_n^{(i)}\right) ^\circ \right] =\frac{\sum _{j\ge i} (w_j^{(i)})^2}{\sum _{j\ge i} w_j^{(i)}} =\left( \frac{\ell _n^{(i)}}{\ell _n}\right) \frac{\sum _{j\ge i} w_j^2}{\sum _{j\ge i} w_j}=\frac{\sum _{j\ge i} w_j^2}{\ell _n}. \end{aligned}$$

Iterating this \(\varepsilon n^{\eta }/4\) times, we get

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}\left( T_n^{(i)}\right) \ge \varepsilon n^{\eta }/2\right)&\le (\nu _n^{(i)})^{\varepsilon n^{\eta }/4} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}\left( T_n^{(i)}\right) \ge \varepsilon n^{\eta }/4\right) \nonumber \\&\le (\nu _n^{(i)})^{\varepsilon n^{\eta }/4}{{\mathrm{\mathbb {P}}}}\left( {{\mathrm{ht}}}\left( T\right) \ge \varepsilon n^{\eta }/4\right) \le (\nu _n^{(i)})^{\varepsilon n^{\eta }/4}\nonumber \\&\quad \times c_{6.7} \left( \frac{4}{\varepsilon }\right) ^{1/(\tau -3)}\frac{1}{n^{1/(\tau -1)}}, \end{aligned}$$
(6.11)

where the second inequality is a consequence of Lemma 6.6 and the last step follows from Lemma 6.7.

Substituting the estimate (6.11) into (6.6) leads to

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(E_n^c)\le c \varepsilon ^{-1/(\tau -3)} n^{-1/(\tau -1)}\left( 1+\frac{\max \left\{ \lambda , 0\right\} }{n^{\eta }}\right) ^{1+\varepsilon n^{\eta }/2} \sum _{i>N} w_i (\nu _n^{(i)})^{\varepsilon n^{\eta }/4} \end{aligned}$$
(6.12)

for some constant c. Here we have used Lemma 6.5 and the simple fact that \(w_i^{(i)}\le w_i\).

Next, note that it is an easy consequence of (1.3) that there exist constants \(c', c''>0\) such that for all \(i\in [n]\),

$$\begin{aligned} w_i \le c'\left( \frac{n}{i}\right) ^{1/(\tau -1)}\quad \text { and }\quad \sum _{j=1}^{i}w_j^2\ge c''\sum _{j=1}^{i}\left( \frac{n}{i}\right) ^{2/(\tau -1)}. \end{aligned}$$
(6.13)
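For intuition, here is a sketch of where (6.13) comes from, assuming the power-law tail \(1-F(x)=c_Fx^{-(\tau -1)}(1+o(1))\) as \(x\rightarrow \infty \) (an assumption on our part, consistent with the appearance of \(c_F\) in (6.16) below): since \(\int _{w_j}^{\infty }f=j/n\) by (1.2), as noted in the proof of Lemma 6.6,

$$\begin{aligned} \frac{j}{n}=1-F(w_j)=c_Fw_j^{-(\tau -1)}(1+o(1)),\qquad \text {so that }\quad w_j=\big (c_F(1+o(1))\big )^{1/(\tau -1)}\left( \frac{n}{j}\right) ^{1/(\tau -1)}. \end{aligned}$$

This gives the first bound in (6.13); the second follows from the matching lower bound on \(w_j\) together with \((n/j)^{2/(\tau -1)}\ge (n/i)^{2/(\tau -1)}\) for \(j\le i\).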

Further, [17, Lemma 2.2] implies that \(\nu _n^{(1)}<1\) for large n. Hence, for every \(i\ge 2\),

$$\begin{aligned} \nu _n^{(i)}=\nu _n^{(1)}-\frac{1}{\ell _n}\sum _{j=1}^{i-1}w_j^2 \le 1-C n^{-\eta } i^{\eta }\le \exp \left( -C n^{-\eta } i^{\eta }\right) \end{aligned}$$

for some \(C>0\). Here, we have used the second inequality in (6.13). Combining this estimate with (6.12) and the first inequality in (6.13), we end up with

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(E_n^c)\le C'\varepsilon ^{-1/(\tau -3)} \sum _{i>N} i^{-1/(\tau -1)}\exp \left( - C\varepsilon i^{\eta }/4\right) \end{aligned}$$

for some \(C'>0\). Taking \(N=\varepsilon ^{-\delta -1/\eta }\), we arrive at

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(E_n^c)&\le C'\varepsilon ^{-1/(\tau -3)}N^{-1/(\tau -1)} \sum _{i>N} \exp \left( - C\varepsilon i^{\eta }/4\right) \nonumber \\&\le C'\varepsilon ^{\delta /(\tau -1)}\sum _{k=0}^{\infty }\sum _{i=N2^k}^{N 2^{k+1}-1}\exp \left( - C\varepsilon i^{\eta }/4\right) \nonumber \\&\le C'\varepsilon ^{\delta /(\tau -1)}N\sum _{k=0}^{\infty }2^k\exp \left( - C\varepsilon (N2^k)^{\eta }/4\right) . \end{aligned}$$
(6.14)

Note that \(\varepsilon N^{\eta }=\varepsilon ^{-\delta \eta }\). Plugging this into (6.14), a little more work (sketched below) leads to (6.2). \(\square \)
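For completeness, here is one way to carry out this remaining estimate (a sketch; constants are not optimized). Write \(A:=C\varepsilon ^{-\delta \eta }/4\) and use \(2^{k\eta }\ge 1+k\eta \log 2\) to bound

$$\begin{aligned} N\sum _{k=0}^{\infty }2^k\exp \left( -A2^{k\eta }\right) \le N{\mathrm {e}}^{-A}\sum _{k=0}^{\infty }2^{k(1-A\eta )}\le 2N{\mathrm {e}}^{-A}, \end{aligned}$$

provided \(A\eta \ge 2\), i.e., for all \(\varepsilon \) small enough. Since \(N=\varepsilon ^{-\delta -1/\eta }\) grows only polynomially in \(1/\varepsilon \), the right side of (6.14) is then at most \(c_{\delta }\exp (-C\varepsilon ^{-\delta \eta }/8)\) for \(\varepsilon \) small, and enlarging \(c_{\delta }\) handles the remaining values of \(\varepsilon \); this is (6.2).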

6.2 Proof of global lower-mass bound

In this section, we complete the proof of Lemma 6.1. We start with some preliminaries:

Lemma 6.8

(Weight of size-biased reordering) Let \(\pi _v(1)=v\) and let \((\pi _v(i): i\in [n]{\setminus }\{1\})\) be a size-biased reordering of \([n]{\setminus } \{v\}\), where the size of vertex \(v'\) is proportional to \(w_{v'}\) for \(v'\in [n]{\setminus }\left\{ v\right\} \). Then, for every sequence \(k=k_n=o(n)\), there exists \(J>0\) such that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\bigg (\exists v:\sum _{i=1}^k w_{\pi _v(i)}\le k/2\bigg )\le n{\mathrm {e}}^{-Jk}. \end{aligned}$$

Proof

See [17, Proof of Lemma 5.1]. \(\square \)

Recall the definitions of \(\eta \) and \(\rho \) from (6.1), and recall that for \(v\in [n]\), \(B(v,\delta )\) denotes the intrinsic ball (in \(\mathrm{NR}_n(\varvec{w}(\lambda ))\)) of radius \(\delta n^{\eta }\) around v. We will use the following bound on the weight of balls:

Lemma 6.9

(Weights of balls around high-weight vertices cannot be too small) For every \(\varepsilon >0\) and \(i\ge 1\), there exist \(n_{i, \varepsilon }\) large and \(\delta _{\varepsilon ,i}>0\) such that for all \(n\ge n_{i,\varepsilon }\) and \(\delta \in (0, \delta _{\varepsilon ,i}]\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \sum _{j\in B(i,\delta )}w_j \le \left( \frac{c_{F}}{2i}\right) ^{1/(\tau -1)} \frac{\delta n^{\rho }}{2}\right) \le n\exp \left( -\frac{J \delta n^{\rho }}{i^{1/(\tau -1)}}\right) +\frac{\varepsilon }{2^{i}}. \end{aligned}$$
(6.15)

Proof

We rely on a cluster exploration used in [17], which we describe next. We denote by \((Z_l(i))_{l\ge 0}\) the exploration process of \(\mathscr {C}(i)\), the cluster containing i, in the breadth-first search started from i, where \(Z_0(i)=1\) and \(Z_1(i)\) denotes the number of potential neighbors of the initial vertex i. The variable \(Z_l(i)\) is interpreted as the number of potential neighbors of the first l explored potential vertices in the cluster whose own neighborhoods have not yet been explored. We explore by taking one vertex off the ‘stack’ of size \(Z_l(i)\), drawing its mark and checking whether it is a real vertex, and then drawing its number of potential neighbors. Thus, we set \(Z_0(i)=1\), \(Z_1(i)=\mathsf{Poi}(w_i)\), and note that, for \(l\ge 2\), \(Z_l(i)\) satisfies the recursion relation

$$\begin{aligned} Z_l(i)=Z_{l-1}(i)+X_l-1, \end{aligned}$$

where \(X_l\) denotes the number of potential neighbors of the lth potential vertex that is explored, and \(X_1=X_1(i)=\mathsf{Poi}(w_i)\). More precisely, when we explore the lth potential vertex, we start by drawing its mark \(M_l\) in an i.i.d. way with distribution

$$\begin{aligned} \mathbb {P}(M=m)=w_m/\ell _n, \quad 1 \le m \le n. \end{aligned}$$

When we have already explored a vertex with the same mark as the one drawn, we turn the status of the vertex being checked to inactive: the potential vertex does not become a real vertex, and we proceed to the next potential vertex. When, instead, it receives a mark that we have not yet seen, the potential vertex becomes a real vertex, its mark \(M_l\in [n]\) indicating to which vertex in [n] the lth explored vertex corresponds, so that \(M_l\in \mathscr {C}(i)\). We then draw \(X_l=\mathsf{Poi}(w_{M_l})\), the number of potential vertices incident to the real vertex \(M_l\). Again, upon exploration, these potential vertices may in turn become real vertices, precisely when their marks correspond to vertices in [n] that have not yet appeared in the cluster exploration. We call this procedure of drawing a mark for a potential vertex, to investigate whether it corresponds to a real vertex, a vertex check. Let

$$\begin{aligned} \mathscr {Z}_t^{(n)}(i)=n^{-1/(\tau -1)} Z_{\lceil tn^{\rho }\rceil }(i)\ \text { for }\ t>0. \end{aligned}$$

Then, by imitating the techniques used in the proof of [17, Theorem 2.4], we obtain

$$\begin{aligned} (\mathscr {Z}_t^{(n)}(i))_{t>0}\mathop {\longrightarrow }\limits ^{d}(\mathscr {S}_t(i))_{t>0}. \end{aligned}$$

([17, Theorem 2.4] states the result for \(i=1\). However the exact same proof goes through for any \(i\ge 2\)). The limiting process \((\mathscr {S}_t(i))_{t>0}\) is defined as follows: Let

$$\begin{aligned} a=c_{F}^{1/(\tau -1)}/\mathbb {E}[W]\quad \text { and }\quad b=b(i)=(c_{F}/i)^{1/(\tau -1)}. \end{aligned}$$
(6.16)

We let \((\mathscr {I}_j(t))_{j\ge 1}\) denote independent increasing indicator processes defined by

$$\begin{aligned} \mathscr {I}_j(s)=\mathbbm {1}\left\{ \mathsf{Exp}(a j^{-1/(\tau -1)})\in [0,s]\right\} ,\quad s\ge 0, \end{aligned}$$
(6.17)

so that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \mathscr {I}_j(s)=0 \ \forall s\in [0,t]\right) =\exp \left( -at/j^{1/(\tau -1)}\right) . \end{aligned}$$

Here \(\big (\mathsf{Exp}(a j^{-1/(\tau -1)})\big )_{j\ge 1}\) are independent exponential random variables with rates \(a j^{-1/(\tau -1)}\). Then we define

$$\begin{aligned} \mathscr {S}_t(i)=b-abt+ct+\sum _{j\ne i}^{\infty } \frac{b}{j^{1/(\tau -1)}}\left[ \mathscr {I}_j(t)- \frac{at}{j^{1/(\tau -1)}}\right] \end{aligned}$$
(6.18)

for all \(t\ge 0\), where \(c=\lambda +\zeta -ab\) and \(\zeta \) is as in (2.12). We call \((\mathscr {S}_t(i))_{t\ge 0}\) a thinned Lévy process.
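The process \(\mathscr {S}_t(i)\) can be simulated directly from (6.18). The following sketch is our own illustration: the function name, the truncation level J, the time grid and all default parameter values are arbitrary choices, and truncating the compensated sum at \(j\le J\) is an approximation (the omitted compensated tail is small).

```python
import random

def thinned_levy_path(i, tau, c_F, EW, lam, zeta,
                      T=5.0, steps=200, J=10_000, seed=0):
    """Approximate sample path of the thinned Levy process S_t(i) in (6.18),
    with the compensated sum truncated at j <= J."""
    rng = random.Random(seed)
    a = c_F ** (1.0 / (tau - 1.0)) / EW          # as in (6.16)
    b = (c_F / i) ** (1.0 / (tau - 1.0))         # as in (6.16)
    c = lam + zeta - a * b
    # each indicator I_j switches on at an Exp(a * j^{-1/(tau-1)}) time
    terms = [(rng.expovariate(a * j ** (-1.0 / (tau - 1.0))),
              j ** (-1.0 / (tau - 1.0)))
             for j in range(1, J + 1) if j != i]
    path = []
    for k in range(steps + 1):
        t = T * k / steps
        jumps = sum(b * r * ((clock <= t) - a * t * r) for clock, r in terms)
        path.append((t, b - a * b * t + c * t + jumps))
    return path
```

Hitting times such as \(\mathscr {H}_{\mathscr {S}(i)}(u)\) below can then be read off approximately from the returned grid path.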

Let \(\mathscr {H}_n^{(i)}(u)\) denote the hitting time of u of the process \((\mathscr {Z}_t^{(n)}(i))_{t> 0}\). Then, by [17, Corollary 3.4], \(\mathscr {H}_n^{(i)}(u)\mathop {\longrightarrow }\limits ^{d}\mathscr {H}_{\mathscr {S}(i)}(u),\) the hitting time of u of the process \((\mathscr {S}_t(i))_{t>0}\). This implies the existence of a \(B_{\varepsilon ,i}\) (independent of n) and an integer \(n_{i,\varepsilon }\) such that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \mathscr {H}_n^{(i)}\big ((c_{F}/2i)^{1/(\tau -1)}\big )\le B_{\varepsilon ,i}\right) \le \varepsilon 2^{-i}\ \quad \text { for }\ n\ge n_{i,\varepsilon }, \end{aligned}$$
(6.19)

since the limiting process \((\mathscr {S}_t(i))_{t>0}\) starts from \((c_{F}/i)^{1/(\tau -1)}\) and takes a positive amount of time to reach \((c_{F}/2i)^{1/(\tau -1)}\).

Let \(|B(i,r)|\) denote the number of vertices in \(B(i,r)\). Let \(\delta _{\varepsilon ,i}\) be so small that

$$\begin{aligned} (c_{F}/2i)^{1/(\tau -1)}\delta _{\varepsilon ,i}<B_{\varepsilon ,i}. \end{aligned}$$
(6.20)

Then we claim that for all \(\delta \in (0,\delta _{\varepsilon ,i}]\),

$$\begin{aligned} \mathbb {P}\Big (|B(i,\delta )|\le (c_{F}/2i)^{1/(\tau -1)} \delta n^{\rho }\Big ) \le \mathbb {P}\Big (\mathscr {H}_n^{(i)}\big ((c_{F}/2i)^{1/(\tau -1)}\big )\le B_{\varepsilon ,i}\Big ). \end{aligned}$$
(6.21)

That (6.21) holds can be seen as follows. For \(|B(i,\delta )|\le (c_{F}/2i)^{1/(\tau -1)} \delta n^{\rho }\) to occur, there has to exist some \(j\in [1, \delta n^{\eta }]\) such that the number of vertices at distance j from i is smaller than \((c_{F}/2i)^{1/(\tau -1)} \delta n^{\rho }/(\delta n^{\eta })\), i.e.,

$$\begin{aligned} \min _{1\le j\le \delta n^{\eta }}|\partial B(i, jn^{-\eta })| \le (c_{F}/2i)^{1/(\tau -1)}n^{1/(\tau -1)}. \end{aligned}$$
(6.22)

Now the number of vertices at distance j from i is precisely the number of vertices in generation j of the breadth-first exploration process, and hence this number appears, suitably rescaled, as a value of the process \(\mathscr {Z}_t^{(n)}(i)\). Thus, (6.22) implies that \((\mathscr {Z}_t^{(n)}(i))_{t>0}\) has to hit \((c_{F}/2i)^{1/(\tau -1)}\) before we have finished exploring up to generation \(\delta n^{\eta }\), i.e., we must have that

$$\begin{aligned} \mathscr {H}_n^{(i)}\big ((c_{F}/2i)^{1/(\tau -1)}\big ) \le \frac{|B(i,\delta )|}{n^{\rho }}\le \left( \frac{c_{F}}{2i}\right) ^{1/(\tau -1)}\delta <B_{\varepsilon , i}, \end{aligned}$$

where the last inequality holds by (6.20) and because \(\delta \in (0, \delta _{\varepsilon ,i}]\).

Combining (6.19) and (6.21), we conclude that for all \(\delta \in (0,\delta _{\varepsilon ,i}]\) and \(n\ge n_{i,\varepsilon }\),

$$\begin{aligned} \mathbb {P}\Big (|B(i,\delta )|\le (c_{F}/2i)^{1/(\tau -1)} \delta n^{\rho }\Big ) \le \varepsilon 2^{-i}. \end{aligned}$$
(6.23)

This explains the second term in (6.15).

To see what happens when \(|B(i,\delta )|\ge (c_{F}/2i)^{1/(\tau -1)} \delta n^{\rho }\), recall that the vertices appear in a size-biased fashion in our exploration process. Hence

$$\begin{aligned}&{{\mathrm{\mathbb {P}}}}\left( \sum _{j\in B(i,\delta )}w_j\le \left( \frac{c_{F}}{2i}\right) ^{1/(\tau -1)}\frac{\delta n^{\rho }}{2}, |B(i,\delta )|\ge \left( \frac{c_{F}}{2i}\right) ^{1/(\tau -1)} \delta n^{\rho }\right) \nonumber \\&\quad \le {{\mathrm{\mathbb {P}}}}\left( \sum _{j=1}^{\delta n^{\rho }\left( c_{F}/(2i)\right) ^{1/(\tau -1)}} w_{\pi _i(j)}\le \left( \frac{c_{F}}{2i}\right) ^{1/(\tau -1)}\frac{\delta n^{\rho }}{2}\right) \nonumber \\&\quad \le n\exp \left( -\frac{J \delta n^{\rho }}{i^{1/(\tau -1)}}\right) , \end{aligned}$$
(6.24)

by Lemma 6.8. Combining (6.23) and (6.24) proves the claim. \(\square \)

Lemma 6.10

For \(v\in [n]\), let \(\mathscr {C}(v)\) denote the component of v in \(\mathrm{NR}_n(\varvec{w}(\lambda ))\). Then for every fixed \(i\ge 1\) and \(\varepsilon _1,\varepsilon _2>0\), there exist \(\xi =\xi _{\varepsilon _1,\varepsilon _2}^{(i)}>0\) and an integer \({{\bar{n}}}={{{\bar{n}}}_{\varepsilon _1,\varepsilon _2}}^{(i)}\) such that

$$\begin{aligned} \mathbb {P}\left( \min _{v\in \mathscr {C}(i)} \Big (\sum _{j\in B(v, \varepsilon _1)}w_j\Big )\le \xi n^{\rho }\right) \le \varepsilon _2\ \quad \text { for }\ n\ge {\bar{n}}. \end{aligned}$$

Proof

Recall Proposition 6.3, and choose \(N_{\varepsilon _1,\varepsilon _2}\) and \(n_{\varepsilon _1,\varepsilon _2}\) large so that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{diam}}}\left( \mathrm{NR}_n(\varvec{w}(\lambda )){\setminus } [N_{\varepsilon _1,\varepsilon _2}]\right) \le \varepsilon _1 n^{\eta }/2\right) \ge 1-\varepsilon _2 \end{aligned}$$
(6.25)

for all \(n\ge n_{\varepsilon _1, \varepsilon _2}\). Let

$$\begin{aligned} F_1&= \left\{ {{\mathrm{diam}}}(\mathrm{NR}_n(\varvec{w}(\lambda )){\setminus } [N_{\varepsilon _1,\varepsilon _2}]) \le \varepsilon _1 n^{\eta }/2\right\} \quad \text { and }\\ F_2&= \left\{ {{\mathrm{diam}}}(\mathscr {C}(i))>\varepsilon _1 n^{\eta }/2\right\} . \end{aligned}$$

Clearly, on the set \(F_1\cap F_2\),

$$\begin{aligned} \min _{v\in \mathscr {C}(i)}\left\{ \sum _{j\in B(v, \varepsilon _1 )}w_j\right\} \ge \min _{k\in [N_{\varepsilon _1,\varepsilon _2}]} \left\{ \sum _{j\in B(k, \varepsilon _1/2)}w_j\right\} . \end{aligned}$$
(6.26)

Recall the definition of \(\delta _{\varepsilon ,i}\) in (6.20), and let

$$\begin{aligned} \Delta _{\varepsilon _1,\varepsilon _2}=\tfrac{1}{2}\left( \varepsilon _1\wedge \delta _{\varepsilon _2,1}\wedge \cdots \wedge \delta _{\varepsilon _2,N_{\varepsilon _1,\varepsilon _2}}\right) . \end{aligned}$$

Then (6.26) implies

$$\begin{aligned} \min _{v\in \mathscr {C}(i)} \sum _{j\in B(v, \varepsilon _1)}w_j \ge \min _{k\in [N_{\varepsilon _1,\varepsilon _2}]}\sum _{j\in B(k, \Delta _{\varepsilon _1,\varepsilon _2})}w_j \end{aligned}$$

on the set \(F_1\cap F_2\). Hence, for all \(n\ge n_{\varepsilon _1,\varepsilon _2}\),

$$\begin{aligned}&\mathbb {P}\left( F_1\cap F_2\cap \Big \{\min _{v\in \mathscr {C}(i)} \Big \{\sum _{j\in B(v, \varepsilon _1)}w_j\Big \}\le \left( \frac{c_{F}}{2N_{\varepsilon _1,\varepsilon _2}}\right) ^{1/(\tau -1)}\frac{\Delta _{\varepsilon _1,\varepsilon _2}n^{\rho }}{2}\Big \}\right) \nonumber \\&\quad \qquad \le \sum _{k=1}^{N_{\varepsilon _1,\varepsilon _2}} \mathbb {P}\Big ( \sum _{j\in B(k, \Delta _{\varepsilon _1,\varepsilon _2})}w_j\le \Big (\frac{c_{F}}{2N_{\varepsilon _1,\varepsilon _2}}\Big )^{1/(\tau -1)} \frac{\Delta _{\varepsilon _1,\varepsilon _2}}{2} n^{\rho }\Big )\nonumber \\&\quad \qquad \le \sum _{k=1}^{N_{\varepsilon _1,\varepsilon _2}} \left( n\exp \left( -\frac{J \Delta _{\varepsilon _1,\varepsilon _2} n^{\rho }}{N_{\varepsilon _1,\varepsilon _2}^{1/(\tau -1)}}\right) +\frac{\varepsilon _2}{2^{k}}\right) \nonumber \\&\quad \qquad \le n^2\exp \left( -\frac{J \Delta _{\varepsilon _1,\varepsilon _2}n^{\rho }}{N_{\varepsilon _1,\varepsilon _2}^{1/(\tau -1)}}\right) +\varepsilon _2, \end{aligned}$$
(6.27)

where the second inequality is a consequence of Lemma 6.9.

Next, on the set \(F_1\cap F_2^c\),

$$\begin{aligned} \sum _{j\in B(v,\varepsilon _1)}w_j=\sum _{j\in \mathscr {C}(i)}w_j \end{aligned}$$

for any \(v\in \mathscr {C}(i)\). Further, by [17, Theorem 1.4], \(n^{-\rho }\sum _{j\in \mathscr {C}(i)}w_j\) converges in distribution to a positive random variable. Hence, there exists \(\xi _{\varepsilon _2}^{(i)}>0\) such that

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\ \mathbb {P}\left( F_1\cap F_2^c\cap \Big \{\min _{v\in \mathscr {C}(i)} \Big (\sum _{j\in B(v, \varepsilon _1)}w_j\Big )\le \xi _{\varepsilon _2}^{(i)} n^{\rho }\Big \}\right) \nonumber \\&\quad \le \limsup _{n\rightarrow \infty }\ {{\mathrm{\mathbb {P}}}}\left( \sum _{j\in \mathscr {C}(i)}w_j\le \xi _{\varepsilon _2}^{(i)} n^{\rho }\right) \le \varepsilon _2. \end{aligned}$$
(6.28)

The result follows upon combining (6.25), (6.27) and (6.28). \(\square \)

We are now ready for the proof of Lemma 6.1:

Proof of Lemma 6.1

Using Proposition 6.2, for any \(i\ge 1\) and \(\varepsilon >0\), we can choose K such that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \mathscr {C}_i(\lambda )\cap [K]=\emptyset \right) \le \varepsilon /2. \end{aligned}$$
(6.29)

By Lemma 6.10, we can choose \(\xi >0\) and an integer \({\bar{n}}\) such that

$$\begin{aligned} \mathbb {P}\left( \min _{v\in \mathscr {C}(k)} \Big (\sum _{j\in B(v, \delta )}w_j\Big )\le \xi n^{\rho }\right) \le \varepsilon /(2K) \end{aligned}$$
(6.30)

for all \(n\ge {\bar{n}}\) and \(k\in [K]\). Combining (6.29) and (6.30), we see that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \left( \mathfrak {m}_i^{(n)}(\delta )\right) ^{-1}>1/\xi \right) \le \varepsilon \ \text { for }\ n\ge {\bar{n}}, \end{aligned}$$

which yields the desired tightness. \(\square \)

7 Proofs: fractal dimension

In this section, we prove the assertions about the box-counting dimension. Throughout this section, \(C,C'\) will denote universal constants whose values may change from line to line.

We first prove analogous statements for the component \(\mathscr {C}(j)\) of a fixed vertex j. Consider \(\mathscr {C}(1)\), and as usual, view \(\mathscr {C}(1)\) as a metric measure space via the graph distance and by assigning mass \(p_v:=w_v/(\sum _{\ell \in \mathscr {C}(1)}w_{\ell })\) to vertex \(v\in \mathscr {C}(1)\). Set \(\mathbf {p}:=(p_v: v\in \mathscr {C}(1))\). Now note that conditional on the vertex set of \(\mathscr {C}(1)\), \(\mathscr {C}(1)\) has the same distribution as the graph \({\tilde{\mathscr {G}}}_m(a,\mathbf {p})\), where \(m=|\mathscr {C}(1)|\) and \(a=(1+\lambda n^{-\eta })(\sum _{j\in \mathscr {C}(1)} w_j)^2/\ell _n\). Using [17, Proposition 3.7] and [17, Lemma 3.1], it is easy to verify that the conditions in Assumption 4.4 hold with this choice of a and \(\mathbf {p}\). Thus, by Theorem 4.5, \(n^{-\eta }\mathscr {C}(1)\) converges in Gromov-weak topology to a limiting space that we denote by \(\mathscr {M}(1)\). Further, the sequence \(\left\{ n^{-\eta }\mathscr {C}(1)\right\} _{n\ge 1}\) satisfies the global lower mass-bound property by Lemma 6.10. Hence,

$$\begin{aligned} n^{-\eta }\mathscr {C}(1)\mathop {\longrightarrow }\limits ^{d}\mathscr {M}(1) \end{aligned}$$
(7.1)

with respect to the Gromov-Hausdorff-Prokhorov topology. By similar arguments, we can show that \(n^{-\eta }\mathscr {C}(j)\mathop {\longrightarrow }\limits ^{d}\mathscr {M}(j)\) with respect to the Gromov-Hausdorff-Prokhorov topology for any \(j\ge 1\) and an appropriate (random) compact metric measure space \(\mathscr {M}(j)\). In Sect. 7.1, we identify the upper box-counting dimension, and in Sect. 7.2 the lower box-counting dimension.

7.1 Upper bound on the Minkowski dimension

The key ingredient in the proof is the following proposition:

Proposition 7.1

Write \(\pi =(\tau -2)/(\tau -3)\). Then for every \(j\ge 1\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{\overline{dim}}}}(\mathscr {M}(j))>\pi \right) =0. \end{aligned}$$

Proof

For simplicity, we work with \(j=1\). The proof is similar for any \(j\ge 2\). Recall that \(\mathscr {N}(\mathscr {M}, \delta )\) denotes the minimum number of open balls of radius \(\delta \) needed to cover the compact space \(\mathscr {M}\). Write

$$\begin{aligned} \mathfrak {N}_{(\infty )}(\varepsilon ):=\mathscr {N}(\mathscr {M}(1), \varepsilon )\qquad \text { and } \qquad \mathfrak {N}_{(n)}(\varepsilon ):=\mathscr {N}(\mathscr {C}(1), \varepsilon n^{\eta }). \end{aligned}$$
(7.2)

Since the convergence in (7.1) holds with respect to the Gromov-Hausdorff topology, for every \(x, \varepsilon >0\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \mathfrak {N}_{(\infty )}(2\varepsilon )>x\right) \le \limsup _n\ {{\mathrm{\mathbb {P}}}}\left( \mathfrak {N}_{(n)}(\varepsilon )>x\right) . \end{aligned}$$
(7.3)

Fix an arbitrary \(\delta >0\) and, for any \(\varepsilon >0\), define

$$\begin{aligned} x_{\varepsilon }&:=\varepsilon ^{-\delta -\pi },\qquad \ u_{\varepsilon }:=|\log \varepsilon |,\qquad \ \delta ':=\frac{\delta }{2}\left( \frac{\tau -1}{\tau -2}\right) ,\qquad \text { and }\qquad \nonumber \\ N(\varepsilon )&=\varepsilon ^{-\delta '-1/\eta }. \end{aligned}$$
(7.4)

Let \(E_n\) be the event defined in (6.3). Clearly, on the event \(E_n\cap \left\{ \mathfrak {N}_{(n)}(\varepsilon )>x_{\varepsilon }\right\} \), any \(v\in \mathscr {C}(1)\) is within distance \(\varepsilon n^{\eta }\) from a point in \(\mathscr {C}(1)\cap [N(\varepsilon )]\). Hence,

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \mathfrak {N}_{(n)}(\varepsilon )>x_{\varepsilon }\right) \le {{\mathrm{\mathbb {P}}}}(E_n^c)+{{\mathrm{\mathbb {P}}}}\left( |\mathscr {C}(1)\cap [N(\varepsilon )]|\ge x_{\varepsilon }\right) , \end{aligned}$$
(7.5)

and, by Proposition 6.3,

$$\begin{aligned} \limsup _n\ {{\mathrm{\mathbb {P}}}}(E_n^c)\le c_{\delta '}\exp \left( -C/\varepsilon ^{\delta '\eta }\right) . \end{aligned}$$
(7.6)

It remains to bound \({{\mathrm{\mathbb {P}}}}\left( |\mathscr {C}(1)\cap [N(\varepsilon )]|\ge x_{\varepsilon }\right) \). To this end, note that by [17, Proposition 3.7],

$$\begin{aligned} |\mathscr {C}(1)\cap [N(\varepsilon )]|\mathop {\longrightarrow }\limits ^{d}\sum _{q=1}^{N(\varepsilon )}\mathscr {I}_q\left( \mathscr {H}_{\mathscr {S}(1)}(0)\right) , \end{aligned}$$
(7.7)

where \(\mathscr {I}_q(\cdot )\) and \(\mathscr {H}_{\mathscr {S}(1)}(\cdot )\) are as defined around (6.18). Further, [45, Theorem 1.4] implies the existence of positive constants \(A_1\) and \(A_2\) such that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \mathscr {H}_{\mathscr {S}(1)}(0)>u_{\varepsilon }\right) \le A_1\exp (-A_2 u_{\varepsilon }^{\tau -1}). \end{aligned}$$
(7.8)

Combining (7.5), (7.6), (7.7) and (7.8), we conclude that

$$\begin{aligned} \limsup _n\ {{\mathrm{\mathbb {P}}}}\left( \mathfrak {N}_{(n)}(\varepsilon )>x_{\varepsilon }\right)&\le c_{\delta '}\exp \left( -C\varepsilon ^{-\delta '\eta }\right) +A_1\exp (-A_2 u_{\varepsilon }^{\tau -1})\nonumber \\&\quad +{{\mathrm{\mathbb {P}}}}\left( \sum _{q=1}^{N(\varepsilon )}\mathscr {I}_q\left( u_{\varepsilon }\right) \ge x_{\varepsilon }\right) . \end{aligned}$$
(7.9)

Now \(\mathscr {I}_q\left( u_{\varepsilon }\right) \), \(q\ge 1\), are independent Bernoulli random variables with

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(\mathscr {I}_q\left( u_{\varepsilon }\right) =1)=1-\exp \left( -au_{\varepsilon }/q^{1/(\tau -1)}\right) =:p_q, \end{aligned}$$

where a is as in (6.16). Choose \(s>0\) small so that \({\mathrm {e}}^s-1\le 2s\). Clearly

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}\exp \left( s\mathscr {I}_q\left( u_{\varepsilon }\right) \right)&=1+p_q\left( {\mathrm {e}}^s-1\right) \le \exp \left( p_q\left( {\mathrm {e}}^s-1\right) \right) \le \exp (2sp_q). \end{aligned}$$

Since \(1-{\mathrm {e}}^{-x}\le x\) for \(x\ge 0\), we have \(\sum _{q=1}^{N(\varepsilon )}p_q\le au_{\varepsilon }\sum _{q=1}^{N(\varepsilon )}q^{-1/(\tau -1)}\le A_3u_{\varepsilon }N(\varepsilon )^{\rho }\) for some constant \(A_3>0\). Hence, by Markov's inequality,

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \sum _{q=1}^{N(\varepsilon )}\mathscr {I}_q\left( u_{\varepsilon }\right) \ge x_{\varepsilon }\right)&\le \exp \left( -sx_{\varepsilon }+2s\sum _{q=1}^{N(\varepsilon )}p_q\right) \nonumber \\&\le \exp \left( -sx_{\varepsilon }+2sA_3 u_{\varepsilon } N(\varepsilon )^\rho \right) \nonumber \\&=\exp \left( -sx_{\varepsilon }+2sA_3u_{\varepsilon }\varepsilon ^{-\frac{\delta }{2}-\pi }\right) . \end{aligned}$$
(7.10)

Combining (7.3), (7.9) and (7.10), we see that \(\sum _{k=1}^{\infty }{{\mathrm{\mathbb {P}}}}\left( \mathfrak {N}_{(\infty )}(2/k)>k^{\delta +\pi }\right) <\infty \), so that, by the Borel-Cantelli lemma, almost surely \(\mathfrak {N}_{(\infty )}(2/k)\le k^{\delta +\pi }\) for all but finitely many k. Since \(\delta >0\) was arbitrary, we conclude that

$$\begin{aligned} \limsup _k\ \frac{\log \left( \mathfrak {N}_{(\infty )}(2/k)\right) }{\log (k/2)}\le \pi \qquad a.s. \end{aligned}$$

By sandwiching \(\varepsilon \) between \(2/(k-1)\) and \(2/k\), we get the desired upper bound on \({{\mathrm{\overline{dim}}}}(\mathscr {M}(1))\). \(\square \)

Proof of upper bounds in (1.8) and (1.20): We only give the proof of (1.8). This will imply (1.20) because of (2.13). Fix \(i\ge 1\) and let

$$\begin{aligned} K_n:=\min \left\{ j\in [n]:\ j\in \mathscr {C}_i(\lambda )\right\} . \end{aligned}$$

By Proposition 6.2, the sequence \((K_n)_{n\ge 1}\) is tight. By passing to a subsequence if necessary, we can assume that we are working on a space where

$$\begin{aligned} \left( n^{-\eta }\mathbf {M}_n^{{{\mathrm{nr}}}}(\lambda ), K_n\right) \rightarrow \left( \mathbf {M}_{\infty }^{{{\mathrm{nr}}}}(\lambda ), K_{\infty }\right) \qquad a.s. \end{aligned}$$

for some (integer-valued) random variable \(K_{\infty }\). Then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{\overline{dim}}}}\left( M_i^{{{\mathrm{nr}}}}(\lambda )\right)>\pi \right) =\sum _{j=1}^{\infty }{{\mathrm{\mathbb {P}}}}\left( {{\mathrm{\overline{dim}}}}\left( M_i^{{{\mathrm{nr}}}}(\lambda )\right) >\pi ,\ K_{\infty }=j\right) . \end{aligned}$$

By Proposition 7.1, \({{\mathrm{\mathbb {P}}}}\left( {{\mathrm{\overline{dim}}}}\left( M_i^{{{\mathrm{nr}}}}(\lambda )\right) >\pi ,\ K_{\infty }=j\right) =0\) for every \(j\ge 1\), and hence

$$\begin{aligned} {{\mathrm{\overline{dim}}}}\left( M_i^{{{\mathrm{nr}}}}(\lambda )\right) \le \pi \ \text { a.s.} \end{aligned}$$
(7.11)

This completes the proof of the upper bound on the Minkowski dimension. \(\square \)

7.2 Lower bound on the Minkowski dimension

We next extend the argument for the upper bound to prove a lower bound on the Minkowski dimension of \(\mathscr {M}(j)\). As in (7.3),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \mathfrak {N}_{(\infty )}(\varepsilon /2)<x\right) \le \limsup _n\ {{\mathrm{\mathbb {P}}}}\left( \mathfrak {N}_{(n)}(\varepsilon )<x\right) . \end{aligned}$$

Recall the definitions in (7.2), and for an arbitrary \(\delta >0\) and \(\varepsilon >0\), adapt (7.4) to

$$\begin{aligned} \underline{x}_{\varepsilon }:=\varepsilon ^{\delta -\pi },\qquad \delta ':=\frac{\delta }{\pi }\left( 1-h\right) , \qquad \text { and } \qquad \underline{N}(\varepsilon )=\varepsilon ^{-(1-\delta ')/\eta }, \end{aligned}$$

where \(\pi =(\tau -2)/(\tau -3)\) as in Proposition 7.1, and \(h>0\) is sufficiently small so that

$$\begin{aligned} \kappa _3:= & {} 2-\delta -(1-\delta ')\left( \frac{3\tau -8}{\tau -3}\right) +\frac{\tau -2}{\tau -3}>0,\ \text { and }\nonumber \\ \kappa _4:= & {} 1-\delta -(1-\delta ')\left( \frac{2\tau -5}{\tau -3}\right) +\frac{\tau -2}{\tau -3}>0. \end{aligned}$$
(7.12)

(A simple calculation shows that it is possible to choose \(h>0\) small so that (7.12) holds whenever \(\tau >3\); a sketch follows.)
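For the reader's convenience, here is that sketch: setting \(h=0\) (so that \(\delta '=\delta /\pi =\delta (\tau -3)/(\tau -2)\)), the \(\delta \)-free terms in (7.12) cancel and

$$\begin{aligned} \kappa _3\big |_{h=0}=\frac{2\delta (\tau -3)}{\tau -2}>0,\qquad \kappa _4\big |_{h=0}=\frac{\delta (\tau -3)}{\tau -2}>0, \end{aligned}$$

and since \(\kappa _3\) and \(\kappa _4\) depend continuously on h, (7.12) remains valid for all sufficiently small \(h>0\).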

The main result in this section is the following estimate on \(\mathfrak {N}_{(n)}(\varepsilon ):=\mathscr {N}(\mathscr {C}(j), \varepsilon n^{\eta })\):

Proposition 7.2

There exist \(\kappa >0\) and \(c>0\) such that

$$\begin{aligned} \limsup _n\ {{\mathrm{\mathbb {P}}}}\big (\mathfrak {N}_{(n)}(\varepsilon )<\underline{x}_{\varepsilon }\big )\le c\varepsilon ^{\kappa }. \end{aligned}$$
(7.13)

Consequently, for every \(j\ge 1\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( {{\mathrm{\underline{dim}}}}(\mathscr {M}(j))<\pi \right) =0. \end{aligned}$$

The rest of this section is devoted to the proof of Proposition 7.2. As in Sect. 7.1, we work with \(j=1\) for simplicity; the proof is similar for any \(j\ge 2\). Before starting the proof, we collect some preliminaries. The proof below relies on two asymptotic bounds on \(|\mathscr {C}(1)|\). For this, we use

$$\begin{aligned} \limsup _n\ {{\mathrm{\mathbb {P}}}}\big (n^{-\rho }|\mathscr {C}(1)|\le s\big )={{\mathrm{\mathbb {P}}}}(\mathscr {H}_{\mathscr {S}(1)}(0)\le s), \end{aligned}$$
(7.14)

where \(\mathscr {H}_{\mathscr {S}(1)}(\cdot )\) is defined around (6.18). Our main result on the lower tail of the distribution of \(\mathscr {H}_{\mathscr {S}(1)}(0)\) is the following lemma:

Lemma 7.3

(Lower tails of \(\mathscr {H}_{\mathscr {S}(1)}(0)\)) There exists \(C>0\) such that, for every \(s>0\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(\mathscr {H}_{\mathscr {S}(1)}(0)\le s)\le C s. \end{aligned}$$
(7.15)

Proof

We note that

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\big (\mathscr {H}_{\mathscr {S}(1)}(0)\le s\big )={{\mathrm{\mathbb {P}}}}\big (\exists t\le s:\mathscr {S}_t(1)=0\big ). \end{aligned}$$

We split

$$\begin{aligned} \mathscr {S}_t(1)=b-abt+ct+\frac{b}{a}\big (\mathscr {R}_t-\mathscr {D}_t\big ), \end{aligned}$$
(7.16)

where, abbreviating \(d_j=a/j^{1/(\tau -1)}\),

$$\begin{aligned} \mathscr {R}_t = \sum _{j\ge 2} d_j\left[ N_j(t)- d_jt\right] ,\ \text { and }\ \mathscr {D}_t =\sum _{j\ge 2} d_j[N_j(t)-1]\mathbbm {1}_{\{N_j(t)\ge 2\}}. \end{aligned}$$

Here \((N_j(t))_{t\ge 0}\), \(j\ge 2\), are independent rate-\(d_j\) Poisson processes. Thus, \((\mathscr {R}_t)_{t\ge 0}\) is a Lévy process, while \((\mathscr {D}_t)_{t\ge 0}\) subtracts the multiple hits. Since \(b>0\) and \((\mathscr {D}_t)_{t\ge 0}\) is non-decreasing, for all sufficiently small \(s>0\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\big (\mathscr {H}_{\mathscr {S}(1)}(0)\le s\big )\le {{\mathrm{\mathbb {P}}}}\bigg (\inf _{t\in [0,s]} \mathscr {R}_t\le -a/4\bigg )+{{\mathrm{\mathbb {P}}}}\big (\mathscr {D}_s\ge a/4\big ). \end{aligned}$$
(7.17)
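In more detail: if \(\mathscr {S}_t(1)=0\) for some \(t\le s\), then (7.16) gives \(\tfrac{b}{a}\big (\mathscr {D}_t-\mathscr {R}_t\big )=b-abt+ct\ge b/2\) once \(s\le b/\big (2(ab+|c|)\big )\), so that \(\mathscr {D}_t-\mathscr {R}_t\ge a/2\); hence either \(\mathscr {D}_s\ge \mathscr {D}_t\ge a/4\) or \(\inf _{t'\in [0,s]}\mathscr {R}_{t'}\le \mathscr {R}_t\le -a/4\).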

We start with the latter contribution. Since, for a Poisson random variable Z with parameter \(\lambda \),

$$\begin{aligned} \mathbb {E}\big [(Z-1)\mathbbm {1}_{\{Z\ge 2\}}\big ]=\sum _{k\ge 2}(k-1) {\mathrm {e}}^{-\lambda }\frac{\lambda ^k}{k!} =\lambda ^2 \sum _{k\ge 2}\frac{1}{k}{\mathrm {e}}^{-\lambda }\frac{\lambda ^{k-2}}{(k-2)!} \le \frac{\lambda ^2}{2}, \end{aligned}$$

we have

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\big (\mathscr {D}_s\ge a/4\big )\le \frac{4}{a}\mathbb {E}[\mathscr {D}_s]\le \frac{2}{a}\sum _{j\ge 2}d_j (d_js)^2 =\frac{2s^2}{a}\sum _{j\ge 2}d_j^3. \end{aligned}$$

For the first term in (7.17), we use Doob’s \(L^2\)-inequality to bound

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\bigg (\inf _{t\in [0,s]} \mathscr {R}_t\le -a/4\bigg )&\le \frac{16}{a^2} \mathbb {E}[\mathscr {R}_s^2] =\frac{16}{a^2} \sum _{j\ge 2} d_j^2{\mathrm {Var}}(N_j(s)) \\&=\frac{16}{a^2} \sum _{j\ge 2} d_j^2(d_js)=\frac{16 s}{a^2} \sum _{j\ge 2} d_j^3, \end{aligned}$$

so that (7.15) follows, since \(\sum _{j\ge 2}d_j^3=a^3\sum _{j\ge 2}j^{-3/(\tau -1)}<\infty \) for \(\tau \in (3,4)\) and \(s^2\le s\) for \(s\le 1\). \(\square \)

Lemma 7.4

(Cluster weight convergence) For a set of vertices \(A\subseteq [n]\), let \(w(A)=\sum _{a\in A} w_a\) denote its weight. Then, for every \(j\ge 1\), \(\mathbb {E}[n^{-\rho }w(\mathscr {C}(j))]\) remains uniformly bounded as \(n\rightarrow \infty \), where \(\rho \) is as in (6.1).

Proof

Fix \(K\ge 0\) so large that

$$\begin{aligned} \nu _n^{(K+1)}:=\frac{1}{\ell _n}\sum _{i\in [n]{\setminus } [K]} w_i^2\le \frac{1-n^{-\eta }}{1+|\lambda | n^{-\eta }}. \end{aligned}$$
(7.18)

This is possible, since \(\ell _n/n\rightarrow \mathbb {E}[W]\), while

$$\begin{aligned} \frac{1}{\ell _n}\sum _{i\in [n]{\setminus } [K]} w_i^2&=\nu _n-\frac{1+o(1)}{\mathbb {E}[W]n}\sum \limits _{i\le K} w_i^2 \le \nu _n-Cn^{-1+2/(\tau -1)} \sum \limits _{i\le K} i^{-2/(\tau -1)}\nonumber \\&\le 1-C'n^{-\eta } K^{\eta }, \end{aligned}$$
(7.19)

where we have used (6.13) in the second step to lower bound \(\sum _{i\le K} w_i^2\). We write \(\mathscr {C}(A)=\bigcup _{a\in A} \mathscr {C}(a)\). Then, for \(j\le K\), we bound

$$\begin{aligned} w(\mathscr {C}(j))\le w(\mathscr {C}([K])). \end{aligned}$$

We next investigate \(\mathbb {E}[w(\mathscr {C}([K]))]\). Note that

$$\begin{aligned} p_{ij}=1-{\mathrm {e}}^{-(1+\lambda n^{-\eta }) w_iw_j/\ell _n}\le (1+\lambda n^{-\eta }) \frac{w_iw_j}{\ell _n}, \end{aligned}$$

where \(\ell _n=\sum _{l\in [n]}w_l\) is the total weight. Thus, for any \(A\subseteq [n]\),

$$\begin{aligned} \mathbb {P}\big ({{\mathrm{dist}}}(A,j)=l\big )\le \sum _{a\in A}\sum _{i_1,\ldots , i_{l-1}\in [n]}^* \prod _{s=1}^l p_{i_{s-1},i_s}, \end{aligned}$$

where \(i_0=a\), \(i_l=j\), and the sum is over distinct vertices not in A. Using the bound on \(p_{ij}\) and performing the sum over \(i_1,\ldots , i_{l-1}\), we obtain

$$\begin{aligned} \mathbb {P}({{\mathrm{dist}}}([K],j)=l)\le & {} (1+\lambda n^{-\eta })\sum _{a\in [K]}\frac{w_a w_j}{\ell _n} \bigg ((1+\lambda n^{-\eta })\nu _n^{(K+1)}\bigg )^{l-1}\nonumber \\= & {} (1+\lambda n^{-\eta })w([K])\frac{w_j}{\ell _n} \bigg [(1+\lambda n^{-\eta })\nu _n^{(K+1)}\bigg ]^{l-1}. \end{aligned}$$
(7.20)

By (7.18),

$$\begin{aligned} (1+\lambda n^{-\eta })\nu _n^{(K+1)}\le 1-n^{-\eta }. \end{aligned}$$

As a result, for large n,

$$\begin{aligned} \mathbb {E}\big [w(\mathscr {C}([K]))\big ]\le & {} 2w([K])\Big [1+\sum _{j\in [n]{\setminus } [K]} \frac{w_j^2}{\ell _n}\sum _{l\ge 1} (1-n^{-\eta })^{l-1}\Big ]\\\le & {} 4w([K])\Big [1+\sum _{l\ge 0} (1-n^{-\eta })^{l}\Big ]=4w([K])\left( 1+n^{\eta }\right) \le 8w([K]) n^{\eta }. \end{aligned}$$

Since, by an argument similar to (7.19),

$$\begin{aligned} w([K])\le CK^{\rho } n^{1/(\tau -1)}, \end{aligned}$$

we arrive at

$$\begin{aligned} \mathbb {E}\big [w(\mathscr {C}([K]))\big ]\le CK^{\rho } n^{\eta +1/(\tau -1)}=CK^{\rho } n^{\rho }. \end{aligned}$$

This completes the proof. \(\square \)

By (7.14) and Lemma 7.3 applied with \(s=\varepsilon ^{\delta h/2}\),

$$\begin{aligned} \limsup _n\ {{\mathrm{\mathbb {P}}}}\big (|\mathscr {C}(1)|\le \varepsilon ^{\delta h/2}n^\rho \big )\le C\varepsilon ^{\delta h/2}. \end{aligned}$$

We conclude that

$$\begin{aligned} \limsup _n\ {{\mathrm{\mathbb {P}}}}\big (\mathfrak {N}_{(n)}(\varepsilon )<\underline{x}_{\varepsilon }\big )\le & {} \limsup _n\ {{\mathrm{\mathbb {P}}}}\Big (\{\mathfrak {N}_{(n)}(\varepsilon )<\underline{x}_{\varepsilon }\}\cap \{|\mathscr {C}(1)|> \varepsilon ^{\delta h/2}n^{\rho }\}\Big ) \nonumber \\&+C\varepsilon ^{\delta h/2}. \end{aligned}$$
(7.21)

We now study the event in (7.21). We note that \(\mathfrak {N}_{(n)}(\varepsilon )\ge X^{(n)}(\varepsilon )\), which is defined as

$$\begin{aligned} X^{(n)}(\varepsilon )=1+\sum _{i=2}^{\underline{N}(\varepsilon )} \mathbbm {1}_{\{i\in \mathscr {C}(1)\}} \mathbbm {1}_{\{{{\mathrm{dist}}}_{\mathscr {C}(1)}(i, [i-1])>4\varepsilon n^{\eta }\}}, \end{aligned}$$
(7.22)

where \({{\mathrm{dist}}}_{\mathscr {C}(1)}(A,B)\) is the graph distance between the sets of vertices \(A\cap \mathscr {C}(1)\) and \(B\cap \mathscr {C}(1)\). Indeed, we count in increasing order of i, and determine whether an extra ball is needed to cover vertex i after the vertices in \([i-1]\cap \mathscr {C}(1)\) have been covered. The initial 1 in (7.22) accounts for the ball that covers vertex 1.
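To spell out why \(\mathfrak {N}_{(n)}(\varepsilon )\ge X^{(n)}(\varepsilon )\): if \(i<i'\) both contribute to the sum in (7.22), then \(i\in [i'-1]\cap \mathscr {C}(1)\), so \({{\mathrm{dist}}}_{\mathscr {C}(1)}(i,i')>4\varepsilon n^{\eta }\), and the same separation holds between vertex 1 and any contributing i. Thus vertex 1 together with the contributing vertices forms a \(4\varepsilon n^{\eta }\)-separated subset of \(\mathscr {C}(1)\), and any ball of radius \(\varepsilon n^{\eta }\) (hence of diameter \(2\varepsilon n^{\eta }\)) can cover at most one point of this subset.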

Since vertex 1 lies in \(\mathscr {C}(1)\) by definition, we can write \(X^{(n)}(\varepsilon )\) as the difference

$$\begin{aligned} X^{(n)}(\varepsilon )=X_1^{(n)}(\varepsilon )-X_2^{(n)}(\varepsilon ), \end{aligned}$$

where

$$\begin{aligned} X_1^{(n)}(\varepsilon )=\sum _{i=1}^{\underline{N}(\varepsilon )} \mathbbm {1}_{\{i\in \mathscr {C}(1)\}}, \qquad X_2^{(n)}(\varepsilon )=\sum _{i=2}^{\underline{N}(\varepsilon )}\mathbbm {1}_{\{i\in \mathscr {C}(1)\}} \mathbbm {1}_{\{{{\mathrm{dist}}}_{\mathscr {C}(1)}(i, [i-1])\le 4\varepsilon n^{\eta }\}}. \end{aligned}$$

Therefore,

$$\begin{aligned}&{{\mathrm{\mathbb {P}}}}\Big (\{\mathfrak {N}_{(n)}(\varepsilon )<\underline{x}_{\varepsilon }\} \cap \{|\mathscr {C}(1)|> \varepsilon ^{\delta h/2}n^{\rho }\}\Big ) \nonumber \\&\quad \le {{\mathrm{\mathbb {P}}}}\Big (\{X_1^{(n)}(\varepsilon )<2\underline{x}_{\varepsilon }\} \cap \{|\mathscr {C}(1)|>\varepsilon ^{\delta h/2}n^{\rho }\}\Big )\nonumber \\&\qquad +\,{{\mathrm{\mathbb {P}}}}\big (X_2^{(n)}(\varepsilon )>\underline{x}_{\varepsilon }\big ). \end{aligned}$$
(7.23)

We will show that the limsup as \(n\rightarrow \infty \) of the first probability is bounded by \(C\varepsilon ^{\kappa _1}\), and the limsup as \(n\rightarrow \infty \) of the second by \(C(\varepsilon ^{\kappa _3}+\varepsilon ^{\kappa _4})\), where \(\kappa _1>0\) is identified below and \(\kappa _3,\kappa _4>0\) are as in (7.12), so that Proposition 7.2 will follow with \(\kappa =\min \{\delta h/2, \kappa _1, \kappa _3, \kappa _4\}\).

Analysis of \(X_1^{(n)}\). It follows from [17, Proposition 3.7] that

$$\begin{aligned} \limsup _{n\rightarrow \infty }\ {{\mathrm{\mathbb {P}}}}\Big (\{X_1^{(n)}(\varepsilon )<2\underline{x}_{\varepsilon }\} \cap \{|\mathscr {C}(1)|>\varepsilon ^{\delta h/2}n^{\rho }\}\Big ) \le {{\mathrm{\mathbb {P}}}}\big (\underline{X}_1(\varepsilon )\le 2\underline{x}_{\varepsilon }\big ),\qquad \quad \end{aligned}$$
(7.24)

where

$$\begin{aligned} \underline{X}_1(\varepsilon ):=\sum _{i=1}^{\underline{N}(\varepsilon )} \mathcal {I}_i(\varepsilon ^{\delta h/2}) \end{aligned}$$

is a sum of independent indicators with success probabilities \(1-\exp \big (-d_i \varepsilon ^{\delta h/2}\big )\), \(i=1,\ldots , \underline{N}(\varepsilon )\) (recall (6.17)), with \(d_j\) as defined right below (7.16). Note that

$$\begin{aligned} \mathbb {E}\big [\underline{X}_1(\varepsilon )\big ]= & {} \sum _{i=1}^{\underline{N}(\varepsilon )}\mathbb {P}\big (\mathcal {I}_i(\varepsilon ^{\delta h/2})=1\big ) =\sum _{i=1}^{\underline{N}(\varepsilon )}\bigg [1-\exp \big (-d_i \varepsilon ^{\delta h/2}\big )\bigg ]\nonumber \\\le & {} \sum _{i=1}^{\underline{N}(\varepsilon )}d_i \varepsilon ^{\delta h/2} \le C\varepsilon ^{\delta h/2}\underline{N}(\varepsilon )^{\rho }. \end{aligned}$$
(7.25)

Similarly, for small enough \(\varepsilon >0\),

$$\begin{aligned} \mathbb {E}\big [\underline{X}_1(\varepsilon )\big ] =\sum _{i=1}^{\underline{N}(\varepsilon )}\bigg [1-\exp \big (-d_i \varepsilon ^{\delta h/2}\big )\bigg ] \ge \tfrac{1}{2}\sum _{i=1}^{\underline{N}(\varepsilon )}d_i \varepsilon ^{\delta h/2} \ge C'\varepsilon ^{\delta h/2}\underline{N}(\varepsilon )^{\rho }\ge 3\underline{x}_{\varepsilon }. \end{aligned}$$
(7.26)
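The final inequality in (7.26) holds for all sufficiently small \(\varepsilon \): since \(\rho /\eta =\pi \),

$$\begin{aligned} \varepsilon ^{\delta h/2}\underline{N}(\varepsilon )^{\rho }=\varepsilon ^{\delta h/2-(1-\delta ')\pi },\qquad \underline{x}_{\varepsilon }=\varepsilon ^{\delta -\pi }, \end{aligned}$$

and the difference of the exponents is \((\delta -\pi )-\big (\delta h/2-(1-\delta ')\pi \big )=\delta -\delta h/2-\delta '\pi =\delta h/2>0\) by the choice \(\delta '=(\delta /\pi )(1-h)\); hence \(\underline{x}_{\varepsilon }\big /\big (\varepsilon ^{\delta h/2}\underline{N}(\varepsilon )^{\rho }\big )=\varepsilon ^{\delta h/2}\rightarrow 0\) as \(\varepsilon \downarrow 0\).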

Further, since \(\underline{X}_1(\varepsilon )\) is a sum of independent indicators,

$$\begin{aligned} {\mathrm {Var}}(\underline{X}_1(\varepsilon ))\le \mathbb {E}[\underline{X}_1(\varepsilon )]. \end{aligned}$$
(7.27)

Combining (7.24), (7.25), (7.26), and (7.27), we get

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\ \mathbb {P}\big (\{X_1^{(n)}(\varepsilon )\le 2\underline{x}_{\varepsilon }\} \cap \{|\mathscr {C}(1)|> \varepsilon ^{\delta h/2}n^{\rho }\}\big )\nonumber \\&\quad \le \mathbb {P}\big (\underline{X}_1(\varepsilon )\le 2\underline{x}_{\varepsilon }\big ) \le \mathbb {P}\Big (\big |\underline{X}_1(\varepsilon )-\mathbb {E}[\underline{X}_1(\varepsilon )]\big | \ge \underline{x}_{\varepsilon }\Big ) \le \underline{x}_{\varepsilon }^{-2}{\mathrm {Var}}(\underline{X}_1(\varepsilon ))\nonumber \\&\quad \le \underline{x}_{\varepsilon }^{-2}\mathbb {E}[\underline{X}_1(\varepsilon )] \le C\varepsilon ^{\kappa _1}, \end{aligned}$$
(7.28)

where \(\kappa _1=2\pi -2\delta +\delta h/2-\pi (1-\delta ')>0\) when \(\delta >0\) is sufficiently small. This proves a bound on the first term on the right side of (7.23).
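For completeness, here is where the exponent \(\kappa _1\) comes from: by (7.25) and the definitions of \(\underline{x}_{\varepsilon }\) and \(\underline{N}(\varepsilon )\),

$$\begin{aligned} \underline{x}_{\varepsilon }^{-2}\mathbb {E}[\underline{X}_1(\varepsilon )]\le C\varepsilon ^{-2(\delta -\pi )+\delta h/2-(1-\delta ')\pi }=C\varepsilon ^{\kappa _1}, \end{aligned}$$

and \(\kappa _1=\pi +\pi \delta '-2\delta +\delta h/2\ge \pi -2\delta \), which is positive as soon as \(\delta <\pi /2\).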

Analysis of \(X_2^{(n)}\). We next give an upper bound on \({{\mathrm{\mathbb {P}}}}(X_2^{(n)}(\varepsilon )\ge \underline{x}_{\varepsilon })\). We start with

$$\begin{aligned} \mathbb {P}\big (X_2^{(n)}(\varepsilon ) \ge \underline{x}_{\varepsilon }\big ) \le \underline{x}_{\varepsilon }^{-1} \mathbb {E}\big [X_2^{(n)}(\varepsilon )\big ]. \end{aligned}$$
(7.29)

Further,

$$\begin{aligned} \mathbb {E}\big [X_2^{(n)}(\varepsilon )\big ]&=\sum \limits _{i=2}^{\underline{N}(\varepsilon )}\mathbb {P}\Big (i\in \mathscr {C}(1), {{\mathrm{dist}}}_{\mathscr {C}(1)}(i, [i-1])\le 4\varepsilon n^{\eta }\Big ). \end{aligned}$$
(7.30)

When \(i\in \mathscr {C}(1)\) and \({{\mathrm{dist}}}_{\mathscr {C}(1)}(i, [i-1])\le 4\varepsilon n^{\eta }\), there have to exist \(j\in [i-1]\) and \(k\in [n]\) such that the three events

  1. (i)

    \(\{{{\mathrm{dist}}}(i, k)\le 4\varepsilon n^{\eta }\}\);

  2. (ii)

    \(\{{{\mathrm{dist}}}(j, k)\le 4\varepsilon n^{\eta }\}\);

  3. (iii)

    \(\{k\in \mathscr {C}(1)\}\),

occur disjointly, where \({{\mathrm{dist}}}(i,j)\) denotes the graph distance in the random graph \(\mathrm{NR}_n(\varvec{w})\). There are two cases depending on whether \(k>\underline{N}(\varepsilon )\) or \(k\le \underline{N}(\varepsilon )\). When \(k\le \underline{N}(\varepsilon )\), we can drop the event \(\{{{\mathrm{dist}}}(j, k)\le 4\varepsilon n^{\eta }\}\), since omitting one event from a disjoint occurrence only increases the probability. This gives, for \(2\le i\le \underline{N}(\varepsilon )\),

$$\begin{aligned}&\mathbb {P}\Big (i\in \mathscr {C}(1), {{\mathrm{dist}}}_{\mathscr {C}(1)}(i, [i-1])\le 4\varepsilon n^{\eta }\Big )\\&\quad \le \sum _{j=1}^{\underline{N}(\varepsilon )} \sum _{k>\underline{N}(\varepsilon )} \mathbb {P}\Big (\{{{\mathrm{dist}}}(i, k)\le 4\varepsilon n^{\eta }\}\circ \{{{\mathrm{dist}}}(j, k)\le 4\varepsilon n^{\eta }\}\circ \{k\in \mathscr {C}(1)\}\Big )\\&\qquad +\sum _{k=1}^{\underline{N}(\varepsilon )} \mathbb {P}\Big (\{{{\mathrm{dist}}}(i, k)\le 4\varepsilon n^{\eta }\}\circ \{k\in \mathscr {C}(1)\}\Big ), \end{aligned}$$

where, for two increasing events A, B, we write \(A\circ B\) for the event that A and B occur disjointly.

By the BK inequality, we bound

$$\begin{aligned}&\mathbb {P}\Big (i\in \mathscr {C}(1), {{\mathrm{dist}}}_{\mathscr {C}(1)}(i, [i-1])\le 4\varepsilon n^{\eta }\Big )\\&\quad \le \sum _{j=1}^{\underline{N}(\varepsilon )} \sum _{k>\underline{N}(\varepsilon )} \mathbb {P}\big ({{\mathrm{dist}}}(i, k) \le 4\varepsilon n^{\eta }\big ) \mathbb {P}\big ({{\mathrm{dist}}}(j, k)\le 4\varepsilon n^{\eta }\big ) \mathbb {P}\big (k\in \mathscr {C}(1)\big )\\&\qquad +\sum _{k=1}^{\underline{N}(\varepsilon )} \mathbb {P}\big ({{\mathrm{dist}}}(i, k)\le 4\varepsilon n^{\eta }\big ) \mathbb {P}\big (k\in \mathscr {C}(1)\big ). \end{aligned}$$

Similar to (7.20), we have

$$\begin{aligned} \mathbb {P}\big ({{\mathrm{dist}}}(i,j)=l\big )\le \bigg (1+\frac{\lambda }{n^{\eta }}\bigg )\frac{w_iw_j}{\ell _n} \nu _n(\lambda )^{l-1}, \end{aligned}$$

where \(\nu _n(\lambda )=(1+\lambda n^{-\eta })\nu _n\). In our case, \(\nu _n=1+O(n^{-\eta })\), so that \(\nu _n(\lambda )^{l-1}\le (1+Cn^{-\eta })^{4\varepsilon n^{\eta }}\le {\mathrm {e}}^{4C\varepsilon }\) remains bounded for \(l\le 4\varepsilon n^{\eta }\) and \(\varepsilon \le 1\). Hence, for \(l\le 4\varepsilon n^{\eta }\),

$$\begin{aligned} \mathbb {P}\big ({{\mathrm{dist}}}(i,j)\le l\big )\le Cl\frac{w_i w_j}{\ell _n}. \end{aligned}$$

Further,

$$\begin{aligned} \mathbb {P}\big (k\in \mathscr {C}(1)\big )\le & {} \mathbbm {1}_{\{k=1\}} +\sum _{l\in [n]} \mathbb {P}\big (\{l\in \mathscr {C}(1)\}\circ \{kl\text { occupied}\}\big )\\\le & {} \mathbbm {1}_{\{k=1\}} + \sum _{l\in [n]}\bigg (1+\frac{\lambda }{n^{\eta }}\bigg ) \frac{w_k w_l}{\ell _n} \mathbb {P}\big (l\in \mathscr {C}(1)\big )\\= & {} \mathbbm {1}_{\{k=1\}} +\bigg (1+\frac{\lambda }{n^{\eta }}\bigg ) \frac{w_k}{\ell _n} \mathbb {E}\big [w(\mathscr {C}(1))\big ], \end{aligned}$$

where we recall that \(w(A)=\sum _{a\in A}w_a\) denotes the total weight of A. By Lemma 7.4, \(\mathbb {E}[n^{-\rho }w(\mathscr {C}(1))]\) remains uniformly bounded as \(n\rightarrow \infty \). We conclude that

$$\begin{aligned}&\mathbb {P}\Big (i\in \mathscr {C}(1), {{\mathrm{dist}}}_{\mathscr {C}(1)}(i, [i-1])\le 4\varepsilon n^{\eta }\Big )\\&\le C\varepsilon ^2 n^{2\eta +\rho } \sum _{j=1}^{\underline{N}(\varepsilon )} \sum _{k>\underline{N}(\varepsilon )} \frac{w_i w_j w_k^3}{\ell _n^3} +C\varepsilon n^{\eta +\rho } \sum _{k=2}^{\underline{N}(\varepsilon )} \frac{w_i w_k^2}{\ell _n^2} +C\varepsilon n^{\eta } \frac{w_i w_1}{\ell _n}\\&\le C'\varepsilon ^2 n^{2\eta +\rho -3+5/(\tau -1)} \sum _{j=1}^{\underline{N}(\varepsilon )} \sum _{k>\underline{N}(\varepsilon )} i^{-1/(\tau -1)} j^{-1/(\tau -1)} k^{-3/(\tau -1)}\\&\quad +C'\varepsilon n^{\eta +\rho -2+3/(\tau -1)} \sum _{k=2}^{\underline{N}(\varepsilon )}i^{-1/(\tau -1)} k^{-2/(\tau -1)}+C'\varepsilon n^{\eta -1+2/(\tau -1)}i^{-1/(\tau -1)}, \end{aligned}$$

where the last step uses the first inequality in (6.13). Note that

$$\begin{aligned} 2\eta +\rho -3+5/(\tau -1)=\eta +\rho -2+3/(\tau -1)=\eta -1+2/(\tau -1)=0, \end{aligned}$$

so that the powers of n cancel; indeed, with \(\eta =(\tau -3)/(\tau -1)\) and \(\rho =(\tau -2)/(\tau -1)\), each of the three exponents reduces to zero. Combining the above with (7.30) leads to

$$\begin{aligned} \mathbb {E}\big [X_2^{(n)}(\varepsilon )\big ]\le & {} C\varepsilon ^2 \Big (\sum _{j=1}^{\underline{N}(\varepsilon )} j^{-1/(\tau -1)}\Big )^2 \sum _{k>\underline{N}(\varepsilon )} k^{-3/(\tau -1)}\\&+C\varepsilon \sum _{j=1}^{\underline{N}(\varepsilon )} j^{-1/(\tau -1)} \sum _{k=1}^{\underline{N}(\varepsilon )} k^{-2/(\tau -1)}. \end{aligned}$$

Note that

$$\begin{aligned} \sum _{j=1}^{N} j^{-p/(\tau -1)}=O\big (N^{(\tau -1-p)/(\tau -1)}\big )\ \text { for }\ p=1,2,\qquad \text { and }\qquad \sum _{k>N} k^{-3/(\tau -1)}=O\big (N^{(\tau -4)/(\tau -1)}\big ). \end{aligned}$$
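(Both estimates use \(\tau \in (3,4)\): the first holds since \(p/(\tau -1)<1\) for \(p=1,2\) when \(\tau >3\), so the partial sums grow polynomially, and the second since \(3/(\tau -1)>1\) when \(\tau <4\), so the tail sum converges.)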

Thus

$$\begin{aligned} \mathbb {E}\big [X_2^{(n)}(\varepsilon )\big ]\le & {} C\varepsilon ^2 \underline{N}(\varepsilon )^{2(\tau -2)/(\tau -1)+(\tau -4)/(\tau -1)} +C\varepsilon \underline{N}(\varepsilon )^{(\tau -2)/(\tau -1)+(\tau -3)/(\tau -1)}\\= & {} C\Big [\varepsilon ^2 \underline{N}(\varepsilon )^{(3\tau -8)/(\tau -1)} +\varepsilon \underline{N}(\varepsilon )^{(2\tau -5)/(\tau -1)}\Big ]. \end{aligned}$$

Using (7.29) and plugging in the values \(\eta =(\tau -3)/(\tau -1), \pi =(\tau -2)/(\tau -3)\), we arrive at

$$\begin{aligned} \limsup _{n\rightarrow \infty } \mathbb {P}\big (X_2^{(n)}(\varepsilon )\ge \underline{x}_{\varepsilon }\big )\le & {} C \underline{x}_{\varepsilon }^{-1}\Big [\varepsilon ^2 \underline{N}(\varepsilon )^{(3\tau -8)/(\tau -1)} +\varepsilon \underline{N}(\varepsilon )^{(2\tau -5)/(\tau -1)}\Big ]\nonumber \\= & {} C\big [\varepsilon ^{\kappa _3}+ \varepsilon ^{\kappa _4}\big ], \end{aligned}$$
(7.31)

where the exponents \(\kappa _3\) and \(\kappa _4\) are positive because of the choice of \(\delta '\) and h (see (7.12)).
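Indeed, unwinding the exponents with \(\underline{N}(\varepsilon )=\varepsilon ^{-(1-\delta ')/\eta }\) and \(\eta (\tau -1)=\tau -3\),

$$\begin{aligned} \underline{x}_{\varepsilon }^{-1}\varepsilon ^2 \underline{N}(\varepsilon )^{(3\tau -8)/(\tau -1)}&=\varepsilon ^{\pi -\delta +2-(1-\delta ')\frac{3\tau -8}{\tau -3}}=\varepsilon ^{\kappa _3},\\ \underline{x}_{\varepsilon }^{-1}\varepsilon \, \underline{N}(\varepsilon )^{(2\tau -5)/(\tau -1)}&=\varepsilon ^{\pi -\delta +1-(1-\delta ')\frac{2\tau -5}{\tau -3}}=\varepsilon ^{\kappa _4}, \end{aligned}$$

matching the definitions of \(\kappa _3\) and \(\kappa _4\) in (7.12) (recall \(\pi =(\tau -2)/(\tau -3)\)).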

Completion of the proof of Proposition 7.2: Note that (7.13) follows upon combining (7.21), (7.23), (7.28), and (7.31). Now fix \(p>1/\kappa \), where \(\kappa \) is as in (7.13). Applying (7.13) and the display at the beginning of this subsection with \(\varepsilon =2/k^p\), so that \(\underline{x}_{\varepsilon }=(k^p/2)^{\pi -\delta }\), we get \(\sum _{k=1}^{\infty }{{\mathrm{\mathbb {P}}}}\left( \mathfrak {N}_{(\infty )}(1/k^p)<(k^p/2)^{\pi -\delta }\right) <\infty .\) By the Borel–Cantelli lemma, and since \(\delta >0\) was arbitrary, we conclude that

$$\begin{aligned} \liminf _k\ \frac{\log \left( \mathfrak {N}_{(\infty )}(1/k^p)\right) }{\log (k^p)} \ge \pi \quad a.s. \end{aligned}$$

By sandwiching \(\varepsilon \) between \(1/(k-1)^p\) and \(1/k^p\), we obtain the bound: \({{\mathrm{\underline{dim}}}}(\mathscr {M}(1))\ge \pi \) a.s. \(\square \)

Proof of (1.8) and (1.20): Proposition 7.2, combined with an argument identical to the one given right after the proof of Proposition 7.1, yields the lower bound \({{\mathrm{\underline{dim}}}}\left( M_i^{{{\mathrm{nr}}}}(\lambda )\right) \ge \pi \) a.s. Combining this lower bound with (7.11) yields (1.8), and (1.20) follows as a consequence of (2.13). \(\square \)

8 Open problems

In Theorem 1.8, we have considered a general entrance boundary \(\mathbf {c}\in l_0\). To study specific properties of the limit objects, we focused mainly on the special case \(\mathbf {c}=\mathbf {c}(\alpha ,\tau )\) as in (1.19); in this case, we have shown compactness and identified the box counting dimension in Theorem 1.9. An important problem in this context is to establish necessary and sufficient conditions on \(\mathbf {c}\) that ensure compactness of the limiting spaces.

Another motivation for pursuing this problem comes from the following simple corollary of Theorem 1.9: For any \(i\ge 1\), consider the sequence \(\varvec{\theta }^{(i)}\) as in (2.10). Then \(\mathscr {T}_{(\infty )}^{\varvec{\theta }^{(i)}}\) is almost surely compact. Similarly, compactness of \(\mathscr {M}(1)\) (as defined in (7.1)) implies compactness of the associated ICRT \(\mathscr {T}_{(\infty )}^{{\overline{\varvec{\theta }}}}\), where \(\overline{\varvec{\theta }}=(\overline{\theta }_i:i\ge 1)\) is given by the following prescription: Let \(q_k\) be the smallest integer such that

$$\begin{aligned} \sum _{q=1}^{q_k}\mathscr {I}_q\left( \mathscr {H}_{\mathscr {S}(1)}(0)\right) =k, \end{aligned}$$

where \(\mathscr {I}_q(\cdot )\) and \(\mathscr {H}_{\mathscr {S}(1)}(\cdot )\) are as defined around (6.18). Define

$$\begin{aligned} \overline{\theta }_i= \frac{q_i^{-1/(\tau -1)}}{\left( \sum _{k=1}^{\infty }q_k^{-2/(\tau -1)}\right) ^{1/2}}\ \qquad \text { for }\ \qquad i\ge 1. \end{aligned}$$

These can be thought of as “annealed results,” since \(\varvec{\theta }^{(i)}\) and \({\overline{\varvec{\theta }}}\) are random. No result is known in this direction without a prior distribution on \(\varvec{\theta }\), i.e., sufficient conditions on non-random \(\varvec{\theta }\in \Theta \) that ensure compactness of the tree \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) are not known. In [11, Section 7], Aldous, Miermont and Pitman conjecture that boundedness of \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) for \(\varvec{\theta }\in \Theta \) is equivalent to \(\int _1^{\infty }(\psi _{\varvec{\theta }}(u))^{-1}du<\infty \), where \(\psi _{\varvec{\theta }}\), in our situation, is given by

$$\begin{aligned} \psi _{\varvec{\theta }}(u)=\sum _{i=1}^{\infty }\left( \exp (-u\theta _i)-1+u\theta _i\right) . \end{aligned}$$

This conjecture, however, remains open to date. Our proof technique demonstrates a method of proving such annealed results via approximation by random graphs. Thus, a classification of those \(\mathbf {c}\in l_0\) for which the spaces \(M_i^{\mathbf {c}}(\lambda )\) are compact would lead to a broad class of prior distributions on \(\varvec{\theta }\) for which \(\mathscr {T}_{(\infty )}^{\varvec{\theta }}\) is compact.

Problem 8.1

Find necessary and sufficient conditions on \(\mathbf {c}\) that ensure compactness of the spaces \(M_i^{\mathbf {c}}(\lambda )\) for \(i\ge 1\).

Another related problem is to find the fractal dimensions of the limiting spaces. As a corollary to Theorem 1.9, we get

$$\begin{aligned} \dim \left( \mathscr {T}_{(\infty )}^{\varvec{\theta }^{(i)}}\right) = (\tau -2)/(\tau -3)\qquad a.s., \end{aligned}$$
(8.1)

where \(\varvec{\theta }^{(i)}\) is as in (2.10) corresponding to \(\mathbf {c}\) of the form (1.19). Propositions 7.1 and 7.2 show that the assertion in (8.1) remains true if we replace \(\varvec{\theta }^{(i)}\) by \({\overline{\varvec{\theta }}}\). Now, it is not hard to prove that

$$\begin{aligned} \inf _j\ \overline{\theta }_j j^{1/(\tau -2)}>0\quad a.s.\quad \text {and}\quad \sup _j\ \overline{\theta }_j j^{1/(\tau -2)}<\infty \quad a.s. \end{aligned}$$

It then follows that

$$\begin{aligned} \tau -2&=\sup \left\{ a\ge 0\ :\ \lim _{u\rightarrow \infty }u^{-a}\psi _{{\overline{\varvec{\theta }}}}(u)=\infty \right\} \\&=\inf \left\{ a\ge 0\ :\ \lim _{u\rightarrow \infty }u^{-a}\psi _{{\overline{\varvec{\theta }}}}(u)=0\right\} \ a.s., \end{aligned}$$

which in turn implies that both the Hausdorff dimension and the packing dimension of a \(\psi _{{\overline{\varvec{\theta }}}}\) Lévy tree equal \((\tau -2)/(\tau -3)\) a.s. (see [34, 40]). Using the analogy between ICRTs and Lévy trees as in [11, Section 7], it is natural to expect that the same is true for \(\mathscr {T}_{(\infty )}^{{\overline{\varvec{\theta }}}}\) and hence for \(\mathscr {M}(1)\). This is the heuristic behind Conjecture 1.3.
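To indicate heuristically where this scaling comes from, note that the above bounds give \(\overline{\theta }_j\asymp j^{-1/(\tau -2)}\); splitting the sum defining \(\psi _{{\overline{\varvec{\theta }}}}\) at \(j\asymp u^{\tau -2}\) (where \(u\overline{\theta }_j\asymp 1\)) and using \({\mathrm {e}}^{-x}-1+x\asymp \min (x,x^2)\) for \(x\ge 0\), one finds

$$\begin{aligned} \psi _{{\overline{\varvec{\theta }}}}(u)\asymp \sum _{j\le u^{\tau -2}}u\, j^{-1/(\tau -2)}+\sum _{j> u^{\tau -2}}u^2 j^{-2/(\tau -2)}\asymp u^{\tau -2}, \end{aligned}$$

the second sum being finite because \(2/(\tau -2)>1\) for \(\tau \in (3,4)\).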

Problem 8.2

Prove Conjecture 1.3.