1 The Model and Main Results

1.1 The Newman–Watts Model

The Newman–Watts small world model, often referred to as “small world” in short, is one of the first random graph models created to model real-life networks. It was introduced by Ball, Mollison and Scalia-Tomba [6] as “the great circle” epidemic model, then also by Watts and Strogatz [36], and a simplifying modification was made by Newman and Watts [31] later. The Newman–Watts model consists of a cycle on n vertices, each connected to the \(k\ge 1\) nearest vertices, and then extra shortcut edges are added in a similar fashion to the creation of the Erdős–Rényi graph [21]: i.e., for each pair of not yet connected vertices, we connect them independently with probability p.

The model has been studied from different aspects. Newman et al. studied distances [32, 33] with simulations and mean-field approximation, as well as the threshold for a large outbreak of the spread of non-deterministic epidemics [30]. Barbour and Reinert treated typical distances rigorously. First, in [7], they studied a continuous circle with circumference n instead of a cycle on n vertices, and added \(\mathrm{Poi}(n\rho /2)\) many 0-length shortcuts at locations chosen according to the uniform measure on the circle. Then, in [8], they studied the discrete model, with all edge lengths equal to 1. They showed that typical distances in both models scale as \(\log n\).

Besides typical distances, the mixing time of simple random walk on the Newman–Watts model was also studied, i.e., the time when the distribution of the position of the walker gets close enough to the stationary distribution in total variation distance. Durrett [20] showed that the order of the mixing time is between \((\log n)^2\) and \((\log n)^3\), then Addario-Berry and Lei [1] proved that Durrett’s lower bound is sharp.

1.2 Main Results

We work on the Newman–Watts small world model [31] with independent random edge weights: we take a cycle \(C_n\) on n vertices, that we denote by \([n]:=\{1,2,\dots , n\}\), and each edge \((i,j)\) with \(i,j\in [n], |i-j|=1\mod n\) is present. Then, independently for each \(i, j \in [n]\) with \(|i-j|\ne 1\mod n\), we add the edge \((i,j)\) with probability \(\rho /n\) to form shortcut edges. The parameter \(\rho \) is the asymptotic average number of shortcuts from a vertex. Conditioned on the edges of the resulting graph, we assign weights that are i.i.d. exponential random variables with mean 1 to the edges. We denote the weight of edge e by \(X_e\). We write \(\mathrm{NW}_n(\rho )\) for a realization of this weighted random graph.
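The following minimal sketch (our own illustration, not part of the analysis) samples a realization of \(\mathrm{NW}_n(\rho )\) with i.i.d. \(\mathrm{Exp}(1)\) edge weights. It assumes the numpy and networkx libraries; the helper name sample_nw is ours.

```python
# A minimal sketch of sampling NW_n(rho) with i.i.d. Exp(1) edge weights.
# Vertices 0, ..., n-1 stand for [n]; assumes numpy and networkx are available.
import numpy as np
import networkx as nx

def sample_nw(n, rho, seed=None):
    rng = np.random.default_rng(seed)
    G = nx.cycle_graph(n)                      # the cycle C_n
    p = rho / n                                # shortcut probability per non-cycle pair
    for i in range(n):
        for j in range(i + 2, n):
            if (i, j) == (0, n - 1):           # (0, n-1) is already a cycle edge
                continue
            if rng.random() < p:
                G.add_edge(i, j)               # shortcut edge
    # i.i.d. Exp(1) weights, assigned conditionally on the realized edge set
    for e in G.edges:
        G.edges[e]["weight"] = rng.exponential(1.0)
    return G

G = sample_nw(1000, rho=2.0, seed=1)
```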

We define the distance between two vertices in \(\mathrm{NW}_n(\rho )\) as the sum of weights along the shortest weight path connecting the two vertices. In this respect, the weighted graph with this distance function is a (non-Euclidean) random metric space. Further, interpreting the edge weights as time or cost, the distance between two vertices can also correspond to the time it takes for information to spread from one vertex to the other on the network, or it can model the cost of transmission between the two vertices.

We say that a sequence of events \(\{\mathcal {E}_n\}_{n\in \mathbb {N}}\) happens with high probability (w.h.p.) if \(\lim _{n\rightarrow \infty }\mathbb {P}(\mathcal {E}_n)=1\), that is, the probability that the event holds tends to 1 as the size of the graph tends to infinity. We write \(\mathrm{Bin}, \mathrm{Poi}, \mathrm{Exp}\) for binomial, \(\mathrm{Poisson}\), and exponential distributions. For random variables \(\{X_n\}_{n\in \mathbb {N}}, X\), we write \(X_n\buildrel {d}\over {\longrightarrow }X\) if \(X_n\) tends to X in distribution as \(n\rightarrow \infty \). The moment generating function of a random variable X is the function \(M_X(\vartheta ):=\mathbb {E}[ \exp \{ \vartheta X\}]\).

Our first result is about typical distances in the weighted graph. Let \(\Gamma _{ij}\) denote the set of all paths \(\gamma \) in \(\mathrm{NW}_n(\rho )\) between two vertices \(i,j \in [n]\). Then the weight of the shortest weight path is defined by

$$\begin{aligned} \mathcal {P}_n(i,j) := \min _{\gamma \in \Gamma _{ij}} \sum _{e\in \gamma }X_e. \end{aligned}$$
(1.1)

Theorem 1.1

(Typical distances) Let U, V be two uniformly chosen vertices in [n]. Then, the distance \(\mathcal {P}_n(U,V)\) in \(\mathrm{NW}_n(\rho )\) with i.i.d. \(\mathrm{Exp}(1)\) edge weights satisfies w.h.p.

$$\begin{aligned} \mathcal {P}_n(U,V)-\frac{1}{\lambda } \log n \buildrel {d}\over {\longrightarrow }-\frac{1}{\lambda } (\log W^U W^V +\Lambda + c), \end{aligned}$$

where \(\lambda \) is the largest root of the polynomial \(p(x)=x^2+(1-\rho )x-2\rho \), \(\Lambda \) is a standard Gumbel random variable, the random variables \(W^U, W^V\) are independent copies of the martingale limit of the multi-type branching process defined below in Sect. 2.3, and \(c:=\log (1-\pi _R^2/2)-\log (\lambda (\lambda +1))\) with \(\pi _R=2/(\lambda +2)\).
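As an aside, the constants appearing in Theorem 1.1 are elementary functions of \(\rho \). The short helper below (ours, for illustration only) evaluates \(\lambda \), \(\lambda _2\), \(\pi _R\) and c numerically.

```python
# Numerical helper (a sketch, not part of the paper): the constants of Theorem 1.1.
import numpy as np

def constants(rho):
    # lambda is the largest root of x^2 + (1 - rho) x - 2 rho = 0
    lam = (rho - 1 + np.sqrt(rho**2 + 6 * rho + 1)) / 2
    lam2 = (rho - 1 - np.sqrt(rho**2 + 6 * rho + 1)) / 2
    pi_R = 2 / (lam + 2)
    c = np.log(1 - pi_R**2 / 2) - np.log(lam * (lam + 1))
    return lam, lam2, pi_R, c

lam, lam2, pi_R, c = constants(2.0)
# sanity check: lam solves the quadratic p(x) = x^2 + (1 - rho) x - 2 rho
assert abs(lam**2 + (1 - 2.0) * lam - 2 * 2.0) < 1e-12
```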

Let us write \(\gamma ^\star =\gamma ^\star (i,j)\) for the path that minimizes the weighted distance in (1.1). We call \(\mathrm{H}_n(U,V) := |\gamma ^\star (U,V)|\) the hopcount, i.e., the number of edges along the shortest-weight path between two uniformly chosen vertices.
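For illustration, the weighted distance \(\mathcal {P}_n(U,V)\) and the hopcount \(\mathrm{H}_n(U,V)\) of a single realization can be read off from Dijkstra's algorithm. The sketch below (ours) assumes the sample_nw helper sketched after the model definition and uses networkx's Dijkstra routine.

```python
# Sketch: one sample of P_n(U,V) and H_n(U,V); assumes sample_nw as defined above.
import numpy as np
import networkx as nx

n, rho = 2000, 2.0
G = sample_nw(n, rho, seed=7)
rng = np.random.default_rng(7)
U, V = rng.choice(n, size=2, replace=False)

length, path = nx.single_source_dijkstra(G, int(U), target=int(V), weight="weight")
hopcount = len(path) - 1                        # number of edges on the shortest-weight path
lam = (rho - 1 + np.sqrt(rho**2 + 6 * rho + 1)) / 2
print(length, np.log(n) / lam)                  # compare with the centering in Theorem 1.1
print(hopcount, (lam + 1) / lam * np.log(n))    # compare with the centering in Theorem 1.2
```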

Theorem 1.2

(Central Limit Theorem for the hopcount) Let U, V be two uniformly chosen vertices in [n]. Then, the hopcount \(\mathrm{H}_n(U,V)\) in \(\mathrm{NW}_n(\rho )\) with i.i.d. \(\mathrm{Exp}(1)\) edge weights satisfies w.h.p.

$$\begin{aligned} \frac{\mathrm{H}_n(U,V) - \frac{\lambda +1}{\lambda } \log n}{\sqrt{\frac{\lambda +1}{\lambda } \log n}} \buildrel {d}\over {\longrightarrow }Z, \end{aligned}$$

where Z is a standard normal random variable.

Our next result characterises the proportion of vertices within distance t of a uniformly chosen vertex U, as a function of t. To put this result into perspective, note that we can model the spread of information starting from some source set \(I_0\subset [n]\) at time \(t=0\) as follows: we assume that once a vertex v receives the information at time t, it starts transmitting the information towards all its neighbors at rate 1. Let \(\mathcal {H}(v)\) denote the set of vertices connected to v by an edge; then, for each \(w \in \mathcal {H}(v)\), w receives the information from v at time \(t+X_{(v,w)}\). We further assume that transmission happens only after the first receipt of the information, that is, any later receipts are ignored. If, instead of the spread of information, we model the spread of a disease, this model is often called an SI-epidemic (susceptible-infected).

In the next theorem we consider this epidemic spread model from a single source \(I_0=\{U\}\) on \(\mathrm{NW}_n(\rho )\) with i.i.d. \(\mathrm{Exp}(1)\) transmission times. We define

$$\begin{aligned} \mathrm{I}_n(t,U):=\frac{1}{n} \sum _{i\in [n]} {\mathbbm {1}}\{i \text { is infected before or at time } t\}= \frac{1}{n} \#\{ i \in [n] : \mathcal {P}_n(U,i) \le t \}, \end{aligned}$$
(1.2)

the fraction of infected vertices at time t of the epidemic started from the vertex U.
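Since a vertex i is infected by time t exactly when \(\mathcal {P}_n(U,i)\le t\), the empirical curve \(\mathrm{I}_n(t,U)\) of a single realization can be computed from single-source Dijkstra distances. The following sketch is ours and again assumes the sample_nw helper above.

```python
# Sketch: the empirical epidemic curve I_n(t, U) of (1.2), read off from
# single-source Dijkstra distances; assumes sample_nw as defined above.
import numpy as np
import networkx as nx

n, rho = 2000, 2.0
G = sample_nw(n, rho, seed=3)
U = np.random.default_rng(3).integers(n)
dist = nx.single_source_dijkstra_path_length(G, int(U), weight="weight")
d = np.sort(np.fromiter(dist.values(), dtype=float))

def I_n(t):
    # fraction of vertices within weighted distance t of U (U itself is at distance 0)
    return np.searchsorted(d, t, side="right") / n

print(I_n(np.log(n) / 2), I_n(np.log(n)))
```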

Theorem 1.3

(Epidemic curve) Let U be a uniformly chosen vertex in [n], and let us consider the epidemic spread with source U and i.i.d. \(\mathrm{Exp}(1)\) transmission times on \(\mathrm{NW}_n(\rho )\). Then, the proportion of infected individuals satisfies w.h.p.

$$\begin{aligned} \mathrm{I}_n(t + \tfrac{1}{\lambda }\log n,U) \buildrel {d}\over {\longrightarrow }f(t + \tfrac{1}{\lambda }\log W_U), \end{aligned}$$

where \(f(t) = 1-M_{W_V}\left( x(t)\right) \), with \(M_{W_V}(\cdot )\) the moment generating function of \(W_V\), and \(x(t)= -\left( 1-\frac{1}{2} \pi _R^2\right) {\mathrm e}^{\lambda t}/ (\lambda (\lambda +1))\), with \(\pi _R=2/(\lambda +2)\); here \(W_U, W_V\) are the same random variables as in Theorem 1.1.

Remark 1.4

Note that Theorems 1.1 and 1.2 are analogous to similar results in the sequence of papers [11–13, 24], while Theorem 1.3 is analogous to the results in [9, 14]. The intuitive message of Theorem 1.3 is that a linear proportion of infected vertices can be observed after a time that is proportional to the logarithm of the size of the population. This time has a random shift given by \(\tfrac{1}{\lambda }\log W_U\). Besides this random shift, the fraction of infected individuals follows a deterministic curve \(f(\cdot )\): only the ‘position of the curve’ on the time axis is random. A bigger value of \(W_U\) means that the local neighborhood of U is “dense”, and hence the spread is quick in the initial stages: indeed, a bigger value of \(W_U\) shifts the function \(f( t+ (\log W_U)/\lambda )\) more to the left on the time axis. This phenomenon has been observed in real-life epidemics, see e.g. [2, 35] for a characterisation of typical epidemic curve shapes; for individual epidemic curves, see e.g. [18].

The next proposition characterizes the function \(M_{W_V}(t)\) in the definition of the epidemic curve function f(t) in Theorem 1.3.

Proposition 1.5

(Functional equation system for the moment generating function) The moment generating function \(M_{W_V}(\vartheta ), \vartheta \in \mathbb {R}^+\) of the random variable \(W_V\) satisfies the following functional equation system, with \(M_{W_V}(\vartheta ):=M_{W^{\scriptscriptstyle {(B)}}}(\vartheta )\):

$$\begin{aligned} \begin{array}{ll} M_{W^{\scriptscriptstyle {(B)}}} (\vartheta ) &{}= \left( \int _0^\infty M_{W^{\scriptscriptstyle {(R)}}} (\vartheta {\mathrm e}^{-\lambda x}) {\mathrm e}^{-x} \mathrm{d}x \right) ^2 \,\cdot \, \exp \!\left\{ \rho \cdot \int _0^\infty \left( M_{W^{\scriptscriptstyle {(B)}}} (\vartheta {\mathrm e}^{-\lambda x}) -1\right) {\mathrm e}^{-x} \mathrm{d}x \right\} , \\ M_{W^{\scriptscriptstyle {(R)}}} (\vartheta ) &{}= \int _0^\infty M_{W^{\scriptscriptstyle {(R)}}} (\vartheta {\mathrm e}^{-\lambda x}) {\mathrm e}^{-x} \mathrm{d}x \,\cdot \, \exp \! \left\{ \rho \cdot \int _0^\infty \left( M_{W^{\scriptscriptstyle {(B)}}} (\vartheta {\mathrm e}^{-\lambda x}) -1\right) {\mathrm e}^{-x} \mathrm{d}x \right\} . \end{array} \end{aligned}$$
(1.3)

Remark 1.6

These functional equations and the fact that there exists a solution for all \(\vartheta \in \mathbb {R}^+\) follow from the usual branching recursion of multi-type branching processes, that can be found e.g. in [5].
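For illustration only, the system (1.3) can be solved numerically by fixed-point iteration: the substitution \(u={\mathrm e}^{-x}\) turns each integral into \(\int _0^1 M(\vartheta u^{\lambda })\,\mathrm{d}u\). The sketch below is our own, with all numerical choices (grid, truncation, number of iterations) being ad hoc assumptions; it evaluates the solution on a grid of non-positive arguments, which is the range needed for the epidemic curve function f in Theorem 1.3.

```python
# Numerical sketch (ours): fixed-point iteration for the system (1.3) on a grid of
# non-positive arguments, using the substitution u = e^{-x}.
import numpy as np

def solve_mgf(rho, theta_min=-20.0, n_grid=400, n_u=400, n_iter=200):
    lam = (rho - 1 + np.sqrt(rho**2 + 6 * rho + 1)) / 2
    theta = np.linspace(theta_min, 0.0, n_grid)           # grid of arguments
    u = (np.arange(n_u) + 0.5) / n_u                       # midpoint rule on (0, 1)
    args = theta[:, None] * u[None, :] ** lam              # theta * u^lambda stays in [theta_min, 0]
    M_B = np.exp(theta)                                    # initial guess: MGF of W = 1
    M_R = np.exp(theta)
    for _ in range(n_iter):
        int_R = np.interp(args, theta, M_R).mean(axis=1)   # int_0^1 M_R(theta u^lam) du
        int_B = np.interp(args, theta, M_B).mean(axis=1)   # int_0^1 M_B(theta u^lam) du
        M_B, M_R = int_R**2 * np.exp(rho * (int_B - 1)), int_R * np.exp(rho * (int_B - 1))
    return theta, M_B, M_R, lam

theta, M_B, M_R, lam = solve_mgf(rho=2.0)
# epidemic-curve function of Theorem 1.3 (with M_{W_V} = M_{W^{(B)}}):
pi_R = 2 / (lam + 2)
f = lambda t: 1 - np.interp(-(1 - pi_R**2 / 2) * np.exp(lam * t) / (lam * (lam + 1)), theta, M_B)
```

The truncation theta_min only limits the time range on which f can be evaluated accurately; convergence of the iteration is not claimed here, only observed for moderate \(\rho \) in this sketch.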

1.3 Related Literature, Comparison and Context

First passage percolation (FPP) was first introduced by Hammersley and Welsh [22] to study spreading dynamics on lattices, in particular on \(\mathbb {Z}^d, d\ge 2\). The intuitive idea behind the method is that one imagines water flowing at a constant rate through the (random) medium, the waterfront representing the spread. The model turned out to be able to capture the core idea of several other processes, such as weighted graph distances and epidemic spreads.

Janson [25] studied typical distances and the corresponding hopcount, flooding times as well as diameter of FPP on the complete graph. He showed that typical distances, the flooding time and diameter converge to 1, 2, and 3 times \(\log n/n\), respectively, while the hopcount is of order \(\log n\).

1.3.1 Universality Class

In a sequence of papers (e.g. [11–13, 23, 24]) van der Hofstad et al. investigated FPP on random graphs. Their aim was to determine universality classes for the shortest path metric for weighted random graphs without ‘extrinsic’ geometry (e.g. the supercritical Erdős–Rényi random graph, the configuration model, or rank-1 inhomogeneous random graphs). They showed that typical distances and the hopcount scale as \(\log n\), as long as the degree distribution has finite asymptotic variance and the edge weights are continuous on \([0,\infty )\). On the other hand, power-law degrees with infinite asymptotic variance drastically change the metric and there are several universality classes, compare [24] with [11]. In this respect, Theorems 1.1 and 1.2 show that the presence of the circle does not modify the universality class of the model. A CLT for the hopcount in weighted random graphs first occurred in [25] for the complete graph, then it was implicitly stated in [23] for the Erdős–Rényi random graph with average degree at least \((\log n)^3\). For random graphs with finite mean degree, the CLT for the hopcount was proved in [11–13].

1.3.2 Comparison to the Erdős–Rényi graph

Notice that the subgraph formed by shortcut edges is approximately an Erdős–Rényi graph, with the difference that the presence of the cycle always makes \(\mathrm{NW}_n(\rho )\) connected, and hence there is no subcritical or critical regime in \(\mathrm{NW}_n(\rho )\). Typical distances on the Erdős–Rényi graph with parameter \(\rho /n\) and \(\mathrm{Exp}(1)\) edge weights scale as \(\log n/(\rho -1)\) [12], while for \(\mathrm{NW}_n(\rho )\) they scale as \((\log n)/\lambda \), with \(\lambda =(\rho -1 + \sqrt{\rho ^2 + 6\rho +1})/2> \rho -1\) for all \(\rho >0\). This means that when \(\rho >1\), the presence of the cycle makes typical distances shorter, and this appears already in the constant scaling factor of \(\log n\). However, \(\lambda (\rho )/\rho \rightarrow 1\) as \(\rho \rightarrow \infty \), meaning that the effect of the cycle becomes more and more negligible as the number of shortcut edges grows.

1.3.3 Comparison to Inhomogeneous Random Graphs

Kolossváry et al. [29] studied FPP on the inhomogeneous random graph model (IHRG), defined in [15]. In this model, vertices have types from a type space S, and conditioned on the types of the vertices, edges are present independently with probabilities that depend on the types. One can fine-tune the parameters of this model so that any finite neighborhood of a vertex in the \(\mathrm{NW}_n(\rho )\) model is similar to that in the IHRG, that is, both of them can be modelled using the same continuous time multi-type branching process. It would be natural to conjecture that typical distances are then the same in these two models. It turns out that this is almost but not entirely the case: the first order term \(\lambda ^{-1} \log n\), and the random variables \(W_U, W_V\) are the same, but the additive constant c in Theorem 1.1 is not: the geometry of the Newman–Watts model modifies how the two branching processes can connect to each other, which modifies the constant. Writing the main result in [29] in the same form as the one in Theorem 1.1, we obtain \(c_\text {IHRG} = \log \left( (\rho +2)(2 \rho +\lambda ^2) /( \rho (\lambda +2)^2\lambda (\lambda +1))\right) \).

1.3.4 Comparison to the Discrete Model

Barbour and Reinert were the first to investigate typical distances on the Newman–Watts model rigorously. In [7] they investigated a similar model: a continuous circle with circumference L instead of a cycle on L vertices, with \(\mathrm{Poi}(L\rho /2)\) many shortcuts added at locations chosen according to the uniform measure on the circle. Distances are measured by the usual arc measure along the circle, while shortcuts are given length 0. Their results concerning typical distances are implicit, but rewritten they show that the distance is a logarithmic function of L:

$$\begin{aligned} \mathbb {P}(\mathcal {P}_L(U,V)>(\log (L\rho )/2+x)/\rho ) \rightarrow \int _0^\infty \frac{{\mathrm e}^{-y}}{1+2y{\mathrm e}^{2x}}\,\mathrm{d}y. \end{aligned}$$

In a subsequent paper [8] they treated the discrete model \(\mathrm{NW}_n(\rho (n))\) with unit edge weights. They gave a complete characterisation of typical distances in terms of the parameter \(\rho (n)\), which might also tend to infinity with n. In particular, they showed that the earlier continuous model is a good approximation only if \(\rho (n)\rightarrow \rho \): in this case the distances are again logarithmic.

1.3.5 The Epidemic Curve

The study of the epidemic curve on random graphs originates with Barbour and Reinert [9], who investigated the epidemic curve on the Erdős–Rényi random graph and on the configuration model with bounded degrees, where other features, such as a contagious period of vertices or dependence of the transmission time distribution on the degrees, might also be present. Later, in [14], Bhamidi et al. pointed out the connection between FPP, typical distances, and the epidemic curve by studying the epidemic spread on the configuration model with arbitrary continuous edge-weight distribution. Our Theorem 1.3 is very much along the lines of these two results.

1.3.6 Possible Future Directions

In [3, 10, 19] the competition of two spreading processes running on the same graph is investigated. This can be considered a competition between two epidemics, as well as the word-of-mouth marketing of two similar products. The results suggest that the outcome depends on the universality class of the model: in ultra-small worlds, one competitor only gets a negligible part of the vertices, while on regular graphs coexistence might be possible, i.e., both colors can paint a linear fraction of vertices. Studying competition on \(\mathrm{NW}_n(\rho )\) is an interesting and challenging future project.

1.4 Structure of the Paper

In what follows, we prove Theorems 1.1, 1.2 and 1.3. The brief idea of the proof is the following: we choose two vertices uniformly at random, then we start to explore the neighbourhoods of these vertices in the graph in terms of the distance from these vertices (Sect. 2). We show that this procedure w.h.p. results in ‘shortest weight trees’ (\(\mathrm{SWT}\)’s) that can be coupled to two independent copies of a continuous time multi-type branching process (CMBP). We then handle how these two shortest weight trees connect in the graph in Sect. 3 with the help of a Poisson approximation. We provide the proof of Theorem 1.3 about the epidemic curve in Sect. 4 based on our result on distances. Finally we prove the Central Limit Theorem for the hopcount in Sect. 5, based on an indicator representation of the ‘generation of vertices’ in the branching processes.

2 Exploration Process

To explore the neighborhood of a vertex, we use a modification of Dijkstra’s algorithm.

Introduce the following notation: \(\mathcal {N}(t), \,\mathcal {A}(t), \,\mathcal {U}(t)\) denote the set of explored (dead), active (alive) and unexplored vertices at time t, respectively, and \(\mathrm{N}(t), \,\mathrm{A}(t), \mathrm{U}(t)\) denote the sizes of these sets. The remaining lifetime of a vertex \(w \in \mathcal {A}(t)\) at time t is denoted by \(R_w(t)\), meaning that w will become explored exactly at time \(t+R_w(t)\). The set of remaining lifetimes is \(\mathcal {R}_{\left\{ \mathcal {A}(t)\right\} }(t)\). As before, \(\mathcal {H}(v)\) denotes the neighbors of a vertex v (Figs. 1, 2).

Fig. 1 A realisation of the Newman–Watts model for \(k=1\) and \(\rho =1.1\) with 60 vertices. On these two pictures, we illustrated the growing neighbourhood of a uniformly picked vertex. Circle edges are red and global edges are blue in the exploration. The edges that are partially red or blue are the ones that have an already explored vertex on one side while a not-yet explored (active) vertex on the other side (Color figure online)

2.1 The Exploration Process on an Arbitrary Weighted Graph

Let \(i=1\). The vertex from which we start the exploration process is denoted by \(v_1\). We color \(v_1\) blue and set the time as \(t=T_1=0\). Evidently, we take

$$\begin{aligned} \mathcal {N}(0) = \{v_1\}, \quad \mathcal {A}(0) = \mathcal {H}(v_1), \quad \mathcal {U}(0) = [n]\setminus \left( \{v_1\}\cup \mathcal {H}(v_1)\right) . \end{aligned}$$

The remaining lifetimes are determined by the edge weights, i.e.

$$\begin{aligned} \mathcal {R}_{\left\{ \mathcal {A}(0)\right\} }(0) = \{R_w(0)=X_{(v_1,w)} \text { for all } w \in \mathcal {H}(v_1)\}. \end{aligned}$$

We color the active vertices \(w \in \mathcal {H}(v_1)\) to have the same color as the edge \((v_1,w)\).

We proceed by induction from now on: in each step, we increase i by 1. In this way we construct the continuous time process in steps, namely, at the random times when we explore a new vertex.

Fig. 2 We indicated the growing neighbourhood of a uniformly picked vertex in the exploration process. Exclamation marks indicate ‘bad events’ for the coupling to a branching process: the vertices at the endpoint of edges (indicated along the edge) with two blue exclamation marks are vertices that are blue active and have been already explored as well. The vertex with two red and one blue exclamation mark is twice red active and once blue active at the same time (Color figure online)

Let \(\tau _i=\min \left\{ \mathcal {R}_{\left\{ \mathcal {A}(T_{i-1})\right\} }(T_{i-1})\right\} \), the minimum of remaining lifetimes. Then define \(T_i:=T_{i-1}+\tau _i\), the time when we explore the next vertex. Nothing changes in the time interval \([T_{i-1},T_i)\), hence for any \(t \in [T_{i-1},T_i)\),

$$\begin{aligned} \mathcal {N}(t):=\mathcal {N}(T_{i-1}), \quad \mathcal {A}(t):=\mathcal {A}(T_{i-1}), \quad \mathcal {U}(t):=\mathcal {U}(T_{i-1}). \end{aligned}$$

From all the remaining lifetimes, we subtract the time passed: for some \(0\le s\le \tau _i\),

$$\begin{aligned} \mathcal {R}_{\left\{ \mathcal {A}(T_{i-1})\right\} }(T_{i-1}+s):=\mathcal {R}_{\left\{ \mathcal {A}(T_{i-1})\right\} }(T_{i-1})-s, \end{aligned}$$

subtracted element-wise. At time \(T_i\), the vertex (or all the vertices, if there is more than one such vertex) \(v_i\) whose remaining lifetime equals 0 becomes explored and its neighbors become active. We shall refer to \(v_i\) as the \(i^\text {th}\) explored vertex. We set

$$\begin{aligned} \mathcal {N}(T_i):=\mathcal {N}(T_{i-1})\cup \{v_i\}, \quad \mathcal {A}(T_i):=(\mathcal {A}(T_{i-1})\setminus \{v_i\})\cup \mathcal {H}(v_i), \quad \mathcal {U}(T_i):=\mathcal {U}(T_{i-1})\setminus \mathcal {H}(v_i). \end{aligned}$$

We refresh the set of remaining lifetimes:

$$\begin{aligned} \mathcal {R}_{\left\{ \mathcal {A}(T_i)\right\} }(T_i):=\mathcal {R}_{\left\{ \mathcal {A}(T_{i-1})\right\} }(T_i)\setminus \{R_{v_i}(T_i)\}\cup \{R_x(T_i) : x \in \mathcal {H}(v_i)\} \end{aligned}$$

where \(R_x(T_i)=X_{(v_i,x)}\), the edge weight of \((v_i,x)\), and x also gets the color of \((v_i,x)\).

On an arbitrary connected weighted graph, the exploration process can be continued until all vertices become explored. Note that this algorithm builds the shortest weight tree \(\mathrm{SWT}\) from the starting vertex. This tree will be modeled using the branching process.
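The net effect of the exploration process on an arbitrary weighted graph can be summarized in a few lines of code. The sketch below is ours and ignores the color bookkeeping: it keeps the active instances in a heap ordered by the time at which they would be explored, discards later instances of already explored labels, and returns the exploration times together with the parent edges of the \(\mathrm{SWT}\).

```python
# Minimal sketch (ours) of the exploration process on a weighted graph,
# essentially Dijkstra's algorithm driven by remaining lifetimes.
import heapq

def explore(G, root):
    dist = {root: 0.0}            # explored vertices with their exploration times
    parent = {root: None}         # parent edges of the shortest weight tree SWT
    # active instances: (time at which the instance would be explored, vertex, parent)
    active = [(G.edges[root, w]["weight"], w, root) for w in G[root]]
    heapq.heapify(active)
    while active:
        t, v, par = heapq.heappop(active)      # instance with the smallest remaining lifetime
        if v in dist:                          # a later instance of an explored label: thin it
            continue
        dist[v], parent[v] = t, par
        for w in G[v]:
            if w not in dist:                  # new active instance of w
                heapq.heappush(active, (t + G.edges[v, w]["weight"], w, v))
    return dist, parent
```

Here dist[v] equals the weighted distance from the root, and following parent from v back to the root recovers the shortest-weight path and hence the hopcount.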

Remark 2.1

The set of active vertices might contain several occurrences of a vertex, in case at least two neighbors of a vertex are explored already, see Fig. 2.

2.2 Exploration on the Weighted Newman–Watts Random Graph

We aim to apply the exploration process defined above for discovering the neighborhood of a vertex in a realization of \(\mathrm{NW}_n(\rho )\). In the beginning, we think of the environment as completely random, and we reveal the presence and weight of edges as the exploration algorithm proceeds, i.e., we reveal an edge when one of its endpoints becomes explored. In this respect, all the quantities defined for the exploration process become random variables. In this section, we investigate the behavior of this random exploration process (Figs. 1, 2).

Let us color the cycle-edges red and the shortcut-edges blue, and let us say that an instance of a vertex is red/blue in the exploration if it is encountered via a red/blue edge. Below, adding the subscript R or B to any quantity corresponds to the same quantity restricted to only the red or blue vertices, respectively. Note that in case there are several paths leading to a vertex i from the root, there might be multiple instances of i in the exploration and they might have different colors. However, as these paths have different lengths, eventually a unique instance of i will be determined by being explored first, making the coloring unique on explored vertices. We deal with the issue of multiple instances thoroughly in Sects. 2.4.3 and 2.4.1.

While running the exploration process, we build a weighted tree along the process containing the edges that are used to explore the new vertices in the algorithm (restricted to the explored vertices, this is indeed a tree). This tree has root \(v_1\), grows in time, and at any time t it contains the vertex \(v\in [n]\) precisely when \(\mathcal {P}_n(v_1, v)<t\). Let us denote the tree up to time t by \(\mathrm{SWT}^{v_1}(t)\).

Claim 2.2

(Children) Suppose the vertex v is being explored for the first time (i.e., not “double-explored”). If v is red, one new red and \(\mathrm{Binomial}(n-3,\frac{\rho }{n})\) many new blue active vertices are born. If v is blue, two new red and \(\mathrm{Binomial}(n-4,\frac{\rho }{n})\) many new blue active vertices are born. The number of new blue active vertices is asymptotically \(\mathrm{Poi}(\rho )\) in both cases. Further, at any time t, the elements of \(\mathcal {R}_{\left\{ \mathcal {A}(t)\right\} }(t)\) are i.i.d. \(\mathrm{Exp}(1)\) random variables, and the next explored vertex is chosen uniformly over the set of active vertices.

Proof

On a cycle there are two vertices neighboring a vertex, hence, if v is red, then it has been reached from one of its neighbors. The other one is added to the new red active vertices. If v is blue then it has been reached via a shortcut edge and hence both of its neighbors on the cycle are added to the new red active vertices. Since there are \(\mathrm{Bin}(n-3, \rho /n)\) many shortcut edges from a vertex, this is also the distribution of new blue active vertices born when exploring a red vertex. When exploring a blue vertex, we have reached it via a blue edge, hence it gives birth to an additional \(\mathrm{Bin}(n-4, \rho /n)\) many new blue active vertices. Clearly, by the convergence of the binomial to the Poisson distribution, each vertex has asymptotically \(\mathrm{Poi}(\rho )\) many blue neighbours. The second statement follows from the fact that the edge weights are i.i.d. exponential random variables, which have the memoryless property. Finally, note that at any time, \(\mathcal {R}_{\left\{ \mathcal {A}(t)\right\} }(t)\) consists of i.i.d. exponential random variables, and the algorithm takes the minimum of these. Clearly, the minimum of finitely many absolutely continuous random variables is unique almost surely, and uniform over the indices. \(\square \)

2.3 Multi-type Branching Processes

We define the following continuous time multi-type branching process (CMBP) that will correspond to the initial stages of \(\mathrm{SWT}(t)\).

There are two particle types, red (R) and blue (B), and their lifetime is \(\mathrm{Exp}(1)\), independent of everything else. Particles give birth upon their death. They leave behind offspring as in Claim 2.2: each particle has \(\mathrm{Poi}(\rho )\) many blue offspring; red particles have one red child, while blue particles have two. Dead and alive particles will correspond to explored and active vertices, respectively. With this wording, for the number of alive and dead particles, we define

Definition 2.3

We shall write \(\mathbf A(t)=(\mathrm{A}_R(t),\mathrm{A}_B(t))\) for the number of alive particles of each type, \(\mathrm{A}(t)\) standing for the total number of alive particles. Let \(\mathrm{N}(t)=\mathrm{N}_R(t)+\mathrm{N}_B(t)\), where \(\mathrm{N}_q(t)\) means the number of dead particles of type \(q=R,B\). We assume the above quantities to be right-continuous. Superscripts (R), (B) refer to the process started with a single particle of the given type.

The exploration process corresponds to the process started with a single blue-type particle, which dies immediately.
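For illustration, the CMBP can be simulated by an event-driven scheme in which each alive particle carries an \(\mathrm{Exp}(1)\) death clock. The sketch below (ours) follows the offspring rules of Claim 2.2 and starts, as described above, from a single blue particle that dies immediately.

```python
# Sketch (ours) of the two-type CMBP of Sect. 2.3: event-driven simulation with
# Exp(1) lifetimes; red particles leave 1 red + Poi(rho) blue children,
# blue particles leave 2 red + Poi(rho) blue children.
import heapq
import numpy as np

def simulate_cmbp(rho, t_max, seed=None):
    rng = np.random.default_rng(seed)
    events = [(0.0, "B")]                      # (death time, type); the blue root dies at time 0
    alive = {"R": 0, "B": 1}
    dead = {"R": 0, "B": 0}
    while events and events[0][0] <= t_max:
        t, typ = heapq.heappop(events)
        alive[typ] -= 1
        dead[typ] += 1
        n_red = 1 if typ == "R" else 2         # offspring rule of Claim 2.2
        n_blue = rng.poisson(rho)
        for _ in range(n_red):
            heapq.heappush(events, (t + rng.exponential(1.0), "R"))
            alive["R"] += 1
        for _ in range(n_blue):
            heapq.heappush(events, (t + rng.exponential(1.0), "B"))
            alive["B"] += 1
    return alive, dead

lam = (2.0 - 1 + np.sqrt(2.0**2 + 6 * 2.0 + 1)) / 2
alive, dead = simulate_cmbp(rho=2.0, t_max=np.log(1000) / (2 * lam), seed=1)
# by Theorem 2.6 below, (A_R(t), A_B(t)) e^{-lambda t} is approximately W * (pi_R, pi_B)
```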

2.3.1 Literature on Multi-type Branching Processes

Here we restate the necessary theorems from [5] which we will use.

Definition 2.4

(Mean matrix) Let \(\mathrm{M}(t)\) denote the mean matrix with entries \(\mathrm{M}_{r,q}(t) := \mathbb {E}[\mathrm{A}_q^{(r)}(t)],\ (q,r=R,B)\), where \(\mathrm{A}_q^{(r)}(t)\) is as defined above in Definition 2.3.

It is not hard to see that \(\mathrm{M}(t)\) satisfies the semigroup property \(\mathrm{M}(t+s)=\mathrm{M}(t)\mathrm{M}(s)\) and the continuity condition \(\lim _{t \rightarrow 0}\mathrm{M}(t)=\mathrm{I}\), where \(\mathrm{I}\) denotes the identity matrix. As a result, we have:

Theorem 2.5

(Athreya-Ney) There exists an infinitesimal generator matrix \(\mathrm{Q}\) such that \(\mathrm{M}(t)={\mathrm e}^{\mathrm{Q}t}\), where \(\mathrm{Q}_{r,q}=a_r \mathbb {E}[D^{(r)}_q]-\delta _{r,q}\). Here, \(a_r\) is the rate of dying for a particle of type r (i.e., the parameter of its exponential lifetime), D is the number of offspring with the same sub- and superscript conventions as in Definition 2.3, and \(\delta _{r,q}={\mathbbm {1}}_{\{r=q\}}\) (i.e., \(\delta _{r,q}=1\) if and only if \(r=q\)).

In our case,

$$\begin{aligned} \mathrm{Q}=\left( \begin{array}{cc} 0 &{}\quad \rho \\ 2 &{}\quad \rho -1 \end{array} \right) \end{aligned}$$

Eigenvalues and eigenvectors of the \(\mathrm{Q}\) matrix. Using the characteristic polynomial, for \(\rho \ge 1\), the maximal eigenvalue \(\lambda \) and the second eigenvalue \(\lambda _2\) are given by

$$\begin{aligned} \lambda =\frac{\rho -1+\sqrt{\rho ^2+6\rho +1}}{2}, \quad \lambda _2=\frac{\rho -1-\sqrt{\rho ^2+6\rho +1}}{2}. \end{aligned}$$
(2.1)

For \(0<\rho <1\), the negative eigenvalue \(\lambda _2\) is dominant in absolute value.

The normalized left eigenvector \({\varvec{\pi }}\) that satisfies \({\varvec{\pi }}\mathrm{Q}= \lambda {\varvec{\pi }}\) gives the stationary type-distribution:

$$\begin{aligned} {\varvec{\pi }}= \left( \pi _R, \pi _B\right) =\left( \frac{2}{\lambda +2}, \frac{\lambda }{\lambda +2}\right) . \end{aligned}$$
(2.2)
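A quick numerical sanity check (ours, not needed for the proofs) of (2.1) and (2.2): the eigenvalues of \(\mathrm{Q}\) and the left eigenvector \({\varvec{\pi }}\) can be verified directly with numpy.

```python
# Numerical check (a sketch of ours): eigenvalues (2.1) and stationary
# type distribution (2.2) of the generator matrix Q.
import numpy as np

rho = 2.0
Q = np.array([[0.0, rho], [2.0, rho - 1.0]])
lam = (rho - 1 + np.sqrt(rho**2 + 6 * rho + 1)) / 2
lam2 = (rho - 1 - np.sqrt(rho**2 + 6 * rho + 1)) / 2

eigvals = np.sort(np.linalg.eigvals(Q).real)
assert np.allclose(eigvals, [lam2, lam])

pi = np.array([2 / (lam + 2), lam / (lam + 2)])
assert np.allclose(pi @ Q, lam * pi)        # pi is the left eigenvector: pi Q = lam pi
assert np.isclose(pi.sum(), 1.0)            # normalized type distribution
```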

We denote the right (column) eigenvector of \(\mathrm{Q}\) by \(\mathbf {u}\) and normalize it so that \({\varvec{\pi }}\mathbf {u}= 1\). For later use, without computing them explicitly, we denote by \(\mathbf {v}_2\) and \(\mathbf {u}_2\) the left (row) and right (column) eigenvectors of \(\mathrm{Q}\) belonging to the eigenvalue \(\lambda _2\). The most important theorem for our purposes is that the CMBP grows exponentially with rate \(\lambda \) (the so-called Malthusian parameter); more precisely,

Theorem 2.6

([5]) With the notation as above, almost surely,

$$\begin{aligned} \lim _{t\rightarrow \infty } \mathbf A(t){\mathrm e}^{-\lambda t} = W {\varvec{\pi }}\end{aligned}$$

where W is a non-negative random variable, the almost sure martingale limit of \(W_t := \mathbf A(t)\mathbf {u}\,{\mathrm e}^{-\lambda t}\). Further, \(W>0\) almost surely on the event of non-extinction.

Theorem 2.7

([5]) Define \(T_m\), the \(m^\text {th}\) split time, as the time of the \(m^\text {th}\) death in the branching process. (We assume \(T_1=0\) for the death of the root.) On the event \(\{W>0\}\),

  1. (i)

    For each \(q \in (R,B)\), \(\lim _{m\rightarrow \infty } \mathrm{N}_q(T_m)/\mathrm{N}(T_m) = \lim _{m\rightarrow \infty } \mathrm{N}_q(T_m)/m \buildrel \text {a.s.}\over {=}\pi _q\)

  2. (ii)

    \(\lim _{m\rightarrow \infty } m{\mathrm e}^{-\lambda T_m} \buildrel \text {a.s.}\over {=}\frac{1}{\lambda }W\)

Corollary 2.8

For the vector of dead particles \(\mathbf N(t)=(N_R(t), N_B(t))\),

$$\begin{aligned} \mathbf N(t){\mathrm e}^{-\lambda t} \buildrel {a.s.}\over {\longrightarrow }\frac{1}{\lambda }W {\varvec{\pi }}. \end{aligned}$$

Proof of Theorems 2.5, 2.6, 2.7 and Corollary 2.8

The proofs can be found in [5, Chapter V.7]. \(\square \)

Throughout the next sections, we develop error bounds on the coupling between the branching process and the exploration process on the graph. For convenience, we introduce

$$\begin{aligned} t_n:=\frac{1}{2\lambda } \log n, \end{aligned}$$
(2.3)

the times we will observe the branching and exploration processes at, as well as

$$\begin{aligned} W^{\scriptscriptstyle {(n)}}:=e^{-\lambda t_n} \mathrm{A}(t_n), \quad \text {with} \quad W^{\scriptscriptstyle {(n)}}\buildrel {d}\over {\longrightarrow }W, \end{aligned}$$
(2.4)

the approximations of the martingale limit W at the times \(t_n\). Note that in our case, extinction can never occur, hence almost surely \(W>0\).

2.4 Labeling, Coupling, Error Terms

In this section we develop a coupling between the CMBP discussed in the previous section and \(\mathrm{SWT}(t)\), the exploration process on \(\mathrm{NW}_n(\rho )\).

Error bound on coupling the offspring. The CMBP is defined with \(\mathrm{Poi}(\rho )\) blue offspring distribution, while in the exploration process a vertex has \(\mathrm{Bin}(n-3,\rho /n)\) or \(\mathrm{Bin}(n-4,\rho /n)\) many blue children. Let \(\{\xi _i\}_{i=1}^{n}\) be i.i.d. Bernoulli trials with success probability \(\rho /n\), let \(X=\sum _{i=1}^{n} \xi _i\sim \mathrm{Bin}(n,\rho /n)\), and let \(Y\sim \mathrm{Poi}(\rho )\). By the usual coupling of binomial and Poisson random variables, \(\mathbb {P}(X\ne Y)\le \frac{\rho ^2}{n}\). Now we decompose X as \(Z=\sum _{i=1}^{n-3}\xi _i \sim \mathrm{Bin}(n-3,\rho /n)\) and \(V=\sum _{i=n-2}^{n}\xi _i \sim \mathrm{Bin}(3,\rho /n)\). Note that Z and V are independent and we can write Z as \(Z=X-V\). Then, under the usual coupling of X and Y,

$$\begin{aligned} \mathbb {P}(Z\ne Y) \le \mathbb {P}(X\ne Y)+\mathbb {P}(V\ne 0)\le \frac{\rho ^2}{n}+\frac{3\rho }{n}. \end{aligned}$$

For the blue offspring of a blue vertex let \(\hat{Z}=\sum _{i=1}^{n-4}\xi _i\sim \mathrm{Bin}(n-4,\rho /n)\) and \(\hat{V}=\sum _{i=n-3}^{n} \xi _i\sim \mathrm{Bin}(4,\rho /n)\); by similar arguments, \(\mathbb {P}(\hat{Z}\ne Y) \le \frac{\rho ^2}{n}+\frac{4\rho }{n}\) holds. Taking the maximum and using a union bound, the probability that up to k steps at least one particle has a different number of blue offspring in the exploration process and in the Poisson branching process is at most \(k(\rho ^2+4\rho )/n\).
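As a numerical illustration of the order of this coupling error (ours, not used in the proofs), one can compare the total variation distance between \(\mathrm{Bin}(n-3,\rho /n)\) and \(\mathrm{Poi}(\rho )\), which lower-bounds the mismatch probability under any coupling, with the bound \((\rho ^2+3\rho )/n\) above; both are of order 1/n. The sketch assumes scipy.

```python
# Numerical check (ours): total variation distance between Bin(n-3, rho/n) and
# Poi(rho), compared with the union bound (rho^2 + 3 rho)/n from the text.
import numpy as np
from scipy.stats import binom, poisson

def tv_distance(n, rho):
    k = np.arange(0, 200)                       # tail mass beyond 200 is negligible here
    pmf_bin = binom.pmf(k, n - 3, rho / n)
    pmf_poi = poisson.pmf(k, rho)
    return 0.5 * np.abs(pmf_bin - pmf_poi).sum()

rho = 2.0
for n in (10**3, 10**4, 10**5):
    print(n, tv_distance(n, rho), (rho**2 + 3 * rho) / n)
```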

2.4.1 Labeling and Thinning

We relate the CMBP to the exploration process on \(\mathrm{NW}_n(\rho )\) through a labeling of the former. Below, everything must be interpreted modulo n.

  1. (i)

    The root is labeled u, the source of the exploration process. u can be U, a uniformly chosen vertex in [n].

  2. (ii)

    Every other particle gets a label when it is born.

  3. (iii)

    We distinguish “left type” and “right type” red children. Left type red particles have a left type red child, right type red particles have a right type red child, blue particles have a red child of both types.

  4. (iv)

    A left type red child of v gets label \(v-1\), a right type red child of v is labeled \(v+1\).

  5. (v)

    The blue children of v get a set of labels uniformly chosen from [n].

Lemma 2.9

We say that the labeling fails if two explored vertices share the same label (this still allows for several occurrences of the same label in the active set). The probability that the labeling fails at the \(i^\text {th}\) split is at most 2i/n.

Proof

The labeling fails at the \(i^\text {th}\) split if the splitting particle has a label that is already taken by an explored vertex. We distinguish two cases.

When a blue particle splits. Since the label of a blue particle is chosen uniformly in [n], and there are at most \(i-1\) dead labels already, the probability that we choose from this set is \((i-1)/n\).

When a red particle splits. Note that the labeling procedure ensures that whenever a blue particle v is explored, it starts a growing (possibly asymmetric) interval of red vertices around it. A red vertex, upon dying, extends this interval in one direction (if it is left type, then towards the left). Note that the original vertex v in this interval had a uniformly chosen label in [n]. Let us denote the position of the \(k^\text {th}\) explored blue vertex by \(c_k\), and write \(\ell _k(T_i)\) and \(r_k(T_i)\) for the number of explored red vertices to the left and to the right of \(c_k\) after the \(i^\text {th}\) split, \(i\ge k\). Finally, we denote the whole interval of explored vertices around \(c_k\) after the \(i^\text {th}\) split by \(I_k(T_i)\). Recall that the process is by definition right-continuous.

In this setting, the label of a red vertex that is just being explored can coincide with the label of an already explored red vertex if and only if two intervals ‘grow into each other’ at the \(i^\text {th}\) split. Denote by \(I^*\) the interval that grows at the \(i^\text {th}\) split, and write \(c^*, r^*(T_{i-1}), \ell ^*(T_{i-1})\) for the location of its blue vertex and for its right and left length, respectively. Then, \(I^\star \) grows into another interval \(I_k\) if and only if \(c_k\), the location of the blue vertex in \(I_k\), is at position \(c^*-\ell ^*(T_{i-1})-r_k(T_{i-1})-1\) or at position \(c^*+r^*(T_{i-1})+\ell _k(T_{i-1})+1\). (The first case means that the furthest explored red vertex on the right of \(I_k\) was a red active child of the furthest explored left vertex in \(I^*\).) Since the location of \(c_k\) is uniform in [n],

$$\begin{aligned} \mathbb {P}(I^*(T_{i-1}) \cap I_k(T_{i-1})= \varnothing , I^*(T_{i}) \cap I_k(T_{i})\ne \varnothing )= \frac{2}{n}. \end{aligned}$$

Note that there are exactly as many intervals as blue explored vertices (at either \(T_{i-1}\) or \(T_i\), since the \(i^\text {th}\) explored vertex \(v_i\) must be red).

Let \(E_i=\{v_i \text { is red and its label is already used}\}\) be the bad event. Hence,

$$\begin{aligned} \mathbb {P}(E_{i}) \le \sum _{k=1}^{\mathrm{N}_B(T_{i-1})-1} \frac{2}{n} = \frac{2}{n} \left( \mathrm{N}_B(T_{i-1})-1\right) \le \frac{2i}{n}, \end{aligned}$$

since there are at most i blue explored vertices. Note that the proof also applies when the new red explored vertex coincides with a formerly explored blue one, in case \(\ell _k(T_{i-1})=0\) or \(r_k(T_{i-1})=0\). Hence, the statement of the lemma follows. \(\square \)

In \(\mathrm{NW}_n(\rho )\), the shortest path between u and v through x necessarily uses the shortest path between u and x. As a result, in the CMBP, we also do not need later occurrences of the label x. Hence, we mark the second (or any later) occurrence of a label thinned, and all its descendants ghosts. We move towards bounding the proportion of ghosts among active individuals to carry on with the CMBP approximation. To determine whether a vertex is a ghost, we need knowledge about its ancestors.

2.4.2 Ancestral Line

We approach the problem of ghost actives with the help of the ancestral line. We define the ancestral line \(\mathrm{AL}(y)\) of a vertex y as the chain of particles leading to y from the root, including the root and y itself. Then an alive particle is a ghost if and only if at least one of its ancestors is thinned. The ancestral line was introduced by Bühler in [16, 17] with the following observation: for each time interval \([T_{k},T_{k+1})\) we can allocate a unique particle on the ancestral line that was active in the interval \([T_{k},T_{k+1})\). For the following observations, we condition on \(\{D_i, i=1,\ldots ,k\}\), where \(D_i\) is the total number of offspring of the \(i^\text {th}\) splitting particle. Denote by \(G_k\) the generation of a uniformly chosen alive (active) particle Y after the \(k^\text {th}\) split. Then \(G_k=L_1+L_2+\cdots +L_k\), where the indicators \(L_i\) are conditionally independent and \(L_i=1\) if and only if the ancestor of Y that was alive in the time interval \([T_i,T_{i+1})\) was newborn (born at \(T_i\)). (A rewording of the indicators \(L_i\) is as follows: \(L_i=1\) if and only if the \(i^\text {th}\) splitting particle is in \(\mathrm{AL}(Y)\).)

Since Y is chosen uniformly, and at each split the individual to split is also chosen uniformly among the currently active individuals, each one of these active individuals is equally likely to be an ancestor of Y. Further, in the interval \([T_i,T_{i+1})\), \(D_i\) many particles are newborn, and \(S_i\) many are alive, which yields the probability \(\mathbb {P}(L_i=1| D_i, i=1,\ldots ,k)=D_i/S_i\), see the discussion at the beginning of [17, Sect. 2.A]. We arrive at the following corollary:
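The indicator representation of \(G_k\) is also convenient computationally: conditioned on \(D_1,\dots ,D_k\) and \(S_1,\dots ,S_k\), the \(L_i\) are independent Bernoulli\((D_i/S_i)\) variables, so the conditional mean and variance of \(G_k\) are simple sums. The toy sketch below (ours, with made-up inputs) illustrates this representation, which is also the one behind the CLT for the hopcount proved in Sect. 5.

```python
# Toy sketch (ours) of the representation G_k = L_1 + ... + L_k with
# conditionally independent L_i ~ Bernoulli(D_i / S_i).
import numpy as np

def generation_moments(D, S):
    p = np.asarray(D, dtype=float) / np.asarray(S, dtype=float)
    return p.sum(), (p * (1 - p)).sum()       # conditional E[G_k] and Var(G_k)

def sample_generation(D, S, rng):
    p = np.asarray(D, dtype=float) / np.asarray(S, dtype=float)
    return (rng.random(p.shape) < p).sum()    # one sample of G_k

# toy input: S_i roughly i * lambda as in Lemma 2.11, D_i some offspring counts
rng = np.random.default_rng(0)
k, lam, rho = 1000, 2.5616, 2.0
D = 1 + rng.poisson(rho, size=k)
S = np.maximum(1, np.round(lam * np.arange(1, k + 1)))
print(generation_moments(D, S))               # both are of order log k
```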

Corollary 2.10

The probability of the \(i^\text {th}\) dying particle being an ancestor of Y, a uniformly chosen active vertex after the \(k^\text {th}\) split:

$$\begin{aligned} \mathbb {P}(v_i \in \mathrm{AL}(Y)|D_i, i=1,...,k )=\mathbb {P}(L_i=1)=\frac{D_i}{S_i}. \end{aligned}$$

Expected proportion of thinned actives. Let us combine Corollary 2.10 and Lemma 2.9. To be able to do so, we need the following lemma. We will provide its proof later on.

Lemma 2.11

For every \(\varepsilon >0\), there exists a positive integer-valued random variable \(K=K(\varepsilon )\) so that K is always finite and for every \(i>K,\; S_i = i \lambda (1+o(i^{-1/2+\varepsilon }))\) holds.

Recall that \(t_n=(\log n)/(2\lambda )\), which was chosen such that the number of active vertices at time \(t_n\) is of order \(\sqrt{n}\), and that \(\mathcal {A}(t), \mathrm{A}(t), \mathcal {N}(t), \mathrm{N}(t)\) denote the set and number of active and dead individuals in the CMBP at time t, respectively.

Lemma 2.12

Let \(\mathcal {A}_G(t)=\{y \in \mathcal {A}(t) : y \text { is a ghost}\}\) be the set of ghost active vertices at time t and \(\mathrm{A}_G(t)\) its size. For every fixed \(s\in \mathbb {R}\), the proportion \(\mathrm{A}_G(t_n+s)/\mathrm{A}(t_n+s)\) tends to 0 in probability as n tends to infinity.

Proof of Lemma 2.12

The proportion \(\mathrm{A}_G(t)/\mathrm{A}(t)=\mathbb {P}(Y \in \mathcal {A}_G(t))\), where Y is uniform over \(\mathcal {A}(t)\), i.e., a uniformly chosen active individual. Recall that \(v_i\) is the particle that dies at \(T_i\). For an event E, let us write \(\mathbb {P}_k(E):=\mathbb {P}(E| D_i, i=1, \dots , k)\). Using this notation and Corollary 2.10 for the representation of the ancestral line of \(Y \in \mathcal {A}(t)\), we can write

$$\begin{aligned} \mathbb {P}_k(Y \in \mathcal {A}_G(t)) \le \sum _{i=1}^{\mathrm{N}(t)} \mathbb {P}_k(v_i \in \mathrm{AL}(Y)\text { and } v_i \text { is thinned}), \end{aligned}$$

Since the labeling is independent of the family tree,

$$\begin{aligned} \mathbb {P}_k(Y \in \mathcal {A}_G(t))\le \sum _{i=1}^{\mathrm{N}(t)} \mathbb {P}_k\left( v_i \in \mathrm{AL}(Y)\right) \cdot \mathbb {P}_k\left( v_i\text { is thinned}\right) \le \sum _{i=1}^{\mathrm{N}(t)} \frac{D_i}{S_i} \frac{2i}{n}. \end{aligned}$$
(2.5)

We apply Lemma 2.11 by splitting the sum into the parts up to K and above K, and use \(D_i<S_i\) for \(i\le K\):

$$\begin{aligned} \begin{array}{ll} \mathbb {P}(Y \in \mathcal {A}_G(t)) &{}\le \sum _{i=1}^{K} \frac{2i}{n} + \sum _{i=K+1}^{\mathrm{N}(t)} \frac{D_i}{\lambda (1+o(i^{-1/2+\varepsilon }))n} \\ &{}\le \frac{K^2}{n} + 2\frac{\sum _{i=K+1}^{\mathrm{N}(t)}D_i}{\lambda n} < \frac{K^2}{n} + 2\frac{\mathrm{A}(t)+ \mathrm{N}(t)}{\lambda n} \end{array} \end{aligned}$$
(2.6)

where we used that all particles are either active or dead in the process, and that with a possible modification of K, we can have \((1+o(i^{-1/2+\varepsilon }))> 1/2\) for all \(i>K\). Next, we can use Corollary 2.8 and Theorem 2.6, which gives that \(\mathrm{N}(t)+\mathrm{A}(t)={\mathrm e}^{\lambda t}(\frac{1}{\lambda }+1) W^{\scriptscriptstyle {(n)}}(1+o(1))\). Hence

$$\begin{aligned} \mathrm{A}_G(t_n+s)/\mathrm{A}(t_n+s) = \mathbb {P}(Y \in \mathcal {A}_G(t_n+s))\le K^2/n + 2{\mathrm e}^{\lambda (t_n+s)}\frac{\lambda +1}{\lambda ^2 n}W^{\scriptscriptstyle {(n)}}(1+o(1)). \end{aligned}$$

Setting \(t_n=\log n/ (2\lambda )\), the right hand side tends to 0 as \(n \rightarrow \infty \), since \(W^{\scriptscriptstyle {(n)}}\rightarrow W\) and K is a tight random variable (does not depend on n). \(\square \)

Let us now return to the proof of Lemma 2.11. This lemma follows from [4, Theorems 1, 2]. Here, we restate [4, Theorem 1] using our notations and for a special case, where each eigenvalue has multiplicity 1. This is sufficient for our purposes and easier than the general case.

Theorem 2.13

(Asmussen, [4]) Let \(\mathbf {Z}_n\) be the number of individuals in the \(n^\text {th}\) generation of a (discrete time) supercritical multi-type Galton–Watson process, with dominant eigenvalue \(\lambda \) and corresponding left and right eigenvectors \(\mathbf {v}\) and \(\mathbf {u}\). For any other eigenvalue \(\nu \), \(\mathbf {v}_\nu \) and \(\mathbf {u}_\nu \) denote the left and right eigenvectors belonging to \(\nu \).

For an arbitrary vector \(\mathbf {a}\in \mathbb {R}^p\) with the property \(\mathbf {v}\cdot \mathbf {a}=0\) define

$$\begin{aligned} \mu :=\sup \{\nu : \mathbf {v}_\nu \mathbf {a}\ne 0\}, \quad \sigma ^2 := \lim _{n\rightarrow \infty } \frac{|\mathbf {v}| \mathrm{Var}(\mathbf {Z}_n \mathbf {a})}{\lambda ^n} \end{aligned}$$
(2.7)

If \(\mu ^2<\lambda \), then with \(C_n = (2\sigma ^2 \mathbf {Z}_n \mathbf {u}\log n)^{1/2}\)

$$\begin{aligned} \liminf _{n\rightarrow \infty } \frac{\mathbf {Z}_n \mathbf {a}}{C_n} = -1 \quad \text {and} \quad \limsup _{n\rightarrow \infty } \frac{\mathbf {Z}_n \mathbf {a}}{C_n} = 1. \end{aligned}$$

We also restate [4, Theorem 2] without change.

Theorem 2.14

(Asmussen 2., [4]) Replacing \(\mathbf Z_n\) with \(\mathbf A(t), t \in [0,\infty )\), Theorem 2.13 remains valid for any supercritical irreducible multi-type Markov branching process.

Proof of Lemma 2.11

We use the previous two theorems for the 2-type branching process defined in Sect. 2.3. Since \({\varvec{\pi }}\) and \(\mathbf {v}_2\) are linearly independent, for any \(\mathbf {a}\ne (0,0)\) with \({\varvec{\pi }}\mathbf {a}= 0\), necessarily \(\mathbf {v}_2 \mathbf {a}\ne 0\), which implies \(\mu = \lambda _2\) in (2.7). The eigenvalues of the mean matrix \(\mathrm{M}(t)\) are \({\mathrm e}^{\lambda t}\) and \({\mathrm e}^{\lambda _2 t}\). The condition \(\mu ^2<\lambda \) in Theorem 2.13 is then equivalent to \(2\lambda _2<\lambda \), which follows from the nonnegativity of \(\rho \) through simple algebraic computations, see (2.1). The asymptotic variance \(\sigma ^2\) and \(C_t\) in this case become:

$$\begin{aligned} \sigma ^2 = \lim _{t\rightarrow \infty } {\varvec{\pi }}\mathrm{Var}(\mathbf A(t) \mathbf {a}) {\mathrm e}^{-\lambda t}, \quad C_t = (2 \sigma ^2 \mathbf A(t) \mathbf {u}\log t)^{1/2} \end{aligned}$$

This implies that the theorem rewrites to

$$\begin{aligned} \limsup _{t\rightarrow \infty }\frac{\mathbf A(t) \mathbf {a}}{C_t} =1 \text { and } \liminf _{t\rightarrow \infty }\frac{\mathbf A(t) \mathbf {a}}{C_t} =-1. \end{aligned}$$

Applying this for the split times \(T_i\), we get that there are only finitely many indices i such that \(\left| \mathbf A(T_i) \mathbf {a}/C_{T_i}\right| > 2\). Let the maximum of these indices be K, a random variable. Since \(T_i - \log i/\lambda \) has an almost sure limit by Theorem 2.7, \(T_i\) is of order \(\log i\). This implies that \(C_{T_i}\) is of order \((i\log \log i)^{1/2}\), and by the definition of almost sure convergence, \(C_{T_i}\) exceeds \(i^{1/2+\varepsilon }\) only finitely many times for every \(\varepsilon >0\).

Since \(\mathbb {E}[\mathbf A (t) \mathbf {a}] = 0\) if and only if \({\varvec{\pi }}\mathbf {a}= 0\), we can apply the theorem for the centered version \(S_i^c:=S_i-\mathbb {E}S_i\). Then for \(i>K\), \(|S_i^c|\le C_{T_i}\). The fluctuation is of smaller order than \(S_i\) itself, which means we can indeed write \(S_i=i\lambda (1+o(i^{-1/2+\varepsilon }))\). For more detail on this, see the proof of [26, Corollary 3.16]. \(\square \)

2.4.3 The Number of Multiple Active and Active-Explored Labels

Recall that both in the exploration process as well as in the branching process there might be multiple occurrences of active vertices, see Remark 2.1, as thinning only prevents multiple explored labels.

Later we want to use that the number of different active labels that are not ghosts at \(T_i\) is approximately the same as \(S_i\), i.e., there are not many multiple occurrences. In Lemma 2.12 we have seen that the proportion of ghosts is negligible on the time scale \(t_n\), but we still have to deal with labels that are multiply active, or are explored and active at the same time. We will discuss these issues in the following five cases:

  1. 1.

    A blue active vertex has been already explored.

  2. 2.

    A red active vertex has been already explored.

  3. 3.

    A blue active vertex is also red active.

  4. 4.

    A vertex is double red active.

  5. 5.

    A vertex is double blue active.

We will denote by \(p_\alpha (t)\) the probability that a uniformly chosen active vertex falls under case \(\alpha =1,\dots , 5\) at time t, which is the same as the proportion of vertices falling under case \(\alpha \) among all active vertices.

Case 1. Blue active being already explored. At time t, there are at most \(\mathrm{N}(t)\) explored labels that are not thinned. Under the condition that the active vertex is blue, its label is chosen uniformly over [n], so the probability that this label has been already explored is at most \(\mathrm{N}(t)/n\). Substituting \(\mathrm{N}(t_n+s)\) from Corollary 2.8, for \(t_n+s=\frac{1}{2\lambda }\log n+s\),

$$\begin{aligned} p_1(t_n+s)\le \mathbb {P}(v \text { is already explored } \vert \ v \text { is blue})=\mathrm{N}(t_n+s)/n. \end{aligned}$$
(2.8)

Case 2. Red active being already explored. This case can be treated similarly to the thinning of red vertices, so we also use the notation introduced there. A label of a red active vertex is explored if and only if two intervals are about to grow into each other: the furthest explored red vertices in both intervals are neighbors. We call these intervals neighbors. Then, for two neighboring intervals, the active vertices at the end of each interval are explored in the other interval. Let \(I_k\) and \(I_j, 1\le k<j\le \mathrm{N}_B(t)\), be intervals whose blue particles have labels \(c_k\) and \(c_j\), respectively. Conditioned on \(c_k\) and the lengths \(\ell _k(t), r_k(t), \ell _j(t), r_j(t)\), there are two possibilities for \(I_k\) and \(I_j\) to be neighbors: \(c_j=c_k+r_k+\ell _j+1\) or \(c_j=c_k-\ell _k-r_j-1\). Thus for each pair of indices the probability of the intervals being neighbors is 2/n (these events are not independent, but expectation is linear). Summing up over all pairs of indices and dividing by the number of all red actives gives the proportion of case 2 red actives among all red actives.

$$\begin{aligned} p_2(t_n+s) \le \frac{1}{\mathrm{A}_R(t_n+s)}\displaystyle \sum _{1\le i<j\le \mathrm{N}_B(t_n+s) }\frac{2}{n} = \frac{{\mathrm{N}_B(t_n+s) \atopwithdelims ()2} \cdot 2/n}{2\mathrm{N}_B(t_n+s)} \le \frac{\mathrm{N}_B(t_n+s)}{2n}.\quad \end{aligned}$$
(2.9)

Case 3. Blue active being red active. Using that the labels of blue vertices are chosen uniformly,

$$\begin{aligned} \mathbb {P}(v \text { is red and blue active}) = \mathrm{A}_R(t_n+s)/n. \end{aligned}$$
(2.10)

Case 4. Multiple red active vertices. This case is similar to Case 2. A vertex v can be red active twice if the two intervals that it belongs to are “almost neighbors”, that is, both have v as an active vertex on one of their ends (v is the only vertex separating them). Conditioning on the location of one of the intervals, the blue vertex in the other interval can be at 2 different locations, hence

$$\begin{aligned} p_4(t_n+s) = p_2(t_n+s). \end{aligned}$$
(2.11)

Case 5. Multiple blue active vertices. Again, the label of a blue vertex is chosen uniformly at random, hence the probability that the label of an active blue vertex v coincides with another active blue label is at most \(\mathrm{A}_B(t)/n\). Hence

$$\begin{aligned} p_5(t_n+s) \le \frac{1}{\mathrm{A}_B(t_n+s)} \sum _{v\in \mathcal {A}_B(t_n+s)} \mathrm{A}_B(t_n+s)/n = \mathrm{A}_B(t_n+s)/n. \end{aligned}$$
(2.12)

Corollary 2.15

Define \(\mathrm{A}_e(t)\), the effective size of the active set, as follows: we subtract from \(\mathrm{A}(t)\) the number of ghosts, already explored and multiple active labels, to get the number of different labels in \(\mathcal {A}(t)\). Then

$$\begin{aligned} \mathrm{A}_e(t_n+s)/\mathrm{A}(t_n+s) \buildrel {\mathbb {P}}\over {\longrightarrow }1. \end{aligned}$$
(2.13)

Similarly, define \(\mathrm{N}_e(t)\), the effective size of the explored set, to be equal to the size of the explored set minus the number of thinned and ghost explored vertices, i.e., vertices that were ghosts when they became explored. Then

$$\begin{aligned} \mathrm{N}_e(t_n+s)/\mathrm{N}(t_n+s) \buildrel {\mathbb {P}}\over {\longrightarrow }1. \end{aligned}$$
(2.14)

Proof

We start with the proof of formula (2.13). By the previous arguments, a lower bound can be given if we subtract the individual probabilities for red and blue vertices to be deleted (note that this is a crude bound since we do not weight it with the proportion of red and blue active labels):

$$\begin{aligned} \mathrm{A}_e(t_n+s)/\mathrm{A}(t_n+s) \ge 1- \sum _{i=1}^5 p_{i}(t_n+s) \ge 1- \frac{2\mathrm{N}(t_n+s)+ \mathrm{A}(t_n+s)}{n}, \end{aligned}$$

where we summed up the rhs of (2.8), (2.9), (2.10), (2.11), (2.12) to obtain the rhs. Now we can use that \(t_n+s=\log n/(2\lambda ) +s \), use \(\mathrm{N}(t)\) from Corollary 2.8, and Theorem 2.6 to get

$$\begin{aligned} \mathrm{A}_e(t_n+s)/\mathrm{A}(t_n+s) \ge 1- \frac{\lambda +2}{\lambda } {\mathrm e}^{\lambda s} W^{\scriptscriptstyle {(n)}}(1+o(1))/\sqrt{n} , \end{aligned}$$

which tends to 1 since \(W^{\scriptscriptstyle {(n)}}\rightarrow W\) a.s. by (2.4) and Theorem 2.6.

To prove formula (2.14), we use the first moment method to bound the number of thinned vertices. By Lemma 2.9, the probability that the \(i^\text {th}\) explored vertex is thinned is at most 2i/n. Hence, conditioned on the size of the explored set \(\mathrm{N}(t_n+s)\), the expected number of thinned vertices is:

$$\begin{aligned} \begin{array}{ll} &{}\sum _{i=1}^{\mathrm{N}(t_n+s)} \mathbb {P}(\text {the } i^\text {th} \text { explored vertex is thinned}) \le \sum _{i=1}^{\mathrm{N}(t_n+s)} 2i/n \le 2\mathrm{N}(t_n+s)^2/n \\ &{}\qquad = 2\big (W^{\scriptscriptstyle {(n)}}{\mathrm e}^{\lambda s}/\lambda \big )^2(1+o(1)) \buildrel {n\rightarrow \infty }\over {\longrightarrow }2 \big (W{\mathrm e}^{\lambda s}/\lambda \big )^2 , \end{array} \end{aligned}$$

where we used Corollary 2.8 for the asymptotic size of \(\mathrm{N}(t_n+s)\) to obtain the second line. Since this conditional expected size is of order 1, Markov’s inequality implies that the number of thinned vertices is at most of order \(\log n\) w.h.p.

We show that the number of ghost explored vertices is of the same order. Note that the proportion of ghost actives among actives tends to 0 by (2.13) (see also Lemma 2.12). Recall that the next explored vertex is a uniformly chosen active vertex, hence the proportion of ghosts becoming explored among explored vertices also tends to 0 in probability. One can make this argument rigorous by using the first moment method and the upper bound from (2.6) to get that the expected number of ghost explored vertices is also of constant order. Markov’s inequality finishes the proof again. \(\square \)

The conclusion of this section is summarized in the following corollary.

Corollary 2.16

Fix \(n\ge 1\) and \(\rho >0\). Consider the thinned CMBP with label u for the root. Then, there is a coupling of the shortest weight tree \(\mathrm{SWT}^u(t)\) in \(\mathrm{NW}_n(\rho )\) to the evolution of the thinned CMBP as long as \(t\le t_n +M\) for some arbitrary large \(M\in \mathbb R\). Further, the set of active vertices in the thinned CMBP can be approximated by the set of labeled active vertices in the original CMBP, in the sense that the proportion of active labels that have to be discarded (ghosts, multiply active labels, or labels that are already explored) among all active vertices tends to zero as \(n\rightarrow \infty \), in the sense of Corollary 2.15.

3 Connection Process

Now that we have a good approximation of the shortest weight tree (\(\mathrm{SWT}\)) started from a vertex, this provides us with a method to find the shortest weight path between two vertices. Let us give a rough sketch of this method before moving into the details. The previous section provides us with a coupling of a CMBP and the \(\mathrm{SWT}\) as long as the total number of vertices in the \(\mathrm{SWT}\) is of order \(\sqrt{n}\). To find the shortest weight path between vertices U and V, we grow the shortest weight tree from one of the vertices (\(\mathrm{SWT}^U\)) until time \(t_n\) (the size is then of order \(\sqrt{n}\)). Then, conditioned on the presence of \(\mathrm{SWT}^U(t_n)\), we grow \(\mathrm{SWT}^V(\cdot )\) and see when is the first time that these two trees intersect. The shortest weight path is determined by the first intersection of the explored sets of vertices in the two processes. However, to avoid complications with vertices that would be explored in both \(\mathrm{SWT}\)s, and since we have a good bound on the effective size of the set of active vertices, it turns out to be easier to look at the times when the first few active vertices in \(\mathrm{SWT}^U(t_n)\) become explored in \(\mathrm{SWT}^V(\cdot )\). Note that a vertex w in the active set of vertices in \(\mathrm{SWT}^U(t_n)\) is at distance \(t_n+ R_w(t_n)\) from U, where \(R_w(t_n)\) denotes the remaining lifetime, see Sect. 2. Then we have yet to minimize the total length of the paths over vertices in \(\mathcal {A}^U(t_n)\cap \mathcal {N}^V(\cdot )\). This is what we shall carry out rigorously now.

Definition 3.1

(Collision and connection) We grow \(\mathrm{SWT}^U\), the shortest weight tree of U, until time \(t_n=\frac{1}{2\lambda }\log n\), and then fix it. Then we grow \(\mathrm{SWT}^V\) until time \(t_n+M\), for some large \(M\in \mathbb {R}\), conditioned on the presence of \(\mathrm{SWT}^U(t_n)\). We say that a collision happens at time \(t_n+s\) when an active vertex in \(\mathrm{SWT}^U(t_n)\) becomes explored in \(\mathrm{SWT}^V\) at time \(t_n+s\). Denote the set of collision times by the point process \((t_n+ P_i)_{i\in \mathbb {N}}\). If a collision happens at vertex \(x_i\) at time \(t_n+P_i\), this determines a path between U and V with length \(2t_n+P_i+R^{U}_{x_i}(t_n)\), where \(R^U_{x_i}(t_n)\) is the remaining lifetime of \(x_i\) in \(\mathrm{SWT}^U(t_n)\). Then the length of the shortest weight path is given by

$$\begin{aligned} \min _{i\in \mathbb {N}} \left( 2t_n+P_i+R^{U}_{x_i}(t_n) \right) \end{aligned}$$

among all collision events.

We can see that when growing \(\mathrm{SWT}^V\) after \(\mathrm{SWT}^U\), the labels belonging to explored vertices in \(\mathrm{SWT}^U\) cannot be used again, leading to some extra thinned vertices in \(\mathrm{SWT}^V\). We claim that the number of additional ghosts is not too large. (Since we would like to get a bound on the effective size of the active set of \(\mathrm{SWT}^V(t_n+s)\), we must delete the descendants of vertices that formed earlier collision events.)

Claim 3.2

Consider the case of growing \(\mathrm{SWT}^V\) after \(\mathrm{SWT}^U(t_n)\) on the same graph \(\mathrm{NW}_n(\rho )\). Then the effective size of the active and explored set in \(\mathrm{SWT}^V\) for times \(t=t_n+s\) is asymptotically the same as the size of the active and explored set respectively, that is, the statements of Corollary 2.15 remain valid for \(\mathrm{SWT}^V\) as well.

Note that for the active set, it suffices to bound the proportion of ghosts, as the error terms caused by multiple active, or active-explored vertices are not increased by the presence of \(\mathrm{SWT}^U\).

Proof

We revisit the computations in the proof of Lemma 2.12, using (2.5). Recall that the proportion of ghosts depends both on the thinning probability of the \(i^\text {th}\) explored vertex and on the probability that it is an ancestor of a uniformly chosen active vertex.

The arguments with the ancestral line (see Sect. 2.4.2) remain valid without any modification; we only have to examine the change in the thinning probability.

In case the \(i^\text {th}\) explored vertex is blue, its label is chosen uniformly, thus the probability that this label coincides with a previously chosen label equals \(\left( \mathrm{N}^U(t_n) + i-1\right) /n\). In case the \(i^\text {th}\) explored vertex in \(\mathrm{SWT}^V\) is red, we can use the same idea as before: it has the label of an already explored vertex if and only if two intervals grow into each other at the \(i^\text {th}\) step. We now consider the union of the intervals in \(\mathrm{SWT}^U\) and \(\mathrm{SWT}^V\). Conditioned on the interval that grows, for any other interval the probability that these two grow into each other is \(2/n\). The number of intervals is at most the total number of blue explored vertices, \(\mathrm{N}_B^U(t_n)+ \mathrm{N}_B^V(T_i)\). Hence the probability that the labeling fails at the \(i^\text {th}\) step of \(\mathrm{SWT}^V\), if this is a red vertex, is at most

$$\begin{aligned} \sum _{k=1}^{\mathrm{N}^U_B(t_n)+\mathrm{N}_B^V(T_i)} \frac{2}{n} = \frac{2\left( \mathrm{N}^U_B(t_n) + \mathrm{N}_B^V(T_i)\right) }{n} \le \frac{2\left( \mathrm{N}^U(t_n) + i \right) }{n}. \end{aligned}$$

Since the color of the \(i^\text {th}\) explored vertex is either blue or red, we get

$$\begin{aligned} \mathbb {P}\big (\text {labeling fails in } \mathrm{SWT}^V \text { at step } i\big ) \le 2 \left( \mathrm{N}^U(t_n) + i\right) /n. \end{aligned}$$
(3.1)

For the probability of a uniformly chosen active vertex in \(\mathrm{SWT}^V\) being a ghost, similarly to (2.5), we have

$$\begin{aligned} \frac{\mathrm{A}_G^V(t_n+s)}{\mathrm{A}^V(t_n+s)} = \sum _{i=1}^{\mathrm{N}^V(t_n+s)} \frac{D_i}{S_i} \cdot \frac{2(\mathrm{N}^U(t_n) + i )}{n} =\sum _{i=1}^{\mathrm{N}^V(t_n+s)} \frac{D_i}{S_i} \cdot \frac{2i}{n} + \sum _{i=1}^{\mathrm{N}^V(t_n+s)} \frac{D_i}{S_i} \cdot \frac{2}{n} \mathrm{N}^U(t_n).\qquad \quad \end{aligned}$$
(3.2)

By Lemma 2.12, the first sum on the rhs tends to 0 as n tends to \(\infty \). For the second sum, let us recall the a.s. finite K from Lemma 2.11, and split the sum again. We use Corollary 2.8, (2.4) and \(t_n=\log n/(2\lambda )\) to get

$$\begin{aligned} \frac{2}{n} \mathrm{N}^U(t_n) \sum _{i=1}^{K} \frac{D_i}{S_i} \le \frac{2}{n} \mathrm{N}^U(t_n) \sum _{i=1}^{K} 1 = K \frac{2}{\lambda \sqrt{n}} W_U^{\scriptscriptstyle {(n)}}(1+o(1)) \buildrel {n\rightarrow \infty }\over {\longrightarrow }0. \end{aligned}$$

For the second part of the sum, by Lemma 2.11 again,

$$\begin{aligned} \Sigma _2 = \frac{2}{n} \mathrm{N}^U(t_n) \sum _{i=K+1}^{\mathrm{N}^V(t_n+s)} D_i/S_i = \frac{2}{n} \mathrm{N}^U(t_n) \sum _{i=K+1}^{\mathrm{N}^V(t_n+s)} \frac{D_i}{i\lambda (1+o(i^{-1/2+\varepsilon }))} \end{aligned}$$

Using \(\mathbb {E}[D_i] \le \lambda +2\), we bound the expected value of the sum \(\Sigma _3:=\sum _{i=K+1}^{\mathrm{N}^V(t_n+s)} \frac{D_i}{i\lambda (1+o(i^{-1/2 + \varepsilon }))}\) with the tower rule:

$$\begin{aligned} \mathbb {E}\left[ \Sigma _3\right] \le \mathbb {E}\left[ (1+o(1)) \sum _{i=K+1}^{\mathrm{N}^V(t_n+s)} \frac{\lambda +2}{i\lambda } \right] \!\le \frac{\lambda +2}{\lambda } \mathbb {E}\Big [\log (\mathrm{N}^V(t_n+s))\Big ] (1+o(1)) \end{aligned}$$

Since the logarithm is concave, we use Jensen’s inequality:

$$\begin{aligned} \mathbb {E}[\Sigma _3] \le \frac{\lambda +2}{\lambda } \log \mathbb {E}\Big [\mathrm{N}^V(t_n+s)\Big ] (1+o(1)) \end{aligned}$$
(3.3)

From Theorem 2.5 it follows that \(\mathbb {E}\big [\mathrm{N}^V(t_n+s)\big ]=(0,1) \exp \{\mathrm{Q}\!\cdot \!(t_n+s)\} \mathbf 1\), where \(\mathbf 1=(1,1)^T\) is a column vector. Using the Jordan decomposition of the matrix \(\mathrm{Q}\) and exponentiating, elementary matrix analysis yields that the leading order is determined by the main eigenvalue \(\lambda \), and hence \((0,1)\exp \{\mathrm{Q}(t_n+s)\}\mathbf 1\le {\mathrm e}^{\lambda (t_n+s)} C_1\) for some constant \(C_1> 0\). Let us then use this bound with \(t_n+s=\log n/(2\lambda ) +s\) to give an upper bound on the rhs of (3.3), and set \(C_2:=2C_1(\lambda +2)\). Then Markov’s inequality yields

$$\begin{aligned} \mathbb {P}\left( \Sigma _3 \ge C_2 (\log n)^2 \right) \le 1/\log n \buildrel {n\rightarrow \infty }\over {\longrightarrow }0. \end{aligned}$$
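To spell out the chain of bounds behind this application of Markov’s inequality, as a sketch of the computation: plugging the matrix exponential estimate into (3.3) and using \(t_n+s=\log n/(2\lambda )+s\) with s fixed,

$$\begin{aligned} \mathbb {E}[\Sigma _3] \le \frac{\lambda +2}{\lambda }\big (\log C_1 + \lambda (t_n+s)\big )(1+o(1)) = \frac{\lambda +2}{2\lambda }\,\log n\,(1+o(1)), \end{aligned}$$

so that \(\mathbb {P}\big (\Sigma _3 \ge C_2 (\log n)^2\big )\le \mathbb {E}[\Sigma _3]/\big (C_2(\log n)^2\big )\le 1/\log n\) for all n large enough, provided \(C_2\ge \tfrac{\lambda +2}{2\lambda }\) (which we may ensure by enlarging \(C_1\) if necessary).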

Then on the w.h.p. event \(\{\Sigma _3 \le C_2 (\log n)^2 \}\) for \(\Sigma _2 = \frac{2}{n} \mathrm{N}^U(t_n) \Sigma _3\), by Corollary 2.8, (2.4) again,

$$\begin{aligned} \Sigma _2 \le \frac{2}{n} \mathrm{N}^U(t_n) C_2(\log n)^2 \le 2C_2(\log n)^2 \frac{W_U^{\scriptscriptstyle {(n)}} (1+o(1))}{\lambda \sqrt{n}} \buildrel {n\rightarrow \infty }\over {\longrightarrow }0. \end{aligned}$$

Since we showed that the proportion of ghost actives tends to 0, the proportion of ghost explored vertices also tends to 0, by similar arguments as in the proof of Corollary 2.15. For the number of thinned vertices, we calculate the expected value as before, with thinning probability as in (3.1). Conditioned on the sizes of both explored sets \(\mathrm{N}^U(t_n)\) and \(\mathrm{N}^V(t_n+s)\), the expected number of thinned vertices in \(\mathrm{SWT}^V\) at time \(t_n+s\) is given by

$$\begin{aligned} \begin{array}{ll} &{}\sum _{i=1}^{\mathrm{N}^V(t_n+s)} \mathbb {P}\big (i^\text {th}\text { explored in } \mathrm{SWT}^V \text { is thinned}\big ) \le \sum _{i=1}^{\mathrm{N}^V(t_n+s)} 2 \big ( \mathrm{N}^U(t_n) + i\big )/n \\ &{}\quad \le \frac{2}{n} \left( (\mathrm{N}^V(t_n+s))^2 + \mathrm{N}^V(t_n+s)\mathrm{N}^U(t_n) \right) \! = \frac{2}{\lambda ^2} \left( {\mathrm e}^{2\lambda s}(W_V^{\scriptscriptstyle {(n)}})^2 + {\mathrm e}^{\lambda s}W_V^{\scriptscriptstyle {(n)}}W_U^{\scriptscriptstyle {(n)}}\right) (1+o(1)), \end{array} \end{aligned}$$

that is, the expected number of thinned vertices is of constant order. We finish the proof by applying Markov’s inequality to show that this number is at most of order \(\log n\) w.h.p. \(\square \)

Fig. 3 We indicated the growing neighbourhoods of the two uniformly picked vertices in the exploration process with purple and yellow colors, respectively. The letters ‘C’ on the picture show that a collision event happens at the given vertex. Notice that all these collision events have a remaining edge-length yet to be covered in the exploration process of U, i.e., the vertex is active in \(\mathrm{SWT}^U\) and explored in \(\mathrm{SWT}^V\) (Color figure online)

3.1 The Poisson Point Process of Collisions

Recall that we say that a collision event happens at time \(t_n+s\) when an active vertex in \(\mathrm{SWT}^U(t_n)\) becomes explored in \(\mathrm{SWT}^V\) at time \(t_n+s\). First we show that for each pair of colours, with respect to the parameter s in \(t_n+s\), the set of points \((P_i)_{i\in \mathbb {N}}\) forms a non-homogeneous Poisson point process (PPP) on \(\mathbb {R}\), and that these PPPs are asymptotically independent. We consider the intensity measure \(\mu (\mathrm{d}t)\), \(t \in \mathbb {R}\), as the derivative of the mean function M(t) (the expected number of points up to time t). To determine the intensity measure of the collision process, we consider the four collision point processes, one for each possible pair of colours. None of these PPPs is empty: since the labels of blue vertices are chosen uniformly, blue can meet any colour, and considering the growing set of intervals, we see that red can meet red as well (see Fig. 3).

Let us introduce the notation for \(q,r \in \{R,B\}\), \(s\in \mathbb {R}\)

$$\begin{aligned} \begin{array}{ll} \mathcal {C}_{q,r}(s)&{}:=\mathcal {N}^V_{q}(t_n+s) \cap \mathcal {A}^U_{r}(t_n), \quad \mathrm{C}_{q,r}(s):=|\mathcal {C}_{q,r}(s)|,\\ \mathcal {C}(s)&{}:=\bigcup \nolimits _{q,r \in \{R,B\}} \mathcal {C}_{q,r}(s), \quad \mathrm{C}(s):=|\mathcal {C}(s)| \end{array} \end{aligned}$$
(3.4)

(Note that e.g. \(\mathcal {C}_{R,B}(s)\) denotes the set of red explored labels in \(\mathrm{SWT}^V\) that are blue active in \(\mathrm{SWT}^U\).) The corresponding asymptotic intensity measures are denoted by \(\mu _{q,r}(s)\) and total intensity measure by \(\mu (s)\). Our goal for this section is to prove

Theorem 3.3

(Total intensity measure of the collision PPP) The point processes \(\mathcal {C}_{q,r}(s), s\in \mathbb {R}, \ q,r=R,B\) of collision events are asymptotically independent Poisson point processes, with intensity measures on \(\mathrm{NW}_n(\rho )\) given by

$$\begin{aligned} \begin{array}{ll} \mu _{B,B}(\mathrm ds) &{}:= \pi _B^2 W_V^{\scriptscriptstyle {(n)}}W_U^{\scriptscriptstyle {(n)}}{\mathrm e}^{\lambda s} \mathrm{d}s (1+o(1)), \;\; \\ \mu _{R,B}(\mathrm ds) &{}:= \pi _R \pi _B W_V^{\scriptscriptstyle {(n)}}W_U^{\scriptscriptstyle {(n)}}{\mathrm e}^{\lambda s} \mathrm{d}s (1+o(1)), \;\; \\ \mu _{B,R}(\mathrm ds) &{}:= \pi _B \pi _R W_V^{\scriptscriptstyle {(n)}}W_U^{\scriptscriptstyle {(n)}}{\mathrm e}^{\lambda s} \mathrm{d}s (1+o(1)), \\ \mu _{R,R}(\mathrm ds) &{}:= \frac{1}{2} \pi _R^2 W_V^{\scriptscriptstyle {(n)}}W_U^{\scriptscriptstyle {(n)}}{\mathrm e}^{\lambda s} \mathrm{d}s (1+o(1)). \end{array} \end{aligned}$$
(3.5)

where the \(1+o(1)\) factor only depends on n. The total intensity measure of the collision Poisson point process is then

$$\begin{aligned} \mu (\mathrm ds) = W_U^{\scriptscriptstyle {(n)}}W_V^{\scriptscriptstyle {(n)}}{\mathrm e}^{\lambda s}\! \left( 1 - \pi _R^2/2\right) \! \mathrm{d}s (1+o(1)). \end{aligned}$$
(3.6)
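For the reader’s convenience, the total intensity (3.6) is simply the sum of the four measures in (3.5): since \(\pi _R+\pi _B=1\), the coefficients combine as

$$\begin{aligned} \pi _B^2 + \pi _R\pi _B + \pi _B\pi _R + \tfrac{1}{2}\pi _R^2 = (\pi _B+\pi _R)^2 - \tfrac{1}{2}\pi _R^2 = 1 - \tfrac{1}{2}\pi _R^2. \end{aligned}$$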

It is not hard to show (e.g. using the Borel–Cantelli lemma) that these PPPs have only finitely many points on \((-\infty , 0)\), hence indexing the points by \(i\in \mathbb {N}\) is justified. Before we proceed to the proof, we take a small analytic excursion.

Proposition 3.4

(PPP approximation of ‘increasing’ binomial distributions) Let us consider a time-dependent multinomial distribution with \(C \sqrt{n}\) trials, where the success parameters for type 1 and type 2 successes are \(R_1(s)=f_1(s)/\sqrt{n}\) and \(R_2(s)=f_2(s)/\sqrt{n}\), respectively, for increasing functions \(f_1(\cdot )\) and \(f_2(\cdot )\). Let \((N_1(s), N_2(s))\) denote the number of type 1 and type 2 successes (using the success probabilities at time s). Then the pair of processes \((N_1(s), N_2(s))_{s>0}\), as n goes to infinity, converges in distribution to a pair of independent Poisson point processes with mean functions \(C f_1(s)\) and \(C f_2(s)\). Shortly,

$$\begin{aligned} \mathrm{Multinomial}(C \sqrt{n}, f_1(s)/\sqrt{n}, f_2(s)/\sqrt{n}) \buildrel {d}\over {\longrightarrow }PPP(C f_1(s))\times PPP(C f_2(s)).\qquad \end{aligned}$$
(3.7)

In particular, the processes of type 1 and type 2 successes are asymptotically independent. The statement remains valid when type 1 and type 2 successes can occur at the same time, with probability \(R_3(s)=o(1/\sqrt{n})\) for all s.

Remark 3.5

(Analogue with an urn model) Looking at Proposition 3.4 in a simpler (but more restrictive) way, we can think of an urn model with n balls, where balls are gradually painted green and purple such that there are \(n R_1(s)=f_1(s)\sqrt{n}\) many green and \(n R_2(s)=f_2(s)\sqrt{n}\) many purple balls at time s. We allow a few balls to be both green and purple, with their number satisfying \(n R_3(s)=o(\sqrt{n})\) for all s. We draw \(C\sqrt{n}\) times with replacement, then \((N_1(s), N_2(s))\) denotes the number of drawn balls that had been painted green and purple by time s, respectively, and this converges to a two-dimensional PPP with the above mean.

Proof of Proposition 3.4

By [27, Theorem 4.7], it is enough to show that, for any rectangle \([a,b]\times [c,d]\), the numbers of type 1 successes in the interval \([a,b]\) and of type 2 successes in the interval \([c,d]\) satisfy

$$\begin{aligned}&\lim _{n\rightarrow \infty }\mathbb {P}( N_1(b)-N_1(a) =0, N_2(d)- N_2(c)=0)\nonumber \\&\quad = \exp \left\{ - C (f_1(b)-f_1(a)) - C (f_2(d)-f_2(c))\right\} . \end{aligned}$$
(3.8)

In the special case where the double success has 0 probability (there are no double-painted balls), \((N_1(b)-N_1(a), N_2(d)-N_2(c))\) follows a multinomial distribution with parameters \(C \sqrt{n}\) and \((f_1(b)-f_1(a))/\sqrt{n}, (f_2(d)-f_2(c))/\sqrt{n}\).

$$\begin{aligned} \mathbb {P}\big ( N_1(b)-N_1(a) =0, N_2(d)-N_2(c)=0 \big ) = \left( 1- \frac{f_1(b)-f_1(a)+f_2(d)-f_2(c)}{\sqrt{n}} \right) ^{C \sqrt{n}}.\nonumber \\ \end{aligned}$$
(3.9)

Note that the right hand side converges to the rhs of (3.8). This finishes the first statement of the proposition.

In case a success can be both type 1 and type 2 at the same time (with probability \(R_3(s)\)), by inclusion–exclusion these cases are subtracted twice on the intersection of \([a,b]\) and \([c,d]\) in formula (3.9). When \([a,b] \cap [c,d]=\emptyset \), (3.9) remains valid. By the symmetric role of type 1 and type 2 successes, we may assume \(a\le c\); then \([a,b] \cap [c,d]\ne \emptyset \) implies \(c<b\). We consider the cases \(d> b\) (when \([a,b]\cap [c,d]=[c,b]\)) and \(d\le b\) (when \([c,d]\subset [a,b]\)) separately.

Case 1: \(a\le c<b<d\). Note that \(\{N_1(b)-N_1(a)=0, N_2(d)-N_2(c)=0\}\) means that we have no type 1 success on \([a,c]\), no success of any type on \([c,b]\) and no type 2 success on \([b,d]\).

$$\begin{aligned} \begin{array}{ll} &{}\mathbb {P}\big (N_1(c)-N_1(a)=0, N_1(b)-N_1(c)=0, N_2(b)-N_2(c)=0, N_2(d)-N_2(b)=0\big ) \\ &{}\quad = \left( 1 - \frac{f_1(c)-f_1(a)}{\sqrt{n}}-\Big ( \frac{f_1(b)-f_1(c)+f_2(b)-f_2(c)}{\sqrt{n}}-\big (R_3(b)-R_3(c)\big ) \Big ) \right. \\ &{}\quad \quad - \left. \frac{f_2(d)-f_2(b)}{\sqrt{n}} \right) ^{C\sqrt{n}} \end{array} \end{aligned}$$
(3.10)

Using \(R_3(b)-R_3(c)=o\big (1/\sqrt{n}\big )\) and \(1-x={\mathrm e}^{-x}+O(x^2)\) as \(x\rightarrow 0\), the right hand side of (3.10) becomes

$$\begin{aligned} \left( 1- \frac{f_1(b)-f_1(a)+f_2(d)-f_2(c)+o(1)}{\sqrt{n}} \right) ^{C \sqrt{n}} \rightarrow {\mathrm e}^{-\big ( f_1(b)-f_1(a)+f_2(d)-f_2(c) \big )\cdot C}, \end{aligned}$$

as n tends to infinity.

Case 2: \(a\le c<d\le b\). Note that \(\{N_1(b)-N_1(a)=0, N_2(d)-N_2(c)=0\}\) means that there is no type 1 success on \([a,c]\) and \([d,b]\), and no success of any kind on \([c,d]\). By computations similar to those in Case 1, and using that \(R_3(d)-R_3(c)=o\big (1/\sqrt{n}\big )\) as well,

$$\begin{aligned} \begin{array}{ll} &{}\mathbb {P}\big (N_1(c)-N_1(a)=0, N_1(d)-N_1(c)=0, N_2(d)-N_2(c)=0, N_1(b)-N_1(d)=0\big ) \\ &{}\quad \quad = \left( 1- \frac{f_1(b)-f_1(a)+f_2(d)-f_2(c)+o(1)}{\sqrt{n}} \right) ^{C \sqrt{n}}, \end{array} \end{aligned}$$

and we have already shown that this converges to the right hand side of (3.8). \(\square \)

Proof of Theorem 3.3

In Corollary 2.15 and in Claim 3.2 we showed that the effective sizes of the active and explored sets at times \(t_n+s\) for \(s\in \mathbb {R}\) are asymptotically the same as the numbers of active and explored individuals in the CMBP, respectively, so we can use the asymptotics of \((\mathrm{A}_R(t), \mathrm{A}_B(t))\) from Theorem 2.6 and the asymptotics of \((\mathrm{N}_R(t),\mathrm{N}_B(t))\) from Corollary 2.8. For the coming paragraphs, for \(s\in \mathbb {R}\) and an event E we define the notation

$$\begin{aligned} \mathbb {P}_s(E):=\mathbb {P}(E| \mathrm{A}^U_{B}(t_n), \mathrm{A}^U_{R}(t_n), \mathrm{N}^V_B(t_n+s), \mathrm{N}^V_R(t_n+s)). \end{aligned}$$


Blue–blue and red–blue collision By the definition of the sets \(\mathcal {C}_{B,B}(s), \mathcal {C}_{R,B}(s)\), we can use the following indicator representation:

$$\begin{aligned} \mathrm{C}_{B,B}(s) = \sum _{x \in \mathcal {A}^U_{B}(t_n)} {\mathbbm {1}}\{x \in \mathcal {N}^V_{B}(t_n+s)\}, \quad \mathrm{C}_{R,B}(s) =\sum _{x \in \mathcal {A}^U_{B}(t_n)}{\mathbbm {1}}\{x \in \mathcal {N}^V_{R}(t_n+s)\}\nonumber \\ \end{aligned}$$
(3.11)

Recall that the labels in \(\mathcal {A}^U_B(t_n)\) are chosen independently and uniformly in [n]. As a result, the pair \(\mathrm{C}_{B,B}(s), \mathrm{C}_{R,B}(s)\) has multinomial distribution with parameters \(\mathrm{A}^U_{B}(t_n)\) for the number of draws, \(\mathrm{N}^V_{B}(t_n+s)/n, \mathrm{N}^V_{R}(t_n+s)/n\) for the two success parameters, and double success has 0 probability, since double-explored vertices are thinned.

Note that \({\mathbbm {1}}\{x \in \mathcal {N}^V_{q}(t_n+s_1)\} \le {\mathbbm {1}}\{x \in \mathcal {N}^V_{q}(t_n+s_2)\}\) when \(s_1\le s_2\) for \(q=R,B\), hence this description fits the conditions of Proposition 3.4, since for \(q=R,B\)

$$\begin{aligned} \mathrm{N}^V_{q}(t_n+s)={\mathrm e}^{\lambda s} \pi _q W_V^{\scriptscriptstyle {(n)}}\sqrt{n}/\lambda (1+o(1)), \quad \mathrm{A}^U_B(t_n)=\pi _B W_U^{\scriptscriptstyle {(n)}}\sqrt{n} (1+o(1)), \end{aligned}$$

where we have used the asymptotic results for \(\mathbf A(t), \mathbf N(t)\) from Theorem 2.6 and Corollary 2.8 and the definition of \(W^{(n)}\) in (2.4). Note that the \(1+o(1)\) factor only depends on n and comes from the error of possible deviation from the stationary distribution \((\pi _R, \pi _B)\) in the approximations. A direct application of Proposition 3.4 shows that \((\mathrm{C}_{B,B}(s), \mathrm{C}_{R,B}(s))\) converges to two independent Poisson processes, with means

$$\begin{aligned} \begin{array}{ll} \mathbb {E}_s[\mathrm{C}_{B,B}(s)] &{}= \mathrm{N}^V_{B}(t_n+s) \cdot \mathrm{A}^U_B(t_n)/n = {\mathrm e}^{\lambda s} \pi _B^2 W_V^{\scriptscriptstyle {(n)}}W_U^{\scriptscriptstyle {(n)}}/\lambda (1+o(1)),\\ \mathbb {E}_s[\mathrm{C}_{R,B}(s)] &{}= \mathrm{N}^V_{R}(t_n+s) \cdot \mathrm{A}^U_B(t_n)/n = {\mathrm e}^{\lambda s} \pi _B \pi _R W_V^{\scriptscriptstyle {(n)}}W_U^{\scriptscriptstyle {(n)}}/ \lambda (1+o(1)). \end{array} \end{aligned}$$

Differentiating with respect to s yields the required result for the intensity measures (rate functions).

Blue–red collision We need a slightly longer argument to get the independence of this process from the other processes, i.e., that \(\mathrm{C}_{B,R}(s)\) is asymptotically independent of \(\mathrm{C}_{R,B}(s)\) and \(\mathrm{C}_{B,B}(s)\), and later also of \(\mathrm{C}_{R,R}(s)\). For this, let us recall that we stopped the evolution of \(\mathrm{SWT}^U\) at time \(t_n\). Hence, we consider \(\mathrm{SWT}^U(t_n)\) as a fixed set of intervals, \(\{I_k, k=1,...,\mathrm{N}^U_B(t_n)\}\) (some of them might have already merged by time \(t_n\)). Again, we write

$$\begin{aligned} \mathrm{C}_{B,R}(s) = \sum _{x \in \mathcal {A}^U_{R}(t_n)} {\mathbbm {1}}\{x \in \mathcal {N}^V_{B}(t_n+s)\}. \end{aligned}$$
(3.12)

Consider an individual \(x \in \mathcal {A}^U_{R}(t_n)\). Then, let us write \(I_x\) for the explored interval of x, \(c_x\) for the location of its center – an already explored blue individual in \(\mathrm{SWT}^U(t_n)\) – and \(x'\) for the other active red individual at the other end of \(I_x\). Let us write \(\ell _x, r_x\) for the number of explored red vertices to the left and to the right of \(c_x\) in \(I_x\), i.e., x is at location \(c_x-\ell _x-1\) and \(x'\) is at \(c_x+r_x+1\) or the other way round. Note that \(c_x\), an explored blue label, was chosen uniformly, and as a result, the marginal distributions of the labels of \(x, x', c_x\) are all uniform. We can rewrite the sum in (3.12) as

$$\begin{aligned} \mathrm{C}_{B,R}(s) = \sum _{c_x \in \mathcal {N}^U_{B}(t_n)} {\mathbbm {1}}\{c_x \in \mathcal {N}^V_{B}(t_n+s)-r_x-1\}+{\mathbbm {1}}\{c_x \in \mathcal {N}^V_{B}(t_n+s)+\ell _x+1\},\qquad \end{aligned}$$
(3.13)

where \(\mathcal {N}^V_{B}(t_n+s)+a\) stands for shifting the whole set \(\mathcal {N}^V_{B}(t_n+s)\) by a modulo n. We aim to show that this converges to a Poisson point process with mean \(\mathrm{N}^U_B(t_n) \cdot 2 \mathrm{N}^V_{B}(t_n+s)/n\) by using Proposition 3.4. Indeed, consider the event \(\{c_x \in \mathcal {N}^V_{B}(t_n+s)-r_x-1\}\) as a type 1 success, and the event \(\{c_x \in \mathcal {N}^V_{B}(t_n+s)+\ell _x+1\}\) as a type 2 success. These both have probability \(\mathrm{N}^V_{B}(t_n+s)/n\), since \(c_x\) is a blue, hence uniform, label, and a shift does not change the size of a set. In this case, a double success occurs when \(c_x \in (\mathcal {N}^V_{B}(t_n+s)-r_x-1)\cap (\mathcal {N}^V_{B}(t_n+s)+\ell _x+1)\), which is equivalent to \(c_x+r_x+1, c_x-\ell _x-1 \in \mathcal {N}^V_{B}(t_n+s)\). Recall \(\mathrm{N}^V_{B,e}(t_n+s)\), the ‘effective size’, i.e., the number of non-thinned and non-ghost labels in \(\mathcal {N}^V_{B}(t_n+s)\), which is asymptotically a random constant times \(\sqrt{n}\), see Claim 3.2 and Corollary 2.8. With this notation, for each fixed \(c_x\), the probability of a double success is the same as the probability that a uniform set of size \(\mathrm{N}^V_{B,e}(t_n+s)\) contains two fixed labels, and this probability is \(\left( {\begin{array}{c}n-2\\ \mathrm{N}^V_{B,e}(t_n+s)-2\end{array}}\right) \big / \left( {\begin{array}{c}n\\ \mathrm{N}^V_{B,e}(t_n+s)\end{array}}\right) \), a random constant times \(1/n\), which is clearly \(o\big (1/\sqrt{n}\big )\). Hence, by Proposition 3.4, \(\mathrm{C}_{B,R}(s)\) is the union of two asymptotically independent Poisson processes, which both have mean \(\mathrm{N}^U_B(t_n) \cdot \mathrm{N}^V_{B}(t_n+s)/n\).
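As a quick check of the last ratio, it can be evaluated explicitly: writing \(m:=\mathrm{N}^V_{B,e}(t_n+s)\),

$$\begin{aligned} \left( {\begin{array}{c}n-2\\ m-2\end{array}}\right) \Big / \left( {\begin{array}{c}n\\ m\end{array}}\right) = \frac{m(m-1)}{n(n-1)}, \end{aligned}$$

and since \(m\) is a tight random multiple of \(\sqrt{n}\), this is indeed of order \(1/n=o\big (1/\sqrt{n}\big )\).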

Note that \(2\mathrm{N}^U_B(t_n) =\mathrm{A}^U_R(t_n)\), since each interval contains 2 active red vertices, one on each end. Hence

$$\begin{aligned} \mathbb {E}_s[\mathrm{C}_{B,R}(s)] = \mathrm{N}^V_{B}(t_n+s) \cdot \mathrm{A}^U_{R}(t_n)/n = {\mathrm e}^{\lambda s} \pi _B \pi _R W_V^{\scriptscriptstyle {(n)}}W_U^{\scriptscriptstyle {(n)}}/\lambda (1+o(1)), \end{aligned}$$
(3.14)

by Theorem 2.6 and Corollary 2.8, then differentiation yields the result for the intensity measure.

The advantage of the form in (3.13) is that it reveals the independence of the processes \(\mathrm{C}_{B,B}(s), \mathrm{C}_{R,B}(s), \mathrm{C}_{B,R}(s)\): in the first two cases, the draws were indexed by the active blue individuals, while here they are indexed by the explored blue individuals, which are independent and uniform, hence the dependence comes from shared indices only. Both index sets are uniform and of order \(\sqrt{n}\); it is easy to see that the expected size of their intersection is of constant order, hence by Markov’s inequality it is at most of order \(\log n\) w.h.p. As a result, the number of shared indices is \(o(\sqrt{n})\), and we can use a modification of Proposition 3.4 to see that \(\mathrm{C}_{B,R}(s)\) is asymptotically independent of \(\mathrm{C}_{B,B}(s)\) and \(\mathrm{C}_{R,B}(s)\).
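The expected size of this intersection can be computed directly. As a minimal sketch: if \(\mathcal {I}_1, \mathcal {I}_2\subseteq [n]\) are independent sets of uniform labels of sizes \(a\) and \(b\), then

$$\begin{aligned} \mathbb {E}\big [|\mathcal {I}_1\cap \mathcal {I}_2|\big ] = \sum _{v\in [n]} \mathbb {P}(v\in \mathcal {I}_1)\,\mathbb {P}(v\in \mathcal {I}_2) = n\cdot \frac{a}{n}\cdot \frac{b}{n} = \frac{ab}{n}, \end{aligned}$$

which is of constant order when \(a,b=\Theta (\sqrt{n})\), as used above.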

Red–red collision We write again

$$\begin{aligned} \mathrm{C}_{R,R}(s) = \sum _{x \in \mathcal {A}^U_{R}(t_n)} {\mathbbm {1}}\{x \in \mathcal {N}^V_{R}(t_n+s)\}. \end{aligned}$$
(3.15)

Here we aim for a similar description as that in (3.13). Note that the argument that we used in the previous paragraph is ‘almost valid’, in the sense that we can describe the location of \(x\in \mathcal {A}^U_R(t_n)\) by describing the location of \(c_x \in \mathcal {N}^U_B(t_n)\). The extra problem we face here is the following issue: the right end of a red interval can only merge with the left end of another interval (and not with the right end). As a result, simply changing the index \(\mathcal {N}^V_{B}(t_n+s)\) to \(\mathcal {N}^V_{R}(t_n+s)\) in formula (3.13) is not quite enough. Let us quickly introduce \(\mathcal {N}^V_{R, \text {left}}(t_n+s)\) and \(\mathcal {N}^V_{R, \text {right}}(t_n+s)\) for the set of left-type red and right-type red individuals in \(\mathrm{SWT}^V(t_n+s)\), respectively. Then, we can write

$$\begin{aligned} \mathrm{C}_{R,R}(s) = \sum _{c_x \in \mathcal {N}^U_{B}(t_n)} {\mathbbm {1}}\{c_x \in \mathcal {N}^V_{R, \text {left}}(t_n+s)-r_x-1\}+{\mathbbm {1}}\{c_x \in \mathcal {N}^V_{R, \text {right}}(t_n+s)+\ell _x+1\},\nonumber \\ \end{aligned}$$
(3.16)

that is, we shift the set of left-type red particles to the left by \(r_x+1\) to get the possible location of \(c_x\); so that the right-type active individual in \(I_x\) merges with a left-type explored individual, that is, with the left side of an interval in \(\mathrm{SWT}^V(t_n+s)\). Similarly, we shift the set of right-type red particles to the right by \(\ell _x+1\) to get the possible location of \(c_x\) for a collision on the other side.

We remark on the following issue: when two intervals \(J_i(s)\in \mathrm{SWT}^V(t_n+s)\) and \(I_k\in \mathrm{SWT}^U(t_n)\) collide at time s, in principle we should stop the evolution of \(J_i(s)\), that is, for all \(s'>s\) we should have \(J_i(s')\equiv J_i(s)\). But this would cause computational difficulties later, since then we have to condition on all the earlier collisions to be able to calculate the intensity of the next one. Hence, it is easier to ignore this effect and do the following approximation on the number of red-red collisions: we let \(J_i(s)\) grow further and it might collide with more vertices inside \(I_k\). The error terms caused by such events are negligible, since such events have been already treated when we investigated the ‘extra’ thinning of \(\mathrm{SWT}^V\) imposed by \(\mathrm{SWT}^U(t_n)\), that had a negligible contribution in the sense of Claim 3.2.

We aim to use Proposition 3.4 to show that \(\mathrm{C}_{R,R}(s)\) converges to a Poisson point process which is asymptotically independent of the other three PPPs. To show that it converges to a PPP, let \(\{c_x \in \mathcal {N}^V_{R, \text {left}}(t_n+s)-r_x-1\}\) correspond to type 1 and \(\{c_x \in \mathcal {N}^V_{R, \text {right}}(t_n+s)+\ell _x+1\}\) to type 2 successes, respectively. Since \(c_x\) is a blue label that is chosen uniformly, the success probabilities are \( \mathrm{N}^V_{R, \text {left}}(t_n+s)/n\) and \(\mathrm{N}^V_{R, \text {right}}(t_n+s)/n\). In this case, double success is the event

$$\begin{aligned} c_x \in (\mathcal {N}^V_{R, \text {left}}(t_n+s)-r_x-1)\cap (\mathcal {N}^V_{R, \text {right}}(t_n+s)+\ell _x+1). \end{aligned}$$

The probability of this event, again by the fact that \(c_x\) is a uniformly chosen label, is

$$\begin{aligned} \frac{1}{n} \big | (\mathcal {N}^V_{R, \text {left}}(t_n+s)-r_x-1)\cap (\mathcal {N}^V_{R, \text {right}}(t_n+s)+\ell _x+1) \big |. \end{aligned}$$

Note that Proposition 3.4 can be applied once we show that the size of the intersection in the above formula is \(o(\sqrt{n})\). Also note that at times \(t_n+s\), the cumulative length of all the intervals is of order \(\sqrt{n}\), which implies that \(r_x\) and \(\ell _x\) are at most of order \(\sqrt{n}\); hence, on a cycle of length n, the left end of an interval shifted to the left and the right end of the same interval shifted to the right cannot intersect. Now we only have to consider the left side of an interval shifted left intersecting the right side of another interval shifted right. Note that their centers are blue labels that were chosen uniformly in [n], hence their shifted positions are also uniform. The number of vertices in the intersection of the shifted sets is thus stochastically dominated by the number of thinned vertices, which was handled and proven to be \(o(\sqrt{n})\), see the proof of Claim 3.2.

Proposition 3.4 now yields that \(\mathrm{C}_{R,R}(s)\) converges to a PPP with mean

$$\begin{aligned} \begin{array}{ll} \mathbb {E}_s[\mathrm{C}_{R,R}(s)] &{}= \mathrm{N}^U_{B}(t_n) \cdot \big (\mathrm{N}^V_{R, \text {left}}(t_n+s)+\mathrm{N}^V_{R, \text {right}}(t_n+s)\big )/n \\ &{}= \mathrm{A}^U_R(t_n)/2 \cdot \mathrm{N}^V_{R}(t_n+s)/n = \frac{\pi _R^2}{2} {\mathrm e}^{\lambda s} W_U^{\scriptscriptstyle {(n)}}W_V^{\scriptscriptstyle {(n)}}/\lambda (1+o(1)), \end{array} \end{aligned}$$

where we used that \(\mathrm{N}^U_B(t_n)=\mathrm{A}^U_R(t_n)/2\), since there are two red active individuals in each interval around an explored blue label, and the asymptotics from Theorem 2.6 and Corollary 2.8.

To show independence, note that now the draws in the multinomial distribution are indexed by \(\mathcal {N}_B^U(t_n)\), while in \(\mathrm{C}_{B,B}(s), \mathrm{C}_{R,B}(s)\) the draws are indexed by \(\mathcal {A}_B^U(t_n)\); compare (3.16) to (3.11) in particular. Once again, the dependence comes from shared indices only, the number of which is \(o(\sqrt{n})\), hence by a modification of Proposition 3.4, \(\mathrm{C}_{R,R}(s)\) is asymptotically independent of \(\mathrm{C}_{B,B}(s)\) and \(\mathrm{C}_{R,B}(s)\). To see that \(\mathrm{C}_{R,R}(s)\) and \(\mathrm{C}_{B,R}(s)\) are also asymptotically independent, consider falling in the set \(\big (\mathcal {N}^V_{R, \text {left}}(t_n+s)-r_x-1\big ) \cup \big (\mathcal {N}^V_{R, \text {right}}(t_n+s)+\ell _x+1\big )\) to be a type 1 success and falling in the set \(\big (\mathcal {N}^V_{B}(t_n+s)-r_x-1\big ) \cup \big (\mathcal {N}^V_{B}(t_n+s)+\ell _x+1\big )\) to be a type 2 success [compare (3.13) to (3.16)]. Our aim is to once again apply Proposition 3.4. For this, we have to show that the probability of a double success is negligible, that is, the size of the intersection \(\big (\mathcal {N}^V_{R, \text {left}}(t_n+s)-r_x-1 \cup \mathcal {N}^V_{R, \text {right}}(t_n+s)+\ell _x+1\big ) \cap \big (\mathcal {N}^V_{B}(t_n+s)-r_x-1 \cup \mathcal {N}^V_{B}(t_n+s)+\ell _x+1 \big )\) is \(o(\sqrt{n})\) for each fixed x. The self-intersections of the left-shifted and the right-shifted system (the intersections \(\mathcal {N}^V_{R, \text {left}}(t_n+s)-r_x-1 \cap \mathcal {N}^V_{B}(t_n+s)-r_x-1\) and \( \mathcal {N}^V_{R, \text {right}}(t_n+s)+\ell _x+1 \cap \mathcal {N}^V_{B}(t_n+s)+\ell _x+1\)) have already been handled as thinned vertices and their sizes are \(o(\sqrt{n})\), see the proof of Claim 3.2. By symmetry, it is enough to consider the intersection between the system of right-shifted right sides (i.e., \(\mathcal {N}^V_{R, \text {right}}(t_n+s)+\ell _x+1\)) and the left-shifted centers (i.e., \(\mathcal {N}^V_{B}(t_n+s)-r_x-1\)). Since the sum of all interval lengths is of order \(\sqrt{n}\), both the shifts and the interval lengths are at most of order \(\sqrt{n}\), but the cycle length is n, hence the left-shifted center \(c_k-r_x-1\) of an interval \(I_k\) cannot intersect the right-shifted right side of the same interval. For any other interval \(I_j\), its center \(c_j\) is another explored blue label, hence it is independent of \(c_k\), and also uniform in [n]. As a result, the size of the intersection has the same distribution as the size of \(\mathcal {N}^V_{B}(t_n+s) \cap \mathcal {N}^V_{R, \text {right}}(t_n+s)\), which can be upper bounded by the number of thinned vertices, and that has been shown to be \(o(\sqrt{n})\). This proves that we can indeed apply Proposition 3.4, and \(\mathrm{C}_{R,R}(s)\) and \(\mathrm{C}_{B,R}(s)\) are also asymptotically independent.

We emphasize that to obtain (3.5) we assumed that the number of actual intervals in \(\mathrm{SWT}^U(t_n)\) and \(\mathrm{SWT}^V(t_n+s)\) is \(\mathrm{N}_B^U(t_n)\) and \(\mathrm{N}_B^V(t_n+s)\), respectively. This is not entirely true due to the fact that intervals within \(\mathrm{SWT}^U\) or within \(\mathrm{SWT}^V\) might have merged already. However, in this case some of the included vertices are ghosts: Corollary 2.15 shows that the effective size of the active sets at times \(t_n+s\) for \(s\in \mathbb {R}\) is asymptotically the same as the number of active individuals in the CMBP, which, by the fact that every interval has precisely two active red vertices, implies that also the number of disjoint active intervals is asymptotically the same as \(\mathrm{N}_B^U(t_n)\) and \(\mathrm{N}_B^V(t_n+s)\), respectively in the two processes. \(\square \)

3.2 Proof of Theorem 1.1

It is well known [34] that if \((E_i)_{i\in \mathbb {N}}\) is a collection of i.i.d. random variables with cumulative distribution function \(F_E(y)\), and the points \((P_i)_{i\in \mathbb {N}}\) form a one-dimensional Poisson point process with intensity measure \(\mu (\mathrm ds)\) on \(\mathbb {R}\), then the points \((P_i, E_i)_{i\in \mathbb {N}}\) form a two-dimensional non-homogeneous Poisson point process on \(\mathbb {R}\times \mathbb {R}\), with intensity measure \(\mu (\mathrm{d}s) \times F_E(\mathrm{d}y)\).

In our case, to get the shortest path between U and V, recall from Definition 3.1 that we have to minimize the sum of the collision time and the remaining lifetime over the collision events. Mathematically, we want to minimize the quantity \(P_i+E_i\) over all points \((P_i,E_i)\) of the two dimensional PPP with intensity measure \(\nu (\mathrm ds \times \mathrm{d}y):=\mu (\mathrm{d}s) \times {\mathrm e}^{-y} \mathrm{d}y\), since the remaining lifetimes are i.i.d. exponential random variables.

Fig. 4 An illustration of the two dimensional PPP with points \((P_i, E_i)_{i\in \mathbb {N}}\), with the point minimizing the sum \(P_i+E_i\) indicated

Note that the event \(\{\min _i (P_i+E_i) \ge z\}\) is equivalent to the event that there is no point of this two-dimensional PPP in the infinite triangle \(\Delta (z) = \{(x,y): y>0, x+y<z\}\) (see Fig. 4). We calculate

$$\begin{aligned} \nu (\Delta (z)) = \int _{-\infty }^{z} \int _{0}^{z-s} \mu (\mathrm{d}s) \cdot {\mathrm e}^{-y} \mathrm{d}y = W_U^{\scriptscriptstyle {(n)}}W_V^{\scriptscriptstyle {(n)}}\left( 1-\frac{1}{2}\pi _R^2\right) \frac{1}{\lambda (\lambda +1)} {\mathrm e}^{\lambda z}(1+o(1)). \end{aligned}$$
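For completeness, the double integral can be evaluated as follows; abbreviating \(\kappa := W_U^{\scriptscriptstyle {(n)}}W_V^{\scriptscriptstyle {(n)}}\left( 1-\frac{1}{2}\pi _R^2\right) \) and suppressing the \(1+o(1)\) factor,

$$\begin{aligned} \nu (\Delta (z)) = \kappa \int _{-\infty }^{z} {\mathrm e}^{\lambda s}\big (1-{\mathrm e}^{-(z-s)}\big )\, \mathrm{d}s = \kappa \left( \frac{{\mathrm e}^{\lambda z}}{\lambda } - \frac{{\mathrm e}^{\lambda z}}{\lambda +1}\right) = \frac{\kappa \, {\mathrm e}^{\lambda z}}{\lambda (\lambda +1)}. \end{aligned}$$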

For short, we introduce

$$\begin{aligned} \mathcal {W}^{\scriptscriptstyle {(n)}}:= W_U^{\scriptscriptstyle {(n)}}W_V^{\scriptscriptstyle {(n)}}\left( 1-\frac{1}{2}\pi _R^2\right) \frac{1}{\lambda (\lambda +1)}(1+o(1)), \end{aligned}$$
(3.17)

Then we can reformulate

$$\begin{aligned} \nu (\Delta (z)) = \mathcal {W}^{\scriptscriptstyle {(n)}}{\mathrm e}^{\lambda z} \end{aligned}$$
(3.18)

Let us turn our attention back to \(\mathcal {P}_n(U,V)\), the weight of the shortest weight path between U and V. By the previous argument, we conclude

$$\begin{aligned} \mathbb {P}(\mathcal {P}_n(U,V) \ge z+2t_n|\mathcal {W}^{\scriptscriptstyle {(n)}}) = \mathbb {P}\big (\mathrm{Poi}(\nu (\Delta (z))) = 0|\mathcal {W}^{\scriptscriptstyle {(n)}}\big ) = \exp \{-\nu (\Delta (z))\}\quad \end{aligned}$$
(3.19)

Rearranging the left hand side and substituting the computed value of \(\nu (\Delta (z))\), we get

$$\begin{aligned} \mathbb {P}(2t_n-\mathcal {P}_n(U,V) \le -z | \mathcal {W}^{\scriptscriptstyle {(n)}}) = \exp \{- \mathcal {W}^{\scriptscriptstyle {(n)}}{\mathrm e}^{\lambda z} \} \end{aligned}$$

We substitute \(t_n=\log n/(2\lambda )\), and set \(z:=-x/\lambda \) to get

$$\begin{aligned} \mathbb {P}\left( \log n -\lambda \mathcal {P}_n(U,V)< x | \mathcal {W}^{\scriptscriptstyle {(n)}}\right) = \exp \{ - \exp \{ -(x - \log \mathcal {W}^{\scriptscriptstyle {(n)}}) \} \} \end{aligned}$$

We recognize on the right hand side the cumulative distribution function of a shifted Gumbel random variable, which implies

$$\begin{aligned} (\log n-\lambda \mathcal {P}_n(U,V) ) | \mathcal {W}^{\scriptscriptstyle {(n)}}\buildrel \text {d}\over {=}\Lambda + \log \mathcal {W}^{\scriptscriptstyle {(n)}}, \end{aligned}$$
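Indeed, if \(\Lambda \) denotes a standard Gumbel random variable, i.e., \(\mathbb {P}(\Lambda <x)=\exp \{-{\mathrm e}^{-x}\}\), then

$$\begin{aligned} \mathbb {P}\big (\Lambda + \log \mathcal {W}^{\scriptscriptstyle {(n)}}< x \mid \mathcal {W}^{\scriptscriptstyle {(n)}}\big ) = \exp \big \{ -\exp \{-(x-\log \mathcal {W}^{\scriptscriptstyle {(n)}})\}\big \}, \end{aligned}$$

which is exactly the right hand side of the previous display.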

Rearranging and substituting \(\mathcal {W}^{\scriptscriptstyle {(n)}}\) from (3.17), using that the martingales \((W_U^{\scriptscriptstyle {(n)}},W_V^{\scriptscriptstyle {(n)}}) \buildrel {a.s.}\over {\longrightarrow }(W_U, W_V)\), and that the factor \(1+o(1)\) only depends on n and becomes an additive \(o(1)\) term after taking logarithms, we obtain

$$\begin{aligned} \mathcal {P}_n(U,V) - \frac{1}{\lambda }\log n \buildrel {d}\over {\longrightarrow }- \frac{1}{\lambda }\Lambda - \frac{1}{\lambda }\log (W_U W_V) - \frac{1}{\lambda }\log \big (1-\pi _R^2/2\big ) + \frac{1}{\lambda }\log \left( \lambda (\lambda +1) \right) . \end{aligned}$$

This finishes the proof of Theorem 1.1.

4 Epidemic Curve

Recall the definition of the epidemic curve function from Sect. 1.2. The discussion of the epidemic curve consists of three parts: first, we find the correct function f by computing the first moment of \(\mathrm{I}_n(t,U)\). Then we prove the convergence in probability by bounding the second moment. Finally, we give a characterization of \(M_{W_V}\), the moment generating function of the random variable \(W_V\), that determines the epidemic curve function f.

4.1 First Moment

First, we condition on the value \(W_U^{\scriptscriptstyle {(n)}}\) from the martingale approximation of the branching process of the uniformly chosen vertex U. Then we can express the fraction of infected individuals as a sum of indicators, and calculate its conditional expectation:

$$\begin{aligned} \mathbb {E}\left[ \mathrm{I}_n(t,U)\left| W_U^{\scriptscriptstyle {(n)}}\right. \right] = \frac{1}{n} \sum _{w \in [n]} \mathbb {P}\left( w \text { is infected by time t} \left| W_U^{\scriptscriptstyle {(n)}}\right. \right) . \end{aligned}$$

Note that the rhs equals the probability that a uniformly chosen vertex, which we shall denote by V, is infected by time t. Also note that a vertex is infected if and only if its distance from U is at most t, hence

$$\begin{aligned} \mathbb {E}\left[ \mathrm{I}_n(t,U)\left| W_U^{\scriptscriptstyle {(n)}}\right. \right] = \mathbb {P}\left( \mathcal {P}_n(U,V) \le t \left| W_U^{\scriptscriptstyle {(n)}}\right. \right) . \end{aligned}$$
(4.1)

Now we can further condition on \(W_V^{\scriptscriptstyle {(n)}}\) and use the distribution of \(\mathcal {P}_n(U,V)\) conditioned on \(W_U^{\scriptscriptstyle {(n)}}, W_V^{\scriptscriptstyle {(n)}}\). Recall from Sect. 3.2, that

$$\begin{aligned} \mathbb {P}\left. \left( \mathcal {P}_n(U,V) \ge z+2t_n \right| W_U^{\scriptscriptstyle {(n)}}, W_V^{\scriptscriptstyle {(n)}}\right) = \exp \left\{ -W_U^{\scriptscriptstyle {(n)}}W_V^{\scriptscriptstyle {(n)}}\left( 1-\tfrac{1}{2} \pi _R^2\right) \tfrac{1}{\lambda (\lambda +1)}{\mathrm e}^{\lambda z}\right\} .\quad \end{aligned}$$
(4.2)

Let us set here \(z=t-\frac{1}{\lambda }\log W_U^{\scriptscriptstyle {(n)}}\) and rearrange, yielding

$$\begin{aligned} \begin{array}{ll} &{}\mathbb {P}\left( \mathcal {P}_n(U,V) \le t-\tfrac{1}{\lambda }\log W_U^{\scriptscriptstyle {(n)}}+ \tfrac{1}{\lambda }\log n \left| W_U^{\scriptscriptstyle {(n)}}, W_V^{\scriptscriptstyle {(n)}}\right. \right) \\ &{}= 1- \exp \left\{ - W_V^{\scriptscriptstyle {(n)}}\left( 1-\tfrac{1}{2} \pi _R^2\right) \tfrac{1}{\lambda (\lambda +1)}{\mathrm e}^{\lambda t}\right\} \end{array} \end{aligned}$$

Then, combining (4.1) with the last display and taking expectation over \(W_V^{\scriptscriptstyle {(n)}}\), we get

$$\begin{aligned} \mathbb {E}\left[ \mathrm{I}_n(t-\tfrac{1}{\lambda }\log W_U^{\scriptscriptstyle {(n)}}+ \tfrac{1}{\lambda }\log n,U)| W_U^{(n)}\right] = 1-\mathbb {E}\Big [\exp \left\{ - W_V^{\scriptscriptstyle {(n)}}\left( 1-\tfrac{1}{2} \pi _R^2\right) \tfrac{1}{\lambda (\lambda +1)}{\mathrm e}^{\lambda t}\right\} \Big ].\nonumber \\ \end{aligned}$$
(4.3)

We recognize that the second term on the right hand side is the moment generating function of \(W_V^{\scriptscriptstyle {(n)}}\), \(\mathrm{M}_{W_V^{\scriptscriptstyle {(n)}}}(x)\), at \(x(t) = - \left( 1-\frac{1}{2} \pi _R^2\right) \frac{1}{\lambda (\lambda +1)}{\mathrm e}^{\lambda t}\).

Changing variables yields

$$\begin{aligned} \mathbb {E}[\mathrm{I}_n(t + \tfrac{1}{\lambda }\log n,U)| W_U^{\scriptscriptstyle {(n)}}] = 1 - M_{W_V^{\scriptscriptstyle {(n)}}}\left( x(t + \tfrac{1}{\lambda }\log W_U^{\scriptscriptstyle {(n)}})\right) . \end{aligned}$$

Note that \(W_V^{\scriptscriptstyle {(n)}}\) converges to \(W_V\) almost surely, which implies that \(M_{W_V^{\scriptscriptstyle {(n)}}}(x)\rightarrow M_{W_V}(x)\) for every \(x\le 0\); together with the almost sure convergence of \(W_U^{\scriptscriptstyle {(n)}}\), the right hand side thus converges in probability. The limiting function is exactly the one given in Theorem 1.3.
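Written out, using the definition of \(x(t)\) above, the identity \(x\big (t+\tfrac{1}{\lambda }\log W_U\big )= W_U\, x(t)\) gives the explicit form of this limit,

$$\begin{aligned} 1 - M_{W_V}\Big ( -W_U \big (1-\tfrac{1}{2}\pi _R^2\big ) \tfrac{1}{\lambda (\lambda +1)}\, {\mathrm e}^{\lambda t} \Big ), \end{aligned}$$

which is how the epidemic curve function can be read off from the computation above.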

4.2 Second Moment

The first moment computation above showed that the conditional expectation of \(\mathrm{I}_n(t,U)\) indeed converges in probability to the function f at the given point. We prove Theorem 1.3 by showing that the variance of \(\mathrm{I}_n(t,U)\) converges to 0; then Chebyshev’s inequality yields that \(\mathrm{I}_n(t,U)\) converges to its expectation in probability.

Write \({\mathbbm {1}}_i := {\mathbbm {1}}\{i \text { is infected by time } t\}\). Let us calculate

$$\begin{aligned}&\mathrm{Var}\Big (\frac{1}{n} \sum _{i \in [n]} {\mathbbm {1}}_i| W_U^{\scriptscriptstyle {(n)}}\Big ) \\&\quad = \frac{1}{n^2} \sum _{i \in [n]} \mathrm{Var}({\mathbbm {1}}_i|W_U^{\scriptscriptstyle {(n)}}) + \frac{2}{n^2}\sum _{i<j \in [n]} \mathrm{Cov}[{\mathbbm {1}}_i,{\mathbbm {1}}_j| W_U^{\scriptscriptstyle {(n)}}] \end{aligned}$$

Since \({\mathbbm {1}}_i\) is an indicator, \(\mathrm{Var}({\mathbbm {1}}_i|W_U^{\scriptscriptstyle {(n)}}) \le 1\), hence the first term on the rhs is at most \(\frac{1}{n}\). As for the second term,

$$\begin{aligned} \begin{array}{ll} &{}\mathrm{Cov}[{\mathbbm {1}}_i,{\mathbbm {1}}_j| W_U^{\scriptscriptstyle {(n)}}] = \mathbb {E}[{\mathbbm {1}}_i{\mathbbm {1}}_j | W_U^{\scriptscriptstyle {(n)}}] - \mathbb {E}[{\mathbbm {1}}_i | W_U^{\scriptscriptstyle {(n)}}] \mathbb {E}[{\mathbbm {1}}_j |W_U^{\scriptscriptstyle {(n)}}] \\ &{}\quad = \mathbb {P}(i \text { and } j \text { are both infected} | W_U^{\scriptscriptstyle {(n)}}) - \mathbb {P}(i \text { is infected}|W_U^{\scriptscriptstyle {(n)}})\mathbb {P}(j \text { is infected}|W_U^{\scriptscriptstyle {(n)}}) \end{array} \end{aligned}$$

Imagine now three exploration processes on \(\mathrm{NW}_n\), one from U, one from i and one from j. It is not hard to see that the exploration processes from these three vertices can be approximated by three independent branching processes. This implies that the covariance can be bounded by the error of the coupling between the graph and the branching processes, as well as by the thinning inside one tree and between the trees: these all have error terms of order at most \(1/\log n\). It is not hard to see that the coupling can be extended to three \(\mathrm{SWT}\)’s (instead of two, as before), and the error terms are only multiplied by constants. The connection processes between \(\mathrm{SWT}^U\) and the other two are related only through the intersection of \(\mathrm{SWT}^i\) and \(\mathrm{SWT}^j\), which is again at most of order \(1/\log n\). As a result, \(\mathbb {P}(i \text { and } j \text { are both infected} | W_U^{\scriptscriptstyle {(n)}})-\mathbb {P}(i \text { is infected}|W_U^{\scriptscriptstyle {(n)}})\mathbb {P}(j \text { is infected}|W_U^{\scriptscriptstyle {(n)}})=O(1/\log n)\).

This coupling works if i and j are sufficiently far apart, say \((i-j) \, \mathrm{mod}\ n \,> (\log n)^{1+\varepsilon }\) for some fixed \(\varepsilon >0\) (this is w.h.p. longer than the length of the longest red interval). The number of “bad pairs” that are closer than this is of order \(n (\log n)^{1+\varepsilon }\); compared to the number of all pairs \(\left( {\begin{array}{c}n\\ 2\end{array}}\right) \), their fraction goes to 0. Even for these pairs the covariance is bounded by 1, hence their total contribution, divided by \(n^2\), goes to 0.
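Summarizing the three contributions (with unimportant constants suppressed), the conditional variance is bounded by

$$\begin{aligned} \mathrm{Var}\Big (\mathrm{I}_n(t,U)\,\Big |\, W_U^{\scriptscriptstyle {(n)}}\Big ) \le \frac{1}{n} + O\Big (\frac{1}{\log n}\Big ) + O\Big (\frac{(\log n)^{1+\varepsilon }}{n}\Big ) \buildrel {n\rightarrow \infty }\over {\longrightarrow }0. \end{aligned}$$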

With that, we have bounded the variance by a term that goes to 0, which finishes the proof.

4.3 Characterization of the Epidemic Curve Function

In this section, we prove Proposition 1.5. Recall that adding a superscript (B) or (R) indicates a branching process described in Sect. 2.3 that is started from a blue or red type vertex, respectively. We start with the recursive formula for the martingale limit random variables from [5]:

$$\begin{aligned} W^{\scriptscriptstyle {(B)}}\buildrel \text {d}\over {=}\sum _{i=1}^{D^{(B)}_R} {\mathrm e}^{-\lambda X_i} W^{\scriptscriptstyle {(R)}}_i + \sum _{j=1}^{D^{(B)}_B} {\mathrm e}^{-\lambda X_j} W^{\scriptscriptstyle {(B)}}_j, \end{aligned}$$

where \(W^{\scriptscriptstyle {(R)}}_i\) are independent copies of \(W^{\scriptscriptstyle {(R)}}= \lim _{t\rightarrow \infty } {\mathrm e}^{-\lambda t} Z^{\scriptscriptstyle {(R)}}(t)\), and \(W^{\scriptscriptstyle {(B)}}_j\) are independent copies of \(W^{\scriptscriptstyle {(B)}}\buildrel \text {d}\over {=}W_V\), and \(X_i, X_j\) are i.i.d. \(\mathrm{Exp}(1)\). Denote the moment generating functions of \(W^{\scriptscriptstyle {(B)}}\) and \(W^{\scriptscriptstyle {(R)}}\) by \(M_{W^{\scriptscriptstyle {(B)}}}\), \(M_{W^{\scriptscriptstyle {(R)}}}\) respectively. Recall that a blue individual has two red and \(\mathrm{Poi}(\rho )\) many blue children. Hence

$$\begin{aligned} \textstyle \mathbb {E}\left[ {\mathrm e}^{\vartheta W^{\scriptscriptstyle {(B)}}}\right] = \left( \mathbb {E}\left[ \exp \{\vartheta {\mathrm e}^{-\lambda X_i}W^{\scriptscriptstyle {(R)}}\}\right] \right) ^2 \cdot \mathbb {E}\left[ \exp \left\{ \vartheta \sum _{j=1}^{D^{(B)}_B} {\mathrm e}^{-\lambda X_j} W^{\scriptscriptstyle {(B)}}_j\right\} \right] \end{aligned}$$
(4.4)

We use the law of total expectation with respect to \(X_i\) to compute

$$\begin{aligned} \begin{array}{ll} J^{(R)} &{}:= \mathbb {E}\left[ \exp \{\vartheta {\mathrm e}^{-\lambda X_i}W^{\scriptscriptstyle {(R)}}\}\right] = \int _0^\infty \mathbb {E}\left[ \exp \{\vartheta {\mathrm e}^{-\lambda x}W^{\scriptscriptstyle {(R)}}\}\right] {\mathrm e}^{-x} \mathrm{d}x \\ &{}= \int _0^\infty M_{W^{\scriptscriptstyle {(R)}}} (\vartheta {\mathrm e}^{-\lambda x}) {\mathrm e}^{-x} \mathrm{d}x \end{array} \end{aligned}$$

Let \(J^{(B)}\) be defined similarly, with \(M_{W^{\scriptscriptstyle {(R)}}}\) replaced by \(M_{W^{\scriptscriptstyle {(B)}}}\). Then, the second factor in (4.4) can be treated by conditioning on \(D^{(B)}_{B}\) and using independence:

$$\begin{aligned} \textstyle \prod _{j=1}^{D^{(B)}_B} \mathbb {E}\Big [\exp \{\vartheta {\mathrm e}^{-\lambda X_j} W^{\scriptscriptstyle {(B)}}_j\} \Big ] = \prod _{j=1}^{D^{(B)}_B} J^{(B)} = \left( J^{(B)}\right) ^{D^{(B)}_B}. \end{aligned}$$

Taking expectation w.r.t. \(D^{(B)}_B\buildrel \text {d}\over {=}\mathrm{Poi}(\rho )\) yields that

$$\begin{aligned} \mathbb {E}\left[ \exp \Big \{\vartheta \textstyle \sum _{j=1}^{D^{(B)}_B} {\mathrm e}^{-\lambda X_j} W^{\scriptscriptstyle {(B)}}_j\Big \} \right] = \exp \left\{ \rho \big (J^{(B)} -1\big )\right\} \end{aligned}$$
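The last step is simply the probability generating function of the \(\mathrm{Poi}(\rho )\) distribution evaluated at \(z=J^{(B)}\):

$$\begin{aligned} \mathbb {E}\big [ z^{D^{(B)}_B}\big ] = \sum _{k\ge 0} {\mathrm e}^{-\rho }\frac{\rho ^k}{k!}\, z^k = {\mathrm e}^{\rho (z-1)}. \end{aligned}$$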

We can rewrite the factor in the exponent as

$$\begin{aligned} J^{(B)} -1 = \int _0^\infty M_{W^{\scriptscriptstyle {(B)}}} (\vartheta {\mathrm e}^{-\lambda x}) {\mathrm e}^{-x} \mathrm{d}x - 1 = \int _0^\infty \left( M_{W^{\scriptscriptstyle {(B)}}} (\vartheta {\mathrm e}^{-\lambda x}) -1\right) {\mathrm e}^{-x} \mathrm{d}x, \end{aligned}$$

then the moment generating function in (4.4) becomes

$$\begin{aligned} M_{W^{\scriptscriptstyle {(B)}}} (\vartheta ) = \left( \int _0^\infty M_{W^{\scriptscriptstyle {(R)}}} (\vartheta {\mathrm e}^{-\lambda x}) {\mathrm e}^{-x} \mathrm{d}x \right) ^2 \,\cdot \, \exp \!\left\{ \rho \cdot \int _0^\infty \left( M_{W^{\scriptscriptstyle {(B)}}} (\vartheta {\mathrm e}^{-\lambda x}) -1\right) {\mathrm e}^{-x} \mathrm{d}x \right\} . \end{aligned}$$

Similarly, for \(M_{W^{\scriptscriptstyle {(R)}}}\), using that \(D^{(R)}_R = 1\) and \(D^{(R)}_B \buildrel \text {d}\over {=}\mathrm{Poi}(\rho )\),

$$\begin{aligned} M_{W^{\scriptscriptstyle {(R)}}} (\vartheta ) = \int _0^\infty M_{W^{\scriptscriptstyle {(R)}}} (\vartheta {\mathrm e}^{-\lambda x}) {\mathrm e}^{-x} \mathrm{d}x \,\cdot \, \exp \! \left\{ \rho \cdot \int _0^\infty \left( M_{W^{\scriptscriptstyle {(B)}}} (\vartheta {\mathrm e}^{-\lambda x}) -1\right) {\mathrm e}^{-x} \mathrm{d}x \right\} \end{aligned}$$

We have just shown that the moment generating functions satisfy the system of equations given in Proposition 1.5, and by [5], there exist proper moment generating functions satisfying these functional equations.

5 Central Limit Theorem for the Hopcount

In this section we prove Theorem 1.2, which states that the hopcount \(\mathrm{H}_n(U,V)\), the number of edges along the shortest weight path between two vertices U and V chosen uniformly at random, satisfies a central limit theorem with mean and variance both asymptotically equal to \(\frac{\lambda +1}{\lambda } \log n\).

For this, we consider the shortest weight path between U and V in two parts: the path from U within \(\mathrm{SWT}^U(t_n)\), and the path from V within \(\mathrm{SWT}^V(\cdot )\), up to the vertex where the connection happens. We denote the vertex where the connection happens by Y. These paths are disjoint with the exception of Y, hence it suffices to determine their lengths, i.e., the graph distances of Y from U and from V. Denote by \(G^{(U)}(Y)\) the generation of Y in \(\mathrm{SWT}^U\), and similarly for V. Then the number of edges on the path from the root U to Y is exactly \(G^{(U)}(Y)\).

Claim 5.1

The choice of Y is asymptotically independent in the two \(\mathrm{SWT}\)’s.

Proof

Conditioned on Y being the connecting vertex, it is uniformly chosen over the active set of \(\mathrm{SWT}^V\). That determines its label, and the label in turn determines which particle is chosen in \(\mathrm{SWT}^U\). Since the labeling is independent of the structure of the family tree, aside from the thinning, the choice of Y in \(\mathrm{SWT}^U\) is independent of its choice in \(\mathrm{SWT}^V\). We have already bounded the fraction of ghost particles (those that have one of their ancestors thinned) by a term that goes to 0 in Lemma 2.12, hence asymptotic independence holds. \(\square \)

With these notations, \(\mathrm{H}_n(U,V) = G^{(U)}(Y) + G^{(V)}(Y)\), and the two terms are asymptotically independent by Claim 5.1. We reformulate the theorem using these terms:

$$\begin{aligned} \frac{\mathrm{H}_n(U,V) - \frac{\lambda +1}{\lambda } \log n}{\sqrt{\frac{\lambda +1}{\lambda } \log n}} = \frac{G^{(U)}(Y) - \frac{\lambda +1}{2\lambda } \log n}{\sqrt{\frac{\lambda +1}{\lambda } \log n}} + \frac{G^{(V)}(Y) - \frac{\lambda +1}{2\lambda } \log n}{\sqrt{\frac{\lambda +1}{\lambda } \log n}} \end{aligned}$$
(5.1)

Since the two terms are asymptotically independent, it suffices to show that both terms on the right hand side are asymptotically normal with mean 0 and variance \(\tfrac{1}{2}\). Equivalently, we show that both terms on the rhs of (5.1), multiplied by \(\sqrt{2}\), converge to a standard normal distribution. Due to the way we established the connection between \(\mathrm{SWT}^U\) and \(\mathrm{SWT}^V\), the two terms need to be treated somewhat differently.

5.1 Generation of the Connecting Vertex in \(\mathrm{SWT}^V\)

Recall that we established the connection between \(\mathrm{SWT}^U\) and \(\mathrm{SWT}^V\) in the following way: we grew \(\mathrm{SWT}^U\) until time \(t_n\), then froze its evolution. Then we grow \(\mathrm{SWT}^V\), and every time a label is assigned to a splitting particle, we check whether this label belongs to the active set of \(\mathrm{SWT}^U\). As a result, the connecting vertex Y is a particle at some splitting time \(T_k\), and hence chosen uniformly over the active vertices. This implies that we can use the indicator decomposition of the ancestral line described in Sect. 2.4.2 to write Y’s generation as \(G_k = \sum _{i=1}^{k} {\mathbbm {1}}_i \), where, conditioned on the offspring variables \(D_i\), the indicators are independent and have success probability \(\mathbb {P}({\mathbbm {1}}_i=1)=\frac{D_i}{S_i}\).
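Before turning to the details, let us record the heuristic behind the centering \(\tfrac{\lambda +1}{2\lambda }\log n\): since, as we will see below, \(\mathbb {E}[D_i]\approx \lambda +1\) and \(S_i\approx i\lambda \) (Lemma 2.11), for \(k=\Theta (\sqrt{n})\) we expect

$$\begin{aligned} \mathbb {E}[G_k] \approx \sum _{i=1}^{k} \frac{\lambda +1}{i\lambda } \approx \frac{\lambda +1}{\lambda }\log k = \frac{\lambda +1}{2\lambda }\log n\,(1+o(1)). \end{aligned}$$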

In our case the number of splits is a random variable. Recall from Sect. 3.2 that the connection time minus \(t_n\) is a tight random variable [see e.g. (3.19)], hence up to the connection there are \(\mathrm{N}(t_n + Z)\) many explored vertices for some random \(Z \in \mathbb {R}\). By Corollary 2.8, \(\mathrm{N}(t_n+Z)=C \sqrt{n}\) for some random variable C (which might depend on n, but is tight). Define

$$\begin{aligned} \begin{array}{ll} B_1 &{}= \frac{\sum _{i=1}^{C\sqrt{n}}{\mathbbm {1}}_i - \sum _{i=1}^{C\sqrt{n}}\frac{D_i}{S_i}}{\sqrt{\sum _{i=1}^{C\sqrt{n}} \frac{D_i}{S_i}\left( 1-\frac{D_i}{S_i}\right) }}, \quad B_2 = \frac{\sqrt{ \sum _{i=1}^{C\sqrt{n}} \frac{D_i}{S_i}\left( 1-\frac{D_i}{S_i}\right) }}{\sqrt{\frac{\lambda +1}{2\lambda } \log n}}, \\ B_3 &{}= \frac{\sum _{i=1}^{C\sqrt{n}} \frac{D_i}{S_i} - \frac{\lambda +1}{2\lambda }\log n}{\sqrt{\frac{\lambda +1}{2\lambda } \log n}} \end{array} \end{aligned}$$
(5.2)

Then

$$\begin{aligned} \frac{G^{(V)}(Y) - \frac{\lambda +1}{2\lambda } \log n}{\sqrt{\frac{\lambda +1}{2\lambda } \log n}}= \frac{\sum _{i=1}^{C\sqrt{n}}{\mathbbm {1}}_i - \frac{\lambda +1}{2\lambda } \log n}{\sqrt{\frac{\lambda +1}{2\lambda } \log n}}= B_1 \!\cdot \! B_2 + B_3 \end{aligned}$$
(5.3)

Our aim is to show that Lindeberg’s CLT is applicable to \(B_1\), that \(B_2\) converges to 1, and that \(B_3\) converges to 0.

5.1.1 Term \(B_1\)

For this sum of (conditionally independent) indicators, Lindeberg’s condition is trivially satisfied if the total variance tends to infinity. To give a lower bound,

$$\begin{aligned} \sum _{i=1}^{C\sqrt{n}} D_i/S_i\left( 1-D_i/S_i\right) = \sum _{i=1}^{C\sqrt{n}}D_i/S_i - \sum _{i=1}^{C\sqrt{n}}D_i^2/S_i^2. \end{aligned}$$
(5.4)

Recall Lemma 2.11, and split the sum according to the random variable K. Each vertex has at least one red child, hence \(D_i\ge 1\). Then

$$\begin{aligned} \sum _{i=1}^{C\sqrt{n}}D_i/S_i \ge \sum _{i=K+1}^{C\sqrt{n}}\frac{1}{i\lambda (1+o(i^{-1/2+\varepsilon }))}. \end{aligned}$$

where K is a.s. finite. The \(i^\text {th}\) term on the rhs is at least \(1/(2i\lambda )\) for all i large enough, thus the rhs tends to infinity; in fact it is at least \((1+o(1))\log n/(2\lambda )\). For the second term in (5.4), we can use that the second moment of \(D_i \buildrel \text {d}\over {=}\mathrm{Poi}(\rho )+1+{\mathbbm {1}}\{i^\text {th}\text { explored is blue}\}\) can be bounded by some constant \(M_2\) independent of i. Hence, again cutting the sum at K, the sum of the first K terms is a.s. finite. For the rest, we can use Lemma 2.11 again, and then Markov’s inequality yields:

$$\begin{aligned} \mathbb {P}\!\left( \sum _{i=K+1}^{C\sqrt{n}}\!\! \frac{D_i^2}{ i^2\lambda ^2\big (1+o(i^{-1/2+\varepsilon })\big )^2 } \ge \!\!\sum _{i=K+1}^{C\sqrt{n}}\!\! \frac{M_2}{2 i^2 \lambda ^2 } \cdot \log \log n \right) \!\le \frac{1}{\log \log n}. \end{aligned}$$
(5.5)

Note that \(\sum _{i=1}^{\infty } M_2/(2 i^2 \lambda ^2) = \tfrac{M_2\pi ^2}{12\lambda ^2} \). Combining the two estimates for the two terms in (5.4), we see that the variance tends to infinity w.h.p. As a result, the term \(B_1\) in (5.2) satisfies a CLT.

5.1.2 Term \(B_2\)

Similarly to the term \(B_1\), we cut the sum at K given by Lemma 2.11 and write

$$\begin{aligned} \frac{\sum _{i=1}^{C\sqrt{n}}D_i/S_i(1-D_i/S_i)}{\frac{\lambda +1}{2\lambda } \log n} = \frac{\sum _{i=1}^{K}D_i/S_i(1-D_i/S_i)}{\frac{\lambda +1}{2\lambda } \log n} + \frac{\sum _{i=K+1}^{C\sqrt{n}}D_i/S_i}{\frac{\lambda +1}{2\lambda } \log n} - \frac{\sum _{i=1}^{C\sqrt{n}}D_i^2/S_i^2}{\frac{\lambda +1}{2\lambda } \log n}. \end{aligned}$$

The first fraction tends to 0, as the numerator is a.s. finite. For the numerator of the third term, we can use (5.5) again, which shows that the third term tends to zero w.h.p. We have yet to show that the second term tends to 1. Let \(\mathcal {F}_k = \sigma (D_1,\dots ,D_k)\) be the filtration generated by the offspring variables. Then

$$\begin{aligned} \frac{\sum _{i=K+1}^{C\sqrt{n}}D_i/S_i}{\frac{\lambda +1}{2\lambda } \log n} = \frac{\sum _{i=K+1}^{C\sqrt{n}} \frac{D_i- \mathbb {E}[D_i|\mathcal {F}_{i-1}]}{i\lambda (1+o(i^{-1/2+\varepsilon }))} }{\frac{\lambda +1}{2\lambda } \log n} + \frac{\sum _{i=K+1}^{C\sqrt{n}} \frac{\mathbb {E}[D_i|\mathcal {F}_{i-1}]}{i\lambda (1+o(i^{-1/2+\varepsilon }))}}{\frac{\lambda +1}{2\lambda } \log n}. \end{aligned}$$
(5.6)

For the first term on the rhs of (5.6), we use Chebyshev’s inequality. For this, an elementary calculation using the tower rule yields that

$$\begin{aligned} \mathrm{Var}\left[ \sum _{i=K+1}^{C\sqrt{n}} \frac{D_i-\mathbb {E}[D_i|\mathcal {F}_{i-1}]}{i\lambda (1+o(i^{-1/2+\varepsilon }))} \right] \le \sum _{i=K+1}^{C\sqrt{n}} \frac{\mathbb {E}[ D_i^2|\mathcal {F}_{i-1}]}{\lambda ^2 i^2 (1+o(i^{-1/2+\varepsilon }))^2}. \end{aligned}$$

Since \(\mathbb {E}[ D_i^2|\mathcal {F}_{i-1}]\le M_2\) as in Sect. 5.1.1, we get that the rhs is at most \(M_2 \pi ^2/(3\lambda ^2)\). Then Chebyshev’s inequality yields

$$\begin{aligned} \mathbb {P}\left( \left| \sum _{i=K+1}^{C\sqrt{n}} \frac{D_i- \mathbb {E}[D_i|\mathcal {F}_{i-1}]}{i\lambda (1+o(i^{-1/2+\varepsilon }))} \right| \ge \log \log n \cdot \frac{\pi ^2}{3} \frac{M_2}{\lambda ^2} \right) \le \frac{1}{(\log \log n)^2}. \end{aligned}$$
(5.7)

This implies that the first term in (5.6) tends to 0 w.h.p.
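In more detail, (5.7) is a routine application of Chebyshev’s inequality: writing \(X_n\) for the (conditionally centred) sum inside the absolute value in (5.7) and \(v:=\pi ^2 M_2/(3\lambda ^2)\) for the variance bound just obtained, for any \(a>0\),

$$\begin{aligned} \mathbb {P}\left( |X_n|\ge a\right) \le \frac{\mathrm{Var}[X_n]}{a^2} \le \frac{v}{a^2}, \end{aligned}$$

so choosing \(a\) proportional to \(\log \log n\), as in (5.7), gives an upper bound of order \((\log \log n)^{-2}\).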

Now to show that the second term in (5.6) tends to 1, we use a corollary of Theorem 2.6 (see [5]), stating that the vector \(\left( \frac{S_{i-1}^R}{S_{i-1}}, \frac{S_{i-1}^B}{S_{i-1}}\right) \rightarrow (\pi _R,\pi _B)\) a.s. Further analysis (in particular, the central limit theorem about \((S_{i}^R, S_i^B)\) in [26]) yields that the error term is at most of order \(i^{-1/2+\varepsilon }\). Hence, using that \(D_i \buildrel \text {d}\over {=}\mathrm{Poi}(\rho )+1+{\mathbbm {1}}\{i^\text {th}\text { explored is blue}\}\) and the definition of \(\lambda \), it is elementary to show that

$$\begin{aligned} \mathbb {E}[D_i|\mathcal {F}_{i-1}] = (\lambda +1)(1+o(i^{-1/2+\varepsilon })) \end{aligned}$$
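A sketch of this elementary computation, under the reading that in this two-type setting the definition of \(\lambda \) amounts to the identity \(\rho +\pi _B=\lambda \) (an assumption of this sketch), is

$$\begin{aligned} \mathbb {E}[D_i|\mathcal {F}_{i-1}] = \rho + 1 + \mathbb {P}\left( i^\text {th}\text { explored is blue}\mid \mathcal {F}_{i-1}\right) = \rho +1+\pi _B+o(i^{-1/2+\varepsilon }) = (\lambda +1)(1+o(i^{-1/2+\varepsilon })), \end{aligned}$$

where the middle step uses the a.s. convergence of \(\left( S_{i-1}^R/S_{i-1}, S_{i-1}^B/S_{i-1}\right) \) with error \(o(i^{-1/2+\varepsilon })\) cited above.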

Substituting this into the sum, we have

$$\begin{aligned} \frac{\sum _{i=K+1}^{C\sqrt{n}} \frac{\mathbb {E}[D_i|\mathcal {F}_{i-1}]}{i\lambda (1+ o(i^{-1/2+\varepsilon }))}}{\frac{\lambda +1}{2\lambda } \log n} = \frac{\sum _{i=K+1}^{C\sqrt{n}} 1/i }{\log n/2} + \frac{\sum _{i=K+1}^{C\sqrt{n}} o(i^{-1/2+\varepsilon })/i}{\log n/2} \end{aligned}$$
(5.8)

The first term on the rhs, introducing a constant error term \(\delta \) from the integral approximation, equals

$$\begin{aligned} \frac{\sum _{i=K+1}^{C\sqrt{n}} 1/i }{ \log n/2} = \frac{\log (C\sqrt{n})-\log (K+1)+\delta }{\log n/2} \buildrel {n\rightarrow \infty }\over {\longrightarrow }1, \end{aligned}$$
(5.9)

since C is a tight random variable. The numerator of the second term in (5.8) is at most \(\sum _{i=1}^{\infty }O(i^{-3/2+\varepsilon })\), which is summable and finite; hence, divided by \(\log n/2\), the second term tends to 0. Combining everything, we get that \(B_2\) in (5.2) tends to 1 w.h.p.
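For completeness, the integral comparison behind the bounded error term \(\delta \) in (5.9) is the standard sandwich: for integers \(1\le a<b\),

$$\begin{aligned} \log \frac{b+1}{a+1}=\int _{a+1}^{b+1}\frac{\mathrm{d}x}{x} \le \sum _{i=a+1}^{b}\frac{1}{i} \le \int _{a}^{b}\frac{\mathrm{d}x}{x}=\log \frac{b}{a}, \end{aligned}$$

so with \(a=K\) and \(b=C\sqrt{n}\) the sum equals \(\log (C\sqrt{n})-\log (K+1)\) up to a bounded error, and dividing by \(\log n/2=\log \sqrt{n}\) indeed gives a limit of 1 for tight C and K.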

5.1.3 Term \(B_3\)

As before, we cut this sum at the K given by Lemma 2.11; the sum of the first K terms divided by \(\sqrt{\log n}\) tends to 0, since \(D_i/S_i<1\) and K is a.s. finite. For the rest of the sum, we use the approximation of \(S_i\) (given by Lemma 2.11) and add and subtract \(\mathbb {E}[D_i|\mathcal {F}_{i-1}]\) again:

$$\begin{aligned} \frac{ \sum _{i=K+1}^{C\sqrt{n}} \frac{D_i}{S_i} - \frac{\lambda +1}{2\lambda }\log n}{\sqrt{\frac{\lambda +1}{2\lambda }\log n}} &= \frac{ \sum _{i=K+1}^{C\sqrt{n}} \frac{D_i - \mathbb {E}[D_i|\mathcal {F}_{i-1}]}{i\lambda (1+o(i^{-1/2+\varepsilon }))} }{\sqrt{\frac{\lambda +1}{2\lambda }\log n}} \\ &\quad + \frac{ \left( \sum _{i=K+1}^{C\sqrt{n}} \frac{\mathbb {E}[D_i|\mathcal {F}_{i-1}]}{i\lambda (1+o(i^{-1/2+\varepsilon }))} \right) - \frac{\lambda +1}{2\lambda }\log n}{\sqrt{\frac{\lambda +1}{2\lambda }\log n}} \end{aligned}$$
(5.10)

The numerator of the first term on the rhs has been treated in (5.7) and is w.h.p. of order at most \(\log \log n\), hence the first term on the rhs tends to 0 w.h.p. For the second term on the rhs of (5.10) we can use (5.8) and (5.9), and then it is at most

$$\begin{aligned} \frac{\lambda +1}{\lambda } \cdot \frac{ \log C - \log (K+1) +\delta + \sum _{i=K+1}^{C\sqrt{n}} o(i^{-1/2+\varepsilon })/i }{\sqrt{\frac{\lambda +1}{2\lambda }\log n}} \rightarrow 0 \end{aligned}$$

almost surely, since C is a tight random variable. This shows that the term \(B_3\) in (5.2) tends to 0 w.h.p., and finishes the proof of the CLT for the generation of the connecting vertex in \(\mathrm{SWT}^V\), see (5.3).
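Explicitly, combining the three parts: \(B_1\buildrel {d}\over {\longrightarrow } Z\) by Lindeberg’s CLT, while \(B_2\rightarrow 1\) and \(B_3\rightarrow 0\) w.h.p., so, e.g. by Slutsky’s theorem,

$$\begin{aligned} \frac{G^{(V)}(Y) - \frac{\lambda +1}{2\lambda } \log n}{\sqrt{\frac{\lambda +1}{2\lambda } \log n}} = B_1 \cdot B_2 + B_3 \buildrel {d}\over {\longrightarrow } Z, \end{aligned}$$

where \(Z\) denotes a standard normal random variable.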

5.2 Generation of the Connecting Vertex in \(\mathrm{SWT}^U\)

For the generation of Y in \(\mathrm{SWT}^U\), we have to use a different approach. This is because the label of the connecting vertex is chosen uniformly among the active vertices of \(\mathrm{SWT}^V\), but it is not necessarily uniform over the active vertices in \(\mathrm{SWT}^U\). Indeed, it is a lengthy but elementary calculation to show that, conditioned on the event that a connection happens, any active red label in \(\mathrm{SWT}^U\) is chosen with asymptotic probability \((\mathrm{A}^{(U)}(t_n))^{-1} (1-\tfrac{\pi _R}{2})/(1-\tfrac{\pi _R^2}{2})\), while any active blue label is chosen with asymptotic probability \((\mathrm{A}^{(U)}(t_n))^{-1}1/(1-\tfrac{\pi _R^2}{2})\), where \(\mathrm{A}^{(U)}(t_n)\) is the total number of active vertices in \(\mathrm{SWT}^U\). However, the following claim is still valid and will be enough to show the needed CLT:
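As a consistency check (under the assumption, in line with the branching process approximation used above, that asymptotically a fraction \(\pi _R\) of the active labels in \(\mathrm{SWT}^U\) is red and a fraction \(\pi _B=1-\pi _R\) is blue), these asymptotic probabilities indeed sum to 1:

$$\begin{aligned} \pi _R\cdot \frac{1-\tfrac{\pi _R}{2}}{1-\tfrac{\pi _R^2}{2}}+(1-\pi _R)\cdot \frac{1}{1-\tfrac{\pi _R^2}{2}} = \frac{\pi _R-\tfrac{\pi _R^2}{2}+1-\pi _R}{1-\tfrac{\pi _R^2}{2}}=1. \end{aligned}$$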

Claim 5.2

Conditioned on the connecting vertex having a label of a certain color in \(\mathrm{SWT}^U\), with high probability, it is chosen uniformly at random among the active labels of that color in \(\mathrm{SWT}^U\).

Proof

We show the statement first for the color blue. Recall that a blue label is chosen uniformly in [n]. Since the restriction of a uniform distribution to any set is again uniform, the probability of connection is the same for every distinct blue active label in \(\mathrm{SWT}^U\). Recall that the number of distinct labels (those that are neither thinned nor ghosts) is called the effective size, and it is treated in Corollary 2.15.

The issue, however, is that some labels in the branching process approximation are multiply active, and these are neither thinned nor declared ghosts; if chosen, they would distort the uniform probability of connection. However, Corollary 2.15 implies that the fraction of multiply active labels tends to 0 at time \(t_n\). Hence, if we pick an active label in \(\mathrm{A}_B^U(t_n)\), it has multiplicity 1 w.h.p., counting both red and blue instances. (This also implies that asymptotically the label has a well-defined color.) Hence, with high probability, at the connecting vertex we have a uniform distribution over all possible blue active labels.

An analogous argument can be carried through for red active labels as well, using the fact that the centre of the interval they belong to is chosen uniformly, and the fact that the proportion of multiply active red labels tends to 0 at time \(t_n\). \(\square \)

To finish the central limit theorem for \(G^{(U)}(Y)\), we use a general result of Kharlamov [28] about the generation of a uniformly chosen active individual in a given type-set of a multi-type branching process. For this, consider a type-set \(\mathcal S\) of a multi-type branching process, and let \(\mathcal {A}_{\mathcal {S}} = \cup _{q \in \mathcal {S}} \mathcal {A}_q\) be the set of active individuals whose type belongs to \(\mathcal {S}\). Then [28, Theorem 2] states that the generation of a uniformly chosen individual in \(\mathcal {A}_{\mathcal S}\) satisfies a central limit theorem with asymptotic mean and variance that are independent of the choice of \(\mathcal S\).

To apply this result, first pick \(\mathcal {S}:=\{R,B\}\) in our case. Then, the statement simply turns into a CLT of the generation of a uniformly picked active individual. We have seen when treating \(G^{(V)}(Y)\) that the asymptotic mean and variance are both \(\frac{\lambda +1}{2\lambda } \log n\) in this case.
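That is, writing \(W\) for a uniformly chosen active individual at time \(t_n\) and \(Z\) for a standard normal random variable (notation introduced only for this display),

$$\begin{aligned} \frac{G^{(U)}(W) - \frac{\lambda +1}{2\lambda } \log n}{\sqrt{\frac{\lambda +1}{2\lambda } \log n}} \buildrel {d}\over {\longrightarrow } Z. \end{aligned}$$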

Now apply the result again for \(\mathcal {S}:=\{R\}\) and \(\mathcal {S}:=\{B\}\), separately. Combined with the previous observation, we get that an individual chosen uniformly at random with color blue/red, respectively, also satisfies a CLT with the same asymptotic mean and variance. This, combined with Claim 5.2, implies that whether Y is red or blue in \(\mathrm{SWT}^U\), its generation \(G^{(U)}(Y)\) admits a central limit theorem with mean and variance \(\frac{\lambda +1}{2\lambda } \log n\). This completes the proof of Theorem 1.2.