1 Introduction

Experimental evidence [5, 30, 34] suggests that in various biological systems, the complex kinetics of genetic control is reasonably well approximated by Boolean network models. Models of this kind were first formulated by Kauffman [21]. Over the last few years, these models have received significant attention, both at the level of model formulations and numerical simulations (see e.g. the surveys [19, 20, 22, 28, 32] and the references therein) and at the rigorous level (see e.g. [9, 16]). The class of models can be described as follows. Genes are represented by the vertices of a directed graph \(\mathscr {G}_n=\mathscr {G}_n([n], \mathscr {E}_n)\) on n vertices, where \([n]:=\left\{ 1,2,\ldots , n\right\} \) denotes the vertex set and \(\mathscr {E}_n\) denotes the edge set of \(\mathscr {G}_n\). The state \(\eta _t(x)\) of a vertex \(x\in [n]\) at time \(t=0, 1, 2, \ldots \) is either 1 (‘on’) or 0 (‘off’), and each vertex x receives input from the vertices which point to it in \(\mathscr {G}_n\), namely

$$\begin{aligned} Y^x := \{y \in [n]: \langle y,x\rangle \in \mathscr {E}_n\}. \end{aligned}$$

\(Y^x\) is called the input set and its members, say \(\{Y^x_i\}_{i=1}^{|Y^x|}\), are called the input vertices for x. The states \(\{{\varvec{\eta }}_t := (\eta _t(1), \eta _t(2), \ldots , \eta _t(n))\}_{t \geqslant 0}\) start from some initial configuration \({\varvec{\eta }}_0\) and evolve according to the update rule

$$\begin{aligned} \eta _{t+1}(x) = f_x\left( \eta _t(Y^x_1), \eta _t(Y^x_2), \ldots , \eta _t\left( Y^x_{|Y^x|}\right) \right) , \qquad x\in [n], \end{aligned}$$
(1.1)

where each \(f_x:\left\{ 0,1\right\} ^{|Y^x|}\rightarrow \left\{ 0,1\right\} \) is some Boolean function defined on the set of possible states of the input vertices for x. Here and later we use |A| to denote the size of a set A.

In order to understand general properties of such dynamical systems, various random Boolean network models have been formulated. These models form an important subfamily of Boolean network models. The first such random Boolean network (RBN) model was introduced by Kauffman [21]. The model has only one parameter, namely r (fixed number of inputs per vertex). It consists of the following specification of the constructs in the above general model. The ground graph \(\mathscr {G}_n\) is constructed by letting the input set \(Y^x\) consist of r distinct vertices \(Y^x_1, Y^x_2, \ldots , Y^x_r\), which are uniformly chosen from \([n]{\setminus }\{x\}\). The values \(f_x(\mathbf {v}), x \in [n], \mathbf {v}\in \{ 0, 1\}^{r},\) are assigned independently and each equals 1 with probability 1/2. Later, in a slight generalization of Kauffman’s original model, another parameter p (expression bias) was introduced into the model. In this generalization, which we denote by \(\text {RBN}_{r,p}\), each of the values \(f_x(\mathbf {v}), x \in [n], \mathbf {v}\in \{ 0, 1\}^{r},\) equals 1 with probability p as opposed to 1/2.

In both of the above models, the functions are time-independent, so the values \(\{f_x(\mathbf {v})\}\) can be thought of as entries of a truth table with \(2^r\) rows. Once these values are chosen and fixed, the dynamics of \(\{{\varvec{\eta }}_t\}_{t\geqslant 0}\) then proceeds according to (1.1). Note that, once the values of the functions \(f_x\) and the graph \(\mathscr {G}_n\) are fixed at the beginning, the dynamics of this system is deterministic.
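For concreteness, the construction and update rule above can be sketched in a few lines of Python. This is a minimal illustration only; the helper names `sample_rbn` and `step` are ours, not from the cited works.

```python
import random

def sample_rbn(n, r, p, rng=random):
    """Sample the ground graph and truth tables of RBN_{r,p}: each vertex x
    gets r distinct input vertices chosen uniformly from [n] \\ {x}, and each
    of the 2^r truth-table entries of f_x is 1 independently with prob. p."""
    inputs = {x: rng.sample([y for y in range(n) if y != x], r)
              for x in range(n)}
    tables = {x: [1 if rng.random() < p else 0 for _ in range(2 ** r)]
              for x in range(n)}
    return inputs, tables

def step(eta, inputs, tables):
    """One synchronous update (1.1): eta_{t+1}(x) = f_x applied to the
    current states of the input vertices of x."""
    new = []
    for x in range(len(eta)):
        idx = 0
        for y in inputs[x]:          # encode the input states as a table index
            idx = (idx << 1) | eta[y]
        new.append(tables[x][idx])
    return new

rng = random.Random(0)
inputs, tables = sample_rbn(n=8, r=3, p=0.5, rng=rng)
eta = [rng.randint(0, 1) for _ in range(8)]
for _ in range(5):                   # deterministic once inputs/tables are fixed
    eta = step(eta, inputs, tables)
```

As noted above, once `inputs` and `tables` are fixed, repeated calls to `step` trace out a deterministic trajectory.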

Kauffman’s model has been analyzed in detail for \(r=1\) [16]. The general model \(\text {RBN}_{r,p}\) has been studied extensively via simulations (see e.g. [20, 29]) and using heuristics from Statistical Physics (see e.g., [13, 14]). Derrida and Pomeau [12] have analyzed Kauffman’s model under an additional assumption, which they call

$$\begin{aligned} {\textit{annealed approximation}}:&\text { the input sets } \{Y^x\}_{x\in [n]} \text { and the values } \nonumber \\&\{f_x(\mathbf {v}): x\in [n], \mathbf {v}\in \{0,1\}^r\} \text { are resampled at each time step.} \end{aligned}$$
(1.2)

See Sect. 1.1.1 for more details.

Letting \(d_{\text {Ham}}(\cdot , \cdot )\) denote the Hamming distance between two configurations in \(\{0, 1\}^n\), the argument in [12] (under the additional assumption (1.2)) gives

$$\begin{aligned} \lim _{t\rightarrow \infty } \lim _{n\rightarrow \infty }\frac{d_{\text {Ham}}({\varvec{\eta }}_t, \tilde{{\varvec{\eta }}}_t)}{n} = {\left\{ \begin{array}{ll} 0 &{} \text { if } 2p(1-p)r<1 \\ y^*>0 &{} \text { if } 2p(1-p)r>1\end{array}\right. } \end{aligned}$$
(1.3)

for any two unequal initial configurations \({\varvec{\eta }}_0, \tilde{{\varvec{\eta }}}_0 \in \{0, 1\}^n\). Based on this observation, Derrida and Pomeau argued that

$$\begin{aligned}&\text { the } order\text {-}chaos \text { phase transition curve for } \text {RBN}_{r,p} \text { (in the sense of (1.3))} \nonumber \\&\text { is given by } 2p(1-p)\cdot r = 1. \end{aligned}$$
(1.4)

So, if \(2p(1-p)\cdot r<1\), then the configurations of zeros and ones in \(\text {RBN}_{r,p}\) are expected to be stable under perturbations of the initial configuration. On the other hand, if \(2p(1-p)\cdot r>1\), then a small perturbation of the initial configuration may greatly affect the behavior of the system.

The curve described in (1.4) also appears as a phase transition curve in another approximation of the deterministic dynamical system described in (1.1) using the threshold contact process [9, 25], where the ground graph \(\mathscr {G}_n\) remains fixed through time. Also, the quantity \(y^*\) appearing in (1.3) equals the quasi-stationary density of occupied vertices in the threshold contact process (see the discussion after Theorem 1.5). We shall describe this approximation in Sect. 1.1.2.

Although \(\text {RBN}_{r,p}\) has been studied extensively, the model deals with an idealized setting of homogeneous ground graphs, where every vertex has the same in-degree. This assumption constrains the model in the context of biological applications, where such networks have quite heterogeneous degrees [4]. Suppose the update rule is similar to that for \(\text {RBN}_{r,p}\): given the input sets \(\{Y^x: x \in [n]\}\), where \(Y^x = \{Y^x_1, Y^x_2, \ldots , Y^x_{|Y^x|}\}\), (1.1) holds with the values \(\{f_x(\mathbf {v}): \mathbf {v}\in \left\{ 0,1\right\} ^{|Y^x|}, x \in [n]\}\) chosen via independent coin flips, each value being 1 with probability p and 0 with probability \(1-p\). The natural questions are then whether a similar phase transition occurs for heterogeneous and more complex ground graphs, and, if it does, how the corresponding phase transition curves depend on the underlying parameters describing the properties of the ground graphs.

In the context of gene networks, heterogeneous graphs have been considered in the physics and biology literature. Mainly two classes of such graph models have been formulated.

  (i)

    Graphs with prescribed in-degree: A number of authors (see e.g., [6, 17, 24, 31]) have considered directed graph models with prescribed in-degree distribution. Here, one starts with a probability mass function \(\mathbf {p}^{\text {in}}:= \left\{ p^{\text {in}}_k\right\} _{k\ge 1}\) and chooses the ground graph uniformly at random from the collection of all simple directed graphs having in-degree distribution \(\mathbf {p}^{\text {in}}\). The precise method of construction is described in Sect. 1.2. We denote the associated random Boolean network model, where the ground graph has in-degree distribution \(\mathbf {p}^{\text {in}}\) and the update rule is similar to that of \(\text {RBN}\) with expression bias p, by \(\text {RBN}^{\text {in}}(\mathbf {p}^{\text {in}},p)\). It has been argued in the papers [6, 17, 24, 31] that under the additional assumption (1.2)

    $$\begin{aligned}&\text {the order-chaos phase transition curve for } \text {RBN}^{\text {in}}(\mathbf {p}^{\text {in}},p) \text { (in the sense of (1.3)) } \nonumber \\&\text { is given by }2p(1-p)\cdot r^{\text {in}} = 1, \text { where } r^{\text {in}} := \sum _k k\cdot p^{\text {in}}_k \text { is the average in-degree.}\nonumber \\ \end{aligned}$$
    (1.5)

Remark 1.1

An analogous model can be built, where one starts with an out-degree distribution \(\mathbf {p}^{\text {out}}=\{p^{\text {out}}_k\}_{k\geqslant 0}\) and the ground graph is chosen uniformly at random from the collection of all directed graphs having out-degree distribution \(\mathbf {p}^{\text {out}}\). In that case, the associated in-degree distribution turns out to be asymptotically Poisson with mean \(\sum _k kp^{\text {out}}_k\), which means that a positive fraction of vertices will have no input vertices. For these vertices the update rule cannot be defined, so we avoid this particular model. We did not find this model in the physics and biology literature either.

  (ii)

    Graphs with prescribed joint distribution of in-degree and out-degree: Another class of models has been considered in order to incorporate correlation between the in-degree and out-degree by prescribing their joint distribution [23]. These models are more complex in nature. Here, one starts with a bivariate probability mass function \(\mathbf {p}^{\text {in,out}}=\{p^{\text {in,out}}_{k,l}\}_{k\geqslant 1, l\geqslant 0}\) and chooses a graph uniformly at random from the collection of all graphs for which the joint distribution of in-degree and out-degree is \(\mathbf {p}^{\text {in,out}}\). We defer a complete description of the construction to Sect. 1.2. We denote the analogue of \(\text {RBN}^{\text {in}}\) with joint distribution \(\mathbf {p}^{\text {in,out}}\) and expression bias p by \(\text {RBN}^{\text {in,out}}(\mathbf {p}^{\text {in,out}}, p)\). It is noted in [23] that under the additional assumption (1.2),

    $$\begin{aligned}&\text {the order-chaos phase transition curve for } \text {RBN}^{\text {in,out}}(\mathbf {p}^{\text {in,out}},p) \text { (in the sense of (1.3)) } \nonumber \\&\text { is } 2p(1-p) \cdot \frac{r^{\text {in,out}}}{r^{\text {in}}}=1, \text { where } r^{\text {in,out}} := \sum _{k,l} k\cdot l\cdot p^{\text {in,out}}_{k,l}, r^{\text {in}} := \sum _{k,l} k\cdot p^{\text {in,out}}_{k,l}. \nonumber \\ \end{aligned}$$
    (1.6)

One of the aims of the present paper is to prove that the curves described in (1.5) and (1.6) represent the phase transition curves for the respective RBN models, if we approximate the dynamics by the threshold contact process. See Sect. 1.1.2 for details about this approximation and the definition of the threshold contact process dynamics.

1.1 Approximations to Boolean networks

Proving rigorous results about the RBN models, which are formulated as discrete dynamical systems, turns out to be quite hard. In order to understand the phase transitions of these models and have deeper insight about their behavior, some approximations have been proposed.

1.1.1 Annealed approximations of \(\text {RBN}\) models

As mentioned earlier, Derrida and Pomeau [12] have used the annealed approximation (see (1.2)) in their analysis of \(\text {RBN}_{r,p}\). Recall that \(d_{\text {Ham}}({\varvec{\eta }}, \tilde{{\varvec{\eta }}}) := \sum _{x\in [n]} |\eta (x)-\tilde{\eta }(x)|\) denotes the Hamming distance between two configurations \({\varvec{\eta }}, \tilde{{\varvec{\eta }}} \in \{0, 1\}^n\). A standard calculation (see [12]) suggests that if \({\varvec{\eta }}_0\) and \(\tilde{{\varvec{\eta }}}_0\) are two initial configurations in \(\text {RBN}_{r,p}\) and \(n^{-1}d_{\text {Ham}}({\varvec{\eta }}_0, \tilde{{\varvec{\eta }}}_0)=x \in [0,1]\), then after one time step

$$\begin{aligned} d_{\text {Ham}}({\varvec{\eta }}_1, \tilde{{\varvec{\eta }}}_1) \sim Binomial\left( n, 2p(1-p)[1-(1-x)^r]\right) . \end{aligned}$$
(1.7)

Now, under the additional assumption (1.2), one can iterate (1.7) to obtain

$$\begin{aligned}&d_{\text {Ham}}({\varvec{\eta }}_{t+1}, \tilde{{\varvec{\eta }}}_{t+1}) \sim Binomial\left( n, 2p(1-p)[1-(1-y_t)^r]\right) , \\&\quad \text { where } y_t = n^{-1}d_{\text {Ham}}({\varvec{\eta }}_{t}, \tilde{{\varvec{\eta }}}_{t}), \end{aligned}$$

for any \(t\geqslant 0\). This in turn implies

$$\begin{aligned} \lim _{t\rightarrow \infty } \lim _{n\rightarrow \infty }\frac{d_{\text {Ham}}({\varvec{\eta }}_t, \tilde{{\varvec{\eta }}}_t)}{n} = y^*,\end{aligned}$$
(1.8)

where \(y^*\) is a fixed point (with x in its basin of attraction) of the map \(y \mapsto \varphi (y) := 2p(1-p)(1-(1-y)^r)\). If \(2p(1-p)r<1\), then 0 is the only fixed point of \(\varphi \) and it is stable. On the other hand, if \(2p(1-p)r>1\), then \(\varphi \) has another fixed point \(y^* \in (0,1)\) which is stable, and 0 is an unstable fixed point. So, (1.3) holds under assumption (1.2).
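The fixed-point behavior of \(\varphi \) is easy to check numerically; the following sketch (ours, for illustration) iterates \(\varphi \) in a subcritical and a supercritical case.

```python
def phi(y, p, r):
    """The annealed one-step map for the normalized Hamming distance:
    phi(y) = 2p(1-p)(1 - (1-y)^r)."""
    return 2 * p * (1 - p) * (1 - (1 - y) ** r)

def limiting_distance(p, r, y0=0.5, iters=10_000):
    """Iterate phi starting from y0; the iterates converge to the stable
    fixed point with y0 in its basin of attraction."""
    y = y0
    for _ in range(iters):
        y = phi(y, p, r)
    return y

# subcritical: 2p(1-p)r = 0.5 < 1, so the normalized distance dies out
y_sub = limiting_distance(0.5, 1)
# supercritical: 2p(1-p)r = 1.5 > 1; here y* solves y^2 - 3y + 1 = 0,
# i.e. y* = (3 - sqrt(5))/2
y_super = limiting_distance(0.5, 3)
```

For \(p=1/2\), \(r=3\), the nonzero fixed point can be found in closed form, which makes a convenient sanity check on the iteration.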

A similar idea has been used in [6, 17, 23, 24, 31] to obtain the phase transition curves described in (1.5) and (1.6) for the respective RBN models under the additional assumption (1.2).

1.1.2 Approximation by threshold contact process

The authors in [9] consider a different process, called the threshold contact process, to approximate the dynamics in RBN models. To motivate this process and explain its connection to the RBN models, a new process \(\{\varvec{\zeta }_t=(\zeta _t(1), \zeta _t(2), \ldots , \zeta _t(n))\}_{t\geqslant 1}\) is considered in [9], where

$$\begin{aligned} \zeta _t(x) := \mathbf {1}_{\{\eta _t(x) \ne \eta _{t-1}(x)\}} \end{aligned}$$

is the exclusive OR of \(\eta _t(x)\) and \(\eta _{t-1}(x)\). Fix a vertex \(x\in [n]\). If at least one of the inputs \(y\in Y^x\) changes its state between time epochs \(t-1\) and t so that \(\eta _t(y) \ne \eta _{t-1}(y)\), then the state of vertex x at time \(t+1\) is computed by looking at a different entry of \(f_x\).

See Fig. 1 for an example. Hence,

$$\begin{aligned}&{{\mathrm{\mathbb {P}}}}(\zeta _{t+1}(x) = 1| \zeta _t(Y^x_i), 1\leqslant i\leqslant |Y^x|) \\&\quad = {{\mathrm{\mathbb {P}}}}\left( f_x\left( \eta _{t-1}(Y^x_1), \ldots , \eta _{t-1}\left( Y^x_{|Y^x|}\right) \right) \right. \\&\qquad \qquad \left. \left. \ne f_x\left( \eta _{t}(Y^x_1), \ldots , \eta _{t}\left( Y^x_{|Y^x|}\right) \right) \right| \zeta _t(Y^x_i), 1\leqslant i\leqslant |Y^x| \right) \\&\quad = {\left\{ \begin{array}{ll} 2p(1-p) &{} \text { if } \max _{1\leqslant i \leqslant |Y^x|} \zeta _t(Y^x_i)=1 \\ 0 &{} \text { otherwise}\end{array}\right. }. \end{aligned}$$

The conditional probability of the intersection \(\cap _i \{\zeta _{t+1}(x_i) = 1\}\), for distinct vertices \(x_i\), is the product of the probabilities of the individual events, as the functions \(f_{x_i}\) are independent.

Fig. 1

a The states of \(\{Y^x_i\}_{i=1}^4\) are the same at times \(t-1\) and t. Hence \(\eta _t(x)=\eta _{t+1}(x)\). But in b the state of \(Y^x_4\) differs between times \(t-1\) and t. Hence \(\eta _t(x)\ne \eta _{t+1}(x)\) with probability \(2p(1-p)\), as the two values of \(f_x\) are independent biased coin flips

This observation has motivated the authors in [9] to consider the following Markov process.

Definition 1.2

The discrete-time threshold contact process with parameter \(q \in (0,1)\) on a directed graph having vertex set [n], where \(Y^x\) denotes the set of input vertices for \(x \in [n]\), is the Markov process \(\left\{ \varvec{\xi }_t :=(\xi _t(1), \ldots , \xi _t(n))\right\} _{ t\ge 0}\) on \(\{0,1\}^n\) for which the evolution dynamics is

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \left. \xi _{t+1}(x)=1 \right| \varvec{\xi }_s, s\leqslant t\right) ={\left\{ \begin{array}{ll} q &{} \text { if } \xi _t(y)=1 \text { for at least one } y\in Y^x \\ 0 &{} \text { otherwise} \end{array}\right. }. \end{aligned}$$
(1.9)

Conditional on the state up to time t, the decisions on the values of \(\xi _{t+1}(x), x \in [n]\), are independent.
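A minimal simulation sketch of the update rule (1.9) (ours, for illustration; the dictionary `inputs` plays the role of the input sets \(Y^x\)):

```python
import random

def tcp_step(xi, inputs, q, rng=random):
    """One step of the threshold contact process: vertex x is occupied at
    time t+1 with probability q if at least one input vertex y in Y^x is
    occupied at time t, and is vacant otherwise; conditional on the current
    state, the decisions are made independently across vertices."""
    return [1 if any(xi[y] for y in inputs[x]) and rng.random() < q else 0
            for x in range(len(xi))]

# e.g. start from the all-one configuration on a toy graph with two
# inputs per vertex and track the density of occupied vertices
inputs = {x: [(x - 1) % 10, (x + 1) % 10] for x in range(10)}
xi = [1] * 10
for _ in range(20):
    xi = tcp_step(xi, inputs, q=0.6)
density = sum(xi) / len(xi)
```

The toy graph above is a hypothetical example; any directed graph given by its input sets can be plugged in.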

Since the dynamics of \(\varvec{\xi }_t\) approximates that of \(\varvec{\zeta }_t\), ‘rapid convergence of \(\varvec{\xi }_t\) to all-zero configuration’ corresponds to the ordered behavior, and ‘prolonged persistence of \(\varvec{\xi }_t\) with positive density of 1’s’ corresponds to the chaotic behavior of the associated \(\text {RBN}\) models.

The threshold contact process is a variant of the basic contact process, which was introduced by Harris [1] and has been studied extensively on square lattices and infinite homogeneous trees (see [35] for details) and, more recently, on random graphs [10, 11]. The threshold contact process is an important interacting particle system in its own right. This process has been studied on square lattices in the context of ‘coexistence’ and comparison with the ‘threshold voter model’ (see Section II.2 of [35]) and in the continuum [36]. It has also been studied on random graphs recently [9, 25, 37].

For the remainder of the paper we will write \(q=2p(1-p)\). From the description of the threshold contact process, it is not hard to see that, starting from any initial configuration of zeros and ones, \(\xi _1(x)=0\) for all \(x \in [n]\) with probability at least \((1-q)^n\). So \(\{\xi _t(x)\}_{x\in [n],t\geqslant 0}\) can avoid the all-zero configuration for at most exponentially (in n) long time.

Definition 1.3

For a stochastic process \(\{\varvec{\xi }_t\}_{t\geqslant 0}\) on \(\{0,1\}^n\) (resp. a set valued process \(\{\xi _t\}_{t\geqslant 0}\) on \(\{A: A\subset [n]\}\)), if there are constants \(c>0\) and \(\rho \in (0,1)\) such that, with probability tending to 1 as n goes to infinity, \(n^{-1} \sum _{x\in [n]}\xi _t(x)\) (resp. \(n^{-1}|\xi _t|\)) stays above \(\rho \) for time at least \(e^{cn}\), then we say that \(\{\varvec{\xi }_t\}_{t\geqslant 0}\) (resp. \(\{\xi _t\}_{t\geqslant 0}\)) has the “exponential persistence” property.

For the threshold contact process on random directed graphs with fixed in-degree the following has been proved in [9].

Theorem 1.4

[9] Suppose \({{\mathrm{\mathbb {P}}}}_n\) denotes the annealed probability distribution of threshold contact process \(\{\xi _t\}\) (defined in Definition 1.2) with parameter q starting from all-one configuration on a random graph chosen uniformly from all directed graphs on n vertices in which each vertex has in-degree r. Also let \(\rho \) be the survival probability of a branching process with offspring distribution \(q{\varvec{\delta }}_r+(1-q){\varvec{\delta }}_0\).

  1.

    If \(q(r-1)>1\), then for every \(\eta >0\) there exists \(c>0\) such that

    $$\begin{aligned} {{\mathrm{\mathbb {P}}}}_n\left( \inf _{0\leqslant t\leqslant \exp (cn)}\frac{|\xi _t|}{n} > \rho -\eta \right) \rightarrow 1 \quad \text {as } n\rightarrow \infty . \end{aligned}$$
    (1.10)
  2.

    If \(qr>1\), then for every \(\eta >0\) there exists \(c>0\) and \(b=b(q) \in (0,1)\) such that

    $$\begin{aligned} {{\mathrm{\mathbb {P}}}}_n\left( \inf _{0\leqslant t\leqslant \exp (cn^b)}\frac{|\xi _t|}{n} > \rho -\eta \right) \rightarrow 1 \quad \text { as } n\rightarrow \infty . \end{aligned}$$

So, in the case of the ‘fixed in-degree’ random directed graph, “exponential persistence” was proved only when \(q>1/(r-1)\). For q between 1/r and \(1/(r-1)\), the persistence time obtained in [9] was weaker than exponential. Later, the result was improved in [25] to yield “exponential persistence” for any \(q > 1/r\).

Theorem 1.5

[25] In the set-up of Theorem 1.4, if \(qr>1\), then for every \(\eta >0\) there exists \(c>0\) such that (1.10) holds.

These two results, together with a soft argument for the case ‘\(qr<1\)’, confirm that the curve appearing in (1.4) represents the phase transition curve for the threshold contact process approximating \(\text {RBN}_{r,p}\).

Moreover, the quantity \(\rho \) appearing in Theorems 1.4 and 1.5 agrees with the limiting fraction \(y^*\) appearing in (1.3) and (1.8), because if we write \(\theta =1-y^*\), then the definition of \(y^*\) gives

$$\begin{aligned} 1-\theta =2p(1-p)(1-\theta ^r) \text { which means } \theta =1-2p(1-p)+2p(1-p)\theta ^r. \end{aligned}$$

Hence, from the theory of branching processes, \(\theta \) is the extinction probability of a branching process (starting from one individual) with offspring distribution \((1-2p(1-p)){\varvec{\delta }}_0+2p(1-p){\varvec{\delta }}_r\), which implies \(\theta =1-\rho \) and hence \(\rho =y^*\).
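This identity can be checked numerically; a small sketch (ours) for \(p=1/2\), \(r=3\), so \(q=2p(1-p)=1/2\):

```python
p, r = 0.5, 3
q = 2 * p * (1 - p)

# extinction probability: minimal solution of theta = (1-q) + q*theta^r,
# obtained by iterating from theta = 0 (standard for branching processes)
theta = 0.0
for _ in range(10_000):
    theta = (1 - q) + q * theta ** r

# y*: stable fixed point of y -> q*(1 - (1-y)^r), as in Sect. 1.1.1
y = 0.5
for _ in range(10_000):
    y = q * (1 - (1 - y) ** r)

# rho = 1 - theta agrees with y*
assert abs((1 - theta) - y) < 1e-9
```

Both iterations converge geometrically here, so a few thousand steps give the fixed points to machine precision.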

The main difference between the processes \(\{\varvec{\xi }_t\}\) and \(\{\varvec{\zeta }_t\}\) is that the first (a Markov process) enjoys a lot of independence, unlike the second (a non-Markovian deterministic process). However, the two processes agree in their one-step transition probabilities. This relation is similar in spirit to that between the \(\text {RBN}\) models and their annealed approximations, except that for the threshold contact process the ground graph remains fixed through time, unlike in the case of the annealed approximation.

1.2 Construction of random directed ground graphs

The last few decades have seen enormous activity in the study of random graphs. Many authors have contributed to unearthing different properties and applications of random graphs (both directed and undirected). Early work in this area goes back to Bender and Canfield [3]. We refer the reader to the books by Bollobás [2] and Durrett [15] and the references therein for more about random graphs and related areas. The models described here already appear in the literature; e.g., the method of construction of the ground graph in \(\text {RBN}^{\text {in,out}}\) can be found in [26], and the analogue of the ground graph in \(\text {RBN}^{\text {in}}\) is known as “random r-out” in the literature.

However, in order to be self-contained and consistent in notation, we provide here a precise mathematical formulation of the ground graph models which are used for defining \(\text {RBN}^{\text {in}}\) and \(\text {RBN}^{\text {in,out}}\).

1.2.1 Construction of the graph for \(\text {RBN}^{\text {in}}\)

For \(\text {RBN}^{\text {in}}\), we start with a prescribed in-degree distribution \(\mathbf {p}^{\text {in}}=\{p^{\text {in}}_k\}_{k\geqslant 1}\) having finite mean \(r^{\text {in}} := \sum _k k \cdot p^{\text {in}}_k < \infty \). Following [9], we construct the (directed) ground graph \(\mathscr {G}_n = \mathscr {G}_n([n], \mathscr {E}_n)\) having in-degree distribution \(\mathbf {p}^{\text {in}}\) as follows. First, we choose the in-degrees \(I_1, I_2, \ldots , I_n \) independently with common distribution \(\mathbf {p}^{\text {in}}\). Then, for each vertex \(x \in [n]\) we construct the corresponding input set \(Y^x:=\{Y^x_1, Y^x_2, \ldots , Y^x_{I_x}\}\) by choosing \(I_x\) many distinct vertices uniformly from \([n] {\setminus }\{x\}\). Finally we place oriented edges from these chosen vertices to x to obtain the edge set

$$\begin{aligned} \mathscr {E}_n := \{\langle Y^x_i, x\rangle : x\in [n], 1\leqslant i\leqslant I_x\} \end{aligned}$$

of the graph \(\mathscr {G}_n\). In other words, if we write \({{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}\) for the conditional (“quenched”) distribution of \(\mathscr {G}_n\) given the in-degree sequence \(\mathbf {I}=(I_1, I_2, \ldots , I_n)\) and \({{\mathrm{\mathbb {P}}}}_{1,n}\) for the unconditional (“annealed”) distribution of \(\mathscr {G}_n\), and if \(E_{zx}\) denotes the number of directed edges from vertex z to x, then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{1,n}(\cdot ) := \sum _{\mathbf {I}\in \mathbb {N}^n} {{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}(\cdot ) \cdot \mathbf {p}^{\text {in}}_{\otimes n}(\mathbf {I}), \end{aligned}$$
(1.11)

where \(\mathbf {p}^{\text {in}}_{\otimes n}\) is the product measure on \(\mathbb {N}^n\) with marginal \(\mathbf {p}^{\text {in}}\), and

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}} (E_{z,x}=e_{z,x},\ z, x\in [n]) = 1\bigg /\prod _{i=1}^n \binom{n-1}{I_i} \quad \text {if } e_{z,x}\in \{0,1\},\ e_{x,x}=0,\ \sum _{z=1}^n e_{z,x}=I_x, \end{aligned}$$
(1.12)

for all \(x\in [n]\), and \({{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}(E_{z,x}=e_{z,x}, 1\leqslant z, x\leqslant n)=0\) otherwise. Thus, under \({{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}\), \(\mathscr {G}_n\) is distributed uniformly over all simple directed graphs having in-degree sequence \(\mathbf {I}\).
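The construction above translates directly into code; a sketch for illustration (the dict-based encoding of \(\mathbf {p}^{\text {in}}\) is our own choice):

```python
import random

def sample_indegree_graph(n, p_in, rng=random):
    """Ground graph for RBN^in: draw in-degrees I_1, ..., I_n i.i.d. from
    p_in (a dict k -> p_k on {1, 2, ...}), then give each vertex x an input
    set of I_x distinct vertices chosen uniformly from [n] \\ {x}; every
    chosen input y contributes the oriented edge <y, x>."""
    ks, ws = zip(*sorted(p_in.items()))
    inputs, edges = {}, set()
    for x in range(n):
        I_x = rng.choices(ks, weights=ws)[0]
        inputs[x] = rng.sample([y for y in range(n) if y != x], I_x)
        edges.update((y, x) for y in inputs[x])
    return inputs, edges
```

Conditional on the drawn in-degree sequence, the resulting graph is uniform over all simple directed graphs with that in-degree sequence, in agreement with (1.12).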

1.2.2 Construction of the graph for \(\text {RBN}^{\text {in,out}}\)

We follow the construction procedure of Newman et al. [26, 27] to obtain the configuration model. We start with a prescribed joint distribution for in-degree and out-degree \(\mathbf {p}^{\text {in,out}}=\{p^{\text {in,out}}_{k,l}\}_{k\geqslant 1, l\geqslant 0}\) such that \(r^{\text {in,out}} := \sum _{k,l} k\cdot l\cdot p^{\text {in,out}}_{k,l} < \infty \) and \(r^{\text {in}} := \sum _{k,l} k\cdot p^{\text {in,out}}_{k,l} = \sum _{k,l} l \cdot p^{\text {in,out}}_{k,l}=:r^{\text {out}}\). It is easy to see that the latter condition is necessary for \(\mathbf {p}^{\text {in,out}}\) to be eligible for our purpose. Let \(\left\{ (I_i, O_i)\right\} _{i=1}^n\) be i.i.d. with common distribution \(\mathbf {p}^{\text {in,out}}\); here \(I_x\) and \(O_x\) denote the in-degree and out-degree of vertex x respectively. We need to condition on the event

$$\begin{aligned} E_n := \left\{ \sum _{i=1}^n I_i = \sum _{i=1}^n O_i\right\} \end{aligned}$$
(1.13)

in order to have a valid degree sequence. Having chosen the degree sequence \((\mathbf {I},\mathbf {O})=((I_1,O_1), \ldots , (I_n,O_n))\), we allocate \(I_x\) many “inward arrows” and \(O_x\) many “outward arrows” for vertex x. Then we pick a uniform random matching between the sets of inward and outward arrows. If one of the inward arrows of x is matched with one of the outward arrows of z, then we let \(\langle z,x\rangle \in \mathscr {E}_n\). We will use \({{\mathrm{\mathbb {P}}}}_{2,\mathbf {I},\mathbf {O}}\) to denote the conditional (“quenched”) distribution of \(\mathscr {G}_n\) given the in-degree and out-degree sequence \((\mathbf {I},\mathbf {O})\). We also condition on the event

$$\begin{aligned} F_n:=\{ \mathscr {G}_n \text { is simple}\}, \end{aligned}$$
(1.14)

i.e., \(\mathscr {G}_n\) contains neither a self-loop at any vertex nor multiple edges between any pair of vertices. So if \({{\mathrm{\mathbb {P}}}}_{2,n}\) denotes the unconditional (“annealed”) distribution of \(\mathscr {G}_n\), then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{2,n}(\cdot ) = \sum _{\mathbf {I},\mathbf {O}\in \mathbb {N}^n} {{\mathrm{\mathbb {P}}}}_{2,\mathbf {I},\mathbf {O}}(\cdot | F_n) \cdot \mathbf {p}^{\text {in,out}}_{\otimes n}((\mathbf {I},\mathbf {O}) | E_n), \end{aligned}$$
(1.15)

where \(\mathbf {p}^{\text {in,out}}_{\otimes n}\) is the product measure on \((\mathbb {N}^2)^n\) with marginal \(\mathbf {p}^{\text {in,out}}\).

Remark 1.6

Given the prescribed in-degree distribution \(\mathbf {p}^{\text {in}}\) having finite mean, one can construct a graph \(\mathscr {G}_n\) by following the construction of \(\mathscr {G}_n\) for \(\text {RBN}^{\text {in,out}}\) (see Sect. 1.2.2) corresponding to any \(\mathbf {p}^{\text {in,out}}\) with marginal \(\mathbf {p}^{\text {in}}\). But the quenched distribution of \(\mathscr {G}_n\) will no longer be uniform over all candidates.

1.3 Dynamics

Once the ground graph has been fixed via one of the above construction procedures, we will be interested in properties of the threshold contact process \(\left\{ \varvec{\xi }_t\right\} _{t\ge 0}\) as defined in Definition 1.2 with parameter \(q=2p(1-p) \in [0, 1]\). We will often view this as a set valued process \(\left\{ \xi _t\right\} _{t\geqslant 0}\), where \(\xi _t := \left\{ x\in [n]: \xi _t(x)=1\right\} \). We will sometimes refer to \(\xi _t\) as the set of occupied vertices and \([n]{\setminus } \xi _t\) as the set of vacant vertices at time t. We will write \({{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\) for the distribution of the threshold contact process \(\{\xi _t\}_{t\geqslant 0}\) with parameter q, conditioned on the ground graph \(\mathscr {G}_n\). For a fixed set \(A\subseteq [n]\), we will write \(\left\{ \xi _t^A\right\} _{t\ge 0}\) for the process started with \(\xi _0^A= A\).

1.4 Main results

As described in Sects. 1.2 and 1.3, there are two layers of randomness: one corresponds to the distribution of the ground graph \(\mathscr {G}_n\), and the other corresponds to the distribution of the threshold contact process conditioned on the ground graph \(\mathscr {G}_n\).

We need another ingredient to present our main results. For a probability distribution \({\varvec{\mu }}\) on \(\{0, 1, \ldots \}\) and \(q\in [0,1]\) let \({\varvec{\pi }}({\varvec{\mu }},q) \in [0,1]\) denote the survival probability of the branching process with offspring distribution \((1-q) {\varvec{\delta }}_0+q{\varvec{\mu }}\) starting from one individual. From branching process theory,

$$\begin{aligned} {\varvec{\pi }}({\varvec{\mu }},q)=1-\theta , \end{aligned}$$
(1.16)

where \(\theta \in [0,1]\) is the minimum value satisfying \(\theta = 1 - q + \sum _{k\geqslant 0}q\cdot {\varvec{\mu }}\{k\} \cdot \theta ^k\). Using the above ingredients, we now present our main results.

Theorem 1.7

For any probability distribution \(\mathbf {p}^{\text {in}}=\{p^{\text {in}}_k\}_{k \ge 1}\) on \(\mathbb {N}\), let \({{\mathrm{\mathbb {P}}}}_{1,n}\) be the probability distribution (as defined in (1.11)) on the set of random directed simple graphs on n vertices having in-degree distribution \(\mathbf {p}^{\text {in}}\). Suppose \(\mathbf {p}^{\text {in}}\) has finite second moment, \(p^{\text {in}}_1=0\) and mean \(r^{\text {in}}>2\). Let \(q>1/r^{\text {in}}\) and \(\pi :={\varvec{\pi }}(\mathbf {p}^{\text {in}},q)>0\) be the branching process survival probability as defined in (1.16). For any \(\varepsilon >0\) there is a constant \(\Delta (\varepsilon )>0\) and a ‘good’ set of graphs \(\mathcal {G}_n\) satisfying \({{\mathrm{\mathbb {P}}}}_{1,n}(\mathcal {G}_n) = 1-o(1)\) such that if \(\mathscr {G}_n \in \mathcal {G}_n\) and if \(\{t_n\}\) is any sequence with \(t_n \leqslant \exp (\Delta n)\) and \(\lim _{n\rightarrow \infty } t_n=\infty \), then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \inf _{0\leqslant t \leqslant \exp (\Delta n)}\frac{1}{n} \left| \xi ^{[n]}_t\right| \geqslant \pi -\varepsilon , \sup _{t_n\leqslant t \leqslant \exp (\Delta n)}\frac{1}{n} \left| \xi ^{[n]}_t\right| \leqslant \pi +\varepsilon \right) = 1- o(1). \end{aligned}$$

Moreover, if q satisfies \(q\cdot r^{\text {in}} < 1\), then there is a constant \(C(q, r^{\text {in}})>0\) such that \({{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(\xi ^{[n]}_{C\log n} \ne \emptyset )=o(1)\) for all \(\mathscr {G}_n \in \mathcal {G}_n\).

The above theorem proves (1.5) for the threshold contact process. It also proves “exponential persistence” (the maximum possible order of persistence) in the supercritical regime.
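The quantities in Theorem 1.7 are straightforward to evaluate numerically; a sketch (ours) with a hypothetical in-degree distribution:

```python
def survival_prob(mu, q, iters=10_000):
    """pi(mu, q) of (1.16): iterate theta -> 1 - q + q * sum_k mu[k] * theta^k
    starting from theta = 0; the limit is the minimal solution, and the
    survival probability is pi = 1 - theta."""
    theta = 0.0
    for _ in range(iters):
        theta = 1 - q + q * sum(pk * theta ** k for k, pk in mu.items())
    return 1 - theta

p_in = {2: 0.5, 4: 0.5}      # hypothetical pmf with mean r_in = 3 > 2
r_in = sum(k * pk for k, pk in p_in.items())
q_super, q_sub = 0.5, 0.25   # q * r_in = 1.5 > 1 and 0.75 < 1, respectively
pi = survival_prob(p_in, q_super)   # the limiting density in Theorem 1.7
```

Here `pi` is positive in the supercritical case, while for `q_sub` the iteration drives \(\theta \) to 1 and the survival probability vanishes, matching the two regimes of the theorem.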

Remark 1.8

The proof techniques for \(\text {RBN}^{\text {in}}\) can be used (after minor modifications) to prove similar results for nonuniform directed graph models such as (i) directed graphs with communities and (ii) assortative (resp. disassortative) directed graphs. In (i), each node belongs to one of K communities, and the quenched probability \(P(Y^x_i=u)\) is proportional to c(u, x)/n, where c(u, x) depends on the community indices of the vertices u and x. In (ii), \(P(Y^x_i=u)\) increases (resp. decreases) with \(I_u\).

Next we move to \(\text {RBN}^{\text {in,out}}\).

Theorem 1.9

For any bivariate probability distribution \(\mathbf {p}^{\text {in,out}}=\{p^{\text {in,out}}_{k,l}\}_{k\ge 1, l \ge 0}\) on \(\mathbb {N}^2\), let \({{\mathrm{\mathbb {P}}}}_{2,n}\) be the probability distribution (as defined in (1.15)) on the set of random directed simple graphs on n vertices for which the joint distribution of in-degree and out-degree is \(\mathbf {p}^{\text {in,out}}\). Suppose \(p^{\text {in,out}}_{k,l} = 0\) whenever \(k\le 1\), and the marginal distributions corresponding to \(\mathbf {p}^{\text {in,out}}\) have equal mean \(r^{\text {in}}\) and finite second moment. Let \(\tilde{\mathbf {p}}^{\text {in}}=\{\tilde{p}_k^{\text {in}}:= (r^{\text {in}})^{-1}\sum _{l} l \cdot p^{\text {in,out}}_{k,l}\}_{k \geqslant 2}\) be the size-biased in-degree distribution with mean \(\tilde{r}^{\text {in}}\), \(q > 1/\tilde{r}^{\text {in}}\) and \(\pi :={\varvec{\pi }}(\tilde{\mathbf {p}}^{\text {in}},q)>0\) be the branching process survival probability as defined in (1.16). For any \(\varepsilon >0\) there is a constant \(\Delta (\varepsilon )>0\) and a ‘good’ set of graphs \(\mathcal {G}_n\) satisfying \({{\mathrm{\mathbb {P}}}}_{2,n}(\mathcal {G}_n) =1-o(1)\) such that if \(\mathscr {G}_n \in \mathcal {G}_n\) and if \(t_n\) is any sequence satisfying \(t_n \leqslant \exp (\Delta n)\) and \(\lim _{n\rightarrow \infty } t_n = \infty \), then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \inf _{t \leqslant \exp (\Delta n)}\frac{1}{n} \left| \xi ^{[n]}_t\right| \geqslant \pi -\varepsilon , \sup _{t_n\leqslant t \leqslant \exp (\Delta n)}\frac{1}{n} \left| \xi ^{[n]}_t\right| \leqslant \pi +\varepsilon \right) =1-o(1). \end{aligned}$$

Moreover, if q satisfies \(q\tilde{r}^{\text {in}} < 1\), then there is a constant \(C(q, \tilde{r}^{\text {in}})>0\) such that \({{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(\xi ^{[n]}_{C\log n} \ne \emptyset )=o(1)\) for all \(\mathscr {G}_n \in \mathcal {G}_n\).

The above theorem proves (1.6) for the threshold-contact process. It also proves “exponential persistence” for the supercritical regime in \(\text {RBN}^{\text {in,out}}\).

1.5 Proof ideas

The subcritical case is relatively easy, as we can couple the dual process with a subcritical branching process. Although the heterogeneity of the ground graph poses technical challenges, these are tackled using a suitable second moment argument.

In the supercritical regime, there are two main tasks: estimating the probability of survival of the dual process starting from (i) a typical single vertex, say x, and (ii) a typical large initial set, say A, with size between \((\log n)^a\) and \(\varepsilon n\) for some \(a>0\) and small \(\varepsilon >0\). To tackle the first, we approximate the small neighborhood of x up to distance \(O(\log \log n)\) by an appropriate finite truncated tree. The approximation may fail for some of the vertices if we go beyond this distance. Hence the survival probability of the dual process starting from x up to time \(O(\log \log n)\) approximately equals that of a suitable supercritical branching process (see Proposition 3.2). Also, using a large deviation estimate for branching processes, we show that when the dual process starting from x survives, its number of occupied vertices reaches size \(O((\log n)^a)\) by time \(O(\log \log n)\) for some \(a>0\).

The remaining part is the most challenging. The existing approaches do not work (see Sect. 1.6 for the reasons), so a new approach is needed. Here we show that there is a ‘good’ set of graphs (see Proposition 3.1) such that whenever the size of the occupied set A in the dual process (irrespective of the location of A) lies between \((\log n)^a\) and \(\varepsilon n\), the oriented neighborhood of A in the edge-reversed graph \(\overleftarrow{\mathscr {G}}_n\) up to distance \(O(\log \log (n/|A|))\) dominates (with very high probability) a suitably truncated forest (see Sect. 2.2) whose offspring distribution is approximately equal to the out-degree distribution of \(\mathscr {G}_n\). So, it suffices to bound from below the size of the dual process on such truncated forests. Using this estimate on forests (see Lemma 4.1) we show that the size of the dual process starting from A exceeds |A| within time \(O(\log \log (n/|A|))\) with exponentially small error probability (see Proposition 4.2).

Combining the estimates of parts (i) and (ii), it follows that the dual process starting from a single vertex survives with probability given by the survival probability of a branching process, and when it survives, it maintains positive density for an exponentially long time. This result enables us to prove our main results.

1.6 Discussion

Needless to say, one of the main challenges in proving prolonged persistence in the supercritical regime for the threshold contact process that we consider is the heterogeneity of the ground graphs. As mentioned earlier, this process on uniformly chosen directed graphs with constant in-degree r has been analyzed in [9]. Suppose \(G_{n,r}\) denotes such a graph. In order to prove prolonged persistence for the threshold contact process on \(G_{n,r}\), the authors of [9] proved an “isoperimetric inequality” for small subsets of vertices in \(G_{n,r}\).

Theorem 1.10

[9] For \(U\subset [n]\) let

$$\begin{aligned} U^{*1} := \{z \in [n]: \langle z, x\rangle \text { is an edge in } G_{n,r} \text { for some } x \in U \}. \end{aligned}$$

Then for any \(\eta >0\) there is an \(\epsilon _0(\eta )>0\) such that for any \(m \leqslant \epsilon _0 n\)

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(\exists U \subset [n] \text { such that } |U|=m, |U^{*1}|\leqslant (r-1-\eta )m) \leqslant \exp (-\eta m\log (n/m)/2). \end{aligned}$$
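As an illustration of the quantity being bounded (not part of the argument in [9]), the following Python sketch samples a small instance of \(G_{n,r}\) and checks that for a random set U of size m the set \(U^{*1}\), i.e. the union of the input sets of the vertices of U, has size close to rm, comfortably above \((r-1-\eta )m\). All names and parameters here are our own illustrative choices.

```python
import random

def sample_G_nr(n, r, rng):
    # G_{n,r}: each vertex x receives r distinct inputs chosen uniformly from [n] \ {x}.
    return {x: rng.sample([y for y in range(n) if y != x], r) for x in range(n)}

def U_star1(inputs, U):
    # U^{*1} = {z : <z, x> is an edge for some x in U} = union of the input sets over U.
    out = set()
    for x in U:
        out.update(inputs[x])
    return out

rng = random.Random(0)
n, r, m = 2000, 3, 25
inputs = sample_G_nr(n, r, rng)
U = rng.sample(range(n), m)
size = len(U_star1(inputs, U))
# For m much smaller than n, collisions among the r*m uniform draws are rare,
# so |U^{*1}| is typically close to r*m = 75 here.
```

The point of the isoperimetric inequality is that the event \(|U^{*1}|\leqslant (r-1-\eta )m\), which the sketch never observes, is exponentially unlikely simultaneously over all small sets U.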

The idea behind this definition of \(U^{*1}\) is that if U is the set of occupied vertices at time t in the dual coalescing branching process on the edge-reversed graph \(\overleftarrow{G}_{n,r}\), then the vertices in \(U^{*1}\) may be occupied at time \(t+1\); note that \(U^{*1}\) may contain vertices of U. In view of the above isoperimetric inequality, the authors argued that in the case \(q(r-1)>1\), if the number of occupied vertices in the dual process drops below \(\varepsilon n\) for some small \(\varepsilon >0\), then at the next time step the number jumps above \(\varepsilon n\) with exponentially small error probability. This observation helps the authors to prove “exponential persistence” in this case. For the case \(q \in (1/r, 1/(r-1))\) the authors of [9] exploited the locally homogeneous tree-like structure of \(G_{n,r}\) to prove that once the number of occupied vertices in the dual process drops below \(\varepsilon n\), it regains that size at a later time, but with an error probability which is larger than exponential. This gives them a weaker persistence result in this case. This approach depends heavily on the fact that \(G_{n,r}\) has the same in-degree at every vertex, so it cannot be generalized to the heterogeneous case.

In the follow-up paper [25], the authors consider the same process on the same random graph model as in [9]. They also crucially use the locally homogeneous tree-like structure of \(G_{n,r}\) and the fact that \(G_{n,r}\) has fixed in-degree r in all of their main results. They show that there is a number \(g\in \mathbb {N}\) (a constant, but depending on q and r) such that if the number of occupied vertices in the dual process at time 0 is \(m \leqslant \varepsilon n\) for some small \(\varepsilon >0\), then after g time steps the number of occupied vertices increases to \((1+\delta )m\) with exponentially small (in m) error probability. This result allows the authors of [25] to obtain “exponential persistence” for the threshold contact process on \(G_{n,r}\) for any \(q>1/r\). But again, this technique cannot be generalized to heterogeneous ground graphs, as all the arguments work only in the fixed-r case. So, a new approach is required to deal with the heterogeneity and correlations present in the degree distribution, and that is what we present in this paper.

On the other hand, not much is understood about the roles of different initial subsets of occupied sites in the long-run behavior of the dynamics beyond the level of the “first moment method”. It is also not understood whether the dynamics avoids certain ‘bad’ configurations of occupied sites, from which survival is difficult, when q is near the critical value but still in the supercritical regime. So, we need very good control (better than exponential) over the probabilities of unlikely events, as the number of possible initial subsets of size m is super-exponential in m.

Keeping these in mind, the technique used in this paper, namely coupling the dual process with a suitably truncated branching process up to a certain time, seems to be the only effective way to obtain “exponential persistence” in the entire supercritical regime.

However, this technique does not work as well in the critical regime, where the parameters \(p, r^{\text {in}}\) and \(r^{\text {in,out}}\) satisfy the equality in (1.5) and (1.6). Based on the behavior of the critical contact process on the d-dimensional torus, one expects persistence of activity up to a time polynomial in n in this case.

Coming back to the original \(\text {RBN}\) models, both approximations discussed in Sect. 1.1 indicate that the behavior of the original \(\text {RBN}\) models with expression bias p should undergo a phase transition, and the phase transition curve should be \(2p(1-p)\cdot r=1\), where r is the average out-degree in the edge-reversed ground graph. Like many other statistical physics models that have phase transitions, most of the features of the \(\text {RBN}\) models are expected to behave differently in the chaotic and ordered regions. Stability in the sense of (1.3) is one such feature. Based on the observation \(\rho =y^*\) made in Sect. 1.1.2 and the results obtained in this paper, one expects

Conjecture 1.11

For any two realizations \(\{{\varvec{\eta }}_t\}_{t\geqslant 0}\) and \(\{\tilde{{\varvec{\eta }}}_t\}_{t\geqslant 0}\) of \(\text {RBN}^{\text {in}}(\mathbf {p}^{\text {in}}, p)\) (resp. \(\text {RBN}^{\text {in,out}}(\mathbf {p}^{\text {in,out}}, p)\)) starting from different initial configurations, if \(\{\varvec{\zeta }_t\}_{t\geqslant 1}\) is the exclusive OR process of \(\{{\varvec{\eta }}_t\}_{t\geqslant 0}\) and \(\{\tilde{{\varvec{\eta }}}_t\}_{t\geqslant 0}\), then

$$\begin{aligned} \lim _{t\rightarrow \infty }\lim _{n\rightarrow \infty }\frac{1}{n}d_{\text {Ham}}({\varvec{\eta }}_t, \tilde{{\varvec{\eta }}}_t) = \lim _{t\rightarrow \infty }\lim _{n\rightarrow \infty }\frac{1}{n} \sum _{x\in [n]}\zeta _t(x) = \pi , \end{aligned}$$

where \(\pi \) is the survival probability of the branching process (starting from one individual) with offspring distribution \((1-2p(1-p)){\varvec{\delta }}_0+2p(1-p)\mathbf {p}\) in which \(\mathbf {p}\) represents the out-degree distribution of the edge-reversed ground graph, so \(\mathbf {p}=\mathbf {p}^{\text {in}}\) for \(\text {RBN}^{\text {in}}(\mathbf {p}^{\text {in}},p)\) and \(\tilde{\mathbf {p}}^{\text {in}}\) (as defined in Theorem 1.9) for \(\text {RBN}^{\text {in,out}}(\mathbf {p}^{\text {in,out}}, p)\).
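The survival probability \(\pi \) appearing in Conjecture 1.11 can be evaluated numerically: if \(\varphi \) is the generating function of the offspring distribution \((1-2p(1-p)){\varvec{\delta }}_0+2p(1-p)\mathbf {p}\), the extinction probability is the smallest fixed point of \(\varphi \) and \(\pi \) is its complement. A minimal Python sketch, with an illustrative choice of \(\mathbf {p}\) not taken from the paper:

```python
def survival_prob(p_dist, q, tol=1e-12, max_iter=100000):
    """Survival probability of a branching process with offspring law
    (1-q)*delta_0 + q*p_dist, where p_dist = {k: probability of k children}.
    The extinction probability is the smallest fixed point of the pgf
    phi(s) = (1-q) + q * sum_k p_dist[k] * s**k, found by iterating phi from 0."""
    def phi(s):
        return (1 - q) + q * sum(pk * s**k for k, pk in p_dist.items())
    s = 0.0
    for _ in range(max_iter):
        s_new = phi(s)
        if abs(s_new - s) < tol:
            break
        s = s_new
    return 1.0 - s_new

# Illustrative out-degree distribution with mean 2.5 and expression bias p = 0.3,
# so the effective birth probability is q = 2*p*(1-p) = 0.42 and q*mean = 1.05 > 1.
p_dist = {2: 0.5, 3: 0.5}
pi = survival_prob(p_dist, 2 * 0.3 * (1 - 0.3))
# Supercritical, so pi > 0 (roughly 0.06 for these illustrative parameters).
```

Iterating \(\varphi \) from 0 converges monotonically to the smallest fixed point, which is the standard way to locate the extinction probability of a supercritical branching process.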

Other features of \(\text {RBN}\) models include the length of limit cycles, the number of distinct limit cycles, the evolution of the distance between two trajectories starting from different initial configurations, etc. Simulation studies indicate that the length of cycles is \(O(\sqrt{n})\) in the ordered regime and \(O(e^{cn})\) in the chaotic regime (see references [8] and [10] of [12]). Similar changes in behavior for the number of distinct attractors have also been observed [38, 39]. In the physics literature, some authors (see [23] and the references therein) have used the annealed approximation to predict the behavior of different features of the original \(\text {RBN}\) models, but the complete picture is yet to be discovered.

1.7 Organization of the paper

The remainder of the paper is organized as follows. In Sect. 2, we describe the dual process for the threshold contact process, which plays a crucial role in our argument, and give quantitative estimates for the error involved in approximating the local neighborhoods of certain small subsets of vertices in the edge-reversed graph. In this section we also state some auxiliary lemmas, which are used later. Section 3 contains a description of the sets of ‘good’ graphs that appear in the theorems, together with proofs that their probabilities are \(1-o(1)\). Finally, in Sect. 4 we put all the ingredients together to prove the main theorems.

2 Preliminaries

Before jumping into the core of the proof we need some preliminary facts. We begin this section with the definition of the dual process for the threshold contact process, which will play a major role in proving the main results. We also collect asymptotic properties of local neighborhoods of the random graph models in \(\left\{ \text {RBN}^i: i=1, 2\right\} \).

2.1 Dual coalescing branching process

For a given directed graph \(\mathscr {G}_n=([n], \mathscr {E}_n)\), let \(\overleftarrow{\mathscr {G}}_n=([n], \overleftarrow{\mathscr {E}}_n)\) be the reversed (directed) graph obtained by reversing the edges, i.e., \(\overleftarrow{\mathscr {E}}_n:=\{\langle x,y\rangle : \langle y,x\rangle \in \mathscr {E}_n\}\). We write \(x\rightarrow z\) if \(\langle x,z\rangle \in \overleftarrow{\mathscr {E}}_n\), and in that case we will occasionally refer to z as a child of x in \(\overleftarrow{\mathscr {G}}_n\).

Now, using an analogue (see Fig. 2) of Harris’s graphical construction of the basic contact process [1, 35] for the threshold contact process \(\left\{ \xi _t: t\ge 0\right\} \) on the graph \(\mathscr {G}_n\), it is not hard to see that the dual process \(\{\overleftarrow{\xi }_t: t\geqslant 0\}\) is the following coalescing branching process on the edge-reversed graph \(\overleftarrow{\mathscr {G}}_n\). For any \(t\ge 0\), each site of \(\overleftarrow{\xi }_t\) gives birth at time t independently with probability q. If \(x \in \overleftarrow{\xi }_t\) gives birth, all of its children are included in \(\overleftarrow{\xi }_{t+1}\). In other words, every vertex of \(\overleftarrow{\xi }_t\) gives birth with probability q independently across vertices, and \(z\in [n]\) is included in \(\overleftarrow{\xi }_{t+1}\) if there exists \(x\in \overleftarrow{\xi }_t\) which gives birth at time t and \(x\rightarrow z\). We write \(\{\overleftarrow{\xi }_t^B: t\ge 0\}\) for the coalescing branching process starting from \(\overleftarrow{\xi }_0^B = B\).
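A minimal Python sketch of one step of this coalescing branching process (graph and parameters are illustrative, not from the paper): each occupied site flips an independent q-coin, a successful site places offspring on all of its children, and overlapping offspring coalesce automatically through the set union.

```python
import random

def cobra_step(children, occupied, q, rng):
    # One step of the coalescing branching dual process on the edge-reversed
    # graph: each occupied site gives birth with probability q, and a
    # successful site occupies ALL of its children at the next time.
    nxt = set()
    for x in occupied:
        if rng.random() < q:
            nxt.update(children[x])  # set union => births landing on the same vertex coalesce
    return nxt

rng = random.Random(1)
n, q, r = 200, 0.7, 3
# children[x] = children of x in the reversed graph (= the input set of x in G_n).
children = {x: rng.sample([y for y in range(n) if y != x], r) for x in range(n)}
xi = {0}  # dual process started from a single vertex
for _ in range(30):
    xi = cobra_step(children, xi, q, rng)
# With q*r = 2.1 > 1 the process is supercritical and typically reaches positive density.
```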

Fig. 2
figure 2

In all the figures, the x-axis represents \(\mathscr {G}_n\) (space); for simplicity in drawing we have used a one-dimensional line. Time increases vertically. In the space-time graph, a “gadget” connects a vertex x at time t to its input nodes at time \(t-1\). a One such gadget. Each gadget appears with probability q, independently of other gadgets. b A possible collection of gadgets among different timelines. c, d Show how to construct the threshold contact process starting from a given initial set A using the gadgets. e Shows how to obtain the dual process and why it is a coalescing branching process. Finally, f demonstrates the duality relationship between the threshold contact process and its dual, as both events \(\{\xi ^A_t \cap B \ne \emptyset \}\) and \(\{\hat{\xi }^B_t \cap A \ne \emptyset \}\) are equivalent to the existence of a path as described in f

It is easy to check [18] that the following duality relation holds. For any \(t\ge 0\) and for any two sets \(A, B \subseteq [n]\) we have

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \xi ^A_t \cap B \ne \emptyset \right) = {{\mathrm{\mathbb {P}}}}\left( \overleftarrow{\xi }^B_t \cap A \ne \emptyset \right) . \end{aligned}$$
(2.1)

This will enable us to carry out the analysis for the dual process, and then transfer the conclusions to prove our main results about \(\{\xi _t\}\).
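The duality relation (2.1) can be sanity-checked by Monte Carlo on a toy graph. In the sketch below (all parameters are our own illustrative choices), the forward threshold contact process occupies x at time \(t+1\) with probability q whenever some input of x is occupied at time t, while the dual runs the coalescing branching process on the edge-reversed graph started from B; the two hitting probabilities are estimated from independent simulations and should agree up to Monte Carlo error.

```python
import random

rng = random.Random(7)
n, q, t_max, trials = 12, 0.6, 4, 20000
inputs = {x: rng.sample([y for y in range(n) if y != x], 2) for x in range(n)}
A, B = {0, 1}, {5, 6}

def forward_hits(A, B, t):
    # Threshold contact process: x becomes occupied with probability q
    # if at least one of its inputs is currently occupied.
    xi = set(A)
    for _ in range(t):
        xi = {x for x in range(n) if set(inputs[x]) & xi and rng.random() < q}
    return bool(xi & B)

def dual_hits(A, B, t):
    # Coalescing branching process on the reversed graph started from B;
    # the children of x in the reversed graph are exactly the inputs of x.
    xi = set(B)
    for _ in range(t):
        nxt = set()
        for x in xi:
            if rng.random() < q:
                nxt.update(inputs[x])
        xi = nxt
    return bool(xi & A)

p_fwd = sum(forward_hits(A, B, t_max) for _ in range(trials)) / trials
p_dual = sum(dual_hits(A, B, t_max) for _ in range(trials)) / trials
# p_fwd and p_dual estimate the two sides of (2.1); they should differ only by
# Monte Carlo noise of order 1/sqrt(trials).
```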

2.2 Local neighborhoods

Next, we need to understand the structure of the neighborhood of a small set of vertices of \(\overleftarrow{\mathscr {G}}_n\). The goal is to determine whether the oriented neighborhood of a typical small vertex set contains an oriented forest whose offspring distribution is close to the out-degree distribution for \(\overleftarrow{\mathscr {G}}_n\). See Fig. 3 for an example.

Fig. 3
figure 3

a An illustration of the oriented neighborhood up to distance 2 from the vertex set \(A=\{1, 2, 3, 4\}\). The neighborhood consists of 35 vertices. b The associated forest \(\{Z^A_t\}_{0\leqslant t\leqslant 2}\). Vertices belonging to \(O^A_t, C^A_t\) and \(R^A_t\) are denoted by open circles, filled circles and filled squares respectively. \(r=2.1\) has been used for defining \(R^A_t\). c Shows only the open vertices \(\{O^A_t\}\)

For \(A \subset [n]\) we let \(\overleftarrow{Z}^A_0=A\) and for \(l\geqslant 1\) let

$$\begin{aligned} \overleftarrow{Z}^A_l := \{z \in [n]: \text { there is an oriented path in } \overleftarrow{\mathscr {G}}_n \text { of length } l \text { from some } x \in A \text { to } z \}. \end{aligned}$$

Let \(\{K_m\}_{m=1}^n\) be a sequence of numbers, which will be specified later (see (3.5)). Using \(\sqcup \) to denote disjoint union, we introduce the following coupling between the directed subgraph of \(\overleftarrow{\mathscr {G}}_n\) induced by \(\cup _{i= 0}^{K_{|A|}} \overleftarrow{Z}^A_i\) and a forest \(\{Z^A_t, 0\leqslant t\leqslant K_{|A|}\}\) together with partitions \(Z^A_t=C^A_t\sqcup O^A_t \sqcup R^A_t\), where \(C^A_t\), \(O^A_t\) and \(R^A_t\) represent the ‘closed’, ‘open’ and ‘removed’ sites at level t respectively. Let \(A=\{u_1, \ldots , u_{|A|}\}\). For the root level of the forest we choose \(Z^A_0=A\) and \(C^A_0=R^A_0=\emptyset \), so that \(O^A_0=A\). The sites in \(Z^A_0\) are labeled \(u_1, \ldots , u_{|A|}\). For each \(t\geqslant 0\), every site of \(O^A_t\) mimics the corresponding vertex in \(\overleftarrow{Z}^A_t\) with the same label, so a site of \(O^A_t\) having label u gives birth to \(I_u\) many children at level \(t+1\). The newborn sites at level \(t+1\) are assigned labels following those of \(\overleftarrow{Z}^A_{t+1}\). Writing \(r=r^{\text {in}}\) (resp. \(\tilde{r}^{\text {in}}\)) in the case of \(\text {RBN}^{\text {in}}\) (resp. \(\text {RBN}^{\text {in,out}}\)), and letting

$$\begin{aligned} \Pi (l,A) \text { denote the subset of }A \text { consisting of }l\wedge |A| \text { elements with minimum indices}, \end{aligned}$$

we scan the sites of \(Z^A_{t+1}\) in an increasing order of labels, and define

$$\begin{aligned} R^A_{t+1} := Z^A_{t+1}{\setminus }\Pi \left( 2r |O^A_t|,Z^A_{t+1}\right) . \end{aligned}$$

For a site in \(Z^A_{t+1} {\setminus } R^A_{t+1}\), we say that a “collision” has occurred if its label either matches that of a site in \(\cup _{s=0}^t O^A_s\), or has already been encountered while scanning the sites of level \(t+1\). We include all such sites in \(C^A_{t+1}\). If no collision occurs at a site, we include it in \(O^A_{t+1}\).
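The level-by-level construction above can be summarized in a short Python sketch (illustrative only; all names are our own). At each level the children of the open sites are scanned in increasing order of labels, everything beyond the first \(2r|O^A_t|\) sites is removed, a kept site whose label has already been seen is closed, and the rest are opened.

```python
import random

def explore_forest(inputs, A, r, K):
    # Couples the reversed-graph neighborhood of A with a forest, returning
    # for each level 1..K the lists (opened, closed, removed) of site labels.
    seen = set(A)              # labels of open sites found so far (levels 0..t)
    open_t, levels = list(A), []
    for _ in range(K):
        children = []
        for u in open_t:       # a site with label u gives birth to the inputs Y^u
            children.extend(inputs[u])
        children.sort()        # scan in increasing order of labels
        cutoff = 2 * r * len(open_t)
        keep, removed = children[:cutoff], children[cutoff:]
        opened, closed = [], []
        for z in keep:
            if z in seen:          # collision: label seen at an earlier level
                closed.append(z)   # or earlier in the scan of this level
            else:
                opened.append(z)
                seen.add(z)
        levels.append((opened, closed, removed))
        open_t = opened
    return levels

rng = random.Random(3)
n, r, K = 100, 2, 3
inputs = {x: rng.sample([y for y in range(n) if y != x], r) for x in range(n)}
levels = explore_forest(inputs, [0, 1, 2], r, K)
```

By construction each open label appears exactly once across all levels, which is the sense in which the open sites form an injective copy of part of the neighborhood.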

For \(u\in Z^A_t\), let \(\overleftarrow{u}^A_t\in A\) denote the label of the unique ancestor of u having level 0. For any subset \(B\subset A\) and \(t\geqslant 1\) let \(Z^{A,B}_0=O^{A,B}_0=B\) and

$$\begin{aligned} Z^{A,B}_t&:= \left\{ u\in Z^A_t: \overleftarrow{u}^A_t \in B\right\} \\ C^{A,B}_t&:= \Pi \left( 2r\left| O^{A,B}_{t-1}\right| , Z^{A,B}_t\right) \cap C^A_t, \quad O^{A,B}_t:=\Pi \left( 2r\left| O^{A,B}_{t-1}\right| , Z^{A,B}_t\right) \cap O^A_t. \end{aligned}$$

So, each site in \(O^A_t\) corresponds to a unique vertex in \(\overleftarrow{Z}^A_t\) with the same label. Note that this map from \(O^A_t\) to \(\overleftarrow{Z}^A_t\) may not be onto because of collisions and removal of sites. See Fig. 4 for a schematic description of different quantities defined here.

Fig. 4
figure 4

a Brown region represents removed vertices in \(R^A_t\), cyan region represents closed vertices in \(C^A_t\) and black region represents open vertices in \(O^A_t\). b Only the black region represents vertices in \(O^{A,B}_t\), the yellow region represents vertices that are in \(Z^{A,B}_t{\setminus } O^{A,B}_t\) (color figure online)

The law of \(\overleftarrow{\mathscr {G}}_n\) induces the law of \(\{Z^A_t\}\) together with its partitions, and we identify these two laws. Our aim now is to estimate the probability of collision, and then to understand the offspring distribution of the above forest. We write

$$\begin{aligned}&\vartheta (m,K) := m + m\sum _{l=1}^{K} (2r)^l, \text { and } \\&\overline{I}_n:= \frac{1}{n} \sum _{z=1}^n I_z, \overline{I_n^2}:=\frac{1}{n} \sum _{z=1}^n I_z^2, \quad \overline{O}_n:= \frac{1}{n} \sum _{z=1}^n O_z, \overline{O_n^2}:=\frac{1}{n} \sum _{z=1}^n O_z^2,\nonumber \\&\quad \overline{IO_n}:= \frac{1}{n} \sum _{z=1}^n I_zO_z. \nonumber \end{aligned}$$
(2.2)

For \(\eta \in (0,1)\) and any probability distribution \({\varvec{\mu }}\), we define

$$\begin{aligned} \Gamma (\eta ,{\varvec{\mu }}):= & {} \int _0^\eta {\varvec{\mu }}^\leftarrow (1-t)\; dt, \text { where } {\varvec{\mu }}^\leftarrow (t) := \inf \{y \in \mathbb {R}: {\varvec{\mu }}((-\infty , y]) \geqslant t\} \nonumber \\&\text {for } t \in [0,1]. \end{aligned}$$
(2.3)
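For a finitely supported \({\varvec{\mu }}\), the quantity \(\Gamma (\eta ,{\varvec{\mu }})\) in (2.3) is simply the integral of the quantile function over the top \(\eta \) fraction of the mass, so the largest values are consumed first. A small Python sketch (our own, for illustration):

```python
def Gamma(eta, mu):
    # Gamma(eta, mu) = integral_0^eta of the quantile mu^{<-}(1 - t) dt for a
    # distribution mu = {value: probability} with finite support: accumulate
    # mass from the largest values downwards until a total mass of eta is used.
    remaining, total = eta, 0.0
    for v, m in sorted(mu.items(), reverse=True):
        take = min(m, remaining)
        total += take * v
        remaining -= take
        if remaining <= 0:
            break
    return total

mu = {1: 0.5, 2: 0.3, 10: 0.2}
val = Gamma(0.3, mu)  # the top 0.2 of the mass sits at 10 and the next 0.1 at 2,
                      # so the integral is 10*0.2 + 2*0.1 = 2.2 (up to float rounding)
```

With \(\eta = 1\) the same routine returns the mean of \({\varvec{\mu }}\), as the definition requires.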

Recall from Sect. 1.2 that \(I_x\) denotes the in-degree of x and \(\{Y^x_1, \ldots , Y^x_{I_x}\}\) denotes the set of input vertices for x in \(\mathscr {G}_n\).

Lemma 2.1

(For \(\text {RBN}^{\text {in}}\)) If \(\mathbf {p}^{\text {in}}\) has finite second moment, then there is a constant \(C_{2.1}>0\) and a set of in-degree sequences \(\mathscr {I}_n \subset \mathbb {N}^n\) such that \(\mathbf {p}^{\text {in}}_{\otimes n}(\mathscr {I}_n)=1-o(1)\) and \(\mathbf {I}\in \mathscr {I}_n\) implies

$$\begin{aligned} (1)&{{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}( x \in C^A_t) \leqslant 2\vartheta (|A|, K)/n \text { for any } A \subset [n], x \in Z^A_t {\setminus } R^A_t \text { and } 1\leqslant t\leqslant K \\ (2)&{{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}\left[ (Y^x_i, x \in [n], 1\leqslant i\leqslant I_x) \in \cdot \right] \leqslant C_{2.1} {{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}\left[ ({\tilde{Y}}^x_i, x \in [n], 1\leqslant i\leqslant I_x) \in \cdot \right] , \end{aligned}$$

where \(\{{\tilde{Y}}^x_i\}_{x\in [n], i\leqslant I_x}\) are i.i.d. with common distribution Uniform([n]).

Proof

We take

$$\begin{aligned} \mathscr {I}_n := \left\{ \max _z I_z \leqslant n^{3/4}, \overline{I}_n <c_1, \overline{I_n^2} < c_2\right\} , \end{aligned}$$

where \(c_i=2\sum _k k^ip^{\text {in}}_k\). Obviously \(\mathbf {p}^{\text {in}}_{\otimes n}(\mathscr {I}_n)=1-o(1)\).

(1) Note that \(\sum _{t=0}^{K}|Z^A_t{\setminus } R^A_t| \leqslant \vartheta (|A|,K)\). So if \(\mathbf {I}\in \mathscr {I}_n\), then it is easy to see from the construction of \(\mathscr {G}_n\) under the law \({{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}\) that for any \(1\leqslant t\leqslant K\) and \(x \in Z^A_t {\setminus } R^A_t\),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}(x \in C^A_t) \leqslant \frac{\vartheta (|A|,K)}{n-\max _z I_z} \leqslant 2\vartheta (|A|,K)/n. \end{aligned}$$

(2) It is easy to see that

$$\begin{aligned} \left( Y^x_i, 1\leqslant i\leqslant I_x, x \!\in \! [n]\right) \,{\buildrel d \over =}\,\left( \left. {\tilde{Y}}^x_i, 1\leqslant i\leqslant I_x, x \!\in \! [n]\right| {\tilde{Y}}^z_i \ne {\tilde{Y}}^z_j \ne z \forall z \in [n], i \ne j\right) , \end{aligned}$$

so it suffices to show that \(\mathbf {I}\in \mathscr {I}_n\) implies \({{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}({\tilde{Y}}^z_i \ne {\tilde{Y}}^z_j \ne z \forall z \in [n], 1\leqslant i \ne j\leqslant I_z) \geqslant c\) for some constant \(c>0\). Using the inequality \(1-x\geqslant e^{-2x}\) for \(x>0\) small enough, if \(\mathbf {I}\in \mathscr {I}_n\), then

$$\begin{aligned}&{{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}({\tilde{Y}}^z_i \ne {\tilde{Y}}^z_j \ne z \forall z \in [n], 1\leqslant i \ne j\leqslant I_z) = \prod _{z=1}^n \prod _{i=1}^{I_z} (1-i/n) \\&\quad \geqslant \exp \left( -2\sum _{z=1}^n \sum _{i=1}^{I_z} (i/n)\right) =\exp (- \overline{I}_n-\overline{I_n^2}) \geqslant e^{-c_1-c_2} \end{aligned}$$

for large enough n. \(\square \)

Lemma 2.2

(For \(\text {RBN}^{\text {in,out}}\)) If the marginal distributions \(\mathbf {p}^{\text {in}}\) and \(\mathbf {p}^{\text {out}}\) corresponding to \(\mathbf {p}^{\text {in,out}}\) have finite second moment, then there is a constant \(C_{2.2}>0\) and a set of degree sequences \(\mathscr {A}_n\subset (\mathbb {N}^2)^n\) such that \(\mathbf {p}^{\text {in,out}}_{\otimes n}(\mathscr {A}_n| E_n)=1-o(1)\) and for all \((\mathbf {I},\mathbf {O}) \in \mathscr {A}_n\),

$$\begin{aligned} (1)&{{\mathrm{\mathbb {P}}}}_{2,\mathbf {I},\mathbf {O}}(x \in C^A_t) \leqslant 4\Gamma (2\varepsilon ,\mathbf {p}^{\text {out}})/(r^{\text {in}} - 2\varepsilon ) \text { whenever } A \subset [n] \text { satisfies } \\&\quad \vartheta (|A|,K) \leqslant \varepsilon n \text { for some small } \varepsilon >0, x \in Z^A_t {\setminus } R^A_t \text { and } 1\leqslant t\leqslant K \\ (2)&{{\mathrm{\mathbb {P}}}}_{2,\mathbf {I},\mathbf {O}}\left[ \left. (Y^x_i, 1\leqslant i \leqslant I_x, x \in [n]) \in \cdot \right| F_n \right] \\&\quad \leqslant C_{2.2} {{\mathrm{\mathbb {P}}}}_{2,\mathbf {I},\mathbf {O}}\left[ \left. \left( {\tilde{Y}}^x_i, 1\leqslant i \leqslant I_x, x \in [n]\right) \in \cdot \right| \sum _{x=1}^n\sum _{i=1}^{I_x} \mathbf {1}_{\{{\tilde{Y}}^x_i = z\}} = O_z \forall z \in [n] \right] , \end{aligned}$$

where \(\{{\tilde{Y}}^x_i\}\) are i.i.d. with common distribution \(\sum _{z\in [n]} O_z {\varvec{\delta }}_z/\sum _{x\in [n]} O_x\). Moreover, if \(p^{\text {out}}_k \sim ck^{-\alpha }\) for some \(\alpha >2\), then \(\Gamma (\eta ,\mathbf {p}^{\text {out}}) \sim c\eta ^{(\alpha -2)/(\alpha -1)}\) for some constant \(c>0\) which depends on \(\alpha \) only.

If \(N=rn\) for some constant r and \(\{{\tilde{Y}}_i\}_{i=1}^N\) are i.i.d. with common distribution \(Multinomial(1; \alpha _1, \alpha _2, \ldots , \alpha _n)\), then for any small \(\varepsilon > 0\),

$$\begin{aligned} (3)&P\left( \left( {\tilde{Y}}_1, \ldots , {\tilde{Y}}_{\varepsilon N}\right) \in \cdot \left| \sum _{i=1}^N \mathbf {1}_{\{{\tilde{Y}}_i = z\}} = N\alpha _z \forall z \in [n]\right. \right) \\&\quad \leqslant (1+o(1)) \exp (-\varepsilon \log (1-\varepsilon ) N) P\left( \left( {\tilde{Y}}_1, \ldots , {\tilde{Y}}_{\varepsilon N}\right) \in \cdot \right) , \\ (4)&\left| \left| P\left( \left( {\tilde{Y}}_1, \ldots , {\tilde{Y}}_{\varepsilon N}\right) \in \cdot \left| \sum _{i=1}^N \mathbf {1}_{\{{\tilde{Y}}_i = z\}} =N\alpha _z \forall z \in [n]\right. \right) - P\left( \left( {\tilde{Y}}_1, \ldots , {\tilde{Y}}_{\varepsilon N}\right) \in \cdot \right) \right| \right| _{TV} \\&\quad \leqslant O(\varepsilon ^2 n)+o(1). \end{aligned}$$

Proof

For \(\vartheta (\cdot ,\cdot )\) as in (2.2) and \(\Gamma (\cdot ,\cdot )\) as in (2.3) we take

$$\begin{aligned} \mathscr {A}_n := E_n \cap \left\{ \frac{\sum _{i=1}^{\vartheta (|A|,K)} O_{n,n-i+1} }{\sum _z O_z - \vartheta (|A|,K)} \leqslant 4\Gamma (2\varepsilon , \mathbf {p}^{\text {out}})/(r^{\text {in}}-2\varepsilon ) \right\} , \end{aligned}$$

where \(O_{n,1} \leqslant O_{n,2} \leqslant \cdots \leqslant O_{n,n}\) are the order statistics for \(O_1, \ldots , O_n\). In order to prove \(\mathbf {p}^{\text {in,out}}_{\otimes n}(\mathscr {A}_n| E_n)=1-o(1)\), first recall that \(\mathbf {p}^{\text {in}}\) and \(\mathbf {p}^{\text {out}}\) have the same mean \(r^{\text {in}}\). So using Chebyshev’s inequality, \(\mathbf {p}^{\text {in,out}}_{\otimes n}(\overline{O}_n \leqslant r^{\text {in}}/2) = O(1/n)\), and hence

$$\begin{aligned} \mathbf {p}^{\text {in,out}}_{\otimes n}\left( \left. \overline{O}_n \leqslant r^{\text {in}}/2 \right| E_n \right) = o(1), \end{aligned}$$
(2.4)

because \(\mathbf {p}^{\text {in,out}}_{\otimes n}(E_n)\) is of order \(1/\sqrt{n}\) by the local central limit theorem.

Now we apply Theorem 1 of [33] for the function

$$\begin{aligned} J(t) = {\left\{ \begin{array}{ll} 0 &{} \text { for } 0\leqslant t\leqslant 1-2\varepsilon \\ (t-1)/\varepsilon +2 &{} \text { for } 1-2\varepsilon \leqslant t\leqslant 1-\varepsilon \\ 1 &{} \text { for } 1-\varepsilon \leqslant t\leqslant 1 \end{array}\right. } \end{aligned}$$

and the i.i.d. random variables \(O_1, \ldots , O_n\). Since \(\mathbf {p}^{\text {out}}\) has finite second moment, it can easily be checked that the quantity \(\sigma ^2(J,\mathbf {p}^{\text {out}})\) given in (10) of [33] is finite. This together with Theorem 4 of [33] implies

$$\begin{aligned}&E^{\text {in,out}}_{\otimes n}(S_n-\mu )^2 =O(1/n), \text { where }\ S_n := \frac{1}{n} \sum _{i=1}^{n} J(i/(n+1)) O_{n,i} \text { and } \\&\quad \mu := \int _0^1 J(t) (\mathbf {p}^{\text {out}})^\leftarrow (t) \; dt \end{aligned}$$

is as in (11) of [33]. Note that

$$\begin{aligned} \int _{1-\varepsilon }^1 (\mathbf {p}^{\text {out}})^\leftarrow (t)\; dt\leqslant & {} \mu \leqslant \int _{1-2\varepsilon }^1 (\mathbf {p}^{\text {out}})^\leftarrow (t)\; dt, \text { which means } \Gamma (\varepsilon ,\mathbf {p}^{\text {out}}) \\\leqslant & {} \mu \leqslant \Gamma (2\varepsilon ,\mathbf {p}^{\text {out}}). \end{aligned}$$

Consequently, using Chebyshev’s inequality

$$\begin{aligned}&\mathbf {p}^{\text {in,out}}_{\otimes n}(S_n > 2\mu ) = O(1/n), \text { which implies } \\&\mathbf {p}^{\text {in,out}}_{\otimes n}\left( \frac{1}{n} \sum _{i=1}^{\vartheta (|A|,K)} O_{n,n-i+1} > 2\Gamma (2\varepsilon ,\mathbf {p}^{\text {out}})\right) =O(1/n), \text { and consequently}\\&\mathbf {p}^{\text {in,out}}_{\otimes n}\left( \left. \frac{1}{n} \sum _{i=1}^{\vartheta (|A|,K)} O_{n,n-i+1} > 2\Gamma (2\varepsilon ,\mathbf {p}^{\text {out}})\right| E_n\right) = o(1), \end{aligned}$$

as \(\mathbf {p}^{\text {in,out}}_{\otimes n}(E_n)\) is of order \(1/\sqrt{n}\). Combining the last display with (2.4) and noting that \(\vartheta (|A|,K) \leqslant \varepsilon n\), we see that \(\mathbf {p}^{\text {in,out}}_{\otimes n}(\mathscr {A}_n|E_n)=1-o(1)\).

(1)

    It is easy to see from the construction of \(\mathscr {G}_n\) that if we write the labels of the sites in \(\cup _{t=1}^{K} Z^A_t {\setminus } R^A_t\) in an increasing order, then a collision can occur at the k-th site (in this ordering) with probability \(\leqslant \sum _{i=1}^{|A|+k-1} O_{n,n-i+1}/[\sum _z O_z - k]\). Therefore, for any \(1\leqslant t\leqslant K\) and \(x\in Z^A_t {\setminus } R^A_t\)

    $$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{2,\mathbf {I},\mathbf {O}}(x \in C^A_t) \leqslant \frac{\sum _{i=1}^{\vartheta (|A|,K)} O_{n,n-i+1}}{\sum _z O_z-\vartheta (|A|,K)} \end{aligned}$$

    as \(\left| \cup _{t=1}^{K} Z^A_t {\setminus } R^A_t\right| \leqslant \vartheta (|A|,K)-|A|\). So the assertion follows from the definition of \(\mathscr {A}_n\).

(2)

    We can imitate the argument of Theorem 3.1.2 of [15] to see that under \({{\mathrm{\mathbb {P}}}}_{2,\mathbf {I},\mathbf {O}}\) the numbers of self-loops and multiple edges are asymptotically independent, and both have asymptotic Poisson distributions whose means are functions of the moments \(\sum _{k,l} k^il^j p^{\text {in,out}}_{k,l}, i, j \in \{0, 1, 2\}\). So \({{\mathrm{\mathbb {P}}}}_{2,\mathbf {I},\mathbf {O}}(F_n)\) has a positive limit. This together with the fact that

    $$\begin{aligned}&\left( Y^x_i, 1\leqslant i\leqslant I_x, x \in [n]\right) \\&\quad \,{\buildrel d \over =}\,\left( \left. {\tilde{Y}}^x_i, 1\leqslant i\leqslant I_x, x \in [n]\right| \sum _{x=1}^n\sum _{i=1}^{I_x} \mathbf {1}_{\{{\tilde{Y}}^x_i = z\}} =O_z \forall z \in [n]\right) \end{aligned}$$

    gives the desired inequality.

(3)

    For a vector of positive integers \(\mathbf {y}\), we write \(X_i(\mathbf {y})\) for the number of components of \(\mathbf {y}\) which are i. We also write \(\tilde{\mathbf {Y}}=({\tilde{Y}}_1, \ldots , {\tilde{Y}}_N)\) and \(\tilde{\mathbf {Y}}_{a:b}=({\tilde{Y}}_a, {\tilde{Y}}_ {a+1}, \ldots , {\tilde{Y}}_b)\).

    For any \(\mathbf {y}\in [n]^{\varepsilon N}\),

    $$\begin{aligned}&\frac{P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\left| X_z(\tilde{\mathbf {Y}}) = N\alpha _z \forall z\in [n]\right. \right) }{P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\right) }\nonumber \\&\quad = \frac{P\left( X_z(\tilde{\mathbf {Y}}_{(\varepsilon N + 1):N}) = N\alpha _z-X_z(\mathbf {y}) \forall z\in [n]\right) }{P\left( X_z(\tilde{\mathbf {Y}}) = N\alpha _z \forall z\in [n]\right) }. \end{aligned}$$
    (2.5)

    In order to bound the fraction in (2.5) recall that

    $$\begin{aligned} Multinomial(N; \alpha _1, \ldots , \alpha _n) \,{\buildrel d \over =}\,\left( Z_1, \ldots , Z_n\left| \sum _{i=1}^n Z_i = N\right. \right) , \end{aligned}$$
    (2.6)

    where \(\{Z_i\}_{i=1}^n\) are independent and \(Z_i \sim Poisson(N\alpha _i)\). In that case, it is not hard to check that \(P(\sum _{i=1}^n Z_i = N) = (1+o(1))/\sqrt{2\pi N}\) by Stirling’s formula. So, the ratio in (2.5) is

    $$\begin{aligned} (1+o(1)) \sqrt{1-\varepsilon } \prod _{z\in [n]} \frac{P(Y_z=N\alpha _z-X_z(\mathbf {y}))}{P(Z_z=N\alpha _z)}, \end{aligned}$$
    (2.7)

    where \(Y_i \sim Poisson(N(1-\varepsilon )\alpha _i), Z_i \sim Poisson(N\alpha _i), i\in [n],\) and they are independent. The expression in the last display equals

    $$\begin{aligned}&(1+o(1))\sqrt{1-\varepsilon } \prod _{z=1}^n \frac{(N\alpha _z)!}{e^{-N\alpha _z} (N\alpha _z)^{N\alpha _z}} \cdot \frac{e^{-(N\alpha _z(1-\varepsilon ))} (N\alpha _z(1-\varepsilon ))^{(N\alpha _z-X_z(\mathbf {y}))}}{(N\alpha _z-X_z(\mathbf {y}))!} \nonumber \\&\quad \!=\! (1+o(1))\sqrt{1-\varepsilon } [e^\varepsilon (1\!-\!\varepsilon )]^N \exp \left( -\log (1\!-\!\varepsilon ) \varepsilon N\right) \prod _{z=1}^n \prod _{i=1}^{X_z(\mathbf {y})} \left( 1-\frac{i-1}{N\alpha _z}\right) . \end{aligned}$$
    (2.8)

    Using the inequality \(1-\varepsilon \leqslant e^{-\varepsilon }\) we get the desired bound.
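The Stirling estimate used above, \(P(\sum _{i=1}^n Z_i = N) = (1+o(1))/\sqrt{2\pi N}\), can be checked numerically; a minimal sketch (not part of the proof), using the fact that a sum of independent \(Poisson(N\alpha _i)\) variables with \(\sum _i \alpha _i=1\) is \(Poisson(N)\):

```python
import math

def poisson_pmf(lam, k):
    # exact P(Poisson(lam) = k), computed on the log scale for stability
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

# P(sum Z_i = N) versus the Stirling approximation 1/sqrt(2*pi*N)
for N in [10, 100, 10000]:
    exact = poisson_pmf(N, N)
    approx = 1.0 / math.sqrt(2 * math.pi * N)
    print(N, exact / approx)
```

The printed ratios approach 1 as N grows, in line with the \(1+o(1)\) factor.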

  4. (4)

    Using the bound of part (3), the total variation distance between these two measures is

    $$\begin{aligned}&\frac{1}{2} \sum _{\mathbf {y}\in [n]^{\varepsilon N}} P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\right) \left| \frac{P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\left| X_z(\tilde{\mathbf {Y}}_{1:N}) = N\alpha _z \forall z\in [n]\right. \right) }{ P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\right) }-1\right| \nonumber \\&\quad \leqslant [1+\exp (-\varepsilon \log (1-\varepsilon ) N)] P\left( \left( \tilde{\mathbf {Y}}_{1:\varepsilon N}\right) \in A^c\right) \nonumber \\&\qquad + \sup _{\mathbf {y}\in A} \left| \frac{P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\left| X_z(\tilde{\mathbf {Y}}_{1:N}) = N\alpha _z \forall z\in [n]\right. \right) }{ P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\right) }-1\right| \end{aligned}$$
    (2.9)

    for any set A. Now recall from (2.8) that

    $$\begin{aligned}&\frac{P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\left| X_z(\tilde{\mathbf {Y}}_{1:N}) = N\alpha _z \forall z\in [n]\right. \right) }{P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\right) } \nonumber \\&\quad = (1\!+\!o(1))\sqrt{1-\varepsilon } [e^\varepsilon (1-\varepsilon )]^N \exp \left( -\log (1\!-\!\varepsilon ) \varepsilon N\right) \prod _{z=1}^n \prod _{i=1}^{X_z(\mathbf {y})} \left( 1-\frac{i-1}{N\alpha _z}\right) . \nonumber \\ \end{aligned}$$
    (2.10)

    Since \(e^x \geqslant 1+x\) for any real number x, the first term in the right hand side of (2.10) lies between \(1-\varepsilon ^2N\) and 1, whereas the second term lies between \(\exp (\varepsilon ^2N)\) and \(\exp (2\varepsilon ^2 N)\) when \(\varepsilon >0\) is small. Also the product term in (2.10) lies between 1 and

    $$\begin{aligned} 1-\sum _{z=1}^n \sum _{i=1}^{X_z(\mathbf {y})} \frac{i-1}{N\alpha _z} = 1 - \sum _{z=1}^n \frac{X_z(\mathbf {y})(X_z(\mathbf {y})-1)}{2N\alpha _z}. \end{aligned}$$

    Consequently, if we take

    $$\begin{aligned} A_\eta :=\left\{ \mathbf {y}\in [n]^{\varepsilon N}: \sum _{z=1}^n \frac{X_z(\mathbf {y})(X_z(\mathbf {y})-1)}{2N\alpha _z} < \eta \varepsilon ^2 N\right\} , \end{aligned}$$

    then

    $$\begin{aligned} \left| \frac{P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\left| X_z(\tilde{\mathbf {Y}}_{1:N}) = N\alpha _z \forall z\in [n]\right. \right) }{ P\left( \tilde{\mathbf {Y}}_{1:\varepsilon N}=\mathbf {y}\right) }-1\right| = O(\varepsilon ^2 N)\end{aligned}$$

    whenever \(\mathbf {y}\in A_\eta \). So, in view of (2.9), it suffices to show that \(P(\tilde{\mathbf {Y}}_{1:\varepsilon N} \in A_\eta ^c)=O(\exp (-C\varepsilon N))\) for some constant \(C>0\) and for some suitable choice of \(\eta \).

Note that the joint distribution of \(\{X_z(\tilde{\mathbf {Y}}_{1:\varepsilon N})\}_{z\in [n]}\) is \(Multinomial(\varepsilon N; \alpha _1, \ldots , \alpha _n)\), so using (2.6) and the local central limit theorem,

$$\begin{aligned} P\left( \tilde{Y}_{1:\varepsilon N} \in A_\eta ^c\right) = (1+o(1)) \sqrt{2\pi N} P\left( \sum _{i=1}^n \frac{\hat{Y}_i(\hat{Y}_i-1)}{N\alpha _i} \geqslant \eta \varepsilon ^2 N\right) , \end{aligned}$$

where \(\{\hat{Y}_i\}_{i=1}^n\) are independent and \(\hat{Y}_i \sim Poisson(\varepsilon N\alpha _i)\). Hence, using a standard large deviation argument, the above probability is at most \(\exp (-C(\eta )\varepsilon N)\) for some constant \(C(\eta )\), which is positive when \(\eta \) is large enough. This completes the argument. \(\square \)

Remark 2.3

Assertions (3) and (4) of Lemma 2.2 remain true if we replace the index set \(\{1, 2, \ldots , \varepsilon N\}\) by a (possibly random) index set \(\{i_1, i_2, \ldots , i_{\varepsilon N}\}\).

2.3 Ingredients

In this subsection, we state and prove some basic lemmas that will be required in the proofs of our main results.

Lemma 2.4

Let X be any nonnegative random variable such that \(2 (E X)^2 \leqslant E X^2 < \infty \). Then \(\log Ee^{-tX} \leqslant var(X) t^2/2 - E(X) t\) for any \(t>0\).

Proof

Let \(\mu = E X\) and \(\mu _2= \sqrt{E X^2}\) so that \(\sigma ^2=var(X)=\mu _2^2-\mu ^2\). We choose \(p=\mu ^2/\mu _2^2\) and \(\alpha =\mu _2^2/\mu \) so that \(Y:=(1-p){\varvec{\delta }}_0+p{\varvec{\delta }}_\alpha \) satisfies \(EY=\mu \) and \(E Y^2=\mu _2^2\). By Bennett's inequality [8],

$$\begin{aligned} \text { for any }t>0, \log Ee^{-t X} \leqslant \log Ee^{-tY} = \log [(1-p) + p e^{-\alpha t}] =: \varphi (t). \end{aligned}$$
(2.11)

Differentiating the function \(\varphi \) and noting that \(p\alpha =\mu \) and \(\mu \alpha =\mu _2^2\) we get

$$\begin{aligned} \varphi '(t) = \frac{-p\alpha e^{-\alpha t}}{(1-p) + pe^{-\alpha t}} = \frac{-\mu }{(1-p)e^{\alpha t} + p}, \quad \varphi ''(t)=\sigma ^2 \frac{e^{\alpha t}}{[(1-p) e^{\alpha t} + p]^2}. \end{aligned}$$

Also note that the convex quadratic function \(f(x)=[(1-p) x+p]^2-x\) satisfies \(f(1)=0\) and has nonnegative slope at \(x=1\) if \(2(1-p)\geqslant 1\), which is true by our hypothesis. Hence \(f(x)\geqslant 0\) for all \(x\geqslant 1\), and so \(\varphi ''(t) \leqslant \sigma ^2\) for any \(t\geqslant 0\). Finally, using a Taylor series expansion for the function \(\varphi \), we see that for any \(t>0\),

$$\begin{aligned} \varphi (t)= & {} \varphi (0) +\varphi '(0) t+ \varphi ''(u) t^2/2 \text { for some }u\in [0,t] \\\leqslant & {} -\mu t +\sigma ^2 t^2/2. \end{aligned}$$

This inequality together with (2.11) gives the desired result. \(\square \)
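As a numerical sanity check of Lemma 2.4 (not part of the proof), one can compare \(\log Ee^{-tX}\) with \(var(X)t^2/2-E(X)t\) for a simple distribution satisfying the moment hypothesis; the Bernoulli example below is an arbitrary choice:

```python
import math

# X ~ Bernoulli(0.3): EX = 0.3, EX^2 = 0.3, and 2*(EX)^2 = 0.18 <= 0.3,
# so the hypothesis 2(EX)^2 <= EX^2 of Lemma 2.4 holds.
p = 0.3
mu = p               # E X
var = p * (1 - p)    # var(X)

def log_laplace(t):
    # log E e^{-tX} = log((1-p) + p*e^{-t})
    return math.log((1 - p) + p * math.exp(-t))

def bound(t):
    # the claimed upper bound var(X)*t^2/2 - E(X)*t
    return var * t * t / 2 - mu * t

ok = all(log_laplace(t) <= bound(t) + 1e-12 for t in [k / 100 for k in range(1, 1001)])
print(ok)
```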

Lemma 2.5

For any \(\kappa >0\) and \(\Delta \geqslant 1\) the function \(\phi _{\kappa ,\Delta }(\gamma ):=\gamma [\log (\Delta /\gamma )]^\kappa \) is increasing for \(\gamma \leqslant \Delta e^{-\kappa }\) and decreasing for \(\Delta e^{-\kappa } \leqslant \gamma \leqslant \Delta \). Hence \(\phi _{\kappa ,\Delta }(\gamma ) \leqslant \Delta (\kappa /e)^\kappa \) for \(\gamma \in [0, \Delta ]\).

Proof

The derivative of \(\phi _{\kappa ,\Delta }(\gamma )\) with respect to \(\gamma \) is

$$\begin{aligned}{}[\log (\Delta /\gamma )]^{\kappa -1} (\log (\Delta /\gamma ) - \kappa ). \end{aligned}$$

The first factor is positive for \(\gamma < \Delta \) and the second factor is positive for \(\gamma \leqslant \Delta e^{-\kappa }\). Hence the conclusion follows. \(\square \)
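Lemma 2.5 is easy to confirm numerically; a sketch with arbitrary \(\kappa \) and \(\Delta \):

```python
import math

def phi(kappa, Delta, gamma):
    # phi_{kappa,Delta}(gamma) = gamma * [log(Delta/gamma)]^kappa
    return gamma * math.log(Delta / gamma) ** kappa

kappa, Delta = 2.0, 3.0
grid = [Delta * k / 100000 for k in range(1, 100000)]   # gamma in (0, Delta)
values = [phi(kappa, Delta, g) for g in grid]
peak = Delta * math.exp(-kappa)                  # claimed maximizer
bound = Delta * (kappa / math.e) ** kappa        # claimed maximum value
print(max(values) <= bound, abs(grid[values.index(max(values))] - peak) < 1e-3)
```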

Lemma 2.6

For \(\delta >0, \vartheta (m):=\beta _1m[\log (n/m)]^{\beta _2}, 0 < \gamma \leqslant 1\) and any integer \(\Lambda \geqslant 1\) there is an \(\epsilon _{2.6} >0\) depending on \(\Lambda , \delta , \beta _1, \beta _2, \gamma \) such that \(m\leqslant \epsilon _{2.6} n\) and \(M\in \mathbb {N}\) imply

$$\begin{aligned} P(Binomial(\Lambda M, (\vartheta (m)/n)^\gamma ) > \frac{1}{\gamma } (1+\delta ) M) \leqslant \exp (-(1+\delta /2)M\log (n/m)). \end{aligned}$$

Proof

A standard large deviations result for the Binomial distribution (see, e.g., Lemma 2.8.4 in [15]) implies \(P( Binomial(\Lambda M,q) \geqslant \Lambda M r) \leqslant \exp (-\Lambda M H_q(r))\) for any \(r > q\), where

$$\begin{aligned} H_q(r) := r \log \left( \frac{r}{q} \right) + (1-r) \log \left( \frac{1-r}{1-q} \right) . \end{aligned}$$
(2.12)

When \(r=(1+\delta )/(\gamma \Lambda )\), the contribution of the first term of \(H_q(r)\) to the bound \(\exp (-\Lambda M H_q(r))\) is

$$\begin{aligned}&\exp (-\Lambda M r\log (r/q))\\&\quad \leqslant \exp \left( - \frac{1}{\gamma }(1+\delta )M\left[ \log \left( \frac{n}{m}\right) ^\gamma -\log \frac{\Lambda \gamma \beta _1^\gamma }{1+\delta } -\beta _2\gamma \log \log \frac{n}{m}\right] \right) . \end{aligned}$$

For the second term in the large deviation bound in (2.12) we note that \(1/(1-q) > 1\) and \((1-r)\log (1-r) \geqslant -1/e\) by Lemma 2.5 (with \(\kappa =\Delta =1\)), and conclude

$$\begin{aligned} \exp \left( -\Lambda M(1-r)\log \left( \frac{1-r}{1-q} \right) \right) \leqslant \exp \left( -\Lambda M(1-r)\log (1-r) \right) \leqslant \exp (\Lambda M/e). \end{aligned}$$

Combining the last two estimates

$$\begin{aligned}&P(Binomial(\Lambda M, (\vartheta (m)/n)^\gamma ) > \frac{1}{\gamma } (1+\delta ) M) \\&\quad \leqslant \exp \left( - (1+\delta )M \log (n/m) +\beta _4 M +\beta _5 M \log \log \frac{n}{m}\right) , \end{aligned}$$

for constants \(\beta _4\) and \(\beta _5\).

Now we choose

$$\begin{aligned} \epsilon _{2.6}:= \max \left\{ \epsilon \in (0, e^{-2\beta _5/\delta }): \epsilon \left[ \log \frac{1}{\epsilon }\right] ^{2\beta _5/\delta } \leqslant \exp (-2\beta _4/\delta )\right\} . \end{aligned}$$

Clearly \(\epsilon _{2.6}>0\) and, in view of Lemma 2.5 with \(\kappa =2\beta _5/\delta \) and \(\Delta =1\), \(m\leqslant \epsilon _{2.6} n\) implies

$$\begin{aligned} (m/n)\left[ \log \frac{n}{m}\right] ^{2\beta _5/\delta } \leqslant \epsilon _{2.6}\left[ \log \frac{1}{\epsilon _{2.6}}\right] ^{2\beta _5/\delta } \leqslant \exp (-2\beta _4/\delta ), \end{aligned}$$

which in turn implies \(\beta _4+\beta _5\log \log [n/m] \leqslant (\delta /2)\log (n/m)\). This completes the proof. \(\square \)
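The binomial large deviation bound used at the start of the proof can be compared against the exact tail; a sketch with arbitrary parameters (any \(r>q\) works):

```python
import math

def binom_tail(n, q, k0):
    # exact P(Binomial(n, q) >= k0), each pmf term computed on the log scale
    total = 0.0
    for k in range(k0, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(q) + (n - k) * math.log(1 - q))
        total += math.exp(log_pmf)
    return total

def H(q, r):
    # the rate function H_q(r) of (2.12)
    return r * math.log(r / q) + (1 - r) * math.log((1 - r) / (1 - q))

n, q, r = 200, 0.1, 0.25    # arbitrary parameters with r > q
tail = binom_tail(n, q, math.ceil(n * r))
chernoff = math.exp(-n * H(q, r))
print(tail <= chernoff)
```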

3 Choice of ‘good’ graphs \(\mathcal {G}_n\)

For \(B \subset A\subset [n]\), recall the definition of the forest \(\{Z^A_t\}_{t=0}^{K_{|A|}}\) (as described in Sect. 2.2) with associated subsets \(\{O^{A,B}_t\}\) of ‘open’ sites. Let \(\mathbf {p}\) be the limiting out-degree distribution for \(\overleftarrow{\mathscr {G}}_n\), namely

$$\begin{aligned} \mathbf {p}= {\left\{ \begin{array}{ll} \mathbf {p}^{\text {in}}&{} \text { for } \text {RBN}^{\text {in}}\\ \tilde{\mathbf {p}}^{\text {in}} = \left\{ (r^{\text {in}})^{-1} \sum _{l} l p^{\text {in,out}}_{k,l}\right\} _{k=2}^\infty &{} \text { for } \text {RBN}^{\text {in,out}}\end{array}\right. } \end{aligned}$$
(3.1)

with \(p_0=p_1=0\) and mean \(r=\sum _k kp_k>2\).
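The tilted law \(\tilde{\mathbf {p}}^{\text {in}}\) in (3.1) is normalized by \(r^{\text {in}}\) because the mean in- and out-degrees of a directed graph coincide; a sketch with a hypothetical joint law \(p^{\text {in,out}}\) (not from the paper):

```python
# A hypothetical joint degree law p^{in,out}_{k,l}, chosen so that the mean
# in-degree equals the mean out-degree, as it must for any directed graph.
p_in_out = {(2, 3): 0.3, (3, 2): 0.3, (2, 2): 0.2, (3, 3): 0.2}

r_in = sum(k * p for (k, l), p in p_in_out.items())
r_out = sum(l * p for (k, l), p in p_in_out.items())
assert r_in == r_out  # must hold for a valid degree sequence

# the tilted law p-tilde^{in}_k = (r^{in})^{-1} * sum_l l * p^{in,out}_{k,l}
p_tilde = {}
for (k, l), p in p_in_out.items():
    p_tilde[k] = p_tilde.get(k, 0.0) + l * p / r_in

r = sum(k * pk for k, pk in p_tilde.items())
print(abs(sum(p_tilde.values()) - 1.0) < 1e-12, r > 2)
```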

Proposition 3.1

For any \({\tilde{q}}>1/r\) and small \(\delta >0\) there are constants \(c_1, c_2 > 0\) (which depend on \({\tilde{q}}, \delta \) and the distribution \(\mathbf {p}\)) such that for \(K_m=c_1\log_2(c_2\log (n/m))\) and for \(A \subset [n]\), if we define

$$\begin{aligned} E_A := \cap _{B \in \{B\subset A: |B| \geqslant (1-\delta )|A|\}} \left\{ |O^{A,B}_{K_{|A|}}| \geqslant (4/\delta ) {\tilde{q}}^{-K_{|A|}}|B|\right\} , \end{aligned}$$

then there is an \(\epsilon _{3.1}>0\) such that for any \(a>0\) the probability of

$$\begin{aligned} \mathcal {G}^1_n :=\cap _{A\in \{A\subset [n]: (\log n)^a \leqslant |A| \leqslant \epsilon _{3.1} n\}} E_A \end{aligned}$$

under \({{\mathrm{\mathbb {P}}}}_{i,n}, i=1, 2,\) is \(1-o(1)\).

Proposition 3.1 says that with high probability most of the small subsets of vertices of \(\overleftarrow{\mathscr {G}}_n\) are ‘good’ in the sense that every ‘sufficiently large’ subset of such a set A has ‘many’ descendants after \(O(\log \log (n/|A|))\) generations. Consequently, for every such good set A, a positive fraction of the vertices in A individually have many descendants. This property will be very helpful in proving Proposition 4.2 later.

Proof of Proposition 3.1

Let \(\mathbf {p}_n=\{p_{n,k}\}_{k\geqslant 2}\) be the distribution

$$\begin{aligned} \mathbf {p}_n :={\left\{ \begin{array}{ll} \frac{1}{n} \sum \nolimits _{z\in [n]} {\varvec{\delta }}_{I_z} &{} \text { for }\text {RBN}^{\text {in}}\\ \sum \nolimits _{z\in [n]} O_z {\varvec{\delta }}_{I_z}/\sum \nolimits _{z\in [n]} O_z &{} \text { for }\text {RBN}^{\text {in,out}}\end{array}\right. }. \end{aligned}$$
(3.2)

In view of Lemmas 2.1 and 2.2, \(\mathbf {p}_n\) approximates the out-degree distribution of the graph \(\overleftarrow{\mathscr {G}}_n\), with \(p_{n,0}=p_{n,1}=0\). Let \(r_n:=\sum _k kp_{n,k} \in (2,\infty )\) be the mean of \(\mathbf {p}_n\). It is easy to see that \(r_n \rightarrow r\).

In this proof, we write \({{\mathrm{\mathbb {P}}}}\) for the probability distribution on the forests \(\{Z^A_t\}_{t\geqslant 0, A\subset [n]}\) when \(\mathbf {p}_n\) is used as the offspring distribution. We also use \(\tilde{{{\mathrm{\mathbb {P}}}}}\) as a dummy replacement for \({{\mathrm{\mathbb {P}}}}_{1,\mathbf {I}}\) and \({{\mathrm{\mathbb {P}}}}_{2,\mathbf {I},\mathbf {O}}(\cdot | F_n)\). Lemmas 2.1 and 2.2 imply that for any event F involving the structure of the graph which depends on at most \(\varepsilon n\) vertices of the graph,

$$\begin{aligned} \left. \begin{array}{c} \mathbf {p}^{\text {in}}_{\otimes n}\left( \left\{ \mathbf {I}: \tilde{{{\mathrm{\mathbb {P}}}}}(F) \leqslant C_{2.1} {{\mathrm{\mathbb {P}}}}(F)\right\} \right) \\ \\ \mathbf {p}^{\text {in,out}}_{\otimes n}\left( \left. \left\{ (\mathbf {I},\mathbf {O}): \tilde{{{\mathrm{\mathbb {P}}}}}(F) \leqslant C_{2.2} \exp (-r^{\text {out}}\varepsilon \log (1-\varepsilon )n) {{\mathrm{\mathbb {P}}}}(F)\right\} \right| E_n\right) \end{array} \right\} = 1-o(1). \end{aligned}$$
(3.3)

Now fix \(\eta >0\) small so that \(r(1-\eta )>2\) and \({\tilde{q}}r(1-\eta )>1\), and

$$\begin{aligned} \gamma := {\left\{ \begin{array}{ll} 1 &{} \text { for }\text {RBN}^{\text {in}}\\ \frac{\alpha -2}{\alpha -1} &{} \text { for }\text {RBN}^{\text {in,out}}\text { when } p^{\text {out}}_k \sim ck^{-\alpha } \text { and }\alpha >3 \end{array}\right. }. \end{aligned}$$

If the tail of \(\mathbf {p}^{\text {out}}\) is lighter than polynomial, then \(\gamma \) is taken to be 1. Clearly \(1/\gamma <2<r(1-\eta )\), so we can choose \(\delta \in (0, 1/10)\) small enough that \((1+5\delta )/(2\gamma ) < 1\). We also need some more notation: let

$$\begin{aligned} {\tilde{r}}&:= r_n(1-\eta ) \text { so that }{\tilde{q}}{\tilde{r}}>1 \text { for large enough } n, \end{aligned}$$
(3.4)
$$\begin{aligned} \rho&> 0 \text { be such that } 2^{\rho -1}\left( 1-\frac{1+5\delta }{2\gamma }\right) \geqslant 1 \text { and } ({\tilde{q}}{\tilde{r}})^\rho \left( 1-\frac{1+5\delta }{\gamma {\tilde{r}}}\right) > 1,\nonumber \\ \sigma&\geqslant 1 \text { be such that } \left[ ({\tilde{q}}{\tilde{r}})^\rho \left( 1-\frac{1+5\delta }{\gamma {\tilde{r}}}\right) \right] ^\sigma \geqslant {\tilde{r}}^\rho ,\nonumber \\ I_n(\eta )&:= \sup _\theta \left( \theta r(1-\eta )-\log \left( \sum _k e^{\theta k} p_{n,k}\right) \right) > 0 \, (\text {LDP rate function for }\mathbf {p}_n) \nonumber \\ k_m&:= \log _2\left[ \frac{1+3\delta }{({\tilde{r}}-\gamma ^{-1}(1+5\delta )) I_n(\eta )} \log \frac{n}{m}\right] ,\text { and }K_m=\rho \sigma k_m \text { for }m \leqslant n, \text { so that } \\ \vartheta (m)&:= m+\sum _{l=1}^{K_m} (2r)^l \leqslant \beta _1m[\log (n/m)]^{\beta _2} \nonumber \\ \text { for } \beta _1&=\left( 1+\frac{2r}{2r-1}\right) \left( \frac{1+3\delta }{({\tilde{r}}-\gamma ^{-1}(1+5\delta )) I_n(\eta )}\right) ^{\rho \sigma \log _2(2r)} \text { and } \beta _2 := \rho \sigma \log _2(2r). \nonumber \end{aligned}$$
(3.5)
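For concreteness, the rate function \(I_n(\eta )\) defined above can be evaluated by a grid search over \(\theta \); a sketch with a hypothetical offspring law \(p_{n,2}=p_{n,3}=1/2\) (so \(r_n=2.5\)), taking \(r=r_n\) and \(\eta =0.1\):

```python
import math

p_n = {2: 0.5, 3: 0.5}    # hypothetical offspring law with p_0 = p_1 = 0
r = sum(k * pk for k, pk in p_n.items())   # mean 2.5, playing the role of r
eta = 0.1
a = r * (1 - eta)         # = 2.25, the lower-deviation level

def objective(theta):
    # theta*r*(1-eta) - log sum_k e^{theta*k} p_{n,k}
    log_mgf = math.log(sum(math.exp(theta * k) * pk for k, pk in p_n.items()))
    return theta * a - log_mgf

# I_n(eta) is the supremum over theta; a grid search over [-5, 5] suffices
# here, since for this law the supremum is attained at theta = -log 3
I = max(objective(t / 1000) for t in range(-5000, 5001))
print(I > 0)
```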

Suppose \(B\subset A\subset [n]\) are subsets such that \(|A|=m\) and \(|B|\geqslant (1-\delta )m\). For \(k\geqslant 1\), define the events

$$\begin{aligned} H^{A,B}_k:= & {} \left\{ \sum _{i=1}^{\rho } |C^{A,B}_{\rho (k-1)+i}| \leqslant \frac{1}{\gamma }(1+5\delta )|O^{A,B}_{\rho (k-1)}|\right\} ,\\ L^{A,B}_{k,j}:= & {} \left\{ |Z^{A,B}_{\rho (k-1)+j}| \geqslant {\tilde{r}}|O^{A,B}_{\rho (k-1)+j-1}|\right\} \text { and } L^{A,B}_k := \cap _{j=1}^{\rho } L^{A,B}_{k,j}. \end{aligned}$$

Note that on the event \(H^{A,B}_k\),

$$\begin{aligned} \left| O^{A,B}_{\rho k}\right|\geqslant & {} 2\left| O^{A,B}_{\rho k-1}\right| - \left| C^{A,B}_{\rho k}\right| \nonumber \\\geqslant & {} 2^2\left| O^{A,B}_{\rho k-2}\right| - 2\left| C^{A,B}_{\rho k-1}\right| -\left| C^{A,B}_{\rho k}\right| \nonumber \\\geqslant & {} \cdots \nonumber \\\geqslant & {} 2^\rho \left| O^{A,B}_{\rho (k-1)}\right| - \sum _{i=1}^\rho 2^{\rho -i} \left| C^{A,B}_{\rho (k-1)+i}\right| \nonumber \\\geqslant & {} 2^\rho \left| O^{A,B}_{\rho (k-1)}\right| - 2^{\rho -1}\sum _{i=1}^\rho \left| C^{A,B}_{\rho (k-1)+i}\right| \nonumber \\\geqslant & {} (2^\rho -2^{\rho -1}\gamma ^{-1} (1+5\delta )) \left| O^{A,B}_{\rho (k-1)}\right| \geqslant 2\left| O^{A,B}_{\rho (k-1)}\right| , \end{aligned}$$
(3.6)

by the choice of \(\rho \). Since \(|O^{A,B}_t| \leqslant (2r) |O^{A,B}_{t-1}|\) for any \(t\geqslant 1\), an argument similar to the one leading to the previous display shows that the following inequalities hold on the event \(H^{A,B}_k\cap \cap _{j=1}^i L^{A,B}_{k,j}\):

$$\begin{aligned} \left| O^{A,B}_{\rho (k-1)+i}\right|\geqslant & {} {\tilde{r}}\left| O^{A,B}_{\rho (k-1)+i-1}\right| - \left| C^{A,B}_{\rho (k-1)+i}\right| \nonumber \\\geqslant & {} {\tilde{r}}^2 \left| O^{A,B}_{\rho (k-1)+i-2}\right| -{\tilde{r}}\left| C^{A,B}_{\rho (k-1)+i-1}\right| - \left| C^{A,B}_{\rho (k-1)+i}\right| \nonumber \\\geqslant & {} \cdots \nonumber \\\geqslant & {} {\tilde{r}}^i \left| O^{A,B}_{\rho (k-1)}\right| - \sum _{j=1}^i {\tilde{r}}^{i-j} \left| C^{A,B}_{\rho (k-1)+j}\right| \nonumber \\\geqslant & {} {\tilde{r}}^i \left| O^{A,B}_{\rho (k-1)}\right| - {\tilde{r}}^{i-1} \sum _{j=1}^i \left| C^{A,B}_{\rho (k-1)+j}\right| \nonumber \\\geqslant & {} ({\tilde{r}}^i-{\tilde{r}}^{i-1} \gamma ^{-1} (1+5\delta )) \left| O^{A,B}_{\rho (k-1)}\right| \end{aligned}$$
(3.7)
$$\begin{aligned}\geqslant & {} {\tilde{r}}\left( 1-\frac{1+5\delta }{\gamma {\tilde{r}}}\right) \left| O^{A,B}_{\rho (k-1)}\right| . \end{aligned}$$
(3.8)

Taking \(i=\rho \) in (3.7),

$$\begin{aligned} \left| O^{A,B}_{\rho k}\right| \geqslant {\tilde{r}}^\rho \left( 1-\frac{1+5\delta }{\gamma {\tilde{r}}}\right) \left| O^{A,B}_{\rho (k-1)}\right| \quad \text {on the event } H^{A,B}_k \cap L^{A,B}_k. \end{aligned}$$
(3.9)

Recalling \(K_m=\rho \sigma k_m\) and using (3.6) and (3.9) repeatedly,

$$\begin{aligned} \left| O^{A,B}_{K_m}\right| \geqslant |B|2^{k_m}\left[ {\tilde{r}}^\rho \left( 1-\frac{1+5\delta }{\gamma {\tilde{r}}}\right) \right] ^{(\sigma -1)k_m} \text { on the event }\cap _{k=1}^{\sigma k_m} H^{A,B}_k \cap \cap _{k=k_m+1}^{\sigma k_m} L^{A,B}_k. \end{aligned}$$

Now note that

$$\begin{aligned} {\tilde{q}}^{\rho \sigma } \left[ {\tilde{r}}^\rho \left( 1-\frac{1+5\delta }{\gamma {\tilde{r}}}\right) \right] ^{\sigma -1} \geqslant \left[ ({\tilde{q}}{\tilde{r}})^\rho \left( 1-\frac{1+5\delta }{\gamma {\tilde{r}}}\right) \right] ^\sigma {\tilde{r}}^{-\rho } \geqslant 1 \end{aligned}$$

by the choices of \(\rho \) and \(\sigma \). So if

$$\begin{aligned} (m/n) \leqslant \exp \left( -\frac{4({\tilde{r}}-\gamma ^{-1}(1+5\delta )) I_n(\eta )}{\delta (1+3\delta )}\right) \text { so that }2^{k_m}\geqslant (4/\delta ), \end{aligned}$$
(3.10)

then

$$\begin{aligned} \left| O^{A,B}_{K_m}\right| \geqslant |B|(4/\delta ){\tilde{q}}^{-K_m}\quad \text {on the event } \cap _{k=1}^{\sigma k_m} H^{A,B}_k \cap \cap _{k=k_m+1}^{\sigma k_m} L^{A,B}_k. \end{aligned}$$
(3.11)

To estimate the probability of the event in (3.11), recall that \(|C^{A,B}_{t+1}|, |O^{A,B}_{t+1}| \leqslant (2r) |O^{A,B}_t|\) for any \(t\geqslant 0\) and, by Lemmas 2.1 and 2.2, each site is included in \(C^{A,B}_t \subset C^A_t\) with probability at most \(c(\vartheta (m)/n)^\gamma \). So for any \(k\geqslant 1\), \(\sum _{i=1}^\rho |C^{A,B}_{\rho (k-1)+i}|\) conditionally on \(|O^{A,B}_{\rho (k-1)}|\) is stochastically dominated by the \(Binomial(\Lambda M, c(\vartheta (m)/n)^\gamma )\) distribution, where \(\Lambda =(2r) + (2r)^2 + \cdots + (2r)^\rho \) and \(M=|O^{A,B}_{\rho (k-1)}|\). Hence, applying Lemma 2.6 with the above choices of \(\Lambda \) and M, if

$$\begin{aligned} m \leqslant \epsilon _{2.6}(\Lambda ,5\delta ,\eta ,\gamma ) n, \end{aligned}$$
(3.12)

then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \left. (H^{A,B}_k)^c\right| \left| O^{A,B}_{\rho (k-1)}\right| \right) \leqslant \exp \left( -(1+5\delta /2)\left| O^{A,B}_{\rho (k-1)}\right| \log (n/m)\right) . \end{aligned}$$

Since \(|O^{A,B}_{\rho (k-1)}| \geqslant |B|\) on the event \(\cap _{j=1}^{k-1} H^{A,B}_j\) by (3.6), the above inequality reduces to

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( (H^{A,B}_k)^c \cap \cap _{j=1}^{k-1} H^{A,B}_j\right)\leqslant & {} \exp \left( -(1+5\delta /2)|B|\log (n/m)\right) \nonumber \\\leqslant & {} \exp \left( -(1+\delta )m\log (n/m)\right) . \end{aligned}$$
(3.13)

The last inequality follows from the fact that \(|B|\geqslant (1-\delta )m\) and \(\delta \in (0,1/10)\), which makes \((1+5\delta /2)(1-\delta )\geqslant 1+\delta \).

By the choice of \(I_n(\eta )\), a standard large deviation argument for the sum of i.i.d. random variables yields

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \left. (L^{A,B}_{k,i})^c\right| \left| O^{A,B}_{\rho (k-1)+i-1}\right| \right) \leqslant \exp \left( -\left| O^{A,B}_{\rho (k-1)+i-1}\right| I_n(\eta )\right) \end{aligned}$$

for any \(k\geqslant 1\) and \(1\leqslant i\leqslant \rho \). Now repeated applications of the inequality in (3.6) show that \(|O^{A,B}_{\rho k_m}| \geqslant 2^{k_m}|B|\) on the event \(\cap _{j=1}^{k_m} H^{A,B}_j\). In view of (3.6) and (3.8), for any \(k>k_m\) and \(1\leqslant i\leqslant \rho \),

$$\begin{aligned} \left| O^{A,B}_{\rho (k-1)+i-1}\right| \geqslant ({\tilde{r}}-\gamma ^{-1}(1+5\delta ))\left| O^{A,B}_{\rho (k-1)}\right| \geqslant ({\tilde{r}}-\gamma ^{-1}(1+5\delta )) \left| O^{A,B}_{\rho k_m}\right| \end{aligned}$$

on the event \(\cap _{j=k_m+1}^k H^{A,B}_j \cap _{j=1}^{i-1} L^{A,B}_{k,j}\). So the inequality in the last display reduces to

$$\begin{aligned}&{{\mathrm{\mathbb {P}}}}\left( \left( L^{A,B}_{k,i}\right) ^c\cap _{j=1}^{i-1} L^{A,B}_{k,j} \cap _{j=1}^{k} H^{A,B}_j\right) \leqslant \exp \left( -({\tilde{r}}-\gamma ^{-1}(1+5\delta ))2^{k_m}|B| I_n(\eta )\right) \nonumber \\&\quad \leqslant \exp (-(1+3\delta )|B|\log (n/m)) \leqslant \exp (-(1+\delta )m\log (n/m)). \end{aligned}$$
(3.14)

The last two inequalities follow from the definition of \(k_m\) and the fact that \(|B| \geqslant (1-\delta )m\), which implies \((1+3\delta )|B| \geqslant (1+\delta )m\) for \(\delta \in (0,1/10)\). Applying Lemma 2.5 with \(\kappa =\Delta =1\),

$$\begin{aligned} m\log (n/m) = n \phi _{1,1}(m/n) \geqslant n \phi _{1,1}(1/n) = \log n \text { for }m\leqslant n/e. \end{aligned}$$
(3.15)

Combining (3.11), (3.13) and (3.14), if m / n is small enough that (3.10), (3.12) and (3.15) hold, then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}\left( \left| O^{A,B}_{K_m}\right| < |B|(4/\delta ){\tilde{q}}^{-K_m}\right)\leqslant & {} {{\mathrm{\mathbb {P}}}}\left( \left( \cap _{k=1}^{\sigma k_m} H^{A,B}_k \cap \cap _{k=k_m+1}^{\sigma k_m} L^{A,B}_k\right) ^c\right) \\\leqslant & {} \sum _{k=1}^{\sigma k_m} {{\mathrm{\mathbb {P}}}}\left( \left( H^{A,B}_k\right) ^c \cap _{j=1}^{k-1} H^{A,B}_j\right) \\&+ \sum _{k=k_m+1}^{\sigma k_m} \sum _{i=1}^{\rho } {{\mathrm{\mathbb {P}}}}\left( \left( L^{A,B}_{k,i}\right) ^c\cap _{j=1}^{i-1} L^{A,B}_{k,j} \cap _{j=1}^{k} H^{A,B}_j\right) \\\leqslant & {} [\sigma +\rho (\sigma -1)] k_m \exp (-(1+\delta )m\log (n/m)) \\\leqslant & {} [\sigma +\rho (\sigma -1)] \log _2[C\log n] \\&\times \exp (-(1+3\delta /4)m\log (n/m))n^{-\delta /4} \\\leqslant & {} \exp (-(1+3\delta /4)m\log (n/m)) \end{aligned}$$

for large enough n. Since the event considered in the last display involves at most \(\vartheta (m)\) vertices of the graph, the above estimate together with (3.3) implies

$$\begin{aligned} \tilde{{{\mathrm{\mathbb {P}}}}}\left( \left| O^{A,B}_{K_m}\right| < |B|(4/\delta ){\tilde{q}}^{-K_m}\right) \leqslant \exp (-(1+3\delta /8)m\log (n/m)), \end{aligned}$$

with (\(\mathbf {p}^{\text {in}}_{\otimes n}/\mathbf {p}^{\text {in,out}}_{\otimes n}\)) probability \(1-o(1)\) provided \(m/n \leqslant \varepsilon \) is small.

Using this estimate and a union bound, we see that if m / n is small, then

$$\begin{aligned}&\tilde{{{\mathrm{\mathbb {P}}}}} \left( \cup _{A\in \{A\subset [n]: |A|=m\}} E_A^c\right) \nonumber \\&\quad \leqslant \tilde{{{\mathrm{\mathbb {P}}}}} \left( \cup _{m'\in [(1-\delta )m, m]} \cup _{\{(A,B): B\subset A\subset [n], |A|=m, |B|=m'\}} \left\{ \left| O^{A,B}_{K_m}\right| <(4/\delta ){\tilde{q}}^{-K_m}|B|\right\} \right) \nonumber \\&\quad \leqslant \sum _{m'\in [(1-\delta )m, m]} {n\atopwithdelims ()m} {m\atopwithdelims ()m'}\exp \left( -(1+3\delta /8)m\log \frac{n}{m}\right) . \end{aligned}$$
(3.16)

It is easy to check that \({L\atopwithdelims ()l}\leqslant \frac{L^l}{l!} \leqslant (Le/l)^l\) for any positive integers \(l\leqslant L\) and the function \(\phi _{1,e}(\cdot )\) defined in Lemma 2.5 is increasing on (0, 1). So for \(m'\geqslant (1-\delta )m\),

$$\begin{aligned} {n\atopwithdelims ()m} \leqslant \left( \frac{ne}{m}\right) ^m \quad \text {and}\quad {m\atopwithdelims ()m'}= & {} {m\atopwithdelims ()m-m'} \\\leqslant & {} \left( \frac{me}{m-m'}\right) ^{m-m'} =\exp \left[ m \phi _{1,e}\left( \frac{m-m'}{m}\right) \right] \\\leqslant & {} \exp (m\phi _{1,e}(\delta )) \leqslant (e/\delta )^{\delta m}. \end{aligned}$$
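The elementary bounds \({L\atopwithdelims ()l}\leqslant L^l/l! \leqslant (Le/l)^l\) invoked here are quickly spot-checked:

```python
import math

# verify C(L, l) <= L^l / l! <= (L*e/l)^l over a range of pairs (L, l);
# the tiny multiplicative slack absorbs floating-point rounding
ok = all(
    math.comb(L, l) <= L ** l / math.factorial(l) <= (L * math.e / l) ** l * (1 + 1e-9)
    for L in range(1, 60)
    for l in range(1, L + 1)
)
print(ok)
```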

Also there are at most \(m\leqslant e^m\) choices for \(m'\). Using these bounds, the right hand side of (3.16) is

$$\begin{aligned}\leqslant & {} \exp [m+m\log (ne/m)+m\delta \log (e/\delta )-(1+3\delta /8)m\log (n/m)]\\\leqslant & {} \exp [-(3\delta /8)m\log (n/m)+\Delta _1 m] \end{aligned}$$

for some constant \(\Delta _1\). If \(m/n \leqslant \exp (-8\Delta _1/\delta )\), then the right hand side of the last display is \(\leqslant \exp [-(\delta /4)m\log (n/m)]\). Therefore, if \(\epsilon _{3.1}\) is chosen small enough, then for any \(m\leqslant \epsilon _{3.1} n\),

$$\begin{aligned} \tilde{{{\mathrm{\mathbb {P}}}}} \left( \cup _{A\in \{A\subset [n]: |A|=m\}} E_A^c\right) \leqslant \exp [-(\delta /4)m\log (n/m)]. \end{aligned}$$

Combining this with the fact that \(m\mapsto m\log (n/m)\) is increasing for \(m\leqslant n/e\) (by Lemma 2.5),

$$\begin{aligned} \tilde{{{\mathrm{\mathbb {P}}}}}((\mathcal {G}^1_n)^c)\leqslant & {} \sum _{m \in [(\log n)^a,\epsilon _{3.1} n]} \tilde{{{\mathrm{\mathbb {P}}}}} \left( \cup _{A\in \{A\subset [n]: |A|=m\}} E_A^c\right) \\\leqslant & {} \sum _{m \in [(\log n)^a,\epsilon _{3.1} n]} \exp [-(\delta /4)(\log n)^a\log (n/(\log n)^a)] \\\leqslant & {} n \exp [-(\delta /4)(\log n)^{1+a}(1+o(1))] = o(1/\sqrt{n}). \end{aligned}$$

This together with (3.3) completes the proof. \(\square \)

Recall the definition of \({\varvec{\pi }}(\cdot ,\cdot )\) from (1.16) and let \(CL^x_k := \cup _{l=0}^k\overleftarrow{Z}^{\{x\}}_l\) be the oriented cluster of depth k starting from \(x\in [n]\) in the graph \(\overleftarrow{\mathscr {G}}_n\). For any sequence \(\{t_n\}\) as in the statement of our theorems and \({\tilde{r}}\) as in (3.4), define the events

$$\begin{aligned} A_x:= & {} \left\{ \left| \overleftarrow{\xi }^{\{x\}}_{2a\log \log n/\log (q{\tilde{r}})}\right| \geqslant (\log n)^a\right\} , \tilde{A}_x := \left\{ \overleftarrow{\xi }^{\{x\}}_{t_n\wedge 2a\log \log n/\log (q{\tilde{r}})} \ne \emptyset \right\} ,\\ A_{x,y}:= & {} \left\{ CL^x_{2a\log \log n/\log (q{\tilde{r}})} \cap CL^y_{2a\log \log n/\log (q{\tilde{r}})}=\emptyset \right\} . \end{aligned}$$

Proposition 3.2

For \({\varvec{\pi }}(\cdot ,\cdot )\) as in (1.16), \(\mathbf {p}\) as in (3.1), \(q>1/r\) and \(\varepsilon >0\) let \(\pi :={\varvec{\pi }}(\mathbf {p},q)\) and

$$\begin{aligned}&\mathcal {G}^2_n := \left\{ \mathscr {G}_n: n(\pi -\varepsilon ) \leqslant \sum _{x\in [n]} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x), \sum _{x\in [n]} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(\tilde{A}_x) \leqslant n(\pi +\varepsilon )\right\} \\&\qquad \cap \left\{ \sum _{x,y\in [n], x\ne y}\mathbf {1}_{A_{x,y}^c} \leqslant n^{9/5}\right\} . \end{aligned}$$

Then \({{\mathrm{\mathbb {P}}}}_{i,n}(\mathcal {G}^2_n)=1-o(1)\) for \(i=1, 2\).

Proof

In this proof, the notations \({{\mathrm{\mathbb {P}}}}\) and \(\tilde{{{\mathrm{\mathbb {P}}}}}\) again serve the same purpose as in the proof of Proposition 3.1, and \(\mathbb {E}\) and \(\tilde{\mathbb {E}}\) denote the corresponding expectations. Also let \(s_n:=2a\log \log n/\log (q{\tilde{r}})\) and \(\tilde{s}_n:=t_n\wedge 2a\log \log n/\log (q{\tilde{r}})\).

First we note that if

$$\begin{aligned} C_x := \left\{ \left| CL^x_{2a\log \log n/\log (q{\tilde{r}})}\right| \leqslant n^{1/4}\right\} , \text { then } {{\mathrm{\mathbb {P}}}}(C_x), \tilde{{{\mathrm{\mathbb {P}}}}}(C_x) \geqslant 1 - n^{-1/8}, \end{aligned}$$
(3.17)

using Markov's inequality. This bound and (4) of Lemma 2.2 with \(\varepsilon =n^{-3/4}\) imply

$$\begin{aligned}&\left| \mathbb {E}\left( {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x)\right) - \tilde{\mathbb {E}}\left( {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x)\right) \right| \leqslant o(1) \nonumber \\&\quad + \left| \mathbb {E}\left( {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x)\right) \mathbf {1}_{C_x} - \tilde{\mathbb {E}}\left( {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x)\right) \mathbf {1}_{C_x}\right| = o(1). \end{aligned}$$
(3.18)

Now if \(B_x\) denotes the event that no collision occurs in the cluster \(CL^x_{2a\log \log n/\log (q{\tilde{r}})}\), then combining (3.17) with Lemmas 2.1 and 2.2,

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}(B_x^c) = o(1)+{{\mathrm{\mathbb {P}}}}(B_x^c \cap C_x) = o(1) + \tilde{{{\mathrm{\mathbb {P}}}}}(B_x^c \cap C_x) = o(1). \end{aligned}$$
(3.19)

On the event \(B_x\), the law of \(|\overleftarrow{\xi }^{\{x\}}_t|, 0\leqslant t\leqslant 2a\log \log n/\log (q{\tilde{r}})\), under the annealed measure \({{\mathrm{\mathbb {P}}}}\times {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\) is the same as that of a branching process with offspring distribution \((1-q) {\varvec{\delta }}_0+q\mathbf {p}_n\), where \(\mathbf {p}_n\) is as in (3.2). So if \(\{Z_t\}_{t\geqslant 0}\) is such a branching process with \(Z_0=1\), then its survival probability is \({\varvec{\pi }}(\mathbf {p}_n,q)\) and, using (3.19),

$$\begin{aligned} \mathbb {E}[{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x)] = P\left( Z_{s_n} > (q{\tilde{r}})^{s_n/2}\right) + o(1). \end{aligned}$$

In order to estimate the right hand side of the last display, we use a large deviation estimate for branching processes conditioned on their survival (see [7]), which says that for any \(\delta >0\) there is \(c(\delta )>0\) such that

$$\begin{aligned} P\left( \left. \left| \frac{Z_{t+1}}{Z_t} -EZ_1\right| >\delta \right| B\right) \leqslant e^{-c(\delta )t} \quad \text {for } B := \{Z_t \text { survives}\}. \end{aligned}$$

Now, by the definition of \({\tilde{r}}\) and the fact that \(r_n \rightarrow r\), \(q{\tilde{r}}\geqslant E(Z_1) -2r\eta \) for large enough n. Iterating the probability bound in the last display and replacing \(\delta \) by \(2r\eta \),

$$\begin{aligned} P\left( \left. Z_{s_n} > (q{\tilde{r}})^{s_n/2}\right| B \right)\geqslant & {} P\left( \left. \cap _{s_n/2<t\leqslant s_n}\{Z_t>(EZ_1 - 2r\eta )Z_{t-1}\}\right| B\right) \\\geqslant & {} 1-(s_n/2)\exp (-c(2r\eta )s_n/2)=1-o(1). \end{aligned}$$

Combining this with the fact that \(P(B)={\varvec{\pi }}(\mathbf {p}_n,q)\), we have \(P(Z_{s_n} > (q{\tilde{r}})^{s_n/2}) = {\varvec{\pi }}(\mathbf {p}_n,q)+o(1)\). This together with (3.18) and the fact that \({\varvec{\pi }}(\mathbf {p}_n,q) \rightarrow \pi \) as \(n \rightarrow \infty \) implies

$$\begin{aligned} \tilde{\mathbb {E}}[{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x)] = \pi +o(1). \end{aligned}$$
(3.20)

Repeating the argument leading to (3.20), with \(s_n\) replaced by \(\tilde{s}_n\), we get

$$\begin{aligned} \tilde{\mathbb {E}}[{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(\tilde{A}_x)] = \pi +o(1). \end{aligned}$$
(3.21)

Also, using (3.17) and Lemmas 2.1 and 2.2,

$$\begin{aligned} \tilde{{{\mathrm{\mathbb {P}}}}}(A_{x,y}^c)\leqslant & {} \tilde{{{\mathrm{\mathbb {P}}}}}(C_x^c) + \tilde{{{\mathrm{\mathbb {P}}}}}(C_y^c) + \tilde{{{\mathrm{\mathbb {P}}}}}(A_{x,y}^c \cap C_x \cap C_y) \nonumber \\\leqslant & {} 2n^{-1/8} + c n^{1/4}\left( \frac{n^{1/4}}{n}\right) ^{(\alpha -2)/(\alpha -1)} \leqslant cn^{-1/8} \end{aligned}$$
(3.22)

for some constant \(c>0\). Using the last inequality and following the argument that leads to (2.13) in [9], if \(x_1, x_2 \in [n]\) are such that \(x_1\ne x_2\), then

$$\begin{aligned}&\tilde{\mathbb {E}}[{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_{x_1}) {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_{x_2})] - \tilde{\mathbb {E}}[{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_{x_1})]\tilde{\mathbb {E}}[{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_{x_2})]\\&\quad \leqslant \tilde{{{\mathrm{\mathbb {P}}}}}(A_{x_1,x_2}^c)[1+1/\tilde{{{\mathrm{\mathbb {P}}}}}(A_{x_1,x_2})] \leqslant cn^{-1/8}. \end{aligned}$$

So, using a standard second moment argument and then combining with (3.20) and (3.21),

$$\begin{aligned} \tilde{{{\mathrm{\mathbb {P}}}}}\left( n(\pi -\varepsilon ) \leqslant \sum _{x\in [n]} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x), \sum _{x\in [n]} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(\tilde{A}_x) \leqslant n(\pi +\varepsilon )\right) = 1- o(1). \end{aligned}$$

By a similar argument, if \(x_1, x_2, x_3, x_4 \in [n]\) are such that \(x_1\ne x_2\), \(x_3\ne x_4\), and \(\{x_1, x_2\} \cap \{x_3, x_4\} = \emptyset \), then

$$\begin{aligned}&\tilde{{{\mathrm{\mathbb {P}}}}}(A_{x_1,x_2} \cap A_{x_3,x_4}) - \tilde{{{\mathrm{\mathbb {P}}}}}(A_{x_1,x_2})\tilde{{{\mathrm{\mathbb {P}}}}}(A_{x_3,x_4}) \\&\quad \leqslant \tilde{{{\mathrm{\mathbb {P}}}}}(\cup _{i\in \{1,2\}, j\in \{3,4\}} A_{x_i,x_j}^c)[1+1/\tilde{{{\mathrm{\mathbb {P}}}}}(\cap _{i\in \{1,2\}, j\in \{3,4\}} A_{x_i,x_j})] \leqslant c n^{-1/8}, \end{aligned}$$

and hence, combining with (3.22) and using a standard second moment argument again,

$$\begin{aligned} \tilde{{{\mathrm{\mathbb {P}}}}}\left( \sum _{x,y\in [n], x\ne y} \mathbf {1}_{A_{x,y}^c} > n^{-1/10}{n\atopwithdelims ()2}\right) = o(1). \end{aligned}$$

This completes the proof. \(\square \)

4 Proofs of the theorems

Let \(\mathscr {F}\) be a forest consisting of m rooted directed trees, let \(\mathscr {F}_{k,i}\) denote the set of vertices of the i-th tree at oriented distance k from the root level, and let \(\mathscr {F}_k:=\cup _{i=1}^m \mathscr {F}_{k,i}\).

Lemma 4.1

Suppose \(P_{\mathscr {F}, q}\) denotes the law of \(\{\overleftarrow{\xi }^A_t, A\subset \mathscr {F}_0, t\geqslant 0\}\) on the directed forest \(\mathscr {F}\). If \({\tilde{q}}=q\wedge (1/2)\) and if \(|\mathscr {F}_k| \geqslant 2{\tilde{q}}^{-k}|\mathscr {F}_0|\), then

$$\begin{aligned} P_{\mathscr {F}, q}\left( \left| \overleftarrow{\xi }^{\mathscr {F}_0}_k\right| \leqslant |\mathscr {F}_0| \right) \leqslant \exp \left( -c {\tilde{q}}^k |\mathscr {F}_k|^2/\sum _{i=1}^m |\mathscr {F}_{k,i}|^2\right) . \end{aligned}$$

Proof

Let \(m=|\mathscr {F}_0|\). For \(x\in \mathscr {F}_k\) let \(Y_x:=\mathbf {1}\{x \in \overleftarrow{\xi }^{\mathscr {F}_0}_k\}\), and for \(1\leqslant i\leqslant m\) let \(N_i:=\sum _x Y_x \mathbf {1}\{x\in \mathscr {F}_{k,i}\}\). It is easy to see that if \(l(x,z)\) equals half of the distance between x and z in the forest, ignoring the orientation of the edges, then

$$\begin{aligned}&E_{\mathscr {F}, q} Y_x = q^k, \quad E_{\mathscr {F}, q} (Y_x Y_z) = {\left\{ \begin{array}{ll} q^k &{} \text { if } x=z,\\ q^{k+l(x,z)-1} &{} \text { if } 1\leqslant l(x,z) \leqslant k,\\ q^{2k} &{} \text { otherwise, } \end{array}\right. } \text { so that } \\&E_{\mathscr {F}, q} N_i = q^k|\mathscr {F}_{k,i}|, \quad E_{\mathscr {F}, q} N_i^2 = \sum _{x,z \in \mathscr {F}_{k,i}} E_{\mathscr {F}, q} (Y_x Y_z) \in \left[ q^{2k-1}|\mathscr {F}_{k,i}|^2 , q^k|\mathscr {F}_{k,i}|^2\right] . \end{aligned}$$

By our hypothesis, \(\sum _{i=1}^m E_{\mathscr {F}, {\tilde{q}}} N_i \geqslant 2m\). It is easy to see that the hypothesis of Lemma 2.4 is satisfied for the random variables \(\{N_i\}\) under the probability distribution \(P_{\mathscr {F},{\tilde{q}}}\), as \({\tilde{q}}\leqslant 1/2\). So applying this lemma

$$\begin{aligned} P_{\mathscr {F}, {\tilde{q}}}\left( \left| \overleftarrow{\xi }^{\mathscr {F}_0}_k\right| \leqslant m\right) &= P_{\mathscr {F}, {\tilde{q}}}\left( \sum _{i=1}^m N_i \leqslant m\right) \leqslant \exp \left( tm + \sum _{i=1}^m \log E_{\mathscr {F}, {\tilde{q}}} e^{-t N_i}\right) \\ &\leqslant \exp \left( tm + \sum _{i=1}^m \left[ -t E_{\mathscr {F}, {\tilde{q}}} N_i + \left( E_{\mathscr {F}, {\tilde{q}}}N_i^2 - [E_{\mathscr {F}, {\tilde{q}}} N_i]^2\right) t^2/2\right] \right) \\ &\leqslant \exp \left( -\sum _{i=1}^m \left[ (t/2) E_{\mathscr {F}, {\tilde{q}}} N_i - \left( E_{\mathscr {F}, {\tilde{q}}}N_i^2-\left[ E_{\mathscr {F}, {\tilde{q}}}N_i\right] ^2\right) t^2/2\right] \right) \end{aligned}$$

for any \(t\geqslant 0\). Optimizing the last expression over t, and noting that \(at-bt^2/2 \leqslant a^2/(2b)\) for any \(a, b>0\) (with equality at \(t=a/b\)), we have

$$\begin{aligned} P_{\mathscr {F}, {\tilde{q}}}\left( \left| \overleftarrow{\xi }^{\mathscr {F}_0}_k\right| \leqslant m\right) &\leqslant \exp \left( -\frac{\left( \sum _{i=1}^m E_{\mathscr {F}, {\tilde{q}}} N_i\right) ^2}{8\sum _{i=1}^m \left( E_{\mathscr {F}, {\tilde{q}}} N_i^2 - (E_{\mathscr {F}, {\tilde{q}}} N_i)^2\right) }\right) \\ &\leqslant \exp \left( -\frac{{\tilde{q}}^{2k}/8}{{\tilde{q}}^k-{\tilde{q}}^{2k}} \left( \sum _{i=1}^m\left| \mathscr {F}_{k,i}\right| \right) ^2\Big /\sum _{i=1}^m\left| \mathscr {F}_{k,i}\right| ^2\right) . \end{aligned}$$
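For the record, the substitution behind the last display is a routine check against the moment estimates computed at the start of the proof:

```latex
\begin{aligned}
\sum _{i=1}^m E_{\mathscr {F}, {\tilde{q}}} N_i
  &= {\tilde{q}}^{\,k}\sum _{i=1}^m |\mathscr {F}_{k,i}| ,\\
E_{\mathscr {F}, {\tilde{q}}} N_i^2 - \bigl(E_{\mathscr {F}, {\tilde{q}}} N_i\bigr)^2
  &\leqslant \bigl({\tilde{q}}^{\,k}-{\tilde{q}}^{\,2k}\bigr)|\mathscr {F}_{k,i}|^2 ,\\
\frac{{\tilde{q}}^{\,2k}}{{\tilde{q}}^{\,k}-{\tilde{q}}^{\,2k}}
  &= \frac{{\tilde{q}}^{\,k}}{1-{\tilde{q}}^{\,k}} \geqslant {\tilde{q}}^{\,k},
\end{aligned}
```

and the last inequality is what allows the final bound to be stated with the single constant \(c=1/8\).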

This completes the proof with \(c=1/8\), as \(P_{\mathscr {F},q}\) stochastically dominates \(P_{\mathscr {F},{\tilde{q}}}\). \(\square \)

Proposition 4.2

Let \({\tilde{q}}:=q\wedge (1/2)\) and \(\delta >0\) be small, and \(\epsilon _{3.1}, \{K_m\}, \mathcal {G}^1_n\) be as in Proposition 3.1. There are constants \(C_{4.2}, b > 0\) such that if \(\mathscr {G}_n \in \mathcal {G}^1_n\) and \(A\subset [n]\) has size \(m\leqslant \epsilon _{3.1} n\), then

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \left| \overleftarrow{\xi }^A_{K_{m}}\right| \leqslant m\right) \leqslant \exp \left( -C_{4.2} m(\log (n/m))^{-b}\right) . \end{aligned}$$

Proof

For \(\mathscr {G}_n \in \mathcal {G}^1_n\), any \(A \subset [n]\) with \(|A|=m\leqslant \epsilon _{3.1} n\) and \(\delta >0\) small (as in Proposition 3.1), define

$$\begin{aligned} \tau _A := \left\{ x\in A: \left| O^{A,\{x\}}_{K_m}\right| \geqslant (4/\delta ){\tilde{q}}^{-K_m}\right\} . \end{aligned}$$

Clearly \(|\tau _A| \geqslant \delta |A|\), because otherwise \(B=A{\setminus }\tau _A\) would have \(|B| \geqslant (1-\delta )|A|\) and

$$\begin{aligned} \left| O^{A,B}_{K_m}\right| \leqslant \sum _{x\in B} \left| O^{A,\{x\}}_{K_m}\right| <(4/\delta ){\tilde{q}}^{-K_m} |B| \end{aligned}$$

by the definition of \(\tau _A\). This contradicts the fact that \(\mathscr {G}_n \in \mathcal {G}^1_n\).

Let \(\mathscr {F}\) be the subgraph of \(\{Z^{A,\tau _A}_t\}_{t=0}^{K_m}\) induced by the vertex set

$$\begin{aligned} \cup _{x\in \tau _A} \left( \cup _{t=0}^{K_m-1} O^{A,\{x\}}_t \cup \Pi \left( \left\lceil (4/\delta ){\tilde{q}}^{-K_m}\right\rceil , O^{A,\{x\}}_{K_m}\right) \right) . \end{aligned}$$

So \(\mathscr {F}\) is a labeled directed forest of depth \(K_m\) such that \(|\mathscr {F}_0| \geqslant \delta m\) and \(|\mathscr {F}_{K_m,i}|=\lceil (4/\delta ){\tilde{q}}^{-K_m}\rceil \) for all i. Applying Lemma 4.1 with k replaced by \(K_m\) and m replaced by \(\delta m\), and noting that \(|\overleftarrow{\xi }^A_{K_m}|\) stochastically dominates \(|\overleftarrow{\xi }^{\mathscr {F}_0}_{K_m}|\), we get

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \left| \overleftarrow{\xi }^A_{K_m}\right| \leqslant m\right) \leqslant \exp \left( -\frac{1}{8} {\tilde{q}}^{K_m} \delta m\right) . \end{aligned}$$

This proves the result. \(\square \)
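For completeness, the forest built above does satisfy the hypothesis of Lemma 4.1, and since all the sets \(\mathscr {F}_{K_m,i}\) have the same size, the lemma's exponent simplifies exactly as used in the last display:

```latex
\begin{aligned}
|\mathscr {F}_{K_m}| &= |\mathscr {F}_0|\,\bigl\lceil (4/\delta ){\tilde{q}}^{-K_m}\bigr\rceil
  \geqslant 2{\tilde{q}}^{-K_m}|\mathscr {F}_0| ,\\
{\tilde{q}}^{K_m}|\mathscr {F}_{K_m}|^2\Big/\sum _{i} |\mathscr {F}_{K_m,i}|^2
  &= {\tilde{q}}^{K_m}|\mathscr {F}_0| \geqslant {\tilde{q}}^{K_m}\delta m .
\end{aligned}
```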

Proof of Theorems 1.7 and 1.9

We take \(\mathcal {G}_n:=\mathcal {G}^1_n \cap \mathcal {G}^2_n\), where \(\mathcal {G}^1_n\) and \(\mathcal {G}^2_n\) are as in Propositions 3.1 and 3.2, respectively, and we will see that

$$\begin{aligned} \Delta :=\frac{1}{2} C_{4.2}\epsilon _{3.1}[\log (1/\epsilon _{3.1})]^{-b} \end{aligned}$$

will suffice, where \(C_{4.2}, b\) are as in Proposition 4.2. Clearly \({{\mathrm{\mathbb {P}}}}_{i,n}(\mathcal {G}_n)=1-o(1)\). Define

$$\begin{aligned} T_x:=\inf \left\{ t\geqslant 1: \left| \overleftarrow{\xi }^{\{x\}}_t\right| \geqslant \epsilon _{3.1} n\right\} . \end{aligned}$$

We take \(T_x=\infty \) if \(|\overleftarrow{\xi }^{\{x\}}_t|\) never reaches \(\epsilon _{3.1} n\). Recalling the definition of the event \(A_x\) from (3.20) and then applying Proposition 4.2, for \(\mathscr {G}_n \in \mathcal {G}_n\) we have

$$\begin{aligned}&{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q} \left( A_x \cap \left\{ T_x > 2a\log \log n/\log (q{\tilde{r}}) + \sum ^{\epsilon _{3.1} n-1}_{m=(\log n)^a} K_m\right\} \right) \nonumber \\&\quad \leqslant \sum _{m=(\log n)^a}^{\epsilon _{3.1} n} \exp \left( -C_{4.2} m(\log (n/m))^{-b}\right) \nonumber \\&\quad \leqslant n \exp \left( -C_{4.2} (\log n)^a(\log (n/(\log n)^a))^{-b}\right) = o(1/n) \end{aligned}$$
(4.1)

if a is large enough. For \(i\geqslant 1\), if \(|\overleftarrow{\xi }^{\{x\}}_{T_x + (i-1) K_m}| \geqslant \epsilon _{3.1} n\), then we can again apply Proposition 4.2 with A replaced by any subset of \(\overleftarrow{\xi }^{\{x\}}_{T_x+(i-1)K_m}\) consisting of \(\epsilon _{3.1} n\) vertices to get

$$\begin{aligned}&{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \left\{ \left| \overleftarrow{\xi }^{\{x\}}_{T_x + i K_m}\right| < \epsilon _{3.1} n\right\} \cap \left\{ \left| \overleftarrow{\xi }^{\{x\}}_{T_x+(i-1)K_m}\right| \geqslant \epsilon _{3.1} n\right\} \right) \\&\quad \leqslant \exp \left( -C_{4.2} \epsilon _{3.1}[\log (1/\epsilon _{3.1})]^{-b}n\right) , \end{aligned}$$

which in turn implies

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \left\{ \left| \overleftarrow{\xi }^{\{x\}}_{T_x + e^{\Delta n} K_m}\right| < \epsilon _{3.1} n\right\} \cap \{T_x < \infty \}\right) \leqslant e^{\Delta n}e^{-2\Delta n}=o(1/n). \end{aligned}$$
(4.2)

Combining (4.1) and (4.2) and using a union bound,

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \cup _{x\in [n]} \left[ A_x \cap \left\{ \overleftarrow{\xi }^{\{x\}}_{\exp (\Delta n)} = \emptyset \right\} \right] \right) \leqslant n \cdot o(1/n)=o(1). \end{aligned}$$

This, together with the duality relationship between \(\xi _t\) and \(\overleftarrow{\xi }_t\), implies

$$\begin{aligned}&{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \xi ^{[n]}_{\exp (\Delta n)} \supset \{x\in [n]: A_x \text { occurs}\}\right) \nonumber \\&\qquad ={{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \overleftarrow{\xi }^{\{x\}}_{\exp (\Delta n)} \ne \emptyset \text { for every } x \text { such that } A_x \text { occurs}\right) = 1-o(1). \end{aligned}$$
(4.3)

Now in order to estimate the size of \(\{x \in [n]: A_x \text { occurs}\}\), we will use a second moment argument for \(\sum _{x\in [n]} \mathbf {1}_{A_x}\). Note that

$$\begin{aligned}&E_{\mathscr {G}_n,q}\left[ \sum _{x\in [n]} \mathbf {1}_{A_x} - \sum _{x\in [n]} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x)\right] ^2 \\&\quad = \sum _{x,y\in [n]} \left[ {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x \cap A_y) -{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x) {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_y)\right] . \end{aligned}$$

Recalling the definition of the event \(A_{x,y}\) from (3.22): if \(A_{x,y}\) occurs, then the corresponding summand in the above sum is 0; otherwise the summand is at most 1. Keeping this observation in mind and using the fact that \(\mathscr {G}_n \in \mathcal {G}^2_n\),

$$\begin{aligned} E_{\mathscr {G}_n,q}\left[ \sum _{x\in [n]} \mathbf {1}_{A_x} - \sum _{x\in [n]} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x)\right] ^2 \leqslant n + \sum _{x,y\in [n], x\ne y} \mathbf {1}_{A_{x,y}^c} \leqslant n + {n\atopwithdelims ()2} o(1). \end{aligned}$$

The same estimate holds if we replace \(A_x\) by \(\tilde{A}_x\) in the above display. Also, by Proposition 3.2, \(n(\pi -\varepsilon ) \leqslant \sum _{x\in [n]} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x)\) and \(\sum _{x\in [n]} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(\tilde{A}_x) \leqslant n(\pi +\varepsilon )\) for \(\mathscr {G}_n \in \mathcal {G}^2_n\). Therefore, by Chebyshev's inequality,

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( (\pi -2\varepsilon ) n \leqslant \sum _{x\in [n]} \mathbf {1}_{A_x}, \sum _{x\in [n]} \mathbf {1}_{\tilde{A}_x} \leqslant (\pi +2\varepsilon ) n\right) = 1-o(1). \end{aligned}$$
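The Chebyshev step behind the last display is the standard computation, spelled out here for \(\sum _{x} \mathbf {1}_{A_x}\) (the \(\tilde{A}_x\) sum is handled identically) using the variance bound above:

```latex
{{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \Bigl|\sum _{x\in [n]} \mathbf {1}_{A_x}
  - \sum _{x\in [n]} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}(A_x)\Bigr| \geqslant \varepsilon n\right)
  \leqslant \frac{n + {n\atopwithdelims ()2}\, o(1)}{\varepsilon ^2 n^2} = o(1).
```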

Combining this with (4.3),

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \left| \xi ^{[n]}_{\exp (\Delta n)}\right| \geqslant n(\pi -2\varepsilon )\right) =1-o(1). \end{aligned}$$

Also, using the duality between \(\xi _t\) and \(\overleftarrow{\xi }_{t}\) again, for any \(t\geqslant t_n\)

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \left| \xi ^{[n]}_{t}\right| > n(\pi +2\varepsilon )\right) = {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \sum _{x\in [n]} \mathbf {1}_{\tilde{A}_x} > (\pi +2\varepsilon ) n\right) = o(1). \end{aligned}$$

So the required result follows from the attractiveness of the threshold contact process.

To prove the last assertion, suppose \(q<1/r\). Then we take

$$\begin{aligned} \mathcal {G}_n := \left\{ \mathscr {G}_n: {{\mathrm{\mathbb {E}}}}_{\mathscr {G}_n,q}\left| \overleftarrow{\xi }^{\{x\}}_{C\log n}\right| \leqslant n^{C\log (qr)/2} \text { for all } x \in [n]\right\} \end{aligned}$$

for some constant \(C>0\). In order to see that \({{\mathrm{\mathbb {P}}}}_{i,n}(\mathcal {G}_n)=1-o(1)\), recall the definitions of \({{\mathrm{\mathbb {P}}}}\) and \(\tilde{{{\mathrm{\mathbb {P}}}}}\) from the proof of Proposition 3.1. Using a union bound and Markov's inequality,

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}((\mathcal {G}_n)^c) \leqslant \sum _{x \in [n]} n^{-C\log (qr)/2}{{\mathrm{\mathbb {E}}}}\left( {{\mathrm{\mathbb {E}}}}_{\mathscr {G}_n,q}\left| \overleftarrow{\xi }^{\{x\}}_{C\log n}\right| \right) \leqslant n \cdot n^{-C\log (qr)/2}\cdot n^{C\log (qr_n)}. \end{aligned}$$
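Here the last factor comes from the branching-process domination invoked below: \({{\mathrm{\mathbb {E}}}}\bigl({{\mathrm{\mathbb {E}}}}_{\mathscr {G}_n,q}|\overleftarrow{\xi }^{\{x\}}_{C\log n}|\bigr) \leqslant (qr_n)^{C\log n} = n^{C\log (qr_n)}\). Since \(q<1/r\) forces \(\log (qr)<0\), and \(r_n \rightarrow r\), the whole bound indeed vanishes once C is large:

```latex
n \cdot n^{-C\log (qr)/2}\cdot n^{C\log (qr_n)}
  = n^{\,1 + C\log (qr)/2 + o(1)} \longrightarrow 0
  \qquad \text{once } C > 2/\log \bigl(1/(qr)\bigr).
```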

Since a branching process starting from x with offspring distribution \(\mathbf {p}_n\) stochastically dominates \(|\overleftarrow{\xi }^{\{x\}}_t|\), \(\tilde{{{\mathrm{\mathbb {P}}}}}((\mathcal {G}_n)^c)\) satisfies the same upper bound. This, together with the fact that \(r_n \rightarrow r\), implies \({{\mathrm{\mathbb {P}}}}_{i,n}(\mathcal {G}_n)=1-o(1)\). Finally, for \(\mathscr {G}_n \in \mathcal {G}_n\), again using a union bound and Markov's inequality,

$$\begin{aligned} {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \xi ^{[n]}_{C\log n} \ne \emptyset \right) = {{\mathrm{\mathbb {P}}}}_{\mathscr {G}_n,q}\left( \overleftarrow{\xi }^{[n]}_{C\log n} \ne \emptyset \right) \leqslant \sum _{x \in [n]} {{\mathrm{\mathbb {E}}}}_{\mathscr {G}_n,q}\left| \overleftarrow{\xi }^{\{x\}}_{C\log n}\right| =o(1) \end{aligned}$$

if C is chosen large enough. \(\square \)