1 Introduction

Swarming is a ubiquitous natural phenomenon that occurs at all levels of the animal kingdom, from bacterial colonies to schools of fish and human crowds. One of the simplest ways to reproduce swarming mathematically is an aggregation model. In this model, each individual is represented by a particle that typically interacts with all other particles according to a specified potential function. It is usually assumed that the particles repel each other at short distances and attract each other at longer distances; in many cases this leads to the formation of swarms. This model and its variants are also often used in the robotics literature [13, 14]. As has been observed recently in the literature, the behavior of the interaction potential can lead to highly complex patterns [20, 32, 34, 35]. Most notably, the potential can be classified according to whether the system is “confining” or non-confining in the large N limit (where N is the number of particles). For second-order systems the corresponding terms are H-stable vs. catastrophic, describing the energy per particle in the limit of large N [6, 10]. In this paper we are interested in the confining case, in which the N→∞ limit has stable solutions confined to some finite region, typically approximating a continuum density (possibly concentrated as a measure). The paper focuses specifically on a simple model in one dimension in which the stable solution has two clusters of points regardless of the number of agents or the underlying interaction structure.

In practice, and especially for large system sizes, the assumption of all-to-all interaction is often unrealistic. In robotics, for instance, all-to-all communication can prove prohibitively expensive for a large number of robots. Related models also frequently demand a theoretical understanding of how a (possibly dynamic) random network interaction structure affects well-understood, deterministic behaviors such as phase transitions [1], consensus and synchronization [26, 31], and the emergence of collective behavior in locust swarms [18]. Thus it is quite relevant to understand the stability of such system properties in the presence of a relatively sparse network or a random interaction topology. In this paper we study what happens to the system when the assumption of all-to-all communication is relaxed. Specifically, we consider the aggregation model on a network, where the particles are represented by vertices on a graph that interact only if there is an edge between them. The basic particle model then becomes

$$ \frac{\mathrm{d} x_{i}}{\mathrm{d} t}=\frac{1}{N}\sum_{j=1}^{N}e_{ij}F\bigl(\vert x_{i}-x_{j}\vert \bigr)\frac{x_{i}-x_{j}}{\vert x_{i} -x_{j}\vert }, \quad i=1,\ldots,N. $$
(1)

We take F(r) to be a repulsive-attractive force, i.e. positive for small r and negative for large r. The coefficients e_{ij} encode the connectivity between the particles x_{i} and x_{j}: e_{ij}=1 if the vertices i and j interact and e_{ij}=0 otherwise. We assume that e_{ij}=e_{ji}, so that the underlying graph is undirected, and in addition that F(0)=0, so that the repulsion is “weak” at the origin. An example of a simple continuous force satisfying these conditions is

$$ F(r)=\min ( \kappa r,1-r ) ,\quad \kappa>0. $$
(2)
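As a concrete illustration, the right-hand side of the particle model (1) with the force (2) can be sketched in a few lines of Python (numpy assumed; the function names are ours, not from the paper):

```python
import numpy as np

def force(r, kappa=0.5):
    """F(r) = min(kappa * r, 1 - r): repulsive for 0 < r < 1, attractive for r > 1."""
    return np.minimum(kappa * r, 1.0 - r)

def rhs(x, e, kappa=0.5):
    """Right-hand side of the particle model (1) for positions x and 0/1 adjacency e."""
    N = len(x)
    diff = x[:, None] - x[None, :]      # diff[i, j] = x_i - x_j
    r = np.abs(diff)
    safe = np.where(r == 0.0, 1.0, r)   # coincident particles exert no force (F(0) = 0)
    return (e * force(safe, kappa) * diff / safe).sum(axis=1) / N
```

At the two-cluster state (3) with all-to-all coupling, every pairwise distance is either 0 or 1 and F(0)=F(1)=0, so the right-hand side vanishes, consistent with (3) being an equilibrium.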

For such a force, it was shown in [12, 19, 32] that in the case of a fully connected graph, the equilibrium state can consist of several clusters. For two clusters, this equilibrium is simply

$$ x_{1}=\cdots=x_{n}=-1/2, \qquad x_{n+1}=\cdots=x_{N}=1/2, \qquad n = N/2. $$
(3)

An example of the evolution to such an equilibrium is shown in the top-left corner of Fig. 1.

Fig. 1

Dynamics of the aggregation model (1) for a p-random graph (i.e. with e_{ij} given by (4)) for a range of different p with N=150 and κ=0.5. The y-axis denotes time

We refer to this system as a compromise model because the agents in each group prefer to remain a fixed distance away from all other agents; however, their attraction to the other group forces them to coexist at the same location with half of the total agents. Under all-to-all interactions, this particular steady state is stable provided that 0<κ<1. The main issue that concerns us is how the lack of full connectivity affects the stability of this two-cluster solution. As a simple model in this direction, we consider an Erdős-Rényi random graph model \(\mathcal{G}(N,p)\) for the interaction structure, for which

$$ e_{ij}=\left \{ \begin{array}{@{}l@{\quad }l}1&\mbox{with probability }p, \\[2pt] 0&\mbox{with probability }q=1-p, \end{array} \right . $$
(4)

and investigate how the stability of this equilibrium changes as p decreases. It is well known that such a random graph \(\mathcal{G}(N,p)\) undergoes a connectivity phase transition at p=log N/N. Roughly speaking, for the critical scaling \(p=p_{0}\frac{\log N}{N}\) and in the limit of large N, the graph is disconnected with high probability provided that p_{0}<1 and is connected when p_{0}>1. While connectivity of the underlying graph is a necessary condition for the stability of the two-cluster steady state, as we shall show it is certainly not sufficient.
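A draw of the interaction matrix (4) can be sketched as follows (numpy assumed; we symmetrize an upper-triangular Bernoulli sample so that e_ij = e_ji and e_ii = 0):

```python
import numpy as np

def sample_edges(N, p, seed=None):
    """Symmetric 0/1 adjacency matrix of an Erdos-Renyi graph G(N, p), Eq. (4)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((N, N)) < p, k=1)   # i.i.d. Bernoulli(p) above diagonal
    return (upper | upper.T).astype(int)           # mirror to enforce e_ij = e_ji
```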

Figure 1 illustrates how the fully non-linear dynamics change as p decreases. In each simulation, and in the entirety of the analysis to follow, we first draw a random graph from \(\mathcal {G}(N,p)\) to determine the interaction structure e_{ij} and then solve the deterministic ODEs (1) numerically. That is, the edges e_{ij} are random initially but remain fixed in time. Note that the long-time dynamics and equilibrium remain the same as in the fully connected case, even for relatively small p. When p becomes too small (around p≈0.3 for the parameters of Fig. 1), the system undergoes a phase transition and the two-cluster steady state appears to lose its stability. This observation naturally leads to the following question: how small (resp. large) can we take p and still guarantee stability (resp. instability) of the two-cluster steady state? To formulate this question precisely, we focus on linear stability theory as N→∞. As a result of the translation invariance of the steady state, the linearized problem always has at least one zero eigenvalue. We shall therefore call the two-cluster solution stable asymptotically almost surely if the probability that all remaining eigenvalues are negative tends to one as N→∞. Similarly, we call the two-cluster solution unstable asymptotically almost surely if the probability that at least one of the remaining eigenvalues is non-negative tends to one as N→∞.

The main results of the paper follow from a linear stability analysis of the equilibrium solution (3). We write the N×N adjacency matrix E of the random graph in terms of n×n blocks,

$$E = \begin{pmatrix} A & B\\ B^{t} & C \end{pmatrix} , $$

to emphasize the distinction between the two groups. Evaluating the Jacobian of (1) at the equilibrium (3) leads to the linearized system for \(\boldsymbol{\varPhi}=(\varPhi_{1},\ldots,\varPhi_{n})^{t}\in\mathbb{R}^{n}\) and \(\boldsymbol{\varPsi}=(\varPsi_{1},\ldots,\varPsi_{n})^{t}\in\mathbb{R}^{n}\),

$$ \left \{ \begin{array}{@{}l}\lambda\varPhi_{i}=\sum^{n}_{j=1} \kappa a_{ij} ( \varPhi_{i}-\varPhi_{j} ) +\sum_{j=1}^{n}b_{ij} ( \varPsi_{j}-\varPhi_{i} ), \\[3pt] \lambda\varPsi_{i}=\sum_{j=1}^{n} b_{ji} ( \varPsi_{j}-\varPsi_{i} ) +\sum_{j=1}^{n}\kappa c_{ij} ( \varPsi_{i} - \varPsi_{j} ). \end{array} \right . $$
(5)

We can re-write this in the form

$$\lambda \left ( \begin{array}{c}\boldsymbol{\varPhi}\\[2pt] \boldsymbol{\varPsi}\end{array} \right ) =L\left ( \begin{array}{c}\boldsymbol{\varPhi}\\[2pt] \boldsymbol{\varPsi}\end{array} \right ) $$

where L denotes the N×N matrix

$$ L=\kappa \begin{pmatrix} D_{A}-A & 0\\ 0 & D_{C}-C \end{pmatrix} - \begin{pmatrix} D_{B} & -B\\ -B^{t} & D_{B^{t}} \end{pmatrix} :=\kappa L_{1}-L_{2}. $$
(6)

Here, D_{A} (respectively D_{B}, D_{C}) denotes the n×n diagonal matrix formed from the row sums of A (respectively B, C), i.e.

$$(D_{A})_{ii}=\sum_{j=1}^{n}a_{ij}. $$

The matrices L_{1} and L_{2} therefore equal the graph Laplacians of two subgraphs of E: the first subgraph contains only those edges that do not connect the two groups, while the second contains only those edges that do. We can therefore interpret the eigenvalue problem for L as one for a difference of positive semi-definite Laplacian matrices.
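To make the construction concrete, the following sketch (numpy assumed; the function name is ours) assembles L = κL₁ − L₂ directly from the within-group and between-group Laplacians:

```python
import numpy as np

def stability_matrix(e, kappa):
    """L = kappa * L1 - L2 from Eq. (6), for an N x N adjacency e with N even."""
    n = e.shape[0] // 2
    within = e.copy().astype(float)
    within[:n, n:] = 0.0
    within[n:, :n] = 0.0                 # keep only edges inside each group
    between = e - within                 # keep only edges joining the two groups
    L1 = np.diag(within.sum(axis=1)) - within
    L2 = np.diag(between.sum(axis=1)) - between
    return kappa * L1 - L2
```

Both L₁ and L₂ annihilate the constant vector, so L always has the zero eigenvalue associated with translation invariance.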

Our main goal is to characterize the stability regime and transition to instability for the two-cluster steady state. Our approach is two-fold. On one hand, we use estimates from random matrix theory to derive rigorous bounds on the transition regime. On the other hand, we combine heuristic arguments, formal asymptotics and extensive numerics to derive sharp estimates for the transition regime. We summarize our rigorous result as follows.

Theorem 1.1

(Rigorous bounds for stability)

Define F by (2) for 0<κ<1 and let \(E=\{e_{ij}\}\) denote the N×N adjacency matrix of an Erdős-Rényi random graph \(\mathcal{G}(N,p)\). If

$$ p\geq\epsilon\frac{\log^{3/2} N}{N}$$
(7)

for some ϵ>0 then the steady state (3) is stable asymptotically almost surely. If

$$ p \leq2(1-\epsilon)\frac{\log(N)}{N}$$
(8)

for some ϵ>0 then the steady state (3) is unstable asymptotically almost surely.

The second statement is consistent with the fact that connectivity of the underlying graph is a necessary condition for stability of the compromise solution. However, connectivity alone does not suffice to guarantee convergence to the compromise state. The simulations in Fig. 1 highlight this fact: in each case the realization of the random graph is connected, so each particle influences every other particle, yet for the smaller values of p the compromise state is not reached. Figure 2 demonstrates further that, in general, stability of the compromise solution also depends on the parameter κ: the connected graph in each simulation is identical, yet stability fails as κ increases. This establishes once again that connectivity of the graph is not the determining factor; rather, a more complicated interplay between the system parameter κ and the structure of the graph determines stability of the compromise.

Fig. 2

Comparison of the dynamics between the consensus model (10) and the aggregation model (1, 2). Here, N=150 and p=0.2. The network topology is fixed at t=0 and is identical for all four simulations

Using heuristic methods, we make the following conjecture for this dependence based on asymptotics of the transition regime:

Conjecture 1.1

(Heuristic asymptotics of the transition regime)

Define F by (2) for 0<κ<1 and let \(E=\{e_{ij}\}\) denote the N×N adjacency matrix of an Erdős-Rényi random graph \(\mathcal{G}(N,p)\). There exists a constant \(p_{0c}\) (independent of N) with the following property. If

$$p\geq(p_{0c}+\epsilon)\frac{\log N}{N} $$

for some ϵ>0 then the steady state (3) is stable asymptotically almost surely. If

$$p\leq(p_{0c}-\epsilon)\frac{\log N}{N} $$

then the steady state (3) is unstable asymptotically almost surely.

The asymptotic calculations in Sect. 3 suggest that in fact

$$ p_{0c}=\frac{2}{2-\kappa^{-\kappa/(\kappa+1)} ( 1+\kappa ) }. $$
(9)

We compare this formula and the simplifying assumptions used to derive it with numerical experiments in Sect. 3.2.
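For reference, the conjectured coefficient (9) is elementary to evaluate; a minimal sketch (plain Python, function name ours):

```python
def p0c(kappa):
    """Conjectured critical coefficient p_0c of Eq. (9), for 0 < kappa < 1."""
    return 2.0 / (2.0 - kappa ** (-kappa / (kappa + 1.0)) * (1.0 + kappa))
```

Two sanity checks on the formula itself: p_0c → 2 as κ → 0⁺, and p_0c grows without bound as κ → 1⁻, where stability of the complete-graph equilibrium is lost.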

1.1 Connection to Consensus Algorithms

Equation (6) generalizes the classical consensus algorithm on an undirected graph. Consensus occurs when all the agents approach a common value in the long time limit. This problem has attracted scholars from many areas such as physics, control theory and biology. Specifically, the case of single-integrator, linear dynamics taking the form

$$ \frac{ \mathrm {d}x_{i} }{\mathrm {d}t} = \sum^{N}_{j\neq i} e_{ij} (x_{j}-x_{i}), \quad x_{i} \in \mathbb{R}^m, $$
(10)

is especially well-studied in control theory. Here \(E = \{ e_{ij} \}^{N}_{i,j=1}\) denotes the weighted adjacency matrix of the underlying graph \(\mathcal{G}(N)\): e_{ij}>0 if two nodes are connected and e_{ij}=0 otherwise. The above equation can be written in matrix form,

$$\dot{\mathbf {x}}= -[L \otimes \mathrm {Id}] \mathbf {x}, $$

where \(\mathbf{x}=(x_{1},\ldots,x_{N})\) is the stacked vector of all node states and L is the graph Laplacian matrix. While in general the consensus problem is asymmetric, we consider the symmetric case e_{ij}=e_{ji} to emphasize the similarity with our compromise problem.

Definition 1.1

The system (10) reaches consensus if, for all initial conditions x_{i}(0) and all 1≤i,j≤N, it holds that ∥x_{i}(t)−x_{j}(t)∥→0 as t→∞.

As shown by Ren [27, 29, 30], Olfati-Saber [24, 25], Moreau [22, 23], and Cao [5], connectivity plays a central role in the stability of cooperative control algorithms. For directed graphs with a fixed topology (i.e. the e_{ij} are constant in time), the main theorem regarding consensus is the following [28]:

Theorem 1.2

The system (10) reaches consensus if and only if the directed graph \(\mathcal{G}(N)\) has a directed spanning tree. In this case, \(x_{i}(t) \to\sum^{N}_{j=1} \nu_{j}x_{j}(0)\) as t→∞, where \(\mathbf{\nu}=[\nu_{1},\dots,\nu_{N}]^{T}\geq0\), 1^{T}ν=1, and L^{T}ν=0.

In the case of a fully connected, undirected graph this reduces to the well-known fact that the consensus is the center of mass of the initial data. The above theorem also has a natural simplification in the case of general undirected graphs:

Corollary 1.3

For an undirected graph \(\mathcal{G}(N)\) (e ij =e ji ), the system (10) reaches consensus if and only if \(\mathcal {G}(N)\) is connected.
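A minimal numerical illustration of Corollary 1.3 (numpy assumed; forward Euler with arbitrary step size and horizon): on a connected undirected graph, (10) drives all states to a common value, and for the complete graph that value is the mean of the initial data.

```python
import numpy as np

def run_consensus(e, x0, dt=0.01, steps=2000):
    """Integrate dx/dt = -(D - E) x by forward Euler for a symmetric adjacency e."""
    L = np.diag(e.sum(axis=1)) - e      # graph Laplacian
    x = x0.astype(float).copy()
    for _ in range(steps):
        x = x - dt * (L @ x)
    return x
```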

We now connect these results to well-known results for graph connectivity of Erdős-Rényi graphs in the large N limit [2, 4, 11]. These results are not new, but the connection between the two problems has not been emphasized in the literature. We state the main result in this direction to have a point of comparison for our results on the compromise model.

Theorem 1.4

Consider Eq. (10) in the case of an Erdős-Rényi graph \(\mathcal{G}(N,p)\). The graph is constructed by independently choosing e_{ij}=e_{ji} equal to one with probability p∈(0,1) and zero otherwise at initialization; it then remains constant in time. If for some ϵ>0

$$p \geq(1+\epsilon)\frac{\log N}{N}$$

then the problem (10) reaches consensus asymptotically almost surely. If for some ϵ>0

$$p\leq(1-\epsilon)\frac{\log N}{N}$$

then the problem (10) fails to reach consensus asymptotically almost surely.

In the body of this paper we find that the critical probability for convergence to the compromise solution, in the compromise model, differs from the critical probability for graph connectivity; the threshold has the same log N/N scaling but with a different coefficient. This is illustrated in Fig. 2. In this set of experiments, we fix p=0.2, N=150 and run both the consensus model and the compromise model with an identical underlying network topology and initial condition in all simulations. We study the effect of varying κ from κ=−1 (the consensus limit) to positive values of κ for the compromise model. The underlying graph is connected; because of this, the consensus model (10) quickly reaches the consensus state. The compromise model (1), (2) reaches the “compromise” state when κ=0.3; however, it is unstable when κ=0.4 or higher. This clearly shows that the compromise model can lead to unstable configurations even when the underlying graph is connected.

1.2 Preliminary Material

Before we address our main task, we first introduce our notation along with the background material from linear algebra, probability theory and random matrix theory that we shall use. Capital roman letters such as A,B,C refer to matrices while the corresponding lower-case letters a_{ij},b_{ij},c_{ij} denote their respective entries. We reserve Id for the identity matrix and 1=(1,…,1)^{t} for the constant vector; the size of both will be clear from context. The italicized e_{ij} will always denote the edges of the graph under consideration, whereas the roman “e” denotes the base of the natural logarithm. For an n×n symmetric matrix A, we let λ_{i}(A) denote its ith eigenvalue, sorted in decreasing order. In other words, we have

$$ \lambda_{1}(A) \geq\lambda_{2}(A) \geq \cdots\geq \lambda_{n}(A), $$
(11)

where each eigenvalue appears according to its algebraic multiplicity. Under this convention, a standard identity from linear algebra, the Courant-Fischer formula, provides a useful characterization of the ith eigenvalue, cf. [33]. This formula yields a system of eigenvalue inequalities, Weyl’s inequalities, that will also prove useful, cf. [33].

Lemma 1.5

(Courant-Fischer Max-Min formula)

Let A denote an n×n symmetric matrix and V a subspace of ℝ^{n}. Then

$$ \lambda_{i}(A) = \max_{V: \mathrm{dim}(V) = i} \ \min_{ \mathbf{v} \in V : \|v\|_{2} = 1} \langle\mathbf{v}, A \mathbf{v} \rangle $$
(12)

Lemma 1.6

(Weyl’s inequalities)

Let A and B denote symmetric n×n matrices. Then for any i,j such that 1≤i,j,i+j−1≤n,

$$ \lambda_{i+j-1}(A+B) \leq\lambda_{i}(A) + \lambda_{j}(B). $$
(13)
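The inequalities (13) are easy to spot-check numerically (numpy assumed; the tolerance guards against floating-point error):

```python
import numpy as np

def eigs_desc(M):
    """Eigenvalues of a symmetric matrix, sorted per the convention (11)."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]

def weyl_holds(A, B, i, j, tol=1e-10):
    """Check lambda_{i+j-1}(A + B) <= lambda_i(A) + lambda_j(B), 1-based indices."""
    return eigs_desc(A + B)[i + j - 2] <= eigs_desc(A)[i - 1] + eigs_desc(B)[j - 1] + tol
```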

Given a sequence of measurable events \(\{ W_{n} \}^{\infty}_{n=1}\), each of which lies in some (possibly different) probability space, we say that W n holds asymptotically almost surely (a.a.s.) if

$$\mathbb{P}(W_{n}) \rightarrow1 \quad\mbox{as } n \rightarrow \infty. $$

Here and in what follows, ℙ always denotes the measure on the probability space in which the relevant event lies. We denote by \(X_{p}\) a Bernoulli random variable with parameter p, i.e.

$$ \mathbb{P}( X_{p} = 1) = p \quad\mbox{and} \quad \mathbb{P}(X_{p} = 0) = q := 1-p, $$
(14)

whereas \(\tilde{X}_{p} := X_{p} - p\) will denote a mean-zero Bernoulli random variable. We use \(\mathbb{E}(X)\) to denote the mean or expectation of the random variable X, and the notation \(X=_{d}Y\) to signify that the random variables X and Y have the same distribution.

Our arguments require probabilistic estimates of the form ℙ(|X|≥λ), where X will represent either a weighted sum of \(\tilde{X}_{p}\) variables or the operator norm of a symmetric random matrix. For the first case it suffices to recall, in Lemma 1.7, a variant of the well-known Chernoff bound, cf. [21]. In Lemma 1.8 we prove an operator norm estimate using a standard technique from random matrix theory. The proof closely mirrors that of Theorem 1.4 in [36]; as the latter is stated without proof, we essentially reproduce the arguments from [36] while keeping the estimates as concrete as we will need.

Lemma 1.7

(Chernoff bound)

Let X 1,…,X m denote discrete, independent random variables satisfying \(\mathbb{E}(X_{i}) = 0\) and |X i |≤1. If \(\mathbb{E}(X^{2}_{i}) = \sigma^{2}_{i}\) and \(\sigma^{2} = \sum\sigma^{2}_{i}\), then for any 0≤λ≤2σ

$$ \mathbb{P} \Biggl( \Biggl|\sum^{m}_{i=1} X_{i} \Biggr| \geq\lambda\sigma \Biggr) \leq2\mathrm{e}^{-\lambda^{2}/4}. $$
(15)
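An empirical sanity check of (15) for centered Bernoulli summands (numpy assumed; the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, lam = 1000, 0.3, 3.0
sigma = np.sqrt(m * p * (1 - p))        # sigma_i^2 = p*q for each centered Bernoulli

# 5000 independent copies of sum_i X_i with X_i = Bernoulli(p) - p
sums = (rng.random((5000, m)) < p).sum(axis=1) - m * p
tail = np.mean(np.abs(sums) >= lam * sigma)
print(tail, 2 * np.exp(-lam ** 2 / 4))  # observed tail vs. Chernoff bound
```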

Lemma 1.8

Let \(A = \{ a_{ij} \}^{n}_{i,j=1}\) denote a symmetric random matrix with independent upper-triangular entries. Let p 0,q 0>0 denote arbitrary constants independent of n. If

$$(\mathrm{i}) \ \mathbb{E}(a_{ij}) = 0, \qquad(\mathrm{ii})\ \mathbb{E}\bigl(a^{2}_{ij}\bigr) \leq1 \quad \mbox{\textit{and}} \quad(\mathrm{iii})\ |a_{ij}|^{2} \leq \frac{n}{q_{0} \log^{3/2} n} \quad \mbox{\textit{a}.\textit{s}.}, $$

then

$$ \mathbb{P} \bigl( \|A\|_{2} \geq4\sqrt{p_{0} n \log^{3/2} n}\, \bigr) \to0 $$
(16)

faster than \(n^{-M}\) for any M>0 and all n sufficiently large.

Proof

We follow Wigner’s trace method [37] as outlined in [3, 33, 36]. For any integer k>0 and any λ>0, the fact that \(\|A\|^{2k}_{2} \leq \mathrm {trace}(A^{2k})\) and Markov’s inequality combine to show

$$ \mathbb {P}\bigl( \|A\|_2 \geq\lambda\bigr) = \mathbb {P}\bigl( \|A\|^{2k}_{2} \geq\lambda^{2k}\bigr) \leq \mathbb {P}\bigl( \mathrm {trace}\bigl(A^{2k}\bigr) \geq\lambda^{2k}\bigr) \leq\lambda^{-2k} \mathbb {E}\bigl( \mathrm {trace}\bigl(A^{2k}\bigr) \bigr). $$

Substituting the hypotheses (i)–(iii) into the estimate from [36] yields

After performing the change of variables l=k+1−p, this estimate reads

To estimate the sum, let \(C(n,k):=8(k+1)/(q_{0}\log^{3/2} n)\) and \(f(l) := [\sqrt{C}(k+1-l)]^{2l}\). Elementary calculus then shows that f attains its maximum when

$$ \sqrt{C}(k+1-l)\bigl[ \log\sqrt{C}(k+1-l) + 1\bigr] = \sqrt{C}(k+1). $$

This has a unique solution in terms of special functions,

$$ \sqrt{C}(k+1-l) = \mathrm {e}^{W[\mathrm {e}\sqrt{C}(k+1)]-1}, $$

where W[x] denotes the product-log (Lambert W) function, that is, the unique solution (for x>0) to \(W[x]\mathrm{e}^{W[x]}=x\).

Substituting this expression into the estimate for \(\mathbb {P}(\|A\|_{2} \geq\lambda)\) and simplifying demonstrates

$$ \mathbb {P}\bigl(\|A\|_{2} \geq\lambda\bigr) \leq\lambda^{-2k} 4^{2k}n^{k+1} \biggl[\frac{C^{1-1/W}(k+1)^{2-2/W}}{W^{2-2/W} } \biggr]^{k+1}. $$

Now set \(\lambda= 4\sqrt{p_{0} n \log^{3/2} n }\) and k=⌊logn⌋. Then

$$ \mathbb {P}\bigl( \|A\|_{2} \geq4\sqrt{p_0 n \log^{3/2} n} \,\bigr) \leq n\log^{3/2} n \biggl[ \biggl( \frac{8}{q_0W^2} \biggr)^{1-1/W}\frac {(k+1)^{3(1-1/W)} }{p_0 \log^{3-3W/2} n } \biggr]^{k+1}. $$

With this choice of k+1, it then follows that

$$ \mathbb {P}\bigl( \|A\|_{2} \geq4\sqrt{p_0 n \log^{3/2} n} \,\bigr) \leq n\log^{3/2} n \biggl[ \biggl( \frac{8}{q_0W^2} \biggr)^{1-1/W}\frac{1}{p_0 \log^{3/2W} n } \biggr]^{k+1} $$

and that \(W[\mathrm {e}\sqrt{C}(k+1)] \rightarrow\infty\) as well. Thus

$$ \biggl[ \biggl( \frac{8}{q_0W^2} \biggr)^{1-1/W}\frac{1}{p_0 \log^{3/2W} n } \biggr] \to0, $$

so that

$$ \mathbb {P}\bigl( \|A\|_{2} \geq4\sqrt{p_0 n \log^{3/2} n}\, \bigr) \to0 $$

faster than \(n^{-M}\) for any M>0 and all n sufficiently large. □

We also need to establish the connectivity properties of a slight modification of the standard Erdős-Rényi random graph \(\mathcal{G}(N,p)\) on N vertices. Given a parameter p∈(0,1), we construct an undirected, bipartite graph on N vertices by assigning independent edges \(e_{ij}=e_{ji}=_{d}X_{p}\) whenever 1≤i≤n<j≤N and forcing e_{ij}=e_{ji}=0 otherwise. We let \(\mathcal{K}(N)\) denote the set of all possible bipartite graphs constructed in this manner. We write \((V,E) \in \mathcal{K}(N,p)\) or \((V,E) \in\mathcal{G}(N,p)\) to specify the parameter p when referring to a randomly sampled graph of either type. Here V={v_{1},…,v_{N}} is a set of N vertex labels and \(E = \{ e_{ij} \}^{N}_{i,j=1}\) denotes the corresponding N×N adjacency matrix. We let \(\mathcal{K}_{c}(N) \subset\mathcal{K}(N)\) denote the subset of connected graphs and \(\mathcal{K}_{d}(N) = \mathcal{K}(N) \setminus\mathcal{K}_{c}(N)\) the disconnected graphs.
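A draw from this bipartite model can be sketched as follows (numpy assumed; the function name is ours):

```python
import numpy as np

def sample_bipartite(N, p, seed=None):
    """Adjacency of the bipartite model K(N, p): independent edges only between
    the groups {1,...,n} and {n+1,...,N}, where n = N // 2."""
    rng = np.random.default_rng(seed)
    n = N // 2
    B = (rng.random((n, N - n)) < p).astype(int)
    E = np.zeros((N, N), dtype=int)
    E[:n, n:] = B
    E[n:, :n] = B.T                     # enforce e_ij = e_ji
    return E
```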

By slightly modifying standard proofs from the literature regarding Erdős-Rényi graphs [4] we can readily prove the following lemma. While more sophisticated and more general results exist concerning random bipartite graphs [4], we include a proof below for the sake of completeness.

Lemma 1.9

Given p∈(0,1) let \((V,E) \in\mathcal{K}(N,p)\) denote a corresponding random graph. If

$$ p \geq2(1+\epsilon) \frac{\log N}{N}$$
(17)

for some constant ϵ>0 then (V,E) is connected with probability at least \(1-(N/2)^{-\epsilon/2}\), i.e. asymptotically almost surely. Conversely, if

$$ p \leq2(1-\epsilon) \frac{\log N}{N}$$
(18)

for some constant ϵ>0 then (V,E) contains isolated vertices with probability at least \(1 - \mathrm{e}^{-(N/2)^{\epsilon/2}}\).

Proof

For fixed n:=N/2, let \(W_{n,k}\) denote the event that there exist k vertices \(\{v_{i_{1}},\ldots,v_{i_{k}}\}\) with no edges connecting \(\{ v_{i_{1}},\ldots,v_{i_{k}}\}\) and \(V \setminus\{v_{i_{1}},\ldots,v_{i_{k}}\}\). Note that

To estimate \(\mathbb{P}(W_{n,k})\), for a fixed \(\{v_{i_{1}},\ldots,v_{i_{k}}\} \in W_{n,k}\) let j denote the number of indices \(i_{l}\) with \(i_{l}\leq n\) and k−j the number of indices with \(i_{l}>n\). By independence of the edges e_{ij}, a straightforward computation shows that

$$ \mathbb {P}\bigl( \{v_{i_1},\ldots,v_{i_k}\} \in W_{n,k} \bigr) = (1-p)^{ (n-j)(k-j) + j(n+j-k) }. $$

Summing over the total number of possible choices for \(\{v_{i_{1}},\ldots ,v_{i_{k}}\}\) yields as a consequence that

The estimates \(\binom{N}{k} \leq(N\mathrm {e}/k)^{k}\) and \((1-p)\leq \mathrm{e}^{-p}\) therefore give

Without loss of generality, assume 0<ϵ<1. Then \(\log(N\mathrm{e}/k)-p(n-k/2)\leq-\epsilon\log n+(1+\epsilon)(\log n)/N+\log(2\mathrm{e})\). As ϵ>0, for all n sufficiently large it follows that

$$ \mathbb {P}\bigl( (V,E) \in \mathcal {K}_{d}(N) \bigr) \leq\sum ^{n}_{k=1} n^{-k\epsilon/2} \leq n^{-\epsilon/2} \frac{1}{1 - n^{-\epsilon/2}} \rightarrow0 $$

as desired. To show the second statement, it suffices to show that one of the vertices {v_{1},…,v_{n}} becomes isolated. Let \(R_{i} =\{e_{i,n+1}=\cdots=e_{i,N} =0\}\) denote the event that the vertex v_{i} is isolated. As \(R_{1}=_{d}\cdots=_{d}R_{n}\) and these events are independent for 1≤i≤n, it follows that

$$ \mathbb {P}\Biggl( \,\bigcup^{n}_{i=1} R_i \Biggr) = 1 - \mathbb {P}\bigl(R^c_1\bigr)^n = 1 - \bigl[1 - \mathbb {P}(R_1) \bigr]^n = 1 - \bigl[1 - (1-p)^n \bigr]^n. $$

When \(p\leq(1-\epsilon)n^{-1}\log n\) it follows that \((1-p)^{n}\geq n^{\epsilon/2-1}\) for n sufficiently large. Thus \([1-\mathbb {P}(R_{1})]^{n}\leq\exp(-n\mathbb {P}(R_{1}))\leq\exp(-n^{\epsilon/2})\), so that

$$ \mathbb {P}\Biggl(\, \bigcup^{n}_{i=1} R_i \Biggr) \geq1 - \mathrm {exp}\bigl(-n^{\epsilon/2}\bigr) \to1 $$

as desired. □

2 Rigorous Estimates for Stability

We may now turn to our main task, i.e. the proof of Theorem 1.1. Recall the eigenvalue problem reads

$$ \lambda \left ( \begin{array}{c}\boldsymbol{\varPhi}\\[2pt] \boldsymbol{\varPsi}\end{array} \right ) =L\left ( \begin{array}{c}\boldsymbol{\varPhi}\\[2pt] \boldsymbol{\varPsi}\end{array} \right ) $$
(19)

Note that Φ=Ψ=1 always defines an eigenvector of (19) with eigenvalue zero. We therefore call the “two-group” solution stable when the second largest eigenvalue \(\lambda_{2}(L)\) of the system (19) is strictly negative. In crude analogy with the law of large numbers, we expect that L should concentrate around its mean, \(L \approx\mathbb{E}(L)\), with an error that becomes negligible in the limit of infinite system size. Taking the expectation \(\mathbb{E}\) of (19) entrywise gives

$$\mathbb{E}(L) = p L_{\mathrm{comp}}, $$

where \(L_{\mathrm{comp}}\) denotes the stability matrix when the underlying graph is complete (\(e_{ij}\equiv1\) in (19)). Thus

$$ L = p L_{\mathrm{comp}} + R, $$
(20)

where \(R := L - \mathbb{E}(L)\) is an N×N symmetric matrix with mean-zero entries. From Weyl’s inequalities (13), we have

$$\lambda_{2}(L) \leq p \lambda_{2}(L_{\mathrm{comp}}) + \|R\|_{2}. $$

Heuristically then, if \(\lambda_{2}(L_{\mathrm{comp}})<0\) and the error ∥R∥_{2} is asymptotically negligible, the “two-group” solution is stable asymptotically almost surely. Using the estimates from the previous section, we show this is indeed the case provided p does not vanish too rapidly.
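The decomposition (20) and the resulting Weyl bound are straightforward to verify on a sample (numpy assumed; N, p, κ are arbitrary test values, and the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, kappa = 40, 0.5, 0.5
n = N // 2

u = np.triu(rng.random((N, N)) < p, k=1)
e = (u | u.T).astype(float)             # one sample of G(N, p)

def stab(adj):
    """L = kappa * L1 - L2 as in Eq. (6)."""
    within = adj.copy()
    within[:n, n:] = 0.0
    within[n:, :n] = 0.0
    between = adj - within
    return (kappa * (np.diag(within.sum(1)) - within)
            - (np.diag(between.sum(1)) - between))

L = stab(e)
R = L - p * stab(np.ones((N, N)) - np.eye(N))   # R = L - E(L), Eq. (20)
lam2 = np.sort(np.linalg.eigvalsh(L))[::-1][1]
# Weyl: lambda_2(L) <= p * lambda_2(L_comp) + ||R||_2, with lambda_2(L_comp) = n*(kappa-1)
assert lam2 <= p * n * (kappa - 1) + np.linalg.norm(R, 2) + 1e-10
```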

To realize this program, we first consider stability of the complete graph. A direct computation reveals that

$$ L_{\mathrm{comp}}:=n(\kappa-1)\mathrm{Id}+\begin{pmatrix} -\kappa\mathbf{1}\otimes\mathbf{1} & \mathbf{1}\otimes\mathbf{1}\\ \mathbf{1}\otimes\mathbf{1} & -\kappa\mathbf{1}\otimes\mathbf{1}\end{pmatrix}. $$
(21)

Let Ψ=0 and \(\boldsymbol{\varPhi}=\mathbf{e}_{1}-\mathbf{e}_{j}\), where \(\mathbf{e}_{j}\in\mathbb{R}^{n}\) denotes any of the (n−1) remaining standard basis vectors. Setting \(\mathbf{v}=(\boldsymbol{\varPhi},\boldsymbol{\varPsi})^{t}\) or \(\mathbf{v}=(\boldsymbol{\varPsi},\boldsymbol{\varPhi})^{t}\) and performing a straightforward computation shows that

$$L_{\mathrm{comp}}\mathbf{v}=n(\kappa-1)\mathbf{v}, $$

so that n(κ−1) is an eigenvalue of multiplicity N−2, by linear independence. The choice \(\mathbf{v}=(\mathbf{1},-\mathbf{1})^{t}\) yields an eigenvalue of −N. Stability of the complete graph therefore demands κ<1, in which case

$$\lambda_{1}(L_{\mathrm{comp}})=0,\qquad \lambda_{2}(L_{\mathrm{comp}})=\cdots=\lambda_{N-1}(L_{\mathrm{comp}})=n(\kappa-1),\qquad \lambda_{N}(L_{\mathrm{comp}})=-N. $$

Turning now to the random graph case, we wish to find how small we can take p while not losing control of the error R in (20). To this end we decompose the error as

$$ R=D+\tilde{E}, $$
(22)

where D denotes a diagonal matrix and \(\tilde{E}\) denotes a symmetric matrix. The diagonal matrix D has entries

$$ \begin{array}{rcl} d_{ii}&=&\displaystyle \kappa\sum_{j=1}^{n}\tilde{a}_{ij}-\sum_{j=1}^{n}\tilde{b}_{ij},\\[6pt] d_{i+n,i+n}&=&\displaystyle \kappa\sum_{j=1}^{n}\tilde{c}_{ij}-\sum_{j=1}^{n}\tilde{b}_{ji}, \end{array} \quad \mbox{for }1\leq i\leq n. $$
(23)

The matrix D therefore has entries formed from sums of independent random variables, although the entries of D depend on one another and on the entries of \(\tilde{E}\) as well. The matrix \(\tilde{E}\) has the form

$$ \tilde{E}=\begin{pmatrix} -\kappa\tilde{A} & \tilde{B}\\ \tilde{B}^{t} & -\kappa\tilde{C}\end{pmatrix}. $$
(24)

The entries of \(\tilde{A}\) and \(\tilde{C}\) satisfy \(\tilde {a}_{ij}=\tilde {a}_{ji}=_{d}\tilde{X}_{p}\) and \(\tilde{c}_{ij}=\tilde {c}_{ji}=_{d}\tilde {X}_{p}\), and are independent on the upper-triangular portion. The entries \(\tilde{b}_{ij}=_{d}\tilde{X}_{p}\) of \(\tilde{B}\) are independent across the full matrix. Estimating ∥R∥_{2} therefore involves estimating the operator norm of two types of matrices: a diagonal matrix whose entries are sums of independent \(\tilde{X}_{p}\) variables and a symmetric matrix with independent \(\tilde{X}_{p}\) variables on the upper triangle. Lemma 1.7 allows us to handle the former while Lemma 1.8 suffices to handle the latter.

We first estimate ∥D2. As D is diagonal, this simply equals the entry with maximum absolute value. We simply apply Lemma 1.7 directly to the N independent random variables that constitute a given diagonal entry. A direct calculation of the relevant quantities in the statement of the lemma shows that

$$\sigma^{2} := \kappa^{2} \sum ^{n}_{j=1} \mathbb{E}\bigl( \tilde{a}^{2}_{ij} \bigr) + \sum^{n}_{j=1} \mathbb{E}\bigl( \tilde{b}^{2}_{ij}\bigr) = n\bigl(1+\kappa^{2} \bigr)p(1-p) = n\bigl(1+\kappa^{2}\bigr)pq. $$

Thus whenever \(\lambda^{2}/4\leq n(1+\kappa^{2})pq\), it follows from the lemma that

$$\mathbb{P} \bigl( |d_{ii}| \geq\lambda\sigma \bigr) \leq2 \mathrm{e}^{-\lambda^{2}/4}. $$

By the union bound, this implies

$$\mathbb{P} \bigl( \|D\|_{2} \geq\lambda\sigma \bigr) \leq4n \mathrm{e}^{-\lambda^{2}/4}. $$

To ensure that the right-hand side still tends to zero as n→∞, we take \(\lambda^{2}/4=(1+\epsilon)\log n\) for some ϵ>0. In turn, this places the requirement on p that

$$ pq > \frac{\log n}{4(1+\kappa^{2})n}. $$
(25)

Substituting these choices into the previous bound demonstrates that

$$ \mathbb{P} \bigl( \|D\|_{2} \geq2\sqrt{(1+\epsilon) \bigl(1+ \kappa^{2}\bigr) pqn\log n } \,\bigr) \leq4n^{-\epsilon} $$
(26)

provided that (25) holds.

It remains to estimate \(\|\tilde{E}\|_{2}\). For this task we apply Lemma 1.8 from the preliminary material. We let M denote the N×N symmetric matrix

$$M = \frac{ \tilde{E} }{\sqrt{p}}, $$

so that its entries m ij satisfy

$$\mathbb{E}(m_{ij}) = 0, \qquad\mathbb{E}\bigl(m^{2}_{ij} \bigr) \leq\max{ \bigl\{ \kappa^{2},1 \bigr\} }, \qquad|m_{ij}|^{2} \leq\max{\bigl\{ \kappa^{2},1 \bigr\} }\frac{1}{p}. $$

As 0<κ<1, we satisfy the hypotheses of Lemma 1.8 provided we place one further restriction on p, i.e. that

$$Np \geq q^{2}_{0} \log^{3/2} N $$

where q 0>0 denotes an arbitrary, fixed constant. We then have that there exists an N-independent constant C so that the estimate

$$ \mathbb{P} \bigl( \|\tilde{E}\|_{2} \geq4p_{0}\sqrt{Np \log^{3/2} N }\, \bigr) \leq CN^{-2} $$
(27)

holds for all \(p_{0},q_{0}>0\) and N sufficiently large. Substituting the previous estimates (26), (27) into the bound for ∥R∥_{2}, we find

$$\lambda_{2}(L) \leq Np\frac{(\kappa-1)}{2} + 4p_{0}\sqrt{Np} \log^{3/4}N + 2\sqrt{(1+\epsilon) \bigl(1+\kappa^{2}\bigr) pqn\log n }$$

holds with probability at least

$$1 - 2\max \bigl\{ 4(N/2)^{-\epsilon},CN^{-2} \bigr\} . $$

Taking ϵ=3 and \(p_{0}=(1-\kappa)q_{0}\delta/8\) for some δ>0, for instance, we see that

$$\lambda_{2}(L) \leq-\sqrt{Np}\frac{(1-\kappa)q_{0}}{2}(1-\delta) \log^{3/4}N + 4\sqrt{np\bigl(1+\kappa^{2}\bigr) \log n}$$

with probability at least

$$1 - CN^{-2}. $$

As a consequence, we find \(\lambda_{2}(L)<0\) asymptotically almost surely. The following theorem encapsulates the preceding discussion.

Theorem 2.1

Let 0<κ<1 and p∈(0,1) satisfy

$$ p \geq q^{2}_{0} \frac{\log^{3/2} N}{N}$$
(28)

for any \(q_{0}>0\). Then for any δ>0

$$\lambda_{2}(L) \leq- \biggl( \frac{(1-\kappa)(1-\delta)}{2}q_{0} + O \bigl( \log^{-1/4} N \bigr) \biggr) \sqrt{Np} \log^{3/4} N $$

asymptotically almost surely.

To establish the converse result, we appeal to the results regarding connectivity of random graphs in the preliminary material. Suppose that

$$p \leq2(1-\epsilon) \frac{\log N}{N}$$

for some ϵ>0. Then by Lemma 1.9, a graph \((V,E) \in\mathcal{K}(N,p)\) contains isolated vertices with probability at least \(1-\exp ( -(N/2)^{\epsilon}/2 )\). For any such graph, let j denote the index of an isolated vertex and set

$$\mathbf{v} = c_{1} \mathbf{1} + c_{2} \mathbf{e}_{j}$$

for any choice of coefficients \(c_{1},c_{2}\) such that \(\|\mathbf{v}\|_{2}=1\). Then \(\langle\mathbf{e}_{j},L_{2}\mathbf{e}_{j}\rangle=0\), so that

$$\langle\mathbf{v} , L \mathbf{v} \rangle= \kappa c^{2}_{2} \langle \mathbf{e}_{j},L_{1} \mathbf{e}_{j} \rangle\geq0 $$

provided κ≥0. By the Courant-Fischer formula (12), this implies \(\lambda_{2}(L)\geq0\) for any such graph. Therefore for any choice of κ≥0 we find \(\lambda_{2}(L)\geq0\) asymptotically almost surely. We summarize this fact in the following theorem.

Theorem 2.2

Let κ>0. If for some ϵ>0

$$p \leq2(1-\epsilon)\frac{\log N}{N}, $$

then \(\lambda_{2}(L)\geq0\) with probability at least \(1-\exp ( -(N/2)^{\epsilon}/2 )\). In particular, the steady-state (3) is unstable asymptotically almost surely.
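The isolated-vertex mechanism behind Theorem 2.2 is easy to observe numerically. The sketch below samples cross-cluster adjacency matrices with p below the bound of the theorem and records how often a vertex with no cross-cluster edges appears; "isolated" is interpreted here as having no cross-cluster edges, which is what makes \(\langle\mathbf{e}_{j},L_{2}\mathbf{e}_{j}\rangle=0\) in the argument above (this interpretation of \(\mathcal{K}(N,p)\), whose definition sits in the preliminary material, is an assumption of the sketch):

```python
import numpy as np

def frac_graphs_with_isolated(N, p, trials, rng):
    """Fraction of sampled graphs containing a vertex with no
    cross-cluster edges (each vertex has n = N/2 potential cross
    neighbours)."""
    n = N // 2
    count = 0
    for _ in range(trials):
        B = rng.random((n, n)) < p                    # cross-cluster adjacency
        if (B.sum(axis=1) == 0).any() or (B.sum(axis=0) == 0).any():
            count += 1
    return count / trials

rng = np.random.default_rng(1)
N, eps = 2000, 0.5
p = 2 * (1 - eps) * np.log(N) / N                     # below the Theorem 2.2 bound
frac = frac_graphs_with_isolated(N, p, 20, rng)
```

With these parameters the expected number of such vertices is of order \(N^{\epsilon}\), so virtually every sampled graph contains one.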

Remark 2.3

From a modelling perspective, in the system of ODEs (1) only κ>0 makes sense. However, the preceding arguments also hold if κ≤0. When κ<0, the statement "\(\lambda_{2}(L)<0\) asymptotically almost surely" is exactly equivalent to the connectedness of the full Erdős-Rényi random graph \(\mathcal{G}(N,p)\) on N vertices. In this case it is well-known that the sharp threshold is p=logN/N, so that at κ=0 a "discontinuity" occurs in the sharp threshold for stability.

3 Estimates for Critical Probability

The main goal of this section is to derive formula (9). We first present formal asymptotics and then present numerical simulations supporting these asymptotics.

3.1 Formal Asymptotics

The formal asymptotics rely on the following key lemma that may be of independent interest.

Lemma 3.1

Let P λ denote the Poisson distribution with parameter λ. Define

$$ X:=P_{\lambda};\qquad Y:=-\kappa P_{\lambda}\quad \mbox{\textit{and}}\quad Z:=X+Y,\quad \mbox{\textit{where} }0<\kappa<1. $$
(29)

Suppose that

$$ \lambda=\lambda_{0}\log N\quad \mbox{\textit{with} }N\gg1,\ \lambda_{0}=O(1) $$
(30)

and let \(Z_{1},\ldots,Z_{N}\) be N independent realizations of the random variable Z. Define S and \(x_{0}\) through the equations

$$ 1-2\lambda_{0}+S \biggl( \log\frac{\lambda_{0}}{S}+1 \biggr) +\frac{S-x_{0}}{\kappa} \biggl( \log\frac{\kappa\lambda_{0}}{S-x_{0}}+1 \biggr) =0 $$
(31)

$$ \log\frac{\lambda_{0}}{S}+\frac{1}{\kappa}\log\frac{\kappa\lambda_{0}}{S-x_{0}}=0 $$
(32)

Then in the limit N≫1, we have

$$ \mathbb{E}\bigl(\min ( Z_{1},\ldots, Z_{N} ) \bigr)\sim x_{0}\log N,\quad N\gg1. $$
(33)

As a consequence, \(\mathbb{E}(\min(Z_{1},\ldots,Z_{N}))\sim0\) if and only if

$$ \lambda_{0}=\frac{1}{2-\kappa^{-\kappa/(\kappa+1)} ( 1+\kappa ) }. $$
(34)

Remark 3.2

The asymptotic statement in (33), that

$$\frac{\mathbb{E}(\min ( Z_{1},\ldots, Z_{N} ) )}{x_{0}\log N}\rightarrow1 \quad\mbox{as } N\rightarrow\infty, $$

follows as a consequence of the more precise statement

$$\mathbb{P}\bigl(\min ( Z_{1},\ldots, Z_{N} ) >x_{0}\log N\bigr)\sim \exp \bigl\{ -CN^{\psi(x_{0},S,\lambda_{0}) + 1} \bigr\} $$

for some positive quantity C of order at most O(logN).

We first use the lemma to derive (9), then prove Lemma 3.1 at the end of this subsection. As in Sect. 2, we decompose L into three parts

$$ L=pL_{\mathrm{comp}}+D+\tilde{E} ,$$
(35)

where \(L_{\mathrm{comp}}\) is defined in (21) and D and \(\tilde{E}\) are defined in (22), (23), (24). The matrix \(\tilde{E}\) is a symmetric random matrix whose entries have mean zero, and D is a diagonal matrix whose entries are minus the row sums of \(\tilde{E}\). As noted in Sect. 2, the matrix \(L_{\mathrm{comp}}\) has a zero eigenvalue with algebraic multiplicity one, an eigenvalue λ=−N(1−κ)/2 with algebraic multiplicity N−2 and an eigenvalue λ=−N<−N(1−κ)/2 with algebraic multiplicity one. With this in mind, and in the spirit of asymptotics, we formally replace

$$ L_{\mathrm{comp}}\sim-\frac{N(1-\kappa)}{2}\mathrm {Id}. $$
(36)

To gain further insight on the remaining terms, we first consider the simpler case where \(e_{ij} =_{d} \mathcal{N}(p,pq)\) are normally distributed with the same mean and variance as in the Bernoulli case. The diagonal entries of D then have distribution

$$ d_{ii} =_{d} \sqrt{1+\kappa^{2}}\sqrt{pqN/2} \mathcal{N}(0,1).$$
(37)

If \(\mathcal{N}_{1},\ldots,\mathcal{N}_{N}\) denote N independent unit Gaussians, an argument similar to the proof of Lemma 3.1 demonstrates that \(\mathbb{E}( \max(\mathcal{N}_{1},\ldots,\mathcal{N}_{N}) ) \sim\sqrt {2\log N}\). Due to the symmetry of \(\tilde{E}\), the entries of D are not independent. However, if we formally assume that D has independent entries this would imply that \(\mathbb{E} ( \max(d_{ii}) ) \sim \sqrt{ ( 1+\kappa^{2} ) pqN\log N}\). Finally, \(\tilde{E}/\sqrt{pq}\) is a symmetric random matrix whose entries have mean zero and variance bounded by one. In the case of normally distributed weighted edges e ij , the entries of \(\tilde{E}/\sqrt{pq} =_{d} \mathcal{N}(0,1)\) have uniformly bounded fourth moments. From the Bai-Yin theorem (cf. [3]), it follows that \(\|\tilde{E}\|_{2} =O(\sqrt{pqN}\,)\) asymptotically almost surely. Thus \(\|\tilde{E}\|_{2}\) is \(O(\sqrt{\log N}\,)\) smaller than the maximum entry of D if N is large. We therefore formally discard \(\tilde{E}\) in (35) to obtain

$$\lambda_{2}(L)\sim-\frac{Np(1-\kappa)}{2}+\sqrt{ \bigl( 1+ \kappa^{2} \bigr) pqN\log N}. $$

The two terms balance precisely when p has the critical scaling p=O(logN/N). Substituting \(p=p_{0}\log N/N\) and setting \(\lambda_{2}(L)=0\) yields the critical threshold

$$ p_{0c}=\frac{4 ( \kappa^{2}+1 ) }{(1-\kappa)^{2}}, $$
(38)

so that the eigenvalue \(\lambda_{2}\) is negative for \(p_{0}>p_{0c}\) and positive for \(p_{0}<p_{0c}\) asymptotically almost surely.
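The threshold (38) is simply the positive root of the leading-order balance above. As a sanity check, the following sketch recovers it by bisection (taking q∼1 and dividing out the common factor of logN):

```python
import math

def balance(p0, kappa):
    """Leading-order balance -Np(1-kappa)/2 + sqrt((1+kappa^2) p q N log N)
    after substituting p = p0*log(N)/N, dividing by log N and taking q ~ 1."""
    return -p0 * (1 - kappa) / 2 + math.sqrt((1 + kappa**2) * p0)

kappa = 0.5
p0c_formula = 4 * (kappa**2 + 1) / (1 - kappa)**2     # formula (38)

lo, hi = 1e-6, 100.0          # balance > 0 at lo, < 0 at hi
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if balance(mid, kappa) > 0:
        lo = mid
    else:
        hi = mid
p0c_root = 0.5 * (lo + hi)
```

For κ=0.5 both agree at \(p_{0c}=20\).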

We now turn our attention to the original problem. From (23) we have

$$ \everymath{\displaystyle }\begin{array}{rcl}d_{ii}&=&(1-\kappa)\frac{N}{2}p+\sum_{j=1}^{n}\kappa a_{ij}-\sum_{j=1}^{n} b_{ij}\\[12pt] d_{i+n,i+n}&=&(1-\kappa)\frac{N}{2}p+\sum_{j=1}^{n}\kappa c_{ij}-\sum_{j=1}^{n}b_{ji}\end{array} \quad \mbox{for }1\leq i\leq n. $$
(39)

As in the normal case, our first simplifying assumption is to discard \(\tilde{E}\) in (35). Unlike the normal case, we cannot prove that \(\|\tilde{E}\|_{2}=O(\sqrt{pqN}\,)\) for p as small as \(p=p_{0}\log N/N\) (cf. Lemma 1.8). The numerical evidence in Sect. 3.2 is consistent with \(\|\tilde{E}/\sqrt{pqN}\|_{2} = o(\sqrt{\log N}\,)\) when \(p=p_{0}\log N/N\), however, so that \(\tilde{E}\) is of lower order in this case as well. Substituting (39) and (36) into (35) and discarding \(\tilde{E}\) we obtain

$$L\sim\hat{D},$$

where \(\hat{D}\) denotes the diagonal matrix with entries

$$ \everymath{\displaystyle }\begin{array}{rcl}\hat{d}_{ii}&=&\sum_{j=1}^{n}\kappa a_{ij}-\sum_{j=1}^{n}b_{ij}\\[12pt] \hat{d}_{i+n,i+n}&=&\sum_{j=1}^{n}\kappa c_{ij}-\sum_{j=1}^{n}b_{ji}\end{array} \quad \mbox{for }1\leq i\leq n. $$
(40)

This amounts to approximating L by its diagonal. As in the normal case, we continue to assume that the entries \(\hat{d}_{ii}\) are independent; while this is not true, in practice the assumption introduces negligible error into the overall computation. Next, we set \(p=p_{0}\log N/N\) and approximate the sum of n independent Bernoulli trials \(\sum_{j=1}^{n}a_{ij}\) by a Poisson distribution. That is, we replace \(\sum_{j=1}^{n}a_{ij}\sim P_{\lambda}\) with \(\lambda=pn=(p_{0}\log N)/2\). Then \(\hat{d}_{ii}\sim\kappa P_{\lambda}-P_{\lambda}\) is the difference of two independent (scaled) Poisson random variables. The threshold occurs precisely when \(\mathbb{E}\max ( \hat{d}_{1,1},\ldots,\hat{d}_{N,N} ) =0\). By Lemma 3.1 this happens precisely when \(\lambda_{0}=p_{0c}/2\) satisfies

$$ \frac{p_{0c}}{2}=\frac{1}{2-\kappa^{-\kappa/(\kappa+1)} ( 1+\kappa ) }.$$
(41)

This is precisely the threshold (9).

It remains to give the derivation of Lemma 3.1. We first recall Laplace’s method since it plays a central role in the derivation. We state it as follows:

Lemma 3.3

(Laplace’s method)

Suppose that f(x) is smooth on [a,b] and has a global maximum at \(x_{0}\) with \(a<x_{0}<b\) and \(f^{\prime\prime}(x_{0})<0\). Then

$$ \int_{a}^{b}\exp \biggl\{ \frac{1}{\varepsilon}f(x) \biggr\} \,dx\sim\sqrt {\frac{2\pi\varepsilon}{ \vert f^{\prime\prime}(x_{0})\vert }} \exp \biggl\{ \frac{1}{\varepsilon}f(x_{0}) \biggr\} \bigl( 1+O(\sqrt {\varepsilon}) \bigr) ,\quad \varepsilon\ll1. $$
(42)

If instead f is increasing on [a,b] with f′(b)>0, then

$$ \int_{a}^{b}\exp \biggl\{ \frac{1}{\varepsilon}f(x) \biggr\} \,dx\sim \frac{\varepsilon}{f^{\prime}(b)}\exp \biggl\{ \frac{1}{\varepsilon }f(b) \biggr\} \bigl( 1+O(\varepsilon) \bigr) ,\quad \varepsilon\ll1. $$
(43)

See for example [17] for a detailed explanation of Laplace's method.

Proof of Lemma 3.1

Let f(t), g(t) and h(t) denote the probability density functions of X, Y and Z, respectively, so that

$$f(t)=\mathrm{e}^{-\lambda}\frac{\lambda^{t}}{t!};\qquad g(t)=\frac{1}{\kappa}f ( -t/\kappa ) ;\qquad h(t)=\int_{t}^{\infty}f(s)g(t-s)\,ds. $$

Using the Stirling approximation formula

$$\log ( k! ) \sim k\log k-k+O(\log k),\quad k\gg1 $$

we then estimate

$$f(t)\sim C\exp \bigl( \phi(t,\lambda) \bigr) $$

where we define

$$\phi(t,\lambda):=-\lambda+t \biggl( \log\frac{\lambda}{t}+1 \biggr) ,\quad t>0. $$

Here and below, C will denote a positive quantity that has order at most O(logN). We then obtain

$$ h(t)\sim\int_{t}^{\infty}C\exp \bigl\{ \psi(t,s, \lambda) \bigr\} \,ds\quad \mbox{where } \psi(t,s,\lambda):=\phi(s,\lambda)+ \phi \biggl( \frac {s-t}{\kappa},\lambda \biggr) . $$
(44)

Consider the critical scaling

$$s=S\log N;\qquad t=T\log N,\qquad \lambda=\lambda_{0}\log N;\qquad S,T, \lambda_{0}=O(1). $$

We then have

$$\exp \bigl\{ \psi(t,s,\lambda) \bigr\} =\exp \bigl\{ \psi(T,S, \lambda_{0})\log N \bigr\} $$

so that the integral (44) can be estimated asymptotically using Laplace’s method (42). It then yields

$$ h(t)\sim C\exp \bigl\{ \psi\bigl(T,S(T),\lambda_{0}\bigr)\log N \bigr\} ,\quad t=T\log N,\ T,\lambda_{0}=O(1) $$
(45)

where S(T) satisfies \(\frac{d}{dS}\psi(T,S,\lambda_{0})=0\):

$$ \log\frac{\lambda_{0}}{S}+\frac{1}{\kappa}\log\frac{\kappa\lambda_{0}}{S-T}=0 $$
(46)

Elementary calculus shows that (46) has a unique root S=S(T)>T which is the global maximum of ψ(T,S,λ 0). Next, we approximate

$$ \int_{-\infty}^{x_{0}}\exp \bigl\{ \psi\bigl(T,S(T), \lambda_{0}\bigr)\log N \bigr\} \,dT\sim C\exp \bigl\{ \psi \bigl(x_{0},S(x_{0}),\lambda_{0}\bigr)\log N \bigr\} $$
(47)

This follows from the fact that the function ψ(x 0,S(x 0),λ 0) is increasing in x 0 and from (43). Hence we have:

$$P(Z>x_{0}\log N)\sim1-C\exp \bigl\{ \psi\bigl(x_{0},S(x_{0}), \lambda_{0}\bigr)\log N \bigr\} $$

and

$$P\bigl(\min(Z_{1},\ldots, Z_{N})>x_{0}\log N \bigr)\sim \bigl( 1-C\exp \bigl\{ \psi \bigl(x_{0},S(x_{0}), \lambda_{0}\bigr)\log N \bigr\} \bigr)^{N}. $$

Using the elementary estimate (1−x/N)N∼exp(−x) as N→∞, we therefore obtain

$$P\bigl(\min(Z_{1},\ldots, Z_{N})>x_{0}\log N \bigr)\sim\exp \bigl[ -C\exp \bigl\{ \bigl[ \psi\bigl(x_{0},S(x_{0}), \lambda_{0}\bigr)+1 \bigr] \log N \bigr\} \bigr] . $$

Thus we have a sharp threshold: if ψ+1<0 then \(\min ( Z_{1},\ldots,Z_{N} ) >x_{0}\log N\) with probability rapidly approaching one; in the opposite case, \(\min ( Z_{1},\ldots,Z_{N} ) <x_{0}\log N\) with probability rapidly approaching one. Thus \(\mathbb{E}(\min(Z_{1},\ldots, Z_{N}))=x_{0}\log N\) precisely when \(\psi(x_{0},S,\lambda_{0})+1=0=\frac{d}{dS}\psi(x_{0},S,\lambda_{0})\), which is exactly the system (31), (32). Finally, to show (34), we set \(x_{0}=0\) in (32) to obtain

$$ \log\frac{\kappa\lambda_{0}}{S}=-\kappa\log\frac{\lambda_{0}}{S} $$
(48)

so that (31) simplifies to

$$ 0=1-2\lambda_{0}+S+\frac{S}{\kappa};\quad S= \lambda_{0}\kappa^{1/(\kappa+1)} $$
(49)

Solving (48), (49) for λ 0 then yields (34). □
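The algebra leading from (48), (49) to (34) can be checked mechanically. The following sketch verifies that \(\lambda_{0}\) from (34), together with \(S=\lambda_{0}\kappa^{1/(\kappa+1)}\) from (49), satisfies both relations for several values of κ:

```python
import math

def lambda0(kappa):
    """Critical value of lambda_0 from formula (34)."""
    return 1.0 / (2.0 - kappa ** (-kappa / (kappa + 1)) * (1 + kappa))

checks = []
for kappa in (0.1, 0.3, 0.5, 0.7, 0.9):
    lam0 = lambda0(kappa)
    S = lam0 * kappa ** (1.0 / (kappa + 1))     # second relation in (49)
    # (48): log(kappa*lam0/S) = -kappa*log(lam0/S)
    eq48 = math.log(kappa * lam0 / S) + kappa * math.log(lam0 / S)
    # first relation in (49): 0 = 1 - 2*lam0 + S + S/kappa
    eq49 = 1 - 2 * lam0 + S + S / kappa
    checks.append((abs(eq48), abs(eq49)))
```

Both residuals vanish to machine precision across the range 0<κ<1.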

3.2 Numerical Computations

To test the asymptotic theory, we compare the theoretical threshold (9) with numerical estimates of the threshold. For a given probability p and given system size N define \(f(p,N):=\mathbb{E}(\lambda_{2}(L))\), where L denotes the stability matrix (6). We estimate f(p,N) by taking the average of \(\lambda_{2}(L)\) over 1,000 different random realizations. To estimate \(p_{0c}\) for a fixed value of N, we use the bisection method to solve \(f(p_{c},N)=0\) and then set \(p_{0c}=p_{c}N/\log N\). This yields the results in Table 1.
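Since the stability matrix (6) is not reproduced here, the following sketch runs the same bisection procedure on the diagonal approximation (40) instead, i.e. it locates the zero of \(\mathbb{E}(\max_{i}\hat{d}_{ii})\). This is a stand-in for f(p,N), not the exact computation behind Table 1, and the entries a, b, c are drawn without the symmetry constraint (consistent with the independence assumption of Sect. 3.1):

```python
import numpy as np

def f_diag(p, N, kappa, trials, rng):
    """Approximate E[lambda_2(L)] by E[max_i d_hat_ii] via (40)."""
    n = N // 2
    acc = 0.0
    for _ in range(trials):
        a = rng.binomial(1, p, size=(n, n))   # within cluster 1
        c = rng.binomial(1, p, size=(n, n))   # within cluster 2
        b = rng.binomial(1, p, size=(n, n))   # cross-cluster edges
        d1 = kappa * a.sum(axis=1) - b.sum(axis=1)
        d2 = kappa * c.sum(axis=1) - b.sum(axis=0)
        acc += max(d1.max(), d2.max())
    return acc / trials

rng = np.random.default_rng(2)
N, kappa = 1000, 0.5
lo, hi = 4.0, 60.0                            # bracket for p0
for _ in range(12):
    mid = 0.5 * (lo + hi)
    if f_diag(mid * np.log(N) / N, N, kappa, 5, rng) > 0:
        lo = mid                              # still unstable: raise p0
    else:
        hi = mid
p0c_est = 0.5 * (lo + hi)
p0c_theory = 2.0 / (2.0 - kappa ** (-kappa / (kappa + 1)) * (1 + kappa))  # (41)
```

As with Table 1, only rough agreement with (41) can be expected at this modest N, given the \(O(1/\sqrt{\log N}\,)\) error of the asymptotics.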

From the asymptotics in Lemma 3.1 we expect an error of at least \(O ( 1/\sqrt{\log N} )\). This demands very large N before we can expect to see close agreement between the asymptotics and the numerics. For example, as \(1/\sqrt{\log 10{,}000}\approx0.33\), the typical error of 10 % with N=10,000 in the table is still in line with expectations. We do not include results for N=1000 if κ>0.5, because \(p_{0c}\) becomes too big, requiring a value of N larger than 1000 to make p sufficiently small. For example, κ=0.6 yields \(p_{0c}=32.16\), so that p=32.16log(1000)/1000≈0.2, which introduces an O(p) error comparable with \(O(1/\sqrt{\log N}\,)\). The graph \(p_{0c}(\kappa)\) is also shown in Fig. 3. Good agreement is observed between the theoretical prediction and the numerical computations.

Fig. 3

Comparison between the numerics (see Table 1) and theory as given by (9) for the threshold value

Table 1 Comparison between asymptotic theory and numerics for the critical threshold p 0c

As we have no rigorous proof of the simplifying assumptions in the derivation of formula (9), we attempt to verify them numerically. Specifically, we assume that the matrix \(\tilde{E}\) in (35) is of lower order and that the diagonal entries in (39) are effectively independent random variables. To verify the first assumption, let \(Y=(X_{p}-p)/\sqrt{p(1-p)N}\), where \(X_{p}\) denotes a Bernoulli random variable with parameter p. Take the critical scaling \(p=p_{0}\log N/N\) and consider a symmetric N×N matrix \(M(N,p_{0})\) with upper-triangular entries \(m_{ij} =_{d} Y\). We compute the expected operator norm \(\mathbb{E}(\Vert M(N,p_{0})\Vert _{2})\) for fixed \(p_{0}\) and N using an average of 100 independent trials. Table 2 shows the results as a function of \(p_{0}\) and N up to N=10,000.

Table 2 Numerical verification that \(\tilde{E}\) in (35) is negligible

The value in parentheses denotes the standard deviation of the 100 trials. The results are consistent with our first assumption, that \(\Vert M(N,p_{0}) \Vert_{2} = o(\sqrt{\log N}\,)\); in fact these numerics suggest that \(\Vert M(N,p_{0})\Vert_{2}=O(1)\) as N→∞ for fixed \(p_{0}\). However, as \(\sqrt{\log10\,000} \approx3.03\) is still rather large, a much more systematic numerical study is required to verify this conjecture with any certainty. This lies beyond the scope of the present work.
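The experiment behind Table 2 can be sketched as follows, with smaller N and fewer trials than in the table (and the diagonal set to zero, an ad hoc choice), so the numbers are only indicative:

```python
import numpy as np

def op_norm_M(N, p0, rng):
    """Spectral norm of a symmetric matrix whose upper-triangular
    entries are Y = (X_p - p)/sqrt(p(1-p)N) with p = p0*log(N)/N."""
    p = p0 * np.log(N) / N
    X = rng.binomial(1, p, size=(N, N)).astype(float)
    Y = (X - p) / np.sqrt(p * (1 - p) * N)
    M = np.triu(Y, 1)
    M = M + M.T                      # symmetrize; zero diagonal
    w = np.linalg.eigvalsh(M)
    return float(max(abs(w[0]), abs(w[-1])))

rng = np.random.default_rng(3)
vals = [op_norm_M(500, 1.5, rng) for _ in range(5)]
mean_norm = float(np.mean(vals))
```

The observed norms stay O(1) in size, well below \(\sqrt{\log N}\) growth, consistent with Table 2.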

To verify the second assumption we consider the following numerical test. Let \(S_{i}=\sum_{j=1}^{N}a_{ij}\), where \(a_{ij} =_{d} X_{p}\) denote Bernoulli random variables subject to the symmetry constraint \(a_{ij}=a_{ji}\), and let \(A_{i}=\sum_{j=1}^{N}b_{ij}\), where \(b_{ij} =_{d} X_{p}\) denote fully independent Bernoulli random variables. We then compute \(s:=\mathbb{E}(\min(S_{1},\ldots,S_{N}))\) and \(a:=\mathbb{E}(\min(A_{1},\ldots,A_{N}))\). The second assumption essentially states that s/a→1 as N→∞. Numerically this is indeed the case: in all the cases we tried, the difference between s and a was negligible. For example, taking N=1000, \(p_{0}=1.5\) and using 2000 trials, we found a≈1.823 and s≈1.815 with nearly identical histograms of the sample trials.
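A minimal version of this independence test reads (with smaller N and fewer trials than quoted above):

```python
import numpy as np

def mean_min_rowsum(N, p, symmetric, trials, rng):
    """E[min_i sum_j a_ij] over Bernoulli(p) matrices, with or
    without the symmetry constraint a_ij = a_ji."""
    acc = 0.0
    for _ in range(trials):
        X = rng.binomial(1, p, size=(N, N))
        if symmetric:
            X = np.triu(X, 1)
            X = X + X.T              # dependent rows: a_ij = a_ji
        else:
            np.fill_diagonal(X, 0)   # fully independent entries
        acc += X.sum(axis=1).min()
    return acc / trials

rng = np.random.default_rng(4)
N = 400
p = 1.5 * np.log(N) / N
s = mean_min_rowsum(N, p, True, 100, rng)    # symmetric (dependent) case
a = mean_min_rowsum(N, p, False, 100, rng)   # independent case
```

The two estimates agree closely, as reported above for N=1000.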

4 Discussion

This paper presents a study of the behavior of a well-known swarming algorithm when the standard all-to-all coupling between agents is replaced by a random graph in which two agents interact with some probability p. The classical 'compromise' solution is shown to lose stability when the graph is very sparse, and estimates are derived on the sparseness of the graph (in terms of bounds on p) beyond which the clustering solution is no longer stable. While the best result we can obtain rigorously is that the compromise solution is stable when \(p \geq O(\log^{3/2} N/N)\), the calculations in Sect. 3 suggest that the critical probability scales like p=O(logN/N). Moreover, the constant is strictly larger than the threshold for connectivity of the underlying graph. Closing the gap between the rigorous result and our conjectured scaling remains a difficult open problem. The main difficulty lies in obtaining stronger estimates on the operator norm of Bernoulli random matrices, such as an improved version of Lemma 1.8, when p scales with logN/N. To the best of our knowledge such estimates do not yet exist, which demonstrates the need for a better understanding of random Bernoulli matrices in the critical regime p=O(logN/N).

We note that the all-to-all coupling assumption underlying many aggregation models, while useful for analytical computations, is numerically expensive: simulating one step of an aggregation model on N fully coupled particles has a cost of \(O(N^{2})\). On the other hand, our analysis suggests that similar dynamics might be achieved with relatively sparse coupling, whereby each particle is coupled to only O(logN) other particles chosen at random. The cost of each step in the computation would then be \(O(N\log N)\) while still retaining the qualitative aspects of the all-to-all \(O(N^{2})\) coupling. This observation might also prove useful when using the compromise model, or a related model, in a distributed control setting [15, 16]. For problems in which communication between agents is expensive, such as mobile robot technology that uses a wireless signal for communication, the reduction to O(logN) interactions per agent would then allow use of the compromise model in a setting where the usual O(N) cost is prohibitively expensive.

Figure 4 illustrates this phenomenon for some two-dimensional models in which the geometry is more complex than that of the line. As in the one-dimensional case, we draw a random graph from \(\mathcal{G}(N,p)\) at initialization, then solve the two-dimensional equivalent of (1) numerically while keeping the graph fixed throughout the simulation. The sparse connectivity of the graph manifests as a 'noisy' version of the fully-connected configuration even though we use a fixed graph for the simulation and no noise is actually present. We conjecture that the expected value over different realizations of these sparse graphs (where the connection is viewed as a random variable) would lead to an additional diffusion process. It is also an interesting open question to study how the sparsity of connections affects the confinement properties of the potential. Numerical experiments shown in Fig. 4 indicate that confinement is preserved even for relatively small values of p. Based on the results of this paper, it is tempting to conjecture that confinement is preserved up to p=O(logN/N).
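As an illustration of such sparse-coupling simulations, the following sketch integrates a plausible two-dimensional instance of the model with F(r)=min(ar+b,1−r) on a fixed random graph; the parameters a, b, the time step and the binary (rather than (4)-distributed) edge weights are ad hoc choices here, not the values used for Fig. 4:

```python
import numpy as np

def simulate(N, p, steps=300, dt=0.05, seed=5):
    """Forward Euler for dx_i/dt = (1/N) * sum_{j ~ i}
    F(|x_i - x_j|)(x_i - x_j)/|x_i - x_j| on a fixed graph from G(N, p),
    with F(r) = min(a*r + b, 1 - r); positive F acts as repulsion."""
    rng = np.random.default_rng(seed)
    A = rng.random((N, N)) < p
    A = np.triu(A, 1)
    A = A | A.T                               # fixed random interaction graph
    x = rng.random((N, 2))                    # initial positions in unit square
    a_par, b_par = 0.5, 0.8                   # hypothetical kernel parameters
    for _ in range(steps):
        dx = x[:, None, :] - x[None, :, :]    # dx[i, j] = x_i - x_j
        r = np.linalg.norm(dx, axis=2) + 1e-12
        F = np.minimum(a_par * r + b_par, 1.0 - r)
        W = (A * F / r) / N                   # per-edge weight on (x_i - x_j)
        x = x + dt * (W[:, :, None] * dx).sum(axis=1)
    return x

N = 100
x = simulate(N, 4 * np.log(N) / N)            # O(log N) neighbours per agent
spread = float(np.linalg.norm(x - x.mean(axis=0), axis=1).max())
```

Even with only O(logN) neighbours per agent, the particles remain confined to a bounded region, consistent with the conjecture above.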

Fig. 4

Examples of two-dimensional steady states for the system (1) with F(r)=min(ar+b,1−r), e ij drawn according to (4) and N=500. The ODE system was simulated using the forward Euler method until equilibrium was reached. Note that the "shadow" of the steady state is preserved even for relatively small values of p

Our current analysis readily extends to higher dimensions and to more general random graph models. In higher dimensions, 'simplex' configurations (cf. the top row in Fig. 4) are the natural analogue of the 'compromise' solution that we consider in this paper. Without the random graph structure, the stability analysis for such solutions already exists [19, 32]. Extending the present analysis to these cases would therefore only require a version of the trace method for block matrices, and in principle this extension is straightforward. Aside from this modification, the program to demonstrate stability remains the same. The assumption that all agents interact, on average, with the same number Np of other agents might be inappropriate depending on the application. The generalization of the Erdős-Rényi model \(\mathcal{G}(N,p)\) due to Chung and Lu [7, 8] overcomes this difficulty by allowing for arbitrary degree sequences. Provided the minimum degree of the graph is sufficiently large, the trace method also applies when studying the Laplacian matrix for such graphs [9]. Our results should therefore extend, in a straightforward manner, to this generalized version of the Erdős-Rényi model.