1 Introduction

1.1 Overview

Let A be the adjacency matrix of a graph with vertex set \([N]\! =\! \{1, \dots , N\}\). We are interested in the geometric structure of the eigenvectors of A, in particular their spatial localization. An \(\ell ^2\)-normalized eigenvector \({\textbf{w}} = (w_x)_{x \in [N]} \in {\mathbb {R}}^N\) gives rise to a probability measure \(x \mapsto w_x^2\) on the set of vertices [N]. Informally, \({\textbf{w}}\) is delocalized if its mass is approximately uniformly distributed throughout [N], and localized if its mass is essentially concentrated on a small number of vertices.

In this paper we study the spatial localization of eigenvectors for the Erdős–Rényi graph \( {\mathbb {G}} \equiv {\mathbb {G}}(N,d/N)\). It is the simplest model of a random graph, where each edge of the complete graph on N vertices is kept independently with probability d/N, with \(0 \leqslant d \leqslant N\). Here, \(d \equiv d_N\) is a parameter whose interpretation is the expected degree of a vertex. It is well known that \({\mathbb {G}}\) undergoes a dramatic change in behaviour at the critical scale \(d \asymp \log N\), which is the scale at and below which the vertex degrees do not concentrate. For \(d \gg \log N\), with high probability all degrees are approximately equal and the graph is homogeneous. On the other hand, for \(d \lesssim \log N\), the degrees do not concentrate and the graph becomes highly inhomogeneous: it contains for instance hubs of large degree, leaves, and isolated vertices. As long as \(d > 1\), the graph \({\mathbb {G}}\) has with high probability a unique giant component, and we shall always restrict our attention to it.
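To see why \(d \asymp \log N\) is the critical scale, recall the standard Chernoff-plus-union-bound heuristic (spelled out here for orientation; the notation \(D_x\) for the degree of the vertex x is ours). Each degree \(D_x\) is \({\text {Binomial}}(N-1, d/N)\), with mean approximately d, so for any fixed \(\varepsilon \in (0,1)\),

$$\begin{aligned} {\mathbb {P}} \bigl ( \exists \, x \in [N] : |D_x - d | \geqslant \varepsilon d \bigr ) \leqslant 2 N \, \textrm{e}^{-\varepsilon ^2 d / 3} = 2 \exp \Bigl ( \log N - \tfrac{1}{3} \varepsilon ^2 d \Bigr ), \end{aligned}$$

which is o(1) precisely when \(d \gg \log N\). At and below the scale \(d \asymp \log N\) this union bound no longer closes, and indeed atypical degrees (hubs, leaves, isolated vertices) appear.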

The Erdős–Rényi graph \({\mathbb {G}}\) at and below criticality was proposed in [12] as a simple and natural model on which to address the question of spatial localization of eigenvectors. Its graph structure provides an intrinsic and nontrivial notion of distance, which allows for a study of the geometry of the eigenvectors. It can be interpreted as a model of quantum disorder, where the disorder arises from the random geometry of the graph. Moreover, its phase diagram turns out to be remarkably amenable to rigorous analysis.

In this paper we establish the existence of a fully localized phase in a region of the phase diagram of \({\mathbb {G}}\) near the spectral edge. This complements the completely delocalized phase established in [12, 14, 42, 50]. Our results in both phases are quantitative with essentially optimal bounds.

As a consequence, for a range of critical densities \(d \asymp \log N\), we establish a mobility edge separating the localized and delocalized phases. We derive the explicit behaviour of the localization length on either side of the mobility edge. In particular, we show how the localization length diverges as one approaches the mobility edge from the localized phase (see Fig. 3 below). The Erdős–Rényi graph at criticality is hence one of the very few models where a mobility edge can be rigorously established. Moreover, our proofs yield strong quantitative control of the localization length in the localized phase, as well as complete delocalization in the delocalized phase, all the way up to the mobility edge in both phases. To the best of our knowledge, this is the first time quantitative control is obtained in the vicinity of the mobility edge.

A graphical overview of the main result of this paper, and how it fits into the previous results of [12, 14], is provided by the phase diagram of Fig. 1. It depicts three phases, which are most conveniently characterized by the \(\ell ^\infty \)-norm \(\Vert {\textbf{w}} \Vert _\infty \) of an \(\ell ^2\)-normalized eigenvector \({\textbf{w}}\). Clearly, \(N^{-1} \leqslant \Vert {\textbf{w}} \Vert _\infty ^2 \leqslant 1\), where \(\Vert {\textbf{w}} \Vert _\infty ^2 = 1\) corresponds to localization at a single vertex and \(\Vert {\textbf{w}} \Vert _\infty ^2 = N^{-1}\) to perfect delocalization. We say that an eigenvalue \(\lambda \) of the rescaled adjacency matrix with eigenvector \({\textbf{w}}\) belongs to

  1. (i)

    the localized phase if \(\Vert {\textbf{w}} \Vert _\infty ^2 \asymp 1\),

  2. (ii)

    the delocalized phase if \(\Vert {\textbf{w}} \Vert _\infty ^2 = N^{-1 + o(1)}\),

  3. (iii)

    the semilocalized phase if \(\Vert {\textbf{w}} \Vert _\infty ^2 \geqslant N^{-\gamma }\) for some constant \(\gamma < 1\).

In particular, the localized phase is a subphase of the semilocalized phase. The result of this paper is the existence of phase (i), while phases (ii) and (iii) were previously established in [12, 14].
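These regimes are already visible in small numerical experiments. The following Python sketch (ours, for illustration only; the parameters N and b and all library calls are our choices, not part of the paper) samples \({\mathbb {G}}(N, d/N)\) with \(d = b \log N\), restricts to the giant component, and prints \(\Vert {\textbf{w}} \Vert _\infty ^2\) for the eigenvectors of \(A/\sqrt{d}\) with the largest eigenvalues.

import numpy as np
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(0)
N, b = 2000, 2.0
d = b * np.log(N)

# Adjacency matrix of G(N, d/N): keep each edge independently with probability d/N.
upper = np.triu(rng.random((N, N)) < d / N, k=1)
A = (upper | upper.T).astype(float)

# Restrict to the giant (largest connected) component.
_, labels = connected_components(A, directed=False)
giant = np.flatnonzero(labels == np.bincount(labels).argmax())
A = A[np.ix_(giant, giant)]

# Rescaled adjacency matrix and its spectral decomposition.
H = A / np.sqrt(d)
eigvals, eigvecs = np.linalg.eigh(H)
sup_norms = (eigvecs ** 2).max(axis=0)   # ||w||_inf^2 for each (l^2-normalized) eigenvector

# The largest eigenvalue is the Perron outlier of order sqrt(d); the next few lie
# near the spectral edge, where ||w||_inf^2 of order one signals localization.
for lam, s in zip(eigvals[-6:], sup_norms[-6:]):
    print(f"lambda = {lam:7.3f}    ||w||_inf^2 = {s:.2e}")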

Fig. 1

The phase diagram of the rescaled adjacency matrix \(A / \sqrt{d}\) of the (giant component of the) Erdős–Rényi graph \({\mathbb {G}}(N,d/N)\) at criticality, where \(d = b \log N\) with b fixed. The horizontal axis records the location \(\lambda \) in the spectrum and the vertical axis the sparseness parameter b. The spectrum is confined to the coloured region, which is split into the indicated phases. The thick purple lines correspond to phase boundaries, which are not covered by our results. (The phase boundary at energy 0 for \(b \leqslant 1\) is discussed in [12, 14].) For large enough b, there is a mobility edge between the localized and the delocalized phases at energies \(\pm 2\)

We now briefly describe the structure of the phase diagram in Fig. 1. It is well known [75] that, as long as \(d \gg 1\), the global eigenvalue density of the rescaled adjacency matrix \(H = A / \sqrt{d}\) converges to the semicircle law supported in \([-2,2]\). We write \(d = b \log N\) for some constantFootnote 1 \(b > 0\). The localized and semilocalized phases exist only if \(b < b_*\), where

(1.1)

For fixed \(b < b_*\), the spectrum splits into two disjoint parts, the delocalized phase \((-2,0) \cup (0,2)\) and the semilocalized phase \((-\lambda _{\max }, -2) \cup (2, \lambda _{\max })\), where \(\lambda _{\max } > 2\) is an explicit function of b (see (B.3) below). A region \((-\lambda _{\max }, -\lambda _{\textrm{loc}}) \cup (\lambda _{\textrm{loc}}, \lambda _{\max })\) near the spectral edges is in fact fully localized, where \(2 \leqslant \lambda _{\textrm{loc}} < \lambda _{\max }\). In particular, for large enough b, the semilocalized phase consists entirely of the localized phase, i.e. \(\lambda _{\textrm{loc}} = 2\). For smaller values of b, the diagram in Fig. 1 does not rule out the possibility of an eigenvector \({\textbf{w}}\) in the semilocalized phase satisfying \(\Vert {\textbf{w}} \Vert _\infty ^2 = N^{-\gamma + o(1)}\) for some constant \(\gamma \in (0,1)\). This latter scenario corresponds to eigenvectors that are neither fully localized nor fully delocalized,Footnote 2 where \(\gamma \in (0,1)\) plays the role of an anomalous fractal dimension. In fact, it is plausible that such fractal eigenvectors occur in the semilocalized phase for small enough b; for more details, we refer to [70] and the heuristic discussion on localization phenomena later in this subsection, as well as [15, Appendix G].

The localization-delocalization transition for \({\mathbb {G}}\) described above is an example of an Anderson transition, where a disordered quantum system exhibits localized or delocalized states depending on the disorder strength and the location in the spectrum, corresponding to an insulator or conductor, respectively. Originally proposed in the 1950s [17] to model conduction in semiconductors with random impurities, this phenomenon is now recognized as a general feature of wave transport in disordered media, and is one of the most influential ideas in modern condensed matter physics [3, 43, 60, 63]. It is expected to occur in great generality whenever linear waves, such as quantum particles, propagate through a disordered medium. For weak enough disorder, the stationary states are expected to be delocalized, while a strong enough disorder can give rise to localized states.

The general heuristic behind localization is the following. A disordered quantum system is characterized by its Hamiltonian H, a large Hermitian random matrix. The disorder inherent in H gives rise to spatial regions where the environment is in some sense exceptional, such as vertices of unusually large degree for the Erdős–Rényi graph.Footnote 3 These regions are possible localization centres, around which localized states may form. Whether they do so is captured by the following well-known rule of thumb, also known as Mott’s criterion: localization occurs whenever the eigenvalue spacing is much larger than the tunnelling amplitude between localization centres. The simplest illustration of this rule is for a two-state system whose Hamiltonian is the matrix \(\bigl ( {\begin{matrix} a &{} \tau \\ \tau &{} b \end{matrix}} \bigr ). \) Setting the tunnelling amplitude \(\tau \) between the two sites to zero, we have an eigenvalue spacing \(|a - b |\). Denoting by \({\textbf{e}}_1, {\textbf{e}}_2\) the standard basis vectors of \({\mathbb {R}}^2\), we find that if \(|\tau | \ll |a - b |\) then the eigenvectors are approximately \({\textbf{e}}_1, {\textbf{e}}_2\), corresponding to localization, and if \(|\tau | \gg |a - b |\) then the eigenvectors are approximately \(\frac{1}{\sqrt{2}} ({\textbf{e}}_1 \pm {\textbf{e}}_2)\), corresponding to delocalization.
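For the reader's convenience, here is the elementary computation behind this rule of thumb (standard two-by-two linear algebra, spelled out as a worked example). The exact eigenvalues and, for \(a > b\), the eigenvector associated with the larger one satisfy

$$\begin{aligned} \lambda _\pm = \frac{a+b}{2} \pm \sqrt{\frac{(a-b)^2}{4} + \tau ^2} , \qquad \frac{(w_+)_2}{(w_+)_1} = \frac{\lambda _+ - a}{\tau } = \frac{\tau }{a-b} + O \biggl ( \frac{\tau ^3}{|a-b |^3} \biggr ) . \end{aligned}$$

Hence for \(|\tau | \ll |a-b |\) the eigenvectors agree with \({\textbf{e}}_1, {\textbf{e}}_2\) up to relative corrections of order \(\tau /(a-b)\), while at resonance \(a = b\) they are exactly \(\frac{1}{\sqrt{2}}({\textbf{e}}_1 \pm {\textbf{e}}_2)\) with eigenvalues \(a \pm \tau \).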

More generally, for a disordered Hamiltonian H defined on some connected graph, a simple yet instructive way to think of the rule of thumb is to suppose that the localization centres are spatially separated, and to construct the Hamiltonian \({{\widehat{H}}}\) from H by removing edges from the graph so as to disconnect the localization centres from each other. Hence, \({{\widehat{H}}}\) is defined on a union of connected components, each of which is associated with a single localization centre. Denote by \({\mathcal {W}}\) the set of localization centres, which are vertices in the underlying graph. A component associated with a localization centre \(x \in {\mathcal {W}}\) trivially gives rise to a localized state \({\textbf{w}}(x)\) of \({{\widehat{H}}}\) with eigenvalue \(\theta (x)\), which has the interpretation of the energy of the localization centre x. Upon putting back the removed edges, one can imagine two scenarios:

  1. (i)

    localization, where the eigenvectors of H remain close to the eigenvectors \({\textbf{w}}(x)\) of \({{\widehat{H}}}\);

  2. (ii)

    hybridization, where eigenvectors \({\textbf{w}}(x)\) associated with several resonant localization centres x with similar energies \(\theta (x)\) are superimposed to form eigenvectors of H.

We refer to Fig. 2 for an illustration of this dichotomy.

Fig. 2

A schematic illustration of the localization-hybridization dichotomy. A disordered Hamiltonian H defined on a graph has three localization centres whose energies are in resonance. They are indicated in red, green, and blue. The Hamiltonian \({{\widehat{H}}}\) is obtained from H by splitting the graph into disconnected components around each localization centre. This gives rise to three eigenvalues of \({{\widehat{H}}}\) associated with the three components. Each component associated with a centre x carries a localized state \(\textbf{w}(x)\) drawn as a decaying density in the colour corresponding to that of the centre. The spectrum of \({{\widehat{H}}}\) is drawn above the real line in dotted lines that match the colour of the associated centre. Upon putting back the edges of the graph to return to H, we can have either localization or hybridization, depending on Mott’s criterion. In each case, we draw the spectrum of H below the real line. In the case of localization, the eigenvectors of H remain close to \({\textbf{w}}(x)\) and the eigenvalues are shifted by an amount that is small compared to the eigenvalue spacing. In the case of hybridization, the eigenvectors of H are delocalized over the three centres, being approximately nontrivial linear combinations of all three vectors \({\textbf{w}}(x)\)

To use Mott’s criterion, we note that one possible way of quantifying the tunnelling amplitude is

(1.2)

Indeed, this expression clearly generalizes the tunnelling amplitude for the two-level system given above, expressing in general the error arising from putting back the edges of H missing from \(\widehat{H}\), with respect to the family \(({\textbf{w}}(x))_{x \in {\mathcal {W}}}\). It measures the extent to which the localized eigenvectors \({\textbf{w}}(x)\) of \({{\widehat{H}}}\) are approximate eigenvectors of H (sometimes also called quasi-modes of H). Under some assumptions on the localization components, \(\tau \) also controls the off-diagonal part in a block-diagonal representation of H adapted to the family \(({\textbf{w}}(x))_{x \in {\mathcal {W}}}\); see Sect. 2.3 below. Mott’s criterion states that localization occurs whenever \(\tau \ll \Delta \), where

(1.3)

is the eigenvalue spacing. Otherwise, we expect hybridization. The main difficulty in establishing localization is therefore to control resonances, i.e. pairs of vertices \(x \ne y\) such that \(|\theta (x) - \theta (y) |\) is small, and to rule out hybridization. Although this picture is helpful in gaining intuition about localization, in many instances, such as in the proof for the Erdős–Rényi graph in this paper,Footnote 4 it is but a rough caricature and the true picture is considerably more subtle (as we explain later in this subsection and in more detail in Sect. 2.3).

The semilocalized phase of the Erdős–Rényi graph from [12] is a phase where the eigenvectors are concentrated around a small number of localization centres, but where hybridization cannot be ruled out. As shown in [12], the set of all possible localization centres of \({\mathbb {G}}\) corresponds to vertices \(x \in [N]\) whose normalized degree \(\alpha _x\), the degree of x divided by d, is greater than 2. The energy \(\theta (x)\) of a localization centre x is approximately equal to \(\Lambda (\alpha _x)\), where we introduced the function \(\Lambda :[2,\infty ) \rightarrow [2,\infty )\) through

(1.4)

As shown in [13] (see also [71]), there is a one-to-one correspondence between eigenvalues \(\lambda > 2\) in the semilocalized phase and vertices x of normalized degree \(\alpha _x \geqslant 2\) given by \(\lambda = \Lambda (\alpha _x) + o(1)\). An eigenvalue \(\lambda \) in the semilocalized phase has an eigenvector that is almost entirely concentrated in small balls around the set of vertices in resonance with the energy \(\lambda \) [12]. The size of the set of resonant vertices \(\mathcal W_\lambda \) is comparable to the density of states, equal to \(N^{\rho _b(\lambda ) + o(1)}\) for an explicit exponent \(\rho _b(\lambda ) < 1\) given in (B.4) in Appendix B below. Hence, owing to the small size of the set of resonant vertices \({\mathcal {W}}_\lambda \), the semilocalized phase is sharply distinguished from the delocalized phase. However, the key issue of controlling resonances and ruling out hybridization is not addressed in [12] (see Fig. 2).

In this paper we prove localization by ruling out hybridization among the resonant vertices. Our result holds for the largest and smallest \(N^{\mu }\) eigenvalues for \(\mu < \frac{1}{24}\). For small enough \(\mu \), our obtained rate of exponential decay is optimal all the way up to radii of the order of the diameter of the graph. The bound \(\frac{1}{24}\) is not optimal, and we expect that by a refinement of the method developed in this paper it can be improved; for the sake of keeping the argument reasonably simple, we refrain from doing so here. Heuristic arguments suggest that the optimal upper bound for \(\mu \) is \(\frac{1}{4}\); see [15, Appendix G] as well as [70].

At this point it is helpful to review the previous works [12, 16], which addressed the tunnelling amplitude (1.2) in the above simple picture of localization based on disjoint neighbourhoods of localization centres. In [12], the estimate \(\tau \lesssim d^{-1/2}\) was established in the entire semilocalized phase, while in [16] it was improved to \(\tau \lesssim d^{-3/2}\) at the spectral edge. In fact, here we argue that the best possible bound on \(\tau \) in terms of the local approximate eigenvectors \({\textbf{w}}(x)\) introduced above is \(\textrm{e}^{-c \frac{\log N}{\log d}}\) for some constant \(c > 0\). To see this, we recall from [12] that the vector \({\textbf{w}}(x)\) is exponentially decaying around x at some fixed rate \(C > 0\) depending on b. Thus, the best possible estimate for \(\tau \) arising from the exponential decay is \(\textrm{e}^{-C r}\), where \(r \asymp {{\,\textrm{diam}\,}}({\mathbb {G}}) = \frac{\log N}{\log d} (1+ o(1))\) (see [32]).
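Spelled out (simple arithmetic, using the diameter asymptotics just quoted), such a bound is only subpolynomially small in N:

$$\begin{aligned} \textrm{e}^{-C r} = \exp \Bigl ( - C \, \frac{\log N}{\log d} \, (1 + o(1)) \Bigr ) = N^{- \frac{C}{\log d} (1 + o(1))} = N^{-o(1)} , \end{aligned}$$

since \(d \rightarrow \infty \) and hence \(C / \log d = o(1)\).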

As for the eigenvalue spacing (1.3), it was not addressed at all in [12]. In [16], it was estimated at the spectral edge as \(\Delta \geqslant d^{-1 - \varepsilon }\) with high probability for any constant \(\varepsilon > 0\). Combined with the bound \(\tau \lesssim d^{-3/2}\) at the spectral edge obtained in [16], one finds that Mott’s criterion is satisfied at the spectral edge. This observation was used in [16] to prove localization for the top O(1) eigenvectors.

In the interior of the localized phase, the eigenvalue spacing at energy \(\lambda \) is typically of order \(N^{-\rho _b(\lambda )}\), where \(\rho _b(\lambda ) > 0\) is an exponent defined in (B.4) below. This is much smaller than the best possible estimate on the tunnelling amplitude, \(\textrm{e}^{-c \frac{\log N}{\log d}} = N^{-o(1)}\). Hence, Mott’s criterion is never satisfied inside the localized phase, and thus the simple picture based on local approximate eigenvectors cannot be used to establish localization. In this paper we therefore introduce a new setup for proving localization.

The first key idea of our proof is to abandon the above simple picture of localization, and to replace the local approximate eigenvectors \({\textbf{w}}(x)\) by global approximate eigenvectors, denoted by \({\textbf{u}}(x)\), which approximate eigenvectors of H much more accurately and therefore lead to a much smaller tunnelling amplitude (1.2). To define \({\textbf{u}}(x)\), we consider the graph obtained from \({\mathbb {G}}\) by removing all localization centres except x; we denote by \(\lambda (x)\) the second largest eigenvalue of its adjacency matrix, and by \({\textbf{u}}(x)\) the associated eigenvector. The latter is localized around the vertex x. Crucially, the quantity (1.2) with \({\textbf{w}}(x)\) replaced by \({\textbf{u}}(x)\) can now be estimated by a polynomial error \(N^{-\zeta }\) for some \(\zeta > 0\).
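As a schematic illustration of this construction (a numerical sketch only, not the paper's code; it uses the convention of Sect. 2.1 below, where removing a vertex set from H means zeroing out the corresponding rows and columns, and the function name and calling convention are ours):

import numpy as np

def global_profile_vector(H, centres, x):
    # lambda(x) and u(x) in the spirit of Definition 2.1 below: remove all
    # localization centres except x by zeroing their rows and columns of H,
    # then take the second largest eigenvalue and its eigenvector.
    removed = [v for v in centres if v != x]
    M = H.copy()
    M[removed, :] = 0.0
    M[:, removed] = 0.0
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvals[-2], eigvecs[:, -2]

In contrast to the local vectors \({\textbf{w}}(x)\), which only see a ball around x, the vector returned here retains all of \({\mathbb {G}}\) except the other localization centres.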

The price to pay for passing from local approximate eigenvectors \(\textbf{w}(x)\) to global approximate eigenvectors \({\textbf{u}}(x)\) is a breakdown of orthogonality. Indeed, the vectors \({\textbf{u}}(x)\) have a nonzero overlap, and a significant difficulty in our proof is to control these overlaps and various resulting interactions between localization centres.

To complete the verification of Mott’s criterion, we need to establish a polynomial lower bound \(\min _{x \ne y \in {\mathcal {W}}} |\lambda (x) - \lambda (y) | \geqslant N^{-\eta }\) for some \(\eta < \zeta \). Clearly, the left-hand side cannot be larger than the eigenvalue spacing \(N^{-\rho _b(\lambda )}\) at the energy \(\lambda \) we are considering, which yields the necessary bound \(\eta> \rho _b(\lambda ) > 0\). Hence, we require an anticoncentration result for the eigenvalue difference \(\lambda (x) - \lambda (y)\) on a polynomial scale. Owing to the discrete law of \({\mathbb {G}}\) (a product of independent Bernoulli random variables), methods based on smoothness such as Wegner estimates [73] are not available, and obtaining strong enough anticoncentration is the most involved part of our proof. Our basic strategy is to perform a recursive resampling of neighbourhoods of increasingly large balls around y. At each step, we derive a concentration bound for \(\lambda (x)\) and an anticoncentration bound for \(\lambda (y)\). The key tool for the latter is a self-improving version, due to Kesten [57], of a classical anticoncentration result of Doeblin, Lévy, Kolmogorov, and Rogozin. In order to obtain sufficiently strong anticoncentration, it is crucial to perform the recursion up to radii comparable to the diameter of \({\mathbb {G}}\).
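For orientation (standard background, not a statement from this paper), anticoncentration of a real random variable X on scale t is conveniently phrased via the Lévy concentration function

$$\begin{aligned} Q_X(t) := \sup _{u \in {\mathbb {R}}} {\mathbb {P}} \bigl ( u \leqslant X \leqslant u + t \bigr ) , \end{aligned}$$

and, roughly speaking, the required estimate amounts to showing that \(Q_{\lambda (x) - \lambda (y)}(N^{-\eta })\) is small enough to survive a union bound over all pairs of localization centres.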

We conclude this overview with a survey of related results. The eigenvalues and eigenvectors of the Erdős–Rényi graph have been extensively studied in the denser regime \(d \gg \log N\). Complete eigenvector delocalization for \(d \gg \log N\) was established in [42, 50]. The local spectral statistics in the bulk were proved to follow the universal GOE statistics in [40, 52] for \(d \geqslant N^{o(1)}\). At the spectral edge, the local spectral statistics were proved to be Tracy–Widom for \(d \gg N^{1/3}\) [40, 62], to exhibit a transition from Tracy–Widom to Gaussian at \(d \asymp N^{1/3}\) [53], and to be Gaussian throughout the regime \(N^{o(1)} \leqslant d \ll N^{1/3}\) [49, 53]. In fact, in the latter regime the Tracy–Widom statistics were recovered in [55, 61] after subtraction of an explicit random shift.

The random d-regular graph is another canonical model of sparse random graphs. Owing to the regularity constraint, it is much more homogeneous than the Erdős–Rényi graph and only exhibits a delocalized phase. The eigenvectors were proved to be completely delocalized for all \(d \geqslant 3\) in [21, 22, 54], and the local spectral statistics in the bulk were shown to follow GOE statistics for \(d \geqslant N^{o(1)}\) in [19] and at the edge Tracy–Widom statistics for \(N^{o(1)} \leqslant d \ll N^{1/3}\) in [20, 56] or for \(N^{2/3} \ll d \leqslant N/2\) in [48].

Anderson transitions have been studied in a variety of models. The archetypal example is the tight-binding, or Anderson, model on \({\mathbb {Z}}^d\) [1, 2, 11, 18]. In dimensions \(d \leqslant 2\), all eigenvectors of the Anderson model are expected to be localized, while for \(d \geqslant 3\) a coexistence of localized and delocalized phases, separated by a mobility edge, is expected for small enough disorder. So far, only the localized phase of the Anderson model has been understood rigorously, starting from the landmark works [7, 45]; see for instance [11] for a recent survey.

Although a rigorous understanding of the metal-insulator transition for the Anderson tight-binding model is still elusive, some progress has been made for random band matrices. Random band matrices [30, 46, 64, 74] interpolate between the Anderson model and mean-field Wigner matrices. They retain the d-dimensional structure of the Anderson model but have proved more amenable to rigorous analysis. They are conjectured [46] to have a phase diagram similar to that of the Anderson model in dimensions \(d \geqslant 3\). For \(d = 1\) much has been understood both in the localized [31, 33, 65, 66] and the delocalized [27, 28, 29, 35, 36, 37, 38, 39, 41, 51, 67, 68, 79] phases. For large enough d, recent progress in the delocalized phase has been made in [76, 77, 78]. A simplification of band matrices is the ultrametric ensemble [47], where the Euclidean metric of \({\mathbb {Z}}^d\) is replaced with an ultrametric arising from a tree structure. For this model, a phase transition was rigorously established in [72].

Another modification of the d-dimensional Anderson model is the Anderson model on the Bethe lattice, an infinite regular tree corresponding to the case \(d = \infty \). For it, the existence of a delocalized phase was shown in [8, 44, 58]. In [9, 10] it was shown that for unbounded random potentials the delocalized phase exists for arbitrarily weak disorder. The underlying mechanism is resonant delocalization, in which the exponentially decaying tunnelling amplitudes between localization centres are counterbalanced by an exponentially large number of possible channels through which tunnelling can occur, so that Mott’s criterion is violated. As a consequence, the eigenvectors hybridize.

Heavy-tailed Wigner matrices, or Lévy matrices, whose entries have \(\alpha \)-stable laws for \(0< \alpha < 2\), were proposed in [34] as a simple model that exhibits a transition in the localization of its eigenvectors; we refer to [6] for a summary of the predictions from [34, 69]. In [24, 25] it was proved that eigenvectors are weakly delocalized for energies in a compact interval around the origin, and for \(0< \alpha < 2/3\) eigenvectors are weakly localized for energies far enough from the origin. In [6] full delocalization, as well as GOE local eigenvalue statistics, were proved in a compact interval around the origin, and in [5] the law of the eigenvector components was computed. Recently, by comparison to a limiting tree model, a mobility edge was established in [4] for \(\alpha \) near 0 or 1.

Conventions Every quantity that is not explicitly called fixed or a constant is a sequence depending on N. We use the customary notations \(o(\cdot )\) and \(O(\cdot )\) in the limit \(N \rightarrow \infty \). For nonnegative X, Y, if \(X = O(Y)\) then we also write \(X \lesssim Y\), and if \(X = o(Y)\) then we also write \(X \ll Y\). Moreover, we write \(X \asymp Y\) to mean \(X \lesssim Y\) and \(Y \lesssim X\). We say that an event \(\Omega \) holds with high probability if \({\mathbb {P}}(\Omega ) = 1 - o(1)\). Throughout this paper every eigenvector is assumed to be normalized in \(\ell ^2([N])\). Finally, we use \(\kappa \in (0,1)\) to denote a small positive constant, which is used to state assumptions and definitions; a smaller \(\kappa \) always results in a weaker condition.

1.2 Results

Let \({\mathbb {G}} \equiv {\mathbb {G}}(N,d/N)\) be the Erdős–Rényi graph with vertex set [N] and edge probability d/N for \(0 \leqslant d \leqslant N\). Let \(A = (A_{xy})_{x,y \in [N]} \in \{0,1\}^{N\times N}\) be the adjacency matrix of \({\mathbb {G}}\). Thus, \(A = A^*\), \(A_{xx}=0\) for all \(x \in [N]\), and \((A_{xy})_{x < y}\) are independent \({\text {Bernoulli}}(d/N)\) random variables. Define the rescaled adjacency matrix

$$\begin{aligned} H := \frac{A}{\sqrt{d}}. \end{aligned}$$

We always assume that d satisfies

$$\begin{aligned} \sqrt{\log N} \, \log \log N \ll d \leqslant 3 \log N. \end{aligned}$$
(1.5)

Owing to the nonzero expectation of H, it is well known that the largest eigenvalue of H, denoted by \(\lambda _1(H)\), is an outlier separated from the rest of the spectrum (see e.g. Proposition 3.4 (iv) below), and we shall always discard it from our discussion. The lower bound of (1.5) is made for convenience, to ensure that \(\lambda _1(H)\) is separated from the bulk spectrumFootnote 5; the upper bound of (1.5) is made without loss of generality, since for \(d \geqslant 3 \log N\) the localized phase does not exist and the entire spectrum is known to belong to the delocalized phase [12, 14].

We denote by \(B_{r}(x)\) (respectively by \(S_{r}(x)\)) the closed ball (respectively the sphere) of radius r with respect to the graph distance of \({\mathbb {G}}\) around the vertex x. We refer to Sect. 2.1 below for a full account of notations used throughout this paper.

The localized phase is characterized by a threshold \(\alpha ^*\) defined, for any fixed \(\kappa \in (0,1)\) and \(\mu \in [0,1]\), as

(1.6)

We refer to Appendix B below for the basic qualitative properties of \(\alpha ^*\) as well as its graph. We shall show exponential localization for any eigenvector with eigenvalue \(\lambda \) satisfying the condition

$$\begin{aligned} \lambda \ne \lambda _1(H), \quad |\lambda | \geqslant \Lambda (\alpha ^*(\mu ))+ \kappa , \end{aligned}$$
(1.7)

for sufficiently small \(\mu >0\). In particular, the number of eigenvalues \(\lambda \) satisfying (1.7) is with high probability \(N^{\mu + o(1)}\) as \(\kappa \rightarrow 0\); see Remark 1.2 below.

1.2.1 Exponential localization

Our main result is the following theorem. We recall that by convention all eigenvectors are normalized.

Theorem 1.1

(Exponential localization). Suppose that (1.5) holds. Fix \(\kappa \in (0,1)\) and \(\mu \in (0,1/24)\). Then there is a constant \(c \in (0,1)\) depending on \(\kappa \) such that, with high probability, for any eigenvector \({\textbf{w}} = (w_x)_{x \in [N]}\) of H with eigenvalue \(\lambda \) satisfying (1.7), there exists a unique vertex \(x\in [N]\) with \(\alpha _x > \alpha ^*(\mu )\) such that

$$\begin{aligned} w_x^2 = {\frac{\alpha _x-2}{2(\alpha _x - 1)}} + o(1), \quad \Vert {\textbf{w}}|_{B_i(x)^c} \Vert \lesssim \sqrt{\alpha _x} (1-c)^{i}, \end{aligned}$$
(1.8)

for all \(i \in {\mathbb {N}}\) with \(1 \leqslant i \leqslant \frac{1}{6} \frac{\log N}{\log d}\).
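As a concrete reading of (1.8) (pure arithmetic on the stated formulas; the value \(\alpha _x = 4\) is only an example), an eigenvector attached to a centre with normalized degree \(\alpha _x = 4\) satisfies

$$\begin{aligned} w_x^2 = \frac{\alpha _x - 2}{2 (\alpha _x - 1)} + o(1) = \frac{1}{3} + o(1) , \qquad \Vert {\textbf{w}}|_{B_i(x)^c} \Vert \lesssim 2 (1 - c)^i , \end{aligned}$$

so a single vertex carries about a third of the total mass, and all but an exponentially small remainder of the mass lies within a bounded distance of x.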

Remark 1.2

(Eigenvalue locations). The eigenvalue \(\lambda \) of the eigenvector \({\textbf{w}}\) and the associated vertex x from Theorem 1.1 satisfy \(|\lambda | = \Lambda (\alpha _x) + o(1)\) with high probability. This follows from (2.4) and (3.3) below.

The eigenvalue locations in the localized phase were previously studied in [13, Theorem 2.1] (see also [12, Theorem 1.7]). In particular, if \(d > b_* \log N\) then the localized phase does not exist (see [13, Remark 2.5]) and there is no eigenvalue of H satisfying (1.7). Conversely, if \(d \leqslant (b_* - \varepsilon ) \log N\) for some constant \(\varepsilon > 0\) then for small enough \(\kappa > 0\) there is a polynomial number of eigenvalues satisfying (1.7), by [12, Theorem 1.7]. By the same argument, if \(d/\log N\) is small enough, then with high probability \(N^{\mu + o(1)}\) eigenvalues of H satisfy (1.7) as \(N \rightarrow \infty \) and \(\kappa \rightarrow 0\).

Remark 1.3

(Conditions in Theorem 1.1). The exponential decay in Theorem 1.1 holds up to the scale \(\frac{\log N}{\log d}\) of the diameter of \({\mathbb {G}}\). The upper bound 1/24 and the factor 1/6 are not optimal (see the discussion in [15, Appendix G]), and they can be improved with some extra effort, which we however refrain from doing here.

Remark 1.4

(Optimal exponential decay). If \(\mu \) is sufficiently small then the rate \(1 - c\) of exponential decay from (1.8) can be made explicit. Suppose (1.5). Then for each small enough constant \(\varepsilon >0\) and small enough constant \(\mu >0\), depending on \(\varepsilon \), with high probability, for any eigenvector \({\textbf{w}}\) of H with eigenvalue \(\lambda \) satisfying (1.7), there exists a unique vertex \(x\in [N]\) with \(\alpha _{x} > \alpha ^*(\mu )\) such that

$$\begin{aligned} \Vert {\textbf{w}}|_{B_i(x)^c} \Vert \lesssim \sqrt{\alpha _x} \bigg (\frac{1 + O(\varepsilon )}{\sqrt{\alpha _x-1}}\bigg )^{i+1} \end{aligned}$$
(1.9)

for each \(i \in {\mathbb {N}}\) satisfying \(i \ll \frac{\log N}{\log d} \frac{1}{\log \frac{10\log N}{d} }\). This follows from (2.3) and Proposition 4.3 below.

The rate of decay in (1.9) is optimal up to the error term \(O(\varepsilon )\). Indeed, by Theorem 1.6 and the explicit form (1.10)–(1.11) below, we find that

$$\begin{aligned} \Vert {\textbf{w}}|_{B_i(x)^c} \Vert \asymp \sqrt{\alpha _x} \biggl (\frac{1}{\sqrt{\alpha _x - 1}}\biggr )^{i+1} + o(1). \end{aligned}$$

for any fixed \(i \in {\mathbb {N}}\). In particular, (1.9) improves the rate \(\frac{q^i}{(1-q)^2}\) with \(q = (2 + o(1))\frac{\sqrt{\alpha _x - 1}}{\alpha _x}\) obtained in [16, Theorem 1.7] at the spectral edge, corresponding to \(\alpha _x = ( 1 + o(1)) \alpha ^*(0)\).

1.2.2 Geometric structure of eigenvectors

Next, we describe the precise geometric structure of the eigenvectors in the localized phase. For any vertex x with \(\alpha _x > 2\) and radius \(r \in {\mathbb {N}}^*\), we shall define two local vectors, \({\textbf{w}}_r(x)\) and \({\textbf{v}}_r(x)\), which depend only on \(\mathbb G\) in the ball \(B_r(x)\). If \({\textbf{w}}\) is an eigenvector of H as in Theorem 1.1 with associated vertex x, then \({\textbf{w}}\) will be well approximated by \({\textbf{w}}_r(x)\) and \({\textbf{v}}_r(x)\) for suitably chosen \(r \gg 1\).

To define these local vectors, we need the following definitions. Let \(r \in {\mathbb {N}}^*\). For \(\alpha > 2\) define the positive sequence \((u_i(\alpha ))_{i = 0}^{r-1}\) through

(1.10)

We normalize the sequence by choosing \(u_0(\alpha ) > 0\) such that \(\sum _{i = 0}^{r - 1} u_i(\alpha )^2 = 1\).

Definition 1.5

(Localization profile vectors \({\textbf{w}}_r(x)\) and \({\textbf{v}}_r(x)\)). Let \(r \in {\mathbb {N}}^*\) and \(x \in [N]\).

  1. (i)

Denote by \({\textbf{w}}_r(x)\) an eigenvector of \(H|_{B_r(x)}\) associated with its largest eigenvalue, chosen so that its value at x is nonnegative. Here \(H|_{B_r(x)}\) denotes the matrix H restricted to the vertices in \(B_r(x)\) (see Sect. 2.1 below).

  2. (ii)

    For \(\alpha _x > 2\) and \((u_i(\alpha ))_{i=0}^{r-1}\) as in (1.10), define

    (1.11)

    where \({\textbf{1}}_{S_i(x)}\) denotes the indicator function of the sphere \(S_i(x)\).

Note that \({\textbf{w}}_r(x)\) is unique by the Perron-Frobenius theorem for irreducible matrices with nonnegative entries.
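A direct way to compute \({\textbf{w}}_r(x)\) numerically (a sketch of Definition 1.5 (i) only; the breadth-first search helper and all names are ours):

import numpy as np

def local_profile_vector(H, x, r):
    # w_r(x) as in Definition 1.5 (i): top eigenvector of H restricted to the
    # closed ball B_r(x), with the sign fixed so that its value at x is nonnegative.
    ball, frontier = {x}, {x}
    for _ in range(r):                      # breadth-first search up to radius r
        frontier = {int(v) for u in frontier for v in np.flatnonzero(H[u])} - ball
        ball |= frontier
    idx = sorted(ball)
    _, vecs = np.linalg.eigh(H[np.ix_(idx, idx)])
    top = vecs[:, -1]
    if top[idx.index(x)] < 0:
        top = -top
    w = np.zeros(H.shape[0])
    w[idx] = top
    return w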

Theorem 1.6

(Localization profile). Suppose that (1.5) holds and fix \(\kappa \in (0,1)\) and \(\mu \in (0,1/24)\). With high probability, for any eigenvector \(\textbf{w}\) of H with eigenvalue \(\lambda \) satisfying (1.7), there exists a unique vertex \(x \in [N]\) with \(\alpha _x > \alpha ^*(\mu )\) such thatFootnote 6

$$\begin{aligned} {\textbf{w}} = {\textbf{w}}_r(x) + o(1) = {\textbf{v}}_r(x) + o(1) \end{aligned}$$
(1.12)

for each \(r \in {\mathbb {N}}^*\) satisfying \(\log d \ll r \leqslant \frac{1}{6} \frac{\log N}{\log d}\). Here, o(1) is meant with respect to the Euclidean norm on \({\mathbb {R}}^N\).

In particular, \({\textbf{w}}\) has locally the radial exponentially decaying structure of \({\textbf{v}}_r(x)\).

1.2.3 Mobility edge and localization length

Next, we combine the results of this paper with those obtained for the delocalized phase in [12, 14] to establish a mobility edge at \(\pm 2\) for certain values of d, and analyse the structure of the eigenvectors quantitatively in the vicinity of the mobility edge.

Theorem 1.7

(Mobility edge). Fix \(\kappa >0\) and suppose that \(\frac{23}{24} + \kappa \leqslant \frac{d}{b_* \log N} \leqslant 1 - \kappa \). Then, with high probability, for any eigenvector \({\textbf{w}}\) of H with eigenvalue \(\lambda \ne \lambda _1(H)\) we have the following dichotomy.

  1. (i)

    (Localized phase) If \(|\lambda | \geqslant 2 + \kappa \) then \({\textbf{w}}\) is exponentially localized as in (1.8) and (1.12).

  2. (ii)

    (Delocalized phase) If \(|\lambda | \leqslant 2 - \kappa \) then \({\textbf{w}}\) is completely delocalized in the sense that

    $$\begin{aligned} \Vert {\textbf{w}} \Vert _\infty ^2 \leqslant N^{-1 + o(1)}. \end{aligned}$$
    (1.13)

Both phases in Theorem 1.7 are nonempty under the assumption on d; see Sect. 1.1 and [13, Remark 2.5]. Theorem 1.7 establishes a dichotomy because (1.8) and (1.13) are mutually exclusive, since \(\Vert {\textbf{w}} \Vert _\infty ^2 \geqslant w_x^2 = \frac{\alpha _x - 2}{2(\alpha _x - 1)} + o(1) \gtrsim 1\) if \(\alpha _x > \alpha ^*(\mu ) \geqslant 2 + \kappa \).

Next, we investigate the spatial extent of the eigenvectors near the mobility edge. To that end, we use the following notion of localization length. With each normalized vector \({\textbf{w}}\) we associate the length

(1.14)

where \({\textrm{d}}(x,y)\) denotes the distance from x to y in the graph \({\mathbb {G}}\). Regarding \(y \mapsto w_y^2\) as a probability measure on [N], the quantity \(\ell ({\textbf{w}})\) expresses the minimal expected distance from a reference vertex x. The minimizing vertex x has the interpretation of a localization centre for \({\textbf{w}}\).
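In code, this length can be evaluated directly (a sketch of the quantity described in words above; it assumes A is the adjacency matrix of a connected graph, e.g. the giant component, so that all distances are finite, and the function name is ours):

import numpy as np
from scipy.sparse.csgraph import shortest_path

def localization_length(A, w):
    # Minimal expected graph distance from a reference vertex x, where
    # y -> w_y^2 is read as a probability measure on the vertices.
    dist = shortest_path(A, unweighted=True, directed=False)   # d(x, y) for all pairs
    return float(np.min(dist @ (w ** 2)))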

Denote by \({{\,\textrm{diam}\,}}({\mathbb {G}})\) the diameterFootnote 7 of \({\mathbb {G}}\). It is a classical fact [32] that with high probability \({{\,\textrm{diam}\,}}(\mathbb G) = \frac{\log N}{\log d} (1 + o(1))\) as long as \(d \gg 1\).

Theorem 1.8

(Localization length). Fix \(\kappa > 0\) and suppose that \(\frac{23}{24} + \kappa \leqslant \frac{d}{b_* \log N} \leqslant 1 - \kappa \). Then, with high probability, for any eigenvector \({\textbf{w}}\) of H with eigenvalue \(\lambda \ne \lambda _1(H)\) we have

$$\begin{aligned} \ell ({\textbf{w}}) = {\left\{ \begin{array}{ll} \frac{|\lambda |}{2 \sqrt{\lambda ^2 - 4}} + o(1) &{} \text {if } |\lambda | \geqslant 2 + \kappa \\ {{\,\textrm{diam}\,}}({\mathbb {G}}) (1 + o(1)) &{} \text {if } |\lambda | \leqslant 2 - \kappa . \end{array}\right. } \end{aligned}$$
(1.15)

Remark 1.9

By the same proof, the first estimate of (1.15) holds also for all eigenvectors satisfying the conditions of Theorem 1.1. Moreover, the constant \(\frac{23}{24}\) is not optimal and can be reduced with some extra effort.

Theorem 1.8 shows that the localization length diverges as one approaches the mobility edge from the localized phase, and that it equals the diameter of the graph in the delocalized phase. See Fig. 3 for an illustration.
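Quantitatively, expanding the first line of (1.15) just above the mobility edge (elementary algebra on the stated formula) gives the rate of this divergence: writing \(\lambda = 2 + \varepsilon \),

$$\begin{aligned} \ell ({\textbf{w}}) = \frac{|\lambda |}{2 \sqrt{\lambda ^2 - 4}} + o(1) = \frac{2 + \varepsilon }{2 \sqrt{\varepsilon (4 + \varepsilon )}} + o(1) = \frac{1 + O(\varepsilon )}{2 \sqrt{\varepsilon }} + o(1) , \end{aligned}$$

so the localization length diverges like \((\lambda - 2)^{-1/2}\) as \(\lambda \downarrow 2\), while it remains of order one for \(\lambda \) well inside the localized phase.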

Fig. 3

An illustration of the behaviour of the localization length (1.14) around the mobility edge, established in Theorem 1.8. We plot the asymptotic localization length \(\ell \) of an eigenvector with eigenvalue \(\lambda \) as a function of \(\lambda \). Here \(\frac{d}{b_* \log N}\) is a fixed number in \([\frac{23}{24} + \kappa , 1 - \kappa ]\). The spectrum is asymptotically given by the interval \([-\lambda _{\max }, \lambda _{\max }]\). We only draw a portion of the spectrum near the right edge. Below the mobility edge 2, the localization length is \({{\,\textrm{diam}\,}}({\mathbb {G}}) = \frac{\log N}{\log d} (1 + o(1))\). Above the mobility edge 2, the localization length is finite and diverges as one approaches the mobility edge

1.2.4 Eigenfunction correlator and dynamical localization

Finally, as a consequence of Theorem 1.1, we control quantities commonly used to characterize Anderson localization (see e.g. [11, Section 1.4]). In particular, we establish exponential decay of the eigenfunction correlator and dynamical localization.

Corollary 1.10

Suppose (1.5). Then there is a constant \(\mu >0\) such that for each fixed \(\kappa \in (0,1)\), there exist constants \(c, C > 0\) depending only on \(\kappa \) such that the following holds with high probability. Let \(J \subset [\Lambda (\alpha ^*(\mu ))+ \kappa , \sqrt{d}/2]\) be an interval with associated spectral projection \(\Pi _J(H)\). For any \(x \in [N]\), any measurable function \(F:{\mathbb {R}}\rightarrow {\mathbb {C}}\) satisfying \(\Vert F \Vert _\infty \leqslant 1\), and any \(r \geqslant 0\), we have

$$\begin{aligned} \Vert (\Pi _J(H) F(H) {\textbf{1}}_x ) |_{B_{r}(x)^c} \Vert \leqslant C \textrm{e}^{- c r}. \end{aligned}$$
(1.16)

In particular, denoting by \({\textbf{w}}_\lambda \) the normalized eigenvector of H associated with \(\lambda \in {{\,\textrm{spec}\,}}(H)\), the eigenfunction correlator satisfies the estimate

(1.17)

and we have dynamical localization,

(1.18)

Remark 1.11

By a close inspection of the proof in [15, Appendix F] (using that all error probabilities are polynomially small in N), we note that the estimates (1.17) and (1.18) hold also in expectation, provided one multiplies both sides by the factor \(\textrm{e}^{c \, \textrm{d}(x,y)}\).

Structure of the paper We conclude this section with a short summary of the structure of the paper. In Sect. 2, we collect a few basic notations, then state the three core propositions of the paper: Proposition 2.2, which gives exponential decay of the approximate eigenvectors, Proposition 2.3, which compares the approximate eigenvalues with the true eigenvalues, and Proposition 2.4, which estimates the spacing between neighbouring approximate eigenvalues. After stating them, we use them to deduce Theorem 1.1 in Sect. 2.2. Then, we sketch the proofs of these three propositions in Sect. 2.3. Theorems 1.6, 1.7, and 1.8 are proved in the short Sects. 2.4, 2.5, and 2.6, respectively. Section 3 is devoted to preliminary results on the graph \({\mathbb {G}}\), its spectrum, and its Green function. Sections 4, 5 and 6 are devoted to the proofs of Propositions 2.2, 2.3, and 2.4, respectively. In the appendices, we collect some auxiliary results and basic tools.

2 Proof of Main Results

The rest of the paper is devoted to the proofs of Theorems 1.1, 1.6, 1.7, and 1.8, as well as Corollary 1.10. The former four are proved in this section, while Corollary 1.10 is proved in [15, Appendix F].

Throughout, \(\kappa \in (0,1)\) denotes an arbitrary positive constant.

2.1 Basic notations

We write \({\mathbb {N}}= \{0,1,2,\dots \}\). We set for any \(n \in {\mathbb {N}}^*\) and . We write \(|X |\) for the cardinality of a finite set X. For \(X \subset [N]\) we write . We use \(\mathbb {1}_{\Omega }\) to denote the indicator function of an event \(\Omega \).

Vectors in \({\mathbb {R}}^N\) are denoted by boldface lowercase Latin letters like \({\textbf{u}}\), \({\textbf{v}}\) and \({\textbf{w}}\). We use the notation \({\textbf{v}} = (v_x)_{x \in [N]} \in {\mathbb {R}}^N\) for the entries of a vector. We denote by the support of a vector \(\textbf{v}\). We denote by the Euclidean scalar product on \({\mathbb {R}}^N\) and by the induced Euclidean norm. For \(X \subset [N]\) we set . For any \(x \in [N]\), we define the standard basis vector , so that . To any subset \(S \subset [N]\) we assign the vector \({\textbf{1}}_S\in {\mathbb {R}}^N\) given by . In particular, \({\textbf{1}}_{\{ x\}} = {\textbf{1}}_x\).

We denote by \({\textrm{d}}(x,y)\) the distance between the vertices \(x,y \in [N]\) with respect to the graph \({\mathbb {G}}\), i.e. the number of edges in the shortest path connecting x and y. For \(r \in {\mathbb {N}}\) and \(x \in [N]\), we denote by the closed ball of radius r around x, and by the sphere of radius r around the vertex x. For \(X \subset [N]\) we denote by \({\mathbb {G}} |_X\) the subgraph on X induced by \({\mathbb {G}}\).

For a matrix \(M \in {\mathbb {R}}^{N \times N}\), \(\Vert M \Vert \) is its operator norm induced by the Euclidean norm on \({\mathbb {R}}^N\). For an \(N \times N\) Hermitian matrix M, we denote by \(\lambda _1(M) \geqslant \lambda _2(M) \geqslant \cdots \geqslant \lambda _N(M)\) the ordered eigenvalues of M. For an \(N \times N\) matrix \(M\in {\mathbb {R}}^{N\times N}\) and a subset \(X \subset [N]\), we introduce the \(N \times N\) matrices \(M|_X\), with entries \((M|_X)_{xy} = M_{xy} \mathbb {1}_{x,y \in X}\), as well as \(M^{(X)}\), with entries \(M^{(X)}_{xy} = M_{xy} \mathbb {1}_{x,y \notin X}\).

2.2 Exponential localization: proof of Theorem 1.1

In this section, after introducing some notation and stating the core propositions of the proof, we use them to prove our main result, Theorem 1.1. Recalling the definition of \(\Lambda \) from (1.4), we introduce the \(\mu \)-dependent sets

(2.1)
(2.2)

By definition, \({\mathcal {W}} \subset {\mathcal {V}}\).

The following definition introduces the fundamental approximate eigenvalues and eigenvectors underlying our proof.

Definition 2.1

(\(\lambda (x)\) and \({\textbf{u}}(x)\)). For any \(x \in {\mathcal {W}}\), we abbreviate \(\lambda (x) := \lambda _2\big (H^{({\mathcal {V}} {\setminus } \{x\})}\big )\). Moreover, we denote by \(\textbf{u}(x)\) a normalized eigenvector of \(H^{({\mathcal {V}} {\setminus } \{x\})}\) with eigenvalue \(\lambda (x)\), chosen so that its value at x is nonnegative.

As we shall see, with high probability \(\lambda (x)\) is a simple eigenvalue and hence \({\textbf{u}}(x)\) is unique (see Corollary 3.6 below).

The proof of Theorem 1.1 consists of three main steps, which are the content of the three following propositions. The next proposition states that \({\textbf{u}}(x)\) has the exponential decay claimed in Theorem 1.1 for \({\textbf{w}}\). It is proved in Sect. 4 below.

Proposition 2.2

(Exponential decay of \({\textbf{u}}(x)\)). Suppose that (1.5) holds. Then there is a constant \(c \in (0,1)\) such that, for each fixed \(\mu \in [0,1/3)\), with high probability, for each \(x \in {\mathcal {W}}\),

for all \(i \in {\mathbb {N}}\) satisfying \(1 \leqslant i \leqslant \min \big \{ \frac{1}{5} - \frac{\mu }{4}, \frac{1}{3} - \mu \big \} \frac{\log N}{\log d}-2\).

In the proof of Theorem 1.1, the next two propositions will be used to conclude that any eigenvector of H whose associated eigenvalue satisfies (1.7) is close to \({\textbf{u}}(x)\) for some \(x \in {\mathcal {W}}\). Given Proposition 2.2, this will directly imply Theorem 1.1.

The next proposition, Proposition 2.3, states that \(\lambda (x)\) and \({\textbf{u}}(x)\) are approximate eigenvalues and eigenvectors of H, respectively, with an error bounded by an inverse power of N. Moreover, up to such an error, each eigenvalue of H satisfying (1.7) is approximated by \(\lambda (x)\) for some \(x \in {\mathcal {W}}\). In particular, it provides an upper bound for the tunnelling amplitude, in the sense of (1.2), for the global approximate eigenvectors from Definition 2.1. Its proof is given in Sect. 5 below.

Proposition 2.3

(Approximate eigenvalues). Suppose (1.5). Fix \(\mu \in [0,1/4)\) and \(\zeta \in [0, 1/2 - 3\mu /2)\). Then, with high probability, for each \(x \in {\mathcal {W}}\) there exists \(\varepsilon _x \in {\mathbb {R}}\) such that

counted with multiplicity and

$$\begin{aligned} \max _{x\in {\mathcal {W}}}\max \big \{ |\varepsilon _x |, \, \Vert (H-\lambda (x)){\textbf{u}}(x)\Vert \big \} \leqslant N^{-\zeta }. \end{aligned}$$

The next proposition establishes a spacing of at least \(N^{-\eta }\) between the approximate eigenvalues \((\lambda (x))_{x \in {\mathcal {W}}}\), for large enough \(\eta \). It is proved in Sect. 6 below.

Proposition 2.4

(Eigenvalue spacing). Suppose (1.5). Fix \(\mu \in (0,1/24)\) and \(\eta > 8 \mu \). Then, with high probability,

$$\begin{aligned} |\lambda (x) - \lambda (y) | \geqslant N^{-\eta } \end{aligned}$$

for all x, \(y \in {\mathcal {W}}\) with \(x \ne y\).

We now deduce Theorem 1.1 from Propositions 2.2, 2.3, and 2.4.

Proof of Theorem 1.1

Let \(\lambda \) be an eigenvalue of H satisfying (1.7), and \({\textbf{w}}\) an associated, normalized eigenvector. Fix \(\zeta \in (8\mu , 1/3)\) and \(\eta \in (8\mu , \zeta )\). As \(\mu < 1/24\), both intervals are nonempty and \(\zeta < 1/2 - 3\mu /2\). Thus, Propositions 2.3 and 2.4 are applicable with these choices of \(\zeta \) and \(\eta \). We shall show below that, on the intersection of the high-probability events of Propositions 2.3 and 2.4, there exists a unique \(x \in {\mathcal {W}}\) such that

$$\begin{aligned} {\textbf{w}}= {\textbf{u}}(x) +O(N^{\eta -\zeta }) \end{aligned}$$
(2.3)

(under suitable choice of the sign of \({\textbf{w}}\)). Thus, Theorem 1.1 follows from \(\eta < \zeta \) and Proposition 2.2 as \(\exp \big ( \frac{1}{6} \frac{\log N}{\log d} \log (1-c) \big ) \gg N^{-\varepsilon }\) for any \(\varepsilon >0\).

What remains is the proof of (2.3). This is an application of perturbation theory in the form of Lemma D.2, whose conditions we justify now. Note first that, combining (5.24) below and the trivial fact \(\lambda _1({\mathbb {E}}H) = \sqrt{d} (1 + o(1))\) with rank-one eigenvalue interlacing (Lemma D.4), we conclude that with high probability \(\lambda _1(H) = \sqrt{d} (1 + o(1))\) and \(\lambda _2(H) \leqslant \sqrt{d}/2\). Hence, the eigenvalue \(\lambda \) satisfying (1.7) lies in \([\Lambda (\alpha ^*(\mu ))+ \kappa , d/2]\). From Propositions 2.3 and 2.4 with \(\eta < \zeta \), we conclude that

$$\begin{aligned} {{\,\textrm{dist}\,}}(\lambda , {{\,\textrm{spec}\,}}(H){\setminus }\{\lambda \}) \geqslant N^{-\eta } - 2 N^{-\zeta }, \quad |\lambda - \lambda (x) | \leqslant N^{-\zeta } \end{aligned}$$
(2.4)

for a unique \(x \in {\mathcal {W}}\) (see Fig. 4). In particular, as \(\eta < \zeta \), there is \(\Delta \asymp N^{-\eta }\) such that \(\lambda \) is the unique eigenvalue of H in \([\lambda (x) - \Delta , \lambda (x) + \Delta ]\). Moreover, \(\Vert (H-\lambda (x))\textbf{u}(x) \Vert \leqslant N^{-\zeta }\) by Proposition 2.3. Therefore, all conditions of Lemma D.2 are satisfied, and it implies (2.3). This concludes the proof of Theorem 1.1. \(\square \)
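For completeness, the mechanism behind this application of perturbation theory is the following standard computation (a sketch in the spirit of Lemma D.2, not a restatement of it). Let \(P^\perp \) denote the orthogonal projection onto the complement of the eigenvector \({\textbf{w}}\). Since \(H - \lambda (x)\) is invertible on the range of \(P^\perp \), with inverse bounded by \({{\,\textrm{dist}\,}}(\lambda (x), {{\,\textrm{spec}\,}}(H) {\setminus } \{\lambda \})^{-1}\), we get from (2.4) that

$$\begin{aligned} \Vert P^\perp {\textbf{u}}(x) \Vert \leqslant \frac{\Vert (H - \lambda (x)) \, {\textbf{u}}(x) \Vert }{{{\,\textrm{dist}\,}}(\lambda (x), {{\,\textrm{spec}\,}}(H) {\setminus } \{\lambda \})} \lesssim \frac{N^{-\zeta }}{N^{-\eta }} = N^{\eta - \zeta } , \end{aligned}$$

and since \(\Vert {\textbf{u}}(x) \Vert = 1\) this yields (2.3) after choosing the sign of \({\textbf{w}}\) appropriately.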

Fig. 4

An illustration of the setup for the perturbation theory in the proof of Theorem 1.1. We draw two instances of the interval \([\Lambda (\alpha ^*) + \kappa , \infty )\). On the top line, we draw each \(\lambda (x)\) for \(x \in {\mathcal {W}}\) as a blue dot. Each dot is surrounded by a blue buffer of width \(N^{-\eta }\). By Proposition 2.4, these buffers do not intersect with high probability. On the bottom line, we draw each eigenvalue of H as a red dot. Each dot on the top line gives rise to a red region on the bottom line of width \(2 N^{-\zeta }\). Since \(\zeta > \eta \), the red regions are disjoint. By Proposition 2.3, each red region contains exactly one eigenvalue of H and each eigenvalue of H is contained in a red region. Hence, the eigenvalues of H are separated by at least \(N^{-\eta }/2\)

By choosing \(\eta \) in (2.4) of the proof of Theorem 1.1 sufficiently small, we conclude the following result.

Corollary 2.5

(Eigenvalue spacing of H). Suppose (1.5). Fix \(\mu \in (0,1/24)\) and \(\eta > 8\mu \). Then, with high probability,

$$\begin{aligned} {{\,\textrm{dist}\,}}(\lambda , {{\,\textrm{spec}\,}}(H) {\setminus } \{ \lambda \}) \geqslant N^{-\eta }, \end{aligned}$$

for every \(\lambda \in {{\,\textrm{spec}\,}}(H) \cap [\Lambda (\alpha ^*(\mu ))+ \kappa , \infty )\).

Remark 2.6

(Eigenvalue spacing in critical regime). In the critical regime, i.e. when \(d \asymp \log N\), the lower bound on \(\eta \) in the conditions of Proposition 2.4 and Corollary 2.5 can be weakened to \(\eta > 4\mu \). Details can be found in Remark 6.2 below.

Remark 2.7

(Eigenvector mass on vertices in \({\mathcal {V}} {\setminus } \{x \}\)). With high probability the following holds. Let \({\textbf{w}}\) be an eigenvector of H associated with the vertex x as in Theorem 1.1. Then, from \({\textbf{u}}(x)|_{{\mathcal {V}} {{\setminus }} \{x \}} = 0\) and (2.3), we conclude

$$\begin{aligned} \Vert {\textbf{w}}|_{{\mathcal {V}}{\setminus } \{x \}} \Vert \lesssim N^{-\varepsilon } \end{aligned}$$

for any small enough \(\varepsilon >0\).

2.3 Sketch of the proof

In this subsection we sketch the proof of Theorem 1.1. We use the definitions and notations from Sects. 2.1 and 2.2.

The basic strategy is to find an orthogonal matrix U, a diagonal \(n\times n\) matrix \(\Theta = {{\,\textrm{diag}\,}}(\theta _1, \dots , \theta _n)\), a symmetric \((N-n) \times (N-n)\) matrix X, and a symmetric \(N\times N\) matrix E such that the following holds. In the basis of the columns \({\textbf{u}}_1, \dots , {\textbf{u}}_N\) of U, the matrix H has the form

$$\begin{aligned} U^* H U = \begin{pmatrix} \Theta &{} 0 \\ 0 &{} X \end{pmatrix} + E, \end{aligned}$$
(2.5)

where the matrices E and X satisfy

(2.6a)
(2.6b)

here I denotes the interval containing the eigenvalues of H that we are interested in (cf. (1.7)). We call the first n columns \({\textbf{u}}_1, \dots , {\textbf{u}}_n\) of U profile vectors.

If \(\Vert E \Vert = o(1)\) then, for each \(i \in [n]\), the vector \({\textbf{u}}_i\) is an approximate eigenvector of H with approximate eigenvalue \(\theta _i\). Unlike approximate eigenvalues, in general approximate eigenvectors have nothing to do with the actual eigenvectors. For \({\textbf{u}}_i\) to be close to an eigenvector of H, we require the stronger estimates (2.6), which can be regarded as a version of Mott’s criterion in terms of the profile vectors encoded by \({\textbf{u}}_1, \dots , {\textbf{u}}_n\). Localization then follows provided the \({\textbf{u}}_i\) are shown to be localized.

In [13], it was shown that there is a one-to-one correspondence between eigenvalues of H in the semilocalized phase \([2 + o(1), \infty ) {\setminus } \{ \lambda _1(H)\}\) and vertices x of \({\mathbb {G}}\) with normalized degree \(\alpha _x \geqslant 2 + o(1)\). Subsequently, in [12, 16], the eigenvectors of H in the semilocalized phase were investigated using the decomposition (2.5). There, the profile vectors \({\textbf{u}}_i\) were supported in balls \(B_r(x)\) around vertices x of sufficiently large \(\alpha _x\), where \(r \gg 1\). We refer to such vectors as local profile vectors: they are spatially localized (in the graph distance) and their supports are disjoint. Examples of such local profile vectors are \({\textbf{w}}_r(x)\) and \({\textbf{v}}_r(x)\) for \(x \in {\mathcal {W}}\), defined in Definition 1.5 (others were defined in [12, 16]).

The local profile vectors are exponentially decaying with a rate \(c > 0\) depending on b and the energy. The best possible error estimate for \(\Vert (H - \theta _i) {\textbf{u}}_i \Vert \) under the condition that \({\textbf{u}}_i\) is supported in \(B_r(x)\) is obtained by choosing \({\textbf{u}}_i = {\textbf{w}}_r(x)\), the top eigenvector of \(H|_{B_r(x)}\); in that case the error is purely a boundary term of order \(\textrm{e}^{-cr}\). If the supports of the profile vectors are separated by more than 1 (in the graph distance) then it is easy to see that \(\Vert E \Vert \leqslant \max _{i} \Vert (H- \theta _i) {\textbf{u}}_i \Vert \) (see also our formulation of Mott’s criterion (1.2)). Hence, the best possible error estimate for \(\Vert E \Vert \) is \(\textrm{e}^{-cr}\). Since the diameter of \({\mathbb {G}}\) is \(\frac{\log N}{\log d}(1 + o(1))\) with high probability, the best bound resulting from this approach is \(\Vert E \Vert \lesssim N^{-c/\log d} =N^{-o(1)}\) for some constant \(c>0\). However, inside the semilocalized phase this bound is always much larger than the typical eigenvalue spacing \(N^{-\eta }\) for some \(\eta > 0\). Recalling the condition (2.6a), we conclude that any approach to prove localization in the semilocalized phase that uses local profile vectors is doomed to fail.

The reason why any approach based on local profile vectors fails is that local profile vectors (such as \({\textbf{w}}_r(x)\)) are supported on balls containing a comparatively small set of vertices, and the mass of the true eigenvectors outside of such balls is not small enough to be fully negligible. This leads us to introduce the global profile vectors \({\textbf{u}}(x)\), \(x \in {\mathcal {W}}\), from Definition 2.1, with associated approximate eigenvalues \(\lambda (x)\). They are defined as the second eigenvector-eigenvalue pair of the matrix \(H^{({\mathcal {V}}{\setminus } \{x\})}\). Thus, \(\Theta = {{\,\textrm{diag}\,}}( (\lambda (x))_{x \in {\mathcal {W}}})\) and the first \(n = |{\mathcal {W}} |\) columns of U are given by the orthonormalization of the family \(({\textbf{u}}(x))_{x \in {\mathcal {W}}}\). The global profile vector \({\textbf{u}}(x)\) and the best possible local profile vector \({\textbf{w}}_r(x)\) are each defined as eigenvectors of the graph after removal of a set of vertices, \(|{\mathcal {V}} | - 1 \sim N^{\mu } \ll N\) vertices for the former and \(|B_r(x)^c | \sim N\) vertices for the latter. This suggests that \({\textbf{u}}(x)\) is a better approximation of a true eigenvector of H. The price to pay is that its definition is less explicit, and, crucially, the family \(({\textbf{u}}(x))_{x \in {\mathcal {W}}}\) is not orthogonal owing to the global profile vectors having nonzero overlaps. As explained below, the need to control the overlaps of the global profile vectors presents a serious complication.

The proof of Theorem 1.1 consists of three main steps:

  1. (i)

    exponential decay for \({\textbf{u}}(x)\) around x,

  2. (ii)

    \(\Vert E\Vert \leqslant N^{-\zeta }\) and \({{\,\textrm{spec}\,}}(X)\) is separated from I,

  3. (iii)

    \(\min _{x\ne y\in {\mathcal {W}}} |\lambda (x)-\lambda (y)| \geqslant N^{-\eta }\),

for some constants \(\zeta>\eta >0\) (see also Sect. 2.3.4 below). These items correspond to Propositions 2.2, 2.3, and 2.4, respectively. We outline their proofs in Sects. 2.3.1, 2.3.2, and 2.3.3, respectively.

2.3.1 Exponential decay of \({\textbf{u}}(x)\)

First we explain the need to introduce the two different vertex sets \({\mathcal {W}} \subset {\mathcal {V}}\). In the definition of \(\lambda (x)\) and \(\textbf{u}(x)\), all vertices in \({\mathcal {V}}{\setminus } \{x \}\) are removed, while the profile vectors \({\textbf{u}}(x)\) are only considered for x in the smaller set \({\mathcal {W}}\). The difference between \({\mathcal {W}}\) and \({\mathcal {V}}\) is used precisely to obtain a spectral gap for \(H^{({\mathcal {V}} {\setminus } \{x \})}\) around \(\lambda (x)\), \(x \in {\mathcal {W}}\).

To show exponential decay of \({\textbf{u}}(x)\), we use its definition, a simple computation, and a truncated Neumann series expansion to obtain

$$\begin{aligned} \begin{aligned} {\textbf{u}}(x)|_{{\mathcal {V}}^c}&= c_x \bigg (1-\frac{H^{({\mathcal {V}})}}{\lambda (x)}\bigg )^{-1}{\textbf{1}}_{S_1(x)}\\&=c_x\sum _{k=0}^{n-1}\bigg (\frac{H^{(\mathcal V)}}{\lambda (x)}\bigg )^{k}{\textbf{1}}_{S_1(x)}+c_x\bigg (1-\frac{H^{(\mathcal V)}}{\lambda (x)}\bigg )^{-1}\bigg (\frac{H^{(\mathcal V)}}{\lambda (x)}\bigg )^{n}{\textbf{1}}_{S_1(x)}, \end{aligned} \end{aligned}$$
(2.7)

where \(c_x = \frac{\langle {\textbf{1}}_x, {\textbf{u}}(x) \rangle }{\lambda (x) \sqrt{d}}\) (see also (4.2) below). Each term of the sum is supported in \(B_n(x)\) and, thus, vanishes when restricting to \(B_n(x)^c\). Hence, as \(|c_x | \lesssim d^{-1/2}\), \(\Vert {\textbf{1}}_{S_1(x)} \Vert = \sqrt{d \alpha _x}\), and \(\Vert (\lambda (x) - H^{({\mathcal {V}})})^{-1} \Vert \lesssim 1\) by the spectral gap of \(H^{({\mathcal {V}}{\setminus } \{ x\})}\) around \(\lambda (x) = \lambda _2(H^{({\mathcal {V}}{\setminus } \{x\})})\) mentioned above, we obtain \(\Vert {\textbf{u}}(x)|_{B_n(x)^c}\Vert \lesssim \sqrt{\alpha _x} \, q^n \) with \(q=\lambda (x)^{-1}\Vert H^{({\mathcal {V}})}|_{B_{n+1}(x)}\Vert <1\). This is the desired exponential decay. We remark that in (2.7) we establish exponential decay of the Green function of \(H^{({\mathcal {V}})}\) evaluated at \(\lambda (x)\), using that \(\lambda (x)\) is away from the spectrum of \(H^{({\mathcal {V}})}\). This is an instance of a Combes-Thomas estimate, and we translate it into exponential decay for the eigenvector \(\textbf{u}(x)\). Furthermore, we show that \(\lambda (x)\) is isolated in the spectrum of \(H^{({\mathcal {V}}{\setminus }\{x\})}\) and, thus, perturbation theory implies that \({\textbf{u}}(x) = {\textbf{w}}_r(x) + o(1)\).

The rate obtained from the above argument is far from optimal, but an extension of this argument does yield the optimal rate of decay for \(\Vert {\textbf{u}}(x)|_{B_i(x)^c} \Vert \) for small enough \(\mu \). To that end, we choose \(n = C i\) for a large constant \(C>0\) in (2.7), which makes the remainder term in (2.7) subleading. It remains to estimate the terms \((H^{({\mathcal {V}})})^k{\textbf{1}}_{S_1(x)}\) for \(k = i, \ldots , Ci\), since they vanish for \(k < i\) when restricted to \(B_i(x)^c\). For \(k \geqslant i\), we relate \((H^{({\mathcal {V}})})^k{\textbf{1}}_{S_1(x)}\) to the number of walks in a certain family on the graph \({\mathbb {G}}\). We obtain optimal bounds on this number by a path counting argument, exploiting the tree structure of \({\mathbb {G}}|_{B_r(x)}\), a precise bound on the degrees in \(B_r(x){\setminus } \{x \}\), and the concentration of the sphere sizes \(|S_i(x) |\) for \(i \leqslant r\).
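
The locality mechanism behind (2.7) is easy to check numerically. The following minimal sketch (a toy computation under simplifying assumptions, not part of the proof) uses a regular tree as a stand-in for the neighbourhood of x and the rescaling \(H = A/\sqrt{d}\); it confirms that \(H^k {\textbf{1}}_{S_1(x)}\) is supported in \(B_{k+1}(x)\), which is why the truncated Neumann series in (2.7) vanishes on \(B_i(x)^c\).

```python
import numpy as np
from collections import deque

# Toy check of the locality used in (2.7): on a tree, H^k 1_{S_1(x)} is
# supported in B_{k+1}(x).  A regular tree with branching factor b stands in
# for the neighbourhood of the hub x; this is an illustration, not the model.
b, depth = 3, 5
nodes, frontier, edges = [0], [0], []
for _ in range(depth):
    new = []
    for v in frontier:
        for _ in range(b):
            w = len(nodes)
            nodes.append(w)
            edges.append((v, w))
            new.append(w)
    frontier = new
N = len(nodes)
A = np.zeros((N, N))
for v, w in edges:
    A[v, w] = A[w, v] = 1.0
H = A / np.sqrt(b)                       # stand-in for A / sqrt(d)

# graph distances from the root x = 0 via breadth-first search
dist = np.full(N, -1)
dist[0] = 0
queue = deque([0])
while queue:
    v = queue.popleft()
    for w in np.nonzero(A[v])[0]:
        if dist[w] < 0:
            dist[w] = dist[v] + 1
            queue.append(w)

vec = (dist == 1).astype(float)          # indicator of S_1(x)
for k in range(5):
    radius = dist[np.abs(vec) > 1e-12].max()
    print(f"k = {k}: H^k 1_(S_1) is supported within distance {radius} <= {k + 1}")
    vec = H @ vec
```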

2.3.2 Approximate eigenvalues

We now sketch how (ii) is proved. In contrast to the case of local profile vectors discussed above, the proof of \(\Vert E \Vert \lesssim \max _{x \in {\mathcal {W}}} \Vert (H- \lambda (x)) {\textbf{u}} (x) \Vert \) also requires a control of the nonzero overlaps \(\langle {\textbf{u}}(x), {\textbf{u}}(y) \rangle \) for \(x \ne y\). By exponential decay of \({\textbf{u}}(x)\), it is easy to see that these overlaps are \(N^{-o(1)}\), but, as explained above, a polynomial bound \(N^{-c}\) is required to prove localization. The construction of \({\textbf{u}}(x)\) and \(\lambda (x)\) and a simple computation reveal that

$$\begin{aligned} (H - \lambda (x)) {\textbf{u}}(x) = \sum _{y \in {\mathcal {V}} {\setminus } \{x \}} \varepsilon _y(x) {\textbf{1}}_y, \qquad \varepsilon _y(x) := \frac{1}{\sqrt{d}} \sum _{t \in S_1(y)} \langle {\textbf{1}}_t, {\textbf{u}}(x) \rangle . \end{aligned}$$
(2.8)

Then the main idea to estimate \(\varepsilon _y(x)\) is the following elementary bound. Let \({\mathcal {T}}\) be a finite set. For any \((u_t)_{t\in {\mathcal {T}}} \in {\mathbb {C}}^{{\mathcal {T}}}\) and any random \(T \subset {\mathcal {T}}\) we have

$$\begin{aligned} {\mathbb {E}}\Biggl [\sum _{t \in T} |u_t |^2\Biggr ]= \sum _{t \in {\mathcal {T}}} {\mathbb {E}}[\mathbb {1}_{t \in T}] |u_t |^2 \leqslant \max _{t \in {\mathcal {T}}} {\mathbb {P}}(t \in T) \sum _{t \in {\mathcal {T}}} |u_t |^2. \end{aligned}$$
(2.9)

Heuristically, by the independence of the edges in the Erdős-Rényi graph, the edges between \({\mathcal {V}} {\setminus } \{x\}\) and \(({\mathcal {V}} {\setminus } \{x\})^c\) are sampled independently of the subgraphs \({\mathbb {G}}|_{({{\mathcal {V}}}{\setminus }\{x\})^c}\) and \({\mathbb {G}}|_{{\mathcal {V}}{\setminus }\{x\}}\) and are, therefore, independent of \({\textbf{u}}(x)\). Hence, (2.8), (2.9) and \(|S_1(y) | \lesssim \log N\) yield

$$\begin{aligned} {\mathbb {E}}\big [|\varepsilon _y(x)|^2\big |A^{(y)}\big ]\leqslant & {} \frac{1}{d} \,{\mathbb {E}}\bigg [|S_1(y)|\sum _{t\in S_1(y)}\langle {\textbf{1}}_t, {\textbf{u}}(x) \rangle ^2\bigg |A^{(y)}\bigg ] \\\lesssim & {} \frac{(\log N)^2}{Nd}\Vert \textbf{u}(x)\Vert ^2=N^{-1+o(1)}. \end{aligned}$$

From \(|{{\mathcal {V}}}|=N^{\mu +o(1)}\) and Chebyshev’s inequality, we therefore conclude \(\Vert (H-\lambda (x)){\textbf{u}}(x)\Vert ^2\leqslant N^{2\mu -1+o(1)}\) with high probability.
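
The elementary bound (2.9) can also be checked by a quick simulation; the following sketch is purely illustrative, with an arbitrary toy vector and arbitrary inclusion probabilities playing the role of the edge probabilities.

```python
import numpy as np

# Monte Carlo sanity check of (2.9): the average mass that a fixed vector u
# places on a random subset T (independent inclusions) is at most
# max_t P(t in T) times the total mass.  All inputs are toy choices.
rng = np.random.default_rng(0)
n = 200
u = rng.standard_normal(n)
p = rng.uniform(0.0, 0.05, size=n)      # P(t in T), playing the role of ~ d/N

samples = 20000
mass = np.empty(samples)
for s in range(samples):
    T = rng.random(n) < p               # independent inclusions
    mass[s] = np.sum(np.abs(u[T]) ** 2)

lhs = mass.mean()
rhs = p.max() * np.sum(np.abs(u) ** 2)
print(f"E[mass on T] = {lhs:.4f}  <=  max_t P(t in T) * ||u||^2 = {rhs:.4f}")
```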

Since \(({\textbf{u}}(x))_{x \in {\mathcal {W}}}\) is not an orthogonal family, we choose the first columns of U in (2.5) as the Gram-Schmidt orthonormalization \(( {\textbf{u}}^\perp (x))_{x \in {\mathcal {W}}}\) of \(({\textbf{u}}(x))_{x \in {\mathcal {W}}}\) (with respect to a fixed order on \(\mathcal W\)), i.e.

$$\begin{aligned} {\textbf{u}}^\perp (x) := \frac{(1 - \Pi _{<x}) {\textbf{u}}(x)}{\Vert (1 - \Pi _{<x}) {\textbf{u}}(x) \Vert }, \end{aligned}$$
(2.10)

where \(\Pi _{<x}\) denotes the orthogonal projection onto \({\text {span}}\{ \textbf{u}(y) :y \in {\mathcal {W}}, \, y < x\}\). It remains to show that, for any \(x \in {\mathcal {W}}\), \(\Vert (H-\lambda (x)){\textbf{u}}^\perp (x) \Vert \) is also bounded by an inverse power of N. The denominator in (2.10) is \(\gtrsim 1\) since \(\Vert {\textbf{u}}(x) \Vert = 1\) and the overlaps \(\langle {\textbf{u}}(x), {\textbf{u}}(y) \rangle \) are o(1) for all \(y \ne x\). Moreover, \(H-\lambda (x)\) applied to the numerator of (2.10) is bounded by a negative power of N, since \(\Vert (H-\lambda (x)){\textbf{u}}(x) \Vert \leqslant N^{\mu - 1/2 + o(1)}\) as shown above, and \(\Vert \overline{\Pi } \!\,_{< x} H {\Pi }_{<x} \Vert \leqslant N^{\mu - 1/2 + o(1)}\), where \(\overline{\Pi } \!\,_{<x} := 1 - \Pi _{<x}\). The latter bound is proved using the above estimate on \(\Vert (H -\lambda (x))\textbf{u}(x) \Vert \). Given this construction and the bounds explained above, we extend \(({\textbf{u}}^\perp (x))_{x \in {\mathcal {W}}}\) to an orthonormal basis and choose these vectors as columns of U. In particular, the first \(n = |{\mathcal {W}} |\) columns of U are given by \(({\textbf{u}}^\perp (x))_{x \in {\mathcal {W}}}\).
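
The stability of the Gram-Schmidt step (2.10) under small overlaps can be illustrated numerically. The sketch below uses synthetic random unit vectors (whose pairwise overlaps are of order \(N^{-1/2}\)) in place of the actual profile vectors; it is only meant to show that the normalizing denominator in (2.10) stays of order one in this regime.

```python
import numpy as np

# Illustration of (2.10): Gram-Schmidt orthonormalization of nearly orthogonal
# unit vectors.  The inputs are synthetic random vectors, not profile vectors.
rng = np.random.default_rng(1)
N, n = 500, 20
U = rng.standard_normal((N, n))
U /= np.linalg.norm(U, axis=0)          # unit vectors with overlaps ~ N^{-1/2}

orth, denominators = [], []
for j in range(n):
    u = U[:, j].copy()
    for q in orth:                      # subtract the projection onto earlier vectors
        u -= np.dot(q, U[:, j]) * q
    denominators.append(np.linalg.norm(u))
    orth.append(u / np.linalg.norm(u))

print("smallest denominator in (2.10):", round(min(denominators), 3))
print("largest pairwise overlap:", round(np.abs(np.triu(U.T @ U, 1)).max(), 3))
```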

What remains is to show that \({{\,\textrm{spec}\,}}(X)\) is separated from I. To that end, we decompose its domain of definition into the span of \(\textbf{w}_1\), the eigenvector of H associated with its largest eigenvalue, and its orthogonal complement. This largest eigenvalue and the overlaps between \({\textbf{w}}_1\) and \({\textbf{u}}^\perp (x)\) for all \(x \in {\mathcal {W}}\) can be controlled relatively precisely, and we omit \({\textbf{w}}_1\) from the remaining explanations below. It suffices to show that \(\lambda _1(X) \leqslant \Lambda (\alpha ^*) + o(1)\), by the definitions of \({\mathcal {V}}\) and \({\mathcal {W}}\). This upper bound on \(\lambda _1(X)\) is equivalent to

$$\begin{aligned} \lambda _1(\overline{\Pi } \!\,_{{\mathcal {W}}} H \overline{\Pi } \!\,_{{\mathcal {W}}}) \leqslant \Lambda (\alpha ^*) + o(1), \end{aligned}$$
(2.11)

where \(\overline{\Pi } \!\,_{{\mathcal {W}}} := 1 - \Pi _{{\mathcal {W}}}\) and \(\Pi _{{\mathcal {W}}}\) is the orthogonal projection onto \({\text {span}}\{ {\textbf{u}}^\perp (x) :x \in {\mathcal {W}}\}\). An estimate of the form (2.11) was first derived in [12], except that there the projection was defined in terms of local profile vectors. Since we are using the global profile vectors \({\textbf{u}}^\perp (x)\), in our case this estimate is considerably more involved. The rough strategy is to make a link between (2.11) and a corresponding estimate for local profile vectors, which was already established in [12] using bounds on the non-backtracking matrix of H, an Ihara-Bass type identity, and a local delocalization result for approximate eigenvectors.

To that end, let Q be the orthogonal projection onto the complement of \(\bigcup _{y \in {\mathcal {V}} {\setminus } {\mathcal {W}}} B_{2r_\star - 1}(y)\), where \(r_\star \asymp \sqrt{\log N}\) is chosen as in [12, eq. (1.8)]. In particular, the local profile vectors \({\textbf{v}}(x)\) from [12] satisfy \({{\,\textrm{supp}\,}}{\textbf{v}}(x)\subset B_{r_\star }(x)\) for every \(x \in {\mathcal {V}}\). We denote by \(\Pi _Q\) the projection onto \({\text {span}}\{ Q {\textbf{u}}^\perp (x) :x \in {\mathcal {W}}\}\). If we obtain a small enough upper bound on \(\Vert \Pi _{{\mathcal {W}}} - \Pi _{Q} \Vert \) then it suffices to show \(\lambda _1(\overline{\Pi } \!\,_Q H \overline{\Pi } \!\,_Q) \leqslant \Lambda (\alpha ^*) + o(1)\), where \(\overline{\Pi } \!\,_Q := 1 - \Pi _Q\). To get a sufficient estimate for \(\Vert \Pi _{{\mathcal {W}}} - \Pi _{Q} \Vert \), we need that \({\textbf{u}}^\perp (x)|_{B_r(y)}\) is polynomially small in N for \(x \ne y\), as a summation over \(x,y \in {\mathcal {W}}\) is required. This is achieved through an argument motivated by (2.9). Let \(\Pi _{{\textbf{v}}}\) be the orthogonal projection onto \({\text {span}}\{ {\textbf{v}}(x) :x \in {\mathcal {V}} {\setminus } \mathcal W\}\). By definition of Q and \({{\,\textrm{supp}\,}}{\textbf{v}}(x) \subset B_{r_\star }(x)\), \(\Pi _Q\) and \(\Pi _{{\textbf{v}}}\) commute. Thus, \(\lambda _1(\overline{\Pi } \!\,_Q H \overline{\Pi } \!\,_Q) \leqslant \max \{\lambda _1(\Pi _{{\textbf{v}}} H \Pi _{{\textbf{v}}}), \lambda _1(\overline{\Pi } \!\,_Q \overline{\Pi } \!\,_{{\textbf{v}}} H\overline{\Pi } \!\,_{\textbf{v}}\overline{\Pi } \!\,_Q ) \} + o(1)\), where \(\overline{\Pi } \!\,_{{\textbf{v}}} := 1 - \Pi _{{\textbf{v}}}\). By [12], \(\lambda _1(\Pi _{{\textbf{v}}} H \Pi _{{\textbf{v}}}) \leqslant \Lambda (\alpha ^*) + o(1)\). For eigenvectors of H which are orthogonal to the local profile vectors \({\textbf{v}}(x)\) and whose associated eigenvalues are large enough, we obtain a weak delocalization estimate by following an argument in [12]. This weak delocalization estimate shows that \(\lambda _1(\overline{\Pi } \!\,_Q \overline{\Pi } \!\,_{{\textbf{v}}} H\overline{\Pi } \!\,_{{\textbf{v}}}\overline{\Pi } \!\,_Q ) \leqslant \Lambda (\alpha ^*) + o(1)\) if \(\lambda _1(\overline{\Pi } \!\,_{{\mathcal {W}}} H^{({\mathcal {V}}{\setminus } {\mathcal {W}})} \overline{\Pi } \!\,_{{\mathcal {W}}}) \leqslant \Lambda (\alpha ^*) + o(1)\). The last bound is finally obtained by a careful analysis of the spectrum of \(H^{(\mathcal V{\setminus } {\mathcal {W}})}\), which is based on viewing it as a perturbation of \({\Pi }_{{\mathcal {W}}} H^{({\mathcal {V}}{\setminus } {\mathcal {W}})} {\Pi }_{{\mathcal {W}}} + \overline{\Pi } \!\,_{{\mathcal {W}}} H^{({\mathcal {V}}{\setminus } {\mathcal {W}})} \overline{\Pi } \!\,_{{\mathcal {W}}}\) and analysing \(H^{({\mathcal {V}}{\setminus } {\mathcal {W}})}\) on \({{\,\textrm{ran}\,}}\Pi _{{\mathcal {W}}}\) in detail.

2.3.3 Eigenvalue spacing

We now sketch how to prove (iii). To that end, we fix \(a \ne b \in {\mathcal {W}}\). To prove that \(\lambda (a)\) and \(\lambda (b)\) are not too close to each other, we choose an appropriate radius r on the scale \(\frac{\log N}{\log d}\) of the diameter of \({\mathbb {G}}\). Then we fix the two subgraphs \({\mathbb {G}}|_{B_{r}(b)}\) and \({\mathbb {G}}|_{B_{r}(b)^{c}}\) and show that resampling the edges between \(S_r(b)\) and \(B_r(b)^c\) results in a substantial change of \(\lambda (b)\) while \(\lambda (a)\) remains almost unchanged: we establish simultaneous anticoncentration for \(\lambda (b)\) and concentration for \(\lambda (a)\), which yields anticoncentration for their difference. The edges between \(S_r(b)\) and \(B_r(b)^c\) form an independent family of Bernoulli random variables by definition of the Erdős-Rényi graph.

On a more formal level, we work conditionally on the \(\sigma \)-algebra \({\mathcal {F}}\) generated by the two subgraphs \({\mathbb {G}}|_{B_{r}(b)}\) and \({\mathbb {G}}|_{B_{r}(b)^{c}}\), and prove the following two statements in order to obtain a lower bound on \(|\lambda (a)-\lambda (b)|\).

  1. (a)

    \(\lambda (a)\) fluctuates little under resampling of the edges between \(S_{r}(b)\) and \(B_{r}(b)^{c}\), i.e. the concentration estimate

    $$\begin{aligned} {\mathbb {P}}(|\lambda (a)-z|\ll N^{-\eta } | {\mathcal {F}})\geqslant 1-N^{-\eta /2+o(1)} \end{aligned}$$

    holds if z is the second largest eigenvalue of \(H^{({\mathcal {V}} \cup B_r(b){\setminus } \{a\})}\), which is \({\mathcal {F}}\)-measurable.

  2. (b)

    \(\lambda (b)\) fluctuates a lot under resampling of the edges between \(S_{r}(b)\) and \(B_{r}(b)^{c}\), i.e. the anticoncentration estimate

    $$\begin{aligned} {\mathbb {P}}(|\lambda (b)-z|\geqslant N^{-\eta }| {\mathcal {F}})\geqslant 1- N^{-\eta /2+o(1)} \end{aligned}$$

    holds for any \({\mathcal {F}}\)-measurable spectral parameter z in I (see the definition of I after (2.6b)).

We justify (a) by replacing \({\textbf{u}}(a)\) with an \({\mathcal {F}}\)-measurable version. This allows for a use of (2.9) in a similar fashion as in the first part of Sect. 2.3.2 and reveals that, conditionally on \({\mathcal {F}}\), \(\lambda (a)\) is concentrated around the second largest eigenvalue of \(H^{({\mathcal {V}} \cup B_r(b){\setminus } \{ a\})}\) since \(|B_r(b) |\) is not too large due to our choice of r.

The proof of (b) is much more elaborate. We start by noting that \(\lambda (b)\) is characterized by the equation

$$\begin{aligned} \lambda (b) + \frac{1}{d} \sum _{x,y \in S_1(b)} (H^{({\mathcal {V}})} - \lambda (b))^{-1}_{xy} = 0, \end{aligned}$$
(2.12)

as follows from Schur’s complement formula. The main strategy is to derive a recursive family of equations for the Green function, starting from (2.12) and extending to increasingly large spheres around b, to which Kesten’s self-improving anticoncentration result can be applied. To quantify anticoncentration, we use Lévy’s concentration function

$$\begin{aligned} Q(X,L) := \sup _{t \in {\mathbb {R}}} {\mathbb {P}}(t \leqslant X \leqslant t + L), \end{aligned}$$
(2.13)

where X is a random variable and \(L >0\) is deterministic.

Proposition 2.8

(Theorem 2 of [57]). There exists a universal constant K such that for any independent random variables \(X_1,\ldots , X_n\) satisfying \(Q(X_i,L) \leqslant 1/2\) we have

$$\begin{aligned} Q\bigg (\sum _{i \in [n]} X_i,L\bigg ) \leqslant \frac{K}{\sqrt{n}} \max _{i \in [n]} Q(X_i,L). \end{aligned}$$
(2.14)

This result is an improvement due to Kesten [57] of a classical anticoncentration result of Doeblin, Lévy, Kolmogorov, and Rogozin. Kesten’s insight was that such an estimate can be made self-improving, as manifested by the factor \(\max _{i \in [n]} Q(X_i,L)\) on the right-hand side. This factor is crucial for our argument, as it allows us to successively improve the upper bound on Q.
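
To make the roles of (2.13) and (2.14) concrete, here is a toy simulation estimating the concentration function of a sum of independent Bernoulli variables; the \(n^{-1/2}\) decay visible in the output is exactly the gain that Kesten's bound provides. The distribution of the summands is an arbitrary illustrative choice.

```python
import numpy as np

# Empirical Levy concentration function Q(X, L) = sup_t P(t <= X <= t + L),
# estimated from samples, for a sum of n independent Bernoulli(1/2) variables.
# Q(X_i, 1/2) = 1/2 for a single summand, and the sum exhibits the ~ n^{-1/2}
# decay promised by (2.14).  This is an illustration, not part of the proof.
rng = np.random.default_rng(2)

def Q_empirical(samples, L, grid=400):
    ts = np.linspace(samples.min() - L, samples.max(), grid)
    return max(np.mean((samples >= t) & (samples <= t + L)) for t in ts)

L = 0.5
for n in [1, 4, 16, 64]:
    X = rng.integers(0, 2, size=(100000, n)).sum(axis=1).astype(float)
    print(f"n = {n:3d}:  Q(sum, L) ~ {Q_empirical(X, L):.3f}   1/sqrt(n) = {1/np.sqrt(n):.3f}")
```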

We now explain more precisely how the expression of \(\lambda (b)\) in terms of a large number of Green function entries is obtained. We shall tacitly use that \({\mathbb {G}}|_{B_{r}(b)}\) is a tree, which can be easily shown to be true with high probability. Applying Schur’s complement formula at \(x\in S_i(b)\), using standard resolvent identities, and arguing similarly as in (2.9) to control errors yields

$$\begin{aligned} \frac{1}{G_{xx}(i-1,z)}= - z-\frac{1}{d}\sum _{y\in S_{1}^+(x)}G_{yy}(i,z)+o(1), \end{aligned}$$
(2.15)

where \(G(j,z) := (H^{({\mathcal {V}} \cup B_j(b))} - z)^{-1}\) and \(S_1^+(x) = S_1(x) \cap S_{i+1}(b)\) is the set of children of x in the tree \({\mathbb {G}}|_{B_r(b)}\) rooted at b. The error o(1) is polynomially small in N; it comprises error terms arising from removing vertices from H and neglecting all off-diagonal Green function entries.

Setting the error term in (2.15) to zero, we obtain a recursive equation for the idealized Green function entries \((g_x(z))_{x \in B_r(b){\setminus } \{b\}}\), given by

$$\begin{aligned} g_x(z) := {\left\{ \begin{array}{ll} \Big (-z - \frac{1}{d}\sum _{y\in S_{1}^+(x)} G_{yy}(r,z) \Big )^{-1} &{} \text {if } x \in S_r(b), \\ \Big (-z - \frac{1}{d}\sum _{y\in S_{1}^+(x)} g_{y}(z) \Big )^{-1} &{} \text {if } x \in S_i(b), \ 1 \leqslant i \leqslant r-1, \end{array}\right. } \end{aligned}$$
(2.16)

which is an approximate version of the recursion (2.15) for the actual Green function entries. The recursion begins at the boundary of the ball \(B_r(b)\) and propagates inwards. We note that, for any \(1 \leqslant i \leqslant r\), conditioned on \({\mathcal {F}}\), the family \((g_x(z))_{x \in S_i(b)}\) is independent if \({\mathbb {G}}|_{B_r(b)}\) is a tree and z is \({\mathcal {F}}\)-measurable. From (2.15), (2.16) and \(r \asymp \frac{\log N}{\log d}\), it is not hard to conclude by induction that, for large enough \(\eta >0\), with high probability, \(g_x(z) = G_{xx}(0,z) + o(N^{-\eta }) = ( H^{({\mathcal {V}})}- z)^{-1}_{xx} + o(N^{-\eta })\) for all \(x \in S_1(b)\). Hence, if we can prove that \(Q(g_x(z), N^{-\eta }) \leqslant N^{-c}\) for all \(x \in S_1(b)\) and some constant \(c>0\), then a union bound over \(a \ne b \in {\mathcal {W}}\), \(|{\mathcal {W}} | \asymp N^{\mu }\), the smallness of the off-diagonal entries of \((H^{({\mathcal {V}})} - \lambda (b))^{-1}\) as argued after (2.15), and (2.12) imply anticoncentration for \(\lambda (b) - z\). This is (b), which together with (a) implies that \(\min _{a\ne b \in {\mathcal {W}}} |\lambda (a) - \lambda (b) | \geqslant N^{-\eta }\) with high probability, i.e. (iii).
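
As an illustration of the inward recursion, the following toy sketch propagates idealized Green function entries from the boundary sphere to a vertex in \(S_1(b)\) on a Galton-Watson tree, assuming the recursion takes the form displayed in (2.16); the boundary values and the offspring law are crude stand-ins, so the output only indicates how resampling the tree makes the value attached to \(S_1(b)\) fluctuate.

```python
import numpy as np

# Toy version of the inward recursion (2.16):
#     g_x(z) = -(z + (1/d) * sum over children y of g_y(z))^{-1},
# run on a Galton-Watson tree with Poisson(d) offspring and constant boundary
# values at depth r.  The spread of the resulting samples is a crude proxy for
# the anticoncentration used in step (b).
rng = np.random.default_rng(3)
d, r, z = 6.0, 4, 3.0

def g_at_S1(rng):
    def recurse(depth):
        if depth == r:
            return -1.0 / z                      # stand-in for G_yy(r, z)
        k = rng.poisson(d)                       # number of children
        return -1.0 / (z + sum(recurse(depth + 1) for _ in range(k)) / d)
    return recurse(1)                            # value at a vertex in S_1(b)

samples = np.array([g_at_S1(rng) for _ in range(2000)])
print("mean:", round(samples.mean(), 4), " std:", round(samples.std(), 4))
print("fraction within a window of width 0.01 around the mean:",
      round(np.mean(np.abs(samples - samples.mean()) < 0.005), 3))
```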

Therefore, to complete the sketch of (iii), what remains is to prove \(Q(g_x(z), N^{-\eta }) \leqslant N^{-c}\) for all \(x \in S_1(b)\), whose proof we sketch now. Throughout the entire argument we condition on \({\mathcal {F}}\) and use that \({\mathbb {G}}|_{B_r(x)}\) is a tree. To begin the recursion, we first show that \(Q(g_x, d^{-1}) \leqslant 1/2\) for any \(x \in S_r(b)\). This follows from the first case of (2.16), using a weak lower bound on the entries \(G_{yy}(r,z)\) and anticoncentration from the fact that the size of \(S_1^+(x)\) is a binomial random variable conditioned on \({\mathcal {F}}\).

Next, let \(x \in S_i(b)\) for \(1 \leqslant i \leqslant r - 1\). By using the second case in (2.16) and Proposition 2.8, we iteratively refine the resolution, i.e. decrease the second argument of Q, and decrease the upper bound on Q, which are \(d^{-1}\) and 1/2, respectively, at the starting point. Indeed, conditioning on \(A |_{B_i(b)}\), the second case in (2.16) and rescaling the second argument of Q yield

$$\begin{aligned} Q\big (g_x(z), (T^2d)^{-r + i} \big )&\leqslant Q\bigg ( \sum _{y \in S_1^+(x)} g_y(z), (T^2d)^{-r + i +1}\bigg ) \nonumber \\&\leqslant \frac{K}{\sqrt{|S_1^+(x) |}} \max _{y \in S_1^+(x)} Q \big ( g_y(z), (T^2d)^{-r + i + 1} \big ), \end{aligned}$$
(2.17)

where we applied Proposition 2.8 using the independence of \((g_y(z))_{y \in S_1^+(x)}\) in the second step. Here, we also used that \(Q(f(X), L) \leqslant Q (X, T^{-2}L)\) if \(X \in [T^{-1},T]\), where f denotes the function implicit in the second case of (2.16), since the derivative of f is bounded from below by \(T^{-2}\) on this interval, and that, with high probability, g lies in \([T^{-1}, T]\) for \(T \asymp \sqrt{\frac{\log N}{d}}\).

The estimate (2.17) yields the desired self-improvement provided that \(|S_1^+(x) |\) is large enough. However, \(|S_1^+(x) |\) is not large enough for all vertices x in \(B_r(b)\) (and in fact consistently applying (2.17) at all vertices yields an anticoncentration bound at the root b that is far from optimal and too weak to conclude localization). Sometimes, a better bound than (2.17) can be obtained by replacing Kesten’s estimate (2.14) with the trivial estimate

$$\begin{aligned} Q\bigg (\sum _{i \in [n]} X_i,L\bigg ) \leqslant \min _{i \in [n]} Q(X_i,L), \end{aligned}$$
(2.18)

which follows immediately from the independence of the random variables \(X_i\). Although this estimate lacks the factor \(K / \sqrt{n}\) from (2.14), it replaces the maximum with a minimum. Thus, an important ingredient in our recursive self-improving anticoncentration argument is an algorithm that determines which of (2.14) or (2.18) is to be used at any given vertex \(x \in B_r(b)\). It relies on the notion of robust vertices.

Recursively, a vertex \(x \in B_{r}(b)\) is called robust if \(x \in S_r(b)\) or \(S_1^+(x)\) contains at least d/2 robust vertices. We denote the set of robust vertices by \({\mathcal {R}}\). An important auxiliary result is that the root b is robust with high probability, which in particular implies that \(S_i(b) \cap {\mathcal {R}}\) is large for any \(i \leqslant r\). Therefore, we can restrict to \(x \in S_i(b) \cap {\mathcal {R}}\) and proceed similarly as in (2.17) to obtain

$$\begin{aligned} Q(g_x(z), (T^2d)^{-r +i })\leqslant & {} Q\bigg ( \sum _{y \in S_1^+(x)\cap {\mathcal {R}}} g_y(z), (T^2d)^{-r + i + 1}\bigg ) \\\leqslant & {} \frac{K\sqrt{2}}{\sqrt{d}} \max _{y \in S_1^+(x)\cap {\mathcal {R}}} Q \big ( g_y(z), (T^2d)^{-r + i + 1} \big ), \end{aligned}$$

where in the first step we used (2.18). Thus, we obtain \( Q(g_x(z), (T^2d)^{-r}) \leqslant (K \sqrt{2d^{-1}})^{r-1} \) for all \(x \in S_1(b)\) and, therefore, by choosing r such that \((T^2d)^{r+1} = N^{\eta }\), we arrive at \(Q(g_x(z), N^{-\eta }) \leqslant N^{-\eta /2 + o(1)}\) for all \(x \in S_1(b)\). This is the desired anticoncentration bound, (b), which, as explained above, implies (iii).
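
The recursive notion of robust vertices is algorithmically simple; the following sketch marks robust vertices bottom-up on a synthetic Galton-Watson tree (the offspring law is an arbitrary stand-in for the actual neighbourhood of b).

```python
import numpy as np

# Bottom-up computation of robust vertices: a vertex is robust if it lies on
# the boundary sphere S_r(b) or has at least d/2 robust children.  The tree is
# a synthetic Galton-Watson tree, used only for illustration.
rng = np.random.default_rng(4)
d, r = 8, 4

def build(depth):
    # a node is represented by the list of its children
    if depth == r:
        return []
    return [build(depth + 1) for _ in range(rng.poisson(d))]

def robust(node, depth):
    if depth == r:                       # vertices in S_r(b) are robust by definition
        return True
    flags = [robust(child, depth + 1) for child in node]
    return sum(flags) >= d / 2

tree = build(0)                          # the root plays the role of b
print("root is robust:", robust(tree, 0))
```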

2.3.4 The three main exponents of N

Throughout this paper we use the three exponents \(\mu \), \(\zeta \), \(\eta >0\) to control three central quantities of the argument. We summarize their roles here for easy reference.

  • \(N^\mu \) is the typical size of the vertex sets \({\mathcal {V}}\) and \({\mathcal {W}}\) (cf. Proposition 3.2 (i)), as the parameter \(\mu \) is introduced to control \(\alpha ^*\) (see (1.6)). Consequently, \(N^\mu \) is also the typical number of eigenvalues of H satisfying (1.7). Therefore, the factor \(N^\mu \) emerges from union bounds when a property is required for all \(x \in {\mathcal {V}}\) or \(x \in {\mathcal {W}}\).

  • \(N^{-\zeta }\) is the upper bound on the eigenvalue approximation that we establish in Proposition 2.3, i.e. on the distance between \(\lambda (x)\) and the eigenvalues of H or, more precisely, the upper bound on \(\Vert (H-\lambda (x)){\textbf{u}}(x) \Vert \). Proposition 2.3 requires the condition \(\zeta < 1/2-3\mu /2\).

  • \(N^{-\eta }\) is the lower bound on the eigenvalue spacing, or correspondingly \(\min _{x \ne y \in {\mathcal {W}}} |\lambda (x)-\lambda (y) |\), that we prove in Proposition 2.4. Since the typical eigenvalue spacing is \(N^{-\mu }\) in the interval we consider, we clearly need \(\mu < \eta \). In fact, our proof of Proposition 2.4 requires the stronger condition \(8\mu < \eta \) for technical reasons as well as \(\mu <1/24\).

To apply perturbation theory in the proof of Theorem 1.1, we need that the error in the eigenvalue approximation \(N^{-\zeta }\) be smaller than the eigenvalue spacing \(N^{-\eta }\). This means that \(\eta < \zeta \).

2.4 Localization profile: proof of Theorem 1.6

Theorem 1.6 is an immediate consequence of the following result.

Proposition 2.9

(Local approximation for \({\textbf{u}}(x)\)). Let \(\mu \in [0,1/3)\). Then, with high probability, the following holds for all \(x\in {\mathcal {W}}\). If \(r \in {\mathbb {N}}\) satisfies

$$\begin{aligned} \log d \ll r \leqslant \min \bigg \{ \frac{1}{6} \frac{\log N}{\log d}, \min \bigg \{ \frac{1}{5} - \frac{\mu }{4}, \frac{1}{3} - \mu \bigg \} \frac{\log N}{\log d} - 2 \bigg \}, \end{aligned}$$
(2.19)

then

  1. (i)

    \({\textbf{u}}(x) = {\textbf{v}}_r(x)+o(1)\),

  2. (ii)

    \({\textbf{u}}(x) = {\textbf{w}}_r(x) + o(1)\).

Part (i) is proved in Sect. 3.2 below. Part (ii) follows from Corollary 4.2 below, since \(\Lambda (\alpha _x) \geqslant 2\).

Proof of Theorem 1.6

From (2.3) in the proof of Theorem 1.1, we know that \({\textbf{w}} = {\textbf{u}}(x) + O(N^{-\varepsilon })\) for a unique \(x \in \mathcal W\) and some small enough \(\varepsilon >0\). Therefore, Theorem 1.6 follows from Proposition 2.9. \(\square \)

2.5 Mobility edge: proof of Theorem 1.7

Part (i) follows from Theorem 1.1, whose assumptions are satisfied for \(d \geqslant \frac{23}{24} b_* \log N\) by Lemma D.1 below. Part (ii) was proved in [14, Theorem 1.1 (ii)].

2.6 Localization length: proof of Theorem 1.8

For any eigenvector \({\textbf{w}}\) we introduce the function

$$\begin{aligned} q(u) := \sum _{y \in [N]} {\textrm{d}}(u,y) \, w_y^2, \end{aligned}$$

so that \(\ell ({\textbf{w}}) = \min _u q(u)\).

For the localized phase, suppose that \(\lambda \) is an eigenvalue satisfying \(|\lambda | \geqslant 2 + \kappa \) with associated eigenvector \({\textbf{w}}\). Denote by x the unique vertex associated with \({\textbf{w}}\) from Theorems 1.1 and 1.6. Then the following estimates hold on the intersection of the high-probability events of these two theorems. By Theorem 1.1, there exists a constant \(R \equiv R_\kappa \) such that

(2.20)

where the third inequality holds for large enough constant R. Moreover, for any \(\varepsilon > 0\) there exists a constant \(R' \in {\mathbb {N}}\) such that for \(u \in B_R(x)\) we have

(2.21)

where the second step follows from the estimate \({\textrm{d}}(u,y) \leqslant R + {\textrm{d}}(x,y)\) and the exponential decay from Theorem 1.1, and the third step from Theorem 1.6 with some \(\log d \ll r \ll \frac{\log N}{\log d}\).

To analyse the sum, we abbreviate \(k := {\textrm{d}}(u,x)\) and introduce the set \(T_i(u,x) \subset S_i(x)\) for \(0 \leqslant i \leqslant r\), which is the set of vertices in \(S_i(x)\) whose geodesic to x does not pass through u. By Proposition 3.2 (ii) below, the graph \({\mathbb {G}} \vert _{B_r(x)}\) is a tree, which implies

$$\begin{aligned} {\textrm{d}}(u,y) = {\left\{ \begin{array}{ll} k + i &{} \text {if } y \in T_i(u,x) \\ |k - i | &{} \text {if } y \in S_i(x) {\setminus } T_i(u,x). \end{array}\right. } \end{aligned}$$
(2.22)

Next, we estimate \(|S_i(x) {\setminus } T_i(u,x) |\). For \(1 \leqslant i \leqslant k\), the set \(S_i(x) {\setminus } T_i(u,x)\) consists of the unique vertex on the geodesic from x to u at distance i from x. For \(i > k\), we have \(S_i(x) {\setminus } T_i(u,x) = S_i(x) \cap S_{i - k}(u)\). Here we used the tree structure of \({\mathbb {G}} \vert _{B_r(x)}\). Hence, we conclude that for \(i > k\) we have \(|S_i(x) {\setminus } T_i(u,x) | \leqslant |S_{i - k}(u) | \lesssim (\log N) d^{i - k - 1}\), by Proposition 3.1.

Next, by [12, Lemma 5.4], with high probability we have \(|S_i(x) | = |S_1(x) | d^{i - 1} (1 + o(1))\) for \(1 \leqslant i \leqslant r\). Since \(|S_1(x) | \geqslant \alpha ^* d \geqslant 2d\), we conclude that \(|S_i(x) | \geqslant d^i\) for \(i \leqslant r\). Putting all of these estimates together, we conclude for all \(1 \leqslant i \leqslant r\) that

$$\begin{aligned} \frac{|S_i(x) {\setminus } T_i(u,x) |}{|S_i(x) |} \lesssim {\left\{ \begin{array}{ll} d^{-i} &{} \text {if } 1 \leqslant i \leqslant k \\ (\log N) d^{-k-1} &{} \text {if } k < i \leqslant r. \end{array}\right. } \end{aligned}$$

Since \(T_i(u,x) = S_i(x)\) for \(k = 0\) or \(i = 0\), using the condition (1.5) we conclude that

$$\begin{aligned} \frac{|S_i(x) {\setminus } T_i(u,x) |}{|S_i(x) |} = o(1) \end{aligned}$$
(2.23)

for all \(u \in B_R(x)\) and \(0 \leqslant i \leqslant r\).

Next, using (2.22) and recalling the definition (1.11), we write the above sum as

where in the last step we used (2.23), the fact that \(R'\) is constant, and \(\sum _{i = 0}^{r-1} u_i(\alpha _x)^2 = 1\). The latter sum is clearly minimized for \(k = 0\). Recalling (2.21), we therefore conclude that for any \(u \in B_{R}(x)\) we have

$$\begin{aligned} q(u) \geqslant q(x) + O(\varepsilon ) + o(1), \quad q(x) = \sum _{i = 1}^\infty i u_i(\alpha _x)^2 + O(\varepsilon ) + o(1) \end{aligned}$$

for large enough \(R'\) depending on \(\varepsilon \). Since \(\varepsilon > 0\) was arbitrary, and recalling (2.20), by taking the minimum over \(u \in [N]\), it therefore suffices to show that

$$\begin{aligned} \sum _{i = 1}^\infty i u_i(\alpha _x)^2 = \frac{\alpha _x}{2(\alpha _x - 2)} + o(1) = \frac{|\lambda |}{2 \sqrt{\lambda ^2 - 4}} + o(1). \end{aligned}$$
(2.24)

The first equality of (2.24) is an elementary computation using the definition (1.10), recalling the normalization \(\sum _{i = 0}^{r - 1} u_i(\alpha )^2 = 1\) and that \(r \gg 1\). The second equality of (2.24) follows from the estimate \(|\lambda | = \Lambda (\alpha _x) + o(1)\) by Remark 1.2, which can be inverted to obtain \(\frac{1}{\alpha _x} = \frac{1}{2}\bigl (1 - \frac{\sqrt{\lambda ^2 - 4}}{|\lambda |}\bigr ) + o(1)\).
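
For orientation, here is the elementary computation behind the first equality of (2.24), under the simplifying assumption (consistent with Corollary 3.7 and the decay rate in Proposition 4.3, but not a verbatim reproduction of (1.10)) that in the limit \(r \rightarrow \infty \) the profile weights satisfy \(u_0(\alpha )^2 = \frac{\alpha - 2}{2(\alpha - 1)}\) and \(u_i(\alpha )^2 = A (\alpha - 1)^{-i}\) for \(i \geqslant 1\). The normalization determines A via

$$\begin{aligned} 1 = u_0(\alpha )^2 + A \sum _{i \geqslant 1} (\alpha - 1)^{-i} = \frac{\alpha - 2}{2(\alpha - 1)} + \frac{A}{\alpha - 2}, \qquad \text {so that} \qquad A = \frac{\alpha (\alpha - 2)}{2(\alpha - 1)}, \end{aligned}$$

and therefore

$$\begin{aligned} \sum _{i \geqslant 1} i \, u_i(\alpha )^2 = A \sum _{i \geqslant 1} i (\alpha - 1)^{-i} = \frac{\alpha (\alpha - 2)}{2(\alpha - 1)} \cdot \frac{\alpha - 1}{(\alpha - 2)^2} = \frac{\alpha }{2(\alpha - 2)}. \end{aligned}$$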

For the delocalized phase, suppose that \(\lambda \) is an eigenvalue satisfying \(|\lambda | \leqslant 2 - \kappa \) with associated eigenvector \({\textbf{w}}\). We use [14, Theorem 1.1 (ii)] to deduce that with probability \(1 - O(N^{-10})\) we have \(\Vert \textbf{w}\Vert _\infty ^2 \leqslant N^{-1 + o(1)}\). Hence, with probability \(1 - O(N^{-9})\) we have, for any \(x \in [N]\) and \(r \geqslant 0\),

Next, we deduce from [32, Lemma 1] that for any constant \(\varepsilon > 0\) there is a constant \(\delta > 0\) such that, with high probability, if \(r \leqslant (1 - \varepsilon ) \frac{\log N}{ \log d}\) then \(|B_r(x) | \leqslant N^{1 - \delta }\) for all \(x \in [N]\). Choosing \(r = \lfloor (1 - \varepsilon ) \frac{\log N}{\log d} \rfloor \), we conclude that, for any \(\varepsilon > 0\), with high probability, for all eigenvectors \({\textbf{w}}\) with eigenvalue \(\lambda \) satisfying \(|\lambda | \leqslant 2 - \kappa \), we have \(\ell ({\textbf{w}}) \geqslant (1 - \varepsilon + o(1)) \frac{\log N}{\log d}\). Since \(\varepsilon > 0\) was an arbitrary constant, we conclude the stronger lower bound \(\ell ({\textbf{w}}) \geqslant (1 - o(1)) \frac{\log N}{\log d} = {{\,\textrm{diam}\,}}({\mathbb {G}}) (1 + o(1))\), where the last step follows from [32]. The complementary upper bound \(\ell ({\textbf{w}}) \leqslant {{\,\textrm{diam}\,}}({\mathbb {G}})\) follows by definition of \({{\,\textrm{diam}\,}}({\mathbb {G}})\), since \({\textrm{d}}(x,y) \leqslant {{\,\textrm{diam}\,}}({\mathbb {G}})\) for all \(x,y \in [N]\) (here we used that under our assumption on d, the graph \({\mathbb {G}}\) is with high probability connected). This concludes the proof.

3 Preliminaries

The rest of this paper is devoted to the proofs of Propositions 2.2, 2.3 and 2.4. We begin with a short section that collects some basic properties of the graph \({\mathbb {G}}\) and its spectrum.

3.1 Properties of the graph

In this subsection, we collect some basic local properties of the Erdős-Rényi graph \({\mathbb {G}}\) around vertices in \({\mathcal {V}}\).

Proposition 3.1

Suppose that \(\sqrt{\log N} \ll d \leqslant 3 \log N\). With high probability, the following holds.

  1. (i)

    \(\max _{x\in [N]}|S_1(x)|\leqslant 10 \log N\).

  2. (ii)

    \(|B_i(x)|\lesssim \max \{|S_1(x) |,d\} d^{i-1}\) for all \(x \in [N]\) and all \(i \in {\mathbb {N}}\) with \(i \leqslant \frac{1}{3}\frac{\log N}{\log d}\).

Item (i) is a simple application of Bennett’s inequality and (ii) follows from [12, 13]; a detailed proof is given in [15, Appendix E]. In particular, by the assumption \(d \gg \sqrt{\log N}\), we get from Proposition 3.1 (i) that

$$\begin{aligned} \Lambda (\alpha _x) \lesssim \sqrt{\alpha _x} \lesssim \sqrt{\frac{\log N}{d}} \ll \sqrt{d} \end{aligned}$$
(3.1)

for all \(x \in [N]\) on the high-probability event from Proposition 3.1.

Proposition 3.2

Let \(\mu \in [0,1/3)\) be a constant. Suppose that \(\sqrt{\log N} \ll d \leqslant 3 \log N\). With high probability, for any \(r \in {\mathbb {N}}\) satisfying

$$\begin{aligned} 1\leqslant r \leqslant \min \bigg \{\frac{1}{5}- \frac{\mu }{4},\frac{1}{3}-\mu \bigg \}\frac{\log N}{\log d}, \end{aligned}$$
(3.2)

the following holds.

  1. (i)

    \(|{\mathcal {V}}|\leqslant N^{\mu +o(1)}\).

  2. (ii)

    \({\mathbb {G}}|_{B_{r}(x)}\) is a tree for all \(x \in {\mathcal {V}}\).

  3. (iii)

    \(B_{r}(x)\cap B_{r}(y)=\emptyset \) for all \(x, y\in {{\mathcal {V}}}\) satisfying \(x \ne y\).

  4. (iv)

    Let \(\nu \in [0,1]\). If \(1- \mu - \nu - r \frac{\log d}{\log N} \gtrsim 1\) then \(|S_1(y) | \leqslant \alpha ^*(\nu ) d \) for all \(y \in \bigcup _{x \in {\mathcal {V}}} (B_r(x){\setminus } \{x \})\).

These statements are all consequences of [13, 16]; the details are given in [15, Appendix E].

3.2 Properties of the spectrum

In this subsection we collect basic spectral properties of H and some of its submatrices.

Definition 3.3

Let \({\textbf{w}}_1\) be a normalized eigenvector of H with nonnegative entries associated with the largest eigenvalue \(\lambda _1(H)\) of H.

Note that, with high probability, \({\textbf{w}}_1\) is unique and coincides with the Perron–Frobenius eigenvector of the giant component of \({\mathbb {G}}\).

Proposition 3.4

Suppose (1.5). With high probability the following holds.

  1. (i)

    For any \(\mu \in [0,1]\) we have \(\max \{ \lambda _2(H^{({\mathcal {V}})}), -\lambda _N(H^{({\mathcal {V}})})\} \leqslant \Lambda (\alpha ^*) + o(1)\).

  2. (ii)

    Fix \(\mu \in [0,1/3)\). If \(X \subset \bigcup _{x\in {\mathcal {V}}}B_r(x)\) with \(r \in {\mathbb {N}}\) as in (3.2), then \(\lambda _1(H^{(X)})\) and the corresponding eigenvector \({\textbf{w}}\) of \(H^{(X)}\) satisfy

    $$\begin{aligned} \lambda _1(H^{(X)}) = \sqrt{d}(1 + o(1)), \qquad \qquad \biggl \Vert \textbf{w}- \frac{{\textbf{1}}_{X^c}}{|X^c |^{1/2}} \biggr \Vert = o(1). \end{aligned}$$
  3. (iii)

    Fix \(\mu \in [0,1)\). If \(x \in {\mathcal {V}}\), \(r \in {\mathbb {N}}\) satisfies \(\log d \ll r \leqslant \frac{1}{6} \frac{\log N}{\log d}\) and \(X \subset [N]\) satisfies \(X \cap B_r(x) = \emptyset \), then

    $$\begin{aligned} \Vert (H^{(X)} - \Lambda (\alpha _x)) {\textbf{v}}_r (x) \Vert = o(1), \end{aligned}$$

    where \({\textbf{v}}_r(x)\) was defined in (1.11).

  4. (iv)

    Fix \(\mu \in [0,4/5)\). For any \(r \in {\mathbb {N}}\) satisfying \(r \ll \frac{d}{\log \log N}\), there is a normalized vector \({\textbf{q}}\) with \({{\,\textrm{supp}\,}}{\textbf{q}} \subset \big ( \bigcup _{x \in {\mathcal {V}}} B_{r+1}(x) \big )^c\) such that

    $$\begin{aligned} \Vert (H - \sqrt{d}){\textbf{q}} \Vert \lesssim d^{-1/2}, \qquad \Vert {\textbf{w}}_1 - {\textbf{q}} \Vert \lesssim d^{-1}, \quad \Vert {\textbf{q}} - N^{-1/2}{\textbf{1}}_{[N]} \Vert \lesssim d^{-1/2}. \end{aligned}$$

These results follow essentially from [12, 13, 16]; the detailed proof is presented in Section C below.

Definition 3.5

We denote by \(\Omega \) the intersection of the high-probability events of Propositions 3.1, 3.2 and 3.4.

In particular, on \(\Omega \) the estimate (3.1) holds.

Corollary 3.6

Fix \(\mu \in [0,1/3)\). On \(\Omega \), the following holds for all \(x \in {\mathcal {W}}\) and all \(X \subset [N]\). If \(X \cap B_r(x) = \emptyset \) and \({\mathcal {V}} {{\setminus }} \{x \} \subset X \subset \bigcup _{y\in \mathcal V}B_r(y)\) for some \(r \in {\mathbb {N}}\) satisfying \(\log d \ll r \leqslant \min \big \{ \frac{1}{6}, \frac{1}{5} - \frac{\mu }{4}, \frac{1}{3} - \mu \big \} \frac{\log N}{\log d}\) then

$$\begin{aligned} \lambda _1(H^{(X)}) \!= \!\sqrt{d} ( 1+ o(1)), \qquad \lambda _2(H^{(X)}) \!=\! \Lambda (\alpha _x) \!+\! o(1), \qquad \lambda _3(H^{(X)}) \leqslant \Lambda (\alpha ^*) \!+\! o(1). \end{aligned}$$

Corollary 3.6 with \(X = {\mathcal {V}} {{\setminus }} \{x \}\) directly implies that, on \(\Omega \),

$$\begin{aligned} \lambda (x) = \lambda _2(H^{({\mathcal {V}}{\setminus } \{ x\})}) = \Lambda (\alpha _x) + o(1). \end{aligned}$$
(3.3)

Proof of Corollary 3.6

The statement about \(\lambda _1(H^{(X)}) \) is identical to Proposition 3.4 (ii). By eigenvalue interlacing (Lemma D.3), we have

$$\begin{aligned} \lambda _{3}(H^{(X)})\leqslant \lambda _{3}(H^{({\mathcal V}{\setminus }\{x\})})\leqslant \lambda _{2}(H^{({\mathcal V})})\leqslant \Lambda (\alpha ^{*})+o(1), \end{aligned}$$

where for the last inequality we used Proposition 3.4 (i). Finally, Proposition 3.4 (iii) implies that there exists an eigenvalue of \(H^{(X)}\) at distance o(1) from \(\Lambda (\alpha _x)\). Because of the estimates on \(\lambda _{1}(H^{(X)})\) and \(\lambda _{3}(H^{(X)})\) just proven, and because \(\Lambda (\alpha _x) \geqslant \Lambda (\alpha ^*) + \kappa /2\) for \(x \in {\mathcal {W}}\) (recall (2.2)) as well as \(\Lambda (\alpha _x) \ll \sqrt{d}\) by (3.1), this eigenvalue has to be \(\lambda _{2}(H^{(X)})\). \(\square \)

Proof of Proposition 2.9 (i)

We show that the conclusion of Proposition 2.9 (i) holds on \(\Omega \). The proof uses a spectral gap of \(H^{({\mathcal {V}} {{\setminus }} \{x \})}\) around \(\lambda (x) = \lambda _2(H^{({\mathcal {V}} {{\setminus }} \{x \})})\), that \({\textbf{v}}_r(x)\) is an approximate eigenvector of \(H^{({\mathcal {V}} {\setminus } \{ x\})}\) by Proposition 3.4 (iii), and perturbation theory. Indeed, from Corollary 3.6 with \(X = {\mathcal {V}} {\setminus } \{ x\}\), recalling the definition of \({\mathcal {W}}\) (see (2.2)), we obtain that \(\lambda _2(H^{({\mathcal {V}} {\setminus } \{x \})})\) is separated from the other eigenvalues of \(H^{({\mathcal {V}} {\setminus } \{x \})}\) by a positive constant. Owing to (3.3), Proposition 3.4 (iii) with \(X = {\mathcal {V}} {{\setminus }} \{ x\}\) and Lemma D.2 below imply \(\Vert {\textbf{u}}(x) -{\textbf{v}}_r(x) \Vert =o(1)\), i.e. Proposition 2.9 (i). \(\square \)

We conclude the following corollary from Proposition 2.9 (i) and its proof.

Corollary 3.7

Fix \(\mu \in [0,1/3)\). On \(\Omega \) we have \(\langle {\textbf{1}}_x, {\textbf{u}}(x) \rangle = \sqrt{\frac{\alpha _x - 2}{2(\alpha _x - 1)}} + o(1)\) for all \(x \in {\mathcal {W}}\).

Proof

We first note that for any \(r \rightarrow \infty \) as \(N\rightarrow \infty \), we have \(u_0(\alpha _x) = \sqrt{\frac{\alpha _x - 2}{2(\alpha _x - 1)}} + o(1)\) by (1.10). By Proposition 2.9 (i) and its proof, on \(\Omega \), \(\langle {\textbf{1}}_x, {\textbf{u}}(x) \rangle = \langle {\textbf{1}}_x, {\textbf{v}}_r(x) \rangle + o(1) = u_0(\alpha _x) + o(1) = \sqrt{\frac{\alpha _x - 2}{2(\alpha _x - 1)}} + o(1)\), where in the second step we used the definition (1.11), and in the last step we used that \(\alpha _x \geqslant 2 + \kappa \), so that the sequence \((u_i(\alpha _x))_{i}\) from (1.10) is exponentially decaying in i, uniformly in r. \(\square \)

3.3 Properties of the Green function

In this subsection, we fix \(\mu \in [0,1/3)\). Define

$$\begin{aligned} {\mathcal {J}} = [\Lambda (\alpha ^*) + \kappa / 4, \sqrt{d} /2]. \end{aligned}$$

We shall use that whenever \(z \in {\mathcal {J}}\), all Green functions appearing in our proof are bounded, which is the content of the following result.

Lemma 3.8

Suppose (1.5) and (3.2). On \(\Omega \), for \(z,z' \in {\mathcal {J}}\) and \(X \subset [N]\) satisfying \({\mathcal {V}} \subset X \subset \bigcup _{x \in {\mathcal {V}}} B_r(x)\), we have

$$\begin{aligned} \Vert (H^{(X)} - z)^{-1} \Vert&\leqslant {8} / \kappa , \end{aligned}$$
(3.4)
$$\begin{aligned} \Vert (H^{(X)} - z)^{-1}-(H^{(X)} - z')^{-1} \Vert&\leqslant (8 / \kappa )^2 \, |z-z'|. \end{aligned}$$
(3.5)

Proof of Lemma 3.8

Eigenvalue interlacing (Lemma D.3) and Proposition 3.4 (i) and (ii) imply

$$\begin{aligned} \lambda _2(H^{(X)})\leqslant \lambda _2(H^{({\mathcal {V}})}) \leqslant \Lambda (\alpha ^*)+ \kappa /8 \qquad \text {and} \qquad \lambda _1(H^{(X)})= \sqrt{d}(1+o(1))\geqslant \sqrt{d}/2+ \kappa /8. \end{aligned}$$

Therefore, \(\text {dist}(z,\text {Spec}(H^{(X)}))\geqslant \kappa /8\) for any \(z\in {\mathcal {J}}\), which proves (3.4). The Lipschitz bound (3.5) follows from (3.4) and the resolvent identity. \(\square \)
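
The two bounds of Lemma 3.8 are instances of general resolvent estimates, which can be sanity-checked numerically; the following sketch uses a small random symmetric matrix in place of \(H^{(X)}\) and is purely illustrative.

```python
import numpy as np

# Numerical check of the two resolvent facts behind (3.4)-(3.5): for a
# symmetric M and real z outside spec(M), ||(M - z)^{-1}|| = 1/dist(z, spec(M)),
# and the resolvent identity gives a Lipschitz bound in z.  M is a toy matrix.
rng = np.random.default_rng(6)
n = 80
M = rng.standard_normal((n, n))
M = (M + M.T) / np.sqrt(2 * n)
eigs = np.linalg.eigvalsh(M)
z, zp = eigs.max() + 0.3, eigs.max() + 0.5       # both outside the spectrum

R = np.linalg.inv(M - z * np.eye(n))
Rp = np.linalg.inv(M - zp * np.eye(n))
print("||R(z)||         =", np.linalg.norm(R, 2),
      "  1/dist(z, spec) =", 1 / np.min(np.abs(eigs - z)))
print("||R(z) - R(z')|| =", np.linalg.norm(R - Rp, 2),
      "  bound =", np.linalg.norm(R, 2) * np.linalg.norm(Rp, 2) * abs(z - zp))
```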

Lemma 3.9

Suppose (1.5) and (3.2). On \(\Omega \), for any \({\mathcal {V}} \subset X\subset \bigcup _{x\in {\mathcal {V}}}B_r(x)\), \(z\in {\mathcal {J}}\), and \(y\notin X\), we have

$$\begin{aligned}-( H^{(X)}-z)^{-1}_{yy}\geqslant (3z)^{-1}.\end{aligned}$$

Proof of Lemma 3.9

Denoting by \(\lambda _1\geqslant \lambda _2\geqslant \dots \) and \({\textbf{w}}_1,\textbf{w}_2,\dots \) the eigenvalues and eigenvectors of \(H^{(X)}\), respectively, we have

$$\begin{aligned} -( H^{(X)}-z)^{-1}_{yy} = \sum _{i} \frac{\langle {\textbf{1}}_y, {\textbf{w}}_i \rangle ^2}{z - \lambda _i} \geqslant \frac{1 - \langle {\textbf{1}}_y, {\textbf{w}}_1 \rangle ^2}{z - \lambda _N} - \frac{\langle {\textbf{1}}_y, {\textbf{w}}_1 \rangle ^2}{\lambda _1 - z}, \end{aligned}$$
(3.6)

where in the second step we used \(z - \lambda _N \geqslant z- \lambda _i > 0\) for all \(i \geqslant 2\), which follows from eigenvalue interlacing (Lemma D.3), Proposition 3.4 (i), and the condition \(z \in {\mathcal {J}}\).

To estimate the right-hand side of (3.6), we use Proposition 3.2 (i) and Proposition 3.1 (ii) as well as (3.2) to estimate \(|X | \leqslant \sum _{x \in {\mathcal {V}}} |B_r(x) | \leqslant N^{1/3 + o(1)}\) on \(\Omega \). From Proposition 3.4 (ii) we therefore deduce that

$$\begin{aligned} \langle {\textbf{1}}_y, {\textbf{w}}_1 \rangle ^2 \leqslant \big ( |X^c |^{-1/2} + o(1) \big )^2 = o(1). \end{aligned}$$
(3.7)

We conclude that the first term on the right-hand side of (3.6) is bounded from below by \(((2+o(1))z)^{-1}\), as \(z - \lambda _N \leqslant 2z\) from Proposition 3.4 (i). The second term on the right-hand side of (3.6) is estimated using (3.7) as well as \(|\lambda _1 - z | \gtrsim \sqrt{d} \geqslant z\) by Proposition 3.4 (ii) and \(z \in {\mathcal {J}}\). \(\square \)

4 Exponential Decay of \({\textbf{u}}(x)\) and Proof of Proposition 2.2

In this section we establish the exponential decay of \({\textbf{u}}(x)\) around the vertex x. In particular, Proposition 2.2 is a direct consequence of Proposition 4.1 below. Moreover, we prove in Corollary 4.2 below that \({\textbf{u}}(x)\) is well approximated by \({\textbf{w}}_r(x)\), the eigenvector of \(H|_{B_r(x)}\) corresponding to its largest eigenvalue. This implies Proposition 2.9 (ii).

Throughout this section we use the high-probability event \(\Omega \) from Definition 3.5.

4.1 Simple exponential decay of \({\textbf{u}}(x)\)

In this subsection we establish exponential decay at some positive but not optimal rate.

Proposition 4.1

Suppose that (1.5) holds. Then there is a constant \(c \in (0,1)\) such that, for each fixed \(\mu \in [0,1/3)\), on \(\Omega \), for each \(x \in {\mathcal {W}}\) there exists \(q_x > 0\) such that

$$\begin{aligned} \Vert {\textbf{u}}(x)|_{B_i(x)^c} \Vert \lesssim \sqrt{\alpha _x} \, q_x^{i}, \quad q_x = \frac{\Lambda (\alpha ^*(1/2)) + o(1)}{\lambda (x)} \leqslant 1 - c, \end{aligned}$$
(4.1)

for all \(i \in {\mathbb {N}}\) satisfying \(1 \leqslant i \leqslant \min \big \{ \frac{1}{5} - \frac{\mu }{4}, \frac{1}{3} - \mu \big \} \frac{\log N}{\log d}-2\).

Proof

We note that \({{\,\textrm{supp}\,}}{\textbf{u}}(x) \subset ([N] {\setminus } {\mathcal {V}})\cup \{x \}\) and decompose \({\textbf{u}}(x) = Q {\textbf{u}}(x) + \langle {\textbf{1}}_x, {\textbf{u}}(x) \rangle {\textbf{1}}_x\), where Q is the orthogonal projection on the coordinates in \([N] {\setminus } {\mathcal {V}}\). We apply the projection Q to the eigenvalue-eigenvector relation \(H^{({\mathcal {V}} {\setminus } \{x \})} {\textbf{u}}(x) = \lambda (x) {\textbf{u}}(x)\),

solve for \(Q {\textbf{u}}(x)\), and obtain

$$\begin{aligned} Q {\textbf{u}}(x) = \frac{\langle {\textbf{1}}_x, {\textbf{u}}(x) \rangle }{\sqrt{d}} \, \big (\lambda (x) - H^{({\mathcal {V}})}\big )^{-1} {\textbf{1}}_{S_1(x)}, \end{aligned}$$
(4.2)

where we used that \(H^{({\mathcal {V}} {\setminus } \{x \})} {\textbf{1}}_x = \textbf{1}_{S_1(x)}/\sqrt{d}\) and \(Q H^{({\mathcal {V}} {{\setminus }} \{x \})} Q = H^{({\mathcal {V}})}\). We also used that \(\lambda (x) - H^{({\mathcal {V}})}\) is invertible, which can be seen as follows. From Proposition 3.4 (ii) with \(X = {\mathcal {V}}\) and \(r=0\), we conclude \(\lambda _1(H^{({\mathcal {V}})}) = \sqrt{d}(1 + o(1))\) and, hence, (3.3) and Proposition 3.4 (i) yield that \(\lambda (x)\) is not an eigenvalue of \(H^{({\mathcal {V}})}\) and

$$\begin{aligned} {{\,\textrm{dist}\,}}(\lambda (x), {{\,\textrm{spec}\,}}H^{({\mathcal {V}})}) \gtrsim 1. \end{aligned}$$
(4.3)

Let \(Q_i\) be the orthogonal projection onto the coordinates in \([N] {\setminus } ({\mathcal {V}} \cup B_i(x))\). As \({{\,\textrm{supp}\,}}{\textbf{u}}(x) \subset ([N] {\setminus } {\mathcal {V}})\cup \{x \}\) and \(Q_i Q = Q_i\), we conclude from (4.2) that

$$\begin{aligned} {\textbf{u}}(x)|_{B_i(x)^c} = Q_i {\textbf{u}}(x) = \frac{\langle {\textbf{1}}_x, {\textbf{u}}(x) \rangle }{\lambda (x)\sqrt{d}} \, Q_i \bigg (1 - \frac{H^{({\mathcal {V}})}}{\lambda (x)}\bigg )^{-1} {\textbf{1}}_{S_1(x)}. \end{aligned}$$
(4.4)

For any \(n \in {\mathbb {N}}\) we have

$$\begin{aligned} \bigg (1 - \frac{H^{({\mathcal {V}})}}{\lambda (x)} \bigg )^{-1} = \sum _{k=0}^n \bigg (\frac{H^{({\mathcal {V}})}}{\lambda (x)} \bigg )^{k} + \bigg (1 - \frac{H^{({\mathcal {V}})}}{\lambda (x)} \bigg )^{-1} \bigg ( \frac{H^{({\mathcal {V}})}}{\lambda (x)} \bigg )^{n+1}. \end{aligned}$$
(4.5)

Since \(H^{({\mathcal {V}})}\) is a local operator, we conclude that \(Q_i (H^{({\mathcal {V}})})^k {\textbf{1}}_{S_1(x)} = 0\) if \(k + 1 \leqslant i\). Hence, fixing \(i \geqslant 1\) and applying (4.5) with \(n = i - 1\) to (4.4), we get

$$\begin{aligned} \Vert {\textbf{u}}(x)|_{B_i(x)^c} \Vert \lesssim \frac{\Vert (H^{({\mathcal {V}})})^{i} {\textbf{1}}_{S_1(x)} \Vert }{\sqrt{d} \, \lambda (x)^{i}}. \end{aligned}$$
(4.6)

Here, in the last step, we employed \(|\langle {\textbf{1}}_x, {\textbf{u}}(x) \rangle | \leqslant 1\), \(\Vert Q_i \Vert \leqslant 1\), and \(\Vert (\lambda (x) - H^{(\mathcal V)})^{-1} \Vert \lesssim 1\) by (4.3).

Let \(r \in {\mathbb {N}}\) be the largest integer satisfying (3.2). We now claim that, on \(\Omega \),

$$\begin{aligned} \Vert H^{({\mathcal {V}})}{\textbf{v}} \Vert \leqslant (\Lambda (\alpha ^*(1/2)) + o(1))\Vert {\textbf{v}} \Vert \end{aligned}$$
(4.7)

for any \({\textbf{v}}\) such that \({{\,\textrm{supp}\,}}{\textbf{v}} \subset B_r(x)\). Before proving (4.7), we use it to conclude the proof of (4.1).

For \(i \leqslant r - 2\) we have \({{\,\textrm{supp}\,}}(H^{({\mathcal {V}})})^{i} {\textbf{1}}_{S_1(x)} \subset B_r(x)\), so that iterative applications of (4.7) yield

$$\begin{aligned} \Vert (H^{({\mathcal {V}})})^{i} {\textbf{1}}_{S_1(x)} \Vert \lesssim (\Lambda (\alpha ^*(1/2)) + o(1))^{i} \sqrt{\alpha _x} \sqrt{d}. \end{aligned}$$
(4.8)

Hence, the first bound in (4.1) follows from (4.6). We now show the second bound in (4.1). If \(\frac{\log N}{d} \geqslant T\) for a sufficiently large constant T then \(q_x \leqslant 1 - c\) for some constant \(c > 0\) by using Corollary A.3 with \(\nu = 1/2\), \(\lambda (x) \geqslant \Lambda (\alpha ^*(\mu ))\), \(\mu \leqslant 1/3\) and possibly increasing its T. If \(\frac{\log N}{d} < T \lesssim 1\) then \(\alpha _x \lesssim 1\) by Proposition 3.1 (i) and the estimate \(q_x \leqslant 1- c\) for some constant \(c > 0\) follows from \(\Lambda (\alpha _x) \leqslant 2 \sqrt{\alpha _x} \lesssim 1\) and \(\lambda (x) = \Lambda (\alpha _x) + o(1) \geqslant \Lambda (\alpha ^*) + \kappa /4 \geqslant \Lambda (\alpha ^*(1/2)) + \kappa /4\), which is a consequence of (3.3) and the definition of \({\mathcal {W}}\). This completes the proof of the second bound in (4.1).

What remains is the proof of (4.7). For any \({\textbf{v}} \in {\mathbb {R}}^{[N]}\), the Cauchy-Schwarz inequality implies \(\Vert ({\mathbb {E}}H)^{({\mathcal {V}})}{\textbf{v}} \Vert \leqslant \sqrt{\frac{d |{{\,\textrm{supp}\,}}\textbf{v}|}{N}}\Vert {\textbf{v}} \Vert \). By Proposition 3.1 (ii) and (3.2), if \({{\,\textrm{supp}\,}}{\textbf{v}} \subset B_{r}(x)\) then \(|{{\,\textrm{supp}\,}}{\textbf{v}} | \leqslant N^{1/5+o(1)}\). Therefore,

$$\begin{aligned} \Vert H^{({\mathcal {V}})} {\textbf{v}} \Vert\leqslant & {} \Vert (H - {\mathbb {E}}H)^{({\mathcal {V}})} {\textbf{v}} \Vert + o(\Vert {\textbf{v}} \Vert ) \\\leqslant & {} \Vert (H - {\mathbb {E}}H)^{(\{ y :\, \alpha _y \geqslant \alpha ^*(1/2) \})} {\textbf{v}} \Vert + o(\Vert {\textbf{v}} \Vert )\\\leqslant & {} (\Lambda (\alpha ^*(1/2)) + o(1))\Vert {\textbf{v}} \Vert , \end{aligned}$$

where in the second step, we used that, by Proposition 3.2 (iv) with \(\nu = 1/2\), all vertices \(y \in \bigcup _{x \in {\mathcal {V}}} (B_r(x) {\setminus } \{x \})\) satisfy \(\alpha _y \leqslant \alpha ^*(1/2)\) and, in the third step, \(\Vert (H - {\mathbb {E}}H)^{(\{y \,:\, \alpha _y \geqslant \alpha ^*(1/2)\})} \Vert \leqslant \Lambda (\alpha ^*(1/2)) + o(1)\) by Lemma C.1. This concludes the proof of (4.7). \(\square \)
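
For completeness, the Cauchy-Schwarz step invoked at the beginning of the proof of (4.7) reads as follows, using that the off-diagonal entries of \({\mathbb {E}}H\) equal \(\sqrt{d}/N\) for the rescaling \(H = A/\sqrt{d}\) (removing the rows and columns indexed by \({\mathcal {V}}\) only decreases the left-hand side): for every vertex u,

$$\begin{aligned} |(({\mathbb {E}}H)^{({\mathcal {V}})} {\textbf{v}})_u | \leqslant \frac{\sqrt{d}}{N} \sum _{w \in {{\,\textrm{supp}\,}}{\textbf{v}}} |v_w | \leqslant \frac{\sqrt{d}}{N} \sqrt{|{{\,\textrm{supp}\,}}{\textbf{v}} |} \, \Vert {\textbf{v}} \Vert , \qquad \text {so that} \qquad \Vert ({\mathbb {E}}H)^{({\mathcal {V}})} {\textbf{v}} \Vert \leqslant \sqrt{\frac{d \, |{{\,\textrm{supp}\,}}{\textbf{v}} |}{N}} \, \Vert {\textbf{v}} \Vert . \end{aligned}$$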

4.2 Approximating \({\textbf{u}}(x)\) by \({\textbf{w}}_r(x)\)

From Proposition 4.1 and its proof, we deduce the following result, which compares \({\textbf{u}}(x)\) and \(\textbf{w}_r(x)\) from Definition 1.5.

Corollary 4.2

Suppose that (1.5) holds and fix \(\mu \in [0,1/3)\). Then, on \(\Omega \), for all \(x \in {\mathcal {W}}\) and \(r \in {\mathbb {N}}\) satisfying (2.19),

$$\begin{aligned} {\textbf{u}}(x) = {\textbf{w}}_r(x) + O\big (\alpha _x q_x^{r} \Lambda (\alpha _x)^{-1} \big ) = {\textbf{w}}_r(x) + o\big (\Lambda (\alpha _x)^{-1} \big ) , \end{aligned}$$

where \(q_x\) is the same as in Proposition 4.1.

Proof

We shall apply Lemma D.2 with \(M = H|_{B_r(x)}\), \({\widehat{\lambda }} = \lambda (x)\) and \({\textbf{v}} = {\textbf{u}}(x)\). First, we check its conditions by studying the spectral gap of \(H|_{B_r(x)}\) around its largest eigenvalue. As \(r \leqslant \frac{1}{6} \frac{\log N}{\log d}\), we conclude from Proposition 3.4 (iii) with \(X = B_r(x)^c\) that \(H|_{B_r(x)} = H^{(B_r(x)^c)}\) has an eigenvalue \(\Lambda (\alpha _x) + o(1)\). The bound (4.7) implies that \(\Vert H|_{B_r(x) {\setminus } \{ x\}} \Vert \leqslant \Vert H^{({\mathcal {V}})} \Vert \leqslant \Lambda (\alpha ^*(1/2)) + o(1)\), where in the first step we used Proposition 3.2 (iii). By eigenvalue interlacing (Lemma D.3), we therefore deduce that

$$\begin{aligned} \lambda _1(H|_{B_r(x)}) = \Lambda (\alpha _x) + o(1), \qquad \qquad \lambda _2(H|_{B_r(x)}) \leqslant \lambda _1(H|_{B_r(x) {\setminus } \{ x\}}) \leqslant \Lambda (\alpha ^*(1/2)) + o(1). \end{aligned}$$

Thus, the definition of \({\mathcal {W}}\) in (2.2) implies

$$\begin{aligned} \lambda _1(H|_{B_r(x)}) - \lambda _2(H|_{B_r(x)}) \gtrsim \Lambda (\alpha _x) \bigg ( 1 - \frac{\Lambda (\alpha ^*(1/2))}{\Lambda (\alpha ^*(\mu )) +\kappa /2}\bigg ) \gtrsim \Lambda (\alpha _x), \end{aligned}$$

where the last inequality follows from Corollary A.3 if \(\frac{\log N}{d} \geqslant T\) for some large enough constant T, and from \(\Lambda (\alpha ^*(1/2)) + \kappa /2 \leqslant \Lambda (\alpha ^*(\mu )) + \kappa /2 \leqslant \Lambda (\alpha _x) \lesssim 1\) (see (3.1)) otherwise. Hence, owing to (3.3), there is \(\Delta \gtrsim \Lambda (\alpha _x)\) such that \(H|_{B_r(x)}\) has precisely one eigenvalue in \([\lambda (x) - \Delta , \lambda (x) + \Delta ]\).

Let \(P_r\) be the orthogonal projection onto the coordinates in \(B_r(x)\). The eigenvalue-eigenvector relation \((H^{({\mathcal {V}} {\setminus } \{x \})} - \lambda (x)) {\textbf{u}}(x) = 0\) and \(P_r H^{({\mathcal {V}}{\setminus } \{x \})} P_r = H|_{B_r(x)}\) imply

$$\begin{aligned} (H|_{B_r(x)} - \lambda (x)) {\textbf{u}} (x) = - P_r H^{({\mathcal {V}}{\setminus } \{x \})} (1 - P_r){\textbf{u}} (x) - \lambda (x) (1 - P_r) {\textbf{u}}(x). \end{aligned}$$

Therefore, since \((1 - P_r){\textbf{u}}(x) = {\textbf{u}}(x)|_{B_r(x)^c}\), we get

$$\begin{aligned} \Vert (H|_{B_r(x)} - \lambda (x)) {\textbf{u}} (x) \Vert \leqslant \lambda (x) \Vert \textbf{u}(x)|_{B_r(x)^c} \Vert + \Vert P_r H^{({\mathcal {V}}{\setminus } \{x \})} (\textbf{u}(x)|_{B_r(x)^c}) \Vert \lesssim \alpha _x q_x^{r}. \end{aligned}$$
(4.9)

In the last step, we used \(\lambda (x) \lesssim \alpha _x^{1/2}\) and Proposition 4.1 to estimate the first term. For the second term, we used that \(P_r H^{({\mathcal {V}}{\setminus } \{x \})} ({\textbf{u}}(x)|_{B_r(x)^c}) = P_r H^{({\mathcal {V}}{\setminus } \{x \})} ( \textbf{u}(x)|_{S_{r+1}(x)})\) by the locality of \(H^{({\mathcal {V}}{\setminus } \{x \})}\) as well as the identity \(H^{({\mathcal {V}}{\setminus } \{x \})} (\textbf{u}(x)|_{S_{r+1}(x)}) = H^{({\mathcal {V}})} ({\textbf{u}}(x)|_{S_{r+1}(x)})\), which yield

$$\begin{aligned} \Vert H^{({\mathcal {V}}{\setminus } \{x \})} ({\textbf{u}}(x)|_{S_{r+1}(x)}) \Vert \lesssim \Lambda (\alpha ^*(1/2)) \Vert {\textbf{u}}(x)|_{S_{r+1}(x)} \Vert \lesssim \alpha _x^{1/2} \Vert {\textbf{u}}(x)|_{B_r(x)^c} \Vert \lesssim \alpha _x q_x^{r} \end{aligned}$$

due to (4.7) as \(r + 1 \leqslant \min \big \{\frac{1}{5} - \frac{\mu }{4}, \frac{1}{3} - \mu \big \} \frac{\log N}{\log d} \), \(\Lambda (\alpha ^*(1/2))^2 \leqslant \Lambda (\alpha ^*)^2 \lesssim \alpha _x\), and Proposition 4.1.

Since \(\alpha _x \lesssim \frac{\log N}{d}\) by Proposition 3.1 (i), \(q_x \leqslant 1 - c \) for some constant \(c > 0\) by Proposition 4.1, \(r \gg \log d \gtrsim \log \big (\frac{10\log N}{d} \big )\) by (1.5) and \(\alpha _x \geqslant 2\), we obtain

$$\begin{aligned} \alpha _x q_x^r \ll 1 \lesssim \alpha _x^{1/2}. \end{aligned}$$
(4.10)

Therefore, owing to (4.9) and \(\alpha _x^{1/2} \asymp \Lambda (\alpha _x) \lesssim \Delta \), we can now apply Lemma D.2 with \(M = H|_{B_r(x)}\), \({\widehat{\lambda }} = \lambda (x)\) and \({\textbf{v}} = {\textbf{u}}(x)\) to conclude \({\textbf{u}}(x) = {\textbf{w}}_r(x) + O ( \alpha _x q_x^{r} \Lambda (\alpha _x)^{-1})\) from (4.9) and \(\Delta \gtrsim \Lambda (\alpha _x)\). This proves the first equality in Corollary 4.2.

Since \(\alpha _x q_x^r \ll 1\) by (4.10), the last equality in Corollary 4.2 follows immediately. \(\square \)

4.3 Optimal exponential decay of \({\textbf{u}}(x)\)

In this subsection we establish an explicit rate of exponential decay of \({\textbf{u}}(x)\). It holds only for a smaller set of eigenvectors near the spectral edge, requiring \(\mu \) to be small enough. As pointed out in Remark 1.4, up to the error \(O(\varepsilon )\), this rate is optimal.

Proposition 4.3

Suppose (1.5). Then the following holds.

  1. (i)

    (Subcritical regime) There are constants \(T \geqslant 1\) and \(c>0\) such that if \(\frac{\log N}{d} \geqslant T\) then, on \(\Omega \), for any small enough constant \(\varepsilon > 0\) and each \(\mu \in [0,\varepsilon ]\),

    $$\begin{aligned} \Vert {\textbf{u}}(x)|_{B_i(x)^c} \Vert \lesssim \sqrt{\alpha _x} \bigg ( \frac{1 + O(\varepsilon )}{\sqrt{\alpha _x - 1}} \bigg )^{i+1} \end{aligned}$$

    for all \(x \in {\mathcal {W}}\) and \(i \leqslant c \frac{\log N}{(\log d) \log \frac{10\log N}{d} }\).

  2. (ii)

    (Critical regime) There is a constant \(c > 0\) such that, for any constants \(T \geqslant 1\) and \(\varepsilon > 0\) with \(\varepsilon \leqslant c \kappa \), if \(\frac{\log N}{d} \leqslant T\) and \(\mu \in [0,c T^{-1} \varepsilon ^2]\) then, with high probability,

    $$\begin{aligned} \Vert {\textbf{u}}(x)|_{B_i(x)^c} \Vert \lesssim \sqrt{\alpha _x} \bigg (\frac{1 + O(\varepsilon )}{\sqrt{\alpha _x-1}}\bigg )^{i+1} \end{aligned}$$

    for all \(x \in {\mathcal {W}}\) and \(i \leqslant \frac{c \varepsilon ^2}{T} \frac{\log N}{\log d}\).

Proof

Let \(x \in {\mathcal {W}}\). For any i, \(n \in {\mathbb {N}}\), we conclude from (4.4) and (4.5) (see also (4.6)) that

$$\begin{aligned} \Vert {\textbf{u}}(x)|_{B_i(x)^c} \Vert \lesssim \frac{1}{\lambda (x)\sqrt{d}} \biggl \Vert Q_i \sum _{k=0}^n \bigg ( \frac{H^{({\mathcal {V}})}}{\lambda (x)} \bigg )^k {\textbf{1}}_{S_1(x)} \biggr \Vert + \frac{\Vert (H^{({\mathcal {V}})})^{n+1} \textbf{1}_{S_1(x)} \Vert }{\sqrt{d} \lambda (x)^{n+1}} \end{aligned}$$

since \(|\langle {\textbf{1}}_x, {\textbf{u}}(x) \rangle | \leqslant 1\), \(\Vert Q_i \Vert \leqslant 1\) and \(\Vert (\lambda (x) - H^{({\mathcal {V}})})^{-1} \Vert \lesssim 1\) by (4.3). We denote the right-hand side of (3.2) by r. In the following, we always assume that \(n + 1\leqslant r\) and tacitly use the graph properties listed in Proposition 3.2. As in the proof of Proposition 4.1 (see (4.8) and use \(q_x \leqslant 1 - \varepsilon \) by (4.1)), we find some constant \(\varepsilon > 0\) such that if \(n + 1 \leqslant r \) then

$$\begin{aligned} \frac{\Vert (H^{({\mathcal {V}})})^{n+1} {\textbf{1}}_{S_1(x)} \Vert }{\sqrt{d} \lambda (x)^{n+1}} \lesssim \sqrt{\alpha _x} (1- \varepsilon )^{n+1}. \end{aligned}$$
(4.11)

We shall use the following result, whose proof is given at the end of this subsection.

Claim 4.4 Suppose that all vertices in \(B_{r - 1}(x){\setminus } \{x \}\) have degree at most \(\tau d\), where \(2 \leqslant 2 \sqrt{\tau } < \lambda (x)\). Then for all \(n \leqslant r - 1\) and \(i \in {\mathbb {N}}\) we have on \(\Omega \)

$$\begin{aligned} \frac{1}{\lambda (x)\sqrt{d}} \biggl \Vert Q_i \sum _{k = 0}^n \bigg ( \frac{H^{({\mathcal {V}})}}{\lambda (x)} \bigg )^k {\textbf{1}}_{S_1(x)} \biggr \Vert \lesssim \sqrt{\alpha _x} \bigg ( \frac{2}{\lambda (x)+ \sqrt{\lambda (x)^2 - 4\tau }} \bigg )^{i + 1}. \end{aligned}$$
(4.12)

We now explain how we choose \(\tau \) and n in the subcritical and the critical regimes in order to deduce Proposition 4.3 from Claim 4.4. In the subcritical regime, i.e. for the proof of (i), we choose \(\tau = \alpha ^*(1-\varepsilon )\) and \(n \leqslant \varepsilon \frac{\log N}{\log d}\) for some small enough constant \(\varepsilon > 0\). By arguing similarly as in the proof of Corollary A.3 below, we find a constant \(T \geqslant 1\) such that

$$\begin{aligned} \frac{4 \tau }{\lambda (x)^2} \leqslant \frac{4 \alpha ^*(1-\varepsilon ) (1 + o(1))}{\Lambda (\alpha ^*(\mu ))^2} = O(\varepsilon ) < 1 \end{aligned}$$
(4.13)

by (3.3) and (2.2) if \(\frac{\log N}{d} \geqslant T\). Here, the last inequality holds if \(\varepsilon \) is sufficiently small. Hence, the assumption on \(\tau \) in Claim 4.4 holds. Moreover, with our choice of \(\tau \), the degrees in \(B_n(x){\setminus }\{x\}\) are bounded by \(\tau d\) due to Proposition 3.2 (iv) and our assumption on n. Hence, the assumptions of Claim 4.4 hold. As \(\lambda (x)\lesssim \sqrt{\frac{\log N}{d}}\) by (3.3) and Proposition 3.1 (i), the right-hand side of (4.11) is bounded by the right-hand side of (4.12) if \(n = i C \log \big (\frac{10\log N}{d} \big )\) for some sufficiently large constant C. Since we need that \(n +1 \leqslant r\), this yields the upper bound on i in (i). From (4.13) and (3.3), we deduce that \(\lambda (x) + \sqrt{\lambda (x)^2 - 4\tau } = \lambda (x) \big ( 1 + \sqrt{1 - \frac{4\tau }{\lambda (x)^2}}\big ) = \Lambda (\alpha _x) ( 1 +o(1))(2 + O(\varepsilon )) \geqslant 2\sqrt{\alpha _x - 1} (1 + O(\varepsilon ))^{-1}\), which completes the proof of (i).

For the proof of (ii), we note that \(\mu \leqslant c \varepsilon ^2 T^{-1} < 1/3\) for a sufficiently small constant \(c>0\) since \(T \geqslant 1\), \(\varepsilon \leqslant c \kappa \) and \(\kappa \leqslant 1\). We choose \(\tau = 1 + \varepsilon \) for some constant \(\varepsilon >0\). Note that \(\lambda (x)^2 - 4 \tau = ( \lambda (x)^2 - 4)\big (1 - \frac{4\varepsilon }{\lambda (x)^2 - 4}\big ) > 0\) due to (3.3) and (2.2) if \(\varepsilon \leqslant c \kappa \) with a small enough constant \(c > 0\); this establishes the first condition for (4.12). We fix a radius \(R \in {\mathbb {N}}\) with \(R \frac{\log d}{\log N} \leqslant \frac{c \varepsilon ^2}{T}\) and conclude from Bennett’s inequality (Lemma D.1 below) and Proposition 3.2 that

$$\begin{aligned}{} & {} {\mathbb {P}}\Big ( \alpha _y > \tau \text { for some } x \in {\mathcal {W}} \text { and some } y \in B_{R}(x){\setminus } \{x \} \Big ) \\{} & {} \quad \leqslant \exp \bigg ( \bigg ( \mu + R \frac{\log d}{\log N} - \frac{d}{\log N} h(\varepsilon )\bigg )\log N \bigg ), \end{aligned}$$

where \(h(\varepsilon ) := (1 + \varepsilon ) \log (1 + \varepsilon ) - \varepsilon \). Since \(h(\varepsilon ) \geqslant 3c \varepsilon ^2\) for some small enough constant \(c >0\), the upper bound on \(\mu \) imposed in the statement, the definition of R and \(\frac{d}{\log N} \geqslant \frac{1}{T}\) imply that the factor in front of \(\log N\) is negative. Therefore, with high probability, we can apply (4.12) simultaneously for all \(x \in {\mathcal {W}}\). We choose \(n = C ( i + 1)\) and deduce that if \(C > 0\) is a large enough constant then the error in (4.11) is dominated by the right-hand side of (4.12) as \(\lambda (x) \lesssim 1\) for \(\frac{\log N}{d} \leqslant T\). We recall that r denotes the right-hand side of (3.2) and note that the condition of (4.11), \(n + 1 = C (i + 1) +1 \leqslant r\), can be satisfied by possibly decreasing the constant \(c > 0\). Finally, we obtain \(\lambda (x) + \sqrt{\lambda (x)^2 - 4\tau } = \lambda (x) + \sqrt{\lambda (x)^2 - 4} + O\big ( \frac{\varepsilon }{\sqrt{\lambda (x)^2 - 4}} \big ) = 2 \sqrt{\alpha _x - 1} ( 1 + O(\varepsilon ))^{-1}\) similarly as argued above and in the proof of (i) using \(\varepsilon \leqslant c \kappa \) and \(\kappa \leqslant 1\). This proves (ii) and, thus, Proposition 4.3. \(\square \)

Proof of Claim 4.4

For \(y \in S_1(x)\) we denote by \(B_n^+(y)\) the ball of radius n around y in the graph \({\mathbb {G}} |_{[N] {\setminus } \{x\}}\), and we write . Since \({\mathbb {G}}|_{B_{n+1}(x) {\setminus } {\mathcal {V}}}\) is a forest on \(\Omega \) and for each \(y \in S_1(x)\), \({\mathbb {G}}|_{B_{n}^+(y)}\) is a tree, we obtain

$$\begin{aligned} (A^{({\mathcal {V}})})^k {\textbf{1}}_{S_1(x)} = \sum _{y \in S_1(x)} \sum _{j=1}^{k+1} \sum _{z \in S_{j-1}^+(y)} {\textbf{1}}_z N_k(y,z), \end{aligned}$$

where \(N_k(y,z)\) denotes the number of walks on \({\mathbb {G}}|_{B_{n}^+(y)}\) of length k between y and z.

Thus, for \(i \geqslant 0\), we conclude

$$\begin{aligned} Q_i \sum _{k=0}^n \frac{1}{\lambda (x)^k d^{k/2}} (A^{({\mathcal {V}})})^k \textbf{1}_{S_1(x)}&= Q_i \sum _{y \in S_1(x)} \sum _{j=1}^{n+1} \sum _{z \in S_{j-1}^+(y)} \sum _{k= j-1}^n{\textbf{1}}_z \frac{N_k(y,z)}{\lambda (x)^k d^{k/2}} \\&= \sum _{y \in S_1(x)} \sum _{j=i + 1}^{n+1} \sum _{z \in S_{j-1}^+(y)}{\textbf{1}}_z \bigg ( \sum _{k= j-1}^n \frac{N_k(y,z)}{\lambda (x)^k d^{k/2}} \bigg ), \end{aligned}$$

where the second step follows from the definition of \(Q_i\). Since the sets , are disjoint, we conclude that

$$\begin{aligned} \biggl \Vert Q_i \sum _{k = 0}^n \bigg ( \frac{H^{({\mathcal {V}})}}{\lambda (x)} \bigg )^k {\textbf{1}}_{S_1(x)} \biggr \Vert ^2 = \sum _{y \in S_1(x)} \sum _{j=i + 1}^{n+1} \sum _{z \in S_{j-1}^+(y)} \bigg ( \sum _{k=j-1}^n \frac{N_k(y,z)}{\lambda (x)^k d^{k/2}} \bigg )^2. \end{aligned}$$
(4.14)

Next, let \(z \in S_{j-1}^+(y)\) for some \(y \in S_1(x)\). We note that \(N_k(y,z) = 0\) if \(k - (j-1)\) is odd, due to the bipartite structure of a tree. If \(k - (j-1)\) is even, we now show that

$$\begin{aligned} N_k(y,z) \leqslant (M^k)_{1j} (\tau d)^{(k - (j-1))/2}, \end{aligned}$$
(4.15)

where M is the adjacency matrix of \({\mathbb {N}}^*\) (regarded as a graph where consecutive numbers are adjacent), see (D.2) below for a precise definition, under the assumption that the degree of each vertex in \({\mathbb {G}}|_{B_{n}^+(y)}\) is bounded by \(\tau d\).

For the proof of (4.15), we introduce the set of walks on \({\mathbb {N}}^*\)

and for each \(\gamma \in W_k(j)\) we introduce the set of walks on \({\mathbb {G}}|_{B_{n}^+(y)}\) that project down to \(\gamma \),

By definition, any walk \(\Gamma \in W_k(y,z;\gamma )\) projects down to a walk \(\gamma \in W_k(j)\), which implies

$$\begin{aligned} N_k(y,z) \leqslant \sum _{\gamma \in W_k(j)} |W_k(y,z;\gamma ) |. \end{aligned}$$
(4.16)

In order to prove (4.15), we fix \(\gamma \in W_k(j)\) and estimate \(|W_k(y,z;\gamma ) |\). For \(i \in [j]\), let , i.e. \(T_i\) is the last time when \(\gamma \) hits i. Clearly, \(T_1< T_2< \ldots< T_{j-1} < T_j = k\). See Fig. 5 for an illustration of the walks \(\gamma \) and \(\Gamma \). At each time \(T_i\) with \(i \in [j-1]\), the walk \(\gamma \) takes a step to the right, and by definition of the times \(T_i\), any walk \(\Gamma \in W_k(y,z;\gamma )\) takes a step outwards on the geodesic from y to z. This means that \(j - 1\) of the k steps of \(\Gamma \) are fixed. Of the remaining \(k - (j - 1)\) steps, half correspond to steps to the left of \(\gamma \), each of which again corresponds to a uniquely determined step of \(\Gamma \) along the unique path back towards y. Hence, only \((k - (j - 1))/2\) of the k steps of \(\Gamma \) can be chosen freely. Since the degrees in \({\mathbb {G}}|_{B_{n}^+(y)}\) are bounded by \(\tau d\), we obtain \(|W_k(y,z; \gamma ) | \leqslant (\tau d)^{(k-(j-1))/2}\). Plugging this estimate into (4.16) implies (4.15), since \(|W_k(j) | = (M^k)_{1j}\) by definition of M.

Fig. 5

An illustration of two walks \(\gamma \in W_k(j)\) (bottom) and \(\Gamma \in W_k(y,z;\gamma )\) (top). Here \(k = 21\) and \(j = 6\). By definition of \(W_k(y,z;\gamma )\), for each \(t \in \{0, \dots , k\}\) we have \(\gamma (t) = {\textrm{d}}(x, \Gamma (t))\). The time \(T_i\) is the last time when \(\gamma \) hits i. We draw an edge \(\{\Gamma (t), \Gamma (t+1)\}\) in red if the choice of the vertex \(\Gamma (t+1)\) is uniquely determined by \(\gamma \) and in blue otherwise. In the latter case, there are at most \(\tau d\) possible choices for \(\Gamma (t+1)\). Red edges arise in two ways: (i) a step to the right in \(\gamma \) following a time \(T_i\) (\(j - 1\) in total, in \(\Gamma \) corresponding to a step towards z along the geodesic from y to z); (ii) a step to the left in \(\gamma \) (\((k - (j - 1))/2\) in total, in \(\Gamma \) corresponding to a step towards y)
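To make the count concrete with the numbers of Fig. 5, where \(k = 21\) and \(j = 6\): the \(j - 1 = 5\) steps at the times \(T_1, \dots , T_5\) and the \((k - (j-1))/2 = 8\) steps corresponding to steps to the left of \(\gamma \) are uniquely determined, so at most \(8\) of the \(21\) steps of \(\Gamma \) are chosen freely, and \(|W_k(y,z;\gamma ) | \leqslant (\tau d)^{8}\) in this example.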

Next, applying (4.15) to (4.14), we obtain

$$\begin{aligned} \biggl \Vert Q_i \sum _{k = 0}^n \bigg ( \frac{H^{({\mathcal {V}})}}{\lambda (x)} \bigg )^k {\textbf{1}}_{S_1(x)} \biggr \Vert ^2&\leqslant \sum _{j = i + 1}^{n+1} |S_j(x) | \bigg ( \sum _{k = j -1}^n \frac{(M^k)_{1j} (\tau d)^{(k - (j-1))/2}}{\lambda (x)^k d^{k/2}} \bigg )^2 \\&\leqslant \sum _{j = i + 1}^{n+1} |S_j(x) | \frac{1}{(\tau d)^{j - 1}} \bigg ( \sum _{k = j - 1}^n \frac{(M^k)_{1j} \tau ^{k/2}}{\lambda (x)^k} \bigg ) ^2 \\&\lesssim \sum _{j = i + 1}^{n+1} |S_1(x) | \frac{1}{\tau ^{j-1}} \bigg ( \sum _{k = 0}^\infty \frac{(M^k)_{1j} \tau ^{k/2}}{\lambda (x)^k} \bigg ) ^2 \\&= \sum _{j = i + 1}^{n+1} |S_1(x) | \frac{1}{\tau ^{j-1}} \bigg (\bigg ( 1 - \frac{\sqrt{\tau }}{\lambda (x)} M\bigg )^{-1}_{1j} \bigg )^2\\&= \sum _{j = i+1}^{n+1} |S_1(x) | \lambda (x)^2 \bigg (\frac{2}{ \lambda (x) + \sqrt{\lambda (x)^2 - 4\tau }} \bigg )^{2j} \\&\lesssim |S_1(x) | \lambda (x)^2 \bigg ( \frac{2}{\lambda (x)+ \sqrt{\lambda (x)^2 - 4\tau }} \bigg )^{2 (i + 1)}. \end{aligned}$$

Here, in the third step, we used that \(|S_j(x) | \lesssim |S_1(x) | d^{j-1}\) by Proposition 3.1 (ii) and that \((M^k)_{1j} \geqslant 0\) for all \(k \in {\mathbb {N}}\). The fourth step follows from the condition \(2\sqrt{\tau }/\lambda (x) < 1\) and the invertibility and Neumann series representation of \(1 - \frac{\sqrt{\tau }}{\lambda (x)} M\) provided by Lemma D.5 below with \(t = \lambda (x) / \sqrt{\tau }\). The fifth step is a consequence of the representation of \(( 1 - \frac{\sqrt{\tau }}{\lambda (x)}M)^{-1}_{1j}\) in Lemma D.5 below. In the last step, we used that \(\lambda (x) = \Lambda (\alpha _x) + o(1) \geqslant 2 + \kappa /4\) to sum up the geometric series and conclude that the series is \(\lesssim 1\). This completes the proof of (4.12) and, thus, that of Claim 4.4. \(\square \)
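In the last step above, the geometric series is summed explicitly as follows. Abbreviating \(q = \frac{2}{\lambda (x) + \sqrt{\lambda (x)^2 - 4\tau }}\), the condition \(2\sqrt{\tau }/\lambda (x) < 1\) and the bound \(\lambda (x) \geqslant 2 + \kappa /4\) give \(q \leqslant \frac{2}{\lambda (x)} \leqslant \frac{2}{2 + \kappa /4} < 1\), so that

$$\begin{aligned} \sum _{j = i+1}^{n+1} q^{2j} \leqslant \sum _{j = i+1}^{\infty } q^{2j} = \frac{q^{2(i+1)}}{1 - q^2} \lesssim q^{2(i+1)}, \end{aligned}$$

where the implicit constant depends only on \(\kappa \).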

5 Approximate Eigenvalues: Proof of Proposition 2.3

In this section we prove Proposition 2.3, by showing that in the interval \({\mathcal {I}}\) there is a one-to-one correspondence between eigenvalues of H and the points \(\lambda (x)\) for \(x \in {\mathcal {W}}\), up to a polynomially small error term.

5.1 Proof of Proposition 2.3

We recall \({\textbf{w}}_1\) from Definition 3.3.

Definition 5.1

Let \(\Pi \) be the orthogonal projection onto and .

Note that the set in the definition of \(\Pi \) is not orthogonal. Throughout the proof, we regard H as a block matrix associated with the orthogonal sum decomposition \({{\,\textrm{ran}\,}}\Pi \oplus ({{\,\textrm{ran}\,}}\Pi )^\perp \).

Proposition 5.2

Suppose (1.5). Fix \(\mu \in [0,1/3)\) and \(\zeta \in [0,1/2 - \mu )\). With high probability, the following holds.

  1. (i)

    \(\Vert (H- \lambda (x)){\textbf{u}}(x) \Vert \leqslant N^{-\zeta }\) for all \(x \in {\mathcal {W}}\).

  2. (ii)

    If \(\zeta < 1/2 - 3 \mu /2\) then

    counted with multiplicity, where \(|\varepsilon _x|\leqslant N^{-\zeta }\) for all \(x\in {\mathcal {W}}\) and \(\lambda _1(H)=\sqrt{d}(1+o(1))\).

  3. (iii)

    If \(\zeta < 1/2 - 3 \mu /2\) then \(\Vert {\overline{\Pi }} H \Pi \Vert \leqslant N^{-\zeta }\).

  4. (iv)

    If \(\mu < 1/4\) then \(\lambda _1({\overline{\Pi }} H {\overline{\Pi }})\leqslant \Lambda (\alpha ^*) + \kappa /2 +o(1)\).

Proof of Proposition 2.3

Owing to the block decomposition \(H = \Pi H \Pi + \overline{\Pi } \!\,H \overline{\Pi } \!\, + \overline{\Pi } \!\, H \Pi + \Pi H \overline{\Pi } \!\,\), Proposition 5.2 (iii) yields

counted with multiplicities, where \(|\varepsilon _\lambda |\leqslant 2 \Vert {\overline{\Pi }} H \Pi \Vert \leqslant N^{-\zeta }\) for all \(\lambda \). Therefore, Proposition 2.3 follows from the definition of \({\mathcal {I}}\) as well as Proposition 5.2 (ii), (iv), and (i). \(\square \)

The rest of this section is devoted to the proof of Proposition 5.2. We assume the condition (1.5) throughout. We recall the definition of the high probability event \(\Omega \) from Definition 3.5. For any event A and random variable X we write

(5.1)

5.2 Proof of Proposition 5.2 (i)

The proof of Proposition 5.2 (i) relies on the following result, whose proof is given at the end of this subsection.

Proposition 5.3

Let \(\mu \in [0,1/3)\). For any \(x\in {{\mathcal {W}}}\), we have the decomposition

$$\begin{aligned} (H-\lambda (x)){\textbf{u}}(x)=\sum _{y\in {{\mathcal {V}}} {\setminus } \{x \}}\varepsilon _{y}(x) {\textbf{1}}_{y}. \end{aligned}$$
(5.2)

Moreover, for any \(x,y\in [N]\), we have the estimate

$$\begin{aligned} {\mathbb {E}}_{\Omega }[\mathbb {1}_{x \in {\mathcal {W}}} \mathbb {1}_{y\in {{\mathcal {V}} {\setminus } \{x \} }}\,\varepsilon _{y}(x)^{2}]\leqslant d^{-1}(10 \log N)^{2}N^{2\mu - 3}. \end{aligned}$$

From Proposition 5.3, for any \(x\in [N]\), we obtain

$$\begin{aligned} {\mathbb {E}}_{\Omega } \bigl [\mathbb {1}_{x\in {\mathcal {W}}}\Vert (H-\lambda (x))\textbf{u}(x)\Vert ^2\bigr ] \leqslant d^{-1}(10 \log N)^{2} N^{2\mu -2}. \end{aligned}$$
(5.3)
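To spell out how (5.3) follows from Proposition 5.3: by (5.2), on the event \(\{x \in {\mathcal {W}}\}\) we have \(\Vert (H - \lambda (x)) {\textbf{u}}(x) \Vert ^2 = \sum _{y \in {\mathcal {V}} {\setminus } \{x\}} \varepsilon _y(x)^2\), so that

$$\begin{aligned} {\mathbb {E}}_{\Omega } \bigl [\mathbb {1}_{x\in {\mathcal {W}}}\Vert (H-\lambda (x))\textbf{u}(x)\Vert ^2\bigr ] \leqslant \sum _{y \in [N]} {\mathbb {E}}_{\Omega }\bigl [\mathbb {1}_{x \in {\mathcal {W}}} \mathbb {1}_{y\in {{\mathcal {V}} {\setminus } \{x \} }}\,\varepsilon _{y}(x)^{2}\bigr ] \leqslant N \cdot d^{-1}(10 \log N)^{2}N^{2\mu - 3}, \end{aligned}$$

which is precisely the right-hand side of (5.3).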

Proof of Proposition 5.2 (i)

From (5.3), a union bound, and Chebyshev’s inequality, we conclude that

$$\begin{aligned} {\mathbb {P}} \bigl (\exists x\in {\mathcal {W}},\, \Vert (H-\lambda (x)){\textbf{u}}(x)\Vert > N^{-\zeta }\bigr ) \leqslant {\mathbb {P}}(\Omega ^c) + N^{2\zeta +2\mu -1+o(1)} \end{aligned}$$

for any \(\zeta >0\). This proves Proposition 5.2 (i) since \(\zeta < 1/2 -\mu \) by assumption, and \({\mathbb {P}}(\Omega ^c) = o(1)\) by Propositions 3.1, 3.2 and 3.4. \(\square \)

We shall need modifications of the sets \({\mathcal {V}}\) and \({\mathcal {W}}\) defined in (2.1) and (2.2), respectively. For \(X \subset [N]\) we define

(5.4)
(5.5)

The point of these definitions is that \({\mathcal {V}}^{(X)}\) and \(\mathcal W^{(X)}\) depend only on the edges in \({\mathbb {G}} \vert _{X^c}\). The following remark states that on the event \(\Omega \) the effect of the upper index in these definitions amounts simply to excluding vertices.

Remark 5.4

On \({\Omega }\cap \{ y \in {\mathcal {V}} \}\), owing to Proposition 3.2 (iii), we have \({\mathcal {V}}^{(y)} = {\mathcal {V}} {{\setminus }} \{ y\}\) and \({\mathcal {W}}^{(y)} = {\mathcal {W}} {{\setminus }} \{ y\}\).

Fig. 6

An illustration of the identity (5.6). The set \(\mathcal V {\setminus } \{x\}\) is drawn in green and its complement \(({\mathcal {V}} {\setminus } \{x\})^c\) in blue. The blue neighbours of the green vertices are drawn explicitly, and the remaining blue vertices are represented by the shaded blue region. The edges of \({\mathbb {G}}\) incident to \({\mathcal {V}} {\setminus } \{x\}\) are drawn in red. The adjacency matrix of these red edges is \(A - A^{({\mathcal {V}} {\setminus } \{x\})}\). The vector \({\textbf{u}}(x)\) is supported on the blue vertices, and hence the vector \((A - A^{({\mathcal {V}} {\setminus } \{x\})}) {\textbf{u}}(x)\) is supported on the green vertices. Its value at a green vertex y equals the sum of the entries of \({\textbf{u}}(x)\) at the blue vertices adjacent to y

Proof of Proposition 5.3

Note that \(H - H^{({\mathcal {V}} {\setminus } \{x \})}\) is \(d^{-1/2}\) times the adjacency matrix of the subgraph of \({\mathbb {G}}\) containing the edges incident to \({\mathcal {V}} {\setminus } \{x\}\). Hence, because \((H^{({\mathcal V}{{\setminus }}\{x\})}-\lambda (x)){\textbf{u}}(x)=0\) and \({{\,\textrm{supp}\,}}{\textbf{u}}(x) \subset ({\mathcal {V}} {{\setminus }} \{x \})^c\), we find

(5.6)

See Fig. 6 for an illustration. The Cauchy-Schwarz inequality implies

(5.7)

on \({\Omega }\) due to Proposition 3.1 (i). From now on we fix \(x,y\in [N]\). Moreover, let \(\tilde{{\textbf{u}} }(x)\) be the eigenvector of \(H^{({\mathcal {V}}^{(y)} \cup \{ y \} {\setminus } \{ x \})}\) associated with its second largest eigenvalue \(\lambda _2(H^{(\mathcal V^{(y)} \cup \{ y \} {{\setminus }} \{ x \})})\) and satisfying . On \({\Omega }\cap \{ y \in {\mathcal {V}} \}\), we have \({{\mathcal {V}}}={{\mathcal {V}}}^{(y)} \cup \{ y\}\) by Remark 5.4 and, thus, \(\textbf{u}(x)=\tilde{{\textbf{u}}}(x)\). As \(x \ne y\), on \(\Omega \cap \{ y \in \mathcal V\}\), the events \(\{ x \in {\mathcal {W}} \}\) and \(\{ x \in {\mathcal {W}}^{(y)} \}\) coincide by Remark 5.4. Therefore, since \({\mathcal {W}}^{(y)}\) and \(\widetilde{{\textbf{u}}}(x)\) are \(\sigma (A^{(y)})\)-measurable, we obtain

(5.8)

Here, in the first step, in addition, we spelled out the condition \(y \in {\mathcal {V}}\) as \(|S_1(y) | \geqslant \alpha ^*(\mu )d\), used that \(|S_1(y) | \leqslant 10 \log N\) on \(\Omega \) by Proposition 3.1 (i) and then dropped the indicator function \(\mathbb {1}_{\Omega }\). The second step follows from the independence of the event \(\{ \alpha ^*(\mu ) d \leqslant |S_1(y) | \leqslant 10 \log N, t\in S_{1}(y) \}\) from \(\sigma (A^{(y)})\) and . (For these two steps, see also (2.9).) For the third step, we conditioned on \(|S_1(y) |\) and used that if \(|S_1(y) | = k\) then t lies in a uniformly distributed subset of \([N]{{\setminus }} \{y \}\) with k elements. For the last step, we used that \({\mathcal {W}}^{(y)} \subset \mathcal W \subset {\mathcal {V}}\).

Finally, applying (5.8) to (5.7) and using the estimate \({\mathbb {P}}(x \in {\mathcal {V}}) \leqslant N^{\mu - 1}\) (by the definitions (2.1) and (1.6)) concludes the proof of Proposition 5.3. \(\square \)

5.3 Proof of Proposition 5.2 (ii), (iii)

In this section, we conclude Proposition 5.2 (ii) and (iii) from the following result, which is also proved in this section.

Definition 5.5

Order the elements of \({\mathcal {W}}\) in some arbitrary fashion, and denote by \(({\textbf{u}}^{\perp }(x))_{x\in {\mathcal {W}}}\) the Gram-Schmidt orthonormalization of \(({\textbf{u}}(x))_{x\in {\mathcal {W}}}\).

Proposition 5.6

Let \(\mu \in [0,1/3)\). Then the following holds with high probability. For any \(x\in {\mathcal {W}}\), we have

$$\begin{aligned} \Vert (H-\lambda (x)){\textbf{u}}^{\perp }(x)\Vert \lesssim N^{\mu -1/2 + o(1)}. \end{aligned}$$
(5.9)

More generally, denoting \({\mathcal {D}} = \sum _{x \in {\mathcal {W}}} \lambda (x) \textbf{u}^\perp (x) ({\textbf{u}}^\perp (x))^*\) we have

$$\begin{aligned} \Vert (H-{\mathcal {D}}){\textbf{u}}\Vert \lesssim N^{3\mu /2-1/2 + o(1)}\Vert {\textbf{u}}\Vert \end{aligned}$$
(5.10)

for all .

Proof of Proposition 5.2 (ii)

By Definition 5.1, \({\textbf{w}}_1\) is an eigenvector of \(\Pi H \Pi \) with eigenvalue \(\lambda _1(H)\). Let \(\zeta < 1/2 - 3\mu / 2\). From (5.10), we conclude that, for each \(x \in {\mathcal {W}}\), there is \(\varepsilon _x \in [-N^{-\zeta }, N^{-\zeta }]\) such that counted with multiplicity. By Proposition 3.4 (ii), \(\lambda _1(H) = \sqrt{d}(1 + o(1))\). Hence, by (3.1) and (3.3),

$$\begin{aligned} \lambda _1(H) \gg \Lambda (\alpha _x) + o(1) = \lambda (x) \end{aligned}$$
(5.11)

for any \(x \in {\mathcal {W}}\). Therefore, we have found \(1 + |{\mathcal {W}} |\) non-zero eigenvalues of \(\Pi H \Pi \) (counted with multiplicity). Since the dimension of \({{\,\textrm{ran}\,}}\Pi \) is at most \(1 + |{\mathcal {W}} |\), this completes the proof of Proposition 5.2 (ii). \(\square \)

Proof of Proposition 5.2 (iii)

In order to estimate \(\Vert \overline{\Pi } \!\,H \Pi \Vert \), let \({\textbf{v}} \in {{\,\textrm{ran}\,}}\Pi \). We decompose \({\textbf{v}} = \alpha {\textbf{w}}_1 + {\textbf{u}}\) for some \(\alpha \in {\mathbb {R}}\) and . Let \(\zeta < 1/2- 3\mu /2\). From (5.10), we obtain that

$$\begin{aligned} \overline{\Pi } \!\, H {\textbf{v}} = \overline{\Pi } \!\,\alpha \lambda _1(H) {\textbf{w}}_1 + \overline{\Pi } \!\,{\mathcal {D}} {\textbf{u}} + o (N^{-\zeta } \Vert {\textbf{u}} \Vert ) = o(N^{-\zeta } \Vert {\textbf{u}} \Vert ), \end{aligned}$$
(5.12)

where the last step follows from \({\textbf{w}}_1 \in {{\,\textrm{ran}\,}}\Pi \) and \({{\,\textrm{ran}\,}}{\mathcal {D}} \subset {{\,\textrm{ran}\,}}\Pi \) due to the definitions of \({\mathcal {D}}\) and \(\Pi \). It remains to show that \(\Vert {\textbf{u}} \Vert \lesssim \Vert {\textbf{v}} \Vert \). From (5.10), we also conclude

(5.13)

Since \(\Vert {\mathcal {D}} \Vert \leqslant \max _{x \in {\mathcal {W}}} \lambda (x) \ll \lambda _1(H)\) by (5.11), we obtain from (5.13) that and, thus, . Therefore,

(5.14)

Hence, \(\Vert {\textbf{u}} \Vert \leqslant (1 + o(1))\Vert {\textbf{v}} \Vert \), which completes the proof of Proposition 5.2 (iii) due to (5.12). \(\square \)

The main ingredient in the proof of Proposition 5.6 is the following result. It uses the following orthogonal projection.

Definition 5.7

(\(\Pi _{{\mathcal {X}}}\)). For any \({\mathcal {X}} \subset {\mathcal {W}}\), we denote by \(\Pi _{{\mathcal {X}}}\) the orthogonal projection onto and we define .

Proposition 5.8

If \(\mu \in [0,1/3)\) then

$$\begin{aligned} {\mathbb {E}}_{\Omega }\bigg [\max _{{\mathcal {X}}\subset \mathcal W}\Vert {\overline{\Pi }}_{{\mathcal {X}}}H\Pi _{{\mathcal {X}}}\Vert ^{2}\bigg ]\lesssim d^{-1}(\log N)^{2}N^{2\mu -1}. \end{aligned}$$

Before proving Proposition 5.8 we deduce Proposition 5.6 from it.

Proof of Proposition 5.6

Recall from Definition 5.5 that the elements of \(\mathcal W\) are ordered in an arbitrary fashion. For \(x \in {\mathcal {W}}\), let \(\Pi _{<x}\) be the orthogonal projection onto and . Then for any \(x \in {\mathcal {W}}\) we have

$$\begin{aligned} {\textbf{u}}^\perp (x) = \frac{{\textbf{u}}(x) -\Pi _{<x}{\textbf{u}}(x) }{\Vert {\textbf{u}}(x) -\Pi _{<x}{\textbf{u}}(x) \Vert }= \frac{{\overline{\Pi }}_{<x}{\textbf{u}}(x) }{\Vert {\overline{\Pi }}_{<x}{\textbf{u}}(x) \Vert }. \end{aligned}$$
(5.15)

In order to estimate \((H-\lambda (x))\overline{\Pi } \!\,_{<x}{\textbf{u}}(x)\), we conclude from the definitions of \(\Pi _{<x}\) and \(\overline{\Pi } \!\,_{<x}\) that

$$\begin{aligned} H{\overline{\Pi }}_{<x} = {\overline{\Pi }}_{<x}H{\overline{\Pi }}_{<x} + \Pi _{<x}H{\overline{\Pi }}_{<x} = {\overline{\Pi }}_{<x}H-{\overline{\Pi }}_{<x}H\Pi _{<x} +\Pi _{<x}H{\overline{\Pi }}_{<x}, \end{aligned}$$

which then yields

$$\begin{aligned} (H-\lambda (x)){\overline{\Pi }}_{<x}{\textbf{u}}(x) = {\overline{\Pi }}_{<x} (H-\lambda (x)) {\textbf{u}}(x) +\Pi _{<x}H{\overline{\Pi }}_{<x}{\textbf{u}}(x) -{\overline{\Pi }}_{<x}H\Pi _{<x}{\textbf{u}}(x). \end{aligned}$$

Hence,

$$\begin{aligned} \Vert (H-\lambda (x)){\overline{\Pi }}_{<x}{\textbf{u}}(x) \Vert \leqslant \Vert (H-\lambda (x)) {\textbf{u}}(x) \Vert +2 \Vert {\overline{\Pi }}_{<x}H\Pi _{<x} \Vert . \end{aligned}$$

From (5.3) combined with a union bound over x, Proposition 5.8, and Chebyshev’s inequality, we deduce that with high probability

$$\begin{aligned} \max _{x \in {\mathcal {W}}} \Vert (H-\lambda (x)){\overline{\Pi }}_{<x}{\textbf{u}}(x) \Vert \leqslant N^{\mu -1/2+o(1)}. \end{aligned}$$
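Indeed, by (5.3), Chebyshev’s inequality and a union bound over \(x \in [N]\), with high probability \(\mathbb {1}_{x \in {\mathcal {W}}} \Vert (H-\lambda (x)) {\textbf{u}}(x) \Vert \leqslant N^{\mu - 1/2 + o(1)}\) simultaneously for all x, while \(\Vert {\overline{\Pi }}_{<x} H \Pi _{<x} \Vert \leqslant \max _{{\mathcal {X}}\subset {\mathcal {W}}}\Vert {\overline{\Pi }}_{{\mathcal {X}}}H\Pi _{{\mathcal {X}}}\Vert \leqslant N^{\mu - 1/2 + o(1)}\) with high probability by Proposition 5.8 and Chebyshev’s inequality (note that \({{\,\textrm{ran}\,}}\Pi _{<x}\) is spanned by \(({\textbf{u}}(y))_{y \in {\mathcal {W}},\, y < x}\), so \(\Pi _{<x} = \Pi _{{\mathcal {X}}}\) with \({\mathcal {X}} = \{y \in {\mathcal {W}} : y < x\}\)).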

Moreover, since for x, \(y \in \mathcal W\) satisfying \(y < x\), we find that with high probability, for all \(x \in {\mathcal {W}}\),

where the last step follows from Corollary 3.7 and the fact that \({\mathbb {P}}(\Omega ) = 1 - o(1)\). Plugging these two estimates into (5.15) yields (5.9).

For the proof of (5.10), we write \(\textbf{u}= \sum _{x\in {\mathcal {W}}} a_x {\textbf{u}}^\perp (x) \) and apply (5.9) and Cauchy-Schwarz,

$$\begin{aligned} \Vert (H-{\mathcal {D}}){\textbf{u}} \Vert = \biggl \Vert \sum _{x\in {\mathcal {W}}} a_x (H-\lambda (x)){\textbf{u}}^\perp (x) \biggr \Vert \leqslant N^{\mu -1/2+o(1)}|\mathcal W|^{1/2}\bigg (\sum _{x\in {\mathcal {W}}} a_x^2\bigg )^{1/2}. \end{aligned}$$

Hence, (5.10) follows from \({\mathcal {W}} \subset {\mathcal {V}}\) and Proposition 3.2 (i). \(\square \)

Proof of Proposition 5.8

Let \({\textbf{v}}\in {{\,\textrm{ran}\,}}\Pi _{{\mathcal {X}}}\). We write \({\textbf{v}}=\sum _{x\in {\mathcal X}}a_{x}{\textbf{u}}(x)\) with \(a_{x}\in {\mathbb {R}}\), use (5.2) and \(\overline{\Pi } \!\,_{{\mathcal {X}}} {\textbf{u}} (x) = 0\) for each \(x \in {\mathcal {X}}\) to obtain

$$\begin{aligned} \Vert {\overline{\Pi }}_{{\mathcal {X}}}H{\textbf{v}}\Vert ^{2}&=\biggl \Vert \sum _{x\in {\mathcal X}}a_{x}{\overline{\Pi }}_{{\mathcal {X}}}\bigg (\lambda (x) \textbf{u}(x)+\sum _{y\in {{\mathcal {V}} {\setminus }\{ x\}}}\varepsilon _{y}(x)\textbf{1}_{y}\bigg ) \biggr \Vert ^{2} \\&=\biggl \Vert {\overline{\Pi }}_{{\mathcal {X}}}\sum _{x\in \mathcal X,y\in {{\mathcal {V}} {\setminus } \{x \}}}\varepsilon _{y}(x)a_{x}{\textbf{1}}_{y} \biggr \Vert ^{2}\\&\leqslant \sum _{y\in {{\mathcal {V}}}}\bigg (\sum _{x\in {{\mathcal {X}} {\setminus } \{y \}}}\varepsilon _{y}(x)a_{x}\bigg )^{2} \leqslant \bigg (\sum _{x\in \mathcal X}a_{x}^{2} \bigg )\bigg (\sum _{x\in {{\mathcal {W}}},\, y \in {\mathcal {V}} {\setminus }\{x\}}\varepsilon _{y}(x)^{2}\bigg ). \end{aligned}$$

Moreover, since for any \(y \in {\mathcal {V}} {{\setminus }} \{ x\}\), on \(\Omega \) we have

for some constant \(c>0\) by Corollary 3.7. Therefore, on \(\Omega \) we have

$$\begin{aligned} \Vert {\overline{\Pi }}_{{\mathcal {X}}}H{\textbf{v}}\Vert ^{2}&\leqslant \frac{1}{c}\bigg (\sum _{x \in {\mathcal {W}},\, y\in {{\mathcal {V}} {\setminus } \{x\}}}\varepsilon _{y}(x)^{2}\bigg )\Vert {\textbf{v}}\Vert ^{2} \end{aligned}$$

and, in particular, \(\Vert {\overline{\Pi }}_{{\mathcal {X}}}H\Pi _{\mathcal X}\Vert ^{2}\leqslant \frac{1}{c}\left( \sum _{x \in {\mathcal {W}},\, y\in {{\mathcal {V}} {{\setminus }} \{x\}}}\varepsilon _{y}(x)^{2}\right) \) for any \({\mathcal {X}} \subset {\mathcal {W}}\). By Proposition 5.3, we therefore conclude

$$\begin{aligned} {\mathbb {E}}_{\Omega }\left[ \sum _{x \in {\mathcal {W}},\, y\in {\mathcal V{\setminus }\{x\}}}\varepsilon _{y}(x)^{2}\right] =\sum _{x,y\in [N]} {\mathbb {E}}_{\Omega } \big [ \mathbb {1}_{x \in {\mathcal {W}}} \mathbb {1}_{y\in {{\mathcal {V}} {\setminus } \{x \}}} \, \varepsilon _{y}(x)^{2}\big ]\lesssim d^{-1}(\log N)^{2}N^{2\mu -1}, \end{aligned}$$

as claimed. \(\square \)

5.4 Proof of Proposition 5.2 (iv)

For the proof of Proposition 5.2 (iv), we shall need several notions from the works [12, 13]. As in [12, eq. (1.8)], we set

$$\begin{aligned} r_\star = \lfloor c \sqrt{\log N} \rfloor \end{aligned}$$
(5.16)

for the constant \(c>0\) from [12]. Following [12, eq. (1.9)], we define

(5.17)

for \(u >0\). For any \(\tau \in [1 + \xi ^{1/2}, 2]\), we denote by \({\mathbb {G}}_\tau \) the pruned graph introduced in [12, Proposition 3.1]. We denote the balls and spheres in \({\mathbb {G}}_\tau \) around a vertex \(x \in [N]\) by \(B_i^\tau (x)\) and \(S_i^\tau (x)\), respectively. The pruned graph \({\mathbb {G}}_\tau \) is a subgraph of \({\mathbb {G}}\), which possesses a number of useful properties listed in [12, Proposition 3.1]. In particular, the balls \(B_{2r_\star }^\tau (x)\) and \(B_{2r_\star }^\tau (y)\) in \({\mathbb {G}}_\tau \) are disjoint if x, \(y \in [N]\) satisfy \(x \ne y\) and \(\min \{\alpha _x, \alpha _y \} \geqslant \tau \).

Recalling the definition of \(u_i(\alpha )\) from (1.10), for any \(x \in [N]\) with \(\alpha _x \geqslant 2 + \xi ^{1/4}\) and \(\sigma = \pm \), as in [12, eq. (3.5)], we define

(5.18)

where, for the last coefficient \(u_{r_\star }(\alpha _x)\) we make the special choice , and \(u_0(\alpha _x)> 0\) is chosen such that \(\textbf{v}_\sigma ^\tau (x)\) is normalized, i.e. \(\sum _{i=0}^{r_\star } u_i^2(\alpha _x) = 1\).

Remark 5.9

The family is orthonormal. See [12, Remark 3.3].

As in [12, Definition 3.6], we denote the adjacency matrix of \({\mathbb {G}}_\tau \) by \(A^\tau \) and define the matrix

(5.19)

where \(\chi ^\tau \) is the orthogonal projection onto . Moreover, we recall [12, Definition 3.10].

Definition 5.10

(\(\Pi ^\tau \), \({{\widehat{H}}}^\tau \)). Define the orthogonal projections (see Remark 5.9)

and the associated block matrix (recall (5.19))

(5.20)

We note that, for any \(\tau \in [1+ \xi ^{1/2},2]\), by (1.5) we have

$$\begin{aligned} \xi = o(1), \qquad \qquad \xi _{\tau -1} = o(1). \end{aligned}$$
(5.21)

For any \(\tau \in [1 + \xi ^{1/2}, 2]\), the definition of \({\widehat{H}}^\tau \) in (5.20) and [12, Proposition 3.12] yield that, with high probability,

(5.22)

where we used (3.1), (5.21) and (1.5) in the last two steps.

Owing to (5.21) and [12, Lemmas 3.8, 3.11], with high probability, we have

$$\begin{aligned} \Vert H - {\mathbb {E}}H - {\widehat{H}}^\tau \Vert = o(1). \end{aligned}$$
(5.23)

From (5.23) and (5.22), we conclude that, with high probability,

$$\begin{aligned} \Vert H - {\mathbb {E}}H \Vert \ll \sqrt{d}. \end{aligned}$$
(5.24)

After these preparations, we can start the proof of Proposition 5.2 (iv). We begin with the following definition.

Definition 5.11

(\(Q_r\)). For \(r \in {\mathbb {N}}\), denote by \(Q_r\) the orthogonal projection defined by restriction to the set \((\bigcup _{y\in {{\mathcal {V}}}{{\setminus }}{\mathcal W}}B_{r}(y))^{c}\).

Definition 5.12

(Q, \(\Pi _Q\)). Let \(r_\star \) be as in (5.16). Set and define \(\Pi _{Q}\) as the orthogonal projection onto , and write .

Then

$$\begin{aligned} \lambda _{1}({\overline{\Pi }}H{\overline{\Pi }})&\leqslant \lambda _{1}({\overline{\Pi }}(H-{\mathbb {E}}H){\overline{\Pi }})+\Vert {\overline{\Pi }}({\mathbb {E}}H){\overline{\Pi }}\Vert \nonumber \\&\leqslant \lambda _{1}({\overline{\Pi }}_{Q}{\widehat{H}}^{\tau }{\overline{\Pi }}_{Q})+ 2 \Vert H -{\mathbb {E}}H \Vert \Vert \Pi -\Pi _{Q}\Vert +\Vert H-{\mathbb {E}}H-{\widehat{H}}^{\tau }\Vert \nonumber \\&\quad +\Vert \overline{\Pi } \!\, ({\mathbb {E}}H) \overline{\Pi } \!\, \Vert \nonumber \\&\leqslant \lambda _{1}({\overline{\Pi }}_{Q}{\widehat{H}}^{\tau }{\overline{\Pi }}_{Q})+ 2 \Vert H -{\mathbb {E}}H \Vert \Vert \Pi -\Pi _{Q}\Vert +o(1), \end{aligned}$$
(5.25)

whose last step follows from (5.23) and \(\Vert \overline{\Pi } \!\, ({\mathbb {E}} H) \overline{\Pi } \!\, \Vert = o(1)\). The latter bound is a consequence of

$$\begin{aligned} \Vert {\overline{\Pi }}({\mathbb {E}}H){\overline{\Pi }}\Vert&= \sqrt{d} \Vert {\overline{\Pi }} ({\textbf{e}}{\textbf{e}}^{*} - 1/N ) {\overline{\Pi }}\Vert =\sqrt{d}\Vert {\overline{\Pi }}({\textbf{e}}-{\textbf{w}}_1)({\textbf{e}}^{*}-\textbf{w}_1^{*}){\overline{\Pi }}\Vert \nonumber \\&\quad + o(d^{-1/2}) \lesssim d^{-1/2}, \end{aligned}$$
(5.26)

where we introduced , used \(\Pi {\textbf{w}}_1 = {\textbf{w}}_1\) in the second step, and in the last step we used Proposition 3.4 (iv) to estimate \(\Vert {\textbf{e}} - \textbf{w}_1 \Vert \leqslant \Vert {\textbf{w}}_1 - {\textbf{q}} \Vert + \Vert {\textbf{q}} - {\textbf{e}} \Vert \lesssim d^{-1/2}\).

Fix \(\mu \in [0,1/4)\). We then claim that with high probability

$$\begin{aligned} \Vert \Pi -\Pi _Q\Vert \lesssim d^{-1} \end{aligned}$$
(5.27)

and

$$\begin{aligned} \lambda _1(\overline{\Pi } \!\,_Q {\widehat{H}}^\tau \overline{\Pi } \!\,_Q) \leqslant \Lambda (\alpha ^*) + \kappa /2. \end{aligned}$$
(5.28)

Using (5.27) and (5.28), Proposition 5.2 (iv) follows immediately from (5.25) and (5.24). What remains for the proof of Proposition 5.2 (iv), therefore, is to establish (5.27) and (5.28).

5.5 Proof of (5.27)

Clearly,

$$\begin{aligned} \Vert \Pi - \Pi _Q \Vert = \Vert \Pi \overline{\Pi } \!\,_Q - \overline{\Pi } \!\, \Pi _Q \Vert \leqslant \Vert \Pi \overline{\Pi } \!\,_Q \Vert + \Vert \overline{\Pi } \!\, \Pi _Q \Vert = \Vert \overline{\Pi } \!\,_Q\Pi \Vert + \Vert \overline{\Pi } \!\, \Pi _Q \Vert , \end{aligned}$$
(5.29)

where in the last step we used that \((\Pi \overline{\Pi } \!\,_Q)^* = \overline{\Pi } \!\,_Q \Pi \). In order to estimate the terms on the right-hand side, we continue with

$$\begin{aligned} \Vert \overline{\Pi } \!\,_Q \Pi \Vert ^2 = \sup _{\begin{array}{c} {\textbf{v}} \in {{\,\textrm{ran}\,}}\Pi \\ \Vert {\textbf{v}} \Vert = 1 \end{array}} \Vert \overline{\Pi } \!\,_Q {\textbf{v}} \Vert ^2 \leqslant \sup _{\begin{array}{c} {\textbf{v}} \in {{\,\textrm{ran}\,}}\Pi \\ \Vert {\textbf{v}} \Vert = 1 \end{array}} \inf _{\begin{array}{c} {\textbf{u}} \in {{\,\textrm{ran}\,}}\Pi _Q \\ \Vert {\textbf{u}} \Vert = 1 \end{array}} \Vert {\textbf{v}} - {\textbf{u}} \Vert ^2. \end{aligned}$$
(5.30)
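Here, the inequality in (5.30) holds since, for any \({\textbf{u}} \in {{\,\textrm{ran}\,}}\Pi _Q\), we have \(\overline{\Pi } \!\,_Q {\textbf{u}} = 0\) and hence \(\Vert \overline{\Pi } \!\,_Q {\textbf{v}} \Vert = \Vert \overline{\Pi } \!\,_Q ({\textbf{v}} - {\textbf{u}}) \Vert \leqslant \Vert {\textbf{v}} - {\textbf{u}} \Vert \).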

We now apply the next lemma, whose proof is given in Sect. 5.7 below.

Lemma 5.13

Fix \(\mu \in [0,1/3)\). With high probability, for all \(x\in {\mathcal {W}}\), \(y \in {\mathcal {V}} {{\setminus }} {\mathcal {W}}\) and \(r \in {\mathbb {N}}\) satisfying \(r \ll \frac{\log N}{\log d}\), we have

$$\begin{aligned} \Vert {\textbf{u}}^\perp (x)|_{B_r(y)}\Vert \lesssim N^{-1/2 + \mu + o(1)}. \end{aligned}$$

Owing to Lemma 5.13 as well as parts (iii) and (i) of Proposition 3.2, we obtain

$$\begin{aligned} \Vert {\textbf{u}}^\perp (x) - Q {\textbf{u}}^\perp (x) \Vert = \Bigg (\sum _{y \in \mathcal V{\setminus } {\mathcal {W}}} \Vert {\textbf{u}}^\perp (x)|_{B_{2r_\star - 1}(y)} \Vert ^2\Bigg )^{1/2} \leqslant N^{-1/2 + 3 \mu /2 + o(1)}. \end{aligned}$$
(5.31)
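Behind the last bound in (5.31): by Lemma 5.13, each summand is at most \(N^{-1 + 2\mu + o(1)}\), while the number of summands is at most the cardinality bound \(|{\mathcal {V}} | \leqslant N^{\mu + o(1)}\) from Proposition 3.2 (i), so the sum is at most \(N^{-1 + 3\mu + o(1)}\) and taking the square root yields \(N^{-1/2 + 3\mu /2 + o(1)}\).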

Let \({\textbf{q}}\) be as in Proposition 3.4 (iv) with \(r = 2 r_\star - 2 \ll \frac{d}{\log \log N}\) by (1.5). Since \({{\,\textrm{supp}\,}}{\textbf{q}} \subset \big ( \bigcup _{x \in {\mathcal {V}}} B_{r + 1} (x) \big )^c = \big ( \bigcup _{x \in {\mathcal {V}}} B_{2r_\star -1} (x) \big )^c\), we conclude from Proposition 3.4 (iv) and the definition of Q that

$$\begin{aligned} \Vert {\textbf{w}}_1 -Q {\textbf{w}}_1 \Vert = \Vert (1-Q) ( {\textbf{w}}_1 - {\textbf{q}}) \Vert \leqslant \Vert {\textbf{w}}_1-{\textbf{q}}\Vert \lesssim d^{-1}. \end{aligned}$$
(5.32)

For \(\gamma _1 \in {\mathbb {R}}\) and \(\gamma _x \in {\mathbb {R}}\), \(x \in {\mathcal {W}}\), write

(5.33)

Then

$$\begin{aligned} \Vert {\textbf{v}} - {\textbf{u}} \Vert\lesssim & {} d^{-1} |\gamma _1 | + N^{-1/2 + 3\mu /2 + o(1)} \sum _{x \in {\mathcal {W}}} |\gamma _x | \\\lesssim & {} d^{-1} |\gamma _1 | + N^{-1/2 + 2\mu +o(1)} \bigg (\sum _{x\in {\mathcal {W}}} |\gamma _x |^2 \bigg )^{1/2} \lesssim d^{-1} \Vert {\textbf{v}} \Vert . \end{aligned}$$

Here, we used (5.32) and (5.31) in the first step, Proposition 3.2 (i) in the second step and, in the last step, \(\mu < 1/4\) as well as \(\Vert {\textbf{v}} \Vert ^2 \asymp |\gamma _1 |^2 + \sum _{x \in {\mathcal {W}}} |\gamma _x |^2\) (the inequality \(\gtrsim \) follows from (5.14) and the orthogonality of \(({\textbf{u}}^\perp (x))_{x \in {\mathcal {W}}}\); the inequality \(\lesssim \) is trivial). Hence, if \(\Vert {\textbf{v}} \Vert = 1\) then \(\Vert {\textbf{u}} \Vert = 1 + O(d^{-1})\) and, thus, \(\Vert {\textbf{v}} - \frac{\textbf{u}}{\Vert {\textbf{u}} \Vert } \Vert \lesssim d^{-1}\). Therefore, \(\Vert \overline{\Pi } \!\,_Q \Pi \Vert \lesssim d^{-1}\) by (5.30).

Finally, similarly to (5.30), we have

$$\begin{aligned} \Vert \overline{\Pi } \!\, \Pi _Q \Vert ^2 \leqslant \sup _{\begin{array}{c} {\textbf{u}} \in {{\,\textrm{ran}\,}}\Pi _Q \\ \Vert {\textbf{u}} \Vert = 1 \end{array}} \inf _{\begin{array}{c} {\textbf{v}} \in {{\,\textrm{ran}\,}}\Pi \\ \Vert {\textbf{v}} \Vert = 1 \end{array}} \Vert {\textbf{v}} - {\textbf{u}} \Vert ^2, \end{aligned}$$

and the same argument as above, with the representation (5.33), implies that the right-hand side is \(O(d^{-1})\). By (5.29), we therefore conclude (5.27).

5.6 Proof of (5.28)

We begin by introducing another orthogonal projection.

Definition 5.14

(\(\Pi _{{\textbf{v}}}\)). Let \(\Pi _{{\textbf{v}}}\) be the orthogonal projection onto .

For any \(z\in {\mathcal {V}} {{\setminus }} {\mathcal {W}}\), we have \({{\,\textrm{supp}\,}}\textbf{v}^\tau _\pm (z)\subset B_{r_\star }(z)\) and, thus, by definition of Q, \({{\,\textrm{supp}\,}}{\textbf{v}}^\tau _\pm (z) \cap {{\,\textrm{supp}\,}}Q{\textbf{u}}^\perp (x) = \emptyset \) for any \(x \in {\mathcal {W}}\) and \({{\,\textrm{supp}\,}}{\textbf{v}}^\tau _\pm (z) \cap {{\,\textrm{supp}\,}}Q \textbf{w}_1 = \emptyset \). Therefore, \({\textbf{v}}^\tau _\pm (z)\) is orthogonal to \({{\,\textrm{ran}\,}}\Pi _Q\), i.e. \(\Pi _Q {\textbf{v}}_\pm ^\tau (z) = 0\). That implies \(\Pi _{{\textbf{v}}} \Pi _Q = 0 = \Pi _Q \Pi _{{\textbf{v}}}\) and, in particular, \(\Pi _{{\textbf{v}}}\) and \(\Pi _{Q}\) commute. Since \(\Pi _{{\textbf{v}}}\) and \({\widehat{H}}^\tau \) commute and \(\Pi _{{\textbf{v}}}\) and \(\Pi _Q\) commute, we obtain

$$\begin{aligned} \begin{aligned} \lambda _{1}({\overline{\Pi }}_{Q}{\widehat{H}}^{\tau }{\overline{\Pi }}_{Q})&=\lambda _{1}({\overline{\Pi }}_{Q}\Pi _{{\textbf{v}}}{\widehat{H}}^{\tau }\Pi _{\textbf{v}}{\overline{\Pi }}_{Q}+{\overline{\Pi }}_{Q}{\overline{\Pi }}_{\textbf{v}}{\widehat{H}}^{\tau }{\overline{\Pi }}_{{\textbf{v}}}{\overline{\Pi }}_{Q})\\&=\max \{\lambda _{1}({\overline{\Pi }}_{Q}\Pi _{\textbf{v}}{\widehat{H}}^{\tau }\Pi _{{\textbf{v}}}{\overline{\Pi }}_{Q}),\, \lambda _{1}({\overline{\Pi }}_{Q}{\overline{\Pi }}_{\textbf{v}}{\widehat{H}}^{\tau }{\overline{\Pi }}_{{\textbf{v}}}{\overline{\Pi }}_{Q})\}. \end{aligned} \end{aligned}$$
(5.34)

By the definition of \({\widehat{H}}^\tau \) in (5.20), we have

$$\begin{aligned} \lambda _1(\Pi _{{\textbf{v}}}{\widehat{H}}^\tau \Pi _{{\textbf{v}}})= \max _{x\in {\mathcal V}{\setminus }{{\mathcal {W}}}}\Lambda (\alpha _{x}) \leqslant \Lambda (\alpha ^{*})+\kappa /2. \end{aligned}$$
(5.35)

What remains, therefore, is to estimate \(\lambda _{1}({\overline{\Pi }}_{Q}{\overline{\Pi }}_{\textbf{v}}{\widehat{H}}^{\tau }{\overline{\Pi }}_{{\textbf{v}}}{\overline{\Pi }}_{Q})\).

Suppose that \({\textbf{w}}\) is a normalized eigenvector of \({\overline{\Pi }}_{Q}{\overline{\Pi }}_{\textbf{v}}{\widehat{H}}^\tau {\overline{\Pi }}_{{\textbf{v}}}{\overline{\Pi }}_{Q}\) such that the associated eigenvalue \(\lambda \) satisfies \(\lambda \geqslant \Lambda (\alpha ^*)+\kappa /2\). We now check that the next lemma, whose proof is given in Sect. 5.7 below, is applicable to \({\textbf{w}}\) and \(\lambda \) for any \(x \in {\mathcal {V}} {{\setminus }} {\mathcal {W}}\).

Lemma 5.15

Let \(r_\star \in {\mathbb {N}}\) be as in (5.16). In particular, \(r_\star \asymp \sqrt{\log N}\). Suppose \(\tau \in [1 + \xi ^{1/2},2]\).

There is a constant \(C>0\) such that the following holds with high probability. Let \(x \in [N]\), \(\lambda > 2 \tau + C \xi \) and \({\textbf{w}}\) satisfy

$$\begin{aligned} ({\widehat{H}}^\tau {\textbf{w}})|_{B_{2r_\star -1}^\tau (x)}=\lambda \textbf{w}|_{B_{2r_\star -1}^\tau (x)}. \end{aligned}$$
(5.36)

If \(\alpha _x \geqslant 2 + \xi ^{1/4}\) and \({\textbf{v}}_-^\tau (x) \perp {\textbf{w}} \perp {\textbf{v}}_+^{\tau }(x)\) or \(2 + \xi ^{1/4} > \alpha _x \geqslant \tau \) then

(5.37)

An analogous result holds if \(\lambda < - 2\tau - C \xi \).

Choosing \(\tau = 1 + \xi ^{1/2}\) and taking \(x \in {\mathcal {V}} {{\setminus }} {\mathcal {W}}\), we now verify the conditions of Lemma 5.15 for \({\textbf{w}}\) and \(\lambda \). Owing to (5.21), there is a constant \(c\equiv c_\kappa >0\) such that \(\frac{2 \tau + C \xi }{\lambda } \leqslant 1 - c\) as \(\lambda \geqslant \Lambda (\alpha ^*) + \kappa /2 \geqslant 2 + 2c\) due to the definition of \(\alpha ^*(\mu )\) in (1.6). In particular, \(\lambda > 2 \tau + C \xi \). From \({\overline{\Pi }}_{Q}{\overline{\Pi }}_{\textbf{v}}{\widehat{H}}^\tau {\overline{\Pi }}_{{\textbf{v}}}{\overline{\Pi }}_{Q}{\textbf{w}} = \lambda {\textbf{w}}\) we conclude that \(\overline{\Pi } \!\,_Q {\textbf{w}} = {\textbf{w}}\) and \(\overline{\Pi } \!\,_{{\textbf{v}}} {\textbf{w}} = {\textbf{w}} \). Thus, as \(\Pi _{{\textbf{v}}}\) and \({\widehat{H}}^\tau \) commute, we get \(\overline{\Pi } \!\,_Q {\widehat{H}}^\tau {\textbf{w}} = \lambda {\textbf{w}}\). Restricting both sides in the last identity to \(B_{2r_\star - 1}^\tau (x)\) yields \(({\widehat{H}}^\tau {\textbf{w}})|_{B_{2r_\star -1}^\tau (x)} = \lambda {\textbf{w}} |_{B^\tau _{2r_\star - 1}(x)}\) as Q is the restriction to \(\big (\bigcup _{y \in {\mathcal {V}} {{\setminus }} {\mathcal {W}}} B_{2r_\star -1}(y) \big )^c\) and \(B_{2r_\star - 1}^\tau (x) \subset B_{2 r_\star -1}(x)\) by [12, Proposition 3.1 (iv)]. This proves (5.36). Note that \(\alpha _x \geqslant 2 + \xi ^{1/4}\). From \(\overline{\Pi } \!\,_{{\textbf{v}}} {\textbf{w}} = {\textbf{w}}\), we conclude \(\textbf{w}\perp {\textbf{v}}^\tau _+(x)\). Since is an orthonormal family by Remark 5.9, the definition of \(\Pi _{{\textbf{v}}}\) implies \(\Pi _{{\textbf{v}}} {\textbf{v}}_-^\tau (y) = 0\) for all \(y \in {\mathcal {V}}\). Therefore, as moreover \(\Pi _Q {\textbf{v}}_-^\tau (x) = 0\), we have \(\overline{\Pi } \!\,_Q \overline{\Pi } \!\,_{{\textbf{v}}} {\widehat{H}}^\tau \overline{\Pi } \!\,_{{\textbf{v}}} \overline{\Pi } \!\,_Q {\textbf{v}}_-^\tau (x) = -\Lambda (\alpha _x) {\textbf{v}}_-^\tau (x)\). Hence, \({\textbf{w}} \perp {\textbf{v}}_-^\tau (x)\). Therefore, we have verified all assumptions of Lemma 5.15 with \(\tau =1 + \xi ^{1/2}\) for \({\textbf{w}}\), \(\lambda \), and any \(x \in {\mathcal {V}} {\setminus } {\mathcal {W}}\).

Since \(\frac{2 \tau + C \xi }{\lambda } \leqslant 1 - c\) as shown above, for the right-hand side of (5.37), we get

$$\begin{aligned} \frac{\lambda ^2}{(\lambda - 2 \tau - C\xi )^2} \bigg ( \frac{2\tau + C \xi }{\lambda } \bigg )^{r_\star }= & {} \bigg ( 1 - \frac{2 \tau + C \xi }{\lambda } \bigg )^{-2} \bigg ( \frac{2 \tau + C \xi }{\lambda } \bigg )^{r_\star } \\\leqslant & {} c^{-2} (1 - c)^{r_\star } \ll d^{-1/2} \end{aligned}$$

as \(r_\star \asymp \sqrt{\log N}\). Therefore, Lemma 5.15, the disjointness of the balls \((B_{2r_\star }^\tau (x))_{x \in {\mathcal {V}} {\setminus } {\mathcal {W}}}\) (see the paragraph after (5.17) as well as [12, Proposition 3.1 (i)]) and \(\Vert {\textbf{w}} \Vert ^2 = 1\) imply

(5.38)

We recall from Definition 5.11 that \(Q_0\) denotes the orthogonal projection defined by restriction to the set \((\mathcal V{{\setminus }} {\mathcal {W}})^c\). Since \({\textbf{w}} = {\textbf{w}} |_{{\mathcal {V}}{{\setminus }} {\mathcal {W}}} + Q_0 {\textbf{w}}\), we obtain from (5.38), \(Q_0 {\widehat{H}}^\tau Q_0 = ({\widehat{H}}^\tau )^{({\mathcal {V}} {{\setminus }} {\mathcal {W}})}\), and \(\Vert {\widehat{H}}^\tau \Vert \ll \sqrt{d}\) by (5.22) that

(5.39)

Here, the third step is a consequence of \(\overline{\Pi } \!\,_Q {\textbf{w}} = {\textbf{w}}\). In the fifth step, we used (5.26) and (5.27) and the sixth step follows from (5.23) and \(\Vert M^{({\mathcal {V}}{\setminus }{\mathcal {W}})} \Vert \leqslant \Vert M \Vert \) for any matrix M.

We now apply the next result, whose proof is given in Sect. 5.7 below.

Lemma 5.16

Fix \(\mu \in [0,1/3)\). With high probability, \(\lambda _{1}({\overline{\Pi }}H^{({{\mathcal {V}}}{{\setminus }}{\mathcal W})}{\overline{\Pi }})\leqslant \Lambda (\alpha ^{*})+o(1).\)

From Lemma 5.16 and (5.39), we deduce that \(\lambda \leqslant \Lambda (\alpha ^*) + o(1)\), in contradiction with the assumption \(\lambda \geqslant \Lambda (\alpha ^*)+\kappa /2\). We conclude that \(\lambda _1({\overline{\Pi }}_{Q}{\overline{\Pi }}_{{\textbf{v}}}{\widehat{H}}^\tau {\overline{\Pi }}_{{{\textbf{v}}}}{\overline{\Pi }}_{Q})\leqslant \Lambda (\alpha ^*) + \kappa /2\). Owing to (5.34) and (5.35), this proves (5.28).

5.7 Proofs of auxiliary results

In this final subsection we prove Lemmas 5.13, 5.15, and 5.16.

Proof of Lemma 5.13

For fixed \(y \in [N]\), on the event \(\Omega \cap \{ y \in \mathcal V{{\setminus }} {\mathcal {W}}\}\), if \(x \in {\mathcal {W}}^{(y)}\), let \({\textbf{u}}^{(y)}(x)\) be the eigenvector of \(H^{(({\mathcal {V}}^{(y)}\cup \{ y \} ) {{\setminus }} \{ x\})}\) with eigenvalue \(\lambda _2(H^{(({\mathcal {V}}^{(y)}\cup \{ y \} ) {{\setminus }} \{ x\})})\) and satisfying . See Corollaries 3.6 and 3.7 for the existence and uniqueness of \({\textbf{u}}^{(y)}(x)\).

In analogy to Definition 5.5, let \(((\textbf{u}^{(y)})^\perp (x))_{x \in {\mathcal {W}}^{(y)}}\) be the Gram-Schmidt orthonormalization of \(({\textbf{u}}^{(y)}(x))_{x \in {\mathcal {W}}^{(y)}}\). On \(\Omega \cap \{y \in {\mathcal {V}} {{\setminus }} {\mathcal {W}}\}\), we have \(\mathcal V^{(y)} \cup \{y \} = {\mathcal {V}}\) and \({\mathcal {W}}^{(y)} = {\mathcal {W}}\) by Remark 5.4. Therefore, \(\textbf{u}^{(y)}(x) = {\textbf{u}}(x)\) and, thus, \(({\textbf{u}}^{(y)})^\perp (x) = \textbf{u}^\perp (x)\) for all \(x \in {\mathcal {W}}^{(y)} = {\mathcal {W}}\) on \(\Omega \cap \{y \in {\mathcal {V}} {{\setminus }} {\mathcal {W}}\}\). Hence, for fixed x, \(y \in [N]\) with \(x \ne y\), we estimate

Here, in the first step, we also used that for all \(y \in {\mathcal {V}} {{\setminus }} {\mathcal {W}}\) and \(x \in {\mathcal {W}}^{(y)}\). In the second step, we conditioned on \(A^{(y)}\), employed the notations

and used that \({\mathcal {W}}^{(y)}\), \(({\textbf{u}}^{(y)})^\perp (x)\), and \(B_{r-1}^{(y)}(a)\) are \(A^{(y)}\)-measurable, that \(\Omega \subset \Omega _r^{(y)}\) by Proposition 3.1, as \(r \ll \frac{\log N}{\log d}\), and that \( a \in B_r(y)\) is equivalent to \(b \in S_1(y)\) for some \(b \in B_{r-1}^{(y)}(a)\). The third step follows from the independence of \(b \in S_1(y)\) and \(|S_1(y) |\) from \(A^{(y)}\). The normalization of \(({\textbf{u}}^{(y)})^{\perp }(x)\) and the definition of \(\Omega _r^{(y)}\) imply the fourth step. In the fifth step, we argued as in the last steps of (5.8) and, finally, we used \(r \ll \frac{\log N}{\log d}\) as well as the definitions of \({\mathcal {V}}\) in (2.1) and of \(\alpha ^*\) in (1.6).

Therefore, a union bound over x, \(y \in [N]\) with \(x \ne y\) and Chebyshev’s inequality complete the proof of Lemma 5.13. \(\square \)

Proof of Lemma 5.15

In order to prove Lemma 5.15, we follow [12, proof of Proposition 3.14 (i)], whose assumptions are all satisfied apart from the eigenvalue-eigenvector relation \({\widehat{H}} {\textbf{w}} = \lambda {\textbf{w}}\).

We now explain the necessary minor modifications. We start with the case \(\alpha _x \geqslant 2 + \xi ^{1/4}\) and \({\textbf{w}}\perp \textbf{v}_\pm ^\tau (x)\), which corresponds to the case \(x \in {\mathcal {V}}\) in [12, Proposition 3.14]. The eigenvalue-eigenvector relation is used in [12, proof of Proposition 3.14 (i)] only in [12, eq. (3.55)]. Using (5.36) instead of the eigenvalue-eigenvector relation and the notation of [12, proof of Proposition 3.14 (i)], we now verify the first two steps in [12, eq. (3.55)]. We write \(P_{2r_\star - 1}\) for the orthogonal projection defined by restriction to the set \(B_{2r_\star - 1}^\tau (x)\). For any \(i < r_\star \), since \({{\,\textrm{supp}\,}}{\textbf{g}}_i \subset B_{2r_\star - 1}^\tau (x)\) by [12, eq. (3.52)], we have \(P_{2r_\star -1} {\textbf{g}}_i = {\textbf{g}}_i\). Therefore, for any \(i < r_\star \), we obtain

Here, we used (5.36) and \({\textbf{w}} \perp {\textbf{v}}_\pm ^\tau (x)\) in the second step and \({{\,\textrm{supp}\,}}{\textbf{g}}_i \subset B_{2r_\star -1}^\tau (x)\) as well as the definition of \({\widehat{H}}^{\tau ,x}\) from [12, eq. (3.48)] in the third step.

The remaining steps in [12, eq. (3.55)] and the remainder of [12, proof of Proposition 3.14 (i)] including the case \(2 + \xi ^{1/4} > \alpha _x \geqslant \tau \), which corresponds to the case \(x \in {\mathcal {V}}_\tau {{\setminus }} {\mathcal {V}}\) in [12], are obtained in the same way as in [12]. \(\square \)

Proof of Lemma 5.16

We write \(H^{({\mathcal {V}} {\setminus } {\mathcal {W}})}\) in the block decomposition

$$\begin{aligned} H^{({{\mathcal {V}}}{\setminus }{{\mathcal {W}}})}=\Pi H^{({{\mathcal {V}}}{\setminus }{\mathcal W})}\Pi +{\overline{\Pi }}H^{({{\mathcal {V}}}{\setminus }{\mathcal W})}{\overline{\Pi }}+{\overline{\Pi }}H^{({{\mathcal {V}}}{\setminus }{\mathcal W})}\Pi +\Pi H^{({{\mathcal {V}}}{\setminus }{{\mathcal {W}}})}{\overline{\Pi }}. \end{aligned}$$

The nonzero eigenvalues of the block diagonal arise as the eigenvalues of the individual diagonal blocks, i.e.

$$\begin{aligned}{} & {} {{\,\textrm{spec}\,}}(\Pi H^{({{\mathcal {V}}}{\setminus }{{\mathcal {W}}})}\Pi +{\overline{\Pi }}H^{({\mathcal V}{\setminus }{{\mathcal {W}}})}{\overline{\Pi }}) {\setminus } \{ 0\} \\{} & {} \quad =\big ( {{\,\textrm{spec}\,}}(\Pi H^{({{\mathcal {V}}}{\setminus }{\mathcal W})}\Pi )\cup {{\,\textrm{spec}\,}}({\overline{\Pi }}H^{({{\mathcal {V}}}{\setminus }{\mathcal W})}{\overline{\Pi }}) \big ) {\setminus } \{ 0\} , \end{aligned}$$

counted with multiplicity. Therefore, for any \(1\leqslant i\), \(j\leqslant N\) there are at least \(i+j\) eigenvalues of the block diagonal larger than \(\min \{\lambda _{i}({\overline{\Pi }}H^{({{\mathcal {V}}}{{\setminus }}{\mathcal W})}{\overline{\Pi }}),\, \lambda _{j}(\Pi H^{({{\mathcal {V}}}{{\setminus }}{\mathcal W})}\Pi )\}\) provided this number is positive. Hence, we conclude

$$\begin{aligned} \min \{\lambda _{1}({\overline{\Pi }}H^{({{\mathcal {V}}}{\setminus }{\mathcal W})}{\overline{\Pi }}),\,\lambda _{1+|{{\mathcal {W}}}|}(\Pi H^{({\mathcal V}{\setminus }{{\mathcal {W}}})}\Pi )\} \leqslant \lambda _{2+|{{\mathcal {W}}}|}(H^{({\mathcal V}{\setminus }{{\mathcal {W}}})})+2\Vert \overline{\Pi } \!\, H^{({{\mathcal {V}}}{\setminus }{{\mathcal {W}}})} {\Pi } \Vert . \end{aligned}$$
(5.40)
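In more detail, (5.40) follows by applying the previous observation with \(i = 1\) and \(j = 1 + |{\mathcal {W}} |\), which shows that the block diagonal part has at least \(2 + |{\mathcal {W}} |\) eigenvalues no smaller than \(\min \{\lambda _{1}({\overline{\Pi }}H^{({{\mathcal {V}}}{\setminus }{\mathcal W})}{\overline{\Pi }}),\, \lambda _{1+|{{\mathcal {W}}}|}(\Pi H^{({\mathcal V}{\setminus }{{\mathcal {W}}})}\Pi )\}\), and then using Weyl’s inequality for the off-diagonal part \({\overline{\Pi }}H^{({{\mathcal {V}}}{\setminus }{\mathcal W})}\Pi +\Pi H^{({{\mathcal {V}}}{\setminus }{{\mathcal {W}}})}{\overline{\Pi }}\), whose norm is at most \(2\Vert \overline{\Pi } \!\, H^{({{\mathcal {V}}}{\setminus }{{\mathcal {W}}})} \Pi \Vert \).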

Moreover, using eigenvalue interlacing (Lemma D.3) and Proposition 3.4 (i), we obtain

$$\begin{aligned} \lambda _{2+|{{\mathcal {W}}}|}(H^{({{\mathcal {V}}}{\setminus }{{\mathcal {W}}})})\leqslant \lambda _{2}(H^{({{\mathcal {V}}})}) \leqslant \Lambda (\alpha ^{*})+o(1). \end{aligned}$$
(5.41)

Lemma 5.16 follows from (5.40) and (5.41) provided that we show that there is a constant \(c >0\) such that

$$\begin{aligned} \lambda _{1+|{{\mathcal {W}}}|}(\Pi H^{({{\mathcal {V}}}{\setminus }{{\mathcal {W}}})}\Pi ) \geqslant \Lambda (\alpha ^*) + c, \end{aligned}$$
(5.42)

and

$$\begin{aligned} \Vert \overline{\Pi } \!\, H^{({\mathcal {V}}{\setminus } {\mathcal {W}})} \Pi \Vert =o(1). \end{aligned}$$
(5.43)

For the proof of (5.42), we recall the projections \(\Pi _{{\mathcal {W}}}\) from Definition 5.7 and \(Q_0\) from Definition 5.11. By the definition of \({\textbf{u}}(x)\) for \(x \in {\mathcal {W}}\), we have \(\Pi _{{\mathcal {W}}} = Q_0 \Pi _{{\mathcal {W}}}\). Hence, (5.10) implies \((H - {\mathcal {D}})Q_{0} \Pi _{{\mathcal {W}}} = o(1)\). Applying \(Q_0\) to the last relation yields

$$\begin{aligned} H^{({\mathcal {V}}{\setminus } {\mathcal {W}})} \Pi _{{\mathcal {W}}} = {\mathcal {D}} \Pi _{{\mathcal {W}}} + o(1) \end{aligned}$$
(5.44)

as \(Q_0 H Q_0 = H^{({\mathcal {V}} {{\setminus }} {\mathcal {W}})}\) and \(Q_0 {\mathcal {D}}{{Q}}_{0} = {\mathcal {D}}\). Since \({{\,\textrm{ran}\,}}\Pi _{{\mathcal {W}}} \subset {{\,\textrm{ran}\,}}\Pi \) and \(\Pi _{\mathcal W} {\mathcal {D}} \Pi _{{\mathcal {W}}} = {\mathcal {D}}\), the definition of \({\mathcal {D}}\) and (5.44) imply that \(\Pi H^{({\mathcal {V}}{\setminus } {\mathcal {W}})} \Pi \) has at least \(|{\mathcal {W}} |\) many eigenvalues in \([\min _{x\in {\mathcal {W}}} \lambda (x) - o(1), \max _{x \in {\mathcal {W}}} \lambda (x) + o(1)]\). Note that \(\min _{x \in {\mathcal {W}}} \lambda (x) \geqslant \Lambda (\alpha ^*) + c \) for some constant \(c>0\) by (3.3) and (2.2).

Furthermore, let \({\textbf{q}}\) be as in Proposition 3.4 (iv) with \(r = 1\). From \(\Vert H^{({\mathcal {V}}{{\setminus }} {\mathcal {W}})} \Vert \leqslant \Vert H \Vert \lesssim \sqrt{d}\) by (5.24) and \(\Vert {\mathbb {E}}H \Vert \lesssim \sqrt{d}\), Proposition 3.4 (iv) and \(H^{({\mathcal {V}} {\setminus } {\mathcal {W}})} {\textbf{q}} = H {\textbf{q}}\), we deduce

$$\begin{aligned} H^{({\mathcal {V}} {\setminus } {\mathcal {W}})} {\textbf{w}}_1 = H^{({\mathcal {V}}{\setminus } {\mathcal {W}})} {\textbf{q}} + o(1) = H {\textbf{q}} + o(1) = H{\textbf{w}}_1 + o(1) = \lambda _1(H) {\textbf{w}}_1 + o(1). \end{aligned}$$
(5.45)

Hence, \(\Pi H^{({\mathcal {V}} {\setminus } {\mathcal {W}})}\Pi \) has an eigenvalue at \(\lambda _1(H) + o(1) = \sqrt{d}(1 + o(1)) \gtrsim \sqrt{d} \gg \max _{x\in {\mathcal {W}}} \lambda (x) + o(1)\) by Proposition 3.4 (ii), (3.1) and (3.3). Therefore, \(\Pi H^{({\mathcal {V}}{\setminus } {\mathcal {W}})} \Pi \) has \(1 + |{\mathcal {W}} |\) eigenvalues larger than or equal to \(\Lambda (\alpha ^*) + c\) for some constant \(c>0\). This proves (5.42).

Finally, (5.43) follows from (5.44), \({{\,\textrm{ran}\,}}{\mathcal {D}} \subset {{\,\textrm{ran}\,}}\Pi _{{\mathcal {W}}} \subset {{\,\textrm{ran}\,}}\Pi \) and (5.45). This completes the proof of Lemma 5.16. \(\square \)

6 Eigenvalue spacing: Proof of Proposition 2.4

We recall the definition of the high-probability event \(\Omega \) from Definition 3.5. In this section we use the notation from (5.1), as well as the conditional versions

Throughout this section, we assume that d satisfies (1.5), \(\mu \in (0,1/3)\) and that \(\eta \) satisfies

$$\begin{aligned} 0< \frac{\eta }{2} < \min \bigg \{ \frac{1}{6}, \frac{1}{5} - \frac{\mu }{4}, \frac{1}{3} - \mu \bigg \}. \end{aligned}$$
(6.1)

Proposition 2.4 follows directly from the following result.

Proposition 6.1

For any \(a\ne b\in [N]\), we have

$$\begin{aligned} {\mathbb {P}}_\Omega (a, b \in {\mathcal {W}} ,\, |\lambda (a) -\lambda (b)|\leqslant N^{-\eta } )\leqslant N^{-2 + 2\mu -\eta /4+o(1)}. \end{aligned}$$
(6.2)

Remark 6.2

If we restrict ourselves to the critical regime \(d \asymp \log N\), then Proposition 6.1 can be improved by replacing the factor \(N^{-\eta }\) inside the probability in (6.2) by \(N^{-\eta /2}\). See Remark 6.27 below for more details.

Proof of Proposition 2.4

The condition (6.1) holds by assumptions on \(\mu \) and \(\eta \). A union bound and Proposition 6.1 yield

where we used that \({\mathbb {P}}(\Omega ^c) = o(1)\) by definition of \(\Omega \) (recall Definition 3.5). As \(\eta > 8 \mu \), we conclude that the right-hand side is o(1). \(\square \)

6.1 Key tools of the proof of Proposition 6.1

The rest of this section is devoted to the proof of Proposition 6.1. Throughout, we fix deterministic vertices \(a \ne b \in [N]\) and suppose that \(\eta \) satisfies (6.1). We use the following definitions. Let

(6.3)

In particular, \(d^{r + 1} \leqslant N^{\eta /2} < d^{r+2}\). Note that r from (6.3) satisfies the condition of Proposition 3.1 (ii) and (3.2), i.e. the condition of Proposition 3.2.

Fig. 7

An illustration of the \(\sigma \)-algebra \({\mathcal {F}}_i\). Here \(i = 3\), and the vertex b is drawn in green. Conditioning on \({\mathcal {F}}_i\) means that the graph is fixed in the ball \(B_i(b)\) and its complement, drawn in grey. The only randomness is the choice of the edges from \(S_i(b)\) to \(B_i(b)^c\), drawn in blue. By Lemma 6.4, these edges are chosen independently with probability d/N

Definition 6.3

We define the \(\sigma \)-algebra

for \(0 \leqslant i\leqslant r\), and we abbreviate \({\mathcal {F}} \equiv {\mathcal {F}}_r\).

More explicitly, we define \({\mathcal {F}}_i\) inductively through the filtration \({\mathcal {G}}_1 \subset {\mathcal {G}}_2 \subset \cdots \subset \mathcal G_i\), where \({\mathcal {G}}_1 = \sigma (A |_{\{b\}})\) and \({\mathcal {G}}_{k+1} = \sigma ({\mathcal {G}}_k, A |_{S_k(b)})\), since by construction \(S_k(b)\) is \({\mathcal {G}}_k\)-measurable. Then we set \({\mathcal {F}}_i = \sigma ({\mathcal {G}}_i, A |_{B_{i}(b)^c})\), using that \(B_i(b)\), and hence \(B_i(b)^c\), is \({\mathcal {G}}_i\)-measurable. See Fig. 7 for an illustration of \({\mathcal {F}}_i\). The following lemma is an immediate consequence of the definition of \({\mathcal {F}}_i\) and the independence of the family \((A_{xy})\).

Lemma 6.4

Conditionally on \({\mathcal {F}}_i\), the random variables are independent Bernoulli random variables with mean d/N.

Definition 6.5

For \(0 \leqslant i \leqslant r\) define and .

By definition, H(i) is \({\mathcal {F}}_i\)-measurable. We shall need the following regularization of the function \(t \mapsto t^{-1}\).

Definition 6.6

Define \(\iota :{\mathbb {R}}\rightarrow {\mathbb {R}}\) through

(6.4)

with .

Remark 6.7

The function \(\iota \) is an involution on \({\mathbb {R}}\) with Lipschitz constant \(T^2\).

Definition 6.8

For \(x \in S_i(b)\) we define .

Definition 6.9

For each \(z \in {\mathbb {R}}\), we define the family \((g_x(z))_{x\in B_r(b){{\setminus }} \{b\}}\) recursively through

$$\begin{aligned} {\left\{ \begin{array}{ll} g_x(z) = - \iota \Big ( z + \frac{1}{d} \sum _{y \in S_1^+(x) {\setminus } {\mathcal {V}}^{(B_r(b))}} G_{yy}(r,z)\Big ) &{} \text { if }x \in S_r(b) \\ g_x(z) = -\iota \Big (z + \frac{1}{d} \sum _{y \in S_1^+(x)} g_y(z) \Big ) &{} \text { if }x \in B_{r - 1}(b){\setminus } \{b\}. \end{array}\right. } \end{aligned}$$
(6.5)

The following remark contains a crucial independence property of the family \((g_x(z))_x\).

Remark 6.10

Conditioned on \({\mathcal {F}}\), if \({\mathbb {G}}|_{B_r(b)}\) is a tree and z is \({\mathcal {F}}\)-measurable, then for any \(1 \leqslant i \leqslant r\) the family \((g_x(z))_{x \in S_i(b)}\) is independent. To see this, we first note that \((g_x(z))_{x \in S_r(b)}\) is independent conditioned on \(\mathcal F\) because of Lemma 6.4. For \(i \leqslant r - 1\), the statement follows inductively by using the tree structure of \({\mathbb {G}}|_{B_r(b)}\).

Remark 6.11

The function \(\iota :{\mathbb {R}}\rightarrow {\mathbb {R}}\) is a regularized version of the function \(t \mapsto t^{-1}\) as a function \({\mathbb {R}}_+ \rightarrow {\mathbb {R}}_+\). The regularization acts on both small values of t (for \(t < T^{-1}\)) and large values of t (for \(t > T\)). The former regularization is needed to ensure the Lipschitz continuity of the function \(\iota \), which is used in the proof of Proposition 6.15 to ensure the stability of \(g_x(z)\) under change of the argument z. The latter regularization is needed to ensure the Lipschitz continuity of the function \(\iota ^{-1}\), which is used in the proof of Proposition 6.26 below to ensure that anticoncentration of a random variable is preserved, up to a factor \(T^2\), by applying \(\iota \). In the proof of Proposition 6.15 below, we show and use that with high probability the argument of \(\iota \) is always contained in the interval \([T^{-1}, T]\) where \(\iota \) coincides with \(t \mapsto t^{-1}\). Moreover, we choose the lower and upper bounds, \(T^{-1}\) and T, to be each other’s inverses for convenience, since in that case \(\iota \) is an involution. Actually, an inspection of our proof shows that the lower bound \(T^{-1}\) could be replaced with the larger value \(\kappa /10\). Finally, we note that in the critical regime \(d \asymp \log N\), the parameter T is of order one. This observation can be used to improve Proposition 6.1 somewhat in that regime; see Remarks 6.2 and 6.27.

We now state the three key propositions that underlie the proof of Proposition 6.1: Propositions 6.12, 6.14, and 6.15. Their proofs are postponed to Sects. 6.5, 6.3, and 6.4, respectively.

Using the independence from Remark 6.10, we obtain the following anticoncentration estimate for the family \((g_x(z))_{x\in B_r(b){\setminus }\{b\}}\).

Proposition 6.12

(Anticoncentration of \(g_x\)). Let \(z \in {\mathcal {J}}\) be \({\mathcal {F}}\)-measurable and \((g_x(z))_x\) be defined as in Definition 6.9. Then for any \(a,b \in [N]\) we have

$$\begin{aligned} {\mathbb {P}}_\Omega \bigg (a, b \in {\mathcal {W}}, \, \biggl |\frac{1}{d}\sum _{x\in S_1(b)} g_{x}(z) +z \biggr | \leqslant N^{-\eta } \bigg ) \leqslant N^{-2 + 2\mu -\eta /4+o(1)}. \end{aligned}$$
(6.6)

For each \(0 \leqslant i \leqslant r\) we shall need the following \(\mathcal F_i\)-measurable approximation of \(\lambda (a)\).

Definition 6.13

For \(i \in [r]\) we abbreviate .

By definition, \(\lambda (a,i)\) is \({\mathcal {F}}_i\)-measurable. The next result states that \(\lambda (a,i)\) is with high probability close to \(\lambda (a)\).

Proposition 6.14

(Comparison of \(\lambda (a)\) and \(\lambda (a,i)\)). For any small enough \(\varepsilon >0\), we have

$$\begin{aligned} {\mathbb {P}}_\Omega \bigl (a, b \in {\mathcal {W}}, \, \exists i \in [r],\, | \lambda (a)-\lambda (a,i) |\geqslant \varepsilon \bigr )\leqslant \varepsilon ^{-2} N^{-3 + 2\mu +\eta /2+o(1)}. \end{aligned}$$

The next result states that, when choosing the spectral parameter \(z = \lambda (a,r)\), the Green function entries of \(H^{({\mathcal {V}})}\) on \(S_1(b)\) are well approximated by the family \((g_x)\) from Definition 6.9.

Proposition 6.15

(Approximation of Green function by \(g_x\)). Let \((g_x(z))_x\) be defined as in Definition 6.9 with \(z = \lambda (a,r)\). For any constant \(c > 0\) and any \(\varepsilon \leqslant N^{-c}\), we have

$$\begin{aligned} {\mathbb {P}}_\Omega \Bigl (a,b \in {\mathcal {W}},\, \exists x,y\in S_1(b), \, |(H^{({\mathcal {V}})}-z)^{-1}_{xy} -g_x(z)\mathbb {1}_{x=y}|\geqslant \varepsilon \Bigr )\leqslant \varepsilon ^{-2} N^{-3 + 2\mu + 2 \eta + o(1)}. \end{aligned}$$

6.2 Proof of Proposition 6.1

In this subsection we prove Proposition 6.1. We begin by introducing the following events that we use throughout this section. Recall the definitions of the sets \({\mathcal {V}}^{(X)}\) and \({\mathcal {W}}^{(X)}\) from (5.4) and (5.5).

Definition 6.16

We define

and, for \(1 \leqslant i \leqslant r\),

Remark 6.17

We record the following straightforward properties of \(\Xi \) and \(\Xi _i\).

  1. (i)

    \(\Xi _i\) is \({\mathcal {F}}_i\)-measurable.

  2. (ii)

    \(\Xi _i \subset \Xi \) (by definition of \({\mathcal {W}}^{(B_i(b))}\)).

  3. (iii)

    On \(\Omega \), for any \(b \in {\mathcal {W}}\) and \(1 \leqslant i \leqslant r\) we have \({\mathcal {V}}^{(B_i(b))} = {\mathcal {V}} {{\setminus }} \{b\}\) and \(\mathcal W^{(B_i(b))} = {\mathcal {W}}{{\setminus }} \{b\}\) (see Proposition 3.2 (iii)). In particular, \(\Omega \cap \Xi _i = \Omega \cap \Xi \) for all \(1 \leqslant i \leqslant r\).

Next, we state a basic result that, together with Lemma 3.8, is used throughout this section to establish the boundedness of the Green function for certain spectral parameters.

Lemma 6.18

On \(\Omega \cap \Xi \) we have \(\lambda (a,i) = \Lambda (\alpha _a) + o(1)\) and \(\lambda (a,i) \in {\mathcal {J}}\) for all \(1 \leqslant i \leqslant r\).

Proof

By Remark 6.17 (iii), on \(\Omega \) the assumptions of Corollary 3.6 are satisfied for \(x = a\) and \(X =( {\mathcal {V}}^{(B_{i}(b))} \cup B_{i}(b)) {{\setminus }} \{a\}\). Therefore, \(\lambda (a,i) = \Lambda (\alpha _a)+o(1)\), from which we conclude that \(\lambda (a,i) \geqslant \Lambda (\alpha ^*) + \kappa /4\) by the definition (2.2) of \({\mathcal {W}}\) as well as \(\lambda (a,i) \ll \sqrt{d}\) by (3.1). \(\square \)

Proof of Proposition 6.1

By spectral decomposition of \(H^{({\mathcal {V}} {\setminus } \{b\})}\), we have for any \(t > 0\). By Corollary 3.7, on \(\Omega \) we have , which implies

$$\begin{aligned} \lim _{t \downarrow 0} \frac{1}{(H^{({\mathcal {V}} {\setminus } \{b\})} - \lambda (b) - \textrm{i}t)^{-1}_{bb}} = 0, \end{aligned}$$

and hence Schur’s complement formula yields

$$\begin{aligned} \lambda (b) + \frac{1}{d} \sum _{x,y \in S_1(b)} ( H^{({\mathcal {V}})}-\lambda (b))_{xy}^{-1} =0. \end{aligned}$$

Therefore, with \(z=\lambda (a,r)\), we obtain from the definition of the family \((g_x(\lambda (a,r)))_x\) in Definition 6.9 that

$$\begin{aligned}&\biggl |\lambda (a,r) + \frac{1}{d}\sum _{x\in S_1(b)} g_{x}(\lambda (a,r)) \biggr |\\&\quad \leqslant |\lambda (a,r) - \lambda (b)| + \frac{1}{d} \sum _{x,y \in S_1(b)} \bigl |(H^{({\mathcal {V}})}-\lambda (a,r))^{-1}_{xy} - (H^{({\mathcal {V}})}-\lambda (b))^{-1}_{xy} \bigr |\\&\quad \quad + \frac{1}{d} \sum _{x,y \in S_1(b)} |g_x(\lambda (a,r))\mathbb {1}_{x=y}- (H^{({\mathcal {V}})}-\lambda (a,r))^{-1}_{xy}|\\&\quad \leqslant C\kappa ^{-2}(\log N)^2 (|\lambda (a)-\lambda (b)|+ |\lambda (a,r)-\lambda (a)|) \\&\quad \quad + \frac{1}{d} \sum _{x,y \in S_1(b)} |g_x(\lambda (a,r))\mathbb {1}_{x=y} - ( H^{(\mathcal V)}-\lambda (a,r))^{-1}_{xy}| \end{aligned}$$

on the event \(\Omega \), where we used (3.5) and Proposition 3.2 (i) for the second inequality. Here C is some positive constant. Thus, for any \(\gamma \geqslant 0\), we obtain

$$\begin{aligned}{} & {} {\mathbb {P}}_{\Omega \cap \Xi }(C\kappa ^{-2}(\log N)^2|\lambda (b)-\lambda (a)|\leqslant \gamma ) \\{} & {} \quad \leqslant {\mathbb {P}}_{\Omega \cap \Xi }\bigg (\biggl |\frac{1}{d}\sum _{x\in S_1(b)} g_{x}(\lambda (a,r)) +\lambda (a,r) \biggr |\leqslant 3\gamma \bigg ) \\{} & {} \quad \quad + {\mathbb {P}}_{\Omega \cap \Xi }\big (C\kappa ^{-2}(\log N)^2|\lambda (a)-\lambda (a,r)|\geqslant \gamma \big ) \\{} & {} \quad \quad + {\mathbb {P}}_{\Omega \cap \Xi }\bigg (\frac{1}{d} \sum _{x,y \in S_1(b)} \Bigl |( H^{(\mathcal V)}-\lambda (a,r))^{-1}_{xy}-g_x(\lambda (a,r))\mathbb {1}_{x=y} \Bigr |\geqslant \gamma \bigg ). \end{aligned}$$

Now Proposition 6.1 follows with the choice , applying Proposition 6.12 to the first line, Proposition 6.14 to the second line, and Proposition 6.15 combined with Proposition 3.2 (i) on \(\Omega \) to the third line. Here we used Lemma 6.18 to ensure that \(\lambda (a,i) \in {\mathcal {J}}\). \(\square \)

6.3 Proof of Proposition 6.14

In this subsection we prove Proposition 6.14.

Proof of Proposition 6.14

We follow the proof of Proposition 5.3. Let \(\textbf{u}(a,i)\) be a normalized eigenvector of \(H^{(({\mathcal {V}}^{(B_i(b))}\cup B_i(b)){{\setminus }} \{ a \})}\) associated with the eigenvalue \(\lambda (a,i)\). Then, as \({{\,\textrm{supp}\,}}{\textbf{u}}(a,i) \subset ({\mathcal {V}}^{(B_i(b))} \cup B_i(b))^c \cup \{a \}\), on the event \(\Omega \cap \{ a, b \in {\mathcal {W}}\}\), using Remark 6.17 (iii) we obtain

$$\begin{aligned} (H^{({\mathcal {V}}{\setminus }{\{a\}})}-\lambda (a,i)){\textbf{u}}(a,i)\!=\! (H^{({\mathcal {V}} {\setminus } \{a \})} - H^{(({\mathcal {V}}^{(B_i(b))}\cup B_i(b)){\setminus } \{ a \})}){\textbf{u}}(a,i)\!=\!\sum _{y\in S_{i}({b})}\varepsilon _{ay}(i) {\textbf{1}}_{y}, \end{aligned}$$
(6.7)

where . From Lemma 6.4 we conclude, using Cauchy-Schwarz,

(6.8)

where we used that on \(\Omega \) we have \(|S_1^+(y) | \lesssim \log N\), that \({\textbf{u}}(a,i)\) is \({\mathcal {F}}_i\)-measurable, and that by Lemma 6.4 we have \({\mathbb {P}}(v \in S_1^+(y) \,|\, {\mathcal {F}}_i) = \frac{d}{N}\) for any \(y \in S_i(b)\) and \(v \notin B_i(b)\).

By the definition (2.2) of \({\mathcal {W}}\), we have \(\Lambda (\alpha _a) \geqslant \Lambda (\alpha ^*) + \kappa /2\), and on \(\Omega \) we have \(\Lambda (\alpha _a) \ll \sqrt{d}\) (recall (3.1)). Hence, by Corollary 3.6, for any small enough \(\varepsilon > 0\), if there are a scalar \({\widehat{\lambda }}\) and a normalized vector \(\widehat{{\textbf{u}}}\) such that, for some \(a \in \mathcal W\), \(\Vert (H^{({\mathcal {V}}{{\setminus }} \{ a\})} - {\widehat{\lambda }})\widehat{{\textbf{u}}} \Vert \leqslant \varepsilon \) and \({\widehat{\lambda }} = \Lambda (\alpha _a) + o(1)\), then \(|\lambda (a) - {\widehat{\lambda }} | \leqslant \varepsilon \) (as \(\lambda (a) = \lambda _2(H^{({\mathcal {V}}{{\setminus }} \{a \})})\)).

We apply this observation to the choices \({\widehat{\lambda }} = \lambda (a,i)\) and \(\widehat{{\textbf{u}}} = {\textbf{u}}(a,i)\), for which Lemma 6.18 yields \(\lambda (a,i) = \Lambda (\alpha _a) + o(1)\). Hence, for any small enough \(\varepsilon > 0\), we have \(|\lambda (a) - \lambda (a,i) | \leqslant \varepsilon \) provided that \(\Vert (H^{({\mathcal {V}}{{\setminus }} \{a\})}- \lambda (a,i)) {\textbf{u}}(a,i) \Vert \leqslant \varepsilon \). We can then estimate

$$\begin{aligned}&{\mathbb {P}}_\Omega (a,b \in {\mathcal {W}}, \,| \lambda (a)-\lambda (a,i)|> \varepsilon ) \\&\quad \leqslant {\mathbb {P}}_\Omega \left( \Xi _i \cap \Bigl \{\bigl \Vert ( H^{({\mathcal {V}}{\setminus }{\{a\}})}-\lambda (a,i) ){\textbf{u}}(a,i) \bigr \Vert ^2> \varepsilon ^2\Bigr \}\right) \\&\quad \leqslant \varepsilon ^{-2} \,{\mathbb {E}}\Bigg [\mathbb {1}_{\Xi _i} \mathbb {1}_{|S_i(b) | \leqslant N^{\eta /2 + o(1)}} {\mathbb {E}}_\Omega \Bigg [\sum _{y\in S_{i}({b})}(\varepsilon _{ay}(i))^{2}\,\bigg |\,{{\mathcal {F}}}_{i}\Bigg ] \Bigg ] \\&\quad \leqslant \varepsilon ^{-2} \, {\mathbb {P}}(\Xi _i) \, \frac{\log N}{N} N^{\eta /2 + o(1)} \leqslant \varepsilon ^{-2} \, N^{2\mu +\eta /2-3+o(1)}, \end{aligned}$$

where in the first step we used Remark 6.17 (iii), in the second step (6.7), Remark 6.17 (i), and the estimate \(|S_i(b) | \leqslant N^{\eta /2 + o(1)}\) on \(\Omega \) by Proposition 3.2 (ii) and (6.3), in the third step (6.8), and in the fourth step Remark 6.17 (ii) and Lemma 6.19 below. Now Proposition 6.14 follows from a union bound over \(i \in [r]\) with \(r \lesssim \log N\). \(\square \)

The following result is used throughout the rest of this section.

Lemma 6.19

For any \(a \ne b \in [N]\) we have \({\mathbb {P}}(a,b \in {\mathcal {W}}) \leqslant N^{- 2 + 2 \mu }\).

Proof

Since \(0\leqslant \Lambda '(\alpha )=\frac{\alpha -2}{2(\alpha -1)^{3/2}} \leqslant \frac{1}{2}\) for all \(\alpha \geqslant 2\), we have

$$\begin{aligned} {\mathbb {P}}(a,b \in {\mathcal {W}})&\leqslant {\mathbb {P}}\Big ( \min \{|S_1(a)|,|S_1(b)| \} \geqslant (\alpha ^*+\kappa ) d \Big ) \\&= {\mathbb {E}}\Bigl [{\mathbb {P}}\Big ( \min \{|S_1(a)|,|S_1(b)| \} \geqslant (\alpha ^*+\kappa ) d \,\Big |\, A_{ab}\Big )\Bigr ] \\&\leqslant {\mathbb {P}}\big (|S_1(a)| \geqslant (\alpha ^*+\kappa ) d -1\big )^2 \leqslant {\mathbb {P}}\big ( |S_1(a) | \geqslant \alpha ^* d \big )^2 \leqslant N^{-2+2\mu }, \end{aligned}$$

where we used in the third step that conditionally on \(A_{ab}\), \(S_1(a)\) and \(S_1(b)\) are independent and in the last step the definition of \(\alpha ^*\). \(\square \)
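
As a sanity check of the decoupling step, not needed for the proof, one can compare the two sides of the last chain of inequalities by Monte Carlo; in the following Python sketch the values of N, d and the threshold are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k, trials = 2000, 20, 25, 200_000   # illustrative parameters only
p = d / N

# Degrees of two fixed vertices a, b in G(N, d/N); they share only the edge {a, b},
# which is exactly what the conditioning on A_ab removes.
shared = rng.binomial(1, p, size=trials)
deg_a = shared + rng.binomial(N - 2, p, size=trials)
deg_b = shared + rng.binomial(N - 2, p, size=trials)

lhs = np.mean(np.minimum(deg_a, deg_b) >= k)                       # joint tail
rhs = np.mean(rng.binomial(N - 2, p, size=trials) >= k - 1) ** 2   # decoupled bound
print(lhs, rhs)  # empirically lhs <= rhs, in line with the third step above
```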

6.4 Proof of Proposition 6.15

This subsection is devoted to the proof of Proposition 6.15. We begin with the following result, which contains two estimates. The first one is an approximate version of Schur’s complement formula, where \(G(i - 1,z)\) is related to \(G(i,z)\) at the cost of an error term; this amounts to removing not just the vertex \(x \in S_i(b)\) at which the Green function is evaluated, but the entire ball \(B_i(b)\). The second estimate provides an upper bound on the off-diagonal entries of the Green function.
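
Before stating the lemma, we note that the exact identity behind (6.9) is Schur’s complement formula for the resolvent, which is easy to verify numerically. The following Python sketch does so for a small symmetric matrix with an illustrative size, normalization, and spectral parameter; the matrices appearing in Lemma 6.20 are of course rescaled adjacency matrices with rows and columns removed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, z = 8, 4.0, 5.0   # illustrative size, normalization, and spectral parameter

# A small symmetric matrix with zero diagonal and 1/sqrt(d)-scaled 0/1 entries.
A = rng.integers(0, 2, size=(n, n))
H = (np.triu(A, 1) + np.triu(A, 1).T) / np.sqrt(d)

x = 0
G = np.linalg.inv(H - z * np.eye(n))                   # (H - z)^{-1}
Hx = np.delete(np.delete(H, x, axis=0), x, axis=1)     # H with the vertex x removed
Gx = np.linalg.inv(Hx - z * np.eye(n - 1))             # (H^{(x)} - z)^{-1}
h = np.delete(H[x], x)                                 # x-th row of H without entry x

# Schur's complement formula with H_xx = 0:
#   -1 / (H - z)^{-1}_{xx} = z + sum_{u,v != x} H_xu (H^{(x)} - z)^{-1}_{uv} H_vx,
# and the double sum equals (1/d) times a sum over the neighbours of x, as in (6.18).
print(np.isclose(-1.0 / G[x, x], z + h @ Gx @ h))      # True up to rounding
```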

Lemma 6.20

 

  1. (i)

    For \(1 \leqslant i \leqslant r\) let \(z_i \in {\mathcal {J}}\) be \({\mathcal {F}}_i\)-measurable and \(x \in S_i(b)\). Then

    $$\begin{aligned} -\frac{1}{G_{xx}(i-1,z_i)} = z_i + \frac{1}{d} \sum _{y \in S_1^+(x)} G_{yy}(i,z_i) +{\mathcal {E}}_i(x) \end{aligned}$$
    (6.9)

    where the error term \({\mathcal {E}}_i(x)\) satisfies, for any \(\varepsilon > 0\),

    $$\begin{aligned} {\mathbb {P}}_\Omega \left( \left| {\mathcal {E}}_i(x) \right| > \varepsilon \,|\, {\mathcal {F}}_i \right) \mathbb {1}_{b \in {\mathcal {W}}} \lesssim d^2 \kappa ^{-4} (\log N)^2 |S_i(b) |^2 N^{-1} \varepsilon ^{-2} . \end{aligned}$$
    (6.10)
  2. (ii)

    Let \(z \in {\mathcal {J}}\) be \({\mathcal {F}}_1\)-measurable. For any \(x \ne y \in [N]\) and \(\varepsilon > 0\), we have

    $$\begin{aligned} {\mathbb {P}}_\Omega \big ( | (H^{({\mathcal {V}})} - z)^{-1}_{xy} | \geqslant \varepsilon \,|\, {\mathcal {F}}_1 \big ) \mathbb {1}_{b \in {\mathcal {W}}} \mathbb {1}_{x,y \in S_1(b)} \lesssim \kappa ^{-4} (\log N) N^{-1} \varepsilon ^{-2}. \end{aligned}$$
    (6.11)

Proof of Proposition 6.15

We choose \(z_i := \lambda (a,i)\), which is \({\mathcal {F}}_i\)-measurable, and set \(z = z_r\). For \(\varepsilon > 0\) we introduce the event given by

with \({\mathcal {E}}_i(x)\) defined as in Lemma 6.20. We estimate the probability of \(\Theta _1^c\) using \(\Xi _i \in {\mathcal {F}}_i\), Lemmas 6.20 (i) and 6.19, Proposition 3.2 (ii), as well as Remark 6.17, as

$$\begin{aligned} {\mathbb {P}}_{\Omega \cap \Xi _i}(\exists x\in S_i (b) , |{\mathcal {E}}_i(x)| > \varepsilon ) \leqslant {\mathbb {P}}_{\Omega }(\Xi ) N^{-1 + 3\eta /2 + o(1)} \varepsilon ^{-2} \leqslant N^{-3 + 2 \mu + 3\eta /2 + o(1)} \varepsilon ^{-2}. \end{aligned}$$

Similarly, we find using Lemma 6.20 (ii) that

$$\begin{aligned} {\mathbb {P}}_{\Omega \cap \Xi _1}(\Theta _3^c) \leqslant {\mathbb {P}}_\Omega (\Xi ) N^{-1 + \eta + o(1)} \varepsilon ^{-2} \leqslant N^{-3 + 2 \mu + \eta /2 + o(1)} \varepsilon ^{-2}. \end{aligned}$$

Hence, using Remark 6.17 and Proposition 6.14, we have

$$\begin{aligned} {\mathbb {P}}_\Omega (\Theta ^c\cap \Xi )&\leqslant \sum _{i=1}^r \bigg ( {\mathbb {P}}_{\Omega \cap \Xi _i}(\exists x\in S_i (b) , |{\mathcal {E}}_i(x)|> \varepsilon )+{\mathbb {P}}_{\Omega \cap \Xi } (|z-z_i| > \varepsilon ) \bigg )\nonumber \\&\quad + {\mathbb {P}}_{\Omega \cap \Xi _1} \big ( \Theta _3^c\big ) \nonumber \\&\leqslant N^{-3 + 2 \mu +2\eta +o(1)} \varepsilon ^{-2}. \end{aligned}$$
(6.12)

We shall show below that there is a constant \(C >0\) such that on the event \(\Omega \cap \Theta \cap \Xi \) we have

$$\begin{aligned} |g_{x}(z)-G_{xx}(i - 1 ,z)|\leqslant C \varepsilon \sum _{j=0 }^{r-i+1} (C\kappa ^{-4}d^{-1})^{j}|S_{j}^{+}(x)|\leqslant N^{o(1)} \varepsilon \end{aligned}$$
(6.13)

for all \(1 \leqslant i \leqslant r\) and all \(x \in S_i(b)\). The second inequality in (6.13) follows from \(r\ll \log N\) and \(d^{-j}|S_j^+(x)|\leqslant d^{-j}|B_j(x)|\lesssim \log N\) for all \(x\in [N]\) by Proposition 3.2 (ii).

Before proving (6.13), we conclude Proposition 6.15 from (6.13) (after renaming \(\varepsilon \mapsto N^{-o(1)}\varepsilon \)) with \(i = 1\), the definition of \(\Theta _3\) and (6.12), where we used that \(G(0,z) = (H^{({\mathcal {V}})}-z)^{-1}\) on \(\Omega \cap \{ b \in {\mathcal {W}}\}\) as \({\mathcal {V}}^{(B_i(b))} = {\mathcal {V}} {{\setminus }} \{ b\}\).

What remains, therefore, is the proof of (6.13). We prove it by inductively decreasing i starting from \(i = r + 1\). By convention, for any \(x\in S_{r+1}(b)\) we denote , so that (6.13) trivially holds for \(i = r+1\). Therefore, we can assume throughout the following argument that \(g_x(z)\) is defined by the second case in Definition 6.9 for all \(x \in B_r(b) {\setminus } \{ b\}\) (note that on the event \(\Omega \cap \Xi \) we have \(S_1^+(x) {{\setminus }} {\mathcal {V}}^{(B_r(b))} = S_1^+(x)\) for any \(x \in S_r(b)\), by Proposition 3.2 (iii)).

To verify the induction step, we assume that (6.13) holds on \(S_j(b)\) for all \(i+1 \leqslant j \leqslant r + 1\) and consider \(x\in S_i(b)\). We first show that on \(\Omega \cap \Xi \cap \Theta \)

$$\begin{aligned} g_x(z) = -\bigg (z+\frac{1}{d}\sum _{y\in S_{1}^{+}(x)}g_{y}(z)\bigg )^{-1}. \end{aligned}$$
(6.14)

To that end, we conclude from the induction hypothesis, (3.5), the definition of \(\Theta _2\) and Proposition 3.2 (i) that

$$\begin{aligned}&\biggl |\frac{1}{d}\sum _{y\in S_{1}^{+}(x)}G_{yy}(i,z_{i})-\frac{1}{d}\sum _{y\in S_{1}^{+}(x)}g_{y}(z) \biggr |\nonumber \\&\quad \leqslant \frac{1}{d}\sum _{y\in S_{1}^{+}(x)}\left( |G_{yy}(i,z)-g_{y}(z)|+|G_{yy}(i,z_{i})-G_{yy}(i,z)|\right) \nonumber \\&\quad \leqslant d^{-1}|S_1(x) | \bigg (\max _{y \in S_1^+(x)} |G_{yy}(i,z)-g_{y}(z)|+(8/\kappa )^{2}|z-z_{i}|\bigg ) \nonumber \\&\quad \leqslant N^{o(1)} \varepsilon . \end{aligned}$$
(6.15)

By Lemma 6.18, on \(\Omega \cap \Xi \) we have \(z_i \in {\mathcal {J}}\). We recall from Lemma 3.9 that all Green function entries \(G_{xx}(i-1,z_i)\) and \(G_{yy}(i,z_{i})\) are negative for \(z_i \in {\mathcal {J}}\). We use the upper bound (3.4) for \(-G_{xx}(i-1,z_i)\) as well as (6.9) to obtain, on \(\Omega \cap \Xi \cap \Theta _1 \cap \Theta _2\),

$$\begin{aligned} \frac{10}{9T}&\leqslant \frac{\kappa }{9}\leqslant -\frac{1}{G_{xx}(i-1,z_i)}-{\mathcal {E}}_i(x)= z_{i}+\frac{1}{d}\sum _{y\in S_{1}^{+}(x)}G_{yy}(i,z_{i})\nonumber \\&\leqslant \Lambda (\alpha _{a})+ o(1) \leqslant \sqrt{2 \alpha _a} + o(1) \leqslant \frac{T}{2}. \end{aligned}$$
(6.16)

Here, we also used Corollary 3.6, which is applicable as \({\mathcal {V}}^{(B_i(b))} = {\mathcal {V}} {\setminus } \{b \}\) on \(\Omega \cap \{b\in {\mathcal {W}}\}\) by Proposition 3.2 (iii), \(\alpha _a \leqslant 10 d^{-1}\log N\) on \(\Omega \) by Proposition 3.2 (i), and the definitions of \(\Lambda \) and T. Then, by the assumption \(\varepsilon \leqslant N^{-c}\), we find that (6.16) and (6.15) imply \(T^{-1}\leqslant z+\frac{1}{d}\sum _{y\in S_{1}^{+}(x)}g_{y}(z)\leqslant T\), which yields (6.14) by the definitions of \(\iota \) and \(g_x(z)\) in (6.4) and Definition 6.9, respectively. This concludes the proof of (6.14).

Finally, we find on \(\Omega \cap \Xi \cap \Theta \)

$$\begin{aligned}&|g_{x}(z)-G_{xx}(i-1,z)|\\&\quad \leqslant g_{x}(z)G_{xx}(i-1,z_i)\left| g_{x}(z)^{-1}-G_{xx}(i-1,z_i)^{-1}\right| +(8/\kappa )^{2}|z-z_i|\\&\quad \leqslant (8/\kappa )^2 \bigg (\biggl |\frac{1}{d}\sum _{y\in S_{1}^{+}(x)}G_{yy}(i,z_{i})-\frac{1}{d}\sum _{y\in S_{1}^{+}(x)}g_{y}(z) \biggr |+|z-z_{i}|+|{{\mathcal {E}}}_{i}(x) |\bigg ) \\&\quad \leqslant (8/\kappa )^2\bigg (\frac{1}{d}\sum _{y\in S_{1}^{+}(x)} \sum _{j=0 }^{r-i} (C\kappa ^{-4}d^{-1})^{j}|S_{j}^{+}(y)|C \varepsilon +((8/\kappa )^2 d^{-1}|S_{1}^{+}(x)|+1)|z-z_{i}|+|{\mathcal E}_{i}(x) | \bigg )\\&\quad \leqslant C^{-1}\bigg (64 \sum _{j=1}^{r-i+1} (C\kappa ^{-4}d^{-1})^{j}|S_{j}^{+}(x)|+64^2\kappa ^{-4}(d^{-1}|S_{1}^{+}(x)|+2)\bigg )C \varepsilon . \end{aligned}$$

Here we used (3.5) in the first inequality, (6.9) and (6.14) in the second, (3.4) and the induction hypothesis in the third, and \(|S_{j+1}^+(x)| =\sum _{y\in S_{1}^{+}(x)} |S_{j}^{+}(y)|\) together with the definitions of \(\Theta _1\) and \(\Theta _2\) in the last one. We conclude (6.13) for large enough C. \(\square \)

What remains is the proof of Lemma 6.20.

Proof of Lemma 6.20

We begin with (i). Throughout the proof, we fix i and we condition on \({\mathcal {F}}_i\). We always assume that \(b \in {\mathcal {W}}\), which is an \({\mathcal {F}}_i\)-measurable event since \(i \geqslant 1\).

We start by observing that the event

satisfies \(\Omega \subset \Gamma _i\), by Proposition 3.2 (ii), (iii). See Fig. 8 for an illustration of \(\Gamma _i\).

Fig. 8. An illustration of the event \(\Gamma _i\) for \(i = 3\). The vertices of \({\mathcal {V}}\) are drawn in green. By definition of \(\Gamma _i\), the red edges are forbidden

For the proof we abbreviate

(6.17)

so that on the event \(\Gamma _i\) we have \(H(i-1) = H^{({\mathcal {T}}_i)}\). From Schur’s complement formula we get on the event \(\Gamma _i\)

$$\begin{aligned} -\frac{1}{G_{xx}(i-1,z_i)} = z_i + \frac{1}{d} \sum _{u,v \in S_1^+(x)} (H^{({\mathcal {T}}_i \cup \{x\} )}-z_i)^{-1}_{uv} , \end{aligned}$$
(6.18)

where we used that x has no neighbours in \(S_i(b)\) by definition of \(\Gamma _i\).

We now decompose the error term \({\mathcal {E}}_i(x)\) from (6.9) into several summands estimated separately. To that end, let \(\{ x, y_1, \ldots , y_{|S_i(b) |-1} \} = S_i(b)\) be an enumeration of \(S_i(b)\). We set and

for \(j = 1, \ldots , |S_i(b) | - 1\). Then, from (6.18) we conclude that (6.9) holds with \({\mathcal {E}}_i(x) = {\mathcal {E}}_i^{(0)} - \sum _{j=1}^{|S_i(b) |-1} {\mathcal {E}}_i^{(j)}\). Chebyshev’s and the Cauchy-Schwarz inequalities yield

$$\begin{aligned} {\mathbb {P}}_\Omega ( |{\mathcal {E}}_i(x) | > \varepsilon \,|\, {\mathcal {F}}_i) \leqslant \varepsilon ^{-2} \, |S_i(b) |\bigg ( {\mathbb {E}}_\Omega [|{\mathcal {E}}_i^{(0)} |^2 \,|\,{\mathcal {F}}_i] + \sum _{j=1}^{|S_i(b) |-1} {\mathbb {E}}_\Omega [ |\mathcal E_i^{(j)} |^2 \,|\, {\mathcal {F}}_i] \bigg ). \end{aligned}$$
(6.19)

Therefore, it remains to estimate \({\mathbb {E}}_{\Omega } [ |\mathcal E_i^{(j)} |^2 \,|\, {\mathcal {F}}_i]\) for all \(j = 0, \ldots , |S_i(b) |-1\). To that end, we introduce a \(\sigma \)-algebra refining \({\mathcal {F}}_i\) from Definition 6.3. For any subset \(X \subset S_i(b)\) we define the \(\sigma \)-algebra

using that \(S_i(b)\) is \({\mathcal {F}}_i\)-measurable. See Fig. 9 for an illustration of \({\mathcal {F}}_i(X)\). Moreover, for \(X \subset S_i(b)\) we define the event

We note that for any \(X \subset S_i(b)\), the event \(\Delta _i(X)\) lies in \({\mathcal {F}}_i(X)\), since \(H^{({\mathcal {T}}_i \cup X)}\) is \(\mathcal F_i(X)\)-measurable (see the definition (6.17) and Fig. 9). Furthermore, by Lemma 3.8 for \(z_i \in {\mathcal {J}}\) we have

$$\begin{aligned} \Omega \subset \bigcap _{X \subset S_i(b)} \Delta _i(X). \end{aligned}$$
(6.20)
Fig. 9. An illustration of the \(\sigma \)-algebra \({\mathcal {F}}_i(X)\). Here \(i = 3\), the vertex b is drawn in green, and the set \(X \subset S_i(b)\) is drawn in blue. Conditioning on \({\mathcal {F}}_i(X)\) means that we fix all edges within \(B_i(b)\), within \(B_i(b)^c\), and connecting \(S_i(b) {{\setminus }} X\) with \(B_i(b)^c\). The only randomness is the choice of the edges from X to \(B_i(b)^c\), drawn in blue. After removal of the vertices in \({\mathcal {T}}_i \cup X\) (see (6.17)), only black edges and edges within \(B_i(b)^c\) remain, which shows that \(H^{({\mathcal {T}}_i \cup X)}\) is \({\mathcal {F}}_i(X)\)-measurable. Note that for \(X = S_i(b)\) we have \({\mathcal {F}}_i(X) = {\mathcal {F}}_i\), and we recover the illustration from Fig. 7

For the estimate of \({\mathcal {E}}_i^{(0)}\), we introduce the sets and and the family \((Z_q)_{q \in {\mathcal {Q}}}\) defined by . We first note that, for any \( q = (u,v) \in {\mathcal {Q}}\), we have

$$\begin{aligned} {\mathbb {P}}( q \in Q \,|\, {\mathcal {F}}_i(\{x \})) = {\mathbb {P}}(u \in S_1^+(x), v \in S_1^+(x) \,|\, {\mathcal {F}}_i ) = \frac{d^2}{N^2} . \end{aligned}$$
(6.21)

In the last step we used that, by Lemma 6.4, for any \(X \subset S_i(b)\), conditionally on \({\mathcal {F}}_i(X)\), the random variables are independent Bernoulli-\(\frac{d}{N}\) random variables. Moreover, we get

$$\begin{aligned} \sum _{q \in {\mathcal {Q}}} |Z_q |^2 \mathbb {1}_{\Delta _i(\{x \})} \leqslant {\text {Tr}} ( (H^{({\mathcal {T}}_i \cup \{ x \})} - z_i)^{-2} ) \mathbb {1}_{\Delta _i(\{ x\})} \lesssim N \kappa ^{-2}. \end{aligned}$$
(6.22)

Since \({\mathcal {E}}_i^{(0)} = \frac{1}{d} \sum _{q \in Q} Z_q\), the inclusion \(\Omega \subset \Gamma _i \cap \Delta _i(\{x\})\) and the Cauchy-Schwarz inequality imply

$$\begin{aligned}{} & {} {\mathbb {E}}_\Omega \big [ |{\mathcal {E}}_i^{(0)} |^2 \big | {\mathcal {F}}_i(\{ x\}) \big ]\nonumber \\{} & {} \quad \leqslant \frac{1}{d^2} {\mathbb {E}}_\Omega \bigg [ |Q | \sum _{q \in Q} |Z_q |^2 \bigg | {\mathcal {F}}_i(\{ x\}) \bigg ] \nonumber \\{} & {} \quad \lesssim \frac{(\log N)^2}{ d^2}\, {\mathbb {E}}\bigg [ \sum _{q \in Q} |Z_q |^2 \mathbb {1}_{\Delta _i(\{x \})} \bigg | {\mathcal {F}}_i(\{ x\}) \bigg ] \nonumber \\{} & {} \quad \lesssim \frac{(\log N)^2}{ d^2} \max _{q \in {\mathcal {Q}}} {\mathbb {P}}( q \in Q \,|\, {\mathcal {F}}_i(\{x\})) \sum _{q \in {\mathcal {Q}}} |Z_q |^2 \mathbb {1}_{\Delta _i(\{x \})} \lesssim \frac{ (\log N)^2}{ \kappa ^2 N}, \end{aligned}$$
(6.23)

where we used that \(|S_1(x) | \lesssim \log N\) on \(\Omega \), by Proposition 3.2 (i), as well as the \(\mathcal F_i(\{x \})\)-measurability of \(|Z_q |^2 \mathbb {1}_{\Delta _i(\{x \})}\). The last step follows from (6.21) and (6.22).

To bound \({\mathcal {E}}_i^{(j)}\) for a fixed \(j \in \{ 1, \ldots , |S_i(b) |-1\}\), we conclude on the event \(\Gamma _i\) from the resolvent identity that

$$\begin{aligned} {\mathcal {E}}_i^{(j)} = \frac{1}{ d^{3/2}} \sum _{u \in S_1^+(x)} \big ( H^{({\mathcal {T}}_i\cup \{ y_0, \ldots , y_{j-1}\})} -z_i\big )^{-1}_{uy_j} \sum _{v \in S_1^+(y_j)} \big ( H^{({\mathcal {T}}_i \cup \{ y_0, \ldots , y_{j}\})} - z_i\big )^{-1}_{vu}. \end{aligned}$$

Therefore, by applying the Cauchy-Schwarz inequality twice and using (6.20) with \(X = \{y_0, \dots , y_{j}\}\), we obtain

$$\begin{aligned} |{\mathcal {E}}_i^{(j)} |^2 \mathbb {1}_{\Omega }&\leqslant \mathbb {1}_{\Omega } \frac{|S_1^+(x) |}{d^3} \sum _{u \in S_1^+(x)} |(H^{({\mathcal {T}}_i \cup \{ y_0, \ldots , y_{j-1}\})} - z_i)^{-1}_{uy_j} |^2\nonumber \\&\quad \biggl |\sum _{v \in S_1^+(y_j)} (H^{({\mathcal {T}}_i \cup \{y_0,\ldots ,y_j\})} - z_i)^{-1}_{vu} \biggr |^2 \nonumber \\&\lesssim \mathbb {1}_{\Omega } \frac{|S_1^+(x) | |S_1^+(y_j) |}{d^3\kappa ^2} \sum _{u \in S_1^+(x)} |(H^{(\mathcal T_i \cup \{ y_0, \ldots , y_{j-1}\})} - z_i)^{-1}_{uy_j} |^2 \nonumber \\&\lesssim \mathbb {1}_{\Omega } \frac{(\log N)^2}{d^3\kappa ^2} \sum _{u \in S_1^+(x)} |(H^{({\mathcal {T}}_i \cup \{ y_0, \ldots , y_{j-1}\})} - z_i)^{-1}_{uy_j} |^2, \end{aligned}$$
(6.24)

where in the last step we used \(|S_1(x) | + |S_1(y_j) | \lesssim \log N\) from Proposition 3.2 (i) on \(\Omega \).

We now set for \(u \in B_i(b)^c\), and . We apply \({\mathbb {E}}[ \, \cdot \,|\, {\mathcal {F}}_i(\{y_0, \ldots , y_{j-1}\})]\) to (6.24) and, similarly as in (6.23), obtain

$$\begin{aligned}&{\mathbb {E}}_{\Omega }[|{\mathcal {E}}_i^{(j)} |^2\,|\, {\mathcal {F}}_i(\{y_0, \ldots , y_{j-1}\})] \nonumber \\&\quad \lesssim \frac{(\log N)^2}{ d^3 \kappa ^2} \max _{u \in {\mathcal {U}}} {\mathbb {P}}( u\in U \,|\, {\mathcal {F}}_i(\{ y_0, \ldots , y_{j-1}\})) \sum _{u \in {\mathcal {U}}} |Y_{u} |^2 \mathbb {1}_{\Delta _i(\{ y_0, \ldots , y_{j-1}\})} \nonumber \\&\quad \lesssim \frac{(\log N)^2}{ d^2 \kappa ^4 N}, \end{aligned}$$
(6.25)

where we used that \((Y_u)_{u \in {\mathcal {U}}}\) is \({\mathcal {F}}_i(\{y_0, \dots , y_{j-1}\})\)-measurable, and that \(\sum _{u \in {\mathcal {U}}} |Y_{u} |^2\mathbb {1}_{\Delta _i(\{ y_0, \ldots , y_{j-1}\})} = \Vert (Y_u)_{u \in B_i(b)^c} \Vert ^2 \mathbb {1}_{\Delta _i(\{ y_0, \ldots , y_{j-1}\})} \lesssim \kappa ^{-2}\). We also used the remark following (6.21).

Finally, inserting the estimates (6.23) and (6.25) into (6.19) and using the tower property of the conditional expectation completes the proof of (6.9) and (6.10). This concludes the proof of (i).

Next, we prove (ii). For the proof of (6.11), we fix \(x,y \in S_1(b)\) and conclude from the resolvent identity [23, eq. (3.5)] that

$$\begin{aligned} (H^{({\mathcal {V}})}- z)^{-1}_{xy}&= - (H^{({\mathcal {V}})}-z)^{-1}_{xx} \sum _{u \notin {\mathcal {V}} \cup \{x \}} H_{xu} (H^{({\mathcal {V}} \cup \{x\})} - z)^{-1}_{uy} \\&= - \frac{1}{\sqrt{d}} (H^{({\mathcal {V}})}-z)^{-1}_{xx} \sum _{u \notin {\mathcal {V}} \cup \{x \}} \mathbb {1}_{u \in S_1^+(x)} (H^{({\mathcal {V}}\cup \{x\})} - z)^{-1}_{uy}. \end{aligned}$$

Therefore, the Cauchy-Schwarz inequality and Lemma 3.8 imply

$$\begin{aligned} |(H^{({\mathcal {V}})}- z)^{-1}_{xy} |^2 \mathbb {1}_{\Omega } \lesssim \mathbb {1}_{\Omega } \frac{|S_1^+(x) |}{\kappa ^2 d} \sum _{u \notin \mathcal T_1 \cup \{x \}} \mathbb {1}_{u \in S_1^+(x)}|(H^{({\mathcal {T}}_1 \cup \{x\})} - z)^{-1}_{uy} |^2, \end{aligned}$$

where we also used that \({\mathcal {V}} = \{b \} \cup {\mathcal {V}}^{(B_1(b))}=\mathcal T_1\) on \(\Omega \) by Proposition 3.2 (iii). Hence, since \(|S_1(x) | \lesssim \log N\) on \(\Omega \) by Proposition 3.2 (i), \(\Omega \subset \Delta _1(\{x\})\) and \(\Delta _1(\{x\}) \in {\mathcal {F}}_1(\{x\})\), we obtain

$$\begin{aligned}{} & {} {\mathbb {E}}\big [ |(H^{({\mathcal {V}})}- z)^{-1}_{xy} |^2 \mathbb {1}_{\Omega } \,|\, {\mathcal {F}}_1(\{x \}) \big ]\\{} & {} \quad \lesssim \frac{\log N}{\kappa ^2 d} \max _{u \notin {\mathcal {T}}_1 \cup \{ x\}} {\mathbb {P}}(u \in S_1^+(x) \,|\, {\mathcal {F}}_1(\{x \})) \sum _{u \notin {\mathcal {T}}_1 \cup \{ x\}} |(H^{({\mathcal {T}}_1 \cup \{x\})} - z)^{-1}_{uy} |^2 \mathbb {1}_{\Delta _1(\{x\})}\\{} & {} \quad \lesssim \frac{\log N}{\kappa ^4 N}. \end{aligned}$$

Here, in the last step, we used \({\mathbb {P}}(u \in S_1^+(x) \,|\, \mathcal F_1(\{x \})) \leqslant d/N\) and \(\sum _{u \notin {\mathcal {T}}_1 \cup \{ x\}} |(H^{({\mathcal {T}}_1 \cup \{x\})} - z)^{-1}_{uy} |^2 \mathbb {1}_{\Delta _1(\{x\})} \lesssim \kappa ^{-2}\). Thus, Chebyshev’s inequality and the tower property of the conditional expectation complete the proof of (6.11) and, therefore, the one of Lemma 6.20. \(\square \)

6.5 Proof of Proposition 6.12

This subsection is devoted to the proof of Proposition 6.12. We start by introducing the notion of a robust vertex.

Fig. 10. An illustration of the set \({\mathcal {R}} \equiv {\mathcal {R}}_r({\mathbb {G}}, b)\) of robust vertices. Here \(r = 3\) and \(d = 4\). We draw the ball \(B_r(b)\) around the root b. The robust vertices are drawn in blue and the non-robust vertices in white. In this example, the root b is robust

Definition 6.21

Let \(b \in [N]\) and \(r \in {\mathbb {N}}^*\). We call a vertex \(y \in B_{r}(b)\) robust if

  1. (a)

    \(y\in S_{r}(b)\) or

  2. (b)

    \(y\in B_{r-1}(b)\) and at least d/2 vertices in \(S_1^{+}(y)\) are robust.

We denote by \({\mathcal {R}} \equiv {\mathcal {R}}_r({\mathbb {G}}, b) \subset B_r(b)\) the set of robust vertices.
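
Definition 6.21 is a bottom-up recursion on the ball \(B_r(b)\) and can be evaluated directly. The following Python sketch computes the set of robust vertices for a rooted tree given as a child list; the encoding and the small example at the end are illustrative only.

```python
def robust_vertices(children, root, r, d):
    # Robust vertices of Definition 6.21: every vertex at depth r is robust (case (a)),
    # and a vertex at depth < r is robust iff at least d/2 of its children are (case (b)).
    robust = set()

    def visit(x, depth):
        if depth == r:
            robust.add(x)
            return True
        robust_children = sum(visit(y, depth + 1) for y in children.get(x, []))
        if robust_children >= d / 2:
            robust.add(x)
            return True
        return False

    visit(root, 0)
    return robust

# Tiny example with r = 2 and d = 4: vertex 1 has a single child and is not robust,
# while vertices 2 and 3 are robust, so the root b (two robust children) is robust.
children = {'b': [1, 2, 3], 1: [4], 2: [5, 6], 3: [7, 8]}
print(robust_vertices(children, 'b', r=2, d=4))
```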

Note that \({\mathcal {R}}\) is an \({\mathcal {F}}\)-measurable random set. See Fig. 10 for an illustration of Definition 6.21. The following result states that with high probability the root b is robust, conditioned on \(S_1(a)\) and \(S_1(b)\).

Proposition 6.22

(The root is robust). Suppose that \(\sqrt{\log N} \ll d \lesssim \log N\) and that r satisfies (3.2). Then \({\mathbb {P}}_\Omega (b \notin {\mathcal {R}} \,|\, S_1(a), S_1(b)) \lesssim N^{-1/2}\) whenever \(a,b \in {\mathcal {W}}\).

The proof of Proposition 6.22 is given at the end of this subsection. From now on, we choose r as in (6.3).

Definition 6.23

Let \(z \in {\mathcal {J}}\) be an \({\mathcal {F}}\)-measurable real random variable. We introduce the event \(\Upsilon \) on which the following conditions hold.

  1. (A)

    \({\mathbb {G}} |_{B_r(b)}\) is a tree.

  2. (B)

    \(b \in {\mathcal {R}}\).

  3. (C)

    \(-G_{yy}(r,z) \geqslant (3z)^{-1}\) for all \(y \in (B_r(b) \cup {\mathcal {V}}^{(B_r(b))})^c\).

  4. (D)

    \(|B_r(b) \cup {\mathcal {V}}^{(B_r(b))} | \leqslant N^{1/2}\).

Lemma 6.24

We have \(\Upsilon \in {\mathcal {F}}\) and \({\mathbb {P}}_\Omega (\Upsilon ^c \,|\, S_1(a), S_1(b)) = O(N^{-1/2})\) whenever \(a,b \in {\mathcal {W}}\).

Proof

That \(\Upsilon \in {\mathcal {F}}\) follows from Definitions 6.5 and 6.21. The estimate follows from Proposition 6.22 and the facts that on \(\Omega \) the conditions (A), (C), and (D) hold surely. That (C) holds surely on \(\Omega \) follows from Lemma 3.9 and the observation that \({\mathcal {V}} = {\mathcal {V}}^{(B_r(b))}\) on \(\Omega \) (see Proposition 3.2 (iii)). That (D) holds surely on \(\Omega \) follows from statements (i) and (ii) of Proposition 3.2 (recall the choice (6.3)). \(\square \)

We recall the definition of Lévy’s concentration function Q(X, L) from (2.13).

Remark 6.25

The concentration function has the following obvious properties.

  1. (i)

    For any \(u > 0\) we have \(Q(u X, uL) = Q(X,L)\). More generally, if f is a continuous bijection on \({\mathbb {R}}\) such that \(f^{-1}\) is K-Lipschitz, then \(Q(f(X), L) \leqslant Q(X,KL)\).

  2. (ii)

    If X and Y are independent then \(Q(X + Y, L) \leqslant \min \{Q(X,L), Q(Y,L)\}\).

Property (ii) is in general not sharp, and in some situations it can be improved considerably; see Proposition 2.8. Nevertheless, in some situations (ii) gives a better bound than Proposition 2.8; this is due to the minimum in (ii) as opposed to the maximum in Proposition 2.8. An important theme in the proof of Proposition 6.12 is a judicious mix of (ii) and Proposition 2.8. How to do this mix is encoded by the set of robust vertices from Definition 6.21.
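
For intuition, the concentration function is easy to estimate by Monte Carlo. The following Python sketch takes \(Q(X,L) = \sup _t {\mathbb {P}}(|X - t| \leqslant L)\) as a working definition (the precise form is (2.13), not restated here) and compares, for a sum of two independent Bernoulli sums, the bound from (ii) with the observed value; the latter exhibits the additional square-root gain in the number of independent summands that a Kesten-type inequality such as Proposition 2.8 quantifies. All numerical parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def Q(samples, L):
    # Monte Carlo estimate of Levy's concentration function sup_t P(|X - t| <= L),
    # taking the supremum over the observed values of X.
    return max(np.mean(np.abs(samples - t) <= L) for t in np.unique(samples))

n, p, L, trials = 400, 0.5, 0.5, 50_000
X = rng.binomial(n, p, size=trials).astype(float)   # a sum of n Bernoulli variables
Y = rng.binomial(n, p, size=trials).astype(float)   # an independent copy

print(Q(X, L), Q(X + Y, L))
# Property (ii) only gives Q(X + Y, L) <= min(Q(X, L), Q(Y, L)), whereas the observed
# value is close to Q(X, L) / sqrt(2): the extra decay in the number of independent
# summands is the kind of improvement captured by Proposition 2.8.
```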

We denote by \(Q^{{\mathcal {F}}}\) Lévy’s concentration function with respect to the probability measure \({\mathbb {P}}(\, \cdot \, |\,\mathcal F)\). The main tool behind the proof of Proposition 6.12 is the following anticoncentration estimate for \(g_x(z)\).

Proposition 6.26

Let \(z \in {\mathcal {J}}\) be an \({\mathcal {F}}\)-measurable real random variable. There exists a constant \(\chi >0\) such that, on the event \(\Upsilon \), for any \(1 \leqslant i \leqslant r\) and \(x\in S_{i}(b)\cap {\mathcal {R}}\) we have

$$\begin{aligned} Q^{{\mathcal {F}}}\bigg (g_x(z),\frac{1}{8 z (T^2 d)^{r-i+1}}\bigg ) \leqslant \frac{1}{2 (\chi d)^{(r-i)/2}} . \end{aligned}$$

Before proving Proposition 6.26, we use it to conclude the proof of Proposition 6.12.

Proof of Proposition 6.12

We estimate

(6.26)

where we used Remark 6.17 (i) and (iii) as well as \(\Upsilon \in {\mathcal {F}}\) by Lemma 6.24. The second term on the right-hand side of (6.26) is estimated as

$$\begin{aligned} {\mathbb {P}}_\Omega (\Xi \cap \Upsilon ^c) = {\mathbb {E}}[\mathbb {1}_{a \in {\mathcal {W}}} \, \mathbb {1}_{b \in {\mathcal {W}}}\, {\mathbb {P}}_\Omega (\Upsilon ^c \,|\, S_1(a), S_1(b))] \lesssim N^{-1/2} \, {\mathbb {P}}(a, b \in {\mathcal {W}}), \end{aligned}$$

by Lemma 6.24.

To estimate the first term on the right-hand side of (6.26), we use Proposition 6.26 with \(i = 1\) and the estimate

$$\begin{aligned} 8z d (T^2 d)^{r} \leqslant d^{2r+3/2}\leqslant N^{\eta } \end{aligned}$$
(6.27)

where we used the definitions of r and T from (6.3) and Definition 6.6, as well as \(d \gg \sqrt{\log N}\). On \(\Upsilon \) we have \(b \in {\mathcal {R}}\) and hence, by Definition 6.21, there exists \(x_* \in S_1(b) \cap {\mathcal {R}}\). Since, on the event \(\Upsilon \) and conditionally on \({\mathcal {F}}\), the family \((g_x(z))_{x \in S_1(b)}\) is independent, this yields, on the event \(\Upsilon \),

$$\begin{aligned}&{\mathbb {P}}\biggl (\biggl |\frac{1}{d}\sum _{x\in S_1(b)} g_{x}(z) +z \biggr | \leqslant N^{-\eta } \, \bigg | \, {\mathcal {F}}\biggr ) \\&\quad \leqslant Q^{\mathcal {F}} \biggl (\frac{1}{d}\sum _{x\in S_1(b)} g_{x}(z) ,\, \frac{1}{8 z d (T^2 d)^{r}}\biggr ) \\&\quad \leqslant Q^{\mathcal {F}} \biggl (\frac{1}{d} \,g_{x_*}(z) ,\, \frac{1}{8 z d (T^2 d)^{r}}\biggr ) = Q^{\mathcal {F}} \biggl ( g_{x_*}(z) ,\, \frac{1}{8 z (T^2 d)^{r}}\biggr ), \end{aligned}$$

where in the second step we used Remark 6.25 (ii) and in the last step Remark 6.25 (i). From Proposition 6.26 we therefore conclude that the first term on the right-hand side of (6.26) is bounded by

$$\begin{aligned} \frac{1}{2 (\chi d)^{(r-i)/2}} \,{\mathbb {P}}(\Xi _r) \leqslant N^{-\eta / 4 + o(1)} \, {\mathbb {P}}(a,b \in {\mathcal {W}}), \end{aligned}$$

where we used the definition (6.3) of r and Remark 6.17 (ii). The claim now follows from Lemma 6.19. \(\square \)

Remark 6.27

If we restrict ourselves to the critical regime \(d \asymp \log N\), then the factor \(N^{-\eta }\) inside the probability in (6.6) can be improved to \(N^{-\eta /2}\). To see this, we note that in this regime the parameter T from Definition 6.6 satisfies \(T = 10 \kappa ^{-1}\) since, in the critical regime, the estimate (6.16) with small enough \(\kappa \) remains valid for this smaller choice of T. Thus, the estimate (6.27) in the proof of Proposition 6.12 can be replaced with \(8z d (T^2 d)^{r} \leqslant (C d)^r\), which is bounded by \(N^{\eta /2 + o(1)}\).

The key tool behind the proof of Proposition 6.26 is Proposition 2.8 due to Kesten.

Proof of Proposition 6.26

Throughout the proof, the argument z of \(g_x(z)\) for any \(x \in B_r(b)\) will always be the random variable z from Definition 6.23. Therefore, we omit this argument from our notation and write \(g_x \equiv g_x(z)\).

For \(i \in [r]\) we define

where K is the universal constant from Proposition 2.8. We prove Proposition 6.26 by showing that, for all \(i \in [r]\) and \(x \in S_i(b)\cap {\mathcal {R}}\) we have

$$\begin{aligned} Q^{{\mathcal {F}}}\big (g_x,L_{i}\big ) \leqslant P_{i}. \end{aligned}$$
(6.28)

We show (6.28) by induction on \(i = r, r - 1, \dots , 1\).

We start the induction at \(i = r\). Abbreviate , which is an \({\mathcal {F}}\)-measurable set. For \(x\in S_r(b)\), conditioned on \({\mathcal {F}}\), \((\mathbb {1}_{y\in S_1^+(x)})_{y \in {\mathcal {X}}}\) are i.i.d. Bernoulli random variables. Hence, conditioned on \({\mathcal {F}}\) we have

$$\begin{aligned} \sum _{y\in S_1^+(x) {\setminus } {\mathcal {V}}^{(B_r(b))}} G_{yy}(r,z) \overset{{\textrm{d}}}{=}\sum _{k=0}^{|{\mathcal {X}}|}\mathbb {1}_{|S_1^+(x) \cap {\mathcal {X}}|=k} \sum _{i=1}^{k}G_{\sigma (i)\sigma (i)}(r,z), \end{aligned}$$
(6.29)

where \(\sigma \) is a uniform random enumeration of \({\mathcal {X}}\) (i.e. a bijection \([|{\mathcal {X}} |] \rightarrow {\mathcal {X}}\)) that is independent of \(|S_1^+(x) \cap {\mathcal {X}}|\). Because of the condition (C) in Definition 6.23, for any \(k\ne l\), \(|\sum _{i=1}^{l}G_{\sigma (i)\sigma (i)}(r,z)-\sum _{i=1}^{k}G_{\sigma (i)\sigma (i)}(r,z)|\geqslant (3z)^{-1} \). Therefore, for any \(t\in {\mathbb {R}}\) we get on \(\Upsilon \)

$$\begin{aligned}{} & {} {\mathbb {P}}\Biggl (\Biggl |\sum _{k=0}^{|{\mathcal {X}}|}\mathbb {1}_{|S_1^+(x) \cap {\mathcal {X}}|=k} \sum _{i=1}^{k}G_{\sigma (i)\sigma (i)}(r,z) - t \Biggr | \leqslant \frac{1}{8z} \, \bigg |\, \sigma , {\mathcal {F}}\Biggr ) \\{} & {} \quad \leqslant \max _{0 \leqslant k\leqslant |{\mathcal {X}}|} {\mathbb {P}}(|S_1^+(x) \cap {\mathcal {X}}|=k \,|\, {\mathcal {F}})\leqslant \frac{1}{2}, \end{aligned}$$

where in the last step we used that \(|S_1^+(x) \cap {\mathcal {X}}| \overset{{\textrm{d}}}{=}{\text {Binom}}(|{\mathcal {X}} |, d/N)\) conditioned on \({\mathcal {F}}\), that \(|{\mathcal {X}} | \geqslant N - N^{1/2}\) by the condition (D) in Definition 6.23, and that \(d \gg 1\). From (6.29) we therefore conclude that

$$\begin{aligned} Q^{{\mathcal {F}}} \Biggl (\sum _{y\in S_1^+(x) {\setminus } {\mathcal {V}}^{(B_r(b))}} G_{yy}(r,z) , \frac{1}{8z}\Biggr ) \leqslant \frac{1}{2}. \end{aligned}$$

Hence, Remarks 6.7 and 6.25 (i) imply \(Q^{{\mathcal {F}}}\biggl (g_x,\frac{1}{8z T^2 d}\biggr ) \leqslant \frac{1}{2}\), which is (6.28) for \(i = r\).

For the induction step, we let \(i < r\), choose \(x \in S_i(b) \cap {\mathcal {R}}\), and assume that \(Q^{{\mathcal {F}}}(g_y,L_{i+1})\leqslant P_{i+1}\) for all \(y\in S_{i+1}(b)\cap {\mathcal {R}}\). Note that \(S_1^+(x)\) and \({\mathcal {R}}\) are \({\mathcal {F}}\)-measurable, and that the family \((g_y)_{y \in S_1^+(x)}\) is independent on \(\Upsilon \) conditioned on \({\mathcal {F}}\), by Remark 6.10 and Definition 6.23 (A). Hence, we can apply Proposition 2.8 to the concentration function \(Q^{{\mathcal {F}}}\) to obtain

$$\begin{aligned} Q^{{\mathcal {F}}}\Biggl (\sum _{y\in S_1^+(x)\cap {\mathcal {R}}} g_y,L_{i+1}\Biggr )\leqslant \frac{K}{\sqrt{| S_1^+(x)\cap {\mathcal {R}} |}} P_{i+1} \leqslant \frac{K\sqrt{2} P_{i+1}}{d^{1/2}}, \end{aligned}$$
(6.30)

where the last inequality follows from \(| S_1^+(x)\cap {\mathcal {R}} | \geqslant d/2\), by Definition 6.21. Moreover, the conditional independence of the sums \(\sum _{y \in S_1^+(x) \cap \mathcal R} g_y\) and \(\sum _{y\in S_1^+(x) {{\setminus }} {\mathcal {R}}} g_y\) combined with Remark 6.25 (i) and (ii) yields

$$\begin{aligned} Q^{{\mathcal {F}}}\Biggl (\frac{1}{d}\sum _{y\in S_1^+(x)} g_y,\frac{L_{i+1}}{d}\Biggr )&= Q^{{\mathcal {F}}}\Biggl (\sum _{y\in S_1^+(x)} g_y, L_{i+1}\Biggr ) \\&= Q^{{\mathcal {F}}}\Biggl (\sum _{y\in S_1^+(x) \cap {\mathcal {R}}} g_y+\sum _{y\in S_1^+(x) {\setminus } {\mathcal {R}}} g_y,L_{i+1}\Biggr ) \\&\leqslant Q^{{\mathcal {F}}}\Biggl (\sum _{y\in S_1^+(x) \cap {\mathcal {R}}} g_y,L_{i+1}\Biggr ). \end{aligned}$$

Hence, by Remark 6.7, Remark 6.25 (i), Definition 6.9, and (6.30), we obtain

$$\begin{aligned} Q^{{\mathcal {F}}}\Biggl (g_x,\frac{L_{i+1}}{T^2 d}\Biggr ) \leqslant \frac{K\sqrt{2} P_{i+1}}{d^{1/2}}, \end{aligned}$$

which is (6.28). This completes the proof of (6.28) and, hence, the one of Proposition 6.26. \(\square \)

Proof of Proposition 6.22

The proof proceeds in two steps: first by establishing the claim for the root of a Galton-Watson branching process with Poisson offspring distribution with mean d, and then concluding by a comparison argument.

Denote by \({\mathcal {P}}_s\) a Poisson random variable with expectation s. Let W denote the Galton-Watson branching process with Poisson offspring distribution \({\mathcal {P}}_d\), which we regard as a random rooted ordered tree whose root we call o. We use the graph-theoretic notations (such as \(S_i(x)\)) from Sect. 2.1 also on rooted ordered trees. Moreover, we extend Definition 6.21 to a rooted ordered tree T in the obvious fashion, and when needed we use the notation \({\mathcal {R}} \equiv {\mathcal {R}}_r(T,o)\) to indicate the radius r, the tree T, and the root o explicitly.

We define the parameter \(\delta := {\mathbb {P}}({\mathcal {P}}_{3d/4} \leqslant d/2)\). By Bennett’s inequality (see Lemma D.1 below), we find that \(\delta \leqslant \textrm{e}^{-cd}\) for some universal constant \(c > 0\). We shall show by induction on i that

$$\begin{aligned} {\mathbb {P}}(o \notin {\mathcal {R}}_i(W,o)) \leqslant \delta \end{aligned}$$
(6.31)

for all \(i \geqslant 0\). For \(i = 0\) we have \({\mathbb {P}}(o \notin {\mathcal {R}}_i(W,o)) = 0\) since \(o \in {\mathcal {R}}_0(W,o)\) by (the analogue of) Definition 6.21, and (6.31) is trivial.

To advance the induction, we suppose that (6.31) holds for some \(i \geqslant 0\). By Definition 6.21,

$$\begin{aligned} {\mathbb {P}}(o \notin {\mathcal {R}}_{i+1}(W,o)) = {\mathbb {P}}\Biggl (\sum _{x \in S_1(o)} \mathbb {1}_{x \in {\mathcal {R}}_{i+1}(W,o)} < \frac{d}{2}\Biggr ). \end{aligned}$$

By definition of the branching process W, conditioned on \(S_1(o)\), the random variables \((\mathbb {1}_{x \in {\mathcal {R}}_{i+1}(W,o)})_{x \in S_1(o)}\) are independent Bernoulli random variables with expectation

where \(x \in S_1(o)\). We conclude that \(\sum _{x \in S_1(o)} \mathbb {1}_{x \in {\mathcal {R}}_{i+1}(W,o)} \overset{{\textrm{d}}}{=}{\mathcal {P}}_{d (1 - \zeta _i)}\). Using the induction assumption \(\zeta _i \leqslant \delta \) from (6.31) and the bound \(\delta \leqslant \textrm{e}^{-c d} < 1/4\) for large enough d, we therefore conclude that

$$\begin{aligned} {\mathbb {P}}(o \notin {\mathcal {R}}_{i+1}(W,o)) = {\mathbb {P}}\biggl ({\mathcal {P}}_{d (1 - {\mathbb {P}}(o \notin {\mathcal {R}}_{i}(W,o)))} < \frac{d}{2}\biggr ) \leqslant {\mathbb {P}}\biggl ({\mathcal {P}}_{3d/4} \leqslant \frac{d}{2}\biggr ) = \delta . \end{aligned}$$

This concludes the proof of (6.31) for all \(i \geqslant 0\).
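
The recursion just established is also easy to evaluate numerically. The following sketch, which is not part of the proof, iterates \(\zeta _{i+1} = {\mathbb {P}}({\mathcal {P}}_{d(1-\zeta _i)} < d/2)\) for an illustrative value of d and checks the bound (6.31) at every step, using SciPy for the Poisson distribution.

```python
import math
from scipy.stats import poisson

d = 30                                               # illustrative expected degree
delta = poisson.cdf(math.floor(d / 2), 3 * d / 4)    # delta = P(Poisson(3d/4) <= d/2)

zeta = 0.0                                           # zeta_0 = P(o not in R_0(W,o)) = 0
for i in range(1, 11):
    # The number of robust children of the root is Poisson with mean d(1 - zeta_{i-1});
    # the root fails to be robust precisely when this number is below d/2.
    zeta = poisson.cdf(math.ceil(d / 2) - 1, d * (1 - zeta))
    assert zeta <= delta                             # the bound (6.31) at radius i
    print(i, zeta, delta)
```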

Hence, denoting by \({\mathcal {B}}_{n,p}\) a random variable with law \({\text {Binom}}(n,p)\), we conclude that if \(|S_1(o) | \geqslant d\) then

$$\begin{aligned}{} & {} {\mathbb {P}}(o \notin {\mathcal {R}}_r(W,o) \,|\, S_1(o)) \nonumber \\{} & {} \quad = {\mathbb {P}}\Biggl (\sum _{x \in S_1(o)} \mathbb {1}_{x \in {\mathcal {R}}_{r - 1}(W,o)}< \frac{d}{2} \, \bigg |\, S_1(o)\Biggr ) = {\mathbb {P}}\biggl ({\mathcal {B}}_{|S_1(o) |, 1 - \zeta _{r-1}}< \frac{d}{2} \, \bigg |\, S_1(o)\biggr ) \nonumber \\{} & {} \quad \leqslant {\mathbb {P}}\biggl ({\mathcal {B}}_{d, 1 - \delta } < \frac{d}{2}\biggr ) = {\mathbb {P}}\biggl ({\mathcal {B}}_{d, \delta } \geqslant \frac{d}{2}\biggr ) \leqslant \textrm{e}^{-c d \delta \frac{1}{2 \delta } \log \frac{1}{2 \delta }}\leqslant \textrm{e}^{-c d^2} \leqslant N^{-1} \end{aligned}$$
(6.32)

for some universal constant \(c > 0\), where in the third step we used that \(|S_1(o) | \geqslant d\) and \(\zeta _{r - 1} \leqslant \delta \) by (6.31), in the fifth step Bennett’s inequality (see Lemma D.1 below), in the sixth step that \(\delta \leqslant \textrm{e}^{-cd}\), and in the last step the assumption \(d \gg \sqrt{\log N}\). This concludes the estimate for the Galton-Watson process W.

Next, we analyse \({\mathbb {P}}_\Omega (b \notin {\mathcal {R}}_r({\mathbb {G}}, b) \,|\, S_1(a), S_1(b))\). We note first that we can assume that \(|B_1(a) | \leqslant 1 + 10 \log N\) and that \(B_1(a)\) and \(B_1(b)\) are disjoint, for otherwise the above probability vanishes by definition of \(\Omega \) and Proposition 3.2.

We observe that a rooted ordered tree can be regarded as an equivalence class of (labelled) rooted trees up to a relabelling of the vertices that preserves the ordering of the children of each vertex. We denote by \([{\mathbb {T}},x]\) the equivalence class of the labelled rooted tree \(({\mathbb {T}}, x)\), where x is the root. By convention, if \({\mathbb {T}}\) is not a tree (i.e. if it contains a cycle) then its equivalence class is the empty tree. We denote by \(\mathfrak T_r\) the set of rooted ordered trees of depth r. Moreover, we denote by \({\mathfrak {T}}_r^* \subset {\mathfrak {T}}_r\) the subset of rooted ordered trees with at most \(N^{1/5}\) vertices and whose root is a robust vertex with \(|S_1(b) |\) children. Abbreviating , we can write

$$\begin{aligned} {\mathbb {P}}_\Omega (b \notin {\mathcal {R}}_r({\mathbb {G}}, b) \, |\, S_1(a), S_1(b))&= {\mathbb {P}}_\Omega ([{\mathbb {G}}|_{B_r(b)},b] \notin {\mathfrak {T}}_r^* \,|\, S_1(a), S_1(b)) \nonumber \\&\leqslant {\mathbb {P}}(\{[{\mathbb {G}}|_{B_r(b)},b] \notin {\mathfrak {T}}_r^* \}\cap \Delta \,|\, S_1(a), S_1(b)), \end{aligned}$$
(6.33)

where we used that, since \(a,b \in {\mathcal {W}} \subset {\mathcal {V}}\), Proposition 3.2 implies that on the event \(\Omega \) the graph \({\mathbb {G}}|_{B_r(b)}\) is a tree with at most \(N^{1/5}\) vertices, and that \(\Omega \subset \Delta \).

For the following, let \(T \in {\mathfrak {T}}_r^*\) and denote by o its root. For \(1 \leqslant i \leqslant r\) we introduce the event

In particular, \({\mathbb {P}}(\Theta _1 \,|\, S_1(a), S_1(b)) = 1\) because of the assumed disjointness of \(S_1(a)\) and \(S_1(b)\). We now estimate

$$\begin{aligned} {\mathbb {P}}(\{[{\mathbb {G}}|_{B_r(b)},b] = T\}\cap \Delta \,|\, S_1(a), S_1(b)) = {\mathbb {P}}(\Theta _r \, | \, S_1(a), S_1(b)) \end{aligned}$$
(6.34)

recursively, for \(1 \leqslant i \leqslant r - 1\), using the expression

$$\begin{aligned}{} & {} \frac{{\mathbb {P}}(\Theta _{i+1}| \, S_1(a), S_1(b))}{{\mathbb {P}}(\Theta _{i}| \, S_1(a), S_1(b))}\\{} & {} \quad = \frac{(N' - |B_i(o) |)!}{(N' - |B_{i+1}(o) |)! \prod _{x \in S_i(o)} |S_1^+(x) |!} \prod _{x \in S_i(o)} \biggl (\frac{d}{N}\biggr )^{|S_1^+(x) |} \biggl (1 - \frac{d}{N}\biggr )^{N - |B_i(o) | - |S_1^+(x) |}, \end{aligned}$$

where , and the graph-theoretic quantities on the left-hand side are in terms of \({\mathbb {G}}\) and on the right-hand side in terms of the deterministic rooted ordered tree T. Here, the multinomial factor in front arises from a choice of the \(|S_i(o) |\) disjoint subsets representing the children of the vertices in \(S_i(o)\) from \(N' - |B_i(o) |\) available vertices, and the remaining product follows by independence of the edges in \({\mathbb {G}}\). We deduce that

$$\begin{aligned} \frac{{\mathbb {P}}(\Theta _{i+1}| \, S_1(a), S_1(b))}{{\mathbb {P}}(\Theta _{i}| \, S_1(a), S_1(b))}&= (1 + O(N^{-3/4})) \frac{(N' - |B_i(o) |)!}{(N' - |B_{i+1}(o) |)! \, N^{|S_{i+1}(o) |}} \prod _{x \in S_i(o)} \frac{d^{|S_1^+(x) |}}{|S_1^+(x) |!} \textrm{e}^{-d} \\&= (1 + O(N^{-3/5})) \prod _{x \in S_i(o)} \frac{d^{|S_1^+(x) |}}{|S_1^+(x) |!} \textrm{e}^{-d}, \end{aligned}$$

where we used that \(N - N' \leqslant 1 + 10 \log N\) and \(|B_r(o) | \leqslant N^{1/5}\). By induction on i and comparison with the Galton-Watson tree W, using that

$$\begin{aligned} \frac{{\mathbb {P}}( W|_{B_{i+1}(o)} = T|_{B_{i+1}(o)} \,|\, S_1(o))}{{\mathbb {P}}( W|_{B_{i}(o)} = T|_{B_{i}(o)} \,|\, S_1(o))} = \prod _{x \in S_i(o)} \frac{d^{|S_1^+(x) |}}{|S_1^+(x) |!} \textrm{e}^{-d}, \end{aligned}$$

as well as \({\mathbb {P}}(\Theta _1 | S_1(a), S_1(b)) = 1\), we therefore conclude from (6.34) that if \(|S_1(b) | = |S_1(o) |\) then

$$\begin{aligned} {\mathbb {P}}(\{[{\mathbb {G}}|_{B_r(b)},b] = T\} \cap \Delta \,|\, S_1(a), S_1(b)) = (1 + O(N^{-1/2})) \, {\mathbb {P}}(W |_{B_r(o)} = T \,|\, S_1(o)) \end{aligned}$$

for all \(T \in {\mathfrak {T}}_r^*\). Thus,

$$\begin{aligned}&{\mathbb {P}}(\{[{\mathbb {G}}|_{B_r(b)},b] \notin {\mathfrak {T}}_r^*\} \cap \Delta \,|\, S_1(a), S_1(b)) \\&\quad \leqslant 1 - \sum _{T \in {\mathfrak {T}}_r^*} {\mathbb {P}}(\{[\mathbb G|_{B_r(b)},b] = T\} \cap \Delta \,|\, S_1(a), S_1(b)) \\&\quad = 1 - \sum _{T \in {\mathfrak {T}}_r^*} {\mathbb {P}}(W |_{B_r(o)} = T \,|\, S_1(o)) + O(N^{-1/2}) \\&\quad = {\mathbb {P}}(o \notin {\mathcal {R}}_r(W, o) \,|\, S_1(o)) + O(N^{-1/2}). \end{aligned}$$

The claim now follows from (6.32) and (6.33), noting that if \(b \in {\mathcal {W}}\) then \(|S_1(b) | = |S_1(o) | \geqslant d\). \(\square \)