Local Kesten--McKay law for random regular graphs

We study the adjacency matrices of random $d$-regular graphs with large but fixed degree $d$. In the bulk of the spectrum $[-2\sqrt{d-1}+\varepsilon, 2\sqrt{d-1}-\varepsilon]$ down to the optimal spectral scale, we prove that the Green's functions can be approximated by those of certain infinite tree-like (few cycles) graphs that depend only on the local structure of the original graphs. This result implies that the Kesten--McKay law holds for the spectral density down to the smallest scale and the complete delocalization of bulk eigenvectors. Our method is based on estimating the Green's function of the adjacency matrices and a resampling of the boundary edges of large balls in the graphs.

Contents

1 Introduction and main results
2 Proof outline and main ideas
3 Structure of random and deterministic regular graphs
4 Trees and tree extension
5 Initial estimates
6 Local resampling by switching
7 Graph distance between switched vertices
8 Green's function distance and switching cells
9 Stability under removal of a neighborhood
10 Stability under switching
11 Improved decay in the switched graph

1 Introduction and main results

1.1. Introduction. The random regular graph with fixed degree is perhaps the most fundamental model of a sparse random graph, and it arises naturally in many different contexts. The spectral properties of the adjacency matrices of regular graphs are among their main properties of interest, in particular in computer science and combinatorics, in the theory of expanders (see e.g. [72]), in statistical physics, in the theory of disordered operators and quantum chaos (as discussed e.g. in [73]), and also in the theory of ζ-functions (see e.g. [77]). Many results about the spectrum of (random and deterministic) regular graphs exist, but the local eigenvalue spacing and the eigenvector distribution of random regular graphs have so far not been understood for fixed degree. Numerical evidence supports the conjecture that the local eigenvalue spacing statistics of a random d-regular graph on N vertices are the same as those of the Gaussian Orthogonal Ensemble (GOE) [47,51,68,69]. For d ∈ [N^ε, N^{2/3−ε}], this was proved in [17,18]. The case of fixed degree d has up to now been approached mostly by combinatorial methods, which typically use the local tree-like structure of the random regular graph.
In this paper, we introduce a novel approach to combine the use of the local tree-like structure with a simultaneous effective use of the global random-matrix like behavior of the random regular graph. Below we first discuss known consequences of the tree-like and of the random-matrix like behavior, individually.
Tree-like structure. Let A = A(G) be the adjacency matrix of a (uniform) random d-regular graph G on N vertices. Thus A is uniformly chosen among all symmetric matrices with entries in {0, 1} with Σ_j A_ij = d and A_ii = 0 for all i. Note that A has a trivial constant eigenvector with eigenvalue d.
It is well known that most regular graphs of fixed degree d ≥ 3 are locally tree-like in the sense that: the radius-c log_{d−1}(N) neighborhoods of almost all vertices are the same as those in the infinite d-regular tree, and the neighborhoods of all vertices have bounded excess (the smallest number of edges that must be removed to yield a tree); see e.g. Proposition 3.1 below. This locally tree-like structure is known to imply the following asymptotic properties as N → ∞: The macroscopic spectral density of A converges to the Kesten–McKay law [54,64], which is the spectral density of the infinite d-regular tree. It has support [−2√(d−1), 2√(d−1)] and density

ρ_d(x) = d √(4(d−1) − x²) / (2π (d² − x²)).

It has been shown that the convergence also holds on spectral scales smaller than the macroscopic scale by a logarithmic factor in N [14,31,45].
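The Kesten–McKay density ρ_d(x) = d √(4(d−1) − x²) / (2π (d² − x²)) on [−2√(d−1), 2√(d−1)] is easy to check numerically. The following Python sketch (our own illustration, not from the paper; the function names are ours) evaluates the density and verifies that it integrates to 1 over its support:

```python
import math

def kesten_mckay_density(x, d):
    """Spectral density of the infinite d-regular tree:
    rho_d(x) = d * sqrt(4(d-1) - x^2) / (2*pi*(d^2 - x^2)),
    supported on [-2*sqrt(d-1), 2*sqrt(d-1)]."""
    if x * x >= 4 * (d - 1):
        return 0.0
    return d * math.sqrt(4 * (d - 1) - x * x) / (2 * math.pi * (d * d - x * x))

def trapezoid(f, a, b, n=100000):
    """Simple trapezoid-rule integration."""
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + k * h) for k in range(1, n)))

d = 3
edge = 2 * math.sqrt(d - 1)
total_mass = trapezoid(lambda x: kesten_mckay_density(x, d), -edge, edge)
print(round(total_mass, 4))  # prints 1.0
```

Since the density vanishes like a square root at the spectral edges ±2√(d−1), the naive trapezoid rule converges well here.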
For any ε > 0, the nontrivial eigenvalues of A are contained in [−2√(d−1) − ε, 2√(d−1) + ε]. This was conjectured in [12] and proved in [43]; see also [19,71] for recent alternative arguments. In fact, the scale ε can again be improved from order 1 by a logarithmic factor in N.
The eigenvectors of A are weakly delocalized. In particular, for any ℓ²-normalized eigenvector v of A, the tree-like structure implies the bound ‖v‖_∞ ≤ (log N)^{−c}, see [23,31,45]; that if V ⊂ [[N]] is a set with Σ_{i∈V} |v_i|² ≥ ε then V has at least N^{c(ε)} elements [23]; and a version of quantum ergodicity: for any vector a ∈ R^N with ‖a‖_∞ ≤ 1, averages of Σ_i a_i v_i² over many eigenvectors v converge to (1/N) Σ_i a_i [13,14,24].

Random matrix-like behavior. The estimates discussed above all rely on a small factor (log N)^{−c} obtained from the tree-like structure. However, strong forms of eigenvector delocalization and local concentration of the eigenvalue density, as now understood for many random matrix ensembles (first proved in [38]) and important for the understanding of eigenvalue spacing [39,40,57,75], are not expected to follow merely from the local (tree-like) graph structure. In this paper, we provide the first strong evidence that random matrix behavior persists at local scales for random regular graphs with fixed degree d (large enough), by combining the tree-like structure at small distances with random matrix-like structure at large distances.
In particular, the following corollaries are direct consequences of our main result. More general and quantitative results are stated further below, including a precise local profile of the Green's function down to the smallest possible spectral scales, an isotropic version of the delocalization estimate, a quantum unique ergodicity statement for the eigenvectors, as well as its implications for the eigenvector distribution.
The empirical spectral measure is concentrated around the Kesten–McKay law down to the optimal spectral scale N^{−1+o(1)}.
While the spectral density (or the trace of the Green's function) does concentrate, the individual entries of the Green's function of the random regular graph with bounded degree do not concentrate. This is different from the typical examples in random matrix theory, and it is one of the reasons that fixed degree graphs are more difficult to study. For example, a random regular graph contains a triangle with probability uniformly bounded from below. For graphs with bounded degree, triangles and other short cycles have a strong local influence on the elements of the Green's function, and thus on the spectrum; see the discussion below Corollary 1.11. However, while the local graph structure has an effect of order 1 on the Green's function entries, for random regular graphs the local structure is nevertheless sufficiently stable to imply the above corollaries. Such results are not true for Erdős–Rényi graphs of bounded average degree; see Figure 3.

Graphs and the Green's function.
It is convenient and natural to normalize the adjacency matrix A by a factor 1/√(d−1) as H = A/√(d−1), so that the asymptotic spectral density of H is given by the rescaled Kesten–McKay law with support [−2, 2] and density function

ρ_d(x) = d(d−1) √(4 − x²) / (2π (d² − (d−1)x²)).

The Green's function (resolvent) of G is defined by G_ij(z) = G_ij(G, z) = ((H − z)^{−1})_ij for z ∈ C⁺, where C⁺ is the upper half plane. The Green's function encodes all spectral information of H (and thus of A). In particular, the spectral resolution is given by η = Im[z]: the macroscopic behavior corresponds to η of order 1, the mesoscopic behavior to 1/N ≪ η ≪ 1, and the microscopic behavior of individual eigenvalues corresponds to η below 1/N. (In this paper, we do not shift the top eigenvalue as in [18].) For any vertices i, j, the Green's function G_ij(G, z) depends on the entire graph G. However, for z of order 1 (and slightly below) away from the spectrum, it is known that the Green's function G_ij can be well approximated from the knowledge of the local graph structure around i and j. Indeed, a sufficiently large distance to the spectrum implies strong exponential decay of G, which implies that the Green's function is determined by the local graph structure. On the other hand, using only the local tree-like structure, such an approximation is in general not expected to hold on scales η ≪ 1. However, our main result shows that, for random d-regular graphs with d large but fixed, in the bulk of the spectrum, with high probability, the Green's function does have a local approximation down to the optimal spectral scale η ≥ (log N)^{O(1)}/N. This result relies crucially on a combination of local stability (the tree-like behavior near most vertices) and the fluctuations on the boundary. Thus our method differs from previous approaches, which have used either only the local stability (combinatorics) or only the fluctuations (random matrix theory).
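For very small graphs, the Green's function can be computed directly from the definition G(z) = (H − z)^{−1}. The following sketch (ours, a naive dense solver for illustration only, not the paper's method) does this for the complete graph K₄, which is 3-regular, and exhibits two basic resolvent properties: symmetry G_ij = G_ji and Im G_ii > 0 for z ∈ C⁺:

```python
import math

def solve_complex(A, b):
    """Solve the complex linear system A x = b by Gaussian elimination
    with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def greens_function(adj, d, z):
    """G(z) = (H - z)^(-1) with H = A / sqrt(d - 1), computed column by column."""
    n = len(adj)
    s = math.sqrt(d - 1)
    Hz = [[adj[i][j] / s - (z if i == j else 0) for j in range(n)] for i in range(n)]
    cols = [solve_complex(Hz, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

# K_4: the 3-regular graph on 4 vertices
adj = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
G = greens_function(adj, 3, 1.0 + 0.5j)
```

For K₄ the answer can be cross-checked against the spectral decomposition: H = A/√2 has eigenvalue 3/√2 with weight 1/4 per vertex, and −1/√2 with weight 3/4.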
To state our results more precisely, we need some terminology.
Graphs, adjacency matrices, Green's functions. Throughout this paper, graphs G are always simple (i.e., they have no self-loops or multiple edges) and have vertex degrees at most d (non-regular graphs are also used). The geodesic distance (the length of the shortest path between two vertices) in the graph G is denoted by dist_G(·, ·). For any graph G, the adjacency matrix is the (possibly infinite) symmetric matrix A indexed by the vertices of the graph, with A_ij = A_ji = 1 if there is an edge between i and j, and A_ij = 0 otherwise. Throughout the paper, we denote the normalized adjacency matrix by H = A/√(d−1), where the normalization by 1/√(d−1) is chosen independently of the actual degrees of the graph. Moreover, we denote the (unnormalized) adjacency matrix of a directed edge (i, j) by e_ij, i.e. (e_ij)_kl = δ_ik δ_jl. The Green's function of a graph G is the unique matrix G = G(z) defined by G(H − z) = I. In Appendix B, several well-known properties of the Green's function are summarized; they will be used throughout the paper. Subsets and subgraphs. Let G be a graph; we denote the set of its edges by the same (calligraphic) symbol and its set of vertices by the corresponding blackboard bold symbol. More generally, throughout the paper, we use blackboard bold letters for sets or subsets of vertices, and calligraphic letters for graphs or subgraphs. For any subset X ⊂ G, we define the graph G^(X) by removing the vertices X and the edges adjacent to X from G, i.e., the adjacency matrix of G^(X) is the restriction of that of G to G \ X. We write G^(X) for the Green's function of G^(X). For any subgraph X ⊂ G, we denote by ∂X = {v ∈ G : dist_G(v, X) = 1} the vertex boundary of X in G, and by ∂_E X = {e ∈ G : e is adjacent to X but e ∉ X} the edge boundary of X in G. Moreover, for any subset X ⊂ G, we denote by ∂X and ∂_E X the vertex and edge boundaries of the subgraph induced by G on X.
Neighborhoods. Given a subset X of the vertex set of a graph G and an integer r > 0, we denote the r-neighborhood of X in G by B_r(X, G), i.e., it is the subgraph induced by G on the set B_r(X, G) = {j ∈ G : dist_G(X, j) ≤ r}. In particular, B_r(i, G) is the radius-r neighborhood of the vertex i.
Moreover, given vertices i, j in G and r > 0, we denote by E_r(i, j, G) the smallest subgraph of G that contains all paths of length at most r between i and j. Namely,

E_r(i, j, G) := {e ∈ G : there exists a path from i to j of length at most r containing e}. (1.4)

Notice that E_2r(i, j, G) ⊂ B_r(i, G) ∪ B_r(j, G).
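The ball B_r(X, G) is computed by a plain breadth-first search. A minimal sketch (ours, for illustration; `adj` maps each vertex to its neighbor list):

```python
from collections import deque

def ball(adj, sources, r):
    """Vertex set B_r(X, G): all vertices within graph distance r of X."""
    dist = {v: 0 for v in sources}
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        if dist[v] == r:
            continue  # do not expand past radius r
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)

# 6-cycle: the radius-2 ball around vertex 0
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(sorted(ball(cycle, [0], 2)))  # prints [0, 1, 2, 4, 5]
```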
Trees. The infinite d-regular tree is the unique (up to isometry) infinite connected d-regular graph without cycles, and is denoted by Y. The rooted d-regular tree with root degree c is the unique (up to isometry) infinite connected graph that is d-regular at every vertex except for a distinguished root vertex o, which has degree c.
(i) The tree extension (abbreviated TE) of G_0 is the (possibly infinite) graph TE(G_0) defined by attaching to any extensible vertex v in G_0 a rooted d-regular tree with root degree d − g(v) − deg_{G_0}(v).
(ii) The Green's function of G_0 with tree extension, denoted P(G_0), is the Green's function of the (possibly infinite) graph TE(G_0).

Figure 2. Given a graph G (with the standard deficit function g = d − deg_G), the left figure illustrates a subgraph H ⊂ G, which by our conventions inherits its deficit function from G by restriction. Thus all vertices in H have the same degrees in the tree extension TE(H) as in G = TE(G). The right figure illustrates the graph G^(X) obtained by removing a vertex set X. By our convention on the deficit function, the tree extension of G^(X) is then trivial.
Definition 1.5. Given an integer r > 0, we call P_ij(E_r(i, j, G)) the localized Green's function of G at the vertices i, j.
Thus the localized Green's function at i, j is the Green's function of a graph that itself depends on i, j. However, the dependence of the graph on i, j is weak, in the sense that, up to a small error, the graph E_r(i, j, G) could be replaced by any neighborhood of i, j that is neither too small nor too large; see Proposition 4.2 and Remark 4.3.
In our main result, stated in Section 1.4 below, we will show that the Green's function G ij (G) can be approximated by the localized Green's function P ij (E r (i, j, G)). To interpret this result, we note the following elementary properties of the localized Green's function.
If dist_G(i, j) > r, then E_r(i, j, G) is the empty graph, and therefore P_ij(E_r(i, j, G), z) = 0.
If E_r(i, j, G) has no cycles (so that it is a tree), then TE(E_r(i, j, G)) is an infinite tree. In particular, if G is d-regular, then TE(E_r(i, j, G)) is the infinite d-regular tree Y, and therefore P_ij(E_r(i, j, G), z) = G_ij(Y, z). By a straightforward calculation (see Section 4), it then follows that

P_ij(E_r(i, j, G), z) = m_d(z) ( −m_sc(z)/√(d−1) )^{dist_G(i,j)}, (1.5)

where m_d and m_sc are the Stieltjes transforms of the Kesten–McKay and semicircle laws; see (1.6) below.
If E_r(i, j, G) has bounded excess, then upper bounds similar to the right-hand side of (1.5) hold. In particular, P_ij(E_r(i, j, G), z) is uniformly bounded in z ∈ C⁺ and decays exponentially in the distance with rate log(|m_sc(z)|/√(d−1)) (see Section 4). Here m_d and m_sc denote the Stieltjes transforms of the (rescaled) Kesten–McKay law and of the semicircle law,

m_d(z) = ∫ ρ_d(x)/(x − z) dx, (1.6)

m_sc(z) = ∫ ρ_sc(x)/(x − z) dx = (−z + √(z² − 4))/2, (1.7)

with the branch of the square root chosen so that m_sc(z) ∈ C⁺. Moreover, it is well known that m_sc(z) is a holomorphic bijection from the upper half plane C⁺ to the upper half unit disk D⁺ := {z ∈ C⁺ : |z| < 1}, and that it satisfies the algebraic equation

z = −m_sc(z) − 1/m_sc(z), z ∈ C⁺, (1.8)

and in particular that |m_sc(z)| < 1.
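The relations (1.7)–(1.8) are easy to verify numerically. The sketch below (ours) selects the root of m² + zm + 1 = 0 with positive imaginary part, checks that |m_sc(z)| < 1, and also checks that the continued-fraction iteration u ↦ 1/(−z − u), which mirrors the recursive structure of the tree, converges to m_sc(z):

```python
import cmath

def m_sc(z):
    """Stieltjes transform of the semicircle law: the root of
    m^2 + z*m + 1 = 0 with positive imaginary part (for z in C+)."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2

z = 0.8 + 0.05j
m = m_sc(z)
assert abs(z + m + 1 / m) < 1e-12   # the relation (1.8): z = -m - 1/m
assert abs(m) < 1 and m.imag > 0    # m maps C+ into the upper half unit disk

# the continued-fraction iteration u -> 1/(-z - u) converges to m_sc(z),
# since the fixed point with |m| < 1 is the attracting one
u = 0j
for _ in range(2000):
    u = 1 / (-z - u)
assert abs(u - m) < 1e-6
```

Choosing the root by the sign of the imaginary part (rather than fixing one branch of the square root) keeps the selection correct on all of C⁺.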

1.4. Main result.
Let G N,d be the set of simple d-regular graphs on the vertex set [[N ]]. Throughout the paper, we control error estimates in terms of (large powers of) the parameter where z ∈ C + . We will often omit the parameter z from the notation if it is clear from the context. Given α > 4, we define the spectral domain Our main result is the following theorem.
We emphasize that, for fixed d, the right-hand side of (1.11) converges to 0, as N → ∞, uniformly in the spectral domain D. The constants in the statement of the theorem can be improved at the expense of a longer proof and a more complicated statement; we do not pursue this. Theorem 1.6 states that, in D, the Green's function G_ij(G) is well approximated by P_ij(E_{r_*}(i, j, G)), which is random, but only depends on the local graph structure of G near the vertices i and j. Since the local structure of a random regular graph is well understood, the theorem has a number of consequences. Specifically, under the assumptions of the theorem, it is well known that there are κ > 0 and δ > 0 such that, with R = ⌊κ log_{d−1} N⌋, one can assume that the radius-R neighborhoods of all but N^δ many vertices of G coincide with those of the infinite d-regular tree, and that the R-neighborhoods of all other vertices have excess at most ω (see e.g. Proposition 3.1). Moreover, for the vertices i that have radius-R tree neighborhoods, we have (see e.g. Proposition 4.1)

P_ii(E_{r_*}(i, i, G), z) = m_d(z). (1.12)

The vertices whose R-neighborhood has bounded excess still satisfy (see e.g. Proposition 4.2)

|P_ii(E_{r_*}(i, i, G))| ≤ 3|m_sc|/2 ≤ 3/2. (1.13)

Together with this information on the local graph structure, the local stability result of Theorem 1.6 has a number of consequences, discussed next.
1.5. Consequences of Theorem 1.6 and the local graph structure. First, (1.11) and (1.13) imply that the eigenvectors are completely delocalized in ℓ ∞ .
On the other hand, it is easy to see that, with probability Ω(N^{−d+2}), the random d-regular graph has a localized eigenvector (see Figure 3). Thus (1.14) cannot hold with higher than polynomial probability. Moreover, the Erdős–Rényi graph with finite average degree d has localized eigenvectors with probability Ω(1). Thus (1.14) cannot hold (with high probability) for the Erdős–Rényi graph with average degree d.
Since the random regular graph is invariant under permutations of the vertices, delocalization in the standard basis in fact implies that the eigenvectors are isotropically delocalized and also that a local version of the quantum unique ergodicity property holds. This is stated in the next two corollaries, whose proofs follow exactly as in [18, Section 8], using (1.14) as an input and a general moment inequality for exchangeable random variables. Thus the proofs of the following two corollaries make strong use of the exchangeability of the random regular graph. On the other hand, the proof of Theorem 1.6 does not exploit exchangeability in a significant way, and we believe that the method could be extended, for example, to graphs with more general degree sequences.

Corollary 1.8 (Isotropic eigenvector delocalization). Under the assumptions of Corollary 1.7, for any vector q ∈ R^N with ‖q‖_2 = 1 and Σ_k q_k = 0, with probability 1 − o(N^{−ω+8}), for all eigenvectors v of H whose eigenvalue λ obeys |λ ± 2| ≥ (log N)^{1−α/2}.

Figure 3. Corollary 1.7 shows that a random d-regular graph has only completely delocalized eigenvectors with probability 1 − o(N^{−ω+8}). On the other hand, it is not difficult to show that a random d-regular graph has localized eigenvectors with probability Ω(N^{−d+2}). For example, a random 3-regular graph contains the subgraph shown on the left with probability Ω(N^{−1}). For comparison, also notice that an Erdős–Rényi graph with finite average degree contains localized eigenvectors with probability Ω(1); see the right figure.
Corollary 1.9 (QUE). Under the assumptions of Corollary 1.7, for any vector q ∈ R^N with ‖q‖_2 = 1 and Σ_k q_k = 0, with probability 1 − o(N^{−ω+8}), for all eigenvectors v of H whose eigenvalue λ obeys |λ ± 2| ≥ (log N)^{1−α/2}. In particular, for any index i ∈ [[N]], (1.16) holds.

In [15], it was recently proved that the entry distribution of almost eigenvectors of random d-regular graphs is close in the weak topology to a Gaussian distribution N(0, σ) for some 0 ≤ σ ≤ 1. However, due to the lack of delocalization estimates for almost eigenvectors, it was not ruled out in [15] that the distribution is trivial, i.e., that σ = 0. In (1.16), we prove that the bulk eigenvectors of random d-regular graphs cannot concentrate on a set of size o(N), thus ruling out σ < 1, at least for bulk eigenvectors. Therefore, combining [15] and (1.16), we conclude that the entry distribution of bulk eigenvectors converges to the standard Gaussian distribution N(0, 1), and more generally that its local covariance structure is given by Gaussian waves on the d-regular tree [32,33].
To make the statement precise, we recall some terminology from [15,32]. Given a d-regular graph G on the vertex set [[N]], any subset X = {x_1, ..., x_k} ⊂ [[N]] determines the distance matrix D_G(X) with entries (D_G(X))_ij = dist_G(x_i, x_j). For any λ ∈ [−2, 2] and any good distance matrix D, the covariance matrix C_λ(D) of the Gaussian wave on the infinite d-regular tree is determined by the Green's function of Y: its (i, j) entry is a function of dist_Y(x, y) = D_ij, where x, y are any two vertices of Y at that distance and G_xy is the Green's function of Y, which depends only on dist_Y(x, y) as computed explicitly in Proposition 4.1. Equivalently, C_λ(D) can be expressed in terms of the Chebyshev polynomials of the second kind U_k(x); see [33].

Corollary 1.10 (Eigenvector normality). Fix ε > 0 and a good distance matrix (D_ij)_{1 ≤ i,j ≤ k}. For G uniformly chosen from G_{N,d}, let v be an eigenvector whose normalized eigenvalue λ obeys |λ ± 2| ≥ (log N)^{1−α/2}, and let X be chosen uniformly from the subsets {X ⊂ [[N]] : D_G(X) = D}. Then, under the assumptions of Theorem 1.6, with probability at least 1 − ε, the empirical distribution of √N v_X is at most ε-far (in the weak topology) from the joint Gaussian distribution with covariance matrix C_λ(D). In particular, for any index i ∈ [[N]], the entry distribution of √N v_i is close to the standard Gaussian distribution N(0, 1). We assume that N ≥ N(α, ω, d, ε) is large enough.
Moreover, (1.11) together with (1.12) and (1.13) implies that the Stieltjes transform of the empirical spectral measure (the trace of the Green's function) concentrates around that of the Kesten–McKay law, for z ∈ D. In particular, it is well known that this implies concentration of the spectral measure for intervals of length as small as (log N)^C N^{−1} (see, for example, [35, Section 8.1]).
In the opposite direction, (1.11) also implies that the individual entries of the Green's function do not concentrate. Indeed, the localized Green's function P_ij(E_{r_*}(i, j, G)) can easily be seen to depend strongly on the local graph structure near i and j, and its fluctuation is of order 1.
1.6. Related results. The spectral density at scales much larger than the typical eigenvalue spacing has been studied in [14,31,45,79]. In [18], it was proved that the semicircle law holds down to the optimal mesoscopic scale for degree d ∈ [(log N)^4, N^{2/3}(log N)^{−4/3}]. The methods of this paper could be extended from fixed d to d growing slowly with N, say to d at most (log N)^4, where the results of [18] start to apply. Thus the results of this paper complement those of [18]. For simplicity, we restrict in this paper to the most interesting case of fixed degree d. In [17], we further proved that the bulk eigenvalues exhibit GOE spacing statistics provided that d ∈ [N^ε, N^{2/3−ε}]. Simulations indicate that GOE spacing statistics continue to hold for random regular graphs of fixed degree [47,51,68,69]; we believe that the results of this paper are an important step towards proving this prediction. Macroscopic eigenvalue statistics for random regular graphs of fixed degree have been studied using the techniques of Poisson approximation of short cycles [30,53] and (non-rigorously) using the replica method [67]. These results show that the macroscopic eigenvalue statistics for random regular graphs of fixed degree are different from those of a Gaussian matrix. However, this is not predicted to be the case for the local eigenvalue statistics. Spectral properties of regular directed graphs have also been studied recently [26,27].
The delocalization of eigenvectors of (random and deterministic) regular graphs has been studied in [13,14,23,24,31,45,58,79] (see also [70] for a survey of results on eigenvector delocalization in random matrices). Our result is the first one to imply the optimal bound of order 1/ √ N (up to logarithmic corrections) on their ℓ ∞ -norms. The previously best ℓ ∞ -bounds on the eigenvectors of random regular graphs of fixed degree were about 1/ log N .
The study of eigenvector statistics goes back to random matrix theory. For the Gaussian Orthogonal Ensemble, the eigenvectors are uniformly distributed on the orthogonal group, and it is easy to see that they have a nearly Gaussian entry distribution. For generalized Wigner matrices, asymptotic normality of eigenvectors was proved in [22,56,76], and for the adjacency matrices of Erdős–Rényi graphs and d-regular graphs with growing (average) degrees in [21]. For random regular graphs of fixed degree, it was recently proved in [15] that random lifts of (almost) eigenvectors of d-regular graphs to their universal cover, i.e., the infinite d-regular tree, are close to Gaussian waves [33], with unknown standard deviation σ ∈ [0, 1]. Combined with our results on the delocalization of eigenvectors, we conclude that σ = 1, and thus that the entry distribution of eigenvectors of d-regular graphs converges to the standard Gaussian distribution N(0, 1).
The value of the second largest eigenvalue λ_2 of regular graphs is of particular interest. It was conjectured that almost all d-regular graphs obey λ_2 = 2√(d−1) + o(1) [12], and this was proved in [43] (see also [19,28,44,71] for different proofs and other estimates on the second largest eigenvalue). In fact, much more precisely, it is conjectured that the distribution of the second largest eigenvalue on scale N^{−2/3} is the same as that of the largest eigenvalue of the Gaussian Orthogonal Ensemble [72]. In particular, this conjecture would imply that slightly more than half of all regular graphs are Ramanujan graphs, namely d-regular graphs with λ_2 ≤ 2√(d−1) (for explicit and probabilistic constructions of sequences of Ramanujan graphs, see [60,62,63]).
The Anderson model is a basic model for quantum transport in a disordered medium. The detailed understanding of its phase diagram, in particular the delocalization of states for small disorder in dimension d ≥ 3, and the conjecturally related random matrix eigenvalue statistics in finite volume, is a major problem in mathematical physics. The Anderson model is understood much better on the infinite regular tree (Bethe lattice) [2–10,55]; see also [11] for a review. Given this understanding on the infinite tree, the Anderson model on the random regular graph is a natural finite volume model in which to test the conjectures on the corresponding local eigenvalue statistics. At sufficiently large disorder, it is known that the Anderson model on the random regular graph exhibits Poisson statistics [46], but the case of small disorder is open. The case of vanishing disorder is the adjacency matrix of the random regular graph. The eigenstates of the Anderson model on the random regular graph have also been studied in connection with many-body localization [29,61].
The spectrum of random regular graphs has also received interest from the study of ζ-functions, as it can be related by an exact relationship to the poles of the Ihara ζ-function of regular graphs [16,50]; see also [77,78].
In random matrix theory, the local spectral statistics of the class of (generalized) Wigner matrices are now very well understood; see in particular [20,34,36,37,39,41,42,52,75] and the references therein, as well as the recent book [40]. Many results on local eigenvalue statistics also exist for Erdős–Rényi random graphs, in particular [34,35,48,49]; the latter results apply down to logarithmically small average degree. Similar results have also been proved for more general degree distributions [1]. However, in contrast with the case of regular graphs, these types of results do not hold for Erdős–Rényi graphs of bounded average degree. For a review of other results for discrete random matrices, see also [80].
Our results use the idea of switchings for random regular graphs. Switchings of random regular graphs were introduced to prove enumeration results in [65] (see also [81] for subsequent developments). However, our use of switchings is quite unconventional, in that we apply switchings to many edges on the boundaries of large neighborhoods simultaneously (see Section 6). It generalizes the method we introduced with A. Knowles in [18] for the study of d-regular graphs with d ≥ (log N)^4.

2 Proof outline and main ideas
In this section, we give a high-level outline of the proof of Theorem 1.6, whose details occupy the remainder of the paper. The proof is based on the general principle that, for small distances, a random regular graph behaves almost deterministically, while on the other hand, for large distances, it behaves much like a random matrix.

2.2. Structure of the proof. The proof consists of several parts, which we briefly describe in this section. Here, we also define several subsets of G_{N,d}, namely the sets Ω̄, Ω(z, ℓ), Ω⁻(z, ℓ), and Ω⁺₁(z, ℓ) introduced below. These sets depend on parameters z ∈ C⁺ and ℓ ∈ N (and also on the previously fixed parameters).
Small distance structure; the set Ω̄. The small distance behavior is captured in terms of cycles in neighborhoods of radius R. For any graph, we define the excess to be the smallest number of edges that must be removed to yield a graph with no cycles (a forest). Then, with R, ω, δ as fixed above, we define the set Ω̄ ⊂ G_{N,d} to consist of graphs such that the radius-R neighborhood of any vertex has excess at most ω, and the number of vertices whose R-neighborhood contains a cycle is at most N^δ. The set Ω̄ provides rough a priori stability at small distances. All regular graphs appearing throughout the paper will be members of Ω̄. It is well known that P(Ω̄) ≥ 1 − o(N^{−ω+δ}); see Proposition 3.1.
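The excess of a finite graph equals #edges − #vertices + #components, which the following sketch (ours, for illustration) computes with a union–find pass over the edge list:

```python
def excess(num_vertices, edges):
    """Excess = #edges - #vertices + #components: the minimal number of
    edges whose removal leaves a forest."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - num_vertices + components

print(excess(3, [(0, 1), (1, 2), (2, 0)]))  # a triangle has excess 1; prints 1
```

Trees and forests have excess 0; each independent cycle raises the excess by one.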
Green's function approximation; the sets Ω(z, ℓ) and Ω⁻(z, ℓ). For z ∈ C⁺, we define Ω(z, ℓ) ⊂ Ω̄ to be the set of graphs G such that, for any two vertices i, j in [[N]], the Green's function approximation (2.2) by the localized Green's function holds. Our main goal is to prove that Ω(z, ℓ) has high probability, uniformly in the spectral domain z ∈ D.
That Ω(z, ℓ) has high probability is not difficult to show if |z| is large enough; see Section 5. To extend this estimate to smaller z, we define the set Ω⁻(z, ℓ) ⊂ Ω(z, ℓ) by the same conditions as Ω(z, ℓ), except that the right-hand side in (2.2) is smaller by a factor 1/2. Our main goal is to show that, for any z ∈ D ∩ Λ_ℓ (where the spectral domain D is defined in (1.10) and Λ_ℓ is defined in (15.15)), if Ω(z, ℓ) has high probability, then the event Ω(z, ℓ) \ Ω⁻(z, ℓ) has very small probability, so that Ω⁻(z, ℓ) still has high probability. Then, by the Lipschitz continuity of the Green's function, it follows that Ω⁻(z, ℓ) ⊂ Ω(z′, ℓ) for small |z − z′|, and thus that Ω(z′, ℓ) also has high probability. This can be repeated to show that Ω(z, ℓ) holds for all z ∈ D ∩ Λ_ℓ with high probability. Since the sets Λ_ℓ together cover D, it follows that Ω(z, ℓ_*) holds for all z ∈ D with high probability.
Local resampling. To show that Ω(z, ℓ) \ Ω⁻(z, ℓ) has small probability, we use the random matrix-like structure of random regular graphs at large distances. To this end, we fix a vertex, without loss of generality chosen to be 1, and abbreviate the ℓ-neighborhood of 1 (as a set of vertices in [[N]] and as a graph, respectively; see Section 1.2 for our notational conventions) by T = B_ℓ(1, G). In Section 6, we resample the boundary of the neighborhood T by switching the boundary edges with uniformly chosen edges from the remainder of the graph. The switched graph is denoted by G̃. On the vertex set T, it coincides with the unswitched graph G, but the boundary of T in the switched graph G̃ is now essentially random relative to the original graph G.
Given G, the switching is specified by the resampling data S, which consists of µ independently chosen oriented edges from G^(T). The local resampling is implemented by switching a boundary edge of T with one of the independently chosen edges encoded by S. In fact, in this operation, not all pairs of edges can be switched (are switchable) while keeping the graph simple. Therefore, given S, we denote by W ⊂ [[1, µ]] the index set of switchable edges (see Section 6 for the definition), whose switching leaves the uniform measure on G_{N,d} invariant. For notational convenience, without loss of generality, we assume that W = {1, 2, 3, ..., ν} with ν ≤ µ throughout the paper (except in the definition in Section 6).
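A single switch replaces a pair of disjoint edges {u, v}, {x, y} by {u, x}, {v, y}; this preserves all vertex degrees, and the pair is switchable only if the result is again a simple graph. A minimal sketch of this elementary operation (ours; the paper's actual resampling acts on many boundary edges of T simultaneously and keeps track of orientations):

```python
def switch(edges, e1, e2):
    """Replace edges {u, v} and {x, y} by {u, x} and {v, y} if the pair is
    switchable (the result stays simple); otherwise return None."""
    (u, v), (x, y) = e1, e2
    if len({u, v, x, y}) < 4:
        return None  # shared vertex: the switch would create a loop or lose an edge
    new1, new2 = frozenset((u, x)), frozenset((v, y))
    edge_set = set(map(frozenset, edges))
    if new1 in edge_set or new2 in edge_set:
        return None  # the switch would create a multi-edge
    edge_set -= {frozenset(e1), frozenset(e2)}
    edge_set |= {new1, new2}
    return edge_set

# on the 6-cycle, switching {0,1} with {3,4} keeps the graph 2-regular
cycle = [(i, (i + 1) % 6) for i in range(6)]
switched = switch(cycle, (0, 1), (3, 4))
```

Each of u, v, x, y loses one incidence and gains one, so the degree sequence is invariant, which is why switchings preserve the uniform measure on regular graphs.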
Switching from G to G̃. Throughout Sections 7–14, we condition on a graph G that satisfies certain estimates, and we only use the randomness of the switching that specifies how to modify G to G̃. By our choice of ℓ, and using that T has bounded excess (which we can and do assume), the number of edges in the boundary of T is about (log N)^{O(1)}. The randomness of these edges ultimately provides access to concentration estimates, which exhibit the random matrix-like structure of the random regular graph at large distances.
Note that, if we remove the vertex set T from G, our switchings have a simpler effect than in G: they only consist of removing the edges {b_i, c_i} and adding instead the edges {a_i, b_i}, for i ∈ W. Therefore, instead of studying the change from G to G̃ at once, it will be convenient to analyze the effect of the switching in several steps. For this, we define the following intermediate graphs (which need not be regular). Following the conventions of Section 1.3, the deficit functions of these graphs are given by d − deg, where deg is the degree function of the graph considered, and we abbreviate their Green's functions accordingly: G for G, G^(T) for G^(T), and so on, with G̃^(T) and G̃ for the switched versions.
Distance estimates. To use the local resampling, we require some estimates on the local distance structure of graphs and some a priori estimates on their Green's functions. These are collected in Sections 7-8. In fact, we use both the usual graph distance (of the unswitched and switched graphs) and a notion of "distance" that is defined in terms of the size of the Green's function of the graph from which the set T is removed (again for the unswitched and switched graph).
The need for the Green's function distance arises as follows. While estimates that involve sums over the diagonal of the Green's function can be controlled quite well using only the graph distance, estimates of sums of off-diagonal terms are more delicate, because the number of terms is squared compared to the diagonal terms. By direct combinatorial arguments, it would be difficult to control large distances sufficiently precisely. However, to understand spectral properties, it is the size of the Green's function rather than distances themselves that is relevant; and while the size of the Green's function between two vertices is directly related to the distance between them if there are only few cycles, on a global scale (where many cycles could be present) cancellations can make the Green's function much smaller. These cancellations are captured in terms of a Ward identity, which states that the Green's function G = (H − z)^{-1} of any symmetric matrix H obeys (see also Appendix B)

∑_j |G_ij(z)|² = Im G_ii(z) / Im z.

Removing the neighborhood T and stability under resampling; the sets Ω₁⁺(z, ℓ). Our goal is to show that estimates on the Green's function of G improve near the vertex 1 under the above-mentioned local resampling. For this, we work with the Green's function of the graph G^{(T)} obtained from G by removing the vertex set T (on which the graph does not change under switching).
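The Ward identity invoked above can be verified directly on any small symmetric matrix. A minimal sketch with a hypothetical 2×2 example (H the adjacency matrix of a single edge, chosen so the resolvent has a closed form), not taken from the paper:

```python
# Ward identity check: for G(z) = (H - z)^{-1} with H real symmetric,
#   sum_j |G_ij(z)|^2 = Im G_ii(z) / Im z.
# Toy example: H = [[0, 1], [1, 0]], so that
#   (H - z)^{-1} = [[-z, -1], [-1, -z]] / (z^2 - 1).
z = 0.3 + 0.05j
det = z * z - 1
G = [[-z / det, -1 / det], [-1 / det, -z / det]]

for i in range(2):
    lhs = sum(abs(G[i][j]) ** 2 for j in range(2))  # row sum of |G_ij|^2
    rhs = G[i][i].imag / z.imag                      # Im G_ii / Im z
    assert abs(lhs - rhs) < 1e-12
```

The identity is exact for every z in the upper half-plane; it is what makes sums of off-diagonal Green's function entries controllable even where combinatorial distance bounds fail.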
As a preliminary step to showing that the estimates for the Green's function improve, we show that they are stable under the operations of removing T and of resampling, i.e., roughly, that estimates analogous to those assumed continue to hold. More precisely, in Section 9, we show that if G ∈ Ω(z, ℓ), then the (non-regular) graph G^{(T)} obeys the analogous estimate. We define the set Ω₁⁺(z, ℓ) ⊂ Ω̃ similarly to the set Ω(z, ℓ), except that G is replaced by the graph G^{(T)} (and with a different constant); i.e., Ω₁⁺(z, ℓ) is the set of G ∈ Ω̃ for which the corresponding estimate holds. Clearly, by (2.6), we have Ω(z, ℓ) ⊂ Ω₁⁺(z, ℓ). In Section 10, we show that if G^{(T)} obeys the (stronger) estimate (2.6), then with high probability the resampled graph obeys G̃^{(T)} ∈ Ω⁺(z, ℓ).
Locally improved Green's function approximation; the sets Ω′₁(z, ℓ). The set Ω′₁(z, ℓ) ⊂ Ω̃ is defined by the improved estimates (14.1)-(14.4) near the vertex 1, with constant K = 2^{10}. In Sections 11-14, it is proved that if we start with a graph G ∈ Ω₁⁺(z, ℓ), then, with high probability with respect to the local resampling around vertex 1, the switched graph G̃ belongs to Ω′₁(z, ℓ).

Involution. To sum up, the argument outlined above shows that, for any graph G in Ω(z, ℓ), with high probability with respect to the randomness of the local resampling, the switched graph G̃ is in the set Ω′₁(z, ℓ). However, our goal was to show that a uniform d-regular graph G is in Ω′₁(z, ℓ), except for an event of small probability. This follows from the statement we proved for G̃, using that our switching acts as an involution on the larger product probability space (see Proposition 6.5).
Self-consistent equation. The sets Ω₁⁺(z, ℓ) and Ω′₁(z, ℓ) depend on the choice of the vertex 1. However, for any i ∈ [[N]], we can define Ω′ᵢ(z, ℓ) in the same way, by replacing the vertex 1 in the above definitions by the vertex i (or using symmetry). By a union bound, the intersection of the events Ω′ᵢ(z, ℓ) over i ∈ [[N]] then also holds with high probability. On the latter event, we derive (in Section 15) a self-consistent equation for the quantity Q(G), an average of G^{(i)}_{jj} in which the sum ranges over the set of oriented edges (i, j) in G, and G^{(i)}(G) is the Green's function of the graph G with the vertex i removed. On the infinite d-regular tree, it is a straightforward computation to show that G^{(i)}_{jj}(z) = m_sc(z) holds for any directed edge (i, j) (see Proposition 4.1). For the random regular graph, we will show that Q(G) obeys an approximate version of the corresponding self-consistent equation (see (15.6)). The main result of Section 15, proved using this self-consistent equation, holds for any z ∈ D ∩ Λ_ℓ, where Λ_ℓ ⊂ C⁺ is a domain on which the self-consistent equation is not singular (see Section 15 for details). In the final step, we will use different choices of ℓ to cover the entire spectral domain D.
2.3. Random walk picture. We conclude this section with the following random walk heuristic for the Green's function. The Green's function G_ij(G, z) has a formal expansion in terms of walks in G from i to j with complex z-dependent weights:

G_ij(G, z) = −∑_w z^{−|w|−1} (d − 1)^{−|w|/2},   (2.8)

where the sum ranges over all walks w from i to j of length |w|. In several parts of the paper, it can be useful to think about the Green's function in this picture, though we never use it directly. However, the expansion (2.8) is only absolutely convergent for |z| > √(d−1), where the formal sum is dominated by the shortest walks. In our case of primary interest, Im z ≪ 1 (with Re z inside the spectrum of the adjacency matrix), the expansion becomes highly oscillatory and is not absolutely convergent. Long walks become dominant, and the Green's function can only remain bounded due to significant cancellations.
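For |z| large the walk expansion can be tested directly. A minimal sketch on a hypothetical 3-vertex path, using the unnormalized adjacency matrix (so each walk carries weight −z^{−|w|−1}, without the factor (d−1)^{−|w|/2}); the graph, the helper `inverse`, and the truncation depth are ours, not from the paper:

```python
def inverse(M):
    """Invert a small complex matrix by Gauss-Jordan elimination."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

adj = {0: [1], 1: [0, 2], 2: [1]}   # path 0 - 1 - 2
z = 4.0 + 1.0j                      # |z| large enough for absolute convergence

def walk_sum(i, j, K):
    # sum of -z^{-|w|-1} over all walks w from i to j of length |w| <= K,
    # counting walks of each length via the adjacency recursion
    total = -(1 / z) if i == j else 0.0    # the length-0 walk
    counts = {i: 1}                        # walks of current length ending at v
    for k in range(1, K + 1):
        counts = {v: sum(counts.get(u, 0) for u in adj[v]) for v in adj}
        total += -counts.get(j, 0) / z ** (k + 1)
    return total

H = [[1.0 if v in adj[u] else 0.0 for v in range(3)] for u in range(3)]
G = inverse([[H[i][j] - (z if i == j else 0) for j in range(3)] for i in range(3)])
for i in range(3):
    for j in range(3):
        assert abs(walk_sum(i, j, 60) - G[i][j]) < 1e-10
```

For |z| below the spectral radius the truncated sums oscillate instead of converging, which is the regime where the cancellations described above become essential.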
On the tree, it is easy to compute the Green's function exactly. In particular, one finds that the Green's function is bounded for all z ∈ C⁺, and that, roughly speaking, each step of a walk contributes a factor −m_sc(z)/√(d−1). A popular and very efficient method to exhibit the required cancellations that result from the tree structure is via non-backtracking walks.
Our main effort is not in exhibiting the cancellations resulting from the tree structure, but rather in exhibiting the cancellations of very long walks, where the tree structure ceases to be effective. To obtain these cancellations, we exploit the randomness of the random regular graph in combination with a Ward identity. Using a multiscale approach (implemented via a continuity argument), we successively prove that the Green's function remains bounded even for small Im z, and moreover that it has good decay. Such bounds, together with linear algebra (Schur complement formula, resolvent identity), allow us to obtain well-defined partially resummed versions of random walk identities, in which the crucial cancellations are accounted for nonperturbatively.

3 Structure of random and deterministic regular graphs
In this section, we collect some properties of random and deterministic regular graphs, which we use in the remainder of the paper.
Excess of random regular graphs. For any graph G, we define its excess to be the smallest number of edges that must be removed to yield a graph with no cycles (a forest). It is given by

excess(G) := #edges(G) − #vertices(G) + #connected components(G).   (3.1)

There are different conventions for the normalization of the excess; ours is such that the excess of a tree or forest is 0. Note that if H ⊂ G is a subgraph, then excess(H) ≤ excess(G). We will use the following well-known estimates for the excess in random regular graphs. Most R-neighborhoods are trees; in fact, one can take κ < δ/(2ω + 2).
Proof. The statements are well known; for completeness, we sketch proofs in Appendix A.1.
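The quantity (3.1) is straightforward to compute. A minimal sketch (the union-find helper and the example graphs are ours, not from the paper):

```python
# excess(G) = #edges - #vertices + #components, as in (3.1)

def excess(vertices, edges):
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    components = len(vertices)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                        # merging two components
            parent[ru] = rv
            components -= 1
    return len(edges) - len(vertices) + components

# A tree has excess 0; adding an edge between existing vertices raises it by 1.
tree_edges = [(0, 1), (0, 2), (1, 3), (1, 4)]
assert excess(range(5), tree_edges) == 0
assert excess(range(5), tree_edges + [(3, 4)]) == 1
# A subgraph never has larger excess, consistent with excess(H) <= excess(G).
assert excess(range(5), tree_edges) <= excess(range(5), tree_edges + [(3, 4)])
```

With this normalization, the excess counts the number of independent cycles, which is exactly the quantity controlled in the neighborhoods of a random regular graph.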
Excess and the number of non-backtracking walks. The next proposition bounds the number of non-backtracking walks (NBW) between two vertices of a graph in terms of the excess of the graph. Here a non-backtracking walk of length n is a sequence of vertices (i_0, . . . , i_n) such that consecutive vertices are adjacent, i.e., {i_k, i_{k+1}} is an edge, and such that the walk makes no backward steps, i.e., i_{k−1} ≠ i_{k+1}.
Proposition 3.2. Let G be a graph with excess at most ω. Then the following hold.
For any vertices i, j ∈ G and any k ≥ 1, the number of non-backtracking walks from i to j of given length can be bounded in terms of ω. For any subgraph H ⊂ G and two vertices i, j in H such that E_ℓ(i, j, G) ⊂ H, we have

|{NBW from i to j of length ℓ + k which are not completely in H}| ≤ 2^{ω(k+1)+1}.   (3.5)

The graph G does not need to be regular or finite, and self-loops and multi-edges are allowed.
Proof. The statements are presumably also well known; lacking a reference, we include their proofs in Appendix A.2.
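The definition can be made concrete by enumeration. A small sketch (the graph and function names are ours) counting non-backtracking walks on a cycle, where for each length there is at most one NBW per direction:

```python
def nbw_count(adj, i, j, length):
    """Count non-backtracking walks from i to j of the given length."""
    # state: (previous vertex, current vertex); a step may not return to previous
    walks = [(None, i)]
    for _ in range(length):
        walks = [(cur, nxt) for prev, cur in walks
                 for nxt in adj[cur] if nxt != prev]
    return sum(1 for prev, cur in walks if cur == j)

n = 6
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
# From 0 to 3 in 3 steps: 0-1-2-3 and 0-5-4-3.
assert nbw_count(cycle, 0, 3, 3) == 2
# From 0 to 2 in 2 steps: only 0-1-2; the other direction needs 4 steps.
assert nbw_count(cycle, 0, 2, 2) == 1
assert nbw_count(cycle, 0, 2, 4) == 1
```

On a tree the count between two fixed vertices is 0 or 1; each independent cycle can at most multiply the number of continuations, which is the mechanism behind the bound (3.5).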
Boundary of a neighborhood. In the remainder of the paper, given a graph G on [[N]], we will often fix a vertex, chosen without loss of generality to be 1, and denote its ℓ-neighborhood by T = B_ℓ(1, G), with corresponding vertex set T = B_ℓ(1, G). We further enumerate ∂_E T as {e_1, e_2, . . . , e_µ}, i.e., the e_i are the edges with one vertex in T and one in [[N]] \ T, and correspondingly ∂T = {a_1, . . . , a_µ}, where a_i is the endpoint of e_i not in T. We also write T_i = {v ∈ G : dist_G(1, v) = i} for i = 0, 1, 2, . . . , ℓ.
Lemma 3.5. Assume that B_R(1, G) has excess at most ω, and that ℓ ≪ R. Then the following hold.
After removing T, most boundary vertices of T are isolated from the other boundary vertices (see (3.6)). After removing T, any vertex x ∈ [[N]] \ T can only be close to few boundary vertices of T (see (3.7)). Notice that the graph G \ T is slightly larger than G^{(T)}, because the edges between the vertices T_ℓ and [[N]] \ T are not removed. Recall that we view A and B^{(U)} as subgraphs of B (which has zero deficit function) and that their deficit functions are given by our conventions in Section 1.3.
As a ball, B has by definition exactly one connected component, and by assumption it has excess at most ω. Thus χ(B) ≥ 1 − ω. We recall that e_i is the edge on the boundary of T containing a_i. Thus the graph B \ {e_{i_1}, . . . , e_{i_β}} has at most α + 1 connected components: the component containing the vertex 1 and the components containing the vertices a_i with i ∈ A_j for some j ∈ [[1, α]]. (Notice that for i ∈ A_j with j > α, we did not remove the edge e_i; therefore a_i is still connected to 1.) Thus χ(B \ {e_{i_1}, . . . , e_{i_β}}) ≤ α + 1. It follows that

1 + α ≥ χ(B \ {e_{i_1}, . . . , e_{i_β}}) = χ(B) + β ≥ 1 − ω + β,

and thus β ≤ α + ω. Since, by definition, we have β = ∑_{i=1}^{α} |A_i| ≥ 2α, the first two inequalities in (3.9) follow. The third inequality is trivial for i > α, and for i ≤ α we have |A_i| ≤ β − 2(α − 1) ≤ (α + ω) − 2(α − 1) = ω − α + 2, which implies that |A_i| ≤ ω − α + 2 ≤ ω + 1 as claimed.
Proof of (3.6). By definition, any i, j such that dist_{G^{(T)}}(a_i, a_j) ≤ R/2 belong to the same connected component of A. (Indeed, a_i is at distance ℓ + 1 from the vertex 1 and R ≫ ℓ, and thus B_{R/2}(a_i, G) ⊂ B for any i ∈ [[1, µ]].) In particular, if the set A_i containing i has size 1, then for any j ∈ [[1, µ]] \ {i}, we have dist_{G^{(T)}}(a_i, a_j) > R/2. Recalling that β ≤ 2ω is the number of indices i for which the set A_i containing them has size greater than 1, the claim (3.6) follows from (3.9).
Proof of (3.7). The claim is trivial if x ∉ B, since we then have dist_{G^{(T)}}(x, {a_1, a_2, . . . , a_µ}) ≥ R − ℓ > R/2 by definition. Thus assume that x ∈ B. Let A_j be such that x and the vertices a_i with i ∈ A_j are in the same connected component of A. We first show that the vertices a_p with p ∈ A_k, where k ≠ j, do not contribute to (3.7). Indeed, then x and a_p are in different connected components of A. But since B_{R/2}(a_p, G) ⊂ B, it then follows that dist_{G^{(T)}}(x, a_p) > R/2. Therefore |{p ∈ [[1, µ]] : dist_{G^{(T)}}(x, a_p) ≤ R/2}| ≤ |A_j|, and the claim (3.7) follows from the third inequality in (3.9).

Proof of Proposition 3.4. For the first statement, viewing the annulus A as a subgraph of G^{(T)}, the bound (3.9) immediately implies that the sum of the deficit function over any connected component of A satisfies ∑ g(v) ≤ max_j |A_j| ≤ ω + 1.
where ∂u is the set of neighbors of the vertex u in G. Notice that (3.11) holds, so that the claim follows from (3.12). To prove (3.12), we consider the graph H. Note that H is obtained from B by removing exactly ∑_{i=1}^{k}(|B_i| − 1) edges, and that H is connected. Since by assumption B has excess at most ω, it cannot remain connected after the removal of any ω + 1 edges. This implies (3.12).

4 Trees and tree extension
For the infinite regular tree and for the rooted infinite regular tree with given root degree, it is elementary to compute the Green's function explicitly, as done in the following proposition.
Proposition 4.1. Let Y be the infinite d-regular tree. For all z ∈ C⁺, its Green's function is given by

G_xy(z) = (−m_sc(z)/√(d−1))^{dist_Y(x,y)} m_d(z),   (4.1)

where

m_d(z) := −(z + (d/(d−1)) m_sc(z))^{−1}.   (4.2)
Let Y_0 be the rooted infinite d-regular tree with root degree d − 1. Its Green's function can be written explicitly in terms of m_sc and ℓ(x, y), the depth of the common ancestor of the vertices x and y in Y_0. In particular, if x is the root of Y_0, then G_xx(z) = m_sc(z).
The proof is given below. More general results for Green's functions on regular trees are discussed e.g. in [9, Section 3] and references given there.
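The quantity m_sc appearing in Proposition 4.1 is characterized by the fixed-point equation m = −1/(z + m) with Im m > 0, and with the normalization H = A/√(d−1) each of the d − 1 children of the root contributes m/(d−1) in the Schur complement, so d drops out of the recursion. A numerical sketch (iteration counts, starting point, and function names are ours):

```python
def m_sc(z, iters=10_000):
    # solve m = -1/(z + m) by fixed-point iteration; for Im z > 0 the root
    # with Im m > 0 is the attracting fixed point
    m = 1j                       # arbitrary starting point in the upper half-plane
    for _ in range(iters):
        m = -1 / (z + m)
    return m

def rooted_green(z, depth):
    # G_rr for the rooted tree truncated at the given depth; a leaf has no children
    m = -1 / z
    for _ in range(depth):
        m = -1 / (z + m)
    return m

z = 0.4 + 0.1j                   # Im z small, Re z inside the bulk [-2, 2]
m = m_sc(z)
assert abs(m + 1 / (z + m)) < 1e-10    # fixed point of m = -1/(z + m)
assert m.imag > 0                       # the correct (upper half-plane) branch
assert abs(rooted_green(z, 2000) - m) < 1e-6
```

The contraction factor of the iteration is |m_sc(z)|², which approaches 1 as Im z → 0 inside the bulk; this is a toy version of the difficulty of reaching small spectral scales.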
The main results of this section are the following estimates for P_ij(G_0, z), the Green's function of the tree extension TE(G_0) of a graph G_0, as defined in Definition 1.4.

Proposition 4.2.
Let ω ≥ 6 and √(d−1) ≥ 2^{ω+2}. Let G_0 be a finite graph with vertex set G_0 and deficit function g. Assume that (i) any connected component of G_0 has excess at most ω, and that (ii) the sum of the deficit function over any connected component of G_0 satisfies ∑ g(v) ≤ 8ω. Then the following holds for all z ∈ C⁺ and all i, j ∈ G_0.
(i) The estimate (4.3) holds, and the diagonal terms satisfy the better estimate (4.4).

(ii) Let H_0 ⊂ G_0 be a subgraph with vertex set H_0. Then for any two vertices i, j in H_0 such that E_ℓ(i, j) ⊂ H_0, the ij-th entries of the Green's functions of the tree extensions of G_0 and H_0 satisfy the comparison estimate (4.5).

Item (i) states that P_ij(G_0) is bounded and has (up to constants) the same decay as the Green's function of the infinite d-regular tree Y. In particular, (4.3) and (4.4) together with (1.7) imply that

|P_ij(G_0, z)| ≤ (1 + δ_ij/2)|m_sc(z)|.   (4.6)

Item (ii) states that P_ij(G_0) depends only weakly on G_0. In particular, it implies the following principle, which is used repeatedly throughout Sections 9-14. Let X be a (small) set of vertices in a graph G. For vertices i, j ∈ X, it is often convenient to replace P_ij(E_r(i, j, G)), namely the ij-th entry of the Green's function of the graph TE(E_r(i, j, G)), which itself depends on i, j, by P_ij(G_0) for a graph G_0 which is independent of i, j and contains E_r(i, j, G) for all i, j ∈ X. In this situation, we abbreviate P = P(G_0). The estimate (4.5) then implies that P_ij and P_ij(E_r(i, j, G)) are close, provided the assumptions of (4.5) are obeyed.

Proof of Proposition 4.1. Let dist_Y(x, y) = 1. The Schur complement formula implies

G^{(x)}_yy = −(z + (d−1)^{−1} ∑_{u ∈ ∂y \ {x}} G^{(xy)}_uu)^{−1},

where ∂y is the set of adjacent vertices of y in Y. By homogeneity, G^{(x)}_yy is independent of x and y if dist_Y(x, y) = 1, and is therefore equal to the unique solution of the equation m = −1/(z + m) with Im m > 0, which is m_sc. Applying the Schur complement formula again, it follows that

G_xx = −(z + (d/(d−1)) m_sc(z))^{−1} = m_d(z).

This proves (4.1) and (4.2) for x = y. The case dist_Y(x, y) = 1 then follows, e.g., from

G_xy = −G_xx (d−1)^{−1/2} G^{(x)}_yy = G_xx (−m_sc(z)/√(d−1)),

as claimed. The general case is similar by induction.

4.2. Proof of Proposition 4.2 for g ≡ 0. For the proof of Proposition 4.2, we require the notion of a covering of a graph. Given a graph G, a graph G̃ together with a surjective map π : G̃ → G is a covering of G if, for each x ∈ G̃, the restriction of π to the neighborhood of x is a bijection onto the neighborhood of π(x) in G. Every d-regular graph is covered by the infinite d-regular tree Y, which is its universal cover. The Green's functions of a graph G and a cover G̃ with covering map π : G̃ → G obey the following identity: for each x ∈ G̃ with π(x) = i ∈ G, we have

G_ij(G, z) = ∑_{y ∈ G̃ : π(y) = j} G_xy(G̃, z),   (4.12)

if the right-hand side is summable (see Appendix B for the elementary proof of (4.12)). In particular, if G is an infinite simple d-regular graph and π : Y → G its universal covering map, where Y is the vertex set of Y, then by (4.1) and (4.12), for any vertex x ∈ Y such that π(x) = i, the resolvent entries of the graph G are given by

G_ij(G, z) = ∑_{y ∈ Y : π(y) = j} G_xy(Y, z).   (4.13)

For the number of non-backtracking paths, recall the estimates of Proposition 3.2. Using these, the proofs of (4.3) and (4.5) are straightforward from (4.13) if g ≡ 0.
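The covering identity can be tested numerically in the simplest nontrivial case d = 2, where the n-cycle is covered by the infinite line (the 2-regular tree) and both Green's functions are explicit. A sketch (the closed-form line Green's function follows from Proposition 4.1 with √(d−1) = 1; the function names and parameters are ours):

```python
import cmath

def m_upper(z):
    # root of m**2 + z*m + 1 = 0 with Im m > 0 (this is m_sc in the d = 2
    # normalization, where H = A)
    r = cmath.sqrt(z * z - 4)
    m = (-z + r) / 2
    return m if m.imag > 0 else (-z - r) / 2

def green_line(z, k):
    # Green's function (A - z)^{-1}_{xy} of the infinite line at distance k
    m = m_upper(z)
    return -((-m) ** abs(k)) / (z + 2 * m)

def green_cycle(n, z, j, l):
    # exact Green's function of the n-cycle via its Fourier eigenbasis
    return sum(cmath.exp(2 * cmath.pi * 1j * k * (j - l) / n)
               / (2 * cmath.cos(2 * cmath.pi * k / n) - z)
               for k in range(n)) / n

def green_cycle_cover(n, z, j, l, tmax=200):
    # the covering identity: sum the line Green's function over the fibre of l
    return sum(green_line(z, (j - l) + t * n) for t in range(-tmax, tmax + 1))

z, n = 1 + 0.5j, 6
for j, l in [(0, 0), (0, 2), (1, 4)]:
    assert abs(green_cycle(n, z, j, l) - green_cycle_cover(n, z, j, l)) < 1e-10
```

Summability holds here because |m_sc(z)| < 1 for Im z > 0, so the contributions of the preimages decay geometrically in their distance from x.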
Proof of (4.3) for g ≡ 0. For vertices i, j in different connected components of G_0, we have P_ij(G_0) = 0 and there is nothing to prove. Therefore, we can assume that i and j are in the same connected component.
Since we assume g ≡ 0, the tree extension G_1 = TE(G_0) is d-regular, and (4.13) implies (4.14). Since G_0 has excess at most ω, the same is true for G_1. By the estimates for the number of non-backtracking paths from Proposition 3.2, the right-hand side of (4.13) is summable provided that √(d−1) ≥ 2^{ω+2}, and (4.3) follows. This completes the proof if g ≡ 0.
Proof of (4.5) for g ≡ 0. As in the proof of (4.3), we can assume that i and j are in the same connected component of G_0. By (4.13), since all the non-backtracking paths from i to j of length ℓ are contained in H_0, the difference of the two Green's functions is a sum over non-backtracking walks that are not completely contained in H_0. By (3.5), we therefore obtain (4.5).

To extend the bounds (4.3) and (4.5) to g ≢ 0, we use an alternative representation of P_ij(G_0), given as follows. In Definition 1.4, P_ij(G_0, z) is defined as the Green's function of the infinite graph obtained by attaching a d-regular tree at every extensible vertex of G_0. The next lemma shows that it is equivalently given by attaching to every extensible vertex a self-loop with a z-dependent complex weight. The proof of the lemma follows by an application of the Schur complement formula.
Lemma 4.4. For all i, j ∈ G_0, we have P_ij(G_0, z) = ((H_2 − z)^{−1})_{ij}, where H_2 is the normalized z-dependent adjacency matrix obtained by attaching a self-loop with z-dependent complex weight to any extensible vertex of G_0.

Proof. Let G_1 = TE(G_0), and denote the normalized adjacency matrices of G_0 and G_1 by H_0 and H_1 respectively. Then H_1 has a block form with blocks H_0, B, and D, where D is the normalized adjacency matrix of several copies of Y_0, the rooted infinite d-regular tree with root degree d − 1, and B_xy = 1/√(d−1) if y is an extensible vertex of G_0 and x is the root of one of these copies of Y_0, and B_xy = 0 otherwise. By the Schur complement formula (B.3), it follows that, for any i, j ∈ G_0, the resolvent entries of H_1 coincide with those of H_0 corrected by the Schur complement of D. Thus H_2 = H_0 − (m_sc(z)/(d−1)) diag(g), i.e., the self-loop at an extensible vertex v carries weight −g(v) m_sc(z)/(d−1), and the claim of the lemma follows.
As previously, we abbreviate G_1 = TE(G_0), and denote by G_2 the finite z-dependent graph with complex weights obtained by attaching such a self-loop at each extensible vertex. Moreover, to extend (4.3) and (4.5) from g ≡ 0 to g ≢ 0, we denote by G′_0 the same graph as G_0 but with deficit function g ≡ 0, by G′_1 = TE(G′_0) its tree extension, and by G′_2 the finite z-dependent graph with complex weights obtained analogously from G′_0. We denote the normalized adjacency matrices of G_2 and G′_2 by H_2 and H′_2 respectively.

Proof of (4.3) for g ≢ 0. By Lemma 4.4 and the case g ≡ 0, we have the corresponding bounds for H′_2. Our goal is to estimate the difference (4.15), and the resolvent formula (B.1) implies (4.16). By multiplying both sides of (4.16) by (|m_sc| q^{dist_{G_0}(i,j)})^{−1}, we obtain an inequality for Γ, where the first inequality uses the triangle inequality dist_{G_0}(i, v) + dist_{G_0}(v, j) − dist_{G_0}(i, j) ≥ 0 and q ≤ 1, and the second and third inequalities follow from the assumptions ∑ g(v) ≤ 8ω, √(d−1) ≥ 2^{ω+2}, and ω ≥ 6. By taking the maximum on the left-hand side of the above inequality and rearranging, we get Γ ≤ 2^{ω+2}.
Proof of (4.5) for g ≢ 0. The extension to the case g ≢ 0 again follows by comparing to the case g ≡ 0.
We define H_2 and H′_2 analogously to G_2 and G′_2. Our goal now is to bound the difference Γ. The resolvent identity (B.1) and (4.15) imply (4.17) and (4.18). To bound Γ, we distinguish two cases. Taking the difference of (4.17) and (4.18), dividing both sides by |m_sc| q^{ℓ(i,j)+1}, and then taking the maximum over i, j ∈ H_0, this leads to an inequality for Γ. Since by assumption ∑ g(v) ≤ 8ω, √(d−1) ≥ 2^{ω+2}, and ω ≥ 6, again rearranging the above expression, we get Γ ≤ 2^{2ω+3}. This finishes the proof.

5 Initial estimates
As the first step of the proof of Theorem 1.6, we now show that (1.11) holds whenever |z| ≥ 2d − 1. Indeed, the following proposition states that (1.11) holds deterministically for |z| ≥ 2d − 1, under the assumption that the graph has locally bounded excess, which is guaranteed to hold with high probability by (3.2). (Related results appear in [31].)

Let G be a d-regular graph on N vertices, with excess at most ω in any radius-R neighborhood. Then for any z ∈ C⁺ with |z| ≥ 2d − 1, and any i, j ∈ G, the Green's function of G satisfies the bounds in (5.2).

Proof. We denote the normalized adjacency matrix of G by H (where we recall that the entries are always normalized by 1/√(d−1)). To show the second bound, set τ = (1/2) log(d − 1). Fix a vertex i, and define the diagonal matrix M by M_xx = e^{τ dist_G(i,x)}. The entries of the matrix MHM^{−1} − H are given by (e^{τ(dist_G(i,x) − dist_G(i,y))} − 1) H_xy. Therefore ‖MHM^{−1} − H‖_{∞→∞} and ‖MHM^{−1} − H‖_{1→1} are bounded by d − 1, and by interpolation so is ‖MHM^{−1} − H‖_{2→2}. Since H is real symmetric, its eigenvalues are all contained in the interval [−d/√(d−1), d/√(d−1)]. In particular, for z such that |z| ≥ 2d − 1, the distance of z to the spectrum of H + (MHM^{−1} − H) is at least 1, and thus the corresponding resolvent is bounded, which implies (5.2). This completes the proof.
Thus we can assume dist_G(i, j) < r_0. Let G_0 := B_{r_0+r}(i, G), let G_1 = TE(G_0) be the tree extension of G_0, and let P be the Green's function of G_1. Then, by (4.5), the entries P_ij(E_r(i, j, G)) and P_ij are close. Therefore it suffices to prove the claim with P_ij(E_r(i, j, G)) replaced by P_ij, and an additional factor 1/2 on the right-hand side.
where H is the normalized adjacency matrix on T_0 induced by G, and B is the part of the adjacency matrix corresponding to the edges from ∂T_0 to T_0. Taking the difference of the last two equations, we obtain an expression for any i, j ∈ T_0. Since the radius-R neighborhood of i has excess at most ω, each row of B contains at most ω + 1 nonzero entries. Therefore, by (4.3) and Lemma 5.2, we obtain the claimed decay, using that R > 2r_0. Therefore, by the second bound of (5.2), |G^{(T_0)}_{xy}| is small for all x, y ∈ ∂T_0 except for the diagonal entries and at most 4ω² off-diagonal entries. By the first bound of (5.2), for these remaining entries we have |G^{(T_0)}_{xy}| ≤ 1/d. The same bounds hold for P^{(T_0)} instead of G^{(T_0)}. As a result, using also that R > 4r_0, the right-hand side is bounded as required, and together with (5.4) we conclude the claim.

6 Local resampling by switching
In this section, we define a local resampling of a random regular graph by using switchings. We effectively resample the edges on the boundary of balls of radius ℓ, by switching them with random edges from the remainder of the graph. This resampling generalizes the local resampling introduced in [18], where switchings were used to resample the neighbors of a vertex (corresponding to ℓ = 0). The local resampling provides effective access to the randomness of the random regular graph, which is fundamental for the remainder of the paper.
6.1. Definition. To introduce the local resampling, we require some definitions.

Switchings. A (simple) switching is encoded by a pair of oriented edges S = ((v_1, v_2), (v_3, v_4)). We assume that the two edges are disjoint, i.e., that |{v_1, v_2, v_3, v_4}| = 4. Then the switching consists of replacing the edges {v_1, v_2}, {v_3, v_4} by the edges {v_1, v_4}, {v_2, v_3}, as illustrated in Figure 5. We denote the graph after the switching S by T_S(G). (Double switchings, which we used in [18], are not needed in this paper; henceforth we will therefore refer to simple switchings as switchings.)

Resampling data. Our local resampling involves a center vertex, which by symmetry we now assume to be 1, and a radius ℓ. Given a d-regular graph G, we abbreviate T = B_ℓ(1, G), with vertex set T = B_ℓ(1, G). To be precise, given a graph G, we enumerate ∂_E T as ∂_E T = {e_1, e_2, . . . , e_µ}, and orient the edges e_i by directing them from a vertex l_i ∈ T to a vertex a_i ∈ [[N]] \ T. The directed edges e_i = (l_i, a_i) are illustrated in Figure 6. Note that µ and the edges e_1, . . . , e_µ depend on G.
Then we choose (b_1, c_1), . . . , (b_µ, c_µ) to be independent, uniformly chosen oriented edges of the graph G^{(T)}, i.e., of the edges of G that are not incident to T, and define S_i = (e_i, (b_i, c_i)). The collection S = (S_1, . . . , S_µ) will be called the resampling data for G. By definition, the edges e_i are distinct, but the vertices a_i are not necessarily distinct, and neither are the vertices l_i.
and the set of admissible switchings (6.3). The interpretation of I_i = 1 is that the graph restricted to [S_i] is 1-regular. The interpretation of J_i = 1 is that the edges of S_i do not interfere with the edges of any other S_j. Indeed, the condition |[S_i] ∩ [S_j]| ≤ 1 guarantees that the switchings encoded by S_i and S_j do not influence each other, meaning that T_{S_i} and T_{S_j} commute. We say that the index i ∈ [[1, µ]] is admissible or switchable if i ∈ W.
Let ν := |W| be the number of admissible switchings, and let i_1, i_2, . . . , i_ν be an arbitrary enumeration of W. Then we define the switched graph by applying the admissible switchings T_{S_{i_1}}, . . . , T_{S_{i_ν}} to G, and the switching data accordingly. To make the structure clearer, we introduce an enlarged probability space.
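The effect of a single switching is easy to simulate. A minimal sketch (the cube graph and the simplified setup are ours; switchability here only means that the graph stays simple, not the full admissibility condition of this section):

```python
# A simple switching S = ((v1, v2), (v3, v4)) replaces the edges
# {v1, v2}, {v3, v4} by {v1, v4}, {v2, v3}.

def switch(edges, S):
    (v1, v2), (v3, v4) = S
    new = set(edges) - {frozenset({v1, v2}), frozenset({v3, v4})}
    new |= {frozenset({v1, v4}), frozenset({v2, v3})}
    return new

def degrees(edges):
    deg = {}
    for e in edges:
        for v in e:
            deg[v] = deg.get(v, 0) + 1
    return deg

# hypothetical 3-regular graph on 8 vertices (the cube graph)
cube = {frozenset(e) for e in
        [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4),
         (0, 4), (1, 5), (2, 6), (3, 7)]}
S = ((0, 1), (6, 7))                  # two disjoint oriented edges
switched = switch(cube, S)
assert degrees(switched) == degrees(cube)       # switching preserves all degrees
# applying the reversed switching data restores the original graph
assert switch(switched, ((0, 7), (6, 1))) == cube
```

Degree preservation is what makes switchings a natural move on the set of d-regular graphs, and reversibility is a toy version of the involution property used in Proposition 6.5.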
Equivalently to the definition above, the sets S_i are uniformly distributed over S_i(G), the set of pairs of oriented edges containing e_i and another oriented edge in G^{(T)}. Therefore S = (S_1, S_2, . . . , S_µ) is uniformly distributed over the set S(G) = S_1(G) × · · · × S_µ(G).
Definition 6.1. For any graph G ∈ G_{N,d}, denote by ι(G) = {G} × S(G) the fibre of local resamplings of G (with respect to vertex 1), and define the enlarged probability space G̃_{N,d} as the union of the fibres ι(G), endowed with the measure P̃(G, S) = P(G) P_G(S). Here P(G) = 1/|G_{N,d}| is the uniform probability measure on G_{N,d}, and for G ∈ G_{N,d}, we denote the uniform probability measure on S(G) by P_G.
Let π : G̃_{N,d} → G_{N,d}, (G, S) ↦ G, be the canonical projection onto the first component.
On the enlarged probability space, we define the maps T̃ : G̃_{N,d} → G̃_{N,d} and T := π ∘ T̃. For the statement of the next proposition, recall that G_{N,d} denotes the set of simple d-regular graphs on [[N]]. For any graph T, we have that T̃ maps G̃_{N,d}(T) to itself, and T̃ is an involution: T̃ ∘ T̃ = id.
Proof. The first claim is obvious by construction. To verify that T̃ is an involution, let (G, S) ∈ G̃_{N,d} and abbreviate (G̃, S̃) = T̃(G, S). Then, due to (6.9), the edge boundaries of the ℓ-neighborhoods of 1 have the same number µ of edges in G̃ and G. Moreover, we can choose the (arbitrary) enumeration of the boundary of the ℓ-ball in G̃ such that the switching data correspond for every index.

In other words, that T is measure preserving means that if G is uniform over G_{N,d} and, given G, we choose S uniformly over S(G), then T_S(G) is uniform over G_{N,d}.
Proof. We decompose the enlarged probability space according to the ℓ-neighborhood of 1 as in (6.10). Notice that, given any T, the size of the set S(G) is (by construction) independent of the graph G ∈ G_{N,d}(T). Therefore, given any T, the restricted measure P̃|_{G̃_{N,d}(T)} is uniform, i.e., proportional to the counting measure on the finite set G̃_{N,d}(T). Since, by Proposition 6.3, the map T̃ is an involution on G̃_{N,d}(T), it is in particular a bijection and as such preserves the uniform measure P̃|_{G̃_{N,d}(T)}. Since T̃ acts diagonally in the decomposition (6.10), this implies that the map T̃ preserves the measure P̃. Since P = P̃ ∘ π^{−1} and T = π ∘ T̃, it immediately follows that also T is measure preserving, as claimed.
Figure 7. The figure illustrates the idea of Proposition 6.5. The horizontal axis represents the set of graphs G_{N,d}, and the vertical direction the fibres of possible switchings. In particular, the sets Ω, Ω′, Ω⁺, Ω̃ are represented on the horizontal axis. The area in medium and dark grey represents Ω̃ = T^{−1}(Ω). The sets Ω′ and Ω⁺ and their preimages can be illustrated analogously, but for simplicity we assume for the figure that Ω = Ω⁺. The lightly shaded area bounded by the vertical bars is ι(Ω). In (6.12), we divide Ω̃ \ Ω̃′ into the part contained in ι(Ω⁺) (the second term) and the part outside of ι(Ω⁺) (the first term). The part inside ι(Ω⁺) is small because of assumption (iii). To bound the part outside ι(Ω⁺), we use that T̃ is an involution. This implies that the image under T̃ of the area in dark gray is contained in ι(Ω) (thus its projection to the horizontal axis lies in Ω, as shown in the figure), and does not intersect Ω̃⁺. Its contribution is small by assumption (ii), which implies that ι(Ω) contains most of Ω⁺.
The following general proposition, which makes use of the involution property of T̃, is central to our approach. The idea of its proof is illustrated in Figure 7.
Roughly, the proposition shows that if, for most graphs G ∈ G_{N,d}, an event holds for the switched graph T_S(G) with high probability under the randomness of the switching S, then it also holds with high probability on G_{N,d}. This enables us to condition on a (good) graph G for much of the paper, and then only use the randomness of S, which has a simple probabilistic structure.
More specifically, in our application of the proposition, the set Ω̃ is a large set of regular graphs obeying rough a priori estimates (there are only few cycles), the set Ω is a set of graphs for which the Green's function obeys good estimates, and the set Ω′ is a set of graphs on which the Green's function obeys better estimates (near a given vertex). The proposition states that if, with respect to the resampling, most graphs obey the better estimates, then these estimates also hold on the original probability space with high probability. The sets to which the proposition will be applied are further discussed in Section 2; the proposition itself will be applied in Section 15.
and since T is measure preserving, and since T̃ is a measure-preserving involution, we have the corresponding identities. To bound the probability of Ω̃, we make the following observations. First, Ω ⊂ T̃(Ω) ⊂ ι(Ω). Second, any element (Ĝ, Ŝ) ∈ Ω̃ can be written as Ĝ = T(G, S) for some G ∈ Ω⁺ and S ∈ S(G). Since T̃ is an involution, this (G, S) must in fact be given by (G, S) = T̃(Ĝ, Ŝ). Together this implies that (Ĝ, Ŝ) ∈ Ω̃⁺, and thus that Ω̃ has no intersection with Ω̃⁺. As a result, the claimed estimate follows, where the second inequality holds since G_{N,d} ⊃ Ω and the last inequality follows from the assumptions (i)-(iii).
6.3. Estimates for local resampling. In the following, we give some basic estimates for the local resampling. In particular, we show that, with high probability, most edges are switchable.
(ii) For any positive integer ω, we have the following bound.

Proof. To prove (i), we recall that, for any i, the oriented edge (b_i, c_i) is uniformly chosen from the oriented edges of G^{(T)}. By the definition of T, the number of such oriented edges is at least of order Nd, and since for any vertex x ∈ G^{(T)} the degree obeys deg_{G^{(T)}}(x) ≤ d, the claim follows.

To prove (ii), we need to analyze the events I_i J_i = 0 more carefully. We define disjoint sets Ã_0, Ã_1, Ã′_2 such that (6.15) holds. To bound the number of elements on the right-hand side of (6.15), it suffices to show (6.17), which implies the claim. To prove (6.17), first notice that there is a suitable subset of indices, so that, by (6.13), the corresponding probability is bounded. Similarly, since |∪_j [e_j]| ≤ 2µ, and, for any i ∈ Ã′_2, the analogous bounds hold, where we used that the probability factorizes since the sets Ã_0, Ã_1, Ã′_2 are disjoint.
We also write T_i for the set of vertices at distance i from 1. Further, we enumerate the boundary edges as in Section 3, and the oriented edges (b_1, c_1), . . . , (b_µ, c_µ) are chosen independently and uniformly among the oriented edges of the graph G^{(T)}. We denote by S(G) the set of all possible switching data, so that S is uniformly distributed on S(G). Given switching data S, we denote the set of admissible switchings by W. Without loss of generality, we will assume for notational convenience that W = {1, 2, 3, . . . , ν}, where |W| = ν ≤ µ.
To study the change of graphs before and after local resampling, we define the following graphs (which need not be regular).
G is the original unswitched graph, and the remaining graphs are obtained from it by removing T and by switching. Following the conventions of Section 1.3, the deficit functions of these graphs are given by d − deg, where deg is the degree function of the graph considered, and we abbreviate their Green's functions by G, G^{(T)}, Ĝ^{(T)}, G̃^{(T)}, and G̃ respectively.

7 Graph distance between switched vertices
This section provides estimates on the distances between the vertices participating in the switching, in the graph with the vertex set T removed (before and after switching). It can be helpful to think about these estimates in terms of the sets K_x ⊂ [[N]] \ T defined below. By (3.6), except for at most 2ω many indices, the set K_{a_i} does not intersect the other sets K_{a_j}.
Roughly speaking, in this section it is shown that, for any graph G ∈ Ω, the following estimates hold with high probability under P_G: except for at most ω many indices (7.5), the sets K_{b_i} are trees.
By symmetry, the same statements hold with b replaced by c. More precisely, in the remainder of this section, we show that the estimates stated in the following propositions hold.
Proposition 7.1. For any graph G ∈ Ω (as in Section 2.2), the following holds with P_G-probability at least 1 − o(N^{−ω+δ}): Note that B_a is the set of indices i such that K_{a_i} is not disjoint from all other sets K_a and K_b, and that B_b is the set of indices i such that K_{b_i} is not disjoint from all other sets K_b and K_a.
We will show that the estimates (7.3) and (7.4) also imply the following estimates for the switched graph G̃^{(T)}. The remainder of this section is devoted to the proofs of Propositions 7.1 and 7.2.
7.1. Proof of Proposition 7.1. Recall that the oriented edges (b i , c i ) are independent and distributed approximately uniformly, so that (6.13) holds. The claims essentially follow from this.
Proof of (7.2). In any graph with degree bounded by d, the number of vertices at distance at most R/2 from a vertex x is bounded by Since b_1, . . . , b_µ are independent, it therefore follows that where, in the last inequality, we used that (4( Proof of (7.3). Recall the annulus A and the sets A_1, A_2, . . . from Lemma 3.5. By (3.9), |A_1 ∪ · · · ∪ A_α| ≤ 2ω, and, for any i ∈ A_{α+1} ∪ A_{α+2} ∪ · · ·, the vertex a_i is at distance at least R in G^{(T)} from the other vertices a_j. It follows that By a union bound, the right-hand side is bounded by a sum over index sets {j_1, . . . , j_ω}, where the sum over A′ runs through the subsets of A_{α+1} ∪ A_{α+2} ∪ · · · with |A′| = ω, and the sum over B′ runs through the subsets of [[1, µ]] with |B′| = ω. Notice that if a_k and a_m are in different connected components of A, then dist_{G^{(T)}}(a_k, b_i) ≤ R/2 and dist_{G^{(T)}}(a_m, b_j) ≤ R/2 imply that b_i and b_j are in different connected components of A (those of a_k and a_m, respectively), and in particular that b_i ≠ b_j. As a consequence, the indices j_1, . . . , j_ω must be distinct, and in particular the random variables b_{j_1}, . . . , b_{j_ω} are independent. Thus the previous expression is bounded by where we used that there are at most µ^ω choices for each of A′ and B′, and the estimate (7.8) with x = a_{i_1}, . . . , a_{i_ω}.
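The volume bound used in the proof of (7.2) is elementary and worth recording explicitly (a standard fact, stated in our own notation): in any graph of maximal degree d with d ≥ 3, each vertex at distance k from x has at most d − 1 neighbors at distance k + 1, so

```latex
|B_r(x,G)| \;\le\; 1 + d\sum_{k=0}^{r-1}(d-1)^{k}
          \;=\; 1 + d\,\frac{(d-1)^{r}-1}{d-2}
          \;\le\; 4(d-1)^{r}.
```

With r = R/2, this is the source of the factor of order (d − 1)^{R/2} appearing in the union bound above.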
Proof of (7.5). By the assumption G ∈ Ω, all except at most N^δ many vertices have radius-R tree neighborhoods. In particular, the same holds for G^{(T)}. By (6.13), it follows that P_G(the radius-R neighborhood of c_i contains cycles) ≤ 2N^{−1+δ}.
The claim then follows by the union bound, using that the number of ways to choose ω + 1 elements from µ elements is bounded from above by µ^{ω+1}, that δ < 1/ω, and that µ ≤ 2(d − 1)^{ℓ+1} = (log N)^{O(1)} by the choice of parameters in Section 2.

7.2. Proof of Proposition 7.2.
Proof of (7.6). By the definition of the sets B_a and B_b, for any i ∈ [[1, µ]], Since G̃^{(T)} is obtained from G^{(T)} by removing the edges {b_i, c_i}_{i ≤ ν} and adding the edges {a_i, b_i}_{i ≤ ν}, the claim (7.6) follows directly from (7.9).
Proof of (7.7). We consider three cases.

8 Green's function distance and switching cells
The bounds of Section 7 provide accurate control for distances at most R/2. However, random vertices are typically much further from each other, and, as mentioned in Section 2, we require stronger upper bounds on the Green's function for such large distances. These bounds are in fact a general consequence of the Ward identity, which holds for the Green's function of any symmetric matrix (see (B.6)). To make use of it, we introduce a much coarser measure of distance in terms of the size of the Green's function as follows. Figure 9. The S-cells are defined in terms of the unswitched graph. In the switching process, distances between S-cells may decrease. This is accounted for by the coarser S′-cells. For later use, we note the following elementary properties of S-cells: for any x ∈ S_i and y ∈ S_j such that i ≠ j, we have |G^{(T)}_{xy}| If b_k ∈ S_i, then also c_k ∈ S_i; and, consequently, if b_k ∈ S′_i then c_k ∈ S′_i.
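Since the Ward identity is the engine behind these coarse distance bounds, we record a quick numeric sanity check (our own illustration, not from the paper): for the resolvent G(z) = (H − z)^{−1} of any real symmetric matrix H and z = E + iη, one has Σ_y |G_xy(z)|² = Im G_xx(z)/η.

```python
# Numeric check of the Ward identity  sum_y |G_xy(z)|^2 = Im G_xx(z) / eta,
# valid for the resolvent G(z) = (H - z)^{-1} of any real symmetric H.
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 6))
H = (H + H.T) / 2                      # symmetrize
z = 0.3 + 0.05j                        # eta = Im z = 0.05
G = np.linalg.inv(H - z * np.eye(6))

lhs = np.sum(np.abs(G[0, :]) ** 2)     # sum_y |G_{0y}|^2
rhs = G[0, 0].imag / z.imag            # Im G_{00} / eta
assert abs(lhs - rhs) < 1e-10
```

The identity follows from G − G* = 2iη G G*, which is why it holds for every symmetric matrix, with no randomness involved.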

8.2. Estimates. From now on, we fix the parameters M and ω′ by where δ > 0 was fixed in Section 2. The next proposition shows that the cells do not cluster.
Proposition 8.2. For any graph G ∈ Ω_1^+(z, ℓ) (as in Section 2.2), with probability at least 1 − o(N^{−ω+δ}) under S, the following estimates hold: In particular, x is ∼-connected to at most ω′ of the S-cells.
Except for at most ω ′ many indices i, the vertex b i is a singleton in the graph R, and thus the S-cell In particular, each S-cell contains at most ω ′ of {b 1 , b 2 , . . . , b µ }.
Most S ′ -cells are far from the other vertices participating in the switching: In particular, each S ′ -cell contains at most ω ′ + 5ω of {b 1 , b 2 , . . . , b ν }.
In the remainder of this section, we prove the above proposition. It is essentially a straightforward consequence of the definitions, combined with union bounds. Proof. We show that dist_{G^{(T)}}(x, y) ≤ 8r implies x ∼ y. Assume that dist_{G^{(T)}}(x, y) ≤ 8r. Then there must be a vertex u such that dist_{G^{(T)}}(x, u) ≤ 4r and dist_{G^{(T)}}(y, u) ≤ 4r. Moreover, by the definition (2.7) of Ω_1^+(z, ℓ) and the estimate (4.4), also |G The inequality (8.9) implies |Ṽ_x| ≤ 2N/M^2, and, since any vertex has at most 2(d − 1)^{4r} vertices in its radius-4r neighborhood, we also have |V where the second inequality holds because b_i is approximately uniform (6.13).
Proof of (8.5). The proof is similar to that of (7.2). By the union bound and (8.8), we have where, in the second inequality, we used that there are at most µ^{ω′} choices of indices and that since µ ≤ d^{ℓ+1} and by the definition (8.4) of M.
Proof of (8.6). The proof is similar to that of (7.4). Indeed, by the union bound and (8.8), as needed.
Proof of (8.7). Since the graph G̃^{(T)} is obtained from G^{(T)} by removing the edges {b_j, c_j}_{j ≤ ν} and adding the edges {a_j, b_j}_{j ≤ ν}, we also have Moreover, for any other S-cell S_j ≠ S, S_1, we have where we used (8.11), r ≪ R, and the definition (8.3) of S-cells, i.e. S = B_{2r}(b_i, G^{(T)}) and S_j = B_{2r}(I_j, G^{(T)}). Thus S is an S′-cell itself, and

9 Stability under removal of a neighborhood
The following deterministic estimate shows that removing the neighborhood T from the graph G has a small effect on the Green's function in the complement of T: |G^{(T)}_{ij} − P_{ij}(E_r(i, j, G))| ≤ |m_sc| q^r. (9.1) As discussed in Section 2, the removal of T is useful because our switchings have a smaller effect in G^{(T)} than they do in G. Indeed, in the original graph G, our switchings have the effect of removing two edges and adding two edges, while in G^{(T)} our switchings only remove the edges {b_i, c_i}_{i ≤ ν} and add the edges {a_i, b_i}_{i ≤ ν}. In the next few sections, we therefore work with G^{(T)} and its switched version G̃^{(T)}, and only return to the full graph in Section 12.
The remainder of this section is devoted to the proof of the proposition. The main ingredients are (i) that, given any i, j, there can only be a few vertices in T that are close to i or j, by the deterministic assumption on the excess of R-neighborhoods, and (ii) that, for all other vertices in T, the decay of the Green's function implied by (9.1) shows that their removal has a small effect.

9.1. Step 1: Removal of vertices close to i or j. From (6.24), recall that T_ℓ = {v ∈ G : dist_G(1, v) = ℓ} is the inner vertex boundary of T. The first step of the proof of Proposition 9.1 consists of removing the vertices in T_ℓ that are close to i or j. The set of such vertices is where G \ T is obtained from G by removing the subgraph T induced by G on T (but not removing T_ℓ). Then |U| ≤ 2ω + 2 by (3.8). The following proposition shows that the Green's function remains locally approximated after removing U.
The proof of Proposition 9.2 follows a general structure that occurs repeatedly in similar estimates throughout the paper.
(i) The first ingredient in this structure, which we refer to as localization, replaces the Green's function P_ij(E_r(i, j, G)) of the vertex-dependent graph E_r(i, j, G) by the Green's function P_ij = P_ij(G_0) of a graph G_0 that does not depend on i, j, by an application of Remark 4.3. For this, among other things, we need to verify the assumptions of Proposition 4.2.
(ii) The second ingredient, which we refer to as the starting point for the argument, is an algebraic relation that expresses the quantity to be estimated in a convenient form. The starting point typically follows from the Schur complement formula or the resolvent formula.
(iii) The third ingredient is a collection of previously established estimates required to estimate the expressions given by the starting point. It typically includes estimates on elements of Green's functions and graph distances.
The actual proofs then usually follow by combining the above ingredients. In principle, this step is straightforward, but often several different cases need to be distinguished, which makes some of the arguments somewhat lengthy.
Below we provide the first instance of the strategy described above to prove Proposition 9.2.
Figure 10. The innermost disk shows T, the second largest disk the set X, and the outermost disk G_0. For any i, j ∈ X, the graph E_r(i, j, G) is contained in G_0.
Localization. We approximate P_ij(E_r(i, j, G)) by a vertex-independent Green's function P_ij according to Remark 4.3, applied with G_0 = B_{3r}(1, G) and X = B_{2r}(U, G). We abbreviate Verification of assumptions in Proposition 4.2. As subgraphs of G ∈ Ω, the radius-R neighborhoods of G_0 and G Starting point. To remove U, we apply the Schur complement formula (B.4): for any i, j ∈ G^{(U)}, Our goal is to show that the difference G^{(U)} − P^{(U)} is small, by using that the difference of G and P is small. As evident from the right-hand sides of (9.6), for this we require upper bounds on the entries of G and (G|_U)^{−1} (and analogously for P and (P|_U)^{−1}).
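The Schur complement formula in the form used here (cf. (B.4)) states that, for i, j ∉ U, G^{(U)}_{ij} = G_{ij} − Σ_{x,y∈U} G_{ix} ((G|_U)^{−1})_{xy} G_{yj}, where G|_U is the U × U submatrix of G. This is a standard resolvent identity; the following self-contained numeric check (our own, with an arbitrary symmetric test matrix) verifies it:

```python
# Check the Schur complement formula for removing a vertex set U:
#   G^(U)_ij = G_ij - sum_{x,y in U} G_ix ((G|_U)^{-1})_xy G_yj.
import numpy as np

rng = np.random.default_rng(1)
n = 8
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                     # symmetric test matrix
z = 0.2 + 0.1j
G = np.linalg.inv(H - z * np.eye(n))  # full Green's function

U = [0, 1]                            # removed vertices
V = [i for i in range(n) if i not in U]
GU = np.linalg.inv(H[np.ix_(V, V)] - z * np.eye(len(V)))  # Green's fn with U removed
GinvU = np.linalg.inv(G[np.ix_(U, U)])                    # (G|_U)^{-1}

for ii, i in enumerate(V):
    for jj, j in enumerate(V):
        pred = G[i, j] - G[i, U] @ GinvU @ G[U, j]
        assert abs(GU[ii, jj] - pred) < 1e-9
```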
These bounds imply the upper bounds for the entries of (G|_U)^{−1} stated in the following claim. The claim essentially follows from the fact that the off-diagonal entries of G|_U are much smaller than the diagonal entries, which have size roughly |m_sc|. Claim 9.3. Under the assumptions of Proposition 9.1, for any U ⊂ T with |U| ≤ 2ω + 2, and any x, y ∈ U, Proof. By the identity G|_U (G|_U)^{−1} = I_{U×U}, we have Let Γ := max_{x,y∈U} |(G|_U)^{−1}_{xy}|. Then (9.7) and (9.9) imply Taking the maximum over x, y ∈ U in the equation above and using (9.7) gives provided that √(d − 1) ≥ (ω + 1)2^{ω+6}. The bound Γ ≤ 2/|m_sc| follows by rearranging. The same argument applies to P|_U, and we obtain (9.8).

9.2. Step 2: Estimate of G^{(U)}_{ij}. By the definition of U, there are no vertices in T \ U that are close to i or j in the graph G^{(T)}. Thus this step mostly follows from the decay of the Green's function together with the assumption that there are few cycles.
Starting point. Define G_0 = B_{3r}(1, G) and G_1 = TE(G_0) as in Section 9.1. The normalized adjacency matrices of G^{(U)} and G where H^{(U)} is the normalized adjacency matrix of T^{(U)}. The nonzero entries of B and B_1 occur for the indices (i, j) ∈ {a_1, . . . , a_µ} × T_ℓ \ U and take values 1/√(d − 1). Notice that B_ij = (B_1)_ij. We denote the normalized Green's functions of G^{(U)} and G and also Proof. For any x ∈ T_ℓ \ U, by (9.15) we have In the last inequality, we used that the excess of T^{(U)} is at most ω, so that, for any x ∈ T_ℓ \ U, we have deg_{T^{(U)}}(x) ≤ ω + 1 and thus Σ_{y∈T\U} H^{(U)}_{xy} The terms in the last sum in (9.18) vanish unless y ∈ T_ℓ \ U. Therefore the sum is bounded by For the first sum in (9.19), the number of vertices a_i adjacent to x is at most d − 1. For the second sum, by (3.6), for all pairs i ≠ j with at most (2ω)^2 exceptions, dist_{G^{(T)}}(a_i, a_j) > R/2. For these pairs, a_i and a_j are in different connected components of the graph, which means that |(D_1 − z)^{−1}_{a_i a_j}| = 0. Therefore there are at most (2ω)^2 non-vanishing terms in the second sum. We use also that which follows from (4.6), provided that √(d − 1) ≥ 2^{2ω+3}. Therefore, By combining (9.18) and (9.20), we have shown that provided that √(d − 1) ≥ 8(ω + 1). This completes the proof.
Claim 9.5. For any x, y ∈ T \ U,
This finishes the proof.

10 Stability under switching
We recall the S-cells and S ′ -cells from Definition 8.1, and the set of switching data S(G) from Section 6.2. The results of this section are the following stability estimates.
Proposition 10.1. Let z ∈ C_+, let G ∈ Ω (as in Section 2.2) be a d-regular graph, and let K ≥ 2 be a constant such that, for all i, j ∈ [[N]] \ T, Then there exists an event F(G) ⊂ S(G) with P_G(F(G)) = 1 − o(N^{−ω+δ}), explicitly defined in Section 10.1 below, such that, for any S ∈ F(G) such that G̃ = T_S(G) ∈ Ω, the following hold: for i, j ∈ [[N]] \ T such that j ∈ S_t and i ∼ S_t for some t, For all estimates, we assume In particular, for any G ∈ Ω(z, ℓ), Proposition 9.1 implies that the assumptions of Proposition 10.1 are satisfied with K = 2. Thus Propositions 9.1 and 10.1 together show that, for any graph G ∈ Ω(z, ℓ), with high probability under S, the switched graph G̃ belongs to Ω_1^+(z, ℓ) (as in Section 2). Then, for any G ∈ Ω_1^+(z, ℓ), we have Indeed, (i) follows from Proposition 6.6, (ii) follows from (7.5), (iii) follows from Propositions 3.3 and 7.1, and (iv) follows from Proposition 8.2.
10.2. Proof of (10.2). The proof of (10.2) follows the structure described below (9.4). Moreover, similarly to the proof of Proposition 9.1, we distinguish between vertices i, j that are close to the edges that get removed in going from G^{(T)} to Ĝ^{(T)} and vertices that are far from these edges. We first focus on i, j that are close to those edges.
Localization. First, we replace P_ij(E_r(i, j, G)) by the vertex-independent Green's function P_ij, using Remark 4.3 with Moreover, we define Ĝ_0 to be the graph obtained by removing the edges {b_i, c_i}_{i ≤ ν} from G_0. The deficit function of Ĝ_0 is defined to be the restriction of that of Ĝ^{(T)}. We abbreviate Notice that Ĝ_1 is equivalently obtained by removing the edges {b_i, c_i}_{i ≤ ν} from G_1. The following properties of G_0 follow from (7.3) and (7.4). Since the deficit function of G_0 (respectively Ĝ_0) is the restriction of that of G^{(T)} (respectively Ĝ^{(T)}), any of the vertices a_i, b_i, c_i ∈ K contributes 1 to the sum of the deficit function over K. By Claim 10.2, the sums of the deficit functions over any of the connected components of G_0 and Ĝ_0 are therefore bounded by 3ω + 2 × 2ω ≤ 8ω. Thus the assumptions of (4.5) are verified for both G_0 and Ĝ_0, and, for any i, j ∈ X, provided that √(d − 1) ≥ 2^{ω+2}; an analogous estimate holds for P̂. Up to a small error, we can therefore use P instead of P(E_r(i, j, G^{(T)})) and P̂ instead of P(E_r(i, j, Ĝ^{(T)})).
Starting point. By the resolvent identity (B.1), we have: Taking the difference of (10.10) and (10.11), we obtain yj − P̂_yj), (10.12) where the summation is over the oriented edges We regard (10.12) as an equation for Ĝ^{(T)} − P̂, and will show that Ĝ_ij − P̂_ij is small as a consequence of the smallness of G^{(T)} − P.
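The resolvent identity invoked here is, in its general form (cf. (B.1)),

```latex
A^{-1}-B^{-1}=A^{-1}\,(B-A)\,B^{-1},
```

applied with A and B the shifted normalized adjacency matrices before and after the edge removal. Since B − A is supported on the removed edges {b_i, c_i}, the difference of the two Green's functions is expressed as a sum over those oriented edges, which is exactly the structure of (10.12).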
Green's function estimates. We first collect some estimates on Green's functions, used repeatedly: (10.14) The first estimate follows from (4.6) and (10.1); the second estimate follows from the assumption (10.1) and P_ij(E_r(i, j, G^{(T)})) = 0; the last estimate holds by the definition of ∼ in Section 8.1.
Proof of (10.2) in the remaining case. In the remaining case, at least one of i, j is not contained in X; by symmetry, we can assume that i ∉ X. Then E_r(i, j, G^{(T)}) = E_r(i, j, Ĝ^{(T)}), and the graphs on both sides of the equality also have the same deficit function. It therefore suffices to show that |Ĝ_ij| is small. By the resolvent identity (10.10), we have Since i ∉ X, we have dist_{G^{(T)}}(i, {b_k, c_k}) ≥ 2r and therefore, by (10.14), For the case that exactly one of i, j is in X, i.e. i ∉ X and j ∈ X, we now decompose the set E defined in (10.13) For (x, y) ∈ E′_1, since y, j ∈ X, |Ĝ^{(T)}_{yj}| ≤ |P_{yj}(E_r(y, j, Ĝ^{(T)}))| + 2K|m_sc|q^r ≤ 2|m_sc| by (10.16) and (4.6), and there are at most 2ω terms, i.e. |E′_1| ≤ 2ω by (7.2), where now [· · ·] refers to the terms in the sum in (10.17). For (x, y) ∈ E′_2, since y ∉ X and j ∈ X, |Ĝ^{(T)}_{yj}| ≤ 2K|m_sc|q^r by (10.16), and |E′_2| ≤ 2ω′ by (8.5). Combining the sums over E′_1, E′_2, E′_3, from (10.17) we obtain This concludes the proof of (10.2) for i ∉ X and j ∈ X: |Ĝ_ij − P_ij(E_r(i, j, G^{(T)}))| ≤ 2K|m_sc|q^r.

(10.20)
For the case that i, j ∉ X, noticing that dist_{G^{(T)}}(b_k, j) > 2r, we decompose the set E as E = E′_2 ∪ E′_3, where E′_2 and E′_3 are defined in (10.19). By (10.20), for any (x, y) ∈ E, P_{yj}(E_r(y, j, Ĝ^{(T)})) = 0 and thus |Ĝ^{(T)}_{yj}| ≤ 2K|m_sc|q^r. Then the same argument as above implies This finishes the proof of the stability of Ĝ^{(T)}.
where E is as in (10.13). Notice that if i, j are in different S-cells, then, for any (x, y) ∈ E, either i, x are in different S-cells, or y, j are in different S-cells. Similarly, if j ∈ S_t and i ∼ S_t for some t, then, for any (x, y) ∈ E, either |G These estimates follow from (10.2), together with (4.6) for the bound for all x, y, with P_ix(E_r(i, x, G^{(T)})) = 0 for dist_{G^{(T)}}(i, x) ≥ 2r, and with P_yj(E_r(y, j, Ĝ^{(T)})) = 0 for dist_{Ĝ^{(T)}}(y, j) ≥ 2r. The last bound in (10.22) holds by the definition of ∼.
Localization. The switching vertices that are not on the boundary of T after switching are given by By our construction of S-cells and S′-cells, it follows that X_1 = S_1 = S′_1 and X_2 = ∪_{i=2}^{κ} S_i = ∪_{i=2}^{κ′} S′_i. By our conventions, the deficit function of the graph G_0 is the restriction of that on G^{(T)}. We define the graph Ĝ_0 by removing the edges {b_i, c_i}_{i ≤ ν} from G_0, and G̃_0 by adding the edges {a_i, b_i}_{i ≤ ν} to Ĝ_0. The graphs Ĝ_0 and G̃_0 are given the restricted deficit functions from Ĝ^{(T)} and G̃^{(T)} respectively. We abbreviate Notice that the graph Ĝ_1 is obtained by removing the edges {b_i, c_i}_{i ≤ ν} from G_1, and that the graph G̃_1 is obtained by adding the edges {b_i, a_i}_{i ≤ ν} to Ĝ_1. We use the following fact throughout this section.
Verification of assumptions in Proposition 4.2. By assumption, G̃ = T_S(G) ∈ Ω. Since Ĝ_0 and G̃_0 can be viewed as subgraphs of G and G̃ respectively, their radius-R neighborhoods have excess at most ω. Moreover, the same argument as in Section 10.2 implies that the sums of the deficit functions over each connected component of Ĝ_0 and of G̃_0 are bounded by 8ω. Therefore the assumptions for (4.5) are verified for both graphs Ĝ_0 and G̃_0. Thus (4.3)-(4.5) hold for P̂ and P̃, and, as in (10.9), we can use P̂ instead of P(E_r(i, j, Ĝ^{(T)})) and P̃ instead of P(E_r(i, j, G̃^{(T)})).
Starting point. The proof is similar to that of (10.2). By the resolvent identity (B.1), we have Taking the difference of (10.27) and (10.28), we have yj − P̃_yj), (10.29) where the sums are over the ordered pairs We regard (10.29) as an equation for G̃^{(T)} − P̃, and will show that G̃^{(T)} − P̃ is small, using that Ĝ^{(T)} − P̂ is small by (10.2).
Green's function estimates. We collect some estimates on Green's functions, which are used repeatedly in the proof: The first estimate follows from (4.6) and (10.2); the second estimate follows from P_ij(E_r(i, j, Ĝ^{(T)})) = 0 and (10.2); the last estimate is from (10.3).

11 Improved decay in the switched graph
In the graph G̃ = T_S(G), the edge boundary ∂_E T and the vertex boundary ∂T of T are given by The vertices ã_i with i ∈ [[1, ν]] are those that get switched, and the vertices ã_i = a_i with i ∈ [[ν + 1, µ]] are those for which the switching does not take place. Here, recall from Remark 6.7 that we assume without loss of generality that the index set of admissible switchings is [[1, ν]]. The result of this section is the following proposition, showing that (i) between most vertices in I the Green's function is small, and (ii) for any vertex not in I, the Green's function between it and most vertices in I is also small. The decay asserted by the proposition is better than that between the boundary vertices of T which we assumed in the unswitched graph. This improvement is crucial for the subsequent sections, in particular for the derivation of the self-consistent equation.
Proposition 11.1. Under the same assumptions as in Proposition 10.1, let S ∈ F(G) (as in Section 10.1) and assume that G̃ = T_S(G) ∈ Ω (as in Section 2.2). Then there exists J ⊂ [[1, ν]] with |J| ≥ ν − ω′ − 6ω such that, for any k ∈ J, The proposition uses the randomness of the resampling via the properties of the Green's function that are encoded by the S′-cells. Indeed, recall that if c_k were a random index, independent of G̃^{(T)} and i, then the size of the right-hand sides would be of order 1/√(Nη) ≪ |m_sc|q^{3r+2} by the Ward identity (B.6). The remainder of this section is devoted to the proof of the proposition.

Preliminaries.
To prove Proposition 11.1, we use the same setup as in the proof of (10.4). Thus, from (10.25) and the paragraph below it, recall the sets X_1, X_2 and the graphs G_0, Ĝ_0, G̃_0, and that the set X_1 ∪ X_2 is contained in G_0 (the vertex set of G_0). We also recall the S′-cells defined in Section 8.1.
We will prove Proposition 11.1 with the set J ⊂ [[1, ν]] given by the set of indices k ∈ [[1, ν]] such that the following conditions hold: (i) b_k, c_k ∈ X_2 (i.e. the S-cell containing b_k and c_k is not S_1); (ii) B_R(c_k, G^{(T)}) is a tree; (iii) the S′-cell S′ containing b_k and c_k is not S′_1 (as implied by (i)) and satisfies By the assumption S ∈ F(G), and using the definition of F(G) given in Section 10.1, note that (7.5) and (8.7) hold. (7.5) implies that condition (ii) in the definition of J holds for all k ∈ [[1, ν]] with at most ω exceptions. (8.7) implies condition (i), and further that condition (iii) holds for all k ∈ [[1, ν]] with at most ω′ + 5ω exceptions. It follows that |J| ≥ ν − ω′ − 6ω, as asserted in the statement of Proposition 11.1. With this definition of J, to prove Proposition 11.1, we now follow the structure described below (9.4) (without the localization step, which is not required here).
where the summation is over the ordered pairs By our assumption on η, the first term on the right-hand side of (11.6) is smaller than the right-hand sides of (11.2)-(11.4), so we only need to estimate the sum on the right-hand side of (11.6).
Green's function estimates. To estimate the sum on the right-hand side of (11.6), we use the following estimates on Green's functions, which hold for (x, y) ∈ E: The last bound in (11.8) holds by (10.3). The remaining estimates follow from Proposition 10.1, together with (4.6) for the bound for all x, y; with P_ix(E_r(i, x, Ĝ^{(T)})) = 0 for the bound for dist_{Ĝ^{(T)}}(i, x) ≥ 2r; and with P_{y c_k}(E_r(y, c_k, G̃^{(T)})) = 0 for the bound for dist_{G̃^{(T)}}(y, c_k) ≥ 2r.
Distance estimates. Since the estimates (11.8)-(11.9) depend on distances, we need some estimates on distances in the graphsĜ (T) andG (T) . These are summarized in the following lemma.
Lemma 11.2. Let k ∈ J and S ′ be the S ′ -cell that contains c k . Then the following estimates hold.
(i) In the graph G̃^{(T)}, the vertex c_k is far away from {a_1, . . . , a_µ, b_1, . . . , b_ν}: dist_{G̃^{(T)}}(c_k, {a_1, . . . , a_µ, b_1, . . . , b_ν}) > 2r. (11.10) (ii) If dist_{G̃^{(T)}}(i, S′) > 2r, then dist_{Ĝ^{(T)}}(i, a_k) ≥ dist_{G̃^{(T)}}(i, a_k) ≥ 2r. (11.11) (iii) If i ∈ X_1 and dist_{G̃^{(T)}}(i, a_k) ≥ 2r, then dist_{G̃^{(T)}}(i, S′) > 2r. (11.12) Notice also that, by the definition of J, Proof. To prove (i), it follows from (11.5) and the definition of J that It remains to prove dist_{G̃^{(T)}}(c_k, {a_k, b_k}) > 2r. Given any geodesic in G̃^{(T)} from c_k to {a_k, b_k}, we distinguish two cases. In the first case, the geodesic contains one of the edges {a_m, b_m}_{m ≤ ν}; then the condition (11.5), which holds by the definition of J, implies that its length is larger than 2r. In the second case, the geodesic contains none of the edges {a_m, b_m}_{m ≤ ν}, so it is a path in the graph Ĝ^{(T)}. Therefore, to prove (i), it suffices to show that (11.10) holds with the graph G̃^{(T)} replaced by Ĝ^{(T)}. By the condition b_k, c_k ∈ X_2, and since b_k, c_k are adjacent in G^{(T)}, it follows from Lemma 8.3 that dist_{G^{(T)}}(b_k, a_k) > 8r, and therefore that Moreover, since c_k has a radius-R tree neighborhood in G^{(T)}, and since in Ĝ^{(T)} the edge {b_k, c_k} is removed compared to G^{(T)}, we have This completes the proof of (11.10) with G̃^{(T)} replaced by Ĝ^{(T)}, and thus the proof of (i). For (ii), since a_k and b_k ∈ S′ are adjacent in the graph G̃^{(T)}, we have The first inequality in (11.11) is trivial since Ĝ^{(T)} ⊂ G̃^{(T)}. To prove (iii), note that any geodesic from i to S′ in G̃^{(T)} either contains a_k, or does not contain the edge {a_k, b_k}. In the first case, where the geodesic contains a_k, its length is at least 1 + dist_{G̃^{(T)}}(i, a_k) > 2r, as desired. In the second case, where the first inequality holds since i ∈ X_1 ∪ X_2 \ S′, and the last inequality follows from the definition of the S-cells and Lemma 8.3.
Recall that the graph G̃^{(T)} is obtained from G^{(T)} by adding the edges {a_m, b_m}_{m ≤ ν} and removing the edges {b_m, c_m}_{m ≤ ν}. By the definition of the set J, we know {b_k, c_k} ⊂ S′ and {b_m, c_m : m ∈ [[1, ν]] \ {k}} ⊂ X_1 ∪ X_2 \ S′. Therefore, the graphs G̃^{(T)} \ {a_k, b_k} and G^{(T)} differ only on the subgraphs induced on S′ and X_1 ∪ X_2 \ S′, and the equality in the above equation holds. Remark 11.3. Recall the random walk representation of the Green's function from Section 2.3. In terms of the random walk heuristic, together with the a priori estimates (11.8)-(11.9) on the Green's function, one can understand the bounds of Proposition 11.1 as follows. For the left-hand side of (11.2), the walk with most weight is i → a_k → b_k → c_k. Since dist_{G̃^{(T)}}(i, a_k) ≥ 2r and dist_{G̃^{(T)}}(b_k, c_k) ≥ 2r, the walks i → a_k and b_k → c_k each contribute at least a small factor q^r; the walk a_k → b_k has at least one step and thus contributes at least a factor q. Therefore |G̃^{(T)}_{i c_k}| ≲ q^r × q × q^r = q^{2r+1}. For the left-hand side of (11.3), the walk with most weight is i = c_j → b_j → a_j → a_k → b_k → c_k, and it therefore follows that |G̃^{(T)}_{c_j c_k}| is small. For the left-hand side of (11.4), the walk with most weight gives |G̃^{(T)}_{i c_k}| ≲ q^{2r+1}. The proof of Proposition 11.1 essentially follows from the heuristic described in Remark 11.3, which can be made rigorous by combining the estimates on the Green's function in (11.8)-(11.9) with those on the distances stated in Lemma 11.2. This requires a division into a number of cases and is done carefully below.
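The random walk representation invoked in Remark 11.3 can be illustrated concretely (our own toy example, not the paper's setup): for |z| larger than the norm of H, the Green's function is the generating series of walk weights, G(z) = (H − z)^{−1} = −Σ_{k≥0} H^k/z^{k+1}, so the leading contribution to G_xy comes from the shortest walks from x to y.

```python
# Neumann/walk expansion of the Green's function for a tiny test matrix:
#   G(z) = (H - z)^{-1} = - sum_{k>=0} H^k / z^{k+1}   for |z| > ||H||.
import numpy as np

H = np.array([[0., 1., 0.],           # normalized adjacency of a path 1-2-3
              [1., 0., 1.],
              [0., 1., 0.]]) / np.sqrt(2)
z = 5.0 + 1.0j                        # |z| well above ||H|| = 1
G = np.linalg.inv(H - z * np.eye(3))

# Truncated walk-generating series; terms decay like (1/|z|)^k.
series = sum(-np.linalg.matrix_power(H, k) / z ** (k + 1) for k in range(60))
assert np.max(np.abs(G - series)) < 1e-12
```

In the regime of the paper, |z| is of order one and the expansion is only a heuristic, but the principle that each step of a walk contributes a factor of size q survives in the rigorous distance-decay estimates.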
11.2. Proof of (11.2). Let Γ_1 = max{|G̃^{(T)}_{i c_k}| : i ∈ X_1 and dist_{G̃^{(T)}}(i, S′) > 2r} (11.13) and Γ_2 = max{|G̃^{(T)}_{i c_k}| : i ∈ X_2 and i ∉ S′}. (11.14) Thus Γ_1 is the maximal size of the Green's function between c_k and vertices in X_1 that are away from S′, and Γ_2 is the maximal size of the Green's function between c_k and vertices in X_2 that are in different S′-cells from c_k.
Given Proposition 11.4, the claim (11.2) is an immediate consequence.
Proof of (11.2). It suffices to show that the left-hand side of (11.2) is bounded by max{Γ_1, Γ_2}. First, if i ∈ X_2, then i = c_l for some l ≠ k, and, by the definition of J, c_l ∉ S′. Thus the left-hand side of (11.2) is bounded by Γ_2. Second, if i ∈ X_1, then either i = a_l or i = c_l for some l ≠ k. In either case, by the definition of J, dist_{G̃^{(T)}}(i, S′) > R/4 − 2r > 2r. Thus the left-hand side of (11.2) is bounded by Γ_1.
For (b_k, a_k) ∈ E_2, we have b_k ∈ X_2 and i ∈ X_1, which implies that i and b_k are in different S-cells. Thus, by (11.8)-(11.9), For (b_l, a_l) ∈ E_3, we again have that i and b_l are in different S-cells (since b_l ∈ X_2), and dist_{G̃^{(T)}}(a_l, c_k) > 2r by (11.10). Thus, by (11.8)-(11.9) and |E_3| ≤ µ ≤ 2(d − 1)^{ℓ+1}, For (a_l, b_l) ∈ E_4, there are at most ω + 1 indices l such that dist_{G^{(T)}}(i, a_l) ≤ dist_{Ĝ^{(T)}}(i, a_l) < 2r by (3.7), and at most |E_4| ≤ µ ≤ 2(d − 1)^{ℓ+1} indices such that dist_{Ĝ^{(T)}}(i, a_l) ≥ 2r. Moreover, we have b_l ∈ X_2 and also b_l ∉ S′ by the definition of J. Thus, by (11.8) and the definition of Γ_2,
Thus, by (11.8) and |G^{(T)} Combining the sums over E_1, . . . , E_5, and taking the maximum over i obeying the conditions in the definition of Γ_1 in (11.13), we get To bound Γ_2, let i ∈ X_2 be as in the definition of Γ_2. Let S′′ be the S′-cell containing i, and notice that S′′ ≠ S′, S′_1 by the definition of Γ_2. We now divide For (x, y) ∈ E′_1, i and x are in different S-cells (since x ∈ X_1 and i ∈ X_2) and dist_{G̃^{(T)}}(y, c_k) > 2r by (11.10).
For (b_k, a_k) ∈ E′_2, i and b_k are in different S-cells since i ∈ S′′ and b_k ∈ S′ by assumption. Moreover, we have dist_{G̃^{(T)}}(a_k, c_k) > 2r by (11.10). Thus For (b_l, a_l) ∈ E′_3, there are at most 5ω indices l such that dist_{G̃^{(T)}}(i, b_l) < 2r by (7.7) in Proposition 7.2, and at most For (b_l, a_l) ∈ E′_4, i and b_l are in different S-cells; a_l and c_k are in different S-cells (since a_l ∈ S′_1 and c_k ∈ S′); and there are at most |E′_4| ≤ µ ≤ 2(d − 1)^{ℓ+1} terms. Thus Combining the sums over E′_1, . . . , E′_4, and taking the maximum over i obeying the conditions in the definition of Γ_2, we get In summary, in (11.16) and (11.21), we have shown that where a, b, c, d, e are explicit constants given in (11.16) and (11.21). Plugging the second estimate into the first, noticing that b + ce < 1, and using the explicit values of a, b, c, d, e, it follows that 11.3. Proofs of (11.4) and (11.3).
Proof of (11.3). It remains to estimate G̃^{(T)}_{c_j c_k} for j ∈ J \ {k}. As previously, we denote by S′ the S′-cell containing c_k, and now denote by S′′ the S′-cell containing c_j. The estimates in Lemma 11.2 on distances from c_k also apply with c_k replaced by c_j. Similarly to the bound of Γ_2, we use formula (11.6) and divide E as E = E_1 ∪ · · · ∪ E_5, where Notice that, for any x ∈ {a_1, . . . , a_ν, b_1, . . . , b_ν} \ {b_j}, by the definition of J, x and c_j are in different S-cells, and thus |Ĝ^{(T)}_{c_j x}| ≤ 2M/√(Nη) by (11.8). Moreover, by the definition of J, for any y ∈ {a_m, b_m : m ∈ [[1, ν]] \ {k}}, we have dist_{G̃^{(T)}}(y, S′) > R/4 − 2r > 2r, and thus y satisfies the condition either in (11.13) or in (11.14).
12.1. Preparation of the proof. As in (11.1), we denote by ∂_E T the boundary edges of T in the switched graph G̃, and the corresponding boundary vertex set by I = {ã_1, ã_2, . . . , ã_µ}. Let J be the index set of Proposition 11.1. Throughout the following proof, C denotes constants that may differ from line to line, but depend only on the constant K of (10.1) and the excess ω. As in previous proofs, we follow the structure described below (9.4).
Starting point. The normalized adjacency matrices of G̃ and G̃_1 respectively have the block form where H̃ is the normalized adjacency matrix of T, and B̃ (respectively B̃_1) corresponds to the edges from I to T_ℓ, where I is the set of boundary vertices of T in the switched graph G̃ as defined in (11.1), and T_ℓ is the inner vertex boundary of T as in (6.24). To be precise, the nonzero entries of B̃ and B̃_1 occur for the indices (i, j) ∈ I × T_ℓ and take values 1/√(d − 1). Notice that B̃_ij = (B̃_1)_ij; in the rest of this section we will therefore not distinguish B̃ and B̃_1.
By the Schur complement formula (B.3), we have and, by the resolvent identity (B.1), the difference of (12.6) and (12.7) is In terms of the random walk heuristic described in Section 2.3, (12.8) has the interpretation that only walks that exit T contribute (see Figure 11). We will adopt suggestive terminology corresponding to the random walk picture below. By Proposition 11.1, the Green's function G̃^{(T)} is small between most vertices in I. This is the main reason that the right-hand side of (12.8) is small. In the following, we analyze the various contributions precisely. Figure 11. Only walks that exit T contribute to (12.8).
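In block form, with the vertices of T listed first, the Schur complement formula (B.3) reads (a standard identity, written here in our own notation)

```latex
\widetilde H=\begin{pmatrix} H' & \widetilde B^{\top}\\ \widetilde B & D\end{pmatrix},
\qquad
\widetilde G\big|_{T}
=\Big(H'-z-\widetilde B^{\top}\,\widetilde G^{(T)}\,\widetilde B\Big)^{-1},
\qquad
\widetilde G^{(T)}=(D-z)^{-1},
```

so the restriction of G̃ to T sees the outside of T only through the term B̃^⊤ G̃^{(T)} B̃, i.e. through walks that exit T via the boundary edges; this is the mechanism behind (12.6)-(12.8) and Figure 11.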

Boundary.
The following lemma estimates the weight of "walks" from x ∈ T to T_ℓ, the inner vertex boundary of T. It depends on the distance from x to the boundary, or equivalently from x to 1.
Lemma 12.3. Assume that G̃_0 has excess at most ω. For vertices x ∈ T_{ℓ_1}, i.e. x at distance ℓ_1 from vertex 1, we have For vertices x ∈ T_{ℓ_1} and y ∈ T_{ℓ_2}, with ℓ_1 ≤ ℓ_2, we have The proof of the lemma uses the following combinatorial estimate on the distances of a vertex x to T_ℓ (the inner vertex boundary of T). Lemma 12.4. Assume that the graph G̃_0 = B_{3r}(1, G) has excess at most ω. Given x ∈ T_{ℓ_1}, let L_x be the multiset consisting of 2(ω + 1)(d − 1)^{ℓ−ℓ_3} copies of the number q^{ℓ+ℓ_1−2ℓ_3} for ℓ_3 ∈ [[0, ℓ_1]], and let K_x be the multiset K_x = {q^{dist_{G̃_0}(x,i)} : i ∈ T_ℓ}. Then the k-th largest number of K_x is smaller than or equal to the k-th largest number of L_x.
We postpone the proof of the lemma to Appendix A.3. Given the lemma, the proof of Lemma 12.3 is completed as follows.
Proof of Lemma 12.3. To prove (12.9), we use the bound given by Proposition 4.2. Defining the multiset $L_x$ as in Lemma 12.4, the inequality continues with the corresponding sum over $L_x$. This finishes the proof of (12.9).
Remark 12.5. In the worst case, when $x = 1$, we have (12.11). Moreover, when $x, y \in \mathbb{T}_{\ell_1}$, we have (12.12). These special cases will be used below.
12.3. Outside $\mathbb{T}$. The following proposition shows that the weight of "walks" outside $\mathbb{T}$ is small. It essentially follows from Proposition 11.1. Proof. The first claim follows from (7.7). The second claim follows from (3.7) applied to the graph $\tilde G$: since by assumption $\tilde G \in \bar\Omega$, the radius-$R$ neighborhood $\mathcal{B}_R(1, \tilde G)$ has excess at most $\omega$.
By the defining relation (10.5) of $F(G)$ and Proposition 11.1, the remaining claim follows.
12.4. Proof of (12.2). The proof of (12.2) follows essentially from (12.8) and the fact that the difference of $\tilde G^{(\mathbb{T})}$ and $P^{(\mathbb{T})}$ is small (Proposition 10.1).
Moreover, for $x \in \mathbb{T} \setminus \{1\}$, we have a stronger estimate. The first term on the right-hand side of (12.8) is then bounded using (12.13) and (12.11). Next we bound the second term on the right-hand side of (12.8).
Thus it only remains to prove (12.2) for $x \in \mathbb{T}$. Taking the difference of these two equations yields (12.28). For the first term in (12.28), notice that combining (12.25) and (12.22) gives the required estimate; the first term in (12.28) is then bounded using (12.13). For the second term in (12.28), since $\tilde G \in \bar\Omega$, the radius-$R$ neighborhood of vertex 1 has excess at most $\omega$. By (3.7) there are at most $\omega + 1$ indices $k \in [[1, \mu]]$ such that $\tilde a_k$ is in the same connected component as $x$ in the graph $\tilde G_0$. Thus $P_{\tilde a_k x}$ is zero for all $k \in [[1, \mu]]$ except for at most $\omega + 1$ of them, and those are bounded using (12.22). Combining the arguments above, using (12.29) and (12.14), finishes the proof of (12.2).
Proof of (12.1). For $x, y \in \mathbb{T}$, we denote $\Gamma = \max_{x,y \in \mathbb{T}} |\tilde G_{xy} - P_{xy}|$. Then we apply the Schur complement formula (12.8). The estimate of the first term follows by the same argument as that for (12.23). For the second term, we similarly obtain a bound in which we used $|P_{x l_k}|, |P_{l_m y}| \le C|m_{sc}|$ and the estimates (10.4), (12.12) and (12.15). Therefore, taking the supremum over both sides of (12.31) and rearranging, we get $\Gamma \le C\omega'^2 |m_{sc}| q^r$. For $x \in \mathbb{T}$ and $y \in \mathcal{B}_{2r}(1, \tilde G) \setminus \mathbb{T}$, the same argument as for (12.28) applies, in which we bounded $|\tilde G_{x l_k}| \le C|m_{sc}|$ and used the estimate (12.13), the bound for $\Gamma$, and the fact that for all $k \in [[1, \mu]]$ with at most $\omega + 1$ exceptions, $P^{(\mathbb{T})}_{\tilde a_k y}$ is zero. For $x \in \mathbb{T}$ and $y \notin \mathcal{B}_{2r}(1, \tilde G)$, we argue similarly, bounding $|\tilde G_{x l_k}| \le C|m_{sc}|$ and using the estimate (12.14). Taking the difference, and noticing that $|\tilde G_{xy}| \le C|m_{sc}| q^r$, $|\tilde G_{l_k l_m}| \le C|m_{sc}|$, $|P_{l_k l_m}| \le C|m_{sc}|$ and $|\tilde G_{l_k l_m} - P_{l_k l_m}| \le C\omega'^2 |m_{sc}| q^r$, we obtain that the entries indexed by $\tilde a_m$ and $y$ are bounded by $C\omega' |m_{sc}| q^r$.

13 Concentration in the switched graph
The result of this section is the following proposition, which shows that the average of the Green's function of $\tilde G^{(\mathbb{T})}$ over the vertex boundary of $\mathbb{T}$ concentrates under resampling of the edge boundary of $\mathbb{T}$. This is the part of the argument where the condition that the edge boundary contains $\gg \log N$ edges is important. More precisely, recall the vertex boundary $I = \{\tilde a_1, \tilde a_2, \dots, \tilde a_\mu\}$ of $\mathbb{T}$ in $\tilde G$ from (11.1). For any finite graph $H$ (not necessarily regular and not necessarily on $N$ vertices), we define $Q(H, z)$ by (13.1), where $E$ denotes the set of oriented edges of $H$, and $G^{(i)}(H, z)$ the Green's function of the graph obtained from $H$ by removing the vertex $i$. Notice that we always normalize (13.1) by $Nd$, regardless of the actual number of oriented edges in $H$ (which can be smaller than $Nd$).
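As an illustration of the boundary average (13.1), the following sketch computes such an average for a small graph; the exact summand $G^{(i)}_{jj}$ and the $1/(Nd)$ normalization are assumptions reconstructed from the surrounding text, and the example graph is hypothetical:

```python
import numpy as np

def green(H, z):
    """Green's function (H - z)^{-1} of a symmetric matrix H."""
    return np.linalg.inv(H - z * np.eye(len(H)))

def Q(adj, d, z):
    """Average over oriented edges (i, j) of the entry G^{(i)}_{jj},
    where G^{(i)} is the Green's function of the graph with vertex i
    removed, normalized by N*d as in (13.1).  The adjacency matrix is
    normalized by 1/sqrt(d-1) as in the paper."""
    N = len(adj)
    total = 0.0 + 0.0j
    for i in range(N):
        keep = [v for v in range(N) if v != i]
        Gi = green(adj[np.ix_(keep, keep)] / np.sqrt(d - 1), z)
        for j_pos, j in enumerate(keep):
            if adj[i, j]:                     # oriented edge (i, j)
                total += Gi[j_pos, j_pos]
    return total / (N * d)

# 3-regular example: the complete graph K4.
adj = np.ones((4, 4)) - np.eye(4)
val = Q(adj, 3, 0.1 + 0.5j)
```

Since each $G^{(i)}_{jj}$ is a diagonal resolvent entry of a symmetric matrix at $\operatorname{Im} z > 0$, the imaginary part of the average is positive.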
To prove Proposition 13.1, we first show a similar statement, Lemma 13.3, for the unswitched graph $G^{(\mathbb{T})}$, for which the problem becomes a concentration problem for independent random variables. We then prove Proposition 13.1 by comparison, using the estimates of Proposition 10.1 and the fact that the change from $Q(G, z)$ to $Q(G^{(\mathbb{T})}, z)$ is small (Lemma 13.4). Proposition 10.1 is applicable since, by the definition of the set $\Omega_1^+(z, \ell)$ in Section 2, any graph $G \in \Omega_1^+(z, \ell)$ satisfies the assumptions of Proposition 10.1 with $K = 2^{10}$.
The following lemma is used repeatedly in this section. It follows from exactly the same argument as Proposition 9.2, and we therefore omit the proof.
Lemma 13.2. Let $z \in \mathbb{C}^+$, let $K' \ge 2$ be a constant, and let $G \in \bar\Omega$. Let $H$ be one of the graphs $G^{(\mathbb{T})}$, $\hat G^{(\mathbb{T})}$, $\tilde G^{(\mathbb{T})}$ or $\tilde G$, and suppose that the stated bound holds. Then, for any vertices $i, j$ in $H^{(x)}$, we have the corresponding estimate. Here all graphs have deficit function $g = d - \deg$, and we recall that $H^{(x)}$ is the graph obtained from $H$ by removing the vertex $x$.
13.1. Estimate for the unswitched graph. The next lemma shows that a quantity closely related to the boundary average of the Green's function of the unswitched graph concentrates. Lemma 13.3. For any $z \in \mathbb{C}^+$ and $G \in \Omega_1^+(z, \ell)$, we define the set $F'(G) \subset F(G)$ (as in Section 10.1) such that (13.5) holds. Proof. Define $S_k$ and $X_k$ for $k \in [[1, \mu]]$ as above. Conditioned on the graph $G^{(\mathbb{T})}$, the random variables $S_1, S_2, \dots, S_\mu$ are independent and identically distributed, and thus $X_1, X_2, \dots, X_\mu$ are i.i.d. random variables. By Lemma 13.2 and the assumption that $G \in \Omega_1^+(z, \ell)$, for any $k \in [[1, \mu]]$ we have $|X_k| \le 2K|m_{sc}| q^r$, where $K = 2^{10}$. By Azuma's inequality for independent random variables, (13.6) therefore follows. It remains to estimate $\mathbb{E}[X_k]$. Let $E$ be the set of oriented edges of $G^{(\mathbb{T})}$. By definition, $\mathbb{T}$ is the $\ell$-neighborhood of vertex 1, and it therefore intersects only boundedly many edges; by Lemma 13.2, we also have the corresponding Green's function bounds. Moreover, since by assumption $G \in \bar\Omega$, all except at most $N^\delta$ vertices have radius-$R$ tree neighborhoods in $G$. For the vertices $i$ in this exceptional set, we have the bound $|P_{ii}(\mathcal{E}_r(i, i, G^{(\mathbb{T}_j)}))| \le 2|m_{sc}|$ from (4.4). For the other vertices $i$, whose $r$-neighborhood in $G^{(\mathbb{T})}$ is a $d$-regular tree, we have the equality $P_{ii}(\mathcal{E}_r(i, i, G^{(\mathbb{T}_j)})) = m_{sc}$. Combining (13.7), (13.8), and taking $t = (\log N)^{1/2+\delta}/(4K)$ in (13.6), we get (13.9). Since $N^{-1+\delta} \ll (\log N)^{1/2+\delta} |m_{sc}| q^r / \sqrt{\mu}$, it follows that (13.5) holds with overwhelming probability, and we can define $F'(G) \subset F(G)$ as claimed, with a probability bound in which we used (10.6). This completes the proof.
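The concentration mechanism in the proof above is the standard Azuma–Hoeffding bound for averages of bounded independent variables. The following sketch (with illustrative numbers, not the paper's constants) checks the bound by simulation:

```python
import numpy as np

# For i.i.d. X_k with |X_k| <= M and mean 0, Hoeffding's inequality gives
# P(|mean(X)| > t*M/sqrt(mu)) <= 2*exp(-t^2/2) -- the mechanism behind
# (13.6).  All numbers below are illustrative, not the paper's.
rng = np.random.default_rng(1)
mu, M, t, trials = 400, 1.0, 4.0, 2000
X = rng.uniform(-M, M, size=(trials, mu))      # bounded i.i.d. variables, mean 0
dev = np.abs(X.mean(axis=1))                   # deviation of the average from 0
emp = np.mean(dev > t * M / np.sqrt(mu))       # empirical tail probability
bound = 2 * np.exp(-t**2 / 2)                  # Hoeffding/Azuma bound
assert emp <= bound
```

With $\mu \gg (\log N)^{1+2\delta}$ boundary edges, the choice $t = (\log N)^{1/2+\delta}/(4K)$ makes the failure probability smaller than any power of $N$, which is what "with overwhelming probability" refers to.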
The proof of Lemma 13.4 uses Lemma 13.5 below, which is a direct consequence of the Ward identity (B.6).
Lemma 13.5. Let $G$ be a graph with degrees bounded by $d$. Denote by $E$ the set of oriented edges of $G$, by $H$ its normalized adjacency matrix, and by $G = (H - z)^{-1}$ its Green's function. If, for some $z \in \mathbb{C}^+$ and all $(i, j) \in E$, the stated bound holds, then the stated conclusion holds for any vertex $x \in G$. Proof. By the Schur complement formula (B.5) and the Ward identity (B.6), we obtain the claim.
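The Ward identity (B.6) states that, for a symmetric matrix $H$ and $z = E + \mathrm{i}\eta$, one has $\sum_y |G_{xy}|^2 = \operatorname{Im} G_{xx}/\eta$. The following sketch checks this numerically, with a random symmetric matrix standing in for the normalized adjacency matrix:

```python
import numpy as np

# Ward identity check: sum_y |G_xy|^2 = Im(G_xx) / eta for G = (H - z)^{-1},
# H real symmetric, z = E + i*eta.  H here is a hypothetical random matrix.
rng = np.random.default_rng(2)
n = 8
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
E_val, eta = 0.5, 0.2
G = np.linalg.inv(H - (E_val + 1j * eta) * np.eye(n))
for x in range(n):
    assert np.isclose(np.sum(np.abs(G[x]) ** 2), G[x, x].imag / eta)
```

The identity follows from the resolvent identity: $G G^* = (G - G^*)/(z - \bar z) = \operatorname{Im} G/\eta$, and taking the $(x,x)$ entry gives the stated sum.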
To prove Lemma 13.6 we need the estimates summarized in the following lemma.
13.4. Proof of Proposition 13.1. Finally, using the previous lemmas, we can prove Proposition 13.1.
Proof of Proposition 13.1. For $k \in J$, the $r$-neighborhood of $c_k$ is a $d$-regular tree with root degree $d-1$ in any of the graphs $G^{(\mathbb{T} b_k)}$, $\hat G^{(\mathbb{T} b_k)}$ and $\tilde G^{(\mathbb{T})}$; therefore (4.2) applies. On the other hand, for the indices $k \in [[1, \mu]] \setminus J$, by (2.7), Proposition 10.1, and Lemma 13.2, using that for $G \in \Omega_1^+(z, \ell)$ the assumption of Proposition 10.1 holds with $K = 2^{10}$, we obtain the corresponding estimate. The above estimates (13.22) and (13.17) give $(\omega' + 9\omega)|m_{sc}| q^r/\mu + (8K^2 |m_{sc}| q^{2r} + 16 q^{2r}) \le (\log N)^{1/2+\delta} |m_{sc}| q^r/(4\sqrt{\mu})$ (13.23). Moreover, by the above estimates (13.22), (13.18) and using $\tilde a_k = c_k$ for $k \in J$, we obtain (13.24). In these estimates we used $\ell \ge 4 \log_{d-1} \log N$ from (2.1), so that $\sqrt{\mu} \gg \log N = \omega'$. The left-hand side of (13.2) is thus bounded by $|(13.5)| + |(13.9)| + |(13.24)| + |(13.5)| \le 2(\log N)^{1/2+\delta} |m_{sc}| q^r/\sqrt{\mu}$, which completes the proof.

14 Improved approximation in the switched graph

The result of this section is the following proposition, which states that the Green's function obeys better estimates near vertex 1 than the original ones. As in the previous sections, we write $\tilde G = T_S(G)$ and assume that $S \in F'(G)$ (as in Lemma 13.3) is such that $\tilde G = T_S(G) \in \bar\Omega$ (as in Section 2.2). Throughout the proof, $C$ denotes a constant depending only on the constant $K$ from (10.1) and the excess $\omega$, which may change from line to line. (i) For the vertex $x = 1$, (14.1) holds. Moreover, if the vertex 1 has a radius-$R$ tree neighborhood in the graph $\tilde G$, then the following stronger estimates hold.
(i') For the vertex $x = 1$, (14.3) holds. For the average of $\tilde G_{1x}$ over the vertices $x$ adjacent to 1, (14.4) holds.
We use the same set-up as in Section 12, and notice that (14.2) is (12.2).
14.1. Proof of (14.1) and (14.3). By (12.8), we have (14.5). The last term on the right-hand side of (14.5) is bounded by $C|m_{sc}| q^{r+\ell+2}$, where we used (12.22) for the first factor, (12.25) for the second factor, and (10.4) for the last factor. For the second term on the right-hand side of (14.5), we have an analogous bound provided that $\omega'^2 q^\ell \ll 1$, where we used (12.29) for the first factor, (12.25) for the second factor, and (12.15) for the last factor. Therefore (14.5) is bounded as claimed, where the implicit constant depends only on the excess $\omega$ and $K$ from (10.1).
Proof of (14.3). If the radius-$R$ neighborhood of the vertex 1 is a tree, then Proposition 4.1 applies. Notice that $\mu = d(d-1)^\ell$ under the assumption that the $R$-neighborhood is a tree. Moreover, by Proposition 13.1, we can simplify (14.7) to obtain the claim. This finishes the proof of (14.3).
Proof of (14.1). Since by assumption $\tilde G \in \bar\Omega$, the radius-$R$ neighborhood of the vertex 1 has excess at most $\omega$. Therefore, there are at most $2\omega(d-1)^\ell$ indices $k \in [[1, \mu]]$ such that the non-backtracking path from 1 to $l_k$ of length $\ell$ is not unique. Let $J = \{k \in [[1, \mu]] : \text{the non-backtracking path from 1 to } l_k \text{ of length } \ell \text{ is unique}\}$.
Proof of (14.4). For any vertex $x$ adjacent to 1, by (12.8) we have (14.12). For the last term on the right-hand side of (14.12), in the first inequality we used (12.22) for the first factor and (10.4) for the last factor; in the second inequality we used (12.9) in the case $x \in \mathbb{T}_1$. For the second term on the right-hand side of (14.12), we have an analogous bound provided that $\omega'^2 q^\ell \ll 1$, where we used (12.15). Together these bounds yield (14.13), where the implicit constant depends only on the excess $\omega$ and $K$. In particular, if the vertex 1 has a radius-$R$ tree neighborhood, then Proposition 4.1 applies for any index $k \in [[1, \mu]]$. Thus, averaging (14.13) over all vertices $x$ adjacent to 1 (in the following, we write $x \sim 1$ when the vertex $x$ is adjacent to 1), we get (14.4). In the third line, we used the fact that for any index $k \in [[1, \mu]]$, among the $d$ children of vertex 1, one of them is at distance $\ell - 1$ from the vertex $l_k$, and the others are at distance $\ell + 1$ from the vertex $l_k$. In the last line, we used Proposition 13.1 and $|m_d^2 m_{sc}^{2\ell-1}(1 + m_{sc}^2)| \le 4$. This finishes the proof of Proposition 14.1.

15 Proof of main results
In this section, we use the estimates established in the previous sections to prove Theorem 1.6.
Given a graph $G$ and a vertex $i$, we resample the edge boundary of $\mathcal{B}_\ell(i, G)$ using switchings; without loss of generality we assume $i = 1$. Denote the resampled graph by $T_S(G)$ (which depends on the choice of $i$); here $S$ is the resampling data (whose distribution depends on $G$). The improvement under resampling applies to the switched graphs $T_S(G)$. However, by general properties of $T$, it implies an improvement on the original space of graphs. We abbreviate $\Omega^+ = \Omega_1^+(z, \ell)$ and $\Omega' = \Omega_1'(z, \ell)$. Therefore, Proposition 6.5 implies (15.5), which was the claim.
Clearly, (15.5) also holds with vertex 1 replaced by any other vertex $i \in [[N]]$. In particular, for any graph in the intersection of the $\Omega_i'(z, \ell)$ over $i \in [[N]]$, we have the following improved estimates for the entries of its Green's function.
It follows that $|\theta| \le \varepsilon_\ell$ and $1 - \varepsilon_\ell \le r < 1$. This finishes the proof of the first statement.

A Combinatorial estimates for random regular graphs
A.1. Proof of Proposition 3.1.
Proof of (3.2). For $\omega = 1$, a proof of the statement is given in [59, Lemma 2.1] or [19, Lemma 7], for example. The more general statement follows from the same proof. More precisely, in [59, (2.4)], it is shown that for any $i \in [[N]]$, the excess $X_i$ in $\mathcal{B}_R(i, G)$ is stochastically dominated by a binomial random variable with $n = d(d-1)^R$ trials and success probability $p = d(d-1)^{R-1}/N$. The claimed tail bound follows. By a union bound, and using $\kappa < \delta/(2\omega + 2)$, we therefore obtain a bound that is better than claimed.
Proof of (3.4). We fix vertices $i, j$ and an integer $k$. Given a graph $G$, we denote by $t_k(G)$ the total number of non-backtracking paths from $i$ to $j$ of length less than $\operatorname{dist}_G(i,j) + k$. We modify the graph $G$ in three steps such that, in each step, $t_k$ does not decrease and the excess remains the same. It then suffices to prove (3.4) for the final graph.
Step 1. Given an edge $e = \{x, y\} \in G$ that is not a self-loop and not on a geodesic from $i$ to $j$, we shrink the edge $e$ to a point (remove $e$ and identify its incident vertices), obtaining a new graph $G'$. There is a bijection between the oriented edges of the graph $G \setminus \{e\}$ and those of the graph $G'$. We now show that the total number of non-backtracking paths from $i$ to $j$ of length less than $\operatorname{dist}_G(i,j) + k = \operatorname{dist}_{G'}(i,j) + k$ in $G'$ is at least $t_k$. Let $(e_1, e_2, e_3, \dots)$ be any non-backtracking path from $i$ to $j$ in the graph $G$ that is not a geodesic. If some $e_\beta$ is $(x, y)$ or $(y, x)$, we remove it from the path and view the remaining part as a path from $i$ to $j$ in the graph $G'$. In this way we obtain a shorter path from $i$ to $j$ in $G'$. The new path is still non-backtracking, and we can recover the original path in $G$ from the new path in $G'$ since $x \neq y$. Therefore the total number of non-backtracking paths from $i$ to $j$ of length less than $\operatorname{dist}_G(i,j) + k = \operatorname{dist}_{G'}(i,j) + k$ in $G'$ is at least $t_k$.
We repeat this procedure with edges $e$ (not on a geodesic) chosen arbitrarily, as long as possible. This creates a new graph $G_1$ (which may depend on the choice of edges in the steps). By construction, the edges in $G_1$ are either self-loops or on geodesics from $i$ to $j$. Thus the vertex set of $G_1$ decomposes into $G_1 = V_0 \cup V_1 \cup \cdots \cup V_{\operatorname{dist}_{G_1}(i,j)}$, where $V_m := \{v \in G_1 : \operatorname{dist}_{G_1}(i, v) = m\}$ (A.5), or equivalently $V_{\operatorname{dist}_{G_1}(i,j)-m} = \{v \in G_1 : \operatorname{dist}_{G_1}(v, j) = m\}$. In particular, $V_0 = \{i\}$ and $V_{\operatorname{dist}_{G_1}(i,j)} = \{j\}$. Any edge in $G_1$ is either a self-loop or has one vertex in $V_m$ and the other in $V_{m+1}$, for some $m \in [[0, \operatorname{dist}_{G_1}(i,j) - 1]]$. The excess of $G_1$ is $\omega$.
Step 2. Given two edges $e = \{v_m, v_{m+1}\}$ and $e' = \{v_m, v'_{m+1}\}$ with $v_m \in V_m$ and $v_{m+1} \neq v'_{m+1} \in V_{m+1}$, we remove the edge $e'$ and identify $v'_{m+1}$ with $v_{m+1}$, creating a new graph $G_1'$. Again there is a bijection between the oriented edges of the graph $G_1 \setminus \{e'\}$ and those of the graph $G_1'$. We now show that the total number of non-backtracking paths from $i$ to $j$ of length less than $\operatorname{dist}_G(i,j) + k = \operatorname{dist}_{G_1'}(i,j) + k$ in $G_1'$ is at least $t_k$. Let $(e_1, e_2, e_3, \dots)$ be any non-backtracking path from $i$ to $j$ in the graph $G_1$. If $e_\beta = (v'_{m+1}, v_m)$ and $e_{\beta+1} \neq (v_m, v_{m+1})$, we replace $e_\beta$ by $(v_{m+1}, v_m)$; if $e_\beta = (v'_{m+1}, v_m)$ and $e_{\beta+1} = (v_m, v_{m+1})$, we remove both $e_\beta$ and $e_{\beta+1}$; if $e_\beta = (v_m, v'_{m+1})$ and $e_{\beta-1} \neq (v_{m+1}, v_m)$, we replace $e_\beta$ by $(v_m, v_{m+1})$; if $e_\beta = (v_m, v'_{m+1})$ and $e_{\beta-1} = (v_{m+1}, v_m)$, we remove both $e_\beta$ and $e_{\beta-1}$. We then view the remaining part as a path from $i$ to $j$ in the graph $G_1'$, whose length is at most that of the original path. The new path is still non-backtracking, and we can recover the original path in $G_1$ from the new path in $G_1'$ since $v_{m+1} \neq v'_{m+1}$. Therefore the total number of non-backtracking paths from $i$ to $j$ of length less than $\operatorname{dist}_G(i,j) + k = \operatorname{dist}_{G_1'}(i,j) + k$ in $G_1'$ is at least $t_k$. For any $m \in [[0, \operatorname{dist}_{G_1}(i,j) - 2]]$, if in the new graph $|\{v : \operatorname{dist}(i, v) = m + 1\}| \ge 2$, we can repeat the above process to reduce this number by one. We repeat this procedure as long as possible, at every step choosing edges $e$ and $e'$ arbitrarily such that the conditions are satisfied. Finally, we obtain a graph $G_2$ (which again is not unique) that has exactly $\operatorname{dist}_{G_2}(i,j) + 1$ vertices $\{v_0 = i, v_1, v_2, \dots, v_{\operatorname{dist}_{G_2}(i,j)} = j\}$, such that $\operatorname{dist}_{G_2}(i, v_m) = m$ for $m \in [[0, \operatorname{dist}_{G_2}(i,j)]]$. The excess of $G_2$ is $\omega$.
Step 3. In the final step, given any edge $e$ from $v_m$ to $v_{m+1}$, if it is the only edge from $v_m$ to $v_{m+1}$, we shrink it to a point. This preserves non-backtracking paths, and it reduces the distance between $i$ and $j$ by one. By shrinking all edges of multiplicity one, we obtain a graph $G_3$. The number of non-backtracking paths from $i$ to $j$ of length less than $\operatorname{dist}_{G_3}(i,j) + k$ is at least $t_k$, and the excess of $G_3$ is $\omega$. Final step. To bound the number of non-backtracking paths from $i$ to $j$ in $G$, it suffices to estimate the number of non-backtracking paths from $i$ to $j$ in the graph $G_3$. Let $\ell = \operatorname{dist}_{G_3}(i,j)$, let $s$ be the total number of self-loops in $G_3$, let $w_m + 1$ be the multiplicity of the edge $\{v_{m-1}, v_m\}$ for $m \in [[1, \ell]]$, and set $w = \max_{1 \le m \le \ell} w_m$. Since $G_3$ has excess $\omega$, we have $s + \sum_{m=1}^\ell w_m = \omega$. The maximum degree of the graph $G_3$ is bounded by $2s + 2w + 2$. Now any non-backtracking path from $i$ to $j$ of length $\ell + k$ necessarily contains the edges $(v_0, v_1), (v_1, v_2), \dots, (v_{\ell-1}, v_\ell)$, and for these there are $w_1 + 1, w_2 + 1, \dots, w_\ell + 1$ choices respectively. For the other steps, there are at most $2s + 1 + 2w$ choices. The total number of such paths is therefore bounded by $\binom{\ell+k}{\ell}(2s + 1 + 2w)^k \prod_{m=1}^\ell (w_m + 1)$ (A.6), under the condition $s + \sum_{m=1}^\ell w_m = \omega$. Note that (A.6) increases if we decrease $s$ by 1 and increase some $w_m$ in such a way that $w$ increases by 1. Therefore (A.6) achieves its maximum at $s = 0$. We denote $a_k := \binom{\ell+k}{\ell}(1 + 2w)^k \prod_{m=1}^\ell (w_m + 1)$. Since $1 + n \le 2^n$ for any $n \in \mathbb{N}_0$ and $\sum_{m=1}^\ell w_m = \omega$, we have $a_0 = \prod_{m=1}^\ell (w_m + 1) \le 2^\omega$. For $a_k$ with $k \ge 1$, notice that $\omega = \sum_{m=1}^\ell w_m \ge w + (\ell - 1)$, so that $w \le \omega - (\ell - 1)$, and thus $a_k \le \frac{\ell+k}{k}(1 + 2w)\, a_{k-1} \le (\ell+1)(2\omega - 2\ell + 3)\, a_{k-1} \le \frac{(2\omega+5)^2}{8}\, a_{k-1} \le (2^\omega - 1)\, a_{k-1}$, given that $\omega \ge 6$. Therefore $t_k \le a_0 + a_1 + \cdots + a_{k-1} \le 2^\omega\big(1 + (2^\omega - 1) + \cdots + (2^\omega - 1)^{k-1}\big) \le 2^{\omega k}$.
This finishes the proof.
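For small graphs, quantities like $t_k$ can be computed by brute force. The following sketch counts non-backtracking paths of a given length in a simple graph (a hypothetical helper, exponential in the length, for illustration only):

```python
def count_nb_paths(adj, i, j, length):
    """Count non-backtracking paths (no immediate edge reversal) of the
    given length from i to j.  `adj` maps each vertex to the list of its
    neighbors in a simple graph.  Brute force, illustration only."""
    def extend(path):
        if len(path) - 1 == length:
            return 1 if path[-1] == j else 0
        total = 0
        for w in adj[path[-1]]:
            if len(path) < 2 or w != path[-2]:   # forbid backtracking
                total += extend(path + [w])
        return total
    return extend([i])

# On the 4-cycle 0-1-2-3-0, the only non-backtracking paths of length 2
# from 0 to 2 are 0-1-2 and 0-3-2.
C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
assert count_nb_paths(C4, 0, 2, 2) == 2
```

Here the excess of the 4-cycle is 1, and the count is consistent with the bound $2^{\omega k}$ from (3.4).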
Proof of (3.5). Let $\mathsf{H}$ be the vertex set of $H$, let $\omega_0$ be the excess of the subgraph $H$, and let $\bar H$ be the subgraph induced by $G$ on $\mathsf{H}$. If $\operatorname{dist}_G(i,j) \ge \ell + 1$, then (3.4) implies #{non-backtracking paths from $i$ to $j$ of length $\ell + k$, not completely in $H$} $\le$ #{non-backtracking paths from $i$ to $j$ of length $\ell + k$} $\le 2^{\omega k}$, and the claim (3.5) follows. Therefore, in the following, we assume that $\operatorname{dist}_G(i,j) \le \ell$, and also that $H$ and $G$ are connected (otherwise, we can replace $H$ by its connected component containing $i$ and $j$, and $G$ by its connected component containing $H$). For any non-backtracking path from $i$ to $j$ that is not completely contained in $H$, let $e$ be the first edge in the path that does not belong to $H$. There are three possibilities for such an edge $e$: (i) $e \in \bar H$. We denote the set of such edges by $E_1$. (ii) $e \notin \bar H$ and, if we remove $e$ from $G$, then $G \setminus \{e\}$ breaks into two connected components. The component not containing $i, j$ necessarily contains cycles. We denote the set of such edges by $E_2$. (iii) $e \notin \bar H$ and $G \setminus \{e\}$ is still connected. We denote the set of such edges by $E_3$. We consider the graph $G \setminus \{E_1 \cup E_2 \cup E_3\}$, obtained from $G$ by removing the edges in $E_1 \cup E_2 \cup E_3$. It consists of several connected components: one corresponds to the graph $H$, and the others are in one-to-one correspondence with the connected components of $G \setminus \bar H$, the graph obtained by removing $\bar H$ from $G$. Notice that, by the definition of the edge sets $E_1, E_2, E_3$, each connected component of $G \setminus \bar H$ contains exactly one edge in $E_2$ or at least two edges in $E_3$. Therefore, $G \setminus \{E_1 \cup E_2 \cup E_3\}$ has at most $1 + |E_2| + |E_3|/2$ connected components, where the 1 accounts for the component $H$. As for the excess of $G \setminus \{E_1 \cup E_2 \cup E_3\}$: since its subgraph $H$ has excess $\omega_0$, and each new component created by the removal of an edge in $E_2$ has excess at least 1, the graph $G \setminus \{E_1 \cup E_2 \cup E_3\}$ has excess at least $\omega_0 + |E_2|$.
This completes the proof.
A.3. Proof of Lemma 12.4. To understand the distances $\operatorname{dist}_{\tilde G}(x, i)$ for all $i \in \mathbb{T}_\ell$, we need some more notation. A simple pruning [43, Definition 4.4] is the operation of removing one leaf and its incident edge from a graph. By repeatedly pruning the graph $\tilde G_0$, we obtain a graph $\tilde G_2$ that contains at most two leaf vertices: 1 and $x$.
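The simple pruning operation can be sketched as follows (a hypothetical implementation on adjacency sets; the vertices to be preserved, playing the role of 1 and $x$, are passed via `keep`):

```python
def prune(adj, keep=()):
    """Repeatedly remove leaves (degree-1 vertices) not listed in `keep`.
    `adj` maps each vertex to the set of its neighbors in a simple graph;
    this sketches the iterated simple pruning used to reduce G0 to G2."""
    adj = {v: set(ns) for v, ns in adj.items()}
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v in adj and len(adj[v]) == 1 and v not in keep:
                (w,) = adj[v]          # the unique neighbor of the leaf v
                adj[w].discard(v)
                del adj[v]
                changed = True
    return adj

# A path 0-1-2-3: keeping the endpoints preserves the whole path, while
# pruning freely leaves no degree-1 vertex behind.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
assert set(prune(path, keep=(0, 3))) == {0, 1, 2, 3}
assert all(len(ns) != 1 for ns in prune(path).values())
```

The pruned graph retains all cycles (pruning never removes an edge of a cycle), which is why its excess equals that of the original graph.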
Since there are no self-loops or multi-edges in our graph $G$, for any $y_1 \neq y_2$ with $\pi(y_1) = \pi(y_2) = i$, and $w_1 \sim y_1$ and $w_2 \sim y_2$, it is necessary that $w_1 \neq w_2$. Therefore: