Stability of the matrix Dyson equation and random matrices with correlations

We consider real symmetric or complex hermitian random matrices with correlated entries. We prove local laws for the resolvent and universality of the local eigenvalue statistics in the bulk of the spectrum. The correlations have fast decay but are otherwise of general form. The key novelty is the detailed stability analysis of the corresponding matrix valued Dyson equation whose solution is the deterministic limit of the resolvent.


Introduction
E. Wigner's vision of the ubiquity of random matrix spectral statistics in quantum systems posed a major challenge to mathematics. The basic conjecture is that the distribution of the eigenvalue gaps of a large self-adjoint matrix with sufficient disorder is universal in the sense that it is independent of the details of the system and depends only on the symmetry type of the model. These universal statistics were computed by Dyson, Gaudin and Mehta for the Gaussian Unitary and Orthogonal Ensembles (GUE/GOE) in the limit as the dimension of the matrix goes to infinity. GUE and GOE are the simplest mean field random matrix models in their respective symmetry classes. They have centered Gaussian entries that are identically distributed and independent (modulo the hermitian symmetry). The celebrated Wigner-Dyson-Mehta (WDM) universality conjecture, as formulated in the classical book of Mehta [44], asserts that the same gap statistics holds if the matrix elements are independent and have an arbitrary identical distribution (such matrices are called Wigner ensembles). The WDM conjecture has recently been proved in increasing generality in a series of papers [18,21,24,25] for both the real symmetric and complex hermitian symmetry classes via the Dyson Brownian motion. An alternative approach introducing the four-moment comparison theorem was presented in [49,50,52]. In this paper we only discuss universality in the bulk of the spectrum, but we remark that a similar development took place for the edge universality.
The next step towards Wigner's vision is to drop the assumption of identical distribution in the WDM conjecture but still maintain the mean field character of the model by requiring a uniform lower and upper bound on the variances of the matrix elements. This generalization has been achieved in two steps. If the matrix of variances is stochastic, then universality was proved in [18,27,29], in parallel with the proof of the original WDM conjecture for Wigner ensembles. Without the stochasticity condition on the variances the limiting eigenvalue density is not the Wigner semicircle any more; the correct density was analyzed in [1,2] and universality was proved in [4]. We remark that one may also depart from the semicircle law by adding a large diagonal component to Wigner matrices; universality for such deformed Wigner matrices was obtained in [43]. Finally, we mention a separate direction to generalize the original WDM conjecture that aims at departing from the mean field condition: bulk universality for general band matrices with a band width comparable to the matrix size was proved in [11], see also [48] for Gaussian block-band matrices.
In this paper we drop the third key condition in the original WDM conjecture, the independence of the matrix elements, i.e. we consider matrices with correlated entries. Correlations come in many different forms and if they are extremely strong and long range, the universality may even be violated. We therefore consider random matrix models with a suitable decay of correlations. These models still carry sufficiently many random degrees of freedom for Wigner's vision to hold and, indeed, our main result yields spectral universality for such matrices.
We now describe the key points of the current work. Our main result is the local law for the resolvent G(ζ) = (H − ζ1)^{−1}, asserting that G(ζ) is close to the deterministic solution M(ζ) of the associated matrix equation (1.2). The central role of (1.2) in the context of random matrices has been recognized by several authors [33,38,45,53]. We will call (1.2) the Matrix Dyson Equation (MDE), since the analogous equation for the resolvent is sometimes called the Dyson equation in perturbation theory. Local laws have become a cornerstone in the analysis of spectral properties of large random matrices [4,8,20,23,29,35,37,51]. In its simplest form, a local law considers the normalized trace (1/N) Tr G(ζ) of the resolvent. Viewed as a Stieltjes transform, it describes the empirical density of eigenvalues on the scale determined by η = Im ζ. Assuming a normalization such that the spectrum of H remains bounded as N → ∞, the typical eigenvalue spacing in the bulk is of order 1/N. The local law asserts that this normalized trace approaches a deterministic function m(ζ) as the size N of the matrix tends to infinity, and this convergence holds uniformly even if η = η_N depends on N, as long as η ≫ 1/N. Equivalently, the empirical density of the eigenvalues converges, on any scale slightly above 1/N, to a deterministic limit measure on R with Stieltjes transform m(ζ).
Since G is asymptotically close to M, the deterministic limit of the Stieltjes transform of the empirical spectral measure is given by m(ζ) = (1/N) Tr M(ζ). Already in the case of random matrices with centered independent entries (Wigner-type matrices) the limiting measure ρ(dω) and its Stieltjes transform m(ζ) typically depend on the entire matrix of variances s_xy := E|h_xy|², and the only known way to determine ρ is to solve (1.2). However, in this setting the problem simplifies considerably because the off-diagonal elements of G tend to zero, M is a diagonal matrix, and (1.2) reduces to a vector equation for its diagonal elements. In case the variance matrix is doubly stochastic, ∑_y s_xy = 1 (generalized Wigner matrix), the problem simplifies yet again, leading to M = m_sc 1, where m_sc = m_sc(ζ) is the Stieltjes transform of the celebrated semicircle law.
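As a purely numerical aside, outside the formal development of the paper, the generalized Wigner case is easy to check: m_sc is the root of m² + ζm + 1 = 0 with positive imaginary part, and it agrees with the Stieltjes integral of the semicircle density. The sketch below assumes only these standard facts.

```python
import numpy as np

def m_sc(zeta):
    """Stieltjes transform of the semicircle law: the root of
    m^2 + zeta*m + 1 = 0 with positive imaginary part."""
    root = np.sqrt(zeta**2 - 4 + 0j)
    m = (-zeta + root) / 2
    return m if m.imag > 0 else (-zeta - root) / 2

zeta = 0.5 + 0.1j
m = m_sc(zeta)

# self-consistent equation and the Nevanlinna property
assert abs(m**2 + zeta * m + 1) < 1e-12
assert m.imag > 0

# compare with a direct numerical Stieltjes integral of rho_sc on [-2, 2]
x = np.linspace(-2.0, 2.0, 200001)
rho = np.sqrt(np.maximum(4 - x**2, 0.0)) / (2 * np.pi)
f = rho / (x - zeta)
integral = np.sum((f[1:] + f[:-1]) / 2) * (x[1] - x[0])
assert abs(integral - m) < 1e-3
```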
The main novelty of this work is to handle general correlations that do not allow one to simplify (1.2). The off-diagonal matrix elements G_xy, x ≠ y, do not vanish in general, even in the N → ∞ limit. The proof of the local law consists of two major parts. First, we derive an approximate version of (1.2) for the resolvent, the perturbed equation (1.4), with a small random error term. Second, we show that the matrix Dyson equation (1.2) is stable under small perturbations, concluding that G ≈ M. The nontrivial correlations and the non-commutativity of the matrix structure in the Dyson equation pose major difficulties compared to the uncorrelated case.
Local laws are the first step of a general three step strategy developed in [24,25,27,29] for proving universality. The second step is to add a tiny independent Gaussian component and prove universality for this slightly deformed model via analyzing the fast convergence of the Dyson Brownian motion (DBM) to local equilibrium. Finally, the third step is a perturbation argument showing that the tiny Gaussian component does not alter the local statistics.
In fact, the second and the third steps are very robust arguments and they easily extend to the correlated case. They do not use any properties of the original ensemble other than the a priori bounds encoded in the local laws, provided that the variances of the matrix elements have a positive lower bound (see [12,26,41,42]). Therefore our work focuses on the first step, establishing the stability of (1.2) and thus obtaining a local law.
Prior to the current paper, bulk universality had already been established for several random matrix models which carry some specific correlation from their construction. These include sample covariance matrices [25], adjacency matrices of large regular graphs [7] and invariant β-ensembles at various levels of generality [9,10,16,17,30,34,47]. However, none of these papers aimed at understanding the effect of a general correlation, nor were their methods suited to deal with it. Universality for Gaussian matrices with a translation invariant covariance structure was established in [3]. For general distributions of the matrix entries, but with a specific two-scale finite range correlation structure that is smooth on the large scale and translation invariant on the short scale, universality was proved in [14], independently of the current work.
Finally, we mention that there exists an extensive literature on the limiting eigenvalue distribution for random matrices with correlated entries on the global scale (see e.g. [5,6,13,33,36,46] and references therein); however, these works either dealt with Gaussian random matrices or with more specific correlation structures that allow one to effectively reduce (1.2) to a vector or scalar equation. While the matrix Dyson equation in full generality was introduced for the analysis on the global scale before us, we are not aware of a proof establishing that the empirical density of states converges to the deterministic density given by the solution of the MDE for a class of models as broad as the one considered in this paper. This convergence is expressed by the fact that (1/N) Tr G(ζ) ≈ (1/N) Tr M(ζ) holds for any fixed ζ ∈ H. We thus believe that our proof identifying the limiting eigenvalue distribution is a new result even on the global scale for ensembles with general short range correlations and non-Gaussian distribution.
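The global-scale statement (1/N) Tr G(ζ) ≈ (1/N) Tr M(ζ) at a fixed ζ ∈ H is easy to observe numerically in the simplest case, where M = m_sc 1. The following sketch samples a GOE matrix; it is purely illustrative and not part of the paper's argument.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400
# GOE normalized so that the spectrum concentrates on [-2, 2]
B = rng.standard_normal((N, N))
H = (B + B.T) / np.sqrt(2 * N)

zeta = 0.3 + 0.5j
G = np.linalg.inv(H - zeta * np.eye(N))
trace_G = np.trace(G) / N

# deterministic limit: Stieltjes transform of the semicircle law,
# root of m^2 + zeta*m + 1 = 0 with Im m > 0
root = np.sqrt(zeta**2 - 4 + 0j)
m = (-zeta + root) / 2
if m.imag < 0:
    m = (-zeta - root) / 2

assert abs(trace_G - m) < 0.05   # fluctuations are O(1/(N * Im zeta))
```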
We present the stability of the MDE and its application to random matrices with correlated entries separately. Our findings on the MDE are given in Sect. 2.1, while Sect. 2.2 contains the results about random matrices with correlated entries. These sections can be read independently of each other, with the latter relying on the former only through some basic definitions. In Sect. 3 we prove the local law for random matrices with correlations. The proof relies on the results stated in Sect. 2.1. These results concerning the MDE are established in Sect. 4, which can be read independently of any other section. Besides the results from Sect. 2.1 on the MDE the main ingredients of the proof in Sect. 3 are (i) estimates on the random error term appearing in the approximate MDE (1.4) and (ii) the fluctuation averaging mechanism for this error term. These two inputs (Lemma 3.4 and Proposition 3.5) are established in Sects. 5 and 6, respectively. However, Sect. 3 can be understood without reading the ensuing sections, taking these inputs for granted. Finally, we apply the local law to establish the rigidity of eigenvalues and bulk universality in Sect. 7. The "Appendix" collects the proofs for auxiliary results of generic nature that are not directly concerned with either the MDE or the random matrices.

The matrix Dyson equation
In this section we present our main results on the Matrix Dyson Equation and its stability. The corresponding proofs are carried out in Sect. 4. We consider the linear space C^{N×N} of N × N complex matrices R = (r_xy)_{x,y=1}^N, and make it a Hilbert space by equipping it with the standard normalized scalar product ⟨R, T⟩ := (1/N) Tr(R* T), R, T ∈ C^{N×N}. (2.1) We denote the cone of strictly positive definite matrices by C_+ := { R ∈ C^{N×N} : R > 0 }, and by its closure the cone of positive semidefinite matrices. Let A = A* ∈ C^{N×N} be a self-adjoint matrix. We will refer to A as the bare matrix. Furthermore, let S : C^{N×N} → C^{N×N} be a linear operator that is self-adjoint with respect to the scalar product (2.1), ⟨R, S[T]⟩ = ⟨S[R], T⟩ for all R, T ∈ C^{N×N}, and positivity preserving, i.e. S maps positive semidefinite matrices to positive semidefinite matrices. We will refer to S as the self-energy operator.
We call a pair (A, S) consisting of a bare matrix and a self-energy operator with the properties above a data pair. For a given data pair (A, S) and a spectral parameter ζ ∈ H in the upper half plane we consider the associated Matrix Dyson Equation 1 + (ζ1 − A + S[M(ζ)]) M(ζ) = 0, (2.2) subject to the constraint Im M(ζ) := (M(ζ) − M(ζ)*)/(2i) > 0. (2.3) The question of existence and uniqueness of solutions to (2.2) with the constraint (2.3) has been answered in [38]. The MDE has a unique solution matrix M(ζ) for any spectral parameter ζ ∈ H, and these matrices constitute a holomorphic function M : H → C^{N×N}. On the space of matrices C^{N×N} we consider three norms. For R ∈ C^{N×N} we denote by ‖R‖ the operator norm induced by the standard Euclidean norm ‖·‖ on C^N, by ‖R‖_hs := √⟨R, R⟩ the norm associated with the scalar product (2.1), and by ‖R‖_max := max_{x,y} |r_xy|, (2.4) the entrywise maximum norm on C^{N×N}. We also denote the normalized trace of R by ⟨R⟩ := (1/N) Tr R = ⟨1, R⟩. For linear operators T : C^{N×N} → C^{N×N} we denote by ‖T‖ the operator norm induced by the norm ‖·‖ on C^{N×N}, and by ‖T‖_sp the operator norm induced by ‖·‖_hs.
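For concreteness, the MDE (2.2) can be solved numerically by the fixed-point iteration M ↦ (A − ζ1 − S[M])^{−1} when Im ζ is large. The toy data pair below is our own illustration, not an example from the paper: a tridiagonal bare matrix and a Schur-product self-energy S[R] = C ∘ R with a positive semidefinite kernel C, which is self-adjoint and positivity preserving by the Schur product theorem.

```python
import numpy as np

N = 40
# toy data pair (A, S): banded bare matrix, Schur-product self-energy
A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
idx = np.arange(N)
C = np.exp(-0.5 * (idx[:, None] - idx[None, :])**2)  # Gaussian kernel, PSD

def S(R):
    return C * R  # entrywise (Schur) product preserves positivity

zeta = 4j
I = np.eye(N)

# fixed-point iteration  M -> (A - zeta*1 - S[M])^{-1}
M = np.linalg.inv(A - zeta * I)
for _ in range(200):
    M = np.linalg.inv(A - zeta * I - S(M))

# residual of the MDE:  1 + (zeta*1 - A + S[M]) M = 0
residual = I + (zeta * I - A + S(M)) @ M
assert np.linalg.norm(residual) < 1e-10

# the constraint Im M > 0: (M - M*)/(2i) is positive definite
ImM = (M - M.conj().T) / 2j
assert np.min(np.linalg.eigvalsh(ImM)) > 0
```

Starting the iteration from a matrix with positive imaginary part keeps every iterate in the cone Im M > 0, so the scheme converges to the unique solution singled out by (2.3) for Im ζ large.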
The following proposition provides a representation of the solution M as the Stieltjes-transform of a measure with values in C + . This is a standard result for matrix-valued Nevanlinna functions (see e.g. [32]). For the convenience of the reader we provide a proof which also gives an effective control on the boundedness of the support of this matrix-valued measure.
The measure V(dτ) = (v_xy(dτ))_{x,y=1}^N on the real line with values in positive semidefinite matrices is unique. It satisfies the normalization V(R) = 1 and has support in the interval [−κ, κ], where κ is an explicit constant determined by ‖A‖ and ‖S‖. We will now make additional quantitative assumptions on the data pair (A, S) that ensure a certain regularity of the measure V(dτ). Our assumptions, labeled A1 and A2, always come together with sets of model parameters, P_1 and P_2 respectively, that control them effectively. Estimates will typically be uniform in all data pairs that satisfy these assumptions with the given set of model parameters. In particular, they are uniform in the size N of the matrix, which is of great importance in the application to random matrix theory.
A1 Flatness Let P_1 = (p_1, P_1) with p_1, P_1 > 0. The self-energy operator S is called flat (with model parameters P_1) if it satisfies the lower and upper bounds p_1 ⟨R⟩ 1 ≤ S[R] ≤ P_1 ⟨R⟩ 1 for every positive semidefinite R. (2.7)

Proposition 2.2 (Regularity) Assume A1 and ‖A‖ ≤ P_0 for some constant P_0 > 0. Then the averaged measure ⟨V(dτ)⟩ admits a probability density ρ, defined through (2.9), which is uniformly Hölder continuous. More precisely, |ρ(τ_1) − ρ(τ_2)| ≤ C |τ_1 − τ_2|^c, where c > 0 is a universal constant and the constant C > 0 depends only on the model parameters P_1 and P_0. Furthermore, ρ is real analytic on the open set {τ ∈ R : ρ(τ) > 0}.

Definition 2.3 (Self-consistent density of states) Assuming a flat self-energy operator, the probability density ρ : R → [0, ∞), defined through (2.9), is called the self-consistent density of states (of the MDE with data pair (A, S)). We denote by supp ρ ⊆ R its support on the real line and call it the self-consistent spectrum. With a slight abuse of notation we also denote by ρ(ζ) := (1/π) Im ⟨M(ζ)⟩, ζ ∈ H, the harmonic extension of the self-consistent density of states to the upper half plane.

A2 Faster than power law decay Let P_2 = (P, π_1, π_2), where P > 0 is a constant and π_k = (π_k(ν))_{ν=0}^∞, k = 1, 2, are sequences of positive constants. The data pair (A, S) is said to have faster than power law decay (with model parameters P_2) if there exists a pseudometric d on the index space {1, . . . , N} such that the pseudometric space X = ({1, . . . , N}, d) has sub-P-dimensional volume (cf. (2.11)) and the decay bounds (2.12) and (2.13) hold for any ν ∈ N and x, y ∈ X.
In order to state bounds of the form (2.12) and (2.13) more conveniently we introduce the following matrix norms.

Definition 2.4 (Faster than power law decay) Given a pseudometric d on {1, . . . , N} and a sequence π = (π(ν))_{ν=0}^∞ of positive constants, we define the norm ‖R‖_π, which measures the decay of the entries r_xy in the distance d(x, y) up to a term of order 1/N. If ‖R‖_π ≲ 1 for some sequence π, we say that R has faster than power law decay (up to level 1/N) in the pseudometric space X. This norm expresses the typical behavior of many matrices in this paper: they have an off-diagonal decay faster than any power, up to a possible mean-field term of order 1/N. Using this norm, the bounds (2.12) and (2.13) take the simple form of ‖·‖_π-bounds on the data pair with the sequences π_1 and π_2, respectively.

Our main result, the stability of the MDE, holds uniformly for all spectral parameters that are either away from the self-consistent spectrum, supp ρ, or where the self-consistent density of states takes positive values. Therefore, for any δ > 0 we set D_δ := { ζ ∈ H : ρ(ζ) > δ or dist(ζ, supp ρ) > δ }.

Theorem 2.5 (Faster than power law decay of solution) Assume A1 and A2 and let δ > 0. Then there exists a positive sequence γ such that ‖M(ζ)‖_γ ≤ 1 for all ζ ∈ D_δ. The sequence γ depends only on δ and the model parameters P_1 and P_2.
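The decay phenomenon behind Theorem 2.5 can be observed in a toy model of our own (not from the paper): for short-range data (A, S) with d(x, y) = |x − y|, the entries of the solution M decay rapidly in |x − y|.

```python
import numpy as np

N = 60
A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)  # banded bare matrix
idx = np.arange(N)
C = np.exp(-0.5 * (idx[:, None] - idx[None, :])**2)            # short-range PSD kernel

zeta = 4j
I = np.eye(N)
M = np.linalg.inv(A - zeta * I)
for _ in range(200):
    M = np.linalg.inv(A - zeta * I - C * M)   # S[M] = C o M (Schur product)

# off-diagonal decay of the solution in the distance |x - y|
row = np.abs(M[0])
assert row[1] > row[5] > row[10]
assert row[N // 2] < 1e-8
```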
Our main result on the MDE is its stability with respect to the entrywise maximum norm on C^{N×N}, see (2.4). The choice of this norm is especially useful for applications in random matrix theory, since the matrix valued error terms are typically controlled in this norm. We denote by B^max_τ(R) := { T ∈ C^{N×N} : ‖T − R‖_max < τ } the ball of radius τ > 0 around R ∈ C^{N×N} with respect to the entrywise maximum norm.
Theorem 2.6 (Stability) Assume A1 and A2, let δ > 0 and ζ ∈ D_δ. Then there exist constants c_1, c_2 > 0 and a unique function G = G_ζ : B^max_{c_1}(0) → B^max_{c_2}(M(ζ)) such that G(0) = M(ζ) and, for every D ∈ B^max_{c_1}(0), the matrix G(D) satisfies the perturbed MDE 1 + (ζ1 − A + S[G(D)]) G(D) = D. The function G is analytic. In particular, there exists a constant C > 0 such that ‖G(D_1) − G(D_2)‖_max ≤ C ‖D_1 − D_2‖_max (2.17) for all D_1, D_2 ∈ B^max_{c_1/2}(0). Furthermore, there is a sequence γ of positive constants, and a linear operator Z = Z_ζ : C^{N×N} → C^{N×N}, such that the derivative of G, evaluated at D = 0, has the form ∇G(0) = Z + M Id, (2.18) and Z, as well as its adjoint Z* with respect to the scalar product (2.1), satisfy faster than power law decay bounds with the sequence γ. Here c_1, c_2, C and γ depend only on δ and the model parameters P_1, P_2 from assumptions A1 and A2.
Theorem 2.6 states quantitative regularity properties of the analytic map G. These estimates yield strong stability properties of the MDE. For a concrete application in the proof of the local law, see Corollary 3.2 below.

Random matrices with correlations
In this section we present our results on the local eigenvalue statistics of random matrices with correlations. Let H = (h_xy)_{x,y=1}^N ∈ C^{N×N} be a self-adjoint random matrix. For a spectral parameter ζ ∈ H we consider the Matrix Dyson Equation (2.20) associated with the data pair (A, S) generated by the first two moments of H, namely A := E H and S[R] := E (H − A) R (H − A), and show that the resolvent G(ζ) := (H − ζ1)^{−1} (2.21) is close to the non-random solution M(ζ) of the MDE (2.20), provided N is large enough. In order to list our assumptions, we write H as a sum of its expectation and fluctuation, H = A + N^{−1/2} W. (2.22) Here, the bare matrix A is a non-random self-adjoint matrix and W is a self-adjoint random matrix with centered entries, E W = 0. The normalization factor N^{−1/2} in (2.22) ensures that the spectrum of the fluctuation matrix W, with entries of a typical size of order one, remains bounded. In the following we will assume that there exists some pseudometric d on the index set {1, . . . , N}, such that the resulting pseudometric space X has sub-P-dimensional volume for some constant P > 0, i.e. d satisfies (2.11), and that the bare and fluctuation matrices satisfy the following assumptions:

B1 Existence of moments Moments of all orders of W exist, i.e., there is a sequence of positive constants κ_1 = (κ_1(ν))_{ν∈N} such that E |w_xy|^ν ≤ κ_1(ν) for all x, y ∈ X and ν ∈ N.

B2 Decay of expectation The entries a_xy of the bare matrix A decay in the distance of the indices x and y, i.e., there is a sequence of positive constants κ_2 = (κ_2(ν))_{ν∈N} such that |a_xy| ≤ κ_2(ν) (1 + d(x, y))^{−ν} (2.24) for all x, y ∈ X and ν ∈ N.

B3 Decay of correlation The correlations in W are fast decaying, i.e., there is a sequence of positive constants κ_3 = (κ_3(ν))_{ν∈N} such that for all symmetric sets A, B ⊆ X × X, and all smooth functions φ : C^{|A|} → R and ψ : C^{|B|} → R of the matrix entries indexed by A and B, respectively, we have |Cov(φ, ψ)| ≤ κ_3(ν) ‖∇φ‖_∞ ‖∇ψ‖_∞ (1 + d(A, B))^{−ν}, where d(A, B) is the distance between A and B in the product metric on X, and ‖·‖_∞ denotes the supremum norm on vector valued functions.

B4 Flatness There is a positive constant κ_4 such that for any two deterministic vectors v, w ∈ C^N we have E |⟨v, W w⟩|² ≥ κ_4 ‖v‖² ‖w‖², (2.26) where ‖·‖ denotes the standard Euclidean norm on C^N.
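A simple way to produce a fluctuation matrix W satisfying a finite-range version of B3 is to smooth an iid Gaussian field with a short-range kernel; the kernel and the distance d(x, y) = |x − y| below are our own toy choices, purely for illustration. The covariance of the pre-symmetrized field can be computed exactly and vanishes beyond the kernel width.

```python
import numpy as np

# toy construction: entries of the pre-symmetrized field at index
# distance > 2 are exactly uncorrelated, so fast decay holds trivially
rng = np.random.default_rng(1)
N = 200
kernel = np.array([0.5, 1.0, 0.5])
kernel = kernel / np.linalg.norm(kernel)   # unit variance for the smoothed field

base = rng.standard_normal((N + 2, N + 2))
W = np.zeros((N, N))
for i, ki in enumerate(kernel):
    for j, kj in enumerate(kernel):
        W += ki * kj * base[i:i + N, j:j + N]
W = (W + W.T) / np.sqrt(2)                 # impose the real symmetric class

# exact covariance of the pre-symmetrized field at index offsets (dx, dy):
# Cov(w_{x,y}, w_{x+dx,y+dy}) = c(dx) * c(dy), c = kernel autocorrelation
def cov(dx, dy):
    c = lambda k: sum(kernel[a] * kernel[a + k] for a in range(len(kernel) - k))
    return c(dx) * c(dy)

assert abs(cov(0, 0) - 1.0) < 1e-12        # unit variance
assert cov(2, 0) > 0                       # genuine short-range correlation
assert all(cov(dx, 0) == 0 for dx in range(3, 6))   # vanishes beyond the range
```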
We consider the constants from assumptions B1–B4, together with the dimension parameter P of the pseudometric space, as the model parameters of the ensemble, K := (P, κ_1, κ_2, κ_3, κ_4). (2.27)

Theorem 2.7 (Local law) Assume B1–B4 and pick any δ, ε, ν > 0. Then, uniformly for all spectral parameters ζ ∈ H with ρ(ζ) ≥ δ and Im ζ ≥ N^{−1+ε}, the entrywise local law holds, P( max_{x,y} |G_xy(ζ) − m_xy(ζ)| ≥ N^ε (N Im ζ)^{−1/2} ) ≤ C N^{−ν}. (2.28) Furthermore, the normalized trace converges with the improved rate, P( |⟨G(ζ)⟩ − ⟨M(ζ)⟩| ≥ N^ε (N Im ζ)^{−1} ) ≤ C N^{−ν}. (2.29) The constant C depends only on the model parameters K in addition to δ, ε and ν.
In Sect. 3 we present the proof of Theorem 2.7, which is based on the results from Sect. 2.1 about the Matrix Dyson Equation. As a standard consequence of the local law (2.28) and the uniform boundedness of Im m_xx from Theorem 2.5, the eigenvectors of H corresponding to bulk eigenvalues are completely delocalized. This follows directly from the uniform boundedness of Im G_xx(ζ) and the spectral decomposition of the resolvent (see e.g. [20]).

Corollary 2.8 (Delocalization of eigenvectors)
Pick any δ, ε, ν > 0 and let u be a normalized, ‖u‖ = 1, eigenvector of H corresponding to an eigenvalue λ ∈ R in the bulk, i.e., ρ(λ) ≥ δ. Then P( max_x |u_x| ≥ N^{−1/2+ε} ) ≤ C N^{−ν} for a positive constant C, depending only on the model parameters K in addition to δ, ε and ν.
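Delocalization is easy to observe numerically; the sketch below uses a GOE sample (purely illustrative, not the correlated ensemble of the paper) and checks that a bulk eigenvector has sup-norm of order N^{−1/2}.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 300
B = rng.standard_normal((N, N))
H = (B + B.T) / np.sqrt(2 * N)

eigvals, eigvecs = np.linalg.eigh(H)
u = eigvecs[:, N // 2]          # eigenvector from the middle of the bulk

sup_norm = np.max(np.abs(u))
assert sup_norm >= 1 / np.sqrt(N)                 # forced by ||u|| = 1
assert sup_norm <= 5 * np.sqrt(np.log(N) / N)     # delocalization, loose constant
```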
The averaged local law (2.29) directly implies the rigidity of the eigenvalues in the bulk. For any τ ∈ R, we define the index i(τ) := ⌈N ∫_{−∞}^τ ρ(ω) dω⌉. (2.30) This is the index of an eigenvalue that is typically close to a spectral parameter τ in the bulk. Then the standard argument presented in Sect. 7.1 proves the following result.

Corollary 2.9 (Rigidity) For any δ, ε, ν > 0 and any τ ∈ R with ρ(τ) ≥ δ we have P( |λ_{i(τ)} − τ| ≥ N^{−1+ε} ) ≤ C N^{−ν}, for a positive constant C, depending only on the model parameters K in addition to δ, ε and ν.
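The quantile index in (2.30) is concrete in the semicircle case, where the integral has a closed form. The following sketch (our own illustration on a GOE sample) checks that the eigenvalue with index i(τ) indeed sits near τ, with a deliberately loose tolerance.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 300
B = rng.standard_normal((N, N))
H = (B + B.T) / np.sqrt(2 * N)
eigvals = np.sort(np.linalg.eigvalsh(H))

def F_sc(tau):
    """cumulative distribution function of the semicircle law on [-2, 2]"""
    return 0.5 + (tau * np.sqrt(4 - tau**2) + 4 * np.arcsin(tau / 2)) / (4 * np.pi)

for tau in (-1.0, 0.0, 1.0):
    i = int(np.ceil(N * F_sc(tau)))         # 1-based quantile index, as in (2.30)
    assert abs(eigvals[i - 1] - tau) < 0.1  # rigidity, loose tolerance
```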
Another consequence of Theorem 2.7 is the universality of the local eigenvalue statistics in the bulk of the spectrum of H, both in the sense of averaged correlation functions and in the sense of gap universality. For the universality statement we make the following additional assumption, which is stronger than B4:

B5 Fullness There is a positive constant κ_5 such that E |Tr(W R)|² ≥ κ_5 Tr(R²) for any real symmetric R ∈ R^{N×N} (any complex hermitian R ∈ C^{N×N}) in the case of real symmetric (complex hermitian) H.
When B5 is assumed we consider κ 5 as an additional model parameter.
The first formulation of bulk universality states that the k-point correlation functions ρ_k of the eigenvalues of H, rescaled around an energy parameter ω in the bulk, converge weakly to those of the GUE/GOE. The latter are given by the correlation functions of well known determinantal processes. The precise statement is the following: Corollary 2.10 (Correlation function bulk universality) Let H satisfy B1–B3 and B5 with β = 1 (β = 2). Pick any δ > 0 and choose any ω ∈ R with ρ(ω) ≥ δ. Fix k ∈ N and ε ∈ (0, 1/2). Then for any smooth, compactly supported test function Φ : R^k → R the k-point local correlation functions ρ_k : R^k → [0, ∞) of the eigenvalues of H converge to the k-point correlation function Υ_k : R^k → [0, ∞) of the GOE (GUE) determinantal point process, where τ = (τ_1, . . . , τ_k), and the positive constants C, c depend only on δ and the model parameters.
The second formulation compares the joint distributions of gaps between consecutive eigenvalues of H in the bulk with those of the GUE/GOE. The proofs of Corollaries 2.10 and 2.11 are presented in Sect. 7.2.

Corollary 2.11 (Gap universality in bulk)
Let H satisfy B1–B3 and B5 with β = 1 (β = 2). Pick any δ > 0, an energy τ in the bulk, i.e. ρ(τ) ≥ δ, and let i = i(τ) be the corresponding index defined in (2.30). Then for all n ∈ N and all smooth compactly supported observables Φ : R^n → R, there are two positive constants C and c, depending on n, δ and the model parameters, such that the local eigenvalue distribution is universal. Here the second expectation E_G is with respect to the GUE and GOE in the cases of complex Hermitian and real symmetric H, respectively, and ρ_sc(0) = 1/(2π) is the value of Wigner's semicircle law at the origin.
During the final preparation of this manuscript and after announcing our theorems, we learned that a similar universality result, but with a special correlation structure, was proved independently in [14]. The covariances in [14] have a specific finite range and translation invariant structure (2.32), where ψ is a piecewise Lipschitz function with finite support in the third and fourth variables. The short scale translation invariance in (2.32) allows one to use a partial Fourier transform after effectively decoupling the slow variables from the fast ones. This reduces the matrix equation (1.2) to a vector equation for N² variables, and the necessary stability result follows directly from [1]. The main difference between the current work and [14] is that here we analyze (1.2) as a genuine matrix equation without relying on translation invariance, and thus arbitrary short range correlations are allowed.

Local law for random matrices with correlations
In this section we show how the stability of the MDE, Theorem 2.6, can be combined with probabilistic estimates for random matrices with correlated entries to obtain a conceptually simple proof of the local law, Theorem 2.7. We state these probabilistic estimates in Lemma 3.4 and Proposition 3.5 below before applying them to establish the local law. Their proofs are postponed to Sects. 5 and 6, respectively. Consider any self-adjoint random matrix H, and let (A, S) be the data pair for the MDE generated by the first two moments of H through (2.20). Clearly, the self-energy operator S : C^{N×N} → C^{N×N} generated by (2.20) is self-adjoint with respect to the scalar product (2.1), and preserves the cone of positive semidefinite matrices. The next lemma, whose proof is postponed to the end of this section, shows that the other assumptions of our MDE results in Sect. 2.1 are also satisfied for the random matrices considered in Sect. 2.2. In order to apply the stability of the MDE, we first rewrite the defining equation of the resolvent (2.21), namely −1 = (ζ1 − H)G(ζ), in the form 1 + (ζ1 − A + S[G(ζ)]) G(ζ) = D(ζ), (3.1a) a perturbed version of the MDE (2.2). Here the error matrix D : H → C^{N×N} is given by D(ζ) := (N^{−1/2} W + S[G(ζ)]) G(ζ). (3.1b) We view the resolvent G(ζ) as a perturbation of the deterministic matrix M(ζ) induced by the random perturbation D(ζ). Using the notation G = G_ζ from Theorem 2.6, we identify from (2.2) and (3.1a), M(ζ) = G_ζ(0) and G(ζ) = G_ζ(D(ζ)). Thus Theorem 2.6 yields the following:
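The perturbed equation is an exact algebraic identity, valid for any linear self-energy operator, since it only uses −1 = (ζ1 − H)G(ζ); this can be verified numerically. The toy self-energy below is our own choice for illustration.

```python
import numpy as np

# sanity check of the perturbed MDE: for ANY linear S, the resolvent
# G = (H - zeta)^{-1} of H = A + W/sqrt(N) satisfies
#   1 + (zeta - A + S[G]) G = (W/sqrt(N) + S[G]) G
# exactly, independently of the distribution of W
rng = np.random.default_rng(4)
N = 50
A = np.diag(rng.standard_normal(N))
B = rng.standard_normal((N, N))
W = (B + B.T) / np.sqrt(2)
H = A + W / np.sqrt(N)

def S(R):                      # toy Wigner-type self-energy (illustrative)
    return np.mean(np.diag(R)) * np.eye(N)

zeta = 0.2 + 0.3j
I = np.eye(N)
G = np.linalg.inv(H - zeta * I)

lhs = I + (zeta * I - A + S(G)) @ G
D = (W / np.sqrt(N) + S(G)) @ G
assert np.linalg.norm(lhs - D) < 1e-10
```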

Corollary 3.2 (Stability of the perturbed MDE) Assume A1 and A2, let δ > 0 and ζ ∈ D_δ, and suppose that the rough a priori bound ‖G(ζ) − M(ζ)‖_max ≤ c_1/2 (3.2) holds, with c_1 from Theorem 2.6. Then ‖G(ζ) − M(ζ)‖_max ≤ C ‖D(ζ)‖_max, (3.3) and ⟨G(ζ) − M(ζ)⟩ admits an expansion (3.4) whose leading term is ⟨J(ζ), D(ζ)⟩ and whose remainder is quadratic in ‖D(ζ)‖_max, for some non-random J(ζ) ∈ C^{N×N} with fast decay, ‖J(ζ)‖_γ ≤ C, where the sequence γ is from Theorem 2.6. Corollary 3.2 shows that on the event where the rough a priori bound (3.2) holds, the proof of the local law (2.28) and (2.29) is reduced to bounding the error D on the right hand sides of (3.3) and (3.4) by (N Im ζ)^{−1/2} and (N Im ζ)^{−1}, respectively. In order to state such estimates for the error matrix we use the notion of stochastic domination, first introduced in [20], that is designed to compare random variables up to N^ε-factors on very high probability sets.
Definition 3.3 (Stochastic domination) Let X = X^{(N)} and Y = Y^{(N)} be two sequences of non-negative random variables. We say that X is stochastically dominated by Y if P( X > N^ε Y ) ≤ C(ε, ν) N^{−ν}, for any ε > 0, ν ∈ N and some (N-independent) family of positive constants C. In this case we write X ≺ Y.
In this paper the family C of constants in Definition 3.3 will always be an explicit function of the model parameters (2.27) and possibly some additional parameters that are considered fixed and apparent from the context. However, the constants are always uniform in the spectral parameter ζ on the domain under consideration, and in the indices x, y in case X = r_xy is an element of a matrix R = (r_xy)_{x,y}. To use the notion of stochastic domination, we will think of H = H^{(N)} as embedded into a sequence of random matrices with the same model parameters.
The following lemma asserts that the error matrix D from (3.1b) converges to zero as the size N of the random matrix grows to infinity.
Lemma 3.4 (Smallness of the error matrix) The error matrix D from (3.1b) satisfies the unconditional bound (3.5). Near the real axis, in the regime where the harmonic extension of the self-consistent density of states is bounded away from zero, we have the stronger bound (3.6), ‖D(ζ)‖_max ≺ (N Im ζ)^{−1/2}, for all ζ ∈ H with ρ(ζ) ≥ δ and Im ζ ≥ N^{−1+ε}.
The proof of this key technical result is postponed to Sect. 5. In order to bound the first term on the right hand side of (3.4) we use the following fluctuation averaging mechanism (introduced in [28] for Wigner matrices) to improve the bound (3.6) to a better bound for the inner product ⟨J, D⟩, given a version of the entrywise local law.

Proposition 3.5 (Fluctuation averaging) Suppose that dist(ζ, Spec(H_B))^{−1} ≺ N^C for all B ⊆ X. Let ε ∈ (0, 1/2) be a constant and Λ a non-random control parameter with N^{−1/2} ≤ Λ ≤ N^{−ε}. Suppose that the entrywise local law holds in the form ‖G(ζ) − M(ζ)‖_max ≺ Λ. (3.7) Then the error matrix D, defined in (3.1b), satisfies |⟨R, D(ζ)⟩| ≺ Λ² for every non-random R ∈ C^{N×N} with faster than power law decay.
Note that Proposition 3.5 is stated on a slightly larger domain of spectral parameters than Theorem 2.7, as it allows ζ to be away from the convex hull of supp ρ even if ρ(ζ) is not bounded away from zero. This slight extension will be needed in Sect. 7. The proof of Proposition 3.5 is carried out in Sect. 6. We have now stated all the results needed to prove the local law. In order to keep formulas short, we will use the notation Δ(ζ) := ‖G(ζ) − M(ζ)‖_max.

Proof of Theorem 2.7 We start with the proof of (2.28). By the Stieltjes transform representation (2.5) of M and the trivial bound ‖G(ζ)‖ ≤ 1/Im ζ, the norm of the difference, Δ(ζ), converges to zero as ζ moves further away from the real axis. In particular, the a priori bound (3.2) needed for Corollary 3.2 automatically holds for sufficiently large Im ζ. Thus, combining the corollary with the unconditional error bound (3.5), the estimate (3.3) takes the form (3.10), for any fixed constant C_1 > 0 and sufficiently large η_*. Now let τ ∈ R, η_0 ∈ [N^{−1+ε}, η_*] and ζ_0 = τ + iη_0 ∈ H be such that ρ(ζ_0) ≥ δ for some δ ∈ (0, 1]. Note that ρ(ζ_0) ≥ δ and η_0 ≤ η_* imply τ ∈ [−C_1, C_1] for some positive constant C_1, because ρ is the harmonic extension of the self-consistent density of states, whose support is contained in [−κ, κ] (Proposition 2.1). Since, in addition, the self-consistent density of states is uniformly Hölder continuous (cf. Proposition 2.2), there is a constant c_1, depending on δ and P, such that inf_{η∈[η_0,η_*]} ρ(τ + iη) ≥ c_1. Therefore, by (3.6) and (2.17), we infer (3.11). Since (N Im ζ)^{−1/2} ≤ N^{−ε/2}, the inequality (3.11) establishes, on a high probability event, a gap in the set of possible values that Δ(ζ) can take. The indicator function in (3.11) is absent for ζ = τ + iη_* because of (3.10), i.e. at that point the value lies below the gap. From the Lipschitz continuity of ζ → Δ(ζ), with Lipschitz constant bounded by 2N² for Im ζ ≥ 1/N, and a standard continuity argument together with a union bound (e.g. Lemma A.1 in [4]), we conclude that Δ(ζ) lies below the gap for any ζ with Im ζ ∈ [η_0, η_*] with very high probability. Thus, using the definition of stochastic domination, we see that Δ(ζ) ≺ (N Im ζ)^{−1/2} uniformly on the domain under consideration, i.e., the entrywise local law (2.28) holds.

Now we prove (2.29). Let ζ ∈ H with Im ζ ≥ N^{−1+ε} and ρ(ζ) ≥ δ, so that the entrywise local law (3.7) holds at ζ with Λ := (N Im ζ)^{−1/2}. Applying the fluctuation averaging (Proposition 3.5) with this Λ and R := J(ζ) yields |⟨J, D⟩| ≺ (N Im ζ)^{−1}. Plugging this estimate for the first term into the right hand side of (3.4), and recalling the definition of stochastic domination, yields (2.29). This finishes the proof of Theorem 2.7.
Proof of Lemma 3.1 The condition (2.12) on the bare matrix A is clearly satisfied by (2.24). The lower bound on S in (2.7) follows from (2.26). To show this, write a positive semidefinite R in its spectral decomposition R = ∑_i λ_i r_i r_i* and apply (2.26) in the form E |⟨r_i, W v⟩|² ≥ κ_4, for any normalized vector v ∈ C^N; summing over i yields the lower bound. We will now verify the upper bounds on S in (2.7) and (2.13). Both bounds follow from the decay of covariances |E w_xu w_vy| ≤ κ_3(2ν) ((q_xy q_uv)^ν + (q_xv q_uy)^ν), q_xy := (1 + d(x, y))^{−1}. (3.12) Indeed, to see the upper bound in (2.7) it suffices to show (3.13) for a constant C > 0, depending on K, and any normalized vector r ∈ C^N, because we can use for any positive semidefinite R the spectral decomposition R = ∑_i λ_i r_i r_i* as above. The estimate (3.12) yields (3.14), where we defined the matrix Q^{(ν)} with entries q^{(ν)}_xy := q_xy^ν and |r| := (|r_x|)_{x∈X}. Since the right hand side of (3.15) is finite, the bound (2.13) follows.

The solution of the matrix Dyson equation
Most of the inequalities in this and the following section are uniform in the data pair (A, S) that determines the MDE and its solution, given a fixed set of model parameters P k corresponding to the assumptions Ak. We therefore introduce a convention for inequalities up to constants, depending only on the model parameters.

Convention 4.1 (Comparison relation and constants)
Suppose a set of model parameters P is given. Within the proofs we will write C and c for generic positive constants, depending on P. In particular, C and c may change their values from inequality to inequality. If C, c depend on additional parameters L, we will indicate this by writing C(L), c(L). We also use the comparison relation α ≲ β, or β ≳ α, for any positive α and β if there exists a constant C > 0 that depends only on P, but is otherwise uniform in the data pair (A, S), such that α ≤ Cβ. In particular, C does not depend on the dimension N or the spectral parameter ζ. In case α ≲ β ≲ α we write α ∼ β.
For two positive semidefinite matrices R, T we similarly write R ≲ T if the inequality R ≤ CT in the sense of quadratic forms holds with a constant C > 0 depending only on the model parameters.
In the upcoming analysis many quantities depend on the spectral parameter ζ . We will often suppress this dependence in our notation and write e.g. M = M(ζ ), ρ = ρ(ζ ), etc.
Proof of Proposition 2.1 In this proof we will generalize the proof of Proposition 2.1 from [2] to our matrix setup. By taking the imaginary part of both sides of the MDE and using Im M ≥ 0 and A = A*, we see that Im M(ζ) = M(ζ)* ( (Im ζ)1 + S[Im M(ζ)] ) M(ζ) ≥ (Im ζ) M(ζ)* M(ζ).
In particular, this implies the trivial bound on the solution to the MDE, Let w ∈ C N be normalized, w * w = 1. Since M(ζ ) has positive imaginary part, the analytic function ζ → w * M(ζ )w takes values in H. From the trivial upper bound (4.1) and the MDE itself, we infer the asymptotics iηw * M(iη)w → −1 as η → ∞.
By the characterization of Stieltjes transforms of probability measures on the complex upper half plane (cf. Theorem 3.5 in [31]), we infer the representation w*M(ζ)w = ∫_R v_w(dτ)(τ − ζ)^{-1}, where v_w is a probability measure on the real line. By polarization, we find the general representation (2.5).
Indeed, letting ( · )_± denote the positive and negative parts, we find the corresponding bound for any R ∈ C^{N×N}. Since ‖R‖_hs ≤ ‖R‖, the bound ‖S‖ ≲ 1 follows. The following argument proves that Im M(ζ) → 0 as Im ζ ↓ 0, locally uniformly for all ζ ∈ H with |ζ| > κ. Let us fix ζ ∈ H with |ζ| > κ and suppose that M satisfies the upper bound (4.3). Then by taking the inverse and then the norm on both sides of (2.2) we conclude that (4.4) holds. Therefore, (4.3) implies (4.4) and we see that there is a gap in the possible values of ‖M‖, namely (4.5). Since ζ → M(ζ) is a continuous function and for large Im ζ the values of this function lie below the gap by the trivial bound (4.1), we infer (4.6). By taking the norm on both sides of (4.6), using a trivial estimate on the right hand side and rearranging the resulting terms, we get (4.7). Here we used ‖M‖²‖S‖ < 1, which is satisfied by (4.5) for |ζ| > κ. We may estimate the right hand side of (4.7) further by applying (4.5). Thus we find (4.8). The right hand side of (4.8) converges to zero locally uniformly for all ζ ∈ H with |ζ| > κ as Im ζ ↓ 0. This finishes the proof of Proposition 2.1. □
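As a concrete illustration of the MDE (2.2), a toy special case can be solved numerically by fixed-point iteration started from the large-|ζ| asymptote M ≈ −ζ^{-1}. This is only a sketch under illustrative assumptions: the flat self-energy S[R] = ⟨R⟩1 and the choice A = 0 are not the general setup of the paper, but for them the normalized trace of M reduces to the Stieltjes transform of the semicircle law, which can be checked against the closed-form expression.

```python
import numpy as np

N = 50
z = 1.0 + 0.5j                      # spectral parameter in the upper half plane
A = np.zeros((N, N))                # toy choice for the bare matrix

def S(R):
    # flat self-energy S[R] = <R> * Id (illustrative stand-in for E[W R W])
    return np.trace(R) / N * np.eye(N)

# fixed-point iteration M -> (A - z - S[M])^{-1}, started from the
# large-|z| asymptote M ~ -1/z; Im z > 0 keeps the iterates in the upper half plane
M = (-1 / z) * np.eye(N)
for _ in range(200):
    M = np.linalg.inv(A - z * np.eye(N) - S(M))

m = np.trace(M) / N                     # normalized trace <M>
m_sc = (-z + np.sqrt(z * z - 4)) / 2    # semicircle Stieltjes transform, Im > 0 branch
```

For this flat toy model the iteration converges geometrically and `m` agrees with `m_sc` to machine precision; the general correlated case requires the stability analysis developed in this section.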
The following proposition lists bounds on M that, besides the ones stated in Sect. 2.1, constitute the only properties of M that we need outside this section.

Proposition 4.2 (Properties of the solution) Assume A1 and that ‖A‖ ≤ P₀ for some constant P₀ > 0. Then, uniformly for all spectral parameters ζ ∈ H, the following bounds hold: (i) the solution is bounded in the spectral norm [cf. (4.9)]; (ii) the singular values of the solution are bounded from below [cf. (4.10)].
Proof The inequalities (4.9) and (4.10) provide upper and lower bounds on the singular values of the solution, respectively. Before proving these bounds we show that M has a bounded normalized Hilbert-Schmidt norm [cf. (4.12)]. For this purpose we take the imaginary part of (2.2), which yields (4.13), where we used the definition of ρ in (2.10). Taking the normalized trace on both sides of (4.13) shows (4.12).
Proof of (ii) Taking the norm on both sides of (2.2) yields (4.14), where ‖S‖_{hs→‖·‖} denotes the norm of S as an operator from C^{N×N} equipped with the norm ‖·‖_hs to C^{N×N} equipped with ‖·‖. For the last inequality in (4.14) we used (4.12) and that by A1 we have ‖S‖_{hs→‖·‖} ≲ 1 (cf. (4.2)).
For the upper bound, taking the imaginary part of the MDE [cf. (4.6)] and using A1 and that Im M ≳ Im ζ for bounded |ζ| by the Stieltjes transform representation (2.5), we get the claim. Proof of (i) In the regime |ζ| ≥ 1 + κ the bound (4.9) follows from the Stieltjes transform representation (2.5). Thus we consider |ζ| ≤ 1 + κ. We take the imaginary part on both sides of (2.2) and use the lower bound in (4.11) and ‖S‖ ≲ 1 to conclude.

In order to show the fast decay of off-diagonal entries of M, Theorem 2.5, we rely on the following general result on matrices with decaying off-diagonal entries.

Lemma 4.3 (Perturbed Combes-Thomas estimate)
Suppose the matrix R ∈ C^{N×N} satisfies an off-diagonal decay condition, up to a non-decaying component, with some positive sequence β = (β(ν))_{ν=0}^∞, and ‖R^{-1}‖ ≲ 1. Then there exists a sequence α = (α(ν))_{ν=0}^∞, depending only on β and P (cf. (2.11)), such that R^{-1} satisfies the analogous decay condition with α. This lemma is reminiscent of a standard Combes-Thomas estimate: an off-diagonal decay of the entries of a matrix R implies a similar decay for its inverse, R^{-1}, provided the smallest singular value is bounded away from zero. Indeed, in the case α(0) = β(0) = 0 the proof of this lemma directly follows from the standard strategy for establishing Combes-Thomas estimates, see e.g. Proposition 13.3.1 in [45]; we omit the details. We now explain how to extend this standard result to our case, where Lemma 4.3 allows for a non-decaying component. The detailed proof will be given in the "Appendix"; here we only present the basic idea.
Write R = S + T, where S has fast off-diagonal decay and T has entries of size |t_xy| ≲ N^{-1}. Note that T cannot simply be treated as a small perturbation, since its norm can be of order one, i.e. comparable with that of S. Instead, the proof relies on showing that S inherits the lower bound on its singular values from R, and then applying the standard α(0) = β(0) = 0 version of the Combes-Thomas estimate to S to generate the decaying component of R^{-1}. The point is that T can change only finitely many singular values by a significant amount, since ‖T‖_max ≲ N^{-1}. If these few singular values were close to zero, then they would necessarily be isolated, hence the corresponding singular vectors would be strongly localized. However, because of its small entries, T acts trivially on localized vectors, which implies that isolated singular values are essentially stable under adding or subtracting T. This argument excludes the creation of singular values close to zero by subtracting T from R. The details are found in the "Appendix". Putting all these ingredients together, we can now complete the proof of Theorem 2.5.
Proof of Theorem 2.5 Recall the model parameters π₁, π₂ from A2. We consider the MDE (2.2) entrywise and see that the entries of M^{-1} decay, where we used the assumptions (2.12) and (2.13), as well as ‖M‖_max ≤ ‖M‖. By (4.9) and ζ ∈ D_δ, we have ‖M‖ ≲ δ^{-1}. Furthermore, for large |ζ| we also have ‖M(ζ)‖ ≲ |ζ|^{-1}. We can now apply Lemma 4.3 with the choice R := ‖M‖ M^{-1} to see the existence of a positive sequence γ such that (2.15) holds. This finishes the proof of Theorem 2.5. □
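The mechanism behind Lemma 4.3 can be checked numerically in a toy example. All choices below are hypothetical illustrations (the decay profile e^{-|x−y|}, the shift 3·Id that keeps the smallest singular value of R away from zero): the inverse of R = S + T again consists of a rapidly decaying part plus an entrywise O(N^{-1}) background, even though ‖T‖ is of order one.

```python
import numpy as np

N = 200
rng = np.random.default_rng(0)
d = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))  # distance |x - y|

# S: exponentially decaying off-diagonal part; 3*Id keeps the smallest
# singular value of R bounded away from zero (toy choice)
S = 0.5 * np.exp(-d) * rng.standard_normal((N, N)) + 3.0 * np.eye(N)
# T: full "mean-field" background with entries of size 1/N; its norm is
# O(1), so it is not a small perturbation of S in norm
T = rng.standard_normal((N, N)) / N
R = S + T

Rinv = np.linalg.inv(R)
far = np.abs(Rinv[d > 20]).max()    # far off-diagonal entries of R^{-1}
near = np.abs(np.diag(Rinv)).min()  # diagonal entries for comparison
```

In this toy run the far off-diagonal entries of R^{-1} are of order N^{-1} while the diagonal stays of order one, matching the decomposition of R^{-1} into a decaying component and a small non-decaying background described above.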

Stability of the matrix Dyson equation
The goal of this section is to prove Proposition 2.2 and Theorem 2.6. The main technical result needed for these proofs is the linear stability of the MDE. For its statement we introduce, for any R ∈ C^{N×N}, the sandwiching operator C_R : C^{N×N} → C^{N×N}, C_R[T] := RTR. (4.16) Note that C_R^{-1} = C_{R^{-1}} and C_R^* = C_{R*} for any invertible R ∈ C^{N×N}, where C_R^* denotes the adjoint with respect to the scalar product (2.1).

Proposition 4.4 (Linear stability) Assume A1 and that ‖A‖ ≤ P₀ for some constant P₀ > 0 (cf. (2.8)). There exists a universal numerical constant C > 0 such that uniformly for all ζ ∈ H the bound (4.17) holds.
Before we prove a few technical results that prepare the proof of Proposition 4.4, we give a heuristic argument that explains how the operator Id − C_M S on the left hand side of (4.17) is connected to the stability of the MDE (2.2). Suppose that the MDE, written in the form (4.18) with perturbation matrix D, has a unique solution G(D), depending differentiably on D. Then by differentiating both sides of (4.18) with respect to D, setting D = 0 and using the MDE for M(ζ) = G(0), we find (4.19), where ∇_R denotes the directional derivative with respect to D in the direction R ∈ C^{N×N}. Rearranging the terms in (4.19) and multiplying by M = M(ζ) from the left yields (4.20). Thus G(D) has a bounded derivative at D = 0, i.e., the MDE is stable with respect to the perturbation D to linear order, whenever the operator Id − C_M S is invertible and its inverse is bounded. In order to extend the linear stability to the full stability of the MDE for non-infinitesimal perturbations, the linear stability bound (4.17) is fed as an input into a quantitative implicit function theorem [cf. (b) of Lemma 4.10 and (4.55) below]. The implicit function theorem then yields the existence of the analytic map D → G(D) appearing in Theorem 2.6.
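The stability heuristic can be verified directly in the scalar analogue (N = 1 with the toy self-energy S[g] = g, an illustrative assumption, not the general setup): differentiating g(d) = (−ζ − d − g(d))^{-1} implicitly at d = 0 gives ∇g = g²/(1 − g²), the one-dimensional counterpart of the formula (Id − C_M S)^{-1}[M R M], and this matches a finite-difference derivative of the numerically computed solution.

```python
import numpy as np

z = 1.0 + 1.0j  # spectral parameter with Im z > 0

def solve(d, iters=400):
    # scalar Dyson equation g = 1/(-z - d - g) with perturbation d,
    # solved by fixed-point iteration (a contraction for this z)
    g = -1 / z
    for _ in range(iters):
        g = 1 / (-z - d - g)
    return g

g = solve(0.0)
h = 1e-6
fd = (solve(h) - solve(-h)) / (2 * h)  # finite-difference derivative in d
lin = g**2 / (1 - g**2)                # scalar linear-stability prediction
```

The agreement of `fd` and `lin` illustrates why invertibility of Id − C_M S (here: 1 − g² ≠ 0) is exactly the condition for bounded linear response to the perturbation.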
The following definition will play a crucial role in the upcoming analysis.
The saturated self-energy operator F is defined in (4.21a), where we have introduced an auxiliary matrix W [cf. (4.21b)]. We call F the saturated self-energy operator, or the saturation of S for short.
The operator F inherits from the self-energy operator S the self-adjointness with respect to (2.1) and the property of mapping C_+ to itself. We now briefly discuss the reason for introducing F. In order to invert Id − C_M S in (4.20) we have to show that C_M S is dominated by Id in some sense. Neither S nor M can be directly related to the identity operator, but their specific combination C_M S can. We extract this delicate information from the MDE via a Perron-Frobenius argument. The key observation is that as Im ζ ↓ 0 the imaginary part of the MDE (4.6) becomes an eigenvalue equation for the operator R → M* S[R] M with eigenvalue 1 and corresponding eigenmatrix Im M. Since this operator is positivity preserving and Im M ∈ C_+, its spectral radius is 1. Naively speaking, through the replacement of M* by M, the operator C_M S gains an additional phase which reduces the spectral radius further and thus guarantees the invertibility of Id − C_M S. However, the non-selfadjointness of the aforementioned operators makes it hard to turn control of their spectral radii into norm estimates. It is therefore essential to find an appropriate symmetrization of these operators before Perron-Frobenius is applied. A similar problem appeared in a simpler, commutative setting in [2]. There, M = diag(m) was a diagonal matrix and the MDE became a vector equation. In this case the problem of inverting Id − C_M S reduces to inverting a matrix 1 − diag(m)² S, where S ∈ R^{N×N} is a matrix with non-negative entries that plays the role of the self-energy operator S in the current setup. The idea in [2] was to write the decomposition (4.22) with invertible diagonal matrices R and T, a diagonal unitary matrix U, and a self-adjoint matrix F with positive entries, playing the role of the operator F, that satisfies the bound ‖F‖ ≤ 1. It is then possible to see that U − F is invertible as long as U does not leave the Perron-Frobenius eigenvector of F invariant.
In this commutative setting it is possible to choose F = diag(|m|) S diag(|m|), where the absolute value is taken in each component. In our current setting we will achieve a decomposition similar to (4.22) on the level of operators acting on C^{N×N} [cf. (4.39) below]. The definition (4.21) ensures that the saturation F is self-adjoint, positivity preserving and satisfies ‖F‖_sp ≤ 1, as we will establish later.
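In the commutative setting the claim ‖F‖_sp ≤ 1 can be checked numerically. The variance profile below (a smooth symmetric S with unit row sums) is an illustrative assumption; the script solves the vector Dyson equation by damped fixed-point iteration and confirms that F = diag(|m|) S diag(|m|) has spectral radius strictly below 1 for Im ζ > 0.

```python
import numpy as np

N = 100
z = 0.2 + 0.5j  # spectral parameter in the upper half plane

# symmetric variance profile with row sums 1 (toy choice)
x = np.arange(N)
S = (1 + np.cos(2 * np.pi * np.subtract.outer(x, x) / N)) / N

# damped fixed-point iteration for the vector Dyson equation
#   m_x = 1 / (-z - (S m)_x)
m = np.full(N, -1 / z)
for _ in range(2000):
    m = 0.5 * m + 0.5 / (-z - S @ m)

residual = np.abs(m + 1 / (z + S @ m)).max()      # consistency check
F = np.abs(m)[:, None] * S * np.abs(m)[None, :]   # saturation of S
rho_F = np.abs(np.linalg.eigvals(F)).max()        # spectral radius of F
```

The strict inequality rho_F < 1 reflects the identity Im m/|m| = F[Im m/|m|] + (Im ζ)|m| componentwise: the positive second term pushes the Perron-Frobenius eigenvalue of F below 1 as long as Im ζ > 0.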
(i) The spectral radius of F is given by (4.24). The eigenmatrix F is controlled by the solution of the MDE via (4.25). Proof Since F preserves the cone C_+ of positive semidefinite matrices, a version of the Perron-Frobenius theorem for cone preserving operators implies that there exists a normalized F ∈ C_+ such that F[F] = ‖F‖_sp F. We show uniqueness of this eigenmatrix later in the proof. First we prove that (4.24) holds for any such F. Proof of (i) We define for any matrix R ∈ C^{N×N} the operator K_R [cf. (4.27)]. Note that for self-adjoint R ∈ C^{N×N} we have K_R = C_R [cf. (4.16)]. Using definition (4.27), the imaginary part of the MDE (4.6) can be written in the form (4.28). We now rewrite Eq. (4.28) in terms of Im M, F and W. In order to express M in terms of W, we introduce the unitary matrix U [cf. (4.29)]. Here, the matrices W and U commute. The identity (4.30) should be viewed as a balanced polar decomposition: instead of having unitary matrices U₁ or U₂ on the left or right of the decompositions M = U₁Q₁ or M = Q₂U₂, respectively, the unitary matrix U* appears in the middle of M = Q*U*Q with Q = W√(Im M). Using (4.30) we also find an expression (4.31) for K_M. Plugging (4.31) into (4.28) and applying the inverse of C_{√Im M} C_W K_{U*} on both sides yields (4.32), where we used the definition of F from (4.21) and K^{-1}_{U*}[W^{-2}] = W^{-2}, which holds because U and W commute. We project both sides of (4.32) onto the eigenmatrix F of F, using that F is self-adjoint with respect to the scalar product (2.1). Solving the resulting identity for ‖F‖_sp yields (4.24).
Proof of (ii) and (iii) Let ζ ∈ H with |ζ| ≤ 3(1 + κ) and ‖F(ζ)‖_sp ≥ 1/2. The bounds on the eigenmatrix (4.25) and on the spectral gap (4.26) are a consequence of the estimate (4.33), which we verify below. Given (4.33), the remaining assertions, (4.25) and (4.26), of Lemma 4.7 are consequences of the following general result, whose proof is given in the "Appendix". It generalises to a non-commutative setting the basic fact (cf. Lemma A.1) that symmetric matrices with strictly positive entries have a positive spectral gap.
for some positive constants γ and Γ. Then T has a spectral gap of size θ = θ(γ, Γ) > 0, i.e., (4.34) holds. Furthermore, the largest eigenvalue is non-degenerate and the corresponding normalized eigenmatrix is positive definite. Lemma 4.8 also shows the uniqueness of the eigenmatrix F. In the regime |ζ| ≥ 3(1+κ) the constants hidden in the comparison relation of (4.33) will depend on |ζ|, but otherwise the upcoming arguments are not affected. In particular, the qualitative property of having a unique eigenmatrix F remains true even for large values of |ζ|.
Proof of (4.33) The bounds in (4.33) are a consequence of Assumption A1 and the bounds (4.23) on W and (4.11) on Im M, respectively. Indeed, from A1 we have S[R] ∼ ⟨R⟩1 for positive semidefinite matrices R. By the definition (4.21a) of F this immediately yields the first bound. Since (4.23) and (4.11) imply ‖M‖^{-2} 1 ≲ C_W[Im M] ≲ ‖M‖³ 1, we conclude that (4.33) holds.

Proof of Proposition 4.4
To show (4.17) we consider the regimes of large and small values of |ζ| separately. We start with the simpler regime, |ζ| ≥ 3(1 + κ). In this case we apply the bound ‖M(ζ)‖ ≤ (|ζ| − κ)^{-1}, which is an immediate consequence of the Stieltjes transform representation (2.5) of M. In particular, we obtain ‖C_M S‖_sp ≤ 1/2, where we used ‖M‖²‖S‖ ≤ 1/2 in the last and second to last inequality. We also used that ‖T‖_sp ≤ ‖T‖ for any self-adjoint T ∈ C^{N×N}. The claim (4.17) hence follows in the regime of large |ζ|. Now we consider the regime |ζ| ≤ 3(1+κ). Here we will use the spectral properties of the saturated self-energy operator F, established in Lemma 4.7. First we rewrite Id − C_{M(ζ)} S in terms of F. For this purpose we recall the definition of U from (4.29). With the identity (4.30) we find (4.38). Combining (4.38) with the definition of F from (4.21a) we verify (4.39). The bounds (4.23) on W and (4.11) on Im M imply bounds on C_W and C_{√Im M}, respectively. In fact, in the regime of bounded |ζ|, we have (4.40). Therefore, taking the inverse and then the norm ‖·‖_sp on both sides of (4.39) and using (4.40), as well as ‖C_T‖_sp ≤ ‖C_T‖ for self-adjoint T ∈ C^{N×N}, yields (4.41). Note that C_U and C_{U*} are unitary operators on C^{N×N} and thus ‖C_U‖_sp = ‖C_{U*}‖_sp = 1.
We now estimate the norm of the inverse of C_U − F. In case ‖F‖_sp < 1/2 we simply use the bound ‖(C_U − F)^{-1}‖_sp ≤ 2 in (4.41) and (4.9) for estimating ‖M‖, thus verifying (4.17) in this case.
If ‖F‖_sp ≥ 1/2, we apply the following lemma, which was stated as Lemma 5.8 in [2]. Lemma 4.9 (Rotation-Inversion Lemma) Let T be a self-adjoint and U a unitary operator on C^{N×N}. Suppose that T has a spectral gap, i.e., there is a constant θ > 0 such that (4.34) holds, with a non-degenerate largest eigenvalue and ‖T‖_sp ≤ 1. Then there exists a universal positive constant C such that the inverse of C_U − T is bounded accordingly. With the lower bound (4.26) on the spectral gap of F, we find (4.42). Plugging (4.42) into (4.41) and using (4.9) to estimate ‖M‖ shows (4.17), provided the denominator on the right hand side of (4.42) satisfies the lower bound (4.43) for some universal constant C > 0.
In the remainder of this proof we verify (4.43). We establish lower bounds on both arguments of the maximum in (4.43) and combine them afterwards. We start with a lower bound on 1 − ‖F‖_sp. Estimating the numerator of the fraction on the right hand side of (4.24) from below and its denominator from above, ⟨F, W^{-2}⟩ ≲ ρ ‖M‖², by applying the bounds from (4.23) and (4.11), we see that (4.44) holds. Since ρ(ζ) is the harmonic extension of a probability density (namely the self-consistent density of states ρ), we have the trivial upper bound ρ(ζ) ≤ Im ζ / dist(ζ, supp ρ)². Continuing from (4.44) we find the lower bound (4.45), where we used (4.9) in the second inequality. By the definitions of U and W [cf. (4.29) and (4.21)] and because of (4.23) we have ⟨−Im U⟩ ≳ ‖M‖^{-2} ρ. Continuing from (4.46), using the normalization ‖F‖_hs = 1 and (4.9), we get the lower bound (4.47). Combining this with (4.45) shows (4.43) and thus finishes the proof of Proposition 4.4. □
Proof of Proposition 2.2 We show that the harmonic extension ρ(ζ) of the self-consistent density of states [cf. (2.10)] is uniformly Hölder continuous on the entire complex upper half plane. Thus its unique continuous extension to the real line, the self-consistent density of states, inherits this regularity. We differentiate both sides of the MDE with respect to ζ and find the equation (4.47). Inverting the operator Id − C_M S and taking the normalized Hilbert-Schmidt norm reveals a bound (4.48) on the derivative of the solution to the MDE. Since ζ → M(ζ) is an analytic function on H, we have the basic identity 2πi ∂_ζ ρ = 2i ∂_ζ ⟨Im M⟩ = ∂_ζ ⟨M⟩. Therefore, making use of (4.48), we get (4.49). For the last inequality in (4.49) we employed the bound (4.9) and the linear stability, Proposition 4.4. The universal constant C stems from its statement (4.17). From (4.49) we read off that the harmonic extension ρ of the self-consistent density of states is 1/(C+3)-Hölder continuous. It remains to prove that ρ is real analytic at any τ₀ with ρ(τ₀) > 0. Since ρ is continuous, it is bounded away from zero in a neighborhood of τ₀. Using (4.47), (4.9) and (4.17) we conclude that M is uniformly continuous on the intersection of a small neighborhood of τ₀ in C with the complex upper half plane. In particular, M has a unique continuous extension M(τ₀) to τ₀. Furthermore, by differentiating (2.2) with respect to ζ and by the uniqueness of the solution to (2.2) with positive imaginary part, one verifies that M coincides with the solution Q to the corresponding holomorphic initial value problem, i.e. M(τ₀ + ω) = Q(ω) for any ω ∈ H with sufficiently small absolute value. Since the solution Q is analytic in a small neighborhood of zero, we conclude that M can be holomorphically extended to a neighborhood of τ₀ in C. By continuity, (2.10) remains true for ζ ∈ R close to τ₀ and thus ρ is real analytic there. □
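The regularity just proved is visible in the simplest example. For the semicircle law (the flat self-energy special case, used here purely as an illustrative assumption) the harmonic extension ρ(ζ) = ⟨Im M(ζ)⟩/π converges, as Im ζ ↓ 0, to an explicit density that is indeed real analytic in the bulk:

```python
import numpy as np

def m_sc(z):
    # Stieltjes transform of the semicircle law; the principal branch of
    # np.sqrt yields Im m > 0 for z in the upper half plane
    return (-z + np.sqrt(z * z - 4)) / 2

tau, eta = 0.3, 1e-6
rho_eta = m_sc(tau + 1j * eta).imag / np.pi    # harmonic extension at height eta
rho_exact = np.sqrt(4 - tau**2) / (2 * np.pi)  # boundary density in the bulk
```

Already at height η = 10^{-6} the harmonic extension agrees with the boundary density to high precision, reflecting the continuity down to the real line established above.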
In the proof of Theorem 2.6 we will often consider a linear operator T on C^{N×N} equipped with two different norms. We indicate the norms in the notation of the corresponding induced operator norm ‖T‖_{A→B}. We will use A, B = hs, ‖·‖, 1, ∞, max, etc. We still keep our convention that ‖T‖_sp = ‖T‖_{hs→hs} and ‖T‖ = ‖T‖_{‖·‖→‖·‖}. Furthermore, we introduce the norms ‖·‖_{1∨∞} and ‖·‖_max. Some of the norms on matrices R ∈ C^{N×N} are ordered, e.g. max{‖R‖_max, ‖R‖_hs} ≤ ‖R‖ ≤ ‖R‖_{1∨∞}. Note that if ‖·‖_{A'} ≤ ‖·‖_A and ‖·‖_B ≤ ‖·‖_{B'}, then ‖·‖_{A→B} ≤ ‖·‖_{A'→B'}. In particular, for T : C^{N×N} → C^{N×N} we have e.g. ‖T‖_{max→hs} ≤ ‖T‖_{max→‖·‖} ≤ ‖T‖_{max→1∨∞}. In order to show the existence and properties of the map D → G(D) from Theorem 2.6 we rely on an implicit function theorem, which we state here for reference purposes. Lemma 4.10 (Quantitative implicit function theorem) Let T : C_A × C_D → C_A be a continuously differentiable function with invertible derivative ∇^(1)T(0, 0) at the origin with respect to the first argument and T(0, 0) = 0. Suppose C_A and C_D are equipped with norms that we both denote by ‖·‖, and let the linear operators on these spaces be equipped with the corresponding induced operator norms. Let δ > 0 and C₁, C₂ < ∞ be constants such that the bounds (a) and (b) hold. Here B^#_δ is the δ-ball around 0 with respect to ‖·‖ in C_#, and ∇^(2) denotes the derivative with respect to the second variable.
Then there exists a constant ε > 0, depending only on δ, C₁ and C₂, and a unique continuously differentiable function f : B^D_ε → B^A_δ such that T(f(D), D) = 0 for all D ∈ B^D_ε.
Proof of Theorem 2.6 We compute the directional derivative of J with respect to G in the direction R ∈ C^{N×N}. We consider C^{N×N} ≅ C^{N²} with the entrywise maximum norm ‖·‖_max and use the shorthand notation ‖T‖_max := ‖T‖_{max→max} for the induced operator norm of any linear T : C^{N×N} → C^{N×N}. To apply Lemma 4.10 in this setup we need the following two estimates: (i) the operator norm of (Id − C_M S)^{-1} on (C^{N×N}, ‖·‖_max) is controlled by its spectral norm [cf. (4.53)]; (ii) the corresponding derivative bound (4.54). We will prove these estimates after we have used them to show that the hypotheses of the quantitative implicit function theorem hold. Let us first bound the relevant operator [cf. (4.57)]. Using (4.54) and (4.53), in conjunction with (4.9) and (4.17), we see that the hypotheses of Lemma 4.10 are satisfied provided the relevant perturbations are sufficiently small. The first part of Theorem 2.6, the existence and uniqueness of the analytic function G, now follows from the implicit function theorem, Lemma 4.10. In particular, (2.17) follows from the analyticity.
Proof of (i) First we remark that (2.13), for large enough ν and together with (2.11), implies ‖S‖_{max→1∨∞} ≲ 1. (4.59) We expand the geometric series corresponding to the operator (Id − C_M S)^{-1} to second order [cf. (4.60)]. We consider each of the three terms on the right hand side separately and estimate their norms as operators from C^{N×N} with the entrywise maximum norm to itself. The easiest is ‖Id‖_max = 1. For the second term we use the estimate (4.61). For the third term on the right hand side of (4.60) we apply (4.62). The last factor on the right hand side of (4.62) is bounded by (4.63). For the first factor we use ‖C_M S‖_{hs→max} ≤ ‖C_M S‖_{hs→‖·‖}. We plug this and (4.63) into (4.62). Then we use the resulting inequality in combination with (4.61) in (4.60) and find the claimed bound, where we also used ‖S‖_{max→‖·‖} ≤ ‖S‖_{max→1∨∞} and (4.59). Since ‖C_M‖ ≤ ‖M‖², the claim (4.53) follows.
Proof of (ii) Recall the definition of W_R in (4.51). We estimate (4.64) for all R ∈ C^{N×N}. Here, the linear operator Z is given by (4.65). We will estimate the entries of the three summands separately. We show that Z[R] has faster than power law decay, uniformly for all R ∈ C^{N×N} with ‖R‖_max ≤ 1, with a decay sequence γ that depends only on δ and P. We begin with a few easy observations: for two matrices R, T ∈ C^{N×N} that have faster than power law decay, ‖R‖_{γ_R} ≤ 1 and ‖T‖_{γ_T} ≤ 1, their sum and product have faster than power law decay as well, i.e., ‖R + T‖_{γ_{R+T}} ≤ 1 and ‖RT‖_{γ_{RT}} ≤ 1. Here, γ_{R+T} and γ_{RT} depend only on γ_R, γ_T and P (cf. (2.11)). Furthermore, we see that by (2.13) the matrix S[R] has faster than power law decay for any R ∈ C^{N×N} with ‖R‖_max ≤ 1.
By the following argument we estimate the first summand on the right hand side of (4.65). Using (2.13), ‖MR‖_max ≤ ‖M‖_{1∨∞}‖R‖_max and the estimate (4.56), the matrix S[MR] has faster than power law decay. Since C_M multiplies by M on both sides (cf. (4.16)) and M has faster than power law decay (cf. Theorem 2.5), we conclude that so does C_M S[MR]. Now we turn to the second summand on the right hand side of (4.65). Since C_M S[MR] has faster than power law decay, its entries are bounded. Using (2.13) again as above, we see that C_M S applied to C_M S[MR] has faster than power law decay as well.
Finally, we estimate the third summand from (4.65). Since the matrix C_M S[MR] has faster than power law decay, its ‖·‖_hs-norm is bounded. By the linear stability (4.17) and ζ ∈ D_δ, we conclude that (Id − C_M S)^{-1}[C_M S[MR]] has bounded ‖·‖_hs-norm, hence bounded entries by (4.59) and (4.56). Therefore, the third term on the right hand side of (4.65) is an application of C_M S to a matrix with bounded entries, which results in a matrix with faster than power law decay.
For the remaining terms in (4.64) we can follow the same line of reasoning as for the entries of Z[R]. This finishes the proof of (2.19) and with it the proof of Theorem 2.6. □

Estimating the error term
In this section we prove the key estimates, stated precisely in Lemmas 3.4 and 5.1, for the error matrix D that appears as the perturbation in the Eq. (3.1) for the resolvent G. We start by estimating D(ζ) in terms of the auxiliary quantity from (3.9) when ζ is away from the convex hull of supp ρ. To this end, we recall the two endpoints κ_− and κ_+ of this convex hull (cf. Proposition 3.5). For an index set B ⊆ X we write G^B for the resolvent with the rows and columns in B removed [cf. (5.4)]. With this definition, we have a resolvent expansion formula; in particular, for any y ∈ B the rows of G outside B have the expansion (5.5). Here we introduced, for any two index sets A, B ⊆ X, a shorthand notation for restricted sums. In case A = X we simply write the excluded set B over the summation sign, i.e., the superscript over the summation means exclusion of these indices from the sum. Recall that H is written as a sum of its expectation matrix A and its fluctuation W. We use the expansion formula (5.5) on the resolvent elements in (WG)_xy = Σ_u w_xu G_uy and find that the entries of D can be written in the form (5.6). Note that the set B with y ∈ B here is arbitrary, e.g., it may depend on x and y. In fact, we will choose it to be a neighborhood of {x, y} momentarily. Let A ⊆ B be another index set. We split the sum over z ∈ B in the second term on the right hand side of (5.6) into a sum over z ∈ A and z ∈ B\A, and arrive at the decomposition (5.7) of D into error matrices D^(k) for some ε₁ > 0. Note that although D itself does not depend on the choice of ε₁, its decomposition into D^(k) does. We will estimate each error matrix D^(k) separately, where the estimates may still depend on ε₁. Since ε₁ > 0 is arbitrarily small, it is eliminated from the final bounds on D using the following property of the stochastic domination (Definition 3.3): if some positive random variables X, Y satisfy X ≺ N^ε Y for every ε > 0, then X ≺ Y.
The following lemma provides entrywise estimates on the individual error matrices.
Proof We show the estimates (5.9a)–(5.9e) one by one. The bound (5.9a) is trivial, since by the bounded moment assumption (2.23) the entries of W satisfy |w_xy| ≺ 1. For the proof of (5.9b) we first use Cauchy-Schwarz in the v-summation of (5.7b) and then the Ward-identity (5.10), where we also used the Schur complement formula in the form of the general resolvent expansion identity (5.11). To the u-summation we apply the large deviation estimate (5.12). The assumption (A.32) of Lemma A.2 is an immediate consequence of the decay of correlations (2.25). In order to verify (A.33) we use both (2.25) and the N-dependent smoothness, where we used (5.10) again. Finally, we turn to the proof of (5.9e). Let B_k be as in the statement of Lemma 5.3. We set up a telescopic sum to write d^(5)_xy as in (5.14). We estimate the rightmost term in (5.14) directly; the sum over u and v is bounded by a constant because of the decay of covariances (3.12), and we used (5.10) in the second inequality. Thus, (5.9e) follows from (5.14) and the bound (5.15). To show (5.15) we start from (5.16), where we used the general resolvent identity. The last two terms on the right hand side of (5.16) are estimated by the second term on the right hand side of (5.15), using first Cauchy-Schwarz, then the decay of covariances (3.12), and finally the Ward-identity (5.10). For the first term in (5.16) we use the same argument as in (3.14) to see that (3.12) implies (5.17) for any two vectors r, t ∈ C^N. We obtain (5.15) by applying (5.17) with the choice r_u := G^{B_k}_{u x_k}, t_v := G^{B_k}_{x_k v} and using the Ward-identity afterwards. In this way (5.9e) follows and Lemma 5.3 is proven. □
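The Ward-identity (5.10) used repeatedly above is an exact algebraic fact about resolvents and can be sanity-checked numerically. The Wigner-type matrix below is only a toy input; the identity itself holds for any self-adjoint H and any spectral parameter with positive imaginary part.

```python
import numpy as np

N = 100
rng = np.random.default_rng(1)
B = rng.standard_normal((N, N))
H = (B + B.T) / np.sqrt(2 * N)      # toy real symmetric random matrix
z = 0.4 + 0.2j
G = np.linalg.inv(H - z * np.eye(N))

# Ward identity: sum_u |G_xu|^2 = Im G_xx / Im z,
# a consequence of G G^* = (G - G^*) / (z - z^bar)
x = 0
lhs = np.sum(np.abs(G[x]) ** 2)
rhs = G[x, x].imag / z.imag
```

The identity converts a sum of N squared resolvent entries into a single diagonal entry divided by Im ζ, which is exactly the gain exploited in the Cauchy-Schwarz steps of the proof above.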
The following definition is motivated by the formula that expresses the matrix elements of G^B in terms of the matrix elements of G. For R ∈ C^{N×N} and A, B ⊆ X we denote by R_{AB} := (r_xy)_{x∈A, y∈B} its submatrix. In case A = B we write R_{AB} = R_A for short. Then we have the identity (5.18).

Definition 5.4 For an index set B ⊆ X we define M^B := ((M^{-1})_{X\B})^{-1} [cf. (5.19)].
We will make use of the following general fact: if R ∈ C^{N×N} satisfies ‖R‖ ≲_δ 1 as well as (5.24), then any submatrix R_A of R satisfies (5.25). We verify (5.24) for R = M in two separate regimes and thus show (5.23). First let ζ be such that ρ(ζ) ≥ δ/2. Then the lower bound on the imaginary part in (5.24) follows from (4.11) and (4.9). Now let ζ be such that δ/2 ≤ dist(ζ, [κ_−, κ_+]) ≤ δ^{-1}. Then we may also assume that dist(Re ζ, [κ_−, κ_+]) ≥ δ/4, because otherwise Im ζ ≥ δ/4 and thus ρ(ζ) ≳_δ 1, in which case the claim follows from the case that we already considered, namely ρ(ζ) ≥ δ/2, because there δ was arbitrary. Since M is the Stieltjes transform of a C_+-valued measure with support in [κ_−, κ_+] (cf. (2.5)), its real part is positive definite to the left of κ_− and negative definite to the right of κ_+. In both cases we also have the effective bound |Re M| ≳_δ 1 because dist(Re ζ, [κ_−, κ_+]) ≥ δ/4. Now we apply (5.23) to see (5.20). By (2.24) and (2.13), the right hand side of (2.20), and with it M^{-1}, has faster than power law decay. The same is true for its submatrix with indices in X\B. Thus (5.20) follows directly from the definition (5.19) of M^B, the upper bound on its singular values from (5.23) and the Combes-Thomas estimate in Lemma 4.3.
To prove (5.21) we use (5.23) for the first comparison relation and −Im M^{-1} ∼_δ ρ 1 (cf. (4.11) and (4.10)) for the second. The bound on Im m^B_xx in (5.21) follows, and the bound on |m^B_xx| follows at least in the regime ρ(ζ) ≥ δ/2. We are left with showing |m^B_xx| ≳_δ 1 in the case δ/2 ≤ dist(ζ, [κ_−, κ_+]) ≤ δ^{-1}. As we did above, we may assume that dist(Re ζ, [κ_−, κ_+]) ≥ δ/4. We use the identity (5.26) for x, y ∉ B and T^B := ((T^{-1})_{X\B})^{-1}, provided all inverses exist. We will use this identity for T = M, G. Note that this definition of T^B with T = G is consistent with the definition (5.4) on the index set X\B because of (5.18). Recalling that G_{B∪{x,y}} = (g_uv)_{u,v∈B∪{x,y}} and M_{B∪{x,y}} are matrices of dimension |B| + 2, we obtain the claim, where we used in the last step that ‖(M_{B∪{x,y}})^{-1}‖ ∼_δ 1, which follows from using (5.24) and (5.25) for the choice R = M in the regimes ρ ≳_δ 1 and dist(Re ζ, [κ_−, κ_+]) ≳_δ 1, respectively.
Again using the definite signs of the imaginary and real parts of M, as well as that of (M_{B∪{x,y}})^{-1}, in these two regimes, we infer the corresponding bound as well. We conclude that there is a constant c, depending only on δ and K, such that the claimed lower bound holds. With the identity (5.26) the claim (5.22) follows and Lemma 5.5 is proven. □

Proof of Lemmas 3.4 and 5.1
We begin with the proof of (3.5). We continue the estimates on the error matrices listed in Lemma 5.3. To this end, we fix ζ = τ + iη with |τ| ≤ C and η ∈ [1, C]. Since Im ζ ≥ 1, we have the trivial resolvent bound (5.27) and a lower bound on the diagonal elements. Indeed, to get the lower bound we apply the Schur complement formula to the (x, x)-element of the resolvent G^B = (H^B − ζ1)^{-1}. We take the absolute value on both sides and estimate trivially, using the first bound of (5.27) to control the norm of the resolvent. Here we also used that by assumption (2.24) the expectation matrix satisfies the corresponding decay bound for any ν ∈ N, to obtain the second summand on the right hand side of (5.29) from estimating |d^(2)_xy|. Since ε₁ > 0 was arbitrary, (5.29) implies (3.5). Now we prove (3.6) and (5.2) in tandem. Let δ > 0 and ζ ∈ H be such that δ ≤ dist(ζ, [κ_−, κ_+]) + ρ(ζ) ≤ δ^{-1} and Im ζ ≥ N^{-1+ε}. We show the bound (5.31). From (5.31) the bound (3.6) follows immediately in the regime where ρ ≥ δ. Also (5.2) follows from (5.31). Indeed, in the regime of spectral parameters ζ ∈ H with δ ≤ dist(ζ, [κ_−, κ_+]) ≤ δ^{-1} we have ρ ∼_δ Im ζ, because ρ is the harmonic extension of a probability density supported inside [κ_−, κ_+].

Fluctuation averaging
In this section we prove Proposition 3.5, by which an error bound for the entrywise local law can be used to improve the bound on the error matrix D, once D is averaged against a non-random matrix R with faster than power law decay.
Proof of Proposition 3.5 Let R ∈ C^{N×N} with ‖R‖_β ≤ 1 for some positive sequence β. Within this proof we use Convention 4.1, so that ϕ ≲ ψ means ϕ ≤ Cψ for a constant C depending only on P := (K, δ, ε₁, β, C), where C and δ are the constants from the statement of the proposition, K are the model parameters (cf. (2.27)) and ε₁ enters in the splitting of the error matrix D into D^(k) (cf. (5.8)). Note that since ε₁ is arbitrary, it suffices to show (3.8) up to factors of N^{ε₁}. We will also use the notation O_≺( · ) for a random variable that is stochastically dominated by its nonnegative argument.
We split the expression ⟨R, D⟩ from (3.8) according to the definition (5.7) of the matrices D^(k). Then we estimate ⟨R, D^(k)⟩ separately for every k. We do this in three steps. First we estimate ‖D^(k)‖_max for k = 2, 3, 5 directly, without using the averaging effect of R. Afterwards we show the bounds on ⟨R, D^(1)⟩ and ⟨R, D^(4)⟩, respectively. In the upcoming arguments the entrywise local law (6.1) will be useful. Estimating ‖D^(k)‖_max: Here we show that under the assumption (3.7) the error matrices with indices k = 2, 3, 5 satisfy the improved entrywise bound (6.2), where ε₁ stems from (5.8) and P is the model parameter from (2.11). We start by estimating the entries of D^(2). Directly from its definition in (5.7b) we infer a bound on |d^(2)_xy|. The entries of the resolvents G and G^B are bounded by (6.1) and (5.20). The decay (2.12) of the entries of the bare matrix and the fact that d(v, z) ≥ N^{ε₁} in the last sum then imply ‖D^(2)‖_max ≺ N^{−ν} for any ν ∈ N.
To show (6.2) for k = 3 we use the representation (5.11) and the large deviation estimate (5.12), just as we did in the proof of Lemma 5.3; this yields (6.3). The faster than power law decay of M from (2.15), together with the definition of A_xy in (5.8), implies max_{z ∉ A_xy} |m_zy| ≤ C(ν) N^{−ε_1 ν} for any ν ∈ N. Since the control parameter is at least N^{−1/2}, we infer (6.2) for k = 3 from (6.3).
Finally we consider the case k = 5. We follow the proof of Lemma 5.3 and use the representation (5.14). We estimate the two summands on the right hand side of (5.14), starting with the second term, which we rewrite in a more convenient form. To bound the first term on the right hand side of (5.14) we use (5.16). Each of the three terms on the right hand side of (5.16) has to be bounded by N^{2ε_1 P} times the square of the control parameter. The second and third terms are bounded by 1/N by the decay of covariances (3.12). For the first term we use (5.17), (6.1) and (5.20).
Estimating ⟨R, D^{(1)}⟩: Here we will show the bound (6.4), namely that |⟨R, D^{(1)}⟩| is stochastically dominated by N^{CPε_1} times the square of the control parameter, for some numerical constant C > 0. We split the error matrix D^{(1)} into two pieces, D^{(1a)} and D^{(1b)}, where B_xy is a 2N^{ε_1}-environment of the set {x, y} (cf. (5.8)). The second part is trivially bounded, ‖D^{(1b)}‖_max ≺ N^{ε_1 P} times the square of the control parameter, using the local law (6.1) with B = ∅. For the bound on ⟨R, D^{(1a)}⟩ we write ⟨R, D^{(1a)}⟩ = X + Y + Z, (6.5) where the terms on the right hand side are sums of σ_{xuy} := N^{−3/2} r_{xy} w_{xu} m_{uy} over disjoint domains expressed in terms of the following metric balls. The fast decay of the off-diagonal entries of R and M, namely |r_{xy}| + |m_{xy}| ≲ 1/N for d(x, y) ≥ N^{ε_1} (cf. (2.15)), immediately yields |X| ≺_μ N^{2Pε_1 μ} N^{−3μ}. This suffices for (6.4). The off-diagonal decay also yields moment bounds in which the sums are over index tuples x = (x_1, ..., x_{2μ}) ∈ X^{2μ} and B^k_x := B^k_{N^{ε_1}}(x) is the ball around x with respect to the product metric. In (6.6b) we have used the triangle inequality to conclude that d(u, x) ≤ 3N^{ε_1}. For Y and Z we continue the estimates in (6.6) by using the decay of correlations (2.25) and the ensuing lumping of index pairs (x_i, u_i); here the symmetrized distance on X², induced by d, is obtained by minimizing over the two pairings (y_1, y_2) and (y_2, y_1). Inserting (6.7) into the moment bounds on Y and Z effectively reduces the combinatorics of the sum in (6.6a) from N^{4μ} to N^{2μ}, and in (6.6b) from N^{2μ} to N^μ. We conclude that |⟨R, D^{(1a)}⟩| ≺ N^{CPε_1} N^{−1}. Together with the fact that the control parameter is at least N^{−1/2}, and our earlier estimate for ⟨R, D^{(1b)}⟩, this yields (6.4).

Estimating ⟨R, D^{(4)}⟩: Similarly to the strategy for estimating |⟨R, D^{(1)}⟩| we write the corresponding splitting. Here we also used (6.1) and (5.20) for the second stochastic domination bound. Combining (6.9) with (6.1) we obtain the corresponding entrywise bound. The rest of the proof of Proposition 3.5 is dedicated to showing the high moment bound (6.11). Similarly to (6.6), the fast decay of the off-diagonal entries of both R and M, and the a priori bound (6.9), immediately yield |X| ≺_μ N^{4ε_1 Pμ} N^{−2μ} times the 2μ-th power of the control parameter.
Since this is already sufficient for (6.11), we focus on the terms Y and Z in (6.12). Using again the decay of off-diagonal entries yields the bounds (6.12a) and (6.12b). We call the subscripts i of the indices x_i and z_i labels. In order to further estimate the moments of Y and Z we introduce the set L(x, z) of lone labels of (x, z), defined in (6.13). The corresponding index pair (x_i, z_i), for i ∈ L(x, z), is called a lone index pair. We partition the sums in (6.12a) and (6.12b) according to the number of lone labels, i.e. we insert the partition of unity 1 = Σ_{ℓ=0}^{2μ} 1(|L(x, z)| = ℓ). A simple counting argument reveals that fixing the number of lone labels reduces the combinatorics of the sums in (6.12a) and (6.12b); more precisely, this yields (6.14). The expectation in (6.12a) and (6.12b) is bounded using the following technical result.

Lemma 6.1 (Key estimate for averaged local law) Assume the hypotheses of Proposition 3.5.
Using (6.14) and Lemma 6.1 on the right hand sides of (6.12a) and (6.12b), after partitioning according to the number of lone labels, yields (6.16). Since the control parameter is at least N^{−1/2}, the high moment bounds in (6.16), together with the simple estimate for X, imply (6.11). This finishes the proof of Proposition 3.5 up to verifying Lemma 6.1, which will occupy the rest of the section.
Proof of Lemma 6.1 Let us consider the data ξ := (x, y, (Q_i)_{i=1}^{2μ}) fixed. We start by writing the product on the left hand side of (6.15) in the form (6.17), where the two auxiliary functions (the first indexed by ξ, the second w_{x,y}), both mapping X^{2μ} × X^{2μ} → C, are defined in (6.18) and (6.19). In order to estimate (6.17) we partition the sum over the indices u_i and v_i depending on their distance from the set of lone index pairs. To this end we introduce the partition {B_i : i ∈ {0} ∪ L} of X, defined in (6.20), and a shorthand whose components specify whether u_i and v_i are close to a lone index pair or not; e.g. σ_i determines which lone index u_i is close to, if any. For any fixed ξ, as σ runs through all possible elements of ({0} ∪ L)^{4μ}, the sets B(ξ, σ) form a partition of the summation set on the right hand side of (6.17) (taking into account the restriction u_i, v_i ∉ Q_i). Therefore it will be sufficient to estimate each such restricted sum separately. We also introduce the set L̄ = L̄(x, y, σ) of isolated labels in (6.23), chosen so that if an external index has an isolated label as subscript, then it is isolated from all the other indices.
Notice that isolated labels indicate not only separation from all other external indices, as lone labels do, but also from all internal indices. Given a resolvent entry G^B_{uv} we will refer to u, v as lower indices and to the set B as an upper index set.
The next lemma, whose proof we postpone until the end of this section, yields an algebraic representation for (6.22), provided the internal indices are properly restricted: there exist (signed) monomials, indexed by (ξ, σ, α) and mapping B(ξ, σ) → C, such that for each α the monomial is of the form (6.26). Here the notations (−1)^# and (·)^# indicate possible signs and complex conjugations, respectively, that may depend only on (ξ, σ, α) and that will be irrelevant for our estimates. The dependence on (ξ, σ, α) has been suppressed in the notation, e.g. n = n(ξ, σ, α), U_r = U_r(ξ, σ, α), etc. The numbers n and q of factors in (6.26) are bounded, n + q ≲_μ 1. Furthermore, for any fixed α the two subsets R^{(k)}, k = 1, 2, form a partition of {1, ..., 2μ}, and the monomials (6.26) have the following three properties: 1. The lower indices a_t, b_t, u_t, v_t, w_t lie in ∪_{i∈L̄} B_i, and d(a_t, b_t) ≤ N^{ε_1}. 2. The upper index sets E_r, F_r, U_r, V_r are bounded in size by N^{C(μ)ε_1}, and B_r ⊆ U_r, V_r. The total number of these sets appearing in the expansion (6.24) is bounded by N^{C(μ)ε_1}.

At least one of the following two statements is always true:
Since Lemma 6.1 relies heavily on this representation, we make a few remarks: (i) Monomials with different values of α may be equal. The indices a_t, b_t, u_t, v_t, w_t may overlap, but they are always distinct from the internal indices, as is seen from (6.21) and (6.23). (ii) The reciprocals of the resolvent entries are not important for our analysis because the diagonal resolvent entries are comparable to 1 in absolute value when a local law holds (cf. (5.21)). (iii) Property 3 asserts that each monomial is either a deterministic function of H^{(B_i)} for some isolated label i, and consequently almost independent of the rows/columns of H labeled by x_i, y_i [Case (I)], or the monomial contains at least |L̄| additional off-diagonal resolvent factors [Case (II)]. In the second case, each of these extra factors provides an additional smallness factor for typical internal indices, due to the faster than power law decay of M and the local law (6.1). Atypical internal indices, e.g. when u_r and v_r are close to each other, do not give such a factor, since m_{u_r v_r} is not small, but there are far fewer atypical indices than typical ones, and this entropy factor makes up for the lack of smallness. These arguments will be made rigorous in Lemma 6.3 below. By using the monomial sum representation (6.24) in (6.22), and estimating each summand separately, we obtain (6.27), where the factor N^{C(μ)ε_1} originates from (6.25), and we have bounded the summation over σ by a μ-dependent constant. Thus (6.15) holds if we show (6.28), uniformly in α. In order to prove this bound, we fix α and sum over the internal indices to get (6.29), where we have used the formula (6.26) for the monomial. The sums over the internal indices have been absorbed into the factors indexed by r ∈ R^{(1)} and r ∈ R^{(2)} in (6.29).
The right hand side of (6.29) will be bounded using the following three estimates which follow by combining the monomial representation with our previous stochastic estimates.

Lemma 6.3 (Three sources of smallness)
Consider an arbitrary monomial of the form (6.26), indexed by (ξ, σ, α). Then, under the hypotheses of Proposition 3.5, the following three estimates hold: 1. The resolvent entries with no internal lower indices are small, while the reciprocals of the resolvent entries are bounded, in the sense of (6.31). 2. If the monomial satisfies (I) of Property 3 of Lemma 6.2, then its contribution is very small in the sense of (6.32).

3. Sums over the internal indices around external indices with lone labels yield extra smallness:
where |σ_r|_* := |{0, σ_r, σ'_r}| − 1 counts how many, if any, of the two indices u_r and v_r are restricted to the vicinity of distinct external indices.
We postpone the proof of Lemma 6.3 and first show how it is used to finish the proof of Lemma 6.1. The bound (6.28) follows by combining Lemmas 6.2 and 6.3 to estimate the right hand side of (6.29). If (I) of Property 3 of Lemma 6.2 holds, then applying (6.32) and (6.31) in (6.29) yields (6.28). On the other hand, if (I) of Property 3 of Lemma 6.2 is not true, then we use (6.31) and (6.33) to get (6.34). By Part 3 of Lemma 6.2 we know that (II) holds. Thus the power of the control parameter on the right hand side of (6.34) is at least 2μ + |L̄|. On the other hand, from (6.23) we see the corresponding relation between |L| and |L̄|. Hence the power of N^{−1/2} on the right hand side of (6.34) is at least |L| − |L̄|. Using these bounds, together with the fact that the control parameter is at least N^{−1/2}, in (6.34), and then taking expectations, yields (6.28). Plugging (6.28) into (6.27) completes the proof of (6.15).
By the bound on the sizes of E_t, F_r in Property 2 of Lemma 6.2, the estimate (6.35) is applicable for these upper index sets. Then (6.31) follows from the second bound of Property 1 of Lemma 6.2 and the decay of the entries of M^E from (5.20).
In order to prove Part 2, let i ∈ L̄ be the label from (I) of Property 3 of Lemma 6.2. We split the expectation accordingly; the first term on the right hand side vanishes because the w_{xy}(u, v) are centred random variables by (6.19), and the covariance is smaller than any inverse power of N. Let us first consider the factors of the first kind, r ∈ R^{(1)}. If σ_r = s and σ'_r = t, then we need to estimate the sum (6.36) of w_{x_r y_r}(u, v) G^{U_r}_{uv} over u ∈ B_s∖Q_r and v ∈ B_t∖Q_r, where s, t ∈ L∖L̄, and Q_r ⊆ U_r ⊆ Q_r ∪ B_{L̄}.
Since B_s∖Q_r, B_t∖Q_r ⊆ X∖U_r, the indices u, v do not overlap with the upper index set U_r. Hence, in the case k = 1 and s = t = 0, the estimate (6.33) follows from (A.48) of Lemma A.3.
If s, t ∈ L, then taking the modulus of (6.36) and using (6.35) yields (6.33), where d(A, B) := inf{d(a, b) : a ∈ A, b ∈ B} for any subsets A and B of X. Here we have also used the definition (6.13) of lone labels and the fact that the control parameter is at least N^{−1/2}.
Suppose now that exactly one component of σ_r equals 0 and one is in L. In this case, we split w_{x_r y_r}(u, v) in (6.36) into two parts, corresponding to w_{x_r u} w_{v y_r} and its expectation, and estimate the corresponding sums separately. First, using (A.34) of Lemma A.2 yields the bound on the centred part. On the other hand, using (6.35) we estimate the expectation part. Similarly to the part (6.38), because of (3.12), we can estimate (6.39) by O_≺(N^{Cε_1} N^{−1}). Since the control parameter is at least N^{−1/2}, this finishes the proof of (6.33) in the case k = 1. Now we prove (6.33) for the factors of the second kind, r ∈ R^{(2)}. In this case we need to bound (6.40), where s = σ_r and t = σ'_r again take values in {0} ∪ L∖L̄. Here u_r ∈ B_{L̄}∖U_r, v_r ∈ B_{L̄}∖V_r, and Q_r ⊆ U_r, V_r ⊆ Q_r ∪ B_{L̄}. By the definitions of the lone and isolated labels, (6.13) and (6.23) respectively, we know that if s ∈ L∖L̄ then d(u, u_r) ≥ N^{ε_1}, and similarly, if t ∈ L∖L̄ then d(v_r, v) ≥ N^{ε_1}. Thus, if s, t ∈ L∖L̄, then estimating similarly as in (6.37), with (6.35), yields the desired bound. In the remaining cases, we split (6.40) into two parts, corresponding to the term w_{x_r u} w_{v y_r} and its expectation in the definition (6.19) of w_{x_r y_r}(u, v), and estimate these two parts separately.
The expectation part is bounded similarly as in (6.39): if s ∈ L∖L̄ and t = 0, then we obtain the corresponding estimate. Here d(u, u_r) ≥ N^{ε_1}, since u ∈ B_s with s ∈ L∖L̄, while u_r ∈ B_{L̄}. Taking ν > Cε_1^{−1} and using (3.12) to bound the sum over the covariances by a constant, we thus see that the right hand side is O_≺(N^{Cε_1} N^{−1}). Since the control parameter is at least N^{−1/2}, this matches (6.33), as |σ_r|_* = |{0, s, t}| − 1 = 1. We are left to bound terms of the form (6.40) in which w_{x_r y_r}(u, v) is replaced by N^{−1} w_{x_r u} w_{v y_r}, and either s = 0 or t = 0. In these cases the sums over u and v factorize. When the sum is over a small set, i.e. over B_s for some s ∈ L∖L̄, then we estimate the sizes of the entries of W and G^{(#)} by O_≺(N^{−1/2}) and O_≺ of the control parameter, respectively. On the other hand, when u or v is summed over B_0∖Q_r, we use (A.34) of Lemma A.2 to obtain a bound of the size of the control parameter. In each case, we obtain an estimate that matches (6.33).
Proof of Lemma 6.2 We consider the data (ξ, σ) fixed, and write L̄ = L̄(x, y, σ), etc. We start by enumerating the isolated labels (see (6.23)). The monomial expansion (6.24) is constructed iteratively: we will define 1 + ℓ representations, where ℓ := |L̄| is the number of isolated labels, each of the form (6.44), with some indices a_t, b_t ∉ E_t and w_r ∉ F_r. The numbers m and q, as well as the sets E_t, F_r, may vary from monomial to monomial, i.e. they are functions of k and α. Furthermore, for each fixed k and α, the lower indices and the upper index sets satisfy the following conditions: (b) E_t ⊆ B(k) ∪ Q_t for some 1 ≤ t ≤ 2μ, and F_r ⊆ B(k) ∪ Q_r for some 1 ≤ r ≤ 2μ; (c) if a_t ∈ B_{s_i} and b_t ∈ B_{s_j}, with 1 ≤ i, j ≤ k, then i = j; (d) for each s = 1, ..., 2μ there are two unique labels 1 ≤ t(s), t'(s) ≤ m such that a_{t(s)} = u_s, b_{t(s)} ∉ {v_r}_{r≠s}, and a_{t'(s)} ∉ {u_r}_{r≠s}, b_{t'(s)} = v_s hold, respectively.
We will call the right hand side of (6.43) the level-k expansion in the following and we will define it by a recursion on k.
The level-0 expansion is determined by the formula (6.18a). In order to construct the level-k expansion from the level-(k−1) expansion, we first list the elements of each B_{s_k} as {x_{ka} : 1 ≤ a ≤ |B_{s_k}|} = B_{s_k}, and we define a one-by-one exhaustion of B_{s_k}, namely B_{k1} ⊆ B_{k2} ⊆ ... ⊆ B_{k,|B_{s_k}|} ⊆ B_{s_k}, with B_{k,a+1} = B_{ka} ∪ {x_{ka}}. We now consider a generic level-(k−1) monomial, which is of the form (6.44) and satisfies (a)-(d). Each such monomial will give rise to several level-k monomials, constructed independently for different α's as follows. Expanding each of the m factors in the first product of (6.44) using the standard resolvent expansion identity, and each of the q factors in the second product of (6.44) using its counterpart for reciprocals, yields a product of sums of resolvent entries and their reciprocals. Inserting these formulas into (6.44) and expressing the resulting product as a single sum yields the representation (6.48), where A_α(k) is some finite subset of the integers and β simply labels the resulting monomials in an arbitrary way. From the resolvent identities (6.47) it is easy to see that the level-k monomials inherit the properties (a)-(d) from the level-(k−1) monomials. In particular, summing over α = 1, ..., M_{k−1} in (6.48) yields the level-k monomial expansion (6.43), with M_k := Σ_α |A_α(k)|. We will assume w.l.o.g. that the sets A_α(k), 1 ≤ α ≤ M_{k−1}, form a partition of the first M_k integers.
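The resolvent expansion identities (6.47) are, in their simplest scalar form with a single removed index c (schematic; the paper's version removes whole sets of indices and carries upper index sets along), for a, b ≠ c:

```latex
G_{ab} \;=\; G^{(c)}_{ab} \;+\; \frac{G_{ac}\,G_{cb}}{G_{cc}}\,,
\qquad
\frac{1}{G_{aa}} \;=\; \frac{1}{G^{(c)}_{aa}} \;-\; \frac{G_{ac}\,G_{ca}}{G_{aa}\,G^{(c)}_{aa}\,G_{cc}}\,,
```

where G^{(c)} denotes the resolvent of the matrix with row and column c removed. Each application either enlarges the upper index set of a factor or produces additional off-diagonal resolvent entries; this dichotomy is the mechanism behind Property 3.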
This procedure defines the monomial representation recursively. Since each monomial is a function of the (u, v) indices, strictly speaking we should record which lower indices in the generic form (6.44) are considered independent variables. Initially, at level k = 0, all indices are variables; see (6.18a). Later, the expansion formulas (6.47) bring in new lower indices, denoted generically by x_{ka}, from the set ∪_{s∈L̄} B_s, which is disjoint from the range of the components u_r, v_r of the variables (u, v), as B(ξ, σ) is a subset of (X∖∪_{s∈L̄} B_s)^{2μ}. However, the structure of (6.47) clearly shows at which locations the "old" a, b indices from the left hand sides of these formulas appear in the "new" formulas on the right hand sides. Now the simple rule is that if any of these indices a, b were variables on the left hand side, they are considered variables on the right hand side as well. In this way the concept of independent variables is naturally inherited along the recursion. With this simple rule we avoid the cumbersome notation of explicitly indicating which indices are variables in the formulas.
We note that the monomials of the final expansion (6.46) can be written in the form (6.26). Indeed, the second products in (6.26) and (6.44) are the same, while the first product of (6.44) is split into the three other products in (6.26) using (d). Properties 1 and 2 in Lemma 6.2 for the monomials in (6.46) follow easily from (a)-(d). Indeed, (a) yields the first part of Property 1, while the second part of Property 1 follows from (c) and the basic property d(B_s, B_t) ≥ N^{ε_1} for distinct lone labels s, t ∈ L.
For a given ξ, we define a family E of subsets of X. By construction (cf. (6.47) and (b)) the upper index sets are members of this ξ-dependent family. Since |Q_r|, |B_{s_k}| ≤ N^{C_0 ε_1} for some C_0 ∼ 1, we get |E| ≲_μ N^{C_0 μ ε_1}. Property 2 follows directly from these observations. Next we prove Property 3 of the monomials (6.46). To this end, we use the formula (6.48) to define a partial ordering '<' on the monomials. It follows that for every α = 1, 2, ..., M = M_ℓ there exists a sequence (α_k)_{k=1}^{ℓ−1} such that the corresponding monomials form an increasing chain. Here, D_k is the largest set A ⊆ X such that the level-k monomial with index α_k depends only on the matrix elements of H^{(A)}.
Since the upper index sets and the total number of resolvent elements of the form G^{(A)}_{ab} are both larger (or equal) on the right hand sides than on the left hand sides of the identities (6.47), and the added indices on the right hand side are from B_{s_k}, we obtain the corresponding monotonicity. We claim the two implications stated above. The first implication follows from the monotonicity of the D_k's. In order to get the second implication, suppose that the corresponding condition holds at some level k. Let us set b_* := 1 + max_{x,y} |B_{N^{ε_1}}(x, y)|. Each of the factors in every monomial at level k−1 is turned into a sum over monomials by the resolvent identities (6.47). Since each such monomial contains at most five resolvent entries (cf. the last terms in (6.47b)), we obtain the first of the following two bounds. For the second bound we recall that each of the at most p_{k−1} factors in every level-(k−1) monomial is expanded by the resolvent identities (6.47) into a sum of at most b_* terms. The product of these sums yields a single sum of at most b_*^{p_{k−1}} terms. From (6.45) and (6.18a) we get M_0 = 1 and p_0 = 2μ. Since k ≤ 2μ, we have max_k p_k ≤ 2μ · 25^μ. Plugging this into the second bound of (6.53) yields M_k ≤ ((b_*)^{2μ·25^μ})^{2μ}. This proves (6.25), since b_* ≤ N^{Cε_1} by (2.11). Finally, we obtain the bound on the number of factors in (6.26) using n + q ≤ p_ℓ ≲_μ 1.

Bulk universality and rigidity
In this section we show how to use the strong local law, Theorem 2.7, to obtain the remaining results of Sect. 2.2 on random matrices with correlated entries.

Rigidity
(7.1) The normalized trace converges with the improved rate (7.2). The constant C depends only on the model parameters K, in addition to δ, ε and ν.
Remark 7.2 Theorem 2.7 and Proposition 7.1 provide a local law with optimal convergence rate 1/(N Im ζ) inside the bulk of the spectrum and convergence rate 1/N away from the convex hull of supp ρ, respectively. In order to prove a local law inside spectral gaps and at the edges of the self-consistent spectrum, additional assumptions on H are needed to exclude a naturally appearing instability that may be caused by exceptional rows and columns of H and the outlying eigenvalues they create. This instability is already present in the case of independent entries, as explained in Section 11.2 of [1].
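The rate dichotomy in Remark 7.2 rests on the fact, used repeatedly above, that the harmonic extension of a compactly supported density is comparable to Im ζ at fixed distance from the support. This can be sanity-checked numerically with a toy semicircle density standing in for the self-consistent density of states (an illustrative assumption, not the paper's ρ):

```python
import numpy as np

# Toy density: semicircle on [kappa_-, kappa_+] = [-2, 2], standing in
# for the self-consistent density of states (illustrative assumption).
xs = np.linspace(-2.0, 2.0, 20001)
dx = xs[1] - xs[0]
dens = np.sqrt(np.maximum(4.0 - xs**2, 0.0)) / (2.0 * np.pi)
dens /= np.sum(dens) * dx  # normalize to a probability density

def harmonic_extension(tau, eta):
    """rho(tau + i*eta): Poisson integral of the density."""
    kernel = eta / ((tau - xs) ** 2 + eta**2) / np.pi
    return np.sum(kernel * dens) * dx

tau = 3.0  # distance 1 from the support of the density
for eta in [1e-2, 1e-3, 1e-4]:
    # Away from the support, rho(tau + i*eta) is comparable to eta.
    assert 0.02 < harmonic_extension(tau, eta) / eta < 0.5
```

For the semicircle law the ratio tends to m'(3)/π ≈ 0.054 as η ↓ 0, where m is the Stieltjes transform, so the comparability constants depend only on the distance to the support.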

Remark 7.3
The local law in Proposition 7.1 extends beyond the regime of bounded spectral parameters ζ. The upper bound 1/δ on the distance of ζ from [κ_−, κ_+] can be dropped in both (7.1) and (7.2). Furthermore, as was done e.g. for Wigner-type matrices in [4], by following the |ζ|-dependence along the proof, the estimates on the difference G − M in (7.1) and (7.2) can be improved accordingly.

Proof of Proposition 7.1 The proof has three steps. In the first step we will establish a weaker version of Proposition 7.1, where instead of the bound ≺ N^{−1/2} we will only show a bound ≺ N^{−1/2} + (N Im ζ)^{−1}. Then we will use this version in the second step to prove that there are no eigenvalues outside a small neighborhood of [κ_−, κ_+]. Finally, in the third step we will show (7.1) and (7.2).
Step 1 The proof of this step follows the same strategy as the proof of Theorem 2.7, except that instead of using Lemma 3.4 to estimate the error matrix D we use Lemma 5.1. In analogy to the proof of (2.28), we begin by showing the entrywise bound (7.3). In fact, following the same line of reasoning that was used to prove (3.11), but using (5.2) instead of (3.6) to estimate ‖D‖_max, we obtain (7.4) for any ε > 0. The last term on the right hand side can be absorbed into the left hand side, and since ε was arbitrary, (7.4) yields (7.5). This inequality establishes a gap in the possible values that the control parameter can take, provided ε < 1/2, because N^{−1/2} + (N Im ζ)^{−1} ≲ N^{−ε} in this regime. Exactly as we argued for (3.11), we can get rid of the indicator function in (7.5) by using a continuity argument together with a union bound, in order to obtain (7.3).
As in the proof of Theorem 2.7, we now use the fluctuation averaging to get an improved convergence rate (7.6) for the normalized trace, for all ζ ∈ H with δ ≤ dist(ζ, [κ_−, κ_+]) ≤ 1/δ and Im ζ ≥ N^{−1+ε}. Indeed, (7.6) is an immediate consequence of (7.3) and the fluctuation averaging Proposition 3.5.
Step 2: In this step we use (7.6) to prove the following lemma.
for a positive constant C, depending only on the model parameters K in addition to δ and ν.
In order to show (7.7), fix τ ∈ [−δ^{−1}, κ_− − δ] ∪ [κ_+ + δ, δ^{−1}] and η ∈ [N^{−1+ε}, 1], and let {λ_i}_{i=1}^N be the eigenvalues of H. Employing (7.6) we get (7.8). Here we used in the last inequality that N^{−1} Tr M is the Stieltjes transform of the self-consistent density of states ρ with supp ρ ⊆ [κ_−, κ_+]. Since the left hand side of (7.8) is a Lipschitz continuous function of τ, with Lipschitz constant bounded by N, we can use a union bound to establish (7.8) first on a fine grid of τ-values and then uniformly for all τ, for the choice η = N^{−2/3}. In particular, the eigenvalue λ_i cannot be at position τ with very high probability.
Now we exclude eigenvalues far away from the self-consistent spectrum by a continuity argument. Let W be a standard GUE matrix with E|w_xy|² = 1/N, and consider the interpolating matrices H^{(α)}; the quantities λ^{(α)}_± are defined as in (5.1) for the matrix H^{(α)}. In particular, κ^{(0)}_± = ±2. Since the constant C(δ, ν) in (7.9) is uniform for all random matrices with the same model parameters K, we see that (7.10) holds for some positive constant C(μ), depending on μ, the upper bound κ_1 from (2.23) on the moments, the sequence κ_2 from (2.24) and P from (2.11). Thus we can use a union bound. Since for α = 0 all eigenvalues are in [−2 − 2δ, 2 + 2δ] with very high probability, and with very high probability no eigenvalue can leave this interval by (7.10), we conclude the claimed absence of eigenvalues away from the self-consistent spectrum. Together with (7.9) this finishes the proof of Lemma 7.4.
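The input κ^{(0)}_± = ±2 for the pure GUE endpoint of the interpolation can be checked numerically under the normalization E|w_xy|² = 1/N (a sanity check only; the dimension and seed below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 600

# GUE with E|w_xy|^2 = 1/N: complex Gaussian entries, Hermitian symmetrized.
A = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
W = (A + A.conj().T) / np.sqrt(2 * N)

eigs = np.linalg.eigvalsh(W)
# All eigenvalues concentrate in [-2, 2], up to O(N^{-2/3}) edge fluctuations.
assert eigs.min() > -2.2 and eigs.max() < 2.2
```

With this normalization the empirical spectral measure follows the semicircle law on [−2, 2], matching κ^{(0)}_± = ±2.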
Step 3 In this step we use (7.7) to improve the bound on the error matrix D away from [κ_−, κ_+], and thus show (7.1) and (7.2) by following the same strategy that was used in Step 1 and in the proof of Theorem 2.7. By Lemma 7.4 there are, with very high probability, no eigenvalues in R∖[κ_− − δ/2, κ_+ + δ/2]. Therefore, for any B ⊆ X, the submatrix H^B of H also has no eigenvalues in this region. In particular, for any x ∈ X∖B we have (7.11) on a high probability event. As in the proof of Lemma 3.4, we bound the entries of the error matrix D by estimating the right hand sides of the equations (5.9a)-(5.9e) further. But now we use (7.11), so that the Im ζ in the denominators cancels, and we end up with (7.12). Following the strategy of proof from Step 1, we see that (7.12) implies (7.1) and (7.2). This finishes the proof of Proposition 7.1.

Proof of Corollary 2.9
The proof follows a standard argument that establishes rigidity from the local law, which we present here for the convenience of the reader. The argument uses a Cauchy-integral formula that was also applied in the construction of the Helffer-Sjöstrand functional calculus (cf. [15]), and it has already appeared in different variants in [22,27,28].
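The Cauchy-integral formula in question is the standard Helffer-Sjöstrand representation (standard conventions, which may differ from the paper's by signs): for f ∈ C²_c(R), with quasi-analytic extension f̃(x + iy) := (f(x) + iy f'(x)) χ(y) for a smooth cutoff χ,

```latex
f(\lambda) \;=\; \frac{1}{\pi}\int_{\mathbb{C}} \frac{\partial_{\bar z}\tilde f(z)}{\lambda - z}\,\mathrm{d}^2 z\,,
\qquad
\partial_{\bar z}\tilde f(x+\mathrm{i} y) \;=\; \frac{\mathrm{i}}{2}\Bigl( y\,f''(x)\,\chi(y) + \bigl(f(x)+\mathrm{i} y f'(x)\bigr)\chi'(y) \Bigr),
```

so that linear statistics of the eigenvalues are controlled by integrating the difference of the normalized traces of G and M against ∂̄f̃.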
Let τ ∈ R be such that ρ(τ) ≥ δ for some δ > 0. We will now apply Lemma 5.1 of [4], which shows how to estimate the difference between two measures in terms of the difference of their Stieltjes transforms. With the same notation that was used in the statement of that lemma, we make the choices τ_1 := κ_− − δ', τ_2 := τ, η_1 := N^{−1/2}, η_2 := N^{−1+ε'}, ε := 1, for some fixed δ', ε' > 0. We estimate the error terms J_1, J_2 and J_3 from Lemma 5.1 of [4] by using (7.2) and (2.29). In this way we find the desired estimate and the corollary is proven.

Bulk universality
Given the local law (Theorem 2.7), the proof of bulk universality (Corollaries 2.10 and 2.11) follows standard arguments based upon the three step strategy explained in the introduction. We will only sketch the main differences due to the correlations. We start by introducing an Ornstein-Uhlenbeck (OU) process (7.14) on random matrices H_t that conserves the first two mixed moments of the matrix entries, where the covariance operator C^{N×N} → C^{N×N} is given in terms of the second moments of H, and B_t is a matrix of standard real (complex) independent Brownian motions with the appropriate symmetry B_t^* = B_t for β = 1 (β = 2), whose distribution is invariant under the orthogonal (unitary) symmetry group. We remark that a large Gaussian component, as created by the flow (7.14), was first used in [39] to prove universality for the Hermitian symmetry class.
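The moment-conserving property of the OU flow can be illustrated in the simplest scalar case (a schematic stand-in for (7.14), assuming a single centred entry with stationary variance σ²): the exact solution w_t = e^{−t/2} w_0 + σ√(1 − e^{−t}) g, with g an independent standard Gaussian, keeps the first two moments of w_0 ~ N(0, σ²) exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, t, n = 0.7, 0.3, 200_000

# Exact solution of dw_t = -(1/2) w_t dt + sigma dB_t started in equilibrium.
w0 = sigma * rng.standard_normal(n)
g = rng.standard_normal(n)
wt = np.exp(-t / 2) * w0 + sigma * np.sqrt(1.0 - np.exp(-t)) * g

# First two moments are conserved along the flow (up to Monte Carlo error):
# Var(w_t) = e^{-t} sigma^2 + sigma^2 (1 - e^{-t}) = sigma^2.
assert abs(wt.mean()) < 0.01
assert abs(wt.var() - sigma**2) < 0.01
```

In the matrix setting the role of σ² is played by the full covariance structure of the entries, which is why the flow (7.14) preserves the operator S and hence the MDE.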
Along the flow, the matrix H_t = A + N^{−1/2} W_t satisfies the condition B3 on the dependence of the matrix entries uniformly in t. In particular, since the covariance operator determines the operator S, we see that H_t is associated to the same MDE as the original matrix H. Also the conditions B4 and B5 can be stated in terms of the covariance operator, and are hence both conserved along the flow.
For the following arguments we write W_t as a vector w_t containing all degrees of freedom originating from the real and imaginary parts of the entries of W_t. This vector has N(N+1)/2 real entries for β = 1 and N² real entries for β = 2. We partition X² = I_≤ ∪ I_> into its upper, I_≤ := {(x, y) : x ≤ y}, and lower, I_> := {(x, y) : x > y}, triangular parts. Then we identify W_t with the vector w_t, where w_t((x, y)) := N^{−1/2} w_xy for β = 1, and analogously for β = 2. In terms of the vector w_t the flow (7.14) takes the form (7.15), where b_t = (b_t(α))_α is a vector of independent standard Brownian motions, and the matrix appearing in (7.15) is the square root of the covariance matrix corresponding to H = H_0, whose entries are E w_0(α) w_0(β).
Recall the notation B_τ(x) = {y ∈ X : d(x, y) ≤ τ} for any x ∈ X, and set the corresponding balls B_τ(α) for index pairs α. Using (2.11) and B3 we obtain the bounds (7.16) for any α. For any fixed α, we denote by w^α the vector obtained by removing all the entries of w which may become strongly dependent on the component w(α) along the flow (7.15), i.e. we define w^α(γ) := w(γ) 1(γ ∉ B_2(α)). (7.17) In the case of independent entries it was proven in [12] that the process (7.15) conserves the local eigenvalue statistics of H up to times t ≲ N^{−1/2}, provided a bulk local law holds uniformly in t along the flow as well. We will now show that this insight extends to dependent random matrices. The following result is a straightforward generalization of Lemma A.1 from [12] to matrices with dependent entries. A similar result was independently given in [14].
In the term (7.21b), if δ ∈ B_1(α), then w(α)w(δ) is almost independent of w^α. If δ ∈ B_2(α)∖B_1(α), then w(α) is almost independent of (w(δ), w^α), and we proceed similarly, where we have used (7.16). The last term containing derivatives is bounded directly. The term (7.21c) is negligible as well. For (7.21d) we use (7.16) and the definition of the covariance matrix to bound the sum over α and δ, γ ∈ B_2(α). The last term (7.21e) is estimated similarly, and the double sum over α, δ produces a factor of size CN due to the exponential decay of the covariances. Combining the estimates for the five terms on the right hand side of (7.21) we obtain (7.18).
Proof of Corollaries 2.10 and 2.11 We will only sketch the argument here, as the procedure is standard. First we show that the matrix H_t defined through (7.14) satisfies bulk universality for t ≥ t_N := N^{−1+ξ_1}, for any ξ_1 > 0. For simplicity, we will focus on t ≤ N^{−1/2} only. Indeed, from the fullness assumption B5 it follows that H_t is of the form (7.22), where c(t) ∼ 1 and U is a GUE/GOE-matrix independent of the remaining part. For t ≤ N^{−1/2} the matrix H_t has essentially the same correlation structure as H, controlled by essentially the same model parameters. In particular, the corresponding operator S_t is almost the same as S. Let M_t solve the corresponding MDE with S replaced by S_t, and let ρ_t denote the function related to M_t in the same way as ρ is related to M (see Definition 2.3). Using the general stability of the MDE (cf. (2.17)), it is easy to check that M_t is close to M; in particular ρ_t(ω) ≥ δ/2 when ρ(ω) ≥ δ. Moreover, the local law applies to H_t as well, i.e. the resolvent G_t(ζ) of H_t approaches M_t(ζ) for spectral parameters ζ with ρ(Re ζ) ≥ δ. The bulk spectrum of H_t is therefore the same as that of H in the limit. Combining these facts with the decomposition (7.22), we can apply Theorem 2.2 from the recent work [41] to conclude bulk universality for H_t with t = t_N = N^{−1+ξ_1}, in the sense of correlation functions as in Corollary 2.10. In order to prove the gap universality, Corollary 2.11, we use Theorems 2.4 and 2.5 from [42], or Theorem 2.1 and Remark 2.2 from [26]. Second, we use Lemma 7.5 to show that H and H_t have the same local correlation functions in the bulk. Suppose ρ(ω) ≥ δ for some ω ∈ R. We show that the difference of the local k-point correlation functions ρ_k and ρ_{k;t_N} of H and H_{t_N}, respectively, converges weakly to zero as N → ∞.
This convergence follows from the standard arguments, provided that (7.23) holds, where F = F_N is a function of H expressed as a smooth function of the observables above, with ξ_2 ∈ (0, 1) sufficiently small. Here the derivatives of F may grow only as a negligible power of N (for details see the proof of Theorem 6.4 in [27]). In particular, basic resolvent formulas allow us to bound the right hand side of (7.18) in terms of the control parameter for H_t, defined as in (3.9) but for the entries of G_t(ζ) := (H_t − ζ)^{−1} with Im ζ ≥ N^{−1+ξ_2}; here we have also used the corresponding entrywise bound on |G_xy(t)| in terms of |m_xy(t)| and this control parameter. The first constant from (7.19) is easily bounded by N^{ε' + Cξ_2}, where the arbitrarily small constant ε' > 0 originates from stochastic domination estimates for the control parameter and the |w_s(α)|'s. The second constant from (7.19), on the other hand, is trivially bounded by N^C, since the resolvents satisfy trivial bounds in the regime |Im ζ| ≥ N^{−2}, and the weight |w(α)| multiplying the third derivatives of f is cancelled for large values of |w(α)| by the inverse in the definition G = (A + N^{−1/2} W − ζ1)^{−1}. Since the local law holds for H_t, uniformly in t ∈ [0, t_N], we see that the control parameter is at most N^{ε}(Nη)^{−1/2} ≤ N^{ε − ξ_2/2} with very high probability. Choosing the exponents ε, ε', ξ_1, ξ_2 sufficiently small, we see that the right hand side goes to zero as N → ∞. This completes the proof of Corollary 2.10. Finally, the comparison estimate (7.23) and the rigidity bound (2.31) allow us to compare the gap distributions of H_{t_N} and H; see Theorem 1.10 of [40]. This proves Corollary 2.11.
We write the difference between R^{−1} and S^{−1} using the resolvent identity. The first equality in (A.5) implies the corresponding norm bound, where we used ‖Q‖_{∞→2} ≤ √N ‖Q‖ and ‖Q‖_{2→∞} ≤ √N ‖Q‖_max for any Q ∈ C^{N×N}. We split R*R into a decaying and an entrywise small piece, as we did with R itself in (A.1). From the related properties of S and T we can easily deduce (A.8), where L = (l_xy)_{x,y}. Using the a priori knowledge from the assumption ‖R^{−1}‖ ≲ 1 of Lemma 4.3, we will show that ‖L^{−1}‖ ≲ 1, which is equivalent to (A.2). Note that both L and K are self-adjoint.
Via spectral calculus we write $K$ as the sum of a matrix $K_s$ with small spectral norm and a matrix $K_b$ with bounded rank, with some $\varepsilon > 0$ to be determined later. Indeed, from the Hilbert–Schmidt norm bound on the eigenvalues $\lambda_i(K)$ of $K$, we see that $\operatorname{rank} K_b \lesssim \varepsilon^{-2}$. On the other hand, $\|K_s\| \le \varepsilon$ by its definition in (A.9). Since $L + K$ has a bounded inverse (cf. (A.8)), so does $L + K_b$ for small enough $\varepsilon$, i.e. (A.10) holds.
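Concretely, the splitting (A.9) can plausibly be written as follows (our reconstruction from the surrounding text, with $(k_i)$ an orthonormal eigenbasis of $K$):

```latex
\[
K_s := \sum_{|\lambda_i(K)| \le \varepsilon} \lambda_i(K)\, k_i k_i^*,
\qquad
K_b := \sum_{|\lambda_i(K)| > \varepsilon} \lambda_i(K)\, k_i k_i^*,
\]
```

so that $\|K_s\| \le \varepsilon$ by construction, while a Chebyshev-type count of the large eigenvalues gives
$\operatorname{rank} K_b = \#\{i : |\lambda_i(K)| > \varepsilon\} \le \varepsilon^{-2} \sum_i \lambda_i(K)^2 = \varepsilon^{-2}\, \|K\|_{\mathrm{hs}}^2 \lesssim \varepsilon^{-2}$.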
Now we fix $\varepsilon \sim 1$ such that (A.10) is satisfied. In particular, the eigenvalues of $L + K_b$ are separated away from zero. Since $\operatorname{rank} K_b \lesssim 1$, we can apply the interlacing property of rank-one perturbations finitely many times to see that there are only finitely many eigenvalues of $L$ in a neighborhood of zero, i.e.
for some constant $c_1 \sim 1$. In particular, there are constants $c_2 \sim c_3 \sim 1$ with $c_2 + c_3 \le c_1$ such that $L$ has a spectral gap at $[c_2, c_2 + c_3]$. We split $L$ into the finite rank part $L_s$, associated to the spectrum below the gap, and the rest. The remainder of the proof is devoted to showing that $L_s \gtrsim 1_{[0,c_2)}(L)$, which implies that $L$ has a bounded inverse and thus shows (A.2). More precisely, we will show that there are points $x_1, \dots, x_L$ with $L := \operatorname{rank} L_s \lesssim 1$ (cf. (A.11)) and a positive sequence $(C(\nu))_{\nu \in \mathbb{N}}$ such that (A.13) holds for any $\nu \in \mathbb{N}$ and $x \in \mathbb{X}$, where $(e_x)_{x \in \mathbb{X}}$ denotes the canonical basis of $\mathbb{C}^N$. Let $l = (l_x)$ be any normalized eigenvector of $L_s$ in the image of $1_{[0,c_2)}(L)$ with associated eigenvalue $\lambda$. We need to show that $\lambda \gtrsim 1$. Since $\langle l, 1_{[0,c_2)}(L)\, e_x \rangle = l_x$, the decay property (A.13) of the spectral projection $1_{[0,c_2)}(L)$ away from the finitely many centers $x_i$ implies that the components $l_x$ have arbitrarily high polynomial decay away from the points $x_1, \dots, x_L$. In particular, $\sum_x |l_x|$ is bounded, and therefore $|l^* K l|$ is small (cf. (A.7)). We infer a lower bound for the eigenvalue $\lambda$ via $1 \lesssim l^*(L+K)\, l = \lambda + l^* K l$, where we used (A.8) for the inequality. Thus $\lambda \gtrsim 1$ for large enough $N$. Now we prove (A.13) by induction. We show that for any $l = 0, \dots, L$ there is an $l$-dimensional subspace of the image of $1_{[0,c_2)}(L)$ such that the associated orthogonal projection $P_l$ satisfies (A.14). The induction is over $l$. For $l = 0$ there is nothing to show. Now suppose that (A.14) has been established for some $l < L$. We will see that it then holds with $l$ replaced by $l+1$ as well.
We maximize the maximum norm over all normalized vectors in the image of $1_{[0,c_2)}(L) - P_l$ and pick the index $x_{l+1}$ at which the maximum is attained, as in (A.15). Here $\xi > 0$ since $l < L$. Now we extend the projection $P_l$ by the normalized vector $v$ defined in (A.16). The vector $v$ so defined attains its maximum norm at the point $x_{l+1}$, and the value of this norm is $\xi$, since for any $x$ we have (A.17). Here we used (A.15) and that $1_{[0,c_2)}(L) - P_l \ge 0$ is an orthogonal projection.
We will show that $P_{l+1}$ satisfies (A.14) with $l$ replaced by $l+1$. We start by establishing that $\xi \gtrsim 1$. We write $\|v\|^2$ as a sum of contributions originating from the neighborhoods $B := \bigcup_{i=1}^{l+1} B_R(x_i)$ of the points $x_i$, with some radius $R$ to be determined later, and from their complement. We estimate the components of $v$ by using (A.17) and the definition of $v$ in (A.16). With the induction hypothesis (A.14), the second summand in the sum on the right hand side of (A.18) is bounded as in (A.19) for $y \notin B$. For the other summand in (A.18) we use the decay estimate (A.20). The bound (A.20) follows from the integral representation
$$ 1_{[0,c_2)}(L) = \frac{1}{2\pi i} \oint_{\Gamma} (\zeta \mathbb{1} - L)^{-1}\, d\zeta, $$
where the integral is over a closed contour $\Gamma$ encircling only the eigenvalues of $L$ within $[0, c_2)$. Since $L$ has a spectral gap above $c_2$ (cf. (A.12)), we may choose $\Gamma$ such that $\max_{\zeta \in \Gamma} \|(L - \zeta \mathbb{1})^{-1}\| \lesssim 1$.
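The entrywise consequence of this contour representation can be sketched as follows (our illustration, with $\Gamma$ the closed contour around $\mathrm{Spec}(L) \cap [0, c_2)$ of bounded length):

```latex
\[
\big|\langle e_x,\, 1_{[0,c_2)}(L)\, e_y \rangle\big|
\;\le\; \frac{\operatorname{length}(\Gamma)}{2\pi}\,
\max_{\zeta \in \Gamma}
\big|\big((L - \zeta\mathbb{1})^{-1}\big)_{xy}\big|,
\]
```

so polynomial decay of the projection entries reduces to polynomial decay of the resolvent entries uniformly along the contour, which is exactly what a Combes–Thomas estimate provides.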
Since the entries of $L$ decay by (A.7), we can apply the standard Combes–Thomas estimate to see that the entries of $(L - \zeta \mathbb{1})^{-1}$ decay as well, yielding (A.22); in the second inequality there we estimated the size of $B$ with (2.11). Now we choose $R := \xi^{-1/P}$ and $\nu := 4P$. Using $\xi \le 1$ (cf. (A.15)), we obtain that the right hand side of (A.22) is bounded by a constant multiple of $\xi$. Thus (A.22) proves $\xi \gtrsim 1$. We finish the induction by using the definition (A.16) of $v$ and estimating as in (A.23).

Let $R$ be an arbitrary normalized self-adjoint matrix satisfying $\langle T, R \rangle = 0$. We use the spectral representation $R = \sum_i \lambda_i\, r_i r_i^*$, with the orthonormal eigenbasis $(r_i)_{i=1}^N$ of $R$. Plugging this spectral representation into the right hand side of (A.24) reveals the identity
$$ \big\langle R,\, (\mathrm{Id} \pm T)[R] \big\rangle = q^* (1 \pm S)\, q, \tag{A.25} $$
where we introduced the vector $q \in \mathbb{R}^N$ of eigenvalues of $R$ and the matrix $S \in \mathbb{R}^{N \times N}$ with non-negative entries. The vector $q$ is normalized since $\|q\| = \|R\|_{\mathrm{hs}} = 1$, and the matrix $S$ is symmetric because of the self-adjointness of $T$. Furthermore, by (4.34) the entries of $S$ satisfy the lower and upper bounds $\gamma N^{-1} \le s_{ij} \lesssim N^{-1}$.
(A.26) In particular, by the Perron–Frobenius theorem, the matrix $S$ has a unique normalized eigenvector $s$ with positive entries and with associated eigenvalue equal to its spectral norm, $S s = \|S\|\, s$.
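As a toy numerical illustration of this Perron–Frobenius step (our own sketch, not from the paper: the matrix below is a hypothetical $3 \times 3$ symmetric matrix with strictly positive entries, mimicking the bounds in (A.26) at toy scale), power iteration recovers the positive eigenvector and an eigenvalue equal to the spectral norm:

```python
# Power iteration for the Perron eigenpair of a positive matrix.
# For a symmetric matrix with strictly positive entries, the iteration
# converges to a positive eigenvector s with eigenvalue ||S||.
def power_iteration(S, iters=500):
    n = len(S)
    v = [1.0 / n] * n          # positive starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)   # sup-norm normalization
        v = [x / lam for x in w]
    return lam, v

# Hypothetical symmetric matrix with positive entries (our example).
S = [[0.40, 0.30, 0.35],
     [0.30, 0.50, 0.30],
     [0.35, 0.30, 0.45]]
lam, s = power_iteration(S)
# s has strictly positive entries; lam lies between the smallest and
# largest row sums of S, as Perron-Frobenius theory predicts.
```

The Perron eigenvalue is sandwiched between the minimal and maximal row sums, which for the paper's $S$ (entries of order $N^{-1}$) keeps $\|S\|$ of order one.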
We will now show that $S$ has a spectral gap and that $r$ has a non-vanishing component in the direction orthogonal to $s$. This will imply the claim.

Then the following large deviation estimate holds for every $\nu \in \mathbb{N}$:

Proof. Here we use Convention 4.1, so that $\varphi \lesssim \psi$ means $\varphi \le C\psi$ for a constant $C$ depending only on $\mathcal{P} := (\beta_1, \beta_2, \beta_3, P)$ (cf. (2.11)). We divide the proof into three steps.
Step 1. In this step we introduce a cutoff both for $X$ and $b$. We show that it suffices to prove the moment bound (A.35) with $X$, $b$ replaced by cut-off versions $\bar X$, $\bar b$. It is easy to verify that $\bar X$ and $\bar b$ satisfy the assumptions of Lemma A.2. Now suppose that (A.35) holds with $X$, $b$ replaced by $\bar X$, $\bar b$; in particular, $\sum_x \bar b_x \bar X_x \prec 1$. Then we see that the bound carries over to $X$ and $b$.

Step 2. In this step we completely remove the weak dependence between $X$ and $b$, i.e. we show that it is enough to prove (A.35) for a centered sequence $X$ that is independent of $b$ and satisfies (A.36), the assumption (ii), and (A.32). Indeed, suppose that $X$ and $b$ are not independent, but satisfy (A.33) instead. Let $\widetilde b$ be a copy of $b$ that is independent of $X$ and $b$. We show that for any $\mu, \nu \in \mathbb{N}$ the corresponding moment estimate holds, where the maximum is taken over all $x_1, \dots, x_{2\mu} \in \mathbb{X}$. Now we employ (A.33) as well as the bounds $|b_x| \lesssim 1$ and $|X_x| \le \sqrt{N}$ to infer (A.39).
Step 3. By Step 1 and Step 2 we may assume, for the proof of (A.35), that $X$ is independent of $b$ and that these random vectors satisfy (A.36), the hypothesis (ii) and (A.32). In this final step we construct, for every $\varepsilon > 0$, a partition of $\mathbb{X}$ into non-empty sets $I_1, \dots, I_K$ with the following properties:

(P1) With a constant $C(\varepsilon)$, depending only on $\varepsilon$ and $\mathcal{P}$ (cf. (2.11)), the size of the partition is bounded by $K \le C(\varepsilon)\, N^{(P+1)\varepsilon}$;

(P2) The indices within an element of the partition are far away from each other, i.e., if $x, y \in I_k$ with $x \ne y$, then $d(x, y) \ge N^{\varepsilon}$.

In other words, the elements within each $I_k$ are far from each other, hence the corresponding components of $X$ and $b$ are practically independent.
We postpone the construction of this partition to the end of the proof and first explain how it is used to obtain (A.35). We split the sum according to the partition and estimate each piece. By the bound (P1) on the size of the partition, and by choosing $\varepsilon$ sufficiently small, it remains to show (A.40). For an independent sequence $(X_x)_{x \in I_k}$ satisfying the assumption (ii), the moment bound (A.40) would be a simple consequence of the Marcinkiewicz–Zygmund inequality. Therefore, (A.40) follows from (A.41) for all $\mu, \nu \in \mathbb{N}$, where $\widetilde X = (\widetilde X_x)_x$ is an independent family of random variables which is also independent of $X$ and $b$ and has the same marginal distributions as $X$.
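For orientation, the Marcinkiewicz–Zygmund step for an independent centered family can plausibly be written as follows (our sketch; the constants $C_\mu$, $C'_\mu$ depend only on $\mu$ and the moment bounds in (ii)):

```latex
\[
\mathbb{E}\,\Big|\sum_{x \in I_k} b_x X_x\Big|^{2\mu}
\;\le\; C_\mu\, \mathbb{E}\Big(\sum_{x \in I_k} |b_x|^2 |X_x|^2\Big)^{\mu}
\;\le\; C'_\mu\, \Big(\sum_{x \in I_k} |b_x|^2\Big)^{\mu},
\]
```

where the first inequality uses independence and centering of $(X_x)_{x \in I_k}$, and the second uses Hölder's inequality together with the uniform moment bounds $\mathbb{E}\,|X_x|^{2\mu} \le \beta_1(2\mu)$.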
To show (A.41) we expand the powers on the left hand side and use the independence of $b$ from $X$ and $\widetilde X$, as well as the upper bound $|b_x| \lesssim 1$; the left hand side of (A.41) is then bounded by $N^{2\mu}$ times a maximum, as in (A.42), where the maximum is over all $\xi = (x_1, \dots, x_{2\mu}) \in I_k^{2\mu}$. For such a $\xi$, let $\xi_1, \dots, \xi_R \in I_k$ denote the distinct indices appearing within $\xi$; clearly $R \le 2\mu$. Let furthermore the non-negative integers $\mu_1, \dots, \mu_R$ and $\mu'_1, \dots, \mu'_R$ denote the corresponding numbers of appearances within $(x_1, \dots, x_\mu)$ and $(x_{\mu+1}, \dots, x_{2\mu})$, respectively. Then we can further estimate the term inside the maximum on the right hand side of (A.42) corresponding to $\xi$ by using a telescopic sum.

We will now inductively construct the partition $I_1, \dots, I_K$ with the properties (P1) and (P2) above. Suppose that the disjoint sets $I_1, \dots, I_k$ have already been constructed. Then we pick an arbitrary $x_0 \in J_0 := \mathbb{X} \setminus (I_1 \cup \dots \cup I_k)$. Next we pick $x_1 \in J_1 := J_0 \setminus B_{N^\varepsilon}(x_0)$, then $x_2 \in J_2 := J_1 \setminus B_{N^\varepsilon}(x_1)$, and so on. The process stops at some step $L$ when $J_{L+1}$ is empty, and we set $I_{k+1} := \{x_0, \dots, x_L\}$. By construction, property (P2) is satisfied for all elements $I_k$ of the partition. We verify the upper bound (P1) on the number $K$ of such elements. By the maximality of $I_{k+1}$, every point of $J_0$ lies in some ball $B_{N^\varepsilon}(x)$ with $x \in I_{k+1}$, so for every $k$ we have $N - \sum_{l=1}^{k} |I_l| \lesssim N^{P\varepsilon}\, |I_{k+1}|$.
Since $I_K$ contains at least one element, and since $|I_{k+1}| \gtrsim N^{-P\varepsilon} \big( N - \sum_{l \le k} |I_l| \big)$, we infer by induction that $N - \sum_{l=1}^{k} |I_l| \le (1 - c N^{-P\varepsilon})^k N$ for some constant $c \sim 1$; in particular $1 \le (1 - c N^{-P\varepsilon})^{K-1} N$. We solve for $K$ and thus see that $K \lesssim N^{P\varepsilon} \log N \le C(\varepsilon)\, N^{(P+1)\varepsilon}$, i.e., (P1) holds true. This finishes the proof of Lemma A.2.
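The greedy construction above can be sketched in a few lines of code (our illustration, not the paper's: the index set is modeled as integer points on a line with $d(x,y) = |x-y|$, and $D$ plays the role of $N^\varepsilon$):

```python
# Greedy partition into D-separated blocks: repeatedly extract a maximal
# D-separated subset from the remaining points, exactly as in the proof
# (each round picks x_0, then x_1 outside B_D(x_0), and so on).
def separated_partition(points, D):
    remaining = sorted(points)
    partition = []
    while remaining:
        block, rest = [], []
        for x in remaining:
            # keep x only if it is at distance >= D from everything
            # already chosen in this round (maximality of the block)
            if all(abs(x - y) >= D for y in block):
                block.append(x)
            else:
                rest.append(x)
        partition.append(block)
        remaining = rest
    return partition

# Toy run: 100 points on a line, separation threshold D = 10.
parts = separated_partition(range(100), D=10)
```

In this one-dimensional toy case ($P = 1$) the number of blocks is of order $D$, consistent with the count $K \lesssim N^{P\varepsilon} \log N$ that the induction yields.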
Let $X = (X_x)_{x \in \mathbb{X}}$, $Y = (Y_y)_{y \in \mathbb{X}}$ and $b = (b_{xy})_{x,y \in \mathbb{X}}$ be families of random variables that satisfy the following assumptions:

(i) The families $X$ and $Y$ are centered, $\mathbb{E}\, X_x = \mathbb{E}\, Y_x = 0$.
(ii) Both $X$ and $Y$ have uniformly bounded moments: there is a sequence of constants $\beta_1$ such that $\mathbb{E}\,|X_x|^\mu + \mathbb{E}\,|Y_x|^\mu \le \beta_1(\mu)$ for all $\mu \in \mathbb{N}$ and all $x \in \mathbb{X}$.

(iii) The correlations within $X$ and $Y$ decay fast: there is a sequence of constants $\beta_2$ such that for all $\varepsilon > 0$, all $A, B \subseteq \mathbb{X}$ with $d(A, B) \ge N^{\varepsilon}$, and all smooth functions $\varphi$, the corresponding covariance bound holds.

Then the following large deviation estimate holds for every $\nu \in \mathbb{N}$:
$$ \sum_{x,y} b_{xy}\, \big( X_x Y_y - \mathbb{E}\, X_x Y_y \big) \;\prec\; \Big( \sum_{x,y} |b_{xy}|^2 \Big)^{1/2}. $$

Proof. We use Convention 4.1, so that $\varphi \lesssim \psi$ means $\varphi \le C\psi$ for a constant $C$ depending only on $\mathcal{P} := (\beta_1, \beta_2, \beta_3, P)$ (cf. (2.11)). The proof of Lemma A.3 follows a strategy similar to that of Lemma A.2. Exactly as in Step 1 of the proof of Lemma A.2, we introduce new families of centered random variables and rescaled coefficients $\bar b_{xy} := b_{xy} \big( \big( \sum_{u,v} |b_{uv}|^2 \big)^{1/2} + N^{-\nu} \big)^{-1}$.
for any $x \in I_k$ and $y \in I_l$. Note that if $k = l$, then $\iota(x) = 1$ for all $x \in I_k$. Furthermore, let us define
$$ \mathcal{S} := \big\{ (S, T) : S \subseteq I_k,\ T \subseteq I_l,\ \text{such that } d(x, y) \ge N^{\varepsilon_3} \text{ for all } x \in S,\ y \in T \big\}, $$
the pairs of subsets with a distance of at least $N^{\varepsilon_3}$. Inspired by Appendix B of [19], we use a partition of unity.