Stability of the Matrix Dyson Equation and Random Matrices with Correlations

We consider real symmetric or complex hermitian random matrices with correlated entries. We prove local laws for the resolvent and universality of the local eigenvalue statistics in the bulk of the spectrum. The correlations have fast decay but are otherwise of general form. The key novelty is the detailed stability analysis of the corresponding matrix valued Dyson equation whose solution is the deterministic limit of the resolvent.


Introduction
E. Wigner's vision of the ubiquity of random matrix spectral statistics in quantum systems has posed a major challenge to mathematics. The basic conjecture is that the distribution of the eigenvalue gaps of a large self-adjoint matrix with sufficient disorder is universal in the sense that it is independent of the details of the system and depends only on the symmetry type of the model. These universal statistics were computed by Dyson, Gaudin and Mehta for the Gaussian Unitary and Orthogonal Ensembles (GUE/GOE) in the limit as the dimension of the matrix goes to infinity. GUE and GOE are the simplest mean field random matrix models in their respective symmetry classes. They have centered Gaussian entries that are identically distributed and independent (modulo the hermitian symmetry). The celebrated Wigner-Dyson-Mehta (WDM) universality conjecture, as formulated in the classical book of Mehta [44], asserts that the same gap statistics holds if the matrix elements are independent and have an arbitrary identical distribution (such matrices are called Wigner ensembles). The WDM conjecture has recently been proved in increasing generality in a series of papers [18,21,24,25] for both the real symmetric and complex hermitian symmetry classes via the Dyson Brownian motion. An alternative approach, introducing the four-moment comparison theorem, was presented in [49,50,52]. In this paper we only discuss universality in the bulk of the spectrum, but we remark that a similar development took place for the edge universality.
The next step towards Wigner's vision is to drop the assumption of identical distribution in the WDM conjecture but still maintain the mean field character of the model by requiring a uniform lower and upper bound on the variances of the matrix elements. This generalization has been achieved in two steps. If the matrix of variances is stochastic, then universality was proved in [18,27,29], in parallel with the proof of the original WDM conjecture for Wigner ensembles. Without the stochasticity condition on the variances the limiting eigenvalue density is not the Wigner semicircle any more; the correct density was analyzed in [1,2] and the universality was proved [4]. We remark that one may also depart from the semicircle law by adding a large diagonal component to Wigner matrices; universality for such deformed Wigner matrices was obtained in [43]. Finally we mention a separate direction to generalize the original WDM conjecture that aims at departing from the mean field condition: bulk universality for general band matrices with a band width comparable to the matrix size was proved in [11], see also [48] for Gaussian block-band matrices.
In this paper we drop the third key condition in the original WDM conjecture, the independence of the matrix elements, i.e. we consider matrices with correlated entries. Correlations come in many different forms and if they are extremely strong and long range, the universality may even be violated. We therefore consider random matrix models with a suitable decay of correlations. These models still carry sufficiently many random degrees of freedom for Wigner's vision to hold and, indeed, our main result yields spectral universality for such matrices.
We now describe the key points of the current work. Our main result is a local law for the resolvent G(ζ) = (H − ζ1)^{−1} of the random matrix H: the resolvent is well approximated by the deterministic matrix M(ζ) that solves the Matrix Dyson Equation

−M(ζ)^{−1} = ζ1 − A + S[M(ζ)] . (1.2)

Here the self-adjoint matrix A and the operator S on the space of matrices are determined by the first two moments of the random matrix,

A := E H , S[R] := E (H − A)R(H − A) . (1.3)

The central role of (1.2) in the context of random matrices has been recognized by several authors [33,38,45,53]. We will call (1.2) the Matrix Dyson Equation (MDE), since the analogous equation for the resolvent is sometimes called the Dyson equation in perturbation theory.
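As a purely illustrative aside (not part of the paper's argument), the MDE (1.2) can be solved numerically by fixed-point iteration. The sketch below uses a hypothetical data pair of our own choosing, A = 0 and the flat variance profile s_xy = 1/N, for which S is the self-energy induced by independent entries and the solution is known to reduce to the semicircle law:

```python
import numpy as np

N = 200
A = np.zeros((N, N))
S_mat = np.ones((N, N)) / N        # hypothetical flat variance profile s_xy = 1/N

def S_op(R):
    # Self-energy for independent entries: S[R] is diagonal with
    # (S[R])_xx = sum_y s_xy * R_yy.
    return np.diag(S_mat @ np.diagonal(R))

zeta = 0.5 + 0.1j                  # spectral parameter in the upper half plane

# Fixed-point iteration for  -M(zeta)^{-1} = zeta*1 - A + S[M(zeta)]
M = (-1.0 / zeta) * np.eye(N, dtype=complex)
for _ in range(300):
    M = -np.linalg.inv(zeta * np.eye(N) - A + S_op(M))

# For this profile the normalized trace must match the semicircle law.
m_sc = (-zeta + np.sqrt(zeta**2 - 4)) / 2
print(abs(np.trace(M) / N - m_sc))
```

For this flat profile the MDE collapses to the scalar equation m = −(ζ + m)^{−1}, whose Stieltjes-transform solution with Im m > 0 is m_sc, so the iteration serves as a consistency check rather than a general solver.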
Local laws have become a cornerstone in the analysis of spectral properties of large random matrices [4,8,20,23,29,35,37,51]. In its simplest form, a local law concerns the normalized trace (1/N) Tr G(ζ) of the resolvent. Viewed as a Stieltjes transform, it describes the empirical density of eigenvalues on the scale determined by η = Im ζ. Assuming a normalization such that the spectrum of H remains bounded as N → ∞, the typical eigenvalue spacing in the bulk is of order 1/N. The local law asserts that this normalized trace approaches a deterministic function m(ζ) as the size N of the matrix tends to infinity, and this convergence holds uniformly even if η = η_N depends on N, as long as η ≫ 1/N. Equivalently, the empirical density of the eigenvalues converges on any scale slightly above 1/N to a deterministic limit measure on ℝ with Stieltjes transform m(ζ).
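For intuition only, this convergence can be observed numerically. The following sketch (our own, with a GOE-type matrix chosen for illustration) compares the normalized resolvent trace with the Stieltjes transform of the semicircle law at a spectral scale well below order one:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)     # GOE normalization, spectrum close to [-2, 2]

eta = 0.03                          # spectral scale: 1/N << eta << 1
zeta = 0.3 + 1j * eta
m_emp = np.mean(1.0 / (np.linalg.eigvalsh(H) - zeta))   # (1/N) Tr G(zeta)
m_sc = (-zeta + np.sqrt(zeta**2 - 4)) / 2               # semicircle Stieltjes transform
print(abs(m_emp - m_sc))            # small already at this mesoscopic scale
```

The deviation is of order 1/(Nη) here, consistent with the averaged local law discussed below.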
Since G is asymptotically close to M, the deterministic limit of the Stieltjes transform of the empirical spectral measure is given by m(ζ) = (1/N) Tr M(ζ). Already in the case of random matrices with centered independent entries (Wigner-type matrices), the limiting measure ρ(dω) and its Stieltjes transform m(ζ) typically depend on the entire matrix of variances s_xy := E |h_xy|², and the only known way to determine ρ is to solve (1.2). However, in this setting the problem simplifies considerably because the off-diagonal elements of G tend to zero, M is a diagonal matrix and (1.2) reduces to a vector equation for its diagonal elements. In case the variance matrix is doubly stochastic, Σ_y s_xy = 1 (generalized Wigner matrix), the problem simplifies yet again, leading to M = m_sc 1, where m_sc = m_sc(ζ) is the Stieltjes transform of the celebrated semicircle law.
The main novelty of this work is to handle general correlations for which (1.2) cannot be simplified. The off-diagonal matrix elements G_xy, x ≠ y, do not vanish in general, even in the N → ∞ limit. The proof of the local law consists of two major parts. First, we derive an approximate equation for the resolvent: G satisfies the MDE up to a small random error term (see (1.4)). Second, we use the stability of the MDE to conclude that G is close to the solution M.

Local laws are the first step of a general three step strategy developed in [24,25,27,29] for proving universality. The second step is to add a tiny independent Gaussian component and prove universality for this slightly deformed model by analyzing the fast convergence of the Dyson Brownian motion (DBM) to local equilibrium. Finally, the third step is a perturbation argument showing that the tiny Gaussian component does not alter the local statistics.
In fact, the second and the third steps are very robust arguments and they easily extend to the correlated case. They do not use any properties of the original ensemble other than the a priori bounds encoded in the local laws, provided that the variances of the matrix elements have a positive lower bound (see [12,26,41,42]). Therefore our work focuses on the first step, establishing the stability of (1.2) and thus obtaining a local law.
Prior to the current paper, bulk universality had already been established for several random matrix models that carry some specific correlation from their construction. These include sample covariance matrices [25], adjacency matrices of large regular graphs [7] and invariant β-ensembles at various levels of generality [9,10,16,17,30,34,47]. However, none of these papers aimed at understanding the effect of a general correlation, nor were their methods suitable to deal with it. Universality for Gaussian matrices with a translation invariant covariance structure was established in [3]. For general distributions of the matrix entries, but with a specific two-scale finite range correlation structure that is smooth on the large scale and translation invariant on the short scale, universality was proved in [14], independently of the current work.
Finally, we mention that there exists an extensive literature on the limiting eigenvalue distribution on the global scale for random matrices with correlated entries (see e.g. [5,6,13,33,36,46] and references therein); however, these works either dealt with Gaussian random matrices or with more specific correlation structures that allow one to effectively reduce (1.2) to a vector or scalar equation. While the Matrix Dyson Equation in full generality was introduced for the analysis on the global scale before our work, we are not aware of a proof establishing that the empirical density of states converges to the deterministic density given by the solution of the MDE for a class of models as broad as the one considered in this paper. This convergence is expressed by the fact that (1/N) Tr G(ζ) ≈ (1/N) Tr M(ζ) holds for any fixed ζ ∈ ℍ. We thus believe that our proof identifying the limiting eigenvalue distribution is a new result even on the global scale for ensembles with general short range correlations and non-Gaussian distribution.
We present the stability of the MDE and its application to random matrices with correlated entries separately. Our findings on the MDE are given in Section 2.1, while Section 2.2 contains the results about random matrices with correlated entries. These sections can be read independently of each other. In Section 3 we prove the local law for random matrices with correlations. The proof relies on the results from Section 2.1. These results concerning the MDE are established in Section 4, which can be read independently of any other section. The main technical ingredients of the proof in Section 3 are (i) estimates on the random error term appearing in the approximate MDE (1.4) and (ii) the fluctuation averaging mechanism for this error term. These two inputs will be provided in Sections 5 and 6, respectively. Finally, we apply the local law to establish the rigidity of eigenvalues and bulk universality in Section 7.

The Matrix Dyson Equation
In this section we present our main results on the Matrix Dyson Equation and its stability. The corresponding proofs are carried out in Section 4. We consider the linear space C^{N×N} of N×N complex matrices R = (r_xy)_{x,y=1}^N and make it a Hilbert space by equipping it with the standard normalized scalar product

⟨R, T⟩ := (1/N) Tr R*T . (2.1)

We denote the cone of strictly positive definite matrices by C_+ := {R ∈ C^{N×N} : R > 0}, and by its closure the cone of positive semidefinite matrices. Let A = A* ∈ C^{N×N} be a self-adjoint matrix. We will refer to A as the bare matrix. Furthermore, let S : C^{N×N} → C^{N×N} be a linear operator that is

• self-adjoint w.r.t. the scalar product (2.1), i.e. Tr R* S[T] = Tr S[R]* T for any R, T ∈ C^{N×N};
• positivity preserving, i.e. S[R] ≥ 0 for any R ≥ 0.
Note that in particular S commutes with taking the adjoint, S[R]* = S[R*], and hence it is real symmetric, Tr R S[T] = Tr S[R] T, for all R, T ∈ C^{N×N}. We will refer to S as the self-energy operator.
We call a pair (A, S) consisting of a bare matrix and a self-energy operator with the properties above a data pair. For a given data pair (A, S) and a spectral parameter ζ ∈ ℍ in the upper half plane we consider the associated Matrix Dyson Equation (MDE),

−M(ζ)^{−1} = ζ1 − A + S[M(ζ)] , (2.2)

with the constraint

Im M(ζ) := (M(ζ) − M(ζ)*)/(2i) > 0 . (2.3)

The question of existence and uniqueness of solutions to (2.2) with the constraint (2.3) has been answered in [38]. The MDE has a unique solution matrix M(ζ) for any spectral parameter ζ ∈ ℍ, and these matrices constitute a holomorphic function M : ℍ → C^{N×N}.
On the space of matrices C^{N×N} we consider three norms. For R ∈ C^{N×N} we denote by ‖R‖ the operator norm induced by the standard Euclidean norm on C^N, by ‖R‖_hs := ⟨R, R⟩^{1/2} the norm associated with the scalar product (2.1), and by

‖R‖_max := max_{x,y} |r_xy| (2.4)

the entrywise maximum norm on C^{N×N}. We also denote the normalized trace of R by ⟨R⟩ := ⟨1, R⟩. For linear operators T : C^{N×N} → C^{N×N} we denote by ‖T‖ the operator norm induced by the norm ‖·‖ on C^{N×N} and by ‖T‖_sp the operator norm induced by ‖·‖_hs.
The following proposition provides a representation of the solution M as the Stieltjes transform of a measure with values in the cone of positive semidefinite matrices. This is a standard result for matrix-valued Nevanlinna functions (see e.g. [32]). For the convenience of the reader we provide a proof which also gives effective control on the boundedness of the support of this matrix-valued measure. The measure V(dτ) = (v_xy(dτ))_{x,y=1}^N on the real line with values in positive semidefinite matrices is unique. It satisfies the normalization V(ℝ) = 1 and has support in the interval [−κ, κ], where

κ := ‖A‖ + 2 ‖S‖^{1/2} . (2.6)

We will now make additional quantitative assumptions on the data pair (A, S) that ensure a certain regularity of the measure V(dτ). Our assumptions, labeled A1 and A2, always come together with sets of model parameters P_1 and P_2, respectively, that control them effectively. Estimates will typically be uniform in all data pairs that satisfy these assumptions with the given set of model parameters. In particular, they are uniform in the size N of the matrix, which is of great importance in the application to random matrix theory.
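For orientation, the Stieltjes representation just described takes the standard form for matrix-valued Nevanlinna functions; the display below is a sketch of the statement (the precise formulation, with its normalization and support bound, is Proposition 2.1):

```latex
M(\zeta) \;=\; \int_{\mathbb{R}} \frac{V(\mathrm{d}\tau)}{\tau - \zeta}\,,
\qquad \zeta \in \mathbb{H}\,,
\qquad V(\mathbb{R}) = \mathbb{1}\,,
\qquad \operatorname{supp} V \subseteq [-\kappa,\kappa]\,.
```

In particular, taking the normalized trace recovers the scalar Stieltjes transform ⟨M(ζ)⟩ of the probability measure ⟨V(dτ)⟩.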
A1 Flatness: Let P_1 = (p_1, P_1) with p_1, P_1 > 0. The self-energy operator S is called flat (with model parameters P_1) if it satisfies the lower and upper bound

p_1 ⟨R⟩ 1 ≤ S[R] ≤ P_1 ⟨R⟩ 1 , R ∈ C_+ . (2.7)

Proposition 2.2 (Regularity of density of states). Assume that S is flat, i.e. it satisfies A1 with some model parameters P_1, and that the bare matrix has a bounded spectral norm, ‖A‖ ≤ P_0. Then the averaged measure is absolutely continuous, ⟨V(dτ)⟩ = ρ(τ) dτ , (2.9) and its density ρ is uniformly Hölder continuous. More precisely,

|ρ(τ_1) − ρ(τ_2)| ≤ C |τ_1 − τ_2|^c , τ_1, τ_2 ∈ ℝ ,

where c > 0 is a universal constant and the constant C > 0 depends only on the model parameters P_1 and P_0. Furthermore, ρ is real analytic on the open set {τ ∈ ℝ : ρ(τ) > 0}.
Definition 2.3 (Density of states). Assuming a flat self-energy operator, the probability density ρ : ℝ → [0, ∞), defined through (2.9), is called the density of states (of the MDE with data pair (A, S)). We denote by supp ρ ⊆ ℝ its support on the real line. With a slight abuse of notation we also denote by

ρ(ζ) := (1/π) Im ⟨M(ζ)⟩ , ζ ∈ ℍ , (2.10)

the harmonic extension of ρ to the complex upper half plane.
The second set of assumptions describes the decay properties of the data pair (A, S). To formulate it, we need to equip the index set {1, ..., N} with a concept of distance. Recall that a pseudometric d on a set A is a symmetric function d : A × A → [0, ∞] such that d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ A. We say that the pseudometric space (A, d) with a finite set A has sub-P-dimensional volume, for some constant P > 0, if the metric balls B_τ(x) := {y : d(x, y) ≤ τ} satisfy the volume growth condition (2.11).

A2 Faster than power law decay: Let P_2 = (P, π_1, π_2), where P > 0 is a constant and π_k = (π_k(ν))_{ν=0}^∞, k = 1, 2, are sequences of positive constants. The data pair (A, S) is said to have faster than power law decay (with model parameters P_2) if there exists a pseudometric d on the index space {1, ..., N} such that the pseudometric space X = ({1, ..., N}, d) has sub-P-dimensional volume (cf. (2.11)) and such that the decay bounds (2.12) on the entries of A and (2.13) on the entries of S hold for any ν ∈ ℕ and x, y ∈ X.
In order to state bounds of the form (2.12) and (2.13) more conveniently we introduce the following matrix norms.
Definition 2.4 (Faster than power law decay). Given a pseudometric d on {1, ..., N} and a sequence π = (π(ν))_{ν=0}^∞ of positive constants, we define the norm ‖R‖_π in (2.14). If ‖R‖_π ≤ 1 for some sequence π, we say that R has faster than power law decay (up to level 1/N) in the pseudometric space X := ({1, ..., N}, d). This norm expresses the typical behavior of many matrices in this paper: they have an off-diagonal decay faster than any power, up to a possible mean-field term of order 1/N. Using this norm, the bounds (2.12) and (2.13) take the simple forms (2.15).

Our main result, the stability of the MDE, holds uniformly for all spectral parameters that are either away from the support of the density of states or where the density of states takes positive values. Therefore, for any δ > 0 we set

D_δ := { ζ ∈ ℍ : ρ(ζ) + dist(ζ, supp ρ) > δ } .

Theorem 2.5 (Faster than power law decay of solution). Assume A1 and A2 and let δ > 0. Then there exists a positive sequence γ such that

sup_{ζ ∈ D_δ} ‖M(ζ)‖_γ ≤ 1 .

The sequence γ depends only on δ and the model parameters P_1 and P_2.
Our main result on the MDE is its stability with respect to the entrywise maximum norm on C^{N×N}, see (2.4). The choice of this norm is especially useful for applications in random matrix theory, since the matrix valued error terms are typically controlled in this norm. We denote by

B^max_τ(R) := { Q ∈ C^{N×N} : ‖Q − R‖_max ≤ τ }

the ball of radius τ > 0 around R ∈ C^{N×N} w.r.t. the entrywise maximum norm.

Theorem 2.6 (Stability). Assume A1 and A2, let δ > 0 and ζ ∈ D_δ. Then there exist constants c_1, c_2 > 0 and a unique function G : B^max_{c_1}(0) → B^max_{c_2}(M(ζ)) such that G(D) solves the MDE perturbed by the error matrix D (cf. (2.16)) with G(0) = M(ζ). The function G is analytic. In particular, there exists a constant C > 0 such that

‖G(D) − M(ζ)‖_max ≤ C ‖D‖_max , D ∈ B^max_{c_1}(0) . (2.17)

Furthermore, there is a positive sequence γ and a linear operator Z : C^{N×N} → C^{N×N} such that the derivative of G, evaluated at D = 0, has the form

∇G(0) = Z + M Id , (2.18)

and Z, as well as its adjoint Z* with respect to the scalar product (2.1), satisfy the decay bound (2.19) for every R ∈ C^{N×N}. Here c_1, c_2, C and γ depend only on δ and the model parameters P_1, P_2 from assumptions A1 and A2.
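As a toy illustration of this stability mechanism (our own construction, not from the paper), consider the scalar analogue of the MDE with A = 0 and S = id, perturbed by a small constant error d; the solution responds linearly in d, with a bounded derivative as in (2.18):

```python
import numpy as np

zeta = 0.5 + 0.1j
m = (-zeta + np.sqrt(zeta**2 - 4)) / 2     # unperturbed solution, Im m > 0

def solve_perturbed(d, iters=2000):
    # fixed point of the perturbed scalar Dyson equation  -1/g = zeta + g + d
    g = m
    for _ in range(iters):
        g = -1.0 / (zeta + g + d)
    return g

d = 1e-3
g = solve_perturbed(d)
print(abs(g - m) / d)    # stays bounded as d -> 0: linear response
```

The bulk condition ρ(ζ) > 0 keeps the linearization 1 − m² away from zero in this scalar model, which is the toy analogue of the stability operator being invertible on D_δ.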
We now introduce a generalization of A1 that relaxes the lower bound on S in (2.7) to demonstrate that our theory on the MDE also applies to a somewhat more general setup. This generalization is of relevance for band matrices with a band width of order N.
A1' L-Flatness: Let P'_1 = (p_1, P_1, p_0, K) with p_1, P_1, p_0 > 0 and K ∈ ℕ. The self-energy operator S is called L-flat (with model parameters P'_1 and L ∈ ℕ) if for some zero/one-matrix Z = (z_kl)_{k,l=1}^K ∈ {0,1}^{K×K} with positive diagonal, z_kk = 1, and some partition A_1, ..., A_K of {1, ..., N} satisfying (2.20), the following upper and lower bounds hold:

p_1 Σ_{k,l=1}^K z_kl ⟨P_k, R⟩ P_l ≤ S[R] ≤ P_1 ⟨R⟩ 1 , for all R ∈ C_+ . (2.21)

Here, P_k denotes the orthogonal projection onto the subspace of vectors with support in A_k, i.e. (P_k v)_x := v_x 1(x ∈ A_k) for x = 1, ..., N.
Note that S being 1-flat simply means that S is flat, i.e. satisfies A1.
Remark 2.7 (Band matrices). Consider the example of a symmetric P-dimensional random band matrix H = (h_xy)_{x,y∈X} ∈ ℝ^{N×N} with index space X = {1, ..., n}^P and pseudometric d(x, y) = |x − y|.

Suppose for simplicity that H has independent centered entries, E h_xy = 0, up to the symmetry constraint h_xy = h_yx, with variances s_xy := E h²_xy ≤ C/n^P for some positive constant C > 0, and that H has a macroscopic band width, i.e. s_xy ≥ (c/n^P) 1(d(x, y) ≤ ε n) with constants c > 0 and ε ∈ (0, 1). As explained in Section 2.2 below, the MDE associated to H has the data pair A = 0 and

(S[R])_xy = s_xy r_xy 1(x ≠ y) + δ_xy Σ_{u∈X} s_xu r_uu , x, y ∈ X .
Then the lower bound on S in (2.7) is not satisfied, but A1' holds true uniformly for n ≥ 2/ε.
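To make the failure of A1 concrete, here is a small numerical sketch (with an illustrative one-dimensional band profile of our own choosing, P = 1): for a rank-one positive R supported at a single index, S[R] vanishes on diagonal entries outside the band, so no lower bound of the form p_1 ⟨R⟩ 1 ≤ S[R] can hold, while inside the band the diagonal stays bounded below:

```python
import numpy as np

n, eps = 50, 0.2
N = n                                   # P = 1, so N = n
x = np.arange(N)
dist = np.abs(x[:, None] - x[None, :])  # d(x, y) = |x - y|
S_mat = (dist <= eps * n) / n           # band variance profile s_xy

def S_op(R):
    # self-energy of independent entries:
    # (S[R])_xy = s_xy R_xy for x != y,  (S[R])_xx = sum_u s_xu R_uu
    out = S_mat * R
    np.fill_diagonal(out, S_mat @ np.diagonal(R))
    return out

R = np.zeros((N, N))
R[0, 0] = 1.0                           # positive semidefinite test matrix
SR = S_op(R)
print(SR[N - 1, N - 1], SR[5, 5])       # 0 outside the band, >= 1/n inside
```

The surviving block-wise lower bound is exactly what A1' encodes through the zero/one matrix Z.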
Theorem 2.8 (Replacement of A1 by A1'). Theorems 2.5 and 2.6 remain true without any changes if A1 is replaced by A1', i.e. if there is some L P N such that S is L-flat and the model parameters P 1 are replaced by P 1 1 and K. Proposition 2.2 also remains valid under the change from A1 to A1', provided the constant c is allowed to depend on L.
The proof of Theorem 2.8 requires only minor changes of the proofs under the simple flatness assumption A1. We explain these changes and thus prove Theorem 2.8 in the appendix.

Random matrices with correlations
In this section we present our results on the local eigenvalue statistics of random matrices with correlations. Let H = (h_xy)_{x,y=1}^N ∈ C^{N×N} be a self-adjoint random matrix. For a spectral parameter ζ ∈ ℍ we consider the associated Matrix Dyson Equation (MDE),

−M(ζ)^{−1} = ζ1 − A + S[M(ζ)] ,

with the data pair determined by the first two moments of H, i.e. A := E H and

S[R] := E (H − A) R (H − A) . (2.22)

We study the resolvent

G(ζ) := (H − ζ1)^{−1} , ζ ∈ ℍ , (2.23)

of H written in the form

H = A + N^{−1/2} W . (2.24)

Here, the bare matrix A is a non-random self-adjoint matrix and W is a self-adjoint random matrix with centered entries, E W = 0. The normalization factor N^{−1/2} in (2.24) ensures that the spectrum of the fluctuation matrix W, with entries of a typical size of order one, remains bounded. In the following we will assume that there exists some pseudometric d on the index set {1, ..., N} such that the resulting pseudometric space X = ({1, ..., N}, d) has sub-P-dimensional volume for some constant P > 0, i.e. d satisfies (2.11), and that the bare and fluctuation matrices satisfy the following assumptions:

B1 Existence of moments: Moments of all orders of W exist, i.e., there is a sequence of positive constants κ_1 = (κ_1(ν))_{ν∈ℕ} such that

E |w_xy|^ν ≤ κ_1(ν) (2.25)

for all x, y ∈ X and ν ∈ ℕ.
B2 Decay of expectation: The entries a_xy of the bare matrix A decay in the distance of the indices x and y, i.e., there is a sequence of positive constants κ_2 = (κ_2(ν))_{ν∈ℕ} such that

|a_xy| ≤ κ_2(ν) (1 + d(x, y))^{−ν} (2.26)

for all x, y ∈ X and ν ∈ ℕ.
B3 Decay of correlations: The correlations in W are fast decaying, i.e., there is a sequence of positive constants κ_3 = (κ_3(ν))_{ν∈ℕ} such that for all symmetric sets A, B ⊆ X² (A is symmetric if (x, y) ∈ A implies (y, x) ∈ A) and all smooth functions φ : C^{|A|} → ℝ and ψ : C^{|B|} → ℝ, we have

|Cov( φ(W_A) , ψ(W_B) )| ≤ κ_3(ν) (1 + d_2(A, B))^{−ν} ‖∇φ‖_∞ ‖∇ψ‖_∞ , (2.27)

where W_A := (w_xy)_{(x,y)∈A}, and

d_2(A, B) := min{ max{ d(x_1, x_2), d(y_1, y_2) } : (x_1, y_1) ∈ A, (x_2, y_2) ∈ B }

is the distance between A and B in the product metric on X. The supremum norm of a vector valued function Φ = (φ_i)_i is ‖Φ‖_∞ := sup_Y max_i |φ_i(Y)|.

B4 Flatness: There is a positive constant κ_4 such that for any two deterministic vectors u, v ∈ C^N we have

E |⟨u, W v⟩|² ≥ κ_4 ‖u‖² ‖v‖² . (2.28)

We collect the constants from the assumptions above into the set of model parameters

K := (P, κ_1, κ_2, κ_3, κ_4) . (2.29)

Theorem 2.9 (Local law for correlated random matrices). Let G be the resolvent of a random matrix H written in the form (2.24) that satisfies B1-B4. For all δ, ε > 0 and ν ∈ ℕ there exists a positive constant C such that in the bulk, i.e. uniformly for all ζ ∈ ℍ with ρ(ζ) ≥ δ and Im ζ ≥ N^{−1+ε},

P( ‖G(ζ) − M(ζ)‖_max ≥ N^ε (N Im ζ)^{−1/2} ) ≤ C N^{−ν} . (2.30)

Furthermore, the normalized trace converges with the improved rate

P( |⟨G(ζ) − M(ζ)⟩| ≥ N^ε (N Im ζ)^{−1} ) ≤ C N^{−ν} . (2.31)

The constant C depends only on the model parameters K in addition to δ, ε and ν.
In Section 3 we present the proof of Theorem 2.9, which is based on the results from Section 2.1 about the Matrix Dyson Equation. As a standard consequence of the local law (2.30) and the uniform boundedness of Im M_xx from Theorem 2.5, the eigenvectors of H in the bulk are completely delocalized. This follows directly from the uniform boundedness of Im G_xx(ζ) and the spectral decomposition of the resolvent (see e.g. [20]).

Corollary 2.10 (Delocalization of eigenvectors). Pick any δ, ε, ν > 0 and let u be a normalized, ‖u‖ = 1, eigenvector of H corresponding to an eigenvalue λ ∈ ℝ in the bulk, i.e., ρ(λ) ≥ δ. Then

P( ‖u‖_max ≥ N^{ε−1/2} ) ≤ C N^{−ν}

for a positive constant C, depending only on the model parameters K in addition to δ, ε and ν.
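Numerically, delocalization is easy to observe. The sketch below (our own GOE example, not the correlated model) shows that every bulk eigenvector of a large Wigner matrix has all entries of size roughly N^{−1/2}:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)          # GOE normalization, spectrum ~ [-2, 2]

vals, vecs = np.linalg.eigh(H)          # columns of vecs are normalized eigenvectors
bulk = np.abs(vals) < 1.0               # eigenvalues well inside the bulk
sup_norm = np.abs(vecs[:, bulk]).max()  # largest entry over all bulk eigenvectors
print(sup_norm, N ** -0.5)              # sup_norm exceeds N^{-1/2} only by a log factor
```

A normalized vector always has maximum entry at least N^{−1/2}; the content of the corollary is the matching upper bound up to N^ε factors.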
The averaged local law (2.31) directly implies the rigidity of the eigenvalues in the bulk. For any τ ∈ ℝ, we define

i(τ) := ⌈ N ∫_{−∞}^{τ} ρ(ω) dω ⌉ . (2.32)

This is the index of an eigenvalue that is typically close to a spectral parameter τ in the bulk. Then the standard argument presented in Section 7.1 proves the following result.
Corollary 2.11 (Rigidity). For any δ, ε, ν > 0 and any τ ∈ ℝ with ρ(τ) ≥ δ we have

P( |λ_{i(τ)} − τ| ≥ N^{ε−1} ) ≤ C N^{−ν}

for a positive constant C, depending only on the model parameters K in addition to δ, ε and ν.
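For illustration (again a GOE sketch of our own, using the explicit semicircle density in place of ρ), rigidity predicts that the eigenvalue with classical index i(τ) sits within roughly 1/N of τ:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)
vals = np.sort(np.linalg.eigvalsh(H))

tau = 0.4
# semicircle distribution function F(tau) = integral_{-2}^{tau} sqrt(4 - w^2)/(2 pi) dw
F = 0.5 + tau * np.sqrt(4 - tau**2) / (4 * np.pi) + np.arcsin(tau / 2) / np.pi
i_tau = int(np.ceil(N * F))             # classical index, cf. (2.32)
print(abs(vals[i_tau - 1] - tau))       # of order 1/N up to log factors
```

This is much stronger than the order-one accuracy one gets from the global law alone.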
Another consequence of Theorem 2.9 is the universality of the local eigenvalue statistics in the bulk of the spectrum of H, both in the sense of averaged correlation functions and in the sense of gap universality. For the universality statement we make the following additional assumption, which is stronger than B4:

B5 Fullness: There is a positive constant κ_5 such that

E |Tr R W|² ≥ κ_5 N Tr R² (2.33)

for any real symmetric R ∈ ℝ^{N×N} in the case β = 1 (any complex hermitian R ∈ C^{N×N} in the case β = 2).

In this case we consider κ_5 as an additional model parameter.
The first formulation of bulk universality states that the k-point correlation functions ρ_k of the eigenvalues of H, rescaled around an energy parameter ω in the bulk, converge weakly to those of the GUE/GOE. The latter are given by the correlation functions of well known determinantal processes. The precise statement is the following:

Corollary 2.12 (Correlation function bulk universality). Let H satisfy B1-B3 and B5 with β = 1 (β = 2). Pick any δ > 0 and choose any ω ∈ ℝ with ρ(ω) ≥ δ. Fix k ∈ ℕ and ε ∈ (0, 1/2). Then for any smooth, compactly supported test function Φ : ℝ^k → ℝ the k-point local correlation functions ρ_k : ℝ^k → [0, ∞) of the eigenvalues of H converge to the k-point correlation function Υ_k : ℝ^k → [0, ∞) of the GOE (GUE) determinantal point process,

| ∫_{ℝ^k} Φ(τ) [ ρ(ω)^{−k} ρ_k( ω + τ/(N ρ(ω)) ) − Υ_k(τ) ] dτ | ≤ C N^{−c} ,

where τ = (τ_1, ..., τ_k), and the positive constants C, c depend only on δ, ε, Φ and the model parameters.
The second formulation compares the joint distributions of gaps between consecutive eigenvalues of H in the bulk with those of the GUE/GOE. The proofs of Corollaries 2.12 and 2.13 are presented in Section 7.2.
Corollary 2.13 (Gap universality in the bulk). Let H satisfy B1-B3 and B5 with β = 1 (β = 2). Pick any δ > 0, an energy τ in the bulk, i.e. ρ(τ) ≥ δ, and let i = i(τ) be the corresponding index defined in (2.32). Then for all n ∈ ℕ and all smooth compactly supported observables Φ : ℝ^n → ℝ, there are two positive constants C and c, depending on n, δ, Φ and the model parameters, such that the local eigenvalue distribution is universal,

| E Φ( ( N ρ(λ_i) (λ_{i+j} − λ_i) )_{j=1}^n ) − E_G Φ( ( N ρ_sc(0) (λ_{⌈N/2⌉+j} − λ_{⌈N/2⌉}) )_{j=1}^n ) | ≤ C N^{−c} .

Here the second expectation E_G is with respect to the GUE and GOE in the cases of complex hermitian and real symmetric H, respectively, and ρ_sc(0) = 1/(2π) is the value of Wigner's semicircle law at the origin.
During the final preparation of this manuscript and after announcing our theorems, we learned that a similar universality result, but with a special correlation structure, was proved independently in [14]. The covariances in [14] have a specific finite range and translation invariant structure of the form (2.34): they are given by a function ψ of two slow (large scale) and two fast (short scale) variables, where ψ is piecewise Lipschitz and has finite support in the third and fourth variables. The short scale translation invariance in (2.34) allows one to use a partial Fourier transform after effectively decoupling the slow variables from the fast ones. This turns the matrix equation (1.2) into a vector equation for N² variables, and the necessary stability result directly follows from [1]. The main difference between the current work and [14] is that here we analyze (1.2) as a genuine matrix equation without relying on translation invariance, and thus arbitrary short range correlations are allowed.

Local law for random matrices with correlations
In this section we will prove Theorem 2.9, the local law for random matrices with correlations. The proof relies heavily on the results in Section 2.1, whose proofs will be presented separately in Section 4. By its definition (2.23), the resolvent of H satisfies the perturbed MDE

−1 = ( ζ1 − A + S[G(ζ)] ) G(ζ) + D(ζ) , (3.1a)

where the random error matrix D : ℍ → C^{N×N} is given by

D(ζ) := −( S[G(ζ)] + H − A ) G(ζ) . (3.1b)

Note that the self-energy operator S : C^{N×N} → C^{N×N}, defined in (2.22), is clearly self-adjoint with respect to the scalar product (2.1) and preserves the cone C_+. In particular, (A, S) is a data pair for the MDE as defined at the beginning of Section 2.1. We will now verify that this data pair satisfies the assumptions A1 and A2 that were made in Section 2.1.

Proof. The condition (2.12) on the bare matrix A is clearly satisfied by (2.26). The lower bound on S in (2.7) follows from (2.28). To show this, let R = Σ_i ρ_i r_i r_i* ∈ C_+ with an orthonormal basis (r_i)_{i=1}^N. Then

v* S[R] v = (1/N) Σ_i ρ_i E |⟨v, W r_i⟩|² ≥ κ_4 (1/N) Σ_i ρ_i = κ_4 ⟨R⟩

for any normalized vector v ∈ C^N.
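The perturbed equation (3.1) is a purely algebraic identity and can be checked numerically. The sketch below uses a Gaussian ensemble chosen for illustration, since there S is available in closed form; the formula S[R] = ⟨R⟩1 + Rᵀ/N for this GOE-type normalization is an assumption of the demo, computed by us, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 300
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)          # A = E H = 0 for this ensemble

def S_op(R):
    # self-energy of this GOE-type ensemble: S[R] = <R> 1 + R^T / N
    return (np.trace(R) / N) * np.eye(N) + R.T / N

zeta = 0.3 + 0.2j
G = np.linalg.inv(H - zeta * np.eye(N))
D = -(S_op(G) + H) @ G                  # error matrix (3.1b) with A = 0

residual = (zeta * np.eye(N) + S_op(G)) @ G + D + np.eye(N)   # (3.1a) rearranged
print(np.abs(residual).max())           # machine precision: (3.1a) is exact
print(np.abs(D).max())                  # small: the local law error term
```

The point of the proof is then quantitative: the entries of D are small with very high probability, which, combined with the stability of the MDE, forces G to stay close to M.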
We will now verify the upper bounds on S in (2.7) and (2.13). Both bounds follow from the decay of covariances

|E w_xu w_vy| ≤ κ_3(2ν) ( (q_xy q_uv)^ν + (q_xv q_uy)^ν ) , q_xy := 1/(1 + d(x, y)) , ν ∈ ℕ , (3.2)

which is an immediate consequence of (2.27) with the choices W_A = (w_xu, w_ux), W_B = (w_vy, w_yv), φ(ξ_1, ξ_2) = ξ_1 and ψ(ξ_1, ξ_2) = ξ_1. Indeed, to see the upper bound in (2.7) it suffices to show (3.3) for a constant C > 0, depending on K, and any normalized vector r ∈ C^N, because for any R ∈ C_+ we can use the spectral decomposition R = Σ_i ρ_i r_i r_i* as above. The estimate (3.2) yields (3.4), where we defined the matrix Q^(ν) with entries q^(ν)_xy := q_xy^ν and |r| := (|r_x|)_{x∈X}. Since the pseudometric space X has sub-P-dimensional volume, the row sums of Q^(ν) are uniformly bounded for ν > P, which yields (3.5). The bound (2.13) follows because the right hand side of (3.5) is finite.
We use the notion of stochastic domination, first introduced in [20], that is designed to compare random variables up to N ε -factors on very high probability sets.
Definition 3.2 (Stochastic domination). Let X = X^(N), Y = Y^(N) be sequences of non-negative random variables. We say X is stochastically dominated by Y if for any ε > 0 and ν ∈ ℕ,

P( X^(N) > N^ε Y^(N) ) ≤ C(ε, ν) N^{−ν} , N ∈ ℕ ,

for some (N-independent) family of positive constants C = C(ε, ν). In this case we write X ≺ Y.
In this paper the family C of constants in Definition 3.2 will always be an explicit function of the model parameters (2.29) and possibly some additional parameters that are considered fixed and apparent from the context. However, the constants are always uniform in the spectral parameter ζ on the domain under consideration and in the indices x, y in case X = r_xy is an element of a matrix R = (r_xy)_{x,y}. To use the notion of stochastic domination, we think of H = H^(N) as embedded into a sequence of random matrices with the same model parameters.
The following lemma asserts that the error matrix D from (3.1b) converges to zero as the size N of the random matrix grows to infinity. The convergence holds with respect to the entrywise maximum norm (2.4). Section 5 is devoted to proving this key technical result. In this section we will merely use it in combination with the stability of the MDE, Theorem 2.6, to show that G approaches the solution M of the unperturbed MDE (2.2). We will use the shorthand

Λ(ζ) := ‖G(ζ) − M(ζ)‖_max . (3.6)

Lemma 3.3 (Smallness of error matrix). Let C > 0 and δ, ε > 0 be fixed. Away from the real axis the error matrix D satisfies

‖D(ζ)‖_max ≺ N^{−1/2} , Im ζ ≥ δ , |ζ| ≤ C . (3.7)

Near the real axis and in the regime where the harmonic extension of the density of states is bounded away from zero, i.e. for ρ(ζ) ≥ δ, Im ζ ≥ N^{−1+ε} and |ζ| ≤ C, we have

‖D(ζ)‖_max 1(Λ(ζ) ≤ N^{−ε}) ≺ (N Im ζ)^{−1/2} . (3.8)
Proof of Theorem 2.9. The proof is divided into two steps. The first step is the proof of (2.30). In the second step we show that the improved convergence rate (2.31) for the trace is a consequence of (2.30) and the fluctuation averaging mechanism (introduced in [28] for Wigner matrices), which is explained in more detail in Section 6.
Proof of (2.30): First we conclude from (3.7) and the stability of the MDE, Theorem 2.6, that for spectral parameters with large imaginary part

Λ(ζ) ≺ N^{−1/2} , Im ζ ≥ 1 , |ζ| ≤ C_1 , (3.9)

for any fixed constant C_1 > 0. Now let τ ∈ ℝ, η_0 ∈ [N^{−1+ε}, 1] and ζ_0 = τ + iη_0 ∈ ℍ such that ρ(ζ_0) ≥ δ for some δ ∈ (0, 1]. Note that ρ(ζ_0) ≥ δ and η_0 ≤ 1 imply τ ∈ [−C_1, C_1] for some positive constant C_1, because ρ is the harmonic extension of the density of states, which has compact support in [−κ, κ] (Proposition 2.1). Since in addition the density of states is uniformly Hölder continuous (cf. Proposition 2.2), there is a constant c_1, depending on δ and P, such that

inf_{η ∈ [η_0, 1]} ρ(τ + iη) ≥ c_1 .
Therefore, by (3.8) and (2.17) we infer that

Λ(ζ) 1(Λ(ζ) ≤ N^{−ε}) ≺ (N Im ζ)^{−1/2} , ζ = τ + iη , η ∈ [η_0, 1] . (3.10)

Since N^{−ε/2} ≥ (N Im ζ)^{−1/2}, the inequality (3.10) establishes, on a high probability event, a gap in the possible values that Λ(ζ) can take. The indicator function in (3.10) is absent for ζ = τ + i because of (3.9), i.e. at that point the value lies below the gap. From the Lipschitz continuity of ζ ↦ Λ(ζ), with Lipschitz constant bounded by 2N² for Im ζ ≥ 1/N, and a standard continuity argument together with a union bound (e.g. Lemma A.1 in [4]), we conclude that the values lie below the gap for any ζ with Im ζ ∈ [η_0, 1] with very high probability. Thus (2.30) holds.
Proof of (2.31): Let ζ ∈ ℍ with Im ζ ≥ N^{−1+ε} and ρ(ζ) ≥ δ. We show the improved convergence rate for the normalized trace (1/N) Tr[G − M] = ⟨1, G − M⟩. By Step 1 there is an ε̃ > 0 such that for every ν > 0 we have ‖G − M‖_max ≤ N^{−ε̃} on an event whose complement has probability bounded by N^{−ν}. On this event G coincides with the unique solution of the perturbed MDE (2.16) from Theorem 2.6 that depends analytically on D. We use the representation (2.18) of its derivative to see that

⟨1, G − M⟩ = ⟨1, Z[D] + MD⟩ + O(‖D‖²_max) = ⟨Z*[1] + M*, D⟩ + O(‖D‖²_max) .

Since ‖D‖_max ≺ (N Im ζ)^{−1/2} (cf. (3.8)), we infer that

|⟨1, G − M⟩| ≺ |⟨Z*[1] + M*, D⟩| + (N Im ζ)^{−1} . (3.11)

By (2.19) and (2.15) the entries of Z*[1] + M* have faster than power law decay, i.e. there is a positive sequence α and a constant C_2 > 0, depending only on δ and the model parameters K, such that ‖Z*[1] + M*‖_α ≤ C_2.
Thus we can apply the following proposition.

Proposition 3.4 (Fluctuation averaging). Suppose that the local law holds in the form
$$\|G(\zeta) - M(\zeta)\|_{\max} \;\prec\; \Psi\,.$$
Then $|\langle R, D(\zeta)\rangle| \prec \Psi^2$ for every non-random $R \in \mathbb{C}^{N\times N}$ with faster than power law decay.
The proof of Proposition 3.4 is carried out in Section 6. Applied to the choices $R := \mathcal{Z}^*[\mathbf{1}] + M^*$ and $\Psi := (N\operatorname{Im}\zeta)^{-1/2}$, it upgrades (3.11) to the improved rate (2.31).

Most of the inequalities in this and the following section are uniform in the data pair $(A,\mathcal{S})$ that determines the MDE and its solution, given a fixed set of model parameters $\mathcal{P}_k$ corresponding to the assumptions Ak. We therefore introduce a convention for inequalities up to constants that depend only on the model parameters.

Convention 4.1 (Comparison relations). Suppose a set of model parameters $\mathcal{P}$ is given. Within the proofs we write $C$ and $c$ for generic positive constants depending on $\mathcal{P}$. In particular, $C$ and $c$ may change their values from inequality to inequality. If $C, c$ depend on additional parameters $\mathcal{L}$, we indicate this by writing $C(\mathcal{L})$, $c(\mathcal{L})$. We also use the comparison relation $\alpha \lesssim \beta$, or $\beta \gtrsim \alpha$, for positive $\alpha$ and $\beta$ if there exists a constant $C > 0$ that depends only on $\mathcal{P}$, but is otherwise uniform in the data pair $(A,\mathcal{S})$, such that $\alpha \le C\beta$. In particular, $C$ does not depend on the dimension $N$ or the spectral parameter $\zeta$. In case $\alpha \lesssim \beta \lesssim \alpha$ we write $\alpha \sim \beta$. For two positive semidefinite matrices $R, T$ we similarly write $R \lesssim T$ if the inequality $R \le CT$ holds in the sense of quadratic forms with a constant $C > 0$ depending only on the model parameters.
In the upcoming analysis many quantities depend on the spectral parameter ζ. We will often suppress this dependence in our notation and write e.g. M " Mpζ q, ρ " ρpζ q, etc.
Proof of Proposition 2.1. In this proof we generalize the proof of Proposition 2.1 from [2] to our matrix setup. By taking the imaginary part of both sides of the MDE and using $\operatorname{Im} M \ge 0$ and $A = A^*$ we see that $\operatorname{Im} M \ge \operatorname{Im}\zeta\, M^* M$. In particular, this implies the trivial bound on the solution of the MDE,
$$\|M(\zeta)\| \;\le\; \frac{1}{\operatorname{Im}\zeta}\,. \qquad (4.1)$$
Let $w \in \mathbb{C}^N$ be normalized, $w^* w = 1$. Since $M(\zeta)$ has positive imaginary part, the analytic function $\zeta \mapsto w^* M(\zeta) w$ takes values in $\mathbb{H}$. From the trivial upper bound (4.1) and the MDE itself, we infer the asymptotics $\mathrm{i}\eta\, w^* M(\mathrm{i}\eta) w \to -1$ as $\eta \to \infty$.
By the characterization of Stieltjes transforms of probability measures on the complex upper half plane (cf. Theorem 3.5 in [31]), we infer that $w^* M(\zeta) w$ is the Stieltjes transform of a probability measure $v_w$ on the real line. By polarization, we find the general representation (2.5). We now show that $\operatorname{supp} V \subseteq [-\kappa, \kappa]$, where $\kappa = \|A\| + 2\|\mathcal{S}\|^{1/2}$ (cf. (2.6)). Note that A1 implies $\|\mathcal{S}\| \lesssim 1$. Indeed, let $(\,\cdot\,)_\pm$ denote the positive and negative parts, so that
$$R \;=\; (\operatorname{Re} R)_+ - (\operatorname{Re} R)_- + \mathrm{i}(\operatorname{Im} R)_+ - \mathrm{i}(\operatorname{Im} R)_-$$
holds for any $R \in \mathbb{C}^{N\times N}$. Using this representation and $\|R\|_{\mathrm{hs}} \le \|R\|$, the bound $\|\mathcal{S}\| \lesssim 1$ follows. The following argument will prove that $\operatorname{Im} M(\zeta) \to 0$ as $\operatorname{Im}\zeta \downarrow 0$, locally uniformly for all $\zeta \in \mathbb{H}$ with $|\zeta| > \kappa$. This implies $\operatorname{supp} V \subseteq [-\kappa, \kappa]$.
Let us fix $\zeta \in \mathbb{H}$ with $|\zeta| > \kappa$ and suppose that $M$ satisfies the upper bound (4.3). By taking the inverse and then the norm on both sides of (2.2) we conclude the complementary bound (4.4). Therefore (4.3) implies (4.4), and we see that there is a gap (4.5) in the possible values of $\|M\|$. Since $\zeta \mapsto M(\zeta)$ is a continuous function and for large $\operatorname{Im}\zeta$ the values of this function lie below the gap by the trivial bound (4.1), we infer that $\|M(\zeta)\|$ stays below the gap for all $\zeta \in \mathbb{H}$ with $|\zeta| > \kappa$. Let us now take the imaginary part of the MDE and multiply it with $M^*$ from the left and with $M$ from the right, which yields (4.6). By taking the norm on both sides of (4.6), using a trivial estimate on the right hand side and rearranging the resulting terms, we get (4.7). Here we used $\|M\|^2\|\mathcal{S}\| < 1$, which is satisfied by (4.5) for $|\zeta| > \kappa$. We may estimate the right hand side of (4.7) further by applying (4.5). Thus we find (4.8), whose right hand side converges to zero locally uniformly for all $\zeta \in \mathbb{H}$ with $|\zeta| > \kappa$ as $\operatorname{Im}\zeta \downarrow 0$. This finishes the proof of Proposition 2.1.
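As a concrete illustration of the objects in this proof, the following minimal sketch solves the MDE $-M(\zeta)^{-1} = \zeta\mathbf{1} - A + \mathcal{S}[M(\zeta)]$ numerically by damped fixed-point iteration. The flat choice $\mathcal{S}[R] = \langle R\rangle\mathbf{1}$ with $A = 0$ is an assumption made purely for the example (not the paper's general self-energy); in that case $\langle M(\zeta)\rangle$ reduces to the Stieltjes transform of the semicircle law, which serves as a cross-check.

```python
import numpy as np

def solve_mde(A, S, zeta, tol=1e-12, max_iter=10_000):
    # Fixed-point iteration for the MDE  -M^{-1} = zeta*1 - A + S[M],
    # i.e. M = (A - zeta*1 - S[M])^{-1}; S is a callable R -> S[R].
    N = A.shape[0]
    I = np.eye(N, dtype=complex)
    M = np.linalg.inv(A - zeta * I)          # starting point: the S = 0 solution
    for _ in range(max_iter):
        M_new = 0.5 * (M + np.linalg.inv(A - zeta * I - S(M)))  # damped step
        if np.linalg.norm(M_new - M) < tol:
            return M_new
        M = M_new
    raise RuntimeError("fixed-point iteration did not converge")

N = 50
A = np.zeros((N, N))
S = lambda R: np.trace(R) / N * np.eye(N)    # flat self-energy S[R] = <R> 1
M = solve_mde(A, S, 1j)
m = np.trace(M) / N                          # normalized trace <M(i)>
m_exact = 1j * (np.sqrt(5) - 1) / 2          # semicircle Stieltjes transform at zeta = i
```

The computed solution has positive definite imaginary part, matching the defining property $\operatorname{Im} M(\zeta) > 0$ of the MDE solution.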
The following proposition lists a number of important bounds on M.
Proposition 4.2 (Properties of the solution). Assume A1 and that $\|A\| \le P_0$ for some constant $P_0 > 0$. Then uniformly for all spectral parameters $\zeta \in \mathbb{H}$ the following bounds hold:
(i) The solution is bounded in the spectral norm, $\|M(\zeta)\| \lesssim \big(\rho(\zeta) + \operatorname{dist}(\zeta, \operatorname{supp}\rho)\big)^{-1}$ (4.9).
(ii) The inverse of the solution is bounded in the spectral norm, $\|M(\zeta)^{-1}\| \lesssim 1 + |\zeta|$ (4.10).
(iii) The imaginary part of $M$ is bounded from below and above in terms of the harmonic extension of the density of states,
$$\rho(\zeta)\,\mathbf{1} \;\lesssim\; \operatorname{Im} M(\zeta) \;\lesssim\; (1+|\zeta|^2)\,\|M(\zeta)\|^2 \rho(\zeta)\,\mathbf{1}\,. \qquad (4.11)$$
Proof. The inequalities (4.9) and (4.10) provide upper and lower bounds on the singular values of the solution, respectively. Before proving these bounds we show that $M$ has a bounded normalized Hilbert-Schmidt norm,
$$\|M\|_{\mathrm{hs}} \;\lesssim\; 1\,. \qquad (4.12)$$
For this purpose we take the imaginary part of (2.2) (cf. (4.6)) and find (4.13). The lower bound on $\mathcal{S}$ from (2.7) then implies
$$\operatorname{Im} M \;\gtrsim\; \rho\, M^* M\,, \qquad (4.14)$$
where we used the definition of $\rho$ in (2.10). Taking the normalized trace on both sides of (4.14) shows (4.12).
Proof of (ii): Taking the norm on both sides of (2.2) yields
$$\|M^{-1}\| \;\le\; \|A\| + |\zeta| + \|\mathcal{S}\|_{\mathrm{hs}\to\|\cdot\|}\,\|M\|_{\mathrm{hs}} \;\lesssim\; 1 + |\zeta|\,, \qquad (4.15)$$
where $\|\mathcal{S}\|_{\mathrm{hs}\to\|\cdot\|}$ denotes the norm of $\mathcal{S}$ as an operator from $\mathbb{C}^{N\times N}$ equipped with $\|\cdot\|_{\mathrm{hs}}$ to $\mathbb{C}^{N\times N}$ equipped with $\|\cdot\|$. For the last inequality in (4.15) we used (4.12) and that by A1 we have $\|\mathcal{S}\|_{\mathrm{hs}\to\|\cdot\|} \lesssim 1$ (cf. (4.2)).
Proof of (iii): First we treat the simple case of large spectral parameters, $|\zeta| \ge 1+\kappa$, where $\kappa$ was defined in (2.6). Recall that the matrix valued measure $V(\mathrm{d}\tau)$ (cf. (2.5)) is supported in $[-\kappa,\kappa]$ by Proposition 2.1. The normalization $V(\mathbb{R}) = \mathbf{1}$ implies that for any vector $u \in \mathbb{C}^N$ with $\|u\| = 1$ the function $\zeta \mapsto \frac{1}{\pi}\operatorname{Im}[u^* M(\zeta) u]$ is the harmonic extension of a probability measure with support in $[-\kappa,\kappa]$; hence it behaves as $\frac{1}{\pi}\operatorname{Im}(-\zeta^{-1})$ for large $|\zeta|$. We conclude that $\operatorname{Im} M(\zeta) \sim \rho(\zeta)\,\mathbf{1}$ for $|\zeta| \ge 1+\kappa$. Since for these $\zeta$ we also have $\|M(\zeta)\| \sim |\zeta|^{-1}$ by the Stieltjes transform representation (2.5), we conclude that (4.11) holds in this regime. Now we consider $\zeta \in \mathbb{H}$ with $|\zeta| \le 1+\kappa$. We start with the lower bound on $\operatorname{Im} M$. From (4.14) we see that $\operatorname{Im} M \gtrsim \rho\, M^* M \ge \rho\,\|M^{-1}\|^{-2}\,\mathbf{1}$, and since $\|M^{-1}\| \lesssim 1$ by (ii), the lower bound in (4.11) is proven.
For the upper bound, we take the imaginary part of the MDE (cf. (4.6)) and use A1 together with $\operatorname{Im} M \gtrsim \operatorname{Im}\zeta\,\mathbf{1}$, which holds by the Stieltjes transform representation (2.5); this gives the upper bound in (4.11). Proof of (i): In the regime $|\zeta| \ge 1+\kappa$ the bound (4.9) follows from the Stieltjes transform representation (2.5). Thus we consider $|\zeta| \le 1+\kappa$. We take the imaginary part on both sides of (2.2) and use the lower bound in (4.11) and $\mathcal{S}[\mathbf{1}] \gtrsim \mathbf{1}$ to get
$$\operatorname{Im} M(\zeta)^{-1} \;\ge\; \mathcal{S}[\operatorname{Im} M(\zeta)] \;\gtrsim\; \rho(\zeta)\,\mathbf{1}\,.$$
Since in general $\operatorname{Im} R^{-1} \ge \mathbf{1}$ implies $\|R\| \le 1$ for any $R \in \mathbb{C}^{N\times N}$, we infer that $\|M(\zeta)\| \lesssim \rho(\zeta)^{-1}$. On the other hand, $\|M(\zeta)\| \le \operatorname{dist}(\zeta, \operatorname{supp}\rho)^{-1}$ follows from (2.5) again.
Proof of Theorem 2.5. Recall the model parameters $\pi_1, \pi_2$ from A2. We consider the MDE (2.2) entrywise and see that the entries of $M^{-1}$ have faster than power law decay, where we used (2.12), (2.13) and $\|M\|_{\max} \le \|M\|$. By (4.9) and $\zeta \in \mathbb{D}_\delta$ we have $\|M\| \lesssim \frac{1}{\delta}$. Furthermore, for large $|\zeta|$ we also have $\|M\| \lesssim 1/|\zeta|$. We can now apply Lemma A.2 from the appendix with the choice $R := \|M\|\, M^{-1}$ to see the existence of a positive sequence $\gamma$ such that (2.15) holds. This finishes the proof of Theorem 2.5.

Stability of the Matrix Dyson Equation
The goal of this section is to prove Proposition 2.2 and the stability of the MDE, Theorem 2.6. The main technical result needed for these proofs is the linear stability of the MDE. For its statement we introduce, for any $R \in \mathbb{C}^{N\times N}$, the sandwiching operator
$$\mathcal{C}_R[T] \;:=\; R\,T\,R\,. \qquad (4.17)$$
Note that $\mathcal{C}_R^{-1} = \mathcal{C}_{R^{-1}}$ and $\mathcal{C}_R^* = \mathcal{C}_{R^*}$ for any invertible $R \in \mathbb{C}^{N\times N}$, where $\mathcal{C}_R^*$ denotes the adjoint with respect to the scalar product (2.1).
Proposition 4.3 (Linear stability of the MDE). There exists a universal numerical constant $C > 0$ such that the bound (4.18) on $\|(\mathrm{Id} - \mathcal{C}_{M(\zeta)}\mathcal{S})^{-1}\|_{\mathrm{sp}}$ holds uniformly for all $\zeta \in \mathbb{H}$.
Before we prove a few technical results that prepare the proof of Proposition 4.3, we give a heuristic argument that explains the connection between this proposition and the linear stability of the MDE. Let us suppose that the perturbed MDE (4.19) with perturbation matrix $D$ has a unique solution $G(D)$, depending differentiably on $D$.
Then by differentiating both sides of (4.19) with respect to $D$, setting $D = 0$ and using the MDE for $M(\zeta) = G(0)$, we find (4.20), where $\nabla_R$ denotes the directional derivative with respect to $D$ in the direction $R \in \mathbb{C}^{N\times N}$. Rearranging the terms in (4.20) and multiplying with $M = M(\zeta)$ from the left yields (4.21), which expresses $\nabla_R G(0)$ as $(\mathrm{Id} - \mathcal{C}_M\mathcal{S})^{-1}$ applied to $MRM$. Thus $G(D)$ has a bounded derivative at $D = 0$, i.e. the MDE is stable with respect to the perturbation $D$ to linear order, whenever the operator $\mathrm{Id} - \mathcal{C}_M\mathcal{S}$ is invertible and its inverse is bounded.
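The heuristic can be tested numerically in the scalar case $N = 1$ with $A = 0$ and $\mathcal{S}[R] = R$, where $\mathrm{Id} - \mathcal{C}_M\mathcal{S}$ reduces to multiplication by $1 - m^2$. The sign convention of the perturbed equation below, $-1/g = \zeta + g - d$, is an assumption made for this sketch; the point is only that the finite-difference derivative of the solution matches the linearization.

```python
import numpy as np

def solve_scalar(zeta, d=0.0, tol=1e-14):
    # Scalar perturbed MDE: -1/g = zeta + g - d  (A = 0, S[R] = R),
    # solved by damped fixed-point iteration g <- (g + 1/(d - zeta - g))/2.
    g = -1 / zeta
    for _ in range(100_000):
        g_new = 0.5 * (g + 1 / (d - zeta - g))
        if abs(g_new - g) < tol:
            return g_new
        g = g_new
    raise RuntimeError("no convergence")

zeta = 2j
m = solve_scalar(zeta)                  # unperturbed solution m(zeta)
d = 1e-6
fd = (solve_scalar(zeta, d) - m) / d    # finite-difference derivative at d = 0
lin = m**2 / (m**2 - 1)                 # derivative predicted by linearizing at d = 0
```

Here $\mathrm{d}g/\mathrm{d}d = g^2/(g^2-1)$ follows by implicit differentiation of $1/g + \zeta + g - d = 0$, the scalar instance of inverting $\mathrm{Id} - \mathcal{C}_M\mathcal{S}$.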
The following definition will play a crucial role in the upcoming analysis.
Definition 4.4 (Saturated self-energy operator). Let $M = M(\zeta)$ be the solution of the MDE at some spectral parameter $\zeta \in \mathbb{H}$. We define the linear operator $\mathcal{F} = \mathcal{F}(\zeta)$ on $\mathbb{C}^{N\times N}$ by
$$\mathcal{F} \;:=\; \mathcal{C}_W\,\mathcal{C}_{\sqrt{\operatorname{Im} M}}\;\mathcal{S}\;\mathcal{C}_{\sqrt{\operatorname{Im} M}}\,\mathcal{C}_W\,, \qquad (4.22a)$$
where we have introduced the auxiliary matrix
$$W \;:=\; \Big|\,\mathcal{C}^{-1}_{\sqrt{\operatorname{Im} M}}[\operatorname{Re} M] - \mathrm{i}\mathbf{1}\,\Big|^{1/2}\,. \qquad (4.22b)$$
We call $\mathcal{F}$ the saturated self-energy operator, or the saturation of $\mathcal{S}$ for short.
The operator $\mathcal{F}$ inherits from the self-energy operator $\mathcal{S}$ the self-adjointness with respect to (2.1) and the property of mapping $\overline{\mathbb{C}}_+$ to itself. We now briefly discuss the reason for introducing $\mathcal{F}$. In order to invert $\mathrm{Id} - \mathcal{C}_M\mathcal{S}$ in (4.21) we have to show that $\mathcal{C}_M\mathcal{S}$ is dominated by $\mathrm{Id}$ in some sense. Neither $\mathcal{S}$ nor $M$ can be directly related to the identity operator, but their specific combination $\mathcal{C}_M\mathcal{S}$ can. We extract this delicate information from the MDE via a Perron-Frobenius argument. Unfortunately, $\mathcal{C}_M\mathcal{S}$ is neither positivity preserving nor symmetric. The key is to find an appropriate symmetrization of this operator before Perron-Frobenius is applied. A similar problem appeared in a simpler, commutative setting in [2]. There, $M = \operatorname{diag}(m)$ was a diagonal matrix and the MDE became a vector equation. In this case the problem of inverting $\mathrm{Id} - \mathcal{C}_M\mathcal{S}$ reduces to inverting a matrix $1 - \operatorname{diag}(m)^2 S$, where $S \in \mathbb{R}^{N\times N}$ is a matrix with non-negative entries that plays the role of the self-energy operator $\mathcal{S}$ in the current setup. The idea in [2] was to write the decomposition (4.23) with invertible diagonal matrices $R$ and $T$, a diagonal unitary matrix $U$ and a self-adjoint $F$ with positive entries that satisfies the bound $\|F\| \le 1$. It is then possible to see that $U - F$ is invertible as long as $U$ does not leave the Perron-Frobenius eigenvector of $F$ invariant. In this commutative setting it is possible to choose $F = \operatorname{diag}(|m|)\,S\,\operatorname{diag}(|m|)$, where the absolute value is taken in each component. In our current setting we will achieve a decomposition similar to (4.23) on the level of operators acting on $\mathbb{C}^{N\times N}$ (cf. (4.38) below). The definition (4.22) ensures that the saturation $\mathcal{F}$ is self-adjoint, positivity preserving and satisfies $\|\mathcal{F}\|_{\mathrm{sp}} \le 1$, as we will establish later.
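The Perron-Frobenius mechanism invoked here can be seen numerically: a self-adjoint, positivity-preserving operator on $\mathbb{C}^{N\times N}$ has a positive semidefinite eigenmatrix for its top eigenvalue, and power iteration started inside the cone finds it. The map below, $R \mapsto \sum_i K_i R K_i$ with self-adjoint $K_i$, is a generic stand-in chosen for the sketch, not the paper's saturation $\mathcal{F}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 6, 3
# Self-adjoint "Kraus" matrices give a self-adjoint, positivity-preserving map
Ks = [(B + B.conj().T) / 2 for B in rng.standard_normal((k, N, N))]

def F(R):
    # R -> sum_i K_i R K_i preserves the cone of positive semidefinite matrices
    # and is self-adjoint with respect to the Hilbert-Schmidt scalar product.
    return sum(K @ R @ K for K in Ks)

# Power iteration on the cone: the iterates stay positive semidefinite
R = np.eye(N, dtype=complex)
for _ in range(2000):
    R = F(R)
    R /= np.linalg.norm(R)          # normalize in the Hilbert-Schmidt (Frobenius) norm

top = np.vdot(R, F(R)).real         # Rayleigh quotient <R, F[R]> at the top eigenvalue
eigs = np.linalg.eigvalsh((R + R.conj().T) / 2)
```

The eigenvalues of the limiting eigenmatrix are non-negative, mirroring the statement $F \in \overline{\mathbb{C}}_+$ in Lemma 4.6 below.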
Lemma 4.5 (Bounds on $W$). Assume A1 and $\|A\| \le P_0$ for some constant $P_0 > 0$. Then, uniformly for all spectral parameters $\zeta \in \mathbb{H}$ with $|\zeta| \le 3(1+\kappa)$, the matrix $W = W(\zeta) \in \overline{\mathbb{C}}_+$, defined in (4.22b), fulfils the bounds (4.24). Proof. From the definition (4.22b) we immediately get
$$W^4 \;=\; \big(\mathcal{C}^{-1}_{\sqrt{\operatorname{Im} M}}[\operatorname{Re} M]\big)^2 + \mathbf{1}\,.$$
We estimate $(\operatorname{Im} M)^{-1}$ from above and below by employing (4.11) in the regime $|\zeta| \lesssim 1$. Using the trivial bounds $2\|M^{-1}\|^{-2}\,\mathbf{1} \le M^* M + M M^* \le 2\|M\|^2\,\mathbf{1}$ and $\|M^{-1}\| \lesssim 1$ from (4.10), as well as (4.11) again, we find two-sided bounds on $W^4$. This is equivalent to (4.24).
Lemma 4.6 (Spectrum of $\mathcal{F}$). Assume A1 and $\|A\| \le P_0$ for some constant $P_0 > 0$. Then the saturated self-energy operator $\mathcal{F} = \mathcal{F}(\zeta)$, defined in (4.22), has a unique normalized, $\|F\|_{\mathrm{hs}} = 1$, eigenmatrix $F = F(\zeta) \in \overline{\mathbb{C}}_+$ corresponding to its largest eigenvalue,
$$\mathcal{F}[F] \;=\; \|\mathcal{F}\|_{\mathrm{sp}}\,F\,. \qquad (4.25)$$
Furthermore, the following properties hold uniformly for all spectral parameters $\zeta \in \mathbb{H}$ such that $|\zeta| \le 3(1+\kappa)$ and $\|\mathcal{F}(\zeta)\|_{\mathrm{sp}} \ge 1/2$:
(i) The spectral radius of $\mathcal{F}$ is given by (4.26).
(ii) The eigenmatrix $F$ is bounded from above and below as in (4.27).
(iii) The saturation $\mathcal{F}$ has a uniform spectral gap, (4.28).
Proof. Since $\mathcal{F}$ preserves the cone $\overline{\mathbb{C}}_+$ of positive semidefinite matrices, a version of the Perron-Frobenius theorem for cone preserving operators implies that there exists a normalized $F \in \overline{\mathbb{C}}_+$ such that (4.25) holds. We will show uniqueness of this eigenmatrix later in the proof. First we prove that (4.26) holds for any such $F$.
Proof of (i): For any matrix $R \in \mathbb{C}^{N\times N}$ we define the operator $\mathcal{K}_R$ in (4.29); note that for self-adjoint $R \in \mathbb{C}^{N\times N}$ we have $\mathcal{K}_R = \mathcal{C}_R$ (cf. (4.17)). Using the definition (4.29), the imaginary part of the MDE (4.6) can be written in the form (4.30). We will now rewrite equation (4.30) in terms of $\operatorname{Im} M$, $F$ and $W$. In order to express $M$ in terms of $W$, we introduce in (4.31) a unitary matrix $U$ via the spectral calculus of the self-adjoint matrix $\mathcal{C}^{-1}_{\sqrt{\operatorname{Im} M}}[\operatorname{Re} M]$. With (4.31) and the definition $W = |\mathcal{C}^{-1}_{\sqrt{\operatorname{Im} M}}[\operatorname{Re} M] - \mathrm{i}\mathbf{1}|^{1/2}$ from (4.22b) we may write $M$ as
$$M \;=\; \sqrt{\operatorname{Im} M}\;U W^2\,\sqrt{\operatorname{Im} M}\,. \qquad (4.32)$$
Here the matrices $W$ and $U$ commute. Using (4.32) we also find an expression (4.33) for $\mathcal{K}_M$. Plugging (4.33) into (4.30) and applying the inverse of $\mathcal{C}_{\sqrt{\operatorname{Im} M}}\,\mathcal{C}_W\,\mathcal{K}_{U^*}$ on both sides, we end up with (4.34), where we used the definition of $\mathcal{F}$ from (4.22) and $\mathcal{K}^{-1}_{U^*}[W^{-2}] = W^{-2}$, which holds because $U$ and $W$ commute. We project both sides of (4.34) onto the eigenmatrix $F$ of $\mathcal{F}$. Since $\mathcal{F}$ is self-adjoint with respect to the scalar product (2.1), the eigenmatrix equation (4.25) yields an identity which, solved for $\|\mathcal{F}\|_{\mathrm{sp}}$, gives (4.26).
Proof of (ii) and (iii): Let $\zeta \in \mathbb{H}$ with $|\zeta| \le 3(1+\kappa)$ and $\|\mathcal{F}(\zeta)\|_{\mathrm{sp}} \ge 1/2$. The bounds on the eigenmatrix (4.27) and on the spectral gap (4.28) are a consequence of the following property of $\mathcal{F}$:
• For all matrices $R \in \overline{\mathbb{C}}_+$ the operator $\mathcal{F}$ satisfies the two-sided bound (4.35).
We verify (4.35) below. Knowing (4.35), the remaining assertions (4.27) and (4.28) of Lemma 4.6 follow from a technical result, Lemma A.3, in the appendix. This lemma shows the uniqueness of the eigenmatrix $F$ as well. In the regime $|\zeta| \ge 3(1+\kappa)$ the constants hidden in the comparison relation of (4.35) will depend on $|\zeta|$, but otherwise the upcoming arguments are not affected. In particular, the qualitative property of having a unique eigenmatrix $F$ remains true even for large values of $|\zeta|$.
Proof of (4.35): The bounds in (4.35) are a consequence of Assumption A1 and the bounds (4.24) on $W$ and (4.11) on $\operatorname{Im} M$, respectively. Indeed, from A1 we have $\mathcal{S}[R] \sim \langle R\rangle\,\mathbf{1}$ for positive semidefinite matrices $R$. By the definition (4.22a) of $\mathcal{F}$ this immediately yields a two-sided bound on $\mathcal{F}[R]$. Since (4.24) and (4.11) control the sandwiching matrices $W$ and $\sqrt{\operatorname{Im} M}$ from above and below, we conclude that (4.35) holds.
Proof of Proposition 4.3. To show (4.18) we consider the regimes of large and small values of $|\zeta|$ separately. We start with the simpler regime, $|\zeta| \ge 3(1+\kappa)$. In this case we apply the bound $\|M(\zeta)\| \le (|\zeta|-\kappa)^{-1}$, which is an immediate consequence of the Stieltjes transform representation (2.5) of $M$.
In particular,
$$\|\mathcal{C}_M\mathcal{S}\|_{\mathrm{sp}} \;\le\; \|M\|^2\,\|\mathcal{S}\| \;\le\; \frac{\|\mathcal{S}\|}{(|\zeta|-\kappa)^2} \;\le\; \frac{1}{4}\,,$$
where we used $\kappa \ge \|\mathcal{S}\|^{1/2}$ in the last and second to last inequalities. We also used that $\|T\|_{\mathrm{sp}} \le \|T\|$ for any self-adjoint $T \in \mathbb{C}^{N\times N}$. The claim (4.18) hence follows in the regime of large $|\zeta|$ by expanding the geometric series for $(\mathrm{Id} - \mathcal{C}_M\mathcal{S})^{-1}$.
Now we consider the regime $|\zeta| \le 3(1+\kappa)$. Here we use the spectral properties of the saturated self-energy operator $\mathcal{F}$ established in Lemma 4.6. First we rewrite $\mathrm{Id} - \mathcal{C}_{M(\zeta)}\mathcal{S}$ in terms of $\mathcal{F}$. For this purpose we recall the definition of $U$ from (4.31). With the identity (4.32) we find (4.37). Combining (4.37) with the definition of $\mathcal{F}$ from (4.22a), we verify the factorization (4.38) of $\mathrm{Id} - \mathcal{C}_M\mathcal{S}$ in terms of $\mathcal{C}_U - \mathcal{F}$. The bounds (4.24) on $W$ and (4.11) on $\operatorname{Im} M$ imply bounds on $\mathcal{C}_W$ and $\mathcal{C}_{\sqrt{\operatorname{Im} M}}$, respectively; in fact, in the regime of bounded $|\zeta|$ we have (4.39). Therefore, taking the inverse and then the norm $\|\cdot\|_{\mathrm{sp}}$ on both sides of (4.38) and using (4.39), as well as $\|\mathcal{C}_T\|_{\mathrm{sp}} \le \|\mathcal{C}_T\|$ for self-adjoint $T \in \mathbb{C}^{N\times N}$, yields (4.40). Note that $\mathcal{C}_U$ and $\mathcal{C}_{U^*}$ are unitary operators on $\mathbb{C}^{N\times N}$ and thus $\|\mathcal{C}_U\|_{\mathrm{sp}} = \|\mathcal{C}_{U^*}\|_{\mathrm{sp}} = 1$.
It remains to estimate the norm of the inverse of $\mathcal{C}_U - \mathcal{F}$. In case $\|\mathcal{F}\|_{\mathrm{sp}} < 1/2$ we simply use the bound $\|(\mathcal{C}_U - \mathcal{F})^{-1}\|_{\mathrm{sp}} \le 2$ in (4.40) and (4.9) for estimating $\|M\|$, thus verifying (4.18) in this case. If $\|\mathcal{F}\|_{\mathrm{sp}} \ge 1/2$, we apply the following lemma, which was stated as Lemma 5.6 in [2].
Lemma 4.7 (Rotation-Inversion Lemma). Let $\mathcal{T}$ be a self-adjoint and $\mathcal{U}$ a unitary operator on $\mathbb{C}^{N\times N}$. Suppose that $\mathcal{T}$ has a spectral gap, i.e., there is a constant $\operatorname{Gap}(\mathcal{T}) > 0$ separating its largest eigenvalue from the rest of its spectrum, with the largest eigenvalue non-degenerate and $\|\mathcal{T}\|_{\mathrm{sp}} \le 1$. Then there exists a universal positive constant $C$ such that the inverse of $\mathcal{U} - \mathcal{T}$ is bounded in terms of $\operatorname{Gap}(\mathcal{T})$, $\|\mathcal{T}\|_{\mathrm{sp}}$ and $\langle T, \mathcal{U}[T]\rangle$, where $T$ is the normalized, $\|T\|_{\mathrm{hs}} = 1$, eigenmatrix of $\mathcal{T}$ corresponding to $\|\mathcal{T}\|_{\mathrm{sp}}$.
With the lower bound (4.28) on the spectral gap of $\mathcal{F}$, we find (4.41). Plugging (4.41) into (4.40) and using (4.9) to estimate $\|M\|$ shows (4.18), provided the denominator on the right hand side of (4.41) satisfies the lower bound (4.42) for some universal constant $C > 0$.
In the remainder of this proof we verify (4.42). We establish lower bounds on both arguments of the maximum in (4.42) and combine them afterwards. We start with a lower bound on $1 - \|\mathcal{F}\|_{\mathrm{sp}}$. Estimating the numerator of the fraction on the right hand side of (4.26) from below,
$$\langle F, \mathcal{C}_W[\operatorname{Im} M]\rangle \;\gtrsim\; \rho\,\langle F, W^2\rangle \;\gtrsim\; \|M\|^{-2}\langle F\rangle\,,$$
and its denominator from above,
$$\langle F, W^{-2}\rangle \;\lesssim\; \rho\,\|M\|^2\langle F\rangle\,,$$
by applying the bounds from (4.24) and (4.11), we arrive at (4.43). Since $\rho(\zeta)$ is the harmonic extension of a probability density (namely the density of states $\rho$), we have the trivial upper bound $\rho(\zeta) \lesssim \operatorname{Im}\zeta / \operatorname{dist}(\zeta, \operatorname{supp}\rho)^2$. Continuing from (4.43), we find the lower bound (4.44), where we used (4.9) in the second inequality. Now we estimate $|1 - \langle F, \mathcal{C}_U[F]\rangle|$ from below. We begin with (4.45), where $1 - \langle F, \mathcal{C}_{\operatorname{Re} U}[F]\rangle \ge 0$ in the last inequality, because $U$ is unitary and $\|F\|_{\mathrm{hs}} = 1$. Since $\operatorname{Im} U = -W^{-2}$ (cf. (4.31) and (4.22)), and because of (4.24), the contribution of the imaginary part of $U$ is bounded from below. Continuing from (4.45) and using the normalization $\|F\|_{\mathrm{hs}} = 1$, we get the lower bound (4.46); here (4.9) was used again. Combining (4.46) with (4.44) shows (4.42) and thus finishes the proof of Proposition 4.3.
Proof of Proposition 2.2. We show that the harmonic extension $\rho(\zeta)$ of the density of states (cf. (2.10)) is uniformly $c$-Hölder continuous on the entire complex upper half plane. Thus its unique continuous extension to the real line, the density of states, inherits this regularity. We differentiate both sides of the MDE with respect to $\zeta$ and find the equation (4.47). Inverting the operator $\mathrm{Id} - \mathcal{C}_M\mathcal{S}$ and taking the normalized Hilbert-Schmidt norm reveals the bound (4.48) on the derivative of the solution to the MDE. Since $\zeta \mapsto \langle M(\zeta)\rangle$ is an analytic function on $\mathbb{H}$, we have the basic identity $2\pi\mathrm{i}\,\partial_\zeta\rho = 2\mathrm{i}\,\partial_\zeta\operatorname{Im}\langle M\rangle = \partial_\zeta\langle M\rangle$. Therefore, making use of (4.48), we get (4.49). For the last inequality in (4.49) we employed the bound (4.9) and the linear stability, Proposition 4.3; the universal constant $C$ stems from its statement (4.18). From (4.49) we read off that the harmonic extension $\rho$ of the density of states is $\frac{1}{C+3}$-Hölder continuous. It remains to prove that $\rho$ is real analytic at any $\tau_0$ with $\rho(\tau_0) > 0$. Since $\rho$ is continuous, it is bounded away from zero in a neighborhood of $\tau_0$. Using (4.47), (4.9) and (4.18) we conclude that $M$ is uniformly continuous on the intersection of a small neighborhood of $\tau_0$ in $\mathbb{C}$ with the complex upper half plane. In particular, $M$ has a unique continuous extension $M(\tau_0)$ to $\tau_0$. Furthermore, by differentiating (2.2) with respect to $\zeta$, and by the uniqueness of the solution to (2.2) with positive imaginary part, one verifies that $M$ coincides with the solution $Q$ of the holomorphic initial value problem (4.50), i.e. $M(\tau_0+\omega) = Q(\omega)$ for any $\omega \in \mathbb{H}$ with sufficiently small absolute value. Since the solution $Q$ is analytic in a small neighborhood of zero, we conclude that $M$ can be holomorphically extended to a neighborhood of $\tau_0$ in $\mathbb{C}$. By continuity, (2.10) remains true for $\zeta \in \mathbb{R}$ close to $\tau_0$, and thus $\rho$ is real analytic there.
In the proof of Theorem 2.6 we will often consider $\mathcal{T}: (\mathbb{C}^{N\times N}, \|\cdot\|_A) \to (\mathbb{C}^{N\times N}, \|\cdot\|_B)$, i.e., $\mathcal{T}$ is a linear operator on $\mathbb{C}^{N\times N}$ equipped with two different norms. We indicate the norms in the notation of the corresponding induced operator norm, $\|\mathcal{T}\|_{A\to B}$. We will use $A, B = \mathrm{hs}, \|\cdot\|, 1, \infty, \max$, etc. We still keep our convention that $\|\mathcal{T}\|_{\mathrm{sp}} = \|\mathcal{T}\|_{\mathrm{hs}\to\mathrm{hs}}$ and $\|\mathcal{T}\| = \|\mathcal{T}\|_{\|\cdot\|\to\|\cdot\|}$. Some of the norms on matrices $R \in \mathbb{C}^{N\times N}$ are ordered, e.g.
$$\|R\|_{\max} \;\le\; \|R\| \;\le\; \|R\|_{1\vee\infty} \;:=\; \|R\|_1 \vee \|R\|_\infty\,,$$
with $\|\cdot\|_1$ and $\|\cdot\|_\infty$ denoting the operator norms induced by the $\ell^1$- and $\ell^\infty$-norms on $\mathbb{C}^N$, respectively. In particular, for $\mathcal{T}: \mathbb{C}^{N\times N} \to \mathbb{C}^{N\times N}$ we have, e.g., $\|\mathcal{T}\|_{\max\to\|\cdot\|} \le \|\mathcal{T}\|_{\max\to 1\vee\infty}$.
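The ordering of matrix norms recorded above is easily checked numerically; the following sketch verifies $\|R\|_{\max} \le \|R\| \le \|R\|_{1\vee\infty}$ on random complex matrices, with $\|R\|_1$ and $\|R\|_\infty$ computed as the maximal column and row sums.

```python
import numpy as np

rng = np.random.default_rng(1)

def three_norms(R):
    n_max = np.abs(R).max()               # entrywise maximum norm ||R||_max
    n_op = np.linalg.norm(R, 2)           # operator norm ||R|| induced by ell^2
    n_1 = np.abs(R).sum(axis=0).max()     # ||R||_1: maximal column sum
    n_inf = np.abs(R).sum(axis=1).max()   # ||R||_infty: maximal row sum
    return n_max, n_op, max(n_1, n_inf)

ok = True
for _ in range(100):
    R = rng.standard_normal((20, 20)) + 1j * rng.standard_normal((20, 20))
    n_max, n_op, n_1v_inf = three_norms(R)
    ok = ok and n_max <= n_op + 1e-9 and n_op <= n_1v_inf + 1e-9
```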
Proof of Theorem 2.6. We will apply a quantitative version of the implicit function theorem, Lemma A.4. For this purpose we define in (4.52) a function $J$ of the pair $(G, D)$ such that the perturbed MDE (2.16) takes the form $J[G, D] = 0$. For the application of the implicit function theorem we control the derivatives of $J$ with respect to $G$ and $D$, using the shorthand notation (4.53) for the operator $\mathcal{W}_R$. For the second identity in (4.54) we used (2.2). The derivative with respect to $D$ is simply the identity operator; therefore, estimating $\nabla^{(D)} J$ is trivial. We consider $\mathbb{C}^{N\times N}$ with the entrywise maximum norm $\|\cdot\|_{\max}$ and use the shorthand notation $\|\mathcal{T}\|_{\max} := \|\mathcal{T}\|_{\max\to\max}$ for the induced operator norm of any linear $\mathcal{T}: \mathbb{C}^{N\times N} \to \mathbb{C}^{N\times N}$. To apply Lemma A.4 we need the following two estimates:
(i) The operator norm of the inverse of $\mathrm{Id} - \mathcal{C}_M\mathcal{S}$ on $(\mathbb{C}^{N\times N}, \|\cdot\|_{\max})$ is controlled by its spectral norm, (4.55).
(ii) The operator $\mathcal{W}_R$ from (4.53) satisfies the bound (4.56).
We will prove these estimates after we have used them to show that the hypotheses of the quantitative implicit function theorem hold. Let us first bound the operator $R \mapsto \nabla^{(G)}_R J[M, 0]$. To this end, using (4.54), we obtain (4.57) for an arbitrary $R$; for the last line we used $\|MR\|_{\max} \le \|M\|_{1\vee\infty}\|R\|_{\max}$. By Theorem 2.5 there is a sequence $\gamma$, depending only on $\delta$ and $\mathcal{P}$, such that (4.58) holds. Here and in the following, unrestricted summations $\sum_x$ are understood to run over the entire index set from $1$ to $N$. Since the sizes of the balls with respect to $d$ grow only polynomially in their radii (cf. (2.11)), the right hand side of (4.58) is bounded by a constant that depends only on $\delta$ and $\mathcal{P}$ for a sufficiently large choice of $\nu$. Using this estimate together with the bound (i) for the inverse of $\mathrm{Id} - \mathcal{C}_M\mathcal{S}$ in (4.57) yields $\|(\nabla^{(G)}_R J[M, 0])^{-1}\|_{\max} \lesssim 1$. Next, to verify assumption (A.53) of Lemma A.4, we write (4.59). Using (4.56) and (4.55), in conjunction with (4.9) and (4.18), we see that the relevant norms in (4.59) are sufficiently small. The first part of Theorem 2.6, the existence and uniqueness of the analytic function $G$, now follows from the implicit function theorem, Lemma A.4.
Proof of (i): First we remark that (2.13) for a large enough $\nu$, together with (2.11), implies
$$\|\mathcal{S}\|_{\max\to 1\vee\infty} \;\lesssim\; 1\,. \qquad (4.61)$$
We expand the geometric series corresponding to the operator $(\mathrm{Id} - \mathcal{C}_M\mathcal{S})^{-1}$ to second order, (4.62). We consider each of the three terms on the right hand side separately and estimate their norms as operators from $\mathbb{C}^{N\times N}$ with the entrywise maximum norm to itself. The easiest is $\|\mathrm{Id}\|_{\max} = 1$. For the second term we use the estimate (4.63). For the third term on the right hand side of (4.62) we apply (4.64). The last factor on the right hand side of (4.64) is bounded by (4.65); for the first factor we use (4.66). We plug (4.66) and (4.65) into (4.64). Then we use the resulting inequality in combination with (4.63) in (4.62) and find (4.67), where we also used $\|\mathcal{S}\|_{\max\to\|\cdot\|} \le \|\mathcal{S}\|_{\max\to 1\vee\infty}$ and (4.61). Since $\|\mathcal{C}_M\| \le \|M\|^2$, the claim (4.55) follows.
Proof of (ii): Recall the definition of $\mathcal{W}_R$ in (4.53). We estimate its $\max$-norm directly; from the bound (4.61) we infer (4.56).
Proof of (2.19): Recall from (2.18) that the derivative of $G$ is expressed through a linear operator $\mathcal{Z}$, which is given by (4.68) for all $R \in \mathbb{C}^{N\times N}$. We will estimate the entries of its three summands separately. We show that $\|\mathcal{Z}[R]\|_\gamma \le \frac{1}{2}$ for any $R \in \mathbb{C}^{N\times N}$ with $\|R\|_{\max} \le 1$, where $\gamma$ depends only on $\delta$ and $\mathcal{P}$. We begin with a few easy observations. For two matrices $R, T \in \mathbb{C}^{N\times N}$ that have faster than power law decay, $\|R\|_{\gamma_R} \le 1$ and $\|T\|_{\gamma_T} \le 1$, their sum and product have faster than power law decay as well, i.e., $\|R+T\|_{\gamma_{R+T}} \le 1$ and $\|RT\|_{\gamma_{RT}} \le 1$.
Here, $\gamma_{R+T}$ and $\gamma_{RT}$ depend only on $\gamma_R$, $\gamma_T$ and $\mathcal{P}$ (cf. (2.11)). Furthermore, we see that by (2.13) the matrix $\mathcal{S}[R]$ has faster than power law decay for any $R \in \mathbb{C}^{N\times N}$ with $\|R\|_{\max} \le 1$.
By the following argument we estimate the first summand on the right hand side of (4.68). Using (2.13), $\|MR\|_{\max} \le \|M\|_{1\vee\infty}\|R\|_{\max}$ and the estimate (4.58), the matrix $\mathcal{S}[MR]$ has faster than power law decay. Since $\mathcal{C}_M$ multiplies with $M$ on both sides (cf. (4.17)) and $M$ has faster than power law decay (cf. Theorem 2.5), we conclude that so does $\mathcal{C}_M\mathcal{S}[MR]$.
Now we turn to the second summand on the right hand side of (4.68). Since $\mathcal{C}_M\mathcal{S}[MR]$ has faster than power law decay, its entries are bounded. Using (2.13) again as above, we see that $\mathcal{C}_M\mathcal{S}$ applied to $\mathcal{C}_M\mathcal{S}[MR]$ has faster than power law decay as well.
Finally, we estimate the third summand from (4.68). Since the matrix $\mathcal{C}_M\mathcal{S}[MR]$ has faster than power law decay, its $\|\cdot\|_{\mathrm{hs}}$-norm is bounded. By (4.18) and $\zeta \in \mathbb{D}_\delta$, we conclude that the relevant $\|\cdot\|_{\mathrm{hs}}$-norm is bounded by (4.61) and (4.58). Therefore, the third term on the right hand side of (4.68) is an application of $\mathcal{C}_M\mathcal{S}$ to a matrix with bounded entries, which results in a matrix with faster than power law decay. Altogether we have established that (2.19) holds with only $\|\mathcal{Z}[R]\|_\gamma$ on the left hand side. It remains to show that $\mathcal{Z}^*[R]$ satisfies this bound as well. Since $\mathcal{Z}^*$ has a structure that resembles the structure (4.68) of $\mathcal{Z}$, namely (4.69), we can follow the same line of reasoning as for the entries of $\mathcal{Z}[R]$. This finishes the proof of (2.19) and with it the proof of Theorem 2.6.

Estimating the error term
In this section we prove the key estimates, stated precisely in Lemmas 3.3 and 5.1, for the error matrix $D$ that appears as the perturbation in the equation (3.1) for the resolvent $G$. We start by estimating $D$ away from the convex hull of $\operatorname{supp}\rho$ in terms of $\Lambda$, defined in (3.6). To this end, we define
$$\kappa_- := \min \operatorname{supp}\rho\,, \qquad \kappa_+ := \max \operatorname{supp}\rho\,,$$
the two endpoints of this convex hull.
Convention 5.2. Throughout this section we use Convention 4.1 with the set of model parameters $\mathcal{P}$ replaced by the set $\mathcal{K}$ from (2.29). If the constant $C$, hidden in the comparison relation, depends on additional parameters $\mathcal{L}$, then we write $\alpha \lesssim_{\mathcal{L}} \beta$.
We rewrite the entries $d_{xy}$ of $D$ in a form that allows us to see their smallness, by expanding the term $(H-A)G$ (cf. (3.1b)) in neighborhoods of $x$ and $y$. For any $B \subseteq \{1, \dots, N\}$ we introduce the matrix $H^B$, obtained from $H$ by setting the rows and the columns labeled by the elements of $B$ equal to zero. The corresponding resolvent is
$$G^B \;:=\; (H^B - \zeta\mathbf{1})^{-1}\,. \qquad (5.4)$$
With this definition we have the resolvent expansion formula (5.5); in particular, for any $y = 1, \dots, N$ the rows of $G$ outside $B$ admit such an expansion. Here we introduced, for any two index sets $A, B \subseteq \mathbb{X}$, the shorthand notation $\sum^{B}_{x \in A} := \sum_{x \in A\setminus B}$.
In case A " X we simply write ř B x , i.e., the superscript over the summation means exclusion of these indices from the sum. Recall that H is written as a sum of its expectation matrix A and its fluctuation 1 ? N W (cf. (2.24)) and therefore We use the expansion formula (5.5) on the resolvent elements in pWGq xy " ř u w xu G uy and find that the entries of D can be written in the form Note that the set B here is arbitrary, e.g., it may depend on x and y. In fact, we will choose it to be a neighborhood of tx, yu, momentarily. Let A Ď B be another index set. We split the sum over z P B in the second term on the right hand side of (5.6) into a sum over w P A and w P BzA and use (2.24) again, We end up with the following decomposition of the error matrix: where the entries d pkq xy of the individual matrices D pkq are given by for some ε 1 ą 0. Note that although D itself does not depend on the choice of ε 1 , its decomposition into D pkq does. We will estimate each error matrix D pkq separately, where the estimates may still depend on ε 1 . Since ε 1 ą 0 is arbitrarily small, it is eliminated from the final bounds on D using the following property of the stochastic domination (Definition 3.2): If some positive random variables X, Y satisfy X ă N ε Y for every ε ą 0, then X ă Y .
The following lemma provides entrywise estimates on the individual error matrices.
Lemma 5.3. Let $C > 0$ be a constant and $\zeta \in \mathbb{H}$ with $\operatorname{dist}(\zeta, \operatorname{Spec}(H^B))^{-1} \prec N^C$ for all $B \subsetneq \mathbb{X}$. Then the entries of the error matrices $D^{(k)} = D^{(k)}(\zeta)$, defined in (5.7), satisfy the bounds (5.9a)-(5.9e), where the $(B_k)_{k=0}^{|B_{xy}|}$ in (5.9e) form an arbitrary increasing sequence of subsets of $B_{xy}$ with $B_{k+1} = B_k \cup \{x_k\}$ for some $x_k \in B_{xy}$.
Proof. We show the estimates (5.9a) to (5.9e) one by one. The bound (5.9a) is trivial, since by the bounded moment assumption (2.25) the entries of $W$ satisfy $|w_{xy}| \prec 1$. For the proof of (5.9b) we first use Cauchy-Schwarz in the $v$-summation of (5.7b) and then the Ward identity
$$\sum_u |G^B_{xu}|^2 \;=\; \frac{\operatorname{Im} G^B_{xx}}{\operatorname{Im}\zeta}\,, \qquad (5.10)$$
where we used the Schur complement formula in the form of the general resolvent expansion identity (5.11). To the $u$-summation we apply Lemma A.5. The assumption (A.55) of Lemma A.5 is an immediate consequence of the decay of correlations (2.27). In order to verify (A.56) we use both (2.27) and the $N$-dependent smoothness of the correlations, where we used (5.10) again. Finally, we turn to the proof of (5.9e). Let $B_k$ be as in the statement of Lemma 5.3. We define the quantities $\alpha^{(k)}_{xz}$ and use a telescopic sum to write $d^{(5)}_{xy}$ as (5.14). We estimate the rightmost term in (5.14), where the sum over $u$ and $v$ on the right hand side of the first inequality is bounded by a constant because of the decay of covariances (3.2), and where we used (5.10) in the second inequality. Thus, (5.9e) follows from (5.14) and the bound (5.15) on $|\alpha^{(k+1)}_{xz} - \alpha^{(k)}_{xz}|$. To show (5.15) we first observe (5.16), where we used the general resolvent identity. The last two terms on the right hand side of (5.16) are estimated by the second term on the right hand side of (5.15), using first Cauchy-Schwarz and the decay of covariances (3.2), and then the Ward identity (5.10). For the first term in (5.16) we use the same argument as in (3.4), yielding the bound (5.17), valid for any two vectors $r, t \in \mathbb{C}^N$. We obtain (5.15) by applying (5.17) with the choice $r_u := G^{B_k}_{u x_k}$, $t_v := G^{B_k}_{x_k v}$ and using the Ward identity afterwards. In this way (5.9e) follows and Lemma 5.3 is proven.
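The Ward identity (5.10) used throughout this proof, $\sum_u |G_{xu}|^2 = \operatorname{Im} G_{xx}/\operatorname{Im}\zeta$, is a consequence of $GG^* = \operatorname{Im} G/\operatorname{Im}\zeta$ for the resolvent of any self-adjoint matrix; a minimal numerical check:

```python
import numpy as np

rng = np.random.default_rng(3)
N, eta = 10, 0.5
H = rng.standard_normal((N, N))
H = (H + H.T) / 2                        # real symmetric matrix
zeta = 0.2 + 1j * eta
G = np.linalg.inv(H - zeta * np.eye(N))  # resolvent G(zeta) = (H - zeta)^{-1}

x = 0
lhs = np.sum(np.abs(G[x, :]) ** 2)       # sum_u |G_xu|^2
rhs = G[x, x].imag / eta                 # Im G_xx / Im zeta
```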
The following definition is motivated by the formula (5.18) that expresses the matrix elements of $G^B$ in terms of the matrix elements of $G$. For $R \in \mathbb{C}^{N\times N}$ and $A, B \subsetneq \mathbb{X}$ we denote by $R_{AB} := (r_{xy})_{x\in A, y\in B}$ its submatrix. In case $A = B$ we write $R_{AB} = R_A$ for short. For $B \subsetneq \mathbb{X}$ we define
$$M^B \;:=\; \big((M^{-1})_{\mathbb{X}\setminus B}\big)^{-1}\,. \qquad (5.19)$$
Now let $x \in \mathbb{X}\setminus B$; then (5.18) holds.
Lemma 5.5. Let $\delta > 0$ and $\zeta \in \mathbb{H}$ be such that $\delta \le \operatorname{dist}(\zeta, [\kappa_-, \kappa_+]) + \rho(\zeta) \le \frac{1}{\delta}$. Then for all $B \subsetneq \mathbb{X}$ and $x, y \notin B$ the matrix $M^B$, defined in (5.19), satisfies the bounds (5.20)-(5.22).
Proof. We begin by establishing the upper and lower bounds (5.23) on the singular values of $M^B$. We will make use of the following general fact: let $R \in \mathbb{C}^{N\times N}$ satisfy the upper bound $\|R\| \lesssim_\delta 1$ as well as the definiteness condition (5.24); then any submatrix $R_A$ of $R$ satisfies the analogous bounds (5.25). We verify (5.24) for $R = M$ in two separate regimes and thus show (5.23). First let $\zeta$ be such that $\rho(\zeta) \ge \delta/2$. Then the lower bound on the imaginary part in (5.24) follows from (4.11) and (4.9). Now let $\zeta$ be such that $\delta/2 \le \operatorname{dist}(\zeta, [\kappa_-, \kappa_+]) \le \delta^{-1}$. Then we may also assume that $\operatorname{dist}(\operatorname{Re}\zeta, [\kappa_-, \kappa_+]) \ge \frac{\delta}{4}$, because otherwise $\operatorname{Im}\zeta \ge \frac{\delta}{4}$ and thus $\rho(\zeta) \gtrsim_\delta 1$; in that situation the claim follows from the case already considered, namely $\rho(\zeta) \ge \delta/2$, because there $\delta$ was arbitrary. Since $M$ is the Stieltjes transform of a measure with values in $\overline{\mathbb{C}}_+$ and support in $[\kappa_-, \kappa_+]$ (cf. (2.5)), its real part is positive definite to the left of $\kappa_-$ and negative definite to the right of $\kappa_+$. In both cases we also have the effective bound $|\operatorname{Re} M| \gtrsim_\delta \mathbf{1}$ because $\operatorname{dist}(\operatorname{Re}\zeta, [\kappa_-, \kappa_+]) \ge \frac{\delta}{4}$. Now we apply (5.23) to see (5.20). By (2.26) and (2.13) the right hand side of (2.22), and with it $M^{-1}$, has faster than power law decay. The same is true for its submatrix with indices in $\mathbb{X}\setminus B$. Thus (5.20) follows directly from the definition (5.19) of $M^B$, the upper bound on its singular values from (5.23), and Lemma A.2.
To prove (5.21) we use where we applied (5.23) for the first comparison relation and used´Im M´1 " δ ρ1 (cf. (4.11) and (4.10)) for the second. The bound on Im m B xx in (5.21) follows and the bound on |m B xx | follows at least in the regime ρpζq ě δ{2. We are left with showing |m B xx | Á δ 1 in the case δ{2 ď distpζ, rκ´, κ`sq ď δ´1. As we did above, we may assume that distpRe ζ, rκ´, κ`sq ě δ 4 . We restrict to Re ζ ď κ´´δ 2 . The case Re ζ ě κ``δ 2 is treated analogously. In this regime where we used Re M´1 " pM´1q˚pRe Mq M´1 " δ Re M " δ 1 for the last comparison relation. Thus, (5.21) follows. Now we show (5.22). By the Schur complement formula we have for any T P C NˆN the identitỳ pT B q tx,yu˘´1 "`T tx,yu´Ttx,yuB pT B q´1T Btx,yu˘´1 "`pT BYtx,yu q´1˘t x,yu , (5.26) for x, y R B and T B :" ppT´1q XzB q´1, provided all inverses exist. We will use this identity for T " M, G. Note that this definition T B with T " G is consistent with the definition (5.4) on the index set XzB because of (5.18). Recalling that G BYtx,yu " pG u,v q u,vPBYtx,yu and M BYtx,yu are matrices of dimension |B|`2, we have G BYtx,yu´MBYtx,yu ď p|B|`2q G BYtx,yu´MBYtx,yu max ď p|B|`2qΛ .
Therefore, as long as p|B|`2q Λ pM BYtx,yu q´1 ď 1{2 we get where we used in the last step that pM BYtx,yu q´1 " δ 1, which follows from using (5.24) and (5.25) for the choice R " M in the regimes ρ Á δ 1 and distpRe ζ, rκ´, κ`sq Á δ 1, respectively. Again using the definite signs of the imaginary and real part of M as well as that of pM BYtx,yu q´1 in these two regimes, we infer that `p pM BYtx,yu q´1q tx,yu˘´1 " δ 1 , as well. We conclude that there is a constant c, depending only on δ and K , such that `p pG BYtx,yu q´1q tx,yu˘´1´`p pM BYtx,yu q´1q tx,yu˘´1 ½´Λ ď With the identity (5.26) the claim (5.22) follows and Lemma 5.5 is proven.
Proof of Lemmas 3.3 and 5.1. We begin with the proof of (3.7). We continue with the estimates on all the error matrices listed in Lemma 5.3. Therefore, we fix ζ " τ`i with |τ | ď C. Since Im ζ " 1, we have the trivial resolvent bound for any B Ď X. We also have a lower bound on diagonal elements. Indeed, by the Schur complement formula applied to the px, xq-element of the resolvent G B " pH B´ζ 1q´1 we have We take absolute value on both sides and estimate trivially. Here we used (5.27) to control the norm of the resolvent and the assumptions (2.26) and (2.25) to bound ř u |h xu | 2 . By the choice of A xy and B xy from (5.8) and assumption (2.11) we also have |A xy | ď |B xy | ă N ε 1 P . (5.29) We apply (5.27), (5.28) and (5.29) to estimate the right hand sides of the bounds in (5.9) further and find for any ν P N. Here we also used that by assumption (2.26) for any ν P N the expectation matrix satisfies to obtain the second summand on the right hand side of (5.30) from estimating |d p2q xy |. Since ε 1 ą 0 was arbitrary, (5.30) implies (3.7). Now we prove (3.8) and (5.2) in tandem. Let δ ą 0 and ζ P H be such that δ ď distpζ, rκ´, κ`sq`ρpζq ď δ´1 and Im ζ ě N´1`ε. We show that (5.32) From (5.32) the bound (3.8) follows immediately in the regime where ρ ď δ. Also (5.2) follows from (5.32). Indeed, in the regime of spectral parameters ζ P H with δ ď distpζ, rκ´, κ`sq ď δ´1 we have ρ " δ Im ζ because ρ is the harmonic extension of a probability density supported inside rκ´, κ`s. For the proof of (5.32) we use (5.22), (5.21), (5.31), (5.29) and G xy max À 1`Λ (cf. (4.9)) to estimate the right hand side of each inequality in (5.9). In this way we get for any ν P N, provided ε ą ε 1 P to ensure N´ε ď c{|B xy |, i.e. that the constraint Λ ď N´ε makes (5.22) applicable. Here, we also used ρ Á δ Im ζ to see that ρ N Im ζ Á δ 1 N . Since (5.33) holds for arbitrarily small ε 1 ą 0, the claim (5.32) and with it Lemmas 3.3 and 5.1 are proven.
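The trivial resolvent bound used at the start of this proof, }Gpζq} ď 1{Im ζ for self-adjoint H, can be illustrated numerically; the Wigner-type test matrix below is an arbitrary sample, not the correlated model of the paper.

```python
import numpy as np

# Trivial resolvent bound: for self-adjoint H and ζ = τ + iη, every singular
# value of G(ζ) = (H - ζ)^{-1} is at most 1/η, hence |G_xy| ≤ ‖G‖ ≤ 1/Im ζ.
rng = np.random.default_rng(1)
N = 50
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)          # symmetric test matrix
zeta = 0.3 + 1.0j                       # Im ζ = 1 as in the proof
G = np.linalg.inv(H - zeta * np.eye(N))
assert np.linalg.norm(G, 2) <= 1.0 / zeta.imag + 1e-12
```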

Fluctuation averaging
The main result of this section is Proposition 3.4, by which an error bound Ψ for the entrywise local law can be used to improve the bound on the error matrix D to Ψ 2 , once D is averaged against a non-random matrix R with faster than power law decay.
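Before the proof, the averaging mechanism can be illustrated in the simplest independent (Wigner) setting: a single resolvent entry fluctuates on the scale Ψ, while the normalized trace fluctuates on a much smaller scale. The Monte Carlo sketch below is illustrative only; sizes, the spectral parameter, and the sample count are arbitrary choices.

```python
import numpy as np

# Fluctuation averaging in the Wigner toy case: compare the sample standard
# deviation of one diagonal resolvent entry with that of the averaged
# quantity Tr G / N across independent matrix samples.
rng = np.random.default_rng(2)
N, eta, samples = 200, 1.0, 200
zeta = 0.0 + 1j * eta

g00, trg = [], []
for _ in range(samples):
    W = rng.standard_normal((N, N))
    H = (W + W.T) / np.sqrt(2 * N)
    G = np.linalg.inv(H - zeta * np.eye(N))
    g00.append(G[0, 0])
    trg.append(np.trace(G) / N)

# The averaged quantity fluctuates far less than a single entry.
assert np.std(trg) < np.std(g00)
```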
Proof of Proposition 3.4. Let R P C NˆN with R β ď 1 for some positive sequence β. Within this proof we use Convention 4.1 such that ϕ À ψ means ϕ ď Cψ for a constant C, depending only on Ă P :" pK , δ, ε 1 , β, Cq, where C and δ are the constants from the statement of the proposition, K are the model parameters (cf. (2.29)) and ε 1 enters in the splitting of the error matrix D into D pkq (cf. (5.8)). Note that since ε 1 is arbitrary it suffices to show (3.13) up to factors of N ε 1 . We will also use the notation O ă pΦq for a random variable that is stochastically dominated by some nonnegative Φ.
We split the expression xR, Dy from (3.13) according to the definition (5.7) of the matrices D pkq . Then we estimate xR, D pkq y separately for every k. We do this in three steps. First we estimate D pkq max for k " 2, 3, 5 directly without using the averaging effect of R. Afterwards we show the bounds on xR, D p1q y and xR, D p4q y, respectively. In the upcoming arguments the following observation will be useful. The local law (3.12) together with (5.22) implies that for every B Ď X with |B| ď N ε{2 we have Here, until the end of this proof, we consider G B as the C pN´|B|qˆpN´|B|q -matrix G B " pG B xy q x,yRB as opposed to the general convention (5.3). Estimating D pkq max : Here, we show that under the assumption (3.12) the error matrices with indices k " 2, 3, 5 satisfy the improved entrywise bound where ε 1 stems from (5.8) and P is the model parameter from (2.11). We start by estimating the entries of D p2q . Directly from its definition in (5.7b) we infer The maximum norm on the entries of the resolvents G and G Bxy are bounded by (6.1) and (5.20). The decay (2.12) of the entries of the bare matrix and that dpv, zq ě N ε 1 in the last sum then imply D p2q max ă N´ν for any ν P N. To show (6.2) for k " 3 we use the representation (5.11) and the large deviation estimate (5.12) just as we did in the proof of Lemma 5.3. In this way we get for any ν P N. Since Ψ ě N´1 {2 we infer (6.2) for k " 3 from (6.3).
Finally we consider the case k " 5. We follow the proof of Lemma 5.3 and use the representation (5.14). We estimate the two summands on the right hand side of (5.14), starting with the second term. We rewrite this term in the form ř Axy z S xz rGs G zy and use (2.13) as well as G max ď M max`Ψ together with the upper bound on M in (2.15).
To bound the first term on the right hand side of (5.14) we use (5.16). Each of the three terms on the right hand side of (5.16) has to be bounded by N 2ε 1 P Ψ 2 . The second and third term are bounded by 1 N by the decay of covariances (3.2). For the first term we use (5.17), (6.1) and (5.20).
We split the error matrix D p1q into two pieces D p1q " D p1aq`Dp1bq , defined by where B xy is a 2N ε 1 -environment of the set tx, yu (cf. (5.8)). The matrix D p1bq we estimate simply by where we used (5.29) and (6.1). For the bound on xR, D p1aq y we write xR, D p1aq y " X`Y`Z , (6.7) where the three summands on the right hand side are defined as The faster than power law decay of both R and M (cf. (2.15)), i.e. that |r xy |`|m xy | À 1 N for dpx, yq ě N ε 1 , yields the high moment bounds where the sums are over index tuples x " px 1 , . . . , x 2µ q P X 2µ and B k x :" B kN ε 1 pxq is the ball around x with respect to the product metric dpx, yq :" In (6.8c) we have used the triangle inequality to conclude that dpu, xq ď 3N ε 1 . By trivial bounds we see that the right hand side of (6.8a) is ă µ N 2P ε 1 µ N´3 µ . This suffices to show (6.5). For Y and Z we continue the estimates in (6.8) by using the decay of correlations (2.27) and the ensuing lumping of index pairs px i , u i q: where d sym ppx 1 , x 2 q, py 1 , y 2 qq :" dppx 1 , x 2 q, tpy 1 , y 2 q, py 2 , y 1 quq is the symmetrized distance on X 2 , induced by d. Inserting (6.9) into the moment bounds on Y and Z effectively reduces the combinatorics of the sum in (6.8b) from N 4µ to N 2µ and in (6.8c) from N 2µ to N µ . We conclude that Combining this with (6.6) implies (6.5) because of and Ψ ě N´1 {2 .
Estimating xR, D p4q y: Similarly to the strategy for estimating |xR, D p1q y| we write From the decay of correlations (2.27) and dptx, zu, XzB xy q ě N ε 1 for any z P A xy as well as the N-dependent smoothness of the resolvent as a function of the matrix entries of W for distpζ, SpecpHqq ě N´C we see that Lemma A.6 can be applied for a large deviation estimate on the pu, vq-sum in the definition (6.10) of Z B xz for B " B xy , i.e.
Here we also used (6.1) and (5.20) for the second stochastic domination bound. Combining (6.11) with (6.1) we see that The rest of the proof of Proposition 3.4 is dedicated to showing the high moment bound E |xR, D p4aq y| 2µ À µ N Cpµqε 1 Ψ 2µ . (6.13) Together with (6.2), (6.5) and (6.12) this bound implies (3.13) since ε 1 can be chosen arbitrarily small. In analogy to (6.7) we write xR, D p4aq y " X`Y`Z , for the three summands with the short hand σ xzy :" 1 N r xy Z Bxy xz m zy .
Similarly to (6.8), the faster than power law decay of R and M implies the moment bounds Using the a priori bound (6.11) it trivially follows that X is bounded by CpµqN 4ε 1 P µ N´2 µ Ψ 2µ . Since this is already sufficient for (6.14a) we focus on the terms Y and Z in (6.14). We call the subscripts i of the indices x i and z i labels. In order to further estimate the moments of Y and Z we introduce the set of lone labels of px, zq: Lpx, zq :" The corresponding index pair px i , z i q for i P Lpx, zq, is called a lone index pair. We partition the sums in (6.14b) and (6.14c) according to the number of lone labels, i.e. we insert the partition of unity A simple counting argument reveals that fixing the number of lone labels reduces the combinatorics of the sums in (6.14b) and (6.14c). More precisely, The expectation in (6.14b) and (6.14c) is bounded using the following technical result.
Lemma 6.1 (Key estimate for averaged local law). Assume the hypotheses of Proposition 3.4 hold, let µ P N and x, y P X 2µ . Suppose there are 2µ subsets Q 1 , . . . , Q 2µ of X, Applying (6.16) and Lemma 6.1 to the right hand sides of (6.14b) and (6.14c), after partitioning according to the number of lone labels, yields Since Ψ ě N´1 {2 the high moment bounds in (6.18) together with (6.14a) imply (6.13). This finishes the proof of Proposition 3.4 up to verifying Lemma 6.1, which will occupy the rest of the section.
Proof of Lemma 6.1. Let us consider the data ξ :" px, y, pQ i q 2µ i"1 q , fixed. We start by writing the product on the left hand side of (6.17) in the form.
where the two auxiliary functions Γ ξ , w x,y : X 2µˆX2µ Ñ C, are defined by In order to estimate (6.19) we partition the sum over the indices u i and v i depending on their distance from the set of lone index pairs, px i , y i q with i P L, where L " Lpx, yq. To this end we introduce the partition tB i : i P t0u Y Lu of X, and the shorthand Bpξ, σq :" where the components σ i " pσ 1 i , σ 2 i q P pt0u Y Lq 2 of σ " pσ i q 2µ i"1 specify whether u i and v i are close to a lone index pair or not; e.g. σ 1 i determines which lone index u i is close to, if any. For any fixed ξ, as σ runs through all possible elements of pt0u Y Lq 4µ , the sets Bpξ, σq form a partition of the summation set on the right hand side of (6.19) (taking into account the restriction u i , v i R Q i ). Therefore it will be sufficient to estimate ÿ pu,vqPBpξ,σq w x,y pu, vq Γ ξ pu, vq (6.24) for every fixed σ P pt0u Y Lq 4µ . Since x i and y i are fixed, while u i and v i are free variables, with their domains depending on pξ, σq, we say that the former are external indices and the latter are internal indices. Let us define the set of isolated labels, p Lpx, y, σq " Lpx, yqztσ 1 1 , . . . , σ 1 2µ , σ 2 1 , . . . , σ 2 2µ u , (6.25) so that if an external index has an isolated label as subscript, then it is isolated from all the other indices in the following sense: Notice that isolated labels indicate not only separation from all other external indices, as lone labels do, but also from all internal indices. Given a resolvent entry G B uv we will refer u, v as lower indices and the set B as an upper index set.
The next lemma, whose proof we postpone until the end of this section, yields an algebraic representation for (6.24) provided the internal indices are properly restricted. Lemma 6.2 (Monomial representation). There exist (signed) monomials Γ ξ,σ,α : Bpξ, σq Ñ C, such that Γ ξ,σ,α pu, vq for each α is of the form: Here the notations p´1q # and p¨q # indicate possible signs and complex conjugations that may depend only on pξ, σ, αq, respectively, and that will be irrelevant for our estimates. The dependence on pξ, σ, αq has been suppressed in the notations, e.g., n " npξ, σ, αq, U r " U r pξ, σ, αq, etc. The numbers n and q of factors in (6.28) are bounded, n`q À µ 1. Furthermore, for any fixed α the two subsets R pkq , k " 1, 2, form a partition of t1, . . . , 2µu, and the monomials (6.28) have the following four properties: 2. The upper index sets E r , F r , U r , U 1 r , V 1 r are bounded in size by N Cpµqε 1 , and B r Ď U r , U 1 r , V 1 r . The total number of these sets appearing in the expansion (6.26) is bounded by N Cpµqε 1 .
3. At least one of the following two statements is always true: Since Lemma 6.1 relies heavily on this representation, we make a few remarks: (i) Monomials with different values of α may be equal. The indices a t , b t , u 1 t , v 1 t , w t may overlap, but they are always distinct from the internal indices since from (6.23) and (6.25) we see that (ii) The reciprocals of the resolvent entries are not important for our analysis because the diagonal resolvent entries are comparable to 1 in absolute value when a local law holds (cf. (5.21)). (iii) Property 3 asserts that each monomial is either a deterministic function of H pB i q for some isolated label i, and consequently almost independent of the rows/columns of H labeled by x i , y i (Case (I)), or the monomial contains at least | p L| additional off-diagonal resolvent factors (Case (II)). In the second case, each of these extra factors will provide an additional factor Ψ for typical internal indices due to the faster than power law decay of M and the local law (3.12). Atypical internal indices, e.g. when u r and v r are close to each other, do not give a factor Ψ since m urvr is not small, but there are much fewer atypical indices than typical ones and this entropy factor makes up for the lack of smallness. These arguments will be made rigorous in Lemma 6.3 below.
By using the monomial sum representation (6.26) in (6.24), and estimating each summand separately, we obtain (6.31). The right hand side of (6.31) will be bounded using the following three estimates, which follow by combining the monomial representation with our previous stochastic estimates. Lemma 6.3 (Three sources of smallness). Consider an arbitrary monomial Γ ξ,σ,α of the form (6.28). Then, under the hypotheses of Proposition 3.4, the following three estimates hold: 1. The resolvent entries with no internal lower indices are small while the reciprocals of the resolvent entries are bounded, in the sense that 2. If Γ ξ,σ,α satisfies (I) of Property 3 of Lemma 6.2, then its contribution is very small in the sense that | E w x,y pu, vq Γ ξ,σ,α pu, vq| À µ,ν N´ν , pu, vq P Bpξ, σq . where |σ r |˚:" |t0, σ 1 r , σ 2 r u|´1 counts how many, if any, of the two indices u r and v r are restricted to the vicinity of distinct external indices.
We postpone the proof of Lemma 6.3 and first see how it is used to finish the proof of Lemma 6.1. The bound (6.30) follows by combining Lemma 6.2 and Lemma 6.3 to estimate the right hand side of (6.31). If (I) of Property 3 of Lemma 6.2 holds, then applying (6.34) and (6.33) in (6.31) yields (6.30). On the other hand, if (I) of Property 3 of Lemma 6.2 is not true, then we use (6.33) and (6.35) to get By Part 3 of Lemma 6.2 we know that (II) holds. Thus the power of Ψ on the right hand side of (6.36) is at least 2µ`| p L|. On the other hand, from (6.25) we see that Hence the power of N´1 {2 on the right hand side of (6.36) is at least |L|´| p L|. Using these bounds together with Ψ ě N´1 {2 in (6.36), and then taking expectations yields (6.30). Plugging (6.30) into (6.29) completes the proof of (6.17).
Proof of Lemma 6.3. Combining (6.1) and (5.20) we see that for some sequence α |G E uv | ă N Cε 1 Ψ`α pνq p1`dpu, vqq ν , whenever u, v R E , and |E| ď N Cε 1 . (6.37) By the bound on the size of E t , F r in Property 2 of Lemma 6.2, (6.37) is applicable for these upper index sets. Then (6.33) follows from the second bound of Property 1 of Lemma 6.2 and the decay of the entries of M E from (5.20).
In order to prove Part 2, let i P p L be the label from (I) of Property 3 of Lemma 6.2. We have where the first term on the right hand side vanishes because the w xy pu, vq's are centred random variables by (6.21). Now the covariance is smaller than any inverse power of N by the decay of covariances (3.2). Since B s zQ r , B t zQ r Ď XzU r , the indices u, v do not overlap the upper index set U r . Hence, in the case k " 1 and s " t " 0 the estimate (6.35) follows from (A.76) of Lemma A.6. If s, t P L, then taking the modulus of (6.39) and using (6.37) yields (6.35): where dpA, Bq :" inftdpa, bq : a P A, b P Bu for any sets A and B of X. Here we have also used the definition (6.15) of lone labels and Ψ ě N´1 {2 .
Suppose now that exactly one component of σ r equals 0 and one is in L. In this case, we split w xryr pu, vq in (6.39) into two parts corresponding to w xru w vyr and its expectation, and estimate the corresponding sums separately. First, using (A.57) of Lemma A.5 yields  On the other hand, using (6.37) we estimate the expectation part: As with the other part (6.41) because of (3.2) this is also O ă pN Cε 1 N´1q. As Ψ ě N´1 {2 , this finishes the proof of (6.35) in the case k " 1. Now we prove (6.35) for Θ p2q r . In this case, we need to bound, ÿ uPBszQr ÿ vPBtzQr w xryr pu, vq G where s " σ 1 r , t " σ 2 r have again values in t0u Y Lz p L, and By definitions of the lone and isolated labels (6.15) and (6.25), respectively, we know that, if s P Lz p L, then dpu, u 1 r q ě N ε 1 , and similarly, if t P Lz p L, then dpv 1 r , vq ě N ε 1 . Thus, if s, t P Lz p L, then estimating similarly as in (6.40) with (6.37), yields In the remaining cases, we split (6.43) into two parts corresponding to the term w xru w vyr and its expectation in the definition of (6.21) of w xryr pu, vq, and estimate these two parts separately.
The average part is bounded similarly as in (6.42), i.e., if s P Lz p L and t " 0, then Here dpu, u 1 r q ě N ε 1 since u P B s , s P Lz p L, while u 1 r P B p L . Taking ν ą Cε´1 1 and using (3.2) to bound the sum over the covariances by a constant, we see that the right hand side is O ă pN Cε 1 N´1Ψq. Since Ψ ě N´1 {2 , this matches (6.35) as |σ r |˚" |t0, s, tu|´1 " 1. Now we are left to bound the size of terms of the form (6.43), where w xryr pu, vq is replaced with 1 N w xru w vyr , and either s " 0 or t " 0. In these cases the sums over u and v factorize, i.e., we have 1 Nˆÿ When the sum is over a small set, i.e., over B s 1 for some s 1 P Lz p L, then we estimate the sizes of the entries of W and G p#q by O ă pN´1 {2 q and O ă pΨq, respectively. On the other hand, when u or v is summed over B 0 zQ r , we use (A.57) of Lemma A.5 to obtain a bound of size O ă pΨq. In each case, we obtain an estimate that matches (6.35).
Proof of Lemma 6.2. We consider the data pξ, σq fixed, and write p L " p Lpx, y, σq, etc. We start by enumerating the isolated labels (see (6.25)) ts 1 , . . . , s p ℓ u " p L , p ℓ :" | p L| , (6.45) and set p Bpkq :" Y k j"1 B s j for 1 ď k ď p ℓ (recall the definition from (6.22) and that B s j 's are disjoint).
The monomial expansion (6.26) is constructed iteratively in p ℓ steps. Indeed, we will define 1`p ℓ representations, where the M k " M k pξ, σq monomials Γ pkq α " Γ pkq ξ,σ,α : Bpξ, σq Ñ C, evaluated at pu, vq P Bpξ, σq, are of the form The numbers m and q as well as the sets E t , F r may vary from monomial to monomial, i.e., they are functions of k and α. Furthermore, for each fixed k and α, the lower indices and the upper index sets satisfy (a) a t , b t P tu r , v r u p r"1 Y p Bpkq, and w s P ta t , b t u m t"1 ; (c) If a t P B s i and b t P B s j , with 1 ď i, j ď k, then i ‰ j; (d) For each s " 1, . . . , 2µ there are two unique labels 1 ď t 1 psq, t 2 psq ď m, such that a t 1 psq " u s , b t 1 psq R tv r u r‰s , and a t 2 psq R tu r u r‰s , b t 2 psq " v s hold, respectively.
We will call the right hand side of (6.46) the level-k expansion in the following and we will define it by a recursion on k.
We now consider a generic level-pk´1q monomial Γ pk´1q α , which is of the form (6.47) and satisfies (a)-(d). Each monomial Γ pk´1q α will give rise to several level-k monomials that are constructed independently for different α's as follows. Expanding each of the m factors in the first product of (6.47) using the standard resolvent expansion identity and each of the q factors in the second product of (6.47) using yields a product of sums of resolvent entries and their reciprocals. Inserting these formulas into (6.47) and expressing the resulting product as a single sum yields the representation where A α pkq is some finite subset of integers and β simply labels the resulting monomials in an arbitrary way. From the resolvent identities (6.50) it is easy to see that the monomials Γ pkq β inherit the properties (a)-(d) from the level-pk´1q monomials. In particular, summing over α " 1, . . . , M k´1 in (6.51) yields the level-k monomial expansion (6.46), with M k :" ř α |A α pkq|. We will assume w.l.o.g. that the sets A α pkq, 1 ď α ď M k´1 , form a partition of the first M k integers.
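The standard resolvent expansion identity of the type entering (6.50) can be checked numerically in its simplest one-index form, relating the resolvent G to the resolvent G pcq of the minor obtained by removing one row and column; matrix size, indices, and the spectral parameter below are illustrative.

```python
import numpy as np

# Resolvent expansion identity: for self-adjoint H, G = (H - ζ)^{-1}, and
# G^(c) the resolvent of the minor without row/column c,
#     G_ab = G^(c)_ab + G_ac G_cb / G_cc    for a, b ≠ c.
rng = np.random.default_rng(3)
N, a, b, c = 8, 0, 1, 7
W = rng.standard_normal((N, N))
H = (W + W.T) / 2
zeta = 0.5 + 1.0j
G = np.linalg.inv(H - zeta * np.eye(N))
keep = [i for i in range(N) if i != c]
Gc = np.linalg.inv(H[np.ix_(keep, keep)] - zeta * np.eye(N - 1))
assert np.isclose(G[a, b], Gc[a, b] + G[a, c] * G[c, b] / G[c, c])
```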
This procedure defines the monomial representation recursively. Since Γ pkq α is a function of the pu, vq indices, strictly speaking we should record which lower indices in the generic form (6.47) are considered independent variables. Initially, at level k " 0, all indices are variables, see (6.20a). Later, the expansion formulas (6.50) bring in new lower indices, denoted generically by x ka from the set Y sP p L B s which is disjoint from the range of the components u r , v r of the variables pu, vq as Bpξ, σq is a subset of pXzY sP p L B s q 2µ . However, the structure of (6.50) clearly shows at which location the "old" a, b indices from the left hand side of these formulas appear in the "new" formulas on the right hand side. Now the simple rule is that if any of these indices a, b were variables on the left hand side, they are considered variables on the right hand side as well. In this way the concept of independent variables is naturally inherited along the recursion. With this simple rule we avoid the cumbersome notation of explicitly indicating which indices are variables in the formulas.
We note that the monomials of the final expansion (6.49) can be written in the form (6.28). Indeed, the second products in (6.28) and (6.47) are the same, while the first product of (6.47) is split into the three other products in (6.28) using (d). Properties 1 and 2 in Lemma 6.2 for the monomials in (6.49) follow easily from (a)-(d). Indeed, (a) yields the first part of Property 1, while the second part of Property 1 follows from (c) and the basic property dpB s , B t q ě N ε 1 for distinct lone labels s, t P p L. For a given ξ, we define the family of subsets of X: ) .
By construction (cf. (6.50) and (b)) the upper index sets are members of this ξ-dependent family. Since |Q r |, |B s k | ď N C 0 ε 1 , for some C 0 " 1, we get |E | À µ N C 0 µ . Property 2 follows directly from these observations. Next we prove Property 3 of the monomials (6.49). To this end, we use the formula (6.51) to define a partial ordering 'ă' on the monomials by It follows that for every α " 1, 2, . . . , M " M p ℓ , there exists a sequence pα k q p ℓ´1 k"1 , such that Let us fix an arbitrary label α " 1, . . . , M of the final expansion. Suppose that the k-th monomial Γ pkq α k , in the chain (6.53), is of the form (6.47), and define Here, D k is the largest set A Ď X, such that Γ pkq α k depends only on the matrix elements of H pAq .
Since the upper index sets and the total number of resolvent elements of the form G pAq ab are both larger (or equal) on the right hand side than on the left hand side of the identities (6.50), and the added indices on the right hand side are from B s k , we have We claim that The first implication follows from the monotonicity of the D k 's. In order to get the second implication, suppose that Γ pk´1q α k´1 equals (6.47). Since D k does not contain B s k the monomial Γ pkq α k cannot be of the form (6.47), with the upper index sets E t and F t replaced with E t Y B s k and F t Y B s k , respectively. The formulas (6.50) hence show that Γ pkq α k contains at least one more resolvent entry of the form G pAq ab than Γ pk´1q α k´1 , and thus m k ě m k´1`1 . Property 3 follows from (6.56). Indeed, suppose that there is no isolated label s such that B s Ď D p ℓ . Then applying (6.56) for each k " 1, . . . , p ℓ yields m p ℓ ě m 0`p ℓ . Since m 0 " p, using the notations from (6.28) we have m p ℓ " n`|R p1q |`2|R p2q | , by Property (c) of the monomials. This completes the proof of Property 3. Now only the bound (6.27) on the number of monomials M " M p ℓ remains to be proven, which is a simple counting argument. Let p k be the largest number of factors among the monomials at the level-k expansion, i.e., if the monomials Γ pkq α are written in the form (6.28), then Let us set b˚:" 1`max x,y |B N ε 1 px, yq|. Each of the factors in every monomial at level k´1 is turned into a sum over monomials by the resolvent identities (6.50). Since each such monomial contains at most five resolvent entries (cf. the last terms in (6.50b)), we obtain the first of the following two bounds: In order to derive the second bound we recall that each of the at most p k´1 factors in every level-pk´1q monomial is expanded by the resolvent identities (6.50) into a sum of at most b˚ terms. The product of these sums yields a single sum of at most b˚ p k´1 terms.
From (6.48) and (6.20a) we get M 0 :" 1 and p 0 " 2µ. Since k ď p ℓ ď 2µ, we immediately see that max k p k ď 2µ 25 µ . Plugging this into the second bound of (6.57) yields M k ď ppb˚q 2µ 25 µ q 2µ . Since b˚ď N Cε 1 by (2.11) this completes the proof of (6.27). Finally, we obtain the bound on the number of factors in (6.28) using n`q ď p p ℓ À µ 1.
7 Bulk universality and rigidity

Rigidity
Proposition 7.1 (Local law away from rκ´, κ`s). Let G be the resolvent of a random matrix H of the form (2.24) that satisfies B1-B4. Let κ´, κ` be the endpoints of the convex hull of supp ρ as in (5.1). For all δ, ε ą 0 and ν P N there exists a positive constant C such that away from rκ´, κ`s, The normalized trace converges with the improved rate P " D ζ P H , δ ď distpζ , rκ´, κ`sq ď 1{δ : The constant C depends only on the model parameters K in addition to δ, ε and ν.
Remark 7.2. Theorem 2.9 and Proposition 7.1 provide a local law with optimal convergence rate 1{pN Im ζq inside the bulk of the spectrum and convergence rate 1{N away from the convex hull of supp ρ, respectively. In order to prove a local law inside spectral gaps and at the edges of the support of the density of states, additional assumptions on H are needed to exclude a naturally appearing instability that may be caused by exceptional rows and columns of H and the outlying eigenvalues they create. This instability is already present in the case of independent entries, as explained in Section 11.2 of [1].
Proof of Proposition 7.1. The proof has three steps. In the first step we will establish a weaker version of Proposition 7.1 where instead of the bound Λ ă 1 ? N we will only show Λ ă 1 ? N`1 N Im ζ . Then we will use this version in the second step to prove that there are no eigenvalues outside a small neighborhood of rκ´, κ`s. Finally, in the third step we will show (7.1) and (7.2).
Step 1 : The proof of this step follows the same strategy as the proof of Theorem 2.9. Only instead of using Lemma 3.3 to estimate the error matrix D we will use Lemma 5.1. In analogy to the proof of (2.30) we begin by showing the entrywise bound In fact, following the same line of reasoning that was used to prove (3.10), but using (5.2) instead of (3.8) to estimate D max we see that for any ε ą 0. The last term on the right hand side can be absorbed into the left hand side and since ε was arbitrary (7.4) yields This inequality establishes a gap in the possible values that Λ can take, provided ε ă 1{2 because N´ε ě 1 ? N`1 N Im ζ . Exactly as we argued for (3.10) we can get rid of the indicator function in (7.5) by using a continuity argument together with a union bound in order to obtain (7.3).
As in the proof of Theorem 2.9 we now use the fluctuation averaging to get an improved convergence rate for the normalized trace, for all ζ P H with δ ď distpζ , rκ´, κ`sq ď 1 δ and Im ζ ě N´1`ε. Indeed, (7.6) is an immediate consequence of (7.3) and the fluctuation averaging Proposition 3.4.
Step 2 : In this step we use (7.6) to prove the following lemma. for a positive constant C, depending only on the model parameters K in addition to δ and ν.
In order to show (7.7) we consider the imaginary part of 1 N TrrG´Ms and use that where pλ i q N i"1 are the eigenvalues of H. We choose τ P r´δ´1, κ´´δs Y rκ``δ, δ´1s and η P rN´1`ε, 1s and estimate a single term in the sum on the right hand side by employing (7.6), Here, we used in the last inequality that 1 N Tr M is the Stieltjes transform of the density of states ρ with supp ρ Ď rκ´, κ`s. Since the left hand side of (7.8) is a Lipschitz continuous function in τ with Lipschitz constant bounded by N we can use a union bound to establish (7.8) first on a fine grid of τ -values and then uniformly for all τ and for the choice In particular, the eigenvalue λ i cannot be at position τ with very high probability, i.e.
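The detection mechanism of Step 2 can be illustrated in scalar form: the imaginary part of the normalized trace of the resolvent at ζ " τ`iη equals p1{Nq ř i η{ppτ´λ i q 2`η2 q, so a single eigenvalue within distance η of τ already forces this quantity to be of order 1{pNηq. The "eigenvalues" below are illustrative placeholders.

```python
import numpy as np

# (1/N) Im Tr G(τ + iη) = (1/N) Σ_i η / ((τ - λ_i)² + η²): a single
# eigenvalue at τ contributes 1/(N η), while far from the spectrum the
# whole sum is O(η).
lam = np.array([-1.0, 0.2, 0.5, 1.3])     # illustrative "eigenvalues"
N, eta = len(lam), 1e-3

def im_trace(tau):
    return np.mean(eta / ((tau - lam)**2 + eta**2))

assert im_trace(0.2) >= 1 / (2 * N * eta)   # eigenvalue sitting at τ
assert im_trace(3.0) <= eta                  # τ far from the spectrum
```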
Now we exclude that there are eigenvalues far away from the support of the density of states by using a continuity argument. Let Ă W be a standard GUE matrix with E| r w xy | 2 " 1 N , pλ pαq i q i the eigenvalues of H pαq :" α H`p1´αq Ă W for α P r0, 1s and κ :" sup α maxt|κ´pαq|, |κ`pαq|u, where κ˘pαq are defined as in (5.1) for the matrix H pαq . In particular, κ˘p0q " ˘2. Since the constant Cpδ, νq in (7.9) is uniform for all random matrices with the same model parameters K , we see that sup αPr0,1s The eigenvalues λ pαq i are Lipschitz continuous in α. In fact, |B α λ pαq i | ď H´Ă W ă ? N . Here, the simple bound on H´Ă W follows from E H´Ă W 2µ " E rTrpH´Ă Wq 2 s µ ď CpµqN µ , for some positive constant Cpµq, depending on µ, the upper bound κ 1 from (2.25) on the moments, the sequence κ 2 from (2.26) and P from (2.11). Thus we can use a union bound to establish Since for α " 0 all eigenvalues are in r´κ´2 δ, κ`2 δs with very high probability and with very high probability no eigenvalue can leave this interval by (7.10), we conclude that P " D i : |λ i | ě κ`2 δ ‰ ď Cpδ, νqN´ν.
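The Lipschitz continuity of the eigenvalues in α used above rests on Weyl's inequality |λ i pAq´λ i pBq| ď }A´B} for self-adjoint A, B; a quick numerical check (matrix size and seed are illustrative):

```python
import numpy as np

# Weyl's inequality: eigenvalues (sorted) of self-adjoint matrices are
# 1-Lipschitz with respect to the operator norm of the perturbation.
rng = np.random.default_rng(4)
N = 30
A = rng.standard_normal((N, N)); A = (A + A.T) / 2
B = rng.standard_normal((N, N)); B = (B + B.T) / 2
gap = np.max(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B)))
assert gap <= np.linalg.norm(A - B, 2) + 1e-10
```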
Together with (7.9) this finishes the proof of Lemma 7.3.
Step 3 : In this step we use (7.7) to improve the bound on the error matrix D away from rκ´, κ`s and thus show (7.1) and (7.2) by following the same strategy that was used in Step 1 and in the proof of Theorem 2.9. By Lemma 7.3 there are with very high probability no eigenvalues of H outside the interval rκ´´δ{2, κ``δ{2s.
Therefore, by eigenvalue interlacing, for any B Ď X also the submatrix H B of H has no eigenvalues outside this interval. In particular, for any x P XzB we have in a high probability event. As in the proof of Lemma 3.3 we bound the entries of the error matrix D by estimating the right hand sides of the equations (5.9a) to (5.9e) further. But now we use (7.11), so that the factors Im ζ in the denominators cancel and we end up with Dpζ q max ½pΛpζ q ď N´εq ă 1 ? N , (7.12) on the domain of spectral parameters with δ ď distpζ , rκ´, κ`sq ď 1{δ. Following the strategy of proof from Step 1 we see that (7.12) implies (7.1) and (7.2). This finishes the proof of Proposition 7.1.
Proof of Corollary 2.11. The proof follows a standard argument that establishes rigidity from the local law, which we present here for the convenience of the reader. The argument uses a Cauchy-integral formula that was also applied in the construction of the Helffer-Sjöstrand functional calculus (cf. [15]) and that already appeared in different variants in [28], [22] and [27].
Since r ε was arbitrary and there are no eigenvalues of H to the left of κ´´r δ (cf. Lemma 7.3), we infer for any τ P R with ρpτ q ě δ. Combining (7.13) with the definition (2.32) of ipτ q yields This in turn implies (2.33) and Corollary 2.11 is proven.
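The rigidity conclusion can be illustrated in the semicircle case: bulk eigenvalues of a GOE sample sit close to the quantiles of the semicircle density. The matrix size, grid, and tolerance below are illustrative choices, not quantitative claims from the corollary.

```python
import numpy as np

# Rigidity illustration: bulk GOE eigenvalues versus semicircle quantiles γ_i.
rng = np.random.default_rng(6)
N = 400
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)          # GOE normalization, spectrum ≈ [-2, 2]
lam = np.linalg.eigvalsh(H)

t = np.linspace(-2, 2, 400001)
rho = np.sqrt(np.clip(4 - t**2, 0, None)) / (2 * np.pi)
cdf = np.cumsum(rho) * (t[1] - t[0])
gamma = np.interp((np.arange(1, N + 1) - 0.5) / N, cdf / cdf[-1], t)

bulk = slice(N // 4, 3 * N // 4)        # stay away from the edges
assert np.max(np.abs(lam[bulk] - gamma[bulk])) < 0.1
```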

Bulk universality
Given the local law (Theorem 2.9), the proof of bulk universality (Corollaries 2.12 and 2.13) follows standard arguments based upon the three step strategy explained in the introduction. We will only sketch the main differences due to the correlations. We start by introducing an Ornstein-Uhlenbeck (OU) process on random matrices H t that conserves the first two mixed moments of the matrix entries where the covariance operator Σ : C NˆN Ñ C NˆN is given as and B t is a matrix of standard real (complex) independent Brownian motions with the appropriate symmetry Bt " B t for β " 1 (β " 2) whose distribution is invariant under the orthogonal (unitary) symmetry group. We remark that a large Gaussian component, as created by the flow (7.14), was first used in [39] to prove universality for the Hermitian symmetry class.
Along the flow the matrix $H_t = A + \frac{1}{\sqrt{N}} W_t$ satisfies the condition B3 on the dependence of the matrix entries uniformly in $t$. In particular, since $\Sigma$ determines the operator $S$, we see that $H_t$ is associated to the same MDE as the original matrix $H$. The conditions B4 and B5 can also be stated in terms of $\Sigma$, and are hence both conserved along the flow.
For the following arguments we write $W_t$ as a vector $w_t$ containing all degrees of freedom originating from the real and imaginary parts of the entries of $W_t$. This vector has $N(N+1)/2$ real entries for $\beta = 1$ and $N^2$ real entries for $\beta = 2$. We partition $X^2 = I_{\le} \,\dot\cup\, I_{>}$ into its upper, $I_{\le} := \{(x,y) : x \le y\}$, and lower, $I_{>} := \{(x,y) : x > y\}$, triangular parts. Then we identify
$$w_t((x,y)) := \tfrac{1}{\sqrt{N}}\, w_{xy} \quad \text{for } \beta = 1, \qquad w_t((x,y)) := \begin{cases} \tfrac{1}{\sqrt{N}}\, \operatorname{Re} w_{xy}, & (x,y) \in I_{\le}, \\ \tfrac{1}{\sqrt{N}}\, \operatorname{Im} w_{xy}, & (x,y) \in I_{>}, \end{cases} \quad \text{for } \beta = 2.$$
In terms of the vector $w_t$ the flow (7.14) takes the form (7.15), where $b_t = (b_t(\alpha))_\alpha$ is a vector of independent standard Brownian motions and $\Sigma^{1/2}$ is the square root of the covariance matrix corresponding to $H = H_0$:
$$\Sigma(\alpha, \beta) := \mathbb{E}\, w_0(\alpha)\, w_0(\beta).$$
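The bookkeeping above can be sketched in code (the function name is ours, and the $1/\sqrt{N}$ scalings are omitted for clarity): the real degrees of freedom of $W$ are flattened into a single real vector of length $N(N+1)/2$ for $\beta = 1$ and $N^2$ for $\beta = 2$.

```python
import numpy as np

# Bookkeeping sketch for the identification above (function name and the
# omission of the 1/sqrt(N) scalings are ours): flatten the independent
# real degrees of freedom of W into one real vector, of length N(N+1)/2
# for beta = 1 and N^2 for beta = 2.
def to_vector(W, beta):
    N = W.shape[0]
    upper = [(x, y) for x in range(N) for y in range(N) if x <= y]  # I_<=
    lower = [(x, y) for x in range(N) for y in range(N) if x > y]   # I_>
    if beta == 1:
        return np.array([W[x, y].real for (x, y) in upper])
    # beta = 2: real parts on I_<=, imaginary parts on I_>
    return np.array([W[x, y].real for (x, y) in upper]
                    + [W[x, y].imag for (x, y) in lower])

N = 5
W_sym = np.ones((N, N))       # beta = 1 stand-in
W_herm = np.eye(N) + 0j       # beta = 2 stand-in
print(len(to_vector(W_sym, 1)), len(to_vector(W_herm, 2)))  # 15 25
```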
Using (2.11) and B3 we see that for any $\alpha$
$$|B_2(\alpha)| \le N^{C\varepsilon} \qquad \text{and} \qquad |\Sigma(\alpha, \gamma)| \le C(\varepsilon, \nu)\, N^{-\nu}, \quad \gamma \notin B_1(\alpha), \qquad (7.16)$$
respectively. For any fixed $\alpha$, we denote by $w^\alpha$ the vector obtained by removing all the entries of $w$ which may become strongly dependent on the component $w(\alpha)$ along the flow (7.15), i.e., we define
$$w^\alpha(\gamma) := w(\gamma)\, \mathbb{1}\big(\gamma \notin B_2(\alpha)\big). \qquad (7.17)$$
In the case that $H$ has independent entries it was proven in [12] that the process (7.15) conserves the local eigenvalue statistics of $H$ up to times $t \ll N^{-1/2}$, provided the bulk local law holds uniformly in $t$ along the flow as well. We now show that this insight extends to dependent random matrices. The following result is a straightforward generalization of Lemma A.1 from [12] to matrices with dependent entries. We remark that a similar result was given independently in [14].
where $w^{\alpha,\theta}_s := w^\alpha_s + \theta(w_s - w^\alpha_s)$ for $\theta \in [0,1]$, and $\partial^k_{\alpha_1 \cdots \alpha_k} = \frac{\partial}{\partial w(\alpha_1)} \cdots \frac{\partial}{\partial w(\alpha_k)}$.

Proof. We suppress the $t$-dependence, i.e. we write $w = w_t$, etc. Ito's formula yields (7.20), where $\mathrm{d}M = \mathrm{d}M_t$ is a martingale term. Taylor expanding around $w = w^\alpha$, plugging the resulting expressions into (7.20) and taking the expectation, we obtain (7.21). Now we estimate the five terms on the right hand side of (7.21) separately. First, (7.21a) is small since $w(\alpha)$ is almost independent of $w^\alpha$ by B3 and (7.17). In the term (7.21b), if $\delta \in B_1(\alpha)$, then $w(\alpha) w(\delta)$ is almost independent of $w^\alpha$; if $\delta \in B_2(\alpha) \setminus B_1(\alpha)$, then $w(\alpha)$ is almost independent of $(w(\delta), w^\alpha)$, and we use (7.16). The last term containing derivatives is bounded by $\widetilde{\Xi}$. The term (7.21c) is negligible because $|\Sigma(\alpha, \delta)| \lesssim_{\varepsilon,\nu} N^{-\nu}$ and $|\mathbb{E}\, \partial^2_{\alpha\delta} f(w^\alpha)| \le \widetilde{\Xi}$. For (7.21d) we use (7.16) and the definition of $\Xi$ to bound the sum over $\alpha$ and $\delta, \gamma \in B_2(\alpha)$. The last term (7.21e) is estimated similarly, and the double sum over $\alpha, \delta$ produces a factor of size $CN$ due to the exponential decay of $\Sigma$. Combining the estimates for the five terms on the right hand side of (7.21) we obtain (7.18).
Second, we use Lemma 7.4 to show that $H$ and $H_t$ have the same local correlation functions in the bulk. Suppose $\rho(\omega) \ge \delta$ for some $\omega \in \mathbb{R}$. We show that the difference
$$(\tau_1, \dots, \tau_k) \mapsto \big(\rho^{k; t_N} - \rho^k\big)\Big(\omega + \frac{\tau_1}{N}, \dots, \omega + \frac{\tau_k}{N}\Big)$$
of the local $k$-point correlation functions $\rho^k$ and $\rho^{k; t_N}$ of $H$ and $H_{t_N}$, respectively, converges weakly to zero as $N \to \infty$. This convergence follows from standard arguments, provided that (7.18) applies to $F = F_N$, a function of $H$ expressed as a smooth function $\Phi$ of resolvent observables with $p \le k$ and $\xi_2 \in (0,1)$ sufficiently small. Here the derivatives of $\Phi$ may grow only as a negligible power of $N$ (for details see the proof of Theorem 6.4 in [27]). In particular, basic resolvent formulas bound the right hand side of (7.18), where $\Lambda_t$ is defined like $\Lambda$ in (3.6) but for the entries of $G_t(\zeta) := (H_t - \zeta)^{-1}$ with $\operatorname{Im}\zeta \ge N^{-1+\xi_2}$. In particular, we have used $|G_{xy}(t)| \le |m_{xy}(t)| + \Lambda_t \lesssim 1 + \Lambda_t$ here. The constant $\Xi$ from (7.19) is easily bounded by $N^{\varepsilon_1 + C\xi_2}$, where the arbitrarily small constant $\varepsilon_1 > 0$ originates from stochastic domination estimates for $\Lambda_t$ and the $|w_s(\alpha)|$'s. The constant $\widetilde{\Xi}$ from (7.19), on the other hand, is trivially bounded by $N^C$, since the resolvents satisfy trivial bounds in the regime $|\operatorname{Im}\zeta| \ge N^{-2}$, and the weight $|w(\alpha)|$ multiplying the third derivatives of $f$ is canceled for large values of $|w(\alpha)|$ by the inverse in the definition $G = (A + \frac{1}{\sqrt{N}} W - \zeta \mathbb{1})^{-1}$. Since the local law holds for $H_t$, uniformly in $t \in [0, t_N]$, we see that $\Lambda_t \le N^{\varepsilon_1} (N\eta)^{-1/2} \le N^{\varepsilon_1 - \xi_2/2}$ with very high probability. Choosing the exponents $\varepsilon, \varepsilon_1, \xi_1, \xi_2$ sufficiently small, the right hand side goes to zero as $N \to \infty$. This completes the proof of Corollary 2.12. Finally, the comparison estimate (7.25) and the rigidity bound (2.33) allow us to compare the gap distributions of $H_{t_N}$ and $H$, see Theorem 1.10 of [40]. This proves Corollary 2.13.
(iii) In the proof of the lower bound in the property (4.35) of the operator $F$.
The adjustments of the arguments in (i)-(iii) result in a weaker bound on the eigenmatrix $F$, namely (4.27) is replaced by a bound whose positive constants $C_1, C_2$ may depend on $L$ from A1'. The lower bound on the spectral gap $\vartheta$ in (4.28) is modified in a similar fashion, i.e., $\vartheta \gtrsim \|M\|^{-C_3}$ for some positive constant $C_3$, depending only on $L$. These two changes imply that the constant $C$ in (4.18) depends on $L$ as well, and so does the constant $c$ from Proposition 2.2.
Modification of (i): Recall the notation, the model parameters and the partition $A_1, \dots, A_K$ from A1', and that $P_k$ denotes the projection onto the subspace of vectors with support in $A_k$. The matrix $Z$ satisfies $z_{kk} = 1$ and $(Z^L)_{kl} \ge 1$ for every $k, l = 1, \dots, K$. The proof of (4.12) continues from (4.13) as follows: We multiply both sides of (4.13) with $P_j$ and take the normalized trace. Then we use (2.21) to get a lower bound on $\langle P_j \operatorname{Im} M \rangle$. Note that we used $Z = Z^{\mathrm{t}}$, which may be assumed because $S$ is symmetric with respect to the scalar product (2.1) on $\mathbb{C}^{N \times N}$. Since the diagonal entries of $Z$ are bounded away from zero by A1', we also have $w \gtrsim Tw$. Since $\operatorname{Im} M$ is positive definite, we conclude that $w$ has positive entries. By the Perron-Frobenius theorem the symmetric matrix $T$ with non-negative entries has a bounded spectral norm; therefore (4.12) holds.

Modification of (ii): Instead of using (4.14) in the first inequality of (4.16) we use (4.13) and the lower bound (A.3). We provide a proof of (A.3) under the assumption A1': Iterating (A.2) $2L$ times shows that the entries of $w$ are bounded from below by their sum, as in (A.5). In the second and fourth inequality of (A.5) we used A1', namely $z_{kk} = 1$ and $(Z^L)_{kl} \ge 1$, while in the third inequality the lower bound on the diagonal elements of $T^2$ was used. The last inequality of (A.5) holds because $\|M^{-1}\| \lesssim 1$ by (4.10) and $|A_k| \sim N$ by A1'. With (2.21) and $z_{kk} = 1$ we see the first inequality of (A.6). The second inequality of (A.6) follows from (A.4).
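The Perron-Frobenius step in the modification of (i) can be illustrated numerically (our own sketch): for a symmetric matrix $T$ with non-negative entries and any positive test vector $w$, the spectral norm of $T$ is bounded by the Collatz-Wielandt quantity $\max_x (Tw)_x / w_x$, so an entrywise inequality of the type $Tw \lesssim w$ yields a bounded spectral norm.

```python
import numpy as np

# Numerical illustration (our own sketch) of the Perron-Frobenius step:
# for a symmetric matrix T with non-negative entries and any positive
# vector w, the spectral norm of T is bounded by max_x (Tw)_x / w_x.
rng = np.random.default_rng(1)
K = 8
T = rng.random((K, K))
T = (T + T.T) / 2                        # symmetric, non-negative entries
w = 1.0 + rng.random(K)                  # a positive test vector
collatz_wielandt = np.max(T @ w / w)     # upper bound coming from T w <= C w
spectral_norm = np.max(np.abs(np.linalg.eigvalsh(T)))
print(spectral_norm, collatz_wielandt)   # spectral_norm <= collatz_wielandt
```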
Modification of (iii): Without the lower bound from (2.7) the lower bound from (4.35) may no longer be valid either. Thus the property (4.35) is replaced by:

• For all matrices $R \in \mathcal{C}_+$ the $L$-th iterate of $F$ satisfies the two-sided bound (A.7).

Here $L$ is the model parameter from Assumption A1'. The rest of the argument in the proof of Lemma 4.6, following the statement of the property (4.35) until the start of the proof of (4.35), remains unchanged.
Proof of (A.7): The upper bound in (A.7) follows from iterating the upper bound of (4.35), which still holds and whose proof does not require any adjustment. For the lower bound in (A.7) we use Assumption A1'. With the abbreviation (A.8) we show that for any $l = 1, \dots, L$ the $l$-th iterate of $F$ satisfies (A.9). The lower bound in claim (A.7) follows from (A.9) by taking $l = L$ and using $(Z^L)_{kj} \gtrsim 1$ (cf. A1') as well as $C_*[\mathbb{1}] = C_W[\operatorname{Im} M] \gtrsim \|M\|^{-2}\, \mathbb{1}$, which is a consequence of the lower bounds in (4.11) and (4.24).
We prove (A.9) by induction on $l$. For $l = 1$ we simply combine the definition of $F$ with (2.21). For the induction step we assume (A.9) for some $l < L$. Acting with $F$ on both sides and applying (2.21) again reveals (A.10). Now we use $\langle P_i \rangle = \frac{1}{N}|A_i| \gtrsim 1$ and the lower bounds from (4.11) and (4.24) again to bound $\langle C_*[P_i]^2 \rangle$ from below, which is (A.11). Plugging (A.11) into (A.10) completes the induction step. This finishes the proof of (A.9) and concludes the proof of (A.7) as well.
holds for every $\nu \in \mathbb{N}$ and $x, y \in X$ with some positive sequence $\beta = (\beta(\nu))_{\nu=0}^{\infty}$, and that $\|R^{-1}\| \lesssim 1$. Then there exists a sequence $\alpha = (\alpha(\nu))_{\nu=0}^{\infty}$, depending only on $\beta$ and $P$ (cf. (2.11)), such that (A.15) holds for every $\nu \in \mathbb{N}$ and $x, y \in X$.
Proof. Within this proof we adapt Convention 4.1 such that $\varphi \lesssim \psi$ means $\varphi \le C\psi$ for a constant $C$, depending only on $\mathcal{P} := (\beta, P)$. It suffices to prove (A.15) for $N \ge N_0$ for some threshold $N_0 \lesssim 1$; thus we will often assume $N$ to be large enough in the following. We split $R$ into a decaying component $S$ and an entrywise small component $T$, i.e. we define
$$R = S + T, \qquad s_{xy} := r_{xy}\, \mathbb{1}\Big(|r_{xy}| \ge \frac{2C_1}{N}\Big). \qquad (A.16)$$
The main part of the proof of Lemma A.2 is to show that $S$ has a bounded inverse, which is (A.17). We postpone the proof of (A.17) and show first how it is used to establish (A.15).
Since the entries of $S$ decay as $|s_{xy}| \le \frac{\beta(\nu)}{(1 + d(x,y))^{\nu}}$ for any $\nu \in \mathbb{N}$, we can apply Lemma A.1 to get the decay of the entries of $S^{-1}$ to arbitrarily high polynomial order $\nu \in \mathbb{N}$, as stated in (A.18). In particular, we find that the $\|\cdot\|_{1\vee\infty}$-norm (introduced in (4.50)) of $S^{-1}$ is bounded. We now show that $\|R^{-1} - S^{-1}\|_{\max} \lesssim \frac{1}{N}$, which together with (A.18) implies (A.15). For a matrix $Q \in \mathbb{C}^{N \times N}$, viewed as an operator mapping between $\mathbb{C}^N$ equipped with the standard Euclidean and the maximum norm, we use the induced operator norms. We write the difference between $R^{-1}$ and $S^{-1}$ as
$$R^{-1} - S^{-1} = -S^{-1} T R^{-1} = -R^{-1} T S^{-1}. \qquad (A.20)$$
The first equality in (A.20) implies a bound in which we used, for any $Q \in \mathbb{C}^{N \times N}$, the corresponding operator norm estimates together with (A.19) and $\|T\|_{\max} \lesssim \frac{1}{N}$ from the definition of $T$ in (A.16). The second equality in (A.20), on the other hand, implies the complementary bound, where (A.19) and (A.21) were used in the second inequality. This finishes the proof of Lemma A.2 up to verifying (A.17).
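The perturbation identity (A.20) behind this comparison can be checked numerically; the matrices below are random stand-ins, not the $S$ and $T$ of the proof.

```python
import numpy as np

# Check of the perturbation identity (A.20) on random stand-ins (not the
# S and T of the proof): for R = S + T with S and R invertible,
#     R^{-1} - S^{-1} = -S^{-1} T R^{-1} = -R^{-1} T S^{-1}.
rng = np.random.default_rng(2)
n = 6
S = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # well-conditioned part
T = 0.01 * rng.standard_normal((n, n))             # small perturbation
R = S + T
Rinv, Sinv = np.linalg.inv(R), np.linalg.inv(S)
diff = Rinv - Sinv
print(np.max(np.abs(diff + Sinv @ T @ Rinv)),      # ~ machine precision
      np.max(np.abs(diff + Rinv @ T @ Sinv)))
```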
We split $R^* R$ into a decaying and an entrywise small piece, as we did with $R$ itself in (A.16),
$$R^* R = L + K, \qquad L := S^* S, \qquad K := S^* T + T^* S + T^* T.$$
From the corresponding properties of $S$ and $T$ we easily see that $L = (l_{xy})_{x,y}$ is decaying and that $K$ has entries of order $\frac{1}{N}$.
Using the a priori knowledge from the assumption $\|R^{-1}\| \lesssim 1$ of Lemma A.2, we will show that $\|L^{-1}\| \lesssim 1$, which is equivalent to (A.17). Note that both $L$ and $K$ are self-adjoint. Via spectral calculus we write $K$ as a sum of a matrix $K_s$ with small spectral norm and a matrix $K_b$ with bounded rank,
$$K = K_s + K_b, \qquad (A.24)$$
where $K_b$ is the spectral part of $K$ corresponding to eigenvalues of modulus larger than $\varepsilon$, with some $\varepsilon > 0$ determined later. Indeed, from the Hilbert-Schmidt norm bound on the eigenvalues $\lambda_i(K)$ of $K$, we see that $\operatorname{rank} K_b \lesssim \frac{1}{\varepsilon^2}$. On the other hand, $\|K_s\| \le \varepsilon$ by its definition in (A.24). Since $L + K$ has a bounded inverse (cf. (A.23)), so does $L + K_b$ for small enough $\varepsilon$, i.e.
$$\|(L + K_b)^{-1}\| \lesssim 1. \qquad (A.25)$$
Now we fix $\varepsilon \sim 1$ such that (A.25) is satisfied. In particular, the eigenvalues of $L + K_b$ are separated away from zero. Since $\operatorname{rank} K_b \lesssim 1$, we can apply the interlacing property of rank one perturbations finitely many times to see that there are only finitely many eigenvalues of $L$ in a neighborhood of zero, i.e.
$$\operatorname{rank}\big[L\, \mathbb{1}_{[0, c_1)}(L)\big] \lesssim 1, \qquad (A.26)$$
for some constant $c_1 \sim 1$. In particular, there are constants $c_2, c_3 \sim 1$ with $c_2 + c_3 \le c_1$ such that $L$ has a spectral gap at $[c_2, c_2 + c_3]$. We split $L$ into the finite rank part $L_s$ associated to the spectrum below the gap and the rest,
$$L = L_s + L_b, \qquad L_s := L\, \mathbb{1}_{[0, c_2)}(L), \qquad L_b := L\, \mathbb{1}_{(c_2 + c_3, \infty)}(L).$$
The rest of the proof is devoted to showing that $L_s \gtrsim \mathbb{1}_{[0, c_2)}(L)$, which implies that $L$ has a bounded inverse and thus shows (A.17). More precisely, we will show that there are points $x_1, \dots, x_L$ with $L := \operatorname{rank} L_s \lesssim 1$ (cf. (A.26)) and a positive sequence $(C(\nu))_{\nu \in \mathbb{N}}$ such that (A.28) holds for any $\nu \in \mathbb{N}$ and $x \in X$, where $(e_x)_{x \in X}$ denotes the canonical basis of $\mathbb{C}^N$. Let $l = (l_x)$ be any normalized eigenvector of $L_s$ in the image of $\mathbb{1}_{[0, c_2)}(L)$ with associated eigenvalue $\lambda$. We need to show that $\lambda \gtrsim 1$. Since $\langle l, \mathbb{1}_{[0, c_2)}(L) e_x \rangle = l_x$, the decay property (A.28) of the spectral projection $\mathbb{1}_{[0, c_2)}(L)$ away from the finitely many centers $x_i$ implies that the components $l_x$ decay to arbitrarily high polynomial order away from the points $x_1, \dots, x_L$. In particular, $\sum_x |l_x|$ is bounded, and therefore (cf. (A.22)) we get a lower bound on the eigenvalue $\lambda$, where we used (A.23) for the inequality. Thus $\lambda \gtrsim 1$ for large enough $N$. Now we prove (A.28) by induction. We show that for any $l = 0, \dots, L$ there is an $l$-dimensional subspace of the image of $\mathbb{1}_{[0, c_2)}(L)$ such that the associated orthogonal projection $P_l$ satisfies (A.29). The induction is over $l$. For $l = 0$ there is nothing to show. Now suppose that (A.29) has been established for some $l < L$.
We will now see that it then holds with $l$ replaced by $l+1$ as well.
We maximize the norm of the vectors $(\mathbb{1}_{[0, c_2)}(L) - P_l) e_x$ over $x$ and pick the index $x_{l+1}$ where the maximum is attained,
$$\xi := \max_x \big\| (\mathbb{1}_{[0, c_2)}(L) - P_l)\, e_x \big\| = \big\| (\mathbb{1}_{[0, c_2)}(L) - P_l)\, e_{x_{l+1}} \big\|. \qquad (A.30)$$
Here $\xi > 0$ since $l < L$. Now we extend the projection $P_l$ by the normalized vector $v$ defined as
$$P_{l+1} := P_l + v v^*, \qquad v := \frac{1}{\xi}\, (\mathbb{1}_{[0, c_2)}(L) - P_l)\, e_{x_{l+1}}. \qquad (A.31)$$
The vector $v$ so defined attains its maximal component at the point $x_{l+1}$, and the value of this component is $\xi$, since for any $x$ we have (A.32). Here we used (A.30) and that $\mathbb{1}_{[0, c_2)}(L) - P_l \ge 0$ is an orthogonal projection.

We will show that $P_{l+1}$ satisfies (A.29) with $l$ replaced by $l+1$. We start by establishing that $\xi \gtrsim 1$. We write $\|v\|^2$ as a sum of contributions originating from the neighborhoods $B := \bigcup_{i=1}^{l+1} B_R(x_i)$ of the points $x_i$, with some radius $R$ to be determined later, and from their complement. We estimate the components of $v$ by using (A.32) and the definition of $v$ in (A.31); in the second inequality we estimate the size of $B$ with (2.11). Now we choose $R := \xi^{-1/P}$ and $\nu := \lceil 4P \rceil$. Using $\xi \le 1$ (cf. (A.30)), we obtain that the right hand side of (A.37) is bounded by a constant multiple of $\xi$. Thus (A.37) proves $\xi \gtrsim 1$. We finish the induction by using the definition (A.31) of $v$ and estimating
$$\|P_{l+1} e_x\| \le \|P_l e_x\| + |v_x| \le \|P_l e_x\| + \frac{1}{\xi}\Big( \|P_l e_x\| + \big| \big(\mathbb{1}_{[0, c_2)}(L)\big)_{x_{l+1} x} \big| \Big).$$
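The spectral cutoff $K = K_s + K_b$ used at the start of this argument can be sketched numerically (our own illustration; the test matrix and the value of $\varepsilon$ are arbitrary choices): the part of a self-adjoint $K$ with eigenvalues of modulus above $\varepsilon$ has rank at most $\|K\|_{\mathrm{hs}}^2 / \varepsilon^2$, while the remainder has spectral norm at most $\varepsilon$.

```python
import numpy as np

# Sketch (our own illustration) of the spectral cutoff K = K_s + K_b:
# split a self-adjoint K into the spectral part with eigenvalues of
# modulus > eps (bounded rank, controlled by the Hilbert-Schmidt norm)
# and the rest (spectral norm at most eps).
rng = np.random.default_rng(3)
n, eps = 50, 1.0
A = rng.standard_normal((n, n)) / np.sqrt(n)
K = A + A.T                                   # self-adjoint test matrix
lam, U = np.linalg.eigh(K)
big = np.abs(lam) > eps
K_b = (U[:, big] * lam[big]) @ U[:, big].T    # bounded-rank part
K_s = K - K_b                                 # small part, ||K_s|| <= eps
rank_bound = np.sum(lam ** 2) / eps ** 2      # rank K_b <= ||K||_hs^2 / eps^2
print(np.sum(big), rank_bound,
      np.max(np.abs(np.linalg.eigvalsh(K_s))))
```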
Since the right hand side inherits the required decay, (A.29) holds with $l$ replaced by $l+1$, which completes the induction.

Suppose that the operator $\mathcal{T}$ satisfies two-sided bounds, where $c$ and $C$ are fixed positive constants. Then there is a gap in the spectrum of $\mathcal{T}$. The eigenvalue $1$ is non-degenerate and the corresponding normalized, $\|T\|_{\mathrm{hs}} = 1$, eigenmatrix $T \in \mathcal{C}_+$ satisfies the bounds (A.40).

Now we show the existence of a spectral gap and that $1$ is a non-degenerate eigenvalue. Let $R \in \mathbb{C}^{N \times N}$ be normalized, $\|R\|_{\mathrm{hs}} = 1$, and orthogonal to $T$, i.e., $\langle T, R \rangle = 0$. Since $\mathcal{T}$ preserves $\mathcal{C}_+$ and therefore $\mathcal{T}[R]^* = \mathcal{T}[R^*]$, we have
$$\langle R,\, (\mathrm{Id} \pm \mathcal{T})[R] \rangle = \langle \operatorname{Re} R,\, (\mathrm{Id} \pm \mathcal{T})[\operatorname{Re} R] \rangle + \langle \operatorname{Im} R,\, (\mathrm{Id} \pm \mathcal{T})[\operatorname{Im} R] \rangle.$$
Thus, it suffices to show (A.42) for self-adjoint matrices $R$. For a normalized self-adjoint $R \in \mathbb{C}^{N \times N}$ with $\langle T, R \rangle = 0$ we use the spectral representation (A.43) with the orthonormal eigenbasis $(r_i)_{i=1}^N$ of $R$. Plugging (A.43) into the right hand side of (A.42) reveals the identity
$$\langle R,\, (\mathrm{Id} \pm \mathcal{T})[R] \rangle = q^* (1 \pm S)\, q, \qquad (A.44)$$
where we introduced the vector $q \in \mathbb{R}^N$ of eigenvalues of $R$ and the matrix $S \in \mathbb{R}^{N \times N}$ with non-negative entries. The vector $q$ is normalized since $\|q\| = \|R\|_{\mathrm{hs}} = 1$, and the matrix $S$ is symmetric because of the self-adjointness of $\mathcal{T}$:
$$s_{ij} = \langle r_i r_i^*,\, \mathcal{T}[r_j r_j^*] \rangle = \langle \mathcal{T}[r_i r_i^*],\, r_j r_j^* \rangle = s_{ji}.$$
Furthermore, by (A.38) the entries of $S$ satisfy lower and upper bounds. In particular, by the Perron-Frobenius theorem, the matrix $S$ has a unique normalized eigenvector $s$ with positive entries whose associated eigenvalue equals the spectral norm,
$$S s = \|S\|\, s. \qquad (A.46)$$
We will now show that $S$ has a spectral gap and that $q$ has a non-vanishing component in the direction orthogonal to $s$. This will imply that $|q^* S q|$ is bounded away from $1$, and that the Perron-Frobenius eigenvector $s = (s_i)_i$ satisfies the entrywise bounds, where we used (A.46), (A.48), (A.45) and $\|s\| = 1$ in that order. Now we employ Lemma 5.6 from [2] to see that $S$ has a spectral gap (A.49). Finally, we show that there is a non-vanishing component of $q$ in the direction orthogonal to $s$. More precisely, we write
$$q = \big(1 - \|w\|^2\big)^{1/2}\, s + w, \qquad (A.50)$$
for some $w \perp s$. By taking the scalar product with $t := (N^{-1/2}\, r_i^* T r_i)_{i=1}^N$ and using the orthogonality $t^* q = \langle T, R \rangle = 0$, we get the equality in (A.51). The first inequality in (A.51) follows from the lower bound on $T$ in (A.40). Since $\|t\| \le \|T\|_{\mathrm{hs}} = 1$ and the entries of $s$ are comparable (cf. (A.48)), we conclude the lower bound (A.52) on $\|w\|$, where we used $c \le C$. Combining (A.49) with (A.50) and (A.52) yields the desired gap estimate, where we also used $\|S\| \le 1$. Thus, (A.47) is established and Lemma A.3 is proven.
Lemma A.4 (Quantitative implicit function theorem). Let $T : \mathbb{C}^A \times \mathbb{C}^D \to \mathbb{C}^A$ be a continuously differentiable function with $T(0,0) = 0$ and invertible derivative $\nabla^{(1)} T(0,0)$ at the origin with respect to the first argument. Suppose $\mathbb{C}^A$ and $\mathbb{C}^D$ are equipped with norms that we both denote by $\|\cdot\|$, and let the linear operators on these spaces be equipped with the corresponding induced operator norms. Let $\delta > 0$ be such that $\nabla^{(1)} T$ remains invertible on $B^A_\delta \times B^D_\delta$, where $B^\#_\delta$ is the $\delta$-ball around $0$ with respect to $\|\cdot\|$ in $\mathbb{C}^\#$. Suppose that $\|(\nabla^{(1)} T)^{-1}\| \le C_1$ and $\|\nabla^{(2)} T\| \le C_2$ on this set for some positive constants $C_1, C_2$, where $\nabla^{(2)}$ is the derivative with respect to the second variable. Then there is a constant $\varepsilon > 0$, depending only on $\delta$, $C_1$ and $C_2$, and a unique function $f : B^D_\varepsilon \to B^A_\delta$ such that $T(f(d), d) = 0$ for all $d \in B^D_\varepsilon$. The function $f$ is continuously differentiable. If $T$ is analytic, then so is $f$.
Proof. The proof is elementary and left to the reader.
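A one-dimensional toy instance of Lemma A.4 (our own example, not part of the lemma) shows the mechanism: near the origin the root $a = f(d)$ of $T(a, d) = 0$ is produced by the contraction underlying the standard proof.

```python
# One-dimensional toy instance of Lemma A.4 (our own example): solve
# T(a, d) = a - d + a^2 = 0 for a = f(d) near the origin, where
# dT/da(0, 0) = 1 is invertible, via the fixed-point iteration
# a -> a - T(a, d) that underlies the standard proof.
def T(a, d):
    return a - d + a * a

def f(d, n_iter=60):
    a = 0.0
    for _ in range(n_iter):
        a = a - T(a, d)      # contraction for small d since dT/da ~ 1
    return a

d = 0.05
a = f(d)
print(a, T(a, d))            # T(f(d), d) vanishes up to rounding
```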
Lemma A.5 (Linear large deviation). Let $X = (X_x)_{x \in X}$ and $b = (b_x)_{x \in X}$ be families of random variables that satisfy the following assumptions: (i) The family $X$ is centered, $\mathbb{E} X_x = 0$.
(ii) The family X has uniformly bounded moments, i.e.
Then the large deviation estimate holds for any $\nu \in \mathbb{N}$.
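The conclusion can be sanity-checked by Monte Carlo in the simplest setting (our own sketch, for independent Gaussian $X$ only; the lemma of course covers far more general families): the moments of $\sum_x b_x X_x$ are controlled by powers of $(\sum_x |b_x|^2)^{1/2}$, uniformly in $N$.

```python
import numpy as np

# Monte Carlo sanity check (our own sketch, independent Gaussian X only):
# the moments of sum_x b_x X_x are controlled by powers of
# (sum_x |b_x|^2)^{1/2}, uniformly in the dimension N.
rng = np.random.default_rng(4)
results = {}
for N in (100, 1000):
    b = rng.standard_normal(N)
    b /= np.linalg.norm(b)                 # normalize: sum_x |b_x|^2 = 1
    X = rng.standard_normal((5000, N))     # 5000 samples of the family X
    S = X @ b                              # samples of sum_x b_x X_x
    results[N] = (np.mean(S ** 2), np.mean(S ** 4))
print(results)                             # second/fourth moments ~ 1 and ~ 3
```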
Step 1: In this step we introduce a trivial cutoff both for $X$ and $b$. We show that it suffices to prove the moment bound (A.58) for the truncated quantities $\widetilde{X}$ and $\widetilde{b}$, where $\widetilde{\nu} \in \mathbb{N}$ and $\theta : [0, \infty) \to [0, 1]$ is a smooth cutoff function such that $\theta|_{[0, 1/2]} = 1$ and $\theta|_{[1, \infty)} = 0$. We will now see that $\widetilde{X}$ and $\widetilde{b}$ satisfy the assumptions of Lemma A.5. For this purpose, let $\varphi$ be smooth. We use the shorthand notations
$$\theta = \theta\big(N^{-1} |X_x|^2\big), \qquad \varphi = \varphi(X_A), \qquad \widetilde{\varphi} = \varphi(\widetilde{X}_A).$$
For the first bound in (A.61) we estimate, for any $\nu \in \mathbb{N}$, using $\mathbb{E} X_x = 0$ in the first step, the Cauchy-Schwarz inequality in the second step, and finally (A.54) for the last inequality. We conclude that
$$|X_x - \widetilde{X}_x| \le |X_x|(1 - \theta) + \big| \mathbb{E}[X_x \theta] \big| \le |X_x|\, \mathbb{1}\big(2|X_x|^2 > N\big) + C(\nu)\, N^{-\nu}.$$
In combination with (A.54) we infer $\mathbb{E} |X_x - \widetilde{X}_x|^\mu \le C(\mu, \nu)\, N^{-\nu}$. Integrating these bounds with respect to the distribution of $X$ yields (A.61).
With the help of (A.61) the assumptions (A.54) and (A.55) are easily verified with $X$ replaced by $\widetilde{X}$.
The assumption (A.56) is checked similarly. Furthermore, $\widetilde{X}$ and $\widetilde{b}$ obviously satisfy the additional bounds (A.59). Now suppose that (A.58) holds with $X, b$ replaced by $\widetilde{X}, \widetilde{b}$; in particular, $\sum_x \widetilde{b}_x \widetilde{X}_x$ is bounded. Then the analogous bound follows for the original sum, and since $\widetilde{\nu}$ was arbitrary, Lemma A.5 is proven, up to checking (A.58) for random variables $X$ and $b$ that satisfy (A.59) in addition to the assumptions of the lemma.
Step 2: In this step we completely remove the weak dependence between $X$ and $b$, i.e. we show that it is enough to prove (A.58) for a centered family $X$ that is independent of $b$ and satisfies (A.59), (A.54) and (A.55). Indeed, suppose that $X$ and $b$ are not independent but satisfy (A.56) instead. Let $\widetilde{b}$ be a copy of $b$ that is independent of $X$ and $b$. We show that for any $\mu, \nu \in \mathbb{N}$ the difference of the corresponding moments is at most $C(\mu, \nu)\, N^{-\nu}$; this is (A.65). To prove (A.65) we expand the powers on the left hand side, compare term by term, and bound the left hand side of (A.65) by $N^{2\mu}$ times a maximum taken over all $x_1, \dots, x_{2\mu} \in X$. Now we employ (A.56) as well as the bounds $|b_x| \le 1$ and $|X_x| \le \sqrt{N}$ to infer (A.65).
Step 3: By Step 1 and Step 2 we may assume, for the proof of (A.58), that $X$ is independent of $b$ and that these random vectors satisfy (A.59), (A.54) and (A.55). In this final step we construct, for every $\varepsilon > 0$, a partition of $X$ into non-empty sets $I_1, \dots, I_K$ with the following properties: (i) with a constant $C(\varepsilon)$, depending only on $\varepsilon$ and $P$ (cf. (2.11)), the size of the partition is bounded as in (A.66); (ii) the elements within each $I_k$ are far from each other, hence the corresponding components of $X$ and $b$ are practically independent. We postpone the construction of this partition to the end of the proof and explain first how it is used to get (A.58). We split the sum according to the partition; by the estimate (A.66) on the size of the partition and by choosing $\varepsilon$ sufficiently small, it remains to show (A.68). For an independent family $(X_x)_{x \in I_k}$ satisfying (A.54), the moment bound (A.68) would be a simple consequence of the Marcinkiewicz-Zygmund inequality. Therefore, (A.68) follows from (A.69) for all $\mu, \nu \in \mathbb{N}$, where $\widetilde{X} = (\widetilde{X}_x)_x$ is an independent family of random variables, which is also independent of $X$ and $b$ and has the same marginal distributions as $X$.
To show (A.69) we expand the powers on the left hand side and use the independence of $b$ from $X$ and $\widetilde{X}$ as well as the upper bound $|b_x| \le 1$, bounding the left hand side of (A.69) by $N^{2\mu}$ times a maximum over all $\xi = (x_1, \dots, x_{2\mu}) \in I_k^{2\mu}$. For such a $\xi$ let $\xi_1, \dots, \xi_R \in I_k$ denote the distinct indices appearing within $\xi$; clearly $R \le 2\mu$. Let furthermore the non-negative integers $\mu_1, \dots, \mu_R$ and $\widetilde{\mu}_1, \dots, \widetilde{\mu}_R$ denote the corresponding numbers of appearances within $(x_1, \dots, x_\mu)$ and $(x_{\mu+1}, \dots, x_{2\mu})$, respectively. Then we can further estimate the term inside the maximum on the right hand side of (A.70) corresponding to $\xi$ by using a telescopic sum. We now construct the partition $I_1, \dots, I_K$ with the properties (i) and (ii) above inductively. Suppose that the disjoint sets $I_1, \dots, I_k$ have already been constructed. Then we pick an arbitrary $x_0 \in J_0 := X \setminus (I_1 \cup \dots \cup I_k)$. Next we pick $x_1 \in J_1 := J_0 \setminus B_{N^\varepsilon}(x_0)$, then $x_2 \in J_2 := J_1 \setminus B_{N^\varepsilon}(x_1)$, and so on. The process stops at some step $L$ when $J_{L+1}$ is empty, and we set $I_{k+1} := \{x_0, \dots, x_L\}$. By construction, property (ii) is satisfied for all elements $I_k$ of the partition. We verify the upper bound (i) on the number $K$ of such elements. For every $k$ we have the volume bound (A.73), where we used (2.11) for the last inequality. In particular, (A.73) provides a lower bound on the size of $I_k$, which we use to obtain
$$N - \sum_{l=1}^{k} |I_l| \le \big(1 - N^{-\varepsilon P}\big)\Big(N - \sum_{l=1}^{k-1} |I_l|\Big).$$
Since $I_K$ contains at least one element, we infer by induction that
$$1 \le N - \sum_{l=1}^{K-1} |I_l| \le \big(1 - N^{-\varepsilon P}\big)^{K-1} N \le N e^{-(K-1) N^{-\varepsilon P}}.$$
We solve for K and thus see that (i) holds true. This finishes the proof of Lemma A.5.
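The greedy construction just described can be sketched in code (our own illustration, with points on a line and the metric $d(x, y) = |x - y|$ as a stand-in for the abstract metric on $X$): each block is built by repeatedly picking a point and discarding its $N^\varepsilon$-neighborhood, so all blocks are internally well separated.

```python
# Sketch of the greedy construction of the well-separated partition
# I_1, ..., I_K described above (our own illustration; the metric
# d(x, y) = |x - y| on a line stands in for the abstract metric on X).
def separated_partition(points, R):
    remaining = list(points)
    partition = []
    while remaining:
        block, rest = [], remaining
        while rest:
            x0 = rest[0]
            block.append(x0)
            # discard everything within distance R of the chosen point
            rest = [x for x in rest if abs(x - x0) > R]
        partition.append(block)
        remaining = [x for x in remaining if x not in block]
    return partition

N, eps = 200, 0.5
blocks = separated_partition(range(N), N ** eps)
# minimal distance between two distinct points of the same block
sep = min(abs(x - y) for B in blocks for x in B for y in B if x != y)
print(len(blocks), sep)   # each block is internally N^eps-separated
```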
Lemma A.6 (Quadratic large deviation). Let $X = (X_x)_{x \in X}$, $Y = (Y_x)_{x \in X}$ and $b = (b_{xy})_{x, y \in X}$ be families of random variables that satisfy the following assumptions: (i) The families $X$ and $Y$ are centered, $\mathbb{E} X_x = \mathbb{E} Y_x = 0$.
(ii) Both X and Y have uniformly bounded moments, i.e.
$\mathbb{E} |X_x|^\mu + \mathbb{E} |Y_x|^\mu \le \beta_1(\mu)$, $\mu \in \mathbb{N}$, for all $x \in X$ and some sequence $\beta_1$ of positive constants.
(iii) The correlations within $X$ and $Y$ decay, i.e. for $\varepsilon > 0$, sets $A, B \subseteq X$ with $d(A, B) \ge N^\varepsilon$, and smooth functions $\varphi : \mathbb{C}^{|A|} \to \mathbb{C}$ and $\psi : \mathbb{C}^{|B|} \to \mathbb{C}$, the corresponding decorrelation estimate holds for any $\nu \in \mathbb{N}$.
Proof. We use Convention 4.1 such that $\varphi \lesssim \psi$ means $\varphi \le C\psi$ for a constant $C$, depending only on $\widetilde{P} := (\beta_1, \beta_2, \beta_3, P)$ (cf. (2.11)). The proof of Lemma A.6 follows a similar strategy as the proof of Lemma A.5. Exactly as in Step 1 of the proof of Lemma A.5, we introduce new families of centered random variables
$$\widetilde{X}_x := (1 - \mathbb{E})\big[X_x\, \theta(N^{-1}|X_x|^2)\big], \qquad \widetilde{Y}_x := (1 - \mathbb{E})\big[Y_x\, \theta(N^{-1}|Y_x|^2)\big],$$
and rescaled coefficients
$$\widetilde{b}_{xy} := \frac{b_{xy}}{\big(\sum_{u,v} |b_{uv}|^2\big)^{1/2} + N^{-\widetilde{\nu}}}.$$
In this way we reduce the proof of (A.76) to showing the moment bound (A.77). Following Step 2 of the proof of Lemma A.5 and using (A.75), we may also assume that $b$ is independent of $(X, Y)$.
To show (A.77) we fix $\varepsilon > 0$ and choose the partition $I_1, \dots, I_K$ of the index set $X$ from Step 3 of the proof of Lemma A.5. In particular, (A.66) and (A.67) are satisfied. We split the sums over $x$ and $y$ in (A.77) according to this partition; by choosing $\varepsilon$ sufficiently small and using (A.66), it suffices to show that for any fixed $k, l = 1, \dots, K$ the moment bound (A.79) holds. For any $x \in I_k$ and $y \in I_l$ we introduce the relation $x \approx y$ whenever $d(x, y) \le \frac{N^\varepsilon}{3}$; if $d(x, y) > \frac{N^\varepsilon}{3}$, we correspondingly write $x \not\approx y$. Since the distances of indices within the set $I_k$ are bounded from below by $N^\varepsilon$ (cf. (A.67)), we see that for every $x \in I_k$ there exists at most one $y \in I_l$ such that $x \approx y$, and vice versa. We set $\iota(x) := 1$ if there exists $y \in I_l$ with $x \approx y$, and $\iota(x) := 0$ otherwise, for any $x \in I_k$, and analogously for $y \in I_l$. Note that if $k = l$, then $\iota(x) = 1$ for all $x \in I_k$. Furthermore, let us define
$$\mathcal{S} := \Big\{ (S, T) : S \subseteq I_k,\ T \subseteq I_l,\ \text{such that } x \not\approx y \text{ for all } x \in S,\ y \in T \Big\},$$
the pairs of subsets with a distance of at least $\frac{N^\varepsilon}{3}$. Inspired by Appendix B of [19] we use the partition of unity
$$1 = \frac{\sigma_{xy}}{|\mathcal{S}|} \sum_{(S, T) \in \mathcal{S}} \mathbb{1}(x \in S)\, \mathbb{1}(y \in T), \qquad x \in I_k,\ y \in I_l,\ x \not\approx y, \qquad (A.80)$$
where we introduced the numbers
$$\sigma_{xy} := \begin{cases} 4 & \text{if } \iota(x) = \iota(y) = 0, \\ 6 & \text{if } \iota(x) = 0 \text{ and } \iota(y) = 1, \\ 6 & \text{if } \iota(x) = 1 \text{ and } \iota(y) = 0, \\ 9 & \text{if } \iota(x) = \iota(y) = 1. \end{cases}$$
We split the sum in (A.79) into a sum over pairs $(x, y)$ with $x \approx y$ and with $x \not\approx y$. Afterwards we use (A.80) and find
$$\sum_{x \in I_k,\, y \in I_l} b_{xy}\big(X_x Y_y - \mathbb{E}\, X_x Y_y\big) = U + \frac{1}{|\mathcal{S}|} \sum_{(S, T) \in \mathcal{S}} V(S, T),$$
with the shorthand notation
$$U := \sum_{\substack{x \in I_k,\, y \in I_l \\ x \approx y}} b_{xy}\big(X_x Y_y - \mathbb{E}\, X_x Y_y\big), \qquad V(S, T) := \sum_{x \in S,\, y \in T} \sigma_{xy}\, b_{xy}\big(X_x Y_y - \mathbb{E}\, X_x Y_y\big).$$
Thus, proving (A.79) reduces to showing, for any pair of index sets $(S, T) \in \mathcal{S}$, that $\mathbb{E} |U|^{2\mu} + \mathbb{E} |V(S, T)|^{2\mu} \le C(\mu)$.
The moment bound on $U$ can be obtained with exactly the same argument as (A.58) in the proof of Lemma A.5, since the centered random variables $X_x Y_y - \mathbb{E}\, X_x Y_y$ appearing in this sum are almost independent. The moment bound on $V(S, T)$ follows by comparing the moments of $V(S, T)$ with the moments of
$$\widetilde{V}(S, T) := \sum_{x \in S,\, y \in T} \sigma_{xy}\, b_{xy}\big(\widetilde{X}_x \widetilde{Y}_y - \mathbb{E}\, \widetilde{X}_x \widetilde{Y}_y\big),$$
where $\widetilde{X}_S = (\widetilde{X}_x)_{x \in S}$ and $\widetilde{Y}_T = (\widetilde{Y}_x)_{x \in T}$ are independent families of random variables, independent of $(b, X, Y)$ as well, with the same marginal distributions as $X_S = (X_x)_{x \in S}$ and $Y_T = (Y_x)_{x \in T}$, respectively. The result of this comparison is
$$\Big| \mathbb{E} |V(S, T)|^{2\mu} - \mathbb{E} |\widetilde{V}(S, T)|^{2\mu} \Big| \le C(\mu, \nu)\, N^{-\nu},$$
because $X_S$ and $Y_T$ are essentially uncorrelated since $d(S, T) \ge \frac{N^\varepsilon}{3}$, and because the variables within the families $X_S$ and $Y_T$ themselves are already essentially uncorrelated (cf. (A.67)). Finally, the moments of $\widetilde{V}(S, T)$ satisfy the necessary bound by the Marcinkiewicz-Zygmund inequality, as in the proof of Lemma A.5. The details are left to the reader.