Universality for general Wigner-type matrices

We consider the local eigenvalue distribution of large self-adjoint $N\times N$ random matrices $\mathbf{H}=\mathbf{H}^*$ with centered independent entries. In contrast to previous works the matrix of variances $s_{ij} = \mathbb{E}\, |h_{ij}|^2 $ is not assumed to be stochastic. Hence the density of states is not the Wigner semicircle law. Its possible shapes are described in the companion paper [1]. We show that as $N$ grows, the resolvent, $\mathbf{G}(z)=(\mathbf{H}-z)^{-1}$, converges to a diagonal matrix, $ \mathrm{diag}(\mathbf{m}(z)) $, where $\mathbf{m}(z)=(m_1(z),\dots,m_N(z))$ solves the vector equation $ -1/m_i(z) = z + \sum_j s_{ij} m_j(z) $ that has been analyzed in [1]. We prove a local law down to the smallest spectral resolution scale, and bulk universality for both real symmetric and complex hermitian symmetry classes.


Introduction
In the seminar paper [27] Wigner introduced random self-adjoint matrices, H = H * , with centered, identically distributed and independent entries (subject to the symmetry constraint). He proved that the empirical density of the eigenvalues converges to the semicircle distribution. Wigner also conjectured that the distribution of the distance between consecutive eigenvalues (gap statistics) is universal, hence it is the same as in the Gaussian model. His revolutionary observation was that these universality phenomena hold for much larger classes of physical systems and only the basic symmetry type determines local spectral statistics. It is generally believed, but mathematically unproven, that random matrix theory (RMT), among many other examples, also describes the local statistics of random Schrödinger operators in the delocalized regime and quantization of chaotic classical Hamiltonians.
The first rigorous results on the local eigenvalue statistics in the bulk spectrum were given by Dyson, Mehta and Gaudin in the 60's. These concerned the Gaussian models and identified their local correlation functions. According to Wigner's universality hypothesis, these statistics should hold independently of the common law of the matrix elements. This conjecture was resolved recently in a series of works. The strongest result on Wigner matrices in the bulk spectrum is Theorem 7.2 in [10], see [16] and [26] for a summary of the history and related results. In fact, the three-step approach developed in [14,17,11] also applies for generalized Wigner matrices that allow for non-identically distributed matrix elements as long as the variance matrix s ij := E|h ij | 2 is stochastic, i.e. j s ij = 1 (in particular, independent of the index i). The stochasticity of S guarantees that the eigenvalue density is given by the semicircle law and the diagonal elements G ii = G ii (z) of the resolvent G(z) = (H − z) −1 with Im z > 0 become not only deterministic but also independent of i as the the matrix size N goes to infinity. They asymptotically satisfy a system of self-consistent equations that reduces to a particularly simple scalar equation for the common value m ≈ G ii for all i as N → ∞. The solution m = m(z) of (1.2) is the Stieltjes transform of the Wigner semicircle law. In this paper we consider a general variance matrix S without stochasticity condition. We show that the approximate self-consistent equation (1.1) still holds, but it does not simplify to a scalar equation. In fact, G ii remains i-dependent even as N → ∞ and it is close to the solution m i of the Quadratic Vector Equation (QVE) under the additional condition that Im m i > 0.
In the context of random matrices importance of this equation has been realized by Girko [19], Shlyakhtenko [25], Khorunzhy and Pastur [21], see also Guionnet [20], as well as Anderson and Zeitouni [4,3], but no detailed study has been initiated. In the companion paper [1] we analyzed (1.3) in full detail. We showed that m := N −1 i m i is the Stieltjes transform of a probability density ρ that is supported on a finite number of intervals, inside of which it is a real analytic function. We also described the behavior of ρ near the edges of its support; it features only square root or cubic root (cusp) singularities and an explicit one parameter family of profiles interpolating between them as a gap in the support closes.
The main result of the current paper is the universality of the local eigenvalue statistics in the bulk for Wigner-type matrices with a general variance matrix (cf. Theorem 1.15). This extends Wigner's vision towards full universality by considering a much larger class of matrix ensembles than previously studied. In particular, we demonstrate that local statistics, as expected, are fully independent of the global density. This fact has already been established for very general β-ensembles in [7] (see also [5] and [24]) and for additively deformed Wigner ensembles having a density with a single interval support [23]. Our class admits a general variance matrix and allows for densities with several intervals (we do not, however, consider non-centered distributions here; an extension to matrices with non-centered entries on the diagonal may be incorporated in our analysis with additional technical effort).
Following the three-step approach, we first prove local laws for G on the scale η = Im z N −1 , i.e. down to the optimal scale just slightly above the eigenvalue spacing. This is the main and novel part of our analysis. The previous proofs (see [11] for a pedagogical presentation) heavily relied on properties of the semicircle law, especially on its square root edge singularity. With possible cubic root singularities and small gaps in the support of ρ an additional scale appears which needs to be controlled. The second step is to prove universality for Wigner-type matrices with a tiny Gaussian component via Dyson Brownian motion (DBM). The method of local relaxation flow, introduced first in [13,14], also heavily relies on the semicircle law since it requires that the global density remain unchanged along DBM. In [15] a new method was developed to localize the DBM that proves universality of the gap statistics around a fixed energy τ in the bulk, assuming that the local law holds near τ (we remark that a similar result was obtained independently in [22]). Since Wigner-type matrices were one of the main motivations for [15], it was formulated such that it could be directly applied once the local laws are available. Finally, the third step is a perturbation result to remove the tiny Gaussian component using the Green function comparison method that first appeared in [17] and can be applied to our case basically without any modifications.
In a separate paper [2] we apply the results of this work and [1] to treat Gaussian random matrices with correlated entries. Assuming translation invariance of the correlation structure in these Gaussian matrix ensembles we prove an optimal local law, bulk universality and non-trivial decay of off-diagonal resolvent entries.

Set-up and main results
Let H (N ) ∈ C N ×N be a sequence of self-adjoint random matrices. In particular, if the entries of H are real then H (N ) is symmetric. The matrix ensemble H = H (N ) is said to be of Wigner-type if its entries h ij are independent for i ≤ j and centered, i.e., Eh ij = 0 for all i, j = 1, . . . , N . (1.4) The dependence of H and other quantities on the dimension N will be suppressed in our notation. The matrix of variances, S = (s ij ) N i,j=1 , is defined through (1.5) It is symmetric with non-negative entries. In [1] it was shown that for every such matrix S the quadratic vector equation (QVE), Imz > 0}, has a unique solution. The main result of this paper is to establish the local law for Wigner-type matrices, i.e. that for large N the resolvent, G(z) = (H − z) −1 , with spectral parameter z = τ + iη ∈ H, is close to the diagonal matrix, diag(m(z)), as long as η N −1 . As a consequence, we obtain rigidity estimates on the eigenvalues and complete delocalization for the eigenvectors. Combining this information with the Dyson-Brownian motion, we obtain universality of the eigenvalue gap statistics in the bulk.
We now list the assumptions on the variance matrices S = S (N ) . The restrictions on S are controlled by three model parameters, p, P > 0 and L ∈ N, which do not depend on N . These parameters will remain fixed throughout this paper. In the following we will always assume that S satisfies the following conditions: The assumption on the boundedness of m is an implicit condition in the sense that it can be checked only after solving (1.6). In [1] we list sufficient, explicitly checkable conditions on S, which ensure (1.9). We also remark that the assumption (1.7) can be replaced by s ij ≤ C/N for some positive constant C. This will lead to a rescaling of m. We pick the normalization C = 1 just for convenience.
In addition to the assumptions on the variances of h ij , we also require uniform boundedness of higher moments. This leads to another basic model parameter, µ = (µ 1 , µ 2 , . . . ), which is a sequence of non-negative real numbers.
(D) For all N the entries h ij of the random matrix H have bounded moments, E|h ij | k ≤ µ k s k/2 ij , k ∈ N , i, j = 1, . . . , N .
(1. 10) In order to state our main result, in the next Corollary we collect a few facts about the solution of the QVE that are proven in [1]. Although these properties are sufficient for the formulation of our results, for their proofs we will need much more precise information about the solution of the QVE. Theorems 4.1 and 4.2 summarize everything that is needed from [1] besides the existence and uniqueness of the solution of the QVE. In particular, the statement of Corollary 1.2 follows easily from Theorem 4.1. Im m i (τ + iη) , is a probability density. Its support is contained in [−2, 2] and is a union of closed disjoint intervals There exists a positive constant δ * , depending only on the model parameters p, P and L, such that the sizes of these intervals are bounded from below by Note that (1.13) provides a lower bound on the length of the intervals that constitute supp ρ, while the length of the gaps, α k+1 − β k , between neighboring intervals can be arbitrarily small.
is again denoted by ρ. With a slight abuse of notation we still write supp ρ, as in (1.12), for the support of the density of states as a function on the real line.
The density of states will be shown to be the eigenvalue distribution of H in the large N limit on the macroscopic scale. For any fixed values τ 1 , τ 2 ∈ R with τ 1 < τ 2 it satisfies provided the denominator does not vanish in the limit. The identity (1.15) motivates the terminology of density of states for the function ρ. The harmonic extension of ρ to H is a version of the density of states, that is smoothed out on the scale η. It satisfies the identity ρ(z) = (πN ) −1 k Im m k (z) not just for z ∈ R (cf. (1.11)) but for all z ∈ H and it will be used in the statement of our main result, Theorem 1.6.
In fact, Theorem 1.6, implies a local version of (1.15), where the length of the interval, [τ 1 , τ 2 ], may converge to zero as N tends to infinity. Our estimates and thus the proven speed of convergence depend on the distance of the interval to the edges of supp ρ and even on the length of the closest gap in this case. We introduce a function ∆ : R → [0, ∞), which encodes this relation. Definition 1.4 (Local gap size). Let α k and β k be the edges of the support of the density of states (cf. (1.12)) and δ * the constant introduced in Corollary 1.2. Then for any δ ∈ [0, δ * ) we set (1.16) Finally, we fix an arbitrarily small tolerance exponent γ ∈ (0, 1). This number will stay fixed throughout this paper in the same fashion as the model parameters P , p, L and µ. Our main result is stated for spectral parameters z = τ + iη whose imaginary parts satisfy (1.17) For a compact statement of the main theorem we define the notion of stochastic domination, introduced in [9] and [11]. This notion is designed to compare sequences of random variables in the large N limit up to small powers of N on high probability sets. Definition 1.5 (Stochastic domination). Suppose N 0 : (0, ∞) 2 → N is a given function, depending only on the model parameters p, P , L and µ, as well as on the tolerance exponent γ. For two sequences, ϕ = (ϕ (N ) ) N and ψ = (ψ (N ) ) N , of non-negative random variables we say that ϕ is stochastically dominated by ψ if for all ε > 0 and D > 0, (1.18) In this case we write ϕ ≺ ψ.
The threshold N 0 (ε, D) = N 0 (ε, D; P, p, L, µ, γ) will always be an explicit function whose value will be increased throughout the paper, though we will not follow its form. This will happen only finitely many times, ensuring that N 0 stays finite. The threshold is uniform in all other parameters, e.g. z, i, j, . . . , that the sequences ϕ and ψ may depend on. Typically, we will not mention the existence of N 0 -it is implicit in the notation ϕ ≺ ψ. As an example, we see that the bounded moment condition, (D), implies Actually, the function N 0 depends only on finitely many moment parameters (µ 1 , . . . , µ M ) instead of the entire sequence µ, where now the number of required moments M = M (ε, D; P, p, L, γ), is an explicit function. Now we are ready to state our main result on the local law. Suppose H = H (N ) is a sequence of self-adjoint random matrices with the corresponding sequence S = S (N ) of variance matrices and ρ = ρ (N ) the induced sequence of densities of state. Recall that δ * is the positive constant, depending only on p, P and L, introduced in Corollary 1.2 and ∆ δ is defined as in Definition 1.4. Theorem 1.6 (Local law). Suppose that assumptions (A)-(D) are satisfied and fix an arbitrary γ ∈ (0, 1). There is a deterministic function κ = κ (N ) : H → (0, ∞] such that uniformly for all z = τ + iη ∈ H with η ≥ N γ−1 the resolvents of the random matrices H = H (N ) satisfy Furthermore, for any sequence of deterministic vectors w = w (N ) ∈ C N with max i |w i | ≤ 1 the averaged resolvent diagonal has an improved convergence rate, (1.20) In particular, for w i = 1 this implies The function κ satisfies the following bounds. There is a constant δ ∈ (0, δ * ), depending only on the model parameters p, P and L, such that with ∆ = ∆ δ , .
If z is not too close to the support of the density of states in the sense that then κ satisfies the improved bound Theorem 1.6 can be localized to a spectral interval, i.e., the statements hold for Re z ∈ [a, b] provided (1.9) holds for z ∈ [a, b] + i(0, ∞). In particular, in the bulk of the spectrum the local law simplifies considerably. Corollary 1.7 (Local law in the bulk). Assume that p/N ≤ s ij ≤ 1/N for all i, j and that (D) holds. Consider an interval [a, b] in the bulk, i.e., there is ρ * > 0 such that ρ(τ ) ≥ ρ * for all τ ∈ [a, b]. Then uniformly for all z = τ + iη, with τ ∈ [a, b] and η ≥ N γ−1 , and non-random w ∈ C N satisfying max i |w i | ≤ 1, the local law holds: This local law generalizes the previous local laws for stochastic variance matrices S (see [11] and references therein). It is valid for densities ρ with an edge behavior different from the square root growth that is known from Wigner's semicircular law. In particular, singularities that interpolate between a square root and a cubic root are possible. In the bulk of the support of the density of states, i.e., where ρ is bounded away from zero, the function κ is bounded. The same is true near the edges, unless the nearby gap is small. The bound deteriorates near small gaps in the support of ρ.
In applications, the sequence S = S (N ) may be constructed by discretizing a continuous limit function. As a simple example, suppose f is a smooth, non-negative, symmetric, f (x, y) = f (y, x), function on [0, 1] 2 with a positive diagonal, f (x, x) > 0. Then the sequence of variance matrices, satisfies conditions (A)-(C). The validity of (C) can be verified by using the general criteria (cf. Theorem 1.8 and Theorem 4.1 of [1]) for uniform boundedness. In this case the solution, m = (m 1 , . . . , m N ), of the QVE converges to a limit in the sense that The continuous QVE such as this one fall into the class of general QVEs thoroughly analyzed in the companion paper [1]. In particular, the stability analysis applies and the density of states converges to a limit We introduce a notion for expressing that events hold with high probability in the limit as N tends to infinity. Definition 1.8 (Overwhelming probability). Suppose N 0 : (0, ∞) → N is a given function, depending only on the model parameters p, P , L and µ, as well as on the tolerance exponent γ. For a sequence A = (A (N ) ) N of random events we say that A hold asymptotically with overwhelming probability (a.w.o.p.), if for all D > 0: There is a simple connection between the notions of stochastic domination and asymptotically overwhelming probability. For two sequences A = A (N ) and B = B (N ) the statement 'A implies B a.w.o.p.' is equivalent to 1 A ≺ 1 B , where the threshold N 0 , implicit in the stochastic domination, does not depend on ε, i.e., N 0 (ε, D) = N 0 (D).
We denote by λ 1 ≤ · · · ≤ λ N the eigenvalues of the random matrix H. The following corollary shows that the eigenvalue distribution converges to the density of states as N tends to infinity. Corollary 1.9 (Convergence of cumulative eigenvalue distribution). Uniformly for all τ ∈ R the cumulative empirical eigenvalue distribution approaches the integrated density of states, Furthermore, for an arbitrary tolerance exponent γ ∈ (0, 1) there are no eigenvalues away from the support of the density of states, where we interpret β 0 := −∞, α K+1 := +∞ and δ k is defined as δ 0 := δ K := N γ−2/3 , as well as Based on (1.15) we define the index, i(τ ), of an eigenvalue that we expect to be located close to the spectral parameter τ by (1.30) Here, ω denotes the smallest integer that is bigger or equal to ω for any ω ∈ R.
where κ is the function from Theorem 1.6. (1.37) In particular, the eigenvectors are completely delocalized, i.e., u (i) Definition 1.14 (q-full random matrix). We say that H is q-full for some q > 0 if either of the following applies: • H is real symmetric and Eh 2 ij ≥ q/N for all i, j = 1, . . . , N ; • H is complex hermitian and for all i, j = 1, . . . , N the real symmetric 2 × 2-matrix, is strictly positive definite such that σ ij ≥ q/N . Theorem 1.15 (Universality). Suppose that in addition to (A)-(D) being satisfied, the matrix H is q-full. Then for all ε > 0, n ∈ N and all smooth compactly supported observables F : R n → R, there are two positive constants C and c such that for any τ ∈ R with ρ(τ ) ≥ ε the local eigenvalue distribution is universal, Here, E G denotes the expectation with respect to the standard Gaussian ensemble, i.e., with respect to GUE and GOE in the cases of complex Hermitian and real symmetric H, respectively, and ρ sc (0) = 1/(2π) is the value of Wigner's semicircle law at the origin. This paper focusses on random matrices in the real symmetric and the complex hermitian symmetry class. The third universality class, the quaternion self dual case can be treated in a similar fashion.
We introduce a few conventions and notations used throughout this paper. Convention 1.16 (Constants and comparison relation). We use the convention that every positive constant with a lower star index, such as δ * , c * and λ * , explicitly depends only on the model parameters P , p and L. These dependencies can be reconstructed from the proofs, but we will not follow them. Constants c, c 1 , c 2 , . . . , C, C 1 , C 2 , . . . also depend only on P , p and L. They will have a local meaning within a specific proof.
For two non-negative functions ϕ and ψ depending on a set of parameters u ∈ U , we use the comparison relation if there exists a positive constant c, depending explicitly on P , p and L such that ϕ(u) ≥ c ψ(u) for all u ∈ U . The notation ψ ∼ ϕ means that both ψ ϕ and ψ ϕ hold true. In this case we say that ψ and ϕ are comparable. We also write ψ = ϕ + O(ϑ), if |ψ − ϕ| ϑ.
We denote the normalized scalar product between two vectors u, w ∈ C N and the average of a vector by respectively. Note that with this convention | u, u | = N −1 u 2 2 .
2 Bound on the random perturbation of the QVE We introduce the notation G (V ) for the resolvent of the matrix H (V ) , which is identical to H except for the removal of the rows and columns corresponding to the indices V ⊆ {1, . . . , N }. The enumeration of the indices is kept, even though G (V ) has a lower dimension. The diagonal elements of the resolvent, g := (G 11 , . . . , G N N ), satisfy the perturbed quadratic vector equation for all z ∈ H and i = 1, . . . , N . The random perturbation d = (d 1 , . . . , d N ) is given by Here and in the following, the upper indices on the sums indicate which indices are not summed over. For the proof of this simple identity as well as (2.3) below via the Schur complement formula we refer to [11]. As in (2.2) we will often omit the dependence on the spectral parameter z in our notation, i.e., G ij = G ij (z), d k = d k (z), etc.. We will now derive an upper bound on d ∞ = max i |d i |, provided |g i − m i | is bounded by a small constant. At the same time we will control the off-diagonal elements G kl of the resolvent. These satisfy the identity for k = l. The strategy in what follows below is that (2.2) and (2.3) are used to improve a rough bound on the entries of the resolvent G to get the correct bounds on the random perturbation and the off-diagonal resolvent elements. Later, in Section 3, the stability of the QVE under the small perturbation, d, will provide the improved bound on the diagonal elements, G ii − m i = g i − m i . We introduce a short notation for the difference between g and the solution m of the unperturbed equation (1.6), The following lemma is analogous to Lemma 5.2 in [11] with minor modifications. For the completeness of this paper, we repeat these arguments. One small modification is that our estimates also deal with the regime where |z| is large. To keep the formulas short we denote The dependence of the upcoming error bounds on [z] is not always optimal and this dependence is not kept in the statement of our main result Theorem 1.6, either. In fact, the regime [z] ∼ 1 is the most interesting, since our results show that the spectrum of H lies a.w.o.p. inside a compact interval (cf. Corollary 1.9). For the first reading we therefore recommend to think of [z] = 1 in most of our proofs. The [z]-dependence is used mainly in order to propagate a bound from the regime of very large imaginary part of the spectral parameter (Imz ≥ N 5 ) to the entire domain, on which Theorem 1.6 holds.

Lemma 2.1 (Bound on perturbation).
There is a small positive constant λ * , such that uniformly for all spectral parameters z = τ + iη ∈ H with η ≥ N γ−1 : For the proof of this lemma we will need an additional property of the solution of the QVE that is a corollary of Theorem 4.1, where all properties of m taken from [1] are summarized.
Proof of Lemma 2.1. Here we use the three large deviation estimates, is independent of the rows and columns of H with indices in V , these estimates follow directly from the large deviation bounds in Appendix C of [11]. Furthermore, we use The latter inequality is just assumption (1.7) and the bound on h ij follows from (1.10).
We will now show that the removal of a few rows and columns in H will only have a small effect on the entries of the resolvent. The general resolvent identity, leads to the bound In the inequality we used that |m k (z)| ∼ [z] −1 (cf. Corollary 2.2), |g k | = |m k | + O(Λ) and that λ * is chosen to be small enough. We use (2.10) in a similar calculation for G (l) ij and find that on (2.11) Again using (2.10) and that the denominator of the last expression is comparable to [z] −1 , we conclude provided λ * is small. Therefore, we see that it is possible to remove one or two upper indices from G ij for the price of a term, whose size is at most [z]Λ 2 o . We have now collected all necessary ingredients and use them to estimate all the terms in (2.2) one by one. We start with the first summand. By (2.7a) we find With the help of (2.10) we remove the upper index from G (k) ij and get (2.14) For the second summand in (2.2) we use the large deviation bound for the diagonal, (2.7c), and find that By removing the upper index again we estimate We use this in (2.15) and for sufficiently small λ * we arrive at The third summand in (2.2) is estimated directly by We combine the estimates for the individual terms (2.14), (2.17), (2.18) and (2.8). Altogether we conclude that We will now derive in a similar fashion a stochastic domination bound for the off-diagonal error term Λ o . Afterwards, we will combine the two bounds and infer the claim of the lemma. For the off-diagonal error term we proceed along the same lines as for |d k |, using (2.3) instead of (2.2). For k = l we find Here, we applied the large deviation bound (2.7b). Using the Ward identity for the resolvent G (kl) , and (2.10) for removing the upper index of G We remove the upper indices from G (kl) ii and end up with The bound remains true without the summand containing Λ o on the right hand side, since this term can be absorbed into the left hand side, as its coefficient is bounded by N −γ/2 , while on the left Λ o is not multiplied by a small coefficient. Putting (2.19) and (2.23) together yields the desired result (2.5).

Local law away from local minima
In this section we will use the stability of the QVE to establish the main result away from the local minima of the density of states inside its own support, i.e. away from the set M := τ ∈ supp ρ : τ is the location of a local minimum of ρ .
The case where z is close to M requires a more detailed analysis. This is given is Section 4. At the end of this section we will also sketch the proof of Corollary 1.7. In this section we prove the following.
Proposition 3.1 (Local law away from local minima). Let δ * be any positive constant, depending only on the model parameters p, P and L. Then, uniformly for all Furthermore, on the same domain, for any sequence of deterministic vectors This proposition shows the local law (Theorem 1.6) away from the set M. Its proof uses a continuity argument in z. In particular, continuity of the solution of the QVE is needed. The statement of the following corollary is part of the properties of m listed in Theorem 4.1.
The solution of the QVE is uniformly Hölder-continuous, Since the solution can be extended to the real line, it is the harmonic extension to the complex upper half plane of its own restriction to the real line. Therefore, Im m i (τ ) = π p i (τ ) for τ ∈ R. The density of states is the average of the probability densities p i , i.e., ρ = p .
Since we will estimate the difference, g−m, we start by deriving an equation for this quantity. Using the QVE for m and the perturbed equation (2.1) for g we find Rearranging the terms leads to In the proof of Proposition 3.1 we will view (3.7) as a quadratic equation for g − m and we use its stability to bound Λ d in terms of d ∞ . We will now demonstrate this effect in the case when z is far away from the support of the density of states.
Lemma 3.3 (QVE stability away from the support). For z ∈ H with |z| ≥ 10, we have Furthermore, there is a matrix valued function T : H → C N ×N , depending only on S and satisfying the uniform bound T(z) ∞→∞ 1, such that for all w ∈ C N and |z| ≥ 10 the averaged difference between g and m satisfies the improved bound Proof. Since the matrix S is flat (cf. (1.7)), it satisfies the norm bound S ∞→∞ ≤ 1. We also have the trivial bound |m i (z)| ≤ 1/dist(z, supp ρ) ≤ 2|z| −1 ≤ 1/5 at our disposal. This follows directly from the Stieltjes transform representation (3.4). In particular, from the geometric series. By inverting the matrix 1 − diag(m) 2 S and using the trivial bound on m in (3.7) we find Using the bound inside the indicator function from (3.8) and |z| ≥ 10 the assertion (3.8) of the lemma follows. The bound for the average, (3.9), follows by taking the inverse of 1 − diag(m) 2 S on both sides of (3.7) and using (3.8) and |m i | ∼ |z| −1 .
For the proof of Proposition 3.1 we use the stability of (3.7) also close to supp ρ. This requires more care and is carried out in detail in [1]. The result of that analysis is Theorem 4.2. Here we will only need the following consequence of that theorem.
There exist a positive constant λ * , such that the QVE is stable away from M, Furthermore, there is a matrix valued function T : H → C N ×N , depending only on S and satisfying the uniform bound T(z) ∞→∞ 1, such that for all w ∈ C N , 14) Furthermore, the following fluctuation averaging result is needed. It was first established for generalized Wigner matrices with Bernoulli distributed entries in [18].
Theorem 3.5 (Fluctuation Averaging). For any z ∈ D and any sequence of deterministic vectors w = w (N ) ∈ C N with the uniform bound, w ∞ ≤ 1 the following holds true: Proof. The proof directly follows the one given in [11]. We only mention some minor necessary be the complementary projection to the conditional expectation of a random variable X given the matrix H (k) , in which the k-th row and column are removed. From the definition of d in (2.2) and Schur's complement formula in the form, we infer the identity In particular, we have that a.w.o.p.
Thus, proving (3.15) reduces to showing In the setting where H is a generalized Wigner matrix and |z| ≤ 10 this bound is precisely the content of Theorem 4.7 from [11]. The a priori bound used in the proof of that theorem is replaced by for any V ⊆ {1, . . . , N } with N -independent size. This bound is proven in the same way as (2.19). Here, the N 0 hidden in the stochastic domination depends on the size |V | of the index set. Following the proof of Theorem 4.7 given in [11] with (3.17) and tracking the z-dependence, yields the fluctuation averaging, Theorem 3.5.
Proof of Proposition 3.1. Let us show first that (3.3) follows directly from (3.2) by applying the fluctuation averaging, Theorem 3.5. Indeed, (3.2) provides a deterministic bound on the offdiagonal error, Λ o , which is needed to apply the fluctuation averaging to the right hand side of (3.14). It also shows that the indicator functions on the left hand side of (3.14) and (3.9) are a.w.o.p. nonzero. Thus, (3.3) is proven, provided (3.2) is true. The proof of (3.2) is split into the consideration of two different regimes. In the first regime the absolute value of z is large, |z| ≥ N 5 . In this case we make use only of weak a priori bounds on the resolvent elements and the entries of d. Together with Lemma 3.3 they will suffice to prove (3.2) in this case. In the second regime, |z| ≤ N 5 , we use a continuity argument. We will establish a gap in the possible values that the continuous function, z → [z]Λ(z), might have. Here, the stability result Corollary 3.4 is used. We use this gap to propagate the bound with the help of Lemma A.1 in the appendix from |z| = N 5 to the whole domain where |z| ≤ N 5 , η ≥ N γ−1 and we stay away from M.
Regime 1: Let |z| ≥ N 5 . We show that the indicator functions in the statement of Lemma 2.1 are a.w.o.p. not vanishing. We start by showing that the diagonal contribution, Λ d , to Λ is sufficiently small. The reduced resolvent elements for an arbitrary V ⊆ {1, . . . , N } satisfy From this and the definition of d in (2.2) we read off the a priori bound, Here, we used the general resolvent identity (2.9) in the form (3.19) and (3.18) we conclude that uniformly for |z| ≥ N 2 we have With the trivial bound |m i (z)| ≤ 1/dist(z, supp ρ) on the solution of the QVE we infer that on this domain the indicator function in (3.8) is a.w.o.p. non-zero and therefore uniformly for In the last inequality we used the bound on d from (3.19). Thus, for |z| ≥ N 2 the diagonal contribution to Λ does not play a role in the indicator function in the statement of Lemma 2.1. Now we derive a similar bound for the off-diagonal contribution Λ o . Using the resolvent identity (2.9) for i = j again, the bound |h ij | ≺ N −1/2 on the entries of the random matrix and the a priori bound on the reduced resolvent elements, (3.18), in the expansion formula (2.3) yields for k = l. With the bound (3.20) we conclude that We conclude that Lemma 2.1 applies in this regime even without the indicator functions in the formulas (2.5). We use the bound from this lemma for the norm of d and the off-diagonal contribution, Λ o , to Λ, while we use the first inequality in (3.21) for the diagonal contribution, Λ d . In this way we get where we also used g k = m k + O(Λ d ). We find for any ε ∈ (0, γ) that the right hand side of the first inequality can be estimated further by The term N −ε |z| 2 Λ d can be absorbed into the left hand side and by the definition of the stochastic domination and since ε is arbitrarily small the remaining N ε on the right hand side can be replaced by 1 without changing the correctness of this bound. In this way we arrive at For the bound on the off-diagonal error term we plug this result into (3.24) and get Regime 2: Now let |z| ≤ N 5 and suppose that δ * is a positive constant, depending only on the model parameters p, P and L. We start by establishing a gap in the possible values of Λ. The diagonal contribution, Λ d , satisfies according to (3.8) in Lemma 3.3 (for |z| ≥ 10) and (3.13) from Corollary 3.4 (for |z| ≤ 10), where λ * is a sufficiently small positive constant. We estimate the norm of d by Lemma 2.1 and also use the the bound on the off-diagonal contribution, Λ o , from the same lemma, (3.26) Now we use g k = m k + O(Λ d ) and that Im m(z) = πρ(z). With these two ingredients we find for any ε ∈ (0, γ) that The term N −ε [z] 2 Λ d can be absorbed into the left hand side and we arrive at For the off-diagonal error terms we plug this into the second bound of (3.26) after using Im g ρ + Λ d and get In particular, we combine (3.28) and (3.29) to establish a gap in the values that Λ can take, Here we used η ≥ N γ−1 . Now we apply Lemma A.1 on the connected domain with the choices The continuity condition (A.1) of the lemma for these two functions follows from the Höldercontinuity, (3.5), of the solution of the QVE and the weak continuity of the resolvent elements, We will now sketch the proof of Corollary 1.7. The set-up in this corollary differs slightly from the one used in the rest of this paper, because the uniform bound (assumption (C)) on the solution of (1.6) is not assumed. We therefore use additional information from [1] about m in this more general setting.
Proof of Corollary 1.7. Since the boundedness assumption (C) on the solution of the QVE is dropped in this corollary, its proof starts by showing that nevertheless for some constant P > 0 we have The probability densities are comparable, The solution m has a uniformly Hölder-continuous extension (denoted again by m) to the closed complex upper half plane H = H ∪ R, Its absolute value satisfies Let ρ : R → [0, ∞), τ → p(τ ) be the density of states, defined in (1.11). Then there exists a positive constant δ * , depending only on the model parameters p, P and L, such that the following holds true. The support of the density consists of K ∼ 1 disjoint intervals of lengths at least 2δ * , i.e., The size of the harmonic extension, ρ, up to constant factors, is given by explicit functions as follows. Let η ∈ [0, δ * ].
• Bulk: Close to the support of the density of states but away from the local minima in M (cf. (3.1)) the function ρ is comparable to 1, i.e., • At an internal edge: At the edges α i , β i−1 with i = 2, . . . , K in the direction where the support of the density of states continues the size of ρ is • Inside a gap: Between two neighboring edges β i−1 and α i with i = 2, . . . , K, the function ρ satisfies • Around an extreme edge: At the extreme points α 1 and β K of supp ρ the density of states grows like a square root , (4.5d) • Close to a local minimum: In a neighborhood of a local minimum in the interior of the support of the density of states, i.e., for τ 0 ∈ M ∩ int supp ρ, we have • Away from the support: Away from the interval in which supp ρ is contained The next theorem shows that the QVE is stable under small perturbations, d, in the sense that once a solution of the perturbed QVE (4.6) is sufficiently close to m, then the difference between the two can be estimated in terms of d ∞ . In [1] it is stated as Proposition 8.1.
the difference between g and m is bounded in terms of in the following two ways. For all w ∈ C N on the whole complex upper half plane: The scalar function Θ : H → [0, ∞) satisfies a cubic equation of the form (4.10) The coefficients π 1 , π 2 : H → C may depend on S and g. They satisfy for all z ∈ H. Moreover, the functions σ, s, t (1) , t (2) and T are regular in the sense that Furthermore, the function σ is related to the density of the stated by We warn the reader that in this paper Θ and σ denote the absolute values of the quantities denoted by the same symbols in Proposition 8.1 of [1]. The function σ appears naturally in the analysis of the QVE. Analogous to the more explicitly constructed function ∆ from Definition 1.4, at an edge the value of σ 3 encodes the size of the corresponding gap in supp ρ. At the local minima in M \ {α i , β i } the value of σ 3 is small, provided the density of states has a small value at the minimum. In this sense it is again analogous to ∆, which vanishes at these internal minima.

Coefficients of the cubic equation
The stability of QVE near the points in M requires a careful analysis of the cubic equation (4.10) for Θ from Theorem 4.2. For this, we will provide a more explicit description of the upper and lower bounds from (4.11) on the coefficients, π 1 and π 2 , of the cubic equation. There are positive constants δ * and c * such that for all η ∈ [0, δ * ] the coefficients, π 1 and π 2 , of the cubic equation (4.10) satisfy the following bounds.
• Around an internal edge: At the edges α i , β i−1 of the gap with length ∆ := α i − β i−1 for i = 2, . . . , K, we have (4.15a) • Well inside a gap: Between two neighboring edges β i−1 and α i of the gap with length ∆ := α i − β i−1 for i = 2, . . . , K, the first coefficient, π 1 , satisfies The second coefficient, π 2 , satisfies the upper bounds, (4.15c) • Around an extreme edge: Around the extreme points α 1 and β K of the support of the density of states (4.15d) • Close to a local minimum: In a neighborhood of the local minimum in the interior of the support of the density of states, i.e. for τ 0 ∈ M ∩ int supp ρ, we have Proof. The proof is split according to the cases above. In each case we combine the general formulas (4.11) with the knowledge about the harmonic extension, ρ, of the density of states from Theorem 4.1 and about the behavior of the positive Hölder-continuous function, σ, at the minima in M from (4.14). The positive constant δ * is chosen to have at most the same value as in Theorem 4.1. We start with the simplest case.
The claim now follows from the behavior of ρ, given in Theorem 4.1, inside this domain.
Close to a local minimum: In this case ρ + σ is comparable to ρ. In fact, using the 1/3-Hölder-continuity of σ (cf. (4.12)) and its bound at the minimum, τ 0 ∈ M, (cf. (4.14)) we find In the last relation we used the behavior (4.5e) of ρ from Theorem 4.1. By (4.11) we conclude that inside the δ * -neighborhood of τ 0 , Using the upper and lower bounds on ρ(z) again, gives the desired result, (4.15e).
Around an internal edge: First we prove the bounds on |π 2 |, starting from (4.11). The upper bound simply uses the 1/3-Hölder-continuity and the behavior at the edge points of σ, where τ 0 is one of the edge points α i or β i−1 . The claim follows from plugging in the size of ρ from the two corresponding domains in Theorem 4.1, i.e., the domain close to an edge, (4.5b), and the domain inside a gap, (4.5c).
For the lower bound we consider two different regimes. In the first case z is close to the edge point, |z − τ 0 | ≤ c∆, for some small positive constant c, depending only on the model parameters p, P and L. We find provided c is small enough. This bound coincides with the lower bound on π 2 in (4.15a), once the size of ρ from (4.5b) is used.
In the second regime, |z−τ 0 | ≥ c∆, we simply use |π 2 (z)| ρ(z) from (4.11). If Rez ∈ supp ρ, then the size of ρ from (4.5b) yields the desired lower bound. If, on the other hand, Rez lies inside a gap of supp ρ, then we use the freedom of choosing the constant c * in Proposition 4.3. Suppose c * ≤ c/2. Then |z − τ 0 | ≥ c∆ and |Rez − τ 0 | ≤ c * ∆ imply Imz ∆ and This finishes the proof of the upper and lower bound on |π 2 | on this domain. For the claim about |π 1 | we plug the result about |π 2 | and the size of ρ into Well inside a gap: For the upper bound on |π 2 | we simply use (4.18) again, which follows from (4.12) and (4.14). The comparison relation for |π 1 | now follows from (4.19) again. For the lower bound, |π 1 | Imz/ρ and (4.5c) from Theorem 4.1 are sufficient. This finishes the proof of the proposition.

Rough bound on Λ close to local minima
In the following lemma we will see that a.w.o.p. Λ ≤ c for some arbitrarily small constant c > 0.
Since the local law away from M is already shown in Proposition 3.1 we may restrict to bounded z in the following. From here on until the end of Section 4 we assume |z| ≤ 10.

Lemma 4.4 (Rough bound).
Let λ * be a positive constant. Then, uniformly for all z = τ +iη ∈ H with η ≥ N γ−1 , the function Λ is uniformly small, Proof. Away from the local minima in M the claim follows from (3.2) in Proposition 3.1. We will therefore prove that Λ is smaller than any fixed positive constant in some δ-neighborhood of M. We will use the freedom to choose the size δ of these neighborhoods as small as we like. Let us sketch the upcoming argument. Close to the points in M we make use of Theorem 4.2. Using Lemma 2.1, we will see that the right hand side of the cubic equation in Θ, (4.10), is smaller than a small negative power, N −ε , of N , provided Λ is bounded by a small constant, Λ ≤ λ * . This will imply that Θ itself is small and through (4.8) that the bound on Λ can be improved to Λ ≤ λ * /2. In this way we establish a gap in the possible values that the continuous function Λ can take. Lemma A.1 in the appendix is then used to propagate the bound on Λ from Proposition 3.1 into the δ-neighborhoods of the points in M.
Now we start the detailed proof from the fact that Θ satisfies the cubic equation (4.10), whose right hand side is bounded by C d ∞ for some constant C, depending only on the model parameters. Note that d ∞ 1 as long as Λ ≤ λ * because in this case |m i | ∼ 1, |g i | ∼ 1 and g satisfies the perturbed QVE with perturbation d. From the definition of Θ in (4.7) and the uniform bound on s from (4.13), we get Θ Λ. Since the coefficient |π 2 | is uniformly bounded (cf. (4.11)), the cubic equation for Θ implies the three bounds Here, ε 1 , ε 2 ∈ (0, 1) are arbitrary constants and C 1 , C 2 > 0 depend only on the model parameters.
Let δ ∈ (0, 1) be another constant, which we will later fix later to only depend on the model parameters p, P , and L. We split M into four subsets, which are treated separately, The function ∆ is from Definition 1.4 and its value is simply the length of the gap at the point τ 0 ∈ ∂ supp ρ where it is evaluated. We also define the δ-neighborhoods of these subsets, As an immediate consequence of the upper and lower bounds on the coefficients, π 1 and π 2 , presented in Proposition 4.3, we see that On D 2 (δ) only the regimes around an internal edge, (4.15a), and around an extreme edge, (4.15d), are relevant. The case well inside the gap, (4.15b) and (4.15c), does not apply for small enough δ, since ∆(τ 0 ) > δ 1/2 but |z − τ 0 | ≤ δ. Now we make a choice for the two constants ε 1 and ε 2 . We express them in terms of δ as ε 1 := δ , ε 2 := δ 1/5 .
We pair the bounds on Θ from (4.21) with the corresponding bounds from (4.22) on the coefficients of the cubic equation. For small enough δ the conditions on π 1 in (4.21a) and π 2 in (4.21b) are automatically satisfied by the choice of ε 1 and ε 2 , as well as the upper and lower bounds from (4.22a) and (4.22b). Thus, for small enough δ we end up with At this stage we use Lemma 2.1 in the form of d ∞ ≺ N −γ/2 on the set where Λ ≤ λ * /10, say, and (4.8) from Theorem 4.2. We may choose λ * to be sufficiently small compared to the constants with the same name from these two statements. Furthermore, we choose δ so small that δ 1/5 ≤ λ * . On the different regimes D k (δ), we find that The right hand sides, including the constants from the comparison relation, can be made smaller than any given constant λ * by choosing δ = δ * , depending only on the model parameters, small enough and N sufficiently large. Furthermore, (4.23) establish a gap in the possible values that Λ can take on the δ * -neighborhood of any point in M. By Proposition 3.1 we have the bound Λ ≺ N −γ/2 outside these δ * -neighborhoods and thus also for at least one point in the boundary of each neighborhood. Now we apply Lemma A.1 to each neighborhood and in this way we propagate the bound Λ ≤ λ * to every point z in the δ * -neighborhood of M with Imz ≥ N γ−1 .

Proof of Theorem 1.6
According to Proposition 3.1 the local law, Theorem 1.6, holds outside the δ * -neighborhoods of the points in M. It remains to show that it is true inside these neighborhoods as well. From here on we assume that z ∈ H satisfies dist(z, M) ≤ δ * and Imz ≥ N γ−1 . Let τ 0 ∈ M be one of the closest points to z in M, i.e., |z − τ 0 | = dist(z, M) .
When τ 0 ∈ ∂ supp ρ we denote by θ = θ(τ 0 ) ∈ {±1} the direction that points towards the gap in supp ρ at τ 0 . In case τ 0 ∈ ∂ supp ρ we make the arbitrary choice θ := +1, i.e., The minimum τ 0 will be considered fixed in the following analysis. We parametrize z as follows in the neighborhood of τ 0 ∈ M: where η ∈ (0, δ * ] and ω ∈ [−δ * , δ * ]. We will then prove the local law in the form To define E we introduce explicit auxiliary functions π 1 , π 2 and ρ that are comparable in size to the corresponding functions π 1 , π 2 and ρ. The reason for using these auxiliary quantities for the definition of E instead of the original ones is twofold. Firstly, in this way E will be an explicit function instead of one that is implicitly defined through the solution of the QVE. The function E is explicit in the sense that there is a formula for the solution of the cubic equation that defines it and the coefficients are given by the explicit functions π 1 , π 2 and ρ. Secondly, E will be monotonic of its second variable, η. This property will be used later. The definition of the three auxiliary functions will be different, depending on whether τ 0 is in the boundary of the support of the density of states or not. Recall the definition (1.16) of ∆ δ (τ ).
• Edge: If τ 0 ∈ ∂ supp ρ, i.e. τ 0 is an edge of a gap of size ∆ := ∆ 0 (τ 0 ) in the support of the density of states or an extreme edge. Then we define the three explicit functions (4.26a) Here, c * is the constant from Proposition 4.3.

29)
We explain how to define the function κ that appears in the statement of Theorem 1.6 and its relationship with E. We simply set κ to be the right hand side of (1.22) whenever (1.23) is not satisfied and to be half of the right hand side of (1.24) whenever (1.23) is satisfied. With this definition we have (4.30) for any N ≥ N 0 , where the threshold N 0 here depends on ε in addition to p, P , L, µ and γ. The inequality (4.30) is verified by plugging its right hand side into (4.29) in place of E and checking that on each regime the resulting expression on the right hand side of (4.29) is smaller than the resulting expression on the left hand side of (4.29). The factor of N 9 ε in (4.30) can be absorbed in the stochastic domination in (4.25). Thus Here we used Im g ρ+| g−m | and ρ η. Since at the end the local law implies | g−m | ≺ E, heuristically we may replace | g−m | in (4.31) by E. In this case, from the fluctuation averaging, Theorem 3.5, we would be able to conclude that for any deterministic vector w with bounded entries, (4.32) Up to the technical factor of N 8ε the right hand side coincides with the right hand side of the cubic equation defining E. On the other hand, the right hand side of the cubic equation (4.10) for the quantity Θ from Theorem 4.2 is of the same form as the left hand side of (4.32). Therefore, we infer (4.33) We will argue that on appropriately chosen domains out of the three summands in the cubic expression in Θ always one is the biggest by far. Therefore, the error function E, defined by (4.29), is essentially the best bound on Θ that one may hope to deduce from (4.33). Indeed, since Θ is by definition an average of g − m, we expect Θ ≺ E.
We will now prove (4.25). To this end we gradually improve the bound on Θ. Fix some ε ∈ (0, ε). The sequence of deterministic bounds on this quantity is defined as  and some k ∈ N the quantity Θ fulfills (4.35) Then Θ(z) ≺ Φ k+1 (ω, η).
We will postpone the proof of this lemma until the end of this section. First we show how to use this result to prove the local law (Theorem 1.6). Fix an integer k ≥ 0 and assume that Θ + | g − m | ≺ Φ k is already proven. For k = 0 this follows from the rough bound on Λ in Lemma 4.4, Λ ≺ 1 = Φ 0 . As an induction step we show that Θ + | g − m | ≺ Φ k+1 .
From (4.31) we see that The right hand side is a deterministic bound on the off-diagonal error Λ o . We apply the fluctuation averaging, Theorem 3.5, to the right hand side of the cubic equation (4.10). In this way we see that the hypothesis (4.35) of Lemma 4.5 is satisfied. Therefore, the bound on Θ is improved to In order to improve the bound on | g − m | as well, we use the bound (4.9) from Theorem 4.2 for averages of g − m against bounded vectors. Since by Lemma 4.4 the deviation function Λ is bounded by a small constant, the indicator function in (4.9) is a.w.o.p. non-zero. Choosing w = (1, . . . , 1), we find that where w = Tw is a bounded, w ∞ 1, deterministic vector. Together with the bound (4.36) we apply the fluctuation averaging again, This concludes one step in the iteration, i.e., we have shown Θ + | g − m | ≺ Φ k+1 . We repeat this step finitely many times and each time improve Φ k by a factor of N −ε until it reaches its target value N 9ε E and is not improved anymore. At that stage we have The subindex ε indicates that the threshold N 0 from the stochastic domination may depend on ε. But since ε was arbitrary, we infer Θ + | g − m | ≺ E, where now and until the start of the proof of Lemma 4.5 below the stochastic domination is ε-independent. By (4.31) we conclude For the bound on the diagonal contribution, Λ d , we use (4.8) to get Finally, with the help of (4.9), (4.40) and the fluctuation averaging, we prove the bound on averages of g − m against any bounded, w ∞ ≤ 1, deterministic vector, This finishes the proof of Theorem 1.6 apart from the proof of Lemma 4.5 which we will tackle now. The shaded area is forbidden for Θ. Since the continuous function Θ lies below this region at η = δ * it stays below it for any η ≥ N γ−1 , hence proving Θ ≤ Φ k+1 .
Proof of Lemma 4.5. The spectral parameter z = τ 0 +θ ω +iη lies inside the δ * -neighborhood of τ 0 . We fix ω ∈ [−δ * , δ * ] and show that the claim holds for any choice of η ∈ [N γ−1 , δ * ] We split the interval of possible values of η into two or three regimes, depending on the case we are treating.
In the cubic equation (4.29), used to define the error function E, the coefficients π 1 and π 2 on the left hand side are monotonously increasing functions of η. The linear and the constant coefficient of E on the right hand side are monotonously decreasing in η. Thus, E itself is a monotonously decreasing function of η. From this fact and the definition of the regimes I 1 , I 2 and I 3 we see that Here, we interpret I 2 = ∅ if η 1 ≤ η 2 and I 3 = ∅ if η 2 ≤ N γ−1 . Now we define a z-dependent indicator function . (4.41) This function fixes the values of Θ to a small interval just below the deterministic control parameter Φ k . We will prove that Θ cannot take these values, i.e. χ = 0 a.w.o.p.. Figure 4.1 illustrates this argument. Compared to Figure 6.1 in [11] we see that instead of two there are now three domains, I 1 (ω), I 2 (ω) and I 3 (ω), to be distinguished. The reason for this extra complication is that (4.10) is cubic in Θ, compared to the quadratic equation for [v] that appeared in the proof of Lemma 6.2 in [11]. To see that χ = 0, first note that the choice of the domains, I l , ensures that there is always one summand on the left hand side of the cubic equation (4.10) for Θ which dominates the two others by a factor N ε , whenever χ does not vanish. In fact, by construction we have: Claim: The random functions Θ and χ satisfy a.w.o.p. (4.42) We will verify this fact at the end of the proof of this lemma. Now we will simply use it. First we combine the assumption (4.35) of the lemma and (4.42) to obtain Here we also gave up a factor of N ε to get an inequality instead of the stochastic domination, and replaced ρ by the comparable quantity ρ. By the definition of the indicator function χ we have Θχ ≥ N −7ε Φ k . Using this to bound the left hand side, and that ε ≤ ε, we obtain Comparing this with the defining equation (4.29) for E we conclude that a.w.o.p.
On the other hand, by the definition of Φ k in (4.34) we know that Φ k > N 8ε E. These two inequalities yield Now we successively, for l = 1, 2, 3, apply Lemma A.1 on the connected domains τ 0 + θω + i I l (ω) with the choices ϕ := Θ and where as explained after the definition of I 1 , I 2 and I 3 above we have holds because of (4.43) and the definition of χ and Φ for an appropriate choice of the exponent D 3 . The condition, ϕ(z 0 ) ≤ Φ(z 0 ) a.w.o.p., necessary for the application of Lemma A.1 on the first domain, τ 0 + θω + i I 1 (ω), is obtained form Proposition 3.1. With Lemma A.1 we propagate the bound to all z ∈ τ 0 +θω+i I 1 (ω). Now we apply Lemma A.1 on the second domain τ 0 +θω+i I 2 (ω), provided I 2 (ω) is not empty. The bound (A.3) for the new z 0 = τ 0 + θω + iη 1 is obtained from the previous step. Finally, we apply Lemma A.1 to τ 0 + θω + i I 3 (ω), in case it is not empty, with the new choice z 0 = τ 0 + θω + iη 2 . Altogether, we applied the lemma at most three times. Through this procedure we prove that a.w.o.p.
This finishes the proof of Lemma 4.5 up to verifying the claim (4.42).
Proof of the claim: For the proof of (4.42) one verifies case by case that on I 1 the term π 1 Θ ∼ |π 1 |Θ is bigger than the two other terms, π 2 Θ 2 and Θ 3 by a factor of N ε . If I 3 is not empty then the term Θ 3 is the biggest. If I 2 is not empty, then |π 2 | ∼ π 2 and π 2 Θ 2 is the biggest term by a factor of N ε . More specifically, when η ∈ I j and χ = χ(ω, η) = 1 we show where π 3 = π 3 := 1. As an example we demonstrate these relations in a few cases: • Well inside a gap: If τ 0 ∈ ∂ supp ρ and ω ∈ [c * ∆, ∆/2] then I 2 (ω) = ∅. We now check that on I 1 (ω) the linear term in Θ is the biggest while on I 3 (ω) the cubic term dominates. First, let η ∈ I 1 (ω). Then the following chain of inequalities hold, Here, we used (4.28), (4.15b), the definition of I 1 (ω) and (4.26c) in the form π 2 ∼ (∆+η) 1/3 . Now we can use χ to replace Φ k by Θ. By definition of χ and since π k |π k | for k = 1, 2 we also get We conclude that on I 1 (ω) the linear term in Θ dominates the others, Suppose now that η ∈ I 3 (ω). In this case, using the choice of the indicator function χ, By definition of I 3 (ω) and (4.26c) we find that Altogether we find that the cubic term dominates the two others, • Inside a gap close to an edge on I 2 : If τ 0 ∈ ∂ supp ρ, ω ∈ [0, c * ∆] and η ∈ I 2 (ω), then we will show the quadratic term in Θ dominates the two other terms. We have where in the inequality we used the definition of I 2 (ω). The choice of χ guarantees that Φ k χ ≥ N 3ε Θ χ. Thus, the quadratic term is larger than the cubic term by a factor of N ε . On the other hand Here, in the first inequality we used the indicator function χ and in the second inequality the definition of I 2 (ω). Altogether, we arrive at • Internal minimum on I 1 : If τ 0 ∈ M \ ∂ supp ρ and η ∈ I 1 (ω), then the linear term is the biggest, Here, we used (4.28) and the definitions of π 1 and I 1 (ω), respectively. Since Φ k χ ≥ N 6ε Θ χ and by the definition of π 2 this shows that the linear term is larger than the quadratic term by a factor of N 4ε . In order to compare the linear with the cubic term we estimate further. By definition of I 1 (ω), Again we use the lower bound on Φ k χ and get Thus we showed that on the domain I 1 (ω) The other cases are proven similarly. This completes the proof of (4.42).

Proof of Corollary 1.9
Here we explain how the local law, Theorem 1.6, is used to estimate the difference between the cumulative density of states and the eigenvalue distribution function of the random matrix H.
The following auxiliary result shows that the difference between two probability measures can be estimated in terms of the difference of their respective Stieltjes transforms. For completeness the proof is given in the appendix. It uses a Cauchy-integral formula that was also applied in the construction of the Helffer-Sjöstrand functional calculus (cf. [8]) and it appeared in different variants in [18], [12] and [17].
Lemma 5.1 (Bounding measures by Stieltjes transforms). There is a universal constant C > 0, such that for any two probability measures, ν 1 and ν 2 , on the real line and any three numbers η 1 , η 2 , ε ∈ (0, 1] with ε ≥ max{η 1 , η 2 }, the difference between the two measures evaluated on the interval [τ 1 , τ 2 ] ⊆ R, with τ 1 < τ 2 , satisfies Here, the three contributions to the error, J 1 , J 2 and J 3 , are defined as where m ν denotes the Stieltjes transform of ν for any signed measure ν.
We will now apply this lemma to prove Corollary 1.9 with the choices of the measures As a first step we show that a.w.o.p. there are no eigenvalues with an absolute value larger or equal than 10, i.e., We focus on the eigenvalues λ i ≥ 10. The ones with λ i ≤ −10 are treated in the same way. We will show first that there are no eigenvalues in a small interval around τ with τ ≥ 10. In fact, we prove that for γ ∈ (0, 1/3), where the supremum is taken over Plugging this bound into the definition of J 1 , J 2 and J 3 from (5.2) and using (5.1) and the fact that ρ = 0 in this regime shows the validity of (5.5). We conclude that a.w.o.p. there are no eigenvalues in an interval of length N −1 to the right of τ . By using a union bound this implies that The eigenvalues larger than N are treated by the following simple argument, Thus ( Here we evaluated ∆(τ 1 ) = 1 and thus κ η + (N η) −1 . With J 1 defined as in (5.2) we infer J 1 ≺ N −1 . Theorem 1.6 also implies the bound sup ω∈ [−20,20] sup η∈ [1,2] since in this regime κ 1, thus showing that J 3 ≺ N −1 . We are left with estimating the three terms constituting J 2 . The first and second of these terms are estimated trivially by using the boundedness of their integrands. Therefore, we conclude that where the error term, R, is defined as This expression is derived by using the bound (1.22) on κ for the integrand of the third contribution to J 2 .
To estimate R further we distinguish three cases, depending on whether τ is away from M, close to an edge or close to a local minimum in the interior of supp ρ. In each of these cases we prove Away from M: In case dist(τ, M) ≥ δ * , with δ * the size of the neighborhood around the local minima from Theorem 4.1, we have ∆ 1/3 + ρ ∼ 1 and thus the η-integral in (5.10) yields a factor comparable to N −1 log N . Thus, R(τ ) ≺ N −1 .
Close to an internal local minimum: Suppose |τ − τ 0 | ≤ δ * for some τ 0 ∈ M \ ∂ supp ρ. Then using the size of ρ from (4.5e) of Theorem 4.1 we see that The bound (5.11) follows by performing the integration over η.
This finishes the proof of (5.11). We insert this bound into (5.9) and use that γ was arbitrary. Thus, we find This finishes the proof of (1.27) since there are no eigenvalues below −10. Now we prove (1.28). Let τ ∈ R \ supp ρ. Suppose that for some k = 1, . . . , K we have |τ − β k | = dist(τ, ∂ supp ρ). The case when τ is closer to the set {α k } than to {β k } is treated similarly. Suppose further that where δ k are defined as in (1.29) and δ 0 = N γ−2/3 . Note that there is nothing to show if k > 1 and the size of the gap, α k −β k−1 , is smaller than 2δ k , i.e., if such a τ does not exist. In particular, we have α k − β k−1 = ∆(τ ) N −1/2 . We will show that a.w.o.p. there are no eigenvalues in an interval of length N −2/3 to the right of τ , i.e. We use the local law, Theorem 1.6, to estimate the differences between the Stieltjes transforms of the two measures for the integrands in the definition of the three error terms, J 1 , J 2 and J 3 from (5.2). By the definition of δ k the condition (1.23) is satisfied inside the integrals and we use the improved bound, (1.24), on κ. Indeed, we find where the supremum is taken over ω ∈ [τ − N −2/3 , τ + 2N −2/3 ] and η ∈ [N −2/3 , 2N −2/3 ]. With this, the definition of δ k and the size of ρ from (4.5c) and (4.5d) we infer From this (5.12) follows. The claim, (1.28), is now a consequence of a simple union bound taken over the events in (5.12) with different choices of τ . This finishes the proof of Corollary 1.9.

Proof of Corollary 1.10
Here we show how we get the rigidity, Corollary 1.10, from Corollary 1.9. Fix a τ ∈ [α 1 , β K ]. We define the random fluctuation to the left, δ − , and to the right, δ + , of the eigenvalue λ i(τ ) as We show now that with this definition, We start with the upper bound on λ i(τ ) . By the definition of i(τ ) we find the inequality The definition of δ + = δ + (τ ) implies that By monotonicity of the cumulative eigenvalue distribution, we conclude that λ i(τ ) ≤ τ + δ + . Thus, the upper bound is proven. Now we show the lower bound. We start similarly, By definition of δ − we get Here the lim inf is necessary, since the cumulative eigenvalue distribution is not continuous from the left. We conclude that λ i(τ ) ≥ τ − δ − − ε for all ε > 0 and therefore the lower bound is proven. Now we start with the proof of (1.33). For this we show that for any τ that is well inside the support of the density of states, i.e., that satisfies (1.31), we have If τ is in the bulk, i.e., dist(τ, M) ≥ δ * , then δ ∼ N −1 and thus (5.16) follows from (1.27). We distinguish the two remaining cases, namely whether τ is close to an edge or to a local minimum inside the interior of supp ρ.
Close to an edge: Suppose that τ ∈ [β k − δ * , β k − ε k ]. The case when τ is closer to {α k } than to {β k } is treated similarly. By the definition of ε k in (1.32) and by the size of ρ from (4.5d) and (4.5b) in Theorem 4.1 we see that ε k N γ δ. Using Corollary 1.9 we find for any ε ∈ (0, γ/2) that

On the other hand
Here we used the size of ρ from Theorem 4.1, the definition of δ and β k − τ ≥ ε k . Since ε was arbitrary we conclude that δ + (τ ) ≺ δ. The bound, δ − (τ ) ≺ δ, is shown in the same way.

Proof of Corollary 1.13
The delocalization of eigenvectors is a simple consequence of the anisotropic local law Theorem 1.12 using the argument from [11]. Expressing the resolvent in the eigenbasis, we have where u (i) is the 2 -normalised eigenvector corresponding to the eigenvalue λ i . We evaluate this at z := λ k + iN γ−1 with γ > 0 as in the statement of Theorem 1.12. The anisotropic local law implies that also b · G(z)b is uniformly bounded. Hence we get by keeping only a single summand i = k from (5.20). As γ > 0 was arbitrary we conclude that uniformly in k.
6 Anisotropic law and universality 6.1 Proof of Theorem 1.12 Given the entrywise local law, Theorem 1.6, the proof of the anisotropic law follows exactly as Section 7 in [6], where the same argument was presented for generalized Wigner matrices (this argument itself mimicked the detailed proof of the isotropic law for sample covariance matrices in Section 5 of [6]). The only difference is that in our case G ii (z) is close to m i (z), the i-th component of the solution to the QVE, which now genuinely depends on i, while in [6] we had G ii (z) ≈ m sc (z) for every i, where m sc (z) is the solution to (1.2). However, the diagonal resolvent elements played no essential role in [6]. We now explain the small modifications. Recall from Section 5.2 of [6] that by polarization it is sufficient to prove (1.36) for 2normalized vectors w = v. We can then write The first term containing the diagonal elements G ii is clearly bounded by the right hand side of (1.36) by Theorem 1.6. This is the first instant where the nontrivial i-dependence of m i is used.
The main technical part of the proof in [6] is then to control Z, the contribution of the off diagonal terms. We can follow this proof in our case to the letter; the nontrivial i-dependence of m i requires a slight modification only at one point. To see this, we recall the main structure of the proof. For any even p, the moment is computed. Let us concentrate on a fixed summand in (6.1) and let B = {b k 1 } ∪ {b k 2 } be the set of v-indices appearing in that term. Using the resolvent identity (2.9) we successively expand the resolvents until each of them appears in a maximally expanded form, where every resolvent entry is of the form G ij , with i, j / ∈ B, appear. In other words, the v-indices and the indices of the resolvent entries are completely decoupled; only explicit products of entries of H represent the connections between them. We can now take partial expectation w.r.t. the rows and columns of these h-terms. In this way we guarantee that each index in B appears at least twice as a value of b k1 or b k2 in (6.1), i.e., the entries of v must be at least paired, and therefore the 2p-fold summation in (6.1) effectively becomes at most a p-fold summation. This renders the uncontrolled 1 -norm of v to q -norms of v, with q ≥ 2, which are bounded by one by normalization.
Along this procedure it is only at the treatment of the maximally expanded diagonal resolvent elements appearing in the non-trivial leaves (cf. Subsection 5.12 of [6]) where we need to slightly adjust the proof to the setting where S is not stochastic. Using the QVE (1.6) and Schur's formula, similarly as in (3.16), we obtain a representation, where all the dependence of the B-columns and -rows of H is explicit This formula replaces (5.41) from [6]. Taking the inverse of this formula and expanding around the leading term m b , we get a geometric series representation for G (B\b) bb in terms of powers of the last three term in (6.2). The resulting formula is analogous to (5.42) in [6]. The geometric series converges because the last three term on the right hand side of (6.2) are much smaller than |1/m b | ∼ 1 a.w.o.p.. Indeed, the last two terms in (6.2) are of size N −1 and N −1/2+c a.w.o.p., respectively. The double sum in (6.2) is small by using the large deviation estimates (2.7a)-(2.7c), similarly as in the proof of Lemma 2.1. When estimating the diagonal sum i = j, we note that |G ii − G ii | similarly to (2.12), and then we use the local law Theorem 1.6 to see that also |G ii − m i | is small.
The proof in [6] did not use the specific form of the subtracted term s bi m i δ ij in (6.2), just the fact that the subtraction made (2.7c) applicable for the double summation in (6.2). After this slight modification, the rest of the proof in [6] goes through without any further changes.

Proof of Theorem 1.15
For the proof of Theorem 1.15 we follow the method developed in [14,17,11]. Theorem 2.1 from [15] was designed for proving universality for a random matrix with a small independent Gaussian component and densities of state that may differ from Wigner's semicircle law. The main theorem in [15] asserts that if local laws hold in a sufficiently strong sense then bulk universality holds locally for matrices with a small Gaussian component. We remark that a similar approach was independently developed in [22] that can also be easily used to conclude bulk universality from Theorem 1.6, but here we follow [15]. In Section 2.5 of [15] a recipe was given how to use this theorem to establish universality for a quite general class of random matrix models even without the Gaussian component, as long as uniform local laws on the optimal scale are known and the matrix satisfies the appropriate q-fullness condition (cf. Definition 1.14) that allows for an application of the moment matching (Lemma 6.5 in [17]) and the Green's function comparison theorem (Theorem 2.3 in [17]). Following this recipe it remains to show that the local law, Theorem 1.6, holds for all matrices with the same variance matrix as H t = e −t/2 H 0 + (1 − e −t ) 1/2 U, for all t ∈ [0, T ] and T is a small negative power of N , i.e., T = N −ε . Here, U is the standard Gaussian ensemble (GUE or GOE), H 0 has independent entries and is independent of U and H T has variance matrix S. The bounded moment condition (D) is automatically satisfied for H t by the construction of H 0 .
The variance matrix of H t is where S G is the variance matrix of the standard Gaussian ensemble and S 0 is given by Since condition (A) is simply a normalization, it suffices to verify (B) and (C) from (1.9) and (1.8) for the variance matrices S t . The uniform primitivity, assumption (B) is satisfied because S is q-full (cf. Definition 1.14) and T 1. In particular, the matrices S t are uniformly q/2-full. This also implies the boundedness of the corresponding solutions, m t , to the QVE in a neighborhood of the origin, z = 0, by Remark 4.2 in [1]. The validity of (C) away from z = 0 follows from Remark 1.11 in [1] and from the L 2 -bound, This bound holds for the solution of the QVE that corresponds to very general variance matrices and is part of the statement of Theorem 1.1 in [1]. The 3 in this bound replaces the original 2 in order to compensate for the potentially violated normalization of the variance matrices. This shows that Theorem 1.6 can be applied to H t and thus universality is proven. Proof. We will not carry the upper index N in this proof. First we choose a grid G ⊆ D with the following properties

A Appendix
• The number of points in G is polynomially large, i.e., |G| ≤ C 2 N D 4 .
• The grid is connected and sufficiently dense in D, i.e., for any two points z 1 , z ∈ G there is a path (z i ) K i=2 ⊆ G, such that max{|z K − z|, |z i+1 − z i |} ≤ N −D 5 for all i = 1, . . . , K − 1.
Here, the positive exponent D 5 is choose sufficiently large such that C 1 N D 2 −ε 1 D 5 ≤ N −D 3 /2.
Then an upper bound on the positive constants D 4 and C 2 is determined by the choice of D 5 and the diameter of D, i.e., by D 1 . Now let z ∈ G. Then we find a path (z i ) K i=1 in G that connects z 0 with z K+1 := z in the sense of the second property of G. We may assume the length of the path, K, to be bounded by By (A.1) and since G is sufficiently dense in D this bound extends to all z ∈ D and the lemma is proven.
Finally we derive a bound for L 3 . We split the integral into two components, to get the bound Together, the two bounds imply