Universal Eigenvalue Statistics for Dynamically Defined Matrices

We consider dynamically defined Hermitian matrices generated from orbits of the doubling map. We prove that their spectra fall into the GUE universality class from random matrix theory.


Introduction
Eugene Wigner formulated the vision that the spectra of complex quantum systems are well described by the eigenvalues of random matrices. His work inspired the Wigner-Dyson-Mehta universality conjecture: the local eigenvalue statistics of matrices with independent entries are universal in the sense that they depend only on the symmetry type of the matrix, Hermitian (GUE) or real symmetric (GOE). The Wigner-Dyson-Mehta universality conjecture was a driving force of random matrix theory and was finally resolved around 10 years ago in a series of groundbreaking works [19,20,25,26,37]. A major focus of random matrix theory in recent years has been to broaden the reach of this vision to various models of "not-too-random" matrices whose entries exhibit dependencies. Among the main well-established avenues of research are the study of probabilistically generated correlations [1,4,5,8,18,33] and of graph-theoretically induced constraints on random adjacency matrices [9,10,31]. The motivation to move towards increasingly structured ensembles is the widely held belief that the universality phenomenon encompasses many strongly correlated systems beyond traditional random matrices. This belief is based on many real-world examples of strongly correlated point processes which empirically reproduce random matrix statistics. Famous examples include the BGS quantum chaos conjecture and Montgomery's pair correlation conjecture for the zeros of the Riemann zeta function. It is a major open problem to explain the breadth of the universality of random matrix statistics.
In our ongoing research program [3,2], we turned to dynamical systems theory as a method for producing matrices with dependencies. We consider dynamically defined matrices which are generated by evaluating a complex-valued function f along orbits of an ergodic transformation T and then using the sequence f(x), f(Tx), f(T^2 x), ... to fill a rectangular array, e.g., as in (2.1) below. As is common in dynamical systems theory, all the randomness then comes from sampling the starting position x.
We briefly recall that mathematical physicists have long studied the spectral theory of dynamically defined matrices of Schrödinger type. These matrices are typically tridiagonal, thus sparse. In the Schrödinger (or, more generally, Jacobi) world, many important works have tied together dynamical systems theory and spectral theory in deep and sometimes surprising ways, with a particular role played by the Almost Mathieu operator; see [6,12,13,17,32] for examples of breakthrough results. One finding has been that sufficiently random-like Schrödinger operators display Anderson localization and also Poisson spectral statistics [36]. Hence, roughly speaking, it can be said that the sparse dynamically defined Schrödinger operators tend to belong to the opposite of the random matrix regime.
We take a new perspective and consider dynamically defined Hermitian matrices that are full (i.e., all entries are of comparable size). This puts us in the world of random matrix theory, but with a novel kind of correlation structure. Our goal is then to establish that these dynamically defined matrices display random matrix statistics down to the smallest scale, i.e., that dynamical correlations can mimic random ones.
The main result of this work can be summarized as follows.
The GUE universality class contains dynamically defined matrices.
We emphasize that a single entry f(T^k x) of our dynamically generated matrix (defined in (2.1) below) fully determines all other entries, so in some sense there is complete deterministic dependence within the matrix; this fact places the model outside the reach of existing techniques. From a dynamical systems perspective, it is nonetheless clear that, say, a strongly mixing dynamical system will lead to rapid decorrelation of the entries. The latter perspective is fruitful for us, but we have to make precise how the dynamical decorrelation of matrix entries manifests on the spectral level, and this requires some new techniques.
Naturally, the precise manner of dynamical decorrelation depends on the underlying dynamical system. In the prior works [3,2], we focused on extremely rigid dynamical systems of skew-shift type which are heavily correlated; the resulting spectra are difficult to analyze down to small spectral scales and universality remains open. Even in the Schrödinger world, the skew-shift still presents a significant frontier [13,30,34]. In this work, we instead consider the doubling map, a strongly mixing dynamical system, and derive the desired strong conclusion: universal eigenvalue statistics.
Existing techniques for correlated matrices come in two flavors: either they use a special, model-dependent correlation structure (e.g., in the random graph setting) [9,10,31], or they require assumptions on the correlation decay [5,15,24] that are not fulfilled in the dynamical setting, for the reason mentioned above that a single matrix entry fully determines all other ones. We overcome these limitations by using a resampling trick and by adapting techniques developed for correlated matrices with a finite range of dependence [15] to a logarithmic range of dependence.
From the random matrix theory perspective, our result broadens the scope of the GUE universality class to encompass dynamically defined matrices. From a mathematical physics perspective, it provides a spectral-theoretic confirmation of the quasi-random nature of the doubling map; this can be seen as a "delocalization analog" of a well-known result by Bourgain-Schlag [13] establishing Anderson localization for potentials dynamically defined via strongly mixing transformations; see also [16]. We leave it as an open problem to extend the result presented here to a wider class of dynamical systems satisfying a quantitative mixing assumption, such as appropriate subshifts of finite type described by the potential formalism of Bowen [14]. Our result opens the door to developing a much more general theory which connects dynamical systems theory to the spectra of dynamically defined full matrices. It would be interesting to see which features of such a theory mirror developments in the world of Schrödinger operators, especially given that they typically model the physically different localization regime.
The paper is organized as follows.
• In Section 2, we define our model of dynamically defined matrices and list our main results: a local law for the Green's function, the universality of its eigenvalue statistics, and the delocalization of eigenvectors.
• In Section 3, we introduce our resampling method, state the analogous universality results for the resampled matrices, and show that the results of Section 2 can be derived from the results of that section.
• In Section 4, we review the importance of deriving a self-consistent equation for the Stieltjes transform of our resampled matrices. We show that our model has a scalar self-consistent equation, and we derive this scalar self-consistent equation from an associated matrix self-consistent equation.
• In Section 5, we derive our matrix self-consistent equation via appropriate concentration estimates.
• In Section 6, we prove the stability of our matrix self-consistent equation and derive associated error bounds for the Stieltjes transform and the Green's function of our resampled matrix model.
• In Section 7, we use our stability bounds and error estimates to prove a local law via an inductive scheme.
• In Section 8, we compare our resampled matrix model to a model with a small Gaussian part via an Ornstein-Uhlenbeck process. This allows us to prove our main universality result.

Model and main result
2.1. Dynamically defined matrices. Let T = R/Z denote the standard torus. Given an evaluation (or sampling) function f : T → C and a fixed starting point x ∈ [0, 1], we consider the dynamically defined matrix
(2.1)  X = \frac{1}{\sqrt{N}} \big( f(T^{(j-1)N+k} x) \big)_{j,k=1}^{N}.
Observe that all the entries of X belong to a single dynamical orbit, namely the orbit of the starting point x ∈ [0, 1] under the doubling map T. The normalization factor 1/√N is standard and ensures that the limiting spectral distribution is supported on an order-1 set. For convenience, we start at the top left with Tx instead of x, and we skip N discrete time steps when moving from one row to the next.
Our results concern the (real-valued) spectrum of the following Hermitization H_X of X:
(2.2)  H_X = \begin{pmatrix} 0 & X \\ X^* & 0 \end{pmatrix}.
To turn this into a random matrix ensemble, we sample the starting position x ∈ [0, 1] from the uniform measure on [0, 1], which we call P. We write E for the associated expectation. The uniform measure is a natural choice because it is the equilibrium measure of the doubling map.
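To make the construction concrete, here is a minimal numerical sketch (not from the paper; the helper names are ours) that builds X and its Hermitization for the admissible choice f(x) = exp(2πix). Since floating-point arithmetic loses one binary digit per application of the doubling map, we store the binary digits of x explicitly and realize T as a digit shift.

```python
import numpy as np

# Minimal sketch (not the paper's code): the matrix X from (2.1) for
# f(x) = exp(2*pi*i*x).  Repeated doubling destroys floating-point accuracy,
# so we store the binary digits of x and realize T as a digit shift.
rng = np.random.default_rng(0)
N = 8
digits = rng.integers(0, 2, size=N * N + 60)   # digits b_1, b_2, ... of x

def orbit_point(k, n_digits=53):
    """T^k x = 0.b_{k+1} b_{k+2} ..., truncated to n_digits binary digits."""
    b = digits[k : k + n_digits]
    return np.sum(b * 0.5 ** np.arange(1, len(b) + 1))

def f(x):
    return np.exp(2j * np.pi * x)

# Entry (j, k) samples f along the orbit; the top-left entry uses T x and
# consecutive rows are N time steps apart.
X = np.array([[f(orbit_point(j * N + k + 1)) for k in range(N)]
              for j in range(N)]) / np.sqrt(N)

# Hermitization as in (2.2); its spectrum is symmetric around 0.
H = np.block([[np.zeros((N, N)), X], [X.conj().T, np.zeros((N, N))]])
evals = np.linalg.eigvalsh(H)
```

Since |f| = 1, every entry of X has modulus exactly N^{-1/2}, and the eigenvalues of H come in ± pairs (the ± singular values of X).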
We now state an informal version of the main result.See Theorem 2.9 below for the formal version.
Theorem (Main result, informal version). Suppose that f is an admissible evaluation function in the sense of Definition 2.1. Then, for every k ≥ 1, the k-point correlation functions of the eigenvalues of H_X converge to the k-point correlation functions of the eigenvalues of an N × N GUE matrix as N → ∞.
The function f (x) = exp(2πix) is an example of an admissible evaluation function in the sense of Definition 2.1.
The result can be rephrased to describe the singular values of X if desired. The generalization to non-square matrices X is straightforward.

2.2. Admissible evaluation functions.
The following assumption on the evaluation function f arises naturally in the proof.
We require f ∈ C^2(T) and define its Fourier coefficients by \hat{f}(n) = \int_T f(x) e^{-2\pi i n x} dx. We then associate to f the function g_f, defined in (2.4) below as a sum over integers n with 2 ∤ n, where 2 ∤ n means that 2 is not a divisor of n. To see that g_f is well defined, one uses an estimate whose last step is the Cauchy-Schwarz inequality.
Definition 2.1 (Admissible evaluation function). We say that f ∈ C^2(T) is an admissible evaluation function if E[f] = 0 and there exists a constant g_min > 0 such that
(2.5)  inf_{x ∈ T} g_f(x) ≥ g_min.
Example 2.2. The functions f(x) = e^{2πix} and f(x) = cos(2πx) are admissible evaluation functions.
Remark 2.3. A simple sufficient condition for f ∈ C^2(T) to be an admissible evaluation function can be obtained by ignoring cancellations and requiring some concentration of the Fourier modes along dyadic scales. More precisely, it suffices that there exists an integer n ≥ 1 with 2 ∤ n for which the Fourier mass of f is sufficiently concentrated along the dyadic scale {2^j n}_{j ≥ 0}; then f is an admissible evaluation function.
To clarify the meaning of the function g_f, we express it through the following function φ : Z → C, defined in (2.7), which measures the correlation between f and f across different dyadic scales. We use φ to define the infinite Toeplitz matrix Φ in (2.8). Observe that φ(−j) = \overline{φ(j)}, so Φ is Hermitian. We will see below that Φ determines the limiting spectral distribution of H_X as N → ∞. By expanding the square in (2.4), we can rewrite g_f as the Fourier series associated to the correlation function φ, i.e.,
(2.9)  g_f(x) = \sum_{j ∈ Z} φ(j) e^{2πijx}.
One says that formula (2.9) represents the Fourier symbol of the infinite Toeplitz matrix Φ. In particular, inf_x g_f(x) = inf spec Φ. We see that Assumption (2.5) is equivalent to the spectral condition that Φ is strictly positive definite. This is how the function g_f arises in the proof.
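The relation inf_x g_f(x) = inf spec Φ can be probed numerically on toy data. The sketch below (with made-up coefficients φ(j), not the φ from (2.7)) checks the classical fact that the eigenvalues of a finite banded Hermitian Toeplitz matrix stay inside the range of its Fourier symbol; in particular, a strictly positive symbol forces a uniform spectral gap.

```python
import numpy as np

# Toy check (made-up coefficients, not the paper's phi): the spectrum of a
# banded Hermitian Toeplitz matrix lies in the range of its Fourier symbol
# g(x) = sum_j phi(j) e^{2 pi i j x},  with phi(-j) = conj(phi(j)).
phi = {0: 1.0 + 0j, 1: 0.3 + 0.1j, 2: 0.05 + 0j}
n = 200
A = np.zeros((n, n), dtype=complex)
for j, c in phi.items():
    A += np.diag(np.full(n - j, c), k=j)             # superdiagonal j
    if j > 0:
        A += np.diag(np.full(n - j, np.conj(c)), k=-j)  # Hermitian partner

xs = np.linspace(0.0, 1.0, 4001)
g = phi[0].real + 2 * sum((phi[j] * np.exp(2j * np.pi * j * xs)).real
                          for j in (1, 2))           # real-valued symbol
evals = np.linalg.eigvalsh(A)
```

Here the symbol happens to be strictly positive, so all eigenvalues of the finite section are bounded below by min g, uniformly in n.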

2.3.
First result: local law. We begin by stating a local law which shows that, as N → ∞, the empirical spectral distribution of H_X converges to a deterministic limiting density even at small scales. The local law is a key prerequisite for our main result, universality. The local law is conveniently formulated through the Stieltjes transform: given a measure dµ on the real axis, its Stieltjes transform m_µ is defined by
m_µ(z) = \int_R \frac{dµ(x)}{x - z},  z ∈ C^+.
We prove a formulation of the local law in which the Stieltjes transform of H_X converges to m_∞(z), which solves the self-consistent equation (2.10).
Proposition 2.4. There exists a unique solution m_∞(z) to (2.10) with positive imaginary part.
This proposition is proved in Appendix A. To state the local law, we introduce the following notation. Given ε, δ > 0, we define a spectral domain of parameters z = E + iη, where the first condition, ρ_∞(E) ≥ ε, puts us in the bulk of the spectrum. We say that a sequence of events A_N holds almost surely as N → ∞ if P(A_N) → 1 as N → ∞, where we recall that P is the uniform measure for x ∈ [0, 1].
Theorem 2.5 (Local law). Let f ∈ C^2(T) be an admissible evaluation function. Then, for every ε, δ > 0, the estimate (2.11) holds almost surely as N → ∞, with the supremum taken over the spectral domain. Note that the right-hand side of (2.11) vanishes in the limit as long as η ≥ N^{−1+δ} with δ > 0 arbitrary, so the local law descends arbitrarily close to the optimal scale N^{−1} at which individual eigenvalues become resolved.
We can identify m_∞(z) as the Stieltjes transform of a limiting spectral distribution, which we call ρ_∞(x). This ρ_∞(x) is a deterministic probability density function which generalizes the Wigner semicircle law ρ_sc(x) = (2π)^{−1} \sqrt{(4 − x^2)_+}. By the Stieltjes inversion formula, ρ_∞(E) = lim_{η ↓ 0} π^{−1} Im[m_∞(E + iη)]. The proof of this proposition is deferred to Appendix A. Notice that the density ρ_∞(x) is deterministic and depends only on the choice of evaluation function f.
Example 2.7. For the evaluation function f(x) = exp(2πix), we have ρ_∞ = ρ_sc, the Wigner semicircle law. See Figure 1 for a pictorial representation of the emergent global law in this special case of Theorem 2.5.
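For the semicircle case of Example 2.7, the Stieltjes-transform machinery can be verified numerically; the closed form used below is the standard one for ρ_sc and is not taken from the paper.

```python
import numpy as np

# Check, in the semicircle special case of Example 2.7, that (i) the
# Stieltjes transform solves a Wigner-type self-consistent equation and
# (ii) the Stieltjes inversion formula recovers the density.
def m_sc(z):
    """Stieltjes transform of the semicircle law; branch with Im > 0 on C^+."""
    s = np.sqrt(z * z - 4)
    if s.imag < 0:
        s = -s
    return (-z + s) / 2

z = 0.5 + 1e-3j
m = m_sc(z)
residual = m - 1 / (-z - m)             # m solves m = 1/(-z - m)

E, eta = 0.5, 1e-6
rho = m_sc(E + 1j * eta).imag / np.pi   # Stieltjes inversion at small eta
rho_exact = np.sqrt(4 - E ** 2) / (2 * np.pi)
```

The inversion step mirrors the role of the small spectral scale η in the local law: taking η down to the scale where individual eigenvalues live is what resolves the density pointwise.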
Remark 2.8. Let us explain the form of the self-consistent equation (2.10). In the proof, it arises in the form (2.13), where m_Φ ≡ m_{ρ_Φ} is the Stieltjes transform of the spectral density of the infinite Toeplitz matrix Φ. Since Φ is unitarily equivalent to multiplication by g_f in Fourier space, we have an explicit representation of m_Φ in terms of g_f, which connects (2.13) to (2.10).

2.4.
Main result: universality. We are now ready to state our main universality result. We write λ_1, ..., λ_{2N} for the eigenvalues of H_X and p_N : R^{2N} → R for the symmetrized eigenvalue density of H_X. The k-point correlation functions p^{(k)}_N are defined by integrating p_N over all but k of its arguments. The universal objects are the p^{(k)}_{N,GUE}, defined analogously as the k-point correlation functions of a matrix from the Gaussian Unitary Ensemble. They are explicitly computable, e.g., in determinantal form via a kernel K_N(x, y), which has an explicit formula in terms of Hermite polynomials [27].
Remark 2.10. (i) The condition ρ_∞(E) ≥ ε means that we work in the bulk of the spectrum. It may be possible to prove similar results at the spectral edge, but this is not our focus here. (ii) While the results are stated as holding almost surely as N → ∞, the proof quantifies the convergence speed as polynomial. Since this point is not our focus here, we use the qualitative notion for simplicity of presentation. (iii) There exist universality results for correlated matrices in the literature [15,24]. However, these do not apply to our dynamically defined model. For instance, the correlation-decay condition (2.15) in [24] requires that the covariance of any pair of bounded functions F and G applied to entries H_ij and H_kl decays in the distance between the index pairs. For our model, fixing i, j, k, l and taking F = id and G = T^p for an appropriate power p, the left-hand side equals E[|H_kl|^2] = N^{−1}, a constant. Since this does not decay at all, the correlation condition (2.15) breaks down in our model.
Remark 2.11 (Open problem). The formulae (2.1) and (2.2) defining our matrix ensemble can be generalized to dynamically define a matrix ensemble starting from any dynamical system (X, T, µ). It is an interesting open problem, lying at the interface of dynamical systems theory and random matrix theory, to extend the present universality result to a much broader class of dynamical systems. Natural candidates are dynamical systems satisfying a quantitative mixing assumption, e.g., subshifts of finite type building on the potential formalism developed in [14]; see also [7].

2.5. Delocalization of eigenvectors.
In physics, the delocalization of eigenvectors is seen as a hallmark of the random matrix theory phase. In a standard way, the precise control on the Green's function established in the proof of Theorem 2.5 implies eigenvector delocalization bounds in the bulk of the spectrum.
We use the convention that eigenvectors u_α are ℓ^2-normalized, so a completely delocalized eigenvector, which is roughly equally supported on all coordinates, would have max_i |u_α(i)| of order N^{−1/2}.
Theorem 2.12 (Delocalization bound for eigenvectors). Let H_X u_α = E u_α for some E ∈ R satisfying ρ_∞(E) ≥ ε for some ε > 0. Then, there exists δ > 0 such that max_i |u_α(i)| ≤ N^{−1/2+δ} holds almost surely as N → ∞.
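As a point of comparison (a GUE surrogate, not the dynamical ensemble itself), the following sketch checks numerically that bulk eigenvectors of a Hermitian random matrix have sup-norm far below the localized value 1 and close to the ideal N^{−1/2}.

```python
import numpy as np

# Delocalization check on a GUE surrogate: every l2-normalized bulk
# eigenvector should have small sup-norm, of order sqrt(log N / N).
rng = np.random.default_rng(1)
N = 400
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / np.sqrt(4 * N)        # spectrum approximately [-2, 2]
vals, vecs = np.linalg.eigh(H)
bulk = np.abs(vals) < 1.0                    # stay away from the edges
sup_norms = np.abs(vecs[:, bulk]).max(axis=0)
```

A localized eigenvector would have sup-norm close to 1; here every bulk sup-norm is a small multiple of N^{−1/2} = 0.05.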
2.6. Proof strategy. Overall, our proof relies on the well-known three-step strategy for proving universality developed in the past years. However, the dynamical nature of the correlations in our matrix ensemble requires some new ideas, which can be summarized as follows.
• The first novel feature of our proof is a dynamical resampling trick which we use to "preprocess" the dynamically defined ensemble H_X into another one, called H_Y, whose entries become independent after (log N)^6 many discrete time steps. The resampling procedure is based on the binary expansion, because the doubling map then acts as a digit shift. Importantly, the construction is such that the Green's functions before and after resampling agree on scales Im[z] ≥ N^{−1}. This means that their spectra are essentially the same, and so it suffices to prove the local law and universality for the ensemble H_Y with its logarithmic range of dependence.
• In a second step, we prove the local law and universality for the eigenvalues of H_Y with its logarithmic range of dependence by extending Che's analysis for a finite range of dependence [15]. The possibility that these methods could be extended to a logarithmic (or even a power of a logarithm) range of dependence was already raised by Che.
• A problem compared to the situation studied in [15] is that we no longer have strict positivity of the correlation matrix or Lipschitz continuity of the correlations. We address these issues by leveraging the block structure and symmetries to reduce to a scalar self-consistent equation for the Stieltjes transform. For this scalar equation, we can recover positive definiteness by using the Ward identity, the assumption that f is an admissible evaluation function, and techniques from the spectral theory of Toeplitz matrices, e.g., turning a banded Toeplitz matrix into a circulant matrix (whose spectrum is explicitly computable) by a finite-rank perturbation.
• Other building blocks of the proof, e.g., the fast equilibration of Dyson Brownian motion, are standard in the field and are accordingly only summarized briefly.
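The circulant trick mentioned in the third bullet can be illustrated as follows (toy coefficients; the banded matrix merely stands in for Φ_N): wrapping the band around the corners is a perturbation of rank at most twice the band width, and the circulant's eigenvalues are given exactly by the discrete Fourier transform of its first column.

```python
import numpy as np

# Toy version of the circulant trick: a banded symmetric Toeplitz matrix T
# differs from its circulant wrap-around C by a low-rank corner correction,
# and spec(C) is computable exactly by the discrete Fourier transform.
n, band = 64, 2
coeff = {0: 2.0, 1: 0.5, 2: 0.25}            # toy symbol coefficients

c0 = np.zeros(n)
for j, v in coeff.items():
    c0[j] = v
    if j > 0:
        c0[n - j] = v                        # wrap-around makes it circulant
C = np.array([[c0[(i - j) % n] for j in range(n)] for i in range(n)])

T = np.zeros((n, n))
for j, v in coeff.items():
    T += np.diag(np.full(n - j, v), k=j)
    if j > 0:
        T += np.diag(np.full(n - j, v), k=-j)

circ_spec = np.sort(np.fft.fft(c0).real)     # exact circulant eigenvalues
```

The correction C − T lives only in the two corners, so its rank is at most 2 × band, which is what makes the comparison a finite-rank perturbation.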
We use the convention that C denotes a generic positive constant that is independent of N and whose value may change from line to line.

3.
Step I: Resampling to logarithmic range of dependence

3.1. Resampling binary digits. Recall that x is the starting point of the doubling map dynamics in (2.1). We write x = \sum_{n ≥ 1} b_n 2^{−n}, b_n ∈ {0, 1}, for its binary expansion. The doubling map acts as a digit shift in the binary basis: the binary digits of T^k x are (b_{k+1}, b_{k+2}, ...). We let {b^k_n}_{n,k ≥ 1} be a two-parameter family of independent coin flips with outcomes {0, 1} occurring with probability 1/2 each. We then define the resampling of T^k x in (3.1) by keeping its first ⌈(log N)^6⌉ binary digits and replacing all deeper digits by the fresh coin flips b^k_n. (The choice of the power 6 is somewhat arbitrary; any sufficiently large power of a logarithm will do.) The resampled matrix ensemble Y is obtained by replacing each entry f(T^k x) of X by its resampled counterpart y_k, with Hermitization H_Y defined as in (2.2).

3.2. Local law and universality for the resampled ensemble. The advantage of the resampled ensemble H_Y is that, in contrast to the original ensemble H_X, it only has logarithmic range of dependence: for |a − b| > (log N)^6, the random variables y_a and y_b are independent. In Step 2, we will use this fact to prove that the analogs of our main results (local law and universality) indeed hold for H_Y. For completeness, they are stated here.
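A minimal sketch of the resampling (our own notation; the small L stands in for ⌈(log N)^6⌉): each resampled point keeps the leading L digits of T^k x and redraws everything deeper, so it stays within 2^{−L} of the original, while resampled points more than L steps apart involve disjoint digit sets and are therefore independent.

```python
import numpy as np

# Sketch of the resampling trick (names and the small L are ours): keep the
# first L binary digits of T^k x, replace all deeper digits by fresh coin
# flips.  The resampled point then differs from T^k x by at most 2^{-L}.
rng = np.random.default_rng(2)
L = 12                                     # stand-in for ceil((log N)^6)
M = 100
digits = rng.integers(0, 2, size=M + 60)   # binary digits of x

def orbit(k, n=40):
    """T^k x truncated to n binary digits."""
    b = digits[k : k + n]
    return np.sum(b * 0.5 ** np.arange(1, len(b) + 1))

def resampled(k):
    """Keep the first L digits of T^k x, redraw all deeper digits."""
    fresh = rng.integers(0, 2, size=28)
    b = np.concatenate([digits[k : k + L], fresh])
    return np.sum(b * 0.5 ** np.arange(1, len(b) + 1))

err = max(abs(orbit(k) - resampled(k)) for k in range(M))
```

For k and k' with |k − k'| > L, the two resampled points are built from disjoint sets of digits of x plus their own fresh coins, which is exactly the source of the logarithmic range of dependence.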
We write p^{(k)}_{N,Y} for the k-point correlation function associated to the eigenvalues of H_Y.
Theorem 3.2 (Universality after resampling). Let f ∈ C^4(T) be an admissible evaluation function. Let k ≥ 1 and let O : R^k → R be smooth and compactly supported. Then, for every ε > 0 and every energy E with ρ(E) ≥ ε, there exists κ > 0 such that the correlation functions of H_Y, tested against O in the bulk near E, agree with those of the GUE up to an error O(N^{−κ}) as N → ∞.
Before we prove these theorems, we use them to derive Theorems 2.5 and 2.9.

3.3. Green's function comparison lemma.
Here and in the following, we write m_X(z) and m_Y(z) for the Stieltjes transforms of H_X and H_Y, respectively. From now on, we always assume that f ∈ C^2(T) is an admissible evaluation function.
ε, and ρ_∞ is the measure associated to m_∞. Given σ_1, ..., σ_n ≤ σ, we set the corresponding spectral parameters accordingly. Then, there exists C_σ > 0 depending only on σ such that the Green's function comparison bound of Lemma 3.3 holds.
Proof of Lemma 3.3. Let σ > 0. By the resolvent identity, we expand the difference of the Green's functions of H_X and H_Y in terms of X − Y. The entries of the matrix X − Y are of the form f(T^k x) − y_k; for these, we have a bound of order 2^{−(log N)^6} by the mean-value theorem. The outside factors are controlled by the super-polynomial smallness of this bound, since the logarithm comes with a power.

3.4.
Proof of the main results assuming Theorems 3.1 and 3.2.
Proof of the local law, Theorem 2.5. This follows by combining Lemma 3.3 and Theorem 3.1.
For Theorem 2.9, we need another technical comparison result.
Proof. The derivation of Lemma 3.4 from Lemma 3.3 and the local laws is standard for generalized Wigner matrices; see [27, Thm 15.3] and [22, Thm 6.4]. In essence, the proof idea is to mollify the observable O in each argument through approximate δ-functions of the form Im (x − E − iη)^{−1} with η ∼ N^{−1−σ}, which is a good approximation because these small scales resolve individual eigenvalues. This allows one to express the tested correlation function using polynomials in Im m(z_1), ..., Im m(z_k), to which one applies Lemma 3.3. An inclusion-exclusion argument takes care of the possibility of eigenvalues matching. The local laws for m_Y(z) and m_X(z) play the role of a priori bounds on the eigenvalue density in the form of [27, eq. (15.7)].
Our contribution here is solely to observe that the arguments carry over verbatim to the ensemble defined by H_Y. For the details, we refer the interested reader to [27, Thm 15.3] and [22, Thm 6.4].
Proof of the main result, Theorem 2.9. This follows by combining Lemma 3.4 and Theorem 3.2.
We will see that it is essentially given by (2.13) up to error terms that are subleading in N .
The standard approach for matrices with independent entries is to remove a single row and column via the Schur complement formula. With dependencies, this is no longer sufficient. The solution found in [15] is to expand the rows and columns even further until sufficient decoupling is achieved. In this section, we adapt this approach to the dynamically defined ensemble.
4.1. Preliminaries. We introduce some notation. Following [15], we define the full correlation matrix Σ of H, a 2N^2 × 2N^2 matrix whose entry Σ_{(i,j)(k,l)} records the covariance of the matrix entries indexed by (i,j) and (k,l). One immediately sees that the matrix Σ is not strictly positive definite in the sense of [15, Def. 2.2], since Σ_{(i,j)(i,j)} = 0 for 1 ≤ i, j ≤ N.
To quantify the error terms in the self-consistent equation, we need some notation.
Definition 4.1 (Notation for index reduction). Let A be a matrix indexed by a subset of the integers. (i) For subsets T and S of Z, we write A|_{T,S} for the submatrix of A whose rows are given by T and whose columns are given by S. (ii) For U a subset of Z, we denote by A^{(U)} the matrix obtained from A by setting each entry whose row or column index belongs to U to 0, leaving the matrix A otherwise unaltered. (iii) For U ⊂ {1, ..., 2N}, we also set the corresponding reduced Green's function.
Definition 4.2 (Stochastic control parameters). Given i ∈ {1, ..., 2N}, let I and J be disjoint subsets of {1, ..., 2N}, and let ‖·‖ denote the operator norm on C^{|I|} with respect to the Euclidean norm. In terms of these norms, we define the stochastic control parameters Γ, γ ≥ 1.

4.2.
The self-consistent equation. We first state the general form of the self-consistent equation satisfied by the Green's function.
Lemma 4.3 (Self-consistent equation for the Green's function). Let z = E + iη and σ > 0. Then the self-consistent equation (4.3) holds up to the error terms quantified below.
This lemma is proved in the next section. The proof is similar to that of [15, Lemma 3.10], due to the fact that the relevant concentration estimates for quadratic forms can be extended to a logarithmic range of dependence.
We are able to leverage the special structural properties of the dynamically defined matrix model to reduce to a scalar self-consistent equation for the Stieltjes transform m(z).
The relevant finite-N analog of Φ from (2.8) is the banded N × N Toeplitz matrix Φ_N defined as follows: for any 1 ≤ i, j ≤ N,
(4.4)  (Φ_N)_{ij} = φ(j − i) 1_{|i − j| ≤ (log N)^6}.
We note that Φ_N is Hermitian because the function φ satisfies φ(−j) = \overline{φ(j)}.
With the identification of the most important parts of the covariance, we can simplify the self-consistent equation from Lemma 4.3 so that it acquires a nice block structure.
We have the equations (4.5) on the blocks G_i of the Green's function, where the O_σ errors are elementwise. By setting these errors to zero, we can derive the limiting matrix self-consistent equation.
Let Ξ(M) denote the operator on 2N × 2N matrices induced by the correlation structure. Intuitively, we see that G should approach the solution of the limiting self-consistent equation. Unfortunately, this matrix self-consistent equation is still not in a good enough form for detailed analysis. However, we notice vast simplifications in the equations of (4.5). The equations for G_1 and G_2 are closed, and this is all that is necessary to determine the behavior of the trace of the Green's function. Furthermore, we also expect G_3, G_4 and the off-diagonal entries of G_1 to concentrate around zero. With these simplifications, we can present a scalar self-consistent equation. This form of the self-consistent equation is the most important one for deriving stability and error propagation bounds in Section 6.
We write m_{Φ_N} for the Stieltjes transform of the (no longer explicit) spectral measure of Φ_N. Recall the definition of m(z) from equation (4.1).
Proposition 4.5 (Scalar self-consistent equation for m(z)). The Stieltjes transform m(z) satisfies a scalar self-consistent equation involving m_{Φ_N}, up to errors that are subleading in N.
Remark 4.6. Formally taking N → ∞, we can expect this equation to approximate the limiting self-consistent equation (2.13).
4.3. Proof of Proposition 4.5. We recall the Ward identity (or rather Ward identities, since they helpfully decompose into two relations for sample covariance matrices such as H). These give us control on the sizes of the off-diagonal Green's function entries in terms of its diagonal entries.
Lemma 4.7 (Ward identities). The Green's function satisfies the Ward identities; in their basic form, \sum_j |G_{ij}(z)|^2 = Im[G_{ii}(z)]/η for z = E + iη. Here and in the following, we use both A_{i,j} and A_{ij} for the entries of a matrix A, depending on convenience.
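The basic scalar form of the Ward identity, Σ_j |G_ij(z)|² = Im[G_ii(z)]/η, holds for the resolvent of any Hermitian matrix and is easy to verify numerically:

```python
import numpy as np

# Numerical check of the Ward identity sum_j |G_ij|^2 = Im[G_ii] / eta,
# valid for the resolvent G(z) = (H - z)^{-1} of any Hermitian H, z = E + i*eta.
rng = np.random.default_rng(3)
n = 50
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (A + A.conj().T) / 2                     # an arbitrary Hermitian matrix
E, eta = 0.3, 0.1
G = np.linalg.inv(H - (E + 1j * eta) * np.eye(n))
lhs = (np.abs(G) ** 2).sum(axis=1)           # sum over j of |G_ij|^2, each i
rhs = G.diagonal().imag / eta
```

The identity follows from the resolvent identity G(z) − G(z̄) = (z − z̄) G(z) G(z̄); it is what converts sums of squared off-diagonal entries into diagonal information in the proof.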
Proof of Proposition 4.5.
Step 1. Correlation matrix. Let 1 ≤ i, j, k, l ≤ N. Using (3.1) and the measure-preservation of the doubling map, we find the formula (4.10) for the correlation matrix, where Φ_N is given by (4.4) and where we introduced its sibling built from a companion correlation function ψ. We note that (4.10) contains all the non-trivial entries of Σ, i.e., all others are zero. The error terms O(2^{−(log N)^6}) come from (3.1) and the mean-value theorem; they are super-polynomially decaying and can thus be ignored in the following.
Step 2. Reduction to blocks. We introduce the block Green's functions G_1, G_2, G_3, G_4 and their normalized traces T_1(z), T_2(z). The Schur complement formula and cyclicity readily imply that T_1(z) = T_2(z), and consequently (4.12) holds. We see that it suffices to study the reduced Stieltjes transform T_1(z). We decompose the self-consistent equation (4.3) into blocks accordingly, taking into account when the correlation matrix vanishes. We first let i, j ≤ N. Rearranging terms, we obtain (4.13). For the second step, we used the formula from Step 1, the Cauchy-Schwarz inequality, the Ward identity, and ‖ψ‖_2 < ∞ (proved below) to estimate a term which is dominated by g_f. Hence, using the formula from Step 1 to rewrite the left-hand side of (4.13), we conclude (4.14). For i, j > N, we have to consider the analogous expression, and completely analogous reasoning yields the remarkably different formula (4.15), where we used that N^{−1} \sum_l (G_2)_{ll} = T_2(z) = T_1(z). Inspecting the system of equations (4.14) and (4.15), we notice that we can reduce G_1 to T_1(z) also from (4.14) by averaging over i = j. By contrast, the elements of G_2 are multiplied by the Hermitian Toeplitz matrix Φ_N. We obtain the system (4.16), where in the second line the error estimate holds componentwise. We can use the second equation to solve for G_2, writing the error terms as e_1 (a number) and E_2 (a matrix). This leads to Proposition 4.8, in which the error ẽ_1 satisfies the bound (4.20). The proof uses spectral bounds on the finite Toeplitz matrix Φ_N. These are inherited from the infinite Toeplitz matrix Φ, as summarized in the following lemma.
Lemma 4.9 (Spectral estimates for the finite banded Toeplitz matrix). There exist N-independent constants c, C > 0 such that c ≤ Φ_N ≤ C as quadratic forms.
Proof of Lemma 4.9. Recall that g_f is the Fourier symbol of the infinite Toeplitz matrix Φ, so spec Φ lies in the range of g_f. It is a well-known fact that a finite restriction of an infinite Toeplitz matrix has spectrum lying inside the spectrum determined by the Fourier symbol of the infinite matrix [29, Lemma 6]. In the present case, this implies the desired bounds for the auxiliary (non-banded) finite Toeplitz matrix defined in (4.22).
Recall that the matrix Φ_N defined by (4.4) is cut off at the band edges |i − j| = (log N)^6. Using the definition (2.7) of φ, the fact that f ∈ C^2(T), and the mean-value theorem, we see that the entries discarded by the cutoff are negligibly small in norm, and so Lemma 4.9 follows from Weyl's norm perturbation theorem.
Proof of Proposition 4.8. We abbreviate T_1 ≡ T_1(z). Solving the second identity for G_2 yields G_2 = −(I + E_2)(Φ_N T_1 + z)^{−1}, and then the first identity gives (4.23). After some algebraic manipulation, we derive (4.24). Our goal is now to control the second error term N^{−1} Tr(E_2 (Φ_N T_1 + z)^{−1} Φ_N). By the cyclic property of the trace, we can bring the banded factors together. Now, since Φ_N is a band matrix of band width 2(log N)^6 with bounded entries, we see that ‖Φ_N E_2‖_∞ ≤ C (log N)^6 ‖E_2‖_∞. To estimate the entries of (Φ_N T_1 + z)^{−1}, we use the fact that we are considering the inverse of a banded matrix. We apply the following result on the decay of the entries of inverses of banded matrices from [15].
Lemma 4.10 (Inversion of band matrices [15]). Let A be an invertible infinite or finite matrix which is W-banded in the sense that A(i, j) = 0 if |i − j| ≥ W. Then there exist a constant C > 0 and a rate α ∈ (0, 1), both depending only on the condition number κ(A), such that |A^{−1}(i, j)| ≤ C α^{|i−j|/W}.
We observe that κ(T_1 Φ_N + z), as defined in Lemma 4.10, is uniformly bounded independently of N. Indeed, by Lemma 4.9, the spectrum of T_1 Φ_N + z is bounded away from zero and infinity. The band width that we have to consider is W = (log N)^6, which determines the decay rate α from Lemma 4.10. We can then control the individual entries of the matrix (T_1 Φ_N + z)^{−1}. The bound on individual entries implies a bound on the normalized trace. This proves Proposition 4.8.
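The decay phenomenon behind Lemma 4.10 is easy to observe numerically (on a toy matrix, not (T_1 Φ_N + z) itself): for a well-conditioned banded matrix, the entries of the inverse fall off geometrically away from the diagonal.

```python
import numpy as np

# Toy illustration of Lemma 4.10: the inverse of a well-conditioned banded
# matrix has exponentially decaying entries away from the band.
n = 100
A = 4 * np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
Ainv = np.linalg.inv(A)
# successive decay rates of the first row of the inverse
ratios = np.abs(Ainv[0, 1:20]) / np.abs(Ainv[0, 0:19])
```

For this diagonally dominant tridiagonal example the geometric rate is roughly 0.27 per step, so entries thirty positions off the diagonal are already far below machine precision relative to the diagonal.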

Derivation of the matrix self-consistent equation
In this section, we prove Lemma 4.3. The following derivation is modeled after [15]. While we simplify some aspects, it remains rather technical. Hence, we invite the reader to instead consider Appendix B for a heuristic derivation of (4.3) based on Gaussian integration by parts and the loop equation.
As an immediate corollary of the expansion identity (5.1), we can estimate the size of the Green's function after row removal in terms of the original Green's function (more precisely, in terms of the stochastic control parameters from Definition 4.2). This is used as a deterministic a priori estimate later.
Corollary 5.2. Let i be an integer in {1, ..., 2N}. Then the entries of the Green's function after removing the index i are bounded by 4(log N)^6 Γ^2 γ.
Proof. Consider the identity (5.1) with I = {k}, J = {l} and T = I. We obtain (5.4). We can treat the second term as an inner product and bound it by regarding G_{k,I} and G_{I,j} as 4(log N)^6-dimensional vectors; the product of their norms is at most 4(log N)^6 Γ^2 γ. The first term, |G_{kl}|, can be bounded by Γ, which is less than 4(log N)^6 Γ^2 γ since Γ, γ ≥ 1 by definition.

Concentration Identities.
A key ingredient in the proof of [15, Lemma 3.10] are concentration bounds for variables with a finite range of dependence. Here we extend these estimates to the case of a logarithmic range of dependence by grouping arguments similar to those in [15].
Lemma 5.3. Let a_1, ..., a_N be a family of dependent random variables with the property that a_i and a_j are independent whenever |i − j| ≥ (log N)^6. We fix a parameter σ > 0 and a power p ≥ p_0(σ), and assume the moment estimate (5.5). Then, there exists a universal constant C > 0 such that for any collection of deterministic coefficients A_i and B_ij with 1 ≤ i, j ≤ N, the concentration bounds (5.6) and (5.7) hold with probability 1 − N^{−σ'p} for some σ' > 0 depending on σ.
Proof. All sums are implicitly over {1, ..., N}. By dividing the integers modulo (log N)^6, we see that each sum (5.8), for 0 ≤ k ≤ (log N)^6 − 1, is a sum of terms involving only independent a_i. We can then appeal to standard concentration estimates, see [27, Theorem 7.7], to control each such sum with the stated probability. A union bound over all the residue-class sums gives the desired estimate, and we can absorb the (log N)^6 factors into the N^{σ'}-type factors in the bound (5.6).
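The grouping step can be sketched as follows (toy windowed variables; w plays the role of (log N)^6): splitting the index set into residue classes mod w leaves sums of fully independent terms, to which scalar concentration applies.

```python
import numpy as np

# Sketch of the grouping argument: a_i depends on an iid window of width w,
# so a_i and a_j are independent once |i - j| >= w.  Splitting the sum by
# residue classes mod w produces w sums of independent terms.
rng = np.random.default_rng(4)
n, w = 10_000, 5
x = rng.standard_normal(n + w)
a = np.array([x[i : i + w].sum() for i in range(n)])   # mean zero, w-dependent

class_sums = [a[k::w].sum() for k in range(w)]         # independent terms inside
total = sum(class_sums)
```

Each of the w class sums has about n/w independent mean-zero terms, so the full sum fluctuates on the scale w√n rather than growing linearly in n; this is the mechanism that tolerates the extra logarithmic factors.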
We come to the quadratic summation, where we have to do some more manipulation. First, we split as in (5.9).

Existence and uniqueness of solutions.
We emphasize that M_N(z) denotes a scalar quantity. The proof uses Brouwer's fixed point theorem.
Proof. Fix z ∈ C^+. We define the function F(w) = −(1/w) m_{Φ_N}(−z/w) that appears on the left-hand side of (6.1). We can rewrite F(w) as an integral against dρ_{Φ_N}, the empirical spectral distribution of Φ_N. We consider the compact, convex domain D_z, defined with an appropriate constant C > 0 to be determined. Below we show that F : D_z → D_z. Then Brouwer's fixed point theorem implies that F has a fixed point. This fixed point is the desired solution M_N(z) ∈ D_z, which we note has positive imaginary part. It remains to establish the mapping property. Next, we consider the behavior of the imaginary part; from (4.21), we conclude that dρ_{Φ_N} has uniformly bounded support, which yields the required estimates. Since Brouwer's fixed point theorem does not imply uniqueness, we have to prove it by hand using the structure of the self-consistent equation.

Lemma 6.2 (Uniqueness). Let z ∈ C^+. Then there is a unique solution M_N(z) to (6.1) with positive imaginary part.
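The fixed-point structure behind this existence argument can be seen in a minimal numerical sketch. Since (6.1) and Φ_N are not reproduced here, we use the semicircle self-consistent equation m = 1/(−z − m) as a stand-in; the upper half plane is invariant under the iteration, mirroring the role of the domain D_z.

```python
import numpy as np

def solve_self_consistent(z, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for the toy self-consistent equation
    m = 1/(-z - m) (the semicircle case), a stand-in for (6.1).
    If Im(m) > 0 then Im(-z - m) < 0, so the iterate stays in C^+."""
    m = 1j  # start inside the upper half plane
    for _ in range(max_iter):
        m_new = 1.0 / (-z - m)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

z = 0.3 + 0.5j
m = solve_self_consistent(z)
assert m.imag > 0                       # solution stays in C^+
assert abs(m - 1.0 / (-z - m)) < 1e-10  # it solves the equation
```

The iteration converges because the map is a contraction near the fixed point with positive imaginary part; the other root of the quadratic lies in the lower half plane and is never reached from C^+.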
Proof. Fix z ∈ C^+. We first show that any solution M_N(z) of (6.1) with positive imaginary part must satisfy (6.6). Indeed, taking imaginary parts of both sides of (6.1) in the representation (6.4), we obtain an identity from which (6.6) follows by rearranging.
6.3. Stability estimates. The estimates used in the proof of the last lemma can be refined to show stability of solutions to (6.1). Recall that M_N(z) is the unique solution with positive imaginary part to (6.10).
Lemma 6.3 (Stability). Let Φ ∈ {Φ_N, Φ} and let T_0 solve (6.11). Moreover, let T be a solution to the approximate self-consistent equation (6.12) for some error E ∈ C. Fix a parameter ω > 0 and assume we are at a point z such that Im[T_0(z)] ≥ ω. Then, there exist constants ε_ω > 0 and C_ω depending only on ω such that if |E| ≤ ε_ω and |T − T_0| ≤ ε_ω, then |T − T_0| ≤ C_ω |E|.

Remark 6.4. The condition Im[T_0] ≥ ω means that we consider values of z whose real part corresponds to the bulk of the spectrum.

Proof. Following the manipulations of (6.8) in the previous Lemma 6.2, we write Tx + z = T_0 x + z + (T − T_0)x and expand the denominator. By Lemma 4.9, the error from this expansion is controlled, where C is an upper bound on the support of dρ_Φ. Writing K(z) for this difference, we have shown that, for sufficiently small ε_ω, (6.16) holds. We claim that there exists c_ω such that |1 − ∫ x ρ_Φ(x) dx / (T_0 x + z)^2| > c_ω. The claim then follows by making ε_ω sufficiently small, which we point out also makes |K(z)| small.
Recall that, from taking imaginary parts of the self-consistent equation, we have the relation (6.17). Using the fact that T_0 has imaginary part strictly greater than ω, we can deduce that there is a genuine gap between 1 and the relevant integral. Notice that on the support of dρ_Φ we know that Im[T_0 x + z] ≥ cω and |T_0 x + z| ≤ C(z) for some constant C(z) > 0. Writing T_0 x + z = |T_0 x + z| e^{iθ(x,z)} in polar coordinates, we find that sin θ(x, z) ≥ cω/C(z). In particular, there is some θ_ω such that θ_ω ≤ θ(x, z) ≤ π − θ_ω. Integrating over x, and using that ∫ x ρ_Φ(x) dx / |T_0 x + z|^2 < 1 and that the phases satisfy 2θ_ω ≤ 2θ(x, z) ≤ 2π − 2θ_ω, this relation implies that there is a strictly positive gap between 1 and ∫ x ρ_Φ(x) dx / (T_0 x + z)^2 that depends only on ω. Thus, for sufficiently small ε_ω, the claimed constant c_ω exists. Considering (6.16), this implies that |T − T_0| ≤ C_ω |E|.
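The stability mechanism, that an O(|E|) error in the self-consistent equation moves the solution by only O(|E|), can be checked numerically in the same semicircle toy model used as a stand-in above (the paper's equation (6.12) is not reproduced here, and the function iterate is our own).

```python
import numpy as np

def iterate(z, E, n=5000):
    """Solve the perturbed toy equation T = 1/(-z - T) + E by
    fixed-point iteration started in the upper half plane."""
    T = 1j
    for _ in range(n):
        T = 1.0 / (-z - T) + E
    return T

z = 0.3 + 0.5j
T0 = iterate(z, 0.0)          # unperturbed solution
for E in (1e-3, 1e-4, 1e-5):
    T = iterate(z, E)
    # Stability: the deviation of the solution is linear in the error E.
    assert abs(T - T0) <= 10 * abs(E)
```

The constant 10 here plays the role of C_ω: since the fixed-point map is a contraction with factor bounded away from 1 in the bulk, the deviation is |E|/(1 − contraction factor), uniformly for small E.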

6.4. Comparison to the limiting self-consistent equation. Recall that m_∞(z) is the unique solution of the limiting self-consistent equation (2.13). Here we prove that it is close to M_N(z), the solution of the auxiliary self-consistent equation (6.1) that was analyzed in the preceding subsections. We combine our discussion in the following lemma. The proof idea is that, while the N × N matrix Φ_N does not have an explicitly computable eigenvalue distribution, a logarithmic-rank perturbation of it does. Since Φ_N is a band matrix with band width (log N)^6, we can extend the band by wrapping it around the other side of the matrix, as if it were a torus (such matrices are called circulant matrices), and then compute the spectrum explicitly.
Proof. We define the circulant matrix Φ̃_N. Note that Φ̃_N is a Hermitian matrix and a rank-2(log N)^6 perturbation of Φ_N. By taking the finite Fourier series, we see that Φ̃_N has eigenvalues ẽ_k given explicitly by a finite Fourier sum, where k takes integer values between 0 and N − 1. We order these eigenvalues accordingly.

In this subsection, we show that we can recover the Green's function block G_2 from solutions m_∞(z) of the limiting scalar self-consistent equation (2.13). The exact same method gives bounds on the entries of G_3, G_4, and the off-diagonal entries of G_1.

Lemma 6.6. Let (T_1, G_2) be a solution to the system of equations (4.18), and assume in addition the stated convergence as N → ∞. Recall the other matrices G_1, G_3 and G_4 from (4.5). In this proof we let T_0(z) ≡ m_∞(z), the solution to the limiting self-consistent equation (2.13). Then (6.33) holds.

Proof. We drop the argument z from T_1(z) and T_0(z) when the context is clear. It suffices to prove the result for G_2; all other blocks are similar. It follows from (4.18) that (6.34) holds. Recalling T_0 ≡ m_∞(z), we expand the first resolvent using the resolvent identity. We bound the entries of (Φ_N T_0 + z)^{−1} and (Φ_N T_1 + z)^{−1} by Lemma 4.10, since they are inverses of band matrices. The function κ is bounded as in the proof of Lemma 4.10. Hence, for every 1 ≤ i, j ≤ N we obtain (6.36). Here, α_0 and α_1 are the corresponding constants from the application of Lemma 4.10 to the matrices (Φ_N T_0 + z) and (Φ_N T_1 + z), respectively. The assumption Im[T_0] ≥ ω shows that Φ_N T_0 + z is bounded above and below; thus α_0 = 1 + O((log N)^{−6}), and summing the resulting geometric series gives (1 − α_0)^{−1} = O((log N)^6).
In addition, we have already assumed a matching bound on T_1, so the same upper and lower bounds hold for Φ_N T_1 + z, and we may assert that α_1 = 1 + O((log N)^{−6}). This yields the estimate (6.37).
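The circulant trick used in the proof above, wrapping a band around the torus so the spectrum becomes a finite Fourier series, is easy to verify numerically. This is a toy sketch: the band width b below stands in for (log N)^6, and the matrix is generic rather than the actual Φ̃_N.

```python
import numpy as np

# A Hermitian circulant matrix built from a first row c with c[N-k] = conj(c[k]).
rng = np.random.default_rng(1)
N, b = 64, 4                       # b plays the role of the band width (log N)^6
c = np.zeros(N, dtype=complex)
c[0] = rng.standard_normal()
for k in range(1, b + 1):
    c[k] = rng.standard_normal() + 1j * rng.standard_normal()
    c[N - k] = np.conj(c[k])       # Hermitian symmetry of the symbol

Phi = np.array([[c[(j - i) % N] for j in range(N)] for i in range(N)])
assert np.allclose(Phi, Phi.conj().T)   # the circulant wrap-around is Hermitian

# Finite Fourier series: the eigenvalues are the DFT of the first row.
e_fourier = np.fft.fft(c).real          # real, since Phi is Hermitian
e_direct = np.linalg.eigvalsh(Phi)
assert np.allclose(np.sort(e_fourier), np.sort(e_direct))
```

The eigenvectors are the discrete Fourier modes regardless of the band contents, which is why a circulant completion of a band matrix has an explicitly computable spectrum while the original band matrix does not.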

Establishing the Local Law
Once we have derived the error estimates for our self-consistent equation along with the stability estimates, we can prove a local law via a standard continuity approach.

7.1. The Global Law. The first goal is to establish the local law at large scales, i.e., a global law. We will establish the following theorem.

Theorem 7.1. Let M be the exact solution to the matrix self-consistent equation (4.7), let m_∞(z) be the exact solution to the limiting self-consistent equation (2.13), and let G be the Green's function of the matrix Y as in Section 4. Let D be a compact subset of C^+. Then, for N_{D,ν,p} sufficiently large depending on D, ν and p, the bound (7.1) holds for all N ≥ N_{D,ν,p}.

Proof. From Lemma 4.3 we know that the matrix G satisfies the self-consistent equation up to an error of order N^{2σ} Γ^5 γ^3 / √(Nη). From the results of Lemmas 6.3, 6.6, and 6.5, we can derive the desired result (7.1) as long as we know that Γ, γ ≲ 1 for z ∈ D. (Note that since G and M are Lipschitz, we can derive the high-probability bounds on a discrete grid of polynomial size and extend them to the entire set D without significant loss in probability.) We have the deterministic bound Γ ≤ 1/η, which will be bounded by 1 in our compact region D. It suffices to establish bounds on γ. By the Schur complement formula, we obtain (7.2). We can now estimate the operator norm of the right-hand side. First, observe that we can bound the operator norm of H_{I,I} by its Frobenius norm. Finally, we can estimate (7.4). Combining our previous estimates by the triangle inequality shows that γ ≤ O(N^σ) in D.
This completes the proof.

7.2. Proving the Local Law. As we have seen in the previous proof, it was necessary to obtain bounds on γ and Γ. We establish the local analogue first.

Lemma 7.2. Assume the a priori bound (7.5) for some parameter ε > 0. Then the following bounds hold.

Proof. The explicit form of the solution of the self-consistent equation ensures that M_∞ = O(1). Clearly, the condition (7.5) then implies that Γ = O(1) immediately. We need to do more work to establish the same result for γ.
As is usual, we let J and I be non-intersecting subsets of [i − 2(log N)^6, . . ., i + 2(log N)^6] for some integer i.

The Comparison to the Gaussian
A very powerful tool in proving universality of various random matrix models is the study of Dyson Brownian motion. This study was pioneered in a series of papers by Erdős, Schlein, Yau, and collaborators [19,20,25], culminating in an optimal-time proof of universality in [26]. Dyson Brownian motion has since been used to great effect in many papers, such as [21,22,23,28,35]. In this section, we apply Dyson Brownian motion to prove universality for the ensemble H. Under our convention, B_{ab} = B_{N+a,N+b} = 0 for 1 ≤ a, b ≤ N.
We consider the matrix evolution on H given by the following dynamics, with H(0) = H being our initial matrix and H(t) the result of running the Brownian motion for time t.
By our choice of Brownian motion, H(t) has the same covariance and independence structure as the matrix H(0). Thus, we can show that a local law holds for H(t) without much difficulty.
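A minimal numerical sketch of one step of such a matrix Brownian motion is given below. It is a simplification: the increment here has independent entries rather than the correlation structure ξ of the paper, and the helper hermitian_block is our own; the point is only that the block structure (zero diagonal blocks) and Hermiticity, hence the chiral symmetry of the spectrum, are preserved by the dynamics.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50

def hermitian_block(rng, N):
    """A 2N x 2N Hermitian matrix [[0, Y], [Y*, 0]] with zero diagonal blocks."""
    Y = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)
    H = np.zeros((2 * N, 2 * N), dtype=complex)
    H[:N, N:] = Y
    H[N:, :N] = Y.conj().T
    return H

# One Euler step of the matrix Brownian motion: H(t) = H(0) + sqrt(t) * B,
# where B is an independent increment with the same block structure.
H0 = hermitian_block(rng, N)
t = 0.01
Ht = H0 + np.sqrt(t) * hermitian_block(rng, N)

assert np.allclose(Ht, Ht.conj().T)                  # Hermiticity is preserved
assert np.allclose(Ht[:N, :N], 0)                    # diagonal blocks stay zero
evals = np.linalg.eigvalsh(Ht)
assert np.allclose(np.sort(evals), np.sort(-evals))  # chiral symmetry of the spectrum
```

Because the increment shares the structure of H(0), every distributional symmetry of the initial ensemble survives along the flow, which is what makes the local law for H(t) essentially free.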
The following integration by parts lemma will be useful in understanding the time evolution of functions of the matrix H(t).

Lemma 8.1. Let (x_1, . . ., x_N) be an array of correlated random variables with mean 0, each correlated with at most J others (where J is allowed to be a function of N). Assume further that E[|x_k|^3] is bounded uniformly in k. Pick some index i ∈ {1, . . ., N} and let T be the set of indices that are correlated with i. Then, the following relation holds.

Proof. The proof is an exercise in Taylor expansion.
Let x^{(T)} be the tuple (x_1 1(1 ∈ T), . . ., x_N 1(N ∈ T)). Expanding f in the variables indexed by T, we can derive the expression (8.4), which we can bound as in (8.5). We now let U_j be the set of indices that are correlated with j. We can again apply the Taylor expansion to compute the expectation of (8.6). The second term on the right-hand side can be estimated as before.
We see that (8.7) holds. We can now reverse the application of the Taylor expansion and write (8.8). Substituting this expression into E[∂_j f(x^{(T∪U_j)})] E[x_j x_i] gives the expression E[∂_j f(x)] E[x_j x_i] plus an error term which can be bounded in the same way as before. This completes the proof.
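For exactly Gaussian variables, the identity that Lemma 8.1 establishes approximately is the classical Stein / Gaussian integration-by-parts formula E[x_i f(x)] = Σ_j E[x_i x_j] E[∂_j f(x)], which can be checked by Monte Carlo. This is an illustration of the exact Gaussian case, not the lemma's finite-moment setting; the covariance C below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)

# Covariance of a correlated zero-mean Gaussian pair (x1, x2).
C = np.array([[1.0, 0.4],
              [0.4, 1.0]])
L = np.linalg.cholesky(C)
x = rng.standard_normal((200_000, 2)) @ L.T

# Stein / integration-by-parts identity: E[x1 f(x)] = sum_j C[0, j] E[df/dx_j].
# Take f(x) = x2**3, so df/dx1 = 0 and df/dx2 = 3 * x2**2.
lhs = np.mean(x[:, 0] * x[:, 1] ** 3)
rhs = C[0, 1] * np.mean(3 * x[:, 1] ** 2)
exact = 3 * C[0, 1] * C[1, 1]       # Wick's theorem: 3 * C12 * C22

assert abs(lhs - exact) < 0.05
assert abs(rhs - exact) < 0.05
```

Lemma 8.1 recovers this identity up to third-moment error terms, which is why only a uniform bound on E[|x_k|^3] is needed rather than Gaussianity.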
We will apply the previous lemma when we compute the time evolution of functions of H(t).
Lemma 8.2. Let f : C^{2N×2N} → C be a C^3 function. Then, the following relation holds.

Proof. We start by applying Itô's lemma. To evaluate the first term on the right-hand side of the resulting equation, we may apply Lemma 8.1 and derive (8.11). Here, we used the covariance structure of the Brownian motion.

In what follows, it will be useful to state exactly which derivatives we need to control in the expression ‖D^3 f‖_∞, rather than applying a supremum bound. Given a pair (i, j), we define the set T_{i,j} as follows: T_{i,j} = {(i′, j′) : |i − i′| ≤ 4(log N)^{12} or |j − j′| < 4(log N)^{12}}.
Essentially, if one lets T_{i,j} be the set of indices of entries that could be correlated with H_{ij}(t), then T̃_{i,j} is the set of indices of entries that could be correlated with entries whose indices lie in T_{i,j}.
When we apply the Taylor expansion, we are led to consider expressions of the following form.
Here, we use the notation from Lemma 8.1, letting H(t)^{T_{i,j}} represent the matrix H(t) with the corresponding entries set to 0.
In the proof of the previous lemma, θ is a constant between 0 and 1.
With the above lemma in hand, we can now establish a Green's function comparison theorem.
Lemma 8.3. Recall the setting of Lemma 3.3; namely, let n ≥ 1, ε > 0 and let E_1, . . ., E_n satisfy ρ_∞(E_i) ≥ ε, where ρ_∞ is the density associated with m_∞. Given σ_1, . . ., σ_n ≤ σ, we set the corresponding spectral parameters. Consider the matrix dynamics H(t) with H(0) drawn from our initial matrix distribution as in (3.3). We let G_t be the Green's function of H(t) − z with normalized trace m_t, and G_0 the Green's function of H(0) − z with normalized trace m_0. Then, there exists C_σ > 0 depending only on σ such that the comparison bound below holds.

Proof. We prove the comparison for n = 1; the general statement follows from similar considerations. We apply the previous Lemma 8.2. It suffices to derive a bound on the third derivatives (8.13) of Tr G_{θ,s}(z).
Here, (c, d) and (e, f) are indices in T_{a,b}, and G_{θ,s}(z) is the Green's function of the matrix H(s) − z + θ(H^{T_{a,b}}(s) − H(s)), where θ is a constant between 0 and 1. First fix a time s and set θ = 0; we establish a bound in this case before discussing the general one.
By direct computation, one can see that the third derivatives are controlled by powers of Γ(z), where, recall, Γ(z) is a uniform upper bound for the entries of G_s(z) and ∂^α denotes any third-order partial derivative. Again by direct computation, one can see that the change of Γ as the imaginary part η of z varies satisfies the useful inequality ∂Γ/∂η ≤ Γ/η,
whenever N^{−1−ε} ≤ η ≤ N^{−1+ε}. Now, when E is in the bulk of the distribution, we can apply our local law to ensure that Γ(E + iN^{−1+ε}) = O(N^{ε}). Thus, with probability 1 − N^{−D} for some large D, we can ensure that |∂^α (1/2N) Tr[G_s(z)]| ≤ CN^{12ε}, with the trivial bound N^{8} otherwise. By Lipschitz continuity, one can establish these results on a discrete grid of times and extend them to the entire interval [0, t]. In addition, one can show that matrices of the form H(t)^{T_{i,j}} + θ(H(t) − H(t)^{T_{i,j}}) satisfy a similar local law. Applying the local law results on a discrete grid of θ's, and noting that there are no more than N^2 choices of the special modifications T_{i,j}, allows us to obtain a uniform probability bound over all choices of θ and (i, j). This yields the bounds on the derivatives needed to apply Lemma 8.2 and completes the proof of the theorem.
The results of the above Green's function comparison theorem can be used to prove a comparison of correlation functions, as we have seen earlier in the proof of Lemma 3.4.

Comparing to the GUE. At this point, we have established that the statistics of H(0) match those of H(t). We will be finished once we show that the statistics of H(t) match those of the GUE.
However, recall from our interpolation that H(t) has a correlated Gaussian component C with correlation structure E[C_{ab} C_{cd}] = ξ_{abcd}. Because our covariance matrix ξ dominates a small multiple c_0 of the GUE covariance (so that the difference is positive semidefinite), we can split the matrix C as C = C̃ + √c G, where C̃ and G are independent Gaussians and G is a GUE matrix. Thus, the matrix H(t) can be represented in the form H̃ + √c · GUE with c ≥ N^{−1+ε} and H̃ independent of the GUE component. Theorem 2.2 of [35] then proves that the matrix H(t) has universal spectral statistics. Theorem 8.4 shows that H(0) has the same spectral statistics as H(t). This proves Theorem 3.2. Finally, using Lemma 3.4, this further proves Theorem 2.9.
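The splitting of a correlated Gaussian component into an independent remainder plus a GUE piece can be sketched in a simplified entrywise form. This is a toy version, not the paper's covariance ξ: we use a variance profile v with all entries at least c, so that the entrywise variances split as v = (v − c) + c; the names v, gue, and C_matrix are our own.

```python
import numpy as np

rng = np.random.default_rng(4)
N, c = 40, 0.3

def gue(rng, N):
    """A Hermitian Gaussian matrix (GUE-type, unnormalized)."""
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (A + A.conj().T) / 2

# Toy splitting: entrywise variances v_ab >= c, so a Gaussian matrix with
# variance profile v is the sum of an independent remainder (profile v - c)
# and sqrt(c) times a GUE matrix.
v = 0.5 + 0.5 * rng.random((N, N))
v = (v + v.T) / 2                     # symmetric profile keeps the sum Hermitian
assert v.min() >= c                   # the GUE piece can be split off

G = gue(rng, N)                       # GUE component
R = np.sqrt(v - c) * gue(rng, N)      # independent remainder
C_matrix = R + np.sqrt(c) * G

assert np.allclose(C_matrix, C_matrix.conj().T)   # the sum is Hermitian
```

Since sums of independent Gaussians add their covariances, C_matrix has the original profile v while containing an independent √c·GUE summand, which is exactly the structure needed to invoke a result like [35, Theorem 2.2].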
Appendix A. On the limiting objects m_∞(z) and ρ_∞(z)

In this appendix, we prove Propositions 2.4 and 2.6.
Here, we wrote (GỸ)_{ij} = Σ_m G_{im} Ỹ_{mj}. Observe now that ∂_{Ỹ_{kl}} G_{im} = G_{ik} G_{lm}. Integrating by parts, using the fact that E[Ỹ_{kl} Ỹ_{mj}] = ξ_{kljm}, shows that there is a prefactor of ξ_{kljm} G_{ik} G_{lm} associated with this quantity. In the last step, we can remove the expectation, anticipating that these quantities concentrate.

Figure 1. The empirical spectral distribution of matrices given by (2.1) with f(x) = exp(2πix). The histogram depicts the emergence of the Wigner semicircle law on the global scale. Theorem 2.5 also establishes that this convergence continues to hold down to small scales.
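A hedged reconstruction of the type of experiment behind Figure 1 is sketched below. Equation (2.1) is not reproduced in this excerpt, so the exact array layout is an assumption: we evaluate f(x) = exp(2πix) along an orbit of the doubling map Tx = 2x mod 1, fill an N × N array row by row, and form the 2N × 2N Hermitian block matrix with zero diagonal blocks used in Section 4. Since the doubling map shifts binary digits, we represent x by a long random bit string to avoid floating-point degeneration of the orbit.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100
digits = 64
# A random point x is encoded by its binary digits; T^k x = 0.b_{k+1} b_{k+2} ...
bits = rng.integers(0, 2, size=N * N + digits)
powers = 2.0 ** -np.arange(1, digits + 1)
orbit = np.array([bits[k:k + digits] @ powers for k in range(N * N)])

# Fill the array with f along the orbit, f(x) = exp(2*pi*i*x).
Y = np.exp(2j * np.pi * orbit).reshape(N, N) / np.sqrt(N)

# The 2N x 2N Hermitian block matrix with zero diagonal blocks, as in Section 4.
H = np.block([[np.zeros((N, N)), Y], [Y.conj().T, np.zeros((N, N))]])
evals = np.linalg.eigvalsh(H)

assert np.allclose(H, H.conj().T)
assert np.allclose(np.sort(evals), np.sort(-evals))  # chiral symmetry
# A histogram of evals (e.g. np.histogram(evals, bins=50)) gives the kind of
# global-scale picture shown in Figure 1.
```

All randomness enters only through the starting point x (its digit string), in keeping with the dynamical-systems setup described in the introduction.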

It remains to prove Theorems 3.1 and 3.2. The main work is to establish Theorem 3.1; the rest is handled by the well-developed Dyson Brownian motion machinery. From now on, we only consider the resampled matrices Y, and we denote H ≡ H_Y, m ≡ m_Y, and G ≡ G_Y.

Derivation of the scalar self-consistent equation

This section is devoted to studying the self-consistent equation satisfied by the Stieltjes transform (4.1)

5.1. Basic removal operations and bounds. The following lemma can be seen as a generalization of the Schur complement formula. The proof is via the resolvent equation and can be found in [15, Lemma 3.3].

Lemma 5.1. Let T, I, J be three subsets of {1, . . ., 2N} satisfying T ∩ I = T ∩ J = ∅. Then, we have the following resolvent identities.
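The identities of [15, Lemma 3.3] are not reproduced here, but a representative special case, removal of a single index, can be checked numerically. The single-index identity G^{(k)}_{ij} = G_{ij} − G_{ik} G_{kj} / G_{kk} is a standard consequence of the Schur complement formula; the matrix below is a generic Hermitian example, not the paper's H.

```python
import numpy as np

rng = np.random.default_rng(6)
n, z = 30, 0.2 + 0.7j

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (A + A.conj().T) / (2 * np.sqrt(n))
G = np.linalg.inv(H - z * np.eye(n))

# Remove index k: the Green's function of the minor H^(k).
k = 7
keep = [i for i in range(n) if i != k]
Hk = H[np.ix_(keep, keep)]
Gk = np.linalg.inv(Hk - z * np.eye(n - 1))

# Single-index removal identity: G^(k)_{ij} = G_{ij} - G_{ik} G_{kj} / G_{kk}.
for a, i in enumerate(keep):
    for b, j in enumerate(keep):
        assert np.isclose(Gk[a, b], G[i, j] - G[i, k] * G[k, j] / G[k, k])
```

Iterating this single-index identity is what produces the general removal formulas for sets T, I, J, with the stochastic control parameters tracking how the entries grow under repeated removal.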

The relevant quantity is a sum of terms |H_{ab}|^2, which can be bounded by O(N^σ/√(Nη)). This only uses the fact that |H_{ab}| = O(1)/√N and that there are at most O((log N)^{12}) terms in the sum.

8.1. Local Law Estimates under Interpolation. We consider the evolution of the Green's function under the modified Ornstein–Uhlenbeck (OU) process given as follows. Recall H = [Y_{ij}], our 2N × 2N Hermitian block matrix with the N × N diagonal blocks set to 0, and let dB_{ij} be a matrix-valued Brownian motion with correlation structure given by (8.1): Cov[B_{ab}(t), B_{cd}(t)] = t ξ_{abcd}.

Theorem 8.4. Fix a time t = N^{−1+ε} with ε > 0. Consider the matrix dynamics H(t) with H(0) drawn from our initial matrix distribution as in (3.3). Let p_N^{(k),t} be the correlation functions of H(t). Let ρ be the density corresponding to the limiting spectral distribution of H(0), and let E be a point in the support of ρ. Then, for any compactly supported continuous test function O : R^k → R, we have the following comparison estimate.

It remains to understand how the matrix (−T_1 Φ_N − z)^{−1} affects the error estimate. This is relegated to the next subsection; see Proposition 4.8. Together with (4.12), this proves Proposition 4.5.