Edge universality for non-Hermitian random matrices

We consider large non-Hermitian real or complex random matrices $X$ with independent, identically distributed centred entries. We prove that their local eigenvalue statistics near the spectral edge, the unit circle, coincide with those of the Ginibre ensemble, i.e. when the matrix elements of $X$ are Gaussian. This result is the non-Hermitian counterpart of the universality of the Tracy–Widom distribution at the spectral edges of the Wigner ensemble.

Much less is known about the spectral universality for non-Hermitian models. In the simplest case of the Ginibre ensemble, i.e. random matrices with i.i.d. standard Gaussian entries without any symmetry condition, explicit formulas for all correlation functions have been computed first for the complex case [31] and later for the more complicated real case [10,36,49] (with special cases solved earlier [20,21,43]). Beyond the explicitly computable Ginibre case only the method of four moment matching by Tao and Vu has been available. Their main universality result in [54] states that the local correlation functions of the eigenvalues of a random matrix X with i.i.d. matrix elements coincide with those of the Ginibre ensemble as long as the first four moments of the common distribution of the entries of X (almost) match the first four moments of the standard Gaussian. This result holds for both real and complex cases as well as throughout the spectrum, including the edge regime.
In the current paper we prove the edge universality for any $n \times n$ random matrix $X$ with centred i.i.d. entries; in particular, we remove the four moment matching condition from [54]. More precisely, under the normalization $\mathbf{E}|x_{ab}|^2 = \frac{1}{n}$, the spectrum of $X$ converges to the unit disc with a uniform spectral density according to the circular law [6–8,30,32,51]. The typical distance between nearest eigenvalues is of order $n^{-1/2}$. We pick a reference point $z$ on the boundary of the limiting spectrum, $|z| = 1$, and rescale correlation functions by a factor of $n^{-1/2}$ to detect the correlation of individual eigenvalues. We show that these rescaled correlation functions converge to those of the Ginibre ensemble as $n \to \infty$. This result is the non-Hermitian analogue of the Tracy-Widom edge universality in the Hermitian case. A similar result is expected to hold in the bulk regime, i.e. for any reference point $|z| < 1$, but our method is currently restricted to the edge.
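The normalization and the two scales mentioned above can be illustrated numerically. The following is a minimal sketch, not part of the proof, for the complex Ginibre ensemble; the matrix size, the seed and the window radius $2n^{-1/2}$ around the edge point $z_0 = 1$ are arbitrary choices made for illustration only.

```python
import numpy as np

def complex_ginibre(n, rng):
    """Sample an n x n complex Ginibre matrix normalised so that E|x_ab|^2 = 1/n."""
    real = rng.standard_normal((n, n))
    imag = rng.standard_normal((n, n))
    return (real + 1j * imag) / np.sqrt(2 * n)

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 400                         # illustrative size, not taken from the paper
eigenvalues = np.linalg.eigvals(complex_ginibre(n, rng))

# Circular law: the spectrum fills the unit disc uniformly, so roughly a
# quarter of the eigenvalues should lie in the disc of radius 1/2.
spectral_radius = np.abs(eigenvalues).max()
fraction_in_half_disc = np.mean(np.abs(eigenvalues) <= 0.5)

# Edge scaling: a window of radius of order n^{-1/2} around the edge point
# z0 = 1 should contain only a handful of eigenvalues.
z0 = 1.0
window_radius = 2.0 / np.sqrt(n)
count_near_edge = int(np.sum(np.abs(eigenvalues - z0) <= window_radius))

print(f"spectral radius             : {spectral_radius:.3f}")
print(f"fraction with |sigma| <= 1/2: {fraction_in_half_disc:.3f}")
print(f"eigenvalues in edge window  : {count_near_edge}")
```

Repeating the experiment for larger $n$ should leave the count in the edge window of order one, which is the scale on which the rescaled correlation functions are compared.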
Investigating spectral statistics of non-Hermitian random matrices is considerably more challenging than for Hermitian ones. We give two fundamental reasons for this: the first one is already present in the proof of the circular law on the global scale; the second one is specific to the most powerful existing method to prove universality of eigenvalue fluctuations.
The first issue is a general one: it is well known that non-Hermitian, especially non-normal, spectral analysis is difficult because, unlike in the Hermitian case, the resolvent $(X - z)^{-1}$ of a non-normal matrix is not effective to study eigenvalues near $z$. Indeed, $(X - z)^{-1}$ can be very large even if $z$ is away from the spectrum, a fact that is closely related to the instability of the non-Hermitian eigenvalues under perturbations. The only useful expression to grasp non-Hermitian eigenvalues is Girko's celebrated formula, see (14) later, expressing linear statistics of eigenvalues of $X$ in terms of the log-determinant of the symmetrized matrix
$$H^z := \begin{pmatrix} 0 & X - z \\ X^* - \bar{z} & 0 \end{pmatrix}. \tag{1}$$
Girko's formula is much more subtle and harder to analyse than the analogous expression for the Hermitian case involving the boundary value of the resolvent on the real line. In particular, it requires a good lower bound on the smallest singular value of $X - z$, a notorious difficulty behind the proof of the circular law. Furthermore, any conceivable universality proof would rely on a local version of the circular law as an a priori control. Local laws on optimal scale assert that the eigenvalue density on a scale $n^{-1/2+\epsilon}$ is deterministic with high probability, i.e. it is a law of large numbers type result and is not sufficiently refined to detect correlations of individual eigenvalues. The proof of the local circular law requires a careful analysis of $H^z$, which has an additional structural instability due to its block symmetry. A specific estimate, tailored to Girko's formula, on the trace of the resolvent of $(H^z)^2$ was the main ingredient behind the proof of the local circular law on optimal scale [14,16,59], see also [54] under a three moment matching condition. Very recently the optimal local circular law was even proven for ensembles with inhomogeneous variance profiles in the bulk [3] and at the edge [4]; the latter result also gives an optimal control on the spectral radius. An optimal local law for $H^z$ in the edge regime previously had not been available, even in the i.i.d. case.
The second major obstacle to proving universality of fluctuations of non-Hermitian eigenvalues is the lack of a good analogue of the Dyson Brownian motion. The essential ingredient behind the strongest universality results in the Hermitian case is the Dyson Brownian motion (DBM) [19], a system of coupled stochastic differential equations (SDE) that the eigenvalues of a natural stochastic flow of random matrices satisfy, see [27] for a pedagogical summary. The corresponding SDE in the non-Hermitian case involves not only eigenvalues but overlaps of eigenvectors as well, see e.g. [11, Appendix A]. Since the overlaps themselves are strongly correlated, and the proofs of this are highly nontrivial even in the Ginibre case [11,29], the analysis of this SDE is currently beyond reach.
Our proof of the edge universality circumvents DBM and it has two key ingredients. The first main input is an optimal local law for the resolvent of $H^z$ both in isotropic and averaged sense, see (13) later, that allows for a concise and transparent comparison of the joint distribution of several resolvents of $H^z$ with their Gaussian counterparts by following their evolution under the natural Ornstein-Uhlenbeck (OU) flow. We are able to control this flow for a long time, similarly to an earlier proof of the Tracy-Widom law at the spectral edge of a Hermitian ensemble [41]. Note that the density of eigenvalues of $H^z$ develops a cusp as $|z|$ passes through 1, the spectral radius of $X$. The optimal local law for very general Hermitian ensembles in the cusp regime has recently been proven [22], strengthening the non-optimal result in [2]. This optimality was essential in the proof of the universality of the Pearcey statistics for both the complex Hermitian [22] and real symmetric [17] matrices with a cusp in their density of states. The matrix $H^z$, however, does not satisfy the key flatness condition required in [22] due to its large zero blocks. A very delicate analysis of the underlying matrix Dyson equation was necessary to overcome the flatness condition and prove the optimal local law for $H^z$ in [3,4].
Our second key input is a lower tail estimate on the lowest singular value of $X - z$ when $|z| \approx 1$. A very mild regularity assumption on the distribution of the matrix elements of $X$, see (4) later, guarantees that there is no singular value below $n^{-100}$, say. Cruder bounds guarantee that there cannot be more than $n^{\epsilon}$ singular values below $n^{-3/4}$; note that this natural scaling reflects the cusp at zero in the density of states of $H^z$. Such information on the possible singular values in the regime $[n^{-100}, n^{-3/4}]$ is sufficient for the optimal local law since it is insensitive to $n^{\epsilon}$ eigenvalues, but for universality every eigenvalue must be accounted for. We therefore need a stronger lower tail bound on the lowest eigenvalue $\lambda_1$ of $(X - z)(X - z)^*$. With supersymmetric methods we recently proved [18] a precise bound of this type, see (2), modulo logarithmic corrections, for the Ginibre ensemble whenever $|z| = 1 + O(n^{-1/2})$. Most importantly, (2) controls $\lambda_1$ on the optimal $n^{-3/2}$ scale and thus excludes singular values in the intermediate regime $[n^{-100}, n^{-3/4-\epsilon}]$ that was inaccessible with other methods. We extend this control from the Ginibre ensemble to $X$ with i.i.d. entries with a Green function comparison argument, using again the optimal local law for $H^z$.
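The natural $n^{-3/4}$ scale of the smallest singular value of $X - z$ at the edge can be observed in a small simulation. The sketch below is only a numerical illustration for the complex Ginibre ensemble at $z = 1$; the sizes, the number of trials and the seed are arbitrary choices, and the order-one constant it prints is not a quantity taken from the paper.

```python
import numpy as np

def smallest_singular_value(n, z, rng):
    """Smallest singular value of X - z for a complex Ginibre matrix X."""
    X = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
    # np.linalg.svd returns singular values in descending order.
    return np.linalg.svd(X - z * np.eye(n), compute_uv=False)[-1]

rng = np.random.default_rng(3)  # fixed seed, chosen arbitrarily
trials = 40                     # illustrative number of samples per size
z_edge = 1.0                    # reference point on the unit circle

rescaled_median = {}
for n in (50, 200):
    samples = [smallest_singular_value(n, z_edge, rng) for _ in range(trials)]
    # If the smallest singular value lives on the scale n^{-3/4}, this
    # rescaled median should stay of order one as n grows.
    rescaled_median[n] = float(np.median(samples)) * n ** 0.75

for n, value in rescaled_median.items():
    print(f"n = {n:4d}: median of n^(3/4) * s_min = {value:.3f}")
```

Such an experiment only probes the typical size of the smallest singular value; the lower tail bound discussed above concerns the much rarer event that it is smaller than its natural scale by a power of $n$.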

Notations and conventions
We introduce some notations we use throughout the paper. We write $\mathbb{H}$ for the upper half-plane $\mathbb{H} := \{z \in \mathbb{C} \mid \Im z > 0\}$, and for any $z \in \mathbb{C}$ we use the notation $\mathrm{d}^2 z := 2^{-1}\mathrm{i}\,(\mathrm{d}z \wedge \mathrm{d}\bar{z})$ for the two dimensional volume form on $\mathbb{C}$. For any $2n \times 2n$ matrix $A$ we use the notation $\langle A \rangle := (2n)^{-1}\operatorname{Tr} A$ to denote the normalized trace of $A$. For positive quantities $f, g$ we write $f \lesssim g$ and $f \sim g$ if $f \le Cg$ or $cg \le f \le Cg$, respectively, for some constants $c, C > 0$ which depend only on the constants appearing in (3). We denote vectors by bold-faced lower case Roman letters $\mathbf{x}, \mathbf{y} \in \mathbb{C}^k$, for some $k \in \mathbb{N}$. Vector and matrix norms, $\|\mathbf{x}\|$ and $\|A\|$, indicate the usual Euclidean norm and the corresponding induced matrix norm. Moreover, for a vector $\mathbf{x} \in \mathbb{C}^k$, we use the notation $\mathrm{d}\mathbf{x} := \mathrm{d}x_1 \dots \mathrm{d}x_k$.
We will use the concept of "with very high probability" meaning that for any fixed $D > 0$ the probability of the event is bigger than $1 - n^{-D}$ if $n \ge n_0(D)$. Moreover, we use the convention that $\xi > 0$ denotes an arbitrarily small constant.
We use the convention that quantities without tilde refer to a general matrix with i.i.d. entries, whilst any quantity with tilde refers to the Ginibre ensemble, e.g. we use $X$, $\{\sigma_i\}_{i=1}^n$ to denote a non-Hermitian matrix with i.i.d. entries and its eigenvalues, respectively, and $\widetilde{X}$, $\{\widetilde{\sigma}_i\}_{i=1}^n$ to denote their Ginibre counterparts.

Model and main results
We consider real or complex i.i.d. matrices $X$, i.e. matrices whose entries are independent and identically distributed as $x_{ab} \stackrel{\mathrm{d}}{=} n^{-1/2}\chi$ for a random variable $\chi$. We formulate two assumptions on the random variable $\chi$:

Assumption (A)
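In the real case we assume that $\mathbf{E}\chi = 0$ and $\mathbf{E}\chi^2 = 1$, while in the complex case we assume $\mathbf{E}\chi = \mathbf{E}\chi^2 = 0$ and $\mathbf{E}|\chi|^2 = 1$. In addition, we assume the existence of high moments, i.e. that there exist constants $C_p > 0$ for each $p \in \mathbb{N}$, such that
$$\mathbf{E}|\chi|^p \le C_p. \tag{3}$$

Assumption (B)
There exist $\alpha, \beta > 0$ such that the probability density $g : \mathbb{F} \to [0, \infty)$ of the random variable $\chi$ satisfies the regularity condition (4), where $\mathbb{F} = \mathbb{R}, \mathbb{C}$ in the real and complex case, respectively.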

Remark 1
We remark that we use Assumption (B) only to control the probability of a very small singular value of $X - z$. Alternatively, one may use the statement (5), valid for any $l \ge 1$ and uniformly in $|z| \le 2$, that follows directly from [55, Theorem 3.2] without Assumption (B). Using (5) makes Assumption (B) superfluous in the entire paper, albeit at the expense of a quite sophisticated proof.
We denote the eigenvalues of $X$ by $\sigma_1, \dots, \sigma_n \in \mathbb{C}$, and define the $k$-point correlation function $p_k^{(n)}$ of $X$ through the identity (6), required to hold for any smooth compactly supported test function $F : \mathbb{C}^k \to \mathbb{C}$, with $i_j \in \{1, \dots, n\}$ for $j \in \{1, \dots, k\}$ all distinct. For the important special case when $\chi$ follows a standard real or complex Gaussian distribution, we denote the $k$-point function of the Ginibre matrix $\widetilde{X}$ by $p_k^{(n,\mathrm{Gin}(\mathbb{F}))}$ for $\mathbb{F} = \mathbb{R}, \mathbb{C}$. The circular law implies that the 1-point function converges to the uniform distribution on the unit disk. On the scale $n^{-1/2}$ of individual eigenvalues the scaling limit of the $k$-point function has been explicitly computed in the case of complex and real Ginibre matrices, $\widetilde{X} \sim \mathrm{Gin}(\mathbb{R}), \mathrm{Gin}(\mathbb{C})$, see (7).

Remark 2
The k-point correlation function p (∞,Gin(F)) z 1 ,...,z k of the Ginibre ensemble in both the complex and real cases F = C, R is explicitly known; see [31] and [44] for the complex case, and [10,20,28] for the real case, where the appearance of ∼ n 1/2 real eigenvalues causes a singularity in the density. In the complex case p (∞,Gin(C)) where for any complex numbers z 1 , z 2 , w 1 , w 2 the kernel K (iv) For z 1 = z 2 and |z 1 | = 1, for any z ∈ C, with γ z any contour from 0 to z.
For the corresponding much more involved formulas for p (∞,Gin(R)) k we refer the reader to [10].
Our main result is the universality of $p^{(\infty,\mathrm{Gin}(\mathbb{R},\mathbb{C}))}_{z_1,\dots,z_k}$ at the edge. In particular, we show that the edge-scaling limit of $p_k^{(n)}$ agrees with the known scaling limit of the corresponding real or complex Ginibre ensemble.
Theorem 1 (Edge universality) Let $X$ be an i.i.d. $n \times n$ matrix whose entries satisfy Assumptions (A) and (B). Then, for any fixed integer $k \ge 1$, and complex spectral parameters $z_1, \dots, z_k$ such that $|z_j|^2 = 1$, $j = 1, \dots, k$, and for any compactly supported smooth function $F : \mathbb{C}^k \to \mathbb{C}$, we have the bound (8), where the constant in $O(\cdot)$ may depend on $k$ and the $C^{2k+1}$ norm of $F$, and $c > 0$ is a small constant depending on $k$.

Proof strategy
For the proof of Theorem 1 it is essential to study the linearized $2n \times 2n$ matrix $H^z$ defined in (1) with eigenvalues $\lambda_1^z \le \dots \le \lambda_{2n}^z$ and resolvent $G(w) = G^z(w) := (H^z - w)^{-1}$. We note that the block structure of $H^z$ induces a spectrum symmetric around 0, i.e. $\lambda_i^z = -\lambda_{2n-i+1}^z$ for $i = 1, \dots, n$. The resolvent becomes approximately deterministic as $n \to \infty$ and its limit can be found by solving the simple scalar equation (9), which is a special case of the matrix Dyson equation (MDE), see e.g. [1]. In the following we may often omit the $z$-dependence of $m^z$, $G^z(w)$, ... in the notation. We note that on the imaginary axis we have $m(\mathrm{i}\eta) = \mathrm{i}\,\Im m(\mathrm{i}\eta)$, and in the edge regime its size is described by (10). For $\eta > 0$ we define the deterministic matrix $M = M^z(\mathrm{i}\eta)$, where $M$ should be understood as a $2n \times 2n$ matrix whose four $n \times n$ blocks are all multiples of the identity matrix, and we recall [4, Eq. (3.62)].
Throughout the proof we shall make use of the following optimal local law, which is a direct consequence of [4, Theorem 5.2] (extending [3, Theorem 5.2] to the edge regime). Compared to [4] we require the local law simultaneously in all the spectral parameters $z, \eta$ and for $\eta$ slightly below the fluctuation scale $n^{-3/4}$. We defer the proofs of both extensions to "Appendix A".
Proposition 1 (Local law for $H^z$) Let $X$ be an i.i.d. $n \times n$ matrix whose entries satisfy Assumptions (A) and (B), and let $H^z$ be as in (1). Then for any deterministic vectors $\mathbf{x}, \mathbf{y}$ and matrix $R$ and any $\xi > 0$ the following holds true with very high probability: simultaneously for any $z$ with $|1 - |z|| \lesssim n^{-1/2}$ and all $\eta$ such that $n^{-1} \le \eta \le n^{100}$ we have the bounds (13).
For the application of Proposition 1 towards the proof of Theorem 1, the special case of $R$ being the identity matrix, and $\mathbf{x}, \mathbf{y}$ being either the standard basis vectors or the vectors $\mathbf{1}_\pm$ of zeros and ones defined later in (58), suffices.
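The cusp scaling $\Im m(\mathrm{i}\eta) \sim \eta^{1/3}$ at $|z| = 1$ can be checked by solving the scalar equation numerically. The sketch below assumes that the scalar Dyson equation takes the form $-1/m = w + m - |z|^2/(w + m)$ with $\Im m > 0$ for $\Im w > 0$; this explicit form is an assumption of the example and should be compared against (9). With $w = \mathrm{i}\eta$, $m = \mathrm{i}v$ and $u := \eta + v$, the assumed equation at $|z| = 1$ reduces to the cubic $u^3 - \eta u^2 - \eta = 0$, which is solved by bisection. The function names are hypothetical.

```python
def im_m_at_edge(eta, iterations=200):
    """Return v = Im m(i*eta) for |z| = 1 under the assumed scalar equation.

    With m = i*v and u = eta + v the equation becomes u^3 - eta*u^2 - eta = 0,
    which has a unique root u > eta; we locate it by bisection.
    """
    def cubic(u):
        return u ** 3 - eta * u ** 2 - eta

    lo, hi = eta, eta + 1.0  # cubic(lo) = -eta < 0 and cubic(hi) = (eta+1)^2 - eta > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cubic(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - eta

def relative_residual(eta):
    """Relative error with which m = i*v satisfies the assumed equation at w = i*eta."""
    w = 1j * eta
    m = 1j * im_m_at_edge(eta)
    lhs = -1.0 / m
    rhs = w + m - 1.0 / (w + m)
    return abs(lhs - rhs) / abs(lhs)

etas = (1e-3, 1e-6, 1e-9)
ratios = {eta: im_m_at_edge(eta) / eta ** (1.0 / 3.0) for eta in etas}
residuals = {eta: relative_residual(eta) for eta in etas}

for eta in etas:
    print(f"eta = {eta:.0e}: Im m / eta^(1/3) = {ratios[eta]:.6f}, "
          f"relative residual = {residuals[eta]:.1e}")
```

The ratio $\Im m(\mathrm{i}\eta)/\eta^{1/3}$ approaching one as $\eta \to 0$ is the numerical counterpart of the cusp in the density of states of $H^z$ at $|z| = 1$.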
The linearized matrix $H^z$ can be related to the eigenvalues $\sigma_i$ of $X$ via Girko's Hermitization formula (14) [32,54] for rescaled test functions $f_{z_0}(z) := n f(\sqrt{n}(z - z_0))$, where $f : \mathbb{C} \to \mathbb{C}$ is smooth and compactly supported. When using (14) the small $\eta$ regime requires additional bounds on the number of small eigenvalues $\lambda_i^z$ of $H^z$, or equivalently small singular values of $X - z$. For very small $\eta$, say $\eta \le n^{-100}$, the absence of eigenvalues below $\eta$ can easily be ensured by Assumption (B). For $\eta$ just below the critical scale of $n^{-3/4}$, however, we need to prove an additional bound on the number of eigenvalues, as stated below.
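The equivalence between small eigenvalues of $H^z$ and small singular values of $X - z$, as well as the log-determinant identity underlying Girko's formula, are purely linear-algebraic and can be verified directly. The following sketch is an illustration only; it builds the $2n \times 2n$ block matrix with off-diagonal blocks $X - z$ and $(X - z)^*$, and the helper name `hermitization`, the matrix size and the seed are hypothetical choices.

```python
import numpy as np

def hermitization(X, z):
    """Return the 2n x 2n Hermitian block matrix [[0, X - z], [(X - z)^*, 0]]."""
    n = X.shape[0]
    A = X - z * np.eye(n)
    zero = np.zeros((n, n), dtype=complex)
    return np.block([[zero, A], [A.conj().T, zero]])

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
n = 50
X = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
z = 1.0 + 0.0j

H = hermitization(X, z)
lam = np.linalg.eigvalsh(H)  # real eigenvalues in ascending order
sing = np.linalg.svd(X - z * np.eye(n), compute_uv=False)

# The spectrum of H^z consists of the singular values of X - z and their negatives.
spectrum_matches = np.allclose(lam, np.sort(np.concatenate([-sing, sing])))
# Consequently the spectrum is symmetric around zero.
spectrum_symmetric = np.allclose(lam, -lam[::-1])

# log|det(X - z)| equals half of log|det H^z|, the quantity entering Girko's formula.
_, logabsdet = np.linalg.slogdet(X - z * np.eye(n))
logdet_via_H = 0.5 * np.sum(np.log(np.abs(lam)))

print(spectrum_matches, spectrum_symmetric, abs(logabsdet - logdet_via_H))
```

Since $\log|\det H^z| = \sum_i \log|\lambda_i^z|$, a single eigenvalue of $H^z$ very close to zero can dominate the log-determinant, which is why the number of eigenvalues below the scale $n^{-3/4}$ has to be controlled separately.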

Proposition 2 For any n
on the number of small eigenvalues, for any ξ > 0.
We remark that the precise asymptotics of (15) are of no importance for the proof of Theorem 1. Instead it would be sufficient to establish that for any $\epsilon > 0$ there exists $\delta > 0$ such that we have $\mathbf{E}\,|\{i : |\lambda_i^z| \le n^{-3/4-\epsilon}\}| \lesssim n^{-\delta}$.
The paper is organized as follows: in Sect. 3 we will prove Proposition 2 by a Green function comparison argument, using the analogous bound for the Gaussian case, as recently obtained in [18]. In Sect. 4 we will then present the proof of our main result, Theorem 1, which follows from combining the local law (13), Girko's Hermitization identity (14), the bound on small singular values (15) and another long-time Green function comparison argument.

Estimate on the lower tail of the smallest singular value of X − z
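The main result of this section is an estimate of the lower tail of the density of the smallest $|\lambda_i^z|$ in Proposition 2. For this purpose we introduce the following flow
$$\mathrm{d}X_t = -\frac{1}{2} X_t\,\mathrm{d}t + \frac{\mathrm{d}B_t}{\sqrt{n}} \tag{16}$$
with initial data $X_0 = X$, where $B_t$ is the real or complex matrix valued standard Brownian motion, i.e. $B_t \in \mathbb{R}^{n \times n}$ or $B_t \in \mathbb{C}^{n \times n}$, accordingly with $X$ being real or complex, where $(b_t)_{ab}$ in the real case, and $\sqrt{2}\,\Re (b_t)_{ab}$, $\sqrt{2}\,\Im (b_t)_{ab}$ in the complex case, are independent standard real Brownian motions for $a, b \in [n]$. The flow (16) induces a flow $\mathrm{d}\chi_t = -\chi_t\,\mathrm{d}t/2 + \mathrm{d}b_t$ on the entry distribution $\chi$ with solution
$$\chi_t = e^{-t/2}\chi + \int_0^t e^{-(t-s)/2}\,\mathrm{d}b_s, \quad \text{i.e.} \quad \chi_t \stackrel{\mathrm{d}}{=} e^{-t/2}\chi + \sqrt{1 - e^{-t}}\, g, \tag{17}$$
where $g \sim \mathcal{N}(0, 1)$ is a standard real or complex Gaussian, independent of $\chi$, with $\mathbf{E} g^2 = 0$ in the complex case. By linearity of cumulants, the cumulants of $\chi_t$ are determined by those of $\chi$ and $g$; here $\kappa_{i,j}(x)$ denotes the joint cumulant of $i$ copies of $x$ and $j$ copies of $\bar{x}$. Thus (17) implies that, in distribution,
$$X_t \stackrel{\mathrm{d}}{=} e^{-t/2} X_0 + \sqrt{1 - e^{-t}}\, \widetilde{X},$$
where $\widetilde{X}$ is a real or complex Ginibre matrix independent of $X_0 = X$. Then, we define the $2n \times 2n$ matrix $H_t = H_t^z$ as in (1) replacing $X$ by $X_t$, and its resolvent $G_t(w) = G_t^z(w) := (H_t - w)^{-1}$, for any $w \in \mathbb{H}$. We remark that we defined the flow in (16) with initial data $X$ and not $H^z$ in order to preserve the shape of the self consistent density of states of the matrix $H_t$ along the flow. In particular, by (16) it follows that $H_t$ is the solution of the flow
$$\mathrm{d}H_t = -\frac{1}{2}(H_t + Z)\,\mathrm{d}t + \frac{\mathrm{d}\mathfrak{B}_t}{\sqrt{n}}, \qquad H_0 = H^z,$$
with
$$Z := \begin{pmatrix} 0 & zI \\ \bar{z} I & 0 \end{pmatrix}, \qquad \mathfrak{B}_t := \begin{pmatrix} 0 & B_t \\ B_t^* & 0 \end{pmatrix},$$
where $I$ denotes the $n \times n$ identity matrix.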
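The effect of the flow on the entry distribution can be illustrated by a Monte Carlo experiment in the real case. The sketch below assumes the explicit representation $\chi_t \stackrel{\mathrm{d}}{=} e^{-t/2}\chi + \sqrt{1 - e^{-t}}\,g$ of the solution of $\mathrm{d}\chi_t = -\chi_t\,\mathrm{d}t/2 + \mathrm{d}b_t$, and takes for $\chi$ a centred exponential variable (variance 1, third cumulant 2); this choice of $\chi$, the sample size and the seed are illustrative assumptions. The variance stays equal to one along the flow, while the third cumulant decays like $e^{-3t/2}$.

```python
import numpy as np

def ou_entry_sample(chi, t, rng):
    """Sample chi_t = exp(-t/2) * chi + sqrt(1 - exp(-t)) * g with g ~ N(0, 1)."""
    g = rng.standard_normal(chi.shape)
    return np.exp(-t / 2) * chi + np.sqrt(1 - np.exp(-t)) * g

rng = np.random.default_rng(2)  # fixed seed, chosen arbitrarily
num_samples = 1_000_000

# Centred exponential entry: mean 0, variance 1, third cumulant 2 (illustrative choice).
chi = rng.exponential(1.0, num_samples) - 1.0
third_cumulant_initial = 2.0

variance = {}
third_cumulant = {}
for t in (0.0, 1.0, 3.0):
    chi_t = ou_entry_sample(chi, t, rng)
    centred = chi_t - chi_t.mean()
    variance[t] = float(np.mean(centred ** 2))
    # For a centred variable the third cumulant equals the third central moment.
    third_cumulant[t] = float(np.mean(centred ** 3))

for t in variance:
    predicted = third_cumulant_initial * np.exp(-1.5 * t)
    print(f"t = {t:.1f}: variance = {variance[t]:.3f}, "
          f"third cumulant = {third_cumulant[t]:.3f} (predicted {predicted:.3f})")
```

This exponential decay of the higher cumulants is what makes the third and higher order terms of a cumulant expansion integrable in $t$ over the whole half-line.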
We will use the cumulant expansion that holds for any smooth function f : where the error term Ω(K , f ) goes to zero as the expansion order K goes to infinity. In our application the error is negligible for, say, K = 100 since with each derivative we gain an additional factor of n −1/2 and due to the independence (24) the sums of any order have effectively only n 2 terms. Applying (25) to (22) with f = ∂ α R t , the first order term is zero due to the assumption E x α = 0, and the second order term cancels. The third order term is given by

Proof of Eq. (26)
It follows from the resolvent identity that ∂ α G = −GΔ α G, where Δ α is the matrix of all zeros except for a 1 in the α-th entry. 1 Thus, neglecting minuses and irrelevant constant factors, for any fixed α, the sum (26) is given by a sum of terms of the form Hence, considering all possible choices of γ 1 , γ 2 , γ 3 and using independence to conclude that κ t (α, β 1 , β 2 ) can only be non-zero if β 1 , β 2 ∈ {α, α } we arrive at where the sums are taken over (a, b) ∈ [2n] 2 \ ([n] 2 ∪ [n + 1, 2n] 2 ) and c ∈ [2n], and we dropped the time dependence of G = G t for notational convenience.
We estimate the three sums in (27) using that, by (10), (12), it follows from Proposition 1, and Cauchy-Schwarz estimates by This concludes the proof of (26) by choosing ξ in Proposition 1 accordingly.
Finally, in the cumulant expansion of (22) we are able to bound the terms of order at least four trivially. Indeed, for the fourth order, the trivial bound is $e^{-2t}$ since the $n^3$ from the summation is compensated by the $n^{-2}$ from the cumulants and the $n^{-1}$ from the normalization of the trace. Moreover, we can always perform at least two Ward-estimates on the first and last $G$ with respect to the trace index. Thus we can estimate any fourth-order term by $e^{-2t}(n\eta)^{-2} \le e^{-3t/2} n^{-7/2}\eta^{-4}$, and we note that the power-counting for higher order terms is even better than that. Whence we have shown that $\mathbf{E}\,|\mathrm{d}R_t/\mathrm{d}t| \lesssim e^{-3t/2} n^{-7/2}\eta^{-4}$ and the proof of Proposition 3 is complete after integrating (22) in $t$ from $t_1$ to $t_2$.
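For the reader's convenience, the elementary power counting behind the last inequality can be spelled out as follows. This is a sketch under the assumption that $\eta \le n^{-3/4}$, which is the regime relevant for small eigenvalues on the scale $n^{-3/4}$; the time integral is the one used when integrating from $t_1$ to $t_2$.

```latex
% Assumption of this sketch: \eta \le n^{-3/4}, i.e. n^{3/2}\eta^2 \le 1.
\begin{align*}
  e^{-2t}(n\eta)^{-2}
    &= e^{-2t}\, n^{-7/2}\eta^{-4}\,\bigl(n^{3/2}\eta^{2}\bigr)
     \le e^{-2t}\, n^{-7/2}\eta^{-4}
     \le e^{-3t/2}\, n^{-7/2}\eta^{-4}, \qquad t \ge 0, \\
  \int_{t_1}^{t_2} e^{-3t/2}\,\mathrm{d}t
    &= \frac{2}{3}\bigl(e^{-3t_1/2} - e^{-3t_2/2}\bigr) \le \frac{2}{3},
     \qquad 0 \le t_1 \le t_2 \le \infty .
\end{align*}
```

In particular the time integral is finite even for $t_2 = +\infty$, so a bound on the derivative of this form can be integrated along the whole flow.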
Let $\widetilde{X}$ be a real or complex $n \times n$ Ginibre matrix and let $\widetilde{H}^z$ be the linearized matrix defined as in (1) replacing $X$ by $\widetilde{X}$. Let $\widetilde{\lambda}_i = \widetilde{\lambda}_i^z$, with $i \in \{1, \dots, 2n\}$, be the eigenvalues of $\widetilde{H}^z$. We define the non-negative Hermitian matrix $\widetilde{Y} = \widetilde{Y}^z := (\widetilde{X} - z)(\widetilde{X} - z)^*$; then, by [18, Eq. (13c)–(14)], it follows that for any $\eta \le n^{-3/4}$ we have the bound (28) for $\widetilde{X}$ distributed according to the complex or, respectively, real Ginibre ensemble. Combining (28) and Proposition 3 we now present the proof of Proposition 2.

Proof of Proposition 2
Let λ i (t), with i ∈ {1, . . . , 2n}, be the eigenvalues of H t for any t ≥ 0. Note that λ i (0) = λ i , since H 0 = H z . By (21), choosing t 1 = 0, t 2 = +∞ it follows that for any ξ > 0. Since the distribution of H ∞ is the same as H z it follows that and combining (28) with (29), we immediately conclude the bound in (15).

Edge universality for non-Hermitian random matrices
In this section we prove our main edge universality result, as stated in Theorem 1. Throughout this section we can assume, without loss of generality, that the test function $F$ is of the product form
$$F(w_1, \dots, w_k) = f^{(1)}(w_1) \cdots f^{(k)}(w_k),$$
with $f^{(1)}, \dots, f^{(k)} : \mathbb{C} \to \mathbb{C}$ being smooth and compactly supported functions. Indeed, any smooth function $F$ can be effectively approximated by its truncated Fourier series (multiplied by a smooth cutoff function of product form); see also [54, Remark 3]. Using the effective decay of the Fourier coefficients of $F$ controlled by its $C^{2k+1}$ norm, a standard approximation argument shows that if (8) holds for $F$ of this product form, then it also holds for general smooth $F$.
To resolve eigenvalues on their natural scale we consider the rescaling $f_{z_0}(z) := n f(\sqrt{n}(z - z_0))$ and compare the linear statistics $n^{-1}\sum_i f_{z_0}(\sigma_i)$ and $n^{-1}\sum_i f_{z_0}(\widetilde{\sigma}_i)$, with $\sigma_i, \widetilde{\sigma}_i$ being the eigenvalues of $X$ and of the comparison Ginibre ensemble $\widetilde{X}$, respectively. For convenience we may normalize both linear statistics by their deterministic approximation from the local law (13) which, according to (14), is given by
$$\frac{1}{n}\sum_i f_{z_0}(\sigma_i) \approx \frac{1}{\pi}\int_{\mathbb{D}} f_{z_0}(z)\,\mathrm{d}^2 z,$$
where $\mathbb{D}$ denotes the unit disk of the complex plane.

Proposition 4
Let $k \in \mathbb{N}$ and $z_1, \dots, z_k \in \mathbb{C}$ be such that $|z_j|^2 = 1$ for all $j \in [k]$, and let $f^{(1)}, \dots, f^{(k)}$ be smooth compactly supported test functions. Denote the eigenvalues of an i.i.d. matrix $X$ satisfying Assumptions (A)-(B) and a corresponding real or complex Ginibre matrix $\widetilde{X}$ by $\{\sigma_i\}_{i=1}^n$ and $\{\widetilde{\sigma}_i\}_{i=1}^n$, respectively. Then we have the bound (32) for some small constant $c(k) > 0$, where the implicit multiplicative constant in $O(\cdot)$ depends on the norms $\|\Delta f^{(j)}\|_1$, $j = 1, 2, \dots, k$.
Proof of Theorem 1. Theorem 1 follows directly from Proposition 4 by the definition of the $k$-point correlation function in (6), the exclusion-inclusion principle and the bound $\left|\frac{1}{\pi}\int_{\mathbb{D}} f_{z_0}(z)\,\mathrm{d}^2 z\right| \lesssim 1$.
The remainder of this section is devoted to the proof of Proposition 4. We now fix some k ∈ N and some z 1 , . . . , z k , f (1) , . . . , f (k) as in Proposition 4. All subsequent estimates in this section, also if not explicitly stated, hold true uniformly for any z in an order n −1/2 -neighborhood of z 1 , . . . , z k . In order to prove (32), we use Girko's formula (14) to write where I ( j) with η 0 := n −3/4−δ , for some small fixed δ > 0, and for some very large T > 0, say T := n 100 . We define I

Proof of Proposition 4
The first step in the proof of Proposition 4 is the reduction to a corresponding statement about the I 3 -part in (33), as summarized in the following lemma.
In order to conclude the proof of Proposition 4, due to Lemma 1, it only remains to prove that for any fixed k with some small constant c(k) > 0, where we recall the definition of I 3 and the corresponding I 3 for Ginibre from (33). The proof of (35) is similar to the Green function comparison proof in Proposition 3 but more involved due to the fact that we compare products of resolvents and that we have an additional η-integration.
Here we define the observable (20).
3 , the proof of Proposition 4 follows directly from (35), modulo the proofs of Lemmata 1-2 that will be given in the next two subsections.

Proof of Lemma 1
In order to estimate the probability that there exists an eigenvalue of H z very close to zero, we use the following proposition that has been proven in [3,Prop. 5.7] adapting the proof of [9,Lemma 4.12].

Proposition 5 Under Assumption
for all u > 0 and z ∈ C.
In the following lemma we prove a very high probability bound for I

Lemma 3 For any j ∈ [k] the bounds
we easily estimate |I 1 | as follows for any ξ > 0 with very high probability owing to the high moment bound (3). By (9) it follows that m z (iη) − (η + 1) −1 ∼ η −2 for large η, proving also the bound on I 4 in (39). The bound for I 3 follows immediately from the averaged local law in (13). For the I 2 estimate we split the η-integral of m z (iη) − m z (iη) in I 2 as follows where l ∈ N is a large fixed integer. Using (10) we find that the third term in (40) with very high probability for any ξ > 0. Alternatively, this bound also follows from (5) without Assumption (B), circumventing Proposition 5, see Remark 1. For the second term in (40) we define η 1 := n −3/4+ξ with some very small ξ > 0 and using (42) by the averaged local law in (13), and M z (iη 1 ) η 1/3 1 from (10). Here from the second to third line in (42) we used that again by the local law. By redefining ξ , this concludes the high probability bound on I 2 in (39), and thereby the proof of the lemma.
In the following lemma we prove an improved bound for I ( j) 2 , compared with (39), which holds true only in expectation. The main input of the following lemma is the stronger lower tail estimate on λ i , in the regime |λ i | ≥ n −l , from (15) instead of (43).
Proof We split the η-integral of m z (iη) − m z (iη) as in (40). The third term in the r.h.s. of (40) is of order n −1−4δ/3 . Then, we estimate the first term in the r.h.s. of (40) in terms of the smallest (in absolute value) eigenvalue λ n+1 as where in the last inequality we use (38) with u = e −t n. Note that by (15) it follows that E {i : |λ i | ≤ n δ/2 η 0 } n −δ/2 .
Hence, by (46), using similar computations to (42), we conclude that and j 1 + j 2 + j 3 + j 4 = k. Assume j 1 ≥ 1, the case j 4 ≥ 1 is completely analogous, then Since similar bounds hold true for I 4 as well, the above inequalities conclude the proof of (34).

Proof of Lemma 2
We begin with a lemma generalizing the bound in (39) to derivatives of I where ∂ l γ := ∂ γ 1 . . . ∂ γ l , with very high probability uniformly in t ≥ 0. Proof We omit the t-and z-dependence of G z t , m z within this proof since all bounds hold uniformly in t ≥ 0 and z − z j n −1/2 . We also omit the η-argument from these functions, but the η-dependence of all estimates will explicitly be indicated. Note that the l = 0 case was already proven in (39). We now separately consider the remaining cases l = 1 and l ≥ 2. For notational simplicity we neglect the n ξ multiplicative error factors (with arbitrarily small exponents ξ > 0) applications of the local law (13) within the proof. In particular we will repeatedly use (13) in the form where we defined the parameter ψ := 1 nη + 1 n 1/2 η 1/3 .

Case l = 1
This follows directly from where in the last step we used G(iT ) ≤ T −1 = n −100 and (49). Since this bound is uniform in z we may bound the remaining integral by n Δf ( j) 1 , proving (48).
In the explicit deterministic term we performed an integration and estimated the result. The claim (48) for $l \ge 2$ and $a \equiv b + n \pmod{2n}$ now follows from estimating the remaining $z$-integral in (52) by $n\|\Delta f^{(j)}\|_1$.

Proof of Lemma 2
By (20) and Ito's Lemma it follows that (53) holds,
where we recall the definition of $\kappa_t$ in (23). In fact, the point-wise estimate from Lemma 5 gives a sufficiently strong bound for most terms in the cumulant expansion; the few remaining terms will be computed more carefully.
In the cumulant expansion (25) of (53) the second order terms cancel exactly and we now separately estimate the third-, fourth-and higher order terms.

Order three terms
For the third order, when computing ∂ α ∂ β 1 ∂ β 2 Z t through the Leibniz rule we have to consider all possible assignments of derivatives ∂ α , ∂ β 1 , ∂ β 2 to the factors I using Lemma 5 and the cumulant scaling (24). Note that the condition a = b in the lemma is ensured by the fact that for a = b the cumulants κ t (α, β 1 , . . . ) vanish. The first term in (54) requires an additional argument. We write out all possible index allocations and claim that ultimately we obtain the same bound, as for the other two terms in (54), i.e. for all of which we obtain a bound of n ξ e −2t n 2 η 2 0 j Δf ( j) 1 , again using Lemma 5 and (24).

Higher order terms
For terms of order at least 5, there is no need to additionally gain from any of the factors of $I_3$ and we simply bound all those, and their derivatives, by $n^{\xi}$ using Lemma 5. This results in a bound of $n^{\xi - (l-4)/2} e^{-lt/2} \prod_j \|\Delta f^{(j)}\|_1$ for the terms of order $l$. By combining the estimates on the terms of order three, four and higher order derivatives, and integrating in $t$ we obtain the bound (37). This completes the proof of Lemma 2.
In order to conclude the local law simultaneously in all $z, \eta$ we use a standard grid argument. To do so, we choose a regular grid of $z$'s and $\eta$'s at a distance of, say, $n^{-3}$ and use Lipschitz continuity (with Lipschitz constant $n^2$) of $(\eta, z) \mapsto G^z(\mathrm{i}\eta)$ and a union bound over the exceptional events at each grid point.
Funding Open access funding provided by Institute of Science and Technology (IST Austria).