Local Law of Addition of Random Matrices on Optimal Scale

The eigenvalue distribution of the sum of two large Hermitian matrices, when one of them is conjugated by a Haar distributed unitary matrix, is asymptotically given by the free convolution of their spectral distributions. We prove that this convergence also holds locally in the bulk of the spectrum, down to the optimal scale, i.e. any scale larger than the eigenvalue spacing. The corresponding eigenvectors are fully delocalized. Similar results hold for the sum of two real symmetric matrices, when one of them is conjugated by a Haar distributed orthogonal matrix.


Introduction
The pioneering work [31] of Voiculescu connected free probability with random matrices, as one of the most prominent examples of a noncommutative probability space is the space of Hermitian N × N matrices. On the one hand, the law of the sum of two free random variables with laws μ_α and μ_β is given by the free additive convolution μ_α ⊞ μ_β. On the other hand, in the case of Hermitian matrices, the law can be identified with the distribution of the eigenvalues. Thus the free additive convolution computes the eigenvalue distribution of the sum of two free Hermitian matrices. However, freeness is characterized by an infinite collection of moment identities and cannot easily be verified in general. A fundamental direct mechanism to generate freeness is conjugation by random unitary matrices. More precisely, two large Hermitian random matrices are asymptotically free if the unitary transfer matrix between their eigenbases is Haar distributed. The most important example is when the spectra of the two matrices are deterministic and the unitary conjugation is the sole source of randomness. In other words, if A = A^(N) and B = B^(N) are two sequences of deterministic N × N Hermitian matrices and U is a Haar distributed unitary, then A and U BU* are asymptotically free in the large N limit, and the asymptotic eigenvalue distribution of A + U BU* is given by the free additive convolution μ_A ⊞ μ_B of the eigenvalue distributions of A and B.
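As a hedged numerical illustration (not part of the paper's argument, and with our own choice of spectra): when A and B both have spectrum {−1, +1} with equal weights, the free additive convolution of the two symmetric Bernoulli laws is known to be the arcsine law on (−2, 2), and the empirical spectrum of A + U BU* already matches it well for moderate N.

```python
import numpy as np

# Empirical spectrum of A + U B U* vs. the free convolution mu (+) mu.
# For mu = (delta_{-1} + delta_{+1})/2 the free convolution is the arcsine
# law on (-2, 2), with CDF F(x) = 1/2 + arcsin(x/2)/pi.
rng = np.random.default_rng(0)
N = 1000
a = np.repeat([1.0, -1.0], N // 2)
A = np.diag(a)
B = np.diag(a.copy())

# Haar unitary via QR of a complex Ginibre matrix, with column phases fixed
Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
Q, R = np.linalg.qr(Z)
U = Q * (np.diag(R) / np.abs(np.diag(R)))

H = A + U @ B @ U.conj().T
evals = np.sort(np.linalg.eigvalsh(H))

# compare the empirical CDF with the arcsine CDF on a grid inside (-2, 2)
xs = np.linspace(-1.9, 1.9, 39)
ecdf = np.searchsorted(evals, xs) / N
acdf = 0.5 + np.arcsin(xs / 2) / np.pi
err = np.max(np.abs(ecdf - acdf))
print(err)  # small: the two distributions agree up to finite-N fluctuations
```

The phase correction after the QR step is the standard recipe for producing an exactly Haar distributed unitary from a Ginibre matrix.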
Since Voiculescu's first proof, several alternative approaches have been developed, see e.g. [11,16,29,30], but all of them were global in the sense that they describe the eigenvalue distribution in the weak limit, i.e. on the macroscopic scale, tested against N-independent test functions (to fix the scaling, we assume that A^(N) and B^(N) are uniformly bounded).

Z. Bao, L. Erdős and K. Schnelli were supported by ERC Advanced Grant RANMAT No. 338804.
The study of a local law, i.e. identification of the eigenvalue distribution of A+U BU * with the free additive convolution below the macroscopic scale, was initiated by Kargin. First, he reached the scale (log N ) −1/2 in [25] by using the Gromov-Milman concentration inequality for the Haar measure (a weaker concentration result was obtained earlier by Chatterjee [14]). Kargin later improved his result down to scale N −1/7 in the bulk of the spectrum [26] by analyzing the stability of the subordination equations more efficiently. This result was valid only away from finitely many points in the bulk spectrum and no effective control was given on this exceptional set. Recently in [1], we reduced the minimal scale to N −2/3 by establishing the optimal stability and by using a bootstrap procedure to successively localize the Gromov-Milman inequality from larger to smaller scales. Moreover, our result holds in the entire bulk spectrum. In fact, the key novelty in [1] was a new stability analysis in the entire bulk spectrum.
The main result of the current paper is the local law for H = A + U BU * down to the scale N −1+γ , for any γ > 0. Note that the typical eigenvalue spacing is of order N −1 , a scale where the eigenvalue density fluctuates and no local law holds. Thus our result holds down to the optimal scale.
There are several motivations to establish such refinements of the macroscopic limit laws. First, such bounds are used as a priori estimates in the proofs of Wigner–Dyson–Mehta type universality results on local spectral statistics; see e.g. [12,20,21,27] and references therein. Second, control on the diagonal resolvent matrix elements for some η = Im z implies that the eigenvectors are delocalized on scale η^{−1}; the optimal scale for η yields complete delocalization of the eigenvectors. Third, the local law is intimately related to an effective speed of convergence in Voiculescu's theorem on the global scale [1,26].
The basic idea of the proof is a continuity argument in the imaginary part η = Im z of the spectral parameter z ∈ C + in the resolvent G(z) = (H − z) −1 . This method for the matrix elements of G(z) was first introduced in [19] in the context of Wigner matrices. It requires an initial step, an a priori control on G(z) for large η, say η = 1. In the context of the current paper, the a priori bound is provided by Kargin's result [26]. Since G(z) is continuous in z, this also provides a control on G(z) for slightly smaller η. This weak control shows that the normalized trace of G(z) (and in fact all diagonal elements G ii ) is in the stability regime of a self-consistent equation which identifies the limiting object. The main work is to estimate the error between the equations for G(z) and its limit. Our analysis has three major ingredients.
First, we use a partial randomness decomposition of the Haar measure that enables us to take partial expectation of G ii with respect to the ith column of U . Second, to compute this partial expectation, we establish a new system of self-consistent equations involving only two auxiliary quantities. Keeping in mind, as a close analogy, that freeness involves checking infinitely many moment conditions for monomials of A, B and U , one may fear that an equation for G involves BG, whose equation involves BG B etc., i.e. one would end up with an infinite system of equations. Surprisingly this is not the case and monitoring two appropriately chosen quantities in tandem is sufficient to close the system. Third, to connect the partial expectation of G ii with the subordination functions from free probability, we rely on the optimal stability result for the subordination equations obtained in [1].
We stress that exploiting concentration only for the partial randomness surpasses the more general but less flexible Gromov–Milman technique. The main point is that we use concentration for each G_ii separately, exploiting the randomness of a single column (namely the ith one) of the Haar unitary U. Since G_ii depends much more strongly on this column than on the other ones, the partial expectation of G_ii with respect to the ith column is already essentially deterministic. The concentration around this partial expectation is more efficient since it uses only O(N) random variables instead of all the O(N²) variables used in the Gromov–Milman method.
One prominent application of our work concerns the single ring theorem of Guionnet, Krishnapur and Zeitouni [22] on the eigenvalue distribution of matrices of the form U T V , where T is a fixed positive definite matrix and U , V are independent Haar distributed. Via the hermitization technique, local laws for the addition of random matrices can be used to prove local versions of the single ring theorem. This approach was demonstrated recently by Benaych-Georges [8], who proved a local single ring theorem on scale (log N ) −1/4 using Kargin's local law on scale (log N ) −1/2 . The local law on the optimal scale N −1 is one of the key ingredients to prove the local single ring theorem on the optimal scale. The local single ring theorem will be proved in our separate work [2].

Notation.
The following definition of high-probability estimates, first used in [18], is suited for our purposes.
Definition 1.1 (Stochastic domination). Let X = (X^(N)(v) : N ∈ N, v ∈ V^(N)) and Y = (Y^(N)(v) : N ∈ N, v ∈ V^(N)) be two families of nonnegative random variables, where V^(N) is a possibly N-dependent parameter set. We say that Y stochastically dominates X, uniformly in v, if for all (small) ε > 0 and (large) D > 0,

sup_{v ∈ V^(N)} P( X^(N)(v) > N^ε Y^(N)(v) ) ≤ N^{−D},

for N sufficiently large; in this case we write X ≺ Y. We further rely on the following notation. We use the symbols O(·) and o(·) for the standard big-O and little-o notation. We use c and C to denote strictly positive constants that do not depend on N. Their values may change from line to line.
We use bold font for vectors in C^N and denote the components as v = (v_1, …, v_N). We denote by ‖v‖_2 the Euclidean norm and by ‖v‖_∞ = max_i |v_i| the uniform norm of v ∈ C^N. We denote by M_N(C) the set of N × N matrices over C. For A ∈ M_N(C), we denote by ‖A‖ its operator norm and by ‖A‖_2 its Hilbert–Schmidt norm. The matrix entries of A are denoted by A_ij = e_i^* A e_j, where (e_i) is the canonical basis of C^N. We denote by tr A = N^{−1} Σ_{i=1}^N A_ii the normalized trace of A. For v, w ∈ C^N, the rank-one matrix vw^* has entries (vw^*)_ij = v_i w̄_j.
Finally, we use double brackets to denote index sets, i.e. ⟦n_1, n_2⟧ := [n_1, n_2] ∩ Z, for n_1, n_2 ∈ R.

Free additive convolution.
In this subsection, we recall the definition of the free additive convolution; this is a shortened version of Sect. 2.1 of [1] added for completeness. Given a probability measure μ on R, its Stieltjes transform, m_μ, on the complex upper half-plane C^+ := {z ∈ C : Im z > 0} is defined by

m_μ(z) := ∫_R dμ(x)/(x − z), z ∈ C^+. (2.1)

Note that m_μ : C^+ → C^+ is an analytic function such that

lim_{η↗∞} iη m_μ(iη) = −1. (2.2)

Conversely, if m : C^+ → C^+ is an analytic function such that lim_{η↗∞} iη m(iη) = −1, then m is the Stieltjes transform of a probability measure μ, i.e. m(z) = m_μ(z), for all z ∈ C^+. We denote by F_μ the negative reciprocal Stieltjes transform of μ, i.e.

F_μ(z) := −1/m_μ(z), z ∈ C^+. (2.3)

Observe that

Im F_μ(z) ≥ Im z and lim_{η↗∞} F_μ(iη)/(iη) = 1, (2.4)

as follows from (2.2), and note that F_μ is analytic on C^+ with nonnegative imaginary part.
The free additive convolution is the symmetric binary operation on probability measures on R characterized by the following result.

Proposition 2.1 (Theorem 4.1 in [6], Theorem 2.1 in [15]). Given two probability measures, μ_1 and μ_2, on R, there exist unique analytic functions, ω_1, ω_2 : C^+ → C^+, such that, (i) for all z ∈ C^+, Im ω_1(z), Im ω_2(z) ≥ Im z, and (ii) for all z ∈ C^+,

F_{μ_1}(ω_2(z)) = F_{μ_2}(ω_1(z)) = ω_1(z) + ω_2(z) − z. (2.5)

It follows from (2.5) that the analytic function F : C^+ → C^+ defined by

F(z) := F_{μ_1}(ω_2(z)) (2.6)

satisfies the analogue of (2.4). Thus F is the negative reciprocal Stieltjes transform of a probability measure μ, called the free additive convolution of μ_1 and μ_2, usually denoted by μ ≡ μ_1 ⊞ μ_2. The functions ω_1 and ω_2 of Proposition 2.1 are called subordination functions, and m_μ is said to be subordinated to m_{μ_1}, respectively to m_{μ_2}. Moreover, observe that ω_1 and ω_2 are analytic functions on C^+ with nonnegative imaginary parts. Hence they admit the Nevanlinna representations

ω_j(z) = a_{ω_j} + z + ∫_R dν_{ω_j}(x)/(x − z), j = 1, 2, (2.8)

where a_{ω_j} ∈ R and the ν_{ω_j} are finite Borel measures on R. For further details and historical remarks on the free additive convolution we refer to, e.g. [23,32].
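The subordination system (2.5) can be solved numerically by a classical fixed-point iteration in the spirit of Belinschi and Bercovici; the sketch below (helper names and parameters are our own) checks the result against the closed-form answer for two symmetric Bernoulli measures, whose free convolution is the arcsine law on (−2, 2).

```python
import numpy as np

# Solve F_{mu1}(w2) = F_{mu2}(w1) = w1 + w2 - z by iterating
# w1 -> z + H1(z + H2(w1)), with H_j(w) := F_{mu_j}(w) - w.
def m_atomic(z, atoms, weights):
    """Stieltjes transform of a finitely supported probability measure."""
    return np.sum(weights / (atoms - z))

def F(z, atoms, weights):
    """Negative reciprocal Stieltjes transform."""
    return -1.0 / m_atomic(z, atoms, weights)

def subordination(z, mu1, mu2, iters=5000):
    H1 = lambda w: F(w, *mu1) - w
    H2 = lambda w: F(w, *mu2) - w
    w1 = z  # initial point in the upper half-plane
    for _ in range(iters):
        w2 = z + H2(w1)
        w1 = z + H1(w2)
    return w1, w2

mu = (np.array([-1.0, 1.0]), np.array([0.5, 0.5]))  # symmetric Bernoulli
z = 0.5 + 0.05j
w1, w2 = subordination(z, mu, mu)

# consistency of (2.5), and comparison with the arcsine law on (-2, 2),
# whose Stieltjes transform is -1/sqrt(z^2 - 4)
res = abs(w1 + w2 - z - F(w2, *mu))
m_free = m_atomic(w2, *mu)              # subordination: m_{mu (+) mu}(z)
m_arcsine = -1.0 / np.sqrt(z * z - 4)
print(res, abs(m_free - m_arcsine))     # both tiny after convergence
```

For these two atomic measures the iteration map is a Möbius transformation, so convergence to the attracting fixed point (the one with Im ω ≥ Im z) can also be checked by hand.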
Choosing μ_1 as a single point mass at b ∈ R and μ_2 arbitrary, it is straightforward to check that μ_1 ⊞ μ_2 is μ_2 shifted by b. We exclude this uninteresting case by assuming hereafter that μ_1 and μ_2 are both supported at more than one point. For general μ_1 and μ_2, the atoms of μ_1 ⊞ μ_2 are identified as follows. A point c ∈ R is an atom of μ_1 ⊞ μ_2 if and only if there exist a, b ∈ R such that c = a + b and μ_1({a}) + μ_2({b}) > 1; see [10, Theorem 7.4]. Properties of the continuous part of μ_1 ⊞ μ_2 may be inferred from the boundary behavior of the functions F_{μ_1⊞μ_2}, ω_1 and ω_2. For simplicity, we restrict the discussion to compactly supported probability measures in the following.

Proposition 2.2 (Theorem 2.3 in [3], Theorem 3.3 in [4]). Let μ_1 and μ_2 be compactly supported probability measures on R, neither of them being a single point mass. Then the functions F_{μ_1⊞μ_2}, ω_1, ω_2 : C^+ → C^+ extend continuously to R.
We are now all set to introduce our notion of regular bulk, B_{μ_1⊞μ_2}, of μ_1 ⊞ μ_2. Informally, we let B_{μ_1⊞μ_2} be the open set on which μ_1 ⊞ μ_2 has a continuous density that is strictly positive and bounded from above. For a formal definition we first introduce the set

U_{μ_1⊞μ_2} := { x ∈ R : lim_{η↘0} F_{μ_1⊞μ_2}(x + iη) ≠ 0 }. (2.9)

Note that U_{μ_1⊞μ_2} does not contain any atoms of μ_1 ⊞ μ_2. By the Luzin–Privalov theorem the set {x ∈ R : lim_{η↘0} F_{μ_1⊞μ_2}(x + iη) = 0} has Lebesgue measure zero. In fact, a stronger statement applies in the case at hand. Belinschi [5] showed that if x ∈ R is such that lim_{η↘0} F_{μ_1⊞μ_2}(x + iη) = 0, then it must be of the form x = a + b with μ_1({a}) + μ_2({b}) ≥ 1. Since there can only be finitely many such points x, the set U_{μ_1⊞μ_2} must contain an open non-empty interval.
The regular bulk is obtained from U_{μ_1⊞μ_2} by removing the zeros of f_{μ_1⊞μ_2} inside U_{μ_1⊞μ_2}.
Definition 2.4. The regular bulk of the measure μ_1 ⊞ μ_2 is the set

B_{μ_1⊞μ_2} := { x ∈ U_{μ_1⊞μ_2} : f_{μ_1⊞μ_2}(x) > 0 }. (2.10)

Note that B_{μ_1⊞μ_2} is an open nonempty set on which μ_1 ⊞ μ_2 admits the density f_{μ_1⊞μ_2}. The density is strictly positive and thus, by Proposition 2.3, real analytic on B_{μ_1⊞μ_2}.

Definition of the model and assumptions.
Let A ≡ A^(N) and B ≡ B^(N) be two sequences of deterministic real diagonal matrices in M_N(C), whose empirical spectral distributions are denoted by μ_A and μ_B, respectively. More precisely,

μ_A := (1/N) Σ_{i=1}^N δ_{a_i}, μ_B := (1/N) Σ_{i=1}^N δ_{b_i}, (2.11)

where a_1, …, a_N and b_1, …, b_N denote the diagonal entries of A and B, respectively.
For simplicity we omit the N-dependence of the matrices A and B from our notation. Throughout the paper, we assume that

max{ ‖A‖, ‖B‖ } ≤ C, (2.12)

for some positive constant C uniform in N. Proposition 2.1 asserts the existence of unique analytic functions ω_A and ω_B satisfying the analogue of (2.5) such that, for all z ∈ C^+,

F_{μ_A}(ω_B(z)) = F_{μ_B}(ω_A(z)) = ω_A(z) + ω_B(z) − z. (2.13)

We will assume that there are deterministic probability measures μ_α and μ_β on R, neither of them being a single point mass, such that the empirical spectral distributions μ_A and μ_B converge weakly to μ_α and μ_β, as N → ∞. More precisely, we assume that

d_L(μ_A, μ_α) + d_L(μ_B, μ_β) → 0, (2.14)

as N → ∞, where d_L denotes the Lévy distance. Proposition 2.1 asserts that there are unique analytic functions ω_α, ω_β satisfying the analogue of (2.5) such that, for all z ∈ C^+,

F_{μ_α}(ω_β(z)) = F_{μ_β}(ω_α(z)) = ω_α(z) + ω_β(z) − z. (2.15)

For simplicity, we frequently use the notation G(z) instead of G_H(z) and we write m_H(z) := tr G(z). In addition, for brevity, we set, for any given γ > 0, η_m := N^{−1+γ}. The main results of this paper are as follows.
Theorem 2.5. Let μ α and μ β be two compactly supported probability measures on R, and assume that neither is supported at a single point and that at least one of them is supported at more than two points. Assume that the sequences of matrices A and B in (2.16) are such that their empirical eigenvalue distributions μ A and μ B satisfy (2.14).
The assumption that neither of μ_α and μ_β is a point mass ensures that the free additive convolution μ_α ⊞ μ_β is not a simple translate. The additional assumption that at least one of them is supported at more than two points is made for brevity of the exposition. In Appendix B, we present the corresponding result for the special case when μ_α and μ_β are both convex combinations of two point masses.

(2.23)
i.e. the Lévy distances of the empirical eigenvalue distributions of A and B from their limiting distributions control uniformly the deviations of the corresponding subordination functions and Stieltjes transforms. Note moreover that max_{z∈S_I(0,1)} |m_{μ_α⊞μ_β}(z)| < ∞ by compactness of I and analyticity of m_{μ_α⊞μ_β}. Thus the Stieltjes–Perron inversion formula directly implies that (μ_A ⊞ μ_B)_ac has a density, f_{μ_A⊞μ_B}, inside I. Hence, using m_{μ_α}(ω_β(z)) = m_{μ_α⊞μ_β}(z), we observe that |m_H(z) − m_{μ_A⊞μ_B}(z)| is bounded by the right side of (2.25), too.

Remark 2.3.
Note that assumption (2.14) does not exclude that the matrix H has outliers in the large N limit. In fact, the model H = A + U BU * shows a rich phenomenology when, say, A has a finite number of large spikes; we refer to the recent works in [7,13,26].
Let λ_1, …, λ_N be the eigenvalues of H, and u_1, …, u_N the corresponding ℓ²-normalized eigenvectors. The following result shows complete delocalization of the bulk eigenvectors:

‖u_k‖_∞ ≺ N^{−1/2}, (2.26)

uniformly for all indices k such that λ_k ∈ I, where I is the interval in Theorem 2.5.
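A rough numerical illustration of this delocalization (with our own choice of A and B, and with the usual logarithmic corrections in place of N^ε factors): for H = A + U BU*, every ℓ²-normalized eigenvector has sup-norm of order N^{−1/2} up to logs.

```python
import numpy as np

# Sup-norms of eigenvectors of H = A + U B U*: all of order
# sqrt(log N / N), far below the localized extreme ||u||_inf ~ 1.
rng = np.random.default_rng(1)
N = 500
A = np.diag(np.linspace(-1.0, 1.0, N))
B = np.diag(np.repeat([-1.0, 1.0], N // 2))

Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
Q, R = np.linalg.qr(Z)
U = Q * (np.diag(R) / np.abs(np.diag(R)))   # Haar unitary

H = A + U @ B @ U.conj().T
_, vecs = np.linalg.eigh(H)                 # columns are l2-normalized u_k
max_inf = np.max(np.abs(vecs))              # worst sup-norm over all u_k
print(max_inf, np.sqrt(np.log(N) / N))      # comparable sizes
```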

Strategy of proof.
In this subsection, we informally outline the strategy of our proofs. Throughout the paper, without loss of generality, we assume that

tr A = tr B = 0. (2.27)

For brevity, we use the shorthand m ≡ m_{μ_A⊞μ_B} for the Stieltjes transform of μ_A ⊞ μ_B. We consider first the unitary setting. Let H := A + U BU^* and denote its Green function by G(z) := (H − z)^{−1}. We write z = E + iη ∈ C^+, E ∈ R and η > 0, for the spectral parameter. In the sequel we often omit z ∈ C^+ from the notation when no confusion can arise. Recalling (2.17), we have m_H(z) = tr G(z). For brevity, we set B̃ := U BU^*, so that H = A + B̃. The following functions will play a key role in our proof.
Notice that the role of A and B are not symmetric in these notations. By cyclicity of the trace, we may write We remark that the approximate subordination functions defined above are slightly different from the candidate subordination functions used in [26,29] which were later used in [1]. The functions ω c A (z) and ω c B (z) turn out to be good approximations to the subordination functions ω A (z) and ω B (z) of (2.13). A direct consequence of the definition in (2.30) is that (2.32) Having set the notation, our main task is to show that where we focus, for simplicity, on the diagonal Green function entries only. We first heuristically explain how (2.33) leads to our main result in (2.20). A key input is the local stability of the system (2.13) established in [1]; see Subsection 3.3 for a summary. Averaging over the index i in (2.33), we get Replacing H by H, we analogously get which is a perturbation of (2.13). Using the local stability of the system (2.13), we obtain Plugging the first estimate back into (2.33) we get (2.20). The full proof of this step is accomplished in Sect. 7. We next return to (2.33). Its proof relies on the following decomposition of the Haar measure on the unitary group given, e.g. in [17,28]. For any fixed i ∈ 1, N , any Haar unitary U can be written as Here R i is the Householder reflection (up to a sign) sending the vector The gist of the decomposition in (2.37) is that the Householder reflection R i and the unitary U i are independent, for each fixed i ∈ 1, N . Hence, the decomposition in (2.37) allows one to split off the partial randomness of the vector v i from U .
The proof of (2.33) is divided into two parts: (i) the concentration of G_ii around its partial expectation E_{v_i}[G_ii], and (ii) the identification of this partial expectation with the right side of (2.33). To prove part (i), we resolve dependences by expansion and use concentration estimates for the vector v_i. This part is accomplished in Sect. 5.
Part (ii) is carried out in Sect. 6. We start from the Green function identity Taking the E v i expectation of (2.38) and recalling the definition of the approximate subordination function ω c B (z) in (2.30), it suffices to show that to prove (2.33). Denoting B i := U i B(U i ) * and setting, for z ∈ C + , we will prove that Approximating e −iθ i v i by a Gaussian vector and using integration by parts for Gaussian random variables, we get the pair of equations where we dropped the z-argument for the sake of brevity; see (6.23) and (6.24) for precise statements with, for technical reasons, slightly modified S i and T i . Solving the two equations above for E v i S i we find Returning to (2.39), we also obtain, using concentration estimates for ( BG) ii (which follow from the concentration estimates of G ii established in part (i) and (2.38)), that Thus, averaging (2.40) over the index i and comparing with (2.41), we conclude that Plugging this last estimate back into (2.40), we eventually find that which together with (2.39) and (2.38) gives us part (ii). This completes the sketch of the proof for the unitary case. The proof of the orthogonal case is similar. The necessary modifications are given in Appendix A.

Preliminaries
In this section, we first collect some basic tools used later on and then summarize results of [1]. In particular, we discuss, under the assumptions of Theorem 2.5, stability properties of the system (2.13) and state essential properties of the subordination functions ω A and ω B .

Stochastic domination and large deviation properties.
Recall the definition of stochastic domination in Definition 1.1. The relation ≺ is a partial ordering: it is transitive and it satisfies the arithmetic rules of an order relation, e.g. if X_1 ≺ Y_1 and X_2 ≺ Y_2, then X_1 + X_2 ≺ Y_1 + Y_2 and X_1 X_2 ≺ Y_1 Y_2. Gaussian vectors have well-known large deviation properties. We will use them in the following form (Lemma 3.1), whose proof is standard.
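The standard Gaussian large deviation bounds behind Lemma 3.1 can be sanity-checked numerically: for g ∼ N_C(0, N^{−1} I) and deterministic M, b, the quadratic form g*Mg fluctuates around N^{−1} tr M on scale ‖M‖_2/N, and b*g lives on scale ‖b‖_2/√N. The concrete constants and test matrices below are ours.

```python
import numpy as np

# Fluctuation scales of Gaussian quadratic and linear forms.
rng = np.random.default_rng(2)
N, trials = 400, 200
M = rng.standard_normal((N, N)); M = (M + M.T) / 2   # fixed symmetric M
b = rng.standard_normal(N)                           # fixed vector b

dev_quad, dev_lin = [], []
for _ in range(trials):
    # complex Gaussian with E|g_j|^2 = 1/N
    g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2 * N)
    dev_quad.append(abs(np.vdot(g, M @ g) - np.trace(M) / N))
    dev_lin.append(abs(np.vdot(b, g)))

r_quad = np.mean(dev_quad) / (np.linalg.norm(M, 'fro') / N)
r_lin = np.mean(dev_lin) / (np.linalg.norm(b) / np.sqrt(N))
print(r_quad, r_lin)   # both O(1): deviations live on the predicted scales
```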

Rank-one perturbation formula.
At various places, we use the following fundamental perturbation formula: for α, β ∈ C^N and an invertible D ∈ M_N(C), we have

(D + αβ^*)^{−1} = D^{−1} − (D^{−1} αβ^* D^{−1}) / (1 + β^* D^{−1} α), (3.2)

whenever 1 + β^* D^{−1} α ≠ 0, as can be checked readily. A standard application of (3.2) is recorded in the following lemma.

Lemma 3.2. Let D ∈ M_N(C) be Hermitian and let Q ∈ M_N(C) be arbitrary. Then, for any finite-rank Hermitian matrix R ∈ M_N(C) and any z = E + iη ∈ C^+, we have

|tr Q (D + R − z)^{−1} − tr Q (D − z)^{−1}| ≤ C(R) ‖Q‖ / (Nη), (3.3)

where the constant C(R) depends only on the rank of R.
Proof. Let z ∈ C + and α ∈ C N . Then from (3.2) we have We can thus estimate Since R = R * ∈ M N (C) has finite rank, we can write R as a finite sum of rank-one Hermitian matrices of the form ±αα * . Thus iterating (3.5) we get (3.3).
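Both the rank-one formula (3.2) and the resulting O(1/(Nη)) trace stability can be verified numerically; the sketch below (matrices and constants are our own) checks the identity to machine precision and confirms that a rank-one Hermitian perturbation moves the normalized resolvent trace by at most O(1/(Nη)).

```python
import numpy as np

# (1) Sherman-Morrison-type identity (3.2); (2) stability of tr G under a
# rank-one Hermitian perturbation, in the spirit of Lemma 3.2.
rng = np.random.default_rng(3)
N, eta = 300, 0.1
D = rng.standard_normal((N, N)) / np.sqrt(N) + 2.0 * np.eye(N)  # invertible
a = rng.standard_normal(N) / np.sqrt(N)
b = rng.standard_normal(N) / np.sqrt(N)

Dinv = np.linalg.inv(D)
lhs = np.linalg.inv(D + np.outer(a, b))
rhs = Dinv - np.outer(Dinv @ a, b @ Dinv) / (1 + b @ Dinv @ a)
err_id = np.max(np.abs(lhs - rhs))

# Hermitian H, rank-one projection R, spectral parameter z = i eta
H = rng.standard_normal((N, N)); H = (H + H.T) / np.sqrt(2 * N)
v = rng.standard_normal(N)
R = np.outer(v, v) / (v @ v)
G  = np.linalg.inv(H - 1j * eta * np.eye(N))
GR = np.linalg.inv(H + R - 1j * eta * np.eye(N))
tr_dev = abs(np.trace(GR - G)) / N      # expected to be <= C / (N eta)
print(err_id, tr_dev * N * eta)         # ~1e-13 and O(1), respectively
```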
Considering μ_1, μ_2 as fixed, the equation Φ_{μ_1,μ_2}(ω_1, ω_2, z) = 0, where

Φ_{μ_1,μ_2}(ω_1, ω_2, z) := ( F_{μ_1}(ω_2) − ω_1 − ω_2 + z, F_{μ_2}(ω_1) − ω_1 − ω_2 + z ),

is equivalent to (2.6) and, by Proposition 2.1, it admits unique analytic solutions ω_1, ω_2 : C^+ → C^+ with Im ω_j(z) ≥ Im z; cf. (2.13). When no confusion can arise, we simply write Φ(ω_1, ω_2, z). The system is called linearly S-stable at (ω_1, ω_2) if the partial Jacobian matrix of (3.6) at (ω_1, ω_2) is invertible with inverse bounded in norm by S, for some positive constant S. In particular, ω_1 and ω_2 are then Lipschitz continuous with constant 2S. A more detailed analysis yields the following local stability result of the system Φ_{μ_1,μ_2}(ω_1, ω_2, z) = 0.

Lemma 3.3 (Proposition 4.1 in [1]). Fix z_0 ∈ C^+. Assume that the functions ω̃_1, ω̃_2, r_1, r_2 satisfy the perturbed system Φ_{μ_1,μ_2}(ω̃_1(z), ω̃_2(z), z) = r(z) := (r_1(z), r_2(z)), with |r(z_0)| ≤ δ, where ω_1(z), ω_2(z) solve the unperturbed system Φ_{μ_1,μ_2}(ω_1, ω_2, z) = 0 with Im ω_1(z) ≥ Im z and Im ω_2(z) ≥ Im z, z ∈ C^+. Assume that there is a constant S such that the system is linearly S-stable at (ω_1(z_0), ω_2(z_0)), and assume in addition that there are strictly positive constants K and k, with k > δ and with k² > δKS, such that (3.12) holds. In Sect. 7, we will apply Lemma 3.3 with the choices μ_1 = μ_A and μ_2 = μ_B. We thus next show that the system Φ_{μ_A,μ_B}(ω_A, ω_B, z) = 0 is S-stable, for all z ∈ S_I(0, 1), and that (3.12) holds uniformly on S_I(0, 1); see (2.18) for the definition.

Lemma 3.4 (Lemma 5.1 and Corollary 5.2 of [1]). Let μ_A, μ_B be the probability measures from (2.11) satisfying the assumptions of Theorem 2.5. Let ω_A, ω_B denote the associated subordination functions of (2.13). Let I be the interval in Theorem 2.5. Then for N sufficiently large, the system Φ_{μ_A,μ_B}(ω_A, ω_B, z) = 0 is S-stable with some positive constant S, uniformly on S_I(0, 1). Moreover, there exist two strictly positive constants K and k such that, for N sufficiently large, the bounds (3.15) hold. Under the assumptions of Lemma 3.4, the estimates in (3.15) can be extended as follows. There is a constant k̃ > 0 such that (3.16) holds. This follows by combining (3.15) with the Nevanlinna representations in (2.8).
We conclude this section by mentioning that the general perturbation result in Lemma 3.3 combined with Lemma 3.4, can be used to prove (2.23). We refer to [1] for details.

Partial Randomness Decomposition
We use a decomposition of Haar measure on the unitary groups obtained in [17] (see also [28]): For a Haar distributed unitary matrix U ≡ U N , there exist a random vector v 1 , which is independent of v 1 , such that one has the decomposition where and where θ 1 is the argument of the first coordinate of the vector v 1 . More generally, for any i ∈ 1, N , there exists an independent pair (v i , U i ), with v i a uniformly distributed unit vector v i and with U i ∈ U (N − 1) a Haar unitary, such that one has the decomposition where U i is the unitary matrix with e i as its ith column and U i as its (i, i)-matrix minor, and θ i is the argument of the ith coordinate of v i . In addition, using the definition of R i and U i , we note that With the above notation, we can write where we introduced the shorthand notation We further define It is well known that for a uniformly distributed unit vector v i ∈ C N , there exists a Gaussian vector (4.5) By definition, θ i is also the argument of g ii . Set and introduce an N C (0, N −1 ) variable g ii which is independent of the unitary matrix U and of g i . Then, we denote In addition, by definition, we have In subsequent estimates for G i j , it is convenient to approximate r i by in the decomposition U = −e iθ i R i U i , without changing the randomness of U i . To estimate the precision of this approximation, we require more notation: Let Correspondingly, we denote The following lemma shows that r i can be replaced by w i in Green function entries at the expense of an error that is below the precision we are interested in.
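The Householder reflection in this decomposition can be checked numerically. In the sketch below (normalizations and signs are as we reconstruct them from the text, and should be treated as assumptions), r_1 = √2 (e_1 + e^{−iθ_1} v_1)/‖e_1 + e^{−iθ_1} v_1‖_2 and R_1 = I − r_1 r_1^*; then R_1 is a Hermitian involution with R_1 e_1 = −e^{−iθ_1} v_1, so that −e^{iθ_1} R_1 maps e_1 to the column v_1.

```python
import numpy as np

# Householder reflection sending e_1 to the random unit vector v (up to
# the phase -e^{i theta}), as in the partial randomness decomposition.
rng = np.random.default_rng(4)
N = 6
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)
v = g / np.linalg.norm(g)            # uniform unit vector (a column of U)
theta = np.angle(v[0])

e1 = np.zeros(N, dtype=complex); e1[0] = 1.0
u = e1 + np.exp(-1j * theta) * v     # note u[0] = 1 + |v_1| > 0 is real
r = np.sqrt(2) * u / np.linalg.norm(u)
R = np.eye(N) - np.outer(r, r.conj())

herm = np.max(np.abs(R - R.conj().T))          # R is Hermitian ...
invol = np.max(np.abs(R @ R - np.eye(N)))      # ... and an involution
col_err = np.max(np.abs(-np.exp(1j * theta) * (R @ e1) - v))
print(herm, invol, col_err)                    # all at machine precision
```

Since ‖r‖_2² = 2, the algebra R² = I − 2rr* + (rr*)² = I is immediate, which is why R is unitary as well.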
Proof of Lemma 4.1. Fix i, j, k ∈ 1, N . We first note that By the concentration inequalities in Lemma 3.1, and g ii , g ii ∼ N C (0, N −1 ), we see that where we have used (4.5). Plugging the estimates in (4.13) into (4.12) and using the fact (4.14) Denote Fix now z ∈ C + . Dropping z from the notation, a first order Neumann expansion of the resolvent yields Observe that the second term on the right side of (4.15) is a polynomial in the terms with coefficients of the form δ k 1 1i δ k 2 2i , for some nonnegative integers k 1 , k 2 such that k 1 + k 2 ≥ 1. By assumption (4.10), the fact B i e i = b i e i , and assumption (2.12), we further observe that the first four terms in (4.16) are stochastically dominated by one. The last four terms are also stochastically dominated by one as follows from the trivial fact e * i B i e i = b i and Lemma 3.1. The terms in the second line of (4.16) are stochastically dominated by with Q i = I or B i , and with x i = e i or g i , where the last step follows from (4.10).
Note that the terms in the second line of (4.16) appear only linearly in (4.15). Hence, (4.14), (4.17) and the order one bound for the first and last four terms in (4.16) lead to (4.11).

Concentration with Respect to the Vector g i
In this section, we show that G^(i)_ii concentrates around the partial expectation E_{g_i}[G^(i)_ii], where E_{g_i} denotes the expectation with respect to the collection (Re g_ij, Im g_ij)_{j=1}^N. Besides the diagonal Green function entries G^(i)_ii, the identification of E_{g_i}[G^(i)_ii], carried out in Sects. 6 and 7, involves the quantities T_i and S_i. From a technical point of view, it is convenient to be able to go back and forth between T_i, S_i and their expectations E_{g_i}[T_i], E_{g_i}[S_i]. Thus after establishing concentration estimates for G^(i)_ii in Lemma 5.1 below, we establish in Corollary 5.2 concentration estimates for T_i and S_i, where we also give rough bounds on T_i, S_i and related quantities. We need some more notation: for a general random variable X we define

IE_{g_i}[X] := X − E_{g_i}[X]. (5.2)

The main task in this section is to prove the following lemma.
Lemma 5.1. Suppose that the assumptions of Theorem 2.5 are satisfied and let γ > 0. Fix z = E + iη ∈ S I (η m , 1) and assume that Proof of Lemma 5.1. In this proof we fix z ∈ S I (η m , 1). Recall the definition of G i (z) in (4.4) and note that G i (z) is independent of v i (or g i ). It is therefore natural to expand G (i) (z) around G i (z) and to use the independence between G i (z) and g i in order to verify the concentration estimates. However, by construction, we have which may be as large as 1/η, depending on a i , b i and z. To circumvent problems coming from instabilities in G i ii (z), we may use a "regularization" trick to enhance stability in the e i -direction: instead of considering the Green function of H (i) = A + B (i) directly, we first consider the (z-dependent) matrix Some algebra then reveals that . (5.8) By assumption (5.3) and identity (5.7), we have Setting j = i in (5.8) and expressing the denominator on the right side by using (5.9)-(5.10), we get In particular, together with Lemma 3.4 and Im ω B (z) ≥ Im z, this implies that the absolute value of the denominator on the right side of (5.8) is bounded from below by some strictly positive constant. Thus, applying IE g i on both sides of (5.10), we obtain the concentration estimate in (5.4).
In the rest of the proof, we verify (5.10). Consider next the matrix since Im ω B is uniformly bounded from below on S I (η m , 1) by Lemma 3.4. We now expand G {i} (z) around G [i] (z) and use the independence among G [i] (z) and g i . For simplicity, we hereafter drop the z-dependence from the notation. We start with noticing that where we introduced Iterating the rank-one perturbation formula (3.2) once, we obtain Substituting the second identity in (5.16) to the first one, we obtain Taking the (i, j)th matrix entry in (5.17), we get We now rewrite (5.19) as .
, it suffices to verify the following statements to show (5.10): We first show claim (i). Substituting the definitions in (5.15) into (5.18), we have Let Q i 1 and Q i 2 each stand for either I or B i . Recalling that w i = e i + g i and that g i ∼ N C (0, N −1 I ) is a complex Gaussian vector, we compute To bound the right side of (5.24) we observe that |( ii | ≺ 1, where we used that e i is an eigenvector of B i and (5.13). (Notice that, for simplicity, here and at several other places we consistently use the notation ≺ even when the stronger ≤ or relations would also hold, i.e. we use the concept stochastic domination even for estimating almost surely bounded or deterministic quantities.) To control the second term on the right side of (5.24), we note that a first order Neumann expansion of the resolvents yields where we used the boundedness of b i , ω B (z), Q i 1 and Q i 2 . Notice next the identities for j ∈ 1, N , with z = E + iη and |G| 2 = G * G. The second identity in (5.26) is the Ward identity that is valid for the Green function of any self-adjoint operator and it can be checked by spectral calculus. For the first identity in (5.26), recalling the definition in (5.12) and that e * j (A + B i )e i = (a i + b i )δ i j , one sees that for any fixed i, , ii | 2 thus the first identity in (5.26) with j = i follows. For j = i, one can see the first identity of (5.26) by applying the Ward identity to the minor of G [i] , with ith row and ith column removed. Since |G Since H i is a Hermitian finite-rank perturbation of H , we can apply (3.3) to conclude that We will now show that tr Q i 1 G i Q i 2 is bounded. Using the resolvent identities and tr B i = tr B = 0, we get thus to control tr Q i 2 Q i 1 G i we need to bound tr (A − z) k G i for k = 0, 1, 2. 
Since H i is a Hermitian finite-rank perturbation of H , we can apply (3.3) to conclude that Thus, returning to (5.24), we showed Using the Gaussian concentration estimates in (3.1) and w i = e i + g i , we obtain where we also used that e i is an eigenvector of B i , that B i is bounded and (5.26). In the last step (5.13) and (5.30) were used. Combined with (5.31) we thus proved For a later use we remark that, combining (5.28) and (5.32), we also proved In a very similar way we get, recalling that tr B = 0 and B ≺ 1, that To deal with terms containing four or six factors of w i in IE g i [ i ] (see (5.23)), we use the following rough bound. For general random variables X and Y satisfying |X |, |Y | ≺ 1, we have with k ∈ N. Recalling further the shorthand notation m ≡ m μ A μ B and from (2.13) that we get from the above that Thus from (5.34) we obtain Plugging (5.40) into (5.23), using the identity ω A + ω B = z − 1/m and taking the expectation, a straightforward computation shows that Then from Lemma 3.4 one observes that statement (ii) of (5.22) holds. In fact, the first term on the right side of (5.41) is bounded away from zero uniformly on z ∈ S I (η m , 1). We move on to statement (iii) of (5.22). Let Q i 1 and Q i 2 each stand again for either I or B i . Then we note that  (η m , 1) and assume that Proof. Using once more (3.2), we can write To prove (5.45), we follow, mutatis mutandis, the proof of (5.4) by replacing G For instance, for T i the counterpart of (5.8) is Now, according to (5.11), (5.10) and the bound |T i | ≺ 1 (c.f. (5.44)), it suffices to show The proof of (5.46) is nearly the same as the one of (5.10). One can also use a similar argument for S i by using the bound |S i | ≺ 1 from (5.44). We omit the details.
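The Ward identity invoked in (5.26) holds exactly for the Green function of any Hermitian matrix and is easy to verify numerically (test matrix and parameters below are our own): for G(z) = (H − z)^{−1} with z = E + iη, each row satisfies Σ_j |G_ij(z)|² = Im G_ii(z)/η.

```python
import numpy as np

# Ward identity: sum_j |G_ij|^2 = Im G_ii / eta for Hermitian H.
# Proof idea: G G^* = (G - G^*)/(2 i eta) by the resolvent identity.
rng = np.random.default_rng(5)
N, E, eta = 200, 0.3, 0.05
H = rng.standard_normal((N, N)); H = (H + H.T) / np.sqrt(2 * N)
G = np.linalg.inv(H - (E + 1j * eta) * np.eye(N))

lhs = np.sum(np.abs(G) ** 2, axis=1)     # row sums of |G_ij|^2
rhs = G.diagonal().imag / eta
ward_err = np.max(np.abs(lhs - rhs))
print(ward_err)                           # machine-precision agreement
```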

Identification of the Partial Expectation E_{g_i} G^(i)_ii
In this section, we estimate the partial expectation E_{g_i}[G^(i)_ii], which together with the concentration inequalities in Lemma 5.1 leads to the following proposition. Recall the definition of S_i and T_i in (5.1).

Proposition 6.1. Suppose that the assumptions of Theorem 2.5 are satisfied and let (6.1) hold uniformly in i ∈ 1, N . Then (6.2) and (6.3) hold.

In the proof of Proposition 6.1 we will need the following auxiliary lemma, whose proof is postponed to the very end of this section.

Lemma 6.2. Under the assumptions of Proposition 6.1, the estimates in (6.4) and the bounds in (6.5) hold uniformly in i ∈ 1, N . Furthermore, the estimates in (6.6) hold.

Proof of Proposition 6.1. We estimate E g i [G (i) ii ] to establish (6.2) and (6.3). Recall the definition of H (i) and G (i) from (4.9). We start with the identity Since A is diagonal, we have Recalling the definitions in (4.7) and (4.8), we get the identity (6.9) for G (i) ii . Since moreover B is traceless by assumption (2.27), we have tr B i = tr B = 0. Thus the a priori estimates in (6.1), the bound in (5.44), and the concentration estimates in (6.10) (cf. Lemma 3.1), valid for all j ∈ 1, N , imply that g * i B i G (i) e i is the only relevant term in (6.9). Thus, recalling the definition of S i in (5.1), we obtain (6.11).

Using integration by parts for complex Gaussian random variables, we compute E g i [S i ] next. Regarding g and ḡ as independent variables for computing ∂ ḡ f (g, ḡ), we have (6.12) for differentiable functions f : C 2 → C. Using (6.12) with σ 2 = 1/N for each component of g i = (g i1 , . . . , g i N ), we have (6.13). Using the definitions in (4.7), (4.8) and regarding g ik , ḡ ik as independent variables, we have (6.14), so that (6.15) holds. Since e i is an eigenvector of B i with eigenvalue b i , we further get (6.16) from (6.15). Plugging (6.16) into (6.13) and rearranging, we get (6.17).

We next claim that the last two terms on the right side of (6.17) are small. Using the boundedness of G (i) ii (following from the a priori estimate (6.1)), the bound (5.44), the concentration estimates in (6.10), estimate (6.5) of the auxiliary Lemma 6.2, and the trivial bounds in (6.18), we see that the last two terms on the right side of (6.17) are indeed negligible, i.e.
where we also used the definitions of T i and S i in (5.1). From assumption (6.1) and Corollary 5.2, we have the bounds in (6.20). We hence obtain from (6.19), (6.5), and the concentration estimates in (6.6), (5.4) that (6.21) holds. Repeating the above computations for E g i [T i ], we similarly obtain (6.22). Now, using the bounds in (6.20), the estimates (6.4) and |tr G (i) − tr G| ≺ 1/(N η) (following from (3.3)), we obtain from (6.21) and (6.22) the equations (6.23) and (6.24).

We first approximately solve (6.24) for E g i [T i ] to show, under the assumptions of Proposition 6.1, that (6.25) holds. To see this, we recall (6.8) and (6.11), which together with assumption (6.1) imply (6.26). By the concentration estimate (5.45), we also have the corresponding concentration bound. In addition, by the identity BG = I − (A − z)G, assumption (6.1) and equality (5.38), we have, using the shorthand notation m ≡ m μ A ⊞ μ B , the formula (6.27). Substituting (6.26) and assumption (6.1) into (6.24), and using |T i |, |S i | ≺ 1, we obtain (6.25). Using (6.27) and the second equation of (2.13), we have (6.29).

Then, solving (6.23) and (6.24) for E g i [S i ], we obtain (6.30). Averaging over the index i and reorganizing, we get (6.31). Now, recalling the concentration of S i in (5.45) and estimate (6.11), we have (6.32). Note that under assumption (6.1), we can use Corollary 5.2 to get (5.44), which together with (6.1) implies that the assumptions in Lemma 4.1 in the case i = j = k are satisfied. Then, by (4.11) with i = j = k and (6.8), we get (6.33) for all i ∈ 1, N . Using (6.32) and (6.33) we obtain (6.34). Substituting (6.34) and assumption (6.1) into the right side of (6.31), and using |tr G| ∼ 1 (following from (6.27)) and |T i | ≺ N −γ/4 , we obtain (6.35).

Now, plugging (6.35) back into (6.30) gives (6.36), which together with (6.8) and (6.32) implies (6.37), in light of the definition of ω c B (z) in (2.30). By assumption (6.1) we see that ω c B (z) is close to ω B (z). Hence by (3.15), we also have Im ω c B (z) ≥ c for some positive constant c. Therefore, we get (6.2) from (6.37).
Then (6.36) and (6.2), together with the definition of ω c B (z) in (2.30) and the concentration of S i in (5.2), imply the estimate of S i in (6.3).
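The Gaussian integration-by-parts identity (6.12), E g [g f (g, ḡ)] = σ 2 E g [∂ ḡ f (g, ḡ)], and its real counterpart used in the Appendix, can be sanity-checked by Monte Carlo; the polynomial test functions below are our own choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 1_000_000, 1.0

# Complex Gaussian N_C(0, sigma^2): independent real and imaginary parts.
g = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) * np.sqrt(sigma2 / 2)

# Complex integration by parts, E[g f(g, gbar)] = sigma^2 E[d_gbar f],
# tested on f(g, gbar) = gbar^2 g, for which d_gbar f = 2 gbar g = 2|g|^2.
lhs = np.mean(g * np.conj(g) ** 2 * g)          # = E[|g|^4] = 2*sigma^4
rhs = sigma2 * np.mean(2 * np.abs(g) ** 2)      # = 2*sigma^4
assert abs(lhs - rhs) < 0.05

# Real integration by parts, E[g f(g)] = sigma^2 E[f'(g)],
# tested on f(g) = g^3, for which f'(g) = 3 g^2.
h = rng.standard_normal(n) * np.sqrt(sigma2)
assert abs(np.mean(h * h ** 3) - sigma2 * np.mean(3 * h ** 2)) < 0.05
print("integration by parts verified")
```

With σ 2 = 1/N per component, as in the proof, the same identities apply coordinatewise to g i .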
We conclude this section with the proof of Lemma 6.2.
Proof of Lemma 6.2. We start by invoking the finite-rank perturbation formula (3.3), which shows that replacing G (i) by G produces a negligible error. Hence, it suffices to verify (6.4) and (6.5) with G (i) replaced by G. Recalling from Sect. 4 that R i = I − r i r * i and using the fact that R i is a Householder reflection (indeed ‖r i ‖ 2 2 = 2 by construction), we have B i = R i B R i . Then we write (6.39). Using ‖G‖ ≤ 1/η, we immediately get the deterministic bound |d i | ≤ C/(N η) for some numerical constant C. Together with (6.39) this implies the first estimate in (6.4). The second estimate in (6.4) is obtained in a similar way. The bounds in (6.5) follow by combining the sharp formulas for tr (BG) and tr (BG B) from (6.27), (6.35) with the estimates in (6.4).
To prove (6.6), we set Q i = B i or (B i ) 2 and note the resulting identity, where we used that g i and G (i) are independent, together with (3.3) once more.
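The algebraic facts about R i used in this proof, that ‖r i ‖ 2 2 = 2 so that R i = I − r i r * i is a Hermitian unitary involution, and that the decomposition sends e i to the phase-rotated column of the Haar unitary, can be checked directly; the dimension and index below are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
N, i = 8, 0

# Random complex unit vector v, playing the role of a column of a Haar unitary.
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)
v /= np.linalg.norm(v)

theta = np.angle(v[i])                        # argument of the i-th coordinate
e = np.zeros(N, dtype=complex)
e[i] = 1.0

u = np.exp(-1j * theta) * v                   # phase-rotated so that u[i] > 0
r = np.sqrt(2) * (e + u) / np.linalg.norm(e + u)
R = np.eye(N) - np.outer(r, r.conj())         # Householder reflection

assert abs(np.dot(r.conj(), r) - 2) < 1e-12   # ||r||_2^2 = 2 by construction
assert np.allclose(R @ R.conj().T, np.eye(N)) # R is unitary
assert np.allclose(R @ R, np.eye(N))          # R is an involution
assert np.allclose(R @ e, -u)                 # R e_i = -e^{-i theta} v
print("Householder reflection verified")
```

The last assertion is exactly the mechanism behind the partial randomness decomposition used in the next section: the unitary −e^{iθ} R maps e i to the sphere-uniform vector v.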

Proof of Theorem 2.5: Inequalities (2.20) and (2.22)
In this section, we prove the estimates (2.20) and (2.22) of Theorem 2.5 via a continuity argument. We also prove Theorem 2.6. First, let us recall the matrix ℋ and its Green function 𝒢 defined in (2.28) and (2.29); these are the natural counterparts of H and G with the roles of A and B, as well as the roles of U and U * , interchanged. We can apply a similar partial randomness decomposition to the unitary U * in ℋ as we did for U in H in Sect. 4. This means that, for any i ∈ 1, N , there exists an independent pair (ṽ i , Ũ i ), uniformly distributed on S N −1 C and U (N − 1), respectively, such that with r̃ i := √2 (e i + e −i θ̃ i ṽ i )/‖e i + e −i θ̃ i ṽ i ‖ 2 , we have the decomposition U * = −e i θ̃ i R̃ i U i , where θ̃ i is the argument of the ith coordinate of ṽ i , R̃ i := I − r̃ i r̃ * i , and U i is the unitary matrix with e i as its ith column and Ũ i as its (i, i)-matrix minor. Analogously to g i defined in (4.6), we define a Gaussian vector g̃ i = (g̃ i1 , . . . , g̃ i N ) ∼ N C (0, N −1 I ) to approximate e −i θ̃ i ṽ i . Setting w̃ i := e i + g̃ i and W̃ i := I − w̃ i w̃ * i , we define the analogues of (5.1) for all i ∈ 1, N . Calligraphic letters are used to distinguish the decompositions of ℋ from the decompositions of H .

Next, we introduce the z-dependent random variable Λ d (z). Moreover, for any δ ∈ [0, 1] and z ∈ S I (η m , 1), we define the event Ω d (z, δ) in (7.2). The subscript d refers to "diagonal" matrix elements. With the above notation, we have the following lemma.
Lemma 7.1. Suppose that the estimate (7.3) holds for all D > 0 and N ≥ N 1 (D, γ, ε), for some threshold N 1 (D, γ, ε). Then we also have (7.4).

Proof. In this proof we fix z ∈ S I (η m , 1). By the definition of ≺ in Definition 1.1, we see that assumption (7.3) implies (7.5) and (7.6). Hence, we can use Corollary 5.2 to get (5.44). Together with the boundedness of G (i) ii , this ensures that the assumptions of Lemma 4.1 are satisfied when i = j = k. Thus (4.11) holds when i = j = k. Hence, invoking (7.5) and Proposition 6.1, we get (7.7). Switching the roles of A and B as well as of U and U * , and further using (2.31), we also get (7.8) under (7.6).

Now, we state the conclusions (7.7) and (7.8) in a more explicit quantitative form assuming (7.3), which is a quantitative form of (7.5)-(7.6). Namely, we show that the inequalities in (7.9) hold on an event determined as the intersection of the "typical" events in all the concentration estimates in Sects. 4-6. To see this more precisely, we go back to the proofs in these sections. The concentration estimates always involved quantities of the form E g i [g * i Q x] with x = g i , e i and some explicit matrix Q that is independent of g i but often z-dependent. The total number of such estimates was linear in N . Thus, according to Lemma 3.1, for any (small) ε > 0 and (large) D > 0, there exists an event Ω d (z, D, ε) with (7.10), such that all estimates of the form (7.11) in Sects. 4-6 hold on Ω d (z, D, ε) for all N ≥ N 2 (D, ε). In addition, the threshold N 2 (D, ε) is independent of the spectral parameter z. We now follow the proofs in Sects. 4-6 to the letter, but use (7.10), (7.11) and (7.3) instead of the ≺ relation. Instead of (7.7) and (7.8), we find that the analogous but more quantitative bounds in (7.9) hold on the intersection of the events Ω d (z, N −γ/4 ) and Ω d (z, D, ε).
It remains to show that, on the event Ω d (z, N −γ/4 ) ∩ Ω d (z, D, ε), the estimates in (7.12) hold when N ≥ N 3 (D, γ, ε).
To this end, we use the stability of the system Φ μ A ,μ B (ω A , ω B , z) = 0, as formulated in Lemma 3.3. By the definition of the approximate subordination functions ω c A (z) and ω c B (z) in (2.30), by the identity (2.32), and by taking the average over the index i in the estimates in (7.9), we get the system of equations (7.13), where the error terms r A and r B satisfy (7.14) for N ≥ N 3 (D, γ, ε). Using the definition of Ω d (z, δ) in (7.2), the bounds in (7.9), and the fact that z ∈ S I (η m , 1), so that ω A (z) and ω B (z) are well separated from the real axis, we have (7.15) for N ≥ N 3 (D, γ, ε). Hence, plugging the third equation of (7.13) into the first two and using (3.15) together with (7.14), we get the required smallness for N ≥ N 3 (D, γ, ε). Therefore, by Lemma 3.3, we get (7.12). This completes the proof of Lemma 7.1.

Given Lemma 7.1, we next prove Theorem 2.5 via a continuity argument similar to the one in [19].
Proof of (2.20) of Theorem 2.5. Using Theorem 1.2 (i) of [26] together with Lemma C.1 of [26], we see that for η = 1 the required initial estimate holds if 0 < γ ≤ 1/7 (say). In addition, owing to the estimate ‖G‖ ≤ 1/η, assumption (4.10) obviously holds for η = 1. Hence, by Lemma 4.1 in the case i = j = k and its analogue for 𝒢, for any E ∈ I and D > 0, (7.18) holds. For simplicity, we omit the real part E from the notation and rewrite (7.18) accordingly. Our aim is to show that (7.20) holds for any η ∈ [η m , 1].

To see (7.20), we first notice that, by the Lipschitz continuity of the Green function and of the subordination functions ω A (z) and ω B (z) (see (3.9)), we have (7.21), where the last step is obtained by choosing γ > 0 sufficiently small. Now, we start from (7.18). By (7.21), we get (7.22). Hence, we can use Lemma 7.1, whose conclusion together with (7.21) implies (7.20) with η = 1. Now, replacing 1 by 1 − N −5 , we get from (7.22) and (7.18) that (7.23) holds for all N ≥ N 3 (D, γ, ε). Now, using (7.23) instead of (7.18), we get (7.20) for η = 1 − N −5 . Iterating this argument, we obtain the same bounds for any η ∈ [η m , 1] ∩ N −5 Z. Hence, for all N ≥ N 3 (D, γ, ε), we have the corresponding high-probability bound, which further implies, by using (7.21), its analogue for all N ≥ N 3 (D, γ, ε). Then, using Lemma 7.1 again, we obtain (7.25) uniformly for all η ∈ [η m , 1] ∩ N −5 Z, when N ≥ N 3 (D, γ, ε). Finally, by continuity, we can extend the bounds from z in the discrete lattice to the entire domain S I (η m , 1). We then obtain (2.20).

Proof of Theorem 2.6. Using the spectral decomposition of the Green function G, we have (7.26). Fix a small γ > 0. For any λ i ∈ I, we set E = λ i on the right side of (7.26) and use (2.20) to bound the left side of it with z = λ i + iη, η = N −1+γ . Then we obtain the claimed bound. Since γ > 0 is arbitrarily small, we get (2.26). This completes the proof of Theorem 2.6.
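The spectral decomposition behind (7.26), G ii (z) = Σ k |u k (i)| 2 /(λ k − z), and the resulting eigenvector bound |u k (i)| 2 ≤ η Im G ii (λ k + iη), which drives the delocalization statement, can be illustrated on any small Hermitian matrix; the random test matrix below is our own example.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
W = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (W + W.conj().T) / np.sqrt(8 * N)        # a Hermitian test matrix

lam, U = np.linalg.eigh(H)                   # eigenvalues and eigenvectors

k, i = N // 2, 3                             # a bulk eigenvalue and a coordinate
eta = 1e-3
z = lam[k] + 1j * eta

# Green function entry via direct inversion and via spectral decomposition.
G_ii = np.linalg.inv(H - z * np.eye(N))[i, i]
G_ii_spec = np.sum(np.abs(U[i, :]) ** 2 / (lam - z))
assert abs(G_ii - G_ii_spec) < 1e-8

# Keeping only the k-th term of Im G_ii gives the delocalization bound
#   |u_k(i)|^2 <= eta * Im G_ii(lam_k + i*eta).
assert np.abs(U[i, k]) ** 2 <= eta * G_ii.imag + 1e-12
print("spectral decomposition and delocalization bound verified")
```

In the proof, the local law (2.20) bounds Im G ii at scale η = N −1+γ, which is what turns this deterministic inequality into full delocalization of the eigenvectors.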

Proof of Theorem 2.5: Inequality (2.21)
In this section, we prove (2.21) of Theorem 2.5. Note that, from (7.25) in the proof of (2.20) of Theorem 2.5, we know that the estimates in (8.1) hold uniformly on S I (η m , 1). Taking (8.1) as an input, we follow the discussion in Sects. 5-7 to prove the estimate (2.21) with the following modifications. We introduce the quantities in (8.2) that generalize T i (z) and S i (z) defined in (5.1). In particular, T i (z) ≡ T i,i (z) and S i (z) ≡ S i,i (z), but we henceforth implicitly assume that i ≠ j. (We use a comma in the subscripts of T i,j , S i,j since they are not the entries of some matrix.) We often abbreviate T i,j ≡ T i,j (z) and S i,j ≡ S i,j (z). We first establish the concentration estimates for G (i) ij (see Lemma 8.1), and for T i,j and S i,j ; see Lemma 8.2. In Proposition 8.3 we then derive self-consistent equations for E g i [T i,j ] and E g i [S i,j ] which show, together with the concentration estimates, the desired bounds. We then close the argument via continuity.
We start with the analogue of Lemma 5.1 for the off-diagonal entries of G (i) .
Lemma 8.1. Suppose that the assumptions of Theorem 2.5 and the a priori bounds analogous to (6.1) hold for all i, j ∈ 1, N , i ≠ j. Then (8.4) holds.

Proof. To control G [i] jj , we recall from (5.12) that the matrix H [i] is block-diagonal; we thus have, for j ≠ i, an explicit formula, where A i and B i are the (i, i)-matrix minors of A and B, respectively (obtained by removing the ith column and ith row). The remaining part of the proof is nearly the same as that of Lemma 5.1. We omit the details.
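The structural fact used here, that the resolvent of a block-diagonal matrix is again block-diagonal, with blocks given by the resolvents of the minors, is elementary linear algebra; a small numerical illustration with matrices of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2 = 3, 5
A = rng.standard_normal((n1, n1)); A = A + A.T      # two Hermitian blocks
B = rng.standard_normal((n2, n2)); B = B + B.T

# Block-diagonal H, mimicking the structure of H^{[i]}.
H = np.block([[A, np.zeros((n1, n2))],
              [np.zeros((n2, n1)), B]])

z = 0.2 + 0.1j
G = np.linalg.inv(H - z * np.eye(n1 + n2))

# The resolvent inherits the block structure: off-diagonal blocks vanish
# and each diagonal block is the resolvent of the corresponding minor.
assert np.allclose(G[:n1, n1:], 0)
assert np.allclose(G[:n1, :n1], np.linalg.inv(A - z * np.eye(n1)))
assert np.allclose(G[n1:, n1:], np.linalg.inv(B - z * np.eye(n2)))
print("block-diagonal resolvent verified")
```

This is why the entries G [i] jj can be computed from the minors A i and B i alone.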
We have the following analogue of Corollary 5.2.
Lemma 8.2. Under the above assumptions, the bounds in (8.8) and the concentration estimates in (8.9) and (8.10) hold.

Proof. With the estimates in (8.1) and (8.8), the proof is analogous to that of Corollary 5.2. Here we get the conclusions for all z = E + iη ∈ S I (η m , 1) at once, since we use the uniform estimate (8.1) instead of assumption (5.43) for one fixed z. We omit the details.
Finally, we have the following counterpart to Proposition 6.1.

Proposition 8.3. Under the above assumptions, the estimates in (8.11) and (8.12) hold.

Proof. The proof is similar to that of Proposition 6.1. Having established the concentration inequalities in (8.4), it suffices to estimate E g i [G (i) ij ] to prove (8.11). We then start with (8.13). Choosing henceforth i ≠ j, mimicking the reasoning from (6.9) to (6.11), and using (8.9), we arrive at (8.14). Then, instead of (6.17), we obtain (8.15), where we directly used the definitions in (8.2). Then, similarly to (6.23), using the concentration estimates in Lemma 8.1 and Lemma 8.2, the Gaussian concentration estimates in (6.10), the bound (6.18) and Lemma 6.2 for tracial quantities, we obtain (8.16). Analogously, we also have (8.17). Solving (8.16) and (8.17) for E g i [S i,j ], we obtain (8.18). Using (6.35), the assumption |G (i) ij | ≺ 1 and the bound |T i,j | ≺ 1 of (8.9), we obtain a bound on E g i [G (i) ij ], which together with (8.13), (8.14) and the concentration estimate (8.10) proves the estimate in (8.11). Next, we bound S i,j . Starting from (8.18), we directly get the second estimate in (8.12) from the Green function bound (8.11) and the concentration estimate (8.10).
It remains to estimate T i,j . Plugging the bound on G ij in (8.11) and the bound on S i,j in (8.12) into the equation (8.17), we obtain a closed relation for E g i [T i,j ]. Invoking the estimate (6.29), we get |E g i [T i,j ]| ≺ 1/√(N η). Then the first estimate in (8.12) follows from the concentration estimate for T i,j in (8.10). This completes the proof.
Having established Lemma 8.1 and Proposition 8.3, we next prove (2.21) of Theorem 2.5 via a continuity argument similar to the proof of (2.20).
Proof of (2.21) of Theorem 2.5. Fixing any z ∈ S I (η m , 1) and using Proposition 8.3, under the assumption (8.20) we have (8.21)-(8.23). Hence, in principle, it suffices to conduct a continuity argument from η = 1 to η = η m (similar to the proof of (2.20) of Theorem 2.5) to show that the bound (8.21) holds uniformly for z ∈ S I (η m , 1). However, in order to show that (8.23) also holds uniformly for z ∈ S I (η m , 1) quantitatively, we monitor G ij in the continuity argument as well. To this end, we introduce the z-dependent random variable Λ o (z) and the event Ω o (z, δ), the off-diagonal analogues of the quantities introduced in Sect. 7, with N −γ/4 replaced by 1. We also set δ = 1 in this proof. This is a quantitative description of the derivation of the first bound in (8.22) and (8.23) from (8.21). The main difference is that here Ω o (z) is the event defined as the intersection of the "typical" events in all the concentration estimates in Sects. 4-6 and in the proofs of Lemma 8.1 and Proposition 8.3, and of the event on which the bounds in (8.24) hold. Note that, by (8.1) and (8.8), we know that (8.24) holds with high probability uniformly on S I (η m , 1).

With the analogue of Lemma 7.1 for Ω o (z, δ = 1) and Ω o (z), we conduct a continuity argument similar to the one in the proof of (2.20). Again, by the Lipschitz continuity of the Green function, it suffices to show estimate (2.21) on the lattice in S I (η m , 1) defined in (7.19). We fix E ∈ I ∩ N −5 Z, write z = E + iη, and decrease η from η = 1 down to N −1+γ in steps of size N −5 . The initial estimate Λ o (E + i) ≤ 1 for η = 1 follows directly from the trivial bounds ‖G (i) (z)‖, ‖G(z)‖ ≤ 1/η. Then one can show step by step that the analogue of (7.20) holds for any η ∈ [η m , 1]. The remaining proof is nearly the same as its counterpart in the proof of (2.20). We thus omit the details.
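The Lipschitz continuity of the Green function that drives these continuity arguments follows from the resolvent identity G(z) − G(z′) = (z − z′) G(z) G(z′), which gives ‖G(z) − G(z′)‖ ≤ |z − z′|/(η η′); for η, η′ ≥ N −1 this yields the bound ‖G(z) − G(z′)‖ ≤ N 2 |z − z′| quoted in the Appendix. A quick numerical check on a random symmetric matrix of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 40
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)               # a real symmetric test matrix

def G(z):
    """Resolvent (Green function) of H at spectral parameter z."""
    return np.linalg.inv(H - z * np.eye(N))

eta = 0.05
z1 = 0.1 + 1j * eta
z2 = z1 + 1e-3                               # a small step, as in the lattice argument

# Resolvent identity: G(z1) - G(z2) = (z1 - z2) G(z1) G(z2).
assert np.allclose(G(z1) - G(z2), (z1 - z2) * G(z1) @ G(z2))

# Consequent Lipschitz bound with constant 1/(eta * eta').
diff = np.linalg.norm(G(z1) - G(z2), 2)
assert diff <= abs(z1 - z2) / (eta * eta) + 1e-12
print("resolvent Lipschitz bound verified")
```

This deterministic bound is what allows the discrete lattice with spacing N −5 to control the whole domain S I (η m , 1).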
Similarly to (4.6), we define g ik := sgn(v ii ) g̃ ik for k ≠ i, and introduce an N (0, N −1 ) variable g ii , which is independent of the orthogonal matrix U and of g̃ i . Let g i := (g i1 , . . . , g i N ) and note that g i ∼ N R (0, N −1 I ). Then we set w i := e i + g i and W i := I − w i w * i as before. With these modifications, we follow the proofs in Sects. 4-7 verbatim. The only difference is the derivation of (6.19). Instead of (6.12), we use the integration by parts formula E[g f (g)] = σ 2 E[ f ′ (g)] for a real Gaussian g ∼ N (0, σ 2 ) and differentiable functions f : R → R. Correspondingly, instead of (6.14), we have its real counterpart. Hence, we get (e i e * k + e k e * i + e k g * i + g i e * k ) G (i) e j instead of (6.15). Substitution into the identity yields E g i [g * i B i G (i) e j ] = (r.h.s. of (8.15)) plus the supplementary terms in (A.3), where we introduced b i := w * i B i w i . Note that the last two terms were discussed in the unitary setup, and they were shown to be negligible. Therefore, to get (8.16) also in the orthogonal case, we rely on the following lemma to discard the supplementary small terms in (A.3). First, let us discuss the case i = j, which suffices for the proof of (2.20).
Lemma A.1. Under the assumptions of Proposition 6.1, the following bounds hold for all i ∈ 1, . . . , N .
Note that the proofs of Lemma 5.1, Lemma 6.1 and Lemma 7.1 still work, since we have the bounds (3.9), (3.14) and (3.15) as well. Although the bound in (3.9) should be replaced by 2S/|z − 1| in the case μ α = μ β , it is harmless for our proof. Hence, analogously to the proof of Theorem 2.5, one can use Lemma 3.3, Lemma 7.1 and the estimates (3.9), (3.14) and (3.15) to complete the proof of Proposition B.1. In particular, the proof in the case μ α ≠ μ β agrees exactly with the proof of Theorem 2.5. For the case μ α = μ β , we need to replace S by S/|z − 1| in Lemma 3.3, due to (B.6). In the sequel, we briefly illustrate the continuity argument in this case.

Let z, z′ ∈ S ς I (a, b), where z = E + iη and z′ = E + iη′, with η′ = η + N −5 . In addition, we set z 0 = z, ω 1 = ω A , ω 2 = ω B , ω̃ 1 = ω c A and ω̃ 2 = ω c B in Lemma 3.3. Suppose now that (B.5) holds for z′. Using the Lipschitz continuity of the Green function (i.e. ‖G(z) − G(z′)‖ ≤ N 2 |z − z′|) and of the subordination functions ω A (z) and ω B (z) (cf. (3.9) with S replaced by S/|z − 1|), we can choose δ in (3.11) to be (B.7). In light of the condition k 2 > δK S/|z − 1| (cf. the sentence above (3.12), with S replaced by S/|z − 1|), one needs to guarantee that δS ≤ |z − 1|ε for a sufficiently small constant ε > 0, which is a direct consequence of the assumption z ∈ S ς I (a, b) and (B.7). Note that ‖r(z)‖ 2 ≺ 1/√(N η) remains valid, since estimate (7.15) does not depend on the stability of the system Φ μ A ,μ B (ω A , ω B , z) = 0, as long as (3.14), (3.15) and (3.9) hold. The remaining parts of the proof are analogous to those of Theorem 2.5 and we thus omit the details.
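Finally, the global statement underlying the whole paper can be illustrated numerically: for A = B = diag(±1) with equal multiplicities, μ A ⊞ μ B is the arcsine law with density 1/(π√(4 − x 2 )) on (−2, 2), and the spectrum of A + U BU * with Haar U matches it already at moderate N. The sampling recipe below (QR of a Ginibre matrix with phase correction) is the standard way to draw a Haar unitary; the parameters are our own choices.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1000

def haar_unitary(n):
    """Sample a Haar-distributed unitary via QR of a complex Ginibre matrix."""
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))               # fix the phases of R's diagonal

# A and B with spectrum {+1, -1}, each eigenvalue with multiplicity N/2.
a = np.concatenate([np.ones(N // 2), -np.ones(N // 2)])
A, B = np.diag(a), np.diag(a)

U = haar_unitary(N)
eigs = np.linalg.eigvalsh(A + U @ B @ U.conj().T)

# mu_A boxplus mu_B is the arcsine law on (-2, 2); in particular the mass of
# [-1, 1] is (2/pi) * arcsin(1/2) = 1/3.
assert eigs.min() > -2.01 and eigs.max() < 2.01
frac = np.mean(np.abs(eigs) <= 1)
assert abs(frac - 1 / 3) < 0.03
print("empirical spectrum matches the arcsine law")
```

The local law proved in this paper sharpens this global convergence down to spectral windows of size slightly larger than 1/N in the bulk.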