Local Law of Addition of Random Matrices on Optimal Scale

The eigenvalue distribution of the sum of two large Hermitian matrices, when one of them is conjugated by a Haar distributed unitary matrix, is asymptotically given by the free convolution of their spectral distributions. We prove that this convergence also holds locally in the bulk of the spectrum, down to the optimal scales larger than the eigenvalue spacing. The corresponding eigenvectors are fully delocalized. Similar results hold for the sum of two real symmetric matrices, when one is conjugated by a Haar distributed orthogonal matrix.


Introduction
The pioneering work [31] of Voiculescu connected free probability with random matrices, as one of the most prominent examples of a noncommutative probability space is the space of Hermitian N × N matrices. On the one hand, the law of the sum of two free random variables with laws µ_α and µ_β is given by the free additive convolution µ_α ⊞ µ_β. On the other hand, in the case of Hermitian matrices, the law can be identified with the distribution of the eigenvalues. Thus the free additive convolution computes the eigenvalue distribution of the sum of two free Hermitian matrices. However, freeness is characterized by an infinite collection of moment identities and cannot easily be verified in general. A fundamental direct mechanism to generate freeness is conjugation by random unitary matrices. More precisely, two large Hermitian random matrices are asymptotically free if the unitary transfer matrix between their eigenbases is Haar distributed. The most important example is when the spectra of the two matrices are deterministic and the unitary conjugation is the sole source of randomness. In other words, if A = A^{(N)} and B = B^{(N)} are two sequences of deterministic N × N Hermitian matrices and U is a Haar distributed unitary, then A and UBU* are asymptotically free in the large N limit, and the eigenvalue distribution of A + UBU* is given by the free additive convolution µ_A ⊞ µ_B of the eigenvalue distributions of A and B.
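To fix ideas, here is a standard example from free probability (included for illustration only): if µ_α = µ_β = (1/2)(δ_{−1} + δ_{+1}), then the free additive convolution is the arcsine law,
d(µ_α ⊞ µ_β)(x) = dx/(π √(4 − x²)) on (−2, 2),
whereas the classical convolution gives (1/4)δ_{−2} + (1/2)δ_0 + (1/4)δ_{+2}. Thus, for A and B with eigenvalues ±1 in equal proportion, the spectrum of A + UBU* is asymptotically arcsine distributed. (This two-point-mass setting is precisely the special case treated separately in Appendix B.)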
Since Voiculescu's first proof, several alternative approaches have been developed, see e.g., [11,16,29,30], but all of them were global in the sense that they describe the eigenvalue distribution in the weak limit, i.e., on the macroscopic scale, tested against N-independent test functions (to fix the scaling, we assume that A^{(N)} and B^{(N)} are uniformly bounded).
Definition 1.1 (Stochastic domination). Let
X = ( X^{(N)}(v) : N ∈ N, v ∈ V^{(N)} ),  Y = ( Y^{(N)}(v) : N ∈ N, v ∈ V^{(N)} )
be two families of nonnegative random variables, where V^{(N)} is a possibly N-dependent parameter set. We say that Y stochastically dominates X, uniformly in v, if for all (small) ε > 0 and (large) D > 0,
sup_{v ∈ V^{(N)}} P( X^{(N)}(v) > N^{ε} Y^{(N)}(v) ) ≤ N^{−D}   (1.2)
for sufficiently large N ≥ N_0(ε, D). If Y stochastically dominates X, uniformly in v, we write X ≺ Y.
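A minimal example of this notion (ours, for orientation): if g ∼ N(0, 1), then |g| ≺ 1, since for any ε > 0 and D > 0 one has P(|g| > N^{ε}) ≤ e^{−N^{2ε}/2} ≤ N^{−D} for N large enough. Similarly, for g ∼ N_C(0, N^{−1} I_N) one has ||g||_∞ ≺ N^{−1/2}.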
We further rely on the following notation. We use the symbols O(·) and o(·) for the standard big-O and little-o notation. We use c and C to denote strictly positive constants that do not depend on N. Their values may change from line to line. For a, b ≥ 0, we write a ≲ b, a ≳ b if there is C ≥ 1 such that a ≤ Cb, a ≥ C^{−1}b, respectively.
We use bold font for vectors in C^N and denote the components as v = (v_1, ..., v_N) ∈ C^N. The canonical basis of C^N is denoted by (e_i)_{i=1}^N. For v, w ∈ C^N, we write v*w for the scalar product Σ_{i=1}^N v̄_i w_i. We denote by ||v||_2 the Euclidean norm and by ||v||_∞ = max_i |v_i| the uniform norm of v ∈ C^N. We denote by M_N(C) the set of N × N matrices over C. For A ∈ M_N(C), we denote by ||A|| its operator norm and by ||A||_2 its Hilbert–Schmidt norm. The matrix entries of A are denoted by A_ij = e_i* A e_j. We denote by tr A the normalized trace of A, i.e., tr A = (1/N) Σ_{i=1}^N A_ii. For v, w ∈ C^N, the rank-one matrix vw* has entries (vw*)_ij = v_i w̄_j. Let g = (g_1, ..., g_N) be a real or complex Gaussian vector. We write g ∼ N_R(0, σ² I_N) if g_1, ..., g_N are independent and identically distributed (i.i.d.) N(0, σ²) normal variables; and we write g ∼ N_C(0, σ² I_N) if g_1, ..., g_N are i.i.d. N_C(0, σ²) variables, where g_i ∼ N_C(0, σ²) means that Re g_i and Im g_i are independent N(0, σ²/2) normal variables. Finally, we use double brackets to denote index sets, i.e., ⟦n_1, n_2⟧ := [n_1, n_2] ∩ Z, for n_1, n_2 ∈ R.

Main results
2.1. Free additive convolution. In this subsection, we recall the definition of the free additive convolution. This is a shortened version of Section 2.1 of [1] added for completeness.
Given a probability measure µ on R, its Stieltjes transform, m_µ, on the complex upper half-plane C+ := {z ∈ C : Im z > 0} is defined by
m_µ(z) := ∫_R dµ(x)/(x − z).   (2.1)
Note that m_µ : C+ → C+ is an analytic function such that
lim_{η↗∞} iη m_µ(iη) = −1.   (2.2)
Conversely, if m : C+ → C+ is an analytic function such that lim_{η↗∞} iη m(iη) = −1, then m is the Stieltjes transform of a probability measure µ, i.e., m(z) = m_µ(z), for all z ∈ C+. (All probability measures considered will be assumed to be Borel.)
We denote by F_µ the negative reciprocal Stieltjes transform of µ, i.e.,
F_µ(z) := −1/m_µ(z),  z ∈ C+.   (2.3)
Observe that
lim_{η↗∞} F_µ(iη)/(iη) = 1,
as follows from (2.2). Note, moreover, that F_µ is an analytic function on C+ with nonnegative imaginary part.
The free additive convolution is the symmetric binary operation on probability measures on R characterized by the following result.
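For the reader's orientation, we recall the characterization in the following standard form, due to Belinschi–Bercovici and Chistyakov–Götze (this is the content of the result referred to below as Proposition 2.1, with the system (2.5)): given probability measures µ_1 and µ_2 on R, there exist unique analytic functions ω_1, ω_2 : C+ → C+ with Im ω_j(z) ≥ Im z and lim_{η↗∞} ω_j(iη)/(iη) = 1, j = 1, 2, such that
F_{µ1}(ω_2(z)) = F_{µ2}(ω_1(z)) = ω_1(z) + ω_2(z) − z,  z ∈ C+,
and the free additive convolution is determined by F_{µ1⊞µ2}(z) := F_{µ1}(ω_2(z)). The functions ω_1 and ω_2 are called the subordination functions.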
We are now all set to introduce our notion of regular bulk, B_{µ1⊞µ2}, of µ_1 ⊞ µ_2. Informally, we let B_{µ1⊞µ2} be the open set on which µ_1 ⊞ µ_2 has a continuous density that is strictly positive and bounded from above. For a formal definition we first introduce the set
U_{µ1⊞µ2} := int{ x ∈ R : lim_{η↘0} F_{µ1⊞µ2}(x + iη) exists and lim_{η↘0} F_{µ1⊞µ2}(x + iη) ≠ 0 }.   (2.9)
Note that U_{µ1⊞µ2} does not contain any atoms of µ_1 ⊞ µ_2. By the Luzin–Privalov theorem, the set {x ∈ R : lim_{η↘0} F_{µ1⊞µ2}(x + iη) = 0} has Lebesgue measure zero. In fact, an even stronger statement applies in the case at hand. Belinschi [5] showed that if x ∈ R is such that lim_{η↘0} F_{µ1⊞µ2}(x + iη) = 0, then it must be of the form x = a + b, where a and b are atoms of µ_1 and µ_2, respectively, with µ_1({a}) + µ_2({b}) ≥ 1. There can only be finitely many such x; thus U_{µ1⊞µ2} must contain an open nonempty interval.
The regular bulk is obtained from U µ1⊞µ2 by removing the zeros of f µ1⊞µ2 inside U µ1⊞µ2 .
Definition 2.4. The regular bulk of the measure µ_1 ⊞ µ_2 is the set
B_{µ1⊞µ2} := U_{µ1⊞µ2} \ { x ∈ U_{µ1⊞µ2} : f_{µ1⊞µ2}(x) = 0 }.
It is an open nonempty set on which µ_1 ⊞ µ_2 admits the density f_{µ1⊞µ2}. The density is strictly positive and thus (by Proposition 2.3) real analytic on B_{µ1⊞µ2}.

2.2. Definition of the model and assumptions. Let A ≡ A^{(N)} and B ≡ B^{(N)} be two sequences of deterministic real diagonal matrices in M_N(C), whose empirical spectral distributions are denoted by µ_A and µ_B, respectively. More precisely, writing A = diag(a_1, ..., a_N) and B = diag(b_1, ..., b_N), we set
µ_A := (1/N) Σ_{i=1}^N δ_{a_i},  µ_B := (1/N) Σ_{i=1}^N δ_{b_i}.   (2.11)
The matrices A and B actually depend on N, but we omit this from our notation. Proposition 2.1 asserts the existence of unique analytic functions ω_A and ω_B satisfying the analogue of (2.5) such that, for all z ∈ C+,
F_{µA}(ω_B(z)) = F_{µB}(ω_A(z)) = ω_A(z) + ω_B(z) − z.   (2.12)
We will assume that there are deterministic probability measures µ_α and µ_β on R, neither of them being a single point mass, such that the empirical spectral distributions µ_A and µ_B converge weakly, as N → ∞, to µ_α and µ_β, respectively. More precisely, we assume that
d_L(µ_A, µ_α) → 0,  d_L(µ_B, µ_β) → 0,   (2.13)
as N → ∞, where d_L denotes the Lévy distance. Proposition 2.1 asserts that there are unique analytic functions ω_α, ω_β satisfying the analogue of (2.5) such that, for all z ∈ C+,
F_{µα}(ω_β(z)) = F_{µβ}(ω_α(z)) = ω_α(z) + ω_β(z) − z.   (2.14)
Proposition 4.13 of [9] states that
d_L(µ_A ⊞ µ_B, µ_α ⊞ µ_β) ≤ d_L(µ_A, µ_α) + d_L(µ_B, µ_β),
so that µ_A ⊞ µ_B converges weakly to µ_α ⊞ µ_β as N → ∞. We consider the random matrix
H ≡ H^{(N)} := A + UBU*,   (2.15)
where U is an N × N Haar distributed unitary matrix. Our results also hold in the real case, when U is Haar distributed on the orthogonal group, O(N), of degree N. Throughout the main part of the paper the discussion will focus on the unitary case, while the orthogonal case is addressed in Appendix A. We introduce the Green function, G_H, of H and its normalized trace, m_H, by
G_H(z) := (H − z)^{−1},  m_H(z) := tr G_H(z),  z ∈ C+.   (2.16)
For simplicity, we frequently use the notation G(z) instead of G_H(z), and we write G_ij(z) ≡ (G_H)_ij(z) for the (i, j)th matrix element of G(z).

2.3. Main results. For a, b ≥ 0, b ≥ a, and I ⊂ R, let
S_I(a, b) := { z = E + iη ∈ C+ : E ∈ I, a < η ≤ b }.   (2.17)
In addition, for brevity, we set, for any given γ > 0,
η_m := N^{−1+γ}.   (2.18)
The main results of this paper are as follows.
Theorem 2.5. Let µ_α and µ_β be two compactly supported probability measures on R, and assume that neither is supported at a single point and that at least one of them is supported at more than two points. Assume that the sequences of matrices A and B in (2.15) are such that their empirical eigenvalue distributions µ_A and µ_B satisfy (2.13). Let I ⊂ B_{µα⊞µβ} be a nonempty compact interval. Then, for any fixed γ > 0, the estimates
max_{i∈⟦1,N⟧} |G_ii(z) − (a_i − ω_B(z))^{−1}| ≺ 1/√(Nη),  max_{i≠j} |G_ij(z)| ≺ 1/√(Nη),   (2.19), (2.20)
and
|m_H(z) − m_{µA⊞µB}(z)| ≺ 1/(Nη)   (2.21)
hold uniformly on S_I(η_m, 1) (see (2.17)), where η ≡ Im z, η_m is given in (2.18), and ω_B is the subordination function of µ_A ⊞ µ_B from (2.12).
Remark 2.1. The assumption that neither of µ_α and µ_β is a point mass ensures that the free additive convolution is not a simple translate. The additional assumption that at least one of them is supported at more than two points is made for brevity of the exposition here.
In Appendix B, we present the corresponding result for the special case when µ α and µ β are both convex combinations of two point masses.
Remark 2.2. Combining Lemma 3.3 and Lemma 3.4 below, one shows (see [1] for details) that
|ω_A(z) − ω_α(z)| + |ω_B(z) − ω_β(z)| + |m_{µA⊞µB}(z) − m_{µα⊞µβ}(z)| ≲ d_L(µ_A, µ_α) + d_L(µ_B, µ_β)   (2.22)
uniformly on S_I(0, 1), i.e., the Lévy distances of the empirical eigenvalue distributions of A and B from their limiting distributions control uniformly the deviations of the corresponding subordination functions and Stieltjes transforms. Note moreover that max_{z∈S_I(0,1)} |m_{µα⊞µβ}(z)| < ∞ by compactness of I and analyticity of m_{µ1⊞µ2}. Thus the Stieltjes–Perron inversion formula directly implies that (µ_A ⊞ µ_B)^{ac} has a density, f_{µA⊞µB}, inside I, and that the estimate (2.19) holds with ω_β in place of ω_B, up to the additional error d_L(µ_A, µ_α) + d_L(µ_B, µ_β), uniformly on S_I(η_m, 1), where η = Im z. Averaging over the index i, we get the corresponding statement for |m_H − m_{µA⊞µB}| with the same error bound.
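For orientation, a short computation (ours) behind the averaging step, assuming the form of (2.19) displayed above: by subordination, m_{µA⊞µB}(z) = m_{µA}(ω_B(z)) = (1/N) Σ_{i=1}^N (a_i − ω_B(z))^{−1}, and therefore
|m_H(z) − m_{µA⊞µB}(z)| = | (1/N) Σ_{i=1}^N ( G_ii(z) − (a_i − ω_B(z))^{−1} ) | ≤ max_{i∈⟦1,N⟧} |G_ii(z) − (a_i − ω_B(z))^{−1}|.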
Remark 2.3. Note that assumption (2.13) does not exclude that the matrix H has outliers in the large N limit. In fact, the model H = A + U BU * shows a rich phenomenology when, say, A has a finite number of large spikes; we refer to the recent works in [7,13,26].
Let λ_1, ..., λ_N be the eigenvalues of H, and u_1, ..., u_N the corresponding ℓ²-normalized eigenvectors. The following result shows complete delocalization of the bulk eigenvectors.

Theorem 2.6. Suppose that the assumptions of Theorem 2.5 hold and let I ⊂ B_{µα⊞µβ} be a nonempty compact interval. Then
max_{i : λ_i ∈ I} ||u_i||_∞² ≺ 1/N.   (2.24)

2.4. Outline of the proof strategy. Along with H = A + UBU*, we consider its unitary conjugate
ℋ := U*HU = U*AU + B,   (2.26)
and denote their Green functions by
G(z) := (H − z)^{−1},  𝒢(z) := (ℋ − z)^{−1},  z ∈ C+.   (2.27)
We write z = E + iη ∈ C+, with E ∈ R and η > 0, for the spectral parameter. In the sequel we often omit z ∈ C+ from the notation if no confusion can arise. Recalling (2.16), we have 𝒢(z) = U* G(z) U, and hence tr 𝒢(z) = tr G(z) = m_H(z). For brevity, we set Ã := U*AU, B̃ := UBU*.
The following functions will play a key role in our proof.
The approximate subordination functions are defined by
ω_A^c(z) := z − tr(Ã 𝒢(z))/tr 𝒢(z),  ω_B^c(z) := z − tr(B̃ G(z))/tr G(z).   (2.28)
Notice that the roles of A and B are not symmetric in these notations. By cyclicity of the trace, we may write
tr(Ã 𝒢(z)) = tr(A G(z)),  tr 𝒢(z) = tr G(z) = m_H(z).   (2.29)
We remark that the approximate subordination functions defined above are slightly different from the candidate subordination functions introduced in [26], which were later used in [1].
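As a consistency check (a two-line computation, using the definitions (2.28)–(2.29) as displayed above), these definitions combine into the exact identity recorded as (2.30) below: since (A + B̃ − z)G(z) = I and tr is normalized,
tr(AG) + tr(B̃G) = 1 + z tr G,
whence
ω_A^c(z) + ω_B^c(z) − z = z − (tr(AG) + tr(B̃G))/tr G = z − (1 + z m_H(z))/m_H(z) = −1/m_H(z).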
The functions ω_A^c(z) and ω_B^c(z) turn out to be good approximations to the subordination functions ω_A(z) and ω_B(z) of (2.12). A direct consequence of the definition in (2.28) is that
ω_A^c(z) + ω_B^c(z) − z = −1/m_H(z).   (2.30)
Having set the notation, our main task is to show that
max_{i∈⟦1,N⟧} |G_ii(z) − (a_i − ω_B^c(z))^{−1}| ≺ 1/√(Nη),   (2.31)
where we focus, for simplicity, on the diagonal Green function entries only. We first heuristically explain how (2.31) leads to our main result in (2.19). A key input is the local stability of the system (2.12) established in [1]; see Subsection 3.3 for a summary. Averaging over i in (2.31), we get
m_H(z) = tr( (A − ω_B^c(z))^{−1} ) + O_≺(1/√(Nη)).
Replacing H by ℋ, we analogously get
m_H(z) = m_ℋ(z) = tr( (B − ω_A^c(z))^{−1} ) + O_≺(1/√(Nη)),
which is a perturbation of (2.12). Using the local stability of the system (2.12), we obtain
|ω_A^c(z) − ω_A(z)| + |ω_B^c(z) − ω_B(z)| ≺ 1/√(Nη).
Plugging this estimate back into (2.31) we get (2.19). The full proof of this step is accomplished in Section 7. We now return to (2.31). Its proof relies on the following decomposition of the Haar measure on the unitary group given, e.g., in [17,28]. For any fixed i ∈ ⟦1, N⟧, any Haar unitary U can be written as
U = −e^{iθ_i} R_i U_i,  R_i := I − r_i r_i*,  r_i := √2 (e_i + e^{−iθ_i} v_i)/||e_i + e^{−iθ_i} v_i||_2.   (2.35)
Here R_i is the Householder reflection (up to a sign) sending the vector e_i to v_i, where v_i ∈ C^N is a random vector distributed uniformly on the complex unit (N−1)-sphere, and θ_i ∈ [0, 2π) is the argument of the ith coordinate of v_i. The unitary matrix U_i has e_i as its ith column, and its (i, i)-matrix minor (obtained by removing the ith column and ith row) is Haar distributed on U(N−1); see Section 4 for more detail.
The gist of the decomposition in (2.35) is that the Householder reflection R i and the unitary U i are independent, for each fixed i ∈ 1, N . Hence, the decomposition in (2.35) allows one to split off the partial randomness of the vector v i from U .
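As a consistency check of the decomposition (ours, under the definitions in (2.35)): writing v_ii = e^{iθ_i}|v_ii| for the ith coordinate of v_i, one has ||e_i + e^{−iθ_i}v_i||_2² = 2(1 + |v_ii|), whence r_i* e_i = √(1 + |v_ii|) and
R_i e_i = e_i − (r_i* e_i) r_i = e_i − (e_i + e^{−iθ_i} v_i) = −e^{−iθ_i} v_i,
so that, using U_i e_i = e_i (the ith column of U_i is e_i),
U e_i = −e^{iθ_i} R_i U_i e_i = −e^{iθ_i} R_i e_i = v_i;
the decomposition indeed maps e_i to the uniformly distributed vector v_i.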
The proof of (2.31) is divided into two parts: (i) showing that G_ii(z) concentrates around its partial expectation E_{v_i}[G_ii(z)]; (ii) identifying E_{v_i}[G_ii(z)] as (a_i − ω_B^c(z))^{−1} up to negligible error. To prove part (i), we resolve dependencies by expansion and use concentration estimates for the vector v_i. This part is accomplished in Section 5.
Part (ii) is carried out in Section 6. We start from the Green function identity
G_ii(z) = (1 − (B̃ G(z))_ii)/(a_i − z),   (2.36)
which follows from (A + B̃ − z)G(z) = I. Taking the E_{v_i} expectation of (2.36) and recalling the definition of the approximate subordination function ω_B^c(z) in (2.28), it suffices to show that
E_{v_i}[(B̃ G(z))_ii] = (z − ω_B^c(z)) E_{v_i}[G_ii(z)] + O_≺(1/√(Nη))   (2.37)
to prove (2.31). Denoting B̃_i := U_i B (U_i)* and setting, for z ∈ C+, suitable auxiliary quantities S_i^♯(z) and T_i^♯(z), bilinear forms of the Green function involving B̃_i and v_i, one checks that E_{v_i}[(B̃G)_ii] agrees with E_{v_i}[S_i^♯] up to negligible error. Hence, it suffices to estimate E_{v_i}[S_i^♯] instead. Approximating e^{−iθ_i} v_i by a Gaussian vector and using integration by parts for Gaussian random variables, we get a pair of linear equations for E_{v_i}[S_i^♯] and E_{v_i}[T_i^♯], where we dropped the z-argument for the sake of brevity; see (6.23) and (6.24) for precise statements with slightly modified S_i^♯ and T_i^♯. Solving for E_{v_i}[S_i^♯] from the above two equations, we arrive at an explicit approximate expression, (2.38). Returning to (2.37), we also obtain, using concentration estimates for (B̃G)_ii (which follow from the concentration estimates of G_ii established in part (i) and from (2.36)), a second expression, (2.39). Thus, averaging (2.38) over the index i and comparing with (2.39), we conclude an estimate, (2.40), on the remaining averaged quantity. Plugging this last estimate back into (2.38), we eventually find the claimed identification of E_{v_i}[S_i^♯], which together with (2.37) and (2.36) gives us part (ii). This completes the sketch of the proof for the unitary case. The proof of the orthogonal case is similar. The necessary modifications are given in Appendix A.
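For the reader's orientation, the elementary algebra behind the reduction from (2.31) to (2.37), assuming the forms of (2.36) and (2.37) displayed above: taking E_{v_i} in (2.36) and inserting (2.37),
(a_i − z) E_{v_i}[G_ii] = 1 − E_{v_i}[(B̃G)_ii] = 1 − (z − ω_B^c) E_{v_i}[G_ii] + O_≺(1/√(Nη)),
so that (a_i − ω_B^c(z)) E_{v_i}[G_ii(z)] = 1 + O_≺(1/√(Nη)). Since |a_i − ω_B^c(z)| ≥ Im ω_B^c(z) is bounded from below in the regular bulk (cf. Lemma 3.4, via ω_B^c ≈ ω_B), dividing through and combining with the concentration of part (i) yields (2.31).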

Preliminaries
In this section, we first collect some basic tools used later on and then summarize results of [1]. In particular, we discuss, under the assumptions of Theorem 2.5, stability properties of the system (2.12) and state essential properties of the subordination functions ω A and ω B .
3.1. Stochastic domination and large deviation properties. Recall the definition of stochastic domination in Definition 1.1. The relation ≺ is a partial ordering: it is transitive and it satisfies the arithmetic rules of an order relation, e.g., if X_1 ≺ Y_1 and X_2 ≺ Y_2, then X_1 + X_2 ≺ Y_1 + Y_2 and X_1 X_2 ≺ Y_1 Y_2. Gaussian vectors have well-known large deviation properties. We will use them in the following form, whose proof is standard.
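Concretely, the Gaussian large deviation estimates used below are of the following standard type (a formulation sufficient for our purposes; cf. the references to (3.1) throughout): for g ∼ N_C(0, N^{−1} I_N) or g ∼ N_R(0, N^{−1} I_N), a vector b ∈ C^N and a matrix Q ∈ M_N(C), both independent of g,
|b* g| ≺ N^{−1/2} ||b||_2,  |g* Q g − tr Q| ≺ N^{−1} ||Q||_2.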
3.2. Rank-one perturbation formula. At various places, we use the following fundamental perturbation formula: for α, β ∈ C^N and an invertible D ∈ M_N(C) with 1 + β* D^{−1} α ≠ 0, we have
(D + αβ*)^{−1} = D^{−1} − (D^{−1} α β* D^{−1})/(1 + β* D^{−1} α),   (3.2)
as can be checked readily. A standard application of (3.2) is recorded in the following lemma.
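Indeed, a one-line check of (3.2): multiplying its right side by (D + αβ*) from the left,
(D + αβ*)[ D^{−1} − D^{−1}αβ*D^{−1}/(1 + β*D^{−1}α) ] = I + αβ*D^{−1} − [ αβ*D^{−1} + α(β*D^{−1}α)β*D^{−1} ]/(1 + β*D^{−1}α) = I + αβ*D^{−1} − αβ*D^{−1} = I.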
Lemma 3.2. Let H ∈ M_N(C) be Hermitian and let R = R* ∈ M_N(C) have finite rank k. Then, for all z = E + iη ∈ C+,
|tr (H + R − z)^{−1} − tr (H − z)^{−1}| ≤ Ck/(Nη).   (3.3)

Proof. Let z ∈ C+ and α ∈ C^N. Then from (3.2) we have
(H + αα* − z)^{−1} α = (H − z)^{−1} α / (1 + α*(H − z)^{−1}α).
We can thus estimate
|tr (H + αα* − z)^{−1} − tr (H − z)^{−1}| = (1/N) |α*(H − z)^{−2}α| / |1 + α*(H − z)^{−1}α| ≤ (1/N) · (η^{−1} Im α*(H − z)^{−1}α) / (Im α*(H − z)^{−1}α) = 1/(Nη),   (3.5)
where we used |α*(H − z)^{−2}α| ≤ η^{−1} Im α*(H − z)^{−1}α (by spectral decomposition) and |1 + α*(H − z)^{−1}α| ≥ Im α*(H − z)^{−1}α. Since R = R* ∈ M_N(C) has finite rank, we can write R as a finite sum of rank-one Hermitian matrices of the form ±αα*. Thus iterating (3.5) we get (3.3).
3.3. Local stability of the system (2.12). Following [1], we define, for probability measures µ_1, µ_2 and z ∈ C+,
Φ_{µ1,µ2}(ω_1, ω_2, z) := ( F_{µ1}(ω_2) − ω_1 − ω_2 + z , F_{µ2}(ω_1) − ω_1 − ω_2 + z )^⊤.   (3.6)
The system Φ_{µ1,µ2}(ω_1, ω_2, z) = 0 is called S-stable if the norm of the inverse of the partial Jacobian matrix of (3.6), with respect to (ω_1, ω_2), is bounded by some positive constant S. In particular, ω_1 and ω_2 are then Lipschitz continuous with constant 2S. A more detailed analysis yields the following local stability result of the system Φ_{µ1,µ2}(ω_1, ω_2, z) = 0.

Lemma 3.3 (Local stability, [1]). Fix z_0 ∈ C+. Assume that the functions ω̃_1, ω̃_2, r_1, r_2 : C+ → C satisfy Im ω̃_1(z_0), Im ω̃_2(z_0) > 0 and
Φ_{µ1,µ2}(ω̃_1(z_0), ω̃_2(z_0), z_0) = r(z_0),
where r(z) := (r_1(z), r_2(z))^⊤. Assume moreover that there is δ ∈ [0, 1] such that
|ω̃_1(z_0) − ω_1(z_0)| ≤ δ,  |ω̃_2(z_0) − ω_2(z_0)| ≤ δ,
where (ω_1(z_0), ω_2(z_0)) solves Φ_{µ1,µ2}(ω_1(z_0), ω_2(z_0), z_0) = 0, and assume in addition that there are strictly positive constants K and k, with k > δ and with k² > δKS, such that the non-degeneracy conditions of [1] hold at (ω_1(z_0), ω_2(z_0)). Then
|ω̃_1(z_0) − ω_1(z_0)| + |ω̃_2(z_0) − ω_2(z_0)| ≲ ||r(z_0)||_2.

In Section 7, we will apply Lemma 3.3 with the choices µ_1 = µ_A and µ_2 = µ_B. We thus next show that the system Φ_{µA,µB}(ω_A, ω_B, z) = 0 is S-stable, for all z ∈ S_I(0, 1), and that (3.12) holds uniformly on S_I(0, 1); see (2.17) for the definition.

Lemma 3.4 ([1]). Let µ_A, µ_B be the probability measures from (2.11) satisfying the assumptions of Theorem 2.5. Let ω_A, ω_B denote the associated subordination functions of (2.12). Let I be the interval in Theorem 2.5. Then for N sufficiently large, the system Φ_{µA,µB}(ω_A, ω_B, z) = 0 is S-stable with some positive constant S, uniformly on S_I(0, 1). Moreover, there exist two strictly positive constants K and k such that, for N sufficiently large, we have
|ω_A(z)| ≤ K, |ω_B(z)| ≤ K,  Im ω_A(z) ≥ k, Im ω_B(z) ≥ k,  uniformly on S_I(0, 1).   (3.15)

Remark 3.1. Under the assumptions of Lemma 3.4, the estimates in (3.15) can be extended as follows. There is k̃ > 0 such that the analogous lower bounds hold for Im F_{µA}(ω_B(z)) and Im F_{µB}(ω_A(z)) as well. This follows by combining (3.15) with the Nevanlinna representations in (2.8).
We conclude this section by mentioning that the general perturbation result in Lemma 3.3, combined with Lemma 3.4, can be used to prove (2.22). We refer to [1] for details.

Partial randomness decomposition
We use a decomposition of the Haar measure on the unitary group obtained in [17] (see also [28]): for a Haar distributed unitary U ∈ U(N), there exists a random vector v_1, uniformly distributed on the complex unit (N−1)-sphere S_C^{N−1} := {x ∈ C^N : x*x = 1}, and a Haar distributed unitary U_1 ∈ U(N−1), which is independent of v_1, such that one has the decomposition
U = −e^{iθ_1} R_1 U_1,  R_1 := I − r_1 r_1*,  r_1 := √2 (e_1 + e^{−iθ_1} v_1)/||e_1 + e^{−iθ_1} v_1||_2,
where θ_1 is the argument of the first coordinate of the vector v_1. More generally, for any i ∈ ⟦1, N⟧, there exists an independent pair (v_i, U_i), with v_i a uniformly distributed unit vector and with U_i ∈ U(N−1) a Haar unitary, such that one has the decomposition
U = −e^{iθ_i} R_i U_i,  R_i := I − r_i r_i*,  r_i := √2 (e_i + e^{−iθ_i} v_i)/||e_i + e^{−iθ_i} v_i||_2,
where U_i is the unitary matrix with e_i as its ith column and U_i as its (i, i)-matrix minor.
With the above notation, we can write
H = A + UBU* = A + R_i B̃_i R_i,
where we introduced the shorthand notation
B̃_i := U_i B (U_i)*.   (4.3)
We further define
H_i := A + B̃_i,  G_i(z) := (H_i − z)^{−1}.   (4.4)
It is well known that for a uniformly distributed unit vector v_i ∈ C^N, there exists a Gaussian vector g̃_i = (g̃_i1, ..., g̃_iN) ∼ N_C(0, N^{−1} I_N) such that v_i = g̃_i/||g̃_i||_2. By definition, θ_i is also the argument of g̃_ii. Set
g_ik := e^{−iθ_i} g̃_ik,  k ≠ i,   (4.5)
and introduce an N_C(0, N^{−1}) variable g_ii which is independent of the unitary matrix U and of g̃_i. Then, we denote g_i := (g_i1, ..., g_iN) and note g_i ∼ N_C(0, N^{−1} I). In addition, by definition, we have
e^{−iθ_i} v_i = (g_i − g_ii e_i)/||g̃_i||_2 + (|g̃_ii|/||g̃_i||_2) e_i.
In subsequent estimates for G_ij, it is convenient to approximate r_i by
w_i := e_i + g_i,  W_i := I − w_i w_i*,   (4.6)
in the decomposition U = −e^{iθ_i} R_i U_i, without changing the randomness of U_i. To estimate the precision of this approximation, we require more notation: Let
H^{(i)} := A + W_i B̃_i W_i,  G^{(i)}(z) := (H^{(i)} − z)^{−1}.   (4.8)
Correspondingly, we denote the matrix entries of G^{(i)}(z) by G^{(i)}_jk(z). The following lemma shows that r_i can be replaced by w_i in Green function entries at the expense of an error that is below the precision we are interested in.
Proof of Lemma 4.1. Fix i, j, k ∈ ⟦1, N⟧. We first note that H and H^{(i)} differ only through the reflections, H − H^{(i)} = R_i B̃_i R_i − W_i B̃_i W_i. By the strong concentration of the norms in (4.11) and g_ii, g̃_ii ∼ N_C(0, N^{−1}), we have ||H − H^{(i)}|| ≺ N^{−1/2}. Fix now z ∈ C+. Dropping z from the notation, a first order Neumann expansion of the resolvent yields the expansion (4.13). Observe that the second term on the right side of (4.13) is a polynomial in the terms listed in (4.14), with coefficients of the form δ_1i^{k_1} δ_2i^{k_2}, for some nonnegative integers k_1, k_2 such that k_1 + k_2 ≥ 1. By assumption (4.9) and the fact B̃_i e_i = b_i e_i, we further observe that the first four terms in (4.14) are stochastically dominated by one. The last four terms are also stochastically dominated by one, as follows from the trivial fact e_i* B̃_i e_i = b_i and Lemma 3.1. The terms in the second line of (4.14) are stochastically dominated by the quantities x_i* Q_i G^{(i)} x_i with Q_i = I or B̃_i, and with x_i = e_i or g_i, where the last step follows from (4.9). Note that the terms in the second line of (4.14) appear only linearly in (4.13). Hence, (4.12), (4.15) and the order one bound for the first and last four terms in (4.14) lead to (4.10).
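We record an instance of the Gaussian estimates (cf. Lemma 3.1) underlying the strong concentration of the norms invoked as (4.11) above: since E ||g̃_i||_2² = 1, taking Q = I there gives
| ||g̃_i||_2² − 1 | ≺ N^{−1/2}  and  | ||w_i||_2² − 2 | ≺ N^{−1/2},
the second bound using w_i = e_i + g_i, the identity ||w_i||_2² = 1 + 2 Re(g_ii) + ||g_i||_2², and |g_ii| ≺ N^{−1/2}.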

Concentration with respect to the vector g_i
In this section, we show that the diagonal entries G^{(i)}_ii concentrate around their partial expectations E_{g_i}[G^{(i)}_ii], where E_{g_i} is the expectation with respect to the collection (Re g_ij, Im g_ij)_{j=1}^N. Besides the diagonal Green function entries G^{(i)}_ii, the analysis of E_{g_i}[G^{(i)}_ii], carried out in Sections 6 and 7, involves the quantities T_i and S_i. From a technical point of view, it is convenient to be able to go back and forth between T_i, S_i and their expectations E_{g_i}[T_i], E_{g_i}[S_i]. Thus after establishing concentration estimates for G^{(i)}_ii in Lemma 5.1 below, we establish in Corollary 5.2 concentration estimates for T_i and S_i, where we also give rough bounds on T_i, S_i and related quantities. We need some more notation: for a general random variable X we define
IE_{g_i}[X] := X − E_{g_i}[X].
The main task in this section is to prove the following lemma.
Lemma 5.1. Suppose that the assumptions of Theorem 2.5 are satisfied and let γ > 0. Fix z = E + iη ∈ S_I(η_m, 1) and assume that the apriori bound (5.3) holds. Then
|IE_{g_i}[G^{(i)}_ii(z)]| ≺ 1/√(Nη),  for all i ∈ ⟦1, N⟧.   (5.4)

Proof of Lemma 5.1. In this proof we fix z ∈ S_I(η_m, 1). Recall the definition of G_i(z) in (4.4) and note that G_i(z) is independent of v_i (or g_i). It is therefore natural to expand G^{(i)}(z) around G_i(z) and to use the independence between G_i(z) and g_i in order to verify the concentration estimates. However, by construction, we have
(G_i)_ii(z) = (a_i + b_i − z)^{−1},
which may be as large as 1/η, depending on a_i, b_i and z. To circumvent problems coming from instabilities in (G_i)_ii(z), we use a "regularization" trick to enhance stability in the e_i-direction: instead of considering the Green function of H^{(i)} directly, we consider the Green function G^{{i}}(z) of a modification of H^{(i)} in which the e_i-direction is regularized; this Green function is in fact well-defined on the whole upper half-plane. Fix any j ∈ ⟦1, N⟧. Using the rank-one perturbation formula (3.2), we get an expansion of G^{(i)}_ij in terms of G^{{i}}. Some algebra then reveals the representation (5.8). By assumption (5.3) and identity (5.7), the entries G^{(i)}_ii(z) and G^{{i}}_ii(z) are uniformly bounded. We will prove below that
|IE_{g_i}[G^{{i}}_ii(z)]| ≺ 1/√(Nη).   (5.10)
Setting j = i in (5.8) and expressing the denominator on the right side by using (5.9)–(5.10), we get (5.11). In particular, together with Lemma 3.4 and Im ω_B(z) ≥ Im z, this implies that the absolute value of the denominator on the right side of (5.8) is bounded from below by some strictly positive constant. Thus, applying IE_{g_i} on both sides of (5.10), we obtain the concentration estimate in (5.4).
In the rest of the proof, we verify (5.10). Consider next the matrix H^{[i]}, a further regularization of H^{{i}} that is independent of g_i; its Green function G^{[i]}(z) is well defined since Im ω_B is uniformly bounded from below on S_I(η_m, 1) by Lemma 3.4. We now expand G^{{i}}(z) around G^{[i]}(z) and use the independence between G^{[i]}(z) and g_i. For simplicity, we hereafter drop the z-dependence from the notation. We start by noticing that H^{{i}} and H^{[i]} differ by a finite-rank matrix, for whose constituents we introduce shorthand notation in (5.15). Iterating the rank-one perturbation formula (3.2) once, we obtain the expansion (5.16). Taking the (i, j)th matrix entry in (5.16), we obtain the representation (5.17), where we introduced the quantities Ξ_i and Ψ_{i,j} in (5.18) and (5.19). We now rewrite (5.17) as a ratio involving Ξ_i. Since |G^{{i}}_ii| ≺ 1 (cf. (5.9)) and |G^{[i]}_ii| ≺ 1 (cf. (5.13)), it suffices to verify the three statements (i)–(iii) listed in (5.21) to show (5.10). We first show claim (i). Substituting the definitions in (5.15) into (5.18), we write Ξ_i out explicitly, see (5.22). Let Q_i^1 and Q_i^2 each stand for either I or B̃_i. Recalling that w_i = e_i + g_i and that g_i ∼ N_C(0, N^{−1} I) is a complex Gaussian vector, we compute the corresponding expectations, see (5.23). To bound the right side of (5.23), we observe that |(B̃_i G^{[i]})_ii| ≺ 1, where we used that e_i is an eigenvector of B̃_i and (5.13). (Notice that, for simplicity, here and at several other places we consistently use the notation ≺ even when the stronger relations ≤ or ≲ would also hold, i.e., we use the concept of stochastic domination even for estimating almost surely bounded or deterministic quantities.) To control the second term on the right side of (5.23), we note that a first order Neumann expansion of the resolvents allows us to compare tr(Q_i^1 G^{[i]} Q_i^2) with tr(Q_i^1 G_i Q_i^2), where we used the boundedness of ||B̃_i||. Notice next the resolvent identities relating G_i and G. Since H_i is a Hermitian finite-rank perturbation of H, we can apply (3.3) to conclude that |tr G_i − tr G| ≺ 1/(Nη). We will now show that tr(Q_i^1 G_i Q_i^2) is bounded. Using the resolvent identities and tr B̃_i = tr B = 0, we reduce this to tracial quantities of G. Since H_i is a Hermitian finite-rank perturbation of H, we can apply (3.3) once more, see (5.29). Thus, returning to (5.23), we obtain the bound (5.30). Using the Gaussian concentration estimates in (3.1) and w_i = e_i + g_i, we obtain the concentration of the corresponding bilinear forms, see (5.31), where we also used that e_i is an eigenvector of B̃_i, that ||B̃_i|| is bounded, and (5.25); in the last step (5.13) and (5.29) were used. Combined with (5.30) we thus proved the first group of estimates. For later use we remark that, combining (5.27) and (5.31), we also proved the analogous estimates for the mixed quantities. In a very similar way we get, recalling that tr B̃ = tr B = 0 and ||B̃|| ≺ 1, the corresponding bounds in (5.34). To deal with terms containing four or six factors of w_i in IE_{g_i}[Ξ_i] (see (5.22)), we use a rough bound, valid for general random variables X and Y satisfying |X|, |Y| ≺ 1 and any k ∈ N, which controls IE_{g_i}[X^k Y] in terms of IE_{g_i}[X] and IE_{g_i}[Y]. Recalling further the shorthand notation m_⊞ ≡ m_{µA⊞µB} and, from (2.12), that ω_A + ω_B = z − 1/m_⊞, we get from the above the intermediate estimates, and thus from (5.33) we obtain (5.39). Plugging (5.39) into (5.22), using the identity ω_A + ω_B = z − 1/m_⊞ and taking the expectation, a straightforward computation shows the evaluation of the denominator recorded in (5.40). Then from Lemma 3.4 one observes that statement (ii) of (5.21) holds. In fact, the first term on the right side of (5.40) is bounded away from zero uniformly for z ∈ S_I(η_m, 1). We move on to statement (iii) of (5.21). Let Q_i^1 and Q_i^2 each stand again for either I or B̃_i. Then the same arguments as for claim (i) apply.

Corollary 5.2. Suppose that the assumptions of Theorem 2.5 are satisfied. Fix z = E + iη ∈ S_I(η_m, 1) and assume that the apriori bound (5.42) holds. Then
|S_i(z)|, |T_i(z)| ≺ 1,  for all i ∈ ⟦1, N⟧.   (5.43)
Moreover, we have the concentration estimates
|IE_{g_i}[S_i(z)]| ≺ 1/√(Nη),  |IE_{g_i}[T_i(z)]| ≺ 1/√(Nη).   (5.44)

Proof. Using once more (3.2), we can write S_i and T_i as ratios of bilinear forms. Hence, to prove the bound in (5.43) it suffices to bound terms of the form x_i* Q_i G_i y_i with x_i, y_i = e_i or g_i and Q_i = I or B̃_i. To prove (5.44), we follow, mutatis mutandis, the proof of (5.4), by replacing G^{{i}} with the corresponding regularized quantity. For instance, for T_i the counterpart of (5.8) is an analogous ratio representation. Now, according to (5.11), (5.10) and the bound |T_i| ≺ 1 (cf. (5.43)), it suffices to show the analogue (5.45) of (5.10). The proof of (5.45) is nearly the same as the one of (5.10).
One can also use a similar argument for S_i by using the bound |S_i| ≺ 1 from (5.43). We omit the details.

Identification of the partial expectation
In this section, we estimate the partial expectation E_{g_i}[G^{(i)}_ii], which together with the concentration inequalities in Lemma 5.1 leads to the following proposition. Recall the definition of S_i and T_i in (5.1).

Proposition 6.1. Suppose that the assumptions of Theorem 2.5 are satisfied and let γ > 0. Fix z = E + iη ∈ S_I(η_m, 1) and assume that the apriori bound (6.1) holds. Then
|E_{g_i}[G^{(i)}_ii(z)] − (a_i − ω_B^c(z))^{−1}| ≺ 1/√(Nη)   (6.2)
and
max_{i∈⟦1,N⟧} |T_i(z)| ≺ 1/√(Nη),  max_{i∈⟦1,N⟧} |S_i(z) − (z − ω_B^c(z)) E_{g_i}[G^{(i)}_ii(z)]| ≺ 1/√(Nη)   (6.3)
hold.

In the proof of Proposition 6.1 we will need the following auxiliary lemma, whose proof is postponed to the very end of this section.

Lemma 6.2. Under the assumptions of Proposition 6.1, the estimates
|tr(B̃_i G^{(i)}) − tr(B̃ G)| ≺ 1/(Nη),  |tr(B̃_i G^{(i)} B̃_i) − tr(B̃ G B̃)| ≺ 1/(Nη)   (6.4)
and the bounds
|tr(B̃_i G^{(i)})| ≺ 1,  |tr(B̃_i G^{(i)} B̃_i)| ≺ 1   (6.5)
hold uniformly in i ∈ ⟦1, N⟧. Furthermore, the Gaussian concentration estimates in (6.6) hold uniformly in i ∈ ⟦1, N⟧.

Proof of Proposition 6.1. We estimate E_{g_i}[G^{(i)}_ii(z)], E_{g_i}[S_i(z)] and E_{g_i}[T_i(z)] to establish (6.2) and (6.3). Recall the definition of H^{(i)} and G^{(i)} from (4.8). We start with the defining identity (H^{(i)} − z) G^{(i)} = I, taken at the (i, i)th entry, see (6.8). Recalling the definitions in (4.6) and (4.7), we expand W_i B̃_i W_i in terms of e_i and g_i, see (6.9). Since e_i is an eigenvector of B̃_i (cf. (4.3)), we have (B̃_i)_ii = b_i. Since moreover B is traceless by assumption (2.25), we have tr B̃_i = tr B = 0. Thus the apriori estimates in (6.1), the bound in (5.43), and the concentration estimates (cf. Lemma 3.1)
|e_j* g_i| ≺ N^{−1/2},  for all j ∈ ⟦1, N⟧,   (6.10)
imply that g_i* B̃_i G^{(i)} e_i is the only relevant term in (6.9). Thus, recalling from the definition in (5.1) the relation between g_i* B̃_i G^{(i)} e_i and S_i, we compute E_{g_i}[S_i] next, using integration by parts for complex Gaussian random variables. Regarding g and ḡ as independent variables for computing ∂_ḡ f(g, ḡ), we have, for g ∼ N_C(0, σ²),
E[g f(g, ḡ)] = σ² E[∂_ḡ f(g, ḡ)]   (6.12)
for differentiable functions f : C² → C. Using (6.12) with σ² = 1/N for each component of g_i = (g_i1, ..., g_iN), we obtain an expression for E_{g_i}[S_i] in terms of derivatives of the Green function, see (6.13). Using the definitions in (4.6) and (4.7) and regarding g_ik, ḡ_ik as independent variables, we compute these derivatives, see (6.14), so that we arrive at (6.15). Since e_i is an eigenvector of B̃_i with eigenvalue b_i, we further simplify (6.15) and obtain (6.16). Plugging (6.16) into (6.13) and rearranging, we get the self-consistent relation (6.17) for E_{g_i}[S_i]. We next claim that the last two terms on the right of (6.17) are small. Using the boundedness of G^{(i)}_ii (following from the apriori estimate (6.1)), the bound (5.43), the concentration estimates in (6.10), estimate (6.5) of the auxiliary Lemma 6.2, and the trivial bounds in (6.18), we see that the last two terms on the right side of (6.17) are indeed negligible, i.e., (6.19) holds, where we also used the definitions of T_i and S_i in (5.1). From assumption (6.1) and Corollary 5.2, we have the bounds in (6.20). We hence obtain from (6.19), (6.5), and the concentration estimates in (6.6) and (5.4), the equation (6.21). Repeating the above computations for E_{g_i}[T_i], we analogously obtain (6.22). Now, using the bounds in (6.20), the estimates (6.4) and |tr G^{(i)} − tr G| ≺ 1/(Nη) (following from (3.3)), we obtain from (6.21) and (6.22) the pair of equations (6.23) and (6.24). We first approximately solve (6.24) for E_{g_i}[T_i] to show, under the assumptions of Proposition 6.1, the bound (6.25). To see this, we recall (6.8) and (6.11), which together with assumption (6.1) imply the apriori relation (6.26). By the concentration estimate (5.44), we also have the corresponding bound for T_i itself. In addition, by the identity B̃G = I − (A − z)G, assumption (6.1) and equality (5.37), we have, using the shorthand notation m_⊞ ≡ m_{µA⊞µB}, the sharp formula (6.27) for tr(B̃G). Substituting (6.26) and assumption (6.1) into (6.24), and using |T_i|, |S_i| ≺ 1, we obtain (6.28). Using (6.27) and the second equation of (2.12), we deduce (6.29). Then, solving (6.23) and (6.24) for E_{g_i}[S_i], we obtain (6.30). Averaging (6.30) over the index i and reorganizing, we arrive at the relation (6.31).
Now, recalling the concentration of S_i in (5.44) and estimate (6.11), we obtain the bound (6.32). Note that under assumption (6.1), we can use Corollary 5.2 to get (5.43), which together with (6.1) implies that the assumptions of Lemma 4.1 in the case i = j = k are satisfied. Then, by (4.10) with i = j = k and (6.8), we get the estimates (6.33) for all i ∈ ⟦1, N⟧. Using (6.32) and (6.33) we obtain (6.34). Substituting (6.34) and assumption (6.1) into the right side of (6.31), and using |tr G| ≳ 1 (following from (6.27)) and |T_i| ≺ N^{−γ/4}, we obtain (6.35). Now, plugging (6.35) back into (6.30) gives (6.36), which together with (6.8) and (6.32) implies (6.37), in light of the definition of ω_B^c(z) in (2.28). By assumption (6.1) we see that ω_B^c(z) = ω_B(z) + O_≺(N^{−γ/4}). Hence by (3.15), we also have Im ω_B^c(z) ≥ c for some positive constant c. Therefore, we get (6.2) from (6.37).
Then (6.36) and (6.2), together with the definition of ω_B^c(z) in (2.28) and the concentration of S_i in (5.2), imply the estimate of S_i in (6.3).
We conclude this section with the proof of Lemma 6.2.
Proof of Lemma 6.2. We start by invoking the finite-rank perturbation formula (3.3) to compare tr(B̃_i G^{(i)}) with tr(B̃_i G). Hence, it suffices to verify (6.4) and (6.5) with G^{(i)} replaced by G. Recalling from Section 4 that R_i = I − r_i r_i* and using the fact that R_i is a Householder reflection (in fact ||r_i||_2² = 2 by construction), we have B̃_i = R_i B̃ R_i. Then we write the difference d_i := tr(B̃_i G) − tr(B̃ G) as a sum of normalized traces each containing at least one factor r_i. Using that ||G|| ≤ 1/η, we immediately get the deterministic bound |d_i| ≤ C/(Nη), for some numerical constant C. Together with (6.39) this implies the first estimate in (6.4). The second estimate in (6.4) is obtained in a similar way. The bounds in (6.5) follow by combining the sharp formulas for tr(B̃G) and tr(B̃GB̃) from (6.27) and (6.35) with the estimates in (6.4).
To prove (6.6), we set Q_i = B̃_i or (B̃_i)² and note that the claim follows from Lemma 3.1, where we used that g_i and G_i are independent, and once more (3.3).
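For completeness, here is a short derivation (ours) of the complex integration by parts formula (6.12) from the real one (cf. (A.2)): write g = x + iy with x, y independent N(0, σ²/2) variables, and recall ∂_ḡ = (∂_x + i ∂_y)/2. Applying real integration by parts in x and in y separately,
E[g f] = E[x f] + i E[y f] = (σ²/2) E[∂_x f] + i (σ²/2) E[∂_y f] = σ² E[∂_ḡ f].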

Proof of Theorem 2.5: Inequalities (2.19) and (2.21)
In this section, we prove the estimates (2.19) and (2.21) of Theorem 2.5 via a continuity argument. We also prove Theorem 2.6.
First, let us recall the matrix ℋ and its Green function 𝒢 defined in (2.26) and (2.27); these are the natural counterparts of H and G with the roles of A and B interchanged. We can apply a similar partial randomness decomposition to the unitary U* in ℋ as we did for U in H in Section 4. This means that, for any i ∈ ⟦1, N⟧, there exists an independent pair (𝓋_i, 𝒰_i), uniformly distributed on S_C^{N−1} and U(N−1), respectively, such that, with 𝓇_i := √2 (e_i + e^{−iϑ_i} 𝓋_i)/||e_i + e^{−iϑ_i} 𝓋_i||_2, we have the decomposition U* = −e^{iϑ_i} ℛ_i 𝒰_i, where ϑ_i is the argument of the ith coordinate of 𝓋_i, where ℛ_i := I − 𝓇_i 𝓇_i*, and 𝒰_i is the unitary matrix with e_i as its ith column and 𝒰_i as its (i, i)-matrix minor. Analogously to g_i defined in (4.5), we define a Gaussian vector ℊ_i = (ℊ_i1, ..., ℊ_iN) ∼ N_C(0, N^{−1} I) to approximate e^{−iϑ_i} 𝓋_i. Setting 𝓌_i := e_i + ℊ_i and 𝒲_i := I − 𝓌_i 𝓌_i*, we define the counterparts ℋ^{(i)} and 𝒢^{(i)} in analogy to (4.8), for all i ∈ ⟦1, N⟧. Calligraphic letters are used to distinguish the decompositions of ℋ from the decompositions of H.
Next, we introduce the z-dependent random variable
Λ_d(z) := max_{i∈⟦1,N⟧} |G_ii(z) − (a_i − ω_B(z))^{−1}| + max_{i∈⟦1,N⟧} |𝒢_ii(z) − (b_i − ω_A(z))^{−1}|.   (7.1)
Moreover, for any δ ∈ [0, 1] and z ∈ S_I(η_m, 1), we define the following event:
Θ_d(z, δ) := { Λ_d(z) ≤ δ }.   (7.2)
The subscript d refers to "diagonal" matrix elements. With the above notation, we have the following lemma.
Lemma 7.1. Suppose that the assumptions of Theorem 2.5 are satisfied and fix γ > 0. For any ε with 0 < ε ≤ γ/8 and for any D > 0 there exists a positive integer N_2(D, ε) such that the following holds: For any fixed z = E + iη ∈ S_I(η_m, 1) there exists an event Ω_d(z) ≡ Ω_d(z, D, ε), with P(Ω_d(z)) ≥ 1 − N^{−D} for N ≥ N_2(D, ε), such that if the estimate
P( Λ_d(z) > N^{−γ/4} ) ≤ N^{−D}   (7.3)
holds for all D > 0 and N ≥ N_1(D, γ, ε), for some threshold N_1(D, γ, ε), then we also have
Λ_d(z) ≤ N^{2ε}/√(Nη)  on the event Θ_d(z, N^{−γ/4}) ∩ Ω_d(z).   (7.4)

Proof. In this proof we fix z ∈ S_I(η_m, 1). By the definition of ≺ in Definition 1.1, we see that assumption (7.3) implies the high-probability bounds (7.5) on the diagonal entries of G and (7.6) on those of 𝒢. Hence, we can use Corollary 5.2 to get (5.43). Together with the boundedness of G^{(i)}_ii and G_ii (cf. (7.5) and (3.15)), this implies that the assumptions in (4.9) of Lemma 4.1 are satisfied when i = j = k. Thus (4.10) holds when i = j = k. Hence, invoking (7.5) and Proposition 6.1, we get the estimate (7.7). Switching the roles of A and B as well as U and U*, and further using (2.29), we also get the analogous estimate (7.8) under (7.6). Now, we state the conclusions (7.7) and (7.8) in a more explicit quantitative form, assuming (7.3), which is a quantitative form of (7.5)–(7.6). Namely, we show that the inequalities (7.9) hold on the intersection of the events Θ_d(z, N^{−γ/4}) and Ω_d(z, D, ε). Here Ω_d(z) is an event determined as the intersection of the "typical" events in all the concentration estimates in Sections 4–6.
To see this more precisely, we go back to the proofs in these sections. The concentration estimates always involved quantities of the form IE_{g_i}[g_i* Q x], with x = g_i or e_i and some explicit matrix Q that is independent of g_i but often z-dependent. The total number of such estimates is linear in N. Thus, according to Lemma 3.1, for any (small) ε > 0 and (large) D > 0, there exists an event Ω_d(z, D, ε) with
P( Ω_d(z, D, ε) ) ≥ 1 − N^{−D},
such that all estimates of the above form in Sections 4–6 hold on Ω_d(z, D, ε) for all N ≥ N_2(D, ε). In addition, the threshold N_2(D, ε) is independent of the spectral parameter z.
We now follow the proofs in Sections 4–6 to the letter, but we use (7.10), (7.11) and (7.3) instead of the ≺ relation. Instead of (7.7) and (7.8), we find that the analogous but more quantitative bounds (7.9) hold on the intersection of the events Θ_d(z, N^{−γ/4}) and Ω_d(z, D, ε). It remains to show that, on the event Θ_d(z, N^{−γ/4}) ∩ Ω_d(z), the estimate (7.12) on the approximate subordination functions holds. To this end, we use the stability of the system Φ_{µA,µB}(ω_A, ω_B, z) = 0 as formulated in Lemma 3.3. By the definition of the approximate subordination functions ω_A^c(z) and ω_B^c(z) in (2.28), by the identity (2.30), and by taking the average over the index i in the estimates in (7.9), we get the system of equations (7.13), where the error terms r_A and r_B satisfy the bounds inherited from (7.9). Using the definition of Θ_d(z, δ) in (7.2), (7.9) and the fact that z ∈ S_I(η_m, 1), so that ω_A(z) and ω_B(z) are well separated from the real axis, we have the bound (7.14) on the event Θ_d(z, N^{−γ/4}) ∩ Ω_d(z) when N ≥ N_3(D, γ, ε). Hence, plugging the third equation of (7.13) into the first two and using (3.15) together with (7.14), we get the required smallness of the error terms for N ≥ N_3(D, γ, ε). Therefore, by Lemma 3.3, we get (7.12). Hence, we have completed the proof of Lemma 7.1. Given Lemma 7.1, we next prove Theorem 2.5 via a continuity argument, similarly to [19].
Proof of (2.19) of Theorem 2.5. From Theorem 1.2 of [26], we see that for η = 1 we have Λ_d(E + i) ≺ N^{−γ/4}, if 0 < γ ≤ 1/7 (say). In addition, owing to the estimate ||G|| ≤ 1/η, assumption (4.9) obviously holds for η = 1. Hence, by Lemma 4.1 in the case i = j = k and its analogue for 𝒢, the same bound holds for the regularized Green functions. Hence, for any E ∈ I and D > 0,
P( Λ_d(E + i) > N^{−γ/4} ) ≤ N^{−D}   (7.18)
for N sufficiently large. Thanks to the Lipschitz continuity of the Green function, i.e., ||G(z) − G(z′)|| ≤ N²|z − z′| for any z, z′ ∈ S_I(η_m, 1), it suffices to show (2.19) on the lattice
Ŝ_I(η_m, 1) := S_I(η_m, 1) ∩ ( N^{−5}Z + i N^{−5}Z ).   (7.19)
We now fix E ∈ I ∩ N^{−5}Z and decrease η from η = 1 down to N^{−1+γ} in steps of size N^{−5}. Recall the events Θ_d(z, δ) and Ω_d(z) in Lemma 7.1, and choose the same ε < γ/8 in Ω_d(z, D, ε) for all z. For simplicity, we omit the real part E from the notation and rewrite Θ_d(η, δ) ≡ Θ_d(E + iη, δ), Ω_d(η) ≡ Ω_d(E + iη). Our aim is to show that, for any η ∈ [η_m, 1], the high-probability bound (7.20) holds. To see (7.20), we first notice that, by the Lipschitz continuity of the Green function and of the subordination functions ω_A(z) and ω_B(z) (see (3.9)), we have the step estimate (7.21), where the last step is obtained by choosing γ > 0 sufficiently small. Now, we start from (7.18). By (7.21), we get the corresponding bound one step lower, see (7.22). Hence, we can use Lemma 7.1, which together with (7.21) implies (7.20) with η = 1. Now, replacing 1 by 1 − N^{−5}, we get from (7.22), (7.18) and the fact that P(Ω_d(1 − N^{−5})) ≥ 1 − N^{−D} that the bound (7.23) holds for all N ≥ N_3(D, γ, ε). Now, using (7.23) instead of (7.18), we get (7.20) for η = 1 − N^{−5}. Iterating this argument, we obtain (7.24) for any η ∈ [η_m, 1] ∩ N^{−5}Z. Hence, we have, for all N ≥ N_3(D, γ, ε), the uniform lattice bound (7.25), which further implies the corresponding estimate for Λ_d, for all N ≥ N_3(D, γ, ε), by using (7.21). Then, using Lemma 7.1 again, we obtain
Λ_d(E + iη) ≺ 1/√(Nη),
uniformly for all η ∈ [η_m, 1] ∩ N^{−5}Z, when N ≥ N_3(D, γ, ε). Finally, by continuity, we can extend the bounds from z in the discrete lattice to the entire domain S_I(η_m, 1). We then get (2.19).

Proof of Theorem 2.6. Using the spectral decomposition of the Green function G, we have
Im G_jj(z) = Σ_{k=1}^N η |u_k(j)|² / ( (E − λ_k)² + η² ),  z = E + iη.   (7.26)
Fix a small γ > 0. For any λ_i ∈ I, we set E = λ_i on the right side of (7.26) and use (2.19) to bound the left side of it with z = λ_i + iη, η = N^{−1+γ}. Then we obtain, for every j ∈ ⟦1, N⟧,
|u_i(j)|² ≤ η Im G_jj(λ_i + iη) ≺ N^{−1+γ}.
Since γ > 0 is arbitrarily small, we get (2.24). This completes the proof of Theorem 2.6.
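For clarity, we spell out the inequality used in the last step (an immediate consequence of (7.26), keeping only the k = i term): for z = λ_i + iη,
Im G_jj(λ_i + iη) = Σ_k η |u_k(j)|² / ( (λ_i − λ_k)² + η² ) ≥ |u_i(j)|²/η,
so |u_i(j)|² ≤ η Im G_jj(λ_i + iη). By (2.19) and the lower bound Im ω_B(z) ≥ k > 0 from (3.15), one has Im G_jj(λ_i + iη) ≺ 1 in the bulk, whence max_{i : λ_i ∈ I} ||u_i||_∞² ≺ η = N^{−1+γ}.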
8. Proof of Theorem 2.5: Inequality (2.20)

In this section, we prove (2.20) of Theorem 2.5. Note that, from (7.25) in the proof of (2.19) of Theorem 2.5, we know that the following estimates hold uniformly on S_I(η_m, 1):
max_{i∈⟦1,N⟧} |G_ii(z) − (a_i − ω_B(z))^{−1}| ≺ 1/√(Nη),  max_{i∈⟦1,N⟧} |𝒢_ii(z) − (b_i − ω_A(z))^{−1}| ≺ 1/√(Nη).   (8.1)
Taking (8.1) as an input, we follow the discussion in Sections 5–7 to prove the estimate (2.20), with the following modifications. We introduce the quantities T_{i,j}(z) and S_{i,j}(z), defined in (8.2), that generalize T_i(z) and S_i(z) defined in (5.1). In particular, T_i(z) ≡ T_{i,i}(z) and S_i(z) ≡ S_{i,i}(z), but we henceforth implicitly assume that i ≠ j. (We use a comma in the subscripts of T_{i,j}, S_{i,j} since they are not the entries of some matrix.) We often abbreviate T_{i,j} ≡ T_{i,j}(z) and S_{i,j} ≡ S_{i,j}(z). We first establish the concentration estimates for G^{(i)}_ij (see Lemma 8.1), and for T_{i,j} and S_{i,j}; see Lemma 8.2. In Proposition 8.3 we then derive self-consistent equations for E_{g_i}[T_{i,j}] and E_{g_i}[S_{i,j}] that will show, together with the concentration estimates, that |G_ij| ≺ 1/√(Nη). We then close the argument via continuity.
We start with the analogue of Lemma 5.1 for the off-diagonal entries of G^{(i)}.
Lemma 8.1. Suppose that the assumptions of Theorem 2.5 are satisfied and let γ > 0. Fix z = E + iη ∈ S_I(η_m, 1) and assume that the apriori bounds in (8.3) hold for all i, j ∈ ⟦1, N⟧, i ≠ j. Then
|IE_{g_i}[G^{(i)}_ij(z)]| ≺ 1/√(Nη).   (8.4)

Proof. We use the representation (8.5) of G^{(i)}_ij (see (5.8) for its diagonal counterpart), where Ξ_i is defined in (5.18) and Ψ_{i,j} is defined in (5.19). Recalling statements (i) and (ii) in (5.21), it suffices to establish that |IE_{g_i}[Ψ_{i,j}]| ≺ 1/√(Nη). Note that Ψ_{i,j} contains the terms listed in (5.33), (5.34) and (5.41), as well as terms involving the off-diagonal Green function entries; since e_i is an eigenvector of B̃_i, these reduce to expressions in G^{(i)}_ij and (B̃_i G^{(i)})_ij. Moreover, using Lemma 3.1 with Q_i = I or B̃_i, we obtain the required concentration bounds, where A_i and B_i are the (i, i)-matrix minors of A and B, respectively (obtained by removing the ith column and ith row), and U_i ∈ U(N−1) is the (i, i)-matrix minor of U_i, which is Haar distributed, as seen at the beginning of Section 4. Note that the resulting matrix is independent of g_i, so that the claimed concentration estimates follow.

Lemma 8.2. Suppose that the assumptions of Theorem 2.5 are satisfied and assume (8.1). Then, uniformly on S_I(η_m, 1) and uniformly in i ≠ j, the bounds |S_{i,j}(z)|, |T_{i,j}(z)| ≺ 1 of (8.9) and the concentration estimates in (8.10) hold.

Proof. With the estimates in (8.1) and (8.8), the proof is analogous to that of Corollary 5.2.
Here we get the conclusions for all z = E + iη ∈ S I (η m , 1) at once, since we use the uniform estimate (8.1) instead of assumption (5.42) for one fixed z. We omit the details.
Finally, we have the following counterpart to Proposition 6.1.
Proposition 8.3. Under the assumptions of Lemma 8.1, we have
|G_ij(z)| ≺ 1/√(Nη),  i ≠ j,   (8.11)
together with the estimates (8.12) for T_{i,j} and S_{i,j}.

Proof. The proof is similar to that of Proposition 6.1. Having established the concentration inequalities in (8.4), it suffices to estimate E_{g_i}[G^{(i)}_ij] to prove (8.11). We then start with the defining identity for G^{(i)}_ij, cf. (6.8). Choosing henceforth i ≠ j, mimicking the reasoning from (6.9) to (6.11) and using (8.9), we arrive at the off-diagonal analogue of (6.11). Then, instead of (6.17), we obtain the self-consistent relation (8.15), where we directly used the definitions in (8.2). Then, similarly to (6.23), using the concentration estimates in Lemma 8.1 and in Lemma 8.2, as well as the Gaussian concentration estimates in (6.10), the bound (6.18) and Lemma 6.2 for the tracial quantities, we obtain the equation (8.16). Analogously, we also obtain (8.17). Solving (8.16) and (8.17) for E_{g_i}[S_{i,j}], we obtain (8.18). Using (6.35), the assumption |G_ij| ≺ 1 and the bound |T_{i,j}| ≺ 1 of (8.9), we get a preliminary estimate, which together with (8.13), (8.14) and the concentration estimate (8.10) implies the bound (8.11) on G_ij. This proves the estimate in (8.11). Next, we bound S_{i,j}. Starting from (8.18), we directly get the second estimate in (8.12) from the Green function bound (8.11) and the concentration estimate (8.10).
It remains to estimate T_{i,j}. Plugging the bound on G_ij in (8.11) and the bound on S_{i,j} in (8.12) into the equation (8.17), we obtain an equation for E_{g_i}[T_{i,j}]. Invoking the estimate (6.29), we get E_{g_i}[T_{i,j}] ≺ 1/√(Nη). Then the first estimate in (8.12) follows from the concentration estimate for T_{i,j} in (8.10). This completes the proof.
Having established Lemma 8.1 and Proposition 8.3, we next prove (2.20) of Theorem 2.5 via a continuity argument similar to the proof of (2.19).
Proof of (2.20) of Theorem 2.5. Fixing any z ∈ S_I(η_m, 1) and using Proposition 8.3, under the apriori assumption (8.20) we have the bound (8.21). Hence, in principle, it suffices to conduct a continuity argument from η = 1 to η = η_m (similar to the proof of (2.19) of Theorem 2.5) to show that the bound (8.21) holds uniformly for z ∈ S_I(η_m, 1). However, in order to show that (8.23) also holds uniformly for z ∈ S_I(η_m, 1) quantitatively, we monitor G_ij in the continuity argument as well. To this end, we introduce the z-dependent random variable
Λ_o(z) := max_{i≠j} |G_ij(z)| + max_{i≠j} |𝒢_ij(z)|,
and, for any δ ∈ [0, 1] and z ∈ S_I(η_m, 1), we define the event
Θ_o(z, δ) := { Λ_o(z) ≤ δ },
cf. (7.1) and (7.2). The subscript o refers to "off-diagonal". We will mimic the proof of (2.19). Analogously, using Lemma 4.1 and Proposition 8.3, one shows that there exists an event Ω_o(z) ≡ Ω_o(z, D, ε) such that the conclusions of Lemma 7.1 still hold when we replace Θ_d(z, δ) by Θ_o(z, δ), Ω_d(z) by Ω_o(z) and N^{−γ/4} by 1. We also set δ = 1 in this proof. This is a quantitative description of the derivation of the first bound in (8.22) and (8.23) from (8.21). The main difference is that here Ω_o(z) is the event defined as the intersection of the "typical" events in all the concentration estimates in Sections 4–6, in the proofs of Lemma 8.1 and Proposition 8.3, and the event on which the bounds in (8.24) hold. Note that, by (8.1) and (8.8), we know that (8.24) holds with high probability uniformly on S_I(η_m, 1). With the analogue of Lemma 7.1 for Θ_o(z, δ = 1) and Ω_o(z), we conduct a continuity argument similar to the one in the proof of (2.19). Again, by Lipschitz continuity of the Green function, it suffices to show estimate (2.20) on the lattice Ŝ_I(η_m, 1) defined in (7.19). We fix E ∈ I ∩ N^{−5}Z, write z = E + iη, and decrease η from η = 1 down to N^{−1+γ} in steps of size N^{−5}. The initial estimate for η = 1, i.e., Λ_o(E + i) ≤ 1, follows directly from the trivial fact ||G^{(i)}(z)||, ||G(z)|| ≤ 1/η. Then one can show step by step that, for any η ∈ [η_m, 1],
Λ_o(E + iη) ≺ 1/√(Nη),
say, which is the analogue of (7.20). The remaining proof is nearly the same as the counterpart in the proof of (2.19). We thus omit the details.
Appendix A. Orthogonal case

In this appendix, we show that Theorem 2.5 and Theorem 2.6 also hold in the orthogonal setup, where U is Haar distributed on the orthogonal group O(N). From the proof of Theorem 2.6, we see that it is implied by Theorem 2.5. Hence, it suffices to discuss the latter. We outline the necessary changes in the discussion of Sections 4–8 to adapt our proof to the orthogonal case. We mainly show the modifications for the proof of (2.19) in detail, and (2.20) will be discussed briefly at the end.
First, we modify some notation. We start with the decomposition of the Haar measure on the orthogonal group analogous to (2.35). For all i ∈ ⟦1, N⟧, according to [28], there exist a random vector v_i = (v_i1, ..., v_iN), uniformly distributed on the real unit (N−1)-sphere S_R^{N−1} := {x ∈ R^N : x*x = 1}, and a Haar distributed orthogonal matrix U_i ∈ O(N−1), which is independent of v_i, such that one has the decomposition
U = −sgn(v_ii) R_i U_i,  R_i := I − r_i r_i*,  r_i := √2 (e_i + sgn(v_ii) v_i)/||e_i + sgn(v_ii) v_i||_2,   (A.1)
and U_i is the orthogonal matrix with e_i as its ith column and U_i as its (i, i)-matrix minor. Moreover, there is a real Gaussian vector g̃_i ∼ N_R(0, N^{−1} I) such that v_i = g̃_i/||g̃_i||_2. Similarly to (4.5), we define
g_ik := sgn(v_ii) g̃_ik,  k ≠ i,
and introduce an N(0, N^{−1}) variable g_ii, which is independent of the orthogonal matrix U and of g̃_i. Let g_i := (g_i1, ..., g_iN) and note that g_i ∼ N_R(0, N^{−1} I). Then we set w_i := e_i + g_i and W_i := I − w_i w_i* as before. With these modifications, we follow the proofs in Sections 4–7 verbatim. The only difference is the derivation of (6.19). Instead of (6.12), we use the following integration by parts formula for real Gaussian random variables:
∫_R g f(g) e^{−g²/(2σ²)} dg = σ² ∫_R f′(g) e^{−g²/(2σ²)} dg,   (A.2)
for differentiable functions f : R → R. Correspondingly, instead of (6.14), we have the real derivatives of the Green function entries. Hence, we get
∂(B̃_i G^{(i)})_kj/∂g_ik = e_k* B̃_i G^{(i)} (e_i e_k* + e_k e_i* + e_k g_i* + g_i e_k*) B̃_i (I − e_i e_i* − e_i g_i* − g_i e_i* − g_i g_i*) G^{(i)} e_j + e_k* B̃_i G^{(i)} (I − e_i e_i* − e_i g_i* − g_i e_i* − g_i g_i*) B̃_i (e_i e_k* + e_k e_i* + e_k g_i* + g_i e_k*) G^{(i)} e_j
instead of (6.15). Substitution into the identity E_{g_i}[g_i* B̃_i G^{(i)} e_j] = N^{−1} Σ_k E_{g_i}[∂(B̃_i G^{(i)})_kj/∂g_ik] yields
E_{g_i}[g_i* B̃_i G^{(i)} e_j] = (r.h.s. of (8.15)) + additional terms,
where we introduced b̂_i := w_i* B̃_i w_i. Note that the last two terms were discussed in the unitary setup, and they were shown to be negligible. Therefore, to get (8.16) also in the orthogonal setup, it remains to control the additional terms; this is done exactly as in the unitary case.

Appendix B. The case of two point masses

Recall from Remark 2.1 and Section 2.3 that Theorem 2.5 excludes the case where both µ_α and µ_β are convex combinations of two point masses; this setting, parametrized in (B.1) by (θ, ξ, ζ), is treated in this appendix. Here we excluded the case (θ, ξ, ζ) = (−1, 1/2, 1/2), since it is equivalent to (θ, ξ, ζ) = (1, 1/2, 1/2) under a shift, where the latter is a special case of µ_α = µ_β. In Section 7 of [1], we explained why the setting of (B.1) is special, and we thus excluded it from Theorem 2.5.

The following proposition presents the local law under the setting (B.1).