Eigenstate Thermalization Hypothesis for Wigner Matrices

We prove that any deterministic matrix is approximately the identity in the eigenbasis of a large random Wigner matrix with very high probability and with an optimal error inversely proportional to the square root of the dimension. Our theorem thus rigorously verifies the Eigenstate Thermalization Hypothesis by Deutsch [Deutsch 1991] for the simplest chaotic quantum system, the Wigner ensemble. In mathematical terms, we prove the strong form of Quantum Unique Ergodicity (QUE) with an optimal convergence rate for all eigenvectors simultaneously, generalizing previous probabilistic QUE results in [Bourgade, Yau 2017] and [Bourgade, Yau, Yin 2020].

1. Introduction. Since the groundbreaking discovery of E. Wigner [ ], postulating that Hermitian random matrices can effectively model the universal statistics of gaps between energy levels of large atomic nuclei, simple random matrices have routinely been used to replace more complicated quantum Hamiltonians in many other physically relevant problems, especially in disordered or chaotic quantum systems. A fundamental phenomenon in such systems is Quantum Ergodicity (QE), stating that the eigenvectors tend to become uniformly distributed in phase space.
In this paper we study an enhanced version of this question, Quantum Unique Ergodicity (QUE), for real or complex Wigner matrices and for general observables. We recall that the Wigner matrix ensemble consists of N × N random Hermitian matrices W = W* with centred, independent, identically distributed (i.i.d.) entries up to the symmetry constraint w_ab = w̄_ba. Let {u_i}_{i=1}^N be an orthonormal eigenbasis of W. Our main Theorem . asserts that for any deterministic matrix A with ‖A‖ ≤ 1 we have the limit ⟨u_i, A u_j⟩ → δ_ij ⟨A⟩ with very high probability (and hence uniformly in i, j) with the optimal speed of convergence 1/√N. Here we introduced the shorthand notation ⟨R⟩ := (1/N) Tr R for the normalized trace of any N × N matrix R. In other words, ( . ) establishes QUE in strong form (i.e. uniformly in i, j) for any Wigner matrix, and shows that the action of any bounded traceless deterministic matrix on the eigenbasis {u_i}_{i=1}^N makes it asymptotically orthogonal to itself (up to an optimal error N^{−1/2}). For genuinely complex Wigner matrices our second main Theorem . asserts an analogous statement for the overlaps between the eigenvectors u_i and their complex conjugates ū_j. A probabilistic version of this result was obtained earlier, but only for random matrices with a Gaussian component of size t ≫ 1/N and with an error of order 1/√(Nt). An off-diagonal version, ⟨u_i, A u_j⟩ → 0 for i ≠ j, coined quantum weak mixing, was also obtained in [ ] and strengthened in [ ]. Standard Green function comparison arguments may be used to remove the large Gaussian component, but only with a considerably suboptimal error, or under the extra assumption of matching the first several (in fact more than four) moments of the matrix elements of W with those of the Gaussian GOE/GUE ensembles.
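In display form, the main estimate discussed above can be written as follows (a reconstruction from the surrounding text, with the normalized-trace notation introduced there; ξ denotes an arbitrarily small exponent):

```latex
\max_{i,j\in[N]} \Bigl| \langle u_i, A u_j\rangle - \delta_{ij}\,\langle A\rangle \Bigr|
  \;\le\; \frac{N^{\xi}}{\sqrt{N}},
\qquad \langle A\rangle := \frac{1}{N}\operatorname{Tr} A,
```

with very high probability, for any deterministic matrix A with ‖A‖ ≤ 1.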
Summarizing, our Theorem . generalizes the probabilistic QUE proven in [ , Corollary . ] and in [ , Theorem . ] to general Wigner ensembles in three aspects: (i) the speed of convergence is optimal (up to an N^ε factor); (ii) the limit is controlled with very high probability; and (iii) it holds uniformly throughout the spectrum, including the bulk, the edge and the intermediate regime. For any deterministic Hermitian observable written in spectral decomposition A = Σ_{k=1}^N a_k |q_k⟩⟨q_k|, our main result shows that the fluctuations of N⟨u_i, q_k⟩⟨q_k, u_j⟩ are so strongly asymptotically independent for different k's that their average has the expected 1/√N fluctuation scaling reminiscent of the central limit theorem, up to an N^ε factor. In fact, in our companion paper [ , Theorem . ] we also show that the diagonal overlaps ⟨u_i, A u_i⟩, after a small averaging in the index i, satisfy a CLT.
Next we outline the novel ideas of our proof. We consider a spectrally averaged version Λ of the overlaps for bounded traceless observables, ⟨A⟩ = 0, ‖A‖ ≤ 1, where J = N^ε with some tiny ε > 0. Our goal is to show that Λ is essentially of order one, with high probability. Denoting by G = G(z) = (W − z)^{−1} the resolvent at z ∈ H, notice that, by spectral decomposition, such averaged overlaps can be expressed through quantities of the form ⟨Im G A Im G' A⟩, where η is slightly above the local eigenvalue spacing at E and ρ is the semicircular density at E smoothed out on scale η; the primed quantities are defined analogously. The main work consists in proving a high probability optimal bound on the quadratic functional of the resolvent ⟨GAGA⟩, with possible imaginary parts and at different spectral parameters. Note that for overlaps with a rank one observable, A = |q⟩⟨q|, it is sufficient to control ⟨u_i, A u_i⟩ = |⟨q, u_i⟩|². After a mild local averaging in the index i this becomes comparable with ⟨q, (Im G)q⟩, whose control is equivalent to a conventional single-G isotropic local law. This served as a natural input for the DBM proofs on eigenvectors in [ , ]. For traceless observables, however, ⟨u_i, A u_i⟩ does not have a sign, so we need to consider |⟨u_i, A u_i⟩|² to understand its size; hence the relevant quantity is Λ², containing two G factors ( . ), i.e. single-G local laws are not sufficient. For estimating ( . ) we face a combination of two serious difficulties. First, we need to gain an additional cancellation from the fact that A is traceless; second, we need to handle local laws for products of several G's. The first issue already arises on the level of a single-G local law: in Theorem . we will prove that the resolvent approximation ⟨G⟩ ≈ m by the Stieltjes transform m = m(z) of the Wigner semicircular density, commonly referred to as a local law, holds to a higher accuracy when tested against a traceless observable.
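For reference, the spectral decomposition mentioned above reads (writing z_k = E_k + iη_k, with eigenvalues λ_i of W, and with A* in the second slot for concreteness):

```latex
\bigl\langle \operatorname{Im} G(z_1)\, A\, \operatorname{Im} G(z_2)\, A^{*}\bigr\rangle
 \;=\; \frac{1}{N}\sum_{i,j=1}^{N}
   \frac{\eta_1\,\eta_2\,\bigl|\langle u_i, A u_j\rangle\bigr|^{2}}
        {|\lambda_i - z_1|^{2}\,|\lambda_j - z_2|^{2}},
\qquad
\operatorname{Im} G(z) \;=\; \sum_{i}\frac{\eta}{|\lambda_i - z|^{2}}\,|u_i\rangle\langle u_i| .
```

This identity shows directly how a two-resolvent quantity with two A-factors controls the squared overlaps |⟨u_i, A u_j⟩|².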
More precisely, for the decomposition GA = ⟨A⟩G + G(A − ⟨A⟩) and with ρ := |Im m|/π, we have estimates with both errors being optimal; in fact they identify the scale of the asymptotic Gaussian fluctuation of ⟨G⟩ and ⟨G(A − ⟨A⟩)⟩, respectively [ , ]. Note that the error term for the traceless part is much smaller than that for ⟨G⟩ in the relevant small η regime. For ⟨GAG*A⟩ the discrepancy is even bigger: without the zero trace assumption ⟨GAG*A⟩ ∼ 1/η (e.g. for A = I), while for ⟨A⟩ = 0 we will show that ⟨GAG*A⟩ ∼ 1 even for very small η.
The second issue touches upon the basic mechanism of the standard proof of local laws. It consists in deriving an approximate self-consistent equation for the quantity in question, e.g. ⟨GAGA⟩, and comparing it with the corresponding deterministic equation (Dyson equation) without approximation error. The main error term ⟨W GAGA⟩, a renormalized version of ⟨W GAGA⟩, see ( . ), is expected to be smaller than ⟨GAGA⟩, but when estimating its high moments by a cumulant expansion, many terms with traces of more than two G-factors emerge. Trivial a priori bounds using ‖G‖ ≤ 1/η are not affordable, so one has to continue expanding, resulting in higher and higher degree monomials in G; this is reminiscent of the notoriously difficult closure problem in the BBGKY hierarchy for the correlation functions of interacting particle dynamics. In the proof of the conventional local law ⟨G⟩ = m + O(1/Nη), the expansion is stopped by using the Ward identity GG* = Im G/η, reducing the number of G factors by one. However, with a deterministic matrix in between, as in GAG*, the Ward identity is not applicable. A trivial Schwarz bound followed by the Ward identity is available, but at the expense of replacing the traceless matrix A with the non-zero trace matrix AA*, hence losing the main cancellation effect, which we cannot afford. Our main idea is to use Λ from ( . ) as the basic control quantity and derive a stochastic Gronwall inequality for it. In doing so, we use the spectral decomposition of G to estimate traces of products of many G's and A's by the lower degree term ⟨GAGA⟩. Technically, this requires extracting sufficiently many Λ-factors in the cumulant expansion, which we achieve by a subtle Feynman graph analysis to estimate all high moments of |⟨W GAGA⟩|. Feynman diagrams have been systematically used to organize cumulant expansions, and their estimates come at different levels of sophistication, see e.g. [ , , ], but also related expansions in random matrices, e.g. [ , , ].
For the proof of ( . ) via cumulant expansion (e.g. following [ ]), it is sufficient to monitor the number of N-factors (from the size of the cumulants and from the summation of intermediate indices) and the number of ρ/η factors from the Ward identity. In the current analysis we additionally need to monitor the Λ factors. While the number of traceless A-factors is preserved along the expansion, the cancellation effect of some of them may be lost, as in ( . ). Our proof has to carefully offset all such losses by the gains from higher order cumulants that typically accompany the loss of effective A-factors. In particular, since we are aiming at an optimal bound, in the expansion terms that involve only second order cumulants we need to gain from all A-factors. We used a similar but much simpler expansion in our work on the CLT for non-Hermitian random matrices, see [ , Prop. . , Eq. ( . c)], where the additional smallness came from the large distance between the two (non-Hermitian) spectral parameters z1, z2. However, in [ ] it was sufficient to gain only a small proportion of all possible smallness factors, since we did not aim at the optimal bound. In the current paper, using refined combinatorics, we manage to extract the zero trace orthogonality effect to the maximal extent; this is the key to obtaining the optimal error bound in ( . ). Similarly, for the proof of ( . ) we manage to extract the asymptotic orthogonality effect between the eigenvectors u_i and their complex conjugates ū_i optimally, resulting in the bound |⟨GG^t⟩| ≲ 1, gaining a full power of η over e.g. ⟨GG*⟩ ∼ 1/η.
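The Ward identity invoked above is the elementary resolvent computation

```latex
G(z)\,G(z)^{*} \;=\; G(z)\,G(\bar z)
  \;=\; \frac{G(z) - G(\bar z)}{z - \bar z}
  \;=\; \frac{\operatorname{Im} G(z)}{\operatorname{Im} z},
\qquad\text{hence}\qquad
\langle G G^{*}\rangle \;=\; \frac{\langle \operatorname{Im} G\rangle}{\eta}.
```

With a deterministic matrix in between, as in GAG*, no analogous identity is available; this is precisely the obstruction described in the text.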
After this introduction and the presentation of the main results in the next section, we prove the local laws involving two resolvents in Section . The main inputs for them are the improved bounds on renormalized ("underlined") monomials in several G's in Theorem . , which are proven in Section . Note that even though we are interested in local laws with only two G's, due to the cumulant expansion we need to control arbitrarily long monomials involving products of G's and A's.

Notations and conventions.
We introduce some notations used throughout the paper. For integers k ∈ N we use the notation [k] := {1, . . . , k}. We write H for the upper half-plane H := {z ∈ C : Im z > 0}. For positive quantities f, g we write f ≲ g and f ∼ g if f ≤ Cg or cg ≤ f ≤ Cg, respectively, for some constants c, C > 0 which depend only on the constants appearing in ( . ). We denote vectors by bold-faced lower case Roman letters x, y ∈ C^k, for some k ∈ N. Vector and matrix norms, ‖x‖ and ‖A‖, indicate the usual Euclidean norm and the corresponding induced matrix norm. For any N × N matrix A we use the notation ⟨A⟩ := N^{−1} Tr A to denote the normalized trace of A.
Moreover, for vectors x, y ∈ C^N we define ⟨x, y⟩ := Σ_i x̄_i y_i and A_xy := ⟨x, Ay⟩, with A ∈ C^{N×N}. We will use the concept of "with very high probability", meaning that for any fixed D > 0 the probability of the (N-dependent) event is bigger than 1 − N^{−D} if N ≥ N0(D). Moreover, we use the convention that ξ > 0 denotes an arbitrarily small constant which is independent of N.
2. Main results. We consider real symmetric or complex Hermitian N × N Wigner matrices W . We formulate the following assumptions on the entries of W .
Assumption . . The matrix elements w_ab are independent up to the Hermitian symmetry w_ab = w̄_ba. We assume identical distribution in the sense that w_ab = N^{−1/2} χ_od for a ≠ b and w_aa = N^{−1/2} χ_d, with χ_od being a real or complex random variable and χ_d being a real random variable such that E χ_od = E χ_d = 0 and E |χ_od|² = 1. In the complex case we also assume that E χ²_od ∈ R. In addition, we assume the existence of all high moments of χ_od, χ_d, i.e. that there exist constants C_p > 0, for any p ∈ N, such that ( . ) holds. In this paper we use the notations w2 := E χ²_d, σ := E χ²_od and their commonly occurring combination w̃2 := w2 − 1 − σ, and note that w2, w̃2, σ ∈ R.
Our first main result is the proof of the Eigenstate Thermalization Hypothesis, which in mathematical terms is the proof of an optimal convergence rate in the strong form of Quantum Unique Ergodicity (QUE) for general observables, uniformly in the spectrum of W .
with very high probability for any arbitrarily small ξ > 0.
The first relation in Theorem . states that any deterministic observable is essentially diagonal in the eigenbasis of W ; in other words, the eigenvectors remain asymptotically orthogonal when tested against any traceless observable, ⟨A⟩ = 0. The second relation shows the same phenomenon between the eigenbasis {u_i}_{i∈[N]} of W and the eigenbasis {ū_i}_{i∈[N]} of W^t. The scalar products ⟨u_i, ū_j⟩ appearing in ( . ) can also be identified. Indeed, the next theorem shows that these two eigenbases are essentially orthogonal, apart from the extreme cases σ = ±1 (see Remark . below).
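A quick numerical illustration of the first relation is possible. The following sketch (our own illustration, not part of the paper's argument; the size N = 300, the random seed, and the observable A = diag(1, . . . , 1, −1, . . . , −1) are arbitrary choices) samples a GOE Wigner matrix and checks that all overlaps ⟨u_i, A u_j⟩ against a traceless A are of size N^{−1/2}:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 300

# GOE Wigner matrix: real symmetric, E|w_ab|^2 = 1/N off the diagonal
X = rng.standard_normal((N, N))
W = (X + X.T) / np.sqrt(2 * N)

# traceless observable with <A> = 0 and ||A|| = 1
A = np.diag(np.concatenate([np.ones(N // 2), -np.ones(N // 2)]))

_, U = np.linalg.eigh(W)      # columns of U are the eigenvectors u_i
overlaps = U.T @ A @ U        # matrix of overlaps <u_i, A u_j>

# ETH/QUE prediction: every entry is O(N^{-1/2}) up to a log factor,
# so the rescaled maximum is an order-one number
max_overlap = np.abs(overlaps).max()
print(max_overlap * np.sqrt(N))
```

For an observable with ⟨A⟩ ≠ 0 the diagonal entries would instead concentrate around ⟨A⟩, in line with the theorem.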
The main inputs to prove Theorems . -. are local laws for one and two resolvents (and their transposes) tested against matrices A with ⟨A⟩ = 0. We recall that in the limit N → ∞ the resolvent G = G(z) = (W − z)^{−1} becomes approximately deterministic. Its deterministic approximation is given by m = m_sc, the Stieltjes transform of the Wigner semicircular law, which is the unique solution of the quadratic equation m² + zm + 1 = 0 satisfying Im m(z) Im z > 0. We note that |m| ≤ 1 for any z. In this paper we allow spectral parameters with Im z < 0, in order to conveniently account for possible adjoints of the resolvent, since G(z)* = G(z̄). Therefore, in contrast with most papers on local laws, Im m_sc may be negative, and we define ρ = ρ_sc(z) := π^{−1} |Im m_sc| and η := |Im z|. The classical local law (see e.g. [ , , ]) for a single resolvent in averaged and isotropic form states that in the spectral regime {z : Nρη ≥ 1} the bounds hold for any deterministic matrix A and vectors x, y, with ‖A‖, ‖x‖, ‖y‖ ≲ 1. Here ≺ indicates the commonly used concept of stochastic domination (see, e.g. [ ]), indicating a bound with very high probability up to a factor N^ε for any small ε > 0, uniformly in A, x, y and in the spectral parameter z as long as Nρη ≥ 1. The precise definition is as follows: if X = X^(N)(u) and Y = Y^(N)(u) are families of non-negative random variables indexed by N, and possibly by some parameter u, then we say that X is stochastically dominated by Y if for all ε, D > 0 we have P(X^(N)(u) > N^ε Y^(N)(u)) ≤ N^{−D} for N ≥ N0(ε, D), uniformly in u. In this case we use the notation X ≺ Y or X = O≺(Y ).
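For concreteness, the quadratic equation and its explicit solution are

```latex
m_{\mathrm{sc}}^{2}(z) + z\, m_{\mathrm{sc}}(z) + 1 = 0,
\qquad
m_{\mathrm{sc}}(z) = \frac{-z + \sqrt{z^{2}-4}}{2},
\qquad
\rho_{\mathrm{sc}}(x) = \frac{1}{2\pi}\sqrt{(4-x^{2})_{+}},
```

with the square-root branch chosen so that Im m_sc(z) Im z > 0. The bound |m_sc| ≤ 1 follows since the two roots of the quadratic multiply to 1, and m_sc is the root inside the closed unit disk.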
Our key new insight is that whenever the deterministic matrix A in ( . ) is traceless, ⟨GA⟩ is considerably smaller (by a factor of √(ρη) in the interesting small η regime) than the general bound ( . ) predicts. There is no such improvement for the isotropic law.
Theorem . (Traceless single-G local law). Fix ε > 0, let W be a Wigner matrix satisfying Assumption . , let z ∈ C \ R, and let G(z) = (W − z)^{−1}. We use the notations η := |Im z|, ρ = ρ_sc(z), m = m_sc(z). Then for Nηρ ≥ N^ε and for any deterministic matrix A with ⟨A⟩ = 0 and ‖A‖ ≲ 1, we have the improved bound. We prove a similar drastic improvement owing to traceless observables for local laws involving two resolvents, like ⟨GAGA⟩, as well as local laws involving a resolvent and its transpose, ⟨GG^t⟩. The isotropic laws are also improved in this case. The precise statements will be given in Remark . in Section . We close the current section with a remark indicating the optimality of the new local law ( . ).
Remark . . The local law for ⟨GA⟩ in ( . ) is optimal for G, G*, as well as Im G. Indeed, a simple calculation from [ , Theorem . ] shows the corresponding variance for Im z > 0. In fact, in our companion paper we prove that ⟨GA⟩ is asymptotically Gaussian with zero expectation and variance given in ( . ) (see [ , Eq. ( )]). This variance is much smaller than the one without a traceless observable.
For integers J ∈ N and self-adjoint matrices B = B* we introduce the J-averaged observables. Since J = N^ε with ε > 0 arbitrarily small, together with the definition of ≺ in Definition . , the bound in ( . ) implies that |⟨u_i, A u_j⟩|² + |⟨u_i, A ū_j⟩|² ≺ N^{−1}, concluding the proof of Theorem . . The proof of Theorem . is completely analogous and so omitted.
As a first step towards the proof of Theorem . , we first show that Ξ^B_J, Ξ̃^B_J are comparable with ⟨G1 B G2 B⟩, ⟨G1 B G^t_2 B⟩ for suitably chosen spectral parameters in the resolvents G_i = G(z_i). For any J ∈ N and E ∈ [−2, 2] we define z = z(E, J) = E + iη(E, J) ∈ H implicitly via the equation Nη(E, J)ρ(z(E, J)) = J. Note that this equation has a unique solution η(E, J) > 0, since the function η ↦ η Im m(E + iη) is strictly increasing from 0 to 1. The following simple lemma will be proven at the end of this section.
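The implicit definition of η(E, J) is easy to realize numerically. The following sketch (our own illustration; the values of N, J and E are arbitrary choices) solves Nηρ(E + iη) = J by bisection, using the monotonicity noted above:

```python
import numpy as np

def msc(z):
    """Stieltjes transform of the semicircle law; branch with Im m > 0 for Im z > 0."""
    m = (-z + np.sqrt(z * z - 4)) / 2
    if m.imag <= 0:
        m = (-z - np.sqrt(z * z - 4)) / 2   # the two roots multiply to 1
    return m

def eta_of(E, J, N, eta_max=10.0, tol=1e-12):
    """Solve N * eta * rho(E + i eta) = J by bisection, rho = Im msc / pi.
    The map eta -> N * eta * rho is strictly increasing, so the root is unique."""
    f = lambda eta: N * eta * msc(complex(E, eta)).imag / np.pi - J
    lo, hi = 1e-14, eta_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

N, J, E = 10_000, 10.0, 0.5
eta = eta_of(E, J, N)
rho = msc(complex(E, eta)).imag / np.pi
print(eta, N * eta * rho)   # N * eta * rho recovers J
```

Since η Im m(E + iη) increases from 0 to 1, the equation is solvable for any 1 ≤ J ≤ N, consistent with the choice J = N^ε.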
As the main inputs for Theorem . , we now state the various local laws and bounds for products of G's, their transposes and deterministic matrices in the following Propositions . -. . These bounds still involve the key control quantities Λ and Π. Using these bounds, we will prove Theorem . by a Gronwall argument on Λ and Π that will immediately imply Theorem . . Finally, for completeness, we also state a few representative local laws involving two resolvents in Remark . . The key technical Propositions . -. will be proven in Section .
We use the notational convention that the letter A denotes traceless matrices, while B denotes arbitrary matrices.
Proposition . . Let A = A* be a deterministic matrix with ⟨A⟩ = 0. Fix ε > 0 and consider z ∈ C \ R such that L := Nηρ ≥ N^ε. Then for G = G(z) the stated bounds hold. Proposition . . Let A = A*, A' = (A')* be deterministic matrices with ⟨A⟩ = 0 = ⟨A'⟩. Fix ε > 0, let W be a Wigner matrix satisfying Assumption . , let z1, z2 ∈ C \ R, and let G_i = G(z_i), for i ∈ {1, 2}. We use the notations η_i := |Im z_i|, ρ_i = ρ_sc(z_i), m_i = m_sc(z_i), and set L := N min_i(η_iρ_i), η_* := η1 ∧ η2 and ρ_* := ρ1 ∨ ρ2. Then for L ≥ N^ε and setting Λ^A_+ := Λ^A_L + ‖A‖, Π_+ := Π_L + 1, we have the averaged local laws, where G^(t) indicates that the bounds are valid for both choices G or G^t. Moreover, for any deterministic vectors x, y such that ‖x‖ + ‖y‖ ≲ 1, we have the isotropic law. Additionally, for |σ| < 1 we have two further bounds, where the error is uniform in |σ| ≤ 1 − ε', for any fixed ε' > 0.
Using Lemma . and Propositions . -. as inputs, we now conclude the proof of Theorem . .
Proof of Theorem . . We start with the proof of ( . ). Choose J = N^ε with a fixed, arbitrarily small ε > 0, and E1, E2 ∈ [−2, 2]. Then by the definition of z(E_i, J) = E_i + iη(E_i, J) above Lemma . it follows that J = Nη(E1, J)ρ(z(E1, J)) = Nη(E2, J)ρ(z(E2, J)), and thus we obtain the claimed estimate from ( . ). By a standard grid argument, using the Lipschitz continuity of the resolvent, we conclude that ( . ) remains valid after taking the supremum over E1, E2 ∈ [−2, 2], and therefore from the lower bound in Lemma . we conclude the bound on Λ. Finally, by ( . ) it follows that (Λ^A_J)² ≺ 1, concluding the proof of ( . ). The proof of ( . ) is completely analogous to the proof of ( . ) above, using the local law ( . ). This concludes the proof of Theorem . .

Proof of Theorem . . This theorem immediately follows from Proposition . together with Λ^A_L ≺ 1 obtained in Theorem . .
Remark . . Proposition . , combined with the bound Λ^A_+ + 1(|σ| < 1)Π_+ ≺ 1 obtained in Theorem . , also provides local laws involving two resolvents as counterparts of the single-G local law stated in Theorem . ; for example, with the notations of Proposition . , we obtain such laws for general z1, z2, and for |σ| < 1 also their versions with transposes. We stated only the local laws for two resolvents where the asymptotic orthogonality mechanism is detected, i.e. if a traceless deterministic matrix is present, or if G and G^t appear next to each other and |σ| < 1. Note that when both mechanisms are simultaneously present, as in the terms with transposes in ( . ), one may gain from both effects simultaneously, but we refrain from doing so here. For comparison, we also list some local laws without exploiting this mechanism, valid for any |σ| ≤ 1; these relations are proven in our companion paper [ ] using Theorem . of the present paper. In the most interesting critical case z := z1 = z̄2 with η = |Im z| ≪ 1, the leading term in ( . ) is of order 1/η with a large error 1/(Nη²), while the leading term in ( . ) is bounded (even zero in the isotropic case) with a negligible error term. The leading terms in ( . ) and ( . ) are of course the same, but the error term in ( . ) is much bigger, since it ignores the asymptotic orthogonality mechanism.
Note that, using the decomposition B = ⟨B⟩ + B̊, where B̊ := B − ⟨B⟩ is the traceless part of B, a combination of ( . )-( . ) trivially gives local laws for any product of the form ⟨G B G^(t) B'⟩ for arbitrary deterministic matrices B and B'.
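Explicitly, writing B = ⟨B⟩ + B̊ and B' = ⟨B'⟩ + B̊' and expanding by linearity and cyclicity of the trace:

```latex
\langle G B G^{(t)} B'\rangle
 = \langle B\rangle\langle B'\rangle\,\langle G G^{(t)}\rangle
 + \langle B\rangle\,\langle G G^{(t)} \mathring{B}'\rangle
 + \langle B'\rangle\,\langle G \mathring{B} G^{(t)}\rangle
 + \langle G \mathring{B} G^{(t)} \mathring{B}'\rangle,
\qquad \mathring{B} := B - \langle B\rangle .
```

Each of the four terms is covered by one of the stated local laws, with the traceless parts B̊, B̊' benefiting from the orthogonality gain.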
Finally, we close this section by proving Lemma . .
Proof of Lemma . . We only consider ⟨G1 B G2 B⟩; the proof of the bounds for ⟨G1 B G^t_2 B⟩ is completely analogous and so omitted.
For E1, E2 ∈ [−2, 2], we recall that by the definition of z(E_i, J) = E_i + iη(E_i, J) above Lemma . it follows that J = Nη(E1, J)ρ(z(E1, J)) = Nη(E2, J)ρ(z(E2, J)), and thus, together with ( . ), we conclude that there is a constant C such that the comparison relation holds for any a, a0. With a slight abuse of notation we will write this relation with ∼, since the implicit constants will be irrelevant for our analysis. Using the short-hand notations Ξ = Ξ^B_J, z_i = z(E_i, J), G_i = G(z_i), η_i = η(E_i, J), and ρ_i = ρ(z_i), by ( . ) and writing ⟨G1 B G2 B⟩ in spectral decomposition, we obtain the claimed comparison with very high probability on the set where the rigidity bound ( . ) holds. The lower bound in ( . ) is trivial by choosing E1 = γ_{a0} and E2 = γ_{b0}. To prove the upper bound in ( . ) we use the local averaging formula for general non-negative R_{ab}, S_{ab} such that R_{ab} ∼ R_{a0 b0} whenever |a − a0| ∨ |b − b0| ≤ J. This is applicable for R_{ab} in ( . ) as a consequence of the rigidity bound in ( . ), the relation in ( . ), and the fact that we can choose ξ > 0 so that N^ξ, coming from the rigidity high probability bound, is much smaller than J ≥ N^ε. Finally, we note the resulting bound.

For any given functions f, g of the Wigner matrix W we define the renormalisation of the product g(W)W f(W) (denoted by an underline) as follows: ( . ), where ∂_{W̃} f(W) denotes the directional derivative of the function f in the direction W̃ at the point W, and W̃ is an independent copy of W. The definition is chosen such that it subtracts the second order term in the cumulant expansion; in particular, if all entries of W were Gaussian, then we would have E g(W)W f(W) = 0 for the underlined product. Note that the definition ( . ) only makes sense if it is clear to which W the underline refers, i.e. it would be ambiguous if f(W) = W. In our applications, however, each underlined term contains exactly a single W factor, and hence such ambiguities will not arise. As a special case we have the identity, recalling the notation w̃2 = w2 − 1 − σ from Assumption . . Then by ( .
) and ( . ), it follows that the self-consistent equation holds. From ( . ) one can already see that in order to get a local law for ⟨G⟩ it is essential to estimate the underlined term W G in an averaged and isotropic sense. In order to prove Proposition . we need bounds for underlined terms involving not only one G but also two G's (see e.g. ( . ) below). We now state the bounds for these terms and for longer products of resolvents and deterministic matrices, both in an averaged and in an isotropic sense, since the proof for products with more than two resolvents is very similar to the cases we need.
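As an illustration of why the subtraction in the renormalisation is natural, consider the Gaussian reference case (GUE, so σ = 0 and w̃2 = 0), where Ẽ[W̃ R W̃] = ⟨R⟩·I for any deterministic R; the following is our own sketch of this standard computation, using ∂_{W̃}G = −G W̃ G and (W − z)G = I:

```latex
\underline{WG} \;=\; WG \;+\; \widetilde{\mathbb{E}}\bigl[\widetilde{W}\, G\, \widetilde{W}\bigr]\, G
          \;=\; WG \;+\; \langle G\rangle\, G,
\qquad\text{hence}\qquad
\bigl(z + \langle G\rangle\bigr)\, G \;=\; -I \;+\; \underline{WG}\, .
```

Taking the normalized trace and neglecting the small fluctuation ⟨underlined WG⟩ recovers the self-consistent equation ⟨G⟩ ≈ −1/(z + ⟨G⟩), i.e. the quadratic equation for m_sc.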
For l ∈ N we consider renormalised alternating products of resolvents G1, . . . , G_l and deterministic matrices B1, . . . , B_l, in averaged and isotropic form. Each resolvent G_k is evaluated at a (potentially) different spectral parameter z_k ∈ C \ R, and besides G_k = G(z_k) we allow each resolvent to be transposed and/or replaced by its imaginary part, i.e.
Note that adjoints of resolvents can be included in the products by conjugating the spectral parameter, since G(z)* = G(z̄). For a given product of the form ( . ) we consider three sets i, a, t of indices, recording special structural properties of G_k, B_k. By definition, the set i ⊂ [l] collects the indices of the resolvents taken as imaginary parts. For the choice of the sets a, t we allow a certain freedom, described in the theorem.
Theorem . . Fix ε > 0, let l ∈ N, z1, . . . , z_l ∈ C \ R, and for k ∈ [l] let G_k be as in ( . ) and B_k be deterministic N × N matrices with ‖B_k‖ ≲ 1, and let x, y be deterministic vectors with bounded norms. With η_k := |Im z_k|, ρ_k := ρ(z_k) = |Im m(z_k)|/π, assume L ≥ N^ε and η_* ≲ 1. Let a, t denote disjoint sets of indices, a ∩ t = ∅, such that for each k ∈ a we have ⟨B_k⟩ = 0, and for each k ∈ t exactly one of G_k, G_{k+1} is transposed, where in the averaged case and for k = l it is understood that G_{l+1} = G_1. Recall the notations Π_+ := Π_L + 1, Λ^B_+ := Λ^B_L + ‖B‖. Then, with a := |a|, t := |t|, we have the following bounds, and for any 0 ≤ j < l we have the isotropic bound, where the j = 0 case is understood as ⟨x, W G1B1 · · · B_{l−1}G_l y⟩.
In case ∏_{k∈i} ρ_k ≤ (ρ_*)^{b+1}, the bounds ( . )-( . ) remain valid if the right-hand sides are multiplied by the factor (ρ_*)^{−b−1} ∏_{k∈i} ρ_k, where b := l in case of ( . ), b := l − a − t in case of ( . ), and b := l − a − t − 1 in case of ( . ). Moreover, for any η_* ≥ 1 we have analogous bounds.

Remark . (Asymptotic orthogonality effect). The main results of Theorem . are ( . ) and its isotropic counterpart ( . ). The essential part is the factor (√(Nη_*))^{a+t} in ( . ), since the additional factors Π_+ and Λ_+ are a posteriori shown to be essentially O(1), c.f. Theorem . . Compared with the robust bound ( . ), in the relevant small η_* regime the bound ( . ) represents an improvement of √(Nη_*) for each occurrence where the asymptotic orthogonality can be exploited, either due to a traceless matrix B or to a switch between a resolvent and its transpose. In addition, compared to the robust averaged bound ( . ), there is a further improvement of ρ_*/(Nη_*) in ( . ) if at least one orthogonality effect is exploited, enabling the optimal ⟨GA⟩ local law in Theorem . . We note that in the case √(Nη_*) ≲ 1 the robust bounds ( . ) and ( . ) with a + t = 0 are always available, also in the presence of traceless deterministic matrices and alternating G, G^t, simply by choosing the sets a, t to be empty.
Remark . (Alternative renormalisation). In ( . ) we defined the renormalisation with respect to an independent copy of W , while in some previous papers [ ] the same notation was used to denote the renormalisation with respect to a suitable reference ensemble (e.g. the GUE ensemble in the present paper, or the complex Ginibre ensemble in the case of [ ]). However, these two possible definitions mostly differ only in some sub-leading terms; for example, one may denote the renormalisation with respect to an independent GUE matrix separately. The difference between the two renormalisations becomes relevant in Theorem . only whenever at least one transposed resolvent occurs, since then the two renormalisations differ by a non-negligible term.

However, this is the only relevant case, and the statement of Theorem . holds true verbatim if the renormalisation with respect to the reference ensemble is used instead.
Using the bounds for the underlined terms in Theorem . , we now conclude the proofs of Propositions . -. . We start with the proof of the local law for ⟨GA⟩, and then we prove the local laws and bounds for two G's.
Proof of Proposition . . Using the equation for G in ( . ), we start by writing the equation for ⟨GA⟩, where we recall the definition of W G in ( . ). Then, taking the average in ( . ), using that ⟨A⟩ = 0 and that |⟨G − m⟩| ≺ (Nη)^{−1} by the first bound in ( . ), we conclude the first estimate, where in the last inequality we used Lemma . and the notation Λ_+ := Λ^A_L + ‖A‖. Additionally, we also have a second estimate, where we used that |G_ii| + |(GA)_ii| ≺ 1 by ( . ). Combining ( . )-( . ), we finally conclude the claim, where we used |⟨W GA⟩| ≺ Λ_+ ρ^{1/2} N^{−1} η^{−1/2} by ( . ). This concludes the proof of ( . ).
Next, using the local law for a single resolvent proven above, we proceed with the proof of the bounds for products of two resolvents and deterministic traceless matrices.
Proof of Proposition . . We start by writing the equation for generic products of two G's, ⟨G1 B1 G2 B2⟩, where G_i = (W − z_i)^{−1} and B1, B2 are deterministic matrices, using the equation ( . ) for G1 B1 with W G1 from ( . ). The identity in ( . ) follows from the definition of the underline in ( . ).

Remark . . For notational simplicity, throughout the proof of Proposition . we use uniform notations rather than distinguishing the different Λ's. However, the proof naturally yields a factor of Λ^A_+ for each traceless A, giving the bounds in Proposition . .
Proof of the bounds ( . ). We focus only on the proof of the bound for ⟨G1 G2 A⟩; the bounds for ⟨G^t_1 G2 A⟩, ⟨G1 A G2⟩, ⟨G^t_1 A G2⟩, ⟨G1 A' G2⟩, and ⟨G^t_1 A' G2⟩ are completely analogous and so omitted, modulo the bound for the underlined term. In particular, the bound in ( . ) has to be replaced accordingly. Then, using Cauchy-Schwarz, we obtain the estimate, where we used the Ward identity, that ⟨Im G1⟩ ≺ ρ1, and that K = Nη_*ρ_*. In the penultimate inequality of ( . ) we also used Lemma . to prove that ⟨Im G2 A Im G^t_1 A⟩ ≺ ρ1ρ2Λ²_+. Using exactly the same computations, we conclude the same bound for ⟨(G1 G2)^t G2 A⟩ as well. Now we show that the terms with a pre-factor w̃2 are negligible. We start with the first such term, whose bound is obtained using |G_ii| ≺ 1, by the isotropic law ( . ), and |(G1 G2 A)_ii| ≺ ρ1ρ2/(η1η2) by a simple Schwarz inequality. The bound for |⟨diag(G1 G2) G2 A⟩| is analogous and so omitted. Combining ( . ) with ( . )-( . ), using that |⟨G2 A⟩| ≺ √ρ2 N^{−1} η2^{−1/2} by ( . ), and dividing by the factor on the lhs. of ( . ), we conclude the bound, where to go from the first to the second line we used that |⟨G2 A⟩|Λ_+ ≺ ρ by ( . ). This concludes the proof of the bound on |⟨G1 G2 A⟩|.
Proof of the bound in ( . ) for ⟨x, G1 A G2 y⟩. Using the bound for ⟨G1 G2 A⟩ and the estimates in Lemma . below as inputs, the proof follows by exactly the same computations as in the proof of the bound for |⟨G1 G2 A⟩|.
Proof of the local laws for two resolvents in ( . ) and ( . ). We focus only on the proof of the local law for ⟨G1 A G2 A⟩; the proof of the local law for ⟨G^t_1 A G2 A⟩ is exactly the same. The proof of the local law for ⟨G1 G^t_2⟩ is also analogous to the proof of the local law for ⟨G1 A G2 A⟩, with the only difference that the multiplicative factor on the rhs. of ( . ) has to be replaced accordingly. This difference does not create any change, since for |σ| < 1 the stability factor 1 − σm1m2 is bounded from below by 1 − |σ|.
In order to conclude the proof of Proposition . , we are only left with the averaged local laws for ⟨Im G1 A Im G2 A⟩ and ⟨Im G1 A Im G^t_2 A⟩ in ( . ), and for ⟨Im G^t_1 Im G2⟩ in ( . ).
Proof of the local laws in ( . ). We present only the proof of the local law for ⟨Im G1 A Im G2 A⟩; the proof for ⟨Im G1 A Im G^t_2 A⟩ is identical and so omitted. We start with the formula analogous to ( . ) but with Im G's instead of G's, generating altogether twelve terms with a 1/N pre-factor. Ten of them can be estimated by Λ²_+ ρ1ρ2 L^{−1}, exactly as in ( . )-( . ), by writing out 2i Im G_i = G_i − G*_i. Note that whenever the analogue of ( . ) is used, but with Im G1 instead of G1, we can gain the necessary factor ρ1ρ2/(η1η2) instead of only ρ1/η1 in the first Schwarz inequality in ( . ). Keeping the two special 1/N terms, this gives the expansion. The two remaining 1/N terms, where Im G2 is separated from Im G1 by A's, are estimated next, where in the last inequality we used the Ward identity and Lemma . below. Inserting ( . ), the local law |⟨G2 − m2⟩| ≺ (Nη2)^{−1} and ( . )-( . ) into ( . ), we conclude the intermediate estimate. Finally, combining ( . ) with the bounds for W Im G1 A Im G2 A and W Im G1 A Im G^t_2 A in ( . ), we conclude the claim.

Proof of the local law for ⟨Im G1 Im G^t_2⟩ in ( . ). We closely follow the proof for ⟨Im G1 A Im G2 A⟩, hence we only explain the differences. Just as each traceless A, A' between two resolvents gave rise to a factor Λ_+ in the proof for ⟨Im G1 A Im G2 A⟩, here the fact that a resolvent is followed by its transpose gives rise to a factor Π_+. Keeping this modification in mind, in the basic equation for ⟨Im G1 Im G^t_2⟩ we can again estimate all the 1/N terms, as in ( . )-( . ) and ( . ), by (1 + Π²)ρ1ρ2L^{−1}. Then, using the local law |⟨G_i − m_i⟩| ≺ (Nη_i)^{−1}, similarly to ( . ), we conclude the analogous estimate, where we used Π_+ := 1 + Π. Note that several "large" terms remain in ( . ), in contrast to ( . ), since the analogues of ⟨G2 A⟩ and ⟨G*_2 A⟩ in ( . ) are now not small. Then, using the bounds in ( . ) for the underlined terms in ( . ) and the local laws, we conclude the claim. We remark that the second local law in ( . ) follows analogously to ( . ). Finally, writing G*_2 in the last term on the rhs. of ( . ) as G2 − 2i Im G2 concludes the proof of Proposition . .
For the sake of simpler notation we abbreviate, within this section, $\rho^i$ and $\Lambda_+^a$ with $i := |\mathbf{i}|$, $a := |\mathbf{a}|$ and $\Lambda_+ := \max_{k\in\mathbf{a}} \Lambda_+^{B_k}$, rather than carrying the products $\prod_{k\in\mathbf{i}} \rho_k$ and $\prod_{k\in\mathbf{a}} \Lambda_+^{B_k}$. Within the formal proof of Theorem . we argue, however, that the proof naturally yields the latter. In order to present the main body of the proof of Theorem . more concisely we first make four simplifying assumptions: (A-i) we assume that $w_2 = 1 + \sigma$, (A-ii) we consider the regime $\eta \lesssim 1$, (A-iii) for the averaged case we assume that $l \in \mathbf{a} \cup \mathbf{t}$ whenever $|\mathbf{a} \cup \mathbf{t}| \ne 0$, (A-iv) in the isotropic bound we only consider $j \ge 1$.
In Appendix A we address the necessary changes to remove each of these four simplifying assumptions.
. . Graphical representation of the cumulant expansion. Using multiple cumulant expansions we expand the high moments $\mathbf{E}\,|\operatorname{Tr} \underline{W G_1B_1\cdots G_lB_l}|^{2p}$ and $\mathbf{E}\,|\langle x, G_1B_1\cdots G_jB_j \underline{W} G_{j+1}B_{j+1}\cdots B_{l-1}G_l y\rangle|^{2p}$ as polynomials of resolvent entries for any $p\in\mathbf{N}$. More precisely, we iteratively use the expansion ( . ) with some explicit error term $\Omega_R$ (see e.g. [ , Proposition . ]) which for our application can easily be seen to be $O(N^{-2p})$ if $R = 12p$. Here for a $k$-tuple of double indices $\alpha = (\alpha_1, \dots, \alpha_k)$ we use the short-hand notation $\kappa(ab, (\alpha_1, \dots, \alpha_k)) := \kappa(w_{ab}, w_{\alpha_1}, \dots, w_{\alpha_k})$ for the joint cumulant of $w_{ab}, w_{\alpha_1}, \dots, w_{\alpha_k}$, and set $\partial_\alpha := \partial_{w_{\alpha_1}}\cdots\partial_{w_{\alpha_k}}$, $\partial_{ab} := \partial_{w_{ab}}$. We wish to express the cumulant factors in ( . ) as a matrix with $a, b$ matrix elements. To encode the fact that cumulants have slightly different combinatorics for $a = b$ and $a \ne b$, we rewrite ( . ) using that cumulants are invariant under reordering their entries, so that $\kappa(ab, \alpha)$ can be expressed as the cumulant of $q+1$ copies $\{ab\}^{q+1}$ of $ab$ and $q'$ copies $\{ba\}^{q'}$ of $ba$. In order to simplify notation we introduce the matrices $\kappa^{q+1,q'}$ for integers $q, q' \ge 0$ with $q + q' \ge 1$, with matrix elements as in ( . ), so that ( . ) can be rewritten as ( . ), where we used that due to (A-i) we have $\kappa(\{\sqrt{N} w_{aa}\}^2) = w_2 = 1 + \sigma = \kappa^{1,1}_{aa} + \kappa^{2,0}_{aa}$. We begin with some examples before describing the general structure of the expansion. We consider the case $p = 1$ and $l = 2$ and perform a cumulant expansion, where $(\Delta^{ab})_{cd} = \delta_{ac}\delta_{bd}$. In order to compute the derivative of $\Im G$ we write $2\mathrm{i}\,\Im G = G - G^*$. By distributing the derivatives according to Leibniz' rule we can write ( . ) in expanded form, where we chose two representative terms for $k = 1$ and $k = 2$ each. By performing another cumulant expansion for the remaining underlined terms in ( . ) we obtain ( . ), where we again selected representative terms. We notice that the rhs.
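To make the cumulant matrices concrete, the following Monte Carlo sketch (with an assumed complex-Gaussian entry law, so that $\sigma = 0$) estimates the two second-order cumulants $\kappa^{1,1}_{ab} = \mathbf{E}\,|\sqrt{N} w_{ab}|^2$ and $\kappa^{2,0}_{ab} = \mathbf{E}\,(\sqrt{N} w_{ab})^2 = \sigma$; for centred entries these second-order joint cumulants coincide with plain mixed moments:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 200_000
# sample one off-diagonal entry w_ab of a complex Wigner matrix, M times;
# the symmetry constraint fixes w_ba = conj(w_ab)
w_ab = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2 * N)
w_ba = np.conj(w_ab)
# kappa^{1,1}: E[(sqrt(N) w_ab)(sqrt(N) w_ba)] = E|sqrt(N) w_ab|^2 = 1
kappa_11 = N * np.mean(w_ab * w_ba)
# kappa^{2,0}: E[(sqrt(N) w_ab)^2] = sigma, which vanishes for this entry law
kappa_20 = N * np.mean(w_ab ** 2)
assert abs(kappa_11 - 1) < 2e-2
assert abs(kappa_20) < 2e-2
```

For a real symmetric (GOE-type) ensemble one has $w_{ba} = w_{ab}$ instead, and the same computation returns $\sigma = 1$.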
can be written as a polynomial in the entries of two types of matrices: the $\kappa$-matrices representing cumulants like $\kappa^{2,1}$, and the $G$-matrices representing resolvents like $G$ or $G^*$, or their multiples with $A$ like $A\,\Im G$, $G^*A$. In order to achieve this representation we introduce additional internal summation indices to expand longer products, e.g. we write $(A\,\Im G\, A G^*)_{da} = \sum_e (A\,\Im G)_{de}(AG^*)_{ea}$. The value of any given graph is the numerical result of summing up all indices. The precise definition will be given later in ( . ); here, as an example, the first term in ( . ) with indicated summation indices $a, b, i, j$ is drawn as a graph whose (directed) edges represent matrices and whose vertices represent summation indices. The edge orientation indicates the order of indices of the represented matrix, which for the $G$-edges is uniquely determined from the expansion, while for $\kappa$-edges it may be chosen arbitrarily, as long as the represented matrix is defined consistently with the orientation, see ( . ) later. Here we drew the internal vertices as empty, and the $\kappa$-vertices as filled nodes; the $\kappa$-matrices as dashed, and the $G$-matrices as solid edges. Both internal and $\kappa$-vertices correspond to independent summations over the index set $[N]$. Thus, graphically we can represent ( . ) accordingly. Note that the dashed edges connect only filled nodes and they form a perfect matching. The number of $G$-edges adjacent to each filled vertex is equal to the order of the corresponding cumulant expansion.
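The notion of a graph value as an independent sum over all vertex indices can be illustrated on a toy contraction; the matrices below are hypothetical stand-ins for one $\kappa$-matrix (dashed edge) and two $G$-matrices (solid edges), and the point is only that the explicit index sum agrees with the vectorized contraction:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30
# toy stand-ins: one kappa-matrix and two G-matrices on a two-vertex graph
kappa = rng.standard_normal((N, N))
G1 = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
G2 = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
# graph value: independent summations over the vertex indices a, b in [N],
# each edge contributing the matrix entry selected by its endpoint indices
val_loops = sum(kappa[a, b] * G1[a, b] * G2[b, a]
                for a in range(N) for b in range(N))
val_einsum = np.einsum('ab,ab,ba->', kappa, G1, G2)  # same contraction, vectorized
assert np.isclose(val_loops, val_einsum)
```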
Similarly, for the isotropic case we obtain, for example, polynomials like $\mathbf{E}\,|\langle x, GA\underline{W}\,\Im G\, y\rangle|^{2}$, which we represent graphically with external vertices drawn as solid squares; the vectors $x$ and $y$ are naturally represented by these external vertices.
After these examples we now explain the general structure of the graphs and give a precise definition of the graphs and their values used in ( . ) and ( . ).
Definition . . We define the class $\mathcal{G}$ of oriented graphs used within this paper by the following requirements. Each $\Gamma = (V, E) \in \mathcal{G}$ has three types of vertices, $\kappa$-vertices $V_\kappa$, internal vertices $V_i$ and external vertices $V_e$, so that $V = V_\kappa \dot\cup V_i \dot\cup V_e$, and two types of edges, $\kappa$-edges $E_\kappa$ and $G$-edges $E_g$, so that $E = E_\kappa \dot\cup E_g$. For each vertex $v \in V$ we define its $G$- in- and out-degree $d_g^{\mathrm{in}}(v), d_g^{\mathrm{out}}(v)$ as the number of incoming and outgoing $G$-edges. The total degree $d_g(v)$ is defined as the sum $d_g(v) := d_g^{\mathrm{in}}(v) + d_g^{\mathrm{out}}(v)$ of in- and out-degrees, and the three vertex classes satisfy ( . ). We can partition $V_\kappa = \dot\bigcup_{k\ge2} V_\kappa^k$ with $V_\kappa^k := \{v \in V_\kappa \mid d_g(v) = k\}$. Within the graphs $\Gamma$ each external vertex $v \in V_e$ carries some $x(v) \in \mathbf{C}^N$ as a vector-valued label recording which vector the vertex represents. Each $\kappa$-edge $e \in E_\kappa$ carries two integer-valued labels $r(e) \ge 1$, $s(e) \ge 0$ recording the cumulant type. Each $G$-edge $e \in E_g$ carries six labels. The binary labels $i(e), t(e), *(e) \in \{0, 1\}$ indicate whether $e$ represents the imaginary part, the transpose and/or the adjoint of a resolvent. The scalar label $z(e)$ records the spectral parameter of the resolvent, and the matrix-valued labels $L(e), R(e)$ record deterministic matrices which are multiplied with the resolvent from the left/right.
We now relate the graphs to the polynomials they represent. Each internal vertex or $\kappa$-vertex $v$ corresponds to an independent summation $a_v \in [N]$. In order to unify notation we define a labelling map assigning $e_{a_v}$ to internal and $\kappa$-vertices, where $e_a$ is the $a$-th unit vector in the standard basis, and, for $v \in V_e$, the vector $x(v)$, the label of $v$ from Definition . . The $G$-edges $e \in E_g$ represent resolvents defined via the labels of $e$ from Definition . . We define the matrix $G^e$ as the resolvent $G(z(e))$ modified according to $i(e), t(e), *(e)$ and multiplied by $L(e), R(e)$ from the left/right. As an example, we set $G^e = B(\Im G(z))^t$ for $e \in E_g$ with $(i(e), t(e), *(e), z(e), L(e), R(e)) = (1, 1, 0, z, B, I)$.
We remark that for all $G$-edges $e$ considered in this paper at most one of the matrices $L(e), R(e)$ is different from the identity matrix $I$. The $\kappa$-edges $e \in E_\kappa$ represent $N \times N$ cumulant matrices $\kappa^e$ which are determined by the two integers $r(e), s(e)$ from Definition . , such that for $a \ne b$ the matrix element $\kappa^{(uv)}_{ab}$ is given by ( . ), where on the rhs. $\kappa$ was defined in ( . ). We note that $|\kappa^{(uv)}_{ab}| \lesssim 1$ by ( . ). Finally, we define the graph value as the numerical result of carrying out all these summations, see ( . ). Among the degree-2 vertices, the ones between edges representing matrices whose eigenvectors are asymptotically orthogonal are of particular importance. There are two different mechanisms for such orthogonality: (a) two resolvents, one with and one without transpose, stand next to each other, e.g. $GG^t$ or $G^*(A(\Im G)^t)$; (b) a traceless matrix $A$ stands between two resolvents, e.g. $(GA)G^*$ or $G(A(\Im G)^t)$. Note that in some cases, e.g. $(GA)(\Im G)^t$, both mechanisms can be present simultaneously, and hence a vertex can be a 0tr- and a t-vertex at the same time.
A degree-2 vertex $v$ is called a zero-trace-orthogonality vertex, or short 0tr-vertex, if exactly one of the two edges adjacent to $v$ represents a resolvent (which is allowed to be the imaginary part, transposed, or adjoint) multiplied by a traceless matrix on the side of $v$, while the other adjacent edge represents a resolvent matrix multiplied by the identity matrix on the side of $v$. More precisely, using the labels $L(e), R(e)$ of the edges, $v$ is defined to be a 0tr-vertex if one of the following three conditions is satisfied: (b.i) there are incoming/outgoing edges $(uv), (vw) \in E_g$ such that either $\langle R((uv))\rangle = 0$, $L((vw)) = I$ or $\langle L((vw))\rangle = 0$, $R((uv)) = I$; (b.ii) there are two outgoing edges $(vu), (vw) \in E_g$ such that either $\langle L((vu))\rangle = 0$, $L((vw)) = I$ or $\langle L((vw))\rangle = 0$, $L((vu)) = I$; (b.iii) there are two incoming edges $(uv), (wv) \in E_g$ such that either $\langle R((uv))\rangle = 0$, $R((wv)) = I$ or $\langle R((wv))\rangle = 0$, $R((uv)) = I$.

Proposition . (Cumulant expansion)
Let $\mathbf{a}, \mathbf{t}, \mathbf{i}$ be fixed sets as in Theorem . of sizes $a := |\mathbf{a}|$, $t := |\mathbf{t}|$, $i := |\mathbf{i}|$. Then for any $p \in \mathbf{N}$ there exists a finite ($N$-independent) family of graphs $\mathcal{G}_p = \mathcal{G}_p^{\mathrm{av}} \cup \mathcal{G}_p^{\mathrm{iso}} \subset \mathcal{G}$ such that ( . )-( . ) hold, and for each graph $\Gamma$ we may select two disjoint subsets $V_o^t \dot\cup V_o^{0tr} =: V_o$ of t- and 0tr-vertices, respectively, such that the following properties are satisfied: (P ) The graph $(V_\kappa, E_\kappa)$ is a perfect matching, in particular $|V_\kappa| = 2|E_\kappa|$. (P ) The number of $\kappa$-edges satisfies $1 \le |E_\kappa| \le 2p$.

(P ) The number of G-edges satisfies
and $d_g(u) = d_g(v) \ge 2$. Therefore we may define the $G$-degree of $(uv)$ as $d_g((uv)) := d_g(u) = d_g(v)$ and partition $E_\kappa = \dot\bigcup_{k\ge2} E_\kappa^k$ into $E_\kappa^k := \{e \in E_\kappa \mid d_g(e) = k\}$. (P ) Every $E_g$-cycle on $V_\kappa^2 \cup V_i$ must contain at least two $V_\kappa^2$-vertices; in particular there cannot exist isolated loop edges, and there are at most $|E_\kappa^2|$ cycles. (P ) Denoting the number of isolated cycles in $(V_\kappa \cup V_i, E_g)$ with $k$ vertices in $V_o$ by $n_{\mathrm{cyc}}^{o=k}$, we have ( . ). (P ) The numbers of selected internal 0tr- and t-vertices are given by ( . ), where in the averaged case $j := l$, and $j$ is determined by the lhs. of ( . ) in the isotropic case. (P ) If $j \in \mathbf{a}$ (with again $j := l$ in the averaged case), then the set of selected 0tr-vertices $V_o^{0tr}$ satisfies ( . ). The graphs $\Gamma \in \mathcal{G}_p^{\mathrm{av}}$ satisfy (P )-(P ) and in addition: (P av ) There are no external vertices, i.e. $V_e = \emptyset$. (P av ) The number of internal vertices satisfies $|V_i| = 2(l-1)p$.
The graphs $\Gamma \in \mathcal{G}_p^{\mathrm{iso}}$ satisfy (P )-(P ) and in addition: (P iso ) The number of external vertices is $|V_e| = 4p$; each $v \in V_e$ has degree $d_g(v) = 1$ and the unique connected vertex $u \in V$ with $(uv) \in E_g$ or $(vu) \in E_g$ satisfies $u \in V_\kappa$. (P iso ) The number of internal vertices satisfies $|V_i| = 2p(l-2)$.
Definition . . For some parameters $a, t, l, i, p \in \mathbf{N}$ we call graphs $\Gamma \in \mathcal{G}$, together with their selected sets $V_o^t, V_o^{0tr}$, satisfying (P )-(P ) and (P av )-(P av ) av-graphs, while we call graphs $\Gamma \in \mathcal{G}$ (together with the sets $V_o^t, V_o^{0tr}$ and the extra parameter $j \in [l-1]$) satisfying (P )-(P ) and (P iso )-(P iso ) iso-graphs.
Proof of Proposition . . In order to obtain ( . ) we iteratively perform cumulant expansions exactly as in the examples ( . ) and ( . ) until no underlined terms remain. Each cumulant expansion removes at least one underlined term, hence this process terminates. We now explain which kinds of $G$-edges are created through this cumulant expansion procedure for the averaged case ( . ), the isotropic case ( . ) being very similar. Initially, the graph representing the lhs. of ( . ) after writing out $|\operatorname{Tr} X|^{2p} = (\operatorname{Tr} X)^p(\operatorname{Tr} X^*)^p$ consists of $2p$ cycles, each with a $W$-factor and $l$ $G$-edges representing $G$-factors $G_kB_k$ or $B_k^*G_k^*$ for $k \in [l]$. Each of these $G$-factors can be fully described via the labels $i(e), t(e), *(e), z(e), L(e), R(e)$ from Definition . , the first four being determined by the form of $G_k$, while the latter two encode the multiplication from the left/right by deterministic matrices, e.g. $L(e) = I$, $R(e) = B_k$ for $G_kB_k$. While performing cumulant expansions of some $W = \sum_{ab} w_{ab}\Delta^{ab}$ using ( . ), these $G$-edges are modified and new $G$-edges are created via the action of derivatives, and $\kappa$-edges representing $\kappa(ab, \alpha)$ are also created. This process creates finitely many different graphs for every cumulant expansion, both through the explicit summation over cumulants in ( . ) and through the Leibniz rule for the derivative $\partial_\alpha$ acting on the product of all remaining $W$'s and $G$'s. We note that for resolvent derivatives we have $\partial_{ab}G = -G\Delta^{ab}G$. Hence, a derivative action on $e$ representing the $G$-factor $G^e = G_kB_k$ (or similarly $B_k^*G_k^*$) creates two $G$-edges $e_1, e_2$, such that only the resolvent representing $e_2$ is multiplied from the right by $R(e_2) = B_k$, while $L(e_2) = L(e_1) = R(e_1) = I$. The labels $t(e), z(e)$ indicating the transposition status and spectral parameter are directly inherited by both $e_1, e_2$, while the label $i(e)$ is inherited by exactly one of $e_1, e_2$, say $i(e_1) = 1$, the other one satisfying $i(e_2) = 0$, $*(e_2) \in \{0, 1\}$.
If $*(e) = 1$, $i(e) = 0$, then both $e_1, e_2$ satisfy $*(e_1) = *(e_2) = 1$. It follows inductively that each $G$-factor encountered in the expansion can be represented by an edge $e$ with six labels $i(e), t(e), *(e), z(e), L(e), R(e)$, with $L(e) = I$ or $L(e) = B_k^*$ for some $k$, while $R(e) = I$ or $R(e) = B_k$ for some $k$, with for each $e$ at least one of $L(e), R(e)$ being the identity. The spectral parameter label satisfies $z(e) \in \{z_1, \dots, z_l\}$ for each $e$. For example, the $ab$-derivative of the $G$-factor $G^e = B(\Im G(z))^t$ described by the edge $e$ with labels $(1, 1, 0, z, B, I)$ yields a sum of two terms, and hence the two new graphs given in ( . ). We now describe the selection of the orthogonality vertices $V_o^t, V_o^{0tr}$, which is done in two steps. To unify notation we set $j := l$ in the averaged case. (orth-) For each $k \in \mathbf{t} \setminus \{j\}$, $\mathbf{a} \setminus \{j\}$ we collect $2p$ distinct vertices from $V_i$ into the sets $V_o^t$ and $V_o^{0tr}$, respectively. (orth-) If $j \in \mathbf{t}$ or $j \in \mathbf{a}$, then we select one vertex from $V_\kappa^2$ into $V_o^t$ or $V_o^{0tr}$, respectively, for each $W$ acting as a degree-2 cumulant on some resolvent. Regarding (orth-), for $k \in \mathbf{a} \cup \mathbf{t} \setminus \{j\}$ the initial graphs representing the lhs. of ( . )-( . ) contain $p$ internal vertices $v_1, \dots, v_p$ between $G$-edges representing $(G_kB_k), (G_{k+1}B_{k+1})$ and $p$ internal vertices $v_{p+1}, \dots, v_{2p}$ between $G$-edges representing $(B_{k+1}^*G_{k+1}^*), (B_k^*G_k^*)$. The $G$-edges adjacent to these internal vertices may change due to derivative actions along the cumulant expansions; however, in case $k \in \mathbf{a}$, due to the derivative rules explained in the paragraph above it is ensured that at all times the two unique $G$-edges $e_1, e_2$ adjacent to $v_k$ satisfy $R(e_1) = B_k$, $L(e_2) = I$ for $k \le p$, and $R(e_1) = I$, $L(e_2) = B_k^*$ otherwise, so that $v_k$ is guaranteed to remain a 0tr-vertex. Similarly, for $k \in \mathbf{t}$ it is ensured that the two unique $G$-edges $e_1, e_2$ adjacent to $v_k$ satisfy $t(e_1) = 1$, $t(e_2) = 0$, so that $v_k$ is guaranteed to remain a t-vertex.
Regarding (orth-) we note that while performing the cumulant expansion for $W = \sum_{ab} w_{ab}\Delta^{ab}$ in $G_jB_jWG_{j+1}$ we obtain the degree-2 cumulant term as $\sum_{ab} G_jB_j\Delta^{ab}G_{j+1}(\partial_{ba} + \sigma\partial_{ab})$; the derivatives $\partial_{ab}$ or $\partial_{ba}$ acting on some resolvent $G$ result in $G\Delta^{ab}G$ or $G\Delta^{ba}G$. In case $j \in \mathbf{a}$ the $\kappa$-vertex corresponding to the summation index $a$ satisfies the definition of a 0tr-vertex since $\langle B_j\rangle = 0$ and the other resolvent is not multiplied by some additional matrix in the $a$-direction. Similarly, in case $j \in \mathbf{t}$ either both or none of the two $G$'s in $G\Delta^{ab}G$ or $G\Delta^{ba}G$ are transposed, while, by definition, exactly one of $G_j, G_{j+1}$ is transposed. Thus exactly one of the $\kappa$-vertices corresponding to the $a$- or $b$-summations satisfies the definition of being a t-vertex.
We note that the condition $\mathbf{a} \cap \mathbf{t} = \emptyset$ ensures that the sets $V_o^t, V_o^{0tr}$ constructed in this way are disjoint. We now check that the properties (P )-(P ), as well as (P av )-(P av ) and (P iso )-(P iso ), also hold for these graphs.
The properties (P )-(P ) are obvious by construction since each cumulant expansion comes with two $\kappa$-vertices, and in total there are $2p$ underlined terms and thereby at most $2p$ cumulant expansions. The properties (P av ), (P iso ) follow from the fact that for each factor of $\operatorname{Tr} \underline{WG_1B_1\cdots G_lB_l}$ and $\langle x, G_1B_1\cdots G_jB_j\underline{W}G_{j+1}B_{j+1}\cdots B_{l-1}G_ly\rangle$ there are $l-1$ and respectively $l-2$ internal vertices of in- and out-degree 1, and that these properties remain invariant under cumulant expansions. Similarly, the properties (P av ) and (P iso ) hold true trivially for the initial terms and remain invariant under cumulant expansions.
For (P ) note that the cumulant $\kappa(ab, (\alpha_1, \dots, \alpha_k))$ comes together with matrices $\Delta^{ab}$ (or their transposes) after the derivative action, where the transpose is taken in case the derivative acts on a transposed resolvent. In all cases the in-degree of the vertex associated with $a$ is equal to the out-degree of the vertex associated with $b$. For (P ) note that by the definition of the underline-renormalisation it follows that for degree-two edges the corresponding $\partial_{ba}$-derivative cannot act on its own trace, and therefore cycles have to involve at least two $V_\kappa^2$-vertices. For (P ) we note that $|V_\kappa^2 \setminus V_o| = 2|E_\kappa^2| - |V_o \cap V_\kappa^2|$, while due to (P ) each cycle with zero $V_o$-vertices contains at least two $V_\kappa^2 \setminus V_o$-vertices and each cycle with one $V_o$-vertex contains at least one $V_\kappa^2 \setminus V_o$-vertex.
The claim (P ) follows immediately from the construction (orth-). Similarly, claim (P ) follows from the construction (orth-) together with the observation that, because $|E_\kappa|$ is the total number of cumulant expansions, a total of $2p - |E_\kappa|$ derivatives have acted on some $W$, and thus the number $n$ of $W$'s acting as degree-2 cumulants on some $G$ satisfies the claimed lower bound and, trivially, $n \le 2p$. This concludes the proof of (P ) in the mutually exclusive cases $j \in \mathbf{a}$ and $j \in \mathbf{t}$ (recall that $\mathbf{a} \cap \mathbf{t} = \emptyset$ by assumption). For the claim ( . a) on the number of $G$-edges in (P ) note that the number of $G$'s remains invariant under the derivative actions. For ( . b) note that each derivative acting on some $G$ increases the number of $G$'s by one, while each of the $2p - |E_\kappa|$ derivatives acting on some $W$ leaves the number of $G$'s invariant. Thus we conclude that the total number of $G$'s is as claimed in ( . b).

Remark . . Proposition . holds true verbatim also under the alternative definition of the renormalisation outlined in Remark . in case no G is transposed. Also the proof of the proposition remains unchanged except for the proof of Property (P ).
For the alternative renormalisation, also for degree-two edges, when expanding $\underline{WG\cdots} = \sum_{ab}\Delta^{ab}G\cdots(\partial_{ba} + \sigma\partial_{ab})$ the derivative $\sigma\partial_{ab}$ may act on its own trace. However, since no $G$ is transposed, this action will necessarily result in $\Delta^{ab}G\cdots G\Delta^{ab}$ and therefore no loops are created.
Using Proposition . , in order to conclude Theorem . it remains to estimate $\operatorname{Val}(\Gamma)$ for each $\Gamma \in \mathcal{G}_p$. We note that the following proposition is valid for any av-/iso-graphs $\Gamma \in \mathcal{G}$ from Definition . , i.e. graphs satisfying the properties (P )-(P ) and (P av )-(P av )/(P iso )-(P iso ) above, rather than only for the specific families of graphs $\mathcal{G}_p^{\mathrm{av}}, \mathcal{G}_p^{\mathrm{iso}}$ arising in the cumulant expansion.

Proposition . (Value estimate).
For each av-graph $\Gamma \in \mathcal{G}$ for some parameters $a, t, l, p, i \in \mathbf{N}$ we have the bound ( . ) with $K$ as in ( . ), while for each iso-graph $\Gamma$ for some parameters $a, t, l, p, i \in \mathbf{N}$ we have the bound ( . ). Proof of Theorem . . Theorem . follows immediately by combining Propositions . and . under the simplifying assumptions made at the beginning of Section , the removal of which is discussed in Appendix A. Following the proof of Proposition . it is evident that both $\Lambda_+^{2ap}$ and $\Pi_+^{2tp}$ can be replaced by the product of individual $\Lambda_+^{B_k}, \Pi_+^{B_k}$ for $k \in \mathbf{a} \cup \mathbf{t}$, as claimed in Theorem . . Finally, regarding the replacement of $\rho^i$ by $\prod_{k\in\mathbf{i}}\rho(z_k)$ in the bounds of Theorem . , it is easy to see that during the cumulant expansion the number of $G(z_k)$'s is preserved and each gives rise to a factor $\rho(z_k)$ in Proposition . ; hence the factor $\rho^{2ip}$ may be replaced by the factor $\prod_{k\in\mathbf{i}}\rho(z_k)^{2p}$. Similarly, for the replacement of $\Lambda_+^a$ by $\prod_{k\in\mathbf{a}}\Lambda_+^{B_k}$ we note that each $B_k$ appears exactly $2p$ times also after the cumulant expansions, and therefore each $\Lambda_+^{B_k}$ can only appear in at most the $2p$-th power on the rhs. of ( . )-( . ).
. . Estimating graph values: Proof of Proposition . . The proof of Proposition . goes in three major steps, formulated in Lemmata . , . and . , which we first state and then use to conclude the proof of Proposition . . First, we express the value $\operatorname{Val}(\Gamma) = \operatorname{Val}(\Gamma^{\mathrm{red}})$ as the value of the reduced graph $\Gamma^{\mathrm{red}}$ obtained from $\Gamma$ by collapsing all degree-2 vertices $V_i \cup V_\kappa^2$. Thus, in graph-theoretic terms, $\Gamma^{\mathrm{red}}$ is the minimal (with the least number of edges) graph having $\Gamma \setminus E_\kappa^2$ as a subdivision. We claim that each summation index $a_v$ for $v \in V_i \cup V_\kappa^2$ appears in exactly two $G$-factors and no $\kappa$-matrices, and thus the summation can be written as a matrix product after (potentially) transposing one of the two $G$'s in the cases of two incoming or two outgoing edges, e.g. $\sum_{a_v}(GB)_{xa_v}G_{ya_v} = (GBG^t)_{xy}$. Indeed, the index $a_v$ appears in exactly two $G$-edges since $d_g(v) = 2$, cf. Definition . . Moreover, due to (P ) no $\kappa$-edge is adjacent to $V_i$, while for $v \in V_\kappa^2$ the corresponding $\kappa$-edge $(uv)$ or $(vu)$ due to ( . ) and ( . ) is given by $\kappa^{1,1}$ or $\kappa^{2,0}$, which are the constant-$1$ and constant-$\sigma$ matrices, and thus effectively the index $a_v$ does not appear in any $\kappa^{(vu)/(uv)}$-matrix.
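The transposition bookkeeping in this collapse step can be sanity-checked numerically; in the sketch below, arbitrary dense matrices stand in for the resolvents, and we verify $\sum_{a_v}(GB)_{xa_v}G_{ya_v} = (GBG^t)_{xy}$ for two edges sharing an outgoing summation index:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 40
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))  # stand-in resolvent
B = rng.standard_normal((N, N))                                     # deterministic matrix
x, y = 5, 11  # fixed outer indices
# two outgoing edges share the summation index a_v, so one factor gets transposed:
lhs = sum((G @ B)[x, a] * G[y, a] for a in range(N))
rhs = (G @ B @ G.T)[x, y]
assert np.isclose(lhs, rhs)
```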
In the reduction process the value of $\Gamma$ effectively reduces to a summation over vertices of degree at least 3, traces of $G$-cycles, and entries of $G$-chains and $E_\kappa^{\ge3}$-matrices, represented by $\Gamma^{\mathrm{red}}$. Here we use the terminology that a $G$-cycle is a cycle of $G$-edges on $V_\kappa^2 \cup V_i$ vertices, irrespective of the edge orientation, and that a $G$-chain is a chain of $G$-edges with internal $V_\kappa^2 \cup V_i$-vertices and external $V_\kappa^{\ge3} \cup V_e$-vertices, again irrespective of the edge orientation. Note that the reduction completely collapses each $E_g$-cycle on $V_i \cup V_\kappa^2$-vertices into a single vertex with a loop edge. The sets of these single vertices and loop edges are denoted by $V_{\mathrm{cyc}}$ and $E_g^{\mathrm{red,cyc}}$. Therefore the vertex set of the reduced graph $\Gamma^{\mathrm{red}}$ is $V(\Gamma^{\mathrm{red}}) := V_\kappa^{\ge3} \dot\cup V_e \dot\cup V_{\mathrm{cyc}}$ and its edge set is $E(\Gamma^{\mathrm{red}}) := E_g^{\mathrm{red}} \dot\cup E_\kappa^{\ge3}$. The graph reduction by partial resummations corresponds to generalising the definition of the value as in ( . ), where the matrix $G^e$ of a reduced edge is defined as a matrix product (of possibly transposed factors, depending on the in- and out-degrees) of $G^{(v_1v_2)}, \dots$ Lemma . . For each av-/iso-graph $\Gamma \in \mathcal{G}$ with parameters $a, t, l, i, p$ and the selected vertex sets $V_o^t, V_o^{0tr}$, let $\Gamma^{\mathrm{red}} = (V(\Gamma^{\mathrm{red}}), E(\Gamma^{\mathrm{red}}))$ denote its reduction. The reduced graph then satisfies ( . ) and $\operatorname{Val}(\Gamma^{\mathrm{red}}) = \operatorname{Val}(\Gamma)$.

Moreover, we have
Second, we estimate the value of each graph by bounding the size of each of the reduced G-edges entrywise and the summations trivially.

Lemma . . For each av-/iso-graph
where $\delta_{\ge4} := \sum_{e\in E_\kappa}\big(\tfrac{d_g(e)}{2} - 2\big)_+$. Finally, in the third step we improve upon the entrywise estimate by estimating the summations corresponding to some $V_\kappa^{\ge3}$-vertices more effectively, using a Schwarz inequality followed by the Ward identity $GG^* = \Im G/\eta$.

Lemma . . For each av-graph Γ ∈ G with the selected vertex sets
and for each iso-graph $\Gamma \in \mathcal{G}$ we have $|\operatorname{Val}(\Gamma^{\mathrm{red}})| \prec \mathrm{I_3\text{-}Est}(\Gamma)$ with ( . ). Before proving Lemmata . -. we conclude the proof of Proposition . .
Proof of Proposition . . The proof of Proposition . distinguishes several cases. For the averaged bound we consider the two cases $a = t = 0$ and $a + t =: o > 0$, $|V_i \cap V_o| = 2(o-1)p$ separately, with the remaining case $o > 0$, $|V_i \cap V_o| = 2op$ being discussed in Section A. , while for the isotropic bound we consider the cases $o \ge 0$, $|V_i \cap V_o| = 2op$ and $o > 0$, $|V_i \cap V_o| = 2(o-1)p$ separately.
Next, we consider the $|V_o \cap V_i| = 2op$ case of the isotropic bound, where we obtain ( . ) from Lemma . , ( . b), and $|V_i| = 2p$, again using $K \lesssim N\rho^2$ and $|E_\kappa^2| + |E_\kappa^3|/2 \le |E_\kappa| \le 2p$. Next, we consider the $|V_o \cap V_i| = 2(o-1)p$, $o > 0$ case of both the averaged bound and the isotropic bound, where we similarly obtain ( . ) (from estimating $(\cdots)_+ \ge 0$ for the iso-graphs). Here we used (P ) and (P av )/(P iso ) and $V_o \subset V_i \cup V_\kappa^2$ (since by definition $V_o$ consists of degree-2 vertices, while $V_e = \emptyset$ due to (P av ) in the averaged case and $d_g(v) = 1$ for $v \in V_e$ due to (P iso ) in the isotropic case, and $V_\kappa^{\ge3}$-vertices have degree at least 3 by (P )) in the equality. Furthermore, we used (P ) in the first inequality, (P ) in the second inequality, and $K/N \lesssim \rho^2$ and (P ) in the final step.
. . . Graph reduction: Proof of Lemma . . Since for $d_g((uv)) = 2$ we have $\kappa^{(uv)}_{ab} = 1$ or $\kappa^{(uv)}_{ab} = \sigma$ for all $a, b$ due to ( . ) (using Assumption (A-i)), it is possible to write (with potential transpositions) the summation over $a_v$ for $v \in V_i \cup V_\kappa^2$ as matrix products which are then associated with edges of the reduced graph $\Gamma^{\mathrm{red}}$. In this way $G$-chains $(v_1v_2), \dots, (v_{k-1}v_k) \in E_g$ with $v_2, \dots, v_{k-1} \in V_\kappa^2 \cup V_i$ and $v_1, v_k \notin V_\kappa^2 \cup V_i$ are reduced to the edge $(v_1v_k) \in E_g^{\mathrm{red}}$, and $G$-cycles $(v_1v_2), \dots, (v_kv_1) \in E_g$ with $v_1, \dots, v_k \in V_\kappa^2 \cup V_i$ are reduced to isolated loops which we represent by the vertex $v_1 \in V_{\mathrm{cyc}}$ and the loop-edge $(v_1v_1) \in E_g^{\mathrm{red,cyc}} \subset E_g^{\mathrm{red}}$. For each cycle of length $k$ we arbitrarily pick one of the $k$ possible reductions since they are all equivalent.
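The reduction of a fully internal cycle to a trace can likewise be checked numerically; the three toy matrices below stand in for the $G$-factors along a cycle of degree-2 vertices:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 25
# stand-in G-factors along a cycle of degree-2 vertices (hypothetical toy matrices)
G1, G2, G3 = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
              for _ in range(3))
# summing all three cycle indices entrywise...
val = sum(G1[a, b] * G2[b, c] * G3[c, a]
          for a in range(N) for b in range(N) for c in range(N))
# ...collapses the cycle to a single loop edge carrying the trace of the product
assert np.isclose(val, np.trace(G1 @ G2 @ G3))
```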
The first relation in ( . ) follows trivially since for each of the carried-out summations corresponding to $V_\kappa^2 \cup V_i$ the number of $G$-edges is reduced by one, with the exception that for cycles the last index is kept in $V_{\mathrm{cyc}}$. The second relation in ( . ) is a direct consequence of (P ). Next, the claim ( . ) follows from (P ) and by noting that the definition of $a(e), t(e)$ is consistent with the counting of t-/0tr-vertices in $\Gamma$. This concludes the proof of Lemma . .
Remark . . The estimates ( . a)-( . b) are designed to take advantage of the asymptotic orthogonality vertices. Indeed, using that a posteriori we will show $\Lambda_+ + \Pi_+ \prec 1$ in the bulk, $\rho \sim 1$, both inequalities essentially depend on the number of orthogonality vertices as in ( . ). Therefore, as long as $\eta \gg N^{-1/2}$ the orthogonality helps and our bounds do exploit this effect. However, for $\eta \ll N^{-1/2}$ it is better to use ( . a)-( . b) by simply ignoring the asymptotic orthogonality, i.e. choosing $\mathbf{a} = \mathbf{t} = \emptyset$. We will not need this improvement in the main body of the proof of Theorem . , but it will be used when we remove simplification (A-iii) in Section A. .
Using Lemma . , the proof of which we defer to the end of the subsection, we now conclude the proof of Lemma . . From Lemma . we obtain $\operatorname{Val}(\Gamma) = \operatorname{Val}(\Gamma^{\mathrm{red}})$ with $\operatorname{Val}(\Gamma^{\mathrm{red}})$ as in ( . ). By estimating $|\kappa^{(uv)}_{ab}| \lesssim 1$ and $G^e$ via Lemma . we obtain ( . ) from ( . ), where we used ( . ) in the second step. Here we counted the factors of $\rho$ according to ( . ), completing the proof of Lemma . .
Proof of Lemma . . We actually prove a slightly more general bound which allows for chains $G^e = G_1B_1\cdots G_lB_l$ in which some of the resolvents may be replaced by their imaginary parts. For any $e \in E_g^{\mathrm{red}}$ we consider the original chain or cycle in $\Gamma$ that was reduced to $e$. The alternating chains associated with $e$ are the maximal subchains of this original chain/cycle with internal vertices from $V_o$ and at least one $V_o$-vertex. For example, if $e \in E_g^{\mathrm{red}}$ was the reduction of the cycle $(GA)(\Im G\,A)(BG^*)G(AG^*)(\Im G)^t$, then the alternating chains associated with $e$ are $(GA)(\Im G\,A)$ and $G(AG^*)(\Im G)^t$. By maximality, $o(e)$, the number of $V_o$-vertices in the original chain/cycle that has been reduced to $e$, is equal to the total number of $V_o$-vertices in the alternating chains associated with $e$. In particular, $c(e) \le o(e)$, where $c(e)$ denotes the number of alternating chains associated with $e$.
Averaged bound for $o(e) = 0$. In the case without alternating chains, i.e. for $o(e) = 0$, we simply split off any $G$-factor by Cauchy-Schwarz and obtain ( . ). Here, and frequently in the remaining proof, we use the Ward identity $G(z)G(z)^* = \Im G(z)/\Im z$ and the norm bounds $\|G\| \le 1/\eta$, $\|B_k\| \lesssim 1$.
Averaged bound for $o(e) = l(e)$. For $G^e = G_1B_1G_2B_2\cdots G_lB_l$ we use the spectral decomposition of each resolvent to write the trace as a sum over products of eigenvector overlaps $\langle v_a, B_k v_{a'}\rangle$ with $v_a \in \{u_a, \overline{u_a}\}$, depending on whether $G_k$ is transposed or not. By additional averaging using the analogue of ( . ), Cauchy-Schwarz and the high-probability bounds from rigidity ( . ) it follows that ( . ) holds, where $\log N$ factors have been incorporated into the $\prec$ notation in the ultimate inequality.
Averaged bound for $2 \le o(e) < l(e)$ and $c(e) = 1$. For this case we may assume by cyclicity that $G^e = G_1B_1\cdots G_oB_oG_{o+1}B_{o+1}\cdots G_lB_l$ such that the summations between $G_1$ and $G_o$ correspond to orthogonality indices. Here we make use of the inequality $|\operatorname{Tr} XYZ| \le \|Y\|\,(\operatorname{Tr} XX^*)^{1/2}(\operatorname{Tr} ZZ^*)^{1/2}$ for arbitrary matrices $X, Y, Z$, which follows from the singular value decomposition $Y = USV^*$ and Cauchy-Schwarz. In this way we obtain ( . ), where $i_{2\cdots o}$ is the number of $\Im G$'s among $G_2, \dots, G_o$, and we used the previously considered $o(e) = l(e)$ case in the last step.
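A standard inequality of this type (our assumed reading of the bound) is $|\operatorname{Tr} XYZ| \le \|Y\|\,(\operatorname{Tr} XX^*)^{1/2}(\operatorname{Tr} ZZ^*)^{1/2}$: writing $Y = USV^*$ and splitting $S = S^{1/2}S^{1/2}$, Cauchy-Schwarz in the Hilbert-Schmidt inner product together with $S \le \|Y\| I$ gives the claim. It can be verified numerically on random matrices (sizes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 30
X, Y, Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
           for _ in range(3))
lhs = abs(np.trace(X @ Y @ Z))
# operator norm of Y times the Hilbert-Schmidt norms of X and Z
rhs = np.linalg.norm(Y, 2) * np.sqrt(
    np.trace(X @ X.conj().T).real * np.trace(Z @ Z.conj().T).real)
assert lhs <= rhs
```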
Averaged bound for $2 \le o(e) < l(e)$ and $c(e) \ge 2$. For at least two alternating chains, $c(e) \ge 2$, we may write by cyclicity $G^e = G^{e_1}\cdots G^{e_{c(e)}}$ with $G^{e_j} = G_{j,1}B_{j,1}G_{j,2}\cdots B_{j,o_j}G_{j,o_j+1}B_{j,o_j+1}\cdots G_{j,l_j}B_{j,l_j}$ for some $1 \le o_j \le l_j - 1$, such that for each $G^{e_j}$ the first $o_j$ internal summation indices are orthogonality indices. By Cauchy-Schwarz it follows that ( . ) holds, where $i_j$ denotes the number of $\Im G$'s among $G_{j,2}, \dots, G_{j,o_j}$, and we used the previously discussed $o(e) = l(e)$ case in the third inequality. This concludes the proof of ( . a).
Isotropic bound for $o(e) \le l(e) - 2$. We decompose $G^e = G^{e_1}\cdots G^{e_k}$ such that each of $G^{e_2}, \dots, G^{e_{k-1}}$ begins with a new alternating chain followed (potentially) by further $G$'s, $G^{e_1}$ either begins with an alternating chain or is a chain without orthogonality indices, and $G^{e_k}$ is either an alternating chain or a chain without orthogonality indices. For example, with brackets denoting the decomposition, we would separate the chain as in ( . ) if the indices associated with $B_1, B_2, B_4$ are orthogonality indices, and estimate $|\langle v, G^e w\rangle| \le \big(\langle v, G^{e_1}(G^{e_1})^*v\rangle\,(\operatorname{Tr} G^{e_2}(G^{e_2})^*)\cdots(\operatorname{Tr} G^{e_{k-1}}(G^{e_{k-1}})^*)\,\langle w, (G^{e_k})^*G^{e_k}w\rangle\big)^{1/2}$. For the two isotropic factors of length $l_j$ with $o_j$ orthogonality indices and $i_j$ many $\Im G$'s we claim ( . ), where $i_{1\cdots o}$ is the number of $\Im G$'s among $G_1, \dots, G_o$. For the tracial factors we have, as in ( . ), the bound ( . ). By combining ( . )-( . ) we obtain ( . ), completing the proof of ( . b) also in this case.
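The iterated Cauchy-Schwarz step, with the operator norms of the middle factors further bounded by their Hilbert-Schmidt norms, can be checked on a small example; all matrices and vectors below are arbitrary stand-ins, and we test the $k = 3$ instance of the displayed inequality:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 20
M1, M2, M3 = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
              for _ in range(3))
v, w = rng.standard_normal(N), rng.standard_normal(N)
lhs = abs(v @ M1 @ M2 @ M3 @ w)
# isotropic end factors sandwich v and w; the middle factor contributes a trace
rhs = np.sqrt(
    (v @ M1 @ M1.conj().T @ v).real
    * np.trace(M2 @ M2.conj().T).real
    * (w @ M3.conj().T @ M3 @ w).real)
assert lhs <= rhs
```

The chain of bounds behind the check is $|\langle v, M_1M_2M_3 w\rangle| \le \|M_1^*v\|\,\|M_2\|\,\|M_3w\|$ together with $\|M_2\| \le (\operatorname{Tr} M_2M_2^*)^{1/2}$.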
For the proof of ( . b) in case $(\cdots)_+ > 0$ we consider a larger candidate set of size $e_g(V_\kappa^{\ge3}, V_e) + e_g(V_\kappa^{\ge3}, V_\kappa^{\ge3})$ that consists of all edges adjacent to $V_\kappa^{\ge3}$-vertices. Going through all $V_\kappa^{\ge3}$-vertices in arbitrary order, we remove at most $k - 2$ edges for each vertex $v \in V_\kappa^k$, so that the at most two remaining edges are not loops; this yields again a Wardable set. Since $|V_\kappa^k| = 2|E_\kappa^k|$, the total number of removed edges is at most ( . ), which, together with ( . b), yields ( . b). This completes the proof of (S ) modulo ( . ), which we prove now.
Proof of ( . ). The bound ( . a) follows from ( . ). For the bound ( . b) we note that the set $E_g^{\mathrm{red}} \setminus E_g^{\mathrm{red,cyc}}$ can be partitioned into edges within $V_\kappa^{\ge3}$, edges within $V_e$ and edges between these two sets, and thus from (P )-(P iso ) and ( . ) we obtain ( . ). Furthermore, by (P iso ) each $V_e$-$V_e$ edge corresponds to at least one $V_\kappa^2$-vertex, while by (P ) each cycle in $E_g^{\mathrm{red,cyc}}$ corresponds to at least two $V_\kappa^2$-vertices in $\Gamma$ (which are in particular not part of any chain), whence $e_g(V_e, V_e) \le |V_\kappa^2| - 2|V_{\mathrm{cyc}}| = 2(|E_\kappa^2| - |V_{\mathrm{cyc}}|)$ and the claim follows.
Proof of (S ). We recall from the proof of Lemma . that ( . ) is the minimum of two different estimates given in ( . ). Estimating each $G^e$ for $e \in E_g^{\mathrm{red}}$ by Lemma . with a $\rho$-exponent of $i(e)$ in ( . ) yields the first bound $|\operatorname{Val}(\Gamma^{\mathrm{red}})| \prec \mathrm{I}_2^i\text{-Est}(\Gamma)$. Similarly, estimating each $G^e$ by Lemma . with a $\rho$-exponent of $l(e) - o(e) - \mathbf{1}(e \in E_g^{\mathrm{red}} \setminus E_g^{\mathrm{red,cyc}})$ yields the second bound $|\operatorname{Val}(\Gamma^{\mathrm{red}})| \prec \mathrm{I}_2^0\text{-Est}(\Gamma)$, cf. the first inequality in ( . ). In order to prove (S ) for a given Wardable set $E^{\mathrm{Ward}}$ we estimate $G^e$ for $e \in E_g^{\mathrm{red,cyc}} \cup (E_g^{\mathrm{red}} \setminus (E_g^{\mathrm{red,cyc}} \cup E^{\mathrm{Ward}}))$ exactly as in Lemma . and remove the corresponding edges from the graph, leaving only $E^{\mathrm{Ward}}$-edges. In order to conclude the proof it remains to establish an additional gain of $K^{-1/2}$ (compared to the first bound) and $\rho K^{-1/2}$ (compared to the second bound) per $E^{\mathrm{Ward}}$-edge $e$, compared to the entrywise estimates.
The proof now follows by induction since, by Lemma . , after the removal of $v_1$ the next vertex $v_2$ has degree at most 2, etc., and ( . )-( . ) can be used to establish the gain of $(\rho)K^{-1/2}$ iteratively for each $e \in E^{\mathrm{Ward}}$.

Appendix A

A. . Removing Assumption (A-i).
As a consequence, additional graphs appear in the expansion where degree-two $\kappa$-edges are collapsed due to $\delta_{ab}$, which we will show to be of lower order due to fewer summations. Indeed, let $\Gamma$ be any av-/iso-graph and for $(uv) \in E_\kappa^2$ consider the graph $\Gamma'$ obtained from collapsing the vertices $u, v$ into one. We claim that $\mathrm{I_3\text{-}Est}(\Gamma') \le \mathrm{I_3\text{-}Est}(\Gamma)$, (A. ) and thus the bounds in Proposition . remain valid for partially collapsed graphs. Repeating the estimate (A. ) recursively for all collapsed $\kappa$-vertices we see that the proof of Theorem . is complete also without the simplifying Assumption (A-i). It remains to prove (A. ). By Lemma . both vertices $u, v$ are necessarily internal vertices of some $G$-chain or $G$-cycle. Now there are several possible scenarios. First, one of $u, v$ may be in the set $V_o$ of selected orthogonality vertices (but not both, cf. the construction in (orth-)), and second, (opt ) $u, v$ are in the same chain, (opt ) $u, v$ are in the same cycle, (opt ) $u, v$ are in two different chains, (opt ) $u, v$ are in two different cycles, (opt ) one of $u, v$ is in a chain, the other one in a cycle.
For instance, suppose we are in scenario (opt ), in which we compare the I_3-Est of the two graphs, where the square vertices denote vertices from V_e ∪ V_κ^{≥3}, and the G-edges may denote chains of arbitrary lengths l_1, . . . , l_4 with o_1, . . . , o_4 internal V_o-vertices. On the lhs. of (A. ) the product of the estimates on the chains (xy) and (x'y') obtained from Lemma . is at least a certain explicit expression, while on the rhs. of (A. ) the product of the estimates on the four chains (xu), (uy), (x'u), (uy') together with the size N of the summation corresponding to u is at most a corresponding expression. However, for the graph on the rhs. we can gain at least two additional factors of ρK^−1/2, since there are at least two additional edges for which the Ward gain from (S ) is applicable due to the u-summation; thus we obtain an improved estimate. Using K ≤ Nρ^2 it follows that the bound (A. ) is not larger than (A. ) in both cases, and thus the I_3-Est of the subgraph on the lhs. cannot be larger than that of the subgraph on the rhs.; the claim (A. ) follows. The comparison works similarly in the other scenarios (opt ), (opt ), (opt ) and (opt ), so we omit further details of the proof of (A. ) in those cases.
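As a sanity check, the net effect of the collapse can be summarized by the following heuristic power counting (a sketch under the reading K ≤ Nρ^2 of the garbled relation; the exact exponents are those of the elided displays):

```latex
% Collapsing u,v removes the u-summation (a factor N) from the estimate,
% while the uncollapsed graph enjoys two extra Ward gains of \rho K^{-1/2}:
\[
  \underbrace{N}_{\text{extra summation}}
  \cdot
  \underbrace{\bigl(\rho K^{-1/2}\bigr)^{2}}_{\text{two Ward gains}}
  \;=\;
  \frac{N\rho^{2}}{K}\;\ge\;1
  \quad\Longleftrightarrow\quad
  K\;\le\;N\rho^{2},
\]
% so the estimate on the uncollapsed graph dominates that of the
% collapsed one, consistent with the claim (A. ).
```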
A. . Averaged bound in case l ∈ a ∪ t; removing Assumption (A-iii). Here we consider the case l ∈ a ∪ t of the averaged bound in Theorem . . The only difference to the case l ∉ a ∪ t is the selection process for the V_o-vertices described in (orth-)–(orth-). We fix some j ∈ a ∪ t arbitrarily and select vertices into V_o as follows:
(orth'-) For each k ∈ t \ {j} and each k ∈ a \ {j} we collect 2p distinct vertices from V_i into the sets V_o^t and V_o^0tr, respectively.
(orth'-) If j ∈ t or j ∈ a, then we select one additional vertex from V_i into V_o^t or V_o^0tr, respectively, for each W acting as a degree-2 cumulant on some resolvent.
The fact that the selection of V_o-vertices in (orth'-) is possible follows exactly as for (orth-), i.e. from the fact that internal orthogonality vertices are guaranteed to remain orthogonality vertices throughout the cumulant expansion. For (orth'-) we note that we could also have added all 2p vertices corresponding to j to the set V_o since they are internal; however, more V_o-vertices are not necessarily beneficial, cf. Remark . . It remains to establish that this set V_o satisfies the same bounds as the one constructed in the case l ∉ a ∪ t, i.e.
The claim (A. b) follows immediately from ( . ). Regarding (A. a), we monitor the change of |E_κ^{2,3}|, V_o, etc. along the iterative construction of the graphs in the proof of Proposition . . In addition, we count the number n_cyc,WG of cycles containing a W and some non-underlined G. At the beginning of the expansion we have 2p cycles, each containing one W and l underlined G's. Now, if some W acts on a W in another cycle, then the two cycles are replaced by one cycle with 2l non-underlined G's. If, however, some W acts on a G in another cycle, then the two cycles are replaced by one cycle with one W and (2l + 1) G's, l of which are not underlined, e.g.
E Tr(W GAGB) Tr(W GAGB) = N^−1 E Σ_ab Tr(Δ^ab GAGB) [ Tr(Δ^ba GAGB) − Tr(W GΔ^ba GAGB) + · · · ]
= N^−1 E [ Tr(GAGBGAGB) − Tr(GAGBGAGBW G) ] + · · · ,

demonstrating the two possible actions. The newly created partially underlined cycle is important since, contrary to the original fully underlined cycles, it allows for a W to act on some G's (the non-underlined ones) in its own cycle, e.g.
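The index contractions behind the second equality, Tr(Δ^ab X) = X_ba and Σ_ab X_ba Y_ab = Tr(XY), can be checked numerically; a minimal numpy sketch with random test matrices (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
X = rng.standard_normal((N, N))
Y = rng.standard_normal((N, N))

# Delta^{ab} is the matrix unit E_{ab}, so Tr(Delta^{ab} X) = X_{ba} and
#   sum_{ab} Tr(Delta^{ab} X) Tr(Delta^{ba} Y) = sum_{ab} X_{ba} Y_{ab} = Tr(XY),
# the contraction producing the Tr(GAGBGAGB) term in the display above.
lhs = sum(X[b, a] * Y[a, b] for a in range(N) for b in range(N))
rhs = np.trace(X @ Y)
assert np.isclose(lhs, rhs)
```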
This mechanism was also present in the main body of the proof of Theorem . , see e.g. ( . ), but there we did not need to monitor the number of partially underlined cycles along the cumulant expansion, and they disappeared in the end. In the current proof n_cyc,WG serves as an auxiliary quantity for proving (A. a).
We claim that at every step of the expansion the inequality

|V_o| + 2|V_cyc^o=0| + |V_cyc^o=1| + n_cyc,WG ≤ 2(o − 1)p + 2|E_κ^2| (A. )

holds. This is obvious initially, since then E_κ = ∅, |V_o| = 2(o − 1)p and n_cyc,WG = 0. Whenever some W acts as a degree-2 cumulant on another W, then in our algorithm no vertex is added to V_o, while |E_κ^2| is increased by 1 and one pure-G cycle is created, so (A. ) remains valid. Otherwise, if some W acts on a G in its own cycle (which is only possible if that G is not underlined), then n_cyc,WG is decreased by 1, while |V_o| and |E_κ^2| are increased by 1, and either |V_cyc^o=0| is increased by at most 1 or |V_cyc^o=1| is increased by at most 2, confirming (A. ). Next, if some W acts on a G in another cycle, then |V_cyc^o=0,1| cannot increase, while n_cyc,WG may increase by 1, and both |V_o| and |E_κ^2| do increase by 1, respecting (A. ). Finally, if W acts on either a W or a G in some non-cycle, then the number of cycles cannot increase, making the validity of (A. ) trivial. Higher-degree cumulant expansions leave the rhs. of (A. ) invariant and cannot increase its lhs. This proves (A. ) inductively along the expansion. Hence, after all cumulant expansions have been performed (so that, in particular, n_cyc,WG = 0), (A. ) implies (A. a).
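The case analysis above can be recorded as a small bookkeeping sketch. The deltas below are a hypothetical worst-case reading of the four actions (labels are ours, not the paper's notation); each step leaves the slack of (A. ) non-increasing:

```python
# Counters c = (|V_o|, |V_cyc^{o=0}|, |V_cyc^{o=1}|, n_cyc,WG, |E_k^2|);
# deltas are the worst case allowed by the text for each action.
STEPS = {
    "W_acts_on_W_in_other_cycle": (0, 1, 0, 0, 1),   # pure-G cycle created
    "W_acts_on_G_in_own_cycle_a": (1, 1, 0, -1, 1),  # V_cyc^{o=0} up by <= 1
    "W_acts_on_G_in_own_cycle_b": (1, 0, 2, -1, 1),  # V_cyc^{o=1} up by <= 2
    "W_acts_on_G_in_other_cycle": (1, 0, 0, 1, 1),
}

def slack(c):
    """lhs of the invariant minus its variable rhs part 2|E_k^2|;
    the constant 2(o-1)p is omitted since no step changes it."""
    vo, v0, v1, n, e2 = c
    return vo + 2 * v0 + v1 + n - 2 * e2

def apply_step(c, step):
    return tuple(x + d for x, d in zip(c, STEPS[step]))
```

Under these deltas every action changes the slack by at most 0, which is the inductive statement proved in the text.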
In order to describe the structure of the graphs encoding the polynomial from the cumulant expansion of

E|⟨x, W G_1 B_1 · · · B_{l−1} G_l y⟩|^2p (A. )

similarly to the graphs from Definition . , it is convenient to add a new type of edge set E_= for edges encoding the identity matrix I, which we use to "connect" x and W; i.e. in the resulting graph (V_κ ∪ V_e, E_=) is bipartite. Since the E_=-edges represent the identity matrix, which is symmetric, their orientation is irrelevant, contrary to the E_g- and E_κ-edges. The set of graphs we obtain satisfies (P )–(P ) (after redefining the dg-degree to also count E_=-edges),

|V_i| = 2(l − 1)p = 2(o + b)p, |V_o| = |V_i ∩ V_o| = 2op, V_o ∩ V_κ^2 = ∅, (A. )

Eq. (P iso) and

|E_=| = 2p. (A. )

Moreover, we claim that each graph satisfies the inequality

|{v ∈ V_κ : |v ∩ E_=| ≥ 2}| ≤ (2p − |E_κ|) ∧ p, (A. )

where v ∩ E_= is understood as the set of edges from E_= adjacent to v. Indeed, in the expansion of (A. ) two E_=-edges can only meet in some v ∈ V_κ if one of the adjacent W's acted as a derivative on the other one. Thus (A. ) follows from the fact that in total 2p − |E_κ| derivatives have acted on some W, cf. the proof of (P ), noting that the upper bound of p is trivial by (A. ). Example graphs occurring along the expansion encode the polynomials (where the second and third terms are non-zero only in the real case)

E|⟨x, W Gy⟩|^2 = E ⟨x, W Gy⟩⟨y, G*W x⟩
= E Σ_ab κ(ab, ba) I_xa G_by G*_yb I_ax + E Σ_ab κ(ab, ab) I_xa G_by G*_ya I_bx
+ E Σ_abcd κ(ab, ab)κ(cd, cd) I_xa G_bc G_dy G*_ya G*_bc I_dx
− E Σ_abcd κ(ab, ab, ba)κ(cd, dc) I_xa G_bb G_ad G_cy G*_ya G*_bc I_dx + · · · (A. )

which we represent graphically with double lines representing E_=-edges.
The graph reduction procedure for V_i ∪ V_κ^2 is performed exactly as in Lemma . . However, regarding the E_=-edges, two new phenomena occur. First, if some v ∈ V_κ^2 connects two edges (uv), (vw) ∈ E_=, then the two edges are collapsed and contribute just a scalar factor to the graph value, namely the inner product ⟨x_u, x_w⟩ of the vectors x_u, x_w associated with the external vertices u, w; see e.g. the first term on the rhs. of (A. )–(A. ): Σ_a I_xa I_ay = ⟨x, y⟩. In the reduced graph we record this as a new type of isolated vertex v ∈ V_sc, representing the scalar product. Second, if some v ∈ V_κ^2 connects one edge (uv) ∈ E_= and one edge (vw) ∈ E_g, then the reduction process simply replaces these two edges by (uw) ∈ E_g^red, representing the matrix G^(uw) := G^(vw); see the second, third and fourth terms in (A. )–(A. ), e.g. Σ_b I_xb G_by = G_xy. As a result of the reduction process we obtain a reduced graph Γ^red = (V_e ∪ V_κ^{≥3} ∪ V_cyc ∪ V_sc, E_=' ∪ E_g^red), where E_=' denotes the subset of E_= connecting V_e and V_κ^{≥3}. Similarly to ( . ), the number of edges in the reduced graph is given by

For each e ∈ E_g^red we use the entrywise bound of Lemma . . On top of that, similarly to Lemma . , we obtain a set E_Ward ⊂ (E_g^red \ E_g^red,cyc) ∪ E_=' of Wardable edges, which, contrary to the previous case, may include both E_g^red- and E_='-edges. The proof of ( . b) applies verbatim to the current case as well by choosing the candidate sets of all E_=' ∪ (E_g^red \ E_g^red,cyc