1 Introduction

Since the groundbreaking work of E. Wigner [53], who postulated that Hermitian random matrices can effectively model the universal statistics of gaps between energy levels of large atomic nuclei, simple random matrices have routinely been used to replace more complicated quantum Hamiltonians for many other physically relevant problems, especially in disordered or chaotic quantum systems. A fundamental phenomenon in such systems is Quantum Ergodicity (QE), which states that the eigenvectors tend to become uniformly distributed in phase space.

In this paper we study an enhanced version of this question, the Quantum Unique Ergodicity (QUE), for real or complex Wigner matrices and for general observables. We recall that the Wigner matrix ensemble consists of \(N\times N\) random Hermitian matrices \(W=W^*\) with centred, independent, identically distributed (i.i.d.) entries up to the symmetry constraint \(w_{ab} = \overline{w_{ba}}\). Let \(\left\{ \varvec{u}_i \right\} _{i=1}^N\) be an orthonormal eigenbasis of \(W\). Our main Theorem 1 asserts that for any deterministic matrix \(A\) with \(\Vert A\Vert \le 1\) we have the limit \(\langle \varvec{u}_i, A \varvec{u}_j\rangle \rightarrow \delta _{ij}\langle A\rangle \) with very high probability, uniformly in \(i,j\), with the optimal speed of convergence \(1/\sqrt{N}\), i.e.

$$\begin{aligned} \max _{ij}\left|\langle \varvec{u}_i, A \varvec{u}_j\rangle -\langle A\rangle \delta _{ij}\right| \lesssim \frac{N^\epsilon }{\sqrt{N}} . \end{aligned}$$
(1)

Here we introduced the shorthand notation \(\langle R \rangle :=\frac{1}{N}{{\,\mathrm{Tr}\,}}R\) for the normalized trace of any \(N\times N\) matrix. In other words, (1) establishes the QUE in strong form (i.e. uniformly in \(i,j\)) for any Wigner matrix, and shows that any bounded traceless deterministic matrix maps the eigenbasis \(\left\{ \varvec{u}_i \right\} _{i=1}^N\) to a system that is asymptotically orthogonal to the original one (up to an optimal error \(N^{-1/2}\)). For genuinely complex Wigner matrices our second main Theorem 2 asserts that

$$\begin{aligned} \max _{ij} \left|\langle \varvec{u}_i, \overline{\varvec{u}_j}\rangle \right| \lesssim \frac{N^\epsilon }{\sqrt{N}}, \end{aligned}$$
(2)

again with very high probability, showing that the eigenbases of \(W\) and \(W^t=\overline{W}\) are asymptotically orthogonal.
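Both phenomena are easy to probe numerically. The following minimal sketch is our own illustration, not part of the paper: the matrix size \(N\), the seed, and the observable \(A=\mathrm{diag}(+1,-1,\dots)\) are arbitrary choices; it checks that the overlaps in (1) and (2) are of size \(N^{-1/2}\) up to logarithmic factors.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 300                                       # matrix size; an arbitrary choice

# Real symmetric Wigner matrix: Hermitian, centred entries of variance 1/N.
X = rng.standard_normal((N, N))
W = (X + X.T) / np.sqrt(2 * N)
_, U = np.linalg.eigh(W)                      # columns of U are the eigenvectors u_i

# A bounded traceless observable: A = diag(+1, -1, +1, ...), so <A> = 0, ||A|| = 1.
A = np.diag(np.tile([1.0, -1.0], N // 2))

# (1): every overlap <u_i, A u_j> should be O(N^{-1/2}) up to logarithmic factors.
dev1 = np.max(np.abs(U.T @ A @ U))

# Genuinely complex Wigner matrix (here sigma = E chi_od^2 = 0).
Y = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
Wc = (Y + Y.conj().T) / np.sqrt(2 * N)
_, V = np.linalg.eigh(Wc)

# (2): <u_i, conj(u_j)> = conj(u_i^T u_j), so we inspect the entries of V^T V.
dev2 = np.max(np.abs(V.T @ V))

print(dev1, dev2, 1 / np.sqrt(N))
```

Both printed deviations are comparable with \(1/\sqrt{N}\), far below the trivial bound \(1\).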

The question of ergodicity for general observables is also known as the Eigenstate Thermalization Hypothesis (ETH) in the physics literature since the seminal papers of Deutsch [22] and Srednicki [51], see also [20] and [21] for reviews and further references. Our result thus proves ETH with an optimal speed of convergence, as predicted, e.g., in [20, Eq. (20)], for the simplest chaotic quantum system, the Wigner ensemble.

Historically, the most prominent model for quantum ergodicity is the natural quantization of a chaotic classical dynamical system in the semiclassical or in the high-energy regimes. The first mathematical result on QE was obtained by Shnirelman [49]. It asserts that for most high energy (normalized) eigenfunctions \(\psi _i\) of the Laplace-Beltrami operator on a surface with ergodic geodesic flow the measures \(|\psi _i(x)|^2 \mathrm{d} x\) become completely flat as \(i\rightarrow \infty \). This result was later extended by Colin de Verdière [19] and Zelditch [56] to much larger classes of observables, showing that if \(A\) is an appropriate pseudodifferential operator with symbol \(\sigma (A)\), then \(\langle \psi _i, A \psi _j\rangle \rightarrow \delta _{ij} \int _{S^*} \sigma (A)\) for most index pairs as \(i,j\rightarrow \infty \), where \(S^*\) is the unit cotangent bundle of the surface. The analogous result on large regular graphs was obtained by Anantharaman and Le Masson [2]. The celebrated Quantum Unique Ergodicity (QUE) conjecture, formulated by Rudnick and Sarnak [46] in 1994, is a natural strengthening of these results, stating that the same limits hold for all indices, excluding the possibility that exceptional subsequences exhibit exotic behaviour (scarring). QUE in this form is still an outstanding open question; only certain special cases have been proven, e.g. on arithmetic surfaces for the joint eigenfunctions of the Laplacian and the Hecke operator by Lindenstrauss [42], with Soundararajan’s extension [50], see also [15, 36].

The speed of convergence in quantum ergodicity has been a fundamental question in the theory of quantum chaos, see e.g. [54] for a review and [5] for numerical results. For strongly chaotic (hyperbolic) systems the general physics prediction is that the variance of \(\langle \psi _i, A \psi _j\rangle \) is proportional to the inverse of the Heisenberg time, i.e., roughly speaking, to the local eigenvalue spacing (see e.g. [23, Eq. (24)] building upon earlier results by Feingold and Peres [32]). For the hyperbolic geodesic flow on general Riemannian manifolds only inverse logarithmic decay has been proven by Zelditch [55] and Schubert [48], which is even optimal for a special highly degenerate eigenbasis of the quantization of Arnold’s cat map [47], see also [40] for surfaces of high genus. For similar quantitative QUE results on large deterministic graphs see [1, 3, 16]. Much stronger polynomial bounds hold for special arithmetic surfaces, proven by Luo and Sarnak [43], and for linear maps on the torus [39, 45] and toral eigenfunctions [35]. For random \(d\)-regular graphs an optimal polynomial speed of convergence for QUE with diagonal observables has been obtained in [6, 7].

For large Wigner matrices, using the Dyson Brownian motion (DBM) for eigenvectors, Bourgade and Yau [12] showed that for any fixed deterministic unit vector \(\varvec{q}\) and any \(i\) in the bulk spectrum or close to the edge, the squared overlaps \(N |\langle \varvec{u}_i, \varvec{q}\rangle |^2\) converge in distribution to the square of a standard Gaussian as \(N\rightarrow \infty \) (see also [8] for deformed Wigner matrices and [37, 52] for the same result under a four-moment matching condition in the bulk). The corresponding a priori bound, asserting that \(N |\langle \varvec{u}_i, \varvec{q}\rangle |^2\lesssim N^\epsilon \) with very high probability for any \(\epsilon >0\), had been known beforehand as the complete delocalisation of eigenvectors [11, 29, 30, 38]. DBM methods also allow one to obtain optimal delocalisation estimates [10]. We mention that [12] also obtains asymptotic normality for the joint distribution of finitely many eigenvectors tested against one fixed vector \(\varvec{q}\), and for the joint distribution of a single eigenvector with finitely many test vectors \(\varvec{q}_1, \varvec{q}_2, \ldots , \varvec{q}_K\). Very recently the joint normality of finitely many eigenvectors and finitely many test vectors has also been achieved [44].

These results based upon DBM establish the universality of fluctuations for individual eigenvectors tested against finite rank observables \(A\), i.e. for \(A=\frac{N}{K}\sum _{k\le K} a_k |\varvec{q}_k\rangle \langle \varvec{q}_k|\), with \(K\) being \(N\)-independent and \(a_k\in [-1,1]\). The key mechanism of QUE for general observables is the self-averaging (ergodic) property of this sum as the rank \(K= K(N)\) tends to infinity. As a simple corollary of the fluctuation results, QUE in a weak form was also obtained, asserting that \(\langle \varvec{u}_i, A \varvec{u}_i\rangle \rightarrow \langle A\rangle \) in probability for any fixed \(i\) in the bulk if \({{\,\mathrm{rank}\,}}(A)\) grows with \(N\), see [12, Corollary 1.4] (this result was stated only for diagonal matrices \(A\), but it directly generalizes to any \(A\) by spectral decomposition). However, the effective probabilistic estimates in [12] were not sufficient to prove the strong form of QUE, i.e. to guarantee that the limit holds for all eigenvectors simultaneously. This uniformity was proven in [13, Theorem 2.5], but only for random matrices with a Gaussian component of size \(t\gg 1/N\) and with an error of order \(1/\sqrt{Nt}\). An off-diagonal version, \(\langle \varvec{u}_i, A \varvec{u}_j\rangle \rightarrow 0\) for \(i\ne j\), coined quantum weak mixing, was also obtained in [13] and strengthened in [9]. Standard Green function comparison arguments may be used to remove the large Gaussian component, but only with a considerably suboptimal error or under the extra assumption of matching the first several (in fact more than four) moments of the matrix elements of \(W\) with those of the Gaussian GOE/GUE ensemble.

Summarizing, our Theorem 1 generalizes the probabilistic QUE proven in [12, Corollary 1.4] and in [13, Theorem 2.5] to general Wigner ensembles in three aspects: (i) the speed of convergence is optimal (up to an \(N^\epsilon \) factor); (ii) the limit is controlled with very high probability; and (iii) it holds uniformly throughout the spectrum, including the bulk, the edge and the intermediate regime. For any deterministic Hermitian observable written in spectral decomposition \(A=\sum _{k=1}^N a_k |\varvec{q}_k\rangle \langle \varvec{q}_k|\), our main result,

$$\begin{aligned} \left|\langle \varvec{u}_i, A \varvec{u}_j\rangle - \delta _{ij}\langle A\rangle \right| = \left|\frac{1}{N}\sum _{k=1}^N a_k \Bigl ( N \langle \varvec{u}_i, \varvec{q}_k \rangle \langle \varvec{q}_k, \varvec{u}_j \rangle - \delta _{ij} \Bigr )\right| \lesssim \frac{N^\epsilon }{\sqrt{N}}, \end{aligned}$$
(3)

shows that the fluctuations of \(N \langle \varvec{u}_i, \varvec{q}_k \rangle \langle \varvec{q}_k, \varvec{u}_j \rangle \) are so strongly asymptotically independent for different \(k\)’s that their average has the expected \(1/\sqrt{N}\) fluctuation scaling reminiscent of the central limit theorem, up to an \(N^\epsilon \) factor. In fact, in our companion paper [18, Theorem 2.3] we also show that the diagonal overlaps \(\langle \varvec{u}_i,A\varvec{u}_i\rangle \), after a small averaging in the index \(i\), satisfy a CLT.

Next we outline the novel ideas of our proof. We consider a spectrally averaged version of the overlaps

$$\begin{aligned} \varLambda ^2:= \max _{i_0, j_0}\frac{1}{(2J)^2}\sum _{\begin{array}{c} |i-i_0|< J\\ |j-j_0|< J \end{array}} N |\langle \varvec{u}_i, A\varvec{u}_j\rangle |^2 \end{aligned}$$
(4)

for bounded traceless observables, \(\langle A\rangle =0\), \(\Vert A\Vert \le 1\), where \(J= N^\epsilon \) with some tiny \(\epsilon >0\). Our goal is to show that \(\varLambda \) is essentially of order one, with high probability. Denoting by \(G=G(z)= (W-z)^{-1}\) the resolvent at \(z\in {\mathbf {H}}\), notice that, by spectral decomposition,

$$\begin{aligned} \varLambda ^2 \sim \sup _{E, E'\in [-2,2]} (\rho \rho ')^{-1} \langle \mathfrak {I}G(E+\mathrm {i}\eta ) A \mathfrak {I}G (E'+\mathrm {i}\eta ') A\rangle , \end{aligned}$$
(5)

where \(\eta \) is slightly above the local eigenvalue spacing at \(E\) and \(\rho \) is the semicircular density at \(E\) smoothed out on scale \(\eta \); the primed quantities are defined analogously. We note that a relation analogous to (5) has previously been used in [4, Section 5]. The main work consists in proving a high probability optimal bound on the quadratic functional of the resolvent \(\langle GAGA\rangle \), with possible imaginary parts and at different spectral parameters. Note that for overlaps with a rank one observable, \(A= |\varvec{q}\rangle \langle \varvec{q}|\), it is sufficient to control \(\langle \varvec{u}_i, A\varvec{u}_i\rangle = |\langle \varvec{q}, \varvec{u}_i\rangle |^2\). After a mild local averaging in the index \(i\) this becomes comparable with \(\langle \varvec{q}, ( \mathfrak {I}G)\varvec{q}\rangle \), whose control is equivalent to a conventional single-\(G\) isotropic local law. This served as a natural input for the DBM proofs on eigenvectors in [12, 13]. For traceless observables, however, \(\langle \varvec{u}_i, A\varvec{u}_i\rangle \) does not have a sign, so we need to consider \(|\langle \varvec{u}_i, A\varvec{u}_i\rangle |^2\) to understand its size; hence the relevant quantity is \(\varLambda ^2\) containing two \(G\) factors as in (5), i.e. single-\(G\) local laws are not sufficient.
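To make the step from (4) to (5) transparent, recall the spectral decomposition of the imaginary part of the resolvent; the following standard computation (our paraphrase of the argument, for a self-adjoint observable \(A=A^*\)) connects the two quantities:

```latex
% spectral decomposition of the imaginary part of the resolvent
\mathfrak{I}G(E+\mathrm{i}\eta)
   = \sum_{i=1}^N \frac{\eta}{(\lambda_i-E)^2+\eta^2}\, \varvec{u}_i \varvec{u}_i^* ,
% hence, for a self-adjoint observable A = A^*,
\langle \mathfrak{I}G(E+\mathrm{i}\eta)\, A\, \mathfrak{I}G(E'+\mathrm{i}\eta')\, A \rangle
   = \frac{1}{N} \sum_{i,j=1}^N
     \frac{\eta}{(\lambda_i-E)^2+\eta^2}\,
     \frac{\eta'}{(\lambda_j-E')^2+\eta'^2}\,
     \bigl| \langle \varvec{u}_i, A \varvec{u}_j \rangle \bigr|^2 .
```

With \(\eta \) slightly above the local spacing at \(E\), the weight \(\eta /((\lambda _i-E)^2+\eta ^2)\) is of order \(1/\eta \sim N\rho \) precisely for the \(O(N\rho \eta )=O(J)\) indices with \(|\lambda _i-E|\lesssim \eta \) and is negligible otherwise, so \((\rho \rho ')^{-1}\) times the right-hand side is comparable with the local averages in (4), which is the content of (5).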

For estimating (5) we face a combination of two serious difficulties. First, we need to gain an additional cancellation from the fact that \(A\) is traceless; second, we need to handle local laws for products of several \(G\)’s. The first issue already arises at the level of a single-\(G\) local law: in Theorem 3 we will prove that the resolvent approximation \(G\approx m\) by the Stieltjes transform \(m=m(z)\) of the Wigner semicircular density, commonly referred to as a local law, holds to a higher accuracy when tested against a traceless observable. More precisely, for the decomposition \(\langle GA\rangle = \langle A\rangle \langle G\rangle + \langle G(A-\langle A\rangle )\rangle \) and with \(\rho :=\left|\mathfrak {I}m\right|/\pi \) we have that

$$\begin{aligned} \langle G\rangle = m+{{\mathcal {O}}}\left( \frac{1}{N\eta } \right) , \quad \langle G(A-\langle A\rangle )\rangle = {{\mathcal {O}}}\left( \frac{\rho ^{1/2}}{N\eta ^{1/2}} \right) , \end{aligned}$$
(6)

with both errors being optimal; in fact, they identify the scale of the asymptotic Gaussian fluctuation of \(\langle G\rangle \) and \( \langle G(A-\langle A\rangle )\rangle \), respectively [18, 34]. Note that the error term for the traceless part is much smaller than that for \(\langle G\rangle \) in the relevant small \(\eta \) regime. For \(\langle GAG^*A\rangle \) the discrepancy is even bigger: without the zero-trace assumption \(\langle GAG^*A\rangle \sim 1/\eta \) (e.g. for \(A=I\)), while for \(\langle A\rangle =0\) we will show that \(\langle GAG^*A\rangle \sim 1\) even for very small \(\eta \).

The second issue touches upon the basic mechanism of the standard proof of the local laws. It consists in deriving an approximate self-consistent equation for the quantity in question, e.g. \(\langle GAGA\rangle \), and comparing it with the corresponding deterministic equation (Dyson equation) without approximation error. The main error term \(\langle \underline{WGAGA}\rangle \), a renormalized version of \(\langle WGAGA\rangle \), see (42), is expected to be smaller than \(\langle GAGA\rangle \), but when estimating its high moments by a cumulant expansion many terms with traces of more than two \(G\)-factors emerge. Trivial a priori bounds using \(\Vert G\Vert \le 1/\eta \) are not affordable, so one has to continue expanding, resulting in higher and higher degree monomials in \(G\), reminiscent of the notoriously difficult closure problem in the BBGKY hierarchy for the correlation functions of interacting particle dynamics. In the proof of the conventional local law \(\langle G\rangle =m+{{\mathcal {O}}}\left( 1/N\eta \right) \), the expansion is stopped by using the Ward identity \(GG^* = \mathfrak {I}G/\eta \), reducing the number of \(G\) factors by one. However, with a deterministic matrix in between, as in \(GAG^*\), the Ward identity is not applicable. A trivial Schwarz bound followed by the Ward identity,

$$\begin{aligned} |\langle GAG^*A \rangle | \le \langle GAA^*G^* \rangle = \frac{1}{\eta } \langle (\mathfrak {I}G)AA^* \rangle , \end{aligned}$$
(7)

is available, but at the expense of replacing the traceless matrix \(A\) with the non-zero trace matrix \(AA^*\), hence losing the main cancellation effect, a loss we cannot afford. Our main idea is to use \(\varLambda \) from (4) as the basic control quantity and derive a stochastic Gronwall inequality for it. In doing so, we use the spectral decomposition of \(G\) to estimate traces of products of many \(G\)’s and \(A\)’s by the lower degree term \(\langle GAGA\rangle \). Technically, this requires extracting sufficiently many \(\varLambda \)-factors in the cumulant expansion, which we achieve by a subtle Feynman graph analysis to estimate all high moments of \(|\langle \underline{WGAGA}\rangle |\).
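The size difference driving this discussion can be seen in a single random sample. The following sketch is our own illustration (parameters arbitrary, not from the paper): for a traceless \(A\) the quantity \(\langle GAG^*A\rangle \) stays bounded as \(\eta \) shrinks, while \(\langle GG^*\rangle = \langle \mathfrak {I}G\rangle /\eta \), the quantity produced by the Ward identity, blows up like \(1/\eta \).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 400
eta = 5 / N                                  # slightly above the local spacing ~ 1/N

X = rng.standard_normal((N, N))
W = (X + X.T) / np.sqrt(2 * N)
G = np.linalg.inv(W - 1j * eta * np.eye(N))  # resolvent at z = i*eta (E = 0)
Gs = G.conj().T

A = np.diag(np.tile([1.0, -1.0], N // 2))    # traceless, ||A|| = 1

traceless_val = (np.trace(G @ A @ Gs @ A) / N).real   # stays O(1)
identity_val = (np.trace(G @ Gs) / N).real            # ~ Im(m)/eta, large

print(traceless_val, identity_val)
```

With these parameters the second trace is larger by roughly a factor \(1/\eta \), illustrating the cancellation that a naive Schwarz bound like (7) destroys.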

Feynman diagrams have been systematically used to organize cumulant expansions, and their estimates come at different levels of sophistication, see e.g. [17, 27, 28], but also related expansions in random matrices, e.g. [24, 25, 33]. For the proof of (6) via cumulant expansion (e.g. following [28]), it is sufficient to monitor the number of \(N\)-factors (from the size of the cumulants and from the summation of intermediate indices) and the number of \(\rho /\eta \) factors from the Ward identity. In the current analysis we additionally need to monitor the \(\varLambda \) factors. While the number of traceless \(A\)-factors is preserved along the expansion, the cancellation effect of some of them may be lost, as in (7). Our proof has to carefully offset all such losses by the gains from higher order cumulants that typically accompany the loss of effective \(A\)-factors. In particular, since we are aiming at an optimal bound, in expansion terms that involve only second order cumulants we need to gain from all \(A\)-factors. We used a similar but much simpler expansion in our work on the CLT for non-Hermitian random matrices, see [17, Prop. 5.3, Eq. (5.10c)], where the additional smallness came from the large distance between two (non-Hermitian) spectral parameters \(z_1, z_2\). However, in [17] it was sufficient to gain only a small proportion of all possible smallness factors since we did not aim at the optimal bound. In the current paper, using refined combinatorics, we manage to extract the zero trace orthogonality effect to the maximal extent; this is the key to obtaining the optimal error bound in (3). Similarly, for the proof of (2) we manage to extract the asymptotic orthogonality effect between the eigenvectors \(\varvec{u}_i\) and their complex conjugates \(\overline{\varvec{u}_i}\) optimally, resulting in the bound \(\left|\langle GG^t\rangle \right|\lesssim 1\), gaining a full power of \(\eta \) over e.g. \(\langle GG^*\rangle \sim 1/\eta \).

After this introduction and presenting the main results in the next Sect. 2, we prove the local laws involving two resolvents in Sect. 3. The main inputs for them are the improved bounds on renormalized (“underlined”) monomials in several \(G\)’s in Theorem 5 that are proven in Sect. 5. Note that even though we are interested in local laws only with two \(G\)’s, due to the cumulant expansion we need to control arbitrarily long monomials involving products of \(G\)’s and \(A\)’s.

1.1 Notations and conventions

We introduce some notation used throughout the paper. For integers \(k\in {\mathbf {N}}\) we use the notation \([k]:= \left\{ 1,\ldots , k \right\} \). We write \({\mathbf {H}}\) for the upper half-plane \({\mathbf {H}}:= \left\{ z\in {\mathbf {C}}\Big |\mathfrak {I}z>0 \right\} \). For positive quantities \(f,g\) we write \(f\lesssim g\) and \(f\sim g\) if \(f \le C g\) or \(c g\le f\le Cg\), respectively, for some constants \(c,C>0\) which depend only on the constants appearing in (8). We denote vectors by bold-faced lower case Roman letters \({\varvec{x}}, {\varvec{y}}\in {\mathbf {C}}^k\), for some \(k\in {\mathbf {N}}\). Vector and matrix norms, \(\left\Vert \varvec{x}\right\Vert \) and \(\left\Vert A\right\Vert \), indicate the usual Euclidean norm and the corresponding induced matrix norm. For any \(N\times N\) matrix \(A\) we use the notation \(\langle A\rangle := N^{-1}{{\,\mathrm{Tr}\,}}A\) to denote the normalized trace of \(A\). Moreover, for vectors \({\varvec{x}}, {\varvec{y}}\in {\mathbf {C}}^N\) we define

$$\begin{aligned} \langle {\varvec{x}},{\varvec{y}}\rangle := \sum _i \overline{x}_i y_i, \qquad A_{\varvec{x}\varvec{y}}:=\langle \varvec{x},A\varvec{y}\rangle , \end{aligned}$$

with \(A\in {\mathbf {C}}^{N\times N}\). We will use the concept of “with very high probability” meaning that for any fixed \(D>0\) the probability of the \(N\)-dependent event is bigger than \(1-N^{-D}\) if \(N\ge N_0(D)\). Moreover, we use the convention that \(\xi >0\) denotes an arbitrarily small constant, independent of \(N\).

2 Main Results

We consider real symmetric or complex Hermitian \(N\times N\) Wigner matrices \(W\). We formulate the following assumptions on the entries of \(W\).

Assumption 1

The matrix elements \(w_{ab}\) are independent up to the Hermitian symmetry \(w_{ab}=\overline{w_{ba}}\). We assume identical distribution in the sense that \(w_{ab}{\mathop {=}\limits ^{\mathrm {d}}}N^{-1/2}\chi _{\mathrm {od}}\), for \(a<b\), \(w_{aa}{\mathop {=}\limits ^{\mathrm {d}}}N^{-1/2}\chi _{\mathrm {d}}\), with \(\chi _{\mathrm {od}}\) being a real or complex random variable and \(\chi _{\mathrm {d}}\) being a real random variable such that \({{\,\mathrm{{\mathbf {E}}}\,}}\chi _{\mathrm {od}}={{\,\mathrm{{\mathbf {E}}}\,}}\chi _{\mathrm {d}}=0\) and \({{\,\mathrm{{\mathbf {E}}}\,}}|\chi _{\mathrm {od}}|^2=1\). In the complex case we also assume that \({{\,\mathrm{{\mathbf {E}}}\,}}\chi _{\mathrm {od}}^2\in {\mathbf {R}}\). In addition, we assume the existence of all moments of \(\chi _{\mathrm {od}}\), \(\chi _{\mathrm {d}}\), i.e. that for any \(p\in {\mathbf {N}}\) there exists a constant \(C_p>0\) such that

$$\begin{aligned} {{\,\mathrm{{\mathbf {E}}}\,}}|\chi _{\mathrm {d}}|^p+{{\,\mathrm{{\mathbf {E}}}\,}}|\chi _{\mathrm {od}}|^p\le C_p. \end{aligned}$$
(8)

In this paper we use the notations \(w_2:={{\,\mathrm{{\mathbf {E}}}\,}}\chi _{\mathrm {d}}^2\), \(\sigma :={{\,\mathrm{{\mathbf {E}}}\,}}\chi _{\mathrm {od}}^2\) and their commonly occurring combination \(\widetilde{w_2}:=w_2-1-\sigma \), and note that \(w_2,\widetilde{w_2},\sigma \in {\mathbf {R}}\).

Our first main result is a proof of the Eigenstate Thermalisation Hypothesis which, in mathematical terms, amounts to the strong Quantum Unique Ergodicity (QUE) with an optimal convergence rate for general observables, uniformly in the spectrum of \(W\).

Theorem 1

(Eigenstate Thermalization Hypothesis). Let \(W\) be a Wigner matrix satisfying Assumption 1, and denote by \(\varvec{u}_1,\dots ,\varvec{u}_N\) its orthonormal eigenvectors. Then for any deterministic matrix \( A\) with \(\left\Vert A\right\Vert \lesssim 1\) it holds

$$\begin{aligned} \max _{i,j\in [N]}\left|\langle \varvec{u}_i, A \varvec{u}_j\rangle -\langle A\rangle \delta _{ij}\right|+\max _{i,j\in [N]}\left|\langle \varvec{u}_i, A \overline{\varvec{u}_j}\rangle -\langle A\rangle \langle \varvec{u}_i,\overline{\varvec{u}_j}\rangle \right| \le \frac{N^\xi }{\sqrt{N}}, \end{aligned}$$
(9)

with very high probability for any arbitrarily small \(\xi >0\).

The first relation in Theorem 1 states that any deterministic observable is essentially diagonal in the eigenbasis of \(W\), in other words the eigenvectors remain asymptotically orthogonal when tested against any traceless observable \(\langle A\rangle =0\). The second relation shows the same phenomenon between the eigenbasis \(\left\{ {\varvec{u}}_i \right\} _{i\in [N]}\) of \(W\) and the eigenbasis \(\left\{ \overline{{\varvec{u}}_i} \right\} _{i\in [N]}\) of \(W^t\). The scalar products \(\langle \varvec{u}_i,\overline{\varvec{u}_j}\rangle \) appearing in (9) when \(\langle A\rangle \ne 0\) can also be identified. Indeed, the next theorem shows that these two eigenbases are also essentially orthogonal apart from the extreme cases \(\sigma =\pm 1\) (see Remark 1 below).

Theorem 2

Let \(W\) be a Wigner matrix satisfying Assumption 1, and denote by \(\varvec{u}_1,\dots ,\varvec{u}_N\) its orthonormal eigenvectors. Recall \(\sigma :={{\,\mathrm{{\mathbf {E}}}\,}}\chi _{\mathrm {od}}^2\) and assume \(|\sigma |<1\). Then there is a constant \(C_\sigma <\infty \) such that

$$\begin{aligned} \max _{i,j\in [N]}\left|\langle \varvec{u}_i, \overline{\varvec{u}_j}\rangle \right| \le C_\sigma \frac{N^\xi }{\sqrt{N}}, \end{aligned}$$
(10)

with very high probability for any arbitrarily small \(\xi >0\).

Remark 1

Theorem 2 does not hold for \(\sigma =\pm 1\). Indeed, for \(\sigma =1\) the matrix \(W\) is real symmetric, hence the eigenvectors can be chosen real, so \(\left|\langle \varvec{u}_i, \overline{\varvec{u}_j}\rangle \right|=\delta _{ij}\). On the other hand, for \(\sigma =-1\) and \(w_2=0\) the spectrum is symmetric, i.e. the eigenvalues \(\lambda _1\le \lambda _2\le \dots \) and the corresponding eigenvectors \({\varvec{u}}_1,{\varvec{u}}_2,\dots \) come in pairs, \(\lambda _{N-i+1} = -\lambda _i\) and \(\varvec{u}_{N-i+1}=\overline{\varvec{u}_{i}}\) (up to phase), and thus \(\left|\langle \varvec{u}_i, \overline{\varvec{u}_j}\rangle \right|=\delta _{i,N-j+1}\).
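The \(\sigma =-1\), \(w_2=0\) case is easy to verify numerically. The sketch below is our own illustration (size \(N\) and seed arbitrary): it realizes this case as \(W=\mathrm {i}X/\sqrt{2N}\) with \(X\) real antisymmetric Gaussian and checks both the spectral symmetry and the anti-diagonal overlap structure.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100                       # even, so eigenvalues pair up as {±λ} without a zero mode

# sigma = E chi_od^2 = -1 and w_2 = 0: purely imaginary off-diagonal entries,
# zero diagonal, i.e. W = i X with X real antisymmetric Gaussian, E|w_ab|^2 = 1/N.
Y = rng.standard_normal((N, N))
Xa = Y - Y.T                  # real antisymmetric; off-diagonal entries have variance 2
W = 1j * Xa / np.sqrt(2 * N)

lam, U = np.linalg.eigh(W)    # eigenvalues in increasing order

# Spectrum symmetric: lam_{N-i+1} = -lam_i; and conj(u_i) is an eigenvector for
# -lam_i, so |<u_i, conj(u_j)>| = |(U^T U)_{ij}| is the anti-diagonal identity.
M = np.abs(U.T @ U)
print(np.allclose(lam, -lam[::-1]), np.allclose(M, np.fliplr(np.eye(N)), atol=1e-6))
```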

The main inputs to prove Theorems 1–2 are the local laws for one and two resolvents (and their transposes) tested against matrices \(A\) with \(\langle A\rangle =0\). We recall that in the limit \(N\rightarrow \infty \) the resolvent \(G=G(z)=(W-z)^{-1}\) becomes approximately deterministic. Its deterministic approximation is \(m=m_{\mathrm {sc}}\), the Stieltjes transform of the Wigner semicircular law, i.e. the unique solution of the quadratic equation

$$\begin{aligned} -\frac{1}{m}=z+m, \qquad \mathfrak {I}m(z) \mathfrak {I}z>0. \end{aligned}$$
(11)

We note that \(|m|\le 1\) for any \(z\). In this paper we allow spectral parameters with \(\mathfrak {I}z<0\), in order to conveniently account for possible adjoints of the resolvent since \(G(z)^*=G(\overline{z})\). Therefore, in contrast with most papers on local laws, \(\mathfrak {I}m_{\mathrm {sc}}\) may be negative and we define \(\rho =\rho _{\mathrm {sc}}(z):=\pi ^{-1}|\mathfrak {I}m_{\mathrm {sc}}|\) and \(\eta := |\mathfrak {I}z|\).
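For concreteness, (11) can be solved explicitly: \(m(z)=(-z+\sqrt{z^2-4})/2\) with the square-root branch chosen so that \(|m|\le 1\), which, since the two roots of \(m^2+zm+1=0\) multiply to \(1\), also enforces \(\mathfrak {I}m(z)\mathfrak {I}z>0\) off the real axis. A short numerical sketch (our illustration; the function name is ours):

```python
import numpy as np

def m_sc(z):
    """Stieltjes transform of the semicircle law: the root of m^2 + z m + 1 = 0
    with |m| <= 1; the two roots multiply to 1, so this choice also gives
    Im(m) Im(z) > 0 off the real axis."""
    s = np.sqrt(complex(z * z - 4))    # principal branch of the square root
    m = (-z + s) / 2
    return m if abs(m) <= 1 else (-z - s) / 2

z = 0.5 + 0.01j
m = m_sc(z)
print(m, -1 / m - (z + m))             # the second number should vanish
```

By construction \(m\) satisfies \(-1/m=z+m\) to machine precision, and \(m(\overline{z})=\overline{m(z)}\), consistent with the sign convention \(\mathfrak {I}m\,\mathfrak {I}z>0\).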

The classical local law (see e.g. [11, 30, 38]) for a single resolvent in averaged and isotropic form states that in the spectral regime \(\left\{ z \Big |N\rho \eta \ge 1 \right\} \) we have

$$\begin{aligned} |\langle (G-m)A\rangle |\prec \frac{1}{N\eta }, \qquad \left|\langle {\varvec{x}}, (G-m){\varvec{y}}\rangle \right|\prec \sqrt{\frac{\rho }{N\eta }} \end{aligned}$$
(12)

for any deterministic matrix \(A\) and vectors \({\varvec{x}}, {\varvec{y}}\), with \(\Vert A\Vert , \left\Vert \varvec{x}\right\Vert , \left\Vert \varvec{y}\right\Vert \lesssim 1\). Here \(\prec \) denotes the commonly used notion of stochastic domination (see, e.g., [26]), i.e. a bound that holds with very high probability up to a factor \(N^\epsilon \) for any small \(\epsilon >0\), uniformly in \(A, {\varvec{x}}, {\varvec{y}}\) and in the spectral parameter \(z\) as long as \(N\rho \eta \ge 1\). The precise definition is as follows:

Definition 1

(Stochastic Domination). If

$$\begin{aligned} X=\left( X^{(N)}(u) \Big |N\in {\mathbf {N}}, u\in U^{(N)} \right) \quad \text {and}\quad Y=\left( Y^{(N)}(u) \Big |N\in {\mathbf {N}}, u\in U^{(N)} \right) \end{aligned}$$

are families of non-negative random variables indexed by \(N\), and possibly some parameter \(u\), then we say that \(X\) is stochastically dominated by \(Y\), if for all \(\epsilon , D>0\) we have

$$\begin{aligned}\sup _{u\in U^{(N)}} {{\,\mathrm{{\mathbf {P}}}\,}}\left[ X^{(N)}(u)>N^\epsilon Y^{(N)}(u)\right] \le N^{-D}\end{aligned}$$

for large enough \(N\ge N_0(\epsilon ,D)\). In this case we use the notation \(X\prec Y\) or \(X= {{\mathcal {O}}_\prec }\left( Y \right) \).

Our key new insight is that whenever the deterministic matrix \(A\) in (12) is traceless, \(\langle GA\rangle \) is considerably smaller (by a factor of \(\sqrt{\rho \eta }\) in the interesting small \(\eta \) regime) than the general bound (12) predicts. There is no such improvement for the isotropic law.

Theorem 3

(Traceless single \(G\) local law). Fix \(\epsilon >0\), let \(W\) be a Wigner matrix satisfying Assumption 1, let \(z\in {\mathbf {C}}\setminus {\mathbf {R}}\), and let \(G(z)=(W-z)^{-1}\). We use the notation \(\eta :=|\mathfrak {I}z|\), \(\rho =\rho _{\mathrm {sc}}(z)\), \(m=m_{\mathrm {sc}}(z)\). Then for \(N\eta \rho \ge N^{\epsilon }\) and for any deterministic matrix \(A\), with \(\langle A\rangle =0\) and \(\left\Vert A\right\Vert \lesssim 1\), we have

$$\begin{aligned} |\langle (G-m)A\rangle | = |\langle GA\rangle |\prec \frac{\sqrt{\rho }}{N\sqrt{\eta }}. \end{aligned}$$
(13)
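The improvement in (13) over the general bound (12) is already visible in a single random sample. The following sketch is our own illustration (parameters arbitrary): it compares \(|\langle GA\rangle |\) for a traceless \(A\) against both the traceless scale \(\sqrt{\rho }/(N\sqrt{\eta })\) and the general-observable scale \(1/(N\eta )\).

```python
import numpy as np

rng = np.random.default_rng(4)
N, E, eta = 400, 0.3, 0.02                     # bulk energy; N*eta*rho >> 1

X = rng.standard_normal((N, N))
W = (X + X.T) / np.sqrt(2 * N)
G = np.linalg.inv(W - (E + 1j * eta) * np.eye(N))

A = np.diag(np.tile([1.0, -1.0], N // 2))      # traceless test observable
avg_GA = np.trace(G @ A) / N

rho = np.sqrt(4 - E * E) / (2 * np.pi)         # semicircle density at E
print(abs(avg_GA), np.sqrt(rho) / (N * np.sqrt(eta)), 1 / (N * eta))
```

The observed value sits at the smaller traceless scale, an order of magnitude below \(1/(N\eta )\).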

We prove a similar drastic improvement, owing to traceless observables, for local laws involving two resolvents, such as \(\langle GAGA\rangle \), as well as for local laws involving a resolvent and its transpose, \(\langle GG^t\rangle \). The isotropic laws are also improved in this case. The precise statements will be given in Remark 3 in Sect. 3. We close the current section with a remark indicating the optimality of the new local law (13).

Remark 2

The local law for \(\langle GA\rangle \) in (13) is optimal for \(G,G^*\), as well as for \(\mathfrak {I}G\). Indeed, a simple calculation from [18, Theorem 4.1] shows, for \(\mathfrak {I}z>0\), that

$$\begin{aligned} {{\,\mathrm{{\mathbf {E}}}\,}}\left|\langle \mathfrak {I}G A\rangle \right|^2 \approx \frac{\langle AA^*\rangle }{2N^2}\Bigl (\frac{\mathfrak {I}m}{\eta }-\mathfrak {R}\partial _z m\Bigr ) \sim \frac{\mathfrak {I}m}{N^2\eta }. \end{aligned}$$
(14)

In fact, in our companion paper we prove that \(\langle \mathfrak {I}GA\rangle \) is asymptotically Gaussian with zero expectation and variance given in (14) (see [18, Eq. (94)]). This variance is much smaller than the one without traceless observable, \({{\,\mathrm{Var}\,}}\langle \mathfrak {I}G\rangle \sim (N\eta )^{-2}\) (see [34]).
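The scale in (14) can be sanity-checked by Monte Carlo. The sketch below is our own illustration (sample size, parameters and the complex Hermitian ensemble are our choices, and we only test the order of magnitude, not the precise constant): it compares the empirical variance of \(\langle \mathfrak {I}G A\rangle \) with the prediction, using \(\partial _z m = m^2/(1-m^2)\), which follows from differentiating (11).

```python
import numpy as np

rng = np.random.default_rng(5)
N, eta, E, S = 100, 0.05, 0.2, 200      # S = number of Monte Carlo samples
z = E + 1j * eta

# m_sc(z) and its derivative dm = m^2/(1-m^2), both from -1/m = z + m.
s = np.sqrt(complex(z * z - 4))
m = (-z + s) / 2
if abs(m) > 1:
    m = (-z - s) / 2
dm = m * m / (1 - m * m)

A = np.diag(np.tile([1.0, -1.0], N // 2))      # traceless, <A A*> = 1
samples = []
for _ in range(S):
    Y = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    W = (Y + Y.conj().T) / np.sqrt(2 * N)      # complex Hermitian Wigner matrix
    G = np.linalg.inv(W - z * np.eye(N))
    samples.append(np.trace(G @ A).imag / N)   # <Im G A>, real since A is Hermitian

predicted = (m.imag / eta - dm.real) / (2 * N * N)
ratio = np.var(samples) / predicted
print(ratio)                                   # order one if (14) has the right scale
```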

3 Quantum Unique Ergodicity: Proof of Theorems 1–2

For integers \(J\in {\mathbf {N}}\) and self-adjoint matrices \(B=B^*\) we introduce the \(J\)-averaged observables

$$\begin{aligned} \varXi _J^B&:= \biggl (\max _{i_0,j_0}\frac{N}{(2J)^2}\sum _{\left|i-i_0\right|<J}\sum _{\left|j-j_0\right|<J} \left|\langle \varvec{u}_i, B \varvec{u}_j\rangle \right|^2 \biggr )^{1/2}, \end{aligned}$$
(15)
$$\begin{aligned} {\bar{\varXi }}_J^B&:= \biggl (\max _{i_0,j_0}\frac{N}{(2J)^2}\sum _{\left|i-i_0\right|<J}\sum _{\left|j-j_0\right|<J} \left|\langle \varvec{u}_i, B \overline{\varvec{u}_j}\rangle \right|^2 \biggr )^{1/2}. \end{aligned}$$
(16)

The following theorem shows that both averaged overlaps \(\varXi _J^B,{\bar{\varXi }}_J^B\) are essentially bounded if \(B\) is traceless, and that \({\bar{\varXi }}_J^I\) is essentially bounded for \(\left|\sigma \right|<1\) with the choice of the identity matrix \(B=I\).

Theorem 4

Fix \(\epsilon >0\), let \(J\ge N^\epsilon \), and let \(A=A^*\) be a deterministic Hermitian matrix such that \(\langle A\rangle =0\), \(\left\Vert A\right\Vert \lesssim 1\); then it holds

$$\begin{aligned} \varLambda _J^A\prec 1 \qquad \text {for}\qquad \varLambda _J^A:=\varXi _J^A+{\bar{\varXi }}_J^A. \end{aligned}$$
(17)

Similarly, for any \(|\sigma |<1\) it holds

$$\begin{aligned} \varPi _J\prec 1 \qquad \text {for}\qquad \varPi _J:={\bar{\varXi }}_J^I, \end{aligned}$$
(18)

where the error is uniform in \(|\sigma |\le 1-\epsilon '\), for any fixed \(\epsilon '>0\).

Hence, up to a \(J\)-averaging, we have the asymptotic orthogonality of \(\varvec{u}_i\) and \(A\varvec{u}_j\), \(A\overline{\varvec{u}_j}\) for any \(i,j\) and for any traceless \(A\). Similarly, for \(\left|\sigma \right|<1\) we have the asymptotic orthogonality of \(\varvec{u}_i\) and \(\overline{\varvec{u}_j}\). Note that in the case \(\sigma =\pm 1\) the bound (18) does not hold, since then \(\left|\varPi _J\right|\gtrsim \sqrt{N/J}\), as is easily seen. Using Theorem 4 we immediately conclude Theorems 1–2 by removing the \(J\)-averaging with a small \(J\).

Proof

(Theorems 1–2) For the proof of Theorem 1 we may assume, by linearity, that \(A\) is traceless and self-adjoint. For any \(i,j\in [N]\), by (17), we have that

$$\begin{aligned} |\langle \varvec{u}_i, A \varvec{u}_j\rangle |^2+|\langle \varvec{u}_i, A \overline{\varvec{u}_j}\rangle |^2\lesssim \frac{J^2}{N} (\varLambda _J^A)^2\prec \frac{J^2}{N}. \end{aligned}$$
(19)

Since \(J=N^\epsilon \) with \(\epsilon >0\) arbitrarily small, the bound in (19), together with the definition of \(\prec \) in Definition 1, implies that \( |\langle \varvec{u}_i, A \varvec{u}_j\rangle |^2+|\langle \varvec{u}_i, A \overline{\varvec{u}_j}\rangle |^2\prec N^{-1}\), concluding the proof of Theorem 1. The proof of Theorem 2 is completely analogous and is therefore omitted. \(\square \)

As a first step towards the proof of Theorem 4, we show that \(\varXi _J^B,{\bar{\varXi }}_J^B\) are comparable with \(\langle \mathfrak {I}G_1 B \mathfrak {I}G_2 B\rangle ,\langle \mathfrak {I}G_1 B \mathfrak {I}G_2^t B\rangle \) for suitably chosen spectral parameters in the resolvents \(G_i=G(z_i)\). For any \(J\in {\mathbf {N}}\) and \(E\in [-2,2]\) we define \(z=z(E,J)=E+\mathrm {i}\eta (E,J)\in {\mathbf {H}}\) implicitly via the equation \(N\eta (E,J)\rho (z(E,J))=J\). Note that this equation has a unique solution \(\eta (E,J)>0\) since the function \(\eta \mapsto \eta \mathfrak {I}m(E+\mathrm {i}\eta )\) is strictly increasing from \(0\) to \(1\). The following simple lemma will be proven at the end of this section.
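For orientation, the scale \(\eta (E,J)\) can be read off explicitly in the two extreme regimes; this is a routine computation (ours, not spelled out in the text) based on \(\rho (x)=(2\pi )^{-1}\sqrt{(4-x^2)_+}\):

```latex
% Bulk regime: for E well inside (-2,2) we have \rho(z(E,J))\sim\rho(E)\sim 1, hence
N\eta\rho=J \quad\Longrightarrow\quad \eta(E,J)\sim\frac{J}{N\rho(E)}\sim\frac{J}{N};
% Edge regime: at E=\pm 2 we have \rho(\pm 2+\mathrm{i}\eta)\sim\sqrt{\eta}, hence
N\eta^{3/2}\sim J \quad\Longrightarrow\quad \eta(\pm 2,J)\sim\Bigl(\frac{J}{N}\Bigr)^{2/3}.
```

In both regimes \(\eta (E,J)\) lies well above the scale \(1/N\) once \(J\ge N^\epsilon \), which is the window where the local laws of Sect. 4 apply.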

Lemma 1

Let \(\epsilon >0\), fix some \(J\ge N^\epsilon \), and let \(B=B^*\) be an arbitrary deterministic self-adjoint matrix. For \(E_1,E_2\in [-2,2]\) let \(z_i=z(E_i,J)\) and set \(G_i=G(z_i)\), \(\rho _i=\rho (z_i)\). Then we have

$$\begin{aligned} \begin{aligned} (\varXi _J^B)^2&\lesssim \sup _{E_1,E_2}\frac{\langle \mathfrak {I}G_1B \mathfrak {I}G_2 B\rangle }{\rho _1\rho _2}\lesssim (\varXi _J^B)^2, \\ ({\bar{\varXi }}_J^B)^2&\lesssim \sup _{E_1,E_2}\frac{\langle \mathfrak {I}G_1B \mathfrak {I}G_2^t B\rangle }{\rho _1\rho _2}\lesssim ({\bar{\varXi }}_J^B)^2, \end{aligned} \end{aligned}$$
(20)

with very high probability.

As the main inputs for Theorem 4, we now state the various local laws and bounds for products of \(G\)’s, their transposes and deterministic matrices in the following Propositions 1–2. These bounds still involve the key control quantities \(\varLambda \) and \(\varPi \). Using these bounds, we will prove Theorem 4 by a Gronwall argument on \(\varLambda \) and \(\varPi \) that will immediately imply Theorem 3. Finally, for completeness, we also state a few representative local laws involving two resolvents in Remark 3. The key technical Propositions 1–2 will be proven in Sect. 4.

Proposition 1

Let \(A=A^*\) be a deterministic matrix with \(\langle A\rangle =0\). Fix \(\epsilon >0\) and consider \(z\in {\mathbf {C}}\setminus {\mathbf {R}}\) such that \(L:=N\eta \rho \ge N^\epsilon \). Then for \(G=G(z)\) we have that

$$\begin{aligned} |\langle GA\rangle |\prec \frac{\sqrt{\rho }\varLambda _+^A}{N\sqrt{\eta }}, \end{aligned}$$
(21)

with \(\varLambda _+^A:=\varLambda _L^A+\Vert A\Vert \).

Proposition 2

Let \(A=A^*\), \(A'=(A')^*\) be deterministic matrices with \(\langle A\rangle =0=\langle A'\rangle \). Fix \(\epsilon >0\), let \(W\) be a Wigner matrix satisfying Assumption 1, let \( z_1,z_2\in {\mathbf {C}}\setminus {\mathbf {R}}\), and let \(G_i=G(z_i)\), for \(i\in \left\{ 1,2 \right\} \). We use the notations \(\eta _i:=|\mathfrak {I}z_i|\), \(\rho _i=\rho _{\mathrm {sc}}(z_i)\), \(m_i=m_{\mathrm {sc}}(z_i)\), and set \(L:=N\min _i(\eta _i\rho _i)\), \(\eta _*:=\eta _1\wedge \eta _2\) and \(\rho ^*=\rho _1\vee \rho _2\). Then for \(L\ge N^\epsilon \) and setting \(\varLambda _+^A=\varLambda _L^A+\Vert A\Vert \), \(\varPi _+:=\varPi _L+1\), we have the averaged local laws

$$\begin{aligned}&|\langle G_1G_2^{(t)}A\rangle |\prec \frac{\sqrt{\rho ^*}\varLambda _+^A}{N\eta _*^{3/2}},\,\, |\langle \mathfrak {I}G_1 A G_2^{(t)}\rangle |\prec \frac{\rho _1\varLambda _+^A}{L\sqrt{\eta _*}}, \,\, |\langle \mathfrak {I}G_1 A \mathfrak {I}G_2^{(t)}\rangle |\prec \frac{\rho _1\rho _2\varLambda _+^A}{L\sqrt{\eta _*}}, \end{aligned}$$
(22)
$$\begin{aligned}&\langle G_1AG_2^{(t)}A'\rangle =m_1m_2\langle AA'\rangle +{{\mathcal {O}}_\prec }\left( \varLambda _+^A\varLambda _+^{A'}\sqrt{\frac{\rho ^*}{N\eta _*}} \right) , \end{aligned}$$
(23)
$$\begin{aligned}&\langle \mathfrak {I}G_1A\mathfrak {I}G_2^{(t)} A'\rangle =\mathfrak {I}m_1 \mathfrak {I}m_2\langle AA'\rangle +{\mathcal {O}}_\prec \left( \frac{\rho _1 \rho _2\varLambda _+^A\varLambda _+^{A'}}{\sqrt{L}} \right) , \end{aligned}$$
(24)

where \(G^{(t)}\) indicates that the bounds are valid for both choices \(G\) or \(G^t\). Moreover, for any deterministic vectors \({\varvec{x}}, {\varvec{y}}\) such that \(\Vert {\varvec{x}}\Vert +\Vert {\varvec{y}}\Vert \lesssim 1\) we have the isotropic law

$$\begin{aligned} |\langle {\varvec{x}}, G_1AG_2 {\varvec{y}}\rangle |\prec \varLambda _+^A\sqrt{\frac{\rho ^*}{\eta _*}}. \end{aligned}$$
(25)

Additionally, for \(|\sigma |<1\) we have that

$$\begin{aligned} \langle G_1G_2^t\rangle =\frac{m_1m_2}{1-\sigma m_1 m_2}+{{\mathcal {O}}_\prec }\left( \varPi _+^2\sqrt{\frac{\rho ^*}{N\eta _*}} \right) , \end{aligned}$$
(26)

and

$$\begin{aligned} \langle \mathfrak {I}G_1\mathfrak {I}G_2^t\rangle =\frac{\mathfrak {I}m_1 \mathfrak {I}m_2 (1-\sigma ^2|m_1|^2|m_2|^2)}{|1-\sigma m_1m_2|^2|1-\sigma \overline{m_1}m_2|^2}+{\mathcal {O}}_\prec \left( \frac{\rho _1 \rho _2\varPi _+^2}{\sqrt{L}} \right) , \end{aligned}$$
(27)

where the error is uniform in \(|\sigma |\le 1-\epsilon '\), for any fixed \(\epsilon '>0\).

Using Lemma 1 and Propositions 1–2 as an input, we now conclude the proof of Theorem 4.

Proof

(Theorem 4) We start with the proof of (17). Choose \(J=N^\epsilon \) with a fixed, arbitrarily small \(\epsilon >0\), and \(E_1,E_2\in [-2,2]\). Then by the definition of \(z(E_i,J)=E_i+\mathrm {i}\eta (E_i,J)\) above Lemma 1 it follows that

$$\begin{aligned} J=N\eta (E_1,J)\rho (z(E_1,J))=N\eta (E_2,J)\rho (z(E_2,J)), \end{aligned}$$

and thus we obtain from (24) that

$$\begin{aligned} \frac{\left|\langle \mathfrak {I}G(z(E_1,J))A\mathfrak {I}G(z(E_2,J))^{(t)} A \rangle \right|}{\rho (z(E_1,J))\rho (z(E_2,J)) }\prec 1+\frac{(\varLambda _+^A)^2}{J^{1/2}}. \end{aligned}$$
(28)

By a standard grid argument using the Lipschitz continuity of the resolvent we conclude that (28) remains valid after taking the supremum over \(E_1,E_2\in [-2,2]\) and therefore from the lower bound in Lemma 1 we conclude

$$\begin{aligned} (\varLambda _J^A)^2\prec 1+\frac{(\varLambda ^A_+)^2}{J^{1/2}}. \end{aligned}$$
(29)

Finally, by (29) it follows that \((\varLambda _J^A)^2\prec 1\), since the term \((\varLambda _J^A)^2/J^{1/2}\) on the rhs. can be absorbed into the lhs. for \(J\ge N^\epsilon \); this concludes the proof of (17). The proof of (18) is completely analogous, using the local law (27) instead of (24).
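The absorption step implicit in this conclusion can be spelled out as follows (recall \(\varLambda _+^A=\varLambda _L^A+\Vert A\Vert \), and note that \(L=J\) here by the choice of the spectral parameters, so \((\varLambda _+^A)^2\lesssim (\varLambda _J^A)^2+\Vert A\Vert ^2\)):

```latex
(\varLambda _J^A)^2\prec 1+\frac{(\varLambda _J^A)^2+\Vert A\Vert ^2}{J^{1/2}}
\quad\Longrightarrow\quad
\bigl(1-J^{-1/2}\bigr)(\varLambda _J^A)^2\prec 1+\frac{\Vert A\Vert ^2}{J^{1/2}}\lesssim 1,
```

and since \(J\ge N^\epsilon \) the prefactor \(1-J^{-1/2}\) is bounded below by \(1/2\) for large \(N\), hence \((\varLambda _J^A)^2\prec 1\).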

This concludes the proof of Theorem 4. \(\square \)

Proof

(Theorem 3) This theorem immediately follows from Proposition 1 together with the bound \(\varLambda _L^A\prec 1\) obtained in Theorem 4. \(\square \)

Remark 3

Proposition 2 combined with the bound \(\varLambda _+^A+\varvec{1}(|\sigma |<1)\varPi _+\prec 1\) obtained in Theorem 4 also provides local laws involving two resolvents, as counterparts of the single-\(G\) local law stated in Theorem 3. For example, with the notations of Proposition 2, we have

$$\begin{aligned} \langle G_1AG_2^{(t)}A'\rangle&=m_1m_2\langle AA'\rangle +{{\mathcal {O}}_\prec }\left( \sqrt{\frac{\rho ^*}{N\eta _*}} \right) , \qquad |\langle {\varvec{x}}, G_1AG_2 {\varvec{y}}\rangle |\prec \sqrt{\frac{\rho ^*}{\eta _*}}, \end{aligned}$$
(30)

and for \(|\sigma |< 1\) we also have

$$\begin{aligned} \langle G_1G_2^t\rangle =\frac{m_1m_2}{1-\sigma m_1m_2}+{{\mathcal {O}}_\prec }\left( \sqrt{\frac{\rho ^*}{N\eta _*}} \right) . \end{aligned}$$
(31)

We stated only the local laws for two resolvents where the asymptotic orthogonality mechanism is detected, i.e. if a traceless deterministic matrix is present or if \(G\) and \(G^t\) appear next to each other and \(|\sigma |<1\). Note that when both mechanisms are simultaneously present, as in the terms with transposes in (22), one may gain from both effects simultaneously, but we refrain from doing this here.

For comparison, we also list some local laws without exploiting this mechanism:

$$\begin{aligned} \begin{aligned} \langle G_1G_2\rangle&=\frac{m_1m_2}{1- m_1m_2}+{\mathcal {O}}_\prec \left( \frac{1}{N\eta _1\eta _2}\right) , \\ \langle {\varvec{x}}, G_1G_2 {\varvec{y}}\rangle&=\frac{m_1m_2\langle {\varvec{x}}, {\varvec{y}}\rangle }{1-m_1m_2}+{\mathcal {O}}_\prec \left( \frac{\sqrt{\rho ^*}}{\sqrt{N\eta _*}\eta ^*}\right) , \end{aligned} \end{aligned}$$
(32)

and for any \(|\sigma |\le 1\)

$$\begin{aligned} \langle G_1G_2^t\rangle =\frac{m_1m_2}{1-\sigma m_1m_2}+{\mathcal {O}}_\prec \left( \frac{1}{N\eta _1\eta _2}\right) ; \end{aligned}$$
(33)

these relations are proven in our companion paper [18] using Theorem 5 of the present paper. In the most interesting critical case \(z:=z_1={\bar{z}}_2\) with \(\eta =|\mathfrak {I}z|\ll 1\), the leading term in (32) is of order \(1/\eta \) with a large error \(1/N\eta ^2\), while the leading term in (30) is bounded (even zero in the isotropic case) with a negligible error term. The leading terms in (31) and (33) are of course the same, but the error term in (33) is much bigger since it ignores the asymptotic orthogonality mechanism.
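As a sanity check (our computation, using only the self-consistent equation \(m=(-z-m)^{-1}\) for \(m=m_{\mathrm {sc}}\)), at the critical point \(z_1={\bar{z}}_2=z\) the leading term of (32) reproduces the Ward identity:

```latex
\mathfrak{I}m=\mathfrak{I}\,\frac{1}{-z-m}=\frac{\eta+\mathfrak{I}m}{|z+m|^2}
=(\eta+\mathfrak{I}m)|m|^2
\quad\Longrightarrow\quad
\frac{|m|^2}{1-|m|^2}=\frac{\mathfrak{I}m}{\eta},
```

so \(m_1m_2/(1-m_1m_2)=|m|^2/(1-|m|^2)=\mathfrak {I}m/\eta \sim \rho /\eta \), in agreement with \(\langle G(z)G({\bar{z}})\rangle =\langle GG^*\rangle =\langle \mathfrak {I}G\rangle /\eta \approx \mathfrak {I}m/\eta \).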

Note that using the decomposition \(B = \langle B\rangle + \mathring{B}\), where \(\mathring{B}\) is the traceless part of \(B\), a combination of (30)–(33) trivially gives local laws for any product of the form \(GBG^{(t)}B'\) for arbitrary deterministic matrices \(B\) and \(B'\).
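Spelling out this elementary decomposition (our display; the grouping into the four terms below is not written out in the text): with \(B=\langle B\rangle +\mathring{B}\) and \(B'=\langle B'\rangle +\mathring{B}'\),

```latex
\langle G_1BG_2^{(t)}B'\rangle
=\langle B\rangle\langle B'\rangle\langle G_1G_2^{(t)}\rangle
+\langle B\rangle\langle G_1G_2^{(t)}\mathring{B}'\rangle
+\langle B'\rangle\langle G_2^{(t)}G_1\mathring{B}\rangle
+\langle G_1\mathring{B}G_2^{(t)}\mathring{B}'\rangle,
```

where the first term is covered by (32)–(33), the two middle terms by the first bound in (22) (after a cyclic permutation of the trace), and the last term by (30).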

Finally, we close this section by proving Lemma 1.

Proof

(Lemma 1) We only consider \(\langle \mathfrak {I}G_1 B \mathfrak {I}G_2 B\rangle \); the proof of the bounds for \(\langle \mathfrak {I}G_1 B \mathfrak {I}G_2^t B\rangle \) is completely analogous and so omitted.

We recall that the averaged local law for a single resolvent in (13) implies the rigidity of the eigenvalues (see e.g. [26, Theorem 7.6] or [30]):

$$\begin{aligned} |\lambda _i-\gamma _i|\prec \frac{1}{N^{2/3}{\widehat{i}}^{1/3}}, \end{aligned}$$
(34)

where \({\widehat{i}}:=i\wedge (N+1-i)\), and \(\gamma _i\) are the classical eigenvalue locations (quantiles) defined by

$$\begin{aligned} \int _{-\infty }^{\gamma _i} \rho (x)\, {\text {d}}\!{}x=\frac{i}{N}, \qquad i\in [N], \end{aligned}$$
(35)

where we recall \(\rho (x)=\rho _\mathrm {sc}(x)=(2\pi )^{-1}\sqrt{(4-x^2)_+}\).

For \(E_1,E_2\in [-2,2]\), we recall that \(J=N\eta (E_i,J)\rho (z(E_i,J))\), for \(i\in \{1,2\}\), by the definition of \(z(E_i,J)=E_i+\mathrm {i}\eta (E_i,J)\) above Lemma 1, and thus, together with (35), we conclude that there is a constant \(C\) such that for any \(a, a_0\) it holds that

$$\begin{aligned} |\gamma _a-\gamma _{a_0}|\le \eta ( \gamma _{a_0}) \Rightarrow |a-a_0|\le CJ, \quad |a-a_0|\le J \Rightarrow |\gamma _a-\gamma _{a_0}|\le C\eta ( \gamma _{a_0}). \end{aligned}$$
(36)

With a slight abuse of notation we will write this relation as

$$\begin{aligned} |\gamma _a-\gamma _{a_0}|\lesssim \eta ( \gamma _{a_0}) \Leftrightarrow |a-a_0|\lesssim J, \end{aligned}$$
(37)

since the implicit constants in the \(\lesssim \) relation will be irrelevant for our analysis. We use the short-hand notations \(\varXi =\varXi _J^B\), \(z_i=z(E_i,J)\), \(G_i=G(z_i)\), \(\eta _i=\eta (E_i,J)\), and \(\rho _i=\rho (z_i)\). Then, by (34), writing \(\langle \mathfrak {I}G_1 B \mathfrak {I}G_2 B\rangle \) in spectral decomposition,

$$\begin{aligned} \begin{aligned} \langle \mathfrak {I}G_1 B \mathfrak {I}G_2 B\rangle&=\sum _{ab} \frac{R_{ab}S_{ab}}{N}, \\ R_{ab}&:=\frac{\eta _1\eta _2}{|\lambda _a-z_1|^2|\lambda _b-z_2|^2}, \qquad S_{ab}:=|\langle \varvec{u}_a,B \varvec{u}_b\rangle |^2, \end{aligned} \end{aligned}$$
(38)

it follows that

$$\begin{aligned} \varXi ^2\lesssim \sup _{E_1, E_2\in [-2,2]} \frac{\langle \mathfrak {I}G_1 B \mathfrak {I}G_2 B\rangle }{\rho _1\rho _2}\lesssim \varXi ^2, \end{aligned}$$
(39)

with very high probability on the set where the rigidity bound (34) holds. The lower bound in (39) is trivial by choosing \(E_1=\gamma _{a_0}\) and \(E_2=\gamma _{b_0}\). To prove the upper bound in (39) we use the local averaging formula

$$\begin{aligned} \sum _{ab} R_{ab} S_{ab} \sim \sum _{a_0b_0} \frac{1}{(2J)^2} \sum _{\begin{array}{c} \left|a-a_0\right|< J, \\ \left|b-b_0\right|< J \end{array}} R_{ab} S_{ab} \sim \sum _{a_0b_0} R_{a_0b_0} \frac{1}{(2J)^2} \sum _{\begin{array}{c} \left|a-a_0\right|< J, \\ \left|b-b_0\right|< J \end{array}} S_{ab} \end{aligned}$$
(40)

for general non-negative \(R_{ab},S_{ab}\) such that \(R_{ab}\sim R_{a_0b_0}\) whenever \(\left|a-a_0\right|\vee \left|b-b_0\right|\le J\). This is applicable to \(R_{ab}\) in (38) as a consequence of the rigidity bound in (34), the relation in (37), and the fact that \(\xi >0\) can be chosen so that the factor \(N^\xi \) coming from the high-probability rigidity bound is much smaller than \(J\ge N^\epsilon \). Finally we note that \(N^{-2}\sum _{ab}R_{ab}=\langle \mathfrak {I}G_1\rangle \langle \mathfrak {I}G_2\rangle \sim \rho _1\rho _2\) by \(|\langle \mathfrak {I}G_i-\mathfrak {I}m_i\rangle |\prec \rho _i\) from (13), concluding the proof of the upper bound in (39). \(\square \)

4 Local Laws for One and Two Resolvents

In this section we prove the local laws in Propositions 1–2. By the self-consistent equation for \(m\) in (11), and by \(G(W-z)=I\), we have

$$\begin{aligned} G=m-mWG - m \langle G\rangle G + m\langle G-m\rangle G. \end{aligned}$$
(41)

For any given functions \(f,g\) of the Wigner matrix \(W\) we define the renormalisation of the product \(g(W)Wf(W)\) (denoted by underline) as follows:

$$\begin{aligned} \underline{g(W)Wf(W)}:=g(W)Wf(W)-{\widetilde{{{\,\mathrm{{\mathbf {E}}}\,}}}}g(W){\widetilde{W}}(\partial _{{\widetilde{W}}}f)(W)-{\widetilde{{{\,\mathrm{{\mathbf {E}}}\,}}}}(\partial _{{\widetilde{W}}}g)(W){\widetilde{W}}f(W), \end{aligned}$$
(42)

where \(\partial _{{{\widetilde{W}}}} f(W)\) denotes the directional derivative of the function \(f\) in the direction \({\widetilde{W}}\) at the point \(W\), and \({\widetilde{W}}\) is an independent copy of \(W\). The definition is chosen such that it subtracts the second order term in the cumulant expansion; in particular, if all entries of \(W\) were Gaussian, then we would have \({{\,\mathrm{{\mathbf {E}}}\,}}\underline{g(W)Wf(W)}=0\). Note that the definition (42) only makes sense if it is clear to which \(W\) the underline refers, i.e. it would be ambiguous if \(f(W)=W\). In our applications, however, each underlined term contains exactly a single \(W\) factor, and hence such ambiguities will not arise. As a special case we have that

$$\begin{aligned} \underline{WG}=WG+\langle G\rangle G + \sigma \frac{G^t G}{N} + \frac{\widetilde{w_2}}{N} {{\,\mathrm{diag}\,}}(G)G, \end{aligned}$$
(43)

recalling the notation \(\widetilde{w_2}=w_2-1-\sigma \) from Assumption 1. Then by (41) and (43), it follows that

$$\begin{aligned} G=m-m\underline{WG}+\frac{m\sigma }{N}G^t G+\frac{\widetilde{w_2}}{N}{{\,\mathrm{diag}\,}}(G)G+m\langle G-m\rangle G. \end{aligned}$$
(44)
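For the reader's convenience we sketch how (43) follows from the definition (42); this is our computation, assuming the moment normalisation \({{\,\mathrm{{\mathbf {E}}}\,}}|w_{ab}|^2=N^{-1}\), \({{\,\mathrm{{\mathbf {E}}}\,}}w_{ab}^2=\sigma N^{-1}\) for \(a\ne b\), and \({{\,\mathrm{{\mathbf {E}}}\,}}w_{aa}^2=w_2N^{-1}\) from Assumption 1. Using \(\partial _{{\widetilde{W}}}G=-G{\widetilde{W}}G\), the definition (42) with \(g=I\), \(f=G\) gives \(\underline{WG}=WG+{\widetilde{{{\,\mathrm{{\mathbf {E}}}\,}}}}\,{\widetilde{W}}G{\widetilde{W}}G\), and for any fixed matrix \(R\),

```latex
\bigl({\widetilde{{{\,\mathrm{{\mathbf {E}}}\,}}}}\,{\widetilde{W}}R{\widetilde{W}}\bigr)_{ab}
=\sum_{cd}R_{cd}\,{\widetilde{{{\,\mathrm{{\mathbf {E}}}\,}}}}\,{\widetilde{w}}_{ac}{\widetilde{w}}_{db}
=\sum_{cd}R_{cd}\Bigl[\frac{\delta_{ab}\delta_{cd}}{N}
+\sigma\frac{\delta_{ad}\delta_{cb}}{N}
+\frac{\widetilde{w_2}}{N}\,\delta_{ab}\delta_{cd}\delta_{ac}\Bigr]
=\Bigl(\langle R\rangle I+\frac{\sigma}{N}R^t
+\frac{\widetilde{w_2}}{N}{{\,\mathrm{diag}\,}}(R)\Bigr)_{ab};
```

applying this with \(R=G\) and multiplying by \(G\) from the right yields exactly the three correction terms in (43).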

From (44) one can already see that in order to get a local law for \(G\) it is essential to estimate the underlined term \(\underline{WG}\) in averaged and isotropic sense. In order to prove Proposition 2 we need bounds for underlined terms involving not only one \(G\) but also two \(G\)’s (see e.g. (56) below). We now state the bound for these terms and for longer products of resolvents and deterministic matrices both in an averaged and in an isotropic sense, since the proof for products with more than two resolvents is very similar to the cases we need.

For \(l\in {\mathbf {N}}\) we consider renormalised alternating products of resolvents \(G_1,\ldots ,G_l\) and deterministic matrices \(B_1,\ldots ,B_l\) in averaged and isotropic form,

$$\begin{aligned} \langle \underline{WG_1B_1G_2B_2\cdots G_l B_l}\rangle , \quad \langle \varvec{x},\underline{W G_1 B_1 G_2\cdots B_{l-1}G_l}\varvec{y}\rangle . \end{aligned}$$
(45)

Each resolvent \(G_k\) is evaluated at a (potentially) different spectral parameter \(z_k\in {\mathbf {C}}\setminus {\mathbf {R}}\), and besides \(G_k=G(z_k)\) we also allow each resolvent to be transposed and/or replaced by its imaginary part, i.e.

$$\begin{aligned} G_k\in \left\{ G(z_k),G(z_k)^t,\mathfrak {I}G(z_k),(\mathfrak {I}G(z_k))^t \right\} . \end{aligned}$$
(46)

Note that adjoints of resolvents can be included in the products by conjugating the spectral parameter since \(G(z)^*=G(\overline{z})\). For a given product of the form (45) we consider three sets \({\mathfrak {i}},{\mathfrak {a}},{\mathfrak {t}}\) of indices, recording special structural properties of \(G_k,B_k\). By definition, the set \({\mathfrak {i}}\subset [l]\) collects those indices \(k\in [l]\) for which \(G_k\in \left\{ \mathfrak {I}G(z_k),(\mathfrak {I}G(z_k))^t \right\} \). For the choice of the sets \({\mathfrak {a}},{\mathfrak {t}}\) we allow a certain freedom described in the theorem.

Theorem 5

Fix \(\epsilon >0\), let \(l\in {\mathbf {N}}\), \(z_1,\dots , z_l\in {\mathbf {C}}\setminus {\mathbf {R}}\) and for \(k\in [l]\) let \(G_k\) be as in (46) and \(B_k\) be deterministic \(N\times N\) matrices, and \(\varvec{x},\varvec{y}\) be deterministic vectors with bounded norms \(\left\Vert B_k\right\Vert \lesssim 1\), \(\left\Vert \varvec{x}\right\Vert +\left\Vert \varvec{y}\right\Vert \lesssim 1\). Set

$$\begin{aligned} L := N\min _{k} (\eta _k\rho _k),\quad \rho ^*:= \max _{k}\rho _k,\quad \eta _*:=\min _k \eta _k, \end{aligned}$$
(47)

with \(\eta _k:=\left|\mathfrak {I}z_k\right|\), \(\rho _k:=\rho (z_k)=\left|\mathfrak {I}m(z_k)\right|/\pi \) and assume \(L\ge N^\epsilon \) and \(\eta _*\lesssim 1\). Let \({\mathfrak {a}},{\mathfrak {t}}\) denote disjoint sets of indices, \({\mathfrak {a}}\cap {\mathfrak {t}}=\emptyset \), such that for each \(k\in {\mathfrak {a}}\) we have \(\langle B_k\rangle =0\), and for each \(k\in {\mathfrak {t}}\) exactly one of \(G_k,G_{k+1}\) is transposed, where in the averaged case and \(k=l\) it is understood that \(G_{l+1}=G_1\). Recall the notations \(\varPi _+:=\varPi _L+1\), \(\varLambda _{+}^{B}:=\varLambda _L^B+\left\Vert B\right\Vert \). Then with \(a:=\left|{\mathfrak {a}}\right|, t:=\left|{\mathfrak {t}}\right|\), we have the following bounds:

  1. (av1)

    For \({\mathfrak {a}}={\mathfrak {t}}=\emptyset \) we have

    $$\begin{aligned} \begin{aligned} \left|\langle \underline{WG_1B_1G_2B_2\cdots G_l B_l}\rangle \right|&\prec \frac{\rho ^{*}}{ N \eta _*^{l} }. \end{aligned} \end{aligned}$$
    (48)
  2. (av2)

    For \({\mathfrak {a}},{\mathfrak {t}}\subset [l]\), \(\left|{\mathfrak {a}}\cup {\mathfrak {t}}\right|\ge 1\) we have the bound

    $$\begin{aligned} \begin{aligned} \left|\langle \underline{WG_1B_1G_2B_2\cdots G_l B_l}\rangle \right|&\prec \frac{(\sqrt{N}\eta _*)^{a+t}}{N\eta _*^l} \sqrt{\frac{\rho ^{*}}{N\eta _*}} \varPi _+^t\prod _{k\in {\mathfrak {a}}}\varLambda _+^{B_k}. \end{aligned} \end{aligned}$$
    (49)
  3. (iso)

    For \({\mathfrak {a}},{\mathfrak {t}}\subset [l-1]\) and for any \(0\le j<l\) we have the bound

    $$\begin{aligned}&\left|\langle \varvec{x},\underline{G_1 B_1\cdots G_{j} B_j W G_{j+1}B_{j+1}\cdots B_{l-1}G_l }\varvec{y}\rangle \right| \nonumber \\&\quad \prec \frac{(\sqrt{N}\eta _*)^{a+t}}{\eta _*^{l-1}} \sqrt{\frac{\rho ^{*}}{N\eta _*}} \varPi _+^t\prod _{k\in {\mathfrak {a}}}\varLambda _+^{B_k}, \end{aligned}$$
    (50)

    where the \(j=0\) case is understood as \(\langle \varvec{x},\underline{W G_1 B_1\cdots B_{l-1}G_l}\varvec{y}\rangle \).

In case \(\prod _{k\in {\mathfrak {i}}} \rho _k\lesssim (\rho ^*)^{b+1}\), the bounds (48)–(50) remain valid if the rhs. are multiplied by the factor \((\rho ^*)^{-b-1}\prod _{k\in {\mathfrak {i}}} \rho _k\), where \(b:=l\) in case of (48), \(b:=l-a-t\) in case of (49), and \(b:=l-a-t-1\) in case of (50). Moreover, for any \(\eta _*\ge 1\) we have the bounds

$$\begin{aligned} \begin{aligned} \left|\langle \underline{WG_1B_1G_2B_2\cdots G_l B_l}\rangle \right|&\prec \frac{1}{N\eta _*^l},\\ \left|\langle \varvec{x},\underline{G_1 B_1\cdots G_{j} B_j W G_{j+1}B_{j+1}\cdots B_{l-1}G_l }\varvec{y}\rangle \right|&\prec \frac{1}{N^{1/2}\eta _*^l}. \end{aligned} \end{aligned}$$
(51)

Remark 4

(Asymptotic orthogonality effect) The main results of Theorem 5 are (49) and its isotropic counterpart (50). The essential part is the factor \((\sqrt{N}\eta _*)^{a+t}\) in (49), since the additional factors \(\varPi _+\) and \(\varLambda _+\) are a posteriori shown to be essentially \({{\mathcal {O}}}\left( 1 \right) \), cf. Theorem 4. Compared with the robust bound (48), in the relevant small-\(\eta _*\) regime the bound (49) represents an improvement of \(\sqrt{N} \eta _*\) for each instance where the asymptotic orthogonality can be exploited, either due to a traceless matrix \(B\) or due to a switch between a resolvent and its transpose. In addition, compared to the robust averaged bound (48), there is a further improvement of \(\sqrt{\rho ^*/N\eta _*}\) in (49) if at least one orthogonality effect is exploited, enabling the optimal \(GA\) local law in Theorem 3. We note that in the case \(\sqrt{N}\eta _*\gg 1\) the robust bounds (48) and (50) with \(a+t=0\) are always available, also in the presence of traceless deterministic matrices and alternating \(G,G^t\), simply by choosing the sets \({\mathfrak {a}},{\mathfrak {t}}\) to be empty.

Remark 5

(Alternative renormalisation) In (42) we defined the renormalisation with respect to an independent copy of \(W\), while in some previous papers [17] the same notation was used to denote the renormalisation with respect to a suitable reference ensemble (e.g. the GUE ensemble in the present paper, or the complex Ginibre ensemble in case of [17]). However, in most cases these two possible definitions differ only in sub-leading terms. For example, denoting the renormalisation with respect to an independent GUE matrix by

$$\begin{aligned} \underline{Wf(W)}_\mathrm {GUE}:=Wf(W)-\widetilde{{\,\mathrm{{\mathbf {E}}}\,}}_\mathrm {GUE} \widetilde{W} (\partial _{\widetilde{W}} f) \end{aligned}$$

we have trivially

$$\begin{aligned} \langle \underline{WG}\rangle -\langle \underline{WG}_\mathrm {GUE}\rangle = \sigma \frac{\langle G^t G\rangle }{N} + \widetilde{w}_2 \frac{\langle {{\,\mathrm{diag}\,}}(G)G\rangle }{N} = {{\mathcal {O}}_\prec }\left( \frac{\rho }{N\eta } \right) . \end{aligned}$$

The difference between the two renormalisations becomes relevant in Theorem 5 only whenever at least one transposed resolvent occurs since then for example

$$\begin{aligned} \langle \underline{WG^t}\rangle -\langle \underline{WG^t}_\mathrm {GUE}\rangle = \sigma \langle G\rangle ^2 + \widetilde{w}_2 \frac{\langle {{\,\mathrm{diag}\,}}(G)G\rangle }{N} \sim 1.\end{aligned}$$

However, this is the only relevant case, and the statement of Theorem 5 holds true verbatim if \(\underline{Wf(W)}\) is replaced by \(\underline{Wf(W)}_\mathrm {GUE}\) in case no resolvents are transposed, cf. Remark 7 in Sect. 5.

Using the bounds for the underlined terms in Theorem 5 we conclude the proof of Propositions 1–2. We start with the proof of the local law for \(\langle GA\rangle \), and then we prove the local laws and bounds for two \(G\)’s.

Proof

(Proposition 1) Using the equation for \(G\) in (44), we write the equation for \(GA\):

$$\begin{aligned} GA=mA-m\underline{WG}A+m\langle G-m\rangle GA+\frac{m\sigma }{N}G^t GA+\frac{\widetilde{w_2}}{N}{{\,\mathrm{diag}\,}}(G)GA, \end{aligned}$$
(52)

where we recall the definition of \(\underline{WG}\) in (42). Then, taking the average in (52), using that \(\langle A\rangle =0\) and that \(|\langle G-m\rangle |\prec (N\eta )^{-1}\) by the first bound in (13), we conclude

$$\begin{aligned} \left[ 1+{\mathcal {O}}_\prec \left( \frac{1}{N\eta }\right) \right] \langle GA\rangle =-m\langle \underline{WG}A\rangle +\frac{m\sigma }{N}\langle G^t GA\rangle +\frac{\widetilde{w_2}}{N}\langle {{\,\mathrm{diag}\,}}(G)GA\rangle . \end{aligned}$$

Next we notice that

$$\begin{aligned} \frac{1}{N}|\langle G^t GA\rangle |\le \frac{1}{N}\langle |G|\rangle ^{1/2}\langle GA|G^t|AG^*\rangle ^{1/2}\prec \frac{\varLambda _+\sqrt{\rho }}{N\sqrt{\eta }}, \end{aligned}$$
(53)

where in the last inequality we used Lemma 5, and the notation \(\varLambda _+:=\varLambda _L^A+\Vert A\Vert \). Additionally, we also have that

$$\begin{aligned} \frac{1}{N}|\langle {{\,\mathrm{diag}\,}}(G)GA\rangle |=\left| \frac{1}{N^2}\sum _i G_{ii} (GA)_{ii}\right| \prec \frac{1}{N}, \end{aligned}$$
(54)

where we used that \(|G_{ii}|+|(GA)_{ii}|\prec 1\) by (12). Combining (53)–(54) we finally conclude that

$$\begin{aligned} \langle GA\rangle =-m\langle \underline{WG}A\rangle +{\mathcal {O}}_\prec \left( \frac{\varLambda _+\sqrt{\rho }}{N\sqrt{\eta }}\right) ={\mathcal {O}}_\prec \left( \frac{\varLambda _+\sqrt{\rho }}{N\sqrt{\eta }}\right) , \end{aligned}$$
(55)

where we used \(|\langle \underline{WG}A\rangle |\prec \varLambda _+\rho ^{1/2} N^{-1}\eta ^{-1/2}\) by (49). This concludes the proof of (21). \(\square \)

Next, using the single-resolvent local law proven above, we proceed with the proof of the bounds for products of two resolvents and deterministic traceless matrices.

Proof

(Proposition 2) We start by writing the equation for a generic product of two resolvents, \(G_1B_1G_2B_2\), where \(G_i=(W-z_i)^{-1}\) and \(B_1,B_2\) are deterministic matrices. Using the equation (52) for \(G_1B_1\), and writing \(G_2=m_2+(G_2-m_2)\), we obtain

$$\begin{aligned} G_1B_1G_2B_2&=m_1m_2 B_1B_2+m_1B_1(G_2-m_2)B_2-m_1\underline{WG_1B_1G_2}B_2 \nonumber \\&\quad +m_1\langle G_1-m_1\rangle G_1B_1G_2B_2 + m_1\langle G_1B_1G_2\rangle G_2B_2 \nonumber \\&\quad +\frac{m_1\sigma }{N}G_1^t G_1B_1G_2B_2+\frac{m_1\sigma }{N} (G_1B_1G_2)^t G_2B_2 \nonumber \\&\quad + \frac{m_1\widetilde{w_2}}{N}{{\,\mathrm{diag}\,}}(G_1)G_1B_1G_2B_2+\frac{m_1 \widetilde{w_2}}{N}{{\,\mathrm{diag}\,}}(G_1B_1G_2)G_2B_2, \end{aligned}$$
(56)

where we used that

$$\begin{aligned} \begin{aligned} \underline{WG_1B_1G_2}&=\underline{WG_1}B_1G_2+\langle G_1B_1G_2\rangle G_2+\frac{\sigma }{N}(G_1B_1G_2)^t G_2 \\&\quad +\frac{\widetilde{w_2}}{N}{{\,\mathrm{diag}\,}}(G_1B_1G_2)G_2, \end{aligned} \end{aligned}$$
(57)

with \(\underline{WG_1}\) from (43). The identity in (57) follows by the definition of underline in (42).

Remark 6

For notational simplicity, throughout the proof of Proposition 2 we use the notations

$$\begin{aligned} \varLambda _+:= \varLambda _+^A\vee \varLambda _+^{A'}, \end{aligned}$$

rather than distinguishing the different \(\varLambda \)’s. However, the proof naturally yields a factor of \(\varLambda _+^A\) for each traceless \(A\), giving the bounds in Proposition 2.

Proof

(Eq. (22)) We focus only on the proof of the bound for \(\langle G_1G_2A\rangle \), the bounds for \(\langle G_1^t G_2A\rangle \), \(\langle \mathfrak {I}G_1 A G_2\rangle \), \(\langle \mathfrak {I}G_1^t A G_2\rangle \), \(\langle \mathfrak {I}G_1 A \mathfrak {I}G_2\rangle \), and \(\langle \mathfrak {I}G_1^t A \mathfrak {I}G_2\rangle \) are completely analogous and so omitted, modulo the bound for the underlined term. In particular, the bound in (62) has to be replaced by

$$\begin{aligned} |\langle \underline{WG_2\mathfrak {I}G_1}A\rangle |\prec \frac{\rho _1\varLambda _+}{\sqrt{NK}\eta _*},\quad |\langle \underline{W\mathfrak {I}G_2 \mathfrak {I}G_1}A\rangle |\prec \frac{\rho _1\rho _2\varLambda _+}{\sqrt{NK}\eta _*}, \end{aligned}$$

for \(\langle \mathfrak {I}G_1 A G_2\rangle \), \(\langle \mathfrak {I}G_1^t A G_2\rangle \) and \(\langle \mathfrak {I}G_1 A \mathfrak {I}G_2\rangle \), \(\langle \mathfrak {I}G_1^t A \mathfrak {I}G_2\rangle \), respectively, where \(K:=N\eta _*\rho ^*\).

Choosing \(B_1=I\) and \(B_2=A\) in (56), with \(\langle A\rangle =0\), using \(|\langle G_1-m_1\rangle |\prec (N\eta _1)^{-1}\) we find that

$$\begin{aligned} \left[ 1+{\mathcal {O}}_\prec \left( \frac{1}{N\eta _2}\right) \right] \langle G_1G_2A\rangle&=-\,m_1\langle \underline{WG_1G_2}A\rangle +m_1\langle G_2A\rangle \nonumber \\&\quad +m_1\langle G_1G_2\rangle \langle G_2A\rangle +\frac{m_1\sigma }{N}\langle G_1^t G_1G_2A\rangle \nonumber \\&\quad +\frac{m_1\sigma }{N}\langle (G_1G_2)^t G_2A\rangle +\frac{m_1 \widetilde{w_2}}{N}\langle {{\,\mathrm{diag}\,}}(G_1)G_1G_2A\rangle \nonumber \\&\quad +\frac{m_1 \widetilde{w_2}}{N}\langle {{\,\mathrm{diag}\,}}(G_1G_2)G_2A\rangle . \end{aligned}$$
(58)

Then using Cauchy-Schwarz we have that

$$\begin{aligned} \begin{aligned} \frac{1}{N}|\langle G_1G_2AG_1^t\rangle |&\le \frac{1}{N}\langle G_1G_1^*\rangle ^{1/2}\langle G_2AG_1^t(G_1^t)^*AG_2^*\rangle ^{1/2} \\&\prec \frac{\sqrt{\rho _1}}{N\eta _1\sqrt{\eta _2}}\langle \mathfrak {I}G_2 A \mathfrak {I}G_1^t A\rangle ^{1/2} \\&\prec \frac{\rho _1\sqrt{\rho _2}\varLambda _+}{N\eta _1\sqrt{\eta _2}}\le \frac{\rho ^*\sqrt{\rho ^*}\varLambda _+}{\sqrt{NK\eta _1\eta _2}}, \end{aligned} \end{aligned}$$
(59)

where we used the Ward identity, that \(\langle \mathfrak {I}G_1\rangle \prec \rho _1\), and that \(K=N\eta _*\rho ^*\). In the penultimate inequality of (59) we also used Lemma 5 to prove that \(\langle \mathfrak {I}G_2 A \mathfrak {I}G_1^t A\rangle \prec \rho _1\rho _2\varLambda _+^2\). Using exactly the same computations we conclude the same bound for \(\langle (G_1G_2)^t G_2A\rangle \) as well. Now we show that the terms with a pre-factor \(\widetilde{w_2}\) are negligible. We start with

$$\begin{aligned} \frac{1}{N}|\langle {{\,\mathrm{diag}\,}}(G_1)G_1G_2A\rangle |=\left| \frac{1}{N^2}\sum _i G_{ii} (G_1G_2A)_{ii}\right| \prec \frac{\sqrt{\rho _1\rho _2}}{N\sqrt{\eta _1\eta _2}}, \end{aligned}$$
(60)

obtained using that \(|G_{ii}|\prec 1\), by the isotropic law (12), and \(|(G_1G_2A)_{ii}|\prec \sqrt{\rho _1\rho _2/\eta _1\eta _2}\) by a simple Schwarz inequality. The bound for \(|\langle {{\,\mathrm{diag}\,}}(G_1G_2)G_2A\rangle |\) is analogous and so omitted. Combining (58) with (59)–(60), using the bound \(|\langle G_2A\rangle |\prec \varLambda _+\sqrt{\rho _2}N^{-1}\eta _2^{-1/2}\) by (21), and dividing by the factor on the lhs. of (58), we conclude that

$$\begin{aligned} \begin{aligned} \langle G_1G_2A\rangle&=-m_1\langle \underline{WG_1G_2}A\rangle +m_1\langle G_1G_2\rangle \langle G_2A\rangle +{\mathcal {O}}_\prec \left( \frac{\rho ^*\varLambda _+}{\sqrt{NK}\eta _*}\right) \\&={\mathcal {O}}_\prec \left( \frac{\rho ^*\varLambda _+}{\sqrt{NK}\eta _*}\right) , \end{aligned} \end{aligned}$$
(61)

where to go from the first to the second line we used that \(|\langle G_2A\rangle |\prec \varLambda _+\rho _2^{1/2}N^{-1}\eta _2^{-1/2}\) by (55), that \(|\langle G_1G_2\rangle |\prec \sqrt{\rho _1\rho _2/(\eta _1\eta _2)}\) by a Schwarz inequality, and that

$$\begin{aligned} \left|\langle \underline{WG_1G_2}A\rangle \right|\prec \frac{\rho ^*\varLambda _+}{\sqrt{NK}\eta _*}, \end{aligned}$$
(62)

by (49). This concludes the proof of the bound of \(|\langle G_1G_2A\rangle |\). \(\square \)

Proof

(Eq. (25) for \(\langle {\varvec{x}}, G_1AG_2{\varvec{y}}\rangle \)) Using the bound for \(\langle G_1G_2A\rangle \) and the estimates in Lemma 5 below as an input, the proof of the bound

$$\begin{aligned} |\langle {\varvec{x}}, G_1AG_2{\varvec{y}}\rangle |\prec \varLambda _+\sqrt{\frac{\rho ^*}{\eta _*}}, \end{aligned}$$
(63)

follows by exactly the same computations as in the proof of the bound for \(|\langle G_1G_2A\rangle |\). \(\square \)

Proof

(Local laws for two resolvents: Eqs. (23) and (26)) We focus only on the proof of the local law for \(\langle G_1AG_2A'\rangle \); the proof of the local law for \(\langle G_1^t AG_2A'\rangle \) is exactly the same. The proof of the local law for \(\langle G_1G_2^t\rangle \) is also analogous to that for \(\langle G_1AG_2A'\rangle \), with the only difference that the multiplicative factor on the lhs. of (64) has to be replaced by

$$\begin{aligned} 1-\sigma m_1m_2+{\mathcal {O}}_\prec \left( \frac{1}{N\eta _*}\right) . \end{aligned}$$

This difference does not create any change since for \(|\sigma |<1\) the stability factor \(1-\sigma m_1m_2\) is bounded from below by \(1-|\sigma |\).

Choosing \(B_1=A\) and \(B_2=A'\) in (56), with \(\langle A\rangle =\langle A'\rangle =0\), and using that \(|\langle G_1-m_1\rangle |\prec (N\eta _1)^{-1}\), we find that

$$\begin{aligned}&\Bigl (1+{{\mathcal {O}}_\prec }\left( \frac{1}{N\eta _2} \right) \Bigr )\langle G_1AG_2A'\rangle \nonumber \\&\quad =m_1m_2\langle AA'\rangle -m_1\langle \underline{WG_1AG_2}A'\rangle +m_1\langle G_1AG_2\rangle \langle G_2A'\rangle \nonumber \\&\qquad +\frac{m_1\sigma }{N}\langle G_1^t G_1AG_2A'\rangle +\frac{m_1\sigma }{N}\langle (G_1AG_2)^t G_2A'\rangle \nonumber \\&\qquad +\frac{m_1\widetilde{w_2}}{N}\big [\langle {{\,\mathrm{diag}\,}}(G_1)G_1AG_2A'\rangle +\langle {{\,\mathrm{diag}\,}}(G_1AG_2)G_2A'\rangle \big ]. \end{aligned}$$
(64)

We start with the bound

$$\begin{aligned} \frac{1}{N}|\langle G_1^t G_1AG_2A'\rangle |\le & {} \frac{1}{N}\langle G_1A|G_2|AG_1^*\rangle ^{1/2}\langle (G_1^t)^*A'|G_2|A'(G_1)^t\rangle ^{1/2} \nonumber \\= & {} \frac{1}{N\eta _1} \langle \mathfrak {I}G_1A|G_2|A\rangle ^{1/2}\langle \mathfrak {I}G_1^t A'|G_2|A'\rangle ^{1/2}\prec \frac{\rho _1\varLambda _+^2}{N\eta _1}, \end{aligned}$$
(65)

where we used a Schwarz inequality and that \(|\langle \mathfrak {I}G_1A|G_2|A\rangle |\prec \rho _1\varLambda _+^2\) by Lemma 5 below. Following exactly the same computations we conclude that \(|\langle (G_1AG_2)^t G_2A'\rangle |\prec \varLambda _+^2\rho _2\eta _2^{-1}\). Similarly, we also bound

$$\begin{aligned} \frac{1}{N}|\langle {{\,\mathrm{diag}\,}}(G_1)G_1AG_2A'\rangle |=\left| \frac{1}{N^2}\sum _i(G_1)_{ii}(G_1AG_2A')_{ii}\right| \prec \frac{\sqrt{\rho _1\rho _2}}{N\sqrt{\eta _1\eta _2}}, \end{aligned}$$
(66)

where we used that \(|(G_1)_{ii}|\prec 1\) and that \(|(G_1AG_2A')_{ii}|\prec \sqrt{\rho _1\rho _2/(\eta _1\eta _2)}\) by a Schwarz inequality. The bound for \(|\langle {{\,\mathrm{diag}\,}}(G_1AG_2)G_2A'\rangle |\) is completely analogous and so omitted.
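The Schwarz step here relies on the elementary entrywise inequality \(|(XY)_{ii}|\le (XX^*)_{ii}^{1/2}(Y^*Y)_{ii}^{1/2}\) with \(X=G_1A\), \(Y=G_2A'\). As a numerical sanity check (purely illustrative, not part of the proof; the matrix size, spectral parameters and observable below are arbitrary choices), this inequality can be tested on resolvents of a sampled Wigner matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 60  # arbitrary matrix size

# Sample a complex Hermitian Wigner-type matrix with entry variance 1/N.
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W = (H + H.conj().T) / np.sqrt(4 * N)

# Resolvents G_k = (W - z_k)^{-1} at two arbitrary spectral parameters.
z1, z2 = 0.3 + 0.05j, -0.2 + 0.02j
I = np.eye(N)
G1 = np.linalg.inv(W - z1 * I)
G2 = np.linalg.inv(W - z2 * I)

# A traceless observable with operator norm <= 1 (here A' = A for brevity).
A = rng.standard_normal((N, N))
A -= (np.trace(A) / N) * I
A /= np.linalg.norm(A, 2)

# Entrywise Schwarz: |(XY)_{ii}| <= (XX^*)_{ii}^{1/2} (Y^*Y)_{ii}^{1/2}.
X, Y = G1 @ A, G2 @ A
lhs = np.abs(np.diag(X @ Y))
rhs = np.sqrt(np.diag(X @ X.conj().T).real * np.diag(Y.conj().T @ Y).real)
assert np.all(lhs <= rhs + 1e-12)
```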

Combining (64) with (65)–(66), and using that

$$\begin{aligned} |\langle G_2A'\rangle |\prec \frac{ \sqrt{\rho _2}\varLambda _+}{N\sqrt{\eta _2}}, \qquad |\langle G_2G_1A\rangle |\prec \frac{ \rho ^*\varLambda _+}{\sqrt{NK}\eta _*}, \end{aligned}$$

by (21) and (22), respectively, we conclude that

$$\begin{aligned} \langle G_1AG_2A'\rangle= & {} m_1m_2\langle AA'\rangle -m_1\langle \underline{WG_1AG_2}A'\rangle +{\mathcal {O}}_\prec \left( \frac{\rho ^*\varLambda _+^2}{K^2}\right) \nonumber \\= & {} m_1m_2\langle AA'\rangle +{\mathcal {O}}_\prec \left( \frac{\rho ^*\varLambda _+^2}{\sqrt{K}}\right) . \end{aligned}$$
(67)

To go from the first to the second line of (67) we used that \(|\langle \underline{WG_1AG_2}A'\rangle |\prec \varLambda _+^2\rho ^*K^{-1/2}\) by (49). This concludes the proof of the local law for \(\langle G_1AG_2A'\rangle \). \(\square \)

In order to conclude the proof of Proposition 2 we are only left with the averaged local laws for \(\mathfrak {I}G_1 A \mathfrak {I}G_2 A'\) and \(\mathfrak {I}G_1A \mathfrak {I}G_2^t A'\) in (24) and for \(\mathfrak {I}G_1^t\mathfrak {I}G_2\) in (27).

Proof

(Eq. (24)) We present only the proof of the local law for \(\langle \mathfrak {I}G_1 A \mathfrak {I}G_2 A'\rangle \); the proof for \(\langle \mathfrak {I}G_1 A \mathfrak {I}G_2^t A'\rangle \) is identical and so omitted. We start with the formula analogous to (56) but with \(\mathfrak {I}G\)’s instead of \(G\)’s, generating altogether twelve terms with a \(1/N\) pre-factor. Ten of them can be estimated by \(\varLambda _+^2\rho _1\rho _2L^{-1}\) exactly as in (65)–(66) by writing out \(2\mathrm {i}\mathfrak {I}G_i=G_i-G_i^*\). Note that whenever the analogue of (65) is used, but with \(G_1^{(t)}G_2\) instead of \(G_1^{(t)}G_1\), we gain the necessary factor \(\sqrt{\rho _1\rho _2/(\eta _1\eta _2)}\) instead of only \(\rho _1/\eta _1\) in the first Schwarz inequality in (65). Keeping the two special \(1/N\) terms, this gives the expansion

$$\begin{aligned}&\langle \mathfrak {I}G_1 A\mathfrak {I}G_2 A'\rangle +{{\mathcal {O}}_\prec }\left( \frac{\varLambda _+^2\rho _1\rho _2}{L} \right) \nonumber \\&\quad =\mathfrak {I}m_ 1\mathfrak {I}m_2 \langle AA'\rangle +\mathfrak {I}m_1 \langle \mathfrak {I}(G_2-m_2) A'A\rangle +\overline{m_1}\overline{\langle G_1-m_1\rangle }\langle \mathfrak {I}G_1 A\mathfrak {I}G_2 A'\rangle \nonumber \\&\qquad +\mathfrak {I}[m_1\langle G_1-m_1\rangle ]\langle G_1 A\mathfrak {I}G_2 A'\rangle -\mathfrak {I}m_1 \langle \underline{WG_1A\mathfrak {I}G_2} A'\rangle \nonumber \\&\qquad -\overline{m_1}\langle \underline{W\mathfrak {I}G_1A\mathfrak {I}G_2} A'\rangle +\mathfrak {I}m_1\langle G_1AG_2\rangle \langle \mathfrak {I}G_2A'\rangle \nonumber \\&\qquad +\mathfrak {I}m_1\langle G_1A\mathfrak {I}G_2\rangle \langle G_2^*A'\rangle +\overline{m_1}\langle \mathfrak {I}G_1 AG_2\rangle \langle \mathfrak {I}G_2A'\rangle \nonumber \\&\qquad +\overline{m_1}\langle \mathfrak {I}G_1A\mathfrak {I}G_2\rangle \langle G_2^*A'\rangle +\frac{\sigma \mathfrak {I}m_1}{N}\langle G_1A\mathfrak {I}G_2A' G_1^t\rangle \nonumber \\&\qquad +\frac{\sigma \overline{m_1}}{N}\langle \mathfrak {I}[G_1^t G_1]A\mathfrak {I}G_2 A'\rangle . \end{aligned}$$
(68)

The two remaining \(1/N\) terms, where \(\mathfrak {I}G_2\) is separated from \(G_1\) by \(A\)’s, are estimated as follows:

$$\begin{aligned} |\langle G_1A\mathfrak {I}G_2A' G_1^t\rangle |\le \langle G_1A\mathfrak {I}G_2A'G_1^*\rangle ^{1/2}\langle (G_1^t)^*A\mathfrak {I}G_2A'G_1^t\rangle ^{1/2} \prec \frac{N\rho _1\rho _2\varLambda _+^2}{L}, \end{aligned}$$
(69)

where in the last inequality we used the Ward identity and Lemma 5 below. Inserting (69), the local law \(|\langle G_2-m_2\rangle |\prec (N\eta _2)^{-1}\) and (21)–(22) into (68) we conclude that

$$\begin{aligned} \begin{aligned} \langle \mathfrak {I}G_1 A\mathfrak {I}G_2 A'\rangle&=\mathfrak {I}m_1 \mathfrak {I}m_2 \langle AA'\rangle -\mathfrak {I}m_1 \langle \underline{WG_1A\mathfrak {I}G_2} A'\rangle \\&\quad -\overline{m_1}\langle \underline{W\mathfrak {I}G_1 A\mathfrak {I}G_2} A'\rangle +{\mathcal {O}}_\prec \left( \frac{\varLambda _+^2\rho _1\rho _2}{L}\right) . \end{aligned} \end{aligned}$$
(70)

Finally, combining (70) with the bound for \(\langle \underline{WG_1A\mathfrak {I}G_2} A'\rangle \), \(\langle \underline{W\mathfrak {I}G_1 A\mathfrak {I}G_2} A'\rangle \) in (49), we conclude that

$$\begin{aligned} \langle \mathfrak {I}G_1 A\mathfrak {I}G_2 A'\rangle =\mathfrak {I}m_1 \mathfrak {I}m_2 \langle AA'\rangle +{\mathcal {O}}_\prec \left( \frac{\varLambda _+^2\rho _1\rho _2}{\sqrt{L}}\right) . \end{aligned}$$
(71)

\(\square \)

Proof

(Local law for \(\langle \mathfrak {I}G_1\mathfrak {I}G_2^t\rangle \): Eq. (27)) We closely follow the proof of \(\langle \mathfrak {I}G_1A\mathfrak {I}G_2 A'\rangle \), hence we only explain the differences. As each traceless \(A\), \(A'\) between two resolvents gave rise to a factor \(\varLambda _+\) in the proof of \(\langle \mathfrak {I}G_1A\mathfrak {I}G_2 A'\rangle \), here the fact that a resolvent is followed by its transpose gives rise to a factor \(\varPi _+\). Keeping this modification in mind, in the basic equation for \(\langle \mathfrak {I}G_1\mathfrak {I}G_2^t\rangle \) we can again estimate all the \(1/N\) terms as in (65)–(66) and (69) by \((1+\varPi ^2)\rho _1\rho _2L^{-1}\). Then, using the local law \(|\langle G_i-m_i\rangle |\prec (N\eta _i)^{-1}\), similarly to (68), we conclude that

$$\begin{aligned} \langle \mathfrak {I}G_1 \mathfrak {I}G_2^t\rangle= & {} \mathfrak {I}m_1 \mathfrak {I}m_2-\mathfrak {I}m _1 \langle \underline{WG_1\mathfrak {I}G_2^t}\rangle -\overline{m_1}\langle \underline{W\mathfrak {I}G_1\mathfrak {I}G_2^t}\rangle \nonumber \\&+\sigma \mathfrak {I}m_1 \langle G_1G_2^t\rangle \langle \mathfrak {I}G_2\rangle +\sigma \mathfrak {I}m_1\langle G_1\mathfrak {I}G_2^t\rangle \langle G_2^*\rangle \nonumber \\&+\sigma \overline{m_1}\langle \mathfrak {I}G_1 G_2^t\rangle \langle \mathfrak {I}G_2\rangle \nonumber \\&+\sigma \overline{m_1}\langle \mathfrak {I}G_1\mathfrak {I}G_2^t\rangle \langle G_2^*\rangle +{\mathcal {O}}_\prec \left( \frac{\varPi _+^2\rho _1\rho _2}{L}\right) , \end{aligned}$$
(72)

where we used \(\varPi _+:=1+\varPi \). Note that several “large” terms remained in (72) in contrast to (70) since the analogues of \(\langle \mathfrak {I}G_2A\rangle \) and \(\langle G_2^*A\rangle \) in (68) are now not small. Then using the bounds in (49) for the underlined terms in (72), and the local laws

$$\begin{aligned} \langle G_1G_2^t\rangle= & {} \frac{m_1m_2}{1-\sigma m_1m_2}+{\mathcal {O}}_\prec \left( \frac{\rho ^*\varPi _+^2}{\sqrt{K}}\right) ,\nonumber \\ \langle G_i\mathfrak {I}G_j^t\rangle= & {} \frac{m_i\mathfrak {I}m_j}{(1-\sigma m_i m_j)(1-\sigma m_i\overline{m_j})}+{\mathcal {O}}_\prec \left( \frac{\rho _j\varPi _+^2}{\sqrt{K}}\right) , \end{aligned}$$
(73)

we conclude

$$\begin{aligned} \langle \mathfrak {I}G_1 \mathfrak {I}G_2^t\rangle= & {} \frac{\mathfrak {I}m_1\mathfrak {I}m_2(1-\sigma ^2|m_1m_2|^2)}{|1-\sigma \overline{m_1}m_2|^2(1-\sigma m_1m_2)}+\sigma \overline{m_1}\langle \mathfrak {I}G_1\mathfrak {I}G_2^t\rangle \langle G_2^*\rangle \nonumber \\&+{\mathcal {O}}_\prec \left( \frac{\varPi _+^2\rho _1\rho _2}{\sqrt{L}}\right) . \end{aligned}$$
(74)

We remark that the second local law in (73) follows analogously to (26). Finally, writing \(\langle G_2^*\rangle \) in the last term in the rhs. of (74) as \(\langle G_2^*\rangle =\overline{m_2}+\langle (G_2-m_2)^*\rangle \), we conclude (27). \(\square \)

This concludes the proof of Proposition 2. \(\square \)

5 Feynman Diagrams: Proof of Theorem 5

For the sake of simpler notations we abbreviate

$$\begin{aligned} \eta :=\eta _*=\min _k\eta _k,\quad \rho :=\rho ^*=\max _k\rho _k, \quad K:= N\eta _*\rho ^*\ge L = N \min _k(\eta _k\rho _k) \end{aligned}$$
(75)

and within this section write \(\rho ^i\) and \(\varLambda _+^{a}\) with \(i:=\left|{\mathfrak {i}}\right|,a:=\left|{\mathfrak {a}}\right|\) and \(\varLambda _+:=\max _{k\in {\mathfrak {a}}}\varLambda _+^{B_k}\) rather than carrying the products \(\prod _{k\in {\mathfrak {i}}}\rho _k\) and \(\prod _{k\in {\mathfrak {a}}}\varLambda _+^{B_k}\). Within the formal proof of Theorem 5 we argue, however, that the proof naturally yields the latter. In order to present the main body of the proof of Theorem 5 more concisely we make four simplifying assumptions, the removal of which is a routine modification.

5.1 Graphical representation of the cumulant expansion

Using multiple cumulant expansions we expand the high moments

$$\begin{aligned} {{\,\mathrm{{\mathbf {E}}}\,}}\left|\langle \underline{WG_1B_1G_2B_2\cdots G_l B_l}\rangle \right|^{2p} \end{aligned}$$

and

$$\begin{aligned} \left|\langle \varvec{x},\underline{G_1 B_1\cdots G_{j} B_j W G_{j+1}B_{j+1}\cdots B_{l-1}G_l }\varvec{y}\rangle \right|^{2p} \end{aligned}$$

as a polynomial of resolvent entries for any \(p\in {\mathbf {N}}\). More precisely, we iteratively use the expansion

$$\begin{aligned} \begin{aligned} {{\,\mathrm{{\mathbf {E}}}\,}}w_{ab} f(W)&= \sum _{k=1}^R \sum _{\varvec{\alpha }\in \left\{ ab,ba \right\} ^k}\frac{\kappa (ab,\varvec{\alpha })}{k!} {{\,\mathrm{{\mathbf {E}}}\,}}\partial _{\varvec{\alpha }} f(W) + \varOmega _R \end{aligned} \end{aligned}$$
(76)

with some explicit error term \(\varOmega _R\) (see [28, Proposition 3.2 and Appendix C] applied with \({\mathcal {N}}(ab)=\left\{ ab,ba \right\} \), or the previous works with slightly different error terms [14] and [34, Lemmata 3.1, 7.1]), which for our application can easily be seen to be \({{\mathcal {O}}}\left( N^{-2p} \right) \) if \(R=12p\). Here for a \(k\)-tuple of double indices \(\varvec{\alpha }=(\alpha _1,\ldots ,\alpha _k)\) we use the short-hand notation \(\kappa (ab,(\alpha _1,\ldots ,\alpha _k)):=\kappa (w_{ab},w_{\alpha _1},\ldots ,w_{\alpha _k})\) for the joint cumulant of the random variables \(w_{ab},w_{\alpha _1},\ldots ,w_{\alpha _k}\) and set \(\partial _{\varvec{\alpha }}:=\partial _{w_{\alpha _1}}\cdots \partial _{w_{\alpha _k}}\), \(\partial _{ab}:=\partial _{w_{ab}}\). We wish to express the cumulant factors in (76) as a matrix with \(a,b\) matrix elements. To encode the fact that cumulants have slightly different combinatorics for \(a=b\) and \(a\ne b\), we rewrite (76) as

$$\begin{aligned} {{\,\mathrm{{\mathbf {E}}}\,}}w_{ab} f(W)&= \sum _{k=1}^R \biggl (\varvec{1}(a=b)\frac{\kappa (\left\{ aa \right\} ^{k+1})}{k!} {{\,\mathrm{{\mathbf {E}}}\,}}\partial _{aa}^k f(W) \nonumber \\&\quad + \varvec{1}(a\ne b) \sum _{q+q'=k}\left( {\begin{array}{c}k\\ q\end{array}}\right) \frac{\kappa (\left\{ ab \right\} ^{q+1},\left\{ ba \right\} ^{q'})}{k!}{{\,\mathrm{{\mathbf {E}}}\,}}\partial _{ab}^q \partial _{ba}^{q'} f(W)\biggr )+ \varOmega _R, \end{aligned}$$
(77)

where we used that cumulants are invariant under reordering their entries, and thus \(\kappa (ab,\varvec{\alpha })\) can be expressed as the cumulant of \(q+1\) copies \(\left\{ ab \right\} ^{q+1}\) of \(ab\) and \(q'\) copies \(\left\{ ba \right\} ^{q'}\) of \(ba\). In order to simplify notations we introduce the matrices \(\kappa ^{q+1,q'}\) for integers \(q,q'\ge 0\) with \(q+q'\ge 1\) with matrix elements

$$\begin{aligned} \kappa ^{1,1}_{ab}&:= 1,\qquad \kappa ^{2,0}_{ab} := \sigma , \nonumber \\ \frac{\kappa ^{q+1,q'}_{ab}}{N^{(k+1)/2}}&:= \varvec{1}(a=b) \frac{\kappa (\left\{ aa \right\} ^{k+1})}{(k+1)k!} + \varvec{1}(a\ne b) \left( {\begin{array}{c}k\\ q\end{array}}\right) \frac{\kappa (\left\{ ab \right\} ^{q+1},\left\{ ba \right\} ^{q'})}{k!}, \end{aligned}$$
(78)

with \(k=q+q'\ge 2\), so that (76) can be rewritten as

$$\begin{aligned} \begin{aligned} {{\,\mathrm{{\mathbf {E}}}\,}}w_{ab} f(W)&= \sum _{k=1}^R \sum _{q+q'=k} \frac{\kappa ^{q+1,q'}_{ab}}{N^{(k+1)/2}} {{\,\mathrm{{\mathbf {E}}}\,}}\partial _{ab}^q \partial _{ba}^{q'} f(W) + \varOmega _R\\&= {{\,\mathrm{{\mathbf {E}}}\,}}\frac{\partial _{ba} f(W)+ \sigma \partial _{ab} f(W)}{N}\\&\quad +\sum _{k=2}^R\sum _{q+q'=k} \frac{\kappa ^{q+1,q'}_{ab}}{N^{(k+1)/2}} {{\,\mathrm{{\mathbf {E}}}\,}}\partial _{ab}^q \partial _{ba}^{q'} f(W) + \varOmega _R, \end{aligned} \end{aligned}$$
(79)

where we used that due to (A-i) we have that \(\kappa (\left\{ \sqrt{N}w_{aa} \right\} ^2)=w_2 =1+\sigma =\kappa ^{1,1}_{aa}+\kappa ^{2,0}_{aa}\).
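For standard Gaussian entries all cumulants of order \(k\ge 3\) vanish, so (76) truncates after \(k=1\) and, in the scalar case, reduces to Stein's identity \({{\,\mathrm{{\mathbf {E}}}\,}}[w\, g(w)]=\kappa _2\,{{\,\mathrm{{\mathbf {E}}}\,}}[g'(w)]\). The following quadrature check of this scalar special case is purely illustrative (the test function \(g(w)=w^3\) is an arbitrary choice, not taken from the paper):

```python
import numpy as np

# Probabilists' Gauss-Hermite quadrature: exact for polynomial integrands
# against the weight exp(-w^2/2); normalizing by sqrt(2*pi) yields
# expectations under the standard Gaussian.
nodes, weights = np.polynomial.hermite_e.hermegauss(20)
weights = weights / np.sqrt(2 * np.pi)

def E(f):
    """Expectation of f(w) for w ~ N(0, 1), exact for low-degree polynomials."""
    return float(np.sum(weights * f(nodes)))

g = lambda w: w**3       # arbitrary polynomial test function
gp = lambda w: 3 * w**2  # its derivative

lhs = E(lambda w: w * g(w))  # E[w g(w)] = E[w^4] = 3
rhs = E(gp)                  # kappa_2 * E[g'(w)] with kappa_2 = 1
assert abs(lhs - rhs) < 1e-9
```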

We begin with some examples before describing the general structure of the expansion. We consider the case \(p=1\) and \(l=2\) and perform a cumulant expansion

$$\begin{aligned}&{{\,\mathrm{{\mathbf {E}}}\,}}\left|\langle \underline{WGA \mathfrak {I}G A}\rangle \right|^2 \nonumber \\&\quad ={{\,\mathrm{{\mathbf {E}}}\,}}\langle \underline{WGA\mathfrak {I}G A}\rangle \langle \underline{A \mathfrak {I}G A G^*W}\rangle \nonumber \\&\quad = N^{-1}\sum _{ab} {{\,\mathrm{{\mathbf {E}}}\,}}\Bigl (\langle \varDelta ^{ab}GA\mathfrak {I}G A\rangle \partial _{ba} \langle \underline{A\mathfrak {I}G A G^*W}\rangle \nonumber \\&\qquad + \sigma \langle \varDelta ^{ab}GA\mathfrak {I}G A\rangle \partial _{ab} \langle \underline{A\mathfrak {I}G A G^*W}\rangle \Bigr )\nonumber \\&\qquad + \sum _{ab}\sum _{k=2}^R \sum _{q+q'=k}\frac{\kappa ^{q+1,q'}_{ab}}{N^{(k+1)/2}} {{\,\mathrm{{\mathbf {E}}}\,}}\partial _{ab}^q \partial _{ba}^{q'}\Bigl [\langle \varDelta ^{ab}GA\mathfrak {I}G A\rangle \langle \underline{A\mathfrak {I}G A G^*W}\rangle \Bigr ], \end{aligned}$$
(80)

where \((\varDelta ^{ab})_{cd}=\delta _{ac}\delta _{bd}\). In order to compute the derivative of \(\mathfrak {I}G\) we write

$$\begin{aligned} \partial _{ba} \mathfrak {I}G = \eta \partial _{ba} G G^*= - \eta (G\varDelta ^{ba} GG^*+GG^*\varDelta ^{ba}G^*) = - (G\varDelta ^{ba} \mathfrak {I}G + \mathfrak {I}G\varDelta ^{ba} G^*). \end{aligned}$$
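The first equality above uses the Ward identity \(\mathfrak {I}G=\eta \,GG^*\), valid for the resolvent of a Hermitian matrix. A direct numerical check (illustrative only; the size and spectral parameter are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50  # arbitrary matrix size
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W = (H + H.conj().T) / np.sqrt(4 * N)  # Hermitian Wigner-type matrix

z = 0.1 + 0.07j  # arbitrary spectral parameter with eta = Im z > 0
eta = z.imag
G = np.linalg.inv(W - z * np.eye(N))
ImG = (G - G.conj().T) / 2j  # Im G := (G - G^*) / (2i)

# Ward identity: Im G = eta * G G^*, since W is Hermitian.
assert np.allclose(ImG, eta * (G @ G.conj().T), atol=1e-8)
```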

By distributing the derivatives according to Leibniz’ rule we can write (80) as

$$\begin{aligned} \begin{aligned}&{{\,\mathrm{{\mathbf {E}}}\,}}\sum _{ab}\frac{\kappa ^{1,1}_{ab}}{N} \langle \varDelta ^{ab}GA\mathfrak {I}G A\rangle \langle A\mathfrak {I}G A G^*\varDelta ^{ba}-\underline{A(G\varDelta ^{ba}\mathfrak {I}G + \mathfrak {I}G\varDelta ^{ba} G^*) A G^*W}\rangle \\&\quad +{{\,\mathrm{{\mathbf {E}}}\,}}\sum _{ab}\frac{\kappa ^{2,0}_{ab}}{N} \langle \varDelta ^{ab}GA\mathfrak {I}G A\rangle \langle A\mathfrak {I}G A G^*\varDelta ^{ab}-\underline{A(G\varDelta ^{ab}\mathfrak {I}G + \mathfrak {I}G\varDelta ^{ab} G^*) A G^*W}\rangle \\&\quad - {{\,\mathrm{{\mathbf {E}}}\,}}\sum _{ab} \frac{\kappa ^{2,1}_{ab}}{N^{3/2}} \langle \varDelta ^{ab}G\varDelta ^{ba}GA\mathfrak {I}G A\rangle \langle A\mathfrak {I}G A G^*\varDelta ^{ab}-\underline{A\mathfrak {I}G A G^*\varDelta ^{ab}G^*W}\rangle +\cdots \end{aligned}\nonumber \\ \end{aligned}$$
(81)

where we chose two representative terms for \(k=1\) and \(k=2\) each. By performing another cumulant expansion for the remaining underlined terms in (81) we obtain

$$\begin{aligned}&N^2{{\,\mathrm{{\mathbf {E}}}\,}}\left|\langle \underline{WGA\mathfrak {I}G A}\rangle \right|^2 \nonumber \\&\quad = {{\,\mathrm{{\mathbf {E}}}\,}}\sum _{ab}\frac{\kappa ^{1,1}_{ab}}{N} (GA\mathfrak {I}G A)_{ba}\biggl [(A\mathfrak {I}G AG^*)_{ab} + \sigma (A\mathfrak {I}G AG^*)_{ba}\biggr ]\nonumber \\&\qquad -{{\,\mathrm{{\mathbf {E}}}\,}}\sum _{ab} \frac{\kappa ^{2,1}_{ab}}{N^{3/2}} G_{bb} (GA\mathfrak {I}G A)_{aa}(A\mathfrak {I}G AG^*)_{ab}\nonumber \\&\qquad +{{\,\mathrm{{\mathbf {E}}}\,}}\sum _{abcd} \frac{\kappa ^{1,1}_{ab}}{N}\frac{\kappa ^{1,1}_{cd}}{N}G_{bd}(GA\mathfrak {I}GA)_{ca} \Bigl [(A\mathfrak {I}G)_{db}(G^*A G^*)_{ac} +(A\mathfrak {I}G)_{da}(G^*A G^*)_{bc} \Bigr ]\nonumber \\&\qquad -{{\,\mathrm{{\mathbf {E}}}\,}}\sum _{abcd} \frac{\kappa ^{2,1}_{ab}}{N^{3/2}}\frac{\kappa ^{1,1}_{ab}}{N} G_{bb}(GA\mathfrak {I}G )_{ad}(G^*A)_{ca} (A\mathfrak {I}G A G^*)_{da} G^*_{bc}\nonumber \\&\qquad +{{\,\mathrm{{\mathbf {E}}}\,}}\sum _{abcd}\frac{\kappa ^{2,1}_{ab}}{N^{3/2}}\frac{\kappa ^{2,1}_{cd}}{N^{3/2}} G_{bb}G_{ac}G_{dd}(GA\mathfrak {I}G A)_{ca} (A\mathfrak {I}G A G^*)_{da} G^*_{bc} + \cdots , \end{aligned}$$
(82)

where we again selected representative terms. We notice that the rhs. can be written as a polynomial in the entries of two types of matrices: the \(\kappa \)-matrices representing cumulants, like \(\kappa ^{2,1}\), and the \(G\)-matrices representing resolvents, like \(\mathfrak {I}G\) or \(G^*\), or their multiples with \(A\), like \(A\mathfrak {I}G\), \(G^*A\). In order to achieve this representation we introduce additional internal summation indices to expand longer products, e.g. we write \((A\mathfrak {I}G A G^*)_{da}=\sum _e (A\mathfrak {I}G)_{de}(AG^*)_{ea}\). The value of any given graph is the numerical result of summing up all indices. The precise definition will be given later in (89); here, as an example, the first term in (82) with indicated summation indices reads

[Figure a: the first term of (82) drawn as a graph with its summation indices indicated]

where the (directed) edges represent matrices and the vertices represent summation indices. The edge orientation indicates the order of indices of the represented matrix, which for the \(G\)-edges is uniquely determined from the expansion, while for \(\kappa \)-edges it may be chosen arbitrarily, as long as the represented matrix is defined consistently with the orientation, see (88) later. Here we drew the internal vertices as empty nodes and the \(\kappa \)-vertices as filled nodes, the \(\kappa \)-matrices as dashed edges and the \(G\)-matrices as solid edges. Both internal and \(\kappa \)-vertices correspond to independent summations over the index set \([N]\).

Thus, graphically we can represent (82) as

[Figure: the graphs representing the terms of (82)]
(83)

Note that the dashed edges connect only filled nodes and they form a perfect matching. The number of \(G\)-edges adjacent to each filled vertex is equal to the order of the corresponding cumulant expansion.

Similarly, for the isotropic case we obtain, for example, the polynomials

$$\begin{aligned}&{{\,\mathrm{{\mathbf {E}}}\,}}\left|\langle \varvec{x},\underline{GA W G}\varvec{y}\rangle \right|^2 = {{\,\mathrm{{\mathbf {E}}}\,}}\langle \varvec{x},\underline{GA W G}\varvec{y}\rangle \langle \varvec{y},\underline{G^*W AG^*}\varvec{x}\rangle \nonumber \\&\quad = {{\,\mathrm{{\mathbf {E}}}\,}}\sum _{ab} \frac{\kappa ^{1,1}_{ab}}{N}(GA)_{\varvec{x}a}G_{b\varvec{y}} G^*_{\varvec{y}b}(AG^*)_{a\varvec{x}} \nonumber \\&\qquad + {{\,\mathrm{{\mathbf {E}}}\,}}\sum _{abcd}\frac{\kappa ^{1,1}_{ab}}{N}\frac{\kappa ^{1,1}_{cd}}{N} (GA)_{\varvec{x}a}G_{bd}G_{c\varvec{y}} \Bigl [G^*_{\varvec{y}b}G^*_{ac}(AG^*)_{d\varvec{x}} + G^*_{\varvec{y}c}(AG^*)_{db} G^*_{a\varvec{x}} \Bigr ]\nonumber \\&\qquad - {{\,\mathrm{{\mathbf {E}}}\,}}\sum _{abcd} \frac{\kappa ^{2,1}_{ab}}{N^{3/2}}\frac{\kappa ^{1,1}_{cd}}{N} (GA)_{\varvec{x}a} G_{bb} G_{ad} G_{c\varvec{y}} G^*_{\varvec{y}a} G^*_{bc} (AG^*)_{d\varvec{x}} + \cdots \end{aligned}$$
(84)

which we represent graphically as

[Figure: the graphs representing the terms of (84)]
(85)

with external vertices drawn as squares. Note that the vectors \(\varvec{x}\) and \(\varvec{y}\) are naturally represented by external vertices that are drawn as solid squares.

After these examples we now explain the general structure of the graphs and give a precise definition of graphs and their graph value used in (83) and (85).

Definition 2

We define the class \({\mathcal {G}}\) of oriented graphs used within this paper by the following requirements. Each \(\varGamma =(V,E)\in {\mathcal {G}}\) has three types of vertices, \(\kappa \)-vertices \(V_\kappa \), internal vertices \(V_\mathrm {i}\) and external vertices \(V_\mathrm {e}\), so that \(V=V_\kappa {\dot{\cup }} V_\mathrm {i}{\dot{\cup }} V_\mathrm {e}\), and two types of edges, \(\kappa \)-edges \(E_\kappa \) and \(G\)-edges \(E_g\), so that \(E=E_\kappa {\dot{\cup }} E_g\). For each vertex \(v\in V\) we define its \(G\)-in- and out-degree \(d_g^\mathrm {in}(v),d_g^\mathrm {out}(v)\) as the number of incoming and outgoing \(G\)-edges. The total degree \(d_g(v)\) is defined as the sum \(d_g(v):=d_g^\mathrm {in}(v)+d_g^\mathrm {out}(v)\) of in- and out-degrees and the three vertex classes satisfy

$$\begin{aligned} d_g(v)= {\left\{ \begin{array}{ll} 1,&{} v\in V_\mathrm {e}\\ 2, &{} v\in V_\mathrm {i}, \end{array}\right. },\qquad d_g(v)\ge 2, \quad v\in V_\kappa . \end{aligned}$$
(86)

We can partition \(V_\kappa = \bigcup _{k\ge 2} V_\kappa ^k\) with \(V_\kappa ^k:=\left\{ v\in V_\kappa \Big |d_g(v)=k \right\} \).

Within the graphs \(\varGamma \) each external vertex \(v\in V_\mathrm {e}\) carries some \(\varvec{x}(v)\in {\mathbf {C}}^N\) as a vector-valued label recording which vector the vertex represents. Each \(\kappa \)-edge \(e\in E_\kappa \) carries two integer-valued labels \(r(e)\ge 1,s(e)\ge 0\) recording the cumulant type. Each \(G\)-edge \(e\in E_g\) carries six labels. The binary labels \(i(e),t(e),*(e)\in \left\{ 0,1 \right\} \) indicate whether \(e\) represents the imaginary part, the transpose and/or the adjoint of a resolvent. The scalar label \(z(e)\) records the spectral parameter of the resolvent and the matrix-valued labels \(L(e),R(e)\) record deterministic matrices which are multiplied with the resolvent from the left/right.

We now relate the graphs to the polynomials they represent. Each internal vertex or \(\kappa \)-vertex \(v\) corresponds to an independent summation \(a_v\in [N]\). In order to unify notations we define a labelling map

$$\begin{aligned} \begin{aligned} \varvec{x}:V&\rightarrow {\mathbf {C}}^N,\qquad v\mapsto \varvec{x}_v := {\left\{ \begin{array}{ll} \varvec{x}(v),&{} v\in V_\mathrm {e},\\ \varvec{e}_{a_v},&{} v\in V_\mathrm {i}\cup V_\kappa , \end{array}\right. } \end{aligned} \end{aligned}$$
(87)

where \(\varvec{e}_a\) is the \(a\)-th unit vector in the standard basis, and for \(v\in V_\mathrm {e}\), the vector \(\varvec{x}(v)\) is the label of \(v\) from Definition 2. The \(G\)-edges \(e\in E_g\) represent resolvents defined via the labels of \(e\) from Definition 2. We define the matrix \({\mathcal {G}}^{e}\) as the resolvent \(G(z(e))\) modified according to \(i(e),t(e),*(e)\) and multiplied by \(L(e),R(e)\) from the left/right. As an example, we set

$$\begin{aligned} {\mathcal {G}}^e = B(\mathfrak {I}G(z))^t \quad \text {for } e \in E_g \end{aligned}$$

with

$$\begin{aligned} \Bigl (i(e),t(e),*(e),z(e),L(e),R(e)\Bigr )=(1,1,0,z,B,I). \end{aligned}$$

We remark that for all \(G\)-edges \(e\) considered in this paper at most one of the matrices \(L(e),R(e)\) is different from the identity matrix \(I\). The \(\kappa \)-edges \(e\in E_\kappa \) represent \(N\times N\) cumulant matrices \(\kappa ^e\) which are determined by the two integers \(r(e),s(e)\) from Definition 2 such that for \(a\ne b\),

$$\begin{aligned} \kappa _{ab}^{(uv)} := \kappa ^{r((uv)),s((uv))}_{ab}, \end{aligned}$$
(88)

where on the rhs. \(\kappa \) was defined in (78). We note that \(\left|\kappa _{ab}^{(uv)}\right|\lesssim 1\) by (8). Finally, we define the graph value

$$\begin{aligned} \begin{aligned} {{\,\mathrm{Val}\,}}(\varGamma ) := \sum _{\begin{array}{c} a_v\in [N]\\ v\in V_\mathrm {i}\cup V_\kappa \end{array}} \biggl [\prod _{(uv)\in E_\kappa } \biggl ( N^{-d_g(u)/2}\kappa ^{(uv)}_{a_u a_v} \biggr )\biggr ] \biggl ({\prod _{(uv)\in E_g}} {\mathcal {G}}^{(uv)}_{\varvec{x}_u \varvec{x}_v}\biggr ). \end{aligned} \end{aligned}$$
(89)
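As a consistency check of (89), for a graph whose only \(\kappa \)-edge is a \(\kappa ^{1,1}\)-matching the explicit index summation collapses to a trace of matrix products. The following sketch (illustration only; the Wigner sample, spectral parameter and traceless \(A\) are arbitrary choices) evaluates the first term of (82) both as the index sum of (89) and in closed trace form:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30  # arbitrary (small, since we sum indices explicitly)
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W = (H + H.conj().T) / np.sqrt(4 * N)
z = 0.2 + 0.1j
G = np.linalg.inv(W - z * np.eye(N))
ImG = (G - G.conj().T) / 2j
A = rng.standard_normal((N, N))
A -= (np.trace(A) / N) * np.eye(N)  # traceless observable

# The two G-edges of the first graph in (82); kappa^{1,1}_{ab} = 1.
M = G @ A @ ImG @ A            # represents (G A ImG A)
P = A @ ImG @ A @ G.conj().T   # represents (A ImG A G^*)

# Val as in (89): explicit summation over the two kappa-vertex indices a, b ...
val_sum = sum(M[b, a] * P[a, b] for a in range(N) for b in range(N)) / N
# ... agrees with the closed trace form of the same term.
val_trace = np.trace(M @ P) / N
assert np.allclose(val_sum, val_trace)
```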

Among the degree-\(2\) vertices the ones between edges representing matrices whose eigenvectors are asymptotically orthogonal are of particular importance. There are two different mechanisms for such orthogonality: (a) two resolvents, one with and one without transpose, stand next to each other, e.g. \(GG^t\) or \(G^*(A(\mathfrak {I}G)^t)\); (b) a traceless matrix \(A\) stands between two resolvents, e.g. \((GA)G^*\) or \(G(A(\mathfrak {I}G)^t)\). Note that in some cases, e.g. \( (GA)(\mathfrak {I}G)^t\), both mechanisms can be present simultaneously, and hence a vertex can be a \(0\mathrm {tr}\)- and a \(t\)-vertex at the same time.

Definition 3

(Orthogonality vertices).

  1. (a)

    A vertex \(v\in V_\kappa ^2\cup V_\mathrm {i}\) is called a \(t\)-orthogonality vertex, or short \(t\)-vertex, if the two unique \(G\)-edges \(e_1,e_2\in E_g\) adjacent to \(v\) satisfy \((t(e_1),t(e_2))\in \left\{ (0,1),(1,0) \right\} \), i.e. if exactly one of the two \(G\)-edges adjacent to \(v\) is transposed.

  2. (b)

    A vertex \(v\in V_\mathrm {i}\cup V_\kappa ^2\) is called a zero-trace-orthogonality vertex, or short \(0\mathrm {tr}\)-vertex, if exactly one of the two edges adjacent to \(v\) represents a resolvent (which is allowed to be the imaginary part, transposed, or adjoint) multiplied by a traceless matrix on the side of \(v\), while the other adjacent edge represents a resolvent multiplied by the identity matrix on the side of \(v\). More precisely, using the labels \(L(e),R(e)\) of the edges, \(v\) is defined to be a \(0\mathrm {tr}\)-vertex if one of the following three conditions is satisfied:

    1. (b.i)

      there are incoming/outgoing edges \((uv),(vw)\in E_g\) such that either \(\langle R((uv))\rangle =0,L((vw))=I\) or \(\langle L((vw))\rangle =0,R((uv))=I\),

    2. (b.ii)

      there are two outgoing edges \((vu),(vw)\in E_g\) such that either \(\langle L((vu))\rangle =0,L((vw))=I\) or \(\langle L((vw))\rangle =0,L((vu))=I\),

    3. (b.iii)

      there are two incoming edges \((uv),(wv)\in E_g\) such that either \(\langle R((uv))\rangle =0,R((wv))=I\) or \(\langle R((wv))\rangle =0,R((uv))=I\).

Proposition 3

(Cumulant expansion). Let \({\mathfrak {a}},{\mathfrak {t}},{\mathfrak {i}}\) be fixed sets as in Theorem 5 of sizes \(a:=\left|{\mathfrak {a}}\right|, t:=\left|{\mathfrak {t}}\right|, i:=\left|{\mathfrak {i}}\right|\). Then for any \(p\in {\mathbf {N}}\) there exists a finite (\(N\)-independent) family of graphs \({\mathcal {G}}_p={\mathcal {G}}_p^\mathrm {av}\cup {\mathcal {G}}_p^\mathrm {iso}\subset {\mathcal {G}}\) such that

$$\begin{aligned} {{\,\mathrm{{\mathbf {E}}}\,}}\left|{{\,\mathrm{Tr}\,}}\underline{WG_1B_1G_2B_2\cdots G_l B_l}\right|^{2p}&=\sum _{\varGamma \in {\mathcal {G}}_p^\mathrm {av}}{{\,\mathrm{{\mathbf {E}}}\,}}{{\,\mathrm{Val}\,}}(\varGamma ) + {{\mathcal {O}}}\left( N^{-2p} \right) , \end{aligned}$$
(90)
$$\begin{aligned} {{\,\mathrm{{\mathbf {E}}}\,}}\left|\langle \varvec{x},\underline{G_1 B_1\cdots G_{j} B_j W G_{j+1}B_{j+1}\cdots B_{l-1}G_l }\varvec{y}\rangle \right|^{2p}&=\sum _{\varGamma \in {\mathcal {G}}_p^\mathrm {iso}}{{\,\mathrm{{\mathbf {E}}}\,}}{{\,\mathrm{Val}\,}}(\varGamma ) + {{\mathcal {O}}}\left( N^{-2p} \right) , \end{aligned}$$
(91)

and for each graph \(\varGamma \) we may select two disjoint subsets \(V_\mathrm {o}^t{\dot{\cup }} V_\mathrm {o}^{0\mathrm {tr}}=:V_\mathrm {o}\) of \(t\)- and \(0\mathrm {tr}\)-vertices, respectively, such that the following properties are satisfied:

  1. (P1)

    The graph \((V_\kappa ,E_\kappa )\) is a perfect matching, in particular, \(\left|V_\kappa \right|=2\left|E_\kappa \right|\).

  2. (P2)

    The number of \(\kappa \)-edges satisfies \(1\le \left|E_\kappa \right|\le 2p\).

  3. (P3)

    The number of \(G\)-edges satisfies

    $$\begin{aligned} \left|\left\{ e\in E_g\Big |i(e)=1 \right\} \right|&=2ip \end{aligned}$$
    (92a)
    $$\begin{aligned} \left|E_g\right|&= \sum _{e\in E_\kappa } d_g(e)+2(l-1)p\ge 2p. \end{aligned}$$
    (92b)
  4. (P4)

    For \((uv)\in E_\kappa \) the \(G\)-degrees of \(u,v\in V_\kappa \) satisfy \(d_g^\mathrm {in}(u)=d_g^\mathrm {out}(v)\), \(d_g^\mathrm {in}(v)=d_g^\mathrm {out}(u)\) and \(d_g(u)=d_g(v)\ge 2\). Therefore we may define the \(G\)-degree of \((uv)\) as \(d_g((uv)):=d_g(u)=d_g(v)\) and partition \(E_\kappa ={\dot{\bigcup }}_{k\ge 2} E_\kappa ^k\) into \(E_\kappa ^k := \left\{ e\in E_\kappa \Big |d_g(e)=k \right\} \).

  5. (P5)

    Every \(E_g\)-cycle on \(V_\kappa ^2\cup V_\mathrm {i}\) must contain at least two \(V_\kappa ^2\)-vertices, and in particular there cannot exist isolated loop edges, and there are at most \(\left|E_\kappa ^2\right|\) cycles.

  6. (P6)

    Denoting the number of isolated cycles in \((V_\kappa \cup V_\mathrm {i},E_g)\) with \(k\) vertices in \(V_\mathrm {o}\) by \(n_\mathrm {cyc}^{o=k}\), we have

    $$\begin{aligned} 2n_\mathrm {cyc}^{o=0}+n_\mathrm {cyc}^{o=1} \le 2\left|E_\kappa ^2\right|-\left|V_\mathrm {o}\cap V_\kappa ^2\right|.\end{aligned}$$
  7. (P7)

    The numbers of selected internal \(0\mathrm {tr}\)- and \(t\)-vertices are

    $$\begin{aligned} \left|V_\mathrm {i}\cap V_\mathrm {o}^{0\mathrm {tr}}\right|= {\left\{ \begin{array}{ll} 2p(a-1), &{} j\in {\mathfrak {a}}\\ 2pa, &{} j\not \in {\mathfrak {a}}, \end{array}\right. },\quad \left|V_\mathrm {i}\cap V_\mathrm {o}^{t}\right|= {\left\{ \begin{array}{ll} 2p(t-1), &{} j\in {\mathfrak {t}}\\ 2pt, &{} j\not \in {\mathfrak {t}}, \end{array}\right. } \end{aligned}$$

    where \(j:=l\) in the averaged case, while in the isotropic case \(j\) is determined by the lhs. of (91).

  8. (P8)

    If \(j\in {\mathfrak {a}}\) (with again \(j:=l\) in the averaged case), then the set of selected \(0\mathrm {tr}\)-vertices \(V_\mathrm {o}^{0\mathrm {tr}}\) satisfies

    $$\begin{aligned} 2 \left|E_\kappa ^2\right| + \left|E_\kappa ^{\ge 3}\right| - 2p\le \left|V_\mathrm {o}^{0\mathrm {tr}}\cap V_\kappa ^2\right|\le 2p,\end{aligned}$$

    while otherwise \(V_\mathrm {o}^{0\mathrm {tr}}\cap V_\kappa ^2=\emptyset \). Similarly, if \(j\in {\mathfrak {t}}\), then the set of selected \(t\)-vertices \(V_\mathrm {o}^t\) satisfies

    $$\begin{aligned} 2 \left|E_\kappa ^2\right| + \left|E_\kappa ^{\ge 3}\right| - 2p\le \left|V_\mathrm {o}^t\cap V_\kappa ^2\right|\le 2p,\end{aligned}$$

    while otherwise \(V_\mathrm {o}^t\cap V_\kappa ^2=\emptyset \).

The graphs \(\varGamma \in {\mathcal {G}}_p^\mathrm {av}\) satisfy (P1)-(P8) and in addition:

  • (\(P^\mathrm{av}9\)) There are no external vertices, i.e. \(V_\mathrm {e}=\emptyset \).

  • (\(P^\mathrm{av}10\)) The number of internal vertices satisfies \(\left|V_\mathrm {i}\right|=2(l-1)p\).

The graphs \(\varGamma \in {\mathcal {G}}_p^\mathrm {iso}\) satisfy (P1)-(P8) and in addition:

  • (\(P^\mathrm{iso}9\)) The number of external vertices is \(\left|V_\mathrm {e}\right|=4p\); each \(v\in V_\mathrm {e}\) has degree \(d_g(v)=1\), and the unique connected vertex \(u\in V\) with \((uv)\in E_g\) or \((vu)\in E_g\) satisfies \(u\in V_\kappa \).

  • (\(P^\mathrm{iso}10\)) The number of internal vertices satisfies \(\left|V_\mathrm {i}\right|=2p(l-2)\).

Definition 4

For some parameters \(a,t,l,i,p\in {\mathbf {N}}\) we call graphs \(\varGamma \in {\mathcal {G}}\) together with their selected \(V_\mathrm {o}^t,V_\mathrm {o}^{0\mathrm {tr}}\) sets satisfying (P1)-(P8) and (P\(^{\mathrm{av}}9\))-(P\(^{\mathrm{av}}10\)) av-graphs, while we call graphs \(\varGamma \in {\mathcal {G}}\) (together with the sets \(V_\mathrm {o}^t,V_\mathrm {o}^{0\mathrm {tr}}\) and the extra parameter \(j\in [l-1]\)) satisfying (P1)-(P8) and (P\(^{\mathrm{iso}}9\))-(P\(^{\mathrm{iso}}10\)) iso-graphs.

Proof

(Proposition 3) In order to obtain (90) we iteratively perform cumulant expansions exactly as in the examples (80) and (84) until no underlined terms remain. Each cumulant expansion removes at least one underlined term, hence this process terminates.

We now explain which kinds of \(G\)-edges are created through this cumulant expansion procedure for the averaged case (90), the isotropic case (91) being very similar. Initially, the graph representing the lhs. of (90), after writing out \(\left|{{\,\mathrm{Tr}\,}}X\right|^{2p}=({{\,\mathrm{Tr}\,}}X)^p ({{\,\mathrm{Tr}\,}}X^*)^p\), consists of \(2p\) cycles, each with a \(W\) factor and \(l\) \(G\)-edges representing \({\mathcal {G}}\)-factors \(G_k B_k\) or \(B_k^*G_k^*\) for \(k\in [l]\). Each of these \({\mathcal {G}}\)-factors can be fully described via the labels \(i(e),t(e),*(e),z(e),L(e),R(e)\) from Definition 2, the first four being determined by the form of \(G_k\), while the latter two encode the multiplication from the left/right by deterministic matrices, e.g. \(L(e)=I,R(e)=B_k\) for \(G_k B_k\). While performing cumulant expansions of some \(W=\sum _{ab}w_{ab}\varDelta ^{ab}\) using (76), these \(G\)-edges are modified and new \(G\)-edges are created via the action of derivatives, and \(\kappa \)-edges representing \(\kappa (ab,\varvec{\alpha })\) are also created. This process creates finitely many different graphs for every cumulant expansion, both through the explicit summation over cumulants in (76) and through the Leibniz rule for the derivative \(\partial _{\varvec{\alpha }}\) acting on the product of all remaining \(W\)’s and \(G\)’s. We note that for resolvent derivatives we have

$$\begin{aligned} \begin{aligned} \partial _{ab} G&= -G\varDelta ^{ab}G,\quad \partial _{ab} G^*= -G^*\varDelta ^{ab}G^*, \\ \partial _{ab} G^t&= -G^t\varDelta ^{ba}G^t,\quad \partial _{ab} (G^*)^t= -(G^*)^t\varDelta ^{ba}(G^*)^t \end{aligned}\end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \partial _{ab} \mathfrak {I}G&= -G\varDelta ^{ab}\mathfrak {I}G - \mathfrak {I}G\varDelta ^{ab}G^*, \\ \partial _{ab} (\mathfrak {I}G)^t&= -(\mathfrak {I}G)^t\varDelta ^{ba}G^t - (G^*)^t\varDelta ^{ba}(\mathfrak {I}G)^t. \end{aligned} \end{aligned}$$

Hence, a derivative action on \(e\) representing the \({\mathcal {G}}\)-factor \({\mathcal {G}}^e=G_kB_k\) (or similarly \(B_k^*G_k^*\)) creates two \(G\)-edges \(e_1,e_2\), such that only the resolvent representing \(e_2\) is multiplied from the right by \(R(e_2)=B_k\), while \(L(e_2)=L(e_1)=R(e_1)=I\). The labels \(t(e),z(e)\) indicating the transposition status and spectral parameter are directly inherited by both \(e_1,e_2\), while the label \(i(e)\) is inherited by exactly one of \(e_1,e_2\), say \(i(e_1)=1\), the other one satisfying \(i(e_2)=0\) and \(*(e_2)\in \left\{ 0,1 \right\} \). If \(*(e)=1,i(e)=0\), then both \(e_1,e_2\) satisfy \(*(e_1)=*(e_2)=1\). It follows inductively that each \({\mathcal {G}}\)-factor encountered in the expansion can be represented by an edge \(e\) with six labels \(i(e),t(e),*(e),z(e),L(e),R(e)\), where \(L(e)=I\) or \(L(e)=B_k^*\) for some \(k\), and \(R(e)=I\) or \(R(e)=B_k\) for some \(k\), with at least one of \(L(e),R(e)\) being the identity for each \(e\). The spectral parameter label satisfies \(z(e)\in \left\{ z_1,\dots ,z_l \right\} \) for each \(e\). For example, the \(ab\)-derivative of the \({\mathcal {G}}\)-factor \({\mathcal {G}}^e=B(\mathfrak {I}G(z))^t\) described by the edge \(e\) with labels \((1,1,0,z,B,I)\) yields a sum of two terms and hence the two new graphs given by

[figure b: the two new graphs created by the derivative action]

or, in formulas,

$$\begin{aligned} \partial _{ab}\Bigl [ B(\mathfrak {I}G(z))^t\Bigr ] = -B(\mathfrak {I}G(z))^t\varDelta ^{ba}G^t-B(G^*)^t\varDelta ^{ba} (\mathfrak {I}G(z))^t.\end{aligned}$$
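The derivative rules above are ordinary matrix-calculus identities for \(G(z)=(W-z)^{-1}\), \(G^*=(W-\bar{z})^{-1}\) and \(\mathfrak {I}G=(G-G^*)/2i\), and can be verified numerically by a finite-difference approximation in the direction \(\varDelta ^{ab}\). The following Python sketch is an illustrative sanity check only, not part of the proof; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
# Hermitian test matrix and spectral parameter z with Im z > 0
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (H + H.conj().T) / 2
z = 0.3 + 0.5j
eye = np.eye(N)

G   = lambda M: np.linalg.inv(M - z * eye)              # G(z)
Gs  = lambda M: np.linalg.inv(M - np.conj(z) * eye)     # G^* = G(z̄) for Hermitian M
ImG = lambda M: (G(M) - Gs(M)) / 2j                     # ℑG = (G - G^*)/(2i)

a, b = 1, 4
Delta = np.zeros((N, N)); Delta[a, b] = 1.0             # Δ^{ab}
eps = 1e-6

# finite-difference directional derivatives in direction Δ^{ab}
fd_G   = (G(H + eps * Delta) - G(H)) / eps
fd_ImG = (ImG(H + eps * Delta) - ImG(H)) / eps

# ∂_{ab} G = -G Δ^{ab} G
assert np.allclose(fd_G, -G(H) @ Delta @ G(H), atol=1e-4)
# ∂_{ab} ℑG = -G Δ^{ab} ℑG - ℑG Δ^{ab} G^*
assert np.allclose(fd_ImG, -G(H) @ Delta @ ImG(H) - ImG(H) @ Delta @ Gs(H), atol=1e-4)
```

Note that here \(\partial _{ab}\) is the directional derivative in \(\varDelta ^{ab}\), i.e. \(w_{ab}\) is treated as an independent variable, matching the convention of the cumulant expansion.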

We now describe the selection of the orthogonality vertices \(V_\mathrm {o}^t,V_\mathrm {o}^{0\mathrm {tr}}\) which is done in two steps. To unify notations we set \(j:=l\) in the averaged case.

  • (orth-1) For each \(k\in {\mathfrak {t}}\setminus \left\{ j \right\} \) and each \(k\in {\mathfrak {a}}\setminus \left\{ j \right\} \) we collect \(2p\) distinct vertices from \(V_\mathrm {i}\) into the sets \(V_\mathrm {o}^t\) and \(V_\mathrm {o}^{0\mathrm {tr}}\), respectively.

  • (orth-2) If \(j\in {\mathfrak {t}}\) or \(j\in {\mathfrak {a}}\), then we select one vertex from \(V_\kappa ^2\) into \(V_\mathrm {o}^t\) or \(V_\mathrm {o}^{0\mathrm {tr}}\), respectively, for each \(W\) acting as a degree-\(2\) cumulant on some resolvent.

Regarding (orth-1), for \(k\in ({\mathfrak {a}}\cup {\mathfrak {t}})\setminus \left\{ j \right\} \) the initial graphs representing the lhs. of (90)–(91) contain \(p\) internal vertices \(v_1,\ldots ,v_{p}\) between \(G\)-edges representing \((G_k B_k),(G_{k+1}B_{k+1})\) and \(p\) internal vertices \(v_{p+1},\ldots ,v_{2p}\) between \(G\)-edges representing \((B_{k+1}^*G_{k+1}^*),(B_k^*G_k^*)\). The \(G\)-edges adjacent to these internal vertices may change due to derivative actions along the cumulant expansions. However, in case \(k\in {\mathfrak {a}}\), due to the derivative rules explained in the paragraph above, it is ensured that at all times the two unique \(G\)-edges \(e_1,e_2\) adjacent to \(v_k\) satisfy \(R(e_1)=B_k,L(e_2)=I\) for \(k\le p\) and \(R(e_1)=I,L(e_2)=B_k^*\) otherwise, so that \(v_k\) is guaranteed to remain a \(0\mathrm {tr}\)-vertex. Similarly, for \(k\in {\mathfrak {t}}\) it is ensured that the two unique \(G\)-edges \(e_1,e_2\) adjacent to \(v_k\) satisfy \(t(e_1)=1,t(e_2)=0\), so that \(v_k\) is guaranteed to remain a \(t\)-vertex.

Regarding (orth-2) we note that while performing the cumulant expansion for \(W=\sum _{ab}w_{ab}\varDelta ^{ab}\) in \(G_j B_j WG_{j+1}\) we obtain the degree-\(2\) cumulant term as

$$\begin{aligned} \sum _{ab}G_j B_j \varDelta ^{ab}G_{j+1}\bigl (\partial _{ba}+\sigma \partial _{ab}\bigr ) \end{aligned}$$

Here the derivatives \(\partial _{ab}\) or \(\partial _{ba}\) acting on some resolvent \(G\) result in \(G\varDelta ^{ab}G\) or \(G\varDelta ^{ba}G\), respectively. In case \(j\in {\mathfrak {a}}\) the \(\kappa \)-vertex corresponding to the summation index \(a\) satisfies the definition of a \(0\mathrm {tr}\)-vertex since \(\langle B_j\rangle =0\) and the other resolvent is not multiplied by an additional matrix in the \(a\)-direction. Similarly, in case \(j\in {\mathfrak {t}}\) either both or none of the two \(G\)’s in \(G\varDelta ^{ab}G\) or \(G\varDelta ^{ba}G\) are transposed, while, by definition, exactly one of \(G_j,G_{j+1}\) is transposed. Thus exactly one of the \(\kappa \)-vertices corresponding to the \(a\)- or \(b\)-summation satisfies the definition of a \(t\)-vertex.

We note that the condition \({\mathfrak {a}}\cap {\mathfrak {t}}=\emptyset \) ensures that the sets \(V_\mathrm {o}^t,V_\mathrm {o}^{0\mathrm {tr}}\) constructed in this way are disjoint. We now check that the properties (P1)-(P8), as well as (P\(^{\mathrm{av}}9\))-(P\(^{\mathrm{av}}10\)) and (P\(^{\mathrm{iso}}9\))-(P\(^{\mathrm{iso}}10\)), also hold for these graphs.

The properties (P1)-(P2) are obvious by construction since each cumulant expansion comes with two \(\kappa \)-vertices, and in total there are \(2p\) underlined terms and thereby at most \(2p\) cumulant expansions. The properties (P\(^{\mathrm{av}}10\))-(P\(^{\mathrm{iso}}10\)) follow from the fact that for each factor of \({{\,\mathrm{Tr}\,}}\underline{WG_1B_1\cdots G_l B_l}\) and

$$\begin{aligned} \langle \varvec{x},\underline{G_1 B_1\cdots G_{j} B_j W G_{j+1}B_{j+1}\cdots B_{l-1}G_l }\varvec{y}\rangle \end{aligned}$$

there are \(l-1\) and respectively \(l-2\) internal vertices of in- and out-degree \(1\) and that these properties remain invariant under cumulant expansions. Similarly, the properties (P\(^{\mathrm{av}}9\)) and (P\(^{\mathrm{iso}}9\)) hold true trivially for the initial terms and remain invariant under cumulant expansions.

For (P4) note that the cumulant \(\kappa (ab,(\alpha _1,\ldots ,\alpha _k))\) comes together with matrices

$$\begin{aligned} \varDelta ^{ab}, (\varDelta ^{\alpha _1})^{(t)},\ldots , (\varDelta ^{\alpha _k})^{(t)}\end{aligned}$$

after derivative action, where the transpose is taken in case the derivative acts on a transposed resolvent. In all cases the in-degree of the vertex associated with \(a\) is equal to the out-degree of the vertex associated with \(b\).

For (P5) note that by the definition of the underline-renormalisation it follows that for degree two edges the corresponding \(\partial _{ba}\) derivative cannot act on its own trace and therefore cycles have to involve at least two \(V_\kappa ^2\) vertices.

For (P6) we note that \(\left|V_\kappa ^2\setminus V_\mathrm {o}\right|=2\left|E_\kappa ^2\right|-\left|V_\mathrm {o}\cap V_\kappa ^2\right|\), while due to (P5) each cycle with zero \(V_\mathrm {o}\)-vertices contains at least two \(V_\kappa ^2\setminus V_\mathrm {o}\)-vertices and each cycle with one \(V_\mathrm {o}\)-vertex contains at least one \(V_\kappa ^2\setminus V_\mathrm {o}\)-vertex.

The claim (P7) follows immediately from the construction (orth-1). Similarly, claim (P8) follows from the construction (orth-2) together with the observation that because \(\left|E_\kappa \right|\) is the total number of cumulant expansions, a total of \(2p-\left|E_\kappa \right|\) derivatives have acted on some \(W\), and thus the number \(n\) of \(W\)’s acting as degree-\(2\) cumulants on some \(G\) satisfies (using \(\left|E_\kappa \right|=\left|E_\kappa ^2\right|+\left|E_\kappa ^{\ge 3}\right|\))

$$\begin{aligned} n \ge \left|E_\kappa ^2\right| - (2p-\left|E_\kappa \right|)= 2\left|E_\kappa ^2\right| + \left|E_\kappa ^{\ge 3}\right|-2p, \end{aligned}$$
(93)

and, trivially, \(n\le 2p\). This concludes the proof of (P8) in the mutually exclusive cases \(j\in {\mathfrak {a}}\) and \(j\in {\mathfrak {t}}\) (recall that \({\mathfrak {a}}\cap {\mathfrak {t}}=\emptyset \) by assumption).

For the claim (92a) on the number of \(G\)-edges in (P3) note that the number of \(\mathfrak {I}G\)’s remains invariant under the derivative actions. For (92b) note that each derivative acting on some \(G\) increases the number of \(G\)’s by one, while each of the \(2p-\left|E_\kappa \right|\) derivatives acting on some \(W\) leaves the number of \(G\)’s invariant. Thus we conclude that the total number of \(G\)’s is

$$\begin{aligned} 2lp + \sum _{e\in E_\kappa } d_g(e) - \left|E_\kappa \right| - (2p-\left|E_\kappa \right|) = \sum _{e\in E_\kappa } d_g(e) +2(l-1)p\end{aligned}$$

and (92b) follows. \(\square \)

Remark 7

Proposition 3 holds true verbatim also under the alternative definition of the renormalisation outlined in Remark 5 in case no \(G\) is transposed. The proof of the proposition also remains unchanged, except for the proof of Property (P5). Under the alternative renormalisation, also for degree-two edges the derivative \(\sigma \partial _{ab}\) arising when expanding \(\underline{WG\cdots }=\sum _{ab}\varDelta ^{ab}G\cdots \partial _{ba}\) may act on its own trace. However, since no \(G\) is transposed, this action necessarily results in \(\varDelta ^{ab} G\cdots G\varDelta ^{ab}\) and therefore no loops are created.

Using Proposition 3, in order to conclude Theorem 5 it remains to estimate \({{\,\mathrm{Val}\,}}(\varGamma )\) for each \(\varGamma \in {\mathcal {G}}_p\) as follows. We note that the following proposition is valid for any av-/iso-graph \(\varGamma \in {\mathcal {G}}\) from Definition 4, i.e. for graphs satisfying the properties (P1)-(P8) and (P\(^\mathrm{av}\)9)-(P\(^\mathrm{av}\)10)/(P\(^\mathrm{iso}\)9)-(P\(^\mathrm{iso}\)10) above, and not only for the specific families of graphs \({\mathcal {G}}_p^\mathrm {av},{\mathcal {G}}_p^\mathrm {iso}\) arising in the cumulant expansion.

Proposition 4

(Value estimate) For each av-graph \(\varGamma \in {\mathcal {G}}\) with parameters \(a,t,l,p,i\in {\mathbf {N}}\) we have the bound

$$\begin{aligned} \left|{{\,\mathrm{Val}\,}}(\varGamma )\right| \prec {\left\{ \begin{array}{ll} \rho ^{2(b+1)p} N^{2bp} K^{-2bp}, &{} b=l\\ \varLambda _+^{2ap}\varPi _+^{2tp}\rho ^{2ip\vee 2(b+1)p}N^{p(a+t+2b)} K^{-p(1+2b)} , &{} b<l, \end{array}\right. },\,\, b:= l-a-t\nonumber \\ \end{aligned}$$
(94)

with \(K\) as in (75), while for each iso-graph \(\varGamma \) with parameters \(a,t,l,p,i\in {\mathbf {N}}\) we have the bound

$$\begin{aligned} \left|{{\,\mathrm{Val}\,}}(\varGamma )\right| \prec \varLambda _+^{2ap}\varPi _+^{2tp}\rho ^{2ip\vee 2(b+1)p}N^{p(a+t+2b)} K^{-p(1+2b)},\quad b:= l-a-t-1.\nonumber \\ \end{aligned}$$
(95)

Proof

(Theorem 5) Theorem 5 follows immediately by combining Propositions 3 and 4 under the simplifying assumptions made at the beginning of Sect. 5. Their removal is a routine technicality whose details we present in Appendix A of the arXiv:2012.13215 version of the present paper. Following the proof of Proposition 4 it is evident that both \(\varLambda _+^{2ap}\) and \(\varPi _+^{2tp}\) can be replaced by the product of individual factors \(\varLambda _+^{B_k}\), \(\varPi _+^{B_k}\) for \(k\in {\mathfrak {a}}\cup {\mathfrak {t}}\), as claimed in Theorem 5.

Finally, regarding the replacement of \(\rho ^i\) by \(\prod _{k\in {\mathfrak {i}}}\rho (z_k)\) in the bounds of Theorem 5, it is easy to see that during the cumulant expansion the number of \(\mathfrak {I}G(z_k)\)’s is preserved and each gives rise to a factor \(\rho (z_k)\) in Proposition 4; hence the factor \(\rho ^{2ip}\) may be replaced by the factor \(\prod _{k\in {\mathfrak {i}}} \rho (z_k)^{2p}\). Similarly, for the replacement of \(\varLambda _+^a\) by \(\prod _{k\in {\mathfrak {a}}}\varLambda _+^{B_k}\) we note that each \(B_k\) appears exactly \(2p\) times also after the cumulant expansions, and therefore each \(\varLambda _+^{B_k}\) can appear at most in the \(2p\)-th power on the rhs. of (94)–(95). \(\square \)

5.2 Estimating graph values: Proof of Proposition 4

The proof of Proposition 4 proceeds in three major steps, formulated in Lemmata 2, 3 and 4, which we first state and then use to conclude the proof of Proposition 4.

First, we express the value \({{\,\mathrm{Val}\,}}(\varGamma )={{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})\) as the value of the reduced graph \(\varGamma _\mathrm {red}\) obtained from \(\varGamma \) by collapsing all degree-\(2\) vertices \(V_\mathrm {i}\cup V_\kappa ^2\). Thus, in graph-theoretic terms, \(\varGamma _\mathrm {red}\) is the minimal (i.e. with the least number of edges) graph having \(\varGamma \setminus E_\kappa ^2\) as a subdivision. We claim that each summation index \(a_v\) for \(v\in V_\mathrm {i}\cup V_\kappa ^2\) appears in exactly two \({\mathcal {G}}\)-factors and in no \(\kappa \)-matrices, and thus the summation can be written as a matrix product after (potentially) transposing one of the two \({\mathcal {G}}\)’s in the cases of two incoming or two outgoing edges, e.g. \(\sum _{a_v} (GB)_{\varvec{x}a_v} G_{\varvec{y}a_v}=(GBG^t)_{\varvec{x}\varvec{y}}\). Indeed, the index \(a_v\) appears in exactly two \(G\)-edges since \(d_g(v)=2\), cf. Definition 2. Moreover, due to (P1) no \(\kappa \)-edge is adjacent to \(V_\mathrm {i}\), while for \(v\in V_\kappa ^2\) the corresponding \(\kappa \)-edge \((uv)\) or \((vu)\) is, due to (78) and (88), given by \(\kappa ^{1,1}\) or \(\kappa ^{2,0}\), which are the constant-\(1\) and constant-\(\sigma \) matrices, respectively; thus effectively the index \(a_v\) does not appear in any \(\kappa ^{(vu)/(uv)}\) matrix.
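The resummation over a degree-\(2\) vertex can be illustrated on a toy example: for two edges sharing the summed index as their second index (two outgoing edges), collapsing the vertex amounts to transposing one factor and multiplying. A minimal numerical sketch, with generic matrices standing in for the resolvents and illustrative names of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
# stand-ins for the G-factors (G B) and G meeting at the summed vertex
G1 = rng.standard_normal((N, N))
B  = rng.standard_normal((N, N))
G2 = rng.standard_normal((N, N))
x, y = 0, 2

# both edges carry the summed index a_v as their *second* index
lhs = sum((G1 @ B)[x, a] * G2[y, a] for a in range(N))
# collapsing the vertex: transpose the second factor, then matrix-multiply
rhs = (G1 @ B @ G2.T)[x, y]
assert np.isclose(lhs, rhs)
```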

In the reduction process the value of \(\varGamma \) effectively reduces to a summation over vertices of degree at least \(3\), traces of \(G\)-cycles, and entries of \(G\)-chains and \(E_\kappa ^{\ge 3}\)-matrices, represented by \(\varGamma _\mathrm {red}\). Here we use the terminology that a \(G\)-cycle is a cycle of \(G\)-edges on \(V_\kappa ^2 \cup V_\mathrm {i}\) vertices, irrespective of the edge orientation, and that a \(G\)-chain is a chain of \(G\)-edges with internal \(V_\kappa ^2 \cup V_\mathrm {i}\)-vertices and external \(V_\kappa ^{\ge 3}\cup V_\mathrm {e}\)-vertices, again irrespective of the edge orientation. Note that the reduction completely collapses each \(E_g\)-cycle on \(V_\mathrm {i}\cup V_\kappa ^2\)-vertices into a single vertex with a loop edge; the sets of these single vertices and loop edges are denoted by \(V_\mathrm {cyc}\) and \(E_g^{\mathrm {red},\mathrm {cyc}}\), respectively. Therefore the vertex set of the reduced graph \(\varGamma _\mathrm {red}\) is naturally partitioned as \(V(\varGamma _\mathrm {red}):= V^{\ge 3}_\kappa {\dot{\cup }} V_\mathrm {e} {\dot{\cup }} V_\mathrm {cyc}\) and its edge set as \(E(\varGamma _\mathrm {red}):= E_g^\mathrm {red} {\dot{\cup }} E^{\ge 3}_\kappa \).

The graph reduction by partial resummations corresponds to generalising the definition of value to

$$\begin{aligned} {{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})&:= N^{-\left|E_\kappa ^2\right|+\left|V_\mathrm {cyc}\right|} \sum _{\begin{array}{c} a_v\in [N]\\ v\in V_\kappa ^{\ge 3} \end{array}} \biggl [\prod _{(uv)\in E_\kappa ^{\ge 3}} \biggl ( N^{-d_g(u)/2} \kappa ^{(uv)}_{a_u a_v} \biggr )\biggr ]\nonumber \\&\quad \times \biggl ({\prod _{v\in V_\mathrm {cyc}}}\langle {\mathcal {G}}^{(vv)}\rangle \biggr )\biggl (\prod _{\begin{array}{c} (uv)\in E_g^{\mathrm {red}}\setminus E_g^{\mathrm {red},\mathrm {cyc}} \end{array}} {\mathcal {G}}^{(uv)}_{\varvec{x}_u\varvec{x}_v}\biggr ). \end{aligned}$$
(96)

where we defined

$$\begin{aligned} {\mathcal {G}}^{(v_1 v_k)}:=({\mathcal {G}}^{(v_1v_2)})^{(t)}\cdots ({\mathcal {G}}^{(v_{k-1}v_k)})^{(t)} \end{aligned}$$
(97)

as a matrix product of the (possibly transposed, depending on the in- and out-degrees) factors \({\mathcal {G}}^{(v_1v_2)},\ldots ,{\mathcal {G}}^{(v_{k-1}v_k)}\), whenever \(d_g(v_2)=\cdots =d_g(v_{k-1})=2\). For each edge \(e\in E_g^\mathrm {red}\) we record the number of \(\mathfrak {I}G\)’s, the total number of \(G\)-edges, and the numbers of summed-up \(V_\mathrm {o}^{t}\)- and \(V_\mathrm {o}^{0\mathrm {tr}}\)-vertices in the corresponding chains and cycles by \(i(e),l(e),t(e),a(e)\), respectively, and set \(o(e):=a(e)+t(e)\). The letter \(o\) refers to the counting of vertices with the asymptotic orthogonality effect. Note that for cycles all \(V_\mathrm {o}^{t/0\mathrm {tr}}\)-vertices in the cycle contribute towards \(t(e),a(e)\), while for chains the first and last vertex necessarily are in \(V_\kappa ^{\ge 3}\cup V_\mathrm {e}\) and hence, by definition, cannot be \(V_\mathrm {o}^{t/0\mathrm {tr}}\)-vertices. Thus the parameters \(a(e),t(e),i(e),l(e)\) satisfy the relations

$$\begin{aligned} \begin{aligned} 1&\le l(e), \qquad 0 \le i(e)\le l(e), \\ 0&\le a(e)+ t(e)=o(e)\le {\left\{ \begin{array}{ll} l(e), &{} e\in E_g^{\mathrm {red},\mathrm {cyc}},\\ l(e)-1,&{} e\in E_g^{\mathrm {red}}\setminus E_g^{\mathrm {red},\mathrm {cyc}}. \end{array}\right. } \end{aligned} \end{aligned}$$
(98)

We denote the set of \(v\in V_\mathrm {cyc}\) with \(o((vv))=k\) by \(V_\mathrm {cyc}^{o=k}\), which has cardinality \(\left|V_\mathrm {cyc}^{o=k}\right|=n_\mathrm {cyc}^{o=k}\), cf. (P6).

Lemma 2

For each av-/iso-graph \(\varGamma \in {\mathcal {G}}\) with parameters \(a, t, l, i,p\) and the selected vertex sets \(V_\mathrm {o}^{0\mathrm {tr}},V_\mathrm {o}^{t}\), let \(\varGamma _\mathrm {red}=(V_\kappa ^{\ge 3}\cup V_\mathrm {e}\cup V_\mathrm {cyc},E_g^\mathrm {red}\cup E_\kappa ^{\ge 3})\) denote its reduction. The reduced graph then satisfies

$$\begin{aligned} \left|E_g^\mathrm {red}\right| = \left|E_g\right| - \left|V_\mathrm {i}\right| -\left|V_\kappa ^2\right| + \left|V_\mathrm {cyc}\right|=\left|E_g\right| - \left|V_\mathrm {i}\right| -2\left|E_\kappa ^2\right| + \left|V_\mathrm {cyc}\right| \end{aligned}$$
(99)

and

$$\begin{aligned} {{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})={{\,\mathrm{Val}\,}}(\varGamma ). \end{aligned}$$

Moreover, we have

$$\begin{aligned} \sum _{e\in E_g^\mathrm {red}} t(e) = \left|V_\mathrm {o}^t\right|, \,\,\sum _{e\in E_g^\mathrm {red}} a(e) = \left|V_\mathrm {o}^{0\mathrm {tr}}\right|, \,\, \sum _{e\in E_g^\mathrm {red}} l(e)=\left|E_g\right|, \,\, \sum _{e\in E_g^\mathrm {red}} i(e)=2ip. \nonumber \\ \end{aligned}$$
(100)

Second, we estimate the value of each graph by bounding each of the reduced \(G\)-edges entrywise and estimating the summations trivially.

Lemma 3

For each av-/iso-graph \(\varGamma \in {\mathcal {G}}\) with the selected vertex sets \(V_\mathrm {o}^{0\mathrm {tr}},V_\mathrm {o}^{t}\) we have \(\left|{{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})\right|\prec {{\,\mathrm{I_2-Est}\,}}(\varGamma )\) with

$$\begin{aligned} \begin{aligned} {{\,\mathrm{I_2-Est}\,}}(\varGamma )&:= \varLambda _+^{\left|V_\mathrm {o}^{0\mathrm {tr}}\right|}\varPi _+^{\left|V_\mathrm {o}^t\right|} \rho ^{2ip\vee (\left|V_\mathrm {i}\right|+2\left|E_\kappa ^2\right|-\left|V_\mathrm {o}\right|)} N^{\left|V_\mathrm {i}\right|+\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/2-\left|V_\mathrm {o}\right|/2-\delta ^{\ge 4}} \\&\quad \times K^{\left|V_\mathrm {o}\right|-\left|V_\mathrm {i}\right|-2\left|E_\kappa ^2\right|+\left|V_\mathrm {cyc}^{o=0}\right|+\left|V_\mathrm {cyc}^{o=1}\right|/2}, \end{aligned} \end{aligned}$$
(101)

where

$$\begin{aligned} \delta ^{\ge 4}:=\sum _{e\in E_\kappa }\Bigl (\frac{d_g(e)}{2}-2\Bigr )_+. \end{aligned}$$

Finally, in the third step we improve upon the entrywise estimate by estimating the summations corresponding to some \(V_\kappa ^{\ge 3}\)-vertices more effectively, using a Schwarz inequality followed by the Ward identity \(GG^*= \mathfrak {I}G/\eta \).
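The Ward identity itself is elementary: for Hermitian \(W\) and \(z=E+i\eta \) one has \(\mathfrak {I}G=(G-G^*)/2i=G\bigl ((W-\bar{z})-(W-z)\bigr )G^*/2i=\eta \,GG^*\). A quick numerical sanity check (illustrative only, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
# real symmetric test matrix and spectral parameter z = E + i*eta
W = rng.standard_normal((N, N)); W = (W + W.T) / 2
z = 0.1 + 0.02j                                     # eta = Im z = 0.02
G = np.linalg.inv(W - z * np.eye(N))
ImG = (G - G.conj().T) / 2j                         # ℑG = (G - G^*)/(2i)

# Ward identity: G G^* = ℑG / eta
assert np.allclose(G @ G.conj().T, ImG / z.imag)
```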

Lemma 4

For each av-graph \(\varGamma \in {\mathcal {G}}\) with the selected vertex sets \(V_\mathrm {o}^{0\mathrm {tr}},V_\mathrm {o}^{t}\) we have \(\left|{{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})\right|\prec {{\,\mathrm{I_3-Est}\,}}(\varGamma )\) with

$$\begin{aligned} {{\,\mathrm{I_3-Est}\,}}(\varGamma )&:= \varLambda _+^{\left|V_\mathrm {o}^{0\mathrm {tr}}\right|} \varPi _+^{\left|V_\mathrm {o}^t\right|}\rho ^{2ip\vee (\left|V_\mathrm {i}\right|+2\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|-\left|V_\mathrm {o}\right|)} N^{\left|V_\mathrm {i}\right|+\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/2-\left|V_\mathrm {o}\right|/2-\delta ^{\ge 4}} \nonumber \\&\quad \times K^{\left|V_\mathrm {o}\right|-\left|V_\mathrm {i}\right|-2\left|E_\kappa ^2\right|-\left|E_\kappa ^3\right|/2+\left|V_\mathrm {cyc}^{o=0}\right|+\left|V_\mathrm {cyc}^{o=1}\right|/2} \end{aligned}$$
(102a)

and for each iso-graph \(\varGamma \in {\mathcal {G}}\) we have \(\left|{{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})\right|\prec {{\,\mathrm{I_3-Est}\,}}(\varGamma )\) with

$$\begin{aligned} \begin{aligned} {{\,\mathrm{I_3-Est}\,}}(\varGamma )&:= \varLambda _+^{\left|V_\mathrm {o}^{0\mathrm {tr}}\right|}\varPi _+^{\left|V_\mathrm {o}^t\right|} \rho ^{2ip\vee \left( \left|V_\mathrm {i}\right|+2\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|-\left|V_\mathrm {o}\right|\right) } N^{\left|V_\mathrm {i}\right|+\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/2-\left|V_\mathrm {o}\right|/2-\delta ^{\ge 4}} \\&\quad \times K^{\left|V_\mathrm {o}\right|-\left|V_\mathrm {i}\right|-2\left|E_\kappa ^2\right|-\left|E_\kappa ^3\right|/2+\left|V_\mathrm {cyc}^{o=0}\right|+\left|V_\mathrm {cyc}^{o=1}\right|/2-\bigl (p-\left|E_\kappa ^2\right|+\left|V_\mathrm {cyc}\right|-\delta ^{\ge 4}\bigr )_+} \end{aligned} \end{aligned}$$
(102b)

Before proving Lemmata 2–4 we conclude the proof of Proposition 4.

Proof

(Proposition 4) The proof of Proposition 4 distinguishes several cases. For the averaged bound we consider the two cases \(a=t=0\) and \(a+t=o>0, \left|V_\mathrm {i}\cap V_\mathrm {o}\right|=2(o-1)p\) separately, while for the isotropic bound we consider the cases \(o\ge 0, \left|V_\mathrm {i}\cap V_\mathrm {o}\right|=2op\) and \(o>0, \left|V_\mathrm {i}\cap V_\mathrm {o}\right|=2(o-1)p\) separately.

We first consider the \(o=0\) case of the averaged bound, where we obtain from Lemma 2 and (102a), using \(V_\mathrm {o}=\emptyset \) from (P8), \(\left|V_\mathrm {cyc}\right|\le \left|E_\kappa ^2\right|\) from (P6) and \(\left|V_\mathrm {i}\right|=2p(l-1)\) from (\(P^\mathrm{av}10\)), that

$$\begin{aligned} \begin{aligned} \left|{{\,\mathrm{Val}\,}}(\varGamma )\right|&\prec \rho ^{ \left|V_\mathrm {i}\right|+2\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|} N^{\left|V_\mathrm {i}\right|+\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/2} K^{-\left|V_\mathrm {i}\right| -\left|E_\kappa ^2\right| -\left|E_\kappa ^3\right|/2 }\\&=\rho ^{2(l-1)p+2\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|} N^{2lp} K^{-2p(l-1)} N^{-2p+\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/2} K^{-\left|E_\kappa ^2\right| -\left|E_\kappa ^3\right|/2 } \\&\lesssim \rho ^{2p(l+1)} N^{2lp} K^{-2lp}, \end{aligned} \end{aligned}$$

where in the last step we used \(K\lesssim N\rho ^2\), due to \(\eta =\min _k\eta _k\lesssim \max _k\rho _k=\rho \), and \(\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/2\le \left|E_\kappa \right|\le 2p\) from (P2) and (P4).

Next, we consider the \(\left|V_\mathrm {o}\cap V_\mathrm {i}\right|=2op\) case of the isotropic bound, where we obtain from Lemma 2, (102b), and \(\left|V_\mathrm {i}\right|=2p(o+b-1)\) from (\(P^\mathrm{iso}10\)) that

$$\begin{aligned} \begin{aligned} \left|{{\,\mathrm{Val}\,}}(\varGamma )\right|&\prec \varLambda _+^{2ap}\varPi _+^{2tp}\rho ^{2ip\vee (2p(b-1) + 2 \left|E_\kappa ^2\right| + \left|E_\kappa ^3\right|) } N^{p(o+2b)} N^{\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/2-2p-\delta ^{\ge 4}} \\&\quad \times K^{p(1-2b)-\left|E_\kappa ^2\right|-\left|E_\kappa ^3\right|/2 + \delta ^{\ge 4}} \\&\lesssim \varLambda _+^{2ap}\varPi _+^{2tp} \rho ^{2ip\vee 2p(b+1)} N^{p(o+2b)} K^{-p(1+2b)} \end{aligned} \end{aligned}$$

again using \(K\lesssim N\rho ^2\) and \(\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/2\le \left|E_\kappa \right|\le 2p\).

Next, we consider the \(\left|V_\mathrm {o}\cap V_\mathrm {i}\right|=2(o-1)p\), \(o>0\) case of both the averaged and the isotropic bound, where we similarly obtain (estimating \((\ldots )_+\ge 0\) for the iso-graphs)

$$\begin{aligned}&\left|{{\,\mathrm{Val}\,}}(\varGamma )\right| \nonumber \\&\quad \prec \varLambda _+^{\left|V_\mathrm {o}^{0\mathrm {tr}}\right|}\varPi _+^{\left|V_\mathrm {o}^t\right|} \rho ^{i'} N^{\left|V_\mathrm {i}\right|+\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/2-\left|V_\mathrm {o}\right|/2} K^{\left|V_\mathrm {o}\right|-\left|V_\mathrm {i}\right| -2\left|E_\kappa ^2\right| -\left|E_\kappa ^3\right|/2 +\left|V_\mathrm {cyc}^{o=0}\right|+\left|V_\mathrm {cyc}^{o=1}\right|/2}\nonumber \\&\quad = \varLambda _+^{\left|V_\mathrm {o}^{0\mathrm {tr}}\right|} \varPi _+^{\left|V_\mathrm {o}^t\right|} \rho ^{i'} N^{p(o+2b-1)+\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/2-\left|V_\mathrm {o}\cap V_\kappa \right|/2} \nonumber \\&\qquad \times K^{\left|V_\mathrm {o}\cap V_\kappa \right|-2pb -2\left|E_\kappa ^2\right| -\left|E_\kappa ^3\right|/2 +\left|V_\mathrm {cyc}^{o=0}\right|+\left|V_\mathrm {cyc}^{o=1}\right|/2}\nonumber \\&\quad \le \varLambda _+^{2pa}\varPi _+^{2pt}\rho ^{i'} N^{p(o+2b)} K^{-(2b+1)p } \Bigl (\frac{K}{N}\Bigr )^{p+\left|V_\mathrm {o}\cap V_\kappa \right|/2 -\left|E_\kappa ^2\right| -\left|E_\kappa ^3\right|/2} \nonumber \\&\quad \lesssim \varLambda _+^{2pa}\varPi _+^{2pt} \rho ^{2ip\vee 2(b+1)p} N^{p(o+2b)} K^{-(2b+1)p }, \end{aligned}$$
(103)

with

$$\begin{aligned} \begin{aligned} i'&= 2ip\vee (\left|V_\mathrm {i}\right|+2\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|-\left|V_\mathrm {o}\right|)= 2ip \vee \Bigl ( 2bp + 2\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right| - \left|V_\mathrm {o}\cap V_\kappa \right| \Bigr ). \end{aligned} \end{aligned}$$

Here we used (P3) and (P\(^\mathrm{av}\)10)/(P\(^\mathrm{iso}\)10) and \(V_\mathrm {o}\subset V_\mathrm {i}\cup V_\kappa ^2\) (since by definition \(V_\mathrm {o}\) consists of degree-\(2\) vertices, while \(V_\mathrm {e}=\emptyset \) due to (P\(^\mathrm{av}\)9) in the averaged case, \(d_g(v)=1\) for \(v\in V_\mathrm {e}\) due to (P\(^\mathrm{iso}\)9) in the isotropic case, and \(V_\kappa ^{\ge 3}\)-vertices have degree at least \(3\) by (P4)) in the equality. Furthermore, we used (P6) in the first inequality, (P8) in the second inequality, and \(K/N\lesssim \rho ^2\) and (P8) in the final step. \(\square \)

5.2.1 Graph reduction: Proof of Lemma 2

Since for \(d_g((uv))=2\) we have \(\kappa ^{(uv)}_{ab}=1\) or \(\kappa ^{(uv)}_{ab}=\sigma \) for all \(a,b\) due to (78) (using Assumption (A-i)) it is possible to write (with potential transpositions) the summation over \(a_v\) for \(v\in V_\mathrm {i}\cup V_\kappa ^2\) as matrix products which are then associated with edges of the reduced graph \(\varGamma _\mathrm {red}\). In this way \(G\)-chains \((v_1v_2),\dots ,(v_{k-1}v_k)\in E_g\) with \(v_2,\ldots ,v_{k-1}\in V_\kappa ^2\cup V_\mathrm {i}\) and \(v_1,v_k\not \in V_\kappa ^2\cup V_\mathrm {i}\) are reduced to the edge \((v_1v_k)\in E_g^\mathrm {red}\), and \(G\)-cycles \((v_1v_2),\dots ,(v_k v_1)\in E_g\) with \(v_1,\ldots ,v_k\in V_\kappa ^2\cup V_\mathrm {i}\) are reduced to isolated loops which we represent by the vertex \(v_1\in V_\mathrm {cyc}\) and the loop-edge \((v_1v_1)\in E_g^\mathrm {red,cyc}\subset E_g^\mathrm {red}\). For each cycle of length \(k\) we arbitrarily pick one of the \(k\) possible reductions since they are all equivalent.

The first relation in (99) follows trivially since for each of the carried out summations corresponding to \(V_\kappa ^2\cup V_\mathrm {i}\) the number of \(G\)-edges is reduced by one with the exception that for cycles the last index is kept in \(V_\mathrm {cyc}\). The second relation in (99) is a direct consequence of (P1). Next, the claim (100) follows from (P3) and by noting that the definition of \(a(e),t(e)\) is consistent with the counting of \(t/0\mathrm {tr}\)-vertices in \(\varGamma \). This concludes the proof of Lemma 2.

5.2.2 Entrywise bound: Proof of Lemma 3

For edges in the reduced graph we use the bound from the following lemma. Note that \(o(e)\le l(e)\) for cycles \(e\) and \(o(e)\le l(e)-1\) for chains \(e\) and therefore the exponents of \(K\) below are guaranteed to be non-positive.

Lemma 5

For \(e\in E_g^{\mathrm {red},\mathrm {cyc}}\) we have the averaged bound

$$\begin{aligned} \begin{aligned} \left|\langle {\mathcal {G}}^{e}\rangle \right|&\prec \varLambda _+^{a(e)}\varPi _+^{t(e)} \rho ^{i(e)\vee (l(e)-o(e)+\varvec{1}[0<o(e)<l(e)])} N^{l(e)-\frac{o(e)}{2}-1} \\&\quad \times K^{o(e)-l(e)+\varvec{1}(o(e)=0) + \frac{\varvec{1}(o(e)=1)}{2}} \end{aligned} \end{aligned}$$
(104a)

and for \(e\in E_g^{\mathrm {red}}\setminus E_g^{\mathrm {red},\mathrm {cyc}}\) the isotropic bound

$$\begin{aligned} \left|\langle \varvec{v},{\mathcal {G}}^{e} \varvec{w}\rangle \right|&\prec \left\Vert \varvec{v}\right\Vert \left\Vert \varvec{w}\right\Vert \varLambda _+^{a(e)}\varPi _+^{t(e)} \rho ^{i(e)\vee (l(e)-o(e)-\varvec{1}[o(e)=l(e)-1])} N^{l(e)-\frac{o(e)}{2}-1} \nonumber \\&\quad \times K^{o(e)-l(e)+1} \end{aligned}$$
(104b)

for any two deterministic vectors \({\varvec{v}}, {\varvec{w}}\). Moreover, the same bounds hold true if within the chain \({\mathcal {G}}\) absolute values of resolvents \(\left|G(z)\right|\) appear in addition to \((\mathfrak {I}G)^{(t)},(G^*)^{(t)},(G)^{(t)}\).

Remark 8

The estimates (104a)–(104b) are designed to take advantage of the asymptotic orthogonality vertices. Indeed, using that, as we will show a posteriori, \(\varLambda _++\varPi _+\prec 1\) in the bulk, where \(\rho \sim 1\), both inequalities essentially depend on the number of orthogonality vertices as \((K/\sqrt{N})^o \sim (\sqrt{N}\eta )^o\) (ignoring some \(K\) factors in (104a) for \(o=0, 1\)). Therefore as long as \(\eta \ll N^{-1/2}\) the orthogonality helps, and our bounds do exploit this effect. However, for \(\eta \gg N^{-1/2}\) it is better to use (104a)–(104b) by simply ignoring the asymptotic orthogonality, i.e. choosing \({\mathfrak {a}}={\mathfrak {t}}=\emptyset \).

Using Lemma 5, the proof of which we defer to the end of the subsection, we now conclude the proof of Lemma 3. From Lemma 2 we obtain \({{\,\mathrm{Val}\,}}(\varGamma )={{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})\) with \({{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})\) as in (96). By estimating \(\left|\kappa ^{(uv)}_{ab}\right|\lesssim 1\) and bounding \({\mathcal {G}}\) via Lemma 5, we obtain from (100),

$$\begin{aligned} \prod _{(uv)\in E_\kappa ^{\ge 3}}\biggl ( \sum _{a_u,a_v} N^{-d_g((uv))/2} \biggr )=\prod _{k\ge 3}\Bigl (N^{2-k/2}\Bigr )^{\left|E_\kappa ^k\right|}= N^{\left|E_\kappa ^3\right|/2-\delta ^{\ge 4}}, \end{aligned}$$

and \(\left|V_\mathrm {cyc}\right|=\left|E_g^{\mathrm {red},\mathrm {cyc}}\right|\) that

$$\begin{aligned} \left|{{\,\mathrm{Val}\,}}(\varGamma )\right|&\prec \varLambda _+^{\left|V_\mathrm {o}^{0\mathrm {tr}}\right|}\varPi _+^{\left|V_\mathrm {o}^t\right|} \rho ^{ i'' } N^{-\left|E_\kappa ^2\right|+\left|V_\mathrm {cyc}\right|+\left|E_\kappa ^3\right|/2+\left|E_g\right|-\left|V_\mathrm {o}\right|/{2}-\left|E_g^\mathrm {red}\right|-\delta ^{\ge 4}} \nonumber \\&\qquad \times K^{\left|V_\mathrm {o}\right|-\left|E_g\right| + \left|E_g^\mathrm {red}\right|-\left|V_\mathrm {cyc}\right|+\left|V_\mathrm {cyc}^{o=0}\right|+\left|V_\mathrm {cyc}^{o=1}\right|/2} \nonumber \\&\lesssim \varLambda _+^{\left|V_\mathrm {o}^{0\mathrm {tr}}\right|}\varPi _+^{\left|V_\mathrm {o}^t\right|} \rho ^{ 2ip\vee (\left|V_\mathrm {i}\right|+2\left|E_\kappa ^2\right|-\left|V_\mathrm {o}\right|)} N^{\left|E_\kappa ^2\right|+\left|E_\kappa ^3\right|/{2}+\left|V_i\right|-\left|V_\mathrm {o}\right|/{2}-\delta ^{\ge 4}} \nonumber \\&\qquad \times K^{\left|V_\mathrm {o}\right|-\left|V_i\right|-2\left|E_\kappa ^2\right|+\left|V_\mathrm {cyc}^{o=0}\right|+\left|V_\mathrm {cyc}^{o=1}\right|/{2}}, \end{aligned}$$
(105)

where we used (99) in the second step. Here we counted the factors of \(\rho \) as

$$\begin{aligned} i'' :={}&\sum _{e\in E_g^\mathrm {red}} i(e) \vee {\left\{ \begin{array}{ll} l(e)-o(e)+\varvec{1}(0<o(e)<l(e)),&{} e\in E_g^{\mathrm {red},\mathrm {cyc}},\\ l(e)-o(e)-\varvec{1}(o(e)=l(e)-1), &{} e\in E_g^{\mathrm {red}}\setminus E_g^{\mathrm {red},\mathrm {cyc}}, \end{array}\right. }\nonumber \\ \ge {}&\sum _{e\in E_g^\mathrm {red}} i(e) \vee [l(e)-o(e)-\varvec{1}(e\not \in E_g^{\mathrm {red},\mathrm {cyc}})] \nonumber \\ \ge {}&2ip \vee \Bigl ( \left|E_g\right|-\left|V_\mathrm {o}\right| - \left|E_g^\mathrm {red}\right| + \left|V_\mathrm {cyc}\right| \Bigr ) = 2ip \vee \Bigl (\left|V_\mathrm {i}\right|+2\left|E_\kappa ^2\right|-\left|V_\mathrm {o}\right|\Bigr ) \end{aligned}$$
(106)

due to (99), completing the proof of Lemma 3.

Proof

(Lemma 5) We actually prove a slightly more general bound which allows for chains \({\mathcal {G}}^{e}=G_1B_1\cdots G_l B_l\) with

$$\begin{aligned} G_k\in \left\{ G(z_k),G(z_k)^*,\mathfrak {I}G(z_k),\left|G(z_k)\right|,(G(z_k))^t,(G(z_k)^*)^t,(\mathfrak {I}G(z_k))^t,\left|G(z_k)\right|^t \right\} , \end{aligned}$$

i.e. including factors of the form \(\left|G\right|=\sqrt{G^*G}=\sqrt{GG^*}\). Within the proof we will repeatedly use (98) which implies \(o(e)\le l(e)\) for \(e\in E_g^{\mathrm {red},\mathrm {cyc}}\) and \(o(e)\le l(e)-1\) for \(e\in E_g^{\mathrm {red}}\setminus E_g^{\mathrm {red},\mathrm {cyc}}\). We prove (104a) by distinguishing several cases depending on the parameters \(o(e)\) and \(l(e)\) and a new parameter \(c(e)\) counting the number of alternating chains associated with \(e\) defined as follows. For any \(e\in E_g^\mathrm {red}\) we consider the original chain or cycle in \(\varGamma \) that was reduced to \(e\). The alternating chains associated with \(e\) are the maximal subchains of these original chain/cycle with internal vertices from \(V_\mathrm {o}\) and at least one \(V_\mathrm {o}\)-vertex. For example, if \(e\in E_g^\mathrm {red}\) was the reduction of the cycle \(\langle (GA)(\mathfrak {I}G A)(B G^*)(G)(AG^*)(\mathfrak {I}G)^t\rangle \) then the alternating chains associated with \(e\) are \((GA)(\mathfrak {I}G A)\) and \((G)(AG^*)(\mathfrak {I}G)^t\). By maximality, \(o(e)\), the number of \(V_\mathrm {o}\)-vertices in the original chain/cycle that has been reduced to \(e\) is equal to the total number of \(V_\mathrm {o}\) vertices in the alternating chains associated with \(e\). In particular, \(c(e) \le o(e)\).

5.2.3 Averaged bound for \(o(e)=0\)

In the case without alternating chains, i.e. for \(o(e)=0\) we simply split off any \(G\)-factor by Cauchy-Schwarz and obtain

$$\begin{aligned} \begin{aligned} \left|\langle G_1 B_1 G_2 B_2 \cdots G_l B_l\rangle \right|&\le \sqrt{\langle G_1 \left|B_1\right|^2 G_1^*\rangle \langle G_2B_2 \cdots G_l \left|B_l\right|^2 G_l^*\cdots B_2^*G_2^*\rangle } \\&\prec \frac{\rho }{\eta ^{l-1}} \le \rho ^l N^{l-1} K^{1-l}. \end{aligned} \end{aligned}$$

Here, and frequently in the remaining proof, we use the norm bounds \(\left\Vert G\right\Vert \lesssim 1/\eta \), \(\left\Vert B_k\right\Vert \lesssim 1\), and the Ward identity \(G(z)G(z)^*=\mathfrak {I}G(z)/\mathfrak {I}z\).
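As a purely numerical illustration (not part of the proof), the Ward identity can be checked on a sampled Wigner matrix; the matrix size and spectral parameter below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
# complex Hermitian Wigner matrix with entry variance 1/N
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W = (X + X.conj().T) / np.sqrt(4 * N)

z = 0.3 + 0.05j                       # spectral parameter with Im z = eta > 0
G = np.linalg.inv(W - z * np.eye(N))  # resolvent G(z)
ImG = (G - G.conj().T) / 2j           # Im G = (G - G*)/(2i)

# Ward identity: G(z) G(z)* = Im G(z) / Im z
assert np.allclose(G @ G.conj().T, ImG / z.imag)
```

The identity is exact (not merely asymptotic), since \(G-G^*=2\mathrm {i}\eta \, GG^*\) by the resolvent identity.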

5.2.4 Averaged bound for \(a(e)=l(e)\)

For \({\mathcal {G}}^e = G_1 B_1 G_2 B_2 \cdots G_l B_l \) we use spectral decomposition to write

$$\begin{aligned} \begin{aligned} \langle {\mathcal {G}}^e\rangle = N^{-1} \sum _{a_1\ldots a_l} \langle \varvec{u}^{(1)}_{a_1}, B_1 \varvec{u}^{(2)}_{a_2}\rangle \cdots \langle \varvec{u}^{(l)}_{a_l}, B_l \varvec{u}_{a_1}^{(1)}\rangle p_{a_1}^{(1)}\cdots p_{a_l}^{(l)}, \end{aligned} \end{aligned}$$

where \(p^{(k)}_a=(\lambda _a-z_k)^{-1},(\lambda _a-\overline{z_k})^{-1}, \mathfrak {I}(\lambda _a-z_k)^{-1},\left|\lambda _a-z_k\right|^{-1}\) depending on whether \(G_k=G,G^*,\mathfrak {I}G,\left|G\right|\), and \(\varvec{u}^{(k)}_a\in \left\{ \varvec{u}_a,\overline{\varvec{u}_a} \right\} \), depending on whether \(G_k\) is transposed or not. By additional averaging using the analogue of (40), Cauchy-Schwarz and the high-probability bounds

$$\begin{aligned} \sum _{a} \frac{1}{\left|\lambda _a-z\right|} \lesssim N \log N,\qquad \sum _{a} \left|\mathfrak {I}\frac{1}{\lambda _a-z} \right|\lesssim \rho (z) N , \end{aligned}$$
(107)

from rigidity (34) it follows that

$$\begin{aligned} \left|\langle {\mathcal {G}}^e\rangle \right|&\lesssim \frac{1}{N} \sum _{\begin{array}{c} a_k\in [N]\\ k\in [l] \end{array}} \left|p_{a_1}^{(1)}\right|\cdots \left|p_{a_l}^{(l)}\right| \frac{1}{L^l}\sum _{\begin{array}{c} \left|b_k-a_k\right|\le L\\ k\in [l] \end{array}} \left|\langle \varvec{u}_{b_1}^{(1)}, B_1 \varvec{u}_{b_2}^{(2)}\rangle \right|\cdots \left|\langle \varvec{u}_{b_l}^{(l)}, B_l \varvec{u}_{b_1}^{(1)}\rangle \right| \nonumber \\&\lesssim \frac{1}{N} \sum _{\begin{array}{c} a_k\in [N]\\ k\in [l] \end{array}} \left|p_{a_1}^{(1)}\right|\cdots \left|p_{a_l}^{(l)}\right| \sqrt{\frac{1}{L^2}\sum _{\left|a_1-b_1\right|\le L}\sum _{\left|a_2-b_2\right|<L} \left|\langle \varvec{u}_{b_1}^{(1)},B_1\varvec{u}_{b_2}^{(2)}\rangle \right|^2 }\cdots \nonumber \\&\quad \times \sqrt{\frac{1}{L^2}\sum _{\left|a_l-b_l\right|\le L}\sum _{\left|a_1-b_1\right|<L} \left|\langle \varvec{u}_{b_l}^{(l)},B_l\varvec{u}_{b_1}^{(1)}\rangle \right|^2 }\nonumber \\&\prec \varLambda _+^{a}\varPi _+^{t} \rho ^{i} N^{l/2-1}, \end{aligned}$$
(108)

where \(\log N\) factors have been incorporated into the \(\prec \) notation in the ultimate inequality.
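The two spectral sums in (107) can likewise be illustrated numerically against their predicted orders \(N\log N\) and \(\rho (z) N\); this is a sketch with an arbitrary bulk energy, \(\eta =1/N\), and a generous constant \(10\) in place of the sharp one:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 300
X = rng.standard_normal((N, N))
W = (X + X.T) / np.sqrt(2 * N)        # real symmetric Wigner matrix
lam = np.linalg.eigvalsh(W)           # eigenvalues, semicircle law on [-2, 2]

E, eta = 0.1, 1.0 / N                 # bulk energy, eta = 1/N
z = E + 1j * eta
rho = np.sqrt(4 - E**2) / (2 * np.pi)  # semicircle density at E

s1 = np.sum(1.0 / np.abs(lam - z))             # predicted order: N log N
s2 = np.sum(np.abs(np.imag(1.0 / (lam - z))))  # predicted order: rho(E) N

assert s1 <= 10 * N * np.log(N)
assert s2 <= 10 * rho * N
```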

5.2.5 Averaged bound for \(o(e)=1\)

By cyclicity we may assume that \({\mathcal {G}}^e=G_1 B_1 G_2 B_2\cdots G_l B_l\) is such that the index between \(G_1B_1\) and \(G_2B_2\) is the asymptotic orthogonality index and estimate

$$\begin{aligned} \begin{aligned} \left|\langle G_1 B_1 G_2 B_2 \cdots G_l B_l\rangle \right|&\le \sqrt{\langle G_1 B_1 G_2 G_2^*B_1^*G_1^*\rangle \langle B_2 G_3 \cdots G_l \left|B_l\right|^2 G_l^*\cdots G_3^*B_2^*\rangle } \\&\prec \eta ^{-1} \varLambda _+ \rho N^{l-5/2}\rho ^{l-2} K^{5/2-l}\le \varLambda _+ \rho ^l N^{l-3/2} K^{3/2-l}, \end{aligned} \end{aligned}$$

from the \(o(e)=0\) and \(o(e)=l(e)\) cases, and using the Ward identity.

5.2.6 Averaged bound for \(2\le o(e)<l(e)\) and \(c(e)=1\)

For this case we may assume by cyclicity that

$$\begin{aligned} {\mathcal {G}}^e=G_1 B_1 \cdots G_o B_o G_{o+1}B_{o+1}\cdots G_l B_l \end{aligned}$$

such that the summations between \(G_1\) and \(G_o\) correspond to orthogonality indices. Here we make use of the inequality

$$\begin{aligned} \left|\langle XYZ\rangle \right| \le \Bigl [\langle X^*X (YY^*)^{1/2}\rangle \langle ZZ^*(Y^*Y)^{1/2}\rangle \Bigr ]^{1/2} \end{aligned}$$
(109)

for arbitrary matrices \(X,Y,Z\) which follows from singular value decomposition of \(Y=USV^*\) and Cauchy-Schwarz in the form

$$\begin{aligned} \begin{aligned} \left|\langle XYZ\rangle \right|^2&= \left|\langle X U\sqrt{S}\sqrt{S}V^*Z\rangle \right|^2 \\&\le \langle XUSU^*X^*\rangle \langle Z^*V S V^*Z \rangle =\langle X^*X (YY^*)^{1/2} \rangle \langle Z Z^*(Y^*Y)^{1/2} \rangle . \end{aligned} \end{aligned}$$
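Inequality (109) is elementary and can be sanity-checked numerically; in the sketch below `sqrtm_psd` is a helper (not from the paper) computing the PSD square roots \((YY^*)^{1/2},(Y^*Y)^{1/2}\) via diagonalization:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
tr = lambda M: np.trace(M) / n        # normalized trace <.>

def sqrtm_psd(M):
    # square root of a Hermitian positive semi-definite matrix via eigh
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

X, Y, Z = [rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
           for _ in range(3)]

# |<XYZ>| <= sqrt( <X*X (YY*)^{1/2}> <ZZ* (Y*Y)^{1/2}> )
lhs = abs(tr(X @ Y @ Z))
rhs = np.sqrt(tr(X.conj().T @ X @ sqrtm_psd(Y @ Y.conj().T)).real
              * tr(Z @ Z.conj().T @ sqrtm_psd(Y.conj().T @ Y)).real)
assert lhs <= rhs + 1e-9
```

Both traces on the right-hand side are non-negative since each is the normalized trace of a product of two positive semi-definite matrices.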

By (109) with \(X=G_1 B_1,Y=G_{2},Z=B_2 G_3\cdots B_o G_{o+1} B_{o+1}\cdots G_l B_l\) we obtain

$$\begin{aligned} \begin{aligned}&\left|\langle G_1 B_1 \cdots G_o B_o G_{o+1} \cdots G_l B_l\rangle \right| \\&\quad \le \sqrt{\langle B_1^*G_1^*G_1 B_1 \left|G_2\right|\rangle \langle \left|G_2\right|^{1/2} B_2 G_3 \cdots G_l \left|B_l\right|^2 G_l^*\cdots G_3^*B_2^*\left|G_2\right|^{1/2} \rangle }\\&\quad \lesssim \frac{1}{\eta ^{l-o}} \Bigl [\langle B_1^*\mathfrak {I}G_1 B_1 \left|G_2\right|\rangle \langle \left|G_2\right| B_2 G_3 \cdots B_o \mathfrak {I}G_{o+1} B_o^*\cdots G_3^*B_2^*\rangle \Bigr ]^{1/2} \\&\quad \prec \varLambda _+^a\varPi _+^t \rho ^{l-o+1+i_{2\ldots o}} N^{l-o/2-1} K^{o-l} , \end{aligned} \end{aligned}$$

where \(i_{2\ldots o}\) is the number of \(\mathfrak {I}G\)’s among \(G_2,\ldots ,G_o\), and we used the previously considered \(o(e)=l(e)\) case in the last step.

5.2.7 Averaged bound for \(2\le o(e)<l(e)\) and \(c(e)\ge 2\)

For at least two alternating chains, \(c(e)\ge 2\), we may write by cyclicity \(\langle {\mathcal {G}}^e\rangle =\langle {\mathcal {G}}^{e_1}\cdots {\mathcal {G}}^{e_{c(e)}}\rangle \) for

$$\begin{aligned} {\mathcal {G}}^{e_j}=G_{j,1}B_{j,1}G_{j,2}\cdots B_{j,o_j} G_{j,o_j+1}B_{j,o_j+1} \cdots G_{j,l_j} B_{j,l_j},\end{aligned}$$

for some \(1\le o_j\le l_j -1\) such that for each \({\mathcal {G}}^{e_j}\) the first \(o_j\) internal summation indices are orthogonality indices. By Cauchy-Schwarz it follows that

$$\begin{aligned} \left|\langle {\mathcal {G}}^e\rangle \right|&\le \frac{1}{N}\sqrt{\prod _{j\in [c(e)]} {{\,\mathrm{Tr}\,}}{\mathcal {G}}^{e_j}({\mathcal {G}}^{e_j})^*}\nonumber \\&\lesssim \frac{1}{N}\prod _{j\in [c(e)]} \frac{1}{\eta ^{l_j-o_j}} \sqrt{{{\,\mathrm{Tr}\,}}\mathfrak {I}G_{j,1} B_{j,1} \cdots G_{j,o_j} B_{j,o_j} \mathfrak {I}G_{j,o_j+1} B_{j,o_j}^*G_{j,o_j}^*\cdots B_{j,1}^*}\nonumber \\&\prec \frac{1}{N}\prod _{j\in [c(e)]} \frac{N^{o_j/2}\rho ^{i_j+1}}{\eta ^{l_j-o_j}} \varLambda _+^{a_j}\varPi _+^{t_j} \nonumber \\&\le \varLambda _+^{\sum _j a_j}\varPi _+^{\sum _j t_j} \rho ^{\sum _j(l_j-o_j+i_j+1)} N^{\sum _j (l_j - o_j/2) -1 } K^{\sum _j (o_j-l_j)} , \end{aligned}$$
(110)

where \(i_j\) denotes the number of \(\mathfrak {I}G\)’s among \(G_{j,2},\ldots , G_{j,o_j}\), and we used the previously discussed \(o(e)=l(e)\) case in the third inequality. This concludes the proof of (104a).

5.2.8 Isotropic bound for \(o(e)=l(e)-1\)

The claimed bound is trivial if \(l(e)=1\) (and hence \(o(e)=0\)). Otherwise for \(l(e)\ge 2\) we estimate

$$\begin{aligned}&\left|\langle \varvec{v},{\mathcal {G}}^e \varvec{w}\rangle \right| \\&\quad = \left|\langle \varvec{v},G_1 B_1 G_2 B_2 \cdots B_{l-1} G_l \varvec{w}\rangle \right| \\&\quad \lesssim \sum _{\begin{array}{c} a_k\in [N]\\ k\in [l] \end{array}} \left|p_{a_1}^{(1)}\right|\cdots \left|p_{a_l}^{(l)}\right| \left| \langle \varvec{v},\varvec{u}_{a_1}^{(1)}\rangle \right| \left|\langle \varvec{u}_{a_1}^{(1)},B_1\varvec{u}_{a_2}^{(2)}\rangle \right|\cdots \left|\langle \varvec{u}_{a_{l-1}}^{(l-1)},B_{l-1}\varvec{u}_{a_l}^{(l)}\rangle \right|\left|\langle \varvec{u}_{a_l}^{(l)},\varvec{w}\rangle \right|\\&\quad \prec \frac{1}{N} \sum _{\begin{array}{c} a_k\in [N]\\ k\in [l] \end{array}} \left|p_{a_1}^{(1)}\right| \cdots \left|p_{a_l}^{(l)}\right| \frac{1}{L^{l}}\sum _{\begin{array}{c} \left|b_k-a_k\right|\le L\\ k\in [l] \end{array}} \left|\langle \varvec{u}_{b_1}^{(1)},B_1\varvec{u}_{b_2}^{(2)}\rangle \right|\cdots \left|\langle \varvec{u}_{b_{l-1}}^{(l-1)},B_{l-1}\varvec{u}_{b_l}^{(l)}\rangle \right|\\&\quad \prec \varLambda _+^{a}\varPi _+^{t} \rho ^{i} N^{l/2-1/2} , \end{aligned}$$

using delocalisation \(\left|\langle \varvec{u}_a,\varvec{v}\rangle \right|+\left|\langle \overline{\varvec{u}_a},\varvec{v}\rangle \right|\prec N^{-1/2}\) for any deterministic \(\varvec{v}\) with \(\left\Vert \varvec{v}\right\Vert \lesssim 1\), by the isotropic law in (12), in the second inequality.

5.2.9 Isotropic bound for \(o(e)\le l(e)-2\)

We decompose \({\mathcal {G}}^e={\mathcal {G}}^{e_1}\cdots {\mathcal {G}}^{e_{k}}\) such that each of \({\mathcal {G}}^{e_2},\ldots ,{\mathcal {G}}^{e_{k-1}}\) begins with a new alternating chain followed (potentially) by further \(G\)’s, while \({\mathcal {G}}^{e_1}\) either begins with an alternating chain or is a chain without orthogonality indices, and \({\mathcal {G}}^{e_k}\) is either an alternating chain or a chain without orthogonality indices. For example, with brackets denoting the decomposition, we would separate

$$\begin{aligned} \langle \varvec{v},(GB_1G^*B_2 (\mathfrak {I}G)^t B_3) (GB_4G^t B_5)(G^*B_6)\varvec{w}\rangle \end{aligned}$$

if the indices associated with \(B_1,B_2,B_4\) are orthogonality indices, and estimate

$$\begin{aligned} \begin{aligned}&\left|\langle \varvec{v},{\mathcal {G}}^e \varvec{w}\rangle \right| \\&\quad \le \Bigl [\langle \varvec{v},{\mathcal {G}}^{e_1}({\mathcal {G}}^{e_1})^*\varvec{v}\rangle ({{\,\mathrm{Tr}\,}}{\mathcal {G}}^{e_2} ({\mathcal {G}}^{e_2})^*) \cdots ( {{\,\mathrm{Tr}\,}}{\mathcal {G}}^{e_{k-1}} ({\mathcal {G}}^{e_{k-1}})^*) \langle \varvec{w},({\mathcal {G}}^{e_{k}})^*{\mathcal {G}}^{e_{k}}\varvec{w}\rangle \Bigr ]^{1/2}. \end{aligned} \end{aligned}$$

For the two isotropic factors of length \(l_j\) with \(o_j\) orthogonality indices and \(i_j\) many \(\mathfrak {I}G\)’s we claim that

$$\begin{aligned} \left|\langle \varvec{v},{\mathcal {G}}^{e_j}({\mathcal {G}}^{e_j})^*\varvec{v}\rangle \right| \prec \frac{N^{2l_j-o_j-1}\rho ^{2i_j\vee 2(l_j-o_j)}\varLambda _+^{2a_j}\varPi _+^{2t_j}}{K^{2(l_j-o_j)-1}} \end{aligned}$$
(111)

which follows from

$$\begin{aligned} \begin{aligned}&\left|\langle \varvec{v},G_1 B_1 \cdots G_o B_o G_{o+1}B_{o+1} \cdots G_l G_l^*\cdots B_{o+1}^*G_{o+1}^*B_o^*G_o \cdots B_1^*G_1^*\varvec{v}\rangle \right| \\&\quad \lesssim \frac{\left|\langle \varvec{v},G_1 B_1 \cdots G_o B_o \mathfrak {I}G_{o+1} B_o^*G_o \cdots B_1^*G_1^*\varvec{v}\rangle \right|}{\eta ^{2(l-o)-1}}\\&\quad \prec \frac{N^{2l-o-1}\rho ^{2i_{1\cdots o}+2(l-o)}\varLambda _+^{2a}\varPi _+^{2t}}{K^{2(l-o)-1}}, \end{aligned} \end{aligned}$$

where \(i_{1\cdots o}\) is the number of \(\mathfrak {I}G\)’s among \(G_1,\ldots ,G_o\). For the tracial factors we have, as in (110), that

$$\begin{aligned} {{\,\mathrm{Tr}\,}}{\mathcal {G}}^{e_j}({\mathcal {G}}^{e_j})^*\prec \frac{N^{2l_j-o_j}\rho ^{2i_j\vee 2(l_j-o_j)}\varLambda _+^{2a_j}\varPi _+^{2t_j}}{K^{2(l_j-o_j)}}. \end{aligned}$$
(112)

By combining (111)–(112) we obtain

$$\begin{aligned}\begin{aligned} \left|\langle \varvec{v},{\mathcal {G}}^e\varvec{w}\rangle \right|&\prec \frac{K}{N} \prod _{j\in [k]} \frac{N^{l_j-o_j/2}\rho ^{i_j\vee (l_j-o_j)}\varLambda _+^{a_j}\varPi _+^{t_j}}{K^{l_j-o_j}} \\&= \varLambda _+^{a}\varPi _+^{t} \rho ^{i\vee (l-o)} N^{l-o/2-1} K^{o-l+1}, \end{aligned} \end{aligned}$$

completing the proof of (104b) also in this case. \(\square \)

5.2.10 Improved degree three estimate: Proof of Lemma 4

The proof of Lemma 4 consists of identifying improvements over the estimate given in Lemma 3 that relied solely on entrywise bounds for each individual \({\mathcal {G}}\)-factor. In order to quantify the improvement we distinguish the two different entrywise bounds in Lemma 3 as

$$\begin{aligned} \left|{{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})\right| \prec {{\,\mathrm{I_2^i-Est}\,}}(\varGamma )\wedge {{\,\mathrm{I_2^0-Est}\,}}(\varGamma ),\end{aligned}$$
(113)

where \({{\,\mathrm{I_2^i-Est}\,}},{{\,\mathrm{I_2^0-Est}\,}}\) are defined as in (101) but with \({{\,\mathrm{I_2^i-Est}\,}}\) having \(\rho \)-exponent \(2ip\), and \({{\,\mathrm{I_2^0-Est}\,}}\) having \(\rho \)-exponent \(\left|V_\mathrm {i}\right|+2\left|E_\kappa ^2\right|-\left|V_\mathrm {o}\right|\). Note that \(\rho \lesssim 1\) and therefore the maximum in the exponent of \(\rho \) in (101) corresponds to the minimum of \({{\,\mathrm{I_2^i-Est}\,}},{{\,\mathrm{I_2^0-Est}\,}}\).

Within the reduced graphs we call a subset \(E_\mathrm {Ward}\subset E_g^{\mathrm {red}}\setminus (E_g^{\mathrm {red},\mathrm {cyc}}\cup \left\{ (vv)\Big |v\in V_\kappa ^{\ge 3} \right\} )\) Wardable if each subgraph \(\varGamma '\subset (V_\mathrm {e}\cup V_\kappa ^{\ge 3},E_\mathrm {Ward})\) satisfies \(\min \left\{ d_g^{\varGamma '}(v)\Big |v\in V_\kappa ^{\ge 3} \right\} \le 2\). The contribution of these Wardable edges will be estimated more efficiently than by their trivial entrywise bounds in order to obtain \({{\,\mathrm{I_3-Est}\,}}\). We start with a simple alternative characterization of Wardable subsets (see [27, Lemma 4.5] and [31, 41]).

Lemma 6

A subset \(E_\mathrm {Ward}\) is Wardable if and only if there exists an ordering \(V_\kappa ^{\ge 3}=\left\{ v_1,v_2,\ldots \right\} \) such that the sequence of graphs \(\varGamma _0:=(V_\mathrm {e}\cup V_\kappa ^{\ge 3},E_\mathrm {Ward})\), \(\varGamma _{k}:=\varGamma _{k-1}\setminus \left\{ v_k \right\} \) satisfies \(d_g^{\varGamma _{k-1}}(v_k)\le 2\) for each \(k\ge 1\), where it is understood that \(\varGamma _k\) is obtained from \(\varGamma _{k-1}\) by removing \(v_k\) and all adjacent edges.

Proof

Suppose that \(E_\mathrm {Ward}\) is Wardable. Then by definition there exists \(v_1\) with \(d_g^{\varGamma _0}(v_1)\le 2\) and we obtain \(\varGamma _1\) which in turn contains some vertex \(v_2\) with \(d_g^{\varGamma _1}(v_2)\le 2\). Continuing inductively yields the desired ordering.

For the reverse implication let \(v_1,v_2,\ldots \) be the given ordering and let \(\varGamma '\) be arbitrary. Set \(k_{\min }:=\min \left\{ k\Big |v_k\in \varGamma ' \right\} \) so that \(\varGamma '\subset \varGamma _{k_{\min }-1}\) and consequently \(d_g^{\varGamma '}(v_{k_{\min }})\le d_g^{\varGamma _{k_{\min }-1}}(v_{k_{\min }})\le 2\). \(\square \)
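The peeling procedure in Lemma 6 is the same greedy algorithm used to compute graph cores. A hypothetical minimal implementation (the vertex and edge containers are illustrative, not the paper's data structures) might look as follows:

```python
from collections import defaultdict

def is_wardable(kappa_vertices, edges):
    """Greedy peeling from Lemma 6: repeatedly delete a kappa-vertex of
    current degree <= 2 together with its adjacent edges; the edge set is
    Wardable iff all kappa-vertices can be removed this way."""
    deg = defaultdict(int)
    adj = defaultdict(list)
    alive = set(range(len(edges)))
    for i, (u, v) in enumerate(edges):
        adj[u].append(i)
        adj[v].append(i)
        deg[u] += 1
        deg[v] += 1
    remaining, order = set(kappa_vertices), []
    progress = True
    while remaining and progress:
        progress = False
        for v in sorted(remaining):
            if deg[v] <= 2:
                for i in adj[v]:          # remove v with its alive edges
                    if i in alive:
                        alive.discard(i)
                        a, b = edges[i]
                        deg[a] -= 1
                        deg[b] -= 1
                remaining.discard(v)
                order.append(v)
                progress = True
    return not remaining, order
```

For instance, a triangle on three kappa-vertices peels completely (every vertex has degree two), whereas in \(K_4\) every vertex has degree three and no peeling step is possible.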

Lemma 4 follows immediately from combining the following two statements (where for the iso-graphs we simply estimate \(\rho ^{\left|E_\mathrm {Ward}\right|}\le \rho ^{\left|E_\kappa ^3\right|}\) in the definition of \({{\,\mathrm{I_3-Est}\,}}(\varGamma )\) below):

  1. (S1)

    For each av-graph \(\varGamma \) the reduced graph \(\varGamma _\mathrm {red}\) admits a Wardable set \(E_\mathrm {Ward}\) of size

    $$\begin{aligned} \left|E_\mathrm {Ward}\right|\ge \left|E_\kappa ^3\right|. \end{aligned}$$
    (114a)

    For each iso-graph \(\varGamma \), the reduced graph \(\varGamma _\mathrm {red}\) admits a Wardable set of size

    $$\begin{aligned} \begin{aligned} \left|E_\mathrm {Ward}\right|&\ge \left|E_\kappa ^3\right| + \Bigl (2p - 2(\left|E_\kappa ^2\right|-\left|V_\mathrm {cyc}\right|)-\sum _{e\in E_\kappa } (d_g(e)-4)_+\Bigr )_+. \end{aligned} \end{aligned}$$
    (114b)
  2. (S2)

    For any av- or iso-graph \(\varGamma \in {\mathcal {G}}\) and a given Wardable set \(E_\mathrm {Ward}\) we have the improved estimates

    $$\begin{aligned} \left|{{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})\right| \prec {{\,\mathrm{I_3^i-Est}\,}}(\varGamma ) \wedge {{\,\mathrm{I_3^0-Est}\,}}(\varGamma )\end{aligned}$$

    with

    $$\begin{aligned} \begin{aligned} {{\,\mathrm{I_3^i-Est}\,}}(\varGamma ):&= K^{-\left|E_\mathrm {Ward}\right|/2} {{\,\mathrm{I_2^i-Est}\,}}(\varGamma ), \\ {{\,\mathrm{I_3^0-Est}\,}}(\varGamma ):&= \rho ^{\left|E_\mathrm {Ward}\right|}K^{-\left|E_\mathrm {Ward}\right|/2} {{\,\mathrm{I_2^0-Est}\,}}(\varGamma ). \end{aligned} \end{aligned}$$

Proof

(Step (S1)) We start with two inequalities that will be proven later. Denoting the number of \(E_g^\mathrm {red}\)-edges between two subsets of vertices \(V',V''\subset V\) by \(e_g(V',V'')\), we claim that for av-/iso graphs \(\varGamma \) we have

$$\begin{aligned} e_g(V_\kappa ^3,V_\mathrm {e})+e_g(V_\kappa ^3,V_\kappa ^{\ge 3}) \ge 3\left|E_\kappa ^3\right|, \end{aligned}$$
(115a)

while for iso-graphs \(\varGamma \) we also have

$$\begin{aligned} e_g(V_\kappa ^{\ge 3},V_\mathrm {e})+e_g(V_\kappa ^{\ge 3},V_\kappa ^{\ge 3}) \ge \sum _{e\in E_\kappa ^{\ge 3}}d_g(e)+2p - 2(\left|E_\kappa ^2\right|-\left|V_\mathrm {cyc}\right|). \end{aligned}$$
(115b)

Armed with these inequalities, we first construct candidate sets of edges within \(E_g^{\mathrm {red}}\setminus E_g^{\mathrm {red},\mathrm {cyc}}\) which are not necessarily Wardable, and then iteratively remove certain edges to make the sets Wardable. For the proof of \(\left|E_\mathrm {Ward}\right|\ge \left|E_\kappa ^3\right|\) for both av- and iso-graphs we start with the candidate set consisting of all \(G\)-edges adjacent to \(V_\kappa ^3\)-vertices. The size of this set is \( e_g(V_\kappa ^3,V_\mathrm {e})+e_g(V_\kappa ^3,V_\kappa ^{\ge 3}) \). We remove at most one edge adjacent to any \(v\in V_\kappa ^3\), so that the at most two remaining edges are not loops. After doing so in arbitrary order for all \(V_\kappa ^3\)-vertices we obtain an edge set which is Wardable by construction. Since the total number of removed edges is at most \( \left| V_\kappa ^3\right| = 2 \left| E_\kappa ^3\right|\), we immediately obtain from (115a) both (114a) and (114b) in the case \((\ldots )_+=0\).

For the proof of (114b) in case \((\ldots )_+>0\) we consider a larger candidate set of size \(e_g(V_\kappa ^{\ge 3},V_\mathrm {e})+e_g(V_\kappa ^{\ge 3},V_\kappa ^{\ge 3}) \) that consists of all edges adjacent to \(V_\kappa ^{\ge 3}\)-vertices. Going through all \(V_\kappa ^{\ge 3}\)-vertices in arbitrary order we remove at most \(k-2\) edges for each vertex \(v\in V_\kappa ^{k}\), so that the at most two remaining edges are not loops; this yields again a Wardable set. Since \(\left|V_\kappa ^{k}\right|= 2\left|E_\kappa ^k\right|\), the total number of removed edges is at most

$$\begin{aligned} \sum _{k\ge 3} \sum _{e\in E_\kappa ^k} 2(k-2) = \sum _{e\in E_\kappa ^{\ge 3}} (2d_g(e)-4) = \sum _{e\in E_\kappa ^{\ge 3}} d_g(e) + \sum _{e\in E_\kappa } (d_g(e)-4)_+ - \left|E_\kappa ^3\right|, \end{aligned}$$

which, together with (115b) yields (114b). This completes the proof of (S1) modulo (115) that we prove now. \(\square \)

Proof

(Eq. (115)) The bound (115a) follows from

$$\begin{aligned} \begin{aligned} 6\left|E_\kappa ^3\right|&= 2\sum _{(uv)\in E_\kappa ^3} d_g((uv)) = \sum _{v\in V_\kappa ^3} d_g(v) \\&= 2 e_g(V_\kappa ^3,V_\kappa ^3) + e_g(V_\kappa ^3,V_\kappa ^{\ge 4}\cup V_\mathrm {e}) \le 2 e_g(V_\kappa ^3,V_\kappa ^{\ge 3}\cup V_\mathrm {e}). \end{aligned}\end{aligned}$$

For the bound (115b) we note that the set \(E_g^{\mathrm {red}}\setminus E_g^{\mathrm {red},\mathrm {cyc}}\) can be partitioned into edges within \(V_\kappa ^{\ge 3}\), edges within \(V_\mathrm {e}\) and edges between these two sets, and thus from (P3)–(\(P^\mathrm{iso}10\)) and (99) we obtain

$$\begin{aligned} \begin{aligned}&e_g(V_\kappa ^{\ge 3},V_\mathrm {e})+e_g(V_\kappa ^{\ge 3},V_\kappa ^{\ge 3}) \\&\quad =\left|E_g^{\mathrm {red}}\setminus E_g^{\mathrm {red},\mathrm {cyc}}\right|-e_g(V_\mathrm {e},V_\mathrm {e})=\left|E_g^\mathrm {red}\right|-\left|V_\mathrm {cyc}\right|-e_g(V_\mathrm {e},V_\mathrm {e}) \\&\quad = \sum _{e\in E_\kappa ^{\ge 3}}d_g(e)+2p - e_g(V_\mathrm {e},V_\mathrm {e}). \end{aligned} \end{aligned}$$

Furthermore, by (\(P^\mathrm{iso}9\)) each \(V_\mathrm {e}\)-\(V_\mathrm {e}\) edge corresponds to at least one \(V_\kappa ^2\)-vertex, while by (P5) each cycle \(E_g^{\mathrm {red},\mathrm {cyc}}\) corresponds to at least two \(V_\kappa ^2\)-vertices in \(\varGamma \) (which are in particular not part of any chain), whence

$$\begin{aligned} e_g(V_\mathrm {e},V_\mathrm {e}) \le \left|V_\kappa ^2\right| - 2\left|V_\mathrm {cyc}\right|=2(\left|E_\kappa ^2\right|-\left|V_\mathrm {cyc}\right|) \end{aligned}$$

and the claim follows. \(\square \)

Proof

(Step (S2)) We recall from the proof of Lemma 3 that (101) is the minimum of two different estimates given in (113). Estimating each \({\mathcal {G}}^e\) for \(e\in E_g^\mathrm {red}\) by Lemma 5 with a \(\rho \)-exponent of \(i(e)\) in (96) yields the first bound \(\left|{{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})\right|\prec {{\,\mathrm{I_2^i-Est}\,}}(\varGamma )\). Similarly, estimating each \({\mathcal {G}}^e\) by Lemma 5 with a \(\rho \)-exponent of \(l(e)-o(e)-\varvec{1}(e\in E_g^{\mathrm {red}}\setminus E_g^{\mathrm {red},\mathrm {cyc}})\) yields the second bound \(\left|{{\,\mathrm{Val}\,}}(\varGamma _\mathrm {red})\right|\prec {{\,\mathrm{I_2^0-Est}\,}}(\varGamma )\), cf. the first inequality in (106). In order to prove (S2) for a given Wardable set \(E_\mathrm {Ward}\) we estimate \({\mathcal {G}}^{e}\) for \(e\in E_g^{\mathrm {red},\mathrm {cyc}}\cup (E_g^{\mathrm {red}}\setminus (E_g^{\mathrm {red},\mathrm {cyc}}\cup E_\mathrm {Ward}))\) exactly as in Lemma 3 and remove the corresponding edges from the graph, leaving only \(E_\mathrm {Ward}\)-edges. In order to conclude the proof it remains to establish, per \(E_\mathrm {Ward}\)-edge, an additional gain of \(K^{-1/2}\) over the first bound and of \(\rho K^{-1/2}\) over the second bound, compared to the entrywise estimates.

Let \(v_1,v_2,\ldots \) denote the ordering of \(V_\kappa ^{\ge 3}\) guaranteed to exist by Lemma 6. By definition of \(E_\mathrm {Ward}\), at most two Wardable edges are adjacent to \(v_1\), whence the part of the value depending on \(a_{v_1}\) can be estimated by either

$$\begin{aligned} \begin{aligned} \sum _{a_{v_1}} \left|{\mathcal {G}}_{\varvec{x}_w a_{v_1}}^{(wv_1)}\right|&\le N^{1/2} \sqrt{ [{\mathcal {G}}^{(wv_1)} ({\mathcal {G}}^{(wv_1)})^*]_{\varvec{x}_w \varvec{x}_w} } \end{aligned} \end{aligned}$$
(116)

or

$$\begin{aligned} \sum _{a_{v_1}} \left|{\mathcal {G}}_{\varvec{x}_w a_{v_1}}^{(wv_1)}\right| \left|{\mathcal {G}}_{a_{v_1}\varvec{x}_y}^{(v_1y)}\right| \le \sqrt{ [{\mathcal {G}}^{(wv_1)} ({\mathcal {G}}^{(wv_1)})^*]_{\varvec{x}_w \varvec{x}_w} } \sqrt{ [ ({\mathcal {G}}^{(v_1y)})^*{\mathcal {G}}^{(v_1y)} ]_{\varvec{x}_y \varvec{x}_y} } \end{aligned}$$
(117)

using Cauchy-Schwarz for some \(w,y\in V_\kappa ^{\ge 3}\cup V_\mathrm {e}\). In case of \({{\,\mathrm{I_2^i-Est}\,}}\) the entrywise estimate on the lhs. of (116)–(117) used in the proof of Lemma 3 is at least

$$\begin{aligned} \varLambda _+^a\varPi _+^t\rho ^{i}N^{l-o/2}K^{o-l+1} \quad \text {and}\quad \varLambda _+^{a+a'}\varPi _+^{t+t'}\rho ^{i+i'}N^{l+l'-o/2-o'/2-1}K^{o+o'-l-l'+2} \end{aligned}$$

with \(i=i((wv_1))\), \(l=l((wv_1))\), \(a=a((wv_1))\), \(t=t((wv_1))\), \(o=t+a\) and \(i'=i((v_1y))\), \(l'=l((v_1y))\), \(a'=a((v_1y))\), \(t'=t((v_1y))\), \(o'=t'+a'\) while applying Lemma 5 to the rhs. yields

$$\begin{aligned} \varLambda _+^a\varPi _+^t \rho ^{i} N^{l-o/2} K^{o-l+1/2} \,\, \text {and}\,\, \varLambda _+^{a+a'}\varPi _+^{t+t'}\rho ^{i+i'}N^{l+l'-o/2-o'/2-1}K^{o+o'-l-l'+1}, \end{aligned}$$

demonstrating the gains of at least \(K^{-1/2}\) and \((K^{-1/2})^2\), respectively. Similarly, the \({{\,\mathrm{I_2^0-Est}\,}}\)-estimate on the lhs. of (116)–(117) is at least

$$\begin{aligned}&\varLambda _+^a\varPi _+^t\rho ^{l-o-1}N^{l-o/2}K^{o-l+1} \\&\quad \text {and} \\&\quad \varLambda _+^{o+o'}\rho ^{l+l'-o-o'-2}N^{l+l'-o/2-o'/2-1}K^{o+o'-l-l'+2} \end{aligned}$$

while, in comparison, when applying Lemma 5 to the rhs. of (116)–(117), we obtain bounds of

$$\begin{aligned} \varLambda _+^a\varPi _+^t \rho ^{l-o} N^{l-o/2} K^{o-l+1/2} \,\, \text {and} \,\, \varLambda _+^{o+o'}\rho ^{l+l'-o-o'}N^{l+l'-o/2-o'/2-1}K^{o+o'-l-l'+1}, \end{aligned}$$

demonstrating exactly the claimed gain of \(\rho K^{-1/2}\) per edge. Here, for example, we counted that \({\mathcal {G}}^{(wv_1)} ({\mathcal {G}}^{(wv_1)})^*\) contains \(2l\) factors of \(G\) and \(2o\) orthogonality indices satisfying \(2o\le 2l-2<2l-1\).

The proof now follows by induction since by Lemma 6 after the removal of \(v_1\), the next vertex \(v_2\) has degree at most \(2\) etc. and (116)–(117) can be used to establish the gain of \((\rho )K^{-1/2}\) iteratively for each \(e\in E_\mathrm {Ward}\). \(\square \)