Fluctuations of extreme eigenvalues of sparse Erdős–Rényi graphs

We consider a class of sparse random matrices which includes the adjacency matrix of the Erdős–Rényi graph $\mathcal{G}(N,p)$. We show that if $N^{\varepsilon} \leqslant Np \leqslant N^{1/3-\varepsilon}$ then all nontrivial eigenvalues away from 0 have asymptotically Gaussian fluctuations. These fluctuations are governed by a single random variable, which has the interpretation of the total degree of the graph. This extends the result (Huang et al. in Ann Prob 48:916–962, 2020) on the fluctuations of the extreme eigenvalues from $Np \geqslant N^{2/9+\varepsilon}$ down to the optimal scale $Np \geqslant N^{\varepsilon}$.
The main technical achievement of our proof is a rigidity bound of accuracy $N^{-1/2-\varepsilon}(Np)^{-1/2}$ for the extreme eigenvalues, which avoids the $(Np)^{-1}$-expansions from Erdős et al. (Ann Prob 41:2279–2375, 2013), Huang et al. (2020) and Lee and Schnelli (Prob Theor Rel Fields 171:543–616, 2018). Our result is the last missing piece, added to Erdős et al. (Commun Math Phys 314:587–640, 2012), He (Bulk eigenvalue fluctuations of sparse random matrices. arXiv:1904.07140), Huang et al. (2020) and Lee and Schnelli (2018), of a complete description of the eigenvalue fluctuations of sparse random matrices for $Np \geqslant N^{\varepsilon}$.

The entries of the adjacency matrix of $\mathcal{G}(N,p)$ are independent up to the symmetry constraint, each equal to $1$ with probability $p$ and to $0$ with probability $1-p$.
We introduce the normalized adjacency matrix $A$, where the normalization is chosen so that the eigenvalues of $A$ are typically of order one. The goal of this paper is to obtain the asymptotic distribution of the extreme eigenvalues of $A$. The extreme eigenvalues of graphs are of fundamental importance in spectral graph theory and have attracted much attention in the past 30 years; see for instance [1,4,17] for reviews. The Erdős–Rényi graph is the simplest model of a random graph and its adjacency matrix is the canonical example of a sparse random matrix.
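As a purely numerical illustration of this normalization (our own sketch, not part of the paper's argument; the matrix size, edge probability, and the normalizing constant $\sqrt{Np(1-p)}$ are choices we make here), one can sample $\mathcal{G}(N,p)$ and observe the two eigenvalue scales: a single Perron–Frobenius eigenvalue of order $\sqrt{Np}$ and the rest of the spectrum of order one.

```python
import numpy as np

# Hedged sketch: sample the Erdos-Renyi graph G(N, p) and normalize its
# adjacency matrix so that the bulk eigenvalues are of order one.
rng = np.random.default_rng(0)
N, p = 500, 0.05                            # Np = 25: sparse, but well-connected
upper = rng.random((N, N)) < p              # independent Bernoulli(p) entries
adj = np.triu(upper, 1)
adj = (adj | adj.T).astype(float)           # symmetric 0/1 adjacency matrix
A = adj / np.sqrt(N * p * (1 - p))          # one common normalization

eigs = np.sort(np.linalg.eigvalsh(A))
lam_max, lam_second = eigs[-1], eigs[-2]
print(lam_max, lam_second)
```

With this seed the top eigenvalue comes out near $\sqrt{Np/(1-p)} \approx 5.1$, while the second largest sits near the bulk edge $2$, separating the two scales discussed above.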
Each row and column of $A$ has typically $Np$ nonzero entries, and hence $A$ is sparse whenever $p \to 0$ as $N \to \infty$. In the complementary dense regime, where $p$ is of order one, $A$ is a Wigner matrix (up to a centring of the entries). The edge statistics of Wigner matrices have been fully understood in [8,10,23,26–28], where it was shown that the distribution of the largest eigenvalue is asymptotically given by the GOE Tracy–Widom distribution [29,30].
To discuss the edge statistics of $A$ in the sparse regime, we introduce the following conventions. Unless stated otherwise, all quantities depend on the fundamental parameter $N$, and we omit this dependence from our notation. We write $X \ll Y$ to mean $X = O_{\varepsilon}(N^{-\varepsilon} Y)$ for some fixed $\varepsilon > 0$, and $X \asymp Y$ to mean $X = O(Y)$ and $Y = O(X)$. We denote the eigenvalues of $A$ by $\lambda_1 \leq \cdots \leq \lambda_N$.

The largest eigenvalue $\lambda_N$ of $A$ is its Perron–Frobenius eigenvalue. For $Np \gg 1$, it is typically of order $\sqrt{Np}$, while the other eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_{N-1}$ are typically of order one. The edge statistics of sparse matrices were first studied in [8,9], where it was proved that when $Np \gg N^{2/3}$ the second largest eigenvalue of $A$ exhibits GOE Tracy–Widom fluctuations, i.e. its limiting distribution function is $F_1(s)$, the distribution function of the GOE Tracy–Widom distribution. In [24], this result was extended to $Np \gg N^{1/3}$, which it turns out is optimal. Indeed, in [19] it was shown that when $N^{2/9} \ll Np \ll N^{1/3}$ the Tracy–Widom distribution for $\lambda_{N-1}$ no longer holds, and the extreme eigenvalues have asymptotically Gaussian fluctuations (1.2).

In this paper we show (1.2) for the whole range $1 \ll Np \ll N^{1/3}$. In fact, we show this for a general class of sparse random matrices introduced in [8,9]. It is easy to check that the normalized adjacency matrix $A$ (1.1) of $\mathcal{G}(N,p)$ satisfies the following definition with the choice $q := \sqrt{Np}$ (1.3). A sparse matrix is a real symmetric $N \times N$ matrix $H = H^* \in \mathbb{R}^{N\times N}$ whose entries $H_{ij}$ satisfy the following conditions. We define the random matrix $A := H + f\,\mathbf{e}\mathbf{e}^*$, where $\mathbf{e} := N^{-1/2}(1, 1, \dots, 1)^*$ and $f \geq 0$.
For simplicity of presentation, in this paper we focus only on real matrices, although our results and proofs extend to matrices with complex entries with minor modifications which we omit; see also Remark 8.2 below.
To describe the fluctuations of the eigenvalues of $A$, we define the random variable $Z$ in (1.4). We denote by $\gamma_{\mathrm{sc},i}$ the $i$th $N$-quantile of the semicircle distribution, which is the limiting empirical eigenvalue measure of $A$ for $Np \gg 1$. Explicitly, $\gamma_{\mathrm{sc},i}$ is defined through
$$\int_{-2}^{\gamma_{\mathrm{sc},i}} \frac{\sqrt{(4 - x^2)_+}}{2\pi}\,\mathrm{d}x = \frac{i}{N}.$$
Throughout the following we fix an exponent $\beta \in (0, 1/2]$ and set $q = N^{\beta}$ (1.6). If $A$ is the normalized adjacency matrix (1.1) of $\mathcal{G}(N,p)$ then from (1.3) and (1.6) we find that the condition $1 \ll Np \ll N^{1/3}$ reads $1 \ll q \ll N^{1/6}$, i.e. $\beta \in (0, 1/6)$. We may now state our main result. It is a rigidity estimate for the eigenvalues of $A$ with accuracy $N^{-1/2-\varepsilon}(Np)^{-1/2}$. In contrast, the corresponding rigidity results of [9,19,24] have accuracy up to a fixed power of $q^{-1}$: up to $q^{-2}$ in [9], $q^{-4}$ in [24], and $q^{-6}$ in [19]. For arbitrarily small polynomial values of $q$, the rigidity provided by an expansion up to a fixed power of $q^{-1}$ is not sufficient to analyse the fluctuations of the extreme eigenvalues. Thus, the main technical achievement of our paper is the avoidance of $q^{-1}$-expansions in the error bounds.
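The quantiles $\gamma_{\mathrm{sc},i}$ can be tabulated numerically; the following sketch (our own illustration, not from the paper) uses bisection on the closed-form semicircle CDF $F(x) = \tfrac{1}{2} + \tfrac{x\sqrt{4-x^2}}{4\pi} + \tfrac{\arcsin(x/2)}{\pi}$ on $[-2, 2]$.

```python
import math

def semicircle_cdf(x: float) -> float:
    """CDF of the semicircle distribution with density sqrt(4 - x^2)/(2 pi) on [-2, 2]."""
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi

def semicircle_quantile(t: float) -> float:
    """Solve F(gamma) = t by bisection; gamma_{sc,i} corresponds to t = i/N."""
    lo, hi = -2.0, 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

N = 1000
gamma = [semicircle_quantile(i / N) for i in range(1, N + 1)]
print(gamma[N // 2 - 1])   # the median quantile, close to 0 by symmetry
```

The quantiles are symmetric about $0$ and accumulate near the edges $\pm 2$, reflecting the square-root vanishing of the density there.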
Remark 1.4 The variable $Z$ was introduced in [19], where its importance for the edge fluctuations of sparse random matrices was first recognized. Using it, the authors proved (1.8) for $\beta \in (1/9, 1/6)$.

In (1.11), the Gaussian fluctuations (1.9) present for $\lambda_i(A)$ are absent for $\lambda_i(\widetilde A)$. Hence, the fluctuations of the eigenvalues of $A$ can all be simultaneously eliminated to leading order by an appropriate random rescaling. Note that we can write $\widetilde A = d^{-1/2} A\,(1 + O(N^{-\delta}))$, in analogy to (1.10). Thus, (1.11) states that if one replaces the deterministic normalization $d^{-1/2}$ with the random normalization $D^{-1/2}$, the fluctuations vanish to leading order. In fact, although it is not formulated that way, our proof can essentially be regarded as a rigidity result for the matrix $\widetilde A$.

Remark 1.5 is consistent with the fact that for more rigid graph models where the average degree is fixed, $Z$ does not appear: for a random $d$-regular graph, the second largest eigenvalue of the adjacency matrix has Tracy–Widom fluctuations for $N^{2/9} \ll d \ll N^{1/3}$ [2]. Moreover, in [19] it was proved that the second largest eigenvalue of $A$ has Tracy–Widom fluctuations for $q \gg N^{1/9}$. Theorem 1.2 trivially implies the following result.

Next, we remark on the fluctuations of single eigenvalues inside the bulk. This problem was first addressed in [11] for GUE, extended to GOE in [25], and recently extended to general Wigner matrices in [3,21]. In these works, it was proved that the bulk eigenvalues of Wigner matrices fluctuate on the scale $\sqrt{\log N}/N$. More precisely,

Remark 1.5 Let
The bulk eigenvalue fluctuations of sparse matrices were studied in [12], where it was shown that for fixed $\beta \in (0, 1/2)$ there exists $c \equiv c(\beta) > 0$ such that, with probability at least $1 - N^{-c}$, the analogous expansion holds.

In summary, we have the following general picture of the fluctuations of eigenvalues of sparse random matrices. The fluctuations of any single eigenvalue consist of two components: a random matrix component and a sparseness component. The random matrix component is independent of the sparseness and coincides with the corresponding fluctuations of GOE. It has order $N^{-2/3}$ at the edge and order $\sqrt{\log N}/N$ in the bulk. The sparseness component is captured by the random variable $Z$ and has order $1/(\sqrt{N} q)$ throughout the spectrum except near the origin. Thus, the sparseness component dominates in the bulk as soon as $q \ll \sqrt{N}$ and at the edge as soon as $q \ll N^{1/6}$. In fact, our proof suggests that $Z$ is only the leading-order Gaussian contribution arising from the sparseness, and that there is an infinite hierarchy of strongly correlated and asymptotically Gaussian random variables of which $Z$ is the largest and whose magnitudes decrease in powers of $q^{-2}$. In order to obtain random matrix Tracy–Widom statistics near the edge, one would have to subtract all such contributions up to order $N^{-2/3}$. For $q = N^{\beta}$ with $\beta$ arbitrarily small, the number of such terms becomes arbitrarily large.
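The crossover exponents quoted above follow from elementary arithmetic with the three scales; the following throwaway script (our own illustration, with an arbitrary sample value of $N$ and $\beta$) makes the comparison explicit after writing $q = N^{\beta}$, so that the sparseness component $1/(\sqrt{N}q)$ becomes $N^{-1/2-\beta}$.

```python
import math

def edge_crossover_beta() -> float:
    # N^(-2/3) = N^(-1/2 - beta)  =>  beta = 2/3 - 1/2 = 1/6
    return 2.0 / 3.0 - 1.0 / 2.0

def scales(N: float, beta: float):
    rm_edge = N ** (-2.0 / 3.0)              # GOE Tracy-Widom scale at the edge
    rm_bulk = math.sqrt(math.log(N)) / N     # single-eigenvalue scale in the bulk
    sparse = N ** (-0.5 - beta)              # order of Z throughout the spectrum
    return rm_edge, rm_bulk, sparse

beta_c = edge_crossover_beta()
print(beta_c)                                # 1/6: below this the sparseness wins at the edge
rm_edge, rm_bulk, sparse = scales(1e8, 0.1)  # beta = 0.1 < 1/6
print(sparse > rm_edge, sparse > rm_bulk)    # here the sparseness component dominates
```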
For completeness, we mention that the bulk eigenvalue statistics have also been analysed in terms of their correlation functions and eigenvalue spacings, which have a very different behaviour from the single-eigenvalue fluctuations described above. It was proved in [8,9,18,22] that the asymptotics of the local eigenvalue correlation functions in the bulk coincide with those of GOE for any $q \gg 1$. Thus, the sparseness has no impact on the asymptotic behaviour of the correlation functions and the eigenvalue spacings.
We conclude this section with a few words about the proof. The fluctuations of the extreme eigenvalues are considerably harder to analyse than those of the bulk eigenvalues; in particular, the method of [12] breaks down at the edge because the self-consistent equations on which it relies become unstable. The key difficulty near the edge is to obtain strong rigidity estimates on the locations of the extreme eigenvalues, while no such estimates are needed in the bulk. Indeed, the central step of the proof is Proposition 4.1 below, which provides an upper bound for the fluctuations of the largest eigenvalue of $H$. This is obtained by showing, for suitable $E$ outside the bulk of the spectrum and $\eta > 0$, that the imaginary part of $\operatorname{Tr} G(E + \mathrm{i}\eta)$ is much smaller than $1/\eta$. Our basic approach is the self-consistent polynomial method for sparse matrices developed in [19,24]. Thus, we first obtain a highly precise bound on the self-consistent polynomial $P$ of the Green's function, which provides a good estimate of $\operatorname{Tr} G$ outside the bulk. The key observation in this part is that the cancellation built into $P$ persists also in the derivative of $P$. Armed with the good estimate of $\operatorname{Tr} G$, our second key idea is to estimate the imaginary part of $P$, which turns out to be much smaller than $P$ itself; from this we deduce strong enough bounds on the imaginary part of $G$. These two estimates together conclude the proof. We refer to Sect. 3 below for more details of the proof strategy.
The rest of the paper is organized as follows. In Sect. 2 we introduce the notations and previous results that we use in this paper. In Sect. 3 we explain the strategy of the proof. In Sect. 4 we prove Theorem 1.2, assuming key rigidity estimates at the edge (Proposition 4.1) and inside the bulk (Lemma 4.2). In Sect. 5 we give a careful construction of the self-consistent polynomial P of the Green's function. In Sects. 6-8, we prove Proposition 4.1, by assuming several improved estimates for large classes of polynomials of Green's functions. In Sect. 9 we prove Lemma 4.2. Finally in Sect. 10 we prove the estimates that we used in Sects. 6-8.

Preliminaries
In this section we collect notation and tools that will be used. For the rest of this paper we fix $\beta \in (0, 1/6)$ and define $\delta$ as in Sect. 1. We denote by $G \equiv G(z) := (H - z)^{-1}$ the Green's function of $H$ (2.1). Convention: throughout the paper, the argument of $G$ and of any Stieltjes transform is always denoted by $z \in \mathbb{C}\setminus\mathbb{R}$, and we often omit it from our notation.
The normalized trace $N^{-1}\operatorname{Tr} G(z)$ is the Stieltjes transform of the empirical eigenvalue density at $z$. For deterministic $z$ we have the differential rule
$$\frac{\partial G_{xy}}{\partial H_{ij}} = -\frac{G_{xi}G_{jy} + G_{xj}G_{iy}}{1 + \delta_{ij}}. \qquad (2.2)$$
If $h$ is a real-valued random variable with finite moments of all orders, we denote by $\mathcal{C}_k(h)$ its $k$th cumulant. We state the cumulant expansion formula (Lemma 2.1): for $\ell \in \mathbb{N}_+$ and suitable $f$,
$$\mathbb{E}[h f(h)] = \sum_{k=1}^{\ell} \frac{\mathcal{C}_{k+1}(h)}{k!}\, \mathbb{E}\big[f^{(k)}(h)\big] + R_{\ell+1},$$
where $R_{\ell+1}$ is the remainder term (2.4); a proof is given in e.g. [16]. The following result gives bounds on the cumulants of the entries of $H$, whose proof follows from Definition 1.1 and the homogeneity of the cumulants.
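As a sanity check of the cumulant expansion (our own illustration, not from the paper): for a standard Gaussian $h$ all cumulants $\mathcal{C}_k(h)$ with $k \geq 3$ vanish and $\mathcal{C}_2(h) = 1$, so the expansion of $\mathbb{E}[h f(h)]$ truncates after one term to Stein's identity $\mathbb{E}[h f(h)] = \mathbb{E}[f'(h)]$. This can be verified by Monte Carlo for, say, $f(x) = x^3$.

```python
import numpy as np

# For h ~ N(0,1) and f(x) = x^3 the cumulant expansion truncates to
#   E[h f(h)] = C_2(h) E[f'(h)],  i.e.  E[h^4] = E[3 h^2] = 3.
rng = np.random.default_rng(1)
h = rng.standard_normal(2_000_000)

lhs = np.mean(h * h**3)     # E[h f(h)] = E[h^4], which equals 3 exactly
rhs = np.mean(3.0 * h**2)   # C_2(h) * E[f'(h)] = E[3 h^2], also 3 exactly
print(lhs, rhs)
```

For the sparse entries $H_{ij}$, by contrast, all higher cumulants survive and contribute at successively higher powers of $q^{-1}$, which is what makes the recursive expansions of this paper necessary.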

Lemma 2.2 For every k ∈ N we have
We use the following convenient notion of high-probability bound from [7].

Definition 2.3 (Stochastic domination). Let
be two families of random variables, where $Y^{(N)}(u)$ are nonnegative and $U^{(N)}$ is a possibly $N$-dependent parameter set. We say that $X$ is stochastically dominated by $Y$, uniformly in $u$, if for all (small) $\varepsilon > 0$ and (large) $D > 0$ we have
$$\sup_{u \in U^{(N)}} \mathbb{P}\big[X^{(N)}(u) > N^{\varepsilon}\, Y^{(N)}(u)\big] \leq N^{-D}$$
for large enough $N \geq N_0(\varepsilon, D)$. If $X$ is stochastically dominated by $Y$, uniformly in $u$, we use the notation $X \prec Y$, or, equivalently, $X = O_{\prec}(Y)$.
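A standard example of stochastic domination (our own illustration, not from the paper): the maximum of $N$ independent standard Gaussians is of order $\sqrt{2\log N}$ with very high probability, and hence is stochastically dominated by $1$, since for every $\varepsilon > 0$ the event that it exceeds $N^{\varepsilon}$ has super-polynomially small probability. The hypothetical snippet below compares one sample of the maximum with this benchmark.

```python
import math
import numpy as np

# One sample of max_i |g_i| for N independent standard Gaussians g_i,
# compared with the deterministic benchmark sqrt(2 log N).
rng = np.random.default_rng(2)
N = 100_000
m = float(np.abs(rng.standard_normal(N)).max())
benchmark = math.sqrt(2.0 * math.log(N))
print(m, benchmark)   # m is close to the benchmark and far below any power N^eps
```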
Note that for deterministic $X$ and $Y$, the relation $X \prec Y$ means that $X \leq N^{\varepsilon} Y$ for any $\varepsilon > 0$ and large enough $N$. Sometimes we say that an event $\Xi \equiv \Xi^{(N)}$ holds with very high probability if for all $D > 0$ we have $\mathbb{P}(\Xi) \geq 1 - N^{-D}$ for large enough $N$. By estimating the moments of $Z$ defined in (1.4) and invoking Chebyshev's inequality, we find $Z \prec 1/(\sqrt{N} q)$. We have the following elementary result about stochastic domination.
(ii) Suppose that $X$ is a nonnegative random variable satisfying $X \leq N^{C}$ and $X \prec \Phi$ for some deterministic $\Phi \geq N^{-C}$. Then $\mathbb{E}X \prec \Phi$.
Fix a (small) $c > 0$ and define the spectral domains below. We recall the local semicircle law for Erdős–Rényi graphs from [9].
As a standard consequence of the local law, we have the complete delocalization of eigenvectors.

Lemma 2.6 Let $u_1, \dots, u_N$ be the ($L^2$-normalized) eigenvectors of $H$. We have $\max_i \|u_i\|_{\infty} \prec N^{-1/2}$.

Remark 2.7 Proposition 2.5 was proved in [9] under the additional assumption $\mathbb{E}H_{ii}^2 = 1/N$ for all $i$. However, the proof is insensitive to the variance of the diagonal entries, and one can easily repeat the steps in [9] under the general assumption $\mathbb{E}H_{ii}^2 = C_i/N$. A weak local law for $H$ with general variances on the diagonal can also be found in [15].
The following Lemmas 2.9–2.12 characterize the asymptotic eigenvalue density of $H$. The proof of the following result is postponed to Sect. 5.

Lemma 2.9 There exists a deterministic polynomial
uniformly for all deterministic z ∈ S. Here a 2 , a 3 , . . . are real, deterministic, and bounded. They depend on the law of H .
Lemma 2.9 states that when x is replaced with G(z), the expectation of P 0 (z, x) is very small. This is because of a cancellation built into P, which however holds only in expectation and not with high probability. The following two results are essentially proved in [19, Propositions 2.5-2.6], and we state them without proof. We denote by C + the complex upper half-plane.
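To orient the reader (an illustration of ours, not a statement from the paper): in the Wigner limit $q = \infty$ there are no sparseness corrections, and the self-consistent polynomial reduces to the familiar quadratic equation of the semicircle law, whose root in $\mathbb{C}_+$ is the Stieltjes transform $m_{\mathrm{sc}}$.

```latex
% Toy case q = \infty: the self-consistent equation m = (-z - m)^{-1}
% is equivalent to the vanishing of the quadratic polynomial
%   P_\infty(z, x) := 1 + z x + x^2 ,
% and P_\infty(z, m_{sc}(z)) = 0 is solved by
\begin{equation*}
  m_{\mathrm{sc}}(z) = \frac{-z + \sqrt{z^2 - 4}}{2} , \qquad z \in \mathbb{C}_+ ,
\end{equation*}
% with the branch of the square root chosen so that m_{sc}(z) \in \mathbb{C}_+ .
```

For finite $q$, the polynomial additionally carries the bounded coefficients $a_2, a_3, \dots$, whose contributions enter at higher powers of $q^{-1}$; the cancellation discussed above is built into these corrections.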

Moreover,
Im m 0 (z) and

Lemma 2.11
There exists a random algebraic function $m : \mathbb{C}_+ \to \mathbb{C}_+$ satisfying $P(z, m(z)) = 0$, such that $m$ is the Stieltjes transform of a random symmetric probability measure $\varrho$. We have $\operatorname{supp} \varrho = [-L, L]$, where

Moreover,
Let $\gamma_i$ denote the $i$th $N$-quantile of $\varrho$, i.e.
Similarly, let $\gamma_{0,i}$ and $\gamma_{\mathrm{sc},i}$ denote the $i$th $N$-quantiles of $\varrho_0$ and of the semicircle distribution, respectively. We have the following result, whose proof is given in Appendix A below.

Outline of the proof
In this section we describe the strategy of the proof. The foundation of the proof is the method of recursive self-consistent estimates for high moments using the cumulant expansion introduced in [13], building on the previous works [5,6,20]. It was first used to study sparse matrices in [24], which also introduced the important idea of estimating moments of a self-consistent polynomial in the trace of the Green's function. There, the authors derived a precise local law near the edge and obtained the extreme eigenvalue fluctuations for $p \gg N^{-2/3}$. Subsequently, in [19], by developing the key insight that for $N^{-7/9} \ll p \ll N^{-2/3}$ the leading fluctuations are fully captured by the random variable $Z$ from (1.4), the authors obtained the extreme eigenvalue fluctuations in this regime. In this paper we use the same basic strategy as [19,24]. As in most results on extreme eigenvalue statistics, the main difficulty is to establish rigidity bounds for the extreme eigenvalues.
The proof of Theorem 1.2 consists of essentially two separate results: an upper bound on the largest eigenvalue of H (Proposition 4.1 below) and a rigidity estimate in the bulk (Lemma 4.2 below). The latter is a modification of [19, Proposition 2.9], and our main task is to show the former.
We use the random spectral parameter $z = L_0 + Z + w$ introduced in [19], where $w = \kappa + \mathrm{i}\eta$ is deterministic. In order to obtain the estimate of Proposition 4.1 for the largest eigenvalue of $H$ using the Green's function, one has to preclude the existence of an eigenvalue near $\operatorname{Re} z$ for a suitable $z$, which follows provided one can show the bound (3.1) [see (6.5) and the discussion afterwards for more details]. The proof of (3.1) is the main work of our proof. It relies on the following key new ideas.
1. In the previous works [19,24], following the work [10] on Wigner matrices, (3.1) is always proved using $\operatorname{Im} G \leq \operatorname{Im} m + |G - m|$ and estimating the two terms on the right-hand side separately. There, the term $|G - m|$ is estimated by obtaining an estimate on $|P(z, G)|$, from which an estimate on $|G - m|$ follows by inverting a self-consistent equation associated with the polynomial $P$. In our current setting, $|G - m|$ turns out to be much larger than $\operatorname{Im} G$ and hence this approach does not work. Thus, we have to estimate $|\operatorname{Im}(G - m)|$ instead of $|G - m|$ and take advantage of the fact that it is much smaller than $|G - m|$. To that end, we first estimate $|\operatorname{Im} P(z, G)|$ by exploiting a crucial cancellation arising from taking the imaginary part, which yields stronger bounds on $|\operatorname{Im} P(z, G)|$ than are possible for $|P(z, G)|$.

2. To estimate $|\operatorname{Im}(G - m)|$ from $|\operatorname{Im} P(z, G)|$, we have to invert a self-consistent equation associated with $\operatorname{Im} P$. This equation is only stable provided that $|G - m|$ is small enough.

3. The main work is to derive a strong enough bound on $|G - m|$ to ensure the stability of the self-consistent equation for $\operatorname{Im}(G - m)$. The precision required for this step is much higher than that obtained in [19]. Our starting point is the same as in [19,24]: estimating high moments $\mathbb{E}|P|^{2n}$ of $P \equiv P(z, G)$ using the cumulant expansion. Note that $P$ is constructed in such a way that the expectation $\mathbb{E}P(z, G)$ is very small by a near-exact cancellation (see Lemma 2.9). In the high moments, the interactions between different factors of $P$ and $\bar{P}$, corresponding to the fluctuations of $P$, give rise to error terms whose control is the key difficulty of the proof. They cannot be estimated naively and have to be re-expanded to arbitrarily high order using a recursive application of the cumulant expansion. These error terms typically contain the partial derivative $\partial_2 P$ of $P$ in the second argument $G$. As soon as $P$ is differentiated, the cancellation built into $P$ is lost.
However, we nevertheless need to exploit remnants of this cancellation that are inherited by the higher-order terms containing derivatives of $P$. We track them by rewriting the partial derivative $\partial_2 P$ in terms of the derivative $\partial_w P = \partial_1 P + \partial_2 P\, \partial_w G$ and an error term, and then use that $\partial_w$ commutes with the derivative $\partial/\partial H_{ij}$ from the cumulant expansion to obtain a form in which the cancellation from the next cumulant expansion is manifest also for the derivative of $P$.
Let us explain the above points in more detail. The proof of (3.1) contains two steps. The main step is to bound the high moments of $P$ in Proposition 6.1. We start with the splitting
$$\mathbb{E}|P|^{2n} = \mathbb{E}\,\underline{HG}\,P^{n-1}\bar{P}^{\,n} + \mathbb{E}\,(P - \underline{HG})\,P^{n-1}\bar{P}^{\,n}, \qquad \underline{HG} := \frac{1}{N}\sum_{i,j} H_{ij} G_{ji}.$$
We expand the first term on the right-hand side by Lemma 2.1 to get (3.2). Note that the polynomial $P$ is designed so that the leading contributions cancel, and for the same reason there are cancellations between the second and third terms on the right-hand side of (3.2). It turns out that the most dangerous terms on the right-hand side of (3.2) are contained within the first sum. One representative error term, arising from $k = 3$ and $s = 2$ in (3.2), is (3.3), which involves the interaction of $P$ and $\bar{P}$ and hence depends on the fluctuations of $P$.
To get a sharp enough estimate of (3.3), it is not enough to take the absolute value inside the expectation and then estimate $|\partial_2 P|$ and $|N^{-1}(G^{*2})_{jj}|$ by Lemmas 2.11 and 2.8 respectively. Instead, the key idea is to rewrite the error term, so that it becomes amenable to another expansion step, using the approximations (3.4), which of course have to be justified. Ignoring the error terms generated in this process, we find that (3.3) is reduced to an expression involving $\partial_w$. Since $\partial_w$ and $\partial/\partial H_{ij}$ commute, we can again expand the first term on the right-hand side with Lemma 2.1. In this way the operator $\partial_w$ plays no role in our computation, and we can get the desired estimate using the smallness of $\mathbb{E}P$. A major difficulty in the above argument results from the fact that we need to track carefully the algebraic structure of the error terms arising from repeated applications of simplifications of the form (3.4). In particular, such terms occur inside expectations multiplying many other terms, and we need to ensure that such approximations remain valid in general expressions. In order to achieve this, we implement the ideas in [12,14] to construct a hierarchy of Schwinger–Dyson equations for a sufficiently large class of polynomials in the entries of the Green's function.
The desired bound for $P$, Proposition 6.1, together with the stability analysis of the self-consistent equation associated with $P$ (Lemma 6.2 below), yields the key estimate (3.5), where we recall that $\operatorname{Re} z = L_0 + Z + \kappa$. This estimate is crucial in establishing the stability of the self-consistent equation associated with $\operatorname{Im} P$ (see Lemma 6.4). More precisely, a Taylor expansion shows (3.6). As $\partial_2^2 P(z, m) \approx 2$, taking the imaginary part and rearranging terms yields a recursion. It can be shown that $|\operatorname{Re} \partial_2 P(z, m)| \asymp \sqrt{\kappa}$, and we move this factor to the right-hand side of (3.6) to obtain a recursive estimate of $\operatorname{Im}(G - m)$. The third term on the right-hand side of (3.6) shows that in order for this estimate to work, we need precisely the bound (3.5). The final step in showing (3.1) is to bound the high moments of $\operatorname{Im} P$ in Proposition 6.3. As $\operatorname{Im} P$ is much smaller than $P$ near the edge, we obtain a much smaller bound for $\mathbb{E}|\operatorname{Im} P|^{2n}$ than for $\mathbb{E}|P|^{2n}$. The proof is similar to that of Proposition 6.1, but contains significantly fewer expansions. Combining Proposition 6.3 and Lemma 6.4 leads to our desired estimate of $\operatorname{Im} G$. As we prove the above for $z$ satisfying $\operatorname{Im} m \ll \frac{1}{N\eta}$, we get (3.1) as desired.

Proof of Theorem 1.2
In this section we prove Theorem 1.2. The key result is the following upper bound on the largest eigenvalue of H . The proof is postponed to Sect. 6.

Proposition 4.1 Denoting by $\mu_N$ the largest eigenvalue of $H$, we have
We also need the following result to estimate the eigenvalues away from the spectral edges. The proof is postponed to Sect. 9.

Lemma 4.2 Let ρ denote the empirical eigenvalue density of H , and set
We have (4.2) for all intervals $I \subset I_1$ and for $I = I_2$.
Proof of Theorem 1.2 We prove (1.8) for $i \in \{N/2 - 1, \dots, N-1\}$; the same analysis works for the other half of the spectrum. Let $i \in \{N/2 - 1, \dots, N-1\}$ and suppose first that (4.3) holds. Then trivially we have $\gamma_i \in I_2$ with very high probability. In addition, by the Cauchy interlacing theorem we have $\lambda_i \leq \mu_N$, and together with Proposition 4.1 we obtain (4.4). Thus by the triangle inequality we get the claim in this case.

Next, suppose (4.3) does not hold, namely (4.5) holds for some $a \in (0, 3)$. Let $\nu$ be the empirical eigenvalue density of $A$. By the Cauchy interlacing theorem, the eigenvalue counts of $A$ and $H$ in any interval $I \subset \mathbb{R}$ differ by at most one. Together with (4.2), we obtain the corresponding rigidity statement, where in the last step we used (4.5). By the definition of $\gamma_i$, together with the uniform square-root behaviour of the density of $\varrho$ near $L$ from Lemma 2.11, we therefore have $f(x) \asymp f(\gamma_i)$ with very high probability for any $x$ between $\lambda_i$ and $\gamma_i$. Thus the mean value theorem yields the stated estimate. Using the above relation, together with (4.4) and Lemma 2.12, we conclude the pathwise bound. We then take the expectation using Lemma 2.4. Combining the above two formulas we obtain (1.8) as desired.

Abstract polynomials and the construction of P 0
Convention Throughout this section, z ∈ S is deterministic.
In this section we construct the polynomial $P_0$ and prove Lemma 2.9. The latter was essentially proved in [19, Proposition 2.9]; here we follow a more systematic approach, based on a class of abstract polynomials in the Green's function entries, which provides an explicit proof. We shall generalize this class further in Sect. 7.

Abstract polynomials, part I
We start by introducing a notion of formal monomials in a set of formal variables, which are used to construct P 0 . Here the word formal refers to the fact that these definitions are purely algebraic and we do not assign any values to variables or monomials.

Definition 5.2
We assign to each monomial $T \in \mathcal{T}$ with $\nu_1 = \nu_1(T)$ its evaluation, which is a random variable depending on an assignment of indices $i_1, \dots, i_{\nu_1} \in \{1, \dots, N\}$. It is obtained by replacing, in the formal monomial $T$, the formal indices $i_1, \dots, i_{\nu_1}$ with the integers $i_1, \dots, i_{\nu_1}$ and the formal variables $G_{xy}$ with elements $G_{xy}$ of the Green's function (2.1) with parameter $z$. We define the summing operation $\mathcal{S}$ in (5.2). Defining the random variable (5.3), we have the following result, whose proof is given in Sect. 10.1 below.
In the sequel we also need the subset $\mathcal{T}_0 \subset \mathcal{T}$ of formal monomials without off-diagonal entries. We define an averaging map $\mathcal{M}$ from $\mathcal{T}_0$ to the space of random variables through (5.5). The idea behind $\mathcal{M}$ is that it replaces all diagonal entries of $G$ in $T$ by their average and then applies $\mathcal{S}$. Note that it is only applied to monomials $T \in \mathcal{T}_0$ without off-diagonal entries. The following result is proved in Sect. 10.2 below.

Lemma 5.5
For any fixed $T \in \mathcal{T}_0$ there exist $k \in \mathbb{N}$ and $T^{(1)}, \dots, T^{(k)} \in \mathcal{T}_0$ satisfying the stated relation. Lemma 5.5 leads to the following result.
Note that in particular we have $\mathcal{M}(1, T) = \mathcal{M}(T)$ through (5.7).

The construction of P 0 and Proof of Lemma 2.9
We compute $\mathbb{E}G$ and shall find a polynomial $Q_0$ such that the stated identity holds. We then set $P_0(z, x)$ accordingly, where $\ell$ is a fixed positive integer to be chosen later, and $R^{(ji)}_{\ell+1}$ is a remainder term defined analogously to $R_{\ell+1}$ in (2.4). One can follow, e.g., the proof of Lemma 3.4 (iii) in [16], and readily check that the remainder is negligible for $\ell \equiv \ell(\beta)$ large enough. From now on, we always assume that the remainder term in the cumulant expansion is negligible.

Now let us look at each $X_k$. For $k = 1$, by the differential rule (2.2) and $\mathcal{C}_2(H_{ij}) = 1/N$ for $i \neq j$, we have (5.12). For $k = 2$, the most dangerous term is (5.13); other terms in $X_2$ also satisfy the same bound. Similar estimates can also be carried out for all even $k$, which yield (5.14). For odd $k \geq 3$, we split $X_k$, where the terms in $X_{k,1}$ contain no off-diagonal entries of $G$. Using Lemma 5.3, we easily find the corresponding bound. By Lemma 2.2, we see that the coefficient of each term is deterministic and uniformly bounded. Combining with (5.12)–(5.14), we obtain (5.15).

To handle the right-hand side of (5.15) we invoke Lemma 5.6. Naively, we have a bound for each $n$. By Lemma 5.6, we can write the expansion in closed form. Thus we can set $Q_0$ accordingly, and note that $Q_0$ is a polynomial of degree $2\lceil \beta^{-1} \rceil$. This concludes the proof of Lemma 2.9.

Remark 5.7
After the construction of P 0 (and consequently P), we shall construct a more general class of abstract polynomials associated with P in Sect. 7.2 below.

Proof of Proposition 4.1
Convention Throughout this section, $z$ is given by $z = L_0 + Z + w$ (6.1). The proof of Proposition 4.1 consists of two steps: in the first we estimate $G$, and in the second we apply this estimate to obtain a more precise bound on $\operatorname{Im} G$.

Estimate of G
Define the spectral domain $\mathbf{Y}$. As a guide to the reader, the lower bound on $\kappa$ is chosen to be slightly smaller than the scale $\frac{1}{\sqrt{N} q}$ on which the extreme eigenvalues fluctuate; analogously, the lower bound on $\eta$ is chosen to be slightly smaller than the corresponding scale. Using that $\operatorname{Im} m(z) \asymp \eta/\sqrt{\kappa}$ (see Lemma 2.11), this choice of lower bound on $\eta$ will allow us to rule out the presence of eigenvalues [see (6.6) below], and hence establish rigidity.
Recall the definition of $\tau$ in Lemma 2.11, and note that the lower bound on $\kappa$ ensures that, with very high probability, the stated bound holds for all $w \in \mathbf{Y}$. The main technical step is the following bound for $P(z, G)$, whose proof is postponed to Sect. 7.
for all $w \in \mathbf{Y}$. Suppose $\varepsilon(w)$ is Lipschitz continuous with Lipschitz constant $N$ and moreover that for each fixed $\kappa$ the function $\eta \mapsto \varepsilon(\kappa + \mathrm{i}\eta)$ is nonincreasing for $\eta > 0$. Then the conclusion of Lemma 6.2 holds. Combining Proposition 6.1 and Lemma 6.2, we find that for any deterministic $\varepsilon$ that does not depend on $\eta$ we obtain the implication (6.3). Using the initial estimate $|G - m| \prec 1$ from Proposition 2.5, we therefore conclude the key bound (6.4).

Estimate of Im G
Define the subset $\mathbf{Y}_* \subset \mathbf{Y}$ (6.5). In this section we show that (6.6) below holds for all $w \in \mathbf{Y}_*$. This immediately implies that whenever $\kappa + \mathrm{i}\eta \in \mathbf{Y}_*$, with very high probability there is no eigenvalue in the interval $(L_0 + Z + \kappa - \eta,\; L_0 + Z + \kappa + \eta)$.
In addition, [9, Lemma 4.4] implies the complementary bound, and hence the largest eigenvalue $\mu_N$ of $H$ satisfies (4.1), and Proposition 4.1 is proved. What remains, therefore, is the proof of (6.6). In analogy to Proposition 6.1, we have the following estimate for $\operatorname{Im} P(z, G)$, whose proof is postponed to Sect. 8.
Proof A Taylor expansion gives (6.7). Note that $\partial_2^k P(z, m) \prec 1$, and, recalling the definition of $P$, we find from (2.5) and Lemma 2.11 the stated bounds for all $k \geq 1$ and $w \in \mathbf{Y}_*$. This implies, for all $k \geq 2$, the corresponding estimate, where in the second step we used (6.4). Taking the imaginary part of (6.7) and rearranging the terms, we obtain (6.8). Note that $|\operatorname{Im} \partial_2 P(z, m)| \lesssim |\operatorname{Im} m| \ll \sqrt{\kappa}$, and by (2.7) we have $|\partial_2 P(z, m)| \asymp \sqrt{\kappa}$. Thus, together with (6.4) and (6.8), we obtain the claim.
From Proposition 6.3 and Lemma 6.4 we obtain the desired implication, and we thus conclude (6.6). This concludes the proof of Proposition 4.1.

Proof of Proposition 6.1
Convention Throughout this section, z is given by (6.1), where w ∈ Y is deterministic.
Fix $n \in \mathbb{N}_+$. We shall show, for any fixed $n$, that (7.1) holds, from which Proposition 6.1 follows by Chebyshev's inequality. The rest of this section is therefore devoted to the proof of (7.1). Note that the argument $z$ of $G$ is random, and as a result the differential rule acquires additional terms. We define the parameter $\mathcal{E}$. Recalling the random variable from (5.3), we find the corresponding bound; moreover, we have (7.7). The next lemma collects basic estimates for the derivatives of $P$.

Lemma 7.1
Under the assumptions of Proposition 6.1, for any fixed $k \in \mathbb{N}_+$ we have (7.8) and (7.9).

Proof By the mean value theorem, the difference of derivatives of $P$ evaluated at $G$ and at $m$ is controlled by a higher derivative at some $\xi$ between $m$ and $G$. Then the first estimate in (7.8) is proved using Lemma 2.11 and (6.3). The second estimate in (7.8) is proved by Lemmas 2.6 and 2.8. By (7.4) and (7.8), one easily checks the intermediate bound, and combining with (7.7) one concludes (7.9).

The first expansion
By $(H - z)G = I$, we have $\underline{HG} = 1 + z\underline{G}$. We use Lemma 2.1 to calculate the last term, setting $h = H_{ij}$ and $f = f_{ji}(H) = G_{ji} P^{n-1} \bar{P}^{\,n}$, which gives (7.10), where, as in (5.12), we choose a large enough $\ell \in \mathbb{N}_+$ such that the remainder term is negligible. By splitting the differentials in (7.10) according to whether $P$ or $\bar{P}$ is differentiated, we have
$$\mathbb{E}|P|^{2n} = \mathbb{E}\,Q_0\, P^{n-1}\bar{P}^{\,n} + \mathbb{E}\,Z G^2\, P^{n-1}\bar{P}^{\,n} + \mathrm{(I)} + \mathrm{(II)} + \mathrm{(III)} + \mathrm{(IV)}. \qquad (7.11)$$
We have the following result, which handles the terms on the right-hand side of (7.11) and directly implies (7.1).

Lemma 7.2 Let (I)–(IV) be as in (7.11). We have the bound by ∑_{r=2}^{2n} E^r P^{2n−r} stated in (7.12), as well as the companion bounds. The rest of Sect. 7 is devoted to showing Lemma 7.2. To simplify notation, we drop the complex conjugates in (I)–(IV) (which play no role in the subsequent analysis), and estimate the quantities (7.14) and (7.15).

Abstract polynomials, part II
In order to estimate (7.14) and (7.15), we introduce the following class of abstract polynomials, which generalizes the class T from Definition 5.1.
We denote by V the set of formal monomials V of the form (7.16).
We extend the evaluation from Definition 5.2 to the set V, and denote the evaluation of V as in (7.16) by V i 1 ,...,i ν 1 . We also extend the operation S from (5.2) to V.
The next lemma is an analogue of Lemma 5.3, whose proof is postponed to Sect. 10.3.
(ii) Moreover, when ν₄(V) = ν₅(V) = 0, we have the stronger estimate stated above. In the sequel, we also need the subset V₀. In analogy to (5.5), we define an averaging map M from V₀ to the space of random variables through the displayed formula. The following is an analogue of Lemma 5.5, whose proof is given in Sect. 10.4.
Finally, we have the following extension of Lemma 5.6, which is proved in Sect. 10.5. Lemma 7.7 Fix r, u, v ∈ ℕ. Let T ∈ T₀ and let M(r, T) be as in Lemma 5.6. Then

The estimate of X 1
By (7.4) and C₂(H_{ij}) = N^{−1}(1 + O(δ_{ij})), we have
Estimating the last term using Lemma 7.1, we conclude the first bound. By HG = zG + I and z ≺ 1, we deduce that HG ≺ 1. In addition, it is easy to check that G³ ≺ ϒ(Nη)^{−1}. Thus the first term on the right-hand side of (7.19) can be estimated accordingly. As a result, we obtain the bound, where in the second step we used Hölder's inequality. Since the bound holds for all w ∈ Y, we have X₁ ≺ E²P^{2n−2}, as desired.
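Bounds on products of resolvent entries, such as the one for G³ above, typically rest on the Ward identity, which for a Hermitian matrix H reads (a standard fact, stated here for reference):

```latex
\sum_{j=1}^{N} \bigl|G_{ij}(E+i\eta)\bigr|^{2}
\;=\; \frac{\operatorname{Im} G_{ii}(E+i\eta)}{\eta},
```

so that each extra factor of G beyond the first costs at most a factor η^{−1} after averaging, which is consistent with estimates of the form G³ ≺ ϒ(Nη)^{−1}.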

The estimate of X 2
Let us split X₂ = X_{2,1} + X_{2,2}, where in the second and third steps we used Lemmas 2.8 and 7.1 respectively. Noting the displayed bounds, by (7.20) we get the estimate for X_{2,1}, as desired. As for the term X_{2,2}, we see from (7.3) that the most dangerous contribution comes from the split X_{2,2} = X_{2,2,1} + X_{2,2,2} + X_{2,2,3}, where a_{ij} is deterministic and uniformly bounded. One can easily check the stated bound for all t ≥ 1, and thus we obtain the estimate for X_{2,2,1}. For X_{2,2,2}, we can again apply Lemma 2.1 with h = H_{ij}; note that the resulting bound holds for all fixed s ≥ 0. Together with (7.9) and the trivial bound N^{−1} ≺ E, we see that X_{2,2,2} is also under control, where in the last step we also used ϒ ≺ E. Similar steps work for X_{2,2,3}. As a result, we have (7.23) ≺ ∑_{r=2}^{2n} E^r P^{2n−r}. Other terms in X_{2,2} can be estimated in a similar fashion, which leads to the bound for X_{2,2}; combining with (7.22) we get the desired estimate for X₂.
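The differential rule (7.3) used throughout is the standard resolvent derivative; for a real symmetric matrix H it reads (our statement of a standard identity)

```latex
\frac{\partial G_{kl}}{\partial H_{ij}}
\;=\; -\bigl(G_{ki}G_{jl} + G_{kj}G_{il}\bigr) \quad (i \neq j),
\qquad
\frac{\partial G_{kl}}{\partial H_{ii}} \;=\; -\,G_{ki}G_{il},
```

which follows from differentiating (H − z)G = I. Higher derivatives ∂^k/∂H_{ij}^k produce sums of products of k + 1 resolvent entries, which is why off-diagonal entries of G accumulate in the expansions below.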

The computation of X 3
Let us split X₃ as displayed. Step 1. When s = 1, 3, it is easy to see from (7.3) that the corresponding terms take the displayed form. Using (7.9), we can deduce the bound, where in the second step we used Lemma 2.8. As in (7.21) and (7.24), we have the auxiliary estimate for all t ≥ 0. Thus X_{3,s} ≺ ∑_{r=2}^{2n} E^r P^{2n−r} for s = 1, 3.
Step 2. Let us consider X_{3,2}. Similarly to the previous steps, we can show the displayed bound, and together with Lemma 2.8 we get the refined estimate. As the last two terms can be estimated by O_≺(E²P^{2n−2}), we obtain the reduction. Step 3. Let us compute X_{3,2,1}. We write it in the displayed form, where b₂, …, b_{⌊β^{−1}⌋} are bounded. We can estimate the right-hand side of (7.28) by ∑_{r=2}^{2n} O_≺(E^r P^{2n−2}), so that (7.29) follows. Step 4. Let us consider the term EM(V) in (7.29). Explicitly, it is bounded by Lemma 2.2, and combining the displayed relations we arrive at the decomposition. Step 5. We expand the term (A) again by Lemma 2.1. By Lemma 7.1, whenever the derivative ∂^k/∂H_{ij}^k on the right-hand side hits G³P^{2n−2}, the corresponding term can be bounded by O_≺(∑_{r=2}^{2n} E^r P^{2n−r}). Furthermore, since ∂^k/∂H_{ij}^k commutes with ∂_w, we are left with the terms Y_k. The analysis of Y_k is similar to that of X_k in Sect. 5.2. For k = 1, by (7.3), (7.31) and Lemma 7.4 (i), we have (7.33). For k = 2, by (7.3) and (7.31) we see that the most dangerous term satisfies the displayed bound, where in the last step we again used w ∈ Y and Hölder's inequality. Other terms in Y₂ satisfy the same bound. A similar estimate can also be obtained for all even k, where by definition the terms in Y_{k,1} contain no off-diagonal entries of G or G². Using Lemma 7.4 (i), we can again show the corresponding bound. By Lemma 2.2, we see that the expansion holds, where a^{(k)}_{ij} is deterministic and uniformly bounded. Combining with (7.33)–(7.34), we obtain (7.35). Since s ≥ 2, one readily checks that the last two terms can be bounded by O_≺(∑_{r=2}^{2n} E^r P^{2n−r}). Thus (7.35) reads as (7.36). Note that by construction, T^{(s)} in (7.36) is the same as in (5.16). From Lemma 7.7, we see that the term M(⌊β^{−1}⌋ − 2s + 2, T^{(s)}) in (7.37) is the same as in (5.18), which implies the stated identity. Thus (7.37) reduces to (7.38).

Final step
By (7.32) and (7.38), we see that there is a cancellation between (A) and (B), which leads to (7.39). The first two terms on the right-hand side of (7.39) are stochastically dominated as in (7.40), and one can check that ϒ/(Nη) and N^{−1−2β} are in general not dominated by E², so that we need to keep track of these terms in order to obtain a further cancellation. So far we have been dealing with EM(V) in (7.29); the other terms in (7.29) can be handled in the same way as in Steps 4 and 5. Compared to EM(V), each term N^{−2β} b_l N^{−lβ} E[P N^{−1} G² G^{2+2l} P^{2n−2}] contains an additional factor N^{−lβ}. Similarly to (7.39) and (7.40), it can be shown that these terms are bounded by ∑_{r=2}^{2n} E^r P^{2n−r} for all l ≥ 2. As a result, we have (7.41), where b₁ is defined as in (7.30).
Next, we consider the other terms on the right-hand side of (7.26). Similarly to (7.41), we can also show the bound ∑_{r=2}^{2n} E^r P^{2n−r} for them, as well as the companion estimate. Note that this results in two cancellations on the right-hand side of (7.26), and we obtain the reduced expression. As we have already estimated X_{3,1} and X_{3,3} in Step 1, we conclude (7.42). Remark 7.8 The crucial step in analysing X₃ is the computation of X_{3,2,1} in (7.41). As in (7.27), we can write X_{3,2} in the displayed form, and the formula (7.41) implies the stated estimate. The argument for X_{3,2,1} can be repeated for general ES(V), which allows one to show the following result.

Conclusion
After the steps in Sects. 7.3.1–7.3.3, it remains to estimate X_k for k ≥ 4. When k ≥ 4 is even, the estimate of X_k is similar to that of X₂ in Sect. 7.3.2. In fact, by Lemma 2.2, we see that there are additional factors of N^{−β} in X_k when k ≥ 4, which makes the estimate easier. Using Lemma 7.4 (i), one can show the corresponding bound. When k ≥ 4 is odd, the estimate of X_k is similar to that of X₃ in Sect. 7.3.3. By Lemma 2.2, we see that there are additional factors of N^{−(k−2)β} in X_k for k ≥ 4. Using Lemmas 7.1, 7.4 and 7.9, one can show the analogous bound. As a result, we arrive at the desired estimate.
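The extra factors of N^{−(k−2)β} originate in the cumulant scaling of the sparse ensemble. Under our reading of Lemma 2.2, identifying the sparsity parameter q with N^{β} (an assumption on our part, made to match the exponents appearing in the text),

```latex
\bigl|\mathcal{C}_k(H_{ij})\bigr| \;\lesssim\; \frac{1}{N\,q^{\,k-2}}
\;=\; N^{-1-(k-2)\beta}, \qquad k \geqslant 2,
```

so that each additional derivative in the cumulant expansion beyond second order is accompanied by a factor N^{−β}.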

The computation of (II') in (7.14)
Using Lemma 2.1 with h = H_{ij}, we obtain the expansion. For each k, we write the corresponding decomposition. Each Z_k can be handled by again applying Lemma 2.1 with h = H_{ij}; one easily shows the stated bound. Combining with (7.44) and (7.45), we have (7.46). The analysis of X′_k is similar to that of X_k in Sect. 7.3, and we only sketch the key steps.
For k = 2, we see from (7.3) that the most dangerous term in X′₂ is very close to the left-hand side of (7.23). We can apply Lemma 7.4 (i) and show that (7.47) is bounded by O_≺(∑_{r=2}^{2n} E^r P^{2n−r}). Similarly, we can handle all the other terms in X′₂, which leads to (7.48). For k = 3, by the differential rule (7.3), we see that the most dangerous term in X′₃ is very close to the right-hand side of (7.25). Similarly to (7.26), we obtain the analogous decomposition, and the right-hand side can be computed similarly to X_{3,2,1}, X_{3,2,2}, X_{3,2,3} in (7.26). As a result, we can show (7.49). For k ≥ 4, the argument is similar to that in Sect. 7.3.4, and we can show the corresponding bound. Combining the above with (7.46)–(7.49), we have (7.50). Now observe the cancellation between (7.43) and (7.50), which leads to the bound ∑_{r=2}^{2n} E^r P^{2n−r}, as desired.

The estimate of (7.15)
From the construction of P₀ in Sect. 5.2, we can easily show the preliminary bound, and in this section we shall see that the analogue holds when the factor P^{2n−1} is added inside the expectations. Let us write the decomposition and analyse each X^{(1)}_k. Consider first the case when k is odd. For k = 1, it is easy to see from (7.3) and Lemma 2.8 that the stated bound holds. For odd k ≥ 3, we see from (7.3) and Lemma 2.8 that the displayed expansion holds, where a^{(k)}_{ij} is deterministic and uniformly bounded. For even k, we follow a similar strategy as in Sect. 7.3.2. We see from (7.3) and Lemma 2.8 that the analogous expansion holds, where a^{(k)}_{ij} is deterministic and uniformly bounded. The first term on the right-hand side of (7.53) can be written as ES(V), where V ∈ V, ν₂(V) = 0 and ν₄(V) = ν₅(V) = 0. Thus we can apply Lemma 7.4 (ii) to estimate this term, and show that it is bounded accordingly. Combining (7.51), (7.52) and (7.54), we obtain the reduced expression, where we recall the definition of S(T) in (5.2). Observe that, from the above steps, T^{(s)} in (7.55) is the same as in (5.16). To handle E[S(T^{(s)})P^{2n−1}], we introduce the following analogue of Lemmas 5.6 and 7.7.
Proof The proof is analogous to those of Lemmas 5.6 and 7.7. We use the identity to replace the diagonal entries in S(T), and then expand the terms containing H using Lemma 2.1. We omit the details.
By Lemma 7.10 we have, for any s ∈ {2, 3, …, ⌊ℓ/2⌋}, the corresponding bound, and together with (7.55) we obtain the reduced expression. From Lemma 7.10, we see that the term M(⌊β^{−1}⌋ − 2s + 2, T^{(s)}) in (7.56) is the same as in (5.18), which implies the stated identity; together with (7.15) we conclude the bound ∑_{r=2}^{2n} E^r P^{2n−r}, as desired. This concludes the proof of Lemma 7.2 and hence also that of Proposition 6.1.

Proof of Proposition 6.3
Convention Throughout this section, z is given by (6.1), where w ∈ Y * is deterministic.
Let us fix n ∈ ℕ₊. We shall show that (8.1) holds, from which Proposition 6.3 follows by Chebyshev's inequality. We shall see that the proof of (8.1) is much simpler than that of (7.1), as it does not require a secondary expansion as in Sect. 7.3.3. We define the parameter as displayed. Recall the definition of the random variable from (5.3); it is easy to check that the analogous domination holds.
In addition, recall the definitions of P , Q and Q 0 from (7.2). With the help of (6.4), we obtain the following improved version of Lemma 7.1.

Lemma 8.1 We have
and where in the second step we used that H has real entries.

Remark 8.2
Although we used that the entries of H are real in (8.4), our argument easily extends to complex entries of H. To see how, for any function f that is holomorphic away from the real axis, define (Jf)(z) := (f(z) − f(z̄))/(2i). We view all quantities appearing in our arguments as functions of z and use the operator J instead of Im. Then it is easy to check that, in both the real and the complex case, Proposition 6.3 as well as all its consequences remain true if we replace Im by J everywhere. Note that Im G = JG and Im P = JP, but in general Im G_{ij} ≠ JG_{ij}. An alternative point of view is to regard all of our quantities as functions of z and H, and to take the imaginary part with respect to the Hermitian conjugation of z and H.
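Concretely, writing (Jf)(z) = (f(z) − f(z̄))/(2i) (our reconstruction of the elided definition), for Hermitian H one has G(z̄) = G(z)*, so that

```latex
(J G_{ij})(z) \;=\; \frac{G_{ij}(z) - G_{ij}(\bar z)}{2i}
\;=\; \frac{G_{ij}(z) - \overline{G_{ji}(z)}}{2i},
```

which equals Im G_{ij}(z) precisely when G is symmetric, i.e. when H has real entries, while J⟨G⟩ = Im⟨G⟩ and JP = Im P hold in both cases.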
Similarly to (7.11), we can use Lemma 2.1 on the last term of (8.4), and obtain an expansion of E[(Im P(z, G)) …]. We shall prove the following result, which directly implies (8.1).

Lemma 8.3 Let (V)–(VII) be as in (8.5). Then
and

Proof of (8.7)
Define X^{(2)}_k through (8.8), so that (VII) = ∑_{k=1}^{ℓ} X^{(2)}_k. Note that for f : ℝ → ℂ and h real, (d/dh) Im f(h) = Im (d/dh) f(h), so that the derivatives in (8.8) can be computed through (7.3). Let us estimate each X^{(2)}_k. For any fixed k ∈ ℕ₊, it is easy to see from (8.3) that the stated bound holds. By (7.3) and Proposition 2.5, we obtain (8.9), where in the last step we estimated Im G_{xx} by O_≺(Im G), using its spectral decomposition and Lemma 2.6. Here we see the crucial effect of taking the imaginary part of P, which results in Im G on the right-hand side of (8.9) instead of G. Note that Im G is bounded by Im m + η/√κ up to an additive error, and together with Lemma 2.8 we obtain the refined bound. By Cauchy–Schwarz and (6.4), together with the fact that (√κ)^s is dominated by E^s times the appropriate factor of Im for all s ≥ 0, we get the corresponding estimate for all r ≥ 1. Combining the above estimate with (8.10) concludes the proof of (8.7).
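The spectral decomposition of Im G_{xx} used in the last step reads, with λ_α and u_α the eigenvalues and ℓ²-normalized eigenvectors of H,

```latex
\operatorname{Im} G_{xx}(E+i\eta)
\;=\; \sum_{\alpha=1}^{N} \frac{\eta\,|u_\alpha(x)|^{2}}{(\lambda_\alpha-E)^{2}+\eta^{2}},
```

so that if the eigenvectors are completely delocalized in the sense |u_α(x)|² ≺ N^{−1} (our reading of Lemma 2.6), then Im G_{xx} ≺ N^{−1} ∑_α η/((λ_α − E)² + η²) = Im⟨G⟩, which is the bound Im G_{xx} ≺ Im G invoked above.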

Proof of (8.6)
The proof is similar to the estimate of (7.15) in Sect. 7.5. Define the decomposition ∑_k X^{(3)}_k and analyse each X^{(3)}_k. Consider first the case when k is odd. For k = 1, it is easy to see from (7.3) and Lemma 2.8 that the stated bound holds. When k ≥ 3 is odd, we see from (7.3) and Lemma 2.8 that the displayed expansion holds, where a^{(k)}_{ij} is deterministic and uniformly bounded. For even k, we see from (7.3) and Lemma 2.8 that the analogous expansion holds, where a^{(k)}_{ij} is deterministic and uniformly bounded. Note that the analogue of (8.13) has appeared in (7.53). To handle this term, we use the following result.
Proof The proof essentially follows from the strategy of showing Lemmas 5.3 and 7.4. We use the identity to replace the G i j in the equation, and then expand the terms containing H using Lemma 2.1. We omit the details.

Lemma 8.4 immediately implies
Combining (8.11)–(8.14), we obtain the reduced expression. Here we recall the definition of S(T) in (5.2), and observe that T^{(s)} above is the same as in (5.16). To handle the last relation, we introduce the following analogue of Lemmas 5.6 and 7.7.

Proof
The proof is similar to those of Lemmas 5.6 and 7.7. We use the identity to replace the diagonal entries in S(T ), and then expand the terms containing H using Lemma 2.1. We omit the details.
By Lemma 8.5, we have, for any s ∈ {2, 3, …, ⌊ℓ/2⌋}, the corresponding bound. From Lemma 8.5, we see that the term M(⌊β^{−1}⌋ − 2s + 2, T^{(s)}) above is the same as in (5.18), which implies the stated identity. In addition, note the auxiliary bound. From the definition of (V) in (8.5), we conclude the desired estimate, of the form ∑_r E^r P^{2n−r} with the appropriate factors of Im. This concludes the proof of Lemma 8.3, and hence also that of Proposition 6.3.

Proof of Lemma 4.2
Convention Throughout this section, z is given by (6.1), where w is deterministic and contained in where c > 0 is fixed.
The key in proving Lemma 4.2 is the following result.
uniformly for all w ∈ D.
The stability analysis of P was carried out for the region Y in Lemma 6.2, and one easily checks that the same result holds for the region D. This leads to the next lemma. Lemma 9.2 Lemma 6.2 holds provided that Y is replaced with D.
Combining Proposition 9.1 and Lemma 9.2, we obtain the stated implication, and thus the bound holds uniformly for all w ∈ D. By the rigidity estimate (9.2), together with a standard analysis using the Helffer–Sjöstrand formula (e.g. [12, Proposition 3.2]), one immediately concludes the proof of Lemma 4.2.
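For the reader's convenience, we recall the Helffer–Sjöstrand formula, a standard identity: for smooth, compactly supported f, with the almost-analytic extension f̃(x + iy) := (f(x) + iy f′(x)) χ(y) for a smooth cutoff χ,

```latex
f(\lambda) \;=\; \frac{1}{\pi}\int_{\mathbb{R}^{2}}
\frac{\partial_{\bar z}\,\tilde f(x+iy)}{\lambda - x - iy}\,\mathrm{d}x\,\mathrm{d}y,
```

so that linear eigenvalue statistics ∑_α f(λ_α) are controlled by integrals of Tr G(x + iy); this is how a rigidity estimate on the Stieltjes transform, such as (9.2), is converted into eigenvalue information.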
The rest of the section is devoted to the proof of Proposition 9.1. It is simpler than that of Proposition 6.1, and we only give a sketch. A detailed proof of a slightly weaker result can be found in [19,Proposition 2.9].

Proof of Proposition 9.1
Fix n ∈ ℕ₊ and set ‖P‖_{2n} := (E|P(z, G)|^{2n})^{1/2n}. We shall show that (9.3) holds, and Proposition 9.1 is then obtained by Chebyshev's inequality.
We shall see that the proof of (9.3) is much simpler than that of (7.1), as it does not require a secondary expansion as in Sect. 7.3.3. Recall the definitions of P , Q and Q 0 from (7.2), and recall the definition of ϒ from (7.5). We have the bound In addition, note that Lemma 7.1 remains true for w ∈ D.

Lemma 9.3 We have
as well as the companion bound. We now sketch the proof of (9.6); the proof of (9.7) follows in a similar fashion. Let us first consider (XI). We write (XI) = ∑_{k=1}^{ℓ} X^{(4)}_k, with the terms defined as displayed. For k = 1, one can repeat the steps in Sect. 7.3.1 and obtain the stated bound. Note that we have the bound (9.4), which implies the required estimate. For k = 2, one can follow the steps in Section 2 of [19] and obtain the analogous bound for X^{(4)}_2; a similar strategy works for all even k ≥ 4. This gives the even-k estimate. For k = 3, we split X^{(4)}_3 into X^{(4)}_{3,1} and X^{(4)}_{3,2}, where in the last step we used the estimate of X^{(4)}_{3,2}. Similarly to (7.26), we have the corresponding expansion.
A similar strategy works for all odd k ≥ 5. Note that (9.10) implies the stated bound. By Lemma 2.2, we see that, compared to X^{(4)}_3, there are additional factors of N^{−(k−2)β} in X^{(4)}_k for all k ≥ 4. Thus we can show the corresponding estimate. Using the above relation, together with (9.8)–(9.10), we get the bound for (XI). The computation of (IX) is similar, and we can show the analogous bound. By Proposition 2.5, we have the required identity. Thus, there is a cancellation between the leading order terms of (IX) and (XI), which implies the bound ∑_{r=2}^{2n} ϒ^r P^{2n−r}, as desired. This concludes the proof of Lemma 9.3, and also that of Lemma 4.2.

Proof of the improved estimates for abstract polynomials
In this section we repeatedly use the following identity.
Lemma 10.1 We have the following identity. Proof The resolvent identity (H − z)G = I shows the displayed relations, from which the proof follows.
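A natural reading of this identity, consistent with the replacement of resolvent entries described below, is the entrywise form of (H − z)G = I:

```latex
G_{ij} \;=\; -\frac{\delta_{ij}}{z} \;+\; \frac{1}{z}\,(HG)_{ij}
\;=\; -\frac{\delta_{ij}}{z} \;+\; \frac{1}{z}\sum_{k} H_{ik}\,G_{kj},
```

so that every resolvent entry can be traded for a term containing an explicit factor of H, to which the cumulant expansion of Lemma 2.1 applies.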
Let f(G) be a function of the entries of G. We shall see that, after applying Lemma 2.1, the last two terms above cancel each other up to leading order. As a result, we can replace E f(G)G_{ij} by the slightly nicer quantity E f(G)δ_{ij}G. This is the idea that we use throughout this section.
In each of the following subsections, the assumptions on z are given by the assumptions of the corresponding lemma being proved.

Proof of Lemma 5.3
As discussed in Remark 5.4, it suffices to look at the case ν 2 = 1.
Without loss of generality, let T_{i_1,…,i_{ν_1}} have the displayed form, where a_{i_1,…,i_{ν_1}} is uniformly bounded. Using Lemma 10.1 for i = i₁ and j = i₂, we obtain the first expansion; similarly, the last term in (10.1) becomes the analogous sum. Let us estimate each X^{(5)}_k and X^{(6)}_k. For k = 1, by C₂(H_{ij}) = N^{−1}(1 + O(δ_{ij})) and Lemma 2.8, we have the two displayed bounds. Notice the cancellation between these two equations; this gives the estimate for k = 1. For k = 2, the most dangerous type of term in X^{(6)}_2 contains only one off-diagonal entry of G, e.g. as in the displayed example.

Proof of Lemma 5.5
Now let us expand the last two terms by Lemma 2.1. As in Sect. 10.1, we shall see a cancellation among the leading terms, which gives (10.5). For the terms on the right-hand side of (10.5) that are not in T₀, we can use Lemma 5.3 and show that they are bounded by O_≺(N^{ν₁(T)−θ(T)}(E + N^{−1})). As a result, we find the expansion (10.6) for some fixed integer m. Each T^{(l)} satisfies T^{(l)} ∈ T₀, ν₁(T^{(l)}) = ν₁(T) + 1, σ(T^{(l)}) − σ(T) ∈ 2ℕ + 4, and θ(T^{(l)}) = θ(T) + 1 + β(σ(T^{(l)}) − σ(T) − 2). We can then repeat (10.6) on the resulting terms; after k − 1 repetitions we get the desired result. This concludes the proof of Lemma 5.5.

Proof of Lemma 7.4
(i) Let V be of the form (7.16). By Lemma 2.8, we see that the result is trivially true for ν₂ ≥ 2, and hence we assume ν₂ = 1. Define the relevant quantities as displayed. By the definition of ν₂, we consider two cases. Case 1: the contribution to ν₂ comes from G_{x_1 y_1} G_{x_2 y_2} ⋯ G_{x_k y_k}. Without loss of generality, we assume x₁ ≠ y₁, and we denote the first sum by ∑_k X^{(7)}_k. Similarly, the last term in (10.7) becomes an analogous sum over i₁, …, i_{ν_1}, x, whose first part we denote by ∑_k X^{(8)}_k. Similarly to Sect. 10.1, we see that, when expanded by Lemma 2.1, the leading terms of X^{(7)}_1 and X^{(8)}_1 cancel, and together with Lemma 7.1 we can show that X^{(7)}_1 + X^{(8)}_1 ≺ E₂(V).
Note that this term can be written as ES(V′), where V′ ∈ V, ν₁(V′) = ν₁(V) + 1, θ(V′) = θ(V) + 1 + β, σ(V′) = σ(V) + 1, and ν_i(V′) = ν_i(V) for i = 2, 3, 4, 5. We can then expand the first two terms on the right-hand side of (10.11) using Lemma 2.1. The first term on the right-hand side of (10.11) gives a sum over i₁, …, i_{ν_1}, x, y, whose first part we abbreviate by ∑_k X^{(9)}_k. The second term on the right-hand side of (10.11) gives an analogous sum over i₁, …, i_{ν_1}, x, whose first part we abbreviate by ∑_k X^{(10)}_k. By (7.4), we see that the leading contributions of X^{(9)}_1 and X^{(10)}_1 coincide up to sign; thus there is a cancellation between them. The first term on the right-hand side of (10.13) is the leading term, and it no longer contains (G²)_{i₁i₂}. The rest of the proof is analogous to Case 1. We omit the details.
After k repetitions we conclude the proof of Lemma 7.5.

Proof of Lemma 7.7
The proof follows by repeatedly using the following result.