1 Introduction

The eigenvalues of an \(n \times n\) matrix \({\mathbf {M}}\) are the roots in \(\mathbb {C}\) of the characteristic polynomial \(\det ({\mathbf {M}}-z{\mathbf {I}})\), where \({\mathbf {I}}\) is the identity matrix. We let \(\lambda _1({\mathbf {M}}), \ldots , \lambda _n({\mathbf {M}})\) denote the eigenvalues of \({\mathbf {M}}\). The empirical spectral measure of \({\mathbf {M}}\) is given by

$$\begin{aligned} \mu _{{\mathbf {M}}} := \frac{1}{n} \sum _{i = 1}^n \delta _{\lambda _i({\mathbf {M}})}. \end{aligned}$$

The corresponding empirical spectral distribution (ESD) is given by

$$\begin{aligned} F^{{\mathbf {M}}}(x,y) := \frac{1}{n} \# \left\{ 1 \le i \le n: \mathrm{Re}(\lambda _i({\mathbf {M}})) \le x, \mathrm{Im}(\lambda _i({\mathbf {M}})) \le y \right\} . \end{aligned}$$

Here \(\# E\) denotes the cardinality of the set \(E\). If the matrix \({\mathbf {M}}\) is Hermitian, then the eigenvalues \(\lambda _1({\mathbf {M}}), \ldots , \lambda _n({\mathbf {M}})\) are real. In this case the ESD is given by

$$\begin{aligned} F^{{\mathbf {M}}}(x) := \frac{1}{n} \# \left\{ 1 \le i \le n : \lambda _i({\mathbf {M}}) \le x \right\} . \end{aligned}$$

Given a random \(n \times n\) matrix \({\mathbf {X}}_n\), an important problem in random matrix theory is to study the limiting distribution of the empirical spectral measure as \(n\) tends to infinity. We consider one of the simplest random matrix ensembles, when the entries of \({\mathbf {X}}_n\) are iid copies of the random variable \(\xi \). We refer to \(\xi \) as the atom variable of \({\mathbf {X}}_n\).

When \(\xi \) is a standard complex Gaussian random variable, \({\mathbf {X}}_n\) can be viewed as a random matrix drawn from the probability distribution

$$\begin{aligned} {\mathbb {P}}(d {\mathbf {M}}) = \frac{1}{\pi ^{n^2}} e^{- {{\mathrm{tr}}}({\mathbf {M}}{\mathbf {M}}^*)} d {\mathbf {M}}\end{aligned}$$

on the set of complex \(n \times n\) matrices. Here \(d {\mathbf {M}}\) denotes the Lebesgue measure on the \(2n^2\) real entries

$$\begin{aligned} \{ \mathrm{Re}(m_{ij}) : 1 \le i, j \le n\} \cup \{ \mathrm{Im}(m_{ij}) : 1 \le i, j \le n\} \end{aligned}$$

of \({\mathbf {M}}=(m_{ij})_{i,j=1}^n\). This is known as the complex Ginibre ensemble. The real Ginibre ensemble and quaternionic Ginibre ensemble are defined analogously.

Following Ginibre [9], one may compute the joint density of the eigenvalues of a random matrix \({\mathbf {X}}_n\) drawn from the complex Ginibre ensemble. Mehta [17, 18] used this joint density function to compute the limiting spectral measure of the complex Ginibre ensemble. In particular, he showed that if \({\mathbf {X}}_n\) is drawn from the complex Ginibre ensemble, then the ESD of \(\frac{1}{\sqrt{n}} {\mathbf {X}}_n\) converges to the circular law \(F_{\mathrm {circ}}\), where

$$\begin{aligned} F_{\mathrm {circ}}(x,y) := \mu _{\mathrm {circ}} \left( \left\{ z \in \mathbb {C} : \mathrm{Re}(z) \le x, \mathrm{Im}(z) \le y \right\} \right) \end{aligned}$$

and \(\mu _{\mathrm {circ}}\) is the uniform probability measure on the unit disk in the complex plane. Edelman [7] verified the same limiting distribution for the real Ginibre ensemble.

For the general (non-Gaussian) case, there is no formula for the joint distribution of the eigenvalues and the problem appears much more difficult. The universality phenomenon in random matrix theory asserts that the spectral behavior of a random matrix does not depend on the distribution of the atom variable \(\xi \) in the limit \(n \rightarrow \infty \). In other words, one expects that the circular law describes the limiting ESD of a large class of random matrices (not just Gaussian matrices); Fig. 1 presents a numerical simulation depicting this universality phenomenon.

Fig. 1

The eigenvalues of random matrices with iid entries. The first plot contains the eigenvalues of \(50\) samples of \(100 \times 100\) random matrices drawn from the real Ginibre ensemble. The second plot contains the eigenvalues of \(50\) samples of \(100 \times 100\) random matrices whose entries are Bernoulli random variables (i.e. each entry takes the values \(\pm 1\) with equal probability). The black circle in each plot is the unit circle centered at the origin
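The simulation behind plots like these takes only a few lines. The following sketch is ours, not from the paper; the matrix size and the thresholds are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative sketch (not from the paper): sample one n x n Bernoulli matrix
# and inspect the spectrum of X_n / sqrt(n); by the circular law it should be
# close to uniform on the unit disk for large n.
rng = np.random.default_rng(0)
n = 200
X = rng.choice([-1.0, 1.0], size=(n, n))
eigs = np.linalg.eigvals(X / np.sqrt(n))

frac_in_disk = np.mean(np.abs(eigs) <= 1.1)   # nearly all eigenvalues
frac_center = np.mean(np.abs(eigs) <= 0.5)    # uniform law predicts about 0.25
print(frac_in_disk, frac_center)
```

Increasing \(n\) sharpens both the edge of the disk and the uniformity of the density.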

In the 1950s, Wigner [36] proved a version of the universality phenomenon for Hermitian random matrices, now known as Wigner matrices. However, the random matrix ensemble described above is not Hermitian. In fact, many of the techniques used to deal with Hermitian random matrices do not apply to non-Hermitian matrices [2, Section 11.1].

An important result was obtained by Girko [10, 11] who related the empirical spectral measure of non-Hermitian matrices to that of Hermitian matrices. Building upon this Hermitization technique, Bai [2, 3] gave the first rigorous proof of the circular law for general (non-Gaussian) distributions under a number of moment and smoothness assumptions on the atom variable \(\xi \). Important results were obtained more recently by Pan and Zhou [25] and Götze and Tikhomirov [13]. Tao and Vu [31] were able to prove the circular law under the assumption that \({\mathbb {E}}|\xi |^{2+{\varepsilon }} < \infty \), for some \({\varepsilon }> 0\). Recently, Tao and Vu [28, 33] established the law assuming only that \(\xi \) has finite variance.

For any \(m \times n\) matrix \({\mathbf {M}}\), we define the Hilbert–Schmidt norm \(\Vert {\mathbf {M}}\Vert _2\) by the formula

$$\begin{aligned} \Vert {\mathbf {M}}\Vert _2 := \sqrt{ {{\mathrm{tr}}}({\mathbf {M}}{\mathbf {M}}^*) } = \sqrt{ {{\mathrm{tr}}}({\mathbf {M}}^*{\mathbf {M}})}. \end{aligned}$$
(1.1)

Theorem 1.1

(Tao and Vu [33]) Let \(\xi \) be a complex random variable with mean zero and unit variance. For each \(n \ge 1\), let \({\mathbf {X}}_n\) be an \(n \times n\) matrix whose entries are iid copies of \(\xi \), and let \({\mathbf {N}}_n\) be an \(n \times n\) deterministic matrix. If \({{\mathrm{rank}}}({\mathbf {N}}_n) = o(n)\) and \(\sup _{n \ge 1} \frac{1}{n^2} \Vert {\mathbf {N}}_n \Vert _2^2 < \infty \), then the ESD of \(\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n)\) converges almost surely to the circular law \(F_{\mathrm {circ}}\) as \(n \rightarrow \infty \).

One of the key steps in proving Theorem 1.1 is controlling the largest and smallest singular values of \({\mathbf {X}}_n + {\mathbf {N}}_n\). We recall that the singular values of an \(m \times n\) matrix \({\mathbf {M}}\) are the eigenvalues of \(|{\mathbf {M}}| := \sqrt{ {\mathbf {M}}^*{\mathbf {M}}}\). We let \(\sigma _1({\mathbf {M}}) \ge \cdots \ge \sigma _n({\mathbf {M}}) \ge 0\) denote the singular values of \({\mathbf {M}}\). In particular, the largest and smallest singular values are given by

$$\begin{aligned} \sigma _1({\mathbf {M}}) := \sup _{\Vert x\Vert = 1} \Vert {\mathbf {M}}x \Vert , \quad \sigma _n({\mathbf {M}}) := \inf _{\Vert x \Vert = 1} \Vert {\mathbf {M}}x \Vert , \end{aligned}$$

where \(\Vert v\Vert \) denotes the Euclidean norm of the vector \(v\). We let \(\Vert {\mathbf {M}}\Vert := \sigma _1({\mathbf {M}})\) denote the spectral norm of the matrix \({\mathbf {M}}\).
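These definitions are easy to check numerically. The sketch below is ours, not from the paper; the matrix is a small random complex matrix chosen only for illustration. It verifies that the Hilbert–Schmidt norm (1.1) equals the \(\ell ^2\)-norm of the singular values, and that the spectral norm is the largest singular value.

```python
import numpy as np

# Illustrative check (not from the paper) of the norm identities stated above.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

sv = np.linalg.svd(M, compute_uv=False)      # sigma_1 >= ... >= sigma_n
hs = np.sqrt(np.trace(M @ M.conj().T).real)  # Hilbert-Schmidt norm (1.1)

# ||M||_2^2 equals the sum of squared singular values; ||M|| equals sigma_1.
assert np.isclose(hs, np.sqrt((sv ** 2).sum()))
assert np.isclose(np.linalg.norm(M, 2), sv[0])
assert np.isclose(np.linalg.norm(M, 'fro'), hs)
```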

While the behavior of the largest singular value is well studied (e.g. see [1, 27]), bounds for the smallest singular value appear more difficult. Using techniques from additive combinatorics, Tao and Vu established the following bound on the least singular value of \({\mathbf {X}}_n + {\mathbf {N}}_n\).

Theorem 1.2

(Tao and Vu [31]) Assume that \({\mathbf {X}}_n\) is an \(n \times n\) random matrix whose entries are iid copies of a random variable with mean zero and variance one. Assume that \({\mathbf {N}}_n\) is a deterministic \(n \times n\) matrix whose entries are bounded by \(n^\alpha \) in absolute value. Then for any \(B>0\), there exists \(A>0\) (depending on \(B\) and \(\alpha \)) such that

$$\begin{aligned} {\mathbb {P}}\left( \sigma _{n}({\mathbf {X}}_n +{\mathbf {N}}_n)\le n^{-A}\right) = O(n^{-B}). \end{aligned}$$

2 Universality of random block matrices

The goal of this note is to study a class of random matrices that generalizes the random matrix ensemble discussed above. In particular, we consider random block matrices whose entries are not necessarily independent. We will show that, under some moment assumptions, the limiting ESD of these block matrices is also given by the circular law.

2.1 Quaternions and matrices of quaternions

One of the prototypical examples of a block matrix is that of a quaternionic matrix. We now review some preliminary facts about quaternions and matrices of quaternions. Most of these results can be found in the detailed survey by Zhang [37]. Let \({\mathbb {H}}\) denote the non-commutative field of quaternions. As a real vector space, \({\mathbb {H}}\) admits a basis \(\{1,{\mathbf {i}},{\mathbf {j}},{\mathbf {k}}\}\) with the usual multiplication table: \(1\) is the identity element and

$$\begin{aligned} {\mathbf {i}}^2 = {\mathbf {j}}^2 = {\mathbf {k}}^2 = -1, \quad {\mathbf {i}}{\mathbf {j}}= -{\mathbf {j}}{\mathbf {i}}= {\mathbf {k}}, \quad {\mathbf {j}}{\mathbf {k}}= - {\mathbf {k}}{\mathbf {j}}= {\mathbf {i}}, \quad {\mathbf {k}}{\mathbf {i}}= - {\mathbf {i}}{\mathbf {k}}= {\mathbf {j}}. \end{aligned}$$

For \(q = q_0 + q_1 {\mathbf {i}}+ q_2 {\mathbf {j}}+ q_3 {\mathbf {k}}\in {\mathbb {H}}\), we define \(q^*:= q_0 - q_1 {\mathbf {i}}- q_2 {\mathbf {j}}- q_3 {\mathbf {k}}\), \(\mathrm{Re}(q) := q_0\), and \(\mathrm{Im}(q) := q_1 {\mathbf {i}}+ q_2 {\mathbf {j}}+ q_3 {\mathbf {k}}\). Then

$$\begin{aligned} q q^*= q_0^2 + q_1^2 + q_2^2 + q_3^2, \end{aligned}$$

and thus any nonzero quaternion is invertible. Define the norm \(|q| : = \sqrt{q q^*}\). It follows that for any \(q,q' \in {\mathbb {H}}\), \(|q q'| = |q| |q'|\). Real numbers and complex numbers can be thought of as quaternions in the natural way, and one has \({\mathbb {R}}\subset {\mathbb {C}}\subset {\mathbb {H}}\). Every quaternion \(q = q_0 + q_1 {\mathbf {i}}+ q_2 {\mathbf {j}}+ q_3 {\mathbf {k}}\) can be written uniquely as \(q = c_1 + c_2{\mathbf {j}}\) where \(c_1 = q_0 + q_1{\mathbf {i}}\), \(c_2 = q_2 + q_3 {\mathbf {i}}\) are complex numbers.
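The decomposition \(q = c_1 + c_2 {\mathbf {j}}\) identifies \(q\) with the \(2 \times 2\) complex matrix \(\begin{bmatrix} c_1 & c_2 \\ -\bar{c}_2 & \bar{c}_1 \end{bmatrix}\), under which quaternion multiplication becomes matrix multiplication and \(|q|^2\) becomes the determinant; multiplicativity of \(|\cdot |\) then follows from \(\det (AB) = \det (A)\det (B)\). The sketch below is ours, with arbitrary sample quaternions, and checks this numerically.

```python
import numpy as np

# Illustrative sketch (not from the paper): represent q = c1 + c2*j by the
# 2x2 complex matrix [[c1, c2], [-conj(c2), conj(c1)]]; the matrix product
# realizes the quaternion product and det gives |q|^2.
def quat(c1, c2):
    return np.array([[c1, c2], [-np.conj(c2), np.conj(c1)]])

q  = quat(1 + 2j, 3 - 1j)     # q  = 1 + 2i + 3j - k (illustrative values)
qp = quat(0.5 - 1j, 2 + 2j)

norm = lambda m: np.sqrt(np.linalg.det(m).real)
r = q @ qp
# the product stays in the representable form, and |q q'| = |q| |q'|
assert np.isclose(r[1, 1], np.conj(r[0, 0]))
assert np.isclose(r[1, 0], -np.conj(r[0, 1]))
assert np.isclose(norm(r), norm(q) * norm(qp))
```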

We say that two quaternions \(q, q'\) are similar if there exists a nonzero quaternion \(x\) such that \(q = x q' x^{-1}\). We let \( {\mathbb {S}}({\mathbb {H}})\) denote the group of quaternions with norm one. It follows that \(q,q'\) are similar if and only if there exists \(x \in {\mathbb {S}}({\mathbb {H}})\) with \(q = x q' x^*\). The following lemma shows that every quaternion is similar to a complex number.

Lemma 2.1

[37] If \(q = q_0 + q_1 {\mathbf {i}}+ q_2 {\mathbf {j}}+ q_3 {\mathbf {k}}\in {\mathbb {H}}\), then \(q\) and \(\mathrm{Re}(q) + |\mathrm{Im}(q)| {\mathbf {i}}\) are similar.

Let \({\mathbf {M}}\) be an \(n \times n\) matrix with quaternion entries. Then \(\lambda \in {\mathbb {H}}\) is called a right eigenvalue of \({\mathbf {M}}\) if there exists a nonzero vector \(X \in {\mathbb {H}}^n\) such that \({\mathbf {M}}X = X \lambda \). If \(\lambda \) is a right eigenvalue of \({\mathbf {M}}\), one finds that \(q \lambda q^{-1}\) is also a right eigenvalue of \({\mathbf {M}}\) for any nonzero quaternion \(q\). Hence the right spectrum of \({\mathbf {M}}\) is either infinite or contained in \({\mathbb {R}}\). By Lemma 2.1, we may restrict our attention to complex right eigenvalues. We consider the (unique) decomposition \({\mathbf {M}}= {\mathbf {M}}_1 + {\mathbf {M}}_2 {\mathbf {j}}\), where \({\mathbf {M}}_1, {\mathbf {M}}_2\) are complex matrices. Then for any \(\lambda \in {\mathbb {C}}\) and \(X = Y + Z {\mathbf {j}}\) with \(Y,Z \in {\mathbb {C}}^n\), the following are equivalent:

  1. (i)

    \({\mathbf {M}}X = X \lambda \),

  2. (ii)

    \(\begin{bmatrix} {\mathbf {M}}_1&\quad {\mathbf {M}}_2 \\ -\overline{{\mathbf {M}}}_2&\quad \overline{{\mathbf {M}}}_1 \end{bmatrix} \begin{bmatrix} Y \\ - \overline{Z} \end{bmatrix} = \lambda \begin{bmatrix} Y \\ -\overline{Z} \end{bmatrix}\),

  3. (iii)

    \(\begin{bmatrix} {\mathbf {M}}_1&\quad {\mathbf {M}}_2 \\ -\overline{{\mathbf {M}}}_2&\quad \overline{{\mathbf {M}}}_1 \end{bmatrix} \begin{bmatrix} Z \\ \overline{Y} \end{bmatrix} = \bar{\lambda } \begin{bmatrix} Z \\ \overline{Y} \end{bmatrix}\).

Thus, the right spectrum of \({\mathbf {M}}\), when restricted to complex numbers, is given by the \(2n\) eigenvalues of the complex matrix

$$\begin{aligned} \begin{bmatrix} {\mathbf {M}}_1&\quad {\mathbf {M}}_2 \\ -\overline{{\mathbf {M}}}_2&\quad \overline{{\mathbf {M}}}_1 \end{bmatrix}. \end{aligned}$$

Moreover, the complex eigenvalues appear as conjugate pairs. The whole set of right eigenvalues of \({\mathbf {M}}\) is then the union of all similarity classes of the complex right eigenvalues of \({\mathbf {M}}\).
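The conjugate-pair structure is easy to observe numerically. The following sketch is ours, with arbitrary block sizes: it builds the complex matrix above from random blocks \({\mathbf {M}}_1, {\mathbf {M}}_2\) and checks that its spectrum is closed under complex conjugation.

```python
import numpy as np

# Illustrative check (not from the paper): for random complex blocks M1, M2,
# the spectrum of [[M1, M2], [-conj(M2), conj(M1)]] is closed under complex
# conjugation, so the 2n eigenvalues appear in conjugate pairs.
rng = np.random.default_rng(2)
n = 4
M1 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M2 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
chi = np.block([[M1, M2], [-M2.conj(), M1.conj()]])
eigs = np.linalg.eigvals(chi)

assert eigs.shape == (2 * n,)
# each eigenvalue's conjugate is (numerically) again an eigenvalue
for lam in eigs:
    assert np.min(np.abs(eigs - np.conj(lam))) < 1e-8
```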

2.2 Random quaternionic matrices

Let \(\xi \) be a real random variable with mean zero and variance \(1/4\). We study the right eigenvalues of random quaternion matrices whose entries are iid copies of \(\xi _0 + \xi _1 {\mathbf {i}}+ \xi _2 {\mathbf {j}}+ \xi _3 {\mathbf {k}}\), where \(\xi _0, \xi _1, \xi _2, \xi _3\) are iid copies of \(\xi \). From the discussion above, we find that this is equivalent to studying the eigenvalues of random complex block matrices. Indeed, the problem reduces to studying the eigenvalues of the \(2n \times 2n\) matrix

$$\begin{aligned} {\mathbf {X}}_n = \begin{bmatrix} {\mathbf {A}}_n&\quad {\mathbf {B}}_n \\ - \overline{{\mathbf {B}}}_n&\quad \overline{{\mathbf {A}}}_n \end{bmatrix}, \end{aligned}$$
(2.1)

where \({\mathbf {A}}_n, {\mathbf {B}}_n\) are independent \(n \times n\) complex matrices whose entries are iid copies of \(\xi _0 + \xi _1 {\mathbf {i}}\). We note, however, that the entries of \({\mathbf {X}}_n\) are not independent. Thus, Theorem 1.1 cannot be applied to the block matrix \({\mathbf {X}}_n\).

In the case that \(\xi \) is Gaussian (i.e. the quaternionic Ginibre ensemble), the circular law was established by Benaych-Georges and Chapon [4] using logarithmic potential theory. We will verify the circular law for random quaternionic matrices when the atom variable \(\xi \) is non-Gaussian.

Theorem 2.2

(Universality for quaternion random matrices) Let \(\xi \) be a complex random variable with mean zero and variance \(1/2\), and suppose \({\mathbb {E}}[\xi ^2] = 0\) and \({\mathbb {E}}|\xi |^{2+\eta } < \infty \) for some \(\eta > 0\). For each \(n \ge 1\), let \({\mathbf {A}}_n, {\mathbf {B}}_n\) be independent \(n \times n\) matrices whose entries are iid copies of \(\xi \), and let \({\mathbf {X}}_n\) be the \(2n \times 2n\) matrix defined in (2.1). For each \(n \ge 1\), let \({\mathbf {N}}_n\) be a deterministic \(2n \times 2n\) matrix, and suppose the sequence \(\{{\mathbf {N}}_n\}_{n \ge 1}\) satisfies \({{\mathrm{rank}}}({\mathbf {N}}_n) = O(n^{1-{\varepsilon }})\) and \(\sup _{n \ge 1} \frac{1}{n^2} \Vert {\mathbf {N}}_n\Vert _2^2 < \infty \), for some \({\varepsilon }> 0\). Then the ESD of \(\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n)\) converges almost surely to the circular law \(F_{\mathrm {circ}}\) as \(n \rightarrow \infty \).

2.3 Random block matrices

More generally, we will study random block matrices of the form

$$\begin{aligned} {\mathbf {X}}_n = \begin{bmatrix} {\mathbf {A}}_n&\quad {\mathbf {B}}_n \\ {\mathbf {C}}_n&\quad {\mathbf {D}}_n \end{bmatrix}, \end{aligned}$$
(2.2)

where \({\mathbf {A}}_n = (a_{ij})_{i,j=1}^n, {\mathbf {B}}_n = (b_{ij})_{i,j=1}^n, {\mathbf {C}}_n = (c_{ij})_{i,j=1}^n, {\mathbf {D}}_n = (d_{ij})_{i,j=1}^n\), and

$$\begin{aligned} \{(a_{ij}, b_{ij}, c_{ij}, d_{ij}) : 1 \le i,j \le n \} \end{aligned}$$

is a collection of iid copies of the random vector \((\xi _1,\xi _2,\xi _3,\xi _4)\). Here the random variables \(\xi _1, \xi _2, \xi _3, \xi _4\) are not required to be independent.

This ensemble of block matrices was proposed by Tao at the AIM Workshop on Random Matrices as a matrix model with dependent entries in which the circular law is still expected to hold. We will prove the circular law for this ensemble of random block matrices under some moment assumptions on the atom variables \(\xi _1,\xi _2,\xi _3,\xi _4\).

The matrix \({\mathbf {X}}_n\) in (2.2) can be viewed as a \(2 \times 2\) block matrix. More generally, we will study \(d \times d\) block matrices for any \(d \ge 2\). We begin with the following definition.

Definition 2.3

(Random block matrices with dependent entries; Condition C0) Let \(d \ge 2\). Let \((\xi _{st})_{s,t=1}^d\) be a \(d \times d\) complex random matrix where each entry \(\xi _{st}\) has mean zero and variance \(1/d\). For each \(s,t \in \{1,\ldots ,d\}\), let \(\{ x_{st;ij}\}_{i,j \ge 1}\) be an infinite double array of complex random variables, all defined on the same probability space. For each \(n \ge 1\) and all \(s,t \in \{1,\ldots ,d\}\), define the \(n \times n\) random matrix \({\mathbf {X}}_{n,st} := (x_{st;ij})_{i,j=1}^n\). Define the \(dn \times dn\) random block matrix

$$\begin{aligned} {\mathbf {X}}_n := \begin{bmatrix} {\mathbf {X}}_{n,11}&\quad \ldots&\quad {\mathbf {X}}_{n,1d} \\ \vdots&\quad \ddots&\quad \vdots \\ {\mathbf {X}}_{n,d1}&\quad \ldots&\quad {\mathbf {X}}_{n,dd} \end{bmatrix} = \left( {\mathbf {X}}_{n,st} \right) _{s,t=1}^d. \end{aligned}$$

We say the sequence of matrices \(\{{\mathbf {X}}_n\}_{n \ge 1}\) satisfies condition C0 with parameter \(d\) and atom variables \((\xi _{st})_{s,t=1}^d\) if the following conditions hold:

  1. (i)

\(\{ (x_{st;ij})_{s,t=1}^d : i,j \ge 1 \}\) is a collection of iid copies of \((\xi _{st})_{s,t=1}^d\),

  2. (ii)

    We have \({\mathbb {E}}\left[ \xi _{st} \overline{\xi _{uv}}\right] = 0\) for all \((s,t) \ne (u,v)\).

In Theorem 2.4 below, we establish the circular law for a class of random block matrices that satisfy condition C0. In particular, Theorem 2.2 is a corollary of the following theorem in the case that \(d=2\).
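To see concretely why Theorem 2.2 fits this framework, note that the quaternionic model (2.1) has atom matrix \((\xi , \zeta ; -\bar{\zeta }, \bar{\xi })\) with \(\xi , \zeta \) iid, and the only correlations that do not vanish automatically are \({\mathbb {E}}[\xi ^2]\) and \(-{\mathbb {E}}[\zeta ^2]\); the hypothesis \({\mathbb {E}}[\xi ^2] = 0\) is what makes condition (ii) hold. The Monte Carlo sketch below is ours; the sample size, tolerance, and Gaussian atoms are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo sketch (not from the paper): for the quaternionic atom matrix
# (xi, zeta; -conj(zeta), conj(xi)) with iid standard complex Gaussian xi, zeta
# (so E[xi^2] = 0), all pairwise correlations E[xi_st * conj(xi_uv)] with
# (s,t) != (u,v) vanish, i.e. condition C0 (ii) holds.
rng = np.random.default_rng(4)
N = 200_000
xi = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
zeta = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

atoms = {(1, 1): xi, (1, 2): zeta, (2, 1): -zeta.conj(), (2, 2): xi.conj()}
for st in atoms:
    for uv in atoms:
        if st != uv:
            corr = np.mean(atoms[st] * atoms[uv].conj())
            assert abs(corr) < 0.02  # zero up to sampling noise
```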

Theorem 2.4

(Universality for random block matrices) Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume that

$$\begin{aligned} \max _{1 \le s,t \le d} {\mathbb {E}}|\xi _{st}|^{2+\eta } < \infty \end{aligned}$$

for some \(\eta > 0\). For each \(n \ge 1\), let \({\mathbf {N}}_n\) be a deterministic \(dn \times dn\) matrix, and suppose the sequence \(\{{\mathbf {N}}_n\}_{n \ge 1}\) satisfies \({{\mathrm{rank}}}({\mathbf {N}}_n) = O(n^{1-{\varepsilon }})\) and \(\sup _{n\ge 1} \frac{1}{n^2} \Vert {\mathbf {N}}_n\Vert _2^2 < \infty \) for some \({\varepsilon }> 0\). Then the ESD of \(\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n)\) converges almost surely to the circular law \(F_\mathrm {circ}\) as \(n \rightarrow \infty \).

In Definition 2.3, we require the atom variables \((\xi _{st})_{s,t=1}^d\) to be uncorrelated. In this note, we will not deal with the correlated case. However, when there is a correlation among the atom variables, we do not always expect the circular law to be the limiting distribution. In Fig. 2, we plot the eigenvalues of \(\frac{1}{\sqrt{2n}} {\mathbf {X}}_n\) in the case that

$$\begin{aligned} {\mathbf {X}}_n = \begin{bmatrix} {\mathbf {A}}_n&\quad {\mathbf {A}}_n \\ {\mathbf {A}}_n&\quad {\mathbf {B}}_n \end{bmatrix}, \end{aligned}$$
(2.3)

where \({\mathbf {A}}_n, {\mathbf {B}}_n\) are independent \(n \times n\) random matrices drawn from the real Ginibre ensemble. In particular, \({\mathbf {X}}_n\) does not satisfy condition (ii) of Definition 2.3. Figure 2 suggests that more of the eigenvalues concentrate near the origin, and so we do not expect the limiting distribution to be uniform on the unit disk.

Fig. 2

The eigenvalue plot of \(\frac{1}{\sqrt{2n}} {\mathbf {X}}_n\), when \(n=2000\), \({\mathbf {X}}_n\) is defined in (2.3), and \({\mathbf {A}}_n, {\mathbf {B}}_n\) are independent \(n \times n\) random matrices drawn from the real Ginibre ensemble. The eigenvalues appear to concentrate near the origin
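A small-\(n\) version of this experiment can be reproduced in a few lines. The sketch below is ours, not from the paper; the size \(n\) and the radius \(0.25\) are arbitrary choices.

```python
import numpy as np

# Illustrative sketch (not from the paper): sample the correlated block model
# (2.3) with real Gaussian A_n, B_n and inspect the spectrum of X_n / sqrt(2n).
rng = np.random.default_rng(3)
n = 300
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = np.block([[A, A], [A, B]])
eigs = np.linalg.eigvals(X / np.sqrt(2 * n))

# Fraction of eigenvalues within distance 0.25 of the origin; under the
# circular law this region would carry only about 0.25**2 of the mass.
print(np.mean(np.abs(eigs) <= 0.25))
```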

2.4 Least singular value bound

One of the key ingredients in the proof of Theorem 2.4 is a bound on the least singular value of random matrices \(\{{\mathbf {X}}_n\}_{n \ge 1}\) that satisfy condition C0. In particular, we establish the following result.

Theorem 2.5

(Least singular value bound) Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume that

$$\begin{aligned} \max _{1 \le s,t \le d} {\mathbb {E}}|\xi _{st}|^{2+\eta } < \infty \end{aligned}$$

for some \(\eta > 0\). For each \(n \ge 1\), let \({\mathbf {N}}_n\) be a deterministic \(dn \times dn\) matrix whose entries are bounded by \(n^{\alpha }\) in absolute value for some \(\alpha > 0\). Then, for every \(B > 0\), there exists \(A>0\) (depending only on \(d, B, \alpha \)) such that

$$\begin{aligned} {\mathbb {P}}\left( \sigma _{dn} ({\mathbf {X}}_n + {\mathbf {N}}_n) \le n^{-A} \right) =O(n^{-B}) . \end{aligned}$$

2.5 Overview

The proof of Theorem 2.5 requires studying an inverse Littlewood–Offord problem for random multilinear forms. To this end, we introduce the Littlewood–Offord problem and random multilinear forms in Sect. 3. Sections 4, 5, and 6 contain the proof of Theorem 2.5. Finally, we prove Theorem 2.4 in Sect. 7. A number of auxiliary results are contained in the appendix.

2.6 Notation

We use asymptotic notation (such as \(O,o,\Omega , \asymp \)) under the assumption that \(n \rightarrow \infty \). We use \(X \ll Y, Y \gg X, Y=\Omega (X)\), or \(X = O(Y)\) to denote the bound \(X \le CY\) for all sufficiently large \(n\) and for some constant \(C\). Notations such as \(X \ll _k Y\) and \(X=O_k(Y)\) mean that the hidden constant \(C\) depends on another constant \(k\). \(X=o(Y)\) or \(Y=\omega (X)\) means that \(X/Y \rightarrow 0\) as \(n \rightarrow \infty \).

We let \(\Vert {\mathbf {M}}\Vert _2\) denote the Hilbert–Schmidt norm of \({\mathbf {M}}\) [defined in (1.1)], and let \(\Vert {\mathbf {M}}\Vert \) denote the spectral norm of \({\mathbf {M}}\). For a vector \({\mathbf {v}}\), we let \(\Vert {\mathbf {v}}\Vert = \Vert {\mathbf {v}}\Vert _2\) denote the Euclidean norm of \({\mathbf {v}}\).

We let \({\mathbf {I}}_n\) denote the \(n \times n\) identity matrix. Often we will just write \({\mathbf {I}}\) for the identity matrix when the size can be deduced from the context. Similarly, we let \({\mathbf 0}\) denote the zero matrix.

For an event \(E\), we let \(\mathbf {1}_{E}\) denote the indicator function of the event \(E\). We write a.s., a.a., and a.e. for almost surely, Lebesgue almost all, and Lebesgue almost everywhere respectively. We use \(\sqrt{-1}\) to denote the imaginary unit and reserve \(i\) as an index.

3 The Littlewood–Offord problem and random multilinear forms

In this section, we introduce the Littlewood–Offord problem and some anti-concentration results for random multilinear forms, which will be used to prove Theorem 2.5.

3.1 The Littlewood–Offord problem

Let \(\xi \) be a real random variable with mean zero and unit variance. A large portion of classical probability theory is devoted to studying random sums \(S_\xi (A) := \sum _{i=1}^n a_i x_i\), where \(A =\{a_1,\ldots , a_n\}\) is a multiset of vectors in \({\mathbb {C}}^d\) and \(x_1,\ldots ,x_n\) are iid copies of \(\xi \). The Littlewood–Offord problem is to estimate the small ball probability

$$\begin{aligned} \rho _{\beta ,\xi }(A) := \sup _{z \in {\mathbb {C}}^d} {\mathbb {P}}( \Vert S_\xi (A) - z\Vert \le \beta ). \end{aligned}$$

In particular, if \(\rho _{\beta ,\xi }(A)\) is small, then the random sum \(S_\xi (A)\) is well spread. Conversely, if \(\rho _{\beta ,\xi }(A)\) is large, then the random sum concentrates near a point.

A classical result of Littlewood and Offord [16], which was strengthened by Erdős [8], gives an estimate for the small ball probability when \(\xi \) is a Bernoulli random variable (takes values \(\pm 1\) each with probability \(1/2\)) and \(d=1\).

Theorem 3.1

(Erdős [8]) Let \(\xi \) be a Bernoulli random variable. If the complex numbers \(a_i\) satisfy \(|a_i| \ge 1\) for all \(i\), then

$$\begin{aligned} \rho _{1,\xi }(A) = O(n^{-1/2}). \end{aligned}$$

The reader is invited to consult [23] and references therein for further extensions of this result. Motivated by inverse theorems from additive combinatorics, Tao and Vu [29] consider the following phenomenon:

If \(\rho _{\beta ,\xi }(A)\) is large, then most of the elements of \(A\) are additively correlated.

In order to introduce the precise result, we recall the notion of a generalized arithmetic progression (GAP). A set \(Q\subset {\mathbb {C}}^d\) is a GAP of rank r if it can be expressed in the form

$$\begin{aligned} Q= \{g_0+ k_1g_1 + \dots +k_r g_r : K_i \le k_i \le K_i', k_i\in {\mathbb {Z}}\quad \hbox {for all } 1 \le i \le r\} \end{aligned}$$

for some \(g_0,\ldots ,g_r\in {\mathbb {C}}^d\), and some integers \(K_1,\ldots ,K_r,K'_1,\ldots ,K'_r\).

The vectors \(g_i\) are the generators of \(Q\), the numbers \(K_i\) and \(K_i'\) are the dimensions of \(Q\), and \({{\mathrm{Vol}}}(Q) := \prod _{i=1}^{r} (K_i' - K_i + 1)\) is the volume of \(Q\). We say that \(Q\) is proper if \(|Q| = {{\mathrm{Vol}}}(Q)\). If \(g_0=0\) and \(-K_i=K_i'\) for all \(i\ge 1\), we say that \(Q\) is symmetric.
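A small GAP can be enumerated directly; the sketch below is ours, with arbitrary generators, and illustrates that \(Q\) is proper exactly when the \(\prod _i (2K_i+1)\) integer combinations produce no collisions.

```python
import itertools

# Illustrative sketch (not from the paper): enumerate a symmetric GAP
# Q = {k1*g1 + k2*g2 : |ki| <= Ki} over the integers and compare |Q| with
# Vol(Q) = prod(2*Ki + 1).
def gap(generators, K):
    ranges = [range(-k, k + 1) for k in K]
    return {sum(k * g for k, g in zip(ks, generators))
            for ks in itertools.product(*ranges)}

g, K = (1, 10), (2, 2)            # rank-2 GAP in the integers
Q = gap(g, K)
vol = (2 * K[0] + 1) * (2 * K[1] + 1)
print(len(Q), vol)                # proper: prints 25 25
print(len(gap((1, 2), K)), vol)   # not proper: prints 13 25
```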

Consider a proper symmetric GAP \(Q= \{\sum _{i=1}^r k_ig_i : -K_i \le k_i \le K_i\}\) of rank \(r=O(1)\) and size \(N=n^{O(1)}\) in \({\mathbb {C}}\). Assume that \(\xi \) has Bernoulli distribution and that for each \(a_i\) there exists \(q_i\in Q\) such that \(|a_i-q_i|\le \delta \). Then, because the random sum \(\sum _i q_ix_i\) takes values in the GAP \(nQ:=\{\sum _{i=1}^r k_ig_i : -nK_i \le k_i \le nK_i\}\), a GAP of size \(|nQ| \le n^r N=n^{O(1)}\), the pigeon-hole principle implies that \(\sum _i q_ix_i\) takes some value in \(nQ\) with probability \(n^{-O(1)}\). Thus we have

$$\begin{aligned} \rho _{n\delta , \xi }(A) = n^{-O(1)}. \end{aligned}$$
(3.1)

This example shows that if \(A\) is close to a GAP of rank \(O(1)\) and size \(n^{O(1)}\), then \(A\) has large small ball probability. It was shown by Tao and Vu in [28, 29, 31, 32] that this is essentially the only example with large small ball probability. We state here an explicit version from [21] which will be used later on.
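This dichotomy can be observed numerically already for \(d=1\). The sketch below is ours, not from the paper; the choices \(n=100\), \(\beta =1\), and the two coefficient sets are arbitrary. It computes \(\rho _{1,\xi }(A)\) exactly for Bernoulli \(\xi \) by convolving the distribution of the partial sums, and compares coefficients lying in a rank-one GAP with spread-out coefficients.

```python
import numpy as np

# Illustrative sketch (not from the paper): exact small ball probabilities
# rho_{1,xi}(A) for Bernoulli xi, for structured coefficients (all a_i = 1,
# inside a rank-1 GAP) versus spread-out coefficients a_i = i.
def small_ball(a, beta=1):
    # exact distribution of S = sum a_i x_i, x_i = +-1, by convolution on the
    # integer lattice; a must be a list of positive integers
    offset = sum(a)
    dist = np.zeros(2 * offset + 1)
    dist[offset] = 1.0
    for ai in a:
        shifted = np.zeros_like(dist)
        shifted[ai:] += 0.5 * dist[:len(dist) - ai]   # step x_i = +1
        shifted[:len(dist) - ai] += 0.5 * dist[ai:]   # step x_i = -1
        dist = shifted
    # best probability of landing in a closed interval of length 2*beta
    window = 2 * beta + 1
    return max(dist[i:i + window].sum() for i in range(len(dist) - window + 1))

n = 100
rho_struct = small_ball([1] * n)                 # of order n**(-1/2)
rho_spread = small_ball(list(range(1, n + 1)))   # much smaller
print(rho_struct, rho_spread)
```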

We say that a vector \(a\) is \(\delta \)-close to a set \(Q\) if there exists \(q\in Q\) such that \(\Vert a-q\Vert \le \delta \).

Theorem 3.2

(Inverse Littlewood–Offord theorem for linear forms [21]) Let \(0 <\varepsilon < 1\) and \(B>0\). Let \( \beta >0\) be a parameter that may depend on \(n\). Suppose that \(\sum _i \Vert a_i\Vert ^2 =1\) and

$$\begin{aligned} \rho :=\rho _{\beta ,\xi }(A) \ge n^{-B}, \end{aligned}$$

where \(x_1,\ldots ,x_n\) are iid copies of a random variable \(\xi \) having bounded \((2+\eta )\)-moment. Then, for any number \(n'\) between \(n^\varepsilon \) and \(n\), there exists a proper symmetric GAP \(Q=\{\sum _{i=1}^r k_ig_i : |k_i|\le K_i \}\) such that

  • At least \(n-n'\) of the elements \(a_i\) are \(\beta \)-close to \(Q\).

  • \(Q\) has small rank, \(r=O_{B,\varepsilon }(1)\), and small size

    $$\begin{aligned} |Q| \le \max \left\{ O_{B,\varepsilon }\left( \frac{\rho ^{-1}}{\sqrt{n'}}\right) ,1\right\} . \end{aligned}$$
  • There is a non-zero integer \(p=O_{B,\varepsilon }(\sqrt{n'})\) such that all generators \(g_i\) of \(Q\) have the form \(g_i=(g_{i1},\dots ,g_{id})\), where \(g_{ij}=\beta \frac{p_{ij}}{p} \) with \(p_{ij} \in {\mathbb {Z}}\) and \(|p_{ij}|=O_{B,\varepsilon }(\beta ^{-1} \sqrt{n'}).\)

3.2 Random multilinear forms

One can view the sum \(S_\xi (A) =a_1 x_1+\dots +a_n x_n\) as a linear function of the random variables \(x_1,\dots , x_n\). It is natural to study general polynomials of higher degree.

Let \(D\) be a fixed positive integer. Let \(x_{1i_1},x_{2i_2},\dots ,x_{Di_D}\), \(1\le i_1,\dots ,i_D \le n\), be iid copies of a random variable \(\xi \), and let \(A=(a_{i_1i_2 \dots i_D})_{1\le i_1,\dots ,i_D\le n}\) be an \(n^D\)-array of complex numbers. We define the D-multilinear concentration probability of \(A\) by

$$\begin{aligned} \rho _{\beta ,\xi }(A)&:= \sup _{a \in {\mathbb {C}}, L_{D-1}}{\mathbb {P}}\left( \sum _{1\le i_1,\ldots ,i_D\le n} a_{i_1i_2\ldots i_D}x_{1i_1}x_{2i_2} \ldots x_{Di_D} \right. \\&\quad \ \left. + L_{D-1}({\mathbf {x}}_1,{\mathbf {x}}_2,\ldots ,{\mathbf {x}}_D)\in B(a,\beta ) \right) , \end{aligned}$$

where \({\mathbf {x}}_i=(x_{i1},\dots ,x_{in})\) and \(L_{D-1}({\mathbf {x}}_1,\dots , {\mathbf {x}}_D)\) is any \((D-1)\)-multilinear form of \(({\mathbf {x}}_1,\dots ,{\mathbf {x}}_D)\).

We would like to characterize \(A\) with large \(\rho _{\beta ,\xi }(A)\). The following examples serve as good candidates.

Example 3.3

In what follows \(\xi \) has Bernoulli distribution and for each \(a_{i_1i_2\ldots i_D}\) there exists \(q_{i_1i_2\dots i_D}\) such that \(|a_{i_1i_2\dots i_D}-q_{i_1i_2\dots i_D}|\le \delta .\)

  1. (1)

    Let \(Q\) be a proper symmetric GAP of rank \(r=O(1)\) and size \(n^{O(1)}\). Assume that the approximating values \(q_{i_1i_2\dots i_D}\) belong to \(Q\). Then, the pigeon-hole principle implies that \(\sum _{i_1,\dots ,i_D}q_{i_1i_2\dots i_D}x_{1i_1}\dots x_{Di_D}\) takes some value in \(n^DQ\) with probability \(n^{-O(1)}\). Passing back to \(a_{i_1i_2\dots i_D}\), we obtain \(\rho _{n^D\delta ,\xi }(A) =n^{-O(1)}\).

  2. (2)

    Assume that \(q_{i_1i_2\dots i_D}\) can be written as \(q_{i_1i_2\dots i_D}=k_{i_1}b_{\bar{i}_1i_2\dots i_D}+l_{i_2}b_{i_1\bar{i}_2\dots i_D}+\dots + m_{i_D} b_{i_1i_2\dots \bar{i}_D}\), where \(b_{\bar{i}_1i_2\dots i_D},\ldots ,b_{i_1i_2\dots \bar{i}_D}\) are arbitrary complex sequences not depending on the indices \(i_1,\dots ,i_D\) respectively, and \(k_{i_1},l_{i_2},\dots , m_{i_D}\) are integers bounded by \(n^{O(1)}\) such that

    $$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_1}\left( \sum _{i_1} k_{i_1}x_{1i_1}= 0\right)&=n^{-O(1)},\dots , \\ {\mathbb {P}}_{{\mathbf {x}}_D}\left( \sum _{i_D} m_{i_D}x_{Di_D}= 0\right)&=n^{-O(1)}. \end{aligned}$$

    Then, as \(\sum _{i_1,i_2,\dots ,i_D}q_{i_1i_2\dots i_D}x_{1i_1}x_{2i_2}\dots x_{Di_D}\) factors out, we have

    $$\begin{aligned} {\mathbb {P}}\left( \sum _{i_1,i_2,\dots ,i_D}q_{i_1i_2\dots i_D}x_{1i_1}x_{2i_2}\dots x_{Di_D} =0\right) =n^{-O(1)}. \end{aligned}$$

    Passing back to \(a_{i_1i_2\dots i_D}\), we hence obtain \(\rho _{n^D\delta ,\xi }(A) =n^{-O(1)}.\)

  3. (3)

    Assume that \(q_{i_1i_2\dots i_D}=q_{i_1i_2\dots i_D}' +q_{i_1i_2\dots i_D}''\), where \(q_{i_1i_2\dots i_D}'\in Q\), a proper symmetric GAP of rank \(O(1)\) and size \(n^{O(1)}\), and \(q_{i_1i_2\dots i_D}''\) is a sum of a few forms from (2), arranged so that the corresponding linear factors vanish with probability \(n^{-O(1)}\). As such, we have

    $$\begin{aligned} \sup _{q\in n^DQ}{\mathbb {P}}_{{\mathbf {x}}_1,\dots ,{\mathbf {x}}_D}\left( \sum _{i_1,\dots ,i_D}q_{i_1i_2\dots i_D}x_{1i_1}x_{2i_2}\dots x_{Di_D} =q\right) =n^{-O(1)}. \end{aligned}$$

    Hence we also have \(\rho _{n^D\delta ,\xi }(A) =n^{-O(1)}\) in this case.

The above examples demonstrate that if the \(a_{i_1i_2\dots i_D}\) can be decomposed into additive and algebraic structural parts, then \(\rho _{\beta ,\xi }(A)\) is large. We conjecture that these are essentially the only cases with large concentration probability.

Conjecture 3.4

If \(\rho _{\beta ,\xi }(A) \ge n^{-B}\) for a generic random variable \(\xi \) and small \(\beta \), then most of the elements of \(A\) can be \(\beta \)-approximated by a set of \(q_{i_1i_2\dots i_D}\) as in (3) of Example 3.3.

Due to its nature, we believe that any justification of Conjecture 3.4 would be highly technical. In this note we prove a weak version of it as follows.

Theorem 3.5

(Weak inverse-type theorem for multilinear forms) Let \(0 <\varepsilon < 1\) and \(C>0\). Let \( \beta >0\) be a parameter that may depend on \(n\). Assume that

$$\begin{aligned} \rho&= \sup _{a,L_{D-1}} {\mathbb {P}}_{{\mathbf {x}}_1,\dots ,{\mathbf {x}}_D}\left( \sum _{i_1,i_2,\dots ,i_D}a_{i_1i_2\dots i_D}x_{1i_1}x_{2i_2}\dots x_{Di_D}\right. \\&\quad \left. -L_{D-1}({\mathbf {x}}_1,\dots ,{\mathbf {x}}_D) \in B(a,\beta )\right) \ge n^{-C}, \end{aligned}$$

where \({\mathbf {x}}_1=(x_{11},\dots ,x_{1n}),\dots ,{\mathbf {x}}_D=(x_{D1},\dots ,x_{Dn})\), and \(x_{1i_1},\dots ,x_{Di_D}\) are iid copies of a random variable \(\xi \) with bounded \((2+\eta )\)-moment. Then there exist index sets \(I_1,I_1^0\) with \(|I_1|=n-n^\varepsilon \) and \(|I_1^0|=O_{C,\varepsilon }(1)\) such that for any \(i_1\in I_1\), there exist index sets \(I_2,I_2^0\) depending on \(i_1\) with \(|I_2|=n-n^\varepsilon \) and \(|I_2^0|=O_{C,\varepsilon }(1)\), and so on, until there exist index sets \(I_{D-1},I_{D-1}^0\) depending on \(i_1,\dots ,i_{D-2}\) with \(|I_{D-1}|=n-n^\varepsilon \) and \(|I_{D-1}^0|=O_{C,\varepsilon }(1)\), such that the following holds: for any \(i_{D-1}\in I_{D-1}\), there exist integers \(k_{j_1 \dots j_{D-1}}\), where each index \(j_k\) with \(1\le k\le D-1\) either takes the value \(i_k\) or belongs to the thin set \(I_k^0\), such that \(|k_{j_1 \dots j_{D-1}}|\le n^{O_{C,D,\varepsilon }(1)}\) and \(k_{i_1\dots i_{D-1}}\ne 0\), as well as

$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_D}\left( \left| \sum _{1\le i_D \le n}\sum _{j_1 \dots j_{D-1}}k_{j_1 \dots j_{D-1}}a_{j_1 \dots j_{D-1} i_D}x_{i_D}\right| \le \beta n^{O_{C,\varepsilon }(1)}\right) \ge n^{-O_{C,\varepsilon }(1)}. \end{aligned}$$

Notice that while in Example 3.3 most of the \((D-1)\)-dimensional arrays of \(A\) have structure, Theorem 3.5 just asserts that most of the one-dimensional arrays \(a_{j_1 \dots j_{D-1} i_D}, 1\le i_D\le n\), with fixed \(j_1,\dots ,j_{D-1}\), have structure.

For the rest of this section we give a proof of Theorem 3.5. Our argument heavily relies on the following simple fact about GAPs of small rank.

Fact 3.6

Assume that \(q_1,\dots ,q_{r+1}\) are elements of a GAP of rank \(r\) and of cardinality \(n^C\). Then there exist integers \(\alpha _1,\dots ,\alpha _{r+1}\), not all zero, with \(|\alpha _i|\le n^{rC}\), such that

$$\begin{aligned} \sum _i \alpha _i q_i =0. \end{aligned}$$
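For instance, in the rank-one case \(Q=\{kg: k\in {\mathbb {Z}},\ |k|\le L\}\) with \(|Q|\le n^C\) (so \(L\le n^C\)), write \(q_1=k_1g\) and \(q_2=k_2g\); taking \(\alpha _1:=k_2\) and \(\alpha _2:=-k_1\) gives \(|\alpha _i|\le n^C\) and

$$\begin{aligned} \alpha _1 q_1+\alpha _2 q_2 = k_2k_1g-k_1k_2g=0. \end{aligned}$$

In general, the \(r+1\) integer coefficient vectors of \(q_1,\dots ,q_{r+1}\) with respect to the \(r\) generators are linearly dependent, and the dependency can be realized with determinant-sized, hence \(n^{rC}\)-bounded, integer coefficients.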

3.3 Proof of Theorem 3.5

Without loss of generality, we assume that \(\xi \) has a discrete distribution; the continuous case follows by a standard limiting argument. We begin by applying Theorem 3.2.

Lemma 3.7

Let \(\varepsilon <1\) and \(C\) be positive constants. Assume that \(\rho _{\xi ,\beta }(A)=\rho \ge n^{-C}\). Then the following holds with probability at least \(\frac{3\rho }{4}\) with respect to \({\mathbf {x}}_2,\dots , {\mathbf {x}}_D\). There exist a proper symmetric GAP \(Q_{{\mathbf {x}}_2\dots {\mathbf {x}}_D}\) of rank \(O_{C,\varepsilon }(1)\) and size \(O_{C,\varepsilon }(1/\rho )\) and a set \(I_{{\mathbf {x}}_2,\dots , {\mathbf {x}}_D}\) of \(n-n^\varepsilon \) indices such that for each \(i\in I_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\), there exists \(q_i\in Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) so that

$$\begin{aligned} \left| \sum _{i_2,\dots , i_D}a_{ii_2\dots i_D}x_{2i_2}\dots x_{Di_D}-q_i\right| \le \beta . \end{aligned}$$

Proof

(of Lemma 3.7) For short we write

$$\begin{aligned} \sum _{i_1,i_2,\dots ,i_D} a_{i_1i_2\dots i_D}x_{1i_1}x_{2i_2}\dots x_{Di_D} = \sum _{i=1}^n x_{1i} B_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D), \end{aligned}$$

where

$$\begin{aligned} B_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D):=\sum _{i_2,\dots ,i_D}a_{ii_2\dots i_D}x_{2i_2}\dots x_{Di_D}. \end{aligned}$$

We call a vector tuple \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) good if

$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_1}\left( \sum _{i=1}^n x_{1i}B_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in B(a,\beta )\right) \ge \rho /4. \end{aligned}$$

We call \({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D\) bad otherwise. Let \(G\) be the collection of good tuples.

First, we estimate the probability \(p\) of randomly chosen vectors \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) being bad by an averaging method.

$$\begin{aligned} \rho = {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D} {\mathbb {P}}_{{\mathbf {x}}_1} \left( \sum _{i=1}^n x_{1i}B_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in B(a,\beta )\right) \le p\cdot \rho /4 + (1-p), \end{aligned}$$

and hence \(p \le (1-\rho )/(1-\rho /4)\).

Thus, the probability of randomly chosen vectors \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) being good is at least

$$\begin{aligned} 1-p \ge (3\rho /4)/(1-\rho /4) \ge 3\rho /4. \end{aligned}$$

Next, we consider good vectors \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G\). By definition, we have

$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_1} \left( \sum _{i=1}^n x_{1i} B_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D) \in B(a,\beta )\right) \ge \rho /4 . \end{aligned}$$

Observe that this is a concentration bound for a linear form in the variables \(x_{1i}\). A direct application of Theorem 3.2 to the sequence \(B_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\), \(i=1,\dots ,n\), yields the desired result. \(\square \)

By a useful property of GAP containment (see for instance [30, Section 8] and [20, Theorem 6.1]), we may assume that the \(q_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) span \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\). From now on we fix such a \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) for each \({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D\). Recall that \(G\) denotes the collection of good tuples \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\). Thus,

$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G)\ge 3\rho /4. \end{aligned}$$
(3.2)

Now we state our crucial lemma for the proof of Theorem 3.5.

Lemma 3.8

There exist an index set \(I\) of size at least \(n-2n^\varepsilon \), an index set \(I_0\) of size \(O_{C,\varepsilon }(1)\), and an integer \(k\ne 0\) with \(|k|\le n^{O_{C,\varepsilon }(1)}\) such that for any index \(i\in I\), there are integers \(k_{ii_0}, i_0\in I_0\), all bounded by \(n^{O_{C,\varepsilon }(1)}\) in absolute value, such that

$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\left( k B_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)+ \sum _{i_0\in I_0} k_{ii_0} B_{i_0}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in B(a,\beta )\right) =\rho /n^{O_{C,\varepsilon }(1)}. \end{aligned}$$

Proof

(of Lemma 3.8) For each \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G\), we choose from \(I_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) \(s\) indices \(i_{(1,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)},\dots ,i_{{(s,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}}\) such that \(q_{i_{{(j,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}}}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) span \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\), where \(s\) is the rank of \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\). We note that \(s=O_{C,\varepsilon }(1)\) for all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G\).

Consider the tuples \((i_{{(1,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}}, \dots ,i_{{(s,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}})\) for all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G\). Because there are \(\sum _{s} O_{C,\varepsilon ,\mu }(n^s) = n^{O_{C,\varepsilon ,\mu }(1)}\) possibilities these tuples can take, there exists a tuple, say \((1,\dots ,r)\) (by rearranging the rows of \(A\) if needed), such that \((i_{{(1,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}},\dots , i_{{(s,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}})=(1,\dots ,r)\) for all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'\), where \(G'\) is a subset of \(G\) satisfying

$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G')\ge {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G)/n^{O_{C,\varepsilon }(1)} =\rho /n^{O_{C,\varepsilon }(1)}. \end{aligned}$$
(3.3)

For each \(1\le i\le r\), we express \(q_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) in terms of the generators of \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) for each \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'\),

$$\begin{aligned}&q_{i}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D) \\&\quad =c_{i1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)g_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)+\dots + c_{ir}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)g_{r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D), \end{aligned}$$

where \(c_{i1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\dots c_{ir}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) are integers bounded by \(n^{O_{C,\varepsilon }(1)}\), and \(g_{i}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) are the generators of \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\).

We show that there are many \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) that correspond to the same coefficients \(c_{ij}\).

Claim 3.9

There exists a (“dense”) subset \(G''\subset G'\) such that the following hold:

  • \({\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'')\ge {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G')/n^{O_{C,\varepsilon }(1)} \ge \rho /n^{O_{C,\varepsilon }(1)};\)

  • (common tuples) there exist \(r\) tuples \((c_{11},\dots ,c_{1r}),\dots , (c_{r1},\dots c_{rr})\), whose components are integers bounded by \(n^{O_{C,\varepsilon ,\mu }(1)}\), such that the following hold for all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G''\):

    1. (1)

      \(q_{i}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D) = c_{i1}g_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)+\dots + c_{ir}g_{r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\), for \(i=1,\dots ,r\);

    2. (2)

      The vectors \((c_{11},\dots ,c_{1r}),\dots , (c_{r1},\dots c_{rr})\) span \({\mathbb {Z}}^{{{\mathrm{rank}}}(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D})}\).

Proof

(of Claim 3.9) Consider the collection \(\mathcal {C}\) of the coefficient-tuples

$$\begin{aligned} \mathcal {C}&:= \Big \{\Big (\big (c_{11}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\dots ,c_{1r} ({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\big );\dots ; \big (c_{r1}(\dots ),\dots c_{rr}(\dots )\big )\Big ),\\&\quad ({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'\Big \} . \end{aligned}$$

Because the number of possibilities these tuples can take is at most \((n^{O_{C,\varepsilon }(1)})^{r^2} =n^{O_{C,\varepsilon }(1)}\), by the pigeonhole principle there exists a coefficient-tuple, say \(\big ((c_{11},\dots ,c_{1r}),\dots ,(c_{r1},\dots ,c_{rr})\big )\in \mathcal {C}\), such that

$$\begin{aligned}&\Big (\big (c_{11}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\dots ,c_{1r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\big );\dots ;\big (c_{r1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\\&\quad \dots c_{rr}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\big )\Big )\\&\quad =\Big ((c_{11},\dots ,c_{1r}),\dots , (c_{r1},\dots c_{rr})\Big ) \end{aligned}$$

for all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) from a subset \(G''\) of \(G'\) which satisfies

$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'')&\ge {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G')/n^{O_{C,\varepsilon }(1)}\nonumber \\&\ge \rho /n^{O_{C,\varepsilon }(1)}. \end{aligned}$$
(3.4)

\(\square \)

Now we focus on the elements of \(G''\). Because \(|I_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}|\ge n-n^\varepsilon \) for each \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G''\), we obtain the following.

Claim 3.10

There is a set \(I\) of size \(n-3n^\varepsilon \) such that \(I \cap \{1,\dots ,r\} =\emptyset \) and for each \(i\in I\) we have

$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(i\in I_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}, ({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'') \ge {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'')/2. \end{aligned}$$
(3.5)

Proof

(of Claim 3.10) The result follows easily from an elementary averaging argument. \(\square \)

Lemma 3.8: proof conclusion Now we fix an arbitrary index \(i\) from \(I\). We concentrate on those \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G''\) where the index \(i\) belongs to \(I_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\). Because \(q_{i}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D) \in Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\), we can write

$$\begin{aligned} q_{i}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)= c_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)g_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)+\dots + c_{r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)g_{r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D), \end{aligned}$$

where \(c_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\dots ,c_r({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) are integers bounded by \(n^{O_{C,\varepsilon }(1)}\).

For short, we denote by \(\mathbf {v}_{i,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) the vector \((c_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\dots ,c_{r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D))\); we also write \({\mathbf {v}}_j\) for the vectors \((c_{j1},\dots ,c_{jr})\) obtained from Claim 3.9.

Because \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) is spanned by \(q_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\dots , q_{r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\), we must have \(k:=\det (\mathbf {v}_1,\dots ,\mathbf {v}_r)\ne 0\) and that

$$\begin{aligned}&k q_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D) + \det (\mathbf {v}_{i,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D},\mathbf {v}_2,\dots ,\mathbf {v}_r)q_{1} ({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\\&\quad +\dots + \det (\mathbf {v}_{i,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D},\mathbf {v}_1,\dots ,\mathbf {v}_{r-1})q_{r} ({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)=0. \end{aligned}$$

Furthermore, because each coefficient of the identity above is bounded by \(n^{O_{C,\varepsilon ,\mu }(1)}\), there exists a subset \(G_{i}''\) of \(G''\) such that all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G_{i}''\) correspond to the same identity, and

$$\begin{aligned} {\mathbb {P}}_{({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G_{i}'')&\ge ({\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'')/2)/(n^{O_{C,\varepsilon }(1)})^r\\&\ge \rho /n^{O_{C,\varepsilon }(1)}. \end{aligned}$$

In other words, there exist integers \(k_1,\dots ,k_r\), all bounded by \(n^{O_{C,\varepsilon }(1)}\), such that

$$\begin{aligned} k q_i({\mathbf {x}}_2,\dots , {\mathbf {x}}_D) + k_1 q_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)+ \dots + k_r q_{r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)=0 \end{aligned}$$

for all \(({\mathbf {x}}_2,\dots , {\mathbf {x}}_D)\in G_{i}''\).
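To illustrate the determinant identity above in the smallest nontrivial case, take \(r=2\) with the hypothetical vectors \(\mathbf {v}_1=(1,0)\), \(\mathbf {v}_2=(1,2)\) and \(\mathbf {v}_{i,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}=(3,4)\), so that \(q_1=g_1\), \(q_2=g_1+2g_2\), \(q_i=3g_1+4g_2\) and \(k=\det (\mathbf {v}_1,\mathbf {v}_2)=2\). Cramer's rule gives \((3,4)=1\cdot \mathbf {v}_1+2\cdot \mathbf {v}_2\), and hence (with the signs absorbed into the integer coefficients)

$$\begin{aligned} 2q_i-2q_1-4q_2 = 2(3g_1+4g_2)-2g_1-4(g_1+2g_2)=0, \end{aligned}$$

which is the relation \(kq_i+k_1q_1+k_2q_2=0\) with \(k=2\), \(k_1=-2\), \(k_2=-4\).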

Note that \(k\) is independent of the choice of \(i\) and \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\). By passing from \(q_i\) to \(B_i\) by approximation, we thus complete the proof of Lemma 3.8. \(\square \)

We are now ready to complete the proof of our inverse result.

Theorem 3.5: proof conclusion From Lemma 3.8, for any fixed \(i\in I\), we consider the following \((D-1)\)-multilinear form

$$\begin{aligned} B_i'({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)&:= k B_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)+ \sum _{i_0\in I_0} k_{ii_0} B_{i_0}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\\&=\sum _{i_2,\dots , i_D}\left( k a_{ii_2\dots i_D} + \sum _{i_0\in I_0} k_{ii_0} a_{i_0i_2 \dots i_D}\right) x_{2i_2}\dots x_{Di_D}\\&=:\sum _{i_2,\dots , i_D} b_{i_2\dots i_D}' x_{2i_2}\dots x_{Di_D}. \end{aligned}$$

By the conclusion of Lemma 3.8, we have \(\sup _a{\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(B_i'({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\!\in \! B(a,\beta ))\ge \rho /n^{O_{C,\varepsilon }(1)}\). Thus Lemma 3.8 is applicable again for this new \((D-1)\)-multilinear form. By iterating the process \(D-1\) times, we obtain the conclusion of Theorem 3.5.

4 Singularity of block matrices: the approach to prove Theorem 2.5

As the singular values do not change under row and column permutations, for the sake of convenience, we will restrict our analysis to matrices of the form \({\mathbf {M}}_n= {\mathbf {X}}_n+{\mathbf {N}}_n\), where \({\mathbf {N}}_n\) is any deterministic matrix of polynomially bounded norm and \({\mathbf {X}}_n\) is a \(dn \times dn\) matrix whose \(ij\)th block takes the form

$$\begin{aligned} \begin{pmatrix} x_{11;ij} &{}\quad x_{12;ij} &{}\quad \dots &{}\quad x_{1d;ij} \\ \dots &{}\quad \dots &{}\quad \dots &{}\quad \dots \\ x_{d1;ij} &{}\quad x_{d2;ij} &{}\quad \dots &{}\quad x_{dd;ij} \end{pmatrix}, \end{aligned}$$

where \((x_{11;ij},\dots ,x_{dd;ij}), 1\le i,j\le n\), are iid copies of \((\xi _{11},\dots ,\xi _{dd})\) which satisfy the following conditions from Definition 2.3 and Theorem 2.5:

$$\begin{aligned}&{\mathbb {E}}\xi _{st}=0, {\mathbb {E}}|\xi _{st}|^2=1, {\mathbb {E}}|\xi _{st}|^{2+\eta }<\infty \quad \text{ for } \text{ some } \eta >0 \nonumber \\&{\mathbb {E}}\left[ \xi _{st} \overline{\xi _{uv}}\right] = 0 \quad \text{ for } \text{ all } (s,t) \ne (u,v). \end{aligned}$$
(4.1)

We now restate Theorem 2.5 as follows.

Theorem 4.1

For any \(B>0\), there exists \(A>0\) depending on \(B\) and \(\alpha \) such that

$$\begin{aligned} {\mathbb {P}}(\sigma _{dn}({\mathbf {M}}_n)\le n^{-A})\le n^{-B}. \end{aligned}$$

In the sequel we sketch the proof of Theorem 4.1. In general, our approach will resemble that of [19, 20, 26, 31, 35] where the main ingredient is an inverse-type argument. However, as our matrix now consists of large blocks of correlated entries, we need to elaborate more on the algebraic and technical side. For the sake of simplicity, we will prove our result under the following condition.

Assumption 4.2

With probability one, \(|x_{st;ij}| \le n^{B+1}\) for all \(1\le s,t\le d\) and all \(1\le i,j\le n\).

In what follows we assume that \({\mathbf {M}}_{n}\) has full rank. This is the main case to consider as most random matrices are non-singular with very high probability. The case that \({\mathbf {M}}_n\) is singular can be deduced by a standard argument (see for instance [22, Appendix A]).

Assume that \(\sigma _{dn}({\mathbf {M}}_n)\le n^{-A}\). Then \({\mathbf {M}}_n{\mathbf {x}}={\mathbf {y}}\) for some \(\Vert {\mathbf {x}}\Vert =1\) and \(\Vert {\mathbf {y}}\Vert \le n^{-A}\). Let \({\mathbf {C}}=(c_{i,j}({\mathbf {M}}_n))\), \(1\le i,j\le dn\), be the adjugate of \({\mathbf {M}}_n\), that is, the transpose of its matrix of cofactors. By definition, \({\mathbf {C}}{\mathbf {y}}= \det ({\mathbf {M}}_n) \cdot {\mathbf {x}}\), and since \(\Vert {\mathbf {x}}\Vert =1\) we have \(\Vert {\mathbf {C}}{\mathbf {y}}\Vert = |\det ({\mathbf {M}}_n)|\).
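As a quick sanity check of the identity \({\mathbf {C}}{\mathbf {y}}=\det ({\mathbf {M}}_n)\cdot {\mathbf {x}}\) used above, here is a minimal integer sketch in Python; the \(3\times 3\) matrix, the vector, and the helper names (`det`, `adjugate`, `matvec`) are arbitrary illustrative choices, not part of the paper.

```python
def det(M):
    # cofactor expansion along the first row; exact for integer matrices
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(n))

def adjugate(M):
    # transpose of the cofactor matrix, so that adjugate(M) . M = det(M) * I
    n = len(M)
    C = [[(-1) ** (i + j) * det([r[:j] + r[j+1:] for k, r in enumerate(M) if k != i])
          for j in range(n)] for i in range(n)]
    return [[C[j][i] for j in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

M = [[2, 0, 1],
     [1, 3, 2],
     [0, 1, 4]]
x = [1, -2, 3]
y = matvec(M, x)  # y = M x
# the adjugate maps y back to a multiple of x: adj(M) y = det(M) x
assert matvec(adjugate(M), y) == [det(M) * xi for xi in x]
```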

By paying a factor of \(dn\) in probability, without loss of generality we can assume that the first component of \({\mathbf {C}}{\mathbf {y}}\) is at least \(|\det ({\mathbf {M}}_n)|/(dn)^{1/2}\) in absolute value,

$$\begin{aligned} |c_{1,1}({\mathbf {M}}_n)y_1+\dots +c_{1,dn}({\mathbf {M}}_n)y_{dn}|\ge |\det ({\mathbf {M}}_n)|/(dn)^{1/2}. \end{aligned}$$
(4.2)

Claim 4.3

Let \({\mathbf {M}}_{n-1}\) be the matrix obtained from \({\mathbf {M}}_n\) by removing its first \(d\) rows, and let \( c_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1}), 1\le i_1,\dots ,i_d\le nd\), be the signed determinant of the minor obtained from \({\mathbf {M}}_{n-1}\) by removing its \(i_1\)th, \(\dots \), \(i_d\)th columns. We have

$$\begin{aligned} \sum _{1\le i_1,i_2,\dots ,i_d \le dn} |c_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1})|^2 \ge n^{2A-O(B+\alpha )}|\det ({\mathbf {M}}_n)|^2 \end{aligned}$$
(4.3)

Proof

(of Claim 4.3) As \(\Vert {\mathbf {y}}\Vert \le n^{-A}\), it follows from (4.2) that

$$\begin{aligned} \sum _{i_1=1}^{dn} |c_{1,i_1}({\mathbf {M}}_n)|^2 \ge n^{2A-2} |\det ({\mathbf {M}}_n)|^2. \end{aligned}$$
(4.4)

Next, each cofactor \(c_{1,i_1}({\mathbf {M}}_n)\), being the signed determinant of a \((dn-1)\times (dn-1)\) minor of \({\mathbf {M}}_n\), can be expressed as

$$\begin{aligned} c_{1,i_1}({\mathbf {M}}_n)=(-1)^{(1+i_1)+\dots +(d+i_d)}\sum _{1\le i_2,\dots ,i_d \le dn} m_{2i_2}\dots m_{di_d} c_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1}). \end{aligned}$$

The claim then follows by applying the Cauchy–Schwarz inequality together with Assumption 4.2 and the upper bound \(n^\alpha \) on the entries of \({\mathbf {N}}_n\). \(\square \)
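The Cauchy–Schwarz step can be spelled out as follows (a sketch; the exponent in \(O(B+\alpha )\) is not optimized). Since \(|m_{si_s}|\le n^{B+1}+n^{\alpha }\) by Assumption 4.2 and the bound on \({\mathbf {N}}_n\),

$$\begin{aligned} |c_{1,i_1}({\mathbf {M}}_n)|^2 \le \left( \sum _{i_2,\dots ,i_d} |m_{2i_2}\dots m_{di_d}|^2\right) \left( \sum _{i_2,\dots ,i_d} |c_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1})|^2\right) \le n^{O(B+\alpha )}\sum _{i_2,\dots ,i_d} |c_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1})|^2. \end{aligned}$$

Summing over \(i_1\) and combining with (4.4) yields (4.3).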

By Claim 4.3, in order to prove Theorem 4.1 it suffices to justify the following result.

Theorem 4.4

For any \(B>0\), there exists \(A>0\) such that

$$\begin{aligned} {\mathbb {P}}\left( \left( \sum _{1\le i_1,i_2,\dots ,i_d \le dn} |c_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1})|^2\right) ^{1/2} \ge n^A|\det ({\mathbf {M}}_n)| \right) \le n^{-B}. \end{aligned}$$

Next, express \(\det ({\mathbf {M}}_n)\) as a \(d\)-multilinear form of its first \(d\) rows

$$\begin{aligned} \det ({\mathbf {M}}_n) = \sum _{1\le i_1,i_2,\dots ,i_d \le dn} c_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1}) m_{1i_1}\dots m_{di_d}. \end{aligned}$$
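The displayed formula is the generalized Laplace expansion of \(\det ({\mathbf {M}}_n)\) along its first \(d\) rows. As a sanity check, here is a minimal Python sketch of this expansion on a small integer matrix (the helper names `det`, `minor` and `laplace_first_d_rows` are ours; rows and columns are 0-indexed, and strictly increasing column subsets replace the signed unordered sums above):

```python
from itertools import combinations, permutations

def det(M):
    # Leibniz-formula determinant; exact for small integer matrices
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for a in range(n):            # sign of the permutation via inversion count
            for b in range(a + 1, n):
                if perm[a] > perm[b]:
                    sign = -sign
        prod = 1
        for a in range(n):
            prod *= M[a][perm[a]]
        total += sign * prod
    return total

def minor(M, rows, cols):
    return [[M[r][c] for c in cols] for r in rows]

def laplace_first_d_rows(M, d):
    # det(M) = sum over column subsets S of size d of
    #   (-1)^{sum(first d row indices) + sum(S)}
    #   * det(d x d minor on the first d rows and S)
    #   * det(complementary minor on the remaining rows and columns)
    n = len(M)
    total = 0
    for S in combinations(range(n), d):
        sign = (-1) ** (sum(range(d)) + sum(S))
        comp = [c for c in range(n) if c not in S]
        total += sign * det(minor(M, range(d), S)) * det(minor(M, range(d, n), comp))
    return total

M = [[2, 1, 0, 3],
     [1, 4, 2, 1],
     [0, 2, 5, 1],
     [3, 1, 1, 2]]
assert laplace_first_d_rows(M, 2) == det(M)
```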

With \(c:=(\sum _{1\le i_1,i_2,\dots ,i_d \le dn} |c_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1})|^2)^{1/2}\) and \(a_{i_1\dots i_d}:=c_{i_1\dots i_d}({\mathbf {M}}_{n-1})/c\),

$$\begin{aligned} \frac{1}{c}\det ({\mathbf {M}}_n) =\sum _{1\le i_1,i_2,\dots ,i_d \le dn} a_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1}) m_{1i_1}\dots m_{di_d}. \end{aligned}$$
(4.5)

Heuristically, conditioning on \({\mathbf {M}}_{n-1}\), the \(d\)-multilinear form on the RHS of (4.5) is comparable to 1 in absolute value with probability extremely close to one. Thus the assumption \({\mathbb {P}}(|\det ({\mathbf {M}}_n)|/c\le n^{-A})\ge n^{-B}\) of Theorem 4.4, with an appropriately large value \(A\), must yield a high cancellation of the multilinear form. Based on this observation, our rough approach will consist of two main steps.

  • Step 1 Assume that for an appropriately large value \(A>0\) we have

    $$\begin{aligned} {\mathbb {P}}_{x_{11;11},\dots , x_{dd;1n}}\left( \left| \sum _{1\le i_1,i_2,\dots ,i_d \le dn} a_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1}) m_{1i_1}\dots m_{di_d}\right| \!\le n^{-A}\big |{\mathbf {M}}_{n-1}\right) \!\ge n^{-B}. \end{aligned}$$

    Then the normalized cofactors \(a_{i_1\dots i_d}\) of \({\mathbf {M}}_{n-1}\) must satisfy a very special property.

  • Step 2 The probability, with respect to \({\mathbf {M}}_{n-1}\), that the \(a_{i_1\dots i_d}\) satisfy this special property is negligible.

Although the setting of Step 1 is identical to our inverse problem discussed in Sect. 3, the dependencies of the entries make the problem substantially harder. We will remove these dependencies using a series of decoupling tricks to arrive at a conclusion as useful as Theorem 3.5.

Theorem 4.5

(Step 1) Let \(0<\varepsilon <1\) be a given constant. Assume that

$$\begin{aligned} \sup _a{\mathbb {P}}_{x_{11;11},\dots , x_{dd;1n}}\left( \left| \sum _{1\le i_1,i_2,\dots ,i_d \le dn} a_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1}) m_{1i_1}\dots m_{di_d}\!-\!a \right| \le n^{-A}\!\right) \!\ge \! n^{-B} \end{aligned}$$

for some sufficiently large integer \(A\), where \(a_{i_{1}i_{2}\dots i_{d}}=c_{i_{1}i_{2}\dots i_{d}}/c\). Then there exist \(k=O(d)\) indices \(i_1<\dots <i_k\) and a complex vector \({\mathbf {u}}=(u_1,\dots ,u_{nd})\) satisfying the following properties.

  • (orthogonality) \(\Vert {\mathbf {u}}\Vert _2\asymp 1\) and \(|\langle {\mathbf {u}}_1,{\mathbf {r}}_i^{(1)}({\mathbf {M}}_{n-1})\rangle + \langle {\mathbf {u}}_2,{\mathbf {r}}_i^{(2)}({\mathbf {M}}_{n-1})\rangle | \le n^{-A/2+O_{B,\varepsilon }(1)}\) for \(n-O_{B,\varepsilon }(1)\) rows \({\mathbf {r}}_i\) of \({\mathbf {M}}_{n-1}\), where \({\mathbf {u}}_1\) and \({\mathbf {r}}_i^{(1)}\) are the subvectors corresponding to the components indexed by \(i_1,\dots ,i_k\) of \({\mathbf {u}}\) and \({\mathbf {r}}_i\) respectively, and \({\mathbf {u}}_2\) and \({\mathbf {r}}_i^{(2)}\) are the subvectors corresponding to the remaining components of \({\mathbf {u}}\) and \({\mathbf {r}}_i\) respectively;

  • (additive structure) there exists a GAP \(Q\) of rank \(O_{B,\varepsilon }(1)\) and size \(n^{O_{B,\varepsilon }(1)}\) that contains at least \(dn-2n^\varepsilon \) components \(u_i\);

  • (controlled form) all the components \(u_i\), and all the generators of the GAP are rational complex numbers of the form \(\frac{p}{q}+ \sqrt{-1} \frac{p'}{q'} \), where \(|p|,|q|,|p'|,|q'| \le n^{A/2+O_{B,\varepsilon }(1)}\).

In the second step, we show that the probability that \({\mathbf {M}}_{n-1}\) has the above properties is negligible.

Theorem 4.6

(Step 2) With respect to \({\mathbf {M}}_{n-1}\), the probability that there exists a vector \({\mathbf {u}}\) as in Theorem 4.5 is \(\exp (-\Omega (n))\).

5 Singularity of block matrices: proof of Theorem 4.5

Recall that in the inverse step, Theorem 4.5, we assumed a high concentration of a multilinear form on a small ball of radius \(n^{-A}\). As the entries in each block are dependent, we are not able to apply Theorem 3.5 yet. In what follows we present two main steps to remove these dependencies.

5.1 Dependency removal I: general linear forms

First, it will be useful to study the concentration of the linear form

$$\begin{aligned} \sum _{1\le i\le n} (a_{11;i}x_{11;i}+\dots + a_{dd;i}x_{dd;i}), \end{aligned}$$

where \((x_{11;i},\dots ,x_{dd;i})\) are iid copies of \((x_{11},\dots ,x_{dd})\) satisfying (4.1). Intuitively, as the covariance of \((x_{11},\dots ,x_{dd})\) is non-singular, the random variables \((x_{11;i},\dots ,x_{dd;i})\) are not totally dependent on each other. (See Appendix for a more precise statement.) This fact may suggest a way to apply Theorem 3.2 with respect to \((x_{11;1},\dots ,x_{11;n})\) while holding \(x_{12;1},\dots ,x_{dd;n}\) fixed and vice versa. In what follows \((x_{1;i},\dots ,x_{D;i})\) plays the role of \((x_{11;i},\dots ,x_{dd;i})\).

Theorem 5.1

(Inverse Littlewood–Offord theorem for mixing linear forms) Let \(0<\varepsilon <1, B>0\) be given, and let \(D\) be a positive integer. Let \( \beta >0\) be an arbitrary real number that may depend on \(n\). Suppose that \(a_{1;i},\dots , a_{D;i}\in {\mathbb {C}}\) are such that \(\sum _{i=1}^n \sum _{1\le j\le D} |a_{j;i}|^2=1\) and

$$\begin{aligned} \sup _a{\mathbb {P}}_{x_{1;1},\dots ,x_{D;n}} \left( \left| \sum _{i=1}^n (a_{1;i}x_{1;i}+\dots + a_{D;i}x_{D;i}) -a\right| \le \beta \right) =\gamma \ge n^{-B}, \end{aligned}$$

where \((x_{1;i},\dots ,x_{D;i}),1\le i\le n\), are iid copies of \((x_{1},\dots ,x_{D})\) from (4.1). Then there exist positive constants \(\alpha ,c_0,C_0\), depending only on the distribution of \((x_{1},\dots ,x_{D})\), and \(D\) tuples \((\eta _{k1},\eta _{k2},\dots ,\eta _{kD}),1\le k\le D\), of complex numbers such that

  • \(|\eta _{ij}|\) are bounded from below and above by \(c_0\) and \(C_0\) respectively,

  • The least singular value of the matrix \((\eta _{ij})\) is at least \(\alpha \),

  • for any number \(n'\) between \(n^\varepsilon \) and \(n\), there exists a proper symmetric GAP \(Q=\{\sum _{i=1}^r k_ig_i : k_i\in {\mathbb {Z}}, |k_i|\le L_i \} \subset {\mathbb {C}}\) whose parameters satisfy (i) and (ii) of Theorem 3.2, such that for at least \(n-n'\) indices \(i\), the sums \(\eta _{k1}a_{1;i}+\dots +\eta _{kD}a_{D;i}, 1\le k\le D\), are \(\beta \)-close to \(Q\).

As Theorem 5.1 can be shown by using the method of [21], we skip its proof and refer the reader to Appendix for a proof of a somewhat more general result (Theorem 6.1 below). We now introduce a useful corollary.

Corollary 5.2

Assume as in Theorem 5.1. Then there exist a proper symmetric GAP \(Q=\{\sum _{i=1}^r k_ig_i : k_i\in {\mathbb {Z}}, |k_i|\le L_i \} \subset {\mathbb {C}}\) whose parameters satisfy (i) and (ii) of Theorem 3.2 and an index set \(I\) of size at least \(n-n'\) such that, with \((\gamma _{ij})\) denoting the inverse matrix of \((\eta _{ij})\), the numbers \(a_{k;i}, i\in I,1\le k\le D\), are \(O(\beta )\)-close to the GAP \(P=P_1+\dots +P_D\), where \(P_k=\gamma _{k1}\cdot Q + \gamma _{k2}\cdot Q+\dots +\gamma _{kD} \cdot Q\).

5.2 Dependency removal II: decoupling

We now work with the multilinear form appearing in Theorem 4.5. Our goal is to show the following.

Theorem 5.3

Let \(0 <\varepsilon < 1\) and \(B>0\). Let \( \beta >0\) be a parameter that may depend on \(n\). Assume that

$$\begin{aligned} {\mathbb {P}}_{x_{11;11},\dots , x_{dd;1n}}\left( \left| \sum _{1\le i_1,i_2,\dots ,i_d \le dn} a_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1}) m_{1i_1}\dots m_{di_d}\right| \le n^{-A}\vert {\mathbf {M}}_{n-1}\right) \ge n^{-B}. \end{aligned}$$

Then there exist index sets \(I_1,I_1^0\) with \(|I_1|=dn-n^\varepsilon \) and \(|I_1^0|=O_{B,\varepsilon }(1)\) such that for any \(i_1\in I_1\), there exist index sets \(I_2,I_2^0\) depending on \(i_1\) with \(|I_2|=dn-n^\varepsilon \) and \(|I_2^0|=O_{B,\varepsilon }(1)\), and so on, until there exist index sets \(I_{d-1},I_{d-1}^0\) depending on \(i_1,\dots ,i_{d-2}\) with \(|I_{d-1}|=dn-n^\varepsilon \) and \(|I_{d-1}^0|=O_{B,\varepsilon }(1)\), such that the following holds: for any \(i_{d-1}\in I_{d-1}\), there exist integers \(k_{j_1 \dots j_{d-1}}\), where each index \(j_k\) with \(1\le k\le d-1\) either takes the value \(i_k\) or belongs to the thin set \(I_k^0\), such that \(|k_{j_1 \dots j_{d-1}}|\le n^{O_{B,d,\varepsilon }(1)}\) and \(k_{i_1\dots i_{d-1}}\ne 0\), as well as

$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_d}\left( \left| \sum _{1\le i_d \le n}\sum _{j_1 \dots j_{d-1}}k_{j_1 \dots j_{d-1}}a_{j_1 \dots j_{d-1} i_d}x_{i_d}\right| \le \beta n^{O_{B,\varepsilon }(1)}\right) \ge n^{-O_{B,\varepsilon }(1)}. \end{aligned}$$

Thus Theorem 5.3 asserts that as long as the entries in each block are not totally dependent, the conclusion of Theorem 3.5 still holds as if the matrix entries were mutually independent.

In what follows we introduce the main supporting lemmas to prove Theorem 5.3. By definition, we can rewrite this form as

$$\begin{aligned} \sum _{i_1,\dots ,i_d} a_{i_1 i_2 \dots i_d} \det [{\mathbf {c}}_{i_1},\dots ,{\mathbf {c}}_{i_d}], \end{aligned}$$

where \(\det [{\mathbf {c}}_{i_1},\dots ,{\mathbf {c}}_{i_d}]\) is the determinant of the \(d\times d\) block generated by the \(i_1\)th, \(\dots \), \(i_d\)th columns of the matrix formed by the first \(d\) rows of \({\mathbf {M}}_n\).

Let \(\mathcal {U}:=\{U_1,\dots ,U_d\}\) be an ordered random partition of \([n]\). These index sets determine how the \(n\) blocks of size \(d\times d\) of the matrix formed by the first \(d\) rows are grouped. We denote by \(B(U_i)\) the collection of column indices generated by \(U_i\), that is

$$\begin{aligned} B(U_i):=\cup _{l\in U_i} \{(l-1)d+1,\dots ,ld\}. \end{aligned}$$
(5.1)
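For concreteness, here is a minimal sketch of this index computation in Python (the helper name `block_indices` is ours; we read block \(l\) as covering columns \((l-1)d+1,\dots ,ld\), matching the blocks \(B_l\) in the proof of Lemma 5.5):

```python
def block_indices(U, d):
    # columns covered by the d-wide blocks listed in U;
    # block l (1-indexed) covers columns (l-1)d+1, ..., ld
    return sorted(c for l in U for c in range((l - 1) * d + 1, l * d + 1))

# d = 2 and a partition class U_1 = {1, 3} of [3]:
assert block_indices({1, 3}, 2) == [1, 2, 5, 6]
```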

Given any partition \(\mathcal {U}\), we easily obtain the following lemma by a series of applications of the Cauchy–Schwarz inequality.

Lemma 5.4

(Decoupling lemma) Assume that

$$\begin{aligned} \rho =\sup _{a} {\mathbb {P}}_{x_{11;11},\dots ,x_{dd;1n}} \left( \left| \sum _{1\le i_1,\dots ,i_d \le dn} a_{i_1 i_2 \dots i_d} \det [{\mathbf {c}}_{i_1},\dots ,{\mathbf {c}}_{i_d}] -a\right| \le \beta \right) \ge n^{-B}. \end{aligned}$$

Then,

$$\begin{aligned}&\!\!\!{\mathbb {P}}_{x_{11;11}',\dots ,x_{dd;1n}'}\left( \left| \sum _{ i_1\in B(U_1),\dots ,i_d\in B(U_d)} a_{i_1 i_2 \dots i_d}\det [{\mathbf {c}}_{i_1},\dots ,{\mathbf {c}}_{i_d}]-a\right| =O_B(\beta \sqrt{\log n})\right) \nonumber \\&\!\!\quad =\Omega (\rho ^{2d}), \end{aligned}$$
(5.2)

where \(({x_{11;11}}',\dots ,{x_{dd;11}}');\dots ; ({x_{11;1n}}',\dots ,{x_{dd;1n}}')\) are iid copies of the vector \((x_{11}-x_{11}',\dots , x_{dd}-x_{dd}')\), and where \((x_{11}',\dots ,x_{dd}')\) is an independent copy of \((x_{11},\dots ,x_{dd})\).

As the proof of Lemma 5.4 is standard, we refer the reader to [6, 20, 35]. As the columns \({\mathbf {c}}_{i_1},\dots ,{\mathbf {c}}_{i_d}\) are independent, we will be able to obtain an analogue of Lemma 3.8 as follows.

Lemma 5.5

There exist index sets \(I_0(U_1)\) with \(|I_0(U_1)|=O(1)\) and \(I(U_1)\subset B(U_1)\) with \(|I(U_1)|\ge d|U_1|-n^\varepsilon \) and an integer \(k\ne 0, k=n^{O_{B,\varepsilon }(1)}\) such that for any \(i\in I(U_1)\), there exist integers \(k_{ii_0}=n^{O_{B,\varepsilon }(1)}\) such that

$$\begin{aligned}&{\mathbb {P}}_{x_{rs;1j}, j\in B(U_2)\cup \dots \cup B(U_d)} \left( \left| k \sum _{i_2\in B(U_2),\dots ,i_d \in B(U_d)} a_{i,i_2 \dots i_d} \det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \right. \right. \\&\qquad \left. \left. -\sum _{i_0\in I_0} k_{ii_0}\sum _{i_2\in B(U_2),\dots ,i_d\in B(U_d)} a_{i_0,i_2 \dots i_d}\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \right| =O\left( \beta n^{O_{B,\varepsilon }(1)}\right) \right) \\&\quad \ge \rho ^{2d}/n^{O_{B,\varepsilon }(1)}, \end{aligned}$$

where \({\mathbf {c}}_{j}^{\bar{1}}\) is the \(j\)th column \({\mathbf {c}}_j\) without its first component.

Proof

(of Lemma 5.5) As usual, it suffices to assume that \(\xi \) has a discrete distribution. For each \(l\in U_1\), let \(B_l=\{(l-1)d+1,\dots ,ld\}\) be the \(l\)th block. By the determinant expansion, we have

$$\begin{aligned}&\sum _{i_1\in B_l,i_2\in B(U_2),\dots ,i_d\in B(U_d)}a_{i_1i_2\dots i_d}\det [{\mathbf {c}}_{i_1},\dots ,{\mathbf {c}}_{i_d}]\\&\quad = \sum _{i=1}^n x_{1i;1l} \sum _{i_2,\dots ,i_d} (-1)^{1+i}a_{((l-1)d+i)i_2 \dots i_d}\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots ,{\mathbf {c}}_{i_d}^{\bar{1}}\right] \\&\quad \quad +\cdots + \sum _{i=1}^n x_{di;1l} \sum _{i_2,\dots ,i_d} (-1)^{d+i}a_{((l-1)d+i)i_2 \dots i_d}\det \left[ {\mathbf {c}}_{i_2}^{\bar{d}},\dots ,{\mathbf {c}}_{i_d}^{\bar{d}}\right] . \end{aligned}$$

By summing over \(l\in U_1\) and by applying Lemma 5.4 and Corollary 5.2 to the random variables \(x_{rs;1j}, j\in B(U_1)\), the following holds with high probability with respect to the random variables indexed by \(\bar{U}_1\) (i.e. \(x_{rs;1j}, j\in \bar{U}_1\)): most of the coefficients

$$\begin{aligned} \sum _{i_2,\dots ,i_d} a_{((l-1)d+1)i_2 \dots i_d}\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots ,{\mathbf {c}}_{i_d}^{\bar{1}}\right] , \dots , \sum _{i_2,\dots ,i_d} a_{(ld)i_2 \dots i_d}\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots ,{\mathbf {c}}_{i_d}^{\bar{1}}\right] \end{aligned}$$

belong to a GAP of rank \(O_{C,\varepsilon }(1)\) and size \(n^{O_{C,\varepsilon }(1)}\). From here, to conclude Lemma 5.5, we just follow the proof of Lemma 3.8 verbatim. \(\square \)

5.3 Randomization

Roughly speaking, by iterating Lemma 5.5 for the new \((d-1)\)-linear form in the random variables restricted by \(\bar{U}_1\), and so on, we will be able to deduce an analogue of Theorem 3.5 that depends on \(U_1,\dots ,U_d\). One might then try to randomize \(U_1,\dots ,U_d\) to obtain Theorem 5.3. However, randomizing \(U_1,\dots ,U_d\) all at once poses serious technical difficulties. To avoid this hurdle we randomize one pair at a time before each iteration of Lemma 5.5.

Assume that \((U_{12},U_3,\dots ,U_d)\) is an ordered partition of \([n]\) into \(d-1\) sets, each of size \(\Theta (n)\). Fixing this partition, we next partition \(U_{12}\) into \(U_1,U_2\) randomly. To the new ordered partition \((U_1,\dots ,U_d)\) into \(d\) sets we then apply Lemma 5.5. As a result, the index \(i_2\) in the conclusion belongs to \(B(U_2)\). We will show that by randomizing \(U_1\), one may recover the result with \(i_2\) now an element of \(B(U_{12})\). Let us first extend Lemma 5.5 as follows.

Lemma 5.6

There exist subsets \(I_0(U_{1})\) and \(I(U_{1})\) of \(B(U_{12})\) of size \(O(1)\) and at least \(d|U_{12}|-n^\varepsilon \) respectively, and an integer \(k\ne 0\), \(k=n^{O_{B,\varepsilon }(1)}\), such that for any \(i\in I(U_1)\) there exist integers \(k_{ii_0}=n^{O_{B,\varepsilon }(1)}\) for which the following holds:

$$\begin{aligned}&{\mathbb {P}}_{x_{11;11},\dots , x_{dd;1n}} \left( \left| k \sum _{i_2\in B(U_{12}),\dots ,i_d \in B(U_d)} a_{ii_2 \dots i_d}(U_1) \det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \right. \right. \\&\qquad \left. \left. -\sum _{i_0\in I_0} k_{ii_0}\sum _{i_2\in B(U_{12}),\dots ,i_d\in B(U_d)} a_{i_0i_2 \dots i_d}(U_1)\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \right| =O\left( \beta n^{O_{B,\varepsilon }(1)}\right) \right) \\&\quad \ge \rho ^{2d}/n^{O_{B,\varepsilon }(1)}, \end{aligned}$$

where

$$\begin{aligned} a_{ii_2 \dots i_d}(U_1):= \left\{ \begin{array}{ll} a_{ii_2\dots i_d} &{}\quad \text{if } i\in B(U_1),\ i_2\in B(U_2),\\ a_{i_2i\dots i_d} &{}\quad \text{if } i\in B(U_2),\ i_2\in B(U_1),\\ 0 &{}\quad \text{otherwise.} \end{array}\right. \end{aligned}$$

In comparison with Lemma 5.5, the probability in Lemma 5.6 is now with respect to all random variables of the first \(d\) rows of \(M_n\). Also, \(i_2\) now runs over all the indices restricted by \(U_{12}\). The entries \(a_{ii_2\dots i_d}(U_1)\), with the indices \(i_3,\dots ,i_d\) suppressed, can be viewed as the entries of a symmetric matrix in \(i\) and \(i_2\).

Proof

(of Lemma 5.6) We first fix the random variables restricted by \(\bar{U}_{12}\) for which the conclusion of Lemma 5.4 holds with respect to the random variables restricted by \(U_{12}\). Similarly to the proof of Lemma 5.5, the following holds with high probability with respect to \(x_{rs;1i},i\in B(U_2)\): there exist subsets \(I_0(U_1)\) and \(I(U_1)\) of \(B(U_1)\) with size \(O(1)\) and \(d|U_1|-n^\varepsilon \) respectively such that the following holds for all \(i\in I(U_1)\):

$$\begin{aligned}&{\mathbb {P}}_{x_{rs;1i_2}, i_2 \in B(U_2)} \left( \left| k' \sum _{i_2\in B(U_2),\dots ,i_d \in B(U_d)} a_{ii_2 \dots i_d} \det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \right. \right. \nonumber \\&\quad \left. \left. -\sum _{i_0\in I_0} {k'}_{ii_0}\sum _{i_2\in B(U_2),\dots ,i_d\in B(U_d)} a_{i_0i_2 \dots i_d}\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \right| =O\left( \beta n^{O_{B,\varepsilon }(1)}\right) \right) \nonumber \\&\quad \quad \ge \rho ^{2d}/n^{O_{B,\varepsilon }(1)}. \end{aligned}$$
(5.3)

By switching the role of \(U_1\) and \(U_2\), there also exist subsets \(I_0(U_2)\) and \(I(U_2)\) of \(B(U_2)\) with size \(O(1)\) and \(d|U_2|-n^\varepsilon \) respectively such that the following holds for all \(i\in I(U_2)\):

$$\begin{aligned}&{\mathbb {P}}_{x_{rs;1i_1}, i_1\in B(U_1)} \left( \left| k'' \sum _{i_1\in B(U_1),i_3\in B(U_3),\dots ,i_d \in B(U_d)} a_{i_1ii_3 \dots i_d} \det \left[ {\mathbf {c}}_{i_1}^{\bar{1}},{\mathbf {c}}_{i_3}^{\bar{1}}\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \right. \right. \nonumber \\&\qquad \left. \left. -\sum _{i_0\in I_0} {k''}_{ii_0}\sum _{i_1\in B(U_1),i_3\in B(U_3),\dots ,i_d\in B(U_d)} a_{i_1i_0 i_3 \dots i_d}\det \left[ {\mathbf {c}}_{i_1}^{\bar{1}},{\mathbf {c}}_{i_3}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \right| =O\left( \beta n^{O_{B,\varepsilon }(1)}\right) \right) \nonumber \\&\quad \ge \rho ^{2d}/n^{O_{B,\varepsilon }(1)}. \end{aligned}$$
(5.4)

Now, by the definition of \(a_{ii_2\dots i_d}(U_1)\), with \(I=I(U_1)\cup I(U_2)\) and \(I_0=I_0(U_1)\cup I_0(U_2)\), we can rewrite both of the events in (5.3) and (5.4) in the following form

$$\begin{aligned}&k \sum _{i_2\in B(U_{12}),\dots ,i_d \in B(U_d)} a_{ii_2 \dots i_d}(U_1) \det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \\&\quad -\sum _{i_0\in I_0} k_{ii_0}\sum _{i_2\in B(U_{12}),\dots ,i_d\in B(U_d)} a_{i_0i_2 \dots i_d}(U_1)\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] =O\left( \beta n^{O_{B,\varepsilon }(1)}\right) . \end{aligned}$$

The conclusion of Lemma 5.6 then follows from (5.3) and (5.4), noting that \(\{x_{rs;1i_1}, i_1 \in B(U_1)\}\) and \(\{x_{rs;1i_2}, i_2 \in B(U_2)\}\) are independent. \(\square \)

Now we randomize \(U_1\) to obtain the following main result of the subsection.

Lemma 5.7

(Randomization) There exist subsets \(I_0(U_{12})\) and \(I(U_{12})\) of \(B(U_{12})\) of size \(O(1)\) and at least \(d|U_{12}|-n^\varepsilon \) respectively such that the following holds for all \(i\in I(U_{12})\):

$$\begin{aligned}&{\mathbb {P}}_{x_{rs;1i},i \in \bar{B}(U_{12});{x_{rs;1i}}',i\in B(U_{12})} \left( \left| k \sum _{i_2\in B(U_{12}),\dots ,i_d \in B(U_d)} a_{i,i_2 \dots i_d} \det \left[ {{\mathbf {c}}_{i_2}^{\bar{1}}}',{\mathbf {c}}_{i_3}^{\bar{1}}, \dots ,{\mathbf {c}}_{i_d}^{\bar{1}}\right] \right. \right. \\&\quad \left. \left. -\sum _{i_0\in I_0} k_{ii_0}\sum _{i_2\in B(U_{12}),\dots ,i_d\in B(U_d)} a_{i_0,i_2 \dots i_d}\det \left[ {{\mathbf {c}}_{i_2}^{\bar{1}}}',{\mathbf {c}}_{i_3}^{\bar{1}} ,\dots ,{\mathbf {c}}_{i_d}^{\bar{1}}\right] \right| =O\left( \beta n^{O_{B,\varepsilon }(1)}\right) \right) \\&\quad \quad \ge \rho ^{2d}/n^{O_{B,\varepsilon }(1)}, \end{aligned}$$

where \({x^{(i)}_{rs}}':=\eta _{i}x^{(i)}_{rs}\) with \(\eta _i\) iid Bernoulli random variables of parameter \(1/2\), and \({\mathbf {c}}_{i_2}':=\eta _{i_2} {\mathbf {c}}_{i_2}\) in the determinants.

Proof

(of Lemma 5.7) Note that Lemma 5.6 holds for any choice of \(U_1\subset U_{12}\). As \(I_0(U_1)\) is an \(O_{B,\varepsilon }(1)\)-subset of \([n]\) and \(k(U_1)\le n^{O_{B,\varepsilon }(1)}\), there are only \(n^{O_{B,\varepsilon }(1)}\) possibilities for the tuple \((I_0(U_1),k(U_1))\). Thus, there exists a tuple \((I_0,k)\) such that \(I_0(U_1)=I_0\) and \(k(U_1)=k\) for \(2^{|U_{12}|}/n^{O_{B,\varepsilon }(1)}\) different sets \(U_1\). Let us denote this collection of sets \(U_1\) by \(\mathcal {S}\); we have

$$\begin{aligned} |\mathcal {S}|\ge 2^{|U_{12}|}/n^{O_{B,\varepsilon }(1)}. \end{aligned}$$

Next, let \(I\) be the collection of all \(i\in B(U_{12})\) which belong to at least \(|\mathcal {S}|/2\) index sets \(I(U_1)\). Then,

$$\begin{aligned} |I||\mathcal {S}| + (d|U_{12}|-|I|)|\mathcal {S}|/2&\ge (d|U_{12}|-n^\varepsilon )|\mathcal {S}|,\\ \text{and hence } |I|&\ge d|U_{12}|-2n^\varepsilon . \end{aligned}$$

From now on we fix an \(i\in I\). Consider the tuples \((k_{ii_0}(U_1), i_0\in I_0)\) over all \(U_1\) with \(i\in I(U_1)\). Because there are only \(n^{O_{B,\varepsilon }(1)}\) possibilities for such tuples, there must be a tuple, say \((k_{ii_0}, i_0\in I_0)\), such that \((k_{ii_0}(U_1), i_0\in I_0)=(k_{ii_0}, i_0\in I_0)\) for at least \(|\mathcal {S}|/2n^{O_{B,\varepsilon }(1)}\ge 2^{|U_{12}|}/n^{O_{B,\varepsilon }(1)}\) sets \(U_1\). Without loss of generality, we assume that \(i\in U_1\) for at least half of those sets. Let \(\mathcal {U}\) denote the collection of such \(U_1\), and for each \(U_1\in \mathcal {U}\) we let \({\mathbf {u}}=(u_1,\dots ,u_{|B(U_{12})|})\in {\mathbb {R}}^{|B(U_{12})|}\) be its characteristic vector, i.e. \(u_i=1\) if \(i\in B(U_1)\), and \(u_i=0\) otherwise.

By the definition of \(a_{ii_2\dots i_d}(U_1)\), as \(i\in U_1\), we can write

$$\begin{aligned}&k \sum _{i_2\in B(U_{12}),\dots ,i_d \in B(U_d)} a_{i,i_2 \dots i_d}(U_1) \det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \\&\qquad -\sum _{i_0\in I_0} k_{ii_0}\sum _{i_2\in B(U_{12}),\dots ,i_d\in B(U_d)} a_{i_0,i_2 \dots i_d}(U_1)\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \\&\quad = k \sum _{i_2\in B(U_{12}),\dots ,i_d \in B(U_d)} a_{i,i_2 \dots i_d} (1-u_{i_2})\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \\&\qquad - \sum _{i_0\in I_0} k_{ii_0}\sum _{i_2\in B(U_{12}),\dots ,i_d\in B(U_d)} a_{i_0i_2 \dots i_d} (1-u_{i_2})\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] . \end{aligned}$$

Recall that \(|\mathcal {U}|\ge 2^{|U_{12}|}/n^{O_{B,\varepsilon }(1)}\). Hence, by Lemma 5.6, we have the following bound on the joint probability

$$\begin{aligned}&{\mathbb {P}}_{x_{11;11},\dots , x_{dd;1n}} {\mathbb {P}}_{U_1} \left( \left| k \sum _{i_2\in B(U_{12}),\dots ,i_d \in B(U_d)} a_{i,i_2 \dots i_d} (1-u_{i_2})\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \right. \right. \\&\!\!\quad \left. \left. - \sum _{i_0\in I_0} k_{ii_0}\sum _{i_2\in B(U_{12}),\dots ,i_d\in B(U_d)} a_{i_0i_2 \dots i_d} (1\!-\!u_{i_2})\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots {\mathbf {c}}_{i_d}^{\bar{1}}\right] \right| \!=\! O\left( \beta n^{O_{B,\varepsilon }(1)}\!\right) \!\!\right) \\&\quad \quad =n^{-O_{B,\varepsilon }(1)} . \end{aligned}$$

By applying the Cauchy–Schwarz inequality, we obtain

$$\begin{aligned}&n^{-O_{B,\varepsilon }(1)}\le \Big [{\mathbb {E}}_{x_{rs;1i},1\le i\le n, 1\le r,s\le d} {\mathbb {E}}_{U_1}(\cdots )\Big ]^2 \nonumber \\&\quad \le {\mathbb {E}}_{x_{11;11},\dots , x_{dd;1n}} \Big [{\mathbb {E}}_{U_1}(\cdots )\Big ]^2 = {\mathbb {E}}_{x_{11;11},\dots , x_{dd;1n}} \Big [{\mathbb {E}}_{{\mathbf {u}}}(\cdots )\Big ]^2 \nonumber \\&\quad \le {\mathbb {E}}_{x_{rs;1i},1\le i\le n, 1\le r,s\le d} {\mathbb {E}}_{{\mathbf {u}},{\mathbf {u}}'}\left( \left| k \sum _{i_2\in B(U_{12}),\dots ,i_d \in B(U_d)} a_{i,i_2 \dots i_d} (u_{i_2}-u_{i_2}')\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots ,{\mathbf {c}}_{i_d}^{\bar{1}}\right] \right. \right. \nonumber \\&\quad \quad \left. \left. -\sum _{i_0\in I_0} k_{ii_0}\sum _{i_2\in B(U_{12}),\dots ,i_d\in B(U_d)} a_{i_0 i_2 \dots i_d} (u_{i_2}-u_{i_2}')\det \left[ {\mathbf {c}}_{i_2}^{\bar{1}},\dots ,{\mathbf {c}}_{i_d}^{\bar{1}}\right] \right| = O\left( \beta n^{O_{B,\varepsilon }(1)}\right) \right) \nonumber \\&\quad ={\mathbb {E}}_{x_{rs;1i}, i\notin B(U_{12}); {x_{rs;1i}}',i\in B(U_{12}), 1\le r,s\le d} \left( \left| k \sum _{i_2\in B(U_{12}),\dots ,i_d \in B(U_d)} a_{i,i_2 \dots i_d} \det \left[ {{\mathbf {c}}_{i_2}^{\bar{1}}}',\dots ,{\mathbf {c}}_{i_d}^{\bar{1}}\right] \right. \right. \nonumber \\&\quad \quad \left. \left. -\sum _{i_0\in I_0} k_{ii_0}\sum _{i_2\in B(U_{12}),\dots ,i_d\in B(U_d)} a_{i_0 i_2 \dots i_d} \det \left[ {{\mathbf {c}}_{i_2}^{\bar{1}}}',\dots ,{\mathbf {c}}_{i_d}^{\bar{1}}\right] \right| = O\left( \beta n^{O_{B,\varepsilon }(1)}\right) \right) , \end{aligned}$$
(5.5)

where \({x^{(i)}_{rs}}':=(u_i-u_i')x^{(i)}_{rs}\) and in the determinant formulas the column \({\mathbf {c}}_{i_2}'\) stands for \((u_{i_2}-u_{i_2}'){\mathbf {c}}_{i_2}\). Also, in the first estimate we used the elementary property that for any function \(f\),

$$\begin{aligned}&{\mathbb {E}}_{{\mathbf {u}},{\mathbf {u}}'}\Big (\Vert f({\mathbf {u}})\Vert _2=O\left( \beta n^{O_{B,\varepsilon }(1)}\right) ,\Vert f({\mathbf {u}}')\Vert _2=O\left( \beta n^{O_{B,\varepsilon }(1)}\right) \Big )\\&\quad \le {\mathbb {E}}_{{\mathbf {u}},{\mathbf {u}}'}\Big (\Vert f({\mathbf {u}})-f({\mathbf {u}}')\Vert _2=O\left( \beta n^{O_{B,\varepsilon }(1)}\right) \Big ). \end{aligned}$$

The proof is complete by noting that \(k\) and \(I_0\) are independent of the choice of \(i\). \(\square \)
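The first inequality in (5.5) is an instance of the Cauchy–Schwarz (equivalently, Jensen) inequality \(\big ({\mathbb {E}}_x{\mathbb {E}}_{U_1} f\big )^2\le {\mathbb {E}}_x\big ({\mathbb {E}}_{U_1} f\big )^2\). A minimal numeric check of this step (ours, with arbitrary test data):

```python
import random

# Minimal numeric check (ours) of the Cauchy-Schwarz/Jensen step in (5.5):
# for any bounded f(x, u),  (E_x E_u f)^2 <= E_x (E_u f)^2,
# i.e. squaring the inner average before averaging over x can only grow.
random.seed(0)
X, U = range(50), range(50)
f = {(x, u): random.random() for x in X for u in U}

inner = [sum(f[x, u] for u in U) / len(U) for x in X]  # E_u f, one per x
lhs = (sum(inner) / len(X)) ** 2                        # (E_x E_u f)^2
rhs = sum(v * v for v in inner) / len(X)                # E_x (E_u f)^2
assert lhs <= rhs + 1e-12
```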

Proof

(of Theorem 5.3) Note that by Lemma 5.7, we just need to deal with a \((d-1)\)-multilinear form of the rows \({\mathbf {r}}_2,\dots ,{\mathbf {r}}_d\). Our next step is to apply this machinery again, fixing \(U_{123}=U_{12}\cup U_3\) and letting \(U_{12}\) be a random subset of \(U_{123}\). By iterating the machinery \(d-1\) times, similarly to the proof of Theorem 3.5, we obtain the result as claimed. \(\square \)

We now conclude this section by proving the inverse step of Sect. 4.

5.4 Proof of Theorem 4.5

We first apply Theorem 5.3 to obtain

$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_d}\left( \sum _{1\le i_d \le dn}\sum _{j_1 \dots j_{d-1}}k_{j_1 \dots j_{d-1}}a_{j_1 \dots j_{d-1} i_d}x_{i_d}\in B(a,\beta )\right) \ge \rho ^{2d}/n^{O_{C,\varepsilon }(1)}. \end{aligned}$$
(5.6)

Set \(K_0,\dots , K_d\) to be a sequence of thresholds with \(K_i:=n^{-A/2+2i d}\). We consider two cases.

Case 1 (degenerate case)

Subcase 1.1 Assume that for all \(i\in I_1\),

$$\begin{aligned} \sum _{i_2,\dots ,i_d}\left| ka_{ii_2\dots i_d}-\sum k_{ii_0}a_{i_0i_2\dots i_d}\right| ^2 \le K_0^2=n^{-A}. \end{aligned}$$
(5.7)

As \(\sum _{i_1,i_2,\dots ,i_d}|a_{i_1i_2\dots i_d}|^2=1\), there exists \(i_2,\dots ,i_d\) such that

$$\begin{aligned} \sum _{i_1}|a_{i_1i_2\dots i_d}|^2\ge 1/(dn)^{d-1}\ge n^{-O_d(1)}. \end{aligned}$$
(5.8)

We next fix these indices \(i_2,\dots ,i_d\). It follows from (5.7) that \(|ka_{ii_2\dots i_d}-\sum k_{ii_0}a_{i_0i_2\dots i_d}|\le K_0\) for any \(i\in I_1\). Set

$$\begin{aligned} v_i:=\frac{1}{k}\sum k_{ii_0}a_{i_0i_2\dots i_d}. \end{aligned}$$

It is clear that the set of \(v_i\)’s is a GAP of rank \(|I_0|=O_{B,\varepsilon }(1)\) and size \(n^{O_{B,\varepsilon }(1)}\). Also, by definition, with \({\mathbf {v}}=(v_i,i\in I_1)\)

$$\begin{aligned} \Vert {\mathbf {v}}-(a_{ii_2\dots i_d})_{1\le i\le dn}\Vert \le n^{1/2}K_0 =n^{-A/2+1/2}. \end{aligned}$$

On the other hand, as the vector \((a_{ii_2\dots i_d})_{1\le i\le dn}\) is orthogonal to any row \({\mathbf {r}}_j({\mathbf {M}}_{n-1})\) of \({\mathbf {M}}_{n-1}\), we have

$$\begin{aligned} |\langle {\mathbf {v}},{\mathbf {r}}_j({\mathbf {M}}_{n-1}) \rangle |\le n^{-A/2+O_{B,\varepsilon }(1)}. \end{aligned}$$

Recall that by the approximation and by (5.8), \(\Vert {\mathbf {v}}\Vert \ge n^{-O_d(1)}\). Thus by letting \({\mathbf {u}}:={\mathbf {v}}/\Vert {\mathbf {v}}\Vert \), we have

$$\begin{aligned} |\langle {\mathbf {u}},{\mathbf {r}}_j({\mathbf {M}}_{n-1}) \rangle |\le n^{-A/2+O_{B,\varepsilon ,d}(1)}. \end{aligned}$$

It is clear that \({\mathbf {u}}\) satisfies all the conditions of Theorem 4.5, so we are done with this subcase.

Subcase 1.2 From now on we assume that there exists \(i\in I_1\) such that

$$\begin{aligned} \sum _{i_2,\dots ,i_d}\left| ka_{ii_2\dots i_d}-\sum k_{ii_0}a_{i_0i_2\dots i_d}\right| ^2 \ge K_0^2=n^{-A}. \end{aligned}$$
(5.9)

Fixing \(i\), we apply Theorem 5.3 to the index \(i_2\), and reconsider Subcase 1.1 with the new threshold \(K_1=n^{-A/2+2d}\). By iterating the process at most \(d-1\) times, we end up either in Subcase 1.1 (and hence are done) or in the following non-degenerate case.

Case 2 (non-degenerate case) There exists a collection \(J_1,\dots , J_{d-1}\) of sets of indices \(j_1,\dots ,j_{d-1}\) such that \(|J_i|=O_{B,\varepsilon }(1)\) and

$$\begin{aligned} \sum _{1\le i \le dn}\left| \sum _{j_1\in J_1, \dots j_{d-1}\in J_{d-1}}k_{j_1 \dots j_{d-1}}a_{j_1 \dots j_{d-1} i}\right| ^2 \ge K_d^2= n^{-A+2d^2}. \end{aligned}$$

Notice that for each fixed \(i_1,\dots ,i_{d-1}\) the vector \((a_{i_1,\dots ,i_{d-1},i},1\le i\le dn,i\ne i_1,\dots ,i_{d-1})\) is orthogonal to any \({\mathbf {r}}_j^{(i_1,\dots ,i_{d-1})}({\mathbf {M}}_{n-1})\), the \(j\)th row of \({\mathbf {M}}_{n-1}\) without components \(i_1,\dots ,i_{d-1}\). By adding zeros to the missing components \(i_1,\dots ,i_{d-1}\) if needed, we see that the \({\mathbb {R}}^{nd}\) vector \((a_{i_1,\dots ,i_{d-1},i},1\le i\le dn)\) is now orthogonal to \({\mathbf {r}}_j({\mathbf {M}}_{n-1})\).

Setting \(J=J_1\cup \dots \cup J_{d-1}\), we thus have

$$\begin{aligned} \sum _{i\notin J,1\le i\le dn} a_{j_1,\dots ,j_{d-1},i} {\mathbf {r}}_j(i) = -\sum _{i\in J,1\le i\le dn}a_{j_1,\dots ,j_{d-1},i}{\mathbf {r}}_j(i), \end{aligned}$$

where the entries \(a_{j_1,\dots ,j_{d-1},i}\) are set to be zero if the indices are not distinct.

Now we set \({\mathbf {w}}_1:=(w_i)_{i\notin J}\) and \({\mathbf {w}}_2:=(w_i)_{i\in J}\), where \(w_i:=k_{j_1 \dots j_{d-1}}a_{j_1 \dots j_{d-1} i}\). Then

$$\begin{aligned} \left\langle {\mathbf {w}}_1,{\mathbf {r}}_j^{(\bar{J})}({\mathbf {M}}_{n-1})\right\rangle = -\left\langle {\mathbf {w}}_2, {\mathbf {r}}_j^{(J)}({\mathbf {M}}_{n-1})\right\rangle . \end{aligned}$$

Set \({\mathbf {v}}:={\mathbf {w}}/\Vert {\mathbf {w}}\Vert \), where \({\mathbf {w}}:=(w_i)_{1\le i\le dn}\). Theorem 3.2 applied to (5.6) implies that \({\mathbf {v}}\) can be approximated by a vector \({\mathbf {u}}\) as follows.

  • \(|u_i-v_i|\le n^{-A/2+O_{B,\varepsilon ,d}(1)}\) for all \(i\).

  • There exists a GAP of rank \(O_{B,\varepsilon }(1)\) and size \(n^{O_{B,\varepsilon }(1)}\) that contains at least \(dn-n^\varepsilon \) components \(u_i\).

  • All the components \(u_i\), and all the generators of the GAP are rational complex numbers of the form \(\frac{p}{q}+\sqrt{-1}\frac{p'}{q'}\), where \(|p|,|q|,|p'|,|q'| \le n^{A/2+O_{B,\varepsilon }(1)}\).

Note that, by the approximation above, \(\Vert {\mathbf {u}}\Vert \asymp 1\) and \(|\langle {\mathbf {u}}_1,{\mathbf {r}}_j^{(\bar{J})}({\mathbf {M}}_{n-1}) \rangle + \langle {\mathbf {u}}_2,{\mathbf {r}}_j^{(J)}({\mathbf {M}}_{n-1})\rangle | \le n^{-A/2+O_{B,\varepsilon }(1)}\) for all row vectors of \({\mathbf {M}}_{n-1}\).

6 Singularity of block matrices: proof sketch for Theorem 4.6

Our first ingredient is the following variant of Theorem 3.2 in which random variables are replaced by random matrices and \(a_i\) are replaced by vectors.

Theorem 6.1

(Inverse Littlewood–Offord for a sequence of random operators) Let \(\{{\mathbf {u}}^{(i)}=({\mathbf {u}}_1^{(i)},\dots ,{\mathbf {u}}_d^{(i)}), 1\le i\le n\}\) be a sequence of \(n\) vectors in \({\mathbb {C}}^d\) such that the following concentration-type bound holds with high probability

$$\begin{aligned} \sup _{{\mathbf {u}}\in {\mathbb {C}}^d} {\mathbb {P}}_{X^{(1)},\dots ,X^{(n)}} \left( \sum _{1\le i\le n} X^{(i)}{\mathbf {u}}^{(i)}\in B({\mathbf {u}},\beta )\right) =\gamma =n^{-O(1)}, \end{aligned}$$

where \(X^{(1)},\dots ,X^{(n)}\) are iid block matrices whose entries are copies of \((x_{11},\dots ,x_{dd})\) from (4.1). Then there exist a positive constant \(\delta \) and \(d^2\) numbers \(c_{11},\dots ,c_{dd}\) such that the least singular value \(\sigma _d\) (largest singular value \(\sigma _1\)) of the matrix \((c_{ij})_{1\le i,j\le d}\) is at least \(\delta \) (at most \(\delta ^{-1}\)) and for any number \(n'\) between \(n^\varepsilon \) and \(n\), there exists a proper symmetric GAP \(Q=\{\sum _{i=1}^r k_ig_i : |k_i|\le K_i \}\subset {\mathbb {C}}^d\) such that

  • At least \(n-n'\) elements of \(V:=\{(c_{11}{\mathbf {u}}^{(i)}_1+\dots +c_{1d}{\mathbf {u}}^{(i)}_d, \dots , c_{d1}{\mathbf {u}}^{(i)}_1+\dots +c_{dd}{\mathbf {u}}^{(i)}_d), 1\le i\le n\}\) are \(\beta \)-close to \(Q\).

  • \(Q\) has small rank, \(r=O_{B,\varepsilon }(1)\), and small size

    $$\begin{aligned} |Q| \le \max \left\{ O_{B,\varepsilon }\left( \frac{\gamma ^{-1}}{\sqrt{n'}}\right) ,1\right\} . \end{aligned}$$
  • There is a non-zero integer \(p=O_{B,\varepsilon }(\sqrt{n'})\) such that all steps \(g_i\) of \(Q\) have the form \(g_i=(g_{i1},\dots ,g_{id})\), where \(g_{ij}=\beta \frac{p_{ij}}{p} \) with \(p_{ij} \in {\mathbb {Z}}\) and \(|p_{ij}|=O_{B,\varepsilon }(\beta ^{-1} \sqrt{n'}).\)

In applications, \(X^{(1)},\dots , X^{(n)}\) will be the \(d\times d\) blocks of \({\mathbf {M}}_{n-1}\). It is crucial to notice that, as most of the elements of \(V\) are \(\beta \)-close to \(Q\), and as the matrix \((c_{ij})\) is far from degenerate, it follows from Theorem 6.1 that most of the individual components \({\mathbf {u}}^{(i)}_j\) are also close to another GAP of small rank and size (see Corollary 5.2). We will present the proof of Theorem 6.1 in the Appendix, following the treatment from [21]. For the rest of this section we sketch the proof of Theorem 4.6, following [19, 32].

Let \(\mathcal {N}\) be the number of such structural vectors \({\mathbf {u}}\) from Theorem 4.5. Because each GAP is determined by its generators and dimensions, the number of \(Q\)’s is bounded by

$$\begin{aligned} \left( n^{2A+O_{B,\varepsilon }(1)}\right) ^{O_{B,\varepsilon }(1)} \left( n^{O_{B,\varepsilon }(1)}\right) ^{O_{B,\varepsilon }(1)} = n^{O_{A,B,\varepsilon }(1)}. \end{aligned}$$

Next, for a given \(Q\), there are at most \(n^{O_{B,\varepsilon }(n)}\) ways to choose the \(nd-2n^\varepsilon \) components \(u_i\) that \(Q\) contains, and \(n^{O_{A,B,\varepsilon }(n^\varepsilon )}\) ways to choose the remaining components from the set \(\{\frac{p}{q}+i \frac{p'}{q'}, |p|,|q|,|p'|,|q'|\le n^{A/2+O_{B,\varepsilon }(1)}\}\). Hence, we obtain the key bound

$$\begin{aligned} \mathcal {N}\le n^{O_{A,B,\varepsilon }(1)} n^{O_{B,\varepsilon }(n)} n^{O_{A,B,\varepsilon }(n^\varepsilon )} = n^{O_{B,\varepsilon }(n)}. \end{aligned}$$
(6.1)

From now on, by conditioning on \({\mathbf {u}}_1\) and on the entries of \({\mathbf {M}}_{n-1}\) corresponding to the indices \(i_1,\dots ,i_d\) of \({\mathbf {u}}_1\), without loss of generality we assume that \({\mathbf {u}}_1\) vanishes. Set \(\beta _0:=n^{-A/2+O_{B,\varepsilon }(1)}\), the bound obtained from the conclusion of Theorem 4.5. We will denote the blocks of \({\mathbf {M}}_{n-1}\) by \(X^{(i)}_j\) with \(1\le i\le n\) and \(1\le j\le n-1\). For a given vector \({\mathbf {u}}\), we define \({\mathbb {P}}_{\beta _0}({\mathbf {u}})\) as follows

$$\begin{aligned} {\mathbb {P}}_{\beta _0}({\mathbf {u}}):={\mathbb {P}}\left( \sum _{1\le i\le n} X^{(i)}_j{\mathbf {u}}^{(i)}\in B(0,\beta _0) \text{ for at least } (n-1)-O_{B,\varepsilon }(1) \text{ indices } j\right) . \end{aligned}$$

If \({\mathbf {u}}\) satisfies the property above, we say that \({\mathbf {u}}\) is \(\beta _0\)-orthogonal to almost all blocks of \({\mathbf {M}}_{n-1}\). Because the blocks of \({\mathbf {M}}_{n-1}\) are iid,

$$\begin{aligned} {\mathbb {P}}_{\beta _0}({\mathbf {u}}) \le \left( {\mathbb {P}}_{X^{(1)},\dots ,X^{(n)}} \left( \sum _{1\le i\le n} X^{(i)}{\mathbf {u}}^{(i)}\in B(0,\beta _0) \right) \right) ^{n-O(1)}:=\gamma _{\beta _0}({\mathbf {u}})^{n-O(1)}, \end{aligned}$$

where \(X^{(i)}\) are iid copies of \((x_{st})_{1\le s,t\le d}\).

If \(\gamma =\gamma _{\beta _0}({\mathbf {u}})\) is small, say \(n^{-\Omega (1)}\) with a sufficiently large implied constant, then \({\mathbb {P}}_{\beta _0}({\mathbf {u}})\) is \(n^{-\Omega (n)}\). Thus the contribution of these \({\mathbb {P}}_{\beta _0}({\mathbf {u}})\) to the total sum \(\sum _{{\mathbf {u}}}{\mathbb {P}}_{\beta _0}({\mathbf {u}})\) is negligible, as \(\mathcal {N}=n^{O(n)}\).

In the case where \(\gamma \) is comparably large, \(\gamma =n^{-O(1)}\), then by Theorem 6.1, most of the components \(u_i\) are close to a new GAP of rank \(O(1)\) and of size \(O(\gamma ^{-1}/\sqrt{n})\). This then enables us to approximate \({\mathbf {u}}\) by a new vector \({\mathbf {u}}'\) in such a way that \(|\langle {\mathbf {u}}',{\mathbf {r}}_i({\mathbf {M}}_{n-1})\rangle |\), where we recall that \({\mathbf {r}}_i({\mathbf {M}}_{n-1})\) is the \(i\)th row of \({\mathbf {M}}_{n-1}\), is still of order \(O(\beta _0)\), and the components of \({\mathbf {u}}'\) now come from the new GAPs (after a linear transformation). The number \(\mathcal {N'}\) of these \({\mathbf {u}}'\) can be bounded by \((\gamma ^{-1}/n^\varepsilon )^{n}\), while we recall that \({\mathbb {P}}_{\beta _0}({\mathbf {u}}')\) is of order \(\gamma ^{n}\). Thus, summing over \({\mathbf {u}}'\) we obtain the desired bound

$$\begin{aligned} \sum _{{\mathbf {u}}'} {\mathbb {P}}_{\beta _0}({\mathbf {u}}') \le \#\{\text{new GAPs}\} \, (\gamma ^{-1}/n^\varepsilon )^{n} \gamma ^{n} = O(n^{-\varepsilon n+O(1)}). \end{aligned}$$

To proceed further, we need the following elementary claim.

Claim 6.2

Assume that \({\mathbf {C}}=(c_{ij})\) is a \(d\times d\) matrix such that \(\delta \le \sigma _d({\mathbf {C}})\le \sigma _1({\mathbf {C}}) \le \delta ^{-1}\). Let \({\mathbf {u}}'=({\mathbf {u}}'^{(1)},\dots ,{\mathbf {u}}'^{(n)})\), where \({\mathbf {u}}'^{(i)} ={\mathbf {C}}{\mathbf {u}}^{(i)}\). Then we have

$$\begin{aligned} \gamma _\beta ({\mathbf {u}})\le \gamma _{\delta ^{-1} \beta }({\mathbf {u}}')\le \gamma _{\delta ^{-2}\beta }({\mathbf {u}}). \end{aligned}$$
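The norm comparison behind Claim 6.2 can be checked numerically (a sketch of ours; the \(2\times 2\) matrix below is an arbitrary example, not from the paper): when \(\delta \le \sigma _d({\mathbf {C}})\le \sigma _1({\mathbf {C}})\le \delta ^{-1}\), every \(|{\mathbf {C}}{\mathbf {u}}|\) lies in the window \([\sigma _d|{\mathbf {u}}|,\sigma _1|{\mathbf {u}}|]\), so \(\beta \)-balls for \({\mathbf {u}}'={\mathbf {C}}{\mathbf {u}}\) correspond to \((\delta ^{-1}\beta )\)-balls for \({\mathbf {u}}\).

```python
import math
import random

# Numeric sketch (ours) of the norm comparison behind Claim 6.2:
# sigma_d(C)|u| <= |C u| <= sigma_1(C)|u| for every vector u.
C = [[2.0, 1.0], [0.0, 1.0]]   # arbitrary well-conditioned example

def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(2)) for r in range(2)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Singular values of C from the eigenvalues of C^T C = [[4, 2], [2, 2]]
# (trace 6, determinant 4 for this particular C), by the quadratic formula.
tr, det = 6.0, 4.0
disc = math.sqrt(tr * tr - 4.0 * det)
sigma1 = math.sqrt((tr + disc) / 2.0)   # largest singular value
sigmad = math.sqrt((tr - disc) / 2.0)   # least singular value

random.seed(0)
for _ in range(200):
    u = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
    ratio = norm(matvec(C, u)) / norm(u)
    assert sigmad - 1e-9 <= ratio <= sigma1 + 1e-9
```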

By paying a factor of \(n^{O_{B,\varepsilon }(1)}\) in probability, we may assume that \(|\langle {\mathbf {u}},{\mathbf {r}}_i(M_{n}) \rangle | \le \beta _0\) for the first \(d(n-1)-O_{B,\varepsilon }(1)\) rows of \({\mathbf {M}}_{n}\). Also, by paying another factor of \(n^{n^\varepsilon }\) in probability, we may assume that the first \(d n_0\) components of \({\mathbf {u}}\) belong to a GAP \(Q\), and that the Euclidean norm of \({\mathbf {u}}^{(n_0)}\) is not too small, \(\Vert {\mathbf {u}}^{(n_0)}\Vert = \Omega (1/n)\) (recall that \(\Vert {\mathbf {u}}\Vert \asymp 1\)), where

$$\begin{aligned} n_0:=n-n^\varepsilon . \end{aligned}$$

We refer to the remaining \(u_i\)’s as exceptional components. Note that these extra factors do not affect our final bound \(\exp (-\Omega (n))\). Because \(\Vert {\mathbf {u}}^{(n_0)}\Vert = \Omega (1/n)\) and \(X^{(n_0)}\) is not degenerate with high probability, there exist positive constants \(c_1,c_2\) such that \(c_2<1\) and for any \(\beta \le c_1/\sqrt{n}\) we have

$$\begin{aligned} \gamma _{\beta }({\mathbf {u}})&\le \sup _a {\mathbb {P}}_{X^{(n_0)}}(|X^{(n_0)}{\mathbf {u}}^{(n_0)}-a|\le \beta ) \le 1-c_2. \end{aligned}$$
(6.2)

6.1 Classification

Next, let \(C\) be a sufficiently large constant depending on \(B\) and \(\varepsilon \) but not \(A\). We classify \({\mathbf {u}}\) into two classes \(\mathcal {B}\) and \(\mathcal {B}'\), depending on whether \({\mathbb {P}}_{\beta _0}({\mathbf {u}})\ge n^{-Cn}\) or not. Because of (6.1), for \(C\) large enough,

$$\begin{aligned} \sum _{{\mathbf {u}}\in \mathcal {B}'}{\mathbb {P}}_{\beta _0}({\mathbf {u}})\le n^{O_{B,\varepsilon }(n)}/n^{Cn} \le n^{-n/2}. \end{aligned}$$
(6.3)

We divide the set \(\mathcal {B}\) of remaining vectors into two subfamilies. Set \(n':=n^{1-\varepsilon }\). We say that \({\mathbf {u}}\in \mathcal {B}\) is compressible if for any \(n'\) components \({\mathbf {u}}^{(i_1)},\dots ,{\mathbf {u}}^{(i_{n'})}\) among \({\mathbf {u}}^{(1)},\dots , {\mathbf {u}}^{(n_0)}\), we have

$$\begin{aligned} \sup _a{\mathbb {P}}_{X_{i_1},\dots ,X_{i_{n'}}}\left( \left| X_{i_1}{\mathbf {u}}^{(i_1)}+\dots +X_{i_{n'}}{\mathbf {u}}^{(i_{n'})}-a\right| \le n^{-B-4}\right) \ge (n')^{-1/2+o(1)}. \end{aligned}$$
(6.4)

Let \(\mathcal {B}_1\) and \(\mathcal {B}_2\) be the set of compressible and incompressible vectors respectively. We focus on \(\mathcal {B}_1\) first.

6.2 Approximation for compressible vectors

Set \(\beta :=n^{-B-4}\). It follows from Theorem 6.1 that, among any \({\mathbf {u}}^{(i_1)},\dots ,{\mathbf {u}}^{(i_{n'})}\), there are at least \(n'/2+1\) vectors that belong to a ball of radius \(\beta \) in \({\mathbb {C}}^d\) (because our GAP now has only one element after a linear transformation \({\mathbf {C}}=(c_{ij})_{1\le i,j\le d}\)). A simple covering argument then implies that there is a ball of radius \(2\beta \) that contains all but at most \(n'-1\) of the vectors \({\mathbf {u}}^{(i)}\).

Thus there exists a vector \({\mathbf {u}}'=({\mathbf {u}}'^{(1)},\dots ,{\mathbf {u}}'^{(n)})\in (2\beta )\cdot ({\mathbb {Z}}+ \sqrt{-1} {\mathbb {Z}})^{nd}\) such that

  • \(|{\mathbf {C}}{\mathbf {u}}^{(i)}-{\mathbf {u}}'^{(i)}|\le 4\beta \) for all \(i\);

  • \({\mathbf {u}}'^{(i)}\) takes the same vector-value for at least \(n_0-n'\) indices \(i\).
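The rounding behind the two bullets can be sketched as follows (ours, not the paper's construction): each complex coordinate of \({\mathbf {C}}{\mathbf {u}}^{(i)}\) is moved to the nearest point of the lattice \((2\beta )\cdot ({\mathbb {Z}}+\sqrt{-1}{\mathbb {Z}})\), an error of at most \(\beta \sqrt{2}<2\beta \) per coordinate, well within the \(4\beta \) tolerance of the first bullet.

```python
import math

# Sketch (ours, not the paper's construction) of the lattice-rounding step:
# move a complex coordinate to the nearest point of (2*beta)*(Z + iZ);
# the rounding error per coordinate is at most beta*sqrt(2) < 2*beta.
def round_to_lattice(z, beta):
    step = 2.0 * beta
    return complex(round(z.real / step) * step, round(z.imag / step) * step)

beta = 1e-3
z = 0.123456 + 0.654321j
zr = round_to_lattice(z, beta)
assert abs(z - zr) <= beta * math.sqrt(2) + 1e-12
```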

Because of the approximation and Assumption 4.2, whenever \(\sum _{1\le i\le n} X^{(i)}{\mathbf {u}}^{(i)}\in B({\mathbf {u}},\beta )\), we also have

$$\begin{aligned} \left| \sum _{1\le i\le n} \left( X^{(i)}{{\mathbf {u}}'}^{(i)}-{\mathbf {C}}X^{(i)}{\mathbf {u}}^{(i)}\right) \right| \le n(n^{B+1}+n^\alpha )(4\beta )+\beta _0:=\beta '. \end{aligned}$$

By definition, \(\beta ' \le c_1/\sqrt{n}\), and thus by (6.2), \({\mathbb {P}}_{\beta '}({\mathbf {u}}') \le (1-c_2)^{(1-o(1))n}\). Now we bound the number of \({\mathbf {u}}'\) obtained from the approximation. First, there are \(O(n^{n-n_0+n'}) = O(n^{2n^{1-\varepsilon }})\) ways to choose those \({\mathbf {u}}'^{(i)}\) that take the same vector \({\mathbf {w}}\in {\mathbb {C}}^d\), and there are just \(O(\beta ^{-d})\) ways to choose \({\mathbf {w}}\). The remaining components belong to the set \((2\beta )\cdot ({\mathbb {Z}}+ i{\mathbb {Z}})^d\), and thus there are at most \(O((\beta ^{-d})^{n-n_0+n'})= O(n^{O_{A,B,\varepsilon }(n^{1-\varepsilon })})\) ways to choose them. Hence we obtain the total bound

$$\begin{aligned} \sum _{{\mathbf {u}}\in \mathcal {B}_1}{\mathbb {P}}_{\beta _0}({\mathbf {u}}) \le \sum _{{\mathbf {u}}'}{\mathbb {P}}_{\beta '}({\mathbf {u}}')&\le O\left( n^{2n^{1-\varepsilon }}\right) O\left( n^{O_{A,B,\varepsilon }(n^{1-\varepsilon })}\right) (1-c_2)^{(1-o(1))n}\\&\le (1-c_2)^{(1-o(1))n}. \end{aligned}$$

6.3 Approximation for incompressible vectors

The treatment here is similar; we sketch the main steps, leaving the details to the reader as an exercise.

First, by exposing the rows of \({\mathbf {M}}_{n-1}\) accordingly, and by paying an extra factor \(\left( {\begin{array}{c}n_0\\ n'\end{array}}\right) =O(n^{n^{1-\varepsilon }})\) in probability, we can assume that the components \({\mathbf {u}}^{(n_0-n'+1)},\dots ,{\mathbf {u}}^{(n_0)}\) satisfy

$$\begin{aligned}&\sup _{{\mathbf {a}}\in {\mathbb {C}}^d}{\mathbb {P}}_{X^{(n_0-n'+1)},\dots ,X^{(n_0)}}\left( \left| X^{(n_0-n'+1)} {\mathbf {u}}^{(n_0-n'+1)}+\dots +X^{(n_0)}{\mathbf {u}}^{(n_0)}-{\mathbf {a}}\right| \le n^{-B-4}\right) \nonumber \\&\quad \le (n')^{-1/2+o(1)}\nonumber \\&\quad \le n^{-1/2+\varepsilon /2+o(1)}. \end{aligned}$$
(6.5)

Next, define a radius sequence \(\beta _k, k\ge 0\) where \(\beta _0=n^{-A/2+O_{B,\varepsilon }(1)}\) is the bound obtained from the conclusion of Theorem 4.5, and \(\beta _{k+1}:= (n^{B+2}+n^{\alpha +1}+1)^2 \beta _k.\) Also define

$$\begin{aligned} \gamma _{\beta _k}({\mathbf {u}}):= \sup _{{\mathbf {a}}\in {\mathbb {C}}^d}{\mathbb {P}}_{X^{(n_0-n'+1)},\dots ,X^{(n_0)}}\left( \left| X^{(n_0-n'+1)} {\mathbf {u}}^{(n_0-n'+1)}+\dots +X^{(n_0)}{\mathbf {u}}^{(n_0)}-{\mathbf {a}}\right| \le \beta _k\right) . \end{aligned}$$

Clearly \({\mathbb {P}}_{\beta _k}({\mathbf {u}}) \le \gamma ^{n-1}_{\beta _k}({\mathbf {u}})\). As \({\mathbf {C}}\) is non-degenerate, with \({\mathbf {u}}'\) from Theorem 6.1,

$$\begin{aligned} \gamma _{\beta _k}({\mathbf {u}}) \le \gamma _{n(n^{B+1}+n^\alpha ) \beta _k +\beta _k}({\mathbf {u}}') \le \gamma _{\beta _{k+1}}({\mathbf {u}}). \end{aligned}$$
(6.6)

Furthermore, we have freedom to choose \(k\) before applying Theorem 6.1 to obtain \({\mathbf {u}}'\). By the pigeon-hole principle, there exists \(k=k_0({\mathbf {u}})\le C\varepsilon ^{-1}\) such that

$$\begin{aligned} {\mathbb {P}}_{\beta _{k_0+1}}({\mathbf {u}}) \le n^{\varepsilon n} {\mathbb {P}}_{\beta _{k_0}}({\mathbf {u}}). \end{aligned}$$
(6.7)

Since \(A\) was chosen sufficiently large compared to \(O_{B,\varepsilon }(1)\) and \(C\), we have \(\beta _{k_0+1}\le n^{-B-4}\). With this choice of \(k_0\), we apply Theorem 6.1 to obtain an approximation \({\mathbf {u}}'\) of \({\mathbf {C}}{\mathbf {u}}\) with the following properties.

  (i) \(|{\mathbf {C}}{\mathbf {u}}^{(i)}-{\mathbf {u}}'^{(i)}|\le \beta _{k_0}\) for all \(i\).

  (ii) The components of \({{\mathbf {u}}'}^{(i)}\) belong to \(Q\) for all but \(n^{1-2\varepsilon }\) indices \(i\), and the generators of \(Q\) belong to the set \(\beta _{k_0}\cdot \{p/q +\sqrt{-1}\, p'/q' : |p|,|q|,|p'|,|q'|\le n^{A/2+O_{B,\varepsilon }(1)}\}\).

  (iii) \(Q\) has rank \(O_{B,\varepsilon }(1)\) and size \(|Q|=O(\gamma _{\beta _{k_0}}({\mathbf {u}})^{-1}/n^{1/2-\varepsilon })\).

Let \(\mathcal {B'}\) be the collection of such \({\mathbf {u}}'\). By definition,

$$\begin{aligned} {\mathbb {P}}_{(n^{B+2}+n^{\alpha +1}+1)\beta _{k_0}}({\mathbf {u}}')= \gamma ^{n-1}_{(n^{B+2}+n^{\alpha +1}+1)\beta _{k_0}}({\mathbf {u}}') \le \gamma ^{n-1}_{\beta _{k_0+1}}({\mathbf {u}}) \le n^{\varepsilon n}\gamma ^{n-1}_{\beta _{k_0}}({\mathbf {u}}). \end{aligned}$$
(6.8)

Arguing similarly to the treatment for \(\mathcal {N}\), we can bound the cardinality \(\mathcal {N'}\) of \(\mathcal {B'}\) by

$$\begin{aligned} \mathcal {N}'\le \gamma _{\beta _{k_0}}({\mathbf {u}})^{-n}/n^{(1/2-\varepsilon -o(1))n}. \end{aligned}$$
(6.9)

It follows from (6.8) and (6.9) that

$$\begin{aligned} \sum _{{\mathbf {u}}'\in \mathcal {B'}}{\mathbb {P}}_{(n^{B+2}+n^{\alpha +1}+1)\beta _{k_0}}({\mathbf {u}}') \le n^{-(1/2-4\varepsilon -o(1))n}, \end{aligned}$$

completing the treatment for incompressible vectors.

7 Universality of random block matrices: proof of Theorem 2.4

This section is devoted to Theorem 2.4. We begin by introducing the following notation. Given a \(n \times n\) matrix \({\mathbf {M}}\), we let \(\mu _{{\mathbf {M}}}\) denote the empirical measure built from the eigenvalues of \({\mathbf {M}}\) and \(\nu _{{\mathbf {M}}}\) denote the symmetric empirical measure built from the singular values of \({\mathbf {M}}\). That is,

$$\begin{aligned} \mu _{{\mathbf {M}}} := \frac{1}{n} \sum _{i =1}^n \delta _{\lambda _i({\mathbf {M}})} \quad \text {and}\quad \nu _{{\mathbf {M}}} := \frac{1}{2n} \sum _{i =1}^n \left( \delta _{\sigma _i({\mathbf {M}})} + \delta _{-\sigma _i({\mathbf {M}})} \right) \!, \end{aligned}$$

where \(\lambda _1({\mathbf {M}}), \ldots , \lambda _n({\mathbf {M}}) \in \mathbb {C}\) are the eigenvalues of \({\mathbf {M}}\) and \(\sigma _1({\mathbf {M}}) \ge \cdots \ge \sigma _n({\mathbf {M}})\) are the singular values of \({\mathbf {M}}\). Recall that \(F^{{\mathbf {M}}}\) is the ESD of \({\mathbf {M}}\). In particular, we have

$$\begin{aligned} F^{{\mathbf {M}}}(x,y) = \int \limits _{-\infty }^x \int \limits _{-\infty }^{y} \mu _{{\mathbf {M}}}(z)dt ds, \end{aligned}$$

where \(z = s + \sqrt{-1}t\).
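
The definitions above can be sanity-checked numerically. The following sketch (illustrative only; the matrix size, distribution, and seed are arbitrary choices, not part of the text) builds the empirical spectral measure of a small random matrix and evaluates the ESD \(F^{{\mathbf {M}}}(x,y)\) directly from its definition.

```python
import numpy as np

# Illustrative sketch: build the empirical spectral measure of a small random
# matrix M and evaluate the ESD
#   F^M(x, y) = (1/n) #{i : Re(lambda_i) <= x, Im(lambda_i) <= y}.
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

eigvals = np.linalg.eigvals(M)              # lambda_1(M), ..., lambda_n(M)
sing = np.linalg.svd(M, compute_uv=False)   # sigma_1(M) >= ... >= sigma_n(M)

def esd(x, y):
    """F^M(x, y): fraction of eigenvalues with Re <= x and Im <= y."""
    return np.mean((eigvals.real <= x) & (eigvals.imag <= y))

# F^M is a bivariate distribution function with total mass 1.
assert esd(1e6, 1e6) == 1.0 and esd(-1e6, -1e6) == 0.0

# The symmetrized singular-value measure nu_M puts mass 1/(2n) at each
# +/- sigma_i(M), so its support is symmetric about 0.
support = np.sort(np.concatenate([sing, -sing]))
assert np.allclose(support, -support[::-1])
```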

Many of the techniques used to study Hermitian matrices fail to work for non-Hermitian matrices [2, Section 11.1]. Consider a \(n \times n\) non-Hermitian matrix \({\mathbf {M}}\). In [10, 11], Girko introduced a natural connection between \(\mu _{{\mathbf {M}}}\) and the collection of measures \(\{\nu _{{\mathbf {M}}- z {\mathbf {I}}}\}_{z \in {\mathbb {C}}}\). Formally, we present this connection as Lemma 7.1 below.

Lemma 7.1 follows from [2, Lemma 11.2] and is based on Girko’s original observation [10, 11]. The lemma has appeared in a number of different forms; for example, see [5, Lemma 4.3] and [12].

Lemma 7.1

(Lemma 11.2 from [2]) Let \({\mathbf {M}}\) be a \(n \times n\) matrix. For any \(uv \ne 0\), we have

$$\begin{aligned}&\int \int e^{\sqrt{-1} ux + \sqrt{-1} vy} F^{{\mathbf {M}}}(dx, dy) \\&\quad = \frac{u^2 + v^2}{4 \sqrt{-1} u \pi } \int \int \frac{ \partial }{\partial s} \left[ \int \limits _{0}^\infty \ln |x|^2 \nu _{{\mathbf {M}}- z{\mathbf {I}}}(dx) \right] e^{\sqrt{-1} us + \sqrt{-1} vt} dt ds, \end{aligned}$$

where \(z = s + \sqrt{-1} t\).

We define the function

$$\begin{aligned} g_{{\mathbf {M}}}(s,t) := \frac{\partial }{\partial s} \int \limits _{0}^\infty \log |x|^2 \nu _{{\mathbf {M}}-z{\mathbf {I}}}(dx), \end{aligned}$$
(7.1)

where \(z = s + \sqrt{-1} t\). We also define

$$\begin{aligned} g(s,t) := \left\{ \begin{array}{ll} \frac{2s}{s^2 + t^2}, &{}\quad \text {if } s^2 + t^2 >1\\ 2s, &{}\quad \text {otherwise} \end{array} \right. \!\! . \end{aligned}$$
(7.2)

Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\). We define the \(2dn \times 2dn\) Hermitian matrix

$$\begin{aligned} {\mathbf {H}}_n = {\mathbf {H}}_n(z) := \begin{bmatrix} {\mathbf 0}&\quad \frac{1}{\sqrt{n}} {\mathbf {X}}_n - z {\mathbf {I}}\\ \frac{1}{\sqrt{n}} {\mathbf {X}}_n^*- \bar{z} {\mathbf {I}}&\quad {\mathbf 0}\end{bmatrix} \end{aligned}$$

for \(z \in \mathbb {C}\). It is straightforward to verify that the eigenvalues of \({\mathbf {H}}_n\) are given by

$$\begin{aligned} \pm \sigma _1\left( \frac{1}{\sqrt{n}}{\mathbf {X}}_n - z{\mathbf {I}}\right) , \pm \sigma _2\left( \frac{1}{\sqrt{n}}{\mathbf {X}}_n - z{\mathbf {I}}\right) , \ldots , \pm \sigma _{dn}\left( \frac{1}{\sqrt{n}}{\mathbf {X}}_n - z{\mathbf {I}}\right) . \end{aligned}$$

In other words, \(\nu _{\frac{1}{\sqrt{n}} {\mathbf {X}}_n - z {\mathbf {I}}}\) is the empirical spectral measure of \({\mathbf {H}}_n\). By Lemma 7.1, the problem of studying \(\mu _{\frac{1}{\sqrt{n}} {\mathbf {X}}_n}\) reduces to studying the eigenvalue distribution of \({\mathbf {H}}_n\).
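
As a quick numerical sanity check of this Hermitization (illustrative only; the dimension below stands in for \(dn\), and the seed and value of \(z\) are arbitrary), the spectrum of the block matrix is indeed \(\pm\) the singular values of \(\frac{1}{\sqrt{n}}{\mathbf {X}}_n - z{\mathbf {I}}\):

```python
import numpy as np

# Sanity check: the eigenvalues of H = [[0, A], [A*, 0]], with
# A = X/sqrt(n) - z I, are exactly +/- the singular values of A.
rng = np.random.default_rng(1)
dn = 6                      # stands in for d*n; illustrative size
z = 0.3 - 0.7j
X = rng.standard_normal((dn, dn)) + 1j * rng.standard_normal((dn, dn))
A = X / np.sqrt(dn) - z * np.eye(dn)

H = np.block([[np.zeros((dn, dn)), A],
              [A.conj().T, np.zeros((dn, dn))]])

eig = np.sort(np.linalg.eigvalsh(H))        # H is Hermitian by construction
sig = np.linalg.svd(A, compute_uv=False)    # singular values of A
expected = np.sort(np.concatenate([sig, -sig]))

assert np.allclose(eig, expected)
```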

7.1 Truncation

In practice, it will be more convenient to work with a truncated version of \({\mathbf {H}}_n\). That is, we will work with a new matrix \(\hat{{\mathbf {H}}}_n\) whose entries are truncated versions of the entries of the original matrix \({\mathbf {H}}_n\). This subsection is devoted to the following standard truncation arguments.

Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume

$$\begin{aligned} m_{2+\eta } := \max _{1 \le s,t \le d} {\mathbb {E}}|\xi _{st}|^{2 + \eta } < \infty , \end{aligned}$$
(7.3)

for some \(\eta > 0\). Let \(\delta > 0\). For each \(s,t \in \{1,\ldots ,d\}\), we define

$$\begin{aligned} \tilde{\xi }_{st}^{(n)} := \xi _{st} \mathbf {1}_{\{|\xi _{st}| \le n^{\delta }\}} - {\mathbb {E}}\left[ \xi _{st} \mathbf {1}_{\{|\xi _{st}| \le n^{\delta }\}} \right] \quad \text {and} \quad \hat{\xi }_{st}^{(n)} := \frac{\tilde{\xi }_{st}^{(n)}}{\sqrt{d{{\mathrm{Var}}}(\tilde{\xi }_{st}^{(n)})}}. \end{aligned}$$

Here \(\mathbf {1}_{E}\) denotes the indicator function of the event \(E\). We present the following standard truncation lemma.

Lemma 7.2

Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume (7.3) holds for some \(\eta > 0\). For each \(\delta > 0\), there exists \(n_0 > 0\) such that the following holds for all \(n > n_0\).

  (i) For each \(s,t \in \{1,\ldots ,d\}\), \(\hat{\xi }_{st}^{(n)}\) has mean zero and variance \(1/d\).

  (ii) Almost surely, \(\max _{1 \le s,t \le d} \left| \hat{\xi }_{st}^{(n)}\right| \le 4 n^{\delta }\).

  (iii) We have

    $$\begin{aligned} \max _{1 \le s,t \le d} \left| 1/d - {{\mathrm{Var}}}( \tilde{\xi }_{st}^{(n)} ) \right| \le 2 \frac{m_{2+\eta }}{n^{\delta \eta }}. \end{aligned}$$

  (iv) We have

    $$\begin{aligned} \max _{ (s,t) \ne (u,v)} \left| {\mathbb {E}}\left[ \hat{\xi }_{st}^{(n)} \overline{\hat{\xi }_{uv}^{(n)}} \right] \right| \le 10 \frac{\sqrt{m_{2 + \eta }}}{n^{\delta \eta /2}}. \end{aligned}$$

Proof

(of Lemma 7.2) We first note that

$$\begin{aligned} {{\mathrm{Var}}}(\tilde{\xi }_{st}^{(n)}) \le {\mathbb {E}}|\xi _{st}|^2 \mathbf {1}_{\{|\xi _{st}| \le n^{\delta }\}} \le {\mathbb {E}}|\xi _{st}|^2 = 1/d \end{aligned}$$
(7.4)

for all \(s,t \in \{1,\ldots ,d\}\). We also note that

$$\begin{aligned} \left| 1/d- {{\mathrm{Var}}}(\tilde{\xi }_{st}^{(n)}) \right| = \left| 1/d - {\mathbb {E}}\left| \tilde{\xi }_{st}^{(n)}\right| ^2 \right| \le 2 {\mathbb {E}}|\xi _{st}|^2 \mathbf {1}_{\{|\xi _{st}| > n^{\delta } \}} \le 2\frac{m_{2+\eta }}{n^{\delta \eta }}. \end{aligned}$$
(7.5)

Since this holds for all \(s,t \in \{1,\ldots ,d\}\), we obtain (iii). We now take \(n_0\) sufficiently large such that

$$\begin{aligned} 8m_{2 + \eta } \le n_0^{\delta \eta } \end{aligned}$$
(7.6)

and

$$\begin{aligned} \min _{1 \le s,t \le d} {{\mathrm{Var}}}(\tilde{\xi }_{st}^{(n)}) > \frac{1}{2d} \end{aligned}$$

for all \(n > n_0\); let \(n > n_0\). Then each \(\hat{\xi }_{st}^{(n)}\) has mean zero and variance \(1/d\) by construction. Moreover, we have a.s.

$$\begin{aligned} \left| \hat{\xi }_{st}^{(n)} \right| \le \frac{2n^{\delta }}{\sqrt{d{{\mathrm{Var}}}(\tilde{\xi }_{st}^{(n)})}} \le 4 n^{\delta } \end{aligned}$$

for all \(s,t \in \{1,\ldots ,d\}\). We now verify (iv); fix \(s,t,u,v \in \{1,\ldots ,d\}\) with \((s,t) \ne (u,v)\). We have

$$\begin{aligned} \left| {\mathbb {E}}\left[ \hat{\xi }_{st}^{(n)} \overline{\hat{\xi }_{uv}^{(n)}} \right] - {\mathbb {E}}\left[ \tilde{\xi }_{st}^{(n)} \overline{\tilde{\xi }_{uv}^{(n)}} \right] \right|&\le {\mathbb {E}}\left| \hat{\xi }_{st}^{(n)} \overline{\hat{\xi }_{uv}^{(n)}} \right| \left| 1 - d \sqrt{{{\mathrm{Var}}}( \tilde{\xi }_{st}^{(n)})} \sqrt{ {{\mathrm{Var}}}(\tilde{\xi }_{uv}^{(n)})}\right| \\&\le \left| \frac{1}{d} -\sqrt{{{\mathrm{Var}}}( \tilde{\xi }_{st}^{(n)})} \sqrt{ {{\mathrm{Var}}}(\tilde{\xi }_{uv}^{(n)})} \right| \\&\le d \left| \frac{1}{d^2} - {{\mathrm{Var}}}(\tilde{\xi }_{st}^{(n)}) {{\mathrm{Var}}}(\tilde{\xi }_{uv}^{(n)}) \right| \\&\le \left| {{\mathrm{Var}}}(\tilde{\xi }_{st}^{(n)}) - \frac{1}{d} \right| + \left| {{\mathrm{Var}}}(\tilde{\xi }_{uv}^{(n)}) - \frac{1}{d} \right| \\&\le 4 \frac{m_{2+\eta }}{n^{\delta \eta }} \end{aligned}$$

by (7.4) and the Cauchy–Schwarz inequality. Here the last inequality follows from (7.5). By another application of Cauchy–Schwarz and (7.4), we obtain

$$\begin{aligned} \left| {\mathbb {E}}\left[ \tilde{\xi }_{st}^{(n)} \overline{ \tilde{\xi }_{uv}^{(n)}} \right] \right|&\le \sqrt{ {\mathbb {E}}|\xi _{st}|^2 \mathbf {1}_{\{|\xi _{st}| > n^{\delta }\}} } + \sqrt{ {\mathbb {E}}|\xi _{uv}|^2 \mathbf {1}_{\{|\xi _{uv}| > n^{\delta }\}} } \\&\quad + 2 {\mathbb {E}}|\xi _{st}|^2 \mathbf {1}_{\{|\xi _{st}| > n^{\delta }\}} + 2 {\mathbb {E}}|\xi _{uv}|^2 \mathbf {1}_{\{|\xi _{uv}| > n^{\delta }\}} \\&\le 2 \frac{ \sqrt{m_{2+\eta }}}{n^{\delta \eta / 2}} + 4 \frac{m_{2+\eta }}{n^{\delta \eta }}. \end{aligned}$$

Combining the bounds above with (7.6), we obtain

$$\begin{aligned} \left| {\mathbb {E}}\left[ \hat{\xi }_{st}^{(n)} \overline{\hat{\xi }_{uv}^{(n)}} \right] \right| \le 10 \frac{\sqrt{m_{2+\eta }}}{n^{\delta \eta /2}}. \end{aligned}$$
(7.7)

Since (7.7) holds for any \((s,t) \ne (u,v)\), the proof of the lemma is complete. \(\square \)
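
The normalization in Lemma 7.2 can also be illustrated empirically. The sketch below is a sanity check of properties (i) and (ii) only; the Gaussian atom variable and the fixed truncation level (standing in for \(n^{\delta }\)) are arbitrary illustrative choices, not tied to any specific model.

```python
import numpy as np

# Empirical sketch of the truncation/recentering/rescaling in Lemma 7.2.
rng = np.random.default_rng(2)
d, n_samples, level = 2, 10**6, 5.0        # 'level' stands in for n^delta

xi = rng.standard_normal(n_samples) / np.sqrt(d)   # mean 0, variance 1/d

trunc = np.where(np.abs(xi) <= level, xi, 0.0)     # xi * 1_{|xi| <= level}
tilde = trunc - trunc.mean()                       # recentered, as for xi-tilde
hat = tilde / np.sqrt(d * tilde.var())             # rescaled, as for xi-hat

# (i): mean zero and variance 1/d (exactly, by construction).
assert abs(hat.mean()) < 1e-10
assert abs(hat.var() - 1.0 / d) < 1e-10
# (ii): |xi-hat| <= 4 * level once Var(xi-tilde) > 1/(2d), as in the proof.
assert np.max(np.abs(hat)) <= 4 * level
```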

We will continue to use the notation introduced in Definition 2.3. That is, for any \(s,t \in \{1,\ldots ,d\}\) and \(1 \le i, j \le n\), we let \(x_{st;ij}\) denote the \((i,j)\)-entry of the matrix \({\mathbf {X}}_{n,st}\). For every \(s,t \in \{1,\ldots ,d\}\), \(n \ge 1\), and \(1 \le i,j \le n\), we define

$$\begin{aligned} \tilde{x}_{st;ij}^{(n)} := x_{st;ij} \mathbf {1}_{\{|x_{st;ij}| \le n^{\delta }\}} - {\mathbb {E}}\left[ x_{st;ij} \mathbf {1}_{\{|x_{st;ij}| \le n^{\delta }\}} \right] \end{aligned}$$

and

$$\begin{aligned} \hat{x}_{st;ij}^{(n)} := \frac{\tilde{x}_{st;ij}^{(n)}}{ \sqrt{d {{\mathrm{Var}}}(\tilde{x}_{st;ij}^{(n)} )} }. \end{aligned}$$

Set \(\tilde{{\mathbf {X}}}_{n,st} := \left( \tilde{x}_{st;ij}^{(n)} \right) _{i,j=1}^n\) and \(\hat{{\mathbf {X}}}_{n,st} := \left( \hat{x}_{st;ij}^{(n)} \right) _{i,j=1}^n\) for every \(n \ge 1\) and \(s,t \in \{1,\ldots ,d\}\). We also define the \(dn \times dn\) random block matrices

$$\begin{aligned} \tilde{{\mathbf {X}}}_n := \left( \tilde{{\mathbf {X}}}_{n,st} \right) _{s,t=1}^d, \quad \hat{{\mathbf {X}}}_n := \left( \hat{{\mathbf {X}}}_{n,st} \right) _{s,t=1}^d. \end{aligned}$$

For \(z \in \mathbb {C}\), we define the \(2dn \times 2dn\) matrices

$$\begin{aligned} \tilde{{\mathbf {H}}}_n = \tilde{{\mathbf {H}}}_n(z) := \begin{bmatrix} {\mathbf 0}&\quad \frac{1}{\sqrt{n}} \tilde{{\mathbf {X}}}_n - z {\mathbf {I}}\\ \frac{1}{\sqrt{n}} \tilde{{\mathbf {X}}}_n^*- \bar{z} {\mathbf {I}}&\quad {\mathbf 0}\end{bmatrix} \end{aligned}$$

and

$$\begin{aligned} \hat{{\mathbf {H}}}_n = \hat{{\mathbf {H}}}_n(z) := \begin{bmatrix} {\mathbf 0}&\quad \frac{1}{\sqrt{n}} \hat{{\mathbf {X}}}_n - z {\mathbf {I}}\\ \frac{1}{\sqrt{n}} \hat{{\mathbf {X}}}_n^*- \bar{z} {\mathbf {I}}&\quad {\mathbf 0}\end{bmatrix}. \end{aligned}$$

We will make use of the following corollary to the law of large numbers.

Lemma 7.3

(Law of large numbers) Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume (7.3) holds for some \(\eta > 0\). Let \(\delta > 0\). Then a.s.

$$\begin{aligned} \limsup _{n \rightarrow \infty } \frac{1}{n^2} \Vert {\mathbf {X}}_n \Vert _2^2&\le d,\end{aligned}$$
(7.8)
$$\begin{aligned} \limsup _{n \rightarrow \infty } \frac{1}{n^2} \Vert \hat{{\mathbf {X}}}_n\Vert _2^2&\le 8d, \end{aligned}$$
(7.9)

and

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n^2} \sum _{s,t=1}^d \sum _{i,j=1}^n |x_{st;ij}|^{2 + \eta } \mathbf {1}_{\{|x_{st;ij}| > n^{\delta }\}} = 0. \end{aligned}$$
(7.10)

Proof

(of Lemma 7.3) We first prove (7.8). We begin by noting that

$$\begin{aligned} \frac{1}{n^2} \Vert {\mathbf {X}}_n \Vert _2^2 = \sum _{s,t=1}^d \frac{1}{n^2} \sum _{i,j=1}^n |x_{st;ij}|^2. \end{aligned}$$

For any \(s,t \in \{1,\ldots ,d\}\), we apply the law of large numbers and obtain a.s.

$$\begin{aligned} \limsup _{n \rightarrow \infty } \frac{1}{n^2} \sum _{i,j=1}^n |x_{st;ij}|^2 \le {\mathbb {E}}|\xi _{st}|^2 = \frac{1}{d}. \end{aligned}$$

Since \(d\) is fixed, independent of \(n\), we conclude that a.s.

$$\begin{aligned} \limsup _{n \rightarrow \infty } \sum _{s,t=1}^d \frac{1}{n^2} \sum _{i,j=1}^n |x_{st;ij}|^2 \le d, \end{aligned}$$

and the proof of (7.8) is complete.

For (7.9), we apply the bounds in Lemma 7.2 to obtain

$$\begin{aligned} \sum _{s,t=1}^d \sum _{i,j=1}^n \left| \hat{x}^{(n)}_{st;ij} \right| ^2 \le 2 \sum _{s,t=1}^d \sum _{i,j=1}^n \left| \tilde{x}^{(n)}_{st;ij} \right| ^2 \le 4 \sum _{s,t=1}^d \sum _{i,j=1}^n \left( |x_{st;ij}|^2 + {\mathbb {E}}|x_{st;ij}|^2 \right) \end{aligned}$$

for \(n\) sufficiently large. Hence (7.9) follows from (7.8).

We now prove (7.10); fix \(s,t \in \{1,\ldots ,d\}\). By the law of large numbers, for any \(M > 0\), we have a.s.

$$\begin{aligned} \limsup _{n \rightarrow \infty } \frac{1}{n^2} \sum _{i,j=1}^n |x_{st;ij}|^{2 + \eta } \mathbf {1}_{\{|x_{st;ij}| > n^{\delta }\}} \le {\mathbb {E}}|\xi _{st}|^{2+\eta } \mathbf {1}_{\{|\xi _{st}| > M\}}. \end{aligned}$$

By the dominated convergence theorem, it follows that

$$\begin{aligned} \lim _{M \rightarrow \infty } {\mathbb {E}}|\xi _{st}|^{2 + \eta } \mathbf {1}_{\{|\xi _{st}| > M\}} = 0. \end{aligned}$$

We conclude that a.s.

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n^2} \sum _{i,j=1}^n |x_{st;ij}|^{2 + \eta } \mathbf {1}_{\{|x_{st;ij}| > n^{\delta }\}} = 0. \end{aligned}$$

Since \(d\) is fixed, independent of \(n\), the claim follows. \(\square \)
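
The vanishing of the truncated moment in (7.10) is easy to see numerically. The following seeded Monte Carlo sketch is illustrative only: the Gaussian atom variable and the values of \(\eta \) and \(\delta \) are arbitrary stand-ins.

```python
import numpy as np

# Seeded Monte Carlo illustration of (7.10): the truncated moment
# E |xi|^{2+eta} 1_{|xi| > n^delta} vanishes as the truncation level grows.
rng = np.random.default_rng(3)
eta, delta = 1.0, 0.1
xi = np.abs(rng.standard_normal(10**6))

def tail_moment(level):
    """Empirical E |xi|^{2+eta} 1_{|xi| > level}."""
    return np.mean(np.where(xi > level, xi ** (2 + eta), 0.0))

# Truncation levels n^delta for increasing n.
moments = [tail_moment(n ** delta) for n in (10, 10**3, 10**9)]

# The truncated moments decrease to (numerically) zero.
assert moments[0] > moments[1] > moments[2]
assert moments[2] < 1e-6
```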

We let \(L(F,G)\) denote the Lévy distance between two distribution functions \(F,G\). That is,

$$\begin{aligned} L(F,G) := \inf \{ {\varepsilon }> 0 : F(x - {\varepsilon }) - {\varepsilon }\le G(x) \le F(x + {\varepsilon }) + {\varepsilon }\text { for all } x \in \mathbb {R} \}. \end{aligned}$$
(7.11)

Convergence in Lévy distance implies convergence in distribution [2, Remark A.40]. We will compare the ESD of \(\hat{{\mathbf {H}}}_n\) to the ESD of \({\mathbf {H}}_n\) using the Lévy metric.
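
A minimal grid-based numerical sketch of definition (7.11) may help fix ideas; the bisection over \({\varepsilon }\) and the finite grid are illustrative approximations, not an exact algorithm.

```python
import numpy as np

# Grid-based sketch of the Levy distance (7.11) between two empirical
# distribution functions, found by bisection over epsilon.
def ecdf(samples):
    s = np.sort(np.asarray(samples, dtype=float))
    return lambda x: np.searchsorted(s, x, side='right') / len(s)

def levy_distance(F, G, grid, tol=1e-6):
    """Smallest eps (up to tol) with F(x-eps)-eps <= G(x) <= F(x+eps)+eps on grid."""
    def ok(eps):
        Gv = G(grid)
        return (np.all(F(grid - eps) - eps <= Gv)
                and np.all(Gv <= F(grid + eps) + eps))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

F = ecdf([0.0])          # unit point mass at 0
G = ecdf([0.3])          # unit point mass at 0.3
grid = np.linspace(-2, 2, 40001)
# For unit point masses at 0 and at c in (0, 1), the Levy distance equals c.
assert abs(levy_distance(F, G, grid) - 0.3) < 1e-3
assert levy_distance(F, F, grid) < 1e-5
```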

Lemma 7.4

Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume (7.3) holds for some \(\eta > 0\). Let \(\delta > 0\). Then a.s.

$$\begin{aligned} \sup _{z \in \mathbb {C}} L\left( F^{{\mathbf {H}}_n}, F^{\hat{{\mathbf {H}}}_n}\right) = o(n^{-\delta \eta /3}). \end{aligned}$$

Proof

(of Lemma 7.4) We will apply [2, Corollary A.41] to bound \(L(F^{{\mathbf {H}}_n}, F^{\tilde{{\mathbf {H}}}_n})\) and \(L(F^{\tilde{{\mathbf {H}}}_n}, F^{\hat{{\mathbf {H}}}_n})\) separately. First, we have

$$\begin{aligned}&\sup _{z \in \mathbb {C}} n^{\delta \eta } L^3\left( F^{{\mathbf {H}}_n}, F^{\tilde{{\mathbf {H}}}_n}\right) \\&\quad \le \frac{n^{\delta \eta }}{n^2} \left\| {\mathbf {X}}_n - \tilde{{\mathbf {X}}}_n \right\| _2^2 \\&\quad \le 2 \frac{n^{\delta \eta }}{n^2} \sum _{s,t=1}^d \sum _{i,j=1}^n \left( \left| x_{st,ij}\right| ^2 \mathbf {1}_{\{|x_{st;ij}| > n^{\delta }\}} + {\mathbb {E}}|x_{st;ij}|^2\mathbf {1}_{\{|x_{st;ij}| > n^{\delta }\}} \right) \\&\quad \le \frac{2}{n^2} \sum _{s,t=1}^d \sum _{i,j=1}^n \left( \left| x_{st,ij}\right| ^{2+\eta } \mathbf {1}_{\{|x_{st;ij}| > n^{\delta }\}} + {\mathbb {E}}|x_{st;ij}|^{2+\eta } \mathbf {1}_{\{|x_{st;ij}| > n^{\delta }\}} \right) . \end{aligned}$$

We note that

$$\begin{aligned} \frac{1}{n^2} \sum _{s,t=1}^d \sum _{i,j=1}^n {\mathbb {E}}|x_{st;ij}|^{2+\eta } \mathbf {1}_{\{|x_{st;ij}| > n^{\delta }\}} = \sum _{s,t=1}^d {\mathbb {E}}|\xi _{st}|^{2+\eta } \mathbf {1}_{\{|\xi _{st}| > n^{\delta }\}}, \end{aligned}$$

and thus, by the dominated convergence theorem, we obtain

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n^2} \sum _{s,t=1}^d \sum _{i,j=1}^n {\mathbb {E}}|x_{st;ij}|^{2+\eta } \mathbf {1}_{\{|x_{st;ij}| > n^{\delta }\}} = 0. \end{aligned}$$

Therefore, by Lemma 7.3, we conclude that a.s.

$$\begin{aligned} \lim _{n \rightarrow \infty } \sup _{z \in \mathbb {C}} n^{\delta \eta } L^3\left( F^{{\mathbf {H}}_n}, F^{\tilde{{\mathbf {H}}}_n}\right) = 0, \end{aligned}$$

and hence a.s.

$$\begin{aligned} \sup _{z \in \mathbb {C}} L\left( F^{{\mathbf {H}}_n}, F^{\tilde{{\mathbf {H}}}_n}\right) = o(n^{-\delta \eta / 3}). \end{aligned}$$
(7.12)

Applying [2, Corollary A.41] again, we obtain

$$\begin{aligned} \sup _{z \in \mathbb {C}} n^{\delta \eta } L^3\left( F^{\tilde{{\mathbf {H}}}_n}, F^{\hat{{\mathbf {H}}}_n}\right)&\le \frac{n^{\delta \eta }}{n^2} \left\| \tilde{{\mathbf {X}}}_n - \hat{{\mathbf {X}}}_n \right\| _2^2 \\&\le \frac{n^{\delta \eta }}{n^2} \sum _{s,t=1}^d \sum _{i,j=1}^n \left| \hat{x}_{st;ij}^{(n)} \right| ^2 \left| 1 - \sqrt{d {{\mathrm{Var}}}( \tilde{x}_{st;ij}^{(n)}) } \right| ^2 \\&\le \frac{n^{\delta \eta }}{n^2} \sum _{s,t=1}^d \sum _{i,j=1}^n \left| \hat{x}_{st;ij}^{(n)} \right| ^2 \left| 1 - d {{\mathrm{Var}}}(\tilde{x}_{st;ij}^{(n)}) \right| ^2 \\&\le d^2 \frac{n^{\delta \eta }}{n^2} \sum _{s,t=1}^d \sum _{i,j=1}^n \left| \hat{x}_{st;ij}^{(n)} \right| ^2 \left| \frac{1}{d} - {{\mathrm{Var}}}(\tilde{\xi }_{st}^{(n)}) \right| ^2 \\&\le 4d^2 \frac{m_{2+\eta }^2}{n^{\delta \eta }} \frac{1}{n^2} \left\| \hat{{\mathbf {X}}}_n \right\| _2^2. \end{aligned}$$

Here the last inequality follows from Lemma 7.2. By Lemma 7.3, we have a.s.

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n^{2+\delta \eta }} \left\| \hat{{\mathbf {X}}}_n \right\| _2^2 = 0, \end{aligned}$$

and we conclude that a.s.

$$\begin{aligned} \sup _{z \in \mathbb {C}} L\left( F^{\tilde{{\mathbf {H}}}_n}, F^{\hat{{\mathbf {H}}}_n}\right) = o(n^{-\delta \eta / 3}). \end{aligned}$$
(7.13)

The claim now follows from (7.12), (7.13), and the triangle inequality for the Lévy distance. \(\square \)

7.2 Cubic relation

We now consider the distribution of eigenvalues of \({\mathbf {H}}_n\). In fact, by Lemma 7.4, it will suffice to consider the eigenvalues of \(\hat{{\mathbf {H}}}_n\). To this end, we will study the resolvent of \(\hat{{\mathbf {H}}}_n\) in Theorem 7.6 below. Indeed, for \(w \in \mathbb {C}\) with \(\mathrm{Im}(w) > 0\),

$$\begin{aligned} \hat{m}_n(z,w) := \frac{1}{2dn} {{\mathrm{tr}}}\left( \hat{{\mathbf {H}}}_n(z) - w {\mathbf {I}}\right) ^{-1} = \int \limits _{\mathbb {R}} \frac{1}{x - w} \nu _{\frac{1}{\sqrt{n}}\hat{{\mathbf {X}}}_n - z {\mathbf {I}}}(dx) \end{aligned}$$

is the Stieltjes transform of the measure \(\nu _{\frac{1}{\sqrt{n}}\hat{{\mathbf {X}}}_n - z{\mathbf {I}}}\). It follows from standard Stieltjes transform techniques (e.g. [2, Theorem B.9]) that computing the limiting ESD of \(\hat{{\mathbf {H}}}_n(z)\) is equivalent to computing the limit of \(\hat{m}_n(z,w)\) for all \(w \in {\mathbb {C}}\) with \(\mathrm{Im}(w) > 0\).

As is standard in random matrix theory, we will not compute \(\hat{m}_n\) explicitly. Instead we will derive a fixed point equation. Indeed, we will show

$$\begin{aligned} \hat{m}_n(z,w) + \frac{ \hat{m}_n(z,w) + w}{ (\hat{m}_n(z,w) + w)^2 - |z|^2} = o(1) \end{aligned}$$
(7.14)

for \(z,w\in {\mathbb {C}}\) with \(\mathrm{Im}(w) > 0\). We will then conclude that \(\hat{m}_n(z,w)\) converges to a limit, which we denote by \(m(z,w)\). It follows that \(m(z,w)\) satisfies the equation

$$\begin{aligned} m(z,w) + \frac{ m(z,w) + w}{ (m(z,w) + w)^2 - |z|^2} = 0. \end{aligned}$$
(7.15)

From (7.15) we will deduce the limiting ESD of \(\hat{{\mathbf {H}}}_n\). Equation (7.15) has appeared previously in [5, 13] and in a slightly different form in [2, Chapter 11]. We refer to Eq. (7.15) as a cubic relation since it can be rewritten as the cubic polynomial equation

$$\begin{aligned} m(z,w)^3 + 2w m(z,w)^2 + (w^2 - |z|^2 + 1) m(z,w) + w = 0. \end{aligned}$$
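
The equivalence of (7.15) and this cubic is a mechanical computation: multiplying (7.15) through by \((m+w)^2 - |z|^2\) yields the displayed polynomial. A short identity check at seeded random complex points (purely illustrative):

```python
import random

# Verify numerically that (7.15) multiplied by (m+w)^2 - |z|^2 agrees with
# the cubic m^3 + 2w m^2 + (w^2 - |z|^2 + 1) m + w at random complex points.
random.seed(4)
for _ in range(100):
    m = complex(random.uniform(-2, 2), random.uniform(-2, 2))
    w = complex(random.uniform(-2, 2), random.uniform(0.1, 2))  # Im(w) > 0
    z = complex(random.uniform(-2, 2), random.uniform(-2, 2))
    denom = (m + w) ** 2 - abs(z) ** 2
    lhs = (m + (m + w) / denom) * denom
    cubic = m ** 3 + 2 * w * m ** 2 + (w ** 2 - abs(z) ** 2 + 1) * m + w
    assert abs(lhs - cubic) < 1e-9
```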

In this subsection we will show \(\hat{m}_n\) satisfies (7.14). We begin with the following concentration result for bilinear forms from [24].

Lemma 7.5

(Lemma 3.10 of [24]) Let \((x,y)\) be a random vector in \(\mathbb {C}^2\) where \(x,y\) both have mean zero, unit variance, and satisfy

  • \(\max \{|x|,|y|\} \le L\) a.s.,

  • \({\mathbb {E}}[\bar{x} y] = \rho \).

Let \((x_1, y_1), (x_2, y_2), \ldots , (x_n, y_n)\) be iid copies of \((x,y)\), and set \(X \!=\! (x_1, x_2, \ldots , x_n)^\mathrm {T}\) and \(Y=(y_1, y_2, \ldots , y_n)^\mathrm {T}\). Let \({\mathbf {B}}\) be a \(n \times n\) random matrix, independent of \(X\) and \(Y\), which satisfies \(\Vert {\mathbf {B}}\Vert \le n^{1/4}\) a.s. Then, for any \(p \ge 2\),

$$\begin{aligned} {\mathbb {P}}\left( \left| \frac{1}{n} X^*{\mathbf {B}}Y - \frac{\rho }{n} {{\mathrm{tr}}}{\mathbf {B}}\right| > n^{-1/8} \right) = O_p \left( \frac{ L^{2p}}{n^{p/8}} \right) . \end{aligned}$$

We formally establish (7.14) in the following theorem.

Theorem 7.6

Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume (7.3) holds for some \(\eta > 0\). Let \(0 < \delta < 1/100\). Consider the truncated random matrices \(\{\hat{{\mathbf {X}}}_n\}_{n \ge 1}\) and \(\{\hat{{\mathbf {H}}}_n(z)\}_{n \ge 1}\). For \(z,w \in \mathbb {C}\) with \(\mathrm{Im}(w) > 0\), define

$$\begin{aligned} \hat{{\mathbf {G}}}_n(z,w) := \left( \hat{{\mathbf {H}}}_n(z) - w{\mathbf {I}}\right) ^{-1} \quad \text {and}\quad \hat{m}_n(z,w) := \frac{1}{2dn} {{\mathrm{tr}}}\hat{{\mathbf {G}}}_n(z,w). \end{aligned}$$

Let \(M, \beta > 0\). Then, for \(v_n := \max \left\{ n^{-\eta \delta /100}, n^{-1/100} \right\} \), a.s.

$$\begin{aligned} \sup _{|z| \le M} \sup _{|w| \le \beta , \mathrm{Im}(w) \ge v_n} \left| \hat{m}_n(z,w) + \frac{ \hat{m}_n(z,w) + w}{ (\hat{m}_n(z,w) + w)^2 - |z|^2} \right| = O_{M,\beta ,d,m_{2+\eta }}(v_n^5). \end{aligned}$$

In order to prove Theorem 7.6, we will need the following deterministic lemmas.

Lemma 7.7

Let \({\mathbf {R}}\) be the \(2n \times 2n\) block matrix given by

$$\begin{aligned} {\mathbf {R}}= \begin{bmatrix} -w {\mathbf {I}}&\quad {\mathbf {B}}\\ {\mathbf {B}}^*&\quad - w {\mathbf {I}}\end{bmatrix}^{-1} = \begin{bmatrix} {\mathbf {R}}_1&\quad {\mathbf {R}}_2 \\ {\mathbf {R}}_3&\quad {\mathbf {R}}_4 \end{bmatrix}, \end{aligned}$$

where \({\mathbf {B}}, {\mathbf {R}}_1,{\mathbf {R}}_2,{\mathbf {R}}_3,{\mathbf {R}}_4\) are \(n \times n\) matrices. Then \({{\mathrm{tr}}}{\mathbf {R}}_1 = {{\mathrm{tr}}}{\mathbf {R}}_4\) for any \(w \in \mathbb {C}\) with \(\mathrm{Im}(w) > 0\).

Proof

(of Lemma 7.7) We first note that

$$\begin{aligned} \begin{bmatrix} -w {\mathbf {I}}&\quad {\mathbf {B}}\\ {\mathbf {B}}^*&\quad - w {\mathbf {I}}\end{bmatrix} = \begin{bmatrix} {\mathbf 0}&\quad {\mathbf {B}}\\ {\mathbf {B}}^*&\quad {\mathbf 0}\end{bmatrix} - w {\mathbf {I}}\end{aligned}$$

is invertible for any \(w \in \mathbb {C}\) with \(\mathrm{Im}(w) > 0\). Let \(\sigma _1, \sigma _2, \ldots , \sigma _n \ge 0\) denote the singular values of \({\mathbf {B}}\). Then \(-w{\mathbf {I}}+ w^{-1} {\mathbf {B}}{\mathbf {B}}^*\) has eigenvalues \(-w + w^{-1}\sigma _i^2\) for \(i=1,2,\ldots ,n\). In particular

$$\begin{aligned} \mathrm{Im}\left( -w + \frac{\sigma _i^2}{w} \right) = - \mathrm{Im}(w) - \mathrm{Im}(w) \frac{\sigma _i^2}{|w|^2} < 0 \end{aligned}$$

for \(\mathrm{Im}(w) > 0\). Thus \(-w{\mathbf {I}}+ w^{-1}{\mathbf {B}}{\mathbf {B}}^*\) is invertible. Similarly, \(-w{\mathbf {I}}+ w^{-1} {\mathbf {B}}^*{\mathbf {B}}\) has the same eigenvalues and is also invertible. By the Schur complement [15, Section 0.7.3],

$$\begin{aligned} {\mathbf {R}}_1&= \left( -w{\mathbf {I}}+ w^{-1} {\mathbf {B}}{\mathbf {B}}^*\right) ^{-1}, \\ {\mathbf {R}}_4&= \left( -w{\mathbf {I}}+ w^{-1} {\mathbf {B}}^*{\mathbf {B}}\right) ^{-1}. \end{aligned}$$

Since \({\mathbf {R}}_1\) and \({\mathbf {R}}_4\) have the same eigenvalues, \({{\mathrm{tr}}}{\mathbf {R}}_1 = {{\mathrm{tr}}}{\mathbf {R}}_4\). \(\square \)
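
Lemma 7.7 is easy to confirm numerically at a small, purely illustrative size: inverting the block matrix directly and comparing the diagonal-block traces with the Schur-complement formulas from the proof.

```python
import numpy as np

# Numerical check of Lemma 7.7: for the inverse of [[-wI, B], [B*, -wI]],
# the two diagonal n x n blocks R1 and R4 have equal trace.
rng = np.random.default_rng(5)
n = 5
w = 0.4 + 0.9j                      # Im(w) > 0
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

R = np.linalg.inv(np.block([[-w * np.eye(n), B],
                            [B.conj().T, -w * np.eye(n)]]))
R1, R4 = R[:n, :n], R[n:, n:]

# Via Schur complements, R1 = (-wI + w^{-1} B B*)^{-1} as in the proof.
schur = np.linalg.inv(-w * np.eye(n) + (B @ B.conj().T) / w)

assert np.isclose(np.trace(R1), np.trace(R4))
assert np.isclose(np.trace(R1), np.trace(schur))
```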

We introduce \({\varepsilon }\)-nets as a convenient way to discretize a compact set. Let \({\varepsilon }> 0\). A set \(X\) is an \({\varepsilon }\)-net of a set \(Y\) if for any \(y \in Y\), there exists \(x \in X\) such that \(\Vert x-y\Vert \le {\varepsilon }\). In order to prove Theorem 7.6, we will need the following well-known bound on the size of an \({\varepsilon }\)-net.

Lemma 7.8

(Lemma 3.11 of [24]) The set \(\{w \in \mathbb {C} : |w| \le \beta , \mathrm{Im}(w) \ge \alpha \}\) admits an \({\varepsilon }\)-net of size at most

$$\begin{aligned} \left( 1 + \frac{2\beta }{{\varepsilon }} \right) ^2. \end{aligned}$$

We will also take advantage of the following facts, which can be found in [14, 15]. Let \({\mathbf {B}}\) be a \(n \times n\) matrix with singular values \(\sigma _1({\mathbf {B}}) \ge \cdots \ge \sigma _n({\mathbf {B}}) \ge 0\). Then the \(2n \times 2n\) matrix

$$\begin{aligned} \begin{bmatrix} -w {\mathbf {I}}&\quad {\mathbf {B}}\\ {\mathbf {B}}^*&\quad -w {\mathbf {I}}\end{bmatrix} = \begin{bmatrix} 0&\quad {\mathbf {B}}\\ {\mathbf {B}}^*&\quad 0 \end{bmatrix} - w {\mathbf {I}}\end{aligned}$$
(7.16)

has an orthonormal basis of eigenvectors with eigenvalues \(\pm \sigma _1({\mathbf {B}}) - w, \ldots , \pm \sigma _n({\mathbf {B}}) - w\). Thus, if \(\mathrm{Im}(w) > 0\), the matrix (7.16) is invertible. Let

$$\begin{aligned} {\mathbf {G}}:= \begin{bmatrix} -w {\mathbf {I}}&{\mathbf {B}}\\ {\mathbf {B}}^*&-w {\mathbf {I}}\end{bmatrix}^{-1}. \end{aligned}$$

Then, for \(\mathrm{Im}(w) > 0\), we have

$$\begin{aligned} \Vert {\mathbf {G}}\Vert = \max _{1 \le i \le n} \frac{1}{ |\pm \sigma _i({\mathbf {B}}) - w|} \le \frac{1}{\mathrm{Im}(w)}. \end{aligned}$$
(7.17)

We will make use of the following identity: for any invertible \(n \times n\) matrices \({\mathbf {A}}\) and \({\mathbf {B}}\),

$$\begin{aligned} {\mathbf {A}}^{-1} - {\mathbf {B}}^{-1} = {\mathbf {A}}^{-1} ({\mathbf {B}}- {\mathbf {A}}) {\mathbf {B}}^{-1}. \end{aligned}$$
(7.18)

A special case of (7.18) is the resolvent identity (also known as Hilbert’s identity): for any Hermitian \(n \times n\) matrix \({\mathbf {A}}\),

$$\begin{aligned} ({\mathbf {A}}- w {\mathbf {I}})^{-1} - ({\mathbf {A}}- w' {\mathbf {I}})^{-1} = (w-w') ({\mathbf {A}}- w {\mathbf {I}})^{-1} ({\mathbf {A}}- w' {\mathbf {I}})^{-1} \end{aligned}$$
(7.19)

for all \(w,w' \in {\mathbb {C}}\) with \(\mathrm{Im}(w), \mathrm{Im}(w') > 0\).
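
Identities (7.18) and (7.19), together with a bound of the type (7.17) for a Hermitian resolvent, can be verified numerically at a small, purely illustrative size:

```python
import numpy as np

# Numerical check of (7.18), (7.19), and a (7.17)-type resolvent norm bound.
rng = np.random.default_rng(6)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
I = np.eye(n)
inv = np.linalg.inv

# (7.18): A^{-1} - B^{-1} = A^{-1} (B - A) B^{-1}.
assert np.allclose(inv(A) - inv(B), inv(A) @ (B - A) @ inv(B))

# (7.19): the resolvent identity for a Hermitian matrix H.
H = A + A.conj().T                   # Hermitian by construction
w1, w2 = 0.2 + 0.5j, -0.3 + 1.1j     # Im(w1), Im(w2) > 0
lhs = inv(H - w1 * I) - inv(H - w2 * I)
rhs = (w1 - w2) * inv(H - w1 * I) @ inv(H - w2 * I)
assert np.allclose(lhs, rhs)

# For Hermitian H, ||(H - wI)^{-1}|| <= 1 / Im(w), as in (7.17).
assert np.linalg.norm(inv(H - w1 * I), 2) <= 1 / w1.imag + 1e-9
```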

We are now ready to prove Theorem 7.6.

Proof

(of Theorem 7.6) Fix \(M, \beta > 0\). For the remainder of the proof, the implicit constants in our asymptotic notation (such as \(O,o,\Omega , \ll \)) depend only on the constants \(M, \beta \), \(d\), and \(m_{2+\eta }\); for simplicity, we no longer include these subscripts in our notation.

For notational convenience, we will write \({\mathbf {X}}_n\) instead of \(\hat{{\mathbf {X}}}_n\). That is, we let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) denote the sequence of truncated matrices. Similarly, we write \({\mathbf {X}}_{n,st}\) instead of \(\hat{{\mathbf {X}}}_{n,st}\) for \(s,t \in \{1,\ldots ,d\}\). We define the \(2dn \times 2dn\) matrix

$$\begin{aligned} {\mathbf {G}}_n(z,w) := \begin{bmatrix} -w {\mathbf {I}}&\quad \frac{1}{\sqrt{n}} {\mathbf {X}}_n - z {\mathbf {I}}\\ \frac{1}{\sqrt{n}} {\mathbf {X}}_n^*- \bar{z} {\mathbf {I}}&\quad - w {\mathbf {I}}\end{bmatrix}^{-1}. \end{aligned}$$

We will often drop the dependence on \(z,w\) and simply write \({\mathbf {G}}_n\) instead of \({\mathbf {G}}_n(z,w)\). We write \({\mathbf {G}}_n = ({\mathbf {G}}_{n,st})_{s,t=1}^{2d}\) where each \({\mathbf {G}}_{n,st}\) is a \(n \times n\) matrix. Then \({\mathbf {G}}_{n,st}(i,j)\) denotes the \((i,j)\)-entry of \({\mathbf {G}}_{n,st}\). We define \(m_{n,st}(z,w) := \frac{1}{n} {{\mathrm{tr}}}{\mathbf {G}}_{n,st}\) for \(s,t \in \{1,\ldots ,2d\}\) and \(m_n(z,w) := \frac{1}{2dn} {{\mathrm{tr}}}{\mathbf {G}}_n\). We will often drop the dependence on \(z,w\) and simply write \(m_n\) and \(m_{n,st}\) instead of \(m_n(z,w)\) and \(m_{n,st}(z,w)\).

Let \(1 \le k \le n\). We let \({\mathbf {r}}_k({\mathbf {X}}_{n,st})\) denote the \(k\)th row of \({\mathbf {X}}_{n,st}\) with the \(k\)th entry removed. Similarly, we let \({\mathbf {c}}_k({\mathbf {X}}_{n,st})\) denote the \(k\)th column of \({\mathbf {X}}_{n,st}\) with the \(k\)th entry removed. We let \({\mathbf {X}}_{n,st}^{(k)}\) be the \((n-1) \times (n-1)\) matrix constructed from \({\mathbf {X}}_{n,st}\) by removing the \(k\)th column and \(k\)th row. We let \({\mathbf {X}}_{n}^{(k)}\) be the \(d(n-1) \times d(n-1)\) block matrix given by \({\mathbf {X}}_{n}^{(k)} := \left( {\mathbf {X}}_{n,st}^{(k)}\right) _{s,t=1}^d\). Define the \(2d(n-1) \times 2d(n-1)\) matrix

$$\begin{aligned} {\mathbf {G}}^{(k)}_n(z,w) := \begin{bmatrix} -w {\mathbf {I}}&\quad \frac{1}{\sqrt{n}} {\mathbf {X}}_{n}^{(k)} - z {\mathbf {I}}\\ \frac{1}{\sqrt{n}} {{\mathbf {X}}_{n}^{(k)}}^*- \bar{z} {\mathbf {I}}&\quad -w {\mathbf {I}}\end{bmatrix}^{-1}. \end{aligned}$$

We will often drop the dependence on \(z,w\) and simply write \({\mathbf {G}}^{(k)}_n\). We again write \({\mathbf {G}}_n^{(k)} = \left( {\mathbf {G}}_{n,st}^{(k)}\right) _{s,t=1}^{2d}\) where each \({\mathbf {G}}_{n,st}^{(k)}\) is a \((n-1) \times (n-1)\) matrix. We let \(m_{n,st}^{(k)}(z,w) := \frac{1}{n} {{\mathrm{tr}}}{\mathbf {G}}_{n,st}^{(k)}\) for \(s,t \in \{1,\ldots ,2d\}\) and \(m_n^{(k)}(z,w) := \frac{1}{2dn} {{\mathrm{tr}}}{\mathbf {G}}_n^{(k)}\). We will often drop the dependence on \(z,w\) and write \(m_n^{(k)}\) and \(m_{n,st}^{(k)}\) instead of \(m_n^{(k)}(z,w)\) and \(m_{n,st}^{(k)}(z,w)\).

From (7.17), for \(w \in \mathbb {C}\) with \(\mathrm{Im}(w) \ge v_n\), we have the deterministic bounds \(\Vert {\mathbf {G}}_n(z,w)\Vert \le v_n^{-1}\), \(| m_n(z,w) | \le v_n^{-1}\), and \(| m_{n,st}(z,w) | \le v_n^{-1}\). By Cauchy’s interlacing theorem [15, Theorem 4.3.8] (or alternatively [2, (A.1.12)]), we have the deterministic bound

$$\begin{aligned} \sup _{1 \le k \le n} \sup _{|z| \le M} \sup _{|w| \le \beta , \mathrm{Im}(w) \ge v_n} \left| m_n^{(k)}(z,w) - m_n(z,w) \right| = O \left( \frac{1}{v_n n} \right) . \end{aligned}$$
(7.20)
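The deterministic bound \(\Vert {\mathbf {G}}_n(z,w)\Vert \le \mathrm{Im}(w)^{-1}\) reflects the fact that the matrix being inverted is \({\mathbf {H}}- w{\mathbf {I}}\) with \({\mathbf {H}}\) Hermitian. A minimal numerical sketch (dimensions are illustrative, and the matrix \(X\) below is a stand-in for the data matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8  # illustrative
X = rng.standard_normal((n, n))  # stand-in for a real data matrix
z, w = 0.4 - 0.2j, 0.1 + 0.05j   # Im(w) > 0
Y = X / np.sqrt(n) - z * np.eye(n)

# The block matrix being inverted is H - wI with H Hermitian.
H = np.block([[np.zeros((n, n)), Y],
              [Y.conj().T, np.zeros((n, n))]])
G = np.linalg.inv(H - w * np.eye(2 * n))
m = np.trace(G) / (2 * n)      # normalized trace, the analogue of m_n(z, w)

norm_G = np.linalg.norm(G, 2)  # spectral norm
```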

By Lemma 7.7, we have

$$\begin{aligned} \sum _{s=1}^d m_{n,ss}(z,w) = \sum _{s=d+1}^{2d} m_{n,ss}(z,w) \end{aligned}$$

and

$$\begin{aligned} \sum _{s=1}^d m_{n,ss}^{(k)}(z,w) = \sum _{s=d+1}^{2d} m_{n,ss}^{(k)}(z,w) \end{aligned}$$

for any \(1 \le k \le n\). Thus, from (7.20), we find

$$\begin{aligned} \sup _{1 \le k \le n} \sup _{|z| \le M} \sup _{|w| \le \beta , \mathrm{Im}(w) \ge v_n} \left| \frac{1}{d} \sum _{s=1}^d m_{n,ss}^{(k)}(z,w) - m_n(z,w) \right| = O\left( \frac{1}{n v_n} \right) \end{aligned}$$
(7.21)

and

$$\begin{aligned} \sup _{1 \le k \le n} \sup _{|z| \le M} \sup _{|w| \le \beta , \mathrm{Im}(w) \ge v_n} \left| \frac{1}{d} \sum _{s=d+1}^{2d} m_{n,ss}^{(k)}(z,w) - m_n(z,w) \right| = O \left( \frac{1}{n v_n} \right) .\quad \end{aligned}$$
(7.22)
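The trace identity from Lemma 7.7 invoked above rests on the fact that \(YY^*\) and \(Y^*Y\) share the same spectrum for square \(Y\). A small numerical check, with \(Y\) standing in for \(\frac{1}{\sqrt{n}}{\mathbf {X}}_n - z{\mathbf {I}}\):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7  # illustrative
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
w = 0.2 + 0.6j

G = np.linalg.inv(np.block([
    [-w * np.eye(n), Y],
    [Y.conj().T, -w * np.eye(n)],
]))

# The top-left and bottom-right n x n corners have equal traces,
# since Y Y* and Y* Y share the same spectrum.
t1 = np.trace(G[:n, :n])
t2 = np.trace(G[n:, n:])
```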

Fix \(1 \le k \le n\) and \(z \in \mathbb {C}\) with \(|z| \le M\). Fix \(w \in \mathbb {C}\) with \(|w| \le \beta \) and \(\mathrm{Im}(w) \ge v_n\). Let \({\mathbf {Q}}_k\) be the \(2d \times 2d\) matrix given by \({\mathbf {Q}}_k := ( {\mathbf {G}}_{n,st}(k,k) )_{s,t=1}^{2d}\). By the Schur complement [15, Section 0.7.3],

$$\begin{aligned} {\mathbf {Q}}_k^{-1} = \begin{bmatrix} -w {\mathbf {I}}&\quad \left( \frac{1}{\sqrt{n}} x_{st;kk} - z \delta _{s,t} \right) _{s,t=1}^d \\ \left( \frac{1}{\sqrt{n}} \bar{x}_{st;kk} - \bar{z} \delta _{s,t} \right) _{s,t=1}^d&\quad - w {\mathbf {I}}\end{bmatrix} - \frac{1}{n} {\mathbf {R}}_k {\mathbf {G}}^{(k)}_n {\mathbf {R}}_k^*, \end{aligned}$$

where \(\delta _{s,t}\) is the Kronecker delta and

$$\begin{aligned} {\mathbf {R}}_k := \begin{bmatrix} {\mathbf 0}&\quad \left( {\mathbf {r}}_k( {\mathbf {X}}_{n,st}) \right) _{s,t=1}^d \\ \left( {\mathbf {c}}_k( {\mathbf {X}}_{n,ts})^*\right) _{s,t=1}^d&\quad {\mathbf 0}\end{bmatrix}. \end{aligned}$$
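The Schur complement identity behind the formula for \({\mathbf {Q}}_k^{-1}\) states that a principal submatrix of an inverse is itself the inverse of a Schur complement. A generic numerical check (sizes are illustrative; the shift by \(5{\mathbf {I}}\) merely keeps the matrix well-conditioned):

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 10, 3  # illustrative: keep the first k indices
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)) + 5 * np.eye(N)

# Q is the k x k corner of the full inverse (the analogue of Q_k).
Q = np.linalg.inv(H)[:k, :k]

# Schur complement: Q^{-1} = H_SS - H_{S,S^c} H_{S^c,S^c}^{-1} H_{S^c,S}.
schur = H[:k, :k] - H[:k, k:] @ np.linalg.inv(H[k:, k:]) @ H[k:, :k]
err = np.max(np.abs(np.linalg.inv(Q) - schur))
```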

By the truncation assumption and Lemma 7.2, we have a.s.

$$\begin{aligned} \max _{1 \le k \le n} \max _{1 \le s,t \le d} \frac{|x_{st;kk}|}{\sqrt{n}} \ll \frac{n^{\delta }}{\sqrt{n}} \le \frac{1}{n^{2/5}} \end{aligned}$$

for \(n\) sufficiently large.

We observe that \({\mathbf {R}}_k\) and \({\mathbf {G}}^{(k)}_n\) are independent random matrices. By expanding the product, we note that the entries of \({\mathbf {R}}_k {\mathbf {G}}^{(k)}_n {\mathbf {R}}_k^*\) are linear combinations of bilinear forms. We will apply Lemma 7.5 to control each bilinear form. Applying the bound \(\Vert {\mathbf {G}}_n(z,w) \Vert \le v_n^{-1}\) and Lemma 7.5, we obtain

$$\begin{aligned} \left\| \frac{1}{n} {\mathbf {R}}_k {\mathbf {G}}^{(k)}_n {\mathbf {R}}_k^*\!-\! \begin{bmatrix} \left( \frac{1}{d} \sum _{s=d+1}^{2d} m_{n,ss}^{(k)} \right) {\mathbf {I}}_d&\quad {\mathbf 0}\\ {\mathbf 0}&\quad \left( \frac{1}{d} \sum _{s=1}^d m_{n,ss}^{(k)} \right) {\mathbf {I}}_d \end{bmatrix} \right\| \!\ll \! n^{-1/8} + \frac{1}{ n^{\delta \eta /2} v_n} \end{aligned}$$
(7.23)

with probability \(1 - O(n^{-100})\). Here we obtain the bound on the spectral norm by bounding each entry individually and noting that

$$\begin{aligned} \Vert {\mathbf {B}}\Vert \le \Vert {\mathbf {B}}\Vert _2 \le 2d \max _{i,j} |{\mathbf {B}}_{ij}| \end{aligned}$$

for any \(2d \times 2d\) matrix \({\mathbf {B}}\) (recall that the matrices above are \(2d \times 2d\)). The bound (7.23) holds with probability \(1 - O(n^{-100})\) by taking \(p\) sufficiently large in Lemma 7.5. The factor \(1/d\) appears because the entries of \({\mathbf {R}}_k\) have variance \(1/d\). We also used (iv) from Lemma 7.2 and the deterministic bound \(\sup _{s,t \in \{1,\ldots ,2d\}} | m_{n,st}^{(k)} | \le \Vert {\mathbf {G}}^{(k)}_n \Vert \le v_n^{-1}\).
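The norm comparison above is elementary to confirm numerically (the size \(2d\) below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 3  # illustrative
B = rng.standard_normal((2 * d, 2 * d)) + 1j * rng.standard_normal((2 * d, 2 * d))

spec = np.linalg.norm(B, 2)        # operator (spectral) norm
hs = np.linalg.norm(B, 'fro')      # Hilbert-Schmidt norm
entry = 2 * d * np.max(np.abs(B))  # 2d times the largest entry
```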

By (7.21), (7.22), and the union bound over \(1 \le k \le n\), we obtain

$$\begin{aligned} \sup _{1 \le k \le n} \Vert {\mathbf {Q}}_k^{-1} - {\mathbf {M}}_n \Vert \ll n^{-1/8} + \frac{1}{ n^{\delta \eta /2} v_n} + \frac{1}{n v_n} \end{aligned}$$
(7.24)

with probability \(1 - O(n^{-99})\), where

$$\begin{aligned} {\mathbf {M}}_n := \begin{bmatrix} -(m_n(z,w) + w) {\mathbf {I}}_d&\quad -z {\mathbf {I}}_d \\ -\bar{z} {\mathbf {I}}_d&\quad -(m_n(z,w) + w) {\mathbf {I}}_d \end{bmatrix}. \end{aligned}$$

We note that

$$\begin{aligned} |m_n(z,w) + w| \ge \mathrm{Im}(m_n(z,w)+w) \ge \mathrm{Im}(w) \ge v_n > 0. \end{aligned}$$
(7.25)

It follows that \({\mathbf {M}}_n\) is invertible and the inverse is given (in block form) by

$$\begin{aligned} {\mathbf {M}}_n^{-1} = \begin{bmatrix} {\mathbf {M}}_{n,1}&\quad {\mathbf {M}}_{n,2} \\ {\mathbf {M}}_{n,3}&\quad {\mathbf {M}}_{n,4} \end{bmatrix}, \end{aligned}$$

where \({\mathbf {M}}_{n,1}, {\mathbf {M}}_{n,2}, {\mathbf {M}}_{n,3}, {\mathbf {M}}_{n,4}\) are \(d \times d\) matrices with

$$\begin{aligned} {\mathbf {M}}_{n,1} = {\mathbf {M}}_{n,4}&:= - \frac{ m_n(z,w)+w }{ (m_n(z,w)+w)^2 - |z|^2 } {\mathbf {I}}, \\ {\mathbf {M}}_{n,2}&:= -\frac{z}{m_n(z,w)+w} {\mathbf {M}}_{n,1}, \\ {\mathbf {M}}_{n,3}&:= -\frac{\bar{z}}{m_n(z,w)+w} {\mathbf {M}}_{n,1}. \end{aligned}$$
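The block formula for \({\mathbf {M}}_n^{-1}\) can be verified numerically for arbitrary parameter values (those below are illustrative stand-ins for \(m_n(z,w)\), \(w\), and \(z\)):

```python
import numpy as np

# Illustrative stand-ins for m_n(z,w), w, and z.
m, w, z = 0.3 + 0.8j, 0.1 + 0.5j, 0.7 - 0.4j
d = 2
I = np.eye(d)

Mn = np.block([[-(m + w) * I, -z * I],
               [-np.conj(z) * I, -(m + w) * I]])

den = (m + w) ** 2 - abs(z) ** 2
M1 = -(m + w) / den * I
M2 = -z / (m + w) * M1
M3 = -np.conj(z) / (m + w) * M1
Minv = np.block([[M1, M2], [M3, M1]])

err = np.max(np.abs(np.linalg.inv(Mn) - Minv))
```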

Using (7.25), we obtain

$$\begin{aligned}&\inf _{\mathrm{Im}(w) \ge v_n} | (m_n(z,w) + w)^2 - |z|^2 | \nonumber \\&\quad \ge \inf _{\mathrm{Im}(w) \ge v_n} \left| m_n(z,w) + w - |z| \right| |m_n(z,w) + w + |z|| \ge v_n^2 \end{aligned}$$
(7.26)

and hence

$$\begin{aligned} \sup _{|z| \le M} \sup _{|w| \le \beta , \mathrm{Im}(w) \ge v_n} \left\| {\mathbf {M}}_n^{-1} \right\| = O\left( v_n^{-4}\right) . \end{aligned}$$

Since \(\sup _{\mathrm{Im}(w) \ge v_n} \Vert {\mathbf {G}}_n(z,w)\Vert \le v_n^{-1}\), we obtain

$$\begin{aligned} \sup _{1 \le k \le n} \sup _{|w| \le \beta , \mathrm{Im}(w) \ge v_n} \Vert {\mathbf {Q}}_k\Vert = O\left( v_n^{-1}\right) . \end{aligned}$$

Therefore, by (7.24), we have

$$\begin{aligned} \sup _{1 \le k \le n} \left\| {\mathbf {Q}}_k - {\mathbf {M}}_n^{-1} \right\|&\le \sup _{1 \le k \le n} \left\| {\mathbf {Q}}_k ({\mathbf {Q}}_k^{-1} - {\mathbf {M}}_n ) {\mathbf {M}}_n^{-1}\right\| \\&\le \sup _{1 \le k \le n} \left\| {\mathbf {Q}}_k \right\| \left\| {\mathbf {Q}}_k^{-1} - {\mathbf {M}}_n \right\| \left\| {\mathbf {M}}_n^{-1} \right\| \\&\ll \frac{1}{n^{1/8} v_n^6} + \frac{1}{n^{\delta \eta /2} v_n^6} \end{aligned}$$

with probability at least \(1 - O(n^{-99})\). Since \(m_n(z,w)\) is the normalized sum of the diagonal elements of \({\mathbf {G}}_n\), we now consider the diagonal elements of \({\mathbf {Q}}_k\) and \({\mathbf {M}}_n\); from the above estimate, we conclude that

$$\begin{aligned} \left| m_n(z,w) + \frac{ m_n(z,w)+w }{ (m_n(z,w)+w)^2 - |z|^2 } \right| \ll v_n^5 \end{aligned}$$
(7.27)

with probability \(1 - O(n^{-99})\). Here we used the fact that

$$\begin{aligned} \frac{1}{n^{1/8} v_n^6} + \frac{1}{n^{\delta \eta /2} v_n^6} \le 2 v_n^5 \end{aligned}$$

by definition of \(v_n\).

We now use an \({\varepsilon }\)-net argument to extend (7.27) to all \(|z| \le M\) and \(|w| \le \beta \) with \(\mathrm{Im}(w) \ge v_n\). Since \( \sup _{\mathrm{Im}(w) \ge v_n } \Vert {\mathbf {G}}_n(z,w) \Vert \le v_n^{-1}\), we apply (7.18) and the resolvent identity (7.19) to obtain the deterministic bounds

$$\begin{aligned} |m_n(z,w) - m_n(z,w')| \le \frac{|w-w'|}{v_n^2} \end{aligned}$$

and

$$\begin{aligned} |m_n(z,w) - m_n(z',w)| \le \frac{|z-z'|}{v_n^2} \end{aligned}$$

for all \(z,z' \in \mathbb {C}\) and \(w,w' \in \mathbb {C}\) with \(\mathrm{Im}(w), \mathrm{Im}(w') \ge v_n\). Applying (7.26) and the triangle inequality, we obtain

$$\begin{aligned}&\left| \frac{ m_n(z,w) + w}{(m_n(z,w) + w)^2 - |z|^2} - \frac{m_n(z,w') + w'}{(m_n(z,w') + w')^2 - |z|^2} \right| \\&\quad \ll \frac{1}{v_n^2} \left| m_n(z,w) + w - m_n(z,w') - w' \right| \\&\quad \quad + \frac{1}{v_n} \left| \frac{1}{ (m_n(z,w) + w)^2 - |z|^2} - \frac{1}{(m_n(z,w') + w')^2 - |z|^2} \right| \\&\quad \ll \frac{|w-w'|}{v_n^4} + \frac{1}{v_n^5} \left| (m_n(z,w) + w)^2 - (m_n(z,w') + w')^2 \right| \\&\quad \ll \frac{|w-w'|}{v_n^8} \end{aligned}$$

for all \(|z| \le M\) and \(|w|, |w'| \le \beta \) with \(\mathrm{Im}(w), \mathrm{Im}(w') \ge v_n\). Similarly,

$$\begin{aligned} \left| \frac{ m_n(z,w) + w}{(m_n(z,w) + w)^2 - |z|^2} - \frac{m_n(z',w) + w}{(m_n(z',w) + w)^2 - |z'|^2} \right| \ll \frac{|z-z'|}{v_n^8} \end{aligned}$$

for all \(|z|, |z'| \le M\) and \(|w| \le \beta \) with \(\mathrm{Im}(w) \ge v_n\).
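The resolvent-continuity bounds used above can be illustrated for the ordinary Stieltjes transform of a small symmetric matrix (dimensions and spectral parameters below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10  # illustrative
B = rng.standard_normal((n, n))
A = (B + B.T) / 2  # symmetric

def m(w):
    """Stieltjes transform (1/n) tr (A - wI)^{-1}."""
    return np.trace(np.linalg.inv(A - w * np.eye(n))) / n

v = 0.2                        # lower bound on the imaginary parts
w1, w2 = 0.3 + 0.25j, -0.4 + 0.5j

diff = abs(m(w1) - m(w2))
bound = abs(w1 - w2) / v ** 2  # the Lipschitz bound |w - w'| / v^2
```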

We now apply an \({\varepsilon }\)-net argument with \({\varepsilon }= v_n^{13}\) to the sets \(\{z \in \mathbb {C} : |z| \le M \}\) and \(\{ w \in \mathbb {C} : |w| \le \beta , \mathrm{Im}(w) \ge v_n\}\). Let \(\mathcal {N}_1\) and \(\mathcal {N}_2\) denote the respective \({\varepsilon }\)-nets of the two sets. By Lemma 7.8,

$$\begin{aligned} |\mathcal {N}_1| + |\mathcal {N}_2 |\ll v_n^{-26} \le n^{1/2}. \end{aligned}$$

Therefore, by a standard \({\varepsilon }\)-net argument and the union bound, we conclude that

$$\begin{aligned} \sup _{|z| \le M} \sup _{|w| \le \beta , \mathrm{Im}(w) \ge v_n} \left| m_n(z,w) + \frac{ m_n(z,w) + w}{(m_n(z,w) + w)^2 - |z|^2} \right| = O(v_n^5) \end{aligned}$$

with probability (say) \(1 - O(n^{-2})\). The claim now follows from an application of the Borel–Cantelli lemma. \(\square \)

7.3 Proof of Theorem 2.4

This subsection is devoted to the proof of Theorem 2.4. With Lemma 7.4 and Theorem 7.6 in hand, the proof of Theorem 2.4 will follow from a standard (and somewhat technical) argument; see [2, Chapter 11], [31], and references therein. We detail the argument below.

Recall the definition of the functions \(g_{{\mathbf {M}}}\) and \(g\) given in (7.1) and (7.2). By [2, Chapter 11] (see also [5] and [13, Section 3]), for each \(z \in \mathbb {C}\), there exists a probability measure \(\nu _z\) on the real line such that

$$\begin{aligned} g(s,t) = \frac{\partial }{\partial s} \int \limits _{0}^\infty \log |x|^2 \, \nu _z(dx), \end{aligned}$$

where \(z = s + \sqrt{-1}t\).

Assume \(\{{\mathbf {X}}_n\}_{n \ge 1}\) and \(\{{\mathbf {N}}_n\}_{n \ge 1}\) satisfy the assumptions of Theorem 2.4. By Lemma 7.1 and [2, Lemma 11.5], in order to prove Theorem 2.4, it suffices to show that a.s.

$$\begin{aligned} \int \int \left[ g_{\frac{1}{\sqrt{n}}({\mathbf {X}}_n + {\mathbf {N}}_n)}(s,t) - g(s,t)\right] e^{\sqrt{-1} us + \sqrt{-1} vt} dt ds \longrightarrow 0 \end{aligned}$$

as \(n \rightarrow \infty \).

By the triangle inequality and Lemma 7.3, we have that a.s.

$$\begin{aligned} \frac{1}{n} \left\| \frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) \right\| _2^2 = O_d(1). \end{aligned}$$
(7.28)

Let \(A > 0\). Define

$$\begin{aligned} T:= \left\{ (s,t) : |s| \le A, |t| \le A^3 \right\} . \end{aligned}$$

By [2, Lemma 11.7] and (7.28), in order to prove Theorem 2.4 it suffices to show that for each fixed \(A>0\) a.s.

$$\begin{aligned} \int \int \limits _T \left[ g_{\frac{1}{\sqrt{n}}({\mathbf {X}}_n + {\mathbf {N}}_n)}(s,t) - g(s,t)\right] e^{\sqrt{-1} us + \sqrt{-1} vt} dt ds \longrightarrow 0 \end{aligned}$$

as \(n \rightarrow \infty \).

Let \({\varepsilon }_n := n^{-B}\) for some sufficiently large \(B>0\) (independent of \(n\)) to be chosen later. Following the integration by parts argument from [2, Section 11.7], it suffices to show that a.s.

$$\begin{aligned} \limsup _{n \rightarrow \infty } \int \int \limits _{T} \left| \int \limits _{{\varepsilon }_n}^{\infty } \log |x|^2 \left( \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}}(dx) - \nu _z(dx) \right) \right| dt ds = 0 \end{aligned}$$
(7.29)

and

$$\begin{aligned} \limsup _{n \rightarrow \infty } \int \int \limits _{T} \left| \int \limits _{0}^{{\varepsilon }_n} \log |x|^2 \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}}(dx) \right| dt ds = 0, \end{aligned}$$
(7.30)

and similarly with the two-dimensional integral on \(T\) replaced by one-dimensional integrals on the boundary of \(T\). We shall only estimate the two-dimensional integrals, as the treatment of the one-dimensional integrals is similar.

We prove (7.29) first. By (7.28), it follows that \(\nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z {\mathbf {I}}}\) is supported on \([-n^{50}, n^{50}]\) a.s. Thus, it suffices to show that a.s.

$$\begin{aligned} \limsup _{n \rightarrow \infty } \int \int \limits _{T} \left| \int \limits _{{\varepsilon }_n}^{n^{50}} \log |x|^2 \left( \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}}(dx) - \nu _z(dx) \right) \right| dt ds = 0. \end{aligned}$$

By definition of \({\varepsilon }_n\), it suffices to show that a.s.

$$\begin{aligned} \limsup _{n \rightarrow \infty } (\log n) \sup _{z \in T} \sup _{x \in \mathbb {R}} \left| \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}}( (-\infty , x)) - \nu _z( (-\infty , x)) \right| = 0. \end{aligned}$$
(7.31)

The bound (7.31) will follow from Lemma 7.9 below.

We now prove (7.30). By Theorem 2.5 (and the Borel–Cantelli lemma), for some sufficiently large \(B > 0\), we have the following:

$$\begin{aligned} \text {for a.e. } z \in T, \text { a.s. }\quad \lim _{n \rightarrow \infty } \int \limits _{0}^{{\varepsilon }_n} \log |x|^2 \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}}(dx) = 0. \end{aligned}$$
(7.32)

We now observe that it is possible to switch the quantifiers “a.e.” on \(z\) and “a.s.” on \(\omega \) in (7.32) using the arguments from [5, Section 4] and Fubini’s theorem, where \(\omega \) denotes an element of the sample space. Thus, we have

$$\begin{aligned} \text {a.s., } \text {for a.e. } z \in T, \quad \lim _{n \rightarrow \infty } \int \limits _{0}^{{\varepsilon }_n} \log |x|^2 \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}}(dx) = 0. \end{aligned}$$
(7.33)

Using the \(L^2\)-norm argument in [31, Section 12], it follows that a.s.

$$\begin{aligned} \left( \int \int \limits _{T} \left| \int \limits _{0}^{{\varepsilon }_n} \log |x|^2 \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z {\mathbf {I}}}(dx) \right| ^2 dt ds \right) ^{1/2} \end{aligned}$$
(7.34)

is bounded uniformly in \(n\), and hence the sequence of functions \(\int _{0}^{{\varepsilon }_n} \log |x|^2 \, \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}}(dx)\) is a.s. uniformly integrable on \(T\). Let \(L > 1\) be a large parameter and define \(T_{L,n}\) to be the set of all \(z \in T\) such that \(\left| \int _{0} ^{{\varepsilon }_n} \log |x|^2 \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z {\mathbf {I}}}(dx)\right| \le L\). By (7.33) and the dominated convergence theorem, we have a.s.

$$\begin{aligned} \lim _{n \rightarrow \infty } \int \int \limits _{T_{L,n}} \left| \int \limits _{0}^{{\varepsilon }_n} \log |x|^2 \nu _{\frac{1}{\sqrt{n}}({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}}(dx) \right| dt ds = 0. \end{aligned}$$

On the other hand, from the uniform boundedness of (7.34), we obtain a.s.

$$\begin{aligned} \limsup _{n \rightarrow \infty } \int \int \limits _{T {\setminus } T_{L,n}} \left| \int \limits _{0}^{{\varepsilon }_n} \log |x|^2 \nu _{\frac{1}{\sqrt{n}}({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}}(dx) \right| dt ds \ll \frac{1}{L}. \end{aligned}$$

Combining the bounds above and taking \(L \rightarrow \infty \) yields (7.30).

It remains to establish the following lemma.

Lemma 7.9

Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume (7.3) holds for some \(\eta > 0\). For each \(n \ge 1\), let \({\mathbf {N}}_n\) be a \(dn \times dn\) matrix such that \({{\mathrm{rank}}}({\mathbf {N}}_n) = O(n^{1-{\varepsilon }})\) for some \({\varepsilon }> 0\). Then there exists \(\alpha > 0\) such that a.s.

$$\begin{aligned} \sup _{|z| \le M} \left\| \nu _{\frac{1}{\sqrt{n}}({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}} - \nu _z \right\| = O_{M,m_{2+\eta },d}(n^{-\alpha }), \end{aligned}$$

where \(\Vert \nu - \mu \Vert := \sup _{x \in \mathbb {R}} | \nu ((-\infty , x)) - \mu ((-\infty , x))|\) for any two probability measures \(\nu , \mu \) on the real line.

Proof

(of Lemma 7.9) The proof of Lemma 7.9 is based on the arguments from [34, Lemma 64]. By [2, Theorem A.43],

$$\begin{aligned} \sup _{z \in \mathbb {C}} \left\| \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}} - \nu _{\frac{1}{\sqrt{n}} {\mathbf {X}}_n - z{\mathbf {I}}} \right\| = O(n^{-{\varepsilon }}). \end{aligned}$$

Thus, by the triangle inequality, it suffices to show that a.s.

$$\begin{aligned} \sup _{|z| \le M} \left\| \nu _{\frac{1}{\sqrt{n}} {\mathbf {X}}_n - z{\mathbf {I}}} - \nu _z \right\| = O_{M,m_{2+\eta }}(n^{-\alpha }) \end{aligned}$$

for some \(\alpha > 0\).

From [13, Remark 3.1], it follows that for each \(z \in \mathbb {C}\), \(\nu _z\) has density \(\rho _z\) with

$$\begin{aligned} \sup _{z \in \mathbb {C}} \sup _{x \in \mathbb {R}} |\rho _z(x)| \le 1. \end{aligned}$$

By [2, Lemma B.18], it suffices to show that a.s.

$$\begin{aligned} \sup _{|z| \le M} L\left( F^{{\mathbf {H}}_n(z)}, F_z \right) = O_{M,m_{2+\eta }}(n^{-\alpha }), \end{aligned}$$

where \(F_z\) is the cumulative distribution function of \(\nu _z\). We remind the reader that \(L(F,G)\) denotes the Levy distance, defined in (7.11), between the distribution functions \(F\) and \(G\).

By Lemma 7.4, it suffices to show that a.s.

$$\begin{aligned} \sup _{|z| \le M} \left\| \nu _{\frac{1}{\sqrt{n}} \hat{{\mathbf {X}}}_n - z {\mathbf {I}}} - \nu _z \right\| = O_{M,m_{2+\eta }}(n^{-\alpha }), \end{aligned}$$
(7.35)

where \(\{\hat{{\mathbf {X}}}_n\}_{n \ge 1}\) is the sequence of truncated matrices from Lemma 7.4 for some \(0 < \delta < 1/100\). Let \(m(z,w)\) denote the Stieltjes transform of \(\nu _z\). That is,

$$\begin{aligned} m(z,w) := \int \frac{1}{x - w} \, \nu _z(dx) = \int \frac{ \rho _z(x) dx}{x - w}. \end{aligned}$$

From [13, Section 3], it follows that \(m(z,w)\) is a solution of

$$\begin{aligned} m(z,w) + \frac{m(z,w) + w}{(m(z,w) + w)^2 - |z|^2} = 0 \end{aligned}$$

that is analytic in the upper half-plane \(\{w \in \mathbb {C} : \mathrm{Im}(w) > 0 \}\).
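Clearing denominators, this self-consistent equation is the cubic \(m^3 + 2w m^2 + (w^2 - |z|^2 + 1)m + w = 0\), and the relevant solution is the root with positive imaginary part (as the Stieltjes transform of a real measure satisfies \(\mathrm{Im}(m) > 0\) when \(\mathrm{Im}(w) > 0\)). A numerical sketch with illustrative parameter values:

```python
import numpy as np

z, w = 0.5 + 0.3j, 0.1 + 0.2j  # illustrative, Im(w) > 0

# m + (m+w)/((m+w)^2 - |z|^2) = 0, cleared of denominators:
roots = np.roots([1, 2 * w, w ** 2 - abs(z) ** 2 + 1, w])

# Select a root in the upper half-plane and confirm it solves the
# original self-consistent equation.
upper = [m for m in roots if m.imag > 0]
m0 = upper[0]
residual = abs(m0 + (m0 + w) / ((m0 + w) ** 2 - abs(z) ** 2))
```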

By [13, Remark 3.1], we choose \(\beta > 100\) sufficiently large (depending only on \(M\)) such that \(\rho _z\) is supported inside the interval \([-\beta /2, \beta /2]\) for all \(|z| \le M\). By Theorem 7.6 and [13, Lemma 2.4], it follows that a.s.

$$\begin{aligned} \sup _{|z| \le M } \sup _{|w| \le 4\beta , \mathrm{Im}(w) \ge v_n} | \hat{m}_n(z,w) - m(z,w)| = O_{M,m_{2+\eta }}(v_n^4), \end{aligned}$$
(7.36)

where \(v_n\) is defined in Theorem 7.6.

For the remainder of the proof, we fix a realization in which (7.36) holds. The implicit constants in our asymptotic notation (such as \(O,o,\Omega , \ll \)) depend only on the constants \(M, m_{2+\eta },d\); for simplicity, we no longer include these subscripts in our notation. By [13, (3.2)] and (7.36), it follows that

$$\begin{aligned} \sup _{|z| \le M} \sup _{|w| \le 4\beta , \mathrm{Im}(w) \ge v_n} \mathrm{Im}\left( \hat{m}_n(z,w) \right) \ll 1. \end{aligned}$$

Write \(w = u + \sqrt{-1}v\). For any interval \(I \subset \mathbb {R}\), we define

$$\begin{aligned} N_I(z) := \# \left\{ 1 \le i \le 2dn : \lambda _i(\hat{{\mathbf {H}}}_n(z)) \in I \right\} , \end{aligned}$$

where \(\lambda _1(\hat{{\mathbf {H}}}_n(z)), \lambda _2(\hat{{\mathbf {H}}}_n(z)), \ldots , \lambda _{2dn}(\hat{{\mathbf {H}}}_n(z))\) are the eigenvalues of \(\hat{{\mathbf {H}}}_n(z)\). We remind the reader that the eigenvalues of \(\hat{{\mathbf {H}}}_n(z)\) are given by

$$\begin{aligned} \pm \sigma _1\left( \frac{1}{\sqrt{n}} \hat{{\mathbf {X}}}_n - z{\mathbf {I}}\right) , \pm \sigma _2\left( \frac{1}{\sqrt{n}} \hat{{\mathbf {X}}}_n - z{\mathbf {I}}\right) , \ldots , \pm \sigma _{dn}\left( \frac{1}{\sqrt{n}} \hat{{\mathbf {X}}}_n - z{\mathbf {I}}\right) . \end{aligned}$$
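The fact that the eigenvalues of the Hermitization are the singular values together with their negatives is easy to confirm numerically (dimensions and parameters below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6  # illustrative
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
z = 0.3 - 0.6j
Y = X / np.sqrt(n) - z * np.eye(n)

# Hermitization of Y: its 2n eigenvalues are +/- the singular values of Y.
H = np.block([[np.zeros((n, n)), Y],
              [Y.conj().T, np.zeros((n, n))]])
eigs = np.sort(np.linalg.eigvalsh(H))
sv = np.linalg.svd(Y, compute_uv=False)
expected = np.sort(np.concatenate([sv, -sv]))
err = np.max(np.abs(eigs - expected))
```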

For an interval \(I \subset [-\beta , \beta ]\) of length \(|I| = v \ge v_n\) centered at \(u\), we have

$$\begin{aligned} \sup _{|z| \le M} \frac{N_I(z)}{8dn v}&\le \sup _{|z| \le M} \frac{1}{2dn} \sum _{i=1}^{2dn} \frac{v}{v^2 + |u - \lambda _i(\hat{{\mathbf {H}}}_n(z))|^2}\\&= \sup _{|z| \le M} \mathrm{Im}(\hat{m}_n(z,u+\sqrt{-1}v)) \ll 1. \end{aligned}$$

We conclude that for any interval \(I \subset [-\beta , \beta ]\) of length \(|I| \ge v_n\),

$$\begin{aligned} \sup _{|z| \le M} N_I(z) \ll |I| n. \end{aligned}$$
(7.37)

Fix an interval \(I \subset [-\beta , \beta ]\) with \(|I| \ge 10 v_n\). Define

$$\begin{aligned} f(y) := \frac{1}{\pi } \int \limits _{I} \frac{v_n}{v_n^2 + (x-y)^2} dx. \end{aligned}$$
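For \(I = [a,b]\), the function \(f\) has the closed form \(f(y) = \frac{1}{\pi }\left[ \arctan \frac{b-y}{v_n} - \arctan \frac{a-y}{v_n} \right] \), which makes its behavior inside and far from \(I\) easy to probe numerically (the values of \(v_n\) and \(I\) below are illustrative):

```python
import numpy as np

v = 0.01          # stand-in for v_n
a, b = -1.0, 1.0  # the interval I, with |I| >= 10 v

def f(y):
    """(1/pi) * integral over [a,b] of v / (v^2 + (x-y)^2) dx, in closed form."""
    return (np.arctan((b - y) / v) - np.arctan((a - y) / v)) / np.pi

inside = f(0.0)   # deep inside I: close to 1
outside = f(5.0)  # far outside I: of order v |I| / dist(y, I)^2
```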

We have

$$\begin{aligned} \frac{1}{2dn} \sum _{i=1}^{2dn} f( \lambda _i(\hat{{\mathbf {H}}}_n(z))) = \frac{1}{\pi } \mathrm{Im}\int \limits _{I} \hat{m}_n(z, u + \sqrt{-1} v_n) du \end{aligned}$$

and

$$\begin{aligned} \int \limits _{\mathbb {R}} f(y) \rho _z(y) dy = \frac{1}{\pi } \mathrm{Im}\int \limits _{I} m(z,u + \sqrt{-1} v_n) du \end{aligned}$$

for any \(z \in \mathbb {C}\). Therefore, by (7.36), we obtain

$$\begin{aligned} \sup _{|z| \le M} \left| \frac{1}{2dn} \sum _{i=1}^{2dn} f( \lambda _i(\hat{{\mathbf {H}}}_n(z))) - \int \limits _{\mathbb {R}} f(y) \rho _z(y) dy \right| = O(|I| v_n^4). \end{aligned}$$

We note the following pointwise bounds:

$$\begin{aligned} f(y) \ll \frac{v_n |I|}{{{\mathrm{dist}}}^2(y,I)} \end{aligned}$$

when \(y \notin I\) and \({{\mathrm{dist}}}(y,I) \ge |I|\), and

$$\begin{aligned} f(y) \ll \frac{1}{1 + {{\mathrm{dist}}}(y,I)/v_n} \end{aligned}$$

when \(y \notin I\) and \({{\mathrm{dist}}}(y,I) < |I|\). In the case where \(y \in I\), we have

$$\begin{aligned} f(y) = 1 + O \left( \frac{1}{1 + {{\mathrm{dist}}}(y,I^c)/v_n} \right) \end{aligned}$$

as \(\frac{1}{\pi } \frac{v_n}{v_n^2 + (x-y)^2}\) has total integral \(1\). Using these bounds, we find

$$\begin{aligned} \sup _{|z| \le M } \left| \int \limits _{\mathbb {R}} f(y) \rho _z(y) dy - \int \limits _{I} \rho _z(y)dy \right| = O \left( v_n \log \frac{|I|}{v_n} \right) . \end{aligned}$$

Similarly, by (7.37), Riemann integration, and the trivial bound \(N_{J} \le 2dn\) for any interval \(J\) outside of \([-\beta , \beta ]\), we have

$$\begin{aligned} \sup _{|z| \le M} \left| \frac{1}{2dn} \sum _{i=1}^{2dn} f( \lambda _i(\hat{{\mathbf {H}}}_n(z))) - \frac{1}{2dn} N_I(z) \right| = O \left( v_n \log \frac{|I|}{v_n} \right) . \end{aligned}$$

Combining the bounds above, we conclude that for any interval \(I \subset [-\beta , \beta ]\) with \(|I| \ge 10v_n\), we have

$$\begin{aligned} \sup _{|z| \le M} \left| \frac{1}{2dn} N_I(z) - \int \limits _{I} \rho _z(y) dy \right| = O(v_n^4 |I|) + O\left( v_n \log \frac{|I|}{v_n} \right) . \end{aligned}$$

In particular, since \(\rho _z\) is supported inside \([-\beta /2, \beta /2]\), we obtain

$$\begin{aligned} \sup _{|z| \le M} \frac{1}{2dn} N_{[-\beta /2, \beta /2]^{\mathsf {c}}}(z) = O(v_n \log v_n^{-1}), \end{aligned}$$

where \([-\beta /2, \beta /2]^\mathsf {c}\) is the complement of the interval \([-\beta /2,\beta /2]\). Thus, we have

$$\begin{aligned}&\sup _{|z| \le M} \left\| \nu _{\frac{1}{\sqrt{n}} \hat{{\mathbf {X}}}_n - z{\mathbf {I}}} - \nu _z \right\| \\&\quad \ll v_n \log v_n^{-1} + \sup _{|z| \le M} \sup _{x \in [-\beta /2,\beta /2]} \left| \frac{1}{2dn} N_{[-\beta , x)}(z) - \int \limits _{-\beta }^x \rho _z(y) dy \right| \\&\quad \ll v_n \log v_n^{-1}. \end{aligned}$$

Since this bound holds for each fixed realization in which (7.36) holds, we obtain (7.35) a.s. The proof of the lemma is complete. \(\square \)