Abstract
The circular law asserts that if \({\mathbf {X}}_n\) is an \(n \times n\) matrix with iid complex entries of mean zero and unit variance, then the empirical spectral distribution of \(\frac{1}{\sqrt{n}} {\mathbf {X}}_n\) converges almost surely to the uniform distribution on the unit disk as \(n\) tends to infinity. Answering a question of Tao, we prove the circular law for a general class of random block matrices with dependent entries. The proof relies on an inverse-type result for the concentration of linear operators and multilinear forms.
1 Introduction
The eigenvalues of an \(n \times n\) matrix \({\mathbf {M}}\) are the roots in \(\mathbb {C}\) of the characteristic polynomial \(\det ({\mathbf {M}}-z{\mathbf {I}})\), where \({\mathbf {I}}\) is the identity matrix. We let \(\lambda _1({\mathbf {M}}), \ldots , \lambda _n({\mathbf {M}})\) denote the eigenvalues of \({\mathbf {M}}\). The empirical spectral measure of \({\mathbf {M}}\) is given by
$$\begin{aligned} \mu _{{\mathbf {M}}} := \frac{1}{n} \sum _{i=1}^n \delta _{\lambda _i({\mathbf {M}})}. \end{aligned}$$
The corresponding empirical spectral distribution (ESD) is given by
$$\begin{aligned} F^{{\mathbf {M}}}(x,y) := \frac{1}{n} \# \left\{ 1 \le i \le n : \mathrm{Re}\, \lambda _i({\mathbf {M}}) \le x, \ \mathrm{Im}\, \lambda _i({\mathbf {M}}) \le y \right\} . \end{aligned}$$
Here \(\# E\) denotes the cardinality of the set \(E\). If the matrix \({\mathbf {M}}\) is Hermitian, then the eigenvalues \(\lambda _1({\mathbf {M}}), \ldots , \lambda _n({\mathbf {M}})\) are real. In this case the ESD is given by
$$\begin{aligned} F^{{\mathbf {M}}}(x) := \frac{1}{n} \# \left\{ 1 \le i \le n : \lambda _i({\mathbf {M}}) \le x \right\} . \end{aligned}$$
Given a random \(n \times n\) matrix \({\mathbf {X}}_n\), an important problem in random matrix theory is to study the limiting distribution of the empirical spectral measure as \(n\) tends to infinity. We consider one of the simplest random matrix ensembles, when the entries of \({\mathbf {X}}_n\) are iid copies of the random variable \(\xi \). We refer to \(\xi \) as the atom variable of \({\mathbf {X}}_n\).
When \(\xi \) is a standard complex Gaussian random variable, \({\mathbf {X}}_n\) can be viewed as a random matrix drawn from the probability distribution
$$\begin{aligned} {\mathbb {P}}(d {\mathbf {M}}) = \pi ^{-n^2} e^{- {{\mathrm{tr}}}({\mathbf {M}}{\mathbf {M}}^*)} \, d{\mathbf {M}}\end{aligned}$$
on the set of complex \(n \times n\) matrices. Here \(d {\mathbf {M}}\) denotes the Lebesgue measure on the \(2n^2\) real entries
of \({\mathbf {M}}=(m_{ij})_{i,j=1}^n\). This is known as the complex Ginibre ensemble. The real Ginibre ensemble and quaternionic Ginibre ensemble are defined analogously.
Following Ginibre [9], one may compute the joint density of the eigenvalues of a random matrix \({\mathbf {X}}_n\) drawn from the complex Ginibre ensemble. Mehta [17, 18] used this joint density function to compute the limiting spectral measure of the complex Ginibre ensemble. In particular, he showed that if \({\mathbf {X}}_n\) is drawn from the complex Ginibre ensemble, then the ESD of \(\frac{1}{\sqrt{n}} {\mathbf {X}}_n\) converges to the circular law \(F_{\mathrm {circ}}\), where
$$\begin{aligned} F_{\mathrm {circ}}(x,y) := \mu _{\mathrm {circ}}\left( \left\{ z \in {\mathbb {C}}: \mathrm{Re}\, z \le x, \ \mathrm{Im}\, z \le y \right\} \right) \end{aligned}$$
and \(\mu _{\mathrm {circ}}\) is the uniform probability measure on the unit disk in the complex plane. Edelman [7] verified the same limiting distribution for the real Ginibre ensemble.
For the general (non-Gaussian) case, there is no formula for the joint distribution of the eigenvalues and the problem appears much more difficult. The universality phenomenon in random matrix theory asserts that the spectral behavior of a random matrix does not depend on the distribution of the atom variable \(\xi \) in the limit \(n \rightarrow \infty \). In other words, one expects that the circular law describes the limiting ESD of a large class of random matrices (not just Gaussian matrices); Fig. 1 presents a numerical simulation depicting this universality phenomenon.
The eigenvalues of random matrices with iid entries. The first plot contains the eigenvalues of \(50\) samples of \(100 \times 100\) random matrices drawn from the real Ginibre ensemble. The second plot contains the eigenvalues of \(50\) samples of \(100 \times 100\) random matrices whose entries are Bernoulli random variables (i.e. each entry takes values \(\pm 1\) with equal probability). The black circle in each plot is the unit circle centered at the origin
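A simulation in the spirit of Fig. 1 can be sketched in a few lines (our own illustration; the sample size and tolerances are ours, not from the paper):

```python
import numpy as np

# Sample one Bernoulli random matrix, rescale by 1/sqrt(n), and check that
# its spectrum roughly fills the unit disk, as the circular law predicts.
rng = np.random.default_rng(0)
n = 200
X = rng.choice([-1.0, 1.0], size=(n, n))    # iid Bernoulli(+-1) entries
eigs = np.linalg.eigvals(X / np.sqrt(n))    # spectrum of n^{-1/2} X_n

radius = np.abs(eigs).max()                 # spectral radius, close to 1
frac_inside = np.mean(np.abs(eigs) <= 1.1)  # most eigenvalues lie in the disk
print(radius, frac_inside)
```

Increasing \(n\) sharpens the picture; the spectral radius tends to \(1\) as \(n \rightarrow \infty \).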
In the 1950s, Wigner [36] proved a version of the universality phenomenon for Hermitian random matrices, now known as Wigner matrices. However, the random matrix ensemble described above is not Hermitian. In fact, many of the techniques used to deal with Hermitian random matrices do not apply to non-Hermitian matrices [2, Section 11.1].
An important result was obtained by Girko [10, 11] who related the empirical spectral measure of non-Hermitian matrices to that of Hermitian matrices. Building upon this Hermitization technique, Bai [2, 3] gave the first rigorous proof of the circular law for general (non-Gaussian) distributions under a number of moment and smoothness assumptions on the atom variable \(\xi \). Important results were obtained more recently by Pan and Zhou [25] and Götze and Tikhomirov [13]. Tao and Vu [31] were able to prove the circular law under the assumption that \({\mathbb {E}}|\xi |^{2+{\varepsilon }} < \infty \), for some \({\varepsilon }> 0\). Recently, Tao and Vu [28, 33] established the law assuming only that \(\xi \) has finite variance.
For any \(m \times n\) matrix \({\mathbf {M}}\), we define the Hilbert–Schmidt norm \(\Vert {\mathbf {M}}\Vert _2\) by the formula
$$\begin{aligned} \Vert {\mathbf {M}}\Vert _2 := \left( \sum _{i=1}^m \sum _{j=1}^n |m_{ij}|^2 \right) ^{1/2} = \sqrt{ {{\mathrm{tr}}}({\mathbf {M}}^* {\mathbf {M}}) }. \end{aligned}$$
Theorem 1.1
(Tao and Vu [33]) Let \(\xi \) be a complex random variable with mean zero and unit variance. For each \(n \ge 1\), let \({\mathbf {X}}_n\) be an \(n \times n\) matrix whose entries are iid copies of \(\xi \), and let \({\mathbf {N}}_n\) be an \(n \times n\) deterministic matrix. If \({{\mathrm{rank}}}({\mathbf {N}}_n) = o(n)\) and \(\sup _{n \ge 1} \frac{1}{n^2} \Vert {\mathbf {N}}_n \Vert _2^2 < \infty \), then the ESD of \(\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n)\) converges almost surely to the circular law \(F_{\mathrm {circ}}\) as \(n \rightarrow \infty \).
One of the key steps in proving Theorem 1.1 is controlling the largest and smallest singular values of \({\mathbf {X}}_n + {\mathbf {N}}_n\). We recall that the singular values of an \(m \times n\) matrix \({\mathbf {M}}\) are the eigenvalues of \(|{\mathbf {M}}| := \sqrt{ {\mathbf {M}}^*{\mathbf {M}}}\). We let \(\sigma _1({\mathbf {M}}) \ge \cdots \ge \sigma _n({\mathbf {M}}) \ge 0\) denote the singular values of \({\mathbf {M}}\). In particular, the largest and smallest singular values are given by
$$\begin{aligned} \sigma _1({\mathbf {M}}) = \sup _{\Vert v \Vert = 1} \Vert {\mathbf {M}}v \Vert , \qquad \sigma _n({\mathbf {M}}) = \inf _{\Vert v \Vert = 1} \Vert {\mathbf {M}}v \Vert , \end{aligned}$$
where \(\Vert v\Vert \) denotes the Euclidean norm of the vector \(v\). We let \(\Vert {\mathbf {M}}\Vert := \sigma _1({\mathbf {M}})\) denote the spectral norm of the matrix \({\mathbf {M}}\).
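These definitions can be checked numerically (a self-contained sketch; the test matrix and its size are our own choices):

```python
import numpy as np

# The singular values of M are the eigenvalues of |M| = sqrt(M* M), and the
# largest one equals the spectral norm ||M||.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

# sigma_1 >= ... >= sigma_n via the SVD
svals = np.linalg.svd(M, compute_uv=False)
# the same values as eigenvalues of sqrt(M* M), sorted decreasingly
from_gram = np.sqrt(np.linalg.eigvalsh(M.conj().T @ M))[::-1]

assert np.allclose(svals, from_gram)
assert np.isclose(svals[0], np.linalg.norm(M, 2))   # spectral norm
```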
While the behavior of the largest singular value is well studied (e.g. see [1, 27]), bounds for the smallest singular value appear more difficult. Using techniques from additive combinatorics, Tao and Vu established the following bound on the least singular value of \({\mathbf {X}}_n + {\mathbf {N}}_n\).
Theorem 1.2
(Tao and Vu [31]) Assume that \({\mathbf {X}}_n\) is an \(n \times n\) random matrix whose entries are iid copies of a random variable with mean zero and variance one. Assume that \({\mathbf {N}}_n\) is a deterministic \(n \times n\) matrix whose entries are bounded by \(n^\alpha \) in absolute value. Then for any \(B>0\), there exists \(A>0\) (depending on \(B\) and \(\alpha \)) such that
$$\begin{aligned} {\mathbb {P}}\left( \sigma _n({\mathbf {X}}_n + {\mathbf {N}}_n) \le n^{-A} \right) \le n^{-B}. \end{aligned}$$
2 Universality of random block matrices
The goal of this note is to study a class of random matrices that generalizes the random matrix ensemble discussed above. In particular, we consider random block matrices whose entries are not necessarily independent. We will show that, under some moment assumptions, the limiting ESD of these block matrices is also given by the circular law.
2.1 Quaternions and matrices of quaternions
One of the prototypical examples of a block matrix is that of a quaternionic matrix. We now review some preliminary facts about quaternions and matrices of quaternions. Most of these results can be found in the detailed survey by Zhang [37]. Let \({\mathbb {H}}\) denote the non-commutative field of quaternions. As a real vector space, \({\mathbb {H}}\) admits a basis \(\{1,{\mathbf {i}},{\mathbf {j}},{\mathbf {k}}\}\) with the usual multiplication table: \(1\) is the identity element and
$$\begin{aligned} {\mathbf {i}}^2 = {\mathbf {j}}^2 = {\mathbf {k}}^2 = -1, \qquad {\mathbf {i}}{\mathbf {j}}= -{\mathbf {j}}{\mathbf {i}}= {\mathbf {k}}, \qquad {\mathbf {j}}{\mathbf {k}}= -{\mathbf {k}}{\mathbf {j}}= {\mathbf {i}}, \qquad {\mathbf {k}}{\mathbf {i}}= -{\mathbf {i}}{\mathbf {k}}= {\mathbf {j}}. \end{aligned}$$
For \(q = q_0 + q_1 {\mathbf {i}}+ q_2 {\mathbf {j}}+ q_3 {\mathbf {k}}\in {\mathbb {H}}\), we have \(q^*:= q_0 - q_1 {\mathbf {i}}- q_2 {\mathbf {j}}- q_3 {\mathbf {k}}\), \(\mathrm{Re}(q) := q_0\), and \(\mathrm{Im}(q) := q_1 {\mathbf {i}}+ q_2 {\mathbf {j}}+ q_3 {\mathbf {k}}\). Then
$$\begin{aligned} q q^* = q^* q = q_0^2 + q_1^2 + q_2^2 + q_3^2, \end{aligned}$$
and thus any nonzero quaternion is invertible. Define the norm \(|q| : = \sqrt{q q^*}\). It follows that for any \(q,q' \in {\mathbb {H}}\), \(|q q'| = |q| |q'|\). Real numbers and complex numbers can be thought of as quaternions in the natural way, and one has \({\mathbb {R}}\subset {\mathbb {C}}\subset {\mathbb {H}}\). Every quaternion \(q = q_0 + q_1 {\mathbf {i}}+ q_2 {\mathbf {j}}+ q_3 {\mathbf {k}}\) can be written uniquely as \(q = c_1 + c_2{\mathbf {j}}\) where \(c_1 = q_0 + q_1{\mathbf {i}}\), \(c_2 = q_2 + q_3 {\mathbf {i}}\) are complex numbers.
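The arithmetic above is easy to verify in code (a minimal sketch with quaternions stored as 4-vectors \((q_0,q_1,q_2,q_3)\); the helper names are ours):

```python
import numpy as np

# qmul is the Hamilton product determined by the multiplication table above.
def qmul(p, q):
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([
        p0*q0 - p1*q1 - p2*q2 - p3*q3,   # real part
        p0*q1 + p1*q0 + p2*q3 - p3*q2,   # i component
        p0*q2 - p1*q3 + p2*q0 + p3*q1,   # j component
        p0*q3 + p1*q2 - p2*q1 + p3*q0,   # k component
    ])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def qnorm(q):
    return float(np.sqrt(qmul(q, qconj(q))[0]))   # |q|^2 = q q*

p = np.array([1.0, 2.0, -1.0, 0.5])
q = np.array([0.0, 1.0, 3.0, -2.0])

assert np.allclose(qmul(q, qconj(q)), [14.0, 0.0, 0.0, 0.0])  # q q* = |q|^2
assert np.isclose(qnorm(qmul(p, q)), qnorm(p) * qnorm(q))     # |pq| = |p||q|
assert not np.allclose(qmul(p, q), qmul(q, p))                # noncommutative
```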
We say that two quaternions \(q, q'\) are similar if there exists a nonzero quaternion \(x\) such that \(q = x q' x^{-1}\). We let \( {\mathbb {S}}({\mathbb {H}})\) denote the group of quaternions with norm one. It follows that \(q,q'\) are similar if and only if there exists \(x \in {\mathbb {S}}({\mathbb {H}})\) with \(q = x q' x^*\). The following lemma shows that every quaternion is similar to a complex number.
Lemma 2.1
[37] If \(q = q_0 + q_1 {\mathbf {i}}+ q_2 {\mathbf {j}}+ q_3 {\mathbf {k}}\in {\mathbb {H}}\), then \(q\) and \(\mathrm{Re}(q) + |\mathrm{Im}(q)| {\mathbf {i}}\) are similar.
Let \({\mathbf {M}}\) be an \(n \times n\) matrix with quaternion entries. Then \(\lambda \in {\mathbb {H}}\) is called a right eigenvalue of \({\mathbf {M}}\) if there exists a nonzero vector \(X \in {\mathbb {H}}^n\) such that \({\mathbf {M}}X = X \lambda \). If \(\lambda \) is a right eigenvalue of \({\mathbf {M}}\), one finds that \(q \lambda q^{-1}\) is also a right eigenvalue of \({\mathbf {M}}\) for any nonzero quaternion \(q\). Hence the right spectrum of \({\mathbf {M}}\) is either infinite or contained in \({\mathbb {R}}\). By Lemma 2.1, we may restrict our attention to complex right eigenvalues. We consider the (unique) decomposition \({\mathbf {M}}= {\mathbf {M}}_1 + {\mathbf {M}}_2 {\mathbf {j}}\) with \({\mathbf {M}}_1, {\mathbf {M}}_2\) complex matrices. Then for any \(\lambda \in {\mathbb {C}}\) and \(X = Y + Z {\mathbf {j}}\) with \(Y,Z \in {\mathbb {C}}^n\), the following are equivalent:
- (i) \({\mathbf {M}}X = X \lambda \),
- (ii) \(\begin{bmatrix} {\mathbf {M}}_1&\quad {\mathbf {M}}_2 \\ -\overline{{\mathbf {M}}}_2&\quad \overline{{\mathbf {M}}}_1 \end{bmatrix} \begin{bmatrix} Y \\ - \overline{Z} \end{bmatrix} = \lambda \begin{bmatrix} Y \\ -\overline{Z} \end{bmatrix}\),
- (iii) \(\begin{bmatrix} {\mathbf {M}}_1&\quad {\mathbf {M}}_2 \\ -\overline{{\mathbf {M}}}_2&\quad \overline{{\mathbf {M}}}_1 \end{bmatrix} \begin{bmatrix} Z \\ \overline{Y} \end{bmatrix} = \bar{\lambda } \begin{bmatrix} Z \\ \overline{Y} \end{bmatrix}\).
Thus, the right spectrum of \({\mathbf {M}}\), when restricted to complex numbers, is given by the \(2n\) eigenvalues of the complex matrix
$$\begin{aligned} \chi ({\mathbf {M}}) := \begin{bmatrix} {\mathbf {M}}_1&\quad {\mathbf {M}}_2 \\ -\overline{{\mathbf {M}}}_2&\quad \overline{{\mathbf {M}}}_1 \end{bmatrix}. \end{aligned}$$
Moreover, the complex eigenvalues appear as conjugate pairs. The whole set of right eigenvalues of \({\mathbf {M}}\) is then the union of all similarity classes of the complex right eigenvalues of \({\mathbf {M}}\).
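A quick numerical check of this correspondence (our own illustration; `chi` is our name for the complex matrix built from \({\mathbf {M}}_1, {\mathbf {M}}_2\)):

```python
import numpy as np

# For M = M1 + M2 j, the complex 2n x 2n matrix [[M1, M2], [-conj(M2),
# conj(M1)]] has spectrum closed under complex conjugation.
rng = np.random.default_rng(2)
n = 4
M1 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M2 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

chi = np.block([[M1, M2], [-M2.conj(), M1.conj()]])
eigs = np.linalg.eigvals(chi)

# every eigenvalue's conjugate is also (numerically) an eigenvalue
for e in eigs:
    assert np.min(np.abs(eigs - e.conj())) < 1e-8
```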
2.2 Random quaternionic matrices
Let \(\xi \) be a real random variable with mean zero and variance \(1/4\). We study the right eigenvalues of random quaternionic matrices whose entries are iid copies of \(\xi _0 + \xi _1 {\mathbf {i}}+ \xi _2 {\mathbf {j}}+ \xi _3 {\mathbf {k}}\), where \(\xi _0, \xi _1, \xi _2, \xi _3\) are iid copies of \(\xi \). From the discussion above, we find that this is equivalent to studying the eigenvalues of random complex block matrices. Indeed, the problem reduces to studying the eigenvalues of the \(2n \times 2n\) matrix
$$\begin{aligned} {\mathbf {X}}_n := \begin{bmatrix} {\mathbf {A}}_n&\quad {\mathbf {B}}_n \\ -\overline{{\mathbf {B}}}_n&\quad \overline{{\mathbf {A}}}_n \end{bmatrix}, \end{aligned}$$
where \({\mathbf {A}}_n, {\mathbf {B}}_n\) are independent \(n \times n\) complex matrices whose entries are iid copies of \(\xi _0 + \xi _1 {\mathbf {i}}\). We note, however, that the entries of \({\mathbf {X}}_n\) are not independent. Thus, Theorem 1.1 cannot be applied to the block matrix \({\mathbf {X}}_n\).
In the case that \(\xi \) is Gaussian (i.e. the quaternionic Ginibre ensemble), the circular law was established by Benaych-Georges and Chapon [4] using logarithmic potential theory. We will verify the circular law for random quaternionic matrices when the atom variable \(\xi \) is non-Gaussian.
Theorem 2.2
(Universality for quaternion random matrices) Let \(\xi \) be a complex random variable with mean zero and variance \(1/2\), and suppose \({\mathbb {E}}[\xi ^2] = 0\) and \({\mathbb {E}}|\xi |^{2+\eta } < \infty \) for some \(\eta > 0\). For each \(n \ge 1\), let \({\mathbf {A}}_n, {\mathbf {B}}_n\) be independent \(n \times n\) matrices whose entries are iid copies of \(\xi \), and let \({\mathbf {X}}_n\) be the \(2n \times 2n\) matrix defined in (2.1). For each \(n \ge 1\), let \({\mathbf {N}}_n\) be a deterministic \(2n \times 2n\) matrix, and suppose the sequence \(\{{\mathbf {N}}_n\}_{n \ge 1}\) satisfies \({{\mathrm{rank}}}({\mathbf {N}}_n) = O(n^{1-{\varepsilon }})\) and \(\sup _{n \ge 1} \frac{1}{n^2} \Vert {\mathbf {N}}_n\Vert _2^2 < \infty \), for some \({\varepsilon }> 0\). Then the ESD of \(\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n)\) converges almost surely to the circular law \(F_{\mathrm {circ}}\) as \(n \rightarrow \infty \).
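Theorem 2.2 can be illustrated numerically (our own simulation with \({\mathbf {N}}_n = 0\), using the block form \(\begin{bmatrix} {\mathbf {A}}_n&\quad {\mathbf {B}}_n \\ -\overline{{\mathbf {B}}}_n&\quad \overline{{\mathbf {A}}}_n \end{bmatrix}\) from the quaternionic representation above and a complex Gaussian atom variable, which satisfies the stated moment conditions):

```python
import numpy as np

# Build X_n = [[A, B], [-conj(B), conj(A)]] with complex Gaussian entries of
# mean 0, E|xi|^2 = 1/2, E[xi^2] = 0, then look at the spectrum of X_n/sqrt(n).
rng = np.random.default_rng(3)
n = 150

def cgauss(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / 2.0

A, B = cgauss((n, n)), cgauss((n, n))
X = np.block([[A, B], [-B.conj(), A.conj()]])
eigs = np.linalg.eigvals(X / np.sqrt(n))

radius = np.abs(eigs).max()                  # should approach 1
frac_inside = np.mean(np.abs(eigs) <= 1.1)   # most eigenvalues in the disk
print(radius, frac_inside)
```

Despite the dependence between the blocks, the spectrum fills out the unit disk, in agreement with the theorem.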
2.3 Random block matrices
More generally, we will study random block matrices of the form
$$\begin{aligned} {\mathbf {X}}_n := \begin{bmatrix} {\mathbf {A}}_n&\quad {\mathbf {B}}_n \\ {\mathbf {C}}_n&\quad {\mathbf {D}}_n \end{bmatrix}, \end{aligned}$$
where \({\mathbf {A}}_n = (a_{ij})_{i,j=1}^n, {\mathbf {B}}_n = (b_{ij})_{i,j=1}^n, {\mathbf {C}}_n = (c_{ij})_{i,j=1}^n, {\mathbf {D}}_n = (d_{ij})_{i,j=1}^n\), and
$$\begin{aligned} \left\{ (a_{ij}, b_{ij}, c_{ij}, d_{ij}) : i,j \ge 1 \right\} \end{aligned}$$
is a collection of iid copies of the random vector \((\xi _1,\xi _2,\xi _3,\xi _4)\). Here the random variables \(\xi _1, \xi _2, \xi _3, \xi _4\) are not required to be independent.
This ensemble of block matrices was proposed by Tao at the AIM Workshop on Random Matrices as a matrix model with dependent entries in which the circular law is still expected to hold. We will prove the circular law for this ensemble of random block matrices under some moment assumptions on the atom variables \(\xi _1,\xi _2,\xi _3,\xi _4\).
The matrix \({\mathbf {X}}_n\) in (2.2) can be viewed as a \(2 \times 2\) block matrix. More generally, we will study \(d \times d\) block matrices for any \(d \ge 2\). We begin with the following definition.
Definition 2.3
(Random block matrices with dependent entries; Condition C0) Let \(d \ge 2\). Let \((\xi _{st})_{s,t=1}^d\) be a complex random matrix where each entry \(\xi _{st}\) has mean zero and variance \(1/d\). For each \(s,t \in \{1,\ldots ,d\}\), let \(\{ x_{st;ij}\}_{i,j \ge 1}\) be an infinite double array of complex random variables all defined on the same probability space. For each \(n \ge 1\) and all \(s,t \in \{1,\ldots ,d\}\), define the \(n \times n\) random matrix \({\mathbf {X}}_{n,st} := (x_{st;ij})_{i,j=1}^n\). Define the \(dn \times dn\) random block matrix
$$\begin{aligned} {\mathbf {X}}_n := \begin{bmatrix} {\mathbf {X}}_{n,11}&\quad \cdots&\quad {\mathbf {X}}_{n,1d} \\ \vdots&\quad \ddots&\quad \vdots \\ {\mathbf {X}}_{n,d1}&\quad \cdots&\quad {\mathbf {X}}_{n,dd} \end{bmatrix}. \end{aligned}$$
We say the sequence of matrices \(\{{\mathbf {X}}_n\}_{n \ge 1}\) satisfies condition C0 with parameter \(d\) and atom variables \((\xi _{st})_{s,t=1}^d\) if the following conditions hold:
- (i) \(\{ (x_{st;ij})_{s,t=1}^d : 1 \le i,j \}\) is a collection of iid copies of \((\xi _{st})_{s,t=1}^d\),
- (ii) we have \({\mathbb {E}}\left[ \xi _{st} \overline{\xi _{uv}}\right] = 0\) for all \((s,t) \ne (u,v)\).
In Theorem 2.4 below, we establish the circular law for a class of random block matrices that satisfy condition C0. In particular, Theorem 2.2 is a corollary of the following theorem in the case that \(d=2\).
Theorem 2.4
(Universality for random block matrices) Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume that
$$\begin{aligned} \max _{1 \le s,t \le d} {\mathbb {E}}|\xi _{st}|^{2+\eta } < \infty \end{aligned}$$
for some \(\eta > 0\). For each \(n \ge 1\), let \({\mathbf {N}}_n\) be a deterministic \(dn \times dn\) matrix, and suppose the sequence \(\{{\mathbf {N}}_n\}_{n \ge 1}\) satisfies \({{\mathrm{rank}}}({\mathbf {N}}_n) = O(n^{1-{\varepsilon }})\) and \(\sup _{n\ge 1} \frac{1}{n^2} \Vert {\mathbf {N}}_n\Vert _2^2 < \infty \) for some \({\varepsilon }> 0\). Then the ESD of \(\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n)\) converges almost surely to the circular law \(F_\mathrm {circ}\) as \(n \rightarrow \infty \).
In Definition 2.3, we require the atom variables \((\xi _{st})_{s,t=1}^d\) to be uncorrelated. In this note, we will not deal with the correlated case. However, when there is a correlation among the atom variables, we do not always expect the circular law to be the limiting distribution. In Fig. 2, we plot the eigenvalues of \(\frac{1}{\sqrt{2n}} {\mathbf {X}}_n\) in the case that
where \({\mathbf {A}}_n, {\mathbf {B}}_n\) are independent \(n \times n\) random matrices drawn from the real Ginibre ensemble. In particular, \({\mathbf {X}}_n\) does not satisfy condition (ii) of Definition 2.3. Figure 2 suggests that the eigenvalues concentrate near the origin, and so we do not expect the limiting distribution to be uniform on the unit disk.
The eigenvalue plot of \(\frac{1}{\sqrt{2n}} {\mathbf {X}}_n\), when \(n=2000\), \({\mathbf {X}}_n\) is defined in (2.3), and \({\mathbf {A}}_n, {\mathbf {B}}_n\) are independent \(n \times n\) random matrices drawn from the real Ginibre ensemble. The eigenvalues appear to concentrate near the origin
2.4 Least singular value bound
One of the key ingredients in the proof of Theorem 2.4 is a bound on the least singular value of random matrices \(\{{\mathbf {X}}_n\}_{n \ge 1}\) that satisfy condition C0. In particular, we establish the following result.
Theorem 2.5
(Least singular value bound) Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume that
$$\begin{aligned} \max _{1 \le s,t \le d} {\mathbb {E}}|\xi _{st}|^{2+\eta } < \infty \end{aligned}$$
for some \(\eta > 0\). For each \(n \ge 1\), let \({\mathbf {N}}_n\) be a deterministic \(dn \times dn\) matrix whose entries are bounded by \(n^{\alpha }\) for some \(\alpha > 0\). Then, for every \(B > 0\), there exists \(A>0\) (depending only on \(d, B, \alpha \)) such that
$$\begin{aligned} {\mathbb {P}}\left( \sigma _{dn}({\mathbf {X}}_n + {\mathbf {N}}_n) \le n^{-A} \right) \le n^{-B}. \end{aligned}$$
2.5 Overview
The proof of Theorem 2.5 requires studying an inverse Littlewood–Offord problem for random multilinear forms. To this end, we introduce the Littlewood–Offord problem and random multilinear forms in Sect. 3. Sections 4, 5, and 6 contain the proof of Theorem 2.5. Finally, we prove Theorem 2.4 in Sect. 7. A number of auxiliary results are contained in the appendix.
2.6 Notation
We use asymptotic notation (such as \(O,o,\Omega , \asymp \)) under the assumption that \(n \rightarrow \infty \). We use \(X \ll Y, Y \gg X, Y=\Omega (X)\), or \(X = O(Y)\) to denote the bound \(X \le CY\) for all sufficiently large \(n\) and for some constant \(C\). Notations such as \(X \ll _k Y\) and \(X=O_k(Y)\) mean that the hidden constant \(C\) depends on another constant \(k\). \(X=o(Y)\) or \(Y=\omega (X)\) means that \(X/Y \rightarrow 0\) as \(n \rightarrow \infty \).
We let \(\Vert {\mathbf {M}}\Vert _2\) denote the Hilbert–Schmidt norm of \({\mathbf {M}}\) [defined in (1.1)], and let \(\Vert {\mathbf {M}}\Vert \) denote the spectral norm of \({\mathbf {M}}\). For a vector \({\mathbf {v}}\), we let \(\Vert {\mathbf {v}}\Vert = \Vert {\mathbf {v}}\Vert _2\) denote the Euclidean norm of \({\mathbf {v}}\).
We let \({\mathbf {I}}_n\) denote the \(n \times n\) identity matrix. Often we will just write \({\mathbf {I}}\) for the identity matrix when the size can be deduced from the context. Similarly, we let \({\mathbf 0}\) denote the zero matrix.
For an event \(E\), we let \(\mathbf {1}_{E}\) denote the indicator function of the event \(E\). We write a.s., a.a., and a.e. for almost surely, Lebesgue almost all, and Lebesgue almost everywhere respectively. We use \(\sqrt{-1}\) to denote the imaginary unit and reserve \(i\) as an index.
3 The Littlewood–Offord problem and random multilinear forms
In this section, we introduce the Littlewood–Offord problem and some anti-concentration results for random multilinear forms, which will be used to prove Theorem 2.5.
3.1 The Littlewood–Offord problem
Let \(\xi \) be a real random variable with mean zero and unit variance. A large portion of classical probability theory is devoted to studying random sums \(S_\xi (A) := \sum _{i=1}^n a_i x_i\), where \(A =\{a_1,\ldots , a_n\}\) is a multiset of complex vectors in \({\mathbb {C}}^d\) and \(x_1,\ldots ,x_n\) are iid copies of \(\xi \). The Littlewood–Offord problem is to estimate the small ball probability
$$\begin{aligned} \rho _{\beta ,\xi }(A) := \sup _{x \in {\mathbb {C}}^d} {\mathbb {P}}\left( \Vert S_\xi (A) - x \Vert \le \beta \right) . \end{aligned}$$
In particular, if \(\rho _{\beta ,\xi }(A)\) is small, then the random sum \(S_\xi (A)\) is well spread. Conversely, if \(\rho _{\beta ,\xi }(A)\) is large, then the random sum concentrates near a point.
A classical result of Littlewood and Offord [16], which was strengthened by Erdős [8], gives an estimate for the small ball probability when \(\xi \) is a Bernoulli random variable (takes values \(\pm 1\) each with probability \(1/2\)) and \(d=1\).
Theorem 3.1
(Erdős [8]) Let \(\xi \) be a Bernoulli random variable. If the complex numbers \(a_i\) satisfy \(|a_i| \ge 1\) for all \(i\), then
$$\begin{aligned} \rho _{1,\xi }(A) \le \frac{\binom{n}{\lfloor n/2 \rfloor }}{2^n} = O\left( n^{-1/2} \right) . \end{aligned}$$
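In the extreme case \(a_1 = \cdots = a_n = 1\), the relevant concentration probability can be computed exactly (illustration only; the parameters are ours):

```python
from math import comb

# For a_i = 1 and Bernoulli signs, the most likely value of the sum
# S = x_1 + ... + x_n is 0 (n even), attained with probability
# binom(n, n/2) / 2^n = O(n^{-1/2}).
n = 100
rho = max(comb(n, k) for k in range(n + 1)) / 2**n  # sup_x P(S = x)
print(rho)                                          # about 0.0796
assert rho == comb(n, n // 2) / 2**n
assert rho <= n ** -0.5                             # consistent with O(n^{-1/2})
```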
The reader is invited to consult [23] and references therein for further extensions of this result. Motivated by inverse theorems from additive combinatorics, Tao and Vu [29] considered the following phenomenon:
If \(\rho _{\beta ,\xi }(A)\) is large, then most of the elements of \(A\) are additively correlated.
In order to introduce the precise result, we recall the notion of a generalized arithmetic progression (GAP). A set \(Q\subset {\mathbb {C}}^d\) is a GAP of rank \(r\) if it can be expressed in the form
$$\begin{aligned} Q = \left\{ g_0 + k_1 g_1 + \cdots + k_r g_r : k_i \in {\mathbb {Z}}, \ -K_i \le k_i \le K_i' \ \text {for all } 1 \le i \le r \right\} \end{aligned}$$
for some \(g_0,\ldots ,g_r\in {\mathbb {C}}^d\), and some integers \(K_1,\ldots ,K_r,K'_1,\ldots ,K'_r\).
The vectors \(g_i\) are the generators of \(Q\), the numbers \(K_i'\) and \(K_i\) are the dimensions of \(Q\), and \({{\mathrm{Vol}}}(Q) := |B|\) is the volume of \(Q\), where \(B := \{ (k_1,\ldots ,k_r) \in {\mathbb {Z}}^r : -K_i \le k_i \le K_i' \}\) is the corresponding box of integer tuples. We say that \(Q\) is proper if \(|Q| = {{\mathrm{Vol}}}(Q)\). If \(g_0=0\) and \(-K_i=K_i'\) for all \(i\ge 1\), we say that \(Q\) is symmetric.
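As a concrete toy example (generators and dimensions chosen by us), one can verify properness of a small symmetric GAP in \({\mathbb {C}}\) by brute force:

```python
# Q = { k1*g1 + k2*g2 : |k1| <= K1, |k2| <= K2 }, a symmetric GAP of rank 2.
g1, g2 = 1, 10        # generators (here real, viewed inside C)
K1, K2 = 4, 3         # dimensions
Q = {k1 * g1 + k2 * g2
     for k1 in range(-K1, K1 + 1)
     for k2 in range(-K2, K2 + 1)}

vol = (2 * K1 + 1) * (2 * K2 + 1)   # Vol(Q): number of coefficient tuples
print(len(Q), vol)
# Proper: distinct coefficient tuples give distinct elements, so |Q| = Vol(Q).
assert len(Q) == vol == 63
```

Replacing \(g_2 = 10\) by \(g_2 = 2\) makes distinct tuples collide, so \(|Q| < {{\mathrm{Vol}}}(Q)\) and the GAP is no longer proper.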
Consider a proper symmetric GAP \(Q= \{\sum _{i=1}^r k_ig_i : -K_i \le k_i \le K_i\}\) of rank \(r=O(1)\) and size \(N=n^{O(1)}\) in \({\mathbb {C}}\). Assume that \(\xi \) has Bernoulli distribution and for each \(a_i\) there exists \(q_i\in Q\) such that \(|a_i-q_i|\le \delta \). Then, because the random sum \(\sum _i q_ix_i\) takes values in the GAP \(nQ:=\{\sum _{i=1}^r k_ig_i : -nK_i \le k_i \le nK_i\}\), a GAP of size \(|nQ| \le n^r N=n^{O(1)}\), the pigeon-hole principle implies that \(\sum _i q_ix_i\) takes some value in \(nQ\) with probability at least \(|nQ|^{-1} \ge n^{-O(1)}\). Thus we have \(\rho _{n\delta ,\xi }(A) \ge n^{-O(1)}\).
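The pigeonhole step can be written out explicitly; the following is a sketch in the notation above:

```latex
% The sum with the approximating values q_i \in Q always lands in the dilate
% nQ, which has at most |nQ| \le n^r N = n^{O(1)} elements, so some single
% value is attained with probability at least |nQ|^{-1}:
\begin{aligned}
1 = {\mathbb {P}}\Big( \sum _{i=1}^n q_i x_i \in nQ \Big)
  \le |nQ| \cdot \sup _{q \in nQ} {\mathbb {P}}\Big( \sum _{i=1}^n q_i x_i = q \Big),
\quad \text{hence} \quad
\sup _{q \in nQ} {\mathbb {P}}\Big( \sum _{i=1}^n q_i x_i = q \Big) \ge |nQ|^{-1} \ge n^{-O(1)} .
\end{aligned}
```

Since each \(|a_i - q_i| \le \delta \), replacing \(q_i\) by \(a_i\) shifts the sum by at most \(n\delta \), which yields the claimed lower bound on \(\rho _{n\delta ,\xi }(A)\).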
This example shows that if \(A\) is close to a GAP of rank \(O(1)\) and size \(n^{O(1)}\), then \(A\) has large small ball probability. It was shown by Tao and Vu in [28, 29, 31, 32] that this is essentially the only example with large small ball probability. We state here an explicit version from [21], which will be used later on.
We say that a vector \(a\) is \(\delta \)-close to a set \(Q\) if there exists \(q\in Q\) such that \(\Vert a-q\Vert \le \delta \).
Theorem 3.2
(Inverse Littlewood–Offord theorem for linear forms [21]) Let \(0 <\varepsilon < 1\) and \(B>0\). Let \( \beta >0\) be a parameter that may depend on \(n\). Suppose that \(\sum _i \Vert a_i\Vert ^2 =1\) and
$$\begin{aligned} \rho := \rho _{\beta ,\xi }(A) \ge n^{-B}, \end{aligned}$$
where \(x_1,\ldots ,x_n\) are iid copies of a random variable \(\xi \) having bounded \((2+\eta )\)-moment. Then, for any number \(n'\) between \(n^\varepsilon \) and \(n\), there exists a proper symmetric GAP \(Q=\{\sum _{i=1}^r k_ig_i : |k_i|\le K_i \}\) such that
- At least \(n-n'\) of the elements \(a_i\) are \(\beta \)-close to \(Q\).
- \(Q\) has small rank, \(r=O_{B,\varepsilon }(1)\), and small size
$$\begin{aligned} |Q| \le \max \left\{ O_{B,\varepsilon }\left( \frac{\rho ^{-1}}{\sqrt{n'}}\right) ,1\right\} . \end{aligned}$$
- There is a non-zero integer \(p=O_{B,\varepsilon }(\sqrt{n'})\) such that all generators \(g_i\) of \(Q\) have the form \(g_i=(g_{i1},\dots ,g_{id})\), where \(g_{ij}=\beta \frac{p_{ij}}{p}\) with \(p_{ij} \in {\mathbb {Z}}\) and \(|p_{ij}|=O_{B,\varepsilon }(\beta ^{-1} \sqrt{n'})\).
3.2 Random multilinear forms
One can view the sum \(S_\xi (A) =a_1 x_1+\dots +a_n x_n\) as a linear function of the random variables \(x_1,\dots , x_n\). It is natural to study general polynomials of higher degree.
Let \(D\) be a fixed positive integer. Let \(x_{1i_1},x_{2i_2},\dots ,x_{Di_D}\), \(1\le i_1,\dots ,i_D \le n\), be iid copies of a random variable \(\xi \), and let \(A=(a_{i_1i_2 \dots i_D})_{1\le i_1,\dots ,i_D\le n}\) be an \(n^D\)-array of complex numbers. We define the \(D\)-multilinear concentration probability of \(A\) by
$$\begin{aligned} \rho _{\beta ,\xi }(A) := \sup _{L_{D-1}} {\mathbb {P}}\left( \Big | \sum _{i_1,\dots ,i_D} a_{i_1 i_2 \dots i_D}\, x_{1i_1} x_{2i_2} \cdots x_{Di_D} - L_{D-1}({\mathbf {x}}_1,\dots ,{\mathbf {x}}_D) \Big | \le \beta \right) , \end{aligned}$$
where \({\mathbf {x}}_i=(x_{i1},\dots ,x_{in})\) and \(L_{D-1}({\mathbf {x}}_1,\dots , {\mathbf {x}}_D)\) is any \((D-1)\)-multilinear form of \(({\mathbf {x}}_1,\dots ,{\mathbf {x}}_D)\).
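To see why structured arrays force \(\rho \) to be large, here is a Monte Carlo sketch for \(D=2\) (our own toy example, taking the lower-order form \(L_1 \equiv 0\) and the all-ones array \(a_{ij}=1\), so the bilinear form factors as \((\sum _i x_i)(\sum _j y_j)\)):

```python
import numpy as np

# Estimate P(sum_{ij} x_i y_j = 0) for Bernoulli x, y: the form factors, so
# it vanishes whenever either factor does, each with probability ~ n^{-1/2}
# -- polynomially large, i.e. rho is large for this structured array.
rng = np.random.default_rng(4)
n, trials = 50, 4000
hits = sum(
    1
    for _ in range(trials)
    if rng.choice([-1, 1], size=n).sum() * rng.choice([-1, 1], size=n).sum() == 0
)
prob = hits / trials
print(prob)   # theoretical value ~ 0.21 for n = 50
```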
We would like to characterize \(A\) with large \(\rho _{\beta ,\xi }(A)\). The following examples serve as good candidates.
Example 3.3
In what follows \(\xi \) has Bernoulli distribution and for each \(a_{i_1i_2\ldots i_D}\) there exists \(q_{i_1i_2\dots i_D}\) such that \(|a_{i_1i_2\dots i_D}-q_{i_1i_2\dots i_D}|\le \delta .\)
- (1) Let \(Q\) be a proper symmetric GAP of rank \(r=O(1)\) and size \(n^{O(1)}\). Assume that the approximated values \(q_{i_1i_2\dots i_D}\) belong to \(Q\). Then, the pigeon-hole principle implies that \(\sum _{i_1,\dots ,i_D}q_{i_1i_2\dots i_D}x_{1i_1}\dots x_{Di_D}\) takes some value in \(n^2Q\) with probability \(n^{-O(1)}\). Passing back to \(a_{i_1i_2\dots i_D}\), we obtain \(\rho _{n^2\delta ,\xi }(A) =n^{-O(1)}\).
- (2) Assume that \(q_{i_1i_2\dots i_D}\) can be written as \(q_{i_1i_2\dots i_D}=k_{i_1}b_{\bar{i}_1i_2\dots i_D}+l_{i_2}b_{i_1\bar{i}_2\dots i_D}+\dots + m_{i_D} b_{i_1i_2\dots \bar{i}_D}\), where \(b_{\bar{i}_1i_2\dots i_D},\ldots ,b_{i_1i_2\dots \bar{i}_D}\) are arbitrary arrays in \({\mathbb {R}}^d\) not depending on the indices \(i_1,\dots ,i_D\) respectively, and \(k_{i_1},l_{i_2},\dots , m_{i_D}\) are integers bounded by \(n^{O(1)}\) such that
$$\begin{aligned} {\mathbb {P}}_{{\mathbf {x}}_1}\left( \sum _{i_1} k_{i_1}x_{1i_1}= 0\right)&=n^{-O(1)},\dots , \\ {\mathbb {P}}_{{\mathbf {x}}_D}\left( \sum _{i_D} m_{i_D}x_{Di_D}= 0\right)&=n^{-O(1)}. \end{aligned}$$
Then, as \(\sum _{i_1,i_2,\dots ,i_D}q_{i_1i_2\dots i_D}x_{1i_1}x_{2i_2}\dots x_{Di_D}\) factors, we have
$$\begin{aligned} {\mathbb {P}}\left( \sum _{i_1,i_2,\dots ,i_D}q_{i_1i_2\dots i_D}x_{1i_1}x_{2i_2}\dots x_{Di_D} =\mathbf {0}\right) =n^{-O(1)}. \end{aligned}$$
Passing back to \(a_{i_1i_2\dots i_D}\), we hence obtain \(\rho _{n^2\delta ,\xi }(A) =n^{-O(1)}\).
- (3) Assume that \(q_{i_1i_2\dots i_D}=q_{i_1i_2\dots i_D}' +q_{i_1i_2\dots i_D}''\), where \(q_{i_1i_2\dots i_D}'\in Q\), a proper symmetric GAP of rank \(O(1)\) and size \(n^{O(1)}\), and \(q_{i_1i_2\dots i_D}''\) is a sum of a few forms from (2), arranged so that the linear factors vanish with probability \(n^{-O(1)}\). As such, we have
$$\begin{aligned} \sup _{q\in n^2Q}{\mathbb {P}}_{{\mathbf {x}}_1,\dots ,{\mathbf {x}}_D}\left( \sum _{i_1,\dots ,i_D}q_{i_1i_2\dots i_D}x_{1i_1}x_{2i_2}\dots x_{Di_D} =q\right) =n^{-O(1)}. \end{aligned}$$
Hence we also have \(\rho _{n^2\delta ,\xi }(A) =n^{-O(1)}\) in this case.
The above examples demonstrate that if the \(a_{i_1i_2\dots i_D}\) can be decomposed into additive and algebraic structural parts, then \(\rho _{\beta ,\xi }(A)\) is large. We conjecture that these are essentially the only cases with large concentration probability.
Conjecture 3.4
Assume that \(\rho _{\beta ,\xi }(A) \ge n^{-B}\) for some generic \(\xi \) and small \(\beta \). Then most of the elements of \(A\) can be \(\beta \)-approximated by a set of \(q_{i_1i_2\dots i_D}\) as in (3) of Example 3.3.
Due to its nature, we believe that any justification of Conjecture 3.4 would be highly technical. In this note we prove a weak version of it as follows.
Theorem 3.5
(Weak inverse-type theorem for multilinear forms) Let \(0 <\varepsilon < 1\) and \(C>0\). Let \( \beta >0\) be a parameter that may depend on \(n\). Assume that
$$\begin{aligned} \rho := \rho _{\beta ,\xi }(A) \ge n^{-C}, \end{aligned}$$
where \({\mathbf {x}}_1=(x_{11},\dots ,x_{1n}),\dots ,{\mathbf {x}}_D=(x_{D1},\dots ,x_{Dn})\), and \(x_{1i_1},\dots ,x_{Di_D}\) are iid copies of a random variable \(\xi \) having bounded \((2+\eta )\)-moment. Then there exist index sets \(I_1,I_1^0\) with \(|I_1|=n-n^\varepsilon \) and \(|I_1^0|=O_{C,\varepsilon }(1)\) such that for any \(i_1\in I_1\), there exist index sets \(I_2,I_2^0\) depending on \(i_1\) with \(|I_2|=n-n^\varepsilon \) and \(|I_2^0|=O_{C,\varepsilon }(1)\), and so on, until there exist index sets \(I_{D-1},I_{D-1}^0\) depending on \(i_1,\dots ,i_{D-2}\) with \(|I_{D-1}|=n-n^\varepsilon \) and \(|I_{D-1}^0|=O_{C,\varepsilon }(1)\) such that the following holds: for any \(i_{D-1}\in I_{D-1}\), there exist integers \(k_{j_1 \dots j_{D-1}}\), where each index \(j_k\) with \(1\le k\le D-1\) either takes the value \(i_k\) or belongs to the thin set \(I_k^0\), such that \(k_{j_1 \dots j_{D-1}}=n^{O_{C,d,\varepsilon }(1)}\) and \(k_{i_1 \dots i_{D-1}}\ne 0\), as well as
Notice that while in Example 3.3 most of the \((D-1)\)-dimensional subarrays of \(A\) have structure, Theorem 3.5 only asserts that most of the one-dimensional arrays \(a_{j_1 \dots j_{D-1} i_D}\), \(1\le i_D\le n\), with fixed \(j_1,\dots ,j_{D-1}\), have structure.
For the rest of this section we give a proof of Theorem 3.5. Our argument heavily relies on the following simple fact about GAPs of small rank.
Fact 3.6
Assume that \(q_1,\dots ,q_{r+1}\) are elements of a GAP of rank \(r\) and of cardinality \(n^C\). Then there exist integer coefficients \(\alpha _1,\dots ,\alpha _r\) with \(|\alpha _i|\le n^{rC}\) such that
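One route to Fact 3.6 (our own sketch, phrased as a relation among all \(r+1\) elements) goes through Cramer's rule:

```latex
% Write each q_j = \sum_{i=1}^r c_{ji} g_i with integer coefficients
% |c_{ji}| \le n^C, since the GAP has rank r and cardinality n^C. Any r+1
% vectors c_1, ..., c_{r+1} in Z^r are linearly dependent, and a dependency
% with integer entries can be built from the r x r minors of the coefficient
% matrix (Cramer's rule), each minor being at most r! (n^C)^r in size:
\begin{aligned}
\sum _{j=1}^{r+1} \alpha _j\, c_j = 0
\quad \Longrightarrow \quad
\sum _{j=1}^{r+1} \alpha _j\, q_j
 = \sum _{i=1}^r \Big ( \sum _{j=1}^{r+1} \alpha _j c_{ji} \Big ) g_i = 0,
\qquad |\alpha _j| \le r!\, (n^C)^r .
\end{aligned}
```

For \(r = O(1)\), the bound \(r!\,(n^C)^r\) matches the bound \(n^{rC}\) in the statement up to the implied constant.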
3.3 Proof of Theorem 3.5
Without loss of generality, we assume that \(\xi \) has a discrete distribution; the continuous case then follows by a standard limiting argument. We begin by applying Theorem 3.2.
Lemma 3.7
Let \(\varepsilon <1\) and \(C\) be positive constants. Assume that \(\rho _{\beta ,\xi }(A)=\rho \ge n^{-C}\). Then the following holds with probability at least \(\frac{3\rho }{4}\) with respect to \({\mathbf {x}}_2,\dots , {\mathbf {x}}_D\): there exist a proper symmetric GAP \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) of rank \(O_{C,\varepsilon }(1)\) and size \(O_{C,\varepsilon }(1/\rho )\) and a set \(I_{{\mathbf {x}}_2,\dots , {\mathbf {x}}_D}\) of \(n-n^\varepsilon \) indices such that for each \(i\in I_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\), there exists \(q_i\in Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) so that
Proof
(of Lemma 3.7) For brevity, we write
where
We call a vector tuple \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) good if
We call the tuple \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) bad otherwise. Let \(G\) be the collection of good tuples.
First, we estimate the probability \(p\) of randomly chosen vectors \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) being bad by an averaging method.
Thus, the probability of randomly chosen vectors \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) being good is at least
Next, we consider good vectors \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G\). By definition, we have
Observe that this expresses strong concentration of a linear form in the variables \(x_{1i}\). A direct application of Theorem 3.2 to the sequence \(B_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\), \(i=1,\dots ,n\), yields the desired result. \(\square \)
By a useful property of GAP containment (see for instance [30, Section 8] and [20, Theorem 6.1]), we may assume that the \(q_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) span \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\). From now on we fix such a \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) for each \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\). Recall that \(G\) denotes the collection of good tuples \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\). Thus,
Now we state our crucial lemma for the proof of Theorem 3.5.
Lemma 3.8
There exist an index set \(I\) of size at least \(n-2n^\varepsilon \), an index set \(I_0\) of size \(O_{C,\varepsilon }(1)\), and an integer \(k\ne 0\) with \(|k|\le n^{O_{C,\varepsilon }(1)}\) such that for any index \(i\) from \(I\), there are numbers \(k_{ii_0} \in {\mathbb {Z}}, i_0\in I_0\), all bounded by \(n^{O_{C,\varepsilon }(1)}\), such that
Proof
(of Lemma 3.8) For each \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G\), we choose \(s\) indices \(i_{(1,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)},\dots ,i_{{(s,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}}\) from \(I_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) such that \(q_{i_{{(j,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}}}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) span \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\), where \(s\) is the rank of \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\). We note that \(s=O_{C,\varepsilon }(1)\) for all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G\).
Consider the tuples \((i_{{(1,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}}, \dots ,i_{{(s,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}})\) for all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G\). Because there are \(\sum _{s} O_{C,\varepsilon ,\mu }(n^s) = n^{O_{C,\varepsilon ,\mu }(1)}\) possible values these tuples can take, there exists a tuple, say \((1,\dots ,r)\) (by rearranging the rows of \(A\) if needed), such that \((i_{{(1,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}},\dots , i_{{(s,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)}})=(1,\dots ,r)\) for all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'\), where \(G'\) is a subset of \(G\) satisfying
For each \(1\le i\le r\), we express \(q_i({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) in terms of the generators of \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) for each \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'\),
where \(c_{i1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\dots c_{ir}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) are integers bounded by \(n^{O_{C,\varepsilon }(1)}\), and \(g_{i}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) are the generators of \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\).
We show that there are many \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) that correspond to the same coefficients \(c_{i_1i_2}\).
Claim 3.9
There exists a (“dense”) subset \(G''\subset G'\) such that the following holds:
-
\({\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G'')\ge {\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G')/n^{O_{C,\varepsilon }(1)} \ge \rho /n^{O_{C,\varepsilon }(1)};\)
-
(common tuples) there exist \(r\) tuples \((c_{11},\dots ,c_{1r}),\dots , (c_{r1},\dots c_{rr})\), whose components are integers bounded by \(n^{O_{C,\varepsilon ,\mu }(1)}\), such that the following hold for all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G''\):
-
(1)
\(q_{i}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D) = c_{i1}g_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)+\dots + c_{ir}g_{r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\), for \(i=1,\dots ,r\);
-
(2)
The vectors \((c_{11},\dots ,c_{1r}),\dots , (c_{r1},\dots c_{rr})\) span \({\mathbb {Z}}^{{{\mathrm{rank}}}(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D})}\).
Proof
(of Claim 3.9) Consider the collection \(\mathcal {C}\) of the coefficient-tuples
Because the number of possible values these tuples can take is at most \((n^{O_{C,\varepsilon }(1)})^{r^2} =n^{O_{C,\varepsilon }(1)}\), by the pigeon-hole principle there exists a coefficient-tuple, say \(((c_{11},\dots ,c_{1r}), \dots , (c_{r1},\dots ,c_{rr}))\in \mathcal {C}\), such that
for all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) from a subset \(G''\) of \(G'\) which satisfies
\(\square \)
Now we focus on the elements of \(G''\). Because \(|I_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}|\ge n-n^\varepsilon \) for each \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G''\), we obtain the following.
Claim 3.10
There is a set \(I\) of size \(n-3n^\varepsilon \) such that \(I \cap \{1,\dots ,r\} =\emptyset \) and for each \(i\in I\) we have
Proof
(of Claim 3.10) The result follows easily by an elementary averaging argument. \(\square \)
Lemma 3.8: proof conclusion Now we fix an arbitrary index \(i\) from \(I\). We concentrate on those \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G''\) where the index \(i\) belongs to \(I_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\). Because \(q_{i}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D) \in Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\), we can write
where \(c_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\dots ,c_r({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\) are integers bounded by \(n^{O_{C,\varepsilon }(1)}\).
For short, we denote by \(\mathbf {v}_{i,{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) the vector \((c_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\dots ,c_{r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D))\); we also use the shorthand \({\mathbf {v}}_j\) for the vectors \((c_{j1},\dots ,c_{jr})\) obtained from Claim 3.9.
Because \(Q_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}\) is spanned by \(q_{1}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D),\dots , q_{r}({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\), we must have \(k:=\det (\mathbf {v}_1,\dots \mathbf {v}_r)\ne 0\) and that
Furthermore, because each coefficient of the identity above is bounded by \(n^{O_{C,\varepsilon ,\mu }(1)}\), there exists a subset \(G_{i}''\) of \(G''\) such that all \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\in G_{i}''\) correspond to the same identity, and
In other words, there exist integers \(k_1,\dots ,k_r\), all bounded by \(n^{O_{C,\varepsilon }(1)}\), such that
for all \(({\mathbf {x}}_2,\dots , {\mathbf {x}}_D)\in G_{i}''\).
Note that \(k\) is independent of the choice of \(i\) and \(({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\). By passing from \(q_i\) to \(B_i\) by approximation, we thus complete the proof of Lemma 3.8. \(\square \)
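The determinant step above — passing from the spanning property to an identity \(kq_i=k_1q_1+\dots +k_rq_r\) with \(k=\det (\mathbf {v}_1,\dots ,\mathbf {v}_r)\ne 0\) — is Cramer's rule over the integers: since \({{\mathrm{adj}}}(V)\,V=\det (V)\,{\mathbf {I}}\) with integer entries, \(\det (V)\,\mathbf {v}=(\mathbf {v}\,{{\mathrm{adj}}}(V))\,V\) holds with integer coefficients for any integer vector \(\mathbf {v}\). A minimal sketch (the \(3\times 3\) matrix and the vector are illustrative):

```python
def det(M):
    # Laplace expansion along the first row (fine for small integer matrices)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def adjugate(M):
    # adj(M)[i][j] = (-1)^(i+j) * det of M with row j and column i removed
    n = len(M)
    return [[(-1) ** (i + j) * det([row[:i] + row[i + 1:] for r, row in enumerate(M) if r != j])
             for j in range(n)] for i in range(n)]

V = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]   # rows v_1, v_2, v_3 of an integer matrix
k = det(V)                               # k = det(v_1, ..., v_r) != 0
v = [1, 2, 3]                            # an arbitrary integer vector (playing the role of q_i)
A = adjugate(V)
m = [sum(v[t] * A[t][j] for t in range(3)) for j in range(3)]  # m = v . adj(V), integer
# Cramer's rule over Z: k * v = sum_j m_j v_j with integer coefficients m_j
assert all(k * v[s] == sum(m[j] * V[j][s] for j in range(3)) for s in range(3))
```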
We are now ready to complete the proof of our inverse result.
Theorem 3.5: proof conclusion From Lemma 3.8, for any fixed \(i\in I\), we consider the following \((D-1)\)-multilinear form
By the conclusion of Lemma 3.8, we have \(\sup _a{\mathbb {P}}_{{\mathbf {x}}_2,\dots ,{\mathbf {x}}_D}(B_i'({\mathbf {x}}_2,\dots ,{\mathbf {x}}_D)\!\in \! B(a,\beta ))\ge \rho /n^{O_{C,\varepsilon }(1)}\). Thus Lemma 3.8 is applicable again for this new \((D-1)\)-multilinear form. By iterating the process \(D-1\) times, we obtain the conclusion of Theorem 3.5.
4 Singularity of block matrices: the approach to prove Theorem 2.5
As the singular values do not change under row and column permutations, for the sake of convenience, we will restrict our analysis to matrices of the form \({\mathbf {M}}_n= {\mathbf {X}}_n+{\mathbf {N}}_n\), where \({\mathbf {N}}_n\) is any deterministic matrix of polynomially bounded norm and \({\mathbf {X}}_n\) is a \(dn \times dn\) matrix whose \(ij\)th block takes the form
where \((x_{11;ij},\dots ,x_{dd;ij}), 1\le i,j\le n\), are iid copies of \((\xi _{11},\dots ,\xi _{dd})\) which satisfy the following conditions from Definition 2.3 and Theorem 2.5:
We now restate Theorem 2.5 as follows.
Theorem 4.1
For any \(B>0\), there exists \(A>0\) depending on \(B\) and \(\alpha \) such that
In the sequel we sketch the proof of Theorem 4.1. In general, our approach will resemble that of [19, 20, 26, 31, 35] where the main ingredient is an inverse-type argument. However, as our matrix now consists of large blocks of correlated entries, we need to elaborate more on the algebraic and technical side. For the sake of simplicity, we will prove our result under the following condition.
Assumption 4.2
With probability one, \(|x_{st;ij}| \le n^{B+1}\) for all \(1\le s,t\le d\) and all \(i,j\).
In what follows we assume that \({\mathbf {M}}_{n}\) has full rank. This is the main case to consider as most random matrices are non-singular with very high probability. The case that \({\mathbf {M}}_n\) is singular can be deduced by a standard argument (see for instance [22, Appendix A]).
Assume that \(\sigma _{nd}({\mathbf {M}}_n)\le n^{-A}\). Thus \({\mathbf {M}}_n{\mathbf {x}}={\mathbf {y}}\) for some \(\Vert {\mathbf {x}}\Vert =1\) and \(\Vert {\mathbf {y}}\Vert \le n^{-A}\). Let \({\mathbf {C}}=(c_{i,j}({\mathbf {M}}_n))\), \(1\le i,j\le dn\), be the matrix of cofactors of \({\mathbf {M}}_n\). By definition, \({\mathbf {C}}{\mathbf {y}}= \det ({\mathbf {M}}_n) \cdot {\mathbf {x}}\), and thus, as \(\Vert {\mathbf {x}}\Vert =1\), we have \(\Vert {\mathbf {C}}{\mathbf {y}}\Vert = |\det ({\mathbf {M}}_n)|\).
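The identity \({\mathbf {C}}{\mathbf {y}}=\det ({\mathbf {M}}_n)\cdot {\mathbf {x}}\) is the classical adjugate relation \({{\mathrm{adj}}}({\mathbf {M}}){\mathbf {M}}=\det ({\mathbf {M}})\,{\mathbf {I}}\). A quick numerical sanity check (the size and the random entries are illustrative; \({\mathbf {C}}\) is realized here as the adjugate):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))

# Adjugate via cofactors: adj[i, j] = (-1)^(i+j) * det(M with row j, column i removed)
adj = np.empty((n, n))
for i in range(n):
    for j in range(n):
        minor = np.delete(np.delete(M, j, axis=0), i, axis=1)
        adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

x = rng.standard_normal(n)
x /= np.linalg.norm(x)          # ||x|| = 1 as in the text
y = M @ x                       # M x = y
# C y = det(M) x, hence ||C y|| = |det(M)| since ||x|| = 1
assert np.allclose(adj @ y, np.linalg.det(M) * x)
```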
By paying a factor of \(dn\) in probability, we may assume without loss of generality that the absolute value of the first component of \({\mathbf {C}}{\mathbf {y}}\) is at least \(|\det ({\mathbf {M}}_n)|/(dn)^{1/2}\).
Claim 4.3
Let \({\mathbf {M}}_{n-1}\) be the matrix obtained from \({\mathbf {M}}_n\) by removing its first \(d\) rows, and let \( c_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1}), 1\le i_1,\dots ,i_d\le nd\), be the signed determinant of the minor obtained from \({\mathbf {M}}_{n-1}\) by removing its \(i_1,\dots ,i_d\)th columns. We have
Proof
(of Claim 4.3) As \(\Vert {\mathbf {y}}\Vert \le n^{-A}\), it follows from (4.2) that
Next, each cofactor \(c_{1i_1}({\mathbf {M}}_n)\), being the signed determinant of a \((dn-1)\times (dn-1)\) submatrix of \({\mathbf {M}}_n\), can be expressed as
The claim then follows by applying the Cauchy–Schwarz inequality together with Assumption 4.2 and the upper bound \(n^\alpha \) on the entries of \({\mathbf {N}}_n\). \(\square \)
By Claim 4.3, in order to prove Theorem 4.1 it suffices to justify the following result.
Theorem 4.4
For any \(B>0\), there exists \(A>0\) such that
Next, express \(\det ({\mathbf {M}}_n)\) as a \(d\)-multilinear form of its first \(d\) rows
With \(c:=(\sum _{1\le i_1,i_2,\dots ,i_d \le dn} |c_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1})|^2)^{1/2}\) and \(a_{i_1\dots i_d}:=c_{i_1\dots i_d}({\mathbf {M}}_{n-1})/c\),
Heuristically, conditioning on \({\mathbf {M}}_{n-1}\), the \(d\)-multilinear form on the RHS of (4.5) is comparable to 1 in absolute value with probability extremely close to one. Thus the assumption \({\mathbb {P}}(|\det ({\mathbf {M}}_n)|/c\le n^{-A})\ge n^{-B}\) of Theorem 4.4, with an appropriately large value of \(A\), must reflect a strong cancellation in the multilinear form. Based on this observation, our rough approach consists of two main steps.
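For \(d=1\) the expansion (4.5) is just the cofactor (Laplace) expansion of the determinant along its first row: a linear form in that row whose coefficients are the normalized cofactors. A numerical sketch (random matrix purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))

# Laplace expansion along the first row (the d = 1 case of (4.5)):
# det(M) = sum_j (-1)^(1+j) m_{1j} det(M^{(1j)}), a linear form in the first row
cof = np.array([(-1) ** j * np.linalg.det(np.delete(np.delete(M, 0, axis=0), j, axis=1))
                for j in range(n)])
assert np.isclose(M[0] @ cof, np.linalg.det(M))

c = np.linalg.norm(cof)     # the normalization c from the text
a = cof / c                 # normalized cofactors, so sum_j |a_j|^2 = 1
```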
-
Step 1 Assume that for an appropriately large value \(A>0\) we have
$$\begin{aligned} {\mathbb {P}}_{x_{11;11},\dots , x_{dd;1n}}\left( \left| \sum _{1\le i_1,i_2,\dots ,i_d \le dn} a_{i_1i_2\dots i_d}({\mathbf {M}}_{n-1}) m_{1i_1}\dots m_{di_d}\right| \!\le n^{-A}\big |{\mathbf {M}}_{n-1}\right) \!\ge n^{-B}. \end{aligned}$$Then the normalized cofactors \(a_{i_1\dots i_d}\) of \({\mathbf {M}}_{n-1}\) must satisfy a very special property.
-
Step 2 The probability, with respect to \({\mathbf {M}}_{n-1}\), that the \(a_{i_1\dots i_d}\) satisfy this special property is negligible.
Although the setting of Step 1 is identical to our inverse problem discussed in Sect. 3, the dependencies of the entries make the problem substantially harder. We will remove these dependencies using a series of decoupling tricks to arrive at a conclusion as useful as Theorem 3.5.
Theorem 4.5
(Step 1) Let \(0<\varepsilon <1\) be a given constant. Assume that
for some sufficiently large integer \(A\), where \(a_{i_{1}i_{2}\dots i_{d}}=c_{i_{1}i_{2}\dots i_{d}}/c\). Then there exist \(k=O(d)\) indices \(i_1<\dots <i_k\) and a complex vector \({\mathbf {u}}=(u_1,\dots ,u_{nd})\) satisfying the following properties.
-
(orthogonality) \(\Vert {\mathbf {u}}\Vert _2\asymp 1\) and \(|\langle {\mathbf {u}}_1,{\mathbf {r}}_i^{(1)}({\mathbf {M}}_{n-1})\rangle + \langle {\mathbf {u}}_2,{\mathbf {r}}_i^{(2)}({\mathbf {M}}_{n-1})\rangle | \le n^{-A/2+O_{B,\varepsilon }(1)}\) for \(n-O_{B,\varepsilon }(1)\) rows \({\mathbf {r}}_i\) of \({\mathbf {M}}_{n-1}\), where \({\mathbf {u}}_1\) and \({\mathbf {r}}_i^{(1)}\) are the subvectors corresponding to the components indexed by \(i_1,\dots ,i_k\) of \({\mathbf {u}}\) and \({\mathbf {r}}_i\) respectively, and \({\mathbf {u}}_2\) and \({\mathbf {r}}_i^{(2)}\) are the subvectors corresponding to the remaining components of \({\mathbf {u}}\) and \({\mathbf {r}}_i\) respectively;
-
(additive structure) there exists a GAP \(Q\) of rank \(O_{B,\varepsilon }(1)\) and size \(n^{O_{B,\varepsilon }(1)}\) that contains at least \(dn-2n^\varepsilon \) components \(u_i\);
-
(controlled form) all the components \(u_i\), and all the generators of the GAP are rational complex numbers of the form \(\frac{p}{q}+ \sqrt{-1} \frac{p'}{q'} \), where \(|p|,|q|,|p'|,|q'| \le n^{A/2+O_{B,\varepsilon }(1)}\).
In the second step, we show that the probability that \({\mathbf {M}}_{n-1}\) has the above properties is negligible.
Theorem 4.6
(Step 2) With respect to \({\mathbf {M}}_{n-1}\), the probability that there exists a vector \({\mathbf {u}}\) as in Theorem 4.5 is \(\exp (-\Omega (n))\).
5 Singularity of block matrices: proof of Theorem 4.5
Recall that in the inverse step, Theorem 4.5, we assumed a high concentration of a multilinear form on a small ball of radius \(n^{-A}\). As the entries in each block are dependent, we are not able to apply Theorem 3.5 yet. In what follows we present two main steps to remove these dependencies.
5.1 Dependency removal I: general linear forms
First, it will be useful to study the concentration of the linear form
where \((x_{11;i},\dots ,x_{dd;i})\) are iid copies of \((x_{11},\dots ,x_{dd})\) satisfying (4.1). Intuitively, as the covariance of \((x_{11},\dots ,x_{dd})\) is non-singular, the random variables \((x_{11;i},\dots ,x_{dd;i})\) are not totally dependent on each other. (See Appendix for a more precise statement.) This fact may suggest a way to apply Theorem 3.2 with respect to \((x_{11;1},\dots ,x_{11;n})\) while holding \(x_{12;1},\dots ,x_{dd;n}\) fixed and vice versa. In what follows \((x_{1;i},\dots ,x_{D;i})\) plays the role of \((x_{11;i},\dots ,x_{dd;i})\).
Theorem 5.1
(Inverse Littlewood–Offord theorem for mixing linear forms) Let \(0<\varepsilon <1, B>0\) be given, and \(D\) be a positive integer. Let \( \beta >0\) be an arbitrary real number that may depend on \(n\). Suppose that \(a_{1;i},\dots , a_{D;i}\in {\mathbb {C}}\) are such that \(\sum _{i=1}^n \sum _{1\le j\le D} |a_{j;i}|^2=1\) and
where \((x_{1;i},\dots ,x_{D;i}),1\le i\le n\) are iid copies of \((x_{1},\dots ,x_{D})\) from (4.1). Then there exist positive constants \(\alpha ,c_0,C_0\) depending only on the distribution of \((x_{1},\dots ,x_{D})\) and \(D\) tuples \((\eta _{k1},\eta _{k2},\dots ,\eta _{kD}),1\le k\le D\) of complex numbers such that
-
\(|\eta _{ij}|\) are bounded from below and above by \(c_0\) and \(C_0\) respectively,
-
The least singular value of the matrix \((\eta _{ij})\) is at least \(\alpha \),
-
for any number \(n'\) between \(n^\varepsilon \) and \(n\), there exists a proper symmetric GAP \(Q=\{\sum _{i=1}^r k_ig_i : k_i\in {\mathbb {Z}}, |k_i|\le L_i \} \subset {\mathbb {C}}\) whose parameters satisfy (i) and (ii) of Theorem 3.2, and for at least \(n-n'\) indices \(i\) the numbers \(\eta _{k1}a_{1;i}+\dots +\eta _{kD}a_{D;i}, 1\le k\le D\), are \(\beta \)-close to \(Q\).
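For concreteness, a proper symmetric GAP \(Q=\{\sum _{i=1}^r k_ig_i : k_i\in {\mathbb {Z}}, |k_i|\le L_i \}\) can be enumerated directly; a small sketch with illustrative generators and bounds (properness here means the \(\prod _i(2L_i+1)\) integer combinations give distinct elements):

```python
from itertools import product

def gap(generators, bounds):
    """Elements of the symmetric GAP Q = { sum_i k_i g_i : k_i in Z, |k_i| <= L_i }."""
    return {sum(k * g for k, g in zip(ks, generators))
            for ks in product(*(range(-L, L + 1) for L in bounds))}

Q = gap([1, 10], [2, 3])   # rank r = 2, generators g = (1, 10), bounds L = (2, 3)
assert 0 in Q and all(-q in Q for q in Q)     # symmetric
assert len(Q) == (2 * 2 + 1) * (2 * 3 + 1)    # proper: all 5 * 7 combinations distinct
```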
As Theorem 5.1 can be shown by using the method of [21], we skip its proof and refer the reader to Appendix for a proof of a somewhat more general result (Theorem 6.1 below). We now introduce a useful corollary.
Corollary 5.2
Assume the hypotheses of Theorem 5.1. Then there exist a proper symmetric GAP \(Q=\{\sum _{i=1}^r k_ig_i : k_i\in {\mathbb {Z}}, |k_i|\le L_i \} \subset {\mathbb {C}}\) whose parameters satisfy (i) and (ii) of Theorem 5.1 and an index set \(I\) of size at least \(n-n'\) such that, with \((\gamma _{ij})\) being the inverse matrix of \((\eta _{ij})\), the numbers \(a_{k;i},i\in I,1\le k\le D\), are \(O(\beta )\)-close to the GAP \(P=P_1+\dots +P_D\), where \(P_k=\gamma _{k1}\cdot Q + \gamma _{k2}\cdot Q+\dots +\gamma _{kD} \cdot Q\).
5.2 Dependency removal II: decoupling
We now work with the multilinear form appearing in Theorem 4.5. Our goal is to show the following.
Theorem 5.3
Let \(0 <\varepsilon < 1\) and \(B>0\). Let \( \beta >0\) be a parameter that may depend on \(n\). Assume that
Then there exist index sets \(I_1,I_1^0\) with \(|I_1|=dn-n^\varepsilon \) and \(|I_1^0|=O_{C,\varepsilon }(1)\) such that for any \(i_1\in I_1\), there exist index sets \(I_2,I_2^0\) depending on \(i_1\) with \(|I_2|=dn-n^\varepsilon \) and \(|I_2^0|=O_{C,\varepsilon }(1)\) such that ...there exist index sets \(I_{d-1},I_{d-1}^0\) depending on \(i_1,\dots ,i_{d-2}\) with \(|I_{d-1}|=n-n^\varepsilon \) and \(|I_{d-1}^0|=O_{C,\varepsilon }(1)\) such that the following holds: for any \(i_{d-1}\in I_{d-1}\), there exist integers \(k_{j_1 \dots j_{d-1}}\), where each index \(j_k\) with \(1\le k\le d-1\) either takes value \(i_k\) or belongs to the thin sets \(I_k^0\), such that \(k_{j_1 \dots j_{d-1}}=n^{O_{C,d,\varepsilon }(1)}\) and \(k_{i_1,\dots ,i_{d-1}}\ne 0\), as well as
Thus Theorem 5.3 asserts that as long as the entries in each block are not totally dependent, the conclusion of Theorem 3.5 still holds as if the matrix entries were mutually independent.
In what follows we introduce the main supporting lemmas to prove Theorem 5.3. By definition, we can rewrite this form as
where \(\det [{\mathbf {c}}_{i_1},\dots ,{\mathbf {c}}_{i_d}]\) is the determinant of the \(d\times d\) block generated by the \(i_1\)th, \(\dots \), \(i_d\)th columns of the matrix formed by the first \(d\) rows of \({\mathbf {M}}_n\).
Let \(\mathcal {U}:=\{U_1,\dots ,U_d\}\) be an ordered random partition of \([n]\). These index sets will serve as the collection of blocks (among the \(n\) blocks of size \(d\times d\) of the matrix generated by the first \(d\) rows) to be partitioned. We denote by \(B(U_i)\) the collection of indices generated by \(U_i\), that is
Given any partition \(\mathcal {U}\), we easily obtain the following lemma by a series of applications of the Cauchy–Schwarz inequality.
Lemma 5.4
(Decoupling lemma) Assume that
Then,
where \(({x_{11;11}}',\dots ,{x_{dd;11}}');\dots ; ({x_{11;1n}}',\dots ,{x_{dd;1n}}')\) are iid copies of the vector \((x_{11}-x_{11}',\dots , x_{dd}-x_{dd}')\), and where \((x_{11}',\dots ,x_{dd}')\) is an independent copy of \((x_{11},\dots ,x_{dd})\).
As the proof of Lemma 5.4 is standard, we refer the reader to [6, 20, 35]. As the columns \({\mathbf {c}}_{i_1},\dots ,{\mathbf {c}}_{i_d}\) are independent, we will be able to obtain an analogue of Lemma 3.8 as follows.
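The simplest instance of the symmetrization behind Lemma 5.4: for an independent copy \(S'\) of \(S\), independence gives \({\mathbb {P}}(S\in B(a,\beta ))^2={\mathbb {P}}(S\in B(a,\beta ),\,S'\in B(a,\beta ))\le {\mathbb {P}}(|S-S'|\le 2\beta )\), since the two events on the left force \(|S-S'|\le 2\beta \). A Monte Carlo check of this event inclusion (all parameters illustrative):

```python
import random

rng = random.Random(2)
N, n, beta, center = 5000, 50, 0.5, 0.0
a = [rng.gauss(0, 1) for _ in range(n)]

def S():
    # a Rademacher linear form S = sum_i eps_i a_i
    return sum(ai if rng.random() < 0.5 else -ai for ai in a)

pairs = [(S(), S()) for _ in range(N)]  # (S, S') with S' an independent copy
both = sum(abs(s - center) <= beta and abs(t - center) <= beta for s, t in pairs)
diff = sum(abs(s - t) <= 2 * beta for s, t in pairs)
# event inclusion: if S and S' both fall in B(a, beta), then |S - S'| <= 2*beta
assert both <= diff
```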
Lemma 5.5
There exist index sets \(I_0(U_1)\) with \(|I_0(U_1)|=O(1)\) and \(I(U_1)\subset B(U_1)\) with \(|I(U_1)|\ge d|U_1|-n^\varepsilon \) and an integer \(k\ne 0, k=n^{O_{B,\varepsilon }(1)}\) such that for any \(i\in I(U_1)\), there exist integers \(k_{ii_0}=n^{O_{B,\varepsilon }(1)}\) such that
where \({\mathbf {c}}_{j}^{\bar{1}}\) is the \(j\)th column \({\mathbf {c}}_j\) without its first component.
Proof
(of Lemma 5.5) As usual, it suffices to assume that \(\xi \) has a discrete distribution. For each \(l\in U_1\), let \(B_l=\{(l-1)d+1,\dots ,ld\}\) be the \(l\)th block. By the determinant expansion, we have
By summing over \(l\in U_1\) and by applying Lemma 5.4 and Corollary 5.2 to the random variables \(x_{rs;1j},j\in B(U_1)\), the following holds with high probability with respect to the random variables indexed by \(\bar{U}_1\) (i.e. \(x_{rs;1j}, j\in \bar{U}_1\)): most of the coefficients
belong to a GAP of rank \(O_{C,\varepsilon }(1)\) and size \(n^{O_{C,\varepsilon }(1)}\). From here, to conclude Lemma 5.5, we just follow the proof of Lemma 3.8 verbatim. \(\square \)
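The block-index bookkeeping used here (the \(l\)th block \(B_l=\{(l-1)d+1,\dots ,ld\}\) and \(B(U)=\bigcup _{l\in U}B_l\)) can be sketched as:

```python
def block(l, d):
    # B_l = {(l-1)d + 1, ..., ld}: the indices of the l-th d x d block (1-based)
    return list(range((l - 1) * d + 1, l * d + 1))

def B(U, d):
    # B(U): the collection of indices generated by the blocks l in U
    return sorted(i for l in U for i in block(l, d))

assert block(1, 3) == [1, 2, 3]
assert B({2, 4}, 3) == [4, 5, 6, 10, 11, 12]
```

In particular \(|B(U)|=d|U|\), matching set sizes such as \(d|U_1|-n^\varepsilon \) above.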
5.3 Randomization
Roughly speaking, by iterating Lemma 5.5 to the new \((d-1)\)-linear form of the random variables restricted by \(\bar{U}_1\) and so on, we will be able to deduce an analogue of Theorem 3.5 with the dependence upon \(U_1,\dots ,U_d\). One might then try to randomize \(U_1,\dots ,U_d\) to obtain Theorem 5.3. However, randomizing \(U_1,\dots ,U_d\) all at once poses serious technical difficulties. To avoid this hurdle we randomize one pair at a time before each iteration of Lemma 5.5.
Assume that \((U_{12},U_3,\dots ,U_d)\) is an ordered partition of \([n]\) into \(d-1\) parts, each of size \(\Theta (n)\). Fixing this partition, we next partition \(U_{12}\) into \(U_1,U_2\) randomly. For the resulting partition \((U_1,\dots ,U_d)\) into \(d\) parts we then apply Lemma 5.5. As a result, the index \(i_2\) in the conclusion belongs to \(B(U_2)\). We will show that by randomizing \(U_1\), one may recover the result for \(i_2\) now an element of \(B(U_{12})\). Let us first extend Lemma 5.5 as follows.
Lemma 5.6
There exist subsets \(I_0(U_{1})\) and \(I(U_{1})\) of \(B(U_{12})\), of size \(O(1)\) and \(d|U_{12}|-n^\varepsilon \) respectively, and an integer \(k\ne 0\), \(k=n^{O_{B,\varepsilon }(1)}\), such that for each \(i\in I(U_1)\) there exist integers \(k_{ii_0}=n^{O_{B,\varepsilon }(1)}\) for which the following holds:
where
In comparison with Lemma 5.5, the probability in Lemma 5.6 is now with respect to all random variables of the first \(d\) rows of \(M_n\). Also, \(i_2\) now runs over all the indices restricted by \(U_{12}\). The entries \(a_{ii_2\dots i_d}(U_1)\), without the indices \(i_3,\dots ,i_d\), could be viewed as entries of a symmetric matrix.
Proof
(of Lemma 5.6) We first fix the random variables restricted by \(\bar{U}_{12}\) for which the conclusion of Lemma 5.4 holds with respect to the random variables restricted by \(U_{12}\). Similarly to the proof of Lemma 5.5, the following holds with high probability with respect to \(x_{rs;1i},i\in B(U_2)\): there exist subsets \(I_0(U_1)\) and \(I(U_1)\) of \(B(U_1)\) with size \(O(1)\) and \(d|U_1|-n^\varepsilon \) respectively such that the following holds for all \(i\in I(U_1)\):
By switching the role of \(U_1\) and \(U_2\), there also exist subsets \(I_0(U_2)\) and \(I(U_2)\) of \(B(U_2)\) with size \(O(1)\) and \(d|U_2|-n^\varepsilon \) respectively such that the following holds for all \(i\in I(U_2)\):
Now, by the definition of \(a_{ii_2\dots i_d}(U_1)\), with \(I=I(U_1)\cup I(U_2)\) and \(I_0=I_0(U_1)\cup I_0(U_2)\), we can rewrite both of the events in (5.3) and (5.4) in the following form
The conclusion of Lemma 5.6 then follows from (5.3) and (5.4), noting that \(\{x_{rs;1i_1}, i_1 \in U_1\}\) and \(\{x_{rs;1i_2}, i_2 \in U_2\}\) are independent. \(\square \)
Now we randomize \(U_1\) to obtain the following main result of the subsection.
Lemma 5.7
(Randomization) There exist subsets \(I_0(U_{12})\) and \(I(U_{12})\) of \(B(U_{12})\), of size \(O(1)\) and \(d|U_{12}|-n^\varepsilon \) respectively, such that the following holds for all \(i\in I(U_{12})\):
where \({x^{(i)}_{rs}}':=\eta _{i}x^{(i)}_{rs}\) with \(\eta _i\) iid Bernoulli random variables of parameter \(1/2\), and \({\mathbf {c}}_{i_2}':=\eta _{i_2} {\mathbf {c}}_{i_2}\) in the determinants.
Proof
(of Lemma 5.7) Note that Lemma 5.6 holds for any choice of \(U_1\subset U_{12}\). As \(I_0(U_1)\subset [n]^{O_{B,\varepsilon }(1)}\) and \(k(U_1)\le n^{O_{B,\varepsilon }(1)}\), there are only \(n^{O_{B,\varepsilon }(1)}\) possibilities for the tuple \((I_0(U_1),k(U_1))\). Thus, there exists a tuple \((I_0,k)\) such that \(I_0(U_1)=I_0\) and \(k(U_1)=k\) for \(2^{|U_{12}|}/n^{O_{B,\varepsilon }(1)}\) different sets \(U_1\). Let us denote this collection of sets \(U_1\) by \(\mathcal {S}\); we have
Next, let \(I\) be the collection of all \(i\in B(U_{12})\) which belong to at least \(|\mathcal {S}|/2\) index sets \(I(U_1)\). Then,
From now on we fix an \(i\in I\). Consider the tuples \((k_{ii_0}(U_1), i_0\in I_0)\) over all \(U_1\) with \(i\in I(U_1)\). Because there are only \(n^{O_{B,\varepsilon }(1)}\) possible values such tuples can take, there must be a tuple, say \((k_{ii_0}, i_0\in I_0)\), such that \((k_{ii_0}(U_1), i_0\in I_0)=(k_{ii_0}, i_0\in I_0)\) for at least \(|\mathcal {S}|/(2n^{O_{B,\varepsilon }(1)})=2^{|U_{12}|}/n^{O_{B,\varepsilon }(1)}\) sets \(U_1\). Without loss of generality, we assume that \(i\in U_1\) for at least half of those sets. Let \(\mathcal {U}\) denote the collection of such \(U_1\), and for each \(U_1\in \mathcal {U}\) we let \({\mathbf {u}}=(u_1,\dots ,u_{|B(U_{12})|})\in {\mathbb {R}}^{|B(U_{12})|}\) be its characteristic vector, i.e. \(u_i=1\) if \(i\in B(U_1)\) and \(u_i=0\) otherwise.
By the definition of \(a_{ii_2\dots i_d}(U_1)\), as \(i\in U_1\), we can write
Recall that \(|\mathcal {U}|= 2^{|U_{12}|}/n^{O_{B,\varepsilon }(1)}\). Hence, by Lemma 5.6, we have the following joint probability
By applying the Cauchy–Schwarz inequality, we obtain
where \({x^{(i)}_{rs}}':=(u_i-u_i')x^{(i)}_{rs}\) and in the determinant formulas the column \({\mathbf {c}}_{i_2}'\) stands for \((u_{i_2}-u_{i_2}'){\mathbf {c}}_{i_2}\). Also, in the first estimate we used the elementary property that for any function \(f\),
The proof is complete by noting that \(k\) and \(I_0\) are independent of the choice of \(i\). \(\square \)
Proof
(of Theorem 5.3) Note that by Lemma 5.7, we just need to deal with a \((d-1)\)-multilinear form of the rows \({\mathbf {r}}_2,\dots ,{\mathbf {r}}_d\). Our next step is to apply this machinery again when fixing \(U_{123}=U_{12}\cup U_3\) and letting \(U_{12}\) be chosen as a random subset of \(U_{123}\). By iterating the machinery \(d-1\) times similarly to the proof of Theorem 3.5, we obtain the result as claimed. \(\square \)
We now conclude this section by proving the inverse step of Sect. 4.
5.4 Proof of Theorem 4.5
We first apply Theorem 5.3 to obtain
Set \(K_0,\dots , K_d\) to be a sequence of thresholds with \(K_i:=n^{-A/2+2i d}\). We consider two cases.
Case 1 (degenerate case)
Subcase 1.1 Assume that for all \(i\in I_1\),
As \(\sum _{i_1,i_2,\dots ,i_d}|a_{i_1i_2\dots i_d}|^2=1\), there exist indices \(i_2,\dots ,i_d\) such that
We next fix these indices \(i_2,\dots ,i_d\). It follows from (5.7) that \(|ka_{ii_2\dots i_d}-\sum k_{ii_0}a_{i_0i_2\dots i_d}| \le K_0\) for any \(i\in I_1\). Set
It is clear that the set of \(v_i\)’s is a GAP of rank \(|I_0|=O_{B,\varepsilon }(1)\) and size \(n^{O_{B,\varepsilon }(1)}\). Also, by definition, with \({\mathbf {v}}=(v_i,i\in I_1)\)
On the other hand, as the vector \((a_{ii_2\dots i_d})_{1\le i\le dn}\) is orthogonal to any row \({\mathbf {r}}_j({\mathbf {M}}_{n-1})\) of \({\mathbf {M}}_{n-1}\), we have
Recall that by the approximation and by (5.7), \(\Vert {\mathbf {v}}\Vert \ge n^{-O_d(1)}\). Thus by letting \({\mathbf {u}}:={\mathbf {v}}/\Vert {\mathbf {v}}\Vert \), we have
It is clear that \({\mathbf {u}}\) satisfies all the conditions of Theorem 4.5, so we are done with this subcase.
Subcase 1.2 From now on we assume that there exists \(i\in I_1\) such that
Fixing \(i\), we apply Theorem 5.3 to the index \(i_2\) and reconsider Subcase 1.1 with the new threshold \(K_1:=n^{-A/2+2d}\). By iterating the process for at most \(d-1\) steps, we will end up either in Subcase 1.1 (and hence be done) or in the following non-degenerate case.
Case 2 (non-degenerate case) There exist collections \(J_1,\dots , J_{d-1}\) of indices \(j_1,\dots ,j_{d-1}\) such that \(|J_i|=O_{B,\varepsilon }(1)\) and
Notice that for each fixed \(i_1,\dots ,i_{d-1}\) the vector \((a_{i_1,\dots ,i_{d-1},i},1\le i\le dn,i\ne i_1,\dots ,i_{d-1})\) is orthogonal to any \({\mathbf {r}}_j^{(i_1,\dots ,i_{d-1})}({\mathbf {M}}_{n-1})\), the \(j\)th row of \({\mathbf {M}}_{n-1}\) without components \(i_1,\dots ,i_{d-1}\). By adding zeros to the missing components \(i_1,\dots ,i_{d-1}\) if needed, we see that the \({\mathbb {R}}^{nd}\) vector \((a_{i_1,\dots ,i_{d-1},i},1\le i\le dn)\) is now orthogonal to \({\mathbf {r}}_j({\mathbf {M}}_{n-1})\).
Setting \(J=J_1\cup \dots \cup J_{d-1}\), we thus have
where the entries \(a_{j_1,\dots ,j_{d-1},i}\) are set to be zero if the indices are not distinct.
Now we set \({\mathbf {w}}_1:=(w_i)_{i\notin J}\) and \({\mathbf {w}}_2:=(w_i)_{i\in J}\), where \(w_i:=k_{j_1 \dots j_{d-1}}a_{i_1 \dots i_{d-1} i}\). Then
Set \({\mathbf {v}}:={\mathbf {w}}/\Vert {\mathbf {w}}\Vert \). Theorem 3.2 applied to (5.6) implies that \({\mathbf {v}}\) can be approximated by a vector \({\mathbf {u}}\) as follows.
-
\(|u_i-v_i|\le n^{-A/2+O_{B,\varepsilon ,d}(1)}\) for all \(i\).
-
There exists a GAP of rank \(O_{B,\varepsilon }(1)\) and size \(n^{O_{B,\varepsilon }(1)}\) that contains at least \(dn-n^\varepsilon \) components \(u_i\).
-
All the components \(u_i\), and all the generators of the GAP are rational complex numbers of the form \(\frac{p}{q}+\sqrt{-1}\frac{p'}{q'}\), where \(|p|,|q|,|p'|,|q'| \le n^{A/2+O_{B,\varepsilon }(1)}\).
Note that, by the approximation above, \(\Vert {\mathbf {u}}\Vert \asymp 1\) and \(|\langle {\mathbf {u}}_1,{\mathbf {r}}_j^{(\bar{J})}({\mathbf {M}}_{n-1}) \rangle + \langle {\mathbf {u}}_2,{\mathbf {r}}_j^{(J)}({\mathbf {M}}_{n-1})\rangle | \le n^{-A/2+O_{B,\varepsilon }(1)}\) for all row vectors of \({\mathbf {M}}_{n-1}\).
6 Singularity of block matrices: proof sketch for Theorem 4.6
Our first ingredient is the following variant of Theorem 3.2 in which random variables are replaced by random matrices and \(a_i\) are replaced by vectors.
Theorem 6.1
(Inverse Littlewood–Offord for sequence of random operators) Let \(\{{\mathbf {u}}^{(i)}=({\mathbf {u}}_1^{(i)},\dots ,{\mathbf {u}}_d^{(i)}), 1\le i\le n\}\) be a sequence of \(n\) vectors in \({\mathbb {C}}^d\) such that the following concentration-type bound holds with high probability
where \(X^{(1)},\dots ,X^{(n)}\) are iid block matrices whose entries are copies of \((x_{11},\dots ,x_{dd})\) from (4.1). Then there exist a positive constant \(\delta \) and \(d^2\) numbers \(c_{11},\dots ,c_{dd}\) such that the least singular value \(\sigma _d\) (largest singular value \(\sigma _1\)) of the matrix \((c_{ij})_{1\le i,j\le d}\) is at least \(\delta \) (at most \(\delta ^{-1}\)) and for any number \(n'\) between \(n^\varepsilon \) and \(n\), there exists a proper symmetric GAP \(Q=\{\sum _{i=1}^r k_ig_i : |k_i|\le K_i \}\subset {\mathbb {C}}^d\) such that
-
At least \(n-n'\) elements of \(V:=\{(c_{11}{\mathbf {u}}^{(i)}_1+\dots +c_{1d}{\mathbf {u}}^{(i)}_d, \dots , c_{d1}{\mathbf {u}}^{(i)}_1+\dots +c_{dd}{\mathbf {u}}^{(i)}_d), 1\le i\le n\}\) are \(\beta \)-close to \(Q\).
-
\(Q\) has small rank, \(r=O_{B,\varepsilon }(1)\), and small size
$$\begin{aligned} |Q| \le \max \left\{ O_{B,\varepsilon }\left( \frac{\gamma ^{-1}}{\sqrt{n'}}\right) ,1\right\} . \end{aligned}$$ -
There is a non-zero integer \(p=O_{B,\varepsilon }(\sqrt{n'})\) such that all steps \(g_i\) of \(Q\) have the form \(g_i=(g_{i1},\dots ,g_{id})\), where \(g_{ij}=\beta \frac{p_{ij}}{p} \) with \(p_{ij} \in {\mathbb {Z}}\) and \(|p_{ij}|=O_{B,\varepsilon }(\beta ^{-1} \sqrt{n'}).\)
In application, \(X^{(1)},\dots , X^{(n)}\) will be the \(d\times d\) blocks of \({\mathbf {M}}_{n-1}\). It is crucial to notice that, as most of the elements of \(V\) are \(\beta \)-close to \(Q\), and as the matrix \((c_{ij})\) is far from being degenerate, it follows from Theorem 6.1 that most of the individual components \({\mathbf {u}}^{(i)}_j\) are also close to another GAP of small rank and size (see Corollary 5.2). We will present the proof of Theorem 6.1 in Appendix by following the treatment from [21]. For the rest of this section we sketch the proof of Theorem 4.6 following [19, 32].
Let \(\mathcal {N}\) be the number of such structural vectors \({\mathbf {u}}\) from Theorem 4.5. Because each GAP is determined by its generators and dimensions, the number of \(Q\)’s is bounded by
Next, for a given \(Q\), there are at most \(n^{O_{B,\varepsilon }(n)}\) ways to choose the \(nd-2n^\varepsilon \) components \(u_i\) that \(Q\) contains, and \(n^{O_{A,B,\varepsilon }(n^\varepsilon )}\) ways to choose the remaining components from the set \(\{\frac{p}{q}+i \frac{p'}{q'}, |p|,|q|,|p'|,|q'|\le n^{A/2+O_{B,\varepsilon }(1)}\}\). Hence, we obtain the key bound
From now on, by conditioning on \({\mathbf {u}}_1\) and on the entries of \({\mathbf {M}}_{n-1}\) corresponding to the indices \(i_1,\dots ,i_d\) of \({\mathbf {u}}_1\), without loss of generality we assume that \({\mathbf {u}}_1\) vanishes. Set \(\beta _0:=n^{-A/2+O_{B,\varepsilon }(1)}\), the bound obtained from the conclusion of Theorem 4.5. We will denote the blocks of \({\mathbf {M}}_{n-1}\) by \(X^{(i)}_j\) with \(1\le i\le n\) and \(1\le j\le n-1\). For a given vector \({\mathbf {u}}\), we define \({\mathbb {P}}_{\beta _0}({\mathbf {u}})\) as follows
If \({\mathbf {u}}\) satisfies the property above, we say that \({\mathbf {u}}\) is \(\beta _0\)-orthogonal to almost all blocks of \({\mathbf {M}}_{n-1}\). Because the blocks of \({\mathbf {M}}_{n-1}\) are iid,
where \(X^{(i)}\) are iid copies of \((x_{st})_{1\le s,t\le d}\).
If \(\gamma =\gamma _{\beta _0}({\mathbf {u}})\) is small, say \(n^{-\Omega (1)}\), then \({\mathbb {P}}_{\beta _0}({\mathbf {u}})\) is \(n^{-\Omega (n)}\). Thus the contribution of these \({\mathbb {P}}_{\beta _0}({\mathbf {u}})\) to the total sum \(\sum _{{\mathbf {u}}}{\mathbb {P}}_{\beta _0}({\mathbf {u}})\) is negligible, as \(\mathcal {N}=n^{O(n)}\).
When \(\gamma \) is relatively large, say \(\gamma =n^{-O(1)}\), Theorem 6.1 shows that most of the components \(u_i\) are close to a new GAP of rank \(O(1)\) and of size \(O(\gamma ^{-1}/\sqrt{n})\). This enables us to approximate \({\mathbf {u}}\) by a new vector \({\mathbf {u}}'\) in such a way that \(|\langle {\mathbf {u}}',{\mathbf {r}}_i({\mathbf {M}}_{n-1})\rangle |\), where we recall that \({\mathbf {r}}_i({\mathbf {M}}_{n-1})\) is the \(i\)th row of \({\mathbf {M}}_{n-1}\), is still of order \(O(\beta _0)\), and the components of \({\mathbf {u}}'\) now come from the new GAPs (after a linear transformation). The number \(\mathcal {N'}\) of these \({\mathbf {u}}'\) can be bounded by \((\gamma ^{-1}/n^\varepsilon )^{n}\), while \({\mathbb {P}}_{\beta _0}({\mathbf {u}}')\) is of order \(\gamma ^{n}\). Thus, summing over \({\mathbf {u}}'\) we obtain the desired bound
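In brief, writing \(\gamma \) for the single-block small ball probability and assuming the bound \({\mathbb {P}}_{\beta _0}({\mathbf {u}}') \le \gamma ^{\,n-1}\) (in the spirit of the inequality \({\mathbb {P}}_{\beta _k}({\mathbf {u}}) \le \gamma ^{n-1}_{\beta _k}({\mathbf {u}})\) used later in this section), the union bound behind this step can be sketched as follows.

```latex
\sum_{{\mathbf{u}}'} \mathbb{P}_{\beta_0}({\mathbf{u}}')
  \;\le\; \mathcal{N}' \cdot \gamma^{\,n-1}
  \;\le\; \left( \frac{\gamma^{-1}}{n^{\varepsilon}} \right)^{n} \gamma^{\,n-1}
  \;=\; \gamma^{-1}\, n^{-\varepsilon n}
  \;=\; n^{-\Omega(n)},
```

since \(\gamma =n^{-O(1)}\) in the regime under consideration.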
To proceed further, we need the following elementary claim.
Claim 6.2
Assume that \({\mathbf {C}}=(c_{ij})\) is a \(d\times d\) matrix such that \(\delta \le \sigma _d({\mathbf {C}})\le \sigma _1({\mathbf {C}}) \le \delta ^{-1}\). Let \({\mathbf {u}}'=({\mathbf {u}}'^{(1)},\dots ,{\mathbf {u}}'^{(n)})\), where \({\mathbf {u}}'^{(i)} ={\mathbf {C}}{\mathbf {u}}^{(i)}\). Then we have
By paying a factor of \(n^{O_{B,\varepsilon }(1)}\) in probability, we may assume that \(|\langle {\mathbf {u}},{\mathbf {r}}_i({\mathbf {M}}_{n}) \rangle | \le \beta _0\) for the first \(d(n-1)-O_{B,\varepsilon }(1)\) rows of \({\mathbf {M}}_{n}\). Also, by paying another factor of \(n^{n^\varepsilon }\) in probability, we may assume that the first \(d n_0\) components of \({\mathbf {u}}\) belong to a GAP \(Q\), and that the Euclidean norm of \({\mathbf {u}}^{(n_0)}\) is not too small: \(\Vert {\mathbf {u}}^{(n_0)}\Vert = \Omega (1/n)\) (recall that \(\Vert {\mathbf {u}}\Vert \asymp 1\)), where
We refer to the remaining \(u_i\)’s as exceptional components. Note that these extra factors do not affect our final bound \(\exp (-\Omega (n))\). Because \(\Vert {\mathbf {u}}^{(n_0)}\Vert = \Omega (1/n)\) and \(X^{(n_0)}\) is not degenerate with high probability, there exist positive constants \(c_1,c_2\) such that \(c_2<1\) and for any \(\beta \le c_1/\sqrt{n}\) we have
6.1 Classification
Next, let \(C\) be a sufficiently large constant depending on \(B\) and \(\varepsilon \) but not \(A\). We classify \({\mathbf {u}}\) into two classes \(\mathcal {B}\) and \(\mathcal {B}'\), depending on whether \({\mathbb {P}}_{\beta _0}({\mathbf {u}})\ge n^{-Cn}\) or not. Because of (6.1), for \(C\) large enough,
For the set \(\mathcal {B}\) of remaining vectors, we divide it into two subfamilies. Set \(n':=n^{1-\varepsilon }\). We say that \({\mathbf {u}}\in \mathcal {B}\) is compressible if for any \(n'\) components \({\mathbf {u}}^{(i_1)},\dots ,{\mathbf {u}}^{(i_{n'})}\) among the \({\mathbf {u}}^{(1)},\dots , {\mathbf {u}}^{(n_0)}\), we have
Let \(\mathcal {B}_1\) and \(\mathcal {B}_2\) be the set of compressible and incompressible vectors respectively. We focus on \(\mathcal {B}_1\) first.
6.2 Approximation for compressible vectors
Set \(\beta :=n^{-B-4}\). It follows from Theorem 6.1 that, among any \({\mathbf {u}}^{(i_1)},\dots ,{\mathbf {u}}^{(i_{n'})}\), there are, say, at least \(n'/2+1\) vectors that belong to a ball of radius \(\beta \) in \({\mathbb {C}}^d\) (because our GAP now has only one element after a linear transformation \({\mathbf {C}}=(c_{ij})_{1\le i,j\le d}\)). A simple covering argument then implies that there is a ball of radius \(2\beta \) that contains all but \(n'-1\) vectors \({\mathbf {u}}^{(i)}\).
Thus there exists a vector \({\mathbf {u}}'=({\mathbf {u}}'^{(1)},\dots ,{\mathbf {u}}'^{(n)})\in (2\beta )\cdot ({\mathbb {Z}}+ \sqrt{-1} {\mathbb {Z}})^{nd}\) such that
-
\(|{\mathbf {C}}{\mathbf {u}}^{(i)}-{\mathbf {u}}'^{(i)}|\le 4\beta \) for all \(i\);
-
\({\mathbf {u}}'^{(i)}\) takes the same vector-value for at least \(n_0-n'\) indices \(i\).
Because of the approximation and Assumption 4.2, whenever \(\sum _{1\le i\le n} X^{(i)}{\mathbf {u}}^{(i)}\in B({\mathbf {u}},\beta )\), we also have
By definition, \(\beta ' \le c_1/\sqrt{n}\), and thus by (6.2), \({\mathbb {P}}_{\beta '}({\mathbf {u}}') \le (1-c_2)^{(1-o(1))n}\). Now we bound the number of \({\mathbf {u}}'\) obtained from the approximation. First, there are \(O(n^{n-n_0+n'}) = O(n^{2n^{1-\varepsilon }})\) ways to choose those \({\mathbf {u}}'^{(i)}\) that take the same vector \({\mathbf {w}}\in {\mathbb {C}}^d\), and there are just \(O(\beta ^{-d})\) ways to choose \({\mathbf {w}}\). The remaining components belong to the set \((2\beta )\cdot ({\mathbb {Z}}+ i{\mathbb {Z}})^d\), and thus there are at most \(O((\beta ^{-d})^{n-n_0+n'})= O(n^{O_{A,B,\varepsilon }(n^{1-\varepsilon })})\) ways to choose them. Hence we obtain the total bound
6.3 Approximation for incompressible vectors
The treatment here is similar; we sketch the main steps, leaving the details to the reader.
First, by exposing the rows of \({\mathbf {M}}_{n-1}\) accordingly, and by paying an extra factor \(\left( {\begin{array}{c}n_0\\ n'\end{array}}\right) =O(n^{n^{1-\varepsilon }})\) in probability, we can assume that the components \({\mathbf {u}}^{(n_0-n'+1)},\dots ,{\mathbf {u}}^{(n_0)}\) satisfy
Next, define a radius sequence \(\beta _k, k\ge 0\) where \(\beta _0=n^{-A/2+O_{B,\varepsilon }(1)}\) is the bound obtained from the conclusion of Theorem 4.5, and \(\beta _{k+1}:= (n^{B+2}+n^{\alpha +1}+1)^2 \beta _k.\) Also define
Clearly \({\mathbb {P}}_{\beta _k}({\mathbf {u}}) \le \gamma ^{n-1}_{\beta _k}({\mathbf {u}})\). As \({\mathbf {C}}\) is non-degenerate, with \({\mathbf {u}}'\) from Theorem 6.1,
Furthermore, we have freedom to choose \(k\) before applying Theorem 6.1 to obtain \({\mathbf {u}}'\). By the pigeon-hole principle, there exists \(k=k_0({\mathbf {u}})\le C\varepsilon ^{-1}\) such that
Since \(A\) was chosen sufficiently large compared to \(O_{B,\varepsilon }(1)\) and \(C\), we have \(\beta _{k_0+1}\le n^{-B-4}\). With this choice of \(k_0\), we apply Theorem 6.1 to obtain an approximation \({\mathbf {u}}'\) of \({\mathbf {C}}{\mathbf {u}}\) with the following properties.
-
(i)
\(|{\mathbf {C}}{\mathbf {u}}^{(i)}-{\mathbf {u}}'^{(i)}|\le \beta _{k_0}\) for all \(i\).
-
(ii)
The components of \({{\mathbf {u}}'}^{(i)}\) belong to \(Q\) for all but \(n^{1-2\varepsilon }\) indices \(i\), and the generators of \(Q\) belong to the set \(\beta _{k_0}\cdot \{p/q +\sqrt{-1} p'/q' , |p|,|q|,|p'|,|q'|\le n^{A/2+O_{B,\varepsilon }(1)}\}\).
-
(iii)
\(Q\) has rank \(O_{B,\varepsilon }(1)\) and size \(|Q|=O(\gamma _{\beta _{k_0}}({\mathbf {u}})^{-1}/n^{1/2-\varepsilon })\).
Let \(\mathcal {B'}\) be the collection of such \({\mathbf {u}}'\). By definition,
Arguing similarly to the treatment for \(\mathcal {N}\), we can bound the cardinality \(\mathcal {N'}\) of \(\mathcal {B'}\) by
It follows from (6.8) and (6.9) that
completing the treatment for incompressible vectors.
7 Universality of random block matrices: proof of Theorem 2.4
This section is devoted to Theorem 2.4. We begin by introducing the following notation. Given a \(n \times n\) matrix \({\mathbf {M}}\), we let \(\mu _{{\mathbf {M}}}\) denote the empirical measure built from the eigenvalues of \({\mathbf {M}}\) and \(\nu _{{\mathbf {M}}}\) denote the symmetric empirical measure built from the singular values of \({\mathbf {M}}\). That is,
where \(\lambda _1({\mathbf {M}}), \ldots , \lambda _n({\mathbf {M}}) \in \mathbb {C}\) are the eigenvalues of \({\mathbf {M}}\) and \(\sigma _1({\mathbf {M}}) \ge \cdots \ge \sigma _n({\mathbf {M}})\) are the singular values of \({\mathbf {M}}\). Recall that \(F^{{\mathbf {M}}}\) is the ESD of \({\mathbf {M}}\). In particular, we have
where \(z = s + \sqrt{-1}t\).
Many of the techniques used to study Hermitian matrices fail to work for non-Hermitian matrices [2, Section 11.1]. Consider a \(n \times n\) non-Hermitian matrix \({\mathbf {M}}\). In [10, 11], Girko introduced a natural connection between \(\mu _{{\mathbf {M}}}\) and the collection of measures \(\{\nu _{{\mathbf {M}}- z {\mathbf {I}}}\}_{z \in {\mathbb {C}}}\). Formally, we present this connection as Lemma 7.1 below.
Lemma 7.1 follows from [2, Lemma 11.2] and is based on Girko’s original observation [10, 11]. The lemma has appeared in a number of different forms; for example, see [5, Lemma 4.3] and [12].
Lemma 7.1
(Lemma 11.2 from [2]) Let \({\mathbf {M}}\) be a \(n \times n\) matrix. For any \(uv \ne 0\), we have
where \(z = s + \sqrt{-1} t\).
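Lemma 7.1 ultimately rests on an elementary determinant identity: \(|\det ({\mathbf {M}}-z{\mathbf {I}})|\) equals both the product of the moduli \(|\lambda _i({\mathbf {M}})-z|\) and the product of the singular values \(\sigma _i({\mathbf {M}}-z{\mathbf {I}})\), so the log-potential of \(\mu _{{\mathbf {M}}}\) can be read off from \(\nu _{{\mathbf {M}}-z{\mathbf {I}}}\). A quick numerical check of this identity (an illustration only, not part of the formal argument):

```python
import numpy as np

rng = np.random.default_rng(0)
n, z = 6, 0.3 + 0.7j

M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
shifted = M - z * np.eye(n)

# Three expressions for the same quantity: (1/n) log|det(M - zI)|
eig = np.linalg.eigvals(M)
sv = np.linalg.svd(shifted, compute_uv=False)

via_eigs = np.mean(np.log(np.abs(eig - z)))            # (1/n) sum_i log|lambda_i - z|
via_svs = np.mean(np.log(sv))                          # (1/n) sum_i log sigma_i(M - zI)
via_det = np.log(np.abs(np.linalg.det(shifted))) / n

print(via_eigs, via_svs, via_det)
```

All three quantities agree to machine precision, which is precisely the link Girko's hermitization exploits.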
We define the function
where \(z = s + \sqrt{-1} t\). We also define
Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\). We define the \(2dn \times 2dn\) Hermitian matrix
for \(z \in \mathbb {C}\). It is straightforward to verify that the eigenvalues of \({\mathbf {H}}_n\) are given by
In other words, \(\nu _{\frac{1}{\sqrt{n}} {\mathbf {X}}_n - z {\mathbf {I}}}\) is the empirical spectral measure of \({\mathbf {H}}_n\). By Lemma 7.1, the problem of studying \(\mu _{\frac{1}{\sqrt{n}} {\mathbf {X}}_n}\) reduces to studying the eigenvalue distribution of \({\mathbf {H}}_n\).
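The displayed definition of \({\mathbf {H}}_n\) is the standard hermitization; assuming the block shape \(\begin{pmatrix} 0 &{} {\mathbf {Y}}\\ {\mathbf {Y}}^* &{} 0\end{pmatrix}\) with \({\mathbf {Y}}= \frac{1}{\sqrt{n}}{\mathbf {X}}_n - z{\mathbf {I}}\), its spectrum consists of the \(\pm \) singular values of \({\mathbf {Y}}\), which is easy to confirm numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 8  # plays the role of dn

Y = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
# Hermitization: H is Hermitian even though Y is not
H = np.block([[np.zeros((m, m)), Y], [Y.conj().T, np.zeros((m, m))]])

eig = np.sort(np.linalg.eigvalsh(H))
sv = np.linalg.svd(Y, compute_uv=False)       # sigma_1 >= ... >= sigma_m
expected = np.sort(np.concatenate([sv, -sv])) # spectrum of H is {+-sigma_i}
print(np.max(np.abs(eig - expected)))
```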
7.1 Truncation
In practice, it will be more convenient to work with a truncated version of \({\mathbf {H}}_n\). That is, we will work with a new matrix \(\hat{{\mathbf {H}}}_n\) whose entries are truncated versions of the entries of the original matrix \({\mathbf {H}}_n\). This subsection is devoted to the following standard truncation arguments.
Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume
for some \(\eta > 0\). Let \(\delta > 0\). For each \(s,t \in \{1,\ldots ,d\}\), we define
Here \(\mathbf {1}_{E}\) denotes the indicator function of the event \(E\). We present the following standard truncation lemma.
Lemma 7.2
Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume (7.3) holds for some \(\eta > 0\). For each \(\delta > 0\), there exists \(n_0 > 0\) such that the following holds for all \(n > n_0\).
-
(i)
For each \(s,t \in \{1,\ldots ,d\}\), \(\hat{\xi }_{st}^{(n)}\) has mean zero and variance \(1/d\).
-
(ii)
a.s. \(\max _{1 \le s,t \le d} \left| \hat{\xi }_{st}^{(n)}\right| \le 4 n^{\delta }\).
-
(iii)
We have
$$\begin{aligned} \max _{1 \le s,t \le d} \left| 1/d - {{\mathrm{Var}}}( \tilde{\xi }_{st}^{(n)} ) \right| \le 2 \frac{m_{2+\eta }}{n^{\delta \eta }}. \end{aligned}$$ -
(iv)
We have
$$\begin{aligned} \max _{ (s,t) \ne (u,v)} \left| {\mathbb {E}}\left[ \hat{\xi }_{st}^{(n)} \overline{\hat{\xi }_{uv}^{(n)}} \right] \right| \le 10 \frac{\sqrt{m_{2 + \eta }}}{n^{\delta \eta /2}}. \end{aligned}$$
Proof
(of Lemma 7.2) We first note that
for all \(s,t \in \{1,\ldots ,d\}\). We also note that
Since this holds for all \(s,t \in \{1,\ldots ,d\}\), we obtain (iii). We now take \(n_0\) sufficiently large such that
and
for all \(n > n_0\); let \(n > n_0\). Then each \(\hat{\xi }_{st}^{(n)}\) has mean zero and variance \(1/d\) by construction. Moreover, we have a.s.
for all \(s,t \in \{1,\ldots ,d\}\). We now verify (iv); fix \(s,t,u,v \in \{1,\ldots ,d\}\) with \((s,t) \ne (u,v)\). We have
by (7.4) and the Cauchy–Schwarz inequality. Here the last inequality follows from (7.5). By another application of Cauchy–Schwarz and (7.4), we obtain
Combining the bounds above with (7.6), we obtain
Since (7.7) holds for any \((s,t) \ne (u,v)\), the proof of the lemma is complete. \(\square \)
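The exact truncation used above is given in the displays preceding Lemma 7.2; the following toy computation illustrates the generic recenter-and-rescale step behind part (i), on a discrete atom variable so that all moments are exact. The four-point law and the truncation level \(T\) (standing in for \(n^{\delta }\)) are illustrative choices, not the paper's.

```python
import numpy as np

d = 2      # block parameter; target variance is 1/d
T = 1.0    # truncation level, standing in for n^delta

# Toy atom variable xi: symmetric four-point law with mean 0, variance 1/d.
vals = np.array([-3.0, -0.5, 0.5, 3.0])
p = 1.0 / 70.0  # chosen so that 2*9*p + 2*0.25*(0.5 - p) = 1/d
probs = np.array([p, 0.5 - p, 0.5 - p, p])

def mean(v): return np.sum(probs * v)
def var(v): return mean(v**2) - mean(v)**2

assert np.isclose(mean(vals), 0.0) and np.isclose(var(vals), 1.0 / d)

# Step 1: truncate, xi_tilde = xi * 1_{|xi| <= T}
tilde = vals * (np.abs(vals) <= T)

# Step 2: recenter and rescale so the truncated variable again has
# mean zero and variance exactly 1/d (as in Lemma 7.2 (i)).
hat = (tilde - mean(tilde)) * np.sqrt((1.0 / d) / var(tilde))

print(mean(hat), var(hat))
```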
We will continue to use the notation introduced in Definition 2.3. That is, for any \(s,t \in \{1,\ldots ,d\}\) and \(1 \le i, j \le n\), we let \(x_{st;ij}\) denote the \((i,j)\)-entry of the matrix \({\mathbf {X}}_{n,st}\). For every \(s,t \in \{1,\ldots ,d\}\), \(n \ge 1\), and \(1 \le i,j \le n\), we define
and
Set \(\tilde{{\mathbf {X}}}_{n,st} := \left( \tilde{x}_{st;ij}^{(n)} \right) _{i,j=1}^n\) and \(\hat{{\mathbf {X}}}_{n,st} := \left( \hat{x}_{st;ij}^{(n)} \right) _{i,j=1}^n\) for every \(n \ge 1\) and \(s,t \in \{1,\ldots ,d\}\). We also define the \(dn \times dn\) random block matrices
For \(z \in \mathbb {C}\), we define the \(2dn \times 2dn\) matrices
and
We will make use of the following corollary to the law of large numbers.
Lemma 7.3
(Law of large numbers) Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume (7.3) holds for some \(\eta > 0\). Let \(\delta > 0\). Then a.s.
and
Proof
(of Lemma 7.3) We first prove (7.8). We begin by noting that
For any \(s,t \in \{1,\ldots ,d\}\), we apply the law of large numbers and obtain a.s.
Since \(d\) is fixed, independent of \(n\), we conclude that a.s.
and the proof of (7.8) is complete.
For (7.9), we apply the bounds in Lemma 7.2 to obtain
for \(n\) sufficiently large. Hence (7.9) follows from (7.8).
We now prove (7.10); fix \(s,t \in \{1,\ldots ,d\}\). By the law of large numbers, for any \(M > 0\), we have a.s.
By the dominated convergence theorem, it follows that
We conclude that a.s.
Since \(d\) is fixed, independent of \(n\), the claim follows. \(\square \)
We let \(L(F,G)\) denote the Levy distance between two distribution functions \(F,G\). That is,
Convergence in Levy distance implies convergence in distribution [2, Remark A.40]. We will compare the ESD of \(\hat{{\mathbf {H}}}_n\) to the ESD of \({\mathbf {H}}_n\) using the Levy metric.
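As a concrete illustration (and with the caveat that the display (7.11) is the standard definition \(L(F,G)=\inf \{{\varepsilon }>0 : F(x-{\varepsilon })-{\varepsilon }\le G(x) \le F(x+{\varepsilon })+{\varepsilon }\text { for all } x\}\)), the Levy distance between two empirical distribution functions can be approximated numerically by bisecting on \({\varepsilon }\) and checking the defining inequality near the jump points of the two step functions:

```python
import numpy as np

def ecdf(sample):
    """Right-continuous empirical CDF of a 1-D sample."""
    s = np.sort(np.asarray(sample, dtype=float))
    return lambda x: np.searchsorted(s, x, side="right") / len(s)

def levy_distance(a, b, tol=1e-9):
    """Approximate Levy distance between the ECDFs of samples a and b."""
    F, G = ecdf(a), ecdf(b)
    pts = np.concatenate([a, b]).astype(float)

    def ok(eps):
        # Both sides are piecewise constant, so it suffices to test at
        # (and just below) every jump point, shifted by +-eps as well.
        xs = np.concatenate([pts, pts - eps, pts + eps])
        xs = np.concatenate([xs, xs - 1e-12])
        fine = np.all(F(xs - eps) - eps <= G(xs) + tol) \
            and np.all(G(xs) <= F(xs + eps) + eps + tol) \
            and np.all(G(xs - eps) - eps <= F(xs) + tol) \
            and np.all(F(xs) <= G(xs + eps) + eps + tol)
        return fine

    # The feasibility of eps is monotone, so bisection applies.
    lo_e, hi_e = 0.0, 1.0 + np.ptp(pts)
    for _ in range(60):
        mid = (lo_e + hi_e) / 2
        if ok(mid):
            hi_e = mid
        else:
            lo_e = mid
    return hi_e
```

For point masses at \(a\) and \(b\) with \(|a-b|\le 1\), the distance is \(|a-b|\), which the routine recovers.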
Lemma 7.4
Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume (7.3) holds for some \(\eta > 0\). Let \(\delta > 0\). Then a.s.
Proof
(of Lemma 7.4) We will apply [2, Corollary A.41] to bound \(L(F^{{\mathbf {H}}_n}, F^{\tilde{{\mathbf {H}}}_n})\) and \(L(F^{\tilde{{\mathbf {H}}}_n}, F^{\hat{{\mathbf {H}}}_n})\) separately. First, we have
We note that
and thus, by the dominated convergence theorem, we obtain
Therefore, by Lemma 7.3, we conclude that a.s.
and hence a.s.
Applying [2, Corollary A.41] again, we obtain
Here the last inequality follows from Lemma 7.2. By Lemma 7.3, we have a.s.
and we conclude that a.s.
The claim now follows from (7.12), (7.13), and the triangle inequality for Levy distance. \(\square \)
7.2 Cubic relation
We now consider the distribution of eigenvalues of \({\mathbf {H}}_n\). In fact, by Lemma 7.4, it will suffice to consider the eigenvalues of \(\hat{{\mathbf {H}}}_n\). To this end, we will study the resolvent of \(\hat{{\mathbf {H}}}_n\) in Theorem 7.6 below. Indeed, for \(w \in \mathbb {C}\) with \(\mathrm{Im}(w) > 0\),
is the Stieltjes transform of the measure \(\nu _{\frac{1}{\sqrt{n}}\hat{{\mathbf {X}}}_n - z{\mathbf {I}}}\). It follows from standard Stieltjes transform techniques (e.g. [2, Theorem B.9]) that computing the limiting ESD of \(\hat{{\mathbf {H}}}_n(z)\) is equivalent to computing the limit of \(\hat{m}_n(z,w)\) for all \(w \in {\mathbb {C}}\) with \(\mathrm{Im}(w) > 0\).
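The identification of the Stieltjes transform with the normalized trace of the resolvent, which underlies the definition of \(\hat{m}_n\), is a pointwise identity and can be checked directly (a numerical illustration, not the paper's computation):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10
A = rng.standard_normal((N, N))
H = (A + A.T) / 2              # Hermitian test matrix
w = 0.4 + 0.9j                 # spectral parameter with Im(w) > 0

# Stieltjes transform of the ESD: integral of 1/(x - w) ...
lam = np.linalg.eigvalsh(H)
m_from_eigs = np.mean(1.0 / (lam - w))

# ... equals the normalized trace of the resolvent (H - wI)^{-1}.
G = np.linalg.inv(H - w * np.eye(N))
m_from_resolvent = np.trace(G) / N

print(m_from_eigs, m_from_resolvent)
```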
As is standard in random matrix theory, we will not compute \(\hat{m}_n\) explicitly. Instead we will derive a fixed point equation. Indeed, we will show
for \(z,w\in {\mathbb {C}}\) with \(\mathrm{Im}(w) > 0\). We will then conclude that \(m_n(z,w)\) converges to a limit, which we denote by \(m(z,w)\). It follows that \(m(z,w)\) satisfies the equation
From (7.15) we will deduce the limiting ESD of \(\hat{{\mathbf {H}}}_n\). Equation (7.15) has appeared previously in [5, 13] and in a slightly different form in [2, Chapter 11]. We refer to Eq. (7.15) as a cubic relation since it can be rewritten as the cubic polynomial equation
In this subsection we will show \(\hat{m}_n\) satisfies (7.14). We begin with the following concentration result for bilinear forms from [24].
Lemma 7.5
(Lemma 3.10 of [24]) Let \((x,y)\) be a random vector in \(\mathbb {C}^2\) where \(x,y\) both have mean zero, unit variance, and satisfy
-
\(\max \{|x|,|y|\} \le L\) a.s.,
-
\({\mathbb {E}}[\bar{x} y] = \rho \).
Let \((x_1, y_1), (x_2, y_2), \ldots , (x_n, y_n)\) be iid copies of \((x,y)\), and set \(X = (x_1, x_2, \ldots , x_n)^\mathrm {T}\) and \(Y=(y_1, y_2, \ldots , y_n)^\mathrm {T}\). Let \({\mathbf {B}}\) be a \(n \times n\) random matrix, independent of \(X\) and \(Y\), which satisfies \(\Vert {\mathbf {B}}\Vert \le n^{1/4}\) a.s. Then, for any \(p \ge 2\),
We formally establish (7.14) in the following theorem.
Theorem 7.6
Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume (7.3) holds for some \(\eta > 0\). Let \(0 < \delta < 1/100\). Consider the truncated random matrices \(\{\hat{{\mathbf {X}}}_n\}_{n \ge 1}\) and \(\{\hat{{\mathbf {H}}}_n(z)\}_{n \ge 1}\). For \(z,w \in \mathbb {C}\) with \(\mathrm{Im}(w) > 0\), define
Let \(M, \beta > 0\). Then, for \(v_n := \max \left\{ n^{-\eta \delta /100}, n^{-1/100} \right\} \), a.s.
In order to prove Theorem 7.6, we will need the following deterministic lemmas.
Lemma 7.7
Let \({\mathbf {R}}\) be the \(2n \times 2n\) block matrix given by
where \({\mathbf {B}}, {\mathbf {R}}_1,{\mathbf {R}}_2,{\mathbf {R}}_3,{\mathbf {R}}_4\) are \(n \times n\) matrices. Then \({{\mathrm{tr}}}{\mathbf {R}}_1 = {{\mathrm{tr}}}{\mathbf {R}}_4\) for any \(w \in \mathbb {C}\) with \(\mathrm{Im}(w) > 0\).
Proof
(of Lemma 7.7) We first note that
is invertible for any \(w \in \mathbb {C}\) with \(\mathrm{Im}(w) > 0\). Let \(\sigma _1, \sigma _2, \ldots , \sigma _n \ge 0\) denote the singular values of \({\mathbf {B}}\). Then \(-w{\mathbf {I}}+ w^{-1} {\mathbf {B}}{\mathbf {B}}^*\) has eigenvalues \(-w + w^{-1}\sigma _i^2\) for \(i=1,2,\ldots ,n\). In particular
for \(\mathrm{Im}(w) > 0\). Thus \(-w{\mathbf {I}}+ w^{-1}{\mathbf {B}}{\mathbf {B}}^*\) is invertible. Similarly, \(-w{\mathbf {I}}+ w^{-1} {\mathbf {B}}^*{\mathbf {B}}\) has the same eigenvalues and is also invertible. By the Schur complement [15, Section 0.7.3],
Since \({\mathbf {R}}_1\) and \({\mathbf {R}}_4\) have the same eigenvalues, \({{\mathrm{tr}}}{\mathbf {R}}_1 = {{\mathrm{tr}}}{\mathbf {R}}_4\). \(\square \)
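Assuming, as in the proof above, that \({\mathbf {R}}\) is the inverse of the block matrix \(\begin{pmatrix} -w{\mathbf {I}} &{} {\mathbf {B}}\\ {\mathbf {B}}^* &{} -w{\mathbf {I}}\end{pmatrix}\), the trace identity \({{\mathrm{tr}}}{\mathbf {R}}_1 = {{\mathrm{tr}}}{\mathbf {R}}_4\) of Lemma 7.7 is easy to confirm numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
w = 0.2 + 1.1j   # Im(w) > 0 guarantees invertibility

# R = inverse of [[-wI, B], [B*, -wI]]
I = np.eye(n)
M = np.block([[-w * I, B], [B.conj().T, -w * I]])
R = np.linalg.inv(M)

R1 = R[:n, :n]   # top-left block
R4 = R[n:, n:]   # bottom-right block
print(np.trace(R1), np.trace(R4))
```

Both traces equal \(\sum _i \big (\tfrac{1}{2}(\sigma _i - w)^{-1} + \tfrac{1}{2}(-\sigma _i - w)^{-1}\big )\), reflecting the \(\pm \sigma _i\) symmetry of the hermitized spectrum.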
We introduce \({\varepsilon }\)-nets as a convenient way to discretize a compact set. Let \({\varepsilon }> 0\). A set \(X\) is an \({\varepsilon }\)-net of a set \(Y\) if for any \(y \in Y\), there exists \(x \in X\) such that \(\Vert x-y\Vert \le {\varepsilon }\). In order to prove Theorem 7.6, we will need the following well-known estimate for the maximum size of an \({\varepsilon }\)-net.
Lemma 7.8
(Lemma 3.11 of [24]) The set \(\{w \in \mathbb {C} : |w| \le \beta , \mathrm{Im}(w) \ge \alpha \}\) admits an \({\varepsilon }\)-net of size at most
We will also take advantage of the following facts, which can be found in [14, 15]. Let \({\mathbf {B}}\) be a \(n \times n\) matrix with singular values \(\sigma _1({\mathbf {B}}) \ge \cdots \ge \sigma _n({\mathbf {B}}) \ge 0\). Then the \(2n \times 2n\) matrix
has an orthonormal basis of eigenvectors with eigenvalues \(\pm \sigma _1({\mathbf {B}}) - w, \ldots , \pm \sigma _n({\mathbf {B}}) - w\). Thus, if \(\mathrm{Im}(w) > 0\), the matrix (7.16) is invertible. Let
Then, for \(\mathrm{Im}(w) > 0\), we have
We will make use of the following identity: for any invertible \(n \times n\) matrices \({\mathbf {A}}\) and \({\mathbf {B}}\),
A special case of (7.18) is the resolvent identity (also known as Hilbert’s identity): for any Hermitian \(n \times n\) matrix \({\mathbf {A}}\),
for all \(w,w' \in {\mathbb {C}}\) with \(\mathrm{Im}(w), \mathrm{Im}(w') > 0\).
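The displayed forms of (7.18) and (7.19) are the standard ones, namely \({\mathbf {A}}^{-1}-{\mathbf {B}}^{-1} = {\mathbf {A}}^{-1}({\mathbf {B}}-{\mathbf {A}}){\mathbf {B}}^{-1}\) and \(({\mathbf {A}}-w)^{-1}-({\mathbf {A}}-w')^{-1} = (w-w')({\mathbf {A}}-w)^{-1}({\mathbf {A}}-w')^{-1}\); a quick numerical sanity check:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)) + np.eye(n) * 5   # shifted to be well-conditioned
B = rng.standard_normal((n, n)) + np.eye(n) * 5

# (7.18): A^{-1} - B^{-1} = A^{-1} (B - A) B^{-1}
lhs = np.linalg.inv(A) - np.linalg.inv(B)
rhs = np.linalg.inv(A) @ (B - A) @ np.linalg.inv(B)

# (7.19), resolvent identity for Hermitian S:
# (S - w)^{-1} - (S - w')^{-1} = (w - w') (S - w)^{-1} (S - w')^{-1}
S = (A + A.T) / 2
w, wp = 0.3 + 1.0j, -0.2 + 0.5j
Rw = np.linalg.inv(S - w * np.eye(n))
Rwp = np.linalg.inv(S - wp * np.eye(n))
lhs2 = Rw - Rwp
rhs2 = (w - wp) * Rw @ Rwp
print(np.max(np.abs(lhs - rhs)), np.max(np.abs(lhs2 - rhs2)))
```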
We are now ready to prove Theorem 7.6.
Proof
(of Theorem 7.6) Fix \(M, \beta > 0\). For the remainder of the proof, the implicit constants in our asymptotic notation (such as \(O,o,\Omega , \ll \)) depend only on the constants \(M, \beta \), \(d\), and \(m_{2+\eta }\); for simplicity, we no longer include these subscripts in our notation.
For notational convenience, we will write \({\mathbf {X}}_n\) instead of \(\hat{{\mathbf {X}}}_n\). That is, we let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) denote the sequence of truncated matrices. Similarly, we write \({\mathbf {X}}_{n,st}\) instead of \(\hat{{\mathbf {X}}}_{n,st}\) for \(s,t \in \{1,\ldots ,d\}\). We define the \(2dn \times 2dn\) matrix
We will often drop the dependence on \(z,w\) and simply write \({\mathbf {G}}_n\) instead of \({\mathbf {G}}_n(z,w)\). We write \({\mathbf {G}}_n = ({\mathbf {G}}_{n,st})_{s,t=1}^{2d}\) where each \({\mathbf {G}}_{n,st}\) is a \(n \times n\) matrix. Then \({\mathbf {G}}_{n,st}(i,j)\) denotes the \((i,j)\)-entry of \({\mathbf {G}}_{n,st}\). We define \(m_{n,st}(z,w) := \frac{1}{n} {{\mathrm{tr}}}{\mathbf {G}}_{n,st}\) for \(s,t \in \{1,\ldots ,2d\}\) and \(m_n(z,w) := \frac{1}{2dn} {{\mathrm{tr}}}{\mathbf {G}}_n\). We will often drop the dependence on \(z,w\) and simply write \(m_n\) and \(m_{n,st}\) instead of \(m_n(z,w)\) and \(m_{n,st}(z,w)\).
Let \(1 \le k \le n\). We let \({\mathbf {r}}_k({\mathbf {X}}_{n,st})\) denote the \(k\)th row of \({\mathbf {X}}_{n,st}\) with the \(k\)th entry removed. Similarly, we let \({\mathbf {c}}_k({\mathbf {X}}_{n,st})\) denote the \(k\)th column of \({\mathbf {X}}_{n,st}\) with the \(k\)th entry removed. We let \({\mathbf {X}}_{n,st}^{(k)}\) be the \((n-1) \times (n-1)\) matrix constructed from \({\mathbf {X}}_{n,st}\) by removing the \(k\)th column and \(k\)th row. We let \({\mathbf {X}}_{n}^{(k)}\) be the \(d(n-1) \times d(n-1)\) block matrix given by \({\mathbf {X}}_{n}^{(k)} := \left( {\mathbf {X}}_{n,st}^{(k)}\right) _{s,t=1}^d\). Define the \(2d(n-1) \times 2d(n-1)\) matrix
We will often drop the dependence on \(z,w\) and simply write \({\mathbf {G}}^{(k)}_n\). We again write \({\mathbf {G}}_n^{(k)} = \left( {\mathbf {G}}_{n,st}^{(k)}\right) _{s,t=1}^{2d}\) where each \({\mathbf {G}}_{n,st}^{(k)}\) is a \((n-1) \times (n-1)\) matrix. We let \(m_{n,st}^{(k)}(z,w) := \frac{1}{n} {{\mathrm{tr}}}{\mathbf {G}}_{n,st}^{(k)}\) for \(s,t \in \{1,\ldots ,2d\}\) and \(m_n^{(k)}(z,w) := \frac{1}{2dn} {{\mathrm{tr}}}{\mathbf {G}}_n^{(k)}\). We will often drop the dependence on \(z,w\) and write \(m_n^{(k)}\) and \(m_{n,st}^{(k)}\) instead of \(m_n^{(k)}(z,w)\) and \(m_{n,st}^{(k)}(z,w)\).
From (7.17), for \(w \in \mathbb {C}\) with \(\mathrm{Im}(w) \ge v_n\), we have the deterministic bounds \(\Vert {\mathbf {G}}_n(z,w)\Vert \le v_n^{-1}\), \(| m_n(z,w) | \le v_n^{-1}\), and \(| m_{n,st}(z,w) | \le v_n^{-1}\). By Cauchy’s interlacing theorem [15, Theorem 4.3.8] (or alternatively [2, (A.1.12)]), we have the deterministic bound
By Lemma 7.7, we have
and
for any \(1 \le k \le n\). Thus, from (7.20), we find
and
Fix \(1 \le k \le n\) and \(z \in \mathbb {C}\) with \(|z| \le M\). Fix \(w \in \mathbb {C}\) with \(|w| \le \beta \) and \(\mathrm{Im}(w) \ge v_n\). Let \({\mathbf {Q}}_k\) be the \(2d \times 2d\) matrix given by \({\mathbf {Q}}_k := ( {\mathbf {G}}_{n,st}(k,k) )_{s,t=1}^{2d}\). By the Schur complement [15, Section 0.7.3],
where \(\delta _{s,t}\) is the Kronecker delta and
By the truncation assumption and Lemma 7.2, we have a.s.
for \(n\) sufficiently large.
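The Schur complement formula invoked above expresses diagonal resolvent blocks through a minor of the matrix. In the paper it is applied to the \(2d \times 2d\) block \({\mathbf {Q}}_k\); the simplest scalar instance of the same formula, for the \((k,k)\) entry of a Hermitian resolvent, reads \({\mathbf {G}}(k,k) = (h_{kk} - w - {\mathbf {a}}_k^* ({\mathbf {H}}^{(k)}-w{\mathbf {I}})^{-1} {\mathbf {a}}_k)^{-1}\) and can be checked numerically (an illustration under that scalar simplification):

```python
import numpy as np

rng = np.random.default_rng(5)
N, k = 6, 2
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / 2
w = 0.1 + 0.8j

G = np.linalg.inv(H - w * np.eye(N))

idx = [i for i in range(N) if i != k]
Hk = H[np.ix_(idx, idx)]   # H with row and column k removed
a = H[idx, k]              # k-th column of H, k-th entry removed
Gk = np.linalg.inv(Hk - w * np.eye(N - 1))

# Schur complement formula for the (k,k) resolvent entry
schur = 1.0 / (H[k, k] - w - a.conj() @ Gk @ a)
print(G[k, k], schur)
```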
We observe that \({\mathbf {R}}_k\) and \({\mathbf {G}}^{(k)}_n\) are independent random matrices. By expanding the product, we note that the entries of \({\mathbf {R}}_k {\mathbf {G}}^{(k)}_n {\mathbf {R}}_k^*\) are linear combinations of bilinear forms. We will apply Lemma 7.5 to control each bilinear form. Applying the bound \(\Vert {\mathbf {G}}_n(z,w) \Vert \le v_n^{-1}\) and Lemma 7.5, we obtain
with probability \(1 - O(n^{-100})\). Here we obtain the bound on the spectral norm by bounding each entry individually and noting that
for any \(2d \times 2d\) matrix \({\mathbf {B}}\) (recall that the matrices above are \(2d \times 2d\)). The bound (7.23) holds with probability \(1 - O(n^{-100})\) by taking \(p\) sufficiently large in Lemma 7.5. The factor \(1/d\) appears because the entries of \({\mathbf {R}}_k\) have variance \(1/d\). We also used (iv) from Lemma 7.2 and the deterministic bound \(\sup _{s,t \in \{1,\ldots ,2d\}} | m_{n,st}^{(k)} | \le \Vert {\mathbf {G}}^{(k)}_n \Vert \le v_n^{-1}\).
By (7.21), (7.22), and the union bound over \(1 \le k \le n\), we obtain
with probability \(1 - O(n^{-99})\), where
We note that
It follows that \(M_n\) is invertible and the inverse is given (in block form) by
where \({\mathbf {M}}_{n,1}, {\mathbf {M}}_{n,2}, {\mathbf {M}}_{n,3}, {\mathbf {M}}_{n,4}\) are \(d \times d\) matrices with
Using (7.25), we obtain
and hence
Since \(\sup _{\mathrm{Im}(w) \ge v_n} \Vert {\mathbf {G}}_n(z,w)\Vert \le v_n^{-1}\), we obtain
Therefore, by (7.24), we have
with probability at least \(1 - O(n^{-99})\). Since \(m_n(z,w)\) is the normalized sum of the diagonal elements of \({\mathbf {G}}_n\), we now consider the diagonal elements of \({\mathbf {Q}}_k\) and \({\mathbf {M}}_n\); from the above estimate, we conclude that
with probability \(1 - O(n^{-99})\). Here we used the fact that
by definition of \(v_n\).
We now use an \({\varepsilon }\)-net argument to extend (7.27) to all \(|z| \le M\) and \(|w| \le \beta \) with \(\mathrm{Im}(w) \ge v_n\). Since \( \sup _{\mathrm{Im}(w) \ge v_n } \Vert {\mathbf {G}}_n(z,w) \Vert \le v_n^{-1}\), we apply (7.18) and the resolvent identity (7.19) to obtain the deterministic bounds
and
for all \(z,z' \in \mathbb {C}\) and \(w,w' \in \mathbb {C}\) with \(\mathrm{Im}(w), \mathrm{Im}(w') \ge v_n\). Applying (7.26) and the triangle inequality, we obtain
for all \(|z| \le M\) and \(|w|, |w'| \le \beta \) with \(\mathrm{Im}(w), \mathrm{Im}(w') \ge v_n\). Similarly,
for all \(|z|, |z'| \le M\) and \(|w| \le \beta \) with \(\mathrm{Im}(w) \ge v_n\).
We now apply an \({\varepsilon }\)-net argument with \({\varepsilon }= v_n^{13}\) to the sets \(\{z \in \mathbb {C} : |z| \le M \}\) and \(\{ w \in \mathbb {C} : |w| \le \beta , \mathrm{Im}(w) \ge v_n\}\). Let \(\mathcal {N}_1\) and \(\mathcal {N}_2\) denote the respective \({\varepsilon }\)-nets of the two sets. By Lemma 7.8,
Therefore, by a standard \({\varepsilon }\)-net argument and the union bound, we conclude that
with probability (say) \(1 - O(n^{-2})\). The claim now follows from an application of the Borel–Cantelli lemma. \(\square \)
7.3 Proof of Theorem 2.4
This subsection is devoted to the proof of Theorem 2.4. With Lemma 7.4 and Theorem 7.6 in hand, the proof of Theorem 2.4 will follow from a standard (and somewhat technical) argument; see [2, Chapter 11], [31], and references therein. We detail the argument below.
Recall the definition of the functions \(g_{{\mathbf {M}}}\) and \(g\) given in (7.1) and (7.2). By [2, Chapter 11] (see also [5] and [13, Section 3]), for each \(z \in \mathbb {C}\), there exists a probability measure \(\nu _z\) on the real line such that
where \(z = s + \sqrt{-1}t\).
Assume \(\{{\mathbf {X}}_n\}_{n \ge 1}\) and \(\{{\mathbf {N}}_n\}_{n \ge 1}\) satisfy the assumptions of Theorem 2.4. By Lemma 7.1 and [2, Lemma 11.5], in order to prove Theorem 2.4, it suffices to show that a.s.
as \(n \rightarrow \infty \).
By the triangle inequality and Lemma 7.3, we have that a.s.
Let \(A > 0\). Define
By [2, Lemma 11.7] and (7.28), in order to prove Theorem 2.4 it suffices to show that for each fixed \(A>0\) a.s.
as \(n \rightarrow \infty \).
Let \({\varepsilon }_n := n^{-B}\) for some sufficiently large \(B>0\) (independent of \(n\)) to be chosen later. Following the integration by parts argument from [2, Section 11.7], it suffices to show that a.s.
and
and similarly with the two-dimensional integral on \(T\) replaced by one-dimensional integrals on the boundary of \(T\). We shall only estimate the two-dimensional integrals, as the treatment of the one-dimensional integrals is similar.
We prove (7.29) first. By (7.28), it follows that \(\nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z {\mathbf {I}}}\) is supported on \([-n^{50}, n^{50}]\) a.s. Thus, it suffices to show that a.s.
By definition of \({\varepsilon }_n\), it suffices to show that a.s.
The bound (7.31) will follow from Lemma 7.9 below.
We now prove (7.30). By Theorem 2.5 (and the Borel–Cantelli lemma), for some sufficiently large \(B > 0\), we have the following:
We now observe that it is possible to switch the quantifiers “a.e.” on \(z\) and “a.s.” on \(\omega \) in (7.32) using the arguments from [5, Section 4] and Fubini’s theorem, where \(\omega \) denotes an element of the sample space. Thus, we have
Using the \(L^2\)-norm argument in [31, Section 12], it follows that a.s.
is bounded uniformly in \(n\), and hence the sequence of functions \(\int _{0}^{{\varepsilon }_n} \log |x|^2 \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z{\mathbf {I}}}\) \((dx)\) is a.s. uniformly integrable on \(T\). Let \(L > 1\) be a large parameter and define \(T_{L,n}\) to be the set of all \(z \in T\) such that \(\left| \int _{0} ^{{\varepsilon }_n} \log |x|^2 \nu _{\frac{1}{\sqrt{n}} ({\mathbf {X}}_n + {\mathbf {N}}_n) - z {\mathbf {I}}}(dx)\right| \le L\). By (7.33) and the dominated convergence theorem, we have a.s.
On the other hand, from the uniform boundedness of (7.34), we obtain a.s.
Combining the bounds above and taking \(L \rightarrow \infty \) yields (7.30).
It remains to establish the following lemma.
Lemma 7.9
Let \(\{{\mathbf {X}}_n\}_{n \ge 1}\) be a sequence of random matrices that satisfies condition C0 with parameter \(d \ge 2\) and atom variables \((\xi _{st})_{s,t=1}^d\), and assume (7.3) holds for some \(\eta > 0\). For each \(n \ge 1\), let \({\mathbf {N}}_n\) be a \(dn \times dn\) matrix such that \({{\mathrm{rank}}}({\mathbf {N}}_n) = O(n^{1-{\varepsilon }})\) for some \({\varepsilon }> 0\). Then there exists \(\alpha > 0\) such that a.s.
where \(\Vert \nu - \mu \Vert := \sup _{x \in \mathbb {R}} | \nu ((-\infty , x)) - \mu ((-\infty , x))|\) for any two probability measures \(\nu , \mu \) on the real line.
Proof
(of Lemma 7.9) The proof of Lemma 7.9 is based on the arguments from [34, Lemma 64]. By [2, Theorem A.43],
Thus, by the triangle inequality, it suffices to show that a.s.
for some \(\alpha > 0\).
From [13, Remark 3.1], it follows that for each \(z \in \mathbb {C}\), \(\nu _z\) has density \(\rho _z\) with
By [2, Lemma B.18], it suffices to show that a.s.
where \(F_z\) is the cumulative distribution function of \(\nu _z\). We remind the reader that \(L(F,G)\) denotes the Levy distance, defined in (7.11), between the distribution functions \(F\) and \(G\).
By Lemma 7.4, it suffices to show that a.s.
where \(\{\hat{{\mathbf {X}}}_n\}_{n \ge 1}\) is the sequence of truncated matrices from Lemma 7.4 for some \(0 < \delta < 1/100\). Let \(m(z,w)\) denote the Stieltjes transform of \(\nu _z\). That is,
From [13, Section 3], it follows that \(m(z,w)\) is a solution of
analytic in the upper-half plane \(\{w \in \mathbb {C} : \mathrm{Im}(w) > 0 \}\).
By [13, Remark 3.1], we choose \(\beta > 100\) sufficiently large (depending only on \(M\)) such that \(\rho _z\) is supported inside the interval \([-\beta /2, \beta /2]\) for all \(|z| \le M\). By Theorem 7.6 and [13, Lemma 2.4], it follows that a.s.
where \(v_n\) is defined in Theorem 7.6.
For the remainder of the proof, we fix a realization in which (7.36) holds. The implicit constants in our asymptotic notation (such as \(O,o,\Omega , \ll \)) depend only on the constants \(M, m_{2+\eta },d\); for simplicity, we no longer include these subscripts in our notation. By [13, (3.2)] and (7.36), it follows that
Write \(w = u + \sqrt{-1}v\). For any interval \(I \subset \mathbb {R}\), we define
where \(\lambda _1(\hat{{\mathbf {H}}}_n(z)), \lambda _2(\hat{{\mathbf {H}}}_n(z)), \ldots , \lambda _{2dn}(\hat{{\mathbf {H}}}_n(z))\) are the eigenvalues of \(\hat{{\mathbf {H}}}_n(z)\). We remind the reader that the eigenvalues of \(\hat{{\mathbf {H}}}_n(z)\) are given by
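The counting function \(N_I\) defined above is straightforward to compute for any Hermitian matrix; a small illustrative sketch (our addition, with a hypothetical helper name `count_eigenvalues`):

```python
import numpy as np

def count_eigenvalues(H, a, b):
    """N_I = #{i : lambda_i(H) in [a, b]} for a Hermitian matrix H."""
    eigs = np.linalg.eigvalsh(H)        # real eigenvalues, ascending order
    return int(np.sum((eigs >= a) & (eigs <= b)))

H = np.diag([-2.0, -0.5, 0.5, 2.0])     # toy Hermitian matrix
print(count_eigenvalues(H, -1.0, 1.0))  # eigenvalues -0.5 and 0.5 lie in [-1, 1]
```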
For an interval \(I \subset [-\beta , \beta ]\) of length \(|I| = v \ge v_n\) centered at \(u\), we have
We conclude that for any interval \(I \subset [-\beta , \beta ]\) of length \(|I| \ge v_n\),
Fix an interval \(I \subset [-\beta , \beta ]\) with \(|I| \ge 10 v_n\). Define
We have
and
for any \(z \in \mathbb {C}\). Therefore, by (7.36), we obtain
We note the following pointwise bounds:
when \(y \notin I\) and \({{\mathrm{dist}}}(y,I) \ge |I|\), and
when \(y \notin I\) and \({{\mathrm{dist}}}(y,I) < |I|\). In the case where \(y \in I\), we have
as \(\frac{1}{\pi } \frac{v_n}{v_n^2 + (x-y)^2}\) has total integral \(1\). Using these bounds, we find
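The normalization invoked above is the standard computation for the Cauchy (Poisson) kernel; for completeness:

```latex
\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{v_n}{v_n^{2}+(x-y)^{2}}\,dx
  \;=\; \frac{1}{\pi}\left[\arctan\!\left(\frac{x-y}{v_n}\right)\right]_{x=-\infty}^{x=\infty}
  \;=\; \frac{1}{\pi}\left(\frac{\pi}{2}+\frac{\pi}{2}\right)\;=\;1.
```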
Similarly, by (7.37), Riemann integration, and the trivial bound \(N_{J} \le 2dn\) for any interval \(J\) outside of \([-\beta , \beta ]\), we have
Combining the bounds above, we conclude that for any interval \(I \subset [-\beta , \beta ]\) with \(|I| \ge 10v_n\), we have
In particular, since \(\rho _z\) is supported inside \([-\beta /2, \beta /2]\), we obtain
where \([-\beta /2, \beta /2]^\mathsf {c}\) is the complement of the interval \([-\beta /2,\beta /2]\). Thus, we have
Since this bound holds for each fixed realization in which (7.36) holds, we obtain (7.35) a.s. The proof of the lemma is complete. \(\square \)
Notes
Here, and throughout the paper, we use asymptotic notation such as \(O,o\) under the assumption that \(n \rightarrow \infty \). See Sect. 2.6 for a complete description of our asymptotic notation.
References
Bai, Z.D., Silverstein, J., Yin, Y.Q.: A note on the largest eigenvalue of large-dimensional sample covariance matrix. J. Multivar. Anal. 26, 166–168 (1988)
Bai, Z.D., Silverstein, J.: Spectral analysis of large dimensional random matrices. In: Mathematics Monograph Series, vol. 2. Science Press, Beijing (2006)
Bai, Z.D.: Circular law. Ann. Probab. 25, 494–529 (1997)
Benaych-Georges, F., Chapon, F.: Random right eigenvalues of Gaussian quaternionic matrices. Random Matrices Theory Appl. 1 (2012)
Bordenave, C., Chafai, D.: Around the circular law. Probab. Surv. 9, 1–89 (2012)
Costello, K., Tao, T., Vu, V.: Random symmetric matrices are almost surely non-singular. Duke Math. J. 135, 395–413 (2006)
Edelman, A.: The probability that a random real Gaussian matrix has \(k\) real eigenvalues, related distributions, and the circular law. J. Multivar. Anal. 60, 203–232 (1997)
Erdős, P.: On a lemma of Littlewood and Offord. Bull. Am. Math. Soc. 51, 898–902 (1945)
Ginibre, J.: Statistical ensembles of complex, quaternion and real matrices. J. Math. Phys. 6, 440–449 (1965)
Girko, V.L.: Circular law. Theory Probab. Appl. 29, 694–706 (1984)
Girko, V.L.: The strong circular law, twenty years later, II. Random Oper. Stoch. Equ. 12(3), 255–312 (2004)
Goldsheid, I., Khoruzhenko, B.A.: The Thouless formula for random non-Hermitian Jacobi matrices. Isr. J. Math. 148, 331–346 (2005)
Götze, F., Tikhomirov, A.: The circular law for random matrices. Ann. Probab. 38(4), 1444–1491 (2010)
Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1985)
Littlewood, J.E., Offord, A.C.: On the number of real roots of a random algebraic equation. III. Rec. Math. Mat. Sbornik N. S. 12, 277–286 (1943)
Mehta, M.L.: Random Matrices and the Statistical Theory of Energy Levels. Academic Press, New York (1967)
Mehta, M.L.: Random Matrices, 3rd edn. Elsevier/Academic Press, Amsterdam (2004)
Nguyen, H., O’Rourke, S.: The elliptic law. arXiv:1208.5883 [math.PR]
Nguyen, H.: Inverse Littlewood–Offord problems and the singularity of random symmetric matrices. Duke Math. J. 161(4), 545–586 (2012)
Nguyen, H., Vu, V.: Optimal Littlewood–Offord theorems. Adv. Math. 226(6), 5298–5319 (2011)
Nguyen, H., Vu, V.: Random matrices: law of the determinant. Ann. Probab. 42(1), 146–167 (2014)
Nguyen, H., Vu, V.: Small probability, inverse theorems, and applications. Bolyai Soc. Math. Stud. 25, 409–463 (2013)
O’Rourke, S., Renfrew, D.: Low rank perturbations of large elliptic random matrices. arXiv:1309.5326 [math.PR]
Pan, G., Zhou, W.: Circular law, extreme singular values and potential theory. J. Multivar. Anal. 101, 645–656 (2010)
Rudelson, M., Vershynin, R.: The Littlewood–Offord problem and invertibility of random matrices. Adv. Math. 218, 600–633 (2008)
Soshnikov, A.: A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices. J. Stat. Phys. 108, 1033–1056 (2002)
Tao, T., Vu, V.: From the Littlewood–Offord problem to the circular law: universality of the spectral distribution of random matrices. Bull. Am. Math. Soc. (N.S.) 46(3), 377–396 (2009)
Tao, T., Vu, V.: Inverse Littlewood–Offord theorems and the condition number of random matrices. Ann. Math. (2) 169(2), 595–632 (2009)
Tao, T., Vu, V.: On the singularity probability of random Bernoulli matrices. J. Am. Math. Soc. 20, 603–628 (2007)
Tao, T., Vu, V.: Random matrices: the circular law. Commun. Contemp. Math. 10, 261–307 (2008)
Tao, T., Vu, V.: Smooth analysis of the condition number and the least singular value. Math. Comput. 79, 2333–2352 (2010)
Tao, T., Vu, V.: Random matrices: universality of ESDs and the circular law. Ann. Probab. 38(5), 2023–2065 (2010)
Tao, T., Vu, V.: Random matrices: universality of local eigenvalue statistics. Acta Math. 206, 127–204 (2011)
Vershynin, R.: Invertibility of symmetric random matrices. Random Struct. Algorithm 44, 135–182 (2014)
Wigner, E.P.: On the distributions of the roots of certain symmetric matrices. Ann. Math. 67, 325–327 (1958)
Zhang, F.: Quaternions and matrices of quaternions. Linear Algebra Appl. 251, 21–57 (1997)
Acknowledgments
The authors are grateful to T. Tao and the anonymous referees for valuable comments and suggestions.
H. Nguyen is partly supported by research grant DMS-1358648. S. O’Rourke is supported by AFOSR grant FA9550-12-1-0083.
Appendix: Proof of Theorems 5.1 and 6.1
We will mainly focus on Theorem 6.1, as the proof of Theorem 5.1 is similar. Assume that \(z_{11},\dots ,z_{dd}\) are random variables satisfying (4.1). Then, as \({\mathbb {E}}|z_{ij}|^{2+\eta }\) is bounded and the covariance matrix \(({\mathbb {E}}z_{ij} {\bar{z}}_{i'j'})_{ij,i'j'}\) is the identity matrix (and thus non-degenerate), we have the following fact (whose proof is left as an exercise).
Claim 8.1
There exists a sufficiently small constant \(\delta >0\) such that the following holds.
(1) There exist measurable sets \(R_1,\dots , R_{d^2}\subset B(0,\delta ^{-1}) \subset {\mathbb {C}}^{d^2}\) such that \({\mathbb {P}}((z_{11},\dots ,z_{dd})\in R_i) \ge \delta \) for all \(1\le i\le d^2\), and for any collection of \(d^2\) vectors \({\mathbf {v}}_1\in R_1,\dots ,{\mathbf {v}}_{d^2}\in R_{d^2}\), the least singular value of the matrix formed from \({\mathbf {v}}_1,\dots ,{\mathbf {v}}_{d^2}\) is at least \(\delta \).
(2) There exist measurable sets \(R_1,\dots ,R_d \subset B(0,\delta ^{-1}) \subset {\mathbb {C}}^d\) such that \({\mathbb {P}}((z_{11},\dots ,z_{1d})\in R_1,\dots , (z_{d1},\dots ,z_{dd})\in R_d)\ge \delta \), and for any collection of \(d\) vectors \({\mathbf {v}}_1\in R_1,\dots ,{\mathbf {v}}_{d}\in R_{d}\), the least singular value of the matrix formed from \({\mathbf {v}}_1,\dots ,{\mathbf {v}}_{d}\) is at least \(\delta \).
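The least singular value appearing in (1) and (2) is \(\sigma_{\min}\) of the matrix whose rows are the given vectors; a minimal numerical sketch (our illustration; the helper name `least_singular_value` is ours):

```python
import numpy as np

def least_singular_value(vectors):
    """sigma_min of the matrix whose rows are the given vectors."""
    A = np.vstack(vectors)
    # singular values from np.linalg.svd come in descending order
    return float(np.linalg.svd(A, compute_uv=False)[-1])

# rows of the 3x3 identity: all singular values equal 1
print(least_singular_value([np.eye(3)[i] for i in range(3)]))
```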
We remark that (1) and (2) are useful for proving Theorems 5.1 and 6.1, respectively. We now sketch the main steps of the proof of Theorem 6.1, following [21].
First, we have
By using the independence of \(X^{(i)}\) and other elementary estimates such as \(|x|\le |x|^2/2+1/2\) and \(|\cos (\pi x)|\le \exp (-2\Vert x\Vert _{{\mathbb {R}}/{\mathbb {Z}}}^2)\), we obtain
where \(X'\) is an iid copy of \(X=(x_{ij})_{1\le i,j\le d}\) and \(Z=X-X'\).
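The two elementary estimates quoted above can be sanity-checked numerically; an illustrative sketch (our addition, not part of the argument):

```python
import numpy as np

# Numerical check of the two elementary estimates used above:
#   |x| <= |x|^2/2 + 1/2                          (AM-GM with 1)
#   |cos(pi x)| <= exp(-2 ||x||^2_{R/Z}),  where ||x||_{R/Z} = dist(x, Z)
xs = np.linspace(-3, 3, 2001)
dist_to_Z = np.abs(xs - np.round(xs))   # distance to the nearest integer

assert np.all(np.abs(xs) <= xs**2 / 2 + 1/2 + 1e-12)
assert np.all(np.abs(np.cos(np.pi * xs)) <= np.exp(-2 * dist_to_Z**2) + 1e-12)
print("both estimates hold on the sampled grid")
```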
By rescaling the \({\mathbf {u}}^{(i)}\) by a factor of \(\beta ^{-1}\), we may assume \(\beta =1\). Set \(M:= 2A \log n\), where \(A\) is sufficiently large. From the fact that \(\gamma \ge n^{-O(1)}\), we easily obtain
For each integer \(0\le m \le M\) we define the level set
Then it follows from (8.1) that there exists \(m\le M\) such that \(\mu (S_m) \ge \gamma \exp (\frac{m}{4}-2)\) and \(\mu (T)\ge c\gamma \exp (\frac{m}{4}-2)m^{-2d}\), where
By a proper discretization, with \(N\) a sufficiently large prime, we obtain the following discrete analog: there exists a subset \(S\) of size at least \(cN^{2d}\gamma \exp (\frac{m}{4}-2)m^{-2d}\) of \(B_1= \{k_1/N + \sqrt{-1}k_2/N: k_1,k_2\in {\mathbb {Z}}, -2N \le k_1,k_2 \le 2N\}\) such that the following holds for any \({\mathbf {s}}\in S\)
Next, by the definition of \(S\),
It then follows from (2) of Claim 8.1 that there exists a matrix \({\mathbf {C}}=(c_{11},\dots ,c_{dd})\) such that \(\delta \le \sigma _d ({\mathbf {C}}) \le \sigma _1({\mathbf {C}}) \le \delta ^{-1}\) and the following holds for some sufficiently large constant \(C\)
where \({\mathbf {v}}^{(i)}:=(c_{11}{\mathbf {u}}^{(i)}_1+\dots + c_{1d}{\mathbf {u}}^{(i)}_d,\dots ,c_{d1}{\mathbf {u}}^{(i)}_1+\dots + c_{dd}{\mathbf {u}}^{(i)}_d)\).
Let \(n'\) be any number between \(n^{\varepsilon }\) and \(n\). We say that an index \(1\le i\le n\) is bad if \(\sum _{{\mathbf {s}}\in S} \Vert \langle {\mathbf {s}},{\mathbf {v}}^{(i)} \rangle \Vert ^2_{{\mathbb {R}}/{\mathbb {Z}}} \ge \frac{Cm|S|}{n'}\). Clearly the number of bad indices is at most \(n'\). Let \(I\) be the set of good indices, and \(V\) be the set of vectors \({\mathbf {v}}^{(i)},i\in I\). Recall that for an arbitrary vector \({\mathbf {v}}\in V\)
Set \(k:=c\sqrt{\frac{n'}{m}}\) for some sufficiently small constant \(c\), and let \(V_k:=k(V\cup \{0\})\). By the Cauchy–Schwarz inequality, for any \({\mathbf {v}}\in V_k\), we have
The last estimate implies that the size of \(V_k\) does not grow fast in terms of \(k\). The treatment from here is identical to [21, Section 6] by using a Freiman-type inverse result [21, Theorem 3.2].
Nguyen, H.H., O’Rourke, S. On the concentration of random multilinear forms and the universality of random block matrices. Probab. Theory Relat. Fields 162, 97–154 (2015). https://doi.org/10.1007/s00440-014-0567-7
Mathematics Subject Classification (2000)
- 15A52
- 15A63
- 11B25