1 Introduction

Suppose \(A_{n}\) is an \(n\times n\) Hermitian matrix and \(\lambda _{1},\lambda _{2},\ldots,\lambda _{n}\) denote the real eigenvalues of \(A_{n}\). The empirical spectral distribution function (ESD) of \(A_{n}\) can be defined as

$$\begin{aligned} F^{A_{n}}(x)=\frac{1}{n}\sum_{i=1}^{n}I_{\{\lambda _{i}\leq x\}}, \end{aligned}$$

where \(I_{A}\) represents the indicator function of the set A. The limit of \(F^{A_{n}}(x)\) as \(n\rightarrow \infty \), if it exists, is called the limiting spectral distribution (LSD) of \(A_{n}\). Since most global limiting spectral properties of \(A_{n}\) are determined by its LSD, the LSD of large dimensional random matrices has attracted considerable interest among mathematicians, probabilists, and statisticians; see, e.g., Wigner [15, 16], Grenander and Silverstein [7], Jonsson [8], Yin and Krishnaiah [18], and Bai and Yin [4].
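For illustration (this is not used in the proofs), the ESD of a given Hermitian matrix can be evaluated numerically from its eigenvalues; the following short Python sketch, in which the Hermitian matrix `A` and the evaluation points `x` are hypothetical user-supplied inputs, is one way to do so.

```python
import numpy as np

def esd(A, x):
    """Empirical spectral distribution F^{A}(x) of a Hermitian matrix A,
    evaluated at each point of the array x."""
    eigs = np.sort(np.linalg.eigvalsh(A))    # real eigenvalues of a Hermitian matrix
    # F^{A}(x) = (1/n) * #{i : lambda_i <= x}
    return np.searchsorted(eigs, x, side="right") / len(eigs)
```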

The Wigner matrix is one of the most basic and popular objects in random matrix theory. A Wigner matrix is a symmetric (or, in the complex case, Hermitian) random matrix whose entries on or above the diagonal are independent random variables. For a Wigner matrix \(X_{n}\) whose entries are i.i.d. real (or complex) random variables with mean zero and variance 1, Wigner [16] proved that the expected ESD of \(W_{n}=\frac{1}{\sqrt{n}}X_{n}\) tends to the limiting distribution \(F_{sc}\), whose density is given by

$$\begin{aligned} f_{sc}(x)= \textstyle\begin{cases} \frac{1}{2\pi }\sqrt{4-x^{2}} &\text{if } \vert x \vert \leq 2, \\ 0 &\text{otherwise}. \end{cases}\displaystyle \end{aligned}$$

The LSD \(F_{sc}\) is usually called the semicircular law in the literature. Grenander [6] proved that \(\|F^{W_{n}}-F_{sc}\|\rightarrow {0}\) in probability. Arnold [1, 2] showed that \(F^{W_{n}}\) converges to \(F_{sc}\) almost surely. Pastur [12] removed the identical-distribution assumption: suppose the entries on or above the diagonal of \(X_{n}\) are independent real or complex random variables with mean zero and variance 1, not necessarily identically distributed, which satisfy the following Lindeberg-type assumption: for any constant \(\eta >0\),

$$\begin{aligned} \lim_{n\rightarrow \infty }\frac{1}{n^{2}}\sum _{i,j}^{n}E \vert X_{ij} \vert ^{2}I\bigl( \vert X_{ij} \vert \geq \eta \sqrt{n} \bigr)=0. \end{aligned}$$
(1.1)

Then the ESD of \(W_{n}\) converges almost surely to the semicircular law.
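As a quick numerical illustration of this classical result (a sketch with an arbitrarily chosen dimension and seed, not part of the arguments below), one may sample a real symmetric Wigner matrix with i.i.d. \(N(0,1)\) entries and compare the eigenvalue histogram of \(W_{n}=\frac{1}{\sqrt{n}}X_{n}\) with the semicircular density \(f_{sc}\):

```python
import numpy as np

n = 2000
rng = np.random.default_rng(0)
# real symmetric Wigner matrix: i.i.d. N(0,1) entries on and above the diagonal
X = rng.standard_normal((n, n))
X = np.triu(X) + np.triu(X, 1).T
W = X / np.sqrt(n)

eigs = np.linalg.eigvalsh(W)
hist, edges = np.histogram(eigs, bins=50, range=(-2.2, 2.2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
# semicircular density (1/2pi) * sqrt(4 - x^2) on [-2, 2]
f_sc = np.where(np.abs(centers) <= 2,
                np.sqrt(np.clip(4 - centers**2, 0, None)) / (2 * np.pi), 0.0)
print(np.mean(np.abs(hist - f_sc)))   # small for large n
```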

In all the results above, the assumption that the entries of the Wigner matrix share a common variance is essential. In practical applications, however, the uniform variance assumption is rather strong. In this paper we remove the uniform variance assumption and establish the same semicircular law under a milder condition on the variances of the entries: the variances need not equal a constant; we only require that the average over rows of the normalized row sums of the variances converges to a positive constant. The result reads as follows.

Theorem 1.1

Let \(W_{n}=\frac{1}{\sqrt{n}}X_{n}\) be a Wigner matrix whose entries on or above the diagonal of \(X_{n}\) are independent real or complex random variables, not necessarily identically distributed. Assume that all the entries of \(X_{n}\) have mean zero and variances \(E|X_{ij}|^{2}=\sigma ^{2} _{ij}\), where the \(\sigma _{ij}\) satisfy \(\frac{1}{n}\sum_{i=1}^{n}|\frac{1}{n}\sum_{j=1}^{n}\sigma _{ij}^{2}-1| \rightarrow {0}\) as \(n\rightarrow \infty \), and that assumption (1.1) holds. Then, almost surely, the ESD of \(W_{n}\) converges weakly to the semicircular law.

Remark 1.1

The result of Theorem 1.1 can be extended to a more general one: if the average over rows of the normalized row sums of the variances converges to a positive constant \(\sigma ^{2}\), that is, \(\frac{1}{n}\sum_{i=1}^{n}|\frac{1}{n}\sum_{j=1}^{n}\sigma _{ij}^{2}-\sigma ^{2}| \rightarrow {0}\), then almost surely the LSD of \(W_{n}\) is the general semicircular law with density

$$\begin{aligned} f_{sc,\sigma }(x)= \textstyle\begin{cases} \frac{1}{2\pi \sigma ^{2}}\sqrt{4\sigma ^{2}-x^{2}} &\text{if } \vert x \vert \leq 2\sigma, \\ 0 &\text{otherwise}.\end{cases}\displaystyle \end{aligned}$$
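To see what the averaged variance condition of Theorem 1.1 allows, here is a minimal simulation sketch with one hypothetical variance profile \(\sigma _{ij}^{2}=1+\sin (2\pi i/n)\sin (2\pi j/n)\): the individual variances range over \([0,2]\), yet every normalized row sum \(\frac{1}{n}\sum_{j}\sigma _{ij}^{2}\) equals 1, so the condition holds, and the eigenvalues essentially fill \([-2,2]\), in line with Theorem 1.1.

```python
import numpy as np

n = 2000
rng = np.random.default_rng(1)
i = np.arange(1, n + 1)
# hypothetical non-constant, symmetric variance profile sigma_ij^2 in [0, 2];
# each normalized row sum (1/n) sum_j sigma_ij^2 equals 1
sigma2 = 1.0 + np.outer(np.sin(2 * np.pi * i / n), np.sin(2 * np.pi * i / n))

X = rng.standard_normal((n, n)) * np.sqrt(sigma2)
X = np.triu(X) + np.triu(X, 1).T          # keep the matrix symmetric
eigs = np.linalg.eigvalsh(X / np.sqrt(n))

# averaged variance condition of Theorem 1.1 and the spectral edges (close to -2 and 2)
print(np.mean(np.abs(sigma2.mean(axis=1) - 1.0)), eigs.min(), eigs.max())
```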

Now we consider the LSD of a sample covariance matrix, which is also an important object in random matrix theory and multivariate statistics. Suppose \(Y_{n}=(Y_{ij})_{n\times N}\) is a real or complex random matrix whose entries \(Y_{ij}\ (i=1,\ldots,n,j=1,\ldots,N)\) are i.i.d. real or complex random variables with mean zero and variance 1. Write \(Y_{j}=(Y_{1j},\ldots,Y_{nj})'\) and \(\mathbf{Y}_{n}=(Y_{1},\ldots,Y_{N})\). Define \(\bar{Y}=\frac{1}{N}\sum_{k=1}^{N}Y_{k}\). Since \(\tilde{S}_{n}=\frac{1}{N-1}\sum_{k=1}^{N}(Y_{k}-\bar{Y})(Y_{k}- \bar{Y})^{*}\) shares the same LSD as \(S_{n}=\frac{1}{N}\sum_{k=1}^{N}Y_{k}Y_{k}^{*}=\frac{1}{N}\mathbf{Y}_{n} \mathbf{Y}_{n}^{*}\), where ∗ denotes the conjugate transpose, we usually work with the sample covariance matrix defined by \(S_{n}=\frac{1}{N}\mathbf{Y}_{n}\mathbf{Y}_{n}^{*}\). The limiting spectral properties of large sample covariance matrices have generated considerable interest in statistics, signal processing, and other disciplines. The first result on the LSD of \(S_{n}\) is due to Marčenko and Pastur [10], who proved that when \(\lim_{n\rightarrow \infty }\frac{n}{N}=y\in (0,\infty )\), the LSD of \(S_{n}\) is the M-P law \(F_{y}^{MP}(x)\) with density

$$\begin{aligned} f^{MP}_{y}(x)= \textstyle\begin{cases} \frac{1}{2\pi xy}\sqrt{(b-x)(x-a)}, &\text{if } a\leq x\leq b, \\ 0, &\text{otherwise},\end{cases}\displaystyle \end{aligned}$$

and has a point mass \(1-1/y\) at the origin if \(y>1\), where \(a=(1-\sqrt{y})^{2}\) and \(b=(1+\sqrt{y})^{2}\). Further work on the M-P law for sample covariance matrices includes Bai and Yin [4], Grenander and Silverstein [7], Jonsson [8], Yin [17], Silverstein [13], and Silverstein and Bai [14]. A typical result (see Theorem 3.9 of Bai and Silverstein [3]) states that when the entries of \(Y_{n}\) are independent random variables with mean zero and variance 1, \(n/N\rightarrow y\in (0,\infty )\), and for any \(\eta >0\),

$$\begin{aligned} \frac{1}{\eta ^{2}nN}\sum_{i,j}E\bigl( \vert Y_{ij} \vert ^{2}I\bigl( \vert Y_{ij} \vert \geq \eta \sqrt{N}\bigr)\bigr)\rightarrow 0, \end{aligned}$$
(1.2)

then the ESD of \(S_{n}\) tends to the M-P law \(F_{y}^{MP}\) almost surely. Note that the uniform variance 1 of the entries of \(Y_{n}\) is again essential in the proof. With the same motivation as for Theorem 1.1, we also consider removing the equal variance condition, which leads to Theorem 1.2 below.
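Before stating it, the M-P law itself can be visualized with a short simulation sketch (the dimensions, seed, and aspect ratio \(y=1/2\) below are arbitrary illustrative choices, not part of the proofs):

```python
import numpy as np

n, N = 1000, 2000                       # aspect ratio y = n / N = 0.5
rng = np.random.default_rng(2)
Y = rng.standard_normal((n, N))
S = Y @ Y.T / N                         # sample covariance matrix S_n = (1/N) Y Y^*

eigs = np.linalg.eigvalsh(S)
y = n / N
a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
hist, edges = np.histogram(eigs, bins=50, range=(a, b), density=True)
x = 0.5 * (edges[:-1] + edges[1:])
f_mp = np.sqrt((b - x) * (x - a)) / (2 * np.pi * x * y)   # M-P density (no point mass since y < 1)
print(np.mean(np.abs(hist - f_mp)))     # small for large n and N
```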

Theorem 1.2

Assume that the entries of the random matrix \(Y_{n}\) defined above are independent random variables with mean zero and variances \(E|Y_{ij}|^{2}=\sigma _{ij}^{2}\), where the \(\sigma _{ij}\) satisfy \(\frac{1}{n}\sum_{i=1}^{n}|\frac{1}{N}\sum_{j=1}^{N}\sigma _{ij}^{2}-1| \rightarrow {0}\). Assume that \(n/N\rightarrow y\in (0,\infty )\) as \(n\rightarrow \infty \) and that assumption (1.2) holds. Then, almost surely, the ESD of the sample covariance matrix \(S_{n}=\frac{1}{N}Y_{n}Y_{n}^{*}\) converges weakly to the M-P law.

Remark 1.2

Likewise, if there exists a positive constant \(\sigma ^{2}>0\) such that \(\frac{1}{n}\sum_{i=1}^{n}|\frac{1}{N}\sum_{j=1}^{N}\sigma _{ij}^{2}- \sigma ^{2}|\rightarrow {0}\), while the other assumptions remain unchanged, then almost surely the LSD of \(S_{n}=\frac{1}{N}Y_{n}Y_{n}^{*}\) is the general M-P law with density

$$\begin{aligned} f^{MP}_{y,\sigma }(x)= \textstyle\begin{cases} \frac{1}{2\pi xy\sigma ^{2}}\sqrt{(\tilde{b}-x)(x-\tilde{a})}, &\text{if } \tilde{a}\leq x\leq \tilde{b}, \\ 0 &\text{otherwise},\end{cases}\displaystyle \end{aligned}$$

and it has a point mass \(1-1/y\) at the origin if \(y>1\), where \(\tilde{a}=\sigma ^{2}(1-\sqrt{y})^{2}\) and \(\tilde{b}=\sigma ^{2}(1+\sqrt{y})^{2}\).
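A hypothetical numerical check in the spirit of Theorem 1.2 (again only a sketch, with an arbitrary non-constant variance profile whose normalized row sums equal 1) looks as follows; the spectrum of \(S_{n}\) still concentrates near the standard M-P support \([(1-\sqrt{y})^{2},(1+\sqrt{y})^{2}]\).

```python
import numpy as np

n, N = 1000, 2000
rng = np.random.default_rng(3)
i = np.arange(1, n + 1)
j = np.arange(1, N + 1)
# hypothetical variance profile: sigma_ij^2 varies in [0.5, 1.5], but every
# normalized row sum (1/N) sum_j sigma_ij^2 equals 1 (the condition of Theorem 1.2)
sigma2 = 1.0 + 0.5 * np.outer(np.sin(2 * np.pi * i / n), np.cos(2 * np.pi * j / N))

Y = rng.standard_normal((n, N)) * np.sqrt(sigma2)
S = Y @ Y.T / N
eigs = np.linalg.eigvalsh(S)

y = n / N
a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
# averaged variance condition and spectral edges (close to a and b)
print(np.mean(np.abs(sigma2.mean(axis=1) - 1.0)), eigs.min(), eigs.max(), (a, b))
```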

The rest of the paper is organized as follows. The proofs of the main results are presented in Sect. 2. Some useful lemmas are listed in the Appendix. In the sequel, when there is no confusion, we may drop the subscript n in the notation of matrices for brevity. \(A^{\ast }\) denotes the conjugate transpose of a matrix A, \(\operatorname{tr}(A)\) denotes the trace of A, and C denotes a positive constant, which may differ from one appearance to another.

2 Proofs

The Stieltjes transform method is mainly adopted to complete the proofs. For a distribution function \(F(x)\), its Stieltjes transform can be defined as

$$\begin{aligned} s_{F}(z)= \int \frac{1}{x-z}\,dF(x),\quad z\in \mathbb{C}^{+}=\{z=u+iv| u\in \mathbb{R}, v>0\}. \end{aligned}$$

Obviously, the Stieltjes transform of the ESD \(F^{A_{n}}(x)\) can be written as

$$\begin{aligned} s_{F^{A_{n}}}(z)=\frac{1}{n}\sum_{i=1}^{n} \frac{1}{\lambda _{i}-z}= \frac{1}{n}\operatorname{tr}(A_{n}-zI_{n})^{-1}, \end{aligned}$$

where \(I_{n}\) is the identity matrix of order n. The continuity theorem for Stieltjes transforms states that, for a sequence of functions of bounded variation \(\{G_{n}\}\) with Stieltjes transforms \(s_{G_{n}}(z)\) and \(G_{n}(-\infty )=0\) for all n, and a function of bounded variation G with \(G(-\infty )=0\) and Stieltjes transform \(s_{G}(z)\), \(G_{n}\) converges vaguely to G if and only if \(s_{G_{n}}(z)\) converges to \(s_{G}(z)\) for every \(z\in \mathbb{C}^{+}\). In view of the fact that the sequence of ESDs of Wigner matrices is tight (see Lytova and Pastur [9]), the weak convergence of the ESDs can be obtained from the convergence of their Stieltjes transforms. Furthermore, if the LSD is a deterministic probability distribution function, then the almost sure convergence of the ESD follows from the almost sure convergence of the Stieltjes transform, which is the basic idea of the following proofs.
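As a concrete instance of this strategy (a sketch reusing the Gaussian Wigner example above), one can compute \(s_{F^{W_{n}}}(z)=\frac{1}{n}\operatorname{tr}(W_{n}-zI_{n})^{-1}\) numerically and compare it with the Stieltjes transform of the semicircular law, which is the root of \(s^{2}+zs+1=0\) with positive imaginary part on \(\mathbb{C}^{+}\), i.e. \(s(z)=\frac{-z+\sqrt{z^{2}-4}}{2}\) for the appropriate branch of the square root.

```python
import numpy as np

def stieltjes_esd(A, z):
    """Stieltjes transform of the ESD of a Hermitian matrix A at z,
    i.e. (1/n) tr (A - z I)^{-1}."""
    eigs = np.linalg.eigvalsh(A)
    return np.mean(1.0 / (eigs - z))

def s_semicircle(z):
    """Stieltjes transform of the semicircular law: the root of
    s^2 + z s + 1 = 0 with positive imaginary part for z in C^+."""
    r = np.sqrt(complex(z) ** 2 - 4)
    s1, s2 = (-z + r) / 2, (-z - r) / 2
    return s1 if s1.imag > 0 else s2

n = 2000
rng = np.random.default_rng(4)
X = rng.standard_normal((n, n))
W = (np.triu(X) + np.triu(X, 1).T) / np.sqrt(n)

z = 0.3 + 1.0j
print(abs(stieltjes_esd(W, z) - s_semicircle(z)))   # small for large n
```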

2.1 Proof of Theorem 1.1

Define \(F^{W_{n}}\) to be the ESD of \(W_{n}\) and \({s}_{n}(z)\) the Stieltjes transform of \(F^{W_{n}}\). Then, by the continuity theorem for Stieltjes transforms, the proof of Theorem 1.1 reduces to showing

$$\begin{aligned} s_{n}(z)\rightarrow s(z),\quad z\in \mathbb{C}^{+} \text{ a.s.}, \end{aligned}$$
(2.1)

where \(s(z)\) is the Stieltjes transform of the semicircular law \(F_{sc}\).

The proof for a real-valued Wigner matrix is almost the same as that for a complex-valued one; that is, all the results as well as the main ingredients of the proof in the real symmetric case remain valid in the Hermitian case with natural modifications. For simplicity, we confine ourselves to real symmetric Wigner matrices. To this end, let \(\hat{W}_{n}=\frac{1}{\sqrt{n}}\hat{X}_{n}\) be a Wigner matrix independent of \(W_{n}\), where the entries of \(\hat{X}_{n}=(\hat{X}_{ij})_{n\times n}\) are independent \(N(0,1)\) random variables. Define \(F^{\hat{W_{n}}}\) to be the ESD of \(\hat{W}_{n}\), and \(\hat{s}_{n}(z)\) the Stieltjes transform of \(F^{\hat{W}_{n}}\). By Theorem 2.9 of Bai and Silverstein [3], almost surely the LSD of \(\hat{W}_{n}\) is the semicircular law \(F_{sc}(x)\), which means

$$\begin{aligned} \hat{s}_{n}(z)\rightarrow s(z),\quad z\in \mathbb{C}^{+},\text{ a.s.} \end{aligned}$$

Thus, (2.1) can be achieved by

$$\begin{aligned} \hat{s}_{n}(z)-s_{n}(z)\rightarrow 0,\quad z\in \mathbb{C}^{+},\text{ a.s.} \end{aligned}$$
(2.2)

In the sequel, we will complete the proof of (2.2) by the following two steps.

  1. (i)

    For any fixed \(z\in \mathbb{C}^{+}\), \(\hat{s}_{n}(z)-E\hat{s}_{n}(z)\rightarrow {0}\) a.s. and \(s_{n}(z)-Es_{n}(z)\rightarrow {0}\) a.s.

  2. (ii)

    For any fixed \(z\in \mathbb{C}^{+}\), \(Es_{n}(z)-E\hat{s}_{n}(z)\rightarrow {0}\).

We begin with step (i). Define \(W_{k}\) to be the principal submatrix of order \((n-1)\) obtained from \(W_{n}\) by removing the kth row and column, and \(\alpha _{k}\) to be the vector obtained from the kth column of \(W_{n}\) by deleting the kth entry. Denote by \(E_{k}(\cdot )\) the conditional expectation with respect to the σ-field generated by the random variables {\(X_{i,j}\), \(i,j>k\)}, with the convention that \(E_{n}s_{n}(z)=Es_{n}(z)\) and \(E_{0}s_{n}(z)=s_{n}(z)\). Then

$$\begin{aligned} s_{n}(z)-Es_{n}(z)=\sum_{k=1}^{n} \bigl[E_{k-1}\bigl(s_{n}(z)\bigr)-E_{k} \bigl(s_{n}(z)\bigr)\bigr]:= \sum_{k=1}^{n} \gamma _{k}. \end{aligned}$$

By Theorem A.5 of Bai and Silverstein [3], we know

$$\begin{aligned} \gamma _{k} = {}&\frac{1}{n}\bigl(E_{k-1} \operatorname{tr}(W_{n}-zI)^{-1}-E_{k} \operatorname{tr}(W_{n}-zI)^{-1}\bigr) \\ ={}&\frac{1}{n}\bigl(E_{k-1}\bigl(\operatorname{tr}(W_{n}-zI)^{-1}- \operatorname{tr}(W_{k}-zI_{n-1})^{-1}\bigr) \\ &{} -E_{k}\bigl(\operatorname{tr}(W_{n}-zI)^{-1}- \operatorname{tr}(W_{k}-zI_{n-1})^{-1}\bigr)\bigr) \\ ={}&\frac{1}{n} \biggl(E_{k-1} \frac{1+\alpha _{k}^{*}(W_{k}- zI_{n-1})^{-2}\alpha _{k}}{-z-\alpha _{k}^{*}(W_{k}- zI_{n-1})^{-1}\alpha _{k}}-E_{k} \frac{1+\alpha _{k}^{*}(W _{k}-zI_{n-1})^{-2}\alpha _{k}}{-z-\alpha _{k}^{*}(W_{k}-zI_{n-1})^{-1}\alpha _{k}} \biggr). \end{aligned}$$

Note that

$$\begin{aligned} & \bigl\vert 1+\alpha _{k}^{*}(W_{k}-zI_{n-1})^{-2} \alpha _{k} \bigr\vert \\ &\quad \leq 1+\alpha _{k}^{*}(W_{k}-zI_{n-1})^{-1}(W_{k}- \bar{z}I_{n-1})^{-1} \alpha _{k} \\ &\quad =v^{-1}\operatorname{Im}\bigl(z+\alpha _{k}^{*}(W_{k}-zI_{n-1})^{-1} \alpha _{k}\bigr), \end{aligned}$$

which implies \(|\gamma _{k}|\leq 2/(nv)\). Since \(\{\gamma _{k}, k\geq 1\}\) forms a martingale difference sequence, it follows from Lemma A.1 with \(p=4\) that

$$\begin{aligned} E{ \bigl\vert s_{n}(z)-Es_{n}(z) \bigr\vert }^{4} \leq K_{4}E \Biggl(\sum _{k=1}^{n} \vert \gamma _{k} \vert ^{2} \Biggr)^{2} \leq K_{4} \Biggl(\sum _{k=1}^{n}\frac{4}{n^{2}v^{2}} \Biggr)^{2} \leq \frac{16K_{4}}{n^{2}v^{4}}=O\bigl(n^{-2}\bigr), \end{aligned}$$

which, together with the Borel–Cantelli lemma, yields

$$\begin{aligned} s_{n}(z)-Es_{n}(z)\rightarrow {0},\quad \text{a.s.} \end{aligned}$$

for every fixed \(z\in \mathbb{C}^{+}\).

Similarly, we also get

$$\begin{aligned} \hat{s}_{n}(z)-E\hat{s}_{n}(z) \rightarrow {0},\quad \text{a.s.} \end{aligned}$$

for every fixed \(z\in \mathbb{C}^{+}\). Therefore, step (i) is completed.

Now we come to step (ii). We firstly introduce some notation:

$$\begin{aligned} &X(s)=s^{1/2}X+(1-s)^{1/2}\hat{X},\quad 0\leq {s}\leq {1},\qquad G(z)= \biggl( \frac{1}{\sqrt{n}}X-zI\biggr)^{-1}, \\ &\hat{G}(z)=\biggl(\frac{1}{\sqrt{n}}\hat{X}-zI\biggr)^{-1},\qquad G(s,z)= \biggl( \frac{1}{\sqrt{n}}X(s)-zI\biggr)^{-1}. \end{aligned}$$

By the facts that \(X(1)=X\), \(X(0)=\hat{X}\), we can write

$$\begin{aligned} & Es_{n}(z)-E\hat{s}_{n}(z) \\ & \quad= \frac{1}{n}E \bigl(\operatorname{tr}G(z)-\operatorname{tr}\hat{G}(z) \bigr) \\ &\quad= \int _{0}^{1}\frac{\partial }{\partial {s}}E \biggl( \frac{1}{n}\operatorname{tr}G(s,z) \biggr)\,ds \\ &\quad=\frac{1}{2n^{3/2}} \int _{0}^{1}Etr\frac{\partial }{\partial {z}}G(s,z) \bigl((1-s)^{-1/2} \hat{X}-s^{-1/2}X\bigr)\,ds. \end{aligned}$$
(2.3)
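Before differentiating further, the role of the interpolating matrix can be illustrated by a tiny numerical sketch (hypothetical sizes and seed): at \(s=1\) the normalized trace of the resolvent equals \(s_{n}(z)\), at \(s=0\) it equals \(\hat{s}_{n}(z)\), and intermediate values of s interpolate between the two, which is exactly what the integral representation (2.3) exploits.

```python
import numpy as np

n = 500
rng = np.random.default_rng(5)

def sym_rademacher(n, rng):
    A = rng.choice([-1.0, 1.0], size=(n, n))   # mean-zero, variance-one entries
    return np.triu(A) + np.triu(A, 1).T

def sym_gauss(n, rng):
    A = rng.standard_normal((n, n))
    return np.triu(A) + np.triu(A, 1).T

X = sym_rademacher(n, rng)       # a non-Gaussian Wigner matrix
X_hat = sym_gauss(n, rng)        # the Gaussian comparison matrix
z = 0.3 + 1.0j

def trace_resolvent(s):
    Xs = np.sqrt(s) * X + np.sqrt(1 - s) * X_hat        # X(s) = s^{1/2} X + (1-s)^{1/2} X_hat
    G = np.linalg.inv(Xs / np.sqrt(n) - z * np.eye(n))  # G(s, z)
    return np.trace(G) / n

# s = 0 gives \hat{s}_n(z), s = 1 gives s_n(z); s -> (1/n) tr G(s, z) is smooth in between
print([trace_resolvent(s) for s in (0.0, 0.5, 1.0)])
```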

Denote \(G'=\frac{\partial }{\partial {z}}G(s,z)\). Write the \((i,j)\)-entry of \(G'\) by \(G'_{ij}\) and the \((i,j)\)-entry of \(X(s)\) by \(X_{ij}(s)\). Since the random variables \(\hat{X}_{ij}\) are independent \(N(0,1)\) random variables, applying the Stein equation in Lemma A.2 with \(\varPhi =G_{ij}'\), we have

$$\begin{aligned} & Etr\frac{\partial }{\partial {z}}G(s,z) (1-s)^{-1/2}\hat{X} \\ &\quad = (1-s)^{-1/2}\sum_{i,j}E\bigl( \hat{X}_{ij}G'_{ij}\bigr) \\ &\quad=(1-s)^{-1/2}\sum_{i,j} \bigl(E \vert \hat{X}_{ij} \vert ^{2}E\bigl(D_{ij}{(s)}G'_{i j} \bigr) (1-s)^{1/2}n^{-1/2} \bigr) \\ &\quad=n^{-1/2}\sum_{i,j}E\bigl(D_{ij}{(s)}G'_{ij} \bigr), \end{aligned}$$

where \(D_{ij}{(s)}={\partial }/{\partial X_{ij}(s)}\).

On the other hand, as the random variables \(X_{ij}\) are independent, we will adopt the generalized Stein equation in Lemma A.3 to rewrite the second term in the parentheses of the r.h.s. of (2.3). To this end, we will take \(p=1\) and \(\varPhi =G'_{ij}\) in Lemma A.3. Note that \(\kappa _{1}=EX_{ij}=0\) and \(\kappa _{2}=E|X_{ij}|^{2}\). Then we have

$$\begin{aligned} & Etr\frac{\partial }{\partial {z}}G(s,z)s^{-1/2}X \\ & \quad= s^{-1/2}\sum_{i,j}E \bigl(X_{ij}G'_{ji}\bigr) \\ &\quad=s^{-1/2}\sum_{i,j} \bigl(E \vert X_{ij} \vert ^{2}E\bigl(D_{ij}{(s)}G'_{ji} \bigr)s^{1/2}n^{-1/2}+ \varepsilon _{ij} \bigr) \\ &\quad=n^{-1/2}\sum_{i,j}E \vert X_{ij} \vert ^{2}E\bigl(D_{ij}{(s)}G'_{ji} \bigr)+s^{-1/2} \varepsilon \end{aligned}$$

and

$$\begin{aligned} \vert \varepsilon \vert := \biggl\vert \sum_{i,j} \varepsilon _{ij} \biggr\vert \leq C\frac{s}{n}\sum _{i,j}E \vert X_{ij} \vert ^{3} \sup _{X(s)\in {\wp _{n}}} \bigl\vert D_{ij}^{2}(s)G'_{ji} \bigr\vert , \end{aligned}$$

where \(\wp _{n}\) is the set of \(n\times n\) real symmetric matrices.

Thus

$$\begin{aligned} & \bigl\vert Es_{n}(z)-E\hat{s}_{n}(z) \bigr\vert \\ &\quad\leq \frac{1}{2n^{3/2}} \int _{0}^{1} \biggl(\frac{1}{n^{1/2}} \biggl\vert \sum_{i,j} \bigl\{ \bigl(1-E \vert X_{ij} \vert ^{2}\bigr)E\bigl(D_{ij}{(s)}G'_{ji} \bigr)\bigr\} \biggr\vert +\frac{1}{s^{1/2}} \vert \varepsilon \vert \biggr)\,ds \\ &\quad=\frac{1}{2n^{2}} \int _{0}^{1} \biggl\vert \sum _{i,j}\bigl\{ \bigl(1-E \vert X_{ij} \vert ^{2}\bigr)E\bigl(D_{ij}{(s)}G'_{ji} \bigr) \bigr\} \biggr\vert \,ds+ \frac{1}{2n^{3/2}} \int _{0}^{1}\frac{1}{s^{1/2}} \vert \varepsilon \vert \,ds \\ &\quad:=I+\mathit{II}. \end{aligned}$$
(2.4)

By (3.25) in Lytova and Pastur [9],

$$\begin{aligned} \bigl\vert D_{ij}^{(l)}G'_{ij} \bigr\vert \leq c_{l}/{v^{(l+2)}}, \end{aligned}$$
(2.5)

where \(c_{l}\) is an absolute constant for every l. Taking \(l=1\), we get \(|D_{ij}G'_{ij}|\leq c_{1}/{v^{3}} \), and

$$\begin{aligned} \bigl\vert E\bigl(D_{ij}{(s)}G'_{ji}\bigr) \bigr\vert \leq \sup _{s} \bigl\vert D_{ij}(s)G'_{ij} \bigr\vert \leq c_{1}/{v^{3}}. \end{aligned}$$

Since \(E|X_{ij}|^{2}=\sigma _{ij}^{2}\) and, by the condition of Theorem 1.1, \(\sigma _{ij}\) satisfies \(\frac{1}{n}\sum_{i=1}^{n}|\frac{1}{n}\sum_{j=1}^{n}\sigma _{ij}^{2}-1| \rightarrow {0} \), we easily get

$$\begin{aligned} \frac{1}{n^{2}} \biggl\vert \sum_{i,j} \bigl(1-E \vert X_{ij} \vert ^{2}\bigr) \biggr\vert = \frac{1}{n^{2}} \biggl\vert n^{2}- \sum _{i,j}\sigma _{ij}^{2} \biggr\vert \leq \frac{1}{n}\sum_{i}^{n} \Biggl\vert 1- \frac{1}{n}\sum_{j}^{n}\sigma _{ij}^{2} \Biggr\vert \rightarrow 0, \end{aligned}$$

and then

$$\begin{aligned} I=o(1). \end{aligned}$$
(2.6)

We have

$$ \mathit{II}\leq \frac{C}{2n^{3/2}} \int _{0}^{1}\frac{s^{1/2}}{n}\sum _{i,j}E \vert X_{ij} \vert ^{3} \sup _{X(s)\in {\wp _{n}}} \bigl\vert D_{ij}^{2}(s)G'_{ij} \bigr\vert \,ds. $$

By the assumption (1.1), we select a sequence \(\eta _{n}\downarrow {0}\) as \(n\rightarrow {\infty }\), such that

$$\begin{aligned} \lim_{n\rightarrow \infty }\frac{1}{n^{2}\eta _{n}^{2}}\sum_{i,j}E \vert X_{ij} \vert ^{2}I\bigl( \vert X_{ij} \vert \geq \eta _{n} \sqrt{n}\bigr)=0. \end{aligned}$$

The convergence rate of \(\eta _{n}\) can be as slow as desired; for definiteness, we may assume that \(\eta _{n} > 1/\log n\). Then we have

$$ \mathit{II}\leq \frac{C\eta _{n}}{2n^{2}} \int _{0}^{1}s^{1/2}\sum _{i,j}E \vert X_{ij} \vert ^{2} \sup _{X(s)\in {\wp _{n}}} \bigl\vert D_{ij}^{2}(s)G'_{ij} \bigr\vert \,ds. $$

Since

$$\begin{aligned} \biggl\vert \frac{1}{n^{2}}\sum_{i,j}E \vert X_{ij} \vert ^{2}-1 \biggr\vert \leq {}&\frac{1}{n^{2}} \biggl\vert \sum_{i,j} \sigma _{ij}^{2}- n^{2} \biggr\vert \\ ={}&\frac{1}{n} \biggl\vert n-\frac{1}{n}\sum _{i,j}\sigma _{ij}^{2} \biggr\vert \leq \frac{1}{n}\sum_{i=1}^{n} \Biggl\vert 1-\frac{1}{n}\sum_{j=1}^{n}\sigma _{ij}^{2} \Biggr\vert \rightarrow 0, \end{aligned}$$

we obtain \(\frac{1}{n^{2}}\sum_{i,j}E|X_{ij}|^{2}=1+o(1)\). Moreover, taking \(l=2\) in (2.5) gives

$$\begin{aligned} \sup_{X(s)\in {\wp _{n}}} \bigl\vert D_{ij}^{2}(s)G'_{ij} \bigr\vert \leq c_{2}/{v^{4}}. \end{aligned}$$

So we have

$$\begin{aligned} \mathit{II}\leq \frac{1}{2} \int _{0}^{1}s^{1/2}\eta _{n} \biggl(\frac{1}{n^{2}}\sum_{i,j}E \vert X_{ij} \vert ^{2}\biggr) \sup_{X(s)\in {\wp _{n}}} \bigl\vert D_{ij}^{2}(s)G'_{ji} \bigr\vert \,ds \leq C\eta _{n}+o(1). \end{aligned}$$

Since \(\eta _{n}\rightarrow 0\) as \(n\rightarrow \infty \), we have \(\mathit{II}=o(1)\). This, together with (2.4) and (2.6), shows that

$$\begin{aligned} Es_{n}(z)-E\hat{s}_{n}(z)\rightarrow 0 \end{aligned}$$

for any fixed \(z\in \mathbb{C}^{+}\). Step (ii) is completed.

Combining steps (i) and (ii) proves (2.2), which, together with \(\hat{s}_{n}(z)\rightarrow s(z)\) a.s., yields (2.1). Therefore,

$$ F^{W_{n}}\mathop{\rightarrow}^{w}F_{sc}, \quad\text{a.s.}, $$

and the proof of Theorem 1.1 is complete.

2.2 Proof of Theorem 1.2

We consider the real-valued sample covariance matrix case and follow a procedure similar to that of Theorem 1.1 to complete the proof. To this end, we first define \(\hat{Y}_{n}=(\hat{Y}_{ij})_{n\times N}\) to be an \(n\times N\) random matrix independent of \(Y_{n}\), whose entries \(\hat{Y}_{ij}\) are i.i.d. \(N(0,1)\) random variables. Write \(\hat{S}_{n}=\frac{1}{N}\hat{Y}_{n}\hat{Y}_{n}^{*}\). We use \(F^{S_{n}}\) and \(F^{\hat{S}_{n}}\) to denote the ESDs of \(S_{n}\) and \(\hat{S}_{n}\), respectively. Let \(m_{n}(z)\) and \(\hat{m}_{n}(z)\) be the Stieltjes transforms of \(F^{S_{n}}\) and \(F^{\hat{S}_{n}}\), respectively.

By Theorem 3.10 of Bai and Silverstein [3], we have

$$ \hat{m}_{n}(z)\rightarrow m(z), \quad\text{a.s.}, $$

where \(m(z)\) is the Stieltjes transform of the standard M-P law \(F_{y}^{MP}\). Thus, by the continuity theorem for Stieltjes transforms again, we complete the proof by showing that, for any fixed \(z\in \mathbb{C}^{+}\),

  1. (i)

    \(m_{n}(z)-Em_{n}(z)\rightarrow 0\), a.s. and \(\hat{m}_{n}(z)-E\hat{m}_{n}(z)\rightarrow 0\), a.s.

  2. (ii)

    \(Em_{n}(z)-E\hat{m}_{n}(z)\rightarrow {0}\).

For (i), we prove it by an argument similar to that of Bai and Silverstein [3]. For the sake of completeness, we give the proof. Let \(\widetilde{E}_{k}(\cdot )\) denote the conditional expectation given \(\{Y_{1},\ldots ,Y_{k}\}\), with the convention that \(\widetilde{E}_{N}m_{n}(z)=m_{n}(z)\) and \(\widetilde{E}_{0}m_{n}(z)=Em_{n}(z)\). Then

$$\begin{aligned} m_{n}(z)-Em_{n}(z) =\sum_{k=1}^{N} \bigl(\widetilde{E}_{k}m_{n}(z)- \widetilde{E}_{k-1}m_{n}(z) \bigr):=\sum_{k=1}^{N}\widetilde{\gamma }_{k}, \end{aligned}$$

where

$$\begin{aligned} \widetilde{\gamma }_{k}&=\frac{1}{n} \bigl(\widetilde{E}_{k}\operatorname{tr}(S_{n}-zI_{n})^{-1}- \widetilde{E}_{k-1}{\operatorname{tr}(S_{n}-zI_{n})}^{-1} \bigr) \\ &=\frac{1}{n} \bigl((\widetilde{E}_{k}-\widetilde{E}_{k-1}) \bigl({\operatorname{tr}(S_{n}-zI_{n})}^{-1}-{ \operatorname{tr}(S_{nk}-zI_{n})}^{-1}\bigr) \bigr) \\ &=-\frac{1}{n} \biggl((\widetilde{E}_{k}-\widetilde{E}_{k-1}) \frac{\mathbf{y}^{*}_{k}(S_{nk}-zI_{n})^{-2}\mathbf{y}_{k}}{1+\mathbf{y}^{*}_{k}(S_{nk}-zI_{n})^{-1}\mathbf{y}_{k}} \biggr). \end{aligned}$$

Here \(S_{nk}=S_{n}-\mathbf{y}_{k}\mathbf{y}_{k}^{*}\), where \(\mathbf{y}_{k}=\frac{1}{\sqrt{N}}Y_{k}\) denotes the normalized kth column of \(Y_{n}\), and

$$\begin{aligned} \biggl\vert \frac{\mathbf{y}^{*}_{k}{(S_{nk}-zI_{n})}^{-2}\mathbf{y}_{k}}{1+\mathbf{y}^{*}_{k}{(S_{nk}-zI_{n})}^{-1}\mathbf{y}_{k}} \biggr\vert \leq \frac{\mathbf{y}^{*}_{k}{((S_{nk}-uI_{n})^{2}+v^{2}I_{n})}^{-1}\mathbf{y}_{k}}{\operatorname{Im}{(1+\mathbf{y}^{*}_{k}{(S_{nk}-zI_{n})}^{-1}\mathbf{y}_{k})}} = \frac{1}{v}. \end{aligned}$$

Note that \(\{\widetilde{\gamma }_{k},k\geq 1\}\) forms a sequence of bounded martingale differences.

By Lemma A.1 with \(p=4\), we have

$$\begin{aligned} E{ \bigl\vert m_{n}(z)-Em_{n}(z) \bigr\vert }^{4} \leq K_{4}E \Biggl(\sum _{k=1}^{N} \vert \widetilde{\gamma }_{k} \vert ^{2} \Biggr)^{2} \leq \frac{16K_{4}N^{2}}{v^{4}n^{4}}=O \bigl(N^{-2}\bigr). \end{aligned}$$

By the Borel–Cantelli lemma again, we see that almost surely \(m_{n}(z)-Em_{n}(z)\rightarrow 0\). By the same argument, we get \(\hat{m}_{n}(z)-E\hat{m}_{n}(z)\rightarrow 0\), a.s., which means (i) is completed.

Then we come to the proof of (ii). We firstly introduce some notation. For \(0\leq {s}\leq {1}\),

$$\begin{aligned} &H(s)=s^{1/2}Y+(1-s)^{1/2}\hat{Y},\qquad V(s)=\frac{1}{\sqrt{N}}H(s), \\ &J(s)=V(s)V^{*}(s),\qquad U(z,s)=\bigl(J(s)-zI\bigr)^{-1}, \\ &M_{0}(z,s)=\frac{1}{n}\operatorname{tr} {U}(z,s),\qquad {U}'=\frac{\partial }{\partial z}{U}(z,s). \end{aligned}$$

By the same procedure as in (2.3), we have

$$\begin{aligned} & Em_{n}(z)-E\hat{m}_{n}(z) \\ &\quad= \int _{0}^{1}\frac{\partial }{\partial {s}}E \bigl(M_{0}(z,s)\bigr)\,ds \\ &\quad=\frac{1}{2nN^{1/2}} \int _{0}^{1}Etr \bigl(\bigl({(1-s)}^{-1/2} \hat{Y}-s^{-1/2}Y\bigr)V^{*}(s)U' \bigr)\,ds. \end{aligned}$$
(2.7)

It follows by Lemma A.2 with \(\varPhi =(V^{*}(s)U')_{ij}\) that

$$\begin{aligned} & Etr \bigl({(1-s)}^{-1/2}\hat{Y}V^{*}(s)U' \bigr) \\ &\quad = (1-s)^{-1/2}\sum_{i,j}E\bigl( \hat{Y}_{ij}\bigl(V^{*}(s)U'\bigr)_{ij} \bigr) \\ &\quad=\frac{1}{N^{1/2}}\sum_{i,j}E \vert \hat{Y}_{ij} \vert ^{2}E \bigl(D_{ij}(s) \bigl(V^{*}(s)U'\bigr)_{ij} \bigr) , \end{aligned}$$

where \(D_{ij}{(s)}={\partial }/{\partial V_{ij}(s)}\).

By Lemma A.3 with \(p=1\) and \(\varPhi =(V^{*}(s)U')_{ij}\) again, we can see by \(\kappa _{1}=EY_{ij}=0\) and \(\kappa _{2}=E|Y_{ij}|^{2}\) that

$$\begin{aligned} & Etr\bigl(s^{-1/2}Y\bigl(V^{*}(s)U'\bigr)\bigr) \\ &\quad= s^{-1/2}\sum_{i,j}E\bigl(Y_{ij} \bigl(V^{*}(s)U'\bigr)_{ij}\bigr) \\ &\quad= s^{-1/2}\sum_{i,j} \biggl( \frac{1}{N^{1/2}}s^{1/2}E \vert Y_{ij} \vert ^{2}E \bigl(D_{ij}\bigl(V^{*}(s)U' \bigr)_{ij} \bigr)+\varepsilon _{ij} \biggr) \\ &\quad=\frac{1}{N^{1/2}}\sum_{i,j}E \vert Y_{ij} \vert ^{2}E\bigl(D_{ij} \bigl(V^{*}(s)U'\bigr)_{ij}\bigr)+s^{-1/2} \varepsilon _{0} \end{aligned}$$

and

$$\begin{aligned} \vert \varepsilon _{0} \vert := \biggl\vert \sum _{i,j}\varepsilon _{ij} \biggr\vert \leq C \frac{s}{N}\sum_{i,j}E \vert Y_{ij} \vert ^{3}\sup_{V\in \mathcal{M}_{n,N}} \bigl\vert D_{ij}^{2}{\bigl(V^{*}U' \bigr)}_{ij} \bigr\vert , \end{aligned}$$

where \(\mathcal{M}_{n,N}\) is the set of \(n\times N\) real matrices.

By (2.7), we have

$$\begin{aligned} & \bigl\vert Em_{n}(z)-E\hat{m}_{n}(z) \bigr\vert \\ &\quad\leq \frac{1}{2nN^{1/2}} \int _{0}^{1} \biggl(\frac{1}{N^{1/2}} \biggl\vert \sum_{i,j}\bigl[\bigl(1-E \vert Y_{ij} \vert ^{2}\bigr)E\bigl(D_{ij} \bigl(V^{*}U'\bigr)_{ij}\bigr)\bigr] \biggr\vert + \frac{1}{s^{1/2}} \vert \varepsilon _{0} \vert \biggr) \,ds \\ &\quad=\frac{1}{2nN} \int _{0}^{1} \biggl\vert \sum _{i,j}\bigl\{ \bigl(1-E \vert Y_{ij} \vert ^{2}\bigr)E\bigl(D_{ij}\bigl(V^{*}U' \bigr)_{ij}\bigr) \bigr\} \biggr\vert \,ds+\frac{1}{2nN^{1/2}} \int _{0}^{1}\frac{1}{s^{1/2}} \vert \varepsilon _{0} \vert \,ds \\ &\quad:=\widehat{I}+\widehat{\mathit{II}}. \end{aligned}$$
(2.8)

The bound on \(|D_{ij}^{r}(V^{*}U')_{ij}|, r=1,2\), is critical for the proof. Since \((V^{*}U')_{ij}\) is analytic in \(z\in \mathbb{C}^{+}\), by the Cauchy inequality for derivatives of analytic functions in Lemma A.4, to bound \(D_{ij}^{r}(V^{*}U')_{ij}, r=1,2\), on any compact subset of \(\mathbb{C}^{+}\) it suffices to bound \(D_{ij}^{r}(V^{*}U)_{ij}\) on that subset. By elementary calculations, we obtain the derivatives of \(V^{*}U\) with respect to the entries \(V_{ij}\), \(i = 1, 2,\ldots, n\), \(j = 1, 2,\ldots, N\):

$$\begin{aligned} &D_{ij}\bigl(V^{*}U\bigr)_{ij}=D_{ij}(UV)_{ij}=U_{ii}- \bigl(V^{*}UV\bigr)_{jj}U_{ii}-(UV)^{2}_{ij}, \\ &D_{ij}^{2}\bigl(V^{*}U\bigr)_{ij}=-6U_{ii}(UV)_{ij}+6U_{ii}(UV)_{ii} \bigl(V^{*}UV\bigr)_{jj}+2(UV)^{3}_{ij}. \end{aligned}$$

As \(U=(VV^{*}-zI_{n\times n})^{-1}\), it follows that \(\|U\|\leq \frac{1}{v}\) and \( \vert U_{ii} \vert \leq \frac{1}{v} \). Define \(\widetilde{U}=(V^{*}V-zI_{N\times N})^{-1}\). We also have \(\|\widetilde{U}\|\leq \frac{1}{v} \). By the facts that \(V\widetilde{U}=UV\) and \(V^{*}V\widetilde{U}=V^{*}UV\), we get

$$\begin{aligned} V^{*}V\widetilde{U}=I_{N\times N}+z\widetilde{U} \end{aligned}$$

and

$$\begin{aligned} \bigl\vert {\bigl(V^{*}UV\bigr)_{jj}} \bigr\vert = \bigl\vert {\bigl(V^{*}V\widetilde{U}\bigr)_{jj}} \bigr\vert \leq 1+\frac{ \vert z \vert }{v}. \end{aligned}$$

By the Cauchy–Schwarz inequality, we have

$$\begin{aligned} \bigl\vert (UV)_{ij} \bigr\vert \leq \bigl(U^{*}VV^{*}U^{*}\bigr)^{1/2}_{ii} \leq \biggl(\biggl(1+\frac{ \vert z \vert }{v}\biggr) \frac{1}{v} \biggr)^{1/2}. \end{aligned}$$

It follows by the Cauchy inequality in Lemma A.4 that

$$\begin{aligned} \sup_{V\in \mathcal{M}_{n,N}} \bigl\vert D_{ij}\bigl(V^{*}U' \bigr)_{ij} \bigr\vert =O(1) \end{aligned}$$

and

$$\begin{aligned} \sup_{V\in \mathcal{M}_{n,N}} \bigl\vert D_{ij}^{2} \bigl(V^{*}U'\bigr)_{ij} \bigr\vert =O(1) \end{aligned}$$
(2.9)

hold uniformly on any compact set of \(\mathbb{C}^{+}\).

Since \(\frac{1}{n}\sum_{i=1}^{n}|\frac{1}{N}\sum_{j=1}^{N}\sigma _{ij}^{2}-1| \rightarrow {0} \) by the assumption in Theorem 1.2,

$$\begin{aligned} \frac{1}{nN} \biggl\vert \sum_{i,j} \bigl(1-E \vert Y_{ij} \vert ^{2}\bigr) \biggr\vert = \frac{1}{nN} \biggl\vert nN-\sum_{i,j} \sigma _{ij}^{2} \biggr\vert \leq \frac{1}{n}\sum _{i}^{n} \Biggl\vert 1-\frac{1}{N}\sum _{j}^{N} \sigma _{ij}^{2} \Biggr\vert \rightarrow 0, \end{aligned}$$

which, together with (2.8) and (2.9), yields \(\widehat{I}=o(1)\).

By the assumption (1.2), as in the proof of Theorem 1.1, we may select a sequence \(\eta _{n}\downarrow {0}\) with \(\eta _{n} > 1/\log N\) as \(n\rightarrow {\infty }\). Then

$$\begin{aligned} \widehat{\mathit{II}} \leq \frac{C\eta _{n}}{2nN} \int _{0}^{1}s^{1/2}\sum _{i,j}E \vert Y_{ij} \vert ^{2} \sup _{V\in \mathcal{M}_{n,N}} \bigl\vert D_{ij}^{2}{ \bigl(V^{*}U'\bigr)}_{ij} \bigr\vert \,ds. \end{aligned}$$

We also easily get \(\frac{1}{nN}\sum_{i,j}E|Y_{ij}|^{2}=1+o(1)\). Using (2.9) again, we can see

$$\begin{aligned} \bigl\vert Em_{n}(z)-E\hat{m}_{n}(z) \bigr\vert \leq C \eta _{n}+o(1). \end{aligned}$$

As \(\eta _{n}\rightarrow 0\), we have

$$ Em_{n}(z)-E\hat{m}_{n}(z)\rightarrow {0} $$

for any \(z\in \mathbb{C}^{+}\), which completes the proof of (ii).

Based on steps (i) and (ii), we conclude that

$$ F^{S_{n}}\mathop{\rightarrow}^{w}F_{y}^{MP},\quad \text{a.s.} $$

The proof of Theorem 1.2 is complete.