1 Introduction

In 1843, Hamilton introduced the hypercomplex numbers of rank 4, which he named quaternions (see Kuipers 1999). Research on quaternion matrices can be traced back to Wolf (1936). After a long dormant period, it gradually became clear that quaternions and quaternion matrices play important roles in quantum physics, robotics, and artificial satellite attitude control, among other applications; see Adler (1995) and Finkelstein et al. (1962). Consequently, studies on quaternions have attracted considerable attention in recent years; see So et al. (1994), Zhang (1995), Kanzieper (2002), Akemann (2005), and Akemann and Phillips (2013), among others. In the following, we introduce the quaternion notation. A quaternion can be represented as a \(2 \times 2\) complex matrix

$$\begin{aligned} x = a \cdot \mathbf e + b \cdot \mathbf i + c \cdot \mathbf j + d \cdot \mathbf k =\left( {\begin{array}{*{20}{c}} a+bi &{}\quad c+di\\ { - c+di }&{}\quad {a-bi } \end{array}} \right) , \quad a,b,c,d\in {\mathbb R} \end{aligned}$$
(1)

where \(i\) denotes the imaginary unit and the quaternion units can be represented as

$$\begin{aligned} \mathbf {e} = \left( \begin{array}{c@{\quad }c} 1&{}0\\ 0&{}1\\ \end{array} \right) , \quad \mathbf i = \left( \begin{array}{c@{\quad }c} i&{}0\\ 0&{}- i\\ \end{array} \right) , \quad \mathbf j = \left( \begin{array}{c@{\quad }c} 0&{}1\\ -1&{}0\\ \end{array}\right) , \quad \mathbf k = \left( \begin{array}{c@{\quad }c} 0&{}i\\ i&{}0\\ \end{array}\right) . \end{aligned}$$

The conjugate of \(x\) is defined as

$$\begin{aligned} \bar{x} = a \cdot \mathbf e - b \cdot \mathbf i - c \cdot \mathbf j - d \cdot \mathbf k=\left( {\begin{array}{*{20}{c}} a-bi &{}\quad -c-di \\ { c-di }&{}\quad {a+bi } \end{array}} \right) \end{aligned}$$

and its norm as

$$\begin{aligned} \left\| x \right\| = \sqrt{{a^2} + {b^2} + {c^2} + {d^2}}. \end{aligned}$$

More details can be found in Kuipers (1999), Zhang (1997), and Mehta (2004). Using the matrix representation (1) of quaternions, an \(n\times n\) quaternion matrix \(\mathbf X\) can be rewritten as a \(2n\times 2n\) complex matrix \(\psi (\mathbf X)\), so for convenience we may treat quaternion matrices as complex matrices. Denote \(\mathbf S=\frac{1}{n}\mathbf X\mathbf X^*\) and \(\psi (\mathbf S)=\frac{1}{n}\psi (\mathbf X)\psi (\mathbf X)^*\). It is known (see Zhang 1997) that every eigenvalue of \(\psi (\mathbf S)\) (all of which are obviously real) has even multiplicity. Taking one value from each of the \(n\) pairs of eigenvalues of \(\psi (\mathbf S)\), the resulting \(n\) values are defined to be the eigenvalues of \(\mathbf S\).
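
To make the representation concrete, the following numpy sketch (our own illustration, not part of the paper; all variable names are ours) embeds a random quaternion matrix as a \(2p\times 2n\) complex matrix via (1) and confirms numerically that the eigenvalues of \(\psi (\mathbf S)\) occur in pairs:

```python
import numpy as np

# 2x2 complex representations of the quaternion units e, i, j, k from (1)
E = np.eye(2, dtype=complex)
Iu = np.array([[1j, 0], [0, -1j]])
Ju = np.array([[0, 1], [-1, 0]], dtype=complex)
Ku = np.array([[0, 1j], [1j, 0]])

rng = np.random.default_rng(0)
p, n = 5, 8

# a p x n quaternion matrix X with standard normal components a, b, c, d;
# psi(X) replaces each entry by its 2x2 block, giving a 2p x 2n complex matrix
a, b, c, d = rng.standard_normal((4, p, n))
psiX = np.kron(a, E) + np.kron(b, Iu) + np.kron(c, Ju) + np.kron(d, Ku)

psiS = psiX @ psiX.conj().T / n     # psi(S) = psi(X) psi(X)^* / n
eig = np.linalg.eigvalsh(psiS)      # real eigenvalues, in ascending order

# every eigenvalue has even multiplicity: sorted values pair up exactly
assert np.allclose(eig[0::2], eig[1::2])
print(eig[0::2])                    # one value per pair: the eigenvalues of S
```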

In addition, computing speed and storage capacity have grown enormously in recent decades, and very large, high-dimensional data sets are now routine. Because many classical conclusions fail in this setting, a new theory is needed to analyze such data, and the theory of random matrices (RMT) offers a possible route. The sample covariance matrix is one of the most important random matrices in RMT, and its study can be traced back to Wishart (1928). Marčenko and Pastur (1967) proved that the empirical spectral distribution (ESD) of a large dimensional complex sample covariance matrix tends to the Marčenko–Pastur (M–P) law. Since then, many studies on large dimensional complex or real sample covariance matrices have followed; the reader is referred to the three books Anderson et al. (2010), Bai and Silverstein (2010), and Mehta (2004) for more details.

Under the normality assumption, there are three classic random matrix models: the Gaussian orthogonal ensemble (GOE), in which all entries of the matrix are real normal random variables; the Gaussian unitary ensemble (GUE), in which all entries are complex normal random variables; and the Gaussian symplectic ensemble (GSE), in which all entries are normal quaternion random variables. Because the density function of the ensemble and the joint density of the eigenvalues are then available in closed form, these models admit a well-developed theory. If the normality assumption is removed, satisfactory results already exist for the first two models; for quaternion matrices, however, there are only a few references (see Yin and Bai 2014; Yin et al. 2013, 2014).

In this paper, we prove that the ESD of the quaternion sample covariance matrix also converges to the M–P law. Because quaternion multiplication is not commutative, few works on the spectral properties of \(\mathbf X_n\) with quaternion random entries can be found in the literature unless the entries are normally distributed, in which case the joint density of the eigenvalues is available. The tool provided by Yin et al. (2013) makes the general quaternion case tractable. For the proof of this result, we first recall the definitions of the ESD and the Stieltjes transform. Let \({\mathbf A}\) be a \(p \times p\) Hermitian matrix and denote its eigenvalues by \({s_j}, j = 1,2, \ldots , p\). The ESD of \({\mathbf A}\) is defined by

$$\begin{aligned} {F^{\mathbf A}}(x) =\frac{1}{p}\sum \limits _{j = 1}^p {I({s _j} \le x)}, \end{aligned}$$

where \({I(D)}\) is the indicator function of an event \({D}\) and the Stieltjes transform of \({F^{\mathbf A}}(x)\) is given by

$$\begin{aligned} { m}(z)=\int _{-\infty }^{+\infty }\frac{1}{x-z}\,\mathrm{d}{F^{\mathbf A}}(x), \end{aligned}$$

where \(z=u+\upsilon i\in \mathbb {C}^+\). Let \(g(x)\) and \({ m}_g(z)\) denote the density function and the Stieltjes transform of the M–P law, respectively, which are

$$\begin{aligned} g(x)= \left\{ {\begin{array}{*{20}{l}} {\frac{1}{{2\pi xy{\sigma ^2}}}\sqrt{(b - x)(x - a)} ,}&{}\quad a \le x \le b;\\ {0,} &{}\quad \text{ otherwise }, \end{array}} \right. \end{aligned}$$
(2)

and

$$\begin{aligned} { m}_g(z)=\frac{{{\sigma ^2}(1 - y) - z + \sqrt{{{(z - {\sigma ^2} - y{\sigma ^2})}^2} - 4y{\sigma ^4}} }}{{2yz{\sigma ^2}}}, \end{aligned}$$
(3)

respectively, where \(a = {\sigma ^2}{(1 - \sqrt{y} )^2}\), \(b = {\sigma ^2}{(1 + \sqrt{y} )^2}\), and the branch of the square root is chosen so that \(\mathfrak {I}\,{ m}_g(z)>0\) for \(z\in \mathbb C^+\). Here, the constant \(y\) is the limit of the ratio of the dimension \(p\) to the sample size \(n\), and \({\sigma ^2}\) is the scale parameter. If \(y > 1\), then \(G(x)\), the distribution function of \(g(x)\), has a point mass \(1 - 1/y\) at the origin.
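
For later reference, the sketch below (our own; the helper names mp_density and mp_stieltjes are hypothetical) implements (2) and (3) and cross-checks them by numerical integration. The branch selection mirrors the requirement \(\mathfrak {I}\,{ m}_g(z)>0\) noted above.

```python
import numpy as np

def mp_density(x, y, s2=1.0):
    """M-P density g(x) from (2), with ratio y and scale s2 = sigma^2."""
    a = s2 * (1 - np.sqrt(y)) ** 2
    b = s2 * (1 + np.sqrt(y)) ** 2
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    m = (x > a) & (x < b)
    g[m] = np.sqrt((b - x[m]) * (x[m] - a)) / (2 * np.pi * x[m] * y * s2)
    return g

def mp_stieltjes(z, y, s2=1.0):
    """m_g(z) from (3), taking the root with Im m > 0 on C+."""
    disc = np.sqrt((z - s2 - y * s2) ** 2 - 4 * y * s2 ** 2)
    for sign in (1, -1):
        m = (s2 * (1 - y) - z + sign * disc) / (2 * y * z * s2)
        if m.imag > 0:
            return m
    raise ValueError("no admissible branch")

# cross-check: m_g(z) should equal the integral of 1/(x - z) against g
y, z = 0.5, 1.0 + 0.5j
a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
xs = np.linspace(a, b, 400001)
m_num = np.sum(mp_density(xs, y) / (xs - z)) * (xs[1] - xs[0])
print(mp_stieltjes(z, y), m_num)    # agree to roughly 3-4 decimal places
```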

Now, our main theorem can be stated as follows.

Theorem 1

Let \(\mathbf X_n = \left( {x_{jk}^{(n)}}\right) \), \(j = 1, \ldots ,p, k= 1,\ldots ,n\). Suppose for each \(n\), \(\left\{ x_{jk}^{(n)}\right\} \) are independent quaternion random variables with a common mean \(\mu \) and variance \(\sigma ^2\). Assume that \(y_n=p/n \rightarrow y \in (0,\infty )\) and for any constant \(\eta > 0\),

$$\begin{aligned} \frac{1}{{np}}{\sum \limits _{jk} {{ E}\Vert {x_{jk}^{(n)}} \Vert } ^2}I\left( \Vert {x_{jk}^{(n)}} \Vert > \eta \sqrt{n} \right) \rightarrow 0. \end{aligned}$$
(4)

Then, with probability one, the ESD of the sample covariance matrix \(\mathbf S_n=\frac{1}{n}{\mathbf X_n}{\mathbf X_n^*}\) converges in distribution to the M–P law, which has density function (2) and, when \(y>1\), a point mass \(1-1/y\) at the origin. Here, the superscript \(^*\) stands for the complex conjugate transpose.
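
Theorem 1 is easy to illustrate by simulation. The following sketch (our own; the choice of \(\pm 1/2\) components is just one convenient non-Gaussian example with \(\mu =0\) and \(\sigma ^2=1\)) compares the ESD of \(\mathbf S_n\) with the M–P density (2):

```python
import numpy as np

E = np.eye(2, dtype=complex)
Iu = np.array([[1j, 0], [0, -1j]])
Ju = np.array([[0, 1], [-1, 0]], dtype=complex)
Ku = np.array([[0, 1j], [1j, 0]])

rng = np.random.default_rng(1)
p, n = 200, 400                     # y_n = p/n = 1/2
# each quaternion component is +-1/2, so E x = 0 and E||x||^2 = 4(1/2)^2 = 1
a, b, c, d = rng.choice([-0.5, 0.5], size=(4, p, n))
X = np.kron(a, E) + np.kron(b, Iu) + np.kron(c, Ju) + np.kron(d, Ku)
eig = np.linalg.eigvalsh(X @ X.conj().T / n)   # 2p eigenvalues, in pairs

y = p / n
aa, bb = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
hist, edges = np.histogram(eig, bins=30, range=(aa, bb), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
g = np.sqrt((bb - mid) * (mid - aa)) / (2 * np.pi * mid * y)
print(np.mean(np.abs(hist - g)))    # small already, and shrinks as p, n grow
```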

Remark 2

Without loss of generality, in the proof of Theorem 1 we assume that \( \sigma ^2=1\). Furthermore, one can see that removing the common mean of the entries of \(\mathbf X_n\) does not alter the limiting spectral distribution (LSD) of the sample covariance matrices. In fact, let

$$\begin{aligned} \mathbf T_n=\frac{1}{n}(\mathbf {X}_n-{ E}\mathbf{X}_n)(\mathbf {X}_n-{ E}\mathbf{X}_n)^*. \end{aligned}$$

By Lemma 17, we have, for all large \(p\),

$$\begin{aligned} \Vert {{F^{\mathbf S_n}} - {F^{\mathbf T_n}}} \Vert _{KS} \le \frac{1}{2p}\mathrm{rank}({ E}\mathbf X_n)\le \frac{1}{p}\rightarrow 0 \end{aligned}$$

where \(\Vert f\Vert _{KS}=\sup _x|f(x)|\). Consequently, we assume that \(\mu =0\).
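
A quick numerical illustration of this remark, in the simpler real case (our own sketch; in this setting the rank inequality of Lemma 17 reads \(\Vert F^{\mathbf A\mathbf A^*}-F^{\mathbf B\mathbf B^*}\Vert _{KS}\le \mathrm{rank}(\mathbf A-\mathbf B)/p\)):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, mu = 300, 600, 2.0
X = rng.standard_normal((p, n)) + mu     # entries with common mean mu
S = X @ X.T / n                          # uncentered sample covariance
T = (X - mu) @ (X - mu).T / n            # centered sample covariance
eS = np.sort(np.linalg.eigvalsh(S))
eT = np.sort(np.linalg.eigvalsh(T))

def ks(e1, e2):
    """sup_x |F1(x) - F2(x)| for two empirical spectral CDFs."""
    grid = np.union1d(e1, e2)
    F1 = np.searchsorted(e1, grid, side="right") / e1.size
    F2 = np.searchsorted(e2, grid, side="right") / e2.size
    return np.max(np.abs(F1 - F2))

# E X has rank one, so the KS distance is at most 1/p
print(ks(eS, eT), 1 / p)
```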

The paper is organized as follows. In Sect. 2, the structure of the inverse of certain matrices related to quaternions is established; this is the key tool in the proof of Theorem 1. Section 3 presents the proof of the main theorem in two steps, and Sect. 4 collects some auxiliary lemmas that are used in Sect. 3.

2 Preliminaries

We shall use Lemma 2.5 of Yin et al. (2013) to prove our main result in the next section. To keep this work self-contained, we state it below (Lemma 6), after the necessary definitions.

Definition 3

A matrix is of Type-I if it has the following structure:

$$\begin{aligned} \left( {\begin{array}{*{20}{c}} {{t_1}}&{}\quad 0&{}\quad {{a_{12}}}&{}\quad {{b_{12}}}&{}\quad \cdots &{}\quad {{a_{1n}}}&{}\quad {{b_{1n}}}\\ 0&{}\quad {{t_1}}&{}\quad {{c_{12}}}&{}\quad {{d_{12}}}&{}\quad \cdots &{}\quad {{c_{1n}}}&{}\quad {{d_{1n}}}\\ {{d_{12}}}&{}\quad { - {b_{12}}}&{}\quad {{t_2}}&{}\quad 0&{}\quad \cdots &{}\quad {{a_{2n}}}&{}\quad {{b_{2n}}}\\ { - {c_{12}}}&{}\quad {{a_{12}}}&{}\quad 0&{}\quad {{t_2}}&{}\quad \cdots &{}\quad {{c_{2n}}}&{}\quad {{d_{2n}}}\\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots \\ {{d_{1n}}}&{}\quad { - {b_{1n}}}&{}\quad {{d_{2n}}}&{}\quad { - {b_{2n}}}&{}\quad \cdots &{}\quad {{t_n}}&{}\quad 0\\ { - {c_{1n}}}&{}\quad {{a_{1n}}}&{}\quad { - {c_{2n}}}&{}\quad {{a_{2n}}}&{}\quad \ldots &{}\quad 0&{}\quad {{t_n}} \end{array}} \right) . \end{aligned}$$

Here, all the entries are complex.

Definition 4

A matrix is of Type-II if it has the following structure:

$$\begin{aligned} \left( {\begin{array}{*{20}{c}} {{t_1}}&{}\quad 0&{}\quad {{a_{12}} + {c_{12}} i}&{}\quad {{b_{12}} + {d_{12}} i}&{} \cdots &{}{{a_{1n}} + {c_{1n}} i}&{}\quad {{b_{1n}} + {d_{1n}} i}\\ 0&{}\quad {{t_1}}&{}\quad { - {{\bar{b}}_{12}} - {{\bar{d}}_{12}} i}&{}\quad {{{\bar{a}}_{12}} + {{\bar{c}}_{12}} i}&{} \cdots &{}{ - {{\bar{b}}_{1n}} - {{\bar{d}}_{1n}} i}&{}\quad {{{\bar{a}}_{1n}} + {{\bar{c}}_{1n}} i}\\ {{{\bar{a}}_{12}} + {{\bar{c}}_{12}} i}&{}\quad { - {b_{12}} - {d_{12}} i}&{}\quad {{t_2}}&{}\quad 0&{} \cdots &{}{{a_{2n}} + {c_{2n}} i}&{}\quad {{b_{2n}} + {d_{2n}} i}\\ {{{\bar{b}}_{12}} + {{\bar{d}}_{12}} i}\quad &{}{{a_{12}} + {c_{12}} i}&{}\quad 0&{}{{t_2}}&{} \cdots &{}{ - {{\bar{b}}_{2n}} - {{\bar{d}}_{2n}} i}&{}\quad {{{\bar{a}}_{2n}} + {{\bar{c}}_{2n}} i}\\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \ddots &{} \vdots &{} \vdots \\ {{{\bar{a}}_{1n}} + {{\bar{c}}_{1n}} i}&{}\quad { - {b_{1n}} - {d_{1n}} i}&{}\quad {{{\bar{a}}_{2n}} + {{\bar{c}}_{2n}} i}&{}\quad { - {b_{2n}} - {d_{2n}} i}&{} \cdots &{}{{t_n}}&{}\quad 0\\ {{{\bar{b}}_{1n}} + {{\bar{d}}_{1n}} i}&{}\quad {{a_{1n}} + {c_{1n}} i}&{}\quad {{{\bar{b}}_{2n}} + {{\bar{d}}_{2n}} i}&{}\quad {{a_{2n}} + {c_{2n}} i}&{} \ldots &{}0&{}\quad {{t_n}} \end{array}} \right) . \end{aligned}$$

Here, \(i=\sqrt{-1}\) denotes the usual imaginary unit and all the other entries are complex numbers.

Definition 5

A matrix is of Type-III if it has the following structure:

$$\begin{aligned} \left( {\begin{array}{*{20}{c}} {{t_1}}&{}\quad 0&{}\quad {{a_{12}} }&{}\quad {{b_{12}} }&{}\quad \cdots &{}\quad {{a_{1n}}}&{}\quad {{b_{1n}} }\\ 0&{}\quad {{t_1}}&{}\quad { - {{\bar{b}}_{12}} }&{}\quad {{{\bar{a}}_{12}} }&{}\quad \cdots &{}\quad { - {{\bar{b}}_{1n}} }&{}\quad {{{\bar{a}}_{1n}}}\\ {{{\bar{a}}_{12}} }&{}\quad { - {b_{12}} }&{}\quad {{t_2}}&{}\quad 0&{}\quad \cdots &{}\quad {{a_{2n}} }&{}\quad {{b_{2n}} }\\ {{{\bar{b}}_{12}} }&{}\quad {{a_{12}}}&{}\quad 0&{}\quad {{t_2}}&{}\quad \cdots &{}\quad { - {{\bar{b}}_{2n}} }&{}\quad {{{\bar{a}}_{2n}} }\\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots \\ {{{\bar{a}}_{1n}}}&{}\quad { - {b_{1n}} }&{}\quad {{{\bar{a}}_{2n}}}&{}\quad { - {b_{2n}} }&{}\quad \cdots &{}\quad {{t_n}}&{}\quad 0\\ {{{\bar{b}}_{1n}} }&{}\quad {{a_{1n}} }&{}\quad {{{\bar{b}}_{2n}} }&{}\quad {{a_{2n}} }&{}\quad \ldots &{}\quad 0&{}\quad {{t_n}} \end{array}} \right) . \end{aligned}$$

Here, all the entries are complex.

Lemma 6

For all \(n\ge 1\), if a complex matrix \(\Omega _n\) is invertible and of Type-II, then \(\Omega _n^{-1}\) is a Type-I matrix.

The following corollary is immediate.

Corollary 7

For all \(n\ge 1\), if a complex matrix \(\Omega _n\) is invertible and of Type-III, then \(\Omega _n^{-1}\) is a Type-I matrix.
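
Corollary 7 is easy to check numerically. The sketch below (ours) builds a random \(4\times 4\) Type-III matrix (Definition 5 with \(n=2\)) and verifies that its inverse exhibits the Type-I pattern of Definition 3:

```python
import numpy as np

rng = np.random.default_rng(3)
t1, t2, a, b = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# a 4x4 Type-III matrix (Definition 5 with n = 2)
M = np.array([
    [t1,          0,           a,           b          ],
    [0,           t1,          -np.conj(b), np.conj(a) ],
    [np.conj(a),  -b,          t2,          0          ],
    [np.conj(b),  a,           0,           t2         ],
])
G = np.linalg.inv(M)

# Type-I pattern (Definition 3): zero structure and entry relations
assert np.allclose([G[0, 1], G[1, 0], G[2, 3], G[3, 2]], 0)
assert np.isclose(G[0, 0], G[1, 1]) and np.isclose(G[2, 2], G[3, 3])
assert np.isclose(G[2, 0], G[1, 3])     # d_{12}
assert np.isclose(G[2, 1], -G[0, 3])    # -b_{12}
assert np.isclose(G[3, 0], -G[1, 2])    # -c_{12}
assert np.isclose(G[3, 1], G[0, 2])     # a_{12}
print("the inverse is of Type-I, as Corollary 7 asserts")
```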

3 Proof of Theorem 1

In this section, we present the proof in two steps. The first step is to truncate, centralize, and rescale the random variables \(\{x_{jk}^{(n)}\}\), after which we may assume the additional conditions given in Remark 12. The second step is the proof of Theorem 1 under these additional conditions. Throughout the remainder of this paper, \(C\) denotes a constant that may take different values at different places.

3.1 Truncation, centralization and rescaling

3.1.1 Truncation

Note that condition (4) is equivalent to the following: for any \(\eta > 0\),

$$\begin{aligned} \mathop {\lim }\limits _{n \rightarrow \infty } \frac{1}{{{\eta ^2}np}}{\sum \limits _{jk} {{E}\Vert {x_{jk}^{(n)}} \Vert } ^2}I\left( \Vert {x_{jk}^{(n)}} \Vert > \eta \sqrt{n} \right) = 0. \end{aligned}$$
(5)

Applying Lemma 15, one can select a sequence \({\eta _n} \downarrow 0\) such that (5) remains true when \(\eta \) is replaced by \(\eta _n\).

Lemma 8

Suppose that the assumptions of Theorem 1 hold. Truncate the variables \(x_{jk}^{(n)}\) at \({\eta _n}\sqrt{n}\) and denote the resulting variables by \(\widehat{x}_{jk}^{(n)}\), i.e., \(\widehat{x}_{jk}^{(n)}=x_{jk}^{(n)}I(\Vert x_{jk}^{(n)}\Vert \le {\eta _n}\sqrt{n})\). Also denote

$$\begin{aligned} \widehat{\mathbf X}_n=\left( \widehat{x}_{jk}^{(n)}\right) \quad \text{ and }\quad \ \widehat{\mathbf S}_n=\frac{1}{n}{\widehat{\mathbf X}_n}{\widehat{\mathbf X}_n^*}. \end{aligned}$$

Then, with probability 1,

$$\begin{aligned} \Vert {{F^{{\mathbf S_n}}} - {F^{\widehat{\mathbf S}_n}}} \Vert _{KS}\rightarrow 0. \end{aligned}$$

Proof

Using Lemma 17, one has

$$\begin{aligned} \Vert {{F^{{\mathbf S_n}}} - {F^{\widehat{\mathbf S}_n}}} \Vert _{KS}&\le \frac{1}{{2p}}\mathrm{rank}\left( {\frac{1}{\sqrt{n}}{\mathbf X_n} - {\frac{1}{\sqrt{n}}{\widehat{\mathbf X}}_n}} \right) \nonumber \\&\le \frac{1}{{2p}}\sum \limits _{jk} {I\left( \Vert {x_{jk}^{(n)}} \Vert >{\eta _n}\sqrt{n} \right) }. \end{aligned}$$
(6)

Taking condition (5) into consideration, we get

$$\begin{aligned}&{ E}\left( \frac{1}{2p}\sum \limits _{jk} {I\left( \Vert {x_{jk}^{(n)}} \Vert > {\eta _n}\sqrt{n} \right) } \right) \!\le \! \frac{1}{{2\eta _n^2{np}}}{\sum \limits _{jk} {{ E}\Vert {x_{jk}^{(n)}} \Vert } ^2}I \left( \Vert {x_{jk}^{(n)}} \Vert > {\eta _n}\sqrt{n} \right) \!=\!o(1) \end{aligned}$$

and

$$\begin{aligned}&\mathrm{Var}\left( \frac{1}{2p}\sum \limits _{jk} {I\left( \Vert {x_{jk}^{(n)}} \Vert > {\eta _n}\sqrt{n} \right) } \right) \\&\quad \le \frac{1}{{4\eta _n^2{p^2}n}}{\sum \limits _{jk} {{ E}\Vert {x_{jk}^{(n)}} \Vert } ^2}I\left( \Vert {x_{jk}^{(n)}} \Vert > {\eta _n}\sqrt{n} \right) =o\left( \frac{1}{p}\right) \!. \end{aligned}$$

Then, by Bernstein’s inequality (see Lemma 18), for all small \(\varepsilon > 0 \) and large \(n\), we obtain

$$\begin{aligned} \mathrm{P}\left( \frac{1}{2p}\sum \limits _{jk } {I\left( \Vert {x_{jk}^{(n)}} \Vert > {\eta _n}\sqrt{n} \right) } \ge \varepsilon \right) \le 2{\mathrm{e}^{ - \varepsilon p/2}} \end{aligned}$$

which is summable. Combining (6) and the above inequality with the Borel–Cantelli lemma, it follows that

$$\begin{aligned} \Vert {{F^{{\mathbf S_n}}} - {F^{\widehat{\mathbf S}_n}}} \Vert _{KS} \mathop {\longrightarrow }\limits ^{\mathrm{a.s.}} 0. \end{aligned}$$

This completes the proof of the lemma. \(\square \)
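
To see the truncation step in action, here is a small real-case simulation (our own sketch, with \(t_3\)-distributed entries, which have finite variance but heavy tails, and a fixed \(\eta \) in place of the vanishing sequence \(\eta _n\)):

```python
import numpy as np

rng = np.random.default_rng(6)
p, n, eta = 300, 600, 1.0
X = rng.standard_t(df=3, size=(p, n))          # heavy tails, finite variance
Xh = X * (np.abs(X) <= eta * np.sqrt(n))       # truncate at eta * sqrt(n)

eS = np.sort(np.linalg.eigvalsh(X @ X.T / n))
eSh = np.sort(np.linalg.eigvalsh(Xh @ Xh.T / n))

grid = np.union1d(eS, eSh)
F1 = np.searchsorted(eS, grid, side="right") / p
F2 = np.searchsorted(eSh, grid, side="right") / p

# real-case analogue of (6): the KS distance is at most
# (number of truncated entries) / p
trunc = int(np.sum(np.abs(X) > eta * np.sqrt(n)))
print(np.max(np.abs(F1 - F2)), trunc / p)
```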

3.1.2 Centralization

Lemma 9

Suppose that the assumptions of Lemma 8 hold. Denote

$$\begin{aligned} \widetilde{x}_{jk}^{(n)}=\widehat{x}_{jk}^{(n)}-{ E}\widehat{x}_{jk}^{(n)}, \ \widetilde{\mathbf X}_n=(\widetilde{x}_{jk}^{(n)})\quad \text{ and } \quad \ \widetilde{\mathbf S}_n=\frac{1}{n}{\widetilde{\mathbf X}_n}{\widetilde{\mathbf X}_n^*}. \end{aligned}$$

Then, we obtain

$$\begin{aligned} L(F^{\widehat{\mathbf S}_n}, F^{\widetilde{\mathbf S}_n})=o(1), \end{aligned}$$

where \(L(\cdot ,\cdot )\) denotes the Lévy distance.

Proof

Using Lemma 16 and condition (5), we have

$$\begin{aligned} L^4(F^{\widehat{\mathbf S}_n},F^{\widetilde{\mathbf S}_n})&\le \frac{1}{2p^2}\left( \mathrm{tr}(\widehat{\mathbf S}_n+\widetilde{\mathbf S}_n)\right) \left( \mathrm{tr}\left( \frac{1}{\sqrt{n}}\widehat{\mathbf X}_n-\frac{1}{\sqrt{n}}\widetilde{\mathbf X}_n\right) \left( \frac{1}{\sqrt{n}}\widehat{\mathbf X}_n-\frac{1}{\sqrt{n}}\widetilde{\mathbf X}_n\right) ^*\right) \nonumber \\&=\frac{1}{2n^2p^2}\left( \sum _{jk}^{}\left( \Vert \widehat{x}_{jk}^{(n)}\Vert ^2+\Vert \widehat{x}_{jk}^{(n)}-{E}\widehat{x}_{jk}^{(n)}\Vert ^2\right) \right) \left( \sum _{jk}^{}\Vert { E}\widehat{x}_{jk}^{(n)}\Vert ^2\right) \nonumber \\&=\left( \frac{1}{np}\sum _{jk}^{}\left( \Vert \widehat{x}_{jk}^{(n)}\Vert ^2\!+\!\Vert \widehat{x}_{jk}^{(n)}\!-\!{ E}\widehat{x}_{jk}^{(n)}\Vert ^2\right) \right) \left( \frac{1}{2np}\sum _{jk}^{}\Vert { E}\widehat{x}_{jk}^{(n)}\Vert ^2\right) . \end{aligned}$$
(7)

To complete the proof of this lemma, we need to show that the first factor on the right-hand side of (7) is almost surely bounded. Applying Lemma 22, one has

$$\begin{aligned} { E}\left| \frac{1}{np}\sum _{jk}^{}\left( \Vert \widehat{x}_{jk}^{(n)}\Vert ^2-{E}\Vert \widehat{x}_{jk}^{(n)}\Vert ^2\right) \right| ^4&\le \frac{C}{n^4p^4}\left[ \sum _{jk}{ E}\Vert \widehat{x}_{jk}^{(n)}\Vert ^8+\left( \sum _{j,k}{ E}\Vert \widehat{x}_{jk}^{(n)}\Vert ^4\right) ^2\right] \\&\le Cn^{-2}\left( \eta _n^6n^{-1}y_n^{-3}+\eta _n^4y_n^{-2}\right) . \end{aligned}$$

This indicates by the Borel–Cantelli lemma

$$\begin{aligned} \frac{1}{np}\sum _{jk}^{}\left( \Vert \widehat{x}_{jk}^{(n)}\Vert ^2-{ E}\Vert \widehat{x}_{jk}^{(n)}\Vert ^2\right) \mathop {\longrightarrow }\limits ^{\mathrm{a.s.}}0. \end{aligned}$$

Moreover, we can similarly obtain

$$\begin{aligned} \frac{1}{np}\sum _{jk}^{}\left( \Vert \widehat{x}_{jk}^{(n)}-{ E}\widehat{x}_{jk}^{(n)}\Vert ^2-{E}\Vert \widehat{x}_{jk}^{(n)}-{E}\widehat{x}_{jk}^{(n)}\Vert ^2\right) \mathop {\longrightarrow }\limits ^\mathrm{a.s.} 0. \end{aligned}$$
(8)

Now, turning to (7), for all large \(n\),

$$\begin{aligned}&L^4(F^{\widehat{\mathbf S}_n},F^{\widetilde{\mathbf S}_n})\\&\quad \le \left( \frac{1}{np}\sum _{jk}^{}\left( { E}\Vert \widehat{x}_{jk}^{(n)}\Vert ^2+{ E}\Vert \widehat{x}_{jk}^{(n)}-{ E}\widehat{x}_{jk}^{(n)}\Vert ^2\right) +o_\mathrm{a.s.}(1)\right) \left( \frac{1}{2np}\sum _{jk}^{}\Vert { E}\widehat{x}_{jk}^{(n)}\Vert ^2\right) \\&\quad \le \frac{C}{np}\sum _{jk}^{}\Vert { E}\widehat{x}_{jk}^{(n)}\Vert ^2\\&\quad \le \frac{C}{np}\sum _{jk}^{}{ E}\Vert x_{jk}^{(n)}\Vert ^2I\left( \Vert x_{jk}^{(n)}\Vert >\eta _n\sqrt{n}\right) \rightarrow 0. \end{aligned}$$

The proof of the lemma is complete. \(\square \)

3.1.3 Rescaling

Define

$$\begin{aligned} \widetilde{\sigma }_{jk}^2={ E}\Vert \widetilde{x}_{jk}^{(n)}\Vert ^2,\,\, \xi _{jk}= \left\{ {\begin{array}{*{20}{c}} \zeta _{jk},&{} \widetilde{\sigma }_{jk}^2 < 1/2\\ \widetilde{x}_{jk}^{(n)}, &{} \widetilde{\sigma }_{jk}^2 \ge 1/2 \end{array}} \right. ,\,\, \varvec{\Lambda }= \frac{1}{\sqrt{n}}(\xi _{jk}),\,\, \sigma _{jk}^2={ E}\Vert \xi _{jk}\Vert ^2, \end{aligned}$$

where \(\zeta _{jk}\) is a bounded quaternion random variable with \({ E}\zeta _{jk}=0\) and \(\mathrm{Var}\,\zeta _{jk}=1\), independent of all other variables.

Lemma 10

Write

$$\begin{aligned} \breve{x}_{jk}^{(n)}={ \sigma _{jk}^{-1} {\xi }_{jk}},\,\, \ \breve{\mathbf X}_n=\left( \breve{x}_{jk}^{(n)}\right) ,\,\, \ \mathrm{and} \,\,\ \breve{\mathbf S}_n=\frac{1}{n}{\breve{\mathbf X}_n}{\breve{\mathbf X}_n^*}. \end{aligned}$$

Under the conditions assumed in Lemma 9, we have

$$\begin{aligned} L\left( F^{\breve{\mathbf S}_n}, F^{\widetilde{\mathbf S}_n}\right) =o(1). \end{aligned}$$

Proof

(a): Our first goal is to show that

$$\begin{aligned} L\left( F^{\widetilde{\mathbf S}_n}, F^{\varvec{\Lambda \Lambda }^*}\right) \mathop {\longrightarrow }\limits ^\mathrm{a.s.} 0. \end{aligned}$$

Let \({\mathcal E}_n\) be the set of pairs \((j,k)\) with \(\widetilde{\sigma }_{jk}^2 < \frac{1}{2}\), and let \(N_n\) be the number of such pairs. Since \(\frac{1}{np}\sum \nolimits _{jk} \widetilde{\sigma }_{jk}^2 \rightarrow 1 \), we conclude that \(N_n=o(np)\). Owing to Lemma 16 and (8), we get

$$\begin{aligned} {L^4}(F^{\widetilde{\mathbf S}_n}, F^{\varvec{\Lambda }\varvec{\Lambda }^*})&\le \frac{1}{2p^2}(\mathrm{tr}(\widetilde{\mathbf S}_n+\varvec{\Lambda }\varvec{\Lambda }^*))\left( \mathrm{tr}\left( \frac{1}{\sqrt{n}}\widetilde{\mathbf X}_n-\varvec{\Lambda }\right) \left( \frac{1}{\sqrt{n}}\widetilde{\mathbf X}_n-\varvec{\Lambda }\right) ^*\right) \nonumber \\&=\frac{1}{2n^2p^2}\left( \sum _{jk}^{}\left( \Vert \widetilde{x}_{jk}^{(n)}\Vert ^2+\Vert \xi _{jk}\Vert ^2\right) \right) \left( \sum _{jk}^{}\Vert \xi _{jk}-\widetilde{x}_{jk}^{(n)}\Vert ^2\right) \nonumber \\&=\frac{1}{2n^2p^2}\left( \sum _{jk}^{}{ E}\left( \Vert \widetilde{x}_{jk}^{(n)}\Vert ^2\!+\!\Vert \xi _{jk}\Vert ^2\right) \!+\!o_\mathrm{a.s.}(1)\right) \left( \sum _{jk}^{}\Vert \xi _{jk}\!-\!\widetilde{x}_{jk}^{(n)}\Vert ^2\right) \nonumber \\&\le \frac{C}{np}\sum _{jk}^{}\Vert \xi _{jk}-\widetilde{x}_{jk}^{(n)}\Vert ^2 :=\frac{C}{np}\sum \limits _{h=1}^K u_h \end{aligned}$$
(9)

where \(K=N_n\) and \(u_h=\Vert \xi _{jk}-\widetilde{x}_{jk}^{(n)}\Vert ^2\). Using the fact that for all \(l\ge 1\), \(l!\ge (l/3)^l\), we have

$$\begin{aligned} { E}\left( \frac{1}{np}\sum \limits _{h=1}^K u_h\right) ^m&=\frac{1}{ n^mp^{m}}\sum \limits _{m_1+\cdots +m_K=m}\frac{m!}{{m_1}!\ldots {m_K}!}{ E}u_1^{m_1}\ldots { E}u_K^{m_K} \\&\le \frac{1}{ n^{m}p^{m}}\sum \limits _{l=1}^m \sum _{\begin{array}{c} m_1+\cdots +m_l=m \\ m_t\ge 1 \end{array}}\frac{m!}{l!m_1! \ldots m_l!}\prod _{t=1}^l \left( \sum _{h=1}^K { E}u_h^{m_t}\right) \\&\le C \sum _{l=1}^m n^{-m} p^{-m}l^m(l!)^{-1}(2\eta _n^2 n)^{m-l}2^l K^l \\&\le C\sum _{l=1}^m\left( \frac{6K}{np}\right) ^l\left( \frac{2\eta _n^2l}{p}\right) ^{m-l} \le C\left( \frac{6K}{np}+\frac{2\eta _n^2 m}{p}\right) ^m. \end{aligned}$$

Selecting \(m=[\log p]\), which implies \(\frac{2\eta _n^2m}{p}\rightarrow 0\), and noting that \(\frac{6K}{np} \rightarrow 0\), one obtains, for any fixed \(t, \varepsilon >0\),

$$\begin{aligned} { E}\left( \frac{1}{\varepsilon np}\sum \limits _{h=1}^K u_h\right) ^m \le o\left( p^{-t}\right) . \end{aligned}$$

From the inequality above with \(t=2\) and (9), it follows that

$$\begin{aligned} L\left( F^{\widetilde{\mathbf S}_n} , F^{\varvec{\Lambda }\varvec{\Lambda }^*}\right) \mathop {\longrightarrow }\limits ^\mathrm{a.s.} 0. \end{aligned}$$

(b): Our next goal is to show that

$$\begin{aligned} L\left( F^{\breve{\mathbf S}_n}, F^{\varvec{\Lambda }\varvec{\Lambda }^*}\right) \mathop {\longrightarrow }\limits ^\mathrm{a.s.} 0. \end{aligned}$$

Applying Lemma 16, we have

$$\begin{aligned} {L^4}(F^{\breve{\mathbf S}_n} , F^{\varvec{\Lambda }\varvec{\Lambda }^*})&\le \frac{1}{2p^2}\left( \mathrm{tr}\left( \breve{\mathbf S}_n+\varvec{\Lambda }\varvec{\Lambda }^*\right) \right) \left( \mathrm{tr}\left( \frac{1}{\sqrt{n}}\breve{\mathbf X}_n-\varvec{\Lambda }\right) \left( \frac{1}{\sqrt{n}}\breve{\mathbf X}_n-\varvec{\Lambda }\right) ^*\right) \nonumber \\&=\frac{1}{2n^2p^2}\left( \sum _{jk}^{}\left( \Vert \breve{x}_{jk}^{(n)}\Vert ^2+\Vert \xi _{jk}\Vert ^2\right) \right) \left( \sum _{jk}^{}\Vert \xi _{jk}-\breve{x}_{jk}^{(n)}\Vert ^2\right) \nonumber \\&=\frac{1}{2n^2p^2}\left( \sum _{jk}^{}\left( 1+\sigma _{jk}^{-2}\right) { E}\Vert \xi _{jk}\Vert ^2+o_\mathrm{a.s.}(1)\right) \\&\quad \times \left( \sum _{jk}^{}\left( 1-\sigma _{jk}^{-1}\right) ^2\Vert \xi _{jk}\Vert ^2\right) \nonumber \\&\le \frac{C}{np}\sum _{jk}^{}\left( 1-\sigma _{jk}^{-1}\right) ^2\Vert \xi _{jk}\Vert ^2. \end{aligned}$$

Using the fact

$$\begin{aligned}&{ E}\left( \frac{C}{np}\sum _{jk}^{}\left( 1-\sigma _{jk}^{-1}\right) ^2\Vert \xi _{jk}\Vert ^2\right) = \frac{C}{np}\sum _{jk} \left( 1- \sigma _{jk}\right) ^2 \le \frac{C}{np}\sum _{jk} \left( 1- \sigma _{jk}^2\right) \\&\quad \le \frac{C\eta _n^2}{{{\eta _n^2np}}}\sum _{(j,k) \not \in {\mathcal E}_n } \left[ { E}\Vert x_{jk}^{(n)}\Vert ^2 I\left( \Vert x_{jk}^{(n)}\Vert \ge \eta _n \sqrt{n}\right) +\left( { E}\Vert x_{jk}^{(n)}\Vert I\left( \Vert x_{jk}^{(n)}\Vert \ge \eta _n \sqrt{n}\right) \right) ^2\right] \\&\quad \rightarrow 0 \end{aligned}$$

and Lemma 22, one gets

$$\begin{aligned}&{ E } \left| \frac{C}{np}\sum _{jk}^{}\left( 1{-}\sigma _{jk}^{-1}\right) ^2\left( \Vert \xi _{jk}\Vert ^2{-}{ E}\Vert \xi _{jk}\Vert ^2\right) \right| ^4 \\&\quad \!\le \frac{C}{n^4p^4}\left[ \sum _{j, k}{ E}\Vert x_{jk}^{(n)}\Vert ^8 I(\Vert x_{jk}^{(n)}\Vert \!\le \! \eta _n \sqrt{n})\!+\!\left( \sum _{j, k} { E}\Vert x_{jk}^{(n)}\Vert ^4 I\left( \Vert x_{jk}^{(n)}\Vert \!\le \! \eta _n \sqrt{n}\right) \right) ^2 \right] \\&\quad \!\le Cn^{-2}\left[ n^{-1}\eta _n^6y_n^{-3}+\eta _n^4y_n^{-2}\right] \end{aligned}$$

which is summable. Together with the Borel–Cantelli lemma, it follows that

$$\begin{aligned} L\left( F^{\breve{\mathbf S}_n} , F^{\varvec{\Lambda }\varvec{\Lambda }^*}\right) \mathop {\longrightarrow }\limits ^\mathrm{a.s.}0. \end{aligned}$$

(c): Finally, combining (a) and (b) yields the lemma. \(\square \)

Combining the results of Lemmas 8, 9, and 10, we have the following remarks.

Remark 11

For brevity, we shall drop the superscript \((n)\) from the variables. The truncated, centralized, and rescaled variables are still denoted by \(x_{jk}\).

Remark 12

Under the conditions assumed in Theorem 1, we can further assume that

  1. \(\Vert x_{jk}\Vert \le \eta _n\sqrt{n}\);

  2. \({ E}(x_{jk})=0 \) and \( \mathrm{Var}(x_{jk})=1\).

3.2 Completion of the proof

Denote

$$\begin{aligned} { m}_n(z)=\frac{1}{2p}\mathrm{tr}\left( \mathbf S_n-z\mathbf I_{2p}\right) ^{-1}, \end{aligned}$$
(10)

where \(z=u+\upsilon i\in \mathbb {C}^+\).
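
Before carrying out the two steps, here is a numerical preview (our own sketch; Gaussian components are used purely for convenience): \(m_n(z)\) computed from (10) for moderate \(p,n\) is already close to the limit \(m(z)\) given in (12) below.

```python
import numpy as np

E = np.eye(2, dtype=complex)
Iu = np.array([[1j, 0], [0, -1j]])
Ju = np.array([[0, 1], [-1, 0]], dtype=complex)
Ku = np.array([[0, 1j], [1j, 0]])

rng = np.random.default_rng(4)
p, n = 200, 400
a, b, c, d = rng.standard_normal((4, p, n)) / 2   # E||x||^2 = 1
X = np.kron(a, E) + np.kron(b, Iu) + np.kron(c, Ju) + np.kron(d, Ku)
S = X @ X.conj().T / n

z, y = 1.0 + 0.5j, p / n
mn = np.trace(np.linalg.inv(S - z * np.eye(2 * p))) / (2 * p)   # (10)

# m(z) from (12) below, taking the root with Im m > 0
disc = np.sqrt((1 - z - y) ** 2 - 4 * y * z)
cand = [(1 - y - z + s * disc) / (2 * y * z) for s in (1, -1)]
m = cand[0] if cand[0].imag > 0 else cand[1]
print(mn, m)        # close for moderate p, n; closer as they grow
```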

3.2.1 Random part

First, we show that

$$\begin{aligned} {{ m}_n}(z) - {E}{{m}_n}(z) \mathop {\longrightarrow }\limits ^\mathrm{a.s.}0. \end{aligned}$$
(11)

Let \(\varvec{\pi }_j\) denote the \(j\)th column of \(\mathbf X_n\) (viewed as a \(2p\times 2n\) complex matrix), let \(\mathbf S_n^k = {\mathbf S}_n - \frac{1}{n}{\varvec{\pi }} _k\varvec{\pi }_k^*\), and let \({{ E}_k}( \cdot )\) denote the conditional expectation given \(\{ {{\varvec{\pi }_{k + 1}},{\varvec{\pi }_{k + 2}}, \ldots ,{\varvec{\pi }_{2n}}} \}\). Then,

$$\begin{aligned} {{ m}_n}(z) - { E}{{ m}_n}(z) =&\frac{1}{{2p}}\sum \limits _{k = 1}^{2n} {{\gamma _k}}, \end{aligned}$$

where

$$\begin{aligned} {\gamma _k} =&\,{{{ E}_{k - 1}}\mathrm{tr}{{\left( {\mathbf S_n} - z{\mathbf I_{2p}}\right) }^{ - 1}} - } {{ E}_k}\mathrm{tr}{\left( {\mathbf S_n} - z{\mathbf I_{2p}}\right) ^{ - 1}}\\ =&\,\left( {{ E}_{k - 1}} - {{ E}_k}\right) \left[ \mathrm{tr}{\left( {\mathbf S_{n}} - z{\mathbf I_{2p}}\right) ^{ - 1}} - \mathrm{tr}{\left( \mathbf S_n^k - z{\mathbf I_{2p}}\right) ^{ - 1}}\right] . \end{aligned}$$
  1.

    When \(k = 2t - 1\ (t =1,2, \ldots ,n)\), since the \(k\)th and \((k+1)\)th columns determine each other, we obtain

    $$\begin{aligned} {\gamma _k} =&\,{{\mathrm{E}_{k - 1}}\mathrm{tr}{{\left( {\mathbf S_n} - z{\mathbf I_{2p}}\right) }^{ - 1}} - } {{ E}_k}\mathrm{tr}{\left( {\mathbf S_n} - z{\mathbf I_{2p}}\right) ^{ - 1}} = 0. \end{aligned}$$
  2.

    When \(k = 2t\ (t =1,2, \ldots ,n)\), together with the rank-one update formula (checked numerically in the sketch following this list)

    $$\begin{aligned} {({\mathbf A} + {\varvec{\alpha }} {{\varvec{\beta }}^*})^{ - 1}} = {{\mathbf A}^{ - 1}} - \frac{{{{\mathbf A}^{ - 1}}{\varvec{\alpha } } {{\varvec{\beta }} ^ * }{{\mathbf A}^{ - 1}}}}{{1 + {{\varvec{\beta }}^ * }{{\mathbf A}^{ - 1}}{\varvec{\alpha } }}}, \end{aligned}$$

    one finds

    $$\begin{aligned} {\gamma _k} =&\,\left( {{ E}_{k - 1}} - {{ E}_k}\right) \left[ \mathrm{tr}{({\mathbf S_{n}} - z{\mathbf I_{2p}})^{ - 1}} - \mathrm{tr}{\left( \mathbf S_n^k - z{\mathbf I_{2p}}\right) ^{ - 1}}\right] \\ =\,&({{ E}_{k - 1}} - {{ E}_k})\frac{\frac{1}{n}{\varvec{\pi }_{k}^ * {{\left( \mathbf S_n^k - z{\mathbf I_{2p}}\right) }^{ - 2}}{\varvec{\pi }_{k}}}}{{1 +\frac{1}{n} \varvec{\pi }_{k}^ * {{\left( \mathbf S_n^k - z{\mathbf I_{2p}}\right) }^{ - 1}}{\varvec{\pi }_{k}}}}. \end{aligned}$$

    Since

    $$\begin{aligned}&\left| \frac{\frac{1}{n}{\varvec{\pi }_{k}^ * {{\left( \mathbf S_n^k - z{\mathbf I_{2p}}\right) }^{ - 2}}{\varvec{\pi }_{k}}}}{{1 +\frac{1}{n} \varvec{\pi }_{k}^ * {{\left( \mathbf S_n^k - z{\mathbf I_{2p}}\right) }^{ - 1}}{\varvec{\pi }_{k}}}} \right| \\&\quad \le \frac{\frac{1}{n}{\varvec{\pi }_{k}^ * {{\left( {{\left( \mathbf S_n^k - u{\mathbf I_{2p}}\right) }^2} + {\upsilon ^2}{\mathbf I_{2p}}\right) }^{ - 1}}{\varvec{\pi }_{k}}}}{{\mathfrak {I}\left( 1 +\frac{1}{n} \varvec{\pi }_{k}^ * {{(\mathbf S_n^k - z{\mathbf I}_{2p})}^{ - 1}}{\varvec{\pi }}_{k}\right) }}\\&\quad = \frac{1}{\upsilon }, \end{aligned}$$

    we can easily get

    $$\begin{aligned} \left| \gamma _k\right| \le \frac{2}{\upsilon }. \end{aligned}$$
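
The rank-one update formula used in case 2 above is the classical Sherman–Morrison identity; a minimal numerical check (our own sketch) reads:

```python
import numpy as np

rng = np.random.default_rng(5)
k = 6
A = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
alpha = rng.standard_normal((k, 1)) + 1j * rng.standard_normal((k, 1))
beta = rng.standard_normal((k, 1)) + 1j * rng.standard_normal((k, 1))

Ai = np.linalg.inv(A)
lhs = np.linalg.inv(A + alpha @ beta.conj().T)
rhs = Ai - (Ai @ alpha @ beta.conj().T @ Ai) / (1 + beta.conj().T @ Ai @ alpha)
print(np.max(np.abs(lhs - rhs)))    # ~ 1e-15, i.e., rounding error
```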

Using Lemma 21, it follows that

$$\begin{aligned} { E}{\left| {{{ m}_{n}}\left( z\right) - { E}{{ m}_{n}}\left( z\right) } \right| ^4} \le \frac{{{K_4}}}{{{{\left( 2p\right) }^4}}}{ E}{\left( {\sum \limits _{k = 1}^{2n} {{{\left| {{\gamma _k}} \right| }^2}} } \right) ^2} \le \frac{{{4 K_4}{n^2}}}{{{p^4}{v^4}}} = O\left( {n^{ - 2}}\right) \!. \end{aligned}$$

Combining the Borel–Cantelli lemma with the Chebyshev inequality, we conclude

$$\begin{aligned} {{ m}_n}(z) - { E}{{m}_n}(z)\mathop {\longrightarrow }\limits ^\mathrm{a.s.}0. \end{aligned}$$

3.2.2 Mean convergence

When \(\sigma ^2=1\), (3) becomes

$$\begin{aligned} { m}(z)=\frac{{1 - y- z + \sqrt{{{(1-z-y)}^2} - 4yz} }}{{2yz}}. \end{aligned}$$
(12)
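
As a sanity check (our own sketch), one can verify numerically that (12) solves the self-consistency equation \(m = 1/(1-z-y-yzm)\), which is the equation displayed after (15) below with \(y_n=y\) and \(\delta _n=0\):

```python
import numpy as np

def mp_m(z, y):
    """m(z) from (12), taking the root with Im m(z) > 0 on C+."""
    disc = np.sqrt((1 - z - y) ** 2 - 4 * y * z)
    for s in (1, -1):
        m = (1 - y - z + s * disc) / (2 * y * z)
        if m.imag > 0:
            return m
    raise ValueError("no admissible branch")

# the fixed-point residual of m = 1/(1 - z - y - y z m) is at rounding level
y = 0.5
for z in (1.0 + 0.5j, 2.0 + 1.0j, 0.25 + 0.5j):
    m = mp_m(z, y)
    print(abs(m - 1 / (1 - z - y - y * z * m)))   # ~ 1e-16
```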

Next, we show that

$$\begin{aligned} {E}{{ m}_n}(z) \rightarrow { m}(z). \end{aligned}$$

By Lemma 20 and (10), one has

$$\begin{aligned} {{ m}_n}(z) = \frac{1}{{2p}}\sum \limits _{k = 1}^p {\mathrm{tr}\left( \frac{1}{n}\varvec{\phi }_k^{\prime }\bar{\varvec{\phi }}_k - z{\mathbf I_2} - \frac{1}{{{n^2}}}\varvec{\phi }_k^{\prime }{\mathbf X}_{nk}^*{{\left( \frac{1}{n}{\mathbf X_{nk}}{\mathbf X_{nk}^*} - z{\mathbf I_{2p - 2}}\right) }^{-1}}{\mathbf X_{n k}}\bar{\varvec{\phi }}_k\right) }^{ - 1} \end{aligned}$$
(13)

where \({\mathbf X}_{nk}\) is the matrix obtained by deleting the \(k\)th quaternion row of \(\mathbf X_n\), and \({\varvec{\phi }}_k^{\prime }\) is the \(k\)th quaternion row of \({\mathbf X}_n\), regarded as a \(1\times n\) quaternion matrix. Here, the superscript \(^{\prime }\) stands for the transpose alone, without conjugation. Write

$$\begin{aligned} {\varvec{\varepsilon }_k}&=\frac{1}{n}\varvec{\phi }_k^{\prime }\bar{\varvec{\phi }}_k - z{\mathbf I_2} - \frac{1}{{{n^2}}}\varvec{\phi }_k^{\prime }{\mathbf X}_{nk}^*{{\left( \frac{1}{n}{\mathbf X_{nk}}{\mathbf X_{nk}^*} - z{\mathbf I_{2p - 2}}\right) }^{-1}}{\mathbf X_{nk}}\bar{\varvec{\phi }}_k \nonumber \\&\quad - (1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)){\mathbf I_2} \end{aligned}$$
(14)

and

$$\begin{aligned} {\delta _n}&= - \frac{1}{2p\left( {1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)}\right) }\nonumber \\&\qquad \times \sum \limits _{k = 1}^p E\mathrm{tr}\left\{ {\varvec{\varepsilon }_k}{\left( \left( 1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)\right) {\mathbf I_2} + {\varvec{\varepsilon }_k}\right) ^{ - 1}}\right\} \end{aligned}$$
(15)

where \(y_n=p/n\). This implies that

$$\begin{aligned} { E}{{ m}_n}(z) = \frac{1}{{1 - z - {y_n} - {y_n}z{E}{{ m}_n}(z)}} + {\delta _n}. \end{aligned}$$

Solving \({ E}{{ m}_n}(z)\) from the equation above, we get

$$\begin{aligned} { E}{ m}_n(z)=\frac{1}{2y_nz}\left( 1-z-y_n+y_nz\delta _n\pm \sqrt{(1-z-y_n-y_nz\delta _n)^2-4y_nz}\right) . \end{aligned}$$

As in the proof of Eq. (3.17) of Bai (1993), we can assert that

$$\begin{aligned} { E}{m}_n(z)=\frac{1-z-y_n+y_nz\delta _n+\sqrt{(1-z-y_n-y_nz\delta _n)^2-4y_nz}}{2y_nz}. \end{aligned}$$
(16)

Comparing (12) with (16), it suffices to show that

$$\begin{aligned} {\delta _n} \rightarrow 0. \end{aligned}$$

For this purpose, we need the following two lemmas.

Lemma 13

Under the conditions of Remark 12, for any \(z=u+vi\) with \(v>0\) and for any \(k=1,\ldots ,p\), we have

$$\begin{aligned} |E\mathrm{tr}\varvec{\varepsilon }_k| \rightarrow 0. \end{aligned}$$
(17)

Proof

By calculation, we have

$$\begin{aligned} |E\mathrm{tr}\varvec{\varepsilon }_k|&=\left| - \frac{1}{{{n^2}}}E\mathrm{tr}{\mathbf X}_{nk}^*{{\left( \frac{1}{n}{\mathbf X_{nk}}{\mathbf X_{nk}^*} - z{\mathbf I_{2p - 2}}\right) }^{-1}}{\mathbf X_{nk}}+2{y_n}+2 {y_n}z{ E}{{ m}_n}\left( z\right) \right| \\&=\left| - \frac{1}{n}E\mathrm{tr}{{\left( \frac{1}{n}{\mathbf X_{nk}}{\mathbf X_{nk}^*} - z{\mathbf I_{2p - 2}}\right) }^{-1}}\frac{1}{n}{\mathbf X_{nk}}{\mathbf X}_{nk}^* +2{y_n}+2 {y_n}z{ E}{{ m}_n}\left( z\right) \right| \\&\le \frac{2}{n}+\frac{|z|}{n}\left| { E}\left[ \mathrm{tr}{{\left( \frac{1}{n}{\mathbf X}_{n}{\mathbf X}_{n}^* - z{\mathbf I}_{2p }\right) }^{-1}}-\mathrm{tr}{{\left( \frac{1}{n}{\mathbf X}_{nk}{\mathbf X}_{nk}^* - z{\mathbf I_{2p - 2}}\right) }^{-1}}\right] \right| \\&\le \frac{2}{n}+\frac{2|z|}{n\upsilon } \rightarrow 0 \end{aligned}$$

where the last inequality uses Lemma 19 twice. This completes the proof. \(\square \)

Lemma 14

Under the conditions of Remark 12, for any \(z=u+vi\) with \(v>0\) and any \(k=1,\ldots ,p\), we have

$$\begin{aligned} { E}|\mathrm{tr}\varvec{\varepsilon }_k^2| \rightarrow 0. \end{aligned}$$

Proof

Note that \(({\mathbf S_{n}} - z{\mathbf I_{2p}})\) has the form

$$\begin{aligned} \left( {\begin{array}{*{20}{c}} {{t_1}}&{}\quad 0&{}\quad {{a_{12}}}&{}\quad {{b_{12}}}&{}\quad \cdots \\ 0&{}\quad {{t_1}}&{}\quad { - {{\bar{b}}_{12}}}&{}\quad {{{\bar{a}}_{12}}}&{}\quad \cdots \\ {{{\bar{a}}_{12}}}&{}\quad { - {b_{12}}}&{}\quad {{t_2}}&{}\quad 0&{}\quad \cdots \\ {{{\bar{b}}_{12}}}&{}\quad {{a_{12}}}&{}\quad 0&{}\quad {{t_2}}&{}\quad \cdots \\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots \end{array}} \right) . \end{aligned}$$

Let \(\mathbf R_k={\left( \frac{1}{n}{\mathbf X_{nk}}{\mathbf X_{nk}^*} - z{\mathbf I_{2p - 2}}\right) }^{-1}\). By Corollary 7 and (13), \(\frac{1}{n}\varvec{\phi }_k^{\prime }\bar{\varvec{\phi }}_k - z{\mathbf I_2} - \frac{1}{{{n^2}}}\varvec{\phi }_k^{\prime }{\mathbf X}_{nk}^*{\mathbf R_k}{\mathbf X_{n k}}\bar{\varvec{\phi }}_k\) is a scalar matrix. Denote by \(\varvec{\alpha }_k\) the first column of \(\varvec{\phi }_k\) and by \(\varvec{\beta }_k\) the second column of \(\varvec{\phi }_k\); then, combining (14), we have

$$\begin{aligned} \varvec{\varepsilon }_k=\theta _k\mathbf{I}_2 \end{aligned}$$
(18)

where

$$\begin{aligned} \theta _k=&\, \frac{1}{n}{\varvec{\alpha }_k^{\prime }}{{\bar{\varvec{\alpha }} }_k} - z - \frac{1}{{{n^2}}}{\varvec{\alpha }_k^{\prime }}{\mathbf X_{nk}^*}{\mathbf R_k}{\mathbf X_{nk}}{{\bar{\varvec{\alpha }} }_k}-(1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z))\\ =&\,\frac{1}{n}{\varvec{\beta }_k^{\prime }}{{\bar{\varvec{\beta }} }_k} - z - \frac{1}{{{n^2}}}{\varvec{\beta }_k^{\prime }}{\mathbf X_{nk}^*}{\mathbf R_k}{\mathbf X_{nk}}{{\bar{\varvec{\beta }} }_k}-(1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)). \end{aligned}$$

Let \(\widetilde{ E}(\cdot )\) denote the conditional expectation given \(\{ x_{jl}: j=1,\ldots ,p,\ j\ne k,\ l=1,\ldots ,n\}\), i.e., given all the entries except those in the \(k\)th quaternion row; then we get

$$\begin{aligned} { E}|\mathrm{tr}\varvec{\varepsilon }_k^2|=\frac{1}{2}{ E}|\mathrm{tr}\varvec{\varepsilon }_k|^2 \le 2\left[ { E}|\mathrm{tr}\varvec{\varepsilon }_k-\widetilde{ E}\mathrm{tr}\varvec{\varepsilon }_k|^2+{ E}|\widetilde{ E}\mathrm{tr}\varvec{\varepsilon }_k-{ E}\mathrm{tr}\varvec{\varepsilon }_k|^2+|{ E}\mathrm{tr}\varvec{\varepsilon }_k|^2\right] . \end{aligned}$$
(19)

According to the inequality above, we proceed to complete the estimation of \({ E}|\mathrm{tr}\varvec{\varepsilon }_k^2|\) by the following three steps.

  (a)

    For the first term of the right-hand side of (19), denote \(\mathbf T=(t_{jl})=\mathbf I_{2n}-\frac{1}{n}{\mathbf X}_{nk}^*{\mathbf R_k}{\mathbf X_{nk}}\) where \(t_{jl}=\left( \begin{array}{c@{\quad }c}e_{jl}&{}f_{jl}\\ h_{jl}&{}g_{jl}\end{array}\right) \). Then, rewrite

    $$\begin{aligned} \mathrm{tr}\varvec{\varepsilon }_k-\widetilde{ E}\mathrm{tr}\varvec{\varepsilon }_k&=\mathrm{tr}\left( \frac{1}{n}\varvec{\phi }_k^{\prime }\bar{\varvec{\phi }}_k - \frac{1}{{{n^2}}}\varvec{\phi }_k^{\prime }{\mathbf X}_{nk}^*{\mathbf R_k}{\mathbf X_{nk}}\bar{\varvec{\phi }}_k\right) -\mathrm{tr}\left( \mathbf I_2 - \frac{1}{{{n^2}}}{\mathbf X}_{nk}^*{\mathbf R_k}{\mathbf X_{nk}}\right) \\&=\frac{1}{n}\mathrm{tr}\left( \varvec{\phi }_k^{\prime }\mathbf T\bar{\varvec{\phi }}_k-\mathbf T\right) \\&=\frac{1}{n}\left( \sum _{j=1}^{n}\mathrm{tr}(\Vert x_{kj}\Vert ^2-1)t_{jj}+\sum _{j\ne l}^{}\mathrm{tr}(x_{kl}^*x_{kj}t_{jl})\right) \!. \end{aligned}$$

    By elementary calculation, we obtain

    $$\begin{aligned}&\widetilde{ E}|\mathrm{tr}\varvec{\varepsilon }_k-\widetilde{ E}\mathrm{tr}\varvec{\varepsilon }_k|^2\nonumber \\&\quad =\frac{1}{n^2}\bigg (\sum _{j=1}^{n}\widetilde{ E}|\mathrm{tr}(\Vert x_{kj}\Vert ^2-1)t_{jj}|^2+\sum _{j\ne l}^{}\widetilde{ E}\left[ \mathrm{tr}\left( x_{kl}^*x_{kj}t_{jl}\right) \mathrm{tr}(x_{kj}^*x_{kl}t_{jl}^*)\right. \nonumber \\&\left. \qquad +\mathrm{tr}\left( x_{kl}^*x_{kj}t_{jl}\right) \mathrm{tr}\left( x_{kl}^*x_{kj}t_{lj}^*\right) \right] \bigg )\nonumber \\&\quad \le \frac{1}{n^2}\left( \sum _{j=1}^{n}\widetilde{ E}(\Vert x_{kj}\Vert ^2-1)^2|e_{jj}+g_{jj}|^2+2\sum _{j\ne l}\widetilde{ E}|\mathrm{tr}(x_{kl}^*x_{kj}t_{jl})|^2\right) \nonumber \\&\quad \le \frac{C}{n^2}\left( \eta _n^2n\sum _{j=1}^{n}(|e_{jj}|^2 +|g_{jj}|^2)+\sum _{j\ne l}^{}(|e_{jl}|^2+|f_{jl}|^2+|g_{jl}|^2+|h_{jl}|^2)\right) \nonumber \\&\quad \le \frac{C\eta _n^2}{n}\sum _{j=1}^{n}(|e_{jj}|^2 +|g_{jj}|^2)+\frac{C}{n^2}\sum _{j, l}^{}(|e_{jl}|^2+|f_{jl}|^2+|g_{jl}|^2+|h_{jl}|^2)\nonumber \\&\quad \le \frac{C\eta _n^2}{n}\mathrm{tr} \mathbf T \mathbf T^*+\frac{C}{n^2}\mathrm{tr} \mathbf T \mathbf T^*. \end{aligned}$$
    (20)

    For \(\frac{1}{\sqrt{n}}{\mathbf X}_{nk}\), there exist a \((2p-2)\times q\) matrix \(\mathbf U\) with orthonormal columns and a \(2n\times q\) matrix \(\mathbf V\) with orthonormal columns such that

    $$\begin{aligned} \frac{1}{\sqrt{n}}{\mathbf X}_{nk}=\mathbf U \mathrm{diag}(s_1,\ldots ,s_q)\mathbf V^* \end{aligned}$$

    where \(s_1,\ldots ,s_q\) are the singular values of \(\frac{1}{\sqrt{n}}{\mathbf X}_{nk}\) and \(q=\min \{(2p-2),2n\}\). Then, we get

    $$\begin{aligned} \mathbf I_{2n}-\mathbf T=&\left( \frac{1}{\sqrt{n}}{\mathbf X}_{nk}^*\right) {\mathbf R_k}\left( \frac{1}{\sqrt{n}}{\mathbf X}_{nk}\right) \\ =&\,\mathbf V \mathrm{diag}\left( \frac{s_1^2}{s_1^2-z},\cdots ,\frac{s_q^2}{s_q^2-z}\right) \mathbf V^* \end{aligned}$$

    which implies that

    $$\begin{aligned} \mathbf T =\mathbf V \mathrm{diag} \left( \frac{-z}{s_1^2-z},\cdots ,\frac{-z}{s_q^2-z}\right) \mathbf V^*. \end{aligned}$$

    Consequently, it follows that

    $$\begin{aligned} \mathrm{tr} \mathbf T \mathbf T^*=\sum _{j=1}^{q}\frac{|z|^2}{|s_j^2-z|^2}\le \frac{2n|z|^2}{\upsilon ^2}. \end{aligned}$$
    (21)

    By (20) and (21), we obtain

    $$\begin{aligned} { E}|\mathrm{tr}\varvec{\varepsilon }_k-\widetilde{ E}\mathrm{tr}\varvec{\varepsilon }_k|^2\rightarrow 0. \end{aligned}$$
    (22)
  (b)

    Next, we estimate the second term on the right-hand side of (19). Note that

    $$\begin{aligned} \widetilde{ E}\mathrm{tr}\varvec{\varepsilon }_k-{ E}\mathrm{tr}\varvec{\varepsilon }_k=\frac{z}{n}\left( E\mathrm{tr}{\mathbf R_k}-\mathrm{tr}{\mathbf R_k}\right) . \end{aligned}$$

    Using the martingale decomposition method, we have

    $$\begin{aligned} { E}|\widetilde{ E}\mathrm{tr}\varvec{\varepsilon }_k-{ E}\mathrm{tr}\varvec{\varepsilon }_k|^2=&\,\frac{|z|^2}{n^2}{ E}|E\mathrm{tr}{\mathbf R_k}-\mathrm{tr}{\mathbf R_k}|^2\le \frac{4|z|^2}{n\upsilon ^2}\rightarrow 0. \end{aligned}$$
    (23)
  (c)

    Finally, combining (17), (19), (22), and (23), we conclude that

    $$\begin{aligned} { E}|\mathrm{tr}\varvec{\varepsilon }_k^2|\rightarrow 0. \end{aligned}$$

This completes the proof of the lemma. \(\square \)

Now, we are in a position to show that

$$\begin{aligned} \delta _n\rightarrow 0. \end{aligned}$$

By (15) and (18), we can write

$$\begin{aligned} {\delta _n}&= - \frac{1}{2p({{1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)}})^2}\sum \limits _{k = 1}^pE\mathrm{tr}{\varvec{\varepsilon }_k}\\&\quad + \frac{1}{2p({{1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)}})^2}{\sum \limits _{k = 1}^p{ E}\frac{\mathrm{tr}{\varvec{\varepsilon }_k^2}}{(1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)) + \theta _k}}. \end{aligned}$$

Note that

$$\begin{aligned} \mathfrak {I}(1 - z - {y_n} - {y_n}z{ E}{{m}_n}(z)) < - \upsilon , \end{aligned}$$

which implies that

$$\begin{aligned} | {1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)} | > \upsilon . \end{aligned}$$
(24)

By Lemma 13 and (24), we have

$$\begin{aligned} \left| \frac{1}{2p({{1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)}})^2}\sum \limits _{k = 1}^pE\mathrm{tr}{\varvec{\varepsilon }_k}\right| \le \frac{1}{2p\upsilon ^2}\sum \limits _{k = 1}^p|E\mathrm{tr}{\varvec{\varepsilon }_k}| \rightarrow 0. \end{aligned}$$
(25)

Together with Lemma 14, (24), and

$$\begin{aligned}&\mathfrak {I}(\theta _k+ (1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)))\\&\quad =\mathfrak {I}\left( {\frac{1}{n}{\varvec{\alpha }_k^{\prime }}{{\bar{\varvec{\alpha }} }_k} - z - \frac{1}{{{n^2}}}{\varvec{\alpha }_k^{\prime }}{\mathbf X_{nk}^*}{{\left( \frac{1}{n}{\mathbf X_{nk}}{\mathbf X_{nk}^*} - z{\mathbf I_{2p - 2}}\right) }^{ - 1}}{\mathbf X_{nk}}{{\bar{\varvec{\alpha }} }_k}}\right) \\&\quad = -\upsilon \left( {1 + \frac{1}{{{n^2}}}{\varvec{\alpha }_k^{\prime }}{\mathbf X_{nk}^*}{{\left[ {{{\left( \frac{1}{n}{\mathbf X_{nk}}{\mathbf X_{nk}^*} - u{\mathbf I_{2p - 2}}\right) }^2} + {\upsilon ^2}{\mathbf I_{2p - 2}}} \right] }^{ - 1}}{\mathbf X_{nk}}{{\bar{\varvec{\alpha }} }_k}} \right) < - \upsilon \end{aligned}$$

one finds that

$$\begin{aligned}&\left| \frac{1}{2p({{1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)}})^2} \sum \limits _{k = 1}^p{ E}\frac{\mathrm{tr}{\varvec{\varepsilon }_k^2}}{(1 - z - {y_n} - {y_n}z{ E}{{ m}_n}(z)) + \theta _k}\right| \nonumber \\&\quad \le \frac{1}{2p\upsilon ^3}\sum \limits _{k = 1}^p{ E}|\mathrm{tr}{\varvec{\varepsilon }_k^2}| \rightarrow 0. \end{aligned}$$
(26)

Combining (25) with (26), we get

$$\begin{aligned} |{\delta _n}|&\le \left| \frac{1}{2p\left( {{1 - z - {y_n} - {y_n}z{ E}{{ m}_n}\left( z\right) }}\right) ^2}\sum \limits _{k = 1}^pE\mathrm{tr}{\varvec{\varepsilon }_k}\right| \\&\quad + \left| \frac{1}{2p\left( {{1 - z - {y_n} - {y_n}z{ E}{{ m}_n}\left( z\right) }}\right) ^2}{\sum \limits _{k = 1}^p{ E}\frac{\mathrm{tr}{\varvec{\varepsilon }_k^2}}{\left( 1 - z - {y_n} - {y_n}z{ E}{{ m}_n}\left( z\right) \right) + \theta _k}}\right| \\&\rightarrow 0. \end{aligned}$$

So far, we have completed the proof of the mean convergence

$$\begin{aligned} { E}{{ m}_n}(z) \rightarrow {\ m}\left( z\right) . \end{aligned}$$

3.2.3 Completion of the proof of Theorem 1

By Sects. 3.2.1 and 3.2.2, for any fixed \(z\in \mathbb C^+\), we have

$$\begin{aligned} { m}_n\left( z\right) \mathop {\longrightarrow }\limits ^\mathrm{a.s.} { m}\left( z\right) . \end{aligned}$$

To complete the proof of Theorem 1, we need the argument in the last part of Chapter 2 of Bai and Silverstein (2010). For the reader's convenience, we repeat it here. That is, for each \(z\in \mathbb C^+\), there exists a null set \(N_z\) (i.e., \(\text{ P }\left( N_z\right) =0\)) such that

$$\begin{aligned} { m}_n\left( z,w\right) \rightarrow { m}\left( z\right) ,\quad \text{ for } \text{ all }\ w\in N_z^c. \end{aligned}$$

Now, let \(\mathbb C_0^+\) be a dense subset of \(\mathbb C^+\) (e.g., all \(z\) with rational real and imaginary parts) and let \(N=\bigcup _{z\in \mathbb C_0^+} N_{z}\). Then,

$$\begin{aligned} { m}_n\left( z,w\right) \rightarrow { m}\left( z\right) , \quad \text{ for } \text{ all } w\in N^c\,\, \text{ and }\,\, z\in \mathbb C_0^+. \end{aligned}$$

Let \(\mathbb C_m^+=\{z\in \mathbb C^+:\ \mathfrak {I}z >1/m,\ |z|\le m\}\). When \(z\in \mathbb C_m^+\), we have \(|{ m}_n\left( z\right) |\le m\). Applying Lemma 23, we have

$$\begin{aligned} { m}_n\left( z,w\right) \rightarrow { m}\left( z\right) , \quad \text{ for } \text{ all } \ w\in N^c \,\, \text{ and }\,\, z\in \mathbb C_m^+. \end{aligned}$$

Since the convergence above holds for every \(m\), we conclude that

$$\begin{aligned} { m}_n\left( z,w\right) \rightarrow { m}\left( z\right) ,\quad \text{ for } \text{ all } \ w\in N^c \,\, \text{ and } \,\, z\in \mathbb C^+. \end{aligned}$$

Applying Lemma 24, we conclude that

$$\begin{aligned} F^{\mathbf S_n}\overset{w}{\rightarrow } F, \ \text{ a.s. }, \end{aligned}$$

where \(F\) denotes the M–P law. This completes the proof of Theorem 1. \(\square \)