Appendix 1
In this appendix, we give estimators of the parameters in our new test statistic, \({\widehat{T}}_{\mathrm{DT}}\), and discuss their asymptotic properties.
Estimation of \({x}_{jl}\)
Let \({\varvec{X}}=[{\varvec{x}}_{1},\ldots ,{\varvec{x}}_{n}]\), \(\overline{{\varvec{X}}}=[{\overline{{\varvec{x}}}},\ldots ,{\overline{{\varvec{x}}}}]\) and \({\varvec{P}}_{n}={\varvec{I}}_{n}-{\varvec{1}}_{n}{\varvec{1}}_{n}^T/n\), where \({\varvec{1}}_{n}=(1,\ldots ,1)^T\). Recall that \({\varvec{S}}\) is the sample covariance matrix. One can write \({\varvec{S}}=({\varvec{X}}-\overline{{\varvec{X}}})({\varvec{X}}-\overline{{\varvec{X}}})^T/(n-1)={\varvec{X}}{\varvec{P}}_{n}{\varvec{X}}^T/(n-1)\). Let us write the eigen-decomposition of \({\varvec{S}}\) as \({\varvec{S}}=\sum _{j=1}^{p}\hat{\lambda }_{j}\hat{{\varvec{h}}}_{j}\hat{{\varvec{h}}}_{j}^T\) with eigenvalues \(\hat{\lambda }_{1}\ge \cdots \ge \hat{\lambda }_{p}\ge 0\) and corresponding p-dimensional unit eigenvectors \(\hat{{\varvec{h}}}_{1},\ldots ,\hat{{\varvec{h}}}_{p}\). We assume \(P({\varvec{h}}_{j}^T\hat{{\varvec{h}}}_{j} \ge 0)=1\) for all \(j\) without loss of generality. We also define the following \(n \times n\) dual sample covariance matrix:
$$\begin{aligned} {\varvec{S}}_{D}=(n-1)^{-1}{\varvec{P}}_{n}{\varvec{X}}^T{\varvec{X}} {\varvec{P}}_n =(n-1)^{-1}({\varvec{X}}-\overline{{\varvec{X}}})^T({\varvec{X}}-\overline{{\varvec{X}}}). \end{aligned}$$
Note that \({\varvec{S}}\) and \({\varvec{S}}_{D}\) share the same non-zero eigenvalues. Let us write the eigen-decomposition of \({\varvec{S}}_{D}\) as \({\varvec{S}}_{D}=\sum _{j=1}^{n-1}\hat{\lambda }_{j}\hat{{\varvec{u}}}_{j}\hat{{\varvec{u}}}_{j}^T\), where \(\hat{{\varvec{u}}}_{j}=(\hat{u}_{j1},\ldots ,\hat{u}_{jn})^T\) denotes an n-dimensional unit eigenvector corresponding to \(\hat{\lambda }_{j}\). In high-dimensional settings, we calculate \(\hat{{\varvec{h}}}_{j}\) from \(\hat{{\varvec{u}}}_{j}\) as follows:
$$\begin{aligned} \hat{{\varvec{h}}}_{j}=\{(n-1)\hat{\lambda }_{j}\}^{-1/2}({\varvec{X}}-\overline{{\varvec{X}}}) \hat{{\varvec{u}}}_{j}. \end{aligned}$$
Note that \({\varvec{1}}_{n}^T {\varvec{S}}_D{\varvec{1}}_{n}=0\), so that \({\varvec{1}}_{n}^T\hat{{\varvec{u}}}_{j}=\sum _{l=1}^n\hat{u}_{jl}=0\) when \(\hat{\lambda }_{j}>0\).
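As a computational remark, this dual approach is straightforward to implement. The following NumPy sketch (the function name and interface are ours, purely for illustration) forms \({\varvec{S}}_{D}\), extracts \((\hat{\lambda }_{j},\hat{{\varvec{u}}}_{j})\) and recovers \(\hat{{\varvec{h}}}_{j}\) without ever forming the \(p\times p\) matrix \({\varvec{S}}\):

```python
import numpy as np

def dual_eigen(X):
    """Sketch: eigenpairs of S via the n x n dual matrix S_D.

    X is the p x n data matrix [x_1, ..., x_n]; returns the sample
    eigenvalues (descending) together with the dual unit eigenvectors
    u_hat_j and the eigenvectors h_hat_j of S as columns.
    """
    p, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)             # X - X_bar
    S_D = Xc.T @ Xc / (n - 1)                          # n x n dual matrix
    lam, U = np.linalg.eigh(S_D)                       # ascending order
    lam, U = lam[::-1][:n - 1], U[:, ::-1][:, :n - 1]  # at most n-1 nonzero
    # h_hat_j = {(n-1) lam_hat_j}^{-1/2} (X - X_bar) u_hat_j
    H = Xc @ U / np.sqrt((n - 1) * lam)                # guard lam > 0 in practice
    return lam, U, H
```

When \(p \gg n\), this costs \(O(pn^2)\) operations rather than the \(O(p^2 n)\) needed to form \({\varvec{S}}\) itself.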
For high-dimensional data, the sample eigenvalues and eigenvectors are subject to substantial noise; see Jung and Marron (2009) and Shen et al. (2016) for the details. In order to reduce this noise, Yata and Aoshima (2012) focused on a geometric representation of \({\varvec{S}}_{D}\) and proposed the noise-reduction (NR) method. When one applies the NR method, the \(\lambda _{j}\)s and \({\varvec{h}}_j\)s are estimated by
$$\begin{aligned} {\tilde{\lambda }}_{j}=\hat{\lambda }_{j}-\frac{\text{ tr }({\varvec{S}}_{D}) -\sum _{l=1}^j\hat{\lambda }_{l} }{n-1-j}\quad (j=1,\ldots ,n-2) \hbox { and } \end{aligned}$$
(17)
$$\begin{aligned} {\tilde{{\varvec{h}}}}_{j}=\{(n-1){\tilde{\lambda }}_{j}\}^{-1/2} ({\varvec{X}}-\overline{{\varvec{X}}})\hat{{\varvec{u}}}_{j}\quad (j=1,\ldots ,n-2). \end{aligned}$$
(18)
Note that \(P({\tilde{\lambda }}_j \ge 0)=1\) for \(j=1,\ldots ,n-2\). We emphasize that the \({\tilde{\lambda }}_{j}\)s and \({\tilde{{\varvec{h}}}}_j\)s enjoy consistency properties under much milder conditions than the \(\hat{\lambda }_{j}\)s and \(\hat{{\varvec{h}}}_j\)s. However, for the estimation of \(x_{jl}={\varvec{x}}_{l}^T{{\varvec{h}}}_{j}\), Aoshima and Yata (2018a) showed that \({\varvec{x}}_{l}^T{\tilde{{\varvec{h}}}}_{j}\) involves a non-negligible bias and gave a modification for all \(j,l\) by
$$\begin{aligned} {\tilde{{\varvec{h}}}}_{jl}=\Big (\frac{n-1}{n-2}\Big ) \frac{({\varvec{X}}-\overline{{\varvec{X}}}) \hat{{\varvec{u}}}_{jl} }{\{(n-1){\tilde{\lambda }}_{j}\}^{1/2}}=\frac{(n-1)^{1/2}({\varvec{X}}-\overline{{\varvec{X}}}) \hat{{\varvec{u}}}_{jl} }{(n-2){{\tilde{\lambda }}}_{j}^{1/2}}, \end{aligned}$$
where
$$\begin{aligned} \hat{{\varvec{u}}}_{jl}=(\hat{u}_{j1},\ldots ,\hat{u}_{jl-1},-\hat{u}_{jl}/(n-1), \hat{u}_{jl+1},\ldots ,\hat{u}_{jn})^T. \end{aligned}$$
Note that \(\sum _{l=1}^{n}\hat{{\varvec{u}}}_{jl}/n=\{(n-2)/(n-1)\}\hat{{\varvec{u}}}_{j}\) and \(\sum _{l=1}^{n}{\tilde{{\varvec{h}}}}_{jl}/n={\tilde{{\varvec{h}}}}_{j}\). Then, we estimate \(x_{jl}\) by
$$\begin{aligned} {\tilde{x}}_{jl}={\varvec{x}}_{l}^T{\tilde{{\varvec{h}}}}_{jl} \ \hbox { for all }j,l. \end{aligned}$$
(19)
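The following sketch (again our own illustration, not the authors' code) assembles (17)-(19). It uses the fact that \(\hat{{\varvec{u}}}_{jl}\) differs from \(\hat{{\varvec{u}}}_{j}\) only in the l-th entry, so each \({\tilde{x}}_{jl}\) is a rank-one correction of the naive score \({\varvec{x}}_{l}^T({\varvec{X}}-\overline{{\varvec{X}}})\hat{{\varvec{u}}}_{j}\) up to scaling; a guard for \({\tilde{\lambda }}_{j}=0\) is omitted.

```python
import numpy as np

def nr_estimates(X):
    """Sketch of the NR estimates (17) and bias-corrected scores (19).

    X is the p x n data matrix; returns lam_til (length n - 2) and
    x_til of shape (n - 2, n), where x_til[j - 1, l - 1] = x_til_{jl}.
    """
    p, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)             # X - X_bar
    S_D = Xc.T @ Xc / (n - 1)
    lam, U = np.linalg.eigh(S_D)
    lam, U = lam[::-1][:n - 2], U[:, ::-1][:, :n - 2]
    # (17): subtract the average of the remaining (noise) eigenvalues
    j = np.arange(1, n - 1)                            # j = 1, ..., n - 2
    lam_til = lam - (np.trace(S_D) - np.cumsum(lam)) / (n - 1 - j)
    # (19): u_hat_{jl} replaces the l-th entry of u_hat_j by
    # -u_hat_{jl}/(n - 1), i.e. a rank-one correction of the naive score
    c = np.sqrt(n - 1) / ((n - 2) * np.sqrt(lam_til))  # assumes lam_til > 0
    naive = X.T @ (Xc @ U)                             # x_l^T (X - X_bar) u_j
    diag = np.sum(X * Xc, axis=0)                      # x_l^T (x_l - x_bar)
    x_til = c * (naive - (n / (n - 1)) * U * diag[:, None])
    return lam_til, x_til.T
```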
From Lemma B.1 in Aoshima and Yata (2018a), we have the following result.
Proposition 5
Assume (A-i) and (A-ii). It holds for \(j=1,\ldots ,k\) that as \(m \rightarrow \infty\)
$$\begin{aligned} \sum _{l=1}^{n}\frac{({\tilde{x}}_{jl}-{x}_{jl})^2}{n \lambda _j}=O_P\Big ( \frac{\lambda _1^2/n+{\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}}{\lambda _j^2}\Big ). \end{aligned}$$
Note that \(\text{ Var }({x}_{jl})=\lambda _j\). If \(\lambda _1^2/(n\lambda _j^2)=o(1)\), the normalized mean squared error in Proposition 5 tends to 0 under \(H_0\) in (1). See Sect. 5.1 in Aoshima and Yata (2018a) for the details.
Estimation of \(K_{1*}\)
We use the cross-data-matrix (CDM) method given by Yata and Aoshima (2010) to estimate \(K_{1*}\). Let \(n_{(1)}=\lceil n/2 \rceil\) and \(n_{(2)}=n-n_{(1)}\). Let \({\varvec{X}}_{1}=[{\varvec{x}}_{1},\ldots ,{\varvec{x}}_{n_{(1)}}]\) and \({\varvec{X}}_{2}=[{\varvec{x}}_{n_{(1)}+1},\ldots ,{\varvec{x}}_{n}]\). We define
$$\begin{aligned} {\varvec{S}}_{D(1)}=\{(n_{(1)}-1)(n_{(2)}-1)\}^{-1/2}({\varvec{X}}_{1} -\overline{{\varvec{X}}}_{1})^T({\varvec{X}}_{2}-\overline{{\varvec{X}}}_{2}), \end{aligned}$$
where \(\overline{{\varvec{X}}}_{i}=[{\overline{{\varvec{x}}}}_{n(i)},\ldots ,{\overline{{\varvec{x}}}}_{n(i)}]\) with \({\overline{{\varvec{x}}}}_{n(1)}=\sum _{l=1}^{n_{(1)}}{\varvec{x}}_{l}/n_{(1)}\) and \({\overline{{\varvec{x}}}}_{n(2)}=\sum _{l=n_{(1)}+1}^{n}{\varvec{x}}_{l}/n_{(2)}\). We estimate \(\lambda _{j}\) by the j-th singular value, \(\acute{\lambda }_{j}\), of \({\varvec{S}}_{D(1)}\), where
$$\begin{aligned} \acute{\lambda }_{1}\ge \cdots \ge \acute{\lambda }_{n_{(2)}-1}\ge 0. \end{aligned}$$
Yata and Aoshima (2010) showed that \(\acute{\lambda }_{j}\) has several consistency properties for high-dimensional non-Gaussian data. Note that \(E\{\text{ tr }({\varvec{S}}_{D(1)}{\varvec{S}}_{D(1)}^T)\}=\text{ tr }\left( {\varvec{\varSigma }}^2 \right)\). We estimate \({\varPsi }_{r}\) by \(\widehat{\varPsi }_{1}=\text{ tr }({\varvec{S}}_{D(1)}{\varvec{S}}_{D(1)}^T)\) for \(r=1\) and
$$\begin{aligned} \widehat{\varPsi }_{r}=\text{ tr }({\varvec{S}}_{D(1)}{\varvec{S}}_{D(1)}^T) -\sum _{s=1}^{r-1}\acute{\lambda }_{s}^2\ \ \hbox {for }r=2,\ldots ,n_{(2)}-1. \end{aligned}$$
(20)
Note that \(P(\widehat{\varPsi }_{r}\ge 0)=1\) for \(r=1,\ldots ,n_{(2)}-1\). Then, Aoshima and Yata (2018a) gave the following result.
Lemma 1
(Aoshima and Yata 2018a) Assume (A-i) and (A-ii). Then, it holds that \(\widehat{\varPsi }_{r}/{{\varPsi }_{r}}=1+o_P(1)\) as \(m\rightarrow \infty\) for \(r=1,\ldots ,k+1\).
Thus, we estimate \(\text{ tr }({\varvec{\varSigma }}_*^2)\) by \(\widehat{\varPsi }_{k+1}\). Let
$$\begin{aligned} \widehat{K}_{1*}=2\widehat{\varPsi }_{k+1}/\{n(n-1) \}. \end{aligned}$$
Then, from Lemma 1, under (A-i) and (A-ii), it holds that
$$\begin{aligned} \widehat{K}_{1*}/{K}_{1*}=1+o_P(1) \ \hbox { as }m\rightarrow \infty . \end{aligned}$$
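For reference, the whole computation of the \(\widehat{\varPsi }_{r}\)s and \(\widehat{K}_{1*}\) amounts to one thin singular value decomposition. A minimal sketch under the splitting above (function name and interface ours):

```python
import numpy as np

def cdm_K1(X, k):
    """Sketch of (20) and K_hat_{1*} via the CDM method.

    X is the p x n data matrix, k the number of spiked eigenvalues;
    returns the vector (Psi_hat_1, ..., Psi_hat_{n2 - 1}) and K_hat_{1*}.
    """
    p, n = X.shape
    n1 = -(-n // 2)                                # n_(1) = ceil(n / 2)
    n2 = n - n1
    X1c = X[:, :n1] - X[:, :n1].mean(axis=1, keepdims=True)
    X2c = X[:, n1:] - X[:, n1:].mean(axis=1, keepdims=True)
    S_D1 = X1c.T @ X2c / np.sqrt((n1 - 1) * (n2 - 1))  # n1 x n2
    sv = np.linalg.svd(S_D1, compute_uv=False)     # lambda_acute, descending
    tr_cross = np.sum(S_D1 ** 2)                   # = tr(S_D1 S_D1^T)
    # (20): Psi_hat_r = tr_cross - sum_{s < r} lambda_acute_s^2
    Psi = tr_cross - np.concatenate(([0.0], np.cumsum(sv ** 2)))[:n2 - 1]
    K1_hat = 2 * Psi[k] / (n * (n - 1))            # Psi[k] = Psi_hat_{k+1}
    return Psi, K1_hat
```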
Estimation of k
Recently, Jung et al. (2018) proposed a test of the number of spiked components for high-dimensional data. On the other hand, Aoshima and Yata (2018a) gave an estimator of \(k\) in (A-ii) based on the CDM method. Let \(\hat{\tau }_{r}=\widehat{\varPsi }_{r+1}/\widehat{\varPsi }_{r}\ (=1-\acute{\lambda }_{r}^2/\widehat{\varPsi }_{r})\) for all r, where \(\widehat{\varPsi }_{r}\) is given by (20). Note that \(\hat{\tau }_{r}\in [0,1)\) when \(\acute{\lambda }_{r}>0\). Then, Aoshima and Yata (2018a) gave the following results.
Proposition 6
(Aoshima and Yata 2018a) Assume (A-i) and (A-ii). It holds that as \(m\rightarrow \infty\)
$$\begin{aligned}&P(\hat{\tau }_{r}<1-c_r)\rightarrow 1 \ \ \hbox {with some fixed constant }c_r\in (0,1)\hbox { for }r=1,\ldots ,k;\\&\hat{\tau }_{k+1}=1+o_P(1). \end{aligned}$$
Proposition 7
(Aoshima and Yata 2018a) Assume (A-i), (A-ii) and (A-v). Assume also \(\lambda _{k+1}^2/\varPsi _{k+1}=O(n^{-c})\) as \(m\rightarrow \infty\) with some fixed constant \(c>1/2\). It holds that as \(m\rightarrow \infty\)
$$\begin{aligned} P\Big (\hat{\tau }_{k+1}> \{1+(k+1)\gamma (n)\}^{-1} \Big ) \rightarrow 1, \end{aligned}$$
where \(\gamma (n)\) is a function such that \(\gamma (n)\rightarrow 0\) and \(n^{1/2}\gamma (n)\rightarrow \infty\) as \(n\rightarrow \infty\).
From Propositions 6 and 7, if one can assume the conditions in Proposition 7, one may consider k as the first integer \(r\ (={\hat{k}}_{o},\ \hbox {say})\) such that
$$\begin{aligned} \hat{\tau }_{r+1}\{1+(r+1)\gamma (n)\}>1 \quad (r \ge 0). \end{aligned}$$
(21)
Then, it holds that \(P({\hat{k}}_{o}=k)\rightarrow 1\) as \(m\rightarrow \infty\). Note that \(\widehat{\varPsi }_{n_{(2)}}=0\) from the fact that \(\text{ rank }({\varvec{S}}_{D(1)})\le n_{(2)}-1\). Finally, one may choose k as
$$\begin{aligned} {\hat{k}}=\min \{{\hat{k}}_{o},n_{(2)}-2\} \end{aligned}$$
in actual data analysis. According to Aoshima and Yata (2018a), we use \(\gamma (n)=(n^{-1} \log {n})^{1/2}\) in (21). If \({\hat{k}}=0\), the data are regarded as following the NSSE model.
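Computationally, the rule (21) is a single scan over the \(\hat{\tau }_{r}\)s. A minimal sketch, taking the vector \((\widehat{\varPsi }_{1},\ldots ,\widehat{\varPsi }_{n_{(2)}-1})\) computed above (names ours):

```python
import numpy as np

def estimate_k(Psi, n):
    """Sketch of k_hat = min(k_hat_o, n_(2) - 2) via the rule (21).

    Psi holds (Psi_hat_1, ..., Psi_hat_{n2 - 1}); n is the sample size.
    Uses gamma(n) = (log(n) / n)^{1/2} as in the text.
    """
    n2 = len(Psi) + 1
    gamma = np.sqrt(np.log(n) / n)
    for r in range(n2 - 2):                        # r = 0, 1, ...
        tau = Psi[r + 1] / Psi[r]                  # tau_hat_{r+1}
        if tau * (1 + (r + 1) * gamma) > 1:        # criterion (21)
            return r                               # = k_hat_o
    return n2 - 2                                  # cap at n_(2) - 2
```

In practice one would compute the \(\widehat{\varPsi }_{r}\)s once, select \({\hat{k}}\), and then set \(\widehat{K}_{1*}=2\widehat{\varPsi }_{{\hat{k}}+1}/\{n(n-1)\}\), so the two sketches above share all of their heavy computation.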
Appendix 2
Let \(\psi _{r}=\lambda _1^2/(n^2\lambda _{r})+{\varvec{\mu }}^T{\varvec{\varSigma }}{\varvec{\mu }}/(n\lambda _{r})\) for \(r=1,\ldots ,k\).
Proofs of Propositions 1 and 2
From the fact that \(\lambda _{k+1}^2\le \text{ tr }({\varvec{\varSigma }}_*^2)\) together with \(K_{1*}=2\text{ tr }({\varvec{\varSigma }}_*^2)/\{n(n-1)\}\), so that \(\lambda _{k+1}/n=O(K_{1*}^{1/2})\), we note that
$$\begin{aligned} K_{2*}\le 4\varDelta _* \lambda _{k+1}/n =O( \varDelta _* K_{1*}^{1/2}). \end{aligned}$$
(22)
Then, under (A-iv), it holds that as \(m\rightarrow \infty\)
$$\begin{aligned} \text{ Var }(T_{\mathrm{DT}})/\varDelta _*^2=K_*/\varDelta _*^2\rightarrow 0. \end{aligned}$$
(23)
Thus, we can conclude the result of Proposition 2. On the other hand, from (22), under (A-ii) and (A-iii), it holds that \(K_{2*}=o(K_{1*})\). Then, from Theorem 5 in Aoshima and Yata (2015), we can conclude the result of Proposition 1.
Proof of Theorem 1
From (S6.28) in Appendix B of Aoshima and Yata (2018a), it holds for \(r=1,\ldots ,k\) that
$$\begin{aligned} \sum _{l<l'}^{n} \frac{ {\tilde{x}}_{rl} {\tilde{x}}_{rl'}-{x}_{rl} {x}_{rl'}}{n(n-1)} =O_P\Big \{\psi _{r}^{1/2}(\psi _{r}^{1/2}+\lambda _{r}^{1/2} /n^{1/2}+{\varvec{h}}_r^T{\varvec{\mu }})\Big \} \end{aligned}$$
(24)
as \(m\rightarrow \infty\) under (A-i) and (A-ii). Here, by noting that \(\text{ tr }({\varvec{\varSigma }}_*^2)/\lambda _k^2=O(1)\) and \({\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}\le \varDelta \lambda _1\), we have that for \(r=1,\ldots ,k\)
$$\begin{aligned} \lambda _1^2/(n^2 \lambda _r)&=o(K_{1*}^{1/2});\quad {\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}/(n \lambda _r)=O\{\varDelta _* \lambda _1/(n \lambda _r)\}=o(\varDelta _*); \\ \lambda _1^2/n^3&=o(K_{1*}); \ \hbox { and } \ {\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}/n^2=O(\varDelta _* \lambda _1/n^2 )=o(\varDelta _* K_{1*}^{1/2}) \end{aligned}$$
under (A-ii), (A-v) and (A-vi), so that for \(r=1,\ldots ,k\)
$$\begin{aligned} \psi _{r}=o\big (\max \{K_{1*}^{1/2}, \varDelta _* \}\big ) \ \hbox { and } \ \psi _{r}^{1/2}\lambda _{r}^{1/2}/n^{1/2}=o\big (\max \{K_{1*}^{1/2}, \varDelta _* \}\big ). \end{aligned}$$
Also, note that \(|{\varvec{h}}_r^T{\varvec{\mu }}|\le \varDelta ^{1/2}=O(\varDelta _{*}^{1/2})\) under (A-vi), so that for \(r=1,\ldots ,k\)
$$\begin{aligned} \psi _{r}^{1/2} {\varvec{h}}_r^T{\varvec{\mu }}=o\big (\max \{K_{1*}^{1/2}, \varDelta _* \}\big ). \end{aligned}$$
Thus, from (24) we can conclude the first result of Theorem 1. In addition, from the first result, it holds that
$$\begin{aligned} {\widehat{T}}_{\mathrm{DT}}=T_{\mathrm{DT}}+o_P( K_{1*}^{1/2} ) \end{aligned}$$
under (A-i) to (A-iii), (A-v) and (A-vi). Thus, from Proposition 1, we can conclude the second result of Theorem 1.
Proofs of Corollaries 1 and 2
From the first result of Theorem 1, it holds that
$$\begin{aligned} {\widehat{T}}_{\mathrm{DT}}/\varDelta _*={T}_{\mathrm{DT}}/\varDelta _*+o_P(1) \end{aligned}$$
under (A-i), (A-ii) and (A-iv) to (A-vi). Thus, from Proposition 2, we can conclude the result of Corollary 1. In addition, from Corollary 1 and Lemma 1, it holds that
$$\begin{aligned} P\left( {\widehat{T}}_{\mathrm{DT}}/{\widehat{K}_{1*}^{1/2}}>z_{\alpha }\right) =P\left( {\widehat{T}}_{\mathrm{DT}}/{\varDelta _*}>z_{\alpha }{\widehat{K}_{1*}^{1/2}}/{\varDelta _*}\right) =P\{1+o_P(1)>o_P(1)\} \rightarrow 1 \end{aligned}$$
under (A-i), (A-ii) and (A-iv) to (A-vi). This concludes the result of Corollary 2.
Proof of Theorem 2
First, we consider the case when (A-iii) is met. We note that \(K_*/K_{1*}\rightarrow 1\) as \(m\rightarrow \infty\) under (A-ii) and (A-iii). Then, from Theorem 1 and Lemma 1, under (A-i) to (A-iii), (A-v) and (A-vi), we have that
$$\begin{aligned}&P\left( {\widehat{T}}_{\mathrm{DT}}/{\widehat{K}_{1*}^{1/2}}>z_{\alpha }\right) \nonumber \\&\quad =P\{ \left( {\widehat{T}}_{\mathrm{DT}}-\varDelta _*\right) /K_{*}^{1/2}> \left( z_{\alpha }{K}_{1*}^{1/2}-\varDelta _*\right) /K_*^{1/2}+o_P(1)\}\nonumber \\&\quad =\varPhi \{ \left( \varDelta _*-z_{\alpha }K_{1*}^{1/2}\right) /K_*^{1/2}\}+o(1) =\varPhi \left( {\varDelta }_*/{K_{1*}^{1/2}}-z_{\alpha }\right) +o(1). \end{aligned}$$
(25)
This concludes the result on the size in Theorem 2. On the other hand, under (A-iv), from (23), it holds that
$$\begin{aligned} \varPhi \{ \left( \varDelta _*-z_{\alpha }K_{1*}^{1/2}\right) /K_*^{1/2}\}=1+o(1). \end{aligned}$$
Hence, from (25) and Corollary 2, by considering a convergent subsequence of \(\varDelta _*^2/K_{1*}\), we can conclude the result on the power in Theorem 2.
Proofs of Propositions 3 and 4
By noting that \(E({\varvec{x}}_l-{\varvec{\mu }})={\varvec{0}}\) (\(l=1,\ldots ,n\)) and \(\text{ Var }(T_{{\varvec{\mu }}})=K_{1*}\), the results are obtained straightforwardly from the results of Proposition 1 and Theorem 1.
Proof of Theorem 3
By noting that \({\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}\le \varDelta \lambda _1\), we have that for \(r=1,\ldots ,k\)
$$\begin{aligned} \lambda _1^2/(n^2 \lambda _r)&=O(\lambda _1/n^{3/2})=o(\varDelta );\quad {\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}/(n \lambda _r)=O\{\varDelta \lambda _1/(n \lambda _r)\}=o(\varDelta ); \\ \lambda _1^2/n^3&=o(\varDelta ^2 ); \ \hbox { and } \ {\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}/n^2=O(\varDelta \lambda _1/n^2 )=o(\varDelta ^2) \end{aligned}$$
as \(m\rightarrow \infty\) under (A-vii). Thus, from (24) it holds that
$$\begin{aligned} {\widehat{T}}_{\mathrm{DT}}/\varDelta =T_{\mathrm{DT}}/\varDelta +o_P(1) \end{aligned}$$
under (A-i), (A-ii) and (A-vii). Then, from Proposition 2, we can conclude the result of Theorem 3.
Proof of Proposition 5
From Lemma B.1 in Appendix B of Aoshima and Yata (2018a), under (A-i) and (A-ii), it holds for \(j=1,\ldots ,k\) that as \(m \rightarrow \infty\)
$$\begin{aligned} \sum _{l=1}^{n}\frac{({\tilde{x}}_{jl}-{x}_{jl})^2}{n}=O_P(n\psi _{j}). \end{aligned}$$
Thus, we can conclude the result of Proposition 5.