Inference on high-dimensional mean vectors under the strongly spiked eigenvalue model

Abstract

In this paper, we discuss inference problems on high-dimensional mean vectors under the strongly spiked eigenvalue (SSE) model. First, we consider the one-sample test. To avoid the huge noise inherent in high-dimensional data, we derive a new test statistic by using a data transformation technique. We show that asymptotic normality can be established for the new test statistic. We give the asymptotic size and power of the new test procedure and investigate its performance both theoretically and numerically. We apply the findings to the construction of confidence regions for the mean vector under the SSE model. We further discuss multi-sample problems under the SSE model. Finally, we demonstrate the new test procedure by using actual microarray data sets.

References

  1. Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis (3rd ed.). New York: Wiley.

  2. Aoshima, M., & Yata, K. (2011). Two-stage procedures for high-dimensional data. Sequential Analysis, 30, 356–399. (Editor’s special invited paper).

  3. Aoshima, M., & Yata, K. (2014). A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data. Annals of the Institute of Statistical Mathematics, 66, 983–1010.

  4. Aoshima, M., & Yata, K. (2015). Asymptotic normality for inference on multisample, high-dimensional mean vectors under mild conditions. Methodology and Computing in Applied Probability, 17, 419–439.

  5. Aoshima, M., & Yata, K. (2018a). Two-sample tests for high-dimension, strongly spiked eigenvalue models. Statistica Sinica, 28, 43–62.

  6. Aoshima, M., & Yata, K. (2018b). Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models. Annals of the Institute of Statistical Mathematics, in press (https://doi.org/10.1007/s10463-018-0655-z).

  7. Bai, Z., & Saranadasa, H. (1996). Effect of high dimension: By an example of a two sample problem. Statistica Sinica, 6, 311–329.

  8. Bennett, B. M. (1951). Note on a solution of the generalized Behrens–Fisher problem. Annals of the Institute of Statistical Mathematics, 2, 87–90.

  9. Chen, S. X., & Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38, 808–835.

  10. Dempster, A. P. (1958). A high dimensional two sample significance test. The Annals of Mathematical Statistics, 29, 995–1010.

  11. Dempster, A. P. (1960). A significance test for the separation of two highly multivariate small samples. Biometrics, 16, 41–50.

  12. Ishii, A., Yata, K., & Aoshima, M. (2016). Asymptotic properties of the first principal component and equality tests of covariance matrices in high-dimension, low-sample-size context. Journal of Statistical Planning and Inference, 170, 186–199.

  13. Jung, S., & Marron, J. S. (2009). PCA consistency in high dimension, low sample size context. The Annals of Statistics, 37, 4104–4130.

  14. Jung, S., Lee, M. H., & Ahn, J. (2018). On the number of principal components in high dimensions. Biometrika, 105, 389–402.

  15. Katayama, S., Kano, Y., & Srivastava, M. S. (2013). Asymptotic distributions of some test criteria for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis, 116, 410–421.

  16. Nishiyama, T., Hyodo, M., Seo, T., & Pavlenko, T. (2013). Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices. Journal of Statistical Planning and Inference, 143, 1898–1911.

  17. Notterman, D. A., Alon, U., Sierk, A. J., & Levine, A. J. (2001). Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Research, 61, 3124–3130.

  18. Nutt, C. L., Mani, D. R., Betensky, R. A., Tamayo, P., Cairncross, J. G., Ladd, C., et al. (2003). Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research, 63, 1602–1607.

  19. Shen, D., Shen, H., Zhu, H., & Marron, J. S. (2016). The statistics and mathematics of high dimension low sample size asymptotics. Statistica Sinica, 26, 1747–1770.

  20. Srivastava, M. S. (2007). Multivariate theory for analyzing high dimensional data. Journal of the Japan Statistical Society, 37, 53–86.

  21. Srivastava, M. S., & Du, M. (2008). A test for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis, 99, 386–402.

  22. Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., et al. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203–209.

  23. Yata, K., & Aoshima, M. (2010). Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix. Journal of Multivariate Analysis, 101, 2060–2077.

  24. Yata, K., & Aoshima, M. (2012). Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. Journal of Multivariate Analysis, 105, 193–215.

  25. Yata, K., & Aoshima, M. (2015). Principal component analysis based clustering for high-dimension, low-sample-size data. arXiv preprint, arXiv:1503.04525.

Acknowledgements

We would like to thank an associate editor and two anonymous referees for their constructive comments.

Author information

Corresponding author: Aki Ishii.

Additional information

Research of the first author was partially supported by Grant-in-Aid for Young Scientists, Japan Society for the Promotion of Science (JSPS), under Contract Number 18K18015. Research of the second author was partially supported by Grant-in-Aid for Scientific Research (C), JSPS, under Contract Number 18K03409. Research of the third author was partially supported by Grants-in-Aid for Scientific Research (A) and Challenging Research (Exploratory), JSPS, under Contract Numbers 15H01678 and 17K19956.

Appendices

Appendix 1

In this section, we give estimators for the parameters in our new test statistic, \({\widehat{T}}_{\mathrm{DT}}\), and discuss their asymptotic properties.

Estimation of \({x}_{jl}\)

Let \({\varvec{X}}=[{\varvec{x}}_{1},\ldots ,{\varvec{x}}_{n}]\), \(\overline{{\varvec{X}}}=[{\overline{{\varvec{x}}}},\ldots ,{\overline{{\varvec{x}}}}]\) and \({\varvec{P}}_{n}={\varvec{I}}_{n}-{\varvec{1}}_{n}{\varvec{1}}_{n}^T/n\), where \({\varvec{1}}_{n}=(1,\ldots ,1)^T\). Recall \({\varvec{S}}\) is the sample covariance matrix. One can write that \({\varvec{S}}=({\varvec{X}}-\overline{{\varvec{X}}})({\varvec{X}}-\overline{{\varvec{X}}})^T/(n-1)={\varvec{X}}{\varvec{P}}_{n}{\varvec{X}}^T/(n-1)\). Let us write the eigen-decomposition of \({\varvec{S}}\) as \({\varvec{S}}=\sum _{j=1}^{p}\hat{\lambda }_{j}\hat{{\varvec{h}}}_{j}\hat{{\varvec{h}}}_{j}^T\) having eigenvalues \(\hat{\lambda }_{1}\ge \cdots \ge \hat{\lambda }_{p}\ge 0\) and the corresponding p-dimensional unit eigenvectors \(\hat{{\varvec{h}}}_{1},\ldots ,\hat{{\varvec{h}}}_{p}\). We assume \(P({\varvec{h}}_{j}^T\hat{{\varvec{h}}}_{j} \ge 0)=1\) for all j without loss of generality. We also define the following \(n \times n\) dual sample covariance matrix\(:\)

$$\begin{aligned} {\varvec{S}}_{D}=(n-1)^{-1}{\varvec{P}}_{n}{\varvec{X}}^T{\varvec{X}}{\varvec{P}}_n =(n-1)^{-1}({\varvec{X}}-\overline{{\varvec{X}}})^T({\varvec{X}}-\overline{{\varvec{X}}}). \end{aligned}$$

Note that \({\varvec{S}}\) and \({\varvec{S}}_{D}\) share the non-zero eigenvalues. Let us write the eigen-decomposition of \({\varvec{S}}_{D}\) as \({\varvec{S}}_{D}=\sum _{j=1}^{n-1}\hat{\lambda }_{j}\hat{{\varvec{u}}}_{j}\hat{{\varvec{u}}}_{j}^T\), where \(\hat{{\varvec{u}}}_{j}=(\hat{u}_{j1},\ldots ,\hat{u}_{jn})^T\) denotes an n-dimensional unit eigenvector corresponding to \(\hat{\lambda }_{j}\). In high-dimensional settings, we calculate \(\hat{{\varvec{h}}}_{j}\) from \(\hat{{\varvec{u}}}_{j}\) as follows\(:\)

$$\begin{aligned} \hat{{\varvec{h}}}_{j}=\{(n-1)\hat{\lambda }_{j}\}^{-1/2}({\varvec{X}}-\overline{{\varvec{X}}}) \hat{{\varvec{u}}}_{j}. \end{aligned}$$

Note that \({\varvec{1}}_{n}^T {\varvec{S}}_D{\varvec{1}}_{n}=0\), so that \({\varvec{1}}_{n}^T\hat{{\varvec{u}}}_{j}=\sum _{l=1}^n\hat{u}_{jl}=0\) when \(\hat{\lambda }_{j}>0\).
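
As a numerical illustration, the following is a minimal sketch (in NumPy) of \({\varvec{S}}_{D}\) and the recovery of \(\hat{{\varvec{h}}}_{j}\) from \(\hat{{\varvec{u}}}_{j}\). The dimensions, the random data and all variable names are illustrative assumptions, not part of the original formulation.

```python
# A minimal numerical sketch of S_D and the recovery of hat{h}_j from hat{u}_j.
# The dimensions, random data and variable names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
p, n = 1000, 20                        # high dimension, low sample size
X = rng.standard_normal((p, n))        # columns are the observations x_1, ..., x_n

Xc = X - X.mean(axis=1, keepdims=True) # centered data matrix X - bar{X}

# n x n dual sample covariance matrix S_D = (X - bar{X})^T (X - bar{X}) / (n - 1)
S_D = Xc.T @ Xc / (n - 1)

# Eigen-decomposition of S_D; its non-zero eigenvalues coincide with those of S (p x p)
evals, U = np.linalg.eigh(S_D)
order = np.argsort(evals)[::-1]        # sort eigenvalues in decreasing order
lam_hat, U = evals[order], U[:, order] # hat{lambda}_j and hat{u}_j (columns of U)

# hat{h}_j = {(n-1) hat{lambda}_j}^{-1/2} (X - bar{X}) hat{u}_j, a p-dimensional unit vector
j = 0                                  # first component (0-based index)
h_hat = Xc @ U[:, j] / np.sqrt((n - 1) * lam_hat[j])

print(np.linalg.norm(h_hat))           # approximately 1
print(np.abs(np.ones(n) @ U[:, j]))    # approximately 0, since 1_n^T S_D 1_n = 0
```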

For high-dimensional data, the sample eigenvalues and eigenvectors are heavily contaminated by noise; see Jung and Marron (2009) and Shen et al. (2016) for the details. To remove this huge noise, Yata and Aoshima (2012) focused on a geometric representation of \({\varvec{S}}_{D}\) and proposed the noise-reduction (NR) method. With the NR method, the \(\lambda _{j}\)s and \({\varvec{h}}_j\)s are estimated by

$$\begin{aligned} {\tilde{\lambda }}_{j}= & {} \hat{\lambda }_{j}-\frac{\text{ tr }({\varvec{S}}_{D}) -\sum _{l=1}^j\hat{\lambda }_{l} }{n-1-j}\quad (j=1,\ldots ,n-2) \hbox { and } \end{aligned}$$
(17)
$$\begin{aligned} {\tilde{{\varvec{h}}}}_{j}= & {} \{(n-1){\tilde{\lambda }}_{j}\}^{-1/2} ({\varvec{X}}-\overline{{\varvec{X}}})\hat{{\varvec{u}}}_{j}\quad (j=1,\ldots ,n-2). \end{aligned}$$
(18)

Note that \(P({\tilde{\lambda }}_j \ge 0)=1\) for \(j=1,\ldots ,n-2\). We emphasize that the \({\tilde{\lambda }}_{j}\)s and \({\tilde{{\varvec{h}}}}_j\)s enjoy consistency properties under much milder conditions than the \(\hat{\lambda }_{j}\)s and \(\hat{{\varvec{h}}}_j\)s. However, for the estimation of \(x_{jl}={\varvec{x}}_{l}^T{{\varvec{h}}}_{j}\), Aoshima and Yata (2018a) showed that \({\varvec{x}}_{l}^T{\tilde{{\varvec{h}}}}_{j}\) involves a huge bias and gave the following modification for all \(j,l\):

$$\begin{aligned} {\tilde{{\varvec{h}}}}_{jl}=\Big (\frac{n-1}{n-2}\Big ) \frac{({\varvec{X}}-\overline{{\varvec{X}}}) \hat{{\varvec{u}}}_{jl} }{\{(n-1){\tilde{\lambda }}_{j}\}^{1/2}}=\frac{(n-1)^{1/2}({\varvec{X}}-\overline{{\varvec{X}}}) \hat{{\varvec{u}}}_{jl} }{(n-2){{\tilde{\lambda }}}_{j}^{1/2}}, \end{aligned}$$

where

$$\begin{aligned} \hat{{\varvec{u}}}_{jl}=(\hat{u}_{j1},\ldots ,\hat{u}_{jl-1},-\hat{u}_{jl}/(n-1), \hat{u}_{jl+1},\ldots ,\hat{u}_{jn})^T. \end{aligned}$$

Note that \(\sum _{l=1}^{n}\hat{{\varvec{u}}}_{jl}/n=\{(n-2)/(n-1)\}\hat{{\varvec{u}}}_{j}\) and \(\sum _{l=1}^{n}{\tilde{{\varvec{h}}}}_{jl}/n={\tilde{{\varvec{h}}}}_{j}\). Then, we estimate \(x_{jl}\) by

$$\begin{aligned} {\tilde{x}}_{jl}={\varvec{x}}_{l}^T{\tilde{{\varvec{h}}}}_{jl} \ \hbox { for all }j,l. \end{aligned}$$
(19)
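
As a numerical illustration of (17)–(19), the following is a minimal sketch continuing the NumPy sketch above (with the illustrative variables X, lam_hat and U); the function name nr_scores and the 0-based component index are assumptions made here for illustration only.

```python
# A minimal sketch of the NR estimates (17)-(18) and the bias-corrected scores (19).
# The function name and the 0-based component index j are illustrative assumptions.
import numpy as np

def nr_scores(X, lam_hat, U, j):
    """NR eigenvalue tilde{lambda}, eigenvector tilde{h} and scores tilde{x}_{jl} for component j."""
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    # (17): tilde{lambda} = hat{lambda} - {tr(S_D) - sum of the leading hat{lambda}'s}/(n - 1 - j),
    # where the paper's index is 1-based; here j is 0-based, so the denominator is n - 2 - j
    lam_tilde = lam_hat[j] - (lam_hat.sum() - lam_hat[: j + 1].sum()) / (n - 2 - j)
    # (18): tilde{h}_j = {(n-1) tilde{lambda}_j}^{-1/2} (X - bar{X}) hat{u}_j
    h_tilde = Xc @ U[:, j] / np.sqrt((n - 1) * lam_tilde)
    # (19): tilde{x}_{jl} = x_l^T tilde{h}_{jl}, with the modified unit vector hat{u}_{jl}
    x_tilde = np.empty(n)
    for l in range(n):
        u_jl = U[:, j].copy()
        u_jl[l] = -U[l, j] / (n - 1)       # l-th entry replaced by -hat{u}_{jl}/(n - 1)
        h_jl = np.sqrt(n - 1) * (Xc @ u_jl) / ((n - 2) * np.sqrt(lam_tilde))
        x_tilde[l] = X[:, l] @ h_jl
    return lam_tilde, h_tilde, x_tilde

lam_tilde_1, h_tilde_1, x_tilde_1 = nr_scores(X, lam_hat, U, 0)   # first component
```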

From Lemma B.1 in Aoshima and Yata (2018a), we have the following result.

Proposition 5

Assume (A-i) and (A-ii). It holds for \(j=1,\ldots ,k\) that, as \(m \rightarrow \infty\),

$$\begin{aligned} \sum _{l=1}^{n}\frac{({\tilde{x}}_{jl}-{x}_{jl})^2}{n \lambda _j}=O_P\Big ( \frac{\lambda _1^2/n+{\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}}{\lambda _j^2}\Big ). \end{aligned}$$

Note that \(\text{ Var }({x}_{jl})=\lambda _j\). If \(\lambda _1^2/(n\lambda _j^2)=o(1)\), the normalized mean squared error in Proposition 5 tends to 0 under \(H_0\) in (1). See Sect. 5.1 in Aoshima and Yata (2018a) for the details.

Estimation of \(K_{1*}\)

We use the CDM method given by Yata and Aoshima (2010) to estimate \(K_{1*}\). Let \(n_{(1)}=\lceil n/2 \rceil\) and \(n_{(2)}=n-n_{(1)}\). Let \({\varvec{X}}_{1}=[{\varvec{x}}_{1},\ldots ,{\varvec{x}}_{n_{(1)}}]\) and \({\varvec{X}}_{2}=[{\varvec{x}}_{n_{(1)}+1},\ldots ,{\varvec{x}}_{n}]\). We define

$$\begin{aligned} {\varvec{S}}_{D(1)}=\{(n_{(1)}-1)(n_{(2)}-1)\}^{-1/2}({\varvec{X}}_{1} -\overline{{\varvec{X}}}_{1})^T({\varvec{X}}_{2}-\overline{{\varvec{X}}}_{2}), \end{aligned}$$

where \(\overline{{\varvec{X}}}_{i}=[{\overline{{\varvec{x}}}}_{n(i)},\ldots ,{\overline{{\varvec{x}}}}_{n(i)}]\) with \({\overline{{\varvec{x}}}}_{n(1)}=\sum _{l=1}^{n_{(1)}}{\varvec{x}}_{l}/n_{(1)}\) and \({\overline{{\varvec{x}}}}_{n(2)}=\sum _{l=n_{(1)}+1}^{n}{\varvec{x}}_{l}/n_{(2)}\). We estimate \(\lambda _{j}\) by the j-th singular value, \(\acute{\lambda }_{j}\), of \({\varvec{S}}_{D(1)}\), where

$$\begin{aligned} \acute{\lambda }_{1}\ge \cdots \ge \acute{\lambda }_{n_{(2)}-1}\ge 0. \end{aligned}$$

Yata and Aoshima (2010) showed that \(\acute{\lambda }_{j}\) has several consistency properties for high-dimensional non-Gaussian data. Note that \(E\{\text{ tr }({\varvec{S}}_{D(1)}{\varvec{S}}_{D(1)}^T)\}=\text{ tr }\left( {\varvec{\varSigma }}^2 \right)\). We estimate \({\varPsi }_{r}\) by \(\widehat{\varPsi }_{1}=\text{ tr }({\varvec{S}}_{D(1)}{\varvec{S}}_{D(1)}^T)\) and

$$\begin{aligned} \widehat{\varPsi }_{r}=\text{ tr }({\varvec{S}}_{D(1)}{\varvec{S}}_{D(1)}^T) -\sum _{s=1}^{r-1}\acute{\lambda }_{s}^2\ \ \hbox {for }r=2,\ldots ,n_{(2)}-1. \end{aligned}$$
(20)

Note that \(P(\widehat{\varPsi }_{r}\ge 0)=1\) for \(r=1,\ldots ,n_{(2)}-1\). Then, Aoshima and Yata (2018a) gave the following result.

Lemma 1

(Aoshima and Yata 2018a) Assume (A-i) and (A-ii). Then, it holds that \(\widehat{\varPsi }_{r}/{{\varPsi }_{r}}=1+o_P(1)\) as \(m\rightarrow \infty\) for \(r=1,\ldots ,k+1\).

Thus, we estimate \(\text{ tr }({\varvec{\varSigma }}_*^2)\) by \(\widehat{\varPsi }_{k+1}\). Let

$$\begin{aligned} \widehat{K}_{1*}=2\widehat{\varPsi }_{k+1}/\{n(n-1) \}. \end{aligned}$$

Then, from Lemma 1, under (A-i) and (A-ii), it holds that

$$\begin{aligned} \widehat{K}_{1*}/{K}_{1*}=1+o_P(1) \ \hbox { as }m\rightarrow \infty . \end{aligned}$$
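
As a numerical illustration of the CDM quantities and of \(\widehat{K}_{1*}\), the following is a minimal sketch continuing the NumPy sketches above; the function name cdm_psi_K1star, the value of k passed in, and the sample split are illustrative assumptions.

```python
# A minimal sketch of S_{D(1)}, its singular values acute{lambda}_j, hat{Psi}_r in (20)
# and hat{K}_{1*}. The function name and the argument k are illustrative assumptions.
import numpy as np

def cdm_psi_K1star(X, k):
    """Return (hat{Psi}_1, hat{Psi}_2, ...) and hat{K}_{1*} = 2 hat{Psi}_{k+1}/{n(n-1)}."""
    n = X.shape[1]
    n1 = int(np.ceil(n / 2)); n2 = n - n1                    # n_(1) and n_(2)
    X1c = X[:, :n1] - X[:, :n1].mean(axis=1, keepdims=True)
    X2c = X[:, n1:] - X[:, n1:].mean(axis=1, keepdims=True)
    # S_{D(1)} = {(n_(1)-1)(n_(2)-1)}^{-1/2} (X_1 - bar{X}_1)^T (X_2 - bar{X}_2)
    S_D1 = X1c.T @ X2c / np.sqrt((n1 - 1) * (n2 - 1))
    lam_acute = np.linalg.svd(S_D1, compute_uv=False)        # acute{lambda}_1 >= ... >= 0
    # (20): hat{Psi}_r = tr(S_{D(1)} S_{D(1)}^T) - sum_{s=1}^{r-1} acute{lambda}_s^2
    tr_SS = np.sum(S_D1 ** 2)                                # = tr(S_{D(1)} S_{D(1)}^T)
    Psi_hat = tr_SS - np.concatenate(([0.0], np.cumsum(lam_acute ** 2)))
    K1star_hat = 2.0 * Psi_hat[k] / (n * (n - 1))            # Psi_hat[k] is hat{Psi}_{k+1}
    return Psi_hat, K1star_hat

Psi_hat, K1star_hat = cdm_psi_K1star(X, k=1)                 # illustrative choice k = 1
```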

Estimation of k

Recently, Jung et al. (2018) proposed a test of the number of spiked components for high-dimensional data. On the other hand, Aoshima and Yata (2018a) gave an estimator of k in (A-ii) by using the CDM method. Let \(\hat{\tau }_{r}=\widehat{\varPsi }_{r+1}/\widehat{\varPsi }_{r}\ (=1-\acute{\lambda }_{r}^2/\widehat{\varPsi }_{r})\) for all r, where \(\widehat{\varPsi }_{r}\) is given by (20). Note that \(\hat{\tau }_{r}\in [0,1)\) for \(\acute{\lambda }_{r}>0\). Then, Aoshima and Yata (2018a) gave the following results.

Proposition 6

(Aoshima and Yata 2018a) Assume (A-i) and (A-ii). It holds that, as \(m\rightarrow \infty\),

$$\begin{aligned}&P(\hat{\tau }_{r}<1-c_r)\rightarrow 1 \ \ \hbox {with some fixed constant }c_r\in (0,1)\hbox { for }r=1,\ldots ,k;\\&\hat{\tau }_{k+1}=1+o_P(1). \end{aligned}$$

Proposition 7

(Aoshima and Yata 2018a) Assume (A-i), (A-ii) and (A-v). Assume also \(\lambda _{k+1}^2/\varPsi _{k+1}=O(n^{-c})\) as \(m\rightarrow \infty\) with some fixed constant \(c>1/2\). It holds that, as \(m\rightarrow \infty\),

$$\begin{aligned} P\Big (\hat{\tau }_{k+1}> \{1+(k+1)\gamma (n)\}^{-1} \Big ) \rightarrow 1, \end{aligned}$$

where \(\gamma (n)\) is a function such that \(\gamma (n)\rightarrow 0\) and \(n^{1/2}\gamma (n)\rightarrow \infty\) as \(n\rightarrow \infty\).

From Propositions 6 and 7, if one can assume the conditions in Proposition 7, one may consider k as the first integer \(r\ (={\hat{k}}_{o},\ \hbox {say})\) such that

$$\begin{aligned} \hat{\tau }_{r+1}\{1+(r+1)\gamma (n)\}>1 \quad (r \ge 0). \end{aligned}$$
(21)

Then, it holds that \(P({\hat{k}}_{o}=k)\rightarrow 1\) as \(m\rightarrow \infty\). Note that \(\widehat{\varPsi }_{n_{(2)}}=0\) from the fact that rank\(({\varvec{S}}_{D(1)})\le n_{(2)}-1\). Finally, one may choose k as

$$\begin{aligned} {\hat{k}}=\min \{{\hat{k}}_{o},n_{(2)}-2\} \end{aligned}$$

in actual data analysis. According to Aoshima and Yata (2018a), we use \(\gamma (n)=(n^{-1} \log {n})^{1/2}\) in (21). If \({\hat{k}}=0\), the data are regarded as following the NSSE model.
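
As a numerical illustration of the rule (21) with \(\gamma (n)=(n^{-1} \log {n})^{1/2}\), a minimal sketch continuing the NumPy sketches above is given below; the function name estimate_k and the fallback value returned when no r satisfies (21) are illustrative assumptions.

```python
# A minimal sketch of hat{k} = min{hat{k}_o, n_(2) - 2} based on (21), using the
# hat{Psi}_r's from the CDM sketch above. The function name and the fallback value
# when no r satisfies (21) are illustrative assumptions.
import numpy as np

def estimate_k(Psi_hat, n, n2):
    """hat{k}_o: first r >= 0 with hat{tau}_{r+1}{1 + (r+1) gamma(n)} > 1; cap at n_(2) - 2."""
    gamma_n = np.sqrt(np.log(n) / n)                 # gamma(n) = (n^{-1} log n)^{1/2}
    k_o = n2 - 2                                     # fallback if (21) never holds (illustrative)
    for r in range(n2 - 1):
        tau = Psi_hat[r + 1] / Psi_hat[r]            # hat{tau}_{r+1} = hat{Psi}_{r+2}/hat{Psi}_{r+1}
        if tau * (1 + (r + 1) * gamma_n) > 1:
            k_o = r
            break
    return min(k_o, n2 - 2)

n1 = int(np.ceil(X.shape[1] / 2)); n2 = X.shape[1] - n1
k_hat = estimate_k(Psi_hat, X.shape[1], n2)
```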

Appendix 2

Let \(\psi _{r}=\lambda _1^2/(n^2\lambda _{r})+{\varvec{\mu }}^T{\varvec{\varSigma }}{\varvec{\mu }}/(n\lambda _{r})\) for \(r=1,\ldots ,k\).

Proofs of Propositions 1 and 2

From the fact that \(\lambda _{k+1}^2\le \text{ tr }({\varvec{\varSigma }}_*^2)\), we note that

$$\begin{aligned} K_{2*}&\le 4\varDelta _* \lambda _{k+1}/n =O( \varDelta _* K_{1*}^{1/2}). \end{aligned}$$
(22)

Then, under (A-iv), it holds that as \(m\rightarrow \infty\)

$$\begin{aligned} \text{ Var }(T_{\mathrm{DT}})/\varDelta _*^2=K_*/\varDelta _*^2\rightarrow 0. \end{aligned}$$
(23)

Thus, we can conclude the result of Proposition 2. On the other hand, from (22), under (A-ii) and (A-iii), it holds that \(K_{2*}=o(K_{1*})\). Then, from Theorem 5 in Aoshima and Yata (2015), we can conclude the result of Proposition 1.

Proof of Theorem 1

From (S6.28) in Appendix B of Aoshima and Yata (2018a), we have that for \(r=1,\ldots ,k\),

$$\begin{aligned} \sum _{l<l'}^{n} \frac{ {\tilde{x}}_{rl} {\tilde{x}}_{rl'}-{x}_{rl} {x}_{rl'}}{n(n-1)} =O_P\Big \{\psi _{r}^{1/2}(\psi _{r}^{1/2}+\lambda _{r}^{1/2} /n^{1/2}+{\varvec{h}}_r^T{\varvec{\mu }})\Big \} \end{aligned}$$
(24)

as \(m\rightarrow \infty\) under (A-i) and (A-ii). Here, by noting that \(\text{ tr }({\varvec{\varSigma }}_*^2)/\lambda _k^2=O(1)\) and \({\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}\le \varDelta \lambda _1\), we have that for \(r=1,\ldots ,k\)

$$\begin{aligned} \lambda _1^2/(n^2 \lambda _r)&=o(K_{1*}^{1/2});\quad {\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}/(n \lambda _r)=O\{\varDelta _* \lambda _1/(n \lambda _r)\}=o(\varDelta _*); \\ \lambda _1^2/n^3&=o(K_{1*}); \ \hbox { and } \ {\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}/n^2=O(\varDelta _* \lambda _1/n^2 )=o(\varDelta _* K_{1*}^{1/2}) \end{aligned}$$

under (A-ii), (A-v) and (A-vi), so that for \(r=1,\ldots ,k\)

$$\begin{aligned} \psi _{r}=o\big (\max \{K_{1*}^{1/2}, \varDelta _* \}\big ) \ \hbox { and } \ \psi _{r}^{1/2}\lambda _{r}^{1/2}/n^{1/2}=o\big (\max \{K_{1*}^{1/2}, \varDelta _* \}\big ). \end{aligned}$$

Also, note that \(|{\varvec{h}}_r^T{\varvec{\mu }}|\le \varDelta ^{1/2}=O(\varDelta _{*}^{1/2})\) under (A-vi), so that for \(r=1,\ldots ,k\)

$$\begin{aligned} \psi _{r}^{1/2} {\varvec{h}}_r^T{\varvec{\mu }}=o\big (\max \{K_{1*}^{1/2}, \varDelta _* \}\big ). \end{aligned}$$

Thus, from (24) we can conclude the first result of Theorem 1. In addition, from the first result, it holds that

$$\begin{aligned} {\widehat{T}}_{\mathrm{DT}}=T_{\mathrm{DT}}+o_P( K_{1*}^{1/2} ) \end{aligned}$$

under (A-i) to (A-iii), (A-v) and (A-vi). Thus, from Proposition 1, we can conclude the second result of Theorem 1.

Proofs of Corollaries 1 and 2

From the first result of Theorem 1, it holds that

$$\begin{aligned} {\widehat{T}}_{\mathrm{DT}}/\varDelta _*={T}_{\mathrm{DT}}/\varDelta _*+o_P(1) \end{aligned}$$

under (A-i), (A-ii) and (A-iv) to (A-vi). Thus, from Proposition 2, we can conclude the result of Corollary 1. In addition, from Corollary 1 and Lemma 1, it holds that

$$\begin{aligned} P\left( {\widehat{T}}_{\mathrm{DT}}/{\widehat{K}_{1*}^{1/2}}>z_{\alpha }\right) =P\left( {\widehat{T}}_{\mathrm{DT}}/{\varDelta _*}>z_{\alpha }{\widehat{K}_{1*}^{1/2}}/{\varDelta _*}\right) =P\{1+o_P(1)>o_P(1)\} \rightarrow 1 \end{aligned}$$

under (A-i), (A-ii) and (A-iv) to (A-vi). This concludes the result of Corollary 2.

Proof of Theorem 2

First, we consider the case when (A-iii) is met. We note that \(K_*/K_{1*}\rightarrow 1\) as \(m\rightarrow \infty\) under (A-ii) and (A-iii). Then, from Theorem 1 and Lemma 1, under (A-i) to (A-iii), (A-v) and (A-vi), we have that

$$\begin{aligned}&P\left( {\widehat{T}}_{\mathrm{DT}}/{\widehat{K}_{1*}^{1/2}}>z_{\alpha }\right) \nonumber \\&\quad =P\{ \left( {\widehat{T}}_{\mathrm{DT}}-\varDelta _*\right) /K_{*}^{1/2}> \left( z_{\alpha }{K}_{1*}^{1/2}-\varDelta _*\right) /K_*^{1/2}+o_P(1)\}\nonumber \\&\quad =\varPhi \{ \left( \varDelta _*-z_{\alpha }K_{1*}^{1/2}\right) /K_*^{1/2}\}+o(1) =\varPhi \left( {\varDelta }_*/{K_{1*}^{1/2}}-z_{\alpha }\right) +o(1). \end{aligned}$$
(25)

This concludes the size result in Theorem 2. On the other hand, under (A-iv), from (23), it holds that

$$\begin{aligned} \varPhi \{ \left( \varDelta _*-z_{\alpha }K_{1*}^{1/2}\right) /K_*^{1/2}\}=1+o(1). \end{aligned}$$

Hence, from (25) and Corollary 2, by considering a convergent subsequence of \(\varDelta _*^2/K_{1*}\), we can conclude the power result in Theorem 2.

Proofs of Propositions 3 and 4

By noting that \(E({\varvec{x}}_l-{\varvec{\mu }})={\varvec{0}}\) (\(l=1,\ldots ,n\)) and \(\text{ Var }(T_{{\varvec{\mu }}})=K_{1*}\), the results are obtained straightforwardly from the results of Proposition 1 and Theorem 1.

Proof of Theorem 3

By noting that \({\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}\le \varDelta \lambda _1\), we have that for \(r=1,\ldots ,k\)

$$\begin{aligned} \lambda _1^2/(n^2 \lambda _r)&=O(\lambda _1/n^{3/2})=o(\varDelta );\quad {\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}/(n \lambda _r)=O\{\varDelta \lambda _1/(n \lambda _r)\}=o(\varDelta ); \\ \lambda _1^2/n^3&=o(\varDelta ^2 ); \ \hbox { and } \ {\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}/n^2=O(\varDelta _* \lambda _1/n^2 )=o(\varDelta ^2) \end{aligned}$$

as \(m\rightarrow \infty\) under (A-vii). Thus, from (24) it holds that

$$\begin{aligned} {\widehat{T}}_{\mathrm{DT}}/\varDelta =T_{\mathrm{DT}}/\varDelta +o_P(1) \end{aligned}$$

under (A-i), (A-ii) and (A-vii). Then, from Proposition 2, we can conclude the result of Theorem 3.

Proof of Proposition 5

From Lemma B.1 in Appendix B of Aoshima and Yata (2018a), under (A-i) and (A-ii), it holds for \(j=1,\ldots ,k\) that as \(m \rightarrow \infty\)

$$\begin{aligned} \sum _{l=1}^{n}\frac{({\tilde{x}}_{jl}-{x}_{jl})^2}{n}=O_P(n\psi _{j}). \end{aligned}$$

Thus, we can conclude the result of Proposition 5.

Cite this article

Ishii, A., Yata, K. & Aoshima, M. Inference on high-dimensional mean vectors under the strongly spiked eigenvalue model. Jpn J Stat Data Sci 2, 105–128 (2019). https://doi.org/10.1007/s42081-018-0029-z

Keywords

  • Asymptotic normality
  • Data transformation
  • Eigenstructure estimation
  • Large p small n
  • Noise reduction methodology
  • Spiked model