Abstract
In this paper, we discuss inference problems on high-dimensional mean vectors under the strongly spiked eigenvalue (SSE) model. First, we consider a one-sample test. In order to avoid the huge noise, we derive a new test statistic by using a data transformation technique. We show that asymptotic normality can be established for the new test statistic. We derive the asymptotic size and power of the new test procedure and investigate its performance theoretically and numerically. We apply the findings to the construction of confidence regions for the mean vector under the SSE model. We further discuss multi-sample problems under the SSE model. Finally, we demonstrate the new test procedure by using actual microarray data sets.
References
Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis (3rd ed.). New York: Wiley.
Aoshima, M., & Yata, K. (2011). Two-stage procedures for high-dimensional data. Sequential Analysis, 30, 356–399. (Editor’s special invited paper).
Aoshima, M., & Yata, K. (2014). A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data. Annals of the Institute of Statistical Mathematics, 66, 983–1010.
Aoshima, M., & Yata, K. (2015). Asymptotic normality for inference on multisample, high-dimensional mean vectors under mild conditions. Methodology and Computing in Applied Probability, 17, 419–439.
Aoshima, M., & Yata, K. (2018a). Two-sample tests for high-dimension, strongly spiked eigenvalue models. Statistica Sinica, 28, 43–62.
Aoshima, M., & Yata, K. (2018b). Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models. Annals of the Institute of Statistical Mathematics, in press (https://doi.org/10.1007/s10463-018-0655-z).
Bai, Z., & Saranadasa, H. (1996). Effect of high dimension: By an example of a two sample problem. Statistica Sinica, 6, 311–329.
Bennett, B. M. (1951). Note on a solution of the generalized Behrens–Fisher problem. Annals of the Institute of Statistical Mathematics, 2, 87–90.
Chen, S. X., & Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38, 808–835.
Dempster, A. P. (1958). A high dimensional two sample significance test. The Annals of Mathematical Statistics, 29, 995–1010.
Dempster, A. P. (1960). A significance test for the separation of two highly multivariate small samples. Biometrics, 16, 41–50.
Ishii, A., Yata, K., & Aoshima, M. (2016). Asymptotic properties of the first principal component and equality tests of covariance matrices in high-dimension, low-sample-size context. Journal of Statistical Planning and Inference, 170, 186–199.
Jung, S., Lee, M. H., & Ahn, J. (2018). On the number of principal components in high dimensions. Biometrika, 105, 389–402.
Jung, S., & Marron, J. S. (2009). PCA consistency in high dimension, low sample size context. The Annals of Statistics, 37, 4104–4130.
Katayama, S., Kano, Y., & Srivastava, M. S. (2013). Asymptotic distributions of some test criteria for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis, 116, 410–421.
Nishiyama, T., Hyodo, M., Seo, T., & Pavlenko, T. (2013). Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices. Journal of Statistical Planning and Inference, 143, 1898–1911.
Notterman, D. A., Alon, U., Sierk, A. J., & Levine, A. J. (2001). Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Research, 61, 3124–3130.
Nutt, C. L., Mani, D. R., Betensky, R. A., Tamayo, P., Cairncross, J. G., Ladd, C., et al. (2003). Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research, 63, 1602–1607.
Shen, D., Shen, H., Zhu, H., & Marron, J. S. (2016). The statistics and mathematics of high dimension low sample size asymptotics. Statistica Sinica, 26, 1747–1770.
Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., et al. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203–209.
Srivastava, M. S. (2007). Multivariate theory for analyzing high dimensional data. Journal of the Japan Statistical Society, 37, 53–86.
Srivastava, M. S., & Du, M. (2008). A test for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis, 99, 386–402.
Yata, K., & Aoshima, M. (2010). Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix. Journal of Multivariate Analysis, 101, 2060–2077.
Yata, K., & Aoshima, M. (2012). Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. Journal of Multivariate Analysis, 105, 193–215.
Yata, K., & Aoshima, M. (2015). Principal component analysis based clustering for high-dimension, low-sample-size data. arXiv preprint, arXiv:1503.04525.
Acknowledgements
We would like to thank an associate editor and two anonymous referees for their constructive comments.
Additional information
Research of the first author was partially supported by Grant-in-Aid for Young Scientists, Japan Society for the Promotion of Science (JSPS), under Contract Number 18K18015. Research of the second author was partially supported by Grant-in-Aid for Scientific Research (C), JSPS, under Contract Number 18K03409. Research of the third author was partially supported by Grants-in-Aid for Scientific Research (A) and Challenging Research (Exploratory), JSPS, under Contract Numbers 15H01678 and 17K19956.
Appendices
Appendix 1
In this section, we give estimators for the parameters in our new test statistic, \({\widehat{T}}_{\mathrm{DT}}\), and discuss their asymptotic properties.
Estimation of \({x}_{jl}\)
Let \({\varvec{X}}=[{\varvec{x}}_{1},\ldots ,{\varvec{x}}_{n}]\), \(\overline{{\varvec{X}}}=[{\overline{{\varvec{x}}}},\ldots ,{\overline{{\varvec{x}}}}]\) and \({\varvec{P}}_{n}={\varvec{I}}_{n}-{\varvec{1}}_{n}{\varvec{1}}_{n}^T/n\), where \({\varvec{1}}_{n}=(1,\ldots ,1)^T\). Recall that \({\varvec{S}}\) is the sample covariance matrix. One can write \({\varvec{S}}=({\varvec{X}}-\overline{{\varvec{X}}})({\varvec{X}}-\overline{{\varvec{X}}})^T/(n-1)={\varvec{X}}{\varvec{P}}_{n}{\varvec{X}}^T/(n-1)\). Let us write the eigen-decomposition of \({\varvec{S}}\) as \({\varvec{S}}=\sum _{j=1}^{p}\hat{\lambda }_{j}\hat{{\varvec{h}}}_{j}\hat{{\varvec{h}}}_{j}^T\) with eigenvalues \(\hat{\lambda }_{1}\ge \cdots \ge \hat{\lambda }_{p}\ge 0\) and the corresponding p-dimensional unit eigenvectors \(\hat{{\varvec{h}}}_{1},\ldots ,\hat{{\varvec{h}}}_{p}\). We assume \(P({\varvec{h}}_{j}^T\hat{{\varvec{h}}}_{j} \ge 0)=1\) for all j without loss of generality. We also define the following \(n \times n\) dual sample covariance matrix:
\({\varvec{S}}_{D}=({\varvec{X}}-\overline{{\varvec{X}}})^T({\varvec{X}}-\overline{{\varvec{X}}})/(n-1)={\varvec{P}}_{n}{\varvec{X}}^T{\varvec{X}}{\varvec{P}}_{n}/(n-1).\)
Note that \({\varvec{S}}\) and \({\varvec{S}}_{D}\) share non-zero eigenvalues. Let us write the eigen-decomposition of \({\varvec{S}}_{D}\) as \({\varvec{S}}_{D}=\sum _{j=1}^{n-1}\hat{\lambda }_{j}\hat{{\varvec{u}}}_{j}\hat{{\varvec{u}}}_{j}^T\), where \(\hat{{\varvec{u}}}_{j}=(\hat{u}_{j1},\ldots ,\hat{u}_{jn})^T\) denotes an n-dimensional unit eigenvector corresponding to \(\hat{\lambda }_{j}\). In high-dimensional settings, we calculate \(\hat{{\varvec{h}}}_{j}\) by using \(\hat{{\varvec{u}}}_{j}\) as follows:
\(\hat{{\varvec{h}}}_{j}=\{(n-1)\hat{\lambda }_{j}\}^{-1/2}({\varvec{X}}-\overline{{\varvec{X}}})\hat{{\varvec{u}}}_{j}.\)
Note that \({\varvec{1}}_{n}^T {\varvec{S}}_D{\varvec{1}}_{n}=0\), so that \({\varvec{1}}_{n}^T\hat{{\varvec{u}}}_{j}=\sum _{l=1}^n\hat{u}_{jl}=0\) when \(\hat{\lambda }_{j}>0\).
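To fix ideas, here is a minimal NumPy sketch of the duality just described, using simulated Gaussian toy data (all variable names are ours and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 1000, 20                       # high dimension, small sample size
X = rng.standard_normal((p, n))       # columns are observations x_1, ..., x_n

# Centering: right-multiplying by P_n subtracts the sample mean from each column
P_n = np.eye(n) - np.ones((n, n)) / n
Xc = X @ P_n

# n x n dual sample covariance matrix; it shares its non-zero
# eigenvalues with S = Xc Xc^T / (n - 1)
S_D = Xc.T @ Xc / (n - 1)
lam, U = np.linalg.eigh(S_D)
lam, U = lam[::-1], U[:, ::-1]        # eigenpairs in decreasing order

# Recover the unit eigenvector of S from the dual eigenvector:
# h_j = {(n - 1) lambda_j}^{-1/2} Xc u_j
j = 0
h_j = Xc @ U[:, j] / np.sqrt((n - 1) * lam[j])
print(np.linalg.norm(h_j))            # approximately 1
print(U[:, j].sum())                  # approximately 0 when lam[j] > 0
```

The last two lines numerically confirm that \(\hat{{\varvec{h}}}_{j}\) is a unit vector and that \({\varvec{1}}_{n}^T\hat{{\varvec{u}}}_{j}=0\).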
For high-dimensional data, the sample eigenvalues and eigenvectors suffer from huge noise. See Jung and Marron (2009) and Shen et al. (2016) for the details. In order to remove the huge noise, Yata and Aoshima (2012) focused on a geometric representation of \({\varvec{S}}_{D}\) and proposed the noise-reduction (NR) method. If one applies the NR method, the \(\lambda _{j}\)s and \({\varvec{h}}_j\)s are estimated by
\({\tilde{\lambda }}_{j}=\hat{\lambda }_{j}-\frac{\text{ tr }({\varvec{S}}_{D})-\sum _{l=1}^{j}\hat{\lambda }_{l}}{n-1-j}\ \ (j=1,\ldots ,n-2)\quad \text{and}\quad {\tilde{{\varvec{h}}}}_{j}=\{(n-1){\tilde{\lambda }}_{j}\}^{-1/2}({\varvec{X}}-\overline{{\varvec{X}}})\hat{{\varvec{u}}}_{j}.\)
Note that \(P({\tilde{\lambda }}_j \ge 0)=1\) for \(j=1,\ldots ,n-2\). We emphasize that \({\tilde{\lambda }}_{j}\)s and \({\tilde{{\varvec{h}}}}_j\)s have consistency properties under much milder conditions than \(\hat{\lambda }_{j}\)s and \(\hat{{\varvec{h}}}_j\)s. However, for the estimation of \(x_{jl}={\varvec{x}}_{l}^T{{\varvec{h}}}_{j}\), Aoshima and Yata (2018a) showed that \({\varvec{x}}_{l}^T{\tilde{{\varvec{h}}}}_{j}\) involves a huge bias and gave a modification for all j, l by
where
Note that \(\sum _{l=1}^{n}\hat{{\varvec{u}}}_{jl}/n=\{(n-2)/(n-1)\}\hat{{\varvec{u}}}_{j}\) and \(\sum _{l=1}^{n}{\tilde{{\varvec{h}}}}_{jl}/n={\tilde{{\varvec{h}}}}_{j}\). Then, we estimate \(x_{jl}\) by
From Lemma B.1 in Aoshima and Yata (2018a), we have the following result.
Proposition 5
Assume (A-i) and (A-ii). It holds for \(j=1,\ldots ,k\) that as \(m \rightarrow \infty\)
Note that \(\text{ Var }({x}_{jl})=\lambda _j\). If \(\lambda _1^2/(n\lambda _j^2)=o(1)\), the normalized mean squared error in Proposition 5 tends to 0 under \(H_0\) in (1). See Sect. 5.1 in Aoshima and Yata (2018a) for the details.
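As a complement to the NR formulas above, the following sketch computes \({\tilde{\lambda }}_{j}\) and \({\tilde{{\varvec{h}}}}_{j}\); the function name and 0-based indexing are ours, and the bias-corrected scores \({\tilde{x}}_{jl}\) are omitted since their exact form is given in Aoshima and Yata (2018a):

```python
import numpy as np

def nr_estimates(X):
    """Sketch of the noise-reduction (NR) estimates of Yata and Aoshima (2012):
    returns (tilde_lam, tilde_H) with tilde_lam[j] estimating lambda_{j+1}
    for 0-based j = 0, ..., n - 3."""
    p, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)        # column-wise centering
    S_D = Xc.T @ Xc / (n - 1)                     # dual sample covariance matrix
    lam, U = np.linalg.eigh(S_D)
    lam, U = lam[::-1], U[:, ::-1]                # eigenpairs in decreasing order
    # NR correction: subtract the average of the remaining (noise) eigenvalues,
    # tilde_lam_j = hat_lam_j - {tr(S_D) - sum_{l<=j} hat_lam_l} / (n - 1 - j)
    tr = np.trace(S_D)
    tilde_lam = np.array([lam[j] - (tr - lam[:j + 1].sum()) / (n - 2 - j)
                          for j in range(n - 2)])
    # tilde_h_j = {(n - 1) tilde_lam_j}^{-1/2} Xc u_j (assumes tilde_lam_j > 0)
    tilde_H = Xc @ U[:, :n - 2] / np.sqrt((n - 1) * tilde_lam)
    return tilde_lam, tilde_H
```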
Estimation of \(K_{1*}\)
We use the CDM method given by Yata and Aoshima (2010) to estimate \(K_{1*}\). Let \(n_{(1)}=\lceil n/2 \rceil\) and \(n_{(2)}=n-n_{(1)}\). Let \({\varvec{X}}_{1}=[{\varvec{x}}_{1},\ldots ,{\varvec{x}}_{n_{(1)}}]\) and \({\varvec{X}}_{2}=[{\varvec{x}}_{n_{(1)}+1},\ldots ,{\varvec{x}}_{n}]\). We define the cross data matrix
\({\varvec{S}}_{D(1)}=\{(n_{(1)}-1)(n_{(2)}-1)\}^{-1/2}({\varvec{X}}_{1}-\overline{{\varvec{X}}}_{1})^T({\varvec{X}}_{2}-\overline{{\varvec{X}}}_{2}),\)
where \(\overline{{\varvec{X}}}_{i}=[{\overline{{\varvec{x}}}}_{n(i)},\ldots ,{\overline{{\varvec{x}}}}_{n(i)}]\) with \({\overline{{\varvec{x}}}}_{n(1)}=\sum _{l=1}^{n_{(1)}}{\varvec{x}}_{l}/n_{(1)}\) and \({\overline{{\varvec{x}}}}_{n(2)}=\sum _{l=n_{(1)}+1}^{n}{\varvec{x}}_{l}/n_{(2)}\). We estimate \(\lambda _{j}\) by the j-th singular value, \(\acute{\lambda }_{j}\), of \({\varvec{S}}_{D(1)}\), where \(\acute{\lambda }_{1}\ge \cdots \ge \acute{\lambda }_{n_{(2)}-1}\ge 0\) denote the singular values of \({\varvec{S}}_{D(1)}\).
Yata and Aoshima (2010) showed that \(\acute{\lambda }_{j}\) has several consistency properties for high-dimensional non-Gaussian data. Note that \(E\{\text{ tr }({\varvec{S}}_{D(1)}{\varvec{S}}_{D(1)}^T)\}=\text{ tr }\left( {\varvec{\varSigma }}^2 \right)\). We estimate \({\varPsi }_{r}\) by \(\widehat{\varPsi }_{1}=\text{ tr }({\varvec{S}}_{D(1)}{\varvec{S}}_{D(1)}^T)\) and
\(\widehat{\varPsi }_{r}=\text{ tr }({\varvec{S}}_{D(1)}{\varvec{S}}_{D(1)}^T)-\sum _{l=1}^{r-1}\acute{\lambda }_{l}^2\ \ \text{for}\ r\ge 2.\)
Note that \(P(\widehat{\varPsi }_{r}\ge 0)=1\) for \(r=1,\ldots ,n_{(2)}-1\). Then, Aoshima and Yata (2018a) gave the following result.
Lemma 1
(Aoshima and Yata 2018a) Assume (A-i) and (A-ii). Then, it holds that \(\widehat{\varPsi }_{r}/{{\varPsi }_{r}}=1+o_P(1)\) as \(m\rightarrow \infty\) for \(r=1,\ldots ,k+1\).
Thus, we estimate \(\text{ tr }({\varvec{\varSigma }}_*^2)\) by \(\widehat{\varPsi }_{k+1}\). Let
Then, from Lemma 1, under (A-i) and (A-ii), it holds that
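For concreteness, here is a minimal sketch of the CDM computation of \(\widehat{\varPsi }_{r}\) under the definitions above (the function name and the r_max convention are ours):

```python
import numpy as np

def cdm_psi(X, r_max):
    """Cross-data-matrix (CDM) sketch: psi[r - 1] is Psi_hat_r for 1-based r."""
    p, n = X.shape
    n1 = -(-n // 2)                                  # n_(1) = ceil(n / 2)
    X1c = X[:, :n1] - X[:, :n1].mean(axis=1, keepdims=True)
    X2c = X[:, n1:] - X[:, n1:].mean(axis=1, keepdims=True)
    # S_D(1) = {(n_(1)-1)(n_(2)-1)}^{-1/2} (X_1 - mean)^T (X_2 - mean)
    S_D1 = X1c.T @ X2c / np.sqrt((n1 - 1) * (n - n1 - 1))
    sv = np.linalg.svd(S_D1, compute_uv=False)       # acute lambda_1 >= ... >= 0
    total = (sv ** 2).sum()                          # tr(S_D(1) S_D(1)^T)
    # Psi_hat_1 = total;  Psi_hat_r = total - sum_{l < r} acute_lambda_l^2
    return np.array([total - (sv[:r] ** 2).sum() for r in range(r_max)])
```

Under Lemma 1, \(\widehat{\varPsi }_{k+1}\) then serves as a ratio-consistent estimate of \(\text{ tr }({\varvec{\varSigma }}_*^2)\).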
Estimation of k
Recently, Jung et al. (2018) proposed a test of the number of spiked components for high-dimensional data. On the other hand, Aoshima and Yata (2018a) gave an estimator of k in (A-ii) by using the CDM method. Let \(\hat{\tau }_{r}=\widehat{\varPsi }_{r+1}/\widehat{\varPsi }_{r}\ (=1-\acute{\lambda }_{r}^2/\widehat{\varPsi }_{r})\) for all r, where \(\widehat{\varPsi }_{r}\) is given by (20). Note that \(\hat{\tau }_{r}\in [0,1)\) when \(\acute{\lambda }_{r}>0\). Then, Aoshima and Yata (2018a) gave the following results.
Proposition 6
(Aoshima and Yata 2018a) Assume (A-i) and (A-ii). It holds that as \(m\rightarrow \infty\)
Proposition 7
(Aoshima and Yata 2018a) Assume (A-i), (A-ii) and (A-v). Assume also \(\lambda _{k+1}^2/\varPsi _{k+1}=O(n^{-c})\) as \(m\rightarrow \infty\) with some fixed constant \(c>1/2\). It holds that as \(m\rightarrow \infty\)
where \(\gamma (n)\) is a function such that \(\gamma (n)\rightarrow 0\) and \(n^{1/2}\gamma (n)\rightarrow \infty\) as \(n\rightarrow \infty\).
From Propositions 6 and 7, if one can assume the conditions in Proposition 7, one may consider k as the first integer \(r\ (={\hat{k}}_{o},\ \hbox {say})\) such that
Then, it holds that \(P({\hat{k}}_{o}=k)\rightarrow 1\) as \(m\rightarrow \infty\). Note that \(\widehat{\varPsi }_{n_{(2)}}=0\) from the fact that \(\text{ rank }({\varvec{S}}_{D(1)})\le n_{(2)}-1\). Finally, one may choose k as
in actual data analysis. According to Aoshima and Yata (2018a), we use \(\gamma (n)=(n^{-1} \log {n})^{1/2}\) in (21). If \({\hat{k}}=0\), the data are regarded as following the NSSE model.
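For illustration, the following sketch implements one reading of this selection rule; since display (21) is not reproduced above, the exact inequality is our assumption (we take \({\hat{k}}\) as the smallest r with \(\hat{\tau }_{r+1}\ge 1-\gamma (n)\)):

```python
import numpy as np

def estimate_k(psi_hat, n):
    """Sketch of the spike-count estimate via tau_hat_r = Psi_hat_{r+1}/Psi_hat_r.
    Assumes the rule selects the smallest r >= 0 with tau_hat_{r+1} >= 1 - gamma(n);
    hat_k = 0 suggests the NSSE model."""
    gamma = np.sqrt(np.log(n) / n)          # gamma(n) = (n^{-1} log n)^{1/2}
    tau = psi_hat[1:] / psi_hat[:-1]        # tau[i] = tau_hat_{i+1} (1-based r = i+1)
    hits = np.flatnonzero(tau >= 1.0 - gamma)
    return int(hits[0]) if hits.size else len(tau)

# e.g., with psi_hat = cdm_psi(X, r_max) from the previous sketch:
# k_hat = estimate_k(psi_hat, X.shape[1])
```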
Appendix 2
Let \(\psi _{r}=\lambda _1^2/(n^2\lambda _{r})+{\varvec{\mu }}^T{\varvec{\varSigma }}{\varvec{\mu }}/(n\lambda _{r})\) for \(r=1,\ldots ,k\).
Proofs of Propositions 1 and 2
From the fact that \(\lambda _{k+1}^2\le \text{ tr }({\varvec{\varSigma }}_*^2)\), we note that
Then, under (A-iv), it holds that as \(m\rightarrow \infty\)
Thus, we can conclude the result of Proposition 2. On the other hand, from (22), under (A-ii) and (A-iii), it holds that \(K_{2*}=o(K_{1*})\). Then, from Theorem 5 in Aoshima and Yata (2015), we can conclude the result of Proposition 1.
Proof of Theorem 1
From (S6.28) in Appendix B of Aoshima and Yata (2018a), we claim that for \(r=1,\ldots ,k\),
as \(m\rightarrow \infty\) under (A-i) and (A-ii). Here, by noting that \(\text{ tr }({\varvec{\varSigma }}_*^2)/\lambda _k^2=O(1)\) and \({\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}\le \varDelta \lambda _1\), we have that for \(r=1,\ldots ,k\)
under (A-ii), (A-v) and (A-vi), so that for \(r=1,\ldots ,k\)
Also, note that \(|{\varvec{h}}_r^T{\varvec{\mu }}|\le \varDelta ^{1/2}=O(\varDelta _{*}^{1/2})\) under (A-vi), so that for \(r=1,\ldots ,k\)
Thus, from (24) we can conclude the first result of Theorem 1. In addition, from the first result, it holds that
under (A-i) to (A-iii), (A-v) and (A-vi). Thus, from Proposition 1, the second result of Theorem 1 follows.
Proofs of Corollaries 1 and 2
From the first result of Theorem 1, it holds that
under (A-i), (A-ii) and (A-iv) to (A-vi). Thus, from Proposition 2, the result of Corollary 1 follows. In addition, from Corollary 1 and Lemma 1, it holds that
under (A-i), (A-ii) and (A-iv) to (A-vi). This concludes the result of Corollary 2.
Proof of Theorem 2
First, we consider the case when (A-iii) is met. We note that \(K_*/K_{1*}\rightarrow 1\) as \(m\rightarrow \infty\) under (A-ii) and (A-iii). Then, from Theorem 1 and Lemma 1, under (A-i) to (A-iii), (A-v) and (A-vi), we have that
This concludes the size result in Theorem 2. On the other hand, under (A-iv), from (23), it holds that
Hence, from (25) and Corollary 2, by considering a convergent subsequence of \(\varDelta _*^2/K_{1*}\), we can conclude the power result in Theorem 2.
Proofs of Propositions 3 and 4
By noting that \(E({\varvec{x}}_l-{\varvec{\mu }})={\varvec{0}}\) (\(l=1,\ldots ,n\)) and \(\text{ Var }(T_{{\varvec{\mu }}})=K_{1*}\), the results follow straightforwardly from Proposition 1 and Theorem 1.
Proof of Theorem 3
By noting that \({\varvec{\mu }}^T {\varvec{\varSigma }}{\varvec{\mu }}\le \varDelta \lambda _1\), we have that for \(r=1,\ldots ,k\)
as \(m\rightarrow \infty\) under (A-vii). Thus, from (24) it holds that
under (A-i), (A-ii) and (A-vii). Then, from Proposition 2, we can conclude the result of Theorem 3.
Proof of Proposition 5
From Lemma B.1 in Appendix B of Aoshima and Yata (2018a), under (A-i) and (A-ii), it holds for \(j=1,\ldots ,k\) that as \(m \rightarrow \infty\)
Thus, we can conclude the result of Proposition 5.
Cite this article
Ishii, A., Yata, K. & Aoshima, M. Inference on high-dimensional mean vectors under the strongly spiked eigenvalue model. Jpn J Stat Data Sci 2, 105–128 (2019). https://doi.org/10.1007/s42081-018-0029-z
Keywords
- Asymptotic normality
- Data transformation
- Eigenstructure estimation
- Large p small n
- Noise reduction methodology
- Spiked model