Abstract
In this paper, we consider asymptotic normality for various inference problems on multisample, high-dimensional mean vectors. We prove that the statistics concerned are asymptotically normal under mild conditions for high-dimensional data, and we show that this asymptotic normality can be justified both theoretically and numerically even for non-Gaussian data. We introduce the extended cross-data-matrix (ECDM) methodology to construct an unbiased estimator at a reasonable computational cost. With the help of the asymptotic normality, we show that the statistics given by ECDM ensure consistency properties for inference on multisample, high-dimensional mean vectors. We give several applications, such as confidence regions for high-dimensional mean vectors, confidence intervals for the squared norm, and a test of multisample mean vectors. We also provide sample size determination to satisfy a prespecified accuracy of inference. Finally, we give several examples using a microarray data set.
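To make concrete why an unbiased estimator matters for the squared norm of a high-dimensional mean vector, the following minimal sketch contrasts the plug-in estimate with a simple U-statistic. It is not the ECDM estimator, whose construction is not given in this abstract; it is a standard stand-in, and the function names are hypothetical. The plug-in estimate, the squared norm of the sample mean, has expectation equal to the true squared norm plus tr(Σ)/n, which can be large when the dimension far exceeds the sample size. Averaging the inner products of distinct observations removes that bias.

```python
import random


def squared_norm_naive(samples):
    """Plug-in estimate ||x_bar||^2; biased upward by tr(Sigma)/n."""
    n = len(samples)
    p = len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(p)]
    return sum(m * m for m in mean)


def squared_norm_unbiased(samples):
    """U-statistic: average of x_i . x_j over all ordered pairs i != j.

    Computed in O(n * p) time via ||sum_i x_i||^2 - sum_i ||x_i||^2,
    which equals the sum of x_i . x_j over i != j.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two observations")
    p = len(samples[0])
    total = [sum(x[k] for x in samples) for k in range(p)]
    square_of_sum = sum(t * t for t in total)
    sum_of_squares = sum(v * v for x in samples for v in x)
    return (square_of_sum - sum_of_squares) / (n * (n - 1))


def squared_norm_bruteforce(samples):
    """Same U-statistic by the direct O(n^2 * p) double loop (for checking)."""
    n = len(samples)
    acc = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                acc += sum(a * b for a, b in zip(samples[i], samples[j]))
    return acc / (n * (n - 1))


if __name__ == "__main__":
    # Illustrative setting (an assumption, not from the paper):
    # n = 10 observations in p = 500 dimensions with true mean zero,
    # so the true squared norm is 0 and tr(Sigma)/n = 50.
    rng = random.Random(0)
    n, p = 10, 500
    data = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    print("naive   :", squared_norm_naive(data))
    print("unbiased:", squared_norm_unbiased(data))
```

The unbiased version can return a negative value in a single sample, which is the price of removing the bias; the fast formula and the double loop compute the same quantity.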
Change history
21 December 2021
A Correction to this paper has been published: https://doi.org/10.1007/s11009-021-09919-w
Additional information
The original version of this article was revised due to a retrospective Open Access order.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Aoshima, M., Yata, K. Asymptotic Normality for Inference on Multisample, High-Dimensional Mean Vectors Under Mild Conditions. Methodol Comput Appl Probab 17, 419–439 (2015). https://doi.org/10.1007/s11009-013-9370-7
DOI: https://doi.org/10.1007/s11009-013-9370-7
Keywords
- Asymptotic normality
- Confidence region
- Cross-data-matrix methodology
- Large p small n
- Microarray
- Two-stage procedure