Asymptotic Normality for Inference on Multisample, High-Dimensional Mean Vectors Under Mild Conditions


In this paper, we consider the asymptotic normality of statistics for various inference problems on multisample, high-dimensional mean vectors. We prove that the statistics concerned are asymptotically normal under mild conditions for high-dimensional data, and we show that this asymptotic normality can be justified both theoretically and numerically, even for non-Gaussian data. We introduce the extended cross-data-matrix (ECDM) methodology to construct an unbiased estimator at a reasonable computational cost. Using the asymptotic normality, we show that the statistics given by ECDM ensure consistency properties for inference on multisample, high-dimensional mean vectors. We give several applications, such as confidence regions for high-dimensional mean vectors, confidence intervals for the squared norm, and tests of multisample mean vectors. We also provide a sample size determination that satisfies a prespecified accuracy for inference. Finally, we give several examples using a microarray data set.
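The abstract does not spell out the ECDM construction, so the following is only a point of reference, not the authors' method. It is a minimal Python/NumPy sketch of a standard unbiased estimator of the squared norm ||mu||^2 (the average of x_i^T x_j over distinct pairs i != j) and its two-sample analogue for ||mu1 - mu2||^2. Both are generic U-statistics swapped in for illustration; the function names `unbiased_sq_norm` and `unbiased_sq_dist` are hypothetical. The sketch shows why unbiasedness matters when p is large: the naive ||xbar||^2 overestimates ||mu||^2 by tr(Sigma)/n, which can dominate the signal.

```python
import numpy as np

# Hypothetical helpers: a generic U-statistic sketch, NOT the ECDM estimator of the paper.

def unbiased_sq_norm(x):
    """Unbiased estimate of ||mu||^2 from an (n, p) sample matrix x.

    Averages x_i^T x_j over all ordered pairs i != j. Uses the identity
    sum_{i != j} x_i^T x_j = ||sum_i x_i||^2 - sum_i ||x_i||^2,
    so the cost is O(np) instead of forming the n x n Gram matrix.
    """
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two observations")
    s = x.sum(axis=0)
    off_diag = s @ s - np.sum(x * x)
    return off_diag / (n * (n - 1))


def unbiased_sq_dist(x1, x2):
    """Unbiased estimate of ||mu1 - mu2||^2 from two independent samples."""
    # E[xbar1^T xbar2] = mu1^T mu2 because the samples are independent.
    cross = x1.mean(axis=0) @ x2.mean(axis=0)
    return unbiased_sq_norm(x1) + unbiased_sq_norm(x2) - 2.0 * cross


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 20, 2000
    mu = np.full(p, 0.05)  # true ||mu||^2 = p * 0.05**2 = 5.0
    # Non-Gaussian noise: centered exponential, mean 0 and variance 1.
    x = mu + (rng.exponential(size=(n, p)) - 1.0)
    xbar = x.mean(axis=0)
    # Naive estimate has expectation ||mu||^2 + tr(Sigma)/n = 5 + p/n here.
    print("naive ||xbar||^2:", xbar @ xbar)
    print("unbiased estimate:", unbiased_sq_norm(x))
```

Equivalently, `unbiased_sq_norm(x)` equals ||xbar||^2 - tr(S)/n with S the sample covariance matrix; the pairwise form is shown because it makes the unbiasedness (E[x_i^T x_j] = ||mu||^2 for i != j) immediate.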




Author information



Corresponding author

Correspondence to Makoto Aoshima.


About this article

Cite this article

Aoshima, M., Yata, K. Asymptotic Normality for Inference on Multisample, High-Dimensional Mean Vectors Under Mild Conditions. Methodol Comput Appl Probab 17, 419–439 (2015).

Keywords

  • Asymptotic normality
  • Confidence region
  • Cross-data-matrix methodology
  • Large p small n
  • Microarray
  • Two-stage procedure

AMS 2000 Subject Classifications

  • 62H10
  • 62L10
  • 60F05