Inference on High-Dimensional Mean Vectors with Fewer Observations Than the Dimension

Abstract

We consider inference on high-dimensional mean vectors when the sample size is much smaller than the dimension. Such data arise in many areas of modern science, including genetic microarrays, medical imaging, text recognition, finance, and chemometrics. First, we construct a given-radius confidence region for a mean vector; this region can be used for variable selection in high-dimensional data. Next, we construct a given-width confidence interval for the squared norm of a mean vector; this interval can be used in classification procedures for high-dimensional data. To guarantee a prespecified coverage probability, we propose a two-stage estimation methodology and determine the required sample size for each inference problem. Finally, we demonstrate how the new methodologies perform on a microarray data set.
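
To make the two-stage idea concrete, the following is a minimal sketch of a generic fixed-width procedure for a single (univariate) mean. It is not the high-dimensional methodology of this article, whose details are not given in the abstract. A pilot sample estimates the variance, and that estimate determines the total sample size needed for an interval of half-width `half_width`. The function names, the normal-quantile approximation, and the simulated data source are all assumptions made for illustration; Stein's classical univariate rule uses a Student t quantile with pilot-size-minus-one degrees of freedom instead.

```python
import math
import random
from statistics import NormalDist, variance


def total_sample_size(pilot, half_width, coverage=0.95):
    """Total sample size suggested by a pilot sample (illustrative rule).

    Assumption: a standard normal quantile is used as a large-sample
    stand-in for the Student t quantile of the classical procedure.
    """
    m = len(pilot)
    if m < 2:
        raise ValueError("pilot sample needs at least two observations")
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # two-sided quantile
    s2 = variance(pilot)  # unbiased pilot estimate of the variance
    needed = math.floor(z * z * s2 / half_width ** 2) + 1
    return max(m, needed)  # never use fewer observations than the pilot


def two_stage_interval(draw, pilot_size, half_width, coverage=0.95):
    """Run both stages; `draw` is a hypothetical zero-argument sampler."""
    pilot = [draw() for _ in range(pilot_size)]           # stage 1
    n = total_sample_size(pilot, half_width, coverage)
    sample = pilot + [draw() for _ in range(n - pilot_size)]  # stage 2
    center = sum(sample) / n
    return center - half_width, center + half_width, n


if __name__ == "__main__":
    rng = random.Random(0)  # simulated data, for illustration only
    low, high, n = two_stage_interval(lambda: rng.gauss(1.0, 2.0), 10, 0.5)
    print(f"n = {n}, interval = ({low:.3f}, {high:.3f})")
```

The interval width is fixed in advance at `2 * half_width`; only the sample size is random, which is the defining feature of a given-width procedure. For example, a pilot sample of `[0, 2, 4, 6, 8]` has sample variance 10, so `total_sample_size` returns 39 for `half_width=1.0` at 95% coverage.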


Author information

Corresponding author

Correspondence to Kazuyoshi Yata.

About this article

Cite this article

Yata, K., Aoshima, M. Inference on High-Dimensional Mean Vectors with Fewer Observations Than the Dimension. Methodol Comput Appl Probab 14, 459–476 (2012). https://doi.org/10.1007/s11009-011-9233-z

Keywords

  • Classification
  • Confidence region
  • HDLSS
  • Sample size determination
  • Two-stage estimation
  • Variable selection

Mathematics Subject Classifications (2010)

  • 62L10
  • 62H10