A \(U\)-statistic approach for a high-dimensional two-sample mean testing problem under non-normality and Behrens–Fisher setting

Annals of the Institute of Statistical Mathematics

Abstract

A two-sample test statistic is presented for testing the equality of mean vectors when the dimension, \(p\), exceeds the sample sizes, \(n_i,\; i = 1, 2\), and the distributions are not necessarily normal. Under mild assumptions on the traces of the covariance matrices, the statistic is shown to be asymptotically Chi-square distributed when \(n_i, p \rightarrow \infty \). The validity of the test statistic when \(p\) is fixed but large, including \(p > n_i\), and when the distributions are multivariate normal, is shown as special cases. This two-sample Chi-square approximation helps extend the validity of Box’s approximation for high-dimensional and non-normal data to a two-sample setup, valid even under the Behrens–Fisher setting. The limiting Chi-square distribution of the statistic is obtained using the asymptotic theory of degenerate \(U\)-statistics and, using a result from classical asymptotic theory, is further extended to an approximate normal distribution. Both independent and paired-sample cases are considered.
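
Box’s approximation, mentioned above, classically replaces the distribution of a quadratic form \(Q = \sum_i \lambda_i Z_i^2\), with independent standard normal \(Z_i\), by a scaled Chi-square \(g\,\chi^2_f\) matched on the first two moments. The following is only an illustrative sketch of that moment matching, assuming the weights \(\lambda_i\) are known and positive. The function name `box_approximation` is hypothetical, and this is not the article’s test statistic, which is not reproduced on this page.

```python
from typing import Sequence, Tuple


def box_approximation(weights: Sequence[float]) -> Tuple[float, float]:
    """Return (g, f) such that g * chi2_f matches the mean and variance of
    Q = sum_i weights[i] * Z_i**2 with independent standard normal Z_i.

    Illustrative helper (hypothetical name), assuming known positive weights.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty sequence of positive numbers")
    s1 = sum(weights)                 # E[Q] = sum of the weights
    s2 = sum(w * w for w in weights)  # Var[Q] = 2 * sum of squared weights
    g = s2 / s1                       # scale factor
    f = s1 * s1 / s2                  # approximate degrees of freedom
    return g, f


g, f = box_approximation([2.0, 1.0, 1.0])
# Moment check: g * f equals E[Q], and 2 * g**2 * f equals Var[Q].
print(g, f)
```

When the weights are the eigenvalues of a covariance matrix \(\Sigma\), this gives \(f = (\mathrm{tr}\,\Sigma)^2 / \mathrm{tr}(\Sigma^2)\), which may explain why assumptions on the traces of the covariance matrices appear in results of this kind. How the article estimates such traces is not shown here.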

References

  • Ahmad, M. R. (2008). Analysis of high-dimensional repeated measures designs: The one- and two-sample test statistics. Ph.D. Thesis. Göttingen: Cuvillier Verlag.

  • Ahmad, M. R., Werner, C., Brunner, E. (2008). Analysis of high dimensional repeated measures designs: The one sample case. Computational Statistics & Data Analysis, 53, 416–427.

  • Ahmad, M. R., von Rosen, D., Singull, M. (2012a). A note on mean testing for high-dimensional multivariate data under non-normality. Statistica Neerlandica, 67(1), 81–99.

  • Ahmad, M. R., Yamada, T., von Rosen, D. (2012b). Tests of covariance matrices for high-dimensional multivariate data. Test (Submitted).

  • Bai, Z., Saranadasa, H. (1996). Effect of high dimension: By an example of a two sample problem. Statistica Sinica, 6, 311–329.

  • Box, G. E. P. (1954). Some theorems on quadratic forms applied in the study of analysis of variance problems I: Effect of inequality of variance in the one-way classification. Annals of Mathematical Statistics, 25, 290–302.

  • Chen, S. X., Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2), 808–835.

  • Davis, C. S. (2002). Statistical methods for the analysis of repeated measurements. New York: Springer.

  • Dempster, A. P. (1958). A high dimensional two sample significance test. Annals of Mathematical Statistics, 29(4), 995–1010.

  • Dempster, A. P. (1969). Elements of continuous multivariate analysis. MA: Addison-Wesley.

  • Denker, M. (1985). Asymptotic distribution theory in nonparametric statistics. Vieweg, Braunschweig: Vieweg Advanced Lectures.

  • Denker, M., Gordin, M. (2011). Limit theorems for von Mises statistics of a measure preserving transformation. arXiv:1109.0635v1 [math.DS]. September 3, 2011.

  • Denker, M., Keller, G. (1983). On \(U\)-statistics and v. Mises’ statistics for weakly dependent processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 64, 505–522.

  • Dunford, N., Schwartz, J. T. (1967). Linear operators. Part II: Spectral theory-Self-adjoint operators in Hilbert space. New York: Wiley.

  • Fujikoshi, Y., Ulyanov, V. V., Shimizu, R. (2010). Multivariate statistics: High-dimensional and large-sample approximations. New York: Wiley.

  • Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., Smola, A. (2008). A kernel method for the two-sample problem. Journal of Machine Learning Research, 1, 1–43.

  • Hájek, J., Šidák, Z., Sen, P. K. (1999). Theory of rank tests. San Diego: Academic Press.

  • Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19, 293–325.

  • Holzmann, H., Koch, S., Min, A. (2004). Almost sure limit theorems for \(U\)-statistics. Statistics & Probability Letters, 69, 261–269.

  • Jiang, J. (2010). Large sample techniques for statistics. New York: Springer.

  • Koroljuk, V. S., Borovskich, Y. V. (1994). Theory of U-statistics. Dordrecht: Kluwer.

  • Kowalski, J., Tu, X. M. (2008). Modern applied U-statistics. New York: Wiley.

  • Kreyszig, E. (1978). Introductory functional analysis with applications. New York: Wiley.

  • Ledoit, O., Wolf, M. (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. The Annals of Statistics, 30, 1081–1102.

  • Lee, A. J. (1990). U-statistics: Theory and practice. Boca Raton: CRC Press.

  • Lehmann, E. L. (1999). Elements of large-sample theory. New York: Springer.

  • Leucht, A. (2012). Degenerate \(U\)- and \(V\)-statistics under weak dependence: Asymptotic theory and bootstrap consistency. Bernoulli, 18, 552–585.

  • Masujima, M. (2009). Applied mathematical methods in theoretical physics (2nd ed.). New York: Wiley.

  • Neuhaus, G. (1977). Functional limit theorems for \(U\)-statistics in the degenerate case. Journal of Multivariate Analysis, 7, 424–439.

  • Neumeyer, N. (2004). A central limit theorem for two-sample \(U\)-processes. Statistics & Probability Letters, 67, 73–85.

  • Pinheiro, A., Sen, P. K., Pinheiro, H. P. (2009). Decomposability of high-dimensional diversity measures: Quasi-\(U\)-statistics, martingales, and nonstandard asymptotics. Journal of Multivariate Analysis, 100, 1645–1656.

  • Reed, M., Simon, B. (1980). Methods of modern mathematical physics, Vol. I: Functional analysis. San Diego, CA: Academic Press.

  • Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Weinheim: Wiley.

  • Srivastava, M. S. (2007). Multivariate theory for analyzing high dimensional data. Journal of Japan Statistical Association, 37, 53–86.

  • Srivastava, M. S. (2009). A test for the mean vector with fewer observations than the dimension under non-normality. Journal of Multivariate Analysis, 100, 518–532.

  • van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.

Acknowledgments

The author is sincerely thankful to Prof. Dr. Anne Leucht, Theoretical Econometrics and Statistics, Department of Economics, University of Mannheim, Germany, for her careful perusal of the manuscript and helpful suggestions. Thanks are also due to the editor, the associate editor, and two anonymous referees for their helpful suggestions and comments, which led to this improved version of the article.

Author information

Corresponding author

Correspondence to M. Rauf Ahmad.

About this article

Cite this article

Ahmad, M.R. A \(U\)-statistic approach for a high-dimensional two-sample mean testing problem under non-normality and Behrens–Fisher setting. Ann Inst Stat Math 66, 33–61 (2014). https://doi.org/10.1007/s10463-013-0404-2

  • DOI: https://doi.org/10.1007/s10463-013-0404-2
