Abstract
A two-sample test statistic is presented for testing the equality of mean vectors when the dimension, \(p\), exceeds the sample sizes, \(n_i,\; i = 1, 2\), and the distributions are not necessarily normal. Under mild assumptions on the traces of the covariance matrices, the statistic is shown to be asymptotically Chi-square distributed when \(n_i, p \rightarrow \infty\). The validity of the test statistic when \(p\) is fixed but large, including \(p > n_i\), and when the distributions are multivariate normal, is also shown as special cases. This two-sample Chi-square approximation extends the validity of Box’s approximation for high-dimensional and non-normal data to the two-sample setup, and it remains valid under the Behrens–Fisher setting. The limiting Chi-square distribution of the statistic is obtained using the asymptotic theory of degenerate \(U\)-statistics and, using a result from classical asymptotic theory, it is further extended to an approximate normal distribution. Both independent and paired-sample cases are considered.
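Two ingredients named in the abstract can be illustrated with a small sketch, under stated assumptions. The first is a two-sample \(U\)-statistic that estimates \(\|\mu_1-\mu_2\|^2\) without bias by using only inner products of distinct observations, a kernel of the kind used by Chen and Qin (2010); it is not necessarily the exact statistic studied in this article. The second is Box’s (1954) moment-matching rule, which approximates a weighted sum \(\sum_k \lambda_k z_k^2\) of independent \(\chi^2_1\) variables by \(g\chi^2_f\) with \(g = \operatorname{tr}(A^2)/\operatorname{tr}(A)\) and \(f = (\operatorname{tr} A)^2/\operatorname{tr}(A^2)\), where the \(\lambda_k\) are the eigenvalues of \(A\). For example, under normality and \(H_0\), \(\bar{x}-\bar{y} \sim N_p(0, A)\) with \(A = \Sigma_1/n_1 + \Sigma_2/n_2\), where \(\Sigma_i\) denotes the covariance matrix of sample \(i\). The function names `two_sample_u_stat` and `box_chi2_params` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))


def two_sample_u_stat(x: List[Vector], y: List[Vector]) -> float:
    """Unbiased U-statistic estimate of ||mu1 - mu2||^2 (Chen-Qin-type kernel).

    Only cross products x_i'x_j with i != j (and all x_i'y_j) are used, so
    the diagonal terms x_i'x_i, whose expectation involves tr(Sigma), never enter.
    """
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    s_xx = sum(dot(x[i], x[j]) for i in range(n1) for j in range(n1) if i != j)
    s_yy = sum(dot(y[i], y[j]) for i in range(n2) for j in range(n2) if i != j)
    s_xy = sum(dot(a, b) for a in x for b in y)
    return s_xx / (n1 * (n1 - 1)) + s_yy / (n2 * (n2 - 1)) - 2.0 * s_xy / (n1 * n2)


def box_chi2_params(tr_a: float, tr_a2: float) -> Tuple[float, float]:
    """Box (1954)-type approximation sum_k lam_k * chi2_1 ~= g * chi2_f.

    Matches the first two moments: g*f = tr(A) and 2*g^2*f = 2*tr(A^2).
    """
    g = tr_a2 / tr_a
    f = tr_a * tr_a / tr_a2
    return g, f


if __name__ == "__main__":
    rng = random.Random(1)
    p, n1, n2 = 50, 8, 12  # p > n_i, the high-dimensional case

    def draw(n: int) -> List[List[float]]:
        # Non-normal data: centered exponential coordinates, so both means are 0.
        return [[rng.expovariate(1.0) - 1.0 for _ in range(p)] for _ in range(n)]

    reps = [two_sample_u_stat(draw(n1), draw(n2)) for _ in range(200)]
    print("Monte Carlo mean of U under H0:", sum(reps) / len(reps))  # should be near 0

    # Illustrative assumption: Sigma_1 = Sigma_2 = I_p, so A = c * I_p with c = 1/n1 + 1/n2.
    c = 1 / n1 + 1 / n2
    print("(g, f):", box_chi2_params(p * c, p * c * c))  # mathematically (c, p)
```

With equal eigenvalues the approximation is exact (\(g = c\), \(f = p\)); with unequal covariance matrices the degrees of freedom \(f\) fall below \(p\), which is how Box’s rule absorbs heteroscedasticity such as the Behrens–Fisher case.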
References
Ahmad, M. R. (2008). Analysis of high-dimensional repeated measures designs: The one- and two-sample test statistics. Ph.D. Thesis. Göttingen: Cuvillier Verlag.
Ahmad, M. R., Werner, C., Brunner, E. (2008). Analysis of high dimensional repeated measures designs: The one sample case. Computational Statistics & Data Analysis, 53, 416–427.
Ahmad, M. R., von Rosen, D., Singull, M. (2012a). A note on mean testing for high-dimensional multivariate data under non-normality. Statistica Neerlandica, 67(1), 81–99.
Ahmad, M. R., Yamada, T., von Rosen, D. (2012b). Tests of covariance matrices for high-dimensional multivariate data. Test (Submitted).
Bai, Z., Saranadasa, H. (1996). Effect of high dimension: By an example of a two sample problem. Statistica Sinica, 6, 311–329.
Box, G. E. P. (1954). Some theorems on quadratic forms applied in the study of analysis of variance problems I: Effect of inequality of variance in the one-way classification. Annals of Mathematical Statistics, 25, 290–302.
Chen, S. X., Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2), 808–835.
Davis, C. S. (2002). Statistical methods for the analysis of repeated measurements. New York: Springer.
Dempster, A. P. (1958). A high dimensional two sample significance test. Annals of Mathematical Statistics, 29(4), 995–1010.
Dempster, A. P. (1969). Elements of continuous multivariate analysis. MA: Addison-Wesley.
Denker, M. (1985). Asymptotic distribution theory in nonparametric statistics. Vieweg Advanced Lectures. Braunschweig: Vieweg.
Denker, M., Gordin, M. (2011). Limit theorems for von Mises statistics of a measure preserving transformation. arXiv:1109.0635v1 [math.DS], September 3, 2011.
Denker, M., Keller, G. (1983). On \(U\)-statistics and v. Mises’ statistics for weakly dependent processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 64, 505–522.
Dunford, N., Schwartz, J. T. (1967). Linear operators. Part II: Spectral theory. Self-adjoint operators in Hilbert space. New York: Wiley.
Fujikoshi, Y., Ulyanov, V. V., Shimizu, R. (2010). Multivariate statistics: High-dimensional and large-sample approximations. New York: Wiley.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., Smola, A. (2008). A kernel method for the two-sample problem. Journal of Machine Learning Research, 1, 1–43.
Hájek, J., Šidák, Z., Sen, P. K. (1999). Theory of rank tests. San Diego: Academic Press.
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19, 293–325.
Holzmann, H., Koch, S., Min, A. (2004). Almost sure limit theorems for \(U\)-statistics. Statistics & Probability Letters, 69, 261–269.
Jiang, J. (2010). Large sample techniques for statistics. New York: Springer.
Koroljuk, V. S., Borovskich, Y. V. (1994). Theory of U-statistics. Dordrecht: Kluwer.
Kowalski, J., Tu, X. M. (2008). Modern applied U-statistics. New York: Wiley.
Kreyszig, E. (1978). Introductory functional analysis with applications. New York: Wiley.
Ledoit, O., Wolf, M. (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. The Annals of Statistics, 30, 1081–1102.
Lee, A. J. (1990). U-statistics: Theory and practice. Boca Raton: CRC Press.
Lehmann, E. L. (1999). Elements of large-sample theory. New York: Springer.
Leucht, A. (2012). Degenerate \(U\)- and \(V\)-statistics under weak dependence: Asymptotic theory and bootstrap consistency. Bernoulli, 18, 552–585.
Masujima, M. (2009). Applied mathematical methods in theoretical physics (2nd ed.). New York: Wiley.
Neuhaus, G. (1977). Functional limit theorems for \(U\)-statistics in the degenerate case. Journal of Multivariate Analysis, 7, 424–439.
Neumeyer, N. (2004). A central limit theorem for two-sample \(U\)-processes. Statistics & Probability Letters, 67, 73–85.
Pinheiro, A., Sen, P. K., Pinheiro, H. P. (2009). Decomposability of high-dimensional diversity measures: Quasi-\(U\)-statistics, martingales, and nonstandard asymptotics. Journal of Multivariate Analysis, 100, 1645–1656.
Reed, M., Simon, B. (1980). Methods of modern mathematical physics, Vol. I: Functional analysis. San Diego, CA: Academic Press.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Weinheim: Wiley.
Srivastava, M. S. (2007). Multivariate theory for analyzing high dimensional data. Journal of the Japan Statistical Society, 37, 53–86.
Srivastava, M. S. (2009). A test for the mean vector with fewer observations than the dimension under non-normality. Journal of Multivariate Analysis, 100, 518–532.
van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.
Acknowledgments
The author is sincerely thankful to Prof. Dr. Anne Leucht, Theoretical Econometrics and Statistics, Department of Economics, University of Mannheim, Germany, for her careful perusal of the manuscript and helpful suggestions. Thanks are also due to the editor, the associate editor, and two anonymous referees for their helpful suggestions and comments, which led to this improved version of the article.
About this article
Cite this article
Ahmad, M.R. A \(U\)-statistic approach for a high-dimensional two-sample mean testing problem under non-normality and Behrens–Fisher setting. Ann Inst Stat Math 66, 33–61 (2014). https://doi.org/10.1007/s10463-013-0404-2
DOI: https://doi.org/10.1007/s10463-013-0404-2