Abstract
Let \({\mathbb {X}}= (X_i)_{1\le i \le n}\) be an i.i.d. sample of square-integrable variables in \(\mathbb {R}^d\), with common expectation \(\mu \) and covariance matrix \(\varSigma \), both unknown. We consider the problem of testing whether \(\mu \) is \(\eta \)-close to zero, i.e. \(\Vert \mu \Vert \le \eta \), against \(\Vert \mu \Vert \ge (\eta + \delta )\); we also tackle the more general two-sample mean closeness (also known as relevant difference) testing problem. The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance \(\delta \) such that we can control both the Type I and Type II errors at a given level. The main technical tools are concentration inequalities, first for a suitable estimator of \(\big \Vert \mu \big \Vert ^2\) used as a test statistic, and second for estimating the operator and Frobenius norms of \(\varSigma \) coming into the quantiles of said test statistic. These properties are obtained for Gaussian and bounded distributions. Particular attention is given to the dependence on the pseudo-dimension \(d_*\) of the distribution, defined as \(d_* := \big \Vert \varSigma \big \Vert _2^2/\big \Vert \varSigma \big \Vert _\infty ^2\). In particular, for \(\eta =0\), the minimum separation distance is \({\varTheta }( d_*^{\nicefrac {1}{4}}\sqrt{\big \Vert \varSigma \big \Vert _\infty /n})\), in contrast with the minimax estimation distance for \(\mu \), which is \({\varTheta }(d_e^{\nicefrac {1}{2}}\sqrt{\big \Vert \varSigma \big \Vert _\infty /n})\) (where \(d_e:=\big \Vert \varSigma \big \Vert _1/\big \Vert \varSigma \big \Vert _\infty \)). This generalizes a phenomenon spelled out in particular by Baraud (Bernoulli 8(5):577–606 2002, [3]).
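The dimensional quantities and rates in the abstract can be made concrete with a small numerical sketch. The following is an illustration of the definitions only (it is not code from the paper): given a covariance matrix \(\varSigma\), it computes the pseudo-dimension \(d_*\), the effective dimension \(d_e\), and the resulting testing and estimation rates for \(\eta = 0\); the function name `separation_rates` is our own.

```python
import numpy as np

def separation_rates(Sigma, n):
    """Compute d_*, d_e and the rates from the abstract for a PSD Sigma."""
    eigvals = np.linalg.eigvalsh(Sigma)                 # Sigma symmetric PSD
    op_norm = eigvals.max()                             # ||Sigma||_inf (operator norm)
    trace = eigvals.sum()                               # ||Sigma||_1 = Tr(Sigma)
    frob_sq = np.sum(eigvals ** 2)                      # ||Sigma||_2^2 = Tr(Sigma^2)
    d_star = frob_sq / op_norm ** 2                     # pseudo-dimension d_*
    d_e = trace / op_norm                               # effective dimension d_e
    test_rate = d_star ** 0.25 * np.sqrt(op_norm / n)   # testing separation rate
    est_rate = d_e ** 0.5 * np.sqrt(op_norm / n)        # estimation rate
    return d_star, d_e, test_rate, est_rate

# Identity covariance in dimension d: d_* = d_e = d, and the testing rate
# (d / n^2)^{1/4} is much smaller than the estimation rate (d / n)^{1/2}.
d, n = 100, 10_000
d_star, d_e, test_rate, est_rate = separation_rates(np.eye(d), n)
```

For \(\varSigma = I_d\) this recovers the classical gap between the testing rate \((d/n^2)^{1/4}\) and the estimation rate \((d/n)^{1/2}\) discussed in the abstract.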
Notes
- 1. With the notation \(\big \Vert \varSigma \big \Vert _p\) we mean the p-Schatten norm. We will freely use in the paper the equivalent notations \(\big \Vert \varSigma \big \Vert _\infty = \big \Vert \varSigma \big \Vert _{\textrm{op}}\), \(\big \Vert \varSigma \big \Vert _1 = \mathop {\textrm{Tr}}(\varSigma )\), and \(\big \Vert \varSigma \big \Vert _2^2 = \mathop {\textrm{Tr}}(\varSigma ^2)\).
- 2. That is, the real random variable \(\big \Vert \varPhi (Z)\big \Vert \) is integrable, which guarantees that the integral of \(\varPhi (Z)\) is well defined in a strong sense as an element of the Hilbert space; see e.g. Cohn [9].
- 3. We are happy to report that, at the time of publication of this article, very recent work of V. Spokoiny has precisely addressed this topic: see "Sharp deviation bounds and concentration phenomenon for the squared norm of a sub-gaussian vector", V. Spokoiny, arXiv:2305.07885, 2023.
- 4. In the original result the \(u\) deviation term involves an additional constant \(D\), and we simply use \(D\le C\) here.
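The Schatten-norm identities recalled in Note 1 can be checked numerically. The snippet below is illustrative only (it is not code from the paper): it builds a small symmetric PSD matrix and verifies that the operator norm, trace, and squared Frobenius norm coincide with the corresponding eigenvalue expressions.

```python
import numpy as np

# Numerical check of the Schatten-norm identities from Note 1 on a
# small symmetric PSD matrix (illustrative sketch, not the paper's code).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T                                  # symmetric PSD by construction

eigvals = np.linalg.eigvalsh(Sigma)
# ||Sigma||_inf = operator norm = largest eigenvalue (PSD case)
assert np.isclose(eigvals.max(), np.linalg.norm(Sigma, 2))
# ||Sigma||_1 = Tr(Sigma) (PSD case: sum of eigenvalues)
assert np.isclose(eigvals.sum(), np.trace(Sigma))
# ||Sigma||_2^2 = Tr(Sigma^2) = squared Frobenius norm
assert np.isclose(np.sum(eigvals ** 2), np.linalg.norm(Sigma, 'fro') ** 2)
```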
References
Anderson, T.: An Introduction to Multivariate Statistical Analysis, 3rd edn. Wiley Series in Probability and Mathematical Statistics. Wiley (2003)
Balasubramanian, K., Li, T., Yuan, M.: On the optimality of kernel-embedding based goodness-of-fit tests. J. Mach. Learn. Res. 22(1), 1–45 (2021)
Baraud, Y.: Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8(5), 577–606 (2002)
Berger, J.O., Delampady, M.: Testing precise hypotheses. Stat. Sci. 2(3), 317–335 (1987)
Birgé, L.: An alternative point of view on Lepski’s method. In: State of the Art in Probability and Statistics (Leiden, 1999), vol. 36, pp. 113–133. IMS Lecture Notes Monograph Series Institute Mathematical Statistics (2001)
Blanchard, G., Carpentier, A., Gutzeit, M.: Minimax Euclidean separation rates for testing convex hypotheses in \(\mathbb{R} ^{d}\). Electron. J. Stat. 12(2), 3713–3735 (2018)
Bousquet, O.: A Bennett concentration inequality and its application to suprema of empirical processes. Comptes Rendus Mathematiques de l’Académie des Sciences 334(6), 495–500 (2002)
Chwialkowski, K., Strathmann, H., Gretton, A.: A kernel test of goodness of fit. In: Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), vol. 48, pp. 2606–2615 (2016)
Cohn, D.L.: Measure Theory. Birkhäuser, Boston (1980)
Dette, H., Kokot, K., Aue, A.: Functional data analysis in the Banach space of continuous functions. Ann. Stat. 48(2), 1168–1192 (2020)
Dette, H., Kokot, K., Volgushev, S.: Testing relevant hypotheses in functional time series via self-normalization. J. R. Stat. Soc. Ser. B 82(3), 629–660 (2020)
Dette, H., Munk, A.: Nonparametric comparison of several regression functions: exact and asymptotic theory. Ann. Stat. 26(6), 2339–2368 (1998)
Ermakov, M.S.: Minimax detection of a signal in a Gaussian white noise. Theory Probab. Appl. 35(4), 667–679 (1991)
Fromont, M., Laurent, B., Lerasle, M., Reynaud-Bouret, P.: Kernels based tests with non-asymptotic bootstrap approaches for two-sample problems. In: Mannor, S., Srebro, N., Williamson R.C. (eds.) Proceedings of the 25th Annual Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 23, pp. 1–23 (2012)
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13(25), 723–773 (2012)
Houdré, C., Reynaud-Bouret, P.: Exponential inequalities, with constants, for U-statistics of order two. In: Stochastic Inequalities and Applications. Progress in Probability, vol. 56, pp. 55–69 (2003)
Hsu, D., Kakade, S., Zhang, T.: A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab. 17, 6 (2012)
Ingster, Y.I.: Minimax nonparametric detection of signals in white Gaussian noise. Probl. Inf. Transm. 18(2), 130–140 (1982)
Ingster, Y.I.: Asymptotically minimax hypothesis testing for nonparametric alternatives I-II-III. Math. Methods Stat. 2(2–4), 85–114, 171–189, 249–268 (1993)
Ingster, Y., Suslina, I.A.: Nonparametric goodness-of-fit testing under Gaussian models. In: Lecture Notes in Statistics, vol. 169. Springer (2012)
Ingster, Y.I., Suslina, I.A.: Minimax detection of a signal for Besov bodies and balls. Problems Inf. Transm. 34(1), 48–59 (1998)
Jirak, M., Wahl, M.: Perturbation Bounds for Eigenspaces Under a Relative Gap Condition (2018). arXiv: 1803.03868 [math.PR]
Kim, I., Balakrishnan, S., Wasserman, L.: Minimax Optimality of Permutation Tests (2020). arXiv: 2003.13208 [math.ST]
Koltchinskii, V., Lounici, K.: Concentration inequalities and moment bounds for sample covariance operators. Bernoulli 23(1), 110–133 (2017)
Lam-Weil, J., Carpentier, A., Sriperumbudur, B.K.: Local Minimax Rates for Closeness Testing of Discrete Distributions (2021). arXiv: 1902.01219 [math.ST]
Lepski, O.V., Spokoiny, V.G.: Minimax nonparametric hypothesis testing: the case of an inhomogeneous alternative. Bernoulli 5(2), 333–358 (1999)
Lugosi, G., Mendelson, S.: Mean estimation and regression under heavy-tailed distributions: a survey. Found. Comput. Math. 19(5), 1145–1190 (2019)
Marienwald, H., Fermanian, J.-B., Blanchard, G.: High-dimensional multi-task averaging and application to kernel mean embedding. In: AISTATS 2021 (2020). arXiv: 2011.06794 [stat.ML]
Massart, P.: Concentration Inequalities and Model Selection. Springer (2003)
Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B.: Kernel mean embedding of distributions: a review and beyond. Found. Trends Mach. Learn. 10(1–2), 1–141 (2017)
Munk, A., Czado, C.: Nonparametric validation of similar distributions and assessment of goodness of fit. J. R. Stat. Soc. Ser. B 60(1), 223–241 (1998)
Naumov, A., Spokoiny, V.G., Ulyanov, V.: Bootstrap confidence sets for spectral projectors of sample covariance. Probab. Theory Related Fields 174(3), 1091–1132 (2019)
Ostrovskii, D.M., Ndaoud, M., Javanmard, A., Razaviyayn, M.: Near-Optimal Model Discrimination with Non-Disclosure (2020). arXiv: 2012.02901 [math.ST]
Smola, A., Gretton, A., Song, L., Schölkopf, B.: A Hilbert space embedding for distributions. In: Proceedings of International Conference on Algorithmic Learning Theory (ALT 2007), pp. 13–31 (2007)
Spokoiny, V.G.: Adaptive hypothesis testing using wavelets. Ann. Stat. 24(6), 2477–2498 (1996)
Spokoiny, V.G.: Parametric estimation. Finite sample theory. Ann. Stat. 40(6), 2877–2909 (2012)
Spokoiny, V.G., Dickhaus, T.: Basics of Modern Mathematical Statistics. Springer Texts in Statistics. Springer (2015)
Spokoiny, V.G., Zhilova, M.: Sharp deviation bounds for quadratic forms. Math. Methods Stat. 22(2), 100–113 (2013)
Spokoiny, V.G., Zhilova, M.: Bootstrap confidence sets under model misspecification. Ann. Stat. 43(6), 2653–2675 (2015)
van Handel, R.: Structured random matrices. In: Convexity and Concentration, pp. 107–156. Springer (2017)
Vershynin, R.: High-Dimensional Probability: An Introduction with Applications to Data Science. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47. Cambridge University Press (2018)
Wellek, S.: Testing Statistical Hypotheses of Equivalence. Chapman and Hall/CRC (2002)
Acknowledgements
GB acknowledges support from: Deutsche Forschungsgemeinschaft (DFG)–SFB1294/1–318763901; Agence Nationale de la Recherche (ANR), ANR-19-CHIA-0021-01 “BiSCottE”; the Franco-German University (UFA) through the binational Doktorandenkolleg CDFA 01-18. Both authors are extremely grateful to the two reviewers and to the editor, whose very careful reading of the initial manuscript and various suggestions allowed us to improve its quality significantly.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Blanchard, G., Fermanian, JB. (2023). Nonasymptotic One- and Two-Sample Tests in High Dimension with Unknown Covariance Structure. In: Belomestny, D., Butucea, C., Mammen, E., Moulines, E., Reiß, M., Ulyanov, V.V. (eds) Foundations of Modern Statistics. FMS 2019. Springer Proceedings in Mathematics & Statistics, vol 425. Springer, Cham. https://doi.org/10.1007/978-3-031-30114-8_3
Print ISBN: 978-3-031-30113-1
Online ISBN: 978-3-031-30114-8
eBook Packages: Mathematics and Statistics (R0)