Abstract
Let \({\mathbb {X}}= (X_i)_{1\le i \le n}\) be an i.i.d. sample of square-integrable variables in \(\mathbb {R}^d\), with common expectation \(\mu \) and covariance matrix \(\varSigma \), both unknown. We consider the problem of testing whether \(\mu \) is \(\eta \)-close to zero, i.e. \(\Vert \mu \Vert \le \eta \), against \(\Vert \mu \Vert \ge (\eta + \delta )\); we also tackle the more general two-sample mean closeness (also known as relevant difference) testing problem. The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance \(\delta \) such that we can control both the Type I and Type II errors at a given level. The main technical tools are concentration inequalities, first for a suitable estimator of \(\big \Vert \mu \big \Vert ^2\) used as a test statistic, and second for estimating the operator and Frobenius norms of \(\varSigma \) coming into the quantiles of said test statistic. These properties are obtained for Gaussian and bounded distributions. Particular attention is given to the dependence on the pseudo-dimension \(d_*\) of the distribution, defined as \(d_* := \big \Vert \varSigma \big \Vert _2^2/\big \Vert \varSigma \big \Vert _\infty ^2\). In particular, for \(\eta =0\), the minimum separation distance is \({\varTheta }( d_*^{\nicefrac {1}{4}}\sqrt{\big \Vert \varSigma \big \Vert _\infty /n})\), in contrast with the minimax estimation distance for \(\mu \), which is \({\varTheta }(d_e^{\nicefrac {1}{2}}\sqrt{\big \Vert \varSigma \big \Vert _\infty /n})\) (where \(d_e:=\big \Vert \varSigma \big \Vert _1/\big \Vert \varSigma \big \Vert _\infty \)). This generalizes a phenomenon spelled out in particular by Baraud (Bernoulli 8(5):577–606 2002, [3]).
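The dimensional quantities and rates in the abstract can be made concrete with a small numerical sketch. The following is an illustration of the definitions only (it is not code from the paper): given a covariance matrix \(\varSigma\), it computes the pseudo-dimension \(d_*\), the effective dimension \(d_e\), and the resulting testing and estimation rates for \(\eta = 0\); the function name `separation_rates` is our own.

```python
import numpy as np

def separation_rates(Sigma, n):
    """Compute d_*, d_e and the rates from the abstract for a PSD Sigma."""
    eigvals = np.linalg.eigvalsh(Sigma)                 # Sigma symmetric PSD
    op_norm = eigvals.max()                             # ||Sigma||_inf (operator norm)
    trace = eigvals.sum()                               # ||Sigma||_1 = Tr(Sigma)
    frob_sq = np.sum(eigvals ** 2)                      # ||Sigma||_2^2 = Tr(Sigma^2)
    d_star = frob_sq / op_norm ** 2                     # pseudo-dimension d_*
    d_e = trace / op_norm                               # effective dimension d_e
    test_rate = d_star ** 0.25 * np.sqrt(op_norm / n)   # testing separation rate
    est_rate = d_e ** 0.5 * np.sqrt(op_norm / n)        # estimation rate
    return d_star, d_e, test_rate, est_rate

# Identity covariance in dimension d: d_* = d_e = d, and the testing rate
# (d / n^2)^{1/4} is much smaller than the estimation rate (d / n)^{1/2}.
d, n = 100, 10_000
d_star, d_e, test_rate, est_rate = separation_rates(np.eye(d), n)
```

For \(\varSigma = I_d\) this recovers the classical gap between the testing rate \((d/n^2)^{1/4}\) and the estimation rate \((d/n)^{1/2}\) discussed in the abstract.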
Notes
- 1. With the notation \(\big \Vert \varSigma \big \Vert _p\) we mean the p-Schatten norm. We will freely use in the paper the equivalent notations \(\big \Vert \varSigma \big \Vert _\infty = \big \Vert \varSigma \big \Vert _{\textrm{op}}\), \(\big \Vert \varSigma \big \Vert _1 = \mathop {\textrm{Tr}}(\varSigma )\), and \(\big \Vert \varSigma \big \Vert _2^2 = \mathop {\textrm{Tr}}(\varSigma ^2)\).
- 2. That is, the real random variable \(\big \Vert \varPhi (Z)\big \Vert \) is integrable, which guarantees that the integral of \(\varPhi (Z)\) is well defined in a strong sense as an element of the Hilbert space; see e.g. Cohn [9].
- 3. We are happy to report that, at the time of publication of this article, very recent work of V. Spokoiny has precisely addressed this topic: see "Sharp deviation bounds and concentration phenomenon for the squared norm of a sub-gaussian vector", V. Spokoiny, arXiv:2305.07885, 2023.
- 4. In the original result the \(u\) deviation term involves an additional constant \(D\), and we simply use \(D\le C\) here.
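The Schatten-norm identities recalled in Note 1 can be checked numerically. The snippet below is illustrative only (it is not code from the paper): it builds a small symmetric PSD matrix and verifies that the operator norm, trace, and squared Frobenius norm coincide with the corresponding eigenvalue expressions.

```python
import numpy as np

# Numerical check of the Schatten-norm identities from Note 1 on a
# small symmetric PSD matrix (illustrative sketch, not the paper's code).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T                                  # symmetric PSD by construction

eigvals = np.linalg.eigvalsh(Sigma)
# ||Sigma||_inf = operator norm = largest eigenvalue (PSD case)
assert np.isclose(eigvals.max(), np.linalg.norm(Sigma, 2))
# ||Sigma||_1 = Tr(Sigma) (PSD case: sum of eigenvalues)
assert np.isclose(eigvals.sum(), np.trace(Sigma))
# ||Sigma||_2^2 = Tr(Sigma^2) = squared Frobenius norm
assert np.isclose(np.sum(eigvals ** 2), np.linalg.norm(Sigma, 'fro') ** 2)
```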
References
Anderson, T.: An Introduction to Multivariate Statistical Analysis, 3rd edn. Wiley Series in Probability and Mathematical Statistics. Wiley (2003)
Balasubramanian, K., Li, T., Yuan, M.: On the optimality of kernel-embedding based goodness-of-fit tests. J. Mach. Learn. Res. 22(1), 1–45 (2021)
Baraud, Y.: Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8(5), 577–606 (2002)
Berger, J.O., Delampady, M.: Testing precise hypotheses. Stat. Sci. 2(3), 317–335 (1987)
Birgé, L.: An alternative point of view on Lepski’s method. In: State of the Art in Probability and Statistics (Leiden, 1999), vol. 36, pp. 113–133. IMS Lecture Notes Monograph Series Institute Mathematical Statistics (2001)
Blanchard, G., Carpentier, A., Gutzeit, M.: Minimax Euclidean separation rates for testing convex hypotheses in \(\mathbb{R} ^{d}\). Electron. J. Stat. 12(2), 3713–3735 (2018)
Bousquet, O.: A Bennett concentration inequality and its application to suprema of empirical processes. Comptes Rendus Mathematiques de l’Académie des Sciences 334(6), 495–500 (2002)
Chwialkowski, K., Strathmann, H., Gretton, A.: A kernel test of goodness of fit. In: Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), vol. 48, pp. 2606–2615 (2016)
Cohn, D.L.: Measure Theory. Birkhäuser, Boston (1980)
Dette, H., Kokot, K., Aue, A.: Functional data analysis in the Banach space of continuous functions. Ann. Stat. 48(2), 1168–1192 (2020)
Dette, H., Kokot, K., Volgushev, S.: Testing relevant hypotheses in functional time series via self-normalization. J. R. Stat. Soc. Ser. B 82(3), 629–660 (2020)
Dette, H., Munk, A.: Nonparametric comparison of several regression functions: exact and asymptotic theory. Ann. Stat. 26(6), 2339–2368 (1998)
Ermakov, M.S.: Minimax detection of a signal in a Gaussian white noise. Theory Probab. Appl. 35(4), 667–679 (1991)
Fromont, M., Laurent, B., Lerasle, M., Reynaud-Bouret, P.: Kernels based tests with non-asymptotic bootstrap approaches for two-sample problems. In: Mannor, S., Srebro, N., Williamson R.C. (eds.) Proceedings of the 25th Annual Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 23, pp. 1–23 (2012)
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13(25), 723–773 (2012)
Houdré, C., Reynaud-Bouret, P.: Exponential inequalities, with constants, for U-statistics of order two. In: Stochastic Inequalities and Applications. Progress in Probability, vol. 56, pp. 55–69 (2003)
Hsu, D., Kakade, S., Zhang, T.: A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab. 17, 6 (2012)
Ingster, Y.I.: Minimax nonparametric detection of signals in white Gaussian noise. Probl. Inf. Transm. 18(2), 130–140 (1982)
Ingster, Y.I.: Asymptotically minimax hypothesis testing for nonparametric alternatives I-II-III. Math. Methods Stat. 2(2–4), 85–114, 171–189, 249–268 (1993)
Ingster, Y., Suslina, I.A.: Nonparametric goodness-of-fit testing under Gaussian models. In: Lecture Notes in Statistics, vol. 169. Springer (2012)
Ingster, Y.I., Suslina, I.A.: Minimax detection of a signal for Besov bodies and balls. Problems Inf. Transm. 34(1), 48–59 (1998)
Jirak, M., Wahl, M.: Perturbation Bounds for Eigenspaces Under a Relative Gap Condition (2018). arXiv: 1803.03868 [math.PR]
Kim, I., Balakrishnan, S., Wasserman, L.: Minimax Optimality of Permutation Tests (2020). arXiv: 2003.13208 [math.ST]
Koltchinskii, V., Lounici, K.: Concentration inequalities and moment bounds for sample covariance operators. Bernoulli 23(1), 110–133 (2017)
Lam-Weil, J., Carpentier, A., Sriperumbudur, B.K.: Local Minimax Rates for Closeness Testing of Discrete Distributions (2021). arXiv: 1902.01219 [math.ST]
Lepski, O.V., Spokoiny, V.G.: Minimax nonparametric hypothesis testing: the case of an inhomogeneous alternative. Bernoulli 5(2), 333–358 (1999)
Lugosi, G., Mendelson, S.: Mean estimation and regression under heavy-tailed distributions: a survey. Found. Comput. Math. 19(5), 1145–1190 (2019)
Marienwald, H., Fermanian, J.-B., Blanchard, G.: High-dimensional multi-task averaging and application to kernel mean embedding. In: AISTATS 2021 (2020). arXiv: 2011.06794 [stat.ML]
Massart, P.: Concentration Inequalities and Model Selection. Springer (2003)
Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B.: Kernel mean embedding of distributions: a review and beyond. Found. Trends Mach. Learn. 10(1–2), 1–141 (2017)
Munk, A., Czado, C.: Nonparametric validation of similar distributions and assessment of goodness of fit. J. R. Stat. Soc. Ser. B 60(1), 223–241 (1998)
Naumov, A., Spokoiny, V.G., Ulyanov, V.: Bootstrap confidence sets for spectral projectors of sample covariance. Probab. Theory Related Fields 174(3), 1091–1132 (2019)
Ostrovskii, D.M., Ndaoud, M., Javanmard, A., Razaviyayn, M.: Near-Optimal Model Discrimination with Non-Disclosure (2020). arXiv: 2012.02901 [math.ST]
Smola, A., Gretton, A., Song, L., Schölkopf, B.: A Hilbert space embedding for distributions. In: Proceedings of International Conference on Algorithmic Learning Theory (ALT 2007), pp. 13–31 (2007)
Spokoiny, V.G.: Adaptive hypothesis testing using wavelets. Ann. Stat. 24(6), 2477–2498 (1996)
Spokoiny, V.G.: Parametric estimation. Finite sample theory. Ann. Stat. 40(6), 2877–2909 (2012)
Spokoiny, V.G., Dickhaus, T.: Basics of Modern Mathematical Statistics. Springer Texts in Statistics. Springer (2015)
Spokoiny, V.G., Zhilova, M.: Sharp deviation bounds for quadratic forms. Math. Methods Stat. 22(2), 100–113 (2013)
Spokoiny, V.G., Zhilova, M.: Bootstrap confidence sets under model misspecification. Ann. Stat. 43(6), 2653–2675 (2015)
van Handel, R.: Structured random matrices. In: Convexity and Concentration, pp. 107–156. Springer (2017)
Vershynin, R.: High-Dimensional Probability: An Introduction with Applications to Data Science. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47. Cambridge University Press (2018)
Wellek, S.: Testing Statistical Hypotheses of Equivalence. Chapman and Hall/CRC (2002)
Acknowledgements
GB acknowledges support from: Deutsche Forschungsgemeinschaft (DFG)–SFB1294/1–318763901; Agence Nationale de la Recherche (ANR), ANR-19-CHIA-0021-01 “BiSCottE”; the Franco-German University (UFA) through the binational Doktorandenkolleg CDFA 01-18. Both authors are extremely grateful to the two reviewers and to the editor, whose very careful reading of the initial manuscript and various suggestions allowed us to improve its quality significantly.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Blanchard, G., Fermanian, JB. (2023). Nonasymptotic One- and Two-Sample Tests in High Dimension with Unknown Covariance Structure. In: Belomestny, D., Butucea, C., Mammen, E., Moulines, E., Reiß, M., Ulyanov, V.V. (eds) Foundations of Modern Statistics. FMS 2019. Springer Proceedings in Mathematics & Statistics, vol 425. Springer, Cham. https://doi.org/10.1007/978-3-031-30114-8_3
Print ISBN: 978-3-031-30113-1
Online ISBN: 978-3-031-30114-8
eBook Packages: Mathematics and Statistics (R0)