
Nonasymptotic One- and Two-Sample Tests in High Dimension with Unknown Covariance Structure

  • Conference paper
  • In: Foundations of Modern Statistics (FMS 2019)

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 425)


Abstract

Let \({\mathbb {X}}= (X_i)_{1\le i \le n}\) be an i.i.d. sample of square-integrable variables in \(\mathbb {R}^d\), with common expectation \(\mu \) and covariance matrix \(\varSigma \), both unknown. We consider the problem of testing if \(\mu \) is \(\eta \)-close to zero, i.e. \(\Vert \mu \Vert \le \eta \), against \(\Vert \mu \Vert \ge (\eta + \delta )\); we also tackle the more general two-sample mean closeness (also known as relevant difference) testing problem. The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance \(\delta \) such that we can control both the Type I and Type II errors at a given level. The main technical tools are concentration inequalities, first for a suitable estimator of \(\big \Vert \mu \big \Vert ^2\) used as a test statistic, and second for estimating the operator and Frobenius norms of \(\varSigma \) coming into the quantiles of said test statistic. These properties are obtained for Gaussian and bounded distributions. Particular attention is given to the dependence on the pseudo-dimension \(d_*\) of the distribution, defined as \(d_* := \big \Vert \varSigma \big \Vert _2^2/\big \Vert \varSigma \big \Vert _\infty ^2\). In particular, for \(\eta =0\), the minimum separation distance is \({\varTheta }( d_*^{\nicefrac {1}{4}}\sqrt{\big \Vert \varSigma \big \Vert _\infty /n})\), in contrast with the minimax estimation distance for \(\mu \), which is \({\varTheta }(d_e^{\nicefrac {1}{2}}\sqrt{\big \Vert \varSigma \big \Vert _\infty /n})\) (where \(d_e:=\big \Vert \varSigma \big \Vert _1/\big \Vert \varSigma \big \Vert _\infty \)). This generalizes a phenomenon spelled out in particular by Baraud [3].
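As a rough numerical sketch of the kind of test statistic discussed above: one classical unbiased estimator of \(\Vert \mu \Vert ^2\) is the U-statistic \(\frac{1}{n(n-1)}\sum _{i\ne j}\langle X_i, X_j\rangle \) (the paper's precise statistic and the calibration of its quantiles via the norms of \(\varSigma \) are developed in the text; the code below is only an illustration of this U-statistic, not the authors' exact procedure).

```python
import numpy as np

def squared_mean_norm_estimate(X):
    """Unbiased estimate of ||mu||^2 from an (n, d) sample, via the
    U-statistic (1/(n(n-1))) * sum over ordered pairs i != j of <X_i, X_j>."""
    n = X.shape[0]
    s = X.sum(axis=0)
    # sum over ordered pairs i != j equals ||sum_i X_i||^2 - sum_i ||X_i||^2
    cross = s @ s - np.einsum("ij,ij->", X, X)
    return cross / (n * (n - 1))

# Usage: under mu = 0 the estimate fluctuates around 0 (it is unbiased),
# which is what makes a threshold test on this quantity possible.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))   # n = 500 samples in dimension d = 50
est = squared_mean_norm_estimate(X)
```

Excluding the diagonal terms \(i = j\) is what removes the bias: including them would add \(\frac{1}{n}\mathbb {E}\Vert X_1\Vert ^2\), which grows with the dimension.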


Notes

  1. With the notation \(\big \Vert \varSigma \big \Vert _p\) we mean the p-Schatten norm. Throughout the paper we freely use the equivalent notations \(\big \Vert \varSigma \big \Vert _\infty = \big \Vert \varSigma \big \Vert _{\textrm{op}}\), \(\big \Vert \varSigma \big \Vert _1 = \mathop {\textrm{Tr}}(\varSigma )\), and \(\big \Vert \varSigma \big \Vert _2^2 = \mathop {\textrm{Tr}}(\varSigma ^2)\).
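As a concrete illustration of this notation (the covariance matrix below is an arbitrary toy example, not taken from the paper), the three Schatten norms and the dimensions \(d_*\) and \(d_e\) from the abstract can all be computed from the eigenvalues:

```python
import numpy as np

# Toy covariance matrix, chosen only to illustrate the notation.
Sigma = np.diag([4.0, 1.0, 1.0, 1.0])
eigvals = np.linalg.eigvalsh(Sigma)   # eigenvalues of a symmetric matrix

op_norm = eigvals.max()               # ||Sigma||_inf: operator norm (top eigenvalue)
trace = eigvals.sum()                 # ||Sigma||_1 = Tr(Sigma)
frob_sq = np.sum(eigvals ** 2)        # ||Sigma||_2^2 = Tr(Sigma^2)

d_star = frob_sq / op_norm ** 2       # pseudo-dimension d_* from the abstract
d_e = trace / op_norm                 # effective dimension d_e from the abstract
```

For this example \(d_* = 19/16\) while \(d_e = 7/4\); in general \(d_* \le d_e\), which is why the testing rate \(d_*^{\nicefrac {1}{4}}\) can beat the estimation rate \(d_e^{\nicefrac {1}{2}}\).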

  2. That is, the real random variable \(\big \Vert \varPhi (Z)\big \Vert \) is integrable, which guarantees that the integral of \(\varPhi (Z)\) is well-defined in a strong sense as an element of the Hilbert space; see e.g. Cohn [9].

  3. We are happy to report that at the time of publication of this article, it appears that very recent work of V. Spokoiny has precisely addressed this topic: see “Sharp deviation bounds and concentration phenomenon for the squared norm of a sub-gaussian vector”, V. Spokoiny, arXiv:2305.07885, 2023.

  4. In the original result the deviation term in \(u\) involves an additional constant D, and we simply use \(D\le C\) here.

References

  1. Anderson, T.: An Introduction to Multivariate Statistical Analysis, 3rd edn. Wiley Series in Probability and Mathematical Statistics. Wiley (2003)

  2. Balasubramanian, K., Li, T., Yuan, M.: On the optimality of kernel embedding based goodness-of-fit tests. J. Mach. Learn. Res. 22(1), 1–45 (2021)

  3. Baraud, Y.: Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8(5), 577–606 (2002)

  4. Berger, J.O., Delampady, M.: Testing precise hypotheses. Stat. Sci. 2(3), 317–335 (1987)

  5. Birgé, L.: An alternative point of view on Lepski’s method. In: State of the Art in Probability and Statistics (Leiden, 1999). IMS Lecture Notes–Monograph Series, vol. 36, pp. 113–133. Institute of Mathematical Statistics (2001)

  6. Blanchard, G., Carpentier, A., Gutzeit, M.: Minimax Euclidean separation rates for testing convex hypotheses in \(\mathbb{R} ^{d}\). Electron. J. Stat. 12(2), 3713–3735 (2018)

  7. Bousquet, O.: A Bennett concentration inequality and its application to suprema of empirical processes. Comptes Rendus Mathématiques de l’Académie des Sciences 334(6), 495–500 (2002)

  8. Chwialkowski, K., Strathmann, H., Gretton, A.: A kernel test of goodness of fit. In: Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), vol. 48, pp. 2606–2615 (2016)

  9. Cohn, D.L.: Measure Theory. Birkhäuser, Boston (1980)

  10. Dette, H., Kokot, K., Aue, A.: Functional data analysis in the Banach space of continuous functions. Ann. Stat. 48(2), 1168–1192 (2020)

  11. Dette, H., Kokot, K., Volgushev, S.: Testing relevant hypotheses in functional time series via self-normalization. J. R. Stat. Soc. Ser. B 82(3), 629–660 (2020)

  12. Dette, H., Munk, A.: Nonparametric comparison of several regression functions: exact and asymptotic theory. Ann. Stat. 26(6), 2339–2368 (1998)

  13. Ermakov, M.S.: Minimax detection of a signal in a Gaussian white noise. Theory Probab. Appl. 35(4), 667–679 (1991)

  14. Fromont, M., Laurent, B., Lerasle, M., Reynaud-Bouret, P.: Kernels based tests with non-asymptotic bootstrap approaches for two-sample problems. In: Mannor, S., Srebro, N., Williamson, R.C. (eds.) Proceedings of the 25th Annual Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 23, pp. 1–23 (2012)

  15. Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13(25), 723–773 (2012)

  16. Houdré, C., Reynaud-Bouret, P.: Exponential inequalities, with constants, for U-statistics of order two. In: Stochastic Inequalities and Applications. Progress in Probability, vol. 56, pp. 55–69 (2003)

  17. Hsu, D., Kakade, S., Zhang, T.: A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab. 17, 6 (2012)

  18. Ingster, Y.I.: Minimax nonparametric detection of signals in white Gaussian noise. Probl. Inf. Transm. 18(2), 130–140 (1982)

  19. Ingster, Y.I.: Asymptotically minimax hypothesis testing for nonparametric alternatives I–II–III. Math. Methods Stat. 2(2–4), 85–114, 171–189, 249–268 (1993)

  20. Ingster, Y.I., Suslina, I.A.: Nonparametric Goodness-of-Fit Testing Under Gaussian Models. Lecture Notes in Statistics, vol. 169. Springer (2012)

  21. Ingster, Y.I., Suslina, I.A.: Minimax detection of a signal for Besov bodies and balls. Probl. Inf. Transm. 34(1), 48–59 (1998)

  22. Jirak, M., Wahl, M.: Perturbation bounds for eigenspaces under a relative gap condition (2018). arXiv:1803.03868 [math.PR]

  23. Kim, I., Balakrishnan, S., Wasserman, L.: Minimax optimality of permutation tests (2020). arXiv:2003.13208 [math.ST]

  24. Koltchinskii, V., Lounici, K.: Concentration inequalities and moment bounds for sample covariance operators. Bernoulli 23(1), 110–133 (2017)

  25. Lam-Weil, J., Carpentier, A., Sriperumbudur, B.K.: Local minimax rates for closeness testing of discrete distributions (2021). arXiv:1902.01219 [math.ST]

  26. Lepski, O.V., Spokoiny, V.G.: Minimax nonparametric hypothesis testing: the case of an inhomogeneous alternative. Bernoulli 5(2), 333–358 (1999)

  27. Lugosi, G., Mendelson, S.: Mean estimation and regression under heavy-tailed distributions: a survey. Found. Comput. Math. 19(5), 1145–1190 (2019)

  28. Marienwald, H., Fermanian, J.-B., Blanchard, G.: High-dimensional multi-task averaging and application to kernel mean embedding. In: AISTATS 2021 (2020). arXiv:2011.06794 [stat.ML]

  29. Massart, P.: Concentration Inequalities and Model Selection. Springer (2003)

  30. Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B.: Kernel mean embedding of distributions: a review and beyond. Found. Trends Mach. Learn. 10(1–2), 1–141 (2017)

  31. Munk, A., Czado, C.: Nonparametric validation of similar distributions and assessment of goodness of fit. J. R. Stat. Soc. Ser. B 60(1), 223–241 (1998)

  32. Naumov, A., Spokoiny, V.G., Ulyanov, V.: Bootstrap confidence sets for spectral projectors of sample covariance. Probab. Theory Relat. Fields 174(3), 1091–1132 (2019)

  33. Ostrovskii, D.M., Ndaoud, M., Javanmard, A., Razaviyayn, M.: Near-optimal model discrimination with non-disclosure (2020). arXiv:2012.02901 [math.ST]

  34. Smola, A., Gretton, A., Song, L., Schölkopf, B.: A Hilbert space embedding for distributions. In: Proceedings of the International Conference on Algorithmic Learning Theory (ALT 2007), pp. 13–31 (2007)

  35. Spokoiny, V.G.: Adaptive hypothesis testing using wavelets. Ann. Stat. 24(6), 2477–2498 (1996)

  36. Spokoiny, V.G.: Parametric estimation. Finite sample theory. Ann. Stat. 40(6), 2877–2909 (2012)

  37. Spokoiny, V.G., Dickhaus, T.: Basics of Modern Mathematical Statistics. Springer Texts in Statistics. Springer (2015)

  38. Spokoiny, V.G., Zhilova, M.: Sharp deviation bounds for quadratic forms. Math. Methods Stat. 22(2), 100–113 (2013)

  39. Spokoiny, V.G., Zhilova, M.: Bootstrap confidence sets under model misspecification. Ann. Stat. 43(6), 2653–2675 (2015)

  40. van Handel, R.: Structured random matrices. In: Convexity and Concentration, pp. 107–156. Springer (2017)

  41. Vershynin, R.: High-Dimensional Probability: An Introduction with Applications to Data Science. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47. Cambridge University Press (2018)

  42. Wellek, S.: Testing Statistical Hypotheses of Equivalence. Chapman and Hall/CRC (2002)


Acknowledgements

GB acknowledges support from: Deutsche Forschungsgemeinschaft (DFG)–SFB1294/1–318763901; Agence Nationale de la Recherche (ANR), ANR-19-CHIA-0021-01 “BiSCottE”; the Franco-German University (UFA) through the binational Doktorandenkolleg CDFA 01-18. Both authors are extremely grateful to the two reviewers and to the editor, whose very careful reading of the initial manuscript and various suggestions allowed us to improve its quality significantly.

Author information

Corresponding author: Gilles Blanchard.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Blanchard, G., Fermanian, J.-B. (2023). Nonasymptotic One- and Two-Sample Tests in High Dimension with Unknown Covariance Structure. In: Belomestny, D., Butucea, C., Mammen, E., Moulines, E., Reiß, M., Ulyanov, V.V. (eds) Foundations of Modern Statistics. FMS 2019. Springer Proceedings in Mathematics & Statistics, vol 425. Springer, Cham. https://doi.org/10.1007/978-3-031-30114-8_3
