A high-dimensional Wilks phenomenon
  • Published: 11 March 2010

  • Stéphane Boucheron (1) &
  • Pascal Massart (2)

Probability Theory and Related Fields, volume 150, pages 405–433 (2011)

Abstract

A theorem by Wilks asserts that in smooth parametric density estimation the difference between the maximum likelihood and the likelihood of the sampling distribution converges toward a Chi-square distribution where the number of degrees of freedom coincides with the model dimension. This observation is at the core of some goodness-of-fit testing procedures and of some classical model selection methods. This paper describes a non-asymptotic version of the Wilks phenomenon in bounded contrast optimization procedures. Using concentration inequalities for general functions of independent random variables, it proves that in bounded contrast minimization (as for example in Statistical Learning Theory), the difference between the empirical risk of the minimizer of the true risk in the model and the minimum of the empirical risk (the excess empirical risk) satisfies a Bernstein-like inequality where the variance term reflects the dimension of the model and the scale term reflects the noise conditions. From a mathematical statistics viewpoint, the significance of this result comes from the recent observation that when using model selection via penalization, the excess empirical risk represents a minimum penalty if non-asymptotic guarantees concerning prediction error are to be provided. From the perspective of empirical process theory, this paper describes a concentration inequality for the supremum of a bounded non-centered (actually non-positive) empirical process. Combining the now classical analysis of M-estimation (building on Talagrand’s inequality for suprema of empirical processes) and versatile moment inequalities for functions of independent random variables, this paper develops a genuine Bernstein-like inequality that seems beyond the reach of traditional tools.
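
As a rough illustration of the classical (asymptotic) Wilks phenomenon mentioned at the start of the abstract, the sketch below simulates a Gaussian location model with known identity covariance. This model, the function names, and the parameter values are assumptions chosen for illustration; they are not taken from the paper, whose results concern bounded contrast minimization. In this model, twice the gap between the maximized log-likelihood and the log-likelihood at the true mean equals n times the squared distance between the sample mean and the true mean, which follows a chi-square distribution with d degrees of freedom.

```python
import random
import statistics


def excess_log_likelihood(n, d, rng):
    """Twice the log-likelihood gap between the MLE and the true mean
    for n i.i.d. draws from N(0, I_d) with known identity covariance."""
    # The MLE of the mean is the sample average, and for this model
    # 2 * (loglik(mle) - loglik(truth)) = n * ||sample_mean - truth||^2.
    total = 0.0
    for _ in range(d):
        coord_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        total += n * coord_mean ** 2
    return total


def simulate(n=50, d=5, trials=2000, seed=0):
    """Draw `trials` independent copies of the statistic (hypothetical helper)."""
    rng = random.Random(seed)
    return [excess_log_likelihood(n, d, rng) for _ in range(trials)]


stats = simulate()
mean_stat = statistics.mean(stats)
var_stat = statistics.variance(stats)

# A chi-square variable with d degrees of freedom has mean d and variance 2d.
print(f"empirical mean     = {mean_stat:.2f} (chi-square mean with d=5 is 5)")
print(f"empirical variance = {var_stat:.2f} (chi-square variance with d=5 is 10)")
```

The empirical mean tracks the model dimension d rather than the sample size n, which is the dependence that the variance term of the paper's Bernstein-like inequality is said to reflect in the non-asymptotic setting.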

Author information

Authors and Affiliations

  1. Laboratoire Probabilités et Modèles Aléatoires, Université Paris-Diderot, 175 rue du Chevaleret, 75013, Paris, France

    Stéphane Boucheron

  2. Département de Mathématiques, Université Paris-Sud, 91405, Orsay, France

    Pascal Massart

Corresponding author

Correspondence to Pascal Massart.

Additional information

This work was supported by ANR Grant TAMIS and Network of Excellence PASCAL II.

About this article

Cite this article

Boucheron, S., Massart, P. A high-dimensional Wilks phenomenon. Probab. Theory Relat. Fields 150, 405–433 (2011). https://doi.org/10.1007/s00440-010-0278-7

  • Received: 28 March 2009

  • Revised: 09 February 2010

  • Published: 11 March 2010

  • Issue Date: August 2011

  • DOI: https://doi.org/10.1007/s00440-010-0278-7

Keywords

  • Wilks phenomenon
  • Risk estimates
  • Suprema of empirical processes
  • Concentration inequalities
  • Statistical learning

Mathematics Subject Classification (2000)

  • 60E15
  • 62G08
  • 62H30