Minimal Penalties for Gaussian Model Selection

  • Lucien Birgé
  • Pascal Massart

Probability Theory and Related Fields, volume 138, pages 33–73 (2007)


Abstract

This paper is mainly devoted to a precise analysis of what kind of penalties should be used in order to perform model selection via the minimization of a penalized least-squares-type criterion within some general Gaussian framework including the classical ones. As compared to our previous paper on this topic (Birgé and Massart in J. Eur. Math. Soc. 3, 203–268 (2001)), more elaborate forms of the penalties are given, which are shown to be, in some sense, optimal. We indeed provide more precise upper bounds for the risk of the penalized estimators and lower bounds for the penalty terms, showing that the use of smaller penalties may lead to disastrous results. These lower bounds may also be used to design a practical strategy that allows one to estimate the penalty from the data when the amount of noise is unknown. We provide an illustration of the method for the problem of estimating a piecewise constant signal in Gaussian noise when neither the number nor the locations of the change points are known.
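To make the data-driven penalty strategy concrete, here is a minimal numerical sketch (not the authors' code) of the change-point illustration: for each candidate number of segments D, the best least-squares segmentation is computed by dynamic programming; the unknown noise level is then calibrated from the slope of the empirical criterion over high-dimensional models, where a below-minimal penalty would overfit, and the final penalty is taken as twice that estimated minimal level. The penalty shape D(1 + log(n/D)), the factor of two, and all function names are illustrative assumptions, not specifics taken from the article.

```python
# Minimal sketch: penalized least-squares change-point detection with a
# data-driven penalty. The penalty shape D * (1 + log(n / D)) and the
# "twice the estimated minimal level" rule are illustrative assumptions.
import numpy as np


def best_sse_per_dimension(y, D_max):
    """For each D = 1..D_max, the smallest residual sum of squares over
    all partitions of y into D constant segments (dynamic programming)."""
    n = len(y)
    c = np.concatenate(([0.0], np.cumsum(y)))        # prefix sums
    c2 = np.concatenate(([0.0], np.cumsum(y ** 2)))  # prefix sums of squares

    def seg_cost(i, j):
        # SSE of the best constant fit on y[i:j]
        s = c[j] - c[i]
        return (c2[j] - c2[i]) - s * s / (j - i)

    # cost[D-1, j] = best SSE for y[:j] split into D segments
    cost = np.full((D_max, n + 1), np.inf)
    for j in range(1, n + 1):
        cost[0, j] = seg_cost(0, j)
    for D in range(1, D_max):
        for j in range(D + 1, n + 1):
            cost[D, j] = min(cost[D - 1, i] + seg_cost(i, j)
                             for i in range(D, j))
    return cost[:, n]  # entry D-1 corresponds to D segments


def select_dimension(y, D_max):
    """Pick D minimizing SSE(D) + pen(D), with pen calibrated from data:
    over high-dimensional (overfitting) models the criterion decreases
    roughly linearly in the penalty shape, with slope about -sigma^2;
    the final penalty is twice that estimated minimal level."""
    n = len(y)
    sse = best_sse_per_dimension(y, D_max)
    dims = np.arange(1, D_max + 1)
    shape = dims * (1.0 + np.log(n / dims))   # assumed penalty shape
    big = dims > D_max // 2                   # fit the slope on large models
    sigma2_hat = max(-np.polyfit(shape[big], sse[big], 1)[0], 0.0)
    pen = 2.0 * sigma2_hat * shape
    return dims[np.argmin(sse + pen)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.repeat([0.0, 2.0, -1.0, 1.5], 50)  # 4 true segments, n = 200
    y = truth + rng.normal(scale=0.7, size=truth.size)
    print("selected number of segments:", select_dimension(y, D_max=20))
```

On this toy signal with four true segments, the sketch typically selects about four; replacing the factor 2.0 by a constant at or below 1.0 illustrates the overfitting behavior that the paper's lower bounds warn against.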


References

  1. Abramovich F., Benjamini Y., Donoho D.L., Johnstone I.M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34

  2. Akaike H. (1969). Statistical predictor identification. Ann. Inst. Statist. Math. 22:203–217

  3. Akaike H. (1973). Information theory and an extension of the maximum likelihood principle. In: Petrov B.N., Csaki F. (eds) Proceedings of the 2nd International Symposium on Information Theory. Akadémiai Kiadó, Budapest, pp. 267–281

  4. Akaike H. (1974). A new look at the statistical model identification. IEEE Trans. Autom. Control 19:716–723

  5. Akaike H. (1978). A Bayesian analysis of the minimum AIC procedure. Ann. Inst. Statist. Math. 30, Part A:9–14

  6. Amemiya T. (1985). Advanced Econometrics. Basil Blackwell, Oxford

  7. Barron A.R., Birgé L., Massart P. (1999). Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113:301–415

  8. Barron A.R., Cover T.M. (1991). Minimum complexity density estimation. IEEE Trans. Inf. Theory 37:1034–1054

  9. Birgé L. (2001). An alternative point of view on Lepski's method. In: de Gunst M.C.M., Klaassen C.A.J., van der Vaart A.W. (eds) State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet. IMS Lecture Notes–Monograph Series, Vol. 36, pp. 113–133

  10. Birgé L., Massart P. (1998). Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4:329–375

  11. Birgé L., Massart P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3:203–268

  12. Birgé L., Massart P. (2001). A generalized C_p criterion for Gaussian model selection. Technical Report No. 647, Laboratoire de Probabilités, Université Paris VI. http://www.proba.jussieu.fr/mathdoc/preprints/index.html#2001

  13. Daniel C., Wood F.S. (1971). Fitting Equations to Data. Wiley, New York

  14. Draper N.R., Smith H. (1981). Applied Regression Analysis, 2nd edn. Wiley, New York

  15. Efron B., Hastie T., Johnstone I.M., Tibshirani R. (2004). Least angle regression. Ann. Statist. 32:407–499

  16. Feller W. (1968). An Introduction to Probability Theory and its Applications, Vol. I, 3rd edn. Wiley, New York

  17. George E.I., Foster D.P. (2000). Calibration and empirical Bayes variable selection. Biometrika 87:731–747

  18. Gey S., Nédélec E. (2005). Model selection for CART regression trees. IEEE Trans. Inf. Theory 51:658–670

  19. Guyon X., Yao J.F. (1999). On the underfitting and overfitting sets of models chosen by order selection criteria. J. Multivar. Anal. 70:221–249

  20. Hannan E.J., Quinn B.G. (1979). The determination of the order of an autoregression. J. Roy. Statist. Soc. B 41:190–195

  21. Hoeffding W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58:13–30

  22. Hurvich C.M., Tsai C.-L. (1989). Regression and time series model selection in small samples. Biometrika 76:297–307

  23. Johnstone I.M. (2001). Chi-square oracle inequalities. In: de Gunst M.C.M., Klaassen C.A.J., van der Vaart A.W. (eds) State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet. IMS Lecture Notes–Monograph Series, Vol. 36, pp. 399–418

  24. Kneip A. (1994). Ordered linear smoothers. Ann. Statist. 22:835–866

  25. Lavielle M., Moulines E. (2000). Least squares estimation of an unknown number of shifts in a time series. J. Time Series Anal. 21:33–59

  26. Lebarbier E. (2005). Detecting multiple change-points in the mean of a Gaussian process by model selection. Signal Process. 85:717–736

  27. Li K.C. (1987). Asymptotic optimality for C_p, C_L, cross-validation, and generalized cross-validation: discrete index set. Ann. Statist. 15:958–975

  28. Loubes J.-M., Massart P. (2004). Discussion of “Least angle regression” by Efron, Hastie, Johnstone and Tibshirani. Ann. Statist. 32:460–465

  29. Mallows C.L. (1973). Some comments on C_p. Technometrics 15:661–675

  30. Massart P. (1990). The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. Ann. Probab. 18:1269–1283

  31. McQuarrie A.D.R., Tsai C.-L. (1998). Regression and Time Series Model Selection. World Scientific, Singapore

  32. Mitchell T.J., Beauchamp J.J. (1988). Bayesian variable selection in linear regression. J. Amer. Statist. Assoc. 83:1023–1032

  33. Polyak B.T., Tsybakov A.B. (1990). Asymptotic optimality of the C_p-test for the orthogonal series estimation of regression. Theory Probab. Appl. 35:293–306

  34. Rissanen J. (1978). Modeling by shortest data description. Automatica 14:465–471

  35. Schwarz G. (1978). Estimating the dimension of a model. Ann. Statist. 6:461–464

  36. Shen X., Ye J. (2002). Adaptive model selection. J. Amer. Statist. Assoc. 97:210–221

  37. Shibata R. (1981). An optimal selection of regression variables. Biometrika 68:45–54

  38. Wallace D.L. (1959). Bounds on normal approximations to Student's and the chi-square distributions. Ann. Math. Statist. 30:1121–1130

  39. Whittaker E.T., Watson G.N. (1927). A Course of Modern Analysis. Cambridge University Press, London

  40. Yang Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92:937–950

  41. Yao Y.C. (1988). Estimating the number of change points via Schwarz criterion. Stat. Probab. Lett. 6:181–189


Author information

Authors and Affiliations

  1. UMR 7599 “Probabilités et modèles aléatoires”, Laboratoire de Probabilités, boîte 188, Université Paris VI, 4 Place Jussieu, 75252, Paris Cedex 05, France

    Lucien Birgé

  2. UMR 8628 “Laboratoire de Mathématiques”, Bât. 425, Université Paris Sud, Campus d’Orsay, 91405, Orsay Cedex, France

    Pascal Massart


Corresponding author

Correspondence to Pascal Massart.


About this article

Cite this article

Birgé, L., Massart, P. Minimal Penalties for Gaussian Model Selection. Probab. Theory Relat. Fields 138, 33–73 (2007). https://doi.org/10.1007/s00440-006-0011-8


  • Received: 11 July 2004

  • Revised: 24 March 2006

  • Published: 04 July 2006

  • Issue Date: May 2007

  • DOI: https://doi.org/10.1007/s00440-006-0011-8


Keywords

  • Gaussian linear regression
  • Variable selection
  • Model selection
  • Mallows’ C_p
  • Penalized least-squares

Mathematics Subject Classification (2000)

  • Primary 62G05
  • Secondary 62G07
  • 62J05