Abstract
Many model selection information criteria can be found in the literature, in contexts that include regression and density estimation. The literature on this subject is vast, and in this paper we content ourselves with citing only a few typical references to illustrate our presentation. Let us just mention the AIC, C_p or C_L, BIC, and MDL criteria proposed by Akaike (1973), Mallows (1973), Schwarz (1978), and Rissanen (1978), respectively. These methods select, from a given collection of parametric models, the model that minimizes an empirical loss (typically squared error or minus log-likelihood) plus a penalty term proportional to the dimension of the model. From one criterion to another, the penalty functions differ by factors of log n, where n is the number of observations.
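As an illustration of the kind of criterion described in the abstract, here is a minimal sketch, not taken from the chapter, of penalized least-squares selection among piecewise-constant regression models on [0, 1). It assumes Gaussian noise with known variance; the function names, the simulated data, and the candidate range of dimensions are all hypothetical choices made for the example. A penalty factor of 2 gives a Mallows-type (C_p/AIC) criterion, and a factor of log n gives a BIC-type criterion.

```python
import math
import random


def residual_sum_of_squares(x, y, n_bins):
    """RSS of the least-squares piecewise-constant fit on n_bins equal bins of [0, 1)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        b = min(int(xi * n_bins), n_bins - 1)  # bin index of xi
        sums[b] += yi
        counts[b] += 1
    # The least-squares fit on each bin is the mean of the responses in that bin.
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return sum(
        (yi - means[min(int(xi * n_bins), n_bins - 1)]) ** 2
        for xi, yi in zip(x, y)
    )


def select_dimension(x, y, dims, sigma2, factor):
    """Return the dimension D minimizing RSS(D)/n + factor * sigma2 * D / n."""
    n = len(y)

    def criterion(d):
        empirical_loss = residual_sum_of_squares(x, y, d) / n
        penalty = factor * sigma2 * d / n  # proportional to the model dimension
        return empirical_loss + penalty

    return min(dims, key=criterion)


# Simulated data (an assumption for the example): a sine curve plus Gaussian noise.
random.seed(0)
n = 400
sigma = 0.5
x = [random.random() for _ in range(n)]
y = [math.sin(2 * math.pi * xi) + random.gauss(0.0, sigma) for xi in x]

dims = range(1, 41)
d_cp = select_dimension(x, y, dims, sigma ** 2, 2.0)           # Mallows-type penalty
d_bic = select_dimension(x, y, dims, sigma ** 2, math.log(n))  # BIC-type penalty
print("Mallows-type choice:", d_cp, "BIC-type choice:", d_bic)
```

Because the two penalties differ only by the factor log n / 2, the BIC-type criterion can never select a larger dimension than the Mallows-type one on the same data.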
References
Akaike, H. (1973), Information theory and an extension of the maximum likelihood principle, in P. N. Petrov & F. Csaki, eds, ‘Proceedings 2nd International Symposium on Information Theory’, Akadémiai Kiadó, Budapest, pp. 267–281.
Barron, A. R. & Cover, T. M. (1991), ‘Minimum complexity density estimation’, IEEE Transactions on Information Theory 37, 1034–1054.
Barron, A. R., Birgé, L. & Massart, P. (1995), Model selection via penalization, Technical Report 95.54, Université Paris-Sud.
Birgé, L. & Massart, P. (1994), Minimum contrast estimation on sieves, Technical Report 94.34, Université Paris-Sud.
Cirel’son, B. S., Ibragimov, I. A. & Sudakov, V. N. (1976), Norm of Gaussian sample function, in ‘Proceedings of the 3rd Japan-USSR Symposium on Probability Theory’, Springer-Verlag, New York, pp. 20–41. Springer Lecture Notes in Mathematics 550.
DeVore, R. A. & Lorentz, G. G. (1993), Constructive Approximation, Springer-Verlag, Berlin.
Donoho, D. L. & Johnstone, I. M. (1994), ‘Ideal spatial adaptation by wavelet shrinkage’, Biometrika 81, 425–455.
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. & Picard, D. (1993), Density estimation by wavelet thresholding, Technical Report 426, Department of Statistics, Stanford University.
Efroimovich, S. Y. (1985), ‘Nonparametric estimation of a density of unknown smoothness’, Theory of Probability and Its Applications 30, 557–568.
Grenander, U. (1981), Abstract Inference, Wiley, New York.
Kerkyacharian, G. & Picard, D. (1992), ‘Estimation de densité par méthode de noyau et d’ondelettes: les liens entre la géométrie du noyau et les contraintes de régularité’, Comptes Rendus de l’Académie des Sciences, Paris, Ser. I Math 315, 79–84.
Kerkyacharian, G., Picard, D. & Tribouley, K. (1994), L^p adaptive density estimation, Technical report, Université Paris VII.
Le Cam, L. (1973), ‘Convergence of estimates under dimensionality restrictions’, Annals of Statistics 19, 633–667.
Le Cam, L. (1986), Asymptotic Methods in Statistical Decision Theory, Springer-Verlag, New York.
Ledoux, M. (1995), Private communication.
Li, K. C. (1987), ‘Asymptotic optimality for C_p, C_L, cross-validation, and generalized cross-validation: Discrete index set’, Annals of Statistics 15, 958–975.
Mallows, C. L. (1973), ‘Some comments on C_p’, Technometrics 15, 661–675.
Mason, D. M. & van Zwet, W. R. (1987), ‘A refinement of the KMT inequality for the uniform empirical process’, Annals of Probability 15, 871–884.
Meyer, Y. (1990), Ondelettes et Opérateurs I, Hermann, Paris.
Polyak, B. T. & Tsybakov, A. B. (1990), ‘Asymptotic optimality of the C_p-criteria in regression projective estimation’, Theory of Probability and Its Applications 35, 293–306.
Rissanen, J. (1978), ‘Modeling by shortest data description’, Automatica 14, 465–471.
Rudemo, M. (1982), ‘Empirical choice of histograms and kernel density estimators’, Scandinavian Journal of Statistics 9, 65–78.
Schwarz, G. (1978), ‘Estimating the dimension of a model’, Annals of Statistics 6, 461–464.
Talagrand, M. (1994), ‘Sharper bounds for Gaussian and empirical processes’, Annals of Probability 22, 28–76.
Talagrand, M. (1995), New concentration inequalities in product spaces, Technical report, Ohio State University.
Vapnik, V. (1982), Estimation of Dependences Based on Empirical Data, Springer-Verlag, New York.
Wahba, G. (1990), Spline Models for Observational Data, Society for Industrial and Applied Mathematics, Philadelphia.
Whittaker, E. T. & Watson, G. N. (1927), A Course of Modern Analysis, Cambridge University Press, London.
Copyright information
© 1997 Springer Science+Business Media New York
About this chapter
Cite this chapter
Birgé, L., Massart, P. (1997). From Model Selection to Adaptive Estimation. In: Pollard, D., Torgersen, E., Yang, G.L. (eds) Festschrift for Lucien Le Cam. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1880-7_4
DOI: https://doi.org/10.1007/978-1-4612-1880-7_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7323-3
Online ISBN: 978-1-4612-1880-7
eBook Packages: Springer Book Archive