Machine Learning, Volume 48, Issue 1–3, pp 9–23

Model Selection for Small Sample Regression

  • Olivier Chapelle
  • Vladimir Vapnik
  • Yoshua Bengio


Model selection is an important ingredient of many machine learning algorithms, in particular when the sample size is small, because it must strike the right trade-off between overfitting and underfitting. Classical results for linear regression are based on an asymptotic analysis. We present a new penalization method for model selection in regression that is appropriate even for small samples. Our penalization is based on an accurate estimator of the ratio of the expected training error to the expected generalization error, expressed in terms of the expected eigenvalues of the input covariance matrix.
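For context, the kind of classical, asymptotically motivated penalization that the abstract contrasts itself with can be sketched as follows. This is not the penalization proposed in the paper, whose formula is not reproduced on this page. The sketch multiplies the training error by Akaike's final prediction error factor (1 + d/n) / (1 - d/n), where d is the number of free parameters and n the sample size. The function names `fpe_penalty` and `select_degree`, the polynomial model family, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def fpe_penalty(d, n):
    """Akaike's final prediction error factor (1 + d/n) / (1 - d/n).

    d: number of free parameters, n: number of training points.
    The factor is undefined for d >= n, so infinity is returned there.
    """
    if d >= n:
        return float("inf")
    return (1.0 + d / n) / (1.0 - d / n)


def select_degree(x, y, max_degree):
    """Pick the polynomial degree with the smallest penalized training error.

    Hypothetical helper for illustration: fits ordinary least squares for each
    degree and scores it as fpe_penalty(d, n) * training mean squared error.
    """
    n = len(x)
    scores = {}
    for degree in range(max_degree + 1):
        d = degree + 1  # a degree-k polynomial has k + 1 coefficients
        X = np.vander(x, d, increasing=True)  # columns 1, x, ..., x^degree
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        train_mse = float(np.mean((X @ coef - y) ** 2))
        scores[degree] = fpe_penalty(d, n) * train_mse
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic small-sample problem (assumed for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=20)

best, scores = select_degree(x, y, max_degree=8)
print("selected degree:", best)
```

With only 20 points the factor grows quickly as d approaches n, which is the small-sample regime where an asymptotic justification is weakest.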

Keywords: model selection, parametric regression, uniform convergence bounds



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Olivier Chapelle (1)
  • Vladimir Vapnik (2)
  • Yoshua Bengio (3)

  1. LIP6, Paris, France
  2. AT&T Research Labs, Middletown, USA
  3. Dept. IRO, CP 6128, Université de Montréal, Succ. Centre-Ville, Montréal, Canada
