Foundations of Statistical Learning and Model Selection

  • Francesco Camastra
  • Alessandro Vinciarelli
Part of the Advanced Information and Knowledge Processing book series (AI&KP)


What the reader should know to understand this chapter
  • Basic notions of machine learning.
  • Notions of calculus.
  • Chapter 5.



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Department of Science and Technology, Parthenope University of Naples, Naples, Italy
  2. School of Computing Science and the Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK