Statistical Learning Theory

Abstract

This chapter begins by describing the concepts and assumptions required to ensure supervised learning. It then details the Empirical Risk Minimization (ERM) principle, the cornerstone of Statistical Learning Theory (SLT). The ERM principle provides upper bounds ensuring that the empirical risk is a good estimator of the expected risk, given the bias of a learning algorithm. This bound is the main theoretical tool for providing learning guarantees for classification tasks. Afterwards, other useful tools and concepts are introduced.
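
To make the role of these bounds concrete, one classical form (a sketch in standard SLT notation following Vapnik; the exact constants vary across formulations and are not reproduced from this chapter) states that, with probability at least $1 - \delta$ over an i.i.d. training sample of size $n$, uniformly over every $f$ in the function class $\mathcal{F}$,

$$R(f) \;\le\; R_{\mathrm{emp}}(f) + \sqrt{\frac{4}{n}\left(\log\big(2\,\mathcal{N}(\mathcal{F}, 2n)\big) + \log\frac{1}{\delta}\right)},$$

where $R(f)$ is the expected risk, $R_{\mathrm{emp}}(f)$ the empirical risk, and $\mathcal{N}(\mathcal{F}, 2n)$ the shattering coefficient of $\mathcal{F}$ on $2n$ points. The bound is informative precisely when $\log \mathcal{N}(\mathcal{F}, 2n)$ grows sub-linearly in $n$, so that the square-root term vanishes as the sample grows.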


Notes

  1. It is worth mentioning that elements in X and Y may even lie in another space, such as a topological space, but some mapping is assumed to bring them into the Hilbert space in order to respect the definition.

  2. Remember that R(f) represents how likely it is for a randomly selected sample to be misclassified by f.

  3. This space contains every possible function, so any problem can be tackled.

  4. In the case of XOR, the dataset has a finite number of possibilities. Thus, assuming all of them were provided, overfitting is not verifiable in practice, because memorization is indeed enough when new examples are equal to the ones already in the training set.

  5. As a binary classifier is considered, this is a power of two; see the sketch after this list.

  6. Remember that the number of instances affects the number of distinct admissible functions.
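
Notes 5 and 6 can be made concrete with a small numerical sketch (a hypothetical example, not from the chapter): for n points there are 2^n possible binary labelings, but a restricted hypothesis class typically realizes far fewer, and that count, the shattering coefficient, depends on n. The sketch below uses 1-D threshold classifiers f(x) = sign(x - t), in both orientations, as a stand-in class:

    # A minimal sketch (hypothetical example, not from the chapter):
    # count the distinct labelings that 1-D threshold classifiers
    # f(x) = sign(x - t), in both orientations, realize on n points,
    # and compare against the 2^n labelings a binary target allows.

    def threshold_labelings(points):
        xs = sorted(points)
        # Thresholds strictly below, between, and above the points
        # suffice to realize every achievable labeling.
        thresholds = [xs[0] - 1.0]
        thresholds += [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
        thresholds += [xs[-1] + 1.0]
        labelings = set()
        for t in thresholds:
            for sign in (+1, -1):  # both orientations of the threshold
                labelings.add(tuple(sign * (1 if x > t else -1)
                                    for x in points))
        return labelings

    for n in range(1, 8):
        points = list(range(n))  # n distinct points on the real line
        achieved = len(threshold_labelings(points))
        print(f"n={n}: thresholds realize {achieved} of 2^{n} = {2**n} labelings")

Running it shows 2, 4, 6, 8, ... realized labelings (growing linearly in n) against the exponential 2^n, which is exactly why an upper bound of the kind sketched after the abstract can be non-trivial.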



Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Fernandes de Mello, R., Antonelli Ponti, M. (2018). Statistical Learning Theory. In: Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-94989-5_2

  • DOI: https://doi.org/10.1007/978-3-319-94989-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94988-8

  • Online ISBN: 978-3-319-94989-5

  • eBook Packages: Computer Science (R0)
