Abstract
This chapter begins by describing the concepts and assumptions necessary to ensure supervised learning. It then details the Empirical Risk Minimization (ERM) principle, which is the cornerstone of Statistical Learning Theory (SLT). The ERM principle provides upper bounds that make the empirical risk a good estimator for the expected risk, given the bias of some learning algorithm. This bound is the main theoretical tool for providing learning guarantees for classification tasks. Afterwards, other useful tools and concepts are introduced.
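The ERM principle described above can be illustrated with a minimal sketch, not taken from the chapter itself: given a finite set of candidate classifiers (the algorithm's bias) and a training sample, ERM selects the candidate minimizing the empirical risk, i.e., the average 0-1 loss. The threshold classifiers and the toy sample below are hypothetical choices for illustration only.

```python
# Hypothetical sketch of the ERM principle over a finite hypothesis set.

def empirical_risk(f, sample):
    """Average 0-1 loss of classifier f over (x, y) pairs."""
    return sum(1 for x, y in sample if f(x) != y) / len(sample)

def erm(candidates, sample):
    """Return the candidate with the smallest empirical risk."""
    return min(candidates, key=lambda f: empirical_risk(f, sample))

# Toy bias: 1-D threshold classifiers at a few fixed cut points.
candidates = [lambda x, t=t: 1 if x >= t else 0 for t in (0.0, 0.5, 1.0)]
sample = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]

best = erm(candidates, sample)
print(empirical_risk(best, sample))  # the threshold at 0.5 separates the sample: 0.0
```

Note that a small empirical risk alone says nothing about the expected risk; the upper bounds discussed in the chapter are what connect the two.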
Notes
- 1.
It is worth mentioning that elements in X and Y may even lie in another space, such as a topological space, but some mapping is assumed to bring them into the Hilbert space in order to respect the definition.
- 2.
Remember that R(f) represents how likely it is for a randomly selected sample to be misclassified by f.
- 3.
This space contains every possible function to tackle any problem.
- 4.
In the case of XOR, the dataset has a finite number of possibilities. Thus, assuming all of them are provided, overfitting is not verifiable in practice, because memorization is indeed sufficient when new examples are identical to those in the training set.
- 5.
As a binary classifier is considered, this is a power of two.
- 6.
Remember that the number of instances affects the number of distinct admissible functions.
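Notes 5 and 6 can be sketched concretely: for a binary classifier, the number of distinct label assignments over n instances is 2^n, which is why the count grows with the sample size. The enumeration below is an illustrative sketch, not code from the chapter.

```python
# Sketch: for a binary classifier, the distinct label assignments
# over n instances are the 2^n binary tuples of length n.
from itertools import product

def admissible_labelings(n):
    """All distinct binary label assignments over n instances."""
    return list(product((0, 1), repeat=n))

for n in (1, 2, 3):
    print(n, len(admissible_labelings(n)))  # prints 2, 4, 8 -- powers of two
```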
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Fernandes de Mello, R., Antonelli Ponti, M. (2018). Statistical Learning Theory. In: Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-94989-5_2
Print ISBN: 978-3-319-94988-8
Online ISBN: 978-3-319-94989-5