Abstract
Regularization plays a key role in many machine learning algorithms. Exactly fitting a model to the training data is generally undesirable, because the model also fits the noise in the training examples (overfitting) and is then doomed to predict (generalize) poorly on unseen data. In contrast, a simple model that fits the training data well is more likely to capture its regularities and to generalize well. A number of regularizers have been proposed for various applications, and theoretical tools that characterize their complexity are also available.
Definition
In general, a regularizer is a quantifier of the complexity of a model, and many successful machine learning algorithms fall in the framework of regularized risk minimization:

    min_f  (1/n) Σᵢ₌₁ⁿ ℓ(f(xᵢ), yᵢ) + λ Ω(f),

where ℓ is a loss function measuring how well f fits the training examples (xᵢ, yᵢ), Ω(f) quantifies the complexity of the model f, and the positive real number λ controls the trade-off between the two terms.
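As a minimal numerical sketch of this framework (not part of the entry itself), consider squared loss with the L2 regularizer Ω(w) = ‖w‖², i.e., ridge regression; the regularized objective then has a closed-form minimizer, and increasing λ trades training fit for a lower-complexity (smaller-norm) model. The data and the helper name `ridge_fit` below are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/n) * ||X w - y||^2 + lam * ||w||^2 over w.

    Setting the gradient to zero gives the closed-form solution
    (X^T X / n + lam * I) w = X^T y / n, solved below.
    """
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

# Synthetic linear data with mild noise (illustrative, not from the entry).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

w_small = ridge_fit(X, y, lam=1e-6)  # nearly unregularized fit
w_large = ridge_fit(X, y, lam=10.0)  # heavily regularized fit

# A larger lambda yields a simpler model in the sense of a smaller norm.
assert np.linalg.norm(w_large) < np.linalg.norm(w_small)
```

The trade-off controlled by λ is visible directly: at λ ≈ 0 the fit tracks the data (and its noise) closely, while a large λ shrinks the weights toward zero at the expense of training error.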
There is a variety of regularizers,...
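To illustrate how the choice of regularizer shapes the model (a sketch added here, not from the entry), the following compares the shrinkage behavior of the two classical penalties from the references below: the L2 penalty of Tikhonov regularization, which shrinks all coordinates uniformly, and the L1 penalty of the LASSO (Tibshirani 1996), which drives small coordinates exactly to zero and thus yields sparse models. The proximal (shrinkage) steps applied to a fixed weight vector make the difference concrete; the vector and λ are arbitrary illustrative values.

```python
import numpy as np

w = np.array([3.0, -0.4, 0.05, 1.2])
lam = 0.5

# L2 (Tikhonov/ridge) proximal step: uniform multiplicative shrinkage,
# so no coordinate becomes exactly zero.
prox_l2 = w / (1.0 + lam)

# L1 (LASSO) proximal step: soft-thresholding, which sets every
# coordinate with |w_i| <= lam exactly to zero, producing sparsity.
prox_l1 = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

assert np.all(prox_l2 != 0)       # ridge keeps every coordinate
assert np.sum(prox_l1 == 0) == 2  # lasso zeroes the two small coordinates
```

This sparsity-inducing behavior is what makes L1 regularization a natural tool for variable selection, as in the LASSO and compressed-sensing literature cited below.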
Notes
1. Regularization lies at the heart of statistical machine learning, and it is indispensable in almost every learning algorithm. A comprehensive statistical analysis from the computational learning theory perspective can be found in Bousquet et al. (2005) and Vapnik (1998). Abundant resources on compressed sensing including both theory and applications are available at http://dsp.rice.edu/cs. Regularizations related to SVMs and kernel methods are discussed in detail by Schölkopf and Smola (2002) and Shawe-Taylor and Cristianini (2004). Anthony and Bartlett (1999) provide in-depth theoretical analysis for neural networks.
Recommended Reading
Anthony M, Bartlett PL (1999) Neural network learning: theoretical foundations. Cambridge University Press, Cambridge
Bousquet O, Boucheron S, Lugosi G (2005) Theory of classification: a survey of recent advances. ESAIM: Probab Stat 9:323–375
Candes E, Tao T (2005) Decoding by linear programming. IEEE Trans Inf Theory 51(12):4203–4215
Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Applications of mathematics, vol 31. Springer, New York
Guo Y, Bartlett PL, Shawe-Taylor J, Williamson RC (1999) Covering numbers for support vector machines. In: Proceedings annual conference computational learning theory. Montreal, Canada
Kivinen J, Warmuth MK (1997) Exponentiated gradient versus gradient descent for linear predictors. Inf Comput 132(1):1–64
Rifkin RM, Lippert RA (2007) Value regularization and Fenchel duality. J Mach Learn Res 8:441–479
Schölkopf B, Smola A (2002) Learning with kernels. MIT Press, Cambridge
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B Stat Methodol 58:267–288
Tikhonov AN (1943) On the stability of inverse problems. Dokl Akad Nauk SSSR 39(5):195–198
Tropp JA (2006) Algorithms for simultaneous sparse approximation, Part II: convex relaxation. Signal Process 86(3):589–602
Vapnik V (1998) Statistical learning theory. Wiley, New York
Vapnik V, Chervonenkis A (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–281
Zhang M, Zhang D, Wells MT (2008) Variable selection for large p small n regression models with incomplete data: mapping QTL with epistases. BMC Bioinf 9:251
© 2016 Springer Science+Business Media New York
Cite this entry
Zhang, X. (2016). Regularization. In: Sammut, C., Webb, G. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7502-7_718-1