
Regularization

Reference work entry in: Encyclopedia of Machine Learning and Data Mining

Abstract

Regularization plays a key role in many machine learning algorithms. Fitting a model exactly to the training data is generally undesirable: the model also fits the noise in the training examples (overfitting) and therefore predicts (generalizes) poorly on unseen data. In contrast, a simple model that fits the training data well is more likely to capture its regularities and to generalize well. A number of regularizers have been proposed for various applications, along with theoretical tools that characterize their complexity.
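
As a concrete illustration of this trade-off (a sketch added for exposition, not part of the original entry), the snippet below fits a degree-9 polynomial to a few noisy samples of a sine curve with and without an L2 (ridge/Tikhonov) penalty. The data, the polynomial degree, and the penalty weight lam are all illustrative choices.

```python
# Minimal sketch: L2 (ridge / Tikhonov) regularization of a polynomial fit.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.shape)

def design_matrix(x, degree=9):
    # Columns are x^0, x^1, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(x, y, lam, degree=9):
    # Regularized least squares: minimize ||Xw - y||^2 + lam * ||w||^2,
    # solved via the normal equations (X^T X + lam I) w = X^T y.
    X = design_matrix(x, degree)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.lstsq(A, X.T @ y, rcond=None)[0]

w_unreg = fit_ridge(x_train, y_train, lam=0.0)    # fits the noise
w_ridge = fit_ridge(x_train, y_train, lam=1e-3)   # smoother coefficients

x_test = np.linspace(0, 1, 200)
y_true = np.sin(2 * np.pi * x_test)
for name, w in [("unregularized", w_unreg), ("ridge, lam=1e-3", w_ridge)]:
    mse = np.mean((design_matrix(x_test) @ w - y_true) ** 2)
    print(f"{name}: test MSE = {mse:.4f}")
```

Typically the unregularized fit achieves near-zero training error but a large test error, while even a small L2 penalty shrinks the high-order coefficients and substantially reduces the error on unseen inputs.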


Recommended Reading

Regularization lies at the heart of statistical machine learning, and it is indispensable in almost every learning algorithm. A comprehensive statistical analysis from the computational learning theory perspective can be found in Bousquet et al. (2005) and Vapnik (1998). Abundant resources on compressed sensing, covering both theory and applications, are available at http://dsp.rice.edu/cs. Regularizers related to SVMs and kernel methods are discussed in detail by Schölkopf and Smola (2002) and Shawe-Taylor and Cristianini (2004). Anthony and Bartlett (1999) provide an in-depth theoretical analysis for neural networks.

  • Anthony M, Bartlett PL (1999) Neural network learning: theoretical foundations. Cambridge University Press, Cambridge

  • Bousquet O, Boucheron S, Lugosi G (2005) Theory of classification: a survey of recent advances. ESAIM: Probab Stat 9:323–375

  • Candes E, Tao T (2005) Decoding by linear programming. IEEE Trans Inf Theory 51(12):4203–4215

  • Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Applications of mathematics, vol 31. Springer, New York

  • Guo Y, Bartlett PL, Shawe-Taylor J, Williamson RC (1999) Covering numbers for support vector machines. In: Proceedings of the annual conference on computational learning theory, Montreal

  • Kivinen J, Warmuth MK (1997) Exponentiated gradient versus gradient descent for linear predictors. Inf Comput 132(1):1–64

  • Rifkin RM, Lippert RA (2007) Value regularization and Fenchel duality. J Mach Learn Res 8:441–479

  • Schölkopf B, Smola A (2002) Learning with kernels. MIT Press, Cambridge

  • Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge

  • Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B Stat Methodol 58:267–288

  • Tikhonov AN (1943) On the stability of inverse problems. Dokl Akad Nauk SSSR 39(5):195–198

  • Tropp JA (2006) Algorithms for simultaneous sparse approximation, part II: convex relaxation. Signal Process 86(3):589–602

  • Vapnik V (1998) Statistical learning theory. Wiley, New York

  • Vapnik V, Chervonenkis A (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–281

  • Zhang M, Zhang D, Wells MT (2008) Variable selection for large p small n regression models with incomplete data: mapping QTL with epistases. BMC Bioinf 9:251


Author information


Correspondence to Xinhua Zhang.


Copyright information

© 2017 Springer Science+Business Media New York

About this entry

Cite this entry

Zhang, X. (2017). Regularization. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_718
