
Regularization

Reference work entry in: Encyclopedia of Machine Learning and Data Mining

Abstract

Regularization plays a key role in many machine learning algorithms. Fitting a model exactly to the training data is generally undesirable: the model also fits the noise in the training examples (overfitting) and therefore predicts (generalizes) poorly on unseen data. In contrast, a simple model that fits the training data well is more likely to capture its regularities and to generalize well. A number of regularizers have been proposed for various applications, along with theoretical tools that characterize their complexity.
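
As a concrete illustration of this trade-off (a sketch added for exposition, not part of the original entry), the snippet below fits a degree-9 polynomial to a few noisy samples of a sine curve with and without an L2 (ridge/Tikhonov) penalty. The data, the polynomial degree, and the penalty weight lam are all illustrative choices.

```python
# Minimal sketch: L2 (ridge / Tikhonov) regularization of a polynomial fit.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.shape)

def design_matrix(x, degree=9):
    # Columns are x^0, x^1, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(x, y, lam, degree=9):
    # Regularized least squares: minimize ||Xw - y||^2 + lam * ||w||^2,
    # solved via the normal equations (X^T X + lam I) w = X^T y.
    X = design_matrix(x, degree)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.lstsq(A, X.T @ y, rcond=None)[0]

w_unreg = fit_ridge(x_train, y_train, lam=0.0)    # fits the noise
w_ridge = fit_ridge(x_train, y_train, lam=1e-3)   # smoother coefficients

x_test = np.linspace(0, 1, 200)
y_true = np.sin(2 * np.pi * x_test)
for name, w in [("unregularized", w_unreg), ("ridge, lam=1e-3", w_ridge)]:
    mse = np.mean((design_matrix(x_test) @ w - y_true) ** 2)
    print(f"{name}: test MSE = {mse:.4f}")
```

Typically the unregularized fit achieves near-zero training error but a large test error, while even a small L2 penalty shrinks the high-order coefficients and substantially reduces the error on unseen inputs.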


Recommended Reading

Regularization lies at the heart of statistical machine learning, and it is indispensable in almost every learning algorithm. A comprehensive statistical analysis from the computational learning theory perspective can be found in Bousquet et al. (2005) and Vapnik (1998). Abundant resources on compressed sensing, covering both theory and applications, are available at http://dsp.rice.edu/cs. Regularizers related to SVMs and kernel methods are discussed in detail by Schölkopf and Smola (2002) and Shawe-Taylor and Cristianini (2004). Anthony and Bartlett (1999) provide an in-depth theoretical analysis for neural networks.

  • Anthony M, Bartlett PL (1999) Neural network learning: theoretical foundations. Cambridge University Press, Cambridge

  • Bousquet O, Boucheron S, Lugosi G (2005) Theory of classification: a survey of recent advances. ESAIM: Probab Stat 9:323–375

  • Candes E, Tao T (2005) Decoding by linear programming. IEEE Trans Inf Theory 51(12):4203–4215

  • Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Applications of mathematics, vol 31. Springer, New York

  • Guo Y, Bartlett PL, Shawe-Taylor J, Williamson RC (1999) Covering numbers for support vector machines. In: Proceedings of the annual conference on computational learning theory, Montreal

  • Kivinen J, Warmuth MK (1997) Exponentiated gradient versus gradient descent for linear predictors. Inf Comput 132(1):1–64

  • Rifkin RM, Lippert RA (2007) Value regularization and Fenchel duality. J Mach Learn Res 8:441–479

  • Schölkopf B, Smola A (2002) Learning with kernels. MIT Press, Cambridge

  • Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge

  • Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B Stat Methodol 58:267–288

  • Tikhonov AN (1943) On the stability of inverse problems. Dokl Akad Nauk SSSR 39(5):195–198

  • Tropp JA (2006) Algorithms for simultaneous sparse approximation, part II: convex relaxation. Signal Process 86(3):589–602

  • Vapnik V (1998) Statistical learning theory. Wiley, New York

  • Vapnik V, Chervonenkis A (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–281

  • Zhang M, Zhang D, Wells MT (2008) Variable selection for large p small n regression models with incomplete data: mapping QTL with epistases. BMC Bioinf 9:251


Author information


Correspondence to Xinhua Zhang.


Copyright information

© 2017 Springer Science+Business Media New York

About this entry

Cite this entry

Zhang, X. (2017). Regularization. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_718
