Abstract
In statistics, we usually assume that the number of samples N is larger than the number of variables p. Otherwise, linear regression does not yield a unique least squares solution, and finding the optimal variable set by comparing the information criterion values of all 2^p subsets of the p variables is computationally infeasible. Therefore, it is difficult to estimate the parameters. In such a sparse situation, regularization is often used. In the case of linear regression, we add a penalty term to the squared error to prevent the coefficient values from growing too large. When the regularization term is a constant λ times the L1 or L2 norm of the coefficients, the method is called lasso or ridge, respectively. In the case of lasso, as the constant λ increases, some coefficients become 0; eventually, all coefficients become 0 once λ is sufficiently large. In that sense, lasso plays the role of model selection. In this chapter, we consider the principle of lasso and compare it with ridge. Finally, we learn how to choose the constant λ.
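As a rough illustration of the contrast drawn above, the following is a minimal NumPy sketch (not the book's own code), assuming the standard objective (1/(2N))‖y − Xβ‖² + λ·penalty(β); the function names and the coordinate-descent solver for lasso are illustrative choices.

```python
# Minimal sketch: ridge has a closed-form solution, while lasso is computed
# here by coordinate descent with soft-thresholding. Assumed objective:
#   (1/(2N)) * ||y - X beta||^2 + lam * penalty(beta).
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate: (X^T X / N + lam * I)^{-1} X^T y / N."""
    N, p = X.shape
    return np.linalg.solve(X.T @ X / N + lam * np.eye(p), X.T @ y / N)

def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso(X, y, lam, n_iter=200):
    """Coordinate descent: update each beta_j with the others held fixed."""
    N, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            z = X[:, j] @ r / N
            beta[j] = soft_threshold(z, lam) / (X[:, j] @ X[:, j] / N)
    return beta

# As lam grows, lasso drives more coefficients exactly to 0,
# while ridge only shrinks them toward 0.
rng = np.random.default_rng(0)
N, p = 100, 10
X = rng.standard_normal((N, p))
beta_true = np.array([2.0, -1.0] + [0.0] * (p - 2))
y = X @ beta_true + 0.1 * rng.standard_normal(N)
for lam in (0.01, 0.1, 1.0):
    print(lam, np.round(lasso(X, y, lam), 2))
```

Running the loop shows more coordinates of the lasso estimate hitting exactly 0 as λ increases, which is the model selection effect mentioned above.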
Notes
1. In this book, convexity always means convex below and does not mean concave (convex above).
2. In such a case, we do not express the subderivative as {f′(x₀)} but as f′(x₀).
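As an illustration (not from the source): for f(x) = |x|, the subderivative at the origin is an interval, while at any other point it is a singleton and, by the convention in note 2, is written without braces:

```latex
\partial f(x_0) =
\begin{cases}
\{-1\} = -1, & x_0 < 0, \\
[-1,\, 1],   & x_0 = 0, \\
\{+1\} = +1, & x_0 > 0.
\end{cases}
```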
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Suzuki, J. (2021). Regularization. In: Statistical Learning with Math and Python. Springer, Singapore. https://doi.org/10.1007/978-981-15-7877-9_6
DOI: https://doi.org/10.1007/978-981-15-7877-9_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7876-2
Online ISBN: 978-981-15-7877-9
eBook Packages: Computer Science (R0)