In statistics, we usually assume that the number of samples \(N\) is larger than the number of variables \(p\). Otherwise, linear regression does not yield a unique least squares solution, and finding the optimal variable set by comparing the information criterion values of the \(2^p\) subsets of the \(p\) variables becomes computationally infeasible. Therefore, it is difficult to estimate the parameters. In such a situation, regularization is often used. In the case of linear regression, we add a penalty term to the squared error to prevent the coefficients from becoming too large. When the regularization term is a constant \(\lambda \) times the L1 norm and the squared L2 norm of the coefficients, the method is called lasso and ridge, respectively. In the case of lasso, as the constant \(\lambda \) increases, some coefficients become exactly 0, and eventually all coefficients become 0 as \(\lambda \) goes to infinity. In that sense, lasso plays the role of model selection. In this chapter, we consider the principle of lasso and compare it with ridge. Finally, we learn how to choose the constant \(\lambda \).
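To make the penalty terms concrete, here is a sketch of the two objectives, assuming the usual notation with responses \(y_i\), covariate vectors \(x_i \in \mathbb{R}^p\), and coefficient vector \(\beta\), and the common (but not universal) convention of scaling the squared error by \(1/(2N)\):
\[
\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2N} \sum_{i=1}^{N} \bigl( y_i - x_i^\top \beta \bigr)^2 + \lambda \sum_{j=1}^{p} |\beta_j| ,
\qquad
\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2N} \sum_{i=1}^{N} \bigl( y_i - x_i^\top \beta \bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 .
\]
Larger \(\lambda \) shrinks the coefficients more strongly in both cases, but only the L1 penalty drives some of them exactly to 0.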