Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches
L1 regularization is effective for feature selection, but the resulting optimization problem is challenging because the 1-norm is non-differentiable. In this paper we compare state-of-the-art optimization techniques for solving this problem across several loss functions. We also propose two new techniques. The first is based on a smooth (differentiable) convex approximation of the L1 regularizer that makes no assumptions about the loss function. The second addresses the non-differentiability of the L1 regularizer directly, by recasting the problem as a constrained optimization problem that is then solved with a specialized gradient projection method. Extensive comparisons, measured by the number of function evaluations required, show that our proposed approaches consistently rank among the best in convergence speed and efficiency.
Keywords: Feature Selection, Loss Function, Step Length, Line Search, Constrained Optimization Problem
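The two proposed ideas can be illustrated in miniature. The sketch below is not the paper's exact algorithm; it assumes a logistic loss, uses the common smooth surrogate |w| ≈ sqrt(w² + ε) for the first technique, and for the second uses the standard split w = w⁺ − w⁻ (with w⁺, w⁻ ≥ 0), under which the L1 term becomes linear and projection onto the feasible set is a simple clamp at zero. The fixed step sizes are illustrative placeholders, not tuned values.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    # Binary logistic loss with labels y in {-1, +1}, and its gradient.
    z = y * (X @ w)
    loss = np.sum(np.logaddexp(0.0, -z))
    grad = -X.T @ (y / (1.0 + np.exp(z)))
    return loss, grad

# --- Technique 1 (sketch): smooth approximation of the 1-norm ---
def smooth_l1(w, eps):
    # sqrt(w_i^2 + eps) is differentiable everywhere and -> |w_i| as eps -> 0.
    return np.sum(np.sqrt(w**2 + eps))

def smooth_l1_grad(w, eps):
    # Well-defined at w_i = 0, unlike the subdifferential of |w_i|.
    return w / np.sqrt(w**2 + eps)

def smoothed_descent(X, y, lam=0.1, eps=1e-4, step=1e-3, iters=1000):
    # Plain gradient descent on loss(w) + lam * smooth_l1(w).
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        _, g = logistic_loss_grad(w, X, y)
        w -= step * (g + lam * smooth_l1_grad(w, eps))
    return w

# --- Technique 2 (sketch): constrained reformulation + gradient projection ---
def projected_gradient(X, y, lam=0.1, step=1e-3, iters=1000):
    # Split w = wp - wn with wp, wn >= 0, so ||w||_1 = sum(wp + wn)
    # and the objective is smooth over the nonnegative orthant.
    n = X.shape[1]
    wp, wn = np.zeros(n), np.zeros(n)
    for _ in range(iters):
        _, g = logistic_loss_grad(wp - wn, X, y)
        # Gradient step, then projection (clamp) onto the orthant.
        wp = np.maximum(0.0, wp - step * (g + lam))
        wn = np.maximum(0.0, wn - step * (-g + lam))
    return wp - wn
```

Both sketches reduce the regularized objective from its value at w = 0; a practical implementation would add a line search rather than a fixed step, which is what the function-evaluation counts in the comparison measure.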