# Efficient block-coordinate descent algorithms for the Group Lasso


## Abstract

We present two algorithms to solve the Group Lasso problem (Yuan and Lin, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(1):49–67, 2006). First, we propose a general version of the Block Coordinate Descent (BCD) algorithm for the Group Lasso that employs an efficient approach for optimizing each subproblem exactly. We show that it exhibits excellent performance when the groups are of moderate size. For groups of large size, we propose an extension of ISTA/FISTA (Beck and Teboulle, SIAM J. Imag. Sci. 2(1):183–202, 2009) based on variable step-lengths that can be viewed as a simplified version of BCD. By combining the two approaches we obtain an implementation that is very competitive and often outperforms other state-of-the-art approaches for this problem. We show how these methods fit into the globally convergent general block coordinate gradient descent framework of Tseng and Yun (Math. Program. 117(1):387–423, 2009). We also show that the proposed approach is more efficient in practice than the one implemented in Tseng and Yun (Math. Program. 117(1):387–423, 2009). In addition, we apply our algorithms to the Multiple Measurement Vector (MMV) recovery problem, which can be viewed as a special case of the Group Lasso problem, and compare their performance to other methods in this particular instance.
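To illustrate the kind of iteration the abstract refers to (this is a generic sketch of ISTA applied to the Group Lasso, not the authors' implementation), consider the standard formulation min_x ½‖Ax − b‖² + λ Σ_g ‖x_g‖₂ over non-overlapping groups g. A proximal-gradient step takes a gradient step on the least-squares term with a fixed step length 1/L, then applies block soft-thresholding to each group. The names `prox_group` and `ista_group_lasso` below are illustrative, not from the paper:

```python
import numpy as np

def prox_group(v, t):
    # Block soft-thresholding: the proximal operator of t * ||v||_2.
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= t else (1.0 - t / nrm) * v

def ista_group_lasso(A, b, groups, lam, n_iter=500):
    """Fixed-step ISTA iterations for
    min_x 0.5 * ||A x - b||^2 + lam * sum_g ||x_g||_2."""
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        z = x - grad / L                 # gradient step on the least-squares term
        for g in groups:                 # group-wise shrinkage (prox) step
            x[g] = prox_group(z[g], lam / L)
    return x

# Toy example: only the first of two groups is active in the true signal.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
x_true = np.zeros(10)
x_true[:5] = 1.0
b = A @ x_true
groups = [np.arange(0, 5), np.arange(5, 10)]
x_hat = ista_group_lasso(A, b, groups, lam=0.1)
```

The paper's variable step-length extension replaces the single constant 1/L with per-group steps, and the exact BCD variant instead minimizes each group's subproblem exactly; both refinements are beyond this sketch.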

## Keywords

Block coordinate descent · Group Lasso · Iterative shrinkage thresholding · Multiple measurement vector · Line-search

## Mathematics Subject Classification

9008 · 90C25

## Notes

### Acknowledgments

We would like to thank Shiqian Ma for valuable discussions on the MMV problems. We also thank the two anonymous reviewers for their constructive comments, which improved this paper significantly.

## References

1. Bach, F.: Consistency of the group Lasso and multiple kernel learning. J. Mach. Learn. Res. **9**, 1179–1225 (2008)
2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. **2**(1), 183–202 (2009)
3. van den Berg, E., Friedlander, M.: Joint-sparse recovery from multiple measurements. arXiv 904 (2009)
4. van den Berg, E., Friedlander, M.: Sparse optimization with least-squares constraints. Technical Report TR-2010-02, Department of Computer Science, University of British Columbia (2010)
5. van den Berg, E., Schmidt, M., Friedlander, M., Murphy, K.: Group sparsity via linear-time projection. Technical Report TR-2008-09, Department of Computer Science, University of British Columbia (2008)
6. Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory **52**(2), 489–509 (2006)
7. Chen, J., Huo, X.: Theoretical results on sparse representations of multiple-measurement vectors. IEEE Trans. Signal Process. **54**(12) (2006)
8. Dolan, E., Moré, J.: Benchmarking optimization software with performance profiles. Math. Program. **91**(2), 201–213 (2002)
9. Donoho, D.: Compressed sensing. IEEE Trans. Inform. Theory **52**(4), 1289–1306 (2006)
10. Friedman, J., Hastie, T., Tibshirani, R.: A note on the group lasso and a sparse group lasso. Preprint (2010)
11. Jacob, L., Obozinski, G., Vert, J.: Group Lasso with overlap and graph Lasso. In: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, pp. 433–440 (2009)
12. Kim, D., Sra, S., Dhillon, I.: A scalable trust-region algorithm with application to mixed-norm regression. In: International Conference on Machine Learning (ICML) (2010)
13. Kim, S., Xing, E.: Tree-guided group lasso for multi-task regression with structured sparsity. In: Proceedings of the 27th Annual International Conference on Machine Learning (2010)
14. Liu, J., Ji, S., Ye, J.: Multi-task feature learning via efficient ℓ2,1-norm minimization. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press, pp. 339–348 (2009)
15. Liu, J., Ji, S., Ye, J.: SLEP: Sparse Learning with Efficient Projections. Arizona State University (2009)
16. Ma, S., Song, X., Huang, J.: Supervised group Lasso with applications to microarray data analysis. BMC Bioinformatics **8**(1), 60 (2007)
17. Meier, L., Van De Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **70**(1), 53–71 (2008)
18. Moré, J., Sorensen, D.: Computing a trust region step. SIAM J. Sci. Statist. Comput. **4**, 553 (1983)
19. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. CORE Discussion Papers (2010)
20. Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (1999)
21. Rakotomamonjy, A.: Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms. Signal Process. **91**(7), 1505–1526 (2011)
22. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. arXiv preprint arXiv:1107.2848 (2011)
23. Roth, V., Fischer, B.: The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms. In: Proceedings of the 25th International Conference on Machine Learning, ACM, pp. 848–855 (2008)
24. Subramanian, A., Tamayo, P., Mootha, V., Mukherjee, S., Ebert, B., Gillette, M., Paulovich, A., Pomeroy, S., Golub, T., Lander, E., et al.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. **102**(43), 15545 (2005)
25. Sun, L., Liu, J., Chen, J., Ye, J.: Efficient recovery of jointly sparse vectors. In: NIPS (2009)
26. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) **58**(1), 267–288 (1996)
27. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. **109**(3), 475–494 (2001)
28. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. **117**(1), 387–423 (2009)
29. Van De Vijver, M., He, Y., van't Veer, L., Dai, H., Hart, A., Voskuil, D., Schreiber, G., Peterse, J., Roberts, C., Marton, M., et al.: A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. **347**(25), 1999 (2002)
30. Vandenberghe, L.: Gradient methods for nonsmooth problems. EE236C course notes (2008)
31. Wright, S., Nowak, R., Figueiredo, M.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. **57**(7), 2479–2493 (2009)
32. Yang, H., Xu, Z., King, I., Lyu, M.: Online learning for group lasso. In: 27th International Conference on Machine Learning (ICML 2010) (2010)
33. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **68**(1), 49–67 (2006)
34. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **67**(2), 301–320 (2005)