Mathematical Programming Computation, Volume 5, Issue 2, pp 143–169

Efficient block-coordinate descent algorithms for the Group Lasso

  • Zhiwei Qin
  • Katya Scheinberg
  • Donald Goldfarb
Full Length Paper


Abstract

We present two algorithms to solve the Group Lasso problem (Yuan and Lin, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(1):49–67, 2006). First, we propose a general version of the Block Coordinate Descent (BCD) algorithm for the Group Lasso that employs an efficient approach for optimizing each subproblem exactly. We show that it exhibits excellent performance when the groups are of moderate size. For groups of large size, we propose an extension of ISTA/FISTA (Beck and Teboulle, SIAM J. Imag. Sci. 2(1):183–202, 2009) based on variable step-lengths that can be viewed as a simplified version of BCD. By combining the two approaches we obtain an implementation that is very competitive and often outperforms other state-of-the-art approaches for this problem. We show how these methods fit into the globally convergent general block coordinate gradient descent framework of Tseng and Yun (Math. Program. 117(1):387–423, 2009). We also show that the proposed approach is more efficient in practice than the one implemented in Tseng and Yun (Math. Program. 117(1):387–423, 2009). In addition, we apply our algorithms to the Multiple Measurement Vector (MMV) recovery problem, which can be viewed as a special case of the Group Lasso problem, and compare their performance with other methods in this setting.
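To make the problem setting concrete, the following is a minimal illustrative sketch of the Group Lasso objective, minimize 0.5·||Ax − b||² + λ·Σ_g ||x_g||₂, solved by a plain fixed-step ISTA iteration with block soft-thresholding. This is not the authors' implementation: the paper proposes an exact BCD subproblem solver and a variable step-length ISTA/FISTA variant, whereas this sketch uses a single fixed step 1/L; all names and defaults here are our own choices.

```python
import numpy as np

def ista_group_lasso(A, b, groups, lam, n_iter=200):
    """Fixed-step ISTA sketch for min_x 0.5*||Ax - b||^2 + lam * sum_g ||x_g||_2.

    `groups` is a partition of the coordinate indices into index lists.
    """
    m, n = A.shape
    x = np.zeros(n)
    # Step length 1/L, where L = ||A^T A||_2 is the gradient's Lipschitz
    # constant (the paper instead adapts step-lengths per block).
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)          # gradient of the smooth term
        z = x - step * grad               # forward (gradient) step
        for g in groups:                  # backward step: block-wise
            nz = np.linalg.norm(z[g])     # soft-thresholding, the prox of
            if nz > 0:                    # the group l2-norm penalty
                x[g] = max(0.0, 1.0 - step * lam / nz) * z[g]
            else:
                x[g] = 0.0
    return x
```

Because the penalty is the (unsquared) l2-norm of each block, the proximal step zeroes out entire groups at once, which is what produces group-level sparsity in the solution.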


Keywords

Block coordinate descent · Group Lasso · Iterative shrinkage thresholding · Multiple measurement vector · Line-search

Mathematics Subject Classification

90-08 · 90C25



Acknowledgments

We would like to thank Shiqian Ma for valuable discussions on the MMV problems. We also thank the two anonymous reviewers for their constructive comments, which improved this paper significantly.


References

  1. Bach, F.: Consistency of the group Lasso and multiple kernel learning. J. Mach. Learn. Res. 9, 1179–1225 (2008)
  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
  3. van den Berg, E., Friedlander, M.: Joint-sparse recovery from multiple measurements. arXiv 904 (2009)
  4. van den Berg, E., Friedlander, M.: Sparse optimization with least-squares constraints. Technical Report TR-2010-02, Department of Computer Science, University of British Columbia, Vancouver (2010)
  5. van den Berg, E., Schmidt, M., Friedlander, M., Murphy, K.: Group sparsity via linear-time projection. Technical Report TR-2008-09, Department of Computer Science, University of British Columbia, Vancouver (2008)
  6. Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory 52(2), 489–509 (2006)
  7. Chen, J., Huo, X.: Theoretical results on sparse representations of multiple-measurement vectors. IEEE Trans. Signal Process. 54(12) (2006)
  8. Dolan, E., Moré, J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
  9. Donoho, D.: Compressed sensing. IEEE Trans. Inform. Theory 52(4), 1289–1306 (2006)
  10. Friedman, J., Hastie, T., Tibshirani, R.: A note on the group lasso and a sparse group lasso. Preprint (2010)
  11. Jacob, L., Obozinski, G., Vert, J.: Group Lasso with overlap and graph Lasso. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 433–440. ACM, New York (2009)
  12. Kim, D., Sra, S., Dhillon, I.: A scalable trust-region algorithm with application to mixed-norm regression. In: International Conference on Machine Learning (ICML) (2010)
  13. Kim, S., Xing, E.: Tree-guided group lasso for multi-task regression with structured sparsity. In: Proceedings of the 27th Annual International Conference on Machine Learning (2010)
  14. Liu, J., Ji, S., Ye, J.: Multi-task feature learning via efficient ℓ2,1-norm minimization. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 339–348. AUAI Press, Corvallis (2009)
  15. Liu, J., Ji, S., Ye, J.: SLEP: Sparse Learning with Efficient Projections. Arizona State University (2009)
  16. Ma, S., Song, X., Huang, J.: Supervised group Lasso with applications to microarray data analysis. BMC Bioinformatics 8(1), 60 (2007)
  17. Meier, L., van de Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(1), 53–71 (2008)
  18. Moré, J., Sorensen, D.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4, 553 (1983)
  19. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. CORE Discussion Paper (2010)
  20. Nocedal, J., Wright, S.: Numerical Optimization. Springer-Verlag, New York (1999)
  21. Rakotomamonjy, A.: Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms. Signal Process. 91(7), 1505–1526 (2011)
  22. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. arXiv preprint arXiv:1107.2848 (2011)
  23. Roth, V., Fischer, B.: The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms. In: Proceedings of the 25th International Conference on Machine Learning, pp. 848–855. ACM (2008)
  24. Subramanian, A., Tamayo, P., Mootha, V., Mukherjee, S., Ebert, B., Gillette, M., Paulovich, A., Pomeroy, S., Golub, T., Lander, E., et al.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102(43), 15545 (2005)
  25. Sun, L., Liu, J., Chen, J., Ye, J.: Efficient recovery of jointly sparse vectors. In: Advances in Neural Information Processing Systems (NIPS) (2009)
  26. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)
  27. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)
  28. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)
  29. Van De Vijver, M., He, Y., van't Veer, L., Dai, H., Hart, A., Voskuil, D., Schreiber, G., Peterse, J., Roberts, C., Marton, M., et al.: A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347(25), 1999 (2002)
  30. Vandenberghe, L.: Gradient methods for nonsmooth problems. EE236C course notes (2008)
  31. Wright, S., Nowak, R., Figueiredo, M.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)
  32. Yang, H., Xu, Z., King, I., Lyu, M.: Online learning for group lasso. In: Proceedings of the 27th International Conference on Machine Learning (ICML) (2010)
  33. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(1), 49–67 (2006)
  34. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67(2), 301–320 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2013

Authors and Affiliations

  1. Department of Industrial Engineering and Operations Research, Columbia University, New York, USA
  2. Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, USA