
An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems

  • Full Length Paper
  • Series A
  • Published in: Mathematical Programming

Abstract

The sparse group Lasso is a widely used statistical model which encourages sparsity both at the group level and within each group. In this paper, we develop an efficient augmented Lagrangian method for large-scale non-overlapping sparse group Lasso problems, with each subproblem being solved by a superlinearly convergent inexact semismooth Newton method. Theoretically, we prove that, if the penalty parameter is chosen sufficiently large, the augmented Lagrangian method converges globally at an arbitrarily fast linear rate for the primal iterative sequence, the dual infeasibility, and the duality gap between the primal and dual objective functions. Computationally, we derive explicitly the generalized Jacobian of the proximal mapping associated with the sparse group Lasso regularizer and fully exploit the underlying second order sparsity through the semismooth Newton method. The efficiency and robustness of the proposed algorithm are demonstrated by numerical experiments on both synthetic and real data sets.
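The central computational object mentioned above is the proximal mapping of the sparse group Lasso regularizer, for a non-overlapping partition of the coordinates of the form \(p(x) = \lambda_1 \Vert x\Vert_1 + \lambda_2 \sum_{g} w_g \Vert x_g\Vert_2\). As an illustration only (not the authors' implementation; the function name, group encoding, and weights below are hypothetical), the following NumPy sketch evaluates this proximal mapping using the well-known decomposition: elementwise soft-thresholding for the \(\ell_1\) term, followed by groupwise shrinkage for the weighted group-norm term.

import numpy as np

def prox_sparse_group_lasso(v, groups, lam1, lam2, weights=None):
    # Proximal mapping of p(x) = lam1*||x||_1 + lam2*sum_g w_g*||x_g||_2
    # for a non-overlapping partition of the coordinates into groups.
    # Step 1: elementwise soft-thresholding (prox of the l1 term).
    u = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
    # Step 2: groupwise shrinkage (prox of the weighted group-norm term).
    x = np.zeros_like(u)
    for g, idx in enumerate(groups):
        w = 1.0 if weights is None else weights[g]
        norm_g = np.linalg.norm(u[idx])
        if norm_g > lam2 * w:
            x[idx] = (1.0 - lam2 * w / norm_g) * u[idx]
        # otherwise the whole group is thresholded to zero
    return x

# Tiny usage example with two groups of sizes 3 and 2 (illustrative data only).
v = np.array([3.0, -0.5, 0.2, 4.0, -2.5])
groups = [np.array([0, 1, 2]), np.array([3, 4])]
print(prox_sparse_group_lasso(v, groups, lam1=0.5, lam2=1.0))

The semismooth Newton method described in the abstract works with a generalized Jacobian of exactly this mapping, whose piecewise structure (each group is either scaled or set entirely to zero) is what gives rise to the second order sparsity that the algorithm exploits.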


Notes

  1. http://www.public.asu.edu/~jye02/Software/SLEP.

  2. The source code can be found at https://github.com/EugeneNdiaye/GAPSAFE_SGL.


Acknowledgements

The authors would like to thank Dr. Xudong Li and Ms. Meixia Lin for their help with the numerical implementations. We also thank the referees for their valuable suggestions, which have helped to improve the quality of this paper.

Author information


Corresponding author

Correspondence to Defeng Sun.

Additional information

N. Zhang: This author's research is supported in part by the Singapore-ETH Centre (SEC), which was established as a collaboration between ETH Zurich and the National Research Foundation (NRF) Singapore (FI 370074011) under the auspices of the NRF's Campus for Research Excellence and Technological Enterprise (CREATE) programme.

D. Sun: This author's research is supported in part by a start-up research grant from the Hong Kong Polytechnic University.

K.-C. Toh: This author’s research is supported in part by an Academic Research Fund under Grant number R-146-000-257-112 from the Ministry of Education, Singapore.


About this article


Cite this article

Zhang, Y., Zhang, N., Sun, D. et al. An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems. Math. Program. 179, 223–263 (2020). https://doi.org/10.1007/s10107-018-1329-6

