
An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems

  • Full Length Paper
  • Series A
  • Published in: Mathematical Programming

Abstract

The sparse group Lasso is a widely used statistical model which encourages sparsity both at the group level and within each group. In this paper, we develop an efficient augmented Lagrangian method for large-scale non-overlapping sparse group Lasso problems, with each subproblem being solved by a superlinearly convergent inexact semismooth Newton method. Theoretically, we prove that, if the penalty parameter is chosen sufficiently large, the augmented Lagrangian method converges globally at an arbitrarily fast linear rate for the primal iterative sequence, the dual infeasibility, and the duality gap between the primal and dual objective functions. Computationally, we derive explicitly the generalized Jacobian of the proximal mapping associated with the sparse group Lasso regularizer and fully exploit the underlying second order sparsity through the semismooth Newton method. The efficiency and robustness of the proposed algorithm are demonstrated by numerical experiments on both synthetic and real data sets.
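The central computational object mentioned above is the proximal mapping of the sparse group Lasso regularizer, for a non-overlapping partition of the coordinates of the form \(p(x) = \lambda_1 \Vert x\Vert_1 + \lambda_2 \sum_{g} w_g \Vert x_g\Vert_2\). As an illustration only (not the authors' implementation; the function name, group encoding, and weights below are hypothetical), the following NumPy sketch evaluates this proximal mapping using the well-known decomposition: elementwise soft-thresholding for the \(\ell_1\) term, followed by groupwise shrinkage for the weighted group-norm term.

import numpy as np

def prox_sparse_group_lasso(v, groups, lam1, lam2, weights=None):
    # Proximal mapping of p(x) = lam1*||x||_1 + lam2*sum_g w_g*||x_g||_2
    # for a non-overlapping partition of the coordinates into groups.
    # Step 1: elementwise soft-thresholding (prox of the l1 term).
    u = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
    # Step 2: groupwise shrinkage (prox of the weighted group-norm term).
    x = np.zeros_like(u)
    for g, idx in enumerate(groups):
        w = 1.0 if weights is None else weights[g]
        norm_g = np.linalg.norm(u[idx])
        if norm_g > lam2 * w:
            x[idx] = (1.0 - lam2 * w / norm_g) * u[idx]
        # otherwise the whole group is thresholded to zero
    return x

# Tiny usage example with two groups of sizes 3 and 2 (illustrative data only).
v = np.array([3.0, -0.5, 0.2, 4.0, -2.5])
groups = [np.array([0, 1, 2]), np.array([3, 4])]
print(prox_sparse_group_lasso(v, groups, lam1=0.5, lam2=1.0))

The semismooth Newton method described in the abstract works with a generalized Jacobian of exactly this mapping, whose piecewise structure (each group is either scaled or set entirely to zero) is what gives rise to the second order sparsity that the algorithm exploits.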


Notes

  1. http://www.public.asu.edu/~jye02/Software/SLEP.

  2. The source code can be found at https://github.com/EugeneNdiaye/GAPSAFE_SGL.


Acknowledgements

The authors would like to thank Dr. Xudong Li and Ms. Meixia Lin for their help with the numerical implementations. We also thank the referees for their valuable suggestions, which have helped to improve the quality of this paper.

Author information


Corresponding author

Correspondence to Defeng Sun.

Additional information

N. Zhang: This author's research is supported in part by the Singapore-ETH Centre (SEC), which was established as a collaboration between ETH Zurich and the National Research Foundation (NRF) Singapore (FI 370074011) under the auspices of the NRF's Campus for Research Excellence and Technological Enterprise (CREATE) programme.

D. Sun: This author's research is supported in part by a start-up research grant from the Hong Kong Polytechnic University.

K.-C. Toh: This author’s research is supported in part by an Academic Research Fund under Grant number R-146-000-257-112 from the Ministry of Education, Singapore.


About this article


Cite this article

Zhang, Y., Zhang, N., Sun, D. et al. An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems. Math. Program. 179, 223–263 (2020). https://doi.org/10.1007/s10107-018-1329-6

