Abstract
The sparse group Lasso is a widely used statistical model that encourages sparsity both at the group level and within each group. In this paper, we develop an efficient augmented Lagrangian method for large-scale non-overlapping sparse group Lasso problems, with each subproblem solved by a superlinearly convergent inexact semismooth Newton method. Theoretically, we prove that, if the penalty parameter is chosen sufficiently large, the augmented Lagrangian method converges globally at an arbitrarily fast linear rate for the primal iterative sequence, the dual infeasibility, and the duality gap between the primal and dual objective functions. Computationally, we derive explicitly the generalized Jacobian of the proximal mapping associated with the sparse group Lasso regularizer and fully exploit the underlying second-order sparsity through the semismooth Newton method. The efficiency and robustness of the proposed algorithm are demonstrated by numerical experiments on both synthetic and real data sets.
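For non-overlapping groups, the sparse group Lasso regularizer takes the form \(p(x)=\lambda_1\|x\|_1+\lambda_2\sum_{g} w_g\|x_{G_g}\|_2\), and its proximal mapping, the basic object whose generalized Jacobian the semismooth Newton method exploits, is known to decompose into elementwise soft-thresholding followed by group-wise block shrinkage. The sketch below illustrates this decomposition in Python/NumPy; it is an illustrative aid, not the paper's implementation, and the function name `prox_sgl` and the default weights \(w_g=\sqrt{|G_g|}\) are our own assumed choices.

```python
import numpy as np

def prox_sgl(v, groups, lam1, lam2, weights=None):
    """Proximal mapping (unit step size) of the sparse group Lasso
    regularizer p(x) = lam1*||x||_1 + lam2*sum_g w_g*||x_{G_g}||_2
    for non-overlapping groups: it decomposes into soft-thresholding
    followed by group-wise shrinkage."""
    if weights is None:
        # A common default: w_g = sqrt(|G_g|) (an assumption here).
        weights = [np.sqrt(len(g)) for g in groups]
    # Elementwise soft-thresholding: prox of the l1 term.
    u = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
    x = np.zeros_like(v)
    # Block soft-thresholding on each group: prox of the group term.
    for g, w in zip(groups, weights):
        norm_g = np.linalg.norm(u[g])
        if norm_g > lam2 * w:
            x[g] = (1.0 - lam2 * w / norm_g) * u[g]
        # Otherwise the whole group is set to zero (group sparsity).
    return x

# Example: 6 variables split into two groups of 3; the second group
# is entirely zeroed out, while the first is shrunk but kept.
v = np.array([3.0, -0.5, 1.2, 0.1, -0.2, 0.05])
groups = [np.arange(0, 3), np.arange(3, 6)]
print(prox_sgl(v, groups, lam1=0.3, lam2=0.4))
```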
Notes
The source code can be found at https://github.com/EugeneNdiaye/GAPSAFE_SGL.
Acknowledgements
The authors would like to thank Dr. Xudong Li and Ms. Meixia Lin for their help in the numerical implementations. We also thank the referees for their valuable suggestions, which have helped to improve the quality of this paper.
Additional information
N. Zhang: This author's research is supported in part by the Singapore-ETH Centre (SEC), which was established as a collaboration between ETH Zurich and the National Research Foundation (NRF) Singapore (FI 370074011) under the auspices of the NRF's Campus for Research Excellence and Technological Enterprise (CREATE) programme.
D. Sun: This author's research is supported in part by a start-up research grant from the Hong Kong Polytechnic University.
K.-C. Toh: This author’s research is supported in part by an Academic Research Fund under Grant number R-146-000-257-112 from the Ministry of Education, Singapore.
Cite this article
Zhang, Y., Zhang, N., Sun, D. et al. An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems. Math. Program. 179, 223–263 (2020). https://doi.org/10.1007/s10107-018-1329-6