A fast unified algorithm for solving group-lasso penalize learning problems

Abstract

This paper concerns a class of group-lasso learning problems where the objective function is the sum of an empirical loss and the group-lasso penalty. For a class of loss functions satisfying a quadratic majorization condition, we derive a unified algorithm called groupwise-majorization-descent (GMD) for efficiently computing the solution paths of the corresponding group-lasso penalized learning problems. GMD allows for general design matrices, without requiring the predictors to be group-wise orthonormal. As illustrative examples, we develop concrete algorithms for solving the group-lasso penalized least squares problem and several group-lasso penalized large margin classifiers. These group-lasso models have been implemented in an R package, gglasso, publicly available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/gglasso. On simulated and real data, gglasso consistently outperforms existing software for computing the group-lasso that implements either the classical groupwise descent algorithm or Nesterov's method.
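For readers who want to try the path-fitting workflow described above, the following is a minimal R sketch using the gglasso package. The function names (gglasso, cv.gglasso) and arguments follow the package's CRAN documentation as we understand it, and the simulated data are purely illustrative; treat this as a sketch rather than an example from the paper itself.

library(gglasso)

# Illustrative simulated data: 100 observations, 20 predictors in 5 groups of 4
set.seed(1)
x   <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y   <- rnorm(100)
grp <- rep(1:5, each = 4)

# Group-lasso penalized least squares solution path (loss = "ls")
fit <- gglasso(x = x, y = y, group = grp, loss = "ls")
plot(fit)   # coefficient paths over the lambda sequence

# Cross-validation to select the tuning parameter
cvfit <- cv.gglasso(x = x, y = y, group = grp, loss = "ls")
coef(cvfit, s = "lambda.min")   # coefficients at the CV-selected lambda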

References

  • Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., Levine, A.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Nat. Acad. Sci. 96(12), 6745 (1999)

  • Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57, 1413–1457 (2004)

  • Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–451 (2004)

  • Friedman, J., Hastie, T., Tibshirani, R.: Regularized paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)

  • Fu, W.: Penalized regressions: the bridge versus the lasso. J. Comput. Gr. Stat. 7(3), 397–416 (1998)

  • Genkin, A., Lewis, D., Madigan, D.: Large-scale Bayesian logistic regression for text categorization. Technometrics 49(3), 291–304 (2007)

  • Gorman, R., Sejnowski, T.: Analysis of hidden units in a layered network trained to classify sonar targets. Neural Netw. 1(1), 75–89 (1988)

  • Graham, K., de Las Morenas, A., Tripathi, A., King, C., Kavanah, M., Mendez, J., Stone, M., Slama, J., Miller, M., Antoine, G., et al.: Gene expression in histologically normal epithelium from breast cancer patients and from cancer-free prophylactic mastectomy patients shares a similar profile. Br. J. Cancer 102(8), 1284–1293 (2010)

  • Hunter, D., Lange, K.: A tutorial on MM algorithms. Am. Stat. 58(1), 30–37 (2004)

  • Lange, K., Hunter, D., Yang, I.: Optimization transfer using surrogate objective functions (with discussion). J. Comput. Gr. Stat. 9, 1–20 (2000)

  • Liu, J., Ji, S., Ye, J.: SLEP: Sparse Learning with Efficient Projections, Arizona State University. URL: http://www.public.asu.edu/jye02/Software/SLEP (2009)

  • Meier, L., van de Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. Ser. B 70, 53–71 (2008)

  • Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers (2004)

  • Nesterov, Y.: Gradient methods for minimizing composite objective function. Technical Report, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL) (2007)

  • Osborne, M., Presnell, B., Turlach, B.: A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20(3), 389–403 (2000)

  • Quinlan, J.: Combining instance-based and model-based learning. In: Proceedings of the Tenth International Conference on Machine Learning, pp. 236–243 (1993)

  • Sæbø, S., Almøy, T., Aarøe, J., Aastveit, A.: St-pls: a multi-directional nearest shrunken centroid type classifier via pls. J. Chemom. 22(1), 54–62 (2008)

  • Scheetz, T., Kim, K., Swiderski, R., Philp, A., Braun, T., Knudtson, K., Dorrance, A., DiBona, G., Huang, J., Casavant, T., et al.: Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Nat. Acad. Sci. 103(39), 14429–14434 (2006)

  • Segal, M., Dahlquist, K., Conklin, B.: Regression approaches for microarray data analysis. J. Comput. Biol. 10(6), 961–980 (2003)

  • Singh, D., Febbo, P., Ross, K., Jackson, D., Manola, J., Ladd, C., Tamayo, P., Renshaw, A., D’Amico, A., Richie, J., et al.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2), 203–209 (2002)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)

  • Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., Tibshirani, R.: Strong rules for discarding predictors in lasso-type problems. J. R. Stat. Soc. Ser. B 74, 245–266 (2012)

  • Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)

  • Vogt, J., Roth, V.: A complete analysis of the \(l_{1, p}\) group-lasso. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML 2012, Omnipress, pp 185–192 (2012)

  • Wang, H., Leng, C.: A note on adaptive group lasso. Comput. Stat. Data Anal. 52, 5277–5286 (2008)

  • Wang, L., Zhu, J., Zou, H.: Hybrid huberized support vector machines for microarray classification and gene selection. Bioinformatics 24, 412–419 (2008)

  • Wu, T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2, 224–244 (2008)

  • Wu, T., Lange, K.: The MM alternative to EM. Stat. Sci. 4, 492–505 (2010)

  • Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68, 49–67 (2006)

  • Zhang, T.: Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Stat. 32, 56–85 (2004)

  • Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)

Acknowledgments

The authors thank the editor, an associate editor and two referees for their helpful comments and suggestions. This work is supported in part by NSF Grant DMS-08-46068.

Author information

Corresponding author

Correspondence to Hui Zou.

Appendix: Proofs

Proof of Lemma 1

Part (1). For any \(\varvec{\beta }\) and \(\varvec{\beta }^*\), write \(\varvec{\beta }-\varvec{\beta }^*=V\) and define \(g(t)=L(\varvec{\beta }^*+tV \mid \mathbf{D})\) so that

$$\begin{aligned} g(0)=L(\varvec{\beta }^* \mid \mathbf{D}), \quad g(1)=L(\varvec{\beta }\mid \mathbf{D}). \end{aligned}$$

By the mean value theorem, \(\exists \ a \in (0,1)\) such that

$$\begin{aligned} g(1)=g(0)+g^{\prime }(a)=g(0)+g^{\prime }(0)+[g^{\prime }(a)-g^{\prime }(0)]. \end{aligned}$$
(25)

Write \(\Phi _f^{\prime }=\frac{\partial {\Phi (y,f)}}{\partial {f}}\). Note that

$$\begin{aligned} g^{\prime }(t)=\frac{1}{n}\sum ^n_{i=1}\tau _i\Phi _f^{\prime }(y_i,\mathbf{x}^{{ \mathsf T}}_i (\varvec{\beta }^*+tV))(\mathbf{x}^{{ \mathsf T}}_i V). \end{aligned}$$
(26)

Thus \(g^{\prime }(0)=(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}}\nabla L(\varvec{\beta }^* | \mathbf{D}).\) Moreover, from (26) we have

$$\begin{aligned} |g^{\prime }(a)-g^{\prime }(0)|&= \Big \vert \frac{1}{n}\sum ^n_{i=1}\tau _i\big [\Phi _f^{\prime }(y_i,\mathbf{x}^{{ \mathsf T}}_i (\varvec{\beta }^*+aV))-\Phi _f^{\prime }(y_i,\mathbf{x}^{{ \mathsf T}}_i \varvec{\beta }^*)\big ](\mathbf{x}^{{ \mathsf T}}_i V) \Big \vert \\&\le \frac{1}{n}\sum ^n_{i=1} \tau _i\big \vert \Phi _f^{\prime }(y_i,\mathbf{x}^{{ \mathsf T}}_i (\varvec{\beta }^*+aV))-\Phi _f^{\prime }(y_i,\mathbf{x}^{{ \mathsf T}}_i \varvec{\beta }^*) \big \vert \, |\mathbf{x}^{{ \mathsf T}}_i V| \\&\le \frac{1}{n}\sum ^n_{i=1} C\tau _i|\mathbf{x}^{{ \mathsf T}}_i aV |\, |\mathbf{x}^{{ \mathsf T}}_i V| \end{aligned}$$
(27)
$$\begin{aligned} \le \frac{1}{n}\sum ^n_{i=1} C \tau _i (\mathbf{x}^{{ \mathsf T}}_i V)^2 = \frac{C}{n}V^{{ \mathsf T}} [ \mathbf{X}^{{ \mathsf T}} \varvec{\Gamma }\mathbf{X}] V, \end{aligned}$$
(28)

where in (27) we have used the Lipschitz condition \(| \Phi _f^{\prime }(y,f_1)-\Phi _f^{\prime }(y,f_2)| \le C |f_1-f_2|\), and in (28) the fact that \(0<a<1\), so \(|\mathbf{x}^{{ \mathsf T}}_i aV| \le |\mathbf{x}^{{ \mathsf T}}_i V|\). Plugging (28) into (25) we have

$$\begin{aligned} L(\varvec{\beta }\mid \mathbf{D})&\le L(\varvec{\beta }^* \mid \mathbf{D})+(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}}\nabla L(\varvec{\beta }^* | \mathbf{D})\\&+\frac{1}{2}(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}} \mathbf{H}( \varvec{\beta }- \varvec{\beta }^*), \end{aligned}$$

with \( \mathbf{H}=\frac{2C}{n}\mathbf{X}^{{ \mathsf T}} \varvec{\Gamma }\mathbf{X}. \)
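To make Part (1) concrete, here is a worked instance that is ours rather than part of the original proof. Consider the squared hinge loss \(\Phi (y,f)=[(1-yf)_{+}]^{2}\) with \(y \in \{-1,1\}\) (the paper's squared SVM loss may be scaled differently, so the constant below is specific to this definition). Its derivative is \(\Phi _f^{\prime }(y,f)=-2y(1-yf)_{+}\), and

$$\begin{aligned} |\Phi _f^{\prime }(y,f_1)-\Phi _f^{\prime }(y,f_2)| = 2\,\big \vert (1-yf_1)_{+}-(1-yf_2)_{+}\big \vert \le 2|f_1-f_2|, \end{aligned}$$

because \(u \mapsto u_{+}\) is 1-Lipschitz and \(|y|=1\). Hence the Lipschitz condition holds with \(C=2\), and Part (1) gives the quadratic majorization with \( \mathbf{H}=\frac{4}{n}\mathbf{X}^{{ \mathsf T}} \varvec{\Gamma }\mathbf{X}. \)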

Part (2). Write \(\Phi _f^{\prime \prime }=\frac{\partial ^2{\Phi (y,f)}}{\partial {f}^2}\). By Taylor’s expansion, \(\exists \ b \in (0,1)\) such that

$$\begin{aligned} g(1)=g(0)+g^{\prime }(0)+\frac{1}{2}g^{\prime \prime }(b). \end{aligned}$$
(29)

Note that

$$\begin{aligned} g^{\prime \prime }(b)&= \frac{1}{n}\sum ^n_{i=1}\tau _i \Phi _f^{\prime \prime }(y_i,\mathbf{x}^{{ \mathsf T}}_i (\varvec{\beta }^*+bV))(\mathbf{x}^{{ \mathsf T}}_i V)^2\nonumber \\&\le \frac{1}{n}\sum ^n_{i=1}C_2\tau _i (\mathbf{x}^{{ \mathsf T}}_i V)^2, \end{aligned}$$
(30)

where we have used the inequality \(\Phi _f^{\prime \prime } \le C_2\). Plugging (30) into (29) we have

$$\begin{aligned} L(\varvec{\beta }\mid \mathbf{D})&\le L(\varvec{\beta }^* \mid \mathbf{D})+(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}}\nabla L(\varvec{\beta }^* | \mathbf{D})\\&+\frac{1}{2}(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}} \mathbf{H}( \varvec{\beta }- \varvec{\beta }^*), \end{aligned}$$

with \( \mathbf{H}=\frac{C_2}{n}\mathbf{X}^{{ \mathsf T}} \varvec{\Gamma }\mathbf{X}. \) \(\square \)
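As an illustration of Part (2), added here for concreteness and not part of the original proof, take the logistic loss \(\Phi (y,f)=\log (1+e^{-yf})\) with \(y \in \{-1,1\}\). Its second derivative in \(f\) satisfies

$$\begin{aligned} \Phi _f^{\prime \prime }(y,f)=\frac{e^{-yf}}{(1+e^{-yf})^{2}} \le \frac{1}{4}, \end{aligned}$$

so the bounded-curvature condition holds with \(C_2=1/4\), and the quadratic majorization applies with \( \mathbf{H}=\frac{1}{4n}\mathbf{X}^{{ \mathsf T}} \varvec{\Gamma }\mathbf{X}. \)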

Cite this article

Yang, Y., Zou, H. A fast unified algorithm for solving group-lasso penalize learning problems. Stat Comput 25, 1129–1141 (2015). https://doi.org/10.1007/s11222-014-9498-5
