LARS-type algorithm for group lasso

Abstract

The least absolute shrinkage and selection operator (lasso) has been widely used in regression analysis. Based on the piecewise linear property of the solution path, least angle regression provides an efficient algorithm for computing the solution paths of lasso. Group lasso is an important generalization of lasso that can be applied to regression with grouped variables. However, the solution path of group lasso is not piecewise linear and hence cannot be obtained by least angle regression. By transforming the problem into a system of differential equations, we develop an algorithm for efficient computation of group lasso solution paths. Simulation studies are conducted to compare the proposed algorithm with the best existing algorithm, the groupwise-majorization-descent algorithm.
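
To make the differential-equation formulation concrete, here is a minimal sketch (under stated assumptions, not the paper's full algorithm) for the squared-error loss: on a fixed set of active groups, differentiating the stationarity condition \(X_\mathcal {J}^T(X_\mathcal {J}\beta -y)+\lambda \nabla h=0\) with respect to \(\lambda \) yields \(\mathrm {d}\beta /\mathrm {d}\lambda =-(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h)^{-1}\nabla h\), involving the matrix studied in Proposition 1 of the Appendix, which a standard ODE solver (e.g. those in ODEPACK, Hindmarsh 1983) can integrate. The data, groups and weights below are made up, and the events where groups enter or leave the active set are ignored.

```python
# A minimal sketch (not the paper's full algorithm) of the ODE view of the
# group lasso path under squared-error loss. On a fixed active set, the
# stationarity condition X^T (X b - y) + lam * grad_h(b) = 0, differentiated
# in lam, gives  db/dlam = -(X^T X + lam * hess_h(b))^{-1} grad_h(b),
# where hess_h is block diagonal with blocks w_k (||b_k||^2 I - b_k b_k^T)/||b_k||^3.
# Group entry/exit events are ignored; the data, groups and weights are made up.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 10)]
n, p = 100, 10
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)
w = np.array([np.sqrt(len(g)) for g in groups])      # common choice of group weights

def grad_and_hess(beta):
    """Gradient and Hessian of h(beta) = sum_k w_k ||beta_k||, built blockwise."""
    g, H = np.zeros(p), np.zeros((p, p))
    for wk, idx in zip(w, groups):
        bk, nk = beta[idx], np.linalg.norm(beta[idx])
        g[idx] = wk * bk / nk
        H[np.ix_(idx, idx)] = wk * (nk**2 * np.eye(len(idx)) - np.outer(bk, bk)) / nk**3
    return g, H

def rhs(lam, beta):
    g, H = grad_and_hess(beta)
    return -np.linalg.solve(X.T @ X + lam * H, g)     # path derivative d(beta)/d(lambda)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]       # the path starts at OLS (lambda = 0)
sol = solve_ivp(rhs, (0.0, 5.0), beta_ols)            # valid while every group stays active
print(sol.y[:, -1])                                   # shrunken coefficients at lambda = 5
```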

References

  • Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al.: Least angle regression. Ann. Stat. 32, 407–499 (2004)

  • Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1 (2010)

  • Fu, W.J.: Penalized regressions: the bridge versus the lasso. J. Comput. Graph. Stat. 7, 397–416 (1998)

  • Hindmarsh, A.C.: ODEPACK, a systematized collection of ODE solvers. In: Stepleman, R.S., et al. (eds.) IMACS Transactions on Scientific Computation, vol. 1, pp. 55–64. North-Holland, Amsterdam (1983)

  • Hosmer Jr., D.W., Lemeshow, S., Sturdivant, R.X.: Applied Logistic Regression, vol. 398. Wiley, New York (2013)

  • Liu, J., Ji, S., Ye, J.: SLEP: sparse learning with efficient projections. Arizona State Univ. 6, 491 (2009)

  • Osborne, M.R., Presnell, B., Turlach, B.A.: On the lasso and its dual. J. Comput. Graph. Stat. 9, 319–337 (2000)

  • Radhakrishnan, K., Hindmarsh, A.C.: Description and use of LSODE, the Livermore solver for ordinary differential equations. National Aeronautics and Space Administration, Office of Management, Scientific and Technical Information Program (1993)

  • Rosset, S., Zhu, J.: Piecewise linear regularized solution paths. Ann. Stat. 35, 1012–1030 (2007)

  • Roth, V., Fischer, B.: The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms. In: Proceedings of the 25th International Conference on Machine Learning, pp. 848–855. ACM (2008)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)

  • Tibshirani, R.J.: The Solution Path of the Generalized Lasso. Stanford University, Stanford (2011)

  • Tibshirani, R.J., et al.: The lasso problem and uniqueness. Electron. J. Stat. 7, 1456–1490 (2013)

  • Yang, Y., Zou, H.: A fast unified algorithm for solving group-lasso penalized learning problems. Stat. Comput., 1–13 (2014)

  • Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68, 49–67 (2006)

  • Zhou, H., Wu, Y.: A generic path algorithm for regularized statistical estimation. J. Am. Stat. Assoc. 109, 686–699 (2014)

Acknowledgments

We would like to thank the associate editor and the anonymous referees for their constructive suggestions and comments. Research is supported in part by HKSAR-RGC-ECS 405012 and HKSAR-RGC-GRF 405113, 14601015.

Author information

Corresponding author

Correspondence to Chun Yip Yau.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 355 KB)

Appendix

1.1 Proofs of propositions and lemmas

Proof of Proposition 1

Suppose that the dimension of \(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h\) is \(p' \times p'\). For any non-zero vector \(z\) of size \(p'\), we have \(X_\mathcal {J}z \ne 0\), since the columns of \(X_\mathcal {J}\) are linearly independent and so \(X_\mathcal {J}z=0\) only if \(z=0\). Hence, we have \(z^TX_\mathcal {J}^TX_\mathcal {J}z=(X_\mathcal {J}z)^T(X_\mathcal {J}z)>0\). Also, \(\nabla ^2 h\) is a block diagonal matrix expressed as

$$\begin{aligned} \nabla ^2 h=\left[ \begin{array}{ccc} \ddots &{} &{} 0\\ &{} D_k &{} \\ 0 &{} &{} \ddots \\ \end{array}\right] , \end{aligned}$$

where \(D_k=w_k(\Vert \beta _k\Vert ^2I-\beta _k\beta _k^T)/\Vert \beta _k\Vert ^3\) for \(k \in \mathcal {J}\). By the Cauchy-Schwarz inequality and the partition \(z=[\cdots z_k^T \cdots ]^T\), \(k \in \mathcal {J}\), we obtain

$$\begin{aligned} \begin{aligned}&z^T(\nabla ^2 h)z\\&\quad =\sum _{k\in \mathcal {J}}z_k^TD_kz_k\\&\quad =\sum _{k\in \mathcal {J}}w_k[(\beta _k^T\beta _k)(z_k^Tz_k)-(z_k^T\beta _k)(\beta _k^Tz_k)]/\Vert \beta _k\Vert ^3\ge 0. \end{aligned} \end{aligned}$$

Therefore, we have \(z^T(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h)z=z^TX_\mathcal {J}^TX_\mathcal {J}z+\lambda z^T(\nabla ^2 h)z>0\), which implies that \(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h\) is positive definite. \(\square \)
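
As an illustrative aside (not from the paper), Proposition 1 can also be checked numerically: construct the block diagonal Hessian from the \(D_k\) above for a random \(X_\mathcal {J}\) with linearly independent columns, random nonzero group coefficients and some \(\lambda >0\), and confirm that the smallest eigenvalue of \(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h\) is positive. The group sizes, weights and \(\lambda \) below are arbitrary.

```python
# Illustrative numerical check of Proposition 1 (not from the paper): for a
# random X_J with linearly independent columns, nonzero group coefficients and
# lambda > 0, the matrix X_J^T X_J + lambda * hess_h is positive definite.
import numpy as np

rng = np.random.default_rng(1)
groups = [np.arange(0, 2), np.arange(2, 5)]           # two active groups, p' = 5
p = 5
XJ = rng.standard_normal((50, p))                     # 50 rows, so columns are a.s. independent
beta = rng.standard_normal(p)
w = np.array([np.sqrt(len(g)) for g in groups])
lam = 0.7

H = np.zeros((p, p))
for wk, idx in zip(w, groups):
    bk, nk = beta[idx], np.linalg.norm(beta[idx])
    # D_k = w_k (||b_k||^2 I - b_k b_k^T) / ||b_k||^3, as in the proof above
    H[np.ix_(idx, idx)] = wk * (nk**2 * np.eye(len(idx)) - np.outer(bk, bk)) / nk**3

M = XJ.T @ XJ + lam * H
print(np.linalg.eigvalsh(M).min() > 0)                # True: smallest eigenvalue is positive
```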

The following lemma is used to prove Proposition 2.

Lemma 1

\(\bar{h}(\beta )\) is bounded for \(\beta \) in \(\mathcal {G}\), where \(\mathcal {G}=\{\hat{\beta }(\lambda ), \lambda \ge 0\}\).

Proof of Lemma 1

An equivalent definition for \(\hat{\beta }(\lambda )\) is that \(\hat{\beta }(\lambda )\) is the solution of

$$\begin{aligned} \min _{\beta }q(\beta ) \text { such that } \bar{h}(\beta )\le t\,, \end{aligned}$$
(12)

for some \(t \ge 0\); see Roth and Fischer (2008). Let \(\beta _{OLS}\) be the ordinary least squares estimate of the regression problem, that is, \(\beta _{OLS}={{\mathrm{arg\,min}}}_{\beta }q(\beta )\). If \(t \le \bar{h}(\beta _{OLS})\), then the corresponding \(\hat{\beta }(\lambda )\) is the solution of (12), and thus \(\bar{h}(\hat{\beta }(\lambda ))\le t \le \bar{h}(\beta _{OLS})\). On the other hand, if \(t > \bar{h}(\beta _{OLS})\), the group lasso estimate is simply the ordinary least squares estimate \(\beta _{OLS}\). Combining the two cases, we have \(\bar{h}(\hat{\beta }(\lambda )) \le \bar{h}(\beta _{OLS})\) for all \(\lambda \). \(\square \)
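
As a sanity check (again illustrative, not part of the paper), one can solve the group lasso over a grid of \(\lambda \) values with a simple proximal-gradient loop using blockwise soft-thresholding and verify that \(\bar{h}(\hat{\beta }(\lambda ))\) never exceeds \(\bar{h}(\beta _{OLS})\). The data, iteration count and \(\lambda \) grid below are hypothetical.

```python
# Illustrative check of Lemma 1 (not part of the paper): solve the group lasso
# on hypothetical data for several lambda values with a basic proximal-gradient
# loop (blockwise soft-thresholding) and verify h_bar(beta_hat) <= h_bar(beta_OLS).
import numpy as np

rng = np.random.default_rng(2)
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 10)]
n, p = 80, 10
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)
w = np.array([np.sqrt(len(g)) for g in groups])

def h_bar(beta):
    return sum(wk * np.linalg.norm(beta[idx]) for wk, idx in zip(w, groups))

def group_lasso(lam, n_iter=5000):
    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2            # 1 / Lipschitz constant of grad q
    for _ in range(n_iter):
        z = beta - step * X.T @ (X @ beta - y)        # gradient step on q(beta)
        for wk, idx in zip(w, groups):                # proximal step: group soft-thresholding
            nk = np.linalg.norm(z[idx])
            beta[idx] = max(0.0, 1.0 - step * lam * wk / nk) * z[idx] if nk > 0 else 0.0
    return beta

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(all(h_bar(group_lasso(lam)) <= h_bar(beta_ols) + 1e-8 for lam in (0.1, 1.0, 10.0, 100.0)))
```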

Proof of Proposition 2

From the definition of f, we have

$$\begin{aligned}&f_{\lambda +\Delta \lambda }(\hat{\beta }(\lambda +\Delta \lambda )) \le f_{\lambda +\Delta \lambda }(\hat{\beta }(\lambda ))\quad \text { and} \end{aligned}$$
(13)
$$\begin{aligned}&f_{\lambda +\Delta \lambda }(\hat{\beta }(\lambda )) = f_{\lambda }(\hat{\beta }(\lambda ))+\xi (\Delta \lambda )\,, \end{aligned}$$
(14)

where \(\lim _{\Delta \lambda \rightarrow 0}\xi (\Delta \lambda )=0\). By (13) and (14), we have

$$\begin{aligned} f_{\lambda +\Delta \lambda }(\hat{\beta }(\lambda +\Delta \lambda ))\le f_{\lambda }(\hat{\beta }(\lambda ))+\xi (\Delta \lambda ), \end{aligned}$$

which, combined with Lemma 1, implies that

$$\begin{aligned} \begin{aligned}&f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))-f_{\lambda }(\hat{\beta }(\lambda ))\\&\quad \le f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))-f_{\lambda +\Delta \lambda }(\hat{\beta }(\lambda +\Delta \lambda ))+\xi (\Delta \lambda )\\&\quad \le \sup _{\beta \in \mathcal {G}}|f_{\lambda }(\beta )-f_{\lambda +\Delta \lambda }(\beta )|+\xi (\Delta \lambda )\\&\quad =\sup _{\beta \in \mathcal {G}}|\Delta \lambda \bar{h}(\beta )|+\xi (\Delta \lambda )\\&\quad =|\Delta \lambda |\sup _{\beta \in \mathcal {G}}|\bar{h}(\beta )|+\xi (\Delta \lambda )\\&\quad \rightarrow 0\,,\quad \text { as }\Delta \lambda \rightarrow 0\,. \end{aligned} \end{aligned}$$
(15)

Also, it is obvious that \(f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))-f_{\lambda }(\hat{\beta }(\lambda ))\ge 0\) and hence

$$\begin{aligned} \lim _{\Delta \lambda \rightarrow 0}f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))=f_{\lambda }(\hat{\beta }(\lambda ))\,. \end{aligned}$$
(16)

Suppose that \(\hat{\beta }(\lambda +\Delta \lambda )\) does not converge to \(\hat{\beta }(\lambda )\) as \(\Delta \lambda \rightarrow 0\). Then there exists some \(\epsilon >0\) such that, for every \(\delta >0\), there is a \(\Delta \lambda \) with \(|\Delta \lambda |<\delta \) and \(\Vert \hat{\beta }(\lambda +\Delta \lambda )-\hat{\beta }(\lambda )\Vert \ge \epsilon \). Since \(\hat{\beta }(\lambda )\) is the unique minimizer of \(f_{\lambda }\), this implies \(f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))-f_{\lambda }(\hat{\beta }(\lambda ))>\eta \) for some \(\eta >0\), contradicting (16). Thus, \(\lim _{\Delta \lambda \rightarrow 0}\hat{\beta }(\lambda +\Delta \lambda )=\hat{\beta }(\lambda )\), and the proof is completed. \(\square \)

Cite this article

Yau, C.Y., Hui, T.S. LARS-type algorithm for group lasso. Stat Comput 27, 1041–1048 (2017). https://doi.org/10.1007/s11222-016-9669-7
