LARS-type algorithm for group lasso

Abstract

The least absolute shrinkage and selection operator (lasso) has been widely used in regression analysis. Based on the piecewise linear property of the solution path, least angle regression provides an efficient algorithm for computing the solution paths of lasso. Group lasso is an important generalization of lasso that can be applied to regression with grouped variables. However, the solution path of group lasso is not piecewise linear and hence cannot be obtained by least angle regression. By transforming the problem into a system of differential equations, we develop an algorithm for efficient computation of group lasso solution paths. Simulation studies are conducted to compare the proposed algorithm with the best existing algorithm, the groupwise-majorization-descent algorithm.
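
To make the differential-equation formulation concrete, here is a minimal sketch (under stated assumptions, not the paper's full algorithm) for the squared-error loss: on a fixed set of active groups, differentiating the stationarity condition \(X_\mathcal {J}^T(X_\mathcal {J}\beta -y)+\lambda \nabla h=0\) with respect to \(\lambda \) yields \(\mathrm {d}\beta /\mathrm {d}\lambda =-(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h)^{-1}\nabla h\), involving the matrix studied in Proposition 1 of the Appendix, which a standard ODE solver (e.g. those in ODEPACK, Hindmarsh 1983) can integrate. The data, groups and weights below are made up, and the events where groups enter or leave the active set are ignored.

```python
# A minimal sketch (not the paper's full algorithm) of the ODE view of the
# group lasso path under squared-error loss. On a fixed active set, the
# stationarity condition X^T (X b - y) + lam * grad_h(b) = 0, differentiated
# in lam, gives  db/dlam = -(X^T X + lam * hess_h(b))^{-1} grad_h(b),
# where hess_h is block diagonal with blocks w_k (||b_k||^2 I - b_k b_k^T)/||b_k||^3.
# Group entry/exit events are ignored; the data, groups and weights are made up.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 10)]
n, p = 100, 10
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)
w = np.array([np.sqrt(len(g)) for g in groups])      # common choice of group weights

def grad_and_hess(beta):
    """Gradient and Hessian of h(beta) = sum_k w_k ||beta_k||, built blockwise."""
    g, H = np.zeros(p), np.zeros((p, p))
    for wk, idx in zip(w, groups):
        bk, nk = beta[idx], np.linalg.norm(beta[idx])
        g[idx] = wk * bk / nk
        H[np.ix_(idx, idx)] = wk * (nk**2 * np.eye(len(idx)) - np.outer(bk, bk)) / nk**3
    return g, H

def rhs(lam, beta):
    g, H = grad_and_hess(beta)
    return -np.linalg.solve(X.T @ X + lam * H, g)     # path derivative d(beta)/d(lambda)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]       # the path starts at OLS (lambda = 0)
sol = solve_ivp(rhs, (0.0, 5.0), beta_ols)            # valid while every group stays active
print(sol.y[:, -1])                                   # shrunken coefficients at lambda = 5
```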

References

  • Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al.: Least angle regression. Ann. Stat. 32, 407–499 (2004)

  • Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1 (2010)

  • Fu, W.J.: Penalized regressions: the bridge versus the lasso. J. Comput. Graph. Stat. 7, 397–416 (1998)

  • Hindmarsh, A.C.: ODEPACK, a systematized collection of ODE solvers. In: Stepleman, R.S., et al. (eds.) IMACS Transactions on Scientific Computation, vol. 1, pp. 55–64. North-Holland, Amsterdam (1983)

  • Hosmer Jr., D.W., Lemeshow, S., Sturdivant, R.X.: Applied Logistic Regression, vol. 398. Wiley, New York (2013)

  • Liu, J., Ji, S., Ye, J.: SLEP: sparse learning with efficient projections. Arizona State Univ. 6, 491 (2009)

  • Osborne, M.R., Presnell, B., Turlach, B.A.: On the lasso and its dual. J. Comput. Graph. Stat. 9, 319–337 (2000)

  • Radhakrishnan, K., Hindmarsh, A.C.: Description and use of LSODE, the Livermore solver for ordinary differential equations. National Aeronautics and Space Administration, Office of Management, Scientific and Technical Information Program (1993)

  • Rosset, S., Zhu, J.: Piecewise linear regularized solution paths. Ann. Stat. 35, 1012–1030 (2007)

  • Roth, V., Fischer, B.: The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms. In: Proceedings of the 25th International Conference on Machine Learning, pp. 848–855. ACM (2008)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)

  • Tibshirani, R.J.: The Solution Path of the Generalized Lasso. Stanford University, Stanford (2011)

  • Tibshirani, R.J., et al.: The lasso problem and uniqueness. Electron. J. Stat. 7, 1456–1490 (2013)

  • Yang, Y., Zou, H.: A fast unified algorithm for solving group-lasso penalized learning problems. Stat. Comput., 1–13 (2014)

  • Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68, 49–67 (2006)

  • Zhou, H., Wu, Y.: A generic path algorithm for regularized statistical estimation. J. Am. Stat. Assoc. 109, 686–699 (2014)

Acknowledgments

We would like to thank the associate editor and the anonymous referees for their constructive suggestions and comments. Research is supported in part by HKSAR-RGC-ECS 405012 and HKSAR-RGC-GRF 405113, 14601015.

Author information

Corresponding author

Correspondence to Chun Yip Yau.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 355 KB)

Appendix

1.1 Proofs of propositions and lemmas

Proof of Proposition 1

Suppose that the dimension of \(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h\) is \(p' \times p'\). For any non-zero vector \(z\) of size \(p'\), we have \(X_\mathcal {J}z \ne 0\), since the columns of \(X_\mathcal {J}\) are linearly independent and so \(X_\mathcal {J}z=0\) only if \(z=0\). Hence, we have \(z^TX_\mathcal {J}^TX_\mathcal {J}z=(X_\mathcal {J}z)^T(X_\mathcal {J}z)>0\). Also, \(\nabla ^2 h\) is a block diagonal matrix expressed as

$$\begin{aligned} \nabla ^2 h=\left[ \begin{array}{ccc} \ddots &{} &{} 0\\ &{} D_k &{} \\ 0 &{} &{} \ddots \\ \end{array}\right] , \end{aligned}$$

where \(D_k=w_k(\Vert \beta _k\Vert ^2I-\beta _k\beta _k^T)/\Vert \beta _k\Vert ^3\) for \(k \in \mathcal {J}\). By the Cauchy-Schwarz inequality and the partition \(z=[\cdots z_k^T \cdots ]^T\), \(k \in \mathcal {J}\), we obtain

$$\begin{aligned} \begin{aligned}&z^T(\nabla ^2 h)z\\&\quad =\sum _{k\in \mathcal {J}}z_k^TD_kz_k\\&\quad =\sum _{k\in \mathcal {J}}w_k[(\beta _k^T\beta _k)(z_k^Tz_k)-(z_k^T\beta _k)(\beta _k^Tz_k)]/\Vert \beta _k\Vert ^3\ge 0. \end{aligned} \end{aligned}$$

Therefore, we have \(z^T(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h)z=z^TX_\mathcal {J}^TX_\mathcal {J}z+\lambda z^T(\nabla ^2 h)z>0\), which implies that \(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h\) is positive definite. \(\square \)
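
As an illustrative aside (not from the paper), Proposition 1 can also be checked numerically: construct the block diagonal Hessian from the \(D_k\) above for a random \(X_\mathcal {J}\) with linearly independent columns, random nonzero group coefficients and some \(\lambda >0\), and confirm that the smallest eigenvalue of \(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h\) is positive. The group sizes, weights and \(\lambda \) below are arbitrary.

```python
# Illustrative numerical check of Proposition 1 (not from the paper): for a
# random X_J with linearly independent columns, nonzero group coefficients and
# lambda > 0, the matrix X_J^T X_J + lambda * hess_h is positive definite.
import numpy as np

rng = np.random.default_rng(1)
groups = [np.arange(0, 2), np.arange(2, 5)]           # two active groups, p' = 5
p = 5
XJ = rng.standard_normal((50, p))                     # 50 rows, so columns are a.s. independent
beta = rng.standard_normal(p)
w = np.array([np.sqrt(len(g)) for g in groups])
lam = 0.7

H = np.zeros((p, p))
for wk, idx in zip(w, groups):
    bk, nk = beta[idx], np.linalg.norm(beta[idx])
    # D_k = w_k (||b_k||^2 I - b_k b_k^T) / ||b_k||^3, as in the proof above
    H[np.ix_(idx, idx)] = wk * (nk**2 * np.eye(len(idx)) - np.outer(bk, bk)) / nk**3

M = XJ.T @ XJ + lam * H
print(np.linalg.eigvalsh(M).min() > 0)                # True: smallest eigenvalue is positive
```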

The following lemma is used to prove Proposition 2.

Lemma 1

\(\bar{h}(\beta )\) is bounded for \(\beta \) in \(\mathcal {G}\), where \(\mathcal {G}=\{\hat{\beta }(\lambda ), \lambda \ge 0\}\).

Proof of Lemma 1

An equivalent definition for \(\hat{\beta }(\lambda )\) is that \(\hat{\beta }(\lambda )\) is the solution of

$$\begin{aligned} \min _{\beta }q(\beta ) \text { such that } \bar{h}(\beta )\le t\,, \end{aligned}$$
(12)

for some \(t \ge 0\); see Roth and Fischer (2008). Let \(\beta _{OLS}\) be the ordinary least squares estimate of the regression problem, that is, \(\beta _{OLS}={{\mathrm{arg\,min}}}_{\beta }q(\beta )\). If \(t \le \bar{h}(\beta _{OLS})\), then the corresponding \(\hat{\beta }(\lambda )\) is the solution of (12), and thus \(\bar{h}(\hat{\beta }(\lambda ))\le t \le \bar{h}(\beta _{OLS})\). On the other hand, if \(t > \bar{h}(\beta _{OLS})\), the group lasso estimate is simply the ordinary least squares estimate \(\beta _{OLS}\). Combining the two cases, we have \(\bar{h}(\hat{\beta }(\lambda )) \le \bar{h}(\beta _{OLS})\) for all \(\lambda \). \(\square \)
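
As a sanity check (again illustrative, not part of the paper), one can solve the group lasso over a grid of \(\lambda \) values with a simple proximal-gradient loop using blockwise soft-thresholding and verify that \(\bar{h}(\hat{\beta }(\lambda ))\) never exceeds \(\bar{h}(\beta _{OLS})\). The data, iteration count and \(\lambda \) grid below are hypothetical.

```python
# Illustrative check of Lemma 1 (not part of the paper): solve the group lasso
# on hypothetical data for several lambda values with a basic proximal-gradient
# loop (blockwise soft-thresholding) and verify h_bar(beta_hat) <= h_bar(beta_OLS).
import numpy as np

rng = np.random.default_rng(2)
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 10)]
n, p = 80, 10
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)
w = np.array([np.sqrt(len(g)) for g in groups])

def h_bar(beta):
    return sum(wk * np.linalg.norm(beta[idx]) for wk, idx in zip(w, groups))

def group_lasso(lam, n_iter=5000):
    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2            # 1 / Lipschitz constant of grad q
    for _ in range(n_iter):
        z = beta - step * X.T @ (X @ beta - y)        # gradient step on q(beta)
        for wk, idx in zip(w, groups):                # proximal step: group soft-thresholding
            nk = np.linalg.norm(z[idx])
            beta[idx] = max(0.0, 1.0 - step * lam * wk / nk) * z[idx] if nk > 0 else 0.0
    return beta

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(all(h_bar(group_lasso(lam)) <= h_bar(beta_ols) + 1e-8 for lam in (0.1, 1.0, 10.0, 100.0)))
```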

Proof of Proposition 2

From the definition of f, we have

$$\begin{aligned}&f_{\lambda +\Delta \lambda }(\hat{\beta }(\lambda +\Delta \lambda )) \le f_{\lambda +\Delta \lambda }(\hat{\beta }(\lambda ))\quad \text { and} \end{aligned}$$
(13)
$$\begin{aligned}&f_{\lambda +\Delta \lambda }(\hat{\beta }(\lambda )) = f_{\lambda }(\hat{\beta }(\lambda ))+\xi (\Delta \lambda )\,, \end{aligned}$$
(14)

where \(\lim _{\Delta \lambda \rightarrow 0}\xi (\Delta \lambda )=0\). By (13) and (14), we have

$$\begin{aligned} f_{\lambda +\Delta \lambda }(\hat{\beta }(\lambda +\Delta \lambda ))\le f_{\lambda }(\hat{\beta }(\lambda ))+\xi (\Delta \lambda ), \end{aligned}$$

which, combined with Lemma 1, implies that

$$\begin{aligned} \begin{aligned}&f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))-f_{\lambda }(\hat{\beta }(\lambda ))\\&\quad \le f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))-f_{\lambda +\Delta \lambda }(\hat{\beta }(\lambda +\Delta \lambda ))+\xi (\Delta \lambda )\\&\quad \le \sup _{\beta \in \mathcal {G}}|f_{\lambda }(\beta )-f_{\lambda +\Delta \lambda }(\beta )|+\xi (\Delta \lambda )\\&\quad =\sup _{\beta \in \mathcal {G}}|\Delta \lambda \bar{h}(\beta )|+\xi (\Delta \lambda )\\&\quad =|\Delta \lambda |\sup _{\beta \in \mathcal {G}}|\bar{h}(\beta )|+\xi (\Delta \lambda )\\&\quad \rightarrow 0\,,\quad \text { as }\Delta \lambda \rightarrow 0\,. \end{aligned} \end{aligned}$$
(15)

Also, it is obvious that \(f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))-f_{\lambda }(\hat{\beta }(\lambda ))\ge 0\) and hence

$$\begin{aligned} \lim _{\Delta \lambda \rightarrow 0}f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))=f_{\lambda }(\hat{\beta }(\lambda ))\,. \end{aligned}$$
(16)

Suppose that \(\hat{\beta }(\lambda +\Delta \lambda )\) does not converge to \(\hat{\beta }(\lambda )\) as \(\Delta \lambda \rightarrow 0\). Then there exists some \(\epsilon >0\) such that, for every \(\delta >0\), there is a \(\Delta \lambda \) with \(|\Delta \lambda |<\delta \) and \(\Vert \hat{\beta }(\lambda +\Delta \lambda )-\hat{\beta }(\lambda )\Vert \ge \epsilon \). Since \(\hat{\beta }(\lambda )\) is the unique minimizer of \(f_{\lambda }\), this implies \(f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))-f_{\lambda }(\hat{\beta }(\lambda ))>\eta \) for some \(\eta >0\), contradicting (16). Thus, \(\lim _{\Delta \lambda \rightarrow 0}\hat{\beta }(\lambda +\Delta \lambda )=\hat{\beta }(\lambda )\), and the proof is completed. \(\square \)

Cite this article

Yau, C.Y., Hui, T.S. LARS-type algorithm for group lasso. Stat Comput 27, 1041–1048 (2017). https://doi.org/10.1007/s11222-016-9669-7
