Abstract
The least absolute shrinkage and selection operator (lasso) has been widely used in regression analysis. Exploiting the piecewise linearity of the lasso solution path, least angle regression provides an efficient algorithm for computing that path. Group lasso is an important generalization of lasso that can be applied to regression with grouped variables. However, the solution path of group lasso is not piecewise linear and hence cannot be obtained by least angle regression. By transforming the problem into a system of differential equations, we develop an algorithm for efficient computation of group lasso solution paths. Simulation studies are conducted to compare the proposed algorithm with the best existing algorithm, the groupwise-majorization-descent algorithm.
References
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al.: Least angle regression. Ann. Stat. 32, 407–499 (2004)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1 (2010)
Fu, W.J.: Penalized regressions: the bridge versus the lasso. J. Comput. Graph. Stat. 7, 397–416 (1998)
Hindmarsh, A.C.: ODEPACK, a systematized collection of ODE solvers. In: Stepleman, R.S., et al. (eds.) IMACS Transactions on Scientific Computation, vol. 1, pp. 55–64. North-Holland, Amsterdam (1983)
Hosmer Jr., D.W., Lemeshow, S., Sturdivant, R.X.: Applied Logistic Regression, vol. 398. Wiley, New York (2013)
Liu, J., Ji, S., Ye, J.: Slep: Sparse learning with efficient projections. Arizona State Univ. 6, 491 (2009)
Osborne, M.R., Presnell, B., Turlach, B.A.: On the lasso and its dual. J. Comput. Graph. Stat. 9, 319–337 (2000)
Radhakrishnan, K., Hindmarsh, A.C.: Description and use of LSODE, the Livermore solver for ordinary differential equations. National Aeronautics and Space Administration, Office of Management, Scientific and Technical Information Program (1993)
Rosset, S., Zhu, J.: Piecewise linear regularized solution paths. Ann. Stat. 35, 1012–1030 (2007)
Roth, V., Fischer, B.: The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms. In: Proceedings of the 25th International Conference on Machine learning, pp. 848–855. ACM (2008)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)
Tibshirani, R.J.: The Solution Path of the Generalized Lasso. Stanford University, Stanford (2011)
Tibshirani, R.J., et al.: The lasso problem and uniqueness. Electron. J. Stat. 7, 1456–1490 (2013)
Yang, Y., Zou, H.: A fast unified algorithm for solving group-lasso penalize learning problems. Stat. Comput., 1–13 (2014)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68, 49–67 (2006)
Zhou, H., Wu, Y.: A generic path algorithm for regularized statistical estimation. J. Am. Stat. Assoc. 109, 686–699 (2014)
Acknowledgments
We would like to thank the associate editor and the anonymous referees for their constructive suggestions and comments. Research is supported in part by HKSAR-RGC-ECS 405012 and HKSAR-RGC-GRF 405113, 14601015.
Appendix
1.1 Proof of propositions and lemmas
Proof of Proposition 1
Suppose that the dimension of \(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h\) is \(p' \times p'\). For any non-zero vector z of size \(p'\), we have \(X_\mathcal {J}z \ne 0\), because the columns of \(X_\mathcal {J}\) are linearly independent and hence \(X_\mathcal {J}z=0\) only if \(z=0\). Hence, we have \(z^TX_\mathcal {J}^TX_\mathcal {J}z=(X_\mathcal {J}z)^T(X_\mathcal {J}z)>0\). Also, \(\nabla ^2 h\) is a block diagonal matrix expressed as
\[\nabla ^2 h=\mathrm {diag}(D_k)_{k \in \mathcal {J}},\]
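where \(D_k=w_k(\Vert \beta _k\Vert ^2I-\beta _k\beta _k^T)/\Vert \beta _k\Vert ^3\), \(k \in \mathcal {J}\). Partitioning z conformably as \(z=[\cdots z_k^T \cdots ]^T\), \(k \in \mathcal {J}\), and applying the Cauchy-Schwarz inequality \((\beta _k^Tz_k)^2\le \Vert \beta _k\Vert ^2\Vert z_k\Vert ^2\) to each block, we obtain
\[z^T(\nabla ^2 h)z=\sum _{k \in \mathcal {J}}z_k^TD_kz_k=\sum _{k \in \mathcal {J}}\frac{w_k\left( \Vert \beta _k\Vert ^2\Vert z_k\Vert ^2-(\beta _k^Tz_k)^2\right) }{\Vert \beta _k\Vert ^3}\ge 0.\]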
Therefore, we have \(z^T(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h)z=z^TX_\mathcal {J}^TX_\mathcal {J}z+\lambda z^T(\nabla ^2 h)z>0\), which implies that \(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h\) is positive definite. \(\square \)
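As a numerical illustration of Proposition 1, the following minimal Python sketch builds the block diagonal matrix \(\nabla ^2 h\) from the blocks \(D_k\) and checks that \(X_\mathcal {J}^TX_\mathcal {J}+\lambda \nabla ^2 h\) has strictly positive eigenvalues. The random data, the group sizes, the weights \(w_k=\sqrt{p_k}\), the value of \(\lambda \), and the helper name `penalty_hessian` are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def penalty_hessian(beta_groups, weights):
    """Block diagonal matrix with blocks
    D_k = w_k (||b_k||^2 I - b_k b_k^T) / ||b_k||^3 (hypothetical helper)."""
    total = sum(len(b) for b in beta_groups)
    H = np.zeros((total, total))
    start = 0
    for b, w in zip(beta_groups, weights):
        m = len(b)
        norm = np.linalg.norm(b)
        D = w * (norm**2 * np.eye(m) - np.outer(b, b)) / norm**3
        H[start:start + m, start:start + m] = D
        start += m
    return H

rng = np.random.default_rng(0)
n = 50
# Assumed active groups with non-zero coefficient blocks.
beta_groups = [rng.normal(size=3), rng.normal(size=2), rng.normal(size=4)]
weights = [np.sqrt(len(b)) for b in beta_groups]  # assumed weights
p_active = sum(len(b) for b in beta_groups)
X_J = rng.normal(size=(n, p_active))  # full column rank with probability one
lam = 0.7  # assumed penalty level

H = penalty_hessian(beta_groups, weights)
M = X_J.T @ X_J + lam * H

# H is only positive semidefinite (each D_k annihilates beta_k),
# while M inherits strict positivity from X_J^T X_J.
min_eig_H = np.linalg.eigvalsh(H).min()
min_eig_M = np.linalg.eigvalsh(M).min()
print(min_eig_H, min_eig_M)
```

The check mirrors the two steps of the proof: the penalty term contributes a non-negative quadratic form, and the linear independence of the columns of \(X_\mathcal {J}\) supplies the strict inequality.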
The following lemma is used to prove Proposition 2.
Lemma 1
\(\bar{h}(\beta )\) is bounded for \(\beta \) in \(\mathcal {G}\), where \(\mathcal {G}=\{\hat{\beta }(\lambda ), \lambda \ge 0\}\).
Proof of Lemma 1
An equivalent definition of \(\hat{\beta }(\lambda )\) is that it solves the constrained problem
\[\min _{\beta }q(\beta )\quad \text {subject to}\quad \bar{h}(\beta )\le t \tag{12}\]
for some \(t \ge 0\); see Roth and Fischer (2008). Let \(\beta _{OLS}\) be the ordinary least squares estimate of the regression problem. In other words, \(\beta _{OLS}={{\mathrm{arg\,min}}}_{\beta }q(\beta )\). If \(t \le \bar{h}(\beta _{OLS})\), the corresponding \(\hat{\beta }(\lambda )\) is the solution of (12); thus \(\bar{h}(\hat{\beta }(\lambda ))\le t \le \bar{h}(\beta _{OLS})\). On the other hand, if \(t > \bar{h}(\beta _{OLS})\), the constraint in (12) is inactive and the group lasso estimate is the ordinary least squares estimate \(\beta _{OLS}\). Combining the two cases, we have \(\bar{h}(\hat{\beta }(\lambda )) \le \bar{h}(\beta _{OLS})\) for all \(\lambda \). \(\square \)
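The bound in Lemma 1 can be checked numerically in a special case. The sketch below assumes \(q(\beta )=\frac{1}{2}\Vert y-X\beta \Vert ^2\), \(\bar{h}(\beta )=\sum _k w_k\Vert \beta _k\Vert \), and a design with orthonormal columns, for which the group lasso solution has the well-known closed form of groupwise soft-thresholding. These definitions, the simulated data, the groups, and the function names are assumptions made for illustration, not the authors' setup.

```python
import numpy as np

def h_bar(beta, groups, weights):
    """Weighted sum of group norms (assumed form of h-bar)."""
    return sum(w * np.linalg.norm(beta[g]) for g, w in zip(groups, weights))

def group_lasso_orthonormal(z, groups, weights, lam):
    """Minimizer of 0.5*||y - X b||^2 + lam * sum_k w_k ||b_k||
    when X^T X = I and z = X^T y (groupwise soft-thresholding)."""
    beta = np.zeros_like(z)
    for g, w in zip(groups, weights):
        norm = np.linalg.norm(z[g])
        if norm > lam * w:
            beta[g] = (1.0 - lam * w / norm) * z[g]
    return beta

rng = np.random.default_rng(1)
n, p = 40, 6
X, _ = np.linalg.qr(rng.normal(size=(n, p)))  # orthonormal columns
y = rng.normal(size=n)
groups = [np.arange(0, 2), np.arange(2, 5), np.arange(5, 6)]
weights = [np.sqrt(len(g)) for g in groups]  # assumed weights

beta_ols = X.T @ y  # OLS estimate, since X^T X = I
bound = h_bar(beta_ols, groups, weights)

lams = np.linspace(0.0, 3.0, 31)
path_values = [
    h_bar(group_lasso_orthonormal(beta_ols, groups, weights, lam), groups, weights)
    for lam in lams
]
print(bound, max(path_values))
```

In this special case \(\bar{h}(\hat{\beta }(\lambda ))=\sum _k w_k(\Vert \beta _{OLS,k}\Vert -\lambda w_k)_+\), which is non-increasing in \(\lambda \) and equals \(\bar{h}(\beta _{OLS})\) at \(\lambda =0\), in agreement with the lemma.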
Proof of Proposition 2
From the definition of f, we have
where \(\lim _{\Delta \lambda \rightarrow 0}\xi (\Delta \lambda )=0\). By (13) and (14), we have
which, combined with Lemma 1, implies that
Also, since \(\hat{\beta }(\lambda )\) minimizes \(f_{\lambda }\), we have \(f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))-f_{\lambda }(\hat{\beta }(\lambda ))\ge 0\), and hence
Suppose that \(\lim _{\Delta \lambda \rightarrow 0}\hat{\beta }(\lambda +\Delta \lambda )\ne \hat{\beta }(\lambda )\). Then there exists some \(\epsilon >0\) such that, for every \(\delta >0\), there is a \(\Delta \lambda \) with \(|\Delta \lambda |<\delta \) and \(\Vert \hat{\beta }(\lambda +\Delta \lambda )-\hat{\beta }(\lambda )\Vert \ge \epsilon \). However, \(\hat{\beta }(\lambda )\) is the unique minimizer of \(f_{\lambda }\), so \(f_{\lambda }(\hat{\beta }(\lambda +\Delta \lambda ))-f_{\lambda }(\hat{\beta }(\lambda ))>\eta \) for some \(\eta >0\) and all such \(\Delta \lambda \), contradicting (16). Thus, \(\lim _{\Delta \lambda \rightarrow 0}\hat{\beta }(\lambda +\Delta \lambda )=\hat{\beta }(\lambda )\), and the proof is completed. \(\square \)
Cite this article
Yau, C.Y., Hui, T.S. LARS-type algorithm for group lasso. Stat Comput 27, 1041–1048 (2017). https://doi.org/10.1007/s11222-016-9669-7
DOI: https://doi.org/10.1007/s11222-016-9669-7