APPLE: approximate path for penalized likelihood estimators

Yu, Yi; Feng, Yang

doi:10.1007/s11222-013-9403-7

Published: 01 June 2013

Volume 24, pages 803–819 (2014)

Statistics and Computing

Yi Yu¹ &
Yang Feng²

Abstract

In high-dimensional data analysis, penalized likelihood estimators have been shown to provide superior results in both variable selection and parameter estimation. A new algorithm, APPLE, is proposed for calculating the Approximate Path for Penalized Likelihood Estimators. Both convex penalties (such as the LASSO) and folded concave penalties (such as the MCP) are considered. APPLE efficiently computes the solution path for the penalized likelihood estimator using a hybrid of a modified predictor-corrector method and the coordinate-descent algorithm. APPLE is compared with several well-known packages via simulation and the analysis of two gene expression data sets.
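The path idea in the abstract pairs a cheap predictor (a guess for the solution at the next, smaller λ) with a corrector that refines the guess. The sketch below shows only that general pattern, under simplifying assumptions of our own: squared-error loss instead of a likelihood, the LASSO penalty, linear extrapolation through the last two solutions as the predictor, and cyclic coordinate descent as the corrector. It is not APPLE's implementation, and the names `soft_threshold`, `cd_lasso` and `lasso_path` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def cd_lasso(X, y, lam, beta, n_sweeps=200, tol=1e-10):
    """Corrector: cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam*||b||_1,
    started from `beta`. Assumes centred y and X with no all-zero column (no intercept)."""
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    beta = beta.copy()
    r = y - X @ beta                                   # residual at the starting point
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            z = X[:, j] @ r / n + col_sq[j] * old      # partial-residual correlation
            beta[j] = soft_threshold(z, lam) / col_sq[j]
            if beta[j] != old:
                r -= X[:, j] * (beta[j] - old)         # keep the residual in sync
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta

def lasso_path(X, y, n_lambda=20, ratio=0.01):
    """Solutions on a geometric grid lam_max > ... > ratio * lam_max."""
    n, p = X.shape
    lam_max = np.max(np.abs(X.T @ y)) / n              # smallest lam at which b = 0 is optimal
    lambdas = lam_max * np.logspace(0.0, np.log10(ratio), n_lambda)
    path = np.zeros((n_lambda, p))                     # row 0: the all-zero solution at lam_max
    for k in range(1, n_lambda):
        if k >= 2:
            # predictor: linear extrapolation in lambda through the last two solutions
            slope = (path[k - 1] - path[k - 2]) / (lambdas[k - 1] - lambdas[k - 2])
            guess = path[k - 1] + slope * (lambdas[k] - lambdas[k - 1])
        else:
            guess = path[k - 1]                        # cd_lasso copies its start point
        path[k] = cd_lasso(X, y, lambdas[k], guess)    # corrector
    return lambdas, path
```

Because the LASSO criterion is convex, coordinate descent reaches the same solution from any starting point, so in this sketch the predictor affects only how many sweeps the corrector needs, not the answer.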

Similar content being viewed by others

Variable selection and estimation using a continuous approximation to the $L_0$ penalty

Article 19 October 2016

High-dimensional variable selection with the plaid mixture model for clustering

Article 17 May 2018

Data-Adaptive Shrinkage via the Hyperpenalized EM Algorithm

Article 03 June 2015

Acknowledgements

The authors thank the editor, the associate editor, and referees for their constructive comments. The authors thank Diego Franco Saldaña for proofreading.

Author information

Authors and Affiliations

School of Mathematical Sciences, Fudan University, Shanghai, 200433, China
Yi Yu
Department of Statistics, Columbia University, New York, NY 10027, USA
Yang Feng

Corresponding author

Correspondence to Yang Feng.

Appendices

Appendix A: Logistic regression

A.1 LASSO

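In logistic regression, we assume that $(\boldsymbol{x}_i, y_i)$, $i=1,\ldots,n$, are i.i.d. with $\mathbb{P}(y_{i}=1|\boldsymbol{x}_{i})=p_{i}=\exp(\boldsymbol{\beta}'\boldsymbol{x}_{i})/(1+\exp(\boldsymbol{\beta}'\boldsymbol{x}_{i}))$. The target function for the LASSO-penalized logistic regression is then defined as

$$L(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=1}^n \big\{\log\big(1+\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)\big)-y_i(\boldsymbol{\beta}'\boldsymbol{x}_i)\big\}+\lambda\sum_{j=1}^p |\beta_j|. $$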

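The KKT conditions are given as follows. With $p_i$ evaluated at the minimizer $\widehat{\boldsymbol{\beta}}$ and $x_{i0}=1$ for the unpenalized intercept,

$$\frac{1}{n}\sum_{i=1}^n (y_i-p_i)=0; \qquad \frac{1}{n}\sum_{i=1}^n (y_i-p_i)x_{ij}=\lambda\,\mathrm{sgn}(\widehat\beta_j) \text{ if } \widehat\beta_j\neq 0; \qquad \bigg|\frac{1}{n}\sum_{i=1}^n (y_i-p_i)x_{ij}\bigg|\le \lambda \text{ if } \widehat\beta_j=0, \quad j=1,\ldots,p. $$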

We define the active set $A_k$ as

$$A_k=\bigg\{j: \bigg|\frac{1}{n}\sum_{i=1}^n \biggl\{y_i-\frac{\exp({\widehat {\boldsymbol{\beta}}}^{(k)'} \boldsymbol{x}_i)}{1+\exp({\widehat {\boldsymbol{\beta}}}^{(k)'}\boldsymbol{x}_i)}\biggr\}x_{ij}\bigg|\ge \lambda_k \bigg\} \cup \{0\}. $$
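The active-set rule above, together with a first-order predictor for the update step, can be illustrated with a short NumPy sketch. Equation (11) is not reproduced here, so reading $\boldsymbol{s}$ as a sign vector (with entry 0 for the intercept) and $\boldsymbol{d}$ as the derivative $\partial\widehat{\boldsymbol{\beta}}_{A_k}/\partial\lambda$ is our assumption. Under that reading, differentiating the active-set KKT condition $\frac{1}{n}X_A'(\boldsymbol{y}-\boldsymbol{p})=\lambda\boldsymbol{s}_A$ with respect to $\lambda$ gives $\boldsymbol{d}=-\big(\frac{1}{n}X_A'WX_A\big)^{-1}\boldsymbol{s}_A$, where $W=\mathrm{diag}\{p_i(1-p_i)\}$. The function names are hypothetical, and column 0 of `X` is assumed to be the all-ones intercept column.

```python
import numpy as np

def sigmoid(eta):
    """p_i = exp(eta_i) / (1 + exp(eta_i)), in an overflow-safe tanh form."""
    return 0.5 * (1.0 + np.tanh(0.5 * eta))

def score(beta, X, y):
    """(1/n) * sum_i {y_i - p_i} x_ij for every column j."""
    return X.T @ (y - sigmoid(X @ beta)) / len(y)

def active_set(beta, X, y, lam_k):
    """A_k = {j : |score_j| >= lam_k} U {0}, as sorted column indices."""
    big = np.flatnonzero(np.abs(score(beta, X, y)) >= lam_k)
    return np.union1d(big, [0])

def predictor(beta, X, y, A, lam_k, lam_next):
    """First-order step beta_A + (lam_next - lam_k) * d with
    d = -[(1/n) X_A' W X_A]^{-1} s_A (hypothetical reading of s and d)."""
    XA = X[:, A]
    p = sigmoid(X @ beta)
    w = p * (1.0 - p)                          # diagonal of W
    H = XA.T @ (XA * w[:, None]) / len(y)
    s = np.sign(score(beta, X, y)[A])          # sign of the score on A ...
    s[A == 0] = 0.0                            # ... and 0 for the unpenalized intercept
    d = -np.linalg.solve(H, s)
    beta_next = beta.copy()
    beta_next[A] += (lam_next - lam_k) * d
    return beta_next, d
```

Taking $\boldsymbol{s}_A$ from the sign of the score rather than of $\widehat{\boldsymbol{\beta}}$ gives a direction to a variable that has just entered $A_k$ with coefficient still 0; for coefficients that are already nonzero the two signs agree at an exact solution.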

To update, we define

(11)

then $\boldsymbol{s}^{(k)} = (0, \boldsymbol{s}^{(k)'}_{-0})'$, $\boldsymbol{d}^{(k)} = (0, \boldsymbol{d}^{(k)'}_{-0})'$, where

To correct,

A.2 MCP

For MCP penalized logistic regression, we define the target function as

The KKT conditions are given as follows.

For a given $\lambda_k$, define the active set $A_k$ as

$$A_k=\{A_{k-1}\cup N_k\}\setminus D_k, $$

where

and

To perform adaptive rescaling on γ, define

To update, the derivatives are defined as follows,

(12)

(13)

(14)

(15)

and

To correct, we use

$${\widehat {\boldsymbol{\beta}}}^{(k,j+1)}_{A_k}={\widehat {\boldsymbol{\beta}}}^{(k,j)}_{A_k}-\bigg(\frac{\partial^2 L^{(k)}}{\partial \boldsymbol{\beta}_{A_k}\partial \boldsymbol{\beta}_{A_k}^T}\bigg)^{-1} \bigg(\frac{\partial L^{(k)}}{\partial \boldsymbol{\beta}_{A_k}}\bigg), $$

where

(16)

(17)

and

(18)
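The MCP corrector is a Newton iteration on the active coefficients. Because the target function and equations (12)–(18) are not reproduced here, the sketch below assumes the standard minimax concave penalty $\rho(t;\lambda,\gamma)=\lambda t-t^2/(2\gamma)$ for $t\le\gamma\lambda$ and $\gamma\lambda^2/2$ otherwise, uses $\gamma$ as given (no adaptive rescaling), and derives the gradient and Hessian of the MCP-penalized logistic criterion directly. The Newton system is only guaranteed to be positive definite when the smallest eigenvalue of $\frac{1}{n}X_A'WX_A$ exceeds $1/\gamma$, so the sketch simply assumes $\gamma$ is large enough. All function names are hypothetical, and column 0 of `X` is assumed to be the intercept.

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """rho(t) = lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, gamma*lam^2/2 beyond."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= gamma * lam, lam * t - t ** 2 / (2 * gamma), 0.5 * gamma * lam ** 2)

def mcp_derivative(t, lam, gamma):
    """rho'(|t|) = max(lam - |t|/gamma, 0)."""
    return np.maximum(lam - np.abs(np.asarray(t, dtype=float)) / gamma, 0.0)

def mcp_newton_corrector(beta, X, y, A, lam, gamma, n_iter=50, tol=1e-10):
    """Newton iterations on beta_A for the MCP-penalized logistic criterion,
    holding coefficients outside A at their current values (hypothetical sketch)."""
    beta = np.array(beta, dtype=float)
    A = np.asarray(A)
    XA, n = X[:, A], len(y)
    pen = A != 0                                       # intercept (index 0) is unpenalized
    for _ in range(n_iter):
        p = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))   # logistic mean p_i
        bA = beta[A]
        grad = -XA.T @ (y - p) / n                     # gradient of the negative log-likelihood
        grad[pen] += mcp_derivative(bA[pen], lam, gamma) * np.sign(bA[pen])
        H = XA.T @ (XA * (p * (1.0 - p))[:, None]) / n
        # second derivative of the MCP: -1/gamma inside the concave region, 0 outside
        H -= np.diag(np.where(pen & (np.abs(bA) < gamma * lam), 1.0 / gamma, 0.0))
        step = np.linalg.solve(H, grad)
        beta[A] = bA - step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Once a coefficient moves past $\gamma\lambda$, the MCP derivative vanishes and that coordinate is updated exactly as in unpenalized logistic regression, which is the source of the MCP's near-unbiasedness for large effects.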

Appendix B: Poisson regression

B.3 LASSO

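In Poisson regression, we assume that $(\boldsymbol{x}_i, y_i)$, $i=1,\ldots,n$, are i.i.d. with $\mathbb{P}(Y=y_{i})=e^{-\lambda_{i}}\lambda_{i}^{y_{i}}/y_{i}!$, where $\log\lambda_i=\boldsymbol{\beta}'\boldsymbol{x}_i$ (here $\lambda_i$ is the Poisson mean, not the tuning parameter $\lambda$). The criterion for the LASSO-penalized Poisson regression is then defined as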

$$L(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=1}^n \big\{\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)-y_i(\boldsymbol{\beta}'\boldsymbol{x}_i)\big\}+\lambda\sum_{j=1}^p |\beta_j|. $$

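The KKT conditions are given as follows. With $x_{i0}=1$ for the unpenalized intercept,

$$\frac{1}{n}\sum_{i=1}^n \big\{y_i-\exp(\widehat{\boldsymbol{\beta}}'\boldsymbol{x}_i)\big\}=0; \qquad \frac{1}{n}\sum_{i=1}^n \big\{y_i-\exp(\widehat{\boldsymbol{\beta}}'\boldsymbol{x}_i)\big\}x_{ij}=\lambda\,\mathrm{sgn}(\widehat\beta_j) \text{ if } \widehat\beta_j\neq 0; \qquad \bigg|\frac{1}{n}\sum_{i=1}^n \big\{y_i-\exp(\widehat{\boldsymbol{\beta}}'\boldsymbol{x}_i)\big\}x_{ij}\bigg|\le \lambda \text{ if } \widehat\beta_j=0, \quad j=1,\ldots,p. $$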

For a given $\lambda_k$, we define the active set $A_k$ as follows.

$$A_k=\Biggl\{j:~ \Bigg|\frac{1}{n}\sum_{i=1}^n \big\{y_i-\exp({\widehat {\boldsymbol{\beta}}}^{(k)'}\boldsymbol{x}_i)\big\}x_{ij}\Bigg|\ge \lambda_k\Biggr\}\cup \{0\}. $$
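A small NumPy sketch makes the Poisson quantities concrete: the LASSO-penalized criterion, the score $\frac{1}{n}\sum_i\{y_i-\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)\}x_{ij}$, the largest violation of the KKT conditions, and the active set $A_k$. The function names are hypothetical, and column 0 of `X` is assumed to be the all-ones intercept column, which is excluded from the penalty.

```python
import numpy as np

def poisson_lasso_objective(beta, X, y, lam):
    """(1/n) sum_i {exp(b'x_i) - y_i b'x_i} + lam * sum_{j>=1} |b_j|."""
    eta = X @ beta
    return np.mean(np.exp(eta) - y * eta) + lam * np.sum(np.abs(beta[1:]))

def poisson_score(beta, X, y):
    """(1/n) sum_i {y_i - exp(b'x_i)} x_ij: the residual is formed first, then times x_ij."""
    return X.T @ (y - np.exp(X @ beta)) / len(y)

def poisson_kkt_violation(beta, X, y, lam):
    """0 at an exact solution: intercept score = 0; score_j = lam*sgn(b_j) if b_j != 0;
    |score_j| <= lam if b_j == 0."""
    s = poisson_score(beta, X, y)
    b, sj = beta[1:], s[1:]
    rest = np.where(b != 0, np.abs(sj - lam * np.sign(b)), np.maximum(np.abs(sj) - lam, 0.0))
    return max(abs(s[0]), rest.max(initial=0.0))

def poisson_active_set(beta, X, y, lam_k):
    """A_k = {j : |score_j| >= lam_k} U {0}, as sorted column indices."""
    return np.union1d(np.flatnonzero(np.abs(poisson_score(beta, X, y)) >= lam_k), [0])
```

For example, with an intercept-only fit ($\widehat\beta_0=\log\bar y$, all other coefficients 0), the intercept score is 0 and a covariate enters $A_k$ once $\lambda_k$ drops to the absolute value of its score.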

To update, we define

then

and

To correct,

B.4 MCP

For MCP penalized Poisson regression, we define the target function as

The KKT conditions are given as follows.

For a given $\lambda_k$, the active set is defined as

$$A_k=\{A_{k-1}\cup N_k\}\setminus D_k, $$

where

and

To update, the derivatives are defined as follows,

and

To correct, we use

where

(19)

(20)

and

(21)

About this article

Cite this article

Yu, Y., Feng, Y. APPLE: approximate path for penalized likelihood estimators. Stat Comput 24, 803–819 (2014). https://doi.org/10.1007/s11222-013-9403-7

Received: 19 September 2012
Accepted: 02 May 2013
Published: 01 June 2013
Issue Date: September 2014
DOI: https://doi.org/10.1007/s11222-013-9403-7

Keywords
