
APPLE: approximate path for penalized likelihood estimators


Abstract

In high-dimensional data analysis, penalized likelihood estimators are shown to provide superior results in both variable selection and parameter estimation. A new algorithm, APPLE, is proposed for calculating the Approximate Path for Penalized Likelihood Estimators. Both convex penalties (such as LASSO) and folded concave penalties (such as MCP) are considered. APPLE efficiently computes the solution path for the penalized likelihood estimator using a hybrid of the modified predictor-corrector method and the coordinate-descent algorithm. APPLE is compared with several well-known packages via simulation and analysis of two gene expression data sets.



Acknowledgements

The authors thank the editor, the associate editor, and referees for their constructive comments. The authors thank Diego Franco Saldaña for proofreading.

Author information

Correspondence to Yang Feng.

Appendices

Appendix A: Logistic regression

A.1 LASSO

In logistic regression, we assume \((\boldsymbol{x}_i, y_i)\), \(i=1,\ldots,n\), are i.i.d. with \(\mathbb{P}(y_{i}=1|\boldsymbol{x}_{i})=p_{i}=\exp(\boldsymbol{\beta}'\boldsymbol{x}_{i})/(1+\exp(\boldsymbol{\beta}'\boldsymbol{x}_{i}))\). The target function for the LASSO penalized logistic regression is then defined as

$$L(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=1}^n \big\{\log\big(1+\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)\big)-y_i\boldsymbol{\beta}'\boldsymbol{x}_i\big\}+\lambda\sum_{j=1}^p |\beta_j|. $$

The KKT conditions are given as follows: for \(j=1,\ldots,p\),

$$\frac{1}{n}\sum_{i=1}^n \{y_i-p_i\}x_{ij}=\lambda\,\operatorname{sgn}(\widehat{\beta}_j)\quad\text{if } \widehat{\beta}_j\neq 0,\qquad \bigg|\frac{1}{n}\sum_{i=1}^n \{y_i-p_i\}x_{ij}\bigg|\le \lambda\quad\text{if } \widehat{\beta}_j=0, $$

with \(p_i\) evaluated at \(\widehat{\boldsymbol{\beta}}\).

We define the active set \(A_k\) as

$$A_k=\bigg\{j: \bigg|\frac{1}{n}\sum_{i=1}^n \biggl\{y_i-\frac{\exp({\widehat {\boldsymbol{\beta}}}^{(k)'} \boldsymbol{x}_i)}{1+\exp({\widehat {\boldsymbol{\beta}}}^{(k)'}\boldsymbol{x}_i)}\biggr\}x_{ij}\bigg|\ge \lambda_k \bigg\} \cup \{0\}. $$
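As a concrete illustration, here is a minimal NumPy sketch of this screening step (a hypothetical helper, not the paper's implementation; it assumes the design matrix `X` carries the intercept as column 0, which is why index 0 is always kept active):

```python
import numpy as np

def lasso_logistic_active_set(X, y, beta, lam):
    """Active set A_k for LASSO logistic regression: coordinates whose
    KKT score |(1/n) sum_i {y_i - p_i} x_ij| meets or exceeds lambda_k,
    together with the unpenalized intercept (column 0 of X)."""
    n = X.shape[0]
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))    # p_i under the current beta
    score = X.T @ (y - p_hat) / n              # per-coordinate KKT score
    return np.union1d(np.flatnonzero(np.abs(score) >= lam), [0])
```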

To update, we define

(11)

then \(\boldsymbol{s}^{(k)} = (0, \boldsymbol{s}^{(k)'}_{-0})'\), \(\boldsymbol{d}^{(k)} = (0, \boldsymbol{d}^{(k)'}_{-0})'\), where

To correct, Newton–Raphson iterations restricted to \(A_k\) are applied; the corrector takes the same form as in the MCP case of Appendix A.2 below, with the LASSO penalty in place of MCP.
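A minimal sketch of such a corrector (again hypothetical: the Newton form mirrors the MCP corrector displayed in Appendix A.2, the L1 subgradient is frozen at the current sign pattern, and the intercept in column 0 is left unpenalized):

```python
import numpy as np

def lasso_logistic_corrector(X, y, beta, lam, active, n_iter=10):
    """Newton-Raphson correction restricted to the active set for
    LASSO logistic regression (sign pattern held fixed within the loop)."""
    n = X.shape[0]
    beta = beta.copy()
    Xa = X[:, active]
    penalized = (active != 0)                   # column 0 is unpenalized
    for _ in range(n_iter):
        p_hat = 1.0 / (1.0 + np.exp(-Xa @ beta[active]))
        grad = -Xa.T @ (y - p_hat) / n + lam * np.sign(beta[active]) * penalized
        w = p_hat * (1.0 - p_hat)               # logistic Hessian weights
        hess = (Xa * w[:, None]).T @ Xa / n
        beta[active] = beta[active] - np.linalg.solve(hess, grad)
    return beta
```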

A.2 MCP

For MCP penalized logistic regression, we define the target function as

$$L(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=1}^n \big\{\log\big(1+\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)\big)-y_i\boldsymbol{\beta}'\boldsymbol{x}_i\big\}+\sum_{j=1}^p p_{\lambda,\gamma}\big(|\beta_j|\big), $$

where \(p_{\lambda,\gamma}(t)=\lambda t-t^{2}/(2\gamma)\) for \(0\le t\le \gamma\lambda\) and \(p_{\lambda,\gamma}(t)=\gamma\lambda^{2}/2\) for \(t>\gamma\lambda\) is the minimax concave penalty of Zhang (2010).

The KKT conditions are given as follows.

For a given \(\lambda_k\), define the active set \(A_k\) as

$$A_k=\{A_{k-1}\cup N_k\}\setminus D_k, $$

where \(N_k\) is the set of coordinates newly entering the active set at \(\lambda_k\), and \(D_k\) is the set of coordinates deleted from it.

To perform adaptive rescaling on γ, define

To update, the derivatives are defined as follows,

(12)
(13)
(14)
(15)

and

To correct, we use

$${\widehat {\boldsymbol{\beta}}}^{(k,j+1)}_{A_k}={\widehat {\boldsymbol{\beta}}}^{(k,j)}_{A_k}-\bigg(\frac{\partial^2 L^{(k)}}{\partial \boldsymbol{\beta}_{A_k}\partial \boldsymbol{\beta}_{A_k}^T}\bigg)^{-1} \bigg(\frac{\partial L^{(k)}}{\partial \boldsymbol{\beta}_{A_k}}\bigg), $$

where

(16)
(17)

and

(18)
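For reference, the standard MCP ingredients that such a corrector needs, in a short NumPy sketch (these are the textbook forms of the penalty and its derivative from Zhang (2010); how they interact with the adaptive rescaling of γ above is not reproduced here):

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """MCP p_{lam,gamma}(|t|): quadratic up to gamma*lam, constant beyond."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2.0 * gamma),
                    gamma * lam ** 2 / 2.0)

def mcp_derivative(t, lam, gamma):
    """d/dt of the MCP penalty: sign(t) times (lam - |t|/gamma)_+."""
    return np.sign(t) * np.maximum(lam - np.abs(t) / gamma, 0.0)
```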

Appendix B: Poisson regression

B.3 LASSO

In Poisson regression, we assume \((\boldsymbol{x}_i, y_i)\), \(i=1,\ldots,n\), are i.i.d. with \(\mathbb{P}(Y=y_{i})=e^{-\lambda_{i}}\lambda_{i}^{y_{i}}/y_{i}!\), where \(\log\lambda_{i}=\boldsymbol{\beta}'\boldsymbol{x}_{i}\). The criterion for the LASSO penalized Poisson regression is then defined as

$$L(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=1}^n \big\{\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)-y_i(\boldsymbol{\beta}'\boldsymbol{x}_i)\big\}+\lambda\sum_{j=1}^p |\beta_j|. $$
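A direct NumPy transcription of this criterion (assuming, as in Appendix A, that column 0 of `X` is an unpenalized intercept):

```python
import numpy as np

def poisson_lasso_objective(X, y, beta, lam):
    """(1/n) sum_i {exp(b'x_i) - y_i b'x_i} + lam * sum_{j>=1} |beta_j|."""
    eta = X @ beta                              # linear predictors b'x_i
    return np.mean(np.exp(eta) - y * eta) + lam * np.abs(beta[1:]).sum()
```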

The KKT conditions are given as follows: for \(j=1,\ldots,p\),

$$\frac{1}{n}\sum_{i=1}^n \big\{y_i-\exp(\widehat{\boldsymbol{\beta}}'\boldsymbol{x}_i)\big\}x_{ij}=\lambda\,\operatorname{sgn}(\widehat{\beta}_j)\quad\text{if } \widehat{\beta}_j\neq 0,\qquad \bigg|\frac{1}{n}\sum_{i=1}^n \big\{y_i-\exp(\widehat{\boldsymbol{\beta}}'\boldsymbol{x}_i)\big\}x_{ij}\bigg|\le \lambda\quad\text{if } \widehat{\beta}_j=0. $$

For a given \(\lambda_k\), we define the active set \(A_k\) as follows.

$$A_k=\Biggl\{j:~ \Bigg|\frac{1}{n}\sum_{i=1}^n \big\{y_i-\exp({\widehat {\boldsymbol{\beta}}}^{(k)'}\boldsymbol{x}_i)\big\}x_{ij}\Bigg|\ge \lambda_k\Biggr\}\cup \{0\}. $$
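This is the same screen as in the logistic case, with the Poisson mean \(\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)\) in place of the logistic probability; under the same hypothetical conventions as before:

```python
import numpy as np

def poisson_active_set(X, y, beta, lam):
    """Active set for LASSO Poisson regression: coordinates whose score
    |(1/n) sum_i {y_i - exp(b'x_i)} x_ij| >= lambda_k, plus the intercept."""
    n = X.shape[0]
    score = X.T @ (y - np.exp(X @ beta)) / n
    return np.union1d(np.flatnonzero(np.abs(score) >= lam), [0])
```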

To update, we define

then

and

To correct, we proceed as in the logistic case, applying Newton–Raphson iterations restricted to the active set.

B.4 MCP

For MCP penalized Poisson regression, we define the target function as

$$L(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=1}^n \big\{\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)-y_i(\boldsymbol{\beta}'\boldsymbol{x}_i)\big\}+\sum_{j=1}^p p_{\lambda,\gamma}\big(|\beta_j|\big), $$

with \(p_{\lambda,\gamma}\) the MCP penalty defined in Appendix A.2.

The KKT conditions are,

For a given \(\lambda_k\), the active set is defined as

$$A_k=\{A_{k-1}\cup N_k\}\setminus D_k, $$

where \(N_k\) and \(D_k\) are the sets of newly added and deleted coordinates, defined as in Appendix A.2.

To update, the derivatives are defined as follows,

and

To correct, we use

where

(19)
(20)

and

(21)
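Putting the pieces together, the hybrid algorithm of the abstract traverses a decreasing grid of λ values, screening the active set and then correcting on it, with the solution at the previous λ serving as the warm start (the predictor step proper, built from the path derivatives of Eq. (11), is omitted from this schematic). A sketch of that outer loop, wired to the hypothetical helpers above:

```python
import numpy as np

def approximate_path(X, y, lam_grid, active_set_fn, corrector_fn):
    """Schematic predictor-corrector loop over a decreasing lambda grid;
    returns one coefficient vector per grid point."""
    beta = np.zeros(X.shape[1])                   # start from the null model
    path = []
    for lam in lam_grid:                          # lam_grid sorted decreasing
        active = active_set_fn(X, y, beta, lam)       # KKT screening
        beta = corrector_fn(X, y, beta, lam, active)  # refit on active set
        path.append(beta.copy())
    return np.array(path)
```

For instance, `approximate_path(X, y, lam_grid, lasso_logistic_active_set, lasso_logistic_corrector)` traces a LASSO logistic path under these conventions.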

Cite this article

Yu, Y., Feng, Y. APPLE: approximate path for penalized likelihood estimators. Stat Comput 24, 803–819 (2014). https://doi.org/10.1007/s11222-013-9403-7
