APPLE: approximate path for penalized likelihood estimators

Abstract

In high-dimensional data analysis, penalized likelihood estimators have been shown to provide superior results in both variable selection and parameter estimation. A new algorithm, APPLE, is proposed for calculating the Approximate Path for Penalized Likelihood Estimators. Both convex penalties (such as the LASSO) and folded concave penalties (such as the MCP) are considered. APPLE efficiently computes the solution path of the penalized likelihood estimator using a hybrid of the modified predictor-corrector method and the coordinate-descent algorithm. APPLE is compared with several well-known packages via simulation and the analysis of two gene expression data sets.
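
As a rough illustration of this hybrid strategy (this is not the APPLE implementation, and every function name below is hypothetical), the following Python sketch traces a LASSO-penalized logistic path over a decreasing grid of tuning parameters: the previous solution, crudely extrapolated, serves as the predictor step, and a simple coordinate-descent pass serves as the corrector.

```python
import numpy as np

def logistic_grad(beta, X, y):
    """Gradient of the average logistic negative log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (p - y) / len(y)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_corrector(beta, X, y, lam, n_sweeps=50):
    """Coordinate-descent corrector for a LASSO-penalized logistic loss,
    majorizing the per-coordinate curvature by w_j = mean(x_ij^2) / 4.
    Deliberately unoptimized: the full gradient is recomputed each update."""
    w = (X ** 2).mean(axis=0) / 4.0
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            g = logistic_grad(beta, X, y)[j]
            z = w[j] * beta[j] - g
            beta[j] = soft_threshold(z, lam) / w[j]
    return beta

def approximate_path(X, y, lambdas):
    """Warm-started predictor-corrector loop over a decreasing lambda grid:
    the previous solution (crudely extrapolated) predicts the next one,
    and coordinate descent corrects it."""
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lambdas:
        if len(path) >= 2:
            beta = path[-1] + (path[-1] - path[-2])  # crude linear predictor step
        beta = cd_corrector(beta.copy(), X, y, lam)
        path.append(beta.copy())
    return np.array(path)

# Toy usage on simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = (rng.random(100) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(float)
print(approximate_path(X, y, np.linspace(0.2, 0.01, 20)).shape)  # (20, 10)
```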

Acknowledgements

The authors thank the editor, the associate editor, and the referees for their constructive comments, and Diego Franco Saldaña for proofreading.

Author information

Corresponding author

Correspondence to Yang Feng.

Appendices

Appendix A: Logistic regression

A.1 LASSO

In logistic regression, we assume \((\boldsymbol{x}_i, y_i)\), \(i=1,\ldots,n\), are i.i.d. with \(\mathbb{P}(y_{i}=1|\boldsymbol{x}_{i})=p_{i}=\exp(\boldsymbol{\beta}'\boldsymbol{x}_{i})/(1+\exp(\boldsymbol{\beta}'\boldsymbol{x}_{i}))\). Then the target function for the LASSO penalized logistic regression is defined as

$$L(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=1}^n \big\{\log\bigl(1+\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)\bigr)-y_i\boldsymbol{\beta}'\boldsymbol{x}_i\big\}+\lambda\sum_{j=1}^p |\beta_j|. $$

The KKT conditions are given as follows: for \(j=1,\ldots,p\),

$$\frac{1}{n}\sum_{i=1}^n \biggl\{y_i-\frac{\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)}{1+\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)}\biggr\}x_{ij}=\lambda\,\mathrm{sgn}(\beta_j) \quad\text{if } \beta_j\neq 0, \qquad \bigg|\frac{1}{n}\sum_{i=1}^n \biggl\{y_i-\frac{\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)}{1+\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)}\biggr\}x_{ij}\bigg|\le \lambda \quad\text{if } \beta_j= 0. $$

We define the active set \(A_k\) as

$$A_k=\bigg\{j: \bigg|\frac{1}{n}\sum_{i=1}^n \biggl\{y_i-\frac{\exp({\widehat {\boldsymbol{\beta}}}^{(k)'} \boldsymbol{x}_i)}{1+\exp({\widehat {\boldsymbol{\beta}}}^{(k)'}\boldsymbol{x}_i)}\biggr\}x_{ij}\bigg|\ge \lambda_k \bigg\} \cup \{0\}. $$
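
For concreteness, this active-set check can be coded in a few lines. The Python sketch below is illustrative only; it assumes the design matrix carries a leading column of ones so that index 0 is the intercept, and the function name is hypothetical.

```python
import numpy as np

def active_set_logistic(X, y, beta_hat, lam):
    """Active set A_k for the LASSO-penalized logistic regression: coordinates
    whose absolute score reaches lam, together with the intercept (index 0)."""
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta_hat))       # fitted probabilities
    score = X.T @ (y - p_hat) / len(y)                # (1/n) * sum_i (y_i - p_i) x_ij
    active = set(np.flatnonzero(np.abs(score) >= lam).tolist())
    active.add(0)                                     # intercept is always active
    return sorted(active)
```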

To update, we define

(11)

then \(\boldsymbol{s}^{(k)} = (0, \boldsymbol{s}^{(k)'}_{-0})'\), \(\boldsymbol{d}^{(k)} = (0, \boldsymbol{d}^{(k)'}_{-0})'\), where

To correct,

A.2 MCP

For MCP penalized logistic regression, we define the target function as

The KKT conditions are given as follows.

For a given \(\lambda_k\), define the active set \(A_k\) as

$$A_k=\{A_{k-1}\cup N_k\}\setminus D_k, $$

where

and

To perform adaptive rescaling on γ, define

To update, the derivatives are defined as follows,

(12)
(13)
(14)
(15)

and

To correct, we use

$${\widehat {\boldsymbol{\beta}}}^{(k,j+1)}_{A_k}={\widehat {\boldsymbol{\beta}}}^{(k,j)}_{A_k}-\bigg(\frac{\partial^2 L^{(k)}}{\partial \boldsymbol{\beta}_{A_k}\partial \boldsymbol{\beta}_{A_k}^T}\bigg)^{-1} \bigg(\frac{\partial L^{(k)}}{\partial \boldsymbol{\beta}_{A_k}}\bigg), $$

where

(16)
(17)

and

(18)
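
Since the derivative expressions in (12)–(18) are not reproduced above, the following Python sketch only illustrates the general shape of this Newton-type corrector step for an MCP-penalized logistic objective. It assumes the standard MCP derivative \(p'_{\lambda,\gamma}(t)=(\lambda-t/\gamma)_{+}\) without the adaptive rescaling of γ, treats index 0 as an unpenalized intercept, and uses hypothetical function names; it is not the APPLE implementation.

```python
import numpy as np

def mcp_deriv(b, lam, gamma):
    """First derivative of the MCP penalty p_{lam,gamma}(|b|) with respect to b."""
    return np.sign(b) * np.maximum(lam - np.abs(b) / gamma, 0.0)

def newton_corrector_step(beta, X, y, active, lam, gamma):
    """One Newton step on the active coordinates of an MCP-penalized logistic
    objective; index 0 is treated as an unpenalized intercept."""
    A = np.asarray(active)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)                                  # logistic variance weights

    grad = X[:, A].T @ (p - y) / len(y)                # negative log-likelihood part
    hess = X[:, A].T @ (X[:, A] * w[:, None]) / len(y)

    pen_grad = mcp_deriv(beta[A], lam, gamma)          # MCP contribution to the gradient
    pen_hess = np.where(np.abs(beta[A]) < gamma * lam, -1.0 / gamma, 0.0)
    pen_grad[A == 0] = 0.0                             # intercept is not penalized
    pen_hess[A == 0] = 0.0

    step = np.linalg.solve(hess + np.diag(pen_hess), grad + pen_grad)
    beta_new = beta.copy()
    beta_new[A] -= step
    return beta_new
```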

Appendix B: Poisson regression

B.3 LASSO

In Poisson regression, we assume \((\boldsymbol{x}_i, y_i)\), \(i=1,\ldots,n\), are i.i.d. with \(\mathbb{P}(Y_i=y_{i}\mid\boldsymbol{x}_i)=e^{-\lambda_{i}}\lambda_{i}^{y_{i}}/y_{i}!\), where \(\log\lambda_i=\boldsymbol{\beta}'\boldsymbol{x}_i\). Then the criterion for the LASSO penalized Poisson regression is defined as

$$L(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=1}^n \big\{\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)-y_i(\boldsymbol{\beta}'\boldsymbol{x}_i)\big\}+\lambda\sum_{j=1}^p |\beta_j|. $$

The KKT conditions are given as follows: for \(j=1,\ldots,p\),

$$\frac{1}{n}\sum_{i=1}^n \big\{y_i-\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)\big\}x_{ij}=\lambda\,\mathrm{sgn}(\beta_j) \quad\text{if } \beta_j\neq 0, \qquad \bigg|\frac{1}{n}\sum_{i=1}^n \big\{y_i-\exp(\boldsymbol{\beta}'\boldsymbol{x}_i)\big\}x_{ij}\bigg|\le \lambda \quad\text{if } \beta_j= 0. $$

For a given \(\lambda_k\), we define the active set \(A_k\) as follows.

$$A_k=\Biggl\{j: \Bigg|\frac{1}{n}\sum_{i=1}^n \big\{y_i-\exp\bigl({\widehat {\boldsymbol{\beta}}}^{(k)'}\boldsymbol{x}_i\bigr)\big\}x_{ij}\Bigg|\ge \lambda_k\Biggr\}\cup \{0\}. $$
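
The Poisson case parallels the logistic one. The Python sketch below evaluates the LASSO-penalized Poisson criterion defined above and the corresponding active-set check; it is illustrative only, assumes a leading intercept column (index 0, left unpenalized), and uses hypothetical function names.

```python
import numpy as np

def poisson_lasso_objective(beta, X, y, lam):
    """L(beta) = (1/n) * sum{ exp(beta'x_i) - y_i * beta'x_i } + lam * sum_{j>=1} |beta_j|,
    leaving the intercept (index 0) unpenalized."""
    eta = X @ beta
    return np.mean(np.exp(eta) - y * eta) + lam * np.sum(np.abs(beta[1:]))

def active_set_poisson(X, y, beta_hat, lam):
    """Active set A_k: coordinates whose absolute score
    (1/n) * sum_i {y_i - exp(beta'x_i)} x_ij reaches lam, plus the intercept."""
    score = X.T @ (y - np.exp(X @ beta_hat)) / len(y)
    active = set(np.flatnonzero(np.abs(score) >= lam).tolist())
    active.add(0)
    return sorted(active)
```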

To update, we define

then

and

To correct,

B.4 MCP

For MCP penalized Poisson regression, we define the target function as

The KKT conditions are given as follows.

For a given \(\lambda_k\), the active set is defined as

$$A_k=\{A_{k-1}\cup N_k\}\setminus D_k, $$

where

and

To update, the derivatives are defined as follows,

and

To correct, we use

where

(19)
(20)

and

(21)

About this article

Cite this article

Yu, Y., Feng, Y. APPLE: approximate path for penalized likelihood estimators. Stat Comput 24, 803–819 (2014). https://doi.org/10.1007/s11222-013-9403-7

Keywords

  • APPLE
  • LASSO
  • MCP
  • Penalized likelihood estimator
  • Solution path