A Two-Step Penalized Regression Method with Networked Predictors

Statistics in Biosciences

Abstract

Penalized regression incorporating prior dependency structure of predictors can be effective in high-dimensional data analysis (Li and Li in Bioinformatics, 24:1175–1182, 2008). Pan et al. (Biometrics, 66:474–484, 2010) proposed a penalized regression method for better outcome prediction and variable selection by smoothing parameters over a given predictor network, which can be applied to analysis of microarray data with a given gene network. In this paper, we develop two modifications to their method for further performance enhancement. First, we employ convex programming and show its improved performance over an approximate optimization algorithm implemented in their original proposal. Second, we perform bias reduction after initial variable selection through a new penalty, leading to better parameter estimates and outcome prediction. Simulations have demonstrated substantial performance improvement of the proposed modifications over the original method.
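
The two-step idea above (a convex network-penalized fit followed by a bias-reducing second step) can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation, and it makes several assumptions.

- Step 1 replaces the Pan et al. network penalty with a simpler convex stand-in: an L1 penalty plus the quadratic graph-Laplacian penalty lam2 * sum over edges (i, j) of (b_i - b_j)^2. This is similar in spirit to Li and Li's network-constrained penalty. It is solved by proximal gradient descent rather than with CVX.
- Step 2 replaces the paper's new bias-reduction penalty with an ordinary least-squares refit on the selected predictors, in the spirit of the relaxed Lasso (Meinshausen 2007).

All function names, tuning values and the simulated network are hypothetical.

```python
import numpy as np


def laplacian(p, edges):
    """Unnormalized graph Laplacian L = D - A, so b @ L @ b = sum over edges of (b_i - b_j)**2."""
    L = np.zeros((p, p))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def network_penalized_fit(X, y, L, lam1, lam2, n_iter=5000):
    """Step 1 (hypothetical stand-in penalty): minimize the convex objective
    (1 / 2n) ||y - X b||^2 + lam2 * b' L b + lam1 * ||b||_1
    by proximal gradient descent (ISTA) with a fixed step size."""
    n, p = X.shape
    # Lipschitz constant of the gradient of the smooth (non-L1) part.
    lip = np.linalg.norm(X, 2) ** 2 / n + 2.0 * lam2 * np.linalg.norm(L, 2)
    step = 1.0 / lip
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ b) / n + 2.0 * lam2 * (L @ b)
        b = soft_threshold(b - step * grad, step * lam1)
    return b


def refit_selected(X, y, b_init, tol=1e-8):
    """Step 2 (stand-in for the paper's bias-reduction penalty):
    unpenalized least squares restricted to the predictors selected in step 1."""
    support = np.flatnonzero(np.abs(b_init) > tol)
    b = np.zeros(X.shape[1])
    if support.size > 0:
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        b[support] = coef
    return b


# Toy example with simulated data (not from the paper).
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:4] = 2.0                                      # predictors 0-3 carry signal
edges = [(0, 1), (0, 2), (0, 3), (4, 5), (6, 7)]    # predictor 0 is a hub linked to 1-3
y = X @ beta + 0.5 * rng.standard_normal(n)

L = laplacian(p, edges)
b_step1 = network_penalized_fit(X, y, L, lam1=0.3, lam2=0.1)
b_step2 = refit_selected(X, y, b_step1)
print("selected:", np.flatnonzero(b_step1))
print("step 1 coefs:", np.round(b_step1[:4], 2))
print("step 2 coefs:", np.round(b_step2[:4], 2))
```

In this toy setup, the connected signal predictors share the same true coefficient. As a result, the Laplacian term adds little extra shrinkage among them. Most of the remaining bias in step 1 comes from the L1 term, and the refit in step 2 is meant to remove it.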

References

  1. Binder H, Schumacher M (2008) Comment on “Network-constrained regularization and variable selection for analysis of genomic data”. Bioinformatics 24:2566–2568

  2. Bondell HD, Reich BJ (2008) Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics 64:115–123

  3. Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499

  4. Grant M, Boyd S, Ye Y (2010) CVX: Matlab software for disciplined convex programming. Available at http://www.stanford.edu/boyd/cvx

  5. Higgins ME, Claremont M, Major JE, Sander C, Lash AE (2007) CancerGenes: a gene selection resource for cancer genome projects. Nucleic Acids Res 35 (Suppl 1):D721–D726

  6. Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, Felciano RM, Laurance MF, Zhao W, Shu Q, Lee Y, Scheck AC, Liau LM, Wu H, Geschwind DH, Febbo PG, Kornblum HI, Cloughesy TF, Nelson SF, Mischel PS (2006) Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a molecular target. Proc Natl Acad Sci USA 103:17402–17407

  7. Li C, Li H (2008) Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 24:1175–1182

  8. Li C, Li H (2010) Variable selection and regression analysis for graph-structured covariates with an application to genomics. Ann Appl Stat 4:1498–1516

  9. Meinshausen N (2007) Relaxed Lasso. Comput Stat Data Anal 52:374–393

  10. Meinshausen N, Bühlmann P (2010) Stability selection (with discussion). J R Stat Soc B 72:417–473

  11. Pan W, Xie B, Shen X (2010) Incorporating predictor network in penalized regression with application to microarray data. Biometrics 66:474–484

  12. Shen X, Pan W, Zhu Y (2011) Likelihood-based selection and sharp parameter estimation. To appear in JASA. Available on-line at http://www.sph.umn.edu/biostatistics/research/reports.asp

  13. Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc B 58:267–288

  14. Tibshirani R, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused Lasso. J R Stat Soc B 67:91–108

  15. Wei Z, Li H (2007) A Markov random field model for network-based analysis of genomic data. Bioinformatics 23:1537–1544

  16. Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc B 68:49–67

  17. Zhao P, Yu B (2004) Boosted Lasso. Tech rep, Dept of Statistics, UC-Berkeley

  18. Zhu Y, Shen X, Pan W (2009) Network-based support vector machine for classification of microarray samples. BMC Bioinformatics 10(Suppl 1):S21

  19. Zhou S (2010) Thresholded Lasso for high dimensional variable selection and statistical estimation. Manuscript

  20. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc B 67:301–320

Author information

Correspondence to Wei Pan.

About this article

Cite this article

Luo, C., Pan, W. & Shen, X. A Two-Step Penalized Regression Method with Networked Predictors. Stat Biosci 4, 27–46 (2012). https://doi.org/10.1007/s12561-011-9051-4

  • DOI: https://doi.org/10.1007/s12561-011-9051-4
