
Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression

Statistics in Biosciences


In personalized medicine, biomarkers are used to select the therapy with the highest likelihood of success given an individual patient’s biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verification. These goals are challenging because genomic data are high-dimensional. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results. However, selecting the right amount of penalization is critical to achieving both goals simultaneously. Standard approaches based on cross-validation (CV) typically provide high prediction accuracy with high true positive rates (TPRs), but at the cost of too many false positives. Alternatively, stability selection (SS) controls the number of false positives, but at the cost of yielding too few true positives. To circumvent these issues, we propose prediction-oriented marker selection (PROMISE), which combines SS with CV to retain the advantages of both methods. Our application of PROMISE with the lasso and elastic net in data analysis shows that, compared to CV, PROMISE produces sparser solutions, fewer false positives, and a smaller combined type I and type II error, while maintaining good prediction accuracy with only a marginal decrease in TPR. Compared to SS, PROMISE offers better prediction accuracy and higher TPRs. In summary, PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize false positives and maximize prediction accuracy.
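To make the two ingredients named above concrete, the following is a minimal, illustrative sketch in plain Python. It fits the lasso by coordinate descent, picks a penalty by K-fold CV, and then computes stability-selection-style selection frequencies on random half-subsamples. The final step, keeping only CV-selected variables whose selection frequency is at least 0.8, is a naive combination chosen here purely for illustration; the abstract does not specify the PROMISE algorithm, and the actual procedure may differ. All names (`lasso_cd`, `cv_mse`, `selection_frequency`), the synthetic data, the penalty grid, and the 0.8 threshold are assumptions, and the model omits an intercept for brevity.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=30):
    """Lasso via cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new_bj
    return beta

def cv_mse(X, y, lam, k=5):
    """K-fold cross-validated mean squared prediction error for one penalty."""
    n = len(X)
    err = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            err += (y[i] - pred) ** 2
    return err / n

def selection_frequency(X, y, lam, n_subsamples=20, rng=None):
    """Fraction of random half-subsamples in which each variable is selected."""
    rng = rng or random.Random(0)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]

# Synthetic data (an assumption): 3 true signals among 10 candidate markers.
rng = random.Random(1)
n, p = 40, 10
true_beta = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 1) for row in X]

lambdas = [1.0, 0.5, 0.2, 0.1, 0.05]  # hypothetical penalty grid
lam_cv = min(lambdas, key=lambda lam: cv_mse(X, y, lam))
cv_selected = [j for j, b in enumerate(lasso_cd(X, y, lam_cv)) if b != 0.0]
freq = selection_frequency(X, y, lam_cv, rng=rng)
# Illustrative combination only: keep CV-selected markers that are also stable.
stable = [j for j in cv_selected if freq[j] >= 0.8]
print("CV penalty:", lam_cv)
print("selected by CV:", cv_selected)
print("selected by CV and stable:", stable)
```

Comparing `cv_selected` with `stable` shows the trade-off the abstract describes: CV alone tends to admit extra variables, while a frequency threshold prunes those that appear only in some subsamples.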





JL and VB were also partially supported by the NIH through the University of Texas MD Anderson Cancer Center Support Grant (CCSG) (P30 CA016672). VB was also partially supported by NIH grant R01 CA160736 and NSF DMS 1463233. We thank Ms. Lee Ann Chastain for helping to edit this manuscript.

Author information



Corresponding author

Correspondence to J. Jack Lee.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 56 KB)


About this article


Cite this article

Kim, S., Baladandayuthapani, V. & Lee, J.J. Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression. Stat Biosci 9, 217–245 (2017).
