Predictor Selection

  • Eric Vittinghoff
  • David V. Glidden
  • Stephen C. Shiboski
  • Charles E. McCulloch
Part of the Statistics for Biology and Health book series (SBH)


Walter et al. (2001) developed a model to identify older adults at high risk of death in the first year after hospitalization, using data collected for 2,922 patients discharged from two hospitals in Ohio. Potential predictors included demographics, activities of daily living (ADLs), the APACHE-II illness-severity score, and information about the index hospitalization. Using data from one of the two hospitals, the authors chose a multipredictor model by a “backward” selection procedure with a restrictive inclusion criterion; the model was then validated using data from the other hospital. The goal was to select a model that best predicted future events, with a view toward identifying patients in need of more intensive monitoring and intervention.
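The two-stage design described above (backward selection in a development sample, then checking predictive accuracy in an independent validation sample) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure Walter et al. used: it assumes a continuous outcome fit by ordinary least squares rather than 1-year mortality, uses the Bayesian Information Criterion (BIC) as the stopping rule in place of the study's own inclusion criterion, and runs on simulated data. The helper names (`solve`, `fit_ols`, `backward_select`, `simulate`) are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(X, y, cols):
    """Least-squares fit of y on an intercept plus the predictors in cols."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[k] for r in rows) for k in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, x, cols):
    return beta[0] + sum(b * x[j] for b, j in zip(beta[1:], cols))

def bic(X, y, cols):
    """Gaussian BIC up to an additive constant: n*log(RSS/n) + k*log(n)."""
    beta = fit_ols(X, y, cols)
    n = len(y)
    rss = sum((yi - predict(beta, x, cols)) ** 2 for x, yi in zip(X, y))
    return n * math.log(rss / n) + (len(cols) + 1) * math.log(n)

def backward_select(X, y, cols):
    """Drop the predictor whose removal lowers BIC most; stop when no removal helps."""
    cols = list(cols)
    current = bic(X, y, cols)
    while cols:
        best, drop = min((bic(X, y, [c for c in cols if c != d]), d) for d in cols)
        if best >= current:
            break
        cols.remove(drop)
        current = best
    return cols

# Simulated data (assumption): only predictors 0 and 1 affect the outcome.
random.seed(1)

def simulate(n):
    X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    y = [2.0 * x[0] - 1.5 * x[1] + random.gauss(0, 1) for x in X]
    return X, y

X_dev, y_dev = simulate(200)  # development sample ("first hospital")
X_val, y_val = simulate(200)  # independent validation sample ("second hospital")

kept = backward_select(X_dev, y_dev, range(4))
beta = fit_ols(X_dev, y_dev, kept)
# Out-of-sample mean squared prediction error, computed only on validation data.
mse = sum((yi - predict(beta, x, kept)) ** 2 for x, yi in zip(X_val, y_val)) / len(y_val)
print("selected predictors:", kept, "validation MSE:", round(mse, 2))
```

The key design point is that the validation error is computed on data never used for selection or fitting; error estimated in the development sample would tend to be optimistic because the same data chose the predictors.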




  1. Allen, D. M. and Cady, F. B. (1982). Analyzing Experimental Data by Regression. Wadsworth, Belmont, CA.
  2. Altman, D. G. and Andersen, P. K. (1989). Bootstrap investigation of the stability of the Cox regression model. Statistics in Medicine, 8, 771–783.
  3. Altman, D. G. and Royston, P. (2000). What do we mean by validating a prognostic model? Statistics in Medicine, 19, 453–473.
  4. Antman, E. M., Cohen, M., Bernink, P. J. L. M., McCabe, C. H., Horacek, T., Papuchis, G., Mautner, B., Corbalan, R., Radley, D. and Braunwald, E. (2000). The TIMI Risk Score for unstable angina/non-ST elevation MI. Journal of the American Medical Association, 284(7), 835–842.
  5. Beach, M. L. and Meier, P. (1989). Choosing covariates in the analysis of clinical trials. Controlled Clinical Trials, 10, 161S–175S.
  6. Begg, M. D. and Lagakos, S. (1993). Loss in efficiency caused by omitted covariates and misspecifying exposure in logistic regression models. Journal of the American Statistical Association, 88(421), 166–170.
  7. Breiman, L. (2001). Statistical modeling: the two cultures. Statistical Science, 16(3), 199–231.
  8. Brown, J., Vittinghoff, E., Wyman, J. F., Stone, K. L., Nevitt, M. C., Ensrud, K. E. and Grady, D. (2000). Urinary incontinence: does it increase risk for falls and fractures? Study of Osteoporotic Fractures Research Group. Journal of the American Geriatrics Society, 48, 721–725.
  9. Buckland, S. T., Burnham, K. P. and Augustin, N. H. (1997). Model selection: an integral part of inference. Biometrics, 53, 603–618.
  10. Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A, 158, 419–466.
  11. Cook, N. R., Buring, J. E. and Ridker, P. M. (2006). The effect of including C-reactive protein in cardiovascular risk prediction models for women. Annals of Internal Medicine, 145, 21–29.
  12. Crager, M. R. (1987). Analysis of covariance in parallel-group clinical trials with pretreatment baselines. Biometrics, 43(4), 895–901.
  13. D’Agostino, R. B., Russell, M. W., Huse, D. M., Ellison, C., Silbershatz, H., Wilson, P. W. F. and Hartz, S. C. (2000). Primary and subsequent coronary risk appraisal: new results from the Framingham Study. American Heart Journal, 139, 272–281.
  14. Gail, M. H., Tan, W. Y. and Piantadosi, S. (1988). Tests for no treatment effect in randomized clinical trials. Biometrika, 75, 57–64.
  15. Gail, M. H., Wieand, S. and Piantadosi, S. (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika, 71, 431–444.
  16. Glymour, M. M., Weuve, J., Berkman, L. F., Kawachi, I. and Robins, J. M. (2005). When is baseline adjustment useful in analyses of change? An example with education and cognitive change. American Journal of Epidemiology, 163(3), 267–278.
  17. Gordon, W., Polansky, J., Boscardin, W., Fung, K. and Steinman, M. (2010). Coronary risk assessment by point-based and equation-based Framingham models: significant implications for clinical care. Journal of General Internal Medicine, 25(11), 1145–1151.
  18. Greenland, S. (1989). Modeling and variable selection in epidemiologic analysis. American Journal of Public Health, 79(3), 340–349.
  19. Greenland, S. (2003). Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology, 14(3), 300–306.
  20. Greenland, S. and Brumback, B. (2002). An overview of relations among causal modeling methods. International Journal of Epidemiology, 31(5), 1030–1037.
  21. Greenland, S., Pearl, J. and Robins, J. M. (1999). Causal diagrams for epidemiologic research. Epidemiology, 10, 37–48.
  22. Grodstein, F., Manson, J. E. and Stampfer, M. J. (2001). Postmenopausal hormone use and secondary prevention of coronary events in the Nurses’ Health Study. Annals of Internal Medicine, 135, 1–8.
  23. Harrell, F. E. (2005). Regression Modeling Strategies. Springer, New York.
  24. Harrell, F. E., Lee, K. L., Califf, R. M., Pryor, D. B. and Rosati, R. A. (1984). Regression modelling strategies for improved prognostic prediction. Statistics in Medicine, 3, 143–152.
  25. Hauck, W. W., Anderson, S. and Marcus, S. M. (1998). Should we adjust for covariates in nonlinear regression analyses of randomized trials? Controlled Clinical Trials, 19, 249–256.
  26. Henderson, R. and Oman, P. (1999). Effect of frailty on marginal regression estimates in survival analysis. Journal of the Royal Statistical Society, Series B, Methodological, 61, 367–379.
  27. Hernán, M. A., Hernández-Díaz, S. and Robins, J. M. (2004). A structural approach to selection bias. Epidemiology, 15(5), 615–625.
  28. Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimates for nonorthogonal problems. Technometrics, 12, 55–67.
  29. Hosmer, D. W. and Lemeshow, S. (2000). Applied Logistic Regression. John Wiley & Sons, New York, Chichester.
  30. Hulley, S., Grady, D., Bush, T., Furberg, C., Herrington, D., Riggs, B. and Vittinghoff, E. (1998). Randomized trial of estrogen plus progestin for secondary prevention of heart disease in postmenopausal women. The Heart and Estrogen/progestin Replacement Study. Journal of the American Medical Association, 280(7), 605–613.
  31. Jewell, N. P. (2004). Statistics for Epidemiology. Chapman & Hall/CRC, Boca Raton, FL.
  32. Kanaya, A., Vittinghoff, E., Shlipak, M. G., Resnick, H. E., Visser, M., Grady, D. and Barrett-Connor, E. (2004). Association of total and central obesity with mortality in postmenopausal women with coronary heart disease. American Journal of Epidemiology, 158(12), 1161–1170.
  33. Lagakos, S. W. and Schoenfeld, D. A. (1984). Properties of proportional-hazards score tests under misspecified regression models. Biometrics, 40, 1037–1048.
  34. Linhart, H. and Zucchini, W. (1986). Model Selection. John Wiley & Sons, New York, Chichester.
  35. Maldonado, G. and Greenland, S. (1993). Simulation study of confounder-selection strategies. American Journal of Epidemiology, 138, 923–936.
  36. Meier, P., Ferguson, D. J. and Karrison, T. (1985). A controlled trial of extended radical mastectomy. Cancer, 55, 880–891.
  37. Miller, A. J. (1990). Subset Selection in Regression. Chapman & Hall Ltd, London, New York.
  38. Neuhaus, J. (1998). Estimation efficiency with omitted covariates in generalized linear models. Journal of the American Statistical Association, 93, 1124–1129.
  39. Neuhaus, J. and Jewell, N. P. (1993). A geometric approach to assess bias due to omitted covariates in generalized linear models. Biometrika, 80, 807–815.
  40. Orwoll, E., Bauer, D. C., Vogt, T. M. and Fox, K. M. (1996). Axial bone mass in older women. Annals of Internal Medicine, 124(2), 185–197.
  41. Parzen, M. and Lipsitz, S. R. (1999). A global goodness-of-fit statistic for Cox regression models. Biometrics, 55, 580–584.
  42. Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, 669–688.
  43. Pencina, M. J., D’Agostino Sr, R. B., D’Agostino Jr, R. B. and Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Statistics in Medicine, 27, 157–172.
  44. Schmoor, C. and Schumacher, M. (1997). Effects of covariate omission and categorization when analysing randomized trials with the Cox model. Statistics in Medicine, 16, 225–237.
  45. Steyerberg, E. W. (2009). Clinical Prediction Models. Springer, New York.
  46. Sun, G. W., Shook, T. L. and Kay, G. L. (1999). Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis. Journal of Clinical Epidemiology, 49, 907–916.
  47. Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395.
  48. van Houwelingen, H. C. (2000). Validation, calibration, revision, and combination of prognostic survival models. Statistics in Medicine, 19, 3401–3415.
  49. Vittinghoff, E. and McCulloch, C. E. (2007). Relaxing the rule of ten events per variable in logistic and Cox regression. American Journal of Epidemiology, 165, 710–718.
  50. Vittinghoff, E., Shlipak, M. G., Varosy, P. D., Furberg, C. D., Ireland, C. C., Khan, S. S., Blumenthal, R., Barrett-Connor, E. and Hulley, S. (2003). Risk factors and secondary prevention in women with heart disease: The Heart and Estrogen/progestin Replacement Study. Annals of Internal Medicine, 138(2), 81–89.
  51. Walter, L. C., Brand, R. J., Counsell, S. R., Palmer, R. M., Landefeld, C. S., Fortinsky, R. H. and Covinsky, K. E. (2001). Development and validation of a prognostic index for 1-year mortality in older adults after hospitalization. Journal of the American Medical Association, 285(23), 2987–2994.
  52. Weisberg, S. (1985). Applied Linear Regression. John Wiley & Sons, New York, Chichester.
  53. Whooley, M., de Jonge, P., Vittinghoff, E., Otte, C., Moos, R., Carney, R., Ali, S., Carney, R., Na, B., Feldman, M., Schiller, N. and Browner, W. (2008). Depressive symptoms, health behaviors, and risk of cardiovascular events in patients with coronary heart disease. Journal of the American Medical Association, 300(20), 2379–2388.
  54. Concato, J., Peduzzi, P. and Holford, T. R. (1995). Importance of events per independent variable in proportional hazards analysis I. Background, goals, and general strategy. Journal of Clinical Epidemiology, 48, 1495–1501.
  55. Grady, D., Wenger, N. K., Herrington, D., Khan, S., Furberg, C., Hunninghake, D., Vittinghoff, E. and Hulley, S. (2000). Postmenopausal hormone therapy increases risk of venous thromboembolic disease. The Heart and Estrogen/progestin Replacement Study. Annals of Internal Medicine, 132(9), 689–696.
  56. Molinaro, A. and van der Laan, M. J. (2004). Deletion/substitution/addition algorithm for partitioning the covariate space in prediction. U.C. Berkeley Division of Biostatistics Working Paper Series, Paper 162.
  57. Peduzzi, P., Concato, J. and Feinstein, A. R. (1995). Importance of events per independent variable in proportional hazards regression analysis II. Accuracy and precision of regression estimates. Journal of Clinical Epidemiology, 48, 1503–1510.
  58. Rothman, K. J. and Greenland, S. (1998). Modern Epidemiology. Lippincott Williams & Wilkins Publishers, Philadelphia, PA, 2nd ed.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Eric Vittinghoff (1)
  • David V. Glidden (1)
  • Stephen C. Shiboski (1)
  • Charles E. McCulloch (2)
  1. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, USA
  2. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, USA
