Predictor Selection

Part of the book series: Statistics for Biology and Health (SBH)

Abstract

Walter et al. (2001) developed a model to identify older adults at high risk of death in the first year after hospitalization, using data collected for 2,922 patients discharged from two hospitals in Ohio. Potential predictors included demographics, activities of daily living (ADLs), the APACHE-II illness-severity score, and information about the index hospitalization. Using data from one of the two hospitals, the authors chose a multipredictor model by a “backward” selection procedure with a restrictive inclusion criterion; they then validated the model using data from the other hospital. The goal was to select the model that best predicted future events, so that patients in need of more intensive monitoring and intervention could be identified.
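The develop-then-validate strategy described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from Walter et al.: the outcome is modeled by logistic regression, the data are simulated (two standard-normal "real" predictors and two pure-noise predictors), the "restrictive inclusion criterion" is represented by a likelihood-ratio test at p < 0.01 (chi-square, 1 df, critical value 6.635), and external validation is summarized by the c-statistic. All function names (`fit_logistic`, `backward_select`, `c_statistic`, `simulate`) are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, cols, max_iter=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + sum b_j x_j over `cols`.
    Returns (coefficients, log-likelihood); cols=[] fits intercept only."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]  # X'WX
        for z, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, z))))
            w = mu * (1.0 - mu)
            for i in range(p):
                grad[i] += (yi - mu) * z[i]
                for k in range(p):
                    hess[i][k] += w * z[i] * z[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    loglik = 0.0
    for z, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, z))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik


def backward_select(X, y, candidates, crit=6.635):
    """Repeatedly drop the predictor with the smallest likelihood-ratio
    chi-square until all remaining ones exceed `crit` (assumed p < 0.01)."""
    kept = list(candidates)
    while kept:
        _, ll_full = fit_logistic(X, y, kept)
        lr = {j: 2.0 * (ll_full - fit_logistic(X, y, [c for c in kept if c != j])[1])
              for j in kept}
        weakest = min(kept, key=lambda j: lr[j])
        if lr[weakest] >= crit:
            break
        kept.remove(weakest)
    return kept


def c_statistic(scores, y):
    """Proportion of (event, non-event) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    total = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return total / (len(pos) * len(neg))


def simulate(n, rng):
    """Hypothetical data: x0 and x1 predict the outcome, x2 and x3 are noise."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(4)]
        eta = -1.0 + 1.5 * x[0] + 0.8 * x[1]
        y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
        X.append(x)
    return X, y


rng = random.Random(2001)
X_dev, y_dev = simulate(500, rng)  # stands in for the development hospital
X_val, y_val = simulate(500, rng)  # stands in for the validation hospital
selected = backward_select(X_dev, y_dev, [0, 1, 2, 3])
beta, _ = fit_logistic(X_dev, y_dev, selected)
scores = [beta[0] + sum(b * x[j] for b, j in zip(beta[1:], selected)) for x in X_val]
auc = c_statistic(scores, y_val)
print("selected:", selected, "validation c-statistic:", round(auc, 3))
```

The key design point is that the model is chosen and fitted only on the development data; the c-statistic is then computed on data that played no part in selection, which is what makes it an honest estimate of predictive performance. Raising `crit` makes the inclusion criterion more restrictive and tends to yield smaller models.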

References

  • Allen, D. M. and Cady, F. B. (1982). Analyzing Experimental Data by Regression. Wadsworth, Belmont, CA.

  • Altman, D. G. and Andersen, P. K. (1989). Bootstrap investigation of the stability of the Cox regression model. Statistics in Medicine, 8, 771–783.

  • Altman, D. G. and Royston, P. (2000). What do we mean by validating a prognostic model? Statistics in Medicine, 19, 453–473.

  • Antman, E. M., Cohen, M., Bernink, P. J. L. M., McCabe, C. H., Horacek, T., Papuchis, G., Mautner, B., Corbalan, R., Radley, D. and Braunwald, E. (2000). The TIMI Risk Score for unstable angina/non-ST elevation MI. Journal of the American Medical Association, 284(7), 835–842.

  • Beach, M. L. and Meier, P. (1989). Choosing covariates in the analysis of clinical trials. Controlled Clinical Trials, 10, 161S–175S.

  • Begg, M. D. and Lagakos, S. (1993). Loss in efficiency caused by omitted covariates and misspecifying exposure in logistic regression models. Journal of the American Statistical Association, 88(421), 166–170.

  • Breiman, L. (2001). Statistical modeling: the two cultures. Statistical Science, 16(3), 199–231.

  • Brown, J., Vittinghoff, E., Wyman, J. F., Stone, K. L., Nevitt, M. C., Ensrud, K. E. and Grady, D. (2000). Urinary incontinence: does it increase risk for falls and fractures? Study of Osteoporotic Fractures Research Group. Journal of the American Geriatrics Society, 48, 721–725.

  • Buckland, S. T., Burnham, K. P. and Augustin, N. H. (1997). Model selection: an integral part of inference. Biometrics, 53, 603–618.

  • Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A, 158, 419–466.

  • Cook, N. R., Buring, J. E. and Ridker, P. M. (2006). The effect of including C-reactive protein in cardiovascular risk prediction models for women. Annals of Internal Medicine, 145, 21–29.

  • Crager, M. R. (1987). Analysis of covariance in parallel-group clinical trials with pretreatment baselines. Biometrics, 43(4), 895–901.

  • D’Agostino, R. B., Russell, M. W., Huse, D. M., Ellison, C., Silbershatz, H., Wilson, P. W. F. and Hartz, S. C. (2000). Primary and subsequent coronary risk appraisal: new results from the Framingham Study. American Heart Journal, 139, 272–281.

  • Gail, M. H., Tan, W. Y. and Piantadosi, S. (1988). Tests for no treatment effect in randomized clinical trials. Biometrika, 75, 57–64.

  • Gail, M. H., Wieand, S. and Piantadosi, S. (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika, 71, 431–444.

  • Glymour, M. M., Weuve, J., Berkman, L. F., Kawachi, I. and Robins, J. M. (2005). When is baseline adjustment useful in analyses of change? An example with education and cognitive change. American Journal of Epidemiology, 163(3), 267–278.

  • Gordon, W., Polansky, J., Boscardin, W., Fung, K. and Steinman, M. (2010). Coronary risk assessment by point-based and equation-based Framingham models: significant implications for clinical care. Journal of General Internal Medicine, 25(11), 1145–1151.

  • Greenland, S. (1989). Modeling and variable selection in epidemiologic analysis. American Journal of Public Health, 79(3), 340–349.

  • Greenland, S. (2003). Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology, 14(3), 300–306.

  • Greenland, S. and Brumback, B. (2002). An overview of relations among causal modeling methods. International Journal of Epidemiology, 31(5), 1030–1037.

  • Greenland, S., Pearl, J. and Robins, J. M. (1999). Causal diagrams for epidemiologic research. Epidemiology, 10, 37–48.

  • Grodstein, F., Manson, J. E. and Stampfer, M. J. (2001). Postmenopausal hormone use and secondary prevention of coronary events in the Nurses’ Health Study. Annals of Internal Medicine, 135, 1–8.

  • Harrell, F. E. (2005). Regression Modeling Strategies. Springer, New York.

  • Harrell, F. E., Lee, K. L., Califf, R. M., Pryor, D. B. and Rosati, R. A. (1984). Regression modelling strategies for improved prognostic prediction. Statistics in Medicine, 3, 143–152.

  • Hauck, W. W., Anderson, S. and Marcus, S. M. (1998). Should we adjust for covariates in nonlinear regression analyses of randomized trials? Controlled Clinical Trials, 19, 249–256.

  • Henderson, R. and Oman, P. (1999). Effect of frailty on marginal regression estimates in survival analysis. Journal of the Royal Statistical Society, Series B, Methodological, 61, 367–379.

  • Hernán, M. A., Hernández-Díaz, S. and Robins, J. M. (2004). A structural approach to selection bias. Epidemiology, 15(5), 615–625.

  • Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimates for nonorthogonal problems. Technometrics, 12, 55–67.

  • Hosmer, D. W. and Lemeshow, S. (2000). Applied Logistic Regression. John Wiley & Sons, New York, Chichester.

  • Hulley, S., Grady, D., Bush, T., Furberg, C., Herrington, D., Riggs, B. and Vittinghoff, E. (1998). Randomized trial of estrogen plus progestin for secondary prevention of heart disease in postmenopausal women. The Heart and Estrogen/progestin Replacement Study. Journal of the American Medical Association, 280(7), 605–613.

  • Jewell, N. P. (2004). Statistics for Epidemiology. Chapman & Hall/CRC, Boca Raton, FL.

  • Kanaya, A., Vittinghoff, E., Shlipak, M. G., Resnick, H. E., Visser, M., Grady, D. and Barrett-Connor, E. (2004). Association of total and central obesity with mortality in postmenopausal women with coronary heart disease. American Journal of Epidemiology, 158(12), 1161–1170.

  • Lagakos, S. W. and Schoenfeld, D. A. (1984). Properties of proportional-hazards score tests under misspecified regression models. Biometrics, 40, 1037–1048.

  • Linhart, H. and Zucchini, W. (1986). Model Selection. John Wiley & Sons, New York, Chichester.

  • Maldonado, G. and Greenland, S. (1993). Simulation study of confounder-selection strategies. American Journal of Epidemiology, 138, 923–936.

  • Meier, P., Ferguson, D. J. and Karrison, T. (1985). A controlled trial of extended radical mastectomy. Cancer, 55, 880–891.

  • Miller, A. J. (1990). Subset Selection in Regression. Chapman & Hall Ltd, London, New York.

  • Neuhaus, J. (1998). Estimation efficiency with omitted covariates in generalized linear models. Journal of the American Statistical Association, 93, 1124–1129.

  • Neuhaus, J. and Jewell, N. P. (1993). A geometric approach to assess bias due to omitted covariates in generalized linear models. Biometrika, 80, 807–815.

  • Orwoll, E., Bauer, D. C., Vogt, T. M. and Fox, K. M. (1996). Axial bone mass in older women. Annals of Internal Medicine, 124(2), 185–197.

  • Parzen, M. and Lipsitz, S. R. (1999). A global goodness-of-fit statistic for Cox regression models. Biometrics, 55, 580–584.

  • Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, 669–688.

  • Pencina, M. J., D’Agostino Sr, R. B., D’Agostino Jr, R. B. and Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Statistics in Medicine, 27, 157–172.

  • Schmoor, C. and Schumacher, M. (1997). Effects of covariate omission and categorization when analysing randomized trials with the Cox model. Statistics in Medicine, 16, 225–237.

  • Steyerberg, E. W. (2009). Clinical Prediction Models. Springer, New York.

  • Sun, G. W., Shook, T. L. and Kay, G. L. (1999). Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis. Journal of Clinical Epidemiology, 49, 907–916.

  • Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395.

  • van Houwelingen, H. C. (2000). Validation, calibration, revision, and combination of prognostic survival models. Statistics in Medicine, 19, 3401–3415.

  • Vittinghoff, E. and McCulloch, C. E. (2007). Relaxing the rule of ten events per variable in logistic and Cox regression. American Journal of Epidemiology, 165, 710–718.

  • Vittinghoff, E., Shlipak, M. G., Varosy, P. D., Furberg, C. D., Ireland, C. C., Khan, S. S., Blumenthal, R., Barrett-Connor, E. and Hulley, S. (2003). Risk factors and secondary prevention in women with heart disease: The Heart and Estrogen/progestin Replacement Study. Annals of Internal Medicine, 138(2), 81–89.

  • Walter, L. C., Brand, R. J., Counsell, S. R., Palmer, R. M., Landefeld, C. S., Fortinsky, R. H. and Covinsky, K. E. (2001). Development and validation of a prognostic index for 1-year mortality in older adults after hospitalization. Journal of the American Medical Association, 285(23), 2987–2994.

  • Weisberg, S. (1985). Applied Linear Regression. John Wiley & Sons, New York, Chichester.

  • Whooley, M., de Jonge, P., Vittinghoff, E., Otte, C., Moos, R., Carney, R., Ali, S., Carney, R., Na, B., Feldman, M., Schiller, N. and Browner, W. (2008). Depressive symptoms, health behaviors, and risk of cardiovascular events in patients with coronary heart disease. Journal of the American Medical Association, 300(20), 2379–2388.

  • Concato, J., Peduzzi, P. and Holford, T. R. (1995). Importance of events per independent variable in proportional hazards analysis. I. Background, goals, and general strategy. Journal of Clinical Epidemiology, 48, 1495–1501.

  • Grady, D., Wenger, N. K., Herrington, D., Khan, S., Furberg, C., Hunninghake, D., Vittinghoff, E. and Hulley, S. (2000). Postmenopausal hormone therapy increases risk of venous thromboembolic disease. The Heart and Estrogen/progestin Replacement Study. Annals of Internal Medicine, 132(9), 689–696.

  • Molinaro, A. and van der Laan, M. J. (2004). Deletion/substitution/addition algorithm for partitioning the covariate space in prediction. U.C. Berkeley Division of Biostatistics Working Paper Series, Paper 162.

  • Peduzzi, P., Concato, J. and Feinstein, A. R. (1995). Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. Journal of Clinical Epidemiology, 48, 1503–1510.

  • Rothman, K. J. and Greenland, S. (1998). Modern Epidemiology. Lippincott Williams & Wilkins Publishers, Philadelphia, PA, 2nd ed.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Vittinghoff, E., Glidden, D.V., Shiboski, S.C., McCulloch, C.E. (2012). Predictor Selection. In: Regression Methods in Biostatistics. Statistics for Biology and Health. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1353-0_10
