Statistics and Computing

Volume 18, Issue 2, pp 195–208

Investigation about a screening step in model selection

  • Willi Sauerbrei
  • Norbert Holländer
  • Anika Buchholz

Abstract

In many studies a large number of variables is measured, and identifying the variables that influence an outcome is an important task. Several procedures are available for variable selection. However, focusing on a single model neglects the fact that other, equally appropriate models usually exist. Bayesian and frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say, more than ten) the resulting class of models can be very large. For Bayesian model averaging, Occam’s window is a popular approach to reduce the model space. As this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure. Based on the models selected in bootstrap samples, variables are eliminated before the model averaging predictor is derived. Backward elimination can be used as a simple alternative screening procedure.

Through two examples and by means of simulation we investigate some properties of the screening step. In the simulation study we consider settings with 15 and 25 variables, respectively, of which seven influence the outcome. The screening step eliminates most of the variables without influence, but also some variables with a weak effect. Variable screening leads to more applicable models without eliminating models that are more strongly supported by the data. Furthermore, we give recommendations for important parameters of the screening step.
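
To illustrate the idea behind such a screening step, the following is a minimal sketch, assuming a linear regression setting: backward elimination (here based on p-values) is run in each bootstrap sample, and variables are retained only if their bootstrap inclusion frequency exceeds a chosen cut-off. The function names, the significance level and the 30% threshold are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
import statsmodels.api as sm

def bootstrap_inclusion_frequencies(X, y, n_boot=100, alpha=0.05, seed=1):
    """Run backward elimination in each bootstrap sample and record
    how often each variable ends up in the selected model."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # draw a bootstrap sample
        Xb, yb = X[idx], y[idx]
        selected = list(range(p))
        while selected:
            fit = sm.OLS(yb, sm.add_constant(Xb[:, selected])).fit()
            pvals = fit.pvalues[1:]             # skip the intercept
            worst = np.argmax(pvals)
            if pvals[worst] > alpha:
                selected.pop(worst)             # drop least significant variable
            else:
                break                           # all remaining variables significant
        counts[selected] += 1
    return counts / n_boot

# Screening: keep only variables selected in at least 30% of the bootstrap
# samples (the cut-off is an assumption for illustration):
# freqs = bootstrap_inclusion_frequencies(X, y)
# kept = np.where(freqs >= 0.30)[0]
```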

Keywords

Model selection uncertainty · Variable screening · Bootstrap · Simulation

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Willi Sauerbrei (1)
  • Norbert Holländer (1)
  • Anika Buchholz (1)

  1. Freiburg, Germany
