Abstract
In many studies a large number of variables is measured, and identifying the variables that influence an outcome is an important task. Several procedures are available for variable selection. However, focusing on a single model neglects the fact that other, equally appropriate models usually exist. Bayesian and frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say, more than ten), the class of candidate models becomes very large: p variables give 2^p subset models, already 1,024 for p = 10. For Bayesian model averaging, Occam’s window is a popular approach to reduce the model space. Because this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure: based on the models selected in bootstrap samples, variables are eliminated before a model averaging predictor is derived. Backward elimination can be used as a simple alternative screening procedure.
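The screening idea can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the article's implementation. The linear model is fitted by ordinary least squares. AIC-based backward elimination stands in for the selection procedure applied in each bootstrap replicate. The inclusion threshold of 0.3 is a hypothetical default. Akaike weights over all subsets of the surviving variables serve as one common model averaging scheme, and the article's own weighting may differ. All function names and the simulated data are illustrative.

```python
import math
import random
from itertools import combinations


def aic(X, y, cols):
    """Gaussian AIC (up to a constant) of an OLS fit of y on an intercept plus columns `cols`."""
    n, k = len(y), len(cols) + 1
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    # Normal equations A b = c, solved by Gaussian elimination with partial pivoting.
    A = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    c = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for q in range(p, k):
                A[r][q] -= f * A[p][q]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(A[p][q] * b[q] for q in range(p + 1, k))) / A[p][p]
    rss = sum((yi - sum(za * ba for za, ba in zip(z, b))) ** 2 for z, yi in zip(Z, y))
    return n * math.log(rss / n) + 2 * k


def backward_elimination(X, y, cols):
    """Drop one variable at a time as long as dropping it lowers AIC (assumed selection rule)."""
    current = list(cols)
    best = aic(X, y, current)
    while current:
        score, worst = min((aic(X, y, [c for c in current if c != j]), j) for j in current)
        if score >= best:
            break
        best = score
        current.remove(worst)
    return current


def bootstrap_screen(X, y, n_boot=50, threshold=0.3, seed=0):
    """Bootstrap inclusion frequencies; keep variables selected in >= `threshold` of replicates."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        for j in backward_elimination([X[i] for i in idx], [y[i] for i in idx], range(p)):
            counts[j] += 1
    freq = [cnt / n_boot for cnt in counts]
    return freq, [j for j in range(p) if freq[j] >= threshold]


def akaike_weights(X, y, keep):
    """Weights proportional to exp(-AIC/2) over all 2**len(keep) subsets of the screened variables."""
    models = [list(s) for r in range(len(keep) + 1) for s in combinations(keep, r)]
    scores = [aic(X, y, m) for m in models]
    best = min(scores)
    raw = [math.exp(-(s - best) / 2) for s in scores]
    total = sum(raw)
    return [(m, w / total) for m, w in zip(models, raw)]


# Hypothetical data: x0 and x1 influence y, x2..x4 are pure noise.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(80)]
y = [2 * row[0] + row[1] + rng.gauss(0, 1) for row in X]

freq, keep = bootstrap_screen(X, y)
weights = akaike_weights(X, y, keep)
print("inclusion frequencies:", [round(f, 2) for f in freq])
print("kept variables:", keep)
print("top model:", max(weights, key=lambda mw: mw[1]))
```

In such a run, strongly influential variables are selected in essentially every bootstrap replicate, while noise variables appear only occasionally. A variable with a weak effect can also fall below the threshold, which matches the trade-off discussed in the abstract. Screening first keeps the averaging step over 2^k subsets of the k surviving variables rather than all 2^p.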
Using two examples and a simulation study, we investigate some properties of the screening step. In the simulation study we consider situations with 15 and 25 variables, of which seven influence the outcome in each case. With the screening step most of the non-influential variables are eliminated, but so are some variables with a weak effect. Variable screening leads to more applicable models without eliminating models that are more strongly supported by the data. Furthermore, we give recommendations for important parameters of the screening step.
Cite this article
Sauerbrei, W., Holländer, N. & Buchholz, A. Investigation about a screening step in model selection. Stat Comput 18, 195–208 (2008). https://doi.org/10.1007/s11222-007-9048-5
DOI: https://doi.org/10.1007/s11222-007-9048-5