This chapter addresses the question of which predictor variables should be included in a linear model. The easiest version of the problem is deciding, given a linear model, which variables should be excluded. To that end we examine the question of selecting the best subset of predictor variables from among the original variables. Doing so requires us to define a “best” model, and we examine several competing measures. We also examine the “greedy” algorithm for this problem known as backward elimination. The more difficult problem of deciding which variables to place into a linear model is addressed by the greedy algorithm of forward selection. (These algorithms are greedy in the sense of always taking the best available step right now, rather than seeking a globally best model.) We examine traditional forward selection as well as the modern adaptations of forward selection known as boosting, bagging, and random forests.
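To make the greedy flavor of backward elimination concrete, here is a minimal sketch (not code from the chapter): on each pass it drops the single predictor whose removal most improves a Gaussian AIC score, and stops when no removal helps. The `aic` and `backward_eliminate` names, and the choice of AIC as the “best model” criterion, are illustrative assumptions; the chapter discusses several competing measures.

```python
import numpy as np

def aic(X, y):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2*k."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares fit
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * k

def backward_eliminate(X, y, keep=(0,)):
    """Greedy backward elimination on the columns of X.

    keep -- column indices never considered for removal (e.g. the intercept).
    Returns the list of retained column indices.
    """
    cols = list(range(X.shape[1]))
    best = aic(X[:, cols], y)
    while True:
        candidates = [c for c in cols if c not in keep]
        if not candidates:
            break
        # Score the model that results from dropping each candidate.
        scores = {c: aic(X[:, [t for t in cols if t != c]], y)
                  for c in candidates}
        c_drop = min(scores, key=scores.get)
        if scores[c_drop] < best:          # greedy step: take the best
            best = scores[c_drop]          # single removal available now
            cols = [t for t in cols if t != c_drop]
        else:
            break                          # no removal improves the score
    return cols
```

Because each pass commits to the locally best removal, the procedure can miss a better subset that would require a temporarily worse intermediate step; this is exactly the sense in which the chapter calls the algorithm greedy.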