Abstract
We examine the mechanism by which feature selection improves the accuracy of supervised learning. An empirical bias/variance analysis as feature selection progresses indicates that the most accurate feature set corresponds to the best bias-variance trade-off point for the learning algorithm. Often this is not the point separating relevant from irrelevant features, but the point where the increase in variance outweighs the gain from adding more (weakly) relevant features. In other words, feature selection can be viewed as a variance reduction method that trades the benefit of decreased variance (from the reduction in dimensionality) against the harm of increased bias (from eliminating some of the relevant features). If a variance reduction method such as bagging is used, more (weakly) relevant features can be exploited and the most accurate feature set is usually larger. In many cases, the best performance is obtained by using all available features.
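The kind of bias/variance measurement the abstract describes can be reproduced in miniature. The sketch below is an illustration, not the authors' experimental code: it estimates a Domingos-style 0/1-loss bias and variance over bootstrap resamples for a decision tree, with and without bagging, as features are added in ranked order. The synthetic dataset, the mutual-information feature ranking, and all hyperparameters are assumptions made for the demonstration.

```python
# Illustrative sketch (assumed setup, not the paper's protocol): track a
# 0/1-loss bias/variance decomposition as feature selection progresses,
# for a single decision tree and for a bagged ensemble of trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def bias_variance_01(model_factory, X_tr, y_tr, X_te, y_te, n_rounds=20, seed=0):
    """Monte-Carlo estimate of 0/1-loss bias and variance over bootstrap
    resamples of the training set, in the spirit of Kohavi & Wolpert (1996)
    and Domingos (2000); label noise is ignored for simplicity."""
    rng = np.random.RandomState(seed)
    preds = np.empty((n_rounds, len(y_te)), dtype=int)
    for r in range(n_rounds):
        idx = rng.randint(0, len(y_tr), len(y_tr))   # bootstrap resample
        preds[r] = model_factory().fit(X_tr[idx], y_tr[idx]).predict(X_te)
    # Main (majority-vote) prediction per test point.
    main = np.apply_along_axis(lambda p: np.bincount(p).argmax(), 0, preds)
    bias = np.mean(main != y_te)                # main prediction is wrong
    variance = np.mean(preds != main[None, :])  # disagreement with the main prediction
    return bias, variance

# Synthetic data with a mix of strongly and weakly relevant features (assumption).
X, y = make_classification(n_samples=2000, n_features=40, n_informative=15,
                           n_redundant=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# Rank features by mutual information with the label (a simple filter ranking).
order = np.argsort(-mutual_info_classif(X_tr, y_tr, random_state=0))

for k in (2, 5, 10, 20, 40):                    # growing feature subsets
    cols = order[:k]
    for name, make in [
        ("tree",   lambda: DecisionTreeClassifier(random_state=0)),
        ("bagged", lambda: BaggingClassifier(DecisionTreeClassifier(),
                                             n_estimators=25, random_state=0)),
    ]:
        b, v = bias_variance_01(make, X_tr[:, cols], y_tr, X_te[:, cols], y_te)
        print(f"k={k:2d} {name:6s} bias={b:.3f} variance={v:.3f}")
```

Under this setup one would expect the single tree's variance to grow as weakly relevant features are added while the bagged ensemble's variance stays nearly flat, so the bagged model's most accurate subset tends to be larger, which is the pattern the abstract describes.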
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Munson, M.A., Caruana, R. (2009). On Feature Selection, Bias-Variance, and Bagging. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol. 5782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04174-7_10
DOI: https://doi.org/10.1007/978-3-642-04174-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04173-0
Online ISBN: 978-3-642-04174-7