Abstract
Supervised ensemble methods construct a set of base learners (experts) and combine their weighted outputs to predict new data. Numerous empirical studies confirm that ensemble methods often outperform any single base learner (Freund and Schapire, 1996; Bauer and Kohavi, 1999; Dietterich, 2000b). The improvement is intuitively clear when the base algorithm is unstable, that is, when small changes in the training data lead to large changes in the resulting base learner (as with decision trees, neural networks, etc.). Recently, a series of theoretical developments (Bousquet and Elisseeff, 2000; Poggio et al., 2002; Mukherjee et al., 2003; Poggio et al., 2004) also confirmed the fundamental role of stability for generalization (the ability to perform well on unseen data) of any learning engine. For a multivariate learning algorithm, model selection and feature selection are closely related problems (the latter is a special case of the former). It is therefore sensible that model-based feature selection methods (wrappers, embedded methods) would benefit from the regularization effect provided by ensemble aggregation. This is especially true for the fast, greedy, and unstable learners often used for feature evaluation.
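As a rough illustration of these points (a minimal sketch, not the chapter's method), the code below bags an unstable greedy learner, an unpruned decision tree, on bootstrap resamples and aggregates both the class votes and the per-feature split importances. The use of scikit-learn, the synthetic dataset, and the choice of 100 trees are illustrative assumptions only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# A single greedy tree: fast to fit, but unstable (high variance).
single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("single tree test accuracy:", single.score(X_te, y_te))

# Bagging: fit the same base learner on bootstrap resamples and aggregate
# both the class votes and the per-feature split importances.
rng = np.random.default_rng(0)
n_trees = 100
votes = np.zeros((n_trees, len(y_te)))
importances = np.zeros(X.shape[1])
for b in range(n_trees):
    idx = rng.integers(0, len(y_tr), size=len(y_tr))       # bootstrap sample
    tree = DecisionTreeClassifier(random_state=b).fit(X_tr[idx], y_tr[idx])
    votes[b] = tree.predict(X_te)
    importances += tree.feature_importances_

ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)     # majority vote
importances /= n_trees                                      # averaged importance
print("bagged ensemble test accuracy:", (ensemble_pred == y_te).mean())
print("top 5 features by averaged importance:", np.argsort(importances)[::-1][:5])
```

Averaging over many bootstrap replicates smooths out the variance of the individual trees, which is the regularization effect that the abstract attributes to ensemble aggregation, and the averaged importances give a more stable ranking for feature evaluation than any single tree.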
References
Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9(7):1545–1588, 1997.
E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36:525–536, 1999.
A. Borisov, V. Eruhimov, and E. Tuv. Dynamic soft feature selection for tree-based ensembles. In Feature Extraction, Foundations and Applications. Springer, 2005.
B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Fifth Annual Workshop on Computational Learning Theory, pages 144–152, Pittsburgh, 1992.
O. Bousquet and A. Elisseeff. Algorithmic stability and generalization performance. In Advances in Neural Information Processing Systems 13, pages 196–202, 2000. URL citeseer.nj.nec.com/bousquet01algorithmic.html.
L. Breiman. Bagging predictors. Machine Learning, 24:123–140, 1996.
L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
L. Breiman. Manual On Setting Up, Using, And Understanding Random Forests V3.1, 2002.
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA, 1984.
L. Breiman. Arcing the edge. Technical Report 486, Statistics Department, University of California at Berkeley, 1997.
T.G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems. First International Workshop, volume 1857. Springer-Verlag, 2000a.
T.G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2):139–157, 2000b. available at ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-randomized-c4.ps.gz.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–451, 2004.
R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(II):179–188, 1936.
Y. Freund and R.E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of Thirteenth International Conference, pages 148–156, 1996.
J. Friedman. Greedy function approximation: a gradient boosting machine, 1999a. IMS 1999 Reitz Lecture, February 24, 1999, Dept. of Statistics, Stanford University.
J. Friedman. Stochastic gradient boosting. Technical report, Dept. of Statistics, Stanford University, 1999b.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28:832–844, 2000.
A. Gelman, J. Carlin, H. Stern, and D. Rubin. Bayesian Data Analysis. Chapman and Hall, 1995.
W.R. Gilks, S. Richardson, and D.J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman and Hall, 1996.
P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732, 1995.
L.K. Hansen and P. Salamon. Neural network ensembles. IEEE Trans. Pattern Analysis and Machine Intelligence, 12(10):993–1001, 1990.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
D. J. C. MacKay. Bayesian non-linear modelling for the prediction competition. ASHRAE Transactions: Symposia, OR-94-17-1, 1994.
S. Mukherjee, P. Niyogi, T. Poggio, and R. Rifkin. Statistical learning: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. AI Memo 2002-024, MIT, 2003.
R. Neal. Bayesian Learning for Neural Networks. Springer-Verlag, 1996.
A.Y. Ng and M.I. Jordan. Convergence rates of the voting Gibbs classifier, with application to Bayesian feature selection. In ICML 2001, pages 377–384, 2001.
B. Parmanto, P.W. Munro, and H.R. Doyle. Improving committee diagnosis with resampling techniques. In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 882–888. The MIT Press, 1996.
T. Poggio, R. Rifkin, S. Mukherjee, and P. Niyogi. General conditions for predictivity in learning theory. Nature, 428:419–422, 2004.
T. Poggio, R. Rifkin, S. Mukherjee, and A. Rakhlin. Bagging regularizes. AI Memo 2002-003, MIT, 2002.
M. Stephens. Bayesian analysis of mixtures with an unknown number of components: an alternative to reversible jump methods. The Annals of Statistics, 28(1):40–74, 2000.
R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B, 58:267–288, 1996.
G. Valentini and T. Dietterich. Low bias bagged support vector machines. In ICML 2003, pages 752–759, 2003.
G. Valentini and F. Masulli. Ensembles of learning machines. In M. Marinaro and R. Tagliaferri, editors, Neural Nets WIRN Vietri-02, Lecture Notes in Computer Sciences. Springer-Verlag, Heidelberg, 2002.
A. Vehtari and J. Lampinen. Bayesian input variable selection using posterior probabilities and expected utilities. Technical Report Report B31, Laboratory of Computational Engineering, Helsinki University of Technology, 2002.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this chapter
Tuv, E. (2006). Ensemble Learning. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_8
DOI: https://doi.org/10.1007/978-3-540-35488-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35487-1
Online ISBN: 978-3-540-35488-8