Ensemble Learning

Chapter in Feature Extraction

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 207)

Abstract

Supervised ensemble methods construct a set of base learners (experts) and combine their weighted outputs to predict new data. Numerous empirical studies confirm that ensemble methods often outperform any single base learner (Freund and Schapire, 1996; Bauer and Kohavi, 1999; Dietterich, 2000b). The improvement is intuitively clear when the base algorithm is unstable, i.e., when small changes in the training data lead to large changes in the resulting base learner (as with decision trees, neural networks, etc.). Recently, a series of theoretical developments (Bousquet and Elisseeff, 2000; Poggio et al., 2002; Mukherjee et al., 2003; Poggio et al., 2004) has also confirmed the fundamental role of stability for generalization (the ability to perform well on unseen data) of any learning engine. Given a multivariate learning algorithm, model selection and feature selection are closely related problems (the latter is a special case of the former). It is therefore sensible that model-based feature selection methods (wrappers, embedded methods) benefit from the regularization effect provided by ensemble aggregation. This is especially true for the fast, greedy, and unstable learners often used for feature evaluation.
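
To make the aggregation concrete, the sketch below bags an unstable base learner (a decision tree) over bootstrap samples and averages both its predictions and its per-tree feature importances. This is a minimal illustration and not the chapter's code: it assumes scikit-learn and NumPy, and the dataset and parameter choices (n_trees, max_features, the synthetic data) are purely illustrative.

```python
# Minimal bagging sketch (illustrative only): bootstrap-resampled decision
# trees, averaged predictions, and ensemble-averaged feature importances.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

n_trees = 50
trees, importances = [], []
for _ in range(n_trees):
    # Bootstrap sample: small perturbations of the training data yield quite
    # different trees (instability), but their aggregate is far more stable.
    idx = rng.randint(0, X.shape[0], X.shape[0])
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=rng)
    tree.fit(X[idx], y[idx])
    trees.append(tree)
    importances.append(tree.feature_importances_)

# Ensemble prediction: average the per-tree votes (equal weights here).
votes = np.mean([t.predict(X) for t in trees], axis=0)
y_hat = (votes >= 0.5).astype(int)
print("training accuracy of the ensemble:", (y_hat == y).mean())

# Averaging importances across trees gives a simple model-based feature
# ranking that is more stable than any single tree's importances.
avg_importance = np.mean(importances, axis=0)
print("top-5 features by averaged importance:",
      np.argsort(avg_importance)[::-1][:5])
```

Randomizing the features considered at each split (max_features) on top of bootstrapping is the extra ingredient that turns bagged trees into a random forest (Breiman, 2001).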

References

  • Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9(7):1545–1588, 1997.
  • E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36:525–536, 1999.
  • A. Borisov, V. Eruhimov, and E. Tuv. Dynamic soft feature selection for tree-based ensembles. In Feature Extraction, Foundations and Applications. Springer, 2005.
  • B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Fifth Annual Workshop on Computational Learning Theory, pages 144–152, Pittsburgh, 1992.
  • O. Bousquet and A. Elisseeff. Algorithmic stability and generalization performance. In Advances in Neural Information Processing Systems 13, pages 196–202, 2000. URL citeseer.nj.nec.com/bousquet01algorithmic.html.
  • L. Breiman. Bagging predictors. Machine Learning, 24:123–140, 1996.
  • L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
  • L. Breiman. Manual on Setting Up, Using, and Understanding Random Forests v3.1, 2002.
  • L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA, 1984.
  • L. Breiman. Arcing the edge. Technical Report 486, Statistics Department, University of California at Berkeley, 1997.
  • T.G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems. First International Workshop, volume 1857. Springer-Verlag, 2000a.
  • T.G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2):139–157, 2000b. Available at ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-randomized-c4.ps.gz.
  • B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–451, 2004.
  • R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(II):179–188, 1936.
  • Y. Freund and R.E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, pages 148–156, 1996.
  • J. Friedman. Greedy function approximation: A gradient boosting machine, 1999a. IMS 1999 Reitz Lecture, February 24, 1999, Dept. of Statistics, Stanford University.
  • J. Friedman. Stochastic gradient boosting. Technical report, Dept. of Statistics, Stanford University, 1999b.
  • J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28:832–844, 2000.
  • A. Gelman, J. Carlin, H. Stern, and D. Rubin. Bayesian Data Analysis. Chapman and Hall, 1995.
  • W.R. Gilks, S. Richardson, and D.J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman and Hall, 1996.
  • P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732, 1995.
  • L.K. Hansen and P. Salamon. Neural network ensembles. IEEE Trans. Pattern Analysis and Machine Intelligence, 12(10):993–1001, 1990.
  • T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
  • D.J.C. MacKay. Bayesian non-linear modelling for the prediction competition. ASHRAE Transactions: Symposia, OR-94-17-1, 1994.
  • S. Mukherjee, P. Niyogi, T. Poggio, and R. Rifkin. Statistical learning: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. AI Memo 2002-024, MIT, 2003.
  • R. Neal. Bayesian Learning for Neural Networks. Springer-Verlag, 1996.
  • A.Y. Ng and M.I. Jordan. Convergence rates of the voting Gibbs classifier, with application to Bayesian feature selection. In ICML 2001, pages 377–384, 2001.
  • B. Parmanto, P.W. Munro, and H.R. Doyle. Improving committee diagnosis with resampling techniques. In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 882–888. The MIT Press, 1996.
  • T. Poggio, R. Rifkin, S. Mukherjee, and P. Niyogi. General conditions for predictivity in learning theory. Nature, 428:419–422, 2004.
  • T. Poggio, R. Rifkin, S. Mukherjee, and A. Rakhlin. Bagging regularizes. AI Memo 2002-003, MIT, 2002.
  • M. Stephens. Bayesian analysis of mixtures with an unknown number of components: An alternative to reversible jump methods. The Annals of Statistics, 28(1):40–74, 2000.
  • R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B, 58:267–288, 1996.
  • G. Valentini and T. Dietterich. Low bias bagged support vector machines. In ICML 2003, pages 752–759, 2003.
  • G. Valentini and F. Masulli. Ensembles of learning machines. In M. Marinaro and R. Tagliaferri, editors, Neural Nets WIRN Vietri-02, Lecture Notes in Computer Science. Springer-Verlag, Heidelberg, 2002.
  • A. Vehtari and J. Lampinen. Bayesian input variable selection using posterior probabilities and expected utilities. Technical Report B31, Laboratory of Computational Engineering, Helsinki University of Technology, 2002.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Tuv, E. (2006). Ensemble Learning. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_8

  • DOI: https://doi.org/10.1007/978-3-540-35488-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35487-1

  • Online ISBN: 978-3-540-35488-8
