Encyclopedia of Machine Learning and Data Mining

2017 Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Ensemble Learning

  • Gavin Brown
Reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7687-1_252



Ensemble learning refers to the procedures employed to train multiple learning machines and combine their outputs, treating them as a “committee” of decision makers. The principle is that the decision of the committee, with individual predictions combined appropriately, should have better overall accuracy, on average, than any individual committee member. Numerous empirical and theoretical studies have demonstrated that ensemble models very often attain higher accuracy than single models.

The members of the ensemble might be predicting real-valued numbers, class labels, posterior probabilities, rankings, clusterings, or any other quantity. Therefore, their decisions can be combined by many methods, including averaging, voting, and probabilistic methods. The majority of ensemble learning methods are generic, applicable across broad classes of model types and learning tasks.
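The combination rules mentioned above can be sketched in a few lines. The following is a minimal illustration (not from the original entry): a plurality vote for class-label predictions and a simple average for real-valued predictions, with the committee members' outputs given as hypothetical hard-coded lists.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class-label predictions from several committee members
    by plurality vote over each example."""
    combined = []
    for labels in zip(*predictions):  # one tuple of member labels per example
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

def average(predictions):
    """Combine real-valued predictions by simple averaging over members."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]

# Three hypothetical committee members, each predicting labels for four examples:
member_preds = [
    ["cat", "dog", "dog", "cat"],
    ["cat", "cat", "dog", "dog"],
    ["dog", "cat", "dog", "cat"],
]
print(majority_vote(member_preds))  # ['cat', 'cat', 'dog', 'cat']
```

The same column-wise combination pattern extends to weighted votes or averaged posterior probabilities by replacing the per-example reduction.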

Motivation and Background

If we could build...


Recommended Reading

  1. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
  2. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
  3. Brown G (2004) Diversity in neural network ensembles. PhD thesis, University of Birmingham
  4. Brown G, Wyatt JL, Harris R, Yao X (2005) Diversity creation methods: a survey and categorisation. J Inf Fusion 6(1):5–20
  5. Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd international conference on machine learning. ACM, New York, pp 161–168
  6. Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Proceedings of the thirteenth international conference on machine learning (ICML’96). Morgan Kaufmann Publishers, San Francisco, pp 148–156
  7. Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4(1):1–58
  8. Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844
  9. Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixtures of local experts. Neural Comput 3(1):79–87
  10. Kearns M, Valiant LG (1988) Learning Boolean formulae or finite automata is as hard as factoring. Technical report TR-14-88, Harvard University Aiken Computation Laboratory
  11. Koltchinskii V, Panchenko D (2005) Complexities of convex combinations and bounding the generalization error in classification. Ann Stat 33(4):1455
  12. Krogh A, Vedelsby J (1995) Neural network ensembles, cross-validation and active learning. In: Advances in neural information processing systems. MIT Press, Cambridge, pp 231–238
  13. Kuncheva LI (2004a) Classifier ensembles for changing environments. In: International workshop on multiple classifier systems. Lecture notes in computer science, vol 3007. Springer, Berlin
  14. Kuncheva LI (2004b) Combining pattern classifiers: methods and algorithms. Wiley, New York
  15. Laplace PS (1818) Deuxième supplément à la théorie analytique des probabilités. Gauthier-Villars, Paris
  16. Mease D, Wyner A (2008) Evidence contrary to the statistical view of boosting. J Mach Learn Res 9:131–156
  17. Melville P, Mooney RJ (2005) Creating diversity in ensembles using artificial data. Inf Fusion 6(1):99–111
  18. Polikar R (2006) Ensemble based systems in decision making. IEEE Circ Syst Mag 6(3):21–45
  19. Rätsch G, Mika S, Schölkopf B, Müller KR (2002) Constructing boosting algorithms from SVMs: an application to one-class classification. IEEE Trans Pattern Anal Mach Intell 24(9):1184–1199
  20. Rodriguez J, Kuncheva L, Alonso C (2006) Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 28(10):1619–1630
  21. Roli F, Kittler J, Windridge D, Oza N, Polikar R, Haindl M et al (eds) Proceedings of the international workshop on multiple classifier systems 2000–2009. Lecture notes in computer science. Springer, Berlin. Available at: http://www.informatik.uni-trier.de/ley/db/conf/mcs/index.html
  22. Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197–227
  23. Schapire RE (1999) A brief introduction to boosting. In: Proceedings of the 16th international joint conference on artificial intelligence. Morgan Kaufmann, San Francisco, pp 1401–1406
  24. Schapire RE (2003) The boosting approach to machine learning: an overview. In: Denison DD, Hansen MH, Holmes C, Mallick B, Yu B (eds) Nonlinear estimation and classification. Lecture notes in statistics. Springer, Berlin, pp 149–172
  25. Strehl A, Ghosh J (2003) Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
  26. Tumer K, Ghosh J (1996) Error correlation and error reduction in ensemble classifiers. Connect Sci 8(3–4):385–403
  27. Ueda N, Nakano R (1996) Generalization error of ensemble estimators. In: Proceedings of IEEE international conference on neural networks, vol 1, pp 90–95. ISBN:0-7803-3210-5

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Gavin Brown
  1. The University of Manchester, Manchester, UK