Ensemble learning refers to the procedures employed to train multiple learning machines and combine their outputs, treating them as a “committee” of decision makers. The principle is that the decision of the committee, with individual predictions combined appropriately, should have better overall accuracy, on average, than any individual committee member. Numerous empirical and theoretical studies have demonstrated that ensemble models very often attain higher accuracy than single models.
The members of an ensemble may predict real-valued numbers, class labels, posterior probabilities, rankings, clusterings, or any other quantity, so their decisions can be combined in many ways, including averaging, voting, and probabilistic methods. Most ensemble learning methods are generic, applicable across broad classes of model types and learning tasks.
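As a concrete illustration, the sketch below implements the two simplest combination rules in Python: majority voting over class labels and simple averaging over real-valued outputs. The helper names (majority_vote, average) and the toy predictions are illustrative assumptions, not part of any particular library or of the methods cited in this article.

```python
# A minimal sketch of two generic combiners (hypothetical helper names).
from collections import Counter
from typing import Sequence

def majority_vote(labels: Sequence[str]) -> str:
    """Return the class label chosen by the most committee members.

    Ties are broken by the order in which labels first appear.
    """
    return Counter(labels).most_common(1)[0][0]

def average(values: Sequence[float]) -> float:
    """Combine real-valued member outputs by simple averaging."""
    return sum(values) / len(values)

if __name__ == "__main__":
    # Three classifiers vote on a class label ...
    print(majority_vote(["spam", "ham", "spam"]))  # -> "spam"
    # ... and three regressors' predictions are averaged.
    print(average([2.0, 2.5, 3.0]))                # -> 2.5
```

Simple as these rules are, they capture the committee intuition: to the extent that members' errors are uncorrelated, voting or averaging tends to cancel individual mistakes rather than compound them.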
Motivation and Background
If we could build...