Encyclopedia of Complexity and Systems Science

2009 Edition
Editor-in-Chief: Robert A. Meyers

Machine Learning, Ensemble Methods in

  • Sašo Džeroski
  • Panče Panov
  • Bernard Ženko
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30440-3_315

Definition of the Subject

Ensemble methods are machine learning methods that construct a set of predictive models and combine their outputs into a single prediction. The purpose of combining several models is to achieve better predictive performance, and it has been shown in a number of cases that ensembles can be more accurate than any of their constituent models. Although some work on ensemble methods was already done in the 1970s, it was not until the 1990s, with the introduction of methods such as bagging and boosting, that ensemble methods came into wider use. Today, they represent a standard machine learning approach that should be considered whenever high predictive accuracy is required.
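The basic idea — train several models and combine their outputs by voting — can be illustrated with a minimal sketch. The code below is not from the entry itself; it assumes a toy one-dimensional dataset and uses bagging (bootstrap sampling) over simple threshold "stumps", with majority voting to combine predictions. All names (`fit_stump`, `bagging`, `predict`) are illustrative.

```python
import random
from collections import Counter

def fit_stump(data):
    """Pick the threshold t for which the rule (x > t) best matches the labels."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in data}):
        acc = sum((x > t) == y for x, y in data) / len(data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def bagging(data, n_models=25, seed=0):
    """Fit each stump on a bootstrap sample (drawn with replacement from data)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def predict(thresholds, x):
    """Combine the stumps' individual votes into a single majority prediction."""
    votes = Counter(x > t for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy data: label is True for large x, False for small x.
data = [(1, False), (2, False), (3, False), (6, True), (7, True), (8, True)]
models = bagging(data)
```

Because each model sees a different bootstrap sample, the individual stumps differ slightly; the majority vote smooths out their individual errors, which is the effect that makes bagging-style ensembles more accurate than a single model.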


Most machine learning techniques deal with the problem of learning predictive models from data. The data are usually given as a set of examples, where each example represents an object or a measurement. Each example can be described in terms of the values of several (independent) variables,...



Primary Literature

  1. Allwein EL, Schapire RE, Singer Y (2000) Reducing multiclass to binary: a unifying approach for margin classifiers. J Mach Learn Res 1:113–141
  2. Bauer E, Kohavi R (1999) An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach Learn 36(1–2):105–139
  3. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
  4. Breiman L (1998) Arcing classifiers. Ann Stat 26(3):801–849
  5. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
  6. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth and Brooks, Monterey
  7. Buciluǎ C, Caruana R, Niculescu-Mizil A (2006) Model compression. In: Proc. of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’06). ACM, New York, pp 535–541
  8. Cohen S, Intrator N (2000) A hybrid projection based and radial basis function architecture. In: Proc. of the 1st international workshop on multiple classifier systems (MCS ’00). Springer, Berlin, pp 147–156
  9. Cohen S, Intrator N (2001) Automatic model selection in a hybrid perceptron/radial network. In: Proc. of the 2nd international workshop on multiple classifier systems (MCS ’01). Springer, Berlin, pp 440–454
  10. Dietterich TG (1997) Machine-learning research: four current directions. AI Mag 18(4):97–136
  11. Dietterich TG (2000) Ensemble methods in machine learning. In: Proc. of the 1st international workshop on multiple classifier systems (MCS ’00). Springer, Berlin, pp 1–15
  12. Dietterich TG, Bakiri G (1995) Solving multiclass learning problems via error-correcting output codes. J Artif Intell Res 2:263–286
  13. Džeroski S, Ženko B (2004) Is combining classifiers with stacking better than selecting the best one? Mach Learn 54(3):255–273
  14. Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7(1):1–26
  15. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Saitta L (ed) Machine learning: Proc. of the 13th international conference (ICML ’96). Morgan Kaufmann, San Francisco, pp 148–156
  16. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
  17. Friedman JH, Popescu BE (2005) Predictive learning via rule ensembles. Technical report, Stanford University, Department of Statistics
  18. Friedman JH, Hastie T, Tibshirani RJ (1998) Additive logistic regression: a statistical view of boosting. Technical report, Stanford University, Department of Statistics
  19. Hamming RW (1950) Error detecting and error correcting codes. Bell Syst Tech J 29(2):147–160
  20. Hansen LK, Salamon P (1990) Neural network ensembles. IEEE Trans Pattern Anal Mach Intell 12(10):993–1001
  21. Hastie T, Tibshirani RJ, Friedman JH (2001) The elements of statistical learning. Springer series in statistics. Springer, Berlin
  22. Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844
  23. Ho TK (2000) Complexity of classification problems and comparative advantages of combined classifiers. In: Kittler J, Roli F (eds) Proc. of the 1st international workshop on multiple classifier systems (MCS ’00), vol 1857. Springer, Berlin, pp 97–106
  24. Jacobs RA (1995) Methods for combining experts’ probability assessments. Neural Comput 7(5):867–888
  25. Jordan MI, Jacobs RA (1992) Hierarchies of adaptive experts. In: Moody JE, Hanson S, Lippmann RP (eds) Advances in neural information processing systems (NIPS). Morgan Kaufmann, San Mateo, pp 985–992
  26. Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge
  27. Kittler J, Hatef M, Duin RPW, Matas J (1998) On combining classifiers. IEEE Trans Pattern Anal Mach Intell 20(3):226–239
  28. Kononenko I, Kukar M (2007) Machine learning and data mining: introduction to principles and algorithms. Horwood, Chichester
  29. Kuncheva LI (2004) Combining pattern classifiers: methods and algorithms. Wiley-Interscience, Hoboken
  30. Kuncheva LI, Whitaker CJ (2003) Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach Learn 51(2):181–207
  31. Mitchell T (1997) Machine learning. McGraw-Hill, New York
  32. Panov P, Džeroski S (2007) Combining bagging and random subspaces to create better ensembles. In: Proc. of the 7th international symposium on intelligent data analysis (IDA ’07), vol 4723. Lecture notes in computer science. Springer, Berlin, pp 118–129
  33. Polikar R (2006) Ensemble based systems in decision making. IEEE Circuits Syst Mag 6(3):21–45
  34. Rätsch G, Demiriz A, Bennett KP (2002) Sparse regression ensembles in infinite and finite hypothesis spaces. Mach Learn 48(1–3):189–218
  35. Ridgeway G, Madigan D, Richardson T (1999) Boosting methodology for regression problems. In: Heckerman D, Whittaker J (eds) Proc. of the 7th international workshop on artificial intelligence and statistics. Morgan Kaufmann, San Francisco, pp 152–161
  36. Rosenblatt F (1962) Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Spartan Books, Washington
  37. Schapire RE (1990) The strength of weak learnability. Mach Learn 5(2):197–227
  38. Schapire RE (1999) A brief introduction to boosting. In: Proc. of the 16th international joint conference on artificial intelligence. Morgan Kaufmann, San Francisco, pp 1401–1406
  39. Schapire RE (2001) The boosting approach to machine learning: an overview. In: MSRI workshop on nonlinear estimation and classification, Berkeley, CA, 2001
  40. Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336
  41. Schapire RE, Freund Y, Bartlett P, Lee WS (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 26(5):1651–1686
  42. Ting KM, Witten IH (1999) Issues in stacked generalization. J Artif Intell Res 10:271–289
  43. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
  44. Wolpert DH (1992) Stacked generalization. Neural Netw 5(2):241–259

Books and Reviews

  45. Brown G Ensemble learning bibliography. http://www.cs.man.ac.uk/%7Egbrown/ensemblebib/index.php. Accessed 26 March 2008
  46. Weka 3: Data mining software in Java. http://www.cs.waikato.ac.nz/ml/weka/. Accessed 26 March 2008

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Sašo Džeroski (1)
  • Panče Panov (1)
  • Bernard Ženko (1)

  1. Jožef Stefan Institute, Ljubljana, Slovenia