Ensemble Methods in Machine Learning

  • Thomas G. Dietterich
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1857)

Abstract

Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that AdaBoost does not overfit rapidly.
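
As a rough illustration of the (weighted) voting scheme described in the abstract (not code from the paper), the sketch below trains several decision trees on bootstrap resamples of the training data, in the spirit of Bagging, and combines their predictions by a weighted vote. The use of scikit-learn, the Iris data, and training-accuracy weights are assumptions made only for this example; uniform weights recover a plain Bagging-style majority vote.

    # Illustrative sketch only: bagging-style ensemble combined by a weighted vote.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = load_iris(return_X_y=True)            # stand-in dataset (assumption)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    members, weights = [], []
    for _ in range(15):
        # Bootstrap resample: draw len(X_tr) examples with replacement.
        idx = rng.integers(0, len(X_tr), size=len(X_tr))
        tree = DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx])
        members.append(tree)
        # Illustrative weight: accuracy on the full training set.
        weights.append(tree.score(X_tr, y_tr))

    def weighted_vote(models, w, X):
        # Add each model's weight to the class it predicts, then take the argmax.
        n_classes = int(max(m.classes_.max() for m in models)) + 1
        votes = np.zeros((len(X), n_classes))
        for m, wi in zip(models, w):
            votes[np.arange(len(X)), m.predict(X)] += wi
        return votes.argmax(axis=1)

    print("ensemble test accuracy:",
          (weighted_vote(members, weights, X_te) == y_te).mean())

Boosting methods such as AdaBoost follow the same voting template, but construct the member classifiers sequentially and set their voting weights from each member's weighted training error rather than fixing them in advance.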

Keywords

Decision Tree, Training Data, Learning Algorithm, Input Feature, Ensemble Method

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Thomas G. Dietterich
  1. Oregon State University, Corvallis, USA
