Ensemble Methods in Supervised Learning

  • Lior Rokach


The idea behind ensemble methodology is to build a predictive model by integrating multiple base models. Ensemble methods are well known to improve prediction performance. This chapter provides an overview of ensemble methods for classification tasks. We present the major types of ensemble methods, including boosting and bagging, and discuss combining methods as well as modeling issues such as ensemble diversity and ensemble size.
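The bagging approach mentioned above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (one-dimensional data, hypothetical decision-stump base learners, bootstrap sampling, majority-vote combining), not the chapter's own implementation:

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-threshold base classifier: predict 1 when x > threshold.
    The threshold minimizing training error is chosen among the sample's
    own x values (a hypothetical, deliberately weak base learner)."""
    best_thr, best_err = None, None
    for x, _ in sample:
        err = sum(int(xi > x) != yi for xi, yi in sample)
        if best_err is None or err < best_err:
            best_thr, best_err = x, err
    return best_thr

def bagging_predict(data, x, n_models=25, seed=0):
    """Predict the label of x by majority vote over stumps trained on
    bootstrap replicates of the training data."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap replicate: draw |data| points with replacement.
        sample = [rng.choice(data) for _ in data]
        votes.append(int(x > train_stump(sample)))
    # Plurality voting combines the base classifiers' outputs.
    return Counter(votes).most_common(1)[0][0]
```

Each base model sees a different bootstrap sample, so the models disagree on some inputs; combining them by voting reduces the variance of any single model's prediction.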

Key words

Ensemble, Boosting, AdaBoost, Windowing, Bagging, Grading, Arbiter Tree, Combiner Tree




Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel