Ensemble Learning

  • Chapter in: Ensemble Machine Learning

Abstract

Over the last couple of decades, multiple classifier systems, also called ensemble systems, have enjoyed growing attention within the computational intelligence and machine learning community. This attention has been well deserved, as ensemble systems have proven to be effective and extremely versatile across a broad spectrum of problem domains and real-world applications. Originally developed to reduce the variance, and thereby improve the accuracy, of an automated decision-making system, ensemble systems have since been used successfully to address a variety of machine learning problems, including feature selection, confidence estimation, the missing feature problem, incremental learning, error correction, class-imbalanced data, and learning concept drift from nonstationary distributions. This chapter provides an overview of ensemble systems, their properties, and how they can be applied to such a wide spectrum of applications.
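
To make the variance-reduction point above concrete, the following short sketch (not taken from the chapter) compares a single decision tree with a bagged ensemble of trees; it assumes scikit-learn is available, and the synthetic dataset, estimator count, and other parameters are arbitrary choices made for illustration only.

    # Illustrative sketch only (not from the chapter): bagging as variance reduction.
    # Assumes scikit-learn is installed; dataset and parameters are arbitrary.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # A synthetic binary classification problem.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # A single, fully grown decision tree: low bias, relatively high variance.
    single_tree = DecisionTreeClassifier(random_state=0)

    # Bagging: train 50 trees on bootstrap resamples of the data and combine
    # their votes; averaging over resampled training sets reduces variance.
    bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                     random_state=0)

    print("single tree  accuracy:", cross_val_score(single_tree, X, y, cv=5).mean())
    print("bagged trees accuracy:", cross_val_score(bagged_trees, X, y, cv=5).mean())

The bagged ensemble typically scores higher than the single tree, illustrating the variance-reduction effect described above.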

Author information

Correspondence to Robi Polikar.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Polikar, R. (2012). Ensemble Learning. In: Zhang, C., Ma, Y. (eds) Ensemble Machine Learning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9326-7_1

  • DOI: https://doi.org/10.1007/978-1-4419-9326-7_1

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-9325-0

  • Online ISBN: 978-1-4419-9326-7

  • eBook Packages: Engineering, Engineering (R0)
