Abstract
Over the last couple of decades, multiple classifier systems, also called ensemble systems, have enjoyed growing attention within the computational intelligence and machine learning community. This attention is well deserved, as ensemble systems have proven to be effective and versatile across a broad spectrum of problem domains and real-world applications. Originally developed to reduce the variance of an automated decision-making system, and thereby improve its accuracy, ensemble systems have since been successfully applied to a variety of machine learning problems, including feature selection, confidence estimation, the missing feature problem, incremental learning, error correction, class-imbalanced data, and learning concept drift from nonstationary distributions. This chapter provides an overview of ensemble systems, their properties, and how they can be applied to such a wide spectrum of applications.
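The variance-reduction idea the abstract refers to can be illustrated with a minimal sketch of bagging with majority voting, one of the simplest ensemble schemes discussed in the ensemble literature. This is an illustrative example only, not the chapter's specific algorithm; it assumes Python with NumPy and scikit-learn available, and all dataset sizes and parameter choices below are arbitrary.

```python
# Minimal bagging sketch: train T base classifiers on bootstrap replicates
# of the training set, then combine their decisions by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

T = 25  # ensemble size (illustrative choice)
members = []
for _ in range(T):
    # Bootstrap replicate: sample training indices with replacement.
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    # Unpruned trees are low-bias, high-variance learners: good bagging candidates.
    members.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

# Each member casts one vote per test point; the ensemble predicts the majority class.
votes = np.stack([m.predict(X_te) for m in members])  # shape (T, n_test)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("single tree accuracy:   ", (single.predict(X_te) == y_te).mean())
print("bagged ensemble accuracy:", (majority == y_te).mean())
```

Because each tree sees a slightly different bootstrap sample, the members make partly independent errors; averaging their votes cancels much of that variance, which is why the ensemble typically outperforms any single member.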
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Polikar, R. (2012). Ensemble Learning. In: Zhang, C., Ma, Y. (eds) Ensemble Machine Learning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9326-7_1
DOI: https://doi.org/10.1007/978-1-4419-9326-7_1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-9325-0
Online ISBN: 978-1-4419-9326-7