
A survey on ensemble learning

  • Review Article
  • Published in Frontiers of Computer Science

Abstract

Despite significant successes in knowledge discovery, traditional machine learning methods may fail to achieve satisfactory performance when dealing with complex data, such as imbalanced, high-dimensional, or noisy data. The reason is that it is difficult for these methods to capture multiple characteristics and the underlying structure of the data. In this context, how to effectively construct an efficient knowledge discovery and mining model has become an important topic in the data mining field. Ensemble learning, one active research area, aims to integrate data fusion, data modeling, and data mining into a unified framework. Specifically, ensemble learning first extracts a set of features with a variety of transformations. Based on these learned features, multiple learning algorithms are used to produce weak predictive results. Finally, ensemble learning adaptively fuses the informative knowledge from these results via voting schemes to achieve knowledge discovery and better predictive performance. In this paper, we review the research progress of the mainstream approaches to ensemble learning and classify them by their characteristics. In addition, we present challenges and possible research directions for each mainstream approach, and we also introduce combinations of ensemble learning with other machine learning hot spots such as deep learning and reinforcement learning.
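The pipeline the abstract describes — train several weak learners on varied views of the data, then fuse their outputs by voting — can be sketched as a minimal bagging ensemble of decision stumps in pure Python. This is a hypothetical illustration of the general idea, not code from the paper; the dataset and all function names are invented for the example:

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-threshold classifier (decision stump) to (x, label) pairs."""
    xs = sorted({x for x, _ in sample})
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    if not thresholds:  # degenerate bootstrap sample: fall back to majority label
        majority = Counter(y for _, y in sample).most_common(1)[0][0]
        return lambda x: majority
    best = None  # (errors, threshold, polarity)
    for t in thresholds:
        for polarity in (1, -1):
            errors = sum((1 if polarity * (x - t) > 0 else 0) != y
                         for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, t, p = best
    return lambda x: 1 if p * (x - t) > 0 else 0

def bagging_ensemble(data, n_estimators=11, seed=0):
    """Train stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    stumps = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_estimators)]
    def predict(x):
        votes = Counter(s(x) for s in stumps)
        return votes.most_common(1)[0][0]  # fuse weak results by voting
    return predict

# Toy 1D data: class 0 for small x, class 1 for large x, one noisy point (4, 1).
data = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (7, 1), (8, 1), (9, 1)]
predict = bagging_ensemble(data)
print(predict(0), predict(10))
```

Each stump is a weak learner trained on a different bootstrap view of the data; the majority vote averages out their individual errors, which is the variance-reduction effect that bagging-style ensembles exploit.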



    Google Scholar 

  158. Karypis G, Han E H S, Kumar V. Chameleon: hierarchical clustering using dynamic modeling. Computer, 1999, 32(8): 68–75

    Google Scholar 

  159. Xiao W, Yang Y, Wang H, Li T, Xing H. Semi-supervised hierarchical clustering ensemble and its application. Neurocomputing, 2016, 173: 1362–1376

    Google Scholar 

  160. Zhou Z H, Tang W. Clusterer ensemble. Knowledge-Based Systems, 2006, 19(1): 77–83

    Google Scholar 

  161. Zhang J, Yang Y, Wang H, Mahmood A, Huang F. Semi-supervised clustering ensemble based on collaborative training. In: Proceedings of International Conference on Rough Sets and Knowledge Technology. 2012, 450–455

    Google Scholar 

  162. Zhou Z H, Li M. Tri-training: exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(11): 1529–1541

    Google Scholar 

  163. Wang H, Yang D, Qi J. Semi-supervised cluster ensemble based on normal mutual information. Energy Procedia, 2011, 13: 1673–1677

    Google Scholar 

  164. Yu Z, Luo P, Liu J, Wong H S, You J, Han G, Zhang J. Semi-supervised ensemble clustering based on selected constraint projection. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(12): 2394–2407

    Google Scholar 

  165. Yang Y, Teng F, Li T, Wang H, Zhang Q. Parallel semi-supervised multi-ant colonies clustering ensemble based on mapreduce methodology. IEEE Transactions on Cloud Computing, 2018, 6(3): 857–867

    Google Scholar 

  166. Iqbal A M, Moh’D A, Khan Z. Semi-supervised clustering ensemble by voting. Computer Science, 2012, 2(9): 33–40

    Google Scholar 

  167. Chen D, Yang Y, Wang H, Mahmood A. Convergence analysis of semi-supervised clustering ensemble. In: Proceedings of International Conference on Information Science and Technology. 2014, 783–788

    Google Scholar 

  168. Yan B, Domeniconi C. Subspace metric ensembles for semi-supervised clustering of high dimensional data. In: Proceedings of European Conference on Machine Learning. 2006, 509–520

    Google Scholar 

  169. Mahmood A, Li T, Yang Y, Wang H, Afzal M. Semi-supervised clustering ensemble for Web video categorization. In: Proceedings of International Workshop on Multiple Classifier Systems. 2013, 190–200

    Google Scholar 

  170. Mahmood A, Li T, Yang Y, Wang H, Afzal M. Semi-supervised evolutionary ensembles for web video categorization. Knowledge-Based Systems, 2015, 76: 53–66

    Google Scholar 

  171. Junaidi A, Fink GA. A semi-supervised ensemble learning approach for character labeling with minimal human effort. In: Proceedings of 2011 International Conference on Document Analysis and Recognition. 2011, 259–263

    Google Scholar 

  172. Yu Z, Wongb H S, You J, Yang Q, Liao H. Knowledge based cluster ensemble for cancer discovery from biomolecular data. IEEE Transactions on Nanobioscience, 2011, 10(2): 76–85

    Google Scholar 

  173. Yu Z, Chen H, You J, Wong H S, Liu J, Li L, Han G. Double selection based semi-supervised clustering ensemble for tumor clustering from gene expression profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2014, 11(4): 727–740

    Google Scholar 

  174. Krogh A, Vedelsby J. Neural network ensembles, cross validation and active learning. In: Proceedings of the 7th International Conference on Neural Information Processing Systems. 1994, 231–238

    Google Scholar 

  175. Yin Z, Zhao M, Wang Y, Yang J, Zhang J. Recognition of emotions using multimodal physiological signals and an ensemble deep learning model. Computer Methods and Programs in Biomedicine, 2017, 140: 93–110

    Google Scholar 

  176. Kumar A, Kim J, Lyndon D, Fulham M, Feng D. An ensemble of fine-tuned convolutional neural networks for medical image classification. IEEE Journal of Biomedical and Health Informatics, 2017, 21(1): 31–40

    Google Scholar 

  177. Liu W, Zhang M, Luo Z, Cai Y. An ensemble deep learning method for vehicle type classification on visual traffic surveillance sensors. IEEE Access, 2017, 5: 24417–24425

    Google Scholar 

  178. Kandaswamy C, Silva L M, Alexandre L A, Santos JM. Deep transfer learning ensemble for classification. In: Proceedings of International Work-Conference on Artificial Neural Networks. 2015, 335–348

    Google Scholar 

  179. Nozza D, Fersini E, Messina E. Deep learning and ensemble methods for domain adaptation. In: Proceedings of the 28th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). 2016, 184–189

    Google Scholar 

  180. Liu X, Liu Z, Wang G, Cai Z, Zhang H. Ensemble transfer learning algorithm. IEEE Access, 2018, 6: 2389–2396

    Google Scholar 

  181. Brys T, Harutyunyan A, Vrancx P, Nowé A, Taylor ME. Multi-objectivization and ensembles of shapings in reinforcement learning. Neurocomputing, 2017, 263: 48–59

    Google Scholar 

  182. Chen X L, Cao L, Li C X, Xu Z X, Lai J. Ensemble network architecture for deep reinforcement learning. Mathematical Problems in Engineering, 2018, 2018: 1–6

    Google Scholar 

Download references

Acknowledgments

The authors are grateful for the constructive advice received from the anonymous reviewers of this paper. The work described in this paper was partially funded by grants from the National Natural Science Foundation of China (Grant Nos. 61722205, 61751205, 61572199, 61502174, 61872148, and U1611461), a grant from the Key Research and Development Program of Guangdong Province, China (2018B010107002), grants from the Science and Technology Planning Projects of Guangdong Province, China (2016A050503015, 2017A030313355), and a grant from the Guangzhou Science and Technology Planning Project (201704030051).

Author information


Corresponding author

Correspondence to Zhiwen Yu.

Additional information


Wenming Cao received the MS degree from the School of Automation, Huazhong University of Science and Technology (HUST), China in 2015. He received the PhD degree from the Department of Computer Science, City University of Hong Kong, China. His research interests include data mining and machine learning.

Xibin Dong is a Master's candidate in the School of Computer Science and Engineering at South China University of Technology, China. His research interests include machine learning and data mining. He mainly works on imbalanced learning.

Yifan Shi is a Master's candidate in the School of Computer Science and Engineering at South China University of Technology, China. His research interests include machine learning and data mining. He mainly works on ensemble clustering.

Zhiwen Yu is a professor in the School of Computer Science and Engineering, South China University of Technology, China. He is a distinguished member of CCF (China Computer Federation), a senior member of IEEE and ACM, and the vice chair of the ACM Guangzhou chapter. He is an associate editor of IEEE Transactions on Systems, Man, and Cybernetics: Systems. Dr. Yu obtained the PhD degree from City University of Hong Kong, China in 2008. His research focuses on artificial intelligence, data mining, machine learning, and pattern recognition. To date, he has published more than 130 refereed journal and international conference papers, including more than 30 IEEE Transactions papers.

Qianli Ma received the PhD degree in computer science from the South China University of Technology, China in 2008. He is an associate professor with the School of Computer Science and Engineering, South China University of Technology, China. From 2016 to 2017, he was a Visiting Scholar with the University of California at San Diego, USA. His current research interests include machine-learning algorithms, data-mining methodologies, and time-series modeling and their applications.



About this article


Cite this article

Dong, X., Yu, Z., Cao, W. et al. A survey on ensemble learning. Front. Comput. Sci. 14, 241–258 (2020). https://doi.org/10.1007/s11704-019-8208-z

