Abstract
Ensemble learning has been applied in many areas to improve predictive performance by combining multiple learners. Its two core building blocks, diversity and the combination rule, play a significant role in ensemble learners. Ensemble approaches can be divided into two broad groups according to the variation among base classifiers: homogeneous and heterogeneous ensembles. We conducted a comprehensive review of ensemble learning used for data analytics, proceeding from feature selection to classification. We found that ensemble learning helps to overcome the problems associated with the dimensionality and class imbalance of data. For this reason, the ensemble approach was found to be more suitable for the classification of high-dimensional data. We then move towards meta-analytics using ensemble learning. Our review of metaheuristics-based ensemble learning found a substantial number of applications in both the homogeneous and heterogeneous categories. A detailed study of ensemble learning in business applications identified four successful application areas: purchasing and marketing, predictive analytics, business process management, and customer churn prediction. Across these application areas, we observe that the majority of the approaches built homogeneous ensembles with dynamic selection for single-objective optimization. Despite these successes in various application domains, ensemble learning could face challenges in analytics in the future. We conclude the chapter by identifying those difficulties and some trends for overcoming them in ensemble learning with meta-analytics.
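The two building blocks named in the abstract, diversity and the combination rule, can be illustrated with a small sketch. This is not the chapter's method: it assumes plurality voting as the combination rule and mean pairwise disagreement as the diversity measure, and the three threshold rules (`rule_a`, `rule_b`, `rule_c`) and the sample points are hypothetical stand-ins for base classifiers trained with different algorithms, as in a heterogeneous ensemble.

```python
from collections import Counter
from itertools import combinations


def majority_vote(predictions):
    """Combine one label per base learner by plurality vote."""
    return Counter(predictions).most_common(1)[0][0]


def pairwise_disagreement(outputs):
    """Mean fraction of samples on which two base learners disagree,
    averaged over all pairs of learners (a simple diversity measure)."""
    rates = [
        sum(a != b for a, b in zip(u, v)) / len(u)
        for u, v in combinations(outputs, 2)
    ]
    return sum(rates) / len(rates)


# Hypothetical base learners: three different decision rules standing in
# for classifiers built with different learning algorithms.
def rule_a(x):
    return int(x[0] > 0.5)


def rule_b(x):
    return int(x[1] > 0.5)


def rule_c(x):
    return int(x[0] + x[1] > 1.0)


# Illustrative two-feature samples (made up for this sketch).
samples = [(0.9, 0.8), (0.2, 0.1), (0.7, 0.2), (0.3, 0.9)]

# One row of predictions per base learner.
outputs = [[rule(x) for x in samples] for rule in (rule_a, rule_b, rule_c)]

# Combination rule: vote across learners for each sample.
ensemble = [majority_vote(votes) for votes in zip(*outputs)]
diversity = pairwise_disagreement(outputs)
```

A homogeneous ensemble would instead apply one learning algorithm to different resamples or feature subsets of the data; the voting and disagreement functions above would be used unchanged.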
Notes
1. We created the word clouds using the Pro Word Cloud add-in for Microsoft Word 2016, available at: https://store.office.com/en-us/app.aspx?assetid=WA104038830.
Acknowledgements
Pablo Moscato acknowledges previous support from the Australian Research Council Future Fellowship FT120100060 and Australian Research Council Discovery Projects DP120102576 and DP140104183.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Haque, M.N., Moscato, P. (2019). From Ensemble Learning to Meta-Analytics: A Review on Trends in Business Applications. In: Moscato, P., de Vries, N. (eds) Business and Consumer Analytics: New Ideas. Springer, Cham. https://doi.org/10.1007/978-3-030-06222-4_18
DOI: https://doi.org/10.1007/978-3-030-06222-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06221-7
Online ISBN: 978-3-030-06222-4
eBook Packages: Computer Science, Computer Science (R0)