Abstract
Several studies have shown that combining machine learning models in an appropriate way can improve on the individual predictions made by the base models. The key to building a well-performing ensemble is diversity among the base models. Two of the most common ways to introduce diversity into decision tree ensembles are bagging and random forest. Bagging enhances diversity by sampling with replacement to generate many training data sets, while random forest additionally selects a random subset of the input features. This has made random forest a winning candidate for many machine learning applications. However, assigning equal weights to all base decision trees does not seem reasonable, because the randomization of sampling and input feature selection may lead to different levels of decision-making ability across the base trees. Therefore, we propose several algorithms that modify the weighting strategy of regular random forest in order to make better predictions. The designed weighting frameworks include optimal weighted random forest based on accuracy, optimal weighted random forest based on area under the curve (AUC), performance-based weighted random forest, and several stacking-based weighted random forest models. The numerical results show that the proposed models are able to introduce significant improvements compared to regular random forest.
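To make the contrast between equal and unequal tree weights concrete, the following is a minimal sketch of a weighted majority vote. It is not the chapter's method: the helper names `accuracy_weights` and `weighted_vote`, the choice of weights proportional to held-out accuracy, and the toy predictions are all assumptions made for illustration. The abstract does not specify the formulas behind the optimal, performance-based, or stacking-based variants.

```python
from collections import defaultdict


def accuracy_weights(val_predictions, y_val):
    """Hypothetical weighting rule: weight each tree by its accuracy on
    held-out labels, normalised so that the weights sum to 1."""
    accuracies = [
        sum(p == y for p, y in zip(preds, y_val)) / len(y_val)
        for preds in val_predictions
    ]
    total = sum(accuracies)
    if total == 0:
        # No tree got anything right: fall back to equal weights.
        return [1.0 / len(accuracies)] * len(accuracies)
    return [a / total for a in accuracies]


def weighted_vote(tree_predictions, weights):
    """Combine per-tree class labels with a weighted majority vote.

    tree_predictions[t][i] is the label tree t assigns to sample i.
    Ties go to the label that was seen first for that sample.
    """
    n_samples = len(tree_predictions[0])
    combined = []
    for i in range(n_samples):
        scores = defaultdict(float)
        for preds, w in zip(tree_predictions, weights):
            scores[preds[i]] += w
        combined.append(max(scores, key=scores.get))
    return combined


# Toy data (made up): three trees scored on four held-out samples.
y_val = [1, 0, 1, 1]
val_predictions = [
    [1, 0, 1, 1],  # tree A: 4 of 4 correct
    [0, 1, 0, 1],  # tree B: 1 of 4 correct
    [0, 1, 0, 0],  # tree C: 0 of 4 correct
]
weights = accuracy_weights(val_predictions, y_val)

# Predictions of the same three trees on three new samples.
test_predictions = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
equal_vote = weighted_vote(test_predictions, [1 / 3] * 3)
accuracy_vote = weighted_vote(test_predictions, weights)
print(equal_vote, accuracy_vote)
```

With equal weights, the two weaker trees outvote tree A on the first sample. With accuracy-proportional weights, tree A's vote dominates. In practice the weights would be estimated on data not used to grow the trees (for example, out-of-bag samples), which is an assumption of this sketch rather than a detail given in the abstract.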
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shahhosseini, M., Hu, G. (2021). Improved Weighted Random Forest for Classification Problems. In: Allahviranloo, T., Salahshour, S., Arica, N. (eds) Progress in Intelligent Decision Science. IDS 2020. Advances in Intelligent Systems and Computing, vol 1301. Springer, Cham. https://doi.org/10.1007/978-3-030-66501-2_4
DOI: https://doi.org/10.1007/978-3-030-66501-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66500-5
Online ISBN: 978-3-030-66501-2
eBook Packages: Intelligent Technologies and Robotics (R0)