Improved Weighted Random Forest for Classification Problems

Shahhosseini, Mohsen; Hu, Guiping

doi:10.1007/978-3-030-66501-2_4

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1301)

Included in the following conference series:

International Online Conference on Intelligent Decision Science

Abstract

Several studies have shown that combining machine learning models in an appropriate way can improve on the individual predictions made by the base models. The key to building a well-performing ensemble model is the diversity of its base models. Two of the most common ways to introduce diversity into decision trees are bagging and random forest. Bagging enhances diversity by sampling with replacement to generate many training data sets, while random forest additionally selects a random subset of features. This has made random forest a winning candidate for many machine learning applications. However, assigning equal weights to all base decision trees does not seem reasonable, because the randomization of sampling and input feature selection may lead to different levels of decision-making ability across the base trees. Therefore, we propose several algorithms that modify the weighting strategy of regular random forest and consequently make better predictions. The designed weighting frameworks include optimal weighted random forest based on accuracy, optimal weighted random forest based on area under the curve (AUC), performance-based weighted random forest, and several stacking-based weighted random forest models. The numerical results show that the proposed models introduce significant improvements over regular random forest.
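
To make the idea of non-equal tree weights concrete, the following is a minimal sketch of one simple performance-based scheme: each tree's weight is its accuracy on held-out data, normalized so the weights sum to one, and the ensemble prediction is the class with the largest total weight. This is an illustration only and is not necessarily the formulation used in the chapter; the function names (`accuracy`, `performance_weights`, `weighted_vote`) and the toy predictions are hypothetical, and the trees are represented only by their predicted labels.

```python
from typing import List, Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of held-out samples a single tree classifies correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def performance_weights(val_preds: List[Sequence[int]],
                        val_labels: Sequence[int]) -> List[float]:
    """Assumed scheme: weight each tree by its held-out accuracy, normalized to sum to 1."""
    scores = [accuracy(p, val_labels) for p in val_preds]
    total = sum(scores)
    if total == 0:
        # Every tree was wrong on every sample: fall back to equal weights.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def weighted_vote(tree_preds: List[Sequence[int]],
                  weights: Sequence[float],
                  n_classes: int = 2) -> List[int]:
    """For each sample, return the class that collects the largest total tree weight."""
    n_samples = len(tree_preds[0])
    result = []
    for i in range(n_samples):
        votes = [0.0] * n_classes
        for preds, w in zip(tree_preds, weights):
            votes[preds[i]] += w
        result.append(max(range(n_classes), key=votes.__getitem__))
    return result


# Toy held-out predictions from three hypothetical trees.
val_labels = [1, 0, 1, 1]
val_preds = [
    [1, 0, 1, 1],  # tree A: all four correct
    [0, 1, 1, 1],  # tree B: two of four correct
    [0, 1, 0, 1],  # tree C: one of four correct
]
weights = performance_weights(val_preds, val_labels)

# One new sample on which tree A predicts class 1 and trees B and C predict class 0.
new_preds = [[1], [0], [0]]
equal_result = weighted_vote(new_preds, [1 / 3, 1 / 3, 1 / 3])
weighted_result = weighted_vote(new_preds, weights)
```

In this toy case the equal-weight vote follows the two weaker trees, while the performance-weighted vote follows the single tree that was most accurate on the held-out data. The optimization-based and stacking-based variants named in the abstract would replace `performance_weights` with a different way of choosing the weights.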

Author information

Authors and Affiliations

Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, USA
Mohsen Shahhosseini & Guiping Hu

Corresponding author

Correspondence to Mohsen Shahhosseini.

Editor information

Editors and Affiliations

Faculty of Engineering and Natural Sciences, Bahçeşehir University, Istanbul, Turkey
Tofigh Allahviranloo
Faculty of Engineering and Natural Sciences, Bahçeşehir University, Istanbul, Turkey
Soheil Salahshour
Faculty of Engineering and Natural Sciences, Bahçeşehir University, Istanbul, Turkey
Nafiz Arica

About this paper

Cite this paper

Shahhosseini, M., Hu, G. (2021). Improved Weighted Random Forest for Classification Problems. In: Allahviranloo, T., Salahshour, S., Arica, N. (eds) Progress in Intelligent Decision Science. IDS 2020. Advances in Intelligent Systems and Computing, vol 1301. Springer, Cham. https://doi.org/10.1007/978-3-030-66501-2_4

DOI: https://doi.org/10.1007/978-3-030-66501-2_4
Published: 30 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66500-5
Online ISBN: 978-3-030-66501-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)