Improved Weighted Random Forest for Classification Problems

  • Conference paper
Progress in Intelligent Decision Science (IDS 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1301)

Abstract

Several studies have shown that combining machine learning models appropriately can improve on the predictions made by the individual base models. The key to a well-performing ensemble is the diversity of its base models. Among the most common ways of introducing diversity into decision trees are bagging and random forest. Bagging enhances diversity by sampling with replacement to generate many training data sets, while random forest additionally selects a random subset of features at each split. This has made random forest a winning candidate for many machine learning applications. However, assigning equal weights to all base decision trees is not necessarily reasonable, since the randomization of sampling and of input feature selection may lead to different levels of decision-making ability across the base trees. We therefore propose several algorithms that modify the weighting strategy of the regular random forest and consequently make better predictions. The designed weighting frameworks include optimal weighted random forest based on accuracy, optimal weighted random forest based on area under the curve (AUC), performance-based weighted random forest, and several stacking-based weighted random forest models. The numerical results show that the proposed models achieve significant improvements over the regular random forest.
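The general idea of performance-based weighting can be sketched as follows: instead of the equal-weight majority vote of a regular random forest, each base tree's vote is scaled by a measure of its individual predictive performance. The snippet below is a minimal illustration of this idea using scikit-learn, assuming validation-set accuracy as the performance measure; it is a sketch of the general technique, not the authors' exact optimization-based algorithms, and the dataset and split sizes are arbitrary choices for demonstration.

```python
# Performance-based weighted random forest (illustrative sketch).
# Each base tree's soft vote is weighted by its accuracy on a held-out
# validation set, instead of the equal weights used by a regular random forest.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
# Carve out a validation set for estimating each tree's individual performance.
X_fit, X_val, y_fit, y_val = train_test_split(X_tr, y_tr, test_size=0.25,
                                              random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_fit, y_fit)

# Weight for each base tree: its own validation accuracy, normalized to sum to 1.
weights = np.array([accuracy_score(y_val, t.predict(X_val))
                    for t in rf.estimators_])
weights /= weights.sum()

# Weighted soft vote: combine the trees' class-probability estimates.
proba = sum(w * t.predict_proba(X_te) for w, t in zip(weights, rf.estimators_))
y_pred = proba.argmax(axis=1)
print("weighted RF test accuracy:", accuracy_score(y_te, y_pred))
```

The accuracy-based and AUC-based optimal variants described in the abstract would replace the simple normalized-accuracy weights with weights found by optimizing the chosen metric, and the stacking variants would instead learn a meta-model on the trees' out-of-sample predictions.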



Author information

Correspondence to Mohsen Shahhosseini.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shahhosseini, M., Hu, G. (2021). Improved Weighted Random Forest for Classification Problems. In: Allahviranloo, T., Salahshour, S., Arica, N. (eds) Progress in Intelligent Decision Science. IDS 2020. Advances in Intelligent Systems and Computing, vol 1301. Springer, Cham. https://doi.org/10.1007/978-3-030-66501-2_4
