Cost-sensitive ensemble methods for bankruptcy prediction in a highly imbalanced data distribution: a real case from the Spanish market

  • Regular Paper
  • Progress in Artificial Intelligence

Abstract

Bankruptcy has been an issue of interest in the business world for decades. Predicting it is crucial for survival in periods of economic turmoil and recession. However, bankruptcy modeling is challenging because of the complexity of the contributing factors and the highly imbalanced distribution of the available data sets. This work aims to improve the predictive power of bankruptcy modeling by applying cost-sensitive ensemble methods to a real-world Spanish bankruptcy data set to generate prediction models. The performance of the resulting models is highly competitive with related research in the field. Cost-sensitive random forests outperformed the other approaches in predicting bankruptcy, achieving a geometric mean of 90.7% with type I and type II error rates of 0.094 and 0.088, respectively.
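The metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of the geometric mean as the square root of the product of the two class-wise accuracies, and it treats "bankrupt" as the positive class, with type I error as the false-positive rate and type II error as the false-negative rate. That labeling is an assumption, since the abstract does not state the convention. The function names and the cost values are hypothetical, and the cost-based decision rule is a generic minimum-expected-cost threshold, not necessarily the procedure used in the paper.

```python
import math


def error_rates(tp, fn, fp, tn):
    """Return (type_i, type_ii, g_mean) from confusion-matrix counts.

    Assumed convention: 'bankrupt' is the positive class,
    type I = false-positive rate, type II = false-negative rate.
    """
    type_i = fp / (fp + tn)
    type_ii = fn / (fn + tp)
    return type_i, type_ii, g_mean_from_errors(type_i, type_ii)


def g_mean_from_errors(type_i, type_ii):
    """Geometric mean of the two class-wise accuracies."""
    return math.sqrt((1.0 - type_i) * (1.0 - type_ii))


def min_cost_label(p_bankrupt, cost_fp=1.0, cost_fn=10.0):
    """Pick the label with the lower expected misclassification cost.

    cost_fp and cost_fn are illustrative values, not taken from the paper.
    Predicting 'bankrupt' (1) is cheaper when p >= cost_fp / (cost_fp + cost_fn).
    """
    threshold = cost_fp / (cost_fp + cost_fn)
    return 1 if p_bankrupt >= threshold else 0


if __name__ == "__main__":
    # Plugging in the rounded error rates reported in the abstract.
    print(round(g_mean_from_errors(0.094, 0.088), 3))
    # A firm with a 20% estimated bankruptcy probability is flagged
    # once a missed bankruptcy is assumed to cost ten times a false alarm.
    print(min_cost_label(0.20))
```

Plugging the rounded error rates 0.094 and 0.088 into this formula gives roughly 0.909, close to the reported 90.7%. The small gap may come from rounding or from averaging over evaluation runs, which is an assumption here. For an actual cost-sensitive random forest, one commonly used option is the `class_weight` parameter of scikit-learn's `RandomForestClassifier`; whether that matches the tooling behind the reported results is not stated in this text.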

Notes

  1. Bought from http://infotel.es.

Acknowledgements

This work has been supported in part by Ministerio español de Economía y Competitividad under Project TIN2017-85727-C4-2-P (UGR-DeepBio), SPIP2017-02116 and TEC2015-68752 (also funded by FEDER), as well as Project B-TIC-402-UGR18 (FEDER and Junta de Andalucía) and RTI2018-102002-A-I00 (Ministerio español de Ciencia, Innovación y Universidades).

Author information

Corresponding author

Correspondence to Pedro A. Castillo.

Cite this article

Ghatasheh, N., Faris, H., Abukhurma, R. et al. Cost-sensitive ensemble methods for bankruptcy prediction in a highly imbalanced data distribution: a real case from the Spanish market. Prog Artif Intell 9, 361–375 (2020). https://doi.org/10.1007/s13748-020-00219-x

  • DOI: https://doi.org/10.1007/s13748-020-00219-x
