Abstract
The emergence of big data, information technology, and social media provides an enormous amount of information about firms' current financial health. Faced with this abundance of data, decision makers must identify the crucial information needed to build an effective prediction model that produces high-quality estimates. Feature selection can be used to retain the significant variables without degrading classification performance. In addition, one of the main goals of bankruptcy prediction is to identify the model specification with the strongest explanatory power. Building on this premise, we propose an improved XGBoost algorithm based on feature importance selection (FS-XGBoost). FS-XGBoost is compared with seven machine learning algorithms that rely on three well-known feature selection methods frequently used in bankruptcy prediction: stepwise discriminant analysis, stepwise logistic regression, and partial least squares discriminant analysis (PLS-DA). Our experimental results confirm that FS-XGBoost provides more accurate predictions, outperforming these traditional feature selection methods.
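The general idea behind importance-based selection with boosted trees is to fit the booster, score each variable by the split gain it earns, and keep the variables whose share of total importance is large enough. This is illustrated below with a small, dependency-free sketch. It is not the authors' FS-XGBoost implementation. It assumes depth-one trees (stumps), logistic loss, XGBoost's second-order split gain ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ)] with leaf weight −G/(H+λ), importance measured as total gain per feature, and a hypothetical 5% share-of-importance cut-off. All function names (`boost_stumps`, `select_by_importance`, etc.) are illustrative. With the real `xgboost` package, gain importances can instead be read from a fitted model with `Booster.get_score(importance_type="gain")`.

```python
import math
from typing import List, Optional, Tuple

# A fitted stump: (feature index, threshold, left leaf value, right leaf value).
# Leaf values are stored already multiplied by the learning rate eta.
Stump = Tuple[int, float, float, float]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _best_split(X: List[List[float]], grad: List[float], hess: List[float],
                lam: float) -> Optional[Tuple[float, int, float, float, float]]:
    """Return (gain, feature, threshold, w_left, w_right) for the stump with
    the largest XGBoost-style gain, or None if no feature can be split."""
    G, H = sum(grad), sum(hess)
    parent = G * G / (H + lam)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        gl = hl = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            gl += grad[i]
            hl += hess[i]
            if X[i][j] == X[nxt][j]:
                continue  # no threshold separates equal values
            gr, hr = G - gl, H - hl
            gain = 0.5 * (gl * gl / (hl + lam) + gr * gr / (hr + lam) - parent)
            if best is None or gain > best[0]:
                threshold = (X[i][j] + X[nxt][j]) / 2.0
                best = (gain, j, threshold, -gl / (hl + lam), -gr / (hr + lam))
    return best


def boost_stumps(X: List[List[float]], y: List[int], n_rounds: int = 20,
                 eta: float = 0.3, lam: float = 1.0, gamma: float = 0.0):
    """Second-order boosting of stumps on logistic loss. Also accumulates the
    total split gain earned by each feature (gain-based importance)."""
    n = len(X)
    margin = [0.0] * n  # raw log-odds, starting from probability 0.5
    importance = [0.0] * len(X[0])
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        p = [_sigmoid(m) for m in margin]
        grad = [p[i] - y[i] for i in range(n)]          # first derivative of log loss
        hess = [p[i] * (1.0 - p[i]) for i in range(n)]  # second derivative
        split = _best_split(X, grad, hess, lam)
        if split is None or split[0] <= gamma:
            break  # no split reduces the regularized objective
        gain, j, thr, wl, wr = split
        importance[j] += gain
        stumps.append((j, thr, eta * wl, eta * wr))
        for i in range(n):
            margin[i] += eta * wl if X[i][j] < thr else eta * wr
    return stumps, importance


def predict_proba(stumps: List[Stump], x: List[float]) -> float:
    """Probability of the positive class (e.g. failure) for one firm."""
    margin = sum(wl if x[j] < thr else wr for j, thr, wl, wr in stumps)
    return _sigmoid(margin)


def select_by_importance(importance: List[float],
                         threshold: float = 0.05) -> List[int]:
    """Keep features whose share of total importance is at least `threshold`."""
    total = sum(importance)
    if total == 0.0:
        return []
    return [j for j, v in enumerate(importance) if v / total >= threshold]


if __name__ == "__main__":
    # Toy data: column 0 drives the label, column 1 is an alternating pattern.
    X = [[float(i), float(i % 2)] for i in range(10)]
    y = [1 if i >= 5 else 0 for i in range(10)]
    stumps, importance = boost_stumps(X, y)
    print("importance:", importance)
    print("selected features:", select_by_importance(importance))
```

On the toy data, every round splits column 0 at 4.5, so column 1 earns no gain and is dropped. With real financial ratios, the retained columns would then be used to refit or compare classifiers; the cut-off is a tuning choice, not a value taken from the paper.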
Notes
Tutorials explaining the methodology used by TANAGRA are available at the following address: https://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html.
Funding
The authors received financial support from the Spanish Ministry of Science, Innovation and Universities. FEDER project PGC2018-093645-B-I00 is gratefully acknowledged.
Ethics declarations
Conflicts of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Ben Jabeur, S., Stef, N. & Carmona, P. Bankruptcy Prediction using the XGBoost Algorithm and Variable Importance Feature Engineering. Comput Econ 61, 715–741 (2023). https://doi.org/10.1007/s10614-021-10227-1
DOI: https://doi.org/10.1007/s10614-021-10227-1