Abstract
Loan default risk prediction is essential to credit risk assessment because it supports the decisions of financing institutions and investors. However, existing prediction models mostly rely on individual classifiers tuned for higher prediction accuracy, an objective that diverges from the core business purpose of maximizing profit and leaves room for profit-oriented, interpretable weighting models. This study proposes a profit-oriented weighting model for loan default prediction. The model consists of three stages: constructing multiple profit-oriented sub-classifiers, determining profit-oriented weight coefficients, and providing interpretable analysis. Five lending datasets are evaluated using accuracy and profit-based metrics. The empirical results show that the proposed weighting prediction system helps lenders achieve higher profits and provides concise, intuitive interpretability. It can therefore help practitioners make better decisions and manage risk.
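The idea of weighting sub-classifiers by profit rather than accuracy can be illustrated with a minimal sketch. The abstract does not specify the profit metric, the sub-classifiers, or the weight optimizer, so everything below is an assumption made for illustration: a flat `gain` per repaid loan and a flat `loss` per defaulted loan, two hypothetical sub-classifiers (`clf_a`, `clf_b`) with made-up default probabilities, and a plain grid search standing in for whatever optimization the study actually uses.

```python
from itertools import product


def profit(y_true, p_default, threshold, gain, loss):
    """Profit from funding every loan whose predicted default probability
    is below `threshold`. `gain` and `loss` are hypothetical flat amounts."""
    total = 0.0
    for y, p in zip(y_true, p_default):
        if p < threshold:  # the loan is funded
            total += -loss if y == 1 else gain
    return total


def combine(prob_lists, weights):
    """Weighted average of the sub-classifiers' default probabilities."""
    return [sum(w * p for w, p in zip(weights, probs))
            for probs in zip(*prob_lists)]


def best_weights(y_true, prob_lists, threshold=0.5, gain=1.0, loss=5.0, steps=10):
    """Grid search over non-negative weights that sum to 1, keeping the
    weight vector with the highest profit (an illustrative stand-in only)."""
    best_value, best_w = float("-inf"), None
    for combo in product(range(steps + 1), repeat=len(prob_lists)):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to 1
        weights = [c / steps for c in combo]
        value = profit(y_true, combine(prob_lists, weights), threshold, gain, loss)
        if value > best_value:
            best_value, best_w = value, weights
    return best_value, best_w


# Toy data (invented): 1 = default, 0 = repaid.
y_true = [0, 0, 1, 1, 0, 1]
clf_a = [0.2, 0.6, 0.7, 0.4, 0.3, 0.8]
clf_b = [0.3, 0.3, 0.4, 0.7, 0.6, 0.9]

best_value, weights = best_weights(y_true, [clf_a, clf_b])
print("profit of clf_a alone:", profit(y_true, clf_a, 0.5, 1.0, 5.0))
print("profit of clf_b alone:", profit(y_true, clf_b, 0.5, 1.0, 5.0))
print("profit of weighted model:", best_value, "with weights", weights)
```

Because the grid includes the weight vectors that put all mass on a single sub-classifier, the weighted model in this sketch can never earn less than the best individual sub-classifier on the data used to choose the weights.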
Data availability
The data that support the findings of this study are available from the corresponding author upon request.
Change history
21 March 2024
A Correction to this paper has been published: https://doi.org/10.1007/s10479-024-05952-3
Acknowledgements
This work was supported by the Major Program of the National Fund of Philosophy and Social Science of China (Grant No. 17ZDA093).
Funding
This work was supported by the Major Program of the National Fund of Philosophy and Social Science of China (Grant No. 17ZDA093).
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cui, H., Zhang, L., Yang, H. et al. Maximizing the lender’s profit: profit-oriented loan default prediction based on a weighting model. Ann Oper Res (2024). https://doi.org/10.1007/s10479-024-05912-x
DOI: https://doi.org/10.1007/s10479-024-05912-x