Abstract
This study uses classification models to build a robust algorithm for imbalanced data in which the minority class is the class of interest, here default payments. To develop an integrated predictive accuracy algorithm, it applies four machine learning classifiers: deep neural networks (DNN), support vector machines (SVM), k-nearest neighbors (KNN), and artificial neural networks (ANN). The algorithm is applied to an imbalanced dataset of 30,000 observations and aims to improve the accuracy of default-payment prediction through oversampling and undersampling strategies: the synthetic minority oversampling technique (SMOTE), SVM SMOTE, random undersampling, and ALL-KNN. The results indicate that the SVM under ALL-KNN sampling achieves an accuracy of 98.6%, with the lowest cross-entropy loss, 0.028. By detailing the implementation of the neural networks and neurons used in the proposed algorithm, the paper offers insight into how neural networks behave when combined with resampling techniques. With the methodology and algorithm presented here, credit risk can be assessed more accurately in practical applications where most clients fall in the non-default class.
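The resampling ideas and the loss measure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `random_undersample`, `smote_like_oversample`, and `binary_cross_entropy`, the toy two-feature data, and the choice of `k = 3` are all assumptions made for illustration, and the oversampler only mimics the interpolation step at the core of SMOTE for a binary problem.

```python
import math
import random


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class is as small as the rarest one."""
    rng = random.Random(seed)
    rows_by_class = {}
    for i, label in enumerate(y):
        rows_by_class.setdefault(label, []).append(i)
    n_min = min(len(rows) for rows in rows_by_class.values())
    keep = sorted(i for rows in rows_by_class.values()
                  for i in rng.sample(rows, n_min))
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like_oversample(X, y, minority=1, k=3, seed=0):
    """Add synthetic minority rows until the two classes are balanced.

    Each synthetic row lies on the segment between a minority row and one
    of its k nearest minority neighbours (the interpolation idea of SMOTE).
    Assumes binary labels and at least two minority rows.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_rows)
        others = [r for r in minority_rows if r is not base]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        X_out.append([b + gap * (q - b) for b, q in zip(base, partner)])
        y_out.append(minority)
    return X_out, y_out


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood of 0/1 labels under predicted P(y = 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


# Hypothetical toy data: 20 non-default rows (label 0), 5 default rows (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(5)]
y = [0] * 20 + [1] * 5

X_under, y_under = random_undersample(X, y)
X_over, y_over = smote_like_oversample(X, y)
print(y_under.count(0), y_under.count(1))
print(y_over.count(0), y_over.count(1))
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```

On the toy data, undersampling leaves 5 rows per class and oversampling yields 20 rows per class. In either case a classifier would then be trained on the resampled training split only, with the test split left at its original class ratio, so that the reported accuracy and loss reflect the real imbalance.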
Appendix 1: Descriptive statistics of the data set
Variable | Mean | SD | Min | Max | Q1 | Median | Q3 | Range | IQR | Mode | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LIMIT_BAL | 167,484 | 129,748 | 10,000 | 1,000,000 | 50,000 | 140,000 | 240,000 | 990,000 | 190,000 | 50,000 | 0.99 | 0.54 |
SEX | 1.6037 | 0.4891 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 2 | −0.42 | −1.82 |
EDUCATION | 1.8531 | 0.7903 | 0 | 6 | 1 | 2 | 2 | 6 | 1 | 2 | 0.97 | 2.08 |
MARRIAGE | 1.5519 | 0.522 | 0 | 3 | 1 | 2 | 2 | 3 | 1 | 2 | −0.02 | −1.36 |
AGE | 35.486 | 9.218 | 21 | 79 | 28 | 34 | 41 | 58 | 13 | 29 | 0.73 | 0.04 |
PAY_0 | −0.0167 | 1.1238 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 0.73 | 2.72 |
PAY_2 | −0.13377 | 1.19719 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 0.79 | 1.57 |
PAY_3 | −0.1662 | 1.19687 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 0.84 | 2.08 |
PAY_4 | −0.22067 | 1.16914 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 1 | 3.5 |
PAY_5 | −0.2662 | 1.13319 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 1.01 | 3.99 |
PAY_6 | −0.2911 | 1.14999 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 0.95 | 3.43 |
BILL_AMT1 | 51,223 | 73,636 | −165,580 | 964,511 | 3558 | 22,382 | 67,093 | 1,130,091 | 63,535 | 0 | 2.66 | 9.81 |
BILL_AMT2 | 49,179 | 71,174 | −69,777 | 983,931 | 2984 | 21,200 | 64,011 | 1,053,708 | 61,027 | 0 | 2.71 | 10.3 |
BILL_AMT3 | 47,013 | 69,349 | −157,264 | 1,664,089 | 2665 | 20,089 | 60,166 | 1,821,353 | 57,502 | 0 | 3.09 | 19.78 |
BILL_AMT4 | 43,263 | 64,333 | −170,000 | 891,586 | 2326 | 19,052 | 54,512 | 1,061,586 | 52,186 | 0 | 2.82 | 11.31 |
BILL_AMT5 | 40,311 | 60,797 | −81,334 | 927,171 | 1763 | 18,105 | 50,202 | 1,008,505 | 48,439 | 0 | 2.88 | 12.31 |
BILL_AMT6 | 38,872 | 59,554 | −339,603 | 961,664 | 1256 | 17,071 | 49,203 | 1,301,267 | 47,947 | 0 | 2.85 | 12.27 |
PAY_AMT1 | 5664 | 16,563 | 0 | 873,552 | 1000 | 2100 | 5006 | 873,552 | 4006 | 0 | 14.67 | 415.25 |
PAY_AMT2 | 5921 | 23,041 | 0 | 1,684,259 | 833 | 2009 | 5000 | 1,684,259 | 4167 | 0 | 30.45 | 1641.63 |
PAY_AMT3 | 5226 | 17,607 | 0 | 896,040 | 390 | 1800 | 4505 | 896,040 | 4115 | 0 | 17.22 | 564.31 |
PAY_AMT4 | 4826 | 15,666 | 0 | 621,000 | 296 | 1500 | 4014 | 621,000 | 3718 | 0 | 12.9 | 277.33 |
PAY_AMT5 | 4799 | 15,278 | 0 | 426,529 | 252 | 1500 | 4033 | 426,529 | 3781 | 0 | 11.13 | 180.06 |
PAY_AMT6 | 5216 | 17,777 | 0 | 528,666 | 117 | 1500 | 4000 | 528,666 | 3883 | 0 | 10.64 | 167.16 |
Default payment next month | 0.2212 | 0.41506 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1.34 | −0.2 |
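The last row of the table quantifies the class imbalance, and its entries can be cross-checked with a short calculation. The sketch below assumes that the target is coded 0/1, that the reported mean of 0.2212 is exact over the 30,000 observations mentioned in the abstract, that SD is the sample standard deviation, and that kurtosis is reported as excess kurtosis. None of these conventions is stated in the source. The formulas are the standard moments of a Bernoulli variable.

```python
import math

n = 30_000   # observations, from the abstract
p = 0.2212   # mean of the 0/1 default indicator, from the table

# Class counts implied by the mean of a 0/1 variable.
n_default = round(n * p)
n_non_default = n - n_default
imbalance_ratio = n_non_default / n_default

# Moments of a Bernoulli(p) variable.
variance = p * (1 - p)
sd_sample = math.sqrt(variance * n / (n - 1))   # n - 1 denominator
skewness = (1 - 2 * p) / math.sqrt(variance)
excess_kurtosis = (1 - 6 * variance) / variance

print(n_default, n_non_default, round(imbalance_ratio, 2))
print(round(sd_sample, 5), round(skewness, 2), round(excess_kurtosis, 1))
```

Under these assumptions the calculation gives 6,636 default and 23,364 non-default clients, roughly 3.5 non-defaults per default, and reproduces the tabulated SD (0.41506), skewness (1.34), and kurtosis (−0.2).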
Cite this article
Mahbobi, M., Kimiagari, S. & Vasudevan, M. Credit risk classification: an integrated predictive accuracy algorithm using artificial and deep neural networks. Ann Oper Res 330, 609–637 (2023). https://doi.org/10.1007/s10479-021-04114-z
DOI: https://doi.org/10.1007/s10479-021-04114-z