Credit risk classification: an integrated predictive accuracy algorithm using artificial and deep neural networks

Mahbobi, Mohammad; Kimiagari, Salman; Vasudevan, Marriappan

doi:10.1007/s10479-021-04114-z

Original Research
Published: 01 July 2021

Volume 330, pages 609–637 (2023)

Annals of Operations Research

Abstract

This study uses classification models to build a robust algorithm for imbalanced data in which the minority class is the class of interest, here default payments. To develop an integrated predictive accuracy algorithm, the study applies four machine learning classifiers: a deep neural network (DNN), a support vector machine (SVM), k-nearest neighbors (KNN), and an artificial neural network (ANN). The proposed algorithm is applied to an imbalanced dataset of 30,000 observations and aims to improve the accuracy of default-payment prediction through oversampling and undersampling strategies: the synthetic minority oversampling technique (SMOTE), SVM SMOTE, random undersampling, and ALL-KNN. The results indicate that the SVM under the ALL-KNN sampling technique achieves an accuracy of 98.6%, with the lowest cross-entropy loss, 0.028. Through the implementation of the neural networks and neurons used in the proposed algorithm, the paper offers better insight into how neural networks behave when combined with resampling techniques. With the methodology and algorithm presented in this study, credit risk can be assessed more accurately in practical applications where most clients fall into the non-default class.
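The resampling idea named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the function names, the toy data, and the parameter choices (`k`, `seed`) are hypothetical, and the oversampler is a simplified SMOTE-style interpolation rather than the full SMOTE procedure. SVM SMOTE and ALL-KNN are not shown. The last function computes the binary cross-entropy loss, the kind of loss measure the abstract reports.

```python
import math
import random


def random_undersample(X, y, majority=0, seed=0):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    mino = [i for i, label in enumerate(y) if label != majority]
    keep = sorted(rng.sample(maj, len(mino)) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like_oversample(X, y, minority=1, k=3, seed=0):
    """Append synthetic minority rows until both classes have the same size."""
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(pts)) - len(pts)  # majority count minus minority count
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(pts)
        # Up to k nearest minority neighbours of `base` (Euclidean distance).
        neighbours = sorted(
            (p for p in pts if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position on the segment from `base` to `other`
        X_new.append([b + t * (o - b) for b, o in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood of 0/1 labels under predicted P(label = 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


# Toy data (invented for illustration): 8 majority rows, 3 minority rows.
X = [[float(i), float(i % 3)] for i in range(8)]
X += [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]
y = [0] * 8 + [1] * 3

X_under, y_under = random_undersample(X, y)
X_over, y_over = smote_like_oversample(X, y)
print(len(y_under), sum(y_under))
print(len(y_over), sum(y_over))
```

In this sketch both resamplers return a class-balanced training set; a classifier would then be fitted on the resampled rows and evaluated on untouched held-out data. The page does not say which software the authors used. The imbalanced-learn package provides implementations named `SMOTE`, `SVMSMOTE`, `RandomUnderSampler`, and `AllKNN`.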

Author information

Authors and Affiliations

Department of Economics, Thompson Rivers University, Kamloops, BC, Canada
Mohammad Mahbobi
Department of Management, International Business, Information and Supply Chain, Thompson Rivers University, Kamloops, BC, Canada
Salman Kimiagari
Thompson Rivers University, Kamloops, BC, Canada
Marriappan Vasudevan

Corresponding author

Correspondence to Mohammad Mahbobi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1: Descriptive statistics of the data set

Variable	Mean	SD	Min	Max	Q1	Median	Q3	Range	IQR	Mode	Skewness	Kurtosis
LIMIT_BAL	167,484	129,748	10,000	1,000,000	50,000	140,000	240,000	990,000	190,000	50,000	0.99	0.54
SEX	1.6037	0.4891	1	2	1	2	2	1	1	2	−0.42	−1.82
EDUCATION	1.8531	0.7903	0	6	1	2	2	6	1	2	0.97	2.08
MARRIAGE	1.5519	0.522	0	3	1	2	2	3	1	2	−0.02	−1.36
AGE	35.486	9.218	21	79	28	34	41	58	13	29	0.73	0.04
PAY_0	−0.0167	1.1238	−2	8	−1	0	0	10	1	0	0.73	2.72
PAY_2	−0.13377	1.19719	−2	8	−1	0	0	10	1	0	0.79	1.57
PAY_3	−0.1662	1.19687	−2	8	−1	0	0	10	1	0	0.84	2.08
PAY_4	−0.22067	1.16914	−2	8	−1	0	0	10	1	0	1	3.5
PAY_5	−0.2662	1.13319	−2	8	−1	0	0	10	1	0	1.01	3.99
PAY_6	−0.2911	1.14999	−2	8	−1	0	0	10	1	0	0.95	3.43
BILL_AMT1	51,223	73,636	−165,580	964,511	3558	22,382	67,093	1,130,091	63,535	0	2.66	9.81
BILL_AMT2	49,179	71,174	−69,777	983,931	2984	21,200	64,011	1,053,708	61,027	0	2.71	10.3
BILL_AMT3	47,013	69,349	−157,264	1,664,089	2665	20,089	60,166	1,821,353	57,502	0	3.09	19.78
BILL_AMT4	43,263	64,333	−170,000	891,586	2326	19,052	54,512	1,061,586	52,186	0	2.82	11.31
BILL_AMT5	40,311	60,797	−81,334	927,171	1763	18,105	50,202	1,008,505	48,439	0	2.88	12.31
BILL_AMT6	38,872	59,554	−339,603	961,664	1256	17,071	49,203	1,301,267	47,947	0	2.85	12.27
PAY_AMT1	5664	16,563	0	873,552	1000	2100	5006	873,552	4006	0	14.67	415.25
PAY_AMT2	5921	23,041	0	1,684,259	833	2009	5000	1,684,259	4167	0	30.45	1641.63
PAY_AMT3	5226	17,607	0	896,040	390	1800	4505	896,040	4115	0	17.22	564.31
PAY_AMT4	4826	15,666	0	621,000	296	1500	4014	621,000	3718	0	12.9	277.33
PAY_AMT5	4799	15,278	0	426,529	252	1500	4033	426,529	3781	0	11.13	180.06
PAY_AMT6	5216	17,777	0	528,666	117	1500	4000	528,666	3883	0	10.64	167.16
Default payment next month	0.2212	0.41506	0	1	0	0	0	1	0	0	1.34	−0.2
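The columns of the table above are linked by simple identities: Range = Max − Min, IQR = Q3 − Q1, and, for a 0/1 variable with mean p, SD ≈ sqrt(p(1 − p)). The short Python sketch below checks a few rows copied from the table and derives two quantities implied by the "Default payment next month" row. The variable and function names are hypothetical. The majority-class baseline is an illustration added here, not a result reported on this page.

```python
import math

# (min, max, q1, q3, reported_range, reported_iqr), copied from the table above.
rows = {
    "LIMIT_BAL": (10_000, 1_000_000, 50_000, 240_000, 990_000, 190_000),
    "AGE": (21, 79, 28, 41, 58, 13),
    "BILL_AMT1": (-165_580, 964_511, 3_558, 67_093, 1_130_091, 63_535),
    "PAY_AMT1": (0, 873_552, 1_000, 5_006, 873_552, 4_006),
}


def is_consistent(vmin, vmax, q1, q3, reported_range, reported_iqr):
    """True when Range = Max - Min and IQR = Q3 - Q1 hold exactly."""
    return vmax - vmin == reported_range and q3 - q1 == reported_iqr


checks = {name: is_consistent(*values) for name, values in rows.items()}

# The mean of the 0/1 target is the share of defaulting clients.
default_rate = 0.2212
sd_binary = math.sqrt(default_rate * (1 - default_rate))  # compare with reported SD 0.41506

# Accuracy of a trivial rule that always predicts "no default".
baseline_accuracy = 1 - default_rate

print(checks)
print(round(sd_binary, 4), round(baseline_accuracy, 4))
```

Because about 78% of clients do not default, a classifier can reach roughly 78% accuracy without identifying a single default, which is one reason resampling and a loss measure such as cross-entropy matter alongside accuracy for this data set.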

About this article

Cite this article

Mahbobi, M., Kimiagari, S. & Vasudevan, M. Credit risk classification: an integrated predictive accuracy algorithm using artificial and deep neural networks. Ann Oper Res 330, 609–637 (2023). https://doi.org/10.1007/s10479-021-04114-z

Accepted: 07 May 2021
Published: 01 July 2021
Issue Date: November 2023
DOI: https://doi.org/10.1007/s10479-021-04114-z
